
HRDP Optimization with Hybrid Parallel Computing and Mirror Driver, p. 292. Myeong-Seob Kim, Jun-Young Park, Kyoung-Hun Kim, Eui-Nam Huh
A 3GPP IP Multimedia Subsystem-Based System Architecture for a M2M Horizontal Service Platform, p. 299. Jhon Edisson Villarreal Padilla, Jung Ho Kim, Jae-Oh Lee
Roles of Technical and Functional Service Quality, and Cognitive and Affective Trust in IT Service Encounter, p. 312. Jun-Gi Park, Hyejung Lee, Junbeom Jang, Jungwoo Lee
Fault Attack and countermeasures on Elliptic Curve Scalar Multiplication based on Euclidean Addition Chain, p. 322. Tae won Kim, Young-Ho Park, Soo Jeong Lee, Sung Min Cho
A Study of Factors that Influence the Acceptance of the Anti-Piracy System, p. 330. Jang, Dong-Huk, Han, Kyeong Seok
Secret Sharing based on XOR for Efficient Data Recovery in Cloud Computing Environment, p. 348. Su-Hyun Kim, Im-Yeong Lee
A Chaining Authentication Scheme Using S/Key OTP Based on a Fast and Secure Hash Algorithm in Wireless Sensor Networks, p. 353. DONG-HOON KIM, YOUN-SIK HONG, KI YOUNG LEE
Development of Web-based Collaborative Knowledge Creation, Delivery, and Presentation Tool, p. 359. Won Ho
Study of the u-Health Fusion Service Information Protection Pre-diagnosis Method, p. 364. Jung Chan Suk, Yongtae Shin
Linear Constellation Precoding Strategies in OFDM Based Two-Way Relay Systems, p. 370. Joonwoo Shin, Bangwon Seo
Side-channel Attacks on the Final Round SHA-3 Candidate Skein, p. 379. Ae-Sun Park, Dong-Guk Han, Young-Ho Park
A Casual Structure Analysis of Smart phone Addiction: Use Motives of Smart phone Users and Psychological Characteristics, p. 386. Lee Young Joo
Influence of SNS use on Marketing Performance of Internet Shopping Mall Businesses - with or without Consulted Experience -, p. 393. Sang-Sun Kim, Sang-Hyun Kim, Yen-Yoo You
A Multi-item Stochastic Demand Periodic Review System with a Budget Constraint, p. 403. Myeong-Ho Lee, Dong-Ju Lee


HRDP Optimization with Hybrid Parallel Computing and Mirror Driver

1Myeong-Seob Kim, 2Jun-Young Park, 3Kyoung-Hun Kim, 4,*Eui-Nam Huh

1, First Author: Kyung Hee University
2 Kyung Hee University
3 Gangdong College
4,* Corresponding Author: Kyung Hee University

Abstract The need to process high definition (HD) video and 3D applications is becoming one of the major issues in mobile cloud computing. The Mobile Virtual Desktop Infrastructure (mVDI) service needs efficient data processing in order to provide a high quality of experience (QoE) in mobile cloud computing. In this paper, we propose and discuss an mVDI service optimization method using OpenMP and Compute Unified Device Architecture (CUDA). We also propose a system-level hooking method that uses a mirror driver to reduce system latency, and we compare its performance with GDI hooking.

Keywords: Mobile Cloud, mVDI, Thin Client

1. Introduction

Recently, the need for high-end applications such as HD video and 3D applications has been expanding on low-resource and mobile devices, with increasing quantities of content data being transmitted. mVDI services have been developed to satisfy these requirements. mVDI provides a virtual mobile platform in the cloud, and all content is processed using powerful cloud resources. The output of an application running in the cloud is a screen image that the user's device receives through a remote display protocol. To deliver these screens over a mobile network, remote display protocols need efficient hooking and encoding technology.

In this paper, we propose and discuss an mVDI service optimization method for a mixed CPU and GPU environment using OpenMP [7] and CUDA [8]. Furthermore, to reduce system latency during hooking, we suggest a system-level hooking method with a mirror driver [9].

Section 2 discusses related work in mVDI research on supporting high-end applications in the mobile cloud. Section 3 gives an overview of the optimized architecture and describes each optimization method. Section 4 discusses the performance results, and Section 5 concludes with suggestions for future work.

2. Related work

2.1. GPU virtualization

Cloud environments provide powerful resources using virtualization technology. Recently, the nVidia grid [3][6] was released. nVidia grid is a technology for GPU virtualization that provides virtual desktops and applications with network-delivered GPU acceleration. Cloud gaming [2][5] is one example that uses virtual GPU acceleration.

Applied to the mobile cloud, GPU virtualization can also be a key method for addressing the requirements of high-end mobile applications. The original Hybrid Remote Display Protocol (HRDP) [1] used GPU processing to improve MJPEG encoding performance, but it is limited because it does not consider the multiple GPUs that GPU virtualization makes available. Therefore, in this paper, we consider using multiple GPUs to optimize HRDP.

2.2. Hybrid Remote Display Protocol

A mobile thin client makes it possible for an mVDI service to be provided to multiple devices without location constraints. The mobile device receives display images from the application which is running on the server. In addition, a protocol for mobile thin clients, called the Hybrid Remote Display Protocol (HRDP), has been developed [1].

Journal of Next Generation Information Technology (JNIT), Volume 4, Number 8, October 2013

HRDP uses a combination of RFB, a classic thin-client protocol, and MJPEG to send the graphical output of an application. Also, to separate high motion and low motion sections, HRDP uses motion detection.

When low motion is detected, the RFB module is used as the remote display protocol since it consumes fewer resources at the server. In the high motion scenario, the MJPEG module is responsible for real-time encoding. HRDP can therefore provide a high-quality mVDI service over low mobile bandwidth. However, HRDP needs lower system latency to support HD video or 3D applications. In Section 3, we propose an optimization method with hybrid parallel processing and mirror driver hooking to address this.
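The switching between the RFB and MJPEG modules can be sketched as follows. This is a minimal illustration and not the HRDP implementation: the frame representation (flat lists of pixel values), the function names `changed_fraction` and `choose_encoder`, and the 30% threshold are all assumptions made for the example, since the text does not state how motion is quantified.

```python
def changed_fraction(prev_frame, curr_frame):
    """Return the fraction of pixels that differ between two equally sized frames."""
    if len(prev_frame) != len(curr_frame):
        raise ValueError("frames must have the same number of pixels")
    if not curr_frame:
        return 0.0
    changed = sum(1 for a, b in zip(prev_frame, curr_frame) if a != b)
    return changed / len(curr_frame)


def choose_encoder(prev_frame, curr_frame, threshold=0.30):
    """Pick 'MJPEG' for high motion and 'RFB' for low motion.

    The threshold is a hypothetical tuning value, not one taken from the paper.
    """
    if changed_fraction(prev_frame, curr_frame) >= threshold:
        return "MJPEG"
    return "RFB"


if __name__ == "__main__":
    still = [0] * 100
    cursor_blink = [0] * 98 + [255, 255]   # 2% of pixels changed
    video = [255] * 60 + [0] * 40          # 60% of pixels changed
    print(choose_encoder(still, cursor_blink))
    print(choose_encoder(still, video))
```

A real system would more likely evaluate motion per screen region than per whole frame, so that only the high motion area is sent through MJPEG.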

3. Optimization Method

The HRDP work is divided into 4 threads, so it inherently supports parallel processing. The tasks assigned to each thread are as follows:

Thread 1: Screen capturing, Motion detection
Thread 2: MJPEG encoding
Thread 3: RFB encoding, Data combination
Thread 4: Data transmission

Before applying optimization, we measured the latency of the original HRDP. The simulation environment is further described in Section 4. The simulation results are shown in Figure 1, where performance is measured and classified by application type (word processor, browser, game, and video). In our test results, we found that the screen capturing module (Thread 1) consumes a large share of the time.

Figure 1. Original HRDP latency test result

We analyzed the screen capturing module and allocated a suitable thread to each piece of work using OpenMP. Figure 2 shows the original HRDP screen capturing module (left) and the changed screen capturing module (right). In the screen capturing module, HRDP hooks the raw data using the mirror driver. After hooking, HRDP compares the previous and current screens in order to detect motion. The raw data is then converted to the YCbCr color space. Each component's specific task is as follows:

Raw Data: Hook the raw-level screen data with a system kernel-level API
Compare Screen: Before applying RFB or MJPEG encoding, compare the screen data with the previous screen data
Save Difference Screen: Save the difference screen data produced by the comparison as a bmp file
Save Current Screen: Save the current raw-level screen data as a bmp image for comparison with the next raw screen data
Raw to YCbCr: For MJPEG encoding, convert the difference screen data to the YCbCr color space
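As an illustration of the Raw to YCbCr step, the sketch below converts 8-bit RGB pixels using the full-range coefficients of the JFIF (JPEG) convention. The paper does not state which conversion matrix HRDP uses, so the coefficients, the function names, and the tuple-based pixel format are assumptions.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit YCbCr (JFIF full-range coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clamp(value):
        # Round to the nearest integer and keep the result in the 8-bit range.
        return max(0, min(255, int(round(value))))

    return clamp(y), clamp(cb), clamp(cr)


def split_planes(pixels):
    """Convert a list of (r, g, b) pixels into separate Y, Cb and Cr planes."""
    y_plane, cb_plane, cr_plane = [], [], []
    for r, g, b in pixels:
        y, cb, cr = rgb_to_ycbcr(r, g, b)
        y_plane.append(y)
        cb_plane.append(cb)
        cr_plane.append(cr)
    return y_plane, cb_plane, cr_plane


if __name__ == "__main__":
    print(split_planes([(255, 255, 255), (0, 0, 0), (255, 0, 0)]))
```

Keeping the luminance (Y) and chrominance (Cb, Cr) planes separate is what allows the two to be encoded independently later on.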

Before applying OpenMP, we divide the work into repetitive subtasks and independent subtasks. For a repetitive subtask, we allocate more threads in order to reduce the task time. For independent subtasks, we allocate one thread to each task so that they run concurrently.
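The two allocation patterns can be sketched in Python as an analogy to an OpenMP parallel loop (repetitive subtask) and OpenMP sections (independent subtasks). This is not the authors' code: the row-band split, the use of `None` to mark unchanged pixels, and all function names are assumptions. In CPython, threads will not speed up pure-Python pixel loops because of the global interpreter lock, so the example shows the structure only.

```python
from concurrent.futures import ThreadPoolExecutor


def diff_rows(prev_rows, curr_rows):
    """Repetitive subtask: compare one band of rows pixel by pixel.

    Unchanged pixels are marked with None, changed pixels keep their new value.
    """
    return [[c if c != p else None for p, c in zip(prow, crow)]
            for prow, crow in zip(prev_rows, curr_rows)]


def parallel_diff(prev, curr, workers=4):
    """Split the frame into row bands and compare the bands concurrently."""
    band = max(1, (len(curr) + workers - 1) // workers)
    starts = range(0, len(curr), band)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda s: diff_rows(prev[s:s + band], curr[s:s + band]), starts)
        return [row for part in parts for row in part]


def run_independent(tasks):
    """Independent subtasks: run unrelated jobs side by side, one thread each."""
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


if __name__ == "__main__":
    prev = [[0, 0], [0, 0], [0, 0]]
    curr = [[0, 1], [0, 0], [2, 0]]
    print(parallel_diff(prev, curr, workers=2))
    # Hypothetical stand-ins for "Save Difference Screen" and "Save Current Screen".
    print(run_independent([lambda: "difference saved", lambda: "current saved"]))
```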

Figure 2. Original HRDP Screen Capturing Module (Left) and changed Screen Capturing Module (Right)

In addition, we use multi-GPU processing in the CUDA environment to further improve performance. JPEG compression is a computing-intensive task that can be slow on current CPUs, and CPU codecs can cause a long delay due to the long compression time. HRDP therefore uses the GPU for MJPEG encoding.

To make use of multiple graphics cards, we analyzed the MJPEG encoding module. Figure 3 shows the original HRDP MJPEG encoding module (left) and the changed encoding module (right). In this module, raw data means the YCbCr data produced by the screen capturing module. YCbCr is composed of a luminance component (Y) and chrominance components (Cb, Cr). In the original HRDP MJPEG encoding module, HRDP encodes the luminance and chrominance components sequentially. We therefore create 2 threads and assign each task to its own graphics card so that the two run independently.
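The two-thread split can be sketched as below. No GPU is used in this example: `encode_on_device` is a hypothetical stand-in that compresses the planes with `zlib` so the code runs anywhere, whereas a real version would select the CUDA device given by `device_id` and launch the encoding kernels on it. The device numbering and the result layout are also assumptions.

```python
import threading
import zlib


def encode_on_device(device_id, planes, results, key):
    """Stand-in for a per-GPU encoder.

    A real implementation would bind this thread to CUDA device `device_id`
    and run the MJPEG stages there; zlib is used here only as a placeholder.
    """
    payload = b"".join(bytes(plane) for plane in planes)
    results[key] = (device_id, zlib.compress(payload))


def encode_frame(y_plane, cb_plane, cr_plane):
    """Encode luminance on device 0 and chrominance on device 1 concurrently."""
    results = {}
    workers = [
        threading.Thread(target=encode_on_device,
                         args=(0, [y_plane], results, "luma")),
        threading.Thread(target=encode_on_device,
                         args=(1, [cb_plane, cr_plane], results, "chroma")),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    encoded = encode_frame([16] * 64, [128] * 64, [128] * 64)
    for name, (device, data) in sorted(encoded.items()):
        print(name, "device", device, len(data), "bytes")
```

Because the chrominance thread handles two planes and the luminance thread one, the split is not necessarily balanced; how the load divides in practice depends on any chroma subsampling, which the text does not specify.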

Figure 3. Original HRDP MJPEG Encoding Module (left) and changed Encoding Module (right)

A mirror driver is a display driver for a virtual device that mirrors the drawing operations of one or more additional physical display devices. A mirror device can specify an arbitrary clip region in the virtual desktop, including one that spans more than one physical display device. GDI then sends the mirror device all drawing operations that intersect that driver's clip region. A mirror device can set a clip region that exactly matches a particular physical device; therefore, it can effectively mirror that device [4].

The original HRDP uses a GDI procedure for screen hooking. Because the mirror driver performs better than the GDI procedure, we modify the screen capturing method to use the mirror driver. Figure 4 shows our proposed HRDP architecture. The bold boxes have been modified or added in the current work.


[Figure 4: block diagram of the server. Labels recoverable from the figure: Kernel: Mirror Driver. Screen Capturing Module (Thread 1): Get Screen, Create Current Frame. Motion Detection Module: Parallel High Motion Analysis, Current Frame, Previous Frame, High Motion Area. GPU Encoding Module (Thread 2), M-CUDA Parallel: RGB to YCbCr, DCT, Quantization, ZigZag, Run Length Encoding, Huffman Encoding, MJPEG Temp Buffer. CPU Encoding Module (Thread 3): RFB Encoding, Encoding Buffer. Frame Sending Module (Thread 4): Establish Connection, Generate String, String to be Sent, Send to Client.]

Figure 4. Proposed HRDP architecture
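Two of the stages named in Figure 4, ZigZag and Run Length Encoding, can be sketched as follows. The zigzag order is the standard JPEG scan order; the run-length format is a simplification chosen for the example (pairs of zero-run length and value, closed by a `(0, 0)` end marker, with no 15-zero run limit and no separate DC handling). It is not taken from the HRDP source.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in JPEG zigzag order."""
    def key(position):
        row, col = position
        diagonal = row + col
        # Odd diagonals run top-right to bottom-left, even ones the other way.
        return (diagonal, row if diagonal % 2 else col)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def zigzag_scan(block):
    """Flatten a square block of quantized coefficients in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_length_encode(coeffs):
    """Encode coefficients as (zeros_before, value) pairs, ending with (0, 0)."""
    pairs, zeros = [], 0
    for value in coeffs:
        if value == 0:
            zeros += 1
        else:
            pairs.append((zeros, value))
            zeros = 0
    pairs.append((0, 0))  # end-of-block marker; trailing zeros are dropped
    return pairs


if __name__ == "__main__":
    block = [[0] * 8 for _ in range(8)]
    block[0][0], block[0][1], block[2][0] = 5, 3, -2
    print(run_length_encode(zigzag_scan(block)))
```

The zigzag scan groups the low-frequency coefficients first, so the zeros produced by quantization tend to form long runs that the run-length and Huffman stages compress well.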

4. Simulation Environment & Results

We ran simulations to compare the performance of the original HRDP with our proposed optimization method. For these experiments, we measured the performance on physical machines with the following hardware environment.

Table 1. Simulation environment

Name            mVDI Server            mVDI Client
CPU             3.30GHz (Quad Core)    2.93GHz (Quad Core)
GPU             GeForce GTX 460        GeForce 9300 GS
OS              Windows
Network         Ethernet
Display Screen  640*480

Figure 5. Application-specific performance evaluation with OpenMP


The results are presented in Figures 5 and 8. As shown in Figure 5, the system latency is reduced using the OpenMP and mirror driver methods. However, our proposed OpenMP method shows a larger improvement on word processor and browser applications (approximately 15%) than on game and video applications (approximately 5%).

Figure 6. Video application capturing time on each frame

Figure 7. Word application capturing time on each frame

We focus on the performance difference between applications. To analyze this difference, we measured the capturing time of each frame. Figures 6 and 7 show the capturing time on each frame for the video and word processor applications, respectively. The line represents the OpenMP result and the striped series represents the original HRDP. In Figure 6, OpenMP performs markedly better up to frame 35. After frame 35, however, the improvement diminishes and the capturing time becomes similar to that of the original HRDP. In Figure 7, the capturing time varies widely from frame to frame, but overall OpenMP has a lower capturing time than the original.


Figure 8. GPU encoding performance evaluation for a video application

Figure 8 shows performance for three simulation types (SLI [11], multi-CUDA, mixed) using a video application. Scalable Link Interface (SLI) is a brand name for a multi-GPU solution developed by NVIDIA for linking two or more graphics cards together to produce a single output. SLI is an application of parallel processing for computer graphics, meant to increase the processing power available for graphics.

In our proposed HRDP case, however, the GPU encoding latency increased (ca. -21.6%); yet the proposed Multi-CUDA reduced latency by approximately 30%. In the SLI and Multi-CUDA mixed cases, performance is better than in the SLI and original cases (ca. 24.8%).

Table 2. Mirror driver results

Method         Screen Size  Running Time (ms)  Sleep Time (ms)
Mirror Driver  1280*1024    385                x
Mirror Driver  1024*768     125                x
Mirror Driver  800*600      47                 x
Mirror Driver  640*480      31                 x
Mirror Driver  640*480      47                 5
GDI            640*480      46                 x

Table 2 shows the GDI and mirror driver hooking times. We measured the mirror driver at several screen sizes and compared it with the GDI procedure. At the same screen size (640*480), the mirror driver is approximately 15 ms faster than GDI hooking (31 ms versus 46 ms). When a 5 ms sleep is added to the mirror driver hooking, its capturing time (47 ms) is similar to that of GDI.
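The comparison at 640*480 can be checked with a few lines of arithmetic on the running times from Table 2. The variable and function names are chosen for this example only.

```python
# Running times in milliseconds, copied from Table 2.
MIRROR_MS = {"1280*1024": 385, "1024*768": 125, "800*600": 47, "640*480": 31}
GDI_MS_640 = 46
MIRROR_WITH_SLEEP_MS_640 = 47  # mirror driver run with a 5 ms sleep


def saved_ms(baseline_ms, candidate_ms):
    """Absolute time saved by the candidate relative to the baseline."""
    return baseline_ms - candidate_ms


def reduction_percent(baseline_ms, candidate_ms):
    """Relative time saved, as a percentage of the baseline, to one decimal."""
    return round(100 * saved_ms(baseline_ms, candidate_ms) / baseline_ms, 1)


if __name__ == "__main__":
    mirror = MIRROR_MS["640*480"]
    print("saved:", saved_ms(GDI_MS_640, mirror), "ms")
    print("reduction:", reduction_percent(GDI_MS_640, mirror), "%")
    print("gap with sleep:", MIRROR_WITH_SLEEP_MS_640 - GDI_MS_640, "ms")
```

At 640*480 this gives a saving of 15 ms, about 32.6% of the GDI time, and a gap of only 1 ms once the sleep is added.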

5. Conclusion

In this paper, we proposed a method for optimizing HRDP mVDI services. We used 3 methods for optimization (OpenMP, Multi-CUDA, and the mirror driver) and analyzed the results of each method.

In future work, we will measure the performance of the integrated system with OpenMP, Multi-CUDA, and the mirror driver. To improve each optimization method, we also need to analyze the performance difference between application types and to study GPU memory usage so that data of an efficient size can be allocated when using multiple GPUs. Furthermore, research on efficient per-user thread allocation is needed, since the mVDI service must serve many users.


6. Acknowledgements

This research was supported by Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (2010-0020725).

7. References

[1] Biao Song, Wei Tang, Eui-Nam Huh, “Novel isolation technology and remote display protocol for mobile thin client computing”, In Proceeding of the ACM ICUIMC 2012, Kuala Lumpur, Malaysia, Feb 2012.

[2] C. G. Lim, S.S Kim, K.I. Kim, J.H. Won, C.J. Park. “Technology trends of cloud computing-based game streaming”, Electronics and Telecommunications Trends. 26 vol 1, pp. 47-56, Feb. 2011.

[3] J.H Kim, I.H. Kim, C.W. Kim, Y.I Eom, “Technology trends of mobile virtualization.”, Korea information science society review, June. 2010.

[4] W.O. Kwon, H.Y. Kim, “Technology trends of high performance VDI protocol.”, NIPA Weekly Technology trends, May. 2012.

[5] K.W. Hong, J.W. Yoon, W. Ryu, “Technology trends of game virtualization.”, NIPA Weekly Technology trends, Oct. 2012.

[6] Cloud computing, http://www.nvidia.co.kr/object/cloud-computing-kr.html, nVIDIA, 2013.

[7] OpenMP, http://openmp.org/wp/, OpenMP, 2013.

[8] CUDA, http://www.nvidia.com/object/cuda_home_new.html, nVIDIA, 2013.

[9] Mirror Driver, http://msdn.microsoft.com/en-us/library/windows/hardware/ff568315.aspx, Microsoft, 2013.

[10] VDI, http://en.wikipedia.org/wiki/Desktop_virtualization#VDI, Wikipedia, 2013.

[11] SLI, http://www.nvidia.co.kr/object/sli-technology-overview-kr.html, nVIDIA, 2013.

[12] M. S. Kim, J. Y. Park, S. J. Lee, J. H. Ku, E. Huh, “VDI performance optimization with OpenMP based on multi core environment thick client system”, In Proceeding of the KICS winter 2013 domestic conference, pp. 324-325, Jan. 2013.

[13] S.Y. Lee, Y.R. Shin, M.S. Kim, A.Y. Son, J.S. Bong, E.N. Huh. “Parallel processing research for VDI optimization on multi GPU environment”, In Proceeding of the KICS winter 2013 domestic conference, pp. 318-319, Jan. 2013.
