
A Measurement Study on Achieving Imperceptible Latency in Mobile Cloud Gaming

Teemu Kämäräinen
Aalto University

[email protected]

Matti Siekkinen
Aalto University

[email protected]

Antti Ylä-Jääski
Aalto University

[email protected]

Wenxiao Zhang
The Hong Kong University of Science and Technology
Hong Kong

[email protected]

Pan Hui
The Hong Kong University of Science and Technology
Hong Kong

[email protected]

ABSTRACT

Cloud gaming is a relatively new paradigm in which the game is rendered in the cloud and is streamed to an end-user device through a thin client. Latency is a key challenge for cloud gaming. In order to optimize the end-to-end latency, it is first necessary to understand how the end-to-end latency builds up from the mobile device to the cloud gaming server. In this paper we dissect the delays occurring in the mobile device and measure access delays in various networks and network conditions. We also perform a Europe-wide latency measurement study to find the optimal server locations and see how the number of server locations affects the network delay. The results are compared to limits found for perceivable delays in recent human-computer interaction studies. We show that the limits can be achieved only with the latest mobile devices with specific control methods. In addition, we study the expected latency reduction by near future technological development and show that its potential impact is bigger on the end-to-end latency than that of replication of the service and server placement optimization.

CCS Concepts

• Information systems → Multimedia streaming; • Networks → Network measurement; Cloud computing;

Keywords

Cloud gaming; latency; network measurements

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

MMSys ’17, June 20–23, 2017, Taipei, Taiwan
© 2017 ACM. ISBN 123-4567-24-567/08/06...$15.00

DOI: 10.475/123 4

1. INTRODUCTION

Cloud gaming is a recently emerged application that combines the concepts of cloud computing and online gaming. It allows the end-user device to offload computation, storage, and the tasks of graphic rendering to the cloud. This reduces hardware requirements for the end-user device, as only a thin client that takes care of video decoding and user interaction is running on the device. The resulting client-side resource requirements are constant regardless of the game being played.

The main challenge for cloud gaming is latency. The user experience when playing certain games is very sensitive to latency. These games become unplayable if the delay between the action, such as tapping a control key to steer a car left, and the response to that action, i.e., observing the car turn left on the display, becomes too high. The latency challenge is amplified with wireless mobile networks, which typically contribute more to the end-to-end latency than fixed wired access network technologies. To cope with the latency challenge, the cloud gaming system must be distributed so that the gaming servers can be brought closer to the user when necessary.

This paper is our first attempt to answer the following question: “Is it possible to implement and deploy a mobile cloud gaming service with which the user cannot perceive the latency imposed by remote computation, and if so, how?” The fact that humans cannot perceive very short latencies motivates our study. In particular, earlier work on human-computer interaction has characterized perceivable latencies in direct and indirect human interaction with a touch screen [11]. In this paper, we study the achievability of latencies below these perceivable latency bounds in mobile cloud gaming using currently available technologies. We also make projections based on technological advances expected in the near future.

We first show a set of measurements to characterize the range of expected latencies in mobile cloud gaming systems. We then study the effect of server deployment in the case of a Europe-wide deployment of a cloud gaming service, where we leverage measurements with a wide range of possible client and server locations from Planetlab [28] and Speedtest [34]. Specifically, we study the expected latencies observed by clients in different geographic locations when distributing servers across different numbers of optimally or near-optimally chosen locations.

The problem of optimal server placement given constraints and an optimization target, such as cost or latency minimization, has been studied earlier with peer-to-peer systems and content distribution networks [36, 29]. Gaming imposes specific latency constraints and workload and, consequently, a few papers on cloud gaming server placement also exist [14, 35]. To the best of our knowledge, our work is the first of its kind in two ways: 1) we focus specifically on mobile cloud gaming and 2) we study the achievability of imperceptible latency in such a system. Hence, our work differs from previous studies that have used latency limits whose effect on user experience has been characterized through Mean Opinion Score (MOS) scale ratings in subjective experiments and that do not correspond to true end-to-end latencies in the vast majority of cases. We argue that imperceptible end-to-end latency is the holy grail that such systems should strive for and, therefore, we feel that this study is timely and important. In addition, while previous work focuses on optimizing existing cloud gaming systems and is constrained by their server and client locations, our study is not tied to any deployed gaming system.

In summary, the contributions of this paper are the following:

• We dissect the end-to-end latency in mobile cloud gaming and study the impact of different factors on the total delay.

• We show that it is possible in mobile cloud gaming systems to achieve latencies so short that users cannot perceive them with current technologies. However, it requires distribution of the game servers, the latest mobile device models, and specific control methods. Specifically, the use of a touch screen adds too much latency with currently available mobile devices and, thus, prohibits reaching short enough latencies.

• We quantify the relative impact of different mobile device, server, and network induced delays and show the impact of server placement optimization on the end-to-end latency in a Europe-wide case study. We also make projections on the impact of near future technological developments on the latency.

2. BACKGROUND

2.1 Mobile Cloud Gaming

Moving games into the cloud enables users to play compute-intensive games on mobile devices like smartphones and tablets, as heavy processing is offloaded to the cloud. A dozen or so companies are offering varying cloud gaming solutions at the moment, while many trials have ended in bankruptcy or discontinuation of the service. Nvidia’s Geforce Now and Sony’s Playstation Now are some of the biggest cloud gaming providers at the moment, with Geforce Now having support for their own mobile device, although their current focus has shifted to console gaming.

So far the most common usage of cloud computing in gaming is to deliver the game software and to manage the user data. Meanwhile, a lot of game providers have already been running multi-player game servers on the cloud. These game servers can handle the state changes of the connected game clients, while leaving the graphic rendering on the game clients. Alternatively, the game servers can take care of graphic rendering as well to keep the game clients as thin as possible, which is the focus of our work too.

Figure 1: Framework of a cloud gaming platform (adapted from [32]).

The main advantages of moving games to the cloud are the far less strict requirements for the client hardware and the possibility to remove almost all architectural limitations. Potentially millions of new devices such as tablets and phones could gain access to games which would normally run only on high-end desktop PCs. Users would no longer need to upgrade their devices frequently to support the latest games, and the lowered hardware and software costs could enable them to play more games. As discussed in [33] and [15], cloud gaming also has the potential to reduce power consumption and production costs and to increase net revenues for the developers, who do not have to develop for several platforms at the same time.

Figure 1 presents the framework of a typical cloud gaming platform. The game is entirely executed and rendered by the cloud gaming server running in the cloud. There are two data flows between the cloud gaming server and the client. The video data flow carries rendered, captured, and encoded images from the server to the game clients, which are then decoded and played back to the user. The control flow is used by the clients to capture the user’s control input and convey it to the cloud gaming server, which replays the input on the host machine.

Similar to the work presented in [15], we term the time difference between a user’s command input and the corresponding in-game action appearing on the screen as the response delay (RD). It is a key factor that affects the quality of user experience while playing cloud-based games. Following the terminology presented in [8], it can be further divided into three components: processing delay (PD), network delay (ND) and playout delay (OD).

The processing delay is the time interval between the server receiving a command from the user and submitting the corresponding video frame to the user. Network delay is the round-trip time (RTT) between the thin client and the cloud gaming server and is equal to the time it takes for a command to traverse the network and a frame to come back to the client. Playout delay is the time it takes for the client to display a frame to the user. It includes frame reception, frame decoding and displaying the decoded frame on the display of the mobile device. In addition, the pipeline also includes control delay (CD), which is the time difference between the user initiating a command (e.g. touching the screen, pressing a button on a gamepad) and the device registering the input. Playout delay and control delay form the device delay, which together with network and processing delays creates the end-to-end pipeline.
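The decomposition described above amounts to a simple sum: end-to-end delay = CD + ND + PD + OD, with device delay = CD + OD. The following minimal Python sketch illustrates this bookkeeping. The class name, field names and the numeric values are assumptions made for illustration; they are not measurements or identifiers from this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseDelay:
    """Delay components in milliseconds (names are illustrative)."""
    control_ms: float     # CD: user action until the device registers the input
    network_ms: float     # ND: round-trip time between thin client and server
    processing_ms: float  # PD: server receives command until frame is submitted
    playout_ms: float     # OD: frame reception, decoding and display on the client

    @property
    def device_ms(self) -> float:
        # Device delay = control delay + playout delay
        return self.control_ms + self.playout_ms

    @property
    def total_ms(self) -> float:
        # End-to-end delay = CD + ND + PD + OD
        return self.device_ms + self.network_ms + self.processing_ms


# Placeholder values, not measurements from the paper.
example = ResponseDelay(control_ms=5.0, network_ms=10.0,
                        processing_ms=17.0, playout_ms=45.0)
print(example.device_ms, example.total_ms)
```

Keeping the four components separate makes it easy to see which part of the pipeline dominates in a given scenario.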

2.2 Related Work

While deep understanding of the impact of latency on user experience is still an open problem, many discoveries have been made about human perception of latency with mobile technology. Deber et al. recently characterized the Just Noticeable Difference (JND) threshold to a reference latency of 0.98 ms, i.e. virtually zero latency, through psychophysical experiments based on the adaptive staircase procedure [11]. They also investigated the impact of additional latency on task performance in direct and indirect user interaction with a touch device. They found that the mean JND threshold for a simple tapping task is 69 ms and 96 ms for direct and indirect touch, respectively. The threshold is substantially shorter when performing a dragging task. Lee et al. studied error rates in pointing tasks where a target is about to appear within a limited time window for selection [21].

The impact of network latency on the quality of gaming experience has been discussed in several studies. For example, Pantel et al. [27] proposed that the delay should not be longer than 100 ms based on the measurement of two racing games. On the other hand, Lee et al. [24] show that the impact of latency on quality of experience depends on the game type because there are notable differences between the amount of screen changes in response to a player’s commands. However, these studies examined only the impact of network delay on traditional multiplayer servers where the delay can often be compensated. Jarschel et al. have researched the perceived QoE of users in different network conditions specifically in cloud gaming [18]. They concluded that in fast-paced games the delay component becomes the dominant metric affecting the QoE. Unfortunately, the base delay of the system was not measured in the study. This is why the latency limits found cannot be directly used to set boundaries for acceptable end-to-end latencies.

Deploying cloud services in a geographically distributed manner has also been discussed by, for example, Zhang et al. [37], who focused on the optimization algorithms that determine where to deploy the services with minimum cost and potentially optimal quality of service. In addition, Hong et al. have studied the efficient consolidation of multiple cloud gaming servers on a physical machine [13]. In this paper we show how the network latency is affected as more server locations are added to the system. Satyanarayanan et al. have presented Cloudlets that offer computing power for mobile clients within one-hop latency [31]. The solution proposed by Choy et al. is to use the existing CDN network to offload computation from the mobile device [9]. Processing delay on the server side as well as on the client side occurs both in software and in hardware. Jain et al. focus on balancing the network and the computational delay with accuracy in mobile AR [17]. Lee et al. developed a system to speculatively execute different possible scenarios of a cloud game in order to mask latency [22]. This, however, requires support from the game engine and results in more overhead in the transmission. Boos et al. applied the same approach to VR [4]. Their system aggressively precomputes and caches all possible images that a VR user might encounter in order to achieve low latency and energy consumption.

Concerning latency measurements, we performed a dissection of the end-to-end latency particularly on the mobile device side in our earlier work [2]. Related studies have mainly used timing hooks injected into the code or a high-speed camera [7, 16, 6]. Our method includes a modified and extended version of the WALT Latency Timer [20] together with code injections allowing a full dissection of the delay pipeline. Similar measurement setups have been previously utilized successfully in measuring mobile phone display responsiveness [3, 10]. Our approach can also measure gyro, gamepad and Bluetooth delay on top of the traditional touch latency measurements. A predictive approach has also been proposed by Cattan et al. but it requires separate calibration [5]. We summarize the latency measurement results in Section 3 and use the found base latencies in evaluating the present and future potential of providing a completely unnoticeable delay in the case of mobile cloud gaming. However, the latency study for the wired segment portion of the end-to-end latency can also be utilized in designing a non-mobile cloud gaming service.

Figure 2: The complete end-to-end response delay pipeline in cloud gaming with the used terminology.

3. LATENCY MEASUREMENTS

In order to understand whether imperceptible latency is achievable, we first need to understand the end-to-end latency resulting from the current technology used in mobile cloud gaming. The complete end-to-end pipeline of the total delay together with the used terminology is presented in Figure 2. In this section we summarize our mobile device latency measurements from our previous work [2], measure the effect of different access networks and network conditions on the latency, and confirm server delay results from previous studies with our own experiments. In Section 4, we continue the analysis by performing a server placement optimization study based on the wired segment part of the network delay.

3.1 Mobile device

For analyzing the mobile device induced delays we run a number of test cases in which we vary the parameters listed in Table 1 to find the lower bounds for the delays present in different control scenarios. The mobile devices used for the measurements are Samsung S4 and Samsung S7. The devices are equipped with the newest Android versions available for each device at the time of writing, which are Android 5.0 for the S4 and Android 6.0 for the S7. We control the device either by using the touch screen or an external gamepad connected to the mobile device with either a USB or a Bluetooth connection. We also measure the delay when the game is controlled using the embedded gyroscope of the mobile device. The gaming platform consists of a remote server and an Android mobile client device, both of which run the GamingAnywhere software [15].

Table 1: Parameters varied in the different test cases.

Control         Touch / Gamepad (USB/BLE) / Gyro
Mobile phone    Samsung S4 / Samsung S7

Figure 3: Touch, gamepad, Ethernet and screen delay measurement setup.

3.1.1 Control delay

We define control delay in the mobile device to be the delay between the user initiating a control input and the control command being sent through the radio interface of the mobile device. The most common interface for user commands is a virtual gamepad using the touch screen of the mobile device. However, an external or embedded USB-connected gamepad can also be used in mobile cloud gaming. The gamepad can also be attached via a wireless Bluetooth connection. Additionally, the gyro sensor inside the mobile device can be used for controlling certain games. Each control method introduces a different amount of delay to the pipeline.

In our previous work we have measured the control delay of the input options mentioned previously using an Arduino-compatible measurement device depicted in Figures 3 and 4. The device is based on the WALT [20] project. The device is built around the Teensy 3.2 USB development board. The device can trigger a touch input on the mobile phone using a coin attached to a relay. The device can also act as a gamepad inputting control commands through the USB interface. The two attached photodiodes can detect the change in illumination of a marked frame on the display of the mobile device. This information together with time synchronization with the mobile device is used to calculate the base latencies of the mobile devices. The time error between the devices is minimized using the USB connection from the Teensy board to the mobile phone before each experiment. The setup is capable of synchronizing the clocks within 100 µs accuracy.

Figure 4: Gyro and Bluetooth delay measurement setup.

Table 2: Control delay measurement results.

                            Samsung S4        Samsung S7
                            Avg.    SD        Avg.    SD
Touch to kernel (ms)        40.5    2.3       24.1    3.0
Gamepad to kernel (ms)       0.6    0.6        0.2    0.4
Kernel to callback (ms)      5.5    1.6        3.4    0.6
Callback to radio (ms)       9.1    2.7        1.6    0.8

In addition, we measured the delay of the embedded gyroscope in the mobile device by comparing it to raw readings from a gyro attached to the measurement device. Both the reference gyro and the mobile phone were attached to a metal plate. When moved, both create a similar acceleration trace. By comparing the start and end times of movements, we could measure how much the gyro readings available to the mobile application are delayed. The measurement device was also equipped with a BLE (Bluetooth Low Energy) chip, allowing us to measure the delay in receiving commands through a Bluetooth connection.
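One way to compare the start times of a movement in two traces is to detect the first sample whose magnitude crosses a threshold in each trace and take the difference. The Python sketch below illustrates that idea. The threshold-based onset detection, the function names and the sample values are assumptions made for illustration; the paper does not specify how movement start and end times were extracted. Both traces are assumed to be timestamped against a common, synchronized clock.

```python
def movement_onset_ms(trace, threshold):
    """Return the timestamp of the first sample whose magnitude reaches the threshold.

    trace: list of (timestamp_ms, reading) pairs ordered by time.
    Returns None if the threshold is never reached.
    """
    for timestamp_ms, reading in trace:
        if abs(reading) >= threshold:
            return timestamp_ms
    return None


def gyro_delay_ms(reference_trace, phone_trace, threshold):
    """Delay of the phone's gyro readings relative to the reference gyro."""
    reference_onset = movement_onset_ms(reference_trace, threshold)
    phone_onset = movement_onset_ms(phone_trace, threshold)
    if reference_onset is None or phone_onset is None:
        raise ValueError("no movement detected in one of the traces")
    return phone_onset - reference_onset


# Hypothetical traces, for illustration only.
reference = [(0, 0.0), (5, 0.1), (10, 1.2), (15, 1.5)]
phone = [(0, 0.0), (10, 0.0), (20, 0.3), (25, 1.1)]
print(gyro_delay_ms(reference, phone, threshold=1.0))
```

In practice the result would be averaged over many movements, since a single onset comparison is sensitive to sensor noise and sampling intervals.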

The average delays and standard deviations are summarized in Table 2. The results show the delays from the user input to the kernel registering the event and the delay from the kernel event to the application code (callback). We also measured how long it takes to prepare a single control event packet and send it through the radio interface. The touch screen has the slowest response on both tested devices. The delay, however, varies significantly between the devices. The newer Samsung S7 averages 29.1 ms total control delay while the older Samsung S4 takes 55.1 ms on average for processing the touch input. A USB-connected gamepad is the fastest input method on both devices, with total control delays ranging from 5.2 to 15.2 ms.

Table 3: Mobile device gyro sensor and Bluetooth delay.

                     Samsung S4        Samsung S7
                     Avg.    SD        Avg.    SD
Gyro delay (ms)      78.8    32.8      12.2    4.1
BLE delay (ms)       17.5    5.2       22.0    4.9

Table 4: Frame receive, decode and display measurement results.

                        Samsung S4        Samsung S7
                        Avg.    SD        Avg.    SD
Frame receive (ms)      10.5    5.8        9.6    4.5
Frame decode (ms)       20.4    11.6       8.3    1.1
Frame display (ms)      25.1    5.4       27.3    4.7
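The total control delays quoted for touch and USB gamepad input are the sums of the three stages listed in Table 2 (input to kernel, kernel to callback, callback to radio). The short Python sketch below reproduces that arithmetic from the table averages; the dictionary layout and function name are chosen here for illustration.

```python
# Average delays in ms, copied from Table 2.
TABLE_2 = {
    "Samsung S4": {"touch_to_kernel": 40.5, "gamepad_to_kernel": 0.6,
                   "kernel_to_callback": 5.5, "callback_to_radio": 9.1},
    "Samsung S7": {"touch_to_kernel": 24.1, "gamepad_to_kernel": 0.2,
                   "kernel_to_callback": 3.4, "callback_to_radio": 1.6},
}


def control_delay_ms(phone, input_stage):
    """Sum input-to-kernel, kernel-to-callback and callback-to-radio delays."""
    row = TABLE_2[phone]
    total = row[input_stage] + row["kernel_to_callback"] + row["callback_to_radio"]
    return round(total, 1)


for phone in TABLE_2:
    print(phone,
          control_delay_ms(phone, "touch_to_kernel"),
          control_delay_ms(phone, "gamepad_to_kernel"))
```

The sums give 55.1 ms and 29.1 ms for touch and 15.2 ms and 5.2 ms for the USB gamepad on the S4 and S7, respectively, matching the figures in the text.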

For the Bluetooth and gyro delays, we were only able to measure the complete delay from input to the application callback. The results are shown in Table 3. Bluetooth (BLE) response varies between 17.5 ms and 28.2 ms on the tested devices. Gyro response is fast, around 10 ms for the newer devices. The gyro sensor on the Samsung S4, however, seems to be significantly slower with a delay of 79 ms.

In the cloud gaming use case the controls are sent directly to the cloud gaming server. We measured this delay to be 9.1 ms on the Samsung S4 and 1.6 ms on the Samsung S7. Overall the results show that the control delay decreases with the more recent mobile device and Android operating system version.

3.1.2 Playout delay

In order to measure the total playout delay (OD) perceived by the player, we slightly modified the GamingAnywhere Android client by injecting timestamps into different parts of the code. In this way, we could also log and analyze the breakdown of the total OD into delay caused by frame reception and frame decoding. We measured the timestamping code to add under 1 ms of additional latency to the system. The measurement is repeated for several processed frames, giving us the possibility to calculate the averages over a time period.
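The breakdown described above can be derived from per-frame timestamps by subtracting consecutive stage timestamps and averaging over frames. The following Python sketch shows the calculation. The field names and the sample timestamps are hypothetical; they are not identifiers from the GamingAnywhere client or values from the measurements.

```python
from statistics import mean


def playout_breakdown(frames):
    """Average per-stage delays from per-frame timestamps (all in ms).

    frames: list of dicts with the hypothetical keys
      first_packet_ms   - first network packet of the frame received
      frame_complete_ms - last packet received, frame handed to the decoder
      decoded_ms        - decoder returned the decoded frame
    """
    return {
        "receive_ms": mean(f["frame_complete_ms"] - f["first_packet_ms"]
                           for f in frames),
        "decode_ms": mean(f["decoded_ms"] - f["frame_complete_ms"]
                          for f in frames),
    }


# Hypothetical timestamps, for illustration only.
log = [
    {"first_packet_ms": 0.0, "frame_complete_ms": 10.0, "decoded_ms": 18.0},
    {"first_packet_ms": 100.0, "frame_complete_ms": 112.0, "decoded_ms": 120.0},
]
print(playout_breakdown(log))
```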

Frame receive and decode: The cloud gaming client receives a single frame in multiple network packets and the entire frame is then decoded using the hardware accelerated decoders. Using the GamingAnywhere client, we measured both delays with 1080p resolution. The results are presented in Table 4. The results show that the delay of frame reception is not dependent on the device. However, frame decoding is substantially faster with the newer Samsung S7, which decodes a single frame in roughly 8 ms compared to the 20 ms of the Samsung S4.

Frame display: After decoding, the frame is sent to the display buffer. We define frame display delay as the time between the frame returning from the video decoder and the frame being displayed on the screen of the mobile device. Our previous measurements have shown that this final delay is a substantial addition to the overall base latency. Table 4 includes the frame display times measured for each of the mobile phones. The frame display time does not vary substantially between the tested devices. This is because all phones have similar displays with 60 Hz refresh rates, which translates to the screen updating roughly every 17 ms. The results, however, show an average of 1.5 display refresh periods from decoder output to the frame being visible on the screen. This is because Android uses double buffering to avoid screen tearing at the expense of display delay.
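The relation between the refresh rate and the measured display delay can be checked with a few lines of arithmetic. The sketch below converts the Table 4 frame display averages into multiples of the 60 Hz refresh period; the constant and function names are chosen here for illustration.

```python
REFRESH_RATE_HZ = 60.0

# Average frame display delays in ms, copied from Table 4.
FRAME_DISPLAY_MS = {"Samsung S4": 25.1, "Samsung S7": 27.3}


def refresh_period_ms(rate_hz=REFRESH_RATE_HZ):
    """Time between two screen updates in milliseconds."""
    return 1000.0 / rate_hz


def refresh_periods(display_delay_ms, rate_hz=REFRESH_RATE_HZ):
    """Express a display delay as a number of refresh periods."""
    return display_delay_ms / refresh_period_ms(rate_hz)


print(round(refresh_period_ms(), 1))
for phone, delay_ms in FRAME_DISPLAY_MS.items():
    print(phone, round(refresh_periods(delay_ms), 2))
```

With a refresh period of about 16.7 ms, the measured averages correspond to roughly 1.5 periods on the S4 and 1.6 periods on the S7.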

3.2 Network delay

The cloud gaming client is connected to the Internet using either a WiFi or a dedicated campus LTE network which was very lightly loaded. The deployment includes two cases. When using WiFi access, we deploy the gaming platform in the same local network as the mobile device. When using LTE access, the gaming platform is deployed behind a fibre connection from the Internet Service Provider (ISP), which in practice provides similar latency as if it were deployed within the ISP’s network. We measured a difference of 1-2 ms between the latency from our mobile device to our gaming platform server and the latency to the first pingable IP address behind the packet core of the LTE network.

3.2.1 Access delay

Access delay is the latency between the mobile device and the first pingable IP address. In the Wi-Fi use case this is the access point and in the case of mobile networks it is a machine just behind the packet core. Regardless of the distance between the cloud gaming client and the server, access delay must be accounted for in the base latency of the system. We connect the tested mobile devices both to a dedicated campus LTE network and to a local network through a Wi-Fi router. The LTE network is a full-fledged network setup with a maximum download speed of 70 Mbps (category 3) with practically no load. The average delay for the LTE scenario was 12 ms. The Wi-Fi network was faster in optimal conditions with an average delay of only 1.6 ms.

3.2.2 Impact of Access Network Conditions on Latency and Throughput

In the optimal test cases the LTE network has virtually no load and the nearest base station is located close by, within the same building as the test device. In reality the signal strength of the connection might vary drastically between geographical locations, which affects the response delay. We next look at the effect of signal strength and cross-traffic on the delay of the LTE scenario.

We measure the effect of signal strength on the network delay by logging the LTE Reference Signal Received Power (RSRP) and the latency to the closest pingable IP address while walking toward and away from the base station. RSRP is defined as the linear average over the power contributions of the resource elements that carry cell-specific reference signals within the measurement frequency bandwidth [1]. Figure 5 highlights the effect of signal strength on the delay.

Figure 5: The effect of signal strength on delay measured to the first pingable IP address in the LTE network. Axes: RSRP (dBm) from −60 to −120, network delay (ms) from 0 to 200; curves: median, 25% percentile, 75% percentile.

We observe three distinct patterns in the delay. The delay is really stable from -50 to just under -70 dBm. After that the fluctuation of the delay becomes more noticeable up to a signal strength of -100 dBm. Starting from just under -100 dBm, the delay starts to fluctuate more heavily and the average delay doubles quickly. At -100 dBm the delay has already doubled and around -110 dBm the average delay is four times the delay in the optimal conditions and continues to grow exponentially. We used a packet length of 1700 bytes in the experiments, which we measured to be the average packet length of the RTP traffic used between the cloud gaming client and server. A total of 3000 latency/RSRP pairs were measured. The packet loss in the link layer is the main contributor to the delay in the worst signal strength conditions. Such low signal strength values are, however, rarely observed in commercial networks with decent network coverage. In our further analysis of the delay results in Section 4 we presume a network with good coverage and thus do not take into account the increased network delays in extreme conditions.
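A plot of the median and quartiles of delay against signal strength, as in Figure 5, can be produced by grouping the latency/RSRP pairs into RSRP bins and computing quartiles per bin. The Python sketch below illustrates one such aggregation. The 10 dBm bin width, the function name and the sample values are assumptions for illustration; the paper does not state how the pairs were binned.

```python
import math
from collections import defaultdict
from statistics import quantiles


def delay_quartiles_by_rsrp(samples, bin_width_dbm=10):
    """Group (rsrp_dbm, delay_ms) pairs into RSRP bins and compute quartiles.

    Returns {bin_lower_edge_dbm: (p25, median, p75)}. A sample at -63 dBm
    with a 10 dBm bin width lands in the bin whose lower edge is -70.
    """
    bins = defaultdict(list)
    for rsrp_dbm, delay_ms in samples:
        lower_edge = math.floor(rsrp_dbm / bin_width_dbm) * bin_width_dbm
        bins[lower_edge].append(delay_ms)

    result = {}
    for lower_edge, delays in bins.items():
        if len(delays) < 2:
            continue  # too few samples for meaningful quartiles
        p25, median, p75 = quantiles(delays, n=4, method="inclusive")
        result[lower_edge] = (p25, median, p75)
    return result


# Hypothetical latency/RSRP pairs, for illustration only.
pairs = [(-63, 12.0), (-65, 14.0), (-68, 13.0),
         (-105, 40.0), (-108, 60.0), (-101, 50.0),
         (-95, 20.0)]
print(delay_quartiles_by_rsrp(pairs))
```

Reporting quartiles rather than only the mean keeps the increasing fluctuation at low signal strength visible.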

The traffic load in the access network also directly affects the delay perceived by the user, as the user’s mobile phone shares the available resources with others. We were able to create a notable difference in the delay with three other mobile devices connected to the base station. We uploaded random data with Iperf to a server with one to three LTE category 4 mobile phones while measuring the latency to the nearest pingable instance with an LTE category 3 mobile phone. Using only one or two devices to create traffic was not enough, but with three devices sending traffic uplink at their maximum rates, the test device’s latency started to fluctuate. Figure 6 shows the fluctuation with one to three mobile phones creating uplink traffic. We did not observe similar results with downlink traffic. However, increasing the number of users generating traffic should also worsen the delays observed in the downlink case. Such uplink scenarios are, however, rare with current mobile applications. The majority of the traffic is downstream, with the exception of some video broadcasting applications. Even those, however, do not consume the entire uplink bandwidth.
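Figure 6 plots a moving average of the measured delay, which smooths individual ping samples so that the onset of fluctuation is easier to see. A minimal Python sketch of a trailing moving average follows. The window size is a free parameter chosen by the reader; the paper does not state the window that was used, and the sample values are hypothetical.

```python
from collections import deque


def moving_average(samples, window):
    """Trailing moving average over the last `window` samples.

    For the first samples, where fewer than `window` values are available,
    the average is taken over the values seen so far.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)
    running_sum = 0.0
    averages = []
    for value in samples:
        if len(recent) == window:
            running_sum -= recent[0]  # oldest value is about to be dropped
        recent.append(value)
        running_sum += value
        averages.append(running_sum / len(recent))
    return averages


# Hypothetical delay samples in ms, for illustration only.
print(moving_average([10, 20, 30, 40], window=2))
```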

Figure 6: The effect of uplink traffic on the moving average of delay measured to the first pingable instance in an LTE network. Axes: time (s) from 0 to 300, network delay (ms) from 0 to 150; series: no cross-traffic, 1 user max. UL, 2 users max. UL, 3 users max. UL.

3.2.3 Wide Area Network (WAN) delay

WAN delay depends on the location of the cloud gaming server and the distance between the user and the server. We conducted a ping latency study between a large number of hosts within Europe in order to characterize the latency contribution of the wired segment of the network path. We used the Planetlab Europe testbed to measure the latency from 79 Planetlab servers to 1622 Speedtest.net server locations. Both the Planetlab and the Speedtest.net servers are geographically distributed across Europe. The servers included in the study are displayed in Figure 7.

Next, we divided the European continent into smaller geographical areas based on the Nomenclature of Territorial Units for Statistics (NUTS) classification of areas provided by Eurostat [12]. NUTS is a hierarchical system for dividing Europe into geographic territories for regional statistics analysis. We chose the smallest (level 3) areas for consideration and calculated the mean latencies to each area from the 79 Planetlab servers.
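The per-area aggregation can be sketched as a group-by over the ping results: each measurement is keyed by its source server and by the NUTS level 3 area of its target, and the round-trip times in each group are averaged. The Python sketch below assumes that reading of the procedure; the function name, host names, area codes and RTT values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_latency_per_area(measurements):
    """Average round-trip times per (source host, area).

    measurements: iterable of (source_host, area_code, rtt_ms) tuples, where
    area_code identifies the NUTS level 3 area of the target server.
    Returns {source_host: {area_code: mean_rtt_ms}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for source_host, area_code, rtt_ms in measurements:
        grouped[source_host][area_code].append(rtt_ms)
    return {
        host: {area: mean(rtts) for area, rtts in areas.items()}
        for host, areas in grouped.items()
    }


# Hypothetical hosts, area codes and RTTs, for illustration only.
pings = [
    ("host-a", "AREA1", 20.0),
    ("host-a", "AREA1", 30.0),
    ("host-a", "AREA2", 40.0),
    ("host-b", "AREA1", 10.0),
]
area_means = mean_latency_per_area(pings)
print(area_means["host-a"]["AREA1"])
```

The resulting host-by-area latency matrix is the kind of input a server placement study needs, since it gives the expected latency from each candidate server location to each client area.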

We analyze the delay benefits of distributing the cloud gaming architecture based on the gathered latency data in Section 4.

3.3 Processing delay

The cloud gaming server captures the game scene, encodes it into a video stream and sends it to the cloud gaming client. Simultaneously the cloud gaming server receives input commands from the client and feeds them to the game through the operating system. The processing delay depends on the specific hardware and software used for the cloud gaming server. Huang et al. [15] benchmarked the open-source cloud gaming software GamingAnywhere against commercial alternatives available at that time on a desktop PC. GamingAnywhere had the lowest delay, ranging between 27 and 34 ms depending on the game.

This delay can, however, be shortened by using purpose-built features of modern graphics cards. Nvidia’s solution is to capture the framebuffer on the fly and hand it to the built-in NVENC chip, which is capable of encoding the graphics output in real time. These features are available to cloud gaming providers through the Capture SDK [25] and are supported on selected graphics cards. We launched an Amazon G2 instance with an Nvidia GRID K520 graphics card and used timing hooks injected into the Capture SDK to measure the time from starting a frame capture to an encoded frame being ready for transport. We ran the Unigine Heaven graphics benchmark on the virtual machine while capturing the video stream using the SDK. The average delay for a single frame was 17 ms. As the purpose of this paper is to analyze the achievability of imperceptible latency in mobile cloud gaming in optimal conditions, we choose 17 ms as the reference delay for the cloud gaming server.

Figure 7: PlanetLab servers (red) and Speedtest.net servers (blue) used in the latency study.

3.4 Summary of different scenarios

The delay measurement results are summarized in Table 5. The results include all delays of mobile cloud gaming except the wired segment of the network path. It is clear that the newer and more powerful Samsung Galaxy S7 is faster in all scenarios with an average of 25% or 37 ms improvement over all the test cases. This improvement stems mainly from a faster touch interface, faster video decoding capabilities and a clearly faster gyroscope. In optimal conditions using WiFi can save approximately 10 ms on the total delay as the access delay is significantly shorter. This improvement can be even greater depending on the network conditions.

The control method has a significant effect on the total delay, as can be seen from the results. The most commonly used input method, the touch screen of the mobile device, has quite a large impact on the total delay. There is, however, a substantial improvement in the newer S7, which delivers a touch input to the application code in 27.5 ms on average, compared to 46 ms on the older S4. A USB-connected controller is the fastest method of receiving input commands into the application code, adding only a couple of milliseconds to the delay. The gyroscope on the S4 is clearly not suitable for delay-sensitive applications, while the gyroscope on the S7 adds only 12 ms of delay to the pipeline. This shows that certain games with continuous movements, such as car games, could benefit from using the gyroscope of the mobile device instead of a virtual gamepad on the touch screen.

Table 5: Delay measurement results (ms) for all scenarios without the wired segment of the network delay.

Mobile phone  Network type  Input type  Device delay  Server delay  Access delay  Total
S4            WiFi          Touch       110           17            1.6           128.6
S4            WiFi          USB         70.4          17            1.6           89
S4            WiFi          BLE         86.9          17            1.6           105.5
S4            WiFi          Gyro        148.2         17            1.6           166.8
S4            LTE           Touch       110           17            12            139
S4            LTE           USB         70.4          17            12            99.4
S4            LTE           BLE         86.9          17            12            115.9
S4            LTE           Gyro        148.2         17            12            177.2
S7            WiFi          Touch       77.5          17            1.6           96.1
S7            WiFi          USB         51.5          17            1.6           70.1
S7            WiFi          BLE         72.5          17            1.6           91.1
S7            WiFi          Gyro        62.7          17            1.6           81.3
S7            LTE           Touch       77.5          17            12            106.5
S7            LTE           USB         51.5          17            12            80.5
S7            LTE           BLE         72.5          17            12            101.5
S7            LTE           Gyro        62.7          17            12            91.7

The Bluetooth controller on the S7 has approximately the same delay as the touch screen. However, on the S4 the delay of the Bluetooth input itself is actually 5 ms shorter than on the S7. This can be explained by the different Bluetooth configurations in the two versions of the Android operating system: the newer Android 6.0 accepts slightly higher minimum intervals for updates on the BLE connection than the older Android 5.0.

The different delay scenarios are visualized in Figure 8 without the wired segment of the network path. The JND threshold for imperceptible delay for indirect touch (96 ms) is plotted as a dashed line in the figure. With the Samsung S4 phone, the only scenario that leaves any room for the delay induced by the wired segment of the network path is WiFi with USB input. The same combination is also the best for the S7; however, WiFi with Bluetooth and WiFi with gyro input also fall under the threshold for perceivable delay. Using LTE on the S7, only the USB and gyro control options fall under the limit.
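The room left for the wired segment in each scenario follows directly from Table 5 and the 96 ms threshold. The following Python sketch illustrates that arithmetic; the function and variable names are illustrative assumptions and are not part of the measurement tooling used in the study.

```python
# Values copied from Table 5 and the text; all names here are illustrative.
JND_THRESHOLD_MS = 96.0   # JND threshold for indirect touch
SERVER_DELAY_MS = 17.0    # reference server delay from Section 3
ACCESS_DELAY_MS = {"WiFi": 1.6, "LTE": 12.0}
DEVICE_DELAY_MS = {
    "S4": {"Touch": 110.0, "USB": 70.4, "BLE": 86.9, "Gyro": 148.2},
    "S7": {"Touch": 77.5, "USB": 51.5, "BLE": 72.5, "Gyro": 62.7},
}


def wired_budget_ms(phone, network, input_type):
    """Delay left for the wired network segment before the threshold is hit."""
    total = (DEVICE_DELAY_MS[phone][input_type]
             + SERVER_DELAY_MS
             + ACCESS_DELAY_MS[network])
    return JND_THRESHOLD_MS - total


def feasible_scenarios():
    """Scenarios whose total delay is below the threshold, with their budget."""
    result = {}
    for phone, inputs in DEVICE_DELAY_MS.items():
        for network in ACCESS_DELAY_MS:
            for input_type in inputs:
                budget = wired_budget_ms(phone, network, input_type)
                if budget > 0:
                    result[(phone, network, input_type)] = round(budget, 1)
    return result


if __name__ == "__main__":
    for scenario, budget in sorted(feasible_scenarios().items()):
        print(scenario, budget)
```

A negative budget means the scenario already exceeds the threshold before any wired network delay is added.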

4. OPTIMIZING SERVER PLACEMENT: CASE EUROPE

In this section we measure the delay in the wired network path of the cloud gaming pipeline. This is the last piece missing from the total delay. We conduct a latency study in Europe and analyze how much the delay can be shortened by increasing the number of data center locations. The results are compared to the limits found for the different scenarios measured in Section 3. The placements found for the data center locations are also compared to existing and future Amazon EC2 data center locations.

4.1 Optimizing data center location count

The 79 PlanetLab servers depicted in Figure 7 were chosen as the candidates for the locations of the data centers for the cloud gaming provider. The Eurostat statistics used also provide population information for each area with the Speedtest.net servers for which we studied the network latency from the PlanetLab server locations. This gave us a database of geographical regions in Europe with latency data to the selected cloud gaming server locations, together with an estimate of the relative potential user base in each region.

Figure 8: Response delay in different scenarios without the wired segment of the network path. The JND threshold for imperceptible delay (96 ms) is plotted with a dashed line. (Plot labels: y-axis Latency (ms), 0 to 180; groups S4 WiFi, S4 LTE, S7 WiFi, S7 LTE; bars Touch, USB, BLE, Gyro; legend Device delay, Server delay, Access delay.)

We utilized a greedy algorithm to rank the servers based on their coverage of potential users. The greedy algorithm has previously been shown to achieve near-optimal performance [29]. It approximates the globally optimal solution by making the locally optimal decision at each stage. The algorithm works as follows. In the first step, we choose the single server location that leads to the lowest average delay for all clients, under the assumption that all clients connect to that server. We then iteratively add one server location at a time, choosing the location that, in conjunction with the previously chosen servers, leads to the best average delay. We assume that each client connects to the closest server location. We add server locations to the system one by one until the latency benefit becomes negligible. We were also able to calculate the first four optimal data center locations with brute force, and the results were identical to those of the greedy algorithm.
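The greedy procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the latency matrix and population weights are toy data, the average is assumed to be population-weighted, and the names `greedy_placement` and `min_gain_ms` are hypothetical.

```python
def greedy_placement(latency, population, max_locations, min_gain_ms=0.0):
    """Pick server locations one by one, each time minimizing the
    population-weighted average latency (assumed objective).

    latency[c][r] is the latency in ms from candidate c to region r.
    Returns the chosen candidate indices in priority order and the
    resulting average latency.
    """
    total_pop = sum(population)
    chosen = []
    # Best latency seen so far for each region, given the chosen servers.
    best_per_region = [float("inf")] * len(population)
    prev_avg = float("inf")

    while len(chosen) < max_locations:
        best_candidate, best_avg = None, float("inf")
        for c, row in enumerate(latency):
            if c in chosen:
                continue
            # Each region connects to its lowest-latency chosen server.
            avg = sum(p * min(b, l)
                      for p, b, l in zip(population, best_per_region, row)) / total_pop
            if avg < best_avg:
                best_candidate, best_avg = c, avg
        # Stop when no candidate is left or the benefit is negligible.
        if best_candidate is None or prev_avg - best_avg <= min_gain_ms:
            break
        chosen.append(best_candidate)
        best_per_region = [min(b, l)
                           for b, l in zip(best_per_region, latency[best_candidate])]
        prev_avg = best_avg
    return chosen, prev_avg


# Toy example: 3 candidate locations, 4 equally populated regions.
TOY_LATENCY = [
    [10, 20, 40, 50],
    [40, 30, 10, 15],
    [25, 25, 25, 25],
]
TOY_POPULATION = [1, 1, 1, 1]

if __name__ == "__main__":
    print(greedy_placement(TOY_LATENCY, TOY_POPULATION, max_locations=2))
```

In the toy data, candidate 1 gives the lowest single-location average, and candidate 0 is the best complement to it.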

Figure 9 shows how the latencies decrease as more data center locations are added to the system. The results show that the 90th-percentile latency drops by roughly 20 ms when going from one data center location to ten. Beyond that, each new data center location barely affects the latency distribution at all.

The results show that server locations beyond ten do not significantly shorten the average delay experienced by the service users. The locations chosen by the greedy algorithm are shown in Figure 10. The number before the city name shows the priority of the location if fewer than 10 locations are to be chosen. As expected, the central regions of Europe are more densely covered by the chosen locations, as the population density is higher in these regions. However, as we did not impose capacity limits on the data centers, the first four locations are quite sparsely located to optimize the latency for all parts of the continent.

Figure 9: Latency distribution with different location counts. (Plot labels: x-axis Latency (ms), 0 to 70; y-axis CDF; one curve per location count from 1 to 10; limit lines for S7/LTE/Gyro, S7/WiFi/Bluetooth, S4/WiFi/Gamepad, S7/WiFi/Gyro, S7/LTE/Gamepad and S7/WiFi/Gamepad.)

Using the data from Section 3 and the 96 ms JND threshold, we can calculate the limit of perceivable delay for the wired network segment in the different scenarios. The limits are drawn in Figure 9 with dashed lines. We notice that with the newer Samsung S7 phone using WiFi and USB input, we can reach latencies below the limit for approximately half of the population using one data center location. The coverage can be raised to 60% with 3 data center locations and to 80% with 10 locations. With the same device, gyro input over WiFi and USB input over LTE are also somewhat feasible, for a proportion of the population ranging from 30% with one location up to 60% with ten locations. In the other scenarios it is practically impossible to achieve low enough latencies for any number of users.
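The coverage figures above correspond to reading the latency CDF at the wired-segment limit of a scenario. A minimal Python sketch of that calculation is given below; the latency matrix and population weights are toy values rather than the measured data, and the function name `coverage` is an illustrative assumption.

```python
def coverage(latency, population, chosen, budget_ms):
    """Share of the population whose lowest latency to any chosen
    location stays within the wired-segment budget.

    latency[c][r] is the latency in ms from location c to region r.
    """
    covered = 0.0
    for region, pop in enumerate(population):
        best = min(latency[c][region] for c in chosen)
        if best <= budget_ms:
            covered += pop
    return covered / sum(population)


# Toy data: 2 locations, 4 equally populated regions.
TOY_LATENCY = [
    [10, 20, 40, 50],
    [40, 30, 10, 15],
]
TOY_POPULATION = [1, 1, 1, 1]

if __name__ == "__main__":
    # 25.9 ms is the budget of the S7/WiFi/USB scenario (96 - 70.1).
    print(coverage(TOY_LATENCY, TOY_POPULATION, [1], 25.9))
    print(coverage(TOY_LATENCY, TOY_POPULATION, [1, 0], 25.9))
```

Evaluating the same function for each location count and each scenario budget yields one point on each curve-limit intersection of the CDF plot.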

Setting up a cloud infrastructure is possible only for the biggest companies in the entertainment business. Smaller companies would most likely, at least in the beginning, prefer to use existing cloud providers for their server locations. In Figure 11 we compare the latency distribution when using existing and future Amazon EC2 locations with the locations chosen by the greedy algorithm. At the time of writing, the available locations for EC2 servers in Europe were Ireland and Frankfurt, with London and Paris to be added soon. We compare both the existing and the future EC2 locations against the same number of locations placed by the greedy algorithm.

For the S7/WiFi/Gamepad scenario, the two locations calculated by the greedy algorithm do not bring any additional users under the required limit compared to the existing EC2 locations. However, the coverage can be raised from 50% to 70% by using four locations calculated by the greedy algorithm instead of the future Amazon EC2 locations. This shows that the current and future EC2 locations are possibly not completely optimized for latency. This is natural, as there are multiple other requirements for a data center location, and most of the latency-related aspects can be covered by using content delivery network (CDN) locations placed closer to the network edge.

In our latency study so far, we have not assigned a maximum size to a data center location. This can lead to vastly different numbers of users served by different data center locations. Next, we study how the latency distribution differs when the load on the data center locations is more evenly divided.

Figure 10: Server locations allocated by the greedy algorithm, in priority order: 1. Amsterdam, 2. Madrid, 3. Turin, 4. Warsaw, 5. Karlsruhe, 6. London, 7. Stockholm, 8. Budapest, 9. Lisbon, 10. Prague.

4.2 Load balancing and local optimization

The delay results presented in Figure 9 assumed that all users would be served by their closest data center location. This is, however, not always possible if a certain location is under heavy load. To measure the effect of load balancing on the overall latency distribution, we perform a simulation of a cloud gaming system where the location sizes have been calculated so that during peak hours the users are evenly distributed across the data center locations. This means that during peak hours the location with the lowest latency for a user might be full, in which case the user is assigned to the next location with longer latency, and so on until a free slot is found. We use the WoWAH dataset [23] to get realistic information on user churn in a game. The dataset includes 91065 players in total, to whom we assigned locations in Europe based on the population distribution of the different areas.
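The assignment rule of the load-balanced simulation can be illustrated with the following Python sketch. It assumes that users are handled in arrival order and that each location has a fixed number of slots; the data and the function name `assign_with_capacity` are hypothetical and do not reproduce the actual simulator.

```python
def assign_with_capacity(user_latencies, capacity):
    """Assign each arriving user to the lowest-latency location that
    still has a free slot.

    user_latencies[u][loc] is the latency in ms from user u to location loc.
    capacity[loc] is the number of slots at location loc.
    Returns one location index per user, or None if every location is full.
    """
    free = list(capacity)
    assignment = []
    for latencies in user_latencies:
        # Try locations in order of increasing latency for this user.
        for loc in sorted(range(len(latencies)), key=latencies.__getitem__):
            if free[loc] > 0:
                free[loc] -= 1
                assignment.append(loc)
                break
        else:
            assignment.append(None)
    return assignment


# Toy example: two locations with two slots each, five arriving users.
TOY_USERS = [[10, 30], [12, 28], [11, 35], [40, 9], [5, 50]]
TOY_CAPACITY = [2, 2]

if __name__ == "__main__":
    print(assign_with_capacity(TOY_USERS, TOY_CAPACITY))
```

In the toy run the third user prefers location 0 but finds it full and is served from location 1 at a higher latency, which is the effect the simulation quantifies.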

The latency distribution of the load-balanced simulation is compared with the non-balanced version in Figure 12. The median latency differs by less than 5 ms from the non-balanced version. The latency range is also slightly wider in the balanced version, but by no more than 5 ms.

Figure 11: Latency distribution comparison between Amazon EC2 locations and locations given by the greedy algorithm. (Plot labels: x-axis Latency (ms), 0 to 70; y-axis CDF; curves Optimal servers [2], Optimal servers [4], Amazon EC2 locations [2], Amazon EC2 locations (2017) [4], where the bracketed number is the location count; limit lines for S7/LTE/Gyro, S7/WiFi/Bluetooth, S4/WiFi/Gamepad, S7/WiFi/Gyro, S7/LTE/Gamepad and S7/WiFi/Gamepad.)

Figure 12: Effect of load balancing on latency distribution with different location counts. (Plot labels: x-axis Location count, 1 to 10; y-axis Latency (ms), 0 to 80; series Location size not limited, Location size limited.)

The dataset also enables us to calculate how much the server count can be lowered if local optimizations are performed using migrations inside a data center. The costs increase with the number of servers the cloud gaming provider needs to set up. If the servers are rented, the number of running instances also affects the overall cost. The g2.2xlarge instance used can handle 2-16 concurrent users at a time [26]. We used 4 as a realistic number of concurrent users for a high-end game, as the instance has two GPU cores. However, as users leave the service, the game servers might have empty slots and the player distribution among the servers might become unnecessarily scattered. If local migration is possible, the empty slots could be filled on the fly and unnecessary servers shut down. We ran a simulation using the dataset to calculate the possible cost savings if local migration is possible in the system. In the simulation, the servers were optimized each hour by migrating users between servers to minimize the total number of servers running. The simulation showed that the number of servers running can be decreased by 13% on average if migrations are enabled. However, the maximum number of servers required decreased by only 4%.
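The effect of one hourly consolidation step can be sketched as below. This is a simplified illustration that assumes all users within a data center are interchangeable and that each server holds 4 users, as in the text; the function names and the example occupancy are hypothetical.

```python
SLOTS_PER_SERVER = 4  # concurrent users per instance, as assumed in the text


def running_servers(servers):
    """Number of servers that must stay up: any server hosting a user."""
    return sum(1 for users in servers if users > 0)


def consolidate(servers, slots=SLOTS_PER_SERVER):
    """Migrate users so that they occupy as few servers as possible.

    servers is a list of per-server user counts; the result is the
    per-server occupancy after packing users onto full servers first.
    """
    full, rest = divmod(sum(servers), slots)
    return [slots] * full + ([rest] if rest else [])


if __name__ == "__main__":
    scattered = [4, 1, 2, 0, 3, 1]   # occupancy after some users have left
    packed = consolidate(scattered)
    print(running_servers(scattered), "->", running_servers(packed))
```

In this example eleven users scattered over five running servers fit on three after migration. At peak load most servers are already full, which is consistent with the smaller reduction reported for the maximum server count.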

5. FUTURE PROJECTIONS

The results shown in Sections 3 and 4 show that achieving imperceptible latency with current technology is difficult and requires optimal conditions with the correct type of control method. The base latencies of the mobile devices are still surprisingly high, although clear progress from previous device generations has occurred. The touch screen and displaying a frame on the screen of the mobile device are the largest delay components that still need to be improved.

A previous study by Beyer et al. reveals how touch screen latency tends to decrease over device generations [3]. Frame display times are bound by the display refresh rate and by the use of double and triple buffering. Mobile device displays currently have a fixed refresh rate of 60 Hz, which translates into a minimum delay of 16.7 ms between frames. In order to avoid visible tearing in the image, the operating system, including Android, uses double or sometimes triple buffering to hold previously rendered frames before they are actually displayed on the screen of the mobile device. This results in better visual quality at the expense of latency.

It is difficult for us to forecast whether the trend of decreasing touch screen latency will continue in the future, but some solutions to reduce the frame display latency are already available. One option is to increase the refresh rate of the device display. SoCs (system-on-a-chip) that support higher refresh rates (120 Hz) have already been introduced [19]. They are not, however, present in current production devices.

Another option is to enable front buffer rendering at the operating system level. We measured a delay of 1.5 display periods, or roughly 25 ms, for display updates on current mobile phones, which indicates the presence of double buffering. Our results suggest that front buffer rendering with a method called scanline racing could potentially drop this delay to 8 ms [2]. On the other hand, a display with a 120 Hz refresh rate could halve the latency of buffered updates compared to currently used displays. Cloud gaming clients also benefit from more powerful SoCs on mobile phones, as the frame decoding time is shorter. However, screen resolutions are also rising, which could diminish the effect of more powerful SoCs.
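The relation between refresh rate and buffered display delay quoted above is simple arithmetic, shown here as a short Python sketch. The function names are illustrative, and the factor of 1.5 display periods is the measured value reported in the text.

```python
def frame_period_ms(refresh_hz):
    """Time between two display refreshes in milliseconds."""
    return 1000.0 / refresh_hz


def buffered_display_delay_ms(refresh_hz, periods=1.5):
    """Display delay when an update waits the given number of periods."""
    return periods * frame_period_ms(refresh_hz)


if __name__ == "__main__":
    for hz in (60, 120):
        print(hz, "Hz:",
              round(frame_period_ms(hz), 1), "ms per frame,",
              round(buffered_display_delay_ms(hz), 1), "ms buffered delay")
```

At 60 Hz the frame period is about 16.7 ms and 1.5 periods give about 25 ms; at 120 Hz both values are halved.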

Another source of delay that is likely to shrink in the future is the one caused by the access network. 5G mobile networks are aiming for latencies shorter than 1 ms. Such low latency is achievable through novel radio access technologies that tap the currently largely unused higher frequency spectrum. This spectrum allows for much wider channels and, consequently, much higher data rates, which in turn reduce the frame delivery delay [30].

We plot the hypothetical latency results with front buffer rendering (FB), 120 Hz displays (Hz), and 1 ms access network delay (Ac) together with the currently measured ones for the Samsung S7 with LTE access in Figure 13. Employing front buffer rendering with a fast display alone reduces the latency by roughly 20 ms regardless of the control method. Looking back at Figure 11, such an improvement would have a substantially bigger effect than doubling the number of server locations. In the case of clients using LTE and a gamepad, for example, it would increase the fraction of clients that could potentially be served with imperceptible latency by 40-50 percentage points, depending on the server placements chosen, because the network latency budget increases from roughly 15 to 35 ms. Doubling the number of server locations increases the fraction of clients by 20-25 percentage points at most.

Figure 13: Response delay in different scenarios with future improvements without the wired segment of the network path. FB = front buffer rendering, Hz = 120 Hz displays, Ac = 1 ms access delay (5G). (Plot labels: y-axis Latency (ms), 0 to 120; groups S7 LTE, S7 LTE FB, S7 LTE FB+Hz, S7 LTE FB+Hz+Ac; bars Touch, USB, BLE, Gyro; legend Device delay, Server delay, Access delay.)

6. CONCLUSION

This paper studies the end-to-end latency in mobile cloud gaming. We set out to investigate whether it is possible to achieve latencies short enough for users not to perceive them while using the system. Our results suggest that it is possible, but only in limited situations where, for instance, the latest mobile device models and a separate control device are used, because the touch screen delay is too long. Replication of the service and server placement optimization are also necessary, and they can have a major effect in providing short enough latencies. However, we also noticed that their impact is substantially smaller than the expected impact of near-future developments on the mobile device side, simply because the delays imposed by the mobile device are clearly the biggest components of the end-to-end latency. In the future, it would be interesting to study in more detail the ability of users to perceive latency in mobile cloud gaming, to extend the case study to other continents, and to investigate the effect of new VR and AR devices, such as Oculus Rift and HoloLens, on the end-to-end latency.

7. ACKNOWLEDGMENTS

This work has been financially supported by the Academy of Finland (grant numbers 278207 and 297892), Tekes - the Finnish Funding Agency for Innovation, and the Nokia Center for Advanced Research.

8. REFERENCES

[1] 3rd Generation Partnership Project. TS 36.214 LTE; Evolved Universal Terrestrial Radio Access (E-UTRA); Physical layer; Measurements, 2014. Release 12.

[2] Anonymous. Anonymized for double-blind review.

[3] J. Beyer, R. Varbelow, J.-N. Antons, and S. Zander. A method for feedback delay measurement using a low-cost arduino microcontroller. In Proceedings of the 7th International Workshop on Quality of Multimedia Experience (QoMEX), IEEE, 2015.

[4] K. Boos, D. Chu, and E. Cuervo. Flashback: Immersive virtual reality on mobile devices via rendering memoization. In Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys ’16, pages 291–304, New York, NY, USA, 2016. ACM.

[5] E. Cattan, A. Rochet-Capellan, and F. Berard. A predictive approach for an end-to-end touch-latency measurement. In Proceedings of the 2015 International Conference on Interactive Tabletops & Surfaces, ITS ’15, pages 215–218, New York, NY, USA, 2015. ACM.

[6] C.-M. Chang, C.-H. Hsu, C.-F. Hsu, and K.-T. Chen. Performance measurements of virtual reality systems: Quantifying the timing and positioning accuracy. In Proceedings of the 2016 ACM on Multimedia Conference, MM ’16, pages 655–659. ACM, 2016.

[7] K.-T. Chen, Y.-C. Chang, H.-J. Hsu, D.-Y. Chen, C.-Y. Huang, and C.-H. Hsu. On the quality of service of cloud gaming systems. Trans. Multi., 16(2):480–495, Feb. 2014.

[8] K.-T. Chen, Y.-C. Chang, P.-H. Tseng, C.-Y. Huang, and C.-L. Lei. Measuring the latency of cloud gaming systems. In Proceedings of the 19th ACM international conference on Multimedia, pages 1269–1272. ACM, 2011.

[9] S. Choy, B. Wong, G. Simon, and C. Rosenberg. A hybrid edge-cloud architecture for reducing on-demand gaming latency. Multimedia Systems, 20(5):503–519, 2014.

[10] J. Deber, B. Araujo, R. Jota, C. Forlines, D. Leigh, S. Sanders, and D. Wigdor. Hammer time!: A low-cost, high precision, high accuracy tool to measure the latency of touchscreen devices. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pages 2857–2868. ACM, 2016.

[11] J. Deber, R. Jota, C. Forlines, and D. Wigdor. How much faster is fast enough?: User perception of latency & latency improvements in direct and indirect touch. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pages 1827–1836. ACM, 2015.

[12] Eurostat. NUTS - Nomenclature of territorial units for statistics. Overview. http://ec.europa.eu/eurostat/web/nuts/overview.

[13] H.-J. Hong, D.-Y. Chen, C.-Y. Huang, K.-T. Chen, and C.-H. Hsu. Qoe-aware virtual machine placement for cloud games. In Network and Systems Support for Games (NetGames), 2013 12th Annual Workshop on, pages 1–2. IEEE, 2013.

[14] H.-J. Hong, D.-Y. Chen, C.-Y. Huang, K.-T. Chen, and C.-H. Hsu. Placing virtual machines to optimize cloud gaming experience. Cloud Computing, IEEE Transactions on, 3(1):42–53, Jan 2015.

[15] C.-Y. Huang, C.-H. Hsu, Y.-C. Chang, and K.-T. Chen. Gaminganywhere: An open cloud gaming system. In Proceedings of the 4th ACM Multimedia Systems Conference, pages 36–47, 2013.

[16] Z. Ivkovic, I. Stavness, C. Gutwin, and S. Sutcliffe. Quantifying and mitigating the negative effects of local latencies on aiming in 3d shooter games. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI ’15, pages 135–144, New York, NY, USA, 2015. ACM.

[17] P. Jain, J. Manweiler, and R. Roy Choudhury. Overlay: Practical mobile augmented reality. In Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys ’15, pages 331–344, New York, NY, USA, 2015. ACM.

[18] M. Jarschel, D. Schlosser, S. Scheuring, and T. Hossfeld. An evaluation of qoe in cloud gaming based on subjective tests. In Proceedings of the 5th International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS), pages 330–335. IEEE, 2011.

[19] B. C. Joshua Ho. MediaTek Demonstrates 120 Hz Mobile Display. http://www.anandtech.com/show/8852/mediatek-demonstrates-120-hz-mobile-display.

[20] M. Koudritsky. Github. Walt Latency Timer. https://github.com/google/walt.

[21] B. Lee and A. Oulasvirta. Modelling error rates in temporal pointing. In Proc. of the 2016 CHI Conference on Human Factors in Computing Systems, CHI ’16, pages 1857–1868. ACM, 2016.

[22] K. Lee, D. Chu, E. Cuervo, J. Kopf, Y. Degtyarev, S. Grizan, A. Wolman, and J. Flinn. Outatime: Using speculation to enable low-latency continuous interaction for mobile cloud gaming. In Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys ’15, pages 151–165, New York, NY, USA, 2015. ACM.

[23] Y.-T. Lee, K.-T. Chen, Y.-M. Cheng, and C.-L. Lei. World of warcraft avatar history dataset. In Proceedings of the second annual ACM conference on Multimedia systems, pages 123–128. ACM, 2011.

[24] Y.-T. Lee, K.-T. Chen, H.-I. Su, and C.-L. Lei. Are all games equally cloud-gaming-friendly? an electromyographic approach. In Proceedings of the 11th Annual Workshop on Network and Systems Support for Games (NetGames), pages 1–6. IEEE, 2012.

[25] NVIDIA. NVIDIA Capture SDK. https://developer.nvidia.com/capture-sdk.

[26] Nvidia. Nvidia GRID GPU Specs and features. http://www.nvidia.co.uk/object/cloud-gaming-gpu-boards-uk.html.

[27] L. Pantel and L. C. Wolf. On the impact of delay on real-time multiplayer games. In Proceedings of the 12th International Workshop on Network and Operating Systems Support for Digital Audio and Video, NOSSDAV ’02, pages 23–29. ACM, 2002.

[28] PlanetLab. PlanetLab Europe. https://www.planet-lab.eu/.

[29] L. Qiu, V. N. Padmanabhan, and G. M. Voelker. On the placement of web server replicas. In INFOCOM 2001. Twentieth Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, volume 3, pages 1587–1596. IEEE, 2001.

[30] T. S. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y. Azar, K. Wang, G. N. Wong, J. K. Schulz, M. Samimi, and F. Gutierrez. Millimeter wave mobile communications for 5g cellular: It will work! Access, IEEE, 1:335–349, 2013.

[31] M. Satyanarayanan, P. Bahl, R. Caceres, and N. Davies. The case for vm-based cloudlets in mobile computing. IEEE pervasive Computing, 8(4):14–23, 2009.

[32] R. Shea, J. Liu, E.-H. Ngai, and Y. Cui. Cloud gaming: architecture and performance. IEEE Network, 27(4), 2013.

[33] O. Soliman, A. Rezgui, H. Soliman, and N. Manea. Mobile Cloud Gaming: Issues and Challenges, pages 121–128. Mobile Web Information Systems. Springer, 2013.

[34] Speedtest. Speedtest.net. http://www.speedtest.net/.

[35] D. Wu, Z. Xue, and J. He. icloudaccess: Cost-effective streaming of video games from the cloud with low latency. Circuits and Systems for Video Technology, IEEE Transactions on, 24(8):1405–1416, Aug 2014.

[36] Q. Zhang, Q. Zhu, M. Zhani, R. Boutaba, and J. Hellerstein. Dynamic service placement in geographically distributed clouds. Selected Areas in Communications, IEEE Journal on, 31(12):762–772, December 2013.

[37] Q. Zhang, Q. Zhu, M. F. Zhani, R. Boutaba, and J. L. Hellerstein. Dynamic service placement in geographically distributed clouds. IEEE Journal on Selected Areas in Communications, 31(12):762–772, 2013.
