

LBVC: Towards Low-bandwidth Video Chat on Smartphones

Xin Qi, Qing Yang, David T. Nguyen, Gang Zhou, Ge Peng
Department of Computer Science
College of William and Mary
Williamsburg, VA 23185, USA
{xqi, qyang, dnguyen, gzhou, gpeng}@cs.wm.edu

ABSTRACT
Video chat apps enable users to stay in touch with their family, friends and colleagues. However, they consume a lot of bandwidth and hence can quickly use up a monthly data plan quota, which is a high-cost resource on smartphones. In this paper, we propose LBVC (Low-bandwidth Video Chat), a user-guided vibration-aware frame rate adaption framework. LBVC takes a sender-receiver cooperative approach that both reduces bandwidth usage and alleviates video quality degradation for video chat apps on smartphones. We implement LBVC on the Android platform and evaluate its performance through a series of experiments and a user study with 21 pairs of subjects. Compared to the default solution, LBVC decreases bandwidth usage by 35% and at the same time maintains good video quality without introducing extra power consumption under typical video chat scenarios.

Categories and Subject Descriptors
C.4 [Performance of Systems]: Design Studies

General Terms
Measurement, Design, Performance

Keywords
Video Chat, Frame Rate Adaption, Frame Interpolation, Smartphones

1. INTRODUCTION
Nowadays various smartphone apps have been developed for people to keep in touch with their family, friends and colleagues. Whether apps of this kind have been commercialized (such as FaceTime, GoogleTalk and Skype) or not (such as CSipSimple, Linphone and SipDroid), almost all of them support a video chat function. Through video chats, users not only hear people’s voices, but also observe other contexts such as people’s expressions and locations. As 4G service becomes prevalent, network speed is no longer a bottleneck for mobile video chats. According to recent reports, about 19% of Americans have contacted others using video chats [14] and it is expected that there will be 29 million smartphone video chat users in 2015 [3].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. MMSys ’15, March 18 - 20, 2015, Portland, OR, USA. Copyright 2015 ACM 978-1-4503-3351-1/15/03 ...$15.00. http://dx.doi.org/10.1145/2713168.2713180.

However, streaming a video chat is a data-intensive operation and results in high bandwidth usage. For example, the upload and download bandwidth requirements for low quality video chats over Skype are both 300 Kbps [12]. Taking T-Mobile [4] as an example, its 500 MB monthly 4G data plan, at a cost of $50 per phone, can support such low quality video chats over Skype for only 1.85 hours. Although some carriers offer unlimited data plans, they come with significantly higher prices. For instance, the cost of T-Mobile’s unlimited data plan is about 1.5 times higher than that of its limited data plan. Moreover, it has been reported that most carriers tend to only offer limited data plans [49], and going over the data limit means either higher cost or lower network speed [23]. Therefore, reducing the bandwidth usage of video chat apps on smartphones is imperative.
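The 1.85-hour figure can be reproduced with a short calculation. The sketch assumes decimal units (1 MB = 10^6 bytes, 1 Kbps = 1000 bit/s) and that both upload and download traffic count against the quota; neither assumption is stated explicitly in the text, and the function name is illustrative.

```python
# Back-of-the-envelope check of how long a data quota lasts.
# Assumptions: decimal units, and upload + download both consume quota.

def chat_hours(quota_mb, up_kbps, down_kbps):
    """Hours of video chat supported by a quota at the given bitrates."""
    quota_bits = quota_mb * 10**6 * 8          # quota in bits
    rate_bps = (up_kbps + down_kbps) * 1000    # combined bitrate in bit/s
    return quota_bits / rate_bps / 3600        # seconds -> hours

hours = chat_hours(quota_mb=500, up_kbps=300, down_kbps=300)
print(f"{hours:.2f} hours")
```

Under these assumptions the result rounds to the 1.85 hours quoted for the 500 MB plan.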

In this paper, we propose LBVC (Low-bandwidth Video Chat), a user-guided vibration-aware frame rate adaption framework. It both reduces bandwidth usage and alleviates video quality degradation for video chat apps on smartphones. LBVC provides a user-friendly interface that helps even non-technical users easily set an appropriate frame rate to save bandwidth. To be informative, the interface provides the estimated bandwidth usage and video quality degradation of each candidate frame rate compared to the default frame rate.

To save bandwidth and alleviate video quality degradation, LBVC introduces a vibration-aware sender-receiver cooperative approach. The sender adopts the lower user input frame rate to save bandwidth most of the time. It only adopts the default frame rate when it detects severe smartphone vibrations from inertial sensor (accelerometer and gyroscope) readings. The receiver interpolates the ‘missing’ frames when the instant frame rate (calculated based on frame interval) is smaller than the default one. The purpose of the frame rate adaption at the sender is to keep the scene change between consecutive frames small in order to prevent strong artifacts from the frame interpolation.

It has been demonstrated that video streaming apps are among the most popular smartphone apps [26]. However, “they consume much more bandwidth than other apps” [18]. To lower the bandwidth requirement of video streaming, many video encoding techniques [8] [17] [45] [48] have been proposed. Rather than a new encoding technique, LBVC introduces a novel frame rate adaption framework that reduces the number of frames that need to be encoded by video chat apps on smartphones. Some works [23] [31] propose solutions that lower the bandwidth usage of downloading videos. They perform the trade-off between the quality of outgoing videos and bandwidth usage only at the video sender. In contrast, LBVC reduces bandwidth usage more aggressively at the video sender and does not guarantee the quality of outgoing videos. Instead, it utilizes extra techniques at both the video sender (vibration-aware frame rate control) and the receiver (frame interpolation) to guarantee the video quality at the receiver.

The main contributions of this paper are summarized as follows:

• We propose LBVC, a user-guided vibration-aware frame rate adaption framework for video chat apps on smartphones.

• We implement LBVC as a framework extension to the Android-based version of Linphone, a popular open-source video chat app for smartphones.

• We validate LBVC using real system experiments and a user study. The results demonstrate that LBVC saves bandwidth by 35% compared to the default solution and maintains good video quality.

The rest of the paper is organized as follows. We first present the experimental measurements that motivate our work. Then, the design and implementation of the LBVC framework are described. Next, we present the real system evaluation and user study. We finally discuss future work, review existing works and conclude the paper.

2. MEASUREMENTS
One intuitive way to reduce the bandwidth usage of video chat apps on smartphones is to decrease the frame rate. In this section, we answer the following two questions through measurements: (i) how does frame rate impact bandwidth usage and power consumption; (ii) how does frame rate impact video quality.

2.1 Measurement Platforms
In the measurements, we utilize two recent Android-based smartphones: Galaxy Nexus and Nexus 4. We choose Linphone [4], an open-source and lightweight smartphone VoIP (Voice over Internet Protocol) app, as the prototype software upon which we perform the measurements. Linphone is one of the most popular video chat apps in Android Market, with more than 100,000 downloads. It supports different platforms, including Android devices, iPhone and desktop.

The Android version of Linphone is composed of a user interface, a multimedia library (called Mediastreamer2) and a library supporting SIP (Session Initiation Protocol) / RTP (Real-time Transport Protocol). Mediastreamer2 is a native C++ library for recording, encoding/decoding and playing video and audio. We control the frame rate of video chats by configuring the corresponding parameter in the Mediastreamer2 library. We keep the default configuration in SILK [11], the codec used by Linphone for audio encoding/decoding. We set H.264/AVC [17] as the codec for video encoding/decoding in Linphone. It utilizes both intra-frame and inter-frame prediction techniques to compress video.

In the H.264/AVC codec, the size of a compressed video depends on the rate control method adopted. Some rate control methods, such as Bitrate (BR) and Average Bitrate (ABR), fix the average encoding bitrate. With these methods, the size of a compressed video is known in advance regardless of the frame rate of the original video. In contrast, other rate control methods, such as Constant Quantizer (CQ), Constant Rate Factor (CRF) and Lossless Mode, control quality instead of bitrate. With these methods, the size of a compressed video varies with the frame rate. Thus, this group of rate control methods allows us to reduce bandwidth usage by adopting lower frame rates. In this paper we choose CRF, since CRF achieves the best subjective visual quality without a significant increase in video size [17]. We set the target quality (CRF parameter) to 22. From real world experience, this value achieves satisfactory visual quality for most purposes [7].

2.2 Bandwidth Usage vs. Frame Rate
In this subsection, we investigate through measurement how frame rate reduction decreases bandwidth usage. Since no data limit exists for smartphones when data is transmitted through a WiFi network, we only perform the measurements over a cellular network.

We select 7 frame rates: 1, 2, 4, 8, 12, 16 and 20 fps (frames per second). The frame rates from 12 to 20 fps are the typical ones adopted by existing video chat apps on smartphones. For example, by default Linphone records video chats at 12 fps, while SipDroid records video chats at 20 fps. To be fair, we set the same values for all parameters, except for the frame rate, in all measurements. In each measurement, the Linphone app is compiled with one of the selected frame rates and installed on both smartphones. The video compression function in the H.264/AVC codec is enabled.

To capture the network traffic, we utilize an app called Shark for Root [10] in Android Market. It is built on tcpdump [13], a widely used tool for capturing network traffic. The captured network traffic is stored in pcap files, which are then analyzed using WireShark [15].

At each selected frame rate, 10 pairs of subjects perform 10 video chats, each of which lasts for 6 minutes. We choose 6 minutes as the video chat duration in the measurements, since it is recently reported as the average duration of mobile video chats [1]. During each video chat, two smartphones are separately held by two subjects who perform a typical video chat using Linphone. These two smartphones access the Internet through the T-Mobile HSPA+ infrastructure. We measure the average bandwidth usage of the Galaxy Nexus in each video chat. The network traffic of the Nexus 4 is symmetric. During the measurements, we disable all other apps and services not required by Linphone and Shark for Root.

We utilize WireShark to filter the captured packets of the video chat app. These network packets include network control packets and data packets containing both audio and video data. Then, we calculate the average total bandwidth usage of the Galaxy Nexus during each video chat. We illustrate the box plot of the measured average bandwidth usage at the selected frame rates in Figure 1.
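As a rough illustration of this averaging step, the sketch assumes the filtered packets have already been exported as (timestamp in seconds, size in bytes) pairs. That export format and the function name are assumptions made for illustration, not part of the measurement setup described here.

```python
# Average bandwidth of one video chat from a filtered packet list.
# Each packet is a hypothetical (timestamp_seconds, size_bytes) tuple.

def average_bandwidth_mbit_s(packets):
    """Total bits transferred divided by capture duration, in Mbit/s."""
    if len(packets) < 2:
        raise ValueError("need at least two packets to define a duration")
    timestamps = [t for t, _ in packets]
    duration = max(timestamps) - min(timestamps)
    if duration <= 0:
        raise ValueError("capture duration must be positive")
    total_bits = 8 * sum(size for _, size in packets)
    return total_bits / duration / 1e6

# Toy trace: two 125,000-byte packets one second apart.
print(average_bandwidth_mbit_s([(0.0, 125000), (1.0, 125000)]))
```

Repeating this for each of the 10 chats at a given frame rate would give the per-chat averages that a box plot summarizes.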

Figure 1: Total Bandwidth vs. Frame Rate on Galaxy Nexus.
(Box plot; x-axis: Frame Rate (fps) at 1, 2, 4, 8, 12, 16, 20; y-axis: Total Bandwidth (MBit/s).)

Figure 2: Power Consumption vs. Frame Rate on Galaxy Nexus in a Cellular Network.
(Box plot; x-axis: Frame Rate (fps) at 1, 2, 4, 8, 12, 16, 20; y-axis: Power Consumption − Cellular (mW).)

Figure 3: Power Consumption vs. Frame Rate on Galaxy Nexus in a WiFi Network.
(Box plot; x-axis: Frame Rate (fps) at 1, 2, 4, 8, 12, 16, 20; y-axis: Power Consumption − WiFi (mW).)

In Figure 1, when the frame rate of the video chat is below 12 fps, we observe that the total bandwidth usage rapidly decreases as the frame rate decreases. For example, the total bandwidth usage at 12 fps is about 188% of that at 4 fps. Above 12 fps, the bandwidth usage increases slowly since video compression techniques begin to take effect.

The observations are obtained when the video compression techniques in the codec (H.264/AVC) are involved. The observations indicate two facts below the typical frame rate range (12 to 20 fps): (i) the video compression techniques fail to efficiently compress video chats when the frame rate is low; (ii) reducing the frame rate is a feasible way to efficiently reduce bandwidth usage below the typical frame rate range adopted by existing video chat apps on smartphones.

2.3 Power Consumption vs. Frame Rate
In this subsection, we investigate through measurement how frame rate reduction decreases the power consumption of video chats. Since the power constraint exists for smartphones regardless of the type of network the data is transmitted through, we conduct the measurements over a public college WiFi network and the T-Mobile HSPA+ cellular network.

We use the Monsoon Power Monitor [5] to measure the power consumption of video chats with Linphone on the Galaxy Nexus under different frame rates. We organize a set of experiments similar to the one in the previous subsection. In the experiments, we disable Shark for Root and all other apps and services not required by Linphone. The average power consumption of each video chat is measured as follows.

We first measure the average power consumption when the smartphone’s screen is on but no video chat is ongoing. Then, for each video chat, we measure its average power consumption and subtract this baseline from it to obtain the average power consumption of sampling, processing, encoding/decoding, transmitting/receiving and playing the video and audio.

We illustrate the box plots of the measured average power consumption in Figures 2 and 3. From the figures, we observe that the power consumption increases as the frame rate increases. This is because the smartphone has to process, encode/decode and transmit/receive more video frames at higher frame rates.

However, compared to the potential bandwidth usage reduction (5.24% to 78.72% for the cellular network), the potential power consumption reduction (2.66% to 24.72% for the cellular network and 0.5% to 16.59% for the WiFi network) is much smaller. This is because the continuous multimedia stream keeps the smartphone networking component active.

These observations are consistent with existing works [40] and indicate that although reducing the frame rate is able to reduce the power consumption of video chats, this power consumption reduction is not a major benefit.

2.4 Video Quality vs. Frame Rate
In this subsection, we introduce the video quality metric we choose in this paper and illustrate the relationship between video quality and frame rate with this metric.

To quantify the video quality under different frame rates, we use a state-of-the-art non-reference objective video quality metric, TVM (Temporal Variation Metric), which is specifically designed for measuring video quality on mobile devices [22]. A non-reference metric is chosen, since no reference video exists at the receiver in a live chat.

TVM models the scene change that occurs between consecutive frames in a video frame sequence. The range of a TVM score is from 0 to ∞. A lower score indicates that the scene changes between consecutive frames are jerky, and hence the video quality is low. A higher score indicates the opposite.

In the measurements, we use five 10-minute videos recorded at 30 frames per second by 5 different subjects under different backgrounds and light conditions. We adjust the frame rate of each video off-line with ffmpeg [2] and calculate the TVM score for each video at each selected frame rate. We depict the box plot of all the scores in Figure 4.

Figure 4: Video Quality vs Frame Rate.
(Box plot; x-axis: Frame Rate (fps) at 1, 2, 4, 8, 12, 16, 20; y-axis: TVM Scores (dB).)

Figure 5: LBVC Architecture.
(Block diagram. Labels recoverable from the figure: Sender with Camera, Microphone, Inertial Sensors (Accel. & Gyro.), GUI - Set fps, Frame Rate Controller, Media Recorder and Codecs/Encode; Bit Stream over the SIP/RTP Network; Receiver with Codecs/Decode, an "fps < default" check, Frame Interpolation, Interpolated Frames, Delay Queue and Media Player.)

From Figure 4, we observe that the objective video quality decreases as the frame rate decreases. For example, the TVM scores indicate that the mean square difference between consecutive frames at 4 fps increases by about 214% compared to that at 12 fps [22]. The larger the difference between consecutive frames, the jerkier the video. It is also reported that subjective video quality decreases as frame rate decreases [39].
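To relate the 214% figure to a score in dB, the sketch assumes, purely for illustration, that the score has a PSNR-like form 10·log10(c / MSD), where MSD is the mean square difference between consecutive frames and c is a constant. The exact definition of TVM is given in [22] and is not reproduced here, so this form and the function name are assumptions.

```python
import math

# Assumed PSNR-like score: 10 * log10(c / MSD). Under that assumption a
# relative increase in MSD maps to a fixed drop in dB, independent of c.

def db_drop_for_msd_increase(increase_percent):
    """dB drop caused by an MSD increase of `increase_percent` percent."""
    ratio = 1 + increase_percent / 100.0   # e.g. +214% -> 3.14x
    return 10 * math.log10(ratio)

print(round(db_drop_for_msd_increase(214), 2))
```

Under this assumed form, a 214% increase in the mean square difference corresponds to a drop of roughly 5 dB.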

Therefore, we cannot simply reduce the frame rate of video chats to save bandwidth, because video quality degrades as frame rate decreases.

3. LBVC DESIGN AND IMPLEMENTATION
To reduce the bandwidth usage of mobile video chats, we present the design and implementation of LBVC (Low-bandwidth Video Chat), a user-guided vibration-aware frame rate adaption framework.

3.1 LBVC Design

3.1.1 LBVC Architecture
Figure 5 depicts the architecture of LBVC with the common video chat app components in red and the newly introduced components highlighted in purple grid. During a video chat, the communication is bidirectional and a smartphone is both a sender and a receiver.

At the sender side of LBVC, users can set the frame rate for a video chat through a user-friendly interface. Based on this input frame rate and on-line inertial sensor readings, a Frame Rate Controller adjusts the frame rate of the media recorder for video recording. To compress video frames and audio samples, the media recorder uses one of the existing codecs (e.g., SILK [11] and H.264/AVC [17] used in Linphone) to encode them. Then, the encoded data is transmitted over SIP/RTP networks.

At the receiver side of LBVC, the data packets received from the network are decoded into video frames and audio samples. To maintain video quality, the receiver adopts a feasible frame interpolation technique to interpolate intermediate frames between any two received frames. The number of intermediate frames to be interpolated is chosen as the minimal number that makes the frame rate no smaller than the default frame rate.
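One way to read this rule: if two received frames are Δ seconds apart and n frames are inserted between them, the instant frame rate becomes (n + 1) / Δ, so the smallest suitable n is ceil(default_fps · Δ) − 1. The sketch works in integer milliseconds to avoid floating-point rounding; the function name and units are illustrative assumptions rather than LBVC's actual code.

```python
# Minimal number of frames to interpolate between two received frames so
# that the instant frame rate is no smaller than the default frame rate.

def intermediate_frames(interval_ms, default_fps):
    """interval_ms: gap between two received frames, in milliseconds."""
    if interval_ms <= 0 or default_fps <= 0:
        raise ValueError("interval and frame rate must be positive")
    # ceil(default_fps * interval_ms / 1000) using integer arithmetic
    slots = -(-(default_fps * interval_ms) // 1000)
    return max(0, slots - 1)

for ms in (1000, 250, 125, 83):
    print(ms, "ms ->", intermediate_frames(ms, default_fps=12), "frames")
```

For a 12 fps default, a 250 ms gap (4 fps) needs two intermediate frames, while a gap of about 83 ms (already 12 fps) needs none.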

A frame interpolation technique needs two frames as input. After a frame is decoded, the ‘missing’ (or intermediate) frames after it cannot be interpolated until the next frame arrives. To ensure that video frames are played at a constant speed, the earlier decoded frame cannot be played until the frame interpolation completes. Thus, a delay should be added to the video player at the beginning of each video chat.

To ensure a constant playback speed as well as to synchronize the audio and video, a Delay Queue adds a common delay to the media player. The length of the common delay is determined based on the sender-side input frame rate, which the receiver obtains from the sender through SIP negotiation parameters [9]. Within the common delay, the Frame Interpolation Component interpolates the intermediate frames when the instant frame rate (calculated based on frame interval) is lower than the default one (e.g., 12 fps in Linphone). We note that the delay penalty (e.g., 260 ms at 4 fps) is only paid once at the beginning of each video chat, and results in no desynchronization between audio and video during playback.

It is worth emphasizing that LBVC is a generic design for typical video chat apps on smartphones, and is independent of lower layer components such as codecs and network protocols.

3.1.2 Interface for Setting fps
The interface is designed to help users set an appropriate frame rate to save bandwidth for video chats. In particular, it provides several candidate frame rates, which are smaller than or equal to the default frame rate of a video chat app. However, selecting a lower candidate frame rate not only reduces bandwidth usage, but also degrades the video quality on the conversation partner’s side. Without any indication, even technical users may encounter difficulties in making a good trade-off between bandwidth usage and video quality.

Therefore, the interface also provides the estimated bandwidth usage and video quality degradation of each candidate frame rate compared to the default frame rate. With this knowledge, users are aware of the benefits and costs of selecting a frame rate. Thus, they have more confidence when selecting an appropriate one with respect to their needs. For example, a user who has sufficient data plan quota can select a higher candidate frame rate for better video quality. In contrast, a user who has limited data plan quota can select a lower candidate frame rate to reduce bandwidth usage.

3.1.3 Frame Rate Controller
Although the Frame Interpolation Component on the receiver side can rescue the video quality to some extent, even the most advanced frame interpolation technique may produce strong artifacts when the scene change between consecutive frames is large [21]. Therefore, we design a Frame Rate Controller that dynamically adapts the frame rate so as to avoid large scene changes between consecutive frames.

The Frame Rate Controller utilizes the user input frame rate as the target frame rate. At run-time, it adopts the target frame rate to save bandwidth and only adopts the default frame rate, which is usually higher than the user input frame rate, when it detects large scene changes.

To detect large scene changes, we could directly use frame changes as the criterion. However, this method has a pitfall: it requires multiple frames to detect large scene changes before making a decision. Waiting for multiple frames makes it impractical because the controller cannot respond to scene changes in time. For example, LBVC needs to wait 260 ms for even one frame at 4 fps.

We also note that the typical causes of large scene changes between consecutive frames are severe smartphone vibrations. Therefore, to let the Frame Rate Controller respond in time, we design it to detect large scene changes indirectly by detecting severe smartphone vibrations. In LBVC, the Controller utilizes inertial sensor (accelerometer and gyroscope) readings to determine whether a smartphone vibration is severe or not during a video chat. In the evaluation, we will demonstrate that this indirect method is still good enough for achieving acceptable video quality.

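$$ m_A = \sum_{t=\mathit{start}}^{\mathit{end}} \left[ |\partial A_x(t)| + |\partial A_y(t)| + |\partial A_z(t)| \right] \tag{1} $$

$$ m_G = \sum_{t=\mathit{start}}^{\mathit{end}} \left[ |\partial G_x(t)| + |\partial G_y(t)| + |\partial G_z(t)| \right] \tag{2} $$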

The Frame Rate Controller utilizes metrics based on the L1 norm of sensor reading changes [42] to quantify the smartphone vibration (see Equations 1 and 2). In the equations, ∂Ax(t), ∂Ay(t) and ∂Az(t) are the acceleration changes in the three measured dimensions, while ∂Gx(t), ∂Gy(t) and ∂Gz(t) are the angular velocity changes in the three measured dimensions. In addition, start and end are the starting and ending time stamps of each measuring period. The L1 norm is adopted because it rigorously captures the smartphone vibrations signaled by sensor reading changes in any measured dimension.

In LBVC, the sampling rate of the sensors is set to 50 Hz and the measuring period is set to 60 milliseconds, which is small enough to quickly reflect the vibrations in real time. If the value of either mA or mG is larger than a threshold, the smartphone vibration is considered severe. Otherwise, the vibration is considered light.
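The decision rule built on Equations 1 and 2 can be sketched as follows. The sketch assumes that each ∂ term is the difference between consecutive sensor readings within one measuring period. The threshold values, function names and sample data are hypothetical and chosen only for illustration.

```python
# Vibration-aware frame rate choice based on the L1 norm of sensor changes.
# Readings are (x, y, z) tuples collected during one measuring period.

def l1_change(samples):
    """Sum of absolute per-axis changes between consecutive readings."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += sum(abs(c - p) for p, c in zip(prev, cur))
    return total

def choose_frame_rate(accel, gyro, user_fps, default_fps,
                      accel_thresh, gyro_thresh):
    """Return default_fps under severe vibration, else the user's rate."""
    m_a = l1_change(accel)   # Equation 1
    m_g = l1_change(gyro)    # Equation 2
    severe = m_a > accel_thresh or m_g > gyro_thresh
    return default_fps if severe else user_fps

# Hypothetical readings and thresholds (not taken from the paper).
still = [(0.0, 0.0, 9.8)] * 4
shaky = [(0.0, 0.0, 9.8), (1.5, -0.5, 9.0), (0.0, 0.0, 9.8), (1.0, 1.0, 9.8)]
no_rotation = [(0.0, 0.0, 0.0)] * 4
print(choose_frame_rate(still, no_rotation, 4, 12, 2.0, 1.0))
print(choose_frame_rate(shaky, no_rotation, 4, 12, 2.0, 1.0))
```

With a steady phone the user's lower rate is kept, and a burst of acceleration changes switches the sender back to the default rate.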

Moreover, the degree of vibration the Frame Interpolation Component can tolerate also depends on the input frame rate. For instance, if the input frame rate is set to a large value (such as 8 fps), the component may generate intermediate frames without any artifacts under typical smartphone vibrations. However, if the input frame rate is set to a small value (such as 1 fps), it may generate intermediate frames with strong artifacts under the same vibrations. Therefore, the vibration threshold values for different frame rates are different and should be determined experimentally by app developers with the objective of preventing obvious artifacts.

3.1.4 Frame Interpolation Algorithm
To select a proper frame interpolation algorithm for LBVC, we first investigate two optical-flow based algorithms [21] [43]. However, we find that their execution time is long. Taking the algorithm in [43] as an example, we run it on a laptop with a more advanced hardware configuration (Intel Core i7-2760QM CPU @ 2.4GHz ×8 and 12 Gigabytes of memory) than recent smartphones such as the Nexus 4. The input frames have a resolution similar to the ones in smartphone video chats. Based on the timing records, the algorithm takes 825 seconds on average to interpolate one frame. A comprehensive survey of the execution time of various optical-flow based methods is given in [6].

Since the optical-flow based algorithms are not practical in LBVC, we turn to the cross dissolve algorithm [32]. The cross dissolve algorithm is simple, since it only involves adding two frames and multiplying a frame by a constant. Moreover, it has been mathematically proved that “given two images of a scene with small motion between them, a cross dissolve produces a sequence in which the edges move smoothly.” [32]. In LBVC, the Frame Rate Controller on the sender is designed to prevent large scene changes between consecutive frames.

An example of the input and output of the cross dissolve algorithm is depicted in Figure 6. The time interval between the two input frames is 300 ms. As indicated by the equation in Figure 6, the intermediate frames calculated earlier obtain larger weights for the first input frame, while the ones calculated later obtain larger weights for the second input frame. Although the algorithm cannot generate the exact intermediate frames as the original ones, it is able to smooth the transition between the two input frames to some extent. In addition, artifacts such as blurring are alleviated when the interpolated frames are displayed in a short time interval (e.g., 300 ms in Figure 6).
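As a minimal illustration of the cross dissolve described above, the following Python sketch blends two frames with weight (1 - T) on the first frame and T on the second. It is not the LBVC implementation (which runs as a native Mediastreamer2 filter); frames are modeled here as nested lists of grayscale values, and the function names `cross_dissolve` and `interpolate` are hypothetical.

```python
def cross_dissolve(frame1, frame2, t):
    """Blend two equally sized grayscale frames: (1 - t) * frame1 + t * frame2."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [
        [(1.0 - t) * p1 + t * p2 for p1, p2 in zip(row1, row2)]
        for row1, row2 in zip(frame1, frame2)
    ]


def interpolate(frame1, frame2, n_intermediate):
    """Return n_intermediate frames evenly spaced in T between the two inputs."""
    step = 1.0 / (n_intermediate + 1)
    return [cross_dissolve(frame1, frame2, k * step)
            for k in range(1, n_intermediate + 1)]


# Toy 2x2 frames (assumed values, for illustration only).
frame1 = [[0.0, 100.0], [50.0, 200.0]]
frame2 = [[100.0, 100.0], [150.0, 0.0]]

# Four intermediate frames, i.e. T = 0.2, 0.4, 0.6, 0.8 as in Figure 6.
frames = interpolate(frame1, frame2, 4)
```

Each output pixel costs two multiplications and one addition, which is consistent with the observation that the algorithm is far cheaper than optical-flow based warping.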

3.2 LBVC Implementation

We implement LBVC as an extension to Linphone, a widely used open-source video chat app. Aside from Linphone, we also investigate the potential of modifying other popular open-source video chat apps, such as SipDroid and CSipSimple. We choose Linphone [4] as the base of our implementation because its software structure allows us to focus on the implementation of the video processing functionalities.

We implement the interface that lets users set the frame rate with Android SDK 4.2 in Java and integrate it into the Android version of Linphone. An example of the interface is shown in Figure 7. The app developers can add more candidate frame rates when they have the associated information.

[Figure 6: Cross Dissolve Example. Input Frame 1 (T=0) and Input Frame 2 (T=1.0), shown with the original frames and the frames interpolated by cross dissolve at T=0.2, 0.4, 0.6 and 0.8, where $I_T = (1 - T)\, I_1 + T\, I_2$.]

[Figure 7: The Interface.]

The whole video streaming procedure in Linphone is implemented as filter graphs in the native Mediastreamer2 library. Each filter in a graph is in charge of one task such as video capturing, encoding, or displaying. When a video stream starts, a thread called Ticker is scheduled to wake up every 10 ms and execute the filter graphs. We implement the Frame Rate Controller as a component at the beginning of the Ticker thread. It collects the accelerometer and gyroscope readings through the Android NDK sensor library API. Every 60 ms, if either calculated vibration metric is larger than its threshold, the default frame rate (12 fps) is set for the video capturing and encoding filters; otherwise, the user-specified lower frame rate is set.
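The threshold rule applied by the Frame Rate Controller in each 60 ms measuring period can be sketched as follows. This is a hedged Python illustration rather than the native code used in Linphone: the function name `choose_frame_rate` and its parameters are hypothetical, and the vibration metrics mA and mG are assumed to have been computed already from the accelerometer and gyroscope readings.

```python
DEFAULT_FPS = 12  # Linphone's default frame rate, as stated in the text


def choose_frame_rate(m_a, m_g, thr_a, thr_g, user_fps, default_fps=DEFAULT_FPS):
    """Pick the frame rate for the next measuring period.

    m_a, m_g     -- vibration metrics from accelerometer and gyroscope (assumed inputs)
    thr_a, thr_g -- thresholds associated with the user-specified frame rate
    The vibration counts as severe if either metric exceeds its threshold.
    """
    severe = m_a > thr_a or m_g > thr_g
    return default_fps if severe else user_fps


# Example with the 4 fps thresholds listed in Table 1 (4.0 m/s^2 and 3.2 rad/s).
light = choose_frame_rate(1.0, 0.5, 4.0, 3.2, user_fps=4)
severe_accel = choose_frame_rate(5.0, 0.5, 4.0, 3.2, user_fps=4)
severe_gyro = choose_frame_rate(1.0, 3.5, 4.0, 3.2, user_fps=4)
```

Keeping the decision a pure function of the two metrics and two thresholds means it adds negligible work to a thread that already wakes every 10 ms.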

We implement the Frame Interpolation Component as a new filter between the video decoding and displaying filters. It utilizes the two most recently decoded frames as the input to the cross dissolve algorithm and calculates the intermediate frames. All the frames are put in a queue and assigned proper time stamps including the common delay for displaying. For each frame in the queue, it is not sent to the displaying filter unless its time stamp is not smaller than the current system time. The audio playing filter utilizes a similar queue to add the common delay for audio playing.

4. EVALUATION

In this section, we evaluate LBVC with a series of experiments and answer the following two questions: (1) How does LBVC impact the bandwidth usage and power consumption under typical video chat scenarios and mobility cases? (2) How does LBVC impact the objective and subjective video quality?

4.1 LBVC Configuration

Frame Rate (fps)     1      2      4      6      8
mA THRs (m/s^2)      2.0    3.0    4.0    4.5    6.0
mG THRs (rad/s)      2.5    3.0    3.2    3.5    4.0
Common Delay (ms)    1100   620    260    240    200

Table 1: Selected Frame Rates and Their Associated Parameters (Thresholds and Delay).

In the experiments, we use the implementation of the LBVC extension to Linphone. We select 5 typical frame rates that are lower than the default 12 fps, and summarize them in Table 1. For each selected frame rate, we list the thresholds for the Frame Rate Controller to adapt frame rate and the common delay added to synchronize the audio and video (refer to the LBVC Architecture subsection). These values are determined empirically. Particularly, the thresholds are the minimal values that can prevent obvious frame interpolation artifacts resulting from smartphone vibrations. The delay is the upper bound of the frame interval observed from extensive experiments.

4.2 Performance Under Typical Scenarios

In this subsection, we evaluate the performance impact of LBVC at the selected frame rates in Table 1. We recruit the same 10 pairs of subjects as in the measurement section. Each pair performs video chats at all selected frame rates for bandwidth and power measurements on a Galaxy Nexus and a Nexus 4. Each video chat lasts for 6 minutes. The subjects are instructed to perform video chats as usual under typical video chat scenarios such as sitting and standing. There are no other physical constraints imposed on the subjects. During all video chats, we disable all unnecessary apps, services, and radios on the smartphones.

We perform bandwidth measurements over the T-Mobile HSPA+ cellular network, and power measurements over a public college WiFi network and the T-Mobile HSPA+ cellular network. We separate the video chats for bandwidth measurements from those for power measurements, since the network traffic capturing software also consumes power.

4.2.1 Bandwidth Usage

In the experiments, we utilize Shark for Root to record network traffic and Wireshark to analyze bandwidth usage on a server.

Figure 8 illustrates the measured total bandwidth usage of uploading and downloading the video chats on Galaxy Nexus. The red line depicts the average total bandwidth usage at the default 12 fps with LBVC disabled. The blue bars illustrate the average total bandwidth usage at the selected frame rates with LBVC enabled.


[Figure 8: Bandwidth Usage vs. Frame Rate on Galaxy Nexus. x-axis: Frame Rate (fps); y-axis: Total Bandwidth (MBit/s); series: Default 12 fps with LBVC Disabled, Selected fps with LBVC Enabled.]

[Figure 9: Power Consumption vs. Frame Rate in a Cellular Network on Galaxy Nexus. x-axis: Frame Rate (fps); y-axis: Power, Cellular (mW); series: Default 12 fps with LBVC Disabled, Selected fps with LBVC Enabled.]

[Figure 10: Power Consumption vs. Frame Rate in a WiFi Network on Galaxy Nexus. x-axis: Frame Rate (fps); y-axis: Power, WiFi (mW); series: Default 12 fps with LBVC Disabled, Selected fps with LBVC Enabled.]

In Figure 8, we observe that the average total bandwidth usage with LBVC enabled at 1 fps is even higher than that at 2 fps. This is due to the small thresholds used for frame rate adaption at 1 fps. Small thresholds allow some typical smartphone vibrations to trigger switches from the lower input frame rate to the higher default frame rate, resulting in larger bandwidth usage. At 2 fps, the larger thresholds result in fewer switches to the default frame rate, and hence the bandwidth usage with LBVC enabled is smaller.

Moreover, when LBVC is enabled, we observe that the bandwidth usage variances at 1 and 2 fps are larger than those at higher frame rates. This is because the thresholds at 1 and 2 fps are smaller than those at higher frame rates. Smaller thresholds allow smartphone vibrations to trigger more switches between the input frame rate and the default frame rate. The unstable runtime frame rate results in large bandwidth usage variances. However, as the frame rate increases, increased thresholds result in smaller bandwidth usage variances.

Frame Rate (fps)     1      2      4      6      8
Galaxy Nexus (%)     39.1   43.2   35.2   22.3   13.0
Nexus 4 (%)          38.5   41.9   35.4   21.2   13.3

Table 2: Average Bandwidth Savings vs. Frame Rates on Galaxy Nexus and Nexus 4 (Compared to the Default 12 fps with LBVC Disabled).

We summarize the average bandwidth savings in Table 2. The average bandwidth savings for the two smartphones are similar because the communication is symmetric. The results demonstrate that LBVC is able to reduce the average bandwidth usage of video chats by up to 43.2% under typical video chat scenarios. Additionally, lower bandwidth usage at clients reduces the traffic flow competition at the RTP server, and consequently may result in less packet loss and jitter.
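To put these savings in terms of a data plan, the following back-of-the-envelope Python sketch reuses the figures quoted in the Introduction (a 500 MB plan and 300 Kbps in each direction, i.e. 600 Kbps in total) and applies the 35.2% saving measured at 4 fps on Galaxy Nexus. Applying a saving measured with Linphone to a bitrate quoted for Skype is an assumption made only for illustration, and the helper `chat_hours` is hypothetical.

```python
def chat_hours(plan_mb, total_kbps, saving_pct=0.0):
    """Hours of video chat a data plan supports at a given total bitrate."""
    plan_kbits = plan_mb * 1000 * 8  # decimal megabytes -> kilobits
    effective_kbps = total_kbps * (1.0 - saving_pct / 100.0)
    return plan_kbits / effective_kbps / 3600.0


baseline = chat_hours(500, 600)          # roughly 1.85 hours, as in the Introduction
with_lbvc = chat_hours(500, 600, 35.2)   # roughly 2.86 hours under the stated assumption
```

Under these assumptions, the same quota lasts about one hour longer.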

4.2.2 Power Consumption

In the experiments, we utilize the Monsoon Power Monitor [5] to measure the average power consumption of performing each video chat on Galaxy Nexus. However, the power output interface of the monitor cannot be connected to the battery pins of Nexus 4. Due to this hardware constraint, we use the software PowerTutor [50] to measure the average power consumption of performing each video chat on Nexus 4. In general, the measured power consumption includes the power for sampling, processing, encoding/decoding, transmitting/receiving and playing the video and audio. When LBVC is enabled, it also includes the power of the sensors and frame interpolation.

Figures 9 and 10 summarize the measured average power consumption of all the video chats over the cellular and WiFi networks on Galaxy Nexus. The red lines depict the average power consumption at the default 12 fps with LBVC disabled. The blue bars illustrate the average power consumption at the selected frame rates with LBVC enabled.

In the figures, we observe that the average power consumption at 1 fps is larger than that at 2 fps. The higher power consumption at 1 fps results not only from the frequent adaption to the default frame rate, but also from the larger number of frame interpolation operations performed at 1 fps.

In addition, we observe that the power consumption variance with LBVC enabled at 1 fps is larger than those at higher frame rates. This is due to the unstable runtime frame rate. However, this variance is not significant compared to the total power consumption.

Frame Rate (fps)           1      2      4      6      8
Galaxy Nexus, WiFi (%)     8.9    13.2   10.4   7.8    6.6
Galaxy Nexus, Cell (%)     9.7    10.1   8.8    5.2    3.6
Nexus 4, WiFi (%)          10.1   14.0   11.2   8.6    7.4
Nexus 4, Cell (%)          11.9   13.6   10.3   8.7    5.1

Table 3: Average Power Savings vs. Frame Rates on Galaxy Nexus and Nexus 4 (Compared to the Default 12 fps with LBVC Disabled).

The average power savings are summarized in Table 3. From the table, we learn that although LBVC does not save much power, the power saving resulting from frame rate reduction is large enough to offset the power consumption of sensors and frame interpolation. Therefore, LBVC does not introduce extra power overhead under typical video chat scenarios.

4.3 Performance in Mobility Cases

Since the Frame Rate Controller uses the inertial sensors to adapt frame rate, the frame rate adaption is affected by subjects’ mobility. In different mobility cases, the amount of time during which LBVC works at the input frame rate is different. Consequently, this affects the bandwidth savings.


In the previous section, we conducted experiments to demonstrate the bandwidth and power savings of LBVC while the subjects are stationary, such as sitting and standing. In this section, we conduct the same group of experiments in two other mobility cases: walking, a representative of low mobility, and sitting in a moving vehicle, a representative of high mobility.

We summarize the percentages of time for LBVC to work at different input frame rates in all the three mobility cases in Figure 11. Since a higher input frame rate has a larger threshold for frame rate adaption, we observe that as the input frame rate increases, the percentage of time for LBVC to operate at that frame rate also increases.

In Figure 11, we also observe that the stationary case has the largest percentage of time for LBVC to work at user input frame rates among all the cases. This is because there are more smartphone vibrations in the other two cases. In order to compensate for the increased smartphone vibrations and rescue video quality, the Frame Rate Controller has to frequently adapt the instant frame rate to the default one. In the stationary case, LBVC has the highest chance to save bandwidth.

[Figure 11: Percentage of Time for LBVC to Work at Different Input Frame Rates in Different Mobility Cases. x-axis: Input Frame Rate (fps); y-axis: Percentage of Time at An Input Frame Rate (%); series: Sitting or Standing (Stationary), Walking (Low Mobility), In a Vehicle (High Mobility).]

Additionally, we observe that LBVC has the lowest chance to work at input frame rates to save bandwidth in the walking case. This is because the subjects do not control their walking speed and gestures during video chats when they are walking. Thus, in the walking case, frequent speed changes and slight jolts cause LBVC to operate at the default frame rate most of the time.

In the walking case, although a subject’s head does not have a significant relative movement in the screen during a video chat, the background of the video chat continuously changes. However, the fast background change is successfully compensated thanks to LBVC’s frequent adaption to the default frame rate.

Finally, we observe that LBVC operates at input frame rates for more time in the vehicle case than it does in the walking case. This is because the vehicles are usually operated at almost constant speed near the speed limits on the roads. At almost constant speed, acceleration changes are not large enough to trigger the switch to the default frame rate. However, different traffic conditions result in different speed controls, so the variances of the percentages in the vehicle case are large.

In a vehicle, a subject’s head usually has a significant relative movement in the screen when the vehicle changes speed during braking, accelerating or hitting a speed bump. In these scenarios, LBVC is triggered by the inertial sensors to adapt the frame rate in order to compensate for the relative movement.

We summarize the bandwidth savings in Table 4. Although the bandwidth savings in these two cases are smaller than those under the typical scenarios, the extra consumed bandwidth compensates for the frequent smartphone vibrations in order to alleviate video quality degradation. Since it is hard to run the power monitor while the subjects are walking or sitting in a vehicle and power saving is not a major benefit of LBVC, we do not perform power measurements in these two cases.

Frame Rate (fps)     1      2      4      6      8

Walking Case
Galaxy Nexus (%)     0.7    1.6    3.0    3.1    4.5
Nexus 4 (%)          0.7    1.5    3.0    3.2    4.4

Vehicle Case
Galaxy Nexus (%)     4.2    11.2   13.9   8.6    8.6
Nexus 4 (%)          4.4    11.2   13.10  8.8    8.6

Table 4: Average Bandwidth Savings vs. Frame Rates on Galaxy Nexus and Nexus 4 (Compared to the Default 12 fps with LBVC Disabled) in the Two Mobility Cases.

4.4 Video Quality

We perform a user study to evaluate the impact of LBVC on video quality and demonstrate that LBVC can alleviate the video quality degradation resulting from frame rate reduction. In the study, we recruit 21 different pairs of subjects (42 subjects in total) to perform video chats when LBVC is either disabled or enabled under each selected frame rate. Half of the subjects major in Computer Science, while the others major in various areas, including Physics, Applied Science, Mathematics, Nursing, Education and Rhythmic Gymnastics. Each video chat lasts for 6 minutes. We collect both subjective and objective scores of each video chat.

To collect subjective scores, we carry out a Double Stimulus Continuous Quality Evaluation (DSCQE) [16], in which each subject rates a video chat by comparing it to a reference video. The scores are from 0 to 100. An intuitive explanation for the relationship between the video quality and score is as follows: very annoying (0-20); annoying (21-40); fair (41-60); good (61-80); excellent (81-100).

In each round of the DSCQE, we first ask each subject pair to perform and score a video chat under the default 12 fps with LBVC disabled and use this video chat as the reference one. Second, we configure Linphone using a randomly chosen frame rate from Table 1 with LBVC either enabled or disabled. Then, each subject pair is instructed to perform and score a video chat under this configuration. During each video chat, the subjects are also instructed to vibrate the smartphone to simulate different mobility cases. We repeat the second step until all the selected frame rates with LBVC either enabled or disabled have been traversed by each subject pair. In addition, we also ask the subjects whether the common delay affects their video chat experience.

[Figure 12: TVM Scores vs. Frame Rate. Error Bars indicate the Standard Deviations. x-axis: Frame Rate (fps); y-axis: TVM Scores (dB); series: Default 12 fps with LBVC Disabled, Selected fps with LBVC Enabled, Selected fps with LBVC Disabled.]

[Figure 13: Subjective Scores vs. Frame Rate. Error Bars indicate the Standard Deviations. x-axis: Frame Rate (fps); y-axis: Subjective Scores; series: Default 12 fps with LBVC Disabled, Selected fps with LBVC Enabled, Selected fps with LBVC Disabled.]

[Figure 14: The Ratio of the Subjective Score with LBVC Enabled to the Subjective Score with LBVC Disabled for Each Pair. Heat map; x-axis: Frame Rate (fps); y-axis: Pair ID (1 to 21); color scale: 1 to 4.]

To collect objective scores, we implement a recorder in Linphone to automatically record all video chats during the DSCQE. As in the measurement section, we use the Temporal Variation Metric (TVM) [22] to measure the objective video quality. We analyze the recorded videos with MATLAB on a PC to obtain the TVM scores.

The objective and subjective scores are depicted in Figures 12 and 13. In these two figures, the red lines depict the average score at the default 12 fps of Linphone with LBVC disabled. The blue bars illustrate the average scores at the selected frame rates with LBVC enabled, and the green bars illustrate the average scores at the selected frame rates with LBVC disabled.

Objective Scores. Figure 12 demonstrates that LBVC is able to alleviate the objective video quality degradation resulting from frame rate reduction. For instance, the average TVM scores at 4 fps indicate that the mean square difference between consecutive frames with LBVC enabled is decreased by 58% compared to that with LBVC disabled. This significant decrease in the mean square difference between consecutive frames results in smoother videos.
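The quantity mentioned above, the mean square difference between consecutive frames, can be illustrated with a toy Python sketch. This is not the TVM itself (its exact definition is given in [22] and is not reproduced here), and it does not reproduce the measured 58% figure; it only shows, under assumed pixel values and hypothetical function names, why displaying a cross-dissolved frame between two decoded frames lowers the difference between consecutive displayed frames.

```python
def mean_square_difference(frame_a, frame_b):
    """Mean of squared per-pixel differences between two equally sized frames."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(frame_a, frame_b)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def blend(frame_a, frame_b, t):
    """Cross dissolve: (1 - t) * frame_a + t * frame_b."""
    return [[(1 - t) * a + t * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


# Two decoded frames with assumed grayscale values.
first = [[0.0, 10.0], [20.0, 30.0]]
second = [[40.0, 10.0], [0.0, 30.0]]

# Without interpolation the display jumps directly from first to second.
without_interp = mean_square_difference(first, second)

# With one interpolated frame, each displayed step covers half the change.
midpoint = blend(first, second, 0.5)
with_interp = mean_square_difference(first, midpoint)
```

In this toy case, halving each per-pixel step reduces the mean square difference of a step to one quarter of its original value.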

Subjective Scores. Figure 13 summarizes the subjective scores collected from the study. This figure demonstrates that LBVC is able to alleviate the perceptual video quality degradation resulting from frame rate reduction. For instance, the average subjective score at 4 fps with LBVC enabled is improved by 25.3% compared to the one with LBVC disabled. The score is even higher than that at 6 fps with LBVC disabled.

When LBVC is enabled, we also notice that the relationship between subjective score and frame rate is different from that between objective score and frame rate. There are two possible reasons. First, although the cross dissolve algorithm interpolates intermediate frames to smooth the video chats, the artifacts in the interpolated frames may also impair the subjects’ perceptual experience. The lower the frame rate, the larger the scene change between consecutive frames, which consequently results in more artifacts in the interpolated frames. Second, the frequent changes between the default frame rate and input frame rate at lower frame rates (e.g., 1 and 2 fps) also impair the subjects’ perceptual experience. In contrast, the TVM only measures the video jerkiness but not the aforementioned artifacts. However, Figure 13 demonstrates that the subjective perceptual experience is improved as a whole, although the artifacts offset some improvements.

Notwithstanding the aforementioned limitation, we still choose TVM for two reasons. First, very few non-reference metrics are designed for measuring video quality with different frame rates. TVM is the newest metric of this sort and specifically designed for mobile devices. Second, TVM scores are accurate enough to demonstrate the effective alleviation of video quality degradation.

Figure 14 is a heat map of the ratio of the subjective score with LBVC enabled to the one with LBVC disabled under each frame rate. We calculate the score of each pair as the average of the two subjects’ scores. We observe that LBVC rarely worsens the video quality, and is able to alleviate the video quality degradation most of the time.
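The value plotted in each heat map cell can be computed as in the following Python sketch. The scores used are made-up numbers for illustration, not data from the study, and the function names are hypothetical.

```python
def pair_score(score_subject_1, score_subject_2):
    """Score of a pair: the average of the two subjects' scores."""
    return (score_subject_1 + score_subject_2) / 2.0


def enabled_to_disabled_ratio(enabled_scores, disabled_scores):
    """Ratio of the pair score with LBVC enabled to the pair score with LBVC disabled."""
    return pair_score(*enabled_scores) / pair_score(*disabled_scores)


# Hypothetical scores of one pair at one frame rate (0 to 100 scale).
ratio = enabled_to_disabled_ratio(enabled_scores=(70, 60), disabled_scores=(50, 54))
```

A ratio above 1 means the pair rated the chat with LBVC enabled higher than the chat at the same frame rate with LBVC disabled.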

From both Figure 13 and Figure 14, we observe that the video quality improvements are larger at 1 and 2 fps than those at other frame rates. This is because the extreme video jerkiness at 1 and 2 fps accentuates the smoothness brought by frame interpolation and frame rate control.

Finally, almost all the subjects claim that the impacts of the common delay during each video chat are acceptable and even negligible when video chats are performed at 4 frames per second and above. They also claim that there is no obvious desynchronization between audio and video during video chats.

From Tables 2 and 3, and Figure 13, we notice that at 4 fps, LBVC achieves about 35% bandwidth saving while still maintaining good video quality without introducing extra power consumption under typical video chat scenarios.

5. DISCUSSION AND FUTURE WORK

In this section, we discuss the assumptions and limitations of LBVC. Then, we discuss how to further improve LBVC in the future.

First, in the design, LBVC utilizes the inertial sensor readings to dynamically adapt the frame rate. One shortcoming of this approach is that it is unaware of the object movements in videos. For example, if the user’s head moves fast, but the handheld smartphone does not, this approach may not be able to instantly adapt the frame rate in order to compensate for the fast head movement. However, since the user’s hand and head are connected through the user’s body, it is usually the case that when the user’s head moves fast, the hands also move significantly. For this reason, we do not observe this shortcoming frequently.

Second, we note that LBVC may miss some chances of bandwidth savings when users are sitting in a moving vehicle. Frequent vehicle speed changes make the accelerometer readings highly dynamic and frequently trigger the switch to the higher default frame rate. To solve this problem, LBVC may first use sensors, such as the accelerometer and GPS, to detect whether a user is sitting in a vehicle or not. With the contextual knowledge of being in a vehicle, LBVC then only uses the gyroscope to sense smartphone vibrations. We will extend LBVC to address this special case in the future.

Third, in the design, LBVC switches to the default frame rate when it detects severe vibrations with respect to the thresholds of the input frame rate. The principle of this design is to provide enough quality for users under severe vibrations. However, LBVC can further reduce bandwidth through a more fine-grained frame rate adaption approach. In this approach, when a severe vibration is detected, LBVC does not necessarily switch to the default frame rate, which is higher than the other candidate frame rates. Instead, it can increase the frame rate to a candidate that is high enough for frame interpolation to compensate for the current vibration. With this approach, LBVC has much less chance to switch to the default frame rate, and consequently has more chance to reduce bandwidth usage. Meanwhile, it may worsen the overall video quality. Thus, to explore this alternative approach, we should also investigate its impacts on video quality. We leave this work for the future.
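One possible reading of this fine-grained approach is sketched below in Python. It is only an illustration of the idea, not an evaluated design: the candidate list reuses the thresholds of Table 1, the selection rule (take the lowest candidate at or above the user's setting whose thresholds are not exceeded) is an assumption, and the function name `fine_grained_frame_rate` is hypothetical.

```python
# (frame rate in fps, mA threshold in m/s^2, mG threshold in rad/s), from Table 1
CANDIDATES = [
    (1, 2.0, 2.5),
    (2, 3.0, 3.0),
    (4, 4.0, 3.2),
    (6, 4.5, 3.5),
    (8, 6.0, 4.0),
]
DEFAULT_FPS = 12


def fine_grained_frame_rate(m_a, m_g, user_fps):
    """Lowest candidate rate >= user_fps whose thresholds tolerate the vibration.

    Falls back to the default frame rate when no candidate tolerates it.
    """
    for fps, thr_a, thr_g in CANDIDATES:
        if fps >= user_fps and m_a <= thr_a and m_g <= thr_g:
            return fps
    return DEFAULT_FPS


calm = fine_grained_frame_rate(1.0, 1.0, user_fps=2)      # stays at the user's 2 fps
moderate = fine_grained_frame_rate(4.2, 1.0, user_fps=2)  # steps up to 6 fps, not 12
heavy = fine_grained_frame_rate(7.0, 1.0, user_fps=2)     # falls back to the default
```

Compared with the current two-level rule, the moderate case above would run at 6 fps instead of 12 fps, which is where the additional bandwidth saving would come from; its effect on video quality remains to be studied.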

Finally, in Section 4.1, we empirically set the threshold values in such a way that they can prevent obvious frame interpolation artifacts resulting from smartphone vibrations. However, we are aware that different users may have different subjective criteria for whether interpolation artifacts are obvious or not. In the future, instead of pinning down the thresholds manually for all users, we will explore the possibility of building a model that describes the relationship between frame rate and threshold values. Once the model is built, it can be instantiated for different users by learning the model parameters.

6. RELATED WORK

The initial idea of LBVC is presented in a poster [41]. We further improve our idea in 5 aspects: (i) we design a user-friendly interface to help users with varying video quality requirements set an appropriate frame rate to save bandwidth with confidence; (ii) we investigate the relationship between power consumption and frame rate, and the impact of our design on power consumption; (iii) we investigate bandwidth savings of our design in other mobility cases such as walking and sitting in a vehicle; (iv) we organize a user study and collect subjective scores to investigate whether our design can maintain subjects’ perceptual experience; (v) we also investigate other frame interpolation algorithms, such as the optical-flow based algorithm in [43].

In this section, we articulate how LBVC is different from existing works on video streaming bandwidth reduction, video bitrate adaption, video streaming power reduction and image interpolation.

Video Streaming Bandwidth Reduction. To lower the bandwidth usage of streaming videos, many video compression techniques [8] [17] [25] [29] [45] have been proposed. VP8 [8] and H.264 [17] utilize a motion compensation technique that exploits the temporal correlation among frames to reduce video sizes. SaVE [25] and SenseCoding [29] utilize accelerometers to simplify existing motion estimation methods. In addition to the aforementioned video compression techniques, existing works in compressed sensing such as [37] and context-aware compression [19] are also related.

In general, video compression techniques exploit various information from video frames to reduce the number of necessary bits to encode them. In contrast, LBVC is designed to reduce the number of frames to be encoded, which is equivalent to reducing the size of the input to video compression techniques. Thus, LBVC and video compression are orthogonal.

Video Bitrate Adaption. Video bitrate adaption techniques take into account various information, such as networking conditions [24] [33] [44], video contents [35] and user preferences [46], to control video streaming bitrate in order to maximize video utility (or video quality) for the video consumer [27]. Compared to these quality-oriented video bitrate adaption techniques, LBVC adjusts video chat bitrate to help users reduce their data plan quota usage and maintain satisfactory video quality.

Some recent works, such as QAVA [23] and TUBE [28], also propose techniques to help users economically use their data plan quota for video streaming. LBVC is different from these works from two perspectives. First, LBVC is designed to reduce the bandwidth usage of streaming live and interactive video chats instead of downloading existing videos. Second, QAVA and TUBE optimally perform the trade-off between video quality and bitrate (or bandwidth usage) at the video sender in order to save data usage as well as guarantee the quality of outgoing videos. In contrast, LBVC reduces video bitrate more aggressively (through frame rate adaption) at the video sender and does not guarantee the quality of outgoing videos. Instead, it utilizes extra techniques at both the video sender (vibration-aware frame rate control) and the receiver (frame interpolation) to guarantee the video quality at the receiver. In short, LBVC has more capacity to reduce bandwidth usage of video chats since it has more guarantees for video quality.

Video Streaming Power Reduction. In [38], an integrated power management method for video streaming is proposed. It achieves architectural power optimizations on CPU, register, and memory. It also includes an OS power-saving mechanism (dynamic voltage scaling) and adaptive middleware for video transcoding and network traffic regulation. Empirical measurements performed in [36] demonstrate that the chunk-based video streaming protocols are the most power efficient for downloading videos on mobile devices, since networking components on mobile devices have more chance to sleep. The method in [30] shortens the active duration of WiFi after each video prefetching and cuts off unnecessary video prefetchings with respect to users’ video viewing preferences. GreenTube [34] optimizes power consumption for mobile video downloading by judiciously scheduling downloading activities to minimize unnecessary active periods of the 3G/4G radio.

In contrast, LBVC is proposed mainly to reduce the bandwidth usage of video chat streaming on smartphones, although it can also reduce the power consumption of video chats, albeit not significantly.


Image Interpolation. Exploring Photobios [32] also utilizes the cross dissolve algorithm to interpolate intermediate frames between two well-aligned images with the purpose of creating a face animation with smooth face transitions. InterviewVideo [20] provides a tool for automatically placing interpolated frames to make smooth transitions in interview videos. Some methods such as [21] and [43] propose optical-flow based warping algorithms to interpolate the frames. However, as we report in Section 3.1.4, warping algorithms usually involve complex operations, such as image derivation and matrix multiplication. Thus, they are more computationally intensive than the cross dissolve algorithm [32] adopted in the LBVC implementation. Some works such as [47] utilize frame interpolation techniques to make up the lost frames in video transmission. Although LBVC also makes up frames, those frames are deliberately dropped (not passively lost as in [47]) to save bandwidth.

7. CONCLUSION

How to appropriately balance the mobile video quality and bandwidth usage is a relevant but hard problem. Particularly, we attempt to address the problem in the scenario of video chats by proposing LBVC. LBVC adapts the video frame rate with respect to the smartphone vibrations at the sender side and interpolates the ‘missing’ frames at the receiver side. Through real system experiments and a user study, we demonstrate that LBVC saves bandwidth by up to 43% under typical video chat scenarios at the expense of limited video quality degradation.

8. ACKNOWLEDGMENTS

The authors would like to express their gratitude to the reviewers who provided insightful comments and suggestions to improve the paper. The authors also wish to extend their thanks to the subjects who participated in the experiments and user study. This work was supported in part by the U.S. National Science Foundation under grants CNS-1250180 and CNS-1253506 (CAREER).

9. REFERENCES[1] Average duration of mobile video chat.

http://goo.gl/ow9uDT.

[2] ffmpeg. http://www.ffmpeg.org/.

[3] Juniper research report. http://goo.gl/F5qvGz.

[4] Linphone. www.linphone.org.

[5] Monsoon power monitor. http://goo.gl/spM1pt.

[6] Optical-flow execution time. http://goo.gl/vcS1px.

[7] Rate control in h.264. http://goo.gl/7pfUp8.

[8] Rfc6386 - vp8 data format and decoding guide.http://tools.ietf.org/html/rfc6386.

[9] Session description protocol. http://goo.gl/fcphyP.

[10] Shark for root. http://goo.gl/IRCUVw.

[11] Silk. http://dev.skype.com/silk.

[12] Skype bandwidth requirement. http://goo.gl/apzY9s.

[13] Tcpdump. http://www.tcpdump.org.

[14] Us video call statistics. http://goo.gl/C7yjcO.

[15] Wireshark. http://www.wireshark.org/.

[16] Methodology for the subjective assessment of the quality of television pictures. ITU-R Rec. BT.500-11, 2002.

[17] H.264: Advanced video coding for generic audiovisual service. International Telecommunication Union, 2013.

[18] White paper: Live and on-demand streaming video with lifesize uvc video center. March, 2012.

[19] Xuan Bao, Trevor Narayan, Ardalan Amiri Sani, Wolfgang Richter, Romit Roy Choudhury, Lin Zhong, and Mahadev Satyanarayanan. The case for context-aware compression. In Proc. of HotMobile 2011.

[20] Floraine Berthouzoz, Wilmot Li, and Maneesh Agrawala. Tools for placing cuts and transitions in interview video. ACM Trans. Graph., 31(4):67:1–67:8, July 2012.

[21] Thomas Brox, Andrés Bruhn, Nils Papenberg, and Joachim Weickert. High accuracy optical flow estimation based on a theory for warping. In Proc. of ECCV 2004.

[22] An (Jack) Chan, Amit Pande, Eilwoo Baik, and Prasant Mohapatra. Temporal quality assessment for mobile videos. In Proc. of Mobicom ’12.

[23] Jiasi Chen, Amitabha Ghosh, Josphat Magutt, and Mung Chiang. Qava: quota aware video adaptation. In Proc. of CoNEXT 2012.

[24] Jiasi Chen, Rajesh Mahindra, Mohammad Amir Khojastepour, Sampath Rangarajan, and Mung Chiang. A scheduling framework for adaptive video delivery over cellular networks. In Proc. of MobiCom 2013.

[25] Xiaoming Chen, Zhendong Zhao, Ahmad Rahmati, Ye Wang, and Lin Zhong. Save: sensor-assisted motion estimation for efficient h.264/avc video encoding. In Proc. of ACMMM 2009.

[26] Hossein Falaki, Ratul Mahajan, et al. Diversity in smartphone usage. In Proc. of MobiSys 2010.

[27] Shih-Fu Chang and Anthony Vetro. Video adaptation: Concepts, technologies, and open issues. In Proc. of the IEEE, 2005.

[28] Sangtae Ha, Soumya Sen, Carlee Joe-Wong, Youngbin Im, and Mung Chiang. Tube: time-dependent pricing for mobile data. In Proc. of SIGCOMM 2012.

[29] Guangming Hong, Ahmad Rahmati, Ye Wang, and Lin Zhong. Sensecoding: Accelerometer-assisted motion estimation for efficient video encoding. In Proc. of ACMMM 2008.

[30] Mohammad Ashraful Hoque, Matti Siekkinen, and Jukka K. Nurminen. Using crowd-sourced viewing statistics to save energy in wireless video streaming. In Proc. of MobiCom 2013.

[31] Lorenzo Keller, Anh Le, Blerim Cici, Hulya Seferoglu, Christina Fragouli, and Athina Markopoulou. Microcast: cooperative video streaming on smartphones. In Proc. of MobiSys 2012.

[32] Ira Kemelmacher-Shlizerman, Eli Shechtman, Rahul Garg, and Steven M. Seitz. Exploring photobios. In Proc. of ACM SIGGRAPH 2011.

[33] Sunhun Lee and Kwangsue Chung. Combining the rate adaptation and quality adaptation schemes for wireless video streaming. J. Vis. Comun. Image Represent., 19(8):508–519, December.

[34] Xin Li, Mian Dong, Zhan Ma, and Felix C.A. Fernandes. Greentube: power optimization for mobile videostreaming via dynamic cache management. In Proc. of ACMMM 2012.

[35] Ying Li, Zhu Li, Mung Chiang, and A. Robert Calderbank. Content-aware distortion-fair video streaming in congested networks. IEEE Trans. Multi., 11(6):1182–1193, October.

[36] Yao Liu, Lei Guo, Fei Li, and Songqing Chen. An empirical evaluation of battery power consumption for streaming data transmission to mobile devices. In Proc. of ACMMM 2011.

[37] Jia Meng. Compressive sensing in wireless communications. In Ph.D. Thesis, UNIVERSITY OF HOUSTON, 2010, 121 pages.

[38] Shivajit Mohapatra, Radu Cornea, Nikil Dutt, Alex Nicolau, and Nalini Venkatasubramanian. Integrated power management for video streaming to mobile handheld devices. In Proc. of ACMMM 2003.

[39] Yen-Fu Ou, Tao Liu, Zhi Zhao, Zhan Ma, and Yao Wang. Modeling the impact of frame rate on perceptual quality of video. In Proc. of ICIP 2008.

[40] Andrew J. Pyles, Xin Qi, Gang Zhou, Matthew Keally, and Xue Liu. Sapsm: Smart adaptive 802.11 psm for smartphones. In Proc. of UbiComp 2012.

[41] Xin Qi, Qing Yang, David T. Nguyen, and Gang Zhou. Context-aware frame rate adaption for video chat on smartphones. In Adjunct Proc. of UbiComp 2013.

[42] Ahmad Rahmati and Lin Zhong. Context-for-wireless: Context-sensitive energy-efficient wireless data transfer. In Proc. of MobiSys 2007.

[43] Lars Lau Raket, Lars Roholm, Andrés Bruhn, and Joachim Weickert. Motion compensated frame interpolation with a symmetric optical flow constraint. In Proc. of ISVC 2012.

[44] Reza Rejaie, Mark Handley, and Deborah Estrin. Quality adaptation for congestion controlled video playback over the internet. In Proc. of SIGCOMM 1999.

[45] Shu Shi, Cheng-Hsin Hsu, Klara Nahrstedt, and Roy Campbell. Using graphics rendering contexts to enhance the real-time video coding for mobile cloud gaming. In Proc. of ACMMM 2011.

[46] Wei Song, Dian Tjondronegoro, and Michael Docherty. Saving bitrate vs. pleasing users: where is the break-even point in mobile video quality? In Proc. of ACMMM 2011.

[47] Jonathan Su and Russell M. Mersereau. Motion-compensated interpolation of untransmitted frames in compressed video. In Proc. of 30th Asilomar Conf. on Signals, Systems and Computers, Asilomar, USA, 1996.

[48] Yao Wang, Ya-quin Zhang, and Joern Ostermann. Video Processing and Communications. 2001.

[49] Li Zhang, Dhruv Gupta, and Prasant Mohapatra. How expensive are free smartphone apps? SIGMOBILE Mob. Comput. Commun. Rev., 16(3):21–32, December 2012.

[50] Lide Zhang, Birjodh Tiwana, Zhiyun Qian, et al. Accurate online power estimation and automatic battery behavior based power model generation for smartphones. In Proc. of CODES/ISSS 2010.
