
IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 14, NO. 2, APRIL 2012

QoE Prediction Model and its Application in Video Quality Adaptation Over UMTS Networks

Asiya Khan, Lingfen Sun, and Emmanuel Ifeachor

Manuscript received January 05, 2011; revised May 31, 2011 and October 29, 2011; accepted November 01, 2011. Date of publication November 16, 2011; date of current version March 21, 2012. This work was supported in part by the EU FP7 ADAMANTIUM project (contract No. 214751). The associate editor coordinating the review of this manuscript and approving it for publication was Prof. James E. Fowler. The authors are with the Centre for Signal Processing and Multimedia Communication, School of Computing and Mathematics, University of Plymouth, Plymouth PL4 8AA, U.K. (e-mail: [email protected]; [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TMM.2011.2176324

Abstract—The primary aim of this paper is to present a new content-based, non-intrusive quality of experience (QoE) prediction model for low bitrate and resolution (QCIF) H.264 encoded videos and to illustrate its application in video quality adaptation over Universal Mobile Telecommunication Systems (UMTS) networks. The success of video applications over UMTS networks very much depends on meeting the QoE requirements of users. Thus, it is highly desirable to be able to predict and, if appropriate, to control video quality to meet such QoE requirements. Video quality is affected by distortions caused both by the encoder and the UMTS access network. The impact of these distortions is content dependent, but this feature is not widely used in non-intrusive video quality prediction models. In the new model, we chose four key parameters that can impact video quality and hence the QoE: content type, sender bitrate, block error rate and mean burst length. The video quality was predicted in terms of the mean opinion score (MOS). Subjective quality tests were carried out to develop and evaluate the model. The performance of the model was evaluated with an unseen dataset with good prediction accuracy. The model also performed well with the LIVE database, which was recently made available to the research community. We illustrate the application of the new model in a novel QoE-driven adaptation scheme at the pre-encoding stage in a UMTS network. Simulation results in NS2 demonstrate the effectiveness of the proposed adaptation scheme, especially at the UMTS access network, which is a bottleneck. An advantage of the model is that it is light weight (and so it can be implemented for real-time monitoring), and it provides a measure of user-perceived quality, but without requiring time-consuming subjective tests. The model has potential applications in several other areas, including QoE control and optimization in network planning and content provisioning for network/service providers.

Index Terms—Content types, MOS, non-intrusive model, NS2, QoE, UMTS, video quality prediction and adaptation.

I. INTRODUCTION

TRANSMISSION of video content over Universal Mobile Telecommunication Systems (UMTS) networks is growing exponentially and gaining popularity. Digital videos are now available everywhere—from handheld devices to personal computers. However, due to the bandwidth constraints of UMTS networks, quality of experience (QoE) remains a concern. This is because low video quality leads to poor QoE, which in turn leads to reduced usage of the applications/services and hence reduced revenues. User expectation of video quality over handheld and mobile terminals is increasing to that of broadcast level [1], [2].

In order to meet the QoE requirements of users, there is a need to predict, monitor and, if necessary, control video quality. Non-intrusive models provide an effective and practical way to achieve this [3]. However, research on video quality modeling is still limited. ITU-T Study Group 9 [4] is working on the standardization of non-intrusive video quality modeling and has recently produced a draft version of the test plan for the hybrid perceptual/bitstream models for IPTV and mobile video streaming applications. Existing non-intrusive models do not take into account the impact of several important parameters, such as video content, which affects the video quality achievable under the same network conditions, as well as the encoder. Video content features (such as blurriness, brightness, etc.) can be extracted from the video signal before encoding the video. Existing video quality prediction algorithms tend to consider either video content features or the effects of distortions caused by the encoder or network impairments, but rarely all three. In addition, they are restricted to IP networks. However, with the growth of video services over wireless access networks it is important to take into account impairments that occur in the access network. Video quality can be measured in an intrusive (full reference) or non-intrusive (reference free) way. Intrusive prediction models require access to the source, whereas non-intrusive models do not and hence are preferred for online monitoring, prediction and control. Video quality prediction models presented in [5]–[8] consider encoder-based distortions only, and those in [9] and [10] are based on video content features. The model presented in [9], called the MOVIE index, is a full reference model developed from the spatio-temporal features of the video, whereas the model presented in [10] is reference free. Full reference video quality prediction models are difficult to implement for real-time monitoring due to their complexity. Work presented in [11] uses bitstream measurements only to monitor video quality. The model presented in [12] combines video content features with the distortions caused by the encoder. Work in [13] presents a metric that measures temporal quality degradation caused by regular and irregular frame loss. Several models have also been developed to predict video quality [14], [15] over IP networks from network distortions only, e.g., packet loss, delay and jitter.


Work presented in [16] proposes a metric that models multiple packet losses in H.264 videos using reduced reference methods, but does not consider either encoder-based distortions or video content features. Work presented in [17] uses neural networks to assess video quality based on a combination of parameters from the network and encoder, but this is restricted to IP networks with limited content. In [18] a model for videophone applications is proposed. The model uses two encoder parameters (sender bitrate and frame rate) and packet loss from the network. The model is for videophone services only (i.e., content with head and shoulder movement) with high sender bitrates over IP networks. In [19] a video quality estimation model is proposed for IPTV applications. The model combines network parameters with encoder-based parameters.

Recent work on video quality assessment [20]–[22] has shown that video quality is affected by parameters associated with the encoder (e.g., sender bitrate) and the network (e.g., packet loss). In [23] a review of the evolution of video quality metrics and a discussion of the state of the art are given. A metric that combines network losses with video bitstream information is also presented. In a previous paper [24] we showed that video quality is impacted by distortions caused by the encoder and access network. These distortions are very much content dependent. Work presented in [25] also concluded that content type was the second most important QoS parameter after encoder type and has a significant impact on video quality. We proposed in [26] a video quality prediction model over UMTS networks that combines parameters associated with the encoder and the UMTS access network, but it is based on peak-signal-to-noise-ratio (PSNR) to MOS conversion, which may not adequately reflect visual quality. What is needed is a model that is based on subjective tests, so that the quality predicted by the model is closely linked to user-perceived quality. The model should be efficient and light weight and suitable for all types of video content, so that it can be implemented at the receiver side to monitor and predict quality and, if appropriate, control the end-to-end perceived quality.

Thus, the focus of this study is to develop a new content-based, non-intrusive video quality prediction model that takes into account distortions caused by the UMTS access network and the encoder. Our work focuses on low bitrate videos encoded with the H.264 codec transmitted over a UMTS access network, taking into account content types. The most significant content types were identified in a previous study [27] and were classified into groups using a combination of temporal (movement) and spatial (edges, blurriness, brightness) features using a cluster analysis tool [28]. As part of the model development, we conducted subjective tests with different test scenarios. The test cases were prepared by considering the distortions introduced by the encoder and the UMTS access network for different types of content (from head and shoulders to fast moving sports).

The new model was used to control and optimize QoE to demonstrate its application. The application is of interest in its own right, as the optimization of QoE is crucial for mobile multimedia design and delivery. The idea in this application is to move away from the use of individual network parameters, such as block loss or delay, to control performance, and instead to move towards perceptual-based video quality control, in order to achieve the best possible end-to-end video quality [10], [29], [30].

TABLE I
VIDEO CONTENT DESCRIPTION

In this study, we used fuzzy techniques to perform adaptation as they provide a natural way to perform control and because they have been successfully used in network adaptation schemes for video applications before [31]–[33]. In particular, they have been used in adaptive feedback for packet loss rate and congestion notification from routers [31], adaptive control of video bit rate [32], and in control algorithms for variable bit rate applications [33].

The main contributions of the paper are twofold.

• A new and efficient model to predict video quality over UMTS networks non-intrusively. The model uses a combination of parameters associated with the encoder and the UMTS access network for different types of content.

• Application of the new model in QoE control using a new sender bitrate adaptation scheme at the pre-encoding stage.

The rest of the paper is organized as follows. Section II presents the new model. In Section III we demonstrate the application of the proposed model in QoE control. Section IV concludes the paper and highlights areas of future work.

II. QOE PREDICTION MODEL FOR H.264 VIDEO

In this section, we present the development of the non-intrusive content-based QoE prediction model for low bitrate H.264 video for mobile streaming applications. First, we describe in subsection A the generation of the data sets used to develop the model, and in subsection B the subjective tests are described. Data analysis and modeling are presented in subsection C, and model validation with external databases is presented in subsection D.

A. Data Set Generation

The content of the source clips and the choice of codec were chosen to be representative of a typical scenario for watching video on mobile devices. The test material comprises six clips—three chosen for model training and three for validation. The video sequences represent contents with low spatio-temporal (ST) to high ST features, as classified in a previous study [27]. The videos were encoded with the H.264/AVC [34] codec as it is the recommended codec for video transmission over UMTS networks. The considered frame structure is IPPP for all the sequences, since the use of periodic IDR-frames (Instantaneous Decoding Refresh within the H.264 encoder) could cause sudden data rate increases and delays.

The video contents, description and duration of the clips are summarized in Table I. The video frame size was QCIF (176 × 144). QCIF was specifically chosen (instead of, for example, CIF or larger sizes) as it is the recommended size for mobile phones and small handheld terminals, which are the target application areas of the study.


Fig. 1. Snapshot of the training and validation sequences.

TABLE II
TEST CONDITIONS

However, newer smart phones have a resolution of 320 × 240, or even higher on, e.g., iPads. Thus, the QCIF resolution may seem small, and in this sense our study reflects the worst case scenario. All the generated video clips can be downloaded from [35]. Snapshots of the video clips are depicted in Fig. 1.

A combination of parameters associated with the encoder and the UMTS access network for different content types was chosen (see Table II). We chose six content types (CTs)—three for model training and three for validation. Video sequences of Akiyo, Foreman and Stefan were used for training, and sequences of Suzie, Carphone and Football for validation of the model. The frame rate (FR) was fixed at 10 fps as this is typical in mobile video streaming [2], [6], [12]. We found that SBR has a greater impact on quality than FR and hence SBR was chosen. The videos were encoded with three Constant Bitrate (CBR) SBR values of 48 kbps, 88 kbps and 128 kbps for the training videos, and 90 kbps and 130 kbps for the validation videos. The UMTS access network parameters chosen were block error rate (BLER) and mean burst length (MBL). In the access network we found that BLER is the most important parameter [24]. MBL was chosen to account for more bursty scenarios. The videos were then sent over an OPNET [36] simulated UMTS network to create conditions with BLER of 1, 5, 10, 15 and 20%. A BLER of 20% corresponds to an IP loss of 2%–3%, so the quality was not degraded beyond 20% BLER. We found that a BLER of 1%–5% corresponded to no IP loss. OPNET was used to analyze the specific impact of the UMTS error conditions on the perceived video quality, due to its accurate implementation of the radio-link-control (RLC) not-in-order delivery mechanism. Error simulated in the physical layer (BLER) was employed to generate losses at the link layer, modeled with a 2-state Markov model [37] with variable MBLs [38] to depict the various UMTS scenarios. The 2-state Markov model is depicted in Fig. 2. According to this model, the network is either in a good (G) state, where all packets are correctly delivered, or in a bad (B) state, where all packets are lost. Transitions between the states (from G to B and vice versa) occur with probabilities p_GB and p_BG. The average block error rate and mean burst length can be expressed as BLER = p_GB / (p_GB + p_BG) and MBL = 1 / p_BG. If MBL = 1, this reduces to a random error model, with the only difference that the loss of two consecutive packets is not allowed. MBL = 1.75 is selected based on the mean error burst length found in [38] for typical roaming scenarios from real-world UMTS measurements, MBL = 2.5 is used for a scenario where more bursty errors are found, while MBL = 1 is for the random uniform error model.
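As an illustration of this loss model, the short Python sketch below (an assumption for illustration only; the actual losses in this work were generated with OPNET) derives the transition probabilities from a target BLER and MBL and produces a block-loss pattern.

import random

def markov_loss_pattern(n_blocks, bler, mbl, seed=0):
    # 2-state (Gilbert) model: p_bg = 1 / MBL, and p_gb is chosen so that the
    # steady-state loss rate p_gb / (p_gb + p_bg) equals the target BLER.
    rng = random.Random(seed)
    p_bg = 1.0 / mbl
    p_gb = p_bg * bler / (1.0 - bler)
    losses, bad = [], False
    for _ in range(n_blocks):
        losses.append(1 if bad else 0)      # 1 = block lost (bad state)
        if bad:
            bad = rng.random() >= p_bg      # remain bad with probability 1 - p_bg
        else:
            bad = rng.random() < p_gb       # enter the bad state with probability p_gb
    return losses

pattern = markov_loss_pattern(100000, bler=0.10, mbl=1.75)
print("empirical BLER:", sum(pattern) / len(pattern))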

Fig. 2. Two-state Markov loss model.

Fig. 3. Experimental setup for data generation.

All the chosen test conditions were sent over the simulated network to generate test data with network impairments. This was done specifically because only limited work in the literature on video quality assessment takes network errors into account. A contribution of the paper is to look at the combined effects of distortions caused by both the encoder and access network impairments on end-to-end quality. The video test conditions are described in Table II. The experimental set up is shown in Fig. 3. In total, 81 sequences were generated for training and 54 for model validation.

B. Subjective Tests

The subjective quality assessment experiment follows ITU-T Recommendations [39] and was conducted using the single-stimulus absolute category rating (ACR) method with a five-point quality scale [39]. Each processed video is presented one at a time and rated individually. The presentation order was randomized between viewers such that each one of them viewed the test sequences in a different presentation order. The videos were rated independently on a discrete 5-level scale from “bad” (1) to “excellent” (5). The ratings for each test clip were then averaged over all subjects to obtain a mean opinion score (MOS) [40]. The voting period was not time-limited. After choosing their quality rating, assessors had to confirm their choice using the “submit” button. This approach gave subjects the possibility to change their mind before committing to their final vote. Viewing distance was not fixed. Participants were allowed to adjust to their most comfortable viewing distance, although they were instructed to keep their back in contact with the chair.

The laboratory had a calibrated 20-inch LCD computer monitor (Philips 200WB7) to display the video sequences. The display had a native resolution of 1280 × 1024 pixels and the color quality was set to highest (32 bit). The room had a white background. The sequences were displayed in their original size with a grey border. Participants provided their ratings electronically using the computer mouse.

A total of 20 naïve viewers participated in the experiment—11 males and 9 females. This conforms to the minimum number of viewers specified by ITU-T Recommendations [39].


Fig. 4. Histogram of subjective MOS (dashed line represents median at 3.7).

The age range of 14 participants was between 18 and 25; 4 were between 25 and 30 and 2 were over 35. Participants were recruited from within the University. They were first presented with three training sequences that were different from the test sequences. The experiments were divided into two sessions with a 10–15 min comfort break between them, spread over three days. This adhered to the ITU-T recommendation that a session should not exceed half an hour. An informal survey was conducted after the tests regarding the length of the study, fatigue during the tests, etc., which indicated that no participant experienced fatigue or discomfort during the tests. All subjects assessed all degraded video sequences in the test.

The MOS data obtained from the test was screened for unreliable and inconsistent results. We used the ITU-T [39] criteria for screening subjective ratings, which led to three subjects being rejected. The scores from the rest of the subjects were averaged to compute the overall MOS for each test condition. Fig. 4 shows the histogram of subjective quality ratings for both the training and validation datasets. Fig. 4 indicates that the MOS distribution is biased towards high MOS values. This is due to the fact that subjects gave a higher vote to sequences with low to medium movement. Also, in the test conditions the data were not degraded beyond 3% IP loss, and so no sequences were rated as “bad” (see later Section IV).
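As a rough sketch of this post-processing step (the paper applies the full ITU-T [39] screening procedure; the correlation-based check, its threshold and the panel ratings below are simplifying assumptions), the per-condition MOS and confidence intervals can be computed as follows.

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical panel: 20 subjects rating 81 test conditions on the 5-point ACR scale.
true_quality = rng.uniform(1.5, 4.5, size=81)
ratings = np.clip(np.round(true_quality + rng.normal(0, 0.5, size=(20, 81))), 1, 5)

# Simplified screening: drop subjects whose scores correlate poorly with the
# panel mean (a stand-in for the ITU-T screening criteria).
panel_mean = ratings.mean(axis=0)
keep = [s for s in range(ratings.shape[0])
        if np.corrcoef(ratings[s], panel_mean)[0, 1] >= 0.75]   # assumed threshold

mos = ratings[keep].mean(axis=0)                                 # MOS per test condition
ci95 = 1.96 * ratings[keep].std(axis=0, ddof=1) / np.sqrt(len(keep))
print(f"{len(keep)} subjects retained; condition 0: MOS {mos[0]:.2f} +/- {ci95[0]:.2f}")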

C. Data Analysis and Modeling

We analyzed the relationships of the four chosen parameters that impact QoE (sender bitrate, content type, block error rate and mean burst length) with end-to-end video quality. This enabled us to establish a relationship between these four parameters and MOS, which is used in the regression modeling (see later). We performed a 4-way repeated-measures analysis of variance (ANOVA) [28] on our training dataset to determine whether the means in the MOS data set given by the four QoE parameters differ when grouped by multiple factors (i.e., the impact of all four parameters on MOS). ANOVA would also enable us to understand the interactions of the four variables and hence their relationships in the regression modeling.
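The sketch below illustrates this type of analysis with a standard factorial ANOVA in Python (statsmodels); it is a simplification of the 4-way repeated-measures procedure used here, and the data it fits are synthetic stand-ins for the training conditions.

import itertools
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
rows = []
# Synthetic full-factorial design standing in for the training conditions.
for ct, sbr, bler, mbl in itertools.product([0.1, 0.5, 0.9],      # content type class
                                            [48, 88, 128],        # SBR, kbps
                                            [1, 5, 10, 15, 20],   # BLER, percent
                                            [1.0, 1.75, 2.5]):    # MBL
    mos = 2.5 + 0.8 * np.log(sbr / 48) - 0.05 * bler * ct + rng.normal(0, 0.1)
    rows.append({"CT": ct, "SBR": sbr, "BLER": bler, "MBL": mbl,
                 "MOS": float(np.clip(mos, 1, 5))})
df = pd.DataFrame(rows)

# Main effects plus selected two-way interactions.
terms = "C(CT) + SBR + BLER + MBL + C(CT):SBR + C(CT):BLER + SBR:BLER + BLER:MBL"
model = ols("MOS ~ " + terms, data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # F statistic and p-value per term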

TABLE III
ANOVA RESULTS FOR MAIN AND INTERACTION EFFECTS

Table III shows the results of the analysis. The fourth column shows the F statistic and the fifth column gives the p-value, which is derived from the cumulative distribution function (cdf) of F [28]. A small p-value indicates that MOS is significantly affected by the corresponding parameter. From the results, we may conclude that MOS is affected by CT, SBR and BLER, whereas MBL is not as significant. However, there were interaction effects between each pair of parameters. The two-way interaction between CT and SBR has a greater impact compared to that of CT and BLER. This was due to the fact that in our data we only considered limited values of BLER. BLER had an impact only on fast moving contents. This is due to the fact that, as long as the IP packet error probability remains unchanged, the impact of link layer losses on end-to-end video quality is negligible. With three-way interactions, the combination of CT, SBR and MBL and that of SBR, BLER and MBL have a significant impact compared to the other two. The two-way interactions capture the three-way impact as well.

We can summarize our findings as follows.

1) The most important QoE parameter in the application layer is content type. Therefore, an accurate video quality prediction model should consider all content types.

2) The optimal SBR that gives the best quality is very much content dependent and varies from sequence to sequence. We found that for slow moving content even the lowest SBR gave acceptable quality [40]. However, as the spatio-temporal activity of the content increased, the quality became unacceptable even under no network impairment. Hence the choice of SBR is very much dependent on the type of content.

3) The most important QoE parameter in the UMTS access network is BLER. Therefore, an accurate video quality prediction model should consider the impact of access network parameters in addition to encoder parameters.

4) The impact of the access network parameters MBL and BLER varies depending on the type of content. For slow moving content, a BLER of 20% gave acceptable quality. However, for fast moving content with the same BLER the quality may not be acceptable. Therefore, the impact of access network QoE parameters is very much content dependent.


Fig. 5. Pie chart representation of the ranking of QoE parameters.

Fig. 6. Functional block of proposed regression-based model.

Based on our findings we have ranked the four QoE parameters in order of importance. The most important is ranked as 1 and the least important as 4. The ranking is based on the results of the ANOVA analysis carried out earlier. Our analysis showed that CT is the most important QoE parameter, as the impact of SBR and BLER is very much content dependent. Similarly, we found MBL to be the least important. The degree of importance of each QoE parameter is depicted in the form of a pie chart in Fig. 5. We note from Fig. 5 that BLER (25%) and MBL (22%) are close in importance. Similarly, BLER (25%) and SBR (24%) are very close in order of importance. The pie chart takes as its input the interactions of the four QoE parameters.

The ANOVA analysis shown in Table III confirmed that there are interactions between the four QoE parameters. This enabled us to carry out a nonlinear regression analysis on the datasets using different polynomial and rational equations to capture these effects in a model. The functional block of the proposed model is shown in Fig. 6. The application layer parameters considered are content type (CT) and sender bitrate (SBR). The UMTS access network parameter in the physical layer is BLER, modeled with a 2-state Markov model with varying MBLs.

We plotted 2-D graphs to analyze the relationship between MOS, sender bitrate, content type, block error rate and mean burst length [see Fig. 7(a)–(c)]. Based on the relationships of the QoE parameters we established a function of the following general form for estimating the overall video quality:

MOS = f(c, I_A, I_N)    (1)

where c is a constant, I_A is measured in terms of SBR and CT, and I_N is measured in terms of BLER and MBL for the 2-state Markov model.

Fig. 7. Relationships of the QoS parameters with MOS.


TABLE IV
COEFFICIENTS OF QOE PREDICTION MODEL

TABLE V
MODEL VALIDATION CORRELATION COEFFICIENTS AGAINST [25] AND [41]

In our previous work [27], we extracted the temporal feature of the sum of absolute differences (SAD) and the spatial features of edge, blurriness and brightness, thus giving CT as a function of these features, as shown in (2):

CT = f(SAD, edge, blurriness, brightness)    (2)

CT is then predicted using cluster analysis [28] as three discrete values of 0.1 (Suzie), 0.5 (Carphone) and 0.9 (Football). See Section III for details.

1) Content Type and Encoder (I_A): From the ANOVA analysis and Fig. 7(a), we established the relationship between MOS and SBR and CT. We found that there is a logarithmic relationship between SBR and MOS and a linear relationship between MOS and CT. Also from the ANOVA analysis (Table III), we found that the combined impact of SBR and CT is significant. Mathematically, this is captured by a term of the form shown in (3):

I_A = a1 + a2·CT + a3·ln(SBR) + a4·CT·ln(SBR)    (3)

2) UMTS Access Network (I_N): Similarly, in the UMTS access network we found that the relationships between MOS and BLER and MBL, shown in Fig. 7(b) and (c), can be modeled as a polynomial function of the form shown in (4):

I_N = b1·BLER + b2·BLER^2 + b3·MBL·BLER    (4)

Based on the relationships obtained for the four chosen QoE parameters from the ANOVA analysis and Fig. 7(a)–(c), we found the following rational model from nonlinear regression analysis of the subjective data using MATLAB:

MOS = R(I_A, I_N)    (5)

where R(·) is a rational function combining the application-layer term I_A and the access-network term I_N. The values of the coefficients of the model are listed in Table IV. Equation (5) can be readily extended to include FR as a variable to account for the impact of frame rate on quality. The correlation coefficient and root mean squared error (RMSE) achieved by the QoE prediction model against the external databases are given later in Table V.

Fig. 8 shows the resulting scatter plot of subjective data against model prediction. We achieved a correlation coefficient of around 93% for the validation dataset and 95.6% for the training dataset. There were 81 test conditions for the training and 54 test conditions for the validation.
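To make the fitting step concrete, the sketch below shows how a rational QoE model of this kind can be fitted with nonlinear least squares in Python (scipy); the functional form, coefficients and data here are illustrative assumptions, not the published equation (5) or Table IV.

import itertools
import numpy as np
from scipy.optimize import curve_fit

def qoe_model(X, a1, a2, a3, a4, b1, b2):
    # Illustrative rational form: I_A linear in CT and logarithmic in SBR
    # (with an interaction), divided by 1 plus a simple network term I_N.
    ct, sbr, bler, mbl = X
    i_a = a1 + a2 * ct + a3 * np.log(sbr) + a4 * ct * np.log(sbr)
    i_n = b1 * bler + b2 * bler * mbl
    return i_a / (1.0 + i_n)

# Synthetic training grid standing in for the subjective test conditions.
grid = list(itertools.product([0.1, 0.5, 0.9],           # CT
                              [48, 88, 128],             # SBR, kbps
                              [0.01, 0.05, 0.10, 0.20],  # BLER, fraction
                              [1.0, 1.75, 2.5]))         # MBL
X = np.array(grid).T
rng = np.random.default_rng(1)
y = qoe_model(X, 0.5, -0.8, 0.9, -0.2, 2.0, 1.0) + rng.normal(0, 0.05, X.shape[1])

popt, _ = curve_fit(qoe_model, X, y, p0=np.ones(6))       # nonlinear least squares
pred = qoe_model(X, *popt)
rmse = np.sqrt(np.mean((pred - y) ** 2))
print(f"RMSE = {rmse:.3f}, correlation = {np.corrcoef(pred, y)[0, 1]:.3f}")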

Fig. 8. Scatter plot of subjective video quality against quality prediction from model.

D. Validation of the Proposed Model

This section shows the validation of the proposed model with the external MOS databases given in [25] and [41]. The database in [25] considers H.264 encoder-based distortions only (i.e., no network-dependent distortions). The data is for H.264 encoded QCIF videos with the encoder parameters of SBR and FR for five video sequences. We chose the encoder-only parameters and CT to show that, if network losses are taken as zero in (5), the QoE model gives a correlation coefficient of 78%. In this case (5) reduces to (6), the encoder-only part of the model:

MOS = R(I_A, 0)    (6)

Fig. 9(a) shows the model validation results on the external database from [25].

The LIVE database in [41] considers H.264 encoder-based distortions along with network packet losses for ten types of video sequences. We have used the LIVE database that was made available for the research community to use. The subjective quality measure in this database is based on the Degradation MOS (DMOS) as opposed to MOS. The data in Fig. 9(b) is from LIVE, which is for H.264 videos of size 768 × 480 with high SBRs and packet losses. The data is mapped such that packet losses are taken as BLER and SBR values are taken according to the LIVE database. CT is calculated from (2) for the ten video sequences. The frame rate was fixed at 30 fps. We achieved a good correlation coefficient for the data from [41] (see Table V).

Fig. 9(b) shows the model validation results on the LIVE database. Fig. 9(b) shows that our model overestimates when DMOS is low (10–20) and underestimates when DMOS is between 35–45. DMOS from 16–32 would be equivalent to a MOS of 4, and DMOS from 0–15 would be equivalent to a MOS of 5. The model overestimates because the MOS dataset used to develop the proposed model was biased towards high MOS values. This seems to be a limitation of the existing dataset and will be addressed in future work. Similarly, DMOS from 33–48 represents a MOS of 3. Model performance between DMOS 35–45 shows little variation in some points only. This could be due to the little variation in the content types in the LIVE dataset.



Fig. 9. (a) Validation of proposed model from subjective data in [25]. (b) Validation of proposed model from subjective data in [41].

The model performance is good over the rest of the DMOS range. In addition, the variation in content types in the LIVE dataset is very little: the content classifier classified most of the content as slow to medium activity, and hence this added to the two clusters in Fig. 9(b). Further, the LIVE wireless video database considered videos encoded at maximum frame rates and SBRs, whereas our model is derived from subjective data where the test conditions included only one fixed frame rate (10 fps) and low sender bitrates. However, there is evidence in the literature [7] that subjective quality varies with frame rate and that this is a result of the interaction between temporal and spatial dimensions. At present, the model does not take this aspect into account because this is not the focus of this study, and this may explain the bias on certain points in the LIVE wireless video database.

We did not validate against the VQM FRTV1 dataset [42] as it is limited to the MPEG-4 and H.263 codecs and, in addition, it does not take into account the impact of network losses.

Fig. 10. Conceptual diagram to illustrate QoE-driven adaptation.

Fig. 11. Content classification method.

Table V summarizes the correlation coefficient and root mean squared error of our model against the external databases of [25] and [41].
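For completeness, the following small Python sketch shows how such validation figures (Pearson correlation and RMSE between model predictions and subjective scores) can be computed; the score arrays and the DMOS-to-MOS mapping used here are hypothetical illustrations only.

import numpy as np
from scipy.stats import pearsonr

# Hypothetical model predictions and subjective scores from an external database.
predicted_mos = np.array([4.1, 3.8, 3.3, 2.9, 2.4, 3.6, 4.0, 2.7])
subjective    = np.array([4.3, 3.6, 3.4, 3.0, 2.2, 3.5, 4.2, 2.5])

r, p_value = pearsonr(predicted_mos, subjective)
rmse = np.sqrt(np.mean((predicted_mos - subjective) ** 2))
print(f"Pearson r = {r:.3f} (p = {p_value:.3g}), RMSE = {rmse:.3f}")

# When the external database reports DMOS instead of MOS, a simple linear
# mapping (an assumption for illustration only) can bring it onto the 1-5 scale
# before comparison; here DMOS 0 maps to MOS 5 and DMOS 64 maps to MOS 1.
dmos = np.array([12.0, 25.0, 38.0, 44.0])
print(5.0 - 4.0 * dmos / 64.0)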

III. APPLICATION OF THE PROPOSED MODEL IN QOE-BASED SENDER BITRATE ADAPTATION

In this section we demonstrate the application of the new model in sender bitrate adaptation at the sender side. The optimization of QoE is crucial for mobile multimedia design and delivery. The conceptual diagram of our proposed QoE-driven sender bitrate adaptation is given in Fig. 10. The QoE-based prediction model derived earlier is used in the QoE-based sender bitrate adaptive control mechanism, as shown in Fig. 10.

Subsection A describes the content classification method, the QoE-driven adaptation scheme is described in subsection B, and subsection C describes the evaluation set-up of the proposed scheme. Results are discussed in subsection D.

A. Content Classification

The video content classification [27] is carried out from the degraded videos at the receiver side by extracting the spatial and temporal features and using a well-known multivariate statistical analysis technique called cluster analysis [28]. This technique is used as it groups samples that have various characteristics into similar groups. The spatio-temporal metrics have a low complexity and thus can be extracted from the videos in real time. The spatial features extracted are edge, blurriness and brightness, whereas the temporal feature extracted is the sum of absolute difference values. Based on the extracted spatio-temporal features, a cluster analysis based on the Euclidean distance of the data is performed to determine the content type. Therefore, video clips in one cluster have similar content complexity. The content classifier takes the extracted features for each new video as input and then predicts its most likely type. Once the CT is predicted, it is used as an input to the QoE model given by (5). Details of our content classification design are given in [27]. The block diagram of the video content classification function is given in Fig. 11. Content classification was done offline in this paper for simplicity.

Page 8: IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 14, NO. 2, …thephdconsultancy.com/wp-content/uploads/2013/10/khan-paper.pdf · IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 14, NO. 2, APRIL 2012

438 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 14, NO. 2, APRIL 2012

The content classifier can be extended to predict CT as a continuous value from 0 to 1 based on, e.g., neural networks. For longer video clips or movies, the input would be segmented and the content features extracted and analyzed per segment. A simplified sketch of the classification step is given below.
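The sketch illustrates the idea in Python with proxy feature definitions and k-means clustering (Euclidean distance); the exact features and clustering procedure are those of [27], [28], and the numeric values here are assumptions.

import numpy as np
from sklearn.cluster import KMeans

def st_features(frames):
    # Simplified spatio-temporal features from grayscale frames (N x H x W).
    # These are proxies for illustration; the exact definitions are given in [27].
    f = np.asarray(frames, dtype=float)
    sad = np.mean(np.abs(np.diff(f, axis=0)))      # temporal activity (SAD)
    gy, gx = np.gradient(f.mean(axis=0))
    edges = np.mean(np.hypot(gx, gy))              # edge energy
    brightness = f.mean()                          # mean luma
    blurriness = 1.0 / (1.0 + edges)               # crude inverse-edge proxy
    return [sad, edges, brightness, blurriness]

rng = np.random.default_rng(2)
print(st_features(rng.integers(0, 256, size=(10, 144, 176))))   # one synthetic QCIF clip

# Hypothetical feature vectors for 12 received clips (in practice st_features()
# is applied to the decoded frames of each clip).
features = rng.random((12, 4)) * [30.0, 20.0, 255.0, 1.0]

# Group the clips into three clusters of similar content complexity and map the
# clusters, ordered by their temporal-activity coordinate, onto the discrete CT
# values used by the model (0.1 = slow, 0.5 = medium, 0.9 = fast).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)
order = np.argsort(km.cluster_centers_[:, 0])
ct_of_cluster = {int(c): ct for c, ct in zip(order, [0.1, 0.5, 0.9])}
print([ct_of_cluster[int(label)] for label in km.labels_])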

B. QoE-Driven Adaptation Scheme

The fuzzy logic algorithm [43] implemented at the sender side processes the feedback information and decides the optimum number of layers that will be sent using fuzzy logic control. Layered encoding is used for adapting the video streams to the network dynamics. Video streams are encoded in a layered manner such that every additional layer increases the perceived quality of the stream. Base layers are encoded at a very low rate to accommodate the UMTS access network conditions. Additional layers are added or dropped in order to adapt the video stream according to the content type and network conditions. We used stream switching in Evalvid-RA [44], where video streams are encoded at different sender bitrates.

We use the model proposed in (5) for MOS prediction. The model is light weight and easy to implement. The predicted QoE metric from (5), together with network QoS parameters, is then used in the QoE-driven adaptation scheme to adapt the sender bitrate, as shown in Fig. 10. RTCP is used to exchange feedback on the quality of the data distribution through reports exchanged between the sender and the receiver. The feedback information is sent through extended RTCP reports [45] every second and collects QoS information such as loss rate, delay and jitter from the network to give the network congestion level. The network congestion level is calculated from the packet loss ratio; here it is referred to as the BLER, computed from the total number of blocks lost over the total number of blocks sent. We use BLER as opposed to packets lost because, in UMTS networks, the physical layer passes the transport blocks to the medium access control (MAC) layer together with the error indication from the cyclic redundancy check, so the output of the physical layer can be characterized in this paper by the overall block error probability (BLER). Thus, an error model based on the 2-state Markov model [37] of block errors was used in the simulation. However, this is also a limitation of the current adaptation scheme, and in the future we aim to consider the losses in the core network in addition to the access network. We define the loss rate (LR), computed from [45], as the fraction of the number of blocks lost (BL) divided by the total number of blocks sent (BS) within an interval. Therefore, the loss rate is given by (7) as

LR = BL / BS    (7)

The range of the loss rate is from 0 to 1, with 0 being no congestion and 1 meaning a fully congested network. The LR was partitioned into four levels: 0–0.01, 0.01–0.03, 0.03–0.06 and above 0.06. The levels of LR are chosen such that they correspond to up to 1% blocks lost, up to 3% blocks lost, up to 6% blocks lost and more than 6% blocks lost, respectively. LR is an input to the decision algorithm for SBR adaptation.

The second input to the decision algorithm is the degradation (D), calculated as the difference between the maximum achievable MOS and the predicted MOS, MOS_p [computed from the QoE prediction model given in (5)].

Fig. 12. Membership functions for the two inputs and the output, and the output SBR adaptor surface.

TABLE VI
LINGUISTIC RULES

The maximum achievable MOS is set to 4.2 when no blocks are lost. D is therefore given by (8) as

D = 4.2 − MOS_p    (8)

The maximum value that D can have is 3.2, indicating maximum degradation, and the minimum value that D can have is 0, indicating no degradation at all. D has been split into four levels: 0–0.25, 0.25–0.7, 0.7–1.2 and above 1.2. The split in the values of D is chosen as a change of 0.25 in MOS. The levels of D are chosen such that MOS ranges from 4.2–3.8, 3.8–3.5, 3.5–3.0 and below 3.0, respectively. This is then linked with an SBR level. D along with LR are used as inputs to the fuzzy logic sender bitrate adaptor.

The membership functions for the two inputs (linguistic input variables) and the output (SBRchange) are shown in Fig. 12. Triangular functions are chosen due to their simplicity. The SBRchange (output) surface is also given in Fig. 12, which shows the overall behavior of the SBR adaptor. The first linguistic variable (LV) input, LR, is the network loss rate; it ranges from 0 to 1. The second linguistic variable (LV), D, is the degradation calculated from the QoE model; D ranges from 0 to 3.2.

The fuzzy SBR adaptor processes the two linguistic variables based on the predefined if-then rule statements (rule base) shown in Table VI, and derives the linguistic output variable SBRchange, which is defined for every possible combination of inputs. An example of a fuzzy rule is:


If the loss rate is large (L) and the degradation is medium (M), then SBRchange is BC (big change).

The linguistic variables in Table VI are given by the membership functions of the output in Fig. 12 and are described as no change (NC), very small change (VSC), small change (SC) and big change (BC). The linguistic variables in Table VI for the two inputs are given by zero (Z), small (S), medium (M) and large (L). The defuzzified output can then be used to determine the next level of SBR, as given by (9):

SBR_next = f(SBR_current, SBRchange)    (9)

Each value of SBRchange corresponds to a layer of the encoded video bitstream. The defuzzified output is selected from 0 to 1, as shown in Fig. 12. Thus a gradual increase in SBR is allowed when bandwidth is available and there is no or reduced loss, whereas quick action is taken to reduce the SBR in case of a severe loss rate.
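The listing below is a minimal, hand-rolled Mamdani-style sketch of such an adaptor in Python, assuming triangular membership functions, a handful of rules in the spirit of Table VI, and a simple layer-switching step standing in for (9); the breakpoints and rule subset are illustrative assumptions rather than the exact design used in the paper.

def tri(x, a, b, c):
    # Triangular membership with feet a, c and peak b (a == b or b == c gives a shoulder).
    if x < a or x > c:
        return 0.0
    if x <= b:
        return 1.0 if b == a else (x - a) / (b - a)
    return 1.0 if c == b else (c - x) / (c - b)

# Membership functions for loss rate (LR) and degradation (D); breakpoints assumed.
LR_MF = {"Z": (0.0, 0.0, 0.01), "S": (0.0, 0.01, 0.03),
         "M": (0.01, 0.03, 0.06), "L": (0.03, 0.06, 1.0)}
D_MF = {"Z": (0.0, 0.0, 0.25), "S": (0.0, 0.25, 0.7),
        "M": (0.25, 0.7, 1.2), "L": (0.7, 1.2, 3.2)}
OUT = {"NC": 0.0, "VSC": 0.2, "SC": 0.5, "BC": 1.0}   # centers of the output terms

# A few (LR term, D term) -> SBRchange rules in the spirit of Table VI.
RULES = [(("Z", "Z"), "NC"), (("Z", "S"), "VSC"), (("S", "S"), "VSC"),
         (("M", "M"), "SC"), (("L", "M"), "BC"), (("L", "L"), "BC")]

def sbr_change(lr, d):
    # Mamdani-style min inference with weighted-centroid defuzzification.
    num = den = 0.0
    for (lr_term, d_term), out_term in RULES:
        w = min(tri(lr, *LR_MF[lr_term]), tri(d, *D_MF[d_term]))   # firing strength
        num += w * OUT[out_term]
        den += w
    return num / den if den > 0 else 0.0

layers = [44, 88, 104]          # CBR switching bitrates (kbps), as in Table VII
current = 2                     # index of the bitrate currently in use
lr, d = 0.04, 0.8               # example inputs: ~4% blocks lost, predicted MOS of 3.4
change = sbr_change(lr, d)      # defuzzified output in [0, 1]
new_index = max(0, current - int(round(change * (len(layers) - 1))))
print(f"SBRchange = {change:.2f}: switch from {layers[current]} to {layers[new_index]} kbps")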

C. Evaluation Set-Up

The network topology is modeled in the UMTS extension for NS2 [46], namely the Enhanced UMTS Radio Access Network Extension (EURANE) [47], integrated with Evalvid-RA [44] modified for H.264 video streaming. The H.264 codec is chosen as it is the recommended codec for low bitrate transmission. We chose H.264 due to its better efficiency, more control at the encoder and being an evolving codec. The results of our proposed adaptive scheme are compared with the well-known TFRC (TCP-Friendly Rate Control) [48] controller. TFRC calculates the sending rate as a function of the measured packet loss rate during a single round trip time duration measured at the receiver. The sender then calculates the sending rate according to [48]. With the Evalvid-RA [44] framework, it is possible to simulate pure TFRC transport directly on top of the network layer using stream switching. Constant bit rate (CBR) videos are used in the simulation as proof of concept. Three compression settings were used for stream switching, as shown in Table VII (104 kbps, 88 kbps, 44 kbps). The mismatch at the switching points is handled by switching frames, and the three CBR bitrates used in switching frames are given in Table VII. The switching is simulated in Evalvid-RA. The Foreman video is used, which is 30 s long and comprises 3 equal sequences of 10 s (300 frames at 30 fps). The adaptive algorithm adjusts the compression rates until all packets get through. However, our technique can easily be extended to variable bit rate (VBR) videos, too.

The evaluation model is given in Fig. 13. It consists of a streaming client and server. In the evaluation, the user equipment (UE) is the streaming client and a fixed host is the video streaming server located in the Internet. The addressed scenario comprises a UMTS radio cell covered by a Node B connected to a radio network controller (RNC). The evaluation model consists of a UE connected to a downlink dedicated physical channel (DPCH).

As the main aim of the evaluation was to investigate the impact of the radio interface (UMTS access network) on the quality of streaming H.264 video and to carry out adaptation of the SBR, no packet losses occur on either the Internet or the UMTS core network (e.g., SGSN, GGSN). No adaptation is carried out if the quality stays above a MOS of 4.2, because frequent adaptation is annoying to viewers.

TABLE VII
EVALUATION PARAMETERS FOR UMTS NETWORK FOR SBR ADAPTATION

Fig. 13. UMTS network topology.

In Fig. 13 the links between the nodes are labeled with their bitrate (in bits per second) and delay (in milliseconds). Each link capacity was chosen so that the radio channel is the connection bottleneck. Consequently, the functionality of the Serving GPRS Support Node (SGSN) and Gateway GPRS Support Node (GGSN) was abstracted out and modeled as traditional ns nodes, since they are wired nodes and in many ways mimic the behavior of an IP router. Currently no header compression technique is supported in the Packet Data Convergence Protocol (PDCP) layer.

From the 3GPP [49] recommendations we find that, for video streaming services such as VoD or unicast IPTV services, a client should support the H.264 (AVC) Baseline Profile up to Level 1.2 [34]. As the transmission of video was for mobile handsets, all the video sequences are encoded with a QCIF resolution. The considered frame structure is IPPP for all the sequences, since the extensive use of I-frames could saturate the available data channel. In the IPPP frame structure used, Intra-Refresh Macroblocks (IR MBs) were not included in P-pictures. The loss of a P-frame therefore causes error propagation and a resulting quality drop, which could be combated if IR MBs were used. However, our adaptation mechanism using the IPPP frame structure results in acceptable quality, and the implementation of the various ways of using IR MBs would combat the problem of error propagation, enhance the performance of our adaptation mechanism and improve the delivered video quality. The results validated the use of our control mechanism, and loss concealment mechanisms and error-propagation-combating algorithms would add to the performance of our algorithm. Therefore, our results represent a lower bound on the expected video quality.


Fig. 14. Comparison of end user quality with TFRC and no adaptation.

From these considerations, we set up the encoding features as shown in Table VII. The implemented link loss model is the 2-state Markov model given in Fig. 2, with three chosen values of MBL (1, 1.75 and 2.5) to represent the various UMTS scenarios. We ran ten simulations with each MBL and calculated the PSNR for each scenario. See subsection D for results and analysis.
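As a reference for how the reported quality figures are obtained, the sketch below computes the average PSNR over a clip from the standard per-frame formula; the frame data are synthetic placeholders.

import numpy as np

def psnr(original, received, peak=255.0):
    # PSNR (dB) between two 8-bit frames of the same size.
    mse = np.mean((original.astype(float) - received.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# Hypothetical decoded sequence: average the per-frame PSNR over the clip,
# as done when reporting "average PSNR" per simulation scenario.
rng = np.random.default_rng(3)
orig = rng.integers(0, 256, size=(30, 144, 176))              # 30 QCIF frames
recv = np.clip(orig + rng.normal(0, 5, orig.shape), 0, 255)   # distorted copy
avg_psnr = np.mean([psnr(o, r) for o, r in zip(orig, recv)])
print(f"average PSNR = {avg_psnr:.1f} dB")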

D. Results and Analysis

In order to study the effect of link bandwidth on the MOS (QoE of the user), we conducted scenarios involving one user and then up to five UMTS users receiving streaming video over an NS2 simulated UMTS network. We conducted experiments with the Foreman content type and assessed the performance of our QoE-driven adaptation scheme over the simulated NS2 [46] UMTS network in terms of average PSNR. NS2 was chosen due to its flexibility, and the scenarios were defined based on the characteristics of the link bandwidth. PSNR values are compared to the non-adaptive case and TFRC in Fig. 14. Fig. 14 shows that the QoE-based fuzzy adaptive scheme successfully adapts the sender bitrate to network congestion. The proposed scheme slowly reduces the sender bitrate according to the network conditions, maintaining acceptable quality. TFRC uses a more aggressive manner of recovery after network congestion and increases its transmission rate faster, causing significant degradations of end-user quality measured in terms of average PSNR.

Fig. 15 provides an indication of the capacity of the proposed QoE-based fuzzy adaptation scheme with respect to the number of UMTS users that can be supported by a video streaming server, taking into account the bottleneck link bandwidth. The figure shows the performance of the proposed QoE-driven adaptive scheme at three UMTS downlink bandwidths of 128 kbps, 256 kbps and 384 kbps, compared to TFRC. The adaptive scheme outperforms TFRC at all three link bandwidths with 5 users.

Fig. 15. Average PSNR versus number of active users.

Fig. 15 also depicts the quality (average PSNR) experienced by multiple identical users (a total of 5) having the same connection characteristics, with respect to the bandwidth of the bottleneck link. The dashed lines indicate the acceptable quality threshold (taken from the literature as the minimum acceptable average PSNR). When the link bandwidth (LBW) is high enough (384 kbps) to sustain the aggregated video transmission rate, all users are supported by the video streaming server at equal quality levels. Even at the bottleneck LBW of 256 kbps, all users can be supported at the minimum acceptable level. We have shown that our scheme would maintain acceptable quality for five users, as most networks have admission control. This should enable us to admit more users with the proposed scheme if the cost of implementation is justified by the additional revenues. However, at the lowest level of the LBW (128 kbps) only two users can be supported, and beyond that the quality reduces below the acceptable threshold.

Similarly, Fig. 16 gives the adaptive video quality over UMTS compared to the non-adaptive one at LBWs of 128 kbps, 256 kbps and 384 kbps. Again, we observe an improvement in quality for the Foreman content type. At the bottleneck bandwidth of 128 kb/s, adaptive “Foreman” gives an average PSNR of 31 dB compared to 24 dB without adaptation. Therefore, the adaptive video scheme gracefully adapted the delivered video quality to the available network downlink bandwidth.

IV. CONCLUSIONS AND FUTURE WORK

In this paper we have presented a novel content-based non-intrusive video quality prediction model for low bitrate H.264 videos over UMTS networks. However, the model can easily be extended to other access networks such as Wireless Local Area Networks (WLANs). The model was developed using a combination of parameters associated with the encoder and the UMTS access network for different content types. We demonstrated an application of the model in a new QoE-driven SBR adaptation scheme.


Fig. 16. Video quality results for different bottleneck bandwidths over the UMTS network.

The model was evaluated with an unseen dataset (different video clips within the same content type) with good prediction accuracy. It was also validated with the recent LIVE [41] database. The model has potential applications in several other areas, including QoE control and optimization in network planning and content provisioning for network/service providers.

We found that the impact of combined encoder and network distortions on video quality is very much content dependent. This was reflected in the subjective scores. For example, sequences with slow movement and high losses were rated higher than sequences with fast movement. In our subjective tests none of the subjects rated the video sequences as “bad”. This is because BLER was restricted to 20%. In future, a study will be undertaken on mobile handsets with higher packet losses to cover the whole MOS range (i.e., from “bad” to “excellent”).

In this study, QCIF videos were specifically chosen as the target application is mobile, but in general the size of the video (spatial resolution) has an impact on overall quality [25]. With the increase in available bandwidth, the QCIF resolution may seem small. On newer smart mobile phones a resolution of 320 × 240 is more commonly used. Future studies will consider subjective tests using mobile handsets with higher resolutions and hence investigate the impact of resolution on quality. The methodology presented in this paper remains the same and should apply to higher resolution phones/terminals. Future studies will also take into account the impact of core network losses in addition to that of access network losses on video quality. Further, the results from the adaptation scheme will be validated using subjective tests.

ACKNOWLEDGMENT

The authors would like to thank Mr. J. O. Fajardo for his help in the generation of the dataset and Dr. E. Jammeh for the discussions on fuzzy logic.

REFERENCES

[1] S. Jeong and H. Ahn, “Mobile IPTV QoS/QoE monitoring systembased on OMA DM protocol,” in Proc. Int. Conf. Information andCommunication Technology Convergence (ICTC), Jeju, Korea, Nov.17–19, 2010, pp. 99–100.

[2] S. Jumisko, V. Ilvonen, A. Kaisa, and V.Mattila, “Effect of TV contentin subjective assessment of video quality on mobile devices,” in Proc.IST/SPIE Conf. Mobile Multimedia, San Jose, CA, 2005, pp. 243–254.

[3] D. S. Hands, “Video QoS enhancement using perceptual quality met-rics,” BT Technol. J., vol. 23, no. 2, pp. 208–216, Apr. 2005.

[4] ITU-T SG 9, Q 12/9, Hybrid Perceptual/Bitstream Models, 2011.[Online]. Available: http://www.itu.int/itu-t/workprog/wp_item.aspx?isn=6299.

[5] H. Koumaras, A. Kourtis, C. Lin, and C. Shieh, "A theoretical framework for end-to-end video quality prediction of MPEG-based sequences," in Proc. 3rd Int. Conf. Networking and Services, Jun. 19–25, 2007, pp. 62–65.

[6] A. Eden, "No-reference image quality analysis for compressed video sequences," IEEE Trans. Broadcast., vol. 54, no. 3, pp. 691–697, Sep. 2008.

[7] Q. Huynh-Thu and M. Ghanbari, "Temporal aspect of perceived quality in mobile video broadcasting," IEEE Trans. Broadcast., vol. 54, no. 3, pp. 641–651, Sep. 2008.

[8] R. Feghali, F. Speranza, D. Wang, and A. Vincent, "Video quality metric for bitrate control via joint adjustment of quantization and frame rate," IEEE Trans. Broadcast., vol. 53, no. 1, pp. 441–446, Mar. 2007.

[9] K. Seshadrinathan and A. Bovik, "Motion tuned spatio-temporal quality assessment of natural videos," IEEE Trans. Image Process., vol. 19, no. 2, pp. 335–350, Feb. 2010.

[10] G. Zhai, J. Cai, W. Lin, X. Yang, and W. Zhang, "Three dimensional scalable video adaptation via user-end perceptual quality assessment," IEEE Trans. Broadcast., Special Issue on Quality Issues in Multimedia Broadcasting, vol. 54, no. 3, pp. 719–727, Sep. 2008.

[11] A. R. Reibman, V. A. Vaishampayan, and Y. Sermadevi, "Quality monitoring of video over a packet network," IEEE Trans. Multimedia, vol. 6, no. 2, pp. 327–334, Apr. 2004.

[12] M. Ries, O. Nemethova, and M. Rupp, "Video quality estimation for mobile H.264/AVC video streaming," J. Commun., vol. 3, no. 1, pp. 41–50, Jan. 2008.

[13] K.-C. Yang, C. C. Guest, K. El-Maleh, and P. K. Das, "Perceptual temporal quality metric for compressed video," IEEE Trans. Multimedia, vol. 9, no. 7, pp. 1528–1535, Nov. 2007.

[14] P. Calyam, E. Ekici, C. Lee, M. Haffner, and N. Howes, "A gap-model based framework for online VVoIP QoE measurement," J. Commun. Netw., vol. 9, no. 4, pp. 446–456, Dec. 2007.

[15] S. Tao, J. Apostolopoulos, and R. Guerin, "Real-time monitoring of video quality in IP networks," IEEE/ACM Trans. Netw., vol. 16, no. 5, pp. 1052–1065, Oct. 2008.

[16] S. Kanumuri, S. G. Subramanian, P. C. Cosman, and A. R. Reibman, "Predicting H.264 packet loss visibility using a generalized linear model," in Proc. IEEE Int. Conf. Image Processing, Oct. 8–11, 2006, pp. 2245–2248.

[17] S. Mohamed and G. Rubino, "A study of real-time packet video quality using random neural networks," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 12, pp. 1071–1083, Dec. 2002.

[18] K. Yamagishi and T. Hayashi, "Opinion model for estimating video quality of videophone services," in Proc. IEEE Globecom, San Francisco, CA, Nov. 27–Dec. 1, 2006, pp. 1–5.

[19] K. Yamagishi, T. Kawano, and T. Hayashi, "Hybrid video-quality-estimation model for IPTV services," in Proc. IEEE Globecom, Honolulu, HI, Nov. 30–Dec. 4, 2009, pp. 1–5.

[20] G. W. Cermak, "Subjective video quality as a function of bit rate, frame rate, packet loss rate and codec," in Proc. 1st Int. Workshop Quality of Multimedia Experience (QoMEX), Jul. 29–31, 2009, pp. 41–46.

[21] A. K. Moorthy, K. Seshadrinathan, R. Soundararajan, and A. C. Bovik, "Wireless video quality assessment: A study of subjective scores and objective algorithms," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 4, pp. 513–516, Apr. 2010.

[22] K. Seshadrinathan, R. Soundararajan, A. Bovik, and L. Cormack, "Study of subjective and objective quality assessment of video," IEEE Trans. Image Process., vol. 19, no. 6, pp. 1427–1441, Jun. 2010.

[23] S. Winkler and P. Mohandas, "The evolution of video quality measurement: From PSNR to hybrid metrics," IEEE Trans. Broadcast., vol. 54, no. 3, pp. 660–668, Sep. 2008.

[24] A. Khan, L. Sun, E. Ifeachor, J. Fajardo, and F. Liberal, "Video quality prediction models based on video content dynamics for H.264 video over UMTS networks," Int. J. Digit. Multimedia Broadcast., Special Issue on IP and Broadcasting Systems Convergence (IPBSC), vol. 2010, 2010, 17 pp.

[25] G. Zhai, J. Cai, W. Lin, X. Yang, W. Zhang, and M. Etoh, "Cross-dimensional perceptual quality assessment for low bitrate videos," IEEE Trans. Multimedia, vol. 10, no. 7, pp. 1316–1324, Nov. 2008.

[26] A. Khan, L. Sun, E. Ifeachor, J. Fajardo, and F. Liberal, "Video quality prediction model for H.264 video over UMTS networks and their application in mobile video streaming," in Proc. IEEE ICC, Cape Town, South Africa, May 23–27, 2010, pp. 1–5.

[27] A. Khan, L. Sun, and E. Ifeachor, "Content clustering-based video quality prediction model for MPEG4 video streaming over wireless networks," in Proc. IEEE ICC, Dresden, Germany, Jun. 14–18, 2009, pp. 1–5.

[28] G. W. Snedecor and W. G. Cochran, Statistical Methods, 8th ed. Ames, IA: Iowa State Univ. Press, 1989.

[29] G. Muntean, P. Perry, and L. Murphy, "A new adaptive multimedia streaming system for all-IP multi-service networks," IEEE Trans. Broadcast., vol. 50, no. 1, pp. 1–10, Mar. 2004.

[30] B. Ciubotaru, G. Muntean, and G. Ghinea, "Objective assessment of region of interest-aware adaptive multimedia streaming quality," IEEE Trans. Broadcast., vol. 55, no. 2, pp. 202–212, Jun. 2009.

[31] P. Antoniou, V. Vassiliou, and A. Pitsillides, "Delivering adaptive scalable video over the wireless networks," in Proc. 1st ERCIM Workshop eMobility, Coimbra, Portugal, May 21, 2007, pp. 23–34.

[32] E. Jammeh, M. Fleury, and M. Ghanbari, "Fuzzy logic congestion control of transcoded video streaming without packet loss feedback," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 3, pp. 387–393, Mar. 2008.

[33] M. Rezaei, M. Hannuksela, and M. Gabbouj, "Semi-fuzzy rate controller for variable bit rate video," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 5, pp. 633–645, May 2008.

[34] JM H.264 Software. [Online]. Available: http://iphome.hhi.de/suehring/tml/.

[35] Degraded Video Clips. [Online]. Available: http://www.tech.plym.ac.uk/spmc/staff/akhan/degraded_video.html.

[36] OPNET for Research. [Online]. Available: http://www.opnet.com.

[37] E. N. Gilbert, "Capacity of a burst-noise channel," Bell Syst. Tech. J., vol. 39, pp. 1253–1265, Sep. 1960.

[38] W. Karner, O. Nemethova, P. Svoboda, and M. Rupp, "Link error analysis and modelling for video streaming cross-layer design in mobile communication networks," ETRI J., vol. 29, no. 5, pp. 569–595, Oct. 2007.

[39] Methodology for the Subjective Assessment of the Quality of Television Pictures, International Telecommunication Union, ITU-R Rec. BT.500-11.

[40] Subjective Video Quality Assessment Methods for Multimedia Applications, International Telecommunication Union, 2008, ITU-T Rec. P.910.

[41] A. K. Moorthy, K. Seshadrinathan, R. Soundararajan, and A. C. Bovik, LIVE Wireless Video Quality Assessment Database, 2009. [Online]. Available: http://live.ece.utexas.edu/research/quality/live_wireless_video.html.

[42] Final Report From the Video Quality Experts Group on the Validation of Objective Quality Metrics for Video Quality Assessment, Video Quality Experts Group (VQEG), Multimedia Group Test Plan, 2008. [Online]. Available: http://www.its.bldrdoc.gov/vqeg/projects/frtv_phaseI.

[43] H. Takagi, "Application of neural networks and fuzzy logic to consumer products," IEEE Technol. Updates Series: Fuzzy Logic Technol. Appl., vol. 1, pp. 8–12, 1994.

[44] A. Lie and J. Klaue, "Evalvid-RA: Trace driven simulation of rate adaptive MPEG4 VBR video," Multimedia Syst., vol. 14, no. 1, pp. 33–50, 2008.

[45] T. Friedman, R. Caceres, and A. Clark, RTP Control Protocol Extended Reports (RTCP XR), IETF RFC 3611, 2003.

[46] NS2. [Online]. Available: http://www.isi.edu/nsnam/ns/.

[47] Enhanced UMTS Radio Access Network Extensions for ns-2 (E.U.R.A.N.E.). [Online]. Available: http://eurane.ti-wmc.nl/eurane/.

[48] M. Handley, S. Floyd, J. Widmer, and J. Padhye, RFC 3448: TCP-Friendly Rate Control (TFRC): Protocol Specification, 2003. [Online]. Available: http://www.ietf.org/rfc/rfc3448.txt.

[49] Third Generation Partnership Project; Technical Specification Group Radio Access Network; Radio Link Control (RLC) Protocol Specification (Release 5), 3GPP TS 25.322.

Asiya Khan received the B.Eng. degree (Hons.) in electrical and electronic engineering from the University of Glasgow, Glasgow, U.K., in 1992, the M.Sc. degree in communication, control, and digital signal processing from Strathclyde University, Glasgow, in 1993, and the Ph.D. degree in multimedia communication from the University of Plymouth, Plymouth, U.K.

She worked with British Telecommunications Plc from 1993 to 2002 in a management capacity, developing various products and seeing them from inception through to launch. She has been a Research Assistant on Perceived QoS Control for New and Emerging Multimedia Services (VoIP and IPTV) under the FP7 ADAMANTIUM project at the University of Plymouth. She has published several papers in international journals and conferences. Her research interests include video quality of service over wireless networks, adaptation, perceptual modeling, and content-based analysis.

Dr. Khan received the Best Paper Award at ICAS 2009.

Lingfen Sun received the B.Eng. degree in telecommunication engineering and the M.Sc. degree in communication and electronic systems from the Institute of Communication Engineering, Nanjing, China, in 1985 and 1988, respectively, and the Ph.D. degree in computing and communications from the University of Plymouth, Plymouth, U.K., in 2004.

She is currently an Associate Professor (Reader) in Multimedia Communications and Networks in the School of Computing and Mathematics, University of Plymouth. She has been involved in several European and industry-funded projects related to multimedia QoE. She has published 60 peer-refereed technical papers since 2000 and filed one patent. Her current research interests include multimedia (voice/video/audiovisual) quality assessment, QoS/QoE management/control, VoIP, and network performance characterization.

Dr. Sun is the Chair of the QoE Interest Group of IEEE MMTC for 2010–2012, Publicity Co-Chair of IEEE ICME 2011, and Post & Demo Co-Chair of IEEE Globecom 2010.

Emmanuel Ifeachor received the M.Sc. degree in communication engineering from Imperial College, London, U.K., and the Ph.D. degree in medical electronics from the University of Plymouth, Plymouth, U.K.

He is a Professor of Intelligent Electronic Systems and Head of Signal Processing and Multimedia Communications research at the University of Plymouth. His primary research interests are in information processing and computational intelligence techniques and their application to problems in communications and biomedicine. His current research includes user-perceived QoS and QoE prediction and control for real-time multimedia services, biosignal analysis for personalized healthcare, and ICT for health. He has published extensively in these areas.