Quality of Experience for 3D Video Streaming

Chaminda T. E. R. Hewage and Maria G. Martini, Kingston University

ABSTRACT

New 3D video applications and services are emerging to fulfill increasing user demand. This effort is well supported by the growing amount of 3D video content, including user-generated content (e.g., through 3D capture/display enabled mobile phones), technological advancements (e.g., HD 3D video capture and processing methods), affordable 3D displays, and standardization activities. However, little attention has been given to how these technologies, along the end-to-end chain from content capture to display, affect user perception, and to whether the overall experience of 3D video users is satisfactory. 3D video streaming also introduces artifacts on the reconstructed 3D video at the receiver end, leading to inferior quality and user experience. In this article we present and discuss in detail how artifacts introduced during 3D video streaming affect end-user perception and how real-time quality evaluation methodologies could be used to overcome these effects. The observations presented can underpin the design of future QoE-aware 3D video streaming systems.

INTRODUCTION

Interactive 3D video streaming will enable seamless, more involving, and adaptable delivery of 3D content to end users. However, 3D video streaming over band-limited and unreliable communication channels can introduce artifacts on the transmitted 3D content. The effect can be much more significant than in conventional 2D video streaming: the nature of the 3D video source format (e.g., color plus depth images vs. left and right views) and the way our human visual system (HVS) perceives channel-introduced artifacts differ from the 2D case. As an example, color plus depth map 3D video presentation may have to utilize impaired depth map information at the receiver side to render novel views.

In the remainder of this article, after an introduction to interactive video streaming and 3D artifacts, we discuss how the overall user experience in 3D viewing can be quantified and how our HVS reacts to different artifacts. Moreover, we elaborate on how quality measurements at the receiver side can be exploited to overcome these effects through QoE-driven system adaptation.

INTERACTIVE 3D VIDEO STREAMING

Video streaming over the Internet has become one of the most popular applications, and Internet 3D video streaming is expected to become more popular in the future, thanks also to recently standardized wireless systems, including WiMAX, Third Generation Partnership Project (3GPP) Long Term Evolution (LTE)/LTE-Advanced, the latest 802.11 standards, and advanced short-range wireless communication systems, which enable the transmission of bandwidth-demanding multimedia data. For such applications, the target of the system design should be the maximization of the final quality perceived by the user, or quality of experience (QoE), rather than only the performance of the network in terms of "classical" quality of service (QoS) parameters such as throughput and delay.

3D video services, and in particular those delivered through wireless and mobile channels, face a number of challenges due to the need to handle a large amount of data and the possible limitations imposed by the characteristics of the transmission channel and the device. This can result in perceivable impairments originating in the different steps of the communication system, from content production to display techniques (Fig. 1), and influences the user's perception of quality. For instance, channel congestion and errors at the physical layer may result in packet losses and delay, whereas compression techniques introduce compression artifacts. Such impairments can be perceived by the end user and degrade, to a different extent, the quality of the rendered 3D video. Some of these artifacts are common to 2D video applications as well. In addition, artifacts that are specific to 3D, such as crosstalk and keystone distortion, can be introduced along the end-to-end chain of 3D video delivery [1].

3D VIDEO QOE

The overall enjoyment or annoyance of 3D video streaming applications or services is influenced by several factors, such as human factors (e.g., demographic and socio-economic background), system factors (e.g., content- and network-related influences), and contextual factors (e.g., duration, time of day, and frequency of use). The overall experience can be analyzed and measured by QoE-related parameters that quantify the user's overall satisfaction with a service [2, 3].


QoS-related measurements capture only the performance aspects of a physical system, with the main focus on telecommunications services. Measuring QoS parameters is straightforward, since objective, explicit technological methods can be used, whereas measuring and understanding QoE requires a multidisciplinary and multi-technological approach. The added dimension of depth in 3D viewing influences several perceptual attributes, such as overall image quality, depth perception, naturalness, presence, and visual comfort. For instance, increased binocular disparity enhances the depth perception of viewers, although in extreme cases it can also lead to eye fatigue. The overall enjoyment of a 3D application could therefore be hindered by the eye strain experienced by the end user. The influence of these attributes on the overall experience of 3D video streaming users is yet to be investigated. Figure 2 outlines a comparison between QoE and QoS.
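To make this distinction concrete, QoE-aware designs often map measured QoS parameters onto a predicted QoE score through a fitted parametric model. The short Python sketch below is purely illustrative (the exponential form and the coefficients are assumptions, not taken from this article) and maps a packet loss rate onto a MOS-like score:

import numpy as np

# Illustrative QoS-to-QoE mapping (an assumption, not taken from this
# article): the commonly used hypothesis that QoE decays exponentially as
# a QoS impairment grows. The coefficients a, b, c are hypothetical and
# would normally be fitted to subjective MOS data for a given service.
def qoe_from_packet_loss(loss_rate, a=3.0, b=40.0, c=1.5):
    """Map a packet loss rate (0..1) onto a MOS-like score on a 1-5 scale."""
    return c + a * np.exp(-b * loss_rate)

for loss in (0.0, 0.01, 0.05, 0.10):
    print(f"loss={loss:.2f} -> predicted MOS ~ {qoe_from_packet_loss(loss):.2f}")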

3D VIDEO ARTIFACTS

The different processing steps along the end-to-end 3D video chain introduce image artifacts which may affect 3D perception and the overall experience of viewers [1]. Even though much attention has been paid to analyzing and mitigating the effects of 3D image/video capture, processing, rendering, and display techniques, the artifacts introduced by the transmission system have received little attention compared to the 2D image/video counterpart. Some of these artifacts influence the overall image quality (for instance, blurriness and luminance and contrast levels), similarly to 2D image/video. The effect of transmission over band-limited and unreliable communication channels (e.g., wireless channels) can be much worse for 3D video than for 2D video, due to the presence in the former case of two channels (i.e., stereoscopic 3D video) that can be impaired in different ways; as a consequence, the 3D reconstruction in the HVS may be affected. Some networks introduce factors directly related to temporal-domain desynchronization. For instance, delay in one view could lead to temporal desynchronization between the views, reducing comfort in 3D viewing.

The methods employed to mitigate these artifacts (e.g., error concealment) need to be carefully designed to suit 3D video applications. The simple application of 2D image/video methods would not work effectively in this case, as discussed in [4] for different error concealment algorithms for 3D video transmission errors. In [4] it is observed that in some cases switching back to the 2D video mode is preferable to applying 2D error concealment methods separately to the left and right views to recover image information lost during transmission. These artifacts may have further implications for our HVS. Therefore, the artifacts caused by 3D video streaming can be fully appreciated only by understanding how our HVS perceives different 3D video artifacts.

Binocular Vision — The HVS is capable of aligning and fusing the two slightly different views fed into the left and right eyes, and hence of perceiving the depth of the scene. Both binocular and monocular cues assist our HVS in perceiving the different depth planes of image objects. Binocular disparity is the major cue used by the HVS to identify the relative depth of objects; monocular depth cues include perspective, occlusions, motion parallax, and so on. During 3D video streaming, one or both views could be badly affected by channel impairments (e.g., bit errors and packet losses caused by adverse channel conditions, delay, or jitter). For instance, frame freezing mechanisms employed to tackle missing frames caused by transmission errors or delay could lead to temporal desynchronization, where one eye sees delayed content compared to the other. There are two implications associated with the case where one view is affected by transmission impairments:
• Binocular suppression
• Binocular rivalry
Our HVS is still able to align and fuse stereoscopic content if one view is affected by artifacts due to compression, transmission, and rendering. Binocular suppression theory suggests that in these situations the overall perception is usually driven by the quality of the better view (i.e., left or right), at least if the quality of the worse view is above a threshold value. However, this capability is limited, and studies show that additional cognitive load is necessary to fuse such views [6]. Increased cognitive load leads to visual fatigue and eye strain, and prevents users from watching 3D content for a long time. This directly affects user perception and QoE. If one of the views is severely altered by the transmission system, the HVS will not be able to fuse the affected views, causing binocular rivalry. This has detrimental effects on the final QoE perceived by the end user. Recent studies on 3D video transmission [5] have found that binocular rivalry affects the overall perception, and that this effect prevails over the effect of binocular suppression. To avoid the detrimental effect of binocular rivalry, the transmission system should be designed with this issue in mind. For instance, the transmission system parameters can be updated on the fly, according to feedback on the measured 3D video quality at the receiver side, to obtain 3D views with minimum distortions. If the received quality of one of the views is significantly low, the transmission system can be informed so as to allocate more resources to the worse view, or to increase its error protection level, mitigating future quality loss. This increases the opportunity to fuse the 3D video content effectively and improves the final QoE of users. The next section of this article discusses in detail how we can measure 3D video quality.

Figure 1. End-to-end 3D video processing chain: 3D video capture → 3D video processing → 3D video transmission → 3D video displays.

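As an illustration of the feedback-driven adaptation described above, the sketch below is a hypothetical scheme, not the authors' system: the function allocate_fec, the rivalry threshold, and the step size are all assumptions. It rebalances a forward error correction (FEC) budget toward the worse view based on fed-back per-view quality:

# A hypothetical sketch of QoE-driven adaptation: receiver-side per-view
# quality estimates are fed back, and the sender shifts FEC budget toward
# the worse view so the two views stay balanced and binocular rivalry is
# avoided. The threshold and step size are illustrative assumptions.
RIVALRY_GAP_DB = 3.0   # assumed per-view quality gap (in dB) that risks rivalry
FEC_STEP = 0.05        # fraction of the FEC budget moved per adaptation round

def allocate_fec(fec_left, fec_right, q_left, q_right):
    """Rebalance the FEC budget between views using fed-back quality scores."""
    if abs(q_left - q_right) < RIVALRY_GAP_DB:
        return fec_left, fec_right                 # views balanced: no change
    if q_left < q_right:                           # left view is the worse one
        shift = min(FEC_STEP, fec_right)
        return fec_left + shift, fec_right - shift
    shift = min(FEC_STEP, fec_left)                # right view is the worse one
    return fec_left - shift, fec_right + shift

# Example round: left view badly hit by losses (30 dB vs. 38 dB)
print(allocate_fec(0.5, 0.5, q_left=30.0, q_right=38.0))   # -> (0.55, 0.45)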

MEASURING 3D VIDEO QUALITY

Immersive video quality evaluation is currently a hot topic among researchers and developers, due to its complex nature and the unavailability of an accurate objective quality metric for 3D video. 3D perception can be associated with several perceptual attributes, such as overall image quality, depth perception, naturalness, presence, and comfort. Current 3D quality evaluation studies focus on only one specific aspect, such as overall quality, depth perception, or visual comfort. A detailed analysis is necessary to study how these 3D percepts influence the overall perceived 3D image/video quality (i.e., 3D QoE). For instance, these attributes can be in conflict (e.g., increased disparity can cause eye fatigue), affecting the user experience. Mostly, appreciation-oriented psychophysical experiments are conducted to measure and quantify 3D perceptual attributes. Electro-physiological devices are now also being used to record a user's overall excitement about a 3D service.

A few standards define subjective quality evaluation procedures for 3D video, for example, International Telecommunication Union Radiocommunication Sector (ITU-R) BT.1438, ITU-R BT.500-12, and ITU Telecommunication Standardization Sector (ITU-T) P.910. However, these procedures are not sufficient to measure 3D QoE parameters and show several limitations; for instance, they are not able to measure the combined effect of different perceptual attributes. Current standardization activities on 3D quality evaluation are discussed later in this article. Subjective quality evaluation under different system parameter changes has been reported in a number of studies [7]. However, these studies are limited to certain types of image artifacts (e.g., compression artifacts) and have limited applicability in practice. Furthermore, subjective quality evaluation requires time, effort, controlled test environments, money, human observers, and so on, and cannot be deployed in a live environment where quality must be measured on the fly.

Objective quality evaluation methods for 3D video are also emerging, aiming to reproduce the quality ratings obtained with subjective tests [8]. The performance of these metrics is, most of the time, an approximation of the results of subjective quality assessments. 3D objective quality metrics are designed to account for both disparity- and texture-related artifacts based on extracted image features. The final score should therefore reflect the quality degradation in terms of both 2D image and depth perception. It is important to have a reliable ground truth 3D dataset to evaluate and verify the performance of emerging 3D objective metrics. These metrics need to be verified and validated using sequences with different characteristics and under different environmental settings. Studies have also found a high correlation between subjective ratings and individual objective quality ratings of 3D video components (e.g., average peak signal-to-noise ratio [PSNR] and structural similarity index [SSIM] of the left and right videos, or of the color and depth videos) [9]. For instance, depth perception is highly correlated with the average PSNR of the rendered left and right image sequences [9]. This could be due to the loss of correspondence between left and right objects and the reduction of monocular depth cues as a result of compression and transmission errors. This means that we could use individual objective quality measures of different 3D video components, through a suitable approximation derived from correlation analysis, to predict the true user perception in place of subjective quality evaluation. However, with some 3D source representations, such as the color plus depth map format, it may be difficult to derive a direct relationship between objective measures and subjective quality ratings.
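As a concrete example of this component-wise evaluation, the following sketch (a minimal illustration, not the evaluation code of [9]) computes PSNR separately for the left and right views of a stereoscopic frame and averages the two scores; the random frames merely stand in for decoded video:

import numpy as np

# PSNR computed separately for the left and right views and averaged,
# the component-wise measure reported in [9] to correlate well with
# perceived depth. Frames are modeled as 8-bit grayscale arrays.
def psnr(ref, dist, peak=255.0):
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def stereo_average_psnr(left_ref, left_dist, right_ref, right_dist):
    return 0.5 * (psnr(left_ref, left_dist) + psnr(right_ref, right_dist))

# Toy frames standing in for decoded video (mild noise as "distortion")
rng = np.random.default_rng(0)
ref_l = rng.integers(0, 256, (288, 352)).astype(np.uint8)
ref_r = rng.integers(0, 256, (288, 352)).astype(np.uint8)
dist_l = np.clip(ref_l + rng.normal(0, 5, ref_l.shape), 0, 255).astype(np.uint8)
dist_r = np.clip(ref_r + rng.normal(0, 5, ref_r.shape), 0, 255).astype(np.uint8)
print(f"average stereo PSNR: {stereo_average_psnr(ref_l, dist_l, ref_r, dist_r):.2f} dB")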

Figure 2. Quality of experience vs. quality of service.
Factors influencing QoE:
• Human factors (e.g., demographic and socio-economic background, physical and mental constitution, user's emotional state)
• System factors: content related (e.g., content type, temporal characteristics); media related (e.g., resolution, encoding, sampling rate); network related (e.g., bandwidth, delay, jitter, loss, error rate, throughput); device related (e.g., display size, usability, security)
• Contextual factors: temporal context (e.g., time of the day, duration); economic context (e.g., cost); task context; social context
Factors influencing QoS:
• System factors: network related (e.g., bandwidth, delay, jitter, loss, error rate, throughput)


For instance, the objective quality of the depth map may have a very weak correlation on its own with the overall subjective quality, because the depth map is used to project the corresponding color image into 3D coordinates and is not directly viewed by end users. Individual quality ratings of the left and right views, in turn, may not always account for the depth reproduction of the scene. Therefore, the next phase of 3D objective quality metrics includes a methodology to quantify the effect of binocular disparity of 3D scenes in addition to a conventional image/video quality assessment methodology. For instance, in [8], disparity distortion measures were incorporated alongside image quality measures to evaluate the overall 3D video quality. The article showed improved performance over a method that does not account for the correspondence information of stereoscopic views.

The latest 3D image/video quality metrics evaluate depth reproduction in addition to the usual image artifacts (e.g., blockiness), using specific image features (e.g., edge, disparity, and structural information of stereoscopic images) that are important for the HVS in both 2D and 3D viewing. For instance, the method proposed in [10] shows high correlation with subjective quality results (mean opinion score, MOS): the correlation coefficient with subjective quality ratings is as high as 0.95; this outperforms the method based on 2D image quality plus disparity [8] and other conventional 2D quality metrics applied separately to the left and right views (Table 1). The performance figures reported in Table 1 are obtained using the same 3D dataset. These observations confirm that accurate 3D image quality metrics should be designed to also consider binocular disparity distortions. All the methods described above are full-reference (FR) methods and need the original 3D image sequence to measure quality by comparison; hence, they are not suitable for on-the-fly quality evaluation in real-time transmission applications such as interactive 3D video streaming. In this case the solution is to use reduced-reference (RR) or no-reference (NR) metrics, which do not require the original image for quality assessment, but either no information about it (NR) or just some side information (RR) requiring few bits for its transmission. Most NR metrics are designed specifically for a known set of artifacts (e.g., JPEG compression) and cannot be deployed in a more general scenario. In the case of RR metrics, side information is generated from features extracted from the original 3D image sequence and sent to the receiver side to measure 3D video quality. Since the reference side information has to be transmitted over the channel, either in-band or on a dedicated connection, the overhead should be kept to a minimum. The next section describes how we could measure 3D video quality on the fly using RR and NR methods, and provides a brief description of existing methods.

Table 1. Correlation between objective 3D image/video quality measurement and subjective quality.

Method                                          CC      SSE     RMSE
SSIM                                            0.837   0.965   0.159
Video quality metric (VQM)                      0.932   0.423   0.106
Proposed in [8]: 2D image quality + disparity   0.901   0.608   0.126
Proposed in [10]                                0.947   0.341   0.095

REAL-TIME 3D VIDEO QUALITY EVALUATION STRATEGIES

The image quality measured at the receiver side can be used as feedback information to update system parameters on the fly in a QoE-aware system design approach [2, 11]. However, measuring 3D video quality in real time is challenging, mainly due to the complex nature of 3D video quality and the fact that the amount of side information to be sent to measure the quality with RR methods is larger than for 2D image/video applications. The emerging RR and NR quality evaluation methods are based on image features associated with the characteristics of the HVS. Some of these features are related to image perception (e.g., luminance, contrast) and some to depth perception (e.g., disparity, structural correlations). An appropriate selection of these features is crucial to designing an effective 3D image/video quality assessment method. The selected features should be able to quantify image- and depth-perception-related artifacts with minimum overhead: if the overhead is significant, the feasibility of deploying the RR method is reduced. Figure 3 shows how extracted edge information is employed to measure 3D video quality in the RR method proposed in [12]. In this method, luminance and contrast details of the original and distorted images are utilized to account for conventional image artifacts, whereas edge-based structural correlation is employed to measure the structural/disparity degradation of the 3D scene, which directly affects rendering with color plus depth map based 3D video. In order to reduce the overhead for side information (i.e., the extracted features of the reference image), lossless compression mechanisms can be deployed. Extra effort should also be made to send the side information without corruption, using a dedicated channel or a highly protected forward channel. Visual attention models could also be utilized to find 3D image/video features that attract significant attention during 3D viewing. However, a direct relationship between visual attention and image perception for 3D images and video has yet to be established.
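A simplified sketch of an edge-based RR scheme in the spirit of [12] is given below. This is not the published algorithm: the gradient feature, the block size, and the correlation index are illustrative assumptions. The sender transmits compact blockwise edge statistics as side information, and the receiver correlates them with the same statistics computed on the decoded depth map:

import numpy as np

# Sender: extract compact blockwise edge statistics of the original depth
# map as side information. Receiver: compute the same statistics on the
# decoded depth map and use their correlation as a structural index.
def edge_magnitude(img):
    img = img.astype(np.float64)
    gx = img[:, 2:] - img[:, :-2]              # horizontal central difference
    gy = img[2:, :] - img[:-2, :]              # vertical central difference
    return np.hypot(gx[1:-1, :], gy[:, 1:-1])  # aligned gradient magnitude

def rr_side_info(img, block=16):
    """Blockwise mean edge energy: the few values sent to the receiver."""
    e = edge_magnitude(img)
    h = (e.shape[0] // block) * block
    w = (e.shape[1] // block) * block
    e = e[:h, :w].reshape(h // block, block, w // block, block)
    return e.mean(axis=(1, 3))

def structural_index(side_info_ref, img_received, block=16):
    """Correlation of reference vs. received edge statistics (1.0 = intact)."""
    s_rec = rr_side_info(img_received, block)
    return float(np.corrcoef(side_info_ref.ravel(), s_rec.ravel())[0, 1])

# Toy depth map with a simulated lost slice in the decoded version
rng = np.random.default_rng(1)
depth = rng.integers(0, 256, (144, 176)).astype(np.uint8)
lossy = depth.copy()
lossy[40:60, :] = 0
print(f"structural index: {structural_index(rr_side_info(depth), lossy):.3f}")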

NR methods are the most suitable for real-time 3D video applications, since they do not consume any bandwidth for the transmission of side information. However, their performance and application domain are limited, since they rely solely on the received 3D image/video sequence and other contextual information (e.g., packet loss rate and bit error rate in hybrid NR methods). It may be impossible to account for all the artifacts introduced along the end-to-end 3D video chain without referring to the original image sequence. This is why most of the proposed NR metrics are limited to a specific set of artifacts [13].
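The sketch below illustrates the hybrid NR idea (a toy formulation with hypothetical weights, not a published metric): a blockiness feature computed on the decoded frame is combined with network-side context, here the packet loss rate, to yield a MOS-like score:

import numpy as np

# Toy hybrid NR sketch: picture feature (blockiness) + network feature
# (packet loss rate). The weights w_block and w_loss are hypothetical;
# a real metric would be trained against subjective scores.
def blockiness(img, block=8):
    """Mean luminance jump across 8x8 block boundaries, relative to elsewhere."""
    img = img.astype(np.float64)
    diffs = np.abs(np.diff(img, axis=1))
    at_boundaries = diffs[:, block - 1::block].mean()
    return at_boundaries / (diffs.mean() + 1e-9)   # ~1.0 when no blocking

def hybrid_nr_score(img, packet_loss_rate, w_block=2.0, w_loss=25.0):
    """Combine picture and network features into a MOS-like 1-5 score."""
    impairment = w_block * max(blockiness(img) - 1.0, 0.0) + w_loss * packet_loss_rate
    return 5.0 - min(impairment, 4.0)

rng = np.random.default_rng(2)
frame = rng.integers(0, 256, (144, 176)).astype(np.uint8)
print(f"NR score at 2% loss: {hybrid_nr_score(frame, packet_loss_rate=0.02):.2f}")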

Table 2 reports a few existing NR and RR quality metrics for 3D image/video, indicating which image features are used to measure the overall perception and how well the different metrics correlate with subjective quality scores (i.e., MOS) and with existing full-reference methods.


It can be observed that most of these methods show a high degree of correlation with subjective MOS and with full-reference methods. However, these metrics focus on one or two specific 3D perceptual attributes; the combined effect of these attributes, which is directly related to 3D QoE, has not been addressed to date. The methods in [13, 14] are evaluated using the same image database, whereas the others are evaluated using different datasets. Since some of these metrics (e.g., the NR metrics [13, 15]) are designed for particular types of image artifacts (e.g., JPEG compression), it is not always possible to compare the performance of an NR metric with another objective quality model on a common dataset. On the other hand, due to the overhead associated with RR metrics, compared to zero overhead for NR metrics, the usage and advantages of these methods are significantly different. In addition, for practical reasons (intellectual property rights; different source 3D video formats, e.g., color plus depth vs. left and right images; unavailability of ground truth depth maps; etc.), it is not always feasible to compare the performance of two different 3D quality evaluation algorithms on a common dataset. The lack of reliable and comprehensive 3D image/video databases is another major challenge faced by researchers and developers, making it difficult to effectively compare the performance of emerging objective and subjective quality evaluation methods with that of existing methods.

CHALLENGES AND FUTURE RESEARCH DIRECTIONS

The possibility of measuring 3D image/video quality in real time, as required by 3D video applications, is hindered by several issues. The major challenge is how to measure the effect of all the perceptual attributes (depth, presence, naturalness, etc.) associated with 3D viewing. The lack of common 3D image/video databases is also detrimental to advances in this discipline. The following paragraphs briefly discuss these challenges and possible solutions.

MEASUREMENT OF DIFFERENT 3D PERCEPTUAL ATTRIBUTES

Even though emerging 3D quality evaluation methods accurately predict a given quality attribute, the relationships among these perceptual attributes have not been thoroughly studied. Their combined effect directly affects user experience and can be measured using emerging QoE indices. Therefore, the current need is to understand how 3D audio/image processing and transmission artifacts affect the overall experience of the user, and then to identify audio, image, and contextual features that can be used to quantify the overall effect on user experience. On the other hand, it is necessary to understand how the HVS perceives these 3D artifacts; for instance, whether binocular suppression or binocular rivalry takes place depends on the artifacts in question. These aspects need extended attention in order to measure the overall experience of 3D viewing.
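As a sketch of what such a combined QoE index could look like (a hypothetical weighted-sum formulation; the attributes, weights, and 1-5 scale are assumptions that would have to be fitted to subjective data), consider:

# Illustrative combined QoE index: a weighted sum of the per-attribute
# scores discussed above. The weights are hypothetical; in practice they
# would be obtained by regression against subjective QoE ratings.
ATTRIBUTE_WEIGHTS = {          # hypothetical weights, summing to 1.0
    "image_quality": 0.40,
    "depth_perception": 0.25,
    "naturalness": 0.15,
    "visual_comfort": 0.20,
}

def overall_3d_qoe(scores):
    """Weighted combination of per-attribute scores (each on a 1-5 scale)."""
    return sum(w * scores[name] for name, w in ATTRIBUTE_WEIGHTS.items())

print(overall_3d_qoe({"image_quality": 4.2, "depth_perception": 3.8,
                      "naturalness": 3.5, "visual_comfort": 2.9}))  # ~3.74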

In order to enable a unified approach to 3D objective and subjective quality evaluation studies, standardization of these procedures is necessary. Several standardization activities are being carried out by the Video Quality Experts Group (VQEG), the ITU (Recommendations in the ITU-T P and J series), the European Broadcasting Union (EBU, 3D-TV Group), and other standards development organizations (SDOs) in relation to 3D video subjective and objective quality evaluation. Currently, VQEG is working (3DTV project) on creating a ground truth 3D video dataset (GroTruQoE dataset) using the pair-comparison method. This ground truth database will then be used to evaluate other time-efficient 3D subjective quality evaluation methodologies and objective quality models.

Figure 3. Reduced-reference edge-based 3D video quality metric [12]. At the receiver, luminance and contrast statistics for the original and processed color image and depth map are combined with color and depth structural comparisons after RR feature extraction and 3D video decoding, producing separate quality indices for color and depth.


In addition, the project also addresses the objective quality assessment of 3D video, with the plan to evaluate 3D QoE along the visual quality, depth quality, and visual comfort dimensions. Most of these findings feed into the objective and subjective 3D video quality studies in ITU-T Study Groups (SGs) 9 and 12. The EBU is also working on 3D video production, formats, and sequence properties for 3D-TV broadcasting applications (e.g., EBU Recommendation R 135).

LACK OF 3D IMAGE/VIDEO DATABASES

There are several image/video quality databases for conventional 2D image/video artifacts, but only a few have been reported for 3D image/video artifacts. This prevents developers from using a common dataset to evaluate the performance of their metrics. Table 3 lists some of the 3D image/video databases reported in the literature. The range of artifacts considered in these databases is limited, and most do not consider artifacts that could be introduced during transmission. Therefore, it is a responsibility of the research community to produce comprehensive 3D video datasets covering a range of image and transmission artifacts, and to make the developed 3D image/video datasets publicly available.

VISUAL ATTENTION MODELS TO DEVELOP RR AND NR QUALITY METRICS

The attention of users during 3D viewing can be influenced by several factors, including spatial/temporal frequencies, depth cues, conflicting depth cues, and so on. Studies on visual attention in 2D/3D images have found that the behavior of viewers during 2D viewing and 3D viewing is not always identical (e.g., center bias vs. depth bias). These observations are tightly linked to the way we perceive 3D video. Therefore, effective 3D video quality evaluation and 3D QoE enhancement schemes could be designed based on these observations.

There are still unanswered questions, such as whether quality assessment is analogous to attentional quality assessment, and how attention mechanisms could be properly integrated into the design of QoE assessment methodologies.

Table 2. NR and RR methods for 3D image/video (IA: image artifacts; D: disparity; CC: correlation coefficient; ROCC: rank-order correlation coefficient; OR: outlier ratio; RMSE: root mean square error).

• Cyclop [14] (RR). Artifacts: JPEG symmetric and asymmetric coding artifacts. Features: IA: contrast sensitivity (spatial frequency and orientation); D: coherence of cyclopean images. Performance: CC 0.981, ROCC 0.950, OR 0.050.

• Sazzad et al. [13] (NR). Artifacts: JPEG symmetric and asymmetric coding artifacts. Features: IA: blockiness and zero crossing of edge, flat, and texture areas; D: average zero crossing of plane and non-plane areas. Performance: CC 0.960, ROCC 0.920, OR 0.069.

• Solh et al. [15] (NR). Artifacts: depth map and color video compression, depth estimation (stereo matching), and depth from 2D-to-3D conversion. Features: IA/D: temporal outliers (TO), temporal inconsistencies (TI), and spatial outliers (SO) using an ideal depth estimate for each pixel. Performance: CC 0.916, ROCC 0.1003, OR 0.8, RMSE 1.686.

• Hewage and Martini [12] (RR). Artifacts: H.264 compression and random packet losses. Features: IA: luminance, structure, and contrast; D: edge-based structural correlation. Performance: CC 0.9273 (color, vs. FR) and 0.9795 (depth, vs. FR); RMSE 0.0110 (color, vs. FR) and 0.0064 (depth, vs. FR).

Table 3. Available 3D image/video databases.

• Mobile 3D video database (University of Tampere and Nokia): crosstalk, blocking, color mismatch and bleeding, packet losses for low-resolution video (only impaired sequences; no MOS values provided).
• IRCCyN 3D image database (University of Nantes): JPEG, JPEG 2000, upsampling/downsampling, etc.
• EPFL databases for images/videos (EPFL): different camera distances.
• Kingston University video database (Kingston University, London): packet losses.
• NAMA3DS1-COSPAD1 (University of Nantes): H.264 and JPEG 2000 compression artifacts.
• RMIT3DV (RMIT University): uncompressed HD 3D video.


A thorough study has not been conducted to date to identify the relationship between 3D image/video attention models and 3D image/video quality evaluation.

Similar to the integrated model described above, the attentive areas identified by visual attention studies can be used to extract image features for the design of NR and RR quality metrics for real-time 3D video applications. Furthermore, since visual attention models can predict the highly attentive areas of an image or video, they can be integrated into source and channel coding at the sender side. Emerging 3D saliency models incorporate 2D image, depth, and motion information and can be applied to 3D video sequences. Most of the reported 3D saliency models are extensions of 2D visual saliency models that incorporate depth information. Table 4 lists a few 3D saliency models reported in the literature. There are two main types of depth-integrated saliency models: depth-weighted models, which weight the 2D saliency map based on depth information, and depth-saliency-based models, in which the predicted 3D saliency map is derived as a weighted combination of the 2D and depth saliency maps.

Table 4. 3D image/video visual attention models.

• J. Wang, M. P. Da Silva, P. Le Callet, V. Ricordel (2013), University of Nantes: 2D + depth saliency (using both current and prior image information); motion information is not taken into account.
• Y. Zhang, G. Jiang, M. Yu, K. Chen (2010), Ningbo University: 2D + depth + motion.
• E. Potapova, M. Zillich, M. Vincze (2011), Vienna University of Technology: 2D + depth saliency (based on surface height and relative surface orientation).
• N. Ouerhani, H. Hugli (2010), Neuchatel University: 2D + scene depth.
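The two fusion strategies can be illustrated with the minimal Python sketch below (illustrative formulations under the assumption of precomputed, normalized 2D saliency, depth, and depth-saliency maps; not any specific model from Table 4):

import numpy as np

# Two depth-integrated saliency fusions, assuming inputs normalized to
# [0, 1] with nearer objects having larger depth values.
def depth_weighted_saliency(s2d, depth):
    """Depth-weighted model: modulate the 2D saliency map by depth."""
    s = s2d * depth
    return s / (s.max() + 1e-9)

def depth_saliency_fusion(s2d, s_depth, w2d=0.6, wd=0.4):
    """Depth-saliency-based model: weighted sum of 2D and depth saliency maps."""
    s = w2d * s2d + wd * s_depth
    return s / (s.max() + 1e-9)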

CONCLUSION

We have presented the main concepts for the assessment of quality of experience for 3D video streaming, highlighting the most recent achievements and current challenges. Studies on the human visual system, together with the development of mathematical models and the availability of 3D video databases for the comparison of results among different research teams, will enable advances in this area.

REFERENCES

[1] L. Meesters, W. IJsselsteijn, and P. Seuntiens, "A Survey of Perceptual Evaluations and Requirements of Three-Dimensional TV," IEEE Trans. Circuits and Systems for Video Tech., vol. 14, no. 3, Mar. 2004, pp. 381–91.

[2] M. Martini et al., "Guest Editorial: QoE-Aware Wireless Multimedia Systems," IEEE JSAC, vol. 30, no. 7, 2012, pp. 1153–56.

[3] P. Le Callet, S. Moeller, and A. Perkis, Eds., "Qualinet White Paper on Definitions of Quality of Experience, Version 1.1," European Network on Quality of Experience in Multimedia Systems and Services (COST Action IC 1003), June 2012.

[4] K. Wang et al., "Perceived 3D TV Transmission Quality Assessment: Multilaboratory Results Using Absolute Category Rating on Quality of Experience Scale," IEEE Trans. Broadcasting, 2012.

[5] J. Cutting and P. Vishton, "Perceiving Layout and Knowing Distances: The Integration, Relative Potency, and Contextual Use of Different Information About Depth," Perception of Space and Motion, W. Epstein and S. Rogers, Eds., 1995, pp. 69–117.

[6] M. Lambooij et al., "Visual Discomfort and Visual Fatigue of Stereoscopic Displays: A Review," J. Imaging Science and Tech., vol. 53, no. 3, May 2009, pp. 30201-1–14.

[7] P. Seuntiens, L. Meesters, and W. IJsselsteijn, "Perceived Quality of Compressed Stereoscopic Images: Effects of Symmetric and Asymmetric JPEG Coding and Camera Separation," ACM Trans. Applied Perception, vol. 3, no. 2, 2006, pp. 95–109.

[8] A. Benoit et al., "Quality Assessment of Stereoscopic Images," EURASIP J. Image and Video Proc., vol. 2008, 2008.

[9] C. Hewage et al., "Quality Evaluation of Color Plus Depth Map-Based Stereoscopic Video," IEEE J. Selected Topics in Sig. Proc., vol. 3, no. 2, 2009, pp. 304–18.

[10] J. Seo et al., "An Objective Video Quality Metric for Compressed Stereoscopic Video," Circuits, Systems, and Sig. Proc., 2012, pp. 1–19.

[11] M. Martini et al., "Content Adaptive Network Aware Joint Optimization of Wireless Video Transmission," IEEE Commun. Mag., vol. 45, no. 1, 2007, pp. 84–90.

[12] C. Hewage and M. Martini, "Edge-Based Reduced-Reference Quality Metric for 3-D Video Compression and Transmission," IEEE J. Selected Topics in Sig. Proc., vol. 6, no. 5, 2012, pp. 471–82.

[13] Z. Sazzad et al., "Stereoscopic Image Quality Prediction," Proc. Int'l. Wksp. Quality of Multimedia Experience, 2009, pp. 180–85.

[14] A. Maalouf and M. Larabi, "CYCLOP: A Stereo Color Image Quality Assessment Metric," Proc. IEEE Int'l. Conf. Acoustics, Speech and Signal Processing, 2011, pp. 1161–64.

[15] M. Solh and G. AlRegib, "A No-Reference Quality Measure for DIBR-Based 3D Videos," Proc. IEEE Int'l. Conf. Multimedia and Expo, 2011, pp. 1–6.

BIOGRAPHIES

CHAMINDA T. E. R. HEWAGE ([email protected]) received his B.Sc. Eng. (Hons.) degree in electrical and information engineering from the University of Ruhuna, Sri Lanka. During 2004-2005 he worked as a telecommunication engineer for Sri Lanka Telecom Ltd. He completed his Ph.D. (thesis: Perceptual Quality Driven 3D Video over Networks) at the Centre for Communication Systems Research, University of Surrey, United Kingdom. Since September 2009 he has been with the Wireless and Multimedia Networking Research Group at Kingston University (KU), London, United Kingdom.

MARIA G. MARTINI ([email protected]) is a reader (associate professor) at KU. She received her Laurea degree in electronic engineering (summa cum laude) from the University of Perugia, Italy, in 1998 and her Ph.D. in electronics and computer science from the University of Bologna, Italy, in 2002. She has led the KU Wireless Multimedia Networking research group in national and international research projects. Her research interests include wireless multimedia networks, cross-layer design, and 2D/3D video quality assessment.
