
Signal Processing: Image Communication 23 (2008) 116–126

www.elsevier.com/locate/image

GOP-level transmission distortion modeling for mobile streaming video

Chongyang Zhang^{a,b,*}, Hua Yang^{a,b}, Songyu Yu^{a,b}, Xiaokang Yang^{a,b}

^a Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai 200240, PR China
^b Shanghai Key Laboratory of Digital Media Processing and Transmissions, Shanghai Jiao Tong University, Shanghai 200240, PR China

Received 21 May 2007; received in revised form 5 December 2007; accepted 6 December 2007

Abstract

Unequal loss protection is an effective tool for robustly delivering compressed video streams over packet-switched networks. A critical component of any unequal-loss-protection scheme is a metric for evaluating the importance of different frames in a Group-Of-Pictures (GOP). In video streaming over 3G mobile networks, a packet loss usually corresponds to a whole-frame loss because of the low bandwidth and small picture size. This results in high error rates, so most of the existing low-complexity transmission-distortion-estimate models may be ineffective. In this paper, we first develop a recursive algorithm that computes the GOP-level transmission distortion at pixel-level precision using pre-computed video information. Based on a study of the propagation behavior of the whole-frame-loss transmission distortion, we then propose a piecewise linear-fitting approach for low-complexity transmission distortion modeling. The simulation results demonstrate that the two proposed models are accurate and robust. They provide fast and accurate importance-assessment tools for optimally allocating limited channel resources to mobile streaming video.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Transmission-distortion-estimate; Unequal loss protection; Streaming video

1. Introduction

Nowadays, the expanded bandwidth of the air interfaces has laid solid ground for streaming media applications on 3G mobile networks. With the advantages of wireless systems in time and place, mobile streaming media service is very attractive.

0923-5965/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.image.2007.12.002

*Corresponding author at: Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai 200240, PR China. Tel.: +86 21 34204503; fax: +86 21 34204155.
E-mail address: [email protected] (C. Zhang).

Because of the various kinds of fading and multipath interference in wireless channels, mobile video communication over 3G networks experiences burst packet losses. Moreover, the compressed video signal is extremely vulnerable to transmission errors, since low bit-rate video coding schemes rely on interframe coding for high coding efficiency. The coding structure of motion-compensated interframe prediction creates strong spatio-temporal dependency among video frames [1,2]. Consequently, the unavoidable packet losses during wireless transmission may cause errors to propagate through the reconstructed video and thus induce severe quality degradation. This type of picture distortion is called transmission distortion.

To combat the effect of losses, many error-control techniques have been proposed for packet video transmission [2–5]. In Ref. [4], the error-control mechanisms are classified into four categories: forward error correction (FEC), retransmission, error resilience, and error concealment. For streaming media applications, which are real-time, retransmission-based schemes are usually not allowed because of the long delay and the hundreds of multicast members. Error-resilient schemes deal with packet loss at the compression layer, and most of them (e.g., resynchronization marking, data partitioning, and data recovery) aim to recover the decoding from bit errors. For packet video, a transmission error usually means the loss of an entire packet, so these bit-recovery-based error-control mechanisms may cease to be effective [4]. Since complex error-concealment mechanisms are restricted by both the limited processing ability and the power consumption of handheld devices, the simplest and most common approach, previous-frame repetition, is usually adopted in mobile streaming media applications. FEC-based unequal loss protection (ULP) is one of the efficient error-control schemes used for transmitting compressed video streams over packet-loss networks [6,7]; it allocates limited channel resources efficiently to achieve improved rate-distortion (R-D) performance.

One of the most critical aspects of efficient resource allocation is accurate evaluation of the end-to-end video quality [8]. In the accurate transmission-distortion-estimate (TDE) schemes with moderate complexity, such as the ROPE algorithm [9] and the statistics-based analytic model [10], the overall distortion accumulated from previous frames is computed to determine the coding mode for the current macroblock (MB). The total transmission distortion, which includes the distortion in the current MB and its propagation into subsequent frames, cannot be obtained from these schemes. The low-complexity models include the method that considers intra refreshing and spatial loop filtering [11] and the method that uses basic concepts of control systems [12]. It is worth noting that these low-complexity estimation models are applicable to low-error-rate applications. For example, the error rate in Ref. [11] is less than 6% and that in Ref. [12] is about 8%. In mobile video applications, a packet loss usually corresponds to a whole-frame loss because of the low bandwidth and small picture size [13]. This results in high error rates (a large number of blocks with nonzero motion vectors (MVs) are corrupted by previous-frame concealment), and thus the above low-complexity schemes are ineffective [11]. Simplified distortion-estimate models that do not consider the picture complexity, such as ELEP in Ref. [6], where only temporal error propagation is taken into account, are usually not accurate enough, and thus optimal R-D performance cannot be achieved. Based on statistics of error propagation, a method for lightweight prediction of video distortion is proposed in Ref. [14].

In this paper, by exploiting the pre-computed information (such as MVs) of stored video signals, we first present a recursive TDE algorithm that computes the GOP-level transmission distortion for whole-frame losses. We then develop a low-complexity TDE model using a piecewise linear-fitting approach, based on a study of the error propagation behavior of whole-frame losses. The experimental results demonstrate that the two proposed TDE models are accurate and robust. The proposed transmission distortion models provide fast and accurate importance-assessment tools for allocating limited channel resources effectively.

The rest of this paper is organized as follows. In Section 2, we analyze the error propagation characteristics and the unequal importance of different frames in a Group-Of-Pictures (GOP). Section 3 presents the recursive TDE algorithm for stored video streaming, and Section 4 gives a low-complexity estimation model based on a piecewise linear-fitting approach. Simulation results are shown in Section 5. Finally, conclusions are drawn in Section 6.

2. Error propagation and unequal importance in a GOP

2.1. Interframe error propagation

The common video coding scheme employs interframe prediction to remove temporal redundancies. Although interframe coding generally achieves higher compression efficiency, it is more sensitive to channel packet losses, since each inter-predicted frame depends on its predecessor and any packet loss may break the prediction chain and affect all subsequent inter-predicted frames.

Let a packet containing data from the current frame be lost in the channel, and let the decoder perform previous-frame-repetition error concealment. Clearly, the resulting reconstruction at the decoder differs from the reconstruction at the encoder. Note that at the decoder side, the current reconstructed frame "corrupted" by packet loss will still be used as the motion-compensation reference for the next frame. In this way, the channel distortion propagates along the motion-compensation path. Whenever the motion vector is nonzero, the error propagates in both the temporal and the spatial directions [9], which makes the resulting artifacts particularly annoying (as shown in Figs. 1 and 2).

2.2. Unequal importance assessment using GOP-level transmission distortion

In video sequences, a typical GOP is composed of one I-frame followed by $N_G - 1$ P- and B-frames (here $N_G$ denotes the total number of frames in one GOP). Since B-frame losses do not interfere with other frames, we consider the frame sequences in a GOP structure without B-frames. In this structure, losing different frames of a GOP often results in different distortion. In other words, the frames in a GOP have unequal importance [15]. Fig. 2 gives an example: the errors induced by the loss of the second frame lead to more artifacts in the following frames (Fig. 2, up) than those induced by the loss of the fifth frame (Fig. 2, down). Some video transmission strategies already consider the unequal importance in a GOP. In Ref. [16], channel resources are optimally assigned to the P-frames, whose importance is assumed to decrease along the GOP. However, when the video scene or the GOP size changes, the assumption that preceding frames are more important than following frames may be inaccurate [17]. In state-of-the-art unequal-error-protection schemes, accurate evaluation of the GOP-level transmission distortion is a key element in assigning the channel resources optimally [8]. Since more transmission distortion means more importance, we use the GOP-level transmission distortion $D_G(n_0)$, which denotes the overall distortion in a GOP induced by the loss of frame $n_0$ and the corresponding error concealment, to assess the importance of different frames in a GOP accurately.

Fig. 1. Illustration of spatio-temporal error propagation. (Labels: frame loss; errors induced by concealment; time.)

Fig. 2. Illustration of spatial-temporal error propagation due to the loss of the second frame (up) and the fifth frame (down) separately (the frame index in the two picture groups above is: 1, 5, 10, 15, 20, and 25).

Let $F(n, i)$ be the original value of pixel $i$ in frame $n$, and let $\hat{F}(n, i)$ and $\tilde{F}(n, i)$ be its encoder reconstruction and decoder reconstruction, respectively. Let $D_t(n_0)$ be the instantaneous transmission distortion (ITD) induced by error concealment when frame $n_0$ is lost, defined as the mean square error (MSE) between its encoder and decoder reconstructions. Based on this definition, its value is

$$D_t(n_0) = \frac{1}{N_P} \sum_{i=0}^{N_P-1} \left[\hat{F}(n_0, i) - \tilde{F}(n_0, i)\right]^2, \qquad (1)$$

Fig. 3. GOP-level transmission distortion vs. frame index for three test sequences: (a) Akiyo, (b) Foreman and (c) Carphone. (Axes: frame index vs. GOP-level distortion.)

where $N_P$ is the total number of pixels in a frame. Because most prevalent video coding schemes are based on a hybrid structure of motion-compensated prediction, strong spatio-temporal dependency is created in the compressed bitstreams, which results in error propagation from the lost frame to the following pictures in the same GOP. Similarly to Eq. (1), the distortion of picture $n_0 + k$ introduced by the error propagation from frame $n_0$ can be expressed as

$$D_t(n_0 + k) = \frac{1}{N_P} \sum_{i=0}^{N_P-1} \left[\hat{F}(n_0 + k, i) - \tilde{F}(n_0 + k, i)\right]^2. \qquad (2)$$

Based on the above analysis, the GOP-level transmission distortion for frame $n_0$, $D_G(n_0)$, can be obtained by accumulating the ITD induced in the current frame and in the following frames of the same GOP:

$$D_G(n_0) = \sum_{n=n_0}^{N_G-1} D_t(n) = \frac{1}{N_P} \sum_{n=n_0}^{N_G-1} \sum_{i=0}^{N_P-1} \left[\hat{F}(n, i) - \tilde{F}(n, i)\right]^2. \qquad (3)$$

Using Eq. (3), Fig. 3 plots the GOP-level distortion $D_G(n_0)$ that results from losing different frames for three typical QCIF sequences, namely Akiyo, Foreman, and Carphone. Each sequence is simulated with a GOP size of 30. As can be seen from Fig. 3, losing different frames in the same GOP results in different GOP-level transmission distortion.
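The direct evaluation of Eqs. (1)–(3) can be sketched as follows. This is a minimal illustration only: frames are assumed to be flat lists of pixel values, the function names `itd` and `gop_distortion` are hypothetical, and the toy pixel values are invented for the example rather than taken from the test sequences.

```python
def itd(enc_frame, dec_frame):
    """Eqs. (1)/(2): MSE between encoder and decoder reconstructions of one frame."""
    n_p = len(enc_frame)
    return sum((e - d) ** 2 for e, d in zip(enc_frame, dec_frame)) / n_p


def gop_distortion(enc_frames, dec_frames, n0):
    """Eq. (3): accumulate the ITD from the lost frame n0 to the end of the GOP."""
    return sum(itd(enc_frames[n], dec_frames[n]) for n in range(n0, len(enc_frames)))


# Toy GOP (hypothetical values): 4 frames of 4 pixels each.
enc = [[10, 10, 10, 10], [12, 12, 12, 12], [14, 14, 14, 14], [16, 16, 16, 16]]
# Frame 1 is lost and concealed by repeating frame 0; the error persists afterwards.
dec = [[10, 10, 10, 10], [10, 10, 10, 10], [12, 12, 12, 12], [14, 14, 14, 14]]

print(gop_distortion(enc, dec, n0=1))  # 4.0 + 4.0 + 4.0 = 12.0
```

Note that obtaining `dec` in this way requires decoding every following frame for every candidate lost frame, which is the cost that Section 3 sets out to avoid.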

3. Recursive computation of the GOP-level transmission distortion


Unfortunately, the computational complexity is very high when Eq. (3) is used to calculate the GOP-level transmission distortion, and the low-complexity TDE models are badly suited to wireless video with high error rates (induced by whole-frame losses). Thus, for mobile streaming media with whole-frame losses, a theoretical or approximate model with modest complexity is needed to compute the GOP-level transmission distortion with reasonable accuracy. Considering that errors propagate along the motion prediction path and that video information (such as MVs) is available in the pre-coded media, we develop a recursive transmission-distortion-computation approach, which is presented in this section.

3.1. Recursive construction of the motion vector mapping table

We first construct the MV table $V$ for each P-frame. The element $V(n, i)$ of the table is assigned according to Eq. (4):

$$V(n, i) = \begin{cases} \mathrm{MV}(n, i), & i \in \text{inter-coded block}, \\ \mathrm{CMAX}, & i \in \text{intra-coded block}. \end{cases} \qquad (4)$$

Here $\mathrm{MV}(n, i)$ is the motion vector of pixel $i$ in frame $n$, and CMAX is a predefined maximum possible MV value. For non-real-time applications, knowledge about the interdependence among blocks or pixels can be obtained by analyzing the MVs between successive frames [18]. Using the MVs, we develop a mapping table $M$ that directly constructs the dependence graph between the lost frame and its following frames; this table is used to calculate the distortion propagated in a GOP.

Fig. 4. An illustration of constructing an MV mapping.

An example of how to construct a graph and calculate the mapping vector is presented in Fig. 4. In Fig. 4, pixel $c$ in frame $n_0 + 2$ is predicted from pixel $b$ in frame $n_0 + 1$, and the reference pixel of $b$ is pixel $a$ in frame $n_0$. We thus define the mapping vector of pixel $c$ to be:

$$M_{n_0+2,n_0}(c) = V(n_0 + 2, c) + V(n_0 + 1, b). \qquad (5)$$

In general, the mapping vector can be calculated recursively by Eq. (6):

$$\begin{aligned}
M_{n_0+1,n_0}(i) &= V(n_0+1, i) \\
M_{n_0+2,n_0}(i) &= V(n_0+2, i) + M_{n_0+1,n_0}(R_{n_0+2,n_0+1}(i)) \\
&\;\;\vdots \\
M_{n_0+k,n_0}(i) &= V(n_0+k, i) + M_{n_0+k-1,n_0}(R_{n_0+k,n_0+k-1}(i))
\end{aligned} \qquad (6)$$

In Eq. (6), $R_{n_0+k,n_0}(i)$ represents the reference pixel in frame $n_0$ of pixel $i$ in frame $n_0 + k$, and its value can be calculated from the constructed MV mapping table:

$$R_{n_0+k,n_0}(i) = i + \Delta(M_{n_0+k,n_0}(i)), \qquad (7)$$

where $\Delta(\cdot)$ denotes the index offset, which can be obtained from the mapping vector.

Note that in Eqs. (4)–(7), the left-hand side should be set to CMAX whenever one of the components on the right-hand side equals CMAX. CMAX is introduced to handle mapping chains that are broken by intra-coded blocks. Once an intra-coded block is met in the mapping chain, the propagation of the transmission error stops, and the distortion of a pixel predicted from an intra-coded block is set to zero by

$$D_t(n_0 + k, i) = 0 \quad \text{when } R_{n_0+k,n_0}(i) = \mathrm{CMAX}, \qquad (8)$$

where $D_t(n_0 + k, i)$ denotes the pixel-level transmission distortion of pixel $i$ in frame $n_0 + k$.

Based on the above construction scheme, only a single MV mapping table $M_{n_0+k,n_0}$ is needed during the calculation of the GOP-level transmission distortion. In practice, this MV mapping table is updated dynamically as the distortion computation proceeds.
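The recursion of Eqs. (4)–(8) can be sketched in a simplified setting. The sketch below makes several assumptions that are not part of the paper: frames are one-dimensional, each MV is an integer pixel offset so that $\Delta(\cdot)$ is the identity, references that fall outside the frame are clamped to the border, and CMAX is represented by an arbitrary large sentinel. The function names are hypothetical.

```python
CMAX = 10**6  # assumed sentinel marking intra-coded pixels (stands in for the paper's CMAX)


def build_mapping_tables(mv_tables):
    """Accumulate per-pixel offsets back to the lost frame n0, following Eq. (6).

    mv_tables[k-1][i] plays the role of V(n0+k, i): an integer 1-D offset to the
    reference pixel in the previous frame, or CMAX for intra-coded pixels (Eq. (4)).
    Returns a list whose entry k-1 holds M_{n0+k,n0}(i) for every pixel i.
    """
    mappings = []
    prev = None
    for v in mv_tables:
        cur = []
        for i, mv in enumerate(v):
            if mv == CMAX:
                cur.append(CMAX)          # intra pixel: the chain is broken here
            elif prev is None:
                cur.append(mv)            # first line of Eq. (6)
            else:
                # R_{n0+k,n0+k-1}(i), clamped to the frame border (an assumption)
                ref = min(max(i + mv, 0), len(v) - 1)
                # CMAX on the right-hand side propagates to the left-hand side
                cur.append(CMAX if prev[ref] == CMAX else mv + prev[ref])
        mappings.append(cur)
        prev = cur
    return mappings


def reference_pixel(i, m):
    """Eq. (7) with an identity index offset; CMAX marks a broken chain."""
    return CMAX if m == CMAX else i + m


# Hypothetical MV tables for frames n0+1 and n0+2 (4 pixels per frame).
v1 = [0, -1, -1, CMAX]
v2 = [1, 0, -1, 0]
tables = build_mapping_tables([v1, v2])
print(tables[1])  # [0, -1, -2, 1000000]
```

In this toy case, pixel 2 of frame $n_0+2$ references pixel 1 of frame $n_0+1$, which references pixel 0 of frame $n_0$, so its accumulated offset is -2. Pixel 3 references an intra-coded pixel, so its entry becomes CMAX.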

3.2. Recursive calculation of instantaneous transmission distortion


Assuming that all the frames in a GOP are received correctly, we have the following relation:

$$\tilde{F}(n, i) = \hat{F}(n, i) = \hat{F}(n-1, R_{n,n-1}(i)) + e(n, i), \qquad (9)$$

where $\hat{F}(n-1, R_{n,n-1}(i))$ is the encoder reconstruction of the reference pixel of pixel $i$ and $e(n, i)$ is the quantized prediction error. To assess the importance of frame $n_0$ accurately, we need to calculate its GOP-level transmission distortion under the assumption that only frame $n_0$ is lost and the remaining pictures in the same GOP are error-free. Supposing that previous-frame repetition is used for error concealment (i.e., $\tilde{F}(n, i) = \tilde{F}(n-1, i)$ for the lost frame $n$), we obtain the derivation in Eq. (10):

$$\begin{aligned}
\tilde{F}(n_0, i) &= \hat{F}(n_0-1, i) \\
\tilde{F}(n_0+1, i) &= \tilde{F}(n_0, R_{n_0+1,n_0}(i)) + e(n_0+1, i) \\
&= \hat{F}(n_0-1, R_{n_0+1,n_0}(i)) + e(n_0+1, i) \\
\tilde{F}(n_0+2, i) &= \tilde{F}(n_0+1, R_{n_0+2,n_0+1}(i)) + e(n_0+2, i) \\
&= \hat{F}(n_0-1, R_{n_0+2,n_0}(i)) + e(n_0+1, R_{n_0+2,n_0+1}(i)) + e(n_0+2, i) \\
&\;\;\vdots \\
\tilde{F}(n_0+k, i) &= \hat{F}(n_0-1, R_{n_0+k,n_0}(i)) + e(n_0+1, R_{n_0+k,n_0+1}(i)) + \cdots + e(n_0+k, i)
\end{aligned} \qquad (10)$$

Similarly, $\hat{F}(n_0+k, i)$ can be expressed as:

$$\hat{F}(n_0+k, i) = \hat{F}(n_0, R_{n_0+k,n_0}(i)) + e(n_0+1, R_{n_0+k,n_0+1}(i)) + \cdots + e(n_0+k, i). \qquad (11)$$

Substituting Eqs. (10) and (11) into Eqs. (1) and (2), respectively, we get:

$$D_t(n_0, i) = \left[\hat{F}(n_0, i) - \hat{F}(n_0-1, i)\right]^2, \qquad (12)$$

$$\begin{aligned}
D_t(n_0+k, i) &= \left[\hat{F}(n_0+k, i) - \tilde{F}(n_0+k, i)\right]^2 \\
&= \left[\hat{F}(n_0, R_{n_0+k,n_0}(i)) - \hat{F}(n_0-1, R_{n_0+k,n_0}(i))\right]^2 \\
&= D_t(n_0, R_{n_0+k,n_0}(i)).
\end{aligned} \qquad (13)$$

From Eq. (13), $D_t(n_0 + k, i)$ can be obtained from $D_t(n_0, j)$ with $j = R_{n_0+k,n_0}(i)$. To obtain $R_{n_0+k,n_0}(i)$ and $D_t(n_0, i)$, the MVs and the reconstructed pixels $\hat{F}(n_0, i)$ are necessary. For a video server that transmits compressed bitstreams, information such as the MVs and the reconstructed pixels $\hat{F}(n_0, i)$ can be obtained by decoding the compressed bitstreams and reconstructing the pictures at the sender side. Therefore, the GOP-level transmission distortion $D_G(n_0)$ can be calculated by accumulating the ITD $D_t(n_0 + k)$ ($n_0 \le n_0 + k \le N_G - 1$), according to Eqs. (3)–(13).
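The accumulation described by Eqs. (3), (8), (12) and (13) can be sketched as below. As an illustrative assumption, frames are flat 1-D pixel lists, the mapping tables hold integer offsets back to frame $n_0$ (with CMAX as an assumed sentinel for broken chains), and out-of-range references are clamped. The function name and all numeric values are hypothetical.

```python
CMAX = 10**6  # assumed sentinel for chains broken by intra-coded blocks


def gop_distortion_recursive(enc_prev, enc_lost, mappings):
    """Estimate D_G(n0) from the two encoder reconstructions around the lost frame.

    enc_prev : encoder reconstruction of frame n0-1 (used for concealment)
    enc_lost : encoder reconstruction of the lost frame n0
    mappings : mappings[k-1][i] is the accumulated offset M_{n0+k,n0}(i) or CMAX
    """
    n_p = len(enc_lost)
    # Eq. (12): pixel-level distortion of the concealed frame n0.
    d0 = [(cur - prev) ** 2 for cur, prev in zip(enc_lost, enc_prev)]
    total = sum(d0) / n_p
    for m in mappings:
        frame_sum = 0
        for i, offset in enumerate(m):
            if offset == CMAX:
                continue  # Eq. (8): no propagated distortion for this pixel
            j = min(max(i + offset, 0), n_p - 1)  # reference pixel in frame n0
            frame_sum += d0[j]  # Eq. (13): reuse the distortion of the reference pixel
        total += frame_sum / n_p  # frame-level ITD added as in Eq. (3)
    return total


d_g = gop_distortion_recursive(
    enc_prev=[10, 10, 10, 10],
    enc_lost=[12, 14, 10, 10],
    mappings=[[0, -1, -1, CMAX], [0, -1, -2, CMAX]],
)
print(d_g)  # 5.0 + 6.0 + 3.0 = 14.0
```

Only one MSE-style computation is performed (for frame $n_0$); the following frames contribute through table look-ups and additions, which is the "data moving and addition" cost discussed in Section 3.3.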

3.3. Complexity analysis

Compared with the traditional method using Eq. (3), the above recursive scheme saves a large amount of frame reconstruction and MSE computation (from $n_0 + 1$ to $N_G - 1$) when calculating the GOP-level transmission distortion. For a given GOP size $N_G$, the computational complexity (including frame-level decoding and MSE computation) of the traditional method is $O(N_G^2)$, since it requires $\sum_{n=1}^{N_G} n$ frame-level operations, while that of the proposed recursive method is $O(N_G)$. Given $N_G = 30$, we compare the computational complexity of the traditional method and that of the proposed recursive method in Table 1.

It can be seen from Table 1 that the operations of frame-level decoding and MSE computation, which form most of the computational burden, are reduced significantly by the recursive scheme compared with the traditional scheme. Although additional data (distortion) moving and addition operations are required in the proposed method, their complexity is negligible compared with that of decoding and MSE computation. Considering that most transmitters are PC-based servers, the computational complexity of the recursive scheme is modest for mobile streaming video applications.

Note that in video servers with multiple concurrent video streams, where up to 300–400 concurrent streams are required in some large-scale mobile VOD systems, the computational complexity of the above recursive algorithm may become unacceptable. Based on the recursive distortion-computation algorithm, we propose a low-complexity GOP-level TDE model to reduce the computing load further.

Table 1
Computational complexity comparison for different GOP-level TDE methods with GOP size 30

Operation (frame-level)     Traditional method   Recursive method
Decoding                    465 times            30 times
MSE computation             465 times            30 times
Data moving and addition    0 times              465 times
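The counts in Table 1 can be reproduced with a short calculation. The sketch assumes one reading of the table: the traditional method handles one more frame for each candidate lost frame, giving $1 + 2 + \cdots + N_G$ frame-level operations in total, while the recursive method handles each frame once. The function names are hypothetical.

```python
def traditional_ops(n_g):
    """Frame-level decodings (or MSE computations) summed over all candidate lost frames."""
    return sum(range(1, n_g + 1))  # 1 + 2 + ... + N_G = N_G (N_G + 1) / 2


def recursive_ops(n_g):
    """The recursive method decodes each frame of the GOP once."""
    return n_g


print(traditional_ops(30), recursive_ops(30))  # 465 30
```

The triangular-number growth of the first count is quadratic in the GOP size, whereas the second grows linearly.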

4. Low-complexity modeling for GOP-level transmission distortion estimate

4.1. The behavior of the whole-frame-loss error propagation

For typical mobile video, the compressed size of one video frame can become fairly small (800 bytes on average for 10 frames per second of QCIF video over a 64 Kbit/s wireless channel [13]), and a single packet per video frame is often adopted to keep the packet header overhead low. Thus, a packet loss corresponds to one whole-frame loss [13,19].

The authors in Ref. [10] have demonstrated that the impulse transmission distortion has an exponential fading behavior. However, the fading behavior of error propagation is determined by two effects: intra-block coding and repeated spatial filtering (i.e., sub-pixel interpolation) [2]. In fact, if the two effects are absent, the errors caused by MB losses will not decay over time (simulation A in Ref. [2]). In other words, the distortion presents exponential fading behavior only in low-error-rate cases, such as the 8% packet loss rate simulated in Ref. [10]. The reason is that the advanced video coding standards (MPEG-4 [20] or H.264/AVC [21]) provide an embedded spatial filtering function: a few errors can be smoothed out by a large number of error-free blocks through subpixel interpolation. When whole-frame losses occur in mobile video communication, the high error rates invalidate the filtering effect of interpolation, and the error propagation may not present exponential fading behavior [17].

Using the H.264/AVC reference software JM11.0 [22], we simulate and analyze the behavior of six typical QCIF sequences: Akiyo, Carphone, Foreman, Hall, Salesman, and Coastguard. These sequences cover a wide spectrum of scene characteristics. The sequences are coded without considering the channel losses, and the encoding structure is an intra-coded frame followed by a series of P-frames. We introduce a transmission error (frame loss) at frame $n_0 = 3$, and the remaining frames are error-free (received correctly). Here, the frame number 3 is arbitrarily chosen, and the GOP length $N_G$ is set to 30. Let

$$\bar{D}_t(n) = \frac{D_t(n)}{\max_{n=1,\ldots,N_G-1}\{D_t(n)\}}, \qquad (14)$$

which is the normalized transmission distortion. In Fig. 5, we plot the normalized transmission distortion $\bar{D}_t(n)$ as a function of the frame index (or time).

Fig. 5. The propagation behavior of the whole-frame-loss transmission distortion (the third frame is lost). (Axes: frame index vs. normalized distortion; curves: Akiyo, Carphone, Foreman, Hall, Salesman, Coastguard.)

Fig. 6. The piecewise linear-fitting for the transmission distortion estimate (the simulation is based on the test sequence Carphone with the loss of frame 3). (Axes: frame number vs. normalized distortion; the trapezium sides $a_i$, $b_i$ and heights $c_i$ are marked for three pieces.)
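The normalization of Eq. (14) amounts to dividing each frame-level distortion by the peak value in the GOP. A minimal sketch, with a hypothetical function name and invented sample values, and with an all-zero input returned unchanged as an added safeguard:

```python
def normalize(d_t):
    """Eq. (14): scale frame-level distortions so that the largest value becomes 1."""
    peak = max(d_t)
    if peak == 0:
        return list(d_t)  # nothing to normalize (safeguard, not part of Eq. (14))
    return [d / peak for d in d_t]


print(normalize([2.0, 4.0, 1.0]))  # [0.5, 1.0, 0.25]
```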

It can be seen from Fig. 5 that most of the transmission distortion curves are linear or piecewise linear. Thus, we can use a linear function to approximate the propagation behavior of the impulse transmission distortion. The approximation is more accurate when piecewise linear fitting is used, as shown in Fig. 6.

We measure the distortion that propagates from frame $n_0$ to the end of the GOP by the area $s$ under the distortion curve, and approximate this area as a sum of $K$ trapeziums, where $K$ is the total number of linear pieces. Let $a_i$, $b_i$, and $c_i$ denote the two parallel sides and the height of the $i$th trapezium, respectively (see Fig. 6). The estimated GOP-level distortion $\hat{D}_G(n_0)$ can then be calculated by

$$\hat{D}_G(n_0) = \frac{1}{N_G} \sum_{i=1}^{K} s_i = \frac{1}{N_G} \sum_{i=1}^{K} \frac{a_i + b_i}{2} c_i \qquad (15)$$

with

$$\left. \begin{aligned}
a_1 &= D_t(n_0) \\
a_{i+1} &= b_i = D_t(n_0 + k_i) \\
k_i &= \sum_{j=1}^{i} c_j
\end{aligned} \right\} \qquad (16)$$
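Eqs. (15) and (16) can be sketched as a sum of trapezium areas over a few sampled frames. The function name, the sample format (pairs of frame offset $k$ and distortion $D_t(n_0 + k)$, starting at $k = 0$) and the numeric values are assumptions made for illustration.

```python
def piecewise_gop_distortion(samples, n_g):
    """Estimate the GOP-level distortion from a few (k, D_t(n0 + k)) samples.

    samples : list of (frame offset k, distortion) pairs sorted by k, starting at k = 0
    n_g     : GOP size N_G used for the 1/N_G factor in Eq. (15)
    """
    total = 0.0
    for (k_a, a), (k_b, b) in zip(samples, samples[1:]):
        c = k_b - k_a               # height c_i of the trapezium (number of frames)
        total += (a + b) / 2 * c    # area s_i with parallel sides a_i and b_i
    return total / n_g


# Two pieces (K = 2): offsets 0..5 and 5..10, hypothetical distortion samples.
est = piecewise_gop_distortion([(0, 10.0), (5, 8.0), (10, 2.0)], n_g=10)
print(est)  # (45.0 + 25.0) / 10 = 7.0
```

Because consecutive pieces share an end point ($a_{i+1} = b_i$), $K$ pieces need only $K + 1$ distortion samples.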

4.2. Complexity analysis

Based on the above analysis, the GOP-level transmission distortion can be obtained using Eqs. (15) and (16). Each piece needs two parameters: the instantaneous transmission distortions $D_t(n_0)$ and $D_t(n_0 + k)$ at its two ends. With the piecewise fitting approach, the transmitter can compute the GOP-level transmission distortion using only a few frame-level samples (the start and end points of each piece). Obviously, the more sample points are used, the more accurate the estimation will be.

Based on the recursive algorithm presented in Section 3, $D_t(n_0 + k)$ can be obtained directly from $D_t(n_0)$ and the MV mapping table. Thus, the computational cost and the induced time delay are reduced further: the number of frame-level distortion-data-accumulating operations decreases from $\sum_{n=1}^{N_G} n = 465$ to $\sum_{n=1}^{N_G-1} [(N_G - n)/K] + N_G = 131$ (with $N_G = 30$ and $K = 5$). This means that about two-thirds of the computational burden is saved by the piecewise fitting scheme.

Compared with the existing low-complexity distortion-estimate schemes, such as the models proposed in Refs. [14,16], the proposed method is slightly more complex. However, the proposed method has a lower misclassification ratio because it tracks the error propagation accurately, considering both the temporal and the spatial error propagation. The reference schemes in Refs. [14,16] are both based on the error propagation length, and thus only the temporal error propagation is considered in the distortion computation. The low-complexity distortion-estimate model in Ref. [14] is particularly applicable to simple two-class DiffServ networks. However, if the number of picture priorities or classes is larger than 2, the proposed scheme may be more general.

5. Simulation results

Simulations have been carried out to evaluate the performance of the proposed GOP-level TDE models. We used the H.264/AVC reference software JM11.0 [22] to encode six typical test videos without considering any channel losses. All of them are coded with a GOP size of 30 frames (one previous frame is used as reference), and the frame rate is set to 15 frames per second (fps). The two proposed TDE models are simulated separately to obtain the GOP-level transmission distortion for each frame in a GOP.

Fig. 7. Estimation of the GOP-level transmission distortion for six QCIF test sequences: (a) Akiyo, (b) Hall, (c) Salesman, (d) Foreman, (e) Carphone and (f) Coastguard. (Axes: frame index vs. GOP-level distortion; curves: Actual, Recursive-Est., Fitting-Est.)

The actual transmission distortion (named "Actual"), the estimation based on the recursive algorithm (named "Recursive-Est.") and the estimation using the piecewise linear-fitting approach (named "Fitting-Est.") are shown in Fig. 7. Although the non-integer motion compensation and deblocking filters used in H.264/AVC introduce cross-correlation between pixels that makes the recursive estimation algorithm less precise, it can be seen from Fig. 7 that the two proposed transmission distortion models are both accurate and robust. The piecewise linear-fitting scheme uses fewer samples than the recursive scheme, while its relative importance judgement error (RIJE) values are almost the same as those of the recursive algorithm, which demonstrates the validity of the proposed low-complexity TDE model.

Since the most critical application of TDE is the assessment of importance for different frames in a GOP, we use the following formula:

$$E = \frac{\sum_{n=0}^{N_G-1} e(n)}{N_G} \times 100\% \qquad (17)$$

to measure the RIJE. Here $e(n)$ is the error flag for frame $n$: $e(n) = 0$ if the estimated priority of frame $n$ equals the priority assigned by the actual transmission distortion, and $e(n) = 1$ otherwise.
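Eq. (17) can be illustrated for a two-class setting in which a fixed fraction of the frames with the highest distortion forms the premium class. This is a sketch under assumptions: the function names, the tie handling (stable sort order) and the distortion values are hypothetical, and rounding the class size to the nearest integer is a choice made here.

```python
def top_fraction_classes(distortions, fraction=0.2):
    """Assign class 0 (premium) to the top fraction of frames by distortion, class 1 otherwise."""
    n = len(distortions)
    n_top = round(n * fraction)
    order = sorted(range(n), key=lambda i: distortions[i], reverse=True)
    classes = [1] * n
    for i in order[:n_top]:
        classes[i] = 0
    return classes


def rije(actual, estimated, fraction=0.2):
    """Eq. (17): percentage of frames whose estimated class differs from the actual class."""
    a = top_fraction_classes(actual, fraction)
    e = top_fraction_classes(estimated, fraction)
    errors = sum(1 for x, y in zip(a, e) if x != y)
    return 100.0 * errors / len(actual)


# Hypothetical GOP of 10 frames: the estimate swaps the ranking of frames 1 and 2.
actual = [50, 40, 30, 20, 10, 9, 8, 7, 6, 5]
estimated = [48, 25, 33, 19, 11, 9, 8, 7, 6, 5]
print(rije(actual, estimated))  # 20.0
```

A single ranking swap across the class boundary produces two error flags (one frame wrongly promoted, one wrongly demoted), which is why small estimation differences near the boundary can yield a noticeable RIJE.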

Two experiments were conducted to compare the RIJE of different TDE models. In the first experiment, the frames in a GOP are sorted into two classes: the 20% of frames with the highest GOP-level transmission distortion are assigned to the premium class, and the rest are sent as the normal class. The two-class network architecture is one of the simplest DiffServ scenarios [14]. To simulate a scenario with more than two priorities, such as the fine ULP scheme in Ref. [16], we set the class number to 3 in the second experiment, where every 10 frames with similar distortion are grouped and assigned the same priority.

Table 2
Relative importance judgement error comparison with class number = 2

Video sequence   Bit-rate (Kbit/s)   RIJE (%)
                                     Scheme-I   Scheme-II   Scheme-III   Scheme-IV   Scheme-V
Akiyo            48                  29         22          18           2           2
Hall             64                  29         16          13           7           7
Salesman         64                  11         11          7            0           0
Foreman          64                  16         13          13           0           0
Carphone         96                  16         11          2            2           2
Coastguard       96                  33         62          18           13          16

Table 3
Relative importance judgement error comparison with class number = 3

Video sequence   Bit-rate (Kbit/s)   RIJE (%)
                                     Scheme-I   Scheme-II   Scheme-III   Scheme-IV   Scheme-V
Akiyo            48                  41         39          38           7           10
Hall             64                  53         39          13           13          13
Salesman         64                  42         26          20           13          16
Foreman          64                  49         58          56           23          26
Carphone         96                  34         30          6            3           3
Coastguard       96                  33         62          18           13          16

In Tables 2 and 3, we give the RIJE comparison results for five TDE schemes. Scheme-I is the LEP (length of error propagation) method used in Ref. [16]; Scheme-II uses the instantaneous distortion as the GOP-level transmission distortion; Scheme-III uses the model-based distortion estimation developed in Ref. [14]; Scheme-IV and Scheme-V correspond to the proposed recursive algorithm and fitting scheme, respectively.

From Tables 2 and 3, we can see that in all cases the RIJE values of the proposed models (Scheme-IV and Scheme-V) are no higher than those of the intuitive models (Scheme-I to Scheme-III). This implies that the proposed TDE models are more accurate. Although Scheme-III achieves a low misclassification ratio in the two-class scenario, its performance degrades quickly in the three-class scenario. The reason is that the distortion-estimate model used in Scheme-III considers only the temporal error propagation and ignores the spatial error propagation. Owing to its low computational complexity, Scheme-III is well suited to simple two-class DiffServ networks [14]. However, for finer ULP applications with more priority levels, the proposed schemes (Scheme-IV and Scheme-V) are more accurate and more applicable.

From the comparison in Table 3 and Fig. 7, it can be found that the RIJE can be relatively high even when the estimation matches the experimental results well (refer to the RIJE values in Table 3 and the plots of Foreman, Salesman, and Coastguard in Fig. 7). One reasonable explanation is that the GOP-level transmission distortion in these sequences is more "regular" than in other sequences: only a few samples are abnormal (far larger than the others), and the remaining samples have similar distortion values. Most of the assessment errors may occur in the flat region, where the distortion difference between adjacent frames is very small. This type of importance judgment error should have a negligible effect on the optimal resource allocation.

Fig. 8. Performance (PSNR vs. packet loss rate) comparison between proposed scheme and reference schemes for Akiyo (up) and Foreman (down) sequences. (Axes: average packet loss rate (%) vs. average PSNR-Y (dB); curves: actual distortion, Scheme-I, Scheme-III, Scheme-V.)

An error scenario was also simulated to investigate the performance of the proposed low-complexity TDE model. A two-class DiffServ network is implemented with a discrete-event simulator; the 20% of frames with the highest GOP-level transmission distortion are assigned to the premium service, with nearly no losses and low delay, and the rest are sent as regular best-effort traffic [14]. Fig. 8 compares the performance of Scheme-I, Scheme-III, and Scheme-V as a function of packet loss rate. Results were obtained by simulating the transmission of the Foreman and Akiyo sequences, and the results of each experiment are averaged over 100 simulation runs. From Fig. 8, it can be seen that the proposed scheme (Scheme-V) achieves R-D performance close to that of the actual-distortion scheme and outperforms the reference schemes (Scheme-I and Scheme-III).
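The two-class assignment described above can be sketched as follows. This is a simplified stand-in, not the discrete-event simulator used in the experiments: it assumes the premium class is lossless, that each best-effort frame is lost independently with the given loss rate, and that the distortions of lost frames simply add up. The function names, the sample distortions, and the seed are hypothetical.

```python
import random


def split_premium(distortions, premium_fraction=0.2):
    """Return (premium, best_effort) lists of frame indices, with the
    premium_fraction of frames of highest distortion marked as premium."""
    n = len(distortions)
    k = max(1, round(n * premium_fraction))
    ranked = sorted(range(n), key=lambda i: distortions[i], reverse=True)
    return ranked[:k], ranked[k:]


def mean_lost_distortion(distortions, loss_rate, runs=100, seed=0):
    """Average, over `runs` trials, of the summed distortion of lost frames.
    Assumptions: premium frames are never lost, best-effort frames are lost
    independently with probability loss_rate, and distortions are additive."""
    rng = random.Random(seed)
    _, best_effort = split_premium(distortions)
    total = 0.0
    for _ in range(runs):
        total += sum(distortions[i] for i in best_effort
                     if rng.random() < loss_rate)
    return total / runs


# Hypothetical GOP-level distortion caused by losing each of 10 frames.
gop = [120, 15, 30, 200, 10, 25, 40, 5, 35, 20]
premium, best_effort = split_premium(gop)
print(premium)                           # indices of the two premium frames
print(mean_lost_distortion(gop, 0.10))   # average over 100 seeded runs
```

Replacing `gop` with the output of a different estimation scheme changes which frames are protected, which is how the schemes are compared in this kind of experiment.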

6. Conclusion

In this paper, we have first proposed a recursive estimation algorithm to compute the GOP-level transmission distortion induced by whole-frame losses. Based on a study of the propagation behavior of the whole-frame-loss transmission distortion for stored video streaming, we have then developed a low-complexity model to estimate the GOP-level transmission distortion accurately and robustly. With this estimation, the transmitter can effectively assess the importance of each frame in a GOP. The low-complexity estimation model can be combined with a channel resource allocation algorithm to improve system performance in mobile media streaming applications.

Acknowledgments

The authors would like to acknowledge the financial support of National Science Foundation of China (NSFC) under grants nos. 60502034 and 60625103. We would also like to thank the anonymous reviewers for their valuable suggestions, which greatly improved the presentation of this paper.

References

[1] Q. Zhang, W. Zhu, Y.Q. Zhang, End-to-end QoS for video delivery over wireless Internet, Proc. IEEE 93 (1) (January 2005) 123–134.

[2] B. Girod, N. Farber, Feedback-based error control for mobile video transmission, Proc. IEEE 87 (10) (October 1999) 1707–1723.

[3] Y. Wang, Q.F. Zhu, Error control and concealment for video communication: a review, Proc. IEEE 86 (5) (May 1998) 974–997.

[4] D. Wu, Y.T. Hou, Y.Q. Zhang, Transporting real-time video over the Internet: challenges and approaches, Proc. IEEE 88 (12) (December 2000) 1855–1875.

[5] Z.G. Li, C. Zhu, N. Ling, X.K. Yang, G.N. Feng, S. Wu, F. Pan, A unified architecture for real time video coding systems, IEEE Trans. Circuits Syst. Video Technol. 13 (6) (2003) 472–487.

[6] X.K. Yang, C. Zhu, Z.G. Li, X. Lin, G.N. Feng, S. Wu, N. Ling, Unequal loss protection for robust transmission of motion compensated video over the Internet, Signal Process.: Image Commun. 18 (2003) 157–167.

[7] M. Murroni, A power-based unequal error protection system for digital cinema broadcasting over wireless channels, Signal Process.: Image Commun. 22 (2007) 331–339.

[8] A.K. Katsaggelos, Y. Eisenberg, F. Zhai, R. Berry, T.N. Pappas, Advances in efficient resource allocation for packet-based real-time video transmission, Proc. IEEE 93 (1) (January 2005).

[9] R. Zhang, S.L. Regunathan, K. Rose, Video coding with optimal inter/intra-mode switching for packet loss resilience, IEEE J. Selected Areas Commun. 18 (June 2000) 966–976.

[10] Z.H. He, J.F. Cai, C.W. Chen, Joint source channel rate-distortion analysis for adaptive mode selection and rate control in wireless video coding, IEEE Trans. Circuits Syst. Video Technol., Special Issue on Wireless Video 12 (6) (June 2002) 511–523.

[11] K. Stuhlmuller, N. Farber, M. Link, B. Girod, Analysis of video transmission over lossy channels, IEEE J. Selected Areas Commun. 18 (June 2000) 1012–1032.

[12] Z.H. He, H.K. Xiong, Transmission distortion analysis for real-time video encoding and streaming over wireless networks, IEEE Trans. Circuits Syst. Video Technol. 16 (9) (September 2006) 1051–1062.

[13] H. Liu, W.J. Zhang, S.Y. Yu, X.K. Yang, Channel-aware frame dropping for cellular video streaming, in: Proceedings of the International Conference on Acoustics, Speech, Signal Processing, Toulouse, France, May 2006.

[14] F. De Vito, D. Quaglia, J.C. De Martin, Model-based distortion estimation for perceptual classification of video packets, in: Proceedings of IEEE Multimedia Signal Processing Workshop, vol. 1, Siena, Italy, September 2004, pp. 79–82.

[15] A. Albanese, J. Blomer, J. Edmonds, M. Luby, M. Sudan, Priority encoding transmission, IEEE Trans. Inform. Theory 42 (November 1996) 1737–1744.

[16] X.K. Yang, C. Zhu, Z.G. Li, G.N. Feng, S. Wu, N. Ling, Unequal error protection for motion compensated video streaming over the Internet, in: Proceedings of the International Conference on Image Processing (ICIP2002), vol. 2, New York, USA, September 2002, pp. II-717–II-720.

[17] C.Y. Zhang, H. Yang, S.Y. Yu, X.K. Yang, H. Liu, Transmission distortion modeling for unequal importance judgement, in: Proceedings of the IEEE 2007 International Conference on Multimedia & Expo (ICME 2007), Beijing, China, July 2007.

[18] M.H. Willebeek-LeMair, Robust H.263 video coding for transmission over the Internet, in: Proceedings of the IEEE INFOCOM'98, March 1998.

[19] S. Belfiore, M. Grangetto, E. Magli, G. Olmo, Concealment of whole-frame losses for wireless low bit-rate video based on multiframe optical flow estimation, IEEE Trans. Multimedia 7 (2) (April 2005) 316–329.

[20] ISO/IEC JTC 1/SC 29/WG11, 14496-2: Information technology - generic coding of audio-visual objects - Part 2: Visual, MPEG99/N 2688, Seoul, March 1999.

[21] ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC, Advanced Video Coding for Generic Audio–Visual Services, May 2003.

[22] http://iphome.hhi.de/suehring/tml/download/old_jm/jm11.0.zip.
