ARTICLE IN PRESS
Signal Processing: Image Communication 23 (2008) 116–126
www.elsevier.com/locate/image
GOP-level transmission distortion modeling for mobile streaming video
Chongyang Zhang a,b,*, Hua Yang a,b, Songyu Yu a,b, Xiaokang Yang a,b
a Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai 200240, PR China
b Shanghai Key Laboratory of Digital Media Processing and Transmissions, Shanghai Jiao Tong University, Shanghai 200240, PR China
Received 21 May 2007; received in revised form 5 December 2007; accepted 6 December 2007
Abstract
Unequal loss protection is an effective tool for robustly delivering compressed streaming video over packet-switched
networks. A critical component in any unequal-loss-protection scheme is a metric for evaluating the importance of different
frames in a Group-Of-Pictures (GOP). In the case of video streaming over 3G mobile networks, packet loss usually
corresponds to whole-frame loss due to low bandwidth and small picture size, which results in high error rates, and thus
most of the existing low-complexity transmission-distortion-estimate models may be ineffective. In this paper, we first
develop a recursive algorithm to compute the GOP-level transmission distortion at pixel-level precision using
pre-computed video information. Based on a study of the propagation behavior of the whole-frame-loss transmission
distortion, we then propose a piecewise linear-fitting approach to achieve low-complexity transmission distortion
modeling. The simulation results demonstrate that the two proposed models are accurate and robust. The proposed
transmission distortion models are fast and accurate importance-assessment tools for allocating limited channel resources
optimally for mobile streaming video.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Transmission-distortion-estimate; Unequal loss protection; Streaming video
1. Introduction
Nowadays, the expanded bandwidth of the air interfaces has laid solid ground for streaming media applications on 3G mobile networks. Because wireless systems free users from constraints of time and place, mobile streaming media services are very attractive.
* Corresponding author at: Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai 200240, PR China. Tel.: +86 21 34204503; fax: +86 21 34204155.
E-mail address: [email protected] (C. Zhang).
0923-5965/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.image.2007.12.002
Due to the various kinds of fading and multipath interference on wireless channels, mobile video communication over 3G networks experiences burst packet losses. Moreover, the compressed video signal is extremely vulnerable to transmission errors, since low bit-rate video coding schemes rely on interframe coding for high coding efficiency. The coding structure of motion-compensated interframe prediction creates strong spatio-temporal dependency among video frames [1,2]. Consequently, the unavoidable packet losses during wireless transmission may result in error propagation in the reconstructed video and thus induce severe quality degradation. This type of picture distortion is called transmission distortion.
To combat the effect of losses, many error-control techniques have been proposed for packet video transmission [2–5]. In Ref. [4], the error-control mechanisms are classified into four categories: forward error correction (FEC), retransmission, error resilience, and error concealment. For streaming media applications, retransmission-based schemes are usually ruled out because of the long delay they introduce for real-time playback and because a multicast session may have hundreds of members. Error-resilient schemes deal with packet loss at the compression layer, and most of them (e.g., resynchronization marking, data partitioning, and data recovery) are targeted at recovering the decoding from bit errors. For packet video, a transmission error usually means the loss of an entire packet, and these bit-recovery-based error-control mechanisms may cease to be effective [4]. Since complex error-concealment mechanisms are restricted by both the limited processing ability and the power consumption of handheld devices, the simplest and most common approach, previous-frame repetition, is usually adopted in mobile streaming media applications. FEC-based unequal loss protection (ULP) is one of the efficient error-control schemes used for transmitting compressed video streams over packet-loss networks [6,7]; it allocates limited channel resources efficiently to achieve improved rate-distortion (R-D) performance.
One of the most critical aspects of efficient resource allocation is accurate evaluation of the end-to-end video quality [8]. In the accurate transmission-distortion-estimate (TDE) schemes with moderate complexity, such as the ROPE algorithm [9] and the statistics-based analytic model [10], the overall distortion accumulated from previous frames is computed to determine the coding mode for the current macroblock (MB); the total transmission distortion, which includes the distortion in the current MB and its propagation into subsequent frames, cannot be obtained. The low-complexity models include the method that considers intra refreshing and spatial loop filtering [11] and the method that uses basic concepts of control systems [12]. It is worth noting that these low-complexity estimate models are applicable to low-error-rate applications. For example, the error rate in Ref. [11] is less than 6% and that in Ref. [12] is about 8%. In mobile video applications, packet loss usually corresponds to whole-frame loss due to low bandwidth and small picture size [13], which results in high error rates (a large number of blocks with nonzero motion vectors (MVs) are corrupted by previous-frame concealment), and thus the above low-complexity schemes are ineffective [11]. Simplified distortion-estimate models that do not consider picture complexity, such as the ELEP in Ref. [6], where only temporal error propagation is taken into account, are usually not accurate enough, and thus optimal R-D performance cannot be achieved. Based on statistics of error propagation, a method for lightweight prediction of video distortion is proposed in Ref. [14].
In this paper, by exploiting pre-computed information (such as MVs) of stored video signals, we first present a recursive TDE algorithm to compute the GOP-level transmission distortion for whole-frame losses. We then develop a low-complexity TDE model using a piecewise linear-fitting approach, based on a study of the error propagation behavior of whole-frame losses. The experimental results demonstrate that the two proposed TDE models are accurate and robust. The proposed transmission distortion models provide fast and accurate importance-assessment tools for allocating limited channel resources effectively.
The rest of this paper is organized as follows. In Section 2, we analyze the error propagation characteristics and the unequal importance of different frames in a Group-Of-Pictures (GOP). Section 3 presents the recursive TDE algorithm for stored video streaming, and Section 4 gives a low-complexity estimate model based on a piecewise linear-fitting approach. Simulation results are shown in Section 5. Finally, conclusions are drawn in Section 6.
2. Error propagation and unequal importance in a GOP
2.1. Interframe error propagation
Common video coding schemes employ interframe prediction to remove temporal redundancy. Although interframe coding generally achieves higher compression efficiency, it is more sensitive to channel packet losses, since each inter-predicted frame depends on its predecessor and any packet loss may break the prediction chain and affect all subsequent inter-predicted frames.
Let a packet containing data from the current frame be lost in the channel, and let the decoder perform previous-frame-repetition error concealment. Clearly, the resulting reconstruction at the decoder differs from the reconstruction at the encoder. Note that at the decoder side, the current reconstructed frame "corrupted" by packet loss is still used as the motion-compensation reference for the next frame. In this way, the channel distortion propagates along the motion-compensation path. Whenever the motion vector is nonzero, the error propagates in both the temporal and the spatial directions [9], which makes the resulting artifacts particularly annoying (as shown in Figs. 1 and 2).
2.2. Unequal importance assessment using GOP-level transmission distortion
In video sequences, a typical GOP is composed of one I-frame followed by $N_G-1$ P- and B-frames (here $N_G$ denotes the total number of frames in one GOP). Since B-frame losses do not interfere with other frames, we consider frame sequences in a GOP structure without B-frames. In this structure, losing different frames of a GOP often results in different distortion. In other words, the frames in a GOP have unequal importance [15]. Fig. 2 gives an illustrative sample: the errors induced by the loss of the second frame lead to more artifacts in the following frames (Fig. 2, up) than those induced by the loss of the fifth frame (Fig. 2, down). Some video transmission strategies already consider the unequal importance in a GOP. In Ref. [16], channel resources are optimally assigned to the P-frames, whose importance is assumed to decrease along the GOP. However, when the video scene or the GOP size changes, the assumption that preceding frames are more important than following frames may be inaccurate [17]. In state-of-the-art unequal-error-protection schemes, accurate evaluation of the GOP-level transmission distortion is a key element in assigning the channel resources optimally [8]. Since more transmission distortion means more importance, we use the GOP-level transmission distortion $D_G(n_0)$, which denotes the overall distortion in a GOP induced by the loss of frame $n_0$ and the corresponding error concealment, to assess the importance of different frames in a GOP accurately.

Fig. 1. Illustration of spatio-temporal error propagation.
Fig. 2. Illustration of spatial-temporal error propagation due to the loss of the second frame (up) and the fifth frame (down) separately (the frame index in the two picture groups above is: 1, 5, 10, 15, 20, and 25).

Let $F(n,i)$ be the original value of pixel $i$ in frame $n$, and let $\hat{F}(n,i)$ and $\tilde{F}(n,i)$ be its encoder reconstruction and decoder reconstruction, respectively. Let $D_t(n_0)$ be the instantaneous transmission distortion (ITD) induced by error concealment when frame $n_0$ is lost, defined as the mean square error (MSE) between its encoder and decoder reconstructions:

$$D_t(n_0) = \frac{1}{N_P}\sum_{i=0}^{N_P-1}\left[\hat{F}(n_0,i) - \tilde{F}(n_0,i)\right]^2, \tag{1}$$
Fig. 3. GOP-level transmission distortion vs. frame index for three test sequences: (a) Akiyo, (b) Foreman and (c) Carphone.
where $N_P$ is the total number of pixels in a frame. Because most prevalent video coding schemes are based on a hybrid structure of motion-compensated prediction, strong spatio-temporal dependency is created in the compressed bitstream, which results in error propagation from the lost frame to the following pictures in the same GOP. Similarly to Eq. (1), the distortion of picture $n_0+k$ introduced by the error propagation from frame $n_0$ can be expressed as:

$$D_t(n_0+k) = \frac{1}{N_P}\sum_{i=0}^{N_P-1}\left[\hat{F}(n_0+k,i) - \tilde{F}(n_0+k,i)\right]^2. \tag{2}$$
Based on the above analysis, the GOP-level transmission distortion for frame $n_0$, $D_G(n_0)$, can be obtained by accumulating the ITD induced in the current frame and in the following frames of the same GOP:

$$D_G(n_0) = \sum_{n=n_0}^{N_G-1} D_t(n) = \frac{1}{N_P}\sum_{n=n_0}^{N_G-1}\sum_{i=0}^{N_P-1}\left[\hat{F}(n,i) - \tilde{F}(n,i)\right]^2. \tag{3}$$
Using Eq. (3), Fig. 3 plots the GOP-level distortion $D_G(n_0)$ that results from losing different frames, for three typical QCIF sequences: Akiyo, Foreman, and Carphone. Each sequence is simulated with a GOP size of 30. As can be seen from Fig. 3, losing different frames in the same GOP results in different GOP-level transmission distortion.
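As a concrete reading of Eqs. (1)–(3), the following minimal Python sketch computes the GOP-level distortion by brute force. It assumes frames are given as flat lists of pixel values; the function names and the toy data are hypothetical and are not taken from the paper.

```python
def frame_mse(enc_frame, dec_frame):
    """Eqs. (1)/(2): MSE between encoder and decoder reconstructions of one frame."""
    n_p = len(enc_frame)
    return sum((e - d) ** 2 for e, d in zip(enc_frame, dec_frame)) / n_p


def gop_level_distortion(enc_gop, dec_gop, n0):
    """Eq. (3): accumulate the ITD from the lost frame n0 to the end of the GOP."""
    return sum(frame_mse(enc_gop[n], dec_gop[n]) for n in range(n0, len(enc_gop)))


# Toy GOP (hypothetical values): 4 frames of 4 pixels, zero motion everywhere.
enc_gop = [[10] * 4, [12] * 4, [14] * 4, [16] * 4]
# Decoder output when frame 1 is lost and concealed by repeating frame 0;
# later frames add their residuals on top of the corrupted reference.
dec_gop = [[10] * 4, [10] * 4, [12] * 4, [14] * 4]

d_g = gop_level_distortion(enc_gop, dec_gop, n0=1)
```

In this toy case every pixel of frames 1 to 3 is off by 2, so each frame contributes an MSE of 4 and the accumulated value is 12. Note that the decoder reconstruction has to be produced separately for every candidate lost frame, which is what makes this direct approach expensive.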
3. Recursive computation of the GOP-level transmission distortion
Unfortunately, the computational complexity is very high when Eq. (3) is used to calculate the GOP-level transmission distortion, and the low-complexity TDE models are badly suited to wireless video with the high error rates induced by whole-frame losses. Thus, for mobile streaming media with whole-frame losses, a theoretical or approximate model with modest complexity is needed to compute the GOP-level transmission distortion with reasonable accuracy. Considering that errors propagate along the motion-prediction path and that video information (such as MVs) is available in pre-coded media, we develop a recursive transmission-distortion computation approach, which is presented in this section.
3.1. Recursive construction of the motion vector mapping table
We first construct the MV table $V$ for each P-frame; the element $V(n,i)$ in the table is assigned according to Eq. (4):

$$V(n,i) = \begin{cases} \mathrm{MV}(n,i), & i \in \text{inter-coded block},\\ \mathrm{CMAX}, & i \in \text{intra-coded block}. \end{cases} \tag{4}$$

Here $\mathrm{MV}(n,i)$ is the motion vector value of pixel $i$ in frame $n$, and CMAX is a predefined maximum possible value of the MV. For non-real-time applications, knowledge about the interdependence among blocks or pixels can be obtained by analyzing the MVs between successive frames [18]. Using the MVs, we develop a mapping table $M$ that directly constructs the dependence graph between the lost frame and its following frames; this table is used to calculate the distortion propagated in a GOP.
An example of how to construct the graph and calculate the mapping vector is presented in Fig. 4, where pixel $c$ in frame $n_0+2$ is predicted from pixel $b$ in frame $n_0+1$, and the reference pixel of $b$ is pixel $a$ in frame $n_0$. We thus define the mapping vector of pixel $c$ to be:

$$M_{n_0+2,n_0}(c) = V(n_0+2,c) + V(n_0+1,b). \tag{5}$$

Fig. 4. An illustration of constructing a MV mapping.
Similarly to Eq. (5), the mapping vector can be calculated generally and recursively:

$$\begin{aligned}
M_{n_0+1,n_0}(i) &= V(n_0+1,i),\\
M_{n_0+2,n_0}(i) &= V(n_0+2,i) + M_{n_0+1,n_0}\big(R_{n_0+2,n_0+1}(i)\big),\\
&\;\;\vdots\\
M_{n_0+k,n_0}(i) &= V(n_0+k,i) + M_{n_0+k-1,n_0}\big(R_{n_0+k,n_0+k-1}(i)\big).
\end{aligned} \tag{6}$$
In Eq. (6), $R_{n_0+k,n_0}(i)$ represents the reference pixel in frame $n_0$ of pixel $i$ in frame $n_0+k$, and its value can be calculated from the constructed MV mapping table:

$$R_{n_0+k,n_0}(i) = i + \Delta\big(M_{n_0+k,n_0}(i)\big), \tag{7}$$

where $\Delta(\cdot)$ denotes the index offset, which can be obtained from the mapping vector.
Note that in Eqs. (4)–(7), the left-hand side is set to CMAX whenever one of the components on the right-hand side equals CMAX. CMAX is introduced to handle mapping chains that are broken by intra-coded blocks. Once an intra-coded block is met in the mapping chain, the propagation of the transmission error stops, and the distortion of a pixel predicted from an intra-coded block is set to zero:

$$D_t(n_0+k,i) = 0 \quad \text{when } R_{n_0+k,n_0}(i) = \mathrm{CMAX}, \tag{8}$$

where $D_t(n_0+k,i)$ denotes the pixel-level transmission distortion of pixel $i$ in frame $n_0+k$.
Based on the above construction scheme, only a single MV mapping table $M_{n_0+k,n_0}$ is needed during the calculation of the GOP-level transmission distortion. In practice, this MV mapping table is updated dynamically as the distortion computation proceeds.
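The recursive update of the mapping table can be sketched in Python as follows. This is only an illustration under simplifying assumptions that are not in the paper: frames are one-dimensional rows of pixels, each motion vector is an integer pixel offset into the previous frame, the sentinel value chosen for CMAX is arbitrary, and references that fall outside the row are clamped to its border. The function name `extend_mapping` is hypothetical.

```python
CMAX = 10**6  # hypothetical sentinel standing in for the paper's CMAX


def extend_mapping(prev_map, v_row):
    """Apply one step of Eq. (6).

    prev_map[j] holds M_{n0+k-1,n0}(j), or prev_map is None for k = 1.
    v_row[i] holds V(n0+k, i): an integer pixel offset, or CMAX for intra pixels.
    Returns the list M_{n0+k,n0}(i).
    """
    last = len(v_row) - 1
    new_map = []
    for i, v in enumerate(v_row):
        if v == CMAX:                       # intra-coded pixel: the chain is broken
            new_map.append(CMAX)
        elif prev_map is None:              # k = 1: M_{n0+1,n0}(i) = V(n0+1, i)
            new_map.append(v)
        else:
            ref = min(max(i + v, 0), last)  # R_{n0+k,n0+k-1}(i), clamped (assumption)
            prev = prev_map[ref]
            # CMAX on the right-hand side propagates to the left-hand side.
            new_map.append(CMAX if prev == CMAX else v + prev)
    return new_map


v1 = [0, -1, CMAX, -1]   # V(n0+1, .): pixel 2 is intra-coded
v2 = [1, 0, -1, -1]      # V(n0+2, .)
m1 = extend_mapping(None, v1)
m2 = extend_mapping(m1, v2)
```

Here pixel 2 of frame $n_0+2$ refers to pixel 1 of frame $n_0+1$, which in turn refers to pixel 0 of frame $n_0$, so its mapping vector is $-2$; pixel 3 refers to the intra-coded pixel 2 of frame $n_0+1$, so its chain is marked with CMAX. Only the most recent mapping row has to be kept, which mirrors the single dynamically updated table described above.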
3.2. Recursive calculation of instantaneous transmission distortion
Assuming that all the frames in a GOP are received correctly, we have the following relation:

$$\tilde{F}(n,i) = \hat{F}(n,i) = \hat{F}\big(n-1, R_{n,n-1}(i)\big) + e(n,i), \tag{9}$$

where $\hat{F}(n-1, R_{n,n-1}(i))$ is the encoder reconstruction of the reference pixel of pixel $i$, and $e(n,i)$ is the quantized prediction error. To assess the importance of frame $n_0$ accurately, we need to calculate its GOP-level transmission distortion under the assumption that only frame $n_0$ is lost and the remaining pictures in the same GOP are error-free. Supposing that previous-frame repetition is used for error concealment (i.e., $\tilde{F}(n,i) = \tilde{F}(n-1,i)$ for the lost frame), we obtain the derivation in Eq. (10):

$$\begin{aligned}
\tilde{F}(n_0,i) &= \hat{F}(n_0-1,i),\\
\tilde{F}(n_0+1,i) &= \tilde{F}\big(n_0, R_{n_0+1,n_0}(i)\big) + e(n_0+1,i)\\
&= \hat{F}\big(n_0-1, R_{n_0+1,n_0}(i)\big) + e(n_0+1,i),\\
\tilde{F}(n_0+2,i) &= \tilde{F}\big(n_0+1, R_{n_0+2,n_0+1}(i)\big) + e(n_0+2,i)\\
&= \hat{F}\big(n_0-1, R_{n_0+2,n_0}(i)\big) + e\big(n_0+1, R_{n_0+2,n_0+1}(i)\big) + e(n_0+2,i),\\
&\;\;\vdots\\
\tilde{F}(n_0+k,i) &= \hat{F}\big(n_0-1, R_{n_0+k,n_0}(i)\big) + e\big(n_0+1, R_{n_0+k,n_0+1}(i)\big) + \cdots + e(n_0+k,i).
\end{aligned} \tag{10}$$
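Similarly, $\hat{F}(n_0+k,i)$ can be expressed as:

$$\hat{F}(n_0+k,i) = \hat{F}\big(n_0, R_{n_0+k,n_0}(i)\big) + e\big(n_0+1, R_{n_0+k,n_0+1}(i)\big) + \cdots + e(n_0+k,i). \tag{11}$$

Substituting Eqs. (10) and (11) into Eqs. (1) and (2), respectively, we get:

$$D_t(n_0,i) = \left[\hat{F}(n_0,i) - \hat{F}(n_0-1,i)\right]^2, \tag{12}$$

$$\begin{aligned}
D_t(n_0+k,i) &= \left[\hat{F}(n_0+k,i) - \tilde{F}(n_0+k,i)\right]^2\\
&= \left[\hat{F}\big(n_0, R_{n_0+k,n_0}(i)\big) - \hat{F}\big(n_0-1, R_{n_0+k,n_0}(i)\big)\right]^2\\
&= D_t\big(n_0, R_{n_0+k,n_0}(i)\big).
\end{aligned} \tag{13}$$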
From Eq. (13), $D_t(n_0+k,i)$ can be obtained from $D_t(n_0,j)$ with $j = R_{n_0+k,n_0}(i)$. To get the values of $R_{n_0+k,n_0}(i)$ and $D_t(n_0,i)$, the MVs and $\hat{F}(n_0,i)$ are necessary. For a video server that transmits compressed bitstreams, information such as the MVs and the reconstructed pixels $\hat{F}(n_0,i)$ can be obtained by decoding the compressed bitstreams and reconstructing the pictures at the sender side. Therefore, the GOP-level transmission distortion $D_G(n_0)$ can be calculated by accumulating the ITD $D_t(n_0+k)$ ($n_0 \le n_0+k \le N_G-1$), according to Eqs. (3)–(13).
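The way Eqs. (8), (12) and (13) combine with Eq. (3) can be illustrated with a short Python sketch. It assumes, hypothetically, that frames are flat lists of pixel values, that the mapping vectors are integer pixel offsets that stay inside the frame, and that an arbitrary sentinel plays the role of CMAX; the function name and data are illustrative only.

```python
CMAX = 10**6  # hypothetical sentinel marking a chain broken by an intra-coded block


def recursive_gop_distortion(enc_prev, enc_lost, mappings):
    """GOP-level distortion for the loss of frame n0, without re-decoding later frames.

    enc_prev, enc_lost: encoder reconstructions of frames n0-1 and n0.
    mappings[k-1][i]:   M_{n0+k,n0}(i) for k = 1, 2, ... (integer offset or CMAX).
    """
    n_p = len(enc_lost)
    # Eq. (12): pixel-level distortion of the concealed frame n0.
    d0 = [(a - b) ** 2 for a, b in zip(enc_lost, enc_prev)]
    total = sum(d0) / n_p
    for m in mappings:
        # Eq. (13): look up the distortion of the reference pixel in frame n0;
        # Eq. (8): the distortion is zero where the chain hits an intra block.
        dk = [0 if off == CMAX else d0[i + off] for i, off in enumerate(m)]
        total += sum(dk) / n_p
    return total


enc_prev = [10, 10, 10, 10]            # frame n0-1
enc_lost = [10, 14, 10, 12]            # frame n0 (lost, concealed by frame n0-1)
mappings = [[0, 0, CMAX, -1],          # M_{n0+1,n0}
            [1, 0, -1, CMAX]]          # M_{n0+2,n0}
d_g = recursive_gop_distortion(enc_prev, enc_lost, mappings)
```

For this toy input the three frame-level distortions are 5, 4 and 12, so the accumulated value is 21. The only pixel data touched are the two reconstructions around the lost frame; everything else is a table lookup, which is the source of the savings discussed in Section 3.3.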
3.3. Complexity analysis
Compared with the traditional computing method using Eq. (3), a large amount of frame reconstruction and MSE computation (from $n_0+1$ to $N_G-1$) can be saved in the GOP-level transmission distortion calculation by using the above recursive scheme. For a given GOP size $N_G$, the computational complexity (including frame-level decoding and MSE computing) of the traditional method is $O(N_G^2)$, since $N_G(N_G+1)/2$ frame-level operations are needed in total, while that of the proposed recursive method is $O(N_G)$. Given $N_G = 30$, we compare the computational complexity of the traditional method and that of the proposed recursive method in Table 1.
It can be seen from Table 1 that the frame-level decoding and MSE computation operations, which form most of the computational burden, are reduced significantly by the recursive scheme. Although additional data (distortion) moving and addition operations are required in the proposed method, their complexity is negligible compared with that of decoding and MSE computation. Considering that most transmitters are PC-based servers, the computational complexity of the recursive scheme is modest for mobile streaming video applications.
Note that for video servers with multiple concurrent video streams, where up to 300–400 concurrent streams are required in some large-scale mobile VOD systems, the computational complexity of the above recursive algorithm may become unacceptable. Based on the recursive distortion-computation algorithm, we therefore propose a low-complexity GOP-level TDE model to reduce the computing load further.
Table 1
Computational complexity comparison for different GOP-level TDE methods with GOP size 30

| Operation (frame-level) | Traditional method | Recursive method |
|---|---|---|
| Decoding | 465 times | 30 times |
| MSE computation | 465 times | 30 times |
| Data moving and addition | 0 times | 465 times |
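The counts in Table 1 can be reproduced with a few lines of Python. The accounting below is one plausible reading of the table rather than a statement from the paper: it assumes the traditional method decodes and compares frames $n_0, \ldots, N_G-1$ for every candidate lost frame $n_0$, while the recursive method decodes each frame once and replaces the remaining work with data moves and additions. The function name is hypothetical.

```python
def operation_counts(n_g):
    """Frame-level operation counts for a GOP of n_g frames."""
    # Traditional: for each candidate lost frame n0, frames n0..n_g-1 are processed.
    traditional = sum(n_g - n0 for n0 in range(n_g))
    # Recursive: every frame is decoded and compared once ...
    recursive_decodes = n_g
    # ... and the per-frame propagation becomes a data move plus an addition.
    recursive_moves = traditional
    return traditional, recursive_decodes, recursive_moves


counts = operation_counts(30)
```

For $N_G = 30$ this gives 465 decoding operations for the traditional method against 30 decodings and 465 data-move operations for the recursive one, matching the entries of Table 1.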
4. Low-complexity modeling for GOP-level transmission distortion estimate
4.1. The behavior of the whole-frame-loss error propagation
For typical mobile video, the compressed size of one video frame can be fairly small (800 bytes on average for QCIF video at 10 frames per second over a 64 Kbit/s wireless channel [13]), and a single packet per video frame is often adopted to keep the packet-header overhead low. Thus, a packet loss corresponds to the loss of one whole frame [13,19].
The authors of Ref. [10] have demonstrated that the impulse transmission distortion has an exponential fading behavior. However, the fading behavior of error propagation is determined by two effects: intra-block coding and repeated spatial filtering (i.e., sub-pixel interpolation) [2]. In fact, if the two effects are absent, the errors caused by MB losses do not decay over time (simulation A in Ref. [2]). In other words, the distortion presents exponential fading behavior only in low-error-rate cases, such as the 8% packet loss rate simulated in Ref. [10]. The reason is that the advanced video coding standards (MPEG-4 [20] or H.264/AVC [21]) provide an embedded spatial filtering function: a few erroneous blocks can be smoothed out by a large number of error-free blocks through sub-pixel interpolation. In the case of the whole-frame losses that occur in mobile video communication, the high error rates defeat the filtering effect of interpolation, and the error propagation may not present exponential fading behavior [17].
Using the H.264/AVC reference software JM11.0 [22], we simulate and analyze the behavior of six typical QCIF sequences: Akiyo, Carphone, Foreman, Hall, Salesman, and Coastguard. These sequences cover a wide spectrum of scene characteristics. The sequences are coded without considering channel losses, and the encoding structure is an intra-coded frame followed by a series of P-frames. We introduce a transmission error (frame loss) at frame $n_0 = 3$, and the remaining frames are error-free (received correctly). Here, frame number 3 is chosen arbitrarily, and the GOP length $N_G$ is set to 30. Let

$$\bar{D}_t(n) = \frac{D_t(n)}{\max_{n=1,\ldots,N_G-1}\{D_t(n)\}}, \tag{14}$$

which is the normalized transmission distortion. In Fig. 5, we plot the normalized transmission distortion $\bar{D}_t(n)$ as a function of the frame index (or time).

Fig. 5. The propagation behavior of the whole-frame-loss transmission distortion (the third frame is lost). [Normalized distortion vs. frame index for Akiyo, Carphone, Foreman, Hall, Salesman and Coastguard.]
Fig. 6. The piecewise linear-fitting for the transmission distortion estimate (the simulation is based on the test sequence Carphone with the loss of frame 3). [Normalized distortion vs. frame number, with the trapezium sides $a_i$, $b_i$ and heights $c_i$ marked.]
It can be seen from Fig. 5 that most of the transmission distortion curves are linear or piecewise linear. Thus, we can use a linear function to approximate the propagation behavior of the impulse transmission distortion. The approximation is more accurate when piecewise linear fitting is used, as shown in Fig. 6.
We take the area $s$ under the curve along which the transmission distortion propagates from frame $n_0$ to the end of the GOP, and treat it as a sum of $K$ trapeziums, where $K$ is the total number of pieces. Let $a_i$, $b_i$, and $c_i$ denote the two parallel sides and the height of the $i$th trapezium, respectively (see Fig. 6); then the estimated GOP-level distortion $D_G(n_0)$ can be calculated by

$$D_G(n_0) = \frac{1}{N_G}\sum_{i=1}^{K} s_i = \frac{1}{N_G}\sum_{i=1}^{K}\frac{a_i + b_i}{2}\,c_i \tag{15}$$

with

$$\left.\begin{aligned}
a_1 &= D_t(n_0)\\
a_{i+1} &= b_i = D_t(n_0 + k_i)\\
k_i &= \sum_{j=1}^{i} c_j
\end{aligned}\right\}. \tag{16}$$
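A minimal Python sketch of the trapezium sum in Eqs. (15) and (16) is given below. It assumes the sampled distortions at the $K+1$ breakpoints and the $K$ piece lengths are already available; the function name, argument layout and sample values are hypothetical.

```python
def piecewise_gop_distortion(samples, heights, n_g):
    """Piecewise linear estimate following Eqs. (15)-(16).

    samples: distortion at the K+1 breakpoints, i.e. D_t(n0), D_t(n0+k_1), ...
    heights: the K piece lengths c_1..c_K (in frames).
    n_g:     GOP size N_G.
    """
    if len(samples) != len(heights) + 1:
        raise ValueError("need exactly one more sample than pieces")
    # Each piece is a trapezium with parallel sides a_i, b_i and height c_i;
    # consecutive pieces share a side (a_{i+1} = b_i).
    area = sum((a + b) / 2 * c
               for a, b, c in zip(samples, samples[1:], heights))
    return area / n_g


# Hypothetical normalized samples for K = 3 pieces in a GOP of 30 frames.
estimate = piecewise_gop_distortion([1.0, 0.8, 0.5, 0.5], [5, 10, 12], n_g=30)
```

With these illustrative inputs the three trapezium areas are 4.5, 6.5 and 6.0, giving an estimate of 17/30. Only four frame-level samples are needed here, instead of one per frame.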
4.2. Complexity analysis
Based on the analysis above, we can get the GOP-level transmission distortion using Eqs. (15) and (16). For each piece, two parameters are needed: the instantaneous transmission distortions $D_t(n_0)$ and $D_t(n_0+k)$ at its two ends. With the piecewise fitting approach, the transmitter can compute the GOP-level transmission distortion using only a few frame-level samples (the start and end points of each piece). Obviously, the more sample points are used, the more accurate the estimate will be.
Based on the recursive algorithm presented in Section 3, we can get $D_t(n_0+k)$ directly from $D_t(n_0)$ and the MV mapping table. Thus, the computational cost and the induced time delay are reduced further: the number of frame-level distortion-data-accumulating operations decreases from $\sum_{n=1}^{N_G} n = 465$ to $\sum_{n=1}^{N_G-1}[(N_G-n)/K] + N_G = 131$ (with $N_G = 30$ and $K = 5$). This means that about two-thirds of the computational burden is saved by the piecewise fitting scheme.
Compared with existing low-complexity distortion-estimate schemes, such as the models proposed in Refs. [14,16], the proposed method is slightly more complex. However, because it tracks the error propagation accurately and thus accounts for both temporal and spatial error propagation, the proposed method has a lower misclassification ratio. The reference schemes in Refs. [14,16] are both based on the error propagation length, and thus only temporal error propagation is considered in their distortion computation. The low-complexity distortion-estimate model in Ref. [14] is particularly applicable to simple two-class DiffServ networks. However, if the number of picture priorities or classes is larger than 2, the proposed scheme may be more general.
5. Simulation results
Simulations have been carried out to evaluate the performance of the proposed GOP-level TDE models. We used the H.264/AVC reference software JM11.0 [22] to encode six typical test videos without considering any channel losses. All of them are coded with a GOP size of 30 frames (one previous frame is used as reference), and the frame rate is set to 15 frames per second (fps). The two proposed TDE models are simulated separately to obtain the GOP-level transmission distortion for each frame in a GOP.
The actual transmission distortion (labeled "Actual"), the estimate based on the recursive algorithm (labeled "Recursive-Est.") and the estimate using the piecewise linear-fitting approach (labeled "Fitting-Est.") are shown in Fig. 7. Although the non-integer motion compensation and deblocking filters used in H.264/AVC introduce cross-correlation between pixels, which makes the recursive estimate algorithm less precise, it can be seen from Fig. 7 that the two proposed transmission distortion models are both accurate and robust. The piecewise linear-fitting scheme uses fewer samples than the recursive scheme, while its relative importance judgement error (RIJE) values are almost the same as those of the recursive algorithm, which demonstrates the validity of the proposed low-complexity TDE model.

Fig. 7. Estimation of the GOP-level transmission distortion for six QCIF test sequences: (a) Akiyo, (b) Hall, (c) Salesman, (d) Foreman, (e) Carphone and (f) Coastguard. [Each panel plots GOP-level distortion vs. frame index (0–90) for the curves Actual, Recursive-Est. and Fitting-Est.]
Since the most critical application of TDE is assessing the importance of different frames in a GOP, we use the following formula to measure the RIJE:

$$E = \frac{\sum_{n=0}^{N_G-1} e(n)}{N_G} \times 100\%, \tag{17}$$

where $e(n)$ is the error flag for frame $n$: if the estimated priority of frame $n$ equals the priority assigned according to the actual transmission distortion, $e(n) = 0$; otherwise, $e(n) = 1$.
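Eq. (17) amounts to the percentage of frames whose estimated priority differs from the reference priority. A minimal Python sketch, assuming the priorities are given as per-frame class labels (the function name and the labels are hypothetical):

```python
def rije(estimated_priority, actual_priority):
    """Relative importance judgement error of Eq. (17), in percent."""
    n_g = len(actual_priority)
    # e(n) = 1 when the estimated class of frame n differs from the actual one.
    errors = sum(1 for est, act in zip(estimated_priority, actual_priority)
                 if est != act)
    return 100.0 * errors / n_g


# Hypothetical 5-frame GOP with two priority classes (1 = premium, 0 = normal).
error_percent = rije([0, 1, 1, 1, 0], [0, 0, 1, 1, 1])
```

In this toy case two of the five frames are misclassified, so the RIJE is 40%.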
Two experiments were conducted to compare the RIJE of different TDE models. In the first experiment, the frames in a GOP are sorted into two classes: the 20% of frames with the highest GOP-level transmission distortion are assigned to the premium class, and the rest are sent as the normal class. The two-class network architecture is one of the simplest DiffServ scenarios [14]. To simulate a scenario with more than two priorities, such as the fine ULP scheme in Ref. [16], we set the class number to 3 in the second experiment, where every 10 frames with similar distortion are grouped and assigned the same priority.
In Tables 2 and 3, we give the RIJE comparison results for five TDE schemes. Scheme-I is the LEP (length of error propagation) method used in Ref. [16]; Scheme-II uses the instantaneous distortion as the GOP-level transmission distortion; Scheme-III uses the model-based distortion estimation developed in Ref. [14]; Scheme-IV and Scheme-V correspond to the proposed recursive algorithm and fitting scheme, respectively.

Table 2
Relative importance judgement error comparison with class number = 2

| Video sequence | Bit-rate (Kbit/s) | RIJE (%) Scheme-I | Scheme-II | Scheme-III | Scheme-IV | Scheme-V |
|---|---|---|---|---|---|---|
| Akiyo | 48 | 29 | 22 | 18 | 2 | 2 |
| Hall | 64 | 29 | 16 | 13 | 7 | 7 |
| Salesman | 64 | 11 | 11 | 7 | 0 | 0 |
| Foreman | 64 | 16 | 13 | 13 | 0 | 0 |
| Carphone | 96 | 16 | 11 | 2 | 2 | 2 |
| Coastguard | 96 | 33 | 62 | 18 | 13 | 16 |

Table 3
Relative importance judgement error comparison with class number = 3

| Video sequence | Bit-rate (Kbit/s) | RIJE (%) Scheme-I | Scheme-II | Scheme-III | Scheme-IV | Scheme-V |
|---|---|---|---|---|---|---|
| Akiyo | 48 | 41 | 39 | 38 | 7 | 10 |
| Hall | 64 | 53 | 39 | 13 | 13 | 13 |
| Salesman | 64 | 42 | 26 | 20 | 13 | 16 |
| Foreman | 64 | 49 | 58 | 56 | 23 | 26 |
| Carphone | 96 | 34 | 30 | 6 | 3 | 3 |
| Coastguard | 96 | 33 | 62 | 18 | 13 | 16 |

From Tables 2 and 3, we can see that in all cases the RIJE of the proposed models (Scheme-IV and Scheme-V) is no higher than that of the intuitive models (Scheme-I to Scheme-III), and in most cases it is lower. This implies that the proposed TDE models are more accurate. Although Scheme-III achieves a low misclassification ratio in the two-class scenario, its performance degrades quickly in the three-class scenario. The reason is that the distortion-estimate model used in Scheme-III considers temporal error propagation only and ignores spatial error propagation. Owing to its low computational complexity, Scheme-III is well suited to simple two-class DiffServ networks [14]. However, for finer ULP applications with more priority levels, the proposed schemes (Scheme-IV and Scheme-V) are more accurate and more applicable.
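The two-class assignment used in the first experiment can be sketched in Python as follows. This is an illustration only: the function name, the rounding of the premium-class size and the sample distortion values are assumptions, not details given in the paper.

```python
def assign_two_classes(gop_distortion, premium_fraction=0.2):
    """Label each frame 1 (premium) or 0 (normal) by its GOP-level distortion."""
    n_g = len(gop_distortion)
    n_premium = round(premium_fraction * n_g)  # rounding rule is an assumption
    # Rank frame indices from the highest to the lowest GOP-level distortion.
    ranked = sorted(range(n_g), key=lambda n: gop_distortion[n], reverse=True)
    premium = set(ranked[:n_premium])
    return [1 if n in premium else 0 for n in range(n_g)]


# Hypothetical GOP-level distortions D_G(n0) for a 10-frame GOP.
labels = assign_two_classes([5.0, 9.0, 1.0, 7.0, 3.0, 2.0, 8.0, 4.0, 6.0, 0.5])
```

With ten frames and a 20% premium share, the two frames with the largest distortion (indices 1 and 6) receive the premium label. Running the same assignment on the actual and on the estimated distortions yields the two priority lists that the RIJE compares.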
From the comparison of Table 3 and Fig. 7, it can be seen that the RIJE can be relatively high even when the estimate matches the experimental results well (refer to the RIJE values in Table 3 and the plots of Foreman, Salesman, and Coastguard in Fig. 7). One plausible explanation is that the GOP-level transmission distortion in these sequences is more "regular" than in the other sequences: only a few samples are abnormal (far larger than the others), and the remaining samples have similar distortion values. Most of the assessment errors may occur in the flat region, where the distortion difference between adjacent frames is very small. This type of importance judgment error should have a negligible effect on the optimal resource allocation.

Fig. 8. Performance (PSNR vs. packet loss rate) comparison between proposed scheme and reference schemes for Akiyo (up) and Foreman (down) sequences. [Axes: average PSNR-Y (dB) vs. average packet loss rate (%); curves: actual distortion, Scheme-I, Scheme-III and Scheme-V.]
An error scenario was also simulated to investigate the performance of the proposed low-complexity TDE model. A two-class DiffServ network is implemented with a discrete-event simulator; the 20% of frames with the highest GOP-level transmission distortion are assigned to the premium service, with nearly no losses and low delay, and the rest are sent as regular best-effort traffic [14]. Fig. 8 compares the performance of Scheme-I, Scheme-III, and Scheme-V as a function of packet loss rate. Results were obtained by simulating the transmission of the Foreman and Akiyo sequences, and the results shown for each experiment are averaged over 100 simulation runs. From Fig. 8, it can be seen that the proposed scheme (Scheme-V) achieves R-D performance close to that of the actual-distortion scheme and outperforms the reference schemes (Scheme-I and Scheme-III).
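The two-class assignment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulator: the function names `classify_two_class` and `simulate_losses`, the rounding rule used for the 20% share, and the independent per-frame loss model are all hypothetical, and the premium class is idealized as lossless.

```python
import random


def classify_two_class(gop_distortion, premium_fraction=0.2):
    """Mark the frames with the highest estimated GOP-level distortion as premium.

    gop_distortion: one estimated distortion value per frame (assumed input).
    Returns a list of booleans, True where the frame is sent as premium traffic.
    """
    n = len(gop_distortion)
    # Assumption: round the premium share to the nearest frame, at least one frame.
    n_premium = max(1, round(premium_fraction * n))
    # Rank frame indices by distortion, largest first (ties keep frame order).
    ranked = sorted(range(n), key=lambda i: gop_distortion[i], reverse=True)
    premium = set(ranked[:n_premium])
    return [i in premium for i in range(n)]


def simulate_losses(is_premium, loss_rate, rng):
    """Return the indices of lost frames for one simulation run.

    Assumption: premium frames are never lost, and each best-effort frame is
    lost independently with probability loss_rate.
    """
    return [
        i for i, premium in enumerate(is_premium)
        if not premium and rng.random() < loss_rate
    ]


if __name__ == "__main__":
    distortion = [5, 1, 9, 3, 7, 2, 8, 4, 6, 0]  # illustrative values only
    flags = classify_two_class(distortion)
    lost = simulate_losses(flags, loss_rate=0.10, rng=random.Random(1))
    print("premium frames:", [i for i, p in enumerate(flags) if p])
    print("lost frames in this run:", lost)
```

Repeating `simulate_losses` with different seeds and averaging the resulting quality measure would mirror the averaging over 100 runs described above; decoding and PSNR computation are outside the scope of this sketch.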
6. Conclusion
In this paper, we have first proposed a recursive estimation algorithm to compute the GOP-level transmission distortion induced by whole-frame losses. Based on a study of the propagation behavior of the whole-frame-loss transmission distortion for stored video streaming, we have then developed a low-complexity model to estimate the GOP-level transmission distortion accurately and robustly. With this estimate, the transmitter can effectively assess the importance of each frame in a GOP. This low-complexity estimation model can be combined with a channel resource allocation algorithm to improve system performance in mobile media streaming applications.
Acknowledgments
The authors would like to acknowledge the financial support of the National Science Foundation of China (NSFC) under grant nos. 60502034 and 60625103. We would also like to thank the anonymous reviewers for their valuable suggestions, which greatly improved the presentation of this paper.
References
[1] Q. Zhang, W. Zhu, Y.Q. Zhang, End-to-end QoS for video
delivery over wireless Internet, Proc. IEEE 93 (1) (January
2005) 123–134.
[2] B. Girod, N. Farber, Feedback-based error control for
mobile video transmission, Proc. IEEE 87 (10) (October
1999) 1707–1723.
[3] Y. Wang, Q.F. Zhu, Error control and concealment for
video communication: a review, Proc. IEEE 86 (5) (May
1998) 974–997.
[4] D. Wu, Y.T. Hou, Y.Q. Zhang, Transporting real-time
video over the Internet: challenges and approaches, Proc.
IEEE 88 (12) (December 2000) 1855–1875.
[5] Z.G. Li, C. Zhu, N. Ling, X.K. Yang, G.N. Feng, S. Wu, F.
Pan, A unified architecture for real time video coding
systems, IEEE Trans. Circuits Syst. Video Technol. 13 (6)
(2003) 472–487.
[6] X.K. Yang, C. Zhu, Z.G. Li, X. Lin, G.N. Feng, S. Wu, N.
Ling, Unequal loss protection for robust transmission of
motion compensated video over the Internet, Signal
Process.: Image Commun. 18 (2003) 157–167.
[7] M. Murroni, A power-based unequal error protection
system for digital cinema broadcasting over wireless
channels, Signal Process.: Image Commun. 22 (2007)
331–339.
[8] A.K. Katsaggelos, Y. Eisenberg, F. Zhai, R. Berry, T.N.
Pappas, Advances in efficient resource allocation for packet-based
real-time video transmission, Proc. IEEE 93 (1)
(January 2005).
[9] R. Zhang, S.L. Regunathan, K. Rose, Video coding with
optimal inter/intra-mode switching for packet loss resilience,
IEEE J. Selected Areas Commun. 18 (June 2000) 966–976.
[10] Z.H. He, J.F. Cai, C.W. Chen, Joint source channel rate-
distortion analysis for adaptive mode selection and rate
control in wireless video coding, IEEE Trans. Circuits
Syst. Video Technol., Special Issue on Wireless Video 12 (6)
(June 2002) 511–523.
[11] K. Stuhlmuller, N. Farber, M. Link, B. Girod, Analysis of
video transmission over lossy channels, IEEE J. Selected
Areas Commun. 18 (June 2000) 1012–1032.
[12] Z.H. He, H.K. Xiong, Transmission distortion analysis for
real-time video encoding and streaming over wireless
networks, IEEE Trans. Circuits Syst. Video Technol. 16
(9) (September 2006) 1051–1062.
[13] H. Liu, W.J. Zhang, S.Y. Yu, X.K. Yang, Channel-aware
frame dropping for cellular video streaming, in: Proceedings
of the International Conference on Acoustics, Speech, Signal
Processing, Toulouse, France, May 2006.
[14] F. De Vito, D. Quaglia, J.C. De Martin, Model-based
distortion estimation for perceptual classification of video
packets, in: Proceedings of IEEE Multimedia Signal Processing
Workshop, vol. 1, Siena, Italy, September 2004,
pp. 79–82.
[15] A. Albanese, J. Blomer, J. Edmonds, M. Luby, M. Sudan,
Priority encoding transmission, IEEE Trans. Inform. Theory
42 (November 1996) 1737–1744.
[16] X.K. Yang, C. Zhu, Z.G. Li, G.N. Feng, S. Wu, N. Ling,
Unequal error protection for motion compensated video
streaming over the Internet, in: Proceedings of the International
Conference on Image Processing (ICIP2002), vol. 2,
New York, USA, September 2002, pp. II-717–II-720.
[17] C.Y. Zhang, H. Yang, S.Y. Yu, X.K. Yang, H. Liu,
Transmission distortion modeling for unequal importance
judgement, in: Proceedings of the IEEE 2007 International
Conference on Multimedia & Expo (ICME 2007), Beijing,
China, July 2007.
[18] M.H. Willebeek-LeMair, Robust H.263 video coding for
transmission over the Internet, in: Proceedings of the IEEE
INFOCOM'98, March 1998.
[19] S. Belfiore, M. Grangetto, E. Magli, G. Olmo, Concealment
of whole-frame losses for wireless low bit-rate video based
on multiframe optical flow estimation, IEEE Trans.
Multimedia 7 (2) (April 2005) 316–329.
[20] ISO/IEC JTC 1/SC 29/WG11, 14496-2: Information technology -
generic coding of audio-visual objects - Part 2:
Visual, MPEG99/N 2688, Seoul, March 1999.
[21] ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC, Advanced Video
Coding for Generic Audio–Visual Services, May 2003.
[22] http://iphome.hhi.de/suehring/tml/download/old_jm/jm11.0.zip.