[IEEE Multimedia and Expo, 2007 IEEE International Conference on - Beijing, China (2007.07.2-2007.07.5)] Multimedia and Expo, 2007 IEEE International Conference on - A Motion-Based

A MOTION-BASED SELECTIVE ERROR PROTECTION METHOD FOR SCALABLE VIDEO OVER ERROR-PRONE CHANNEL

Yu Wang, Lap-Pui Chau, Kim-Hui Yap

School of Electrical and Electronics Engineering, Nanyang Technological University, Singapore, 639798

ABSTRACT

Video transmission over unreliable networks introduces new challenges in video coding. Because of predictive coding techniques, channel errors can severely degrade the decoded video when the compressed video is transmitted over an error-prone channel. This paper addresses the problem of scalable video transmission over error-prone channels. We propose to selectively add forward error correction (FEC) codes to part of the compressed bit-stream based on the motion activity of the input video. In addition, unequal error protection (UEP) is applied to the selected data of different temporal layers in a group of pictures (GOP), where the channel rates are optimally allocated. Experimental results show that the proposed method improves the average PSNR by up to 1.2 dB over the two schemes it is compared with.

1. INTRODUCTION

Recent advances in technology have led to an increasing interest in video services over wireless networks. Although today's broadband wireless networks can transmit video data at a high bit-rate, major challenges remain, such as fluctuations in channel quality and a high bit error rate. Transmission errors, together with lossy source coding techniques, distort the video at the decoder side. To alleviate the effect of transmission errors, video applications need robust video coding techniques that ensure the quality of the decoded video is not overly affected by channel unreliability.

Different types of error control methods have been proposed to improve the robustness of the transmitted video data. Among these techniques, error protection has proven to be very effective. It is implemented at the transport layer by introducing redundancy in the transmitted information, which helps to detect or correct transmission errors. Forward error correction (FEC) is well known for error recovery in video communications. A significant portion of previous research on FEC for video communication employed equal error protection (EEP), in which all the bits in the compressed video stream are treated equally. However, different parts of the compressed bit-stream are not equally important. Unequal error protection (UEP) assigns more protection to the more significant information bits to optimize the perceptual quality of the video. UEP has therefore been studied extensively [1]-[3], [6]. Priority encoding transmission (PET) [1] is a representative UEP scheme, in which different error protection priorities are assigned to different segments of the video bit-stream. In [2], a UEP scheme is applied to three types of frames according to their bit error sensitivity. UEP has also been applied to scalable video coding (SVC), where the layers are not equally important. UEP schemes normally allocate more protection bits to the base layer and fewer protection bits to the enhancement layers. For example, van der Schaar and Radha [3] propose to apply UEP between the base layer and the enhancement layer of fine-granularity-scalability (FGS) coding. Other error protection schemes assign a higher priority to the bits for headers and motion vectors in the compressed bit-stream. For instance, in [4], FEC is selectively applied to the motion information, which gives good performance without introducing too many redundancy bits.

In this paper, we propose a novel UEP scheme for scalable video with temporal scalability that provides graceful degradation of video quality during transmission over a packet-erasure channel. For scalable video with temporal scalability, the compressed bit-stream in each GOP is divided into several temporal layers. Partial data are selected from each temporal layer based on the motion activity. Unequal amounts of channel protection codes are optimally allocated to the selected data in each temporal layer, considering the error sensitivity and the channel condition. Dynamic programming is employed to solve the optimization problem.

The rest of this paper is organized as follows. In Section 2, we first present an overview of the scheme and develop the framework; the optimization problem is then formulated and solved. The experimental results are given in Section 3. Finally, we draw a conclusion in Section 4.

1-4244-1017-7/07/$25.00 ©2007 IEEE 763 ICME 2007

[Figure 1: frames 0 to 16 in display order, coded as I, B3, B2, B3, B1, B3, B2, B3, I/P, B3, B2, B3, B1, B3, B2, B3, I/P, forming two groups of pictures (GOP). Temporal layer 0 holds the I/P pictures, temporal layer 1 the B1 pictures, temporal layer 2 the B2 pictures, and temporal layer 3 the B3 pictures.]

Fig. 1. Structure of hierarchical B pictures.

Fig. 2. Frame 8 of Football (QCIF): (a) all the motion vectors are lost and (b) all the DCT coefficients are lost.

2. PROPOSED ERROR PROTECTION SCHEME

2.1. System Overview

In scalable video coding [8], the input video sequence is divided into groups of pictures with a fixed size $m$ ($m = 2^T$). Temporal scalability is achieved by hierarchical B pictures within each GOP. A typical hierarchical structure is depicted in Fig. 1. Through dyadic decomposition, each GOP is decomposed into $T + 1$ temporal layers. In Fig. 1, four temporal layers are generated in each GOP, and the pictures of the same type (I/P, B1, B2, B3) are grouped into the same temporal layer. At the decoder side, if the lowest temporal layer is correctly received, the video sequence can be reconstructed with a temporal resolution of $f_H / m$, where $f_H$ is the frame rate of the original video sequence. As each higher temporal layer is received on top of all the lower layers, the temporal resolution of the reconstructed video increases until the full resolution is achieved. This is referred to as temporal scalability.

Due to the hierarchical structure of the pictures within each GOP, the effect of channel errors on the video can be extremely severe when the compressed video data are transmitted over error-prone channels. To give the video data some measure of reliability over such channels, some form of error control is necessary. FEC schemes protect data against channel erasures by introducing parity packets. Systematic Reed-Solomon (R-S) codes can be used to generate the FEC. A block code is denoted by an $(N, k)$ pair, where $N$ is the block length and $k$ is the number of source symbols. The source packets can be accurately reconstructed when any $k$ of the $N$ packets are correctly received.

The relationship among the temporal layers implies that the lower the temporal layer in which an error occurs, the more reconstructed frames are affected and the lower the quality of the reconstructed sequence. To minimize the impact of transmission errors, it is straightforward to apply UEP across the temporal layers. However, this neglects the fact that different segments in the same temporal layer are not equally important. Specifically, the bits for motion vectors are normally more important than the bits for DCT coefficients. Because the motion information occupies a relatively small fraction of the total bit-rate, FEC can be selectively applied to the motion information instead of to all the compressed data [4]. However, applying selective FEC only to the motion data is not always appropriate. For instance, in a video sequence with high motion, a majority of the macroblocks (MBs) in an inter frame are sometimes encoded as intra MBs, and errors in these intra MBs may greatly degrade the decoded video sequence. Fig. 2 shows the visual effect on the reconstructed frame in two cases: Fig. 2 (a) illustrates the reconstructed frame when all the motion vectors are lost, and Fig. 2 (b) shows the decoded frame when all the DCT coefficients are lost.

Therefore, we propose a motion-based selective error protection method to properly allocate the available channel rates to the selected source data. In the scheme, we assume that error-free transmission of the I frame in each GOP can be guaranteed, so the UEP scheme is only applied to the hierarchical B frames. The details of the proposed scheme are presented in the following subsection.
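The dyadic layering described at the start of this subsection can be illustrated with a short sketch. It assumes the structure of Fig. 1, in which key pictures sit at multiples of $m = 2^T$ and each B picture's layer is determined by its position inside the GOP; the function names `temporal_layer` and `frame_rate_up_to_layer` are hypothetical and do not come from the SVC reference software.

```python
def temporal_layer(frame_index: int, T: int) -> int:
    """Temporal layer of a frame in a dyadic hierarchical-B GOP of size m = 2**T.

    Layer 0 holds the key pictures (I/P) at multiples of m. Layer j (1..T)
    holds the B pictures whose position inside the GOP is an odd multiple
    of m / 2**j.
    """
    if T < 0 or frame_index < 0:
        raise ValueError("T and frame_index must be non-negative")
    m = 1 << T
    pos = frame_index % m
    if pos == 0:
        return 0
    # Number of trailing zero bits of pos: more trailing zeros means a
    # coarser temporal position and therefore a lower layer.
    trailing_zeros = (pos & -pos).bit_length() - 1
    return T - trailing_zeros


def frame_rate_up_to_layer(full_rate_hz: float, T: int, top_layer: int) -> float:
    """Frame rate obtained when layers 0..top_layer are decoded."""
    if not 0 <= top_layer <= T:
        raise ValueError("top_layer must lie in [0, T]")
    # Each additional dyadic layer doubles the temporal resolution.
    return full_rate_hz / (1 << (T - top_layer))


if __name__ == "__main__":
    # GOP of 8 (T = 3), as drawn in Fig. 1.
    print([temporal_layer(i, 3) for i in range(9)])
    print(frame_rate_up_to_layer(15.0, 3, 0))
```

With $T = 3$ the first call reproduces the I, B3, B2, B3, B1, B3, B2, B3, I/P pattern of Fig. 1 as layer indices, and the base layer alone yields $f_H / m$.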

2.2. Proposed Method

Fig. 3 shows the flowchart of the proposed method. First, dyadic decomposition is applied to a GOP to generate several temporal layers. Then, for each temporal layer (excluding the temporal base layer), the percentage of intra-coded MBs is computed; we use this percentage as a measure of the motion activity. If the percentage is larger than a predefined upper threshold $TH_U$, scheme 1 is employed: all the data in the temporal layer are assumed to be significant and are marked as selected data. Otherwise, the percentage is compared with a predefined lower threshold $TH_L$. If it is smaller than $TH_L$, scheme 2 is employed: only the motion information is marked as selected data to be protected. Otherwise, scheme 3 is executed: both the motion information and the intra MBs are selected for protection. Through these steps, the source data to be allocated channel protection bits are selected in each temporal layer.

After that, UEP is applied to the selected data. Because of the dependency among the temporal layers, the selected data in a lower temporal layer are assigned more protection, while those in a higher layer receive less protection. To maximize the quality of the reconstructed video, the available channel rates should be optimally allocated. The optimization problem is formulated and solved in the following subsection.

Fig. 3. Flowchart of the proposed method.

Fig. 4. Generalized UEP scheme for multiple layers in a GOP (horizontal axis: packet number).
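The three-way selection of Fig. 3 might be sketched as follows. This is a minimal illustration, assuming the intra-MB percentage is computed as a simple ratio of MB counts for one temporal layer; the names `Selection` and `select_scheme` are hypothetical, and the threshold values in the usage lines are placeholders chosen only so that the upper threshold exceeds the lower one.

```python
from enum import Enum


class Selection(Enum):
    ALL_DATA = 1           # scheme 1: protect every bit of the layer
    MOTION_ONLY = 2        # scheme 2: protect the motion information only
    MOTION_AND_INTRA = 3   # scheme 3: protect motion information and intra MBs


def select_scheme(intra_mb_count: int, total_mb_count: int,
                  th_upper: float, th_lower: float) -> Selection:
    """Pick which data of one temporal layer are marked for protection."""
    if total_mb_count <= 0:
        raise ValueError("total_mb_count must be positive")
    if not 0.0 <= th_lower <= th_upper <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= th_lower <= th_upper <= 1")
    intra_ratio = intra_mb_count / total_mb_count
    if intra_ratio > th_upper:
        return Selection.ALL_DATA
    if intra_ratio < th_lower:
        return Selection.MOTION_ONLY
    return Selection.MOTION_AND_INTRA


if __name__ == "__main__":
    # A QCIF frame holds 99 macroblocks; the thresholds are placeholders.
    for intra in (90, 40, 5):
        print(intra, select_scheme(intra, 99, th_upper=0.8, th_lower=0.1))
```

Keeping the thresholds as parameters reflects the fact that they are tuned experimentally rather than fixed by the method.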

2.3. Formulation of the Optimization Problem

The generalized UEP scheme for multiple layers in a GOP is shown in Fig. 4. There are $T + 1$ temporal layers, from layer 0 to layer $T$. We add FEC codes to the selected information in each temporal layer. In Fig. 4, $S_i$ represents the selected source data in temporal layer $i$, where $i = 1, 2, \ldots, T$ is the temporal level. An unequal amount of FEC codes is added to each $S_i$. The length and the height of the source codes in $S_i$ are denoted by $k_i$ and $l_i$, respectively. Therefore, the length of the FEC codes for $S_i$ is $N - k_i$, with $N$ being the number of packets. The packet size is denoted by $L$. In the figure, $S_0$ consists of the source codes that are not selected for protection in the GOP. The length and the height of $S_0$ are $N$ and $l_0$, respectively.

Through proper allocation of the channel rates, we aim to maximize the average PSNR of the reconstructed GOP, which is

$$\mathrm{PSNR}_{avg} = A + \sum_{i=0}^{T} \delta_i P_i \qquad (1)$$

where $\delta_i$ is the PSNR increment contributed by $S_i$ and $P_i$ is the probability of correctly receiving $S_i$ ($i = 0, 1, \ldots, T$). $A$ is the PSNR of the reconstructed video when only the I frame in the GOP is correctly received. $A$ and $\delta_i$ are calculated experimentally.

The probability $P_i$ is related to the packet loss rate over the wireless packet-erasure channel. In this paper, a two-state Markov model is used to approximate the wireless channel's packet loss behavior [5]. The model is characterized by $p(m, N)$, the probability of losing $m$ packets within $N$ packets. As long as the number of lost packets does not exceed the number of protection packets, the original data can be reconstructed. Thus $P_i$ can be calculated as

$$P_i = \sum_{m=0}^{N-k_i} p(m, N) \qquad (2)$$
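To make Eqs. (1) and (2) concrete, the sketch below computes $p(m, N)$ for one possible two-state Markov channel and then evaluates $P_i$ and the average PSNR. It assumes a simplified Gilbert model in which every packet sent in the bad state is lost and every packet sent in the good state arrives, with the chain started in its stationary distribution; the paper does not state these details. The transition probabilities `p_gb` (good to bad) and `p_bg` (bad to good) and all function names are hypothetical.

```python
from typing import List, Sequence


def loss_count_distribution(n_packets: int, p_gb: float, p_bg: float) -> List[float]:
    """Return [p(0, N), ..., p(N, N)] for a two-state Markov erasure channel.

    Assumption: a packet is lost exactly when the channel is in the bad state.
    """
    if n_packets < 1:
        raise ValueError("n_packets must be at least 1")
    if not (0.0 < p_gb <= 1.0 and 0.0 < p_bg <= 1.0):
        raise ValueError("transition probabilities must lie in (0, 1]")
    pi_bad = p_gb / (p_gb + p_bg)  # stationary probability of the bad state
    # good[m] / bad[m]: probability of m losses so far, ending in that state.
    good = [0.0] * (n_packets + 1)
    bad = [0.0] * (n_packets + 1)
    good[0] = 1.0 - pi_bad
    bad[1] = pi_bad
    for _ in range(n_packets - 1):
        new_good = [0.0] * (n_packets + 1)
        new_bad = [0.0] * (n_packets + 1)
        for m in range(n_packets + 1):
            new_good[m] = good[m] * (1.0 - p_gb) + bad[m] * p_bg
            if m > 0:
                new_bad[m] = good[m - 1] * p_gb + bad[m - 1] * (1.0 - p_bg)
        good, bad = new_good, new_bad
    return [g + b for g, b in zip(good, bad)]


def recovery_probability(dist: Sequence[float], k: int) -> float:
    """Eq. (2): probability that at most N - k of the N packets are lost."""
    n_packets = len(dist) - 1
    if not 0 <= k <= n_packets:
        raise ValueError("k must lie in [0, N]")
    return sum(dist[: n_packets - k + 1])


def average_psnr(base_psnr: float, increments: Sequence[float],
                 probabilities: Sequence[float]) -> float:
    """Eq. (1): A plus the probability-weighted PSNR increments."""
    return base_psnr + sum(d * p for d, p in zip(increments, probabilities))


if __name__ == "__main__":
    dist = loss_count_distribution(32, p_gb=0.05, p_bg=0.45)
    probs = [recovery_probability(dist, k) for k in (32, 20, 24, 28)]
    print(average_psnr(20.0, [2.0, 4.0, 3.0, 2.0], probs))
```

The PSNR values in the usage lines are invented for illustration only; in the paper $A$ and $\delta_i$ are measured experimentally.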

The problem thus reduces to finding the optimal channel rate allocation vector $K$,

$$K = [k_1, k_2, \ldots, k_i, \ldots, k_T] \qquad (3)$$

The search for $K$ is subject to the following constraints.

Constraint 1:

$$\sum_{i=0}^{T} l_i \le L \qquad (4)$$

which restricts the total amount of source bits and channel bits so that it does not exceed the target bit-rate. $l_i$ can be calculated as

$$l_i = \frac{B_i}{k_i} \qquad (5)$$

with $B_i$ being the amount of source bytes in $S_i$.

Constraint 2:

$$k_i < k_{i+1}, \quad i = 1, 2, \ldots, T-1 \qquad (6)$$

which confines the relationship between the elements of the vector $K$: more protection should be allocated to the selected data in the lower temporal layer.

Now the optimization problem is expressed as

$$\max_{K} \ \mathrm{PSNR}_{avg}(K), \quad \text{subject to Constraints 1 and 2} \qquad (7)$$

An exhaustive search can solve the above problem, but it is infeasible in practice because of the large amount of computation. Instead, dynamic programming is employed to solve this maximization problem. A detailed description of the algorithm can be found in [6].
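The following sketch shows one way a dynamic program over the layers could search for $K$ under Constraints 1 and 2. It is not the algorithm of [6], whose details are not reproduced here; it is a memoised recursion written under several assumptions: the heights $l_i$ are rounded up to whole rows, the row budget passed in already excludes $l_0$, and `recover_prob[k]` holds the value of Eq. (2) for each candidate $k$. All names are hypothetical.

```python
from functools import lru_cache
from math import ceil
from typing import Optional, Sequence, Tuple


def allocate_protection(
    source_bytes: Sequence[int],     # B_1 .. B_T
    increments: Sequence[float],     # delta_1 .. delta_T
    recover_prob: Sequence[float],   # recover_prob[k] for k = 0 .. N
    n_packets: int,                  # N
    row_budget: int,                 # rows left for S_1 .. S_T (assumed L - l_0)
) -> Tuple[float, Optional[Tuple[int, ...]]]:
    """Return (best expected PSNR gain, K), or (-inf, None) if infeasible."""
    n_layers = len(source_bytes)

    @lru_cache(maxsize=None)
    def best(layer: int, k_min: int, rows_left: int):
        if layer == n_layers:
            return 0.0, ()
        best_gain, best_ks = float("-inf"), None
        for k in range(k_min, n_packets + 1):
            rows = ceil(source_bytes[layer] / k)  # Eq. (5), rounded up
            if rows > rows_left:                  # Constraint 1
                continue
            # Constraint 2: the next layer must use a strictly larger k.
            gain_rest, ks_rest = best(layer + 1, k + 1, rows_left - rows)
            if ks_rest is None:
                continue
            gain = increments[layer] * recover_prob[k] + gain_rest
            if gain > best_gain:
                best_gain, best_ks = gain, (k,) + ks_rest
        return best_gain, best_ks

    return best(0, 1, row_budget)


if __name__ == "__main__":
    # Toy numbers, invented for illustration only.
    probs = [0.0, 1.0, 0.9, 0.7, 0.4]
    print(allocate_protection([4, 4], [2.0, 1.0], probs, n_packets=4, row_budget=5))
```

Memoising on (layer, smallest admissible $k$, remaining rows) avoids re-evaluating identical sub-problems, which is where the saving over a plain exhaustive search comes from.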

3. EXPERIMENTAL RESULTS

In the experiments, we use the SVC reference software JSVM 5.0 [7] to implement the proposed method. Four QCIF sequences, Football, Foreman, City and Crew, are tested, all with a frame rate of 15 Hz. There are 113 frames in each test sequence and the GOP size is set to 16. The upper threshold $TH_U$ and the lower threshold $TH_L$ are set to 80% and 10%, respectively. These two values are determined through experiments.

The video sequences were transmitted over a two-state Markov channel. Due to the random nature of such a channel, 100 different runs of the experiments were conducted under packet loss rates from 2% to 30%. Because of space constraints, only the results for Football and Foreman are shown in this paper.

Our proposed method is compared with two other schemes, which are distinct from the selection schemes of Section 2.2. Scheme 1 applies UEP to the different temporal layers. Scheme 2 selectively adds FEC codes to the motion information instead of to all the data in each temporal layer. The performance (average PSNR) of the three schemes is compared under a variety of packet loss rates, and the results are illustrated in Fig. 5. When the error rate is low, all three error protection methods show similar performance. However, as the packet loss rate increases, our proposed method shows a growing advantage over the other two schemes, and the improvement is up to 1.2 dB.
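The repeated-run protocol described above can be mimicked with a small Monte Carlo sketch. It only simulates packet losses and checks whether an $(N, k)$ block would be recoverable; it does not decode video or measure PSNR. The channel is again assumed to lose every packet sent in the bad state, and the mean burst length used to derive the transition probabilities is a hypothetical parameter, since the paper does not report its channel settings. All function names are illustrative.

```python
import random
from typing import List, Tuple


def channel_parameters(loss_rate: float, mean_burst_length: float) -> Tuple[float, float]:
    """Map an average loss rate and a mean burst length to (p_gb, p_bg)."""
    if not 0.0 < loss_rate < 1.0 or mean_burst_length < 1.0:
        raise ValueError("need 0 < loss_rate < 1 and mean_burst_length >= 1")
    p_bg = 1.0 / mean_burst_length
    p_gb = loss_rate * p_bg / (1.0 - loss_rate)
    if p_gb > 1.0:
        raise ValueError("loss rate too high for this burst length")
    return p_gb, p_bg


def simulate_losses(n_packets: int, p_gb: float, p_bg: float,
                    rng: random.Random) -> List[bool]:
    """Return one loss trace (True means the packet is lost)."""
    bad = rng.random() < p_gb / (p_gb + p_bg)  # start in the stationary state
    trace = []
    for _ in range(n_packets):
        trace.append(bad)
        if bad:
            bad = rng.random() >= p_bg   # stay bad unless we recover
        else:
            bad = rng.random() < p_gb    # fall into the bad state
    return trace


def empirical_recovery_rate(n_packets: int, k: int, loss_rate: float,
                            mean_burst_length: float, runs: int = 100,
                            seed: int = 0) -> float:
    """Fraction of runs in which at most N - k packets are lost."""
    rng = random.Random(seed)
    p_gb, p_bg = channel_parameters(loss_rate, mean_burst_length)
    recovered = 0
    for _ in range(runs):
        lost = sum(simulate_losses(n_packets, p_gb, p_bg, rng))
        if lost <= n_packets - k:
            recovered += 1
    return recovered / runs


if __name__ == "__main__":
    for rate in (0.02, 0.10, 0.30):
        print(rate, empirical_recovery_rate(32, 24, rate, mean_burst_length=2.0))
```

Fixing the seed makes the runs repeatable, so different values of $k$ can be compared on identical loss traces.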

4. CONCLUSIONS

In this paper, we propose a motion-based selective error protection method for scalable video over error-prone channels. For scalable video with temporal scalability, the compressed bit-stream in each GOP is divided into several temporal layers. A framework is developed to select a partial stream from each layer based on the motion activity. Furthermore, UEP is applied to the selected data of the different temporal layers, where the data in the lower layers are assigned a higher priority. The available channel protection codes are optimally allocated to different parts of the source codes to maximize the quality of the reconstructed video. We apply dynamic programming to solve the optimization problem. The method is compared with two other schemes under a variety of packet loss rates. The experimental results demonstrate the advantage of our proposed method.

Fig. 5. PSNR comparison of the proposed scheme against the other two schemes: (a) Football and (b) Foreman. [Plots of average PSNR versus packet loss rate (%), from 0 to 30, for Scheme 1, Scheme 2, and our proposed method.]

5. REFERENCES

[1] A. Albanese, J. Blomer, J. Edmonds, M. Luby, and M. Sudan, "Priority encoding transmission," IEEE Trans. Inform. Theory, vol. 42, pp. 1737-1744, Nov. 1996.
[2] C. Huang and S. Liang, "Unequal Error Protection for MPEG-2 Video Transmission over Wireless Channels," Signal Processing: Image Commun., vol. 19, pp. 67-79, 2004.
[3] M. van der Schaar and H. Radha, "Unequal Packet Loss Resilience for Fine-Granular-Scalability Video," IEEE Trans. Multimedia, vol. 3, pp. 381-393, Dec. 2001.
[4] J. T. H. Chung-How and D. R. Bull, "Loss Resilient H.263+ Video over the Internet," Signal Processing: Image Commun., vol. 16, pp. 891-908, 2004.
[5] E. O. Elliott, "A model of the switched telephone network for data communications," Bell Syst. Tech. J., pp. 89-109, Jan. 1965.
[6] J. Kim, R. M. Mersereau, and Y. Altunbasak, "Error-resilient image and video transmission over the internet using unequal error protection," IEEE Trans. Image Processing, vol. 12, pp. 121-131, Feb. 2003.
[7] MPEG Committee, JSVM 5 Software, ISO/IEC JTC 1/SC 29/WG 11 N7797, Apr. 2006.
[8] MPEG Committee, "Joint scalable video model JSVM-8," ISO/IEC JTC 1/SC 29/WG 11 N8456, Oct. 2006.
