Error-resilient video coding for wireless video telephony applications

Rahul Vanam and Yuriy Reznik,

InterDigital Communications, LLC, 9710 Scranton Road, San Diego, CA 92121 USA

ABSTRACT

In this paper, we present an error resilient video coding scheme for wireless video telephony applications that uses feedback to limit error propagation. In conventional feedback-based error resilient schemes, error propagation can significantly degrade visual quality when the feedback delay is on the order of a few seconds. We propose a coding structure based on multiple description coding that mitigates error propagation during the feedback delay, and uses feedback to adapt its coding structure to effectively limit error propagation. We demonstrate the effectiveness of our approach at different error rates when compared to conventional coding schemes that use feedback.

Keywords: Error resilience, error concealment, video coding, RTCP feedback, mobile video telephony.

1. INTRODUCTION

Thanks to advances in wireless networks and improvements in the processing and graphics capabilities of mobile devices, mobile video telephony is now becoming a part of our daily lives [1]. Yet, some technical challenges in the design of mobile video phones still exist. One such challenge is the lossy nature of wireless networks, as well as of the other communication links connecting one user to the other.

We provide a simple illustrative example of such a system in Figure 1. In this case, video from user A is sent to user B using the RTP transport and RTCP control protocols [2]. Packet loss could occur at the local link between the phone (UE) and the base station (eNB), in the Internet, or at the remote wireless link. This loss is eventually noticed by user B’s application, and information about packet loss can be communicated back to user A by means of an RTCP receiver report (RR) [2, 3]. However, receiver reports are sent only periodically, usually once every 1-5 seconds, as they should not generate a significant amount of traffic by themselves [2]. Hence, by the time a sender knows that the receiver did not receive some video packets, it is too late to retransmit them. Instead, the sender is usually instructed to send an I- or IDR-frame to stop error propagation caused by lost packets. Additionally, in order to reduce visual artifacts caused by lost packets in periods between receiver reports, the sender must employ video coding techniques that are resilient to packet loss.

In this paper we offer a brief review of several existing approaches for error resilient video coding and propose a new approach, which is custom-designed to accommodate long notification delays in RTP/RTCP-based systems.

1.1 Prior art

The problem of error resilient video coding is well known, and prior research has produced a number of practical techniques for solving it. Recent surveys of such algorithms can be found in [4–6]. Below we list a few general classes of such techniques.

Conventional methods for reducing error propagation. Random intra macroblock insertion, intra slices, and slice interleaving are among the best-known practical techniques for error resilience [4]. Such schemes break the coding dependency of macroblocks or slices in consecutive video frames, thereby limiting error propagation [7]. The recursive optimal per-pixel estimate (ROPE) algorithm [8] estimates the overall distortion due to quantization, error propagation, and error concealment, and uses rate-distortion optimization to choose the best intra or inter mode for each macroblock. Stockhammer et al. [9] describe a multidecoder distortion estimation method that improves error resilience. The methods of [8] and [9] both show good performance, but have high computational complexity. All these schemes assume no feedback, and offer better resilience of the encoded video at the expense of a moderate increase in bitrate.

Contact information: R.V.: E-mail: [email protected]; Y.R.: E-mail: [email protected]

[Figure 1 diagram: User A (UE, wireless link, eNB, GW), Internet, User B (GW, eNB, wireless link, UE); RTCP RR: 1-5 sec.]

Figure 1. Mobile video communication system employing RTP transport and RTCP feedback. The transmission path includes the local wireless link, base station (eNB), gateway (GW), and the Internet.

Feedback-based schemes. If feedback is available, it can be used to direct the video encoder to either encode the next frame as an IDR/I-frame, or encode it using the most recent correctly transmitted frame as the reference. The former approach is called intra refresh and the latter is called reference picture selection (RPS) [10]. Feedback-based methods may also be combined with hierarchical P-frame coding structures, as in such cases it is sufficient to fix frames that belong to the “base layer” [11]. Most such techniques are only effective when the notification delay is relatively small (on the order of hundreds of milliseconds). The longer the notification delay, the longer the part of the video sequence that is affected by the error. In practice, video decoders usually employ error concealment techniques, but even with state-of-the-art concealment, 1-5 seconds of delay before refresh can cause significant and visible artifacts (so-called “ghosting”).

Multiple-description coding (MDC)-based schemes. MDC encoders produce several descriptions (subsets of packets), such that reception of any one description is sufficient for meaningful reconstruction of the video. The more descriptions that are received, the higher the quality of the reconstruction [12]. Simple examples of techniques in this class include temporal or spatial sub-sampling of the original video and coding of each sample set as a separate video stream. A survey and classification of MDC-based video coding schemes can be found in [13].

Feedback-based schemes for MDC. Several feedback-based techniques have been proposed for correcting errors in MDC-encoded video. These include: (a) RPS [14, 15], (b) error concealment [15], and (c) retransmission with fast decoding [15]. In the RPS method, the sender, on receiving a loss notification, predicts the next frame from a correctly transmitted frame [14, 15], and in addition may also use correctly received portions of the corrupted reference frame [15]. In the error concealment method, the encoder, on receiving feedback, conceals the frame in error and uses it to predict future frames. This approach requires the encoder to know the error concealment used at the decoder (which usually is not the case in practice). The retransmission approach [15] is very similar to that of [11], except that it uses an MDC structure. All these techniques, however, were proposed for systems with very short notification delay (1-2 frames) [15], and do not seem to be practical when this delay is long.

1.2 Contributions

In this paper, we propose a novel approach, which we call Inhomogeneous Temporal Multiple Description Coding (IHTMDC) for video. We design our approach for long feedback delay, which most prior methods have not considered. Our approach lowers error propagation distortion while waiting for the feedback, and on receiving it adapts its coding structure to limit error propagation. We call this adaptation mechanism Cross-Description RPS (CDRPS), and show that in the presence of long feedback delay it is more efficient than existing RPS-based methods [14, 15]. In the experimental section, we compare different coding structures at different packet error rates, and show that our approach performs better than conventional methods at higher error rates.

Figure 2. (a) Conventional “IPPP” coding structure, and (b) Homogeneous temporal MDC.

Figure 3. Inhomogeneous Temporal Multiple Description Coding (IHTMDC) structure with interleaving factor k = 4.

1.3 Outline

The remainder of this paper is organized as follows. In Section 2, we describe our approach. Details of our experiments and results are provided in Section 3. Conclusions and an outlook for future work are given in Section 4.

2. DESCRIPTION OF THE PROPOSED SCHEME

In this section, we first describe the coding structure of a conventional video codec and its generalization to temporal MDC. We then describe our proposed scheme, and show its relation to conventional (single description) and multiple description schemes. We also describe mechanisms for adapting this scheme using delayed feedback, allowing it to limit error propagation.

2.1 Conventional and MDC structures

We show the coding structure employed by the majority of today’s real-time video codecs in Figure 2(a). It consists of an Intra- or IDR-frame followed by temporally predicted P-frames, and is commonly referred to as the “IPPP” structure. The disadvantage of this scheme is its continuous chain of dependencies between frames, which makes it susceptible to error propagation.

One way to break this dependency is to create two or more sub-sequences of frames that do not reference each other. We illustrate this approach in Figure 2(b), where we use two uniformly sampled sub-sequences to produce two independent encodings, or descriptions, of the video. This is a very simple example of an MDC scheme for video, which we will call homogeneous temporal MDC (HMDC).

2.2 Inhomogeneous temporal MDC

We now propose a modification of the temporal MDC method in which the temporal distances between adjacent frames in each description are not equal. We call this approach Inhomogeneous Temporal MDC (IHTMDC), and we illustrate it with an example in Figure 3. In this figure, frames i and (i+1) are set five frames apart, while frames (i+1) and (i+2) are set one frame apart.

Our motivation is to preserve most of the correlation between frames while still generating separate descriptions, which results in the hybrid coding structure shown in Figure 3.

2.2.1 Connection to a single description and HMDC

We characterize IHTMDC by an interleaving factor k. In our example in Figure 3, this factor is set to k = 4. Different coding structures can be derived from IHTMDC by varying k. For example, when k = 1, IHTMDC turns into the homogeneous temporal MDC scheme shown in Figure 2(b). Similarly, if we set k = ∞, IHTMDC effectively becomes the single description IPPP coding structure shown in Figure 2(a).
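The role of k can be made concrete with a minimal sketch. It assumes two descriptions, that frames are assigned to descriptions in alternating blocks of k consecutive frames, and that each frame is predicted from the previous frame of its own description. These assumptions are consistent with the five-frame and one-frame gaps described for Figure 3, but they are not taken from the authors' implementation. The function names and the use of `None` to stand for k = ∞ are hypothetical.

```python
from typing import Optional


def description_of(n: int, k: Optional[int]) -> int:
    """Index (0 or 1) of the description that frame n belongs to.

    k is the interleaving factor; None stands for k = infinity
    (single description). Block-wise alternation is an assumption.
    """
    if k is None:
        return 0
    return (n // k) % 2


def reference_of(n: int, k: Optional[int]) -> Optional[int]:
    """Previous frame of the same description, used as the P-frame reference.

    Returns None when the description has no earlier frame. How the very
    first frame of each description is coded is not modeled here.
    """
    if k is None:
        return n - 1 if n > 0 else None
    if n % k != 0:
        return n - 1          # inside a block: reference the adjacent frame
    prev = n - k - 1          # first frame of a block: skip the other description
    return prev if prev >= 0 else None


if __name__ == "__main__":
    # k = 4: blocks of four frames alternate between the two descriptions.
    print([description_of(n, 4) for n in range(10)])
    # Frame 8 references frame 3 (five frames apart); frame 9 references frame 8.
    print(reference_of(8, 4), reference_of(9, 4))
```

With k = 1 the same functions assign even and odd frames to different descriptions, which is the homogeneous case, and with `k=None` every frame references its immediate predecessor, as in IPPP.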

2.2.2 Effects of packet loss

In the IPPP coding structure, a packet loss would corrupt all successive frames. In IHTMDC, on the other hand, the error propagates through only one of the descriptions, as illustrated in Figure 4(a). Moreover, the IHTMDC structure allows the decoder to better conceal successive frames of a corrupted description by using neighboring frames belonging to the uncorrupted description, thereby limiting error propagation drift to at most k consecutive frames.
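The difference in error propagation can be illustrated with a small sketch. It assumes a single lost frame, no feedback, two descriptions interleaved in blocks of k frames, and that every later frame of the affected description is corrupted. The helper names are hypothetical, and `None` again stands for k = ∞.

```python
from typing import List, Optional


def corrupted_frames(loss_frame: int, num_frames: int, k: Optional[int]) -> List[int]:
    """Frames affected by one loss when no feedback arrives.

    For k = None (IPPP) every frame from the loss onward is affected.
    For finite k only frames of the same description are affected.
    """
    if k is None:
        return list(range(loss_frame, num_frames))
    bad = (loss_frame // k) % 2
    return [n for n in range(loss_frame, num_frames) if (n // k) % 2 == bad]


def longest_run(frames: List[int]) -> int:
    """Length of the longest run of consecutive frame numbers."""
    best = run = 0
    prev = None
    for n in frames:
        run = run + 1 if prev is not None and n == prev + 1 else 1
        best = max(best, run)
        prev = n
    return best


if __name__ == "__main__":
    ipp = corrupted_frames(5, 16, None)
    mdc = corrupted_frames(5, 16, 4)
    print(ipp, longest_run(ipp))   # one unbroken run of corrupted frames
    print(mdc, longest_run(mdc))   # runs of at most k = 4 frames
```

In this model the corrupted frames of IHTMDC are always separated by clean frames from the other description, which is what allows the decoder to conceal them from temporally close neighbors.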

2.2.3 Effects of interleaving factor k on overall distortion

When considering transmission over a lossy channel, the overall (end-to-end) distortion of the received video can be approximately expressed as:

$$D_{ETE}(k) \approx D_Q(k) + D_T(k), \qquad (1)$$

where $D_{ETE}$, $D_Q$, and $D_T$ denote the end-to-end, source coding, and transmission-induced distortions, respectively. Specific conditions under which (1) holds and related discussion can be found in [16].

Assuming that (1) holds, we may conjecture that for a given source and a given channel there exists an optimal choice of the parameter k for our proposed coding scheme:

$$k^{*} = \arg\min_{k \in \mathbb{Z}^{+}} D_{ETE}(k). \qquad (2)$$

Intuitively, with no transmission errors, single description (k = ∞) is most desirable since it yields the least coding distortion. However, in the presence of packet loss, k = ∞ may not be a good choice, since an error propagates through the remaining length of the video, yielding a larger end-to-end distortion. Using a smaller k would increase the source coding distortion, but it would also make the bitstream less sensitive to transmission errors, as errors propagate through only one of the descriptions, thereby resulting in a smaller end-to-end distortion. Therefore, the optimal choice of k must depend on the packet error rate.

In order to test this conjecture, in Section 3 we perform experiments to study the effect of k on the rate-distortion performance under different packet error rates.
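The selection in (2) can be sketched as a search over a small set of measured candidates. The sketch assumes that the two distortion terms of (1) are available per candidate k, for example from trial encodings. The function name is hypothetical and the numbers in the usage example are made up purely for illustration; they are not results from this paper.

```python
import math
from typing import Dict


def best_interleaving_factor(d_q: Dict[float, float], d_t: Dict[float, float]) -> float:
    """Return the candidate k that minimizes D_Q(k) + D_T(k), as in (1) and (2).

    d_q and d_t map each candidate k (math.inf for single description)
    to its source coding and transmission distortion, respectively.
    """
    candidates = d_q.keys() & d_t.keys()
    if not candidates:
        raise ValueError("no common candidate k in d_q and d_t")
    return min(candidates, key=lambda k: d_q[k] + d_t[k])


if __name__ == "__main__":
    # Illustrative (made-up) distortions: smaller k raises D_Q but lowers D_T.
    d_q = {1: 12.0, 2: 10.5, 4: 9.8, math.inf: 9.0}
    d_t_lossy = {1: 6.0, 2: 5.0, 4: 4.5, math.inf: 11.0}
    d_t_clean = {k: 0.0 for k in d_q}
    print(best_interleaving_factor(d_q, d_t_lossy))
    print(best_interleaving_factor(d_q, d_t_clean))
```

In the error-free case the transmission term vanishes and the search returns the candidate with the smallest coding distortion, which matches the intuition given above for k = ∞.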

2.3 Adapting IHTMDC in response to feedback

We now discuss the use of RTCP feedback for limiting error propagation in IHTMDC-coded video. There are at least two possible solutions:

• Intra refresh: Encode the next frame belonging to the corrupted description as an IDR/I-frame, as illustrated in Figure 4(b).

• Cross description RPS (CDRPS): In this approach, the encoder decides, based on rate-distortion optimization, whether to encode the next frame belonging to the corrupted description as an intra/IDR frame, or to encode it using the nearest frame from the uncorrupted description as the reference. The latter approach is illustrated in Figure 4(c).

Figure 4. IHTMDC subjected to errors. (a) Error propagation in IHTMDC without feedback. IHTMDC with feedback when using (b) intra refresh, and (c) cross description reference picture selection.

Performing an intra refresh or CDRPS on the next corrupted frame limits error propagation in the corrupted description. When k = ∞, the above two methods turn into conventional single description intra refresh and RPS, respectively. However, when k is finite, the CDRPS method differs from traditional RPS techniques [10]. In traditional RPS schemes, the reference is always set to the last frame that was confirmed as delivered. With a long feedback delay of 1-5 seconds, this means that such a reference would have to be 25-100 frames back. On the other hand, with IHTMDC and one surviving description, such a reference can always be found within the last k frames. This makes the scheme much more suitable for systems with delayed feedback.
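The reference-distance argument can be sketched as follows. The sketch assumes two descriptions interleaved in blocks of k frames, and it approximates the traditional RPS reference distance by the number of frames encoded during the feedback delay. It only locates the cross-description reference candidate; the rate-distortion decision between intra coding and CDRPS is not modeled. Function names are hypothetical.

```python
import math
from typing import Optional


def cdrps_reference(n: int, k: int) -> Optional[int]:
    """Nearest earlier frame that belongs to the other description.

    Frame n sits at offset n % k inside its block, so the last frame of the
    preceding block (which belongs to the other description) is the candidate.
    Returns None if no earlier frame of the other description exists.
    """
    ref = n - (n % k) - 1
    return ref if ref >= 0 else None


def rps_reference_distance(feedback_delay_s: float, fps: float) -> int:
    """Approximate distance (in frames) to the last confirmed frame in
    traditional RPS, taken here as the frames sent during the delay."""
    return math.ceil(feedback_delay_s * fps)


if __name__ == "__main__":
    k = 4
    n = 9                                   # next frame of the corrupted description
    ref = cdrps_reference(n, k)
    print(ref, n - ref)                     # candidate and its distance (at most k)
    print(rps_reference_distance(1.0, 25))  # one-second delay at 25 fps
```

Under these assumptions the cross-description candidate is never more than k frames away, whereas the traditional RPS distance grows linearly with the feedback delay.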

In practical implementations, the CDRPS and intra-refresh techniques can be used in a complementary fashion. For example, when the encoder knows that both descriptions have been lost since the last feedback, it may insert a new IDR frame in one description, and use a cross-description reference in the other description to restart the encoding process.

2.4 Opportunistic error concealment

In our IHTMDC approach, when a packet is lost, successive frames belonging to the lost packet’s description are corrupted due to error propagation, as illustrated in Figure 5(a). Although half of the descriptions are uncorrupted, error propagation can sometimes cause flickering due to the display of alternating corrupted and uncorrupted descriptions, which can lower the overall visual quality. To mitigate this problem we present an opportunistic error concealment method for our IHTMDC scheme.

Figure 5. Opportunistic error concealment method for the IHTMDC scheme. (a) After a packet is lost, successive frames belonging to the lost packet’s description are corrupted. (b) The opportunistic error concealment method samples and holds the uncorrupted frames over the entire length of the description. For example, frames ‘x’ and ‘y’ from the uncorrupted description are repeated over the entire length of the description.

On detecting a lost packet, the decoder conceals it using a conventional error concealment method, such as frame copy. For the next uncorrupted description, the decoder samples the first frame, labeled ‘x’ in Figure 5(a), and repeats it for the entire length of the description, as shown in Figure 5(b). For the next description, it uses the last frame from the previous description, labeled ‘y’ in Figure 5(a), and repeats it over the entire length of the description, as illustrated in Figure 5(b). This error concealment procedure is repeated over a period equal to the RTCP feedback delay. Although our concealment lowers the frame rate, the resulting video appears almost smooth since the sampled frames are temporally close. Therefore, our error concealment improves overall visual quality and has low computational complexity.
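One possible reading of this sample-and-hold procedure is sketched below. It assumes two descriptions interleaved in blocks of k frames, a single lost frame, frame copy for the remainder of the block in which the loss occurs, and no feedback within the simulated window. The exact frames held by the authors' decoder may differ, and the function name is hypothetical.

```python
from typing import List


def opportunistic_display_map(loss_frame: int, k: int, num_frames: int) -> List[int]:
    """For each display slot, return the index of the decoded frame shown.

    Before the loss every frame is shown as decoded. After the loss, each
    uncorrupted block holds its first frame ('x'), and each corrupted block
    holds the last frame of the preceding uncorrupted block ('y').
    """
    if loss_frame < 1:
        raise ValueError("loss_frame must be >= 1 so a frame exists to copy")
    bad_desc = (loss_frame // k) % 2
    shown = []
    for n in range(num_frames):
        block_start = n - (n % k)
        if n < loss_frame:
            shown.append(n)                  # unaffected frames
        elif block_start <= loss_frame:
            shown.append(loss_frame - 1)     # frame copy within the loss block
        elif (n // k) % 2 != bad_desc:
            shown.append(block_start)        # hold 'x' over the clean block
        else:
            shown.append(block_start - 1)    # hold 'y' over the corrupted block
    return shown


if __name__ == "__main__":
    # Loss at frame 5 with k = 4: frame 8 plays the role of 'x', frame 11 of 'y'.
    print(opportunistic_display_map(5, 4, 16))
```

In this reading the displayed frames after the loss come only from uncorrupted data, at the cost of a lower effective frame rate.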

3. EXPERIMENTAL RESULTS

In this section, we describe our experimental setup and results.

3.1 Experiment setup

In our tests we used standard CIF and high-definition test sequences [17], and looped them back and forth to generate 1000 frames for each test. We used “Foreman”, “Soccer”, and “News” as CIF sequences (352 × 288, 30 fps), and “Pedestrian” as the HD sequence (1080p, 25 fps). We generated IHTMDC bitstreams using a modified version of the x264 encoder [18]. In [19], the x264 encoder was compared with the H.264 JM reference encoder and was shown to be 50 times faster while providing bit rates within 5% for the same PSNR. The constant-QP rate control option and one reference frame were used in all our experiments. We used the H.264 JM decoder with the frame-copy error concealment method enabled.

For CIF sequences, we set QP = 26, 28, 30, 32, and 34, and encode each frame as a single slice, so a lost packet corresponds to a lost frame. For the 1080p sequence, we set QP = 30, 34, 38, and 42, and encode using 14 slices per frame, as this was necessary to keep the NAL unit size within 1400 bytes at our operating bitrates. To evaluate the effectiveness of the proposed methods, we simulated a channel with no errors and with packet error rates (PER) of $10^{-2}$ and $3 \times 10^{-2}$, which are typical for conversational services over LTE. We also implemented RTCP notification with a one-second delay. We tested IHTMDC with interleaving factors k = 1, 2, 4, as well as the conventional H.264 single-description coding scheme (k = ∞). The RPS technique (CDRPS in the case of IHTMDC) was used to correct errors upon RTCP notification.
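A channel of this kind can be approximated with a short simulation. The sketch below assumes independent (Bernoulli) packet losses and a fixed notification delay measured from the lost frame; the loss model actually used in the experiments is not specified in the text, so both are assumptions, and the function names are hypothetical.

```python
import math
import random
from typing import List


def simulate_losses(num_packets: int, per: float, seed: int = 0) -> List[int]:
    """Indices of lost packets under an i.i.d. loss model with rate `per`."""
    rng = random.Random(seed)          # seeded for reproducible experiments
    return [i for i in range(num_packets) if rng.random() < per]


def feedback_frame(loss_frame: int, delay_s: float, fps: float) -> int:
    """First frame at which the encoder can react to a loss, assuming the
    notification arrives `delay_s` seconds after the lost frame."""
    return loss_frame + math.ceil(delay_s * fps)


if __name__ == "__main__":
    # CIF setup: one packet per frame, 1000 frames, PER = 1e-2.
    losses = simulate_losses(1000, 1e-2)
    print(len(losses), losses[:5])
    # A loss at frame 166 of a 25 fps sequence with a one-second delay.
    print(feedback_frame(166, 1.0, 25))
```

With one packet per frame, a PER of $10^{-2}$ over 1000 frames gives about ten lost frames on average, each followed by roughly one second of video coded before the encoder can respond.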

3.2 Results

Table 1 illustrates the visual quality achievable with single description coding (k = ∞) vs. IHTMDC with k = 4. The sequence “Pedestrian.yuv” is used in this experiment. The error starts at frame number 166 for both schemes. As expected, the single description scheme (k = ∞) propagates the error into frame 186, while in the case of IHTMDC (k = 4) the error is not noticeable in frames 176 and 186.

[Figure 6 plots: PSNR (dB) vs. bitrate (kb/s) for k = 1, 2, 4, and ∞. Panels (a)-(c): Foreman.yuv; (d)-(f): Soccer.yuv; (g)-(i): News.yuv. In each row the panels show no error, PER = $10^{-2}$, and PER = $3 \times 10^{-2}$.]

Figure 6. Rate-distortion performance of IHTMDC for different interleaving factors k, packet error rates, and frame resolutions. Plots for “Foreman.yuv”: (a) no error, (b) PER = $10^{-2}$, and (c) PER = $3 \times 10^{-2}$. Plots for “Soccer.yuv”: (d) no error, (e) PER = $10^{-2}$, and (f) PER = $3 \times 10^{-2}$. Plots for “News.yuv”: (g) no error, (h) PER = $10^{-2}$, and (i) PER = $3 \times 10^{-2}$. Cases when k = ∞ and k = 1 correspond to the single description scheme and homogeneous temporal MDC, respectively.

Figures 6(a)-(i) show the rate-distortion performance of IHTMDC with CDRPS for different packet error rates and values of k for the CIF sequences. As expected, the single description scheme (k = ∞) performs best in the no-error case, as shown in Figures 6(a), (d), and (g). With packet loss, IHTMDC and HMDC perform better than the single description scheme for the “Foreman” and “Soccer” sequences, as shown in Figures 6(b), (c), (e), and (f). For the “News” sequence at PER = $10^{-2}$, single description (k = ∞) performs best at bitrates below 250 kb/s, and k = 4 performs best at higher bitrates, as shown in Figure 6(h). This indicates that the best choice of k also depends on the video content and operating bitrate. For PER = $3 \times 10^{-2}$, k = 4 shows the best R-D performance. With packet loss, HMDC (k = 1) performs worse than the single description scheme for this sequence, as shown in Figures 6(h) and (i).

For the HD “Pedestrian” sequence, we only test PER = $10^{-3}$ and $2 \times 10^{-3}$, since our IHTMDC scheme at k = 4 demonstrates good performance at such low packet error rates. Specifically, for PER = $10^{-3}$, k = 4 performs similarly to single description (k = ∞) at bitrates below 1.4 Mb/s, and performs best at higher bitrates, yielding up to 0.7 dB gain over single description, as shown in Figure 7(b). For PER = $2 \times 10^{-3}$, k = 4 performs best, yielding up to 1.5 dB gain over single description, as shown in Figure 7(c).

[Table 1 images: decoded frames 166, 176, and 186 (columns) for k = ∞ and k = 4 (rows).]

Table 1. Illustration of error propagation in single-description coding vs. IHTMDC using CDRPS with a one-second feedback delay for the “Pedestrian.yuv” sequence. The error occurs at frame number 166. The red ellipses highlight errors. For single description (k = ∞), the error propagates all the way to frame number 186, while in the IHTMDC case (k = 4) error propagation is not noticeable in frames 176 and 186.

[Figure 7 plots: PSNR (dB) vs. bitrate (Mb/s) for Pedestrian.yuv with k = 1, 2, 4, and ∞. Panels: (a) no error, (b) PER = $10^{-3}$, (c) PER = $2 \times 10^{-3}$.]

Figure 7. Rate-distortion performance of IHTMDC for different interleaving factors k and packet error rates for the “Pedestrian.yuv” sequence: (a) no error, (b) PER = $10^{-3}$, and (c) PER = $2 \times 10^{-3}$.

4. CONCLUSIONS AND FUTURE WORK

In this paper, we have presented an inhomogeneous multiple description video coding technique that provides excellent error resilience properties and is suitable for systems with long feedback delay. Our scheme effectively uses feedback to reset the coding structure, thereby limiting error propagation. Different coding structures can be derived from our scheme by varying the interleaving factor. We compared our approach with single description coding and homogeneous temporal multiple description coding with feedback at different packet error rates, and found that our scheme provides better visual quality and rate-distortion performance at higher error rates. In the current work, we studied our approach using a fixed interleaving factor. In future work, we plan to adapt the interleaving factor dynamically to the observed packet error rate.

REFERENCES

[1] T. Wiegand and G. J. Sullivan, “The picturephone is here. Really,” IEEE Spectrum, vol. 48, no. 9, pp. 50–54, Sept. 2011.

[2] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, “RFC 3550: RTP: A transport protocol for real-time applications,” July 2003.

[3] J. Ott, S. Wenger, N. Sato, C. Burmeister, and J. Rey, “IETF RFC 4585: Extended RTP profile for real-time transport control protocol (RTCP)-based feedback (RTP/AVPF),” 2006.

[4] Y. Wang, S. Wenger, J. Wen, and A. K. Katsaggelos, “Review of error resilient coding techniques for real-time video communications,” IEEE Signal Proc. Magazine, vol. 17, pp. 61–82, 2000.

[5] Y. Wang and Q-F. Zhu, “Error control and concealment for video communication – a review,” in Proceedings of the IEEE, 1998, pp. 974–997.

[6] S. Kumar, L. Xu, M. K. Mandal, and S. Panchanathan, “Error resiliency schemes in H.264/AVC standard,” J. Visual Communication and Image Representation, vol. 17, no. 2, pp. 425–450, 2006.

[7] T. Stockhammer, “Error robust macroblock mode and reference frame selection,” in VCEG JVT-B102, Jan 2002.

[8] R. Zhang, S. L. Regunathan, and K. Rose, “Video coding with optimal inter/intra-mode switching for packet loss resilience,” IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, pp. 966–976, 2000.

[9] T. Stockhammer, M. M. Hannuksela, and T. Wiegand, “H.264/AVC in wireless environments,” IEEE Trans. Cir. and Sys. for Video Technol., vol. 13, pp. 657–673, 2003.

[10] B. Girod and N. Farber, “Feedback-based error control for mobile video transmission,” in Proceedings of the IEEE, 1999, pp. 1707–1723.

[11] I. Rhee and S. R. Joshi, “Error recovery for interactive video transmission over the internet,” IEEE Journal on Selected Areas in Communications, vol. 18, pp. 1033–1049, 2000.

[12] V. K. Goyal, “Multiple description coding: Compression meets the network,” IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 74–93, Sept 2001.

[13] Y. Wang, A. R. Reibman, and S. Lin, “Multiple description coding for video delivery,” Proceedings of the IEEE, vol. 93, no. 1, pp. 57–70, 2005.

[14] S. Fukunaga, T. Nakai, and H. Inoue, “Error resilient video coding by dynamic replacing of reference pictures,” in IEEE GLOBECOM 1996, 1996, vol. 3, pp. 1503–1508.

[15] W. Tu and E. G. Steinbach, “Proxy-based reference picture selection for error resilient conversational video in mobile networks,” IEEE Trans. Cir. and Sys. for Video Technol., vol. 19, no. 2, pp. 151–164, Feb 2009.

[16] Z. Chen and D. Wu, “Rate-distortion optimized cross-layer rate control in wireless video communication,” IEEE Trans. Cir. Sys. Video Tech., vol. 22, no. 3, pp. 352–365, March 2012.

[17] “Raw video sequences,” ftp.ldv.e-technik.tu-muenchen.de.

[18] “x264 encoder,” http://www.videolan.org/developers/x264.html.

[19] Loren Merritt and Rahul Vanam, “Improved rate control and motion estimation for H.264 encoder,” in Proceedings of IEEE ICIP (5), 2007, pp. 309–312.