
Iterative joint source–channel decoding of H.264 compressed video


Signal Processing: Image Communication 25 (2010) 75–87


David Levine a, William E. Lynch a,*, Tho Le-Ngoc b

a Department of Electrical and Computer Engineering, Concordia University, Montreal, Quebec, Canada H3G 1M8
b Department of Electrical and Computer Engineering, McGill University, Montreal, Quebec, Canada H3A 2A7

Article info

Article history:

Received 20 August 2008

Accepted 22 December 2009

Keywords:

Error resilience

Joint source–channel decoding

Likelihood decoding

Compressed video

H.264


Abstract

This paper proposes an Iterative Joint Source–Channel Decoding (IJSCD) scheme for error resilient transmission of H.264 compressed video over noisy channels by using the available H.264 compression, e.g., Context-based Adaptive Binary Arithmetic Coding (CABAC), and channel coding, i.e., rate-1/2 Recursive Systematic Convolutional (RSC) code, in transmission. At the receiver, the turbo decoding concept is explored to develop a joint source–channel decoding structure using a soft-in soft-out channel decoder in conjunction with the source decoding functions, e.g., CABAC-based H.264 semantic verification, in an iterative manner. Illustrative designs of the proposed IJSCD scheme for an Additive White Gaussian Noise (AWGN) channel, including the derivations of key parameters for soft information, are discussed. The performance of the proposed IJSCD scheme is shown for several video sequences. In the examples, for the same desired Peak Signal-to-Noise Ratio (PSNR), the proposed IJSCD scheme offers a savings of up to 2.1 dB in required channel Signal-to-Noise Ratio (SNR) as compared to a system using the same RSC code alone. The complexity of the proposed scheme is also evaluated. As the number of iterations is controllable, a tradeoff can be made between performance improvement and the overall complexity.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

Modern communication systems are commonly used for the transmission of video information. For video transmission systems to be feasible, they should make efficient use of the available communication resources. Hence, video data is typically compressed before being transmitted. Compression removes redundancy, which reduces the required transmission bandwidth. However, compression also makes transmitted video more susceptible to error propagation, which can lead to serious quality degradation.

Two ways to make video more resilient to transmission errors are to use a channel code and to use source residual redundancy. Channel coding trades bandwidth for the ability to perform error correction on a received sequence. Residual redundancy can be used by the source decoder to correct bit errors as a video sequence is decompressed. By making use of both added redundancy from channel coding and residual redundancy from source coding, a decoding scheme can be created to make compressed video even more resilient to transmission errors.

Syntax or semantic errors at the source decoder indicate that an error has occurred. The ability of the source decoder to eliminate possible bitstreams can be combined with soft information from the channel to create a decoding scheme that improves error resilience. Several authors have proposed decoding schemes that take advantage of source residual redundancy. These decoding schemes fall into two categories.

The first category includes schemes that consider specific knowledge about one or several syntax elements while decoding. Schemes that fall into the first category include [1–4].

Bystrom et al. [1] investigated a scheme for soft decoding of Variable Length Code (VLC) data in MPEG-4 video. In the scheme, the number of bits and the number of codewords in a coded bitstream are transmitted in an error-free side-channel. This limits the number of valid bitstreams for the decoder to choose from, making it more likely to choose the right one. Jeanne et al. [2] proposed an approach for decoding MPEG-4 DCT coefficients. The decoder uses probability information to select the most likely sequence of DCT coefficients. Nguyen and Duhamel [3] proposed a method to decode DCT coefficients taking into account the maximum number of DCT coefficients and the fact that only the last VLC codeword should be labeled as "last". This scheme assumes that only DCT coefficients are vulnerable to channel errors and all other syntax elements are error-free. Wang and Yu [4] developed a scheme for H.264 video that used joint source–channel decoding to decode VLC encoded motion vectors. A MAP decoder used source and channel statistics to select the most probable set of motion vectors. Each of these schemes is designed to operate on a specific part (DCT coefficients, motion vectors) of a compressed bitstream. All other parts of the bitstream are assumed to be error-free. In contrast, the scheme proposed in this paper addresses errors in most parts of the compressed bitstream.

The second category of schemes considers all syntax elements in a video sequence. These schemes check the syntax of each video slice and detect or correct any syntax/semantic errors. Several authors [5–8] have proposed joint source–channel decoding schemes that fall into this category.

Pu et al. [5] used a Low Density Parity Check (LDPC) code to transmit JPEG2000 images. At the receiver, the channel decoder performs soft decoding and the source decoder detects syntax errors. Based on the results of error detection, the soft channel values are modified and fed back to the channel decoder to be used in the next iteration. Peng et al. [6–8] proposed a similar scheme using turbo codes to transmit JPEG images [6], MPEG-1 video [6], vector-quantized images [7], and sub-band coded images [8]. The source decoders in the schemes in [5–8] only detect syntax errors. None of these schemes attempt to conceal errors before information is fed back to the channel decoder.

While the source decoders in [5–8] only detect syntax errors, several other schemes [9–11] include source decoders that try to conceal syntax errors. These schemes generate several candidates for each video slice. After testing the syntax of the candidates, a winner is chosen among them. However, once selected, the winning slice candidates are decompressed. No effort is made to feed information back to the channel decoder.

Ma and Lynch in [12] proposed a transmission scheme that combines iterative joint source–channel decoding (as in [5–8]) and error concealment in source decoding (as in [9–11]). The proposed scheme uses turbo coding and MPEG-4 video compression. Between iterations of the turbo decoder, multiple slice candidates are generated and a winner is selected among them. Using the winning slice candidate, the slice's soft bit values are modified prior to the next iteration of the turbo decoder.

This paper investigates an IJSCD approach for transmitting H.264 video. The work here follows a similar framework to [12], although there are significant differences. In [12], a conventional turbo decoder with two convolutional codes is used, and source decoding is done in between turbo decoder iterations. In contrast, a single convolutional code is used in this paper, and the turbo-style iterative exchange takes place between the convolutional decoder and the source decoder itself. Furthermore, the scheme in [12] was designed to use VLC for entropy compression. In this paper, on the other hand, CABAC [13], which is available in H.264, is used as the entropy code.

The remainder of this paper is organized as follows: Section 2 discusses error detection in H.264 decompression and provides the details of the proposed scheme. Simulation results are presented in Section 3 and the paper concludes in Section 4.

2. Iterative joint source–channel decoding (IJSCD) scheme

H.264 is a layered video format [14–16]. Each layer has a header, which contains information global to that layer. The layers of H.264 are the sequence, the picture, the slice, the macroblock and the block. All layers above and including the slice begin with a start code, which acts as a re-synchronization point. The proposed IJSCD scheme assumes all bits in start codes, slice headers and higher layers to be error-free by using a very strong channel code. The remaining slice data bits, which make up the majority of the bits in the compressed sequence, are transmitted over the main channel, which is considered to be an AWGN channel. There are two modes of entropy coding available in H.264: VLC and CABAC [15]. In this paper, only CABAC is considered for all syntax elements in the slice data [13].

As mentioned earlier, compressed video is susceptible to error propagation. In [17], the effect of bit errors on H.264 decompression was observed when CABAC entropy coding was used. It was found that H.264 decompression was highly sensitive to bit errors and that even a single bit error would often cause noticeable degradation. It was also seen that the H.264 decompressor was often instructed to perform a task known to be incorrect. This meant that the decompressor was capable of detecting semantic errors. Several different types of semantic errors were described.

Also in [17], it was observed that bit errors cause semantic errors with a probability of approximately 0.993. This implies that, when no semantic error is detected, the probability of the slice not having any bit errors is high. Furthermore, undetected bit errors were never observed to cause any noticeable video quality degradation. This implies that it is acceptable to display a semantically valid video slice even if that slice contains uncorrected bit errors. The scheme proposed in the following section uses the semantic error detection capability of the H.264 decompressor described in [17] to try to improve the quality of transmitted video.

The scheme proposed in this paper combines source and channel decoding in an iterative manner to decode a transmitted H.264 sequence. The scheme was first discussed in [18] and is described in greater detail here.

On the transmitter side, source coding (H.264 compression) and channel coding (convolutional coding) are done separately. On the receiver side, source and channel decoding are done iteratively. The channel decoder passes soft bit information to the source decoder. The source decoder combines this soft information with source semantic information to detect and correct any remaining bit errors. The results of source decoding are fed back to the channel decoder and another iteration is performed. After several iterations of this decoding process, the video is decompressed.

The block diagram of the proposed scheme is shown in Fig. 1. On the transmitter side, the input first passes through the H.264 Compressor, which compresses the video. The compressed video is then passed to the H.264 Header Splitter. In IJSCD, it is assumed that all bits in slice headers and all bits in higher layers are error-free. Thus, the only bits subject to channel errors are slice data bits. To enable this, the H.264 Header Splitter partitions the compressed sequence into two bitstreams: a data and a header stream. The data stream contains all slice data bits while the header stream contains all other bits. To later be able to recombine the two streams, the length of each slice is appended to the end of the slice header in the header stream. To be able to assume that the header stream is error-free, it is protected by a very strong channel code. The header stream is then transmitted in a side-channel. The data stream is passed to the Interleaver.

Fig. 1. Block diagram of the proposed scheme.

The Interleaver on the transmitter side of Fig. 1 creates information blocks by distributing bits from several slices in the data stream. This is done for two reasons. First, it is difficult for the Source Semantic Verifier (described later) to successfully correct a slice with a large number of bit errors in it. In convolutional decoding, incorrect decoding decisions often result in burst errors. If a convolutional code block were made up of bits from the same slice, then a burst error would result in multiple bit errors in that slice. Interleaving ensures that consecutive bits in a convolutional code block come from different slices. Thus, burst errors from convolutional decoding are diffused among multiple slices, which reduces the likelihood that any one slice will have a large number of errors. Second, when interleaving is used, nearby bits in information blocks are rarely from the same slice. This allows improvements made by the Source Semantic Verifier to assist convolutional decoding (and vice-versa) in an iterative manner. The Interleaver used in this paper is a block interleaver using a 5000×10 matrix, as sketched below. The Interleaver passes the information blocks to the Convolutional Encoder.
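A minimal sketch of such a block interleaver is given below. The 5000×10 size is the one stated in the text; the matrix orientation, the write-row/read-column convention, and the function names are assumptions made purely for illustration, since the paper does not spell them out.

```python
def block_interleave(bits, rows=10, cols=5000):
    """Sketch of a block interleaver: bits are written into a rows x cols
    matrix row by row and read out column by column, so bits that end up
    adjacent in the output were `cols` positions apart in the input.
    Orientation and read/write order are illustrative assumptions."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]

def block_deinterleave(bits, rows=10, cols=5000):
    """Inverse operation, as used by the receiver's Deinterleaver."""
    assert len(bits) == rows * cols
    out = [0] * (rows * cols)
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = bits[c * rows + r]
    return out
```

With this convention, a burst of consecutive errors after convolutional decoding lands on input positions that are far apart after deinterleaving, which is the behaviour the text describes.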

The Convolutional Encoder uses a rate-1/2 RSC code with a constraint length of 2. It outputs both information and parity bits. The information blocks created by the Interleaver are RSC encoded and transmitted over a noisy channel. For simplicity, binary transmission (e.g., using BPSK modulation) over an AWGN channel is considered.
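The paper does not specify the generator polynomials of its RSC code, so the sketch below assumes one simple memory-1 (constraint length 2) choice, the pair (1, 1/(1+D)), i.e., a systematic branch plus an accumulator-style parity branch. It illustrates only the systematic/parity structure of a rate-1/2 RSC encoder, not the authors' actual encoder.

```python
def rsc_encode(info_bits):
    """Illustrative rate-1/2 RSC encoder with memory 1 (constraint length 2).
    Generator pair (1, 1/(1+D)) is an assumption: the systematic output is the
    input bit, the parity output is a running XOR (accumulator) of the inputs."""
    state = 0
    systematic, parity = [], []
    for u in info_bits:
        state ^= u            # feedback: new state = u XOR old state
        systematic.append(u)  # systematic (information) output
        parity.append(state)  # parity output of the 1/(1+D) branch
    return systematic, parity
```

At the receiver, the Parity Splitter simply separates the systematic stream from the parity stream again.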

On the receiver side, the channel-corrupted bitstream is passed to the Parity Splitter, which separates information bits from parity bits and passes both bitstreams to the Soft Output Viterbi Algorithm (SOVA) Decoder.

The SOVA Decoder is a Soft-Input Soft-Output (SISO) Maximum-Likelihood (ML) convolutional decoder [19]. On the first decoding iteration, the SOVA Decoder uses the information bits coming from the Parity Splitter. On all subsequent iterations, it uses the information bits that result from the previous iteration. The SOVA Decoder outputs soft bit values, denoted L(u_k).

To determine the value of the soft output L(u_k), two likelihood estimates are defined. The first estimate, labeled p_k, is an estimate of the likelihood that a 1 was transmitted and is shown in Eq. (1). The second estimate, labeled q_k, is an estimate of the likelihood that a 0 was transmitted. The values p_k and q_k are shown in Eqs. (1) and (2).

$p_k = P(u_k = 1 \mid y)$   (1)

$q_k = 1 - p_k = P(u_k = 0 \mid y)$   (2)

Here u_k is the transmitted bit k and y is the received slice. L(u_k) is the log-likelihood ratio (LLR) of p_k and q_k, as shown in Eq. (3).

$L(u_k) = \log\left(\frac{p_k}{q_k}\right)$   (3)

To determine the L(u_k) values, the SOVA Decoder uses the classical Viterbi Algorithm [20] to find the ML codeword. As the ML codeword is decoded, a soft reliability measure is associated with each bit [19]. The reliability measure for each bit determines the magnitude of L(u_k), and the corresponding bit value in the ML codeword determines the sign of L(u_k).
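To make the relation between the soft values and the likelihood estimates of Eqs. (1)–(3) concrete, the following snippet (an illustrative sketch, not code from the paper) converts between an LLR L(u_k) and the probabilities p_k and q_k.

```python
import math

def llr_to_probs(llr):
    """Recover p_k = P(u_k = 1 | y) and q_k = P(u_k = 0 | y)
    from the log-likelihood ratio L(u_k) = log(p_k / q_k)."""
    p = 1.0 / (1.0 + math.exp(-llr))
    return p, 1.0 - p

def probs_to_llr(p):
    """Inverse mapping: L(u_k) from p_k, as in Eq. (3)."""
    return math.log(p / (1.0 - p))

# Example: a strongly positive soft value means a 1 is very likely.
p, q = llr_to_probs(2.0)   # p is roughly 0.88, q roughly 0.12
```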

The Deinterleaver performs the inverse operation of the transmitter's Interleaver. It rearranges the bits so that the slices are in their original order. The resulting bitstream is passed to the Stream Merger.

The Stream Merger combines the header information from the side-channel with the slice data from the Deinterleaver. This results in a complete H.264 bitstream, which is then passed to the Source Decoder.

The proposed Source Decoder is comprised of the Slice Candidate Generator (SCG) and the Source Semantic Verifier (SSV). The SCG generates a list of slice candidates sorted in descending order of likelihood. To perform this task, the SCG uses the soft values of the bits in each slice. The SSV checks the semantics of the slice candidates generated by the SCG and selects a winner among them.

The Slice Candidate Generator (SCG) takes in the soft values from the Deinterleaver as well as the length of each slice from the header stream and generates a list of slice candidates. A slice candidate is a bit sequence that could possibly be the original slice. For a slice that is N bits long, there are 2^N possible slice candidates (i.e., all bit sequences of length N). Among the 2^N slice candidates, one of the candidates, called the Target Candidate, is identical to the original slice. The goal of the SCG is to include the Target Candidate as early as possible in the list of slice candidates.

Given the soft values decoded by each iteration of the SOVA Decoder, some slice candidates are more likely to be the Target Candidate than others. Using these soft values, the SCG ranks the slice candidates in descending order of likelihood. When the channel noise is low, the Target Candidate will usually be early in the list. However, when the channel noise is high, the Target Candidate will more often occur later in the list.

To ensure that the Target Candidate appears in the list, the SCG could theoretically generate all 2^N possible slice candidates. However, for practical values of N, the complexity of this task is too high. To reduce the complexity, the SCG generates up to n_sc slice candidates for all slices regardless of the length of the slice. Here, n_sc is a number chosen to yield reasonable complexity. In this paper, it is chosen to be 300. Because there are only 300 slice candidates in the list, it is possible that the Target Candidate is excluded.

To rank the slice candidates in descending order of likelihood, a ranking measure, denoted R(s), is determined as shown below.

A soft value is received from the SOVA Decoder at the source decoder at each iteration. The soft value of bit k, L(u_k), is relabelled y_k, i.e.,

$y_k = L(u_k) = \log\left(\frac{p_k}{q_k}\right)$   (4)

where p_k is the probability that the bit is 1 and q_k is the probability that it is 0. To find the most likely slice candidate (denoted v) we hard limit each bit, i.e., if the soft value y_k is positive the bit in location k is 1, and if the soft value is negative the bit in that location is 0.

The probability of the most likely slice candidate is

$P(v) = \prod_{k=0}^{N-1} P(v_k)$   (5)

Here, if v_k is 1 then P(v_k) is p_k, and if it is 0 then P(v_k) is q_k. Note that the v_k's are assumed to be statistically independent.

The probability that a slice candidate s occurs is

$P(s) = \prod_{k=0}^{N-1} P(s_k)$   (6)


Table 1. Compression specifications of the video sequences "Football", "Table-Tennis" and "Foreman".

Attribute        Football    Table-Tennis    Foreman
Frame size       352×240     352×240         352×288
Duration         4 s         4 s             4 s
Frame rate       30 fps      30 fps          30 fps
Compression      H.264       H.264           H.264
Profile [15]     Main        Main            Main
Entropy coding   CABAC       CABAC           CABAC
GOP length       15          15              15
Bitrate          1 Mbps      1 Mbps          1 Mbps


We can form the ranking value R(s) for each slice candidate s as

$R(s) = -\log\frac{P(s)}{P(v)}$   (7)

Note that P(s) < P(v) for all s ≠ v. Therefore, R(s) > 0 for all s ≠ v and R(v) = 0. Furthermore, the smaller R(s) is, the more likely the slice candidate s.

Manipulating R(s), we have

$R(s) = -\sum_{k=0}^{N-1} \log\frac{P(s_k)}{P(v_k)}$   (8)

Define F_s as the set of bit locations k where s_k ≠ v_k. We may think of these as the flip-bit locations. Let F_s' denote the set of all bit locations not in F_s. Now

$R(s) = -\sum_{k \in F_s'} \log\frac{P(s_k)}{P(v_k)} - \sum_{k \in F_s} \log\frac{P(s_k)}{P(v_k)}$   (9)

Clearly the terms in the first summation are all zero. For the terms in the second summation, if the soft value from the SOVA Decoder y_k was positive, then this meant that 1 was more likely than 0. Thus v_k was 1 and therefore s_k was 0 (since in the second summation we are dealing with the flip-bit locations). In this case a term in the second summation would be -y_k. With the negative sign in front of the summation it would become merely y_k.

On the other hand, if the soft value y_k was negative, then this meant that 0 was more likely than 1. Thus v_k was 0 and therefore s_k was 1. In this case a term in the second summation would be y_k. With the negative sign in front of the summation it would become -y_k, yielding a positive number.

With this in mind, it is clear that

$R(s) = \sum_{k \in F_s} |y_k|$   (10)

It should be noted that, depending on the soft values, sometimes a slice candidate with two or more flip-bits will be more likely than another with just one flip-bit.
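As a concrete illustration of Eq. (10), the ranking value of a candidate is simply the sum of the soft-value magnitudes at its flip-bit locations. The short sketch below (illustrative only; names are not from the paper) computes R(s) for a candidate s given the soft values y of a slice.

```python
def ranking_value(soft_values, candidate):
    """R(s) from Eq. (10): sum of |y_k| over the flip-bit locations,
    i.e., the positions where `candidate` differs from the hard
    decision on the soft values."""
    hard = [1 if y > 0 else 0 for y in soft_values]   # most likely candidate v
    return sum(abs(y) for y, v_k, s_k in zip(soft_values, hard, candidate)
               if s_k != v_k)

# R(v) = 0 for the hard-decision candidate itself; flipping a low-confidence
# bit adds only a small penalty, flipping a high-confidence bit a large one.
```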

The SCG generates a list of the n_sc most likely slice candidates ranked in ascending order of R(s), which is determined using Eq. (10). To generate this list, an algorithm called the Incomplete Partial Sums Algorithm (IPSA) is used. IPSA determines the n_sc candidates with the smallest R(s) without having to calculate R(s) for all possible slice candidates.

IPSA is an iterative algorithm. The algorithm starts with an empty list and adds one R(s) value each time around. IPSA ensures that each R(s) added is the smallest value not currently in the list. The algorithm stops once the output list contains n_sc slice candidates. As R(s) values are determined, the SCG keeps track of the corresponding slice candidates. When the list is complete, the SCG records the actual slice candidates and not the R(s) values.

To begin the algorithm, the received slice, denoted y, is first sorted in order of soft bit value magnitude. The sorted slice, denoted a, is used by IPSA to generate the required number of R(s) values. As Eq. (10) shows, R(s) is a partial sum of real numbers chosen from a fixed set (the magnitudes of the soft values of the bits in the slice). As a contains the same elements as y (but arranged differently), R(s) can be written as the sum of several elements of a, as in Eq. (11).

$R(s) = \sum_{k \in G_s} a_k$   (11)

Here a_k is the kth element of the sorted slice a and G_s is the set of indices to the elements of a that correspond to the flip-bits of slice candidate s.

Once a has been created, IPSA builds a min-heap. This heap is used to store several R(s) values that could potentially be the next smallest. Initially, the heap contains the slice candidate with G_s = {0} (because, after R(v), it always has the smallest R(s)).

On each iteration, the root value of the heap, denoted R(x), is removed from the heap and added to the output list. Two new values, R(y) and R(z), are then added to the heap. R(y) replaces R(x) at the root of the heap and R(z) is appended to the end of the heap, thus increasing the heap size by one. Eqs. (12) and (13) show how R(y) and R(z) are determined.

$R(y) = R(x) - a_r + a_{r+1}$   (12)

$R(z) = R(x) + a_{r+1}$   (13)

Here, a_r is the largest value in a used in the sum of R(x) and a_{r+1} is the next element in a after a_r. Once R(y) and R(z) are placed in the heap, they are sifted into it. If a_r is the largest value in a, no new values are added to the heap; the heap size decreases by one and the heap is sifted as is. Either way, the value on top of the heap after sifting is the new smallest value and becomes R(x) on the next iteration. The overall complexity of IPSA is on the order of n_sc log(n_sc), excluding the cost of sorting the soft values.
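The following sketch combines the SCG ranking and IPSA as described above. It is a minimal implementation under the assumptions stated in the text (function and variable names are hypothetical, not the authors' code): a min-heap enumerates candidate flip sets in nondecreasing order of R(s), with the heap entries updated according to Eqs. (12) and (13).

```python
import heapq

def generate_candidates(soft_values, n_sc=300):
    """Sketch of the Slice Candidate Generator using IPSA.
    `soft_values` are the L(u_k) for one slice from the SOVA Decoder;
    `n_sc` is the maximum number of candidates (300 in the paper).
    Returns a list of (R(s), candidate bits) pairs in ascending R(s)."""
    n = len(soft_values)
    # Most likely candidate v: hard decision on each soft value.
    v = [1 if y > 0 else 0 for y in soft_values]
    # Sort bit positions by |y_k|; a[k] is the k-th smallest magnitude.
    order = sorted(range(n), key=lambda k: abs(soft_values[k]))
    a = [abs(soft_values[k]) for k in order]

    candidates = [(0.0, v)]                 # R(v) = 0, no flipped bits
    if n == 0 or n_sc <= 1:
        return candidates

    # Heap entries: (R(s), r, flips) where r is the largest index of `a`
    # used in the sum and `flips` indexes into `order`.
    heap = [(a[0], 0, (0,))]
    while heap and len(candidates) < n_sc:
        r_s, r, flips = heapq.heappop(heap)
        s = list(v)
        for i in flips:                     # flip the chosen bit locations
            s[order[i]] ^= 1
        candidates.append((r_s, s))
        if r + 1 < n:
            # Eq. (12): replace a_r by a_{r+1} in the partial sum.
            heapq.heappush(heap, (r_s - a[r] + a[r + 1], r + 1, flips[:-1] + (r + 1,)))
            # Eq. (13): add a_{r+1} to the partial sum.
            heapq.heappush(heap, (r_s + a[r + 1], r + 1, flips + (r + 1,)))
    return candidates
```

Because a is sorted in ascending order, both children of a heap entry have a sum no smaller than their parent, so popping the heap always yields the next smallest R(s) not yet in the output list.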

A different approach to selecting slice candidates is found in [12]. In [12], a fixed number of bits are chosen as flip-bits. In contrast, in this paper, any bit in the slice can be a flip-bit. This ensures that the slice candidates generated will be the most likely.

The Source Semantic Verifier (SSV) checks the slice candidates and tries to find one that does not cause a semantic error to be detected. If no H.264 semantic errors are detected in a slice candidate, it is said to have passed semantic verification. Otherwise, it is said to have failed semantic verification. The types of semantic errors that are most commonly found are: invalid intra-prediction mode; slice run-on; slice fragment; and macroblock overrun.

Fig. 2. Luminance (Y) PSNR vs. channel SNR for video "Football".

Fig. 3. Post-decoding Bit Error Rate (BER) vs. channel SNR for video "Football".

The slice candidates generated by the SCG are tested in order from most likely to least likely. As soon as a semantic error is detected, the SSV gives up on the current slice candidate and begins checking the next candidate. The SSV stops checking slice candidates as soon as one passes verification. That candidate is declared the winner and the SSV proceeds to the next slice. In the event that no slice candidate passes semantic verification, a comparison is done on the bit locations where each slice candidate failed. The candidate with the latest failure location is declared the winner. A winning candidate with no semantic error will generally be discovered only when there are a few bit errors in a particular slice. When a winning candidate is determined, the SSV passes the winning candidate, and a flag indicating whether or not it passed semantic verification, to the Modifier.

Rather than decompressing the winning slice candidate, the information is fed back to the channel decoder through the Modifier.

The Modifier takes in the winning slice candidate from the SSV, denoted w, and uses it to alter each of the L(u_k) values originally decoded by the SOVA Decoder. To alter L(u_k), the likelihood estimates p_k and q_k (from Eqs. (1) and (2)) are modified to include information about the winning slice candidate w. Based on Eqs. (1) and (2), the resulting modified likelihood estimates, denoted p̂_k and q̂_k, are given as:

$\hat{p}_k = P(u_k = 1 \mid y, w)$   (14)

$\hat{q}_k = 1 - \hat{p}_k = P(u_k = 0 \mid y, w)$   (15)

The altered value of L(u_k), denoted L̂(u_k), is the LLR of p̂_k and q̂_k:

$\hat{L}(u_k) = \log\left(\frac{\hat{p}_k}{\hat{q}_k}\right)$   (16)

It is difficult to determine an expression for the likelihood estimates in Eqs. (14) and (15) because it is difficult to quantify the contribution of the winning slice candidate w. Thus, the values of p̂_k and q̂_k are determined in an ad-hoc manner.

It is desired to have an expression for p̂_k that has contributions from both p_k and w_k. Here, w_k is the kth bit in the winning slice candidate. The contribution of w_k should somehow be related to the certainty of the winning slice candidate w. Given this guideline, the following expressions were considered:

$\hat{p}_k = (1-\alpha)\,p_k + \alpha\,w_k$   (17)


$\hat{q}_k = (1-\alpha)\,q_k + \alpha\,(1 - w_k)$   (18)

Here, α is called the Modification Parameter; it is estimated empirically and can be any real number between 0 and 1. Two different values of α are selected: one when the SSV delivers a winning slice candidate that passes semantic verification and one when the SSV delivers a winning slice candidate that fails semantic verification. Based on experiments, α was chosen to be 0.95 for the first case and 0.80 for the second case. Combining Eqs. (16), (17) and (18), and after some manipulation, L̂(u_k) can be written as:

$\hat{L}(u_k) = (2w_k - 1)\,\log\!\left( e^{(2w_k - 1)\,L(u_k)} + \frac{\alpha}{1-\alpha} \right)$   (19)

The Modifier in the proposed scheme uses Eq. (19) to alter the soft value L(u_k) and produce L̂(u_k). In examining Eqs. (17) and (18), it should be noted that when w_k is 1 then p̂_k moves towards 1, while q̂_k moves towards 0. This in turn pushes L(u_k) towards positive infinity. If the bit was not a flip-bit then the original soft value was positive and it is pushed away from 0; on the other hand, if this was a flip-bit then the original soft value was negative and it is pushed towards 0 (and often passes through it). When w_k is 0 the opposite occurs. Thus, in general, L(u_k) is pushed towards 0 (and often passes through it) when bit k is a flip-bit. On the other hand, L(u_k) is pushed away from 0 (thus reinforcing its hard decision) when bit k is a non-flip-bit.

In the end, the modified soft values L̂(u_k) are fed back into the channel decoder, where they are interleaved and passed to the SOVA Decoder as the Information Stream for the next IJSCD iteration.
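A minimal sketch of the Modifier step of Eq. (19) is given below (illustrative names; the α values 0.95 and 0.80 are the empirically chosen ones from the text). For very large |L(u_k)| the exponential can overflow in floating point, which a practical implementation would guard against.

```python
import math

def modify_soft_values(llrs, winner_bits, passed_verification):
    """Apply Eq. (19) to one slice: push L(u_k) away from 0 where the
    winning candidate agrees with the hard decision, and towards (and
    often through) 0 at the flip-bit locations."""
    alpha = 0.95 if passed_verification else 0.80
    bias = alpha / (1.0 - alpha)
    modified = []
    for L, w in zip(llrs, winner_bits):
        sign = 2 * w - 1                 # +1 if w_k = 1, -1 if w_k = 0
        modified.append(sign * math.log(math.exp(sign * L) + bias))
    return modified
```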

The modified sequence is passed to an Interleaver identical to the one in the transmitter. This creates a new Information Stream, which is fed back to the channel decoder for the next iteration.

Fig. 4. Chebychev Bounds on IJSCD (4 Iterations) and convolutional decoding for "Football".

Fig. 5. Luminance (Y) PSNR vs. channel SNR for video "Table-Tennis".

Fig. 6. Output Bit Error Rate (BER) vs. channel SNR for video "Table-Tennis".

The decoding process is repeated for several iterations. On the last iteration, rather than sending the winning candidates to the Modifier, the SSV makes hard decisions on all slices and passes the sequence to the H.264 Decompressor, which produces the final output.
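Pulling the blocks of Fig. 1 together, the receiver-side flow can be summarized in the structural sketch below. The component functions are passed in as callables and are purely hypothetical stand-ins for the modules described above; this is an outline of the control flow only, not the authors' implementation.

```python
def ijscd_receive(channel_info, channel_parity, headers, blocks, n_iterations=3):
    """Structural sketch of the iterative receiver. `blocks` is a dict of
    callables standing in for the modules of Fig. 1 (hypothetical interfaces)."""
    info = channel_info
    for it in range(n_iterations):
        # SISO convolutional decoding produces soft values L(u_k).
        soft = blocks["sova_decode"](info, channel_parity)
        slices = blocks["merge"](blocks["deinterleave"](soft), headers)
        # Source decoding: candidate generation, then semantic verification.
        winners = [blocks["ssv"](blocks["scg"](s)) for s in slices]
        if it == n_iterations - 1:
            # Final iteration: hard decisions are decompressed.
            return blocks["decompress"](winners, headers)
        # Otherwise modify the soft values (Eq. (19)) and feed them back.
        info = blocks["interleave"](blocks["modify"](soft, winners))
```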

3. Simulation results and analysis

Two objective measures are used to evaluate the performance of the proposed scheme: PSNR and Bit Error Rate (BER). Both PSNR and BER are plotted as a function of the channel SNR. In addition to the objective results, several decompressed frames are observed to evaluate the subjective performance of the proposed scheme. The complexity of the proposed scheme is also analyzed. The performance of the proposed scheme is compared to the performance of the convolutional code by itself. Results are shown for the sequences "Football", "Table-Tennis" and "Foreman". The specifications of the sequences are shown in Table 1.

Fig. 7. Chebychev Bounds on IJSCD (3 Iterations) and convolutional decoding for "Table-Tennis".

Table 2. Minimum channel SNR needed to achieve maximum luminance (Y) PSNR and gain in channel SNR for video sequence "Table-Tennis".

Scheme                    Channel SNR    Gain in channel SNR
Convolutional decoding    4.4 dB         –
IJSCD, 1 iteration        2.9 dB         1.5 dB
IJSCD, 2 iterations       2.5 dB         1.9 dB
IJSCD, 3 iterations       2.35 dB        2.05 dB

3.1. Objective performance

Fig. 2 shows the objective performance comparison in terms of luminance PSNR for the video sequence "Football". Fig. 3 shows the BER of the output compressed video sequence before it is decompressed. Results are shown for the proposed scheme with 1, 2, 3 and 4 iterations, and are compared to the results of convolutional coding alone. The results for red and blue chrominance PSNR are similar to the results for luminance PSNR and are not shown.

When there are no bit errors in the received video sequence, its PSNR is 28.91 dB. This value is determined by the compression and is the highest PSNR that can be achieved when correcting errors. When convolutional decoding is performed alone, Fig. 2 shows that the maximum achievable PSNR is reached when the channel SNR is about 4.4 dB or higher. As Fig. 3 shows, this corresponds to a post-decoding BER of 3×10^-7 or lower. Fig. 3 also shows that the same post-decoding BER of 3×10^-7 can be achieved by the proposed IJSCD at a channel SNR of 3.0, 2.4 and 2.3 dB with 1, 2 and 3 iterations, respectively. As a result, the proposed IJSCD scheme can attain the maximum achievable PSNR of 28.91 dB over an AWGN channel at a lower channel SNR. Thus, without any extra bandwidth expansion, the proposed IJSCD scheme can provide a gain in channel SNR of 1.4, 2.0 and 2.1 dB with 1, 2 and 3 iterations, respectively. Practically speaking, this means that a video signal can either be transmitted at a lower power or over a longer distance.

When the IJSCD scheme uses 4 iterations, the maximum PSNR is achieved at the same minimum channel SNR as with 3 iterations. However, Fig. 2 shows that when the channel SNR is between 1 and 2.2 dB, IJSCD yields a slightly higher PSNR with 4 iterations than with 3 iterations.

Little to no gain occurs at the bottom end of the curve in Fig. 2. This could be due to the fact that it is difficult for the SSV to yield improvements when there are too many errors in each slice. In any case, even a small gain in this region would still result in a video quality too poor to be usable. As the bottom of the curve stays in the same place, the drop-off of the curve in the central region becomes steeper as the knee point moves left.

The values in Figs. 2 and 3 represent the average values of several runs. In order to better evaluate the statistical significance of these results, the standard deviations of these points are calculated and a Chebychev bound representing a 90% confidence interval is calculated for each point. These bounds are shown in Fig. 4. The mean values and the corresponding Chebychev 90% confidence intervals are shown for convolutional decoding and for IJSCD after 4 iterations. Here the upper bounds have been clipped so that they do not go higher than the PSNR for the no-error case. Despite the Chebychev bound being a loose bound, these curves still show a significant improvement for the IJSCD scheme over convolutional decoding.

Fig. 8. Luminance (Y) PSNR vs. channel SNR for video "Foreman".

Fig. 9. Output Bit Error Rate (BER) vs. channel SNR for video "Foreman".

Fig. 5 shows the performance comparison for luminance PSNR for the video sequence "Table-Tennis". Fig. 6 shows the performance comparison for BER. Results are shown for 1, 2 and 3 iterations. Once again, the chrominance PSNR curves are similar to the luminance PSNR curve and are not shown. Fig. 7 shows the 90% Chebychev confidence interval for "Table-Tennis". Once again, these bounds have been clipped so that the upper bound does not exceed the error-free PSNR. Again, the bounds indicate that the improvement is statistically significant.

Fig. 10. Chebychev Bounds on IJSCD (3 Iterations) and convolutional decoding for "Foreman".

Table 3. Minimum channel SNR needed to achieve maximum luminance (Y) PSNR and gain in channel SNR over convolutional decoding for video sequence "Foreman".

Scheme                    Channel SNR    Gain in channel SNR
Convolutional decoding    4.4 dB         –
IJSCD, 1 iteration        3.2 dB         1.2 dB
IJSCD, 2 iterations       2.7 dB         1.7 dB
IJSCD, 3 iterations       2.45 dB        1.95 dB

Table 2 shows the minimum channel SNR needed to achieve the maximum luminance PSNR for convolutional decoding and for IJSCD. Table 2 also shows the resulting gains in channel SNR at the maximum achievable PSNR provided by the proposed scheme over convolutional decoding. It is observed that the gains are similar to those seen for "Football".

Fig. 8 shows the luminance PSNR performance comparison for the video sequence "Foreman". Fig. 9 shows the BER performance comparison. Results are shown for IJSCD using 1, 2 and 3 iterations and for convolutional decoding. Again, the chrominance PSNR curves are similar to the luminance PSNR curve and are not shown. Fig. 10 shows the 90% Chebychev confidence interval for "Foreman".

Table 3 shows the minimum channel SNR needed to achieve the maximum luminance PSNR for IJSCD and for convolutional decoding. Table 3 also shows the resulting gains in channel SNR provided by IJSCD over convolutional decoding.

The gains in channel SNR for IJSCD are slightly lower for "Foreman" than they are for either "Football" or "Table-Tennis". This could be due to the fact that the frame size is different while the bitrate is the same. Regardless, the gains are still significant.

3.2. Subjective performance

The subjective quality improvement offered by the proposed scheme can be seen in Fig. 11. Fig. 11 shows a particular frame of "Football" after having been decoded by the proposed IJSCD using 1 and 2 iterations, as well as by convolutional decoding alone. For comparison, an error-free version of the frame is also shown. The performance is evaluated at 2.3 dB channel SNR.

It can be observed that the decompressed video using convolutional decoding is blocky and that a significant portion of the frame is lost. In contrast, the decompressed video using 1 IJSCD iteration has significantly fewer visible errors. When 2 iterations are performed, the frame has no visible errors.

3.3. Complexity

The complexity of the proposed scheme is evaluated for a single iteration using the complexity ratio. The complexity ratio compares the run time of a module or of the entire scheme to the run time of the H.264 decompressor. The complexity of each iteration is approximately the same. The complexity ratio for an IJSCD iteration is the sum of the complexity ratios of the Parity Splitter, the SOVA Decoder, the Deinterleaver, the Stream Merger, the SCG, the SSV, the Modifier and the Interleaver.

Fig. 11. Frame 42 of video sequence "Football" transmitted over AWGN channel at 2.3 dB SNR: (a) error-free frame; (b) convolutional decoding; (c) IJSCD, 1 iteration; (d) IJSCD, 2 iterations.

Table 4. Complexity ratio for each module in the IJSCD scheme for video "Football" at 2.3 dB channel SNR.

Module            Complexity ratio    % of iteration complexity
Parity Splitter   0.0002              0.017%
SOVA Decoder      0.4744              40.06%
Deinterleaver     0.0232              1.96%
Stream Merger     0.0003              0.025%
SCG               0.1289              10.88%
SSV               0.4771              40.29%
Modifier          0.0573              4.84%
Interleaver       0.0229              1.93%
TOTAL             1.1843              100%

Table 4 presents the complexity ratio of each of the modules in one IJSCD iteration at 2.3 dB channel SNR. The percentage of time taken up by each module is also presented. Over 80% of the iteration time is spent between the SOVA Decoder and the SSV, while the Parity Splitter, Deinterleaver, Stream Merger and Interleaver combined take up less than 3% of the iteration time.

Each iteration runs with approximately the same complexity. On the final iteration, the Modifier and Interleaver are not run and the H.264 Decompressor is run. The complexity ratio of the last iteration is therefore 0.932 greater than the complexity ratio for all other iterations (the complexity ratio of the H.264 Decompressor is 1). Thus, the overall complexity ratio for N IJSCD iterations at 2.3 dB channel SNR is roughly 1.18N + 0.932. This means that when 1 iteration is used, the IJSCD scheme takes slightly more than double the complexity of a full H.264 decompression. When 2 iterations are used, the scheme takes slightly more than triple the complexity of a full H.264 decompression, and so on.
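As a small worked check of these figures (the per-iteration ratio of about 1.18 from Table 4 plus the extra 0.932 for the final iteration), the overall complexity ratio can be computed as follows; the function name is illustrative only.

```python
def overall_complexity_ratio(n_iterations, per_iteration=1.1843, final_extra=0.932):
    """Approximate run time of N IJSCD iterations relative to a plain
    H.264 decompression (Section 3.3): roughly 1.18*N + 0.932."""
    return per_iteration * n_iterations + final_extra

# 1 iteration -> about 2.1x a full decompression, 2 iterations -> about 3.3x.
print(overall_complexity_ratio(1), overall_complexity_ratio(2))
```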

4. Conclusions

By checking the semantics of an H.264 slice, errors could be detected and corrected prior to decompression. In the proposed scheme, this led to both a noticeable decrease in output BER and a noticeable increase in PSNR. Furthermore, by interleaving bits from several different slices, results from the source decoder could be fed back to the channel decoder to perform additional decoding iterations. This led to a further decrease in output BER and a further gain in PSNR. In addition to yielding better objective video quality, the proposed scheme also presented subjective video quality improvements.

When the proposed scheme is compared to conventional convolutional decoding, the objective and subjective qualities are improved without any additional bandwidth expansion. The only cost incurred by the proposed scheme is increased complexity.

One potential improvement to the proposed scheme is to use information about the slice candidates that were rejected by the SSV when deciding how the Modifier alters the soft bit values. For example, a rejected slice candidate may give an indication about the correctness or incorrectness of several bit decisions. The modification parameter in the Modifier is determined empirically. More work could be done to develop a less empirical approach to feeding source information back to the channel decoder. This could potentially improve results.

Acknowledgements

This work was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC).

References

[1] M. Bystrom, S. Kaiser, A. Kopansky, Soft source decoding with applications, IEEE Transactions on Circuits and Systems for Video Technology 11 (10) (2001) 1108–1120.

[2] M. Jeanne, J.C. Carlach, P. Siohan, L. Guivarch, Source and joint source–channel decoding of variable length codes, International Conference on Communications 2 (2002) 768–772.

[3] H. Nguyen, P. Duhamel, Robust source decoding of variable-length encoded video data taking into account source constraints, IEEE Transactions on Communications 53 (7) (2005) 1077–1084.

[4] Y. Wang, S. Yu, Joint source–channel decoding for H.264 coded video stream, IEEE Transactions on Consumer Electronics 51 (4) (2005) 1273–1276.

[5] L. Pu, Z. Wu, A. Bilgin, M. Marcellin, B. Vasic, LDPC-based iterative joint source–channel decoding scheme for JPEG2000, IEEE Transactions on Image Processing 16 (2) (2007) 577–581.

[6] Z. Peng, Y. Huang, D. Costello, Turbo codes for image transmission — a joint channel and source decoding approach, IEEE Journal on Selected Areas in Communications 18 (6) (2000) 868–879.

[7] Z. Peng, Y. Huang, D. Costello, R.L. Stevenson, Joint channel and source decoding for vector quantized images using turbo codes, ISCAS 1998 4 (1998) 5–8.

[8] Z. Peng, Y. Huang, D. Costello, R.L. Stevenson, Joint channel and source decoding for subband coded image, ICIP 1998 1 (1998) 329–333.

[9] W.E. Lynch, V. Papadakis, R. Krishnamurthy, T. Le-Ngoc, Syntax based error concealment, Signal Processing: Image Communication 16 (9) (2001) 827–835.

[10] W.E. Lynch, V. Papadakis, R. Krishnamurthy, T. Le-Ngoc, Syntax and discontinuity based error concealment, ISCAS 1999 4 (1999) 235–238.

[11] Y. Mei, T. Le-Ngoc, W.E. Lynch, A combined multiple candidate likelihood decoding and error concealment scheme for compressed video transmission over noisy channels, Signal Processing: Image Communication 18 (10) (2003) 971–980.

[12] X.F. Ma, W.E. Lynch, Iterative joint source–channel decoding using Turbo Codes for MPEG-4 video transmission, International Conference on Acoustics, Speech and Signal Processing 4 (2004) 657–660.

[13] D. Marpe, H. Schwarz, T. Wiegand, Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard, IEEE Transactions on Circuits and Systems for Video Technology 13 (7) (2003) 620–636.

[14] A. Puri, X. Chen, A. Luthra, Video coding using the H.264/MPEG-4 AVC compression standard, Signal Processing: Image Communication 19 (9) (2004) 793–849.

[15] I.E.G. Richardson, H.264 and MPEG-4 Video Compression, first ed., John Wiley & Sons, Chichester, England, 2003.

[16] Joint Video Team (JVT), Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services, ITU-T, May 2003.

[17] D. Levine, W. Lynch, T. Le-Ngoc, Observations on error detection in H.264, MWSCAS 2007 (2007).

[18] D. Levine, W. Lynch, T. Le-Ngoc, Iterative joint source–channel decoding of H.264 compressed video, ISCAS 2007 (2007) 1517–1520.

[19] J. Hagenauer, P. Hoeher, A Viterbi Algorithm with soft-decision outputs and its applications, Global Telecommunications Conference 3 (1989) 1680–1686.

[20] G.D. Forney, The Viterbi Algorithm, Proceedings of the IEEE 61 (3) (1973) 268–278.