The Journal of Systems and Software 75 (2005) 253–270

www.elsevier.com/locate/jss

Adaptive video transcoding and streaming over wireless channels

Zhijun Lei *, Nicolas D. Georganas

Distributed and Collaborative Virtual Environments Research Laboratory (DISCOVER), School of Information Technology and Engineering,

University of Ottawa, 800 King Edward Avenue, Ottawa, Ont., Canada K1N 6N5

Received 21 January 2003; received in revised form 6 August 2003; accepted 7 September 2003

Available online 25 March 2004

* Corresponding author. Address: 402-25 Leith Hill Rd., Toronto, Canada M2J 1Z1. Tel.: +1-416-491-8631. E-mail addresses: [email protected] (Z. Lei), [email protected] (N.D. Georganas).

0164-1212/$ - see front matter © 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.jss.2003.09.029

Abstract

In this work, we investigate the problem of bit rate adaptation transcoding for transmitting pre-encoded VBR video over burst-error wireless channels, i.e., channels in which errors tend to occur in clusters during fading periods. In particular, we consider a scenario consisting of packet-based transmission with Automatic Repeat reQuest (ARQ) error control and a feedback channel. With the acknowledgements received through the feedback channel and a statistical channel model, we obtain an estimate of the current channel state and of the effective channel bandwidth. In this paper, we analyze the buffer and end-to-end delay constraints and derive the conditions that the transcoder buffers have to meet to prevent the end decoder buffer from underflowing and overflowing. Furthermore, we also investigate the source characteristics and scene changes of the pre-encoded video stream. Based on the channel constraints and source video characteristics, we propose an adaptive bit rate adaptation algorithm for transcoding and transmitting a pre-encoded VBR video stream over a wireless channel. Our experimental results demonstrate that, by reusing the source characteristics and scene change information, transcoding high quality video can produce better picture quality than directly encoding the uncompressed video at the same low bit rate. Moreover, by controlling the frame bit budget according to the channel conditions and buffer occupancy, the initial startup delay of streaming pre-encoded video can be significantly reduced.

© 2004 Elsevier Inc. All rights reserved.

Keywords: Wireless video; Mobile multimedia; Video transcoding; Content based rate adaptation; Video streaming

1. Introduction

Recently, there has been a great demand for audio/visual services to be provided over wireless links. It is expected that many video services and multimedia applications will enable users to access pre-encoded video bit streams through wireless connections and handheld devices. Such applications include video on demand (VoD), tele-learning, etc. In these applications, the pre-encoded video needs to be decoded and displayed on the fly, while it is downloaded. Due to the variety of networks comprising the present communication infrastructure, users may connect to the pre-encoded video stream through connections with different characteristics and capacities. In order to accommodate different connections and allow all users to access the pre-encoded video, effective compression and transmission schemes need to be adopted. For instance, as we have seen on the Internet, the same video program is often encoded into several copies with different quality and bit rates. When users want to download and play a video, they have to select the specific copy that is compatible with their devices and connections. However, this approach lacks flexibility: a few copies cannot cover all possible connection bandwidths, and users sometimes have to choose a copy whose bit rate differs from the actual connection bandwidth.

Moreover, when the effective connection bandwidth changes, pre-encoded videos are unable to adapt to the changes, because the dynamic channel condition is usually unknown when the video is originally coded. Especially when the original video is encoded with unified quality, the generated bit rate will tend to be highly variable due to the nature of the video. When VBR coded videos are transmitted over a network, frames arrive at the decoder with different end-to-end delays due to their variable sizes. Usually, an initial startup buffer is used at the decoder side to compensate for the delay jitter: the decoder starts to decode and display the video only when the buffer is full. If for some reason the delivery of the next frame is delayed and the initial buffer is empty, the decoder has to suspend rendering until the buffer is refilled. This so-called rebuffering process is a frequent cause of users giving up.

Layered video coding was originally designed to solve this kind of problem. In layered video coding, a video is encoded into several layers with different levels of importance and quality. However, since the number of layers is limited and no dynamic changes can be made to the compressed video during transmission, the inflexibility remains.

Alternatively, a video transcoder can be used at the video source or at an intermediate node to convert the video bit rate. When the connection bandwidth is very low, reducing the video bit rate allows the same number of frames to be transmitted as when both the video bit rate and the channel bandwidth are high. Similar to source encoders, video transcoders can modulate the data they produce by adjusting a number of parameters, including quality, frame rate, and resolution. Using transcoders gives us a second chance to dynamically adjust the video bit rate according to the channel bandwidth. This is particularly useful when there are time variations in the channel characteristics.

The increasing demand for mobile communications has resulted in the extensive use of wireless communication technology. However, a signal received over a wireless channel exhibits considerable fades in signal strength. Unlike a wireline channel, where the signal strength is relatively constant and reception errors are mainly due to additive noise, errors in a wireless channel are predominantly due to the time-varying signal strength caused by multi-path propagation from local scatterers. Thus, errors in a wireless channel tend to be bursty, with the duration of the bursts being a function of the receiver velocity and the nature of the time-varying environment. Achieving high video quality at the decoder therefore requires a robust transmission scheme. Closed-loop error control techniques such as Automatic Repeat reQuest (ARQ) have been shown to be more effective than Forward Error Correction (FEC) and have been successfully applied to wireless video transmission (Khansari et al., 1996). ARQ approaches, which assume the existence of a back channel and sufficiently long end-to-end delays, are appealing in that retransmission is only required during periods of poor channel conditions.

In this paper, we concentrate on using rate adaptation transcoding techniques and an ARQ error control scheme for transmitting pre-encoded video over wireless channels. To take full advantage of the error control capabilities of the ARQ scheme, we propose to combine the ARQ feedback mechanism with the transcoding mechanism at the video transcoder. The scheme can be broadly divided into two parts: first, a content-based approach for determining the transcoding frame bit budget; second, adjusting the frame bit budget, or adaptively dropping frames, according to the effective channel bandwidth and the required end-to-end delay bounds. This scheme achieves the following appealing results. First, at the video server, only one high quality compressed copy of each video program is saved; when a video program needs to be transmitted to a client, transcoders can adaptively transcode it for different channel conditions. Second, the rate of the transcoded video is reduced during periods of poor channel conditions. Third, every frame is delivered to the decoder within the required end-to-end delay, so no large initial buffer is needed at the decoder side and rebuffering is avoided. Fourth, the quality of the video can be improved when the application relaxes the end-to-end delay requirement.

The rest of this paper is organized as follows. In Section 2, we introduce related work on three aspects: video transcoding, video rate control and bit allocation, and error control for wireless video communications. In Section 3, the studied system is introduced; we analyze the delay and buffer constraints that the transcoder buffer has to satisfy and, based on these constraints, derive the constraints on the transcoding ratio. In Section 4, we investigate the source characteristics of the pre-encoded video, including the frame types, source video rate and scene changes, and their effects on visual quality; based on this analysis, we propose the adaptive frame layer rate control and frame skipping algorithms. In Section 5, we briefly describe a wireless channel model and the method for estimating the effective channel bandwidth based on this channel model, which will be used as our simulation test bed. In Section 6, we propose a joint source-channel rate adaptation transcoding scheme for video transmission over the studied wireless video system; we assume that an a priori probabilistic model of the channel behavior is available and that a selective ARQ scheme is used for error control. Our simulation results and conclusions are presented in Section 7.

2. Related works

2.1. Video transcoding

Video transcoding deals with converting a previously compressed video signal into another one with a different format, such as a different bit rate, frame rate, frame size, or even compression standard. The concept of transcoding was first proposed by Sun et al. (1996) for compressed video bit rate scaling. Later on, transcoding was used for spatial resolution (video frame size) conversion (Shen et al., 1999; Shanableh and Ghanbari, 2000), temporal resolution (video frame rate) conversion (Hwang et al., 1998; Youn and Sun, 1999; Chen et al., 2002; Fung et al., 2001), bit rate adaptation (Assunção and Ghanbari, 1998; Assunção and Ghanbari, 2000; Lei and Georganas, 2002), multipoint video combining (Lin et al., 2000a,b; Lin, 2000), and error resilience (Dogan et al., 2001a,b; Reyes et al., 2000), etc.

Among all transcoding related research topics and applications, bit rate adaptation transcoding has been the most popular one. The idea of compressed video bit rate adaptation originated from applications that transmit precoded video streams over heterogeneous networks. In this case, the diversity of channel capacities in different transmission media often gives rise to problems. More specifically, when connecting two transmission media, the channel capacity of the outgoing channel may be less than that of the incoming channel, or the capacity of the outgoing channel may change over time. On the other hand, when pre-encoded video needs to be distributed to users with different connections, the target transmission channel conditions are generally unknown when the video is originally encoded. In this case, transcoders can be used to dynamically convert the bit rate of the compressed video for the target channel.

There are three major transcoder architectures that have been proposed in the literature. The most straightforward method connects a standard decoder and a standard encoder together. This closed-loop transcoder is called the Cascaded Pixel Domain Transcoder (CPDT) (Keesman et al., 1996). At the other extreme, the input video bitstream is first partially decoded to the DCT coefficient level. Then, the bit rate can easily be scaled down by cutting higher frequency coefficients or by requantizing all coefficients with a larger quantization step size (Sun et al., 1996). This kind of transcoder is also referred to as the Open-Loop Transcoder (OLT). In between the above two extremes is a third method, which also uses requantization of the DCT coefficients, but the requantization error is stored in a buffer and fed back to the requantizer to correct the requantization error introduced in previous frames (Zhu et al., 1999). This kind of transcoder simplifies the architecture of the first category by reusing motion vectors and by merging the two motion compensation loops of the CPDT into one (Assunção and Ghanbari, 1996). If motion compensation is carried out in the DCT domain (Chang and Messerschmitt, 1993), the simplified transcoding can be performed entirely in the DCT domain, which results in a DCT-domain transcoder (DDT) with much reduced complexity. These transcoding architectures operate in different domains (pixel domain and DCT domain) and have different complexity and effects on the final visual quality. The choice of transcoding architecture should be determined by the application requirements.

2.2. Video rate control and bit allocation

Besides downscaling the bit rate, a video transcoder must be able to accurately and dynamically control the output bit rate according to the channel bandwidth. This is the subject of video rate control. Generic rate control belongs to the budget-constrained bit allocation problem and can be separated into the following two steps:

(1) Allocate target bits for each frame according to image complexities, buffer occupancy, or a given channel bit rate. This step is usually called Frame-Layer Rate Control.

(2) Derive the actual quantization parameter for each Macroblock (MB) in the picture, and make the number of produced bits meet the bit target. This step is usually called Macroblock-Layer Bit Allocation.

For Macroblock-Layer Bit Allocation, methods based on conventional Rate-Distortion theory (Hang and Chen, 1997; Wu and Gersho, 1991; Choi and Park, 1994; Ramchandran et al., 1994; Corbera and Lei, 1999) and on empirical Rate-Quantization models (Ding and Liu, 1996; Tao et al., 2000; Chiang and Zhang, 1997; He et al., 2001) have been proposed in the literature. As far as Frame-Layer Rate Control is concerned, the bit budget for every frame has to be determined considering the channel bit rate and the buffer occupancy. Since all video coding standards assume that the compressed video will be transmitted over a constant bit rate (CBR) channel, the frame-layer rate control schemes of these standards are based on this assumption. However, this is not the case in reality. For example, the Internet currently cannot provide a guaranteed constant bit rate channel for a specific application; when the network is congested, most video packets will be dropped, which results in unacceptable video quality if no other mechanisms exist to protect the video stream. Another case is transmitting video over a wireless channel, which is characterized by a high bit error rate (BER) and a variable effective channel bit rate. Although Shannon's separation theorem states that source coding (compression) and channel coding (error protection) can be performed separately and sequentially, many research results have shown that source coding and channel coding have to be combined in practical video communication systems.

When pre-encoded video needs to be distributed over heterogeneous networks, since the target transmission channel conditions are generally unknown when the video is originally encoded, using transcoders gives us a second chance to dynamically adjust the video bit rate according to the channel conditions. Some research has been done on bit rate adaptation and bit rate downscaling using transcoding. In (Assunção and Ghanbari, 1998; Assunção and Ghanbari, 1997a,b), transcoding is regarded as a down-conversion process, where the bit rate of a compressed video bit stream is reduced according to a given constraint. Assunção et al. propose a transcoder that is optimal in a rate-distortion sense for transmitting video over the Internet. In their work, the problem of optimal transcoding is formulated in an operational rate-distortion context and solved by using a Lagrangian algorithm. New quantizer scales are selected, based on classic Rate-Distortion theory, for transcoding each MB or group of MBs such that the output rate does not exceed the given constraint while producing a minimum average distortion. In (Assunção and Ghanbari, 1997a,b), video transcoding is utilized as a mechanism capable of decoupling video encoders from network constraints and providing congestion control of pre-encoded video traffic over ATM networks for video distribution applications. This mechanism provides an effective method of shaping video traffic independently of the initial video encoder's constraints. By using transcoders, video traffic can be controlled at any point along the transmission path and thus the Quality of Service can be maintained without relying on on-line encoders. In (Dogan et al., 2001a,b), Dogan et al. address the problem of traffic planning for mobile video communications and propose a video transcoder bank to resolve congestion and/or bandwidth limitations. The proposed architecture presents a layered structure of multiple video rates as required by various networks. The paper also introduces an adaptive method for resolving congestion: the designed system monitors congestion with a feedback loop within the network and adaptively produces the necessary transmission rates while providing the best available service quality. In our previous work (Lei and Georganas, 2002), a scene-context-based frame layer rate control is proposed for determining the frame bit budget, and an algorithm based on a linear bit allocation model is used for macroblock layer bit allocation.

2.3. Error control for wireless video communication

Recently, the increasing demand for mobile communications has resulted in the extensive use of wireless communication technology. It is necessary to support multimedia services, including video and audio in addition to voice and data, over wireless links. However, wireless links are characterized by a high bit error rate, limited bandwidth and time-varying conditions. Unlike a wireline channel, where the signal strength is relatively constant and reception errors are mainly due to additive noise, errors in a wireless channel are predominantly due to the time-varying signal strength caused by multi-path propagation from local scatterers. Thus, errors in a wireless channel tend to be bursty, with the duration of the bursts being a function of the receiver velocity and the nature of the time-varying environment (Aramvith et al., 2001). Transmission of video over wireless networks is challenging because of the delay constraints involved and because of the negative impact of channel errors on the perceptual quality of video at the decoder. Uncorrected channel errors may result in significant quality degradation at the decoder. This is particularly evident in standard coders, such as those based on MPEG or H.263, where variable length coding is used or where compression involves a predictive coding scheme.

There have been many techniques proposed in the literature to combat the transmission error problem from different aspects. At the encoder side, source coding techniques, such as layered coding (Ghanbari, 1989), multiple description coding (Puri et al., 2001) and Error Resilient Entropy Coding (Cheng and Kingsbury, 1992; Redmill and Kingsbury, 1996), can be used to increase the robustness of the video stream against channel errors.

At the decoder side, error concealment techniques attempt to recover the lost information by estimation and interpolation, without relying on additional information from the encoder (Wang and Zhu, 1998). Besides source coding and error concealment techniques, channel coding techniques, such as FEC and ARQ, have been the classic techniques to combat transmission errors. FEC codes can be chosen to guarantee certain error rate requirements for the worst channel conditions. However, this causes unnecessary overhead and wastes bandwidth when the channel is in a good state (Liu and Zarki, 1998). Using ARQ error control over mobile radio channels has been shown to be more effective than FEC because retransmission is only required during periods of poor channel conditions (Khansari et al., 1996).

3. Delay and rate constraints for transcoding

In this work, we define a pre-encoded video streaming system in which a pre-encoded high quality video program is transcoded, transmitted, decoded and displayed in real time within some delay interval. A block diagram of the whole system is illustrated in Fig. 1. In this section, we will analyze the effect of the delay and buffer constraints on the transcoding ratio.

Fig. 1. Basic components of the defined video streaming system. Video source: pre-encoded video → video transcoder → transcoder buffer; wireless channel; video client: decoder buffer → video decoder → video output. The associated delay components are D_t, D_tb, D_tc, D_db and D_d.

3.1. Delay constraints for the defined system

In the defined system as illustrated in Fig. 1, we assume that a video is transcoded and transmitted at a fixed frame rate F. At the decoder side, the video is decoded and displayed at the same frame rate F. In this work, we use the total end-to-end delay as a performance metric. Obviously, a lower delay is preferable since it allows a reduced initial startup delay in one-way communication systems. In the studied system, the end-to-end delay each frame experiences (from the time it is transcoded to the time it is placed in the video display) consists of several delay components, as shown in Eq. (1):

$$D = \underbrace{(D_t + D_d)}_{\text{processing delay}} + \underbrace{D_{tc}}_{\text{transmission delay}} + \underbrace{(D_{tb} + D_{db})}_{\text{buffer delay}} \qquad (1)$$

where the subscripts t and d stand for the transcoder and decoder, respectively. The processing delay in the transcoder or the decoder depends on the available computing power, and the complexity of transcoding can be much lower than that of the full encoding process; therefore, we assume the processing delay components are constant and can be neglected. In this work, an ARQ protocol is assumed to be used for video transmission; therefore, the transmission delay, D_tc, includes the time for transmission and retransmission of the video packets and of the Acknowledgement or Negative Acknowledgement (ACK/NAK) messages. In this work, we are primarily concerned with the delays introduced by the transcoder buffer and the decoder buffer, because they can be much larger than the transmission delays, and the amount of buffering in the video system strongly affects the video quality and the end-to-end delay.

After the first frame is transcoded, it is sent into the transcoder buffer, which is empty at this time. In general, transcoder buffer underflow can occur if transmission starts at the same time as the transcoder puts the first bit into the buffer. On the other hand, at the decoder buffer, which is also empty for the first frame, underflow will occur if decoding starts at the same time as the first bit of the first frame arrives. In practice, this is prevented by starting the transmission and the decoding after a certain initial delay. Therefore, the end-to-end delay for the first frame is given by the sum of the initial delays. To simplify the analysis, we define the system initial delay D, i.e., the first frame is transcoded at time t and is decoded and displayed at time t + D. Then, the maximum delay that any following video frame may experience has to remain constant at D. In real-time interactive applications, the end-to-end delay D must be kept below a certain limit, such as 100 ms in video conferencing applications. In precoded video streaming applications, the end-to-end delay can be much longer. However, a short initial delay is still a desired target. At the same time, a long initial delay is equivalent to a larger decoder buffer, which is what we want to avoid, because most handheld devices are memory limited. Under this constraint, the system will function normally as long as the decoder buffer does not overflow or underflow: preventing overflow keeps video data from being lost, and preventing underflow guarantees that the decoder has received the data of a video frame before it is scheduled to be displayed.

3.2. Buffer analysis of VBR transcoders

In this section, we will analyze the conditions that the transcoder buffer has to meet in order to prevent the decoder buffer from underflowing within the end-to-end delay constraints. In (Assunção and Ghanbari, 2000), Assunção et al. have analyzed the buffering implications of inserting a transcoder along a CBR transmission path. However, in our studied system, due to the retransmissions of the ARQ scheme, the effective channel bandwidth becomes variable. As illustrated in Fig. 2, in our selective repeat (SR) ARQ scheme, the reception of a packet is acknowledged by the receiver sending either an ACK or a NAK to the transmitter. Packets that have been sent are stored in the ARQ buffer until they are acknowledged with an ACK. Packets awaiting transmission are stored in the transcoder buffer, and the decoder buffer can be used to rearrange the received packets, which may arrive out of order due to retransmission.

Fig. 2. Buffering with a variable transcoding ratio over a VBR channel: the pre-encoded source at rate r(t) is scaled by β(t) to the transcoded rate r'(t), which fills the transcoder buffer B_t(t); packets leave at the channel rate R(t) through the ARQ buffer and ARQ control, with ACK/NAK channel feedback, and the decoder buffer B_d(t) is drained at r'(t − D).

As illustrated in Fig. 2, the precoded video is sent into the transcoder at the source coding rate r(t). Actually, because all the data of the pre-encoded video are available at the server side, the transcoder can work much faster than F frames per second. To simplify the analysis, we assume here that the transcoder transcodes only F frames per second. At the transcoder, the transcoding is modeled as a scaling function β(t) which, multiplied by r(t), produces the transcoded video at rate r'(t), i.e.,

$$r'(t) = \beta(t)\,r(t) \qquad (2)$$

The effect of multiplying r(t) by β(t) is equivalent to reducing the number of bits used in the video frame transcoded at time t. In Fig. 2, B_t(t) and B_d(t) are the buffer occupancies of the transcoder and the decoder at time t, respectively, and R(t) is the channel bandwidth at time t. We define B_i^t and B_i^d as the instantaneous occupancies of the transcoder and decoder buffers at the ith frame. First, we discretize the problem by defining E_i (i = 1, 2, ...) as the number of bits generated by the transcoder in the ith frame interval [(i−1)T, iT), where T is the duration of one frame interval. Therefore,

$$E_i = \int_{(i-1)T}^{iT} r'(t)\,dt \qquad (3)$$

Similarly, let R_i be the number of bits that are transmitted during the ith frame interval:

$$R_i = \int_{(i-1)T}^{iT} R(t)\,dt \qquad (4)$$

Assuming the transcoder buffer is empty at time t = 0, the transcoder buffer occupancy after transcoding the ith frame is

$$B_i^t = \sum_{j=1}^{i} E_j - \sum_{j=1}^{i} R_j = B_{i-1}^t + E_i - R_i \qquad (5)$$

After the decoder begins to receive data, it waits D frame intervals before starting to decode and play. Then, the decoder buffer occupancy after the ith frame interval is

$$B_i^d = \begin{cases} \sum_{j=1}^{i} R_j, & i \le D \\[4pt] \sum_{j=1}^{i} R_j - \sum_{j=1}^{i-D} E_j = B_{i-1}^d + R_i - E_{i-D}, & i > D \end{cases} \qquad (6)$$

The transcoder can calculate the decoder buffer fullness if D is predetermined or sent explicitly as a decoder parameter. The system will function normally as long as the decoder buffer does not underflow within the end-to-end delay constraint. Therefore, B_i^d should be greater than zero at all times.
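For illustration only, the following minimal Python sketch (ours, not part of the original system; the frame sizes and channel rates are made-up numbers) evaluates the recursions of Eqs. (5) and (6) for a given sequence of transcoded frame sizes E_i and per-interval channel capacities R_i, and flags where either buffer recursion goes negative.

    def buffer_trace(E, R, D):
        """Simulate transcoder/decoder buffer occupancy per Eqs. (5)-(6).
        E[i-1]: bits produced by the transcoder in frame interval i
        R[i-1]: bits the channel can deliver in frame interval i
        D     : initial decoder delay, in frame intervals
        """
        Bt, Bd = 0, 0
        trace = []
        for i in range(1, len(E) + 1):
            Bt = Bt + E[i - 1] - R[i - 1]          # Eq. (5)
            if i <= D:
                Bd = Bd + R[i - 1]                  # decoder only fills during start-up
            else:
                Bd = Bd + R[i - 1] - E[i - 1 - D]   # Eq. (6): frame i-D is consumed
            trace.append((i, Bt, Bd, Bt < 0, Bd < 0))  # flags mark under-runs
        return trace

    # Example: constant-rate channel; two oversized frames eventually starve the
    # decoder buffer (Bd < 0), i.e. the delay bound D would be violated.
    E = [10000, 10000, 26000, 26000, 4000, 4000]
    R = [9000] * len(E)
    for i, Bt, Bd, t_under, d_under in buffer_trace(E, R, D=2):
        print(f"frame {i}: Bt={Bt:7d}  Bd={Bd:7d}  "
              f"{'transcoder underflow ' if t_under else ''}"
              f"{'decoder underflow' if d_under else ''}")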

3.3. Transcoding ratio constraints

We now combine the equations of Section 3.2 to obtain the conditions necessary to prevent transcoder and decoder buffer underflow. To prevent transcoder buffer underflow, from Eq. (5) we need

$$B_i^t = B_{i-1}^t + E_i - R_i > 0 \;\Rightarrow\; E_i > R_i - B_{i-1}^t \qquad (7)$$

which is a constraint on the number of bits of every transcoded frame.

In order to prevent decoder buffer underflow, from Eq. (6) we need

$$B_i^d = B_{i-1}^d + R_i - E_{i-D} \ge 0 \;\Rightarrow\; E_{i-D} \le B_{i-1}^d + R_i, \; i > D \;\Rightarrow\; E_i \le B_{i+D-1}^d + R_{i+D}, \; i \ge 1$$
$$\Rightarrow\; E_i \le \sum_{j=1}^{i+D-1} R_j - \sum_{j=1}^{i-1} E_j + R_{i+D} = \sum_{j=1}^{i+D} R_j - \sum_{j=1}^{i-1} E_j = \sum_{j=1}^{i-1} R_j + \sum_{j=i}^{i+D} R_j - \sum_{j=1}^{i-1} E_j \qquad (8)$$

As we can see from Eq. (8), $\sum_{j=1}^{i-1} R_j$ is the total number of bits that have been transmitted in the past (i−1) frame intervals, and $\sum_{j=1}^{i-1} E_j$ is the total number of bits of the past (i−1) transcoded frames. Both terms are known by the transcoder. However, $\sum_{j=i}^{i+D} R_j$ is the total number of bits that will be transmitted in the future (D+1) frame intervals, which is not available at the transcoder if the channel bandwidth is variable.

Therefore, the condition that the number of bits of the ith transcoded frame has to satisfy is

$$\max(0, R_i - B_{i-1}^t) < E_i < \sum_{j=1}^{i-1} R_j + \sum_{j=i}^{i+D} R_j - \sum_{j=1}^{i-1} E_j \qquad (9)$$

Under this condition, there will be at most D frames stored in the whole system, because the maximum end-to-end delay is D frame intervals. Then, we have

$$B_i^t + B_i^d \le \sum_{j=i-D+1}^{i} E_j, \quad \forall i > D \qquad (10)$$

In a VBR channel operating at a channel rate R(t) that is not deterministically known, the key difficulty is that the transcoder has to estimate the effective channel bandwidth in order to decide the scaling factor for every frame. In Section 5, we will introduce how to make use of a probabilistic model of the channel and an observation of the current channel state to estimate the effective channel bandwidth.
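As a small illustration of Eq. (9), the sketch below (our own, with hypothetical argument names and arbitrary example numbers) computes the admissible range for E_i from the past transmitted bits, the past transcoded frame sizes, the current transcoder buffer occupancy and an estimate of the channel capacity over the next D+1 frame intervals.

    def feasible_bits(R_past, E_past, R_future, Bt_prev):
        """Return the (lower, upper) bounds on E_i from Eq. (9).
        R_past  : R_1 .. R_{i-1}, bits transmitted in past intervals
        E_past  : E_1 .. E_{i-1}, bits of past transcoded frames
        R_future: estimated R_i .. R_{i+D} (the next D+1 intervals)
        Bt_prev : transcoder buffer occupancy B^t_{i-1}
        """
        R_i = R_future[0]
        lower = max(0, R_i - Bt_prev)                       # transcoder must not underflow
        upper = sum(R_past) + sum(R_future) - sum(E_past)   # decoder must not underflow
        return lower, upper

    # Example with an assumed history and a flat channel estimate of 9000 bits/interval.
    lo, hi = feasible_bits(R_past=[9000] * 5, E_past=[8800] * 5,
                           R_future=[9000] * 4, Bt_prev=1000)
    print(f"E_i must satisfy {lo} < E_i < {hi}")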

4. Adaptive transcoding based on video content

In this section, we will introduce the frame layer rate control based on video content. Here, video content refers to the inherent visual features present in the video. In this work, we consider the frame types, the video source coding rate and scene changes.

4.1. Frame types and source video traffic

Fig. 3. Source coding rate of a typical H.263 coded video (a scene change is marked on the bit-rate trace).

Current video coding standards, such as MPEG-x and H.26x, use motion compensation to reduce the temporal redundancy between successive frames.

Usually, in MPEG-x, one group of pictures (GOP) contains one INTRA (I) frame and several INTER (P or B) frames in a certain pattern. An I frame has no motion compensation performed on it, but it is very important because the motion compensation of the following frames depends on the quality of the I frame. P frames use the previous I or P frame as a reference for motion compensation and are also used as reference frames for other INTER frames. B frames use both the previous and the successive I or P frames as references for motion compensation, but B frames are not used as reference frames for other INTER frames. Because of this hierarchy in coding, the visual quality of the different frame types affects the quality of the whole video stream differently, with the priority being I > P > B. Therefore, an effective rate control scheme should treat the different types of frames unequally, in order to improve the quality of the whole video.

For wireless applications, the H.263 standard is more popular due to its focus on low bit rates. Different from MPEG-1 and MPEG-2, in which I frames are used periodically mainly for indexing, in the H.263 standard I frames are seldom used, mainly to refresh the visual quality. When a fixed quantization parameter is used for coding all frames in a video sequence, the objective visual quality (PSNR) tends to be nearly constant, while the generated bit rate fluctuates with the scene content and the different frame types. Thus, if the pre-encoded bitstream generated by the encoder is kept very close to a constant rate, usually for easy transmission over a CBR channel, there is a penalty in terms of quality. In one of our experiments, we encoded a video sequence with unified quantization parameters for all I, P and B frames. Only the first frame is coded as an I frame and all other frames are coded as B and P frames in a certain pattern (IBBP). Fig. 3 shows the bit rate of this stream. As we can see, I frames usually have the largest number of bits, and B and P frames have much fewer bits than I frames. We notice that when a scene change happens, the bit rate of the first INTER frame after the scene change (we call it the anchor frame) is very close to that of an I frame. This is because when a scene change happens, most MBs are encoded as INTRA blocks. We also found that the average number of bits for a B frame is usually half that of a P frame within a scene segment.

The source bit rate can be recorded when the original video is encoded and saved with the encoded video as side information. Some research (Corbera and Lei, 1999) has shown that, for the same type of frame encoded with a unified quantization parameter, the generated frame size indicates the scene variations and motion activities. Therefore, in the proposed scheme, we directly use the source bit rate as a parameter to measure the video content and to calculate the transcoding bit budget for every frame.

As pointed out by Assunção and Ghanbari (2000), in the transcoding of precoded video, the bit target for every frame should not be calculated simply as the product of the number of bits of the incoming frame and the ratio between the output and input bit rates of the transcoder. In one of our experiments, we encoded the same video sequence with two different quantization parameters; the bit rates of the generated bit streams are 263 and 667 kbps. The obtained bits/frame ratios are illustrated in Fig. 4. The overall scaling factor a between the two video streams is about 0.4; however, as we can see from Fig. 4, the numbers of bits $N^i_1$ and $N^i_2$ used to encode the corresponding picture of each video stream are not related through the same scaling factor a, i.e., $N^i_2 \neq a N^i_1$. If the objective of transcoding is to keep the picture quality nearly constant at the reduced rate, then the scaling factor of such a transcoder should follow the curve shown in Fig. 4.

Fig. 4. Relation between the bits/frame of two bit streams encoded at two different rates.
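The point that a single scaling factor cannot map one encoding onto the other can be checked directly from the two recorded bits-per-frame traces. The short sketch below is our own illustration, with made-up frame sizes standing in for the two streams; it computes the per-frame ratio and its deviation from the average ratio a.

    # Bits per frame of the same sequence encoded at a high and a low rate
    # (hypothetical numbers standing in for the 667 kbps and 263 kbps streams).
    bits_high = [41000, 9000, 4500, 4800, 9500, 30000, 8800, 4600]
    bits_low  = [19000, 3200, 1900, 2100, 3400, 14500, 3100, 2000]

    a = sum(bits_low) / sum(bits_high)            # overall scaling factor
    ratios = [lo / hi for lo, hi in zip(bits_low, bits_high)]

    print(f"average scaling factor a = {a:.2f}")
    for i, r in enumerate(ratios):
        print(f"frame {i}: per-frame ratio = {r:.2f}  (deviation {r - a:+.2f})")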

4.2. Scene change detection

A scene change represents any distinctive difference between two adjacent video frames. It includes rapid motion of moving objects as well as changes to different visual content. In most existing rate control schemes, the information obtained from previously coded pictures is utilized for estimating the target bits for the current picture. However, if a scene change occurs, the information obtained from previously coded pictures is no longer useful and can even cause visual quality degradation in the pictures following the scene change. Therefore, when a scene change happens, the first frame after the scene change should be transcoded at high quality to prevent quality degradation afterwards. Prior to that, scene changes must be detected. As mentioned in the previous section, when a scene change happens, most MBs in the anchor frame are encoded as INTRA blocks. Therefore, in this work, we simply use the percentage of INTRA-mode MBs in a frame to detect a scene change, as follows:

$$\mathrm{SCD} = \frac{\text{Number of INTRA coded macroblocks}}{\text{Number of macroblocks in a frame}} \times 100\%$$

If SCD ≥ 40%, we decide that a scene change has happened. Like the source coding rate, SCD can also be calculated and recorded when the video is pre-encoded. Fig. 5 shows the variations of SCD for the sequence shown in Fig. 3. As we can see, using the percentage of INTRA-mode MBs in a frame is accurate enough to detect scene changes. In the proposed scheme, if a scene change is detected, we transcode the anchor frame as an I frame. Therefore, the quality of the frames after the scene change is not affected.

Fig. 5. SCD of a typical H.263 coded video sequence (percentage of INTRA MBs versus frame number).
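A minimal sketch of the scene change test, assuming the per-macroblock coding modes of the pre-encoded frame are available as a list (the function name and mode labels are ours; the 40% threshold is the one used above):

    SCD_THRESHOLD = 40.0  # percent, as used in this work

    def scene_change_detected(mb_modes):
        """mb_modes: per-macroblock modes for one frame, e.g. 'INTRA' or 'INTER'."""
        if not mb_modes:
            return False
        intra = sum(1 for m in mb_modes if m == "INTRA")
        scd = 100.0 * intra / len(mb_modes)          # SCD as defined above
        return scd >= SCD_THRESHOLD

    # QCIF frame: 99 macroblocks; 55 of them INTRA coded -> SCD = 55.6% -> scene change.
    modes = ["INTRA"] * 55 + ["INTER"] * 44
    print(scene_change_detected(modes))   # True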

4.3. Adaptive frame layer rate control

Different from real-time interactive video applications, such as video conferencing, in which the end-to-end delay of each frame must be kept within a certain limit, the end-to-end delay requirement in streaming and playing precoded video is relaxed. In this type of application, an initial startup delay is allowed and a buffer at the decoder smooths the delay jitter. Therefore, as long as the decoder buffer does not underflow, the end-to-end delay can differ from frame to frame.

The current frame layer rate control schemes in the H.263 standard, such as TMN-8, are designed for transmitting real-time video with low delay over a constant bit rate channel. The bit budget for every frame is determined only by the buffer occupancy and the channel bandwidth. Therefore, a nearly constant bit budget is allocated to each frame and each frame experiences a similar end-to-end delay. Different from the frame layer rate control scheme in TMN-8, the proposed scheme determines the transcoded frame bit budget considering the scene changes, frame types and source coding rate of the pre-encoded video. In the design of the bit rate control algorithm, since an exact curve such as the one in Fig. 4 is not available to the transcoder, the following rules are adopted to determine the transcoded frame bit budget.

(1) If the incoming frame is an INTRA coded frame, it will be transcoded as an INTRA frame with a unified quantization parameter used for every MB.

(2) P and B frames are treated differently by the transcoder. The bit budget for a B frame will be, on average, half of that for a P frame.

(3) For INTER frames of the same type, the frame bit budget is determined considering the effective channel bandwidth, the end-to-end delay and the original source traffic.

(4) If a scene change is detected, the anchor frame will be transcoded as an INTRA frame with a unified quantization parameter for every MB.

At the frame layer, we first determine the bit budget for a GOP, which has a fixed number of frames. Since INTRA frames are seldom used in H.263, we assume here that the GOP only includes INTER frames (P and B) in a certain pattern (such as PBPBPB... or PBBPBB...). The bit budget for encoding a GOP is defined as

$$B_{\mathrm{GOP}} = (N + aD)\,\frac{R}{F} + \Delta, \qquad a = \begin{cases} 0.7, & \text{first GOP} \\ 0, & \text{otherwise} \end{cases} \qquad (11)$$

where N is the number of frames in a GOP, R is the effective channel bandwidth, F is the frame rate, D is the required end-to-end delay measured in frame intervals, and Δ is the bit budget left unused by the previous GOP. In order to take full advantage of the initial delay D, we introduce the parameter a, which is empirically set to 0.7 for the first GOP. As we can see in Eq. (11), for the first GOP, the N frames have a larger bit budget than the channel can actually transmit in the first N frame intervals. Then, when the end-to-end delay is relaxed by the application, the visual quality is improved. For the following GOPs, Eq. (11) assumes that the N frames in a GOP will be transmitted within N frame intervals. The calculation of the effective channel bandwidth, R, will be introduced in the following section. Inside a GOP, the frame bit budget is determined according to the frame types and the original source bit rate, following the above rules. Based on these rules and the pattern of P and B frames within a GOP, we can derive the bit budgets for all P frames and all B frames in a GOP, respectively, as follows:

$$B_{\mathrm{GOP}\text{-}B} = \frac{N_B}{2N - N_B}\,B_{\mathrm{GOP}}, \qquad B_{\mathrm{GOP}\text{-}P} = \frac{2N_P}{N_P + N}\,B_{\mathrm{GOP}} \qquad (12)$$

where N_P and N_B are the numbers of P and B frames in a GOP. For frames of the same type, bits are allocated to every frame in proportion to the original source bit rate. Then we have

$$B_P(i) = \frac{\bar{B}_P(i)}{\sum_{j=0}^{N_P-1} \bar{B}_P(j)}\,B_{\mathrm{GOP}\text{-}P}, \qquad B_B(i) = \frac{\bar{B}_B(i)}{\sum_{j=0}^{N_B-1} \bar{B}_B(j)}\,B_{\mathrm{GOP}\text{-}B} \qquad (13)$$

where B_P(i) and B_B(i) are the bit budgets for the ith P or B frame, and B̄_P(i) and B̄_B(i) are the numbers of bits of the ith P and B frames in the precoded video, respectively. When the frame bit budget is determined, the macroblock layer bit allocation algorithm is responsible for calculating the quantization parameter for every MB and making the generated bits as close as possible to the bit budget. In Section 2, we provided some references for macroblock layer bit allocation. However, we believe that the quality variation produced by varying quantization parameters is far smaller than that produced by varying the frame bit budget. Therefore, in this work, we directly use the macroblock layer bit allocation algorithm of TMN-8. Details about the macroblock layer bit allocation in TMN-8 can be found in (Corbera and Lei, 1999).
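The budget computation of Eqs. (11)-(13) can be written down compactly. The sketch below is our own rendering under the stated assumptions (a GOP of only P and B frames, source frame sizes recorded as side information); the function names and example values are hypothetical.

    def gop_budget(N, R, F, D, carry_over=0.0, first_gop=False):
        """Eq. (11): bit budget for a GOP of N frames.
        R: estimated channel bandwidth (bits/s), F: frame rate, D: end-to-end
        delay in frame intervals, carry_over: unused budget from the previous GOP."""
        a = 0.7 if first_gop else 0.0
        return (N + a * D) * (R / F) + carry_over

    def frame_budgets(B_gop, src_P, src_B):
        """Eqs. (12)-(13): split B_gop between P and B frames, then distribute each
        share in proportion to the recorded source frame sizes src_P, src_B."""
        NP, NB = len(src_P), len(src_B)
        N = NP + NB
        B_gop_B = NB / (2 * N - NB) * B_gop            # Eq. (12)
        B_gop_P = 2 * NP / (NP + N) * B_gop
        BP = [s / sum(src_P) * B_gop_P for s in src_P] # Eq. (13)
        BB = [s / sum(src_B) * B_gop_B for s in src_B]
        return BP, BB

    # Example: 30-frame GOP (10 P, 20 B), 64 kbps estimate, 10 fps, D = 5 intervals.
    B = gop_budget(N=30, R=64000, F=10, D=5, first_gop=True)
    BP, BB = frame_budgets(B, src_P=[6000] * 10, src_B=[3000] * 20)
    print(round(B), round(BP[0]), round(BB[0]))   # each B budget is half a P budget

Note that the two shares of Eq. (12) sum to B_GOP, which is a quick consistency check of the reconstruction: with N = N_P + N_B, both denominators equal 2N_P + N_B.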

4.4. Adaptive frame skipping

As indicated in Eq. (10), at any given time the sum of the transcoder buffer and decoder buffer occupancies must be smaller than the number of bits of the D frames that have been transcoded but not yet decoded. Otherwise, some frames will experience an end-to-end delay longer than D. Eq. (10) can be violated for the following reasons. First, due to the control error of the macroblock layer bit allocation algorithm, the generated frame size may not match the bit budget exactly. Second, due to a frame type change, such as transcoding a P frame to an I frame, the generated frame size may be much greater than the frame bit budget. Third, because the algorithm has to estimate the effective channel bandwidth of the future D frame intervals, the accuracy of the estimation also affects the result of Eq. (10). In order to guarantee that the end-to-end delay of every frame is less than D, in the proposed algorithm we evaluate Eq. (10) after transcoding every frame. If more than D frames are stored in the transcoder and decoder buffers, the next frame may be dropped according to its type.

For example, if the original video was encoded in the IPPP pattern and the frame that needs to be skipped is an I frame, then this I frame will be skipped and the next unskipped frame (P) will be transcoded to an I frame with a unified quantization parameter, as in Fig. 6(a). If the frame that needs to be skipped is a P frame, then this P frame will be skipped and the next unskipped frame will be transcoded to a P frame, as in Fig. 6(b). If the original video was encoded in the IBBP pattern, we have to consider the frame reordering that happens at the decoder. As illustrated in Fig. 6(c), suppose the I frame will be displayed at time t; then the following two frames that need to be displayed are B frames, at times t+1 and t+2. However, when the original video is encoded, the P frame, which is scheduled to be displayed at time t+3, is encoded and stored prior to the two B frames. When the decoder decodes this video stream, it will first decode the P frame, and only then can the two B frames be decoded and displayed. Therefore, in our transcoder, if the frame that needs to be skipped is an I frame, we skip this frame and transcode the following P frame as an I frame; at the same time, the following two B frames are also skipped, as in Fig. 6(c). If the frame that needs to be skipped is a P frame, then instead of skipping this P frame, we keep it but skip the following two B frames, as in Fig. 6(d), because, as mentioned before, the number of bits of a P frame is usually twice that of a B frame. If the frame that needs to be skipped is a B frame, we simply skip this frame, as in Fig. 6(e). When the frame scheduled to be skipped happens to be an anchor frame, based on the same idea, the rules in Table 1 are applied for frame skipping.

Fig. 6. Adaptive frame skipping: cases (a)-(e) for the IPPP and IBBP patterns.

Table 1
Adaptive frame skipping decision (a)

Frame type | Scheduled to skip? | SCD ≥ 40%? | Action
I    | Yes | Yes | Keep the I frame, skip the next P, B, B frames
I    | Yes | No  | N/A (b)
I    | No  | Yes | Keep the I frame
I    | No  | No  | N/A (b)
P    | Yes | Yes | Transcode the P frame as an I frame, skip the next B, B frames
P    | Yes | No  | Keep the P frame, skip the next B, B frames
P    | No  | Yes | Transcode the P frame as an I frame
P    | No  | No  | Keep the P frame
B(1) | Yes | Yes | Skip the two B frames, transcode the next P frame as an I frame
B(1) | Yes | No  | Skip the B frames
B(1) | No  | Yes | Skip the two B frames, transcode the next P frame as an I frame
B(1) | No  | No  | Keep the B frame
B(2) | Yes | Yes | Skip the B frame, transcode the next P frame as an I frame
B(2) | Yes | No  | Skip the B frame
B(2) | No  | Yes | Skip the B frame, transcode the next P frame as an I frame
B(2) | No  | No  | Keep the B frame

(a) Assume the frame pattern is IPBBPBB...; B(1) and B(2) stand for the first and the second B frame in a PBBP pattern, respectively.
(b) When the frame type is I, the SCD will definitely be greater than 40%.
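To make the policy of Table 1 explicit, here is a small decision function (our own sketch; it only encodes the table for the IPBB-type pattern and returns a textual action, leaving the actual transcoding and skipping to the caller).

    def skip_action(frame_type, scheduled_to_skip, scene_change):
        """Frame-skipping decision following Table 1 (IPBBPBB... pattern).
        frame_type: 'I', 'P', 'B1' (first B of a group) or 'B2' (second B)."""
        if frame_type == "I":
            # For an I frame the SCD is by definition above the threshold.
            return "keep I, skip next P+B+B" if scheduled_to_skip else "keep I"
        if frame_type == "P":
            keep = "transcode P as I" if scene_change else "keep P"
            return keep + (", skip next B+B" if scheduled_to_skip else "")
        # B frames
        if scene_change:
            skipped = "both B frames" if frame_type == "B1" else "this B frame"
            return f"skip {skipped}, transcode next P as I"
        if scheduled_to_skip:
            return "skip the B frame(s)"
        return "keep B"

    # Example: a P frame must be dropped right after a detected scene change.
    print(skip_action("P", scheduled_to_skip=True, scene_change=True))
    # -> "transcode P as I, skip next B+B"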

5. Probabilistic modeling of channel behavior

As introduced before, the estimation of the effective channel bandwidth is very important for the frame layer rate control. In this section, we introduce a wireless ARQ channel model, which has been extensively used for simulating burst-error wireless channels in the literature (e.g., Aramvith et al., 2001; Hsu et al., 1999), together with the related method for estimating the effective channel bandwidth.

5.1. Wireless channel model

From the video transmission point of view, when ARQ is used, the channel becomes a variable bit-rate channel whose throughput depends on the channel conditions. When the channel becomes poor, the retransmissions use up bandwidth and thus reduce the effective channel rate (the effective channel rate is defined as the rate of the information that is correctly transmitted). For the purpose of simulation, a Markov chain was used to model bursty errors during transmission, based on collected network traffic traces.

Previous studies show that a first-order Markov chain, such as a two-state Markov model or a finite-state model, provides a good approximation in modeling the error process at the packet level in wireless channels. Using these models, one can dynamically estimate the situation of the specified channel and use this knowledge to help design corresponding algorithms. At the same time, one can generate artificial network traces for the network under study and use the traces to simulate, and thus better understand, the performance of existing and new protocols and applications.

In (Aramvith et al., 2001), a two-state Markov model was used to estimate the channel states. In this work, since we are only interested in the change of the effective channel rate due to the retransmissions of ARQ, to simplify the analysis we use a two-state Markov model, similar to that of (Aramvith et al., 2001), to emulate the packet error process. We do not consider the effect of other factors, such as the reliability of the feedback channel, the round trip delay of ACK/NAK feedbacks, the number of retransmissions, etc.

In the channel model under study, the channel switches between a "good state" and a "bad state", S0 and S1, as illustrated in Fig. 7.

Fig. 7. Two-state Markov channel model: good state S0 (no error) and bad state S1 (error occurs), with transition probabilities P00, P01, P10 and P11.

A packet is transmitted correctly when the channel is in state S0, and errors occur when the channel is in state S1. The Pij, for i, j ∈ {0, 1}, are the transition probabilities. The packet-error statistics vary according to the values of the transition probabilities, which can be calculated from the collected channel statistics (such as the average packet-error-burst length and the packet error rate) generated by the wireless channel simulator. The channel state-transition probability matrix for this channel model can be set up as

$$P = \begin{pmatrix} P_{00} & P_{01} \\ P_{10} & P_{11} \end{pmatrix} = \begin{pmatrix} 1 - P_{01} & P_{01} \\ P_{10} & 1 - P_{10} \end{pmatrix} \qquad (14)$$

The transition probabilities P01 and P10 can be derived using the assumptions of Gilbert's Markov model (Gilbert, 1960). The run length of error bursts has a geometric distribution with mean 1/P10, i.e.,

$$P_{10} = \frac{1}{\text{Mean\_Burst\_Length}}$$

The mean burst length statistics can be obtained from the packet error pattern generated by the wireless channel simulator. The average packet error rate is given by

$$\mathrm{PER} = \frac{P_{01}}{P_{01} + P_{10}}$$

Therefore, P01 can be derived from the two parameters P10 and the packet error rate (PER) as

$$P_{01} = \frac{P_{10} \cdot \mathrm{PER}}{1 - \mathrm{PER}}$$

In this work, we use different packet error rates to generate several transition matrices, as in Table 2. We do not consider channel coding overhead, and define the packet payload size as 40 bits and the mean burst length as 18 packets. In our experiments, we will use these transition matrices to estimate the effective channel rate as described in the following section.

Table 2
Summary of the transition matrices used in our experiments (Mean_Burst_Length = 18 packets, packet size = 40 bits)

Average packet error rate (%) | P00    | P01    | P10    | P11
5                             | 0.9971 | 0.0029 | 0.0556 | 0.9444
10                            | 0.9938 | 0.0062 | 0.0556 | 0.9444
15                            | 0.9902 | 0.0098 | 0.0556 | 0.9444
20                            | 0.9861 | 0.0139 | 0.0556 | 0.9444
25                            | 0.9815 | 0.0185 | 0.0556 | 0.9444
30                            | 0.9762 | 0.0238 | 0.0556 | 0.9444
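The matrices of Table 2 follow directly from the two formulas above. The sketch below (ours; function names are hypothetical) builds P from a target packet error rate and mean burst length, and can also generate an artificial packet error trace for simulation.

    import random

    def gilbert_matrix(per, mean_burst_len):
        """Two-state Gilbert model: return ((P00, P01), (P10, P11)) from the
        average packet error rate 'per' (0..1) and the mean error-burst length."""
        p10 = 1.0 / mean_burst_len
        p01 = p10 * per / (1.0 - per)
        return ((1.0 - p01, p01), (p10, 1.0 - p10))

    def error_trace(P, n_packets, state=0, seed=0):
        """Generate a 0/1 error pattern (1 = packet lost) by walking the chain."""
        rng = random.Random(seed)
        trace = []
        for _ in range(n_packets):
            trace.append(state)                   # state 1 = bad state = packet error
            state = 1 - state if rng.random() < P[state][1 - state] else state
        return trace

    P = gilbert_matrix(per=0.10, mean_burst_len=18)   # matches the 10% row of Table 2
    print([[round(x, 4) for x in row] for row in P])  # [[0.9938, 0.0062], [0.0556, 0.9444]]
    print(sum(error_trace(P, 10000)) / 10000)         # empirical PER, roughly 0.10 for a long trace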

5.2. Effective channel throughput of ARQ protocol

Based on the channel feedback information from the

client and the channel model introduced in Section 5.1,

the future effective channel rate can be estimated. From

the transitional probability matrices and a given initial

state, the expected future channel throughput, i.e., the

average of the probability of the correct transmission inthe next i packets, can be calculated as in (Aramvith

et al., 2001). We use this information to adjust the target

number of bits in the video rate-control algorithm. In

the following discussion, all the time periods mentioned

are normalized with the time to transmit a packet. If we

do not consider the round-trip delay of the ACK/NAK

messages, when a packet is transmitted at time t, we canimmediately get the feedback message, ACK or NAK.The channel state at time t, SðtÞ, is known. Based on the

transition probability matrix $P$, we define the state-probability vector at time $k$,

$$p(k \mid S(t) = S_n) = [\,p_0(k \mid S(t) = S_n),\; p_1(k \mid S(t) = S_n)\,], \quad n \in \{0, 1\} \qquad (15)$$


as a row vector formed by the two state probabilities, i.e., the probabilities for the channel to be in state $S_0$ and $S_1$ at time $k$, respectively, given that it was observed to be in state $S_n$ at time $t$. Note that $t$ and $k$ are both discrete values. The initial state probability $p(t \mid S(t) = S_n)$ at time $t$ is set as

$$p_i(t \mid S(t) = S_j) = \begin{cases} 1, & \text{when } i = j \\ 0, & \text{otherwise} \end{cases} \qquad i, j \in \{0, 1\} \qquad (16)$$

The state probabilities at time $k$ can be derived from the state probabilities at time $k - 1$:

$$p(k \mid S(t) = S_n) = p(k - 1 \mid S(t) = S_n) \cdot P \qquad (17)$$

By recursively using Eq. (17), the channel state probabilities at time $k$, where $k > t$, can be calculated from the initial state probability and the transition probability matrix:

$$p(k \mid S(t) = S_n) = p(t \mid S(t) = S_n) \cdot P^{\,k - t} \qquad (18)$$

In this channel model, packets are transmitted correctly

($C$ bits are transmitted, where $C$ is the packet size) when the channel is in state $S_0$, while errors occur (0 bits are transmitted) when the channel is in state $S_1$. The expected channel rate $E[C(k) \mid S(t)]$, given the observation of the channel state $S(t)$, can be calculated as

$$E[C(k) \mid S(t)] = C \cdot p_0(k \mid S(t)) \qquad (19)$$
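As an illustration of Eqs. (16)-(19) (a simplified sketch, not the authors' implementation; the function name, the horizon parameter and the NumPy usage are assumptions), the code below propagates the state-probability vector from the observed state and sums the expected number of correctly received bits over the next `horizon` packet slots; dividing the result by the corresponding time span gives an estimate of the effective channel bandwidth.

```python
import numpy as np

def expected_channel_bits(P, observed_state, horizon, packet_size):
    """Expected number of correctly received bits over the next `horizon` slots.

    P              -- 2x2 transition matrix from Eq. (14)
    observed_state -- 0 (good) or 1 (bad), the state S(t) observed at time t
    horizon        -- number of future packet slots to look ahead
    packet_size    -- C, bits carried by one correctly received packet
    """
    p = np.zeros(2)
    p[observed_state] = 1.0                   # Eq. (16): initial state probability
    expected_bits = 0.0
    for _ in range(horizon):
        p = p @ P                             # Eq. (17): one-step propagation
        expected_bits += packet_size * p[0]   # Eq. (19): C * p0(k | S(t))
    return expected_bits

# Example: expected bits over the next 100 packet slots, 40-bit payloads,
# starting from an observed good state.
P = np.array([[0.9971, 0.0029],
              [0.0556, 0.9444]])
print(expected_channel_bits(P, observed_state=0, horizon=100, packet_size=40))
```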

6. Joint source-channel transcoding for rate adaptation

Combining the ideas in the previous sections, we

propose the joint source-channel rate adaptation algo-

rithm. In this section, we summarize the whole process

of the algorithm. The following notations are used:

$B_{GOP}$: target number of bits assigned to a GOP;
$F$: target frame rate in frames per second;
$N$: number of frames in a GOP;
$N_P$: number of uncoded P frames in a GOP;
$N_B$: number of uncoded B frames in a GOP;
$B_{GOP}$: target number of bits left for the uncoded frames in a GOP;
$B_{GOP\text{-}P}$: target number of bits left for the uncoded P frames in a GOP;
$B_{GOP\text{-}B}$: target number of bits left for the uncoded B frames in a GOP;
$B_P(i)$: bit target for the $i$-th P frame;
$B_B(i)$: bit target for the $i$-th B frame.

The detailed algorithm is as follows:

Step 1: Determine the current channel state. We calculate the average ratio of successfully transmitted bits to the total number of transmitted bits in the past L frame intervals. If the ratio is greater than a threshold H, we decide that the channel is in the good state. In our simulation, we set L to 10 and H to 0.9 empirically.
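A minimal sketch of this decision rule, assuming per-frame-interval counters of successfully transmitted and total transmitted bits are available (the names and data layout below are illustrative only):

```python
def channel_is_good(success_bits, total_bits, L=10, H=0.9):
    """Decide the channel state from the last L frame intervals (Step 1).

    success_bits, total_bits -- lists of per-frame-interval bit counts,
                                most recent interval last
    """
    recent_success = sum(success_bits[-L:])
    recent_total = sum(total_bits[-L:])
    if recent_total == 0:
        return True                # nothing transmitted yet; assume the good state
    return (recent_success / recent_total) >= H
```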

Step 2: Calculate the estimated channel bandwidth:

Depending on the current channel state, we can use the Markov model to find the probability of correct

transmission in the next D frame intervals. Then we can

calculate the estimated channel bandwidth, R, for the

next D frame intervals as in Eq. (19).

Step 3: Calculate the bit budget for a GOP. At the

beginning of every GOP, we first calculate the bit

budget for the GOP as in Eq. (11). In our experiments,

we set N to 30. At the beginning of a GOP, the bit budget left for the uncoded frames in the GOP is initialized to the GOP bit budget $B_{GOP}$.

Step 4: Calculate the bit budget for every frame

within a GOP according to frame types, scene context

and source coding rate. If the incoming frame will be

transcoded as an INTER frame, the frame bit target for

this frame will be calculated according to the source

coding rate. Since the original source rate and frame pattern are known when the original video is coded,

the frame bit target can be calculated by applying the

rules in Section 4.3. Similarly to Eqs. (12) and (13), we have

$$B_{GOP\text{-}P} = \frac{2 N_P \cdot B_{GOP}}{2 N_P + N_B}, \qquad B_{GOP\text{-}B} = \frac{N_B \cdot B_{GOP}}{2 N_P + N_B} \qquad (20)$$

$$B_P(i) = \frac{B_P(i)}{\sum_{j=i}^{N_P - 1} B_P(j)}\, B_{GOP\text{-}P}, \qquad B_B(i) = \frac{B_B(i)}{\sum_{j=i}^{N_B - 1} B_B(j)}\, B_{GOP\text{-}B} \qquad (21)$$
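The sketch below illustrates one possible reading of Eqs. (20) and (21) (not the authors' implementation): the remaining GOP budget is split 2:1 in favour of P frames, and the target of the next frame is weighted by its share of the source bit counts of the frames not yet transcoded, which is our interpretation of the right-hand side of Eq. (21).

```python
def frame_bit_target(b_gop, src_p_bits, src_b_bits, next_is_p):
    """Bit target for the next uncoded frame of a GOP, following Eqs. (20)-(21).

    b_gop      -- bits left for the uncoded frames of the GOP
    src_p_bits -- source bit counts of the remaining uncoded P frames (next first)
    src_b_bits -- source bit counts of the remaining uncoded B frames (next first)
    next_is_p  -- True if the incoming frame is a P frame, False for a B frame
    """
    n_p, n_b = len(src_p_bits), len(src_b_bits)
    # Eq. (20): split the remaining GOP budget, weighting P frames twice as much.
    b_gop_p = 2 * n_p * b_gop / (2 * n_p + n_b)
    b_gop_b = n_b * b_gop / (2 * n_p + n_b)
    # Eq. (21): weight the next frame by its share of the remaining source bits.
    if next_is_p:
        return src_p_bits[0] / sum(src_p_bits) * b_gop_p
    return src_b_bits[0] / sum(src_b_bits) * b_gop_b
```

For example, with two remaining P frames of 8000 and 6000 source bits, one B frame of 3000 bits, and 20,000 bits left in the GOP (hypothetical numbers), the next P frame would be assigned 16,000 x 8000/14,000, roughly 9143 bits.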

Step 5: Adjust the frame bit budget. By using Eq. (9),

the frame bit budget needs to be adjusted considering

the buffer and end-to-end delay constraints.

Step 6: Transcode the incoming frame. In this work,

the TMN-8 macroblock layer bit allocation algorithm

is adopted to calculate the quantization parameter of

every MB.

Step 7: Update. Transcoder and decoder buffer

occupancy is updated as in Eqs. (5) and (6). Other

parameters, such as $B_{GOP}$, $N_P$ and $N_B$, are also updated. Eq. (10) will be evaluated to see whether the total number of frames stored in the transcoder and decoder buffers is greater than the maximum end-to-end delay, D. If there are more than D frames stored in the system, the successive frame is scheduled to be skipped. The action will be decided according to Table 1. If there are more


frames that need to be transcoded, go to Step 1, other-

wise, stop.

7. Experimental results and conclusions

In this work, we implement the rate adaptation

transcoder and the proposed algorithm based on the

public domain software for H.263 (TMN8, 1997). In our

experiments, the test sequence of Fig. 3 is encoded using a uniform quantization parameter, QP = 5, for every frame at a frame rate of 10 fps. Only the first frame of the sequence is encoded as an I frame and the other frames are encoded as B or P frames. This sequence is created by cascading several test sequences

with different scene content and serves as a high quality

pre-encoded video. In order to simulate the wireless

channel with the ARQ protocol, we use a random

number generator to generate a random number r, which is uniformly distributed in [0, 1). Based on the Markov model and transition matrices introduced in Section 5.1, if the current state is $S_0$, the transition from state $S_0$ to $S_1$ occurs when r is less than $P_{01}$. If the current state is $S_1$, the transition from $S_1$ to $S_0$ occurs when r is less than $P_{10}$. In this way, we generate six packet-level channel error traces with a length of 50,000 packets each. Every trace represents a possible channel with a specific packet error rate, as sketched below.
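A minimal Python sketch of this trace-generation procedure (an illustration under the stated Markov model; the function name, seeding and return format are assumptions):

```python
import random

def generate_error_trace(P, length, start_state=0, seed=None):
    """Generate a packet-level error trace from the two-state Markov model.

    P           -- 2x2 transition matrix (state 0 = good, state 1 = bad)
    length      -- number of packets in the trace
    start_state -- initial channel state
    Returns a list of 0/1 flags, where 1 marks a packet hit by an error.
    """
    rng = random.Random(seed)
    state = start_state
    trace = []
    for _ in range(length):
        trace.append(1 if state == 1 else 0)
        r = rng.random()                  # uniform in [0, 1)
        if state == 0 and r < P[0][1]:    # good -> bad with probability P01
            state = 1
        elif state == 1 and r < P[1][0]:  # bad -> good with probability P10
            state = 0
    return trace

# Example: a 50,000-packet trace for one of the tested channels.
trace = generate_error_trace([[0.9971, 0.0029], [0.0556, 0.9444]], 50_000, seed=1)
```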

In the transcoder, the proposed joint

source-channel adaptive transcoding scheme is used to

transcode the original video according to the estimated channel conditions. We assume the channel bandwidth

for video payload is 64, 48 and 32 kbps. Based on the

channel feedback and the same channel model that is

used to generate the error trace, the transcoder will

estimate the effective channel bandwidth after every

frame interval. We simulate the channel coder, transmitter and decoder buffer. Whether a video packet is transmitted successfully to the decoder will be determined by the error trace. A simplified simulation scenario is shown in Fig. 8.

Fig. 8. System diagram of the proposed ARQ-based transcoding and streaming scheme. (Blocks shown: pre-encoded video, transcoder, channel coder (packetizer, FEC, interleaving), channel model driven by the generated packet error traces, channel decoder, decoder and video output, with ARQ feedback; annotations indicate the transcoded stream rate and the effective rate with retransmissions.)

In order to compare the video quality, we also encode

the source video sequence at the same rates of 64, 48 and

32 kbps by using the TMN8 rate control scheme. At the

frame layer, TMN8 allocates a nearly constant bit budget to every frame without considering the frame types and

scene context. A frame is skipped if the number of bits

accumulated in the buffer after encoding the previ-

ous frame is greater than a threshold. To get a fair

comparison, we change this threshold to allow longer

end-to-end delay, and then there will be no frame

skipping in the encoded video.

The visual quality is measured by calculating the peak signal-to-noise ratio (PSNR) of the transcoded video and

the encoded video. The PSNR comparison result for

different effective channel rates and end-to-end delays is

illustrated in Fig. 9. As shown in Fig. 9, by controlling

the number of bits assigned to each frame considering the frame types, we can obtain a higher PSNR than that obtained by encoding the original video at the same low channel bandwidth and maximum end-to-end delay. Especially when a scene change happens at the 100th frame, because the INTER frame at the scene change is transcoded to an I frame, the visual quality of the following frame is much higher than that of the encoded version.

We also calculate the time when every frame arrives

at the decoder buffer and is displayed. The comparison

result is illustrated in Fig. 10. On the right side, we also show the result for the encoded video transmitted over the same channel. As we can see, by rate

adaptation transcoding, the arrival time of every frame

is very close to the display time. The decoder will not

have to suspend decoding and displaying for a long

time. However, if the test sequence is encoded with a low bit rate (64, 48 or 32 kbps) as the target bit rate, since the effective channel rate changes over time, after several frames the frame arrival time will be later than its display



Fig. 9. Video quality comparison of transcoded video and encoded video at different channel bandwidth and end-to-end delay. (PSNR (dB) versus frame number; panels: R = 64 kbps with D = 10 frame intervals, R = 48 kbps with D = 20 frame intervals, and R = 32 kbps with D = 30 frame intervals; each panel compares the transcoded and encoded versions.)


time. Therefore, the decoder will have to wait for the

next frame to arrive. Fig. 11 shows the relation among the average PSNR

of the whole sequence, the end-to-end delay and the

target channel rate. As we can see, for the same target

channel rate, when the end-to-end delay is relaxed, the average PSNR increases. Therefore, the proposed

transcoding scheme will be able to take full advantage of


Fig. 10. Comparison of frame arrival time and scheduled display time. (Time in frame intervals versus frame number; panels: transcoded and encoded versions at R = 64, 48 and 32 kbps, each with D = 10 frame intervals; each panel plots the arrival time and the scheduled display time.)


the end-to-end delay. However, if we just encode the test

sequence at the target channel rate, the video quality will

be fixed after encoding. No matter what the end-to-end

delay will be, the quality of the decoded video sequence

will not change.

In the experimental results shown above, only the channel error trace with an average PER of 5% is used. Fig. 12 shows the relation between the packet error

rate and the average PSNR at the same end-to-end

delay. From the experimental results, we can see that

rate adaptation transcoding is an effective solution

for streaming high quality pre-encoded video through

low bandwidth wireless channels. Due to its finer

level control of the transcoding frame bit budget, which considers the source video content, channel condition and end-to-end delay, it is very flexible for delivering video services and applications over wireless channels.


Fig. 12. Average PSNR vs. average PER and channel bandwidth. (Average PSNR (dB) versus average packet error rate (%), for R = 64, 48 and 32 kbps with D = 10 frame intervals.)

Fig. 11. Average PSNR vs. end-to-end delay and channel bandwidth. (Average PSNR (dB) versus delay in frame intervals, for R = 64, 48 and 32 kbps.)


References

Aramvith, S., Pao, I.-M., Sun, M.-T., 2001. A rate-control scheme for

video transport over wireless channels. IEEE Transactions on

Circuits and Systems for Video Technology 11 (5).

Assunção, P., Ghanbari, M., 1996. Post-processing of MPEG-2 coded

video for transmission at lower bit rates. In: Proceedings of IEEE

International Conference on Acoustics, Speech, and Signal Pro-

cessing, ICASSP’96, vol. 4, May 1996.

Assunção, P., Ghanbari, M., 1997. Optimal transcoding of compressed

video. In: IEEE International Conference on Image Processing,

ICIP’97, vol.1, USA, October 1997, pp. 739–742.

Assunção, P., Ghanbari, M., 1997. Congestion control of video traffic

with transcoders. In: IEEE International Conference on Commu-

nications, ICC’97, Montreal, 1997.

Assunção, P., Ghanbari, M., 1998. A frequency-domain video

transcoder for dynamic bit-rate reduction of MPEG-2 bit streams.

IEEE Transactions on Circuits and Systems for Video Technology

8 (8).

Assunção, P., Ghanbari, M., 2000. Buffer analysis and control in CBR

video transcoding. IEEE Transactions on Circuits and Systems for

Video Technology 10 (1).

Chang, S.-F., Messerschmitt, D.G., 1993. A new approach to decoding

and compositing motion compensated DCT-based images. In:

Proceeding of IEEE International Conference on Acoustics,

Speech, and Signal Processing, ICASSP’93, vol. 5, April 1993.

Chen, M.-J., Chu, M.-C., Pan, C.-W., 2002. Efficient motion-estima-

tion algorithm for reduced frame-rate video transcoder. IEEE

Transactions on Circuits and Systems for Video Technology

12 (4).

Cheng, N.T., Kingsbury, N.G., 1992. The EREC: an efficient error

resilient technique for encoding positional information on sparse

data. IEEE Transactions on Communications 40 (Jan.).

Chiang, T., Zhang, Y.-Q., 1997. A new rate control scheme using

quadratic rate distortion model. IEEE Transactions on Circuits

and Systems for Video Technology 7 (1), 246–250.

Choi, J., Park, D., 1994. A stable feedback control of the buffer state

using the controlled Lagrange multiplier method. IEEE Transac-

tions on Image Processing 3 (Sept.), 546–558.

Corbera, J.R., Lei, S., 1999. Rate control in DCT video coding for

low-delay communications. IEEE Transaction on Circuit and

System for Video Technology 9 (1).

Ding, W., Liu, B., 1996. Rate control of MPEG video coding and

recording by rate-quantization modeling. IEEE Transactions on

Circuits and Systems for Video Technology 6 (1), 12–20.

Dogan, S., Cellatoglu, A., Sadka, A.H., Kondoz, A.M., 2001. Error-

resilient MPEG-4 video transcoder for bit rate regulation. In:

Proceedings of the Fifth World Multi-Conference on Systemics,

Cybernetics and Informatics (SCI’2001), vol. XII, Part. II,

Orlando, Florida, USA, 22–25 July 2001, pp. 312–317.

Dogan, S., Sadka, A.H., Kondoz, A.M., 2001. MPEG-4 video

transcoder for mobile multimedia traffic planning. In: Proceedings

of the IEE Second International Conference on 3G Mobile

Communication Technologies (3G’2001), No. 477, London, UK,

26–28 March 2001, pp. 109–113.

Fung, K.-T., Chan, Y.-L., Siu, W.-C., 2001. Dynamic frame skipping

for high-performance transcoding. In: Proceedings of the Interna-

tional Conference on Image Processing (ICIP2001).

Ghanbari, M., 1989. Two-layer coding of video signals for VBR

networks. IEEE Journal on Selected Areas in Communications 7

(June).

Gilbert, E.N., 1960. Capacity of a burst-noise channel. The Bell

System Technical Journal 39 (Sept.), 1253–1265.

Hang, H.-M., Chen, J.-J., 1997. Source model for transform video

coder and its application, Part I: fundamental theory. IEEE

Transactions on Circuits and System for Video Technology 7 (2),

287–298.

He, Z., Kim, Y.K., Mitra, S.K., 2001. Low-delay rate control for DCT video coding via ρ-domain source modeling. IEEE Trans-

action on Circuits and Systems for Video Technology 11 (8).

Hsu, C.-Y., Ortega, A., Khansari, M., 1999. Rate control for robust

video transmission over burst-error wireless channels. IEEE

Journal on Selected Areas in Communications 17 (5).

Hwang, J.-N., Wu, T.-D., Lin, C.-W., 1998. Dynamic frame skipping

in video transcoding. In: Proceedings of IEEE Workshop on

Multimedia Signal Processing, USA, December 1998.

Keesman, G., Hellinghuizen, R., Hoeksema, F., Heideman, G., 1996.

Transcoding of MPEG bitstreams. Signal Processing: Image

Communication, 481–500.

Khansari, M., Jalali, A., Dubois, E., Mermelstein, P., 1996. Low bit-

rate video transmission over fading channels for wireless microcel-

lular systems. IEEE Transactions on Circuits and Systems for

Video Technology 6 (1).

Lei, Z., Georganas, N.D., 2002. Rate adaptation transcoding for

precoded video streams. In: Proceedings of ACM Multimedia 2002,

Juan-les-Pins, France, December 1–6, 2002.

Lin, C.-W., 2000. Video transcoding techniques for multipoint video

conferencing. Ph.D. dissertation, Department of Electrical Engi-

neering, National Tsing Hua University, January 2000.


Lin, C.-W., Chen, Y.-C., Sun, M.-T., 2000. Dynamic region of interest

transcoding for multipoint video conferencing. In: Proceedings of

International Computer Symposium and Workshop on Computer

Networks, Internet, and Multimedia, Chiayi, Taiwan, December

2000.

Lin, C.-W., Liou, T.-J., Chen, Y.-C., 2000. Dynamic rate control in

multipoint video transcoding. In: IEEE International Symposium

on Circuits and Systems, ISCAS 2000, Geneva, Switzerland, May

2000.

Liu, H., Zarki, M., 1998. Adaptive source rate control for real-time

wireless video transmission. Mobile Networks and Applications 3

(1).

Puri, R., Lee, K.-W., Ramchandran, K., Bharghavan, V., 2001. An

integrated source transcoding and congestion control paradigm for

video streaming in the Internet. IEEE Transactions on Multimedia

3 (1).

Ramchandran, K., Ortega, A., Vetterli, M., 1994. Bit allocation for

dependent quantization with application to multiresolutions and

MPEG video coders. IEEE Transaction on Image Processing 2

(September).

Redmill, D.W., Kingsbury, N.G., 1996. The EREC: an error resilient

technique for coding variable length blocks of data. IEEE

Transactions on Image Processing 5 (April), 6.

Reyes, G., Reibman, A.R., Chang, S.-F., Chuang, J.C., 2000. Error-

resilient transcoding for video over wireless channels. IEEE Journal

on Selected Areas in Communications 18 (6).

Shanableh, T., Ghanbari, M., 2000. Heterogeneous video transcoding

to lower spatio-temporal resolutions and different encoding

formats. IEEE Transactions on Multimedia 2 (2).

Shen, B., Sethi, I., Vasudev, B., 1999. Adaptive motion-vector

resampling for compressed video downscaling. IEEE Transactions

on Circuits and Systems for Video Technology 9 (6).

Sun, H., Kwok, W., Zdepski, J.W., 1996. Architectures for MPEG

compressed bitstream scaling. IEEE Transactions on Circuits and

Systems for Video Technology 6 (Apr).

Tao, B., Dickinson, B.W., Peterson, H.A., 2000. Adaptive model-

driven bit allocation for MPEG video coding. IEEE Transactions

on Circuits and Systems for Video Technology 10 (1), 147–157.

Video Codec Test Model, TMN8, ITU-T/SG-15, 1997. Available from

<http://www.ece.ubc.ca/spmg/h263plus/h263plus.html>.

Wang, Y., Zhu, Q.-F., 1998. Error control and concealment for video

communication: a review. Proceedings of the IEEE 86 (5).

Wu, S.-W., Gersho, A., 1991. Rate-constrained optimal block-adap-

tive coding for digital tape recording of HDTV. IEEE Transactions

on Circuits and Systems for Video Technology 1 (March), 100–112.

Youn, J., Sun, M.T., 1999. A fast motion vector composition method

for temporal transcoding. In: Proceedings IEEE International

Symposium on Circuits and Systems (ISCAS’99), May 1999.

Zhu, Q.-F., Kerofsky, L., Garrison, M.B., 1999. Low-delay, low-

complexity rate reduction and continuous presence for multipoint

videoconferencing. IEEE Transactions on Circuits and Systems for

Video Technology 9 (4).

Zhijun Lei received his B.E. degree in Computer Engineering from Dalian University of Technology in 1996, his M.E. degree in Computer Engineering from Beijing University of Posts and Telecommunications in 1999, and his Ph.D. in Computer Science from the University of Ottawa in 2003. His research interests are in the fields of digital video coding and transcoding, rate control, wireless video communications, etc. Dr. Lei is a member of IEEE.

Nicolas D. Georganas, OOnt, FIEEE, FRSC, FCAE, FEIC, is Distinguished University Professor and Canada Research Chair in Information Technology at the School of Information Technology and Engineering, University of Ottawa.

He received the Dipl. Ing. degree in Electrical Engineering from the National Technical University of Athens, Greece, in 1966 and the Ph.D. in Electrical Engineering (Summa cum Laude) from the University of Ottawa in 1970.

He has published over 300 technical papers and is co-author of the book "Queueing Networks-Exact Computational Algorithms: A Unified Theory by Decomposition and Aggregation", MIT Press, 1989. He has received research grants and contracts totaling more than $51 million and has supervised more than 175 researchers, among which 90 graduate students (25 Ph.D., 65 MASc) and 18 PostDocs.

In 1990, he was elected Fellow of IEEE. In 1994, he was elected Fellow of the Engineering Institute of Canada. In 1995, he was co-recipient of the IEEE INFOCOM'95 Prize Paper Award. In 1997, he was inducted as Fellow of the Canadian Academy of Engineering and Fellow of the Royal Society of Canada. In 1998, he was selected as the University of Ottawa Researcher of the Year and also received the University 150th Anniversary Medal for Research. In 1999, he was awarded the Thomas W. Eadie Medal of the Royal Society of Canada, funded by Bell Canada, for his contributions to Canadian and international telecommunications. In 2000, he received the A.G.L. McNaughton Gold Medal and Award for 1999-2000, the highest distinction of IEEE Canada; the Julian C. Smith Medal of the Engineering Institute of Canada; the OCRI President's Award (jointly with Dr. Samy Mahmoud) for the creation of the National Capital Institute of Telecommunications (NCIT); the Bell Canada Forum Award from the Corporate-Higher Education Forum; the Researcher Achievement Award from the TeleLearning Network of Centres of Excellence; and a Canada Research Chair in Information Technology. In 2001, he was appointed Distinguished University Professor of the University of Ottawa and he also received the Order of Ontario, the province's highest and most prestigious honour. In 2002, he received the Killam Prize for Engineering, Canada's most distinguished award for outstanding career achievements.