The Journal of Systems and Software 75 (2005) 253–270
www.elsevier.com/locate/jss
Adaptive video transcoding and streaming over wireless channels
Zhijun Lei *, Nicolas D. Georganas
Distributed and Collaborative Virtual Environments Research Laboratory (DISCOVER), School of Information Technology and Engineering,
University of Ottawa, 800 King Edward Avenue, Ottawa, Ont., Canada K1N 6N5
Received 21 January 2003; received in revised form 6 August 2003; accepted 7 September 2003
Available online 25 March 2004
Abstract
In this work, we investigate the problem of bit rate adaptation transcoding for transmitting pre-encoded VBR video over burst-
error wireless channels, i.e., channels such that errors tend to occur in clusters during fading periods. In particular, we consider a
scenario consisting of packet-based transmission with Automatic Repeat ReQuest (ARQ) error control and a feedback channel.
With the acknowledgements received through the feedback channel and a statistical channel model, we can estimate the
current channel state and the effective channel bandwidth. In this paper, we analyze the constraints of buffer and end-to-end delay, and
derive the conditions that the transcoder buffers have to meet for preventing the end decoder buffer from underflowing and
overflowing. Furthermore, we also investigate the source characteristics and scene changes of the pre-encoded video stream. Based
on the channel constraints and source video characteristics, we propose an adaptive bit rate adaptation algorithm for transcoding
and transmitting pre-encoded VBR video streams over wireless channels. Our experimental results demonstrate that, by reusing the
source characteristics and scene change information, transcoding high quality video can produce better video picture quality than
that produced by directly encoding the uncompressed video at the same low bit rate. Moreover, by controlling the frame bit budget
according to the channel conditions and buffer occupancy, the initial startup delay of streaming pre-encoded video can be signif-
icantly reduced.
© 2004 Elsevier Inc. All rights reserved.
Keywords: Wireless video; Mobile multimedia; Video transcoding; Content based rate adaptation; Video streaming
1. Introduction
Recently, there has been a great demand for audio/
visual services to be provided over wireless links. It is
expected that many video services and multimedia applications will enable users to access pre-encoded
video bit streams through wireless connections and
handheld devices. Such applications include video on
demand (VoD), tele-learning, etc. In these applications,
the pre-encoded video needs to be decoded and dis-
played on the fly, while it is downloaded. Due to the
variety of different networks comprising the present
communication infrastructure, users may connect to the pre-encoded video stream through connections with
*Corresponding author. Address: 402-25 Leith Hill Rd., Toronto,
Canada M2J 1Z1. Tel.: +1-416-491-8631.
E-mail addresses: leizj@discover.uottawa.ca (Z. Lei), georganas@
discover.uottawa.ca (N.D. Georganas).
0164-1212/$ - see front matter © 2004 Elsevier Inc. All rights reserved.
doi:10.1016/j.jss.2003.09.029
different characteristics and capacities. In order to
accommodate different connections and allow all users
to be able to access the pre-encoded video, effective
compression and transmission schemes need to be
adopted. For instance, as is common on the Internet, the same video program is encoded into several
copies with different quality and bit rates. When users
want to download and play a video, they have to select
the specific copy that is compatible with their devices
and connections. However, this approach lacks flexibility,
because a few copies cannot cover all possible connection
bandwidths, and users sometimes have to choose a copy
whose bit rate differs from the actual connection bandwidth. Moreover, when
the effective connection bandwidth is changing, pre-
encoded videos are unable to adapt to the changes,
because the dynamic channel condition is usually
unknown when the video is originally coded. Especially
when the original video is encoded with unified quality,
the generated bit rate will tend to be highly variable due
to the nature of the video. When the VBR coded videos
need to be transmitted over a network, frames will arrive
at the decoder experiencing different end-to-end delays
due to their variable size. Usually, an initial startup
buffer is used at the decoder side to compensate for the delay jitter. The decoder will start to decode and dis-
play the video when the buffer is full. If for some reason
the delivery of the next frame is delayed and the initial
buffer is empty, the decoder has to suspend rendering
until the buffer is refilled. This so-called rebuffering
process is a common reason for users to abandon playback.
Layered video coding was originally designed to solve
this kind of problem. In layered video coding, a video is encoded into several layers with different levels of
importance and quality. However, since the number of
layers is limited and no dynamic changes can be made
on the compressed video during transmission, the
inflexibility still exists.
Alternatively, a video transcoder can be used at the
video source or an intermediate node to convert the
video bit rate. When the connection bandwidth is very low, if the video bit rate can be reduced, then the same
number of frames can be transmitted as in the case when
both video bit rate and channel bandwidth are high.
Similar to source encoders, video transcoders can
modulate the data they produce by adjusting a number
of parameters, including quality, frame rate, and reso-
lution. Using transcoders gives us a second chance to
dynamically adjust the video bit rate according to channel bandwidth. This is particularly useful when
there are time variations in the channel characteristics.
The increasing demand for mobile communications
has resulted in the extensive use of wireless communi-
cation technology. However, a signal received over a
wireless channel exhibits considerable fades in the signal
strength. Unlike a wireline channel where the signal
strength is relatively constant and the errors in reception are mainly due to the additive noise, errors in a wireless
channel are predominantly due to the time varying sig-
nal strength caused by the multi-path propagation from
local scatterers. Thus, errors in a wireless channel tend to
be bursty, with the duration of bursts being a function of
the receiver velocity and the nature of the time varying
environment. To achieve high video quality at the de-
coder requires a robust transmission scheme. Closed-loop error control techniques like Automatic Repeat ReQuest
(ARQ) have been shown to be more effective than
Forward Error Correction (FEC) and successfully ap-
plied to wireless video transmission (Khansari et al.,
1996). ARQ approaches, assuming the existence of a
back channel and sufficiently long end-to-end delays, are
appealing in that retransmission is only required during
periods of poor channel conditions.
In this paper, we concentrate on using rate adapta-
tion transcoding techniques and an ARQ error control
scheme for transmitting pre-encoded video over wireless
channels. To take full advantage of the error control
capabilities of the ARQ scheme, we propose to combine
the ARQ feedback mechanism with the transcoding
mechanism at the video transcoder. The scheme can be
broadly divided into two parts: first, a content-based
approach for determining the transcoding frame bit budget;
second, adjusting the frame bit budget or adaptively dropping
frames according to the effective channel bandwidth and
the required end-to-end delay bounds. By using this scheme,
one can achieve the following appealing results. First, at
the video server, for every video program, only one high
quality compressed copy is saved. When a video pro-
gram needs to be transmitted to a client, transcoders can be used to adaptively transcode the video program for
different channel conditions. Second, the rate for the
transcoded video is reduced during the periods of poor
channel conditions. Third, every frame will be delivered
to the decoder within the required end-to-end delay,
thus, no large initial buffer is needed at the decoder side
and rebuffering is avoided. Fourth, the quality of video
can be improved when applications relax the end-to-end delay requirement.
The rest of this paper is organized as follows. In
Section 2, we review related work in three
aspects: video transcoding, video rate control and bit
allocation, and error control for wireless video com-
munications. In Section 3, the studied system is intro-
duced. We analyze the delay and buffer constraints that
the transcoder buffer has to satisfy. Based on these constraints, we derive the constraints on the transcoding
ratio. In Section 4, we investigate the source character-
istics of the pre-encoded video, including the frame
types, source video rate, and scene changes, and their
effects on visual quality. Based on this analysis, we
propose the adaptive frame layer rate control and frame
skipping algorithms. In Section 5, we briefly describe a
wireless channel model and the method for estimating the effective channel bandwidth based on this channel
model, which will be used as our simulation test bed. In
Section 6, we propose a joint source-channel rate
adaptation transcoding scheme for video transmission
over the studied wireless video system. We assume that
an a priori probabilistic model of the channel behavior is
available and a selective ARQ scheme is used for error
control. Our simulation results and conclusions will be presented in Section 7.
2. Related works
2.1. Video transcoding
Video transcoding deals with converting a previously
compressed video signal into another one with different
format, such as different bit rate, frame rate, frame
size, or even compression standard. The concept of
transcoding was first proposed by Sun et al. (1996)
for compressed video bit rate scaling. Later on, trans-
coding was used for spatial resolution (video frame
size) conversion (Shen et al., 1999; Shanableh and Ghanbari, 2000), temporal resolution (video frame rate)
conversion (Hwang et al., 1998; Youn and Sun, 1999;
Chen et al., 2002; Fung et al., 2001), bit rate adaptation
(Assunção and Ghanbari, 1998; Assunção and Ghan-
bari, 2000; Lei and Georganas, 2002), multipoint video
combining (Lin et al., 2000a,b; Lin, 2000), and error
resilience (Dogan et al., 2001a,b; Reyes et al., 2000), etc.
Among all transcoding-related research topics and applications, bit rate adaptation transcoding has been
the most popular one. The idea of compressed video bit
rate adaptation was motivated by applications that transmit
precoded video streams over heterogeneous
networks. In this case, the diversity of channel capacities
in different transmission media often gives rise to prob-
lems. More specifically, when connecting two transmis-
sion media, the channel capacity of the outgoing channel may be less than that of the incoming channel or the
channel capacity of the outgoing channel may change
over time. On the other hand, when pre-encoded video
needs to be distributed to users with different connec-
tions, the target transmission channel conditions are
generally unknown when the video is originally encoded.
In this case, transcoders can be used to dynamically
convert the bit rate of the compressed video for the target channel.
There are three major transcoder architectures that
have been proposed in the literature. The most
straightforward method connects a standard decoder
and a standard encoder together. This closed-loop
transcoder is called Cascaded Pixel Domain Transcoder
(CPDT) (Keesman et al., 1996). On the other extreme,
the input video bitstream is first partially decoded to the DCT coefficient level. Then, the bit rate can be easily
scaled down by cutting higher frequency coefficients
or by requantizing all coefficients with a larger quanti-
zation step size (Sun et al., 1996). This kind of transcoder
is also referred to as the Open-Loop Transcoder
(OLT). In between the above two extreme methods is
a third method, which also uses requantization of
the DCT coefficients, but the requantization error is stored in a buffer and is fed back to the requantizer
to correct the requantization error introduced in the
previous frames (Zhu et al., 1999). This kind of trans-
coder simplifies the architecture of the first category
by reusing motion vectors, and merging two motion
compensation loops in the CPDT into one (Assunção
and Ghanbari, 1996). If motion compensation is carried
out in the DCT domain (Chang and Messerschmitt,
1993), the simplified transcoding can be performed
totally in the DCT domain, which results in a DCT-
domain transcoder (DDT) with much reduced com-
plexity. These transcoding architectures operate in
different layers (pixel domain and DCT domain) and
have different complexity and effects on the final visual
quality. Choosing a transcoding architecture should be
determined by application requirements.
2.2. Video rate control and bit allocation
Besides downscaling bit rate, a video transcoder must
have the ability to accurately and dynamically control
the output bit rate according to the channel bandwidth.
This is the subject of video rate control. Generic rate
control belongs to the budget-constrained bit allocation problem and can be separated into the following two
steps:
(1) Allocate target bits for each frame according to im-
age complexities, buffer occupancy, or a given chan-
nel bit rate. This step is usually called Frame-Layer
Rate Control.
(2) Derive the actual quantization parameter for each Macroblock (MB) in the picture, and make the
number of produced bits meet the bit target. This
step is usually called Macroblock-Layer Bit Alloca-
tion.
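The two steps above can be sketched in code. The linear rate-quantization model (bits ≈ complexity/Q), the buffer-correction factor, and all function names below are illustrative assumptions for exposition, not any specific scheme surveyed here.

```python
# Sketch of the generic two-step rate control described above.
# The linear rate-quantization model (bits ~ complexity / Q) and the
# buffer-correction factor are illustrative assumptions only.

def frame_layer_budget(channel_bits_per_frame, buffer_occupancy,
                       target_buffer, complexity, avg_complexity):
    """Step 1 (Frame-Layer Rate Control): allocate a bit target for a
    frame from the channel rate, buffer occupancy, and image complexity."""
    budget = channel_bits_per_frame * (complexity / avg_complexity)
    budget -= 0.1 * (buffer_occupancy - target_buffer)  # drain toward target
    return max(budget, 1.0)

def macroblock_layer_q(frame_budget, mb_complexities):
    """Step 2 (Macroblock-Layer Bit Allocation): derive a quantization
    parameter per MB so the produced bits meet the frame budget, using
    the linear model bits = complexity / Q (Q clamped to [1, 31])."""
    total = sum(mb_complexities)
    qs = []
    for x in mb_complexities:
        mb_budget = frame_budget * (x / total)  # bits proportional to complexity
        qs.append(max(1, min(31, round(x / mb_budget))))
    return qs
```

Note that under this simple linear model the derived Q is uniform across MBs; real schemes refine Q per MB as the actual bit count deviates from the model.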
For Macroblock-Layer Bit Allocation, methods
based on conventional Rate-Distortion Theory (Hang
and Chen, 1997; Wu and Gersho, 1991; Choi and Park, 1994; Ramchandran et al., 1994; Corbera and Lei, 1999)
and empirical Rate-Quantization models (Ding and Liu,
1996; Tao et al., 2000; Chiang and Zhang, 1997; He
et al., 2001) have been proposed in the literature to solve
it. As far as Frame-Layer Rate Control is concerned, the
bit budget for every frame has to be determined con-
sidering the channel bit rate and buffer occupancy. Since
all video coding standards assume that the compressed video will be transmitted over a constant bit rate (CBR)
channel, the frame-layer rate control scheme of these
standards is based on this assumption. However, this
is not the case in reality. For example, the Internet
currently cannot provide a guaranteed constant bit rate
channel for a specific application; when the network is
congested, most video packets will be dropped, resulting
in unacceptable video quality if no other mechanisms
exist to protect the video stream. Another
case is transmitting video over a wireless channel, which
is characterized by high bit error rate (BER) and vari-
able effective channel bit rate. Although Shannon’s
separation theorem states that source coding (compres-
sion) and channel coding (error protection) can be per-
formed separately and sequentially, many research results have shown that source coding and channel
coding have to be combined together in practical video
communication systems.
When pre-encoded video needs to be distributed to
heterogeneous networks, since the target transmission
channel conditions are generally unknown when the
video is originally encoded, using transcoders gives us a second chance to dynamically adjust the video bit rate
according to channel conditions. Several research works
address bit rate adaptation and bit rate
downscaling using transcoding. In (Assunção and
Ghanbari, 1998; Assunção and Ghanbari, 1997a,b),
transcoding is regarded as a down conversion process,
where the bit rate of a compressed video bit stream is
reduced according to a given constraint. Assunção et al. propose an optimal transcoder in a rate-distortion sense
for transmitting video over the Internet. In their work,
the problem of optimal transcoding is formulated in an
operational rate-distortion context and solved by using
a Lagrangian algorithm. New quantizer scales are se-
lected based on classic Rate-Distortion theory for
transcoding each MB or group of MBs such that the
output rate does not exceed the given constraint, while producing a minimum average distortion. In (Assunção and Ghanbari, 1997a,b), video transcoding is utilized as
a mechanism capable of decoupling video encoders from
network constraints and providing congestion control of
pre-encoded video traffic over ATM networks for video
distribution applications. This mechanism provides an
effective method of shaping video traffic independently
of the initial video encoder’s constraints. By using transcoders, video traffic can be controlled at any point
along the transmission path and thus the Quality of
Service can be maintained without relying on on-line
encoders. In (Dogan et al., 2001a,b), Dogan et al.
address the problem of traffic planning for mobile video
communications and propose a video transcoder bank
to resolve congestion and/or bandwidth limitation. The
proposed architecture presents a layered structure of multiple video rates as required by various networks.
The paper also introduces an adaptive method for
resolving congestion. The designed system monitors the
congestion with a feedback loop within a network and
adaptively produces necessary transmission rates while
providing the best available service quality. In our pre-
vious work (Lei and Georganas, 2002), a scene-context-based
frame layer rate control is proposed for
determining the frame bit budget, and an algorithm
based on a linear bit allocation model is used for
macroblock layer bit allocation.
2.3. Error control for wireless video communication
Recently, the increasing demand for mobile commu-
nications has resulted in the extensive use of wireless
communication technology. It is necessary to support
multimedia services, including video and audio in
addition to voice and data, over wireless links. However,
wireless links are characterized by high bit error rate,
limited bandwidth and time-varying conditions. Unlike
a wireline channel where the signal strength is relatively
constant and the errors in reception are mainly due to
the additive noise, errors in a wireless channel are predominantly due to the time varying signal strength
caused by the multi-path propagation from local scatterers. Thus, errors in a wireless channel tend to be bursty,
with the duration of bursts being a function of the re-
ceiver velocity and the nature of the time varying envi-
ronment (Aramvith et al., 2001). Transmission of video
over wireless networks is challenging because of the
delay constraints involved, and because of the negative impact of channel errors on the perceptual quality of
video at the decoder. Uncorrected channel errors may
result in significant quality degradation at the decoder.
This is particularly evident in standard coders, such as
those based on MPEG or H.263, where variable length
coding is used or where compression involves a predic-
tive coding scheme.
There have been many techniques proposed in the literature to combat the transmission error problem from
different aspects. At the encoder side, source coding
techniques, such as layered coding (Ghanbari, 1989),
multiple description coding (Puri et al., 2001), Error
Resilience Entropy Coding (Cheng and Kingsbury,
1992; Redmill and Kingsbury, 1996), etc. can be used to
increase the robustness of the video stream against
channel errors.
At the decoder side, error concealment techniques
attempt to recover the lost information by estimation
and interpolation without relying on additional infor-
mation from the encoder (Wang and Zhu, 1998). Besides
source coding and error concealment techniques, chan-
nel coding techniques, such as FEC and ARQ, have been
the classic techniques to combat transmission errors.
FEC codes can be chosen to guarantee certain error rate requirements for the worst channel conditions. However,
this causes unnecessary overhead and wastes bandwidth
when the channel is in a good state (Liu and Zarki, 1998).
Using ARQ error control for the mobile radio channels
has been shown to be more effective than FEC because
retransmission is only required during periods of poor
channel conditions (Khansari et al., 1996).
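The bursty error behavior described above is often modeled with a two-state Markov (Gilbert-Elliott style) channel: a GOOD state with rare errors and a BAD (fading) state with frequent errors. The sketch below is illustrative only; the transition probabilities and per-state error rates are our assumptions, not parameters from this paper.

```python
import random

# Minimal two-state burst-error channel sketch (Gilbert-Elliott style).
# All numeric parameters below are illustrative assumptions.
GOOD, BAD = 0, 1

def simulate_channel(n_slots, p_gb=0.05, p_bg=0.3,
                     err_good=0.001, err_bad=0.2, seed=42):
    """Return a list of per-slot error flags; errors cluster while the
    channel stays in the BAD (fading) state."""
    rng = random.Random(seed)
    state, errors = GOOD, []
    for _ in range(n_slots):
        # State transition: a fade pushes the channel into the BAD state,
        # and it later recovers back to GOOD.
        if state == GOOD and rng.random() < p_gb:
            state = BAD
        elif state == BAD and rng.random() < p_bg:
            state = GOOD
        err_rate = err_bad if state == BAD else err_good
        errors.append(rng.random() < err_rate)
    return errors
```

Because state changes are much slower than slot durations, the generated error flags arrive in clusters, mimicking fading periods rather than independent bit errors.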
3. Delay and rate constraints for transcoding
In this work, we define a pre-encoded video streaming
system in which the pre-encoded high quality video
program is transcoded, transmitted, decoded and dis-
played in real time within some delay interval. A block
diagram of the whole system is illustrated in Fig. 1. In this section, we will analyze the effect of the delay and
buffer constraints on the transcoding ratio.
[Fig. 1 components: pre-encoded video → video transcoder → transcoder buffer → wireless channel → decoder buffer → video decoder → video output, with delays Dt, Dtb, Dtc, Ddb, Dd between the video source side and the video client side.]
Fig. 1. Basic component of the defined video streaming system.
3.1. Delay constraints for the defined system
In the defined system as illustrated in Fig. 1, we
assume that a video is transcoded and transmitted at
a fixed frame rate F . At the decoder side, the video is
decoded and displayed at the same frame rate F . In this
work, we use the total end-to-end delay as a perfor-
mance metric. Obviously, lower delay is preferable since it allows reduced initial startup delay in one-way com-
munication systems. In the studied system, the end-to-
end delay each frame experiences (from the time it is
transcoded to the time it is placed in the video display)
consists of several delay components, as shown in Eq.
(1)
D = \underbrace{(D_t + D_d)}_{\text{processing delay}} + \underbrace{D_{tc}}_{\text{transmission delay}} + \underbrace{(D_{tb} + D_{db})}_{\text{buffer delay}} \qquad (1)
where the subscripts t and d stand for the transcoder and
decoder, respectively. In the above equation, the processing
delay in either the transcoder or the decoder depends on the
available computing power, and the transcoding complexity
can be much lower than that of the encoding process.
Therefore, we can assume the processing delay components
are constant and can be neglected. In this work, an ARQ
protocol is assumed for video transmission; therefore, the transmission delay, D_tc, includes
the time for transmission and retransmission of the
video packets and Acknowledgement or Negative
Acknowledgement (ACK/NAK) messages. In this work,
we are primarily concerned with the delays introduced
by the transcoder buffer and decoder buffer, because
they can be much larger than the transmission delays, and the amount of buffering in the video system can
strongly affect the video quality and end-to-end delay.
After the first frame is transcoded, it will be sent into
the transcoder buffer, which is empty at this time. In
general, it is possible for transcoder buffer underflow to
occur, if transmission starts at the same time as the
transcoder puts the first bit into the buffer. On the other
hand, at the decoder buffer, which is also empty for the first frame, underflow will occur if decoding starts at the
same time as the first bit of the first frame arrives. In
practice, this is prevented by starting the transmission
and decoding after a certain initial delay. Therefore, the
end-to-end delay for the first frame is given by the sum
of the initial delays. To simplify the analysis, we define
the system initial delay D, i.e., the first frame will be
transcoded at time t and will be decoded and displayed
at time t + D. Then, the maximum delay that any following video frame will experience has to remain
constant as D. In real-time interactive applications, the
end-to-end delay, D, must be kept less than a certain
limit, such as 100 ms in video conferencing applications.
In precoded video streaming applications, the end-to-
end delay can be much longer. However, short initial
delay is still a desired target. At the same time, long
initial delay is equivalent to a larger decoder buffer, which is what we want to avoid, because most handheld
devices are memory limited. Under this constraint, the
system will function normally, as long as the decoder
buffer does not overflow or underflow, which prevents
the video data from being lost and guarantees that the
decoder has received the data of a video frame before it
is scheduled to be displayed, respectively.
3.2. Buffer analysis of VBR transcoders
In this section, we will analyze the conditions that the
transcoder buffer has to meet in order to prevent the
decoder buffer from underflowing within the end-to-end
delay constraints. In (Assunção and Ghanbari, 2000),
Assunção et al. analyzed the buffering implications
of inserting a transcoder along a CBR transmission path. However, in our studied system, due to the re-
transmission in the ARQ scheme, the effective channel
bandwidth becomes variable. As illustrated in Fig. 2, in
our selective repeat (SR) ARQ scheme, the reception of
a packet is acknowledged by the receiver sending either
an ACK or a NAK to the transmitter. Packets that have
been sent are stored in the ARQ buffer until they are
acknowledged with ACK. Packets awaiting transmission are stored in the transcoder buffer, and the decoder
buffer can be used to rearrange the received packets,
which may be out-of-order due to retransmission.
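The selective-repeat bookkeeping described above can be sketched as follows. The class and method names are our own illustrative assumptions, not the paper's protocol implementation; the key invariants are that sent packets stay in the ARQ buffer until ACKed and that NAKed packets are retransmitted ahead of new data.

```python
from collections import deque

# Toy selective-repeat (SR) ARQ sender bookkeeping, per the description
# above. Names and structure are illustrative assumptions.

class SrArqSender:
    def __init__(self):
        self.pending = deque()     # packets awaiting first transmission
        self.arq_buffer = {}       # seq -> packet, sent but not yet ACKed
        self.retransmit = deque()  # sequence numbers NAKed by the receiver

    def enqueue(self, seq, packet):
        self.pending.append((seq, packet))

    def send_one(self):
        """Pick the next packet to put on the channel:
        retransmissions first, then new data."""
        if self.retransmit:
            seq = self.retransmit.popleft()
            return seq, self.arq_buffer[seq]
        if self.pending:
            seq, pkt = self.pending.popleft()
            self.arq_buffer[seq] = pkt  # keep a copy until ACKed
            return seq, pkt
        return None

    def on_feedback(self, seq, ok):
        if ok:   # ACK: safe to discard from the ARQ buffer
            self.arq_buffer.pop(seq, None)
        else:    # NAK: schedule a retransmission
            self.retransmit.append(seq)
```

On the receiver side, the decoder buffer would reorder packets by sequence number before decoding, since retransmissions arrive out of order.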
[Fig. 2 components: input rate r(t), transcoding ratio β(t), transcoder buffer Bt(t), ARQ control with ARQ buffer, channel rate R(t), ACK/NAK channel feedback, decoder buffer Bd(t), output r'(t) displayed after delay D as r'(t − D).]
Fig. 2. Buffering with variable transcoding ratio for VBR channel.
As illustrated in Fig. 2, the precoded video is sent into the transcoder at the source coding rate r(t). Actually, because all data of the pre-encoded video are available at the server side, the transcoder can work much faster than F frames per second. To simplify the analysis here, we assume that the transcoder transcodes only F frames in one second. At the transcoder, the transcoding is modeled as a scaling function \beta(t) which, multiplied by r(t), produces the transcoded video at rate r'(t), i.e.,

r'(t) = \beta(t) \cdot r(t) \qquad (2)

The effect of multiplying \beta(t) by r(t) can be seen as equivalent to reducing the number of bits used in the video frame transcoded at time t. In Fig. 2, B_t(t) and B_d(t) are the buffer occupancies of the transcoder and decoder at time t, respectively. R(t) is the channel bandwidth at time t. We define B^t_i and B^d_i as the instantaneous occupancies of the transcoder and decoder buffers at the ith frame. First, we discretize the problem by defining E_i (i = 1, 2, \ldots) to be the number of bits generated by the transcoder in the ith frame interval [(i-1)T, iT), where T is the duration of one frame interval. Therefore,

E_i = \int_{(i-1)T}^{iT} r'(t)\,dt \qquad (3)
Similarly, let R_i be the number of bits that are transmitted during the ith frame interval:

R_i = \int_{(i-1)T}^{iT} R(t)\,dt \qquad (4)
Assuming the transcoder buffer is empty at time t = 0, the transcoder buffer occupancy after transcoding the ith frame is

B^t_i = \sum_{j=1}^{i} E_j - \sum_{j=1}^{i} R_j = B^t_{i-1} + E_i - R_i \qquad (5)
After the decoder begins to receive data, it waits D frame intervals before starting to decode and play. Then, the decoder buffer occupancy after the ith frame interval is

B^d_i = \begin{cases} \sum_{j=1}^{i} R_j, & i \le D \\ \sum_{j=1}^{i} R_j - \sum_{j=1}^{i-D} E_j = B^d_{i-1} + R_i - E_{i-D}, & i > D \end{cases} \qquad (6)
The transcoder can calculate the decoder buffer fullness,
if D is predetermined or sent explicitly as a decoder
parameter. The system will function normally as long as
the decoder buffer does not underflow within the end-
to-end delay constraint. Therefore, B^d_i should be greater
than zero at any time.
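The recursions in Eqs. (5) and (6) can be checked with a short simulation. The function below is a sketch under stated assumptions (0-indexed arrays standing in for the 1-indexed frames of the text, both buffers empty at t = 0); it tracks both buffer occupancies and flags decoder underflow.

```python
# Sketch of the buffer recursions in Eqs. (5) and (6): given per-interval
# transcoded bits E[i] and transmitted bits R[i] (0-indexed here) and a
# start-up delay of D frame intervals, track both buffers and flag
# decoder-buffer underflow.

def track_buffers(E, R, D):
    bt = bd = 0
    history = []
    for i in range(len(E)):
        bt = bt + E[i] - R[i]            # Eq. (5): transcoder buffer
        if i < D:
            bd = bd + R[i]               # Eq. (6), start-up: filling only
        else:
            bd = bd + R[i] - E[i - D]    # Eq. (6): decoding consumes frame i-D
        history.append((bt, bd, bd < 0)) # (B^t_i, B^d_i, underflow flag)
    return history
```

For instance, when every interval transcodes more bits than the channel carries for a frame that is soon due for display, the flag turns on, matching the condition B^d_i > 0 required above.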
3.3. Transcoding ratio constraints
We now combine the equations from Section 3.2 to obtain conditions necessary to prevent transcoder and decoder buffer underflow. To prevent transcoder buffer underflow, from Eq. (5), we have

B^t_i = B^t_{i-1} + E_i - R_i > 0 \;\Rightarrow\; E_i > R_i - B^t_{i-1} \qquad (7)
which is a constraint on the number of bits of every transcoded frame.
In order to prevent decoder buffer underflow, from Eq. (6), we have

B^d_i = B^d_{i-1} + R_i - E_{i-D} \ge 0 \;\Rightarrow\; E_{i-D} \le B^d_{i-1} + R_i, \quad i > D

\;\Rightarrow\; E_i \le B^d_{i+D-1} + R_{i+D}, \quad i > 1

\;\Rightarrow\; E_i \le \sum_{j=1}^{i+D-1} R_j - \sum_{j=1}^{i-1} E_j + R_{i+D} = \sum_{j=1}^{i+D} R_j - \sum_{j=1}^{i-1} E_j = \sum_{j=1}^{i-1} R_j + \sum_{j=i}^{i+D} R_j - \sum_{j=1}^{i-1} E_j \qquad (8)
As we can see from Eq. (8), \sum_{j=1}^{i-1} R_j is the total number of bits that have been transmitted in the past (i-1) frame intervals, and \sum_{j=1}^{i-1} E_j is the total number of bits of the past (i-1) transcoded frames. These two terms are known by the transcoder. However, \sum_{j=i}^{i+D} R_j is the total number of bits that will be transmitted in the future
(D+1) frame intervals, which is not available at the
transcoder, if the channel bandwidth is variable.
Therefore, the condition that the bit number of the ith transcoded frame has to satisfy is

\max(0, R_i - B^t_{i-1}) < E_i < \sum_{j=1}^{i-1} R_j + \sum_{j=i}^{i+D} R_j - \sum_{j=1}^{i-1} E_j \qquad (9)
Under this condition, there will be a maximum of D frames stored in the whole system, because the maximum end-to-end delay is D frame intervals. Then, we have

B^t_i + B^d_i \le \sum_{j=i-D+1}^{i} E_j, \quad \forall i > D \qquad (10)
In a VBR channel operating at a channel rate R(t) that is not deterministically known, the key difficulty in dealing with this scenario is that the trans-
coder has to estimate the effective channel bandwidth to
decide the scaling factor for every frame. In Section 5,
we will introduce how to make use of a probabilistic
model of the channel and observation of the current
channel state to estimate the effective channel band-
width.
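A minimal sketch of the per-frame budget window in Eq. (9) follows. The past terms are known at the transcoder; the unknown future term \sum_{j=i}^{i+D} R_j is supplied by the caller as an estimate (how that estimate is obtained from the channel model is the subject of Section 5). The function name and interface are our assumptions.

```python
# Sketch of the per-frame budget window in Eq. (9). Past R_j and E_j are
# known at the transcoder; the future term sum_{j=i}^{i+D} R_j must be
# replaced by an estimate, since the channel bandwidth is variable.

def budget_bounds(past_R, past_E, bt_prev, Ri_est, future_R_est):
    """Return (lower, upper) bounds on E_i.

    past_R, past_E : lists of R_j and E_j for j = 1 .. i-1
    bt_prev        : transcoder buffer occupancy B^t_{i-1}
    Ri_est         : estimate of R_i (used in the lower bound, Eq. (7))
    future_R_est   : estimate of sum_{j=i}^{i+D} R_j
    """
    lower = max(0, Ri_est - bt_prev)                  # Eq. (7)
    upper = sum(past_R) + future_R_est - sum(past_E)  # Eq. (9), right side
    return lower, upper
```

Any E_i chosen strictly inside this window keeps the transcoder buffer from underflowing and the decoder buffer from underflowing within the delay bound D.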
4. Adaptive transcoding based on video content
In this section, we will introduce the frame layer rate
control based on video content. Here, video content
refers to the inherent visual features present in the video.
In this work, we consider the frame types, video source
coding rate and scene changes.
4.1. Frame types and source video traffic
Current video coding standards, such as MPEG-x and
H.26x, use motion compensation to reduce the temporal
Fig. 3. Source coding rate of a typical H.263 coded video (a scene change is marked in the plot).
redundancy between successive frames. Usually, in
MPEG-x, one group of pictures (GOP) contains one
INTRA (I) frame and several INTER (P or B) frames in
a certain pattern. An I frame has no motion compensa-
tion performed on it but it is very important because
motion compensation of the following frames is depen-dent on the quality of the I frame. P frames use the
previous I or P frames as reference for motion com-
pensation and also are used as reference frames for other
INTER frames. B frames use both the previous and
successive I or P frames as references for motion com-
pensation but B frames are not used as reference
frames for INTER frames. Because of this hierarchy in
coding, the visual quality of different frame types has different effects on the quality of the whole video stream, with
the priority being I > P > B. Therefore, an effective rate
control scheme should treat different types of frames
unequally, in order to improve the quality of the whole
video.
For wireless applications, the H.263 standard is more
popular due to its focus on low bit rates. Unlike
MPEG-1 and MPEG-2, in which I frames are used periodically,
mainly for indexing, in the H.263 standard I frames are
seldom used, mostly to refresh the visual quality. When a
fixed quantization parameter is used for coding all
frames in a video sequence, the visual objective quality
(PSNR) will tend to be near constant while the gener-
ated bit rate will be liable to fluctuate due to the scene
content and different frame types. Thus, if the pre-
encoded bitstream generated by the encoder is kept veryclose to a constant rate, usually the purpose of doing so
is for easy transmission over a CBR channel, there will
be a penalty in terms of quality. In one of our experi-
ments, we encoded a video sequences with unified
quantization parameters for all I , P , B frames. Only the
first frame is coded as an I frame and all other frames
are coded as B, and P frames in a certain pattern (IBBP).
Fig. 3 shows the bit rate of this stream. As we can see, I
Fig. 4. Relation between the bits/frame of two bit streams encoded at two different rates.
260 Z. Lei, N.D. Georganas / The Journal of Systems and Software 75 (2005) 253–270
frames usually have the largest number of bits, while B
and P frames have many fewer bits than I frames. We
notice that, when a scene change happens, the bit rate of
the first INTER frame after the scene change (we call it
the anchor frame) is very close to that of I frames. This
is because, when a scene change happens, most MBs are
encoded as INTRA blocks. We also found that the
average number of bits for a B frame is usually half of
that of a P frame within a scene segment.

The source bit rate can be recorded when the original
video is encoded and saved with the encoded video as
side information. Research (Corbera and Lei, 1999) has
shown that, for the same type of frame encoded with a
unified quantization parameter, the generated frame size
indicates the scene variations and motion activities.
Therefore, in the proposed scheme, we directly use the
source bit rate as a parameter to measure the video
content and to calculate the transcoding bit budget for
every frame.
As pointed out by Assunção and Ghanbari (2000),
in transcoding of precoded video, the bit target for
every frame should not be simply calculated as the
product of the number of bits of the incoming frame
and the ratio between the output and input bit rates of
the transcoder. In one of our experiments, we encoded
the same video sequence with two different quantization
parameters; the bit rates of the generated bit streams
are 263 and 667 kbps. The obtained bits/frame ratios
are illustrated in Fig. 4. The scaling factor a between
the two video streams is about 0.4. However, as we can
see from Fig. 4, the numbers of bits B_i^1 and B_i^2
used to encode the corresponding picture of each video
stream are not related through the same scaling factor
a, i.e., B_i^2 ≠ a·B_i^1.

If the objective of transcoding is to make the quality
of the pictures near constant at the reduced rate, then
the scaling factor of such a transcoder should follow the
curve shown in Fig. 4.
4.2. Scene change detection

A scene change represents any distinctive difference
between two adjacent video frames. It includes rapid
motion of moving objects as well as changes to different
visual content. In most existing rate control schemes, the
information obtained from previously coded pictures is
utilized for estimating the target bits for the current
picture. However, if a scene change occurs, information
obtained from previously coded pictures is no longer
useful and can even cause visual quality degradation in
the pictures following the scene change. Therefore, when
a scene change happens, the first frame after the scene
change should be transcoded in high quality to prevent
quality degradation. Prior to that, scene changes must
be detected. As mentioned in the previous section, when
a scene change happens, most MBs in the anchor frame
will be encoded as INTRA blocks. Therefore, in this
work, we simply use the percentage of INTRA mode
MBs in a frame to detect scene changes as follows:

SCD = (Number of INTRA Coded Macroblocks / Number of Macroblocks in a Frame) × 100%

If SCD ≥ 40%, we decide that a scene change has
happened. As with the source coding rate, SCD can be
calculated and recorded when the video is pre-encoded.
Fig. 5 shows the variations in SCD for the sequence
shown in Fig. 3. As we can see, the percentage of
INTRA mode MBs in a frame is accurate enough to
detect scene changes. In the proposed scheme, if a scene
change is detected, we transcode the anchor frame as an
I frame. Therefore, the quality of frames after the scene
change will not be affected.
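The threshold test above is simple enough to state in code. The sketch below is illustrative only; the function name and the 99-macroblock QCIF frame size are our assumptions, not part of the paper's implementation:

```python
def is_scene_change(num_intra_mbs: int, total_mbs: int, threshold: float = 40.0) -> bool:
    """Scene-change detection: percentage of INTRA-coded MBs in a frame (SCD)."""
    scd = 100.0 * num_intra_mbs / total_mbs
    return scd >= threshold

# A QCIF frame has 99 macroblocks (11 x 9).
print(is_scene_change(52, 99))  # 52/99 ~ 52.5% -> True
print(is_scene_change(10, 99))  # ~10.1% -> False
```

Because the MB coding modes are already present in the pre-encoded stream, this check costs essentially nothing at transcoding time.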
Fig. 5. SCD of a typical H.263 coded video sequence.

4.3. Adaptive frame layer rate control

Different from real-time interactive video applications,
such as video conferencing, in which the end-to-end
delay of each frame must be kept within a certain limit,
the requirement for end-to-end delay in streaming and
playing precoded video is relaxed. In this type of
application, an initial startup delay is allowed and a
buffer at the decoder smooths the delay jitter. Therefore,
as long as the decoder buffer does not underflow, the
end-to-end delay for every frame can be different.

The current frame layer rate control schemes in the
H.263 standard, such as TMN-8, are dedicated to
transmitting real-time video with low delay over a
constant bit rate channel. The bit budget for every frame
is determined only by the buffer occupancy and the
channel bandwidth. Therefore, a nearly constant bit
budget is allocated to each frame, and each frame
experiences a similar end-to-end delay. Different from
the frame layer rate control scheme in TMN-8, the
proposed scheme determines the transcoded frame bit
budget considering the scene changes, frame types and
source coding rate of the pre-encoded video. In the
design of the bit rate control algorithm, since an exact
curve such as the one in Fig. 4 is not available to the
transcoder, the following rules are adopted to determine
the transcoded frame bit budget.
(1) If the incoming frame is an INTRA coded frame, it
will be transcoded as an INTRA frame with unified
quantization parameter used for every MB.
(2) P and B frames are treated differently by the trans-
coder. The bit budget for B frames will be half of
that for P frames, on the average.
(3) For the same type of INTER frames, the frame bit
budget is determined considering the effective channel
bandwidth, the end-to-end delay and the original
source traffic.
(4) If a scene change is detected, the anchor frame will
be transcoded as an INTRA frame with unified
quantization parameter for every MB.
At the frame layer, we first determine the bit budget
for a GOP, which has a fixed number of frames. Since
INTRA frames are seldom used in H.263, we assume
here that a GOP includes only INTER frames (P and B)
in a certain pattern (such as PBPBPB... or PBBPBB...).
The bit budget for encoding a GOP is defined as

B_GOP = (N + aD) · (R/F) + Δ,   a = 0.7 for the first GOP, 0 otherwise   (11)

where N is the number of frames in a GOP, R is the
effective channel bandwidth, F is the frame rate, D is the
required end-to-end delay measured in frame intervals,
and Δ is the bit budget left unused by the previous GOP.
In order to take full advantage of the initial delay D, we
introduce the parameter a, which is empirically set to 0.7
for the first GOP. As we can see in Eq. (11), for the first
GOP, the N frames receive a larger bit budget than the
channel can actually transmit in the first N frame
intervals. Then, when the end-to-end delay is relaxed by
the application, the visual quality is improved. For the
following GOPs, Eq. (11) assumes that the N frames
in a GOP will be transmitted within N frame intervals.
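Eq. (11) can be stated directly in code. The function below is an illustrative sketch (the name and argument order are ours, not the authors' implementation); `leftover` stands for the Δ term carried over from the previous GOP:

```python
def gop_bit_budget(N: int, R: float, F: float, D: int,
                   leftover: float, first_gop: bool) -> float:
    """Eq. (11): B_GOP = (N + a*D) * (R/F) + leftover,
    with a = 0.7 for the first GOP and a = 0 afterwards."""
    a = 0.7 if first_gop else 0.0
    return (N + a * D) * (R / F) + leftover

# 30-frame GOP, 64 kbps effective bandwidth, 10 fps, D = 10 frame intervals
print(gop_bit_budget(30, 64000, 10.0, 10, 0.0, first_gop=True))   # (30 + 7) * 6400 = 236800.0
print(gop_bit_budget(30, 64000, 10.0, 10, 0.0, first_gop=False))  # 30 * 6400 = 192000.0
```

The first GOP thus gets an extra 0.7·D frame intervals' worth of channel bits, which is how the scheme converts the allowed startup delay into picture quality.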
The calculation of the effective channel bandwidth, R,
will be introduced in the following section. Inside a
GOP, the frame bit budget is determined according to
the frame types and the original source bit rate,
following the above rules. Based on these rules and the
pattern of P and B frames within a GOP, the bit budgets
for all P frames and all B frames in a GOP are,
respectively,

B_GOP−B = [N_B / (2N − N_B)] · B_GOP,   B_GOP−P = [2N_P / (N_P + N)] · B_GOP   (12)

where N_P and N_B are the numbers of P and B frames
in a GOP (note that 2N − N_B = N_P + N = 2N_P + N_B,
since N = N_P + N_B). For the same type of frame, bits
are allocated to every frame according to the original
source bit rate. Then we have

B_P(i) = [b_P(i) / Σ_{j=0}^{N_P−1} b_P(j)] · B_GOP−P,   B_B(i) = [b_B(i) / Σ_{j=0}^{N_B−1} b_B(j)] · B_GOP−B   (13)

where B_P(i) and B_B(i) are the bit budgets for the ith P
or B frame, and b_P(i) and b_B(i) are the numbers of
bits of the ith P and B frames in the precoded video,
respectively. When the frame bit budget is determined,
the macroblock layer bit
allocation algorithm is responsible for calculating the
quantization parameter for every MB and for making
the number of generated bits as close as possible to the
bit budget. In Section 2, we provided some references
for macroblock layer bit allocation. However, we believe
that the quality variation produced by varying the
quantization parameters is far smaller than that produced
by varying the frame bit budget. Therefore, in this work,
we directly use the macroblock layer bit allocation
algorithm of TMN-8. Details about the macroblock
layer bit allocation in TMN-8 can be found in (Corbera
and Lei, 1999).

Fig. 6. Adaptive frame skipping.
4.4. Adaptive frame skipping
As indicated in Eq. (10), at any given time, the sum of
the transcoder and decoder buffer occupancies must be
smaller than the number of bits of the D frames that
have been transcoded but not yet decoded. Otherwise,
some frames will experience an end-to-end delay longer
than D. Eq. (10) can be violated for the following
reasons. First, due to the control error of the macroblock
layer bit allocation algorithm, the generated frame size
may not exactly match the bit budget. Second, due to
frame type changes, such as transcoding P frames to I
frames, the generated
Table 1
Adaptive frame skipping decision (a)

Frame type | Scheduled to skip? | SCD ≥ 40%? | Action
I    | Yes | Yes | Keep the I frame, skip the next PBB frames
I    | Yes | No  | N/A (b)
I    | No  | Yes | Keep the I frame
I    | No  | No  | N/A (b)
P    | Yes | Yes | Transcode the P frame as an I frame, skip the next BB frames
P    | Yes | No  | Keep the P frame, skip the next BB frames
P    | No  | Yes | Transcode the P frame as an I frame
P    | No  | No  | Keep the P frame
B(1) | Yes | Yes | Skip the 2 B frames, transcode the next P frame as an I frame
B(1) | Yes | No  | Skip the B frames
B(1) | No  | Yes | Skip the 2 B frames, transcode the next P frame as an I frame
B(1) | No  | No  | Keep the B frame
B(2) | Yes | Yes | Skip the B frame, transcode the next P frame as an I frame
B(2) | Yes | No  | Skip the B frame
B(2) | No  | Yes | Skip the B frame, transcode the next P frame as an I frame
B(2) | No  | No  | Keep the B frame

(a) Assume the frame pattern is IPBBPBB... B(1) and B(2) stand for the first and the second B frame in a PBBP pattern, respectively.
(b) When the frame type is I, the SCD is definitely greater than 40%.
frame size may be much greater than the frame bit
budget. Third, because the algorithm has to estimate the
effective channel bandwidth over the future D frame
intervals, the accuracy of the estimation also affects the
result of Eq. (10). In order to guarantee that the
end-to-end delay of every frame is less than D, in the
proposed algorithm, after transcoding every frame we
evaluate Eq. (10). If there are more than D frames stored
in the transcoder and decoder buffers, the next frame
may be dropped according to its type.
For example, if the original video was encoded in the
IPPP pattern and the frame that needs to be skipped is
an I frame, then this I frame is skipped and the next
unskipped frame (P) is transcoded to an I frame with a
unified quantization parameter, as in Fig. 6(a). If the
frame that needs to be skipped is a P frame, then this P
frame is skipped and the next unskipped frame is
transcoded to a P frame, as in Fig. 6(b). If the original
video was encoded in the IBBP pattern, we have to
consider the frame reordering that happens at the
decoder. As illustrated in Fig. 6(c), suppose the I frame
is displayed at time t; then the following two frames that
need to be displayed are B frames, at times t + 1 and
t + 2. However, when the original video is encoded, the
P frame, which is scheduled to be displayed at time
t + 3, is encoded and stored prior to the two B frames.
When the decoder decodes this video stream, it first
decodes the P frame, and only then can the two B frames
be decoded and displayed. Therefore, in our transcoder,
if the frame that needs to be skipped is an I frame, we
skip this frame and transcode the following P frame as
an I frame. At the same time, the following two B
frames are also skipped, as in Fig. 6(c). If the frame that
needs to be skipped is a P frame, then instead of
skipping this P frame, we keep it but skip the following
two B frames, as in Fig. 6(d), because, as mentioned
before, the bit rate of a P frame is usually two times that
of a B frame. If the frame that needs to be skipped is a B
frame, we simply skip this frame, as in Fig. 6(e). When
the frame scheduled to be skipped happens to be an
anchor frame, based on the same idea, the rules in Table
1 are applied for frame skipping.
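The decision logic of Table 1 amounts to a lookup keyed on frame type, skip schedule and scene-change flag. The sketch below is ours, not the paper's code; the action strings are shorthand for the table entries:

```python
# (frame_type, scheduled_to_skip, scene_change) -> action, following Table 1.
SKIP_RULES = {
    ("I",  True,  True):  "keep I frame, skip next PBB frames",
    ("I",  False, True):  "keep I frame",
    ("P",  True,  True):  "transcode P as I frame, skip next BB frames",
    ("P",  True,  False): "keep P frame, skip next BB frames",
    ("P",  False, True):  "transcode P as I frame",
    ("P",  False, False): "keep P frame",
    ("B1", True,  True):  "skip 2 B frames, transcode next P as I frame",
    ("B1", True,  False): "skip B frames",
    ("B1", False, True):  "skip 2 B frames, transcode next P as I frame",
    ("B1", False, False): "keep B frame",
    ("B2", True,  True):  "skip B frame, transcode next P as I frame",
    ("B2", True,  False): "skip B frame",
    ("B2", False, True):  "skip B frame, transcode next P as I frame",
    ("B2", False, False): "keep B frame",
}

def skip_action(frame_type: str, scheduled_to_skip: bool, scd: float) -> str:
    # An I frame is all-INTRA, so its SCD is always above the 40% threshold
    # (footnote (b) of Table 1).
    scene_change = scd >= 40.0 or frame_type == "I"
    return SKIP_RULES[(frame_type, scheduled_to_skip, scene_change)]

print(skip_action("P", True, 55.0))    # transcode P as I frame, skip next BB frames
print(skip_action("B1", False, 10.0))  # keep B frame
```

Encoding the table as data rather than nested conditionals keeps the transcoder's skipping policy easy to audit against the paper's Table 1.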
5. Probabilistic modeling of channel behavior
As introduced before, the estimation of the effective
channel bandwidth is very important for the frame layer
rate control. In this section, we introduce a wireless
ARQ channel model and a related method for estimating
the effective channel bandwidth, which have been
extensively used for simulating burst-error wireless
channels in the literature, such as in (Aramvith et al.,
2001; Hsu et al., 1999).

5.1. Wireless channel model

From the video transmission point of view, when
ARQ is used, the channel becomes a variable bit-rate
channel whose throughput depends on the channel
conditions. When the channel becomes poor, the
retransmissions use up bandwidth and thus reduce the
effective channel rate (defined as the rate of the
information that is correctly transmitted). For the
purpose of simulation, a Markov chain was used to
model bursty errors during transmission, based on
collected network traffic traces. Previous studies show
that a first-order Markov chain, such as a two-state
Markov model or a finite-state
Fig. 7. Two-state Markov channel model.
Table 2
Summary of the transition matrices used in our experiments

Average packet error rate (%) | P00 | P01 | P10 | P11
Mean_Burst_Length = 18 packets, Packet Size = 40 bits
5  | 0.9971 | 0.0029 | 0.0556 | 0.9444
10 | 0.9938 | 0.0062 | 0.0556 | 0.9444
15 | 0.9902 | 0.0098 | 0.0556 | 0.9444
20 | 0.9861 | 0.0139 | 0.0556 | 0.9444
25 | 0.9815 | 0.0185 | 0.0556 | 0.9444
30 | 0.9762 | 0.0238 | 0.0556 | 0.9444
model, provides a good approximation for modeling
the error process at the packet level in wireless channels.
Using these models, one can dynamically estimate the
state of the channel and use this knowledge to help
design corresponding algorithms. At the same time, one
can generate artificial network traces for the network
under study and use the traces in simulations, and thus
better understand the performance of existing and new
protocols and applications.

In (Aramvith et al., 2001), a two-state Markov model
was used to estimate the channel states. In this work,
since we are only interested in the changes of the
effective channel rate due to retransmissions in ARQ, to
simplify the analysis we use a two-state Markov model,
similar to (Aramvith et al., 2001), to emulate the packet
error process. We do not consider the effect of other
factors, such as the reliability of the feedback channel,
the round-trip delay of the ACK/NAK feedbacks,
retransmission limits, etc.
In the channel model under study, the channel
switches between a "good state" and a "bad state", S0
and S1, as illustrated in Fig. 7. A packet is transmitted
correctly when the channel is in state S0, and errors
occur when the channel is in state S1. P_ij, for
i, j ∈ {0, 1}, are the transition probabilities. The
packet-error statistics vary according to the values of
the transition probabilities, which can be calculated
from the collected channel statistics (such as the average
packet-error-burst length and the packet error rate)
generated by the wireless channel simulator. The
channel state-transition probability matrix for this
channel model is

P = [P00 P01; P10 P11] = [1 − P01, P01; P10, 1 − P10]   (14)
The transition probabilities P01 and P10 can be derived
using the assumption of Gilbert's Markov model
(Gilbert, 1960). The run length of error bursts has a
geometric distribution with mean 1/P10, i.e.,

P10 = 1 / Mean_Burst_Length

The mean burst length statistics can be obtained from
the packet error pattern generated by the wireless
channel simulator. The average packet error rate is given
by

PER = P01 / (P01 + P10)

Therefore, P01 can be derived from the two parameters,
P10 and the packet error rate (PER), as

P01 = P10 · PER / (1 − PER)

In this work, we use different packet error rates to
generate several transition matrices, listed in Table 2.
We do not consider channel coding overhead, and define
the packet payload size as 40 bits and the mean burst
length as 18 packets. In our experiments, we use these
transition matrices to estimate the effective channel rate
as described in the following section.
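The derivation above reproduces the entries of Table 2. A minimal sketch (function name is ours; PER is passed as a fraction rather than a percentage):

```python
def transition_matrix(per: float, mean_burst_len: float):
    """Gilbert-model transition probabilities from the packet error rate
    and the mean error-burst length, both measured from channel traces."""
    p10 = 1.0 / mean_burst_len     # geometric burst length with mean 1/P10
    p01 = p10 * per / (1.0 - per)  # from PER = P01 / (P01 + P10)
    return [[1.0 - p01, p01],
            [p10, 1.0 - p10]]

P = transition_matrix(per=0.05, mean_burst_len=18.0)
print([[round(x, 4) for x in row] for row in P])
# -> [[0.9971, 0.0029], [0.0556, 0.9444]]  (the PER = 5% row of Table 2)
```

Note that P10 (and hence the mean burst length) is the same for every row of Table 2; only P01 changes with the target PER.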
5.2. Effective channel throughput of ARQ protocol
Based on the channel feedback information from the
client and the channel model introduced in Section 5.1,
the future effective channel rate can be estimated. From
the transition probability matrices and a given initial
state, the expected future channel throughput, i.e., the
probability of correct transmission of each of the next i
packets, can be calculated as in (Aramvith et al., 2001).
We use this information to adjust the target number of
bits in the video rate-control algorithm. In the following
discussion, all time periods are normalized by the time
to transmit one packet. If we do not consider the
round-trip delay of the ACK/NAK messages, then when
a packet is transmitted at time t, we immediately get the
feedback message, ACK or NAK, so the channel state
at time t, S(t), is known. Based on the transition
probability matrix P, we define the state-probability
vector at time k,

π(k | S(t) = S_n) = [π_0(k | S(t) = S_n), π_1(k | S(t) = S_n)],   n ∈ {0, 1}   (15)
as a row vector formed by the two state probabilities,
i.e., the probabilities for the channel to be in states S0
and S1 at time k, respectively, given that it was observed
to be in state S_n at time t. Note that t and k are discrete
values. The initial state probability π(t | S(t) = S_n) at
time t is set as

π_i(t | S(t) = S_j) = 1 when i = j, 0 otherwise,   i, j ∈ {0, 1}   (16)

The state probabilities at time k can be derived from the
state probabilities at time k − 1:

π(k | S(t) = S_n) = π(k − 1 | S(t) = S_n) · P   (17)

By recursively applying Eq. (17), the channel state
probabilities at time k, where k > t, can be calculated
from the initial state probability and the transition
probability matrix:

π(k | S(t) = S_n) = π(t | S(t) = S_n) · P^(k−t)   (18)

In this channel model, C bits are transmitted correctly
when the channel is in state S0 (where C is the packet
size), while 0 bits are transmitted when the channel is in
state S1. The expected channel rate E[C(k) | S(t)], given
the observation of the channel state S(t), can be
calculated as

E[C(k) | S(t)] = C · π_0(k | S(t))   (19)
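Eqs. (16)–(19) can be sketched as a short recursion. The helper below is illustrative (names are ours) and assumes the 40-bit packets of Table 2:

```python
def expected_rate(P, good_now: bool, steps: int, packet_bits: int = 40) -> float:
    """Expected correctly delivered bits for the packet sent `steps`
    packet-times from now, given the state observed now:
    pi(k) = pi(t) * P^(k-t), E = C * pi_0(k)  (Eqs. (16)-(19))."""
    pi = [1.0, 0.0] if good_now else [0.0, 1.0]   # Eq. (16): initial state vector
    for _ in range(steps):                        # Eq. (17), applied recursively
        pi = [pi[0] * P[0][0] + pi[1] * P[1][0],
              pi[0] * P[0][1] + pi[1] * P[1][1]]
    return packet_bits * pi[0]                    # Eq. (19)

P = [[0.9971, 0.0029], [0.0556, 0.9444]]          # PER = 5% row of Table 2
print(round(expected_rate(P, good_now=True, steps=1), 2))   # 40 * 0.9971 = 39.88
print(round(expected_rate(P, good_now=False, steps=1), 2))  # 40 * 0.0556 = 2.22
```

Summing this expectation over all packet slots in the next D frame intervals gives the effective bandwidth estimate R used by the frame layer rate control.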
6. Joint source-channel transcoding for rate adaptation
Combining the ideas of the previous sections, we
propose the joint source-channel rate adaptation
algorithm. In this section, we summarize the whole
process of the algorithm. The following notation is used:

B_GOP     target number of bits assigned to a GOP;
F         target frame rate in frames per second;
N         number of frames in a GOP;
N_P       number of uncoded P frames in a GOP;
N_B       number of uncoded B frames in a GOP;
B'_GOP    target number of bits left for the uncoded frames in a GOP;
B_GOP−P   target number of bits left for the uncoded P frames in a GOP;
B_GOP−B   target number of bits left for the uncoded B frames in a GOP;
B_P(i)    bit target for the ith P frame;
B_B(i)    bit target for the ith B frame.
The detailed algorithm is as follows:
Step 1: Determine the current channel state. We
calculate the ratio of the number of successfully
transmitted bits to the total number of transmitted bits
over the past L frame intervals. If the ratio is greater
than a threshold H, we decide that the channel is in the
good state. In our simulations, we empirically set L to
10 and H to 0.9.

Step 2: Calculate the estimated channel bandwidth.
Depending on the current channel state, we use the
Markov model to find the probability of correct
transmission in the next D frame intervals. Then we
calculate the estimated channel bandwidth, R, for the
next D frame intervals as in Eq. (19).
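Step 1's channel-state test can be sketched as follows. The helper is hypothetical (the paper does not specify how the per-interval bit counts are accumulated), using the paper's empirical settings L = 10 and H = 0.9:

```python
def channel_is_good(acked_bits, sent_bits, L: int = 10, H: float = 0.9) -> bool:
    """Step 1: decide the channel state from the last L frame intervals
    of ARQ feedback; acked_bits / sent_bits are per-interval totals."""
    recent_acked = sum(acked_bits[-L:])
    recent_sent = sum(sent_bits[-L:])
    return recent_sent > 0 and recent_acked / recent_sent > H

print(channel_is_good([640] * 10, [640] * 10))  # everything delivered -> True
print(channel_is_good([320] * 10, [640] * 10))  # only 50% delivered -> False
```

The resulting good/bad decision selects the initial state vector for the throughput estimate of Step 2.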
Step 3: Calculate the bit budget for a GOP. At the
beginning of every GOP, we first calculate the bit budget
for the GOP as in Eq. (11); in our experiments, we set N
to 30. At the beginning of a GOP, the bit budget for the
uncoded frames in the GOP, B'_GOP, is set equal to
B_GOP.
Step 4: Calculate the bit budget for every frame
within a GOP according to the frame types, scene
context and source coding rate. If the incoming frame
will be transcoded as an INTER frame, the frame bit
target is calculated according to the source coding rate.
Since the original source rate and frame pattern are
known when the original video is coded, the frame bit
target can be calculated by applying the rules of Section
4.3. Similarly to Eqs. (12) and (13), we have

B_GOP−P = 2·N_P·B'_GOP / (2·N_P + N_B),   B_GOP−B = N_B·B'_GOP / (2·N_P + N_B)   (20)

B_P(i) = [b_P(i) / Σ_{j=i}^{N_P−1} b_P(j)] · B_GOP−P,   B_B(i) = [b_B(i) / Σ_{j=i}^{N_B−1} b_B(j)] · B_GOP−B   (21)

where B'_GOP is the bit budget left for the uncoded
frames of the GOP, N_P and N_B are the numbers of
uncoded P and B frames, and b_P(j) and b_B(j) are the
source bits of the jth remaining P and B frames in the
precoded video.
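As a rough sketch of Eqs. (20) and (21): the remaining GOP budget is split so that a B frame receives half a P frame's share on average, and each frame is then weighted by its source size. Helper names are ours; in the real algorithm the sums run over the still-uncoded frames and are recomputed after every frame:

```python
def frame_budgets(b_src_p, b_src_b, budget_left):
    """Eqs. (20)-(21): split the remaining GOP budget between the uncoded
    P and B frames, then weight each frame by its size in the pre-encoded
    stream. b_src_p / b_src_b: source bits of the uncoded P / B frames."""
    np_, nb = len(b_src_p), len(b_src_b)
    b_gop_p = 2 * np_ * budget_left / (2 * np_ + nb)   # Eq. (20)
    b_gop_b = nb * budget_left / (2 * np_ + nb)
    targets_p = [b / sum(b_src_p) * b_gop_p for b in b_src_p]  # Eq. (21)
    targets_b = [b / sum(b_src_b) * b_gop_b for b in b_src_b]
    return targets_p, targets_b

# 2 uncoded P frames and 2 uncoded B frames, 6000 bits left in the GOP
tp, tb = frame_budgets([3000, 1000], [500, 500], 6000.0)
print(tp)  # P frames share 4000 bits: [3000.0, 1000.0]
print(tb)  # B frames share 2000 bits: [1000.0, 1000.0]
```

The example shows how a large source frame (a high-activity scene) automatically receives a proportionally larger transcoding budget.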
Step 5: Adjust the frame bit budget. Using Eq. (9),
the frame bit budget is adjusted considering the buffer
and end-to-end delay constraints.

Step 6: Transcode the incoming frame. In this work,
the TMN-8 macroblock layer bit allocation algorithm is
adopted to calculate the quantization parameter of
every MB.

Step 7: Update. The transcoder and decoder buffer
occupancies are updated as in Eqs. (5) and (6). Other
parameters, such as the remaining GOP budget B'_GOP
and the counts N_P and N_B, are also updated. Eq. (10)
is evaluated to see whether the total number of frames
stored in the transcoder and decoder buffers is greater
than the maximum end-to-end delay, D. If more than D
frames are stored in the system, the next frame is
scheduled to be skipped; the action is decided according
to Table 1. If there are more frames that need to be
transcoded, go to Step 1; otherwise, stop.
7. Experimental results and conclusions
In this work, we implemented the rate adaptation
transcoder and the proposed algorithm based on the
public domain software for H.263 (TMN8, 1997). In our
experiments, the test sequence of Fig. 3 is encoded using
a unified quantization parameter, QP = 5, for every
frame at a frame rate of 10 fps. Only the first frame of
the sequence is encoded as an I frame; all other frames
are encoded as B or P frames. This sequence is created
by cascading several test sequences with different scene
content, and it serves as a high quality pre-encoded
video. In order to simulate the wireless channel with the
ARQ protocol, we use a random number generator to
generate a random number r, uniformly distributed in
[0, 1]. Based on the Markov model and the transition
matrices introduced in Section 5.1, if the current state is
S0 and r is less than P01, a transition from state S0 to S1
occurs. If the current state is S1 and r is less than P10, a
transition from S1 to S0 occurs. In this way, we generate
six packet-level channel error traces, each 50,000 packets
long. Every trace represents a possible channel with a
specific packet error rate. In the transcoder, the
proposed joint source-channel adaptive transcoding
scheme is used to transcode the original video according
to the estimated channel conditions. We assume the
channel bandwidth for the video payload is 64, 48 or 32
kbps. Based on the channel feedback and the same
channel model that is used to generate the error trace,
the transcoder estimates the effective channel bandwidth
after every frame interval. We simulate the channel
coder, transmitter and decoder buffer. Whether a video
packet is transmitted successfully to the decoder is
determined by the error trace. A simplified simulation
scenario is shown in Fig. 8.

Fig. 8. System diagram of the proposed ARQ-based transcoding and streaming scheme.
In order to compare the video quality, we also encode
the source video sequence at the same rates of 64, 48 and
32 kbps using the TMN8 rate control scheme. At the
frame layer, TMN8 allocates a near constant bit budget
to every frame without considering the frame types and
scene context. A frame is skipped if the number of bits
accumulated in the buffer after encoding the previous
frame is greater than a threshold. To get a fair
comparison, we change this threshold to allow a longer
end-to-end delay, so that there is no frame skipping in
the encoded video.

The visual quality is measured by calculating the peak
signal-to-noise ratio (PSNR) of the transcoded video
and the encoded video. The PSNR comparison for
different effective channel rates and end-to-end delays is
illustrated in Fig. 9. As shown in Fig. 9, by controlling
the number of bits assigned to each frame according to
the frame types, we obtain a higher PSNR than by
encoding the original video at the same low channel
bandwidth and maximum end-to-end delay. In
particular, when a scene change happens at the 100th
frame, because the INTER frame at the scene change is
transcoded to an I frame, the visual quality of the
following frames is much higher than in the encoded
version.
We also measure the time when every frame arrives at
the decoder buffer and is displayed. The comparison is
illustrated in Fig. 10; the right side shows the result of
the encoded video being transmitted over the same
channel. As we can see, with rate adaptation
transcoding, the arrival time of every frame is very close
to its display time, so the decoder does not have to
suspend decoding and displaying for a long time.
However, if the test sequence is encoded with a low bit
rate (64, 48 or 32 kbps) as the target bit rate, since the
effective channel rate changes over time, after several
frames the frame arrival time will be later than its display
Fig. 9. Video quality comparison of transcoded video and encoded video at different channel bandwidth and end-to-end delay.
time. Therefore, the decoder has to wait for the next
frame to arrive.

Fig. 11 shows the relation among the average PSNR
of the whole sequence, the end-to-end delay and the
target channel rate. As we can see, for the same target
channel rate, when the end-to-end delay is relaxed, the
average PSNR increases. Therefore, the proposed
transcoding scheme is able to take full advantage of
Fig. 10. Comparison of frame arrival time and scheduled display time.
the end-to-end delay. However, if we just encode the test
sequence at the target channel rate, the video quality is
fixed after encoding: no matter what the end-to-end
delay is, the quality of the decoded video sequence does
not change.

In the experimental results shown above, only the
channel error trace with an average PER of 5% is used.
Fig. 12 shows the relation between the packet error rate
and the average PSNR at the same end-to-end delay.
From the experimental results, we can see that rate
adaptation transcoding is an effective solution for
streaming high quality pre-encoded video through low
bandwidth wireless channels. Due to its finer control of
the transcoded frame bit budget, considering the source
video content, channel conditions and end-to-end delay,
it is very flexible for delivering video services and
applications over wireless channels.
Fig. 11. Average PSNR vs. end-to-end delay and channel bandwidth.

Fig. 12. Average PSNR vs. average PER and channel bandwidth.
References
Aramvith, S., Pao, I.-M., Sun, M.-T., 2001. A rate-control scheme for
video transport over wireless channels. IEEE Transactions on
Circuits and Systems for Video Technology 11 (5).
Assunção, P., Ghanbari, M., 1996. Post-processing of MPEG-2 coded
video for transmission at lower bit rates. In: Proceedings of IEEE
International Conference on Acoustics, Speech, and Signal Pro-
cessing, ICASSP’96, vol. 4, May 1996.
Assunção, P., Ghanbari, M., 1997. Optimal transcoding of compressed
video. In: IEEE International Conference on Image Processing,
ICIP’97, vol.1, USA, October 1997, pp. 739–742.
Assunção, P., Ghanbari, M., 1997. Congestion control of video traffic
with transcoders. In: IEEE International Conference on Commu-
nications, ICC’97, Montreal, 1997.
Assunção, P., Ghanbari, M., 1998. A frequency-domain video
transcoder for dynamic bit-rate reduction of MPEG-2 bit streams.
IEEE Transactions on Circuits and Systems for Video Technology
8 (8).
Assunção, P., Ghanbari, M., 2000. Buffer analysis and control in CBR
video transcoding. IEEE Transactions on Circuits and Systems for
Video Technology 10 (1).
Chang, S.-F., Messerschmitt, D.G., 1993. A new approach to decoding
and composting motion compensated DCT-based images. In:
Proceeding of IEEE International Conference on Acoustics,
Speech, and Signal Processing, ICASSP’93, vol. 5, April 1993.
Chen, M.-J., Chu, M.-C., Pan, C.-W., 2002. Efficient motion-estima-
tion algorithm for reduced frame-rate video transcoder. IEEE
Transactions on Circuits and Systems for Video Technology
12 (4).
Cheng, N.T., Kingsbury, N.G., 1992. The EREC: an efficient error
resilient technique for encoding positional information on sparse
data. IEEE Transactions on Communications 40 (Jan.).
Chiang, T., Zhang, Y.-Q., 1997. A new rate control scheme using
quadratic rate distortion model. IEEE Transactions on Circuits
and Systems for Video Technology 7 (1), 246–250.
Choi, J., Park, D., 1994. A stable feedback control of the buffer state
using the controlled Langrange multiplier method. IEEE Transac-
tions on Image Processing 3 (Sept.), 546–558.
Corbera, J.R., Lei, S., 1999. Rate control in DCT video coding for
low-delay communications. IEEE Transaction on Circuit and
System for Video Technology 9 (1).
Ding, W., Liu, B., 1996. Rate control of MPEG video coding and
recording by rate-quantization modeling. IEEE Transactions on
Circuits and Systems for Video Technology 6 (1), 12–20.
Dogan, S., Cellatoglu, A., Sadka, A.H., Kondoz, A.M., 2001. Error-
resilient MPEG-4 video transcoder for bit rate regulation. In:
Proceedings of the Fifth World Multi-Conference on Systemics,
Cybernetics and Informatics (SCI’2001), vol. XII, Part. II,
Orlando, Florida, USA, 22–25 July 2001, pp. 312–317.
Dogan, S., Sadka, A.H., Kondoz, A.M., 2001. MPEG-4 video
transcoder for mobile multimedia traffic planning. In: Proceedings
of the IEE Second International Conference on 3G Mobile
Communication Technologies (3G’2001), No. 477, London, UK,
26–28 March 2001, pp. 109–113.
Fung, K.-T., Chan, Y.-L., Siu, W.-C., 2001. Dynamic frame skipping
for high-performance transcoding. In: Proceedings of the Interna-
tional Conference on Image Processing (ICIP2001).
Ghanbari, M., 1989. Two-layer coding of video signals for VBR
networks. IEEE Journal on Selected Areas in Communications 7
(June).
Gilbert, E.N., 1960. Capacity of a burst-noise channel. The Bell
System Technical Journal 39 (Sept.), 1253–1265.
Hang, H.-M., Chen, J.-J., 1997. Source model for transform video
coder and its application––Part I: fundamental theory. IEEE
Transactions on Circuits and System for Video Technology 7 (2),
287–298.
He, Z., Kim Yong, K., Mitra, S.K., 2001. Low-delay rate control for
DCT video coding via q––domain source modeling. IEEE Trans-
action on Circuits and Systems for Video Technology 11 (8).
Hsu, C.-Y., Ortega, A., Khansari, M., 1999. Rate control for robust
video transmission over burst-error wireless channels. IEEE
Journal on Selected Areas in Communications 17 (5).
Hwang, J.-N., Wu, T.-D., Lin, C.-W., 1998. Dynamic frame skipping
in video transcoding. In: Proceedings of IEEE Workshop on
Multimedia Signal Processing, USA, December 1998.
Keesman, G., Hellinghuizen, R., Hoeksema, F., Heideman, G., 1996.
Transcoding of MPEG bitstreams. Signal Processing: Image
Communication, 481–500.
Khansari, M., Jalali, A., Dubois, E., Mermelstein, P., 1996. Low bit-
rate video transmission over fading channels for wireless microcel-
lular systems. IEEE Transactions on Circuits and Systems for
Video Technology 6 (1).
Lei, Z., Georganas, N.D., 2002. Rate adaptation transcoding for
precoded video streams. In: Proceedings of ACMMultimedia 2002,
Juan-les-Pins, France, December 1–6, 2002.
Lin, C.-W., 2000. Video transcoding techniques for multipoint video
conferencing. Ph.D. dissertation, Department of Electrical Engi-
neering, National Tsing Hua University, January 2000.
270 Z. Lei, N.D. Georganas / The Journal of Systems and Software 75 (2005) 253–270
Lin, C.-W., Chen, Y.-C., Sun, M.-T., 2000. Dynamic region of interest
transcoding for multipoint video conferencing. In: Proceedings of
International Computer Symposium and Workshop on Computer
Networks, Internet, and Multimedia, Chiayi, Taiwan, December
2000.
Lin, C.-W., Liou, T.-J., Chen, Y.-C., 2000. Dynamic rate control in
multipoint video transcoding. In: IEEE International Symposium
on Circuits and Systems, ISCAS 2000, Geneva, Switzerland, May
2000.
Liu, H., Zarki, M., 1998. Adaptive source rate control for real-time
wireless video transmission. Mobile Networks and Applications 3
(1).
Puri, R., Lee, K.-W., Ramchandran, K., Bharghavan, V., 2001. An
integrated source transcoding and congestion control paradigm for
video streaming in the Internet. IEEE Transactions on Multimedia
3 (1).
Ramchandran, K., Ortega, A., Vetterli, M., 1994. Bit allocation for
dependent quantization with application to multiresolutions and
MPEG video coders. IEEE Transaction on Image Processing 2
(September).
Redmill, D.W., Kingsbury, N.G., 1996. The EREC: an error resilient
technique for coding variable length blocks of data. IEEE
Transactions on Image Processing 5 (April), 6.
Reyes, G., Reibman, A.R., Chang, S.-F., Chuang, J.C., 2000. Error-
resilient transcoding for video over wireless channels. IEEE Journal
on Selected Areas in Communications 18 (6).
Shanableh, T., Ghanbari, M., 2000. Heterogeneous video transcoding
to lower spatio-temporal resolutions and different encoding
formats. IEEE Transactions on Multimedia 2 (2).
Shen, B., Sethi, I., Vasudev, B., 1999. Adaptive motion-vector
resampling for compressed video downscaling. IEEE Transactions
on Circuits and Streams for Video Technology 9 (6).
Sun, H., Kwok, W., Zdopski, J.W., 1996. Architecture for MPEG
compressed bitstream scaling. IEEE Transactions on Circuits and
Systems for Video Technology 6 (Apr).
Tao, B., Dickinson, B.W., Peterson, H.A., 2000. Adaptive model-
driven bit allocation for MPEG video coding. IEEE Transactions
on Circuits and Systems for Video Technology 10 (1), 147–157.
Video Codec Test Model, TMN8, ITU-T/SG-15, 1997. Available from
<http://www.ece.ubc.ca/spmg/h263plus/h263plus.html>.
Wang, Y., Zhu, Q.-F., 1998. Error control and concealment for video
communication: a review. Proceedings of the IEEE 86 (5).
Wu, S.-W., Gersho, A., 1991. Rate-constrained optimal block-adap-
tive coding for digital tape recording of HDTV. IEEE Transactions
on Circuits and Systems for Video Technology 1 (March), 100–112.
Youn, J., Sun, M.T., 1999. A fast motion vector composition method
for temporal transcoding. In: Proceedings IEEE International
Symposium on Circuits and Systems (ISCAS’99), May 1999.
Zhu, Q.-F., Keofsky, L., Garrison, M.B., 1999. Low-delay, low-
complexity rate reduction and continuous presence for multipoint
videoconferencing. IEEE Transactions on Circuits and Systems for
Video Technology 9 (4).
Zhijun Lei received his B.E. degree in Computer Engineering from Dalian University of Technology in 1996, his M.E. degree in Computer Engineering from Beijing University of Posts and Telecommunications in 1999, and his Ph.D. in Computer Science from the University of Ottawa in 2003. His research interests are in the fields of digital video coding and transcoding, rate control, wireless video communications, etc. Dr. Lei is a member of IEEE.
Nicolas D. Georganas, OOnt, FIEEE, FRSC, FCAE, FEIC, is Distinguished University Professor and Canada Research Chair in Information Technology at the School of Information Technology and Engineering, University of Ottawa.
He received the Dipl. Ing. degree in Electrical Engineering from the National Technical University of Athens, Greece, in 1966 and the Ph.D. in Electrical Engineering (Summa cum Laude) from the University of Ottawa in 1970.
He has published over 300 technical papers and is co-author of the book "Queueing Networks-Exact Computational Algorithms: A Unified Theory by Decomposition and Aggregation", MIT Press, 1989. He has received research grants and contracts totaling more than $51 million and has supervised more than 175 researchers, among which 90 graduate students (25 Ph.D., 65 M.A.Sc.) and 18 PostDocs.
In 1990, he was elected Fellow of IEEE. In 1994, he was elected Fellow of the Engineering Institute of Canada. In 1995, he was co-recipient of the IEEE INFOCOM'95 Prize Paper Award. In 1997, he was inducted as Fellow in the Canadian Academy of Engineering and Fellow of the Royal Society of Canada. In 1998, he was selected as the University of Ottawa Researcher of the Year and also received the University 150th Anniversary Medal for Research. In 1999, he was awarded the Thomas W. Eadie Medal of the Royal Society of Canada, funded by Bell Canada, for his contributions to Canadian and international telecommunications. In 2000, he received the A.G.L. McNaughton Gold Medal and Award for 1999-2000, the highest distinction of IEEE Canada; the Julian C. Smith Medal of the Engineering Institute of Canada; the OCRI President's Award (jointly with Dr. Samy Mahmoud) for the creation of the National Capital Institute of Telecommunications (NCIT); the Bell Canada Forum Award from the Corporate-Higher Education Forum; the Researcher Achievement Award from the TeleLearning Network of Centres of Excellence; and a Canada Research Chair in Information Technology. In 2001, he was appointed Distinguished University Professor of the University of Ottawa and also received the Order of Ontario, the province's highest and most prestigious honour. In 2002, he received the Killam Prize for Engineering, Canada's most distinguished award for outstanding career achievements.