A Uniﬁed Trafﬁc Model for MPEG-4 and H.264 Video Tracesirl.cs.tamu.edu/people/min/papers/infocom2005-tr.pdfStar Wars IV [30] FGS 30 MPEG-4 Citizen Kane [30] Temporal 30 MPEG-4

1

A Unified Traffic Model forMPEG-4 and H.264 Video Traces

Min Dai and Dmitri Loguinov

Abstract— This paper presents a frame-level hybrid frameworkfor modeling MPEG-4 and H.264 multi-layer variable bit rate(VBR) video traffic. To accurately capture long-range dependentand short-range dependent properties of VBR sequences, we usewavelets to model the distribution of I-frame sizes and a simpletime-domain model for P/B frame sizes. However, unlike previousstudies, we analyze and successfully model both inter-GOP(Group of Pictures) and intra-GOP correlation in VBR video andbuild an enhancement-layer model using cross-layer correlation.Simulation results demonstrate that our model effectively pre-serves the temporal burstiness and captures important statisticalfeatures (e.g., the autocorrelation function and the frame-sizedistribution) of original traffic. We also show that our modelpossesses lower complexity and has better performance than theprevious methods in both single and multi-layer sequences.

I. INTRODUCTION

Video traffic modeling plays an important role in the char-acterization and analysis of network traffic. Besides providingan insight into the coding process and structure of videosequences, traffic models can be used for many practicalpurposes including allocation of network resources, designof efficient networks for streaming services, and delivery ofcertain Quality of Service (QoS) guarantees to end users [27].

Although many studies have been conducted in this area,most existing traffic models only apply to single-layer VBRvideo and often overlook the multi-layer aspects of streamingtraffic in the current Internet [2], [40]. In addition, trafficmodeling research is falling behind the rapid advances in videotechniques since very limited research has been done to modelH.264 video sequences [6]. Therefore, the goal of this work isto better understand the statistical properties of various videosequences and to develop a unified model for single and multi-layer MPEG-4 and H.264 video traffic.

A good traffic model should capture the characteristics ofvideo sequences and accurately predict network performance(e.g., buffer overflow probabilities and packet loss). Amongthe various characteristics of video traffic, there are twomajor interests: (1) the distribution of frame sizes; and (2)the autocorrelation function (ACF) that captures commondependencies between frame sizes in VBR video. In regardto the first issue, several models have been proposed for theframe-size distribution, including the lognormal [10], Gamma[34], and various hybrid distributions (e.g., Gamma/Pareto [23]or Gamma/lognormal [32]).

Furthermore, compared to the task of fitting a model tothe frame-size distribution, capturing the ACF structure of

An earlier version of this paper appeared in IEEE INFOCOM 2005.Min Dai and Dmitri Loguinov are with Texas A&M University, College

Station, TX 77843 USA (email: [email protected], [email protected]).

VBR video traffic is more challenging due to the fact thatVBR traces exhibit both long-range dependent (LRD) andshort-range dependent (SRD) properties [12], [24]. The co-existence of SRD and LRD indicates that the ACF structureof video traffic is similar to that of SRD processes at smalltime lags and to that of LRD processes at large time lags[12]. Thus, using either an LRD or SRD model alone does notprovide satisfactory results. Many studies have been conductedto address this problem, but only a few of them have managedto model the complicated LRD/SRD ACF structure of realvideo traffic (e.g., [23], [24]).

Besides the complex autocorrelation properties, video trafficalso exhibits inter- and intra-GOP1 correlation due to theGOP-based coding structure of many popular standards. Whilethe former is well characterized by the ACF of the I-framesof each GOP, the latter refers to the correlation betweenP/B-frames and the I-frame in the same GOP. Whereas mostmodels try to capture the inter-GOP correlation, the intra-GOPcorrelation has been rarely addressed in related work eventhough it is an important characteristic useful in computingprecise bounds on network packet loss [21].

In this paper, we develop a modeling framework that is ableto capture the complex LRD/SRD structure of single/multi-layer video traffic, while addressing the issues of bothinter/intra-GOP and cross-layer correlation. We model I-framesizes in the wavelet domain using estimated wavelet coef-ficients, which are more mathematically tractable than theactual coefficients. After a thorough analysis of intra-GOPcorrelation, we generate synthetic P-frame traffic using a time-domain linear model of the preceding I-frame to preserve theintra-GOP correlation. We use a similar model to preservethe cross-layer correlation in multi-layer video sequences andshow that the performance of the resulting model is better thanthat of prior methods.

The specifics of the sample sequences used in this paper areshown in Table I. As shown in the table, single-layer trafficincludes both MPEG-4 and H.264 video sequences and multi-layer traffic includes both layer-coded and multiple descriptioncoded (MDC) sequences. Non-MDC sequences are one-hourlong and have GOP structure IBBPBBPBBPBB. The length ofthe sample MDC-coded sequence is varying since temporalsubsampling is applied. More details about MDC coding isgiven in Section VI. Further note that sequences coded fromthe same video but with different quantization steps are notrepetitively listed. For example, we only show JurassicPark I once in Table I, while this paper uses several single-

1A GOP includes one I-frame and several P- and B-frames.

2

TABLE ISOME CHARACTERISTICS OF SAMPLE SEQUENCES

Sequence name Scalability Rate (fps) StandardStarship Troopers [30] None 25 H.264Star Wars IV [8] None 25 MPEG-4Jurassic Park I [8] None 25 MPEG-4Starship Troopers [8] None 25 MPEG-4Star Trek - First Contact [8] None 25 MPEG-4The Silence of the Lambs [8] None 25 MPEG-4Bridge [30] MDC 25 MPEG-4Star Wars IV [30] FGS 30 MPEG-4Citizen Kane [30] Temporal 30 MPEG-4The Silence of the Lambs [30] Spatial 30 MPEG-4

layer Jurassic Park I sequences that are coded withdifferent quantization steps.

The outline of this paper is as follows. Section II overviewsthe related work on traffic modeling. In Section III, we providethe technical background knowledge on wavelet analysis andanalyze some statistical properties of wavelet coefficients (i.e.,the coefficients generated by the wavelet transform). In SectionIV, we show how to model single-layer and base-layer videotraffic by generating synthetic I-trace in the wavelet domainand P/B traces with a linear I-trace model. Sections V and VIanalyze and model the cross-layer correlation in layer-codedand MDC-coded video traces, respectively. In Section VII, weevaluate the accuracy of our model using both single-layer andmulti-layer video traffic. Section VIII concludes the paper.

II. RELATED WORK

The topic of VBR traffic modeling has been extensivelystudied and a variety of models have been proposed in theliterature. In this section, we briefly overview related work onsingle-layer and multi-layer traffic models.

A. Single-Layer Models

According to the dominant stochastic method applied ineach model, we group existing single-layer models into severalcategories and present the main results of each group below.

We first discuss auto-regressive (AR) models, since theyare classical approaches in the area of traffic modeling. Afterthe first AR model was applied to video traffic in 1988 [25],AR processes and their variations remain highly popular inthis area of research [23]. For example, Corte et al. [4] use alinear combination of two AR(1) processes to model the ACFof the original video traffic, in which one AR(1) process isused for modeling small lags and the other one for large lags.Since using a single AR process is generally preferred, Krunzet al. [10] model the deviation of I-frame sizes from their meanin each scene using an AR(2) process. Building upon Krunz’work [10], Liu et al. [23] propose a nested AR(2) model,which uses a second AR(2) process to model the mean frame-size of each scene. In both cases, scene changes are detectedand scene length is modeled as a geometrically distributedrandom variable. In [14], Heyman et al. propose a discreteautoregressive (DAR) model to model videoconferencing data.

Since the DAR model is not effective for single-source videotraffic, Heyman [16] later develops a GBAR model, which hasGamma-distributed marginal statistics and a geometric auto-correlation function. By considering the GOP cyclic structureof video traffic, Frey et al. [9] extend the GBAR model in[16] to the GOP-GBAR model.

The second category consists of Markov-modulated models,which employ Markov chains to create other processes (e.g.,the Bernoulli process [20], AR process [3]). Rose [33] usesnested Markov chains to model GOP sizes. Since syntheticdata are generated at the GOP level, this model actuallycoarsens the time scale and thus is not suitable for high-speed networks. Ramamurthy and Sengupta [29] propose ahierarchical video traffic model, which uses a Markov chainto capture scene change and two AR processes to match theautocorrelation function in short and long range, respectively.Extending the above work, Chen et al. [3] use a doublyMarkov modulated punctured AR model, in which a nestedMarkov process describes the transition between the differentstates and an AR process describes the frame size at each state.The computation complexity of this method is quite high dueto the combination of a doubly Markov model and an ARprocess. Sarkar et al. [34] propose two Markov-modulatedGamma-based algorithms. At each state of the Markov chain,the sizes of I, P, and B-frames are generated as Gamma-distributed random variables with different sets of parameters.Although Markov-modulated models can capture the LRD ofvideo traffic, it is usually difficult to accurately define andsegment video sources into the different states in the timedomain due to the dynamic nature of video traffic [24].

We classify self-similar processes and fractal models asthe third category. Garrett et al. [12] propose a fractionalARIMA (Autoregressive Integrated Moving Average) modelto replicate the LRD properties of compressed sequences, butdo not provide an explicit model for the SRD structure of videotraffic. Using the results of [12], Huang et al. [17] present aself-similar fractal traffic model; however, this model does notcapture the multi-timescale variations in video traffic [10].

Other approaches include the M/G/∞ process [11] andTransform-Expand-Sample (TES) based models [26]. Theformer creates SRD traffic [23] and the latter has high compu-tational complexity and often requires special software (e.g.,TEStool) to generate synthetic sequences. Different from theabove time-domain methods, several wavelet models [24], [31]recently emerged due to their ability to accurately capture bothLRD and SRD properties of video traffic [24].

B. Multi-Layer Models

Most traffic modeling studies focus on single-layer videotraffic and much less work has been done to model multi-layer sequences. Ismail et al. [18] use a TES-based methodto model VBR MPEG video that has two levels of priority,which might be considered the first multi-layer traffic model.Later, Chandra et al. [2] use a finite-state Markov chain tomodel one- and two-layer scalable video traffic. They assumethat only one I-frame exists in the whole video sequence andthe I-frame size is simply a Gaussian random variable. The

3

model clusters P-frame sizes into K states according to thecorrelation between successive P-frame sizes and uses a first-order AR process to model the frame size in each state. Thegoal of [2] is to model one or two-layer video traffic with aCBR base layer, while many multi-layer video sequences havemore than two layers and the base-layer is VBR.

Similarly to the work in [2], Zhao et al. [40] build a K-stateMarkov chain based on frame-size clusters. The clusteringfeature in [40] is the cross-layer correlation between the framesize of the base layer and that of the enhancement layer at thesame frame index. In each state of the Markov chain, the baseand the enhancement-layer frame sizes follow a multivariatenormal distribution. However, the computational cost of thehierarchical clustering approach applied in [40] limits itsapplication only to short video sequences. Furthermore, inboth [2] and [40], there is no general method for choosingthe optimal number of states and the necessary parameters areoften selected empirically.

III. WAVELET ANALYSIS

The wavelet transform has become a powerful technique inthe area of traffic modeling [24]. Wavelet analysis is typicallybased on a decomposition of the signal using a family of basisfunctions, which includes a high-pass wavelet function anda low-pass scaling filter. The former generates the detailedcoefficients, while the latter produces the approximation coef-ficients of the original signal.

In order to better understand the structure of wavelet co-efficients, we investigate statistical properties of both detailedand approximation coefficients in this section.

A. Detailed Coefficients

For discussion convenience, we define {Aj} to be therandom process modeling approximation coefficients Ak

j and{Dj} to be the process modeling detailed coefficients Dk

j

at the wavelet decomposition level j, where k is the spatiallocation of Ak

j and Dkj . We also assume that j = J is the

coarsest scale and j = 0 is the original signal.One big advantage of the wavelet transform is its ability to

reduce the correlation of the original signal, which is capturedby the covariance function of the detailed coefficients.

Proposition 1: The detailed coefficients of an LRD processpossess SRD properties.

Proof: Assume that {X(t)} is an LRD process withspectral density function Γ(ν) ∼ c|ν|−α, where c > 0 and0 < α < 1.

The covariance function of any two detailed coefficients Dkj

and Dk′j′ of {X(t)} is:

cov[Dkj , Dk′

j′ ] = E[Dkj Dk′

j′ ]− E[Dkj ]E[Dk′

j′ ], (1)

where j, j′ are the decomposition levels and k, k′ show thesample locations. Note that the high-pass nature of waveletfunction leads to E[Dk

j ] = E[Dk′j′ ] = 0.

Furthermore, Tewfik and Kim [37] show that E[Dkj Dk′

j′ ]decreases hyperbolically with the distance between the two

wavelet coefficients as |2jk − 2j′k′|2H−2N , where H is theHurst parameter and α = 2H − 1. Thus, (1) becomes

cov[Dkj , Dk′

j′ ] = E[Dkj Dk′

j′ ]

= |2jk − 2j′k′|α−1−2N , (2)

where α is the parameter in the spectral density function of{X(t)}, N is the vanishing moment2 of mother wavelet ψ0,and term |2jk − 2j′k′| is the shortest distance between twodetailed coefficients. Note that |2jk− 2j′k′| is always greaterthan 1.

For further analysis, we rewrite the last step in (2) as|2jk − 2j′k′|−(2N+1−α). Since N ≥ 1 and α ∈ [0, 1), itfollows that N ≥ α/2 and (2N + 1 − α) > 1. Therefore,|2jk−2j′k′|−(2N+1−α) converges to zero and

∑∞k=−∞ |2jk−

2j′k′|−(2N+1−α) converges to a constant. Recall that a processis short-range dependent if the sum of its autocorrelation3

function γ(k) is summable, i.e.,∑∞

k=−∞ γ(k) is finite. Thus,the detailed coefficients are short-range dependent.

B. Approximation Coefficients

After analyzing the detailed coefficients, we next examinethe autocorrelation function and the marginal distributionof the approximation coefficients. We use the Haar wavelettransform as a typical example since it is often chosen for itssimplicity and good performance [24], [31]. Recall that theHaar scaling and wavelet functions are, respectively:

ϕ(t) =

{1 0 ≤ t < 10 otherwise

, ψ(t) =

1 0 ≤ t < 1/2−1 1/2 ≤ t < 1

0 otherwise.

Then the Haar approximation coefficients Akj are obtained via

[31]:Ak

j = 2−1/2(A2kj−1 + A2k+1

j−1 ), (3)

where j is the decomposition level and k is the index ofa process. Since video traffic possesses strong self-similarity[27], we have the following proposition.

Proposition 2: The Haar approximation coefficients of aself-similar process preserve the correlation structure of theoriginal signal.

Proof: Recall that the aggregated process {X(m)} of aself-similar process {X(t)} at aggregation level m is,

X(m)(i) =1m

mi∑

t=m(i−1)+1

X(t). (4)

Comparing (4) to (3), one can observe that {Aj} is aweighted aggregated process {X(2j)} of the original signal{X(t)}. Since that a self-similar process and its aggregatedprocess have the same autocorrelation structure [27], theproposition follows.

We apply the wavelet transform to the sizes of I-framesin sample sequences and examine the statistical properties

2A wavelet has vanishing moments of order m ifR∞−∞ tpψ(t)dt = 0,

where p = 0, · · · , m− 1 [13].3In the traffic modeling literature, the normalized auto-covariance function

is often used instead of the autocorrelation function [23].

4

of the detailed and approximation coefficients. In Fig. 1 (a),we show the ACF of processes {A3} and {D3} computedbased on the I-frame sizes in single-layer Star Wars IVusing Haar wavelets (labeled as “ACF detailed” and “ACFapprox”, respectively). As shown in the figure, the ACF of{D3}, which is a typical example of detailed coefficients, isalmost zero at non-zero lags, which means that it is an i.i.d.(uncorrelated) noise. This explains why previous literaturecommonly models detailed coefficients {Dj} as zero-meani.i.d. Gaussian variables [24]. Fig. 1 (a) also shows that theapproximation coefficients {Aj} have a slower decaying ACFcompared to that of the detailed coefficients, which impliesthat they cannot be modeled as i.i.d. random variables.

We further examine the marginal distribution of the originalsignal and the approximation coefficients using sample se-quences. As a typical example, we illustrate the distribution ofthe approximation coefficients {A3} and that of {A0} (originalI-frame sizes) of single-layer Star Wars IV in Fig. 1 (b).Although the approximation coefficients are often modeled asGaussian distributed random variables according to the CentralLimit Theorem [24], Fig. 1 (b) shows that the symmetricalGaussian distribution might not be able to describe the heavytail of the actual PDF of {A3}.

Recalling that I-frame sizes {A0} frequently follow aGamma distribution [32], we next examine the relationship be-tween {A0} and the approximation coefficients {Aj , j > 0} invarious sequences with the help of the following lemma. Notethat {Aj} is a random process Aj = (A1

j , A2j , · · · , Ak

j , · · · )and Ak

j is a random variable.Lemma 1: Given that the I-frame sizes follow a Gamma

distribution, the approximation coefficients Akj , j ≥ 1 follow

a linear combination of Gamma distributions.Proof: For brevity, we only derive the distribution of Ak

1

and note that the derivations for Akj , j ≥ 2 are very similar.

According to (3), each value of Ak1 is a linear summation of

the sizes of two neighboring I-frames, which we denote byXk

1 and Xk2 , respectively. Notice that Xk

1 and Xk2 are two

correlated Gamma-distributed random variables. Then,

Ak1 = 2−1/2(Xk

1 + Xk2 ), (5)

where Xki ∼ Gamma(αi, λi), i = 1, 2. We can rewrite Xk

i inthe form of a standard Gamma distribution:

Xk1 = λ1Y1, (6)

Xk2 = λ2Y2, (7)

where Yi ∼ Gamma(αi, 1) are two standard Gamma randomvariables.

Next, to capture the correlation between Xk1 and Xk

2 , wefurther decompose Y1 and Y2 into a sum of two independentstandard Gamma random variables using the decompositionproperties of standard Gamma distributions [9]:

Y1 = Y11 + Y12, (8)Y2 = Y12 + Y22, (9)

where Y11, Y12, and Y22 are independent of each other andfollow standard Gamma distributions with parameters α11,

-0.2

0

0.2

0.4

0.6

0.8

1

0 10 20 30 40 50

lag

auto

corr

elat

ion

ACF approx.

ACF detailed

(a)

0

0.01

0.02

0.03

0.04

0.05

0 2000 4000 6000 8000 10000

bytes

pro

bab

ility

I-frame size

approx. coefficients

(b)

Fig. 1. (a) The ACF structure of coefficients {A3} and {D3} in single-layer Star Wars IV. (b) The histogram of I-frame sizes and that ofapproximation coefficients {A3}.

α12, and α22, respectively. Then the correlation between Xk1

and Xk2 becomes:

cov(Xk1 , Xk

2 ) = λ1λ2var(Y12) = λ1λ2α22. (10)

Combining (5) and (10), we rewrite Ak1 as:

Ak1 = 2−1/2 (λ1Y11 + (λ1 + λ2)Y12 + λ2Y22) . (11)

As can be observed from (11), Ak1 is a linear combination of

independent standard Gamma distributions, which leads to thestatement of the lemma.

Extensive experimental results show that a single Gammadistribution is accurate enough to describe the actual histogramof {Aj}. Fig. 1 (b) also shows that the distribution of {Aj}has a similar Gamma shape as that of I-frame sizes, but withdifferent parameters. In the next section, we use this infor-mation to efficiently estimate the approximation coefficients.

IV. MODELING SINGLE/BASE-LAYER

In this section, we discuss the issue of modeling the single-layer traffic and the base-layer of layer-coded traces, sincethe latter can be considered as the former from video codingperspective. We generate synthetic I-frame sizes in the waveletdomain and then model P/B-frame sizes in the time domainbased on the intra-GOP correlation.

A. Modeling I-Frame Sizes

It is well-known that wavelet-based algorithms have anadvantage over the time-domain methods in capturing theLRD and SRD properties of video [24], [31]. Furthermore,wavelet methods do not require that we specifically modelscene-change lengths since wavelets are good at detectingdiscontinuities in video traffic, which are most often generatedby scene changes. Due to these characteristics of wavelettransform, we model the I-frame sizes in the wavelet domainusing the estimated approximation and detailed coefficients,which are represented by {Dj} and {Aj}, respectively.

Even though previous wavelet-based traffic modeling meth-ods often model {Dj} as zero-mean i.i.d. Gaussian variables[24], there is insufficient evidence as to the distribution of theactual {Dj} found in GOP-based video traffic. To provide

5

1

10

100

1000

10000

-700 -500 -300 -100 100 300 500 700

coefficients (bytes)

(a) actual

1

10

100

1000

10000

-700 -500 -300 -100 100 300 500 700

coefficients (bytes)

(b) Gaussian

1

10

100

1000

10000

-700 -500 -300 -100 100 300 500 700coefficients (bytes)

(c) GGD

1

10

100

1000

10000

-700 -500 -300 -100 100 300 500 700coefficients (bytes)

(d) mixture-Laplacian

Fig. 2. Histograms of (a) the actual detailed coefficients; (b) the Gaussianmodel; (c) the GGD model; and (d) the mixture-Laplacian model.

some insight into the structure of detailed coefficients, wecompare the histogram of the actual coefficients {D1} inStar Wars IV with those generated by several alternativemodels in Fig. 2 (note that the y-axis is scaled logarithmically).Fig. 2 (a) displays the histogram of the actual {D1}, part (b)shows that the Gaussian fit matches neither the shape, northe range of the actual distribution, and part (c) demonstratesthat the Generalized Gaussian Distribution (GGD) producesan overly sharp peak at zero (the number of zeros in GGDis almost three times larger than that in the actual {D1}) andalso does not model the range of the real {D1}.

Additional simulations (not shown for brevity) demonstratethat a single Laplacian distribution is not able to describethe fast decay and large data range of the actual histogram;however, a mixture-Laplacian distribution follows the real datavery well:

f(x) = pλ0

2e−λ0|x| + (1− p)

λ1

2e−λ1|x|, (12)

where f(x) is the PDF of the mixture-Laplacian model, p isthe probability to obtain a sample from a low-variance Lapla-cian component, and λ0 and λ1 are the shape parameters of thecorresponding low- and high-variance Laplacian distributions.Fig. 2 (d) shows that the histogram of the mixture-Laplaciansynthetic coefficients {D1} is much closer to the actual onethan the other discussed distributions.

We next discuss approximation coefficients {Aj}. Recallthat current methods generate the coarsest approximationcoefficients (i.e., {AJ}) either as independent Gaussian [24] orBeta random variables [31]. However, as mentioned in Sec-tion III-B, the approximation coefficients are non-negligiblycorrelated and are not i.i.d. To preserve the correlation ofapproximation coefficients and achieve the expected distribu-

-0.2

0

0.2

0.4

0.6

0.8

1

0 100 200 300

lag

auto

corr

elat

ion

actual

our model

nested AR

(a) LRD

-0.2

0

0.2

0.4

0.6

0.8

1

0 5 10 15 20 25 30 35

lag

auto

corr

elat

ion

actual

our model

GBAR

Gamma_A

(b) SRD

Fig. 3. The ACF of the actual I-frame sizes and that of the synthetic trafficin (a) long range and (b) short range.

tion in the synthetic coefficients, we assume that the coarsestapproximation coefficients {AJ} are dependent random vari-ables with marginal Gamma distributions4. We first generateN dependent Gaussian variables xi using a k × k correlationmatrix, where N is the length of {AJ} and the correlationmatrix is obtained from the actual coefficients {AJ}. Thenumber of preserved correlation lags k is chosen to be areasonable value (e.g., the average scene length5). By applyingthe Gaussian CDF FG(x) directly to xi, we convert them into auniformly distributed set of variables FG(xi). It is well knownthat if F is a continuous distribution with inverse F−1 and uis a uniform random number, then F−1(u) has the distributionF . Based on this insight, we pass the result from the last stepthrough the inverse Gamma CDF to generate (still dependent)Gamma random variables [7].

Using the estimated approximation and detailed coefficients,we perform the inverse wavelet transform to generate syntheticI-frame sizes. Fig. 3 (a) shows the ACF of the actual I-framesizes and that of the synthetic traffic in long range. Fig. 3 (b)shows the correlation of the synthetic traffic from the GOP-GBAR model [9] and Gamma A model [34] in short range. Asobserved in both figures, our synthetic I-frame sizes captureboth the LRD and SRD properties of the original traffic betterthan the previous models.

B. Intra-GOP Correlation Analysis

We next provide a detailed analysis of intra-GOP correlationfor various video sequences and model P/B-frame sizes in thetime domain based on intra-GOP correlation. Before furtherdiscussion, we define I and P/B-traces as follows. Assumingthat n ≥ 1 represents the GOP number,• φI(n) is the I-frame size of the n-th GOP;• φP

i (n) is the size of the i-th P-frame in GOP n;• φB

i (n) is the size of the i-th B-frame in GOP n.For example, φP

3 (10) represents the size of the third P-framein the 10-th GOP.

Although previous work model the P/B-frame sizes as i.i.d.random variables [10], [23], [34], Lombardo et al. [20] noticedthat there is a strong correlation between the P/B-frame sizes

4For more details about how to construct dependent random variables, thereaders are referred to the concept of copula [7].

5This is a reasonable choice because there is much less correlation amongI-frames of different scenes than among I-frames of the same scene.

6

-0.1

0

0.1

0.2

0.3

0.4

0 50 100 150 200 250 300 350

lag

corr

elat

ion

cov(I,P1)

cov(I,P2)

cov(I,P3)

(a) Star Wars IV

-0.1

0.1

0.3

0.5

0.7

0.9

0 50 100 150 200 250 300 350

lag

corr

elat

ion

cov(I,P1)

cov(I,P2)

cov(I,P3)

(b) Jurassic Park I

Fig. 4. The correlation between {φPi (n)} and {φI(n)} in (a) Star Wars

IV and (b) Jurassic Park I, for i = 1, 2, 3.

0

0.2

0.4

0.6

0.8

1

0 5 10 15

quant. step

corr

elat

ion

StarWarsJurassicTroopersSilenceStarTrek

(a)

0

0.2

0.4

0.6

0.8

1

0 5 10 15 20

quant. step

corr

elat

ion

StarWarsJurassicTroopersSilenceStarTrek

(b)

Fig. 5. (a) The correlation between {φI(n)} and {φP1 (n)} in MPEG-4

sequences coded at Q = 4, 10, 14. (b) The correlation between {φI(n)} and{φB

1 (n)} in MPEG-4 sequences coded at Q = 4, 10, 18.

and the I-frame size belonging to the same GOP, which isalso called intra-GOP correlation. Motivated by their results,we conduct the analysis of the intra-GOP correlation between{φI(n)} and {φP

i (n)} or {φBi (n)} in two situations: (a)

the intra-GOP correlation for different i in a specific videosequence with fixed quantization step Q; and (b) the intra-GOP correlation for the same i in various sequences coded atdifferent steps Q.

For the first part of our analysis, we investigate the correla-tion between {φI(n)} and {φP

i (n)} and that between {φI(n)}and {φB

i (n)} for different i in various sequences. Fig. 4 (a)shows the intra-GOP correlation in single-layer Star WarsIV, which is coded with quantization step Q = 10, 14, 18for I/P/B frames, respectively. Fig. 4 (b) shows the samecorrelation in Jurassic Park I that is coded at Q = 4 forall frames6. As shown in the figure, the correlation is almostidentical for different i, which is rather convenient for ourmodeling purposes.

For the second part of our analysis, we examine variousvideo sequences coded at different quantization steps to un-derstand the relationship between intra-GOP correlation andquantization steps. We show the correlation between {φI(n)}and {φP

1 (n)} and that between {φI(n)} and {φB1 (n)} in five

MPEG-4 coded video sequences in Fig. 5. We also showthe same correlation in H.264 coded Starship Troopers

6If a sequence is coded with the same quantization step c for all frames, wesay this sequence is coded at Q = c. Otherwise, we describe the quantizationstep for each type of frames in this sequence.

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

0 5 10 15 20 25 30 35

quant. step

corr

elat

ion

cov(I, P1)

cov(I, B1)

(a)

0

0.2

0.4

0.6

0.8

1

0 5 10 15 20 25 30

quant. step

corr

elat

ion

cov(I, P1)

cov(I, B1)

(b)

Fig. 6. The correlation between {φI(n)} and {φP1 (n)} and that between

{φI(n)} and {φB1 (n)} in (a) H.264 Starship Troopers and (b) the

base layer of the spatially scalable The Silence of the Lambs codedat different Q.

0

500

1000

1500

2000

2500

3000

0 1000 2000 3000 4000 5000

I-frame size (bytes)

byt

es

mean P frame size

mean B frame size

(a) Star Wars IV

0

500

1000

1500

2000

2500

0 1000 2000 3000

I-frame size (bytes)

byt

es

mean P frame size

mean B frame size

(b) The Silence of the Lambs

Fig. 7. The mean sizes of P and B-frames of each GOP given the size ofthe corresponding I-frame in (a) the single-layer Star Wars IV and (b)the base layer of the spatially scalable The Silence of the Lambs.

[30] and in the base layer of the spatially scalable TheSilence of the Lambs in Fig. 6.

As observed from Fig. 5 and Fig. 6, the intra-GOP correla-tion decreases as the quantization step increases. This is dueto the fact that sequences coded with smaller Q share moresource information among the different frames in one GOPand thus have stronger intra-GOP correlation than sequencescoded with larger Q. This observation is very useful for usersto decide whether or not it is worthwhile to preserve intra-GOPcorrelation with an increase in model complexity.

To better model P and B-frame sizes, we also investigate therelationship between the sizes of P/B-frames and that of the I-frame belong to the same GOP. Lombardo et al. [20] modeledthe sizes of MPEG-1 coded P/B-frames as Gamma distributedrandom variables, with mean and variance estimated by alinear function of {φI(n)}. However, we find that this linearestimation does not hold for general video traffic. As shown inFig. 7, the means of P and B-frames are not linear functionsof I-frame sizes in MPEG-4 coded Star Wars IV and TheSilence of the Lambs. Therefore, in the next section,we propose an alternative model for generating P and B-framesizes, which captures the intra-GOP correlation in generalGOP-based VBR video.

C. Modeling P and B-Frame Sizes

The above discussion shows that there is a similar correla-tion between {φP

i (n)} and {φI(n)} with respect to different

7

0

100

200

300

400

500

-100 300 700 1100 1500 1900

bytes

v_1

v_2

v_3

(a)

0

50

100

150

200

250

300

350

400

-500 2500 5500 8500 11500

bytes

Q=14

Q=10

Q=4

(b)

Fig. 8. (a) Histograms of {v(n)} for {φPi (n)} with i = 1, 2, 3 in Star

Wars IV coded at Q = 14. (b) Histograms of {v(n)} for {φP1 (n)} in

Jurassic Park I coded at Q = 4, 10, 14.

i. Motivated by this observation, we propose a linear modelto estimate the size of the i-th P-frame in the n-th GOP:

φPi (n) = aφI(n) + v(n), (13)

where φI(n) = φI(n) − E[φI(n)] and v(n) is a syntheticprocess (whose properties we study below) that is independentof φI(n).

Lemma 2: To capture the intra-GOP correlation, the valueof coefficient a in (13) must be equal to:

a =r(0)σP

σI, (14)

where σP is the standard deviation of {φPi (n)}, σI is the

standard deviation of {φI(n)}, and r(0) is their normalizedcorrelation coefficient at lag zero.

Proof: Without loss of generality, we assume that bothφI(n) and φP

i (n) are wide-sense stationary processes. Thus,E[φP

i (n)] is constant and:

E[φI(n− k)] = E[φI(n)] = 0. (15)

Denote by C(k) the covariance between φPi (n) and φI(n) at

lag k:

C(k) = E[(φPi (n)− E[φP

i ])(φI(n− k)− E[φI ])]. (16)

Recall that v(n) and φI(n) are independent of each other andthus E[v(n) · φI(n)] = E[v(n)] · E[φI(n)] = 0. Then C(k)becomes:

C(k) = E[(aφI(n) + v(n)− E[φPi ])φI(n− k)]

= aE[φI(n)φI(n− k)] (17)

Next, observe that the normalized correlation coefficient r atlag zero is:

r(0) =C(0)σP σI

=aE[φI(n)2]

σP σI

, (18)

where σI is the standard deviation of φI(n). Recalling thatE[φI(n)] = 0, we have E[φI(n)2] = σ2

I= σ2

I and:

a · σI

σP= r(0), (19)

which leads to (14).

-0.1

0.1

0.3

0.5

0 70 140 210 280 350

lag

corr

elat

ion

actual

our model

i.i.d methods

(a)

-0.1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0 50 100 150 200 250 300 350

lag

corr

elat

ion

actual

our model

i.i.d methods

(b)

Fig. 9. (a) The correlation between {φP1 (n)} and {φI(n)} in Star Wars

IV. (b) The correlation between {φB1 (n)} and {φI(n)} in Jurassic

Park I.

To understand how to generate {v(n)}, we next examinethe actual residual process v(n) = φP

i (n)− aφI(n) for eachi and for video sequences coded at various Q. Fig. 8 (a)shows the histograms of {v(n)} for P-traces with differenti in single-layer Star Wars IV and Fig. 8 (b) shows thehistograms of {v(n)} for sequences coded at different Q.The figure show that the residual process {v(n)} does notchange much as a function of i but its histogram becomesmore Gaussian-like when Q increases. Due to the diversityof the histogram of {v(n)}, we use a generalized Gammadistribution Gamma(γ, α, β) to estimate {v(n)}.

From Fig. 5 (b), we observe that the correlation between{φB

i (n)} and {φI(n)} could be as small as 0.1 (e.g., in StarWars IV coded at Q = 18) or as large as 0.9 (e.g., in TheSilence of the Lambs coded at Q = 4). Thus, wecan generate the synthetic B-frame traffic simply by an i.i.d.lognormal random number generator when the correlationbetween {φB

i (n)} and {φI(n)} is small, or by a linear modelsimilar to (13) when the correlation is large. The linear modelhas the following form:

φBi (n) = aφI(n) + vB(n), (20)

where a = r(0)σB/σI , r(0) is the lag-0 correlation between{φI(n)} and {φB

i (n)}, σB and σI are the standard deviationof {φB

i (n)} and {φI(n)}, respectively. Process vB(n) isindependent of φI(n).

We illustrate the difference between our model and a typicali.i.d. method of prior work (e.g., [23], [34]) in Fig. 9.The figure shows that our model indeed preserves the intra-GOP correlation of the original traffic, while the previousmethods produce white (uncorrelated) noise. Statistical para-meters (r(0), σP , σI , γ, α, β) needed for this model are easilyestimated from the original sequences.

D. Further Discussion

As shown in Fig. 5 and Fig. 6, the intra-GOP correlationis small in video sequences coded with a large quantizationstep. Furthermore, the correlation between the I-frame of aGOP and the following P/B-frames decreases as the GOP sizeincreases, since these P/B-frames are far away from the I-frame in the same GOP due to the large GOP size. Under thesecircumstances, we can model {φP

i (n)} or {φBi (n)} using the

8

0

0.05

0.1

0.15

0.2

0 100 200 300 400 500

lag

corr

elat

ion

cov(B1,I)

cov(B2,I)

(a)

0

0.2

0.4

0.6

0.8

1

0 100 200 300 400 500

lag

corr

elat

ion

cov(B1,P1)

cov(B2,P1)

(b)

Fig. 10. (a) The correlation between {φBi (n)} and {φI(n)} and (b) that

between {φBi (n)} and {φP

1 (n)} in temporally scalable-coded CitizenKane, for i = 1, 2.

correlation between them and their reference frame due to thefact that P/B-frames are predictively coded.

This discussion benefits the modeling of temporally scal-able traffic. Note that in temporally scalable video, the baselayer and the enhancement layer are approximately equivalentto extracting I/P-frames and B-frames out of a single-layersequence, respectively [30]. In other words, we model theenhancement layer of a temporally scalable-coded sequenceas modeling the B-frames of a single-layer video traffic.

To better understand the correlation between the neighbor-ing frames, we examine the correlation between the I-traceand P/B-traces and that between two neighboring P/B-tracesin various sequences. In Fig. 10, we show the correlationbetween {φI(n)} and {φB

i (n)} and that between {φP1 (n)}

and {φBi (n)}, for i = 1, 2, in temporally scalable Citizen

Kane coded with quantization step Q = 30. As Fig. 10 (a)shows, the correlation between the base layer I-frames andthe enhancement layer is not large enough to apply the linearmodel (20) in this sequence. However, Fig. 10 (b) shows thatthe enhancement layer and its neighboring base layer P-framesare highly correlated.

Therefore, we rewrite linear model (20) to:

φBi (n) = aφP (n) + vB(n), (21)

where parameter a = r(0)σB/σP , r(0) is the lag-0 correlationbetween the neighboring P and B-frame sizes, and σB , σP arethe standard deviations of these two P/B-traces.

Since the lag-0 correlation r(0) is important in the modelingprocess, we compare it in the original and synthetic temporallyscalable Citizen Kane coded with Q = 4 in Table II. Asthe table shows, the actual correlation coefficients are closeto those of the synthetic traffic generated by model (21), withthe average absolute error 3.2%.

V. MODELING LAYER-CODED TRAFFIC

In this section, we provide brief background knowledge ofmulti-layer video, investigate methods to capture cross-layerdependency, and model the enhancement-layer traffic.

Due to its flexibility and high bandwidth utilization, layeredvideo coding is common in video applications. Layered codingcan be further classified as coarse-granular (e.g., spatial scal-ability) or fine-granular (e.g., fine granular scalability (FGS))

TABLE IILAG-0 CORRELATION IN THE ORIGINAL AND SYNTHETIC TRAFFIC

Correlation between traces Original Syntheticcov({φB

1 (n)}, {φP1 (n)}) 0.8504 0.8715

cov({φB2 (n)}, {φP

1 (n)}) 0.8714 0.8940cov({φB

3 (n)}, {φP2 (n)}) 0.8692 0.8927


2 (n)}) 0.8749 0.8994cov({φB

5 (n)}, {φP3 (n)}) 0.8706 0.8959


3 (n)}) 0.8685 0.8996cov({φB

7 (n)}, {φP3 (n)}) 0.8141 0.8490


3 (n)}) 0.8112 0.8510

0

0.2

0.4

0.6

0.8

1

0 50 100 150

lag

corr

elat

ion

Q=4

Q=24

Q=30

(a)

0

0.2

0.4

0.6

0.8

1

0 50 100 150

lag

corr

elat

ion

cov(P1_BL,P1_EL)

cov(P2_BL,P2_EL)

cov(P3_BL,P3_EL)

(b)

Fig. 11. (a) The correlation between {εI(n)} and {φI(n)} in TheSilence of the Lambs coded at Q = 4, 24, 30. (b) The correlationbetween {εP

i (n)} and {φPi (n)} in The Silence of the Lambs coded

at Q = 30, for i = 1, 2, 3.

[38]. The major difference between coarse granularity and finegranularity is that the former provides quality improvementsonly when a complete enhancement layer has been received,while the latter continuously improves video quality withevery additionally received codeword of the enhancement layerbitstream.

In both coarse granular and fine granular coding methods,an enhancement layer is coded with the residual betweenthe original image and the reconstructed image from thebase layer. Therefore, the enhancement layer has a strongdependency on the base layer. Zhao et al. [40] also indicatethat there exists a cross-layer correlation between the baselayer and the enhancement layer; however, this correlation hasnot been fully addressed in previous studies.

In the next subsection, we investigate the cross-layer cor-relation between the enhancement layer and the base layerusing spatially scalable The Silence of the Lambssequence and an FGS-coded Star Wars IV sequence asexamples. We only show the analysis of two-layer sequencesfor brevity and note that similar results hold for video streamswith more than two layers.

A. Analysis of the Enhancement LayerFor discussion convenience, we define the enhancement

layer frame sizes as follows. Similar to the definition in thebase layer, we define εI(n) to be the I-frame size of the n-thGOP, εP

i (n) to be the size of the i-th P-frame in GOP n, andεBi (n) to be the size of the i-th B-frame in GOP n.

Since each frame in the enhancement layer is predicted fromthe corresponding frame in the base layer, we examine the

9

0

0.2

0.4

0.6

0.8

1

0 50 100 150lag

corr

elat

ion

BL_I_cov

EL_I_cov

(a)

0

0.2

0.4

0.6

0.8

1

0 50 100 150lag

corr

elat

ion

BL_P_cov

EL_P_cov

(b)

Fig. 12. (a) The ACF of {εI(n)} and that of {φI(n)} in Star WarsIV. (b) The ACF of {εP

1 (n)} and that of {φP1 (n)} in The Silence of

the Lambs.

cross-layer cross-layer correlation between the enhancementlayer frame sizes and corresponding base layer frame sizes invarious sequences.

In Fig. 11 (a), we display the correlation between {εI(n)}and {φI(n)} in The Silence of the Lambs coded atdifferent Q. As observed from the figure, the correlation be-tween {εI(n)} and {φI(n)} is stronger when the quantizationstep Q is smaller. However, the difference among these cross-layer correlation curves is not as obvious as that in intra-GOPcorrelation. We also observe that cross-layer correlation is stillstrong even at large lags, which indicates that {εI(n)} exhibitsLRD properties and we should preserve these properties in thesynthetic enhancement layer I-frame sizes.

In Fig 11 (b), we show the cross-layer correlation betweenprocesses {εP

i (n)} and {φPi (n)} for i = 1, 2, 3. The figure

demonstrates that the correlation between the enhancementlayer and the base layer is quite strong, and the correlationstructures between each {εP

i (n)} and {φPi (n)} are very sim-

ilar to each other. To avoid repetitive description, we do notshow the correlation between {εB

i (n)} and {φBi (n)}, which

is similar to that between {εPi (n)} and {φP

i (n)}.Aside from cross-layer correlation, we also examine the

autocorrelation of each frame sequence in the enhancementlayer and that of the corresponding sequence in the base layer.We show the ACF of {εI(n)} and that of {φI(n)} (labeledas “EL I cov” and “BL I cov”, respectively) in Fig. 12 (a);and display the ACF of {εP

1 (n)} and that of {φP1 (n)} in

Fig. 12 (b). The figure shows that although the ACF structureof {εI(n)} has some oscillation, its trend closely follows thatof {φI(n)}. One also observes from the figures that the ACFstructures of processes {εP

i (n)} and {φPi (n)} are similar to

each other.

B. Modeling the Enhancement Layer I-Frame Sizes

Although cross-layer correlation is obvious in multi-layertraffic, previous work neither considered it during modeling[2], nor explicitly addressed the issue of its modeling [40]. Inthis section, we first describe how we model the enhancementlayer I-frame sizes and then evaluate the performance of ourmodel in capturing the cross-layer correlation.

Recalling that {εI(n)} also possesses both SRD and LRDproperties (as shown in Fig. 12 (a)), we model it in the waveletdomain as we modeled {φI(n)}. We define {Aj(ε)} and

-0.2

0

0.2

0.4

0.6

0.8

1

1.2

0 50 100 150

lag

corr

elat

ion

ca_BL_cov

ca_EL_cov

(a) Q = 30

-0.2

0

0.2

0.4

0.6

0.8

1

1.2

0 50 100 150

lag

corr

elat

ion

ca_BL_cov

ca_EL_cov

(b) Q = 4

Fig. 13. The ACF of {A3(ε)} and {A3(φ)} in The Silence of theLambs coded at (a) Q = 30 and (b) Q = 4.

-0.3

-0.1

0.1

0.3

0.5

0.7

0.9

0 100 200 300 400lag

cro

ss c

orr

elat

ion

actual

our model

(a) our model

-0.3

-0.1

0.1

0.3

0.5

0.7

0.9

0 100 200 300 400lag

cro

ss c

orr

elat

ion

actual

Zhao et al.

(b) model [40]

Fig. 14. The cross-layer correlation between {εI(n)} and {φI(n)} in TheSilence of the Lambs and that in the synthetic traffic generated from(a) our model and (b) model [40].

{Aj(φ)} to be the approximation coefficients of {εI(n)} and{φI(n)} at the wavelet decomposition level j, respectively.To better understand the relationship between {Aj(ε)} and{Aj(φ)}, we show the ACF of {A3(ε)} and {A3(φ)} usingHaar wavelets (labeled as “ca EL cov” and “ca BL cov”,respectively) in Fig. 13.

As shown in Fig. 13, {Aj(ε)} and {Aj(φ)} exhibit similarACF structure. Thus, we generate {AJ(ε)} by borrowing theACF structure of {AJ(φ)}, which is known from our base-layer model. Using the ACF of {AJ (φ)} in modeling {εI(n)}not only saves computational cost, but also preserves the cross-layer correlation.

In Fig. 14, we compare the actual cross-layer correlationbetween {εI(n)} and {φI(n)} to that between the synthetic{εI(n)} and {φI(n)} generated from our model and Zhao’smodel [40]. The figure demonstrates that our model signifi-cantly outperforms Zhao’s model in preserving the cross-layercorrelation.

C. Modeling P and B-Frame Sizes

Recall that the cross-layer correlation between {εPi (n)} and

{φPi (n)} and that between {εB

i (n)} and {φBi (n)} are also

strong, as shown in Fig. 11. We use the linear model fromSection IV-C to estimate the sizes of the i-th P and B-framesin the n-th GOP:

εPi (n) = aφP

i (n) + w1(n), (22)εB

i (n) = aφBi (n) + w2(n), (23)

10

0

100

200

300

400

-500 0 500 1000 1500 2000 2500

bytes

actual

estimate

(a) Star Wars IV

0

100

200

300

400

500

600

700

-500 0 500 1000 1500 2000

bytes

actual

estimate


Fig. 15. Histograms of {w1(n)} and {w1(n)} for {εP1 (n)} in (a) Star

Wars IV and (b) The Silence of the Lambs coded at Q = 30.

where a = r(0)σε/σφ, r(0) is the lag-0 cross-layer correlationcoefficient, σε is the standard deviation of the enhancement-layer sequence, and σφ is the standard deviation of the cor-responding base-layer sequence. Processes {w1(n)}, {w2(n)}are independent of {φP

i (n)} and {φBi (n)}.

We examine {w1(n)} and {w2(n)} in Fig. 15 and find thattheir histograms are asymmetric and decay fast on both sides.To capture this asymmetry, we use two exponential distribu-tions to estimate the PDF of either {w1(n)} or {w2(n)}. Wefirst left-shift {w1(n)} by an offset δ to make the mode (i.e.,the peak) appear at zero. We then model the right side usingone exponential distribution exp(λ1) and the absolute value ofthe left side using another exponential distribution exp(λ2).Afterwards, we generate synthetic data {w1(n)} based onthese two exponential distributions and right-shift the resultby δ. As shown in Fig. 15, the histograms of {w1(n)} areclose to those of the actual data in both Star Wars IV andThe Silence of the Lambs. We generate {w2(n)} inthe same way and find its histogram is also close to that of{w2(n)}. We do not show the histograms here for brevity.

VI. ANALYSIS AND MODELING OF MDC TRAFFIC

While layered coding techniques use a hierarchical struc-ture, multiple description coding (MDC) is a nonhierarchicalcoding scheme that generates equal-importance layers. InMDC-coded sequences, each layer (i.e., description) alone canprovide acceptable quality and several layers together lead tohigher quality. Each description can be individually coded withother layered coding techniques [35].

For instance, the sample MDC sequences used in thispaper are coded with both spatial scalability and MDC.After the original video stream is split into L descriptions(i.e., L-D), the l-th description (l ∈ [1, L]) contains framesm,m+L,m+2L, . . . , m = l, of the original video sequence.Afterwards, description l is further encoded into one baselayer and one enhancement layer using spatially scalablecoding techniques. The GOP pattern of each description isIPPPPPPPPPPP. Since spatial scalability is applied to eachindividual description, no extra dependency or correlation isintroduced between the distinct descriptions.

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

-100 -50 0 50 100

lag

cro

ss-c

orr

elat

ion

cov(I-1, I-2)

cov(P-1, P-2)

(a)

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

-180 -120 -60 0 60 120 180

lag

auto

corr

elat

ion

{I} in 1-D{I-1} in 2-D{I-2} in 2-D

(b)

Fig. 16. (a) The cross-layer correlation between {φI2−1(n)} and {φI

2−2(n)}and that between {φP1

2−1(n)} and {φP11−i(n)} 2-D Bridge traffic. (b)

The autocorrelation of the original sequence and that of {φI2−1(n)} and

{φI2−2(n)} in 2-D Bridge.

A. Analysis of MDC Traffic

Since each description is able to independently provideacceptable quality to users, all descriptions must share funda-mental source information and thus they are highly correlatedbetween each other. This cross-layer correlation enables thereceiver/decoder to estimate the missing information of onedescription from another received one.

To better understand the correlation in MDC traffic, weanalyze the correlation structures of MDC-coded sequences forfurther modeling purposes. We give the following definitionfor demonstration purposes. Assuming that n represents theGOP number in the i-th description of an L-D sequence, weuse {φI

L−i(n)}, {φPt

L−i(n)}, and {φBt

L−i(n)} to represent theI-trace, the t-th P- and B-trace, respectively.

In Fig. 16 (a), we show the cross-layer correlation betweenthe I-traces and P-traces of two descriptions of the 2-DBridge sequence. As the figure shows, the cross-layer cor-relation between I-traces is much stronger than that betweenP-traces, which is because I-frames contain more fundamentalsource information than P-frames. We also investigate the ACFstructure of the original sequence and that of {φI

2−1(n)} and{φI

2−2(n)} in Fig. 16 (b). One observes that the autocorrela-tion of {φI

2−1(n)} and that of {φI2−2(n)} are almost identical,

both of which are a shifted and scaled version of the ACF ofthe original sequence.

B. MDC Traffic Model

Due to the strong correlation between I-traces of differentdescriptions, we generate the synthetic {φI

L−i(n)} simultane-ously to preserve this cross-layer correlation. Modeling I-framesizes is also conducted in the wavelet domain to preserve theirLRD and SRD properties.

Recall that the approximation coefficients preserve the cor-relation structure of the original signal. Also note that theACF of different descriptions of an L-D sequence are verysimilar to each other (as shown in Fig. 16 (b)). Therefore, aftergenerating the first description {φI

L−1(n)}, we borrow theACF structure of {φI

L−1(n)} to construct the approximationcoefficients of other descriptions. Detailed coefficients {Dj}are estimated as i.i.d. mixture-Laplacian distributed randomvariables.

11

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

0 20 40 60 80 100

lag

cro

ss-c

orr

elat

ion

actual

synthetic

(a)

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

0 20 40 60

lag

cro

ss-c

orr

elat

ion

actual

synthetic

(b)

Fig. 17. The actual and synthetic cross-layer correlation between {φIL−1(n)}

and {φIL−2(n)} in Bridge, with (a) L = 2 and (b) L = 3.

0

1000

2000

3000

4000

5000

0 1000 2000 3000 4000 5000

original frame size

syn

thet

ic f

ram

e si

ze

(a) Star Wars IV

0

500

1000

1500

2000

2500

3000

0 1000 2000 3000

original frame size

syn

thet

ic f

ram

e si

ze


Fig. 18. QQ plots for the synthetic (a) single-layer Star Wars IV trafficand (b) The Silence of the Lambs base-layer traffic.

We evaluate the model performance in preserving cross-layer correlation using various MDC-coded sequences andshow the actual and synthetic correlation between {φI

L−1(n)}and {φI

L−2(n)} of 2-D and 3-D Bridge traffic in Fig. 17. Asthe figure shows, our model indeed captures the cross-layercorrelation between the different descriptions in L-D MDCtraffic. To reduce model complexity, P/B-traces are modeledin the same way as single-layer P/B-traces. More performanceevaluation is conducted in the following section.

VII. MODEL PERFORMANCE EVALUATION

As we stated earlier, a good traffic model should capturethe statistical properties of the original traffic and be able toaccurately predict network performance. In this regard, thereare three popular studies to verify the accuracy of a videotraffic model [34]: quantile-quantile (QQ) plots, the varianceof traffic during various time intervals, and buffer overflowloss evaluation. While the first two measures visually evaluatehow well the distribution and the second-order moment of thesynthetic traffic match those of the original one, the overflowloss simulation examines the effectiveness of a traffic modelto capture the temporal burstiness of original traffic.

In the following subsections, we evaluate the accuracy ofour model in both single-layer and multi-layer traffic using theabove three measures.

A. QQ Plots

The QQ plot is a graphical technique to verify the distri-bution similarity between two test data sets. If the two data

0

2000

4000

6000

0 2000 4000 6000original frame size

syn

thet

ic f

ram

e si

ze

(a) Star Wars IV

0

2000

4000

6000

8000

10000

0 2000 4000 6000 8000 10000original frame size

syn

thet

ic f

ram

e si

ze


Fig. 19. QQ plots for the synthetic enhancement-layer traffic: (a) StarWars IV and (b) The Silence of the Lambs.

0

4500

9000

13500

18000

0 4500 9000 13500 18000

original frame sizesy

nth

etic

fra

me

size

(a) Citizen Kane

0

2000

4000

6000

8000

0 2000 4000 6000 8000

original frame size

syn

thet

ic f

ram

e si

ze

(b) Bridge

Fig. 20. QQ plots for the synthetic traffic of (a) temporarily scalableCitizen Kane and (b) the first description in 3-D Bridge.

sets have the same distribution, the points should fall alongthe 45 degree reference line. The greater the departure fromthis reference line, the greater the difference between the twotest data sets.

We show QQ plots of the synthetic single-layer StarWars IV and the synthetic base layer of The Silence ofthe Lambs that are generated by our model in Fig. 18 (a)and (b), respectively. As shown in the figure, the generatedframe sizes and the original traffic are almost identical.

We also evaluate the accuracy of the synthetic enhancementlayer by using QQ plots and show two examples in Fig. 19,which confirms the accuracy of synthetic The Silenceof the Lambs and Star Wars IV enhancement-layertraffic. The figure shows that the synthetic frame sizes in bothsequences have the same distribution as those in the originaltraffic. More simulation results for MDC and temporarily scal-able traffic are shown in Fig. 20, which includes QQ plots forthe temporarily scalable Citizen Kane and synthetic 1-stdescription in 3-D Bridge. As shown in Fig. 20, the syntheticand the original traffic have almost identical distributions.

B. Second-Order Descriptor

Different from QQ plots, the variance of traffic duringvarious time intervals is a second-order descriptor, whichshows whether the model captures the burstiness properties ofthe arrival processes [2]. This metric is computed as follows.Assume that the length of a video sequence is l and thereare m frames in a given time interval t. We segment theone-dimensional data into n non-overlapping block of size m,

12

1.E+06

1.E+07

1.E+08

1.E+09

1.E+10

0 1 2 3 4 5

time interval (s)

byt

es

actualour modelGBARGamma_BNested_AR

(a) Star Wars IV

1.E+06

1.E+07

1.E+08

1.E+09

1.E+10

1.E+11

0 1 2 3 4

time interval (s)

byt

es

actualour modelGBARGamma_BNested_AR


Fig. 21. Comparison of the variance between synthetic and original traffic insingle-layer (a) Star Wars IV and (b) The Silence of the Lambsbase layer.

1.E+05

1.E+06

1.E+07

1.E+08

1.E+09

0 1 2 3 4

time interval (s)

byt

es

actual

our model

Zhao et al. [40]

(a) Star Wars IV

1.E+05

1.E+06

1.E+07

1.E+08

1.E+09

1.E+10

1.E+11

0 1 2 3 4

time interval (s)

byt

es

actual

our model

Zhao et al. [40]


Fig. 22. Comparison of the variance between the synthetic and originalenhancement layer traffic in (a) Star Wars IV and (b) The Silenceof the Lambs.

where n = l/m. After summarizing all the data in each block,we obtain a data sequence of length n and then calculate itsvariance. Given a set of time intervals (i.e., various values ofm), we can obtain a set of variances.

In Fig. 21, we give a comparison between the variance ofthe original traffic and that of the synthetic traffic generatedfrom different models using different time intervals. The figureshows that the second-order moments of our synthetic trafficare in good agreement with those of the original sequences.Note that we display the y-axis on logarithmic scale to clearlyshow the difference among the performance of the variousmodels.

We also compare the variance of the original enhancementlayer traffic and that of the synthetic traffic in Fig. 22. Dueto the computational complexity of Zhao’s model [40] incalculating long sequences, we only take the first 5000 framesof Star Wars IV and The Silence of the Lambs.As observed from the figure, our model preserves the second-order moment of the original traffic well. In Fig. 23 (a) and(b), we display the variance of the original and synthetic trafficin two temporally scalable Citizen Kane sequences codedat Q = 4 and Q = 30, respectively. The figure demonstratesthat the synthetic traffic captures the burstiness of the originaltraffic very well.

C. Buffer Overflows

Besides the distribution and burstiness, we are also con-cerned how well our approach preserves the temporal infor-

1.E+01

1.E+02

1.E+03

1.E+04

1.E+05

1.E+06

1.E+07

1.E+08

1.E+09

1.E+10

1.E+11

0 1 2 3 4 5

time interval (s)

byt

es

actual

synthetic

(a) Q = 4

1.E+01

1.E+02

1.E+03

1.E+04

1.E+05

1.E+06

1.E+07

1.E+08

1.E+09

1.E+10

0 1 2 3 4 5

time interval (s)

byt

es

actual

synthetic

(b) Q = 30

Fig. 23. Comparison of the variance between the original and synthetictemporarily scalable Citizen Kane coded at (a) Q = 4 and (b) Q = 30.

0

0.2

0.4

0.6

0.8

1

0 1 2 3 4 5drain rate

dat

a lo

ss r

atio

actual

synthetic

(a) The Silence of the Lambs

0

0.2

0.4

0.6

0.8

1


dat

a lo

ss r

atio

actual

synthetic

(b) Star Wars IV

Fig. 24. Overflow data loss ratio of the original and synthetic enhancementlayer traffic for c = 10 ms for (a) The Silence of the Lambs and (b)Star Wars IV.

mation of the original traffic. A common test for this purposeis a leaky-bucket simulation, which is to pass the original orthe synthetic traffic through a generic buffer with capacity cand drain rate d [34]. The drain rate is the number of bytesdrained per second and is simulated as different multiples ofthe average traffic rate r.

We first examine the model accuracy using the absolutedata loss ratio. In Fig. 24 and Fig. 25, we show the overflowin the enhancement layers of both The Silence of theLambs (54, 000 frames) and Star Wars IV (108, 000frames) with different drain rates d for buffer capacity c = 10ms and c = 30 ms, respectively. The x-axis in the figurerepresents the ratio of the drain rates to the average traffic rater. In Fig. 26, we illustrate the loss rate of the original andsynthetic temporarily scalable Citizen Kane coded withQ = 4 with buffer capacity c = 10, 40 ms. All simulationresults show that the synthetic traffic preserves the temporalinformation of the original traffic very well.

To better evaluate the model performance, we also comparethe accuracy of our model with that of several other trafficmodels using the relative error as the main metric. We definethe relative error e as the difference between the actualabsolute packet loss p in a leaky-bucket simulation and thesynthetic absolute packet loss pmodel that is obtained undersame simulation constraints using synthetic traffic generatedby each of the models:

e =|p− pmodel|

p. (24)

13

0

0.2

0.4

0.6

0.8

1


dat

a lo

ss r

atio

actual

synthetic

(a) The Silence of the Lambs

0

0.2

0.4

0.6

0.8

1


dat

a lo

ss r

atio

actual

synthetic

(b) Star Wars IV

Fig. 25. Overflow data loss ratio of the original and synthetic enhancementlayer traffic for c = 30 ms for (a) The Silence of the Lambs and (b)Star Wars IV.

0

0.2

0.4

0.6

0.8

1

0 1 2 3 4

drain rate

dat

a lo

ss r

atio

actual

synthetic

(a) c = 10

0

0.2

0.4

0.6

0.8

1

0 1 2 3 4

drain rate

dat

a lo

ss r

atio

actual

synthetic

(b) c = 40

Fig. 26. Overflow data loss ratio of the original and synthetic temporarilyscalable Citizen Kane for (a) c = 10 ms and (b) c = 40 ms.

In Table III, we illustrate the values of e for variousbuffer capacities and drain rates d. As shown in the table,the synthetic traffic generated by our model provides a veryaccurate estimate of the actual data loss probability p andsignificantly outperforms the other methods. Fig. 27 showsthe relative error e of synthetic traffic generated from differentmodels in H.264 Starship Troopers coded at Q = 1, 31,given d = r. Since GOP-GBAR model [9] is specificallydeveloped for MPEG traffic, we do not apply it to H.264sequences. The figure shows that our model outperforms theother three models in Starship Troopers coded at smallQ and performs as good as model Gamma A [34] with large Q(the relative error e of both models is less than 1% in Fig. 27(b)).

We are not able to show results for previous multi-layermodels given the nature of our sample sequences since themodel in [2] is only applicable to sequences with a CBR baselayer and the one in [40] is suitable only for short sequences.Therefore, we only illustrate the relative error e of the synthetic2-D MDC-coded Bridge generated by our model in TableIV. As shown in the table, our method accurately preservesthe temporal information of MDC traffic.

VIII. CONCLUSION

In this paper, we presented a framework for modelingMPEG-4 and H.264 multi-layer full-length VBR video traffic.This work precisely captured the inter- and intra-GOP correla-tion in compressed VBR sequences by incorporating wavelet-domain analysis into time-domain modeling. Whereas many

TABLE IIIRELATIVE DATA LOSS ERROR e IN Star Wars IV

Buffer Traffic type Drain ratecapacity 2r 4r 5r

20ms Our Model 0.93% 0.61% 1.13%GOP-GBAR [9] 3.84% 2.16% 3.77%Nested AR [23] 5.81% 2.77% 8.46%Gamma A [34] 5.20% 0.61% 2.57%Gamma B [34] 4.89% 1.93% 2.05%

30ms Our Model 0.25% 0.33% 0.95%GOP-GBAR [9] 4.94% 3.33% 5.68%Nested AR [23] 6.94% 4.14% 9.92%Gamma A [34] 4.88% 1.10% 4.48%Gamma B [34] 4.67% 2.17% 4.03%

0%

1%

2%

3%

4%

10 20 30 40

buffer capacity (ms)

rela

tive

err

or

our modelNested ARGamma_AGamma_B

(a) Q = 1

0%

5%

10%

15%

20%

10 20 30 40

buffer capacity (ms)

rela

tive

err

or

our modelNested ARGamma_AGamma_B

(b) Q = 31

Fig. 27. Given d = r, the error e of various synthetic traffic in H.264Starship Troopers coded at (a) Q = 1 and (b) Q = 31.

previous traffic models are developed at slice-level or evenblock-level [34], our framework uses frame-size level, whichallows us to examine the loss ratio for each type of framesand apply other methods to improve the video quality at thereceiver. We also proposed novel methods to model cross-layercorrelation in multi-layer sequences.

Furthermore, the computation complexity of our modelis no higher than other models. For a length-N input, thecomputational complexity is O(N) for models [9], [23], [34],and O(N2 log(N)) for model [40] due to its complete linkageclustering algorithm. By contrast, our modeling framework isan O(log(N)) algorithm for I-frame size modeling and takesN steps for P/B-frame size modeling.

REFERENCES

[1] J. Beran, R. Sherman, M. S. Taqqu, and W. Willinger, “Variable-Bit-Rate Video Traffic and Long-Range Dependence,” IEEE Trans. onCommunication, vol. 43, 1995.

[2] K. Chandra and A. R. Reibman, “Modeling One- and Two-LayerVariable Bit Rate Video,” IEEE/ACM Trans. on Networking, vol. 7, June1999.

[3] T. P.-C. Chen and T. Chen, “Markov Modulated Punctured Autore-gressive Processes for Video Traffic and Wireless Channel Model-ing,”Packet Video, Apr. 2002.

[4] A. L. Corte, A. Lombardo, S. Palazzo, and S. Zinna, “Modeling Activityin VBR Video Sources,” Signal Processing: Image Commu-nication, vol.3, June 1991.

[5] P. Embrechts, F. Lindskog, and A. J. McNeil, “Modelling Dependencewith Copulas and Applications to Risk Management,” Handbook ofHeavy Tailed Distributions in Finance, Elsevier/North-Holland, Ams-terdam, 2003.

14

TABLE IVRELATIVE DATA LOSS ERROR e IN Bridge

Buffer Traffic Drain ratecapacity type 1r 1.5r 2r 2.5r

10ms Description 1 1.64% 1.38% 0.49% 0%Description 2 0.32% 2.36% 0.72% 0.30%

20ms Description 1 2.28% 0.53% 0.49% 1.22%Description 2 0.75% 0.35% 0.48% 0%

30ms Description 1 3.03% 0.93% 0.25% 0.65%Description 2 0.22% 0.18% 0.49% 0%

40ms Description 1 3.38% 0.96% 1.07% 0.73%Description 2 1.85% 1.69% 0% 0.31%

[6] M. Dai and D. Loguinov, “Analysis and Modeling of MPEG-4 and H.264Multi-Layer Video Traffic,” IEEE INFOCOM, Mar. 2005.

[7] P. Embrechts, F. Lindskog, and A. McNeil, “Correlation and Dependencein Risk Management: Properties and Pitfalls,” Cambridge UniversityPress, 2002.

[8] F. H. P. Fitzek and M. Reisslein, “MPEG-4 and H.263 Video Tracesfor Network Performance Evaluation (Extended Version),” available athttp://www-tkn.ee.tu-berlin.de.

[9] M. Frey and S. Nguyen-Quang, “A Gamma-Based Framework forModeling Variable-Rate MPEG Video Sources: the GOP GBAR Model,”IEEE/ACM Trans. on Networking, vol. 8, Dec. 2000.

[10] M. Krunz and S. K. Tripathi, “On the Characterization of VBR MPEGStreams,” Proc. of ACM SIGMETRICS, vol. 25, June 1997.

[11] M. Krunz and A. Makowski, “Modeling Video Traffic UsingM/G/infinity Input Processes: A Compromise Between Markovian andLRD Models,” IEEE J. Select. Areas Commun., vol. 16, June 1998.

[12] M. W. Garrett and W. Willinger, “Analysis, Modeling and Generationof Self-Similar VBR Video Traffic,” Proc. of ACM SIGCOMM, Aug.1994.

[13] J. C. Goswami and A. K. Chan, Fundamentals of Wavelets: Theory,Algortihms, and Applications, John Wiley Sons, 1999.

[14] D. P. Heyman, A. Tabatabai, and T. V. Lakshman, “Statistical Analysisand Simulation Study of Video Teleconference Traffic in ATM Net-works,” IEEE Trans. on CSVT, vol. 2, Mar. 1992.

[15] D. P. Heyman and T. V. Lakshman, “What are the Implications of Long-Range Dependence for VBR Video Traffic Engineering,” IEEE/ACMTrans. on Networking, vol. 4, pp. 301-317, June 1996.

[16] D. P. Heyman, “The GBAR Source Model for VBR Video Conferences,”IEEE/ACM Trans. on Networking, vol. 5, Aug. 1997.

[17] C. Huang, M. Devetsikiotis, I. Lambadaris, and A. R. Kaye, “Modelingand Simulation of Self-Similar Variable Bit Rate Compressed Video: AUnified Approach,” Proc. of ACM SIGCOMM, Aug. 1995.

[18] M. R. Ismail, I. E. Lambadaris, M. Devetsikiotis, and A. R. Raye,“Modeling Prioritized MPEG Video Using TES and a Frame SpreadingStrategy for Transmission in ATM Networks,” Proc. of INFOCOM, Apr.1995.

[19] H.-D. J. Jeong, D. McNickle, and K. Pawlikowski, “Fast Self-SimilarTeletraffic Generation Based on FGN and Wavelets,” IEEE InternationalConference on Networks, Sept. 1999.

[20] A. Lombardo, G. Morabito, and G. Schembra, “An Accurate andTreatable Markov Model of MPEG-Video Traffic,” Proc. of INFOCOM,Mar. 1998.

[21] A. Lombardo, G. Morabito, S. Palazzo, and G. Schembra, “A Markov-Based Algorithm for the Generation of MPEG Sequences MatchingIntra- and Inter-GoP Correlation,” European Trans. on Telecommuni-cations, vol. 12, Apr. 2001.

[22] S. Q. Li and C.-L. Hwang, “Queue Response to Input CorrelationFunctions: Discrete Spectral Analysis,” IEEE/ACM Trans. Networking,Feb. 1993.

[23] D. Liu, E. I. Sara, and W. Sun, “Nested Auto-Regressive Processes forMPEG-Encoded Video Traffic Modeling,” IEEE Trans. on CSVT, vol.11, Feb. 2001.

[24] S. Ma and C. Ji, “Modeling Heterogeneous Network Traffic in WaveletDomain,” IEEE/ACM Trans. on Networking, vol. 9, Oct. 2001.

[25] B. Maglaris, D. Anastassiou, P. Sen, G. Karlsson, and J. Robbins,“Performance Models of Statistical Multiplexing in Packet Video Com-munications,” IEEE Trans. on Comm., vol. 36, July 1988.

[26] B. Melamed and D. E. Pendarakis, “Modeling Full-Length VBR VideoUsing Markov-Renewal-Modulated TES Models,” IEEE J. Select. AreasCommun., vol. 16, June 1998.

[27] K. Park and W. Willinger, Self-Similar Network Traffic and PerformanceEvaluation, John Wiley & Sons, 2000.

[28] A. Papoulis, Probability, Random Variables, and Stochastic Processes,WCB/McGraw-Hill, 1991.

[29] G. Ramamurthy and B. Sengupta, “Modeling and Analysis of a VariableBit Rate Multiplexer,” IEEE INFOCOM, Florence, Italy, 1992.

[30] M. Reisslein, J. Lassetter, S. Ratnam, O. Lotfallah, F. H. P. Fitzek, andS. Panchanathan, “Video Traces for Network Performance Evaluation,”available at http://trace.eas.asu.edu.

[31] V. J. Ribeiro, R. H. Riedi, M. S. Crouse, and R. G. Baraniuk, “MultiscaleQueuing Analysis of Long-Range-Dependent Network Traffic,” Proc. ofINFOCOM, Mar. 2000.

[32] O. Rose, “Statistical Properties of MPEG Video Traffic and TheirImpact on Traffic Modeling in ATM Systems,” Proc. of the 20th AnnualConference on Local Computer Networks, Oct. 1995.

[33] O. Rose, “Simple and Efficient Models for Variable Bit Rate Mpeg VideoTraffic,” Performance Evaluation, vol. 30, 1997.

[34] U. K. Sarkar, S. Ramakrishnan, and D. Sarkar, “Modeling Full-Length Video Using Markov-Modulated Gamma-Based Framework,”IEEE/ACM Trans. on Networking, vol. 11, Aug. 2003.

[35] P. Seeling and M. Reisslein, “Video Coding with Multiple Descriptorsand Spatial Scalability for Device Diversity in Wireless Multi-hopNetworks,” Arizona State University Technical Report, July 2004.

[36] R. Sherman, M. S. Taqqu, W. Willinger, and D. V. Wilson, “StatisticalAnalysis of Ethernet LAN traffic at teh Source Level,” Proc. ACMSIGCOMM, CA, 1995.

[37] A. H. Tewfik and M. Kim, “Correlation structure of the discrete waveletcoefficients of fractional Brownian motion,” IEEE Trans. on InformationTheory, vol. 38, Mar. 1992.

[38] Y. Wang, J. Ostermann, and Y.-Q. Zhang, Video Processing and Com-munications, Prentice Hall, 2001.

[39] Y. Wang, A. R. Reibman, and S. Lin, “Multiple Description Coding forVideo Delivery,” Proceedings of the IEEE, vol. 93, No. 1, Jan. 2005.

[40] J.-A. Zhao, B. Li, and I. Ahmad, “Traffic Model For Layered Video:An Approach On Markovian Arrival Process,” Packet Video, Apr. 2003.

Documents

A Uniﬁed Trafﬁc Model for MPEG-4 and H.264 Video Tracesirl.cs.tamu.edu/people/min/papers/infocom2005-tr.pdfStar Wars IV [30] FGS 30 MPEG-4 Citizen Kane [30] Temporal 30 MPEG-4