7/31/2019 [YMKT99]Measurement and Modeling of the Temporal Dependence in Packet Loss
1/22
Measurement and Modelling of the Temporal Dependence
in Packet Loss
Maya Yajnik, Sue Moon, Jim Kurose and Don Towsley
Department of Computer Science
University of Massachusetts
Amherst, MA 01003 USA
emails: [yajnik,sbmoon,kurose,towsley]@cs.umass.edu
UMASS CMPSCI Technical Report # 98-78
Abstract
Understanding and modelling packet loss in the Internet is especially relevant for the design and analysis of
delay-sensitive multimedia applications. In this paper, we present analysis of hours of end-to-end unicast and
multicast packet loss measurement. From these we selected hours of stationary traces for further analysis. We
consider the dependence as seen in the autocorrelation function of the original loss data as well as the dependence
between good run lengths and loss run lengths. The correlation timescale is found to be 1 second or less. We evaluate
the accuracy of three models of increasing complexity: the Bernoulli model, the 2-state Markov chain model and
the $k$-th order Markov chain model. Out of the trace segments considered, the Bernoulli model was found to be accurate for some segments and the 2-state model for others. A Markov chain model
of order or greater was found to be necessary to accurately model the rest of the segments. For the case of
adaptive applications which track loss, we address two issues of on-line loss estimation: the required memory size
and whether to use exponential smoothing or a sliding window average to estimate average loss rate. We find that a
large memory size is necessary and that the sliding window average provides a more accurate estimate for the same
effective memory size.
Keywords: measurement, modelling, loss, temporal dependence, correlation.
This work was supported by the National Science Foundation under grant NCR 9508274 and by Defense Advanced Research Projects
Agency under agreement F30602-98-2-0238. Any opinions, findings, and conclusions or recommendations expressed in this material are
those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
1 Introduction
Packet loss is a key factor determining the quality seen by delay-sensitive multimedia applications such as au-
dio/video conferencing and Internet telephony. These applications experience degradation in quality with increasing loss and delay in the network. Understanding the loss seen by such applications is important in their design and
performance analysis. Adaptive audio/video applications adjust their transmission rate according to the perceived
congestion level in the network (see [6, 8, 9]). By this adjustment a suitable loss level can be maintained and band-
width can be shared fairly between connections. For such adaptive applications it is important to have simple loss
models that can be parameterized in an on-line manner.
In this paper, we present careful analysis of end-to-end loss using hours of measurements. The measurements
are of loss as seen by packet probes sent at regular intervals (of 20, 40, 80, and 160 ms) on both unicast
and multicast Internet connections. Since significant non-stationary effects are seen in the data, we divide the traces
into two-hour segments and check each segment for stationarity. Gradual decrease and increase in the mean loss rate,
abrupt and dramatic increase in loss rate for a few minutes and spikes of high loss rate are observed. We analyze
hours of stationary trace segments to determine the extent of temporal dependence in the data. The results show that the correlation timescale, the time over which what happens to one packet is connected to what happens to another
packet, is approximately 1 second or less. Beyond this timescale, packet losses are independent of each other.
We also consider loss models of increasing complexity and estimate the level of complexity necessary to accurately model the stationary trace segments. The loss process is modelled as a $2^k$-state discrete-time Markov chain model, where $k = 0$ corresponds to the Bernoulli model and $k = 1$ corresponds to a 2-state Markov chain model. $k$ is referred to as the order of the Markov chain. For the datasets with sampling intervals of 160 ms, the Bernoulli
model was found to be accurate for segments, the 2-state Markov model for segments and Markov chain models
of orders to for the remaining . The Bernoulli and the 2-state Markov chain models are not accurate for the
traces with sampling intervals of 20 and 40 ms, and the estimated order of the Markov chain model ranged from
to .
Adaptive multimedia applications often track the congestion level in the network in order to adjust the rate of transmission to share the bandwidth fairly and to maintain a low enough loss level ([6, 9]). They can use information
about loss in the network over a period of time, instead of the traditional per-packet loss information. For such
situations of on-line loss estimation, it is useful to know how much past information is necessary to accurately
estimate the parameters of the loss models. The non-stationarity observed in the data shows that the loss rate does
vary over time, sometimes dramatically, and thus up-to-date information is important.
Because of the need of adaptive applications for loss rate estimates, we address the question of how much
memory is necessary for the on-line parameter estimator to provide an accurate enough estimate. In general, it is
found that the variance in the estimator due to limited number of samples is high and thus large numbers of samples
are required to get an accurate estimate. We also address the question of whether or not exponential smoothing
provides a better estimate of the mean loss rate than the sliding window average. Though computationally quicker
and requiring less buffer space, exponential smoothing needs more than twice as many samples to provide an estimate
with the same accuracy as the sliding window average.
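The two on-line estimators being compared can be sketched as follows; this is a minimal illustration (the function names and the example window size and smoothing weight are ours, not the paper's):

```python
def sliding_window_rate(trace, w):
    """Average loss rate over the last w loss indicators (1 = lost)."""
    window = trace[-w:]
    return sum(window) / len(window)

def ewma_rate(trace, alpha):
    """Exponentially smoothed loss rate: est <- (1 - alpha)*est + alpha*x."""
    est = trace[0]
    for x in trace[1:]:
        est = (1 - alpha) * est + alpha * x
    return est

# A trace that ends in a burst of losses: both estimators react,
# but they weight the recent burst differently.
trace = [0] * 90 + [1] * 10
print(sliding_window_rate(trace, 20))   # 0.5
print(round(ewma_rate(trace, 0.1), 4))
```

The sliding window weights the last `w` samples equally, while exponential smoothing weights samples geometrically by age; the comparison in Section 5 concerns the estimator variance these two weightings produce for a given effective memory.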
Earlier works on the measurement of unicast and multicast packet loss in the Internet ([1, 4, 5, 14, 15]) have
noted that the number of consecutively lost packets is small. In [15] long outages of several seconds and minutes
were also observed on the MBone (multicast backbone network overlaid on the Internet). In [13] measurements of
voice traffic are analyzed to assess the effects of strategies to compensate for varying delay and loss on the quality
of the voice connection. In [1] statistical analysis of temporal correlation in loss data has been described and weak
correlation has been noted. The use of discrete-time Markov chain models, particularly the 2-state Markov chain
Table 1: Trace Descriptions

     Date      Type       Sampling   Destination               Time    Duration    Loss
                          Interval                                                 Rate
1    14Nov97   unicast    80ms       SICS, Sweden              09:52   8hr         2.70%
2    14Nov97   multicast  80ms       SICS, Sweden              09:53   8hr         11.03%
3    20Nov97   unicast    160ms      SICS, Sweden              16:03   2days       1.91%
4    12Dec97   multicast  160ms      St. Louis, Missouri       14:23   2days       5.24%
5    20Dec97   unicast    20ms       Seattle, Washington       13:41   2hr 27min   1.69%
6    20Dec97   multicast  20ms       Seattle, Washington       13:49   2hr 27min   3.76%
7    21Dec97   unicast    20ms       Los Angeles, California   13:17   2hr         1.38%
8    21Dec97   multicast  20ms       Los Angeles, California   13:26   2hr         3.37%
9    26Jun98   unicast    40ms       Atlanta, Georgia          16:56   6hr         2.22%
model (sometimes called the Gilbert model) has been proposed in [14, 7, 8, 11]. Discrete-time Markov chain models
of increasing levels of complexity, including the 2-state Markov chain model have been described in [14]. This paper
takes a more careful look at both temporal dependence and the validity of using various models for packet loss using
hours of stationary data.
The remainder of this paper is structured as follows. In Section 2 we describe how the data was collected. In
Section 3 we first describe two ways of representing the data and then go on to discuss our analysis of stationarity
and the temporal dependence in the data. The models and an evaluation of their accuracy is discussed in Section 4.
In Section 5 we discuss the memory size required for on-line estimation of model parameters and also the relative
suitability of exponential smoothing and sliding window averaging as ways of estimating the average loss rate.
Finally, the conclusions are given in Section 6.
2 Measurement
Our study of end-to-end loss in the Internet is based on hours of traces. These traces were gathered by sending
out packet probes along unicast and multicast end-to-end connections at periodic intervals (of 20, 40, 80, or 160 ms)
and recording the sequence numbers of the probe packets that arrived successfully at the receiver. The packets whose
sequence numbers were not recorded were assumed to have been lost. Thus the loss data can be considered to be a binary time series $\{X_i\}$ where $X_i$ takes the value 0 if the $i$-th probe packet arrived successfully and the value 1 if it was lost. The interval at which the probe packets are sent out into the network is referred to as the sampling interval for the rest of the paper.
The measurements are summarized in Table 1. The source was located at Amherst, Massachusetts for all the traces. Note that one multicast trace (not shown in the table) displayed an extremely high loss rate and was therefore disregarded for analysis. We refer to the datasets using the notation Date.Type (for example,
20Nov97.uni).
Figure 1: The Loss Percentage for a two-hour segment of 14Nov97.uni
3 Analysis
We consider two ways of representing the loss data. The first is as a binary time series $\{X_i\}$ where $X_i$ takes the value 0 if the probe packet arrived successfully and the value 1 if it was lost. The time series is a discrete-valued time series which takes on values in the set $\{0, 1\}$.
A second way of representing the data is as alternating sequences of good runs and loss runs. The trace can be divided into portions of consecutive 0s and portions of consecutive 1s. The good runs are the lengths (expressed in number of packets) of the portions of the trace which consist solely of consecutive 0s. Similarly, the loss runs are the lengths of the portions of the traces which consist solely of consecutive 1s. Thus the data can be considered to be the interleaving of two sequences of observations, $\{g_j\}$ and $\{l_j\}$, where $g_j$ is the $j$-th good run length and $l_j$ is the corresponding loss run length.
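This run-length representation can be extracted from the binary series with a single scan; a minimal sketch (the function name is ours):

```python
def to_runs(series):
    """Split a binary loss series (0 = received, 1 = lost) into the
    interleaved good-run and loss-run length sequences."""
    good, loss = [], []
    i, n = 0, len(series)
    while i < n:
        # Advance j to the end of the current run of equal values.
        j = i
        while j < n and series[j] == series[i]:
            j += 1
        (loss if series[i] == 1 else good).append(j - i)
        i = j
    return good, loss

# Three received, two lost, two received:
print(to_runs([0, 0, 0, 1, 1, 0, 0]))  # ([3, 2], [2])
```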
The remainder of this section is structured as follows. We discuss our analysis of the traces for stationarity
in Section 3.1. We find that a majority of the two-hour segments exhibit stationarity. In Section 3.2 we focus on the temporal
dependence in these stationary trace segments as shown by the sample autocorrelation function of the binary time
series representation of the data. We discuss our analysis of the temporal dependence using the good run and loss
run representation of the data in Section 3.3.
3.1 Stationarity
By looking at the smoothed trend of the loss data, obvious non-stationarity was found in most of the traces. Therefore, the traces were divided into approximately two-hour segments and each segment was checked for stationarity. For
the rest of the paper we refer to the datasets using the notation Date.Type-Segment# (for example, 14Nov97.uni-
3). A time series is said to be stationary in the strict-sense, if the statistical properties remain constant over the
entire series. A time series is said to be stationary in the wide-sense (also called weak stationarity) if the mean
and covariance function remain constant over time. Since there is no good way to rigorously test for stationarity,
Table 2: Stationarity

     Dataset          Sampling   Seg-    Number of
                      Interval   ments   Stationary
                                         Segments
1    14Nov97.uni      80ms       4       none
2    14Nov97.multi    80ms       4       none
3    20Nov97.uni      160ms      24      19
4    12Dec97.multi    160ms      25      14
5    20Dec97.uni      20ms       1       1
6    20Dec97.multi    20ms       1       1
7    21Dec97.uni      20ms       1       1
8    21Dec97.multi    20ms       1       none
9    26Jun98.uni      40ms       3       3
we checked whether the average loss rate varied significantly in the trace segments. We plotted the smoothed trace
using a finite moving average filter (of window size of packets) to judge the extent to which the average loss
rate varied over the length of the trace. Abrupt increases in the average loss rate, seen in the smoothed trace, were taken to be an indication of non-stationarity. Also, to test for gradual increase or decrease in
the average loss rate, we fit a straight line to the data (a straight line which minimizes the least-square error) as an
estimate of the linear trend. A sufficiently large total change in the average loss rate over a two-hour trace segment, as estimated by the least-squares straight line, was considered to be an indication of non-stationarity.
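The linear-trend part of this check can be implemented with an ordinary least-squares fit against the sample index; a sketch (the example series below is illustrative, not a measured trace):

```python
def linear_trend(xs):
    """Least-squares slope and intercept of a series vs. its index."""
    n = len(xs)
    mi = (n - 1) / 2                  # mean of the indices 0..n-1
    mx = sum(xs) / n                  # mean of the series
    num = sum((i - mi) * (x - mx) for i, x in enumerate(xs))
    den = sum((i - mi) ** 2 for i in range(n))
    slope = num / den
    return slope, mx - slope * mi

def trend_change(xs):
    """Total change of the fitted line across the whole segment."""
    slope, _ = linear_trend(xs)
    return slope * (len(xs) - 1)

# A smoothed loss-rate series drifting linearly from 2% to 4%:
xs = [2.0 + 2.0 * i / 99 for i in range(100)]
print(round(trend_change(xs), 6))  # 2.0
```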
There are different kinds of observed non-stationary effects. For example, consider the smoothed loss for the third segment of trace 14Nov97.uni in Fig. 1. In the first half of the segment there is a slow, linear decay in loss percentage over one hour. Such slow decays and increases were noticed in other traces as well. Also, in this case there is an abrupt, dramatic increase in loss which lasts for approximately 10
minutes before abruptly dropping to a lower loss rate. Such abrupt increases were noticed in other traces although
this trace segment shows the most dramatic increases which last the longest time.
Table 2 summarizes the results of our analysis for stationarity. Traces 20Dec97.multi and 21Dec97.multi
exhibit periodic behavior (with a period of ) making them difficult to analyze and model. Thus we identified a subset of the two-hour segments as exhibiting stationary behavior. The stationary segments of five traces:
20Nov97.uni, 12Dec97.multi, 20Dec97.uni, 21Dec97.uni and 26Jun98.uni have been analyzed for tem-
poral dependence in the following sections.
3.2 Autocorrelation of the Binary Loss Data
The sample autocorrelation function of the loss data for the stationary segments, expressed as the binary time series
reveals the extent of dependence in the loss experienced by the probe packets over time. In this section,
we define the autocorrelation, describe the estimation of correlation timescale from the autocorrelation function and
discuss the results for the loss data.
Let $\{X_i\}$ be a stationary sequence of random variables and let $\{x_i\}$ be a realization of $\{X_i\}$. The autocorrelation function for the stationary sequence of random variables is defined as

$$R(k) = \frac{E[(X_i - \mu)(X_{i+k} - \mu)]}{E[(X_i - \mu)^2]}$$

where $\mu$ is the mean and $k$ is the lag.

The sample mean for observed data is

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

The sample autocovariance function, assuming stationarity, is

$$\hat{\gamma}(k) = \frac{1}{n} \sum_{i=1}^{n-k} (x_i - \bar{x})(x_{i+k} - \bar{x})$$

where $k$ is the lag. The sample autocorrelation function is

$$\hat{\rho}(k) = \frac{\hat{\gamma}(k)}{\hat{\gamma}(0)} \qquad (1)$$

The lag can also be expressed in terms of time. That is, let $d$ be the lag in terms of time:

$$d = k\,\Delta \qquad (2)$$

where $\Delta$ is the sampling interval.
The correlation timescale is the minimum lag, expressed in terms of time, at and beyond which the time series is uncorrelated. In the case of an independent stochastic process, the autocorrelation function is zero at all nonzero lags, and if the stochastic process is independent at a lag $k$, then the autocorrelation function, $R(k)$, is zero at that lag. However, in the case of an observed sample sequence which is the realization of an independent stochastic process, the sample autocorrelation function could have non-zero, though very small, values. Proposition 1 gives an idea of what constitutes a significant value for the sample autocorrelation function for a given number of samples, $n$.
Proposition 1 For large $n$, the sample autocorrelations of an IID sequence with finite variance are approximately IID with a normal distribution (mean of 0 and variance of $1/n$) (see [10]). Hence if $\{x_i\}$ is a realization of such an IID sequence then approximately 95% of the sample autocorrelations should fall between the confidence bounds $\pm 1.96/\sqrt{n}$.
This proposition specifies what constitutes a significant value for the sample autocorrelation function for a given number of samples. The correlation timescale is the smallest lag in terms of time at and beyond which the value of the sample autocorrelation function becomes small enough to be considered insignificant.
Using Proposition 1 we can test the following hypothesis:
Hypothesis 1 The time series is independent at lag $k$.
The test statistic $T$ is calculated from the sample autocorrelation function at lag $k$ of the sample sequence as
Figure 2: Sample Autocorrelation Function of Binary Data for 20Nov97.uni-1

Figure 3: Sample Autocorrelation Function for three trace segments with sampling intervals of 160 ms, 40 ms, and 20 ms (20Nov97.uni-1, 26Jun98.uni, 20Dec97.uni)
follows:

$$T = \sqrt{n}\,\hat{\rho}(k)$$

If the hypothesis is true, then $T$ has a limiting standard normal distribution. Thus, for example, if $|T| > 1.96$ then the hypothesis is rejected at significance level 0.05.
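Equation (1) and this test translate directly into code, assuming the binary loss series is held as a list; a minimal sketch (the names and the toy trace are ours):

```python
import math

def sample_autocorr(xs, k):
    """Sample autocorrelation rho_hat(k) of Eq. (1)."""
    n = len(xs)
    m = sum(xs) / n
    cov = sum((xs[i] - m) * (xs[i + k] - m) for i in range(n - k)) / n
    var = sum((x - m) ** 2 for x in xs) / n
    return cov / var

def independent_at_lag(xs, k, z=1.96):
    """Hypothesis 1 test: accept independence at lag k when
    |sqrt(n) * rho_hat(k)| is within the normal quantile z."""
    return abs(math.sqrt(len(xs)) * sample_autocorr(xs, k)) <= z

# Bursty toy trace: losses arrive in adjacent pairs, so the lag-1
# autocorrelation is large and independence at lag 1 is rejected.
trace = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0] * 200
print(round(sample_autocorr(trace, 1), 3), independent_at_lag(trace, 1))
```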
The results for the loss data show that there is some correlation at small lags, and that the sample autocorrelation
function decays rapidly. As an example, consider the first segment of trace 20Nov97.uni, 20Nov97.uni-1, with a sampling interval of 160 ms. Fig. 2 shows the autocorrelation function of the binary loss data. A value of the sample autocorrelation function close to zero at a lag of $k$ indicates independence between $X_i$ and $X_{i+k}$. Since there are a finite number of samples, the confidence bounds of $\pm 1.96/\sqrt{n}$ show the range of values which are close enough to zero to indicate independence. In the figure, the confidence bounds ($\pm 1.96/\sqrt{n}$) are plotted as dotted lines. The
autocorrelation at lag 1 (that is, the correlation between consecutive packets) is small but significant. It is clear that from a lag of 3 onwards the autocorrelation function becomes insignificantly small except for a few stray points. This means that the data is dependent up to a lag of 2 and that the correlation timescale for this trace segment is three lags, i.e. 480 ms.
We analyzed all of the stationary segments of our traces. The traces 20Nov97.uni and 12Dec97.multi had sampling intervals of 160 ms, the traces 20Dec97.uni and 21Dec97.uni had sampling intervals of 20 ms, and the trace 26Jun98.uni had a sampling interval of 40 ms. For the trace segments with sampling intervals of 160 ms,
the sample autocorrelation functions were similar to that described in the example trace segment 20Nov97.uni-1.
That is, the autocorrelation function at lag 1 is usually significant, and the function decays to insignificant values at higher lags. When the lag is expressed in terms of time (refer to Eq. (2)), this corresponds to a correlation timescale of 1 second or less. The correlation between consecutive packets for datasets 20Nov97.uni and 12Dec97.multi was found to vary across segments.
The behavior of the sample autocorrelation function for traces with sampling intervals of 20 and 40 ms is somewhat different from that for traces with sampling intervals of 160 ms. For the 20 ms traces, the autocorrelation
Figure 4: Frequency Distribution of Correlation Timescale
function at lag 1 is much higher than for the 160 ms trace segments. This indicates much higher correlation between consecutive packets for sampling intervals of 20 ms than is present for traces with 160 ms intervals. The trace segments with sampling intervals of 40 ms showed a similarly high autocorrelation at lag 1. The autocorrelation functions for the traces with a 20 ms sampling interval remain significant up to larger lags, and similarly for the traces with sampling intervals of 40 ms. Fig. 3 shows the sample autocorrelation as a function of lag expressed in terms of time (Eq. (2)), for three trace segments, each with a different sampling interval. Here the chosen trace segments are 20Nov97.uni-1, 26Jun98.uni and 20Dec97.uni, with sampling intervals of 160 ms, 40 ms and 20 ms respectively. Thus from this figure it is clear that the correlation timescale is 1 second or less.
The correlation timescale of a data segment is estimated as the smallest lag (expressed in terms of time) at which Hypothesis 1 is not rejected (that is, as the smallest lag at which the time series is independent). A significance level of 0.05 is used. This method could underestimate the correlation timescale, since the autocorrelation function could show dependence at some higher lag, but we have ignored this effect. The results for all the stationary trace segments are summarized in Fig. 4 as a frequency distribution of the correlation timescale estimated for each data segment. Here we can see that the correlation timescale is small, with only two exceptions.
For comparison with traces with sampling intervals of 160 ms, the traces 20Dec97.uni and 21Dec97.uni were partitioned into subtraces, each subtrace corresponding to a trace with a sampling interval of 160 ms. Also, the segments of trace 26Jun98.uni were partitioned into subtraces, each corresponding to a trace with a sampling interval of 160 ms. As expected, the sample autocorrelation functions of the subtraces are similar to those seen for the traces with sampling intervals of 160 ms.
Thus, the results indicate a correlation timescale of 1 second or less over all the stationary trace segments.
Also, this suggests that loss may not be adequately modelled by an IID process such as the Bernoulli process since the traces exhibit some correlation at small lags. The 2-state Markov chain model is able to accurately model the
correlation for lag 1 but since the correlation predicted decays rapidly for larger lags, it is inaccurate with respect to
modelling the correlation for lags greater than 1.
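The rapid decay can be seen directly from the 2-state chain's autocorrelation function: for transition probabilities $p = P(0 \to 1)$ and $q = P(1 \to 0)$, the lag-$k$ autocorrelation of the chain is $(1-p-q)^k$, a standard result. A sketch with illustrative parameter values (not fitted from the traces):

```python
def markov2_autocorr(p, q, k):
    """Lag-k autocorrelation of a 2-state chain with transition
    probabilities p = P(0 -> 1) and q = P(1 -> 0): (1 - p - q)^k."""
    return (1 - p - q) ** k

# Significant at lag 1, but negligible by lag 4: geometric decay.
print([round(markov2_autocorr(0.01, 0.7, k), 4) for k in range(1, 5)])
# [0.29, 0.0841, 0.0244, 0.0071]
```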
3.3 Good Run and Loss Run Lengths
The second way of representing the data is as alternating sequences of good run lengths and loss run lengths: that is, $\{g_j\}$ and $\{l_j\}$. The good runs are the lengths (expressed in number of packets) of the portions of the trace which consist solely of consecutive 0s. Similarly, the loss runs are the lengths of the portions of the traces which consist solely of consecutive 1s. Since the binary loss data is not completely independent in many cases, we
check the good runs and loss runs for independence. The autocorrelation function of the good run lengths and the loss
run lengths gives information about the extent of dependence between the run lengths. Thus if the autocorrelation
functions are found to have very small values, then it indicates that there is little dependence between the lengths
of runs and the underlying processes can be assumed to be IID. Similarly, if the crosscorrelation function between
the good runs and the loss runs is found to be small it indicates that the lengths of good runs and loss runs are not
dependent on each other. It was found that, in most cases, the good runs and the loss runs are indeed independent.
Thus the good runs and the loss runs can be modelled separately as IID random variables.
Fig. 5 shows the autocorrelation function of the good run lengths, the autocorrelation function of the loss run
lengths and the crosscorrelation function between the lengths of the good runs and the loss runs for an example trace segment 20Nov97.uni-1. The dotted lines indicate the confidence bounds for the range of values considered close
to zero thus indicating independence for that lag as specified by Proposition 1. It is evident that there is insignificant
dependence among the good runs, among the loss runs and between good runs and loss runs.
We analyzed all the stationary segments of the traces similarly. The three sample correlation functions (au-
tocorrelation of good run lengths, autocorrelation of loss run lengths and the crosscorrelation between good run
lengths and loss run lengths) indicate independence for all the segments in the trace 20Nov97.uni. The trace
12Dec97.multi did show dependence between the good run lengths in a few cases, at small lags.
The traces with sampling intervals of 20 ms (20Dec97.uni and 21Dec97.uni) show different results. The
autocorrelation function of the loss runs and the crosscorrelation function of the good runs and the loss runs have
insignificantly small values thus indicating the independence of the loss run lengths and the independence of the
good runs from the loss runs. However, the good run lengths do show some dependence at small lags.
Fig. 6 shows the three correlation functions for the trace 20Dec97.uni which has a sampling interval of 20 ms.
The trace 26Jun98.uni with a sampling interval of 40 ms shows independence for all three correlation functions:
autocorrelation of good run lengths, autocorrelation of loss run lengths and crosscorrelation of good run and loss run
lengths.
The next step is to look at the distributions of the good run and the loss run lengths. Fig. 7 shows the distribution
of the lengths of the good runs expressed in the number of packets and Fig. 8 shows the distribution of the loss
run lengths for the same trace segment 20Nov97.uni-1. Both figures also show the distributions predicted by the
models for comparison with the observed data which will be discussed in Section 4. Here it is evident that the
distribution of the good run lengths decays approximately linearly when a logscale is used for the y-axis, suggesting a geometric distribution for the good run length process. The loss run lengths take only a few small values. The other stationary segments from traces with sampling intervals of 160 ms showed very similar results. The distribution of the good run lengths for the segments of traces 20Nov97.uni and 12Dec97.multi appears to be, by visual inspection, approximately geometric. For some segments there is an excess of very short good run lengths (less than 5). The loss run lengths were confined to a few small values for all but one segment of trace 20Nov97.uni and for all but three segments of trace 12Dec97.multi. This low number of consecutive losses has been noted in earlier work ([4], [5]).
The distributions of good run and loss run lengths for the traces 20Dec97.uni, 21Dec97.uni and 26Jun98.uni
look somewhat different. For example, consider the distribution of good run lengths shown in Fig. 9 for the trace
Figure 5: Autocorrelations and Crosscorrelation of Good Run and Loss Run Lengths for 20Nov97.uni-1

Figure 6: Autocorrelations and Crosscorrelation of Good Run and Loss Run Lengths for 20Dec97.uni
20Dec97.uni. The distribution of good run lengths shows geometric decay except for the high probability of short run lengths. The distribution of loss run lengths for this trace, shown in Fig. 10, shows approximately geometric decay. The traces 21Dec97.uni and 26Jun98.uni show similar results.
Since the good run lengths for 20Nov97.uni and 12Dec97.multi appear to be geometrically distributed, we
use the Chi-Square goodness-of-fit test to provide more systematic evidence that the distributions are geometric.
Instead of the discrete-valued geometric distribution we use its continuous-valued equivalent, the exponential distribution. We also compute the shape and rate parameters assuming a Gamma distribution for the good run lengths. The shape parameter ($\nu = 1/C_v^2$, where $C_v$ is the coefficient of variation) of the Gamma distribution gives an idea of how close the distribution is to the exponential distribution. A shape parameter of 1 corresponds to an exponential distribution, which is a special case of the Gamma distribution. The estimated shape parameters for the segments of 20Nov97.uni and 12Dec97.multi varied across segments.
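The shape parameter can be computed directly from the run lengths via the coefficient of variation; a sketch (the synthetic run lengths below, taken as exponential quantiles, are ours for illustration):

```python
import math

def gamma_shape(samples):
    """Gamma shape parameter 1 / Cv^2, where Cv = std / mean is the
    coefficient of variation; a value near 1 suggests an exponential."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean ** 2 / var  # equals 1 / Cv^2

# Run lengths taken as quantiles of an exponential distribution
# should yield a shape parameter close to 1.
runs = [-math.log(1 - (i + 0.5) / 1000) for i in range(1000)]
print(round(gamma_shape(runs), 2))
```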
The Chi-Square goodness-of-fit test is used to check whether to reject the hypothesis that the good run lengths are
IID random variables with the given distribution. Using the Chi-Square goodness-of-fit test, the following hypothesis
can be tested for both the Gamma and the Exponential distributions.
Hypothesis 2 The good run lengths are IID random variables with the given distribution function.
It is difficult to similarly use the Chi-Square goodness-of-fit test for the loss run lengths because the loss run lengths vary within a very small range. The continuous-value distributions were used to make it easier to
determine the partitions for the test. For traces 20Nov97.uni and 12Dec97.multi, the hypothesis is rejected for roughly one-third of the segments for the exponential distribution, and for a similar fraction for the Gamma distribution.
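A sketch of how such an equal-probability-bin chi-square test for the exponential fit can be set up (the bin count, the 5% critical value for 4 degrees of freedom, and the synthetic run lengths are our illustrative choices):

```python
import math

def chi_square_exponential(samples, bins=6):
    """Chi-square statistic for Hypothesis 2 against an exponential fit:
    equal-probability bins under Exp(mean), with the mean estimated
    from the data."""
    mean = sum(samples) / len(samples)
    # Bin edges q_j with P(X <= q_j) = j / bins under Exp(mean).
    edges = [-mean * math.log(1 - j / bins) for j in range(1, bins)]
    observed = [0] * bins
    for x in samples:
        b = 0
        while b < bins - 1 and x > edges[b]:
            b += 1
        observed[b] += 1
    expected = len(samples) / bins
    return sum((o - expected) ** 2 / expected for o in observed)

# Synthetic good-run lengths drawn as exponential quantiles (mean ~40
# packets); a small statistic means the exponential fit is not rejected.
good_runs = [-40 * math.log(1 - (i + 0.5) / 2000) for i in range(2000)]
stat = chi_square_exponential(good_runs)
# 5% critical value of chi-square with bins - 2 = 4 degrees of freedom.
print(stat < 9.488)
```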
Thus we conclude that the good run lengths are approximately geometrically distributed for the traces 20Nov97.uni
and 12Dec97.multi, although the fit is not exact for one-third of the trace segments.
For comparison with the traces with sampling intervals of 160 ms, the traces 20Dec97.uni, 21Dec97.uni and 26Jun98.uni were partitioned into subtraces, each subtrace corresponding to a trace with a sampling interval of 160 ms. These subtraces were then analyzed. As expected, the results are similar to those seen for the first two traces
Figure 7: The Distribution of Good Run Lengths for 20Nov97.uni-1

Figure 8: The Distribution of Loss Run Lengths for 20Nov97.uni-1

Figure 9: The Distribution of Good Run Lengths for 20Dec97.uni

Figure 10: The Distribution of Loss Run Lengths for 20Dec97.uni
with sampling intervals of 160 ms.
4 Modelling
In this section we discuss the potential of using three models of increasing complexity: the Bernoulli loss model, the 2-state Markov chain model and the $k$-th order Markov chain model for modelling the data. The loss process which characterizes the binary loss data (as described in Section 3) is considered to be a stationary discrete-time stochastic process taking values in the set $\{0, 1\}$. The order of the process is the number of previous values of the process on which the current value depends and can be considered as a measure of the complexity of the model. For an order of $k$, there are $2^k$ states. The Bernoulli model is the simplest model, with an order of 0 and no states. The 2-state Markov chain model is of order 1. The Markov chain model of $k$-th order is a generalization of these two models and has $2^k$ states.
First, in Section 4.1 we describe the models and associated parameter estimators. Then in Section 4.2 we assess
the validity of the models and the results for the stationary trace segments.
4.1 Models
4.1.1 Bernoulli Loss Model
In the Bernoulli loss model, the sequence of random variables {X_n} is IID (independent and identically distributed). That is, the probability of X_n being either 0 or 1 is independent of all other values of the time series, and the probabilities are the same irrespective of n. Thus this model is characterized by a single parameter p, the probability of X_n being a 1 (the value 1 corresponds to a lost packet). It can be estimated from a sample trace by

    p_hat = n_1 / n    (3)

where n_1 is the number of times the value 1 occurs in the observed time series, and n is the number of samples in the time series.

The good run length distribution for this model is

    P(good run length = b) = (1 - p)^(b-1) p,    b = 1, 2, ...

The loss run length distribution for this model is

    P(loss run length = b) = p^(b-1) (1 - p),    b = 1, 2, ...    (4)
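The estimator (3) and the geometric run-length distributions can be illustrated with a short sketch. This is our own illustrative code (the function names and the sample trace are invented for the example), not the measurement software used for the traces:

```python
def bernoulli_fit(trace):
    """Estimate the Bernoulli parameter p from a 0/1 loss trace (1 = lost)."""
    return sum(trace) / len(trace)

def good_run_pmf(p, b):
    """P(good run length = b): b - 1 further receptions, then a loss."""
    return (1.0 - p) ** (b - 1) * p

def loss_run_pmf(p, b):
    """P(loss run length = b): b - 1 further losses, then a reception."""
    return p ** (b - 1) * (1.0 - p)

p_hat = bernoulli_fit([0, 0, 1, 0, 0, 0, 1, 1, 0, 0])  # 3 losses in 10 samples
print(p_hat)  # 0.3
```

Both run-length distributions sum to 1 over b = 1, 2, ..., which makes a quick sanity check of the formulas easy.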
4.1.2 Two-state Markov Chain Model
In the 2-state Markov chain model the loss process is modelled as a discrete-time Markov chain with two states. The current value X_n of the stochastic process depends only on the previous value X_{n-1}. Unlike the Bernoulli model, this model is able to capture the dependence between consecutive losses, at the cost of an additional parameter. The model is characterized by two parameters, p and q, the transition probabilities between the two states:

    p = P(X_n = 1 | X_{n-1} = 0),    q = P(X_n = 0 | X_{n-1} = 1)
The maximum likelihood estimators (see [2]) of p and q for a sample trace are

    p_hat = n_01 / n_0    (5)

    q_hat = n_10 / n_1    (6)

where n_01 is the number of times in the observed time series that a 1 follows a 0, and n_10 is the number of times a 0 follows a 1. n_0 is the number of times a 0 occurs in the trace and n_1 is the number of times a 1 occurs in the trace.

The good run length distribution for this model is

    P(good run length = b) = (1 - p)^(b-1) p,    b = 1, 2, ...

The loss run length distribution for this model is

    P(loss run length = b) = (1 - q)^(b-1) q,    b = 1, 2, ...

Note that if p + q equals 1, this 2-state Markov chain model is equivalent to the Bernoulli model, since it means that the probability of loss is independent of which state the process is in. Also, the relative values of 1 - q and p provide information about how bursty the losses are. If 1 - q > p then a lost packet is more likely when the previous packet was lost than when the previous packet was successfully received. On the other hand, 1 - q < p indicates that loss is more likely if the previous packet was not lost.
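The estimators (5) and (6) can be sketched as follows. This is illustrative code of ours, using the convention p = P(loss | previous packet received), q = P(no loss | previous packet lost) from the definitions above:

```python
def markov2_fit(trace):
    """MLEs of the 2-state Markov chain parameters from a 0/1 loss trace:
    p = P(X_n = 1 | X_{n-1} = 0), q = P(X_n = 0 | X_{n-1} = 1)."""
    n0 = n1 = n01 = n10 = 0
    for prev, cur in zip(trace, trace[1:]):
        if prev == 0:
            n0 += 1
            n01 += cur       # cur == 1 means a loss followed a reception
        else:
            n1 += 1
            n10 += 1 - cur   # cur == 0 means a reception followed a loss
    p = n01 / n0 if n0 else 0.0
    q = n10 / n1 if n1 else 0.0
    return p, q

p, q = markov2_fit([0, 0, 1, 1, 1, 0, 0, 0, 1, 0])
print(p, q)  # 0.4 0.5; here 1 - q = 0.5 > p, i.e. bursty loss
```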
4.1.3 Markov Chain Model of -th order
A class of random processes that is rich enough to capture a large variety of temporal dependencies is the class offinite-state Markov processes. The Bernoulli model and the 2-state Markov chain model are special cases of this
class of models. In the Markov chain model, the current state of the process depends on a certain number of previous
values of the process. This number of previous values is is the order of the process.
Such a process is characterized by its order , and by a conditional probability matrix whose rows may
be interpreted as probability mass functions on ( ) according to which, the next random variable is
generated when the process is in a state .
A process is a Markov chain of order , if the conditional probability
is independent of for all (see [3]). Note that the Bernoulli and the 2-state Markov chain model are
special cases of the Markov chain model corresponding to orders of and respectively.
Let be an observed sequence or a sample path from a Markov source. The -th order state transition
probabilities of the Markov chain can be estimated for all as follows. The number of states is and
the number of conditional probabilities is .
Let s in {0, 1}^k be a given state of the chain. Let n(s, j) be the number of times state s is followed by value j in the sample sequence. Let n(s) be the number of times state s is seen. Let P_hat(j | s) be an estimate of the probability that X_n = j, given that the previous k values form state s; it estimates the transition probability from state s to the state obtained by appending j. The maximum likelihood estimators of the state transition probabilities of the k-th order Markov chain are

    P_hat(j | s) = n(s, j) / n(s)    if n(s) > 0
    P_hat(j | s) = 0                 otherwise
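The k-th order estimator can be sketched as follows (our illustrative code; states never observed in the trace are simply omitted, corresponding to the "otherwise" branch above):

```python
from collections import defaultdict

def markov_k_fit(trace, k):
    """Estimate P(loss | previous k values) for every state seen in a 0/1 trace."""
    counts = defaultdict(lambda: [0, 0])  # state -> [# followed by 0, # followed by 1]
    for i in range(k, len(trace)):
        counts[tuple(trace[i - k:i])][trace[i]] += 1
    return {state: c1 / (c0 + c1) for state, (c0, c1) in counts.items()}

probs = markov_k_fit([0, 0, 1, 0, 0, 1, 1, 0], k=2)
print(probs[(0, 1)])  # 0.5: after "reception, loss" the next packet was lost half the time
```

With k = 2 there are 2^2 = 4 possible states, matching the state count given above.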
4.2 Results for the Stationary Segments
The autocorrelation function of the binary loss data as discussed in Section 3.2 shows some correlation for small lags. The Bernoulli model does not capture any of the correlation seen, for example, in Fig. 2. The 2-state Markov chain model is able to accurately model the correlation only for a lag of 1. The estimated order of the binary loss process indicates the complexity of the Markov chain model necessary to accurately model the data. If the estimated order is 0, then the Bernoulli loss model is accurate; if it is 1, then the 2-state Markov chain model is accurate; and if it is some k of 2 or greater, then a Markov chain model of order k is required.
Testing Hypothesis 1 (that the binary loss data is independent at a given lag k) is discussed in Section 3.2, along with a test statistic based on the sample autocorrelation function. An alternative method to test the same hypothesis for independence of the binary loss data at a given lag is the Chi-Square Test for independence ([12]). The test statistic for the pairs (X_i, X_{i+k}) at a lag of k is

    Q = sum over j, l in {0, 1} of  (n_jl - n_j n_l / n)^2 / (n_j n_l / n)

where n_j is the number of times j occurs in the sample sequence and, similarly, n_l is the number of times l occurs in the sample sequence. n_jl is the number of times X_i = j and X_{i+k} = l (that is, the number of times l follows j after a lag of k). Q is distributed as a chi-square distribution with 1 degree of freedom.
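As a sketch, the statistic can be computed directly from the 2 x 2 contingency table of pairs (X_i, X_{i+k}). This is our own illustrative code; 3.841 is the 5% critical value of the chi-square distribution with 1 degree of freedom:

```python
def chi_square_lag_test(trace, k):
    """Chi-square statistic for independence of X_i and X_{i+k} (1 d.o.f.)."""
    n = len(trace) - k                       # number of (X_i, X_{i+k}) pairs
    obs = {(j, l): 0 for j in (0, 1) for l in (0, 1)}
    for i in range(n):
        obs[(trace[i], trace[i + k])] += 1
    row = {j: obs[(j, 0)] + obs[(j, 1)] for j in (0, 1)}  # marginals of X_i
    col = {l: obs[(0, l)] + obs[(1, l)] for l in (0, 1)}  # marginals of X_{i+k}
    q = 0.0
    for j in (0, 1):
        for l in (0, 1):
            expected = row[j] * col[l] / n
            if expected > 0:
                q += (obs[(j, l)] - expected) ** 2 / expected
    return q

# A strictly alternating trace is maximally dependent at lag 1:
print(chi_square_lag_test([0, 1] * 8, 1) > 3.841)  # True
```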
The order of the Markov chain process can be estimated by estimating the minimum lag beyond which the process is independent. This is similar to the estimation of the correlation timescale from the sample autocorrelation function as discussed in Section 3.2. Thus we test Hypothesis 1 for increasing lags; if k* is the first lag at which the hypothesis is not rejected, the estimated order is k* - 1. Fig. 11 shows the estimated order of the Markov chain for the stationary segments of all the datasets. The two methods of testing the independence hypothesis (using the sample autocorrelation function and using the chi-square test for independence) give the same results. Figure 11 shows that the Bernoulli loss model is accurate for some segments, the 2-state Markov chain model is accurate for others, and for the rest of the segments Markov chain models of higher orders are necessary for accurate modelling. The estimated order for datasets 20Nov97.uni and 12Dec97.multi is much lower than that estimated for datasets 20Dec97.uni, 21Dec97.uni and 26Jun98.uni.
We also checked the good run and loss run lengths for independence. It was found that, in most cases, as discussed in Section 3.3, the good run and the loss run lengths are, indeed, independent. Thus the good runs and the loss runs can be modelled separately as IID random variables. Both the Bernoulli and the 2-state Markov chain models model the good run and loss run processes as being independent.

Next we look at the distributions of the good run and the loss run lengths and check how well the two models
Figure 11: Frequency Distribution of the Order of the Markov Chain Model (order of the Markov chain on the x-axis, frequency on the y-axis)
predict the empirical distributions. Both the Bernoulli model and the 2-state Markov chain model predict that the good run and the loss run lengths follow geometric distributions. As discussed in Section 3.3, the good run length distribution was found to be approximately geometric for traces 20Nov97.uni and 12Dec97.multi, which means that the 2-state Markov chain model is suitable for these traces. However, since the Chi-Square goodness-of-fit test shows that the geometric distribution is not accurate for one-third of the segments, a Markov chain model with more states would more accurately model the good run lengths. The loss run lengths were found to be geometrically distributed in all cases.

The traces 20Dec97.uni, 21Dec97.uni and 26Jun98.uni show a greater degree of correlation. The good run lengths are clearly not geometrically distributed, as shown in Fig. 9, although the loss run lengths are geometrically distributed. Thus we conclude that neither the Bernoulli nor the 2-state Markov chain model is accurate for these traces, and a greater number of states is necessary to accurately model the loss process.
For the stationary segments of 20Nov97.uni, the estimated mean loss rate varies between 0.0005 and 0.0351 with an average of 0.0162; the two Markov chain parameters vary over similar ranges, as shown in Table 3.

Table 3 summarizes the estimated model parameters (as discussed in Section 4.1). It shows the minimum, maximum and average over all stationary segments for each dataset. For trace segments where the probability of loss given that the previous packet was lost is higher than the probability of loss given that the previous packet was received, the packet loss is burstier than predicted by the Bernoulli model and the 2-state Markov chain model is more accurate.
5 On-Line Parameter Estimation
Adaptive applications monitor the congestion level in the network in order to adjust their transmission rate, keeping the loss rate low and sharing the available bandwidth fairly with other connections. We address two issues in the on-line estimation of loss for such applications. The first issue is the amount of memory necessary for an accurate estimate of the Bernoulli model parameter (discussed in Section 5.1) and the 2-state Markov chain model parameters (discussed in Section 5.2). The second issue is whether to use exponential smoothing or a sliding window average
Table 3: Parameters for the Bernoulli and the 2-state Markov Chain Models

Data                       mean loss rate   P(loss | prev. received)   P(loss | prev. lost)
20Nov97.uni     minimum    0.0005           0.0004                     0.0208
                maximum    0.0351           0.0347                     0.0909
                average    0.0162           0.0158                     0.0471
12Dec97.multi   minimum    0.0376           0.0373                     0.0408
                maximum    0.0735           0.0749                     0.0636
                average    0.0496           0.0496                     0.0513
20Dec97.uni                0.0168           0.0141                     0.1732
21Dec97.uni                0.0135           0.0109                     0.2085
26Jun98.uni     minimum    0.0204           0.0178                     0.1452
                maximum    0.0254           0.0218                     0.1608
                average    0.0222           0.0192                     0.1546
to get up-to-date and accurate average loss rate estimates. This is discussed in Section 5.3.
5.1 Memory for the Bernoulli Model Parameter
Equation 3 gives an estimator for the single Bernoulli loss model parameter, p, which is equivalent to the mean loss rate. Assume that m values of the binary loss data are available for estimation of the parameter p. Henceforth, m will be referred to as the memory size of the estimator. The confidence interval for this estimator is

    p_hat +/- z sqrt( p_hat (1 - p_hat) / m )

where z is the standard normal quantile corresponding to the desired confidence level (z = 1.96 for 95%).

Fig. 12 shows the estimation of the mean loss rate for the example trace segment 20Nov97.uni-1 for memory sizes of 500, 1000 and 10000, plotted as a function of the packet sequence number. The graphs display the extent of variation in the estimated mean loss rate; the smaller the memory size, the wider the variation.

For an accuracy of +/- e in the estimator, at the confidence level corresponding to z, the number of observations m that are required is

    m = z^2 p (1 - p) / e^2

Note that this depends on the loss rate p. For example, with z = 1.96 and an accuracy of +/- 0.005, a loss rate of 0.02 requires a memory size of roughly 3000 samples.
5.2 Memory for the 2-State Markov Chain Model Parameters
For the 2-state Markov chain model, there are two parameters to be estimated: p, as specified by (5), and q, as specified by (6). Instead of considering the parameter q directly, consider the related parameter s = 1 - q, the probability that a loss is followed by another loss. Assume that a
memory of m values of the binary loss data is available for estimation of the parameters. Then the confidence interval for the estimator of p is

    p_hat +/- z sqrt( p_hat (1 - p_hat) / m_0 )

and the confidence interval for the estimator of the parameter s is

    s_hat +/- z sqrt( s_hat (1 - s_hat) / m_1 )

where m_0 and m_1 are the numbers of 0s and 1s, respectively, among the m samples, and z corresponds to the desired confidence level (z = 1.96 for 95%).

Fig. 13 shows the estimate of the value of p for the example trace segment 20Nov97.uni-1 for memory sizes of 1000 and 10000, plotted as a function of the packet sequence number. The graphs display the extent of variation in the estimated value. Similarly, Fig. 14 shows the estimate of the value of s for the example trace segment. The estimate of s fluctuates wildly, especially for a memory size of 1000.

Since m_0 is approximately m (1 - l), where l is the overall loss rate, the memory required for an accuracy of +/- e in the estimate of parameter p is

    m = z^2 p (1 - p) / ( e^2 (1 - l) )

Note that this depends on the fraction of packets lost, l. Similarly, since m_1 is approximately m l, the memory required for an accuracy of +/- e in the estimate of parameter s is

    m = z^2 s (1 - s) / ( e^2 l )

Because l is small, accurate estimation of s requires a much larger memory than accurate estimation of p.
5.3 Exponential Smoothing versus Sliding Window Averaging
Adaptive applications need to estimate the average loss rate in an on-line manner in order to adjust to the congestion in the network. There are two natural ways to estimate the loss rate on-line: exponential smoothing and sliding window averaging. The sliding window average gives the sample mean over a sliding window, say of size M samples. Thus only the latest M samples contribute to the current estimate, ensuring that older information is not
Figure 12: Estimating the mean loss rate for 20Nov97.uni-1 using sliding window averaging for memory sizes of 500, 1000 and 10000

Figure 13: Estimating the parameter p for 20Nov97.uni-1 for memory sizes of 1000 and 10000

Figure 14: Estimating the parameter s for 20Nov97.uni-1 for memory sizes of 1000 and 10000

(Each figure plots the estimated value against the packet sequence number, together with the overall value for the whole segment.)
used. Restricting the memory size is important because of possible non-stationarities in the loss data. Exponential smoothing is a computationally cheaper method for estimating the current mean, and it also requires less buffer space.

The question we address here is whether sliding window averaging gives a significantly better estimate of the loss rate, in terms of both the variance of the estimator and the memory required to get an accurate estimate. As we will see, the sliding window average is, indeed, a significantly better estimator. For the same effective memory size, exponential smoothing produces an estimate with more than double the variance of the sliding window estimate, and for the same variance it requires more than twice as much memory. In the analysis of the variance of the estimate we have assumed, for simplicity, that the loss process is IID (that is, we assume the Bernoulli model). In the presence of correlation, the variance of the sample mean estimator is higher than that calculated assuming independence.
In order to compare the two styles of on-line mean estimation, we first compute the effective memory size in the case of exponential smoothing. Consider the loss data as a binary time series {x_n}. Then for a gain of a, 0 < a < 1, the exponential smoothing estimator at time n is

    mu_hat_n = (1 - a) mu_hat_{n-1} + a x_n = a sum_{i >= 0} (1 - a)^i x_{n-i}

Thus all the previous samples are used, but the weights given to the old samples decay geometrically with the age of the sample relative to the latest sample.
Let the weight given to the sample observation that is i samples old, i >= 0, be w_i = a (1 - a)^i. Let M be a given number of observations and let W(M) be the sum of the weights given to all observations M or more samples old:

    W(M) = sum_{i >= M} a (1 - a)^i = (1 - a)^M    (7)

The effective memory, M_eff, is defined as the memory M for which the weight in the tail, W(M), is approximately equal to some small fraction delta, 0 < delta < 1. That is, using (7), M_eff is calculated as the M such that (1 - a)^M = delta. Since

    (1 - a)^M ~ e^(-aM)    (for small values of a)    (8)

the effective memory is given by M_eff ~ ln(1/delta) / a.
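The relation between gain and effective memory can be checked numerically. This is our illustrative code; delta = 0.01 (1% tail weight) is an arbitrary example choice, not a value prescribed above:

```python
import math

def effective_memory(a, delta=0.01):
    """Effective memory of an exponential smoother with gain a: the age M at
    which the tail weight (1 - a)^M first drops to the fraction delta,
    returned both exactly and via the small-a approximation (8)."""
    exact = math.ceil(math.log(delta) / math.log(1.0 - a))
    approx = math.log(1.0 / delta) / a
    return exact, approx

def gain_for_memory(m_eff, delta=0.01):
    """Gain whose effective memory is approximately m_eff."""
    return math.log(1.0 / delta) / m_eff

exact, approx = effective_memory(0.001)
print(exact, round(approx))  # 4603 4605
```

For small gains the exact and approximate values agree to within a fraction of a percent, so M_eff ~ ln(1/delta)/a is a convenient rule of thumb.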
An important property of an estimator is its relative efficiency, defined as its variance relative to that of the sliding window average (which is equivalent to the sample mean estimator). Let the population mean be mu and let the sample mean estimator over M samples (that is, the sliding window average) be mu_hat_sw. Then the variance of the sample mean estimator is

    Var(mu_hat_sw) = sigma^2 / M
Figure 15: Estimating the mean loss rate for 20Nov97.uni-1: comparison of exponential smoothing (effective memory 2000) and sliding window average (memory 2000)

Figure 16: Estimating the mean loss rate for 20Nov97.uni-1: comparison of exponential smoothing (effective memory 2000) and sliding window average (memory 1000)
where sigma^2 is the population variance. The variance of the exponential smoothing estimator is

    Var(mu_hat_exp) = a sigma^2 / (2 - a)

The relative efficiency of the exponential smoothing estimator (relative to the sliding window average) is the ratio of the variances

    Var(mu_hat_exp) / Var(mu_hat_sw) = a M / (2 - a)

The gain a can be calculated for a given effective memory size using (8). Substituting a = ln(1/delta) / M_eff, the relative efficiency of the exponential smoothing estimator with the effective memory M_eff equal to the memory M of the sliding window average is approximately ln(1/delta) / 2, which is about 2 for typical small choices of delta. This implies that we see twice as much variance in the exponential smoothing estimator as we do with the sliding window averaging. This difference in variance can be seen in Fig. 15, which compares the estimation of the means for the example trace segment 20Nov97.uni-1 for a memory size of 2000 samples. The mean estimator values are plotted versus the sequence number of the probe packets. The straight line is the overall mean, the solid line is the sliding window average and the dotted line is the exponential smoothing estimate. Similarly, Fig. 16 compares the exponential smoothing estimator with effective memory 2000 with the sliding window averaging estimator with memory 1000. Here the variance is almost the same, since the exponential smoothing uses double the memory.

Yet another way of comparing exponential smoothing and sliding window averaging is to consider the case where the gain of the exponential smoothing algorithm and the window size of the sliding window average are set such that the variances of the estimators are equal. In this case, the total weight that the exponential smoothing algorithm gives to samples older than the corresponding window size is approximately e^(-2), about 13.5%. Thus, in order to get the same variance in the estimator, the exponential smoothing algorithm constructs a weighted average in which roughly 13.5% of the weight is given to samples beyond the window.
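The two-to-one variance claim can be checked with a small simulation on synthetic Bernoulli losses. This is our own illustrative code, not the paper's experiment; with delta = 0.01 the predicted ratio is ln(1/delta)/2, about 2.3:

```python
import math
import random
from collections import deque

def variance_ratio(p=0.02, window=500, delta=0.01, n=150000, seed=7):
    """Ratio of the variance of an exponential smoother (effective memory
    matched to `window`) to that of a sliding window average of `window`
    samples, on synthetic IID Bernoulli losses with loss rate p."""
    random.seed(seed)
    a = math.log(1.0 / delta) / window   # gain with effective memory ~= window
    ewma = p                             # start the smoother at the true mean
    buf, buf_sum = deque(), 0.0
    ewma_vals, sw_vals = [], []
    for _ in range(n):
        x = 1.0 if random.random() < p else 0.0
        ewma = (1.0 - a) * ewma + a * x
        buf.append(x)
        buf_sum += x
        if len(buf) > window:
            buf_sum -= buf.popleft()
        if len(buf) == window:           # both estimators warmed up
            ewma_vals.append(ewma)
            sw_vals.append(buf_sum / window)

    def var(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals) / len(vals)

    return var(ewma_vals) / var(sw_vals)

# The simulated ratio comes out near the predicted ln(1/delta)/2 ~ 2.3.
```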
6 Conclusions and Future Work
We have presented measurements of Internet packet loss for unicast and multicast connections. We analyzed the traces for stationarity and identified hours of stationary trace segments. The loss data was represented both as a binary time series and as alternating sequences of good runs and loss runs. We checked the data for temporal dependence using both representations.
Our study of the autocorrelation function of the binary time series representation revealed that the correlation timescale for these traces is 1000 ms or less. The good run lengths and the loss run lengths were found to be approximately geometrically distributed for the 20Nov97.uni and 12Dec97.multi traces. For the 20Dec97.uni, 21Dec97.uni and 26Jun98.uni traces, the good run lengths were found to be not geometrically distributed, while the loss run lengths were geometrically distributed.

We examined the accuracy of three models, the Bernoulli model, the 2-state Markov chain model and the Markov chain model of k-th order, in terms of our analysis of temporal dependence. For the 20Nov97.uni and 12Dec97.multi datasets, the Bernoulli model was found to be accurate for some segments, the 2-state Markov model for others, and low-order Markov chain models for the remaining segments. The Bernoulli and the 2-state Markov chain models are not accurate for the 20Dec97.uni, 21Dec97.uni and 26Jun98.uni traces, for which the estimated order of the Markov chain model is considerably higher.
For on-line loss estimation, we computed the required memory size for a given accuracy in the parameter estimate
for both the average loss rate and the 2-state Markov chain model parameters. Also, we found that using the sliding
window average for average loss rate estimation has advantages over exponential smoothing since it uses less of the
old information for the same variance in the estimator.
A richer collection of measurements, with different sampling intervals and a variety of senders and receivers, would allow a better understanding of the temporal dependence in the loss data. It would also be worthwhile to apply the models to protocol design and performance analysis.
Acknowledgments
We would like to thank Supratik Bhattacharyya, Timur Friedman, Christopher Rafael, Sanjeev Khudanpur and
Joseph Horowitz for fruitful discussions and mathematical insights.
We are grateful to Chuck Cranor of Washington Univ., St. Louis, Stephen Pink of the Swedish Institute of Computer Science, Linda Prime of Univ. of Washington, Peter Wan of Georgia Institute of Technology and Lixia Zhang of Univ. of California, Los Angeles for providing computer accounts which allowed us to take the measurements.
References
[1] ANDREN, J., AND HILDING, M. Real-time Determination of Traffic Conditions over the Internet. Master's thesis, Chalmers University of Technology, Gothenburg, Sweden, Jan. 1998.
[2] BILLINGSLEY, P. Statistical Inference for Markov Processes. The University of Chicago Press, 1961.
[3] BILLINGSLEY, P. Statistical Methods in Markov Chains. Ann. Math. Stat. 32 (1961), 12-40.
[4] BOLOT, J. End-to-end Packet Delay and Loss Behavior in the Internet. In SIGCOMM Symposium on Communications Architectures and Protocols (San Francisco, Sept. 1993), pp. 289-298.
[5] BOLOT, J., CREPIN, H., AND GARCIA, A. V. Analysis of Audio Packet Loss in the Internet. In Proceedings of the International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV) (Durham, New Hampshire, Apr. 1995), pp. 163-174.

[6] BOLOT, J., AND GARCIA, A. V. Control Mechanisms for Packet Audio in the Internet. In Proceedings of the Conference on Computer Communications (IEEE Infocom) (San Francisco, CA, Apr. 1996), pp. 232-239.

[7] BOLOT, J., AND GARCIA, A. V. The Case for FEC-Based Error Control for Packet Audio in the Internet. In Proceedings of ACM International Conference on Multimedia (Sept. 1996).

[8] BOLOT, J., AND TURLETTI, T. Adaptive Error Control for Packet Video in the Internet. In ICIP (Lausanne, Switzerland, Sept. 1996).

[9] BOLOT, J., AND TURLETTI, T. Experience with Rate Control Mechanisms for Packet Video in the Internet. In SIGCOMM Symposium on Communications Architectures and Protocols (Jan. 1998), pp. 45.
[10] BROCKWELL, P., AND DAVIS, R. Introduction to Time Series and Forecasting. Springer-Verlag, 1996.
[11] HANDLEY, M. An Examination of MBone Performance. Tech. Rep. ISI/RR-97-450, USC/ISI, January 1997.
[12] LEVIN, R. Statistics for Management. Prentice-Hall, Inc., 1987.
[13] MAXEMCHUK, N., AND LO, S. Measurement and Interpretation of Voice Traffic on the Internet. In Conference Record of the International Conference on Communications (ICC) (Montreal, Canada, June 1997).
[14] YAJNIK, M., KUROSE, J., AND TOWSLEY, D. Packet Loss Correlation in the MBone Multicast Network:
Experimental Measurements and Markov Chain Models. Tech. Rep. UMASS CMPSCI 95-115, University of
Massachusetts, Amherst, MA, 1995.
[15] YAJNIK, M., KUROSE, J., AND TOWSLEY, D. Packet Loss Correlation in the MBone Multicast Network. In IEEE Global Internet Mini-conference, part of GLOBECOM '96 (London, England, Nov. 1996), pp. 94-99.