[YMKT99]Measurement and Modeling of the Temporal Dependence in Packet Loss


Measurement and Modelling of the Temporal Dependence in Packet Loss

    Maya Yajnik, Sue Moon, Jim Kurose and Don Towsley

    Department of Computer Science

    University of Massachusetts

    Amherst, MA 01003 USA

    emails:[yajnik,sbmoon,kurose,towsley]@cs.umass.edu

    UMASS CMPSCI Technical Report # 98-78

    Abstract

Understanding and modelling packet loss in the Internet is especially relevant for the design and analysis of delay-sensitive multimedia applications. In this paper, we present an analysis of 128 hours of end-to-end unicast and multicast packet loss measurement. From these we selected 76 hours of stationary traces for further analysis. We consider the dependence as seen in the autocorrelation function of the original loss data as well as the dependence between good run lengths and loss run lengths. The correlation timescale is found to be 1 second or less. We evaluate the accuracy of three models of increasing complexity: the Bernoulli model, the 2-state Markov chain model and the k-th order Markov chain model. Out of the trace segments considered, the Bernoulli model was found to be accurate for some of the segments and the 2-state model for others; a higher-order Markov chain model was found to be necessary to accurately model the rest of the segments. For the case of

    adaptive applications which track loss, we address two issues of on-line loss estimation: the required memory size

    and whether to use exponential smoothing or a sliding window average to estimate average loss rate. We find that a

    large memory size is necessary and that the sliding window average provides a more accurate estimate for the same

    effective memory size.

Keywords: measurement, modelling, loss, temporal dependence, correlation.

This work was supported by the National Science Foundation under grant NCR 9508274 and by the Defense Advanced Research Projects Agency under agreement F30602-98-2-0238. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


    1 Introduction

Packet loss is a key factor determining the quality seen by delay-sensitive multimedia applications such as audio/video conferencing and Internet telephony. These applications experience degradation in quality with increasing loss and delay in the network. Understanding the loss seen by such applications is important in their design and performance analysis. Adaptive audio/video applications adjust their transmission rate according to the perceived congestion level in the network (see [6, 8, 9]). By this adjustment a suitable loss level can be maintained and bandwidth can be shared fairly between connections. For such adaptive applications it is important to have simple loss models that can be parameterized in an on-line manner.

In this paper, we present a careful analysis of end-to-end loss using 128 hours of measurements. The measurements are of loss as seen by packet probes sent at regular intervals (of 20ms, 40ms, 80ms and 160ms) on both unicast and multicast Internet connections. Since significant non-stationary effects are seen in the data, we divide the traces into two-hour segments and check each segment for stationarity. Gradual decreases and increases in the mean loss rate, abrupt and dramatic increases in loss rate lasting a few minutes, and spikes of high loss rate are observed. We analyze 76 hours of stationary trace segments to determine the extent of temporal dependence in the data. The results show that the correlation timescale, the time over which what happens to one packet is related to what happens to another packet, is approximately 1 second or less. Beyond this timescale, packet losses are independent of each other.

We also consider loss models of increasing complexity and estimate the level of complexity necessary to accurately model the stationary trace segments. The loss process is modelled as a 2^k-state discrete-time Markov chain, where k = 0 corresponds to the Bernoulli model and k = 1 corresponds to the 2-state Markov chain model; k is referred to as the order of the Markov chain. For the datasets with sampling intervals of 160ms, the Bernoulli model was found to be accurate for some segments, the 2-state Markov model for others, and higher-order Markov chain models for the remaining segments. The Bernoulli and the 2-state Markov chain models are not accurate for the traces with sampling intervals of 20ms and 40ms, for which higher-order Markov chain models were found to be necessary.

Adaptive multimedia applications often track the congestion level in the network in order to adjust the rate of transmission, to share the bandwidth fairly and to maintain a low enough loss level ([6, 9]). They can use information about loss in the network over a period of time, instead of the traditional per-packet loss information. For such situations of on-line loss estimation, it is useful to know how much past information is necessary to accurately estimate the parameters of the loss models. The non-stationarity observed in the data shows that the loss rate does vary over time, sometimes dramatically, and thus up-to-date information is important.

Because adaptive applications need loss rate estimates, we address the question of how much memory is necessary for an on-line parameter estimator to provide a sufficiently accurate estimate. In general, we find that the variance in the estimator due to the limited number of samples is high, and thus large numbers of samples are required to obtain an accurate estimate. We also address the question of whether exponential smoothing provides a better estimate of the mean loss rate than a sliding window average. Though computationally quicker and requiring less buffer space, exponential smoothing needs more than twice as many samples to provide an estimate with the same accuracy as the sliding window average.
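The two on-line estimators compared here can be sketched as simple update procedures; the following is a minimal illustration (the class names and the fixed window/weight parameters are ours, not the paper's):

```python
from collections import deque

class SlidingWindowLossRate:
    """Average of the last `window` loss indicators (1 = lost, 0 = received)."""
    def __init__(self, window):
        self.buf = deque(maxlen=window)

    def update(self, lost):
        self.buf.append(1 if lost else 0)
        return sum(self.buf) / len(self.buf)

class SmoothedLossRate:
    """Exponentially smoothed loss rate: est <- (1 - alpha)*est + alpha*x."""
    def __init__(self, alpha, init=0.0):
        self.alpha, self.est = alpha, init

    def update(self, lost):
        self.est = (1 - self.alpha) * self.est + self.alpha * (1 if lost else 0)
        return self.est
```

The sliding window must buffer `window` indicators while exponential smoothing keeps only a single value; the trade-off noted above is that this saving in buffer space comes at the cost of a larger effective memory being needed for the same accuracy.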

Earlier work on the measurement of unicast and multicast packet loss in the Internet ([1, 4, 5, 14, 15]) has noted that the number of consecutively lost packets is small. In [15], long outages of several seconds and minutes were also observed on the MBone (the multicast backbone network overlaid on the Internet). In [13], measurements of voice traffic are analyzed to assess the effects of strategies that compensate for varying delay and loss on the quality of the voice connection. In [1], statistical analysis of temporal correlation in loss data is described and weak correlation is noted. The use of discrete-time Markov chain models, particularly the 2-state Markov chain


Table 1: Trace Descriptions

        Date     Type       Sampling   Destination              Time   Duration   Loss
                            Interval                                              Rate
    1   14Nov97  unicast    80ms       SICS, Sweden             09:52  8hr        2.70%
    2   14Nov97  multicast  80ms       SICS, Sweden             09:53  8hr        11.03%
    3   20Nov97  unicast    160ms      SICS, Sweden             16:03  2days      1.91%
    4   12Dec97  multicast  160ms      St. Louis, Missouri      14:23  2days      5.24%
    5   20Dec97  unicast    20ms       Seattle, Washington      13:41  2hr 27min  1.69%
    6   20Dec97  multicast  20ms       Seattle, Washington      13:49  2hr 27min  3.76%
    7   21Dec97  unicast    20ms       Los Angeles, California  13:17  2hr        1.38%
    8   21Dec97  multicast  20ms       Los Angeles, California  13:26  2hr        3.37%
    9   26Jun98  unicast    40ms       Atlanta, Georgia         16:56  6hr        2.22%

model (sometimes called the Gilbert model), has been proposed in [14, 7, 8, 11]. Discrete-time Markov chain models of increasing levels of complexity, including the 2-state Markov chain model, have been described in [14]. This paper takes a more careful look at both temporal dependence and the validity of various packet loss models using 76 hours of stationary data.

The remainder of this paper is structured as follows. In Section 2 we describe how the data was collected. In Section 3 we first describe two ways of representing the data and then discuss our analysis of stationarity and the temporal dependence in the data. The models and an evaluation of their accuracy are discussed in Section 4. In Section 5 we discuss the memory size required for on-line estimation of model parameters and also the relative suitability of exponential smoothing and sliding window averaging as ways of estimating the average loss rate. Finally, conclusions are given in Section 6.

    2 Measurement

Our study of end-to-end loss in the Internet is based on 128 hours of traces. These traces were gathered by sending packet probes along unicast and multicast end-to-end connections at periodic intervals (of 20, 40, 80 or 160 ms) and recording the sequence numbers of the probe packets that arrived successfully at the receiver. The packets whose sequence numbers were not recorded were assumed to have been lost. Thus the loss data can be considered to be a binary time series {x_n}, where x_n takes the value 1 if the n-th probe packet arrived successfully and the value 0 if it was lost. The interval at which the probe packets are sent out into the network is referred to as the sampling interval (\Delta) for the rest of the paper.

The measurements are summarized in Table 1. The source was located at Amherst, Massachusetts for all the traces. Note that one multicast trace (not shown in the table) displayed an extremely high loss rate and was therefore disregarded in the analysis. We refer to the datasets using the notation Date.Type (for example, 20Nov97.uni).
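As a sketch of the post-processing step just described, the binary loss series can be reconstructed from the recorded sequence numbers (the function name and the assumption that probes are numbered 0..num_probes-1 are ours, for illustration):

```python
def loss_series(received_seqnos, num_probes):
    """Binary time series: x_n = 1 if probe n arrived at the receiver, 0 if lost.

    A probe whose sequence number was never recorded at the receiver is
    assumed to have been lost in the network.
    """
    got = set(received_seqnos)
    return [1 if n in got else 0 for n in range(num_probes)]
```

For example, loss_series([0, 1, 3], 5) yields [1, 1, 0, 1, 0], a trace with a loss rate of 2/5.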


[Figure 1: The Loss Percentage for a two-hour segment of 14Nov97.uni (loss percentage vs. time in seconds)]

    3 Analysis

We consider two ways of representing the loss data. The first is as a binary time series {x_n}, where x_n takes the value 1 if the probe packet arrived successfully and the value 0 if it was lost. The time series is a discrete-valued time series which takes on values in the set {0, 1}.

A second way of representing the data is as alternating sequences of good runs and loss runs. The trace can be divided into portions of consecutive 1s and portions of consecutive 0s. The good runs are the lengths (expressed in number of packets) of the portions of the trace which consist solely of consecutive 1s. Similarly, the loss runs are the lengths of the portions of the trace which consist solely of consecutive 0s. Thus the data can be considered to be the interleaving of two sequences of observations, {g_j} and {l_j}, where g_j is the j-th good run length and l_j is the corresponding loss run length.
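The run-length representation can be extracted mechanically from the binary series; a minimal sketch, using the 1 = received, 0 = lost convention above:

```python
from itertools import groupby

def run_lengths(x):
    """Split a binary loss series into good run and loss run lengths.

    Good runs are maximal stretches of consecutive 1s (received packets);
    loss runs are maximal stretches of consecutive 0s (lost packets).
    """
    good, loss = [], []
    for value, run in groupby(x):
        (good if value == 1 else loss).append(sum(1 for _ in run))
    return good, loss
```

For example, run_lengths([1, 1, 0, 1, 1, 1, 0, 0, 1]) gives good runs [2, 3, 1] interleaved with loss runs [1, 2].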

The remainder of this section is structured as follows. We discuss our analysis of the traces for stationarity in Section 3.1. We find that approximately 76 hours exhibit stationarity. In Section 3.2 we focus on the temporal dependence in these stationary trace segments as shown by the sample autocorrelation function of the binary time series representation of the data. We discuss our analysis of the temporal dependence using the good run and loss run representation of the data in Section 3.3.

    3.1 Stationarity

By looking at the smoothed trend of the loss data, obvious non-stationarity was found in most of the traces. Therefore, the traces were divided into approximately two-hour segments and each segment was checked for stationarity. For the rest of the paper we refer to the datasets using the notation Date.Type-Segment# (for example, 14Nov97.uni-3). A time series is said to be stationary in the strict sense if its statistical properties remain constant over the entire series. A time series is said to be stationary in the wide sense (also called weak stationarity) if the mean and covariance function remain constant over time. Since there is no good way to rigorously test for stationarity,


Table 2: Stationarity

        Dataset        Sampling   Seg-    Stationary
                       Interval   ments   Segments
    1   14Nov97.uni    80ms       4       none
    2   14Nov97.multi  80ms       4       none
    3   20Nov97.uni    160ms      24      19
    4   12Dec97.multi  160ms      25      14
    5   20Dec97.uni    20ms       1       1
    6   20Dec97.multi  20ms       1       1
    7   21Dec97.uni    20ms       1       1
    8   21Dec97.multi  20ms       1       none
    9   26Jun98.uni    40ms       3       3

we checked whether the average loss rate varied significantly in the trace segments. We plotted the smoothed trace using a finite moving average filter to judge the extent to which the average loss rate varied over the length of the trace. Abrupt increases in the average loss rate seen in the smoothed trace were taken to be an indication of non-stationarity. Also, to test for gradual increase or decrease in the average loss rate, we fit a straight line to the data (the straight line which minimizes the least-squares error) as an estimate of the linear trend. A large total change in the average loss rate over a two-hour trace segment, as estimated by the least-squares straight line, was also considered to be an indication of non-stationarity.
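The two heuristics above (moving-average smoothing of the trace and a least-squares linear trend) can be sketched as follows; the window size and decision thresholds were chosen empirically in the paper, so they are left as caller-supplied parameters here:

```python
def moving_average(x, window):
    """Finite moving-average filter: smoothed trend of the loss indicators."""
    out, s = [], 0.0
    for i, v in enumerate(x):
        s += v
        if i >= window:
            s -= x[i - window]
        if i >= window - 1:
            out.append(s / window)
    return out

def linear_trend_change(x):
    """Total change over the segment implied by the least-squares line.

    Fits x_n ~ a + b*n and returns b*(len(x) - 1): the estimated drift in
    the mean over the whole segment, to be compared against a threshold.
    """
    n = len(x)
    tbar, xbar = (n - 1) / 2, sum(x) / n
    num = sum((t - tbar) * (v - xbar) for t, v in enumerate(x))
    den = sum((t - tbar) ** 2 for t in range(n))
    return (num / den) * (n - 1)
```

A segment would then be flagged as non-stationary if the smoothed trace shows a large abrupt jump or if the absolute trend change exceeds the chosen threshold.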

Several kinds of non-stationary effects are observed. For example, consider the smoothed loss for the third segment of trace 14Nov97.uni in Fig. 1. In the first half of the segment there is a slow, linear decay in loss percentage over one hour. Such slow decays and increases were noticed in other traces as well. Also, in this case there is an abrupt, dramatic increase in the loss rate which lasts for approximately 10 minutes before abruptly dropping to a lower loss rate. Such abrupt increases were noticed in other traces, although this trace segment shows the most dramatic increases, which last the longest time.

Table 2 summarizes the results of our analysis for stationarity. Traces 20Dec97.multi and 21Dec97.multi exhibit periodic behavior, making them difficult to analyze and model. Thus we identified 38 out of 64 two-hour segments as exhibiting stationary behavior. The stationary segments of five traces, 20Nov97.uni, 12Dec97.multi, 20Dec97.uni, 21Dec97.uni and 26Jun98.uni, have been analyzed for temporal dependence in the following sections.

    3.2 Autocorrelation of the Binary Loss Data

The sample autocorrelation function of the loss data for the stationary segments, expressed as the binary time series {x_n}, reveals the extent of dependence in the loss experienced by the probe packets over time. In this section, we define the autocorrelation, describe the estimation of the correlation timescale from the autocorrelation function and discuss the results for the loss data.

Let {X_n} be a stationary sequence of random variables and let {x_n} be a realization of {X_n}. The autocorrelation function for the stationary sequence of random variables is defined as

\[ R(k) = \frac{E[(X_n - \mu)(X_{n+k} - \mu)]}{E[(X_n - \mu)^2]} \]

where \mu is the mean and k is the lag.

The sample mean for N observed data points is

\[ \bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n \]

The sample autocovariance function, assuming stationarity, is

\[ \hat{c}(k) = \frac{1}{N} \sum_{n=1}^{N-k} (x_n - \bar{x})(x_{n+k} - \bar{x}) \]

where k is the lag. The sample autocorrelation function is

\[ \hat{r}(k) = \frac{\hat{c}(k)}{\hat{c}(0)} \tag{1} \]

The lag can also be expressed in terms of time. That is, let d be the lag in terms of time:

\[ d = k \Delta \tag{2} \]

where \Delta is the sampling interval.

The correlation timescale \tau is the minimum lag, in terms of time, such that {x_n} is uncorrelated at all lags d \ge \tau.
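The sample autocorrelation of Eq. (1) transcribes directly into code; a minimal sketch:

```python
def sample_autocorrelation(x, max_lag):
    """Sample autocorrelation r(k) = c(k)/c(0) for lags 0..max_lag,
    with c(k) the sample autocovariance computed under the stationarity
    assumption (deviations taken about the overall sample mean)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / n / c0
            for k in range(max_lag + 1)]
```

For a strictly alternating series [1, 0, 1, 0, ...] the lag-1 autocorrelation is close to -1, while for an IID series all lags beyond 0 fall near zero.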

In the case of an independent stochastic process, the autocorrelation function is zero at all non-zero lags; if the stochastic process is independent at a lag k, then the autocorrelation function R(k) is zero at that lag. However, in the case of an observed sample sequence which is the realization of an independent stochastic process, the sample autocorrelation function can have non-zero, though very small, values. Proposition 1 gives an idea of what constitutes a significant value for the sample autocorrelation function for a given number of samples, N.

Proposition 1 For large N, the sample autocorrelations \hat{r}(k) of an IID sequence with finite variance are approximately IID with a normal distribution (mean 0 and variance 1/N) (see [10]). Hence if {x_n} is a realization of such an IID sequence then approximately 95% of the sample autocorrelations should fall between the confidence bounds \pm 1.96/\sqrt{N}.

This proposition specifies what constitutes a significant value for the sample autocorrelation function for a given number of samples. The correlation timescale is the smallest lag in terms of time, d, at and beyond which the value of the sample autocorrelation function becomes small enough to be considered insignificant.

Using Proposition 1 we can test the following hypothesis:

Hypothesis 1 {x_n} is independent at lag k.

The test statistic T is calculated from the sample autocorrelation function \hat{r}(k) at lag k of the sample sequence as


[Figure 2: Sample Autocorrelation Function of Binary Data for 20Nov97.uni-1 (autocorrelation vs. lag, with confidence bounds shown as dotted lines)]

[Figure 3: Sample Autocorrelation Function for three trace segments with sampling intervals of 160ms, 40ms and 20ms (autocorrelation vs. lag in terms of time, d, in ms; segments 20Nov97.uni-1 (160ms), 26Jun98.uni-1 (40ms) and 20Dec97.uni (20ms))]

follows:

\[ T = \sqrt{N} \, \hat{r}(k) \]

If the hypothesis is true, then T has a limiting standard normal distribution. Thus, for example, if |T| > 1.96 then the hypothesis is rejected at significance level 0.05.
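The test, and the correlation-timescale estimate built on it, can be sketched as a self-contained procedure (the max_lag cutoff is our choice, for illustration):

```python
import math

def acf_at_lag(x, k):
    """Sample autocorrelation at lag k, per Eq. (1)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return sum(dev[i] * dev[i + k] for i in range(n - k)) / n / c0

def independent_at_lag(x, k, z=1.96):
    """Hypothesis 1 test: T = sqrt(N)*r(k) is approximately standard
    normal under independence; fail to reject when |T| <= z."""
    return abs(math.sqrt(len(x)) * acf_at_lag(x, k)) <= z

def correlation_timescale_ms(x, interval_ms, max_lag=50, z=1.96):
    """Smallest lag, expressed in ms, at which independence is not rejected."""
    for k in range(1, max_lag + 1):
        if independent_at_lag(x, k, z):
            return k * interval_ms
    return None  # still correlated at every lag examined
```

A strictly alternating series is strongly correlated at every lag, so the estimator returns None for it; a trace whose autocorrelation dies out at lag 3 with a 160ms interval would yield 480.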

The results for the loss data show that there is some correlation at small lags, and that the sample autocorrelation function decays rapidly. As an example, consider the first segment of trace 20Nov97.uni, 20Nov97.uni-1, with a sampling interval of 160ms. Fig. 2 shows the autocorrelation function of the binary loss data. A value of the sample autocorrelation function close to zero at a lag of k indicates independence between x_n and x_{n+k}. Since there are a finite number of samples, the confidence bounds of \pm 1.96/\sqrt{N} show the range of values which are close enough to zero to indicate independence. In the figure, the confidence bounds are plotted as dotted lines. The autocorrelation at lag 1 (that is, the correlation between consecutive packets) is small but significant. It is clear that from a lag of 3 onwards the autocorrelation function becomes insignificantly small except for a few stray points. This means that the data is dependent up to a lag of 2, and the correlation timescale for this trace segment is therefore three sampling intervals, or 480ms.

We analyzed all of the stationary segments of our traces. The traces 20Nov97.uni and 12Dec97.multi had sampling intervals of 160ms, the traces 20Dec97.uni and 21Dec97.uni had sampling intervals of 20ms, and the trace 26Jun98.uni had a sampling interval of 40ms. For the trace segments with sampling intervals of 160ms, the sample autocorrelation functions were similar to that described for the example trace segment 20Nov97.uni-1. That is, the autocorrelation at lag 1 is usually significant, and the function decays to insignificant values within a few lags. When the lag is expressed in terms of time (refer to (2)), this corresponds to a correlation timescale of about 1 second or less. The correlation between consecutive packets for datasets 20Nov97.uni and 12Dec97.multi varies from segment to segment.

The behavior of the sample autocorrelation function for traces with sampling intervals of 20ms and 40ms is somewhat different from that for traces with sampling intervals of 160ms. For the 20ms traces, the autocorrelation


[Figure 4: Frequency Distribution of Correlation Timescale (frequency vs. correlation timescale in ms)]

function at lag 1 is much higher than for the 160ms trace segments. This indicates much higher correlation between consecutive packets for sampling intervals of 20ms than is present for traces with 160ms intervals. The trace segments with sampling intervals of 40ms showed a similarly high autocorrelation at lag 1. The autocorrelation functions for the traces with sampling intervals of 20ms and 40ms remain significant out to larger lags than those for the 160ms traces. Fig. 3 shows the sample autocorrelation as a function of lag expressed in terms of time (2) for three trace segments, each with a different sampling interval. The chosen trace segments are 20Nov97.uni-1, 26Jun98.uni and 20Dec97.uni, with sampling intervals of 160ms, 40ms and 20ms respectively. From this figure it is clear that the correlation timescale is 1 second or less.

The correlation timescale of a data segment is estimated as the smallest lag (expressed in terms of time) for which Hypothesis 1 is not rejected (that is, as the smallest lag at which the time series is independent). A significance level of 0.05 is used. This method could underestimate the correlation timescale, since the autocorrelation function could show dependence at some higher lag, but we have ignored this effect. The results for all the stationary trace segments are summarized in Fig. 4 as a frequency distribution of the correlation timescale estimated for each data segment. Here we can see that the correlation timescale is a few hundred milliseconds except in two cases.

For comparison with the traces with sampling intervals of 160ms, the traces 20Dec97.uni and 21Dec97.uni were partitioned into 8 subtraces each, every subtrace corresponding to a trace with a sampling interval of 160ms. Also, the segments of trace 26Jun98.uni were partitioned into 4 subtraces each, every subtrace likewise corresponding to a trace with a sampling interval of 160ms. As expected, the sample autocorrelation functions of the subtraces are similar to those seen for the traces with sampling intervals of 160ms.

Thus, the results indicate a correlation timescale of 1 second or less over all the stationary trace segments. This also suggests that loss may not be adequately modelled by an IID process such as the Bernoulli process, since the traces exhibit some correlation at small lags. The 2-state Markov chain model is able to accurately model the correlation at lag 1, but since the correlation it predicts decays rapidly at larger lags, it is inaccurate with respect to modelling the correlation at lags greater than 1.


    3.3 Good Run and Loss Run Lengths

The second way of representing the data is as alternating sequences of good run lengths and loss run lengths, {g_j} and {l_j}. The good runs are the lengths (expressed in number of packets) of the portions of the trace which consist solely of consecutive 1s. Similarly, the loss runs are the lengths of the portions of the trace which consist solely of consecutive 0s. Since the binary loss data {x_n} is not completely independent in many cases, we check the good runs and loss runs for independence. The autocorrelation functions of the good run lengths and the loss run lengths give information about the extent of dependence between the run lengths. Thus if the autocorrelation functions are found to have very small values, it indicates that there is little dependence between the lengths of runs and the underlying processes can be assumed to be IID. Similarly, if the crosscorrelation function between the good runs and the loss runs is found to be small, it indicates that the lengths of good runs and loss runs are not dependent on each other. It was found that, in most cases, the good runs and the loss runs are indeed independent. Thus the good runs and the loss runs can be modelled separately as IID random variables.

Fig. 5 shows the autocorrelation function of the good run lengths, the autocorrelation function of the loss run lengths and the crosscorrelation function between the lengths of the good runs and the loss runs for the example trace segment 20Nov97.uni-1. The dotted lines indicate the confidence bounds for the range of values considered close to zero, indicating independence at that lag as specified by Proposition 1. It is evident that there is insignificant dependence among the good runs, among the loss runs and between good runs and loss runs.
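The crosscorrelation used here is the ordinary sample cross-correlation between the paired run-length sequences; a sketch (pairing g_j with the loss run l_{j+lag}; the function name is ours):

```python
import math

def crosscorrelation(a, b, lag=0):
    """Sample cross-correlation between a_j and b_{j+lag}.

    Values within roughly +/-1.96/sqrt(n) of zero indicate that the two
    run-length sequences are uncorrelated at that lag (cf. Proposition 1).
    """
    n = min(len(a), len(b) - lag)
    xs, ys = a[:n], b[lag:lag + n]
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in xs) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in ys) / n)
    return sum((p - mx) * (q - my) for p, q in zip(xs, ys)) / (n * sx * sy)
```

Perfectly proportional sequences give 1.0, inversely related sequences give -1.0, and unrelated run-length sequences give values near zero.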

We analyzed all the stationary segments of the traces similarly. The three sample correlation functions (autocorrelation of good run lengths, autocorrelation of loss run lengths and the crosscorrelation between good run lengths and loss run lengths) indicate independence for all the segments in the trace 20Nov97.uni. The trace 12Dec97.multi did show dependence between the good run lengths at small lags in a few cases.

The traces with sampling intervals of 20ms (20Dec97.uni and 21Dec97.uni) show different results. The autocorrelation function of the loss runs and the crosscorrelation function of the good runs and the loss runs have insignificantly small values, indicating the independence of the loss run lengths and the independence of the good runs from the loss runs. However, the good run lengths do show some dependence at small lags. Fig. 6 shows the three correlation functions for the trace 20Dec97.uni, which has a sampling interval of 20ms. The trace 26Jun98.uni, with a sampling interval of 40ms, shows independence for all three correlation functions: autocorrelation of good run lengths, autocorrelation of loss run lengths and crosscorrelation of good run and loss run lengths.

The next step is to look at the distributions of the good run and loss run lengths. Fig. 7 shows the distribution of the lengths of the good runs expressed in number of packets, and Fig. 8 shows the distribution of the loss run lengths for the same trace segment 20Nov97.uni-1. Both figures also show the distributions predicted by the models for comparison with the observed data, which will be discussed in Section 4. Here it is evident that the distribution of the good run lengths decays approximately linearly when a log scale is used for the y-axis, suggesting a geometric distribution for the good run length process. The loss run lengths take only a few small values. The other stationary segments from traces with sampling intervals of 160ms showed very similar results. The distributions of the good run lengths for the segments of traces 20Nov97.uni and 12Dec97.multi appear to be, by visual inspection, approximately geometric. For some segments there is an excess of very short good run lengths (less than 5). The loss run lengths were confined to a few small values for all but one segment of trace 20Nov97.uni and all but three segments of trace 12Dec97.multi. This low number of consecutive losses has been noted in earlier work ([4], [5]).
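The geometric shape is what a memoryless (Bernoulli) loss process would produce: a good run has length b exactly when b-1 further packets arrive and the next packet is lost. A sketch of this prediction, together with the usual moment/maximum-likelihood estimator for fitting a geometric distribution to observed runs (both ours, for illustration):

```python
def geometric_good_run_pmf(p_loss, b):
    """P(good run length = b) under a Bernoulli model with loss rate p_loss:
    the run survives b-1 further arrivals and ends at the next loss."""
    return (1 - p_loss) ** (b - 1) * p_loss

def fit_geometric(runs):
    """Estimate of the geometric parameter: 1 / mean run length."""
    return len(runs) / sum(runs)
```

For example, fit_geometric([1, 2, 3]) gives 0.5, and the predicted probabilities geometric_good_run_pmf(p, b) decay linearly on a log scale, matching the straight-line decay seen in Fig. 7.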

    The distributions of good run and loss run lengths for the traces 20Dec97.uni, 21Dec97.uni and 26Jun98.uni

    look somewhat different. For example, consider the distribution of good run lengths shown in Fig. 9 for the trace


[Figure 5: Autocorrelations and Crosscorrelation of Good Run and Loss Run Lengths for 20Nov97.uni-1 (vs. lag, with confidence bounds)]

[Figure 6: Autocorrelations and Crosscorrelation of Good Run and Loss Run Lengths for 20Dec97.uni (vs. lag, with confidence bounds)]

20Dec97.uni. The distribution of good run lengths shows geometric decay except for a high probability of very short run lengths. The distribution of loss run lengths for this trace, shown in Fig. 10, shows approximately geometric decay. The traces 21Dec97.uni and 26Jun98.uni show similar results.

Since the good run lengths for 20Nov97.uni and 12Dec97.multi appear to be geometrically distributed, we use the Chi-Square goodness-of-fit test to provide more systematic evidence that the distributions are geometric. Instead of the discrete-valued geometric distribution we use its continuous-valued equivalent, the exponential distribution. We also compute the shape and rate parameters assuming a Gamma distribution for the good run lengths. The shape parameter of the Gamma distribution (\alpha = 1/c_v^2, where c_v is the coefficient of variation) gives an idea of how close the distribution is to the exponential distribution. A shape parameter of 1 corresponds to an exponential distribution, which is a special case of the Gamma distribution. The estimated shape parameter varied from segment to segment for both 20Nov97.uni and 12Dec97.multi.

The Chi-Square goodness-of-fit test is used to check whether to reject the hypothesis that the good run lengths are IID random variables with the given distribution. Using the Chi-Square goodness-of-fit test, the following hypothesis can be tested for both the Gamma and the exponential distributions.

Hypothesis 2 The good run lengths are IID random variables with the given distribution function.

It is difficult to use the Chi-Square goodness-of-fit test in the same way for the loss run lengths, because the loss run lengths vary within a very small range. The continuous-valued distributions were used to make it easier to determine the partitions for the test. For traces 20Nov97.uni and 12Dec97.multi, the hypothesis is rejected for roughly one-third of the segments.
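One way to carry out such a test, sketched with equal-probability bins under the fitted exponential (the bin count is our choice; a real analysis would compare the statistic against a chi-square critical value for the stated degrees of freedom):

```python
import math

def chi_square_exponential(runs, num_bins=10):
    """Chi-square goodness-of-fit statistic for an exponential fit to the runs.

    Uses bins of equal probability 1/num_bins under the fitted exponential.
    Returns (statistic, degrees_of_freedom); dof = num_bins - 2 because one
    degree is lost for the rate parameter estimated from the data.
    """
    n = len(runs)
    rate = n / sum(runs)  # MLE of the exponential rate (1 / mean run length)
    # Interior bin edges: quantiles of the fitted exponential distribution.
    edges = [-math.log(1 - i / num_bins) / rate for i in range(1, num_bins)]
    observed = [0] * num_bins
    for r in runs:
        b = 0
        while b < num_bins - 1 and r > edges[b]:
            b += 1
        observed[b] += 1
    expected = n / num_bins
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, num_bins - 2
```

Data far from exponential (for instance, runs that are all identical) pile into a single bin and give a very large statistic, so the hypothesis would be rejected.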

Thus we conclude that the good run lengths are approximately geometrically distributed for the traces 20Nov97.uni and 12Dec97.multi, although the fit is not exact for about one-third of the trace segments.

For comparison with the traces with sampling intervals of 160ms, the traces 20Dec97.uni, 21Dec97.uni and 26Jun98.uni were partitioned into subtraces, each subtrace corresponding to a trace with a sampling interval of 160ms. These subtraces were then analyzed. As expected, the results are similar to those seen for the first two traces


[Figure 7: The Distribution of Good Run Lengths for 20Nov97.uni-1. Log-scale plot of the probability of a run having length b versus the number of packets in the burst b, comparing the observed distribution with the Bernoulli and 2-state model predictions.]

[Figure 8: The Distribution of Loss Run Lengths for 20Nov97.uni-1. Log-scale plot of the probability of a run having length b versus the number of packets in the burst b, comparing the observed distribution with the Bernoulli and 2-state model predictions.]

[Figure 9: The Distribution of Good Run Lengths for 20Dec97.uni. Log-scale plot of the probability of a run having length b versus the number of packets in the burst b, comparing the observed distribution with the Bernoulli and 2-state model predictions.]

[Figure 10: The Distribution of Loss Run Lengths for 20Dec97.uni. Log-scale plot of the probability of a run having length b versus the number of packets in the burst b, comparing the observed distribution with the Bernoulli and 2-state model predictions.]


    with sampling intervals of .

    4 Modelling

    In this section we discuss the potential of using three models of increasing complexity: the Bernoulli loss model, the

    2-state Markov chain model and the -th order Markov chain model for modelling the data. The loss process which

    characterizes the binary loss data (as described in Section 3) is considered to be a stationary discrete-time

    stochastic process taking values in the set . The order of the process is the number of previous values of

    the process on which the current value depends and can be considered as a measure of the complexity of the model.

For an order of , there are states. The Bernoulli model is the simplest model, with order 0 and no states. The

2-state Markov chain model is of order 1. The Markov chain model of -th order is a generalization of these two

    models and has states.

    First, in Section 4.1 we describe the models and associated parameter estimators. Then in Section 4.2 we assess

    the validity of the models and the results for the stationary trace segments.

    4.1 Models

    4.1.1 Bernoulli Loss Model

In the Bernoulli loss model, the sequence of random variables is IID (independent and identically distributed). That is, the probability of a sample being either a 0 or a 1 is independent of all other values of the time series and is the same for every sample. Thus this model is characterized by a single parameter, p, the probability of a sample being a 1 (the value 1 corresponds to a lost packet). It can be estimated from a sample trace by

    p_hat = n_1 / n    (3)

where n_1 is the number of times the value 1 occurs in the observed time series, and n is the number of samples in the time series.

The good run length distribution for this model is

    P(good run length = b) = (1 - p)^(b-1) p.

The loss run length distribution for this model is

    P(loss run length = b) = p^(b-1) (1 - p).    (4)
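The estimator (3) and the geometric run-length probabilities (4) can be sketched in a few lines (the helper names are ours, not the paper's):

```python
def bernoulli_fit(trace):
    """Estimate the Bernoulli loss parameter p_hat = n_1 / n from a
    binary loss trace (1 = lost packet, 0 = received packet)."""
    return sum(trace) / len(trace)

def run_length_pmf(b, q):
    """P(run length = b) when each step continues the run with
    probability q: q**(b-1) * (1 - q).  Under the Bernoulli model,
    q = 1 - p for good runs and q = p for loss runs."""
    return q ** (b - 1) * (1 - q)

p = bernoulli_fit([0, 0, 1, 0, 1, 0, 0, 0, 1, 0])
print(p)                     # 0.3
print(run_length_pmf(2, p))  # a loss run of length 2 has probability p*(1-p)
```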

    4.1.2 Two-state Markov Chain Model

In the 2-state Markov chain model the loss process is modelled as a discrete-time Markov chain with two states. The current state of the stochastic process depends only on the previous value. Unlike the Bernoulli model, this model is able to capture the dependence between consecutive losses, at the cost of one additional parameter. The model is characterized by two parameters, p and s, the transition probabilities between the two states: p is the probability that a packet is lost given that the previous packet was received, and s is the probability that a packet is lost given that the previous packet was lost.


The maximum likelihood estimators (see [2]) of p and s for a sample trace are

    p_hat = n_01 / n_0    (5)

    s_hat = n_11 / n_1    (6)

where n_01 is the number of times in the observed time series that a 1 follows a 0, n_11 is the number of times a 1 follows a 1, n_0 is the number of times a 0 occurs in the trace, and n_1 is the number of times a 1 occurs in the trace.

The good run length distribution for this model is

    P(good run length = b) = (1 - p)^(b-1) p

and the loss run length distribution is

    P(loss run length = b) = s^(b-1) (1 - s).

Note that if s equals p, this 2-state Markov chain model is equivalent to the Bernoulli model, since the probability of loss is then independent of which state the process is in. Also, the relative values of s and p provide information about how bursty the losses are. If s > p, then a lost packet is more likely when the previous packet was lost than when the previous packet was successfully received. On the other hand, s < p indicates that loss is more likely if the previous packet was not lost.
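A minimal sketch of the count-based estimators (5) and (6) (our code and variable names; p counts losses after receptions, s losses after losses):

```python
def two_state_fit(trace):
    """Maximum likelihood estimates for the 2-state Markov chain:
    p = P(loss | previous received) = n_01 / n_0,
    s = P(loss | previous lost)     = n_11 / n_1,
    counted over consecutive pairs of the binary trace."""
    n01 = n11 = n0 = n1 = 0
    for prev, cur in zip(trace, trace[1:]):
        if prev == 0:
            n0 += 1
            n01 += cur == 1
        else:
            n1 += 1
            n11 += cur == 1
    p = n01 / n0 if n0 else 0.0
    s = n11 / n1 if n1 else 0.0
    return p, s

p, s = two_state_fit([0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0])
print(p, s)  # p = 2/5, s = 3/5 -> losses are bursty (s > p)
```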

    4.1.3 Markov Chain Model of -th order

A class of random processes that is rich enough to capture a large variety of temporal dependencies is the class of finite-state Markov processes. The Bernoulli model and the 2-state Markov chain model are special cases of this

    class of models. In the Markov chain model, the current state of the process depends on a certain number of previous

values of the process. This number of previous values is the order of the process.

    Such a process is characterized by its order , and by a conditional probability matrix whose rows may

    be interpreted as probability mass functions on ( ) according to which, the next random variable is

    generated when the process is in a state .

    A process is a Markov chain of order , if the conditional probability

    is independent of for all (see [3]). Note that the Bernoulli and the 2-state Markov chain model are

special cases of the Markov chain model corresponding to orders 0 and 1, respectively.

    Let be an observed sequence or a sample path from a Markov source. The -th order state transition

    probabilities of the Markov chain can be estimated for all as follows. The number of states is and

    the number of conditional probabilities is .


    Let be a given state of the chain. Let be the number of times state is followed by value

    in the sample sequence. Let be the number of times state is seen. Let be an estimate of the probability

    that , given that . Then estimates the state transition probability from state to state

. The maximum likelihood estimator of each state transition probability of the -th order Markov chain is the corresponding observed conditional frequency: the number of times the state is followed by the given value, divided by the number of times the state is seen, whenever the state occurs in the trace (with a separate convention for states that never occur).
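The counting procedure can be sketched as follows (our code; a state is simply the tuple of the previous k values, and states never seen in the trace get no entry rather than a default value):

```python
from collections import defaultdict

def markov_fit(trace, k):
    """Estimate the transition probabilities of a k-th order binary
    Markov chain: for each length-k state (tuple of the k previous
    values), the estimated probability that the next value is 1."""
    ones = defaultdict(int)   # times the state is followed by a 1
    total = defaultdict(int)  # times the state is seen with a successor
    for i in range(len(trace) - k):
        state = tuple(trace[i:i + k])
        total[state] += 1
        ones[state] += trace[i + k] == 1
    return {s: ones[s] / total[s] for s in total}

probs = markov_fit([0, 1, 0, 1, 0, 1, 0, 1], k=1)
print(probs)  # {(0,): 1.0, (1,): 0.0} -- a strictly alternating trace
```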

    4.2 Results for the Stationary Segments

    The autocorrelation function of the binary loss data as discussed in Section 3.2 shows some correlation for small

    lags. The Bernoulli model does not capture any of the correlation seen for example in Fig. 2. The 2-state Markov

chain model is able to accurately model the correlation only for a lag of 1. The estimated order of the binary loss process indicates the complexity of the Markov chain model necessary to accurately model the data. If the estimated order is 0, then the Bernoulli loss model is accurate; if it is 1, then the 2-state Markov chain model is accurate; and if it is 2 or greater, then a Markov chain model of that higher order is required.

    Testing Hypothesis 1 (that the binary loss data is independent at a given lag ) is discussed in Section 3.2. A

    test statistic for testing the hypothesis using the sample autocorrelation function is also discussed in Section 3.2. An

alternative method to test the same hypothesis for independence of the binary loss data at a given lag is to use the Chi-Square Test for independence ([12]). For a given lag, let n_a be the number of times the value a occurs in the sample sequence and, similarly, n_b the number of times the value b occurs; let n_ab be the number of times b follows a after that lag. The test statistic sums (observed - expected)^2 / expected over the four value pairs, where the expected count under independence is n_a n_b / n, and is distributed as a chi-square distribution with one degree of freedom.
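The lag test can be sketched as follows (our code; it forms the 2x2 contingency table of value pairs at the given lag and compares observed with expected counts):

```python
def chi_square_lag(trace, lag):
    """Chi-square statistic for independence of a binary trace at a
    given lag: compares the observed counts of (x[i], x[i+lag]) value
    pairs with the counts expected under independence (1 d.o.f.)."""
    pairs = list(zip(trace, trace[lag:]))
    n = len(pairs)
    stat = 0.0
    for a in (0, 1):
        for b in (0, 1):
            observed = sum(1 for x, y in pairs if x == a and y == b)
            p_a = sum(1 for x, _ in pairs if x == a) / n
            p_b = sum(1 for _, y in pairs if y == b) / n
            expected = n * p_a * p_b
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

# A strictly alternating trace is perfectly dependent at lag 1; the
# statistic is far above the 5% critical value of 3.84 for 1 d.o.f.
print(chi_square_lag([0, 1, 0, 1, 0, 1, 0, 1], 1))  # ~7.0
```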

    The order of the Markov chain process can be estimated by estimating the minimum lag beyond which the

    process is independent. This is similar to the estimation of correlation timescale from the sample autocorrelation

    function as discussed in Section 3.2. Thus we test Hypothesis 1 for lags from to . Let be the first lag at which

    the hypothesis is not rejected. Then the estimated order is . Fig. 11 shows the estimated order of the Markov

    chain for the stationary segments of all the datasets. The two methods of testing the independence hypothesis (using

the sample autocorrelation function and using the chi-square test for independence) give the same results. Figure 11 shows that, out of segments, the Bernoulli loss model is accurate for segments, the 2-state Markov chain model is accurate for segments and for the rest of the segments Markov chain models of higher orders are necessary for accurate modelling. The estimated order for datasets 20Nov97.uni and 12Dec97.multi ranges from to and that for datasets 20Dec97.uni, 21Dec97.uni and 26Jun98.uni ranges from to and is much higher than that estimated for the other datasets.

    We also checked the good run and loss run lengths for independence. It was found that, in most cases, as

discussed in Section 3.3, the good run and the loss run lengths are, indeed, independent. Thus the good runs and

    the loss runs can be modelled separately as IID random variables. Both the Bernoulli and the 2-state Markov chain

models treat the good run and loss run processes as independent.

Next we look at the distributions of the good run and the loss run lengths and check how well the two models


[Figure 11: Frequency Distribution of the Order of the Markov Chain Model. Histogram of the estimated order over the stationary segments.]

    predict the empirical distributions. Both the Bernoulli model and the 2-state Markov chain model predict that the

    good run and the loss run lengths are distributed according to geometric distributions. As discussed in Section 3.3,

    the good run length distribution was found to be approximately geometrically distributed for traces 20Nov97.uni

    and 12Dec97.multi, which means that the 2-state Markov chain model is suitable for these traces. However,

    since the Chi-Square goodness-of-fit test shows that the exponential distribution is not accurate for one-third of the

    segments, a Markov chain model with more states would more accurately model the good run lengths. The loss run

    lengths were found to be geometrically distributed in all cases.

    The traces 20Dec97.uni, 21Dec97.uni and 26Jun98.uni show a greater degree of correlation. The good run

    lengths are clearly not geometrically distributed as shown in Fig. 9 although the loss run lengths are geometrically

    distributed. Thus we conclude that neither the Bernoulli nor the 2-state Markov chain model are accurate for these

    traces and a greater number of states are necessary to accurately model the loss process.

    For the stationary segments of 20Nov97.uni the estimated parameter, varies between and

    with an average of , varied between and with an average of and varied between

    and with an average of .

Table 3 summarizes the estimated model parameters (as discussed in Section 4.1). It shows the minimum, maximum and average over all stationary segments for each dataset. For trace segments where the value of s is higher than the value of p, the packet loss is burstier than predicted by the Bernoulli model and the 2-state Markov chain model is more accurate.

    5 On-Line Parameter Estimation

    Adaptive applications monitor the congestion level in the network in order to adjust their transmission rate to keep the

    loss rate low and to share the available bandwidth fairly with other connections. We address two issues in the on-line

    estimation of loss for such applications. The first issue is the amount of memory necessary for an accurate estimate

of the Bernoulli model parameter (discussed in Section 5.1) and the 2-state Markov chain model parameters

    (discussed in Section 5.2). The second issue is whether to use exponential smoothing or a sliding window average


Table 3: Parameters for the Bernoulli and the 2-state Markov Chain Models

    Data                     mean loss rate       p         s
    20Nov97.uni    minimum       0.0005       0.0004    0.0208
                   maximum       0.0351       0.0347    0.0909
                   average       0.0162       0.0158    0.0471
    12Dec97.multi  minimum       0.0376       0.0373    0.0408
                   maximum       0.0735       0.0749    0.0636
                   average       0.0496       0.0496    0.0513
    20Dec97.uni                  0.0168       0.0141    0.1732
    21Dec97.uni                  0.0135       0.0109    0.2085
    26Jun98.uni    minimum       0.0204       0.0178    0.1452
                   maximum       0.0254       0.0218    0.1608
                   average       0.0222       0.0192    0.1546

to get up-to-date and accurate average loss rate estimates. This is discussed in Section 5.3.

    5.1 Memory for the Bernoulli Model Parameter

Equation (3) gives an estimator for the single Bernoulli loss model parameter, p, which is equivalent to the mean loss rate. Assume that n values of the binary loss data are available for estimation of the parameter. Henceforth, n will be referred to as the memory size of the estimator. The confidence interval for this estimator is

    p_hat ± z sqrt( p_hat (1 - p_hat) / n )

where z is the critical value corresponding to the chosen confidence level.

Fig. 12 shows the estimation of the mean loss rate for the example trace segment 20Nov97.uni-1 for memory sizes of 500, 1000 and 10000, plotted as a function of the packet sequence number. The graphs display the extent of variation in the estimated mean loss rate (it varies between and for a memory size of samples).

For a given accuracy in the estimator and a given confidence level, a minimum number of observations is required.

Note that this depends on the loss rate. For example, for a given accuracy in the estimate of p, the required memory size as calculated using (7) differs substantially with the loss rate.
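The sizing calculation can be made explicit under the normal approximation; the following is a standard reconstruction (our notation: z for the critical value of the chosen confidence level, epsilon for the permissible error; whether the accuracy is absolute or relative is our assumption, so both forms are shown):

```latex
% Half-width of the normal-approximation confidence interval for
% \hat{p} estimated from n samples:  z \sqrt{p(1-p)/n}.
% Requiring an absolute error of at most \epsilon gives
\[
  n \;\ge\; \frac{z^{2}\,p(1-p)}{\epsilon^{2}},
\]
% while requiring a relative error of at most \epsilon
% (an absolute error of \epsilon p) gives
\[
  n \;\ge\; \frac{z^{2}\,(1-p)}{\epsilon^{2}\,p},
\]
% which grows as the loss rate p shrinks, so low loss rates
% demand long memories.
```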

    5.2 Memory for the 2-State Markov Chain Model Parameters

For the 2-state Markov chain model, there are two parameters to be estimated: p, as specified by (5), and s, as specified by (6). Instead of considering the parameter s directly, consider the related parameter . Assume that a


memory of values of the basic loss data is available for estimation of the parameter. Confidence intervals for this parameter and for the parameter p can then be derived in the same manner as for the Bernoulli model parameter.

Fig. 13 shows the estimate of the value of p for the example trace segment 20Nov97.uni-1 for memory sizes of 1000 and 10000, plotted as a function of the packet sequence number. The graphs display the extent of variation in the estimated value (it varies between and for a memory size of samples). Similarly, Fig. 14 shows the estimate of the value of s for the example trace segment. The estimate of s fluctuates wildly, especially for the smaller memory size.

For a given accuracy and confidence level, the number of observations required to estimate each of the two parameters can be computed in the same way. Note that the required memory again depends on the fraction of packets lost.

    5.3 Exponential Smoothing versus Sliding Window Averaging

    Adaptive applications need to estimate the average loss rate in an on-line manner in order to adjust to the congestion

    in the network. There could be two ways to estimate the loss rate in an on-line manner: Exponential smoothing and

    Sliding Window Averaging. The sliding window average gives the sample mean over a sliding window, say of size

    samples. Thus only the latest samples contribute to the current estimate ensuring that the older information is not


[Figure 12: Estimating the mean loss rate for 20Nov97.uni-1 using sliding window averaging for memory sizes of 500, 1000 and 10000, plotted against packet sequence number, with the overall mean shown for reference.]

[Figure 13: Estimating the parameter p for 20Nov97.uni-1 for memory sizes of 1000 and 10000, with the overall p value shown for reference.]

[Figure 14: Estimating the parameter s for 20Nov97.uni-1 for memory sizes of 1000 and 10000, with the overall s value shown for reference.]


    used. Restricting the memory size is important because of possible non-stationarities in the loss data. Exponential

    smoothing is a computationally quicker method for estimating the current mean and it also requires less buffer space.

The question we address here is whether sliding window averaging gives a significantly better estimate of the loss rate, in terms of both the variance of the estimator and the memory required to get an accurate estimate. As we will see, the sliding window average is, indeed, a significantly better estimator. For the same effective memory

    size, exponential smoothing produces an estimate with more than double the variance of the estimate produced by

    a sliding window estimate and for the same variance, it requires more than twice as much memory. In the analysis

    of the variance of the estimate we have assumed, for simplicity, that the loss process is IID (that is, we assume

    the Bernoulli model). In the presence of correlation, the variance of the sample mean estimator is higher than that

    calculated assuming independence.

    In order to compare the two styles of on-line mean estimation, we first compute the effective memory size in

the case of exponential smoothing. Consider the loss data as a binary time series x_t. Then, for a gain of a, the exponential smoothing estimator m_t at time t is

    m_t = (1 - a) m_{t-1} + a x_t.

    Thus all the previous samples are used but the weights given to the old samples decay geometrically with the age of

    the sample relative to the latest sample.

The weight given to the observation that is j samples old is a (1 - a)^j, so the total weight W(M) given to all observations M or more samples old is

    W(M) = (1 - a)^M.    (7)

The effective memory, M, is defined as the memory for which this tail weight is approximately equal to some small fraction epsilon; that is, using (7), M is calculated as the value for which (1 - a)^M ≈ epsilon.

Since (1 - a)^M ≈ e^{-aM} (for small values of a), the effective memory is M ≈ ln(1/epsilon) / a. (8) For a given tail fraction epsilon, the effective memory is thus inversely proportional to the gain.
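As a concrete sketch (our code; the tail fraction of 0.1 below is an illustrative choice, not necessarily the fraction used in the paper), the smoothing recursion and the effective-memory calculation look like:

```python
import math

def exp_smooth(trace, gain):
    """On-line exponentially smoothed loss-rate estimate:
    m <- (1 - gain) * m + gain * x for each new binary sample x."""
    m = 0.0
    for x in trace:
        m = (1.0 - gain) * m + gain * x
    return m

def effective_memory(gain, tail=0.1):
    """Smallest M such that the total weight on samples M or more
    positions old, (1 - gain)**M, is at most `tail`."""
    return math.ceil(math.log(tail) / math.log(1.0 - gain))

print(effective_memory(0.001))  # 2302: a gain near 1e-3 remembers
                                # roughly the last two thousand samples
```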

An important property of an estimator is its relative efficiency, defined as its variance relative to that of the sliding window average (which is equivalent to the sample mean estimator). For a window of n samples, the variance of the sample mean estimator is

    sigma^2 / n


[Figure 15: Estimating the mean loss rate for 20Nov97.uni-1: comparison of exponential smoothing (effective memory 2000) and sliding window averaging (memory 2000), with the overall mean shown for reference.]

[Figure 16: Estimating the mean loss rate for 20Nov97.uni-1: comparison of exponential smoothing (effective memory 2000) and sliding window averaging (memory 1000), with the overall mean shown for reference.]

where sigma^2 is the population variance. The variance of the exponential smoothing estimator with gain a is

    sigma^2 a / (2 - a).

The relative efficiency of the exponential smoothing estimator (relative to the sliding window average) is the ratio of the two variances, (a / (2 - a)) / (1 / n) = n a / (2 - a), where a is the gain and n is the window size. The gain can be calculated for a given effective memory size using (8). For an effective memory equal to the memory of the sliding window average, this ratio is more than two: we see over twice as much variance in the exponential smoothing estimator as we do with the sliding window averaging. This difference in variance can be seen in Fig. 15, which compares the estimation of the means for the example trace segment 20Nov97.uni-1 for a memory size of 2000 samples. The mean estimator values are plotted versus the sequence number of the probe packets. The straight line is the overall mean, the solid line is the sliding window average and the dotted line is the exponential smoothing estimator values.

Similarly, Fig. 16 shows the comparison of the exponential smoothing estimator with effective memory 2000 with the sliding window averaging estimator with memory 1000. Here the variance is almost the same since the exponential smoothing uses double the memory.

Yet another way of comparing exponential smoothing and sliding window averaging is by considering the case

    when the gain of the exponential smoothing algorithm and the window size for the sliding window average are set

    such that the variances of the estimators are equal. In this case, in the exponential smoothing algorithm, the total

    weight given to samples older than the corresponding window size of the sliding window averaging algorithm, is

    approximately . Thus in order to get the same variance in the estimator the exponential smoothing algorithm

    constructs a weighted average with a weight of given to samples beyond the window.
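These comparisons can be checked empirically with a small simulation (entirely ours, not from the paper; the loss rate, window size, and gain are illustrative choices, with the gain picked so the smoother's effective memory roughly matches the window):

```python
import random

def compare_estimators(p=0.02, n=200_000, window=1000, gain=0.004, seed=1):
    """Simulate an IID (Bernoulli) loss trace and return the sample
    variances of (a) the sliding-window average and (b) the
    exponentially smoothed estimate, taken over the running values."""
    rng = random.Random(seed)
    trace = [1 if rng.random() < p else 0 for _ in range(n)]

    # Sliding-window averages, once the first window is full.
    win_est, acc = [], sum(trace[:window])
    for i in range(window, n):
        win_est.append(acc / window)
        acc += trace[i] - trace[i - window]

    # Exponential smoothing, recorded after a warm-up of one window.
    exp_est, m = [], p
    for i, x in enumerate(trace):
        m = (1 - gain) * m + gain * x
        if i >= window:
            exp_est.append(m)

    def var(xs):
        mean = sum(xs) / len(xs)
        return sum((v - mean) ** 2 for v in xs) / len(xs)

    return var(win_est), var(exp_est)

v_win, v_exp = compare_estimators()
print(v_exp / v_win)  # close to 2: the smoothed estimate is noisier
```

With these settings the ratio of the two variances typically comes out around two, matching the qualitative claim in the text.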


    6 Conclusions and Future Work

    We have presented measurements of Internet packet loss for unicast and multicast connections. We analyzed the

traces for stationarity and identified hours of stationary trace segments. The loss data was represented both as a binary time series and as alternating sequences of good runs and loss runs. We checked the data for temporal

    dependence using both representations.

    Our study of the autocorrelation function of the binary time series representation, revealed that the correlation

    timescale for these traces is or less. The distributions of the good run lengths and the loss run lengths were

    found to be approximately geometrically distributed for the traces. For the and traces, the good

    run lengths were found to be not geometrically distributed, while the loss run lengths were geometrically distributed.

    We examined the accuracy of three models: the Bernoulli model, the 2-state Markov chain model and the Markov

    chain model of -th order, in terms of our analysis of temporal dependence. For the datasets with sampling intervals

    of , the Bernoulli model was found to be accurate for segments, the 2-state Markov model for segments

    and Markov chain models of orders to for the remaining . The Bernoulli and the 2-state Markov chain models

are not accurate for the traces with sampling intervals of and , and the estimated order of the Markov chain model ranged from to .

    For on-line loss estimation, we computed the required memory size for a given accuracy in the parameter estimate

    for both the average loss rate and the 2-state Markov chain model parameters. Also, we found that using the sliding

    window average for average loss rate estimation has advantages over exponential smoothing since it uses less of the

    old information for the same variance in the estimator.

    A richer collection of measurements with different sampling intervals and a variety of senders and receivers

    would allow a better understanding of the temporal dependence in the loss data. And it would be worthwhile to

    apply the models to protocol design and performance analysis.

    Acknowledgments

    We would like to thank Supratik Bhattacharyya, Timur Friedman, Christopher Rafael, Sanjeev Khudanpur and

    Joseph Horowitz for fruitful discussions and mathematical insights.

    We are grateful to Chuck Cranor of Washington Univ., St.Louis, Stephen Pink of Swedish Institute of Computer

Science, Linda Prime of Univ. of Washington, Peter Wan of Georgia Institute of Technology and Lixia Zhang of

    Univ. of California, Los Angeles for providing computer accounts which allowed us to take the measurements.

    References

[1] ANDREN, J., AND HILDING, M. Real-time Determination of Traffic Conditions over the Internet. Master's thesis, Chalmers University of Technology, Gothenburg, Sweden, Jan. 1998.

    [2] BILLINGSLEY, P. Statistical Inference for Markov Processes. The University of Chicago Press, 1961.

[3] BILLINGSLEY, P. Statistical Methods in Markov Chains. Ann. Math. Stat. 32 (1961), 12-40.

    [4] BOLOT, J. End-to-end Packet Delay and Loss Behavior in the Internet. In SIGCOMM Symposium on Commu-

nications Architectures and Protocols (San Francisco, Sept. 1993), pp. 289-298.


    [5] BOLOT, J., CREPIN, H., AND GARCIA, A. V. Analysis of Audio Packet Loss in the Internet. In Proceed-

ings of the International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV) (Durham, New Hampshire, Apr. 1995), pp. 163-174.

[6] BOLOT, J., AND GARCIA, A. V. Control Mechanisms for Packet Audio in the Internet. In Proceedings of the Conference on Computer Communications (IEEE Infocom) (San Francisco, CA, Apr. 1996), pp. 232-239.

    [7] BOLOT, J., AND GARCIA, A. V. The Case for FEC-Based Error Control for Packet Audio in the Internet. In

    Proceedings of ACM International Conference on Multimedia (Sept. 1996).

    [8] BOLOT, J., AND TURLETTI, T. Adaptive Error Control for Packet Video in the Internet. In ICIP (Lausanne,

    Switzerland, Sept. 1996).

    [9] BOLOT, J., AND TURLETTI, T. Experience with Rate Control Mechanisms for Packet Video in the Internet.

    In SIGCOMM Symposium on Communications Architectures and Protocols (Jan. 1998), pp. 45.

    [10] BROCKWELL, P., AND DAVIS, R. Introduction to Time Series and Forecasting. Springer-Verlag, 1996.

    [11] HANDLEY, M. An Examination of MBone Performance. Tech. Rep. ISI/RR-97-450, USC/ISI, January 1997.

    [12] LEVIN, R. Statistics for Management. Prentice-Hall, Inc., 1987.

[13] MAXEMCHUK, N., AND LO, S. Measurement and Interpretation of Voice Traffic on the Internet. In Confer-

    ence Record of the International Conference on Communications (ICC) (Montreal, Canada, June 1997).

    [14] YAJNIK, M., KUROSE, J., AND TOWSLEY, D. Packet Loss Correlation in the MBone Multicast Network:

    Experimental Measurements and Markov Chain Models. Tech. Rep. UMASS CMPSCI 95-115, University of

    Massachusetts, Amherst, MA, 1995.

    [15] YAJNIK, M., KUROSE, J., AND TOWSLEY, D. Packet Loss Correlation in the MBone Multicast Network. In

IEEE Global Internet Mini-conference, part of GLOBECOM '96 (London, England, Nov. 1996), pp. 94-99.
