
2009 International Conference on Electronic Computer Technology (ICECT), Macau, China, 20-22 February 2009


Modeling Internet Link Delay based on Measurement

Zhang Hongli, Zhang Yu, Liu Ying

Research Center of Computer Network and Information Security Technology, Harbin Institute of Technology, Harbin 150001

{zhl, zhangyu, liuying}@pact518.hit.edu.cn

Abstract

Understanding Internet link delay is key to making full use of Internet resources. In this paper, applying Maximum Likelihood Estimation (MLE) to data measured in a large-scale network, we find that link delay follows a Gamma distribution. We analyze the correlations between link delay and edge load, node degree, and betweenness, respectively, and compute the parameters of the link delay distribution for different ranges of edge load. We find that edge load is positively correlated with the shape parameter of the link delay Gamma distribution and negatively correlated with its scale parameter.

1. Introduction

As the Internet grows in importance and its structure becomes more complex, Internet measurement has attracted great interest from academia, industry, and government. Internet measurement research has two main directions: performance measurement and topology measurement. The metrics of performance measurement include delay, loss rate, reliability, reachability, bandwidth, and traffic [1]. Round-Trip Time (RTT), measured either as end-to-end path delay or as link (edge) delay, directly affects user experience and is a hot topic in networking research. A link is a pair of connected routers, and an end-to-end path consists of several links. In recent years, network delay modeling has been studied widely, with some success. In 1992, Mukherjee measured the RTTs of three paths and found that path RTT follows a Gamma distribution [2]. In 1996, Acharya et al. measured the delays on 90 paths over 48 hours and found that the RTT distribution is long-tailed and changes slowly over time [3]. In 2002, Papagiannaki et al. measured and analyzed single-hop router delay in an Internet backbone and found that queueing delay in routers follows a Weibull distribution [4]. In 2006, J.A. Hernandez et al. analyzed one-way delays on 700,000 paths using RIPE NCC data and modeled them with a Weibull mixture model [5]. Research on Internet RTT in China is scarce. Zhiping Cai presented a passive measurement method that corrects the results of active measurement and describes network delay more accurately [6]. Jingping Bi studied the relationships among delay bottlenecks, delay distribution, and loss rate, as well as the correlation between loss rate and TCP throughput, and found that the packet delay distribution shows a single sharp peak when the loss rate is low and flattens as the loss rate increases [7].

However, previous work has mostly focused on a small number of end-to-end paths and has not explained well the factors that influence the delay distribution. Moreover, delay data from Chinese networks have not been analyzed. Research on the distribution of link RTTs and its influencing factors is not only of theoretical importance but also provides basic support for Internet simulation, Internet performance evaluation, QoS routing, and P2P networks. This paper models Internet link delay and analyzes the principal factors influencing the parameters of the link delay distribution, based on large-scale network measurement data.

2. Internet Topology and Link RTT Measurement

In October 2007, we used traceroute from monitors in six Chinese provinces, Anhui, Gansu, Guangxi, Hebei, Hainan, and Heilongjiang (HLJ), to probe 136,993 destination IP addresses in China. Traceroute discovers the IP addresses of the routers on the path a packet traverses from the source to the destination, together with the RTT to each of them. By merging all traceroute results, we obtain the national IP-level topology and path delay data.

2009 International Conference on Electronic Computer Technology

978-0-7695-3559-3/09 $25.00 © 2009 IEEE

DOI 10.1109/ICECT.2009.21

The delay of a link between a pair of connected routers can be approximated by subtracting the RTT of the earlier hop from the RTT of the later hop (averaged over all measurements of the same link). After discarding abnormal data such as routing loops, we obtained 128,083 links among 76,116 nodes as the basic data for our link delay analysis. We also recorded the edge load of each link, i.e., the number of times the link is traversed by probing packets.
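The hop-difference estimate and edge-load count described above can be sketched in Python. This is a minimal illustration under our own assumptions, not the authors' tool: each trace is a hypothetical list of `(router_ip, rtt_ms)` hops in path order, a link is an unordered router pair, any trace that revisits a router is treated as a routing loop and discarded, and a link's delay is the mean of all its later-hop minus earlier-hop RTT differences.

```python
from collections import defaultdict
from statistics import mean

def link_delays(traces):
    """Estimate per-link delay and edge load from traceroute-style traces.

    traces: iterable of paths; each path is a list of (router_ip, rtt_ms)
    tuples in hop order (hypothetical input format).
    Returns (delay, load): dicts keyed by an unordered router pair.
    """
    samples = defaultdict(list)  # link -> RTT differences in ms
    load = defaultdict(int)      # link -> number of traversals (edge load)
    for hops in traces:
        ips = [ip for ip, _ in hops]
        if len(set(ips)) != len(ips):
            continue  # assumption: a repeated router marks a routing loop, so drop the trace
        for (a, rtt_a), (b, rtt_b) in zip(hops, hops[1:]):
            link = tuple(sorted((a, b)))          # treat links as undirected
            samples[link].append(rtt_b - rtt_a)   # later hop minus earlier hop
            load[link] += 1
    delay = {link: mean(diffs) for link, diffs in samples.items()}
    return delay, dict(load)

# Usage: the third trace contains a loop (A appears twice) and is ignored
traces = [
    [("A", 1.0), ("B", 3.0), ("C", 4.5)],
    [("A", 1.2), ("B", 2.8)],
    [("A", 1.0), ("B", 2.0), ("A", 3.0)],
]
delay, load = link_delays(traces)
```

Averaging per link rather than keeping every sample mirrors the "average for the same link" step; real traceroute data can also yield negative differences, which a fuller pipeline would need to decide how to handle.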

3. Modeling the Distribution of Link RTTs

3.1. Internet Link Delay Data Analysis

The large-scale measurement results show that the mean link delay is 14.838 ms, the standard deviation is 47.903 ms, and the range is 2,573.3 ms. Of all link delays, 99.879% fall in the interval (0 ms, 500 ms]. The histogram of delays is plotted in Fig. 1. The plot shows that the delay frequency distribution is long-tailed: most links respond within a short time, but some take much longer.

On a log-log scale, the delay frequency curve is approximately a straight line (Fig. 1), i.e., $\ln f(x) \approx \ln a - b \ln x$, or equivalently $f(x) \propto x^{-b}$. Judging from Fig. 1 alone, the power law, the simplest of the candidate models, might appear to fit the Internet link delay distribution best. Other long-tailed probability models include the Gamma, Weibull, and lognormal distributions. The probability density functions (PDFs) of the four models and their parameters are listed in Table 1. The comparison below shows that the power law in fact fits worse than the other three.
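Section 3.2 fits the power law by least squares. Because a straight line on log-log axes means $\ln f(x) = \ln a - b \ln x$, one way to do this is ordinary linear regression of log frequency on log delay. The sketch below assumes equal-width bins (1 ms by default), relative frequencies, bin centres as $x$, and empty bins dropped; the paper does not state its binning, so these choices are hypothetical.

```python
import math

def fit_power_law(delays, bin_width=1.0):
    """Least-squares fit of f(x) = a * x**(-b) to a delay histogram on log-log axes.

    Assumptions (ours): all delays > 0, equal-width bins of bin_width ms,
    relative frequencies, bin centres as x, empty bins dropped.
    Returns (a, b).
    """
    counts = {}
    for d in delays:
        k = int(d // bin_width)          # bin index
        counts[k] = counts.get(k, 0) + 1
    n = len(delays)
    pts = [(math.log((k + 0.5) * bin_width), math.log(c / n)) for k, c in counts.items()]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    slope = sxy / sxx              # slope of ln f versus ln x equals -b
    intercept = my - slope * mx    # intercept equals ln a
    return math.exp(intercept), -slope

# Usage: counts 9, 3, 1 at bin centres 0.5, 1.5, 4.5 ms lie exactly on f(x) ∝ 1/x
a, b = fit_power_law([0.2] * 9 + [1.2] * 3 + [4.2])
```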

Figure 1 Histogram of delay frequency on log-log scale

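Table 1 Definitions of four long-tailed models

| Model | PDF | Parameters |
|---|---|---|
| Power law | $f(x) = a x^{-b}$ | $a > 0,\ b > 0$ |
| Gamma | $f(x) = \dfrac{x^{\alpha-1} e^{-x/\lambda}}{\lambda^{\alpha}\,\Gamma(\alpha)}$, where $\Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1} e^{-t}\,dt$ | $\alpha > 0,\ \lambda > 0$ |
| Weibull | $f(x) = \dfrac{b}{a}\left(\dfrac{x}{a}\right)^{b-1} \exp\{-(x/a)^{b}\}$, the density of the CDF $F(x) = 1 - \exp\{-(x/a)^{b}\}$ | $a > 0,\ b > 0$ |
| Lognormal | $f(x) = \dfrac{1}{x\sigma\sqrt{2\pi}} \exp\left\{-\dfrac{(\ln x - \mu)^2}{2\sigma^2}\right\}$ | $\mu \in \mathbb{R},\ \sigma > 0$ |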
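For reference, the four densities in Table 1 can be written directly with Python's `math` module. In this sketch the Gamma model uses shape `alpha` and scale `lam`, and the Weibull model uses scale `a` and shape `b` (CDF $1-\exp\{-(x/a)^b\}$); all functions assume $x > 0$, and the function names are ours.

```python
import math

def power_law_pdf(x, a, b):
    """f(x) = a * x**(-b)."""
    return a * x ** (-b)

def gamma_pdf(x, alpha, lam):
    """Gamma density with shape alpha and scale lam, evaluated in log space."""
    return math.exp((alpha - 1) * math.log(x) - x / lam
                    - alpha * math.log(lam) - math.lgamma(alpha))

def weibull_pdf(x, a, b):
    """Weibull density with scale a and shape b."""
    z = x / a
    return (b / a) * z ** (b - 1) * math.exp(-z ** b)

def lognormal_pdf(x, mu, sigma):
    """Lognormal density: ln(x) is normal with mean mu and std sigma."""
    return (math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2))
            / (x * sigma * math.sqrt(2 * math.pi)))

# Usage: density of the fitted Gamma model (Table 2) at a 10 ms delay
print(gamma_pdf(10.0, 0.2463, 55.928))
```

Evaluating the Gamma density through `math.lgamma` avoids overflow in $\Gamma(\alpha)$ and $\lambda^{\alpha}$ for large parameters.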

3.2. Comparison of Internet Link Delay Regressions

The delay data are fitted with the Gamma, lognormal, and Weibull distributions using Maximum Likelihood Estimation (MLE), and with the power law using least squares. MLE assumes that the Internet link delays $d_i$ are independent and identically distributed. Let the sample be $D = \{d_1, d_2, \ldots, d_n\}$. For a given probability model with parameter vector $\theta$, the probability of observing the sample set is

$$p(D \mid \theta) = p(d_1, d_2, \ldots, d_n \mid \theta) = \prod_{i=1}^{n} p(d_i \mid \theta).$$

In practice, the log-likelihood function (LLF) is usually used instead:

$$l(\theta) = \ln p(D \mid \theta) = \sum_{i=1}^{n} \ln p(d_i \mid \theta).$$

MLE seeks the parameter vector that maximizes the LLF, i.e., $\hat{\theta} = \arg\max_{\theta} l(\theta)$.
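For the Gamma model this maximization has no closed form. A minimal sketch, which is our illustration and not the authors' code: for fixed $\alpha$ the MLE of the scale is $\lambda = \bar{d}/\alpha$, and substituting it gives the one-dimensional profile LLF $l(\alpha) = (\alpha-1)\sum_i \ln d_i - n\alpha - n\alpha\ln(\bar{d}/\alpha) - n\ln\Gamma(\alpha)$. This function is concave in $\alpha$, so a golden-section search over $\ln\alpha$ finds its maximum. The search bounds and tolerance are arbitrary choices.

```python
import math
import random

def gamma_mle(data, lo=1e-3, hi=1e3, tol=1e-10):
    """Maximum likelihood fit of Gamma(alpha, lam), shape alpha and scale lam.

    Uses the profile LLF: for fixed alpha the MLE of lam is mean/alpha.
    Requires all data > 0. Returns (alpha, lam, max log-likelihood).
    """
    n = len(data)
    mean = sum(data) / n
    s = sum(math.log(d) for d in data)  # sum of ln d_i

    def profile_llf(alpha):
        # l(alpha) = (alpha-1)*S - n*alpha - n*alpha*ln(mean/alpha) - n*lnGamma(alpha)
        return ((alpha - 1) * s - n * alpha
                - n * alpha * math.log(mean / alpha) - n * math.lgamma(alpha))

    # Golden-section search for the maximum over u = ln(alpha) in [ln lo, ln hi]
    g = (math.sqrt(5) - 1) / 2
    a, b = math.log(lo), math.log(hi)
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = profile_llf(math.exp(c)), profile_llf(math.exp(d))
    while b - a > tol:
        if fc > fd:            # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = profile_llf(math.exp(c))
        else:                  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = profile_llf(math.exp(d))
    alpha = math.exp((a + b) / 2)
    return alpha, mean / alpha, profile_llf(alpha)

# Usage: recover known parameters from synthetic Gamma data
rng = random.Random(1)
sample = [x for x in (rng.gammavariate(0.2463, 55.928) for _ in range(20000)) if x > 0]
alpha_hat, lam_hat, llf = gamma_mle(sample)
```

Profiling out $\lambda$ reduces the problem to a bounded one-dimensional search, which needs only `math.lgamma` rather than the digamma function required by Newton-type updates.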

The fitted model parameters, LLF, and SSR (sum of squared residuals) are listed in Table 2. The Gamma distribution has the smallest error and the greatest likelihood for the delay samples. The simplicity of the power law limits its fitting ability. Fig. 2 shows the empirical cumulative distribution function (CDF) of link delay together with the CDFs of the fitted Gamma, Weibull, and lognormal distributions. The plot shows that the Gamma distribution is closest to the empirical distribution.
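The paper does not define the residuals behind SSR. One plausible reading, used here purely as an assumption, compares the empirical CDF with a fitted model CDF at every sorted sample point. The Weibull and lognormal CDFs have closed forms via `math.exp` and `math.erf`; the Gamma CDF needs the regularized incomplete gamma function (for example `scipy.special.gammainc`), which is omitted to keep the sketch dependency-free.

```python
import math

def weibull_cdf(x, a, b):
    """Weibull CDF with scale a and shape b."""
    return 1.0 - math.exp(-(x / a) ** b)

def lognormal_cdf(x, mu, sigma):
    """Lognormal CDF with log-mean mu and log-standard-deviation sigma."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def ecdf_ssr(data, cdf):
    """Sum of squared differences between a model CDF and the empirical CDF,
    evaluated at each sorted sample point (one possible definition of SSR)."""
    xs = sorted(data)
    n = len(xs)
    return sum((cdf(x) - (i + 1) / n) ** 2 for i, x in enumerate(xs))

# Usage: points placed at Weibull(a=2, b=1) quantiles fit that model far better
n = 1000
sample = [2.0 * -math.log(1 - (i + 0.5) / n) for i in range(n)]
ssr_weibull = ecdf_ssr(sample, lambda x: weibull_cdf(x, 2.0, 1.0))
ssr_lognormal = ecdf_ssr(sample, lambda x: lognormal_cdf(x, 5.0, 1.0))
```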

Table 2 Results of fitting the delay data with four models

| Model | Parameters | SSR | LLF |
|---|---|---|---|
| Power law | $a = -1.2612$, $b = 0.2903$ | 0.013559 | -2380700 |
| Gamma | $\alpha = 0.2463$, $\lambda = 55.928$ | 0.0028567 | -299240 |
| Weibull | $a = 0.5726$, $b = 0.3669$ | 0.004992 | -305820 |
| Lognormal | $\mu = -0.2681$, $\sigma = 4.3479$ | 0.027981 | -335240 |


Figure 2 The CDF of link delay and models

3.3. Delay Model Tests

In statistics, the Q-Q (quantile-quantile) plot is used to check visually whether a sample follows a given distribution. In Fig. 3, the x-axis values are percentiles of random numbers generated from $\Gamma(0.2463, 55.928)$, and the y-axis values are percentiles of the delay data. The points in the Q-Q plot lie approximately on a straight line, so the delay sample can be considered to follow the Gamma distribution $\Gamma(0.2463, 55.928)$.
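The Q-Q construction above can be sketched as follows: draw a large reference sample from $\Gamma(0.2463, 55.928)$, take the 1st to 99th percentiles of both the reference sample and the delays, and pair them. Python's `random.gammavariate(alpha, beta)` takes shape and scale, matching the $(\alpha, \lambda)$ parameterization here. Summarizing straightness with the Pearson correlation of the points is our own addition, not part of the paper's test; the reference-sample size and seed are arbitrary.

```python
import random
import statistics

def qq_points(delays, alpha, lam, n_ref=50_000, seed=0):
    """Pair the 1st..99th percentiles of a reference Gamma(alpha, lam) sample (x)
    with the 1st..99th percentiles of the delay data (y)."""
    rng = random.Random(seed)
    ref = [rng.gammavariate(alpha, lam) for _ in range(n_ref)]  # shape alpha, scale lam
    xq = statistics.quantiles(ref, n=100)
    yq = statistics.quantiles(delays, n=100)
    return list(zip(xq, yq))

def straightness(points):
    """Pearson correlation of the Q-Q points; values near 1 mean a nearly straight line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Usage: synthetic "delays" drawn from the fitted model should give a straight Q-Q line
rng = random.Random(42)
delays = [rng.gammavariate(0.2463, 55.928) for _ in range(20000)]
pts = qq_points(delays, 0.2463, 55.928)
r = straightness(pts)
```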

In 1979, Efron introduced the bootstrap, a resampling technique for data that cannot be sampled more than once because of practical cost [8]. The main idea is to draw resampled datasets, each the same size as the observed dataset, by random sampling with replacement from the original data. The $i$-th resample yields an estimate $\theta_i$ of the parameter $\theta$. Repeating the resampling $B$ times yields $\theta_1, \theta_2, \ldots, \theta_B$, or $\theta_{(1)} \le \theta_{(2)} \le \cdots \le \theta_{(B)}$ after sorting. The confidence interval for $\theta$ at confidence level $1-\alpha$ is

$$\left[\theta_{(B\alpha/2+1)},\ \theta_{(B-B\alpha/2)}\right].$$
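A minimal percentile-bootstrap sketch of the interval above, assuming a generic `estimator` function (for example, a Gamma MLE for $\alpha$ or $\lambda$). When $B\alpha/2$ is not an integer, this version rounds it down, which is our choice; the usage line uses the sample mean only because it is easy to check.

```python
import random
import statistics

def bootstrap_ci(data, estimator, B=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for estimator(data).

    Draws B resamples of size len(data) with replacement, sorts the B estimates,
    and returns [theta_(B*alpha/2 + 1), theta_(B - B*alpha/2)] using 1-based
    order statistics; B*alpha/2 is rounded down if it is not an integer.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(estimator(rng.choices(data, k=n)) for _ in range(B))
    k = int(B * alpha / 2)
    return estimates[k], estimates[B - k - 1]   # 0-based indices

# Usage: 95% interval for the mean of 1..100 (sample mean 50.5)
lo, hi = bootstrap_ci(list(range(1, 101)), statistics.mean)
```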

Figure 3. The Q-Q plot of Gamma distribution

We test the null hypothesis H0: the delay distribution follows $\Gamma(0.2463, 55.928)$, against the alternative H1: the delay does not follow this distribution. The delay data were resampled 100 times, and the Gamma distribution outperformed the other models in all 100 resamples, a dominance ratio of 100%. At significance level $\alpha = 0.05$, the Gamma parameter estimates from Section 3.2 fall within the bootstrap confidence intervals, i.e., $0.2463 \in [0.2458, 0.2478]$ and $55.928 \in [55.6087, 56.9289]$. Therefore H0 is not rejected, and we conclude that the delay distribution follows $\Gamma(0.2463, 55.928)$.

4. Relationship between Internet Link Delay and Edge Load

4.1 Definition of Edge Load and its Distribution

The edge load of a link is defined as the number of times the link appears in all routes; it can also be normalized. The edge load of a link describes its importance in Internet routing: the greater the edge load, the more important the link. Since the first several hops from a single monitor to all destinations are the same, we cut them from the tree-like structure [9]. Figure 4 shows the distribution of edge loads of Internet links on a log-log scale. Only a few links have large edge loads. The curve is approximately a straight line, indicating that edge load follows a power law.

Figure 4. The distribution of edge loads (log-log)

4.2 Relationship between Internet Link Delay and Edge Load

A scatter plot is one way to observe the correlation between two variables. Figure 5 shows the scatter plot of delay versus edge load (on a log scale). The correlation coefficient is only 0.001.
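The correlation coefficient can be computed as a Pearson coefficient between link delay and the logarithm of edge load. Whether the paper correlates delay with raw or log-scaled edge load is not stated, so using $\ln(\text{load})$ to match the log-scale scatter plot is an assumption, as is the dict-keyed-by-link input format.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def delay_load_correlation(delay, load):
    """Correlate link delay with ln(edge load) over links present in both dicts.

    delay, load: dicts keyed by link (hypothetical format); loads must be >= 1.
    """
    links = [link for link in delay if link in load]
    return pearson([delay[link] for link in links],
                   [math.log(load[link]) for link in links])

# Usage: delay exactly linear in ln(load) gives r close to 1
load = {("a", "b"): 1, ("b", "c"): 10, ("c", "d"): 100}
delay = {link: 1.0 + 2.0 * math.log(n) for link, n in load.items()}
r = delay_load_correlation(delay, load)
```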


Figure 5. The delay–edge load (on log scale) scatter diagram

To look for underlying patterns in the statistical properties, we sort the links by edge load in increasing order and group them into 15 bins of at least 4,500 links each. The link delays in each group are then fitted with the candidate probability models. All groups are well described by the Gamma distribution: it fits best for 11 groups, and the Weibull distribution fits best for the remaining 4. As edge load increases, the shape parameter α of the Gamma distribution increases while the scale parameter λ decreases. In other words, with increasing edge load, delays become less concentrated around small values and the curve decays less steeply. A possible reason is increased queueing delay due to heavier link traffic.
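The binning step above can be sketched as sorting links by edge load and cutting the sorted list into contiguous, nearly equal-sized groups; each group's delays can then be fitted with the candidate models. The function name and return format are hypothetical, links with tied loads may fall into adjacent bins, and `min_size` (at least 4,500 in the paper's 15-bin setting) guards against bins that are too small.

```python
from statistics import mean, median

def bin_by_load(delay, load, n_bins=15, min_size=1):
    """Split links into n_bins groups of nearly equal size by increasing edge load.

    delay, load: dicts keyed by link (hypothetical format).
    Returns a list of (max_load_in_bin, mean_delay, median_delay, delays) tuples.
    """
    links = sorted(delay, key=lambda link: load[link])  # increasing edge load
    size, extra = divmod(len(links), n_bins)
    if size < min_size:
        raise ValueError("too few links for the requested number of bins")
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)  # spread the remainder over the first bins
        group = links[start:end]
        ds = [delay[link] for link in group]
        bins.append((load[group[-1]], mean(ds), median(ds), ds))
        start = end
    return bins

# Usage: ten links whose delay is twice their load, split into two bins
delay = {i: 2.0 * i for i in range(1, 11)}
load = {i: i for i in range(1, 11)}
bins = bin_by_load(delay, load, n_bins=2)
```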

The observations above are based on the combined data from all six monitors. We then analyzed the data from each monitor separately with the same methods and found that the observations still hold, so the results are independent of monitor position. The delay distribution parameters and their statistical properties are shown in Fig. 6. When the bin size is set to 3,000 or 1,000, the results are similar to those in Fig. 6.

Relationship between Internet link delay and other metrics

In addition to edge load, we also use node degree and link betweenness as metrics of link centrality and study their correlation with link delay.

The degree of a node is the number of nodes connected to it in the topology. Some nodes with a large degree may lie at the margin of the network; they distort centrality estimates and are difficult to filter out. Let D(k) be the set of edges with degree k, where the degree of an edge can be defined as the maximum, minimum, or average of the degrees of its two endpoints. We did not observe an obvious relationship between degree and the delay statistics, such as the Gamma model parameters, the mean, and the median. Once the number of links with the same degree exceeds a certain value, both the mean and the median of delays flatten out and become independent of degree.
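The grouping by edge degree D(k) can be sketched as follows, assuming the links are given as a dict mapping router pairs to delays (each pair listed once) and that node degrees are counted from those same links. The `how` argument selects the maximum, minimum, or an average of the endpoint degrees; `min_links` mimics ignoring degrees with too few links. All names are ours.

```python
from collections import defaultdict
from statistics import mean, median

def delays_by_edge_degree(delay, how=max, min_links=1):
    """Group link delays into D(k) by edge degree k and summarize each group.

    delay: dict mapping (u, v) router pairs to delay, each pair listed once
    (hypothetical format). how: combines the two endpoint degrees, e.g. max,
    min, or lambda p, q: (p + q) / 2. Returns {k: (mean, median, count)}.
    """
    degree = defaultdict(int)
    for u, v in delay:              # node degree counted from the same links
        degree[u] += 1
        degree[v] += 1
    groups = defaultdict(list)
    for (u, v), d in delay.items():
        groups[how(degree[u], degree[v])].append(d)
    return {k: (mean(ds), median(ds), len(ds))
            for k, ds in sorted(groups.items()) if len(ds) >= min_links}

# Usage: a hub H linked to A, B, C, plus one extra link A-B
delay = {("H", "A"): 1.0, ("H", "B"): 3.0, ("H", "C"): 5.0, ("A", "B"): 2.0}
by_max = delays_by_edge_degree(delay, how=max)
by_min = delays_by_edge_degree(delay, how=min)
```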

The betweenness of a link is defined as the sum, over all node pairs, of the fraction of shortest paths between the pair that pass through the link; it measures centrality in graph-theoretic terms. Node betweenness can be defined similarly [10]. Using the same methods as for edge load, we did not observe a relationship between delay statistics and betweenness. One explanation is that betweenness estimates traffic under random communication, whereas edge load estimates traffic in the real network, so the latter may describe centrality more accurately.
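The paper does not say how link betweenness was computed. One standard choice, used here as an assumption, is Brandes' algorithm adapted to edges on an unweighted, undirected graph: a breadth-first search from each source counts shortest paths, then dependencies are accumulated back onto edges. The result for each edge is the sum, over unordered node pairs, of the fraction of shortest paths that use it.

```python
from collections import defaultdict, deque

def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph.

    adj: dict node -> iterable of neighbours (each edge listed in both directions).
    Returns {frozenset({u, v}): sum over unordered pairs of the fraction of
    shortest paths between the pair that use edge (u, v)}.
    """
    eb = defaultdict(float)
    for s in adj:
        # BFS from s: shortest-path counts (sigma) and predecessor lists
        sigma = defaultdict(int)
        sigma[s] = 1
        dist = {s: 0}
        preds = defaultdict(list)
        order = []
        q = deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies onto edges in reverse BFS order
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                eb[frozenset((v, w))] += c
                delta[v] += c
    # Every unordered pair was counted once from each endpoint
    return {e: val / 2 for e, val in eb.items()}

# Usage: a 3-node path and a 4-node cycle
path = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
cycle = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
eb_path = edge_betweenness(path)
eb_cycle = edge_betweenness(cycle)
```

Each breadth-first search costs time linear in the graph size, so the whole computation is O(nm) for n nodes and m edges, which is feasible for a topology of roughly 76,000 nodes and 128,000 links.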

Conclusions and further work

We modeled and analyzed large-scale measurement data and found the following. 1) The distribution of Internet link RTTs is long-tailed, and the Gamma distribution fits the data better than the three other probability models; the model was validated with a Q-Q plot and a bootstrap hypothesis test. 2) The edge loads of Internet links follow a power law. The link delay data still follow the Gamma distribution when binned by edge load, and the Gamma parameters and delay statistics change with edge load. The generality of these results was checked across the different monitors and centrality metrics.

The observations in this paper concern the behavior of Chinese networks. In future work, worldwide measurement data will be analyzed and modeled to determine whether our observations are specific to Chinese networks or reflect a general property of the Internet. In addition, we found that the delay distribution is independent of monitor position; this conclusion should be validated with measurements of different scales. If our conclusions hold regardless of the range and size of the measurement, they would be significant.


Figure 6 Grouped link delay property plots (a. Gamma α; b. Gamma λ; c. arithmetic mean; d. median)

ACKNOWLEDGMENT

This work was supported by the National Natural Science Foundation of China (60203021) and the National Basic Research Program of China (973 Program, 2007CB311100).

References

[1] Hongli Zhang, Binxing Fang, and Mingzeng Hu. "Survey on Internet measurement and analysis". Journal of Software, 2003, 14(1): 110-116. (in Chinese)

[2] A. Mukherjee. "On the dynamics and significance of low frequency components of Internet load". Technical Report CIS-92-83, University of Pennsylvania, Philadelphia, 1992.

[3] A. Acharya, J. Saltz. "A study of Internet round-trip delay". Technical Report 20742, University of Maryland, 1997.

[4] K. Papagiannaki, Sue Moon, C. Fraleigh. "Analysis of measured single-hop delay from an operational backbone network". Proceedings of INFOCOM 02, University College London: IEEE, 2002: 535-544.

[5] J.A. Hernandez, I.W. Philips. "Weibull mixture model to characterize end-to-end Internet delay at coarse time-scales". IEEE Proc-Commun, 2006, 153(2): 295-304.

[6] Zhiping Cai. "Active and passive probing based network measurement techniques, models and algorithms". PhD dissertation, Changsha: National University of Defense Technology, 2005. (in Chinese)

[7] Jingping Bi. "Internet behavior measurement and analysis research". PhD dissertation, Beijing: Institute of Computing Technology, Chinese Academy of Sciences, 2002. (in Chinese)

[8] B. Efron. "Bootstrap methods: another look at the Jackknife". The Annals of Statistics, 1979, 7(2): 1-26.

[9] Yu Jiang, Mingzeng Hu, Binxing Fang, Hongli Zhang. "An Internet router level topology automatic discovery system". Journal of Communication, 2002, 23(12): 54-62. (in Chinese)

[10] L. C. Freeman. "Centrality in social networks: conceptual clarification". Social Networks, 1979, 1: 215-239.
