
[IEEE 2006 International Conference on Communications, Circuits and Systems - Guilin, Guangzi, China (2006.06.25-2006.06.28)] 2006 International Conference on Communications, Circuits


Towards a Precise and Complete Internet Topology Generator

Shi Zhou
University College London
Adastral Park Campus, Ross Building
Ipswich, IP5 3RE, United Kingdom
Email: s.zhou@adastral.ucl.ac.uk

Guoqiang Zhang and Guoqing Zhang
Institute of Computing Technology
Chinese Academy of Sciences
Beijing, 100080, China
Email: {guoqiang, gqzhang}@ict.ac.cn

Zhenrong Zhuge
System Science and Engineering Dept
Zhejiang University
Hangzhou, 310027, China
Email: zdjw@263.net

Abstract - Correctly modeling the Internet topology is critical for the design of next-generation network protocols. In this paper we present a more rigorous evaluation of the Positive-Feedback Preference (PFP) model by comparing the model against a measurement of the Internet topology at the autonomous systems (AS) level which was collected later than that used in the previous evaluation, and a recent measurement of the Chinese Internet AS graph, which is a local subgraph that evolved fairly independently from the rest of the global graph. Our numerical simulation shows that the PFP model, using the same parameters, precisely reproduces both the global AS graph and the AS subgraph in China. This work increases our confidence that the PFP model is a precise and complete Internet topology generator and suggests that the model's growth mechanisms have successfully captured the fundamental dynamics that universally govern the Internet evolution.

I. INTRODUCTION

During the last three decades, the Internet has experienced fascinating evolution, both exponential growth in its traffic and endless expansion in its topology [1]. It is crucial to obtain a good description of the topology, because effective engineering of the Internet is predicated on a detailed understanding of issues such as the large-scale structure of its underlying physical topology, the manner in which it evolves over time, and the way in which its constituent components contribute to its overall function [2]. The Internet topology can be studied at two levels, namely the router level and the autonomous systems (AS) level, where routers are grouped into subgraphs, i.e. ASes. In this paper we focus on the Internet topology at the AS level, which is relevant to the routing of data traffic through the global Internet using the BGP protocol.

Only recently did practical measurements of the Internet AS topology become available [3], [4], [5], [6], [7], [8], [9], [10]. These measurements improved our understanding of the Internet structure and how the structure affects the network's behavior, performance and reliability [11], [12]. Since then researchers have proposed a number of Internet models to reproduce and explain topology properties of the Internet [13]. As of this writing, the most precise and complete Internet topology generator is the Positive-Feedback Preference (PFP) model [14], because it most accurately reproduces the largest set of important topology characteristics [8] of the AS-level Internet.

In this paper we present a more rigorous evaluation of the PFP model by comparing the model against a measurement of the Internet AS graph which was collected later than that used in the previous evaluation, and a recent measurement of the Chinese Internet AS graph. Our numerical simulation shows that the PFP model, using the same parameters, precisely reproduces both the global AS graph and the AS subgraph in China. The model closely matches a number of important topology characteristics that have practical meaning for Internet engineering. This work increases our confidence in the PFP model as a precise and complete Internet topology generator, and it reconfirms that the model's growth mechanisms have successfully captured the fundamental dynamics that universally govern the Internet evolution. We also show that, by tuning its two parameters, the PFP model is able to generate a wide range of different topologies and thus could potentially be used to model real networks in other domains as well.

II. THE POSITIVE-FEEDBACK PREFERENCE MODEL

The PFP model is an extensive modification of the famous Barabasi-Albert (BA) model [15]. The PFP model uses two dynamic mechanisms to generate Internet-like networks, namely the interactive growth and the positive-feedback preference.

[Figure 1, panels (a) and (b). Labels: existing system, host, peer, new node.]

Fig. 1. The interactive growth mechanism of the PFP model.

0-7803-9584-0/06/$20.00 ©2006 IEEE.

1) Interactive Growth: It has been observed on the Internet that the network evolution is largely due to two processes: the addition of new nodes, and the addition of new internal links between already existing nodes (old nodes) [16], [17]. The interactive growth mechanism is based on the hypothesis that the two processes are interdependent. That is to say, new customers (new nodes) provide the incentive for their service providers (old nodes) to develop new connections linking to other service providers in order to improve service quality. To distinguish the different roles of old nodes, in this paper we call an old node to which a new node is attached (via an external link) a host node, and an old node to which a new internal link is attached a peer node. Starting from a small random graph, the interactive growth proceeds as follows: at each time step,

* with probability p ∈ [0, 1], a new node is attached to one host node, and two new internal links appear connecting the host node to two peer nodes (see Fig. 1-a);

* with probability 1 - p, a new node is attached to two host nodes, and one new internal link appears connecting one of the host nodes to one peer node (see Fig. 1-b).

The interactive growth is controlled by the parameter p, which adjusts the proportion of new nodes with initial degrees of one or two.
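The two cases above can be sketched as a single growth step. This is a minimal illustration, not the authors' implementation. The names `growth_step` and `choose_uniform` are hypothetical, and the uniform chooser is only a placeholder for the positive-feedback preference. Several details that the text leaves open are assumptions here: all hosts and peers are chosen before any link is added, in case (b) the first host chosen is the one that gains the internal link, a peer may not already be a neighbour of that host, and the seed graph has at least three nodes.

```python
import random


def choose_uniform(adj, banned, rng):
    """Placeholder chooser: pick uniformly among old nodes not in `banned`."""
    candidates = [v for v in adj if v not in banned]
    return rng.choice(candidates) if candidates else None


def growth_step(adj, p, rng, choose=choose_uniform):
    """Apply one interactive-growth step in place.

    adj maps node -> set of neighbours; nodes are labelled 0..len(adj)-1.
    Returns (new_node, hosts, peers).
    """
    new = len(adj)
    if rng.random() < p:
        # Case (a): one host; the host gains internal links to two peers.
        hosts = [choose(adj, set(), rng)]
        n_peers = 2
    else:
        # Case (b): two distinct hosts; the first gains one internal link.
        first = choose(adj, set(), rng)
        hosts = [first, choose(adj, {first}, rng)]
        n_peers = 1
    linker = hosts[0]
    peers = []
    for _ in range(n_peers):
        # Assumption: a peer is neither the linking host nor already its neighbour.
        banned = {linker} | adj[linker] | set(peers)
        peer = choose(adj, banned, rng)
        if peer is not None:
            peers.append(peer)
    # Attach the new node to its host(s) via external links.
    adj[new] = set()
    for h in hosts:
        adj[new].add(h)
        adj[h].add(new)
    # Add the internal links from the linking host to its peer(s).
    for q in peers:
        adj[linker].add(q)
        adj[q].add(linker)
    return new, hosts, peers


# Usage: grow a 4-node seed path by 50 steps with p = 0.4.
rng = random.Random(1)
graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
for _ in range(50):
    growth_step(graph, 0.4, rng)
```

Passing the chooser as a parameter keeps the growth rule separate from the node-selection rule, so the same step can be reused with a degree-based preference.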

2) Positive-Feedback Preference: In the interactive growth, host nodes and peer nodes are chosen from old nodes using the positive-feedback preference. If node degree, k, is defined as the number of links a node has, the preference probability, Π(i), that old node i is chosen as a host/peer node is given as

$$\Pi(i) = \frac{k_i^{1+\delta \log_{10} k_i}}{\sum_j k_j^{1+\delta \log_{10} k_j}}, \quad \delta \in [0, 1]. \tag{1}$$

The positive-feedback preference is controlled by the parameter δ, which tunes how a node's ability to acquire new links increases as a feedback loop of the links that it has already obtained. The positive-feedback preference reflects the "winner-take-all" trend embedded in the social-technological competition. When δ = 0, the positive-feedback preference is equivalent to the linear preference of the BA model.
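As a small numerical illustration of Eq. (1), the sketch below computes the preference probabilities for a list of node degrees. The function name `pfp_probabilities` and the example degrees are hypothetical. Degrees are assumed to be at least 1 so that the logarithm is defined.

```python
import math


def pfp_probabilities(degrees, delta):
    """Return Pi(i) for every node, following Eq. (1); all degrees must be >= 1."""
    weights = [k ** (1.0 + delta * math.log10(k)) for k in degrees]
    total = sum(weights)
    return [w / total for w in weights]


degrees = [1, 2, 10, 100]
linear = pfp_probabilities(degrees, 0.0)      # delta = 0: linear (BA) preference
feedback = pfp_probabilities(degrees, 0.048)  # delta > 0 shifts weight to high degrees
```

With δ = 0 each weight reduces to the degree itself, which is the linear preference. With δ > 0 the exponent grows with log10 of the degree, so the highest-degree node receives a larger share than under linear preference.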

Previous numerical simulation [14] has shown that parameters p = 0.4 and δ = 0.048 produce the best result when modeling the Internet AS-level topology.

III. MORE RIGOROUS MODEL EVALUATION

Previous evaluation of the PFP model [14] was conducted using the data kit ITDK0204, a measurement of the Internet AS graph which was extracted from traceroute data collected by CAIDA's active probing tool Skitter in April 2002 [6]. In this paper we evaluate the PFP model by comparing the model against CAIDA's later measurement of the AS-level Internet topology, ITDK0304, which was collected in 2003 and used stricter rules for filtering AS connectivity information from traceroute data.

In this paper we also compare the PFP model against the dataset CN05, which is a measurement of the Chinese Internet AS graph collected by researchers at the Chinese Academy of Sciences in May 2005.¹ We use CN05 because there is no Skitter monitor located in China and therefore CAIDA's measurement of the Chinese Internet is incomplete. More importantly, the Internet in China has developed in an environment characterized by more centralized planning and less commercial competition than that of the West. We are interested to see whether the PFP model is capable of reproducing both the global Internet AS graph and a local AS subgraph which has evolved fairly independently from the rest of the Internet.

¹ Downloadable at http://www.adastral.ucl.ac.uk/~szhou/resource.htm

We use the PFP model with parameters p = 0.4 and δ = 0.048 to generate synthetic networks having the same numbers of nodes as CN05 and ITDK0304 (see Table I). All quantities and figures of the PFP networks shown in this paper are averages over ten graphs generated using different random seeds. We then compare the PFP networks against the two AS graphs by examining the following topology properties.

TABLE I
TOPOLOGY PROPERTIES OF AS GRAPHS AND PFP NETWORKS.

Property                          CN05     PFP      ITDK0304  PFP
Number of nodes, N                84       84       9204      9204
Number of links, L                211      217      28959     27612
Maximum degree, k_max             38       39       2070      1950
Exponent of P(k), γ               -2.21    -2.21    -2.254    -2.255
Assortative coefficient, α        -0.328   -0.298   -0.236    -0.234
Top clique size, n_clique         3        3        16        16
Exponent of φ(r/N), θ             -1.42    -1.42    -1.48     -1.49
Characteristic path length, ℓ*    2.54     2.54     3.12      3.07

A. Degree Distribution

Node degree is a local property, but the distribution of node degree provides a global view of a network's structure. Faloutsos et al. showed that the AS-level Internet topology follows a power-law degree distribution, P(k) ∼ k^γ [18]. This means that most nodes that form the network have very few links, while a few nodes are very well connected. The power-law exponent γ is a metric that fundamentally characterizes the AS-level Internet topology. Table I shows that the power-law exponent of CN05 (γ = -2.21) is slightly larger than that of ITDK0304 (γ = -2.255). Table I and Figure 2 show that the PFP networks closely match the degree distribution of the two AS graphs.
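To make the power-law exponent concrete, the sketch below computes an empirical degree distribution and estimates γ as the least-squares slope of log10 P(k) against log10 k. The paper does not state how its exponents were fitted, so the fitting method is an assumption, and the function names and sample degrees are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Return {k: P(k)}, the fraction of nodes having each degree k."""
    n = len(degrees)
    return {k: c / n for k, c in sorted(Counter(degrees).items())}


def fit_power_law_exponent(pk):
    """Least-squares slope of log10 P(k) versus log10 k (an assumed fitting method)."""
    xs = [math.log10(k) for k in pk]
    ys = [math.log10(p) for p in pk.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic degrees whose counts fall off as k^-2 (64, 16, 4 and 1 nodes).
sample = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
gamma = fit_power_law_exponent(degree_distribution(sample))
```

A plain log-log regression is sensitive to the sparse tail of the distribution, so fits on real AS graphs are often done on binned or cumulative data instead.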

B. Disassortative Mixing

The Internet exhibits the so-called disassortative mixing behavior, where on average, high-degree nodes tend to connect to low-degree nodes [16], [19]. As shown in Figure 3, the AS graphs feature a negative correlation between degree k and the nearest-neighbors average degree of k-degree nodes. A network's mixing pattern can also be identified by calculating the assortative coefficient α [20], which is defined as

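$$\alpha = \frac{L^{-1}\sum_m j_m k_m - \left[L^{-1}\sum_m \frac{1}{2}(j_m + k_m)\right]^2}{L^{-1}\sum_m \frac{1}{2}(j_m^2 + k_m^2) - \left[L^{-1}\sum_m \frac{1}{2}(j_m + k_m)\right]^2}, \tag{2}$$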

where L is the number of links a network has, and j_m, k_m are the degrees of the nodes at the ends of the m-th link, with m = 1, ..., L. Table I shows that the disassortative mixing AS graphs are characterized by negative α values. The PFP networks precisely reproduce the degree-degree correlation and the assortative coefficient of the two AS graphs.

Fig. 2. Degree distribution, P(k).

Fig. 3. Nearest-neighbors average degree vs node degree.

Fig. 4. Rich-club connectivity vs node degree.

Fig. 5. Rich-club connectivity vs normalized rank.

Fig. 6. Complementary cumulative distribution (CCD) of the triangle coefficient.

Fig. 7. Average triangle coefficient of k-degree nodes.

Fig. 8. Complementary cumulative distribution (CCD) of the shortest path length.

Fig. 9. Average shortest path length of k-degree nodes, ℓ(k).

Plotted series: CN05, PFP model (N = 84), ITDK0304, PFP model (N = 9204).
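The assortative coefficient of Eq. (2) can be evaluated directly from a list of links. The sketch below is a minimal illustration with a hypothetical function name. It assumes an undirected simple graph given as a list of node pairs, and it does not handle the degenerate case where all link-end degrees are equal, which makes the denominator zero.

```python
from collections import Counter


def assortative_coefficient(edges):
    """Evaluate Eq. (2) for an undirected graph given as a list of (u, v) links."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    num_links = len(edges)
    # The three link averages that appear in Eq. (2).
    mean_product = sum(degree[u] * degree[v] for u, v in edges) / num_links
    mean_half_sum = sum(0.5 * (degree[u] + degree[v]) for u, v in edges) / num_links
    mean_half_squares = sum(
        0.5 * (degree[u] ** 2 + degree[v] ** 2) for u, v in edges
    ) / num_links
    return (mean_product - mean_half_sum ** 2) / (mean_half_squares - mean_half_sum ** 2)


# A star is maximally disassortative: every link joins the hub to a leaf.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
alpha_star = assortative_coefficient(star)
```

For the star every link joins the degree-4 hub to a degree-1 leaf, so the coefficient reaches its minimum value of -1.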

C. Rich-Club Phenomenon

The Internet AS-level topology exhibits the rich-club phenomenon [21], which describes the fact that high-degree nodes (rich nodes) are tightly interconnected with other rich nodes, forming a core group, or rich-club. Subgraphs formed by richer nodes are progressively more tightly interconnected. The rich-club phenomenon does not conflict with the disassortative mixing because it does not imply that the majority of the links of the rich nodes are directed to other club members.

The rich-club phenomenon is quantitatively assessed by the metric of rich-club connectivity, φ, which is defined as the ratio of the number of links among members of a rich-club to the maximum possible number of links among the members. Rich-club membership can be defined as "nodes with degree no smaller than k" or "the r best connected nodes", where rank r denotes a node's position on the non-increasing degree list of a graph. Rich-club connectivity measures how well club members "know" each other, e.g. φ = 1 means that every member has a direct link to every other member, i.e. they form a fully connected mesh, a clique. The top clique size, n_clique, is the maximum number of highest-rank nodes still forming a clique.
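The rank-based definition can be sketched as follows. The function names are hypothetical, and one detail is an assumption: nodes with equal degree are ordered by label, since the text does not say how ties on the degree list are broken.

```python
def rich_club_by_rank(adj):
    """Return {r: phi(r)} for the r best-connected nodes, r = 2..N.

    adj maps node -> set of neighbours. Ties in degree are broken by node
    label, which is an assumption of this sketch.
    """
    ranked = sorted(adj, key=lambda v: (-len(adj[v]), v))
    phi = {}
    members = set()
    links = 0
    for r, v in enumerate(ranked, start=1):
        links += len(adj[v] & members)   # links from v to richer members
        members.add(v)
        if r >= 2:
            phi[r] = links / (r * (r - 1) / 2)
    return phi


def top_clique_size(phi):
    """Largest r such that the top-r' nodes form a clique for every r' <= r."""
    size = 1
    for r in sorted(phi):
        if phi[r] < 1.0:
            break
        size = r
    return size


# A triangle (0, 1, 2) with two pendant nodes attached to nodes 0 and 1.
graph = {0: {1, 2, 3}, 1: {0, 2, 4}, 2: {0, 1}, 3: {0}, 4: {1}}
phi = rich_club_by_rank(graph)
n_clique = top_clique_size(phi)
```

Accumulating the link count as each node joins the club avoids recounting the links among the top r nodes for every r.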

Figure 4 and Figure 5 illustrate the rich-club connectivity as a function of node degree k and node rank r respectively, where node rank is normalized by the number of nodes N. Both CN05 and ITDK0304 obey a power law φ(r/N) ∼ (r/N)^θ, where the rich-club exponent θ_CN = -1.42 and θ_ITDK = -1.48. The PFP networks accurately match both the top clique size (see Table I) and the rich-club exponent of the two AS graphs.

D. Short Cycles

Short cycles (e.g. triangles and quadrangles) encode the redundancy information in a network structure because the multiplicity of paths between any two nodes increases with the density of short cycles [22]. The triangle coefficient, k_t, is defined as the number of triangles that a node shares. As shown in Figure 6, the complementary cumulative distributions (CCD) of the node triangle coefficient of the two AS graphs exhibit similar power-law behavior, P(> k_t) ∼ k_t^ζ, ζ < 0. Figure 7 illustrates the correlation between node degree and triangle coefficient, which conveys neighbor clustering information for nodes with different degrees. The PFP model closely matches the above triangle coefficient properties of the two AS graphs.
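The triangle coefficient defined above can be counted per node from an adjacency structure. The sketch below is a minimal illustration with a hypothetical function name, assuming an undirected simple graph stored as a mapping from each node to its set of neighbours.

```python
def triangle_coefficients(adj):
    """Return {node: k_t}, the number of triangles each node takes part in."""
    kt = {}
    for v, nbrs in adj.items():
        # A triangle through v is a pair of v's neighbours that are linked.
        # Summing over neighbours counts each such pair twice.
        linked = sum(len(adj[u] & nbrs) for u in nbrs)
        kt[v] = linked // 2
    return kt


# A triangle (0, 1, 2) with two pendant nodes attached to nodes 0 and 1.
graph = {0: {1, 2, 3}, 1: {0, 2, 4}, 2: {0, 1}, 3: {0}, 4: {1}}
kt = triangle_coefficients(graph)
```

In this example the three triangle nodes each have k_t = 1 and the pendant nodes have k_t = 0.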

E. Shortest Path Length

The shortest path length, ℓ, is defined as the minimum hop distance between a pair of nodes. The characteristic path length, ℓ*, is the average of the shortest path length over all pairs of nodes. The Internet is a small-world network [23], as it is possible to get to any node via only a few links among adjoining nodes (see ℓ* in Table I). Performance of modern routing algorithms depends strongly on the distribution of shortest paths [24]. Figure 8 shows the CCD of the shortest path length and Figure 9 shows the average shortest path length of k-degree nodes. The PFP networks precisely reproduce the shortest path properties of the two AS graphs.

Fig. 10. Sensitivity of key topology metrics to the interactive growth parameter p and the positive-feedback preference parameter δ. a) Power-law exponent θ of rich-club connectivity φ(r/N), b) Power-law exponent γ of degree distribution P(k), and c) The assortative coefficient α.
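The characteristic path length ℓ* defined in Section III-E can be computed with a breadth-first search from every node. The sketch below is a minimal illustration with hypothetical function names. It assumes an unweighted, undirected graph, and it averages over reachable pairs only, which equals the all-pairs average when the graph is connected.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def characteristic_path_length(adj):
    """Average shortest path length over all reachable ordered node pairs."""
    total = 0
    pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


# A 4-node path: pair distances 1, 2, 3, 1, 2, 1 average to 10/6.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
l_star = characteristic_path_length(path)
```

Each pair is visited in both directions, which leaves the average unchanged. The cost is one BFS per node, which is practical for graphs of the sizes reported in Table I.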

IV. MODEL SENSITIVITY TO PARAMETERS p AND δ

The interactive growth and the positive-feedback preference are controlled by parameters p and δ respectively. In this section we study the model's sensitivity to the two parameters. We generate PFP networks having the same number of nodes as ITDK0304 and study how the following three key topology metrics change as the two parameters vary: the rich-club power-law exponent θ, the degree distribution power-law exponent γ and the assortative coefficient α.

Fig. 10-a shows that the rich-club exponent θ is very sensitive to the interactive growth parameter p and is fairly insensitive to the positive-feedback preference parameter δ. When parameter p increases, the rich-club exponent θ decreases, which means the generated networks contain a more tightly interconnected rich-club core. Fig. 10-b shows that the two parameters have opposite effects on the degree distribution exponent γ. In general, the exponent γ increases when parameter p increases and parameter δ decreases. Fig. 10-c shows that the PFP model exhibits stronger disassortative mixing when both the positive-feedback preference parameter δ and the interactive growth parameter p increase. It is clear that the assortative coefficient α is more sensitive to parameter δ than to parameter p.

The above analysis confirms that the recommended parameter values for generating Internet-like graphs are p = 0.4 and δ = 0.048. Our result also shows that, by tuning its two parameters, the PFP model is able to generate a wide range of different topology properties. For example, by tuning the parameter δ from 0 to 0.064, the PFP model is capable of generating a wide range of disassortative networks with assortative coefficient varying from -0.124 to -0.279, which encompasses most reported disassortative networks in the technological and biological domains [20].

V. DISCUSSION AND CONCLUSION

In this paper we conduct a more rigorous evaluation of the PFP model using CAIDA's later measurement of the Internet AS graph, ITDK0304, and the recent measurement of the Chinese Internet AS graph, CN05. Numerical simulation shows that the PFP model precisely reproduces both of the AS graphs by matching a number of important topology characteristics that have practical meaning for Internet engineering.

It is clear that the AS-level topology of the Chinese Internet is consistent with that of the global Internet. This is in spite of the fact that the Internet in China has developed in an environment characterised by more centralised planning and less commercial competition than that of the West. Our result suggests that the non-technical factors may only affect the development of the access networks, e.g. fewer and larger Internet service providers (ISP) in China, whereas the Internet topology at the AS-level is mainly influenced by universal technical factors, e.g. performance metrics. That is to say, when making decisions on infrastructure expansion, all ISPs around the world consider the same issues, such as routing latency and network reliability.

As an Internet topology generator, the PFP model is remarkable in its accuracy and completeness. We advocate that the PFP model should be used for more realistic Internet simulation research. Our result suggests that the PFP model's two mechanisms have successfully captured the fundamental dynamics that universally govern the evolution of Internet topology. Our work also shows that the PFP model is able to generate networks with a wide range of different topology properties by tuning the two parameters. Future work should further evaluate the PFP model against history snapshots and other regional subgraphs of the Internet AS-level topology.

REFERENCES

[1] G. F. Riley and M. H. Ammar, "Simulating large networks - how big is big enough?" in Proc. of 1st Intl. Conf. on Grand Challenges for Modeling and Simulation, 2002.

[2] S. Floyd and E. Kohler, "Internet research needs better models," ACM SIGCOMM Computer Communications Reviews, vol. 33, no. 1, pp. 29-34, January 2003.

[3] National Laboratory for Applied Network Research (NLANR), USA. http://moat.nlanr.net/.

[4] Route Views Project, University of Oregon, Eugene, USA. http://www.routeviews.org/.

[5] Topology Project, University of Michigan, Ann Arbor, USA. http://topology.eecs.umich.edu/.

[6] The Cooperative Association For Internet Data Analysis, University of California, San Diego, USA. http://www.caida.org/.

[7] L. Gao, "On inferring autonomous system relationships in the Internet," in Proc. of IEEE Global Internet, 2000.

[8] P. Mahadevan, D. Krioukov, M. Fomenkov, B. Huffaker, X. Dimitropoulos, and K. Claffy, "Lessons from three views of the Internet topology," CAIDA, Tech. Rep., 2005, http://www.caida.org/outreach/papers/2005/tr-2005-02/.

[9] H. Chang, R. Govindan, S. Jamin, S. Shenker, and W. Willinger, "Towards capturing representative AS-level Internet topologies," Computer Networks Journal, vol. 44, no. 6, pp. 737-755, April 2004.

[10] P. Mahadevan, D. Krioukov, B. Huffaker, X. Dimitropoulos, kc claffy, and A. Vahdat, "Lessons from three views of the Internet topology," CAIDA, Tech. Rep., 2005, http://www.caida.org/outreach/papers/2005/tr-2005-02/.

[11] S. Bornholdt and H. G. Schuster, Handbook of Graphs and Networks - From the Genome to the Internet. Weinheim, Germany: Wiley-VCH, 2002.

[12] R. Pastor-Satorras and A. Vespignani, Evolution and Structure of the Internet - A Statistical Physics Approach. Cambridge University Press, 2004.

[13] S. N. Dorogovtsev and J. F. F. Mendes, Evolution of Networks - From Biological Nets to the Internet and WWW. Oxford University Press, 2003.

[14] S. Zhou and R. J. Mondragón, "Accurately modelling the Internet topology," Physical Review E, vol. 70, no. 066108, December 2004.

[15] R. Albert and A. L. Barabasi, "Statistical mechanics of complex networks," Rev. Mod. Phys., vol. 74, pp. 47-97, 2002.

[16] A. Vazquez, R. Pastor-Satorras, and A. Vespignani, "Large-scale topological and dynamical properties of Internet," Phys. Rev. E, vol. 65, no. 066130, 2002.

[17] S. T. Park, A. Khrabrov, D. M. Pennock, S. Lawrence, C. L. Giles, and L. H. Ungar, "Static and dynamic analysis of the Internet's susceptibility to faults and attacks," in Proc. of IEEE INFOCOM 2003, vol. 3, April 2003, pp. 2144-2154.

[18] M. Faloutsos, P. Faloutsos, and C. Faloutsos, "On power-law relationships of the Internet topology," Comput. Commun. Rev., vol. 29, pp. 251-262, 1999.

[19] S. Maslov, K. Sneppen, and A. Zaliznyak, "Detection of topological patterns in complex networks: correlation profile of the Internet," Physica A, vol. 333, p. 529, 2004.

[20] M. E. J. Newman, "Mixing patterns in networks," Phys. Rev. E, vol. 67, no. 026126, 2003.

[21] S. Zhou and R. J. Mondragón, "The rich-club phenomenon in the Internet topology," IEEE Comm. Lett., vol. 8, no. 3, pp. 180-182, March 2004.

[22] G. Caldarelli, R. Pastor-Satorras, and A. Vespignani, "Structure of cycles and local ordering in complex networks," The European Physical Journal B, vol. 28, no. 2, pp. 183-186, 2004.

[23] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, 1998.

[24] C. Labovitz, A. Ahuja, R. Wattenhofer, and S. Venkatachary, "The impact of Internet policy and topology on delayed routing convergence," in Proc. of INFOCOM 2001, 2001.