Towards a Theory of Scale-Free Graphs: Definition, Properties, and Implications (Extended Version)∗

arXiv:cond-mat/0501169v2 [cond-mat.dis-nn] 18 Oct 2005

Technical Report CIT-CDS-04-006, Engineering & Applied Sciences Division

California Institute of Technology, Pasadena, CA, USA

Lun Li†, David Alderson∗, Reiko Tanaka‡, John C. Doyle∗, Walter Willinger§

Updated: October 2005

Abstract

Although the “scale-free” literature is large and growing, it gives neither a precise definition of scale-free graphs nor rigorous proofs of many of their claimed properties. In fact, it is easily shown that the existing theory has many inherent contradictions and verifiably false claims. In this paper, we propose a new, mathematically precise, and structural definition of the extent to which a graph is scale-free, and prove a series of results that recover many of the claimed properties while suggesting the potential for a rich and interesting theory. With this definition, scale-free (or its opposite, scale-rich) is closely related to other structural graph properties such as various notions of self-similarity (or respectively, self-dissimilarity). Scale-free graphs are also shown to be the likely outcome of random construction processes, consistent with the heuristic definitions implicit in existing random graph approaches. Our approach clarifies much of the confusion surrounding the sensational qualitative claims in the scale-free literature, and offers rigorous and quantitative alternatives.

1 Introduction

One of the most popular topics recently within the interdisciplinary study of complex networks has been the investigation of so-called “scale-free” graphs. Originally introduced by Barabási and Albert [15], scale-free (SF) graphs have been proposed as generic, yet universal models of network topologies that exhibit power law distributions in the connectivity of network nodes. As a result of the apparent ubiquity of such distributions across many naturally occurring and man-made systems, SF graphs have been suggested as representative models of complex systems ranging from the social sciences (collaboration graphs of movie actors or scientific co-authors) to molecular biology (cellular metabolism and genetic regulatory networks) to the Internet (Web graphs, router-level graphs, and AS-level graphs). Because these models exhibit features not easily captured by traditional Erdős-Rényi random graphs [43], it has been suggested that the discovery, analysis, and application of SF graphs may even represent a “new science of networks” [14, 40].

As pointed out in [24, 25] and discussed in [48], despite the popularity of the SF network paradigm in the complex systems literature, the definition of “scale-free” in the context of network graph models has never been made precise, and the results on SF graphs are largely heuristic and experimental studies with “rather little rigorous mathematical work; what there is sometimes confirms and sometimes contradicts the heuristic results” [24]. Specific usage of “scale-free” to describe graphs can be traced to the observation in Barabási and Albert [15] that “a common property of many large networks is that the vertex connectivities follow a scale-free power-law distribution.” However, most of the SF literature [4, 5, 6, 15, 16, 17, 18] identifies a rich variety of additional (e.g. topological) signatures beyond mere power law degree distributions in corresponding models of large networks. One such feature has been the role of evolutionary growth or rewiring processes in the construction of graphs. Preferential attachment is the mechanism most often associated with these models, although it is only one of several mechanisms that can produce graphs with power law degree distributions.

Another prominent feature of SF graphs in this literature is the role of highly connected “hubs.” Power law degree distributions alone imply that some nodes in the tail of the power law must have high degree, but “hubs” imply something more and are often said to “hold the network together.” The presence of a hub-like network core yields a “robust yet fragile” connectivity structure that has become a hallmark of SF network models. Of particular interest here is that a study of SF models of the Internet’s router topology is reported to show that “the removal of just a few key hubs from the Internet splintered the system into tiny groups of hopelessly isolated routers” [17]. Thus, apparently due to their hub-like core structure, SF networks are said to be simultaneously robust to the random loss of nodes (i.e. “error tolerance”) since these tend to miss hubs, but fragile to targeted worst-case attacks (i.e. “attack vulnerability”) [6] on hubs. This latter property has been termed the “Achilles’ heel” of SF networks, and it has featured prominently in discussions about the robustness of many complex networks. Albert et al. [6] even claim to “demonstrate that error tolerance... is displayed only by a class of inhomogeneously wired networks, called scale-free networks” (emphasis added). We will use the qualifier “SF hubs” to describe high degree nodes which are so located as to provide these “robust yet fragile” features described in the SF literature, and a goal of this paper is to clarify more precisely what topological features of graphs are involved.

∗ The primary version of this paper is forthcoming from Internet Mathematics, 2005.
† Engineering & Applied Science Division, California Institute of Technology, Pasadena, USA
‡ RIKEN, Bio-Mimetic Control Research Center, Nagoya, Japan
§ AT&T Labs–Research, Florham Park, NJ, USA

There are a number of properties in addition to power law degree distributions, random generation, and SF hubs that are associated with SF graphs, but unfortunately, it is rarely made clear in the SF literature which of these features define SF graphs and which features are then consequences of this definition. This has led to significant confusion about the defining features or characteristics of SF graphs and the applicability of these models to real systems. While the usage of “scale-free” in the context of graphs has been imprecise, there is nevertheless a large literature on SF graphs, particularly in the highest impact general science journals. For purposes of clarity in this paper, we will use the term SF graphs (or equivalently, SF networks) to mean those objects as studied and discussed in this “SF literature,” and accept that this inherits from that literature an imprecision as to what exactly SF means. One aim of this paper is to capture as much as possible of the “spirit” of SF graphs by proving their most widely claimed properties using a minimal set of axioms. Another is to reconcile these theoretical properties with the properties of real networks, and in particular the router-level graphs of the Internet.

Recent research into the structure of several important complex networks previously claimed to be “scale-free” has revealed that, even if their graphs could have approximately power law degree distributions, the networks in question do not have SF hubs, that the most highly connected nodes do not necessarily represent an “Achilles’ heel”, and that their most essential “robust, yet fragile” features actually come from aspects that are only indirectly related to graph connectivity. In particular, recent work in the development of a first-principles approach to modeling the router-level Internet has shown that the core of that network is constructed from a mesh of high-bandwidth, low-connectivity routers and that this design results from tradeoffs in technological, economic, and performance constraints on the part of Internet Service Providers (ISPs) [65, 41]. A related line of research into the structure of biological metabolic networks has shown that claims of SF structure fail to capture the most essential biochemical as well as “robust yet fragile” features of cellular metabolism and in many cases completely misinterpret the relevant biology [102, 103]. This mounting evidence against the heart of the SF story creates a dilemma in how to reconcile the claims of this broad and popular framework with the details of specific application domains (see also the discussion in [48]). In particular, it is now clear that either the Internet and biology networks are very far from “scale free”, or worse, the claimed properties of SF networks are simply false at a more basic mathematical level, independent of any purported applications.

The main purpose of this paper is to demonstrate that when properly defined, “scale-free networks” have the potential for a rigorous, interesting, and rich mathematical theory. Our presentation assumes an understanding of fundamental Internet technology as well as comfort with a theorem-proof style of exposition, but not necessarily any familiarity with existing SF literature. While we leave many open questions and conjectures supported only by numerical experiments, examples, and heuristics, our approach reconciles the existing contradictions and recovers many claims regarding the graph theoretic properties of SF networks. A main contribution of this paper is the introduction of a structural metric that allows us to differentiate between all simple, connected graphs having an identical degree sequence, particularly when that sequence follows a power law. Our approach is to leverage related definitions from other disciplines, where available, and utilize existing methods and approaches from graph theory and statistics. While the proposed structural metric is not intended as a general measure of all graphs, we demonstrate that it yields considerable insight into the claimed properties of SF graphs and may even provide a view into the extent to which a graph is scale-free. Such a view has the benefit of being minimal, in the sense that it relies on few starting assumptions, yet yields a rich and general description of the features of SF networks. While far from complete, our results are consistent with the main thrust of the SF literature and demonstrate that a rigorous and interesting “scale-free theory” can be developed, with very general and robust features resulting from relatively weak assumptions. In the process, we resolve some of the misconceptions that exist in the general SF literature and point out some of the deficiencies associated with previous applications of SF models, particularly to technological and biological systems.

The remainder of this article is organized as follows. Section 2 provides the basic background material, including mathematical definitions for scaling and power law degree sequences, a discussion of related work on scaling that dates back as far as 1925, and various additional work on self-similarity in graphs. We also emphasize here why high variability is a much more important concept than scaling or power laws per se. Section 3 briefly reviews the recent literature on SF networks, including the failure of SF methods in Internet applications. In Section 4, we introduce a metric for graphs having a power-law in their degree sequence, one that highlights the diversity of such graphs and also provides insight into existing notions of graph structure such as self-similarity/self-dissimilarity, motifs, and degree-preserving rewiring. Our metric is “structural” (in the sense that it depends only on the connectivity of a given graph and not the process by which the graph is constructed) and can be applied to any graph of interest. Then, Section 5 connects these structural features with the probabilistic perspective common in statistical physics and traditional random graph theory, with particular connections to graph likelihood, degree correlation, and assortative/disassortative mixing. Section 6 then traces the shortcomings of the existing SF theory and uses our alternate approach to outline what sort of potential foundation for a broader and more rigorous SF theory may be built from mathematically solid definitions. We also put the ensuing SF theory in a broader perspective by comparing it with recently developed alternative models for the Internet based on the notion of Highly Optimized Tolerance (HOT) [29]. To demonstrate that the Internet application considered in this paper is representative of a broader debate about complex systems, we discuss in Section 7 another application area that is very popular within the existing SF literature, namely biology, and illustrate that there exists a largely parallel SF vs. HOT story as well. We conclude in Section 8 that many open problems remain, including theoretical conjectures and the potential relevance of rigorous SF models to applications other than technology.

2 Background

This section provides the necessary background for our investigation of what it means for a graph to be “scale-free”. In particular, we present some basic definitions and results in random variables, comment on approaches to the statistical analysis of high variability data, and review notions of scale-free and self-similarity as they have appeared in related domains.

While the advanced reader will find much of this section elementary in nature, our experience is that much of the confusion on the topic of SF graphs stems from fundamental differences in the methodological perspectives between statistical physics and that of mathematics or engineering. The intent here is to provide material that helps to bridge this potential gap in addition to setting the stage from which our results will follow.

2.1 Power Law and Scaling Behavior

2.1.1 Non-stochastic vs. Stochastic Definitions

A finite sequence $y = (y_1, y_2, \ldots, y_n)$ of real numbers, assumed without loss of generality always to be ordered such that $y_1 \ge y_2 \ge \ldots \ge y_n$, is said to follow a power law or scaling relationship if

$$k = c\,y_k^{-\alpha}, \tag{1}$$

where $k$ is (by definition) the rank of $y_k$, $c$ is a fixed constant, and $\alpha$ is called the scaling index. Since $\log k = \log(c) - \alpha \log(y_k)$, the relationship for the rank $k$ versus $y$ appears as a line of slope $-\alpha$ when plotted on a log-log scale. In this manuscript, we refer to the relationship (1) as the size-rank (or cumulative) form of scaling. While the definition of scaling in (1) is fundamental to the exposition of this paper, a more common usage of power laws and scaling occurs in the context of random variables and their distributions. That is, assuming an underlying probability model $P$ for a non-negative random variable $X$, let $F(x) = P[X \le x]$ for $x \ge 0$ denote the (cumulative) distribution function (CDF) of $X$, and let $\bar{F}(x) = 1 - F(x)$ denote the complementary CDF (CCDF). A typical feature of commonly-used distribution functions is that the (right) tails of their CCDFs decrease exponentially fast, implying that all moments exist and are finite. In practice, this property ensures that any realization $(x_1, x_2, \ldots, x_n)$ from an independent sample $(X_1, X_2, \ldots, X_n)$ of size $n$ having the common distribution function $F$ concentrates tightly around its (sample) mean, thus exhibiting low variability as measured, for example, in terms of the (sample) standard deviation.
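As a minimal sketch of the size-rank relationship (1), the following Python fragment builds a sequence that satisfies $k = c\,y_k^{-\alpha}$ exactly and recovers the slope $-\alpha$ from a least-squares fit of $\log k$ against $\log y_k$. The function names and the parameter values ($n = 1000$, $c = 1000$, $\alpha = 1.5$) are illustrative assumptions, not taken from the paper.

```python
import math

def scaling_sequence(n, c, alpha):
    """Return y_1 >= ... >= y_n with rank k = c * y_k**(-alpha),
    i.e. y_k = (c / k)**(1 / alpha). Hypothetical helper for illustration."""
    return [(c / k) ** (1.0 / alpha) for k in range(1, n + 1)]

def loglog_slope(y):
    """Least-squares slope of log(rank) against log(size) for a
    sequence already sorted in decreasing order."""
    xs = [math.log(v) for v in y]                    # log sizes
    ks = [math.log(k) for k in range(1, len(y) + 1)]  # log ranks
    mx = sum(xs) / len(xs)
    mk = sum(ks) / len(ks)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxk = sum((x - mx) * (k - mk) for x, k in zip(xs, ks))
    return sxk / sxx

y = scaling_sequence(n=1000, c=1000.0, alpha=1.5)
slope = loglog_slope(y)
print(f"fitted slope = {slope:.6f}")  # close to -alpha
```

Because the sequence is constructed to obey (1) exactly, the fit returns $-\alpha$ up to floating-point error; with measured data the same plot would only be approximately linear.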

In this stochastic context, a random variable $X$ or its corresponding distribution function $F$ is said to follow a power law or is scaling with index $\alpha > 0$ if, as $x \to \infty$,

$$P[X > x] = 1 - F(x) \approx c\,x^{-\alpha}, \tag{2}$$

for some constant $0 < c < \infty$ and a tail index $\alpha > 0$. Here, we write $f(x) \approx g(x)$ as $x \to \infty$ if $f(x)/g(x) \to 1$ as $x \to \infty$. For $1 < \alpha < 2$, $F$ has infinite variance but finite mean, and for $0 < \alpha \le 1$, $F$ has not only infinite variance but also infinite mean. In general, all moments of $F$ of order $\beta \ge \alpha$ are infinite. Since relationship (2) implies $\log(P[X > x]) \approx \log(c) - \alpha \log(x)$, doubly logarithmic plots of $x$ versus $1 - F(x)$ yield straight lines of slope $-\alpha$, at least for large $x$. Well-known examples of power law distributions include the Pareto distributions of the first and second kind [57]. In contrast, exponential distributions (i.e., $P[X > x] = e^{-\lambda x}$) result in approximately straight lines on semi-logarithmic plots.
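The statement about infinite moments can be checked by a short calculation. As an illustrative assumption, take the exact Pareto tail $P[X > x] = (x/x_m)^{-\alpha}$ for $x \ge x_m > 0$, which is the special case of (2) with $c = x_m^{\alpha}$; the cutoff $x_m$ is a symbol introduced only for this sketch. Its density is $\alpha x_m^{\alpha} x^{-(1+\alpha)}$, so that

```latex
\begin{aligned}
E[X^{\beta}]
  &= \int_{x_m}^{\infty} x^{\beta}\,\alpha x_m^{\alpha}\,x^{-(1+\alpha)}\,dx
   = \alpha x_m^{\alpha} \int_{x_m}^{\infty} x^{\beta-\alpha-1}\,dx \\
  &= \begin{cases}
       \dfrac{\alpha}{\alpha-\beta}\,x_m^{\beta}, & \beta < \alpha, \\[6pt]
       \infty, & \beta \ge \alpha .
     \end{cases}
\end{aligned}
```

Setting $\beta = 1$ shows that the mean is finite only for $\alpha > 1$, and setting $\beta = 2$ shows that the second moment (and hence the variance) is finite only for $\alpha > 2$.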

If the derivative of the cumulative distribution function $F(x)$ exists, then $f(x) = \frac{d}{dx}F(x)$ is called the (probability) density function of $X$ and implies that the stochastic cumulative form of scaling or size-rank relationship (2) has an equivalent noncumulative or size-frequency counterpart given by

$$f(x) \approx c\,x^{-(1+\alpha)} \tag{3}$$

(where the constant $c$ differs from the one in (2) by the factor $\alpha$), which appears similarly as a line of slope $-(1 + \alpha)$ on a log-log scale. However, as discussed in more detail in Section 2.1.3 below, the use of this noncumulative form of scaling has been a source of many common mistakes in the analysis and interpretation of actual data and should generally be avoided.

Power-law distributions are called scaling distributions because the sole response to conditioning is a change in scale; that is, if the random variable $X$ satisfies relationship (2) and $x > w$, then the conditional distribution of $X$ given that $X > w$ is given by

$$P[X > x \mid X > w] = \frac{P[X > x]}{P[X > w]} \approx c_1 x^{-\alpha},$$

where the constant $c_1$ is independent of $x$ and is given by $c_1 = 1/w^{-\alpha}$. Thus, at least for large values of $x$, $P[X > x \mid X > w]$ is identical to the (unconditional) distribution $P[X > x]$, except for a change in scale. In contrast, the exponential distribution gives

$$P[X > x \mid X > w] = e^{-\lambda(x-w)},$$

that is, the conditional distribution is also identical to the (unconditional) distribution, except for a change of location rather than scale. Thus we prefer the term scaling to power law, but will use them interchangeably, as is common.
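The contrast between a change of scale and a change of location can be checked numerically. The sketch below assumes an exact Pareto tail with lower cutoff 1 and an exponential tail with rate `lam`; the function names and the values of `alpha`, `lam`, `w`, and `x` are hypothetical choices made only for illustration.

```python
import math

def pareto_ccdf(x, alpha, x_min=1.0):
    """P[X > x] for an exact Pareto tail; x_min is an assumed lower cutoff."""
    return (x / x_min) ** (-alpha) if x >= x_min else 1.0

def exp_ccdf(x, lam):
    """P[X > x] for an exponential distribution with rate lam."""
    return math.exp(-lam * x) if x >= 0 else 1.0

def conditional_ccdf(ccdf, x, w):
    """P[X > x | X > w] for x > w, computed as P[X > x] / P[X > w]."""
    return ccdf(x) / ccdf(w)

alpha, lam, w, x = 1.5, 0.2, 10.0, 40.0

# Scaling: conditioning on X > w rescales the argument (x -> x / w).
scaling_cond = conditional_ccdf(lambda t: pareto_ccdf(t, alpha), x, w)
scaling_rescaled = pareto_ccdf(x / w, alpha)

# Exponential: conditioning on X > w shifts the argument (x -> x - w).
exp_cond = conditional_ccdf(lambda t: exp_ccdf(t, lam), x, w)
exp_shifted = exp_ccdf(x - w, lam)

print(scaling_cond, scaling_rescaled)
print(exp_cond, exp_shifted)
```

In each printed pair the two numbers agree up to rounding, and the scaling case also matches $c_1 x^{-\alpha}$ with $c_1 = w^{\alpha}$.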

It is important to emphasize again the differences between these alternative definitions of scaling. Relationship (1) is non-stochastic, in the sense that there is no assumption of an underlying probability space or distribution for the sequence $y$, and in what follows we will always use the term sequence to refer to such a non-stochastic object $y$, and accordingly we will use non-stochastic to mean simply the absence of an underlying probability model. In contrast, the definitions in (2) and (3) are stochastic and require an underlying probability model. Accordingly, when referring to a random variable $X$ we will explicitly mean an ensemble of values or realizations sampled from a common distribution function $F$, as is common usage. We will often use the standard and trivial method of viewing a nonstochastic model as a stochastic one with a singular distribution.

These distinctions between stochastic and nonstochastic models will be important in this paper. Our approach allows for but does not require stochastics. In contrast, the SF literature almost exclusively assumes some underlying stochastic models, so we will focus some attention on stochastic assumptions. Exclusive focus on stochastic models is standard in statistical physics, even to the extent that the possibility of non-stochastic constructions and explanations is largely ignored. This seems to be the main motivation for viewing the Internet’s router topology as a member of an ensemble of random networks, rather than an engineering system driven by economic and technological constraints plus some randomness, which might otherwise seem more natural. Indeed, in the SF literature “random” is typically used more narrowly than stochastic to mean, depending on the context, exponentially, Poisson, or uniformly distributed. Thus phrases like “scale-free versus random” (the ambiguity in “scale-free” notwithstanding) are closer in meaning to “scaling versus exponential,” rather than “non-stochastic versus stochastic.”


2.1.2 Scaling and High Variability

An important feature of sequences that follow the scaling relationship (1) is that they exhibit high variability, in the sense that deviations from the average value or (sample) mean can vary by orders of magnitude, making the average largely uninformative and not representative of the bulk of the values. To quantify the notion of variability, we use the standard measure of (sample) coefficient of variation, which for a given sequence $y = (y_1, y_2, \ldots, y_n)$ is defined as

$$CV(y) = STD(y)/\bar{y}, \tag{4}$$

where $\bar{y} = n^{-1}\sum_{k=1}^{n} y_k$ is the average size or (sample) mean of $y$ and $STD(y) = \left(\sum_{k=1}^{n}(y_k - \bar{y})^2/(n-1)\right)^{1/2}$ is the (sample) standard deviation, a commonly-used metric for measuring the deviations of $y$ from its average $\bar{y}$. The presence of high variability in a sequence of values often contrasts greatly with the typical experience of many scientists who work with empirical data exhibiting low variability, that is, observations that tend to concentrate tightly around the (sample) mean and allow for only small to moderate deviations from this mean value.
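Definition (4) translates directly into a few lines of Python. The sketch below uses the standard library's `statistics.stdev`, which applies the same $(n-1)$ divisor as $STD(y)$; the two toy sequences are invented for illustration and are not data from the paper.

```python
import statistics

def sample_cv(y):
    """Sample coefficient of variation CV(y) = STD(y) / mean(y),
    with the (n - 1) divisor in the standard deviation."""
    return statistics.stdev(y) / statistics.mean(y)

# Toy sequences (assumed for illustration only).
low_variability = [98, 99, 100, 101, 102]
high_variability = [1, 1, 1, 1, 1000]

print(sample_cv(low_variability))   # values cluster around the mean
print(sample_cv(high_variability))  # one value dominates the mean
```

In the second sequence the mean (about 200) describes none of the five values well, which is the sense in which the average becomes uninformative under high variability.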

A standard ensemble-based measure for quantifying the variability inherent in a random variable $X$ is the (ensemble) coefficient of variation $CV(X)$ defined as

$$CV(X) = \sqrt{Var(X)}/E(X), \tag{5}$$

where $E(X)$ and $Var(X)$ are the (ensemble) mean and (ensemble) variance of $X$, respectively. If $x = (x_1, x_2, \ldots, x_n)$ is a realization of an independent and identically distributed (iid) sample of size $n$ taken from the common distribution $F$ of $X$, it is easy to see that the quantity $CV(x)$ defined in (4) is simply an estimate of $CV(X)$. In particular, if $X$ is scaling with $\alpha < 2$, then $CV(X) = \infty$, and estimates $CV(x)$ of $CV(X)$ diverge for large sample sizes. Thus, random variables having a scaling distribution are extreme in exhibiting high variability. However, scaling distributions are only a subset of a larger family of heavy-tailed distributions (see [111] and references therein) that exhibit high variability. As we will show, it turns out that some of the most celebrated claims in the SF literature (e.g. the presence of highly connected central hubs) have as a necessary condition only the presence of high variability and not necessarily strict scaling per se. The consequences of this observation are far-reaching, especially because it shifts the focus from scaling relationships, their tail indices, and their generating mechanisms to an emphasis on heavy-tailed distributions and identifying the main sources of “high variability.”
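The divergence of $CV(x)$ for $\alpha < 2$ can be illustrated without random numbers. The sketch below replaces an iid sample by evenly spaced quantiles of $P[X > x] = x^{-\alpha}$, $x \ge 1$; this deterministic stand-in, the helper names, and the choice $\alpha = 1.5$ are assumptions made for illustration, so the numbers only indicate the trend one would expect from genuine samples.

```python
import math

def idealized_scaling_sample(n, alpha):
    """Evenly spaced quantiles of P[X > x] = x**(-alpha), x >= 1.
    Deterministic stand-in for an iid sample (an assumption of this sketch)."""
    # Inverse CCDF evaluated at probabilities (j - 0.5) / n, j = 1..n.
    return [((j - 0.5) / n) ** (-1.0 / alpha) for j in range(1, n + 1)]

def sample_cv(x):
    """Sample coefficient of variation with the (n - 1) divisor, as in (4)."""
    n = len(x)
    mean = math.fsum(x) / n
    var = math.fsum((v - mean) ** 2 for v in x) / (n - 1)
    return math.sqrt(var) / mean

alpha = 1.5  # finite mean, infinite variance
cvs = {n: sample_cv(idealized_scaling_sample(n, alpha))
       for n in (100, 1_000, 10_000, 100_000)}
for n, cv in cvs.items():
    print(n, round(cv, 2))
```

The printed estimates keep growing with $n$ instead of settling down, because the sample mean stays bounded while the sample variance grows without limit.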

2.1.3 Cumulative vs. Noncumulative log-log Plots

While in principle there exists an unambiguous mathematical equivalence between distribution functions and their densities, as in (2) and (3), no such relationship can be assumed to hold in general when plotting sequences of real or integer numbers or measured data cumulatively and noncumulatively. Furthermore, there are good practical reasons to avoid noncumulative or size-frequency plots altogether (a sentiment echoed in [75]), even though they are often used exclusively in some communities. To illustrate the basic problem, we first consider two sequences, $y^s$ and $y^e$, each of length 1000, where $y^s = (y^s_1, \ldots, y^s_{1000})$ is constructed so that its values all fall on a straight line when plotted on doubly logarithmic (i.e., log-log) scale. Similarly, the values of the sequence $y^e = (y^e_1, \ldots, y^e_{1000})$ are generated to fall on a straight line when plotted on semi-logarithmic (i.e., log-linear) scale. The MATLAB code for generating these two sequences is available for electronic download [69]. When ranking the values in each sequence in decreasing order, we obtain the following unique largest (smallest) values, with their corresponding frequencies of occurrence given in parentheses:

$y^s$ = 10000(1), 6299(1), 4807(1), 3968(1), 3419(1), . . . , 130(77), 121(77), 113(81), 106(84), 100(84),

$y^e$ = 1000(1), 903(1), 847(1), 806(1), 775(1), . . . , 96(39), 87(43), 76(56), 61(83), 33(180),

and the full sequences are plotted in Figure 1. In particular, the doubly logarithmic plot in Figure 1(a) shows the cumulative or size-rank relationships associated with the sequences $y^s$ and $y^e$: the largest value of $y^s$ (i.e., 10,000) is plotted on the x-axis and has rank 1 (y-axis), the second largest value of $y^s$ is 6,299 and has rank 2, all the way to the end, where the smallest value of $y^s$ (i.e., 100) is plotted on the x-axis and has rank 1000 (y-axis). Similarly for $y^e$. In full agreement with the underlying generation mechanisms, plotting on doubly logarithmic scale the rank-ordered sequence of $y^s$ versus rank $k$ results in a straight line; i.e., $y^s$ is scaling (to within integer tolerances). The same plot for the rank-ordered sequence of $y^e$ has a pronounced concave shape and decreases rapidly for large ranks, which is strong evidence for an exponential size-rank relationship. Indeed, as shown in Figure 1(b), plotting on semi-logarithmic scale the rank-ordered sequence of $y^e$ versus rank $k$ yields a straight line; i.e., $y^e$ is exponential (to within integer tolerances). The same plot for $y^s$ shows a pronounced convex shape and decreases very slowly for large rank values, fully consistent with a scaling size-rank relationship. Various metrics for these two sequences are

                    y^e      y^s
(sample) mean       167      267
(sample) median     127      153
(sample) STD        140      504
(sample) CV         .84      1.89

and all are consistent with exponential and scaling sequences of this size.

To highlight the basic problem caused by the use of noncumulative or size-frequency relationships, consider Figure 1(c) and (d) that show on doubly logarithmic scale and semi-logarithmic scale, respectively, the non-cumulative or size-frequency plots associated with the sequences $y^s$ and $y^e$: the largest value of $y^s$ is plotted on the x-axis and has frequency 1 (y-axis), the second largest value of $y^s$ has also frequency 1, etc., until the end where the smallest value of $y^s$ happens to occur 84 times (to within integer tolerances). Similarly for $y^e$, where the smallest value happens to occur 180 times. It is common to conclude incorrectly from plots such as these, for example, that the sequence $y^e$ is scaling (i.e., plotting on doubly logarithmic scale size vs. frequency results in an approximate straight line) and the sequence $y^s$ is exponential (i.e., plotting on semi-logarithmic scale size vs. frequency results in an approximate straight line), exactly the opposite of what is correctly inferred about the sequences using the cumulative or size-rank plots in Figure 1(a) and (b).
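The two kinds of plot are built from different summaries of the same sequence. The sketch below computes both from a list of integers; the helper names and the toy data are assumptions for illustration (the values echo a few of those listed for $y^s$, but this is not the paper's sequence).

```python
from collections import Counter

def size_rank(values):
    """(size, rank) pairs: rank 1 is the largest value; ties receive
    consecutive ranks, so every observation appears exactly once."""
    ordered = sorted(values, reverse=True)
    return [(v, k) for k, v in enumerate(ordered, start=1)]

def size_frequency(values):
    """(size, count) pairs for each distinct value, largest size first."""
    counts = Counter(values)
    return sorted(counts.items(), reverse=True)

# Toy data, not the paper's sequences.
data = [100, 100, 100, 113, 113, 130, 4807, 6299, 10000]

print(size_rank(data)[:3])   # [(10000, 1), (6299, 2), (4807, 3)]
print(size_frequency(data))  # [(10000, 1), (6299, 1), (4807, 1), (130, 1), (113, 2), (100, 3)]
```

The size-rank summary keeps every observation, whereas the size-frequency summary collapses the data to distinct values; in the upper tail every count equals 1, so the frequency axis carries no information there.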

In contrast to the size-rank plots of the style in Figure 1(a)-(b) that depict the raw data itself and are unambiguous, the use of size-frequency plots as in Figure 1(c)-(d), while straightforward to describe low variable data, creates ambiguities and can easily lead to mistakes when applied to high variability data. First, for high precision measurements it is possible that each data value appears only once in a sample set, making raw frequency-based data rather uninformative. To overcome this problem, a typical approach is to group individual observations into one of a small number of bins and then plot for each bin (x-axis) the relative number of observations in that bin (y-axis). The problem is that choosing the size and boundary values for each bin is a process generally left up to the experimentalist, and this binning process can dramatically change the nature of the resulting size-frequency plots as well as their interpretation (for a concrete example, see Figure 10 in Section 6.1).

[Figure 1, panels (a)-(d): x-axis $y_k$; y-axis rank $k$ in (a)-(b) and frequency in (c)-(d).]

Figure 1: PLOTS OF EXPONENTIAL $y^e$ (BLACK CIRCLES) AND SCALING $y^s$ (BLUE SQUARES) SEQUENCES. (a) Doubly logarithmic size-rank plot: $y^s$ is scaling (to within integer tolerances) and thus $y^s_k$ versus $k$ is approximately a straight line. (b) Semi-logarithmic size-rank plot: $y^e$ is exponential (to within integer tolerances) and thus $y^e_k$ versus $k$ is approximately a straight line on semi-logarithmic plots. (c) Doubly logarithmic size-frequency plot: $y^e$ is exponential but appears incorrectly to be scaling. (d) Semi-logarithmic size-frequency plot: $y^s$ is scaling but appears incorrectly to be exponential.

These examples have been artificially constructed specifically to dramatize the effects associated with the use of cumulative or size-rank vs. noncumulative or size-frequency plots for assessing the presence or absence of scaling in a given sequence of observed values. While they may appear contrived, errors such as those illustrated in Figure 1 are easy to make and are widespread in the complex systems literature. In fact, determining whether a realization of a sample of size $n$ generated from one and the same (unknown) underlying distribution is consistent with a scaling distribution and then estimating the corresponding tail index $\alpha$ from the corresponding size-frequency plots of the data is even more unreliable. Even under the most idealized circumstances using synthetically generated pseudo-random data, size-frequency plots can mislead as shown in the following easily reproduced numerical experiments. Suppose that 1000 (or more) integer values are generated by pseudo-random independent samples from the distribution $F(x) = 1 - x^{-1}$ ($P(X \ge x) = x^{-1}$) for $x \ge 1$. For example, this can be done with the MATLAB fragment x=floor(1./rand(1,1000)) where rand(1,1000) generates a vector of 1000 uniformly distributed floating point numbers between 0 and 1, and floor rounds down to the next lowest integer. In this case, discrete equivalents to equations (2) and (3) exist, and for $x \gg 1$, the density function $f(x) = P[X = x]$ is given by

$$\begin{aligned} P[X = x] &= P[X \ge x] - P[X \ge x + 1] \\ &= x^{-1} - (x + 1)^{-1} \\ &\approx x^{-2}. \end{aligned}$$

Thus it might appear that the true tail index (i.e., $\alpha = 1$) could be inferred from examining either the size-frequency or size-rank plots, but as illustrated in Figure 2 and described in the caption, this is not the case.
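For readers without MATLAB, the experiment can be sketched in Python. The sampler below is an assumed analogue of x=floor(1./rand(1,1000)), not the authors' code; the seed, the function names, and the use of `1 - random()` (to keep the uniform draw strictly positive) are choices made for this illustration. The exact probabilities are computed with `fractions.Fraction` so that the approximation $P[X = x] \approx x^{-2}$ can be compared against the exact value $1/(x(x+1))$.

```python
import math
import random
from fractions import Fraction

def sample_discrete_scaling(n, seed=0):
    """Assumed Python analogue of the MATLAB fragment floor(1./rand(1,n))."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], which avoids division by zero.
    return [math.floor(1.0 / (1.0 - rng.random())) for _ in range(n)]

def pmf(x):
    """Exact P[X = x] = P[X >= x] - P[X >= x + 1] = 1/x - 1/(x + 1)
    for integer x >= 1."""
    return Fraction(1, x) - Fraction(1, x + 1)

x = sample_discrete_scaling(1000)

# Compare the exact probability with the approximation x**(-2).
for k in (1, 10, 100, 1000):
    print(k, float(pmf(k)), k ** -2.0)
```

The approximation is poor at $x = 1$ (exactly 1/2 versus 1) and improves quickly for larger $x$, which is why the relation is stated only for $x \gg 1$.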

Though there are more rigorous and reliable methods for estimating α (see for example [85]), the (cumulative) size-rank plots have significant advantages in that they show the raw data directly, and possible ambiguities in the raw data notwithstanding, they are also highly robust to a range of measurement errors and noise. Moreover, experienced readers can judge at a glance whether a scaling model is plausible, and if so, what a reasonable estimate of the unknown scaling parameter α should be. For example, that the scatter in the data in Figure 2(a) is consistent with a sample from P(X ≥ x) = x^{-1} can be roughly determined by visual inspection, although additional statistical tests could be used to establish this more rigorously. At the same time, even when the underlying random variable X is scaling, size-frequency plots systematically underestimate α, and worse, have a tendency to suggest that scaling exists where it does not. This is illustrated dramatically in Figure 2(b)-(c), where exponentially distributed samples are generated using floor(10*(1-log(rand(1,n)))). The size-rank plot in Figure 2(b) is approximately a straight line on a semilog plot, consistent with an exponential distribution. The loglog size-frequency plot in Figure 2(c), however, could be used incorrectly to claim that the data are consistent with a scaling distribution, a surprisingly common error in the SF and broader complex systems literature. Thus even if one a priori assumes a probabilistic framework, (cumulative) size-rank plots are essential for reliably inferring and subsequently studying high variability, and they therefore are used exclusively in this paper.

[Figure 2, panels (a)-(c): size-rank and size-frequency plots, with rank or frequency plotted against degree. Annotations: α = .67 and α = 1 in (a); 1 + α = 2.5 in (c).]

Figure 2: A COMMON ERROR WHEN INFERRING/ESTIMATING SCALING BEHAVIOR. (a) 1000 integer data points sampled from the scaling distribution P(X ≥ x) = x^{-1}, for x ≥ 1. The lower size-frequency plot (blue circles) tends to underestimate the scaling index α; it supports a slope estimate of about -1.67 (red dashed line), implying an α-estimate of about 0.67 that is obviously inconsistent with the true value of α = 1 (green line). The size-rank plot of the exact same data (upper, black dots) clearly supports a scaling behavior and yields an α-estimate that is fully consistent with the true scaling index α = 1 (green line). (b) 1000 data points sampled from an exponential distribution plotted on log-linear scale. The size-rank plot clearly shows that the data are exponential and that scaling is implausible. (c) The same data as in (b) plotted on log-log scale. Based on the size-frequency plot, it is plausible to infer incorrectly that the data are consistent with scaling behavior, with a slope estimate of about -2.5, implying an α-estimate of about 1.5.
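
The exponential example of Figure 2(b)-(c) can be explored with a short sketch. The Python code below is a port of the expression floor(10*(1-log(rand(1,n)))); the function names, the fixed seed, and the least-squares helper are assumptions made for illustration and are not taken from the original experiment. It builds the size-rank and size-frequency data and fits a line to each on the axes used in the figure.

```python
import math
import random
from collections import Counter


def exponential_samples(n, seed=0):
    """Port of floor(10*(1-log(rand(1,n)))); the seed is an illustrative addition."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], which avoids log(0).
    return [math.floor(10 * (1 - math.log(1.0 - rng.random()))) for _ in range(n)]


def size_rank(data):
    """Return (x, number of samples >= x) for each distinct value x, largest x first."""
    counts = Counter(data)
    pairs, seen = [], 0
    for x in sorted(counts, reverse=True):
        seen += counts[x]
        pairs.append((x, seen))
    return pairs


def size_frequency(data):
    """Return (x, number of samples == x) for each distinct value x."""
    return sorted(Counter(data).items())


def line_fit(xs, ys):
    """Ordinary least squares; returns (slope, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    data = exponential_samples(1000)
    rank = size_rank(data)
    freq = size_frequency(data)
    # Size-rank on semilog axes: value linear, rank logarithmic.
    print("semilog size-rank fit:",
          line_fit([x for x, _ in rank], [math.log(r) for _, r in rank]))
    # Size-frequency on log-log axes: the plot that invites a spurious scaling claim.
    print("loglog size-frequency fit:",
          line_fit([math.log(x) for x, _ in freq], [math.log(c) for _, c in freq]))
```

Since the samples equal 10 plus ten times a unit exponential (rounded down), the natural logarithm of the rank should fall off with slope close to -0.1 in the semilog fit. Any apparent straight line in the log-log fit is an artifact of the plotting method rather than evidence of scaling.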

2.1.4 Scaling: More “normal” than Normal

While power laws in event size statistics in many complex interconnected systems have recently attracted a great deal of popular attention, some of the aspects of scaling distributions that are crucial and important for mathematicians and engineers have been largely ignored in the larger complex systems literature. This subsection will briefly review one aspect of scaling that is particularly revealing in this regard and is a summary of results described in more detail in [67, 111].

Gaussian distributions are universally viewed as “normal”, mainly due to the well-known Central Limit Theorem (CLT). In particular, the ubiquity of Gaussians is largely attributed to the fact that they are invariant and attractors under aggregation of summands, required only to be independent and identically distributed (iid) and have finite variance [47]. Another convenient aspect of Gaussians is that they are completely specified by mean and variance, and the CLT justifies using these statistics whenever their estimates robustly converge, even when the data could not possibly be Gaussian. For example, much data can only take positive values (e.g. connectivity) or have hard upper bounds but can still be treated as Gaussian. It is understood that this approximation would need refinement if additional statistics or tail behaviors are of interest. Exponential distributions have their own set of invariance properties (e.g. conditional expectation) that make them attractive models in some cases. The ease with which Gaussian data is generated by a variety of mechanisms means that the ability of any particular model to reproduce Gaussian data is not counted as evidence that the model represents or explains other processes that yield empirically observed Gaussian phenomena. However, a disconnect often occurs when data have high variability, that is, when variance or coefficient of variation estimates do not converge. In particular, the above type of reasoning is often misapplied to the explanation of data that are approximately scaling, for reasons that we will discuss below.

Much of science has focused so exclusively on low variability data and Gaussian or exponential models that low variability is not even seen as an assumption. Yet much real world data has extremely high variability as quantified, for example, via the coefficient of variation defined in (5). When exploring stochastic models of high variability data, the most relevant mathematical result is that the CLT has a generalization that relaxes the finite variance (e.g. finite CV) assumption, allows for high variability data arising from underlying infinite variance distributions, and yields stable laws in the limit. There is a rich and extensive theory on stable laws (see for example [89]), which we will not attempt to review; we mention only the most important features. Recall that a random variable U is said to have a stable law (with index 0 < α ≤ 2) if for any n ≥ 2, there is a real number d_n such that

$$U_1 + U_2 + \cdots + U_n \stackrel{d}{=} n^{1/\alpha} U + d_n,$$

where U_1, U_2, ..., U_n are independent copies of U, and where $\stackrel{d}{=}$ denotes equality in distribution. Following [89], the stable laws on the real line can be represented as a four-parameter family S_α(σ, β, µ), with the index α, 0 < α ≤ 2; the scale parameter σ > 0; the skewness parameter β, −1 ≤ β ≤ 1; and the location (shift) parameter µ, −∞ < µ < ∞. When 1 < α < 2, the shift parameter is the mean, but for α ≤ 1, the mean is infinite. There is an abrupt change in tail behavior of stable laws at the boundary α = 2. For α < 2, all stable laws are scaling in the sense that they satisfy condition (2) and thus exhibit infinite variance or high variability; the case α = 2 is special and represents a familiar, non-scaling distribution: the Gaussian (normal) distribution, i.e., S_2(σ, 0, µ) = N(µ, 2σ^2), corresponding to the finite variance or low variability case. While with the exception of Gaussian, Cauchy, and Levy distributions, the distributions of stable random variables are not known in closed form, they are known to be the only fixed points of the renormalization group transformation and thus arise naturally in the limit of properly normalized sums of iid scaling random variables. From an unbiased mathematical view, the most salient features of scaling distributions are this and additional strong invariance properties (e.g. to marginalization, mixtures, maximization), and the ease with which scaling is generated by a variety of mechanisms [67, 111]. Combined with the abundant high variability in real world data, these features suggest that scaling distributions are in a sense more “normal” than Gaussians and that they are convenient and parsimonious models for high variability data in as strong a sense as Gaussians or exponentials are for low variability data.
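
The defining property of a stable law, that a sum of n independent copies has the same distribution as n^{1/α} U + d_n, can be checked numerically in the two closed-form cases named above. The following Python sketch is an illustration only: it assumes the standard Cauchy (α = 1) and standard Gaussian (α = 2) as test cases, for which d_n = 0 by symmetry, and the helper names, sample sizes, and seed are arbitrary choices. It compares empirical quartiles of the rescaled sums with the known quartiles of a single copy (±1 for the standard Cauchy, roughly ±0.6745 for the standard Gaussian).

```python
import math
import random


def cauchy(rng):
    """Standard Cauchy sample (stable with index alpha = 1) via the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))


def gaussian(rng):
    """Standard Gaussian sample (stable with index alpha = 2)."""
    return rng.gauss(0.0, 1.0)


def rescaled_sums(sampler, alpha, n, trials, seed=0):
    """Draw `trials` values of (U_1 + ... + U_n) / n**(1/alpha)."""
    rng = random.Random(seed)
    scale = n ** (1.0 / alpha)
    return [sum(sampler(rng) for _ in range(n)) / scale for _ in range(trials)]


def quartiles(values):
    """Empirical lower and upper quartiles."""
    ordered = sorted(values)
    return ordered[len(ordered) // 4], ordered[(3 * len(ordered)) // 4]


if __name__ == "__main__":
    print("Cauchy, n = 10:  ", quartiles(rescaled_sums(cauchy, 1.0, 10, 20000)))
    print("Gaussian, n = 10:", quartiles(rescaled_sums(gaussian, 2.0, 10, 20000)))
```

Quartiles are used instead of sample variances because the variance estimate does not converge in the Cauchy case, which is precisely the high variability regime discussed above.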

While the ubiquity of scaling is increasingly recognized and even highlighted in the physics and the popular complexity literature [11, 27, 14, 12], the deeper mathematical connections and their rich history in other disciplines have been largely ignored, with serious consequences. Models of complexity using graphs, lattices, cellular automata, and sandpiles preferred in physics and the standard laboratory-scale experiments that inspired these models exhibit scaling only when finely tuned in some way. So even when accepted as ubiquitous, scaling is still treated as arcane and exotic, and “emergence” and “self-organization” are invoked to explain how this tuning might happen [8]. For example, that SF network models supposedly replicate empirically observed scaling node degree relationships that are not easily captured by traditional Erdos-Renyi random graphs [15] is presented as evidence for model validity. But given the strong invariance properties of scaling distributions, as well as the multitude of diverse mechanisms by which scaling can arise in the first place [75], it becomes clear that an ability to generate scaling distributions “explains” little, if anything. Once high variability appears in real data then scaling relationships become a natural outcome of the processes that measure them.

2.2 Scaling, Scale-free and Self-Similarity

Within the physics community it is common to refer to functions of the form (3) as scale-free because they satisfy the following property

f(ax) = g(a)f(x). (6)

As reviewed by Newman [75], the idea is that an increase by a factor a in the scale or units by which one measures x results in no change to the overall density f(x) except for a multiplicative scaling factor. Furthermore, functions consistent with (3) are the only functions that are scale-free in the sense of (6), that is, free of a characteristic scale. This notion of “scale-free” is clear, and could be taken as simply another synonym for scaling and power law, but most actual usages of “scale-free” appear to have a richer notion in mind, and they attribute additional features, such as some underlying self-similar or fractal geometry or topology, beyond just properties of certain scalar random variables.
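
Property (6) is easy to check numerically. The Python sketch below is a small illustration under stated assumptions: a power law c x^{-γ} stands in for the form (3), with arbitrary constants c and γ, and an exponential serves as a counterexample. For the power law the ratio f(ax)/f(x) equals a^{-γ} for every x, so g(a) = a^{-γ}; for the exponential the ratio depends on x, so no such g exists.

```python
import math


def ratio_spread(f, a, xs):
    """Max minus min of f(a*x)/f(x) over xs; zero when f satisfies (6)."""
    ratios = [f(a * x) / f(x) for x in xs]
    return max(ratios) - min(ratios)


def power_law(x, c=3.0, gamma=2.5):
    """Hypothetical power law c * x**(-gamma); c and gamma are arbitrary."""
    return c * x ** (-gamma)


def exponential(x):
    """Exponential decay, which has the characteristic scale 1."""
    return math.exp(-x)


if __name__ == "__main__":
    xs = [1.0, 2.0, 5.0, 10.0, 50.0]
    print("power law spread:  ", ratio_spread(power_law, 2.0, xs))
    print("exponential spread:", ratio_spread(exponential, 2.0, xs))
```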

One of the most widespread and longstanding uses of the term “scale-free” has been in astrophysics to describe the fractal nature of galaxies. Using a probabilistic framework, one approach is to model the distribution of galaxies as a stationary random process and express clustering in terms of correlations in the distributions of galaxies (see the review [45] for an introduction). In 1977, Groth and Peebles [51] proposed that this distribution of galaxies is well described by a power-law correlation function, and this has since been called scale-free in the astrophysics literature. Scale-free here means that the fluctuations in the galaxy density have “non-trivial, scale-free fractal dimension” and thus scale-free is associated with fractals in the spatial layout of the universe.

Perhaps the most influential and revealing notion of “scale-free” comes from the study of critical phase transitions in physics, where the ubiquity of power laws is often interpreted as a “signature” of a universality in behavior as well as in underlying generating mechanisms. An accessible history of the influence of criticality in the SF literature can be found in [14, pp. 73-78]. Here, we will briefly review criticality in the context of percolation, as it illustrates the key issues in a simple and easily visualized way. Percolation problems are a canonical framework in the study of statistical mechanics (see [98] for a comprehensive introduction). A typical problem consists of a square n × n lattice of “sites”, each of which is either “occupied” or “unoccupied”. This initial configuration is obtained at random, typically according to some uniform probability, termed the density, and changes to the lattice are similarly defined in terms of some stochastic process. The objective is to understand the relationship among groups of contiguously connected sites, called clusters. One celebrated result in the study of such systems is the existence of a phase transition at a critical density of occupied sites, above which there exists with high probability a cluster that spans the entire lattice (termed a percolating cluster) and below which no percolating cluster exists. The existence of a critical density where a percolating cluster “emerges” is qualitatively similar to the appearance of a giant connected component in random graph theory [23].

Figure 3(a) shows an example of a random square lattice (n = 32) of unoccupied white sites and a critical density (≈ .59) of occupied dark sites, shaded to show their connected clusters. As is consistent with percolation problems at criticality, the sequence of cluster sizes is approximately scaling, as seen in Figure 3(d), and thus there is wide variability in cluster sizes. The cluster boundaries are fractal, and in the limit of large n, the same fractal geometry occurs throughout the lattice and on all scales, one sense in which the lattice is said to be self-similar and “scale-free”. These scaling, scale-free, and self-similar features occur in random lattices if and only if (with unit probability in the limit of large n) the density is at the critical value. Furthermore, at the critical point, cluster sizes and many other quantities of interest have power law distributions, and these are all independent of the details in two important ways. The first and most celebrated is that they are universal, in the sense that they hold identically in a wide variety of otherwise quite different physical phenomena. The other, which is even more important here, is that all these power laws, including the scale-free fractal appearance of the lattice, are unaffected if the sites are randomly rearranged. Such random rewiring preserves the critical density of occupied sites, which is all that matters in purely random lattices.
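
A random lattice of this kind, and its sequence of cluster sizes, can be generated with a few lines of code. The Python sketch below is a minimal illustration and not the procedure used to produce Figure 3: it assumes that “contiguously connected” means adjacency through the four nearest neighbours, and the function names and seeds are hypothetical. It also includes a random rearrangement of the sites, which leaves the number of occupied sites (and hence the density) unchanged.

```python
import random
from collections import deque


def random_lattice(n, density, seed=0):
    """n x n lattice; each site is occupied independently with probability `density`."""
    rng = random.Random(seed)
    return [[rng.random() < density for _ in range(n)] for _ in range(n)]


def cluster_sizes(lattice):
    """Sizes of 4-connected clusters of occupied sites, largest first."""
    n = len(lattice)
    seen = [[False] * n for _ in range(n)]
    sizes = []
    for r in range(n):
        for c in range(n):
            if not lattice[r][c] or seen[r][c]:
                continue
            # Breadth-first search over the cluster containing (r, c).
            seen[r][c] = True
            queue, size = deque([(r, c)]), 0
            while queue:
                i, j = queue.popleft()
                size += 1
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    a, b = i + di, j + dj
                    if 0 <= a < n and 0 <= b < n and lattice[a][b] and not seen[a][b]:
                        seen[a][b] = True
                        queue.append((a, b))
            sizes.append(size)
    return sorted(sizes, reverse=True)


def shuffled(lattice, seed=1):
    """Randomly rearrange the sites; the number of occupied sites is unchanged."""
    n = len(lattice)
    flat = [site for row in lattice for site in row]
    random.Random(seed).shuffle(flat)
    return [flat[i * n:(i + 1) * n] for i in range(n)]


if __name__ == "__main__":
    lattice = random_lattice(32, 0.59)
    print("largest clusters:", cluster_sizes(lattice)[:5])
    print("after rearrangement:", cluster_sizes(shuffled(lattice))[:5])
```

Plotting rank against the sorted cluster sizes on doubly logarithmic axes gives a size-rank plot of the kind shown in Figure 3(d); at n = 32 the scaling range is necessarily short.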

For many researchers, particularly those unfamiliar with the strong statistical properties of scaling distributions, these remarkable properties of critical phase transitions have become associated with more than just a mechanism giving power laws. Rather, power laws themselves are often viewed as “suggestive” or even “patent signatures” of criticality and “self-organization” in complex systems generally [14]. Furthermore, the concept of Self-Organized Criticality (SOC) has been suggested as a mechanism that automatically tunes the density to the critical point [11]. This has, in turn, given rise to the idea that power laws alone could be “signatures” of specific mechanisms, largely independent of any domain details, and the notion that such phenomena are robust to random rewiring of components or elements has become a compelling force in much of complex systems research.

Our point with these examples is that typical usage of “scale-free” is often associated with some fractal-like geometry, not just macroscopic statistics that are scaling. This distinction can be highlighted using a percolation lattice example contrived explicitly for that purpose. Consider three percolation lattices at the critical density (where the distribution of cluster sizes is known to be scaling) depicted in Figure 3(a)-(c). Even though these lattices have identical cluster size sequences (shown in Figure 3(d)), only the random and fractal, self-similar geometry of the lattice in Figure 3(a) would typically be called “scale-free,” while the other lattices typically would not and do not share any of the other “universal” properties of critical lattices [29]. Again, the usual use of “scale-free” seems to imply certain self-similar or fractal-type features beyond simply following scaling statistics, and this holds in the existing literature on graphs as well.

[Figure 3, panels (a)-(d): three lattices (a)-(c) and a size-rank plot (d) of Rank versus Cluster Size on doubly logarithmic axes.]

Figure 3: PERCOLATION LATTICES WITH SCALING CLUSTER SIZES. Lattices (a)-(c) have the exact same scaling sequence of cluster sizes (d) and the same (critical) density (≈ .59). While random lattices such as in (a) have been called “scale-free”, the highly structured lattices in (b) or (c) typically would not. This suggests that, even within the framework of percolation, scale-free usually means something beyond simple scaling of some statistics and refers to geometric or topological properties.

2.3 Scaling and Self-Similarity in Graphs

While it is possible to use “scale-free” as synonymous with simple scaling relationships as expressed in (6), the popular usage of this term has generally ascribed something additional to its meaning, and the terms “scaling” and “scale-free” have not been used interchangeably, except when explicitly used to say that “scaling” is “free of scale.” When used to describe many naturally occurring and man-made networks, “scale free” often implies something about the spatial, geometric, or topological features of the system of interest (for a recent example that illustrates this perspective in the context of the World Wide Web, see [38]). While there exists no coherent, consistent literature on this subject, there are some consistencies that we will attempt to capture at least in spirit. Here we review briefly some relevant treatments ranging from the study of river networks to random graphs, and including the study of network motifs in engineering and biology.

2.3.1 Self-similarity of River Channel Networks

One application area where self-similar, fractal-like, and scale-free properties of networks have been considered in great detail has been the study of geometric regularities arising in the analysis of tree-branching structures associated with river or stream channels [53, 101, 52, 68, 60, 82, 106, 39]. Following [82], consider a river network modeled as a tree graph, and recursively assign weights (the “Horton-Strahler stream order numbers”) to each link as follows. First, assign order 1 to all exterior links. Then, for each interior link, determine the highest order among its child links, say, ω. If two or more of the child links have order ω, assign to the parent link order ω + 1; otherwise, assign order ω to the parent link. Order k streams or channels are then defined as contiguous chains of order k links. A tree whose highest order stream has order Ω is called a tree of order Ω. Using this Horton-Strahler stream ordering concept, any rooted tree naturally decomposes into a discrete set of “scales”, with the exterior links labeled as order 1 streams and representing the smallest scale or the finest level of detail, and the order Ω stream(s) within the interior representing the largest scale or the structurally coarsest level of detail. For example, consider the order 4 streams and their different “scales” depicted in Figure 4.
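
The recursive assignment of Horton-Strahler orders can be sketched in a few lines of Python. The representation below is an assumption made for illustration: the tree is given as a dictionary from each node to its list of children, and each node stands for the link that joins it to its parent, so that the order reported for the root plays the role of the order Ω of the tree.

```python
def strahler_orders(children, root):
    """Return {node: Horton-Strahler order of the link leaving that node toward the root}."""
    orders = {}

    def visit(node):
        kids = children.get(node, [])
        if not kids:
            orders[node] = 1  # exterior link
        else:
            child_orders = [visit(k) for k in kids]
            top = max(child_orders)
            # Two or more child links of the highest order raise the order by one.
            orders[node] = top + 1 if child_orders.count(top) >= 2 else top
        return orders[node]

    visit(root)
    return orders


if __name__ == "__main__":
    # Complete binary tree with 8 exterior links, numbered as in a binary heap.
    binary = {i: [2 * i, 2 * i + 1] for i in range(1, 8)}
    print("order of the binary tree:", strahler_orders(binary, 1)[1])
```

The complete binary tree in the usage example has order 4 and no side tributaries, in the spirit of Figure 4(b). The recursion is adequate for small trees; very deep trees would call for an iterative traversal.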

To define topologically self-similar trees, consider the class of deterministic trees where every stream of order ω has b ≥ 2 upstream tributaries of order ω − 1, and T_{ω,k} side tributaries of order k, with 2 ≤ ω ≤ Ω and 1 ≤ k ≤ ω − 1. A tree is called (topologically) self-similar if the corresponding matrix (T_{ω,k}) is a Toeplitz matrix; i.e., constant along diagonals, T_{ω,ω−k} = T_k, where T_k is a number that depends on k but not on ω and gives the number of side tributaries of order ω − k. This definition (with the further constraint that T_{k+1}/T_k is constant for all k) was originally considered in works by Tokunaga (see [82] for references). Examples of self-similar trees of order 4 are presented in Figure 4(b-c).
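
The Toeplitz condition T_{ω,ω−k} = T_k can be tested directly on a table of side-tributary counts. In the Python sketch below, the table is stored as a dictionary keyed by (ω, k); this layout and the function names are hypothetical choices made for illustration. The usage example uses T_k = 2^{k−1} with Ω = 4, the values quoted for Figure 4(c).

```python
def tokunaga_matrix(max_order, T_k):
    """Build T[(w, w - k)] = T_k(k) for 2 <= w <= max_order and 1 <= k <= w - 1."""
    return {(w, w - k): T_k(k) for w in range(2, max_order + 1) for k in range(1, w)}


def is_self_similar(T, max_order):
    """True if T[(w, w - k)] depends only on k, i.e. the array is Toeplitz."""
    by_offset = {}
    for w in range(2, max_order + 1):
        for k in range(1, w):
            value = T[(w, w - k)]
            # Remember the first value seen for offset k and compare later ones to it.
            if by_offset.setdefault(k, value) != value:
                return False
    return True


if __name__ == "__main__":
    T = tokunaga_matrix(4, lambda k: 2 ** (k - 1))
    print("side tributaries of an order 4 stream:", [T[(4, k)] for k in (3, 2, 1)])
    print("self-similar:", is_self_similar(T, 4))
```

For this choice the ratio T_{k+1}/T_k equals 2 for all k, so the further constraint attributed to Tokunaga is satisfied as well.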

An important concept underlying this ordering scheme can be described in terms of a recursive “pruning” operation that starts with the removal of the order 1 exterior links. Such removal results in a tree that is more coarse and has its own set of exterior links, now corresponding to the finest level of remaining detail. In the next iteration, these order 2 streams are pruned, and this process continues for a finite number of iterations until only the order Ω stream remains. As illustrated in Figure 4(b-c), successive pruning is responsible for the self-similar nature of these trees. The idea is that streams of order k are invariant under the operation of pruning (they may be relabeled or removed entirely, but are never severed), and they provide a natural scale or level of detail for studying the overall structure of the tree.

As discussed in [87], early attempts at explaining the striking ubiquity of Horton-Strahler stream ordering were based on a stochastic construction in which “it has been commonly assumed by hydrologists and geomorphologists that the topological arrangement and relative sizes of the streams of a drainage network are just the result of a most probable configuration in a random environment.” However, more recent attempts at explaining this regularity have emphasized an approach based on different principles of optimal energy expenditure to identify the universal mechanisms underlying the evolution of “the scale-free spatial organization of a river network” [87, 86]. The idea is that, in addition to randomness, necessity in the form of different energy expenditure principles plays a fundamental role in yielding the multiscaling characteristics in naturally occurring drainage basins.

[Figure 4, panels (a)-(d). Legend: Order 1, Order 2, Order 3, Order 4. Panels (b) and (c) each show the complete graph and the graph with the order 1 exterior links trimmed. Panel (d) shows the matrix below.]

$$\begin{pmatrix} T_{4,3} & T_{4,2} & T_{4,1} \\ 0 & T_{3,2} & T_{3,1} \\ 0 & 0 & T_{2,1} \end{pmatrix} = \begin{pmatrix} 1 & 2 & 4 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}$$

Figure 4: HORTON-STRAHLER STREAMS OF ORDER 4. (a) Generic stream with segments coded according to their order. (b) Self-similar tree without side tributaries: branching number b = 2 and T_k = 0 for all k. (c) Self-similar tree with side tributaries: branching number b = 2 but T_k = 2^{k−1} for k = 1, 2, 3. (d) Toeplitz matrix of values T_{ω,ω−k} = T_k, representing the side tributaries in (c).

It is also interesting to note that while considerable attention in the literature on river or stream channel networks is given to empirically observed power law relationships (commonly referred to as “Horton’s laws of drainage network composition”) and their physical explanations, it has been argued in [60, 61, 62] that these “laws” are in fact a very weak test of models or theories of stream network structures. The arguments are based on the observation that because most stream networks (random or non-random) appear to satisfy Horton’s laws automatically, the latter provide little compelling evidence about the forces or processes at work in generating the remarkably regular geometric relationships observed in actual river networks. This discussion is akin to the widespread belief in the SF network literature that since SF graphs exhibit power law degree distributions, they are capable of capturing a distinctive “universal” feature underlying the evolution of complex network structures. The arguments provided in the context of the Internet’s physical connectivity structure [65] are similar in spirit to Kirchner’s criticism of the interpretation of Horton’s laws in the literature on river or stream channel networks. In contrast to [60] where Horton’s laws are shown to be poor indicators of whether or not stream channel networks are random, [65] makes it clear that by their very design, engineered networks like the Internet’s router-level topology are essentially non-random, and that their randomly constructed (but otherwise comparable) counterparts result in poorly-performing or dysfunctional networks.

2.3.2 Scaling Degree Sequence and Degree Distribution

Statistical features of graph structures that have received extensive treatment include the size of the largest connected component, link density, node degree relationships, the graph diameter, the characteristic path length, the clustering coefficient, and the betweenness centrality (for a review of these and other metrics see [4, 74, 40]). However, the single feature that has received the most attention is the distribution of node degree and whether or not it follows a power law.

For a graph with n vertices, let d_i = deg(i) denote the degree of node i, 1 ≤ i ≤ n, and call D = {d_1, d_2, ..., d_n} the degree sequence of the graph, again assumed without loss of generality always to be ordered d_1 ≥ d_2 ≥ ... ≥ d_n. We will say a graph has scaling degree sequence D (or D is scaling) if for all 1 ≤ k ≤ n_s ≤ n, D satisfies a power law size-rank relationship of the form k d_k^α = c, where c > 0 and α > 0 are constants, and where n_s determines the range of scaling [67]. Since this definition is simply a graph-specific version of (1) that allows for deviations from the power law relationship for nodes with low connectivity, and since it is equivalent to log k = log c − α log d_k, we again recognize that doubly logarithmic plots of d_k versus k (with the rank k on the vertical axis) yield straight lines of slope −α, at least for large d_k values.
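
The size-rank relationship k d_k^α = c suggests a simple way to read off α from a given degree sequence: regress log k on log d_k over the scaling range and negate the slope. The Python sketch below illustrates this under simplifying assumptions: the synthetic sequence is real-valued (integer rounding of degrees is ignored), the estimator is plain least squares rather than one of the more rigorous methods, and the function names are hypothetical.

```python
import math


def scaling_degree_sequence(n_s, c, alpha):
    """Real-valued d_k with k * d_k**alpha = c for k = 1..n_s (no integer rounding)."""
    return [(c / k) ** (1.0 / alpha) for k in range(1, n_s + 1)]


def estimate_alpha(degrees, n_s=None):
    """Negated least-squares slope of log k against log d_k over the first n_s ranks."""
    d = sorted(degrees, reverse=True)[: n_s or len(degrees)]
    xs = [math.log(v) for v in d]
    ys = [math.log(k) for k in range(1, len(d) + 1)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx


if __name__ == "__main__":
    D = scaling_degree_sequence(100, 1000.0, 1.5)
    print("estimated alpha:", estimate_alpha(D))
```

The optional argument n_s restricts the fit to the highest-ranked nodes, mirroring the allowance for deviations among nodes with low connectivity.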

This description of scaling degree sequence is general, in the sense that it applies to any given graph without regard to how it is generated and without reference to any underlying probability distributions or ensembles. That is, a scaling degree sequence is simply an ordered list of integers representing node connectivity and satisfying the above scaling relationship. In contrast, the SF literature focuses largely on scaling degree distribution, and thus a given degree sequence has the further interpretation as representing a realization of an iid sample of size n generated from a common scaling distribution of the type (2). This in turn is often induced by some random ensemble of graphs. This paper will develop primarily a nonstochastic theory and thus focus on scaling degree sequences, but will clarify the role of stochastic models and distributions as well. In all cases, we will aim to be explicit about which is assumed to hold.

For graphs that are not trees, a first attempt at formally defining and relating the concepts of “scaling” or “scale-free” and “self-similar” through an appropriately defined notion of “scale invariance” is considered by Aiello et al. and described in [3]. In short, Aiello et al. view the evolution of a graph as a random process of growing the graph by adding new nodes and links over time. A model of a given graph evolution process is then called “scale-free” if “coarse-graining” in time yields scaled graphs that have the same power law degree distribution as the original graph. Here “coarse-graining in time” refers to constructing scaled versions of the original graph by dividing time into intervals, combining all nodes born in the same interval into super-nodes, and connecting the resulting super-nodes via a natural mapping of the links in the original graph. For a number of graph growing models, including the Barabasi-Albert construction, Aiello et al. show that the evolution process is “scale-free” in the sense of being invariant with respect to time scaling (i.e., the frequency of sampling with respect to the growth rate of the model) and independent of the parameter of the underlying power law node degree distribution (see [3] for details). Note that the scale invariance criterion considered in [3] concerns exclusively the degree distributions of the original graph and its coarse-grained or scaled counterparts. Specifically, the definition of “scale-free” considered by Aiello et al. is not “structural” in the sense that it depends on a macroscopic statistic that is largely uninformative as far as topological properties of the graph are concerned.


2.3.3 Network Motifs

Another recent attempt at relating the notions of “scale-free” and “self-similar” for arbitrary graphs through the more structurally driven concept of “coarse-graining” is due to Itzkovitz et al. [58]. More specifically, the main focus in [58] is on investigating the local structure of basic network building blocks, termed motifs, that recur throughout a network and are claimed to be part of many natural and man-made systems [92, 70]. The idea is that by identifying motifs that appear in a given network at much higher frequencies than in comparable random networks, it is possible to move beyond studying macroscopic statistical features of networks (e.g. power law degree sequences) and try to understand some of the networks’ more microscopic and structural features. The proposed approach is based on simplifying complex network structures by creating appropriately coarse-grained networks in which each node represents an entire pattern (i.e., network motif) in the original network. Recursing on the coarse-graining procedure yields networks at different levels of resolution, and a network is called “scale-free” if the coarse-grained counterparts are “self-similar” in the sense that the same coarse-graining procedure with the same set of network motifs applies at each level of resolution. When applying their approach to an engineered network (electric circuit) and a biological network (protein-signaling network), Itzkovitz et al. found that while each of these networks exhibits well-defined (but different) motifs, their coarse-grained counterparts systematically display very different motifs at each level.

A lesson learned from the work in [58] is that networks that have scaling degree sequences need not have coarse-grained counterparts that are self-similar. This further motivates appropriately narrowing the definition of “scale-free” so that it does imply some kind of self-similarity. In fact, the examples considered in [58] indicate that engineered or biological networks may be the opposite of “scale-free” or “self-similar”: their structure at each level of resolution is different, and the networks are “scale-rich” or “self-dissimilar.” As pointed out in [58], this observation contrasts with prevailing views based on statistical mechanics near phase-transition points which emphasize how self-similarity, scale invariance, and power laws coincide in complex systems. It also suggests that network models that emphasize the latter views may be missing important structural features [58, 59]. A more formal definition of self-dissimilarity was recently given by Wolpert and Macready [112, 113] who proposed it as a characteristic measure of complex systems. Motivated by a data-driven approach, Wolpert and Macready observed that many complex systems tend to exhibit different structural patterns over different space and time scales. Using examples from biological and economic/social systems, their approach is to consider and quantify how such complex systems process information at different scales. Measuring a system’s self-dissimilarity across different scales yields a complexity “signature” of the system at hand. Wolpert and Macready suggest that by clustering such signatures, one obtains a purely data-driven, yet natural, taxonomy for broad classes of complex systems.

2.3.4 Graph Similarity and Data Mining

Finally, the notion of graph similarity is fundamental to the study of attributed graphs (i.e., objects that have an internal structure that is typically modeled with the help of a graph or tree and that is augmented with attribute information). Such graphs arise as natural models for structured data observed in different database applications (e.g., molecular biology, image or document retrieval). The task of extracting relevant or new knowledge from such databases (“data mining”) typically requires some notion of graph similarity and there exists a vast literature dealing with different graph similarity measures or metrics and their properties [91, 31]. However, these measures tend to exploit graph features (e.g., a given one-to-one mapping between the vertices of different graphs, or a requirement that all graphs have to be of the same order) that are specific to the application domain. For example, a common similarity measure for graphs used in the context of pattern recognition is the edit distance [90]. In the field of image retrieval, the similarity of attributed graphs is often measured via the vertex matching distance [83]. The fact that the computation of many of these similarity measures is known to be NP-complete has motivated the development of new and more practical measures that can be used for more efficient similarity searches in large-scale databases (e.g., see [63]).

3 The Existing SF Story

In this section, we first review the existing SF literature, describing some of the most popular models and their most appealing features. This is then followed by a brief critique of the existing theory of SF networks in general and in the context of Internet topology in particular.

3.1 Basic Properties and Claims

The main properties of SF graphs that appear in the existing literature can be summarized as

1. SF networks have scaling (power law) degree distribution.

2. SF networks can be generated by certain random processes, the foremost among which is preferential attachment.

3. SF networks have highly connected “hubs” which “hold the network together” and give the “robust yet fragile” feature of error tolerance but attack vulnerability.

4. SF networks are generic in the sense of being preserved under random degree preserving rewiring.

5. SF networks are self-similar.

6. SF networks are universal in the sense of not depending on domain-specific details.

This variety of features suggests the potential for a rich and extensive theory. Unfortunately, it is unclear from the literature which properties are necessary and/or sufficient to imply the others, and if any implications are strict, or simply “likely” for an ensemble. Many authors apparently define scale-free in terms of just one property, typically scaling degree distribution or random generation, and appear to claim that some or all of the other properties are then consequences. A central aim of this paper is to clarify exactly what options there are in defining SF graphs and deriving their additional properties. Ultimately, we propose below in Section 6.2 a set of minimal axioms that allow for the preservation of the most common claims. However, first we briefly review the existing treatment of the above properties, related historical results, and shortcomings of the current theory, particularly as it has been frequently applied to the Internet.

The ambiguity regarding the definition of “scale-free” originates with the original papers [15, 6], but has continued since. Here SF graphs appear to be defined both as graphs with scaling or power law degree distributions and as being generated by a stochastic construction mechanism based on incremental growth (i.e. nodes are added one at a time) and preferential attachment (i.e. nodes are more likely to attach to nodes that already have many connections). Indeed, the apparent equivalence of scaling degree distribution and preferential attachment, and the ability of thus-defined (if ambiguously so) SF network models to generate node degree statistics that are consistent with the ubiquity of empirically observed power laws is the most commonly cited evidence that SF network mechanisms and structures are in some sense universal [5, 6, 14, 15, 18].
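
The two ingredients named above, incremental growth and preferential attachment, can be combined in a short generative sketch. The Python code below is one possible reading and not necessarily the exact construction of [15]: the seed graph (a small star), the requirement that each new node attach to m distinct existing nodes, and all function names are assumptions made for illustration.

```python
import random


def preferential_attachment(n, m=1, seed=0):
    """Grow an n-node graph (n > m); each new node links to m distinct existing
    nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Seed graph (an arbitrary choice): a star with center 0 and leaves 1..m.
    edges = [(0, i) for i in range(1, m + 1)]
    # Each node appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` is a degree-proportional draw from the existing nodes.
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(stubs))
        for old in sorted(chosen):
            edges.append((new, old))
            stubs.extend((new, old))
    return edges


def degree_sequence(edges):
    """Node degrees sorted from largest to smallest."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return sorted(deg.values(), reverse=True)


if __name__ == "__main__":
    D = degree_sequence(preferential_attachment(1000, m=2))
    print("five largest degrees:", D[:5])
```

Plotting the rank k against the sorted degrees d_k on doubly logarithmic axes gives a size-rank view of the resulting degree sequence. A single run produces one degree sequence, not a degree distribution, a distinction that matters for the discussion that follows.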

Models of preferential attachment giving rise to power law statistics actually have a long history and are at least 80 years old. As presented by Mandelbrot [67], one early example of research in this area was the work of Yule [117], who in 1925 developed power law models to explain the observed distribution of species within plant genera. Mandelbrot [67] also documents the work of Luria and Delbruck, who in 1943 developed a model and supporting mathematics for the explicit generation of scaling relationships in the number of mutants in old bacterial populations [66]. A more general and popular model of preferential attachment was developed by Simon [94] in 1955 to explain the observed presence of power laws within a variety of fields, including economics (income distributions, city populations), linguistics (word frequencies), and biology (distribution of mutants in bacterial cultures). Substantial controversy and attention surrounded these models in the 1950s and 1960s [67]. A recent review of this history can also be found in [71]. By the 1990s, though, these models had been largely displaced in the popular science literature by models based on critical phenomena from statistical physics [11], only to resurface recently in the scientific literature in the context of “scale-free networks” [15]. Since then, numerous refinements and modifications to the original Barabasi-Albert construction have been proposed and have resulted in SF network models that can reproduce power law degree distributions with any α ∈ [1, 2], a feature that agrees empirically with many observed networks [4]. Moreover, the largely empirical and heuristic studies of these types of “scale-free” networks have recently been enhanced by a rigorous mathematical treatment that can be found in [24] and involves a precise version of the Barabasi-Albert construction.

The introduction of SF network models, combined with the equally popular (though less ambiguous) “small world” network models [109], reinvigorated the use of abstract random graph models and their properties (particularly node degree distributions) to study a diversity of complex network systems. For example, Dorogovtsev and Mendes [40, p.76] provide a “standard programme of empirical research of a complex network”, which for the case of undirected graphs consists of finding 1) the degree distribution; 2) the clustering coefficient; 3) the average shortest-path length. The presumption is that these features adequately characterize complex networks. Through the collective efforts of many researchers, this approach has cataloged an impressive list of real application networks, including communication networks (the WWW and the Internet), social networks (author collaborations, movie actors), biological networks (neural networks, metabolic networks, protein networks, ecological and food webs), telephone call graphs, mail networks, power grids and electronic circuits, networks of software components, and energy landscape networks (again, comprehensive reviews of these many results are widely available [4, 14, 74, 40, 79]). While very different in detail, these systems share a common feature in that their degree distributions are all claimed to follow a power law, possibly with different tail indices.

Regardless of the definitional ambiguities, the use of simple stochastic constructions that yield scaling degree distributions and other appealing graph properties represents for many researchers what is arguably an ideal application of statistical physics to explaining and understanding complexity. Since SF models have their roots in statistical physics, a key assumption is always that any particular network is simply a realization from a larger ensemble of graphs, with an explicit or implicit underlying stochastic model. Accordingly, this approach to understanding complex networks has focused on those networks that are most likely to occur under an assumed random graph model and has aimed at identifying or discovering macroscopic features that capture the “essence” of the structure underlying those networks. First, preferential attachment offers a general and hence attractive “microscopic” mechanism by which a growth process yields an ensemble of graphs with the “macroscopic” property of power law node degree distributions [16]. Second, the resulting SF topologies are “generic.” Not only is any specific SF graph the generic or likely element from such an ensemble, but also “... an important property of scale-free networks is that [degree preserving] random rewiring does not change the scale-free nature of the network” (see Methods Supplement to [55]). Finally, this ensemble-based approach has an appealing kind of “universality” in that it involves no model-specific domain knowledge or specialized “design” requirements and requires only minimal tuning of the underlying model parameters.

Perhaps most importantly, SF graphs are claimed to exhibit a host of startling “emergent” consequences of universal relevance, including intriguing self-similar and fractal properties (see below for details), small-world characteristics [9], and “hub-like” cores. Perhaps the central claim for SF graphs is that they have hubs, what we term SF hubs, which “hold the network together.” As noted, the structure of such networks is highly vulnerable (i.e., can be fragmented) to attacks that target these hubs [6]. At the same time, they are resilient to attacks that knock out nodes at random, since a randomly chosen node is unlikely to be a hub and thus its removal has minimal effect on network connectivity. In the context of the Internet, where SF graphs have been proposed as models of the router-level Internet [115], this has been touted as “the Achilles’ heel of the Internet” [6], a vulnerability that has presumably been overlooked by networking engineers. Furthermore, the hub-like structure of SF graphs is such that the epidemic threshold is zero for contagion phenomena [78, 13, 80, 79], thus suggesting that the natural way to stop epidemics, either for computer viruses/worms or biological epidemics such as AIDS, is to protect these hubs [37, 26]. Proponents of this modeling framework have further suggested that the emergent properties of SF graphs contribute to truly universal behavior in complex networks [22] and that preferential attachment as well is a universal mechanism at work in the evolution of these networks [56, 40].

[Figure 5, panels: (a) HSFnet, (b) Random, (c) PoorDesign, (d) HOTnet, with links and routers color-coded by speed (Gbps); (e) node rank versus node degree on logarithmic axes.]

Figure 5: NETWORK GRAPHS HAVING EXACTLY THE SAME NUMBER OF NODES AND LINKS, AS WELL AS THE SAME (POWER LAW) DEGREE SEQUENCE. As toy models of the router-level Internet, all graphs are subject to the same router technology constraints and the same traffic demand model for routers at the network periphery. (a) Hierarchical scale-free (HSF) network: Following roughly a recently proposed construction that combines scale-free structure and inherent modularity in the sense of exhibiting a hierarchical architecture [84], we start with a small 3-pronged cluster and build a 3-tier network a la Ravasz-Barabasi, adding routers at the periphery roughly in a preferential manner. (b) Random network: This network is obtained from the HSF network in (a) by performing a number of pairwise random degree-preserving rewiring steps. (c) Poor design: In this heuristic construction, we arrange the interior routers in a line, pick a node towards the middle to be the high-degree, low-bandwidth bottleneck, and establish connections between high-degree and low-degree nodes. (d) HOT network: The construction mimics the build-out of a network by a hypothetical ISP. It produces a 3-tier network hierarchy in which the high-bandwidth, low-connectivity routers live in the network core while routers with low bandwidth and high connectivity reside at the periphery of the network. (e) Node degree sequence for each network. Only d_i > 1 shown.

3.2 A Critique of Existing Theory

The SF story has successfully captured the interest and imagination of researchers across disciplines, and with good reason, as the proposed properties are rich and varied. Yet the existing ambiguity in its mathematical formulation and many of its most essential properties has created confusion about what it means for a network to be “scale-free” [48]. One possible and apparently popular interpretation is that scale-free means simply graphs with scaling degree sequences, and that this alone implies all other features listed above. We will show that this is incorrect, and in fact none of the features follows from scaling alone. Even relaxing this to random graphs with scaling degree distributions is by itself inadequate to imply any further properties. A central aim of this paper is to clarify the reasons why these interpretations are incorrect, and propose minimal changes to fix them. The opposite extreme interpretation is that scale-free graphs are defined as having all of the above-listed properties. We will show that this is possible in the sense that the set of such graphs is not empty, but as a definition this leads to two further problems. Mathematically, one would prefer fewer axioms, and we will rectify this with a minimal definition. We will introduce a structural metric that provides a view of the extent to which a graph is scale-free and from which all the above properties follow, often with necessary and sufficient conditions. The other problem is that the canonical examples of apparent SF networks, the Internet and biological metabolism, are then very far from scale-free in that they have none of the above properties except perhaps for scaling degree distributions. This is simply an unavoidable conflict between these properties and the specifics of the applications, and cannot be fixed.

As a result, a rigorous theory of SF graphs must either define scale-free more narrowly than scaling degree sequences or distributions in order to have nontrivial emergent properties, and thus lose central claims of applicability, or instead define scale-free as merely scaling, but lose all the universal emergent features that have been claimed to hold for SF networks. We will pursue the former approach because we believe it is most representative of the spirit of previous studies and also because it is most inclusive of results in the existing literature. At the most basic level, simply to be a nontrivial and novel concept, scale-free clearly must mean more than a graph with scaling degree sequence or distribution. It must capture some aspect of the graph itself, and not merely a sequence of integers, stochastic or not, in which case the SF literature and this paper would offer nothing new. Other authors may ultimately choose different definitions, but in any case, the results in this paper clarify for the first time precisely what the graph theoretic alternatives are regarding the implications of any of the possible alternative definitions. Thus the definition of the word “scale-free” is much less important than the mathematical relationships among its various claimed properties, and the connections with real world networks.

3.3 The Internet as a Case Study

To illustrate some key points about the existing claims regarding SF networks as adopted in the popular literature and their relationship with scaling degree distributions, we consider an application to the Internet where graphs are meant to model Internet connectivity at the router-level. For a meaningful explanation of empirically observed network statistics, we must account for network design issues concerned with technology constraints, economic factors, and network performance [65, 41]. Additionally, we should annotate the nodes and links in connectivity-only graphs with domain-specific information such as router capacity and link bandwidth in such a way that the resulting annotated graphs represent technically realizable and functional networks.

3.3.1 The SF Internet

Consider the simple toy model of a “hierarchical” SF network HSFnet shown in Figure 5(a), which has a “modular” graph constructed according to a particular type of preferential attachment [84] and to which are then preferentially added degree-one end systems, yielding the power law degree sequence shown in Figure 5(e). This type of construction has been suggested as a SF model of both the Internet and biology, both of which are highly hierarchical and modular [18]. The resulting graph has all the features listed above as characteristic of SF networks and is easily visualized and thus convenient for our comparisons. Note that the highest-degree nodes in the tail of the degree sequence in Figure 5(e) correspond to the SF hub nodes in the SF network HSFnet, Figure 5(a). This confirms the intuition behind the popular SF view that power law degree sequences imply the existence of SF hubs that are crucial for global connectivity. If such features were true for the real Internet, this finding would certainly be startling and profound, as it directly contradicts the Internet’s legendary and most clearly understood robustness property, i.e., its high resilience to router failures [33].

Figure 5 also depicts three other networks with the exact same degree sequence as HSFnet. The variety of these graphs suggests that the set of all connected simple graphs (i.e., no self-loops or parallel links) having exactly the same degree sequence shown in Figure 5(e) is so diverse that its elements appear to have nothing in common as graphs beyond what trivially follows from having a fixed (scaling) degree sequence. They certainly do not appear to share any of the features summarized above as conventionally claimed for SF graphs. Even more striking are the differences in their structures and annotated bandwidths (i.e., color-coding of links and nodes in Figure 5). For example, while the graphs in Figure 5(a) and (b) exhibit the type of hub nodes typically associated with SF networks, the graph in Figure 5(d) has its highest-degree nodes located at the network’s periphery. We will show that this provides a concrete counterexample to the idea that power law degree sequences imply the existence of SF hubs. This then creates the obvious dilemma as to the concise meaning of a “scale-free graph” as outlined above.

3.3.2 A Toy Model of the Real Internet

In terms of using SF networks as models for the Internet’s router-level topology, recent Internet research has demonstrated that the real Internet is nothing like Figure 5(a), size issues notwithstanding, but is at least qualitatively more like the network shown in Figure 5(d). We label this network HOTnet (for Heuristically Optimal Topology), and note that its overall power law in degree sequence comes from high-degree routers at the network periphery that aggregate the traffic of end users having low bandwidth demands, while supporting aggregate traffic flows with a mesh of low-degree core routers [65]. In fact, as we will discuss in greater detail in Section 6, there is little evidence that the Internet as a whole has scaling degree or even high variability, and much evidence to the contrary, for many of the existing claims of scaling are based on a combination of relying on highly ambiguous data and making a number of statistical errors, some of them similar to those illustrated in Figures 1 and 2. What is true is that a network like HOTnet is consistent with existing technology, and could in principle be the router level graph for some small but plausible network. Thus a network with a scaling degree sequence in its router graph is plausible even if the actual Internet is not scaling. It would however look qualitatively like HOTnet and nothing like HSFnet.

To see in what sense HOTnet is heuristically optimal, note that from a network design perspective, an important question is how well a particular topology is able to carry a given demand for traffic, while fully complying with actual technology constraints and economic factors. Here, we adopt as our standard metric for network performance the maximum throughput of the network under a “gravity model” of end user traffic demands [118]. The latter assumes that every end node i has a total bandwidth demand x_i, that two-way traffic is exchanged between all pairs (i, j) of end nodes i and j, and that the flow X_ij of traffic between i and j is given by X_ij = ρ x_i x_j, where ρ is some global constant, and is otherwise uncorrelated from all other flows. Our performance measure for a given network g is then its maximum throughput with gravity flows, computed as

\[
\mathrm{Perf}(g) = \max_{\rho} \sum_{ij} X_{ij}, \quad \text{s.t. } RX \le B, \tag{7}
\]

where R is the routing matrix obtained using standard shortest path routing: R = [R_kl], with R_kl = 1 if flow l passes through router k, and R_kl = 0 otherwise. X is the vector of all flows X_ij, indexed to match the routing matrix R, and B is a vector consisting of all router bandwidth capacities.
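As an illustration of (7), the following Python sketch computes the maximum throughput for a small hypothetical network. Because every gravity flow scales linearly with ρ, the constraint RX ≤ B reduces to one upper bound on ρ per router, and the optimum is the smallest of these bounds. The function name `gravity_perf`, the dictionary-based inputs, and the three-node example are assumptions made for illustration only; routes are taken as given rather than computed by shortest path routing, and each unordered pair of end nodes is counted as a single flow.

```python
def gravity_perf(demands, routes, capacity):
    """Maximum throughput under gravity flows X_ij = rho * x_i * x_j.

    demands:  dict mapping end node i -> bandwidth demand x_i
    routes:   dict mapping flow (i, j) -> list of routers the flow passes through
    capacity: dict mapping router k -> bandwidth capacity B_k
    Returns (rho, total throughput).
    """
    # Load on each router per unit of rho: sum of x_i * x_j over flows through it.
    load = {k: 0.0 for k in capacity}
    for (i, j), path in routes.items():
        for k in path:
            load[k] += demands[i] * demands[j]
    # RX <= B holds row by row iff rho <= B_k / load_k for every loaded router.
    rho = min(capacity[k] / load[k] for k in load if load[k] > 0)
    total = sum(rho * demands[i] * demands[j] for (i, j) in routes)
    return rho, total


# Hypothetical example: three end nodes, two routers.
demands = {"a": 1.0, "b": 2.0, "c": 3.0}
routes = {("a", "b"): ["r1"], ("a", "c"): ["r1", "r2"], ("b", "c"): ["r2"]}
capacity = {"r1": 10.0, "r2": 9.0}
rho, perf = gravity_perf(demands, routes, capacity)
```

In this toy instance router r2 is the bottleneck: it carries the flows (a, c) and (b, c), so it saturates first and fixes ρ for the whole network.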

An appropriate treatment of router bandwidth capacities represented in B is important for computing network performance and merits additional explanation. Due to fundamental limits in technology, routers must adhere to flow conservation constraints in the total amount of traffic that they process per unit of time. Thus, routers can support a large number of low bandwidth connections or a smaller number of high bandwidth connections. In many cases, additional routing overhead actually causes the total router throughput to decrease as the number of connections gets large, and we follow the presentation in [65] in choosing the term B to correspond with an abstracted version of a widely deployed Cisco product (for details about this abstracted constraint and the factors affecting real router design, we refer the reader to [7, 65]).

The application of this network performance metric to the four graphs in Figure 5 shows that although they have the same degree sequence, they are very different from the perspective of network engineering, and that these differences are significant and critical. For example, the SF network HSFnet in Figure 5(a) achieves a performance of Perf(HSFnet) = 6.17 × 10^8 bps, while the HOT network HOTnet in Figure 5(d) achieves a performance of Perf(HOTnet) = 2.93 × 10^11 bps, which is greater by more than two orders of magnitude. The reason for this vast difference is that the HOT construction explicitly incorporates the tradeoffs between realistic router capacities and economic considerations in its design process while the SF counterpart does not.

The actual construction of HOTnet is fairly straightforward, and while it has high performance, it is not formally optimal. We imposed the constraints that it must have exactly the same degree sequence as HSFnet, and that it must satisfy the router degree/bandwidth constraints. For a graph of this size the design then easily follows by inspection, and mimics in a highly abstracted way the design of real networks. First, the degree-one nodes are designated as end-user hosts and placed at the periphery of the network, though geography per se is not explicitly considered in the design. These are then maximally aggregated by attaching them to the highest-degree nodes at the next level in from the periphery, leaving one or two links on these “access router” nodes to attach to the core. The lowest-degree access routers are given two links to the core, which reflects that low-degree access routers are capable of handling higher-bandwidth hosts, and such high-value customers would likely have multiple connections to the core. At this point there are just 4 low-degree nodes left, and these become the highest-bandwidth core routers, and are connected in a mesh, resulting in the graph in Figure 5(d). While some rearrangements are possible, all high-performance networks using a gravity model and the simple router constraints we have imposed would necessarily look essentially like HOTnet. They would all have the highest-degree nodes connected to degree-one nodes at the periphery, and they would all have a low-degree mesh-like core.

Another feature that has been highlighted in the SF literature is the attack vulnerability of high degree hubs. Here again, the four graphs in Figure 5 are illustrative of the potential differences between graphs having the same degree sequence. Using the performance metric defined in (7), we compute the performance of each graph without disruption (i.e., the complete graph), after the loss of high degree nodes, and after the loss of the most important (i.e., worst case) nodes. In each case, when removing a node we also remove any corresponding degree-one end-hosts that also become disconnected, and we compute performance over shortest path routes between remaining nodes but in a manner that allows for rerouting. We find that for HSFnet, removal of the highest degree nodes does in fact disconnect the network as a whole, and this is equivalent to the worst case attack for this network. In contrast, removal of the highest degree nodes results in only minor disruption to HOTnet, but a worst case attack (here, this is the removal of the low-degree core routers) does disconnect the network. The results are summarized below.

Network        Complete       High Degree      Worst Case
Performance    Graph          Nodes Removed    Nodes Removed

HSFnet         5.9197e+09     Disconnected     = ‘High Degree’ case
HOTnet         2.9680e+11     2.7429e+11       Disconnected

This example thus illustrates two important points. The first is that HSFnet does indeed have all the graph theoretic properties listed above that are attributed to SF networks, including attack vulnerability, while HOTnet has none of these features except for scaling degree. Thus the set of graphs that have the standard scale-free attributes is neither empty nor trivially equivalent to graphs having scaling degree. The second point is that the standard SF models are in all important ways exactly the opposite of the real Internet, and fail to capture even the most basic features of the Internet’s router-level connectivity. While the intuition behind these claims is clear from inspection of Figure 5 and the performance comparisons, full clarification of these points requires the results in the rest of this paper and additional details on the Internet [7, 65, 41]. These observations naturally cast doubts on the relevance of conventional SF models in other application areas where domain knowledge and specific functional requirements play a similarly crucial role as in the Internet context. The other most cited SF example is metabolic networks in biology, where many recent SF studies have focused on abstract graphs in which nodes represent metabolites, and two nodes are “connected” if they are involved in the same reaction. In these studies, observed power laws for the degree sequences associated with such graphs have been used to claim that metabolic networks are scale-free [19]. Though the details are far more complicated here than in the Internet story above, recent work in [102] that is summarized in Section 7 has shown there is a largely parallel story in that the SF claims are completely inconsistent with the actual biology, despite their superficial appeal and apparent popularity.

4 A Structural Approach

In this section, we show that considerable insight into the features of SF graphs and models is available from a metric that measures the extent to which high-degree nodes connect to other high-degree nodes. As we will show, such a metric is both necessary and useful for explaining the extreme differences between networks that have identical degree sequence, especially if it is scaling. By focusing on a graph’s structural properties and not on how it was generated, this approach does not depend on an underlying random graph model but is applicable to any graph of interest.

4.1 The s-Metric

Let g be an undirected, simple, connected graph having n = |V| nodes and l = |E| links, where V and E are the sets of nodes and links, respectively. As before, define d_i to be the degree of node i ∈ V, D = {d_1, d_2, . . . , d_n} to be the degree sequence for g (again assumed to be ordered), and let G(D) denote the set of all connected simple graphs having the same degree sequence D. Note that most graphs with scaling degree will be neither simple nor connected, so this is an important and nontrivial restriction. Even with these constraints, it is clear based on the previous examples that the elements of G(D) can be very different from one another, so that in order to constitute a non-trivial concept, “scale-free” should mean more than merely that D is scaling and should depend on additional topological or structural properties of the elements in G(D).

Definition 1. For any graph g having fixed degree sequence D, we define the metric

\[
s(g) = \sum_{(i,j) \in E} d_i d_j. \tag{8}
\]
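A minimal Python sketch of Definition 1, assuming the graph is supplied as a list of undirected links. The function name `s_metric` and the two example trees are illustrative choices, not part of the original text. Both trees have the degree sequence {3, 2, 2, 1, 1, 1} and differ only in how the links are arranged.

```python
def s_metric(edges):
    """Return s(g): the sum of d_i * d_j over all links (i, j) of the graph."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sum(degree[u] * degree[v] for u, v in edges)


# Node 0 has degree 3, nodes 1 and 2 have degree 2, nodes 3 to 5 are leaves.
tree_a = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5)]  # both degree-2 nodes attached to the hub
tree_b = [(0, 1), (1, 2), (0, 3), (0, 4), (2, 5)]  # degree-2 nodes chained away from the hub

s_a = s_metric(tree_a)
s_b = s_metric(tree_b)
```

Here `s_a` exceeds `s_b` because `tree_a` connects the higher-degree nodes directly to the highest-degree node, while `tree_b` spends two of the hub's links on leaves.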

Note that s(g) depends only on the graph g and not explicitly on the process by which it is constructed. Implicitly, the metric s(g) measures the extent to which the graph g has a “hub-like” core and is maximized when high-degree nodes are connected to other high-degree nodes. This observation follows from the Rearrangement Inequality [114], which states that if a_1 ≥ a_2 ≥ · · · ≥ a_n and b_1 ≥ b_2 ≥ · · · ≥ b_n, then for any permutation (a'_1, a'_2, · · · , a'_n) of (a_1, a_2, · · · , a_n), we have

\[
a_1 b_1 + a_2 b_2 + \cdots + a_n b_n \;\ge\; a'_1 b_1 + a'_2 b_2 + \cdots + a'_n b_n \;\ge\; a_n b_1 + a_{n-1} b_2 + \cdots + a_1 b_n.
\]
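The inequality can be checked by brute force on a small case. The sketch below enumerates every permutation of a short sequence and compares the extreme pairing sums with the same-order and opposite-order pairings; the helper name `pairing_sums` and the sample values are assumptions chosen for illustration.

```python
from itertools import permutations


def pairing_sums(a, b):
    """Return the sum a'_1*b_1 + ... + a'_n*b_n for every permutation a' of a,
    with b held in non-increasing order."""
    b_sorted = sorted(b, reverse=True)
    return [sum(x * y for x, y in zip(perm, b_sorted)) for perm in permutations(a)]


a = [5, 3, 1]
b = [4, 2, 1]
sums = pairing_sums(a, b)

same_order = sum(x * y for x, y in zip(sorted(a, reverse=True), sorted(b, reverse=True)))
opposite_order = sum(x * y for x, y in zip(sorted(a), sorted(b, reverse=True)))
```

For these values the largest entry of `sums` equals `same_order` and the smallest equals `opposite_order`, which mirrors the two bounds stated above.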

Since high s(g)-values are achieved only by connecting high-degree nodes to each other, and low s(g)-values are obtained by connecting high-degree nodes only to low-degree nodes, the s-metric moves beyond simple statements concerning the presence of “hub” nodes (as is true for any degree sequence D that has high variability) and attempts to quantify what role such hubs play in the overall structure of the graph. In particular, as we will show below, graphs with relatively high s(g) values have a “hub-like core” in the sense that these hubs play a central role in the overall connectivity of the network. We will also demonstrate that the metric s(g) provides a view that is not only mathematically convenient and rigorous, but also practically useful as far as what it means for a graph to be “scale-free”.

4.1.1 Graph Diversity and the Perf(g) vs. s(g) Plane

Although our interest in this paper will be in graphs for which the degree sequence D is scaling, we can compute s(g) with respect to any “background” set G of graphs, and we need not restrict the set to scaling or even to connected or simple graphs. Moreover, for any background set, there exists a graph whose connectivity maximizes the s-metric defined in (8), and we refer to this as an “s_max graph”. The s_max graphs for different background sets are of interest since they are essentially unique and also have the most “hub-like” core structure. Graphs with low s-values are also highly relevant, but unlike s_max graphs they are extremely diverse with essentially no features in common with each other or with other graphs in the background set except the degree sequence D.

Graphs with high variability and/or scaling in their degree sequence are of particular interest, however, and not simply because of their association with SF models. Intuitively, scaling degrees appear to create great “diversity” in G(D). Certainly the graphs in Figure 5 are extremely diverse, despite having identical scaling degree D, but to what extent does this depend on D being scaling? As a partial answer, note that at the extremes of variability are m-regular graphs with CV(D) = 0, which have D = {m, m, m, . . . , m} for some m, and perfect star-like graphs with D = {n − 1, 1, 1, 1, . . . , 1}, which have maximal CV(D) ≈ √n/2. In both of these extremes all graphs in G(D) are isomorphic and thus have only one value of s(g) for all g ∈ G(D), so from this measure the space G(D) of graphs lacks any diversity. In contrast, when D is scaling with α < 2, CV(D) → ∞ and it is easy to construct g such that s(g)/s_max → 0 as n → ∞, suggesting a possibly enormous diversity in G(D).
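The two extremes of variability can be checked numerically. The sketch below assumes that CV(D) is the standard deviation of the degree sequence divided by its mean, and it uses the population standard deviation; both the helper name `cv` and that choice of estimator are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, pstdev


def cv(degrees):
    """Coefficient of variation of a degree sequence: std(D) / mean(D)."""
    return pstdev(degrees) / mean(degrees)


n = 10_000
regular = [3] * n                 # m-regular sequence with m = 3
star = [n - 1] + [1] * (n - 1)    # one hub attached to n - 1 leaves

cv_regular = cv(regular)
cv_star = cv(star)
star_estimate = sqrt(n) / 2       # the approximation quoted in the text
```

For the regular sequence the coefficient of variation is zero, and for the star it is close to `star_estimate`, in line with the approximation CV(D) ≈ √n/2.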

Before proceeding with a discussion of some of the features of the s-metric as well as for graphs having high s(g) values, we revisit the four toy networks in Figure 5 and consider the combined implications of the performance-oriented metric Perf(g) introduced in (7) and the connectivity-specific metric s(g) defined above. Figure 6 is a projection of g ∈ G(D) onto a plane of Perf(g) versus s(g) and will be useful throughout in visualizing the extreme diversity in the set G(D) for D in Figure 5. Of relevance to the Internet application is that graphs with high s(g)-values tend to have low performance, although a low s(g)-value is no guarantee of good performance, as seen by the network in Figure 5(c) which has both small s(g) and small Perf(g). The additional points in the Perf(g) vs. s(g) plane involve degree-preserving rewiring and will be discussed in more detail below.
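Pairwise degree-preserving rewiring can be sketched as follows: pick two links, swap one endpoint of each, and accept the swap only if the result is still a simple, connected graph, so that it remains in G(D). The function names, the rejection rule, and the small starting tree are assumptions for illustration and are not taken from the original construction.

```python
import random


def is_connected(edges):
    """Depth-first search check that all nodes appearing in edges are mutually reachable."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)


def degree_sequence(edges):
    """Return the node degrees in non-increasing order."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return sorted(deg.values(), reverse=True)


def rewire_once(edges, rng):
    """Attempt one swap (a,b),(c,d) -> (a,d),(c,b); return edges unchanged if rejected."""
    e1, e2 = rng.sample(edges, 2)
    a, b = e1
    c, d = e2
    if rng.random() < 0.5:          # choose one of the two possible swaps at random
        c, d = d, c
    if len({a, b, c, d}) < 4:       # shared endpoint: would create a self-loop or no change
        return edges
    present = {frozenset(e) for e in edges}
    if frozenset((a, d)) in present or frozenset((c, b)) in present:
        return edges                # would create a parallel link
    candidate = [e for e in edges if e != e1 and e != e2] + [(a, d), (c, b)]
    return candidate if is_connected(candidate) else edges


rng = random.Random(0)              # fixed seed so the run is repeatable
start = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5)]
g = start
for _ in range(100):
    g = rewire_once(g, rng)
```

Every accepted swap leaves each node's degree unchanged, so the rewired graph `g` stays in the same set G(D) as `start` even though its s(g) value may differ.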

These observations undermine the claims in the SF literature that are based on scaling degree alone implying any additional graph properties. On the other hand, they also suggest that the sheer diversity of G(D) for scaling D makes it an interesting object of study. We will not further compare G(D) for scaling versus non-scaling D or attempt to define “diversity” precisely here, though these are clearly interesting topics. We will focus on exploring the nature of the diversity of G(D) for scaling D such as in Figure 5.

In what follows, we will provide evidence that graphs with high s(g) enjoy certain self-similarity properties, and we also consider the effects of random degree-preserving rewiring on s(g). In so doing, we argue that the s-metric, as well as many of the other definitions and properties that we will present, is of interest for any graph or any set of graphs. However, we will continue to focus our attention primarily on simple connected graphs having scaling degree sequences. The main reason is that many applications naturally have simple connected graphs. For example, while the Internet protocols in principle allow router connectivity to be nonsimple, it is relatively rare and has little impact on network properties. Nevertheless, using other sets in many cases is preferable and will arise naturally in the sequel. Furthermore, while our interest will be in simple, connected graphs with scaling degree sequence, we will often specialize our presentation to trees, in order to simplify the development and maximize contact with the existing SF literature. To this end, we will exploit the construction of the s_max graph to sketch some of these relationships in more detail.

4.1.2 The s_max Graph and Preferential Attachment

Given a particular degree sequence D, it is possible to construct the s_max graph of G(D) using a deterministic procedure, and both the generation process and its resulting structure are informative about the s(g) metric. Here, we describe this construction at a high level of abstraction (with all details deferred to Appendix A) in order to provide appropriate context for the discussion of key features that is to follow.

The basic idea for constructing the s_max graph is to order all potential links (i, j) for all i, j ∈ V according to their weight d_i d_j and then add them one at a time in a manner that results in a simple, connected graph having degree sequence D. While simple enough in concept, this type of “greedy” heuristic procedure may have difficulty achieving the intended sequence D due to the global constraints imposed by connectivity requirements. While the specific conditions under which this procedure is guaranteed to yield the s_max graph are deferred to Appendix A, we note that this type of construction works well in practice for the networks under consideration in this paper, particularly those in Figure 5.

[Figure 6: scatter plot of Performance Perf(g) (logarithmic axis) versus s(g)/s_max. Labeled points: HOTnet, S(g) = 0.3952, P(g) = 2.93 × 10^11; “Poor Design”, S(g) = 0.4536, P(g) = 1.02 × 10^10; Random, S(g) = 0.8098, P(g) = 9.23 × 10^9; HSFnet, S(g) = 0.9791, P(g) = 6.08 × 10^9; “S=1”, S(g) = 1, P(g) = 6.73 × 10^9.]

Figure 6: EXPLORATION OF THE SPACE OF CONNECTED NETWORK GRAPHS HAVING EXACTLY THE SAME (POWER LAW) DEGREE SEQUENCE. Values for the four networks are shown together with the values for other networks obtained by pairwise degree-preserving rewiring. Networks that are “one-rewiring” away from their starting point are shown in a corresponding color, while other networks obtained from more than one rewiring are shown in gray. Ultimately, only a careful design process explicitly incorporating technological constraints, traffic demands, or link costs yields high-performance networks. In contrast, equivalent networks resulting from even carefully crafted random constructions result in poor-performing networks.

In cases where the intended degree sequence D satisfies ∑_i d_i = 2(n − 1), all simple connected graphs having degree sequence D correspond to trees (i.e., acyclic graphs), and this simple construction procedure is guaranteed to result in an s_max graph. Acyclic s_max graphs have several nice properties that we will exploit throughout this presentation. It is worth noting that since adding links to a tree is equivalent to adding nodes one at a time, construction of acyclic s_max graphs can be viewed essentially as a type of deterministic preferential attachment. Perhaps more importantly, by its construction the s_max tree has a natural ordering within its overall structure, which we now summarize.

Recall that a tree can be organized into hierarchies by designating a single vertex as the “root” of the tree from which all branches emanate. This is equivalent to assigning a direction to each arc such that all arcs flow away from the root. As a result, each vertex of the graph becomes naturally associated with a particular “level” of the hierarchy, adjacent vertices are separated by a single level, and the position of a vertex within the hierarchy is in relation to the root. For example, assuming the root of the tree is at level 0 (the “highest” level), then its neighbors are at level 1 (“below” level 0), their other neighbors in turn are at level 2 (“below” level 1), and so on.

Mathematically, the choice of the root vertex is an arbitrary one; however, for the $s_{\max}$ tree, the vertex with largest degree sits as the natural root and is the most “central” (a notion we will formalize below). With this selection, two vertices $u, v \in V$ that are directly connected to each other in the acyclic $s_{\max}$ graph have the following relative position within the hierarchy. If $d_u \geq d_v$, then vertex $u$ is one level “above” vertex $v$ (alternatively, we say that vertex $u$ is “upstream” of vertex $v$ or that vertex $v$ is “downstream” from vertex $u$). Thus, moving up the hierarchy of the tree (i.e., upstream) means that vertex degrees are (eventually) becoming larger, and moving down the hierarchy (i.e., downstream) means that vertex degrees are (eventually) becoming smaller.

In order to illustrate this natural ordering within the $s_{\max}$ tree, we introduce the following notation. For any vertex $v \in V$, let $\mathcal{N}(v)$ denote the set of neighboring vertices for $v$, where for simple connected graphs $|\mathcal{N}(v)| = d_v$. For an acyclic graph $g$, define $g(v)$ to be the subgraph (subtree) of vertex $v$; that is, $g(v)$ is the subtree containing vertex $v$ along with all downstream nodes. Since the notion of upstream/downstream is relative to the overall root of the graph, for convenience we will additionally use the notation $g(v,u)$ to represent the subgraph of the vertex $v$ that is itself connected to upstream neighbor vertex $u$. The (ordered) degree sequence of the subtree $g(v)$ (equivalently for $g(v,u)$) is then $D(g(v)) = \{d_1^{(v)}, d_2^{(v)}, \ldots\}$, where $d_1^{(v)} = d_v$ and the rest of the sequence represents the degrees of all downstream nodes. $D(g(v))$ is clearly a subsequence of $D(g)$. Finally, let $E(g(v))$ denote the set of edges in the subtree $g(v)$.

For this subtree, we define its $s$-value as

$$ s(g(v,u)) = d_v d_u + \sum_{(j,k) \in E(g(v))} d_j d_k. \tag{9} $$

This definition provides a natural decomposition for the $s$-metric, in that for any vertex $v \in V$, we can write

$$ s(g) = \sum_{k \in \mathcal{N}(v)} s(g(k,v)). $$

Furthermore, the $s$-value for any subtree can be defined as a recursive relationship on its downstream subtrees, specifically

$$ s(g(v,u)) = d_v d_u + \sum_{k \in \mathcal{N}(v) \setminus u} s(g(k,v)). $$
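The recursion and the decomposition can be checked numerically on a small tree. The sketch below is illustrative only: the example tree and the helper names `build_adjacency` and `subtree_s` are assumptions, and the recursive function simply mirrors the recursive relationship for $s(g(v,u))$.

```python
def build_adjacency(edges):
    """Adjacency lists of an undirected graph given as a list of links."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def subtree_s(adj, v, u):
    """s(g(v,u)): weight of link (u, v) plus the s-values of all subtrees
    hanging off v on the side away from u."""
    d_v, d_u = len(adj[v]), len(adj[u])
    return d_v * d_u + sum(subtree_s(adj, k, v) for k in adj[v] if k != u)


# Hypothetical example tree with degrees 3, 3, 2, 1, 1, 1, 1.
edges = [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5), (2, 6)]
adj = build_adjacency(edges)

# Direct evaluation of s(g) as the sum of d_i * d_j over all links.
s_total = sum(len(adj[a]) * len(adj[b]) for a, b in edges)

# Decomposition: for every vertex v, summing s(g(k,v)) over its
# neighbors k should recover s(g).
decomposed = {v: sum(subtree_s(adj, k, v) for k in adj[v]) for v in adj}
print(s_total, decomposed)
```

Every entry of `decomposed` equals `s_total`, whichever vertex is treated as the reference point.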

Proposition 1. Let $g$ be the $s_{\max}$ acyclic graph corresponding to degree sequence $D$. Then for two vertices $u, v \in V$ with $d_u > d_v$ it necessarily follows that

(a) vertex $v$ cannot be upstream from vertex $u$;

(b) the number of vertices in $g(v)$ cannot be greater than the number of vertices in $g(u)$ (i.e., $|D(g(u))| \geq |D(g(v))|$);

(c) the degree sequence of $g(u)$ dominates that of $g(v)$ (i.e., $d_1^{(u)} \geq d_1^{(v)}$, $d_2^{(u)} \geq d_2^{(v)}$, $\ldots$); and

(d) $s(g(u)) > s(g(v))$.

Although we do not prove each of these statements formally, each of parts (a)-(d) is true by simple contradiction. Essentially, if any of these statements is false, there is a rewiring operation that can be performed on the graph $g$ that increases its $s$-value, thereby violating the assumption that $g$ is the $s_{\max}$ graph. See Appendix A for additional information.

Proposition 2. Let $g$ be the $s_{\max}$ acyclic graph corresponding to degree sequence $D$. Then it necessarily follows that for each $v \in V$ and any $k \neq v \in V$, the subgraph $g(v)$ maximizes $s(g(v,k))$ for the degree sequence $D(g(v))$.


The proof of Proposition 2 follows from an inductive argument that starts with the leaves of the tree and works its way upstream. Essentially, in order for a tree to be the $s_{\max}$ acyclic graph, each of its branches must be the $s_{\max}$ subtree on the corresponding degree subsequence, and this must hold at all levels of the hierarchy.

4.2 The $s$-Metric and Node Centrality

While considerable attention has been devoted to network node degree sequences in order to measure the structure of complex networks, it is clear that such sequences alone are insufficient to characterize the aggregate structure of a graph. Figure 5 has shown that high degree nodes can exist at the periphery of the network or at its core, with serious consequences for issues such as network performance and robustness in the presence of node loss. At the same time, it is clear from the $s_{\max}$ construction procedure that graphs with the largest $s(g)$ values will have their highest degree nodes located in the network core. Thus, an important question relates to the centrality of individual high-degree nodes within the larger network and how this relates, if at all, to the $s$-metric for graph structure. Again, the answer to this question helps to quantify the role that individual “hub” nodes play in the overall structure of a network.

There are several possible means for measuring node centrality, and in the context of the Internet, one such measure is the total throughput (or utilization) of a node when the network supports its maximum flow as defined in (7). The idea is that under a gravity model in which traffic demand occurs between all node pairs, nodes that are highly utilized are central to the overall ability of the network to carry traffic. Figure 7 shows the utilization of individual nodes within HSFnet and HOTnet, when each network supports its respective maximum flow, along with the corresponding degree for each node. The picture for HOTnet illustrates that the most “central” nodes are in fact low-degree nodes, which correspond to the core routers in Figure 5(c). In contrast, the node with highest utilization in HSFnet is the highest degree node, corresponding to the “central hub” in Figure 5(a).

Another, more graph theoretic, measure of node centrality is its so-called betweenness (also known as betweenness centrality), which is most often calculated as the fraction of shortest paths between node pairs that pass through the node of interest [40]. Define $\sigma_{st}$ to be the number of shortest paths between two nodes $s$ and $t$. Then, the betweenness centrality of any vertex $v$ can be computed as

$$ C_b(v) = \frac{\sum_{s<t \in V} \sigma_{st}(v)}{\sum_{s<t \in V} \sigma_{st}}, $$

where $\sigma_{st}(v)$ is the number of shortest paths between $s$ and $t$ that pass through node $v$. In this manner, betweenness centrality provides a measure of the traffic load that a node must handle. An alternate interpretation is that it measures the influence that an individual node has in the spread of information within the network.
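A direct, unoptimized evaluation of this ratio can be sketched with breadth-first search. The code below assumes a connected, unweighted, undirected graph given as adjacency lists, and it counts a path as passing through $v$ only when $v$ is an interior node (that is, $v \neq s$ and $v \neq t$); both are assumptions of this sketch, and the function names are hypothetical. Note that this is the ratio of sums defined above, not the more common average of per-pair fractions.

```python
from collections import deque


def bfs_counts(adj, src):
    """Distances and numbers of shortest paths from src to every node."""
    dist = {src: 0}
    sigma = {src: 1}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                sigma[y] = 0
                queue.append(y)
            if dist[y] == dist[x] + 1:
                sigma[y] += sigma[x]
    return dist, sigma


def betweenness(adj):
    """C_b(v): shortest paths through v divided by all shortest paths."""
    nodes = sorted(adj)
    info = {x: bfs_counts(adj, x) for x in nodes}
    total = 0
    through = {v: 0 for v in nodes}
    for i, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for t in nodes[i + 1:]:
            total += sigma_s[t]
            for v in nodes:
                if v == s or v == t:
                    continue
                dist_v, sigma_v = info[v]
                # v lies on a shortest s-t path iff the distances add up.
                if dist_s[v] + dist_v[t] == dist_s[t]:
                    through[v] += sigma_s[v] * sigma_v[t]
    return {v: through[v] / total for v in nodes}


cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
cb_cycle = betweenness(cycle)
cb_star = betweenness(star)
print(cb_cycle, cb_star)
```

In the 4-cycle there are 8 shortest paths in total and each node is interior to exactly one of them; in the star, the hub is interior to 3 of the 6 paths and the leaves to none.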

Newman [72] introduces a more general measure of betweenness centrality that includes the flow along all paths (not just the shortest ones), and based on an approach using random walks demonstrates how this quantity can be computed by matrix methods. Applying this alternate metric from [72] to the simple annotated graphs in Figure 5, we observe in Figure 7 that the high-degree nodes in HSFnet are the most central, and in fact this measure of betweenness centrality increases with node degree. In contrast, most of the nodes in HOTnet that are central are not high degree nodes, but the low-degree core routers.

Understanding the betweenness centrality of individual nodes is considerably simpler in the context of trees. Recall that in an acyclic graph there is exactly one path between any two vertices, making the calculation of $C_b(v)$ rather straightforward. Specifically, observe that $\sum_{s<t \in V} \sigma_{st} = n(n-1)/2$ and that for each $s \neq v \neq t \in V$, $\sigma_{st}(v) \in \{0, 1\}$. This recognition facilitates the following more general statement regarding the centrality of high-degree nodes in the $s_{\max}$ acyclic graph.

Proposition 3. Let $g$ be the $s_{\max}$ acyclic graph for degree sequence $D$, and consider two nodes $u, v \in V$ satisfying $d_u > d_v$. Then, it necessarily follows that $C_b(u) > C_b(v)$.

The proof of Proposition 3 can be found in Appendix A, along with the proof of the $s_{\max}$ construction. Thus, the highest degree nodes in the $s_{\max}$ acyclic graph are the most central. More generally for graphs that are not trees, we believe that there is a direct relationship between high-degree “hub” nodes in large-$s(g)$ graphs and a “central” role in overall network connectivity, but this has not been formally proven.
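For trees, the count of paths through a node reduces to counting pairs of other nodes that lie in different branches of that node. The sketch below applies this to one small example tree (an assumed example, consistent with a greedy highest-degree-first attachment) and checks the ordering stated in Proposition 3 on that example only; it is not a proof, and the function name is hypothetical.

```python
def tree_paths_through(n, edges):
    """For each node v of a tree on nodes 0..n-1, count the pairs s < t
    (both different from v) whose unique path passes through v."""
    adj = {i: [] for i in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    def branch_size(start, blocked):
        # Number of nodes reachable from start without crossing blocked.
        seen = {blocked, start}
        stack = [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return len(seen) - 1

    through = {}
    for v in range(n):
        sizes = [branch_size(k, v) for k in adj[v]]
        others = n - 1
        same_branch = sum(m * (m - 1) // 2 for m in sizes)
        through[v] = others * (others - 1) // 2 - same_branch
    return through


n = 7
edges = [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5), (2, 6)]
degree = [3, 3, 2, 1, 1, 1, 1]
through = tree_paths_through(n, edges)
cb = {v: through[v] / (n * (n - 1) / 2) for v in through}

# Proposition 3 on this example: strictly larger degree, larger centrality.
ordered = all(cb[u] > cb[v]
              for u in range(n) for v in range(n) if degree[u] > degree[v])
print(through, ordered)
```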

4.3 The $s$-Metric and Self-Similarity

When viewing graphs as multiscale objects, natural transformations that yield simplified graphs are pruning of nodes at the graph periphery and/or collapsing of nodes, although these are only the simplest of many possible “coarse-graining” operations that can be performed on graphs. These transformations are of particular interest because they are often inherent in measurement processes that are aimed at detecting the connectivity structure of actual networks. We will use these transformations to motivate that there is a plausible relationship between high-$s(g)$ graphs and self-similarity, as defined by these simple operations. We then consider the transformation of random pairwise degree-preserving (link) rewiring that suggests a more formal definition of the notion of a self-similar graph.

4.3.1 Graph Trimming by Link Removal

Here, we consider the properties of $s_{\max}$ graphs under the operation of graph trimming, in which links are removed from the graph one at a time. Recall that by construction, the links in the $s_{\max}$ graph are selected from a list of potential links (denoted as $(i, j)$ for $i, j \in V$) that are ordered according to their weights $d_i d_j$. Denote the (ordered) list of links in the $s_{\max}$ graph as $E = \{(i_1, j_1), (i_2, j_2), \ldots, (i_l, j_l)\}$, and consider a procedure that removes links in reverse order, starting with $(i_l, j_l)$. Define $g_k$ to be the remaining graph after the removal of all but the first $k-1$ links (i.e., after removing $(i_l, j_l), (i_{l-1}, j_{l-1}), \ldots, (i_{k+1}, j_{k+1}), (i_k, j_k)$). The remaining graph will have a partial degree sequence $D_k = \{d'_1, d'_2, \ldots, d'_k\}$, where $d'_m \leq d_m$, $m = 1, 2, \ldots, k$, but the original ordering is preserved, i.e., $d'_1 \geq d'_2 \geq \ldots \geq d'_k$. This last statement holds because when removing links starting with the smallest $d_i d_j$, nodes will “lose” links in reverse order according to their node degree.

[Figure 7: two log-log scatter plots for HOTnet and HSFnet. Left: Router Utilization (bps) versus Node Degree. Right: Node Betweenness versus Node Degree.]

Figure 7: Left: The centrality of nodes as defined by total traffic throughput. The most “central” nodes in HOTnet are the low-degree core routers while the most “central” node in HSFnet is the highest-degree “hub”. The HOTnet throughputs are close to the router bandwidth constraints. Right: The betweenness centrality versus node degree for non-degree-one nodes from both the HSFnet and HOTnet graphs in Figure 5. In HSFnet, node centrality increases with node degree, and the highest degree nodes are the most “central”. In contrast, many of the most “central” nodes in HOTnet have low degree, and the highest degree nodes are significantly less “central” than in HSFnet.

Observe for trees that removing a link is equivalent to removing a node (or subtree), so we could have equivalently defined this process in terms of “node pruning”. As a result, for acyclic $s_{\max}$ graphs, it is easy to see the following.

Proposition 4. Let $g$ be an acyclic $s_{\max}$ graph satisfying ordered degree sequence $D = \{d_1, d_2, \ldots, d_n\}$. For $1 \leq k \leq n$, denote by $g_k$ the acyclic graph obtained by removing (“trimming”) in order nodes $n, n-1, \ldots, k+1$ from $g$. Then, $g_k$ is the $s_{\max}$ graph for degree sequence $D_k = \{d'_1, d'_2, \ldots, d'_k\}$.

The proof of Proposition 4 follows directly from our proof of the construction of the $s_{\max}$ graph for trees (see Appendix A). More generally, for graphs exhibiting large $s(g)$-values, properly defined graph operations of link-trimming appear to yield simplified graphs with high $s$-values, thus suggesting a broader notion of self-similarity or invariance under such operations. However, additional work remains to formalize this notion.
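Proposition 4 can be spot-checked on a small example by trimming nodes in reverse order and comparing the $s$-value of each trimmed tree against a brute-force maximum over all labeled trees with the same degrees (enumerated through Prüfer sequences). This is a numerical illustration under assumptions, not a proof: the starting tree is built by a hypothetical greedy rule (attach each node, in order of non-increasing degree, to the first node with an unused link), and all function names are illustrative.

```python
from itertools import permutations


def greedy_tree_edges(d):
    """Hypothetical greedy builder; d must be sorted non-increasing
    and sum to 2(n-1)."""
    free, edges, u = list(d), [], 0
    for v in range(1, len(d)):
        while free[u] == 0:
            u += 1
        edges.append((u, v))
        free[u] -= 1
        free[v] -= 1
    return edges


def prufer_decode(seq, n):
    """Edge list of the labeled tree on 0..n-1 with Prüfer sequence seq."""
    remaining = [1] * n
    for x in seq:
        remaining[x] += 1
    edges = []
    for x in seq:
        leaf = min(i for i in range(n) if remaining[i] == 1)
        edges.append((leaf, x))
        remaining[leaf] = 0
        remaining[x] -= 1
    u, v = [i for i in range(n) if remaining[i] == 1]
    edges.append((u, v))
    return edges


def max_s_brute_force(deg):
    """Largest s-value over all trees in which node i has degree deg[i]."""
    n = len(deg)
    if n == 2:
        return deg[0] * deg[1]
    # Node i appears deg[i] - 1 times in the Prüfer sequence.
    symbols = [i for i in range(n) for _ in range(deg[i] - 1)]
    best = 0
    for seq in set(permutations(symbols)):
        tree = prufer_decode(seq, n)
        best = max(best, sum(deg[a] * deg[b] for a, b in tree))
    return best


D = [3, 3, 2, 1, 1, 1, 1]
edges = greedy_tree_edges(D)
trimmed_s, brute_s = [], []
for k in range(len(D), 1, -1):
    kept = [(a, b) for a, b in edges if a < k and b < k]   # graph g_k
    deg_k = [sum(1 for e in kept if i in e) for i in range(k)]
    trimmed_s.append(sum(deg_k[a] * deg_k[b] for a, b in kept))
    brute_s.append(max_s_brute_force(deg_k))
print(trimmed_s, brute_s)
```

On this example the two lists agree at every trimming step, as the proposition predicts.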

4.3.2 Coarse Graining By Collapsing Nodes

A kind of coarse graining of a graph can be obtained by collapsing existing nodes into aggregate or super nodes and removing any duplicate links emanating from the new nodes, thereby producing simpler graphs. Consider the case of a tree $g$ having degree sequence $D = \{d_1, d_2, \ldots, d_n\}$ satisfying $d_1 \geq d_2 \geq \ldots \geq d_n$ and connected in a manner such that $s(g) = s_{\max}$. Then, as long as node aggregation proceeds in order with the degree sequence (i.e., aggregate nodes $1$ and $2$ into $1'$, then aggregate nodes $1'$ and $3$ into $1''$, and so on), all intermediate graphs $g$ will also have $s(g) = s_{\max}$. To see this, observe that for trees, when aggregating nodes $1$ and $2$, we have an abbreviated degree sequence $D' = \{d'_1, d_3, \ldots, d_n\}$, where $d'_1 = d_1 + d_2 - 2$. Provided that $d_2 \geq 2$, we are guaranteed to have $d'_1 \geq d_3$, and the overall ordering of $D'$ is preserved. Similarly, when aggregating nodes $1'$ and $3$ we have abbreviated degree sequence $D'' = \{d''_1, d_4, \ldots, d_n\}$, where $d''_1 = d_1 + d_2 + d_3 - 4$. So as long as $d_3 \geq 2$, then $d''_1 \geq d_4$ and the ordering of $D''$ is preserved. In general, as long as each new node is aggregated in order and satisfies $d_i \geq 2$, we are guaranteed to maintain an ordered degree sequence. As a result, we have proved the following proposition.

Proposition 5. For acyclic $g \in G(D)$ with $s(g) = s_{\max}$, coarse graining according to the above procedure yields smaller graphs $g' \in G(D')$ that are also the $s_{\max}$ graphs of this truncated degree distribution.
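The degree bookkeeping behind this argument can be illustrated on a small tree. In the sketch below, `collapse` and `degree_map` are hypothetical helpers and the example tree is an assumption; the code only checks the arithmetic $d'_1 = d_1 + d_2 - 2$ and $d''_1 = d_1 + d_2 + d_3 - 4$ and that the ordering is preserved, not the $s_{\max}$ property itself.

```python
def collapse(edges, a, b):
    """Merge node b into node a, dropping the resulting self-loop and
    any duplicate links."""
    merged = set()
    for x, y in edges:
        x = a if x == b else x
        y = a if y == b else y
        if x != y:
            merged.add((min(x, y), max(x, y)))
    return sorted(merged)


def degree_map(edges):
    """Degree of every node that appears in the edge list."""
    deg = {}
    for x, y in edges:
        deg[x] = deg.get(x, 0) + 1
        deg[y] = deg.get(y, 0) + 1
    return deg


# Example tree with ordered degrees d = (3, 3, 2, 1, 1, 1, 1).
edges = [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5), (2, 6)]
d = degree_map(edges)

first = collapse(edges, 0, 1)     # aggregate nodes 1 and 2 of the text
d1 = degree_map(first)
second = collapse(first, 0, 2)    # aggregate the result with node 3
d2 = degree_map(second)

print(sorted(d1.values(), reverse=True), sorted(d2.values(), reverse=True))
```

Here node labels start at 0, so nodes 0, 1, 2 in the code play the roles of nodes 1, 2, 3 in the text.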

For cyclic graphs, this type of node aggregation operation maintains $s_{\max}$ properties only if the resulting degree sequence remains ordered, i.e., $d'_1 \geq d_3 \geq d_4$ after the first coarse graining operation and $d''_1 \geq d_4 \geq d_5$ after the second coarse graining operation, etc. It is relatively easy to generate cases where arbitrary node aggregation violates this condition and the resulting graph is no longer self-similar in the sense of having a large $s(g)$-value. However, when this condition is satisfied, the resulting simpler graphs seem to satisfy a broader self-similar property. Specifically, for high-$s(g)$ graphs $g \in G(D)$, properly defined graph operations of coarse-graining appear to yield simplified graphs in $G(D)$ with high $s$-values (i.e., such graphs are self-similar or invariant under proper coarse-graining), but this has not been proved.

These are of course not the only coarse graining, pruning, or merging processes that might be of interest, and for which $s_{\max}$ graphs are preserved, but they are perhaps the simplest to state and prove.

4.4 Self-Similar and Self-Dissimilar

While graph transformations such as link trimming or node collapse reflect some aspects of what it means for a graph to be self-similar, the graph transformation of random pairwise degree-preserving link rewiring offers additional notions of self-similarity which potentially are even richer and also connected with the claim in the SF literature that SF graphs are preserved under such rewirings.

4.4.1 Subgraph-Based Motifs

[Figure 8(a): diagrams of the three motifs (i), (ii), and (iii) on nodes a, b, c, d, each shown for a graph $g \in G(D)$ together with its two rewiring outcomes $g'$.]

Figure 8(b): Outcome from degree-preserving rewiring.

                                         g' ∈ G(D)                             g' ∉ G(D)
                                         simple, connected                     simple, not connected      not simple, not connected
Case/Motif  Count                        g' = g      g' ≠ g                    g' ≠ g                     g' ≠ g
(i)         d^2/2 - l                    1           0                         0                          1
(ii)        s - d^2 + l                  0           1                         0                          1
(iii)       d^2/2 - s + (l^2 - l)/2      0           1                         1                          0
Totals      (l^2 - l)/2                  d^2/2 - l   (l^2 + l)/2 - d^2/2       d^2/2 - s + (l^2 - l)/2    s - d^2/2

Figure 8: (a) Three possible subgraph-based motifs in degree-preserving rewiring in acyclic graphs. Blue links represent links to be rewired. Rewiring operations that result in non-simple graphs (shaded) are assumed to revert to the original configuration. Thus defined, rewiring of motif (i) does not result in any new graphs, rewiring of motif (ii) results in one possible new graph, and rewiring of motif (iii) results in two possible new graphs. (b) The numbers of the three motifs and successively the number for each possible rewiring outcome. We distinguish between equal, not equal but connected and simple, not connected but simple, and not simple graphs that are similar to each graph with the given motif selected for rewiring. The total number of cases (column sum) is $(l^2 - l)/2$, while the total number (row sum) of outcomes is twice that at $l^2 - l$. Here, we use the abbreviated notation $d^2 = \sum_k d_k^2$ and $s = s(g)$, with $l$ equal to the number of links in the graph.

For any graph $g \in G(D)$, consider the set of local degree-preserving rewirings of distinct pairs of links. There are $\binom{l}{2} = l(l-1)/2$ pairs of different links on which degree-preserving rewiring can occur. Each pair of links defines its own network subgraph, and in the case where $g$ is an acyclic graph (i.e., a tree), these form three distinct types of subgraphs, as shown in Figure 8(a). Using the notation $d^2 = \sum_k d_k^2$ and $s = s(g)$, we can enumerate the number of these subgraphs as follows:

1. The two links share a common node. There are $\sum_{i=1}^{n} \binom{d_i}{2} = \frac{1}{2} d^2 - l$ possible ways that this can occur.

2. The links have two nodes that are connected by a third link. There are $\sum_{(i,j) \in E} (d_i - 1)(d_j - 1) = s - d^2 + l$ possible ways that this can occur.

3. The links have end points that do not share any direct connections. There are $\binom{l}{2} - \sum_{i=1}^{n} \binom{d_i}{2} - \sum_{(i,j) \in E} (d_i - 1)(d_j - 1) = \frac{1}{2} d^2 - s + \frac{1}{2}(l^2 - l)$ possible ways that this can occur.

Collectively, these three basic subgraphs account for all possible $\binom{l}{2} = l(l-1)/2$ pairs of different links. The subgraphs in cases (i) and (ii) are themselves trees, while the subgraph in case (iii) is not. We will refer to these three cases for subgraphs as “motifs”, in the spirit of [70], noting that our notion of subgraph-based motifs is motivated by the operation of random rewiring to be discussed below.
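The three closed-form counts can be compared against a brute-force classification of all link pairs in a small tree. The sketch below uses an assumed example tree and a hypothetical function `motif_counts`; `d2` stands for $\sum_k d_k^2$, and integer division is safe because that sum always has the same parity as $\sum_k d_k = 2l$.

```python
from itertools import combinations


def motif_counts(edges):
    """Classify every pair of distinct links of a tree as motif
    (i) shared node, (ii) joined by a third link, or (iii) neither."""
    edge_set = {frozenset(e) for e in edges}
    counts = {"i": 0, "ii": 0, "iii": 0}
    for (a, b), (c, d) in combinations(edges, 2):
        if {a, b} & {c, d}:
            counts["i"] += 1
        elif any(frozenset((x, y)) in edge_set
                 for x in (a, b) for y in (c, d)):
            counts["ii"] += 1
        else:
            counts["iii"] += 1
    return counts


# Example tree with degrees 3, 3, 2, 1, 1, 1, 1.
edges = [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5), (2, 6)]
n = 7
deg = [0] * n
for a, b in edges:
    deg[a] += 1
    deg[b] += 1

l = len(edges)
d2 = sum(x * x for x in deg)
s = sum(deg[a] * deg[b] for a, b in edges)

formulas = {
    "i": d2 // 2 - l,
    "ii": s - d2 + l,
    "iii": d2 // 2 - s + (l * l - l) // 2,
}
counts = motif_counts(edges)
print(counts, formulas)
```

The enumerated counts and the formulas agree on this example, and the three counts sum to $l(l-1)/2 = 15$.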

The simplest and most striking feature of the relationship between motifs and $s(g)$ for acyclic graphs is that we can derive formulas for the number of subgraph-based (local) motifs (and the outcomes of rewiring) entirely in terms of $d^2$, $s = s(g)$, and $l$. Thus, for example, we can see that graphs having higher $d^2$ (equivalently higher $CV$) values have fewer of the second motif. If we fix $D$, and thus $l$ and $d^2$, for all graphs of interest, then the only remaining dependence is on $s$, and graphs with higher $s(g)$-values contain fewer disconnected (case iii) motifs. This can be interpreted as a motif-level connection between $s(g)$ and self-similarity, in that graphs with higher $s(g)$ contain more motifs that are themselves trees, and thus more similar to the graph as a whole. Graphs having lower $s(g)$ have more motifs of type (iii) that are disconnected and thus dissimilar to the graph as a whole. Thus high-$s(g)$ graphs have this “motif self-similarity,” low-$s(g)$ graphs have “motif self-dissimilarity” and we can precisely define a measure of this kind of self-similarity and self-dissimilarity as follows.

Definition 2. For a graph $g \in G(D)$, another measure of the extent to which $g$ is self-similar is the metric $ss(g)$ defined as the number of motifs (cases i-ii) that are themselves connected graphs. Accordingly, the measure of self-dissimilarity $sd(g)$ is then the number of motifs (case iii) that are disconnected.

For trees, $ss(g) = s - d^2/2$ and $sd(g) = -s + (l^2 - l + d^2)/2$, so this local motif self-similarity (self-dissimilarity) is essentially equivalent to high-$s(g)$ (resp. low-$s(g)$). As noted previously, network motifs have already been used as a way to study self-similarity and coarse graining [58, 59]. There, one defines a recursive procedure by which node connectivity patterns become represented as a single node (i.e., a different kind of motif), and it was shown that many important technological and biological networks were self-dissimilar, in the sense that coarse-grained counterparts display very different motifs at each level of abstraction. Our notion of motif self-similarity is much simpler, but consistent, in that the Internet has extremely low $s(g)$ and is thus minimally self-similar at the motif level. The next question is whether high $s(g)$ is connected with “self-similar” in the sense of being preserved under rewiring.

4.4.2 Degree-preserving Rewiring

We can also connect $s(g)$ in several ways with the effect that local rewiring has on the global structure of graphs in the set $G(D)$. Recall the above process by which two network links are selected at random for degree-preserving rewiring, and note that when applied to a graph $g \in G(D)$, there are four possible distinguishable outcomes:

1. $g' = g$ with $g' \in G(D)$: the new graph $g'$ is equal to the original graph $g$ (and therefore also a simple, connected graph in $G(D)$);

2. $g' \neq g$ with $g' \in G(D)$: the new graph $g'$ is not equal to $g$, but is still a simple, connected graph in the set $G(D)$ (note that this can include $g'$ which are isomorphic to $g$);

3. $g' \neq g$ with $g' \notin G(D)$: the new graph $g'$ is still simple, but is not connected;

4. $g' \neq g$ with $g' \notin G(D)$: the new graph $g'$ is no longer simple (i.e., it either contains self-loops or parallel links).

There are two possible outcomes from the rewiring of any particular pair of links, as shown in Figure 8(a), and this yields a total of $2\binom{l}{2} = l(l-1)$ possible outcomes of the rewiring process. In our discussion here, we ignore isomorphisms and assume that all non-equal graphs are different.

We are ultimately interested in retaining within our new definitions the notion that high $s(g)$ graphs are somehow preserved under rewiring provided this is sufficiently random and degrees are preserved. Scaling is of course trivially preserved by any degree-preserving rewiring, but high $s(g)$ value is not. Again, Figure 5 provides a clear example, since successive rewirings can take any of these graphs to any other. More interesting for high $s(g)$ graphs is the effect of random rewiring. Consider again the $\mathrm{Perf}(g)$ vs. $s(g)$ plane from Figure 6. In addition to the four networks from Figure 5, we show the $\mathrm{Perf}(g)$ and $s(g)$ values for other graphs in $G(D)$ obtained by degree-preserving rewiring from the initial four networks. This is done by selecting uniformly and randomly from the $l(l-1)$ different rewirings of the $l(l-1)/2$ different pairs of links, and restricting rewiring outcomes to elements of $G(D)$ by resetting all disconnected or nonsimple neighbors to equal. Points that match the color of one of the four networks are only one rewiring operation away, while points represented in gray are more than one rewiring operation away.

The connection of the results in Figure 8(b) to motif counts is, however, more transparent than their connection to the consequences of successive rewiring. Nevertheless, we can use the results in Figure 8(b) to describe related ways in which low $s(g)$ graphs are “destroyed” by random rewiring. For any graph $g$, we can enumerate all possible pairs of links on which degree-preserving rewiring can take place and count all those that result in equal or non-equal graphs. In Figure 8, we consider the four cases for degree-preserving rewiring of acyclic graphs, and we count the number of ways each can occur. For motifs (i) and (ii), it is possible to check locally for outcomes that produce non-simple graphs, and these cases correspond to the shaded outcomes in Figure 8(a). If we a priori exclude all such nonsimple rewirings, then there remain a total of $l(l-1) - s + d^2/2$ simple similar neighbors of a tree. We can define a measure of local rewiring self-dissimilarity for trees as follows.

Definition 3. For a tree $g \in G(D)$, we measure the extent to which $g$ is self-dissimilar under local rewiring by the metric $rsd(g)$ defined as the number of simple similar neighbors that are disconnected graphs.

For trees, $rsd(g) = sd(g) = -s + (l^2 - l + d^2)/2$, so this local rewiring self-dissimilarity is identical to motif self-dissimilarity and directly related to low $s(g)$ values. This is because only motif (iii) results in simple but not connected similar neighbors.
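The outcome totals can be reproduced by enumerating both rewirings of every link pair in a small tree and classifying each result. The sketch below is an illustration on one assumed example tree; the rewiring convention (links $(a,b),(c,d)$ become either $(a,c),(b,d)$ or $(a,d),(b,c)$) and the function names are assumptions of this sketch.

```python
from itertools import combinations


def is_connected(n, edges):
    """True if the graph on nodes 0..n-1 with these links is connected."""
    adj = {i: [] for i in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, stack = {0}, [0]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == n


def rewiring_outcomes(n, edges):
    """Tally the l(l-1) outcomes of degree-preserving pairwise rewiring."""
    original = sorted(tuple(sorted(e)) for e in edges)
    tally = {"equal": 0, "connected": 0, "disconnected": 0, "nonsimple": 0}
    for (a, b), (c, d) in combinations(original, 2):
        rest = [e for e in original if e not in ((a, b), (c, d))]
        for new_links in (((a, c), (b, d)), ((a, d), (b, c))):
            new = [tuple(sorted(e)) for e in new_links]
            candidate = sorted(rest + new)
            if candidate == original:
                tally["equal"] += 1
            elif (any(x == y for x, y in new)
                  or len(set(candidate)) < len(candidate)):
                tally["nonsimple"] += 1      # self-loop or parallel link
            elif is_connected(n, candidate):
                tally["connected"] += 1
            else:
                tally["disconnected"] += 1
    return tally


n = 7
edges = [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5), (2, 6)]
deg = [0] * n
for a, b in edges:
    deg[a] += 1
    deg[b] += 1
l = len(edges)
d2 = sum(x * x for x in deg)
s = sum(deg[a] * deg[b] for a, b in edges)

formulas = {
    "equal": d2 // 2 - l,
    "connected": (l * l + l) // 2 - d2 // 2,
    "disconnected": d2 // 2 - s + (l * l - l) // 2,
    "nonsimple": s - d2 // 2,
}
tally = rewiring_outcomes(n, edges)
rsd = tally["disconnected"]
print(tally, formulas, rsd)
```

On this example the number of simple similar neighbors is $l(l-1) - s + d^2/2 = 17$, of which $rsd(g) = 2$ are disconnected.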

4.5 A Coherent Non-Stochastic Picture

Here, we pause to reconsider the features/claims for SF graphs in the existing literature (Section 3.1) in light of our structural approach to graphs with scaling degree sequence $D$. In doing so, we make a simple observation: high-$s(g)$ graphs exhibit most of the features highlighted in the SF literature, but low-$s(g)$ graphs do not, and this provides insight into the diversity of graphs in the space $G(D)$. Perhaps more importantly, given a graph with scaling degree $D$ the $s(g)$ metric provides a “litmus test” as to whether or not the existing SF literature might be relevant to the network under study.

By definition, all graphs in $G(D)$ exhibit power laws in their node degrees provided that $D$ is scaling. However, preferential attachment mechanisms typically yield only high-$s(g)$ graphs; indeed, the $s_{\max}$ construction uses what is essentially the “most preferential” type of attachment mechanism. Furthermore, while all graphs having scaling degree sequence $D$ have high-degree nodes or “hubs”, only for high-$s(g)$ graphs do such hubs tend to be critical for overall connectivity. While it is certainly possible to construct a graph with low $s(g)$ and having a central hub, this need not be the case, and our work to date suggests that most low-$s(g)$ graphs do not have the type of central hubs that create an “Achilles’ heel”. Additionally, we have illustrated that high-$s(g)$ graphs exhibit striking self-similarity properties, including that they are largely preserved under appropriately defined graph transformations of trimming, coarse graining and random pairwise degree-preserving rewiring. In the case of random rewiring, we offered numerical evidence and heuristic arguments in support of the conjecture that in general high-$s(g)$ graphs are the likely outcome of performing such rewiring operations, whereas low-$s(g)$ graphs are unlikely to occur as a result of this process.

Collectively, these results suggest that a definition of “scale-free graphs” that restricts graphs to having both scaling degree $D$ and high-$s(g)$ results in a coherent story. It recovers all of the structural results in the SF literature and provides a possible explanation why some graphs that exhibit power laws in their node degrees do not seem to satisfy other properties highlighted in the SF literature. This non-stochastic picture represents what is arguably a reasonable place to stop with a theory for “scale-free” graphs. However, from a graph theoretic perspective, there is considerably more work that could be done. For example, it may also be possible to expand the discussion of Section 4.4 to account more comprehensively for the way in which local motifs are transformed into one another and to relate our attempts more directly to the approach considered in [70]. Elaborating on the precise relationships and providing a possible interpretation of motifs as capturing a kind of local as well as global self-similarity property of an underlying graph remain interesting open problems. Additionally, we have also seen that the use of degree-preserving rewiring among connected graphs provides one view into the space $G(D)$. However, the geometry of this space is still complicated, and additional work is required to understand its remaining features. For example, our work to date suggests that for scaling $D$ it is impossible to construct a graph that has both high $\mathrm{Perf}(g)$ and high $s(g)$, but this has not been proven. In addition, it will be useful to understand the way that degree-preserving rewiring causes one to “move” within the space $G(D)$ (see for example, [49, 46]).

It is important to emphasize that the purpose of the $s(g)$ metric is to provide insight into the structure of “scale free” graphs and not as a general metric for distinguishing among all possible graphs. Indeed, since the metric fails to distinguish among graphs having low $s(g)$, it provides little insight other than to say that there is tremendous diversity among such graphs. However, if a graph has high $s(g)$, then we believe that there exist strong properties that can be used to understand the structure (and possibly, the behavior) of such systems. In summary, if one wants to understand “scale-free graphs”, then we argue that $s(g)$ is an important metric and highly informative. However, for graphs with low $s(g)$, this metric conveys limited information.

Despite the many appealing features of a theory that considers only non-stochastic properties, most of the SF literature has considered a framework that is inherently stochastic. Thus, we proceed next with a stochastic version of the story, one that connects more directly with the existing literature and common perspective on SF graphs.

5 A Probabilistic Approach

While the introduction and exploration of the $s$-metric fits naturally within standard studies of graph theoretic properties, it differs from the SF literature in that our structural approach does not depend on a probability model underlying the set of graphs of interest. The purpose of this section is to compare our approach with the more conventional probabilistic and ensemble-based views. For many application domains, including the Internet, there seems to be little motivation to assume networks are samples from an ensemble, and our treatment here will be brief while trying to cover this broad subject. Here again, we show that the $s(g)$ metric is potentially interesting and useful, as it has a direct relationship to notions of graph likelihood, graph degree correlation, and graph assortativity. This section also highlights the striking differences in the way that randomness is treated in physics-inspired approaches versus those shaped by mathematics and engineering.

The starting point for most probabilistic approaches to the study of graphs is through the definition of an appropriate statistical ensemble (see for example [40, Section 4.1]).

Definition 4. A statistical ensemble of graphs is defined by

(i) a set $G$ of graphs $g$, and

(ii) a rule that associates a real number (“probability”) $0 \leq P(g) \leq 1$ with each graph $g \in G$ such that $\sum_{g \in G} P(g) = 1$.

To describe an ensemble of graphs, one can either assign a specific weight to each graph or define some process (i.e., a stochastic generator) which results in a weight. For example, in one basic model of random graphs, the set $G$ consists of all graphs with vertex set $V = \{1, 2, \ldots, n\}$ having $l$ edges, and each element in $G$ is assigned the same probability $1/\binom{N}{l}$, where $N = n(n-1)/2$ is the number of vertex pairs. In an alternative random graph model, the set $G$ consists of all graphs with vertex set $V = \{1, 2, \ldots, n\}$ in which the edges are chosen independently and with probability $0 < p < 1$. In this case, the probability $P(g)$ depends on the number of edges in $g$ and is given by $P(g) = p^l (1-p)^{N-l}$, where $l$ denotes the number of edges in $g \in G$.
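Both ensembles can be enumerated explicitly for a very small vertex set to confirm that the weights satisfy the normalization in Definition 4. The sketch below assumes labeled simple graphs on $n = 4$ vertices, so that there are $N = n(n-1)/2 = 6$ possible links; the variable names and the values of `l` and `p` are illustrative.

```python
from itertools import combinations
from math import comb, isclose

n = 4
pairs = list(combinations(range(n), 2))   # every possible link
N = len(pairs)                            # N = n(n-1)/2


def graphs_with_l_edges(l):
    """All labeled simple graphs on n vertices with exactly l links."""
    return list(combinations(pairs, l))


# Ensemble 1: fixed number of links, uniform weights.
l = 3
uniform = [1 / comb(N, l) for _ in graphs_with_l_edges(l)]

# Ensemble 2: each link present independently with probability p.
p = 0.3
independent = [p ** k * (1 - p) ** (N - k)
               for k in range(N + 1)
               for _ in graphs_with_l_edges(k)]

print(len(uniform), sum(uniform), len(independent), sum(independent))
```

The first ensemble contains $\binom{6}{3} = 20$ graphs and the second contains all $2^6 = 64$ graphs; in both cases the weights sum to one up to floating-point rounding.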

The use of stochastic construction procedures to assign statistical weights has so dominated the study of graphs that the assumption of an underlying probability model often becomes implicit. For example, consider the four graph construction procedures listed in [40, p.22] that are claimed to form “the basis of network science,” and include (1) classical random graphs due to Erdos and Renyi [43]; (2) equilibrium random graphs with a given degree distribution such as the Generalized Random Graph (GRG) method [32]; (3) “small-world networks” due to Watts and Strogatz [109]; and (4) networks growing under the mechanism of preferential linking due to Barabasi and Albert [15] and made precise in [24]. All of these construction mechanisms are inherently stochastic and provide a natural means for assigning, at least in principle, probabilities to each element in the corresponding space of realizable graphs. While deterministic (i.e., non-stochastic) construction procedures have been considered [20], their study has been restricted to the treatment of deterministic preferential attachment mechanisms that result in pseudofractal graph structures. Graphs resulting from other types of deterministic constructions are generally ignored in the context of statistical physics-inspired approaches since within the space of all feasible graphs, their likelihood of occurring is typically viewed as vanishingly small.

5.1 A Likelihood Interpretation of s(g)

Using the construction procedure associated with the general model of random graphs with a given expected degree sequence considered in [32] (also called the Generalized Random Graph (GRG) model for short), we show that the s(g) metric allows for a more familiar ensemble-related interpretation as the (relative) likelihood with which the graph g is constructed according to the GRG method. To this end, the GRG model is concerned with generating graphs with given expected degree sequence $D = \{d_1, \ldots, d_n\}$ for vertices $1, \ldots, n$. The link between vertices $i$ and $j$ is chosen independently with probability $p_{ij}$, with $p_{ij}$ proportional to the product $d_i d_j$ (i.e., $p_{ij} = \rho d_i d_j$, where $\rho$ is a sufficiently small constant), and this defines a probability measure $P$ on the space of all simple graphs and thus induces a probability measure on G(D) by conditioning on having degree sequence D. The construction is fairly general and can recover the classic Erdos-Renyi random graphs [43] by taking the expected degree sequence to be $\{pn, pn, \ldots, pn\}$ for constant $p$. As a result of choosing each link $(i, j) \in E$ with a probability that is proportional to $d_i d_j$ in the GRG model, different graphs are typically assigned different probabilities under $P$. This generation method is closely related to the Power Law Random Graph (PLRG) method [2], which also attempts to replicate a given (power law) degree sequence. The PLRG method involves forming a set $L$ of nodes containing as many distinct copies of a given vertex as the degree of that vertex, choosing a random matching of the elements of $L$, and applying a mapping of a given matching into an appropriate (multi)graph. It is believed that the PLRG and GRG models are “basically asymptotically equivalent, subject to bounding error estimates” [2]. Defining the likelihood of a graph g ∈ G(D) as the logarithm of its probability under the measure $P$, we can show that the log likelihood (LLH) of a graph g ∈ G(D) can be computed as

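$$\mathrm{LLH}(g) \approx \kappa + \rho\, s(g), \tag{10}$$

where $\kappa$ is a constant. Note that the probability of any graph $g$ under $P$ is given by [77]

$$P(g) = \prod_{(i,j) \in E} p_{ij} \prod_{(i,j) \notin E} (1 - p_{ij}),$$

and using the fact that under the GRG model, we have $p_{ij} = \rho d_i d_j$, where $D = (d_1, \ldots, d_n)$ is the given degree sequence, we get

$$P(g) = \rho^l \prod_{i \in V} d_i^{d_i} \prod_{(i,j) \notin E} (1 - \rho d_i d_j) = \rho^l \prod_{i \in V} d_i^{d_i} \, \frac{\prod_{i,j \in V} (1 - \rho d_i d_j)}{\prod_{(i,j) \in E} (1 - \rho d_i d_j)}.$$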

Taking the log, we obtain

$$\log P(g) = l \log \rho + \sum_{i \in V} d_i \log d_i + \sum_{i,j \in V} \log(1 - \rho d_i d_j) - \sum_{(i,j) \in E} \log(1 - \rho d_i d_j).$$

Defining

$$\kappa = l \log \rho + \sum_{i \in V} d_i \log d_i + \sum_{i,j \in V} \log(1 - \rho d_i d_j),$$

we observe that $\kappa$ is constant for fixed degree sequence $D$. Also recall that $\log(1 + a) \approx a$ for $|a| \ll 1$. Thus, if $\rho$ is sufficiently small so that $p_{ij} = \rho d_i d_j \ll 1$, we get

$$\mathrm{LLH}(g) = \log P(g) \approx \kappa + \sum_{(i,j) \in E} \rho d_i d_j.$$

This shows that the graph likelihood LLH(g) can be made proportional to s(g), and thus we can interpret $s(g)/s_{\max}$ as the relative likelihood of g ∈ G(D), for the $s_{\max}$-graph has the highest likelihood of all graphs in G(D). Choosing $\rho = 1/\sum_{i \in V} d_i = 1/2l$ in the GRG formulation results in the expectation

$$E(d_i) = \sum_{j=1}^{n} p_{ij} = \sum_{j=1}^{n} \rho d_i d_j = \rho d_i \sum_{j=1}^{n} d_j = d_i.$$

However, this $\rho$ may not have $p_{ij} = \rho d_i d_j \ll 1$ and can even make $p_{ij} > 1$, particularly in cases when the degree sequence is scaling. Thus $\rho$ must often be chosen much smaller than $\rho = 1/\sum_{i \in V} d_i = 1/2l$ to ensure that $p_{ij} \ll 1$ for all nodes $i, j$. In this case, the “typical” graph resulting from this construction will have a degree sequence much smaller than $D$; however, this sequence will be proportional to the desired degree sequence, $E(d_i) \propto d_i$.
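The approximation LLH(g) ≈ κ + ρ s(g) can be checked numerically on a small case. The sketch below is illustrative only: the two six-node trees, the value of ρ, and the function names are assumptions chosen for the example, and the products over node pairs are taken over unordered pairs i < j. Because both trees share the same degree sequence, κ cancels when their log probabilities are subtracted, so the exact gap should be close to ρ times the difference in s(g).

```python
import itertools
import math


def s_metric(edges, deg):
    """s(g): sum of d_i * d_j over all links (i, j) of the graph."""
    return sum(deg[i] * deg[j] for i, j in edges)


def grg_log_prob(edges, deg, rho):
    """Exact log P(g) with p_ij = rho * d_i * d_j, taken over unordered pairs i < j."""
    edge_set = {frozenset(e) for e in edges}
    total = 0.0
    for i, j in itertools.combinations(sorted(deg), 2):
        p = rho * deg[i] * deg[j]
        if frozenset((i, j)) in edge_set:
            total += math.log(p)        # link present
        else:
            total += math.log1p(-p)     # link absent: log(1 - p)
    return total


# Two hypothetical trees sharing the degree sequence D = (3, 2, 2, 1, 1, 1).
deg = {0: 3, 1: 2, 2: 2, 3: 1, 4: 1, 5: 1}
tree_a = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5)]  # s = 19
tree_b = [(0, 1), (0, 3), (0, 4), (1, 2), (2, 5)]  # s = 18

rho = 1e-3  # assumed value, small enough that every p_ij is far below 1
exact_gap = grg_log_prob(tree_a, deg, rho) - grg_log_prob(tree_b, deg, rho)
approx_gap = rho * (s_metric(tree_a, deg) - s_metric(tree_b, deg))
print(exact_gap, approx_gap)
```

The two printed numbers agree to first order in ρ; the residual comes from the higher-order terms dropped in log(1 + a) ≈ a.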

While this GRG construction yields a probability distribution on G(D) by conditioning on having degree sequence D, this is not an efficient, practical method to generate members of G(D), particularly when D is scaling and it is necessary to choose $\rho \ll 1/2l$. The appeal of the GRG method is that it is easy to analyze and yields probabilities on G(D) with clear interpretations. All elements of G(D) will have nonzero probability with log likelihood proportional to s(g). But even the $s_{\max}$ graph may be extremely unlikely, and thus a naive Monte Carlo scheme using this construction would rarely yield any elements in G(D). There are many conjectures in the SF literature that suggest that a wide variety of methods, including random degree-preserving rewiring, produce “essentially the same” ensembles. Thus it may be possible to generate probabilities on G(D) that can both be analyzed theoretically and also provide a practical scheme to generate samples from the resulting ensemble. While we believe this is plausible, its rigorous resolution is well beyond the scope of this paper.

5.2 Highly Likely Constructions

The interpretation of s(g) as (relative) graph likelihood provides an explicit connection between this structural metric and the extensive literature on random graph models. Since the GRG method is a general means of generating random graphs, we can in principle generate random instances of “scale-free” graphs with a prescribed power law degree sequence, by using GRG as described above and then conditioning on that degree sequence. (And more efficient, practical schemes may also be possible.) In the resulting probability distribution on the space of graphs G(D), high-s(g) graphs with hub-like core structure are literally “highly likely” to arise at random, while low-s(g) graphs with their high-degree nodes residing at the graphs’ peripheries are “highly unlikely” to result from such stochastic construction procedures.

While graphs resulting from stochastic preferential attachment construction may have a different underlying probability model than GRG-generated graphs, both result in simple graphs having approximate scaling relationships in their degree distributions. One can understand the manner in which high-s(g) graphs are “highly likely” through the use of a simple Monte Carlo simulation experiment. Recall that the toy graphs in Figure 5 each contained 1000 nodes and that the graph in Figure 5(b) was “random” in the sense that it was obtained by successive arbitrary rewirings of HSFnet in Figure 5(a). An alternate approach to generating random graphs having a power law in their distribution of node degree is to use the type of preferential attachment mechanism first outlined in [15] and consider the structural features that are most “likely” among a large number of trials. Here, we generate 100,000 graphs each having 1000 nodes and measure the s-value of each. It is important to note that successive graphs resulting from preferential attachment will have different node degree sequences (and undoubtedly different from the degree sequence in Figure 5(e)), so a raw comparison of s(g) is not appropriate. Instead, we introduce the normalized value $S(g) = s(g)/s_{\max}$ and use it to compare the structure of these graphs. Note that this means also generating the $s_{\max}$ graph associated with the particular degree sequence for the graph resulting from each trial. Fortunately, the construction procedure in Appendix A makes this straightforward, and so in this manner we obtain the normalized S-values for 100,000 graphs resulting from the same preferential attachment procedure. Plotting the CDF and CCDF of the S-values for these graphs in Figure 9, we observe a striking picture: all of the graphs resulting from preferential attachment had values of S greater than 0.5, most of the graphs had values 0.6 < S(g) < 0.9, and a significant number had values S(g) > 0.9. In contrast, the graphs in Figure 5 had values: S(HSFnet) = 0.9791, S(Random) = 0.8098, S(HOTnet) = 0.3952, and S(PoorDesign) = 0.4536. Again, from the perspective of stochastic construction processes, low-S values typical of HOT constructions are “very unlikely” while high-S values are much more “likely” to occur at random.
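The overall shape of such a Monte Carlo experiment can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used to produce Figure 9: each new node attaches with a single link (so every graph is a tree), only a handful of trials are run, and s(g) is divided by the bound $\sum_i d_i^3/2$ rather than by the $s_{\max}$ of Appendix A, which is restricted to simple, connected graphs. The resulting ratios are therefore not comparable to the S-values quoted above. All function names are hypothetical.

```python
import random


def preferential_attachment_tree(n, rng):
    """Grow a tree on n nodes; each new node links to an existing node chosen
    with probability proportional to that node's current degree."""
    edges = [(0, 1)]
    endpoints = [0, 1]  # every node appears once per unit of degree
    for new in range(2, n):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend((new, target))
    return edges


def degree_map(edges):
    deg = {}
    for i, j in edges:
        deg[i] = deg.get(i, 0) + 1
        deg[j] = deg.get(j, 0) + 1
    return deg


def s_over_unconstrained_bound(edges):
    """s(g) divided by sum_i d_i^3 / 2. Since d_i*d_j <= (d_i^2 + d_j^2)/2 on
    every link, this ratio always lies in (0, 1]."""
    deg = degree_map(edges)
    s = sum(deg[i] * deg[j] for i, j in edges)
    bound = sum(d ** 3 for d in deg.values()) / 2
    return s / bound


def run_trials(num_trials, n, seed):
    rng = random.Random(seed)
    return [s_over_unconstrained_bound(preferential_attachment_tree(n, rng))
            for _ in range(num_trials)]


ratios = run_trials(num_trials=20, n=1000, seed=0)
print(min(ratios), max(ratios))
```

Reproducing the normalized S-values of the text would additionally require constructing the $s_{\max}$ graph for each trial's degree sequence, as described in Appendix A.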

With this additional insight into the s-values associated with different graphs, the relationship in the Perf(g) vs. s(g) plot of Figure 6 is clearer. Specifically, high-performance networks resulting from a careful design process are vanishingly rare from a conventional probabilistic graph point of view. In contrast, the likely outcomes of random graph constructions (even carefully handcrafted ones) are networks that have extremely poor performance or lack the desired functionality (e.g., providing connectivity) altogether.


Figure 9: RESULTS FROM MONTE CARLO GENERATION OF PREFERENTIAL ATTACHMENT GRAPHS HAVING 1000 NODES. For each trial, we compute the value s(g) and then renormalize to S(g) against the $s_{\max}$ graph having the same degree sequence. Both the CDF and CCDF are shown. In comparison, the HOTnet graph has S(HOTnet) = 0.3952 and the HSFnet graph has S(HSFnet) = 0.9791.

5.3 Degree Correlations

Given an appropriate statistical ensemble of graphs, the expectation of a random variable or random vector $X$ is defined as

$$\langle X \rangle = \sum_{g \in G} X(g) P(g). \tag{11}$$

For example, for $1 \le i \le n$, let $D_i$ be the random variable denoting the degree of node $i$ for a graph $g \in G$ and let $D = \{D_1, D_2, \ldots, D_n\}$ be the random vector representing the node degrees of $g$. Then the degree distribution is given by

$$P(k) \equiv P(\{g \in G : D_i(g) = k;\ i = 1, 2, \ldots, n\})$$

and can be written in terms of an expectation of a random variable, namely

$$P(k) = \frac{1}{n} \left\langle \sum_{i=1}^{n} \delta[D_i - k] \right\rangle$$

where

$$\delta[D_i(g) - k] = \begin{cases} 1 & \text{if node } i \text{ of graph } g \text{ has degree } k \\ 0 & \text{otherwise.} \end{cases}$$

One previously studied topic has been the correlations between the degrees of connected nodes. To show that this notion has a direct relationship to the s(g) metric, we follow [40, Section 4.6] and define the degree correlation between two adjacent vertices having respective degrees $k$ and $k'$ as follows.

Definition 5. The degree correlation between two neighbors having degrees $k$ and $k'$ is defined by

$$P(k, k') = \frac{1}{n^2} \left\langle \sum_{i,j=1}^{n} \delta[d_i - k]\, a_{ij}\, \delta[d_j - k'] \right\rangle \tag{12}$$

where the $a_{ij}$ are elements of the network node adjacency matrix such that

$$a_{ij} = \begin{cases} 1 & \text{if nodes } i, j \text{ are connected} \\ 0 & \text{otherwise} \end{cases}$$

and where the random variables $\delta[D_i - k]$ are as above.

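As an expectation of indicator-type random variables, $P(k, k')$ can be interpreted as the probability that a randomly chosen link connects nodes of degrees $k$ and $k'$; therefore $P(k, k')$ is also called the “degree-degree distribution” for links. Observe that for a given graph $g$ having degree sequence $D$,

$$\begin{aligned}
s(g) &= \sum_{(i,j) \in E} d_i d_j \\
&= \sum_{(i,j) \in E} \sum_{k \in D} k\, \delta[d_i - k] \sum_{k' \in D} \delta[d_j - k']\, k' \\
&= \sum_{(i,j) \in E} \sum_{k \in D} \sum_{k' \in D} k\, \delta[d_i - k]\, \delta[d_j - k']\, k' \\
&= \frac{1}{2} \sum_{k, k' \in D} k k' \sum_{i,j=1}^{n} \delta[d_i - k]\, a_{ij}\, \delta[d_j - k'].
\end{aligned}$$
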
Thus, there is an inherent relationship between the structural metric s(g) and the degree-degree distribution, which we formalize as follows.

Proposition 6.

$$\langle s \rangle = \frac{n^2}{2} \sum_{k, k'} k k' P(k, k'). \tag{13}$$

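Proof of Proposition 6: For fixed degree sequence $D$,

$$\begin{aligned}
\langle s \rangle &= \left\langle \frac{1}{2} \sum_{k, k' \in D} k k' \sum_{i,j=1}^{n} \delta[d_i - k]\, a_{ij}\, \delta[d_j - k'] \right\rangle \\
&= \frac{1}{2} \sum_{k, k' \in D} k k' \left\langle \sum_{i,j=1}^{n} \delta[d_i - k]\, a_{ij}\, \delta[d_j - k'] \right\rangle \\
&= \frac{n^2}{2} \sum_{k, k' \in D} k k' P(k, k').
\end{aligned}$$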

This result shows that for an ensemble of graphs having degree sequence D, the expectation of s can be written purely in terms of the degree correlation. While other types of correlations have been considered (e.g., the correlations associated with clustering or loops in connectivity), degree correlations of the above type are the most obviously connected with the s-metric.
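For an ensemble consisting of a single graph, the expectation in (12) reduces to a plain sum, and the identity in (13) can be verified directly. The following sketch does this with exact rational arithmetic; the example tree and the function names are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction


def degree_degree_distribution(n, edges, deg):
    """P(k, k') of Definition 5 for a single graph: each link (i, j) contributes
    to both ordered pairs, since a_ij = a_ji = 1."""
    counts = Counter()
    for i, j in edges:
        counts[(deg[i], deg[j])] += 1
        counts[(deg[j], deg[i])] += 1
    return {pair: Fraction(c, n * n) for pair, c in counts.items()}


def s_from_distribution(n, dist):
    """Right-hand side of (13): (n^2 / 2) * sum over (k, k') of k * k' * P(k, k')."""
    return Fraction(n * n, 2) * sum(k * kp * prob for (k, kp), prob in dist.items())


# Hypothetical six-node tree with degree sequence (3, 2, 2, 1, 1, 1).
deg = {0: 3, 1: 2, 2: 2, 3: 1, 4: 1, 5: 1}
edges = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5)]

dist = degree_degree_distribution(len(deg), edges, deg)
s_direct = sum(deg[i] * deg[j] for i, j in edges)
print(s_direct, s_from_distribution(len(deg), dist))
```

The factor 1/2 compensates for counting every link twice when summing over ordered node pairs (i, j).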


5.4 Assortativity/Disassortativity of Networks

Another ensemble-based notion of graph degree correlation that has been studied is the measure r(g) of assortativity in networks as introduced by Newman [73], who describes assortative mixing (r > 0) as “a preference for high-degree vertices to attach to other high-degree vertices” and disassortative mixing (r < 0) as the converse, where “high-degree vertices attach to low-degree ones.” Since this is essentially what we have shown s(g) measures, the connection between s(g) and assortativity r(g) should be and ultimately is very direct. As with all concepts in the SF literature, assortativity is developed in the context of an ensemble of graphs, but Newman provides a sample estimate of assortativity of any given graph g. Using our notation, Newman’s formula [73, Eq. 4] can be written as

$$r(g) = \frac{\left[\sum_{(i,j) \in E} d_i d_j\right] - \left[\sum_{i \in V} \frac{1}{2} d_i^2\right]^2 / l}{\left[\sum_{i \in V} \frac{1}{2} d_i^3\right] - \left[\sum_{i \in V} \frac{1}{2} d_i^2\right]^2 / l}, \tag{14}$$

where $l$ is the number of links in the graph. Note that the first term of the numerator of r(g) is precisely s(g), and the other terms depend only on D and not on the specific graph g ∈ G(D). Thus r(g) is linearly related to s(g). However, when we compute r(g) for the graphs in Figure 5, the values all are in the interval [−0.4815, −0.4283]. Thus all are roughly equally disassortative and r(g) seems not to distinguish between what we have viewed as extremely different graphs. The assortativity interpretation appears to directly contradict both what appears obvious from inspection of the graphs, and the analysis based on s(g). Recall that for $S(g) = s(g)/s_{\max}$ the graphs in Figure 5 had S(HSFnet) = 0.979 and S(HOTnet) = 0.395, with high-degree nodes in HSFnet attached to other high-degree nodes and in HOTnet attached to low-degree nodes.
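Formula (14) translates directly into code. The sketch below is a minimal illustration with hypothetical names and example graphs; it uses exact rational arithmetic internally so that the degenerate case of equal degrees (where the denominator vanishes) can be detected reliably.

```python
from fractions import Fraction


def assortativity(edges):
    """Sample assortativity r(g) computed term by term from formula (14)."""
    deg = {}
    for i, j in edges:
        deg[i] = deg.get(i, 0) + 1
        deg[j] = deg.get(j, 0) + 1
    l = len(edges)
    s = sum(deg[i] * deg[j] for i, j in edges)                      # s(g)
    center = Fraction(sum(d * d for d in deg.values()), 2) ** 2 / l
    s_max_unconstrained = Fraction(sum(d ** 3 for d in deg.values()), 2)
    if s_max_unconstrained == center:
        raise ValueError("r(g) is undefined when all degrees are equal")
    return float((s - center) / (s_max_unconstrained - center))


star = [(0, 1), (0, 2), (0, 3), (0, 4)]            # one hub, four leaves
path = [(0, 1), (1, 2), (2, 3)]                    # path on four nodes
tree_a = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5)]  # s = 19
tree_b = [(0, 1), (0, 3), (0, 4), (1, 2), (2, 5)]  # same degrees, s = 18

for name, g in [("star", star), ("path", path), ("tree_a", tree_a), ("tree_b", tree_b)]:
    print(name, assortativity(g))
```

The two trees share a degree sequence, so the gap between their r-values comes only from the s(g) term, which illustrates the linear relationship noted above. All four simple, connected examples come out negative.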

The essential reason for this apparent conflict is that $-1 \le r(g) \le 1$ and $0 < S(g) \le 1$ are normalized against a different “background set” of graphs. For $S(g) = s(g)/s_{\max}$ here, we have computed $s_{\max}$ constrained to simple, connected graphs, whereas r(g) involves no such constraints. The r = 0 graph with the same degree sequence as HSFnet and HOTnet would be non-simple, having, for example, the highest degree ($d_1$) node highly connected to itself (with multiple self-loops) and with multiple parallel connections to the other high-degree nodes (e.g., multiple links to the $d_2$ node). The corresponding r = 1 graph would be both non-simple and disconnected, having the highest degree ($d_1$) node essentially connected only to itself. So HSFnet could be thought of as assortative when compared with graphs in G(D), but disassortative when compared with all graphs. To emphasize this distinction, the description of assortative mixing (r > 0) could be augmented to “high-degree vertices attach to other high-degree vertices, including self-loops.” Since high variability, simple, connected graphs will all typically have r(g) < 0, this measure is less useful than simply comparing raw s(g) for this class of graphs. Thus conceptually, r(g) and s(g) have the same aim, but with different and largely incomparable normalizations, both of which are interesting.

We will now briefly sketch the technical details behind the normalization of r(g). The first term of the denominator, $\sum_{i \in V} d_i^3 / 2$, is equal to $s_{\max}$ for “unconstrained” graphs (i.e., those not restricted to be simple or even connected; see Appendix A for details), and the normalization term in the denominator can be understood accordingly as this $s_{\max}$. The term $\left(\sum_{i \in V} d_i^2 / 2\right)^2 / l$ can be interpreted as the “center” or zero assortativity case, again for unconstrained graphs. Thus, the perfectly assortative graph can be viewed as the $s_{\max}$ graph (within a particular background set $G$), and the assortativity of graphs is measured relative to the $s_{\max}$ graph, with appropriate centering.

Newman’s development of assortativity [73] is motivated by a definition that works both for an ensemble of graphs and as a sample-based metric for individual graphs. Accordingly, his definition depends on $Q(k, k')$, the joint distribution of the remaining degrees of the two vertices at either end of a randomly selected link belonging to a graph in an ensemble. That is, consider a physical process by which a graph is selected from a statistical ensemble and then a link is arbitrarily chosen from that graph. The question of assortativity can then be understood in terms of some (properly normalized) statistical average between the degrees of the nodes at either end of the link. We defer the explicit connection between the ensemble-based and sample-based notions of assortativity and our structural metric s(g) to Appendix B.

6 SF Graphs and the Internet Revisited

Given the definitions of s(g), the various self-similarity and high likelihood features of high-s(g) graphs, as well as the extreme diversity of the set of graphs G(D) with scaling degree D, we look to incorporate this understanding into a theory of SF graphs that recovers both the spirit and existing results, while making rigorous the notion of what it means for a graph to be “scale-free”. To do so, we first trace the exact nature of previous misconceptions concerning the SF Internet, introduce an updated definition of a scale-free graph, clarify what statements in the SF literature can be recovered, and briefly outline the prospects for applying properly defined SF models in view of alternative theoretical frameworks such as HOT (Highly Optimized/Organized Tolerance/Tradeoffs). In this context, it is also important to understand the popular appeal that the SF approach has had. One reason is certainly its simplicity, and we will aim to preserve that as much as possible as we aim to replace largely heuristic and experimental results with ones more mathematical in nature. The other is that it relies heavily on methods from statistical physics, so much so that replacing them with techniques that are shaped by mathematics and engineering will require a fundamental change in the way complex systems such as the Internet are viewed and studied.

The logic of the existing SF theory and its central claims regarding the Internet consists of the following steps:

1. The claim that measurements of the Internet’s router-level topology can be reasonably modeled with a graph g that has scaling degree sequence D.

2. The assertion, or definition, that a graph g with scaling degree sequence D is a scale-free graph.

3. The claim that scale-free graphs have a host of “emergent” features, most notably the presence of several highly connected nodes (i.e., “hubs”) that are critical to overall network connectivity and performance.

4. The conclusion that the Internet is therefore scale-free, and its “hubs,” through which most traffic must pass, are responsible for the “robust yet fragile” feature of failure tolerance and attack vulnerability.


In the following, we revisit the steps of this logic and illustrate that the conclusion in Step 4 is based on a series of misconceptions and errors, ranging in scope from taking highly ambiguous Internet measurements at face value to applying an inherently inconsistent SF theory to an engineered system like the Internet.

6.1 Scaling Degree Sequences and the Internet

The Internet remains one of the most popular and highly cited application areas where power laws in network connectivity have “emerged spontaneously”, and the notion that this increasingly important information infrastructure exhibits a signature of self-organizing complex systems has generated considerable motivation and enthusiasm for SF networks. However, as we will show here, this basic observation is highly questionable, and at worst is the simple result of errors emanating from the misinterpretation of available measurements and/or their naive and inappropriate statistical analysis of the type critiqued in Section 2.1.2.

To appreciate the problems inherent in the available data, it is important to realize that Internet-related connectivity measurements are notorious for their ambiguities, inaccuracies, and incompleteness. This is due in part to the multi-layered nature of the Internet protocol stack (where each level defines its own connectivity), and it also results from the efforts of Internet Service Providers (ISPs) who intentionally obscure their network structure in order to preserve what they believe is a source of competitive advantage. Consider as an example the router-level connectivity of the Internet, which is intended to reflect (physical) one-hop distances between routers/switches. Although information about this type of connectivity is typically inferred from traceroute experiments which record successive IP-hops along paths between selected network host computers (see for example the Mercator [50], Skitter [34], and Rocketfuel [96] projects), there remain a number of challenges when trying to reverse-engineer a network’s physical infrastructure from traceroute-based measurements. The first challenge is that IP connectivity is an abstraction (at “Layer 3”) that sits on top of physical connectivity (at “Layer 2”), so traceroute is unable to record directly the network’s physical structure, and its measurements are highly ambiguous about the dependence between these two layers. Such ambiguity in Internet connectivity persists even at higher layers of the protocol stack, where connectivity becomes increasingly virtual, but for different reasons (see for example Section 6.4 below for a discussion of the Internet’s AS and Web graphs).

To illustrate how the somewhat subtle interactions among the different layers of the Internet protocol stack can give the (false) appearance of high connectivity at the IP-level, recall how at the physical layer the use of Ethernet technology near the network periphery or Asynchronous Transfer Mode (ATM) technology in the network core can give the appearance of high IP-connectivity since the physical topologies associated with these technologies may not be seen by IP-based traceroute. In such cases, machines that are connected to the same Ethernet or ATM network may have the illusion of direct connectivity from the perspective of IP, even though they are separated by an entire network (potentially spanning dozens of machines or hundreds of miles) at the physical level. In an entirely different fashion, the use of “Layer 2.5 technologies” such as Multiprotocol Label Switching (MPLS) tends to mask a network’s physical infrastructure and can give the illusion of one-hop connectivity at Layer 3. Note that in both cases, it is the explicit and intended design of these technologies to hide the physical network connectivity from IP. Another practical problem when interpreting traceroute data is to decide which IP addresses/interface cards (and corresponding DNS names) refer to the same router, a process known as alias resolution [95]. While one of the contributing factors to the high fidelity of the current state-of-the-art Rocketfuel maps is the use of an improved heuristic for performing alias resolution [96], further ambiguities remain, as pointed out for example in [107]. Yet another difficulty when dealing with traceroute-derived measurements has been considered in [64, 1] and concerns a potential bias whereby IP-level connectivity is inferred more easily and accurately the closer the routers are to the traceroute source(s). Such bias possibly results in incorrectly interpreting power law-type degree distributions when the true underlying connectivity structure is a regular graph (e.g., Erdös-Renyi [43]).

Ongoing research continues to reveal new idiosyncrasies of traceroute-derived measurements and shows that their interpretation or analysis requires great care and diligent mining of other available data sources. Although the challenges associated with disambiguating the available measurements and identifying those contributions that are relevant for the Internet’s router-level topology can be daunting, using these measurements at face value and submitting them to commonly-used, black box-type statistical analyses (as is common in the complex systems literature) is ill-advised and bound to result in erroneous conclusions. To illustrate, Figure 10(a) shows the size-frequency plot for the raw traceroute-derived router-level connectivity data obtained by the Mercator project [50], with Figure 10(b) depicting a smoothed version of the plot in (a), obtained by applying a straightforward binning operation to the raw measurements, as is common practice in the physics literature. In fact, Figures 10(a)–(b) are commonly used in the SF literature (e.g., see [4]) as empirical evidence that the router-level topology of the Internet exhibits power-law degree distributions. However, in view of the above-mentioned ambiguities of traceroute-derived measurements, it is highly likely that the two extreme points with node degrees above 1,000 are really instances where the high IP-level connectivity is an illusion created by an underlying Layer 2 technology and says nothing about the actual connectivity at the physical level. When removing the two nodes in question and relying on the statistically more robust size-rank plots in Figures 10(c) and (d), we notice that neither the doubly-logarithmic nor semi-logarithmic plots support the claim of a power law-type node degree distribution for the Internet’s router-level topology. In fact, Figures 10(c) and (d) strongly suggest that the actual router-level connectivity is more consistent with an exponentially-fast decaying node degree distribution, in stark contrast to what is typically claimed in the existing SF literature.
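The two kinds of plots contrasted here are built from different summaries of the same degree list. The following sketch shows both transformations; the function names, the `drop_above` parameter, and the toy degree list are assumptions for illustration and are not the Mercator data.

```python
from collections import Counter


def size_frequency(degrees):
    """(degree k, number of nodes with degree k) pairs, sorted by k.
    This is the data behind a size-frequency plot."""
    return sorted(Counter(degrees).items())


def size_rank(degrees, drop_above=None):
    """(rank, degree) pairs with rank 1 assigned to the largest degree.
    Degrees above drop_above, if given, are discarded first, mirroring the
    removal of extreme nodes described in the text."""
    kept = [d for d in degrees if drop_above is None or d <= drop_above]
    ordered = sorted(kept, reverse=True)
    return list(enumerate(ordered, start=1))


degrees = [1, 1, 1, 2, 2, 3, 5, 1200]  # hypothetical data with one extreme node
print(size_frequency(degrees))
print(size_rank(degrees, drop_above=1000))
```

Plotting the size-rank pairs on doubly logarithmic axes gives an approximately straight line for scaling data, while plotting log rank against degree on a linear axis gives an approximately straight line for exponentially decaying data.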

6.2 (Re)Defining “Scale-Free” Graphs

Figure 10: TRACEROUTE-DERIVED ROUTER-LEVEL CONNECTIVITY DATA FROM THE MERCATOR PROJECT [50]. (a) Doubly logarithmic size-frequency plot: Raw data. (b) Doubly logarithmic size-frequency plot: Binned data. (c) Doubly logarithmic size-rank plot: Raw data with the 2 extreme nodes (with connectivity > 1,000) removed. (d) Semi-logarithmic size-rank plot: Raw data with the 2 extreme nodes (with connectivity > 1,000) removed.

While it is unlikely that the Internet as a whole has scaling degree sequences, it would not be in principle technologically or economically infeasible to build a network which did. It would, however, be utterly infeasible to build a large network with high-degree SF hubs, or more generally one that had both high variability in node degree and large s(g). Thus in making precise the definition of scale-free, there are essentially two possibilities. One is to define scale-free as simply having a scaling degree sequence, from which no other properties follow. The other is to define scale-free more narrowly in such a way that a rich set of properties is implied. Given the strong set of self-similarity properties of graphs g having high s(g), we propose the following alternate definition of what it means for a graph to be “scale-free”.

Definition 6. For graphs g ∈ G(D) where D is scaling, we measure the extent to which the graph g is scale-free by the metric s(g).

This definition for “scale-free graphs” is restricted here to simple, connected graphs having scaling D, but s(g) can obviously be computed for any graph having any degree sequence, and thus defining s(g) as a measure of “scale-free” might potentially be overly narrow. Nonetheless, in what follows, for degree sequences D that are scaling, we will informally call graphs g ∈ G(D) with low s(g)-values “scale-rich”, and those with high s(g)-values “scale-free.” Being structural in nature, this alternate definition has the additional benefit of not depending on a stochastic model underlying the set of graphs of interest. It does not rely on the statistical physics-inspired approach that focuses on random ensembles and their most likely elements and is inherent, for example, in the original Barabasi-Albert construction procedure.

Our proposed definition for scale-free graphs requires that for a graph g to be called scale-free, the degree sequence D of g must be scaling (or, more generally, highly variable) and self-similar in the sense that s(g) must be large. Furthermore, s(g) gives a quantitative measure of the extent to which a scaling degree graph is scale-free. In addition, this definition captures an explicit and obvious relationship between graphs that are “scale-free” and have a “hub-like core” of highly connected centrally-located nodes. More importantly, in view of Step 2 of the above-mentioned logic, the claim that scale-free networks have “SF hubs” is true with scale-free defined as scaling degree sequence and high s(g), but false if scale-free were simply to mean scaling degree sequence, as is commonly assumed in the existing SF literature.

With a concise measure s(g) and its connections with rich self-similarity/self-dissimilarity properties and likelihood, we can look back and understand how both the appeal and failure of the SF literature are merely symptoms of much broader and deeper disconnects within complex networks research. First, while there are many possible equivalent definitions of scale-free, all nontrivial ones would seem to involve combining scaling degree with self-similarity or high likelihood and appear to be equivalent. Thus defined, models that generate scale-free graphs are easily constructed and are therefore not our main focus here. Indeed, because of the strong invariance properties of scaling distributions alone, it is easy to create limitless varieties of randomizing generative models that can “grow” graphs with scaling degree D. Preferential growth is perhaps the oldest of such models [117, 66, 94], so it is no surprise that it resurfaces prominently in the recent SF literature. No matter how scaling is generated, however, the high likelihood and rewiring invariance of high-s(g) graphs make it easy (literally highly likely) to ensure that these scaling graphs are also scale-free.

Thus secondly, the equivalence between “high s” and “highly likely” makes it possible to define scale-free as the likely or generic outcome of a great variety of random growth models. In fact, that “low s” or “scale-rich” graphs are vanishingly unlikely to occur at random explains why the SF literature has not only ignored their existence and missed their relevance but also conflated scale-free with scaling. Finally, since scaling and high s are both so easily and robustly generated, requiring only few simple statistical properties, countless variations and embellishments of scale-free models have been proposed, with appealing but ultimately irrelevant details and discussions of emergence, self-organization, hierarchy, modularity, etc. However, their additional self-similarity properties, though still largely unexplored, have made the resulting scale-free networks intuitively appealing, particularly to those who continue to associate complexity with self-similarity.

The practical implication is that while our proposed definition of what it means for a graph to be “scale-free” recovers many claims in the existing SF literature, some aspects cannot be salvaged. As an alternate approach, we could accept a definition of scale-free that is equivalent to scaling, as is implicit in most of the SF literature. However, then the notion of “scale-free” is essentially trivial, and almost all claims in the existing literature about SF graphs are false, not just the ones specific to the Internet. We argue that a much better alternative is a definition of scale-free, as we propose, that implies the existence of “hubs” and other emergent properties, but is more restrictive than scaling. Our proposed alternative, that scale-free is a special case of scaling that further requires high s(g), not only provides a quantitative measure about the extent to which a graph is scale-free, but also already offers abundant emergent properties, with the potential for a rigorous and rich theory.

In summary, notwithstanding the errors in the interpretation and analysis of available network measurement data, even if the Internet’s router-level graph were to exhibit a power law-type node degree distribution, we have shown here and in other papers (e.g., see [65, 110]) that the final conclusion in Step 4 is necessarily wrong for today’s Internet. No matter how scale-free is defined, the existing SF claims about the Internet’s router-level topology cannot be salvaged. Adopting our definitions, the router topology at least for some parts of the Internet could in principle have high variability and may even be roughly scaling, but it is certainly nowhere scale-free. It is in fact necessarily extremely “scale-rich” in a sense we have made rigorous and quantifiable, although the diversity of scale-rich graphs means that much more must be said to describe which scale-rich graphs are relevant to the Internet. A main lesson learned from this exercise has been that in the context of such complex and highly engineered systems as the Internet, it is largely impossible to understand any nontrivial network properties while ignoring all domain-specific details such as protocol stacks, technological or economic constraints, and user demand and heterogeneity, as is typical in SF treatments of complex networks.

6.3 Towards a Rigorous Theory of SF Graphs

Having proposed the quantity s(g) as a structural measure of the extent to which a given graph is “scale-free”, we can now review the characteristics of scale-free graphs listed in Section 3 and use our results to clarify what is true if scale-free is taken to mean scaling degree sequence and large s(g):

1. SF networks have scaling (power law) degree sequence (follows by definition).

2. SF networks are the likely outcome of various random growth processes (follows from the equivalence of s(g) with a natural measure of graph likelihood).

3. SF networks have a hub-like core structure (follows directly from the definition of s(g) and the betweenness properties of high-degree hubs).

4. SF networks are generic in the sense of being preserved by random degree-preserving rewiring (follows from the characterization of rewiring invariance of self-similarity).

5. SF networks are universal in the sense of not depending on domain-specific details (follows from the structural nature of s(g)).

6. SF networks are self-similar (is now partially clarified in that high s(g) trees are preserved under both appropriately defined link trimming and coarse graining, as well as restriction to small motifs).

Many of these results are proven only for special cases and have only numerical evidence for general graphs, and thus can undoubtedly be improved upon by proving them in greater generality. However, in most important ways the proposed definition is entirely consistent with the spirit of “scale-free” as it appears in the literature, as noted by its close relationship to previously defined notions of betweenness, assortativity, degree correlation, and so on. Since a high s(g)-value requires high-degree nodes to connect to other high-degree nodes, there is an explicit and obvious equivalence between graphs that are scale-free (i.e., have high s(g)-value) and graphs that have a “hub-like core” of highly connected nodes. Thus the statement “scale-free networks have hub-like cores”, while incorrect under the commonly used original and vague definition (i.e., meaning scaling degree sequence), is now true almost by definition and captures succinctly the confusion caused by some of the sensational claims that appeared in the scale-free literature. In particular, the consequences for network vulnerability in terms of the “Achilles’ heel” and a zero epidemic threshold follow immediately.

When normalized against a proper background set, our proposed s(g)-metric provides insight into the diversity of networks having the same degree sequence. On the one hand, graphs having s(g) ≈ smax are scale-free and self-similar in the sense that they appear to exhibit strong invariance properties across different scales, where appropriately defined coarse-graining operations (including link trimming) give rise to the different scales or levels of resolution. On the other hand, graphs having s(g) << smax are scale-rich and self-dissimilar; that is, they display different structure at different levels of resolution. While for scale-free graphs degree-preserving random rewiring does not significantly alter their structural properties, even a modest amount of rewiring destroys the structure of scale-rich graphs. Thus, we suggest that a heuristic test as to whether or not a given graph is scale-free is to explore the impact of degree-preserving random rewiring. Recent work on the Internet [65] and metabolic networks [102] as well as on more general complex networks [112] demonstrates that many important large-scale complex systems are scale-rich and display significant self-dissimilarity, suggesting that their structure is far from scale-free and the opposite of self-similar.
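The rewiring heuristic can be sketched in a few lines of Python. The sketch assumes the definition of s(g) used earlier in the paper, namely the sum of d_i d_j over all links (i, j), and implements rewiring as random double-edge swaps on a simple undirected graph. The function names, the small example graph, and the choice not to enforce connectivity after each swap are illustrative assumptions, not part of the original analysis. Normalization by smax is omitted because it requires constructing the smax graph for the given degree sequence.

```python
import random
from collections import Counter


def degrees(edges):
    """Return a Counter mapping each node to its degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def s_metric(edges):
    """s(g): sum of d_i * d_j over all links (i, j) of the graph."""
    deg = degrees(edges)
    return sum(deg[u] * deg[v] for u, v in edges)


def rewire(edges, n_swaps, rng):
    """Degree-preserving rewiring by random double-edge swaps.

    Two links (a, b) and (c, d) are replaced by (a, d) and (c, b).
    Swaps that would create a self-loop or a duplicate link are rejected.
    Connectivity is NOT enforced (a simplifying assumption of this sketch).
    """
    edges = [tuple(e) for e in edges]          # work on a copy
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:                 # pick an orientation at random
            c, d = d, c
        if len({a, b, c, d}) < 4:
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


# Hypothetical example: two degree-3 nodes (0 and 1) joined through node 4.
g_edges = [(0, 2), (0, 3), (0, 4), (4, 1), (1, 5), (1, 6)]
rewired = rewire(g_edges, n_swaps=10, rng=random.Random(0))

print("s before rewiring:", s_metric(g_edges))
print("s after rewiring: ", s_metric(rewired))
print("degrees preserved:", degrees(rewired) == degrees(g_edges))
```

Comparing s(g) before and after rewiring, averaged over many random seeds, gives a rough indication of whether a graph sits near the typical (high-s) region for its degree sequence. The s-value is only one structural property, and a fuller test would track others as well.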

6.4 SF Models and the Internet?

For the Internet, we have shown that no matter how scale-free is defined, the existing SF claims about the “robust, yet fragile” nature of these systems (particularly any claims of an “Achilles’ heel” type of vulnerability) are wrong. By tracing through the reasoning behind these SF claims, we have identified the source of this error in the application of SF models to domains like engineering (or biology) where design, evolution, functionality, and constraints are all key ingredients that simply cannot be ignored. In particular, by assuming that scale-free is defined as scaling (or, more generally, highly variable) plus high s(g), and further using s(g) as a quantitative measure of how scale-free a graph is, the failure of SF models to correctly and usefully apply in an Internet-related context has been limited to errors due to ignoring domain-specific details, rather than to far more serious and general mathematical errors about the properties of SF graphs themselves. In fact, with our definition, there is the potential for a rich and interesting theory of SF graphs, looking for relevant and useful application domains.

One place where SF graphs may be appropriate and practically useful in the study of the Internet is at the higher levels of network abstraction, where interconnectivity is increasingly unconstrained by physical limitations. That is, while the lowest layers of the Internet protocol stack involving the physical infrastructure such as routers and fiber-optic cables have hard technological and economic constraints, each higher layer defines its own unique connectivity, and the corresponding network topologies become by design increasingly more virtual and unconstrained. For example, in contrast to routers and physical links, the connectivity structure defined by the documents (nodes) and hyperlinks (connections) in the World Wide Web (WWW) is designed to be essentially completely unconstrained. While we have seen that it is utterly implausible that SF models can capture the essential features of the router-level connectivity in today’s Internet, it seems conceivable that they could represent virtual graphs associated with the Internet such as, hypothetically, the WWW or other types of overlay networks.

However, even in the case of more virtual-type graphs associated with the Internet, a cautionary note about the applicability of SF models is needed. For example, consider the Internet at the level of autonomous systems, where an autonomous system (AS) is a subnetwork or domain that is under its own administrative control. In an AS graph representation of the Internet, each node corresponds to an AS and a link between two nodes indicates the presence of a “peering relationship” between the two ASes, that is, a mutual willingness to carry or exchange traffic. Thus, a single “node” in an AS graph (e.g., AS 1239 is the Sprintlink network) represents potentially hundreds or thousands of routers as well as their interconnections. Although most large ASes have several connections (peering points) to other ASes, the use of this representation means that one is collapsing possibly hundreds of different physical (i.e., router-level) connections into a single logical link between two ASes. In this sense, the AS graph is expressly not a representation of any physical aspect of the Internet, but defines a virtual graph representing business (i.e., peering) relationships among network providers (i.e., ASes). Significant attention has been directed toward discovering the structural aspects of AS connectivity as represented by AS graphs and inferred from BGP-based measurements (where the Border Gateway Protocol or BGP is the de facto standard inter-AS routing protocol deployed in today’s Internet [100, 88]) and speculating on what these features imply about the large-scale properties of the Internet. However, the networking significance of these AS graphs is very limited since AS connectivity alone says little about how the actual traffic traverses the different ASes. For this purpose, the relevant information is encoded in the link type (i.e., peering agreement such as peer-to-peer or provider-customer relationship) and in the types of routing policies used by the individual ASes to enforce agreed-upon business arrangements between two or more parties.

In addition, due to the infeasibility of measuring AS connectivity directly, the measurements that form the basis for inferring AS-level maps consist of BGP routing table snapshots collected, for example, by the University of Oregon Route Views Project [88]. To illustrate the degree of ambiguity in the inferred AS connectivity data, note for example that due to the way BGP routing works, snapshots of BGP routing tables taken at a few vantage points on the Internet over time are unlikely to uncover and capture all existing connections between ASes. Indeed, [30] suggests that AS graphs inferred from the Route Views data typically miss between 20-50% or even more of the existing AS connections. This is an example of the general problem of vantage point mentioned in [81], whereby the location(s) of exactly where the measurements are performed can significantly skew the interpretation of the measurements, often in quite non-intuitive ways. Other problems that are of concern in this context have to do with ambiguities that can arise when inferring the type of peering relationships between two ASes or, more importantly, with the dynamic nature of AS-level connectivity, whereby new ASes can join and existing ASes can leave, merge, or split at any time.

This dynamic aspect is even more relevant in the context of the Web graph, another virtual graph associated with the Internet that is expressly not a representation of any physical aspect of the Internet structure but where nodes and links represent pages and hyperlinks of the WWW, respectively. Thus, in addition to the deficiencies mentioned in the context of router-level Internet measurements, the topologies that are more virtual and “overlay” the Internet’s physical topology exhibit an aspect of dynamic changes that is largely absent on the physical level. This questions the appropriateness and relevance of a careful analysis or modeling of commonly considered static counterparts of these virtual topologies that are typically obtained by accumulating the connectivity information contained in a number of different snapshots taken over some time period into a single graph.

When combined, the virtual nature of AS or Web graphs and their lack of critical networking-specific information make them awkward objects for studying the “robust yet fragile” nature of the Internet in the spirit of the “Achilles’ heel” argument [6] or largely inappropriate structures for investigating the spread of viruses on the Internet as in [21]. For example, what does it mean to “attack and disable” a node such as Sprintlink (AS 1239) in a representation of business relationships between network providers? Physical attacks at this level are largely meaningless. On the other hand, the economic and regulatory environment for ISPs remains treacherous, so questions about the robustness (or lack thereof) of the Internet at the AS-level to this type of disruption seem appropriate. And even if one could make sense of physically “attacking and disabling” nodes or links in the AS graph, any rigorous investigation of its “robust yet fragile” nature would have to at least account for the key mechanisms by which BGP detects and reacts to connectivity disruptions at the AS level. In fact, as in the case of the Internet’s router-level connectivity, claims of scale-free structure exhibited by inferred AS graphs fail to capture the most essential “robust yet fragile” features of the Internet because they ignore any significant networking-specific information encoded in these graphs beyond connectivity. Again, the actual fragilities are not to physical attacks on AS nodes but to AS-related components “failing on,” particularly via BGP-related software or hardware components working improperly or being misconfigured, or via malicious exploitation or hijacking of BGP itself.

6.5 The Contrasting Role of Randomness

To put our SF findings in a broader context, we briefly review an alternate approach to the use of randomness for understanding system complexity that implicitly underpins our approach in a way similar to how statistical physics underpins the SF literature. Specifically, the notion of Highly Optimized Tolerance (HOT) [28], or Heuristically Organized Tradeoffs [44], has recently been introduced as a conceptual framework for capturing the highly organized, optimized, and “robust yet fragile” structure of complex highly evolved systems [29]. Introduced in the spirit of canonical models from statistical physics (such as percolation lattices, cellular automata, and spin glasses), HOT is an attempt to use simple models that capture some essence of the role of design or evolution in creating highly structured configurations, power laws, self-dissimilarity, scale-richness, etc. The emphasis in the HOT view is on “organized complexity”, which contrasts sharply with the view of “emergent complexity” that is preferred within physics and the SF community. The HOT perspective is motivated by biology and technology, and HOT models typically involve optimizing functional objectives of the system as a whole, subject to constraints on their components, usually with an explicit source of uncertainty against which solutions must be tolerant, or robust. The explicit focus on function, constraints, optimization, and organization sharply distinguishes HOT from SF approaches. Both consider robustness and fragility but reach opposite and incompatible conclusions.

A toy model of the HOT approach to modeling the router-level Internet was already discussed earlier. The underlying idea is that consideration of the economic and technological factors constraining design by Internet Service Providers (ISPs) gives strong incentives to minimize the number and length of deployed links by aggregating and multiplexing traffic at all levels of the network hierarchy, from the periphery to the core. In order to efficiently provide high throughput to users, router technology and link costs thus necessitate that by and large link capacities increase and router degrees decrease from the network’s periphery to its more aggregated core. Thus, the toy model HOTnet in Figure 5(d), like the real router-level Internet, has a mesh of uniformly high-speed low connectivity routers in its core, with greater variability in connectivity at its periphery. While a more detailed discussion of these factors and additional examples is available from [65, 41], the result is that this work has explained where within the Internet’s router-level topology the high degree nodes might be and why they might be there, as well as where they can’t possibly be.

The HOT network that results is not just different from the SF network but completely opposite, and this can be seen not only in terms relevant to the Internet application domain, such as the performance measure (7), robustness to router and link losses, and the link costs, but also in the criteria considered within the SF literature itself. Specifically, SF models are generated directly from ensembles and random processes, and have generic microscopic features that are preserved under random rewiring. HOT models have highly structured, rare configurations which are destroyed by random rewiring, unless that is made a specific design objective. SF models are universal in ignoring domain details, whereas HOT is only universal in the sense that it formulates everything in terms of robust, constrained optimization, but with highly domain-specific performance objectives and constraints.

One theme of the HOT framework has been that engineering design or biological evolution easily generates scaling in a variety of toy models once functional performance, component constraints, and robustness tradeoffs are considered. Both SF and HOT models of the Internet yield power laws, but once again in opposite ways and with opposite consequences. HOT emphasizes the importance of high variability over power laws per se, and provides a much deeper connection between variability or scaling exponents and domain-specific constraints and features. For example, the HOT Internet model considered here shows that if high variability occurs in router degree it can be explained by high variability in end user bandwidth together with constraints on router technology and link costs. Thus HOT provides a predictive model regarding how different external demands or future evolution of technology could change network statistics. The SF models are intrinsically incapable of providing such predictive capability in any application domain. The resulting striking differences between these two modeling approaches and their predictions are merely symptomatic of a much broader gap between the popular physics perspective on complex networks versus that of mathematics and engineering, created by a profoundly different perspective on the nature and causes of high variability in real world data. For example, essentially the same kind of contrast holds for HOT and SOC models [29], where SOC is yet another theoretical framework with specious claims about the Internet [97, 11].

In contrast to the SF approach, the HOT models described above as well as their constraints and performance measures do not require any assumptions, implicit or explicit, that they were drawn directly from some random ensemble. Tradeoffs in the real Internet and biology can be explained without insisting on any underlying random models. Sources of randomness are incorporated naturally where uncertainty needs to be managed or accounted for, say for the case of the router-level Internet, in a stochastic model of user bandwidth demands and geographic locations of users, routers, and links, followed by a heuristic or optimal design. This can produce either an ensemble of network designs, or a single robust design, depending on the design objective, but all results remain highly constrained and are characterized by low s(g) and high Perf(g). This is typical in engineering theories, where random models are common but not required, and where uncertainty can be modeled with random ensembles or worst-case over sets. In all cases, uncertainty models are mixed with additional hard constraints, say on component technology.

In the SF literature, on the other hand, random graph models and statistical physics-inspired approaches to networks are so deep-rooted that an underlying ensemble is taken for granted. Indeed, in the SF literature the phrase “not random” typically does not refer to a deterministic process but means random processes having some non-uniform or high variability distribution, such as scaling. Furthermore, random processes are used to directly generate SF network graphs rather than model uncertainty in the environment, leading in this case to high s(g) and low Perf(g) graphs. This particular view of randomness also blurs the important distinction between what is unlikely and what is impossible. That is, what is unlikely to occur in a random ensemble (e.g., a low s(g) graph) is treated as impossible, while what is truly impossible (e.g., an Internet with SF hubs) from an engineering perspective is viewed as likely from an ensemble point of view. Similarly, the relation between high variability, scaling, and scale-free is murky in the SF literature. These distinctions may all be irrelevant for some scientific questions, but they are crucial in the study of engineering and biology and also essential for mathematical rigor.

7 A HOT vs. SF View of Biological Networks

This section describes how a roughly parallel SF vs. HOT story exists in metabolic networks, which is another application area that has been very popular in the SF and broader “complex networks” literature [19] and is also discussed in more detail in [48]. Recent progress has clarified many features of the global architecture of biological metabolic networks [36]. We argue here that they have highly organized and optimized tolerances and tradeoffs for functional requirements of flexibility, efficiency, robustness, and evolvability, with constraints on conservation of energy, redox, and many small moieties. These are all canonical examples of HOT features, and are largely ignored in the SF literature. One consequence of this HOT architecture is a highly structured modularity that is self-dissimilar and scale-rich, as in the Internet example. All aspects of metabolism have extremes in homogeneity and heterogeneity, and low and high variability, including power laws, in both metabolite and reaction degree distributions. We will briefly review the results in [102] which illustrate these features using the well-understood stoichiometry of metabolic networks in bacteria.

One difficulty in comparing SF and HOT approaches is that there is no sense in which ordinary (not bipartite) graphs can be used to meaningfully describe metabolism, as we will make clear below. Thus one would need to generalize our definition of scale-free to bipartite graphs just to precisely define what it would mean for metabolism to be scale-free. While this is an interesting direction not pursued here, the SF literature has many fewer claims about bipartite graphs, and they are typically studied by projecting them down onto one set of vertices. What is clear is that no definition can possibly salvage the claim that metabolism is scale-free, and we will pursue this aspect in a more general way. The rewiring-preserved features of scale-free networks would certainly be a central feature of any claim that metabolism is scale-free. This is also a feature of the two other most prominent “emergent complexity” models of biological networks, edge-of-chaos (EOC) and self-organized criticality (SOC). While EOC adds boolean logic and SOC adds cellular automata to the graphs of network connectivity, both are by definition unchanged by random degree-preserving rewiring. In fact, they are preserved under much less restrictive rewiring processes. Thus, while there are currently no SF, EOC, or SOC models that apply directly to metabolic networks, we can clearly eliminate them a priori as candidate theories by showing that all important biochemical features of real metabolic networks are completely disrupted by rewirings that are far more restrictive than what is by definition allowable in SF, EOC, or SOC models.

7.1 Graph Representation

Cellular metabolism is described by a series of chemical reactions that convert nutrients to essential components and energy within the cell, subject to conservation constraints of atoms, energy and small moieties. The simplest model of metabolic networks is a stoichiometry matrix, or s-matrix for short, with rows of metabolites and columns of reactions. For example, for the set of chemical reactions

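\begin{equation}
\begin{aligned}
S_1 + \mathrm{NADH} &\rightarrow S_2 + \mathrm{NAD}, \\
S_2 + \mathrm{ATP} &\leftrightarrow S_3 + \mathrm{ADP}, \\
S_4 + \mathrm{ATP} &\rightarrow S_5 + \mathrm{ADP},
\end{aligned}
\tag{15}
\end{equation}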

we can write the associated stoichiometry matrix, or s-matrix, as

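\begin{equation}
\begin{array}{ll|rrr}
 & & R_1 & R_2 & R_3 \\ \hline
\text{Substrates} & S_1 & -1 & 0 & 0 \\
 & S_2 & 1 & -1 & 0 \\
 & S_3 & 0 & 1 & 0 \\
 & S_4 & 0 & 0 & -1 \\
 & S_5 & 0 & 0 & 1 \\ \hline
\text{Carriers} & \mathrm{ATP} & 0 & -1 & -1 \\
 & \mathrm{ADP} & 0 & 1 & 1 \\
 & \mathrm{NADH} & -1 & 0 & 0 \\
 & \mathrm{NAD} & 1 & 0 & 0
\end{array}
\tag{16}
\end{equation}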

with the metabolites in rows and reactions in columns. This is the simplest model of metabolism and is defined unambiguously except for permutations of rows and columns, and thus makes an attractive basis for contrasting different approaches to complex networks [19, 40, 55, 84].
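As a minimal sketch of how the s-matrix (16) follows from the reaction list (15), the Python fragment below assembles it mechanically. The data layout (a dictionary of consumed and produced metabolites per reaction) and the function name are illustrative assumptions, and all stoichiometric coefficients are taken to be 1, as they are in (15). The reversibility of R2 is not visible in the signs, which follow the direction in which each reaction is written.

```python
# Reactions of Equation (15) as (consumed, produced) metabolite lists.
reactions = {
    "R1": (["S1", "NADH"], ["S2", "NAD"]),
    "R2": (["S2", "ATP"], ["S3", "ADP"]),   # reversible in (15)
    "R3": (["S4", "ATP"], ["S5", "ADP"]),
}

# Row order of (16): non-carrier substrates first, then carriers.
metabolites = ["S1", "S2", "S3", "S4", "S5", "ATP", "ADP", "NADH", "NAD"]


def stoichiometry_matrix(reactions, metabolites):
    """Rows are metabolites, columns are reactions; -1 consumed, +1 produced."""
    row = {m: i for i, m in enumerate(metabolites)}
    matrix = [[0] * len(reactions) for _ in metabolites]
    for col, (consumed, produced) in enumerate(reactions.values()):
        for m in consumed:
            matrix[row[m]][col] -= 1
        for m in produced:
            matrix[row[m]][col] += 1
    return matrix


S = stoichiometry_matrix(reactions, metabolites)
for name, entries in zip(metabolites, S):
    print(f"{name:>4}", entries)
```

Permuting the order of `metabolites` or of the reaction dictionary only permutes rows or columns, which is the sense in which the s-matrix is defined unambiguously.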

Reactions in the entire network are generally grouped into standard functional modules, such as catabolism, amino acid biosynthesis, nucleotide biosynthesis, lipid biosynthesis and vitamin biosynthesis. Metabolites are categorized largely into carrier and non-carrier substrates as in rows of (16). Carrier metabolites correspond to conserved quantities, are activated in catabolism, and act as carriers to transfer energy by phosphate groups, hydrogen/redox, amino groups, acetyl groups, and one carbon units throughout all modules. As a result, they appear in many reactions. Non-carrier substrates are categorized further into precursor and other (than precursor and carrier) metabolites. The 12 precursor metabolites are outputs of catabolism and are the starting points for biosynthesis, and together with carriers make up the “knot” of the “bow-tie” structure [36] of the metabolism. The other metabolites occur primarily in separate reaction modules.

The information conveyed in the s-matrix can be represented in a color-coded bipartite graph, called an s-graph [102] (Figure 11), where both reactions and metabolites are represented as distinct nodes and membership relationships of metabolites to reactions are represented by links. With the color-coding of links indicating the reversibility of reactions and the sign of elements in the s-matrix, all the biochemical information contained in the s-matrix is accurately reflected in the s-graph. One of the most important features of s-graphs of this type is the differentiation between carrier (e.g., ATP) and non-carrier metabolites, which helps to clarify biochemically meaningful pathways. An s-graph for a part of the amino acid biosynthesis module of H. pylori is shown in Figure 12. The objective of each functional module is to make output metabolites from input metabolites through successive reactions. The enzymes of core metabolism are highly efficient and specialized, and thus necessarily have few metabolites and involve simple reactions [102]. As a result, long pathways are required biochemically to build complex building blocks from simpler building blocks within a function module. Long pathways are evident in the s-graph in Figure 12.

Simpler representations of the information in the s-graph are possible, but only at a cost of losing significant biochemical information. A metabolite graph in which nodes represent only metabolites and are “connected” when they are involved in the same reaction, or a reaction graph in which nodes represent only reactions and are “connected” when they contain common metabolites (both shown in Figure 11), destroys much of the rich structure and biochemical meaning when compared to the s-graph. (The metabolite graphs are sometimes further reduced by deleting carriers.) Nonetheless, many recent studies have emphasized the connectivity features of these graphs, and reports of power laws in some of the degree distributions have been cited as claims that (1) metabolic networks are also scale-free [19] and (2) the presence of highly connected hubs and self-similar modularity captures many of the essential details about the “robust yet fragile” features of metabolism [84]. Here, highly connected nodes are carriers, which are shared throughout metabolism.

[Figure 11 panels: s-graph; reaction graph; metabolite graph]

Figure 11: Graph representations of enzymatically catalyzed reactions from Equation (15) having the s-matrix in Equation (16). An s-graph consists of reaction nodes (black diamonds), non-carrier metabolite nodes (orange squares), and carrier metabolite nodes (light blue squares). Red and blue edges correspond to positive and negative elements in the stoichiometry matrix, respectively, for irreversible reactions, and pink and green ones correspond to positive and negative elements, respectively, for reversible reactions. All the information in the s-matrix appears schematically in the s-graph. Carriers which always occur in pairs (ATP/ADP, NAD/NADH, etc.) are grouped for simplification. The corresponding metabolite graph, containing only metabolite nodes, and reaction graph, containing only reaction nodes, lose important biochemical information. Note that all metabolites and reactions are “close” in the metabolite and reaction graphs, simply because they share common carriers, but could be arbitrarily far apart in any real biochemical sense. For example, reactions 2 and 3 could be in amino acid and lipid biosynthesis, respectively, and thus would be far apart biochemically and in the s-graph.

Figure 12: An s-graph for part of the catabolism and amino acid biosynthesis module of H. pylori. The conventions are the same as those in Figure 11. This illustrates that long biosynthetic pathways build complex building blocks (in yellow on the right) from precursors (in orange on the left) in a series of simple reactions (in the middle), using shared common carriers (at the bottom). Each biosynthetic module has a qualitatively similar structure.

[Figure 13 labels: precursors; amino acids]

Figure 13: The s-graph in Figure 12 with carriers deleted to highlight the long assembly pathways. Note that there are no high degree “hub” nodes responsible for the global connectivity of this reduced s-graph. AKG and GLU are carriers for amino groups, and this role has been left in.

We will first clarify why working with any of these simple graphs of metabolism, rather than the full s-graph, destroys their biochemical meaning and leads to a variety of errors. Consider again the simple example of Equation (15) and its corresponding s-matrix (16). Here, assume that reactions R1 and R2 are part of the pathways of a functional module, say amino acid biosynthesis, and reaction R3 is in another module, say lipid biosynthesis. Then the metabolite and reaction graphs both show that substrates S3 and S4, as well as reactions R2 and R3, are “close” simply because they share ATP/ADP. However, since they are in different functional modules, they are not close in any biologically meaningful sense. (Similarly, two functionally different and geographically distant appliances are not “close” in any meaningful sense simply because both happen to be connected to the US power grid.) Attempts to characterize network diameter are meaningless in such simplified metabolite graphs because they fail to extract biochemically meaningful pathways. Additional work using structural information of metabolites with carbon atomic traces [10] has clarified that the average path lengths between all pairs of metabolites in E. coli are much longer than has been suggested by approaches that consider only simplistic connectivity in metabolite graphs. “Achilles’ heel” statements [6] for metabolic networks are particularly misleading. Eliminating, say, ATP from a cell is indeed lethal, but the explanation for this must involve its biochemical role, not its graph connectivity. Indeed, the “Achilles’ heel” arguments suggest that removal of the highly connected carrier “hub” nodes would fragment the graph, but Figure 13 shows that removing the carriers from the biologically meaningful s-graph in Figure 12 still yields a connected network with long pathways between the remaining metabolites. If anything, this reduced representation highlights many of the more important structural features of metabolism, and most visualizations of large metabolic networks use a similar reduction. Attempts to “fix” this problem by a priori eliminating the carriers from metabolite graphs result in graphs with low variability in node degree, which are thus not even scaling, let alone scale-free. Thus the failure of the SF graph methods to explain in any way the features of metabolism is even more serious than for the Internet.
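The observation that S3 and S4 are “close” only through shared carriers can be checked on the three reactions of Equation (15). The sketch below builds the metabolite graph and the reaction graph as projections of the reaction membership lists and measures hop distances with a breadth-first search. All function and variable names are illustrative assumptions, and the carrier set is the one listed in the rows of (16).

```python
from collections import deque
from itertools import combinations

# Metabolites taking part in each reaction of Equation (15).
members = {
    "R1": {"S1", "NADH", "S2", "NAD"},
    "R2": {"S2", "ATP", "S3", "ADP"},
    "R3": {"S4", "ATP", "S5", "ADP"},
}
CARRIERS = {"ATP", "ADP", "NADH", "NAD"}


def metabolite_graph(members, drop=frozenset()):
    """Connect two metabolites when they appear in the same reaction."""
    adj = {}
    for mets in members.values():
        kept = sorted(mets - drop)
        for m in kept:
            adj.setdefault(m, set())
        for a, b in combinations(kept, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def reaction_graph(members):
    """Connect two reactions when they share at least one metabolite."""
    adj = {r: set() for r in members}
    for r1, r2 in combinations(members, 2):
        if members[r1] & members[r2]:
            adj[r1].add(r2)
            adj[r2].add(r1)
    return adj


def distance(adj, src, dst):
    """Breadth-first hop count from src to dst, or None if unreachable."""
    seen = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return seen[node]
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return None


full = metabolite_graph(members)
reduced = metabolite_graph(members, drop=CARRIERS)
rgraph = reaction_graph(members)

print(distance(full, "S3", "S4"))      # 2: S3 - ATP (or ADP) - S4
print(distance(reduced, "S3", "S4"))   # None: no path once carriers are removed
print("R3" in rgraph["R2"])            # True: R2 and R3 share ATP/ADP
print(distance(reduced, "S1", "S3"))   # 2: S1 - S2 - S3 within one pathway
```

In this three-reaction toy there are no shared precursors, so deleting the carriers separates the two modules completely. The connected structure of Figure 13 depends on the full network, including its precursors, and is not reproduced by this example.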

7.2 Scale-rich metabolic networks

Recent work [102] has clearly shown the origin of high variability in metabolic networks by consideration of both their constraints and functional requirements, together with the biochemically meaningful modular decomposition of metabolites and reactions shown above. Since maintaining a large genome and making a variety of enzymes is costly, the total number of reactions in metabolism must be kept relatively small while providing robustness of the cell against sudden changes, often due to environmental fluctuations, in either the required amount of products or the available nutrients. In real metabolic networks, scaling arises only in the degree distribution of total metabolites. The reaction node degree distribution shows low variability because of the specialized enzymes, which allow only a few metabolites in each reaction. High variability in metabolite node degrees is a result of the mixture of a few high-degree shared carriers with many other low-degree metabolites unique to each function module, with the precursors providing intermediate degrees. Thus, the entire network is extremely scale-rich, in the sense that it consists of widely different scales and is thus fundamentally self-dissimilar.

Scale-richness of metabolic networks has been evaluated quantitatively in [103] by degree-preserving rewiring of real stoichiometry matrices, which severely alters their structural properties. Preserving only the metabolite degrees gives much higher variability in the reaction node degree distribution than is possible using simple enzymes, and rewiring also destroys conservation of redox and moieties. The same kind of degree-preserving rewiring on a simple HOT model [104], proposed with the essential features of metabolism, such as simple reactions, shared carriers, and long pathways, has reinforced these conclusions but in a more analytical framework [103]. Even (biologically meaningless) metabolite graphs have a low S(g) value, and are thus scale-rich, not scale-free. The simple reactions of metabolism require that the high-degree carriers are more highly connected to low-degree metabolites than to each other, as is shown in the metabolite graph in Figure 11, yielding a relatively low s(g) value. Figure 15 shows the s(g) values for the H. pylori metabolite graph compared with those for graphs obtained by random degree-preserving rewiring of that graph.

Figure 14: The high-connectivity metabolites in the s-graph in Figure 12 are these carriers, which are not directly involved in the pathways. Because carriers are shared throughout metabolism, they are entirely responsible for the presence of high variability in metabolite degree, and thus the presence of scaling in metabolism.

Even the most restrictive possible rewiring destroys the structure of metabolism, showing that no SF, SOC, or EOC models are possible, even in principle. Suppose we freeze the role of carriers in each reaction, and then allow only rewiring of the remaining metabolites. This would be equivalent to Figure 13 with the amino-group carrier roles of AKG and GLU also frozen. What then remains is nearly a tree, and thus the rewiring counts from Figure 8 are approximately correct. Note that half of all rewirings disconnect a tree, and Monte Carlo numerical experiments of successive rewirings produce a jumble of futile cycles and short, dead-end pathways [103]. The long assembly lines of real metabolism are extremely rare configurations and highly scale-rich, and are vanishingly unlikely to arise from any random ensemble model such as those in SF, SOC, or EOC theories.

Real metabolic networks are scale-rich in every conceivable interpretation, and cannot be scale-free in any sense consistent with either the definitions in this paper or the spirit of the SF literature. In contrast to any approach that treats metabolic networks as generic, a biological perspective requires that the organization of metabolic networks be discussed with emphasis on the functional requirements of conversion of nutrients to products with flexibility, efficiency, robustness, and evolvability under the constraints on enzyme costs and conservation of energy, redox, and many small moieties [35, 99]. This organization is a natural consequence of a highly optimized and structured tradeoff (HOT) “bow-tie” structure, which facilitates great robustness and efficiency but is also a source of vulnerability, primarily to hijacking and fail-on of components [36, 104, 99]. A power law in metabolite degree is simply the natural null hypothesis for any structure that exhibits high variability through shared carriers, and thus is by itself not suggestive of any further particular mechanism.

Another prominent example of biological networks claimed to be SF [54, 116] is protein-protein interaction (PPI) networks. This claim has led some to conclude that identifying high-degree “hub” proteins reveals important features of PPI networks. However, recent analysis [105] evaluating the claim that PPI node degree sequences follow a power law, a necessary condition for networks to be SF, shows that the node degree sequences of some published refined PPI networks do not follow power laws when analyzed correctly using the cumulative plots discussed in Section 2.1. Thus these PPI networks are not SF networks. It is in principle possible that the data studied in [105] are misleading because of the small size of the network and potential experimental errors, and that real PPI networks might have some features attributed to SF networks. At this time we can draw conclusions only about (noisy) subgraphs of the true PPI network, since the data sets are incomplete and presumably contain errors. If it is true that appropriately sampled subgraphs of an SF graph are SF, as was claimed in [116], then they should possess a power law node degree sequence. That these subgraphs exhibit exponential node degree sequences suggests that the entire network is not SF. Since essentially all the claims that biological networks are SF are based on ambiguous frequency-degree analysis, this analysis must be redone to determine the correct form of the degree sequences. The analysis in [105] has provided clear examples in which ambiguous frequency-degree plots could lead to erroneous conclusions about the existence and parametrization of power law relationships.

As we have shown above, cell metabolism plausibly can have power laws for some data sets, but has none of the other features attributed to SF networks. Metabolic networks have been shown to be scale-rich (SR), in the sense that they are far from self-similar [102] despite some power laws in certain node degree sequences. Their power law node degree sequence is a result of the mixture of exponential distributions in each functional module, with carriers playing a crucial role. In principle, PPI networks could have this SR structure as well, since their subnetworks have exponential degree sequences, and perhaps power laws could emerge at higher levels of organization. This will be revealed only when a more complete network is elucidated. Still, the most important point is not whether the node degree sequence follows a power law, but whether the variability of the node degree sequences is high or low, and which biological protocols necessitate this high or low variability. These issues will be explored in future publications.

8 Conclusions

The set G(D) of graphs g with fixed scaling degree D is extremely diverse. However, most graphs in G(D) are, using our definition, scale-free and have high s-values. This implies that these scale-free graphs are not diverse and actually share a wide range of “emergent” features, many of which are often viewed as both intriguing and surprising, such as hub-like cores, high likelihood under a variety of random generation mechanisms, preservation under random rewiring, robustness to random failure but fragility to attack, and various kinds of self-similarity. These features have made scale-free networks overwhelmingly compelling to many complex systems researchers and have understandably given scale-free findings tremendous popular appeal [15, 115, 6, 76, 14, 12]. This paper has confirmed that these emergent features are plausibly consistent with our definition, and we have proven several connections, but much remains heuristic and experimental. Hopefully, more research will complete what is potentially a rich graph-theoretic treatment of scale-free networks.

Essentially all of the extreme diversity in G(D) is in its fringes, which are occupied by the rare scale-rich graphs with small s. These graphs have little or nothing in common with each other or with scale-free graphs beyond their degree sequence so, unfortunately, s is a nearly meaningless measure for scale-rich graphs. We have shown that those technological and biological networks which have functional requirements and component constraints tend to be scale-rich, and HOT is a theoretical framework aimed at explaining in simplified terms the features of these networks. In this context, scale-free networks serve at best as plausible null hypotheses that typically collapse quickly under scrutiny with real data and are easily refuted by applying varying amounts of domain knowledge.

At the same time, scale-free networks may still be relevant when applied to social or virtual networks where technological, economic, or other constraints play a lesser role or no role whatsoever. Indeed, a richer and more complete and rigorous theory could potentially help researchers working in such areas. For example, as discussed in Section 4.4.1, exploring the impact of degree-preserving random rewiring of components can be used as a simple preliminary litmus test for whether or not an SF model might be appropriate. It takes little domain expertise to see that randomly rewiring the internal connections of, say, the microchips or transistors in a laptop computer or the organs in a human body will utterly destroy their function, and thus that SF models are unlikely to be informative. On the other hand, one can think of some technological networks (e.g., wireless ad-hoc networks) and many social networks where robustness to some kinds of random rewiring is an explicitly desirable objective, and thus SF graphs are not so obviously inapplicable. For example, it might be instructive to apply this litmus test to an AS graph that reflects AS connectivity only, as compared to the same graph that also provides information about the type of peering relationships and the nature of routing policies in place.

Figure 15: Metric s(g) for the real metabolite graph for H. pylori (∗) and for graphs obtained by degree-preserving random rewiring. (Horizontal axis: number of rewiring iterations, 0 to 8000; vertical axis: metric s(g), 1.16 to 1.28 ×10^6.)
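The rewiring litmus test can be tried on any edge list. The following is a minimal sketch, not the procedure used for Figure 15: the function names (`rewire_once`, `rewire`, `s_metric`, `degree_counts`) are hypothetical, the swap preserves degrees and simplicity but does not check connectivity, and the input is assumed to be a simple undirected graph.

```python
import random
from collections import Counter

def degree_counts(edges):
    """Degree of every vertex in an undirected edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def s_metric(edges):
    """s(g): sum of d_i * d_j over all links (i, j)."""
    deg = degree_counts(edges)
    return sum(deg[a] * deg[b] for a, b in edges)

def rewire_once(edges, rng):
    """Try one swap (a,b),(c,d) -> (a,d),(c,b); return True if it was applied."""
    x, y = rng.sample(range(len(edges)), 2)
    a, b = edges[x]
    c, d = edges[y]
    if rng.random() < 0.5:           # choose one of the two possible pairings
        c, d = d, c
    if len({a, b, c, d}) < 4:        # a shared endpoint would create a self-loop
        return False
    present = {frozenset(e) for e in edges}
    if frozenset((a, d)) in present or frozenset((c, b)) in present:
        return False                 # would create a parallel link
    edges[x], edges[y] = (a, d), (c, b)
    return True

def rewire(edges, steps, seed=0):
    """Return a copy of `edges` after `steps` attempted swaps.

    Connectivity is NOT checked here (an assumption of this sketch).
    """
    rng = random.Random(seed)
    out = list(edges)
    for _ in range(steps):
        rewire_once(out, rng)
    return out
```

Comparing `s_metric(g)` with `s_metric(rewire(g, steps))` for increasing `steps` gives a curve analogous in spirit to the one in Figure 15, although a faithful comparison within G(D) would also have to reject swaps that disconnect the graph.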

This paper shows that scale-free networks have the potential for an interesting and rich theory, with most questions, particularly regarding graphs that are not trees, still largely open. Perhaps a final message of this paper is that developing a coherent theory for scale-free networks will require adhering to more rigorous mathematical and statistical standards than has been typical to date.

Acknowledgments

The authors are indebted to several colleagues for ongoing conversations and valuable feedback, particularly David Aldous, Jean Carlson, Steven Low, Chris Magee, Matt Roughan, and Stanislav Shalunov. This work was supported by Boeing, AFOSR URI 49620-01-1-0365 “Architectures for Secure and Robust Distributed Infrastructures”, the Army Institute for Collaborative Biotechnologies, NSF Award: CCF-0326635 “ITR COLLAB: Theory and Software Infrastructure for a Scalable Systems Biology,” AFOSR Award: FA9550-05-1-0032 “Bio Inspired Networks,” and Caltech’s Lee Center for Advanced Networking. Parts of this work were done at the Institute of Pure and Applied Mathematics (IPAM) at UCLA as part of the 2002 annual program on “Large-scale communication networks.”

A Constructing an $s_{\max}$-graph

As defined previously, the $s_{\max}$ graph is the element $g$ in some background set $G$ whose connectivity maximizes the quantity

$$s(g) = \sum_{(i,j) \in E} d_i d_j,$$

where $d_i$ is the degree of vertex $i \in V$, $E$ is the set of links that define $g$, and $D = \{d_1, d_2, \ldots, d_n\}$ is the corresponding degree sequence. Recall that since $D$ is ordered according to $d_1 \ge d_2 \ge \ldots \ge d_n$, there will usually be many different graphs with vertices satisfying $D$. The purpose of this Appendix is to describe how to construct such an element for different background sets, as well as to discuss the importance of choosing the “right” background set.
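As a concrete illustration of the definition of $s(g)$, the sketch below evaluates it for two trees that share the degree sequence $\{3, 2, 2, 1, 1, 1\}$. The function name `s_metric` and the edge-list representation are assumptions made here for illustration only.

```python
from collections import Counter

def s_metric(edges):
    """Return s(g) = sum over links (i, j) of d_i * d_j.

    `edges` is a list of (i, j) pairs; a self-loop (i, i) adds 2 to d_i.
    """
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return sum(degree[i] * degree[j] for i, j in edges)

# Two trees with the same degree sequence {3, 2, 2, 1, 1, 1}.
# In tree_a both degree-2 vertices attach to the hub; in tree_b they form a chain.
tree_a = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5)]
tree_b = [(0, 1), (0, 3), (0, 4), (1, 2), (2, 5)]

print(s_metric(tree_a), s_metric(tree_b))  # prints "19 18"
```

The tree that places the higher-degree vertices next to each other has the larger $s$ value, which is the behavior the $s_{\max}$ constructions in this Appendix exploit.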

A.1 Among “Unconstrained” Graphs

As a first case, consider the set of graphs having degree sequence $D$, with only the requirement that $\sum_{i=1}^{n} d_i$ be even. That is, we do not require that these graphs be simple (i.e., they can have self-loops or multiple links between vertices) or that they even be connected, and we accordingly call this set of graphs “unconstrained”. Constructing the $s_{\max}$ element among these graphs can be achieved trivially, by applying the following two-phase process. First, for each vertex $i$: if $d_i$ is even, then attach $d_i/2$ self-loops; if $d_i$ is odd, then attach $(d_i - 1)/2$ self-loops, leaving one available “stub”. Second, for all remaining vertices with “stubs”, connect them in pairs according to decreasing values of $d_i$. Obviously, the resulting graph is not unique as the $s_{\max}$ element (indeed, two vertices with the same degree could replace their self-loops with connections among one another). Nonetheless, this construction does maximize $s(g)$, and in the case when $d_i$ is even for all $i \in V$, one achieves an $s_{\max}$ graph with

$$s(g) = \sum_{i=1}^{n} (d_i/2) \cdot d_i^2.$$

As discussed in Section 5.4, against this background of unconstrained graphs, the $s_{\max}$ graph is the perfectly assortative (i.e., $r(g) = 1$) graph. In the case when some $d_i$ are odd, the $s_{\max}$ graph will have a value of $s(g)$ that is somewhat less and will depend on the specific degree sequence. Thus, the value $\sum_{i=1}^{n} (d_i/2) \cdot d_i^2$ represents an idealized upper bound for the value of $s_{\max}$ among unconstrained graphs, but it can only be realized in the case when all vertex degrees are even.
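The two-phase process can be sketched in a few lines. This is an illustrative sketch under the assumptions stated in the text (self-loops and disconnected results are allowed); the names `unconstrained_smax`, `s_value`, and `upper_bound` are hypothetical.

```python
def unconstrained_smax(degrees):
    """Two-phase construction: self-loops first, then pair the leftover stubs."""
    if sum(degrees) % 2:
        raise ValueError("the sum of the degrees must be even")
    edges, odd = [], []
    for v, d in enumerate(degrees):
        edges += [(v, v)] * (d // 2)      # each self-loop consumes two stubs
        if d % 2:
            odd.append(v)                 # one stub is left over at v
    odd.sort(key=lambda v: degrees[v], reverse=True)
    edges += list(zip(odd[0::2], odd[1::2]))  # pair stubs by decreasing degree
    return edges

def s_value(edges, degrees):
    """s(g) for an edge list whose vertices have the given degrees."""
    return sum(degrees[i] * degrees[j] for i, j in edges)

def upper_bound(degrees):
    """Idealized bound sum_i (d_i / 2) * d_i^2."""
    return sum(d ** 3 for d in degrees) / 2

even = [4, 2, 2]
mixed = [3, 1]
print(s_value(unconstrained_smax(even), even), upper_bound(even))     # 40 40.0
print(s_value(unconstrained_smax(mixed), mixed), upper_bound(mixed))  # 12 14.0
```

For the all-even sequence the bound is attained, while for the sequence with odd degrees the constructed value falls below it, as the text describes.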


A.2 Among Graphs in G(D)

A significantly more complicated situation arises when constructing elements of the space $G(D)$, that is, simple connected graphs having $n$ vertices and a particular degree sequence $D$. Even so, not all sequences $D$ will allow for the connection of $n$ vertices, i.e., the set $G(D)$ may be empty. In the language of discrete mathematics, one says that a sequence of integers $\{d_1, d_2, \ldots, d_n\}$ is graphical if it satisfies the degree sequence of some simple, connected graph, that is, if $G(D)$ is nonempty. One characterization of whether or not a sequence $D$ corresponds to a simple, connected graph is due to Erdős and Gallai [42].

Theorem 1 (Erdős and Gallai [42]). A sequence of positive integers $d_1, d_2, \ldots, d_n$ with $d_1 \ge d_2 \ge \ldots \ge d_n$ is graphical if and only if $\sum_{i=1}^{n} d_i$ is even and for each integer $k$, $1 \le k \le n - 1$,

$$\sum_{j=1}^{k} d_j \le k(k - 1) + \sum_{j=k+1}^{n} \min(k, d_j).$$

As already noted, one possible problem is that the sequence may have “too many” or “too few” degree-one vertices. For example, since the total number of links $l$ in any graph will be equal to $l = \sum_{i=1}^{n} d_i / 2$, a connected graph cannot have an odd $\sum_{i=1}^{n} d_i$, but if this happens then adding or subtracting a degree-one vertex to $D$ would “fix” this problem. Theorem 1 further states that additional conditions are required to ensure a simple connected graph, specifically that the degree of any vertex cannot be “too large”. For example, the sequence $\{10, 1, 1, 1\}$ cannot correspond to a simple graph. We will not attempt to explain all such conditions, except to note that improvements have been made to Theorem 1 that reduce the number of sufficient conditions to be checked [108] and also that several algorithms have been developed to test for the existence of a graph satisfying a particular degree sequence $D$ (e.g., see the section on “Generating Graphs” in [93]).
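The inequalities of Theorem 1 are straightforward to check directly. The sketch below is a naive quadratic-time check written for illustration (the name `is_graphical` is an assumption, and this is not one of the improved tests cited above). It tests only the parity condition and the Erdős–Gallai inequalities, so it says nothing about connectivity by itself, and it also checks $k = n$, which is harmless.

```python
def is_graphical(degrees):
    """Check the parity condition and the Erdos-Gallai inequalities."""
    d = sorted(degrees, reverse=True)
    n = len(d)
    if sum(d) % 2:
        return False                       # an odd degree sum is never realizable
    for k in range(1, n + 1):
        lhs = sum(d[:k])                   # sum of the k largest degrees
        rhs = k * (k - 1) + sum(min(k, x) for x in d[k:])
        if lhs > rhs:
            return False                   # the k largest degrees are "too large"
    return True

print(is_graphical([10, 1, 1, 1]))  # False
print(is_graphical([3, 3, 1, 1]))   # False (even sum, but inequality fails at k = 2)
print(is_graphical([3, 1, 1, 1]))   # True (a star)
```

Note that $\{10, 1, 1, 1\}$ already fails the parity condition, whereas $\{3, 3, 1, 1\}$ has an even sum and is rejected only by the inequality, which is the “too large” situation described above.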

Our approach to constructing the $s_{\max}$ element of $G(D)$ is via a heuristic procedure that incrementally builds the network in a greedy fashion, by iterating through the set of all potential links $O = \{(i, j) : i < j;\ i, j = 1, 2, \ldots, n\}$, which we order according to decreasing values of $d_i d_j$. In what follows we refer to the value $d_i d_j$ as the weight of link $(i, j)$. We add links from the ordered list of elements in $O$ until all vertices have been added and the corresponding links satisfy the degree sequence $D$. To facilitate the exposition of this construction, we introduce the following notation. Let $A$ be the set of vertices that have been added to the partial graph $g_A$, such that $B = V \setminus A$ is the set of remaining vertices to be added. At each stage of the construction, we keep track of the current degree for vertex $i$, denoted $\tilde{d}_i$, so that it may be compared with its intended degree $d_i$ (note that $\tilde{d}_i = 0$ for all $i \in B$). Define $w_i = d_i - \tilde{d}_i$ as the number of remaining stubs, that is, the number of connections still to be made to vertex $i$. Note that values of $\tilde{d}_i$ and $w_i$ will change during the construction process, while the intended degree $d_i$ remains fixed. For any point during the construction, define $w_A = \sum_{i \in A} w_i$ to be the total number of remaining stubs in $A$ and $d_B = \sum_{i \in B} d_i$ to be the total degree of the unattached vertices in $B$. The values $w_A$ and $d_B$ are critical to ensuring that the final graph is connected and has the intended degree sequence. In particular, our algorithm will make use of several conditions.

Condition A-1: (Disconnected Cluster). If at any point during the incremental construction the partial graph $g_A$ has $w_A = 0$ while $|B| > 0$, then the final graph will be disconnected.

Proof: By definition $w_A$ is the number of stubs available in the partial graph $g_A$. If there are additional nodes to be added to the graph but no more stubs in the partial graph, then any incremental growth can occur only by forming an additional, separate cluster.

Condition A-1a: (Disconnected Cluster). If at any point during the construction algorithm the partial graph $g_A$ has $w_A = 2$ with $|B| > 0$, then adding a link between the two stubs in $g_A$ will result in a disconnected graph.

Proof: Adding a link between the two stubs will yield $w_A = 0$ with $|B| > 0$, thus resulting in Condition A-1.

Condition A-2: (Tree Condition). If at any point during the construction

$$d_B = 2|B| - w_A, \qquad (17)$$

then the addition of all remaining vertices and links to the graph must be acyclic (i.e., tree-like, without loops) in order to achieve a single connected graph while satisfying the degree sequence.

Proof: To see this more clearly, suppose that at some intermediate point in the construction process $w_A = m$. That is, there are exactly $m$ remaining stubs in the connected component to which the remaining vertices in $B$ must attach. We can prove that, in order to satisfy the degree sequence while maintaining a single connected graph, each of these $m$ stubs must become the root of a tree. First, recall from basic graph theory that an acyclic graph connecting $n$ vertices will have exactly $l = n - 1$ links. Define $B_j \subset B$ for $j = 1, \ldots, m$ to be the subset of remaining vertices to be added to stub $j$, where $\bigcup_{j=1}^{m} B_j = B$. Further assume for the moment that $\bigcap_{j=1}^{m} B_j = \emptyset$, that is, each vertex in $B$ connects to a subgraph rooted at one and only one stub. Connecting the vertices in $B_j$ to a subgraph rooted at stub $j$ will require a minimum of $|B_j|$ links (i.e., $|B_j| - 1$ links to form a tree among the $|B_j|$ vertices plus one additional link to connect the tree to the stub). Thus, in order to connect the vertices in the set $B_j$ as a tree rooted at stub $j$, we require $\sum_{k \in B_j} d_k = 2|B_j| - 1$, and to attach all vertices in $B$ to the $m$ stubs we have

$$\begin{aligned}
d_B = \sum_{i \in B} d_i &= \sum_{j=1}^{m} \sum_{k \in B_j} d_k \\
&= \sum_{j=1}^{m} \left( 2|B_j| - 1 \right) \\
&= 2|B| - m \\
&= 2|B| - w_A.
\end{aligned}$$

Thus, at the point when (17) occurs, only trees can be constructed from the remaining vertices in $B$.

The Algorithm

Here, we introduce the algorithm for our heuristic construction and then discuss the conditions when this construction is guaranteed to result in the $s_{\max}$ graph.

• STEP 0 (INITIALIZATION): Initialize the construction by adding vertex 1 to the partial graph; that is, begin with $A = \{1\}$, $B = \{2, 3, \ldots, n\}$, and $O = \{(1, 2), \ldots\}$. Thus, $w_A = d_1$ and $d_B = \sum_{i=2}^{n} d_i$.

• STEP 1 (LINK SELECTION): Check to see if there are any admissible elements in the ordered list $O$.

(a) If $|O| = 0$, then TERMINATE. Return the graph $g_A$.

(b) If $|O| > 0$, select the element(s), denoted here as $(i, j)$, having the largest weight $d_i d_j$, noting that there may be more than one of them. For each such link $(i, j)$, check $w_i$ and $w_j$: if either $w_i = 0$ or $w_j = 0$, then remove $(i, j)$ from $O$.

(c) If no admissible links remain, return to STEP 1(a).

(d) Among all remaining links having both $w_i > 0$ and $w_j > 0$, select the element $(i, j)$ with the largest value of $\min(w_i, w_j)$, and proceed to STEP 2.

• STEP 2 (LINK ADDITION): For the link $(i, j)$ to be added, consider two types of connections.

– Type I: $i \in A$, $j \in B$. Here, vertex $i$ is the highest-degree vertex in $A$ with a non-zero number of stubs (i.e., $d_i = \max_{k \in A} d_k$ and $w_i > 0$) and $j$ is the highest-degree vertex in $B$. Add link $(i, j)$ to the partial graph $g_A$: remove vertex $j$ from $B$ and add it to $A$, decrement $w_i$ and $w_j$, and update both $w_A$ and $d_B$ accordingly. Remove $(i, j)$ from the ordered list $O$.

– Type II: $i \in A$, $j \in A$, $i \ne j$. Here, $i$ and $j$ are the highest-degree vertices in $A$ for which $w_i > 0$ and $w_j > 0$.

∗ Check the Tree Condition: If $d_B = 2|B| - w_A$, then Type II links are not permitted. Remove the link $(i, j)$ from $O$ without adding it to the partial graph.

∗ Check the Disconnected Cluster Condition: If $w_A = 2$, then adding this link would result in a disconnected graph. Remove the link $(i, j)$ from $O$ without adding it to the partial graph.

∗ Else, add the link $(i, j)$ to the partial graph: decrement $w_i$ and $w_j$, and update $w_A$ accordingly. Remove $(i, j)$ from the ordered list $O$.

Note: There is potentially a third case in which $i \in B$, $j \in B$, $i \ne j$; however, this can only occur if there are no remaining stubs in the partial graph $g_A$. This is precluded by the test for the Disconnected Cluster Condition among Type II link additions; however, if the algorithm were modified to allow this, then this third case would represent the situation where graph construction continues with a new (disconnected) cluster. Adding link $(i, j)$ to the graph would require moving both vertices $i$ and $j$ from $B$ to $A$, decrementing $w_i$ and $w_j$, updating both $w_A$ and $d_B$ accordingly, and removing $(i, j)$ from the ordered list $O$.

• STEP 3 (REPEAT): Return to STEP 1.

Each iteration of the algorithm either adds a link from the list in $O$ or removes it from consideration. Since there are a finite number of elements in $O$, the algorithm is guaranteed to terminate in a finite number of steps. Furthermore, the ordered nature of $O$ ensures the following property.
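The steps above can be sketched as follows. This is a simplified, unoptimized reading of the algorithm rather than the authors' implementation, and it makes several assumptions that should be read as such: vertices are indexed from 0, `degrees` is already sorted in decreasing order, ties are broken by the largest $\min(w_i, w_j)$ and then by list order, links with both endpoints in $B$ are simply deferred until one endpoint has joined $A$, and the $w_A = 2$ test is applied only while $|B| > 0$, as in Condition A-1a. The name `greedy_smax` is hypothetical.

```python
from itertools import combinations

def greedy_smax(degrees):
    """Greedy heuristic sketch; returns (edges, w) with w[i] = unfilled stubs at i."""
    n = len(degrees)
    w = list(degrees)                      # remaining stubs per vertex
    in_A = [False] * n
    in_A[0] = True                         # STEP 0: start from the top vertex
    size_B = n - 1
    w_A, d_B = degrees[0], sum(degrees[1:])

    def weight(e):
        return degrees[e[0]] * degrees[e[1]]

    # Ordered list O of candidate links, heaviest first.
    O = sorted(combinations(range(n), 2), key=weight, reverse=True)
    edges = []
    while True:
        # STEP 1: admissible links have stubs at both ends and touch A.
        cands = [(i, j) for (i, j) in O
                 if w[i] > 0 and w[j] > 0 and (in_A[i] or in_A[j])]
        if not cands:
            return edges, w
        top = weight(cands[0])             # cands keeps the ordering of O
        i, j = max((e for e in cands if weight(e) == top),
                   key=lambda e: min(w[e[0]], w[e[1]]))
        O.remove((i, j))
        if in_A[i] and in_A[j]:            # STEP 2, Type II
            tree_condition = d_B == 2 * size_B - w_A
            would_disconnect = w_A == 2 and size_B > 0
            if tree_condition or would_disconnect:
                continue                   # skip this link without adding it
            w_A -= 2
        else:                              # STEP 2, Type I
            new = j if in_A[i] else i
            in_A[new] = True
            size_B -= 1
            d_B -= degrees[new]
            w_A += degrees[new] - 2
        w[i] -= 1
        w[j] -= 1
        edges.append((i, j))
```

For the tree sequence `[3, 2, 2, 1, 1, 1]` this sketch returns the links `(0, 1), (0, 2), (0, 3), (1, 4), (2, 5)` with no stubs left over, and for `[3, 3, 3, 3]` it returns the six links of the complete graph. A nonzero entry in the returned `w` would signal that the intended degree sequence was not achieved.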

Proposition A-3: At each point during the above construction, for any vertices $i \in A$ and $j \in B$, $d_i \ge d_j$.

Proof: By construction, if $i \in A$ and $j \in B$, then for some previously added vertex $k \in A$, it must have been the case that $d_k d_i \ge d_k d_j$. Since $d_k > 0$, it follows that $d_i \ge d_j$.

A less obvious feature of this construction is whether or not the algorithm returns a simple connected graph satisfying degree sequence $D$ (if one exists). While this remains an open question, we show that if the Tree Condition is ever reached, then the algorithm is guaranteed to return a graph satisfying the intended degree sequence.

Proposition A-4: (Tree Construction). Given a graphical sequence $D$, if at any point during the above algorithm the Tree Condition is satisfied, then

(a) the Tree Condition will remain satisfied through all intermediate construction, and

(b) the final graph will exactly satisfy the intended degree sequence.

Proof: To show part (a), assume that $d_B = 2|B| - w_A$ and observe that as a result only a link satisfying Type I can be added next by our algorithm. Thus, the next link $(i, j)$ to be added will have $i \in A$ and $j \in B$, and in doing so we will move vertex $j$ from the working set $B$ to $A$. As a result of this update, we will have $\Delta d_B = -d_j$, $\Delta |B| = -1$, and $\Delta w_A = d_j - 2$. Thus, we have updated the following values.

$$\begin{aligned}
d_B' &\equiv d_B + \Delta d_B = d_B - d_j \\
2|B'| - w_A' &\equiv 2(|B| + \Delta |B|) - (w_A + \Delta w_A) \\
&= 2(|B| - 1) - (w_A + d_j - 2) \\
&= 2|B| - w_A - d_j \\
&= d_B - d_j
\end{aligned}$$

Thus, $d_B' = 2|B'| - w_A'$, and the Tree Condition will continue to hold after the addition of each subsequent Type I link $(i, j)$.

To show part (b), observe that after $|B|$ Type I link additions (each of which results in $\Delta |B| = -1$) the set $B$ will be empty, thereby implying also that $d_B = 0$. Since the relationship $d_B = 2|B| - w_A$ continues to hold after each Type I link addition, it must be that $|B| = 0$ and $d_B = 0$ collectively imply $w_A = 0$. Furthermore, since $w_A = \sum_{i \in A} w_i$ and $w_i = d_i - \tilde{d}_i \ge 0$ for all $i$, then $w_i = 0$ for all $i$, and the degree sequence is satisfied.

An important question is under what conditions the Tree Condition is met during the construction process. Rewriting this condition as $d_B - [2|B| - w_A] = 0$, observe that when the algorithm is initialized in STEP 0, we have $d_B = \sum_{i=2}^{n} d_i$, $w_A = d_1$, and $|B| = n - 1$. This implies that after initialization, we have

$$d_B - [2|B| - w_A] = \sum_{i=2}^{n} d_i - 2|B| + d_1 = \sum_{i=1}^{n} d_i - 2(n - 1).$$


Note that minimal connectivity among $n$ nodes is achieved by a tree having total degree $\sum_{i=1}^{n} d_i = 2(n - 1)$, and this corresponds to the case when the Tree Condition is met at initialization. However, if the sequence $D$ is graphical and the Tree Condition is not met at initialization, then $d_B - [2|B| - w_A] = 2z > 0$, where $z = \left( \sum_{i=1}^{n} d_i / 2 \right) - (n - 1)$ is the number of “extra” links above what a tree would require. Assuming $z > 0$, consider the outcome of subsequent LINK ADDITION operations, as defined in STEP 2:

• As already noted, when a Type I connection is made (thus adding a new vertex $j$ to the graph), we have $\Delta d_B = -d_j$, $\Delta w_A = d_j - 2$, and $\Delta |B| = -1$, which in turn means that Type I connections result in $\Delta (d_B - [2|B| - w_A]) = 0$.

• Accordingly, when a Type II connection is made between two stubs in $A$, we have $\Delta w_A = -2$, and both $|B|$ and $d_B$ remain unchanged. Thus, $\Delta (d_B - [2|B| - w_A]) = -2$.

So if $d_B - [2|B| - w_A] = 2z > 0$, then subsequent link additions will cause this value to either decrease by 2 or remain unchanged, or in other words, adding additional links can only bring the algorithm closer to the Tree Condition. Nonetheless, our algorithm is not guaranteed to reach the Tree Condition for all graphical sequences $D$ (i.e., we have not proved this), although we have not found any counter-examples in which the algorithm fails to achieve the desired degree sequence. If that were to happen, however, the algorithm would terminate with $w_i > 0$ for some vertex $i \in A$, even though $|B| = 0$. Nonetheless, in the case where the graph resulting from our construction does satisfy the intended degree sequence $D$, we can prove that it is indeed the $s_{\max}$ graph.
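The bookkeeping behind this argument can be checked numerically. The sketch below tracks the triple $(d_B, |B|, w_A)$ under the two update rules for the sequence $\{3, 3, 3, 3\}$; the helper names `tree_gap`, `type_one`, and `type_two` are hypothetical and introduced only for illustration.

```python
def tree_gap(d_B, size_B, w_A):
    """d_B - [2|B| - w_A]; zero exactly when the Tree Condition holds."""
    return d_B - (2 * size_B - w_A)

def type_one(d_B, size_B, w_A, d_j):
    """Attach a new vertex of degree d_j from B to a stub in A."""
    return d_B - d_j, size_B - 1, w_A + d_j - 2

def type_two(d_B, size_B, w_A):
    """Join two stubs that are both already in A."""
    return d_B, size_B, w_A - 2

degrees = [3, 3, 3, 3]
n = len(degrees)
z = sum(degrees) // 2 - (n - 1)          # "extra" links beyond a tree
state = (sum(degrees[1:]), n - 1, degrees[0])

print(z, tree_gap(*state))               # 3 6  (gap equals 2z at initialization)
state = type_one(*state, d_j=3)
print(tree_gap(*state))                  # 6    (unchanged by a Type I link)
state = type_two(*state)
print(tree_gap(*state))                  # 4    (reduced by 2 by a Type II link)
```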

Proposition A-5: (General Construction). If the graph $g$ resulting from our algorithm is a connected, simple graph satisfying the intended degree sequence $D$, then this graph is the $s_{\max}$ graph of $G(D)$.

Proof: Observe that, in order to satisfy the degree sequence $D$, the graph $g$ contains a total of $l = \sum_{i=1}^{n} d_i / 2$ links from the ordered list $O$. Since elements of $O$ are ordered by decreasing weight $d_i d_j$, it is obvious that, in the absence of constraints that require the final graph to be connected or satisfy the sequence $D$, a graph containing the first $l$ elements of $O$ will maximize $\sum_{(i,j) \in E} d_i d_j$. However, in order to ensure that $g$ is an element of the space $G(D)$, when selecting the $l$ links it is usually necessary to “skip” some elements of $O$, and Conditions A-1 and A-2 identify two simple situations where skipping a potential link is required. While skipping links under other conditions may be necessary to guarantee that the resulting graph satisfies $D$ (indeed, the current algorithm is not guaranteed to do this), our argument is that if these are the only conditions under which elements of $O$ have been skipped during construction and the resulting graph does satisfy $D$, then the resulting graph maximizes $s(g)$.

To see this more clearly, consider a second graph $\tilde{g} \ne g$ also constructed from the ordered list $O$. Let $E \subset O$ be the (ordered) list of links in the graph $g$, and let $\tilde{E} \subset O$ be the (ordered) list of links in the graph $\tilde{g}$. Assume that these two lists differ by only a single element, namely $e \in E$, $e \notin \tilde{E}$ and $\tilde{e} \notin E$, $\tilde{e} \in \tilde{E}$, where $E \setminus e = \tilde{E} \setminus \tilde{e}$. By definition, both $e$ and $\tilde{e}$ are elements of $O$, and there are two possible cases for their relative position within this ordered list (here, we use the notation “$\prec$” to mean “precedes in order”).

• If $e \prec \tilde{e}$, then $\tilde{g}$ uses in place of $e$ a link that occurs “later” in the sequence $O$. However, since $O$ is ordered by weight, using $\tilde{e}$ cannot result in a higher value for $s(\tilde{g})$.

• If $\tilde{e} \prec e$, then $\tilde{g}$ uses in place of $e$ a link that occurs “earlier” in the sequence $O$, one that had been “skipped” in the construction of $g$. However, the “skipped” elements of $O$ will correspond to instances of Conditions A-1 and A-2, and using them must necessarily result in a graph $\tilde{g} \notin G(D)$, either because it is disconnected or because its degree sequence does not satisfy $D$.

Thus, for any other graph $\tilde{g}$, it must be the case that either $s(\tilde{g}) \le s(g)$ or $\tilde{g} \notin G(D)$, and therefore we have shown that $g$ is the $s_{\max}$ graph.

A.3 Among Connected, Acyclic Graphs

In the special case when $\sum_{i=1}^{n} d_i = 2(n - 1)$, there exists only one type of graph structure that will connect all $n$ nodes, namely an acyclic graph (i.e., a tree). All connected acyclic graphs are necessarily simple. Because acyclic graphs are a special case of elements in $G(D)$, generating $s_{\max}$ trees is achieved by making the appropriate Type I connections in the aforementioned algorithm. In effect, this construction is essentially a type of deterministic preferential attachment, one in which we iterate through all vertices in the ordered list $D$ and attach each to the highest-degree vertex with a remaining stub.

In the case of trees, the arguments underlying the $s_{\max}$ proof can be made more precise. Observe that the incremental construction of a tree is equivalent to choosing for each vertex in $B$ the single vertex in $A$ to which it becomes attached. Consider the choices available for connecting two vertices $k, m \in B$ to vertices $i, j \in A$ where $d_i \ge d_j$, $d_k \ge d_m$, and observe that

$$d_i d_k + d_i d_m \ge d_i d_k + d_j d_m \ge d_j d_k + d_i d_m \ge d_j d_k + d_j d_m,$$

where the second inequality follows from Proposition 3 while the first and last inequalities are by assumption. There are two cases of interest. First, if $w_i > 1$ and $w_j \ge 1$, then it is clear that it is optimal to connect both vertices $k, m \in B$ to vertex $i \in A$. Second, if $w_i = 1$ and $w_j \ge 1$, then it is clear that it is optimal to connect $k \in B$ to $i \in A$ and $m \in B$ to $j \in A$. All other scenarios can be decomposed into these two cases, thus proving that the algorithm’s incremental construction for a tree is guaranteed to result in the $s_{\max}$ graph.
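The deterministic preferential attachment described for trees admits a very short sketch. The function name `smax_tree` is an assumption, vertices are relabeled 0 to $n-1$ in order of decreasing degree, and the input is assumed to be a sequence of positive integers summing to $2(n-1)$.

```python
def smax_tree(degrees):
    """Attach each vertex, in decreasing degree order, to the
    highest-degree vertex already in the tree that still has a stub."""
    d = sorted(degrees, reverse=True)
    n = len(d)
    if n < 2 or min(d) < 1 or sum(d) != 2 * (n - 1):
        raise ValueError("need positive degrees summing to 2(n - 1)")
    stubs = list(d)
    edges = []
    host = 0                       # lowest index (highest degree) with a free stub
    for v in range(1, n):
        while stubs[host] == 0:    # stubs are never returned, so host only advances
            host += 1
        edges.append((host, v))
        stubs[host] -= 1
        stubs[v] -= 1
    return edges

print(smax_tree([3, 2, 2, 1, 1, 1]))
# [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5)]
```

Because the degrees sum to $2(n-1)$ and are sorted, the vertices already in the tree always retain at least one free stub when the next vertex arrives, so `host` never overtakes `v`.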

There are many important properties of $s_{\max}$ trees that are discussed in Section 4, which we now prove.

A.3.1 Properties of $s_{\max}$ Acyclic Graphs

Recall that our working definition of so-called betweenness (also known as betweenness centrality) for a vertex $v \in V$ in an acyclic graph is given by

$$C_b(v) = \frac{\sum_{s<t \in V} \sigma_{st}(v)}{\sum_{s<t \in V} \sigma_{st}} = \frac{\sigma(v)}{n(n - 1)/2},$$

where we use the notation $\sigma(v)$ to denote the number of unique paths in the graph passing through node $v$, and where the total number of unique paths between vertex pairs $s$ and $t$ is $n(n - 1)/2$.


For a given node $v \in V$, let $N(v)$ denote the set of neighboring nodes, where by definition $|N(v)| = d_v$. For all nodes that are not the root of the tree, exactly one of these neighbors will be “upstream” while the rest will be “downstream” (in contrast, the root node has only downstream neighbors). Define $b_j$ to be the total number of nodes “connected” through the $j$th neighbor. Our convention will be to denote the “upstream” neighbor with index 0 (if it exists); thus for all nodes $v$ other than the root, one has $\sum_{j=0}^{d_v-1} b_j = n - 1$ (for the root node $r$, the appropriate summation is $\sum_{j=1}^{d_r} b_j = n - 1$). Using this notation, it becomes clear that, for each node $v$ other than the root of the tree, we can express

$$\sigma(v) = \sum_{\substack{j,k=0 \\ j<k}}^{d_v-1} b_j b_k = b_0 \sum_{k=1}^{d_v-1} b_k + \sum_{\substack{j,k=1 \\ j<k}}^{d_v-1} b_j b_k.$$

Thus, $\sigma(v)$ decomposes into two components: the first measures the number of paths between upstream and downstream nodes that pass through node $v$, and the second measures the number of paths passing through node $v$ that are between downstream nodes only. Since $\sum_{s<t \in V} \sigma_{st}$ is a constant for trees containing $n$ nodes, when comparing the centrality for two nodes $u$ and $v$, we work directly with $\sigma(u)$ and $\sigma(v)$. In so doing, for nodes $u$ and $v$ we will denote $b^u_j$, $b^v_j$ as the number of nodes connected to each via their respective $j$th neighbor.

One property of the $s_{\max}$ graph that will be useful for showing that there exists monotonicity between node centrality and node degree is given by the following Lemma.

Lemma 1. Let $g$ be the $s_{\max}$ acyclic graph for degree sequence $D$, and consider two nodes $u, v \in V$ satisfying $d_u > d_v$. Then, it necessarily follows that

$$\sum_{\substack{j,k=1 \\ j<k}}^{d_u-1} b^u_j b^u_k > \sum_{\substack{j,k=1 \\ j<k}}^{d_v-1} b^v_j b^v_k. \tag{18}$$

Note that the summation is over downstream nodes only; thus Lemma 1 states that, for $s_{\max}$ trees, the contribution to centrality from paths between downstream nodes is greater for nodes with higher degree.

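Proof of Lemma 1: Recalling from Proposition 1 that $b^u_j \ge b^v_j$ for all $j = 1, 2, \ldots, d_v - 1$, and noting that $d_u > d_v$,

$$\begin{aligned}
\sum_{\substack{j,k=1 \\ j<k}}^{d_u-1} b^u_j b^u_k
&= \sum_{\substack{j,k=1 \\ j<k}}^{d_v-1} b^u_j b^u_k + \sum_{j=1}^{d_v-1} \sum_{k=d_v}^{d_u-1} b^u_j b^u_k + \sum_{\substack{j,k=d_v \\ j<k}}^{d_u-1} b^u_j b^u_k \\
&> \sum_{\substack{j,k=1 \\ j<k}}^{d_v-1} b^v_j b^v_k + \sum_{j=1}^{d_v-1} \sum_{k=d_v}^{d_u-1} b^u_j b^u_k + \sum_{\substack{j,k=d_v \\ j<k}}^{d_u-1} b^u_j b^u_k \\
&> \sum_{\substack{j,k=1 \\ j<k}}^{d_v-1} b^v_j b^v_k.
\end{aligned}$$

Thus, the proof is complete.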

Lemma 1 in turn facilitates a proof of the more general statement regarding the centrality of nodes in the $s_{\max}$ acyclic graph, as stated in Proposition 3.

Proof of Proposition 3: We proceed in two parts. First, we show that if node $v$ is downstream from node $u$, then $\sigma(u) > \sigma(v)$. Second, we show that if $v$ is in a different branch of the tree from $u$ (i.e., neither upstream nor downstream from $u$) but $d_u > d_v$, then $\sigma(u) > \sigma(v)$.

Starting first with the scenario where $v$ is downstream from $u$, there are two cases that need to be addressed.

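Case 1: node $v$ is directly downstream from node $u$, and node $u$ is the root of the tree. Observe that we can represent $\sigma(v)$ as

$$\begin{aligned}
\sigma(v) &= b^v_0 \sum_{k=1}^{d_v-1} b^v_k + \sum_{\substack{j,k=1 \\ j<k}}^{d_v-1} b^v_j b^v_k \\
&= \Bigg( \sum_{\substack{j=1 \\ j \ne v}}^{d_u} b^u_j \Bigg) \big( b^u_v - 1 \big) + \sum_{\substack{j,k=1 \\ j<k}}^{d_v-1} b^v_j b^v_k,
\end{aligned} \tag{19}$$

since $b^v_0 = \sum_{j=1; j \ne v}^{d_u} b^u_j$ and also $b^u_v = 1 + \sum_{k=1}^{d_v-1} b^v_k$.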

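For node $u$, we have

$$\begin{aligned}
\sigma(u) &= \sum_{\substack{j,k=1 \\ j<k}}^{d_u} b^u_j b^u_k \\
&= b^u_v \sum_{\substack{k=1 \\ k \ne v}}^{d_u} b^u_k + \sum_{\substack{j,k=1 \\ j<k;\ j,k \ne v}}^{d_u} b^u_j b^u_k.
\end{aligned} \tag{20}$$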

Comparing $\sigma(u)$ and $\sigma(v)$, we observe that the first term of (20) is clearly greater than the first term of (19). Furthermore, by Lemma 1, we also observe that the second term of (20) is greater than the second term of (19). Thus, we conclude for this case that $\sigma(u) > \sigma(v)$.

Case 2: node $v$ is directly downstream from node $u$, but node $u$ is not the root of the tree. Recognizing for any node $i$ that $\sum_{j=1}^{d_i-1} b_j = (n-1) - b_0$, we write

$$\sigma(u) = b^u_0 \big( n - 1 - b^u_0 \big) + \sum_{\substack{j,k=1 \\ j<k}}^{d_u-1} b^u_j b^u_k,$$

$$\sigma(v) = b^v_0 \big( n - 1 - b^v_0 \big) + \sum_{\substack{j,k=1 \\ j<k}}^{d_v-1} b^v_j b^v_k.$$

As before, we observe from Lemma 1 that $\sum_{j,k=1; j<k}^{d_u-1} b^u_j b^u_k > \sum_{j,k=1; j<k}^{d_v-1} b^v_j b^v_k$, so proving that $\sigma(u) > \sigma(v)$ in this case requires simply that we show

$$b^u_0 \big( (n-1) - b^u_0 \big) > b^v_0 \big( (n-1) - b^v_0 \big). \tag{21}$$

Observe that $b^v_0 = b^u_0 + 1 + \sum_{j=1; j \ne v}^{d_u-1} b^u_j$. As a result, we have

$$\begin{aligned}
b^v_0 \big( (n-1) - b^v_0 \big)
&= \Bigg( b^u_0 + 1 + \sum_{\substack{j=1 \\ j \ne v}}^{d_u-1} b^u_j \Bigg) \Bigg( (n-1) - b^u_0 - 1 - \sum_{\substack{j=1 \\ j \ne v}}^{d_u-1} b^u_j \Bigg) \\
&= b^u_0 \big( (n-1) - b^u_0 \big) + \Bigg( 1 + \sum_{\substack{j=1 \\ j \ne v}}^{d_u-1} b^u_j \Bigg) \Bigg( (n-1) - 2 b^u_0 - \Bigg( 1 + \sum_{\substack{j=1 \\ j \ne v}}^{d_u-1} b^u_j \Bigg) \Bigg).
\end{aligned}$$

Figure 16: Centrality of high-degree nodes in the $s_{\max}$ tree. (Panels: Case 1, Case 2, Case 3.)

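Since $1 + \sum_{j=1; j \ne v}^{d_u-1} b^u_j > 0$, (21) is true if and only if

$$(n-1) - 2 b^u_0 - \Bigg( 1 + \sum_{\substack{j=1 \\ j \ne v}}^{d_u-1} b^u_j \Bigg) < 0,$$

which is equivalent to

$$\begin{aligned}
(n-1) - b^u_0 &< b^u_0 + 1 + \sum_{\substack{j=1 \\ j \ne v}}^{d_u-1} b^u_j, \\
\sum_{k=1}^{d_u-1} b^u_k &< b^u_0 + 1 + \sum_{\substack{j=1 \\ j \ne v}}^{d_u-1} b^u_j, \\
b^u_v &< b^u_0 + 1.
\end{aligned}$$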

This final statement will always be true for the $s_{\max}$ tree, since the “upstream” branch from node $u$ will always contain at least as many nodes as the downstream branch corresponding to node $v$.

These two cases prove that any “upstream” node in the $s_{\max}$ tree is always more central than any “downstream” node, since by extension if $u$ is directly upstream from $v$ then $\sigma(u) > \sigma(v)$, and if $v$ is directly upstream from $w$ then $\sigma(v) > \sigma(w)$. It therefore follows that $\sigma(u) > \sigma(w)$, and, by induction, that the “root” node of the $s_{\max}$ tree (having highest degree) is the most central within the entire tree.

Case 3: Now we turn to the case where node $v$ is not directly downstream (or upstream) from node $u$. As before, we write

$$\sigma(u) = b^u_0 \sum_{k=1}^{d_u-1} b^u_k + \sum_{\substack{j,k=1 \\ j<k}}^{d_u-1} b^u_j b^u_k,$$

$$\sigma(v) = b^v_0 \sum_{k=1}^{d_v-1} b^v_k + \sum_{\substack{j,k=1 \\ j<k}}^{d_v-1} b^v_j b^v_k.$$

As with the previous cases, by Lemma 1 we know that $\sum_{j,k=1; j<k}^{d_u-1} b^u_j b^u_k > \sum_{j,k=1; j<k}^{d_v-1} b^v_j b^v_k$, so proving that $\sigma(u) > \sigma(v)$ in this case requires simply that we show that

$$b^u_0 \sum_{k=1}^{d_u-1} b^u_k > b^v_0 \sum_{k=1}^{d_v-1} b^v_k. \tag{22}$$

We rewrite each of these as

$$b^u_0 = \sum_{j=1}^{d_v-1} b^v_j + \Bigg( b^u_0 - \sum_{j=1}^{d_v-1} b^v_j \Bigg),$$

$$b^v_0 = \sum_{j=1}^{d_u-1} b^u_j + \Bigg( b^v_0 - \sum_{j=1}^{d_u-1} b^u_j \Bigg),$$

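so that we have

$$b^u_0 \sum_{k=1}^{d_u-1} b^u_k = \Bigg( \sum_{j=1}^{d_v-1} b^v_j + \Bigg( b^u_0 - \sum_{j=1}^{d_v-1} b^v_j \Bigg) \Bigg) \sum_{k=1}^{d_u-1} b^u_k,$$

$$b^v_0 \sum_{k=1}^{d_v-1} b^v_k = \Bigg( \sum_{j=1}^{d_u-1} b^u_j + \Bigg( b^v_0 - \sum_{j=1}^{d_u-1} b^u_j \Bigg) \Bigg) \sum_{k=1}^{d_v-1} b^v_k,$$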

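and observe that

$$b^u_0 - \sum_{j=1}^{d_v-1} b^v_j = b^v_0 - \sum_{j=1}^{d_u-1} b^u_j,$$

which is a non-negative constant that we denote $\kappa$. Thus,

$$b^u_0 \sum_{j=1}^{d_u-1} b^u_j - b^v_0 \sum_{j=1}^{d_v-1} b^v_j = \kappa \Bigg( \sum_{j=1}^{d_u-1} b^u_j - \sum_{j=1}^{d_v-1} b^v_j \Bigg),$$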

which is also non-negative since $\sum_{j=1}^{d_u-1} b^u_j > \sum_{j=1}^{d_v-1} b^v_j$, and so (22) also holds. Thus, we have shown that $\sigma(u) > \sigma(v)$ in the $s_{\max}$ tree whenever $d_u > d_v$, thus completing the proof.
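The branch-size expression for $\sigma(v)$ used throughout this proof lends itself to a quick numerical spot check. The sketch below is illustrative only: the helper names (`adjacency`, `branch_sizes`, `sigma`, `degree_orders_centrality`) are hypothetical, the two example trees are chosen here and do not come from the paper, and the code uses the identity $\sum_{j<k} b_j b_k = \big( (\sum_j b_j)^2 - \sum_j b_j^2 \big)/2$ over all neighbors of $v$, upstream and downstream alike.

```python
def adjacency(edges):
    """Adjacency lists of an undirected tree given by its edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def branch_sizes(adj, v):
    """b_j: number of nodes reached through each neighbour of v."""
    sizes = []
    for nb in adj[v]:
        seen = {v, nb}
        stack = [nb]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        sizes.append(len(seen) - 1)  # do not count v itself
    return sizes


def sigma(adj, v):
    """Number of paths through v: sum over j < k of b_j * b_k."""
    b = branch_sizes(adj, v)
    total = sum(b)
    return (total * total - sum(x * x for x in b)) // 2


def degree_orders_centrality(adj):
    """True if d_u > d_v implies sigma(u) > sigma(v) for every node pair."""
    s = {v: sigma(adj, v) for v in adj}
    return all(s[u] > s[v]
               for u in adj for v in adj
               if len(adj[u]) > len(adj[v]))


if __name__ == "__main__":
    # Tree built by attaching to the highest-degree vertex with a free stub.
    greedy = adjacency([(0, 1), (0, 2), (0, 3), (1, 4), (2, 5)])
    # A different tree: a degree-2 node bridges hubs of degree 3 and 5.
    bridged = adjacency([(0, 1), (0, 2), (0, 3), (3, 4),
                         (4, 5), (4, 6), (4, 7), (4, 8)])
    print(degree_orders_centrality(greedy), degree_orders_centrality(bridged))
```

In the second tree the degree-2 bridge lies on more paths than the degree-3 hub, so the ordering by degree fails there. That tree is not the product of the greedy construction for its degree sequence, which is consistent with the proposition being specific to $s_{\max}$ trees.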

B The s(g)-Metric and Assortativity

Following the development of Newman [73], let $P(D_i = k) = P(k)$ be the node degree distribution over the ensemble of graphs and define $Q(k) = (k+1)P(k+1) / \sum_{j \in D} jP(j)$ to be the normalized distribution of remaining degree (i.e., the number of “additional” connections for each node at either end of the chosen link). Let $\bar{D} = \{d_1 - 1, d_2 - 1, \cdots, d_n - 1\}$ denote the remaining degree sequence for $g$. This remaining degree distribution is $Q(k) = \sum_{k' \in \bar{D}} Q(k, k')$, where $Q(k, k')$ is the joint probability distribution among remaining nodes, i.e., $Q(k, k') = P(D_i = k+1, D_j = k'+1 \mid (i,j) \in E)$. In a network where the remaining degrees of any two vertices are independent, i.e., $Q(k, k') = Q(k)Q(k')$, there is no degree-degree correlation, and this defines a network that is neither assortative nor disassortative (i.e., the “center” of this view into the ensemble). In contrast, a network with $Q(k, k') = Q(k)\delta[k - k']$ defines a perfectly assortative network. Thus, graph assortativity $r$ is quantified by the average of $Q(k, k')$ over all the links

$$r = \frac{\sum_{k,k' \in \bar{D}} kk' \big( Q(k,k') - Q(k)Q(k') \big)}{\sum_{k,k' \in \bar{D}} kk' \big( Q(k)\delta[k-k'] - Q(k)Q(k') \big)}, \tag{23}$$

with proper centering and normalization according to the value of a perfectly assortative network, which ensures that $-1 \le r \le 1$. Many stochastic graph generation processes can be understood directly in terms of the correlation distributions among these so-called remaining nodes, and this functional form facilitates the direct calculation of their assortativity. In particular, Newman [73] shows that both Erdos-Renyi random graphs and Barabasi-Albert preferential attachment growth processes yield ensembles with zero assortativity.

Newman [75] also develops the following sample-based definition of assortativity

$$r(g) = \frac{\Big[ \sum_{(i,j) \in E} d_i d_j / l \Big] - \Big[ \sum_{(i,j) \in E} \frac{1}{2} (d_i + d_j) / l \Big]^2}{\Big[ \sum_{(i,j) \in E} \frac{1}{2} (d_i^2 + d_j^2) / l \Big] - \Big[ \sum_{(i,j) \in E} \frac{1}{2} (d_i + d_j) / l \Big]^2},$$

which is equivalent to (14).

While the ensemble-based notion of assortativity in (23) has important differences from the sample-based notion of assortativity in (14), their relationship can be understood by viewing a given graph as a singleton on an ensemble of graphs (i.e., where the graph of interest is chosen with probability 1 from the ensemble). For this graph, if we define the number of nodes with degree $k$ as $N(k)$, we can derive the degree distribution $P(k)$ and the remaining degree distribution $Q(k)$ on the ensemble as

$$P(k) = \frac{N(k)}{n}$$

and

$$Q(k) = \frac{(k+1)P(k+1)}{\sum_{j \in D} jP(j)} = \frac{(k+1)N(k+1)}{\sum_{j \in D} jN(j)}.$$
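Both the sample-based assortativity $r(g)$ and the singleton-ensemble distribution $Q(k)$ above are straightforward to evaluate for a concrete graph. The following Python sketch is one possible rendering, with exact rational arithmetic; the function names are hypothetical, and returning `None` when the denominator vanishes (for example, when all degrees are equal) is a choice made here, not something the text prescribes.

```python
from collections import Counter
from fractions import Fraction


def degrees_from_edges(edges):
    """Degree of every node of an undirected graph given by its links."""
    deg = Counter()
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


def assortativity(edges):
    """Sample-based r(g); None if the denominator is zero."""
    deg = degrees_from_edges(edges)
    l = len(edges)
    prod = sum(Fraction(deg[i] * deg[j], l) for i, j in edges)
    mean = sum(Fraction(deg[i] + deg[j], 2 * l) for i, j in edges)
    sq = sum(Fraction(deg[i] ** 2 + deg[j] ** 2, 2 * l) for i, j in edges)
    denom = sq - mean ** 2
    if denom == 0:
        return None  # r(g) is undefined for this graph
    return (prod - mean ** 2) / denom


def remaining_degree_distribution(edges):
    """Q(k) = (k + 1) N(k + 1) / sum_j j N(j), keyed by remaining degree k."""
    deg = degrees_from_edges(edges)
    counts = Counter(deg.values())  # N(k): number of nodes with degree k
    two_l = sum(k * c for k, c in counts.items())
    return {k - 1: Fraction(k * c, two_l) for k, c in counts.items()}


if __name__ == "__main__":
    star = [(0, 1), (0, 2), (0, 3)]
    path = [(0, 1), (1, 2), (2, 3)]
    print(assortativity(star), assortativity(path))
    print(remaining_degree_distribution(star))
```

A star is perfectly disassortative under this definition, which makes it a convenient sanity check.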

Also it is easy to see that

$$\sum_{i \in V} d_i = \sum_{k \in D} kN(k) = 2l, \qquad \sum_{i \in V} d_i^2 = \sum_{k \in D} k^2 N(k), \qquad \ldots, \qquad \sum_{i \in V} d_i^m = \sum_{k \in D} k^m N(k),$$

where $m$ is a positive integer.

Equations (23) and (14) can be related term-by-term in the following manner. The first term of the numerator, $Q(k, k')$, represents the joint probability distribution of the (remaining) degrees of the two nodes at either end of a randomly chosen link. For a given graph, let $l(k, k')$ represent the number of links connecting nodes with degree $k$ to nodes with degree $k'$. Then, we can write $Q(k, k') = l(k, k')/l$, and hence

$$\sum_{k,k' \in \bar{D}} kk' Q(k, k') = \frac{1}{l} \sum_{(i,j) \in E} d_i d_j.$$

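The first term of the denominator of $r$ in equation (23) can be written as

$$\sum_{k,k' \in \bar{D}} kk' Q(k)\delta[k - k'] = \sum_{k \in \bar{D}} k^2 Q(k) \tag{24}$$

$$= \frac{\sum_{k \in \bar{D}} (k+1)^3 N(k+1)}{\sum_{j \in D} jN(j)} = \frac{\sum_{i \in V} d_i^3}{2l}, \tag{25}$$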

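and the “centering” term (in both the numerator and the denominator) is

$$\sum_{k,k' \in \bar{D}} kk' Q(k)Q(k') = \Bigg[ \sum_{k \in \bar{D}} kQ(k) \Bigg]^2 \tag{26}$$

$$= \Bigg( \frac{\sum_{k \in \bar{D}} (k+1)^2 N(k+1)}{\sum_{j \in D} jN(j)} \Bigg)^2 = \Bigg( \frac{\sum_{i \in V} d_i^2}{2l} \Bigg)^2. \tag{27}$$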

In both of these cases, the offset of a constant in representing the degree sequence as $D$ versus $\bar{D}$ does not affect the overall calculation. The relationships between the ensemble-based quantities (LHS of 24) and (LHS of 26) and their sample-based (i.e., structural) counterparts (25) and (27) hold (approximately) when the expected degree equals the actual degree.

To see why (27) can be viewed as the “center”, we consider the following thought experiment: what is the structure of a deterministic graph with degree sequence $D$ and having zero assortativity? In principle, a node in such a graph will connect to any other node in proportion to each node’s degree. While such a graph may not exist for general $D$, one can construct a deterministic pseudograph $g_A$ having zero assortativity in the following manner. Let $A = [a_{ij}]$ represent a (directed) node adjacency matrix of non-negative real values, representing the “link weights” in the pseudograph. That is, links are not constrained to integer values but can exist in fractional form. The zero assortative pseudograph will have symmetric weights given by

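$$a_{ij} = \Bigg( \frac{d_j}{\sum_{k \in V} d_k} \Bigg) \Bigg( \frac{d_i}{2} \Bigg) = \Bigg( \frac{d_i}{\sum_{k \in V} d_k} \Bigg) \Bigg( \frac{d_j}{2} \Bigg) = a_{ji}.$$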

Thus, the weight $a_{ij}$ for each link emanating out of node $i$ is in proportion to the degree of node $j$, in a manner that is relative to the sum of all node degrees. In general, the graphs of interest to us are undirected; however, here it is notationally convenient to consider the construction of directed graphs. Using these weights, the total weight among all links entering and exiting a particular node $i$ equals

$$\sum_{j \in V} a_{ij} + \sum_{k \in V} a_{ki} = d_i/2 + d_i/2 = d_i.$$

Accordingly, the total “link weights” in the pseudograph are equal to

$$\sum_{i,j \in V} a_{ij} = \sum_{j \in V} d_j/2 = l,$$

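where $l$ corresponds to the total number of links in a traditional graph. The $s$-metric for the pseudograph $g_A$ represented by matrix $A$ can be calculated as

$$\begin{aligned}
s(g_A) &= \sum_{j \in V} \sum_{i \in V} d_i a_{ij} d_j \\
&= \sum_{j \in V} \Bigg[ \sum_{i \in V} d_i \Bigg( \frac{d_j}{\sum_{k \in V} d_k} \Bigg) \Bigg( \frac{d_i}{2} \Bigg) \Bigg] d_j \\
&= \frac{\Big( \sum_{j \in V} d_j^2 \Big) \Big( \sum_{i \in V} d_i^2 \Big)}{2 \Big( \sum_{k \in V} d_k \Big)} = \frac{\Big( \sum_{j \in V} d_j^2 \Big)^2}{4l},
\end{aligned}$$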

and we have

$$\frac{s(g_A)}{l} = \Bigg( \frac{\sum_{i \in V} d_i^2}{2l} \Bigg)^2,$$

which is equal to (27).

In principle, one could imagine a deterministic procedure that uses the structural pseudograph $g_A$ to generate the zero assortativity graph among an “unconstrained” background set $G$. That is, graphs resulting from this procedure could have multiple links between any pair of nodes as well as multiple self-loops and would not necessarily be connected. The challenge in developing such a procedure is to ensure that the resulting graph has degree sequence equal to $D$, although one can imagine that in the limit of large graphs this becomes less of an issue. By extension, it is not hard to conceive a stochastic process that uses the structural pseudograph $g_A$ to generate a statistical ensemble of graphs having expected assortativity equal to zero. In fact, it is not hard to see why the GRG method is very close to such a procedure.
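The pseudograph identities derived above (row sums of $d_i/2$, total weight $l$, and $s(g_A)/l$ matching the centering term (27)) can be confirmed on a small degree sequence. The Python sketch below is an illustration under assumptions: the function names are hypothetical, the example degree sequence is chosen here, and the double sum runs over all ordered pairs including $i = j$, as in the expression for $s(g_A)$.

```python
from fractions import Fraction


def pseudograph_weights(d):
    """a_ij = (d_j / sum_k d_k) * (d_i / 2), as exact fractions."""
    total = sum(d)
    return [[Fraction(di * dj, 2 * total) for dj in d] for di in d]


def s_metric_weighted(d, a):
    """s(g_A) = sum over all ordered pairs (i, j) of d_i * a_ij * d_j."""
    n = len(d)
    return sum(d[i] * a[i][j] * d[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    d = [3, 2, 2, 1, 1, 1]
    l = sum(d) // 2
    a = pseudograph_weights(d)
    s = s_metric_weighted(d, a)
    sum_sq = sum(x * x for x in d)
    print(sum(map(sum, a)), s, s / l, Fraction(sum_sq, 2 * l) ** 2)
```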

Note that the total weight in the pseudograph between nodes $i$ and $j$ equals $a_{ij} + a_{ji} = d_i d_j / 2l$. Recall from Section 5.1 that the GRG method described is based on the choice of a probability $p_{ij} = \rho d_i d_j$ of connecting two nodes $i$ and $j$, and also that in order to ensure that $E(d_i) = d_i$ one needs $\rho = 1/2l$, provided that $\max_{i \ne j \in V} d_i d_j \le 2l$. Thus, the GRG method can be viewed as a stochastic procedure that generates real graphs from the pseudograph $g_A$, with the one important difference that the GRG method always results in simple (but not necessarily connected) graphs. Thus, the zero assortativity pseudograph $g_A$ can be interpreted as the “deterministic outcome” of a GRG-like construction method. Accordingly, one expects that the statistical ensemble of graphs resulting from the stochastic GRG method could have zero assortativity, but this has not been proven.
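A minimal Python sketch of a GRG-style sampler with $p_{ij} = d_i d_j / 2l$ follows. It is not the construction of Section 5.1 verbatim: the function names are hypothetical, only pairs $i < j$ are considered so that the output is a simple graph, and under that assumption the expected degree of node $i$ works out to $d_i (2l - d_i) / 2l$, slightly below $d_i$.

```python
import random
from fractions import Fraction


def grg_probabilities(d):
    """p_ij = d_i * d_j / (2l) for every pair i < j."""
    two_l = sum(d)
    n = len(d)
    p = {(i, j): Fraction(d[i] * d[j], two_l)
         for i in range(n) for j in range(i + 1, n)}
    if p and max(p.values()) > 1:
        raise ValueError("max d_i * d_j exceeds 2l; p_ij is not a probability")
    return p


def expected_degree(d, i):
    """Expected degree of node i when self-loops are excluded."""
    return sum(pe for pair, pe in grg_probabilities(d).items() if i in pair)


def sample_grg(d, seed=0):
    """Draw one simple graph: each pair i < j is linked independently."""
    rng = random.Random(seed)
    return [pair for pair, pe in grg_probabilities(d).items()
            if rng.random() < float(pe)]


if __name__ == "__main__":
    d = [3, 2, 2, 1, 1, 1]
    print(expected_degree(d, 0))
    print(sample_grg(d, seed=1))
```

Averaging an assortativity estimate over many such samples would be one way to explore, empirically, the unproven expectation stated above.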

In summary, graph assortativity captures a fundamental feature of graph structure, one that is closely related to our $s$-metric. However, the existing notion of assortativity for an individual graph $g$ is implicitly measured against a background set of graphs $G$ that is not constrained to be either simple or connected. The connection between the sample-based and ensemble-based definitions makes it possible to calculate the assortativity among graphs of different sizes and having different degree sequences, as well as for different graph evolution procedures. Unfortunately, because this metric is computed relative to an unconstrained background set, in some cases this normalization (against the $s_{\max}$ graph) and centering (against the $g_A$ pseudograph) does a relatively poor job of distinguishing among graphs having the same degree sequence, such as those in Figure 5.

References

[1] D. Achlioptas, A. Clauset, D. Kempe, and C. Moore. On the bias of traceroute sampling; or, power-law degree distributions in regular graphs. 2003. cond-mat/0503087

[2] W. Aiello, F. Chung, and L. Lu. A Random Graph Model for Massive Graphs. Proc. STOC 2000.

[3] W. Aiello, F. Chung, and L. Lu. Random evolution in massive graphs. In: Handbook on Massive Data Sets, (Eds. J. Abello et al.), pp. 97–122, 2001.

[4] R. Albert and A.-L. Barabasi. Statistical mechanics of complex networks. Reviews of Modern Physics, Vol. 74, January 2002.

[5] R. Albert, H. Jeong, and A.-L. Barabasi. Diameter of the World Wide Web, Nature 401, 130-131 (1999)

[6] R. Albert, H. Jeong, and A.-L. Barabasi. Attack and error tolerance of complex networks, Nature 406, 378-382 (2000)

[7] D. Alderson. Technological and Economic Drivers and Constraints in the Internet’s “Last Mile”, Tech Report CIT-CDS-04-004, California Institute of Technology (2004)

[8] D. Alderson and W. Willinger. A contrasting look at self-organization in the Internet and next-generation communication networks. IEEE Communications Magazine, July 2005.

[9] L.A.N. Amaral, A. Scala, M. Barthelemy, and H.E. Stanley. Classes of small-world networks. Proc. Natl. Acad. Sci. USA 97:11149–11152, 2000.

[10] M. Arita. The metabolic world of Escherichia coli is not small. PNAS, Vol. 101, No. 6, 1543–1547, February 10, 2004.

[11] P. Bak. How nature works: the science of self-organized criticality, Copernicus (1996)

[12] P. Ball. Critical Mass: How One Thing Leads to Another. Farrar, Straus and Giroux (2004).

[13] P. Ball. Networking can be hazardous, Nature Science Update, 9 March 2001.

[14] A.-L. Barabasi. Linked: The New Science of Networks. Perseus Publishing (2002)

[15] A.-L. Barabasi and R. Albert. Emergence of scaling in random networks, Science 286, 509-512 (1999)

[16] A.-L. Barabasi, R. Albert, and H. Jeong. Mean-field theory for scale-free random networks, Physica A 272, 173-187 (1999)

[17] A.-L. Barabasi and E. Bonabeau. Scale-Free Networks. Scientific American 288, 60-69 (2003).

[18] A.-L. Barabasi, Z. Dezso, E. Ravasz, S.H. Yook, and Z. Oltvai. Scale-free and Hierarchical Structures in Complex Networks, AIP Conference Proceedings, 2003.

[19] A.-L. Barabasi and Z.N. Oltvai. Network biology: understanding the cell’s functional organization. Nature Reviews Genetics 5, 101–114 (2004).

[20] A.-L. Barabasi, E. Ravasz, and T. Vicsek. Deterministic scale-free networks. Physica A 2001; 299:599, cond-mat/0107419.


[21] N. Berger, C. Borgs, J. Chayes, and A. Saberi. On the Spread of Viruses on the Internet. ACM-SIAM Symposium on Discrete Algorithms (SODA05), Vancouver, British Columbia, January 23-25, 2005.

[22] G. Bianconi and A.-L. Barabasi. Competition and Multiscaling in Evolving Networks, Europhys. Lett. 54, 436-442 (2001)

[23] B. Bollobas. Modern Graph Theory. Springer 1998.

[24] B. Bollobas and O. Riordan. Robustness and vulnerability of scale-free random graphs, Internet Math. 1, pp. 1–35, 2003.

[25] B. Bollobas and O. Riordan. The Diameter of a Scale-Free Random Graph, Combinatorica 24(1), pp. 5–34, 2004.

[26] L. Briesemeister, P. Lincoln, and P. Porras. Epidemic Profiles and Defense of Scale-Free Networks. ACM Workshop on Rapid Malcode (WORM), Washington, D.C. October 27, 2003.

[27] M. Buchanan. Ubiquity: The Science of History. . . or Why the World Is Simpler Than We Think. Crown. 2001.

[28] J. M. Carlson and J. C. Doyle. Highly Optimized Tolerance: a mechanism for power laws in designed systems. Physical Review E, 60:1412–1428, 1999.

[29] J.M. Carlson and J. Doyle. Complexity and Robustness. PNAS, 99, Suppl. 1, 2539-2545 (2002)

[30] H. Chang, R. Govindan, S. Jamin, S. Shenker, and W. Willinger. Towards Capturing Representative AS-Level Internet Topologies. ACM SIGMETRICS 2002 (2002).

[31] G. Chartrand, G. Kubicki, and M. Schultz. Graph similarity and distance in graphs. Aequationes Mathematicae, Vol. 55, pp. 129–145, 1998.

[32] F. Chung and L. Lu. The average distance in a random graph with given expected degrees, Internet Math., 1: 91–113, 2003.

[33] D. D. Clark. The design philosophy of the DARPA Internet protocols. Proc. of the ACM SIGCOMM’88, in: ACM Computer Communication Reviews, Vol. 18, No. 4, pp. 106–114, 1988.

[34] Cooperative Association for Internet Data Analysis (CAIDA), Skitter. Available at http://www.caida.org/tools/measurement/skitter/.

[35] M.E. Csete and J.C. Doyle. Reverse engineering of biological complexity, Science 295, 1664 (2002).

[36] M. Csete and J. Doyle. Bowties, metabolism, and disease, Trends in Biotechnology, 22, 446-450 (2004).

[37] Z. Dezso and A.-L. Barabasi. Halting viruses in scale-free networks, Physical Review E (2002)

[38] S. Dill, R. Kumar, K. McCurley, S. Rajagopalan, D. Sivakumar, and A. Tompkins. Self-similarity in the Web. ACM Trans. Internet Technology, Vol. 2, pp. 205–223, 2002.

[39] P.S. Dodds and D.H. Rothman. Unified view of scaling laws in river networks. Phys. Rev. E, 59(5):4865–4877, 1999.

[40] S. N. Dorogovtsev and J. F. F. Mendes. Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford University Press, Oxford, UK, 2003.

[41] J.C. Doyle, D. Alderson, L. Li, S. Low, M. Roughan, S. Shalunov, R. Tanaka, and W. Willinger. The “Robust Yet Fragile” Nature of the Internet. Proc. Natl. Acad. Sci. USA 102(41). 2005.

[42] P. Erdos and T. Gallai. Graphs with prescribed degrees of vertices. Mat. Lapok (Hungarian), 11:264-274, 1960.

[43] P. Erdos and A. Renyi. On random graphs I. Publ. Math. (Debrecen) 9 (1959), 290-297.

[44] A. Fabrikant, E. Koutsoupias, and C. Papadimitriou. (2002) Heuristically Optimized Trade-offs: A new paradigm for Power-laws in the Internet, Proceedings of ICALP 2002; 110–122.

[45] S.M. Fall. Galaxy correlations and cosmology. Reviews of Modern Physics 51(1):21–42, 1979.

[46] I. Farkas, I. Derenyi, G. Palla, and T. Vicsek. Equilibrium statistical mechanics of network structures. Springer-Verlag Lect. Notes in Phys. 650:163-187 (2004)

[47] W. Feller. An Introduction to Probability Theory and Its Applications, Volume 2. John Wiley & Sons. New York. 1971.

[48] E. Fox Keller. Revisiting “scale-free” networks. BioEssays 27.10:1060–1068 (2005).

[49] C. Gkantsidis, M. Mihail, and E. Zegura. The markov chain simulation method for generating connected power law random graphs. Proc. 5th Workshop on Algorithm Engineering and Experiments (SIAM Alenex’03), 2003.

[50] R. Govindan and H. Tangmunarunkit. Heuristics for Internet Map Discovery, IEEE INFOCOM 2000.

[51] E.J. Groth and P.J.E. Peebles. The Large Scale Structure of the Universe, Astrophysical Journal, vol. 217, p. 385, 1977.

[52] J.T. Hack. Studies of longitudinal stream profiles in Virginia and Maryland. US Geol. Survey Prof. Paper, Vol. 294-B, pp. 45–94, 1957.

[53] R.E. Horton. Erosional development of streams and their drainage basins: Hydrophysical approach to quantitative morphology. Bull. Geol. Soc. America, Vol. 56, pp. 275–370, 1945.

[54] H. Jeong, S.P. Mason, A.-L. Barabasi, and Z.N. Oltvai. Lethality and centrality in protein networks. Nature 411, 41-42 (2001).

[55] H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.-L. Barabasi. The large-scale organization of metabolic networks, Nature 407, 651 (2000)

[56] H. Jeong, Z. Neda, and A.-L. Barabasi. Measuring Preferential Attachment in Evolving Networks, Europhys. Lett. 61, 567-572 (2003)

[57] N.L. Johnson, S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions. Vol. 1, 2nd Ed. New York: John Wiley & Sons. 1994.

[58] S. Itzkovitz, R. Levitt, N. Kashtan, R. Milo, M. Itzkovitz, and U. Alon. Coarse-Graining and Self-Dissimilarity of Complex Networks. arXiv:q-bio.MN/0405011 v1, 14 May 2004.

[59] S. Itzkovitz, R. Levitt, N. Kashtan, R. Milo, M. Itzkovitz, and U. Alon. Multi-Level Coarse-Graining of Complex Networks. Preprint.

[60] J.W. Kirchner. Statistical inevitability of Horton’s laws and the apparent randomness of stream channel networks. Geology, vol. 21, pp. 591–594, July 1993.

[61] J.W. Kirchner. Statistical inevitability of Horton’s laws and the apparent randomness of stream channel networks–Reply to Masek and Turcotte. Geology, 22: 380–381, April 1994.

[62] J.W. Kirchner. Statistical inevitability of Horton’s laws and the apparent randomness of stream channel networks–Reply to Troutman and Karlinger, Geology, 22: 574–575, June 1994.

[63] H.-P. Kriegel and S. Schonauer. Similarity search in structured data. Proc. 5th Conf. on Data Warehousing and Knowledge Discovery DaWaK’03, Prague, Czech Republic, 2003.

[64] A. Lakhina, J.W. Byers, M. Crovella, and P. Xie. Sampling Biases in IP topology Measurements, IEEE INFOCOM 2003.

[65] L. Li, D. Alderson, J. Doyle, and W. Willinger. A First-Principles Approach to Understanding the Internet’s Router-level Topology. Proc. ACM SIGCOMM 2004.

[66] S.E. Luria and M. Delbruck. Mutations of bacteria from virus sensitivity to virus resistance. Genetics, vol. 28, 491–511, 1943.

[67] B.B. Mandelbrot. Fractals and Scaling in Finance: Discontinuity, Concentration, Risk. Springer-Verlag, New York, 1997.

[68] A. Marani, R. Rigon, and A. Rinaldo. A note on fractal channel networks. Water Resources Research, Vol. 27, No. 12, pp. 3041–3049, 1991.

[69] Matlab code for generating the sequences used in Figure 1 is available from http://hot.caltech.edu/topology/RankVsFreq.m.

[70] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network Motifs: Simple Building Blocks of Complex Networks. Science, 298 (2002).

[71] M. Mitzenmacher. A Brief History of Generative Models for Power Law and Lognormal Distributions, Internet Math. 1(2):226-251, 2004.

[72] M.E.J. Newman. A measure of betweenness centrality based on random walks. Social Networks 27, 39–54, 2005.

[73] M.E.J. Newman. Assortative Mixing in Networks, Phys. Rev. Lett. 89, 208701 (2002)

[74] M.E.J. Newman. The Structure and Function of Complex Networks, SIAM Review 45, 167-256 (2003)

[75] M.E.J. Newman. Power laws, Pareto distributions and Zipf’s law, Contemporary Physics, in press. (Available electronically from arXiv:cond-mat/0412004, January 2005)

[76] J. Ottino. Engineering Complex Systems. Nature 427 (29), January 2004.

[77] J. Park and M.E.J. Newman. The origin of degree correlations in the Internet and other networks. Phys. Rev. E 68, 026112, 2003.

[78] R. Pastor-Satorras and A. Vespignani. Epidemic spreading in scale-free networks. Physical Review Letters, 86(14), pp. 3200–3203, April 2, 2001.

[79] R. Pastor-Satorras and A. Vespignani. Evolution and Structure of the Internet: A Statistical Physics Approach. Cambridge University Press, Cambridge, UK, 2004.

[80] K. Patch. Net inherently virus prone, Technology Research News, March 21, 2001.

[81] V. Paxson. Strategies for Sound Internet Measurement. Proc. ACM SIGCOMM Internet Measurement Conference IMC 2004 (2004).

[82] S.D. Peckham. New results for self-similar tree with applications to river networks. Water Resources Research, vol. 31, no. 4, pp. 1023–1029, April 1995.

[83] E. Petrakis. Design and evaluation of spatial similarity approaches for image retrieval. Image and Vision Computing, Vol. 20, pp. 59–76, 2002.

[84] E. Ravasz, A.L. Somera, D.A. Mongru, Z.N. Oltvai, and A.-L. Barabasi. Hierarchical organization of modularity in metabolic networks, Science, 2002.

[85] S. I. Resnick. Heavy tail modeling and teletraffic data. Annals of Statistics 25, pp. 1805–1869, 1997.

[86] A. Rinaldo, I. Rodriguez-Iturbe, R. Rigon, R.L. Bras, E. Ijjasz-Vasquez, and A. Marani. Minimum Energy and Fractal Structures of Drainage Networks, Water Resources Research 28(9):2183-2195, 1992.

[87] I. Rodriguez-Iturbe, A. Rinaldo, R. Rigon, R.L. Bras, A. Marani, and E. Ijjasz-Vasquez. Energy Dissipation, Runoff Production, and the Three-Dimensional Structure of River Basins, Water Resources Research 28(4):1095-1103, 1992.

[88] Route Views, University of Oregon Route Views Project, http://www.antc.uoregon.edu/route-views/.

[89] G. Samorodnitsky and M.S. Taqqu. Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman and Hall, New York - London, 1994.

[90] A. Sanfeliu and K.S. Fu. A distance measure between attributed relational graph. IEEE Trans. Systems, Man and Cybernetics, Vol. 13, pp. 353–362, 1983.

[91] T. Seidl. References for graph similarity. http://www.dbs.informatik.uni-muenchen.de/ seidl/graphs/.

[92] S.S. Shen-Orr, R. Milo, S. Mangan, et al. Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics, 31 (1): 64–68, May 2002.

[93] S.S. Skiena. The Algorithm Design Manual. New York: Springer-Verlag, 1997. Available electronically from http://www2.toki.or.id/book/AlgDesignManual/.

[94] H.A. Simon. On a class of skew distribution functions. Biometrika, Vol. 42, No. 3/4, 425-440 (1955).

[95] N. Spring, M. Dontcheva, M. Rodrig, and D. Wetherall. How to Resolve IP Aliases, UW CSE Tech. Report 04-05-04, 2004.

[96] N. Spring, R. Mahajan, D. Wetherall, and T. Anderson. Measuring ISP Topologies with Rocketfuel, IEEE/ACM Transactions on Networking, Vol. 12, pp. 2–16 (2004).

[97] R. Sole and S. Valverde. Information Transfer and Phase Transitions in a model of Internet Traffic, Physica A 289 (2001), 595-695

[98] D. Stauffer and A. Aharony. Introduction to Percolation Theory, 2nd edition. Taylor and Francis, London, 1992.

[99] J. Stelling, U. Sauer, Z. Szallasi, F.J. Doyle, and J. Doyle. Robustness of cellular functions. Cell, Vol. 118, pp. 675–685, 2004.

[100] J. W. Stewart III. BGP4: Inter-Domain Routing in the Internet. Addison-Wesley, Boston, 1999.

[101] A.N. Strahler. Quantitative analysis of watershed geomorphology. Trans. Am. Geophy. Union, 38(6):913–920, 1957.

[102] R. Tanaka. Scale-rich metabolic networks. Physical Review Letters 94, 168101 (2005)

[103] R. Tanaka, M. Csete, and J. Doyle. Global organization of metabolic networks: Implications for complex biologic systems, in preparation.

[104] R. Tanaka, M. Csete, and J. Doyle. Highly optimized global organization of metabolic networks. IEE Proc. Systems Biology, in press (2005).

[105] R. Tanaka, T.-M. Yi, and J. Doyle. Some protein interaction data do not exhibit power law statistics, FEBS Letters 579, 5140-5144 (2005).

[106] D.G. Tarboton. Fractal river networks, Horton’s laws, and Tokunaga cyclicity. Hydrology, Vol. 187, pp. 105–117, 1996.

[107] R. Teixeira, K. Marzullo, S. Savage, and G.M. Voelker. In Search of Path Diversity in ISP Networks, In Proc. IMC’03, Miami Beach, Florida. October 27-29, 2003.

[108] A. Tripathi and S. Vijay. A note on a theorem of Erdos & Gallai. Discrete Mathematics, Volume 265, Issue 1-3, 2003.

[109] D.J. Watts and S.H. Strogatz. Collective dynamics of “small world” networks. Nature, Vol. 393, June 1998.

[110] W. Willinger, D. Alderson, J. Doyle, and L. Li. A pragmatic approach to dealing with high variability in network measurements. Proc. ACM SIGCOMM Internet Measurement Conference IMC 2004 (2004).

[111] W. Willinger, D. Alderson, J.C. Doyle, and L. Li. More “Normal” Than Normal: Scaling Distributions and Complex Systems. Proceedings of the 2004 Winter Simulation Conference, R.G. Ingalls, M.D. Rossetti, J.S. Smith, and B.A. Peters, eds. (2004)

[112] D.H. Wolpert and W. Macready. Self-dissimilarity: An empirically observable complexity measure, Unifying Themes in Complex Systems, New England Complex Systems Institute, 626–643 (2000)

[113] D.H. Wolpert and W. Macready. Self-dissimilarity as a high dimensional complexity measure. Preprint, 2004.

[114] K. Wu and A. Liu. The Rearrangement Inequality, http://matholymp.com/TUTORIALS/Rear.pdf

[115] S.-H. Yook, H. Jeong, and A.-L. Barabasi. Modeling the Internet’s large-scale topology, PNAS 99, 13382-13386 (2002)

[116] S.-H. Yook, Z.N. Oltvai, and A.-L. Barabasi. Functional and topological characterization of protein interaction networks. Proteomics 4, 928-942 (2004).

[117] G. Yule. A mathematical theory of evolution based on the conclusions of Dr. J.C. Willis F.R.S. Philosophical Transactions of the Royal Society of London (Series B), 213, 21-87 (1925).

[118] Y. Zhang, M. Roughan, C. Lund, and D. Donoho. An information-theoretic approach to traffic matrix estimation. Proc. ACM SIGCOMM 2003.
