
Studies in the structure and function of complex networks

with focus on

Social, Technological and Engineered networks

¹Usha Nandini Raghavan, ¹Soundar Kumara and ²Réka Albert

¹Department of Industrial Engineering, The Pennsylvania State University

²Department of Physics, The Pennsylvania State University,

University Park, Pennsylvania, 16802, USA

Prologue

We at the Laboratory for Intelligent Systems and Quality (LISQ) in the Department of

Industrial Engineering at Penn State have been studying complexity since 1989. In

the early stages of work at LISQ we focused on analyzing sensor signals and extracting

features from them for estimating the state of the machines [1, 2]. This fundamental work

evolved into characterizing and analyzing the observed data. These studies established for

the first time the existence of chaos in machining [3, 4, 5, 6, 7, 8]. This work on

complexity, specifically nonlinear dynamics, was conducted in different realms, namely sensor

networks, infrastructure monitoring and supply chains. Subsequently, the logical question

addressed was "How do we deal with complexity when the number of participating entities

(nodes) increases?" This took us in the direction of graph theory, random graphs and large

scale networks. In this monograph we summarize our work with the hope that it will help

the engineering community to pursue research in this new and exciting area of complex

networks.

This monograph is the result of sustained work over the last six years. Several of

our students helped us shape this work. Hari Prasad Thadakamalla, who started this work,

was instrumental in exploring supply chains as complex networks and search on weighted

graphs. We started collaborating with Dr. Réka Albert from the early stages of Hari's PhD

thesis. Christopher Carrino explored dynamic community formation in social networks with

applications to terrorist networks. Usha Nandini Raghavan and Amit Surana explored

adaptivity in general. Nandini, in particular, addressed algorithms for community detection in


large social networks.

We have structured this monograph as an evolving document, with an introduction to

complex networks and a general introduction to various problems in social, technological

and engineered networks. We follow this with a series of papers that we have published

in the area of complex networks in the last few years. This research has resulted in three

PhD dissertations (Hari Thadakamalla, Christopher Carrino and Usha Nandini Raghavan

at Penn State, jointly co-advised with Dr. Réka Albert).

We look forward to your feedback and comments.

Soundar Kumara April 2008

[email protected] Penn State


I. INTRODUCTION

Why do some innovations capture the imagination of a society while others do not?

How do people form opinions and how does consensus emerge in an organization? How can we

capture the opinions and votes of people during election years?

What are the fundamentals of nature and how do cells and organisms evolve and

survive? What makes a cell’s functions robust and adaptable to its environment?

How can we make resource sharing through the Internet secure? In this information

age, how do we as users quickly find relevant information from the World Wide Web?

How do we guard technological infrastructures, which form the backbone of our day-to-day

business, from malicious attacks?

How can we sense and prevent forest fires at an early stage? How do we put sensor

devices to use to detect forest fires? How can we use autonomous sensor nodes to monitor

dangerous terrains and large chemical plants?

These are only a few of the questions whose answers will significantly affect the lives of people

and the society we live in. Science and engineering, in their overall effort to

address these issues, have created many different avenues of research, Network Science

being one among them. Network science is the study of systems mainly using their

network structure or topology. The nodes (vertices) of such networks are the entities

(people, bio-molecules, webpages and sensor devices), and the links (edges) are the

interactions between those entities (friendships, chemical reactions, hyperlinks and

communications, respectively).
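A minimal sketch of this node-and-edge representation, using the NetworkX package listed among the software links later in this document; the toy entities and interactions are illustrative assumptions, not data from any study cited here.

```python
# Entities become nodes and interactions become edges.
import networkx as nx

G = nx.Graph()
friendships = [("Alice", "Bob"), ("Bob", "Carol"),
               ("Carol", "Alice"), ("Carol", "Dave")]
G.add_edges_from(friendships)  # people as nodes, friendships as edges

print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")
print("degree of Carol:", G.degree("Carol"))  # number of friends
```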

People have opinions of their own, but they also shape opinions by interacting and ex-

changing views with their friends and neighbors. Sociologists have long understood that an

individual’s behavior is significantly affected by their social interactions [9, 10]. It is now

widely believed that biological functions of cells and the robustness of cellular processes arise

due to the interactions that exist between the components of various cells [11]. Webpages

with content and information relate to other webpages by means of hyperlinks, creating a


complex web-like structure: the WWW. Miniaturized wireless sensor nodes, which individu-

ally have limited capabilities, achieve an overall sensing task by communicating and sharing

information with other nodes [12, 13].

A vast amount of research in recent years has shown that the organization of links (who is

connected to whom) in a network and its topological properties carry significant information

about the behavior of the system it represents [10, 14]. Furthermore, the topological prop-

erties have a huge impact on the performance of processes such as information diffusion,

opinion formation, search, navigation and others.

Organization of links in large-scale natural networks was originally considered to be ran-

dom [10, 14, 15]. But empirical observations in the recent past have revealed topological

properties in a wide range of social, biological and technological networks that deviate from

randomness [10, 14, 16, 17]. That is, networks that appear in nature and whose

evolution is largely uncontrolled (self-organized) have specific organizing principles leading

to various properties or order in their topology. This observation has sparked an interest

in the scientific study of networks and network modeling, including the desire to engineer

man-made systems to mimic the behaviors of nature.

II. NETWORKS

As explained above, complex systems are modeled as networks to understand and op-

timize processes such as formation of opinions, resource sharing, information retrieval, ro-

bustness to perturbations, etc. The following are some examples of systems and their

network representations.

A. Natural networks

A natural network is a representation of a system that is present in nature or has evolved

over a period of time without any centralized control. Examples include:

1. Movie actor collaborations : This network consists of movie actors as nodes and edges

represent the appearance of pairs of actors in the same movie. It is a growing network

that had about 225,226 nodes and 13,738,786 edges in 1998 [18]. Interests in this

network include the study of successful collaborations (what kind of casting makes a


movie successful?) [19] and the famous Bacon number experiment to study how other

actors are linked to Kevin Bacon through their casting roles [20].

2. Scientific co-authorship: In this network, the nodes are scientists or researchers and

an edge exists between two scientists if they have co-authored a paper.

Newman [21, 22, 23] studied scientific co-authorship networks from four different areas

of research. The information was obtained in an automated way from four different

databases (MEDLINE, the Physics E-print archive, SPIRES and NCSTRL) that col-

lect papers and their authors in the areas of biomedicine, physics, high-energy

physics and computer science, respectively. One of these networks, formed from the MED-

LINE database for the period 1961 to 2001, had 1,520,251 nodes and 2,163,923

edges. Developing metrics to quantify the scientific productivity or cumulative impact

of a scientist given his/her collaborations is one problem of interest in co-authorship

networks [24, 25]. The Erdős Number project, which motivated the Bacon number, is

a popular experiment that is used in the study of optimal co-authorship structures of

successful scientists [26]; a shortest-path computation of this kind is sketched after this list.

3. The Internet : The Internet is a network of computers and devices connected by wired

or wireless links. The study of the Internet is carried out at two different levels, namely

the router level and the level of autonomous systems [14, 27]. At the router level, each

router is represented as a node and the physical connections between them as the edges

in the network. At the autonomous systems level, every domain (e.g. an Internet service

provider) is represented as a node and the inter-domain connections are represented by

the edges. The number of nodes was about 150,000 at the router level in 2000

[27] and about 4,000 at the domain level in 1999 [28]. The problem of identifying and sharing files

efficiently over peer-to-peer networks (such as Gnutella [29]) that are built over the

Internet has received significant attention in recent years [30, 31].

4. World Wide Web (WWW): The WWW is a network of webpages where the hyperlinks

between the webpages are represented by the edges in the network. It is a growing

network that had about one billion nodes in 1999 [32] with a recent study estimating

the size to be about 11.5 billion in January 2005 [33]. Information retrieval from

WWW is a problem of immense interest. Algorithms such as PageRank [34] or the


ones proposed by Kleinberg [35] use the network structure to rank webpages in

order of relevance to user requests.

5. Neural networks : Here the nodes are neurons and an edge connects two neurons if there

is a chemical or electrical synapse between them. Watts and Strogatz [14, 18] studied

topological properties of the neural network of the nematode worm C. elegans, consisting

of 282 neurons with pairs of neurons connected by the presence of either a synapse

or a gap junction. Study of neural networks is important for understanding how the

brain stores and processes information [17]. While we can observe that this is done in

an optimal and robust way in neural networks, we are still at a loss in quantifying this

mechanism [17].

6. Cellular networks: Here the substrates or molecules that constitute a cell are repre-

sented as nodes and the bio-chemical interactions between the molecules are

represented as edges [14]. Among others, the interactions between protein molecules

are important for many biological functions [11, 36]. Jeong et al. [11] have studied

the topology of the protein-protein interaction map of the yeast S. cerevisiae, which consists

of 1,870 proteins as nodes connected by 2,240 identified interactions. Using the

network structure to predict possible (previously unidentified) interactions between

protein molecules has received widespread attention from researchers [37, 38].
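As referenced in the co-authorship example above, collaboration distances such as the Bacon and Erdős numbers are simply shortest-path lengths to a designated center of the collaboration graph. A minimal sketch follows; the toy graph is an illustrative assumption, not real casting or co-authorship data.

```python
import networkx as nx

# Toy collaboration graph: an edge means two actors appeared in a movie together.
G = nx.Graph()
G.add_edges_from([
    ("Kevin Bacon", "Actor A"), ("Actor A", "Actor B"),
    ("Actor B", "Actor C"), ("Kevin Bacon", "Actor D"),
])

# Bacon number of every reachable actor = shortest-path distance to Kevin Bacon.
bacon = nx.single_source_shortest_path_length(G, "Kevin Bacon")
for actor, d in sorted(bacon.items(), key=lambda item: item[1]):
    print(f"{actor}: Bacon number {d}")
```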

B. Engineered networks

Engineered networks are those in which the nodes of the network follow a pre-specified

set of protocols by which the links are formed. Whether the control is centralized or de-

centralized, the organization is engineered to achieve desired topological properties. Some

examples follow.

1. Agent-based supply chain networks : Here software agents that are responsible for the

functions of a supplier, manufacturer, distributor and retailer are the nodes and the

direct flow of information/tasks/commodities between entities is represented by the

edges in the network. Thadakamalla et al. [39] studied the topological properties of a

military supply chain (with 10,000 nodes [40]) and proposed mechanisms by which the

nodes can re-organize under functional constraints to provide better performance.


2. Wireless Sensor Networks (WSN): Here the nodes represent miniaturized wireless sen-

sor devices that consist of a short-range radio transceiver and limited computational

capabilities [12, 13]. Though individual sensors have limited capacities, the true value

of the system is achieved by sharing responsibilities and information through a com-

munication infrastructure [13]. Thus an edge in a WSN represents the presence of

communication between two nodes. The number of nodes in a WSN can vary any-

where from a few hundred or thousand to even millions depending on the appli-

cation scenario. The sensor nodes, when deployed in a sensing region, will self-organize

to establish a communication topology. There is considerable interest in developing

topology control protocols that will guide this organization process to support the

global sensing tasks [12, 41, 42]; a minimal connectivity check along these lines is sketched after this list.
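A minimal sketch of the connectivity question underlying such topology control (the node count and radii are illustrative assumptions, not the protocols of [12, 41, 42]): sensors scattered uniformly in the unit square can communicate when they lie within radius r of each other, i.e. a random geometric graph.

```python
import networkx as nx

n = 200  # number of sensor nodes deployed in the unit square
for r in (0.05, 0.10, 0.15, 0.20):  # candidate communication radii
    G = nx.random_geometric_graph(n, r, seed=7)
    avg_deg = 2 * G.number_of_edges() / n
    print(f"radius {r:.2f}: connected = {nx.is_connected(G)}, "
          f"average degree = {avg_deg:.1f}")
```

Sweeping the radius in this way exposes the sharp transition to full connectivity that motivates power-aware topology control: the smallest radius that keeps the network connected minimizes transmission power.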

C. Scientific and Engineering interests

Interest in the study of natural complex networks can be broadly classified into two

classes, namely scientific and engineering. The scientific interest lies in understanding the

structure, evolution, and properties of networks, with an eventual goal of engineering more

efficient processes on these networks. The engineering interest, on the other hand, lies in

developing more efficient algorithms and finding optimal parameters to better control the

processes taking place on such networks [10, 14, 17].

With an increasing understanding of the structural organization leading to emergent

properties, a rich literature of complex network models that can mimic such properties

has developed [10, 14, 17, 18]. These network models then form the basis on which

processes such as disease propagation, information diffusion, search, navigation and others

are studied and analyzed. Some of the interesting questions that can be answered using a

combination of both aspects of this research include 1) how to control the spread of diseases

in a large population of people interconnected by physical contacts, 2) how to study, maintain,

and control the diffusion of information in WWW, and 3) how to better identify targets

for drug discovery in metabolic networks? In parallel, there is also considerable interest

in engineering networks such as supply chains and miniaturized wireless sensor networks, where

desired behaviors are achieved by controlling the interactions between entities [12, 39, 43, 44].


Useful links within Penn State

• Laboratory for Intelligent Systems and Quality

• Biological physics and network modeling

• The Huck Institute of Life Sciences

• Center for supply chain research

Other links

• Center for Complex Network Research at Notre Dame

• Center for the study of complex systems at University of Michigan, Ann Arbor

• Social computing lab at Hewlett-Packard Labs

• Complex Systems group at the Los Alamos National Labs

• The Santa Fe Institute

• The Biocomplexity Institute at the Indiana University

• New England Complex Systems Institute

• Amaral Research group at the Northwestern University

• cFinder - Clusters & Communities - overlapping dense groups in networks

• International Network for Social Network Analysis

• Center for Computational Analysis of Social and Organizational Systems

• Small world project

• Tracing information flow - Project jointly developed at Cornell University and Carleton

College

• Program on Networked Governance

• HOT - Highly Optimized Tolerance at UCSB and Caltech


• Berkeley WEBS (Wireless Embedded Systems)

• Center for embedded network sensing at UCLA

• Embedded Networks Laboratory at USC

• Microeconomic and Social Systems at Yahoo Research

• Google Research

• Web Search & Mining and Web Search and Data Mining groups at Microsoft Research

Links to complex network software

• orgnet software

• Graphviz - Graph Visualization Software

• NetworkX - Python package for creation, manipulation and study of complex networks

• Pajek - Program for large network analysis

III. SOCIAL NETWORKS

In a social network the nodes represent actors (such as individuals) who are interconnected

by relationships (such as friendship or acquaintance). Social network analysis (SNA) deals

with the study of such networks and how the structural measures and properties relate to

individuals and the processes taking place on these networks.

SNA emphasizes the prominent role relationships play in characterizing an individual entity

(or actor). Some of the properties that are used today in complex networks research, such

as degree, betweenness centrality and closeness centrality, have their origins in sociometry.

Such concepts were defined to quantify the prominent or central role played by an actor in

a given network. Under the framework of complex network theory and SNA, there have been

many research efforts that characterize the social interactions or the relative importance

of nodes in movie actor collaborations [16, 20], co-authorship networks [24] and others.

There has also been work that, to some extent, characterizes the roles of actors

and predicts future collaborations in terrorist networks [19, 45]. In [45], using an extended


network of the September 11th hijackers and their associates, it was shown that many ties in

the network were concentrated around the pilots or persons with unique skills. Hence,

targeting and removing those with skills necessary for a project (or high-degree nodes) can

inflict maximum damage on the project's mission (network connectivity).

There has, however, been a constant debate on the validity of data points that are collected

to form networks involving people and their relationships. For example, if one wants to study

relationships among school children, the network is formed by asking individual children in

a specific school to identify their friends. It is possible that some children tend to

call everyone in their class a friend. Especially when the number of data points collected is

small, it is often difficult to attach statistical confidence to the analysis/observations and their

consequences. Scientific collaboration networks are ones where an abundant amount of accurate

information is available on scientists and their collaborations. As a result, they are very

popular in the research community for the study of their structures and for understanding the

social implications of their structural properties. In this network, the nodes are scientists

or researchers and an edge exists between scientists if they have collaborated in

writing a paper. The network can also be weighted based on some index of the number

of collaborations between scientists. In [21, 22, 23], scientific collaboration networks from

various fields (biomedical, theoretical physics, high-energy physics and computer science)

were considered for structural analysis. One of the important consequences of understanding

the underlying structures of such social networks is the ability to test new theories on models of these

networks [10, 14, 17].

Citation networks have been studied extensively to identify the historical and social

impact of papers/research/scientists. Since the introduction of the Science Citation Index

(SCI) by the Institute for Scientific Information, researchers have been able to construct

and study the structure of large volumes of citation interconnections between papers. The

SCI provides a list of all papers from selected journals, and under each of these papers is

another list of the papers that reference it. In particular, a citation network consists

of papers as nodes and a directed edge between papers, pointing towards the cited paper. Price

[46], based on his empirical study, was the first to observe that in many papers one half of

the references were to a research front of recent papers, while the other half of the references

were uniformly randomly scattered through the literature. This suggests that there is a

tendency among researchers to build a research front based on recent work. Currently, there


are many databases with information on papers and their references/citations that are freely

available to the community. A few such databases include the Stanford Public Information

Retrieval System (SPIRES), which consists of papers in the field of high-energy physics; CiteSeer,

an open access digital library that consists of a comprehensive list of scientific and

academic papers; Citebase, which indexes papers self-archived by authors in the fields

of physics, mathematics and computer science; and BioMed Central and PubMed Central,

which index published papers in the field of biomedicine. The availability of large volumes of

accurate data has revived interest among researchers in the field of citation analysis.

Hirsch [24] developed a structural measure called the h-index which, unlike previous

measures, can quantify the cumulative impact and relevance of an individual's scientific

research output. Specifically, the h-index of a scientist is h if h of his/her papers have at

least h citations each and the rest have fewer than h citations. If this index, Hirsch argues, is

different for two scientists who both have the same number of publications and the

same number of overall citations, then the scientist with the higher h value is likely to be the

more accomplished of the two.
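A minimal sketch of this definition; the citation counts below are made-up numbers, purely for illustration.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank  # the rank-th best paper still has >= rank citations
        else:
            break
    return h

# Six papers with these citation counts give an h-index of 3.
print(h_index([10, 5, 3, 2, 1, 0]))  # -> 3
```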

The telecom industry has provided us with some of the most naturally available social

network structures for statistical analysis. Aiello et al. [47, 48] analyzed the graph of long-

distance calls made between different phone numbers. They constructed a random graph

model that best emulates the properties of the phone call networks. A similar study was

also done by Nanavati et al. [49] on a call graph constructed between cell phone users of

a certain telecom provider. Here directed edges were considered, originating

from the person making the call and pointing to the person receiving the call. Their analysis

showed that some of the properties (namely degree distribution) were different from similar

networks such as the WWW and e-mail graphs. They further proposed a Treasure-Hunt

model that can capture these degree distributions effectively.

In addition to studying the structures of social networks, there have also been works that

combine network analysis with other methods to make inferences about the characteristics

of individual entities or groups. In [50], the network of committees and subcommittees of

the U.S. House of Representatives between the 101st and 108th Congress was analyzed.

Here an edge exists between committees if they have common membership. In addition to

network theory, a Singular Value Decomposition (SVD) analysis of the roll call votes by the

members was used to identify correlations between members' committee assignments and


their political positions (such as Republican or Democrat). Hogg and Adamic [51] have argued

that using ranking methods such as PageRank [34] or NodeRank [52] to assign reputations

to nodes in a social network can be made effective by making it more difficult to alter ratings

via duplication or collusion. In particular they argue that the structural measures of the

social networks can be effectively used to make ranking systems more reliable.

While traditional models for disease propagation that assume a fully mixed population

work well on small populations, they fail to agree with observed trends in heterogeneous

and large populations [10, 53, 54, 55, 56, 57]. In such cases, simulation has emerged

as a powerful tool that can capture both the topological properties and changes along with

the disease dynamics to provide a better understanding of the disease propagation in social

networks [58]. There have also been several studies related to opinion formation [59] and

finding community or group structures in social networks [35, 60, 61, 62, 63]. It has been

observed that the interconnections between nodes in real-world networks are not random,

but display a structure wherein nodes show a preference for being connected to other nodes

within a tightly knit group. Finding such tightly interconnected groups of nodes (termed

communities) can offer micro-level information about the structure of a network,

both within individual communities and as a whole [61, 63, 64]. In social networks in particu-

lar, communities can throw light on opinion formation and on the common characteristics and beliefs

among groups of people that make them different from other communities.
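A minimal sketch of community detection by label propagation, in the spirit of the near-linear-time algorithm of [63]; here we simply call NetworkX's built-in label-propagation routine (a related but not identical implementation) on a synthetic two-community graph, which is an illustrative assumption.

```python
import networkx as nx
from networkx.algorithms.community import label_propagation_communities

# Two cliques of six nodes joined together: two obvious communities.
G = nx.connected_caveman_graph(2, 6)

for i, nodes in enumerate(label_propagation_communities(G)):
    print(f"community {i}: {sorted(nodes)}")
```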

IV. TECHNOLOGICAL NETWORKS

Information sharing and retrieval drives day-to-day business across the world. This

need has propelled research interest in technological networks such as the Internet and

the WWW. The goal of such research is to develop efficient protocols for communication

on the Internet and information retrieval from the WWW. To achieve this goal, a two-pronged

approach is required. One branch of research focuses on understanding the organization

of these technological networks. The second relies on the

network models developed from this understanding to optimize information sharing and retrieval.

The map of the Internet is considered at two different scales. At the level of Autonomous

Systems (AS), the network of the Internet consists of ASs as nodes, with edges representing

the physical communications. An AS here is an organizational unit of a particular domain


or service provider. Edges representing physical communications connect sub-networks or

devices across these ASs. Exchange of information between devices or sub-networks is

done using routers, devices responsible for receiving and forwarding data pack-

ets. The map of the Internet at this finer scale consists of routers as nodes and their

communications within and across ASs as edges.

The structure of the Internet has been analyzed extensively at both these scales [27, 65].

Faloutsos et al. [27] have analyzed the Internet AS network, year by year, on data collected

between 1997 and 1999. Over these years, while the number of nodes and edges increased

from 3,112 and 5,450 to 5,287 and 10,100 respectively, the average degree remained essentially constant.

This was also the case with the average path length, which was found to be approximately 3.8

for all three years. Furthermore, the path length distribution is peaked around the

average value and the shape remains essentially unchanged over the three years. On the

other hand one property of the network that did change over the years was the clustering

coefficient. It increased from 0.18 in 1997 to 0.24 in 1999. This is due to the modular

structure of the Internet where many small ASs within countries might be interconnected

forming clusters, while there are only a few connections to global areas. The clustering

coefficient, as a function of node degree, decays as a power law,

C(k) ≈ k^(−ν), with ν = 0.75 ± 0.03. All properties except this degree dependence of the clustering

coefficient are similar for the Internet network at the router level. At the router level the

clustering coefficient shows no dependence on node degree.
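A minimal sketch of measuring the degree-dependent clustering coefficient C(k) and fitting the exponent ν in C(k) ≈ k^(−ν); the graph generator and its parameters are illustrative assumptions, not the Internet data of [27].

```python
import numpy as np
import networkx as nx
from collections import defaultdict

G = nx.barabasi_albert_graph(n=5000, m=3, seed=1)  # synthetic scale-free graph

# Group the local clustering coefficients by node degree.
by_degree = defaultdict(list)
for node, c in nx.clustering(G).items():
    by_degree[G.degree(node)].append(c)

ks = sorted(k for k, cs in by_degree.items() if np.mean(cs) > 0)
Ck = np.array([np.mean(by_degree[k]) for k in ks])

# Least-squares fit of log C(k) = -nu * log k + const.
slope, _ = np.polyfit(np.log(ks), np.log(Ck), 1)
print(f"estimated exponent nu = {-slope:.2f}")
```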

Doyle et al. [66] take a different view of what is required to understand the organized

complexity present in the Internet topology. In the Internet, physical connections between

the subsystems and routers form the lower layer of the protocol stack. The protocols are the

ones responsible for routing and forwarding data packets and try to carry effectively the

expected overall traffic demand generated by the end-users. They stress that it is this func-

tional requirement that drives the organization of the Internet topology. By optimizing

the network throughput (flow of traffic) given the user demands at all the end vertices, Doyle

et al. show that one can create a network with properties and performance very similar

to those of the real Internet.

Enabled by the growing infrastructure of the Internet, the WWW is another technological

network that has grown manifold over the years. The WWW network consists of web-

pages as nodes and hyper-links from one webpage to another forming a directed edge between


those nodes. Since it is a directed network, unlike the Internet, its in-degree and out-degree distri-

butions are analyzed separately. Albert et al. observed the presence of a power-law degree

distribution in the WWW map of the *.nd.edu domain [67]. While the power-law exponent

for the in-degree distribution was 2.1, the exponent for the out-degree distribution was 2.45.

Pennock et al. [68] analyzed the WWW by dividing it along subject categories, such

as computer science, universities, companies and newspapers. Within these categories, the

in-degree distributions of the networks showed considerable variability in the power-law ex-

ponent ν, which varied between 2.1 and 2.6. This implies that the structure of the WWW shows

different dynamics depending on the way the webpages and their connections

are identified and mapped [64].

It has also been shown that the way in which nodes and interconnections are identified

in networks (sampling methods) can affect our estimation of the structural properties of the

original network [69, 70, 71, 72, 73, 74, 75]. For example, in [76] Lakhina et al. show that,

by using traceroute-like sampling methods [75], it is possible to conclude from the sample

that the network has the scale-free property when in fact the original network is a random graph

[15].
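A minimal sketch of this kind of sampling bias, in the spirit of [76] (graph sizes, probabilities and the sampling scheme are illustrative assumptions): sample a homogeneous random graph by taking the union of shortest paths from a few sources to many destinations, then compare the observed degrees with the true ones.

```python
import random
import networkx as nx

random.seed(0)
G = nx.gnp_random_graph(2000, 0.005, seed=0)  # homogeneous Erdos-Renyi graph

sources = random.sample(list(G.nodes()), 5)
targets = random.sample(list(G.nodes()), 500)

# Traceroute-like sample: union of shortest paths from each source.
sampled = nx.Graph()
for s in sources:
    paths = nx.single_source_shortest_path(G, s)
    for t in targets:
        if t in paths:
            nx.add_path(sampled, paths[t])

print("true max degree:", max(d for _, d in G.degree()))
print("observed max degree:", max(d for _, d in sampled.degree()))
print(f"sampled {sampled.number_of_nodes()} of {G.number_of_nodes()} nodes")
```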

File-sharing peer-to-peer networks, such as Gnutella, are another kind of communication

network that has emerged on top of the basic Internet structure. Specifically, Adamic et al.

and Thadakamalla et al. have analyzed how decentralized search processes on networks such

as Gnutella are affected by the heterogeneities in the degree and edge weight distributions

[30, 31, 77]. In [77] the authors studied decentralized search processes in spatial scale-free

networks. In particular, they showed that two factors, namely direction and node degree,

are sufficient to guide the search process to the shortest paths from the origin to the

destination. This result adds further evidence to the conjecture that many natural networks

are inherently searchable [78, 79].
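A minimal sketch of one ingredient of such decentralized search, a degree-biased (high-degree-seeking) forwarding rule in the spirit of [30]; this is not the exact weighted or spatial algorithm of [31, 77], and the graph and parameters are illustrative assumptions.

```python
import networkx as nx

def degree_biased_search(G, source, target, max_hops=1000):
    """Forward a message greedily to the highest-degree unvisited neighbor."""
    current, visited, hops = source, {source}, 0
    while current != target and hops < max_hops:
        if G.has_edge(current, target):
            current = target  # the target is one hop away
        else:
            candidates = [n for n in G.neighbors(current) if n not in visited]
            if not candidates:
                return None  # dead end: every neighbor already visited
            current = max(candidates, key=G.degree)
        visited.add(current)
        hops += 1
    return hops if current == target else None

G = nx.barabasi_albert_graph(2000, 3, seed=2)
print("decentralized search hops:", degree_biased_search(G, 0, 1234))
print("true shortest path length:", nx.shortest_path_length(G, 0, 1234))
```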

Information retrieval is an important issue on the WWW. Search engines are useful tools

that help in information retrieval. Algorithms such as PageRank are used to retrieve

webpages in the order that is expected to be of relevance to user requests. This algorithm uses

both the individual webpage's value and the value attached to the webpage by its neighbors

as an indicator of the overall value of a given webpage [34]. Kleinberg [35] proposes

similar link-based mechanisms for retrieving webpages with relevant information, but does so

using two different sets of measures. Each webpage is associated with values that determine if


it is a good authority and/or a good hub. A good hub is a webpage that has hyperlinks to

many good authorities, and a good authority is a webpage that is referenced by many

good hubs. The best set of hubs and authorities then contains the information that is of most

relevance to the user. Such an approach, according to Kleinberg, was motivated by the large

presence of bipartite sub-structures observed in the WWW network [35].
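A minimal sketch of both ranking schemes on a toy directed web graph via NetworkX; the pages and links below are illustrative assumptions, not real WWW data.

```python
import networkx as nx

W = nx.DiGraph()
W.add_edges_from([
    ("hub1", "authA"), ("hub1", "authB"),   # hub pages link to authorities
    ("hub2", "authA"), ("hub2", "authB"),
    ("authA", "hub1"), ("misc", "authA"),
])

pagerank = nx.pagerank(W, alpha=0.85)   # PageRank scores [34]
hubs, authorities = nx.hits(W)          # Kleinberg's hub/authority scores [35]

print("highest PageRank:", max(pagerank, key=pagerank.get))
print("best authority:  ", max(authorities, key=authorities.get))
print("best hub:        ", max(hubs, key=hubs.get))
```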

Large supply chains are among the networks that have complex topologies [39, 80, 81].

Analysis of the topological properties of real-world supply chains is difficult. This is because

supply chains are composed of various individual and independent entities such as suppliers,

manufacturers, distributors and retailers. Hence, it is difficult to compile information from

various sources to form an accurate picture of any large-scale supply chain. It is however well

known that their topologies tend to be hierarchical so as to enable product flow downstream

from suppliers to customers and information flow upstream from customers back to the

suppliers. One of the well-studied dynamics on supply chains is the bullwhip effect [82],

where small variabilities or uncertainties created at the lowest layer increase as they

move upstream towards the manufacturers and suppliers. This cascading effect

is due to the coupling of complexities arising from human judgment with those of the supply

chain structure. Cascading effects are also studied in the context of power distribution in

power grids [83, 84, 85, 86, 87]. The North American power grid is one of the most complex

technological networks. It consists of substations of three types: generation substations

responsible for producing electric power, transmission substations that transfer power along

high-voltage lines, and distribution substations that distribute power to small, local grids.

Kinney et al. [87] study the effect of cascading failures on the exact topology of the North

American power grid with plausible assumptions about the load and overload of substations.

If a substation fails, then the generated power, since it cannot be destroyed, is re-

routed via other nodes in the network. As a result, the load on other nodes increases and

may result in cascading effects. Under single-node removal, Kinney et al. showed that the

removal of 40 percent of the transmission substations leads to cascading failures in the North American power

grid.
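A minimal sketch of a load-redistribution cascade in the spirit of the betweenness-based overload models [85, 87] (a simplified illustration, not the grid data or exact model of [87]): node load is taken as betweenness centrality, capacity as (1 + α) times the initial load, and removing one node triggers recomputation and possible further failures.

```python
import networkx as nx

def cascade(G, start_node, alpha=0.2):
    load = nx.betweenness_centrality(G)
    capacity = {n: (1 + alpha) * load[n] for n in G}  # tolerance parameter alpha
    H = G.copy()
    H.remove_node(start_node)
    failed = {start_node}
    while True:
        load = nx.betweenness_centrality(H)   # loads redistribute after failures
        overloaded = [n for n in H if load[n] > capacity[n]]
        if not overloaded:
            break
        H.remove_nodes_from(overloaded)
        failed.update(overloaded)
    return failed

G = nx.barabasi_albert_graph(200, 2, seed=3)
deg = dict(G.degree())
trigger = max(deg, key=deg.get)  # fail the highest-degree node first
print(f"{len(cascade(G, trigger))} of {G.number_of_nodes()} nodes failed")
```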


V. CONTENTS TO FOLLOW

We have collected a series of papers that we have published in the last few years and

added them as the remaining contents of this document. They are:

• A. Surana, S. Kumara, M. Greaves and U.N. Raghavan, “Supply-chain networks:

a complex adaptive systems perspective", International Journal of Production Re-

search, Vol. 43, No. 20, pp. 4235 - 4265 (2005).

With the use of sophisticated technologies and supply chains becoming more and

more global, they have acquired a complexity almost equivalent to that of biological systems.

In this paper, we investigate supply chain complexity from a complex adaptive sys-

tems perspective. Specifically, we use tools and techniques from the fields of nonlinear

dynamics, statistical physics and information theory to characterize and model supply

chain networks.

• H.P. Thadakamalla, U.N. Raghavan, S. Kumara and R. Albert, “Survivability of

Multiagent-Based Supply Networks: A Topological Perspective ”, IEEE Intelligent

Systems, Vol. 19, No. 5, pp.24 - 31 (2004).

Our main focus in this paper is the survivability of supply networks. Specifically, we

look at survivability from a topological perspective. We define

several components that encompass topological survivability and propose methods by

which one can build topologically survivable supply networks.

• H.P. Thadakamalla, R. Albert and S. Kumara, “Search in weighted complex networks”,

Physical Review E, Vol. 72(066128) (2005) and

H.P. Thadakamalla, R. Albert and S. Kumara, “Search in spatial scale-free networks”,

New Journal of Physics, Vol. 9(190) (2007).

Search in networks is one of the important dynamical processes with a wide range of

applications including information retrieval from WWW, searching for files in peer-to-

peer networks and identifying specific nodes in ad-hoc and wireless sensor networks.

In these papers we develop and investigate decentralized search algorithms in various

classes of complex networks.

• U.N. Raghavan and S. Kumara, “Decentralized topology control algorithms for con-

nectivity of distributed wireless sensor networks”, International Journal of Sensor


Networks, Vol. 2, No. 3/4, pp. 201 - 210 (2007); and

U.N. Raghavan, H.P. Thadakamalla and S. Kumara, “Phase transitions and connec-

tivity in distributed wireless sensor networks”, in the proceedings of ADCOM’05, pp.

10 - 15, Coimbatore, India (2005).

In these papers we investigate topological requirements in Wireless Sensor Networks. In

particular we focus on one such requirement, namely connectivity. With power being

one of the scarce resources in wireless sensor networks, we optimize power expenditure

subject to network connectivity.

• U.N. Raghavan, R. Albert and S. Kumara, “Near linear time algorithm to detect

community structures in large scale networks”, Physical Review E, Vol. 76 (036106)

(2007).

In this paper we study the presence of clusters/communities in various real-world com-

plex networks such as movie actor collaboration network, protein-protein interaction

maps, scientific co-authorships and the WWW.

• H.P. Thadakamalla, S. Kumara and R. Albert, “Complexity and large scale networks”,

Chapter 11 in Operations Research and Management Science Handbook edited by A.

R. Ravindran, CRC press (2007).

The engineering community, in particular the Industrial Engineering community, focuses

on Operations Research (OR). We have thoroughly investigated the relationship between OR and complex

networks in this book chapter.

[1] S. Kamarthi, S. Kumara, and P. Cohen, Wavelet Representation of Acoustic Emission in

Turning Process (????).

[2] S. Kamarthi, S. Kumara, and P. Cohen, Journal of Manufacturing Science and Engineering

122, 12 (2000).

[3] S. Bukkapatnam, S. Kumara, and A. Lakhtakia, IIE Transactions 27, 519 (1995).

[4] S. Bukkapatnam, S. Kumara, and A. Lakhtakia, Physical Review E 52, 2375 (1995).

[5] S. Bukkapatnam, A. Lakhtakia, and S. Kumara, Speculations in Science and Technology 19,

137 (1996).


[6] S. Bukkapatnam, S. Kumara, and A. Lakhtakia, ASME Transactions Journal of Manufactur-

ing Science and Engineering 121, 568 (1999).

[7] S. Bukkapatnam, S. Kumara, and A. Lakhtakia, IMA Journal of Applied Mathematics 63,

149 (1999).

[8] S. Bukkapatnam, S. Kumara, and A. Lakhtakia, CIRP Journal of Manufacturing Systems 29,

321 (1999).

[9] S. Wasserman and K. Faust, Social network analysis: Methods and Applications (Cambridge

University Press, 1994).

[10] M. E. J. Newman, SIAM Review 45, 167 (2003).

[11] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A. L. Barabasi, Nature 407, 651

(2000).

[12] P. Santi, Topology Control in Wireless Ad Hoc and Sensor Networks (John Wiley and Sons,

Chichester, UK, 2005).

[13] D. Estrin, R. Govindan, J. Heidmann, and S. Kumar, in the Proceedings of ACM MobiCom

pp. 263–270 (1999).

[14] R. Albert and A.-L. Barabasi, Reviews of Modern Physics 74, 47 (2002).

[15] B. Bollobas, Random Graphs (Academic Press, Orlando, FL, 1985).

[16] R. Albert, H. Jeong, and A.-L. Barabasi, Nature 401, 130 (1999).

[17] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang, Physics Reports 424, 175

(2006).

[18] D. Watts and S. Strogatz, Nature 393, 440 (1998).

[19] C. Carrino, Ph.D. thesis, The Pennsylvania State University (2006).

[20] B. Tjaden and G. Wasson, The oracle of bacon, http://www.cs.virginia.edu/oracle/ (last ac-

cessed April 2008).

[21] M. E. J. Newman, Proceedings of National Academy of Sciences 98, 404 (2001).

[22] M. E. J. Newman, Physical Review E 64, 016131 (2001).

[23] M. E. J. Newman, Physical Review E 64, 016132 (2001).

[24] J. E. Hirsch, Proceedings of the National Academy of Sciences 102, 16569 (2005).

[25] L. Egghe, Scientometrics 69, 131 (2006).

[26] J. Grossman, P. Ion, and R. Castro, Erdos number project, http://www.oakland.edu/enp/ (last

accessed April 2008).


[27] M. Faloutsos, P. Faloutsos, and C. Faloutsos, in SIGCOMM ’99: Proceedings of the conference

on Applications, technologies, architectures, and protocols for computer communication (ACM,

1999), pp. 251–262.

[28] R. Govindan and H. Tangmunarunkit, in IEEE INFOCOM 2000 (Tel Aviv, Israel, 2000), pp.

1371–1380.

[29] G. Kan, Peer-to-Peer Harnessing the Power of Disruptive Technologies (O’Reilly, Beijing,

2001), chap. Gnutella.

[30] L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A. Huberman, Physical Review E 64,

046135 (2001).

[31] H. P. Thadakamalla, R. Albert, and S. R. T. Kumara, Physical Review E 72, 066128 (2005).

[32] S. Lawrence and C. L. Giles, Nature 400, 107 (1999).

[33] A. Gulli and A. Signorini, in WWW ’05: Special interest tracks and posters of the 14th

international conference on World Wide Web (ACM Press, New York, USA, 2005), pp. 902–

903.

[34] L. Page, S. Brin, R. Motwani, and T. Winograd, Tech. Rep., Stanford Digital Library Tech-

nologies Project (1998), URL citeseer.ist.psu.edu/page98pagerank.html.

[35] J. M. Kleinberg, Journal of the ACM 46, 604 (1999).

[36] H. Jeong, S. Mason, A.-L. Barabasi, and Z. Oltvai, Nature 411, 41 (2001).

[37] I. Albert and R. Albert, Bioinformatics 20 (2004).

[38] R. Albert, The Plant Cell 19, 3327 (2007).

[39] H. P. Thadakamalla, U. N. Raghavan, S. Kumara, and R. Albert, IEEE Intelligent Systems

19, 24 (2004).

[40] S. Kumara, Tech. Rep., The Pennsylvania State University (2005).

[41] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, Computer Networks 38, 393

(2002).

[42] D. Culler (2001).

[43] D. M. Blough, M. Leoncini, G. Resta, and P. Santi, IEEE Transactions on Mobile Computing

(to appear) (2006).

[44] J. M. Ottino, Nature 427 (2004).

[45] V. Kerbs, First Monday 7 (2002).

[46] D. Price, Science 149, 510 (1965).


[47] W. Aiello, F. Chung, and L. Lu, Proceedings of the thirty-second annual ACM symposium on

Theory of computing pp. 171–180 (2000).

[48] W. Aiello, F. Chung, and L. Lu, Experimental Mathematics 10, 53 (2001).

[49] A. Nanavati, S. Gurumurthy, G. Das, D. Chakraborty, K. Dasgupta, S. Mukherjea, and

A. Joshi, in CIKM ’06: Proceedings of the 15th ACM international conference on Information

and knowledge management (ACM, New York, NY, USA, 2006), pp. 435–444.

[50] M. Porter, P. Mucha, M. Newman, and A. Friend, Physica A 386, 414 (2007).

[51] T. Hogg and L. Adamic, in EC ’04: Proceedings of the 5th ACM conference on Electronic

commerce (ACM, New York, NY, USA, 2004), pp. 236–237.

[52] K. Chitrapura and S. Kashyap, in CIKM ’04: Proceedings of the thirteenth ACM international

conference on Information and knowledge management (ACM, New York, NY, USA, 2004),

pp. 597–606.

[53] R. Pastor-Satorras and A. Vespignani, Physical Review E 63, 066117 (2001).

[54] R. Pastor-Satorras and A. Vespignani, Physical Review Letters 86, 3200 (2001).

[55] R. Pastor-Satorras and A. Vespignani, Physical Review E 65, 035108 (2002).

[56] R. Pastor-Satorras and A. Vespignani, Physical Review E 65, 036104 (2002).

[57] R. Pastor-Satorras and A. Vespignani, Handbook of Graphs and Networks (Wiley-VCH, Berlin,

2003), chap. Epidemics and immunization in scale-free networks.

[58] C. Christensen, I. Albert, B. Grenfell, and R. Albert (2008), working paper.

[59] F. Wu and B. Huberman, Computational Economics 0407002, EconWPA (2004), available at

http://ideas.repec.org/p/wpa/wuwpco/0407002.html.

[60] M. E. J. Newman and M. Girvan, Physical Review E 69, 026113 (2004).

[61] G. Palla, I. Derenyi, I. Farkas, and T. Vicsek, Nature 435, 814 (2005).

[62] J. Duch and A. Arenas, Physical Review E 72, 027104 (2005).

[63] U. Raghavan, R. Albert, and S. Kumara, Physical Review E 76, 036106 (2007).

[64] G. Flake and D. Pencock, The Colours of Infinity: Self-organization, Self-regulation, and

Self-similarity on the Fractal Web (2004).

[65] A. Vazquez, R. Pastor-Satorras, and A. Vespignani, Internet topology at the router and

autonomous system level (2002), URL http://www.citebase.org/abstract?id=oai:arXiv.org:

cond-mat/0206084.

[66] J. Doyle, D. Alderson, L. Li, S. Low, M. Roughan, S. Shalunov, R. Tanaka, and W. Willinger,


Proceedings of the National Academy of Sciences 102, 14497 (2005).

[67] A.-L. Barabasi and R. Albert, Science 286, 509 (1999).

[68] D. Pennock, G. Flake, S. Lawrence, E. Glover, and C. Giles, Proceedings of the National

Academy of Sciences 99, 5207 (2002).

[69] J. Leskovec and C. Faloutsos, in KDD ’06: Proceedings of the 12th ACM SIGKDD inter-

national conference on Knowledge discovery and data mining (ACM, New York, NY, USA,

2006), pp. 631–636.

[70] A. Ghani, C. Donnelly, and G. Garnett, Statistics in Medicine 17, 2079 (1998).

[71] R. Rothenberg, Connections 18, 105 (1995).

[72] P. Biernacki and D. Waldorf, Sociological Methods & Research 10, 141 (1981).

[73] A. Awan, R. Ferreira, S. Jagannathan, and A. Grama, in HICSS ’06: Proceedings of the

39th Annual Hawaii International Conference on System Sciences (IEEE Computer Society,

Washington, DC, USA, 2006), p. 223.3.

[74] D. Stutzbach, R. Rejaie, N. Duffield, S. Sen, and W. Willinger, INFOCOM 2006. 25th IEEE

International Conference on Computer Communications. Proceedings pp. 1–6 (April 2006).

[75] D. Achlioptas, A. Clauset, D. Kempe, and C. Moore, in STOC ’05: Proceedings of the thirty-

seventh annual ACM symposium on Theory of computing (ACM, New York, NY, USA, 2005),

pp. 694–703.

[76] A. Lakhina, J. Byers, M. Crovella, and P. Xie, INFOCOM 2003. Twenty-Second Annual Joint

Conference of the IEEE Computer and Communications Societies. IEEE 1, 332 (2003).

[77] H. P. Thadakamalla, R. Albert, and S. R. T. Kumara, New Journal of Physics 9, 190 (2007).

[78] J. Kleinberg, Nature 406, 845 (2000).

[79] J. Kleinberg, Proceedings of the International Congress of Mathematicians 3, 1019 (2006).

[80] A. Surana, S. Kumara, M. Greaves, and U. Raghavan, International Journal of Productions

Research 43, 4235 (2005).

[81] D. Pathak, J. Day, A. Nair, W. Sawaya, and M. Kristal, Decision Sciences 38, 547 (November

2007).

[82] H. Lee, V. Padmanabhan, and S. Whang, Sloan Management Review 38, 93 (1997).

[83] Y. Moreno, J. B. Gomez, and A. F. Pacheco, Europhys. Lett. 58, 630 (2002).

[84] Y. Moreno, R. Pastor-Satorras, A. Vazquez, and A. Vespignani, Europhys. Lett. 62, 292

(2003).


[85] A. E. Motter and Y. Lai, Physical Review E 66, 065102 (2002).

[86] A. E. Motter, Phys. Rev. Lett. 93, 098701 (2004).

[87] R. Kinney, P. Crucitti, R. Albert, and V. Latora, The European Physical Journal B 46, 101

(2005).


International Journal of Production Research, Vol. 43, No. 20, 15 October 2005, 4235–4265

Supply-chain networks: a complex adaptive systems perspective

AMIT SURANA†, SOUNDAR KUMARA*‡, MARK GREAVES§ and USHA NANDINI RAGHAVAN‡

†Department of Mechanical Engineering,

The Massachusetts Institute of Technology, Cambridge, MA 02139, USA

‡310 Leonhard Building,

The Harold and Inge Marcus Department of Industrial & Manufacturing Engineering,

The Pennsylvania State University, University Park, PA 16802, USA

§IXO, DARPA, 3701 North Fairfax Drive, Arlington, VA 22203-1714, USA

(Revision received May 2005)

In this era, information technology is revolutionizing almost every domain of technology and society, whereas the 'complexity revolution' is occurring in science at a silent pace. In this paper, we look at the impact of the two, in the context of supply-chain networks. With the advent of information technology, supply chains have acquired a complexity almost equivalent to that of biological systems. However, one of the major challenges that we are facing in supply-chain management is the deployment of coordination strategies that lead to adaptive, flexible and coherent collective behaviour in supply chains. The main hurdle has been the lack of the principles that govern how supply chains with complex organizational structure and function arise and develop, and what organizations and functionality are attainable, given specific kinds of lower-level constituent entities. The study of Complex Adaptive Systems (CAS) has been a research effort attempting to find common characteristics and/or formal distinctions among complex systems arising in diverse domains (like biology, social systems, ecology and technology) that might lead to a better understanding of how complexity occurs, whether it follows any general scientific laws of nature, and how it might be related to simplicity. In this paper, we argue that supply chains should be treated as a CAS. With this recognition, we propose how various concepts, tools and techniques used in the study of CAS can be exploited to characterize and model supply-chain networks. These tools and techniques are based on the fields of nonlinear dynamics, statistical physics and information theory.

Keywords: Supply chain; Complexity; Complex adaptive systems; Nonlinear dynamics; Networks

1. Introduction

A supply chain is a complex network with an overwhelming number of interactions and inter-dependencies among different entities, processes and resources. The network is highly nonlinear, shows complex multi-scale behaviour, has a structure spanning several scales, and evolves and self-organizes through a complex interplay

*Corresponding author. Email: [email protected]

International Journal of Production Research

ISSN 0020–7543 print / ISSN 1366–588X online © 2005 Taylor & Francis

http://www.tandf.co.uk/journals

DOI: 10.1080/00207540500142274


of its structure and function. This sheer complexity of supply-chain networks, with inevitable lack of prediction, makes it difficult to manage and control them. Furthermore, the changing organizational and market trends mean that the supply chains should be highly dynamic, scalable, reconfigurable, agile and adaptive: the network should sense and respond effectively and efficiently to satisfy customer demand. Supply-chain management necessitates the decisions made by business entities to consider more factors that are global. The successful integration of the entire supply-chain process now depends heavily on the availability of accurate and timely information that can be shared by all members of the supply chain. Information technology, with its capability of setting up dynamic information exchange networks, has been a key enabling factor in shaping supply chains to meet such requirements. However, a major obstacle remains in the deployment of coordination and decision technologies to achieve complex, adaptive, and flexible collective behaviour in the network. This is due to the lack of our understanding of organizational, functional and evolutionary aspects in supply chains. A key realization to tackle this problem is that supply-chain networks should be treated not just as a 'system' but as a 'Complex Adaptive System' (CAS). The study of CAS augments the systems theory and provides a rich set of tools and techniques to model and analyse the complexity arising in systems encompassing science and technology. In this paper, we take this perspective in dealing with supply chains and show how various advances in the realm of CAS provide novel and effective ways to characterize, understand and manage their emergent dynamics.

A similar viewpoint has been emphasized by Choi et al. (2001), who aimed to demonstrate how supply networks should be managed if we recognize them as CAS. The concept of CAS allows one to understand how supply networks as living systems co-evolve with the rugged and dynamic environment in which they exist and identify patterns that arise in such an evolution. The authors conjecture various propositions stating how the patterns of behaviour of individual agents in a supply network relate to the emergent dynamics of the network. One of the important deductions made is that when managing supply networks, managers must appropriately balance how much to control, and how much to let emerge. However, no concrete framework has been suggested under which such conjectures can be verified and generalized. It is the goal of this paper to show how the theoretical advances made in the realm of CAS can be used to study such issues systematically and formally in the context of supply-chain networks.

We posit that supply chains are complex adaptive systems. However, we do not provide conclusive proofs for such a claim. We survey the emerging literature, faithfully report on the state of the art in CAS and try to establish connections, as much as possible, between CAS tools and supply-chain analysis. Through our effort, we would like to pave research directions in supply chains from a CAS point of view. This paper is divided into eight sections. In section 2, we give a brief introduction to complex adaptive systems in which we discuss the architecture and characteristics of complex systems in diverse areas encompassing biology, social systems, ecology and technology. In section 3, we discuss characteristics of supply-chain networks and argue that they should be understood in terms of a CAS. We also present some emerging trends in supply chains and the increasingly critical role of information technology in supply-chain management in the light of these trends. In section 4, we give a brief overview of the main techniques used for modelling and analysis



of supply chains and then discuss how the science of complexity provides a genuine extension and reformulation of these approaches. Like any CAS, the study of supply chains should involve a proper balance of simulation and theory. System dynamics-based and, recently, agent-based simulation models (inspired by complexity theory) are used extensively to make theoretical investigations of supply chains feasible and to support decision-making in real-world supply chains. A system dynamics approach often leads to models of supply chains described in the form of a dynamical system. Dynamical systems theory provides a powerful framework for rigorous analysis of such models and thus can be used to supplement the system dynamics simulation approach. We illustrate this in section 5, using some nonlinear models, which consider the effect of priority, heterogeneity, feedback, delays and resource sharing on the performance of supply chains. Furthermore, the large volumes of data generated from simulations can be used to understand and comprehend the emergent dynamics of supply chains. Even though an exact understanding of the dynamics is difficult in complex systems, archetypal behaviour patterns can often be recognized, using techniques from complexity theory like Nonlinear Time Series Analysis and Computational Mechanics, which are discussed in section 6. The benefits of integrated supply chain concepts are widely recognized, but the analytical tools that can exploit those benefits are scarce. In order to study supply chains as a whole, it is critical to understand the interplay of organizational structure and functioning of supply chains. Network dynamics, an extension of nonlinear dynamics to networks, provides a systematic framework to deal with such issues and is discussed in section 7. We conclude in section 8, with the recommendations for future research.

2. Complex adaptive systems

Many natural systems, and increasingly many artificial (man-made) systems as well, are characterized by apparently complex behaviours that arise as the result of nonlinear spatio-temporal interactions among a large number of components or subsystems. We use the terms agent and node interchangeably to refer to these components or subsystems. Examples of such natural systems include immune systems, nervous systems, multi-cellular organisms, ecologies, insect societies and social organizations. However, such systems are not confined to biology and society. Engineering theories of control, communications and computing have matured in recent decades, facilitating the creation of large-scale systems which have turned out to possess bewildering complexity, almost equivalent to that of biological systems. Systems sharing this property include parallel and distributed computing systems, communication networks, artificial neural networks, evolutionary algorithms, large-scale software systems, and economies. Such systems have been commonly referred to as Complex Systems (Baranger 2005, Bar-Yam 1997, Adami 1998, Flake 1998). However, at the present time, the notion of a complex system is not precisely delineated.

The most remarkable phenomenon exhibited by complex systems is the emergence of highly structured collective behaviour over time from the interaction of simple subsystems without any centralized control. Their typical characteristics include: dynamics involving interrelated spatial and temporal effects, correlations over long length and timescales, strongly coupled degrees of freedom,
and non-interchangeable system elements. They exist in quasi-equilibrium and show a combination of regularity and randomness (i.e. an interplay of chaos and non-chaos). Such systems have structures spanning several scales and show emergent behaviour. Emergence is generally understood to be a process that leads to the appearance of structure not directly described by the defining constraints and instantaneous forces that control a system. The combination of structure and emergence leads to self-organization, which is what happens when an emerging behaviour has the effect of changing the structure or creating a new structure. CAS are a special category of complex systems, introduced to accommodate living beings. As the name suggests, they are capable of changing themselves to adapt to a changing environment. In this regard, many artificial systems like those mentioned earlier can be considered CAS, due to their capability of evolving. The coexistence of competition and cooperation is another dichotomy exhibited by CAS.

A CAS can be considered as a network of dynamical elements where the states of both the nodes and the edges can change, and the topology of the network itself often evolves in time in a nonlinear and heterogeneous fashion. A dynamical system can be considered as simply 'obeying the laws of physics'. From another perspective, it can be viewed as processing information: how systems obtain information, how they incorporate that information in models of their surroundings, and how they make decisions on the basis of these models determine how they behave (Lloyd and Slotine 1996). This leads to one of the more heuristic definitions of a complex system: one that 'stores, processes and transmits information' (Sawhill 1995). From a thermodynamic viewpoint, such systems have their total energy (or its analogue) unknown, yet something is known about the internal state structure. In these large open systems (which do not possess well-defined boundaries), energy enters at low entropy and is dissipated. Open systems organize largely due to the reduction in the number of active degrees of freedom caused by dissipation. Not all behaviours or spatial configurations can be supported. The result is a limitation of the collective modes, cooperative behaviours and coherent structures that an open system can express. A central goal of the sciences of complex systems is to understand the laws and mechanisms by which complicated, coherent global behaviour can emerge from the collective activities of relatively simple, locally interacting components.

Complexity arises in natural systems through evolution, while design plays an analogous role for complex engineering systems. Convergent evolution/design leads to remarkable similarities at a higher level of organization, though at the molecular or device level, natural and man-made systems differ significantly. Complexity in both cases is driven far more by the need for robustness to uncertainty in the environment and in component parts than by basic functionality. Through design/evolution, such systems develop highly structured, elaborate internal configurations, with layers of feedback and signalling. Protocols organize highly structured and complex modular hierarchies to achieve robustness, but also create fragilities stemming from rare or ignored perturbations. The evolution of protocols can lead to a robustness/complexity/fragility spiral, where complexity added for robustness also adds new fragilities, which in turn leads to new and thus spiralling complexities (Csete and Doyle 2002). However, all this complexity remains largely hidden in normal operation and only becomes conspicuous when contributing to rare cascading failures or through chronic fragility/complexity evolutionary spirals. Highly Optimized Tolerance (HOT) (Carlson and Doyle 1999)
has been introduced recently to focus on the 'robust, yet fragile' nature of complexity. It is also becoming increasingly clear that robustness and complexity in biology, ecology, technology and social systems are so intertwined that they must be treated in a unified way. Given the diversity of systems falling into this broad class, the discovery of any commonalities or 'universal' laws underlying such systems requires a very general theoretical framework.

The scientific study of CAS has been attempting to find common characteristics and/or formal distinctions among complex systems that might lead to a better understanding of how complexity develops, whether it follows any general scientific laws of nature, and how it might be related to simplicity. The attractiveness of the methods developed in this research effort for general-purpose modelling, design and analysis lies in their ability to produce complex emergent phenomena out of a small set of relatively simple rules, constraints and relationships, couched in either quantitative or qualitative terms. We believe that the tools and techniques developed in the study of CAS offer a rich potential for the design, modelling and analysis of large-scale systems in general and supply chains in particular.

3. Supply-chain networks as complex adaptive systems

A supply-chain network transfers information, products and finances between various suppliers, manufacturers, distributors, retailers and customers. A supply chain is characterized by a forward flow of goods and a backward flow of information. Typically, a supply chain comprises two main business processes: material management and physical distribution (Min and Zhou 2002). Material management supports the complete cycle of material flow, from the purchase and internal control of production material to the planning and control of work-in-process, to the warehousing, shipping and distribution of finished products. Physical distribution, on the other hand, encompasses all the outbound logistics activities related to providing customer services. Combining the activities of material management and physical distribution, a supply chain represents not only a linear chain of one-on-one business relationships but a web of multiple business networks and relationships.

Supply-chain networks exhibit emergent phenomena. From the viewpoint of each individual entity, the supply chain is self-organizing. Although the totality may be unknown, individual entities partake in the grand establishment of the network by engaging in localized decision-making, i.e. doing their best to select capable suppliers and ensure on-time delivery of products to their buyers. The network is characterized by nonlinear interactions and strong interdependencies between the entities. In most circumstances, order and control in the network are emergent, as opposed to predetermined. Control is generated through nonlinear though simple behavioural rules that operate on local information. We argue that a supply-chain network forms a complex adaptive system:

. Structures spanning several scales: The supply-chain network is a bi-level hierarchical and heterogeneous network where, at the higher level, each node represents an individual supplier, manufacturer, distributor, retailer or customer. However, at the lower level, the nodes represent the physical entities that exist inside each node in the upper level. The heterogeneity of
most networks is a function of various technologies being provided by whatever vendor could supply them at the time their need was recognized.

. Strongly coupled degrees of freedom and correlations over long length and timescales: Different entities in a supply chain typically operate autonomously, with different objectives and subject to different sets of constraints. However, when it comes to improving due-date performance, increasing quality or reducing costs, they become highly interdependent. It is the flow of material, resources, information and finances that provides the binding force. The welfare of any entity in the system directly depends on the performance of the others and on their willingness and ability to coordinate. This leads to correlations between entities over long length and timescales.

. Coexistence of competition and cooperation: The entities in a supply chain often have conflicting objectives. Competition abounds in the form of sharing and contention of resources. Global control over nodes is an exception rather than the rule; more likely is a localized cooperation out of which a global order emerges, which is itself unpredictable.

. Nonlinear dynamics involving interrelated spatial and temporal effects: Supply chains have a wide geographic distribution. Customers can initiate transactions at any time with little or no regard for existing load, thus contributing to a dynamic and noisy network character. The characteristics of a network tend to drift as workloads and configurations change, producing non-stationary behaviour. The coordination protocols attempt to arbitrate among entities with resource conflicts. Arbitration is not perfect, however; hence, over- and under-corrections contribute to the nonlinear character of the network.

. Quasi-equilibrium and combination of regularity and randomness (i.e. interplay of chaos and non-chaos): The general tendency of a supply chain is to maintain a stable and prevalent configuration in response to external disturbances. However, it can undergo a radical structural change when stretched far from equilibrium. At such a point, a small event can trigger a cascade of changes that can eventually lead to system-wide reconfiguration. In some situations, unstable phenomena can arise, due to the feedback structure, inherent adjustment delays and the nonlinear decision-making processes that go on in the nodes. One of the causes of unstable phenomena is that the information feedback in the system is slow relative to the rate of changes that occur in the system. The first mode of unstable behaviour to arise in nonlinear systems is usually a simple one-cycle self-sustained oscillation. If the instability drives the system further into the nonlinear regime, more complicated temporal behaviour may be generated. The route to chaos through subsequent period-doubling bifurcations, as certain parameters of the system are varied, is generic to a large class of systems in physics, chemistry, biology, economics and other fields. Functioning in a chaotic regime precludes long-term predictions about the behaviour of the system, while short-term predictions may sometimes be possible. As a result, control and stabilization of such a system become very difficult.

. Emergent behaviour and self-organization: With the individual entities obeying a deterministic selection process, the organization of the overall supply chain emerges through a natural process of order and spontaneity.
This emergence of highly structured collective behaviour over time from the interaction of the simple entities leads to the fulfilment of customer orders. Demand amplification and inventory swings are two of the undesirable emergent phenomena that can also arise. For instance, decisions and delays downstream in a supply chain often lead to the amplification of an undesirable effect upstream, a phenomenon commonly known as the 'bullwhip' effect (a minimal simulation sketch of this amplification is given after this list).

. Adaptation and evolution: A supply chain reacts to the environment and thereby creates its environment. Operationally, the environment depends on the chosen scale of analysis, e.g. it can be taken as the customer market. Typically, significant dynamism exists in the environment, which necessitates a constant adaptation of the supply network. However, the environment is highly rugged, making the co-evolution difficult. The individual entities constantly observe what emerges from the supply network and adjust their organizational goals and supporting infrastructure. Another common adaptation is through altering the boundaries of the network. The boundaries can change as a result of including or excluding particular entities and by adding or eliminating connections among entities, thereby changing the underlying pattern of interaction. As we discuss next, supply-chain management plays a critical role in making the network evolve in a coherent manner.
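
The demand-amplification phenomenon mentioned in the list above is easy to reproduce in a few lines of code. The sketch below is ours, not the authors'; the chain structure, the exponential-smoothing forecast and all parameter values are illustrative assumptions. Each stage orders to cover the demand it observes plus an inventory gap, and the variance of the orders typically grows from the retailer towards the upstream stages, which is the bullwhip effect.

import random
random.seed(1)

STAGES = 4          # retailer -> wholesaler -> distributor -> factory (assumed chain)
ALPHA = 0.3         # exponential-smoothing constant (assumed)
TARGET_COVER = 2.0  # desired inventory as a multiple of the demand forecast (assumed)

inventory = [20.0] * STAGES
forecast = [10.0] * STAGES
orders_seen = [[] for _ in range(STAGES)]

for t in range(200):
    # end-customer demand: small random fluctuations around 10 units
    demand = 10.0 + random.uniform(-2.0, 2.0)
    for s in range(STAGES):
        # each stage updates its forecast from the demand it observes
        forecast[s] = ALPHA * demand + (1 - ALPHA) * forecast[s]
        # order-up-to policy: replace what was demanded and close the inventory gap
        order = max(0.0, demand + (TARGET_COVER * forecast[s] - inventory[s]))
        inventory[s] += order - demand          # crude stock balance, no shipping delay
        orders_seen[s].append(order)
        demand = order                          # this order becomes the next stage's demand

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

for s, hist in enumerate(orders_seen):
    print(f"stage {s}: order variance = {variance(hist[50:]):.2f}")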

3.1 Supply-chain management

Supply-chain management is the integration of key business processes from end-users through original suppliers that provide products, services and information and add value for customers and other stakeholders (Cooper et al. 1997). It involves balancing reliable customer delivery with manufacturing and inventory costs. It evolves around a customer-focused corporate vision, which drives changes throughout a firm's internal and external linkages and then captures the synergy of inter-functional, inter-organizational integration and coordination. Owing to the inherent complexity, it is a challenge to coordinate the actions of entities across organizational boundaries so that they perform in a coherent manner.

An important element in managing supply-chain networks is to control the ripple effect of lead time so that the variability in the supply chain can be minimized.

Figure 1. Supply-chain network.

Demand forecasting is used to estimate demand at each stage, and the inventory between stages is used to protect the network against fluctuations in supply and demand. Owing to the decentralized control properties of the supply-chain network, control of the ripple effect requires coordination between entities in performing their tasks. With the increase in the number of participants in the supply chain, the problem of coordination has reached another dimension.

Two important organizational and market trends under way are the atomization of markets and the atomization of organizational entities (Balakrishnan et al. 1999). In such a scenario, the product-realization process has continuous customer involvement in all phases, from design to delivery. Customization is not limited to selecting from pre-determined model variants; rather, product design, process plans and even the supply-chain configuration have to be tailored for each customer. The product-realization organization has to form on the fly, as a consortium of widely dispersed organizations, to cater to the needs of a single customer. Thus, organizations consist of series of opportunistic alliances among several focused organizational entities to address particular market opportunities. For manufacturing organizations to operate effectively in this environment of dynamic, virtual alliances, products must have modular architectures, processes must be well characterized and standardized, documentation must be widely accessible, and systems must be interoperable. Automation and intelligent information processing are vital for diagnosing problems during product realization and usage, for coordinating design and production schedules, and for searching for relevant information in multimedia databases. These trends exacerbate the challenges of coordination and collaboration as the number of product-realization networks increases, and so does the number of partners in each network.

Building a larger inventory can be used as a general means of dealing with rapidly changing market demand and short-life-cycle products. However, augmenting inventory building with information may be a useful approach. Information about the material lead time from different suppliers can be used for planning material arrival, instead of simply building up inventory. Demand information can be transmitted to the manufacturers on a timely basis, so that orders can be fulfilled at lower inventory cost. In fact, it is widely realized that the successful integration of the entire supply-chain process depends heavily on the availability of accurate and timely information that can be shared by all members of the supply chain. Supply-chain management now increasingly relies on information technology, as discussed below.

3.2 Information technology in supply-chain management

Information technology, with its capability of providing global reach and a wide range of connectivity, enterprise integration, micro-autonomy and intelligence, object- and network-oriented computing paradigms and rich media support, has been the key enabler for the management of modern manufacturing enterprises (Balakrishnan et al. 1999). It is vital for eliminating collaboration and coordination costs, and it permits the rapid setup of dynamic information exchange networks. Connectivity permits involvement of customers and other stakeholders in all aspects of manufacturing. Enterprise integration facilitates seamless interaction
among global partners. Micro-autonomy and intelligence permit atomic tracking and remote control. New software paradigms enable distributed, intelligent and autonomous operations. Distributed computing facilitates quick localized decisions without losing the vast data-gathering potential and powerful computing capabilities. Rich media support, which includes capabilities like digitization, visualization tools and virtual reality, facilitates collaboration and immersion.

Many improvements have occurred in supply-chain management because IT enables dynamic changes in inventory management and production, and it assists managers in coping with uncertainty and lead time through improved collection and sharing of information between supply-chain nodes. The success of an enterprise is now largely dependent on how its information resources are designed, operated and managed, especially with information technology emerging as a critical input to be leveraged for significant organizational productivity. However, it is difficult to design an information system that can handle the information needs of supply-chain nodes so as to allow efficient, flexible and decentralized supply-chain management. The main hurdle in efficiently using information technology is our lack of understanding of the organizational, functional and evolutionary principles of supply chains.

Recognizing supply chains as CAS can, however, lead to novel and effective ways to understand their emergent dynamics. It has been found that many diverse-looking CAS share similar characteristics and problems, and thus can be tackled through similar approaches. While, at present, networks are largely controlled by humans, the complexity, diversity and geographic distribution of the networks make it necessary for networks to maintain themselves in a sort of evolutionary sense, just as biological organisms do (Maxion 1990). Similarly, the problem of coordination, which is a challenge in supply chains, has been routinely solved by biological systems for literally billions of years. We believe that the complexity, flexibility and adaptability in the collective behaviour of supply chains can be accomplished only by importing the mechanisms that govern these features in nature. Along with these robust design principles, we require equally sound techniques for the modelling and analysis of supply chains. This forms the focus of this paper. We first give a brief overview of the main techniques that have been used for modelling and analysis of supply chains, and then discuss how the science of complexity provides a genuine extension and reformulation of these approaches.

4. Modelling and analysis of supply-chain networks

As pointed out, the key challenge in designing supply-chain networks or, for that matter, any large-scale system is the difficulty of reverse engineering, i.e. determining what individual agent strategies lead to the desired collective behaviour. Because of this difficulty in understanding the effect of individual characteristics on the collective behaviour of the system, simulation has been the primary tool for designing and optimizing such systems. Simulation makes investigations possible and useful when, in the real-world situation, experimentation would be too costly or, for ethical reasons, not feasible, or where the decisions and their consequences are well separated in space and time. It seems at present that large-scale simulations
of future complex processes may be the most logical, and perhaps an important, vehicle to study them objectively (Ghosh 2002).

Simulation in general helps one, first, to detect design errors prior to developing a prototype, in a cost-effective manner. Second, simulation of system operations may identify potential problems that might occur during actual operation. Third, extensive simulation may potentially detect problems that are rare and otherwise elusive. Fourth, hypothetical concepts that do not exist in nature, even those that defy natural laws, may be studied. The increased speed and precision of today's computers promise the development of high-fidelity models of physical and natural processes, models that yield reasonably accurate results quickly. This in turn would permit system architects to study the performance impact of a wide variation of key parameters quickly and, in some cases, even in real time. Thus, a qualitative improvement in system design may be achieved. In many cases, unexpected variations in external stress can be simulated quickly to yield appropriate system parameter values, which are then adopted into the system to enable it to successfully counteract the external stress.

Mathematical analysis, on the other hand, has to play a critical role because it alone can enable us to formulate rigorous generalizations or principles. Neither physical experiments nor computer-based experiments on their own can support such generalizations. Physical experiments usually are limited to supplying inputs and constraints for rigorous models, because experiments themselves are rarely described in a language that permits deductive exploration. Computer-based experiments or simulations have rigorous descriptions, but they deal only in specifics. A well-designed mathematical model, on the other hand, generalizes the particulars revealed by physical experiments, computer-based models and any interdisciplinary comparisons. Using mathematical analysis, we can study the dynamics, predict long-term behaviour and gain insights into system design: e.g. what parameters determine group behaviour, how individual agent characteristics affect the system, and whether a proposed agent strategy leads to the desired group behaviour. In addition, mathematical analysis may be used to select parameters that optimize a system's collective behaviour, prevent instabilities, etc.

It seems that successful modelling efforts for large-scale systems like supply-chain networks, large-scale software systems, communication networks, biological ecosystems, food webs, social organizations, etc. require a solid empirical base. Pure abstract mathematical contemplation is unlikely to lead to useful models. The discipline of physics provides an appropriate parallel; advances in theoretical physics are more often than not inspired by experimental findings. The study of supply-chain networks should therefore involve an amalgam of both simulation and analytical techniques.

Considering the broad spectrum of a supply chain, no model can capture all the aspects of supply-chain processes. The modelling proceeds at three levels:

1. competitive strategic analysis, which includes location-allocation decisions, demand planning, distribution channel planning, strategic alliances, new product development, outsourcing, IT selection, pricing and network structuring;

2. tactical problems like inventory control, production/distribution coordination, material handling and layout design;

3. operational-level problems, which include routing/scheduling, workforce scheduling and packaging.

The individual models in supply chains can be categorized into four classes (Min and Zhou 2002):

1. deterministic: single-objective and multiple-objective models;

2. stochastic: optimal control theoretic and dynamic programming models;

3. hybrid: with elements of both deterministic and stochastic models, including inventory-theoretic and simulation models;

4. IT-driven: models that aim to integrate and coordinate various phases of supply-chain planning on a real-time basis using application software, like ERP.

Mathematical programming techniques and simulation have been the two main approaches for the analysis and study of supply-chain models. Mathematical programming mainly takes into consideration static aspects of the supply chain. Simulation, on the other hand, studies dynamics in supply chains and generally proceeds based on 'system dynamics' and 'agent-based' methodologies. System dynamics is a continuous simulation methodology that uses concepts from engineering feedback control to model and analyse dynamic socio-economic systems (Forrester 1961). The mathematical description is realized with ordinary differential equations. An important advantage of system dynamics is the possibility of deducing the occurrence of a specific behaviour mode, because the structure that leads to the system dynamics is made transparent. We present some nonlinear models in section 5 which are useful for understanding the complex interdependencies and the effects of priority, nonlinearities, delays, uncertainties and competition/cooperation for resource sharing in supply chains. The drawback of system dynamics models is that the structure has to be determined before starting the simulation. Agent-based modelling (a technique from complexity theory), on the other hand, is a 'bottom-up' approach which simulates the underlying processes believed responsible for the global pattern, and allows us to evaluate which mechanisms are most influential in producing that emergent pattern. In Schieritz and Grobler (2003), a hybrid modelling approach has been presented that intends to make the system dynamics approach more flexible by combining it with the discrete agent-based modelling approach. Such large-scale simulations, with their many degrees of freedom, raise serious technical problems about the design of experiments and the sequence in which they should be carried out in order to obtain the maximum relevant information. Furthermore, in order to analyse data from such large-scale simulations, we require systematic analytical and statistical methods. In section 6, we describe two such techniques: nonlinear time series analysis and computational mechanics.

A useful paradigm for modelling a supply chain, taking into consideration the detailed pattern of interaction, is to view it as a network. A network is essentially anything that can be represented by a graph: a set of points (also generically called nodes or vertices), connected by links (edges, ties) representing some relationship. Networks are inherently difficult to understand due to their structural complexity, evolving structure, connection diversity, dynamical complexity of nodes, node diversity and meta-complication, where all these factors influence each other.
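
As a minimal illustration of this network view (the entities and links below are invented for illustration and do not come from the paper), a supply chain can be encoded as a directed graph and queried with standard graph routines, for example with the Python networkx library:

import networkx as nx

# hypothetical two-tier supply network: suppliers -> manufacturer -> distributors -> retailers
G = nx.DiGraph()
G.add_edges_from([
    ("supplier_A", "manufacturer"), ("supplier_B", "manufacturer"),
    ("manufacturer", "distributor_1"), ("manufacturer", "distributor_2"),
    ("distributor_1", "retailer_1"), ("distributor_1", "retailer_2"),
    ("distributor_2", "retailer_3"),
])

# simple structural questions: fan-out, material-flow routes and upstream dependencies
print(dict(G.out_degree()))                               # fan-out of each node
print(nx.shortest_path(G, "supplier_A", "retailer_3"))    # one material-flow route
print(sorted(nx.ancestors(G, "retailer_1")))              # every upstream entity of a retailer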

Queuing theory has primarily been used to address the steady-state operation of a typical network. On the other hand, techniques from mathematical programming have been used to solve the problem of resource allocation in networks. This is meaningful when dynamic transients can be disregarded. However, present-day supply-chain networks are highly dynamic, reconfigurable, intrinsically nonlinear and non-stationary. New tools and techniques are required for their analysis, such that the structure, function and growth of networks can be considered simultaneously. In this regard, we discuss 'network dynamics' in section 7, which deals with such issues and can be used to study the structure of a supply chain and its implications for functionality. Understanding the behaviour of large complex networks is the next logical step for the field of nonlinear dynamics, because such networks are so pervasive in the real world. We begin with a brief introduction to dynamical systems theory, in particular nonlinear dynamics, in the next section.

5. Dynamical systems theory

Many physical systems that produce a continuous-time response can be modelled by a set of differential equations of the form:

\[ \frac{dy}{dt} = f(y, a), \qquad (1) \]

where y = (y_1(t), y_2(t), \ldots, y_n(t)) represents the state of the system and may be thought of as a point in a suitably defined space S, known as the phase space, and a = (a_1(t), a_2(t), \ldots, a_m(t)) is a parameter vector. The dimensionality of S is the number of a priori degrees of freedom in the system. The vector field f(y, a) is in general a nonlinear operator acting on points in S. If f(y, a) is locally Lipschitz, the above equation defines an initial value problem in the sense that a unique solution curve passes through each point y in the phase space. Formally, we may write the solution at time t given an initial value y_0 as y(t) = \varphi_t y_0, where \varphi_t represents a one-parameter family of maps of the phase space into itself. The solutions to all possible initial value problems for the system can be written collectively as \varphi_t S, which may be thought of as a flow of points in the phase space. Initially, the dimension of the set \varphi_t S is that of S itself. As the system evolves, however, it is generally the case for dissipative systems that the flow contracts onto a set of lower dimension known as an attractor. Attractors can range from simple stationary points, limit cycles and quasi-periodic orbits to complicated chaotic attractors (Strogatz 1994, Ott 1996). The nature of the attractor changes as the parameters a are varied, a phenomenon studied in bifurcation analysis. Typically, a nonlinear system is chaotic for some range of parameters. Chaotic attractors have a structure that is not simple; they are often not smooth manifolds and frequently have a highly fractured structure, popularly referred to as fractal (self-similar geometrical objects having structure at every scale). On such an attractor, the dynamics is characterized by stretching and folding: the stretching causes the divergence of nearby trajectories, while the folding constrains the dynamics to a finite region of the state space. This accounts for the fractal structure of attractors and for the extreme sensitivity to changes in initial conditions, which is a hallmark of chaotic behaviour. A system under chaos is unstable everywhere and never settles down, producing
irregular and aperiodic behaviour, which leads to a continuous broadband spectrum. While this feature can be used to distinguish chaotic behaviour from stationary, limit-cycle and quasi-periodic motions using standard Fourier analysis, it makes chaos difficult to separate from noise, which also has a broadband spectrum. It is this 'deterministic randomness' of chaotic behaviour which makes standard linear modelling and prediction techniques unsuitable for analysis.
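
A hedged illustration of equation (1) and of the sensitivity to initial conditions just described: the sketch below integrates the standard Lorenz system (a textbook chaotic flow, used here purely as an example and not as a supply-chain model) from two almost identical initial states and prints how quickly the trajectories separate.

import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, y, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # a classic chaotic vector field f(y, a) in the sense of equation (1)
    x, yv, z = y
    return [sigma * (yv - x), x * (rho - z) - yv, x * yv - beta * z]

t_eval = np.linspace(0.0, 25.0, 5000)
sol1 = solve_ivp(lorenz, (0.0, 25.0), [1.0, 1.0, 1.0], t_eval=t_eval)
sol2 = solve_ivp(lorenz, (0.0, 25.0), [1.0, 1.0, 1.000001], t_eval=t_eval)  # tiny perturbation

# distance between the two trajectories grows by orders of magnitude
separation = np.linalg.norm(sol1.y - sol2.y, axis=0)
print(separation[0], separation[2500], separation[-1])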

5.1 Nonlinear models for the supply chain

Understanding the complex interdependencies and the effects of priority, nonlinearities, delays, uncertainties and competition/cooperation for resource sharing is fundamental for the prediction and control of supply chains. A system dynamics approach often leads to models of supply chains which can be described in the form of equation (1). Dynamical systems theory provides a powerful framework for rigorous analysis of such models and thus can be used to supplement the system dynamics approach. We next describe some nonlinear models and their detailed analysis. These models can be used either to represent entities in a supply chain or as macroscopic models which capture collective behaviour. The models reiterate the fact that simple rules can lead to complex behaviour, which in general is difficult to predict and control.

Figure 2. Pre-emptive queuing model.

5.1.1 Pre-emptive queuing model. Priority and heterogeneity are fundamental to any logistic planning and scheduling. Tasks have to be prioritized in order to do the most important things first. This comes naturally as we try to optimize an objective and assign the tasks their 'importance'. Priorities may also arise due to the non-homogeneity of the system, where the 'knowledge' level of one agent is different from that of another. In addition, in all logistics systems, resources are limited, in both
time and space. Temporal dependence plays an important role in logistic planning (interdependency). Sometimes, priorities can also arise from physical considerations, when different stages of processing have certain temporal constraints.

The considerations regarding the generality of assumptions and the clear one-to-one correspondence between the physical logistics tasks and the model parameters described in Erramilli and Forys (1991) made us apply their queuing model in the context of supply chains (Kumara et al. 2003). The queuing system considered here has two queues (A and B) and a single server with the following characteristics:

. once served, a class A customer returns as a class B customer after a constant interval of time;

. class B has non-pre-emptive priority over class A, i.e. the class A queue is not served until the class B queue is emptied;

. the schedules are organized every T units of time, i.e. if the low-priority queue is emptied within time T, the server remains idle for the remainder of the interval;

. finally, the higher-priority class B has a lower service rate than the low-priority class A.

Suppose the system is sampled at the end of every schedule cycle, and the following quantities are observed at the beginning of the kth interval: A_k: queue length of the low-priority queue; B_k: queue length of the high-priority queue; C_k: outflow from the low-priority queue in the kth interval; D_k: outflow from the high-priority queue in the kth interval; λ_k: inflow to the low-priority queue from outside in the kth interval.

The system is characterized by the following parameters: μ_a: rate per unit of the schedule cycle at which the low-priority queue can be served; μ_b: rate per unit of the schedule cycle at which the high-priority queue can be served; l: the feedback interval in units of the schedule cycle.

The following four equations then completely describe the evolution of the system:

\[ A_{k+1} = A_k + \lambda_k - C_k, \qquad (2) \]

\[ C_k = \min\!\left( A_k + \lambda_k,\; \mu_a\!\left( 1 - \frac{D_k}{\mu_b} \right) \right), \qquad (3) \]

\[ B_{k+1} = B_k + C_{k-l} - D_k, \qquad (4) \]

\[ D_k = \min\!\left( B_k + C_{k-l},\; \mu_b \right). \qquad (5) \]

Equations (2) and (4) are merely conservation rules, while equations (3) and (5) model the constraints on the outflows and the interaction between the queues. This model, while conceptually simple, exhibits surprisingly complex behaviours.
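
Equations (2)-(5) can be iterated directly; the sketch below is ours, with the values of μ_a, μ_b, l and the constant arrival rate chosen purely for illustration (above the λ ≥ μ_b/2 threshold discussed next), and it makes the batching and oscillatory modes described below easy to explore.

# Direct iteration of equations (2)-(5); mu_a, mu_b, l and lam are assumed example values.
mu_a, mu_b, l, lam = 1.0, 0.8, 2, 0.45

A, B = 0.0, 0.0
C_past = [0.0] * l                # C_past[0] holds C_{k-l}
for k in range(60):
    D = min(B + C_past[0], mu_b)                  # equation (5)
    C = min(A + lam, mu_a * (1.0 - D / mu_b))     # equation (3)
    A = A + lam - C                               # equation (2)
    B = B + C_past[0] - D                         # equation (4)
    C_past = C_past[1:] + [C]                     # shift the feedback delay line
    if k > 40:                                    # print the late-time pattern
        print(f"k={k:3d}  A={A:6.3f}  B={B:6.3f}  C={C:5.3f}  D={D:5.3f}")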

The analytic approach to solving the flow model under constant arrivals (i.e. λ_k = λ for all k) shows several classes of solutions. The system batches its workload even for perfectly smooth arrival patterns. The characteristics of the behaviour of the system are as follows:

1. Above a threshold arrival rate (λ ≥ μ_b/2), a momentary overload can send the system into a number of stable modes of oscillation.

2. Each mode of oscillation is characterized by distinct average queuing delays.

3. The extreme sensitivity to parameters, and the existence of chaos, imply that the system at a given time may be in any one of a number of distinct steady-state modes.

The batching of the workload can cause significant queuing delays, even at moderate occupancies. Also, such oscillatory behaviour significantly lowers the real-time capacity of the system. For details of the application of this model in a supply-chain context, refer to Kumara et al. (2003).

5.1.2 Managerial systems. Decision-making is another typical activity in which the entities in a supply chain are continuously engaged. Entities make decisions to optimize their self-interest, often based on local, delayed and imperfect information.

To illustrate the effects of decisions on the dynamics of a supply chain as a whole, we consider a managerial system which allocates resources to its production and marketing departments in accordance with shifts in inventory and/or backlog (Rasmussen and Mosekilde 1988). It has four level variables: resources in production, resources in sales, inventory of finished products and number of customers. In order to represent the time required to adjust production, a third-order delay is introduced between production rate and inventory. The sum of the two resource variables is kept constant. The rate of production is determined from resources in production through a nonlinear function, which expresses a decreasing productivity of additional resources as the company approaches maximum capacity. The sales rate, on the other hand, is determined by the number of customers and by the average sales per customer-year. Customers are mainly recruited through visits by the company's salesmen. The rate of recruitment depends upon the resources allocated to marketing and sales, and again it is assumed that there is a diminishing return to increasing sales activity. Once recruited, customers are assumed to remain with the company for an average period AT, the association time.

A difference between production and sales causes the inventory to change. The company is assumed to respond to such changes by adjusting its resource allocation: when the inventory is lower than desired, resources are redirected from sales to production. A certain minimum of resources is always maintained in both production and sales. In the model, this is secured by means of two limiting factors, which reduce the transfer rate when a resource floor is approached. Finally, the model assumes that there is a feedback from inventory to the customer defection rate. If the inventory of finished products becomes very low, the delivery time is assumed to become unacceptable to many customers. As a consequence, the defection rate is enhanced by a factor 1 + H.

The managerial system described is controlled by two interacting negative feedback loops. Combined with the delays involved in adjusting production and sales, these loops create the potential for oscillatory behaviour. If the transfer of resources is fast enough, this behaviour is destabilized, and the system starts to perform self-sustained oscillations. The amplitude of these oscillations is finally limited by the various nonlinear restrictions in the model, particularly by the reduction in resource transfer rate as the lower limits to resources in production or resources in sales are approached.

A series of abrupt changes in the system behaviour is observed as the competition between the basic growth tendency and the nonlinear limiting factors is shifted. The simple one-cycle attractor corresponding to H = 10 becomes unstable for H = 13, and a new stable attractor with twice the original period arises. If H is increased to 28, the stable attractor attains a period of 4. As H is further increased, the period-doubling bifurcations continue until H = 30, the threshold to chaos, is exceeded. The system then starts to behave in an aperiodic and apparently random manner. Hence, the system reaches chaotic behaviour through a series of period-doubling bifurcations.
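
The period-doubling route to chaos described for the managerial model can be reproduced generically with the one-dimensional logistic map. The sketch below is a standard textbook illustration and is not the Rasmussen and Mosekilde model itself: as the control parameter r is raised, the long-run behaviour changes from a fixed point to period 2, period 4 and finally an aperiodic regime.

def attractor_sample(r, n_transient=500, n_keep=8):
    # iterate the logistic map x -> r*x*(1-x) and report the values it settles on
    x = 0.5
    for _ in range(n_transient):
        x = r * x * (1.0 - x)
    seen = []
    for _ in range(n_keep):
        x = r * x * (1.0 - x)
        seen.append(round(x, 4))
    return sorted(set(seen))

for r in (2.8, 3.2, 3.5, 3.9):   # fixed point, period 2, period 4, chaotic-looking
    print(r, attractor_sample(r))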

Figure 3. Managerial system.

5.1.3 Deterministic queuing model. In this section, we consider an alternative discrete-time deterministic queuing model for studying decision-making at the entity level in supply chains. The model consists of one server and two queuing
lines (X and Y) representing some activity (Feichtinger et al. 1994). The input rates of both queues are constant, and their sum equals the server capacity. In each time period, the server has to decide how much time to spend on each of the two activities.

The following quantities can be defined: α: constant input rate for activity X; β: constant input rate for activity Y; τ_X: time spent on activity X; τ_Y: time spent on activity Y; x_k: queue length of X; y_k: queue length of Y.

The amounts of time τ_X and τ_Y that will be spent on activities X and Y in period k+1 are determined by an adaptive feedback rule depending on the difference of the queue lengths x_k and y_k. The decision rule or policy function states that longer queues are served with higher priority. Two possibilities considered are:

1. All-or-nothing decision: the server decides to spend all its time on the activity corresponding to the longer queue. Hence, φ is a Heaviside function given by

\[ \phi(x - y) = \begin{cases} 1 & \text{if } x \ge y \\ 0 & \text{if } x < y. \end{cases} \qquad (6) \]

2. Mixed solutions: the server decides to spend most of its time on the activity corresponding to the longer queue. For this decision function, an S-shaped logistic function is used, given by

\[ \phi(x - y) = \frac{1}{1 + e^{-k(x - y)}}. \qquad (7) \]

The parameter k tunes the 'steepness' of the S-shape. With these decision functions, the new queue lengths x_{k+1} and y_{k+1} are given by the equations

\[ x_{k+1} = x_k + \alpha - \phi(x_k - y_k), \qquad y_{k+1} = y_k + \beta - \bigl(1 - \phi(x_k - y_k)\bigr). \qquad (8) \]

Figure 4. Deterministic queuing model.

Using the constraints α + β = 1 and τ_X + τ_Y = 1, it is sufficient, in order to study the behaviour of the system, to consider the dynamics of the one-dimensional map

\[ f(x) = x + \alpha - \phi(2x - 2). \qquad (9) \]

For 0 < k < 4 and for all 0 < α < 1, the map f has a globally stable equilibrium. Simulation shows that when the parameter k is not too large, the bifurcation diagrams with respect to α are simple. For larger values of k (e.g. k = 7.3), chaotic behaviour arises after infinitely many period-doubling bifurcations as α is increased from 0.0 to 0.3. However, when α is further increased from 0.3 to 0.5, chaos disappears after many period-halving bifurcations. For 0.5 < α < 1, the bifurcation scenario is qualitatively the same as for 0 < α < 0.5, since the system is symmetric with respect to α = 0.5 and x = 1. Physically, when α is close to 0, there is a stable equilibrium, meaning that in the long run the server spends a fixed proportion of time in each period on each of the two activities, with most of the time spent on activity Y, which has the higher input rate. For α close to 1, we have the same behaviour, with the activities X and Y interchanged. For α close to 0.5, i.e. when the input rates of the two activities are almost equal, the equilibrium is unstable, and there is a stable period-2 orbit. This means that in one period most of the time is spent on activity X, in the next period most of the time is spent on activity Y, then again on activity X, and so on. Chaotic behaviour arises when α is somewhere between 0 and 0.5 or between 0.5 and 1, for example at α = 1/3 or α = 2/3. Hence, a steep decision function, together with a situation where the input rate of one activity is around twice the input rate of the other, leads to irregular queue lengths.
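
The behaviour of equation (9) is easy to probe numerically. The sketch below is ours (the initial condition and parameter values are illustrative); it iterates the map with the logistic decision function of equation (7) and prints the long-run pattern of the queue variable for a few input rates.

import math

def phi(u, k):
    # S-shaped decision function of equation (7)
    return 1.0 / (1.0 + math.exp(-k * u))

def iterate_map(alpha, k, n_transient=1000, n_keep=6):
    # iterate f(x) = x + alpha - phi(2x - 2) from equation (9)
    x = 1.0
    for _ in range(n_transient):
        x = x + alpha - phi(2.0 * x - 2.0, k)
    tail = []
    for _ in range(n_keep):
        x = x + alpha - phi(2.0 * x - 2.0, k)
        tail.append(round(x, 3))
    return tail

for alpha in (0.10, 0.50, 1.0 / 3.0):   # low input rate, balanced rates, and the 'twice the rate' case
    print(f"alpha={alpha:.3f} -> {iterate_map(alpha, k=7.3)}")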

As k → ∞, the decision function φ converges to the Heaviside function. The dynamical behaviour of the queuing model in that case is equivalent to a rigid rotation on a circle. For a rational input rate α = p/q, every point x is periodic with period q: out of every q time periods, p are spent completely on the first activity, while the remaining q − p time periods are spent on the other activity. On the other hand, when α is irrational, the dynamical behaviour is quasi-periodic, and every point x is aperiodic.

5.2 Dynamical models of resource allocation: Computational ecosystems

Because of limited resources, resource sharing and allocation is a fundamental problem in any supply chain. The manner in which resources are shared and utilized has a significant impact on the performance of a supply chain. It also dictates how cooperation/competition arises and is sustained in a supply chain. Resources can be of various types: physical resources, manpower, information and monetary resources. With the IT architectures being developed to realize supply chains, sharing of computational resources (like CPU, memory, bandwidth, databases, etc.) is also becoming a critical issue. It is through resource sharing that interdependencies arise between different entities. This leads to a complex web of interactions in supply chains, just like in a food web or an ecology. As a result, such systems can be referred to as 'Computational Ecosystems' (Hogg and Huberman 1988), in analogy with biological ecosystems.

'Computational Ecosystems' is a generic model of the dynamics of resource allocation among agents trying to solve a problem collectively. The model captures the following features: distributed control, asynchrony in execution, resource contention and cooperation among agents, and the concomitant problems of incomplete
knowledge and delayed information. The behaviour of each agent is modelled using a payoff function whose nature determines whether an agent is cooperative or competitive. The agent here can be any entity in a supply chain, such as a distributor or retailer, or a software agent in an e-commerce scenario. The state of the system is represented as the average number of entities using the different resources and follows a delay differential equation under a mean-field approximation. The resources can be physical or computational, as discussed before. For example, in the case of two resources with n identical agents, the rate of change of occupation of a resource is given by:

\[ \frac{d\langle n_1(t)\rangle}{dt} = \alpha\bigl( n\langle\rho\rangle - \langle n_1(t)\rangle \bigr), \qquad (10) \]

where ⟨n_1(t)⟩ is the expected number of agents using resource 1 at a given instant of time t; α is the expected number of choices made by an agent per unit time; ρ is a random variable indicating that resource 1 will be perceived to have a higher payoff than resource 2; and ⟨ρ⟩ is its expected value.
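
A minimal numerical sketch of equation (10) follows. The payoff functions, the treatment of the information delay and every parameter value are assumptions made for illustration only (the text does not specify them): agents re-evaluate at rate α, see the occupation of resource 1 delayed by τ, and prefer resource 1 with probability ρ computed from noisy payoffs with uncertainty σ. Varying τ and σ makes the transition between stable behaviour and oscillations discussed below easy to explore.

import math

n, alpha = 100.0, 1.0            # number of agents and re-evaluation rate (assumed)
tau, sigma = 10.0, 0.5           # information delay and payoff uncertainty (assumed)
dt, steps = 0.05, 8000

def rho(n1_delayed):
    # assumed payoffs: each resource pays more when it is less crowded;
    # rho = P(perceived payoff of resource 1 > resource 2) under Gaussian noise sigma
    g1 = 4.0 - 3.0 * (n1_delayed / n)
    g2 = 4.0 - 3.0 * (1.0 - n1_delayed / n)
    z = (g1 - g2) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

history = [0.7 * n] * int(tau / dt)      # constant pre-history for the delayed state
n1 = 0.7 * n
for step in range(steps):
    n1_delayed = history[0]
    n1 = n1 + dt * alpha * (n * rho(n1_delayed) - n1)   # Euler step of equation (10)
    history = history[1:] + [n1]
    if step % 1000 == 0:
        print(f"t={step * dt:6.1f}  <n1>={n1:6.2f}")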

Figure 5. Computational ecosystems (τ: time delay; σ: standard deviation of ρ).

The global performance of the ecosystem can be obtained from the above equation. Under different conditions of delay, uncertainty and cooperation/competition, the system shows a rich panoply of behaviours, ranging from stable and sustained oscillations to intermittent chaos and finally to fully developed chaos. Furthermore, the following generic deductions can be made from this model (Kephart et al. 1989): while information delay has an adverse impact on system performance, uncertainty has a profound effect on the stability of the system. One can deliberately increase uncertainty in the agents' evaluation of the merits of choices to make the system stable, but at the expense of performance degradation. A second possibility is a very slow re-evaluation rate of the agents, which makes them non-adaptive. Heterogeneity in
the nature of agents can lead to more stability in the system compared with a homogeneous case, but the system loses its ability to cope with unexpected changes such as new task requirements. On the other hand, poor performance can be traced to the fact that non-predictive agents do not take into account the information delay.

If the agents are able to make accurate predictions of the current state, the information delay could be overcome, and the system would perform well. This results in a 'co-evolutionary' system in which all of the individuals are simultaneously trying to adapt to one another. In such a situation, agents can act like technical analysts or system analysts (Kephart et al. 1990). Agents as technical analysts (like those in market behaviour) use either linear extrapolation or cyclic trend analysis to estimate the current state of the system. Agents as system analysts, on the other hand, have knowledge about both the individual characteristics of the other agents in the system and how those characteristics are related to the overall system dynamics. Technical analysts are responsive to the behaviour of the system but suffer from an inability to take into account the strategies of other agents. Moreover, a good predictive strategy for a single agent may be disastrous if applied on a global scale. System analysts perform extremely well when they have very accurate information about the other agents in the system but can perform very poorly when their information is even slightly inaccurate. They take into account the strategies of other agents but pay no heed to the actual behaviour of the system. This suggests combining the strengths of both methods to form a hybrid adaptive system analyst, which modifies its assumptions about other agents in response to feedback about the success of its own predictions. The resultant hybrid is able to perform well.

In order to avoid chaos while maintaining high performance and adaptability to unforeseen changes, more sophisticated techniques are required. One such technique is a reward mechanism (Hogg and Huberman 1991), whereby the relative number of computational agents following effective strategies is increased at the expense of the others. This procedure, which generates the right population diversity out of an essentially homogeneous one, is able to control chaos through a series of bifurcations into a stable fixed point.

In the above description, each agent chooses among different resources according to its perceived payoff, which depends on the number of agents already using it. Even the agent with predictive ability is myopic, as it considers only its current estimate of the system state, without regard to the future. Expectations come into play if agents use past and present global behaviour in estimating the expected future payoff for each resource. A dynamical model of collective action that includes expectations can be found in Glance (1993).

6. Models from observed data

One of the central problems in a supply chain, closely related to modelling, is that of demand forecasting: given the past, how can we predict the future demand? The classic approach to forecasting is to build an explanatory model from first principles and measure the initial conditions. Unfortunately, this has not been possible in systems like supply chains, for two reasons. First, we still lack the general
'first principles' for demand variation in supply chains, which are necessary to make good models. Second, due to the distributed nature of supply chains, the initial data or conditions are often difficult to obtain.

Because of these factors, the modern theory of forecasting used in supply chains views a time series x(t) as a realization of a random process. This is appropriate when effective randomness arises from complicated motion involving many independent, irreducible degrees of freedom. An alternative cause of randomness is chaos, which can occur even in very simple deterministic systems, as we discussed in the earlier sections. While chaos places a fundamental limit on long-term prediction, it suggests possibilities for short-term prediction. Random-looking data may contain only a few irreducible degrees of freedom. Time traces of the state variables of such chaotic systems display behaviour that is intermediate between regular periodic or quasi-periodic motion and unpredictable, truly stochastic behaviour. Chaos has long been seen as a form of 'noise' because the tools for its analysis were couched in a language tuned to linear processes. The main such tool is Fourier analysis, which is precisely designed to extract the composition of sines and cosines found in an observation x(t). Similarly, standard linear modelling and prediction techniques, such as autoregressive moving average (ARMA) models, are not suitable for nonlinear systems.

With the advances in IT and the science of complexity, both of these challenges for forecasting can be revisited. Large-scale simulation and micro-autonomy (section 2) enable tracking of the detailed interactions between different entities in a supply chain. The large volumes of data thus generated can be used to understand demand patterns in particular and to comprehend the emergence of other characteristics in general. Even though an exact prediction of future behaviour is difficult, archetypal behaviour patterns can often be recognized from these data. Techniques from complexity theory like nonlinear time series analysis and computational mechanics are appropriate for this purpose.

6.1 Nonlinear time-series analysis

The need to extract interesting physical information about the dynamics of observed systems when they are operating in a chaotic regime has led to the development of nonlinear time series analysis techniques. Systematically, the study of potentially chaotic systems may be divided into three areas: identification of chaotic behaviour, modelling and prediction, and control. The first area shows how chaotic systems may be separated from stochastic ones and, at the same time, provides estimates of the degrees of freedom and the complexity of the underlying chaotic system. Based on such results, identification of a state-space representation allowing for subsequent predictions may be carried out. The last stage, if desirable, involves control of the chaotic system.

Given the observed behaviour of a dynamical system as a one-dimensional timeseries x(n), we want to build models for prediction. The most important task in thisprocess is phase space reconstruction, which involves building topologically andgeometrically equivalent attractor. In general, steps in nonlinear time series analysiscan be summarized as (Abarbanel 1996):

. Signal separation (finding the signal): Separation of a broadband signal frombroadband ‘noise’ using deterministic nature of signal.


• Phase space reconstruction (finding the space): Using the method of delays, one can construct a series of vectors which is diffeomorphically equivalent to the attractor of the original dynamical system and, at the same time, distinguish it from a stochastic one. The basis for this is Takens' embedding theorem (Takens 1981). Time-lagged variables are used to construct vectors for a phase space of dimension dE:

y(n) = [x(n), x(n + T), ..., x(n + (dE − 1)T)].   (11)

The time lag T can be determined using mutual information (Fraser and Swinney 1983) and dE using a false nearest-neighbours test (Kennel et al. 1992). (A short illustrative sketch of this step follows this list.)

• Classification of the signal: System identification in nonlinear chaotic systems means establishing a set of invariants for each system of interest and then comparing observations with that library of invariants. The invariants are properties of the attractor and are independent of any particular trajectory on the attractor. Invariants can be divided into two classes: fractal dimensions (Farmer et al. 1983) and Lyapunov exponents (Sano and Sawada 1985). Fractal dimensions characterize the geometrical complexity of the dynamics, i.e. how the sample of points along a system orbit is distributed spatially. Lyapunov exponents, on the other hand, describe the dynamical complexity, i.e. the 'stretching and folding' in the dynamical process.

• Making models and prediction: This step involves determination of the parameters aj of the assumed model of the dynamics,

y(n) → y(n + 1),   y(n + 1) = F(y(n), a1, a2, ..., ap),   (12)

which is consistent with the invariant classifiers (Lyapunov exponents, dimensions). The functional forms F(·) often used include polynomials, radial basis functions, etc. The local false nearest neighbours test (Abarbanel and Kennel 1993) is used to determine how many dimensions are locally required to describe the dynamics generating the time series, without knowing the equations of motion, and hence gives the dimension for the assumed model. The methods for building nonlinear models are classified as global and local (Farmer and Sidorowich 1987, Casdagli 1989). By definition, local methods vary from point to point in the phase space, while global models are constructed once and for all over the whole phase space. Models based on machine-learning techniques such as radial basis functions or neural networks (Powell 1987) and support vector machines (Mukherjee et al. 1997) carry features of both: they are usually used as global functional forms, but they clearly display localized behaviour too. (See the prediction sketch after this list.)
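As an illustration of the embedding and prediction steps above, the following minimal Python sketch builds the delay vectors of equation (11) and makes a one-step local (nearest-neighbour) prediction of the kind local models use. The series, lag, dimension and neighbourhood size are illustrative choices, not values from the text.

```python
import numpy as np

def delay_embed(x, dE, T):
    """Delay vectors y(n) = [x(n), x(n+T), ..., x(n+(dE-1)T)] (equation (11))."""
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (dE - 1) * T          # number of complete delay vectors
    if n_vec <= 0:
        raise ValueError("series too short for this (dE, T)")
    return np.column_stack([x[i * T: i * T + n_vec] for i in range(dE)])

def local_predict(Y, x, T, k=10, horizon=1):
    """Predict the value 'horizon' steps beyond the last delay vector by
    averaging the observed futures of its k nearest neighbours (a simple
    local model; a global model would instead fit one function F everywhere)."""
    dE = Y.shape[1]
    usable = len(Y) - horizon              # only neighbours whose future is known
    dist = np.linalg.norm(Y[:usable] - Y[-1], axis=1)
    nbrs = np.argsort(dist)[:k]
    return x[nbrs + (dE - 1) * T + horizon].mean()

# Example on a noisy periodic signal (illustrative parameters).
x = np.sin(0.1 * np.arange(2000)) + 0.01 * np.random.randn(2000)
Y = delay_embed(x, dE=3, T=5)
print(Y.shape, local_predict(Y, x, T=5))
```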

The techniques from nonlinear time series analysis are well suited for modelling the nonlinearities in supply chains. For an application of nonlinear time series analysis in supply chains, the reader is referred to Lee et al. (2002). Using this, one can deduce that the time series is deterministic, so it should be possible in principle to build predictive models. The invariants can be used to effectively characterize the complex behaviour. For example, the largest Lyapunov exponent gives an indication of how far into the future reliable predictions can be made, while the fractal dimensions give an indication of how complex a model should be chosen to represent the data. These models then provide the basis for systematically developing control strategies. It should be noted that the functional forms used for modelling in the fourth step above are continuous in their arguments. This approach builds models viewing a dynamical system as obeying the laws of physics. From another perspective, a dynamical system can be considered as processing information. So, an alternative class of discrete 'computational' models inspired by the theory of automata and formal languages can also be used for modelling the dynamics (Lind and Marcus 1996). 'Computational mechanics' considers this viewpoint and describes the system behaviour in terms of its intrinsic computational architecture, i.e. how it stores and processes information.

6.2 Computational mechanics

Computational mechanics is a method for inferring the causal structure of stochastic processes from empirical data or arbitrary probabilistic representations. It combines ideas and techniques from nonlinear dynamics, information theory and automata theory, and is, as it were, an 'inverse' to statistical mechanics. Instead of starting with a microscopic description of particles and their interactions and deriving macroscopic phenomena, it starts with observed macroscopic data and infers the simplest causal structure, the 'ε-machine', capable of generating the observations. The ε-machine in turn describes the system's intrinsic computation, i.e. how it stores and processes information. This is developed using the statistical mechanics of orbit ensembles, rather than focusing on the computational complexity of individual orbits. By not requiring a Hamiltonian, computational mechanics can be applied in a wide range of contexts, including those where an energy function for the system may not be manifest, as for supply chains. Notions of complexity, emergence and self-organization have also been formalized and quantified in terms of various information measures (Shalizi 2005).

Given a time series, the (unknowable) exact states of an observed system are translated into a sequence of symbols via a measurement channel (Crutchfield 1992). Two histories (i.e. two series of past data) carry equivalent information if they lead to the same (conditional) probability distribution over futures (i.e. if it makes no difference whether one or the other data series is observed). Under these circumstances, i.e. the effects of the two series being indistinguishable, they can be lumped together. This procedure identifies the causal states, together with the structure of connections or succession among them, and creates what is known as an 'ε-machine'. The ε-machines form a special class of deterministic finite state automata (DFSA) with transitions labelled with conditional probabilities, and hence can also be viewed as Markov chains. However, finite-memory machines like ε-machines may fail to admit a finite-size model, implying that the number of causal states could turn out to be infinite. In this case, a more powerful model than a DFSA needs to be used. One proceeds by trying the next most powerful model class in the hierarchy of machines known as the causal hierarchy (Crutchfield 1994), in analogy with the Chomsky hierarchy of formal languages. While 'ε-machine reconstruction' refers to the process of constructing the machine given an assumed model class, 'hierarchical machine reconstruction' describes a process of innovation to create a new model class. It detects regularities in a series of increasingly accurate models; the inductive jump to a higher computational level occurs by taking those regularities as the new representation.

ε-machines reflect a balanced utilization of deterministic and random information processing, and this balance is discovered automatically during ε-machine reconstruction. These machines are unique and optimal in the sense that they have maximal predictive power and minimum model size (hence satisfying Occam's razor: causes should not be multiplied beyond necessity). ε-machines provide a minimal description of the pattern or regularities in a system, in the sense that the pattern is the algebraic structure determined by the causal states and their transitions. ε-machines are also minimally stochastic. Hence, computational mechanics acts as a method for automatic pattern discovery.

An ε-machine is the organization of the process, or at least of the part of it which is relevant to our measurements. The ε-machine that models the observed time series from a system can be used to define and calculate macroscopic or global properties that reflect the characteristic average information-processing capabilities of the system. These include the entropy rate, the excess entropy and the statistical complexity (Feldman and Crutchfield 1998, Crutchfield and Feldman 2001). The entropy rate indicates how predictable the system is. The excess entropy, on the other hand, provides a measure of the apparent memory stored in a spatial configuration and represents how hard the system is to predict. ε-machine reconstruction leads to a natural measure of the statistical complexity of a process, namely the amount of information needed to specify the state of the ε-machine, i.e. the Shannon entropy of its causal-state distribution. Statistical complexity is distinct from, and dual to, information-theoretic entropies and dimension (Crutchfield and Young 1989). The existence of chaos shows that there is a rich variety of unpredictability spanning the two extremes of periodic and random behaviour. Behaviour between these two extremes, while of intermediate information content, is more complex in that its most concise description (model) is an amalgam of regular and stochastic processes. An information-theoretic description of this spectrum in terms of dynamical entropies measures the raw diversity of temporal patterns. The dynamical entropies, however, do not directly measure the computational effort required in modelling the complex behaviour, which is what statistical complexity captures.
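The grouping of histories into causal states described above can be illustrated with a toy sketch. The following Python fragment merges length-L histories of a symbolic series whenever their empirical next-symbol distributions agree within a tolerance; it is only a caricature of the equivalence-class construction, not a full ε-machine reconstruction algorithm, and all names and parameters are illustrative.

```python
from collections import Counter, defaultdict

def causal_state_sketch(symbols, L=3, tol=0.05):
    """Toy illustration of the causal-state idea: two length-L histories are
    merged when their empirical next-symbol distributions are (nearly) equal.
    This sketches only the grouping step, not a complete reconstruction."""
    futures = defaultdict(Counter)
    for i in range(len(symbols) - L):
        hist = tuple(symbols[i:i + L])
        futures[hist][symbols[i + L]] += 1

    def dist(counter):
        total = sum(counter.values())
        return {s: c / total for s, c in counter.items()}

    states = []   # each state: (representative distribution, member histories)
    for hist, counts in futures.items():
        p = dist(counts)
        for rep, members in states:
            if all(abs(p.get(s, 0.0) - rep.get(s, 0.0)) < tol for s in set(p) | set(rep)):
                members.append(hist)
                break
        else:
            states.append((p, [hist]))
    return states

# Example: a period-2 process; histories ending in 0 and in 1 form two states.
series = [0, 1] * 500
for p, members in causal_state_sketch(series, L=2):
    print(p, members)
```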
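For a process whose ε-machine is known, these quantities are straightforward to evaluate. The sketch below assumes a simple two-state machine (chosen here purely for illustration) and computes the statistical complexity as the Shannon entropy of the causal-state distribution and the entropy rate as the state-averaged transition uncertainty.

```python
import numpy as np

def shannon(p):
    """Shannon entropy (bits) of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# A toy two-state machine (assumed for illustration):
# from state A: emit 0 and go to B with prob 0.5, emit 1 and stay in A with prob 0.5
# from state B: emit 1 and go to A with prob 1.0
pi = np.array([2/3, 1/3])                              # stationary state distribution
transition_entropy = np.array([shannon([0.5, 0.5]),    # uncertainty leaving A
                               shannon([1.0])])        # uncertainty leaving B

statistical_complexity = shannon(pi)                   # bits needed to specify the state
entropy_rate = float((pi * transition_entropy).sum())  # per-symbol unpredictability

print(statistical_complexity, entropy_rate)            # ~0.918 bits, ~0.667 bits/symbol
```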

Computational mechanics sets limits on how well processes can be predicted and shows how, at least in principle, those limits can be attained: ε-machines are what any prediction method would build, if only it could. Similar to ε-machine reconstruction, techniques exist which can be used to discover the causal architecture of memoryless transducers, transducers with memory and spatially extended systems (Shalizi and Crutchfield 2001). Computational mechanics can be used for modelling and prediction in supply chains in the following ways:

• In systems like supply chains, it is difficult to define analogues of various thermodynamic quantities like energy, temperature and pressure, as we do for physical systems. Each component in the network has cognition, which is absent in physical systems such as the molecules of a gas. Because of such difficulties, statistical mechanics cannot be applied directly to build prediction models for supply chains. As discussed previously, by not requiring a Hamiltonian (the energy-like function), computational mechanics is still applicable in the case of supply chains.


• ε-machines can be built to discover patterns in the behaviour of various quantities in supply chains, such as inventory levels and demand fluctuations.

• ε-machines can be used for prediction through a process known as 'synchronization' (Crutchfield and Feldman 2003).

• ε-machines can be used to calculate global properties like the entropy rate, excess entropy and statistical complexity, which reflect how the system stores and processes information. The significance of these quantities was discussed earlier.

• We can also quantify notions of complexity, emergence and self-organization in terms of various information measures derived from ε-machines. By evaluating such quantities, we can compare the complexity of different supply chains and quantify the extent to which a network shows emergence. We can also infer when a supply chain is undergoing self-organization and to what extent. Such quantification can help us compare precisely which policies or cognitive capabilities possessed by individual agents lead to different degrees of emergence and self-organization. Hence, we can decide to what extent we wish to enforce control and to what extent we want to let the network emerge.

7. Network dynamics

The ubiquity of networks in the social, biological and physical sciences and in technology leads naturally to an important set of common problems, which are currently being studied under the rubric of 'network dynamics' (Strogatz 2001). Structure always affects function, and it is important to consider dynamical and structural complexity together in the study of networks. For instance, the topology of social networks affects the spread of information and disease, and the topology of the power grid affects the robustness and stability of power transmission. The different problem areas in network dynamics are discussed below.

One area of research in this field has been concerned primarily with the dynamical complexity of regular networks, without regard to other network topologies. While the collective behaviour depends on the details of the network, some generalizations can still be drawn (Strogatz 2001). For instance, if the dynamical system at each node has stable fixed points and no other attractors, the network tends to lock into a static fixed pattern. If the nodes have competing interactions, the network may display an enormous number of locally stable equilibria. In the intermediate case where each node has a stable limit cycle, synchronization and patterns like travelling waves can be observed. For non-identical oscillators, a temporal analogue of a phase transition can be seen, with the coupling coefficient as the control parameter. At the opposite extreme, if each node has an identical chaotic attractor, the nodes of the network can synchronize their erratic fluctuations. For a wide range of network topologies, synchronized chaos requires that the coupling be neither too weak nor too strong; otherwise, spatial instabilities are triggered. Related lines of research that address networks of identical chaotic maps are coupled map lattices (Kaneko and Tsuda 2000) and cellular automata (Wolfram 1994). However, these systems have been used mainly as testbeds for exploring spatio-temporal chaos and pattern formation in the simplest mathematical settings, rather than as models of real systems.
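A coupled map lattice of the kind referred to above can be simulated in a few lines. The following sketch couples logistic maps diffusively on a ring; the parameter values are chosen only for illustration and are not taken from the text.

```python
import numpy as np

def coupled_map_lattice(n=100, steps=500, eps=0.3, r=3.9, seed=0):
    """Ring of n logistic maps x -> r*x*(1-x) with diffusive coupling eps;
    returns the full space-time field (steps x n array)."""
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    f = lambda x: r * x * (1.0 - x)
    history = np.empty((steps, n))
    for t in range(steps):
        fx = f(x)
        # each site mixes its own map with the average of its two neighbours
        x = (1 - eps) * fx + 0.5 * eps * (np.roll(fx, 1) + np.roll(fx, -1))
        history[t] = x
    return history

field = coupled_map_lattice()
print(field.shape, field[-1, :5])
```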

The second area in network dynamics is concerned with characterizing the network structure. Network topologies can vary from completely regular, like chains, grids, lattices and fully connected graphs, to completely random. Moreover, graphs can be directed or undirected and cyclic or acyclic. In order to characterize the topological properties of graphs, various statistical quantities have been defined. The most important of them include the average path length, the clustering coefficient, the degree distribution, the size of the giant component and various spectral properties. A review of the main models and analytical tools, covering regular graphs, random graphs, generalized random graphs, small-world and scale-free networks, as well as the interplay between topology and a network's robustness against failures and attacks, can be found in Albert (2000b), Albert and Barabasi (2002), Albert et al. (2002), Callaway et al. (2000) and Dorogovtsev and Mendes (2002).

Classic random graphs were introduced by Erdos and Renyi (Bollobas 1985) and have been the most thoroughly studied models of networks. Such graphs have a Poisson degree distribution and statistically uncorrelated vertices. At large N (the total number of nodes in the graph) and large enough p (the probability that two arbitrary vertices are connected), a giant connected component appears in the network, a process known as percolation. Random graphs exhibit a low average path length and a low clustering coefficient. Regular networks, on the other hand, show a high clustering coefficient and also a greater average path length compared with random graphs of similar size. The networks found in the real world, however, are neither completely regular nor completely random. Instead, we see 'small-world' and 'scale-free' characteristics in many real networks like social networks, the Internet, the WWW, power grids, collaboration networks, and ecological and metabolic networks, to name a few.

In order to describe the transition from a regular network to a random network, Watts and Strogatz introduced the so-called small-world graphs as models of social networks (Watts and Strogatz 1998, Newman 2000). This model exhibits a high degree of clustering, as in a regular network, and a small average distance between vertices, as in the classic random graphs. A feature this model shares with the random graph model is that the connectivity distribution of the network peaks at an average value and decays exponentially. Such an exponential network is homogeneous in nature: each node has roughly the same number of connections. Because of the high degree of clustering, models of dynamical systems with small-world coupling display an enhanced signal-propagation speed, rapid disease propagation and synchronizability (Watts and Strogatz 1998, Newman 2002).

Another significant recent discovery in the field of complex networks is that the connectivity distributions of a number of large-scale and complex networks, including the WWW, the Internet and metabolic networks, satisfy a power law P(k) ~ k^(−γ), where P(k) is the probability that a node in the network is connected to k other nodes and γ is a positive real number (Albert et al. 2000a, Barabasi et al. 2000, Barabasi 2001). Since power laws are free of a characteristic scale, networks that satisfy them are called 'scale-free'. A scale-free network is inhomogeneous in nature: most nodes have few connections, and a small but statistically significant number of nodes have many connections. The average path length is smaller in a scale-free network than in a random graph, indicating that the heterogeneous scale-free topology is more efficient in bringing nodes close together than the homogeneous topology of random graphs. The clustering coefficient of a scale-free network is about five times higher than that of a random graph, and this factor slowly increases with the number of nodes. It has been shown that it is practically impossible to achieve synchronization in a nearest-neighbour coupled network (regular connectivity) if the network is sufficiently large. However, it is quite easy to achieve synchronization in a scale-free dynamical network no matter how large the network is (Wang and Chen 2002). Moreover, the synchronizability of a scale-free dynamical network is robust against the random removal of nodes but is fragile to the targeted removal of the most highly connected nodes.
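The contrast between these three network classes is easy to reproduce numerically. The sketch below uses the networkx generators for Erdos-Renyi, Watts-Strogatz and Barabasi-Albert graphs and compares their average path length, clustering coefficient and maximum degree; the network size and average degree are illustrative.

```python
import networkx as nx

N, k = 1000, 6   # network size and target average degree (illustrative values)

graphs = {
    "random (Erdos-Renyi)":         nx.gnp_random_graph(N, k / (N - 1), seed=1),
    "small-world (Watts-Strogatz)": nx.watts_strogatz_graph(N, k, p=0.1, seed=1),
    "scale-free (Barabasi-Albert)": nx.barabasi_albert_graph(N, k // 2, seed=1),
}

for name, G in graphs.items():
    # measure path length on the largest connected component
    giant = G.subgraph(max(nx.connected_components(G), key=len))
    print(name,
          "L =", round(nx.average_shortest_path_length(giant), 2),
          "C =", round(nx.average_clustering(G), 3),
          "k_max =", max(d for _, d in G.degree()))
```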

The scale-free property and a high degree of clustering (the small-world effect) are not mutually exclusive: a large number of real networks display both. Yet most models proposed to describe the topology of complex networks have difficulty capturing these two features simultaneously. It has been shown in Ravasz and Barabasi (2003) that the two features are a consequence of a hierarchical organization present in these networks. This argument also agrees with that proposed by Herbert Simon (Simon 1997), who argues:

... we could expect complex systems to be hierarchies in a world in which complexity has to evolve from simplicity. In their dynamics, hierarchies have a property, near decomposability, that greatly simplifies their behaviour. Near decomposability also simplifies the description of complex systems and makes it easier to understand how the information needed for the development of the system can be stored in reasonable compass.

Indeed, many networks are fundamentally modular: one can easily identify groups of nodes that are highly interconnected with each other but have few or no links to nodes outside the group to which they belong. This clearly identifiable modular organization is at the origin of the high clustering coefficient. On the other hand, these modules can be organized in a hierarchical fashion into increasingly large groups, giving rise to 'hierarchical networks', while still maintaining the scale-free topology. Thus, modularity, scale-free character and a high degree of clustering can be achieved under a common roof. Moreover, in hierarchical networks the degree of clustering characterizing the different groups follows a strict scaling law, which can be used to identify the presence of hierarchical structure in real networks.

The mathematical theory of graphs with arbitrary degree distributions, known as 'generalized random graphs', can be found in Newman et al. (2001) and Newman (2003). Using the 'generating function formulation', the authors have been able to solve the percolation problem (i.e. have found conditions predicting the appearance of a giant component) and have obtained formulae for calculating the clustering coefficient and average path length of generalized random graphs. The authors have proposed and studied models of the propagation of diseases, failures, fads and synchronization on such graphs, and have extended their results to bipartite and directed graphs.
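Generalized random graphs with a prescribed degree sequence can be generated with the configuration model, and the appearance of a giant component can be checked against the standard Molloy-Reed condition, <k^2> − 2<k> > 0. The sketch below draws an illustrative heavy-tailed degree sequence and performs this check; the exponent and cutoff are assumptions, not values from the text.

```python
import networkx as nx
import numpy as np

# Draw a heavy-tailed (power-law-like) degree sequence for illustration.
rng = np.random.default_rng(2)
degrees = np.clip(rng.zipf(2.5, size=2000), 1, 100)
if degrees.sum() % 2:            # the configuration model needs an even degree sum
    degrees[0] += 1

G = nx.configuration_model(degrees.tolist(), seed=2)
G = nx.Graph(G)                  # collapse multi-edges
G.remove_edges_from(nx.selfloop_edges(G))

# Molloy-Reed criterion: a giant component is expected when <k^2> - 2<k> > 0.
k = np.array([d for _, d in G.degree()])
print("Molloy-Reed value:", (k ** 2).mean() - 2 * k.mean())
print("giant component:", len(max(nx.connected_components(G), key=len)),
      "of", G.number_of_nodes(), "nodes")
```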

Network dynamics, though in its infancy, promises a formal framework to characterize the organizational and functional aspects of supply chains (Thadakamalla et al. 2004). With the changing trends in supply chains, many new issues have become critical: organizational resistance to change, inter-functional or inter-organizational conflicts, relationship management, and consumer and market behaviour. Such problems are ill structured and behavioural, and cannot easily be addressed by analytical tools such as mathematical programming. Successful supply-chain integration depends on the supply-chain partners' ability to synchronize and share real-time information. The establishment of collaborative relationships among supply-chain partners is a prerequisite to information sharing. As a result, successful supply-chain management relies on systematically studying questions like:

1. What are the robust architectures for collaboration, and what are the coordination strategies that lead to such architectures?

2. If different entities make decisions on whether or not to cooperate on the basis of imperfect information about the group activity, and incorporate expectations of how their decisions will affect other entities, can overall cooperation be sustained for long periods of time?

3. How do expectations, group size and diversity affect coordination and cooperation?

4. Which kinds of organizations are best able to sustain ongoing collective action, and how might such organizations evolve over time?

Network dynamics addresses many such questions and should be explored in the context of supply chains.

8. Conclusions and future work

The idea of managing the whole supply chain and transforming it into a highly autonomous, dynamic, agile, adaptive and reconfigurable network certainly provides an appealing vision for managers. The infrastructure provided by information technology has made this vision partially realizable. But the inherent complexity of supply chains makes the efficient utilization of information technology an elusive endeavour. Tackling this complexity has been beyond the existing tools and techniques, and requires their revival and extension.

As a result, we emphasized in this paper that, in order to effectively understand a supply-chain network, it should be treated as a CAS. We laid down some initial ideas for extending the modelling and analysis of supply chains using the concepts, tools and techniques arising in the study of CAS. As future work, we need to verify the feasibility and usefulness of the proposed techniques in the context of large-scale supply chains.

Acknowledgements

The authors wish to acknowledge DARPA (Grant No. MDA972-1-1-0038 under the UltraLog Programme) for their generous support of this research. In addition, the partial support provided by NSF (Grant No. DMII-0075584) to Professor Kumara is greatly appreciated. The authors wish to thank the anonymous reviewers for their comments and valuable suggestions.


References

Abarbanel, H.D.I., The Analysis of Observed Chaotic Data, 1996 (Springer: New York).

Abarbanel, H.D.I. and Kennel, M.B., Local false nearest neighbors and dynamical dimensions from observed chaotic data. Phys. Rev. E, 1993, 47, 3057–3068.

Adami, C., Introduction to Artificial Life, 1998 (Springer: New York).

Albert, R. and Barabasi, A.L., Statistical mechanics of complex networks. Rev. Mod. Phys., 2002, 74, 47.

Albert, R., Barabasi, A.L., Jeong, H. and Bianconi, G., Power-law distribution of the World Wide Web. Science, 2000, 287, 2115.

Albert, R., Jeong, H. and Barabasi, A.L., Error and attack tolerance of complex networks. Nature, 2000, 406, 378–382.

Balakrishnan, A., Kumara, S. and Sundaresan, S., Exploiting information technologies for product realization. Inform. Syst. Front. J. Res. Innov., 1999, 1(1), 25–50.

Barabasi, A.L., The physics of the Web. Phys. World, July 2001.

Barabasi, A.L., Albert, R. and Jeong, H., Scale-free characteristics of random networks: the topology of the World Wide Web. Physica A, 2000, 281, 69–77.

Baranger, M., Chaos, complexity, and entropy: a physics talk for non-physicists. Available online at: http://necsi.org/projects/baranger/cce.pdf (accessed May 2005).

Bar-Yam, Y., Dynamics of Complex Systems, 1997 (Addison-Wesley: Reading, MA).

Bollobas, B., Random Graphs, 1985 (Academic Press: London).

Callaway, D.S., Newman, M.E.J., Strogatz, S.H. and Watts, D.J., Network robustness and fragility: percolation on random graphs. Phys. Rev. Lett., 2000, 85, 5468–5471.

Carlson, J.M. and Doyle, J., Highly optimised tolerance: a mechanism for power laws in designed systems. Phys. Rev. E, 1999, 60(2), 1412–1427.

Casdagli, M., Nonlinear prediction of chaotic time series. Physica D, 1989, 35, 335–356.

Choi, T.Y., Dooley, K.J. and Rungtusanatham, M., Supply networks and complex adaptive systems: control versus emergence. J. Operat. Manage., 2001, 19(3), 351–366.

Cooper, M.C., Lambert, D.M. and Pagh, J.D., Supply chain management: more than a new name for logistics. Int. J. Logist. Manage., 1997, 8(1), 1–13.

Crutchfield, J.P., Knowledge and meaning ... chaos and complexity. In Modeling Complex Systems, edited by L. Lam and H.C. Morris, pp. 66–101, 1992 (Springer: Berlin).

Crutchfield, J.P., The calculi of emergence: computation, dynamics and induction. Physica D, 1994, 75, 11–54.

Crutchfield, J.P. and Young, K., Inferring statistical complexity. Phys. Rev. Lett., 1989, 63, 105–108.

Crutchfield, J.P. and Feldman, D.P., Synchronizing to the environment: information theoretic constraints on agent learning. Adv. Complex Syst., 2001, 4, 251–264.

Crutchfield, J.P. and Feldman, D.P., Regularities unseen, randomness observed: levels of entropy convergence. Chaos, 2003, 13, 25–54.

Csete, M.E. and Doyle, J., Reverse engineering of biological complexity. Science, 2002, 295, 1664.

Dorogovtsev, S.N. and Mendes, J.F.F., Evolution of networks. Adv. Phys., 2002, 51, 1079–1187.

Erramilli, A. and Forys, L.J., Oscillations and chaos in a flow model of a switching system. IEEE J. Select. Areas Commun., 1991, 9(2), 171–178.

Farmer, J.D., Ott, E. and Yorke, J.A., The dimension of chaotic attractors. Physica D, 1983, 7, 153–180.

Farmer, J.D. and Sidorowich, J.J., Predicting chaotic time-series. Phys. Rev. Lett., 1987, 59(8), 845–848.

Feichtinger, G., Hommes, C.H. and Herold, W., Chaos in a simple deterministic queuing system. ZOR - Math. Meth. Oper. Res., 1994, 40, 109–119.

Feldman, D.P. and Crutchfield, J.P., Discovering non-critical organization: statistical mechanical, information theoretic and computational views of patterns in one-dimensional spin systems. Santa Fe Institute Working Paper 98-04-026, 1998.

Flake, G.W., The Computational Beauty of Nature, 1998 (MIT Press: Cambridge, MA).

Forrester, J.W., Industrial Dynamics, 1961 (MIT Press: Cambridge, MA).

Fraser, A.M. and Swinney, H.L., Independent coordinates for strange attractors from mutual information. Phys. Rev. A, 1983, 33(2), 1134–1140.

Ghosh, S., The role of modeling and asynchronous distributed simulation in analyzing complex systems of the future. Inform. Syst. Front. J. Res. Innov., 2002, 4(2), 166–171.

Glance, N.S., Dynamics with expectations. PhD thesis, Physics Department, Stanford University, 1993.

Hogg, T. and Huberman, B.A., The behavior of computational ecologies. In The Ecology of Computation, edited by B.A. Huberman, pp. 77–116, 1988 (Elsevier Science: Amsterdam).

Hogg, T. and Huberman, B.A., Controlling chaos in distributed systems. IEEE Trans. Syst. Man Cybern., 1991, 21, 1325–1332.

Kaneko, K. and Tsuda, I., Complex Systems: Chaos and Beyond - A Constructive Approach with Applications in Life Sciences, 2000 (Springer: Berlin).

Kennel, M., Brown, R. and Abarbanel, H.D.I., Determining embedding dimension for phase-space reconstruction using a geometrical construction. Phys. Rev. A, 1992, 45(6), 3403–3411.

Kephart, J.O., Hogg, T. and Huberman, B.A., Dynamics of computational ecosystems. Phys. Rev. A, 1989, 40(1), 404–421.

Kephart, J.O., Hogg, T. and Huberman, B.A., Collective behavior of predictive agents. Physica D, 1990, 42, 48–65.

Kumara, S., Ranjan, P., Surana, A. and Narayanan, V., Decision making in logistics: a chaos theory based analysis. Ann. Int. Inst. Prod. Eng. Res. (Ann. CIRP), 2003, 1, 381–384.

Lee, S., Gautam, N., Kumara, S., Hong, Y., Gupta, H., Surana, A., Narayanan, V., Thadakamalla, H., Brinn, M. and Greaves, M., Situation identification using dynamic parameters in complex agent-based planning systems. Intell. Eng. Syst. Artif. Neural Networks, 2002, 12, 555–560.

Lind, D. and Marcus, B., An Introduction to Symbolic Dynamics and Coding, 1995 (Cambridge University Press: New York).

Lloyd, S. and Slotine, J.J.E., Information theoretic tools for stable adaptation and learning. Int. J. Adapt. Control Signal Process., 1996, 10, 499–530.

Maxion, R.A., Toward diagnosis as an emergent behavior in a network ecosystem. Physica D, 1990, 42, 66–84.

Min, H. and Zhou, G., Supply chain modeling: past, present and future. Comput. Ind. Eng., 2002, 43, 231–249.

Mukherjee, S., Osuna, E. and Girosi, F., Nonlinear prediction of chaotic time series using support vector machines. In IEEE Workshop on Neural Networks for Signal Processing VII, pp. 511–519, 1997.

Newman, M.E.J., Models of the small world. J. Stat. Phys., 2000, 101, 819–841.

Newman, M.E.J., The spread of epidemic disease on networks. Phys. Rev. E, 2002, 66.

Newman, M.E.J., Random graphs as models of networks. In Handbook of Graphs and Networks, edited by S. Bornholdt and H.G. Schuster, 2003 (Wiley-VCH: Berlin).

Newman, M.E.J., Strogatz, S.H. and Watts, D.J., Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E, 2001, 64.

Ott, E., Chaos in Dynamical Systems, 1996 (Cambridge University Press: Cambridge).

Powell, M.J.D., Radial basis function approximation to polynomials. Preprint, University of Cambridge, 1987.

Rasmussen, D.R. and Mosekilde, M., Bifurcations and chaos in a generic management model. Eur. J. Oper. Res., 1988, 35, 80–88.

Ravasz, E. and Barabasi, A.L., Hierarchical organization in complex networks. Phys. Rev. E, 2003, 67.

Sano, M. and Sawada, Y., Measurement of the Lyapunov spectrum from a chaotic time series. Phys. Rev. Lett., 1985, 55, 1082–1084.

Sawhill, B.K., Self-organised criticality and complexity theory. In 1993 Lectures in Complex Systems, edited by L. Nadel and D.L. Stein, pp. 143–170, 1995 (Addison-Wesley: Reading, MA).

Schieritz, N. and Grobler, A., Emergent structures in supply chains: a study integrating agent-based and system dynamics modeling. In 36th Annual Hawaii International Conference on System Sciences, Big Island, HI, 2003.

Shalizi, C.R. and Crutchfield, J.P., Computational mechanics: pattern and prediction, structure and simplicity. J. Stat. Phys., 2001, 104, 816–879.

Shalizi, C.R., Causal architecture, complexity and self-organization in time series and cellular automata. Available online at: http://www.santafe.edu/~shalizi/thesis, 2005 (accessed May 2005).

Simon, H.A., The Sciences of the Artificial, 3rd ed., 1997 (The MIT Press: Cambridge, MA).

Strogatz, S.H., Nonlinear Dynamics and Chaos, 1994 (Addison-Wesley: Reading, MA).

Strogatz, S.H., Exploring complex networks. Nature, 2001, 410, 268–276.

Takens, F., Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, edited by L.S. Young, Lecture Notes in Mathematics, 1981, 898, 366–381 (Springer: New York).

Thadakamalla, H.P., Raghavan, U.N., Kumara, S. and Albert, R., Survivability of multiagent-based supply networks: a topological perspective. IEEE Intell. Syst., 2004, 19(5), 24–31.

Wang, X.F. and Chen, G., Synchronization in scale-free dynamical networks: robustness and fragility. IEEE Trans. Circuits Syst. I: Fundam. Theory Applic., 2002, 49(1), 54–62.

Watts, D.J. and Strogatz, S.H., Collective dynamics of 'small-world' networks. Nature, 1998, 393, 440–442.

Wolfram, S., Cellular Automata and Complexity: Collected Papers, 1994 (Addison-Wesley: Reading, MA).


Survivability of Multiagent-Based Supply Networks: A Topological Perspective

Hari Prasad Thadakamalla, Usha Nandini Raghavan, Soundar Kumara, and Réka Albert, Pennsylvania State University

Supply chains involve complex webs of interactions among suppliers, manufacturers, distributors, third-party logistics providers, retailers, and customers. Although fairly simple business processes govern these individual entities, real-time capabilities and global Internet connectivity make today's supply chains complex. Fluctuating demand patterns, increasing customer expectations, and competitive markets also add to their complexity.

Supply networks are usually modeled as multiagent systems (MASs).1 Because supply chain management must effectively coordinate among many different entities, a multiagent modeling framework based on explicit communication between these entities is a natural choice.1 Furthermore, we can represent these multiagent systems as a complex network with entities as nodes and the interactions between them as edges. Here we explore the survivability (and hence dependability) of these MASs from the view of these complex supply networks.

Today's supply networks aren't dependable, or survivable, in chaotic environments. For example, Figure 1 shows how mediocre a typical supply network's reaction to a node or edge failure is compared to a network with built-in redundancy.

Survivability is a critical factor in supply network design. Specifically, supply networks in dynamic environments, such as military supply chains during wartime, must be designed more for survivability than for cost effectiveness. The more survivable a network is, the more dependable it will be.

We present a methodology for building survivable large-scale supply network topologies that can extend to other large-scale MASs. Building survivable topologies alone doesn't, however, make an MAS dependable. To create survivable, and hence dependable, multiagent systems, we must also consider the interplay between network topology and node functionalities.

A topological perspective

To date, the survivability literature has emphasized network functionalities rather than topology. To be survivable, a supply network must adapt to a dynamic environment, withstand failures, and be flexible and highly responsive. These characteristics depend on not only node functionality but also the topology in which the nodes operate.

The components of survivability

From a topological perspective, the following properties encompass survivability, and we denote them as survivability components.

The first is robustness. A robust network can sustain the loss of some of its structure or functionalities and maintain connectedness under node failures, whether the failure is random or a targeted attack. We measure robustness as the size of the network's largest connected component, in which a path exists between any pair of nodes in that component.

The second is responsiveness. A responsive network provides timely services and effective navigation. A low characteristic path length (the average of the shortest path lengths from each node to every other node) leads to better responsiveness, which determines how quickly commodities or information proliferate throughout the network.

The third is flexibility. This property depends on the presence of alternate paths. Good clustering properties ensure alternate paths to facilitate dynamic rerouting. The clustering coefficient, defined as the ratio between the number of edges among a node's first neighbors and the total possible number of edges between them, characterizes the local order in a node's neighborhood.

The fourth is adaptivity. An adaptive network can rewire itself efficiently (that is, restructure or reorganize its topology on the basis of environmental shifts) to continue providing efficient performance. For example, if a supplier can't reliably meet a customer's demands, the customer should be able to choose another supplier.

A typical supply chain with a tree-like or hierarchical structure lacks these four properties: the clustering coefficient is nearly zero, and the characteristic path length scales linearly with the number of nodes (or agents) N. In designing complex agent networks with built-in survivability, conventional optimization tools won't work because of the problem's extremely large scale. When networks were smaller, we could understand their overall behavior by concentrating on the individual components' properties. But as networks expand, this becomes impossible, so we shift focus to the statistical properties of the collective behavior.
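These topological proxies for the survivability components are directly computable for any candidate supply-network graph. The following sketch (the function name is ours, not from the article) reports the largest-connected-component fraction, the characteristic path length, and the clustering coefficient; applied to a tree-like hierarchy it exhibits the near-zero clustering noted above.

```python
import networkx as nx

def survivability_report(G):
    """Topological survivability indicators: robustness proxy (largest
    connected component fraction), responsiveness proxy (characteristic path
    length), and flexibility proxy (clustering coefficient)."""
    giant_nodes = max(nx.connected_components(G), key=len)
    giant = G.subgraph(giant_nodes)
    return {
        "largest_component_fraction": len(giant_nodes) / G.number_of_nodes(),
        "characteristic_path_length": nx.average_shortest_path_length(giant),
        "clustering_coefficient": nx.average_clustering(G),
    }

# Example: a tree-like (hierarchical) supply chain has zero clustering.
tree = nx.balanced_tree(r=3, h=5)
print(survivability_report(tree))
```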

Using topologies

Studying complex networks such as protein interaction networks, regulatory networks, social networks of acquaintances, and information networks such as the Web is illuminating the principles that make these networks extremely resilient to their respective chaotic environments. The core principles extracted from this exploration will prove valuable in building robust models for survivable complex agent networks.

Complex-network theory currently offers random-graph, small-world, and scale-free network topologies as likely candidates for survivable networks (see the sidebar "Complex Networks" for more on this topic). Evaluating these for survivability (see Figure 2), we find that no one topology consistently outperforms the others. For example, while small-world networks have better clustering properties, scale-free networks are significantly more robust to random attacks. So, we can't directly use these topologies to build supply networks. We can, however, use their evolution principles to build supply chain networks that perform well in all respects of the survivability components.


[Figure 1] How redundancy affects survivability. (a) A part of the multiagent system for military logistics modeled using the UltraLog (www.ultralog.net) program. This example models each entity, such as main support battalion (MSB), forward support battalion (FSB), and battalion, as a software agent. (We've changed the agents' names for security reasons.) In the current scenario, MSBs send the supplies to the FSBs, who in turn forward these to battalions. (b) A modified military supply chain with some redundancy built into it. This network performs much better in the event of node failures and hence is more dependable than the first network.


Complex Networks (sidebar)

Social scientists, among the first to study complex networks extensively, focused on acquaintance networks, where nodes represent people and edges represent the acquaintances between them. Social psychologist Stanley Milgram posited the "six degrees of separation" theory that in the US, a person's social network has an average acquaintance path length of six.1 This turns out to be a particular instance of the small-world property found in many real-world networks, which, despite their large size, have a relatively short path between any two nodes.

An early effort to model complex networks introduced random graphs for modeling networks with no obvious pattern or structure.2 A random graph consists of N nodes, and two nodes are connected with a connection probability p. Random graphs are statistically homogeneous because most nodes have a degree (that is, the number of edges incident on the node) close to the graph's average degree, and significantly small and large node degrees are exponentially rare.

However, studying the topologies of diverse large-scale networks found in nature reveals a more complex and unpredictable dynamic structure. Two measures quantifying network topology that were found to differ significantly in real networks are the degree distribution (the fraction of nodes with degree k) and the clustering coefficient. Later modeling efforts focused on trying to reproduce these properties.3,4 Duncan Watts and Steven Strogatz introduced the concept of small-world networks to explain the high degree of transitivity (order) in complex networks.5 The Watts-Strogatz model starts from a regular 1D ring lattice on L nodes, where each node is joined to its first K neighbors. Then, with probability p, each edge is rewired with one end remaining the same and the other end chosen uniformly at random, without allowing multiple edges (more than one edge joining a pair of vertices) or loops (edges joining a node to itself). The resulting network is a regular lattice when p = 0 and a random graph when p = 1, because all edges are rewired. This network class displays a high clustering coefficient for most values of p, but as p → 1, it behaves like a random graph.

Albert-László Barabási and Réka Albert later proposed an evolutionary model based on growth and preferential attachment, leading to a network class, scale-free networks, with a power-law degree distribution.6 Many real-world networks' degree distribution follows a power law, fundamentally different from the peaked distribution observed in random graphs and small-world networks. Barabási and Albert argued that a static random graph of the Watts-Strogatz type fails to capture two important features of large-scale networks: their constant growth and the inherent selectivity in edge creation. Complex networks such as the Web, collaboration networks, or even biological networks are growing continuously with the creation of new Web pages, the birth of new individuals, and gene duplication and evolution. Moreover, unlike random networks where each node has the same chance of acquiring a new edge, new nodes entering a scale-free network don't connect uniformly to existing nodes but attach preferentially to higher-degree nodes. This reasoning led Barabási and Albert to define two mechanisms:

• Growth: Start with a small number of nodes, say m0, and assume that every time a node enters the system, m edges are pointing from it, where m < m0.

• Preferential attachment: Every time a new node enters the system, each edge of the newly connected node preferentially attaches to a node i with degree ki with the probability Πi = ki / Σj kj.

Research has shown that the second mechanism leads to a network with power-law degree distribution P(k) ~ k^(−γ) with exponent γ = 3. Barabási and Albert dubbed these networks "scale free" because they lack a characteristic degree and have a broad tail in the degree distribution. Following the proposal of the first scale-free model, researchers have introduced many more refined models, leading to a well-developed theory of evolving networks.7
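A minimal from-scratch implementation of these two mechanisms is sketched below; each new node's m edges choose their targets with probability proportional to degree by sampling from a list in which every node appears once per incident edge. The parameter values are illustrative, not taken from the sidebar.

```python
import random

def barabasi_albert(n, m, m0=None, seed=0):
    """Grow a network by preferential attachment: each new node attaches its
    m edges to existing nodes with probability proportional to their degree."""
    random.seed(seed)
    m0 = m0 or m + 1
    edges = [(i, j) for i in range(m0) for j in range(i + 1, m0)]  # connected seed
    targets = [v for e in edges for v in e]   # each node repeated once per incident edge
    for new in range(m0, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(random.choice(targets))   # degree-proportional sampling
        for t in chosen:
            edges.append((new, t))
            targets.extend([new, t])
    return edges

edges = barabasi_albert(n=1000, m=3)
print(len(edges))
```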

Protein-to-protein interactions in metabolic and regulatory networks and other biological networks also show a striking ability to survive under extreme conditions. Most of these networks' underlying properties resemble the three most familiar networks found in the literature (see Figure 1 in the article).

Complex networks are also vulnerable to node or edge losses, which disrupt the paths between nodes or increase their length and make communication between them harder. In severe cases, an initially connected network breaks down into isolated components that can no longer communicate. Numerical and analytical studies of complex networks indicate that a network's structure plays a major role in its response to node removal. For example, scale-free networks are more robust than random or small-world networks with respect to random node loss.8 Large scale-free networks will tolerate the loss of many nodes yet maintain communication between those remaining. However, they're sensitive to removal of the most-connected nodes (by a targeted attack on critical nodes, for example), breaking down into isolated pieces after losing just a small percentage of these nodes.

References

1. S. Milgram, "The Small World Problem," Psychology Today, vol. 2, May 1967, pp. 60–67.

2. P. Erdös and A. Renyi, "On Random Graphs I," Publicationes Mathematicae, vol. 6, 1959, pp. 290–297.

3. S.N. Dorogovtsev and J.F.F. Mendes, "Evolution of Networks," Advances in Physics, vol. 51, no. 4, 2002, pp. 1079–1187.

4. M.E.J. Newman, "The Structure and Function of Complex Networks," SIAM Rev., vol. 45, no. 2, 2003, pp. 167–256.

5. D.J. Watts and S.H. Strogatz, "Collective Dynamics of 'Small-World' Networks," Nature, vol. 393, June 1998, pp. 440–442.

6. A.-L. Barabási and R. Albert, "Emergence of Scaling in Random Networks," Science, vol. 286, Oct. 1999, pp. 509–512.

7. R. Albert and A.-L. Barabási, "Statistical Mechanics of Complex Networks," Reviews of Modern Physics, Jan. 2002, pp. 47–97.

8. R. Albert, H. Jeong, and A.-L. Barabási, "Error and Attack Tolerance of Complex Networks," Nature, July 2000, pp. 378–382.


Researchers have studied complex networks in part to find ways to design evolutionary algorithms for modeling networks with distinct properties found in nature. A network's evolutionary mechanism is designed such that the network's inherent properties emerge owing to the mechanism. For example, small-world networks were designed to explain the high clustering coefficient found in many real-world networks, while the "rich get richer" phenomenon used in the Barabási-Albert model explains the scale-free distribution.2

Similarly, we seek to design supply networks with inherent survivability components (see Figure 3), obtaining these components by coining appropriate growth mechanisms. Of course, having all the aforementioned properties in a network might not be practically feasible; we'd likely have to negotiate trade-offs depending on the domain. Also, domain specificities might make it inefficient to incorporate all properties. For instance, in a supply network, we might not be able to rewire the edges as easily as we can in an information network, so we would concentrate more on obtaining other properties such as low characteristic path length, robustness to failures and attacks, and high clustering coefficients. So, the construction of these networks is domain specific.

Establishing edges between network nodes is also domain specific. For instance, in a supply network, a retailer would likely prefer to have contact with other geographically convenient nodes (distributors, warehouses, and other retailers). At the same time, nodes in a file-sharing network would prefer to attach to other nodes known to locate or hold many shared files (that is, nodes of high degree).

Obtaining the survivability components

While evolving the network on the basis of domain constraints, we need to incorporate four traits into the growth model for obtaining good survivability components.

The first is low characteristic path length. During network construction, establish a few long-range connections between nodes that would otherwise require many steps to reach one from another.

The second is good clustering. When two nodes A and B are connected, new edges from A should prefer to attach to neighbors of B, and vice versa.


[Figure 2] Comparing the survivability components of random, small-world, and scale-free networks. The comparison can be summarized as follows:

• Degree distribution: random graphs have a peaked (Poisson) distribution; small-world networks have a similarly peaked distribution; scale-free networks follow a power law P(k).

• Characteristic path length: random graphs scale as log(N); small-world networks scale linearly with N for small p and as log(N) for higher p; scale-free networks scale as log(N)/log(log(N)).

• Clustering coefficient: approximately p (the connection probability) for random graphs; high for small-world networks, but approaching the random-graph value as p → 1; approximately ((m − 1)/2)(log(N)/N) for scale-free networks, where m is the number of edges with which a node enters.

• Robustness to failures: random graphs respond similarly to both random and targeted attacks; small-world networks respond like random networks because their degree distribution is similar; scale-free networks are highly resilient to random failures while being very sensitive to targeted attacks.

[Figure 3] The transition from a supply chain (manufacturers, warehouses, and retailers arranged in a tree-like hierarchy) to a survivable supply network that provides alternate paths around failed nodes and edges.


The third is robustness to random and targeted failure. Preferential attachment, where new nodes entering the network don't connect uniformly to existing nodes but attach preferentially to higher-degree nodes (see the sidebar for more details), leads to scale-free networks with very few critical and many not-so-critical nodes. Here we measure a node's criticality in terms of the number of edges incident on it. So, these networks are robust to random failures (the probability that a critical node fails is very small) but not to targeted attacks (attacking the very few critical nodes would devastate the network). Also, it's not practically feasible to have all nodes play an equal role in the system, that is, be equally critical. Thus, the network should have a good balance of critical, not-so-critical, and noncritical nodes.

The fourth is efficient rewiring. Rewiring edges in a network might or might not be feasible, depending on the domain. But where it is feasible, it should preserve the other three traits.

Although complete graphs come equipped with good survivability components, they clearly aren't cost effective. Allowing every agent in an agent system to communicate with every other agent uses system bandwidth inefficiently and could completely bog down the system. So the amount of redundancy results from a trade-off between cost and survivability.

An illustration

Suppose we want to build a topology for a military supply chain that must be survivable in wartime. First, we broadly classify the network nodes into three types:

• Battalions prefer to attach to a highly connected node so that supplies from different parts of the network will be transported to them in fewer steps. Battalions also require quick responses, so they prefer their subsequent links to attach to nodes at convenient shorter distances (in our model we considered a fixed distance of two).

• A forward support battalion (FSB) prefers to attach to highly connected nodes so that its supplies proliferate faster in the network. The supply range from an FSB goes up to a particular distance (at most three in our model).

• A main support battalion (MSB) also prefers to attach to a highly connected node to enable its supplies to proliferate faster in the network. We assume an unrestricted supply reach from an MSB, thus facilitating some long-range connections.

In a conventional logistics network, the MSBs supply commodities (such as ammunition, food, and fuel) to the FSBs, who in turn forward them to the battalions. Our approach doesn't restrict node functionalities as such; for example, we assume that even a battalion can supply commodities to other battalions if necessary.


[Figure 4] Snapshots of the modeled networks during their growth, where the nodes number 70. MSBs are green, FSBs are red, and battalions are blue. The three panels show networks grown with preferential attachment, random attachment, and the proposed attachment rules, respectively.

[Figure 5] How our proposed network performed: (a) the log-log plot of the degree distribution for all three networks (Models 1, 2, and 3); (b) the characteristic path length of the proposed network against the log of the number of nodes.


Growth mechanisms

Start with a small number of nodes, say m0, and assume that every time a node enters the system, m edges are pointing from it, where m < m0. Battalions, FSBs, and MSBs enter the system in a certain ratio l:m:n, where l > m > n:

• A battalion has one edge pointing from it and a second edge added with a probability p.

• An FSB has three edges pointing from it.

• An MSB has five edges pointing from it.

The attachment rules applied depend on which node type enters the system:

• For a battalion, the first edge attaches to a node i of degree ki with the probability Πi = ki / Σj kj. The second edge, which exists with a probability p, attaches to a randomly chosen node at a distance of two.

• For an FSB, the first edge attaches to a node i of degree ki with the probability Πi = ki / Σj kj. The subsequent edges attach to a randomly chosen node at a distance of at most three.

• For an MSB, each edge attaches preferentially to a node i with degree ki with the probability Πi = ki / Σj kj.

Simulation and analysis

Using this method, we built a network of 1,000 nodes with l, m, and n being 25, 4, and 1 (we obtained these values from the current configuration of the military logistics network used in the UltraLog program) and p = 1/2. We compared this network's survivability with that of two other networks built using similar mechanisms, except that one used purely preferential attachment rules (similar to scale-free networks) and the other used purely random attachment rules (similar to random networks) (see Figure 4). All three networks had an equal number of edges and nodes to ensure a fair comparison.
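The growth mechanism above can be prototyped in a few dozen lines. The sketch below is our simplified reading of the rules (for instance, distance-limited attachments are measured from the node's first contact, and duplicate edge picks are simply ignored by networkx), so it should be taken as an illustration rather than the authors' exact simulation code.

```python
import random
import networkx as nx

def grow_supply_network(n=1000, ratio=(25, 4, 1), p=0.5, seed=0):
    """Sketch of the proposed growth rules: battalions, FSBs and MSBs enter in
    the given ratio; first edges attach preferentially by degree; extra edges
    attach to random nodes within a bounded distance (two for battalions,
    at most three for FSBs, unrestricted for MSBs)."""
    rng = random.Random(seed)
    G = nx.complete_graph(6)                      # small seed network (our assumption)
    kinds = ["battalion"] * ratio[0] + ["FSB"] * ratio[1] + ["MSB"] * ratio[2]

    def preferential(exclude=None):
        # degree-proportional choice among existing nodes
        nodes, degs = zip(*((v, d) for v, d in G.degree() if v != exclude))
        return rng.choices(nodes, weights=degs, k=1)[0]

    def nearby(src, radius, new):
        # random node within 'radius' hops of src (a simplification of the rule)
        near = nx.single_source_shortest_path_length(G, src, cutoff=radius)
        near.pop(src, None)
        near.pop(new, None)
        return rng.choice(list(near)) if near else preferential(exclude=new)

    for new in range(len(G), n):
        kind = rng.choice(kinds)
        first = preferential()
        G.add_node(new)
        G.nodes[new]["kind"] = kind
        G.add_edge(new, first)
        if kind == "battalion" and rng.random() < p:
            G.add_edge(new, nearby(first, 2, new))       # second edge, probability p
        elif kind == "FSB":
            for _ in range(2):                           # three edges in total
                G.add_edge(new, nearby(first, 3, new))
        elif kind == "MSB":
            for _ in range(4):                           # five edges in total
                G.add_edge(new, preferential(exclude=new))
    return G

G = grow_supply_network()
print(G.number_of_nodes(), G.number_of_edges())
```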

We refer to the networks built from random, preferential, and proposed attachment rules as Models 1, 2, and 3, respectively. As we noted earlier, a typical military supply chain (see Figure 1a) with a tree-like or hierarchical structure has deficient survivability components, making it vulnerable to both random and targeted attacks. Models 1, 2, and 3 outperform the typical supply network in all survivability components.

Figure 5a shows the three models' degree distribution. As expected, the preferential-attachment network has a heavier tail than the other two networks. We measured survivability components for all three networks.

The clustering coefficient for Model 3 was the highest (see Table 1). The Model 3 attachment rules, especially those for battalions and FSBs, contribute implicitly to the clustering coefficient, unlike the attachment rules in the other models.

The proposed network model's characteristic path length measured between 4.69 and 4.79 despite the network's large size (1,000 nodes). This value puts it between the preferential and random attachment models. Also, as Figure 5b shows, the characteristic path length increases on the order of log(N) as N increases. Model 3 clearly displays small-world behavior.

To measure network robustness, we removed a set of nodes from the network and evaluated its resilience to disruptions. We considered two attack types: random and targeted. To simulate random attacks, we removed a set of randomly chosen nodes; for targeted attacks, we removed a set of nodes selected strictly in order of decreasing node degree. To determine robustness, we measured how the size of each network's largest connected component, the characteristic path length, and the maximum distance within the largest connected component changed as a function of the number of nodes removed. We expect that in a robust network the size of the largest connected component is a considerable fraction of N (usually O(N)) and the distances between nodes in the largest connected component don't increase considerably.
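A sketch of how these survivability measures and attack experiments could be computed with networkx is given below (function names are ours; it assumes an undirected graph such as one produced by the growth sketch above):

```python
import random
import networkx as nx

def largest_component(G):
    """Return the subgraph induced by the largest connected component."""
    return G.subgraph(max(nx.connected_components(G), key=len))

def survivability_measures(G):
    """Clustering coefficient, characteristic path length, and maximum distance (diameter)
    of the largest connected component, plus that component's size."""
    lcc = largest_component(G)
    return {
        "clustering": nx.average_clustering(G),
        "char_path_length": nx.average_shortest_path_length(lcc),
        "max_distance": nx.diameter(lcc),
        "lcc_size": lcc.number_of_nodes(),
    }

def attack(G, fraction, targeted=False):
    """Remove a fraction of nodes, randomly or in decreasing order of degree, then re-measure."""
    H = G.copy()
    n_remove = int(fraction * H.number_of_nodes())
    if targeted:
        victims = sorted(H.nodes(), key=H.degree, reverse=True)[:n_remove]
    else:
        victims = random.sample(list(H.nodes()), n_remove)
    H.remove_nodes_from(victims)
    return survivability_measures(H)
```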

For random failures, Figure 6 shows that Model 3's robustness nearly matches that of the preferential-attachment network (note that scale-free networks are highly resilient to random failures).



Table 1. Simulation results.

                             Model 1 (random)   Model 2 (preferential)   Model 3 (proposed)
Clustering coefficient       0.0038–0.0039      0.013–0.019              0.35–0.39
Characteristic path length   5.26–5.36          4.09–4.25                4.69–4.79

Figure 6. Responses of the three networks to random attacks, plotted as (a) the size of the largest connected component, (b) the characteristic path length, and (c) the maximum distance in the largest connected component against the percentage of nodes removed from each network.


Also, the decrease in the largest connected component's size is linear with respect to the number of nodes removed, which corresponds to the slowest possible decrease. So we can safely conclude that these networks are robust to random failures: most of the nodes in the network have a degree less than four, and removing smaller-degree nodes impacts the networks much less than removing high-degree nodes (called hubs).

These networks' responses to targeted attacks are inferior compared to their resilience to random attacks (see Figure 7). The size of the largest component decreases much faster for the proposed network than for the other two networks, but the proposed network performs better on the other two robustness measures. That is, the distances in the connected component are considerably smaller when more than 10 percent of nodes are removed.

We can improve robustness to targeted attacks by introducing constraints in the attachment rules. Here we assume that node type constrains node degree; that is, network MSBs, FSBs, and battalions can't have more than m1, m2, and m3 edges, respectively, incident on them. This is a reasonable assumption because in military logistics (or any organization's logistics management, for that matter), the suppliers might not be able to cater to more than a certain number of battalions or other suppliers.


Figure 7. The three networks' responses to targeted attacks, plotted as (a) the size of the largest connected component, (b) the characteristic path length, and (c) the maximum distance in the largest connected component against the percentage of nodes removed from each network.

Figure 8. The proposed network's responses to targeted attacks for different values of m1, m2, and m3 (m1 = 4, m2 = 10, m3 = 25; m1 = 4, m2 = 8, m3 = 12; m1 = 3, m2 = 6, m3 = 10).

The Authors

Hari Prasad Thadakamalla is a PhD student in the Department of Industrial and Manufacturing Engineering at Pennsylvania State University, University Park. His research interests include supply networks, search in complex networks, stochastic systems, and control of multiagent systems. He obtained his MS in industrial engineering from Penn State. Contact him at [email protected].

Usha Nandini Raghavan is a PhD student in industrial and manufacturing engineering at Pennsylvania State University, University Park. Her research interests include supply chain management, graph theory, complex adaptive systems, and complex networks. She obtained her MSc in mathematics from the Indian Institute of Technology, Madras. Contact her at [email protected].

Soundar Kumara is a Distinguished Professor of industrial and manufacturing engineering. He holds joint appointments with the Department of Computer Science and Engineering and the School of Information Sciences and Technology at Pennsylvania State University. His research interests include complexity in logistics and manufacturing, software agents, neural networks, and chaos theory as applied to manufacturing process monitoring and diagnosis. He's an elected active member of the International Institute of Production Research. Contact him at [email protected].

Réka Albert is an assistant professor of physics at Pennsylvania State University and is affiliated with the Huck Institutes of the Life Sciences. Her main research interest is modeling the organization and dynamics of complex networks. She received her PhD in physics from the University of Notre Dame. She is a member of the American Physical Society and the Society for Mathematical Biology. Contact her at [email protected].


Initial experiments (see Figure 8) show that a network with these constraints displayed improved robustness to targeted attacks while not deviating much from the clustering coefficient. However, as we restrict how many links a node can receive, the network's characteristic path length increases (see Table 2). Clearly, a trade-off exists between robustness to targeted attacks and the characteristic path length.

The fourth measure of survivability, network adaptivity, relates more to node functionality than to topology. Node functionality should facilitate the ability to rewire. For example, if a supplier can't fulfill a customer's demands, the customer seeks an alternate supplier; that is, the edge connected to the supplier is rewired to be incident on another supplier. Our model rewires according to its attachment rules. We conjecture that in such a case, the other survivability components (clustering coefficient, characteristic path length, and robustness) will be intact. But to make a stronger argument we need more analysis in this direction.

The growth mechanism we describe is more of an illustration because real-world data aren't available, but we can always modify it to incorporate domain constraints. For example, we've assumed that a new node can attach preferentially to any node in the network, which might not be a realistic assumption. If specific geographical constraints are known, we can modify our mechanism to make the new node entering the system attach preferentially only within a set of nodes that satisfy the constraints.

Acknowledgments

We thank the anonymous reviewers for their helpful comments. We acknowledge DARPA for funding this work under grant MDA972-01-1-0038 as part of the UltraLog program.



Table 2. The proposed network's characteristic path length for different m1, m2, and m3 values.

Values of m1, m2, and m3        Characteristic path length
m1 = ∞, m2 = ∞, m3 = ∞          4.4
m1 = 4, m2 = 10, m3 = 25        6.2
m1 = 4, m2 = 8, m3 = 12         7.1
m1 = 3, m2 = 6, m3 = 10         8.0



Search in weighted complex networks

Hari P. Thadakamalla,1 R. Albert,2 and S. R. T. Kumara1

1Department of Industrial Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
2Department of Physics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA

(Received 5 August 2005; published 30 December 2005)

We study trade-offs presented by local search algorithms in complex networks which are heterogeneous in edge weights and node degree. We show that search based on a network measure, local betweenness centrality (LBC), utilizes the heterogeneity of both node degrees and edge weights to perform the best in scale-free weighted networks. The search based on LBC is universal and performs well in a large class of complex networks.

DOI: 10.1103/PhysRevE.72.066128    PACS number(s): 89.75.Fb, 89.75.Hc, 02.10.Ox, 89.70.+c

I. INTRODUCTION

Many large-scale distributed systems found in communications, biology or sociology can be represented by complex networks. The macroscopic properties of these networks have been studied intensively by the scientific community, which has led to many significant results [1–3]. Graph properties such as the degree distribution and clustering coefficient were found to be significantly different from random graphs [4,5], which are traditionally used to model these networks. One of the major findings is the presence of heterogeneity in various properties of the elements in the network. For instance, a large number of real-world networks, including the World Wide Web, the Internet, metabolic networks, phone call graphs, and movie actor collaboration networks, are found to be highly heterogeneous in node degree (i.e., the number of edges per node) [1–3]. The clustering coefficients, quantifying local order and cohesiveness [6], were also found to be heterogeneous, i.e., C(k) ~ k^−1 [7]. These discoveries, along with others related to the mixing patterns of complex networks, initiated a revival of network modeling in the past few years [1–3]. The focus has been on understanding the mechanisms which lead to heterogeneity in node degree and its implications on network properties. It was also shown that this heterogeneity has a huge impact on network properties and processes such as network resilience [8], network navigation and local search [9], and epidemiological processes [10].

Recently, there have been many studies [11–17] that tried to analyze and characterize weighted complex networks, where edges are characterized by capacities or strengths instead of a binary state (present or absent). These studies have shown that heterogeneity is prevalent in the capacity and strength of the interconnections in the network as well. Many researchers [11,13–16] have pointed out that the diversity of the interaction strengths is critical in most real-world networks. For instance, sociologists have shown that the weak links that people have outside their close circle of friends play a key role in keeping the social system together [11]. The Internet traffic [16] or the number of passengers in the airline network [15] are critical dynamical quantities that can be represented by using weighted edges. Similarly, the diversity of the predator-prey interactions and of metabolic reactions is considered a crucial component of ecosystems [13] and metabolic networks [14], respectively. Thus it is incomplete to represent real-world systems with equal interaction strengths between different pairs of nodes.

In this paper, we concentrate on finding efficient decentralized search strategies on networks which have heterogeneity in edge weights. This is an intriguing and relatively little studied problem that has many practical applications. Suppose some required information, such as computer files or sensor data, is stored at the nodes of a distributed network. Then, to quickly determine the location of particular information, one should have efficient decentralized search strategies. This problem has become more important and relevant due to the advances in technology that led to many distributed systems such as sensor networks [18], peer-to-peer networks [19] and dynamic supply chains [20]. Previous research on local search algorithms [9,21–24] has assumed that all the edges in the network are equivalent. In this paper we study the complex trade-offs presented by efficient local search in weighted complex networks. We simulate and analyze different search strategies on Erdős-Rényi (ER) random graphs and scale-free networks. We define a new local parameter called local betweenness centrality (LBC) and propose a search strategy based on this parameter. We show that irrespective of the edge weight distribution this search strategy performs the best in networks with a power-law degree distribution (i.e., scale-free networks). Finally, we show that the search strategy based on LBC is usually equivalent to high-degree search (discussed by Adamic et al. [9]) in unweighted (binary) networks. This implies that the search based on LBC is more universal and is optimal in a larger class of complex networks.

The rest of the paper is organized as follows. In Sec. II, we describe the problem in detail and briefly discuss the literature related to search in complex networks. In Sec. III, we define the local betweenness centrality (LBC) of a node's neighbor and show that it depends on the weight of the edge connecting the node and neighbor and on the degree of the neighbor. Section IV explains our methodology and the different search strategies considered. Section V gives the details of the simulations conducted for comparing these strategies. In Sec. VI, we discuss the findings from simulations on ER random and scale-free networks. In Sec. VII, we prove that the LBC-based and degree-based search are equivalent in unweighted networks. Finally, we give conclusions in Sec. VIII.



II. PROBLEM DESCRIPTION AND LITERATURE

The problem of decentralized search goes back to the famous experiment by Milgram [25] illustrating the short distances in social networks. One of the striking observations of this study, as pointed out by Kleinberg [21], was the ability of the nodes in the network to find short paths by using only local information. Currently, Watts et al. [26] are conducting an Internet-based study to verify this phenomenon. Kleinberg demonstrated that the emergence of such a phenomenon requires special topological features [21]. Considering a family of network models that generalizes the Watts-Strogatz model [6], he showed that only one particular model among this infinite family can support efficient decentralized algorithms. Unfortunately, the model given by Kleinberg is too constrained and represents only a very small subset of complex networks. Watts et al. presented another model to explain the phenomena observed by Milgram, which is based upon plausible hierarchical social structures [22]. However, in many real-world networks, it may not be possible to divide the nodes into sets of groups in a hierarchy depending on the properties of the nodes as in the Watts et al. model.

Recently, Adamic et al. [9] showed that in networks with a power-law degree distribution (scale-free networks), high-degree-seeking search is more efficient than random-walk search. In random-walk search, the node that has the message passes it to a randomly chosen neighbor. This process continues until it reaches the target node. In high-degree search, the node passes the message to the neighbor that has the highest degree among all nodes in the neighborhood, assuming that a more connected neighbor has a higher probability of reaching the target node. The high-degree search was found to outperform the random-walk search consistently in networks having a power-law degree distribution, for different exponents varying from 2.0 to 3.0. Using the generating function formalism given by Newman [27], Adamic et al. showed that for random-walk search the number of steps s until approximately the whole graph is revealed is given by s ~ N^{3(1−2/γ)}, where γ is the power-law exponent, while high-degree search leads to a much more favorable scaling s ~ N^{2−4/γ}.
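For an illustrative sense of these scalings, a short worked example, assuming γ = 2.1 (one of the exponents considered later):

```latex
% Illustrative evaluation of the Adamic et al. scaling laws for gamma = 2.1
s_{\mathrm{random\ walk}} \sim N^{3(1 - 2/\gamma)} = N^{3(1 - 2/2.1)} \approx N^{0.14},
\qquad
s_{\mathrm{high\ degree}} \sim N^{2 - 4/\gamma} = N^{2 - 4/2.1} \approx N^{0.10}.
```

Both exponents are well below 1, and the high-degree exponent is the smaller of the two, which is why high-degree search reveals the graph in fewer steps as N grows.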

The assumption of equal edge weights (meaning the cost, bandwidth, distance, or power consumption associated with the process described by the edge) usually does not hold in real-world networks. As pointed out by many researchers [11–17], it is incomplete to assume that all the links are equivalent while studying the dynamics of large-scale networks. The total path length p in a weighted network for the path 1-2-3-...-n is given by p = Σ_{i=1}^{n−1} w_{i,i+1}, where w_{i,i+1} is the weight on the edge from node i to node i+1. Even though high-degree search results in a path with a smaller number of hops, the total path length may be high if the weights on these edges are high. Thus, to be more realistic and closer to real-world networks, we need to explicitly incorporate weights in any proposed search algorithm. In this paper, we are interested in designing decentralized search strategies for networks that have the following properties:

(1) Its node degree distribution follows a power law with exponent varying from 2.0 to 3.0. Although we discuss the search strategies for networks with a Poisson degree distribution (ER random graphs), we concentrate more on scale-free networks since most real-world networks are found to exhibit this behavior [1–3].

(2) It has nonuniformly distributed weights on the edges. Here the weights signify the cost or time taken to pass the message or query. Hence, smaller weights correspond to shorter and/or better paths. We consider different distributions such as beta, uniform, exponential, and power law.

(3) It is unstructured and decentralized. That is, each node has information only about its first and second neighbors, and no global information about the target is available. Also, the nodes can communicate only with their immediate neighbors.

(4) Its topology is dynamic (ad hoc) while still maintaining its statistical properties. These particular types of networks are becoming more prevalent due to advances made in different areas of engineering, especially in sensor networks [18], peer-to-peer networks [19] and dynamic supply chains [20]. In this paper we analyze the problem of finding decentralized algorithms in such weighted complex networks, which we believe has not been explored to date.

Among the search strategies employed in this paper is a strategy based on the local betweenness centrality (LBC) of nodes. Betweenness centrality (also called load), first developed in the context of social networks [28], has been recently adapted to optimal transport in weighted complex networks by Goh et al. [17]. These authors have shown that in the strong disorder limit (that is, when the total path length is dominated by the maximum edge weight over the path), the load distribution follows a power law for both ER random graphs and scale-free networks. To determine a node's betweenness as defined by Goh et al., one would need knowledge of the entire network. Here we define a local parameter called local betweenness centrality (LBC), which only uses information on the first and second neighbors of a node, and we develop a search strategy based on this local parameter.

III. LOCAL BETWEENNESS CENTRALITY

We assume that each node in the network has information about its first and second neighbors. For calculating the local betweenness centrality of the neighbors of a given node, we consider the local network formed by that node (which we will call the root node) and its first and second neighbors. Then the betweenness centrality, defined as the fraction of shortest paths going through a node [3], is calculated for the first neighbors in this local network. Let L(i) be the LBC of a neighbor node i in the local network. Then L(i) is given by

L(i) = Σ_{s ≠ i ≠ t; s,t ∈ local network} σ_st(i) / σ_st,

where σ_st is the total number of shortest paths (where shortest path means the path over which the sum of weights is minimal) from node s to t, and σ_st(i) is the number of these shortest paths passing through i.


If the LBC of a node is high, it implies that this node is critical in the local network. Intuitively, we can see that the LBC of a neighbor depends on both its degree and the weight of the edge connecting it to the root node. For example, let us consider the networks in Figs. 1(a) and 1(b). Suppose that these are the local networks of node 1. In the network in Fig. 1(a), node 2 has the highest degree among the neighbors of node 1 (i.e., nodes 2, 3, 4, and 5). All the shortest paths from the neighbors of node 2 (6, 7, 8, and 9) to other nodes must pass through node 2. Hence, we see that a higher degree for a node definitely helps in obtaining a higher LBC.

Now consider a similar local network but with a higher weight on the edge from 2 to 1, as shown in Fig. 1(b). In this network all the shortest paths through node 2 will also pass through node 3 (2-3-1) instead of going directly from node 2 to node 1. Hence, the LBC of the neighbor node 3 will be higher than that of neighbor 2. Thus we clearly see that the LBCs of the neighbors of node 1 depend on both the neighbors' degrees and the weights on the edges connecting them. Note that a neighbor having the highest degree or the smallest weight on the edge connecting it to the root node does not necessarily imply that it will have the highest LBC.
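A minimal sketch of the LBC computation on the local network, using networkx betweenness centrality on the subgraph of the root and its first and second neighbors (networkx counts each unordered s-t pair once, so its values differ from L(i) above by a constant factor of two, which does not affect the ranking of neighbors):

```python
import networkx as nx

def local_betweenness(G, root):
    """LBC of the root's first neighbours: betweenness centrality computed on the local
    network formed by the root, its first neighbours, and its second neighbours,
    using weighted shortest paths (edge attribute 'weight')."""
    first = set(G.neighbors(root))
    second = set()
    for v in first:
        second.update(G.neighbors(v))
    local = G.subgraph({root} | first | second)
    # Fraction of weighted shortest paths in the local network that pass through each node.
    bc = nx.betweenness_centrality(local, weight="weight", normalized=False)
    return {v: bc[v] for v in first}
```

The search strategy described in Sec. IV below would then forward the message to the neighbor with the largest of these values.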

IV. METHODOLOGY

In unweighted scale-free networks, Adamic et al. [9] have shown that high-degree search, which utilizes the heterogeneity in node degree, is efficient. Thus one expects that in weighted power-law networks, an efficient search strategy should consider both the edge weights and the node degree. We investigated the following set of search strategies, given in order of the amount of information required.

(1) Choose a neighbor randomly: The node tries to reach the target by passing the message/query to a randomly selected neighbor.

(2) Choose the neighbor with the smallest edge weight: The node passes the message along the edge with minimum weight. The idea behind this strategy is that by choosing a neighbor with minimum edge weight, the expected distance traveled would be less.

(3) Choose the best-connected neighbor: The node passes the message to the neighbor which has the highest degree. The idea here is that by choosing a neighbor which is well connected, there is a higher probability of reaching the target node. Note that this strategy takes the least number of hops to reach the target [9].

(4) Choose the neighbor with the smallest average weight: The node passes the message to the neighbor which has the smallest average weight. The average weight of a node is the average weight of all the edges incident on that node. The idea here is similar to the second strategy. Instead of passing the message greedily along the least weighted edge, the algorithm passes it to the node that has the minimum average weight.

(5) Choose the neighbor with the highest LBC: The node passes the message to the neighbor which has the highest LBC. A neighbor with the highest LBC would imply that many shortest paths in the local network pass through this neighbor and that the node is critical in the local network. Thus, by passing the message to this neighbor, the probability of reaching the target node quicker is higher.

Note that the strategy which depends on LBC uses slightly more information than strategy 4, namely the edge weights between second neighbors, but it is considerably more informative: it reflects the heterogeneities in both edge weights and node degree. Thus we expect that this search will perform better than the others, that is, that it will give smaller path lengths than the others.
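The following sketch shows one way the five neighbor-selection rules could be written in Python (an illustration under our own naming; edges are assumed to carry a 'weight' attribute, and lbc_fn stands for a routine such as the local_betweenness sketch in Sec. III):

```python
import random

def next_neighbor(G, node, visited, strategy, lbc_fn=None):
    """Choose the neighbour to forward the message to under strategies 1-5.
    'visited' is the set of node's neighbours that have already received the message."""
    candidates = [v for v in G.neighbors(node) if v not in visited]
    if not candidates:
        return None
    if strategy == "random_walk":                       # strategy 1
        return random.choice(candidates)
    if strategy == "min_edge_weight":                   # strategy 2
        return min(candidates, key=lambda v: G[node][v]["weight"])
    if strategy == "high_degree":                       # strategy 3
        return max(candidates, key=G.degree)
    if strategy == "min_avg_node_weight":               # strategy 4
        avg = lambda v: sum(G[v][u]["weight"] for u in G.neighbors(v)) / G.degree(v)
        return min(candidates, key=avg)
    if strategy == "high_lbc":                          # strategy 5
        lbc = lbc_fn(G, node)                           # e.g. the local_betweenness sketch above
        return max(candidates, key=lambda v: lbc[v])
    raise ValueError("unknown strategy: " + strategy)
```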

V. SIMULATIONS

For comparing the search strategies we used simulations on random networks with Poisson and power-law degree distributions. For homogeneous networks we used the Poisson random network model given by Erdős and Rényi [4]. We considered a network on N nodes where two nodes are connected with a connection probability p. For scale-free networks, we considered different values of the degree exponent γ ranging from 2.0 to 3.0 and a degree range of 2 ≤ k ≤ m ~ N^{1/γ}, and generated the network using the method given by Newman [27]. Once the network was generated, we extracted the largest connected component, shown to always exist for 2 < γ < 3.48 [29] and in ER networks for p > 1/N [5]. We did our analysis on this largest connected component, which contains the majority of the nodes, after verifying that the degree distribution of this largest connected component is nearly the same as in the original graph. The weights on the edges were generated from different distributions such as beta, uniform, exponential and power law. We considered these distributions in increasing order of their variances to understand how the heterogeneity in edge weights affects different search strategies.
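A sketch of how such weighted test networks could be generated with networkx (the configuration-model construction and the clipping of the degree sequence are simplifications of ours):

```python
import random
import networkx as nx

def weighted_test_network(n, kind="scale_free", gamma=2.1, p=0.004,
                          weight_dist="exponential", mean_w=5.0, seed=None):
    """Build an ER or power-law test network, keep its largest connected component,
    and draw independent edge weights (sketch)."""
    rng = random.Random(seed)
    if kind == "er":
        G = nx.gnp_random_graph(n, p, seed=seed)
    else:
        raw = nx.utils.powerlaw_sequence(n, gamma)        # power-law distributed floats
        degrees = [max(2, min(int(round(x)), n - 1)) for x in raw]
        if sum(degrees) % 2:                              # configuration model needs an even sum
            degrees[0] += 1
        G = nx.Graph(nx.configuration_model(degrees, seed=seed))
        G.remove_edges_from(nx.selfloop_edges(G))
    G = G.subgraph(max(nx.connected_components(G), key=len)).copy()
    for u, v in G.edges():
        if weight_dist == "uniform":
            G[u][v]["weight"] = rng.uniform(0.0, 2.0 * mean_w)
        else:                                             # exponential with the given mean
            G[u][v]["weight"] = rng.expovariate(1.0 / mean_w)
    return G
```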

Further, we randomly chose K pairs (source and target) of nodes.

FIG. 1. (a) In this configuration, neighbor node 2 has a higher LBC than the other neighbors 3, 4, and 5. This depicts why a higher degree for a node helps in obtaining a higher LBC. (b) However, in this configuration the LBC of the neighbor node 3 is higher than that of neighbors 2, 4, and 5. This is due to the fact that the edge connecting 1 and 2 has a larger weight. These two configurations show that the LBC of a neighbor depends both on the edge weight and the node degree. In both cases, edge weights other than those shown in the figure are assumed to be 1.


The source, and consecutively each node receiving the message, sends the message to one of its neighbors depending on the search strategy. The search continues until the message reaches the node whose neighbor is the target node. In order to avoid passing the message to a neighbor that has already received it, a list l_i of all the neighbors that received the message is maintained at each node i. During the search process, if node i passes the message to its neighbor j, which does not have any more neighbors that are not in the list l_j, then the message is routed back to node i. This particular neighbor j is marked to note that this node cannot pass the message any further. The average path distance was calculated for each search strategy from the paths obtained for these K pairs. We repeated this simulation for 10 to 50 instances of the Poisson and power-law networks, depending on the size of the network.
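A compact sketch of this message-passing procedure (it assumes the next_neighbor sketch above; routing dead-ended messages back is approximated with a simple stack rather than the per-node marking described in the text):

```python
def decentralized_search(G, source, target, strategy, lbc_fn=None, max_hops=10**6):
    """Forward a message from 'source' until a node adjacent to 'target' is reached,
    accumulating the traversed edge weights. Each node keeps the set of neighbours
    that already received the message; dead ends hand the message back."""
    received = {v: set() for v in G}
    stack = [source]                       # supports routing the message back
    current, length, hops = source, 0.0, 0
    while target not in G[current] and hops < max_hops:
        nxt = next_neighbor(G, current, received[current], strategy, lbc_fn)
        if nxt is None:                    # dead end: route the message back
            if len(stack) == 1:
                break
            stack.pop()
            nxt = stack[-1]
        else:
            received[current].add(nxt)
            stack.append(nxt)
        length += G[current][nxt]["weight"]
        hops += 1
        current = nxt
    return length
```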

VI. ANALYSIS

First, we study and compare the different search strategies on ER random graphs. The weights on the edges were generated from an exponential distribution with mean 5 and variance 25. Table I compares the performance of each strategy for networks of size 500, 1000, 1500, and 2000 nodes. We took the connection probability to be p = 0.004, and hence a giant connected component always exists [5]. From Table I, it is evident that the strategy which passes the message to the neighbor with the least edge weight is better than all the other strategies in homogeneous networks. Remarkably, a search strategy that needs less information than the other strategies (3, 4, and 5) performed best, while high-degree and LBC search did not perform well since the network is highly homogeneous in node degree.

However, if we decrease the heterogeneity in edge weights (use a distribution with lower variance), we observe that high-LBC search performs best (see Table II). In conclusion, when the heterogeneity of edge weights is high compared to the relative homogeneity of node degrees, the search strategies which are purely based on edge weights perform better. However, as the heterogeneity of the edge weights decreases, the importance of edge weights decreases and strategies which consider both edge weights and node degree perform better.

Next we investigated how the search strategies perform on scale-free networks. Figure 2 shows the scaling of the different search strategies for scale-free networks with exponent 2.1. As conjectured, the search strategy that utilizes the heterogeneities of both the edge weights and the node degrees (the high-LBC search) performed better than the other strategies. A similar phenomenon was observed for different exponents of the scale-free network (see Table III). Except for the power-law exponent 2.9, the high-LBC search was consistently better than the others. We observe that as the heterogeneity in the node degree decreases (i.e., as the power-law exponent increases), the difference between the high-LBC search and the other strategies decreases. When the exponent is 2.9, the performance of the LBC, minimum edge weight and high-degree searches was almost the same. Note that when the network becomes homogeneous in node degree, the minimum edge weight search performs better than high-LBC search (Table I).

TABLE I. Comparison of search strategies in a Poisson random network. The edge weights were generated randomly from an exponential distribution with mean 5 and variance 25. The values in the table are the average path distances obtained for each search strategy in these networks. The strategy which passes the message to the neighbor with the least edge weight performs the best.

Search strategy 500 nodes 1000 nodes 1500 nodes 2000 nodes

Random walk 1256.3 2507.4 3814.9 5069.5

Minimum edge weight 597.6 1155.7 1815.5 2411.2

Highest degree 979.7 1923.0 2989.2 3996.2

Minimum average node weight 832.1 1652.7 2540.5 3368.6

Highest LBC 864.7 1800.7 2825.3 3820.9

TABLE II. Comparison of search strategies in a Poisson random network with 2000 nodes. The table gives results for different edge weight distributions. The mean for all the distributions is 5 and the variance is σ². The values in the table are the average path lengths obtained for each search strategy in these networks. When the weight heterogeneity is high, the minimum edge weight search strategy was the best. However, when the heterogeneity of edge weights is low, LBC performs better.

Search strategy                 Beta (σ² = 2.3)   Uniform (σ² = 8.3)   Exp. (σ² = 25)   Power law (σ² = 4653.8)
Random walk                     1271.91           1284.9               1253.68          1479.32
Minimum edge weight             1017.74           767.405              577.83           562.39
Highest degree                  994.64            1014.05              961.5            1182.18
Minimum average node weight     1124.48           954.295              826.325          732.93
Highest LBC                     980.65            968.775              900.365          908.48


This implies that, similarly to high-degree search [9], the effectiveness of high-LBC search also depends on the heterogeneity in node degree.

Table IV shows the performance of all the strategies on a scale-free network (exponent 2.1) with different edge weight distributions. The percentage values in brackets show by how much the average distance for that search is higher than the average distance obtained by the high-LBC search. As in random graphs, we observe that the impact of edge weights on the search strategies increases as the heterogeneity of the edge weights increases. For instance, when the variance (heterogeneity) of edge weights is small, high-degree search is better than the minimum edge weight search. On the other hand, when the variance (heterogeneity) of edge weights is high, the minimum edge weight strategy is better than high-degree search. In each case, the high-LBC search, which reflects both edge weights and node degree, always outperformed the other strategies. Thus, it is clear that in power-law networks, irrespective of the edge weight distribution and the power-law exponent, high-LBC search always performs better than the other strategies (Tables III and IV).

Figure 3 gives a pictorial comparison of the behavior of high-degree and high-LBC search as the heterogeneity of the edge weights increases (based on the results shown in Table IV). Since many studies [11–17] have shown that there is a large heterogeneity in the capacity and strengths of the interconnections in real networks, it is important that local search is based on LBC rather than on high degree as shown by Adamic et al. [9].

Note that LBC has been adapted from the definition of betweenness centrality (BC), which requires global knowledge of the network. BC is defined as the fraction of shortest paths among all nodes in the network that pass through a given node and measures how critical the node is for optimal transport in complex networks. In unweighted scale-free networks there exists a scaling relation between node betweenness centrality and degree, BC ~ k^η [30]. This implies that the higher the degree, the higher the BC of the node. This may be the reason why high-degree search is optimal in unweighted scale-free networks (as shown by Adamic et al. [9]). However, Goh et al. [17] have shown that no scaling relation exists between node degree and betweenness centrality in weighted complex networks. It will be interesting to examine the relationship between local and global betweenness centrality in our future work. Also, note that the minimum average node weight strategy (strategy 4) uses only slightly less information than LBC search. However, LBC search consistently and significantly outperforms it (see Tables I–IV). This implies that LBC search uses the information correctly.

TABLE III. Comparison of search strategies in power-law networks of 2000 nodes with different power-law exponents. The edge weights are generated from an exponential distribution with mean 5 and variance 25. The values in the table are the average path lengths obtained for each search strategy in these networks. LBC search, which reflects the heterogeneities in both edge weights and node degree, performed the best for all power-law exponents. The systematic increase in all path lengths with the increase of the power-law exponent γ is due to the fact that the average degree of the network decreases with γ.

                                Power-law exponent
Search strategy                 2.1        2.3        2.5        2.7        2.9
Random walk                     1108.70    1760.58    2713.11    3894.91    4769.75
Minimum edge weight             318.95     745.41     1539.23    2732.01    3789.56
Highest degree                  375.83     761.45     1519.74    2693.62    3739.61
Minimum average node weight    605.41     1065.34    1870.43    3042.27    3936.03
Highest LBC                     298.06     707.25     1490.48    2667.74    3751.53

FIG. 2. Scaling of the search strategies in power-law networks with exponent 2.1. The edge weights are generated from an exponential distribution with mean 10 and variance 100. The curves correspond to the random walk and to the search algorithms based on minimum edge weight, high degree, minimum average node weight, and high LBC.

FIG. 3. Pictorial comparison of the behavior of high-degree and high-LBC search as the heterogeneity of edge weights increases in power-law networks. Note that the average distances are normalized with respect to the high-LBC search.



VII. LBC ON UNWEIGHTED NETWORKS

In this section, we show that the neighbor with the highest LBC is usually the same as the neighbor with the highest degree in unweighted networks. Hence, high-LBC search gives identical results to high-degree search in unweighted networks. As mentioned earlier, in unweighted scale-free networks there is a scaling relation between the (global) BC of a node and its degree, BC ~ k^η [30]. However, this does not imply that in an unweighted local network the neighbor with the highest LBC is always the same as the neighbor with the highest degree. Here, we show that in most cases the highest-degree and the highest-LBC neighbors coincide. First, let us consider a tree-like local network without any loops, similar to the network configuration shown in Fig. 4(a). In a local network, there are three types of nodes, namely, the root node, the first neighbors and the second neighbors. Let the degree of the root node be d and the degrees of the neighbors be k_1, k_2, k_3, ..., k_d. The number of nodes n in the local network is n = 1 + Σ_{j=1}^{d} k_j (one root node, d first neighbors and Σ_{j=1}^{d} (k_j − 1) second neighbors). In a tree network there is a single shortest path between any pair of nodes s and t, thus σ_st(i) is either zero or one. Then the LBC of a first neighbor i is given by L(i) = (k_i − 1)(n − 2) + (k_i − 1)(n − k_i), where k_i is the degree of the neighbor. The first term is due to the shortest paths from the k_i − 1 neighbors of node i to the n − 2 remaining nodes (other than node i and the neighbor j) in the network. The second term is due to the shortest paths from the n − k_i nodes (other than the k_i − 1 neighbors and node i) to the k_i − 1 neighbors of node i. Note that we choose not to explicitly take into account the symmetry of distance in undirected networks and count the s-t and t-s paths separately. L(i) is an increasing function of k_i if k_i < n − 1/2, a condition that is always satisfied since n = 1 + Σ_{j=1}^{d} k_j. This implies that in a local network with a tree-like structure, the neighbor with the highest degree has the highest LBC. We extend the above result to other configurations of the local network by considering the different possible cases.
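This closed form can be checked by brute force on a small tree-like local network; a short sketch (networkx betweenness counts each unordered pair once, so it is doubled before the comparison):

```python
import networkx as nx

def check_tree_lbc(first_degrees=(4, 3, 2, 2)):
    """Root node 0 with first neighbours 1..d; neighbour j of degree k_j gets k_j - 1 leaf
    second neighbours. Compare L(i) = (k_i-1)(n-2) + (k_i-1)(n-k_i) with brute force."""
    G, nxt = nx.Graph(), len(first_degrees) + 1
    for j, k in enumerate(first_degrees, start=1):
        G.add_edge(0, j)
        for _ in range(k - 1):
            G.add_edge(j, nxt)
            nxt += 1
    n = G.number_of_nodes()
    bc = nx.betweenness_centrality(G, normalized=False)   # counts each unordered pair once
    for j, k in enumerate(first_degrees, start=1):
        formula = (k - 1) * (n - 2) + (k - 1) * (n - k)
        assert 2 * bc[j] == formula, (j, 2 * bc[j], formula)
    print("closed form matches brute force on", n, "nodes")

check_tree_lbc()
```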

The possible edges other than the edges present in a tree-like local network are an edge between two first neighbors, an edge between a first neighbor and a second neighbor, and an edge between two second neighbors. As shown in Fig. 4(b), an edge between two first neighbors changes the LBC of the root node but not that of the neighbors. Figure 4(c) shows a configuration of a local network with an edge added between a first and a second neighbor. Now, there is a small change in the LBCs of the neighbors (nodes 2 and 3) which are connected to a common second neighbor (node 9).

TABLE IV. Comparison of search strategies in power-law networks with exponent 2.1 and 2000 nodes with different edge weight distributions. The mean for all the edge weight distributions is 5 and the variance is σ². The values in the table are the average distances obtained for each search strategy in these networks. The values in brackets show the relative difference between the average distance for each strategy and the average distance obtained by the LBC strategy. LBC search, which reflects the heterogeneities in both edge weights and node degree, performed the best for all edge weight distributions.

Search strategy                 Beta (σ² = 2.3)    Uniform (σ² = 8.3)   Exp. (σ² = 25)     Power law (σ² = 4653.8)
Random walk                     1107.71 (202%)     1097.72 (241%)       1108.70 (272%)     1011.21 (344%)
Minimum edge weight             704.47 (92%)       414.71 (29%)         318.95 (7%)        358.54 (44%)
Highest degree                  379.98 (4%)        368.43 (14%)         375.83 (26%)       394.99 (59%)
Minimum average node weight     1228.68 (235%)     788.15 (145%)        605.41 (103%)      466.18 (88%)
Highest LBC                     366.26             322.30               298.06             247.77

FIG. 4. (a) A configuration of a local network with a tree-like structure. In such local networks, the neighbor with the highest degree has the highest LBC. (b) A local network with an edge between two first neighbors. Here again the neighbor with the highest degree has the highest LBC. (c) A local network with an edge between a first neighbor and a second neighbor. Although there is a change in the LBCs of the neighbors, the order remains the same.


Since node 9 is now shared by neighbors 2 and 3, the LBC contributed by node 9 is divided between these two neighbors. The LBC of such a neighbor i is L(i) = (k_i − 2)(n − 2) + (k_i − 2)(n − k_i) + (n − k_j − 1), where k_i is the degree of the neighbor i and k_j is the degree of the neighbor with which node i has a common second neighbor. The decrease in the LBC of neighbor i is (n − k_i + k_j − 1). If there are two neighbors with the same degree (one with a common second neighbor and another without any), then the neighbor without any common second neighbors will have the higher LBC. Another possible change of order with respect to LBC would be with a neighbor l of degree k_l = k_i − 1 (if it exists). However, L(i) − L(l) = (n − k_i − k_j + 1) is always greater than 0, since n = Σ_{j=1}^{d} k_j in this local network. Thus the only scenario in which the order of the neighbors with respect to LBC differs from their order with respect to degree when adding an edge between a first and a second neighbor is if that creates two first neighbors with the same degree. A similar argument leads to an identical conclusion in the case of adding an edge between two second neighbors as well.

The above discussion suggests that the highest-degree neighbor is always the same as the highest-LBC neighbor. This is not true in a few peculiar instances of local networks. For example, consider the network shown in Fig. 5, which has several edges between the first and second neighbors. We see that the highest-degree neighbor is not the same as the highest-LBC neighbor. In this local network, the highest-degree first neighbor (node 2) participates in several four-node circuits that include the root node. Thus, there are multiple shortest paths starting from second-neighbor nodes on these cycles (nodes 6, 7, 9, 10), and the contributions to node 2's LBC from the paths that pass through it are smaller than unity; consequently, the LBC of node 2 will be relatively small. This may be one of the reasons why the highest-degree neighbor node 2 is not the highest-LBC neighbor. We feel that this happens only in some special instances of local networks. From about 50,000 simulations we found that in 99.63% of cases the highest-degree neighbor is the same as the highest-LBC neighbor. Hence, we can conclude that in unweighted networks the neighbor with the highest LBC is usually identical to the neighbor with the highest degree.

VIII. CONCLUSION

In this paper we have given a new direction for local search in complex networks with heterogeneous edge weights. We proposed a local search algorithm based on a new local measure called local betweenness centrality. We studied the complex trade-offs presented by efficient local search in weighted complex networks and showed that heterogeneity in edge weights has a huge impact on search. Moreover, the impact of edge weights on search strategies increases as the heterogeneity of the edge weights increases. We also demonstrated that the search strategy based on LBC utilizes the heterogeneity in both the node degree and edge weight to perform the best in power-law weighted networks. Furthermore, we have shown that in unweighted power-law networks the neighbor with the highest degree is usually the same as the neighbor with the highest LBC. Hence, our proposed search strategy based on LBC is more universal and is efficient in a larger class of complex networks.

ACKNOWLEDGMENTS

The authors would like to acknowledge the National Science Foundation (Grant No. SST 0427840) and a Sloan Research Fellowship to one of the authors (R. A.) for making this work feasible. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF). In addition, the first author (H.P.T.) would like to thank Usha Nandini Raghavan for interesting discussions on issues related to this work.

[1] R. Albert and A. L. Barabasi, Rev. Mod. Phys. 74, 1 (2002).
[2] S. N. Dorogovtsev and J. F. F. Mendes, Adv. Phys. 51, 1079 (2002).
[3] M. E. J. Newman, SIAM Rev. 45, 167 (2003).
[4] P. Erdos and A. Renyi, Publ. Math. (Debrecen) 6, 290 (1959).
[5] B. Bollobas, Random Graphs (Academic, London, 1985).
[6] D. J. Watts and S. H. Strogatz, Nature (London) 393, 440 (1998).
[7] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A.-L. Barabási, Science 297, 1551 (2002).
[8] R. Albert, A. L. Barabási, and H. Jeong, Nature (London) 406, 378 (2000); R. Albert, I. Albert, and G. L. Nakarado, Phys. Rev. E 69, 025103 (2004).
[9] L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A. Huberman, Phys. Rev. E 64, 046135 (2001).

FIG. 5. An instance of a local network where the order of neighbors with respect to LBC is not the same as the order with respect to node degree.


[10] R. Pastor-Satorras and A. Vespignani, Phys. Rev. E 63, 066117 (2001); Phys. Rev. Lett. 86, 3200 (2001); Phys. Rev. E 65, 035108(R) (2002); 65, 036104 (2002); in Handbook of Graphs and Networks, edited by S. Bornholdt and H. G. Schuster (Wiley-VCH, Berlin, 2003).
[11] M. Granovetter, Am. J. Sociol. 78, 1360 (1973); M. E. J. Newman, Phys. Rev. E 64, 016132 (2001).
[12] S. H. Yook, H. Jeong, A. L. Barabasi, and Y. Tu, Phys. Rev. Lett. 86, 5835 (2001); J. D. Noh and H. Rieger, Phys. Rev. E 66, 066127 (2002); L. A. Braunstein, S. V. Buldyrev, R. Cohen, S. Havlin, and H. E. Stanley, Phys. Rev. Lett. 91, 168701 (2003); A. Barrat, M. Barthelemy, and A. Vespignani, Phys. Rev. E 70, 066149 (2004).
[13] S. L. Pimm, Food Webs, 2nd ed. (The University of Chicago Press, Chicago, IL, 2002).
[14] A. E. Krause, K. A. Frank, D. M. Mason, R. E. Ulanowicz, and W. W. Taylor, Nature (London) 426, 282 (2003); E. Almaas, B. Kovacs, T. Vicsek, Z. N. Oltvai, and A. L. Barabasi, ibid. 427, 839 (2004).
[15] A. Barrat, M. Barthelemy, R. Pastor-Satorras, and A. Vespignani, Proc. Natl. Acad. Sci. U.S.A. 101, 3747 (2004); R. Guimera, S. Mossa, A. Turtschi, and L. A. N. Amaral, ibid. 102, 7794 (2005).
[16] R. Pastor-Satorras and A. Vespignani, Evolution and Structure of the Internet: A Statistical Physics Approach (Cambridge University Press, Cambridge, 2004).
[17] K. I. Goh, J. D. Noh, B. Kahng, and D. Kim, cond-mat/0410317 (unpublished).
[18] D. Estrin, R. Govindan, J. Heidemann, and S. Kumar, Proceedings of the Fifth Annual ACM/IEEE International Conference on Mobile Computing and Networking, 1999, pp. 263–270; U. N. Raghavan, H. P. Thadakamalla, and S. R. T. Kumara, Proceedings of the Thirteenth International Conference on Advanced Computing and Communications (ADCOM), 2005.
[19] G. Kan, in Peer-to-Peer: Harnessing the Power of Disruptive Technologies, edited by A. Oram (O'Reilly, Beijing, 2001); T. Hong, in Peer-to-Peer: Harnessing the Power of Disruptive Technologies, edited by A. Oram (O'Reilly, Beijing, 2001).
[20] H. P. Thadakamalla, U. N. Raghavan, S. R. T. Kumara, and R. Albert, IEEE Intell. Syst. 19, 24 (2004).
[21] J. Kleinberg, Nature (London) 406, 845 (2000); Proceedings of the 32nd ACM Symposium on Theory of Computing, 2000, pp. 163–170; Adv. Neural Inf. Process. Syst. 14, 431 (2001).
[22] D. J. Watts, P. S. Dodds, and M. E. J. Newman, Science 296, 1302 (2002).
[23] L. A. Adamic and E. Adar, cond-mat/0310120 (unpublished).
[24] A. Arenas, A. Cabrales, A. Diaz-Guilera, R. Guimera, and F. Vega, in Statistical Mechanics of Complex Networks, edited by R. Pastor-Satorras, M. Rubi, and A. Diaz-Guilera (Springer-Verlag, Berlin, 2003).
[25] S. Milgram, Psychol. Today 1, 61 (1967).
[26] D. J. Watts, P. S. Dodds, and R. Muhamad, http://smallworld.columbia.edu/index.html
[27] M. E. J. Newman, in Handbook of Graphs and Networks, edited by S. Bornholdt and H. G. Schuster (Wiley-VCH, Berlin, 2003).
[28] S. Wasserman and K. Faust, Social Network Analysis (Cambridge University Press, Cambridge, UK, 1994).
[29] W. Aiello, F. Chung, and L. Lu, Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, 2000, pp. 171–180.
[30] K. I. Goh, B. Kahng, and D. Kim, Phys. Rev. Lett. 87, 278701 (2001).


Search in spatial scale-free networks

H P Thadakamalla1,3, R Albert2 and S R T Kumara1

1 Department of Industrial Engineering, The Pennsylvania State University, University Park, Pennsylvania, 16802, USA
2 Department of Physics, The Pennsylvania State University, University Park, Pennsylvania, 16802, USA
E-mail: [email protected], [email protected] and [email protected]

New Journal of Physics 9 (2007) 190
Received 12 March 2007
Published 28 June 2007
Online at http://www.njp.org/
doi:10.1088/1367-2630/9/6/190

Abstract. We study the decentralized search problem in a family of parameterized spatial network models that are heterogeneous in node degree. We investigate several algorithms and illustrate that some of these algorithms exploit the heterogeneity in the network to find short paths by using only local information. In addition, we demonstrate that the spatial network model belongs to a class of searchable networks for a wide range of parameter space. Further, we test these algorithms on the US airline network, which belongs to this class of networks, and demonstrate that searchability is a generic property of the US airline network. These results provide insights on designing the structure of distributed networks that need effective decentralized search algorithms.

3 Author to whom any correspondence should be addressed.


Contents

1. Introduction
2. Literature and problem description
3. Decentralized search algorithms
4. Spatial network model and search analysis
   4.1. Simulation and analysis
5. Search in the US airline network
   5.1. Properties of the US airline network
   5.2. Search results and analysis
6. Conclusions and discussion
Acknowledgments
References

1. Introduction

Recently, many large-scale distributed systems in communications, sociology, and biology have been represented as networks and their macroscopic properties have been extensively studied [1]–[4]. One of the major findings is the presence of heterogeneity in network properties. For example, the distribution of node degree (i.e. the number of edges incident on a node) for many real-world networks including the Internet, the World Wide Web, phone call networks, scientific collaboration networks and metabolic networks is found to be highly heterogeneous and to follow a power law, p(k) ~ k^−γ, where p(k) is the fraction of nodes with degree k. The clustering coefficients, quantifying local order and cohesiveness [5], are also found to be heterogeneous, i.e. C(k) ~ k^−1 [6]. Further, in many networks the node betweenness centrality, which quantifies the number of shortest paths that pass through a node, is found to be heterogeneous [7]. These heterogeneities have a demonstrably large impact on the network's resilience [8, 9] as well as on navigation, local search [10, 11], and spreading processes [12].

Another interesting property exhibited by these networks is the 'small-world phenomenon', whereby almost every node is connected to every other node by a path with a small number of edges. This phenomenon was first demonstrated by Milgram's famous experiment in 1960 [13]. Milgram randomly selected individuals from Wichita, Kansas and Omaha, Nebraska and requested them to direct letters to a target person in Boston, Massachusetts. The participants, and consecutively each person receiving the letter, were asked to send it to an acquaintance whom they judged to be closer to the target. Surprisingly, the average length of these paths (i.e. the number of edges in the path) was approximately 6, illustrating the small-world property of social networks. An even more striking observation, which was later pointed out by Kleinberg [14]–[16], is that the nodes (participants) were able to find short paths by using only local information. Currently, Dodds et al are carrying out an Internet-based study to verify this phenomenon, and initial findings are published in [17].

The observation by Kleinberg raises two fundamental questions: (i) Why should social networks be structured in a way that local search is efficient? (ii) What is the structure of networks that exhibit this phenomenon? Kleinberg [14] and later Watts et al [18] argued that the emergence of such a phenomenon requires special topological features. They termed the


networks in which short paths can be found using only local information as searchable networks. These studies, along with a few others [10, 19], stimulated research on decentralized searching in complex networks [11], [20]–[26], a problem with many practical applications. In many networks, information such as data files and sensor data is stored at the nodes of a distributed network. In addition, the nodes have only limited or local information about the network. Hence, to access this information quickly, one should have efficient algorithms that can find the target node using the available local information. Examples include routing of sensor data in wireless sensor networks [27, 28], locating data files in peer-to-peer networks [26, 29], and finding information in distributed databases [30]. For the search process to be efficient, it is important that these networks are designed to be searchable. The importance of search efficiency becomes even more pressing in the case of ad-hoc networks, where the networks are decentralized and distributed, and real-time searching is required to find the target node.

In this paper, we study the decentralized search problem in a family of parameterized spatial network models that are heterogeneous in node degree. We propose several decentralized search algorithms and examine their performance by simulating them on the spatial network model for various parameters. As pointed out in [25], our analysis reveals that the optimal search algorithm should effectively incorporate the direction of travel and the degree of the neighbour. We illustrate that some of these algorithms exploit the heterogeneities present in the network to find paths as short as the paths found by using global information; thus we demonstrate that the spatial network model considered defines a class of searchable networks. Further, we test these algorithms on the US airline network, which belongs to this class of networks, and show that searchability is a generic property of the US airline network.

2. Literature and problem description

Decentralized searching in networks can be broadly classified into searching in unstructured networks (as in peer-to-peer networks such as Gnutella [29]) and in structured/spatial networks (as in wireless sensor networks). In unstructured networks, the global position of a node cannot be quantified and it is difficult to know whether a step in the search process is towards the target node or away from the target node. Hence, it is difficult to obtain short paths using local information. In unstructured networks with power-law degree distributions, Adamic et al [10] showed that a high-degree-seeking search is better than a random-walk search. In a random-walk search, the node that has the message passes it to a randomly chosen neighbour, and the process continues until it reaches the target node. In a high-degree search, by contrast, the node that has the message passes it to the neighbour with the highest degree. Thadakamalla et al [11] proposed a more general algorithm based on a local measure, local betweenness centrality (LBC), for networks which are heterogeneous both in edge weights and in node degree. They demonstrated that the search based on LBC utilizes the heterogeneities in edge weights and node degree to perform the best in power-law (scale-free) weighted networks.

In structured networks, the nodes are embedded in a metric space and are connected based on the metric distance. Here, the global position of the target node in the space can guide the search process to reach the target node more quickly. In [14, 15], Kleinberg studied search in a family of grid-based models that generalize the Watts–Strogatz [5] model. He proved that only one particular model among this infinite family can support efficient decentralized algorithms. In this model, a simple greedy search, where each node passes the message to the neighbour closest to the target node based on the grid distance, is able to give short paths.

He further extended this model to hierarchical networks [16], where, again, the network was proven to be searchable only for a specific parameter value. Unfortunately, the model given by Kleinberg represents only a very small subset of complex networks. Independently, Watts et al presented another model based upon plausible hierarchical social structures [18] to explain the phenomena observed in Milgram's experiment. The networks were shown to be searchable by a greedy search algorithm for a wide range of the parameter space. Other works on decentralized searching include [20]–[26]. Simsek and Jensen [25] use homophily between nodes and degree disparity in the network to design a better algorithm for finding the target node. However, finding an optimal way to combine location and degree information has yet to be investigated (see [21] for a review). Another interesting problem, studied by Clauset and Moore [31] and by Sandberg [24], is how real-world networks evolve to become searchable. They propose a simple feedback mechanism in which nodes continuously conduct decentralized searches and, in the process, partially rewire the edges to form a searchable network.

In this paper, we consider search in a family of parameterized spatial network models that are heterogeneous in node degree. In this model, nodes are placed in an n-dimensional space and are connected, based on preferential attachment and geographical constraints, to form spatial scale-free networks. Preferential attachment to high-degree nodes is believed to be responsible for the emergence of the power-law degree distributions observed in many real-world networks [32], and geographical constraints account for the fact that nodes tend to connect to nodes that are nearby. Many real-world networks, such as the Internet [33] and the worldwide airline network [34], can be described by this family of spatial network models. Our objective is to design decentralized search algorithms for this type of network model and to demonstrate that this simple model defines a class of searchable networks. A decentralized search algorithm attempts to send a message from a starting node s to the target node t along the edges of the network using local information. Each node has information about the position of the target node, the positions of its neighbours, and the degrees of its neighbours. Using this information, the start node, and consecutively each node receiving the message, passes the message to one of its neighbours based on the search algorithm until it reaches the target node. We evaluate each algorithm based on the number of hops taken for the message to reach the target node; the lower the number, the better the performance of the algorithm. Another potentially relevant measure is the physical distance travelled by each search algorithm. However, the number of hops is the most pertinent distance measure in many networks, including social networks, the Internet and even airline networks, as the delays associated with switching between edges are comparable to the delays associated with traversing an edge.

As observed in previous studies [10, 11], we expect that the heterogeneity present in spatial scale-free networks influences the search process. In the following section, we discuss why the degree of a node's neighbour is important and propose different ways of composing the direction of travel and the degree of the neighbour.

3. Decentralized search algorithms

A simple search algorithm in spatial networks is greedy search, where each node passes the message to the neighbour closest to the target node. Let di be the distance to the target node from each neighbour i (see figure 1(a)) and let ki be the degree of the neighbour i.

Figure 1. (a) Illustration of a spatial network. di is the distance to the target node from each neighbour i and ki is the degree of the neighbour i. (b) Illustration demonstrating that it is sometimes better to choose a neighbour with higher degree, i.e. node 2 over node 1, even if we are going away from the target: this gives a higher probability of taking a longer step in the next iteration.

Greedy search chooses the neighbour with the smallest di. This ensures that the message always goes to the neighbour closest to the target node. However, greedy search may not be optimal in spatial scale-free networks that have high heterogeneity in node degree. Adamic et al [10] and Thadakamalla et al [11] have shown that search algorithms that utilize the heterogeneities present in the network perform substantially better than those that do not. Indeed, choosing a neighbour with higher degree, even at the cost of moving away from the target node, gives a higher probability of taking a longer step in the next iteration. For instance, in figure 1(b), it is better to choose node 2 instead of node 1 since node 2 can take a longer step towards the target node in the next iteration. In the following paragraphs, we show that the expected distance a neighbour can cover in the next iteration is a strictly increasing function of its degree.

We define the length of an edge as the Euclidian distance between the two nodes connected by the edge. Let P(X) be the probability distribution of edge lengths. Let Yk = Max{X1, X2, X3, . . ., Xk}, where X1, X2, X3, . . ., Xk are independent and identically distributed (i.i.d.) random variables with distribution function P(X). The cumulative distribution function of Yk is

P[Yk ≤ y] = Π_{i=1..k} P[Xi ≤ y] = [P(X1 ≤ y)]^k.

This implies

E(Yk) = ∫_0^∞ (1 − [P(X1 ≤ y)]^k) dy.

Since P(X1 ≤ y) ≤ 1 for all y,

[P(X1 ≤ y)]^k1 ≥ [P(X1 ≤ y)]^k2 if k1 ≤ k2,

implying that E(Yk1) ≤ E(Yk2) if k1 ≤ k2. Similarly, we can show that if P(X) is not a delta function then

E(Yk1) < E(Yk2) if k1 < k2.
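
This monotonicity is easy to check numerically. The short sketch below (our own illustration, not part of the original study) estimates E(Yk) by Monte Carlo sampling, using a uniform edge-length distribution purely as an example; any non-degenerate P(X) shows the same trend.

import random

def expected_max_edge_length(k, samples=100000):
    # Monte Carlo estimate of E(Yk) = E[max of k i.i.d. edge lengths].
    # Edge lengths are drawn uniformly from (0, 1) purely for illustration;
    # any non-degenerate distribution gives the same qualitative behaviour.
    total = 0.0
    for _ in range(samples):
        total += max(random.random() for _ in range(k))
    return total / samples

for k in (1, 2, 5, 10, 50):
    print(k, round(expected_max_edge_length(k), 3))
# E(Yk) grows with k: approximately 0.50, 0.67, 0.83, 0.91, 0.98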

Now consider two neighbours n1 and n2 with degrees k1 and k2. The expected distances that neighbours n1 and n2 can cover in the next iteration, irrespective of direction, are E[Yk1−1] and E[Yk2−1] respectively. This implies that E[Yk1−1] > E[Yk2−1] if k1 > k2. Here we approximate X1, X2, X3, . . ., Xk as independent, which is valid when the number of edges is large. Hence, if we choose a neighbour with higher degree, there is a greater probability of taking a longer step in the next iteration. Thus one expects that in spatial scale-free networks an efficient algorithm should combine the direction of travel, quantified by di, and the degree of the neighbour, ki, into one measure. Since the units of di and ki are different, there is no single obvious composition that is optimal. The aim of the measure is to choose a neighbour with smaller di and larger ki, with the intuition that a higher-degree node should effectively decrease the distance from the target, a goal which can be achieved in many different ways. One could compute an incentive g(ki) and subtract it from the distance di; one could also divide di by ki or by any increasing function of ki. We investigated the following search algorithms, which cover a broad spectrum of possibilities.

1. Random walk: the node attempts to reach the target by passing the message to a randomly selected neighbour.

2. High-degree search: the node passes the message to the neighbour with the highest degree. The idea here is that by choosing a neighbour that is well connected, there is a higher probability of reaching the target node. Note that this algorithm requires the fewest number of hops to reach the target in unstructured networks [10].

3. Greedy search: the node passes the message to the neighbour i with the smallest di. This ensures that the message is always going to the neighbour closest to the target node.

4. Algorithm 4: the node passes the message to the neighbour i with the smallest measure di − g(ki). The function g(ki) is an incentive for choosing a neighbour of higher degree. Ideally, g(ki) should be the expected maximum length of an edge from a node with degree ki.

5. Algorithm 5: the node passes the message to the neighbour i that has the smallest measure (di/dm)^ki, where dm is the Euclidian distance between the most spatially distant nodes in the network and is used for normalizing di. We assume that dm is known to all the nodes in the network. Note that the algorithm prefers the neighbour that has lower di and higher ki.

6. Algorithm 6: the node passes the message to the neighbour i that has the smallest measure di/ki. Here, again, the algorithm prefers the neighbour that has lower di and higher ki.

7. Algorithm 7: the node passes the message to the neighbour i that has the smallest measure (di/dm)^(ln ki + 1). This is a conservative version of algorithm 5 with respect to ki.

8. Algorithm 8: the node passes the message to the neighbour i that has the smallest measure di/(ln ki + 1). This is a weaker version of algorithm 6 with respect to ki.

Algorithms 4 to 8 aim to capture both the direction of travel and the neighbours' degrees. Thus, we expect these algorithms to give smaller path lengths than the other algorithms. In the case of algorithm 4, it would be extremely difficult to define an incentive function independent of the parameters of the network. Hence, it may not be realistic to use this form of composition of direction of travel and degree of the neighbour. Even greedy search has a slight preference for high-degree nodes, since the probability of reaching a node with degree k is ∼ kpk [35], where pk is the fraction of nodes with degree k.

Hence, the proposed algorithms have to be extremely competitive to perform better than greedy search. The algorithms described above are mainly based on intuition. However, as we discuss later in the paper, the successful strategies are not restricted to these functional forms.
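
To make the selection rules concrete, the following sketch (our own illustration; the function and variable names are hypothetical) scores each neighbour by its distance to the target, di, and its degree, ki, and returns the neighbour that greedy search or algorithms 5–8 would choose. Here d_max plays the role of dm.

import math

def choose_neighbour(neighbours, rule, d_max=1.0):
    # Pick the next hop from a list of (node, d_i, k_i) tuples, where d_i is the
    # neighbour's distance to the target and k_i its degree.
    #   'greedy' : smallest d_i                                (algorithm 3)
    #   'alg5'   : smallest (d_i / d_max) ** k_i               (algorithm 5)
    #   'alg6'   : smallest d_i / k_i                          (algorithm 6)
    #   'alg7'   : smallest (d_i / d_max) ** (ln k_i + 1)      (algorithm 7)
    #   'alg8'   : smallest d_i / (ln k_i + 1)                 (algorithm 8)
    def score(entry):
        _, d_i, k_i = entry
        if rule == "greedy":
            return d_i
        if rule == "alg5":
            return (d_i / d_max) ** k_i
        if rule == "alg6":
            return d_i / k_i
        if rule == "alg7":
            return (d_i / d_max) ** (math.log(k_i) + 1)
        if rule == "alg8":
            return d_i / (math.log(k_i) + 1)
        raise ValueError("unknown rule: " + rule)
    return min(neighbours, key=score)[0]

# Neighbour 'b' is farther from the target but much better connected.
neighbours = [("a", 120.0, 2), ("b", 150.0, 40)]
print(choose_neighbour(neighbours, "greedy", d_max=1118.0))  # -> 'a'
print(choose_neighbour(neighbours, "alg6", d_max=1118.0))    # -> 'b'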

4. Spatial network model and search analysis

The spatial network model we consider incorporates both preferential attachment and geographical constraints. At each step during the evolution of the spatial network model, one of the following occurs [36]:

1. with probability p, a new edge is created between two existing nodes in the network;

2. with probability 1 − p, a new node is added and connected to m existing nodes in the network, with the constraint that multiple edges are not formed.

In both cases, the degrees of the nodes and the distances between them are considered when forming a new edge. In the first case, two nodes i and j are selected with probability

Π_ij ∝ ki kj / F(dij),

where ki is the degree of node i, dij is the Euclidian distance between nodes i and j, and F(dij) is an increasing function of dij. In the second case, a new node i is uniformly and randomly placed in an n-dimensional space and is connected to a pre-existing node j with probability

Π_j ∝ kj / F(dij).

The above process is repeated until the number of nodes in the network is N. Let the generated network be G(N, p, m, F, n). Here, the preferential attachment mechanism leads to a power-law degree distribution whose exponent can be tuned by changing the value of p [36] (see figure 2(a)). F(d) controls the truncation of the power-law decay, and if F(d) increases rapidly, then the power-law regime can disappear altogether [37]. Two widely used functions for F(d) are d^r [33] and exp(d/dchar) [37].
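
A minimal generator for this growth process might look like the sketch below (written for this monograph, not the original simulation code). It assumes F(d) = d^r and places nodes uniformly at random on an a × b rectangle; the pairwise selection step is written for clarity rather than speed.

import math
import random
import networkx as nx

def spatial_scale_free(N, p, m=1, r=1.0, a=1000.0, b=500.0, seed=None):
    # Grow a spatial scale-free network G(N, p, m, F = d^r, n = 2).
    # With probability p an edge is added between two existing nodes chosen with
    # weight k_i * k_j / d_ij^r; with probability 1 - p a new node is placed
    # uniformly at random and attached to m existing nodes with weight k_j / d_j^r.
    rng = random.Random(seed)
    G = nx.Graph()
    for i in range(2):  # seed the growth with two connected nodes
        G.add_node(i, pos=(rng.uniform(0, a), rng.uniform(0, b)))
    G.add_edge(0, 1)

    def dist(u, v):
        (x1, y1), (x2, y2) = G.nodes[u]["pos"], G.nodes[v]["pos"]
        return math.hypot(x1 - x2, y1 - y2) or 1e-9  # avoid division by zero

    while G.number_of_nodes() < N:
        if rng.random() < p:
            pairs = [(u, v) for u in G for v in G if u < v and not G.has_edge(u, v)]
            if pairs:
                weights = [G.degree(u) * G.degree(v) / dist(u, v) ** r for u, v in pairs]
                G.add_edge(*rng.choices(pairs, weights=weights)[0])
        else:
            new = G.number_of_nodes()
            G.add_node(new, pos=(rng.uniform(0, a), rng.uniform(0, b)))
            existing = [u for u in G if u != new]
            targets = set()
            while len(targets) < min(m, len(existing)):
                weights = [G.degree(u) / dist(new, u) ** r for u in existing]
                targets.add(rng.choices(existing, weights=weights)[0])
            for u in targets:
                G.add_edge(new, u)
    return G

# Example (comparable to the simulations below): G = spatial_scale_free(1000, 0.72, m=1, r=1)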

4.1. Simulation and analysis

We investigate the search algorithms by simulating them on networks generated by the above spatial network model. We generate the network on a two-dimensional grid with length a = 1000, breadth b = 500, and m = 1 for different values of N and p and different functions F. Once the network is formed, we randomly choose K pairs (source and target) of nodes and simulate the search algorithms. The source, and consecutively each node receiving the message, passes the message to one of its neighbours according to the search algorithm. For algorithm 4, we take the incentive function g(ki) to be the expected maximum distance a node with degree ki can cover in the next hop, that is, the expected maximum length of an edge from a node with degree ki. Empirically, we found that this function follows the form c1 ln ki + c2 for all the spatial networks considered. For algorithms 5 and 7, we let dm be √(a² + b²), the largest distance between two points in the considered space.

Table 1. Comparison of search algorithms on a spatial scale-free network of 1000 nodes in a two-dimensional space with length and breadth equal to 1000 and 500, respectively. l is the average path length for the paths found by the search algorithm, dpath is the average physical distance for the paths found by each search algorithm and c is the percentage of times a path was not found. The table summarizes the averages of l, dpath and c obtained from 10 simulations of the network with parameters p = 0.72 and r, for 2000 pairs. Note that the decentralized algorithms 5, 6, 7 and 8 perform as well as the shortest paths found by using global information. Even though greedy search performs well for the paths found (l and dpath), it is sometimes unable to find a path (c).

                      r = 1                      r = 2                      r = 3
                      l      dpath   c(%)       l      dpath   c(%)       l       dpath   c(%)
Random walk           41.68  10957   0          70.47  9414    0          138.07  9024    0
High-degree search    28.35  8032    0          54.85  8805    0          120.15  9848    0
Greedy search         3.37   787     0.17       3.59   600     0.83       4.53    537     2.11
Algorithm 4           10.22  2303    0.12       14.07  1987    0.46       20.08   1806    1.87
Algorithm 5           2.47   646     0          2.97   594     0          4.51    677     0.02
Algorithm 6           2.45   636     0          2.85   565     0          3.73    573     0.02
Algorithm 7           2.54   631     0          2.80   539     0          3.52    527     0.02
Algorithm 8           2.66   646     0          2.87   537     <0.01      3.54    514     0.07
Shortest path length  2.27   531     NA         2.55   435     NA         3.05    403     NA

We assume that it is sufficient if the message reaches a small neighbourhood of the target node defined by a circle of radius D. This is a realistic assumption in many real-world networks; for example, it is sufficient to reach one of the airports in the close neighbourhood of a destination city (especially when the city has multiple airports). The search process continues until the message reaches a neighbour of the target node or a node within a circle of radius D = 50 centred on the target node. In order to avoid passing the message to a neighbour that has already received the message, a list L is maintained. During the search process, if the message reaches a node i whose neighbours are all in the list L, then the message is passed to one of those neighbours using the same algorithm. In the case of random-walk or high-degree search, the message is routed back to the previous node and the node i is marked to note that it cannot pass the message any further. If the number of hops exceeds N/2, then the search process stops and the path is recorded as not found. For each search algorithm, the average path length l, measured as the number of edges in the path, the average physical distance travelled along the path, dpath, and the percentage of times the search algorithm is unable to find a path, c, are computed from the search results obtained for K pairs in 10 instances of the network model. The lower the values of l, dpath and c, the better the performance of the search algorithm. We use the shortest average path length and average physical distance obtained by the global breadth-first-search (BFS) algorithm and Dijkstra's algorithm [38], respectively, as benchmarks for comparing the performance of the search algorithms.
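
The simulation loop just described can be sketched as follows (hypothetical code; score would be one of the measures from section 3, and each node is assumed to carry a 'pos' coordinate attribute). It performs the self-avoiding forwarding described above and reports the hop count, or None when the N/2 cutoff is exceeded.

import math

def decentralized_search(G, source, target, score, D=50.0, max_hops=None):
    # Route a message from source towards target using only local information.
    # G is a networkx graph whose nodes carry a 'pos' attribute; score(d_i, k_i)
    # is the measure to minimize (e.g. lambda d, k: d / k for algorithm 6).
    # The search stops on reaching a neighbour of the target, or a node within
    # distance D of it, or after N/2 hops.
    def dist(u, v):
        (x1, y1), (x2, y2) = G.nodes[u]["pos"], G.nodes[v]["pos"]
        return math.hypot(x1 - x2, y1 - y2)

    max_hops = max_hops or G.number_of_nodes() // 2
    visited = {source}            # the list L of nodes that already saw the message
    current, hops = source, 0
    while hops < max_hops:
        if target in G[current] or dist(current, target) <= D:
            return hops
        fresh = [v for v in G[current] if v not in visited]
        candidates = fresh if fresh else list(G[current])   # fall back if all neighbours were visited
        current = min(candidates, key=lambda v: score(dist(v, target), G.degree(v)))
        visited.add(current)
        hops += 1
    return None  # path not found within the hop budget

# Example: hops = decentralized_search(G, 0, 500, score=lambda d, k: d / k)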

Table 1 compares the performance of the different search algorithms on the spatial network G(1000, 0.72, 1, d^r, 2) with r = 1, 2 and 3.

Table 2. Comparison of search algorithms on spatial scale-free networks with different parameters. l is the average path length for the paths found by each search algorithm and c is the percentage of times a path was not found. The table summarizes the averages of l and c obtained from 10 simulations of the network with parameters N, p, r and dchar. Note that the decentralized algorithms 5, 6, 7 and 8 perform as well as the shortest paths found by using global information. Even though greedy search performs well for the paths found (l), it is sometimes unable to find a path (c).

                      N = 1000, r = 1            p = 0.72, r = 1            N = 1000, p = 0.72
                      p = 0.30      p = 0.80     N = 500       N = 1500     dchar = 0.5   dchar = 2.0
                      l     c(%)    l     c(%)   l     c(%)    l     c(%)   l     c(%)    l     c(%)
Greedy search         6.55  7.93    2.90  0.09   4.09  0.24    3.10  0.44   3.64  0.18    3.92  0.1
Algorithm 5           3.41  0.02    2.35  0      2.83  0       2.40  0      2.46  0.03    2.55  0
Algorithm 6           3.38  0.04    2.38  0      2.81  0       2.38  0      2.49  0       2.59  0
Algorithm 7           3.59  0.19    2.40  0      2.95  0       2.43  0.01   2.66  0.02    2.78  0
Algorithm 8           4.12  0.73    2.49  <0.01  3.16  <0.01   2.54  0      2.79  0.04    3.01  0.01
Shortest path length  2.91  NA      2.16  NA     2.30  NA      2.26  NA     2.23  NA      2.23  NA

We find that the decentralized search algorithms 5, 6, 7 and 8 perform as well as the shortest paths obtained using global information about the network. Specifically, the difference between the shortest path and the path obtained by algorithms 6 and 7 is less than a hop. These results are surprising because the latter algorithms use only local information, yet they perform as well as the BFS algorithm. This behaviour is mainly due to the power-law nature of the spatial network: the few high-degree nodes allow the algorithms to make big jumps during the search process (see table 1). This conclusion is corroborated by the fact that an increase in r, meaning a reduction of the power-law regime in the degree distribution [37], induces an increase in the path length. Greedy search, which uses only the direction of travel, is able to find short paths (compare the l values in table 1), but for a few node pairs it is unable to find a path (compare the c values in table 1). Greedy search does not consider the degrees of the nodes, and the algorithm sometimes gets stuck in a loop in sparsely connected regions of the network. In the case of algorithm 4, the composition was not very effective. It is likely that the values of the coefficients, which are difficult to compute, were not optimal. Moreover, the optimal values depend strongly on the parameters and the configuration of the spatial network. Hence, it would be difficult to generalize the algorithm to all networks and we will not consider it further in our analysis. Random-walk and high-degree search do not consider the direction of travel and hence take an exorbitantly large number of hops. Further, we found that the search algorithms' performance with respect to the path length l and the physical distance metric dpath was similar. Hence, in the rest of our analysis, we do not discuss these two algorithms or the physical distance metric, since the results do not add significant new information.

Similar results are obtained for a wide range of parameters of the spatial network model. Table 2 summarizes the results for some of these parameter values. This parameter space covers a broad range of power-law networks with different properties. For example, as the value of p changes from 0.3 to 0.8, the power-law exponent of the degree distribution changes from 2.4 to 1.7 (see figure 2(a)), which is the usual range for many real-world networks [1]–[4].

Figure 2. (a) Cumulative degree distribution of the networks generated by the spatial network model for different values of p; the different symbols correspond to p = 0.3, 0.4, 0.6 and 0.8. The power-law exponent of the network can be tuned by changing the value of p. (b) Cumulative degree distribution of the US airline network. (c) Scaling of the normalized BC of a node with its scaled degree for the US airline network. Note that, unlike in random graphs, there is no scaling between the BC and the degree of a node. (d) Scaling of the normalized BC of a node with its scaled degree for the US airline network without Alaska. Note that there is a better correlation between BC and degree when compared with the full US airline network.

Hence we can affirm that the spatial network model belongs to a general class of searchable networks. Although we have restricted our discussion to two-dimensional spatial networks, it is easy to verify that these results remain valid in higher dimensions. Further, a large number of decentralized search algorithms are efficient. For instance, in algorithm 6 we divide di by ki, whereas in algorithm 8 we divide di by ln ki + 1, which scales logarithmically with ki. Both algorithms are found to be efficient. This implies that a wide range of functions f(x) that scale between x and ln x can be used for decentralized search.

Hence, we find that the dependence of the search algorithms on the functional forms is weak and that the searchability of these networks lies in their heterogeneous structure rather than in the functional forms used in the search algorithm.

5. Search in the US airline network

Let us consider the US airline network, where the nodes are airports and two nodes are connected by an edge if there is a direct flight from one airport to the other. In this network, navigating along an edge from one node to another represents flying from one airport to another. Suppose our objective is to travel from one place to another using the US airline network. In real life, one can obtain a choice of itineraries from the airport closest to the departure location (departure airport) to the airport closest to the destination location (destination airport) using various sources such as travel agents, airline offices or the World Wide Web. These sources have global information about the network and one can choose an itinerary based on different criteria, such as travel fare, number of stopovers, or total travel time. Now consider a different scenario, one in which we do not have access to global information about the network and each airport has only local information. In other words, each airport knows the locations of the airports it can fly to and how well these neighbouring airports are connected (their degree). We do know the locations of the departure airport and the destination airport. The objective is to find a path with the fewest stopovers from the departure airport to the destination. From the departure airport, and consecutively from each intermediate airport, we choose to fly to one of its neighbours based on the degree of the neighbouring airport, its location and the location of the destination airport. This process continues until we reach the destination airport or any other airport within a small neighbourhood of the destination airport. In real life, it is sufficient to reach one of the airports near the destination airport; for example, it is sufficient to reach LaGuardia Airport (LGA), New York City if the objective is to reach John F Kennedy International Airport (JFK), New York City. In our study, as a first-order approximation, we do not consider the type of airline or the travel fare as important parameters. Even though this method of travel is unrealistic, it provides insight into the performance of decentralized search algorithms on real-world networks.

5.1. Properties of the US airline network

The Bureau of Transportation Statistics [39] maintains a well-documented database of departure schedules, passenger counts, flight types etc for all flights in the USA. We considered the data collected for service class F (scheduled passenger service) flights during the month of January 2006 to form the US airline network. Each airport is represented as a node and a direct flight connection from one airport to another is depicted as a directed edge. We filtered the data to remove anomalous edges formed by flights redirected due to environmental disturbances or random failures. Further, one would expect a flight from airport A to airport B whenever there is one from B to A, but in a small number of instances this was not true. To simplify the analysis, we added edges to make the network undirected.
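
Constructing such a network from the raw flight segments can be sketched as follows (our own illustration; the file name and the column labels ORIGIN and DEST are hypothetical and would need to match the actual BTS export).

import csv
import networkx as nx

def build_airline_network(path):
    # Build an undirected airline network from a BTS-style segment file.
    # The column names 'ORIGIN' and 'DEST' are illustrative; the actual export
    # may label them differently. Every directed flight record is added as an
    # undirected edge, mirroring the symmetrization described in the text.
    G = nx.Graph()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            origin, dest = row["ORIGIN"].strip(), row["DEST"].strip()
            if origin and dest and origin != dest:
                G.add_edge(origin, dest)
    # keep only the largest connected component, as in the analysis below
    largest = max(nx.connected_components(G), key=len)
    return G.subgraph(largest).copy()

# G = build_airline_network("bts_jan2006_segments.csv")   # hypothetical file name
# print(G.number_of_nodes(), G.number_of_edges())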

After filtering the data, the airline network had 710 nodes and 3414 edges. The largest connected component (LCC) contained 690 nodes and 3412 edges. The rest of the analysis in the paper considers only the LCC of the network. Not surprisingly, the properties of the US airline network are very similar to those of the worldwide airline network (WWN) [7].

The average path length of the airline network, which is the average minimum number of flights one has to take to go from one airport to any other, is 3.6. The clustering coefficient, which quantifies the local order of the network measured in terms of the number of triangles (3-cliques) present, is 0.41. Hence, the US airline network is also a small-world network [5]. The degree distribution of the network follows a power law p(k) ∼ k^−γ with exponent γ = 1.9 ± 0.1 (see figure 2(b)), which is close to the exponent of the WWN, 2.0 ± 0.1 [7]. Further, as observed in the WWN, we find that the most connected airports are not necessarily the most central ones. Figure 2(c) plots the normalized betweenness centrality (BC) of a node i, bi/⟨b⟩, where ⟨b⟩ is the average BC of the network, versus its scaled degree ki/⟨k⟩, where ⟨k⟩ is the average degree of the network. The geopolitical considerations used to explain this phenomenon in the WWN [34] do not apply to the US airline network, which belongs to a single country. In fact, this behaviour is due to Alaska, which contains a significant percentage of the airports (255 of 690, close to 37%) yet only a few of them (around 6) are connected to airports outside of Alaska. For instance, the BC of Anchorage, Alaska is significantly higher than its degree would suggest (see figure 2(c)). If we remove the Alaska airports from the network, then we observe a better correlation between the degree of a node and its BC (see figure 2(d)).
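
These statistics are straightforward to reproduce for any connected graph; a sketch using networkx (our own illustration, not the original analysis code) is given below.

import networkx as nx

def summarize(G):
    # Average path length, clustering coefficient and the normalized-BC versus
    # scaled-degree scatter plotted in figures 2(c) and 2(d), for a connected graph G.
    avg_path_length = nx.average_shortest_path_length(G)
    clustering = nx.average_clustering(G)
    bc = nx.betweenness_centrality(G)
    mean_bc = sum(bc.values()) / len(bc) or 1.0   # guard against an all-zero BC
    mean_k = sum(d for _, d in G.degree()) / G.number_of_nodes()
    scatter = [(G.degree(v) / mean_k, bc[v] / mean_bc) for v in G]
    return avg_path_length, clustering, scatter

# apl, cc, scatter = summarize(G)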

If an area is separated from the US mainland (such as Alaska and Hawaii), then very few airports connect it to the mainland and it may be difficult for search algorithms to capture these connections between the mainland and the other areas. To investigate the effects of this property on the search process, we simulate the algorithms on three different networks, namely, the US airline network, the US airline network without Alaska, and the US mainland airline network without Alaska, Hawaii, Puerto Rico, the US Virgin Islands and the US Pacific Trust Territories and Possessions (US mainland network). The latter two networks have statistical properties similar to those of the US airline network. The US airline network without Alaska has 459 nodes and 2857 edges, with 455 nodes and 2856 edges in the LCC; the US mainland network has 431 nodes and 2729 edges, with 427 nodes and 2728 edges in the LCC.

5.2. Search results and analysis

We simulated the search algorithms for all N(N − 1) ordered pairs of nodes in each network, where N is the number of nodes. The US airline network, the US airline network without Alaska, and the US mainland network had 475 410, 206 570, and 181 902 pairs, respectively. We chose dm to be the largest distance between two airports in the network and the neighbourhood radius D to be 100 miles. Table 3 summarizes the results obtained by each search algorithm: l is the average path length for the paths found by the search algorithm, and c is the number of times the search algorithm was unable to find a path. The results are similar to those obtained for the spatial scale-free network model. Algorithms 6, 7 and 8 are able to find paths as short as the paths obtained by the BFS algorithm. Again, greedy search gives short paths when it is able to find a path, but there were instances in which it was unable to find any path. In the case of the US airline network without Alaska and the US mainland network, the performance of the search algorithms is even better, especially for algorithm 5, which did not perform well on the complete US airline network. Figure 3 visualizes the paths obtained in a characteristic case in which greedy search takes a higher number of hops. Often greedy search reaches nodes that are near the destination node but are not well connected; it then travels many hops within that region before reaching the destination. The proposed search algorithms avoid the low-degree nodes and reach the destination node in fewer hops.

Table 3. Comparison of search algorithms on the US airline network, the US network without Alaska, and the US mainland network. l is the average path length for the paths found by the search algorithms and c is the number of times a path was not found. The table summarizes the averages of l and c obtained for all possible pairs in each network. In the US airline network, algorithms 6, 7 and 8 give paths close to the shortest path length. In the other two networks, algorithms 5, 6, 7 and 8 give short paths. Here again, greedy search performs well for the paths found (l) but is sometimes unable to find a path (c).

                      US airline network           US network without Alaska    US mainland network
                      (N = 690, pairs = 475 410)   (N = 455, pairs = 206 570)    (N = 427, pairs = 181 902)
                      l     c                      l     c                       l     c
Greedy search         3.93  16806 (3.54%)          2.83  4015 (1.94%)            2.74  3729 (2.05%)
Algorithm 5           5.53  13870 (2.92%)          3.75  456 (0.22%)             2.85  425 (0.23%)
Algorithm 6           4.01  752 (0.16%)            3.17  454 (0.22%)             2.68  425 (0.23%)
Algorithm 7           3.37  688 (0.14%)            2.68  453 (0.22%)             2.93  1 (<0.01%)
Algorithm 8           3.37  41 (<0.01%)            2.76  38 (0.02%)              2.75  39 (0.02%)
Shortest path length  3.02  NA                     2.39  NA                      2.32  NA

When we looked at the search results in more detail, we found a few more interesting behaviours. Greedy search and algorithm 5 were unable to find paths for approximately the same number of pairs in the US airline network (3.54% for the former and 2.92% for the latter). However, there is a difference in the type of paths these search algorithms could not find. The paths not found by greedy search were distributed uniformly over all departure and destination nodes; the paths not found by algorithm 5 were due predominantly to 18 airports in Alaska, which were unreachable almost regardless of the starting point. Interestingly, even when we start from Anchorage International Airport (ANC), the most connected airport in Alaska, these airports are not reachable. This is mainly due to the high affinity of algorithm 5 for high-degree nodes. The degrees of the neighbours of ANC that are in Alaska are small compared to the degrees of its neighbours on the US mainland. Hence, when we start from an airport, the algorithm is able to reach Anchorage but afterwards selects one of the highly connected airports on the US mainland. From that point on, it is difficult to return to Alaska, since the search algorithm is self-avoiding and since the only other airport with flights to Alaska, excluding ANC, is Seattle-Tacoma International Airport (SEA). The US airline network without Alaska and the US mainland network do not have these constraints, and hence algorithm 5 was able to perform better on them.

Among the 475 410 pairs of source and destination nodes searched, algorithms 6 and 7 could not reach the destination node 752 and 688 times, respectively. Again, it turns out that the failure to reach the destination was mainly due to a particular airport, namely Havre City-County Airport (HVR) in Montana. Similar behaviour was observed for these algorithms in the US airline network without Alaska and the US mainland network. HVR is a single-degree node that is connected to Lewistown Airport (LWT), Montana, and the only other airport to which LWT is connected is Billings Logan International Airport (BIL), Montana, which is a well-connected airport. Hence, the only way to reach HVR is to reach BIL first and then fly to LWT. Unfortunately, none of the algorithms other than greedy search chooses LWT from BIL when the destination is HVR.

Figure 3. Visualization of the paths obtained in a characteristic case in which greedy search takes a higher number of hops. In this case, the departure airport is State College, PA (node 1) and the destination airport is Laredo, Texas (node 9). The airline codes and degrees of the nodes are: 1, SCE, degree 5; 2, CVG, degree 118; 3, SAT, degree 29; 4, HRL, degree 6; 5, CRP, degree 5; 6, HOU, degree 31; 7, AUS, degree 34; 8, IAH, degree 118; 9, LRD, degree 2. The path obtained by greedy search is 1→2→3→4→5→6→7→8→9 and the path obtained by algorithms 5, 6 and 7 is 1→2→8→9. Algorithm 8, not shown on the map, takes 4 hops (1→2→3→8→9). Often greedy search reaches nodes that are near the destination node but are not well connected; it then travels many hops within that region before reaching the destination. The proposed search algorithms avoid the low-degree nodes and reach the destination node in a smaller number of hops.

Here again, even though algorithms 5, 6, 7 and 8 are able to reach BIL, they do not choose LWT as the next hop. Moreover, once they fly out of BIL, they take many hops to reach BIL again, due to the self-avoiding nature of the algorithms. For instance, when the destination is HVR, algorithms 7 and 8 take, on average, only 2.5 and 3.44 hops respectively to reach BIL. However, to reach HVR they take around 170 and 102 hops, respectively. The reason this behaviour is not observed for other single-degree nodes in the US mainland network is that single-degree nodes are usually connected to high-degree nodes: the average degree of the neighbours of the single-degree nodes was found to be 82.86, which is significantly higher than the average degree of the network (12.78). In addition, the only airport (LWT) that flies to HVR (or to a neighbourhood of HVR) is not the one chosen from the only other airport (BIL) that can fly to LWT.

Table 4 gives the percentage of times the path length found by the search algorithms is the same as the shortest path length. For approximately 90% of the pairs, the path length found by algorithms 6, 7 and 8 was the same as the shortest path length. Further, for 97% of the pairs, the path length found exceeded the shortest path by at most two hops.

Table 4. Comparison of search algorithms on the US airline network, the US network without Alaska, and the US mainland network. 'Diff = 0' is the percentage of pairs for which the path length found by the search algorithm is the same as the shortest path length; algorithms 6, 7 and 8 find the shortest paths for roughly 90% of the pairs. 'Diff ≤ 2' is the percentage of pairs for which the path length found exceeded the shortest path by at most two hops. Given that the search algorithms use only local information, these results on the US airline network are quite fascinating.

                 US airline network         US network without Alaska   US mainland network
                 Diff = 0(%)  Diff ≤ 2(%)   Diff = 0(%)  Diff ≤ 2(%)    Diff = 0(%)  Diff ≤ 2(%)
Greedy search    66.3         85.8          75.3         92.3           75.8         92.7
Algorithm 5      66.9         72.1          88.2         93.7           90.8         96.0
Algorithm 6      88.8         96.6          90.8         95.6           92.2         96.8
Algorithm 7      91.3         98.0          92.0         97.6           92.4         98.1
Algorithm 8      88.4         97.5          89.5         97.8           89.0         97.6

Given that the search algorithms use only local information, these results on the airline networks are quite fascinating. Note that this behaviour is due mainly to the inherent structure of the US airline network, which can be considered a 'searchable network'.

6. Conclusions and discussion

In this paper, we studied decentralized search in spatial scale-free networks. We proposed different search algorithms that combine the direction of travel and the degree of the neighbour, and illustrated that some of these algorithms can find short paths using local information alone. We demonstrated that a family of parameterized spatial network models belongs to a class of searchable networks for a wide range of the parameter space. Further, we tested these algorithms on the US airline network. Surprisingly, we found that one can travel from one place to another in fewer than four hops on average while using only local information. This implies that searchability is a generic property of the US airline network, as is also the case for social networks.

In addition, the spatial network model and the airline network are searchable under a wide range of search algorithms. For example, algorithms 6 and 8 are both able to find short paths in these networks. Hence, any search algorithm with a function f(x) that scales between x and ln x should give short paths. Moreover, the algorithms can be extended to other power-law networks if we can embed the network in an n-dimensional metric space in which nodes are connected based on the metric distance. The algorithms are relevant to other networks such as the Internet and road networks. As demonstrated in [33], the Internet can be described by the family of spatial network models considered in this paper, and hence we expect that these search algorithms can find short paths in the Internet. Road networks, however, do not follow a power-law degree distribution. Investigating the algorithms on the dual form of road networks, which does exhibit scale-free properties [40], is a topic for future work.

We notice that algorithm 8, the most conservative with respect to degree, performs the best on the US airline network. This implies that direction plays the most important role in efficient searching.

Even a slight blending of direction with degree is sufficient to drastically improve the efficiency of search algorithms. In other words, a search algorithm that traverses based on direction and cautiously avoids low-degree nodes should give short paths. However, as observed with algorithm 5, too strong a preference for degree may sometimes lead the algorithm to nodes far away from the destination node. Further, we can conclude that searchability is a property of the network rather than of the functional forms used in the search algorithm.

The difference between the results obtained on the US airline network and on the US mainland network is not significant (especially for algorithms 7 and 8). This suggests that the results can probably be extended to the WWN [7], which has a structure very similar to that of the US airline network. The US airline network contains separated areas that are connected to the mainland by only a few airports, and algorithms 7 and 8 are able to capture these connections in order to travel from one separated area to another. The WWN will have many more such separated areas, which are well connected locally but sparsely interconnected. We expect that algorithms 7 and 8 would be able to find short paths in the WWN; verification is subject to the availability of data on the WWN.

The results obtained for the US airline network are perhaps intuitive: in real life, if one is asked to travel using only local information, one can usually find a short path, if not always the shortest path. The significance of the results lies in capturing this intuition in an algorithm. Clearly, the structure of the network facilitates its searchability. The results presented in this paper support the hypothesis, conjectured by others [10, 21], that many real-world networks evolve to inherently facilitate decentralized search. Furthermore, these results provide insights for designing the structure of decentralized networks that need effective search algorithms.

Acknowledgments

The authors would like to acknowledge the National Science Foundation (grants DMI 0537992 and CCF 0643529) for making this work feasible. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References

[1] Albert R and Barabási A L 2002 Rev. Mod. Phys. 74 47
[2] Boccaletti S, Latora V, Moreno Y, Chavez M and Hwang D U 2006 Phys. Rep. 424 175
[3] Dorogovtsev S N and Mendes J F F 2002 Adv. Phys. 51 1079
[4] Newman M E J 2003 SIAM Rev. 45 167
[5] Watts D J and Strogatz S H 1998 Nature 393 440
[6] Ravasz E, Somera A L, Mongru D A, Oltvai Z N and Barabási A L 2002 Science 297 1551
[7] Guimera R, Mossa S, Turtschi A and Amaral L A N 2005 Proc. Natl Acad. Sci. 102 7794
[8] Albert R, Jeong H and Barabási A L 2000 Nature 406 378
[9] Thadakamalla H P, Raghavan U N, Kumara S R T and Albert R 2004 IEEE Intell. Syst. 19 24
[10] Adamic L A, Lukose R M, Puniyani A R and Huberman B A 2001 Phys. Rev. E 64 046135
[11] Thadakamalla H P, Albert R and Kumara S R T 2005 Phys. Rev. E 72 066128
[12] Pastor-Satorras R and Vespignani A 2001 Phys. Rev. Lett. 86 3200
[13] Milgram S 1967 Psychol. Today 2 60
[14] Kleinberg J 2000 Nature 406 845
[15] Kleinberg J 2000 Proc. 32nd ACM Symp. Theory of Computing pp 163–70
[16] Kleinberg J 2001 Adv. Neural Inform. Process. Syst. 14 431
[17] Dodds P, Muhamad R and Watts D J 2003 Science 301 827
[18] Watts D J, Dodds P S and Newman M E J 2002 Science 296 1302
[19] Kim B J, Yoon C N, Han S K and Jeong H 2002 Phys. Rev. E 65 027103
[20] Arenas A, Cabrales A, Diaz-Guilera A, Guimera R and Vega F 2003 Statistical Mechanics of Complex Networks (Berlin: Springer) chapter 'Search and Congestion in Complex Networks' pp 175–94
[21] Kleinberg J 2006 Proc. Int. Congr. Math. 3 1019
[22] Liben-Nowell D, Novak J, Kumar R, Raghavan P and Tomkins A 2005 Proc. Natl Acad. Sci. 102 11623
[23] Menczer F 2002 Proc. Natl Acad. Sci. 99 14014
[24] Sandberg O 2006 Proc. 8th Workshop on Algorithm Engineering and Experiments (ALENEX) pp 144–55
[25] Simsek O and Jensen D 2005 Proc. 19th Int. Joint Conf. on Artificial Intelligence pp 304–10
[26] Zhang H, Goel A and Govindan R 2004 Comput. Netw. 46 555
[27] Akyildiz I F, Su W, Sankarasubramaniam Y and Cayirci E 2002 Comput. Netw. 38 393
[28] Raghavan U N and Kumara S R T 2007 Int. J. Sensor Netw. 2 201
[29] Kan G 2001 Peer-to-Peer: Harnessing the Power of Disruptive Technologies (Beijing: O'Reilly) chapter 'Gnutella'
[30] Chakrabarti S, van den Berg M and Dom B 1999 Comput. Netw. 31 1623
[31] Clauset A and Moore C 2003 Preprint cond-mat/0309415
[32] Barabási A L and Albert R 1999 Science 286 509
[33] Yook S H, Jeong H and Barabási A L 2002 Proc. Natl Acad. Sci. 99 13382
[34] Guimera R and Amaral L A N 2004 Eur. Phys. J. B 38 381
[35] Newman M E J, Strogatz S H and Watts D J 2001 Phys. Rev. E 64 026118
[36] Dorogovtsev S and Mendes J F F 2000 Europhys. Lett. 52 33
[37] Barthelemy M 2003 Europhys. Lett. 63 915
[38] Cormen T H, Leiserson C E, Rivest R L and Stein C 2001 Introduction to Algorithms 2nd edn (Cambridge: MIT Press)
[39] The Bureau of Transportation Statistics, online at http://www.transtats.bts.gov/ (accessed 20 July 2006)
[40] Kalapala V, Sanwalani V, Clauset A and Moore C 2006 Phys. Rev. E 73 026130

Near linear time algorithm to detect community structures in large-scale networks

Usha Nandini Raghavan,1 Réka Albert,2 and Soundar Kumara1

1Department of Industrial Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
2Department of Physics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA

(Received 9 April 2007; published 11 September 2007)

Community detection and analysis is an important methodology for understanding the organization of various real-world networks and has applications in problems as diverse as consensus formation in social communities or the identification of functional modules in biochemical networks. Currently used algorithms that identify the community structures in large-scale real-world networks require a priori information such as the number and sizes of communities or are computationally expensive. In this paper we investigate a simple label propagation algorithm that uses the network structure alone as its guide and requires neither optimization of a predefined objective function nor prior information about the communities. In our algorithm every node is initialized with a unique label and at every step each node adopts the label that most of its neighbors currently have. In this iterative process densely connected groups of nodes form a consensus on a unique label to form communities. We validate the algorithm by applying it to networks whose community structures are known. We also demonstrate that the algorithm takes an almost linear time and hence it is computationally less expensive than what was possible so far.

DOI: 10.1103/PhysRevE.76.036106          PACS number(s): 89.75.Fb, 89.75.Hc, 87.23.Ge, 02.10.Ox

I. INTRODUCTION

A wide variety of complex systems can be represented as networks. For example, the World Wide Web (WWW) is a network of web pages interconnected by hyperlinks; social networks are represented by people as nodes and their relationships by edges; and biological networks are usually represented by biochemical molecules as nodes and the reactions between them by edges. Most of the research in the recent past focused on understanding the evolution and organization of such networks and the effect of network topology on the dynamics and behaviors of the system [1–4]. Finding community structures in networks is another step toward understanding the complex systems they represent.

A community in a network is a group of nodes that are similar to each other and dissimilar from the rest of the network. It is usually thought of as a group where nodes are densely interconnected and sparsely connected to other parts of the network [4–6]. There is no universally accepted definition of a community, but it is well known that most real-world networks display community structures. There has been a lot of effort recently in defining, detecting, and identifying communities in real-world networks [5,7–15]. The goal of a community detection algorithm is to find groups of nodes of interest in a given network. For example, a community in the WWW network indicates a similarity among nodes in the group. Hence if we know the information provided by a small number of web pages, then it can be extrapolated to other web pages in the same community. Communities in social networks can provide insights about common characteristics or beliefs among people that make them different from other communities. In biomolecular interaction networks, segregating nodes into functional modules can help identify the roles or functions of individual molecules [10]. Further, in many large-scale real-world networks, communities can have distinct properties which are lost in their combined analysis [1].

Community detection is similar to the well-studied network partitioning problem [16–18]. The network partitioning problem is in general defined as the partitioning of a network into c (a fixed constant) groups of approximately equal sizes, minimizing the number of edges between groups. This problem is NP-hard, and efficient heuristic methods have been developed over the years to solve it [16–20]. Much of this work is motivated by engineering applications including very large scale integrated (VLSI) circuit layout design and the mapping of parallel computations. Thompson [21] showed that one of the important factors affecting the minimum layout area of a given circuit on a chip is its bisection width. Also, to enhance the performance of a computational algorithm, where nodes represent computations and edges represent communications, the nodes are divided equally among the processors so that the communications between them are minimized.

The goal of a network partitioning algorithm is to divide any given network into approximately equal size groups irrespective of node similarities. Community detection, on the other hand, finds groups that either have an inherent or an externally specified notion of similarity among nodes within groups. Furthermore, the number of communities in a network and their sizes are not known beforehand; they are established by the community detection algorithm.

Many algorithms have been proposed to find community structures in networks. Hierarchical methods divide networks into communities, successively, based on a dissimilarity measure, leading to a series of partitions from the entire network to singleton communities [5,15]. Similarly, one can also successively group together smaller communities based on a similarity measure, leading again to a series of partitions [22,23]. Due to the wide range of partitions, structural indices that measure the strength of community structures are used in determining the most relevant ones. Simulation-based methods are also often used to find partitions with a strong community structure [10,24]. Spectral [17,25] and flow maximization (cut minimization) methods [9,26] have been successfully used in dividing networks into two or more communities.

In this paper, we propose a localized community detection algorithm based on label propagation. Each node is initialized with a unique label, and at every iteration of the algorithm each node adopts the label that a maximum number of its neighbors have, with ties broken uniformly at random. As the labels propagate through the network in this manner, densely connected groups of nodes form a consensus on their labels. At the end of the algorithm, nodes having the same label are grouped together as communities. As we will show, the advantage of this algorithm over other methods is its simplicity and time efficiency. The algorithm uses the network structure to guide its progress and does not optimize any specific chosen measure of community strength. Furthermore, the number of communities and their sizes are not known a priori and are determined at the end of the algorithm. We will show that the community structures obtained by applying the algorithm to previously considered networks, such as Zachary's karate club friendship network and the US college football network, are in agreement with the actual communities present in these networks.
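
A compact sketch of this procedure is given below (our own illustration; the update order, tie-breaking and stopping criterion are specified more carefully later in the paper).

import random
from collections import Counter

def label_propagation(G, max_iters=100, seed=None):
    # Every node starts with a unique label; in each pass, nodes (visited in random
    # order) adopt the label carried by most of their neighbours, ties broken
    # uniformly at random. Nodes sharing a final label form a community.
    rng = random.Random(seed)
    labels = {v: v for v in G}
    for _ in range(max_iters):
        changed = False
        nodes = list(G)
        rng.shuffle(nodes)
        for v in nodes:
            if not G[v]:
                continue
            counts = Counter(labels[u] for u in G[v])
            best = max(counts.values())
            choice = rng.choice([lab for lab, c in counts.items() if c == best])
            if choice != labels[v]:
                labels[v] = choice
                changed = True
        if not changed:
            break
    communities = {}
    for v, lab in labels.items():
        communities.setdefault(lab, set()).add(v)
    return list(communities.values())

# Example: import networkx as nx; print(label_propagation(nx.karate_club_graph(), seed=1))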

II. DEFINITIONS AND PREVIOUS WORK

As mentioned earlier, there is no unique definition of a community. One of the simplest definitions of a community is a clique, that is, a group of nodes where there is an edge between every pair of nodes. Cliques capture the intuitive notion of a community [6] in which every node is related to every other node and hence nodes have strong similarities with each other. An extension of this definition was used by Palla et al in [14], who define a community as a chain of adjacent cliques. They define two k-cliques (cliques on k nodes) to be adjacent if they share k − 1 nodes. These definitions are strict in the sense that the absence of even one edge implies that a clique (and hence the community) no longer exists. k-clans and k-clubs are more relaxed definitions that still maintain a high density of edges within communities [14]. A group of nodes is said to form a k-clan if the shortest path length between any pair of nodes, or the diameter of the group, is at most k; here the shortest path uses only the nodes within the group. A k-club is defined similarly, except that the subnetwork induced by the group of nodes is a maximal subgraph of diameter k in the network.

Definitions based on the degrees (numbers of edges) of nodes within the group relative to their degrees outside the group were given by Radicchi et al [15]. If d_i^in and d_i^out are the degrees of node i within and outside of its group U, then U is said to form a strong community if d_i^in > d_i^out for all i ∈ U. If Σ_{i∈U} d_i^in > Σ_{i∈U} d_i^out, then U is a community in the weak sense. Other definitions based on degrees of nodes can be found in [6].
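
These two criteria are simple to check for any candidate group; a small sketch (our own illustration, assuming a networkx-style graph G) is given below.

def community_strength(G, group):
    # Classify a node set 'group' by the criteria of Radicchi et al: 'strong' if every
    # node has more edges inside the group than outside, 'weak' if only the sums
    # satisfy this, and 'neither' otherwise.
    group = set(group)
    d_in, d_out = {}, {}
    for v in group:
        inside = sum(1 for u in G[v] if u in group)
        d_in[v], d_out[v] = inside, G.degree(v) - inside
    if all(d_in[v] > d_out[v] for v in group):
        return "strong"
    if sum(d_in.values()) > sum(d_out.values()):
        return "weak"
    return "neither"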

There can exist many different partitions of the nodes of a network that satisfy a given definition of community. In most cases [4,22,26–28], the groups of nodes found by a community detection algorithm are assumed to be communities irrespective of whether they satisfy a specific definition or not. To find the best community structures among them we need a measure that can quantify the strength of a community obtained. One way to measure the strength of a community is to compare the density of edges observed within the community with the density of edges in the network as a whole [6]. If the number of edges observed within a community U is e_U, then under the assumption that the edges in the network are uniformly distributed among pairs of nodes, we can calculate the probability P that the expected number of edges within U is larger than e_U. If P is small, then the observed density in the community is greater than the expected value. A similar definition was recently adopted by Newman [13], where the comparison is between the observed density of edges within communities and the expected density of edges within the same communities in randomized networks that nevertheless maintain every node's degree. This was termed the modularity measure Q, where Q = Σ_i (e_ii − a_i²). Here e_ii is the observed fraction of edges within group i and a_i² is the expected fraction of edges within group i; if e_ij is the fraction of edges in the network that run between group i and group j, then a_i = Σ_j e_ij. Q = 0 implies that the density of edges within groups in a given partition is no more than what would be expected by random chance, while Q closer to 1 indicates stronger community structure.
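
The modularity of a given partition can be computed directly from this definition; the sketch below (our own illustration, assuming a networkx-style graph) follows the formula Q = Σ_i (e_ii − a_i²).

def modularity(G, communities):
    # Q = sum_i (e_ii - a_i^2) for a partition of G, where e_ii is the fraction of
    # edges inside community i and a_i the fraction of edge endpoints in it.
    m = G.number_of_edges()
    q = 0.0
    for group in communities:
        group = set(group)
        internal = sum(1 for u, v in G.edges() if u in group and v in group)
        degree_sum = sum(G.degree(v) for v in group)
        q += internal / m - (degree_sum / (2.0 * m)) ** 2
    return q

# Sanity check: treating the whole network as a single community gives Q = 0.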

Given a network N(n, m) with n nodes and m edges, any community detection algorithm finds subgroups of nodes. Let C1, C2, . . ., Cp be the communities found. In most algorithms, the communities found satisfy the following constraints: (i) Ci ∩ Cj = ∅ for i ≠ j, and (ii) ∪_i Ci spans the node set of N.

A notable exception is Palla et al [14], who define communities as chains of adjacent k-cliques and allow community overlaps. It takes exponential time to find all such communities in a network. They use these sets to study the overlapping structure of communities in social and biological networks. By forming another network in which each community is represented by a node and edges between nodes indicate the presence of overlap, they show that such networks are also heterogeneous (fat-tailed) in their node degree distributions. Furthermore, if a community has overlapping regions with two other communities, then those neighboring communities are also highly likely to overlap.

The number of different partitions of a network N(n, m) into just two disjoint subsets is 2^n and increases exponentially with n. Hence we need a quick way to find only relevant partitions. Girvan and Newman [5] proposed a divisive algorithm based on the concept of edge betweenness centrality, that is, the number of shortest paths among all pairs of nodes in the network passing through that edge. The main idea here is that edges that run between communities have higher betweenness values than those that lie within communities. By successively recalculating and removing edges with the highest betweenness values, the network breaks down into disjoint connected components. The algorithm continues until all edges are removed from the network. Each step of the algorithm takes O(mn) time and, since there are m edges to be removed, the worst-case running time is O(m²n). As the algorithm proceeds, one can construct a dendrogram (see Fig. 1) depicting the breaking down of the network into disjoint connected components.

RAGHAVAN, ALBERT, AND KUMARA PHYSICAL REVIEW E 76, 036106 �2007�

036106-2

Page 89: ComplexNetworks

joint connected components. Hence for any given h such that1�h�n, at most one partition of the network into h disjointsubgroups is found. All such partitions in the dendrogram aredepicted, irrespective of whether or not the subgroups ineach partition represent a community. Radicchi et al. �15�propose another divisive algorithm where the dendrogramsare modified to reflect only those groups that satisfy a spe-cific definition of a community. Further, instead of edge be-tweenness centrality, they use a local measure called edgeclustering coefficient as a criterion for removing edges. Theedge clustering coefficient is defined as the fraction of num-ber of triangles a given edge participates in, to the total num-ber of possible such triangles. The clustering coefficient ofan edge is expected to be the least for those running betweencommunities and hence the algorithm proceeds by removingedges with low clustering coefficients. The total running timeof this divisive algorithm is O� m4

n2 �.Similarly one can also define a topological similarity be-

tween nodes and perform an agglomerative hierarchical clus-tering �23,29�. In this case, we begin with nodes in n differ-ent communities and group together communities that are themost similar. Newman �22� proposed an amalgamationmethod �similar to agglomerative methods� using the modu-larity measure Q, where at each step those two communitiesare grouped together that give rise to the maximum increaseor smallest decrease in Q. This process can also be repre-sented as a dendrogram and one can cut across the dendro-gram to find the partition corresponding to the maximumvalue of Q �see Fig. 1�. At each step of the algorithm onecompares at most m pairs of groups and requires at mostO�n� time to update the Q value. The algorithm continuesuntil all the n nodes are in one group and hence the worstcase running time of the algorithm is O�n�m+n��. The algo-rithm of Clauset et al. �30� is an adaptation of this agglom-erative hierarchical method, but uses a clever data structureto store and retrieve information required to update Q. Ineffect, they reduce the time complexity of the algorithm toO�md log n�, where d is the depth of the dendrogram ob-tained. In networks that have a hierarchical structure withcommunities at many scales, d� log n. There have also beenother heuristic and simulation based methods that find parti-tions of a given network maximizing the modularity measureQ �10,24�.
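
The divisive edge-betweenness procedure of Girvan and Newman described above can be written compactly if a graph library is available. The following Python sketch assumes the third-party networkx package and is meant only to make the recalculate-and-remove loop explicit; it is not the authors' reference implementation.

    import networkx as nx

    def girvan_newman_partitions(G):
        """Repeatedly remove the edge with the highest betweenness, recording the
        partition each time the number of connected components increases."""
        H = G.copy()
        partitions = []
        n_comp = nx.number_connected_components(H)
        while H.number_of_edges() > 0:
            betweenness = nx.edge_betweenness_centrality(H)   # O(mn) recalculation
            u, v = max(betweenness, key=betweenness.get)
            H.remove_edge(u, v)
            if nx.number_connected_components(H) > n_comp:
                n_comp = nx.number_connected_components(H)
                partitions.append([sorted(c) for c in nx.connected_components(H)])
        return partitions   # successive levels of the dendrogram

Each recorded level corresponds to one cut of the dendrogram illustrated in Fig. 1.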

Label flooding algorithms have also been used in detecting communities in networks [27,28]. In [27], the authors propose a local community detection method where a node is initialized with a label which then propagates step by step via the neighbors until it reaches the end of the community, where the number of edges proceeding outward from the community drops below a threshold value. After finding the local communities at all nodes in the network, an n × n matrix is formed, where the ijth entry is 1 if node j belongs to the community started from i and 0 otherwise. The rows of the matrix are then rearranged such that similar ones are closer to each other. Then, starting from the first row, they successively include all the rows into a community until the distance between two successive rows is large and above a threshold value. After this a new community is formed and the process is continued. Forming the rows of the matrix and rearranging them requires O(n³) time and hence the algorithm is time-consuming.

Wu and Huberman [26] propose a linear time (O(m + n)) algorithm that can divide a given network into two communities. Suppose that one can find two nodes (x and y) that belong to two different communities; they are initialized with values 1 and 0, respectively, and all other nodes are also initialized with value 0. Then at each step of the algorithm, all nodes (except x and y) update their values as follows. If z1, z2, . . . , zk are the neighbors of a node z, then the value Vz is updated as Vz = (V_z1 + V_z2 + ⋯ + V_zk)/k. This process continues until convergence. The authors show that the iterative procedure converges to a unique value, and the convergence of the algorithm does not depend on the size n of the network. Once the required convergence is obtained, the values are sorted between 0 and 1. Going through the spectrum of values in descending order, there will be a sudden drop at the border of two communities. This gap is used in identifying the two communities in the network. A similar approach was used by Flake et al. [9] to find the communities in the WWW network. Here, given a small set of nodes (source nodes), they form a network of web pages that are within a bounded distance from the sources. Then by designating (or artificially introducing) sink nodes, they solve for the maximum flow from the sources to the sinks. In doing so one can then find the minimum cut corresponding to the maximum flow. The connected component of the network containing the source nodes after the removal of the cut set is then the required community.
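
A minimal sketch of the value-averaging step used by Wu and Huberman, assuming an adjacency dictionary adj and two seed nodes x and y known to lie in different communities; a fixed iteration count stands in for the convergence test of the original algorithm.

    def wu_huberman_values(adj, x, y, iterations=50):
        """Hold node x at value 1 and node y at 0; every other node repeatedly
        takes the average of its neighbours' values.  Sorting the converged
        values and looking for the sudden drop separates the two communities."""
        V = {v: 0.0 for v in adj}
        V[x] = 1.0
        for _ in range(iterations):
            for v in adj:
                if v not in (x, y) and adj[v]:
                    V[v] = sum(V[u] for u in adj[v]) / len(adj[v])
        return sorted(V.items(), key=lambda item: -item[1])   # scan for the gap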

FIG. 1. An illustration of a dendrogram, which is a tree representation of the order in which nodes are segregated into different groups or communities.

Spectral bisection methods [25] have been used extensively to divide a network into two groups so that the number of edges between the groups is minimized. Eigenvectors of the Laplacian matrix (L) of a given network are used in the bisection process. It can be shown that L has only real non-negative eigenvalues (0 = λ1 ≤ λ2 ≤ ⋯ ≤ λn) and that minimizing the number of edges between groups is the same as minimizing the positive linear combination M = Σ_i s_i² λ_i, where s_i = u_iᵀ z and u_i is the eigenvector of L corresponding to λ_i. z is the decision vector whose ith entry can be either 1 or −1, denoting to which of the two groups node i belongs. To minimize M, z is chosen as parallel as possible to the eigenvector corresponding to the second smallest eigenvalue. (The smallest eigenvalue is 0, and choosing z parallel to the corresponding eigenvector gives a trivial solution.) This bisection method has been extended to finding communities in networks that maximize the modularity measure Q [25]. Q can be written as a positive linear combination of the eigenvalues of the matrix B, where B is defined as the difference of the two matrices A and P. A_ij is the observed number of edges between nodes i and j and P_ij is the expected number of edges between i and j if the edges fall randomly between nodes while maintaining the degree of each node. Since Q has to be maximized, z is chosen as parallel as possible to the eigenvector corresponding to the largest eigenvalue.
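
The bisection step can be sketched with numpy as follows; the adjacency matrix A is assumed to be a symmetric 0/1 array, and the sign pattern of the eigenvector belonging to the second smallest Laplacian eigenvalue supplies the decision vector z.

    import numpy as np

    def spectral_bisection(A):
        """Split the nodes into two groups using the eigenvector of the Laplacian
        corresponding to the second smallest eigenvalue."""
        degrees = A.sum(axis=1)
        L = np.diag(degrees) - A               # Laplacian matrix
        eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
        fiedler = eigvecs[:, 1]                # eigenvector of the second smallest eigenvalue
        return np.where(fiedler >= 0, 1, -1)   # decision vector z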

Since many real-world complex networks are large in size, the time efficiency of the community detection algorithm is an important consideration. When no a priori information is available about the likely communities in a given network, one normally finds partitions that optimize a chosen measure of community strength. Our goal in this paper is to develop a simple time-efficient algorithm that requires no prior information (such as the number, sizes, or central nodes of the communities) and uses only the network structure to guide the community detection. The proposed mechanism for such an algorithm, which does not optimize any specific measure or function, is detailed in the following section.

III. COMMUNITY DETECTION USING LABEL PROPAGATION

The main idea behind our label propagation algorithm is the following. Suppose that a node x has neighbors x1, x2, . . . , xk and that each neighbor carries a label denoting the community to which it belongs. Then x determines its community based on the labels of its neighbors. We assume that each node in the network chooses to join the community to which the maximum number of its neighbors belong, with ties broken uniformly at random. We initialize every node with a unique label and let the labels propagate through the network. As the labels propagate, densely connected groups of nodes quickly reach a consensus on a unique label (see Fig. 2). When many such dense (consensus) groups are created throughout the network, they continue to expand outwards as long as it is possible to do so. At the end of the propagation process, nodes having the same labels are grouped together as one community.

We perform this process iteratively, where at every step each node updates its label based on the labels of its neighbors. The updating process can be either synchronous or asynchronous. In synchronous updating, node x at the tth iteration updates its label based on the labels of its neighbors at iteration t − 1. Hence C_x(t) = f(C_x1(t − 1), . . . , C_xk(t − 1)), where C_x(t) is the label of node x at time t. The problem, however, is that subgraphs in the network that are bipartite or nearly bipartite in structure lead to oscillations of labels (see Fig. 3). This is especially true in cases where communities take the form of a star graph. Hence we use asynchronous updating, where C_x(t) = f(C_xi1(t), . . . , C_xim(t), C_xi(m+1)(t − 1), . . . , C_xik(t − 1)); here xi1, . . . , xim are neighbors of x that have already been updated in the current iteration, while xi(m+1), . . . , xik are neighbors that are not yet updated in the current iteration. The order in which all the n nodes in the network are updated at each iteration is chosen randomly. Note that while we have n different labels at the beginning of the algorithm, the number of labels reduces over iterations, resulting in only as many unique labels as there are communities.
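
The oscillation problem with synchronous updating can be reproduced with a few lines of Python on a complete bipartite graph; the example below is illustrative and uses a deterministic most-frequent-label update, since no ties occur in this graph.

    from collections import Counter

    def synchronous_step(adj, labels):
        """Every node adopts the label occurring most often among its
        neighbours' labels from the previous iteration."""
        return {v: Counter(labels[u] for u in adj[v]).most_common(1)[0][0] for v in adj}

    # Complete bipartite graph with parts {0, 1, 2} and {3, 4, 5}
    adj = {v: {u for u in range(6) if (u < 3) != (v < 3)} for v in range(6)}
    labels = {v: 'a' if v < 3 else 'b' for v in adj}
    for t in range(4):
        labels = synchronous_step(adj, labels)
        print(t, labels)   # the two labels swap sides at every iteration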

Ideally the iterative process should continue until no node in the network changes its label. However, there could be nodes in the network that have an equal maximum number of neighbors in two or more communities. Since we break ties randomly among the possible candidates, the labels on such nodes could change over iterations even if the labels of their neighbors remain constant. Hence we perform the iterative process until every node in the network has a label to which the maximum number of its neighbors belong. By doing so we obtain a partition of the network into disjoint communities, where every node has at least as many neighbors within its community as it has with any other community. If C1, . . . , Cp are the labels that are currently active in the network and d_i^Cj is the number of neighbors node i has with nodes of label Cj, then the algorithm is stopped when, for every node i,

    if i has label Cm then d_i^Cm ≥ d_i^Cj for all j.

At the end of the iterative process nodes with the same label are grouped together as communities. Our stop criterion characterizing the obtained communities is similar (but not identical) to the definition of strong communities proposed by Radicchi et al. [15]. While strong communities require each node to have strictly more neighbors within its community than outside, the communities obtained by the label propagation process require each node to have at least as many neighbors within its community as it has with each of the other communities. We can describe our proposed label propagation algorithm in the following steps.

(i) Initialize the labels at all nodes in the network. For a given node x, C_x(0) = x.

(ii) Set t = 1.

(iii) Arrange the nodes in the network in a random order and set it to X.

(iv) For each x ∈ X chosen in that specific order, let C_x(t) = f(C_xi1(t), . . . , C_xim(t), C_xi(m+1)(t − 1), . . . , C_xik(t − 1)). Here f returns the label occurring with the highest frequency among the neighbors, with ties broken uniformly at random.

(v) If every node has a label that the maximum number of its neighbors have, then stop the algorithm. Else, set t = t + 1 and go to (iii).

FIG. 2. Nodes are updated one by one as we move from left to right. Due to a high density of edges (the highest possible in this case), all nodes acquire the same label.

FIG. 3. An example of a bipartite network in which the label sets of the two parts are disjoint. In this case, due to the choices made by the nodes at step t, the labels on the nodes oscillate between a and b.
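
Steps (i)–(v) above translate almost directly into code. The following Python sketch (an illustrative re-implementation, not the authors' original code) runs the asynchronous label propagation on an adjacency dictionary mapping each node to the set of its neighbours.

    import random
    from collections import Counter

    def label_propagation(adj):
        """Asynchronous label propagation; adj maps node -> set of neighbours."""
        labels = {v: v for v in adj}                      # step (i): unique labels
        while True:
            order = list(adj)
            random.shuffle(order)                         # step (iii): random order
            for v in order:                               # step (iv): adopt f(...)
                if adj[v]:
                    counts = Counter(labels[u] for u in adj[v])
                    best = max(counts.values())
                    labels[v] = random.choice(
                        [lab for lab, c in counts.items() if c == best])
            # step (v): stop when every node's label is among its neighbours'
            # most frequent labels
            converged = True
            for v in adj:
                if adj[v]:
                    counts = Counter(labels[u] for u in adj[v])
                    if counts[labels[v]] < max(counts.values()):
                        converged = False
                        break
            if converged:
                return labels

Nodes carrying the same label at termination are reported as one community.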

Since we begin the algorithm with each node carrying a unique label, the first few iterations result in various small pockets (dense regions) of nodes forming a consensus (acquiring the same label). These consensus groups then gain momentum and try to acquire more nodes to strengthen the group. However, when a consensus group reaches the border of another consensus group, they start to compete for members. The within-group interactions of the nodes can counteract the pressure from outside if there are fewer between-group edges than within-group edges. The algorithm converges, and the final communities are identified, when a global consensus among groups is reached. Note that even though the network as one single community satisfies the stop criterion, this process of group formation and competition discourages all nodes from acquiring the same label in the case of heterogeneous networks with an underlying community structure. In the case of homogeneous networks such as Erdős-Rényi random graphs [31] that do not have community structures, the label propagation algorithm identifies the giant connected component of these graphs as a single community.

Our stop criterion is only a condition and not a measure that is being maximized or minimized. Consequently there is no unique solution, and more than one distinct partition of a network into groups can satisfy the stop criterion (see Figs. 4 and 5). Since the algorithm breaks ties uniformly at random, early on in the iterative process, when the possibility of ties is high, a node may vote in favor of a randomly chosen community. As a result, multiple community structures are reachable from the same initial condition.

If we know the set of nodes in the network that are likely to act as centers of attraction for their respective communities, then it would be sufficient to initialize such nodes with unique labels, leaving the remaining nodes unlabeled. In this case, when we apply the proposed algorithm, the unlabeled nodes will have a tendency to acquire labels from their closest attractor and join that community. Also, restricting the set of nodes initialized with labels will reduce the range of possible solutions that the algorithm can produce. Since it is generally difficult to identify nodes that are central to a community before identifying the community itself, here we give all nodes equal importance at the beginning of the algorithm and provide each with a unique label.

We apply our algorithm to the following networks. The first one is Zachary's karate club network, which is a network of friendship among 34 members of a karate club [32]. Over a period of time the club split into two factions due to leadership issues and each member joined one of the two factions. The second network that we consider is the U.S. college football network that consists of 115 college teams represented as nodes and has edges between teams that played each other during the regular season in the year 2000 [5]. The teams are divided into conferences (communities) and each team plays more games within its own conference than interconference games. Next is the coauthorship network of 16 726 scientists who have posted preprints on the condensed matter archive at www.arxiv.org; the edges connect scientists who coauthored a paper [33]. It has been shown that communities in coauthorship networks are made up of researchers working in the same field or of research groups [22]. Along similar lines one can expect an actor collaboration network to have communities containing actors of a similar genre. Here we consider an actor collaboration network of 374 511 nodes with edges running between actors who have acted in at least one movie together [3]. We also consider a protein-protein interaction network [34] consisting of 2115 nodes. The communities here are likely to reflect the functional groupings of this network. And finally we consider a subset of the WWW consisting of 325 729 web pages within the nd.edu domain and the hyperlinks interconnecting them [2]. Communities here are expected to be groups of pages on similar topics.

FIG. 4. (a)-(c) are three different community structures identified by the algorithm on Zachary's karate club network. The communities can be identified by their shades of gray.

A. Multiple community structures

Figure 4 shows three different solutions obtained for Zachary's karate club network and Fig. 5 shows two different solutions obtained for the U.S. college football network. We will show that even though we obtain different solutions (community structures), they are similar to each other. To find the percentage of nodes classified in the same group in two different solutions, we form a matrix M, where M_ij is the number of nodes common to community i in one solution and community j in the other solution. Then we calculate f_same = (1/2)[Σ_i max_j(M_ij) + Σ_j max_i(M_ij)] × 100/n. Given a network whose communities are already known, a community detection algorithm is commonly evaluated based on the percentage (or number) of nodes that are grouped into the correct communities [22,26]. f_same is similar, whereby fixing one solution we evaluate how close the other solution is to the fixed one and vice versa. While f_same can identify how close one solution is to another, it is, however, not sensitive to the seriousness of errors. For example, when a few nodes from several different communities in one solution are fused together as a single community in another solution, the value of f_same does not change much. Hence we also use Jaccard's index, which has been shown to be more sensitive to such differences between solutions [35]. If a stands for the pairs of nodes that are classified in the same community in both solutions, b for pairs of nodes that are in the same community in the first solution and different in the second, and c vice versa, then Jaccard's index is defined as a/(a + b + c). It takes values between 0 and 1, with higher values indicating stronger similarity between the two solutions. Figure 6 shows the similarities between solutions obtained from applying the algorithm five different times on the same network. For a given network, the ijth entry in the lower triangle of the table is the Jaccard index for solutions i and j, while the ijth entry in the upper triangle is the measure f_same for solutions i and j. We can see that the solutions obtained from the five different runs are similar, implying that the proposed label propagation algorithm can effectively identify the community structure of any given network. Moreover, the tight range and high values of the modularity measure Q obtained for the five solutions (Fig. 6) suggest that the partitions denote significant community structures.

FIG. 5. The grouping of U.S. college football teams into conferences is shown in (a) and (b). Each solution ((a) and (b)) is an aggregate of five different solutions obtained by applying the algorithm on the college football network.
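
Both similarity measures defined above are straightforward to compute from two labellings. The sketch below uses illustrative inputs sol1 and sol2 (dicts mapping node to community label) and follows the definitions given in the text.

    from collections import defaultdict
    from itertools import combinations

    def f_same(sol1, sol2):
        """f_same = (1/2) (sum_i max_j M_ij + sum_j max_i M_ij) * 100 / n."""
        M = defaultdict(int)
        for v in sol1:
            M[(sol1[v], sol2[v])] += 1
        row_max, col_max = defaultdict(int), defaultdict(int)
        for (i, j), count in M.items():
            row_max[i] = max(row_max[i], count)
            col_max[j] = max(col_max[j], count)
        return 0.5 * (sum(row_max.values()) + sum(col_max.values())) * 100.0 / len(sol1)

    def jaccard_index(sol1, sol2):
        """Jaccard's index a / (a + b + c) over all pairs of nodes."""
        a = b = c = 0
        for u, v in combinations(sol1, 2):
            same1, same2 = sol1[u] == sol1[v], sol2[u] == sol2[v]
            if same1 and same2:
                a += 1
            elif same1:
                b += 1
            elif same2:
                c += 1
        return a / (a + b + c) if (a + b + c) else 1.0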

B. Aggregate

It is difficult to pick one solution as the best among several different ones. Furthermore, one solution may be able to identify a community that was not discovered in the other and vice versa. Hence an aggregate of all the different solutions can provide a community structure containing the most useful information. In our case a solution is a set of labels on the nodes in the network and all nodes having the same label form a community. Given two different solutions, we combine them as follows: let C¹ denote the labels on the nodes in solution 1 and C² denote the labels on the nodes in solution 2. Then, for a given node x, we define a new label as C_x = (C_x¹, C_x²) (see Fig. 7). Starting with a network initialized with labels C, we perform the iterative process of label propagation until every node in the network is in a community to which the maximum number of its neighbors belong. As and when new solutions become available, they are combined one by one with the aggregate solution to form a new aggregate solution. Note that when we aggregate two solutions, if a community T in one solution is broken into two (or more) different communities S1 and S2 in the other, then by defining the new labels as described above we are showing preference to the smaller communities S1 and S2 over T. This is only one of the many ways in which different solutions can be aggregated. For other methods of aggregation used in community detection refer to [26,36,37].
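
Aggregating two solutions only requires pairing the labels node by node; a minimal sketch is shown below. The resulting labelling is then used as the initial condition of the propagation process in place of the unique labels of step (i), which yields the aggregate solution.

    def aggregate_labels(sol1, sol2):
        """New label of node x is the pair (label in solution 1, label in solution 2)."""
        return {v: (sol1[v], sol2[v]) for v in sol1}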

Figure 8 shows the similarities between aggregate solutions. The algorithm was applied on each network 30 times and the solutions were recorded. The ijth entry is the Jaccard index for the aggregate of the first 5i solutions with the aggregate of the first 5j solutions. We observe that the aggregate solutions are very similar in nature and hence a small set of solutions (5 in this case) can offer as much insight about the community structure of a network as a larger solution set can. In particular, the WWW network, which had low similarities between individual solutions (Jaccard index range 0.4883-0.5931), shows considerably improved similarities (Jaccard index range 0.6604-0.7196) between aggregate solutions.

FIG. 6. Similarities between the five different solutions obtained for each network are tabulated. An entry in the ith row and jth column in the lower triangle of each of the tables is the Jaccard similarity index for solutions i and j of the corresponding network. Entries in the ith row and jth column in the upper triangle of the tables are the values of the measure f_same for solutions i and j in the respective networks. The range of modularity values Q obtained for the five different solutions is also given for each network.

IV. VALIDATION OF THE COMMUNITY DETECTION ALGORITHM

Since we know the communities present in Zachary's karate club and the U.S. college football network, we explicitly verify the accuracy of the algorithm by applying it on these networks. We find that the algorithm can effectively unearth the underlying community structures in the respective networks. The community structures obtained by using our algorithm on Zachary's karate club network are shown in Fig. 4. While all three solutions are outcomes of the algorithm applied to the network, Fig. 4(b) reflects the true solution [32].

Figure 5 gives two solutions for the U.S. college football network. The algorithm was applied to this network ten different times and the two solutions are the aggregates of the first five and the remaining five solutions. In both Figs. 5(a) and 5(b), we can see that the algorithm can effectively identify all the conferences with the exception of the Sunbelt conference. The reason for the discrepancy is the following: among the seven teams in the Sunbelt conference, four teams (Sunbelt4 = {North Texas, Arkansas State, Idaho, New Mexico State}) have all played each other and three teams (Sunbelt3 = {Louisiana-Monroe, Middle Tennessee State, Louisiana-Lafayette}) have again played one another. There is only one game connecting Sunbelt4 and Sunbelt3, namely, the game between North Texas and Louisiana-Lafayette. However, four teams from the Sunbelt conference (two each from Sunbelt4 and Sunbelt3) have together played with seven different teams in the Southeastern conference. Hence we have the Sunbelt conference grouped together with the Southeastern conference in Fig. 5(a). In Fig. 5(b), the Sunbelt conference breaks into two, with Sunbelt3 grouped together with the Southeastern conference and Sunbelt4 grouped with an independent team (Utah State), a team from the Western Athletic conference (Boise State), and the Mountain West conference. The latter grouping is due to the fact that every member of Sunbelt4 has played with Utah State and with Boise State, who have together played five games with four different teams in the Mountain West conference. There are also five independent teams which do not belong to any specific conference and are hence assigned by the algorithm to a conference where they have played the maximum number of their games.

FIG. 7. An example of aggregating two community structure solutions. t1, t2, t3, and t4 are labels on the nodes in a network obtained from solution 1 and denoted as C¹. The network is partitioned into groups of nodes having the same labels. s1, s2, and s3 are labels on the nodes in the same network obtained from solution 2 and denoted as C². All nodes that had label t1 in solution 1 are split into two groups, with the groups having labels s1 and s2, respectively, while all nodes with labels t3, t4, or t5 in solution 1 have label s3 in solution 2. C represents the new labels defined from C¹ and C².

FIG. 8. Similarities between aggregate solutions obtained for each network. An entry in the ith row and jth column in the tables is the Jaccard similarity index between the aggregate of the first 5i and the first 5j solutions. While the similarities between solutions for the karate club friendship network and the protein-protein interaction network are represented in the lower triangles of the first two tables, the entries in the upper triangles of these two tables are for the U.S. college football network and the coauthorship network, respectively. The similarities between aggregate solutions for the WWW are given in the lower triangle of the third table.

V. TIME COMPLEXITY

It takes near-linear time for the algorithm to run to completion. Initializing every node with a unique label requires O(n) time. Each iteration of the label propagation algorithm takes time linear in the number of edges (O(m)). At each node x, we first group its neighbors according to their labels (O(d_x)). We then pick the group of maximum size and assign its label to x, requiring a worst-case time of O(d_x). This process is repeated at all nodes and hence the overall time is O(m) for each iteration.

As the number of iterations increases, the number of nodes that are classified correctly increases. Here we assume that a node is classified correctly if it has a label that the maximum number of its neighbors have. From our experiments, we found that irrespective of n, 95% of the nodes or more are classified correctly by the end of iteration 5. Even in the case of Erdős-Rényi random graphs [31] with n between 100 and 10 000 and average degree 4, which do not have community structures, by iteration 5, 95% of the nodes or more are classified correctly. In this case, the algorithm identified all nodes in the giant connected component as belonging to one community.

When the algorithm terminates it is possible that two or more disconnected groups of nodes have the same label (the groups are connected in the network via other nodes of different labels). This happens when two or more neighbors of a node receive its label and pass the labels in different directions, which ultimately leads to different communities adopting the same label. In such cases, after the algorithm terminates one can run a simple breadth-first search on the subnetworks of each individual group to separate the disconnected communities. This requires an overall time of O(m + n). When aggregating solutions, however, we rarely find disconnected groups within communities.
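
This post-processing step can be sketched as a breadth-first search restricted to same-label neighbours; it relabels each connected same-label group with a fresh identifier in O(m + n) time overall.

    from collections import deque

    def split_disconnected_communities(adj, labels):
        """Give disconnected groups that share a label distinct new labels."""
        new_labels, seen, next_id = {}, set(), 0
        for start in adj:
            if start in seen:
                continue
            seen.add(start)
            queue = deque([start])
            while queue:                       # BFS within one same-label component
                v = queue.popleft()
                new_labels[v] = next_id
                for u in adj[v]:
                    if u not in seen and labels[u] == labels[v]:
                        seen.add(u)
                        queue.append(u)
            next_id += 1
        return new_labels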

VI. DISCUSSION AND CONCLUSIONS

The proposed label propagation process uses only the network structure to guide its progress and requires no external parameter settings. Each node makes its own decision regarding the community to which it belongs based on the communities of its immediate neighbors. These localized decisions lead to the emergence of community structures in a given network. We verified the accuracy of the community structures found by the algorithm using Zachary's karate club and the U.S. college football networks. Furthermore, the modularity measure Q was significant for all the solutions obtained, indicating the effectiveness of the algorithm. Each iteration takes linear time O(m), and although one can observe the algorithm beginning to converge significantly after about five iterations, mathematical convergence is hard to prove. Other algorithms that run on a similar time scale include the algorithm of Wu and Huberman [26] (with time complexity O(m + n)) and that of Clauset et al. [30], which has a running time of O(n log² n).

The algorithm of Wu and Huberman is used to break a given network into only two communities. In this iterative process two chosen nodes are initialized with scalar values 1 and 0 and every node updates its value as the average of the values of its neighbors. At convergence, if a maximum number of a node's neighbors have values above a given threshold then so will the node. Hence a node tends to be classified into a community to which the maximum number of its neighbors belong. Similarly, if in our algorithm we choose the same two nodes and provide them with two distinct labels (leaving the others unlabeled), the label propagation process will yield communities similar to those of the Wu and Huberman algorithm. However, to find more than two communities in the network, the Wu and Huberman algorithm needs to know a priori how many communities there are in the network. Furthermore, if one knows that there are c communities in the network, the algorithm proposed by Wu and Huberman can only find communities that are approximately of the same size, that is, n/c, and it is not possible to find communities with heterogeneous sizes. The main advantage of our proposed label propagation algorithm over the Wu and Huberman algorithm is that we do not need a priori information on the number and sizes of the communities in a given network; indeed such information usually is not available for real-world networks. Also, our algorithm does not place restrictions on the community sizes. It determines such information about the communities by using the network structure alone.

In our test networks, the label propagation algorithm found communities whose sizes follow approximately a power-law distribution P(S ≥ s) ~ s^(−α), with the exponent α ranging between 0.5 and 2 (Fig. 9). This implies that there is no characteristic community size in the networks, and it is consistent with previous observations [22,30,38]. While the community size distributions for the WWW and coauthorship networks approximately follow power laws with a cutoff, with exponents 1.15 and 1.98, respectively, there is a clear crossover from one scaling relation to another for the actor collaboration network. The community size distribution for the actor collaboration network has a power-law exponent of 2 for sizes up to 164 nodes and 0.5 between 164 and 7425 nodes (see Fig. 9).

In the hierarchical agglomerative algorithm of Clauset et al. [30], the partition that corresponds to the maximum Q is taken to be the most indicative of the community structure in the network. Other partitions with high Q values will have a structure similar to that of the maximum Q partition, as these solutions are obtained by progressively aggregating two groups at a time. Our proposed label propagation algorithm, on the other hand, finds multiple significantly modular solutions that have some amount of dissimilarity. For the WWW network in particular, the similarity between five different solutions is low, with the Jaccard index ranging between 0.4883 and 0.5921, yet all five are significantly modular, with Q between 0.857 and 0.864. This implies that the proposed algorithm can find not just one but multiple significant community structures, supporting the existence of overlapping communities in many real-world networks [14].

ACKNOWLEDGMENTS

The authors would like to acknowledge the National Science Foundation (Grants No. SST 0427840, No. DMI 0537992, and No. CCF 0643529). One of the authors (R.A.) acknowledges support from the Sloan Foundation.

FIG. 9. The cumulative probability distributions of community sizes (s) are shown for the WWW, coauthorship, and actor collaboration networks. They approximately follow power laws with the exponents as shown.

[1] R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002).
[2] R. Albert, H. Jeong, and A.-L. Barabási, Nature (London) 401, 130 (1999).
[3] A.-L. Barabási and R. Albert, Science 286, 509 (1999).
[4] M. Newman, SIAM Rev. 45, 167 (2003).
[5] M. Girvan and M. Newman, Proc. Natl. Acad. Sci. U.S.A. 99, 7821 (2002).
[6] S. Wasserman and K. Faust, Social Network Analysis (Cambridge University Press, Cambridge, England, 1994).
[7] L. Danon, A. Díaz-Guilera, and A. Arenas, J. Stat. Mech.: Theor. Exp. 2006, P11010 (2006).
[8] J. Eckmann and E. Moses, Proc. Natl. Acad. Sci. U.S.A. 99, 5825 (2002).
[9] G. Flake, S. Lawrence, and C. Giles, Proceedings of the 6th ACM SIGKDD, 2000, pp. 150-160.
[10] R. Guimerà and L. Amaral, Nature (London) 433, 895 (2005).
[11] M. Gustafsson, M. Hornquist, and A. Lombardi, Physica A 367, 559 (2006).
[12] M. B. Hastings, Phys. Rev. E 74, 035102(R) (2006).
[13] M. E. J. Newman and M. Girvan, Phys. Rev. E 69, 026113 (2004).
[14] G. Palla, I. Derényi, I. Farkas, and T. Vicsek, Nature (London) 435, 814 (2005).
[15] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi, Proc. Natl. Acad. Sci. U.S.A. 101, 2658 (2004).
[16] D. Karger, J. ACM 47, 46 (2000).
[17] B. Kernighan and S. Lin, Bell Syst. Tech. J. 29, 291 (1970).
[18] C. Fiduccia and R. Mattheyses, Proceedings of the 19th Annual ACM IEEE Design Automation Conference, 1982, pp. 175-181.
[19] B. Hendrickson and R. Leland, SIAM J. Sci. Comput. 16, 452 (1995).
[20] M. Stoer and F. Wagner, J. ACM 44, 585 (1997).
[21] C. Thompson, Proceedings of the 11th Annual ACM Symposium on Theory of Computing, 1979, pp. 81-88.
[22] M. E. J. Newman, Phys. Rev. E 69, 066133 (2004).
[23] P. Pons and M. Latapy, e-print arXiv:physics/0512106.
[24] J. Duch and A. Arenas, Phys. Rev. E 72, 027104 (2005).
[25] M. E. J. Newman, Phys. Rev. E 74, 036104 (2006).
[26] F. Wu and B. Huberman, Eur. Phys. J. B 38, 331 (2004).
[27] J. P. Bagrow and E. Bollt, Phys. Rev. E 72, 046108 (2005).
[28] L. Costa, e-print arXiv:cond-mat/0405022.
[29] M. E. J. Newman, Eur. Phys. J. B 38, 321 (2004).
[30] A. Clauset, M. E. J. Newman, and C. Moore, Phys. Rev. E 70, 066111 (2004).
[31] B. Bollobás, Random Graphs (Academic Press, Orlando, FL, 1985).
[32] W. Zachary, J. Anthropol. Res. 33, 452 (1977).
[33] M. Newman, Proc. Natl. Acad. Sci. U.S.A. 98, 404 (2001).
[34] H. Jeong, S. Mason, A.-L. Barabási, and Z. Oltvai, Nature (London) 411, 41 (2001).
[35] G. Milligan and D. Schilling, Multivariate Behav. Res. 20, 97 (1985).
[36] D. Gfeller, J. C. Chappelier, and P. De Los Rios, Phys. Rev. E 72, 056135 (2005).
[37] D. Wilkinson and B. Huberman, Proc. Natl. Acad. Sci. U.S.A. 101, 5241 (2004).
[38] A. Arenas, L. Danon, A. Díaz-Guilera, P. Gleiser, and R. Guimerà, Eur. Phys. J. B 38, 373 (2004).


Int. J. Sensor Networks, Vol. 2, Nos. 3/4, 2007

Decentralised topology control algorithms for connectivity of distributed wireless sensor networks

Usha Nandini Raghavan* and Soundar R.T. Kumara
Department of Industrial Engineering, The Pennsylvania State University, University Park, PA, USA
E-mail: [email protected]
E-mail: [email protected]
*Corresponding author

Abstract: In this paper, we study the problem of maintaining the connectivity of a Wireless Sensor Network (WSN) using decentralised topology control protocols. Previous algorithms for topology control require knowledge of the density of nodes (λ) in the sensing region. However, if λ varies continuously over time, updating this information at all nodes is impractical. Therefore, in addition to efficient maintenance of connectivity, we also wish to reduce the control overhead of the topology control algorithm. In the absence of information regarding λ we study the connectivity properties of WSNs by means of giant components. We show that maintaining an out-degree of five at each node gives rise to a giant connected component in the network. We also show that this is the smallest value that can maintain a giant connected component irrespective of how often or by how much λ changes.

Keywords: wireless sensor networks; topology control; percolation; giant connected component.

Reference to this paper should be made as follows: Raghavan, U.N. and Kumara, S.R.T. (2007) 'Decentralised topology control algorithms for connectivity of distributed wireless sensor networks', Int. J. Sensor Networks, Vol. 2, Nos. 3/4, pp.201-210.

Biographical notes: Usha Nandini Raghavan is a PhD student in the Department of Industrial Engineering at the Pennsylvania State University. Her main research interest is in the self-organisation of complex networks and localised algorithms as applied to wireless networks. Other research interests include graph theory and supply chain management. She obtained her Master's in Mathematics from the Indian Institute of Technology, Madras and her Master's in Industrial Engineering and Operations Research from the Pennsylvania State University.

Soundar R.T. Kumara is a Distinguished Professor of Industrial Engineering at the Pennsylvania State University. He received joint appointments with the Department of Computer Science and Engineering and the School of Information Sciences and Technology. His research interests include complexity in sensor networks, logistics and manufacturing, software agents and neural networks. He is an elected active member of the International Institute of Production Research.

1 Introduction

In this paper, we discuss two kinds of connectivity problems on large-scale self-organising Wireless Sensor Networks (WSNs). In the first case the goal is to obtain an entirely connected network, while in the second case the goal is to obtain only a giant connected component. In networks such as a computer network or the internet, it is critical to ensure that every node can communicate with every other node via the communication links (maybe multihop). However, in a WSN where a large number of sensors coordinate to achieve a global sensing task, it may be needless to spend extreme amounts of energy to ensure that the very last node is connected. Instead it may be optimal to settle for just a giant connected component, that is, a connected component which contains a large fraction of the nodes. We specifically concentrate on the second type of connectivity problem, namely giant connected components in WSNs.

The topology of a WSN consists of a set of sensor nodes that perform the sensing tasks and the communication links between these nodes that drive the networking exercises in the system (Bharathidasan and Ponduru, 2005; Estrin et al., 1999; Goldsmith and Wicker, 2002; Hac, 2003; Ramanathan and Rosales-Hain, 2000; Santi, 2005). The nodes are usually battery powered and the presence or absence of communication links between the nodes is influenced by the distance between them. That is, due to severe energy constraints the nodes communicate only with nodes that are within a small neighbourhood (Bharathidasan and Ponduru, 2005; Estrin et al., 1999).

Distributed topology control algorithms (or protocols) are usually employed to control how far or with how many nodes a given node should communicate (Estrin et al., 1999; Goldsmith and Wicker, 2002; Santi, 2005). There exist many works that concentrate on distributed topology control (Bettstetter, 2002a,b; Blough et al., 2003; Cerpa and Estrin, 2002; Glauche et al., 2003; Li et al., 2001, 2003; Rodoplu and Meng, 1999). Such topology control protocols are required because, very often, the wrong topology can considerably reduce the performance of the system. For example, a sparse network can increase the end-to-end packet delay and threaten the connectivity of the network. On the other hand, a dense network can promise a connected network with higher probability but also leads to higher interference in the network, resulting in limited spatial reuse (Ramanathan and Rosales-Hain, 2000).

In this paper, we are particularly interested in developing topology control algorithms for WSNs that are large-scale and lack a centralised authority. Advantages of such distributed WSNs include the ability to rapidly deploy the nodes in a sensing region (which may be unmanned or inhospitable), the distributed nature which allows for robust network performance and the lack of single points of failure. In addition it is also possible to tailor the network design for intended applications (Goldsmith and Wicker, 2002). We further assume that the density of active nodes can vary with time. This variation in density arises because of nodes dying due to loss of power (see Figure 1) or, in the case when they have energy harvesting capabilities, because the nodes may go through an on/off cycle. In the 'off' period, the nodes harvest energy and do not participate in the sensing and networking tasks. Once they acquire sufficient energy they switch 'on' to join the network.

Figure 1 Density of nodes decreases as time increases when the nodes are battery powered. Note that if the transmission radii are independent of the density (constant in this case), then the number of neighbours of a node decreases as the density decreases. Thus the probability of connectivity of the network also decreases

Connectivity of WSNs is well researched and various results exist in the literature (Bettstetter, 2002a,b; Booth et al., 2003; Farago, 2002, 2004; Franceschetti et al., 2003; Glauche et al., 2003; Gupta and Kumar, 1998; Krishnamachari et al., 2002; Meester and Roy, 1996; Penrose, 2003; Xue and Kumar, 2004; Ye and Heidmann, 2003). In all the sensor network models considered so far, it has been established that there exist optimal values for either the transmission radius or the number of neighbours at each node that lead to the connectivity of the network (Gupta and Kumar, 1998; Krishnamachari et al., 2002; Meester and Roy, 1996; Xue and Kumar, 2004). However, in all these cases, the critical values depend on the density of nodes (λ) or, equivalently, the number of nodes (N) in the sensing region. This would imply that in order to maintain connectivity, the topology control protocols will require an update on the current density at all nodes. Such global updates (required at desired time intervals over the entire lifetime of the network) lead to prohibitive control overhead (especially so in our scenario) and we would like to avoid the same. Note that maintaining an optimal transmission radius or node degree is important because energy conservation and low interference are some of the primary objectives of WSNs (Estrin et al., 1999; Goldsmith and Wicker, 2002; Santi, 2005).

Our focus therefore is to develop decentralised topology control algorithms which can maintain connectivity using only the localised information available at each node. We do not assume any global information (e.g. density) to be available at the nodes. In such cases the best one can hope to do is to pool in as many nodes as possible to form a connected network at any point in time (Santi, 2005). That is, our measure of connectivity is based on the presence of giant components in the network. We will in fact show that this relaxation helps us in finding density-independent values for the number of neighbours needed to maintain connectivity. Though simulation based results exist (Santi, 2005), to our knowledge we have not seen an analytical treatment of this problem. In this paper we attempt to do the same.

2 Problem statement

We consider a network of nodes that are distributed according to a Poisson point process of intensity λ in R². Each node has directed edges pointing towards its nearest k neighbours. Our goal is to find the smallest value of k that gives rise to an unbounded connected component, or giant component, in the bidirectional subnetwork.

The rest of this paper is organised as follows. We begin Section 3 with some preliminaries on graph terminology and an introduction to percolation. We further review some WSN models and a class of models called nearest k-neighbours networks. In Section 4 we review the results from the literature on connectivity of WSNs, and on topology control in Section 5. In Section 6 we show that for the nearest k-neighbours model, the critical value for the appearance of a giant component in terms of k is 5. This is followed by verification of the analysis using simulation and an estimate of the energy consumed in maintaining a degree k in the nearest k-neighbours network. In Section 7, the proposed topology control algorithm and critical thresholds for the appearance of giant strongly connected components with respect to k are discussed. We finally conclude in Section 8.

3 Wireless sensor network models

In this section, we review some definitions and concepts from graph theory and percolation. We further look at different classes of wireless network models (Sections 3.2.1 and 3.2.2) and a new class of models called the nearest k-neighbours model (Section 3.2.3).

3.1 Preliminaries

• Network: an undirected network G(V, E) consists of a set of nodes V, a set of edges E and a function w : E → V × V. That is, every element of the set E is mapped to an ordered pair of points from the set V × V. On the other hand, a directed network again consists of the sets V and E, but has two functions s, t : E → V, where s(e) represents the source and t(e) represents the target of the edge e.

• Degree: the degree of a node v ∈ V is the number of edges incident on that node. In the case of a directed graph we have two different kinds of degrees on a node, namely in-degree and out-degree. While the in-degree of v is the number of edges with their target at v, the out-degree of v is the number of edges whose sources are at v. In undirected graphs the degree of a node is simply the number of edges incident on it.

• Path: a (directed) path in a directed network is an alternating sequence of nodes and edges denoted as v0, e1, v1, e2, . . . , vi, ei+1, . . . , en, vn. Here v0 is the origin and vn is the terminus of the path. ei+1 is the edge that has its source at vi and target at vi+1. In undirected networks, a path is again an alternating sequence of nodes and edges and ei+1 is the edge incident on both vi and vi+1.

• Connected network: in undirected networks, if a subset V1 ⊆ V is such that there exists a path between any two nodes x, y ∈ V1, then the network H1(V1, E_V1) is called a connected component of the network G(V, E). Here E_V1 ⊆ E contains only those edges that have both of the nodes they are incident on in V1. Further, H1(V1, E_V1) is maximally connected if there exists no V2 ⊆ V such that V1 ⊆ V2 and H2(V2, E_V2) is connected. In general it is possible to partition V into disjoint sets V1, V2, . . . , Vi (note that V1 ∪ V2 ∪ . . . ∪ Vi = V) such that Hj(Vj, E_Vj) is maximally connected in G, for all j = 1, 2, . . . , i. We call a network connected if and only if in such a partition i = 1 (Definition 1). Also, if i > 1 in such a partition, but the size of the largest component is O(N), then the network is said to have a giant component (Definition 2). If the nodes in the system are assumed to be spread across the entire R² space with some density λ, then the giant component is also called the unbounded connected component. That is, the number of nodes in the largest component is unbounded. However, unless otherwise mentioned we assume the former definition (Definition 1) for a connected network.

• Poisson point process: given a compact set K (in R^d), a point process X is a measurable mapping from a probability space to the configurations of points of K. The total number of points in a point process is then a random variable. Further, a point process X in R^d is a Poisson point process of intensity λ (where λ = E(X[0, 1]^d)) if (1) for mutually disjoint Borel sets A1, . . . , Ak, the random variables X(A1), . . . , X(Ak) are mutually independent and (2) for any bounded Borel set A we have, for every k ≥ 0, P(X(A) = k) = e^(−λ ℓ(A)) (λ^k ℓ(A)^k)/k!, where ℓ(·) denotes the Lebesgue measure in R^d (Meester and Roy, 1996). In this paper we only consider the case where d = 2. Also, we can simulate a Poisson point process of intensity λ in a finite region of area A as follows (see the sketch after this list).

– First generate the number of points in the region of area A from a Poisson distribution of mean λA.

– Place these points in the region uniformly at random.

In most cases for simulation (and as in this paper), it is sufficient to assume that the number of points is λA, instead of generating this number from a Poisson distribution of mean λA. In this case it is called a uniform Poisson point process.

• Percolation: percolation theory studies the flow of fluid across a random medium, in particular on a regular d-dimensional lattice where the edges are either present or absent with probabilities p and 1 − p, respectively (Albert and Barabasi, 2002; Bollobas, 1985; Meester and Roy, 1996). It is obvious that for small p only a few edges are present and hence percolation of a fluid across this medium is not possible. But one of the interesting phenomena is the presence of a percolation threshold p_c, at which a percolating cluster of nodes connected by edges begins to appear rather suddenly. That is, for p < p_c a percolating cluster does not exist almost surely, while for p > p_c it exists almost surely. To put it in simple terms, for small values of p, only a few edges are present in the network and hence percolation is not possible. However, as p increases gradually, so does the number of edges in the network, and hence one would expect the possibility of percolation to also increase gradually. On the contrary, with the gradual increase in p, the appearance of a percolating cluster arises rather suddenly. Suppose we consider a network in which any given pair of nodes is connected with a probability p (also known as Erdos-Renyi random graphs (Bollobas, 1985)); we can see from Figure 2 that as p increases gradually from 0, the giant component in the network appears suddenly.
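
The two simulation steps described in the Poisson point process item can be sketched in Python with numpy; the uniform flag selects the uniform Poisson point process variant in which the number of points is fixed at λA. The parameter names are illustrative.

    import numpy as np

    def poisson_points(lam, side, uniform=False):
        """Simulate a Poisson point process of intensity lam on a side x side square:
        draw the number of points from a Poisson distribution of mean lam*area
        (or fix it at lam*area for the uniform variant) and place them uniformly."""
        area = side * side
        n = int(lam * area) if uniform else np.random.poisson(lam * area)
        xs = np.random.uniform(0.0, side, size=n)
        ys = np.random.uniform(0.0, side, size=n)
        return list(zip(xs, ys))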

Note that a percolating cluster of nodes in a network is the same as a connected network. In most cases in the literature, percolation properties (connectivity) of large-scale networks (systems) are measured in terms of giant components (Albert and Barabasi, 2002; Bollobas, 1985; Meester and Roy, 1996; Penrose, 2003). Thus critical thresholds are determined for the appearance of a giant component in the networks. This is precisely the kind of approach we will be using in this paper. However, instead of an edge connection probability p, we consider a different parameter k, which is the number of neighbours of a node in the network, and find critical thresholds for connectivity with respect to k. It is important to note that the critical threshold gives the point where connectivity can be obtained with as few edges as possible.
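
The sudden appearance of the giant component (Figure 2) is easy to reproduce numerically. The sketch below builds Erdos-Renyi graphs G(n, p) for increasing p and reports the fraction of nodes in the largest connected component; the parameter values are illustrative.

    import random

    def largest_component_fraction(n, p):
        """Fraction of nodes in the largest connected component of G(n, p)."""
        adj = {v: set() for v in range(n)}
        for u in range(n):
            for v in range(u + 1, n):
                if random.random() < p:
                    adj[u].add(v)
                    adj[v].add(u)
        seen, best = set(), 0
        for start in range(n):
            if start in seen:
                continue
            stack, size = [start], 0
            seen.add(start)
            while stack:                      # depth-first search of one component
                v = stack.pop()
                size += 1
                for u in adj[v]:
                    if u not in seen:
                        seen.add(u)
                        stack.append(u)
            best = max(best, size)
        return best / n

    for p in (0.0005, 0.001, 0.002, 0.004, 0.008):
        print(p, largest_component_fraction(500, p))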


Figure 2 The graph shows the size of the largest connected component in a network of 500 nodes as a function of the connection probability p. Note that the giant component arises suddenly as p increases gradually

3.2 Network models

Large numbers of tiny sensors are usually placed randomly in a sensing region. Hence the node distribution is assumed to follow a Poisson point process in the sensing region with a tunable parameter for the density (denoted as λ). The edges (which may be directed or undirected) of the network are formed by either setting the transmission radius at each node or by choosing neighbours based on a connection function (see Figure 3). In this section we review different classes of network models that are commonly used as representations of WSNs.

Figure 3 An example of a sensor network in which nodes are uniformly randomly distributed in a sensing region. The edges of the network are formed based on the transmission radii at the nodes

3.2.1 Poisson Boolean model

Here the nodes are assumed to follow a Poisson point process X of intensity λ in the sensing region (which can be either the entire R² space or a two-dimensional unit cube). Each point of X is the centre of a ball of random radius. These radii are independent of X. The Poisson Boolean model is usually denoted as (X, ρ), where ρ is the random variable for the radii at the points of X (Meester and Roy, 1996).

In this case, the points of X can be thought of as sensor nodes and the radii of the balls represent the transmission radii of the sensors. For the case when ρ = r a.s. (almost surely), numerical values and bounds on the critical thresholds of parameters such as λ and r for the appearance of unique giant connected components are available (Dall and Christensen, 2002; Meester and Roy, 1996). Here (and elsewhere) the uniqueness of the giant component is interesting. This implies that there is at most and at least one giant component, and that there do not exist two disjoint giant components in the network. Also note that the cases where ρ = r a.s. are also called fixed radius models (Krishnamachari et al., 2002).

3.2.2 Poisson random connection model

Similar to the Poisson Boolean model, the random connection model is also driven by a Poisson point process X. However, unlike the Boolean case, here the existence of edges between nodes is determined by a non-increasing function g from the set of positive reals to [0, 1]. Thus, given any pair of points x and y, the existence of an (undirected) edge is determined with probability g(‖x − y‖), independently of other pairs. ‖ · ‖ here stands for the Euclidean metric. A Poisson random connection model is usually denoted as (X, g). It has been shown that there exist finite critical thresholds (λ_c(g)) that depend on g, for the appearance of unique giant connected components in such networks (Meester and Roy, 1996).

3.2.3 Nearest k-neighbours model

The third kind of model that we consider here is a neighbour based model. Unlike the above models, here the nodes form directed edges towards their closest k neighbours in the plane. k here is a parameter and is a fixed number for all nodes in the network. This is usually referred to as the nearest k-neighbours model (Blough et al., 2003; Raghavan et al., 2005). In this paper, however, we take the nearest k-neighbours model to be the bidirectional subnetwork obtained from such a directed network. This model cannot be captured by either the Poisson Boolean model or the random connection model. In the random connection model the connection function g is universal across all nodes in the network. Suppose that for two different nodes, say x and y, the distances to their kth closest neighbours are dx and dy; then it is possible, without loss of generality, that dx > dy. Hence for node x, g(dx) = 1 and g(dy) = 1, whereas for node y, g(dx) = 0 and g(dy) = 1. Thus, it is not possible to find a connection function that is universal across all nodes in the network. Also, in the Poisson Boolean model, ρ is independent of X, which is not the case here.

Even though k is independent of λ (or N), it is possible that the critical value k_c is dependent on λ. However, we will establish that k_c for connectivity (in terms of a giant component in the network) is independent of λ and thus can be used in the design of decentralised topology control algorithms.
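
The model can be constructed directly from a set of points; the following brute-force Python sketch forms the directed nearest-k edges and keeps only the bidirectional ones, as assumed in this paper. It is illustrative only and makes no attempt at efficiency.

    import math

    def nearest_k_bidirectional(points, k):
        """Each node points to its k closest neighbours (Euclidean distance);
        an undirected edge (i, j) is kept only if i and j choose each other."""
        n = len(points)

        def dist(i, j):
            (x1, y1), (x2, y2) = points[i], points[j]
            return math.hypot(x1 - x2, y1 - y2)

        chosen = []
        for i in range(n):
            others = sorted((j for j in range(n) if j != i), key=lambda j: dist(i, j))
            chosen.append(set(others[:k]))
        return [(i, j) for i in range(n) for j in chosen[i]
                if i < j and i in chosen[j]]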


4 Background on the connectivity problem in WSNs

Fixed radius models are the most widely used graphs to represent WSNs. Here, there is an (undirected) edge between two nodes if and only if they are no more than a distance r apart. In most cases the distance metric is the ℓ2 norm, and sometimes other norms such as ℓp, 1 ≤ p ≤ ∞, are also considered (Penrose, 2003).

4.1 Critical transmission radius

The problem of connectivity on such networks is well studied and one of the earliest results was proved by Philips et al. (1989). They assumed the sensing region to be a square of area A with a constant density λ. Hence as A → ∞, the number of nodes also increases. They showed that for any given ε > 0, if r ≤ √((1 − ε) ln A/(πλ)), then under the assumption of a constant density the graph is almost surely disconnected as A → ∞. This implies that for any given r and λ, we can always find an A large enough such that the graph is almost surely disconnected. A similar analysis was done by Gupta and Kumar (1998) on a unit disk and they found that for the network to be connected with probability 1, r = √((ln N + c(N))/(πN)), where c(N) → ∞ as N → ∞. N here stands for the number of nodes in the sensing region.
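For intuition, the short sketch below (Python; the choice c(N) = ln ln N is only an illustrative slowly growing function) evaluates the Gupta–Kumar radius for a few network sizes.

import math

def gupta_kumar_radius(n, c_n):
    # r(N) = sqrt((ln N + c(N)) / (pi N)); connectivity w.h.p. requires c(N) -> infinity
    return math.sqrt((math.log(n) + c_n) / (math.pi * n))

for n in (100, 1000, 10000):
    print(n, round(gupta_kumar_radius(n, math.log(math.log(n))), 4))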

Penrose (1999) studied in general the problem of k-connectivity of fixed radius networks in d-dimensional unit cubes (d ≥ 2). He showed that the graph becomes k-connected almost surely whenever all nodes have degree greater than or equal to k. That is, as N → ∞, P{smallest r at which the network is k-connected = smallest r at which the minimum degree ≥ k} → 1. These results hold in general for any ℓp distance metric with 1 < p < ∞. The case of ℓ∞ was discussed by Appel and Russo (2002).

All these critical values, however, depend on N, which is undesirable for our purposes.

4.2 Critical number of neighbours

While all the above work looked at properties of the transmission radii required for the connectivity of the network, other work has focused on the desired number of neighbours. This question was first studied in the context of throughput capacity in packet radio networks by many researchers (Hou and Li, 1986; Kleinrock and Silvester, 1978; Takagi and Kleinrock, 1984). Kleinrock and Silvester (1978) studied the capacity of packet radio networks that are randomly distributed in a region and use slotted ALOHA as their access scheme. They assume that each packet radio unit uses a predetermined fixed radius for transmission and try to maximise the one-hop progress of a packet in the desired direction. The passage of the message from the source to the target was formulated as a stochastic process and they developed an objective (throughput), which they optimised, based on the average number of neighbours. In this case they showed that 6 is the magic number (independent of the system size) that maximises the throughput. Takagi and Kleinrock later revised this number to 8 (Takagi and Kleinrock, 1984). There are also works that similarly suggest other magic numbers (Hou and Li, 1986). They do not, however, address the problem of connectivity of the network.

While simulation suggests that in most cases the fixed radius models are connected by assuming an average number of neighbours of 6 or 8, this is not always the case. Xue and Kumar (2004) studied the problem of connectivity based on the node degree required and showed that the number of neighbours required in fact grows as Θ(log N) and that there exist no such magic numbers. They assume that each node forms undirected edges (two-way communication) incident on its φN nearest neighbours, where N is the number of nodes placed uniformly at random in a unit square. This implies that to maintain the connectivity of the network, the number of neighbours cannot be bounded as the number of nodes increases.

4.3 Continuum percolation and connectivity

The problem of connectivity has also been well researched in the area of continuum percolation (Avram and Bertsimas, 1993; Booth et al., 2003; Franceschetti et al., 2003; Glauche et al., 2003; Meester and Roy, 1996; Penrose, 2003; Quintanilla, 2001; Quintanilla et al., 2000; Raghavan et al., 2005). The presence of a giant component is usually considered as sufficient in order to study the networks' percolation properties. This is unlike all the works described above, which require every node to be present in the giant component, that is, an entirely connected network. The works by Meester and Roy (1996), Penrose (2003), Quintanilla et al. (2000), Quintanilla (2001) and others are concerned with the kind of percolation that occurs by keeping the radius r fixed and varying the density of nodes λ. They show that as the density of the nodes increases, for a given r, there exists a finite percolation threshold λc beyond which a unique giant component always exists. Similarly, in a d-dimensional unit cube, given the number of nodes N, one can find a similar threshold Nc for N. It is easy to translate this threshold in terms of the critical number of neighbours using the relation α = Np (Dall and Christensen, 2002). Here α is the expected number of neighbours and p is the probability that a given node is incident on any other node in the network. For example, p = πr² in a network of nodes in a two-dimensional unit square. Thus, at the threshold, αc = Ncp.

The results so far suggest that αc ≈ 4.51 when d = 2 and αc ≈ 2.7 when d = 3 (Dall and Christensen, 2002; Quintanilla et al., 2000) in the fixed radius models. Also, given N, the critical transmission radius for the appearance of a unique giant component is rc(N) = (1/√π) [αc Γ((d + 2)/2)/N]^(1/d). When d = 2, rc(N) is simply √(αc/(πN)) (Dall and Christensen, 2002; Raghavan et al., 2005). Note that this implies that if the density of nodes is λ in the entire R² space, then rc(λ) = √(αc/(πλ)).

It was also shown that there exist similar critical density thresholds (λc(g)) for the random connection models with connection function g (Meester and Roy, 1996). Further, Franceschetti et al. (2003) showed that squish-squashing a connection function g into another connection function h(x) = p g(√p x) for some 0 < p < 1 will lead to smaller density thresholds for a percolating cluster of nodes. That is, λc(g) ≥ λc(h).
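The following short sketch (Python; the αc values are those quoted in the text, and N is illustrative) evaluates the critical radius for d = 2 and d = 3 and checks that the d = 2 case reduces to √(αc/(πN)).

import math

def critical_radius(n, d, alpha_c):
    # rc(N) = (1/sqrt(pi)) * (alpha_c * Gamma((d + 2)/2) / N) ** (1/d)
    return (alpha_c * math.gamma((d + 2) / 2) / n) ** (1.0 / d) / math.sqrt(math.pi)

N = 1000
print(critical_radius(N, 2, 4.51), math.sqrt(4.51 / (math.pi * N)))  # identical for d = 2
print(critical_radius(N, 3, 2.7))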


Note that h is a version of g in which the probabilities of connection between nodes are reduced by a factor p and which is stretched to maintain the same effective area as g. This implies that the presence of even a few long-range connections can help to reach percolation at a lower density of points. They also introduced another connection function f, which is a shift-squeezed version of g. That is, the function g is shifted by a distance s (thus two nodes that are at most a distance s apart will not be connected) and squeezed so that it still has the same effective area as g. It turns out (by means of simulation) that long-range edges are more helpful in the percolation process than short-range edges for a given density of points. This shows the criticality of long-range edges to the connectivity of a network. This is usually referred to as the small-world concept (Watts and Strogatz, 1998).

5 Background on topology control protocols

Topology control can be achieved in various ways: mechanisms can be location based, direction based or neighbour based, to name a few (Santi, 2005). Most protocols based on such methods try to set the transmission power or radius at the nodes appropriately so as to maintain the connectivity of the network. Note that the transmission power of a node is a measure of how fast its energy depletes.

Location based protocols use information about the positions of the nodes. It is assumed that each node can somehow determine its location accurately (e.g. using GPS). Examples of protocols that are location based include the R&M protocol (Rodoplu and Meng, 1999) and the Local Minimal Spanning Tree (LMST) protocol (Li et al., 2003). The R&M protocol tries to obtain an optimal topology, where every node sends messages (in a multihop fashion) to the only master node in the network. To do so this protocol requires global information to be exchanged between nodes, which will lead to message overhead, especially when the network is highly dynamic. In the LMST protocol, each node builds a minimal spanning tree based on the information available about other nodes up to a predefined distance. The transmission radii of all the nodes are then adjusted to have sufficient power to communicate with the neighbours of their respective LMSTs.

Direction based protocols assume that each node has the capability to somehow determine the direction of all its neighbours. Cone Based Topology Control (CBTC) (Li et al., 2001) is one such protocol, where nodes adjust their transmission radii so as to communicate with the closest nodes in all directions. A parameter ρ is used as a step length to discretise the possible directions in [0, 2π). Bounds on ρ that generate a connected network topology have been determined in Li et al. (2001).

In neighbour based topology control the nodes, given their transmission radii, are required to have knowledge of their neighbours. The k-Neigh protocol proposed by Blough et al. (2003) is one such protocol that controls the topology by keeping track of the number of neighbours. Here it is assumed that when a node x receives a message from another node y, it can estimate its distance from y. This protocol is simple and uses only a few information exchanges between nodes to maintain the topology.

In all the protocols mentioned above, if the nodes are mobile and their densities vary over time, then a large number of information exchanges between nodes will be required to maintain the topology (Santi, 2005). Even though global updates such as the number of active nodes or the geographical locations of nodes can be propagated in the network, this information may become stale in the presence of on/off nodes. Using stale information to readjust the transmission radius at the nodes might result in undesirable topologies.

Local Information No Topology (LINT) (Ramanathan and Rosales-Hain, 2000) is a neighbour based protocol that specifically takes into account the mobility of the nodes. When the nodes are mobile, the number of nodes within a given node's transmission radius varies with time. LINT therefore uses only the locally available information about a node's current transmission radius (rcurrent) and current degree c to maintain connectivity. If the desired degree for connectivity is d, then under the assumption of a uniform random distribution of nodes, the required radius (rreqd) is calculated using the formula rreqd = rcurrent − 5ε log(d/c) (Glauche et al., 2003; Ramanathan and Rosales-Hain, 2000). The propagation loss function is assumed to vary as some power ε of distance, and in practice 2 < ε < 5. An advantage of this protocol is that it does not assume any information such as the location of nodes or the direction of neighbours to be available at the nodes. Further, this formula can be used to either increase or decrease the transmission radius according to d. However, as discussed in Section 4.2, the critical value for d depends on N (see Figure 1). Therefore, varying densities over time cannot be handled well using this protocol.
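As a rough sketch of this kind of neighbour-driven adjustment (not the exact LINT rule), one can note that under a locally uniform node density the expected number of neighbours grows as λπr², so moving from a current degree c to a desired degree d suggests rescaling the radius by √(d/c). The Python fragment below, with purely illustrative clamping bounds, applies such an update.

def adjust_radius(r_current, degree_current, degree_desired,
                  r_min=0.01, r_max=1.0):
    """Rescale the transmission radius so that, assuming a locally uniform
    density (expected degree proportional to r^2), the degree moves from
    degree_current towards degree_desired. Bounds are illustrative."""
    if degree_current == 0:            # no neighbours heard: probe outwards
        return min(r_max, 2.0 * r_current)
    r_new = r_current * (degree_desired / degree_current) ** 0.5
    return max(r_min, min(r_max, r_new))

print(adjust_radius(0.10, 3, 5))   # increase radius: too few neighbours
print(adjust_radius(0.10, 9, 5))   # decrease radius: too many neighbours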

In this paper, we aim to achieve topology control in a distributed manner, assuming that no global updates are available at the nodes. To maintain connectivity we then try to pool as many nodes as possible into one connected component and wish to maintain a giant component in the network throughout its lifetime. Specifically, for the nearest k-neighbours model, we will show that the critical out-degree kc required for a giant component in the network is 5 and is independent of N or λ. In Blough et al. (to appear) the authors have shown by means of extensive simulation (10,000 instances of the network of sizes between 50 and 500) that for nodes distributed uniformly at random in a unit square, taking k = 6 will always result in 95% of the nodes being in the largest component. We, on the other hand, take O(N) nodes in the largest or giant component as equivalent to 'as good as possible' connectivity and show that 5 is the magic number. Note that ours is an average case analysis. Due to the concentration of measure in such networks (Farago, 2002), all except a very small percentage of instances of the nearest k-neighbours network will have the same statistical properties as the average case. In other words, this means that when k = 5 or above, the network will have a giant component with high probability. This is what we show in the next section.

Our interest in this paper is not in developing the protocols for topology control. Instead we assume that there exist efficient protocols that can maintain the connectivity of the network, in a distributed and localised fashion, without requiring any global information (such as density). LINT is one such example. However, it does not specify what the desired number of neighbours d should be. Based on the study of such topology control protocols we extract the necessary conditions and constraints to determine a desirable threshold for the number of neighbours: in this case, a density independent threshold for the presence of giant components in WSNs.

6 Connectivity of WSNs

Here we consider the nearest k-neighbours model where nodes are distributed according to a Poisson point process X of intensity λ in R². Note that, as mentioned in Section 3.2.3, in the nearest k-neighbours model each node forms a directed edge towards each of its k closest neighbours. We then extract the subnetwork that consists of only the bidirectional edges and study its connectivity properties.

In this section we will obtain an expression for the critical number of neighbours kc (or critical out-degree), such that, in the nearest k-neighbours model, for k < kc there exists no unbounded connected component almost surely and for k ≥ kc there exists an unbounded connected component almost surely. In particular, we will show that irrespective of the density λ, kc = 5 satisfies the above requirements.

6.1 Critical number of neighbours

When k = 0 it is obvious that the network has no edges and hence no unbounded connected component. We will first show that there exists a kα such that for all k ≥ kα there exists an unbounded connected component almost surely in the nearest k-neighbours model. Then if kc exists it must be ≤ kα.

Let rc(λ) be the critical radius for connectivity of a fixed radius network whose nodes are distributed according to a Poisson point process X of intensity λ in R² (Dall and Christensen, 2002; Raghavan et al., 2005). Suppose each node adjusts its transmission radius to accommodate a desired number of neighbours k. Then a directed edge from a node towards each of its k neighbours is formed. Let

kα = inf{ k : inf_{i∈X} { ri : out-degree at all nodes in the network = k } ≥ rc(λ) }

kα is then the smallest k such that the smallest transmission radius required at any node to have k outgoing neighbours is at least rc(λ). Note that even though k is independent of λ, kα might not be. If each node adjusts its transmission radius to form directed edges with at least kα neighbours, there will be edges in both directions between nodes that are no more than a distance rc(λ) apart. Hence the fixed radius network with radius rc(λ) becomes a subgraph of the nearest kα-neighbours network. This is because in the nearest kα-neighbours network each node has a transmission radius of at least rc(λ). Also, since the fixed radius network has an unbounded connected component (by the definition of rc(λ)), so does the nearest kα-neighbours network. Therefore, for a given X and for all k ≥ kα, the nearest k-neighbours model has an unbounded connected component.

To determine the value of kα, we know from Cressie (1991) that if Wk is the random variable for the distance of the kth nearest neighbour (k ≥ 1) from a point in X, then the probability density function of Wk is given by f(wk) = [2(πλ)^k/(k − 1)!] wk^(2k−1) e^(−πλ wk²). It immediately follows that E(Wk) = k(2k)!/((2^k k!)² λ^(1/2)). In order to obtain kα, let us first calculate P(Wk ≥ rc(λ)).

P(Wk ≥ rc(λ)) = ∫_{rc(λ)}^{∞} [2/((k − 1)! wk)] (πλ wk²)^k e^(−πλ wk²) dwk        (1)

By the change of variable x = πλ wk², we get

P(Wk ≥ rc(λ)) = ∫_{πλ rc²(λ)}^{∞} [x^(k−1) e^(−x)/(k − 1)!] dx = Σ_{y=0}^{k−1} (λπ rc²(λ))^y e^(−λπ rc²(λ))/y!        (2)

Refer to Hogg et al. (2005) for the right-hand side of this equation. But we also know that the critical average degree in the network for a fixed radius model to percolate is approximately 4.51 (Dall and Christensen, 2002; Quintanilla et al., 2000), which means that in order to obtain an average degree of 4.51 we must set rc(λ) = √(4.51/(πλ)). Note that if we fix r = √(d/(πλ)) in the fixed radius network model, then the expected degree of a node in the network is d. Substituting for rc(λ) in Equation (2), we get

P(Wk ≥ rc(λ)) = e^(−4.51) Σ_{y=0}^{k−1} (4.51)^y/y!        (3)

and this probability is independent of λ. Thus kα, which is now the smallest k for which the probability in Equation (3) reaches 1, is in fact independent of λ. However, the above probability tends to 1 only as k → ∞. But we see that even for k around 10 this probability is already above 0.98, and for k about 15 it is arbitrarily close to 1. Hence we can safely assume that k = 15 yields a network in which each node has a transmission radius of at least rc(λ). Thus, by the definition of rc(λ), this network will have an unbounded connected component. Requiring every node to have a transmission radius of at least rc(λ), however, gives a pessimistic estimate of kc.
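To see how quickly this Poisson tail approaches 1, the small Python check below evaluates the right-hand side of Equation (3) for a few values of k (a numerical illustration only).

import math

def prob_wk_exceeds_rc(k, alpha_c=4.51):
    # e^(-alpha_c) * sum_{y=0}^{k-1} alpha_c^y / y!, i.e. Equation (3)
    return math.exp(-alpha_c) * sum(alpha_c ** y / math.factorial(y) for y in range(k))

for k in (5, 10, 15, 20):
    print(k, round(prob_wk_exceeds_rc(k), 6))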

If, on the other hand, we find the smallest value of k such that the expected transmission radius of the nodes in the network is at least rc(λ), then we need

E(Wk) = k(2k)!/((2^k k!)² λ^(1/2)) ≥ √(4.51/(πλ)) = rc(λ)        (4)

and this implies

k(2k)!/(2^k k!)² ≥ √(4.51/π)        (5)

and we see that k = 5 is the smallest value for which the above inequality is satisfied (see Figure 4). This also implies that k = 4 is the largest value for which the above inequality is not satisfied. Hence for values of k up to 4, the nearest k-neighbours model does not have an unbounded connected component (by the definition of rc(λ)). Further, this is true irrespective of the density of nodes in the network. Simulation results also agree that for k = 5 and above the nearest k-neighbours model of any density λ has an unbounded connected component, while for k < 5 there exists no giant component (see Figure 4).
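A direct check of inequality (5) (Python, using exact integer factorials) confirms that k = 5 is the smallest out-degree satisfying it.

import math

def lhs(k):
    # k (2k)! / (2^k k!)^2, the left-hand side of inequality (5)
    return k * math.factorial(2 * k) / (2 ** k * math.factorial(k)) ** 2

threshold = math.sqrt(4.51 / math.pi)          # approximately 1.198
for k in range(1, 8):
    print(k, round(lhs(k), 4), lhs(k) >= threshold)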

Figure 4 For each fixed k, the graphs show how the size of the largest connected component grows as the number of nodes N increases. Note that while there exists no giant component for k = 3, 4, it appears suddenly for k = 5.

6.2 Simulation results

To verify the above analysis using simulation, we fixed the size of the sensing region to be a unit square. The nodes are placed according to a uniform Poisson point process in the sensing region (see Section 3.1). We fixed the size of the network to N and the number of neighbours to k in each simulation. We varied N from 50 to 5000 and k from 3 to 6. We ran 10 experiments each for a fixed N and k. The results from these experiments are plotted as graphs in Figure 4.
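A condensed version of such an experiment is sketched below (Python with NumPy, SciPy and NetworkX; the network size, the number of repetitions and the random seed are illustrative and much smaller than in our experiments). It reports the average fraction of nodes in the largest component of the bidirectional nearest k-neighbours network.

import numpy as np
import networkx as nx
from scipy.spatial import cKDTree

def largest_component_fraction(n, k, rng):
    pts = rng.uniform(0.0, 1.0, size=(n, 2))      # uniform nodes in the unit square
    _, idx = cKDTree(pts).query(pts, k=k + 1)     # first neighbour is the node itself
    out = [set(row[1:]) for row in idx]
    g = nx.Graph()
    g.add_nodes_from(range(n))
    g.add_edges_from((i, j) for i in range(n) for j in out[i] if i in out[j])
    return len(max(nx.connected_components(g), key=len)) / n

rng = np.random.default_rng(1)
for k in (3, 4, 5, 6):
    fracs = [largest_component_fraction(1000, k, rng) for _ in range(5)]
    print(k, round(sum(fracs) / len(fracs), 3))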

Note that for k = 5 approximately 95% of the nodes are in the giant component. For k = 4 the size of the largest component scales sublinearly with N. This implies that for N large enough, whatever fraction of the nodes (up to about 95%) we want in the largest component, the corresponding critical value for k will be 5. That is, irrespective of whether we need 95% or 90% or 75% of the nodes in the largest component, the optimal value for k does not change. While the node degree is bounded up to this point, it becomes unbounded and the optimal k increases rapidly as the fraction of nodes in the largest component increases from 95% to 100%. In fact, for full connectivity k should be at least 5.1774 log(N) (Xue and Kumar, 2004).

6.3 Energy consumption in the nearest neighbours model

The sum of the transmission radii at all the nodes in a WSN is a measure of the energy consumed in the network. In this section we will show that, for the same number of neighbours, both the fixed radius model and the nearest neighbours model consume approximately the same amount of energy. To show this, we need the following.

Suppose we restrict the Poisson point process X of intensity λ in R² to a finite region, say a unit square. Then, let the number of nodes in this finite region be N. The length of a graph is the sum of the lengths of its edges. Hence for the kth nearest neighbour graph, in which each node is adjacent to its kth closest neighbour, the length Lk,N is given by Avram and Bertsimas (1993) as

lim_{N→∞} E(Lk,N/N^(1/2)) = (1/(2π^(1/2))) Σ_{j=1}^{k} Γ(j − 1/2)/(j − 1)!        (6)

Therefore the expected sum of the transmission radii of the sensor nodes in the nearest k-neighbours model is the same as the expected length of the kth nearest neighbour graph. Also, the sum of the transmission radii (Lr,N) in a fixed radius model is √(dN/π), where d is the desired connectivity. Taking k = 5 and N sufficiently large in Equation (6), we get E(Lk,N) ≈ 2.1809 (N/π)^(1/2). Also, for the same connectivity, that is, taking d = 5, we have E(Lr,N) ≈ 2.2361 (N/π)^(1/2). On comparison we see that for a fixed N and k, Lr,N ≈ Lk,N.
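The Python fragment below reproduces this comparison directly from Equation (6) and from E(Lr,N) = √(dN/π); the value of N is illustrative and both expressions are large-N limits.

import math

def knn_energy(k, n):
    # E(L_{k,N}) ~ sqrt(N) * (1/(2 sqrt(pi))) * sum_{j=1}^{k} Gamma(j - 1/2)/(j - 1)!
    s = sum(math.gamma(j - 0.5) / math.factorial(j - 1) for j in range(1, k + 1))
    return math.sqrt(n) * s / (2.0 * math.sqrt(math.pi))

def fixed_radius_energy(d, n):
    # E(L_{r,N}) = sqrt(d N / pi) when every node uses radius sqrt(d/(pi N))
    return math.sqrt(d * n / math.pi)

N = 10000
print(round(knn_energy(5, N), 1), round(fixed_radius_energy(5, N), 1))  # nearly equal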

6.4 Strongly connected components

In the nearest k-neighbours model we only considered the bidirectional edges and ignored the presence of unidirectional ones. Hence if we study the network for the appearance of giant strongly connected components, then the critical value for k is 4 (see Figure 5). By a strongly connected network we mean that for any pair of nodes x and y, there exists a directed path both from x to y and from y to x.

Figure 5 For each fixed k, the graphs show how the size of the largest strongly connected component grows as the number of nodes N increases. Note that while there exists no giant component for k up to 3, it suddenly appears for k = 4.

7 Topology control algorithm to maintain connectivity in a distributed WSN

A simple algorithm for topology control to obtain/maintain connectivity is to adjust the transmission radius at the nodes so that the number of neighbours (in the directed sense, or simply the out-degree) is 5. Due to the distributed and localised nature of this algorithm it is scalable to a large number of nodes in the sensing region. This nature also helps the network adapt to constantly changing densities and to the mobility of the nodes. Further, the degree of the nodes is bounded by k, keeping the interference low.

The critical values derived in the previous section are in some sense the optimal node degrees for the worst case scenario. This is because for any node degree less than 5 the network does not have a giant component of bidirectional links, and for any node degree less than 4 the network does not have a giant strongly connected component. Therefore any decentralised neighbour-based topology control protocol can employ higher values of k than those derived above. However, the aim should be to ensure that in the worst case k does not drop below 5 (or 4).
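A minimal sketch of one such control round at a single node is given below (Python; the grow/shrink factors and the example distances are illustrative, and the message exchange that would supply the distances in a real protocol is abstracted away). The node counts the neighbours it can currently reach and nudges its radius until its out-degree reaches the target k = 5.

def topology_control_step(dists_to_others, r_current, k_target=5,
                          grow=1.25, shrink=0.9):
    """One local control round at a sensor node. dists_to_others holds the
    distances to nodes this node could ever hear (from beacons in practice).
    Grow the radius while fewer than k_target neighbours are seen, shrink it
    gently once more are seen, never dropping below the k_target-th neighbour."""
    degree = sum(1 for d in dists_to_others if d <= r_current)
    if degree < k_target:
        return r_current * grow
    if degree > k_target:
        return max(r_current * shrink,
                   sorted(dists_to_others)[k_target - 1])
    return r_current

dists = [0.04, 0.07, 0.11, 0.12, 0.19, 0.23, 0.35]   # illustrative distances
r = 0.05
for _ in range(10):
    r = topology_control_step(dists, r)
print(round(r, 3))   # settles just above the distance of the 5th neighbour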

8 Conclusion

In this paper, we have considered the efficient maintenance of connectivity of a wireless sensor network. In addition to energy efficient values for the critical degree of the nodes, we considered the constraints and requirements from the point of view of topology control protocols when the network is highly dynamic. Specifically, in the presence of mobile on/off nodes, it is desirable for topology control protocols to use only localised information in a distributed manner. We therefore assume that no global update, such as the current density λ, is available at the nodes. We use neighbour based topology control because it does not require any information such as the location of nodes or the direction of neighbours, which is desirable in the presence of mobile nodes (Santi, 2005).

In such a case, we have shown that when nodes adjust their transmission radius to maintain a fixed out-degree k, then 5 is the critical threshold at and beyond which a giant component exists almost surely in the network. Further, this is true irrespective of any change in λ as time varies. To our knowledge we are among the first to provide an analytical treatment of this problem. Such density independent thresholds are especially helpful in the efficient maintenance of the topology in the presence of mobile on/off nodes.

Acknowledgements

This work has been supported by the National Science Foundation, USA, under the grant NSF-SST 0427840. Any opinions, findings and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

Albert, R. and Barabasi, A.L. (2002) 'Statistical mechanics of complex networks', Reviews of Modern Physics, Vol. 74, No. 1, pp.47–97.
Appel, M.J.B. and Russo, R. (2002) 'The connectivity of a graph on uniform points on [0, 1]^d', Statistics and Probability Letters, Vol. 60, pp.351–357.
Avram, F. and Bertsimas, D. (1993) 'On central limit theorems in geometrical probability', The Annals of Applied Probability, Vol. 3, No. 4, pp.1033–1046.
Bettstetter, C. (2002a) 'On the connectivity of wireless multihop networks with homogeneous and inhomogeneous range assignment', Proceedings of the IEEE Vehicular Technology Conference, Vol. 3, pp.1706–1710.
Bettstetter, C. (2002b) 'On the minimum node degree and connectivity of a multihop wireless network', Proceedings of the ACM MobiHoc, pp.80–91.
Bharathidasan, A. and Ponduru, V.A.S. (2005) 'Sensor networks: an overview', Available at: http://www.cs.binghamton.edu/∼kliu/survey.pdf.
Blough, D., Leoncini, M., Resta, G. and Santi, P. (2003) 'The k-neigh protocol for symmetric topology control in ad hoc networks', Proceedings of the IEEE MobiHoc, pp.141–152.
Blough, D.M., Leoncini, M., Resta, G. and Santi, P. (to appear) 'The k-neighbors approach to interference bounded and symmetric topology control in ad hoc networks', IEEE Transactions on Mobile Computing.
Bollobas, B. (1985) Random Graphs, Orlando, FL: Academic Press.
Booth, L., Bruck, J., Franceschetti, M. and Meester, R. (2003) 'Covering algorithms, continuum percolation and the geometry of wireless networks', The Annals of Applied Probability, Vol. 13, No. 2, pp.722–741.
Cerpa, A. and Estrin, D. (2002) 'Ascent: adaptive self-configuring sensor networks topologies', Proceedings of the IEEE INFOCOM, Vol. 3, pp.1278–1287.
Cressie, N.A.C. (1991) Statistics for Spatial Data, Wiley Series in Probability and Mathematical Statistics, USA: John Wiley and Sons.
Dall, J. and Christensen, M. (2002) 'Random geometric graphs', Physical Review E, Vol. 66, 016121.
Estrin, D., Govindan, R., Heidemann, J. and Kumar, S. (1999) 'Next century challenges: scalable coordination in sensor networks', Proceedings of the ACM MobiCom, pp.263–270.
Farago, A. (2002) 'Scalable analysis and design of ad hoc networks via random graph theory', Proceedings of Dial-M, pp.43–50.
Farago, A. (2004) 'On the fundamental limits of topology control', Proceedings of the Joint Workshop on Foundations of Mobile Computing DIALM-POMC, pp.1–7.
Franceschetti, M., Booth, L., Cook, M., Meester, R. and Bruck, J. (2003) 'Percolation in multi-hop wireless networks', IEEE Transactions on Information Theory, Available at: http://www.paradise.caltech.edu/papers/etr055.pdf.
Glauche, I., Krause, W., Sollacher, R. and Greiner, M. (2003) 'Continuum percolation of wireless ad hoc communication networks', Physica A, Vol. 325, pp.577–600.
Goldsmith, A.J. and Wicker, S.B. (2002) 'Design challenges for energy-constrained ad hoc wireless networks', IEEE Wireless Communications, Vol. 9, No. 4, pp.8–27.
Gupta, P. and Kumar, P.R. (1998) 'Critical power for asymptotic connectivity in wireless networks', in Stochastic Analysis, Control, Optimization and Applications: A Volume in Honor of W.H. Fleming, Boston: Birkhauser, pp.547–566.
Hac, A. (2003) Wireless Sensor Network Designs, England: John Wiley and Sons.
Hogg, R.V., McKean, J.W. and Craig, A.T. (2005) Introduction to Mathematical Statistics, USA: Pearson Prentice Hall.


Hou, T. and Li, V.O.K. (1986) 'Transmission range control in multihop packet radio networks', IEEE Transactions on Communications, Vol. 34, No. 1, pp.38–44.
Kleinrock, L. and Silvester, J. (1978) 'Optimum transmission radii for packet radio networks or why six is a magic number', Proceedings of the IEEE National Telecommunications Conference, pp.4.3.1–4.3.5.
Krishnamachari, B., Wicker, S.B., Bejar, R. and Pearlman, M. (2002) 'Critical density thresholds in distributed wireless networks', in Communications, Information and Network Security, Kluwer Publishers.
Li, L., Halpern, J.Y., Bahl, P., Wang, Y.M. and Wattenhofer, R. (2001) 'Analysis of a cone-based distributed topology control algorithm for wireless multi-hop networks', Proceedings of the ACM Symposium on Principles of Distributed Computing, pp.264–273.
Li, N., Hou, J.C. and Sha, L. (2003) 'Design and analysis of an MST-based topology control algorithm', Proceedings of the IEEE INFOCOM, Vol. 3, pp.1702–1712.
Meester, R. and Roy, R. (1996) Continuum Percolation, Cambridge, UK: Cambridge University Press.
Penrose, M.D. (1999) 'On k-connectivity for a geometric random graph', Random Structures and Algorithms, Vol. 15, No. 2, pp.145–164.
Penrose, M.D. (2003) Random Geometric Graphs, Oxford Studies in Probability, Oxford: Oxford University Press.
Philips, T.K., Panwar, S.S. and Tantawi, A.N. (1989) 'Connectivity properties of a packet radio network model', IEEE Transactions on Information Theory, Vol. 35, No. 5, pp.1044–1047.
Quintanilla, J., Torquato, S. and Ziff, R.M. (2000) 'Efficient measurement of the percolation threshold for fully penetrable discs', Journal of Physics A: Mathematical and General, Vol. 33, pp.L399–L407.
Quintanilla, J. (2001) 'Measurement of the percolation threshold for fully penetrable discs of different radii', Physical Review E, Vol. 63, 061108.
Raghavan, U.N., Thadakamalla, H.P. and Kumara, S.R.T. (2005) 'Phase transitions and connectivity of distributed wireless sensor networks', Advanced Computing and Communications 2005.
Ramanathan, R. and Rosales-Hain, R. (2000) 'Topology control of multihop wireless networks using transmit power adjustment', Proceedings of the IEEE INFOCOM, pp.404–413.
Rodoplu, V. and Meng, T.H. (1999) 'Minimum energy mobile wireless networks', IEEE Journal on Selected Areas in Communications, Vol. 17, No. 8, pp.1333–1344.
Santi, P. (2005) Topology Control in Wireless Ad Hoc and Sensor Networks, Chichester, UK: John Wiley and Sons.
Takagi, H. and Kleinrock, L. (1984) 'Optimal transmission ranges for randomly distributed packet radio terminals', IEEE Transactions on Communications, Vol. 32, No. 3, pp.246–257.
Watts, D.J. and Strogatz, S.H. (1998) 'Collective dynamics of small world networks', Nature, Vol. 393, pp.440–442.
Xue, F. and Kumar, P.R. (2004) 'The number of neighbors needed for connectivity of wireless networks', Wireless Networks, Vol. 10, pp.169–181.
Ye, W. and Heidemann, J. (2003) 'Medium access control in wireless sensor networks', USC/ISI Technical Report, ISI-TR-580.


OPERATIONS RESEARCH AND MANAGEMENT SCIENCE HANDBOOK

Editor: A. Ravi Ravindran
The Pennsylvania State University

December 1, 2006


Contents

11 Complexity and Large-scale Networks
   11.1 Introduction
   11.2 Statistical properties of complex networks
        11.2.1 Average path length and the small-world effect
        11.2.2 Clustering coefficient
        11.2.3 Degree distribution
        11.2.4 Betweenness centrality
        11.2.5 Modularity and community structures
        11.2.6 Network resilience
   11.3 Modeling of complex networks
        11.3.1 Random graphs
        11.3.2 Small-world networks
        11.3.3 Scale-free networks
   11.4 Why “Complex” Networks
   11.5 Optimization in complex networks
        11.5.1 Network resilience to node failures
        11.5.2 Local search
        11.5.3 Other topics
   11.6 Conclusions


Chapter 11

Complexity and Large-scale Networks

Hari P. Thadakamalla1, Soundar R. T. Kumara1 and Reka Albert2

1 Dept. of Industrial & Manufacturing Engineering, The Pennsylvania State University
2 Dept. of Physics, The Pennsylvania State University

11.1 Introduction

In the past few decades, graph theory has been a powerful analytical tool for understanding

and solving various problems in operations research (OR). Study on graphs (or networks)

traces back to the solution of the Konigsberg bridge problem by Euler in 1735. In Konigsberg,

the river Pregel flows through the town dividing it into four land areas A, B, C and D as

shown in figure 11.1 (a). These land areas are connected by seven (1 - 7) different bridges.

The Konigsberg bridge problem is to find whether it is possible to traverse through the

city on a route that crosses each bridge exactly once, and return to the starting point.

Euler formulated the problem using a graph theoretical representation and proved that the

traversal is not possible. He represented each land area as a vertex (or node) and each bridge

as an edge between two nodes (land areas) as shown in figure 11.1 (b). Then, he posed the

question as to whether there exists a path that passes through every edge exactly once and ends


at the start node. This path was later termed an Eulerian Circuit. Euler proved that for a

graph to have an Eulerian Circuit, all the nodes in the graph need to have an even degree.

Euler’s great insight lay in representing the Konigsberg bridge problem as a graph problem

with a set of vertices and edges. Later, in the twentieth century, graph theory has developed

into a substantial area of study which is applied to solve various problems in engineering and

several other disciplines [7]. For example, consider the problem of finding the shortest route

between two geographical points. The problem can be modeled as a shortest path problem

on a network, where different geographical points are represented as nodes and they are

connected by an edge if there exists a direct path between the two nodes. The weights on

the edges represent the distance between the two nodes (see figure 11.2). Let the network

be G(V, E) where V is the set of all nodes, E is the set of edges (i, j) connecting the nodes

and w is a function such that wij is the weight of the edge (i, j). The shortest path problem

from node s to node t can be formulated as follows.

minimize    Σ_{(i,j)∈E} wij xij

subject to  Σ_{j:(i,j)∈E} xij − Σ_{j:(j,i)∈E} xji = 1 if i = s;  −1 if i = t;  0 otherwise,

            xij ≥ 0,  ∀(i, j) ∈ E,

where xij = 1 or 0 depending on whether the edge from node i to node j belongs to the

optimal path or not respectively. Many algorithms have been proposed to solve the shortest

path problem [7]. Using one such popular algorithm (Dijkstra’s algorithm [7]), we find the

shortest path from node 10 to node 30 as (10 - 1 - 3 - 12 - 30)(see figure 11.2). Note that

this problem, like other problems considered in traditional graph theory, requires us to find the exact optimal path.
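A compact implementation of Dijkstra's algorithm is sketched below (Python; the toy graph and node labels are illustrative and not the network of figure 11.2).

import heapq

def dijkstra(adj, s, t):
    """adj maps node -> list of (neighbour, weight). Returns the shortest
    s-t distance and one shortest path."""
    dist, prev = {s: 0}, {}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            break
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, u = [t], t
    while u != s:
        u = prev[u]
        path.append(u)
    return dist[t], path[::-1]

adj = {"a": [("b", 2), ("c", 5)], "b": [("c", 1), ("d", 4)], "c": [("d", 1)], "d": []}
print(dijkstra(adj, "a", "d"))    # (4, ['a', 'b', 'c', 'd'])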

In the last few years there has been an intense amount of activity in understanding and

characterizing large-scale networks, which has led to the development of a new branch of science

called “Network science” [108]. The scale of the size of these networks is substantially


Figure 11.1: Konigsberg bridge problem. (a) Shows the river flowing through the town dividing it into four land areas A, B, C, and D. The land areas are connected by seven bridges numbered from 1 to 7. (b) Graph theoretical representation of the Konigsberg bridge problem. Each node represents a land area and the edges between them represent the bridges connecting the land areas.

Figure 11.2: Illustration of a typical optimization problem in OR. The objective is to find the shortest path from node 10 to node 30. The values on the edges represent the distance between two nodes. Here we use the exact distances between different nodes to calculate the shortest path 10 - 1 - 3 - 12 - 30.


different from the networks considered in traditional graph theory. Also, the problems posed

in such networks are very different from traditional graph theory. These large-scale net-

works are referred to as complex networks and we will discuss the reasons why they are

termed “complex” networks later in the section 11.4. The following are examples of complex

networks:

• World Wide Web: It can be viewed as a network where web pages are the nodes and

hyperlinks connecting one webpage to another are the directed edges. The World Wide

Web is currently the largest network for which topological information is available. It

had approximately one billion nodes at the end of 1999 [89] and is continuously growing

at an exponential rate. A recent study [66] estimated the size to be 11.5 billion nodes

as of January 2005.

• Internet : The Internet is a network of computers and telecommunication devices con-

nected by wired or wireless links. The topology of the Internet is studied at two

different levels [55]. At the router level, each router is represented as a node and

physical connections between them as edges. At the domain level, each domain (au-

tonomous system, Internet Provider System) is represented as a node and inter-domain

connections by edges. The number of nodes was approximately 150,000 at the router level in 2000 [61] and 4,000 at the domain level in 1999 [55].

• Phone call network : The phone numbers are the nodes and every completed phone

call is an edge directed from the receiver to the caller. Abello et al. [4] constructed

a phone call network from the long distance telephone calls made during a single day

which had 53, 767, 087 nodes and over 170 million edges.

• Power grid network : Generators, transformers, and substations are the nodes and

high-voltage transmission lines are the edges. The power grid network of the western

United States had 4941 nodes in 1998 [143]. The North American power grid consisted

of 14, 099 nodes and 19, 657 edges [16] in 2005.

• Airline network : Nodes are the airports and an edge between two airports represents the

presence of a direct flight connection [29, 65]. Barthelemy et al. [29] have analyzed the

International Air Transportation Association database to form the world-wide airport


network. The resulting network consisted of 3880 nodes and 18810 edges in 2002.

• Market graph: Recently, Boginski et al. [32, 33] represented the stock market data

as a network where the stocks are nodes and two nodes are connected by an edge if

their correlation coefficient calculated over a period of time exceeds a certain threshold

value. The network had 6556 nodes and 27, 885 edges for the U.S. stock data during

the period 2000-2002 [33].

• Scientific collaboration networks: Scientists are represented as nodes and two nodes

are connected if the two scientists have written an article together. Newman [99,

100] studied networks constructed from four different databases spanning biomedical

research, high-energy physics, computer science and physics. One of these networks

formed from Medline database for the period from 1961 to 2001 had 1, 520, 251 nodes

and 2, 163, 923 edges.

• Movie actor collaboration network : Another well studied network is the movie actor

collaboration network, formed from the Internet Movie Database [1], which contains all

the movies and their casts since the 1890s. Here again, the actors are represented as nodes

and two nodes are connected by an edge if the two actors have performed together in

a movie. This is a continuously growing network with 225, 226 nodes and 13, 738, 786

edges in 1998 [143].

The above are only a few examples of complex networks pervasive in the real world

[13, 31, 49, 101]. Tools and techniques developed in the field of traditional graph theory

involved studies that looked at networks of tens or hundreds or in extreme cases thousands

of nodes. The substantial growth in size of many such networks [see figure 11.3] necessitates

a different approach for analysis and design. The new methodology applied for analyzing

complex networks is similar to the statistical physics approach to complex phenomena.

The study of large-scale complex systems has always been an active research area in

various branches of science, especially in the physical sciences. Some examples are: fer-

romagnetic properties of materials, statistical description of gases, diffusion, formation of

crystals etc. For instance, let us consider a box containing one mole (6.022 × 10^23) of gas

atoms as our system of analysis [see figure 11.4 (a)]. If we represent the system with the


Figure 11.3: Pictorial description of the change in scale in the size of the networks found in many engineering systems. This change in size necessitates a change in the analytical approach.

microscopic properties of the individual particles such as their position and velocity, then it

would be next to impossible to analyze the system. Rather, physicists use statistical me-

chanics to represent the system and calculate macroscopic properties such as temperature,

pressure etc. Similarly, in networks such as the Internet and WWW, where the number

of nodes is extremely large, we have to represent the network using macroscopic properties

(such as degree distribution, edge-weight distribution etc), rather than the properties of in-

dividual entities in the network (such as the neighbors of a given node, the weights on the

edges connecting this node to its neighbors etc) [see figure 11.4 (b)]. Now let us consider

the shortest path problem in such networks (for instance, WWW). We rarely require specific

shortest path solutions such as from node A to node B (from webpage A to webpage B).

Rather it is useful if we know the average distance (number of hops) taken from any node

to any other node (any webpage to any other webpage) to understand dynamical processes

(such as search in WWW). This new approach for understanding networked systems provides

new techniques as well as challenges for solving conceptual and practical problems in this

field. Furthermore, this approach has become feasible and received a considerable boost by

the availability of computers and communication networks which have made the gathering

and analysis of large-scale data sets possible.

The objective of this chapter is to introduce this new direction of inter-disciplinary re-

search (Network Science) and discuss the new challenges for the OR community. During

the last few years there has been a tremendous amount of research activity dedicated to the


Figure 11.4: Illustration of the analogy between a box of gas atoms and complex networks. (a) A mole of gas atoms (6.022 × 10^23 atoms) in a box. (b) An example of a large-scale network. For analysis, we need to represent both systems using statistical properties.

study of these large-scale networks. This activity was mainly triggered by significant find-

ings in real-world networks which we will elaborate later in the chapter. There was a revival

of network modeling which gave rise to many path breaking results [13, 31, 49, 101] and

provoked vivid interest across different disciplines of the scientific community. Until now,

a major part of this research was contributed by physicists, mathematicians, sociologists

and biologists. However, the ultimate goal of modeling these networks is to understand and

optimize the dynamical processes taking place in the network. In this chapter, we address

the urgent need and opportunity for the OR community to contribute to the fast-growing

inter-disciplinary research on Network Science. The methodologies and techniques developed

till now will definitely aid the OR community in furthering this research.

The following is the outline of the chapter. In section 11.2, we introduce different sta-

tistical properties that are prominently used for characterizing complex networks. We also

present the empirical results obtained for many real complex networks that initiated a revival

of network modeling. In section 11.3, we summarize different evolutionary models proposed

to explain the properties of real networks. In particular, we discuss Erdos-Renyi random

graphs, small-world networks, and scale-free networks. In section 11.4, we discuss briefly

why these networks are called “complex” networks, rather than large-scale networks. We

summarize typical behaviors of complex systems and demonstrate how the real networks

have these behaviors. In section 11.5, we discuss the optimization in complex networks by


concentrating on two specific processes, robustness and local search, which are most relevant

to engineering networks. We discuss the effects of statistical properties on these processes

and demonstrate how they can be optimized. Further, we briefly summarize a few more im-

portant topics and give references for further reading. Finally, in section 11.6, we conclude

and discuss future research directions.

11.2 Statistical properties of complex networks

In this section, we explain some of the statistical properties which are prominently used in the

literature. These statistical properties help in classifying different kinds of complex networks.

We discuss the definitions and present the empirical findings for many real networks.

11.2.1 Average path length and the small-world effect

Let G(V, E) be a network where V is the collection of entities (or nodes) and E is the set

of arcs (or edges) connecting them. A path between two nodes u and v in the network G

is a sequence [u = u1, u2, ..., un = v], where the ui's are nodes in G and there exists an edge from ui−1 to ui in G for all i. The path length is defined as the sum of the weights on the

edges along the path. If all the edges are equivalent in the network, then the path length

is equal to the number of edges (or hops) along the path. The average path length (l) of a

connected network is the average of the shortest paths from each node to every other node

in a network. It is given by

l ≡ 〈d(u, w)〉 = [1/(N(N − 1))] Σ_{u,w∈V, u≠w} d(u, w),

where N is the number of nodes in the network and d(u, w) is the shortest path distance between u and w. Table 11.1 shows the values of l for many different networks. We observe that despite

the large size of the network (w.r.t. the number of nodes), the average path length is small.

This implies that any node can reach any other node in the network in a relatively small


Table 11.1: Average path length of many real networks. Note that despite the large size of the network (w.r.t. the number of nodes), the average path length is very small.

Network                        Size (number of nodes)    Average path length
WWW [37]                       2 × 10^8                  16
Internet, router level [61]    150,000                   11
Internet, domain level [55]    4,000                     4
Movie actors [143]             212,250                   4.54
Electronic circuits [75]       24,097                    11.05
Peer-to-peer network [122]     880                       4.28

number of steps. This characteristic phenomenon, that most pairs of nodes are connected

by a short path through the network, is called the small-world effect.
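A direct way to compute l for a small unweighted network is sketched below (Python with NetworkX; the Watts–Strogatz test graph and its parameters are illustrative).

import networkx as nx

def average_path_length(g):
    """Average of shortest-path distances over all ordered pairs of distinct
    nodes, assuming the graph is connected."""
    n = g.number_of_nodes()
    total = sum(d for _, lengths in nx.shortest_path_length(g)
                for d in lengths.values())
    return total / (n * (n - 1))

g = nx.connected_watts_strogatz_graph(1000, 6, 0.1, seed=42)  # illustrative small-world graph
print(round(average_path_length(g), 2))
print(round(nx.average_shortest_path_length(g), 2))           # same value from the library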

The existence of the small-world effect was first demonstrated by the famous experiment

conducted by Stanley Milgram in the 1960s [92] which led to the popular concept of six

degrees of separation. In this experiment, Milgram randomly selected individuals from Wi-

chita, Kansas and Omaha, Nebraska to pass on a letter to one of their acquaintances by mail.

These letters had to finally reach a specific person in Boston, Massachusetts; the name and

profession of the target was given to the participants. The participants were asked to send

the letter to one of their acquaintances whom they judged to be closer (than themselves) to

the target. Anyone who received the letter subsequently would be given the same information

and asked to do the same until it reached the target person. Over many trials, the average

length of these acquaintance chains for the letters that reached the targeted node was found

to be approximately 6. That is, there is an acquaintance path of average length 6 in the

social network of people in the United States. We will discuss another interesting and even

more surprising observation from this experiment in section 11.5.2. Currently, Watts et al.

[145] are doing an Internet-based study to verify this phenomenon.

Mathematically, a network is considered to be small-world if the average path length

scales logarithmically or slower with the number of nodes N (l ∼ log N). For example, if the number of nodes in the network, N, increases from 10^3 to 10^6, then the average path length will increase only from approximately 3 to 6. This phenomenon has critical implications on the

dynamic processes taking place in the network. For example, if we consider the spread

of information, computer viruses, or contagious diseases across a network, the small-world


phenomenon implies that within a few steps it could spread to a large fraction of the nodes in most real networks.

11.2.2 Clustering coefficient

The clustering coefficient characterizes the local transitivity and order in the neighborhood

of a node. It is measured in terms of the number of triangles (3-cliques) present in the network. Consider a node i which is connected to ki other nodes. The number of possible edges between these ki neighbors, each of which would close a triangle with node i, is ki(ki − 1)/2. The clustering coefficient of

a node i is the ratio of the number of edges Ei that actually exist between these ki nodes

and the total number ki(ki − 1)/2 possible, i.e.

Ci = 2Ei/(ki(ki − 1)).

The clustering coefficient of the whole network (C) is then the average of the Ci's over all the nodes in the network, i.e. C = (1/N) Σi Ci (see figure 11.5). The clustering coefficient is high

for many real networks [13, 101]. In other words, in many networks if node A is connected

to node B and node C, then there is a high probability that node B and node C are also

connected. With respect to social networks, it means that it is highly likely that two friends

of a person are also friends, a feature analyzed in detail in the so called theory of balance

[43].
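The definition translates directly into code; the sketch below (Python with NetworkX; the small example graph is illustrative) computes Ci for one node and C for the whole network, and cross-checks against the library routine.

import networkx as nx

def clustering(g, i):
    """C_i = 2 E_i / (k_i (k_i - 1)) for node i of an undirected graph g."""
    neigh = list(g.neighbors(i))
    k = len(neigh)
    if k < 2:
        return 0.0
    e = sum(1 for a in range(k) for b in range(a + 1, k)
            if g.has_edge(neigh[a], neigh[b]))
    return 2.0 * e / (k * (k - 1))

g = nx.Graph([(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5)])   # illustrative graph
C = sum(clustering(g, i) for i in g) / g.number_of_nodes()
print(round(clustering(g, 1), 3), round(C, 3))
print(round(nx.average_clustering(g), 3))                        # matches C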

11.2.3 Degree distribution

The degree of a node is the number of edges incident on it. In a directed network, a node has

both an in-degree (number of incoming edges) and an out-degree (number of outgoing edges).

The degree distribution of the network is the function pk, where pk is the probability that a

randomly selected node has degree k. Here again, a directed graph has both in-degree and

out-degree distributions. It was found that most of the real networks including the World

Wide Web [5, 14, 88], the Internet [55], metabolic networks [77], phone call networks [4, 8],


Figure 11.5: Calculating the clustering coefficient of a node and the network. For example, node 1 has degree 5 and the number of edges between the neighbors is 3. Hence, the clustering coefficient for node 1 is 3/10. The clustering coefficient of the entire network is the average of the clustering coefficients at the individual nodes (109/180).

scientific collaboration networks [26, 99], and movie actor collaboration networks [12, 19, 25]

follow a power-law degree distribution (p(k) ∼ k^−γ), indicating that the topology of the

network is very heterogeneous, with a high fraction of small-degree nodes and few large

degree nodes. Networks with power-law degree distributions are popularly known as scale-free networks, a name reflecting the lack of a characteristic degree and the broad tail of the degree distribution. Figure 11.6 shows

the empirical results for the Internet at the router level and co-authorship network of high-

energy physicists. The following are the expected values and variances of the node degree in

scale-free networks,

E[k] =

finite if γ > 2;

∞ otherwise.V [k] =

finite if γ > 3;

∞ otherwise.

where γ is the power-law exponent. Note that the variance of the node degree is infinite

when γ < 3 and the mean is infinite when γ < 2. The power-law exponent (γ) of most

of the networks lies between 2.1 and 3.0, which implies that there is high heterogeneity with

respect to node degree. This phenomenon in real networks is critical because it was shown

that the heterogeneity has a huge impact on the network properties and processes such as

network resilience [15, 16], network navigation, local search [6], and epidemiological processes

[111, 112, 113, 114, 115]. Later in this chapter, we will discuss the impact of this

heterogeneity in detail.
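When a degree sequence is available, a quick estimate of γ can be obtained with the standard maximum-likelihood (Hill) estimator for a continuous power law, γ̂ = 1 + n/Σ ln(ki/kmin). The sketch below (Python with NumPy; the synthetic Pareto-distributed sample and the value of kmin are illustrative) recovers the exponent of a generated sample.

import numpy as np

def powerlaw_mle(degrees, k_min):
    """Hill / maximum-likelihood estimate of the exponent of p(k) ~ k^(-gamma),
    using only the tail k >= k_min (continuous approximation)."""
    tail = np.asarray([k for k in degrees if k >= k_min], dtype=float)
    return 1.0 + len(tail) / np.sum(np.log(tail / k_min))

rng = np.random.default_rng(0)
gamma_true, k_min = 2.5, 1.0
sample = k_min * (1.0 - rng.random(100000)) ** (-1.0 / (gamma_true - 1.0))  # Pareto sample
print(round(powerlaw_mle(sample, k_min), 3))    # close to 2.5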



Figure 11.6: The degree distribution of real networks (log-log scale; P(k) versus k). (a) Internet at the router level. Data courtesy of Ramesh Govindan [61]. (b) Co-authorship network of high-energy physicists, after Newman [99].

11.2.4 Betweenness centrality

Betweenness centrality (BC) of a node counts the fraction of shortest paths going through a

node. The BC of a node i is given by

BC(i) = Σ_{s≠i≠t} σst(i)/σst,

where σst is the total number of shortest paths from node s to t and σst(i) is the number of

these shortest paths passing through node i. If the BC of a node is high, it implies that this

node is central and many shortest paths pass through this node. BC was first introduced

in the context of social networks [139], and has been recently adopted by Goh et al. [59]

as a proxy for the load (li) at a node i with respect to transport dynamics in a network.

For example, consider the transportation of data packets in the Internet along the shortest

paths. If many shortest paths pass through a node then the load on that node would be high.

Goh et al. have shown numerically that the load (or BC) distribution follows a power-law,

PL(l) ∼ l^−δ with exponent δ ≈ 2.2, and is insensitive to the details of the scale-free network

as long as the degree exponent (γ) lies between 2.1 and 3.0. They further showed that


Figure 11.7: Illustration of a network with community structure. Communities are defined as groups of nodes in the network that have a higher density of edges within the group than between groups. In the above network, each group of nodes enclosed within a dotted loop is a community.

there exists a scaling relation l ∼ k^((γ−1)/(δ−1)) between the load and the degree of a node

when 2 < γ ≤ 3. Later in this chapter, we discuss how this property can be utilized for

local search in complex networks. Many other centrality measures exist in the literature, and a detailed review of these measures can be found in [86].
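For moderate network sizes the betweenness centrality can be computed directly, for example with NetworkX's implementation of Brandes' algorithm; the sketch below (illustrative scale-free test graph and seed) lists the highest-load nodes.

import networkx as nx

g = nx.barabasi_albert_graph(2000, 3, seed=7)          # illustrative scale-free graph
bc = nx.betweenness_centrality(g)                      # normalised betweenness (Brandes' algorithm)
top = sorted(bc, key=bc.get, reverse=True)[:5]
for node in top:
    print(node, g.degree(node), round(bc[node], 4))    # high-BC nodes tend to be high-degree hubs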

11.2.5 Modularity and community structures

Many real networks are found to exhibit a community structure (also called modular struc-

ture). That is, groups of nodes in the network have high density of edges within the group

and lower density between the groups (see figure 11.7). This property was first proposed in

the social networks [139] where people may divide into groups based on interests, age, pro-

fession etc. Similar community structures are observed in many networks which reflects the

division of nodes into groups based on the node properties [101]. For example, in the WWW

it reflects the subject matter or themes of the pages, in citation networks it reflects the area

of research, in cellular and metabolic networks it may reflect functional groups [72, 121].

In many ways, community detection is similar to a traditional graph partitioning problem

(GPP). In GPP the objective is to divide the nodes of the network into k disjoint sets


of specified sizes, such that, the number of edges between these sets is minimum. This

problem is NP-complete [58] and several heuristic methods [69, 81, 119] have been proposed

to decrease the computation time. GPP arises in many important engineering problems

which include mapping of parallel computations, laying out of circuits (VLSI design) and

the ordering of sparse matrix computations [69]. Here, the number of partitions to be

made is specified and the size of each partition is restricted. For example, in mapping of

parallel computations, the tasks have to be divided between a specified number of processors

such that the communication between the processors is minimized and the loads on the

processors are balanced. However, in real networks, we do not have any a priori knowledge about the number of communities into which the network should be divided, or about their sizes. The goal is to find the naturally existing communities in the real network rather than dividing the network into a pre-specified number of groups. Since we do not know the true partition of the network, it is difficult to evaluate the goodness of a given

partition. Moreover, there is no unique definition of a community due to the ambiguity

of how dense a group should be to form a community. Many possible definitions exist in

the literature [56, 103, 109, 120, 139]. A simple definition given in [56, 120] considers a subgraph

as a community if each node in the subgraph has more connections within the community

than with the rest of the graph. Newman and Girvan [103] have proposed another measure

which calculates the fraction of links within the community minus the expected value of the

same quantity in a randomized counterpart of the network. The higher this difference, the

stronger is the community structure. It is important to note that in spite of this ambiguity,

the presence of community structures is a common phenomenon across many real networks.

Algorithms for detecting these communities are briefly discussed in section 11.5.3.
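To make the modularity-based notion concrete, the following is a small sketch in Python (assuming the networkx library); the two-clique toy graph is hypothetical and chosen only so that the natural partition is obvious.

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Two 6-node cliques joined by a single edge: an unambiguous community structure.
G = nx.union(nx.complete_graph(range(0, 6)), nx.complete_graph(range(6, 12)))
G.add_edge(5, 6)

# Modularity of the natural split: fraction of within-community edges minus the
# expected value of the same quantity in a randomized counterpart of the network.
partition = [set(range(0, 6)), set(range(6, 12))]
print("modularity of the natural split:", modularity(G, partition))

# A greedy agglomerative algorithm that searches for a high-modularity partition.
print("communities found:", [sorted(c) for c in greedy_modularity_communities(G)])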

11.2.6 Network resilience

The ability of a network to withstand removal of nodes/edges in a network is called network

resilience or robustness. In general, the removal of nodes and edges disrupts the paths between nodes and can increase the distances, thus making the communication between

nodes harder. In more severe cases, an initially connected network can break down into

isolated components that cannot communicate anymore. Figure 11.8 shows the effect of


Figure 11.8: Effects of removing a node or an edge in the network. Observe that as we remove more nodes and edges, the network disintegrates into small components/clusters.

removal of nodes/edges on a network. Observe that as we remove more nodes and edges, the

network disintegrates into many components. There are different ways of removing nodes and

edges to test the robustness of a network. For example, one can remove nodes at random with

uniform probability or by selectively targeting certain classes of nodes, such as nodes with

high degree. Usually, the removal of nodes at random is termed as random failures and the

removal of nodes with higher degree is termed as targeted attacks; other removal strategies

are discussed in detail in [71]. Similarly there are several ways of measuring the degradation of

the network performance after the removal. One simple way to measure it is to calculate the

decrease in size of the largest connected component in the network. A connected component

is a part of the network in which a path exists between any two nodes in that component

and the largest connected component is the largest among the connected components. The

lesser the decrease in the size of the largest connected component, the better the robustness

of the network. In figure 11.8, the size of the largest connected component decreases from

13 to 9 and then to 5. Another way to measure robustness is to calculate the increase of

the average path length in the largest connected component. Malfunctioning of nodes/edges

eliminates some existing paths and generally increases the distance between the remaining

nodes. Again, the lesser the increase, the better the robustness of the network. We discuss

more about network resilience and robustness with respect to optimization in section 11.5.1.
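The two robustness measures described above are straightforward to compute for any removal scenario. The following is a minimal sketch in Python (assuming networkx); the graph and the choice of removed nodes are illustrative only.

import networkx as nx

def robustness_metrics(G, nodes_to_remove):
    """Size of the largest connected component and its average path length
    after removing the given nodes from a copy of G."""
    H = G.copy()
    H.remove_nodes_from(nodes_to_remove)
    if H.number_of_nodes() == 0:
        return 0, float("inf")
    largest = max(nx.connected_components(H), key=len)
    S = H.subgraph(largest)
    avg_len = nx.average_shortest_path_length(S) if len(S) > 1 else 0.0
    return len(S), avg_len

G = nx.barabasi_albert_graph(200, 2, seed=1)
# remove the five highest-degree nodes (a small targeted attack)
hubs = [n for n, _ in sorted(G.degree, key=lambda kv: kv[1], reverse=True)[:5]]
print(robustness_metrics(G, hubs))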


11.3 Modeling of complex networks

In this section, we give a brief summary of different models for complex networks. Most of

the modeling efforts have focused on understanding the underlying processes involved in network evolution and on capturing the above-mentioned properties of real networks. In particular, we concentrate on three prominent models, namely, the Erdos-Renyi random graph model,

the Watts-Strogatz small-world network model, and the Barabasi-Albert scale-free network

model.

11.3.1 Random graphs

One of the earliest theoretical models for complex networks was given by Erdos and Renyi

[52, 53, 54] in the 1950s and 1960s. They proposed uniform random graphs for modeling

complex networks with no obvious pattern or structure. The following is the evolutionary

model given by Erdos and Renyi:

• Start with a set of N isolated nodes

• Connect each pair of nodes with a connection probability p

Figure 11.9 illustrates two realizations of the Erdos-Renyi random graph model (ER random graphs) for two connection probabilities. Erdos and Renyi have shown that at pc ≃ 1/N, the ER random graph abruptly changes its topology from a loose collection of small clusters to one with a giant connected component. Figure 11.10 shows the change in size of the

largest connected component in the network as the value of p increases, for N = 1000. We

observe that there exists a threshold pc = 0.001 such that when p < pc, the network is com-

posed of small isolated clusters and when p > pc a giant component suddenly appears. This

phenomenon is similar to the percolation transition, a topic well-studied both in mathematics

and statistical mechanics [13].
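The percolation transition is easy to reproduce numerically. The following is a small sketch in Python (assuming networkx) that sweeps the connection probability around pc ≈ 1/N and reports the size of the largest connected component; the values of N and p are illustrative.

import networkx as nx

N = 1000
for p in [0.0005, 0.001, 0.002, 0.004, 0.008]:
    G = nx.gnp_random_graph(N, p, seed=7)          # ER random graph G(N, p)
    giant = max(nx.connected_components(G), key=len)
    print(f"p = {p:.4f}: largest component has {len(giant)} of {N} nodes")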

In an ER random graph, the mean number of neighbors at a distance (number of hops) d from a node is approximately <k>^d, where <k> is the average degree of the network.


Figure 11.9: An Erdos-Renyi random graph that starts with N = 20 isolated nodes and connects any two nodes with probability p. As the value of p increases, the number of edges in the network increases.


Figure 11.10: Illustration of the percolation transition for the size of the largest connected component in the Erdos-Renyi random graph model. Note that there exists pc = 0.001 such that when p < pc, the network is composed of small isolated clusters and when p > pc a giant component suddenly appears.


To cover all the nodes in the network, the distance (l) should be such that <k>^l ∼ N. Thus, the average path length is given by l = log N / log<k>, which scales logarithmically with the number of nodes N. This is only an approximate argument for illustration; a rigorous proof can be found in [34]. Hence, ER random graphs are small world. The clustering coefficient

of the ER random graphs is found to be low. If we consider a node and its neighbors in an ER random graph, then the probability that two of these neighbors are connected is equal to p (the probability that any two randomly chosen nodes are connected). Hence, the clustering coefficient of an ER random graph is C = p = <k>/N, which is small for large sparse networks. Now,

let us calculate the degree distribution of the ER random graphs. The total number of edges

in the network is a random variable with an expected value of pN(N −1)/2 and the number

of edges incident on a node (the node degree) follows a binomial distribution with parameters

N − 1 and p,

p(ki = k) = C(N−1, k) p^k (1 − p)^(N−1−k),

where C(N−1, k) is the binomial coefficient. This implies that in the limit of large N, the probability that a given node has degree k approaches a Poisson distribution, p(k) = <k>^k e^(−<k>) / k!. Hence, ER random graphs are

statistically homogeneous in node degree, as the majority of the nodes have a degree close to

the average, and significantly small and large node degrees are exponentially rare.

ER random graphs were used to model complex networks for a long time [34]. The model

was intuitive and analytically tractable; moreover the average path length of real networks

is close to the average path length of a ER random graph of the same size [13]. However,

recent studies on the topologies of diverse large-scale networks found in nature indicated

that they have significantly different properties from ER random graphs [13, 31, 49, 101]. It

has been found [143] that the average clustering coefficient of real networks is significantly

larger than the average clustering coefficient of ER random graphs with the same number

of nodes and edges, indicating a far more ordered structure in real networks. Moreover, the

degree distributions of many large-scale networks are found to follow a power law, p(k) ∼ k^−γ.

Figure 11.11 compares two networks with Poisson and power-law degree distributions. We

observe that there is a remarkable difference between these networks. The network with

Poisson degree distribution is more homogeneous in node degree, whereas the network with power-law distribution is highly heterogeneous. These discoveries, along with others related



Figure 11.11: Comparison of networks with Poisson and power-law degree distributions of the same size. Note that the network with Poisson distribution is homogeneous in node degree: most of the nodes in the network have a degree close to the average degree of the network. However, the network with power-law degree distribution is highly heterogeneous in node degree: there are a few nodes with large degree and many nodes with a small degree.

to the mixing patterns of complex networks [13, 31, 49, 101] initiated a revival of network

modeling in the past few years.

Non-uniform random graphs have also been studied [8, 9, 41, 93, 102, 104] to mimic the properties of real-world networks, in particular the power-law degree distribution. Typically, these models specify either a degree sequence, which is a set of N values of the degrees ki of nodes i = 1, 2, ..., N, or a degree distribution p(k). If a degree distribution is specified, then the sequence is formed by generating N random values from this distribution. This can be thought of as giving each node i in the network ki “stubs” sticking out of it; pairs of these stubs are then connected randomly to form complete edges [104]. Molloy and Reed [93] have proved that for a random graph with a degree distribution p(k), a giant connected component emerges almost surely when Σ_{k≥1} k(k − 2)p(k) > 0, provided that the maximum degree is less than N^(1/4). Later, Aiello et al. [8, 9] introduced a two-parameter random graph model P(α, γ)


for power-law graphs with exponent γ described as follows: Let nk be the number of nodes

with degree k, such that nk and k satisfy log nk = α − γ log k. The total number of nodes in the network can be computed, noting that the maximum degree of a node in the network is e^(α/γ). Using the results of Molloy and Reed [93], they showed that there is almost surely a unique giant connected component if γ < γ0 = 3.47875..., whereas there is almost surely no giant connected component when γ > γ0.
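A random graph with a prescribed degree distribution can be generated with the stub-matching construction described above. The following is a hedged sketch in Python (assuming networkx); the exponent, cutoff and network size are illustrative, and the simple weighted sampling used to draw degrees is only one possible choice.

import random
import networkx as nx

gamma, N = 2.1, 10000
rng = random.Random(0)

# Draw a degree sequence from p(k) ~ k^(-gamma), truncated at N^(1/4) to respect
# the maximum-degree condition of Molloy and Reed mentioned above.
kmax = int(N ** 0.25)
weights = [k ** (-gamma) for k in range(1, kmax + 1)]
degrees = rng.choices(range(1, kmax + 1), weights=weights, k=N)
if sum(degrees) % 2:        # the number of stubs must be even to form complete edges
    degrees[0] += 1

# Molloy-Reed criterion for the emergence of a giant connected component
q = [w / sum(weights) for w in weights]
mr = sum(k * (k - 2) * q[k - 1] for k in range(1, kmax + 1))
print(f"Molloy-Reed sum: {mr:.3f} (positive => giant component expected)")

G = nx.configuration_model(degrees, seed=0)   # random stub matching (multigraph)
G = nx.Graph(G)                               # collapse parallel edges
G.remove_edges_from(nx.selfloop_edges(G))     # and drop self-loops

giant = max(nx.connected_components(G), key=len)
print(f"largest component: {len(giant)} of {N} nodes")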

Newman et al. [104] have developed a general approach to random graphs by using a

generating function formalism [146]. The generating function for the degree distribution pk

is given by G0(x) = Σ_{k=0}^∞ pk x^k. This function captures all the information present in the original distribution, since pk = (1/k!) d^k G0/dx^k evaluated at x = 0. The average degree of a randomly chosen node is <k> = Σ_k k pk = G0′(1). Further, this formulation helps in calculating other

properties of the network [104]. For instance, we can approximately calculate the relation for

the average path length of the network. Consider the degree of the node reached by following a randomly chosen edge. If the degree of this node is k, then we are k times more likely to reach this node than a node of degree 1. Thus the degree distribution of the node arrived at by following a randomly chosen edge is proportional to k pk and not pk. In addition, the distribution of the number of edges leaving this node (one less than its degree), qk, is (k + 1) p_{k+1} / Σ_k k pk = (k + 1) p_{k+1} / <k>. Thus, the generating function for qk is G1(x) = Σ_{k=0}^∞ qk x^k = G0′(x)/G0′(1). Note

that the distribution of number of first neighbors of a randomly chosen node (degree of

a node) is G0(x). Hence, the distribution of number of second neighbors from the same

randomly chosen node would be G0(G1(x)) = Σ_k pk [G1(x)]^k. Here, the probability that any of the second neighbors is connected to the first neighbors or to one another scales as N^(−1) and can be neglected in the limit of large N. This implies that the average number of second neighbors is given by [∂G0(G1(x))/∂x]_{x=1} = G0′(1) G1′(1). Extending this method of calculating the average number of nearest neighbors, we find that the average number of mth neighbors, zm, is [G1′(1)]^(m−1) G0′(1) = (z2/z1)^(m−1) z1. Now, let us start from a node and find the number of first, second, third, ..., mth neighbors. Assuming that all the nodes in the network can be reached within l steps, we have 1 + Σ_{m=1}^l zm = N. As for most graphs N ≫ z1 and z2 ≫ z1, we obtain the average path length of the network, l = ln(N/z1)/ln(z2/z1) + 1. The generating

function formalism can further be extended to include other features such as directed graphs,

bipartite graphs and degree correlations [101].
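As a small numeric illustration of these formulas (not part of the original derivation), the sketch below evaluates z1 = G0′(1) and z2 = G0′(1)G1′(1) for a Poisson degree distribution and compares the resulting path-length estimate with the l = log N / log<k> result quoted earlier for ER random graphs; the parameters are arbitrary.

import math

mean_k, N, kmax = 4.0, 10000, 60
# Poisson degree distribution p(k) = <k>^k e^(-<k>) / k!, truncated at kmax
p = [math.exp(-mean_k) * mean_k**k / math.factorial(k) for k in range(kmax + 1)]

z1 = sum(k * p[k] for k in range(kmax + 1))            # G0'(1) = <k>
z2 = sum(k * (k - 1) * p[k] for k in range(kmax + 1))  # G0'(1) G1'(1) = <k(k-1)>
ell = math.log(N / z1) / math.log(z2 / z1) + 1

print(f"z1 = {z1:.3f}, z2 = {z2:.3f}, estimated path length = {ell:.2f}")
print(f"log N / log<k> = {math.log(N) / math.log(mean_k):.2f}")   # ER comparison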


Another class of random graphs which are especially popular in modeling social networks

is Exponential Random Graph Models (ERGMs) or p∗ models [20, 57, 70, 129, 140]. The

ERGM consists of a family of possible networks of N nodes in which each network G appears with probability P(G) = (1/Z) exp(−Σ_i θi εi), where Z = Σ_G exp(−Σ_i θi εi).

This is similar to the Boltzmann ensemble of statistical mechanics with Z as the partition

function [101]. Here, {εi} is the set of observable or measurable properties of the network, such as the number of nodes with a certain degree, the number of triangles, etc., and {θi} is an adjustable set of parameters for the model. The ensemble average of a property εi is given by ⟨εi⟩ = Σ_G εi(G) P(G) = ∂f/∂θi, where f = −ln Z is the corresponding free energy. The major advantage of these models is that

they can represent any kind of structural tendencies such as dyad and triangle formations.

A detailed review of the parameter estimation techniques can be found in [20, 127]. Once the

parameters {θi} are specified, the networks can be generated by using Gibbs or Metropolis-

Hastings sampling methods [127].

11.3.2 Small-world networks

Watts and Strogatz [143] presented a small-world network model to explain the existence

of high clustering and small average path length simultaneously in many real networks,

especially, social networks. They argued that most of the real networks are neither completely

regular nor completely random, but lie somewhere between these two extremes. The Watts-

Strogatz model starts with a regular lattice on N nodes and each edge is rewired with certain

probability p. The following is the algorithm for the model,

• Start with a regular ring lattice on N nodes where each node is connected to its first

k neighbors.

• Randomly rewire each edge with a probability p such that one end remains the same

and the other end is chosen uniformly at random. The other end is chosen without

allowing multiple edges (more than one edge joining a pair of nodes) and loops (edges

joining a node to itself).


Figure 11.12: Illustration of the random rewiring process for the Watts-Strogatz model. This model interpolates between a regular ring lattice and a random network, without changing the number of vertices (N = 20) or edges (E = 40) in the graph. When p = 0 the graph is regular (each node has 4 edges); as p increases, the graph becomes increasingly disordered until, at p = 1, all the edges are rewired randomly. After Watts and Strogatz, 1998 [143].

The resulting network is a regular network when p = 0 and a random graph when p = 1,

since all the edges are rewired (see figure 11.12). The above model is inspired by social networks, where people are friends with their immediate neighbors, such as neighbors on the street, colleagues at work, etc. (the connections in the regular lattice). Also, each person has a few friends who are a long way away (long-range connections attained by random rewiring).

Later, Newman [98] proposed a similar model where instead of edge rewiring, new edges are

introduced with probability p. The clustering coefficient of the Watts-Strogatz model and

the Newman model are

CWS = [3(k − 1)/(2(2k − 1))] (1 − p)^3,   CN = 3(k − 1)/[2(2k − 1) + 4kp(p + 2)],

respectively. This class of networks displays a high clustering coefficient for small

values of p since we start with a regular lattice. Also, for small values of p the average path

length falls rapidly due to the few long-range connections. This co-existence of high clustering

coefficient and small average path length is in excellent agreement with the characteristics

of many real networks [98, 143]. The degree distribution of both models depends on the

parameter p, evolving from a univalued peak corresponding to the initial degree k to a

somewhat broader but still peaked distribution. Thus, small-world models are even more

homogeneous than random graphs, which is not the case with real networks.
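The coexistence of high clustering and short paths can be checked directly. The following is a minimal sketch in Python (assuming networkx) that sweeps the rewiring probability p; N and k are illustrative, and the connected variant of the generator is used so that the average path length is always defined.

import networkx as nx

N, k = 1000, 4
for p in [0.0, 0.01, 0.1, 1.0]:
    G = nx.connected_watts_strogatz_graph(N, k, p, seed=3)
    C = nx.average_clustering(G)
    L = nx.average_shortest_path_length(G)
    print(f"p = {p:<4}: clustering C = {C:.3f}, average path length L = {L:.2f}")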


11.3.3 Scale-free networks

As mentioned earlier, many real networks including the World Wide Web [5, 14, 88], the

Internet [55], peer-to-peer networks [122], metabolic networks [77], phone call networks [4,

8] and movie actor collaboration networks [12, 19, 25] are scale-free, that is, their degree

distribution follows a power law, p(k) ∼ k^−γ. Barabasi and Albert [25] addressed the origin

of this power-law degree distribution in many real networks. They argued that a static

random graph or Watts-Strogatz model fails to capture two important features of large-scale

networks: their constant growth and the inherent selectivity in edge creation. Complex

networks like the World-Wide Web, collaboration networks and even biological networks

are growing continuously through the creation of new web pages, the entry of new researchers, and

by gene duplication and evolution. Moreover, unlike random networks where each node

has the same probability of acquiring a new edge, new nodes entering the network do not

connect uniformly to existing nodes, but attach preferentially to nodes of higher degree. This

reasoning led them to define the following mechanism,

• Growth: Start with a small number of connected nodes, say m0, and assume that every
time a new node enters the system, m edges point from it to existing nodes, where m < m0.

• Preferential Attachment: Every time a new node enters the system, each edge of the

newly entered node preferentially attaches to an already existing node i with degree ki
with the following probability,

Πi = ki / Σ_j kj

It was shown that such a mechanism leads to a network with a power-law degree distribution p(k) ∼ k^−γ with exponent γ = 3. These networks were called scale-free networks because of the lack of a characteristic degree and the broad tail of the degree distribution. The average path length of this network scales as log N / log(log N) and thus displays the small-world property. The clustering coefficient of a scale-free network is approximately C ∼ (log N)^2 / N, which is a slower decay than the C = <k> N^(−1) decay observed in random graphs [35]. In the years following

the proposal of the first scale-free model a large number of more refined models have been

introduced, leading to a well-developed theory of evolving networks [13, 31, 49, 101].
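The growth and preferential attachment mechanism can be simulated in a few lines. The following is a hedged sketch in Python (assuming networkx is available for the bookkeeping); it is an illustration of the mechanism, not the exact implementation used in [25], and the sizes are arbitrary. Drawing targets uniformly from a list that contains each node once per edge endpoint selects node i with probability proportional to ki.

import random
import networkx as nx

def preferential_attachment_graph(n, m, seed=None):
    rng = random.Random(seed)
    m0 = m + 1
    G = nx.complete_graph(m0)                     # small initial connected core
    targets = [v for e in G.edges() for v in e]   # each node listed once per endpoint
    for new in range(m0, n):
        chosen = set()
        while len(chosen) < m:                    # m distinct targets for the new node
            chosen.add(rng.choice(targets))
        for t in chosen:
            G.add_edge(new, t)
            targets.extend([new, t])              # update the endpoint list
    return G

G = preferential_attachment_graph(10000, m=2, seed=5)
degrees = [d for _, d in G.degree()]
print("max degree:", max(degrees), " mean degree:", sum(degrees) / len(degrees))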


11.4 Why “Complex” Networks

In this section, we will discuss why these large-scale networks are termed “complex” networks. The reason is not merely the large size of the network, although size does contribute to the complexity. One must also distinguish “complex systems” from

“complicated systems” [136]. Consider an airplane as an example. Even though it is a com-

plicated system, we know its components and the rules governing its functioning. However,

this is not the case with complex systems. Complex systems are characterized by diverse

behaviors that emerge as a result of non-linear spatio-temporal interactions among a large

number of components [73]. These emergent behaviors cannot be fully explained by just

understanding the properties of the individual components/constituents. Examples of such

complex systems include ecosystems, economies, various organizations/societies, the nervous

system, the human brain, ant hills ... the list goes on. Some of the behaviors exhibited by

complex systems are discussed below:

• Scale invariance or self-similarity : A system is scale invariant if the structure of the

system is similar regardless of the scale. Typical examples of scale invariant systems

are fractals. For example, consider the Sierpinski triangle in figure 11.13. Note that

if we look at a small part of the triangle at a different scale, then it still looks similar

to the original triangle. Similarly, at whichever scale we look at the triangle, it is

self-similar and hence scale invariant.

• Infinite susceptibility/response: Most of the complex systems are highly sensitive or

susceptible to changes in certain conditions. A small change in the system conditions

or parameters may lead to a huge change in the global behavior. This is similar to

the percolation threshold, where a small change in connection probability induces the

emergence of a giant connected cluster. Another good example of such a system is a sand pile. As we add more sand particles to a sand pile, they keep accumulating. But after a certain point is reached, the addition of one more small particle may lead to an avalanche, demonstrating that the sand pile is highly sensitive.

• Self-organization and Emergence: Self-organization is the characteristic of a system

by which it can evolve itself into a particular structure based on interactions between


Figure 11.13: Illustration of self-similarity in the Sierpinski triangle. When we look at a small part of the triangle at a different scale, it looks similar to the original triangle. Moreover, at every scale at which we look at the triangle, it is self-similar. This is a typical behavior of a scale invariant system.

the constituents and without any external influence. Self-organization typically leads

to an emergent behavior. Emergent behavior is a phenomenon in which a global property of the system is not evident from the properties of its individual parts. A completely new

property arises from the interactions between the different constituents of the system.

For example, consider an ant colony. Although a single ant (a constituent of an ant

colony) can perform a very limited number of tasks in its lifetime, a large number of

interacting ants in a colony give rise to more complex emergent behaviors.

Now let us consider the real large-scale networks such as the Internet, the WWW and

other networks mentioned in section 11.1. Most of these networks have power-law degree

distributions, which do not have any specific scale [25]. This implies that the networks do not have any characteristic degree and that the average behavior of the system is not typical (see figure 11.11(b)). For these reasons they are called scale-free networks. This heavy-tailed

degree distribution induces a high level of heterogeneity in the degrees of the vertices. The

heterogeneity makes the network highly sensitive to external disturbances. For example,

consider the network shown in figure 11.14(a). This network is highly sensitive when we

remove just two nodes from it: the network completely disintegrates into small components. On the other hand, the network shown in figure 11.14(b), which has the same number of nodes and edges, is not very sensitive. Most real networks are found to have a structure similar to the network shown in figure 11.14(a), with a huge heterogeneity in node degree. Also,

studies [111, 112, 113, 114, 115] have shown that the presence of heterogeneity has a huge

impact on epidemiological processes such as disease spreading. They have shown that in

networks which do not have a heavy-tailed degree distribution, if the disease transmission


Figure 11.14: Illustration of the high sensitivity phenomenon in complex networks. (a) Observe that when we remove the two highest degree nodes from the network, it disintegrates into small parts. The network is highly sensitive to node removals. (b) Example of a network with the same number of nodes and edges which is not sensitive. This network is not affected much when we remove the three highest degree nodes. The network in (a) is highly sensitive due to the presence of high heterogeneity in node degree.

rate is less than a certain threshold, the disease will not cause an epidemic or a major outbreak.

However, if the network has a power-law or scale-free degree distribution, it becomes highly sensitive

to disease propagation. They further showed that no matter what the transmission rate is,

there exists a finite probability that the infection will cause a major outbreak. Hence, we

clearly see that these real large-scale networks are highly sensitive or infinitely susceptible.

Further, all these networks have evolved over time with new nodes joining the network (and

some leaving) according to some self-organizing or evolutionary rules. There is no external

influence that controlled the evolution process or structure of the network. Nevertheless,

these networks have evolved in such a manner that they exhibit complex behaviors such as

power-law degree distributions and many others. Hence, they are called “complex” networks

[135].

The above discussion on complexity is an intuitive explanation rather than a technical treatment. More rigorous mathematical definitions of complexity can be found in [23, 30].


11.5 Optimization in complex networks

The models discussed in section 11.3 are focused on explaining the evolution and growth

process of many large real networks. They mainly concentrate on statistical properties of

real networks and network modeling. But the ultimate goal in studying and modeling the

structure of complex networks is to understand and optimize the processes taking place on

these networks. For example, one would like to understand how the structure of the Internet

affects its survivability against random failures or intentional attacks, how the structure of

the WWW helps in efficient surfing or search on the web, how the structure of social networks

affects the spread of viruses or diseases, etc. In other words, to design rules for optimiza-

tion, one has to understand the interactions between the structure of the network and the

processes taking place on the network. These principles will certainly help in redesigning

or restructuring the existing networks and perhaps even help in designing a network from

scratch. In the past few years, there has been tremendous amount of effort by the research

communities of different disciplines to understand the processes taking place on networks

[13, 31, 49, 101]. In this chapter, we concentrate on two processes, namely node failures and

local search, because of their high relevance to engineering systems and discuss few other

topics briefly.

11.5.1 Network resilience to node failures

All real networks are regularly subject to node/edge failures either due to normal mal-

functions (random failures) or intentional attacks (targeted attacks) [15, 16]. Hence, it is

extremely important for the network to be robust against such failures for proper function-

ing. Albert et al. [15] demonstrated that the topological structure of the network plays a

major role in its response to node/edge removal. They showed that most of the real net-

works are extremely resilient to random failures. On the other hand, they are very sensitive

to targeted attacks. They attributed this to the fact that most of these networks are scale-free networks, which are highly heterogeneous in node degree. Since a large fraction of the nodes have small degree, random failures mostly remove such nodes and have little effect on the structure of the network. On



Figure 11.15: The size of the largest connected component as a function of the percentage of nodes (p) removed from the network due to random failures (⋄) and targeted attacks (△). (a) ER graph with number of nodes N = 10,000 and mean degree <k> = 4; (b) scale-free network generated by the Barabasi-Albert model with N = 10,000 and <k> = 4. The behavior with respect to random failures and targeted attacks is similar for random graphs. Scale-free networks are highly sensitive to targeted attacks and robust to random failures.

the other hand, the removal of a few highly connected nodes that maintain the connectiv-

ity of the network drastically changes the topology of the network. For example, consider

the Internet: despite frequent router problems in the network, we rarely experience global

effects. However, if a few critical nodes in the Internet are removed then it would lead to

a devastating effect. Figure 11.15 shows the decrease in the size of the largest connected

component for both scale-free networks and ER graphs, due to random failures and targeted

attacks. ER graphs are homogenous in node degree, that is all the nodes in the network

have approximately the same degree. Hence, they behave almost similarly for both random

failures and targeted attacks (see figure 11.15(a)). In contrast, for scale-free networks, the

size of the largest connected component decreases slowly for random failures and drastically

for targeted attacks (see figure 11.15(b)).
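The qualitative behavior in figure 11.15 can be reproduced with a short simulation. The following is a minimal sketch in Python (assuming networkx); the graph generators, sizes and removal fractions are illustrative stand-ins for the networks used in the figure.

import random
import networkx as nx

def largest_component_after_removal(G, fraction, targeted):
    """Remove a fraction of nodes (highest degree first if targeted, otherwise at
    random) and return the size of the largest remaining connected component."""
    H = G.copy()
    n_remove = int(fraction * H.number_of_nodes())
    if targeted:
        victims = [v for v, _ in sorted(H.degree, key=lambda kv: kv[1], reverse=True)[:n_remove]]
    else:
        victims = random.sample(list(H.nodes), n_remove)
    H.remove_nodes_from(victims)
    return len(max(nx.connected_components(H), key=len))

random.seed(0)
er = nx.fast_gnp_random_graph(10000, 4 / 9999, seed=0)   # ER graph with <k> ~ 4
sf = nx.barabasi_albert_graph(10000, 2, seed=0)          # scale-free graph with <k> ~ 4
for name, G in [("ER", er), ("scale-free", sf)]:
    for targeted in (False, True):
        sizes = [largest_component_after_removal(G, f, targeted) for f in (0.05, 0.20)]
        kind = "targeted attack" if targeted else "random failure"
        print(f"{name:11s} {kind:15s}: largest component {sizes}")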

Ideally, we would like to have a network which is as resilient as scale-free networks to

random failures and as resilient as random graphs to targeted attacks. To determine the

feasibility of modeling such a network, Valente et al. [133] and Paul et al. [117] have studied


the following optimization problem: “What is the optimal degree distribution of a network

of size N nodes that maximizes the robustness of the network to both random failures and

targeted attacks with the constraint that the number of edges remain the same?”

Note that we can always improve the robustness by increasing the number of edges in

the network (for instance, a completely connected network will be the most robust network

for both random failures and targeted attacks). Hence the problem has a constraint on the

number of edges. In [133], Valente et al. showed that the optimal network configuration

is very different from both scale-free networks and random graphs. They showed that the

optimal networks that maximize robustness for both random failures and targeted attacks

have at most three distinct node degrees and hence the degree distribution is three-peaked.

Similar results were demonstrated by Paul et al. in [117]. Paul et al. showed that the

optimal network design is one in which all the nodes in the network except one have the

same degree, k1 (which is close to the average degree), and one node has a very large degree,

k2 ∼ N2/3, where N is the number of nodes. However, these optimal networks may not be

practically feasible because of the requirement that each node has a limited repertoire of

degrees.

Many different evolutionary algorithms have also been proposed to design an optimal

network configuration that is robust to both random failures and targeted attacks [44, 74,

125, 130, 134]. In particular, Thadakamalla et al. [130] consider two other measures, re-

sponsiveness and flexibility along with robustness for random failures and targeted attacks,

specifically for supply-chain networks. They define responsiveness as the ability of the network to provide timely services through effective navigation, and measure it in terms of the average path

length of the network. The lower the average path length, the better is the responsiveness

of the network. Flexibility is the ability of the network to have alternate paths for dy-

namic rerouting. Good clustering properties ensure the presence of alternate paths, and the

flexibility of a network is measured in terms of the clustering coefficient. They designed a pa-

rameterized evolutionary algorithm for supply-chain networks and analyzed the performance

with respect to these three measures. Through simulation they have shown that there exist

trade-offs between these measures and proposed different ways to improve these properties.

However, it is still unclear as to what would be the optimal configuration of such survivable


networks. The research question would be “what is the optimal configuration of a network of

N nodes that maximizes the robustness to random failures, targeted attacks, flexibility, and

responsiveness, with the constraint that the number of edges remain the same?”

Until now, we have focused on the effects of node removal on the static properties of

a network. However, in many real networks, the removal of nodes will also have dynamic

effects on the network as it leads to avalanches of breakdowns also called cascading failures.

For instance, in a power transmission grid, the removal of nodes (power stations) changes

the balance of flows and leads to a global redistribution of loads over the whole network. In

some cases, this may not be tolerated and might trigger a cascade of overload failures [82], as

happened on August 10th 1996 in 11 US states and two Canadian provinces [124]. Models

of cascades of irreversible [97] or reversible [45] overload failures have demonstrated that

removal of even a small fraction of highly loaded nodes can trigger global cascades if the

load distribution of the nodes is heterogenous. Hence, cascade-based attacks can be much

more destructive than any other strategies considered in [15, 71]. Later, in [96], Motter

showed that a defense strategy based on a selective further removal of nodes and edges,

right after the initial attack or failure, can drastically reduce the size of the cascade. Other

studies on cascading failures include [39, 94, 95, 138, 141].

11.5.2 Local search

One of the important research problems that has many applications in engineering systems

is search in complex networks. Local search is the process in which a node tries to find a

network path to a target node using only local information. By local information, we mean

that each node has information only about its first, or perhaps second neighbors and it is

not aware of nodes at a larger distance and how they are connected in the network. This is

an intriguing and relatively little studied problem that has many practical applications. Let

us suppose some required information such as computer files or sensor data is stored at the

nodes of a distributed network or database. Then, in order to quickly determine the location

of particular information, one should have efficient local (decentralized) search strategies.

Note that this is different from neighborhood search strategies used for solving combinatorial


optimization problems [2]. For example, consider the networks shown in figure 11.16(a) and

11.16(b). The objective is for node 1 to send a message to node 30 in the shortest possible

path. In the network shown in figure 11.16(a), each node has global connectivity information

about the network (that is, how each and every node is connected in the network). In such

a case, node 1 can calculate the optimal path using traditional algorithms [7] and send the

message through this path (1 - 3 - 12 - 30, depicted by the dotted line). Next, consider

the network shown in figure 11.16 (b), in which each node knows only about its immediate

neighbors. Node 1, based on some search algorithm, chooses to send the message to one of

its neighbors: in this case, node 4. Similarly, node 4 also has only local information, and

uses the same search algorithm to send the message to node 13. This process continues until

the message reaches the target node. We can clearly see that the search path obtained (1

- 4 - 13 - 28 - 23 - 30) is not optimal. The problem, then, is to design search algorithms that find short paths in complex networks using only this local information. The

algorithms discussed in this section may look similar to “distributed routing algorithms” that

are abundant in wireless ad hoc and sensor networks [10, 11]. However, the main difference

is that the former try to exploit the statistical properties of the network topology whereas

the latter do not. Most of the algorithms in wireless sensor networks literature find a path

to the target node either by broadcasting or random walk and then concentrate on efficient

routing of the data from start node to the end node [10, 76]. As we will see in this section,

the statistical properties of the networks have significant effect on the search process. Hence,

the algorithms in wireless sensor networks could be integrated with these results for better

performance.

We discuss this problem for two types of networks. In the first type of network, the global

position of the target node can be quantified and each node has this information. This

information will guide the search process in reaching the target node. For example, if we

look at the network considered in Milgram’s experiment, each person has the geographical

and professional information about the target node. All the intermediary people (or nodes)

use this information as a guide for passing the messages. Whereas in the second type of

network, we can not quantify the global position of the target node. In this case, during the

search process, we would not know whether a step in the search process is going towards the

target node or away from it. This makes the local search process even more difficult. One


Figure 11.16: Illustration of different ways of sending a message from node 1 to node 30. (a) In this case, each node has global connectivity information about the whole network. Hence, node 1 calculates the optimal path and sends the message through this path. (b) In this case, each node has information only about its neighbors (as shown by the dotted curve). Using this local information, node 1 tries to send the message to node 30. The path obtained is longer than the optimal path.

such network is the peer-to-peer network Gnutella [79], where the network structure

is such that one may know very little information about the location of the target node.

Here, when a user is searching for a file he/she does not know the global position of the node

that has the file. Further, when the user sends a request to one of its neighbors, it is difficult

to find out whether this step is towards the target node or away from it. For lack of a more suitable name, we call the networks of the first type spatial networks and networks of the

second type non-spatial networks. In this chapter, we focus more on search in non-spatial

networks.

Search in spatial networks

The problem of local search goes back to the famous experiment by Stanley Milgram [92]

(discussed in section 11.2) illustrating the short distances in social networks. Another im-

portant observation of the experiment, which is even more surprising, is the ability of these

nodes to find these short paths using just the local information. As pointed out by Kleinberg

[83, 84, 85], this is not a trivial statement because most of the time, people have only local

information in the network. This is the information about their immediate friends or perhaps


their friends’ friends. They do not have the global information about the acquaintances of

all people in the network. Even in Milgram’s experiment, the people to whom he gave the

letters have only local information about the entire social network. Still, from the results

of the experiment, we can see that arbitrary pairs of strangers are able to find short chains

of acquaintances between them by using only local information. Many models have been

proposed to explain the existence of such short paths [13, 31, 49, 98, 101, 143]. However,

these models are not sufficient to explain the second phenomenon. The observations from

Milgram’s experiment suggest that there is something more embedded in the underlying

social network that guides the message implicitly from the source to the target. Such net-

works which are inherently easy to search are called searchable networks. Mathematically,

a network is searchable if the length of the search path obtained scales logarithmically with

the number of nodes N (∼ log N) or slower. Kleinberg demonstrated that the emergence of

such a phenomenon requires special topological features [83, 84, 85]. Considering a family

of network models on an n-dimensional lattice that generalizes the Watts-Strogatz model,

he showed that only one particular model among this infinite family can support efficient

decentralized algorithms. Unfortunately, the model given by Kleinberg is highly constrained

and represents a very small subset of complex networks. Watts et al. [144] presented another

model which is based upon plausible hierarchical social structures and contentions regarding

social networks. This model defines a class of searchable networks and offers an explanation

for the searchability of social networks.

Search in non-spatial networks

The traditional search methods in non-spatial networks are broadcasting or random walk.

In broadcasting, each node sends the message to all its neighbors. The neighbors in turn

broadcast the message to all their neighbors, and the process continues. Effectively, all

the nodes in the network would have received the message at least once, and possibly many times. This could have devastating effects on the performance of the network. A hint of the potential damage of broadcasting can be seen in the Taylorsville, NC, elementary school project [142]. Sixth-grade students and their teacher sent out a sweet

email to all the people they knew. They requested the recipients to forward the email to


everyone they know and notify the students by email so that they could plot their locations

on a map. A few weeks later, the project had to be canceled because they had received

about 450,000 responses from all over the world [142]. A good way to avoid such a huge

exchange of messages is by doing a walk. In a walk, each node sends the message to one

of its neighbors until it reaches the target node. The neighbor can be chosen in different

ways depending on the algorithm. If the neighbor is chosen randomly with equal probability

then it is called random search, while in a high degree search the highest degree neighbor is

chosen. Adamic et al. [6] have demonstrated that high degree search is more efficient than

random search in networks with a power-law degree distribution (scale-free networks). High

degree search sends the message to a more connected neighbor that has higher probability

of reaching the target node, thereby exploiting the heterogeneity in node degree to perform better. They showed that the number of steps (s) required for the random search until the whole graph is revealed scales as s ∼ N^(3(1−2/γ)), whereas for the high-degree search it scales as s ∼ N^(2−4/γ). Clearly, for γ > 2.0, the number of steps taken by high-degree search

scales with a smaller exponent than the random walk search. Since most real networks have

power-law degree distribution with exponent (γ) between 2.1 and 3.0, high-degree search

would be more effective in these networks.
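The difference between the two strategies is easy to observe in simulation. The following is a hedged sketch in Python (assuming networkx); it is a simplified illustration of degree-based local search that avoids immediate revisits, rather than the exact algorithm of [6], and the network size and number of trials are arbitrary.

import random
import networkx as nx

def local_search_steps(G, source, target, strategy, rng, max_steps=50000):
    """Pass a message from source toward target using only neighbor information;
    return the number of hops taken."""
    current, steps, visited = source, 0, {source}
    while current != target and steps < max_steps:
        nbrs = list(G.neighbors(current))
        if target in nbrs:                       # target is a neighbor: deliver
            return steps + 1
        unvisited = [v for v in nbrs if v not in visited]
        if strategy == "high-degree" and unvisited:
            current = max(unvisited, key=G.degree)   # follow the highest-degree neighbor
        else:
            current = rng.choice(unvisited or nbrs)  # random walk / fallback
        visited.add(current)
        steps += 1
    return steps

rng = random.Random(1)
G = nx.barabasi_albert_graph(2000, 2, seed=1)    # scale-free test network
pairs = [rng.sample(range(2000), 2) for _ in range(100)]
for strategy in ("random", "high-degree"):
    avg = sum(local_search_steps(G, s, t, strategy, rng) for s, t in pairs) / len(pairs)
    print(f"{strategy:12s} search: {avg:.0f} steps on average")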

All the algorithms discussed until now [6, 83, 84, 85, 144], have assumed that the edges in

the network are equivalent. But, the assumption of equal edge weights (which may represent

the cost, bandwidth, distance, or power consumption associated with the process described

by the edge) usually does not hold in real networks. Many researchers [17, 27, 28, 36, 60,

62, 65, 87, 100, 106, 116, 118, 148] have pointed out that it is incomplete to assume that all

the edges are equivalent. Recently, Thadakamalla et al. [131] have proposed a new search

algorithm based on a network measure called local betweenness centrality (LBC) that utilizes

the heterogeneities in node degrees and edge weights. The LBC of a neighbor node i, L(i),

is given by

L(i) = Σ_{s≠i≠t; s,t ∈ local network} σst(i)/σst,

where σst is the total number of shortest paths (shortest path means the path over which

the sum of weights is minimal) from node s to t. σst(i) is the number of these shortest paths

passing through i. If the LBC of a node is high, it implies that this node is critical in the local


Table 11.2: Comparison of different search strategies in power-law networks with exponent 2.1 and 2000 nodes, with different edge weight distributions. The mean for all the edge weight distributions is 5 and the variance is σ². The values in the table are the average distances obtained for each search strategy in these networks. The values in the brackets show the relative difference between the average distance for each strategy with respect to the average distance obtained by the LBC strategy. LBC search, which reflects both the heterogeneities in edge weights and node degree, performed the best for all edge weight distributions.

Search strategy               Beta (σ² = 2.3)   Uniform (σ² = 8.3)   Exp. (σ² = 25)    Power-law (σ² = 4653.8)
Random walk                   1107.71 (202%)    1097.72 (241%)       1108.70 (272%)    1011.21 (344%)
Minimum edge weight            704.47 (92%)      414.71 (29%)         318.95 (7%)       358.54 (44%)
Highest degree                 379.98 (4%)       368.43 (14%)         375.83 (26%)      394.99 (59%)
Minimum average node weight   1228.68 (235%)     788.15 (145%)        605.41 (103%)     466.18 (88%)
Highest LBC                    366.26            322.30               298.06            247.77

network. Thadakamalla et al. assume that each node in the network has information about

its first and second neighbors and using this information, the node calculates the LBC of each

neighbor and passes the message to the neighbor with the highest LBC. They demonstrated

that this search algorithm utilizes the heterogeneities in node degree and edge-weights to

perform well in power-law networks with exponent between 2.0 and 2.9 for a variety of edge-

weight distributions. Table 11.2 compares the performance of different search algorithms for

scale-free networks with different edge weight distributions. The values in the parentheses

show the relative difference between the average distance for each algorithm with respect

to the average distance obtained by the LBC algorithm. In particular, they observed that as the heterogeneity in the edge weights increases, the difference between the high-degree search and LBC search increases. This implies that it is critical to consider the edge weights in local search algorithms. Moreover, given that many real networks are heterogeneous in edge weights, it becomes important to consider an LBC-based search rather than the high-degree search of Adamic et al. [6].
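To illustrate the idea (this is a sketch of the general approach, not the implementation of [131]), the snippet below builds the local network of a node from its first and second neighbors, computes weighted betweenness centrality on that subgraph, and picks the neighbor with the highest LBC as the next hop; the toy graph and its random edge weights are hypothetical.

import random
import networkx as nx

def lbc_next_hop(G, node, weight="weight"):
    """Return the neighbor of `node` with the highest local betweenness centrality,
    computed on the subgraph spanned by `node`, its neighbors and their neighbors."""
    local_nodes = set(G.neighbors(node)) | {node}
    for nbr in list(local_nodes):
        local_nodes |= set(G.neighbors(nbr))
    local = G.subgraph(local_nodes)
    # shortest paths on the local subgraph minimize the sum of edge weights
    bc = nx.betweenness_centrality(local, weight=weight)
    return max(G.neighbors(node), key=lambda v: bc.get(v, 0.0))

rng = random.Random(2)
G = nx.barabasi_albert_graph(200, 2, seed=2)
for u, v in G.edges():
    G[u][v]["weight"] = rng.uniform(1, 10)          # heterogeneous edge weights
print("forward the message from node 0 to node", lbc_next_hop(G, 0))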


11.5.3 Other topics

There are various other problems and applications involving real networks, related both to their structure and to their dynamics. In this subsection, we briefly summarize these

applications and give some references for further study.

Detecting community structures

As mentioned earlier, community structures are typically found in many real networks. Find-

ing these communities is extremely helpful in understanding the structure and function of the

network. Sometimes the statistical properties of the community alone may be very different

from those of the whole network, and hence may be critical in understanding the dynamics in

the community. The following are some of the examples:

• The World Wide Web: Identification of communities in the web is helpful for im-

plementation of search engines, content filtering, automatic classification, automatic

realization of ontologies and focused crawlers [18, 56].

• Social networks: Community structures are a typical feature of a social network. The

behavior of an individual is highly influenced by the community he/she belongs to. Communities often have their own norms and subcultures, which are an important source of a

person’s identity [103, 139].

• Biological networks: Community structures are found in cellular [72, 123], metabolic

[121] and genetic networks [147]. Identifying them helps in finding the functional

modules which correspond to specific biological functions.

Algorithmically, the community detection problem is the same as the cluster analysis problem studied extensively by the OR community, computer scientists, statisticians, and mathemati-

cians [67]. One of the major classes of algorithms for clustering is hierarchical algorithms

which fall into two broad types, agglomerative and divisive. In an agglomerative method,

an empty network (n nodes with no edges) is considered and edges are added based on

some similarity measure between nodes (for example, similarity based on the number of


common neighbors) starting with the edge between the pairs with highest similarity. This

procedure can be stopped at any step and the distinct components of the network are taken

to be the communities. On the other hand, in divisive methods edges are removed from

the network based on certain measure (for example, the edge with the highest betweenness

centrality [103]). As this process continues the network disintegrates into different communi-

ties. Recently, many such algorithms have been proposed and applied to complex networks [31, 46].

A comprehensive list of algorithms to identify community structures in complex networks

can be found in [46] where Danon et al. have compared them in terms of sensitivity and

computational cost.
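As an example of the divisive approach just described, the following short sketch (assuming Python with networkx) runs the edge-betweenness-based algorithm of Newman and Girvan [103] on Zachary's karate club network, a standard small test case.

import networkx as nx
from networkx.algorithms.community import girvan_newman

G = nx.karate_club_graph()
# girvan_newman repeatedly removes the edge with the highest betweenness
# centrality and yields the successive partitions, starting from two communities.
first_split = next(girvan_newman(G))
print([sorted(c) for c in first_split])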

Another interesting problem in community detection is to find a clique of maximum

cardinality in the network. A clique is a complete subgraph in the network. In the network

G(V, E), let G(S) denote the subgraph induced by a subset S ⊆ V . A network G(V, E) is

complete if each node in the network is connected to every other node, i.e. ∀i, j ∈ V, {i, j} ∈

E. A clique C is a subset of V such that the induced graph G(C) is complete. The maximum

clique problem has many practical applications such as project selection, coding theory,

computer vision, economics and integration of genome mapping data [38, 68, 110]. For

instance, in [33], Boginski et al. solve this problem to find a maximal independent set in the market graph, which can form the basis for a diversified portfolio. The maximum clique

problem is known to be NP-hard [58] and details on various algorithms and heuristics can be

found in [78, 110]. Further, if the network size is large, then the data may not fit completely

inside the computer’s internal memory. Then we need to use external memory algorithms and

data structures [3] for solving the optimization problems in such networks. These algorithms

use slower external memory (such as disks) and the resulting communication between internal

memory and external memory can be a major performance bottleneck. In [4], using external

memory algorithms, Abello et al. proposed decomposition schemes that make large sparse

graphs suitable for processing by graph optimization algorithms.


Spreading processes

The diffusion of an infectious disease, a computer virus, or a piece of information over a network is an example of a spreading process. In particular, the spread of infectious diseases in a pop-

ulation is called epidemic spreading. The study of epidemiological modeling has been an

active research area for a long time and is heavily used in planning and implementing var-

ious prevention and control programs [48]. Recently, there has been a burst of activities

on understanding the effects of the network properties on the rate and dynamics of disease

propagation [13, 31, 49, 101]. Most of the earlier methods used the homogeneous mixing

hypothesis [21], which implies that the individuals who are in contact with susceptible indi-

viduals are uniformly distributed throughout the entire population. However, recent findings

(section 11.2) such as heterogeneities in node degree, presence of high clustering coefficients,

and community structures indicate that this assumption is far from reality. Later, many

models have been proposed [13, 31, 42, 49, 101, 112, 115] which consider these properties of

the network. In particular, many researchers have shown that incorporating these properties

in the model radically changes the results previously established for random graphs. Other

spreading processes which are of interest include spread of computer viruses [24, 91, 105],

data dissemination on the Internet [80, 137], and strategies for marketing campaigns [90].

Congestion

Transport of packets or materials ranging from packet transfer in the Internet to the mass

transfer in chemical reactions in the cell is one of the fundamental processes occurring on many real networks. Due to limitations in resources (bandwidth), an increase in the number of packets (the packet generation rate) may lead to overload at a node and unusually long delivery times; in other words, congestion in the network. Considering a basic model, Ohira and Sawatari [107]

have shown that there exists a phase transition from a free flow to a congested phase as

a function of the packet generation rate. This critical rate is commonly called “congestion

threshold”; the higher the threshold, the better the network performance with respect

to congestion.


Many studies have shown that an important role is played by the topology and routing

algorithms in the congestion of networks [40, 47, 50, 51, 63, 64, 126, 128, 132]. Toroczkai et

al. [132] have shown that on large networks on which flows are influenced by gradients of a

scalar distributed on the nodes, scale-free topologies are less prone to congestion than random

graphs. Routing algorithms also influence congestion at nodes. For example, in scale-free

networks, if the packets are routed through the shortest paths then most of the packets

pass through the hubs, thus causing higher loads on the hubs [59]. Singh and Gupte

[126] discuss strategies to manipulate hub capacity and hub connections to relieve congestion

in the network. Similarly, many congestion-aware routing algorithms [40, 50, 51, 128] have
been proposed to improve performance. Sreenivasan et al. [128] introduced a static routing
protocol that is superior to shortest-path routing under intense packet generation rates. They
propose a mechanism in which packets are routed along hub-avoiding paths unless a hub is
required to establish the route. When global information is not available, routing is sometimes
done using local search algorithms; congestion due to such local search algorithms and the
corresponding optimal network configurations are studied in [22].
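
The following Python sketch conveys the hub-avoidance idea in a simplified form (it is not the protocol of [128]): edges attached to high-degree nodes are penalized with weights (k_u k_v)^beta, and the load on the busiest node, approximated here by betweenness centrality, is compared with and without the penalty. It assumes the networkx library, and the exponent beta is an illustrative choice.

# Sketch: compare the heaviest node load under plain shortest-path routing and
# under routing on hub-penalizing edge weights.
import networkx as nx

def max_load(G, weight=None):
    """Maximum betweenness centrality, used as a proxy for the load on the
    busiest node under shortest-path routing with the given edge weights."""
    return max(nx.betweenness_centrality(G, weight=weight).values())

if __name__ == "__main__":
    G = nx.barabasi_albert_graph(500, 3, seed=1)
    beta = 1.0                                       # hub-penalty exponent (illustrative)
    for u, v in G.edges():
        G[u][v]["weight"] = (G.degree(u) * G.degree(v)) ** beta
    print("max load, shortest-path routing:", max_load(G))
    print("max load, hub-avoiding routing :", max_load(G, weight="weight"))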

11.6 Conclusions

Complex networks abound in today’s world and are continuously evolving. The sheer size

and complexity of these networks pose unique challenges in their design and analysis. Such

complex networks are so pervasive that there is an immediate need to develop new analytical

approaches. In this chapter, we presented significant findings and developments in recent

years that led to a new field of interdisciplinary research, Network Science. We discussed how
the network approaches and optimization problems of network science differ from traditional
OR algorithms, and we addressed the need and opportunity for the OR community to
contribute to this fast-growing research field. The fundamental difference is that large-

scale networks are characterized based on macroscopic properties such as degree distribution

and clustering coefficient rather than the individual properties of the nodes and edges. Im-

portantly, these macroscopic or statistical properties have a huge influence on the dynamic

processes taking place on the network. Therefore, to optimize a process on a given configuration,
it is important to understand the interactions between the macroscopic properties

and the process. This will further help in the design of optimal network configurations for

various processes. Due to the growing scale of many engineered systems, a macroscopic

network approach is necessary for the design and analysis of such systems. Moreover, the

macroscopic properties and structure of networks across different disciplines are found to be
similar. Hence the results of this research can easily be migrated across applications as diverse
as social networks and telecommunication networks.

Acknowledgments

The authors would like to acknowledge the National Science Foundation (Grant # DMI

0537992) and a Sloan Research Fellowship to one of the authors (R. A.) for making this work

feasible. In addition, the authors would like to thank the anonymous reviewer for helpful

comments and suggestions. Any opinions, findings and conclusions or recommendations

expressed in this material are those of the author(s) and do not necessarily reflect the views

of the National Science Foundation (NSF).


Bibliography

[1] The Internet Movie Database can be found on the WWW at http://www.imdb.com/.

[2] E. Aarts and J. K. Lenstra, editors. Local Search in Combinatorial Optimization. J.

Wiley & Sons, Chichester, UK, 1997.

[3] J. Abello and J. Vitter, editors. External Memory Algorithms: DIMACS series in

discrete mathematics and theoretical computer science, volume 50. American Mathe-

matical Society, Boston, MA, USA, 1999.

[4] J. Abello, P. M. Pardalos, and M. G. C. Resende. External Memory Algorithms:

DIMACS series in discrete mathematics and theoretical computer science, volume 50,

chapter On maximum clique problems in very large graphs, pages 119–130. American

Mathematical Society, 1999.

[5] L. A. Adamic and B. A. Huberman. Growth dynamics of the world-wide web. Nature,

401(6749):131, 1999.

[6] L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A. Huberman. Search in power-

law networks. Phys. Rev. E, 64(4):046135, 2001.

[7] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network flows: Theory, Algorithms,

and Applications. Prentice-Hall, NJ, 1993.


[8] W. Aiello, F. Chung, and L. Lu. A random graph model for massive graphs. Proceedings

of the thirty-second annual ACM symposium on Theory of computing, pages 171–180,

2000.

[9] W. Aiello, F. Chung, and L. Lu. A random graph model for power law graphs. Exper-

imental Mathematics, 10(1):53–66, 2001.

[10] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci. Wireless sensor net-

works: A survey. Computer Networks, 38(4):393–422, 2002.

[11] J. N. Al-Karaki and A. E. Kamal. Routing techniques in wireless sensor networks: a

survey. IEEE Wireless Communications, 11(6):6–28, 2004.

[12] R. Albert and A. L. Barabasi. Topology of evolving networks: Local events and

universality. Phys. Rev. Lett., 85(24):5234–5237, 2000.

[13] R. Albert and A. L. Barabasi. Statistical mechanics of complex networks. Reviews of

Modern Physics, 74(1):47–97, 2002.

[14] R. Albert, H. Jeong, and A. L. Barabasi. Diameter of the world wide web. Nature,

401(6749):130–131, 1999.

[15] R. Albert, H. Jeong, and A. L. Barabasi. Attack and error tolerance of complex

networks. Nature, 406(6794):378–382, 2000.

[16] R. Albert, I. Albert, and G. L. Nakarado. Structural vulnerability of the North American

power grid. Phys. Rev. E, 69(2):025103, 2004.

[17] E. Almaas, B. Kovacs, T. Vicsek, Z. N. Oltvai, and A. L. Barabasi. Global organization

of metabolic fluxes in the bacterium Escherichia coli. Nature, 427(6977):839–843, 2004.

[18] R. B. Almeida and V. A. F. Almeida. A community-aware search engine. In Proceed-

ings of the 13th International Conference on World Wide Web, ACM Press, 2004.


[19] L. A. N. Amaral, A. Scala, M. Barthelemy, and H. E. Stanley. Classes of small-world

networks. Proc. Natl. Acad. Sci., 97(21):11149–11152, 2000.

[20] C. Anderson, S. Wasserman, and B. Crouch. A p∗ primer: Logit models for social

networks. Social Networks, 21(1):37–66, 1999.

[21] R. M. Anderson and R. M. May. Infectious Diseases in Humans. Oxford University

Press, Oxford, 1992.

[22] A. Arenas, A. Cabrales, A. Diaz-Guilera, R. Guimera, and F. Vega. Statistical me-

chanics of complex networks, chapter Search and Congestion in Complex Networks,

pages 175–194. Springer-Verlag, Berlin, Germany, 2003.

[23] R. Badii and A. Politi. Complexity : Hierarchical structures and scaling in physics.

Cambridge university press, 1997.

[24] J. Balthrop, S. Forrest, M. E. J. Newman, and M. M. Williamson. Technological

networks and the spread of computer viruses. Science, 304(5670):527–529, 2004.

[25] A. L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286

(5439):509–512, 1999.

[26] A. L. Barabasi, H. Jeong, Z. Neda, E. Ravasz, A. Schubert, and T. Vicsek. Evolution

of the social network of scientific collaborations. Physica A, 311:590–614, 2002.

[27] A. Barrat, M. Barthelemy, R. Pastor-Satorras, and A. Vespignani. The architecture of

complex weighted networks. Proc. Natl. Acad. Sci., 101(11):3747, 2004.

[28] A. Barrat, M. Barthelemy, and A. Vespignani. Modeling the evolution of weighted

networks. Phys. Rev. E, 70(6):066149, 2004.

[29] M. Barthelemy, A. Barrat, R. Pastor-Satorras, and A. Vespignani. Characterization

and modeling of weighted networks. Physica A, 346:34–43, 2005.


[30] C. H. Bennett. From Complexity to Life, chapter How to Define Complexity in Physics,

and Why, pages 34–43. Oxford University Press, 2003.

[31] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D. U. Hwang. Complex networks:

Structure and dynamics. Physics Reports, 424:175–308, 2006.

[32] V. Boginski, S. Butenko, and P. Pardalos. Statistical analysis of financial networks.

Computational Statistics & Data Analysis, 48:431–443, 2005.

[33] V. Boginski, S. Butenko, and P. Pardalos. Mining market data: a network approach.

Computers & Operations Research, 33:3171–3184, 2006.

[34] B. Bollobas. Random graphs. Academic, London, 1985.

[35] B. Bollobas and O. Riordan. Handbook of Graphs and Networks, chapter Mathematical

results on scale-free graphs. Wiley-VCH, Berlin, 2003.

[36] L. A. Braunstein, S. V. Buldyrev, R. Cohen, S. Havlin, and H. E. Stanley. Optimal

paths in disordered complex networks. Phys. Rev. Lett., 91(16):168701, 2003.

[37] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins,

and J. Wiener. Graph structure in the web. Computer networks, 33:309–320, 2000.

[38] S. Butenko and W. E. Wilhelm. Clique-detection models in computational biochemistry

and genomics. European Journal of Operational Research, 173:1–17, 2006.

[39] B. A. Carreras, V. E. Lynch, I. Dobson, and D. E. Newman. Critical points and

transitions in an electric power transmission model for cascading failure blackouts.

Chaos, 12(4):985–994, 2002.

[40] Z. Y. Chen and X. F. Wang. Effects of network structure and routing strategy on

network capacity. Phys. Rev. E, 73(3):036107, 2006.


[41] F. Chung and L. Lu. Connected components in random graphs with given degree

sequences. Annals of combinatorics, 6:125–145, 2002.

[42] V. Colizza, A. Barrat, M. Barthelemy, and A. Vespignani. The role of the airline

transportation network in the prediction and predictability of global epidemics. PNAS,

103(7):2015–2020, 2006.

[43] N. Contractor, S. Wasserman, and K. Faust. Testing multi-theoretical multilevel hy-

potheses about organizational networks: An analytic framework and empirical exam-

ple. Academy of Management Review, 31(3):681–703, 2006.

[44] L. F. Costa. Reinforcing the resilience of complex networks. Phys. Rev. E, 69(6):

066127, 2004.

[45] P. Crucitti, V. Latora, and M. Marchiori. Model for cascading failures in complex

networks. Phys. Rev. E, 69(4):045104, 2004.

[46] L. Danon, A. Diaz-Guilera, J. Duch, and A. Arenas. Comparing community structure

identification. Journal of Statistical Mechanics, page P09008, 2005.

[47] M. Argollo de Menezes and A.-L. Barabasi. Fluctuations in network dynamics. Phys.

Rev. Lett., 92(2):028701, 2004.

[48] O. Diekmann and J. Heesterbeek. Mathematical Epidemiology of Infectious Diseases:

Model Building, Analysis and Interpretation. Wiley, New York, 2000.

[49] S. N. Dorogovtsev and J. F. F. Mendes. Evolution of networks. Adv. Phys., 51:

1079–1187, 2002.

[50] P. Echenique, J. Gomez-Gardenes, and Y. Moreno. Improved routing strategies for

internet traffic delivery. Phys. Rev. E, 70(5):056105, 2004.

[51] P. Echenique, J. Gomez-Gardenes, and Y. Moreno. Dynamics of jamming transitions

in complex networks. Europhys. Lett., 71(2):325–331, 2005.


[52] P. Erdos and A. Renyi. On random graphs. Publicationes Mathematicae, 6:290–297,

1959.

[53] P. Erdos and A. Renyi. On the evolution of random graphs. Magyar Tud. Mat. Kutato

Int. Kozl., 5:17–61, 1960.

[54] P. Erdos and A. Renyi. On the strength of connectedness of a random graph. Acta

Math. Acad. Sci. Hungar., 12:261–267, 1961.

[55] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet

topology. Computer Communications Review, 29:251–262, 1999.

[56] G. Flake, S. Lawrence, and C. Lee Giles. Efficient identification of web communities.

In Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data

Mining, pages 150–160, 2000.

[57] O. Frank and D. Strauss. Markov graphs. J. American Statistical Association, 81:

832–842, 1986.

[58] M. R. Garey and D. S. Johnson. Computers and Intractability, A Guide to the Theory

of NP-Completeness. W. H. Freeman, 1979.

[59] K. I. Goh, B. Kahng, and D. Kim. Universal behavior of load distribution in scale-free

networks. Phys. Rev. Lett., 87(27):278701, 2001.

[60] K. I. Goh, J. D. Noh, B. Kahng, and D. Kim. Load distribution in weighted complex

networks. Phys. Rev. E, 72(1):017102, 2005.

[61] R. Govindan and H. Tangmunarunkit. Heuristics for internet map discovery. IEEE

INFOCOM, 3:1371–1380, 2000.

[62] M. Granovetter. The strength of weak ties. American Journal of Sociology, 78(6):

1360–1380, 1973.


[63] R. Guimera, A. Arenas, A. Diaz-Guilera, and F. Giralt. Dynamical properties of model

communication networks. Phys. Rev. E, 66(2):026704, 2002.

[64] R. Guimera, A. Diaz-Guilera, F. Vega-Redondo, A. Cabrales, and A. Arenas. Optimal

network topologies for local search with congestion. Phys. Rev. Lett., 89(24):248701,

2002.

[65] R. Guimera, S. Mossa, A. Turtschi, and L. A. N. Amaral. The worldwide air trans-

portation network: Anomalous centrality, community structure, and cities’ global roles.

Proc. Nat. Acad. Sci., 102:7794–7799, 2005.

[66] A. Gulli and A. Signorini. The indexable web is more than 11.5 billion pages. In

WWW ’05: Special interest tracks and posters of the 14th international conference on

World Wide Web, pages 902–903. ACM Press, New York, USA, 2005.

[67] P. Hansen and B. Jaumard. Cluster analysis and mathematical programming. Math-

ematical programming, 79:191–215, 1997.

[68] J. Hasselberg, P. M. Pardalos, and G. Vairaktarakis. Test case generators and compu-

tational results for the maximum clique problem. Journal of Global Optimization, 3:

463–482, 1993.

[69] B. Hendrickson and R. W. Leland. A multilevel algorithm for partitioning graphs. In

Supercomputing ’95: Proceedings of the 1995 ACM/IEEE conference on Supercomput-

ing, page 28. ACM Press, New York, USA, 1995.

[70] P. W. Holland and S. Leinhardt. An exponential family of probability distributions

for directed graphs. J. American Statistical Association, 76:33–65, 1981.

[71] P. Holme and B. J. Kim. Attack vulnerability of complex networks. Phys. Rev. E, 65

(5), 2002.


[72] P. Holme, M. Huss, and H. Jeong. Subnetwork hierarchies of biochemical pathways.

Bioinformatics, 19:532–538, 2003.

[73] V. Honavar. Complex Adaptive Systems Group at Iowa State University,

http://www.cs.iastate.edu/∼honavar/cas.html, date accessed: March 22, 2006.

[74] R. Ferrer i Cancho and R. V. Sole. Statistical mechanics of complex networks, chapter

Optimization in complex networks, pages 114–126. Springer-Verlag, Berlin, 2003.

[75] R. Ferrer i Cancho, C. Janssen, and R. V. Sole. Topology of technology graphs: Small

world patterns in electronic circuits. Phys. Rev. E, 64(4):046119, 2001.

[76] C. Intanagonwiwat, R. Govindan, and D. Estrin. Directed diffusion: a scalable and

robust communication paradigm for sensor networks. Proceedings of ACM MobiCom

’00, Boston, MA, pages 174–185, 2000.

[77] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabasi. The large-scale

organization of metabolic networks. Nature, 407:651–654, 2000.

[78] D. S. Johnson and M. A. Trick, editors. Cliques, Coloring, and Satisfiability: Sec-

ond DIMACS Implementation Challenge, Workshop, October 11-13, 1993. American

Mathematical Society, Boston, USA, 1996.

[79] G. Kan. Peer-to-Peer Harnessing the Power of Disruptive Technologies, chapter

Gnutella. O’Reilly, Beijing, 2001.

[80] A.-M. Kermarrec, L. Massoulie, and A. J. Ganesh. Probabilistic reliable dissemination

in large-scale systems. IEEE Trans. on Parallel and Distributed Sys, 14(3):248–258,

2003.

[81] B. W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs.

The Bell System Technical Journal, 49:291–307, 1970.


[82] R. Kinney, P. Crucitti, R. Albert, and V. Latora. Modeling cascading failures in the

north american power grid. The European Physical Journal B, 46:101–107, 2005.

[83] J. Kleinberg. Navigation in a small world. Nature, 406:845, 2000.

[84] J. Kleinberg. The small-world phenomenon: An algorithmic perspective. Proc. 32nd

ACM Symposium on Theory of Computing, pages 163–170, 2000.

[85] J. Kleinberg. Small-world phenomena and the dynamics of information. Advances in

Neural Information Processing Systems, 14:431–438, 2001.

[86] D. Koschutzki, K. A. Lehmann, L. Peeters, S. Richter, D. Tenfelde-Podehl, and O. Zlo-

towski. Network Analysis, chapter Centrality Indices, pages 16–61. Springer-Verlag,

Berlin, 2005.

[87] A. E. Krause, K. A. Frank, D. M. Mason, R. E. Ulanowicz, and W. W. Taylor. Com-

partments revealed in food-web structure. Nature, 426:282–285, 2003.

[88] R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal.

The web as a graph. Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART

symposium on Principles of database systems, pages 1–10, 2000.

[89] S. Lawrence and C. L. Giles. Accessibility of information on the web. Nature, 400:

107–109, 1999.

[90] J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing,

2005. e-print physics/0509039, http://lanl.arxiv.org/abs/physics?papernum=0509039.

[91] A. L. Lloyd and R. M. May. How viruses spread among computers and people. Science,

292:1316–1317, 2001.

[92] S. Milgram. The small world problem. Psychology Today, 2:60–67, 1967.


[93] M. Molloy and B. Reed. A critical point for random graphs with a given degree

sequence. Random Structures Algorithms, 6:161–179, 1995.

[94] Y. Moreno, J. B. Gomez, and A. F. Pacheco. Instability of scale-free networks under

node-breaking avalanches. Europhys. Lett., 58(4):630–636, 2002.

[95] Y. Moreno, R. Pastor-Satorras, A. Vazquez, and A. Vespignani. Critical load and

congestion instabilities in scale-free networks. Europhys. Lett., 62(2):292–298, 2003.

[96] A. E. Motter. Cascade control and defense in complex networks. Phys. Rev. Lett., 93

(9):098701, 2004.

[97] A. E. Motter and Y. Lai. Cascade-based attacks on complex networks. Phys. Rev. E,

66(6):065102, 2002.

[98] M. E. J. Newman. Models of the small world. Journal of Statistical Physics, 101:819–841,

2000.

[99] M. E. J. Newman. Scientific collaboration networks: I. Network construction and

fundamental results. Phys. Rev. E, 64(1):016131, 2001.

[100] M. E. J. Newman. Scientific collaboration networks: II. Shortest paths, weighted

networks, and centrality. Phys. Rev. E, 64(1):016132, 2001.

[101] M. E. J. Newman. The structure and function of complex networks. SIAM Review,

45:167–256, 2003.

[102] M. E. J. Newman. Handbook of Graphs and Networks, chapter Random graphs as

models of networks. Wiley-VCH, Berlin, 2003.

[103] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in

networks. Phys. Rev. E, 69(2):026113, 2004.


[104] M. E. J. Newman, S. H. Strogatz, and D. J. Watts. Random graphs with arbitrary

degree distributions and their applications. Phys. Rev. E, 64(2):026118, 2001.

[105] M. E. J. Newman, S. Forrest, and J. Balthrop. Email networks and the spread of

computer viruses. Phys. Rev. E, 66(3):035101, 2002.

[106] J. D. Noh and H. Rieger. Stability of shortest paths in complex networks with random

edge weights. Phys. Rev. E, 66(6):066127, 2002.

[107] T. Ohira and R. Sawatari. Phase transition in a computer network traffic model. Phys.

Rev. E, 58(1):193–195, 1998.

[108] Committee on Network Science for Future Army Applications. Network Science. The

National Academies Press, 2005.

[109] G. Palla, I. Derenyi, I. Farkas, and T. Vicsek. Uncovering the overlapping community

structure of complex networks in nature and society. Nature, 435:814–818, 2005.

[110] P. M. Pardalos and J. Xue. The maximum clique problem. Journal of Global Opti-

mization, 4:301–328, 1994.

[111] R. Pastor-Satorras and A. Vespignani. Epidemic dynamics and endemic states in

complex networks. Phys. Rev. E, 63(6):066117, 2001.

[112] R. Pastor-Satorras and A. Vespignani. Epidemic spreading in scale-free networks.

Phys. Rev. Lett., 86:3200–3203, 2001.

[113] R. Pastor-Satorras and A. Vespignani. Epidemic dynamics in finite size scale-free

networks. Phys. Rev. E, 65(3):035108, 2002.

[114] R. Pastor-Satorras and A. Vespignani. Immunization of complex networks. Phys. Rev.

E, 65(3):036104, 2002.


[115] R. Pastor-Satorras and A. Vespignani. Handbook of Graphs and Networks, chapter

Epidemics and immunization in scale-free networks. Wiley-VCH, Berlin, 2003.

[116] R. Pastor-Satorras and A. Vespignani. Evolution and structure of the Internet: A

statistical physics approach. Cambridge University Press, 2004.

[117] G. Paul, T. Tanizawa, S. Havlin, and H. E. Stanley. Optimization of robustness of

complex networks. Eur. Phys. Journal B, 38:187–191, 2004.

[118] S. L. Pimm. Food Webs. The University of Chicago Press, 2nd edition, 2002.

[119] A. Pothen, H. Simon, and K. Liou. Partitioning sparse matrices with eigenvectors of

graphs. SIAM J. Matrix Anal., 11(3):430–452, 1990.

[120] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi. Defining and identi-

fying communities in networks. Proc. Natl. Acad. Sci., 101:2658–2663, 2004.

[121] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A. L. Barabasi. Hierarchical

organization of modularity in metabolic networks. Science, 297:1551–1555, 2002.

[122] M. Ripeanu, I. Foster, and A. Iamnitchi. Mapping the gnutella network: Properties

of large-scale peer-to-peer systems and implications for system design. IEEE Internet

Computing Journal, 6:50–57, 2002.

[123] A. W. Rives and T. Galitski. Modular organization of cellular networks. Proc.

Natl. Acad. Sci., 100(3):1128–1133, 2003.

[124] M. L. Sachtjen, B. A. Carreras, and V. E. Lynch. Disturbances in a power transmission

system. Phys. Rev. E, 61(5):4877–4882, 2000.

[125] B. Shargel, H. Sayama, I. R. Epstein, and Y. Bar-Yam. Optimization of robustness

and connectivity in complex networks. Phys. Rev. Lett., 90(6):068701, 2003.


[126] B. K. Singh and N. Gupte. Congestion and decongestion in a communication network.

Phys. Rev. E, 71(5):055103, 2005.

[127] T. A. B. Snijders. Markov chain monte carlo estimation of exponential random graph

models. J. Social Structure, 3(2):1–40, 2002.

[128] S. Sreenivasan, R. Cohen, E. Lopez, Z. Toroczkai, and H. E. Stanley. Com-

munication bottlenecks in scale-free networks, 2006. e-print cs.NI/0604023,

http://xxx.lanl.gov/abs/cs?papernum=0604023.

[129] D. Strauss. On a general class of models for interaction. SIAM Review, 28:513–527,

1986.

[130] H. P. Thadakamalla, U. N. Raghavan, S. R. T. Kumara, and R. Albert. Survivability

of multi-agent based supply networks: A topological perspective. IEEE Intelligent

Systems, 19:24–31, 2004.

[131] H. P. Thadakamalla, R. Albert, and S. R. T. Kumara. Search in weighted complex

networks. Phys. Rev. E, 72(6):066128, 2005.

[132] Z. Toroczkai and K. E. Bassler. Network dynamics: Jamming is limited in scale-free

systems. Nature, 428:716, 2004.

[133] A. X. C. N. Valente, A. Sarkar, and H. A. Stone. Two-peak and three-peak optimal

complex networks. Phys. Rev. Lett., 92(11):118702, 2004.

[134] V. Venkatasubramanian, S. Katare, P. R. Patkar, and F. Mu. Spontaneous emergence

of complex optimal networks through evolutionary adaptation. Computers & Chemical

Engineering, 28(9):1789–1798, 2004.

[135] A. Vespignani. Epidemic modeling: Dealing with complexity.

http://vw.indiana.edu/talks-fall04/, 2004. Date accessed: July 6, 2006.


[136] A. Vespignani. Frontiers of Engineering: Reports on Leading-Edge Engineering from

the 2005 Symposium, chapter Complex Networks: Ubiquity, Importance, and Implica-

tions, pages 75–81. The National Academies Press, 2006.

[137] W. Vogels, R. van Renesse, and K. Birman. The power of epidemics: robust commu-

nication for large-scale distributed systems. SIGCOMM Comput. Commun. Rev., 33

(1):131–135, 2003.

[138] X. F. Wang and J. Xu. Cascading failures in coupled map lattices. Phys. Rev. E, 70

(5):056113, 2004.

[139] S. Wasserman and K. Faust. Social Network Analysis. Cambridge University Press,

1994.

[140] S. Wasserman and P. Pattison. Logit models and logistic regressions for social networks

1: An introduction to markov random graphs and p∗. Psychometrika, 61:401–426, 1996.

[141] D. J. Watts. A simple model of global cascades on random networks. Proc. Natl. Acad.

Sci., 99(9):5766–5771, 2002.

[142] D. J. Watts. Six degrees: The science of a connected age. W. W. Norton & Company,

2003.

[143] D. J. Watts and S. H. Strogatz. Collective dynamics of “small-world” networks. Nature,

393:440–442, 1998.

[144] D. J. Watts, P. S. Dodds, and M. E. J. Newman. Identity and search in social networks.

Science, 296:1302–1305, 2002.

[145] D. J. Watts, P. S. Dodds, and R. Muhamad.

http://smallworld.columbia.edu/index.html, date accessed: March 22, 2006.

[146] H. S. Wilf. generatingfunctionology. Academic, Boston, 1990.


[147] D. M. Wilkinson and B. A. Huberman. A method for finding communities of related

genes. Proc. Natl. Acad. Sci., 101:5241–5248, 2004.

[148] S. H. Yook, H. Jeong, A. L. Barabasi, and Y. Tu. Weighted evolving networks. Phys.

Rev. Lett., 86:5835–5838, 2001.