SDSC, skitter (July 1998) A random graph model for massive graphs William Aiello Fan Chung Graham Lincoln Lu

Page 1: SDSC, skitter (July 1998)

A random graph model

for massive graphs

William Aiello

Fan Chung Graham

Lincoln Lu

Page 2: SDSC, skitter (July 1998)

What are the properties of the WWW Graph?

• Is the World Wide Web connected?

• If not, how large is the largest component, the second largest component, etc.?

• Can these questions be answered exactly?

• Probably not! The WWW is changing constantly. Even a “snapshot” of the Web is too large to handle.

Page 3: SDSC, skitter (July 1998)

An important observation

The WWW graph has a power law degree distribution

• Broder, Kleinberg, Kumar, Raghavan, Rajagopalan and Tomkins, 1999.

• Barabási, Albert and Jeong, 1999.

Discovered by several groups independently

Page 4: SDSC, skitter (July 1998)

Power Law Graphs

Power law decay of the degree distribution:

The number of vertices of degree d is proportional to $1/d^{\beta}$, where $\beta$ is some constant > 0. Let y(d) be the number of nodes of degree d:

$y \sim 1/d^{\beta}$

$\log y = \alpha - \beta \log d$ for some constant $\alpha$
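
If the power law is written as log y = α − β log d, the exponent β is the negative slope of the degree counts on a log-log plot. The following is a minimal sketch of that reading, not the authors' method: `fit_power_law` is an illustrative name, and the values α = 8, β = 2.5 and the noise-free counts are assumptions chosen only for the demonstration.

```python
import math

def fit_power_law(degrees, counts):
    """Least-squares line through the points (log d, log y).

    Returns (alpha, beta) such that log y is approximately alpha - beta * log d.
    """
    xs = [math.log(d) for d in degrees]
    ys = [math.log(y) for y in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, -slope

# Hypothetical, noise-free data: y(d) = e^alpha / d^beta for d = 1..20.
alpha_true, beta_true = 8.0, 2.5
degrees = list(range(1, 21))
counts = [math.exp(alpha_true) / d ** beta_true for d in degrees]

alpha_hat, beta_hat = fit_power_law(degrees, counts)
print(alpha_hat, beta_hat)
```

On real degree data the counts are noisy, especially at high degree, so a plain least-squares fit like this one is only a rough first estimate.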

Page 5: SDSC, skitter (July 1998)
Page 6: SDSC, skitter (July 1998)

Power Law Graphs: Robust and Ubiquitous

• Internet Router Graph

• Power Grid Graph

• Phone Call Graph

• Scientific Citation Graph

• Co-Stars Graph (e.g. the six degrees of Kevin Bacon)

• The power in the power law stays constant even as the graphs grow and change.

Page 7: SDSC, skitter (July 1998)

What does a massive graph look like?

sparse

clustered

small diameter

Hard to describe!

Harder to analyze!!

prohibitively large

dynamically changing

incomplete information

Page 8: SDSC, skitter (July 1998)

Don’t worry about exact answers—Use Models Instead

• Data sets too large and dynamic for exact analysis occur in many other areas: the physical, biological, and social sciences and engineering.

• Progress in understanding often made by iterative interplay between modeling and experimental data, where both often have a random or statistical nature.

Page 9: SDSC, skitter (July 1998)

Modeling Power Law Graphs

• Develop model of Power Law Graphs

• Analyze properties of model, e.g., connected component structure

• Compare results to experimental data

• Our model will be a variant of an important model in graph theory called Random Graphs

Page 10: SDSC, skitter (July 1998)

Random Graphs

• G(n,e)
  – n nodes
  – all graphs with e edges have uniform probability

H(3,1) prob 1/3

H(3,2) prob 1/3
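
The 1/3 probabilities on this slide can be checked by listing every labeled graph on 3 nodes with a given number of edges. The sketch below does that by brute force; `graphs_with_e_edges` is an illustrative name, and graphs are assumed to be simple (no loops, no repeated edges).

```python
from fractions import Fraction
from itertools import combinations

def graphs_with_e_edges(n, e):
    """All labeled simple graphs on nodes 0..n-1 with exactly e edges."""
    possible_edges = list(combinations(range(n), 2))
    return list(combinations(possible_edges, e))

for e in (1, 2):
    graphs = graphs_with_e_edges(3, e)
    # Uniform model: every graph with e edges gets the same probability.
    print(e, len(graphs), Fraction(1, len(graphs)))
# prints:
# 1 3 1/3
# 2 3 1/3
```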

Page 11: SDSC, skitter (July 1998)

Random Graphs

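• G(n,p)
  – n nodes
  – each edge is included with probability p
  – expected degree = p(n-1)

Probabilities of the graphs on 3 nodes, by number of edges:

$(1-p)^3$ (no edges)

$p(1-p)^2$ (one edge)

$p^2(1-p)$ (two edges)

$p^3$ (three edges)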
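
In G(n,p) a labeled graph with k edges has probability p^k (1-p)^(N-k), where N = n(n-1)/2 is the number of possible edges. The sketch below enumerates all graphs for a small n to confirm that these probabilities sum to 1 and that the expected degree is p(n-1). The function names are illustrative, and the enumeration is only feasible for very small n.

```python
from itertools import combinations, product

def graph_probabilities(n, p):
    """Map each labeled simple graph on n nodes to its G(n,p) probability."""
    edges = list(combinations(range(n), 2))
    table = {}
    for included in product([False, True], repeat=len(edges)):
        graph = frozenset(e for e, keep in zip(edges, included) if keep)
        k = len(graph)
        table[graph] = p ** k * (1 - p) ** (len(edges) - k)
    return table

def expected_degree(n, p, vertex=0):
    """Expected degree of one vertex, computed by full enumeration."""
    table = graph_probabilities(n, p)
    return sum(prob * sum(1 for e in g if vertex in e)
               for g, prob in table.items())

p = 0.25
table = graph_probabilities(3, p)
print(sum(table.values()))        # total probability over all 8 graphs
print(expected_degree(3, p))      # compare with p * (n - 1)
```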

Page 12: SDSC, skitter (July 1998)

Paul Erdős and A. Rényi,

On the evolution of random graphs

Magyar Tud. Akad. Mat. Kut. Int. Közl. 5 (1960) 17-61.

Page 13: SDSC, skitter (July 1998)

The evolution of random graphs G(n,p) as p grows from 0:

• p well below 1/n: disjoint union of trees

• p = c/n, 0 < c < 1: cycles of any size

• p = 1/n: the double jump

• p = c'/n, c' > 1: one giant component, i.e., of size $\Theta(n)$; other components are o(n)-sized trees

• p = log n/n: G(n,p) is connected

• p = w log n/n, w → ∞: connected and almost regular, expected degree ~ w log n
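
The jump from small components to a giant component around p = 1/n is easy to see in simulation. The following sketch samples G(n,p) and reports the fraction of nodes in the largest component, using a union-find structure. The function name, the value of n, the seed, and the chosen values of c are all assumptions made for illustration.

```python
import math
import random

def largest_component_fraction(n, p, seed=0):
    """Sample G(n,p) and return (size of largest component) / n."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:          # include edge {u, v} with probability p
                ru, rv = find(u), find(v)
                if ru != rv:
                    if size[ru] < size[rv]:
                        ru, rv = rv, ru
                    parent[rv] = ru
                    size[ru] += size[rv]
    return max(size[find(x)] for x in range(n)) / n

n = 800
for label, p in [("c = 0.5", 0.5 / n),
                 ("c = 2", 2.0 / n),
                 ("3 log n / n", 3 * math.log(n) / n)]:
    print(label, largest_component_fraction(n, p))
```

With c below 1 the largest component should hold only a small fraction of the nodes, with c above 1 a constant fraction, and well above log n/n the whole graph.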

Page 14: SDSC, skitter (July 1998)

Random Graphs and Degree Distributions

• H(n,s)
  – n nodes
  – s = (y(1), y(2), … , y(n-1)), where y(i) is the number of nodes with degree i.
  – all graphs with degree distribution s have uniform probability

Page 15: SDSC, skitter (July 1998)

Random Graphs and Degree Distributions

H(4,s), s = (1,2,1). All have prob. 1/12
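
The probability 1/12 says that exactly 12 labeled graphs on 4 nodes have one node of degree 1, two of degree 2, and one of degree 3. A brute-force check, assuming simple graphs (no loops or repeated edges) and using the illustrative name `graphs_with_degree_distribution`:

```python
from collections import Counter
from itertools import combinations

def graphs_with_degree_distribution(n, s):
    """All labeled simple graphs on n nodes where s[i-1] nodes have degree i.

    Nodes not accounted for by s (n - sum(s) of them) must have degree 0.
    """
    edges = list(combinations(range(n), 2))
    matches = []
    for k in range(len(edges) + 1):
        for graph in combinations(edges, k):
            degree = Counter()
            for u, v in graph:
                degree[u] += 1
                degree[v] += 1
            histogram = Counter(degree[v] for v in range(n))
            if (histogram[0] == n - sum(s)
                    and all(histogram[i + 1] == c for i, c in enumerate(s))):
                matches.append(graph)
    return matches

graphs = graphs_with_degree_distribution(4, (1, 2, 1))
print(len(graphs))   # prints 12, so each graph has probability 1/12
```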

Page 16: SDSC, skitter (July 1998)

Random Power Law Graphs

A power law degree distribution can be described by two parameters, $\alpha$ and $\beta$:

$y = e^{\alpha}/x^{\beta}$

$\log y = \alpha - \beta \log x$

where y is the number of nodes of degree x

A new random graph model: $P(\alpha,\beta)$.

$P(\alpha,\beta)$ assigns uniform probability to all graphs with degree distribution $y = e^{\alpha}/x^{\beta}$

Page 17: SDSC, skitter (July 1998)

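A few facts about $P(\alpha,\beta)$:

• The maximum degree is $e^{\alpha/\beta}$.

• The number of vertices n is

$n = \sum_{x=1}^{e^{\alpha/\beta}} e^{\alpha}/x^{\beta} \sim \zeta(\beta)\, e^{\alpha}$ (for $\beta > 1$)

where $\zeta(t) = \sum_{k \ge 1} 1/k^{t}$ is the Riemann zeta function.

• The number of edges E is

$E = \frac{1}{2} \sum_{x=1}^{e^{\alpha/\beta}} e^{\alpha}/x^{\beta-1} \sim \zeta(\beta-1)\, e^{\alpha}/2$ (for $\beta > 2$)

• The density $E/n \sim \zeta(\beta-1)/(2\zeta(\beta))$ is controlled by $\beta$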
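
The vertex and edge counts can be compared numerically with their zeta-function approximations, n ≈ ζ(β) e^α and E ≈ ζ(β−1) e^α / 2. The sketch below does this for the hypothetical values α = 10, β = 3, where ζ(3) and ζ(2) are known constants. It treats the counts y = e^α / x^β as real numbers rather than rounding them to integers, which is a simplifying assumption, and `power_law_counts` is an illustrative name.

```python
import math

ZETA_2 = math.pi ** 2 / 6        # zeta(2)
ZETA_3 = 1.2020569031595943      # zeta(3), Apery's constant

def power_law_counts(alpha, beta):
    """Return (max degree, number of vertices, number of edges).

    Uses y(x) = e^alpha / x^beta for x = 1..floor(e^(alpha/beta)),
    without rounding y(x) to an integer (a simplification).
    """
    max_degree = int(math.exp(alpha / beta))
    y = {x: math.exp(alpha) / x ** beta for x in range(1, max_degree + 1)}
    n = sum(y.values())
    edges = 0.5 * sum(x * count for x, count in y.items())
    return max_degree, n, edges

alpha, beta = 10.0, 3.0          # hypothetical parameters
max_degree, n, edges = power_law_counts(alpha, beta)
n_approx = ZETA_3 * math.exp(alpha)
e_approx = 0.5 * ZETA_2 * math.exp(alpha)

print(max_degree)
print(n, n_approx)
print(edges, e_approx)
print(edges / n, ZETA_2 / (2 * ZETA_3))   # density and its approximation
```

The approximations improve as α grows, because the sums then run over more terms before being cut off at the maximum degree.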

Page 18: SDSC, skitter (July 1998)

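Facts on $P(\alpha,\beta)$, by range of $\beta$:

• $0 < \beta < 1$: connected

• $1 < \beta < 3.478...$: not connected; unique giant component of size $\Theta(n)$
  – $1 < \beta < 2$: smaller components are of size O(1).
  – $\beta = 2$: smaller components are of size O(log n/log log n). For any x, 2 ≤ x < O(log n/log log n), there is a component of size x.
  – $2 < \beta < 3.478...$: the second largest components are of size O(log n). For any x, 2 ≤ x < O(log n), there is a component of size x.

• $\beta > 3.478...$: no giant component

The threshold 3.478... is a root of $\zeta(\beta-2) = 2\zeta(\beta-1)$.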
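
The threshold 3.478... can be recovered numerically as the root of ζ(β−2) = 2ζ(β−1). The sketch below uses bisection together with a simple zeta approximation (a partial sum plus the leading Euler-Maclaurin tail terms). Both function names and the bracketing interval [3.1, 4.0] are assumptions made for this illustration.

```python
def zeta(s, terms=1000):
    """Riemann zeta for real s > 1: partial sum plus Euler-Maclaurin tail."""
    partial = sum(k ** -s for k in range(1, terms))
    tail = (terms ** (1 - s) / (s - 1)       # integral of x^-s from `terms` to infinity
            + 0.5 * terms ** -s              # half of the first omitted term
            + s * terms ** (-s - 1) / 12)    # first derivative correction
    return partial + tail

def threshold_beta(lo=3.1, hi=4.0, tol=1e-10):
    """Bisection for the root of zeta(b - 2) - 2 * zeta(b - 1) in [lo, hi]."""
    def f(b):
        return zeta(b - 2) - 2 * zeta(b - 1)
    # f is positive at lo and negative at hi, so the root lies between them.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(threshold_beta())
```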

Page 19: SDSC, skitter (July 1998)
Page 20: SDSC, skitter (July 1998)

How do Power Law Graphs Arise?

• The previous model takes the power law degree distribution as a given.

• It does not explain how such graphs arise.

• Results which hold in the model with high probability (e.g., our connected component results) will apply to the vast majority of power law graphs regardless of the particulars of the evolution process.

Page 21: SDSC, skitter (July 1998)

Yet Another Random Graph Model

(n) is a random graph evolution:

• Let $K_n$ be the set of all possible edges

• Let $E_t$ be the edges chosen in steps 1 through t.

• At time step t+1 choose uniformly one of the edges in $K_n - E_t$

• Add this edge to $E_t$ to get $E_{t+1}$.

• Study what structures appear with high probability as a function of t.

Page 22: SDSC, skitter (July 1998)

Need a new idea

(n) fixes the set of nodes and then adds edges.

Can show that to get a power law, need to add both nodes and edges.

(n) chooses uniformly among all eligible edges

Can show that selecting edges uniformly will not yield a power law.

Page 23: SDSC, skitter (July 1998)

A Graph Evolution Process

• At each time step t, toss a biased coin having heads with probability p.

• “tails” -> add a new vertex with a self-loop.

• “heads” -> add a new edge between the existing set of nodes:
  – Select a vertex u with probability proportional to the degree of u, i.e., Pr[ u chosen ] = deg(u)/2|E|.
  – Independently select a vertex v with probability proportional to deg(v).
  – Add the edge {u,v}.

Page 24: SDSC, skitter (July 1998)

A Graph Evolution Process

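[Figure: one step of the process on the graph $G_t$, with branches labeled p and 1-p and vertices u and v]

• The number of nodes grows with time

• Edges are not added uniformly

• Nodes which are added early have an “advantage” over nodes added late

• Gives a power law degree distribution

$y \sim 1/d^{1+1/p}$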
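
The process can be simulated directly: with probability p add an edge between two existing vertices, each chosen with probability deg/2|E|, and otherwise add a new vertex with a self-loop. The sketch below is one way to do this, not the authors' implementation. It keeps a list in which every vertex appears once per unit of degree, so a uniform draw from the list is a degree-proportional draw. The function name, the seed, and the starting graph (a single vertex with a self-loop) are assumptions.

```python
import random
from collections import Counter

def evolve(steps, p, seed=0):
    """Run the evolution process; return (number of vertices, degree Counter)."""
    rng = random.Random(seed)
    # Assumption: G_0 is vertex 0 with a self-loop (degree 2).
    endpoints = [0, 0]            # vertex v appears deg(v) times
    num_vertices = 1
    for _ in range(steps):
        if rng.random() < p:      # heads: new edge between existing vertices
            u = rng.choice(endpoints)   # Pr[u] = deg(u) / 2|E|
            v = rng.choice(endpoints)   # chosen independently of u
            endpoints += [u, v]
        else:                     # tails: new vertex with a self-loop
            w = num_vertices
            num_vertices += 1
            endpoints += [w, w]
    return num_vertices, Counter(endpoints)

num_vertices, degree = evolve(steps=20000, p=0.5)
histogram = Counter(degree.values())   # y(d): number of vertices of degree d
for d in sorted(histogram)[:8]:
    print(d, histogram[d])
```

Because a self-loop contributes 2 to the degree, every vertex here has degree at least 2. Plotting the histogram on log-log axes gives a rough visual check of the predicted exponent.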

Page 25: SDSC, skitter (July 1998)

Comparisons

From simulation using Model B

From real data

Page 26: SDSC, skitter (July 1998)

Evolution Process for Directed Graphs

• Select a vertex u with probability proportional to the out-degree of u, i.e., Pr[ u chosen ] = out-deg(u)/|E|.

• Select a vertex v with probability proportional to the in-degree of v.

• Flip two coins; heads with prob $p_1$ and $p_2$.
  – Heads, heads -> add an edge from u to v.
  – Heads, tails -> add an edge from u to a new node.
  – Tails, heads -> add an edge from a new node to v.
  – Tails, tails -> add a directed self-loop to a new node.

• # nodes w/ out-degree d ~ $1/d^{1+1/p_1}$

• # nodes w/ in-degree d ~ $1/d^{1+1/p_2}$
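
The directed process can be sketched in the same way, with one list per edge end: a vertex appears in `out_ends` once per outgoing edge and in `in_ends` once per incoming edge, so uniform draws from these lists are proportional to out-degree and in-degree. The starting graph (one vertex with a directed self-loop), the function name, and the parameter values are assumptions for illustration.

```python
import random
from collections import Counter

def evolve_directed(steps, p1, p2, seed=0):
    """Return (number of vertices, out-degree Counter, in-degree Counter)."""
    rng = random.Random(seed)
    # Assumption: start from vertex 0 with a directed self-loop.
    out_ends, in_ends = [0], [0]
    num_vertices = 1
    for _ in range(steps):
        u = rng.choice(out_ends)            # Pr[u] = out-deg(u) / |E|
        v = rng.choice(in_ends)             # Pr[v] = in-deg(v) / |E|
        keep_u = rng.random() < p1          # first coin: heads keeps u as the source
        keep_v = rng.random() < p2          # second coin: heads keeps v as the target
        if not (keep_u and keep_v):
            new = num_vertices              # at most one new node per step
            num_vertices += 1
            if not keep_u:
                u = new
            if not keep_v:
                v = new                     # tails, tails gives a self-loop on `new`
        out_ends.append(u)
        in_ends.append(v)
    return num_vertices, Counter(out_ends), Counter(in_ends)

num_vertices, out_degree, in_degree = evolve_directed(20000, p1=0.6, p2=0.4)
print(num_vertices, max(out_degree.values()), max(in_degree.values()))
```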

Page 27: SDSC, skitter (July 1998)

Massive Graphs Random graphs

Similarities: Adding one (random) edge at a time.

Differences:

Random graphs <-- almost regular.

Massive graphs <-- uneven degrees. Correlations.

Page 28: SDSC, skitter (July 1998)

The advantages of power law models

• Approximating real data graphs.

• Possible to analyze rigorously—discover implicit structure of massive graphs

• Models for generating network topologies

Page 29: SDSC, skitter (July 1998)

• Erdős and Rényi’s seminal papers.

Methods:

• Martingales.

• Concentration bounds.

• Molloy and Reed’s results on random graphs with given degree sequences.

Page 30: SDSC, skitter (July 1998)

A JAVA generation/simulation of power graphs can be found at http://math.ucsd.edu/~llu

Future directions

The evolution of power graphs concerning

- diameters of connected components (Lu’s thesis)

- frequency of occurrences of certain subgraphs

- power law of eigenvalues

- scaling behavior of power law graphs

- “signatures” in graphs to distinguish models