A Random-Surfer Web-Graph Model. Avrim Blum, Hubert Chan, Mugizi Rwebangira, Carnegie Mellon University


1

A Random-Surfer Web-Graph Model

Avrim Blum, Hubert Chan, Mugizi Rwebangira

Carnegie Mellon University

2

The Web as a Graph

Consider the World Wide Web as a graph, with web pages as nodes and links between pages as edges.

Experiments suggest that the degree distribution of the Web-Graph follows a power law [FFF99].

[Figure: a small example web graph whose nodes are the pages index.html, links.html, resume.html, and http://cnn.com.]

3

Power Law

The distribution of a quantity X follows a power law if

$\Pr(X = k) = C k^{-\alpha}$

Taking the logarithm of both sides:

$\log \Pr(X = k) = \log C - \alpha \log k$

Thus a log-log plot of a power-law distribution is a straight line with slope $-\alpha$.
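The straight-line claim can be checked numerically. The sketch below is illustrative only: the values of C and alpha are arbitrary assumptions (not taken from the slides), and the helper name `loglog_slope` is hypothetical. It fits a least-squares line to the points (log k, log Pr(X=k)) and recovers a slope of -alpha.

```python
import math

def loglog_slope(ks, probs):
    """Least-squares slope of log(prob) against log(k)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(q) for q in probs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Assumed, illustrative parameters; the values are not normalized to sum to 1.
C, alpha = 0.5, 2.1
ks = range(1, 101)
probs = [C * k ** (-alpha) for k in ks]
slope = loglog_slope(ks, probs)  # numerically equal to -alpha
```

On measured degree data the points only approximately follow a line, so the fitted slope is an estimate of -alpha rather than an exact value.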

4

Previous Work

Barabási and Albert proposed the Preferential Attachment model [BA99]:

Each new node connects to the existing nodes with a probability proportional to their degree.

It is known that Preferential Attachment gives a power-law distribution. [Mitzenmacher, Cooper & Frieze 03, KRRSTU00]

Other models proposed include the “copying model.” [KRRSTU00]

5

Motivating Questions

Why would a new node connect to nodes of high degree?
- Are high degree nodes more attractive?
- Or are there other explanations?

How does a new node find out what the high degree nodes are?

Motivating Observation:
• Suppose each page has a small probability p of being interesting.
• Suppose a user does an (undirected) random walk until they find an interesting page.
• If p is small then this is the same as preferential attachment.
• What about other processes and directed graphs?

6

Directed 1-Step Random Surfer

At time 1, we start with a single node with a self-loop.

At time t, a new node arrives and an existing node is chosen uniformly at random. With probability p the new node connects to this node; with probability 1-p it connects to a random out-neighbor of that node.

(Extension: Repeat process k times for each new node to get out-degree k)

Note: This model is just another way of stating the directed preferential attachment model.
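A minimal simulation sketch of this process for out-degree k=1, under stated assumptions: nodes are numbered from 0, `out[v]` holds the single out-neighbor of node v, and the function names are hypothetical. With k=1, "a random out-neighbor" is simply the one out-neighbor of the chosen node.

```python
import random
from collections import Counter

def one_step_surfer(n, p, rng):
    """Grow a graph of n nodes; return out[v], the out-neighbor of node v."""
    out = [0]  # node 0 starts with a self-loop
    for new in range(1, n):
        start = rng.randrange(new)   # existing node chosen uniformly at random
        if rng.random() < p:
            out.append(start)        # connect to the chosen node
        else:
            out.append(out[start])   # connect to its out-neighbor
    return out

def in_degrees(out):
    """In-degree of each node that has at least one incoming link."""
    return Counter(out)

out = one_step_surfer(10_000, 0.5, random.Random(0))
top = in_degrees(out).most_common(3)  # the three highest in-degree nodes
```

The self-loop at node 0 is counted as an incoming link, which matches the convention that the walk from the first node stays at the first node.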

7

Directed 1-step Random Surfer, p=.5

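[Figure: the graph at T=1, T=2, T=3 and T=4. At T=3 the new node links to the first node with probability (½)(½) + (½)(½) + (½)(½) = ¾ and to the second node with probability ¼. At T=4, shown for the case where both earlier nodes point to the first node, the new node links to the first node with probability (½)(⅓) + (½)(⅓) + (½)(⅓) + (½)(⅓) = ⅔ and to each of the other two nodes with probability ⅙.]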
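The attachment probabilities on this slide can be reproduced with exact arithmetic. This is a sketch under assumptions: nodes are numbered from 0, `out[v]` is the single out-neighbor of node v (k=1), and `attach_probs` is a hypothetical helper name.

```python
from fractions import Fraction

def attach_probs(out, p):
    """Probability that the next node links to each existing node.

    Each existing node is picked as the start with probability 1/t; the new
    node links to the start with probability p and to the start's
    out-neighbor with probability 1 - p.
    """
    t = len(out)
    probs = [Fraction(0)] * t
    for start in range(t):
        probs[start] += Fraction(1, t) * p
        probs[out[start]] += Fraction(1, t) * (1 - p)
    return probs

half = Fraction(1, 2)
two_nodes = attach_probs([0, 0], half)       # node 1 points to node 0
three_nodes = attach_probs([0, 0, 0], half)  # nodes 1 and 2 both point to node 0
```

Here `two_nodes` is [3/4, 1/4] and `three_nodes` is [2/3, 1/6, 1/6], matching the fractions in the figure.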

8

Directed Coin Flipping model

1. At time 1, we start with a single node with a self-loop.

2. At time t, we choose a node uniformly at random.

3. We then flip a coin of bias p.

4. If the coin comes up heads, we connect to the current node.

5. Else we walk to a random neighbor and go to step 3.

“each page has equal probability p of being interesting to us”
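The five steps above can be sketched as a simulation for out-degree 1. Assumptions: nodes are numbered from 0, `out[v]` is the single out-neighbor of node v, and the function name is hypothetical. With out-degree 1, "walk to a random neighbor" means following the one outgoing link.

```python
import random

def coin_flipping_graph(n, p, rng):
    """Grow a graph of n nodes with the directed coin-flipping rule."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1] so that the walk terminates")
    out = [0]  # step 1: a single node with a self-loop
    for new in range(1, n):
        current = rng.randrange(new)  # step 2: uniform random starting node
        while rng.random() >= p:      # step 3: tails, so keep walking
            current = out[current]    # step 5: follow the outgoing link
        out.append(current)           # step 4: heads, so connect here
    return out

graph = coin_flipping_graph(10_000, 0.5, random.Random(0))
```

Unlike the 1-step model, the walk here may take any number of steps, so a new node can attach to a distant ancestor of its starting node.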

9

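[Figure: a new node joins the graph. The walk begins at a random starting node. Coin tosses 1 and 2 come up tails, so the walk moves on each time; coin toss 3 comes up heads, so the new node connects to the node reached.]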

10

Is Directed Coin-Flipping Power-lawed?

We don’t know … but we do have some partial results ...

Note: unlike for undirected graphs, the case p → 0 is not so interesting since then you just get a star.

11

Virtual Degree

Definition: Let $l_i(u)$ be the number of level-$i$ descendants of $u$. Let $\beta_i$ ($i \ge 1$) be a sequence of real numbers with $\beta_1 = 1$.

Then $v(u) = 1 + \sum_{i \ge 1} \beta_i \, l_i(u)$

12

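Virtual Degree

$v(u) = 1 + \beta_1 l_1(u) + \beta_2 l_2(u) + \beta_3 l_3(u) + \beta_4 l_4(u) + \dots$

[Figure: a node u with 2 level-1 descendants and 4 level-2 descendants.]

For this example, $v(u) = 1 + \beta_1 (2) + \beta_2 (4) + \beta_3 (0) + \beta_4 (0) + \dots$

Easy observation: If we set $\beta_i = (1-p)^i$ then the expected increase in degree(u) is proportional to v(u).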
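A short sketch of how the virtual degree could be computed for an out-degree-1 graph. Assumptions: `out[v]` is the out-neighbor of node v, nodes are numbered from 0, a level-i descendant of u is a node whose i-th link-step reaches u, and the helper names are hypothetical. The example graph mirrors the slide: u has 2 level-1 and 4 level-2 descendants.

```python
from collections import defaultdict
from fractions import Fraction

def level_counts(out, u):
    """Return [l_1(u), l_2(u), ...] up to the deepest non-empty level."""
    children = defaultdict(list)
    for v, parent in enumerate(out):
        if v != parent:               # ignore the first node's self-loop
            children[parent].append(v)
    counts = []
    frontier = children[u]
    while frontier:
        counts.append(len(frontier))
        frontier = [w for v in frontier for w in children[v]]
    return counts

def virtual_degree(counts, betas):
    """v(u) = 1 + sum_i beta_i * l_i(u); betas must cover every level."""
    return 1 + sum(b * l for b, l in zip(betas, counts))

# Node 0 is u; nodes 1, 2 link to u; nodes 3, 4 link to 1; nodes 5, 6 link to 2.
example = [0, 0, 0, 1, 1, 2, 2]
counts = level_counts(example, 0)                           # [2, 4]
p = Fraction(1, 2)
betas = [(1 - p) ** i for i in range(1, len(counts) + 1)]   # beta_i = (1-p)^i
v = virtual_degree(counts, betas)                           # 1 + (1/2)(2) + (1/4)(4) = 3
```

Any other coefficient sequence can be passed as `betas`, as long as it has at least as many entries as there are non-empty levels.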

13

Virtual Degree

Theorem: There always exist $\beta_i$ such that
1. For $i \ge 1$, $|\beta_i| \le 1$.
2. As $i \to \infty$, $\beta_i \to 0$ exponentially.
3. The expected increase in $v(u)$ is proportional to $v(u)$.

Let $v_t(u)$ be the virtual degree of node $u$ at time $t$ and $t_u$ be the time when node $u$ first appears.

Theorem: For any node $u$ and time $t \ge t_u$, $E[v_t(u)] = \Theta((t/t_u)^p)$

Recurrence: $\beta_1 = 1$, $\beta_2 = p$, $\beta_{i+1} = \beta_i - (1-p)\,\beta_{i-1}$

E.g., for $p = 3/4$, $\beta_i = 1, 3/4, 1/2, 5/16, 3/16, 7/64, \dots$
for $p = 1/2$, $\beta_i = 1, 1/2, 0, -1/4, -1/4, -1/8, 0, 1/16, \dots$
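The example sequences can be regenerated from the recurrence, read here as beta_1 = 1, beta_2 = p, beta_{i+1} = beta_i - (1-p) beta_{i-1}; that reading reproduces both lists of values given on the slide. The function name `beta_sequence` is a hypothetical choice for this sketch.

```python
from fractions import Fraction

def beta_sequence(p, n):
    """First n terms of beta_1 = 1, beta_2 = p,
    beta_{i+1} = beta_i - (1 - p) * beta_{i-1}, as exact fractions."""
    p = Fraction(p)
    seq = [Fraction(1), p]
    while len(seq) < n:
        seq.append(seq[-1] - (1 - p) * seq[-2])
    return seq[:n]

three_quarters = beta_sequence(Fraction(3, 4), 6)  # 1, 3/4, 1/2, 5/16, 3/16, 7/64
one_half = beta_sequence(Fraction(1, 2), 8)        # 1, 1/2, 0, -1/4, -1/4, -1/8, 0, 1/16
```

Exact fractions are used so the computed terms can be compared directly against the values on the slide without rounding error.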

14

Virtual Degree, contd

Let $v_t(u)$ be the virtual degree of node $u$ at time $t$ and $t_u$ be the time when node $u$ first appears.

Theorem: For any node $u$ and time $t \ge t_u$, $E[v_t(u)] = \Theta((t/t_u)^p)$

We also have some weak concentration bounds. Unfortunately, they are not strong enough: if they could be strengthened, we would have a proof that virtual degrees (not just their expectations) follow a power law.

15

Actual Degree

We can also obtain lower bounds on the actual degrees:

Theorem: For any node $u$ and time $t \ge t_u$, $E[l_1(u)] \ge \Omega((t/t_u)^{p(1-p)})$

16

Experiments

• Random graphs of n=100,000 nodes

• Compute statistics averaged over 100 runs.

• k=1 (every node has out-degree 1)
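A sketch of how such an experiment could be tabulated, using the uniform-random-connections baseline with out-degree 1. This is not the authors' experimental code: the function names are hypothetical, and the call at the end uses much smaller n and fewer runs than the slides so that it finishes quickly.

```python
import random
from collections import Counter

def uniform_graph(n, rng):
    """Baseline: each new node links to a uniformly random existing node."""
    out = [0]  # node 0 starts with a self-loop
    for new in range(1, n):
        out.append(rng.randrange(new))
    return out

def mean_degree_histogram(n, runs, seed=0):
    """Average, over runs, of the number of nodes having each in-degree."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(runs):
        indeg = Counter(uniform_graph(n, rng))
        for v in range(n):
            totals[indeg[v]] += 1  # nodes with no incoming links count as 0
    return {d: c / runs for d, c in sorted(totals.items())}

hist = mean_degree_histogram(n=2000, runs=10)
```

Plotting log(count) against log(degree) for `hist` gives the kind of log-log plot used to judge whether a degree distribution looks like a power law; swapping `uniform_graph` for another generator tabulates a different model.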

17

Uniform random connections

18

Directed 1-Step Random Surfer, p=3/4

19

Directed 1-Step Random Surfer, p=1/2

20

Directed 1-Step Random Surfer, p=1/4

21

Directed Coin Flipping, p=1/2

22

Directed Coin Flipping, p=1/4

23

Undirected Coin Flipping, p=1/2

24

Undirected Coin Flipping, p=0.05

25

Conclusions

• Directed random walk models appear to generate power laws in experiments, and we have partial theoretical results to support this.

• Power laws can naturally emerge, even if all nodes have the same intrinsic “attractiveness” (and even in the absence of a “role model,” as in the copying model).

26

Open questions

• Can we prove that the degrees in the directed coin-flipping model indeed follow a power law?

• Can we analyze the degree distribution for the undirected coin-flipping model with p=1/2?

• Suppose page i has “interestingness” $p_i$. Can we analyze the degree as a function of $t$, $i$ and $p_i$?

27

Questions?