1
A Random-Surfer Web-Graph Model
Avrim Blum, Hubert Chan, Mugizi Rwebangira
Carnegie Mellon University
2
The Web as a Graph
Consider the World Wide Web as a graph, with web pages as nodes and links between pages as edges.
Experiments suggest that the degree distribution of the Web-Graph follows a power law [FFF99].
[Figure: example web graph whose nodes are index.html, links.html, resume.html and http://cnn.com]
3
Power Law
The distribution of a quantity X follows a power law if
Pr(X = k) = C k^{-α}
Taking the logarithm of both sides:
log Pr(X = k) = log C - α log k
Thus a log-log plot of a power-law distribution is a straight line with slope -α.
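As a small check of the log-log argument, the sketch below builds an exact power law and fits a least-squares line to its log-log points. The constants C = 0.5 and α = 2.1 and the helper name loglog_slope are illustrative assumptions, not values from the slides.

```python
import math

def loglog_slope(ks, probs):
    """Least-squares slope of log(probs) plotted against log(ks)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(q) for q in probs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Illustrative constants (assumptions, not taken from the slides).
C, alpha = 0.5, 2.1
ks = list(range(1, 51))
probs = [C * k ** (-alpha) for k in ks]

slope = loglog_slope(ks, probs)
print(slope)  # the fitted slope should match -alpha up to rounding error
```

On real degree data the points are noisy, so a fitted slope is only an estimate of -α.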
4
Previous Work
Barabási and Albert proposed the Preferential Attachment model [BA99]:
Each new node connects to the existing nodes with a probability proportional to their degree.
It is known that Preferential Attachment gives a power-law distribution. [Mitzenmacher, Cooper & Frieze 03, KRRSTU00]
Other models proposed include the “copying model.” [KRRSTU00]
5
Motivating Questions
Why would a new node connect to nodes of high degree?
- Are high degree nodes more attractive?
- Or are there other explanations?
How does a new node find out what the high degree nodes are?
Motivating Observation:
• Suppose each page has a small probability p of being interesting.
• Suppose a user does an (undirected) random walk until they find an interesting page.
• If p is small, the walk is long and its endpoint is distributed roughly in proportion to degree, so this is the same as preferential attachment.
• What about other processes and directed graphs?
6
Directed 1-Step Random Surfer
At time 1, we start with a single node with a self-loop.
At time t, a node is chosen uniformly at random. With probability p the new node connects to this node; with probability 1-p it connects to a random out-neighbor of that node.
(Extension: Repeat process k times for each new node to get out-degree k)
Note: This model is just another way of stating the directed preferential attachment model.
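The process on this slide can be sketched as a short simulation for the out-degree-1 case. This is a minimal sketch under stated assumptions: nodes are numbered from 0 rather than from time 1, the self-loop is counted as an in-edge of the first node, and the function name grow_one_step is hypothetical.

```python
import random

def grow_one_step(n, p, seed=0):
    """Grow an n-node directed 1-step random-surfer graph with out-degree 1.

    Returns (out, indeg): out[v] is the single out-neighbor of v and
    indeg[v] is the in-degree of v. Node 0 starts with a self-loop.
    """
    rng = random.Random(seed)
    out = [0]     # node 0 points to itself
    indeg = [1]   # assumption: the self-loop counts as one in-edge
    for new in range(1, n):
        chosen = rng.randrange(new)          # uniform over existing nodes
        if rng.random() < p:
            target = chosen                  # connect to the chosen node
        else:
            target = out[chosen]             # connect to its out-neighbor
        out.append(target)
        indeg.append(0)
        indeg[target] += 1
    return out, indeg

out, indeg = grow_one_step(1000, 0.5)
print(max(indeg), sum(indeg))
```

Repeating the inner step k times per new node would give the out-degree-k extension mentioned above.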
7
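Directed 1-step Random Surfer, p=.5
[Figure: the graph at T=1, T=2, T=3 and T=4, with the probability that the new node attaches to each existing node]
T=3: attachment probabilities ¾ and ¼, where ¾ = (½)(½) + (½)(½) + (½)(½)
T=4: attachment probabilities 2/3, 1/6 and 1/6, where 2/3 = (½)(⅓) + (½)(⅓) + (½)(⅓) + (½)(⅓)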
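The attachment probabilities for p = .5 can be reproduced with exact fractions. The sketch below assumes out-degree 1, 0-based node numbering, and, for the three-node case, that both later nodes link to the first node; the function name attach_probabilities is hypothetical.

```python
from fractions import Fraction

def attach_probabilities(out, p):
    """Exact probability that the next node attaches to each existing node.

    out[v] is the single out-neighbor of node v (node 0 has a self-loop).
    """
    n = len(out)
    probs = [Fraction(0)] * n
    for chosen in range(n):
        # The node is chosen with probability 1/n, then kept with probability p ...
        probs[chosen] += Fraction(1, n) * p
        # ... or replaced by its out-neighbor with probability 1 - p.
        probs[out[chosen]] += Fraction(1, n) * (1 - p)
    return probs

half = Fraction(1, 2)
two_nodes = attach_probabilities([0, 0], half)       # graph before the third node arrives
three_nodes = attach_probabilities([0, 0, 0], half)  # assumed graph before the fourth node arrives
print(two_nodes, three_nodes)
```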
8
Directed Coin Flipping model
1. At time 1, we start with a single node with a self-loop.
2. At time t, we choose a node uniformly at random.
3. We then flip a coin of bias p.
4. If the coin comes up heads, we connect to the current node.
5. Otherwise, we walk to a random out-neighbor of the current node and return to step 3.
“each page has equal probability p of being interesting to us”
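Steps 1 to 5 can be sketched as a simulation for the out-degree-1 case, where the walk follows each node's single out-link. Assumptions: nodes are numbered from 0, the self-loop counts as an in-edge of the first node, p must be positive so the walk stops, and the function name grow_coin_flipping is hypothetical.

```python
import random

def grow_coin_flipping(n, p, seed=0):
    """Grow an n-node directed coin-flipping graph with out-degree 1.

    Returns (out, indeg) with out[v] the out-neighbor of v.
    """
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1 so the walk terminates")
    rng = random.Random(seed)
    out = [0]     # step 1: a single node with a self-loop
    indeg = [1]   # assumption: the self-loop counts as one in-edge
    for new in range(1, n):
        current = rng.randrange(new)      # step 2: uniform starting node
        while rng.random() >= p:          # step 3: tails with probability 1 - p
            current = out[current]        # step 5: walk along the out-link
        out.append(current)               # step 4: heads, connect here
        indeg.append(0)
        indeg[current] += 1
    return out, indeg

out, indeg = grow_coin_flipping(1000, 0.5)
print(max(indeg))
```

At the first node the walk stays in place because of the self-loop, which is why small p pushes most edges toward that node.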
10
Is Directed Coin-Flipping Power-lawed?
We don’t know … but we do have some partial results ...
Note: unlike for undirected graphs, the case p → 0 is not so interesting since then you just get a star.
11
Virtual Degree
Definition: Let l_i(u) be the number of level-i descendants of u. Let β_i (i ≥ 1) be a sequence of real numbers with β_1 = 1.
Then v(u) = 1 + ∑_{i ≥ 1} β_i l_i(u)
12
Virtual Degree
v(u) = 1 + β_1 l_1(u) + β_2 l_2(u) + β_3 l_3(u) + β_4 l_4(u) + ...
[Figure: a node u with 2 level-1 descendants and 4 level-2 descendants]
Example: v(u) = 1 + β_1 (2) + β_2 (4) + β_3 (0) + β_4 (0) + ...
Easy observation: If we set β_i = (1-p)^i then the expected increase in degree(u) is proportional to v(u).
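The virtual degree can be computed directly from the out-links. The sketch below assumes out-degree 1, takes a level-i descendant of u to be a node that reaches u after following exactly i out-links, and uses a hypothetical 8-node tree chosen to match the counts 2 and 4 in the example; the function names are also hypothetical.

```python
from fractions import Fraction

def level_counts(out, u):
    """Return {i: l_i(u)}: how many nodes reach u after exactly i out-links."""
    counts = {}
    for v in range(len(out)):
        if v == u:
            continue
        depth, w = 0, v
        # Follow out-links until we hit u or get stuck at the self-loop.
        while w != u and out[w] != w:
            w = out[w]
            depth += 1
        if w == u:
            counts[depth] = counts.get(depth, 0) + 1
    return counts

def virtual_degree(out, u, beta):
    """v(u) = 1 + sum over i of beta(i) * l_i(u)."""
    return 1 + sum(beta(i) * c for i, c in level_counts(out, u).items())

# Hypothetical tree: node 1 has children 2, 3 and grandchildren 4, 5, 6, 7.
out = [0, 0, 1, 1, 2, 2, 3, 3]
p = Fraction(1, 2)
counts = level_counts(out, 1)
v = virtual_degree(out, 1, lambda i: (1 - p) ** i)
print(counts, v)
```

With β_i = (1-p)^i and p = ½ this gives 1 + 2(½) + 4(¼) = 3 for the example node.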
13
Virtual Degree
Theorem: There always exist β_i such that
1. For i ≥ 1, |β_i| ≤ 1.
2. As i → ∞, β_i → 0 exponentially.
3. The expected increase in v(u) is proportional to v(u).
Let v_t(u) be the virtual degree of node u at time t and t_u be the time when node u first appears.
Theorem: For any node u and time t ≥ t_u, E[v_t(u)] = Θ((t/t_u)^p)
Recurrence: β_1 = 1, β_2 = p, β_{i+1} = β_i - (1-p) β_{i-1}
E.g., for p=¾, β_i = 1, 3/4, 1/2, 5/16, 3/16, 7/64, ...
for p=½, β_i = 1, 1/2, 0, -1/4, -1/4, -1/8, 0, 1/16, ...
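The recurrence β_1 = 1, β_2 = p, β_{i+1} = β_i - (1-p) β_{i-1} is easy to evaluate exactly. The sketch below uses Python's Fraction type to reproduce the two example sequences; the function name betas is hypothetical.

```python
from fractions import Fraction

def betas(p, count):
    """First `count` terms of beta_1 = 1, beta_2 = p,
    beta_{i+1} = beta_i - (1 - p) * beta_{i-1}."""
    seq = [Fraction(1), Fraction(p)]
    while len(seq) < count:
        seq.append(seq[-1] - (1 - p) * seq[-2])
    return seq[:count]

three_quarters = betas(Fraction(3, 4), 6)
one_half = betas(Fraction(1, 2), 8)
print([str(b) for b in three_quarters])
print([str(b) for b in one_half])
```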
14
Virtual Degree, contd
Let v_t(u) be the virtual degree of node u at time t and t_u be the time when node u first appears.
Theorem: For any node u and time t ≥ t_u, E[v_t(u)] = Θ((t/t_u)^p)
We also have some weak concentration bounds. Unfortunately they are not strong enough: if they could be strengthened, we would have a proof that the virtual degrees (not just their expectations) follow a power law.
15
Actual Degree
We can also obtain lower bounds on the actual degrees:
Theorem: For any node u and time t ≥ t_u, E[l_1(u)] ≥ Ω((t/t_u)^{p(1-p)})
16
Experiments
• Random graphs of n=100,000 nodes
• Compute statistics averaged over 100 runs.
• k=1 (every node has out-degree 1)
25
Conclusions
• Directed random walk models appear to generate power laws in experiments, and we have partial theoretical results supporting this.
• Power laws can emerge naturally even if all nodes have the same intrinsic “attractiveness” (and even in the absence of a “role model” as in the copying model).
26
Open questions
• Can we prove that the degrees in the directed coin-flipping model indeed follow a power law?
• Can we analyze the degree distribution for the undirected coin-flipping model with p=1/2?
• Suppose page i has “interestingness” p_i. Can we analyze the degree as a function of t, i and p_i?