Estimating PageRank on Graph Streams


Atish Das Sarma (Georgia Tech); Sreenivas Gollapudi, Rina Panigrahy (Microsoft Research)

PageRank

• PageRank – Determine Ranking of nodes in graphs

• Typically large graphs - WWW, Social Networks

• Run daily by commercial search engines

PageRank computation

[Figure: example graph with nodes u, a, b, c]

PageRank Computation

Our Approach: No Matrix-Vector Multiplication!

[Figure: example graph with nodes u, a, b, c]
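For contrast, the standard computation that this approach avoids can be written as a power iteration. This is the textbook formulation rather than anything taken from the slides: the reset probability $\epsilon$, the row-stochastic transition matrix $P$, and the iterate $\pi^{(t)}$ are notation assumed here, and every node is assumed to have at least one out-edge.

```latex
\pi^{(t+1)} = \frac{\epsilon}{n}\,\mathbf{1} + (1-\epsilon)\,P^{\top}\pi^{(t)},
\qquad
P_{uv} =
\begin{cases}
  1/\deg^{+}(u) & \text{if } (u,v) \in E,\\
  0 & \text{otherwise.}
\end{cases}
```

Each iteration is one matrix-vector multiplication that touches every edge and keeps a vector with one entry per node.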

Our Result

Many Random Walk Samples Efficiently.

Approximate PageRank
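A minimal in-memory sketch of why random-walk samples are enough to approximate PageRank. It assumes the standard random-surfer model with a reset probability `eps`, which the slides do not state. The function name `estimate_pagerank`, the adjacency dictionary `adj`, and all parameters are illustrative, and the sketch ignores the streaming constraints entirely.

```python
import random
from collections import Counter

def estimate_pagerank(adj, num_walks=20000, eps=0.15, seed=0):
    """Estimate PageRank from the endpoints of random walks.

    Assumption: random-surfer model. A walk starts at a uniform node and,
    before each step, stops with probability eps; otherwise it moves to a
    uniform out-neighbor (a node without out-edges jumps to a uniform node).
    The endpoint of such a walk is distributed according to PageRank.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    hits = Counter()
    for _ in range(num_walks):
        v = rng.choice(nodes)
        while rng.random() > eps:          # continue with probability 1 - eps
            out = adj[v]
            v = rng.choice(out) if out else rng.choice(nodes)
        hits[v] += 1
    # Fraction of walks ending at each node is the PageRank estimate.
    return {v: hits[v] / num_walks for v in nodes}

# Hypothetical toy graph: a, b, c all link to u, and u links back to a.
toy = {"u": ["a"], "a": ["u"], "b": ["u"], "c": ["u"]}
pagerank = estimate_pagerank(toy)
```

No vector over all nodes is multiplied by a matrix; the estimate for a node only needs a count of how many sampled walks end there.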

Other results from Random Walks

We can estimate: Mixing Time, Conductance

Using Streams

[Figure: graph G with node u]

Streaming

e1, e2, e3, e4, e5, e6, e7, ...

Input is a “stream”

Small RAM working memory

Few Passes

Frequency moments, quantiles

Graphs: Edges, arbitrary order


Related Work

• Sparsifiers (Benczur-Karger 96, Spielman-Teng 01, Spielman-Srivastava 08)
– Given an undirected graph, produces a sparse one
– Approximately preserves x’Lx
– Can be used to compute sparse cuts

• Streaming version of BK96 (Ahn, Guha 09)
– Sparse cuts in 1 pass and $\tilde{O}(n)$ space

• Accelerated PageRank (McSherry 08)
– Heuristics

Key Idea

One walk from u of length l, efficiently

Later extend to many walks

[Figure: walk of length l from u ending at v_l]

Single Random Walk - Naive Algo.

One step with every pass!

Constant space, l passes
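The naive algorithm can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `stream_edges` is a hypothetical callable that returns a fresh iterator over `(u, v)` edges, modelling one pass over the stream, and the next step is picked by size-1 reservoir sampling over the out-edges of the current node.

```python
import random

def walk_one_step_per_pass(stream_edges, start, length, seed=0):
    """Random walk of `length` steps, using one pass over the stream per step.

    Working memory is the current node, one candidate next node and a
    counter. The returned path is kept only to show the result.
    """
    rng = random.Random(seed)
    current = start
    path = [current]
    for _ in range(length):
        chosen, seen = None, 0
        for u, v in stream_edges():            # one full pass
            if u == current:
                seen += 1
                # Reservoir sampling of size 1: keep the j-th out-edge w.p. 1/j.
                if rng.randrange(seen) == 0:
                    chosen = v
        if chosen is None:                     # no out-edge: the walk ends early
            break
        current = chosen
        path.append(current)
    return path

# Hypothetical edge stream: a directed 4-cycle.
edges = [("u", "a"), ("a", "b"), ("b", "c"), ("c", "u")]
walk = walk_one_step_per_pass(lambda: iter(edges), "u", 5)
```

The number of calls to `stream_edges` equals the number of steps taken, which is the cost the rest of the talk sets out to reduce.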

Second Naive Algo

Single pass: sample sufficient edges!

If [...], then sample 2 out-edges from each node (store order).
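A sketch of the single-pass idea, generalizing the "2 out-edges" case to `k` samples per node. The functions `sample_out_edges` and `walk_from_samples` are illustrative names, and the sketch assumes `k` is at least the walk length so that no node runs out of stored samples. Each slot is an independent size-1 reservoir, and the i-th visit to a node consumes its i-th stored sample, which is one reading of "store order".

```python
import random
from collections import defaultdict

def sample_out_edges(edge_stream, k, seed=0):
    """One pass: keep k independent uniform samples of each node's out-edges."""
    rng = random.Random(seed)
    degree = defaultdict(int)
    slots = {}
    for u, v in edge_stream:
        degree[u] += 1
        if u not in slots:
            slots[u] = [v] * k                 # first out-edge fills every slot
        else:
            for i in range(k):
                # Each slot independently keeps the j-th out-edge w.p. 1/j.
                if rng.randrange(degree[u]) == 0:
                    slots[u][i] = v
    return slots

def walk_from_samples(slots, start, length):
    """Replay a walk offline; assumes k >= length samples were stored."""
    visits = defaultdict(int)
    current, path = start, [start]
    for _ in range(length):
        if current not in slots:               # node has no out-edges
            break
        i = visits[current]                    # i-th visit uses the i-th sample
        visits[current] += 1
        current = slots[current][i]
        path.append(current)
    return path

# Hypothetical edge stream: a directed 4-cycle.
edges = [("u", "a"), ("a", "b"), ("b", "c"), ("c", "u")]
slots = sample_out_edges(iter(edges), k=3)
walk = walk_from_samples(slots, "u", 3)
```

The price of the single pass is memory: `k` stored neighbors for every node, compared with constant space for the one-step-per-pass version.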

Comparison

Naive (single walk):

Our Result:

In fact, [...] walks!

Automatically:

Insight: Merge Short Walks

Sample a fraction of the nodes (centers)

[...] passes: length [...] walks

Merge and extend short walks!

Two problems:

• End up at a node a second time

• End up at a non-sampled node

[Figure: short walks w merged into one walk from s]
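The merging step can be sketched in memory as follows. This is an illustration under assumptions, not the streaming algorithm itself: the graph is held as an adjacency dictionary `adj`, the short walks are precomputed by the hypothetical helper `short_walks`, and `merge_until_stuck` simply reports when one of the two problems above occurs instead of resolving it. Each short walk is used at most once so that the merged walk does not reuse randomness.

```python
import random

def short_walks(adj, centers, w, rng):
    """One short walk of length w from every sampled center.

    In the streaming setting all of these walks would advance together,
    one step per pass; here the graph is simply held in memory.
    """
    walks = {}
    for c in centers:
        path = [c]
        for _ in range(w):
            out = adj[path[-1]]
            if not out:
                break
            path.append(rng.choice(out))
        walks[c] = path
    return walks

def merge_until_stuck(walks, start, length):
    """Concatenate short walks beginning at `start`.

    Returns (path, stuck). `stuck` is True if the walk stops early at a
    node that is not a center, or at a center whose walk was already used.
    """
    used = set()
    path = [start]
    while len(path) - 1 < length:
        node = path[-1]
        if node not in walks or node in used:
            return path, True
        used.add(node)
        remaining = length - (len(path) - 1)
        path.extend(walks[node][1:][:remaining])
    return path, False

# Hypothetical example: a directed 4-cycle with centers s and b.
adj = {"s": ["a"], "a": ["b"], "b": ["c"], "c": ["s"]}
walks = short_walks(adj, ["s", "b"], w=2, rng=random.Random(0))
path, stuck = merge_until_stuck(walks, "s", 10)
```

In this example the walk uses the short walks of `s` and `b`, returns to `s`, and is stuck because the short walk from `s` has already been consumed.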

Stuck Nodes

Sample an edge from the stuck node.

Again. And again...

Slow?

If new nodes, good in [...] passes!

Stuck nodes

Stuck on the same nodes?

Sample s edges from each

s progress OR new node!

Must include previously seen centers in the set

Summary


• Perform short walks from sampled centers

• Concatenate walks until stuck

• Sample edges from stuck

• Make local progress until new node

• Local progress = s

• New node: center with prob [...]

• Amortized progress [...] every pass

Summary


Total number of passes :

Total Space :

Summary


Set

Number of passes =

Space =

Many Walks

Naive Space Bound:

Observation: Many short walks are not used in a single RW.


We show:

$\tilde{O}(n)$ for $K \le n/l$

Many Random Walks

• $r_i$: probability that node $i$’s short walk is used in a single RW.

• If $r_i$ is known: save a lot of space!

• Perform K random walks

• Total number of short walks required is about [...]

• Don’t know $r_i$. But can estimate.

Estimating $r_i$

• Run K = (log n) walks of length l

• Gives a crude estimate of $r_i$

• Sufficient to double K

• Continue doubling K

• Gives K walks in space [...]

• Passes [...]

Distributions

[...] samples

Distribution:

Space:

Passes:

Mixing Time, Conductance

• Undirected graphs: compare the distribution with the steady state.

• Estimating the difference: [...] samples [Batu et al. ’01] – approximate mixing time.

• Directed graphs: run till the distribution “stabilizes”: [...] samples.

• Conductance:

• Recall space for walks: $\tilde{O}(n)$ for $K \le n/l$
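The comparison with the steady state for undirected graphs can be sketched as below. On an undirected graph the stationary distribution of a random walk is proportional to degree, so it can be computed with one pass over the edges. The distance used here is the plain plug-in L1 estimate from walk endpoints, which is a simplification and not the sample-efficient tester of Batu et al. cited on the slide; the function names are illustrative.

```python
from collections import Counter

def stationary_distribution(edge_stream):
    """Steady state on an undirected graph: pi(v) = deg(v) / (2m).

    Uses one pass over the edges and one counter per node.
    """
    deg = Counter()
    for u, v in edge_stream:
        deg[u] += 1
        deg[v] += 1
    total = sum(deg.values())                  # equals 2m
    return {v: d / total for v, d in deg.items()}

def l1_distance(endpoints, pi):
    """L1 distance between the empirical endpoint distribution and pi."""
    counts = Counter(endpoints)
    n = len(endpoints)
    support = set(pi) | set(counts)
    return sum(abs(counts[v] / n - pi.get(v, 0.0)) for v in support)

# Hypothetical example: the path a - b - c and four sampled walk endpoints.
pi = stationary_distribution([("a", "b"), ("b", "c")])
distance = l1_distance(["a", "b", "b", "c"], pi)
```

Running walks of increasing length l and watching this distance fall below a threshold gives a rough picture of how the mixing time could be approximated from endpoint samples.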

Results recap

• [...]-Mixing Time for Undirected Graphs:

• Quadratic Approximation to Conductance

• PageRank to accuracy [...]

Space: $\tilde{O}(n)$

Open Questions?

• Improve passes for random walks. In particular, sub-linear space and constant passes.

• Graph Cuts and Graph Sparsification for directed graphs

• Better (streaming) algorithms for computing eigenvectors

Thank You!

Summary

• Perform short walks from sampled centers

• Concatenate walks until stuck

• Sample edges from stuck

• Make local progress until new node

• Local progress = s

• New node = nodes gives center

• Amortized, every pass -

Analysis

• Total number of passes:

• Total Space:

• Set

• Number of passes =

• Space =
