Fan Chung
University of California, San Diego
The Mathematics of PageRank
Fan Chung, Joint Math Meeting Jan 8,2007
Fan Chung, 01/08/08
fan
Fan Chung, 01/08/08
PageRank
Mathematics
Graph theoryCombinatorial algorithms
Probabilistic methods
Spectral graph theoryRandom walks
Concepts from spectral geometry
Fan Chung, 01/08/08
Outline of the talk• Motivation
• Local cuts and the `goodness’ of a cut
-- mathematical analysis using:eigenvectorsrandom walksPageRankheat kernel
• Define PageRank
• The “geometry” of a network
Fan Chung, 01/08/08
Search Engines:
ASK, AND IT WILL BE GIVEN TO YOU; SEEK, AND YOU WILL FIND; KNOCK, AND IT WILL BE OPENED TO YOU.- MATHEW 7:7
眾裡尋他千百度
驀然回首
那人卻在
燈火闌珊處
辛棄疾
青玉
案
Fan Chung, 01/08/08
Fan Chung, 01/08/08
What is PageRank?
Google’s answer:
Fan Chung, 01/08/08
What is PageRank?
PageRank is a well-defined operator
on any given graph, introduced by
Sergey Brin and Larry Page of Google
in a paper of 1998.
Fan Chung, 01/08/08
Graph models
Vertices
cities people telephones web pages genes
Edges
flights pairs of friends phone calls linkings regulatory effect
_____________________________
Fan Chung, 01/08/08
A graph G = (V,E)
vertex
edge
Fan Chung, 01/08/08
Information
relations
Information network
Fan Chung, 01/08/08
An induced subgraph of the collaboration graph with authors of Erdös number ≤ 2.
Fan Chung, 01/08/08
A subgraph of the Hollywood graph.Fan Chung, 01/08/08
A subgraph of a BGP graphFan Chung, 01/08/08
Yahoo IM graph Reid Andersen 2005
The Octopus graph
Fan Chung, 01/08/08
Graph Theory has 250 years of history.
Leonhard Euler 1707-1783
The Bridges of KönigsburgIs it possible to walk over every
bridge once and only once?Fan Chung, 01/08/08
Geometric graphs
Algebraic graphsGeneral graphs
Topological graphs
Fan Chung, 01/08/08
Massive dataMassive graphs
• WWW-graphs
• Call graphs
• Acquaintance graphs
• Graphs from any data a.base
protein interaction network Jawoong Jeong
Fan Chung, 01/08/08
Big and bigger graphs New directions.Fan Chung, 01/08/08
Many basic questions:
• Correlation among vertices?
• The `geometry’ of a network ?
• Quantitative analysis?
distance, flow, cut, …
eigenvalues, rapid mixing, …
• Local versus global?
Fan Chung, 01/08/08
A measure for the “importance” of a website
1 14 79 785( )x R x x x= + +
2 1002 3225 9883 30027( )x R x x x x= + + +
⋅⋅⋅ = ⋅⋅⋅
The “importance” of a website is proportional
to the sum of the importance of all the sites
that link to it.
Fan Chung, 01/08/08
Adjacency matrix of a graph
=
0 1 0 0 11 0 1 0 0
A 0 1 0 1 00 0 1 0 11 0 0 1 0
⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦
Example: Adjacency matrix of a 5-cycle
Fan Chung, 01/08/08
A solution for the “importance” of a website
1 14 79 785( )x x x xρ= + +
2 1002 3225 9883 30027( )x x x x xρ= + + +
⋅⋅⋅ = ⋅⋅⋅
Solve 1 2( , , , )nx x x= ⋅⋅⋅forj1
n
i ijj
x a xρ=
= ∑ x
x = ρ A x ij n nA a ×⎡ ⎤= ⎣ ⎦Fan Chung, 01/08/08
Graph models
directed graphs
weighted graphs
(undirected) graphs
Fan Chung, 01/08/08
Graph models
directed graphs
weighted graphs
(undirected) graphs
Fan Chung, 01/08/08
Graph models
directed graphs
weighted graphs
(undirected) graphs
1.51.2
3.3\
1.5
2
2.3
1
2.81.1
Fan Chung, 01/08/08
In a directed graph,
authority
there are two types of “importance”:
hub
Jon Kleinberg 1998
Fan Chung, 01/08/08
Two types of the “importance” of a website
Solve
1 2( , , , )nx x x= ⋅⋅⋅x
andx = r A y
Importance as Authorities :
1 2( , , , )my y y= ⋅⋅⋅yImportance as Hubs :
y = s A xT
x = rs A A xT
y = rs A A yT
Fan Chung, 01/08/08
Eigenvalue problem for n x n matrix:.
n ≈ 30 billion websites
Hard to compute eigenvalues
Even harder to compute eigenvectors
Fan Chung, 01/08/08
In the old days,
compute for a given (whole) graph.
In reality,
can only afford to compute “locally”.
Fan Chung, 01/08/08
The definition of PageRank given by
Brin and Page is based on
random walks.
Fan Chung, 01/08/08
Random walks in a graph.
1 ,( , )
0 .u
if u vdP u v
otherwise
⎧ ⎪= ⎨⎪ ⎩
:
G : a graph
P : transition probability matrix
2I PW +=
A lazy walk:
ud := the degree of u.
Fan Chung, 01/08/08
A (bored) surfer
• either surf a random webpage
with probability 1- α• or surf a linked webpage
with probability α
Original definition of PageRank
α : the jumping constant 1 1 1( , ,...., ) (1 )n n np pWα α= + −
Fan Chung, 01/08/08
Two equivalent ways to define PageRank pr(α,s)
(1) (1 )p s pWα α= + −s: the seed as a row vectorα : the jumping constant
s
Definition of personalized PageRank
Fan Chung, 01/08/08
Two equivalent ways to define PageRank p=pr(α,s)
(1) (1 )p s pWα α= + −
0(1 ) ( )t t
tp sWα α
∞
=
= −∑s =
(2)
1 1 1( , ,...., )n n n
Definition of PageRank
the (original) PageRank
some “seed”, e.g.,s = (1,0,....,0)personalized PageRank
Fan Chung, 01/08/08
Depends on the applications?
How good is PageRank as a measure of correlationship?
Rely on mathematics
both old and new.
Fan Chung, 01/08/08
Isoperimetric properties
“What is the shortest curve enclosing a unit area?”
In a graph G and an integer m,
what is the minimum cut disconnecting a subgraph of ≥ m vertices?
In a graph G, what is the minimum cut e(S,V-S)
so that e(S,V-S) is the smallest?_____Vol S
Fan Chung, 01/08/08
Two types of cuts:
• Vertex cut
• edge cut
How “good” is the cut?
Fan Chung, 01/08/08
e(S,V-S)_____Vol S
e(S,V-S)_____|S|
Vol S = Σ deg(v)v ε S |S| = Σ 1v ε S
S
V-S
Fan Chung, 01/08/08
The Cheeger constant
( , )minmin( , )G S
e S Shvol S vol S
=
The Cheeger constant for graphs
hG and its variations are sometimes called“conductance”, “isoperimetric number”, …
GΦ
GΦ
( )The volume of S is xx S
vol S d∈
= ∑
Fan Chung, 01/08/08
The Cheeger constant
( , )minmin( , )G S
e S Shvol S vol S
=
The Cheeger inequality
GΦ
2
22G
G λΦ
Φ ≥ ≥
The Cheeger inequality
: the first nontrivial eigenvalue of the xx(normalized) Laplacian.
Fan Chung, 01/08/08
The spectrum of a graph
Many ways to define the spectrum of a graph.
How are the How are the eigenvalueseigenvalues related to related to
properties of graphs?properties of graphs?
•Adjacency matrix
Fan Chung, 01/08/08
•Combinatorial Laplacian
L D A= −diagonal degree matrix
adjacency matrix
Gustav Robert Kirchhoff
1824-1887
The spectrum of a graph
•Adjacency matrix
•Normalized Laplacian
Random walks
Rate of convergence
Fan Chung, 01/08/08
The spectrum of a graph
1( ) ( ( ) ( ))y xx
f x f x f yd
Δ = −∑:
Discrete Laplace operator
In a path Pn
2
2 ( )f xx∂
−∂
1
1
1 ( ) ( )2
( ) ( )
{( )
( )}
j j
j j
f x f x
f x f x
+
−
= − −
− −
1( ) ( ){ }j jf x f xx x+∂ ∂
− −∂ ∂
Fan Chung, 01/08/08
The spectrum of a graph
not symmetric in general
•Normalized Laplaciansymmetric normalized
1( ) ( ( ) ( ))y xx
f x f x f yd
Δ = −∑:
L( , )x y =1 if x y={ 1
x y
if x y and x yd d
− ≠ :
with eigenvalues0 1 10 2n
Discrete Laplace operator
λ λ λ −= ≤ ≤ ⋅⋅⋅ ≤ ≤
( , )L x y =1 if x y={ 1
x
if x y and x yd
− ≠ :
loopless, simple
Fan Chung, 01/08/08
Can you hear the shape of a network?
dictates many properties
of a graph.• connectivity
• diameter
• isoperimetryz(bottlenecks)
• … …
λ
How “good” is the cut by using the eigenvalue ?λ
Fan Chung, 01/08/08
Finding a cut by a sweep
For : ( ) ,f V G R→
1 2
1 2 ( )( ) ( ) .n
n
v v v
f vf v f vd d d
≥ ≥ ⋅⋅⋅ ≥
1{ , , }, 1,..., ,f
j jS v v j n= ⋅⋅⋅ =Consider sets
order the vertices
and the Cheeger constant of
( ) min ( )fjjf SΦ = Φ
.fjS
Define
Fan Chung, 01/08/08
Using a sweep by the eigenvector,
can reduce the exponential number of
choices of subsets to a linear number.
Finding a cut by a sweep
Still, there is a lower bound guarantee
by using the Cheeger inequality.
2
22
λ ΦΦ ≥ ≥
Fan Chung, 01/08/08
Four types of Cheeger inequalities.
eigenvectorsrandom walksPageRankheat kernel
Four proofs of Cheeger inequalities using:
Leading to four different
one-sweep partitioning algorithms.Fan Chung, 01/08/08
• graph spectral method
• random walks
• PageRank
• heat kernel
spectral partition algorithm
local partition algorithms
Four proofs of Cheeger inequalities
Fan Chung, 01/08/08
Graph partitioning Local graph partitioning
Fan Chung, 01/08/08
What is a local graph partitioning algorithm?
A local graph partitioning algorithm finds a small
cut near the given seed(s) with running time
depending only on the size of the output.
Fan Chung, 01/08/08
Two examples ( Reid Andersen)
Fan Chung, 01/08/08
Fan Chung, 01/08/08
Two examples ( Reid Andersen)
Fan Chung, 01/08/08
Two examples ( Reid Andersen)
Two examples ( Reid Andersen)
Fan Chung, 01/08/08
Two examples ( Reid Andersen)
Fan Chung, 01/08/08
Fan Chung, 01/08/08
• graph spectral method
• random walks
• PageRank
• heat kernel
Four proofs of Cheeger inequalities
Fan Chung, 01/08/08
where is the first non-trivial eigenvalueof the Laplacian and is the minimum Cheeger ratio in a sweep using the eigenvector .
f
2 2
22 2
αλ ΦΦ ≥ ≥ ≥
Using eigenvector
the Cheeger inequality can be stated as
The Cheeger inequality
α
Partition algorithm
,
fFan Chung, 01/08/08
Proof of the Cheeger inequality:
from definition
by Cauchy-Schwarz ineq.
summation by parts.
from the definition.
Fan Chung, 01/08/08
• graph spectral method
• random walks
• PageRank
• heat kernel
Lovasz, Simonovits, 90, 93 Spielman, Teng, 04Andersen, Chung, Lang, 06
Chung, PNAS , 08.
Four proofs of Cheeger inequalities
Fiedler ’73, Cheeger, 60’s
Fan Chung, 01/08/08
2 2
28 8Gβλ ΦΦ ≥ ≥ ≥
where is the minimum Cheeger ratio over sweeps by using a lazy walk of k steps from every vertex for an appropriate range of k .
G
A Cheeger inequality using random walks
β
Lovász, Simonovits, 90, 932( )( , ) ( ) 1
8
kk k
u
vol SW u S Sd
βπ⎛ ⎞
− ≤ −⎜ ⎟⎝ ⎠
Leads to a Cheeger inequality:
Fan Chung, 01/08/08
Using the PageRank vector.
A Cheeger inequality using PageRank
Recall the definition of PageRank p=pr(α,s): (1) (1 )p s pWα α= + −
0(1 ) ( )t t
tp sWα α
∞
=
= −∑(2)
Organize the random walks by a scalar α.
Fan Chung, 01/08/08
with seed as a subset S
2 2
8log 8logS S
S S s sγλ ΦΦ ≥ ≥ ≥
and a Cheeger inequality
can be obtained :
Using the PageRank vector
where S is the Dirichlet eigenvalue of the Laplacian, and is the minimum Cheegerratio over sweeps by using personalized PageRank with seeds S.
S
A Cheeger inequality using PageRank
γ
( ) ( ) / 4,vol S vol G≤
Fan Chung, 01/08/08
2
2
( ( ) ( ))inf
( )u v
S fw
w
f u f v
f w dλ
−=
∑∑
:
over all f satisfying the Dirichlet boundary
condition:
Dirichlet eigenvalues for a subset
( ) 0f v =
S V⊆
S
V
for all .v S∉
Fan Chung, 01/08/08
with seed as a subset S
2 2
8log 8logS S
S S s sγλ ΦΦ ≥ ≥ ≥
and a Cheeger inequality
can be obtained :
Using the PageRank vector
where S is the Dirichlet eigenvalue of the Laplacian, and is the minimum Cheegerratio over sweeps by using personalized PageRank with seed S.
S
A Cheeger inequality using PageRank
γ
( ) ( ) / 4,vol S vol G≤
Fan Chung, 01/08/08
Algorithmic aspects of PageRank
Can use the jumping constant to approximate PageRank with a support of the desired size.
greedy type algorithm, linear complexity
•
• Fast approximation algorithm for x personalized PageRank
• Errors can be effectively bounded.
Fan Chung, 01/08/08
A graph partition algorithm using PageRank
Given a set S with ,S αΦ ≤
Randomly choose a vertex v in S.
With probability at least 1 ,2
the one-sweep algorithm using ( , )f pr vα=has an initial segment with the Cheeger
constant at most ( log | |).f O SαΦ =
12
( ) ( ).vol S vol G≤
Fan Chung, 01/08/08
Graph partitioning using PageRank vector.
198,430 nodes and 1,133,512 edgesFan Chung, 01/08/08
Fan Chung, 01/08/08
Fan Chung, 01/08/08
• graph spectral method
• random walks
• PageRank
• heat kernel
Lovasz, Simonovits, 90, 93 Spielman, Teng, 04Andersen, Chung, Lang, 06
Chung, PNAS , 08.
Fiedler ’73, Cheeger, 60’s
Four proofs of Cheeger inequalities
Fan Chung, 01/08/08
PageRank versus heat kernel
,0(1 ) ( )k ks
kp sWα α α
∞
=
= −∑ ,0
( )!
kt
t sk
tWe sk
ρ∞
−
=
= ∑Geometric sum Exponential sum
(1 )p pWα α= + −
recurrence
( )I Wtρ ρ∂ = − −
∂
Heat equation
Fan Chung, 01/08/08
22( ... ...)
2 !
kt k
tt tH e I tW W W
k−= + + + + +
( )t I We− −=
Definition of heat kernel
tLe−=2
2 ... ( 1) ...2 !
kk kt tI tL L L
k= − + + + − +
( )t tH I W Ht∂
= − −∂
,t s tsHρ =Fan Chung, 01/08/08
A Cheeger inequality using the heat kernel
2, / 4
,( )( ) ( ) t utt uu
vol SS S ed
κρ π −− ≤
where is the minimum Cheeger ratio over sweeps by using heat kernel pagerank over all u in S.
,t uκ
Theorem:
Theorem: For
, ( ) ( ) .2
St
t SeS S
λ
ρ π−
− ≥
2/3( ) ( ) ,vol S vol G≤
Fan Chung, 01/08/08
2 2
8 8S S
S Sκλ ΦΦ ≥ ≥ ≥
where S is the Dirichlet eigenvalue of the Laplacian, and is the minimum Cheegerratio over sweeps by using heat kernel with seeds S for appropriate t.
S
Using the upper and lower bounds,
a Cheeger inequality can be obtained :
A Cheeger inequality using the heat kernel
κ
Fan Chung, 01/08/08
Theorem:( )/ 1 ( )
, ( ) ( ) (1 ( )) Sh t S
t S S S S eπρ π π − −− ≥ −
Sketch of a proof:
,( ) log( ( ) ( ))t SF t S Sρ π= − −Consider
Show2
2 ( ) 0F tt∂
≤∂
Then( )
( ) (0)1
SF t Ft t Sπ
Φ∂ ∂≤ =
∂ ∂ −
Solve and get ( )/ 1 ( ), ( ) ( ) (1 ( )) S
h t St S S S S e
πρ π π − −− ≥ −Fan Chung, 01/08/08
Random walks versus heat kernel
How fast is the convergence to the
stationary distribution?Choose t to satisfy
the required property.
For what k, can one have
?kf W π→
Fan Chung, 01/08/08
Fan Chung, 01/08/08
Fan Chung, 01/08/08
Fan Chung, 01/08/08
• Complex networks• pageranks• Games on graphs
Related areas: • Spectral graph theory• Random walks• Random graphs• Game theory
New Directions in Graph Theory for information networks
Topics:
• Quasi-randomness
• Probabilistic methods• Analytic methods
Methods:
Fan Chung, 01/08/08
• Andersen, Chung, Lang, Local graph partitioning xxusing pagerank vectors, FOCS 2006
• Andersen, Chung, Lang, Local partitioning for xxdirected graphs using PageRank, WAW 2007• Chung, Four proofs of the Cheeger inequality and graph
partition algorithms, ICCM 2007
• Chung, The heat kernel as the pagerank of a graph, PNAS 2008.
Some related papers:
Graph modelsA graph G = (V,E)Many basic questions:Graph modelsGraph modelsGraph modelsNew Directions in Graph Theory �for information networks