
Spectral Algorithms for Graph Partitioning

Luca Trevisan, U.C. Berkeley

Based on work with Tsz Chiu Kwok, Lap Chi Lau, James Lee, Yin Tat Lee, and Shayan Oveis Gharan

spectral graph theory

Use linear algebra to

• understand/characterize graph properties

• design efficient graph algorithms

linear algebra and graphs

Take an undirected graph G = (V, E) and consider the matrix

A[i, j] := weight of edge (i, j)

eigenvalues → combinatorial properties
eigenvectors → algorithms for combinatorial problems


fundamental thm of spectral graph theory

G = (V, E) undirected, A adjacency matrix, d_v degree of v, D diagonal matrix of degrees

L := I − D^{-1/2} A D^{-1/2}

L is symmetric, all its eigenvalues are real, and

• 0 = λ_1 ≤ λ_2 ≤ · · · ≤ λ_n ≤ 2

• λ_k = 0 iff G has ≥ k connected components

• λ_n = 2 iff G has a bipartite connected component
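A minimal numerical sketch of this definition (the graph and helper below are illustrative, not from the talk): build L = I − D^{-1/2} A D^{-1/2} with numpy and check that the spectrum lies in [0, 2], with one zero eigenvalue per connected component.

```python
import numpy as np

def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric (weighted) adjacency matrix A."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.where(d > 0, d, 1.0)))  # guard isolated vertices
    return np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt

# Example: two disjoint triangles; expected spectrum 0, 0, 1.5, 1.5, 1.5, 1.5
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[u, v] = A[v, u] = 1.0

lam = np.linalg.eigvalsh(normalized_laplacian(A))
print(np.round(lam, 3))  # all in [0, 2]; two zeros <=> two connected components
```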


[Figures: example graphs and their spectra]

spectrum: 0, 2/3, 2/3, 2/3, 2/3, 2/3, 5/3, 5/3, 5/3, 5/3

spectrum: 0, 0, .69, .69, 1.5, 1.5, 1.8, 1.8

spectrum: 0, 2/3, 2/3, 2/3, 4/3, 4/3, 4/3, 2

eigenvalues as solutions to optimization problems

If M ∈ R^{n×n} is symmetric and

λ_1 ≤ λ_2 ≤ · · · ≤ λ_n

are its eigenvalues counted with multiplicities, then

λ_k = min_{A k-dim subspace of R^n} max_{x ∈ A} (x^T M x) / (x^T x)

and the eigenvectors of λ_1, . . . , λ_k are a basis of the optimal A
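A small numerical illustration of this min-max characterization (the names below are illustrative): on the span of the first k eigenvectors, the Rayleigh quotient never exceeds λ_k and is exactly λ_k at the k-th eigenvector.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
M = (M + M.T) / 2                      # a symmetric test matrix

lam, U = np.linalg.eigh(M)             # eigenvalues ascending, orthonormal eigenvectors
k = 3
A = U[:, :k]                           # optimal subspace: span of the first k eigenvectors

def rayleigh(x):
    return (x @ M @ x) / (x @ x)

# every x in A has Rayleigh quotient <= lambda_k ...
samples = [rayleigh(A @ rng.standard_normal(k)) for _ in range(1000)]
print(max(samples) <= lam[k - 1] + 1e-9)              # True
# ... and the k-th eigenvector attains it
print(np.isclose(rayleigh(U[:, k - 1]), lam[k - 1]))  # True
```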

why eigenvalues relate to connected components

If G = (V, E) is undirected and L is its normalized Laplacian, then

x^T L x / x^T x = ( Σ_{(u,v)∈E} |y_u − y_v|^2 ) / ( Σ_v d_v · y_v^2 ),   where y := D^{-1/2} x

If λ_1 ≤ λ_2 ≤ · · · ≤ λ_n are the eigenvalues of L with multiplicities, then

λ_k = min_{A k-dim subspace of R^V} max_{x ∈ A} R(x)

where

R(x) := ( Σ_{(u,v)∈E} |x_u − x_v|^2 ) / ( Σ_v d_v · x_v^2 )

Note: Σ_{(u,v)∈E} |x_u − x_v|^2 = 0 iff x is constant on connected components


if G has ≥ k connected components

λ_k = min_{A k-dim subspace of R^V} max_{x ∈ A} ( Σ_{(u,v)∈E} |x_u − x_v|^2 ) / ( Σ_v d_v · x_v^2 )

Consider the space of all vectors that are constant on each connected component; it has dimension at least k, and so it witnesses λ_k = 0.


if λ_k = 0

λ_k = min_{A k-dim subspace of R^V} max_{x ∈ A} ( Σ_{(u,v)∈E} |x_u − x_v|^2 ) / ( Σ_v d_v · x_v^2 )

If λ_k = 0, there is a k-dimensional space A of vectors, each constant on connected components

The graph must have ≥ k connected components


conductance

vol(S) := Σ_{v∈S} d_v

φ(S) := E(S, V − S) / vol(S)

φ(G) := min_{S ⊆ V : 0 < vol(S) ≤ vol(V)/2} φ(S)

In regular graphs, this is the same as edge expansion and, up to a factor of 2, the same as non-uniform sparsest cut
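A minimal sketch of these definitions in numpy (illustrative helper names; the brute-force search is only meant for tiny graphs): compute φ(S) for a given set, and φ(G) by enumerating all sets with vol(S) ≤ vol(V)/2.

```python
import numpy as np
from itertools import combinations

def conductance(A, S):
    """phi(S) = E(S, V-S) / vol(S) for a symmetric adjacency matrix A and vertex set S."""
    S = list(S)
    T = [v for v in range(len(A)) if v not in S]
    return A[np.ix_(S, T)].sum() / A[S].sum()

def phi(A):
    """phi(G), by brute force over all S with 0 < vol(S) <= vol(V)/2 (tiny graphs only)."""
    n, total = len(A), A.sum()
    best = 1.0
    for r in range(1, n):
        for S in combinations(range(n), r):
            if 0 < A[list(S)].sum() <= total / 2:
                best = min(best, conductance(A, S))
    return best

# Example: a 4-cycle; the best cut has 2 crossing edges and volume 4, so phi(G) = 1/2
C4 = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
print(phi(C4))  # 0.5
```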

conductance

spectrum: 0, 0, .69, .69, 1.5, 1.5, 1.8, 1.8

φ(G) = φ({0, 1, 2}) = 0/6 = 0

conductance

spectrum: 0, .055, .055, .211, . . .

φ(G) = 2/18

conductance

spectrum: 0, 2/3, 2/3, 2/3, 2/3, 2/3, 5/3, 5/3, 5/3, 5/3

φ(G) = 5/15

finding S of small conductance

• G is a social network
• S is a set of users more likely to be friends with each other than with anyone else

• G is a similarity graph on the items of a data set
• S is a set of items more similar to each other than to the other items
• [Shi, Malik]: image segmentation

• G is an instance of an NP-hard optimization problem
• Recurse on S and V − S, and use dynamic programming to combine the solutions in time 2^{O(|E(S, V−S)|)}


conductance versus λ_2

Recall: λ_2 = 0 ⇔ G disconnected ⇔ φ(G) = 0

Cheeger inequality (Alon, Milman, Dodziuk, mid-1980s):

λ_2 / 2 ≤ φ(G) ≤ √(2 λ_2)


Cheeger inequality

λ_2 / 2 ≤ φ(G) ≤ √(2 λ_2)

Constructive proof of φ(G) ≤ √(2 λ_2):

• given an eigenvector x of λ_2,

• sort the vertices v according to x_v;

• some set of vertices S occurring as an initial segment or a final segment of the sorted order has

φ(S) ≤ √(2 λ_2) ≤ 2 √(φ(G))

Fiedler's sweep algorithm can be implemented in O(|V| + |E|) time


applications

λ_2 / 2 ≤ φ(G) ≤ √(2 λ_2)

• rigorous analysis of the sweep algorithm (practical performance is usually better than the worst-case analysis)

• to certify φ ≥ Ω(1), it is enough to certify λ_2 ≥ Ω(1)

• the mixing time of the random walk is O( (1/λ_2) · log |V| ), and so it is also O( (1/φ^2(G)) · log |V| )
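A small sketch of the mixing-time statement (illustrative names, dense eigensolver, constants and the dependence on the starting vertex ignored): the lazy random walk W = (I + D^{-1}A)/2 has second largest eigenvalue 1 − λ_2/2, so the distance to the stationary distribution decays roughly like (1 − λ_2/2)^t.

```python
import numpy as np

def lazy_walk_mixing(A, steps):
    """Distance to stationarity after `steps` lazy-walk steps, started at vertex 0,
    compared with the eigenvalue-based decay rate (1 - lambda_2/2)^steps."""
    d = A.sum(axis=1)
    n = len(A)
    W = 0.5 * (np.eye(n) + A / d[:, None])            # lazy walk, row-stochastic
    pi = d / d.sum()                                  # stationary distribution

    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    lam2 = np.linalg.eigvalsh(np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt)[1]

    p = np.zeros(n)
    p[0] = 1.0                                        # start at vertex 0
    for _ in range(steps):
        p = p @ W
    return np.abs(p - pi).sum(), (1 - lam2 / 2) ** steps
```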


goal

A similar theory for the other eigenvectors, with algorithmic applications

One intended application: just as Cheeger's inequality is a worst-case analysis of Fiedler's algorithm, develop a worst-case analysis of spectral clustering

spectral clustering

The following algorithm works well in practice. (But what problem does it solve?)

Compute the k eigenvectors x^(1), . . . , x^(k) of the k smallest eigenvalues of L

Define the mapping F : V → R^k

F(v) := ( x^(1)_v, x^(2)_v, . . . , x^(k)_v )

Apply k-means to the points that the vertices are mapped to
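A minimal sketch of this algorithm using numpy and scikit-learn (illustrative; practical variants often rescale the rows of F, e.g. by 1/√d_v, or normalize them to unit length):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(A, k):
    """Embed each vertex by the first k eigenvectors of the normalized Laplacian,
    then cluster the embedded points with k-means."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt
    _, U = np.linalg.eigh(L)
    F = U[:, :k]                                  # F(v) = (x^(1)_v, ..., x^(k)_v)
    return KMeans(n_clusters=k, n_init=10).fit_predict(F)
```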

spectral embedding of a random graph, k = 3 [figure]

When λ_k is small

Lee, Oveis Gharan, T, 2012

From the fundamental theorem of spectral graph theory:

• λ_k = 0 ⇔ G has ≥ k connected components

We prove:

• λ_k small ⇔ G has ≥ k disjoint sets of small conductance


order-k conductance

Define the order-k conductance

φ_k(G) := min_{S_1, . . . , S_k disjoint} max_{i=1,...,k} φ(S_i)

Note:

• φ(G) = φ_2(G)

• φ_k(G) = 0 ⇔ G has ≥ k connected components
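A small sketch of this definition (illustrative; evaluating one candidate family of disjoint sets only gives an upper bound on φ_k(G)):

```python
import numpy as np

def order_k_conductance_of(A, sets):
    """max_i phi(S_i) for a given family of disjoint vertex sets; any such family
    witnesses that phi_k(G) is at most this value."""
    def phi(S):
        S = list(S)
        T = [v for v in range(len(A)) if v not in S]
        return A[np.ix_(S, T)].sum() / A[S].sum()
    return max(phi(S) for S in sets)
```

For instance, on a graph with at least k connected components, taking k of the components as the S_i witnesses φ_k(G) = 0, matching the note above.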

spectrum: 0, 0, .69, .69, 1.5, 1.5, 1.8, 1.8

φ_3(G) = max { 0, 2/6, 2/4 } = 1/2


spectrum: 0, .055, .055, .211, . . .

φ_3(G) = max { 2/12, 2/12, 2/14 } = 1/6

λ_k

λ_k = 0 ⇔ φ_k = 0

λ_k / 2 ≤ φ_k(G) ≤ O(k^2) · √(λ_k)

(Lee, Oveis Gharan, T, 2012)

The upper bound is constructive: in nearly-linear time we can find k disjoint sets, each of expansion at most O(k^2) √(λ_k).


λ_k

λ_k / 2 ≤ φ_k(G) ≤ O(k^2) · √(λ_k)

φ_k(G) ≤ O( √( (log k) · λ_{1.1k} ) )

(Lee, Oveis Gharan, T, 2012)

φ_k(G) ≤ O( √( (log k) · λ_{O(k)} ) )

(Louis, Raghavendra, Tetali, Vempala, 2012)

The factor O(√(log k)) is tight


λ_k

φ_k(G) ≤ O( √( (log k) · λ_{O(k)} ) )

(Louis, Raghavendra, Tetali, Vempala, 2012) (Lee, Oveis Gharan, T, 2012)

Miclo 2013:

• Analog for infinite-dimensional Markov processes.

• Application: every hyperbounded operator has a spectral gap.

• Solves a 40+ year old open problem

spectral clustering

Recall the spectral clustering algorithm:

1. Compute the k eigenvectors x^(1), . . . , x^(k) of the k smallest eigenvalues of L

2. Define the mapping F : V → R^k, F(v) := ( x^(1)_v, x^(2)_v, . . . , x^(k)_v )

3. Apply k-means to the points that the vertices are mapped to

The LOT and LRTV algorithms find k low-conductance sets if they exist

The first step is the spectral embedding; then geometric partitioning, but not k-means


When λ_k is large

Cheeger inequality

λ_2 / 2 ≤ φ(G) ≤ √(2 λ_2)

φ(G) ≤ O(k) · λ_2 / √(λ_k)

(Kwok, Lau, Lee, Oveis Gharan, T, 2013)

and the upper bound holds for the sweep algorithm:

the nearly-linear time sweep algorithm finds S such that, for every k,

φ(S) ≤ φ(G) · O( k / √(λ_k) )


Cheeger inequality

λ_2 / 2 ≤ φ(G) ≤ √(2 λ_2)

φ(G) ≤ O(k) · λ_2 / √(λ_k)

(Kwok, Lau, Lee, Oveis Gharan, T, 2013)

Liu 2014: analog for Riemannian manifolds. Applications to bounding eigenvalue gaps


proof structure

1. We find a 2k-valued vector y such that

||x − y||^2 ≤ O( λ_2 / λ_k )

where x is the eigenvector of λ_2

2. For every k-valued vector y, applying the sweep algorithm to x finds a set S such that

φ(S) ≤ O(k) · ( λ_2 + √(λ_2) · ||x − y|| )

So the sweep algorithm finds S with

φ(S) ≤ O(k) · λ_2 · 1/√(λ_k)


bounded threshold rank

[Arora, Barak, Steurer], [Barak, Raghavendra, Steurer], [Guruswami, Sinop], ...

If λ_k is large, then a cut of approximately minimal conductance can be found in time exponential in k, using spectral methods [ABS] or convex relaxations [BRS, GS]

[Kwok, Lau, Lee, Oveis Gharan, T]: the nearly-linear time sweep algorithm finds a good approximation if λ_k is large


bounded threshold rank

If λ_k is large:

[Kwok, Lau, Lee, Oveis Gharan, T]: the nearly-linear time sweep algorithm finds a good approximation

[Oveis Gharan, T]: the Goemans-Linial relaxation (solvable in polynomial time independent of k) can be rounded better than in the Arora-Rao-Vazirani analysis

[Oveis Gharan, T]: the graph satisfies a weak regularity lemma in the sense of Frieze and Kannan


bounded threshold rank

If λ_k is large, the graph is an easy instance for several algorithms. Two types of results:

• There is a near-optimal combinatorial solution with a simple structure: find it by brute force

• The optimum of a relaxation has a special structure: an improved rounding algorithm uses that structure

When λ_{k+1} is large and λ_k is small

eigenvalue gap

• if λ_k = 0 and λ_{k+1} > 0, then G has exactly k connected components;

• [Tanaka 2012] if λ_{k+1} > 5k √(λ_k), then the vertices of G can be partitioned into k sets of small conductance, each inducing a subgraph of large conductance [non-algorithmic proof]

• [Oveis Gharan, T 2013] it suffices that φ_{k+1}(G) > (1 + ε) φ_k(G); if λ_{k+1} > O(k^2) √(λ_k), then the partition can be found algorithmically


In Summary

λ_k is small: there are k disjoint non-expanding sets, and they can be found efficiently

λ_k is large: easy instance for various algorithms

λ_k is small and λ_{k+1} is large: the vertices can be partitioned into k sets, each with high internal expansion and small external expansion

lucatrevisan.wordpress.com/tag/expanders/
