Daniel A. Spielman
MIT
Fast, Randomized Algorithms for Partitioning, Sparsification, and the Solution of Linear Systems
Joint work with Shang-Hua Teng (Boston University)
Nearly-Linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems S-Teng '04
Papers
The Mixing Rate of Markov Chains, An Isoperimetric Inequality, and Computing The Volume Lovasz-Simonovits ‘93
The Eigenvalues of Random Symmetric Matrices Füredi-Komlós ‘81
Preconditioning to solve linear system
Overview
Find B that approximates A, such that solving By = c is easy
Three techniques:
Augmented Spanning Trees [Vaidya '90]
Sparsification: approximate graph by sparse graph
Partitioning: runtime proportional to #nodes removed
I. Linear system solvers
II. Sparsification using partitioning and random sampling
III. Graph partitioning by truncated random walks
Outline
Eigenvalues of random graphs
Combinatorial and Spectral Graph Theory
Diagonally Dominant Matrices
Just solve for x
Solve Ax = b where each diagonal entry of A is at least the sum of the absolute values of the others in its row
Ex: Laplacian matrices of weighted graphs: off-diagonal entry = negative of the weight from i to j; diagonal entry = weighted degree
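As a concrete sketch, building such a Laplacian and checking diagonal dominance; the 4-node weighted graph here is hypothetical, and numpy is assumed:

```python
import numpy as np

# A hypothetical 4-node weighted graph, given as (i, j, weight) edges.
edges = [(0, 1, 2.0), (1, 2, 3.0), (2, 3, 1.0), (0, 3, 4.0)]
n = 4

# Laplacian: off-diagonal L[i, j] = -w(i, j), diagonal = weighted degree.
L = np.zeros((n, n))
for i, j, w in edges:
    L[i, j] -= w
    L[j, i] -= w
    L[i, i] += w
    L[j, j] += w

# Each diagonal entry equals the sum of absolute off-diagonal entries
# in its row, so L is (weakly) diagonally dominant.
for i in range(n):
    assert L[i, i] == sum(abs(L[i, j]) for j in range(n) if j != i)

# The all-ones vector is in the nullspace of a Laplacian.
assert np.allclose(L @ np.ones(n), 0)
```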
Complexity of Solving Ax = b, A positive semi-definite
General Direct Methods:
Gaussian Elimination (Cholesky)
Conjugate Gradient
Fast matrix inversion
n is the dimension, m is the number of non-zeros
Complexity of Solving Ax = b, A positive semi-definite, structured
Planar: Nested dissection [Lipton-Rose-Tarjan ’79]
Tree: like path, work up from leaves
Path: forward and backward elimination
Express A = L L^T
Preconditioned Conjugate Gradient
Omit for rest of talk
Iterative Methods
Find easy B that approximates A
Solve in time
Quality of approximation
Time to solve By = c
Iteratively solve in time
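A minimal sketch of preconditioned conjugate gradient: each iteration multiplies by A and solves one system in the preconditioner B. Here a simple diagonal (Jacobi) preconditioner stands in for B; the talk's point is to build a much better B from a subgraph:

```python
import numpy as np

def pcg(A, b, solve_B, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient: each iteration needs one
    multiply by A and one solve with the preconditioner B."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = solve_B(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = solve_B(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Toy SPD system; the diagonal preconditioner is purely illustrative.
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x = pcg(A, b, solve_B=lambda r: r / np.diag(A))
```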
Main Result
For symmetric, diagonally dominant A
General:
Planar:
Vaidya’s Subgraph Preconditioners
Precondition A by the Laplacian of a subgraph, B
[Figure: a 4-node graph A and its subgraph preconditioner B]
Vaidya '90: relate the condition number to a graph embedding via congestion/dilation; use the MST, then an augmented MST
History
Gremban, Miller ‘96 Steiner vertices, many details
Bern, Boman, Chen, Gilbert, Hendrickson, Nguyen, Toledo All the details, algebraic approach
Joshi ’97, Reif ‘98 Recursive, multi-level approach
planar:
planar:
Maggs, Miller, Parekh, Ravi, Woo '02: after preprocessing
History
Boman, Hendrickson ‘01 Low-stretch spanning trees
Clustering, Partitioning, Sparsification, Recursion Lower-stretch spanning trees
S-Teng ‘03 Augmented low-stretch trees
if A − B is positive semi-definite, that is,
if A is positive semi-definite
For A Laplacian of graph with edges E
The relative condition number:
Bounding
For B subgraph of A,
and so
Fundamental Inequality
Application to eigenvalues of graphs
Example: For the complete graph on n nodes, all non-zero eigenvalues = n
λ1 = 0 for Laplacian matrices
For path,
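These eigenvalue claims are easy to check numerically; the small helper below is hypothetical, and numpy is assumed:

```python
import numpy as np

def laplacian(n, edges):
    """Unweighted graph Laplacian from an edge list."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, j] -= 1; L[j, i] -= 1
        L[i, i] += 1; L[j, j] += 1
    return L

n = 8
K = laplacian(n, [(i, j) for i in range(n) for j in range(i + 1, n)])
eigs_K = np.sort(np.linalg.eigvalsh(K))
# Complete graph: one zero eigenvalue, all others equal to n.
assert np.allclose(eigs_K, [0] + [n] * (n - 1))

P = laplacian(n, [(i, i + 1) for i in range(n - 1)])
eigs_P = np.sort(np.linalg.eigvalsh(P))
# Path eigenvalues are 2(1 - cos(pi*k/n)), so lambda_2 ~ pi^2/n^2 is tiny.
assert np.allclose(eigs_P, [2 * (1 - np.cos(np.pi * k / n)) for k in range(n)])
```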
Lower bound on
[Guattery-Leighton-Miller '97]
Lower bound on
So
And
Preconditioning with a Spanning Tree B
Every edge of A not in B has unique path in B
When B is a Tree
Low-Stretch Spanning Trees
where
Theorem (Alon-Karp-Peleg-West ’91)
Theorem (Elkin-Emek-S-Teng ’04)
Theorem (Boman-Hendrickson ’01)
B a spanning tree plus s edges, in total
Vaidya’s Augmented Spanning Trees
Adding Edges to a Tree
Partition tree into t sub-trees, balancing stretch
For sub-trees connected in A, add one such bridge edge, carefully
Theorem:
in general
if planar
t sub-trees
improve to
All graphs can be well-approximated by a sparse graph
Sparsification: Feder-Motwani '91, Benczur-Karger '96
D. Eppstein, Z. Galil, G.F. Italiano, and T. Spencer. ‘93 D. Eppstein, Z. Galil, G.F. Italiano, and A. Nissenzweig ’97
Sparsification. Benczur-Karger '96: Can find sparse subgraph H s.t.
We need
(edges are a subset, but with different weights)
H has edges
Example: Complete Graph
If A is the Laplacian of Kn, all non-zero eigenvalues are n
If B is the Laplacian of a Ramanujan expander, all non-zero eigenvalues satisfy
And so
Example: Dumbbell
If B does not contain middle edge,
[Figure: two complete graphs Kn joined by a middle edge, forming A]
Example: Grid plus edge
• Random sampling not sufficient.
• Cut approximation not sufficient.
= (m-1)^2 + k(m-1)
(m-1)^2
Conductance
Cut = partition of vertices
Conductance of S: Φ(S) = (weight of edges leaving S) / min(vol(S), vol(V − S))
Conductance of G: the minimum of Φ(S) over all sets S
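A small sketch of these definitions on a hypothetical "dumbbell" of two triangles joined by one edge:

```python
from itertools import combinations

# Hypothetical unweighted graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6
deg = [0] * n
for i, j in edges:
    deg[i] += 1
    deg[j] += 1

def conductance(S):
    """Phi(S) = edges leaving S / min(vol(S), vol(V - S)),
    where vol = sum of degrees."""
    S = set(S)
    cut = sum(1 for i, j in edges if (i in S) != (j in S))
    vol_S = sum(deg[v] for v in S)
    vol_rest = sum(deg) - vol_S
    return cut / min(vol_S, vol_rest)

# Conductance of the graph: minimum over nonempty proper subsets.
phi_G = min(conductance(S)
            for k in range(1, n) for S in combinations(range(n), k))
# The best cut separates the two triangles: 1 edge over volume 7.
assert abs(phi_G - 1 / 7) < 1e-12
```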
Conductance and Sparsification
If conductance high (expander) can precondition by random sampling
If conductance low can partition graph by removing few edges
Decomposition: partition the vertex set, removing few edges, so that the graph on each part has high conductance
Graph Decomposition
Lemma: Exists partition of vertices
Each Vi has large conductance; at most half the edges cross the partition
Alg: sample the edges inside the parts; recurse on the edges crossing the partition
Graph Decomposition Exists
Each Vi has large conductance; at most half the edges cross
Lemma: Exists partition of vertices
Proof: Let S be largest set s.t.
If
then
If S small, only recurse in S
If S big, V-S not too big
Bounded recursion depth
Sparsification from Graph Decomposition
Alg: sample the edges inside the parts; recurse on the edges crossing the partition
Theorem: Exists B with #edges <
Need to find the partition in nearly-linear time
Sparsification
Thm: For all A, find B edges
Thm: Given G, find subgraph H s.t.
in time
#edges(H) <
General solve time:
Cheeger's Inequality [Sinclair-Jerrum '89]:
For Laplacian L, and D: diagonal matrix with D_ii = degree of node i
Useful if is big
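A numerical sanity check of Cheeger's inequality, Φ²/2 ≤ λ₂ ≤ 2Φ for the normalized Laplacian, on a hypothetical 6-cycle (numpy assumed; conductance as defined above):

```python
import numpy as np
from itertools import combinations

# Hypothetical small graph: a 6-cycle.
n = 6
edges = [(i, (i + 1) % n) for i in range(n)]
L = np.zeros((n, n))
for i, j in edges:
    L[i, j] -= 1; L[j, i] -= 1
    L[i, i] += 1; L[j, j] += 1

# lambda_2 of the normalized Laplacian D^{-1/2} L D^{-1/2}.
Dinv_sqrt = np.diag(1 / np.sqrt(np.diag(L)))
lam2 = np.sort(np.linalg.eigvalsh(Dinv_sqrt @ L @ Dinv_sqrt))[1]

def conductance(S):
    S = set(S)
    cut = sum(1 for i, j in edges if (i in S) != (j in S))
    vol = sum(L[v, v] for v in S)
    return cut / min(vol, L.trace() - vol)

phi = min(conductance(S) for k in range(1, n)
          for S in combinations(range(n), k))

# Cheeger's inequality: phi^2 / 2 <= lambda_2 <= 2 * phi.
assert phi ** 2 / 2 <= lam2 <= 2 * phi
```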
Random Sampling
Given a graph G, will randomly sample to get G̃
So that the norm of the difference is small,
where D̃ is the diagonal matrix of degrees in G̃
Useful if is big
If
and
Then, for all x orthogonal to (mutual) nullspace
But, don’t want D in there
D has no impact:
So
implies
Work with adjacency matrix
No difference, because
A_ij = weight of edge from i to j, 0 if i = j
Random Sampling Rule
Choose a parameter governing the sparsity of the sampled graph
Keep edge (i,j) with prob
If we keep an edge, raise its weight by the reciprocal of its keep probability, so its expectation is unchanged
Random Sampling Rule
guarantees
And,
guarantees expect at most edges in
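The reweighting rule can be sketched as follows; the slide's exact formula for the keep-probabilities is not reproduced here, so a uniform p = 1/2 on a hypothetical 4-cycle is used purely for illustration:

```python
import random

random.seed(1)

# Hypothetical weighted edges and per-edge keep-probabilities p[e].
edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 1.0}
p = {e: 0.5 for e in edges}

def sample(edges, p):
    """Keep edge e with probability p[e]; if kept, scale its weight
    by 1/p[e], so the expected weight of every edge is unchanged."""
    return {e: w / p[e] for e, w in edges.items() if random.random() < p[e]}

# Empirically, the average total weight matches the original graph.
trials = 20000
avg = sum(sum(sample(edges, p).values()) for _ in range(trials)) / trials
assert abs(avg - sum(edges.values())) < 0.1
```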
Random Sampling Theorem
Theorem
Useful for
From now on, all edges have weight 1
Will prove in unweighted case.
Analysis by Trace
Trace = sum of diagonal entries = sum of eigenvalues
For even k, ||M||^k ≤ Tr(M^k)
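Both trace facts are easy to verify numerically on a random symmetric matrix (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
M = (B + B.T) / 2          # a random symmetric matrix
eigs = np.linalg.eigvalsh(M)

# Trace = sum of diagonal entries = sum of eigenvalues.
assert np.isclose(np.trace(M), eigs.sum())

# For even k, every eigenvalue of M^k is nonnegative, so
# ||M||^k = max |eig|^k <= trace(M^k) = sum of eig^k.
k = 6
norm_k = np.max(np.abs(eigs)) ** k
assert norm_k <= np.trace(np.linalg.matrix_power(M, k)) + 1e-9
```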
Analysis by Trace
Main Lemma:
Proof of Theorem:
Markov:
k-th root
Expected Trace
Expected Trace
Most terms zero because
So, sum is zero unless each edge appears at least twice.
Will code such walks to count them.
Coding walks
For sequence where
S = {i : edge not used before, either way}
the index of the time the edge was used before, either way
Coding walks: Example
[Figure: 4-vertex graph]
step: 0, 1, 2, 3, 4, 5, 6, 7, 8
vert: 1, 2, 3, 4, 2, 3, 4, 2, 1
S = {1, 2, 3, 4}
1 2 2 3 3 4 4 2
: 5 2 6 3 7 4 8 1
Valid s
For each i in S is a neighbor of with a probability of being chosen < 1
That is, can be non-zero
For each i not in S can take edge indicated by
That is
Expected Trace from Code
Expected Trace from Code
Are ways to choose given
Random Sampling Theorem
Theorem
Useful for
Random sampling sparsifies graphs of high conductance
Graph Partitioning Algorithms
SDP/LP: too slow
Spectral: one cut quickly, but can be unbalanced, requiring many runs
Multilevel (Chaco/Metis): can't analyze; may miss small sparse cuts
New Alg: Based on truncated random walks. Approximates optimal balance,
Can be used to decompose
Run time
Lazy Random Walk
At each step: stay put with prob ½ otherwise, move to neighbor according to weight
Diffusing probability mass: keep ½ for self, distribute rest among neighbors
Or, with self-loops, distribute evenly over edges
[Figure: lazy walk on a 4-node graph with degrees 1, 3, 2, 2, mass starting on the degree-1 node:
step 0: 1, 0, 0, 0
step 1: 1/2, 1/2, 0, 0
step 2: 1/3, 1/2, 1/12, 1/12
step 3: 1/4, 11/24, 7/48, 7/48
limit: 1/8, 3/8, 2/8, 2/8]
In limit, prob mass ~ degree
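The diffusion above can be sketched with the lazy walk matrix W = (I + D⁻¹A)/2 on a hypothetical graph with the same degrees 1, 3, 2, 2 (a triangle plus a pendant node; numpy assumed):

```python
import numpy as np

# Hypothetical 4-node graph matching the slide's degrees (1, 3, 2, 2):
# a triangle on nodes 1, 2, 3 plus a pendant node 0 attached to node 1.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
d = A.sum(axis=1)

# Lazy walk matrix: keep half the mass, spread the rest to neighbors.
W = 0.5 * (np.eye(4) + A / d[:, None])

# Start with all mass on node 0 and diffuse.
p = np.array([1.0, 0.0, 0.0, 0.0])
for _ in range(200):
    p = p @ W

# In the limit, probability mass is proportional to degree: d / sum(d).
assert np.allclose(p, d / d.sum(), atol=1e-6)
```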
Why lazy?
Otherwise, might be no limit
[Figure: two-node graph where the non-lazy walk oscillates between 1, 0 and 0, 1 forever]
Why so lazy as to keep ½ the mass?
Diagonally dominant matrices
weight of edge from i to j
degree of node i, if i = j
prob. of going from i to j
Rate of convergence
Cheeger’s inequality: related to conductance
If low conductance, slow convergence
If start uniform on S, prob leave at each step =
After steps, at least ¾ of mass still in S
Lovasz-Simonovits Theorem
If slow convergence, then low conductance
And, can find the cut from highest probability nodes
[Figure: node probabilities .137, .135, .134, .129, .112, .094; the cut separates the highest-probability nodes]
Lovasz-Simonovits Theorem
From now on, every node has same degree, d
For all vertex sets S, and all t ≤ T
Where
k verts with most prob at step t
Lovasz-Simonovits Theorem
If start from node in set S with small conductance will output a set of small conductance
Extension: mostly contained in S
k verts with most prob at step t
Speed of Lovasz-Simonovits
Want run-time proportional to # of vertices removed: local clustering on a massive graph
If we want a cut of a given conductance, can run for a correspondingly bounded number of steps.
Problem: most vertices can have non-zero mass when cut is found
Speeding up Lovasz-Simonovits
Round all small entries of the probability vector to zero
Algorithm Nibble. Input: start vertex v, target set size k. Start: all mass on v. Each step: all values below a threshold map to zero.
Theorem: if v is in a set of size k with small conductance, output a set with small conductance, mostly overlapping it
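A hedged sketch of the truncation idea: run the lazy walk from the start vertex, but zero out entries below a threshold after each step, so work stays proportional to the support reached. The threshold and step count here are illustrative, not the paper's parameters:

```python
def lazy_step_truncated(p, adj, deg, eps):
    """One lazy-walk step on a sparse dict of probabilities,
    then truncate entries below eps."""
    q = {}
    for v, mass in p.items():
        q[v] = q.get(v, 0.0) + mass / 2                 # keep half
        for u in adj[v]:
            q[u] = q.get(u, 0.0) + mass / (2 * deg[v])  # spread rest
    return {v: m for v, m in q.items() if m >= eps}     # truncate

# Hypothetical graph: two triangles joined by a single edge.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
deg = {v: len(ns) for v, ns in adj.items()}

p = {0: 1.0}
for _ in range(6):
    p = lazy_step_truncated(p, adj, deg, eps=0.01)

# Mass stays mostly inside the low-conductance side {0, 1, 2}.
assert sum(p.get(v, 0.0) for v in (0, 1, 2)) > 0.5
```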
The Potential Function
For integer k
Linear in between these points
Concave: slopes decrease
For
Easy Inequality
Fancy Inequality
Chords with little progress
Dominating curve, makes progress
Proof of Easy Inequality
time t-1
time t
Order vertices by probability mass
If top k at time t-1 only connect to top k at time t get equality
Otherwise, some mass leaves, and get inequality
External edges from Self Loops
Lemma: For every set S, and every set R of the same size, at least a Φ(S) fraction of the edges from S don't hit R
Tight example: [Figure: sets S and R with Φ(S) = 1/2]
Proof:
When R = S, this holds by definition. Each vertex in R − S can absorb d edges from S. But each vertex of S − R has d self-loops that do not go to R.
Proof of Fancy Inequality
time t-1
time t
top, in, out
I_t(k) = top + in ≤ top + (in + out)/2 = top/2 + (top + in + out)/2
Local Clustering
Theorem: If S is a set of small conductance, and v is a random vertex of S, then output a set of small conductance, mostly in S, in time proportional to the size of the output.
Can it be done for all conductances?
Experimental code: ClusTree
1. Cluster, crudely
2. Make trees in clusters
3. Add edges between trees, optimally
No recursion on reduced matrices
Implementation: ClusTree, in Java
timing not included; could increase total time by 20%
PCG, Cholesky, drop-tolerance incomplete Cholesky, Vaidya
TAUCS (Chen, Rotkin, Toledo)
Orderings: amd, genmmd, Metis, RCM
Intel Xeon 3.06GHz, 512k L2 Cache, 1M L3 Cache
2D grid, Neumann boundary
run to residual error 10^-8
[Plot: 2D Grid, Neumann: seconds (0-120) vs. number of nodes (0-2×10^6), comparing pcg/ic/genmmd, direct/genmmd, Vaidya, clusTree]
[Plot: 2D Grid, Dirichlet: seconds (0-100) vs. number of nodes (0-2×10^6), comparing direct/genmmd, Vaidya, pcg/mic/rcm, clusTree]
2D Grid Dirichlet vs. 2D Grid Neumann
Impact of boundary conditions
[Plot: 2D Unstructured Delaunay Mesh, Neumann: seconds (0-120) vs. number of nodes (0-15×10^5), comparing Vaidya, direct/Metis, pcg/ic/genmmd, clusTree]
[Plot: 2D Unstructured Delaunay Mesh, Dirichlet: seconds (0-120) vs. number of nodes (0-15×10^5), comparing Vaidya, direct/Metis, pcg/mic/genmmd, clusTree]
Dirichlet vs. Neumann
Future Work
Practical Local Clustering
More Sparsification
Other physical problems:
Cheeger's inequality
Implications for combinatorics, and Spectral Hypergraph Theory
“Nearly-Linear Time Algorithms for Graph Partitioning, Sparsification, and Solving Linear Systems” (STOC ’04, arXiv)
To learn more
Will be split into two papers: numerical and combinatorial
My lecture notes for Spectral Graph Theory and its Applications