Multifaceted Algorithm Design
Richard Peng, M.I.T.
LARGE SCALE PROBLEMS
Emphasis on efficient algorithms in:
• Scientific computing
• Graph theory
• (Randomized) numerical routines
Network Analysis
Physical Simulation
Optimization
WELL STUDIED QUESTIONS
Scientific computing: fast solvers for structured linear systems
Graphs / combinatorics: network flow problems
Randomized algorithms: subsampling matrices and optimization formulations
MY REPRESENTATIVE RESULTS
Current fastest sequential and parallel solvers for linear systems in graph Laplacian matrices
First nearly-linear time algorithm for approximate undirected maxflow
First near-optimal routine for row sampling matrices in a 1-norm preserving manner
RECURRING IDEAS
Can solve a problem by iteratively solving several similar instances
Approximations lead to better approximations
Larger problems can be approximated by smaller ones
MY APPROACH TO ALGORITHM DESIGN
Numerical analysis /Optimization
Statistics /Randomized algorithms
Problems at their intersection
Identify problems that arise at the intersection of multiple areas and study them from multiple angles
Combinatorics / Discrete algorithms
This talk: structure-preserving sampling
SAMPLING
Classical use in statistics:
• Extract info from a large data set
• Directly output result (estimator)
Sampling from matrices, networks, and optimization problems:
• Often compute on the sample
• Need to preserve more structure
PRESERVING GRAPH STRUCTURES
Undirected graph, n vertices, m < n² edges
Is n² edges (dense) sometimes necessary?
For some information, e.g. connectivity: encoded by a spanning forest, < n edges
Deterministic, O(m) time algorithm
MORE INTRICATE STRUCTURES
Cut: # of edges leaving a subset of vertices
k-connectivity: # of disjoint s-t paths (Menger's theorem / maxflow-mincut)
[Benczur-Karger `96]: for ANY G, can sample to get H with O(n log n) edges s.t. G ≈ H on all cuts
Stronger: preserves the weights of all 2ⁿ cuts in the graph
≈: multiplicative approximation
HOW TO SAMPLE?
Widely used: uniform sampling
Works well when the data is uniform, e.g. a complete graph
Problem: long path, where removing any edge changes connectivity
(can also have both in one graph)
More systematic view of sampling?
ALGEBRAIC REPRESENTATION OF GRAPHS
n vertices, m edges
Graph Laplacian matrix L (n rows / columns, O(m) non-zeros):
• Diagonal: degree
• Off-diagonal: -(edge weights)
Edge-vertex incidence matrix B (m rows, n columns):
B_eu = -1/1 if u is an endpoint of e, 0 otherwise
L is the Gram matrix of B: L = BᵀB
Example (3 vertices, edges {1,2} and {1,3}):
L = [[2, -1, -1], [-1, 1, 0], [-1, 0, 1]],  B = [[-1, 1, 0], [-1, 0, 1]]
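The Gram-matrix relation and the cut interpretation can be checked directly. A minimal NumPy sketch (the triangle graph below is an example chosen here, not from the talk): build the incidence matrix B, the Laplacian L, and verify L = BᵀB and that xᵀLx counts cut edges for a 0/1 indicator x.

```python
import numpy as np

# Small illustrative graph (chosen for this sketch): a triangle on vertices 0, 1, 2.
edges = [(0, 1), (1, 2), (0, 2)]
n, m = 3, len(edges)

# Edge-vertex incidence matrix: B[e, u] = -1, B[e, v] = +1 for edge e = (u, v).
B = np.zeros((m, n))
for e, (u, v) in enumerate(edges):
    B[e, u], B[e, v] = -1.0, 1.0

# Graph Laplacian: degrees on the diagonal, -1 per edge off the diagonal.
L = np.diag([2.0, 2.0, 2.0]) - (np.ones((3, 3)) - np.eye(3))

# L is the Gram matrix of B.
assert np.allclose(L, B.T @ B)

# For a 0/1 indicator vector x, x^T L x = ||Bx||_2^2 = size of the cut.
x = np.array([1.0, 0.0, 0.0])   # cut {0} vs {1, 2} has 2 crossing edges
assert x @ L @ x == 2.0
```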
SPECTRAL SIMILARITY
Numerical analysis: L_G ≈ L_H if xᵀL_Gx ≈ xᵀL_Hx for all vectors x
Gram matrix: L_G = B_GᵀB_G, so xᵀL_Gx = ║B_Gx║₂² (where ║y║₂² = Σᵢ yᵢ²)
For edge e = uv, (B_e x)² = (x_u - x_v)²: e.g. x_u = 1, x_v = 0 contributes (1-0)² = 1, while x_u = x_v = 1 contributes (1-1)² = 0
Taking x = {0, 1}^V: ║B_Gx║₂² = size of the cut given by x, so spectral similarity implies G ≈ H on all cuts
Equivalent condition: ║B_Gx║₂ ≈ ║B_Hx║₂ ∀ x
ALGEBRAIC VIEW OF SAMPLING EDGES
L2 row sampling: given B with m >> n, sample a few rows to form B' s.t. ║Bx║₂ ≈ ║B'x║₂ ∀ x
Note: the numerical linear algebra literature normally uses A instead of B, and n, d instead of m, n
IMPORTANCE SAMPLING
Keep a row, bᵢ, with probability pᵢ; rescale if kept to maintain expectation
Uniform sampling: pᵢ = 1/k for a factor-k size reduction
Issue: can miss the only non-zero row
Norm sampling: pᵢ = m║bᵢ║₂² / (k║B║_F²)
Issue: can miss a column with only one entry
THE `RIGHT' PROBABILITIES
Uniform sampling fails when only one row is non-zero; norm sampling fails on a column with one entry
Path + clique: clique edges only need pᵢ ≈ n/m, while the path edge needs pᵢ = 1
bᵢ: row i of B, L = BᵀB
τ: L2 statistical leverage scores, τᵢ = bᵢᵀ(BᵀB)⁻¹bᵢ = ║bᵢ║²_{L⁻¹}
L2 MATRIX-CHERNOFF BOUNDS
[Rudelson-Vershynin `07], [Tropp `12]: sampling with pᵢ ≥ τᵢ·O(log n) gives B' s.t. ║Bx║₂ ≈ ║B'x║₂ ∀x w.h.p.
[Foster `49]: Σᵢ τᵢ = rank ≤ n, so O(n log n) rows suffice
τ: L2 statistical leverage scores, τᵢ = bᵢᵀ(BᵀB)⁻¹bᵢ = ║bᵢ║²_{L⁻¹}
Near optimal for:
• L2 row samples of B
• Graph sparsifiers
• In practice, O(log n) → 5 usually suffices
• Can also improve via derandomization
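A hedged sketch of leverage-score sampling (the tall Gaussian matrix and the oversampling constant 20 are choices made here, not values from the talk): compute τᵢ = bᵢᵀ(BᵀB)⁻¹bᵢ, confirm Foster's identity Σᵢτᵢ = rank, then keep and rescale rows.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tall matrix standing in for the incidence matrix B (m >> n); Gaussian for illustration.
m, n = 2000, 10
B = rng.standard_normal((m, n))

# L2 statistical leverage scores: tau_i = b_i^T (B^T B)^{-1} b_i.
Linv = np.linalg.inv(B.T @ B)
tau = np.einsum('ij,jk,ik->i', B, Linv, B)

# Foster `49: the leverage scores sum to rank(B) <= n.
assert abs(tau.sum() - n) < 1e-6

# Importance sampling: keep row i with probability p_i ~ tau_i * O(log n);
# rescale kept rows by 1/sqrt(p_i) so B'^T B' is an unbiased estimate of B^T B.
p = np.minimum(1.0, tau * 20 * np.log(n))
keep = rng.random(m) < p
Bp = B[keep] / np.sqrt(p[keep])[:, None]

# Spectral check: ||Bx||_2 ~= ||B'x||_2 for a random direction x.
x = rng.standard_normal(n)
ratio = np.linalg.norm(Bp @ x) / np.linalg.norm(B @ x)
assert 0.5 < ratio < 2.0
```

Here the factor 20·log n plays the role of the theorem's O(log n) oversampling; the guarantee in the slide holds for all x simultaneously, while this check only probes one random direction.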
MY APPROACH TO ALGORITHM DESIGN
Extend insights gained from studying problems at the intersection of multiple areas back to these areas
Combinatorics / Discrete algorithms
Numerical analysis /Optimization
Statistics /Randomized algorithms
Problems at their intersection
Algorithmic extensions of structure-preserving sampling
Maximum flow
Solving linear systems
Preserving L1-structures
SUMMARY
• Algorithm design approach: study problems at the intersection of areas, and extend insights back.
• Can sparsify objects via importance sampling.
Combinatorics / Discrete algorithms
Numerical analysis / Optimization
Statistics / Randomized algorithms
Graph Laplacian:
• Diagonal: degree
• Off-diagonal: -weight
Solvers for linear systems involving graph Laplacians: Lx = b
Current fastest sequential and parallel solvers for linear systems in graph Laplacians
Application: estimate all τᵢ = ║bᵢ║²_{L⁻¹} by solving O(log n) linear systems
Directly related to:
• Elliptic problems
• SDD, M, and H-matrices
ALGORITHMS FOR Lx = b
Given any graph Laplacian L with n vertices and m edges, and any vector b, find a vector x s.t. Lx = b
[Vaidya `89]: use graph theory!
[Spielman-Teng `04]: O(m log^c n)
[Plot: log-log plot of the exponent c over time: 2004: 70, 2006: 32, 2009: 15, 2010: 6, 2010: 2, 2011: 1, 2014: 1/2]
[P-Spielman `14]: alternate, fully parallelizable approach
ITERATIVE METHODS
Division using multiplication: I + A + A² + A³ + … = (I - A)⁻¹ = L⁻¹
Simplification: assume L = I - A, where A is the transition matrix of a random walk
Spectral theorem: can view the matrices as scalars
Richardson iteration: truncate to i terms, approximating x = (I - A)⁻¹b with x⁽ⁱ⁾ = (I + A + … + Aⁱ)b
RICHARDSON ITERATION
#terms needed is lower bounded by information propagation: A^diameter b
Highly connected graphs: few terms ok (b, Ab, A²b, …)
Evaluation (Horner's rule):
• (I + A + A²)b = A(Ab + b) + b
• i terms: x⁽⁰⁾ = b, x⁽ⁱ⁺¹⁾ = Ax⁽ⁱ⁾ + b
i matrix-vector multiplications; need n matrix operations?
Can interpret as gradient descent
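The Horner-style recurrence above can be sketched in a few lines of NumPy (the random 5×5 walk matrix and the iteration count are choices made here for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# A: a nonnegative matrix with row sums 0.9, so its spectral radius is < 1
# and the series I + A + A^2 + ... converges to (I - A)^{-1}.
W = rng.random((5, 5))
A = 0.9 * W / W.sum(axis=1, keepdims=True)

b = rng.standard_normal(5)
exact = np.linalg.solve(np.eye(5) - A, b)

# Richardson iteration via Horner's rule: x(0) = b, x(i+1) = A x(i) + b,
# which evaluates the truncated series (I + A + ... + A^i) b.
x = b.copy()
for _ in range(300):
    x = A @ x + b

assert np.allclose(x, exact)
```

Each step is one matrix-vector multiplication, matching the "i terms, i multiplications" count on the slide; the geometric decay rate 0.9 here controls how many terms are needed.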
(I - A)⁻¹ = I + A + A² + A³ + … = (I + A)(I + A²)(I + A⁴)…
DEGREE n ⇒ n OPERATIONS?
Repeated squaring: A¹⁶ = (((A²)²)²)², 4 operations
(I + A)(I + A²)(I + A⁴)…: O(log n) terms ok; similar to multi-level methods
Combinatorial view:
• A: step of a random walk
• I - A²: Laplacian of the 2-step random walk
I - A² is a dense matrix, but still a graph Laplacian, so we can sparsify!
REPEATED SPARSE SQUARING
(I - A)⁻¹ = (I + A)(I + A²)(I + A⁴)…
Combining known tools: efficiently sparsify I - A² without computing A²
[P-Spielman `14]: approximate L⁻¹ with O(log n) sparse matrices
Key ideas: modify the factorization to allow gradual introduction and control of error
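The factorization itself (without the sparsification step) is easy to verify numerically. A minimal sketch, with a random walk matrix chosen here as a stand-in:

```python
import numpy as np

rng = np.random.default_rng(2)

# Walk-like matrix with spectral radius < 1, so the Neumann series converges.
W = rng.random((6, 6))
A = 0.8 * W / W.sum(axis=1, keepdims=True)

# (I - A)^{-1} = (I + A)(I + A^2)(I + A^4)...: the k-th partial product
# equals the first 2^k terms of the series, so each factor doubles coverage.
I = np.eye(6)
prod, Ak = I.copy(), A.copy()
for _ in range(20):          # 20 squarings cover 2^20 series terms
    prod = prod @ (I + Ak)
    Ak = Ak @ Ak             # repeated squaring: A, A^2, A^4, ...

assert np.allclose(prod, np.linalg.inv(I - A))
```

The algorithm on the slide replaces each dense power A^(2^k) by a sparsifier of the corresponding Laplacian; this sketch only demonstrates why O(log n) factors suffice.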
SUMMARY
• Algorithm design approach: study problems at the intersection of areas, and extend insights back.
• Can sparsify objects via importance sampling.
• Solve Lx = b efficiently via sparsified squaring.
FEW ITERATIONS OF Lx = b
• [Tutte `61]: graph drawing, embeddings
• [ZGL `03], [ZHS `05]: inference on graphical models
Inverse powering: eigenvectors / heat kernel:
• [AM `85]: spectral clustering
• [OSV `12]: balanced cuts
• [SM `01], [KMST `09]: image segmentation
• [CFMNPW `14]: Helmholtz decomposition on 3D meshes
MANY ITERATIONS OF Lx = b
[Karmarkar, Ye, Renegar, Nesterov, Nemirovski, …]: convex optimization via solving O(m^{1/2}) linear systems
[DS `08]: optimization on graphs via Laplacian systems
[KM `09], [MST `14]: random spanning trees
[CKMST `11]: faster approximate maximum flow
[KMP `12]: multicommodity flow
MAXFLOW
Combinatorics / Discrete algorithms
Numerical analysis /Optimization
Statistics /Randomized algorithms
Maximum flow
First O(m polylog(n)) time algorithm for approximate maxflow (for unweighted, undirected graphs)
MAXIMUM FLOW PROBLEM
Given s, t, find the maximum number of disjoint s-t paths
Dual: separate s and t by removing the fewest edges
Applications:
• Clustering
• Image processing
• Scheduling
WHAT MAKES MAXFLOW HARD
Highly connected: may route up to n paths
Long paths: a single step may involve n vertices
Each is `easy' on its own
Goal: handle both, and do better than (many steps) × (long paths) = n²
ALGORITHMS FOR FLOWS
Ideas introduced over time:
1970s: blocking flows
1980: dynamic trees
1986: dual algorithms
1989: connections to Lx = b
2010: few calls to Lx = b
2013: modify Lx = b
Current fastest maxflow algorithms:
• Exact (weakly polytime): invoke Lx = b
• Approximate: modify algorithms for Lx = b
[P `14]: (1 - ε)-approx maxflow in O(m log^c n · ε⁻²) time
Algebraic formulation of min s-t cut: minimize ║Bx║₂ subject to x_s = 0, x_t = 1, and x integral
MAXIMUM FLOW IN ALMOST LINEAR TIME
[Madry `10]: finds an O(m^{1+θ})-sized approximator requiring O(m^θ) calls, in O(m^{1+θ}) time (for any θ > 0): O(m^{1+2θ}ε⁻²) time overall
[Racke-Shah-Taubig `14]: O(n)-sized approximator requiring O(log^c n) iterations, built by solving maxflows on graphs of total size O(m log^c n)
Chicken-and-egg problem: approximators need maxflow, and maxflow needs approximators. O(m log^c n · ε⁻²) time?
Algebraic formulation of min s-t cut: minimize ║Bx║₁ subject to x_s = 0, x_t = 1 (║·║₁: 1-norm, sum of absolute values)
[Sherman `13], [Kelner-Lee-Orecchia-Sidford `13]: can find approximate maxflow iteratively via several calls to a structure approximator
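As a small sanity check of the 1-norm formulation (the 4-vertex example graph is chosen here, not taken from the slides): for a 0/1 vertex labeling x with x_s = 0 and x_t = 1, each entry of Bx is ±1 exactly on cut-crossing edges, so ║Bx║₁ equals the cut size.

```python
import numpy as np

# 4-vertex graph: s = 0, t = 3, two disjoint s-t paths of length 2.
edges = [(0, 1), (1, 3), (0, 2), (2, 3)]
m, n = len(edges), 4

# Edge-vertex incidence matrix with the usual -1/+1 convention.
B = np.zeros((m, n))
for e, (u, v) in enumerate(edges):
    B[e, u], B[e, v] = -1.0, 1.0

# Min s-t cut formulation: minimize ||Bx||_1 with x_s = 0, x_t = 1.
# |B_e x| = |x_u - x_v| = 1 exactly when edge e = uv crosses the cut.
x = np.array([0.0, 0.0, 1.0, 1.0])   # cut {s, 1, ...}? here: {0, 1} vs {2, 3}
assert np.abs(B @ x).sum() == 2.0    # both paths are severed once: cut size 2
```

The min cut here is 2 (two disjoint paths), and this labeling achieves it; the cited algorithms minimize the same objective over fractional x.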
ALGORITHMIC SOLUTION
Ultra-sparsifiers (e.g. [Koutis-Miller-P `10]): for any k, can find H close to G, but equivalent to a graph of size O(m/k)
Key step: vertex reductions via edge reductions
[P `14]: build the approximator on the smaller graph; absorb the additional (small) error via more calls to the approximator; recurse on instances with smaller total size. Total cost: O(m log^c n)
[CLMPPS `15]: extends to numerical data; has close connections to variants of Nystrom's method
SUMMARY
• Algorithm design approach: study problems at the intersection of areas, and extend insights back.
• Can sparsify objects via importance sampling.
• Solve Lx = b efficiently via sparsified squaring.
• Approximate maximum flow routines and structure approximators can be constructed recursively from each other via graph sparsifiers.
RANDOMIZED NUMERICAL LINEAR ALGEBRA
Combinatorics / Discrete algorithms
Numerical analysis /Optimization
Statistics /Randomized algorithms
L1-preserving row sampling
First near-optimal routine for row sampling matrices in a 1-norm preserving manner
GENERALIZATION
Generalization of row sampling: given A, q, find A' s.t. ║Ax║_q ≈ ║A'x║_q ∀ x
q-norm: ║y║_q = (Σ|yᵢ|^q)^{1/q}
1-norm: standard for representing cuts; used in sparse recovery / robust regression
Applications (for general A):
• Feature selection
• Low-rank approximation / PCA
(omitting the corresponding empirical studies)
ROW SAMPLING ROUTINES
(#rows needed for A' s.t. ║Ax║_q ≈ ║A'x║_q ∀ x; nnz: # of non-zeros in A)

                                 #rows, q=2   #rows, q=1        Runtime
Dasgupta et al. `09                           n^2.5             mn^5
Magdon-Ismail `10                nlog²n                         mn²
Sohler-Woodruff `11                           n^3.5             mn^{ω-1+θ}
Drineas et al. `12               nlogn                          mnlogn
Clarkson et al. `12                           n^4.5 log^1.5 n   mnlogn
Clarkson-Woodruff `12            n²logn       n^8               nnz
Mahoney-Meng `12                 n²           n^3.5             nnz + n^6
Nelson-Nguyen `12                n^{1+θ}                        nnz
Li et al. `13, Cohen et al. `14  nlogn        n^3.66            nnz + n^{ω+θ}

[Naor `11], [Matousek `97]: on graphs, L2 approx ⇒ Lq approx ∀ 1 ≤ q ≤ 2
How special are graphs? How special is L2?
L1 ROW SAMPLING
L1 Lewis weights ([Lewis `78]): w s.t. wᵢ² = aᵢᵀ(AᵀW⁻¹A)⁻¹aᵢ
A recursive definition!
Can check: Σᵢ wᵢ ≤ n, so O(n log n) rows
Sampling with pᵢ ≥ wᵢ·O(log n) gives ║Ax║₁ ≈ ║A'x║₁ ∀x
[Talagrand `90, "Embedding subspaces of L1 into L1^N"]: can be analyzed as row sampling / sparsification
[COHEN-P `14]
Fixed point iteration: update w on the LHS with w on the RHS,
w'ᵢ = (aᵢᵀ(AᵀW⁻¹A)⁻¹aᵢ)^{1/2}
Converges in O(log log n) steps: analyze AᵀW⁻¹A spectrally
Aside: similar to iteratively reweighted least squares
Elementary, optimization-motivated proof of w.h.p. concentration for L1

q          Previous # of rows   New # of rows     Runtime
1          n^2.5                nlogn             nnz + n^{ω+θ}
1 < q < 2  n^{q/2+2}            nlogn(loglogn)²   nnz + n^{ω+θ}
2 < q      n^{q+1}              n^{q/2}logn       nnz + n^{q/2+O(1)}
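The fixed point iteration above is short enough to run directly. A hedged sketch (the Gaussian test matrix and the iteration count of 30 are choices made here; the O(log log n) bound is the theoretical rate):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 500, 8
A = rng.standard_normal((m, n))   # stand-in matrix for illustration

# L1 Lewis weights: the fixed point of w_i^2 = a_i^T (A^T W^{-1} A)^{-1} a_i,
# computed by repeatedly substituting the current w on the right-hand side.
w = np.ones(m)
for _ in range(30):
    # A^T W^{-1} A, with W = diag(w): scale row i of A by 1/w_i.
    M = np.linalg.inv(A.T @ (A / w[:, None]))
    w = np.sqrt(np.einsum('ij,jk,ik->i', A, M, A))

# Verify the fixed point and the row-count bound sum_i w_i <= n.
M = np.linalg.inv(A.T @ (A / w[:, None]))
fp = np.sqrt(np.einsum('ij,jk,ik->i', A, M, A))
assert np.allclose(w, fp, rtol=1e-4)
assert w.sum() <= n + 1e-6
```

At the fixed point the wᵢ are exactly the leverage scores of W^{-1/2}A, which is why they sum to at most n and yield O(n log n) sampled rows.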
SUMMARY
• Algorithm design approach: study problems at the intersection of areas, and extend insights back.
• Can sparsify objects via importance sampling.
• Solve Lx = b efficiently via sparsified squaring.
• Approximate maximum flow routines and cut-approximators can be constructed recursively from each other via graph sparsifiers.
• Wider ranges of structures can be sparsified; key statistical quantities can be computed iteratively.
I'VE ALSO WORKED ON
• Dynamic graph data structures
• Graph partitioning
• Parallel algorithms
• Image processing
• Anomaly / sybil detection in graphs
FUTURE WORK: LINEAR SYSTEM SOLVERS
• Wider classes of linear systems
• Relation to optimization / learning
Combinatorics / Discrete algorithms
Numerical analysis /Optimization
Statistics /Randomized algorithms
Mx = b: solvers for linear systems involving graph Laplacians
FUTURE WORK: COMBINATORIAL OPTIMIZATION
Faster algorithms for more classical algorithmic graph theory problems?
Combinatorics / Discrete algorithms
Numerical analysis /Optimization
Statistics /Randomized algorithms
Maximum flow
FUTURE WORK: RANDOMIZED NUMERICAL LINEAR ALGEBRA
• Other algorithmic applications of Lewis weights?
• Low-rank approximation in L1?
• O(n)-sized L1-preserving row samples? (these exist for L2)
Combinatorics / Discrete algorithms
Numerical analysis /Optimization
Statistics /Randomized algorithms
L1-preserving row sampling
SUMMARY
Combinatorics / Discrete algorithms
Numerical analysis / Optimization
Statistics /Randomized algorithms
Problems at their intersection
Links to arXiv manuscripts and videos of more detailed talks are at:
math.mit.edu/~rpeng/