Multifaceted Algorithm Design
Richard Peng, M.I.T.
LARGE SCALE PROBLEMS
Emphasis on efficient algorithms in:
• Scientific computing
• Graph theory
• (Randomized) numerical routines
Network Analysis
Physical Simulation
Optimization
WELL STUDIED QUESTIONS
Scientific computing: fast solvers for structured linear systems
Graphs / combinatorics: network flow problems
Randomized algorithms: subsampling matrices and optimization formulations
MY REPRESENTATIVE RESULTS
Current fastest sequential and parallel solvers for linear systems in graph Laplacian matrices
First nearly-linear time algorithm for approximate undirected maxflow
First near-optimal routine for row sampling matrices in a 1-norm preserving manner
RECURRING IDEAS
Can solve a problem by iteratively solving several similar instances
Approximations lead to better approximations
Larger problems can be approximated by smaller ones
MY APPROACH TO ALGORITHM DESIGN
Numerical analysis /Optimization
Statistics /Randomized algorithms
Problems at their intersection
Identify problems that arise at the intersection of multiple areas and study them from multiple angles
Combinatorics / Discrete algorithms
This talk: structure-preserving sampling
SAMPLING
Classical use in statistics:
• Extract info from a large data set
• Directly output result (estimator)
Sampling from matrices, networks, and optimization problems:
• Often compute on the sample
• Need to preserve more structure
PRESERVING GRAPH STRUCTURES
Undirected graph, n vertices, m < n² edges
Is n² edges (dense) sometimes necessary?
For some information, e.g. connectivity: encoded by a spanning forest, < n edges
Deterministic, O(m) time algorithm
MORE INTRICATE STRUCTURES
Cut: # of edges leaving a subset of vertices
k-connectivity: # of disjoint s-t paths (Menger's theorem / maxflow-mincut)
[Benczur-Karger `96]: for ANY G, can sample to get H with O(n log n) edges s.t. G ≈ H on all cuts
Stronger: preserves the weights of all 2ⁿ cuts in the graph
≈: multiplicative approximation
HOW TO SAMPLE?
Widely used: uniform sampling
Works well when the data is uniform, e.g. a complete graph
Problem: long path, where removing any edge changes connectivity
(can also have both in one graph)
More systematic view of sampling?
ALGEBRAIC REPRESENTATION OF GRAPHS
n vertices, m edges
Graph Laplacian matrix L (n rows / columns, O(m) non-zeros):
• Diagonal: degree
• Off-diagonal: -(edge weights)
Edge-vertex incidence matrix B (m rows, n columns):
B_eu = -1/1 if u is an endpoint of e, 0 otherwise
L is the Gram matrix of B: L = BᵀB
Example (3 vertices, edges {1,2} and {1,3}):
L = [[2, -1, -1], [-1, 1, 0], [-1, 0, 1]],  B = [[-1, 1, 0], [-1, 0, 1]]
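The Gram-matrix relation and the cut interpretation can be checked directly. A minimal NumPy sketch (the triangle graph below is an example chosen here, not from the talk): build the incidence matrix B, the Laplacian L, and verify L = BᵀB and that xᵀLx counts cut edges for a 0/1 indicator x.

```python
import numpy as np

# Small illustrative graph (chosen for this sketch): a triangle on vertices 0, 1, 2.
edges = [(0, 1), (1, 2), (0, 2)]
n, m = 3, len(edges)

# Edge-vertex incidence matrix: B[e, u] = -1, B[e, v] = +1 for edge e = (u, v).
B = np.zeros((m, n))
for e, (u, v) in enumerate(edges):
    B[e, u], B[e, v] = -1.0, 1.0

# Graph Laplacian: degrees on the diagonal, -1 per edge off the diagonal.
L = np.diag([2.0, 2.0, 2.0]) - (np.ones((3, 3)) - np.eye(3))

# L is the Gram matrix of B.
assert np.allclose(L, B.T @ B)

# For a 0/1 indicator vector x, x^T L x = ||Bx||_2^2 = size of the cut.
x = np.array([1.0, 0.0, 0.0])   # cut {0} vs {1, 2} has 2 crossing edges
assert x @ L @ x == 2.0
```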
SPECTRAL SIMILARITY
Numerical analysis: L_G ≈ L_H if xᵀL_Gx ≈ xᵀL_Hx for all vectors x
Gram matrix: L_G = B_GᵀB_G, so xᵀL_Gx = ║B_Gx║₂² (where ║y║₂² = Σᵢ yᵢ²)
For edge e = uv, (B_e x)² = (x_u - x_v)²: e.g. x_u = 1, x_v = 0 contributes (1-0)² = 1, while x_u = x_v = 1 contributes (1-1)² = 0
Taking x = {0, 1}^V: ║B_Gx║₂² = size of the cut given by x, so spectral similarity implies G ≈ H on all cuts
Equivalent condition: ║B_Gx║₂ ≈ ║B_Hx║₂ ∀ x
ALGEBRAIC VIEW OF SAMPLING EDGES
L2 row sampling: given B with m >> n, sample a few rows to form B' s.t. ║Bx║₂ ≈ ║B'x║₂ ∀ x
Note: the numerical linear algebra literature normally uses A instead of B, and n, d instead of m, n
IMPORTANCE SAMPLING
Keep a row, bᵢ, with probability pᵢ; rescale if kept to maintain expectation
Uniform sampling: pᵢ = 1/k for a factor-k size reduction
Issue: can miss the only non-zero row
Norm sampling: pᵢ = m║bᵢ║₂² / (k║B║_F²)
Issue: can miss a column with only one entry
THE `RIGHT' PROBABILITIES
Uniform sampling fails when only one row is non-zero; norm sampling fails on a column with one entry
Path + clique: clique edges only need pᵢ ≈ n/m, while the path edge needs pᵢ = 1
bᵢ: row i of B, L = BᵀB
τ: L2 statistical leverage scores, τᵢ = bᵢᵀ(BᵀB)⁻¹bᵢ = ║bᵢ║²_{L⁻¹}
L2 MATRIX-CHERNOFF BOUNDS
[Rudelson-Vershynin `07], [Tropp `12]: sampling with pᵢ ≥ τᵢ·O(log n) gives B' s.t. ║Bx║₂ ≈ ║B'x║₂ ∀x w.h.p.
[Foster `49]: Σᵢ τᵢ = rank ≤ n, so O(n log n) rows suffice
τ: L2 statistical leverage scores, τᵢ = bᵢᵀ(BᵀB)⁻¹bᵢ = ║bᵢ║²_{L⁻¹}
Near optimal for:
• L2 row samples of B
• Graph sparsifiers
• In practice, O(log n) → 5 usually suffices
• Can also improve via derandomization
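A hedged sketch of leverage-score sampling (the tall Gaussian matrix and the oversampling constant 20 are choices made here, not values from the talk): compute τᵢ = bᵢᵀ(BᵀB)⁻¹bᵢ, confirm Foster's identity Σᵢτᵢ = rank, then keep and rescale rows.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tall matrix standing in for the incidence matrix B (m >> n); Gaussian for illustration.
m, n = 2000, 10
B = rng.standard_normal((m, n))

# L2 statistical leverage scores: tau_i = b_i^T (B^T B)^{-1} b_i.
Linv = np.linalg.inv(B.T @ B)
tau = np.einsum('ij,jk,ik->i', B, Linv, B)

# Foster `49: the leverage scores sum to rank(B) <= n.
assert abs(tau.sum() - n) < 1e-6

# Importance sampling: keep row i with probability p_i ~ tau_i * O(log n);
# rescale kept rows by 1/sqrt(p_i) so B'^T B' is an unbiased estimate of B^T B.
p = np.minimum(1.0, tau * 20 * np.log(n))
keep = rng.random(m) < p
Bp = B[keep] / np.sqrt(p[keep])[:, None]

# Spectral check: ||Bx||_2 ~= ||B'x||_2 for a random direction x.
x = rng.standard_normal(n)
ratio = np.linalg.norm(Bp @ x) / np.linalg.norm(B @ x)
assert 0.5 < ratio < 2.0
```

Here the factor 20·log n plays the role of the theorem's O(log n) oversampling; the guarantee in the slide holds for all x simultaneously, while this check only probes one random direction.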
MY APPROACH TO ALGORITHM DESIGN
Extend insights gained from studying problems at the intersection of multiple areas back to these areas
Combinatorics / Discrete algorithms
Numerical analysis /Optimization
Statistics /Randomized algorithms
Problems at their intersection
Algorithmic extensions of structure-preserving sampling
Maximum flow
Solving linear systems
Preserving L1-structures
SUMMARY
• Algorithm design approach: study problems at the intersection of areas, and extend insights back.
• Can sparsify objects via importance sampling.
Combinatorics / Discrete algorithms
Numerical analysis / Optimization
Statistics / Randomized algorithms
Graph Laplacian:
• Diagonal: degree
• Off-diagonal: -weight
Solvers for linear systems involving graph Laplacians: Lx = b
Current fastest sequential and parallel solvers for linear systems in graph Laplacians
Application: estimate all τᵢ = ║bᵢ║²_{L⁻¹} by solving O(log n) linear systems
Directly related to:
• Elliptic problems
• SDD, M, and H-matrices
ALGORITHMS FOR Lx = b
Given any graph Laplacian L with n vertices and m edges, and any vector b, find a vector x s.t. Lx = b
[Vaidya `89]: use graph theory!
[Spielman-Teng `04]: O(m log^c n)
[Plot: log-log plot of the exponent c over time: 2004: 70, 2006: 32, 2009: 15, 2010: 6, 2010: 2, 2011: 1, 2014: 1/2]
[P-Spielman `14]: alternate, fully parallelizable approach
ITERATIVE METHODS
Division using multiplication: I + A + A² + A³ + … = (I - A)⁻¹ = L⁻¹
Simplification: assume L = I - A, where A is the transition matrix of a random walk
Spectral theorem: can view the matrices as scalars
Richardson iteration: truncate to i terms, approximating x = (I - A)⁻¹b with x⁽ⁱ⁾ = (I + A + … + Aⁱ)b
RICHARDSON ITERATION
#terms needed is lower bounded by information propagation: A^diameter b
Highly connected graphs: few terms ok (b, Ab, A²b, …)
Evaluation (Horner's rule):
• (I + A + A²)b = A(Ab + b) + b
• i terms: x⁽⁰⁾ = b, x⁽ⁱ⁺¹⁾ = Ax⁽ⁱ⁾ + b
i matrix-vector multiplications; need n matrix operations?
Can interpret as gradient descent
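The Horner-style recurrence above can be sketched in a few lines of NumPy (the random 5×5 walk matrix and the iteration count are choices made here for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# A: a nonnegative matrix with row sums 0.9, so its spectral radius is < 1
# and the series I + A + A^2 + ... converges to (I - A)^{-1}.
W = rng.random((5, 5))
A = 0.9 * W / W.sum(axis=1, keepdims=True)

b = rng.standard_normal(5)
exact = np.linalg.solve(np.eye(5) - A, b)

# Richardson iteration via Horner's rule: x(0) = b, x(i+1) = A x(i) + b,
# which evaluates the truncated series (I + A + ... + A^i) b.
x = b.copy()
for _ in range(300):
    x = A @ x + b

assert np.allclose(x, exact)
```

Each step is one matrix-vector multiplication, matching the "i terms, i multiplications" count on the slide; the geometric decay rate 0.9 here controls how many terms are needed.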
(I - A)⁻¹ = I + A + A² + A³ + … = (I + A)(I + A²)(I + A⁴)…
DEGREE n ⇒ n OPERATIONS?
Repeated squaring: A¹⁶ = (((A²)²)²)², 4 operations
(I + A)(I + A²)(I + A⁴)…: O(log n) terms ok; similar to multi-level methods
Combinatorial view:
• A: step of a random walk
• I - A²: Laplacian of the 2-step random walk
I - A² is a dense matrix, but still a graph Laplacian, so we can sparsify!
REPEATED SPARSE SQUARING
(I - A)⁻¹ = (I + A)(I + A²)(I + A⁴)…
Combining known tools: efficiently sparsify I - A² without computing A²
[P-Spielman `14]: approximate L⁻¹ with O(log n) sparse matrices
Key ideas: modify the factorization to allow gradual introduction and control of error
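The factorization itself (without the sparsification step) is easy to verify numerically. A minimal sketch, with a random walk matrix chosen here as a stand-in:

```python
import numpy as np

rng = np.random.default_rng(2)

# Walk-like matrix with spectral radius < 1, so the Neumann series converges.
W = rng.random((6, 6))
A = 0.8 * W / W.sum(axis=1, keepdims=True)

# (I - A)^{-1} = (I + A)(I + A^2)(I + A^4)...: the k-th partial product
# equals the first 2^k terms of the series, so each factor doubles coverage.
I = np.eye(6)
prod, Ak = I.copy(), A.copy()
for _ in range(20):          # 20 squarings cover 2^20 series terms
    prod = prod @ (I + Ak)
    Ak = Ak @ Ak             # repeated squaring: A, A^2, A^4, ...

assert np.allclose(prod, np.linalg.inv(I - A))
```

The algorithm on the slide replaces each dense power A^(2^k) by a sparsifier of the corresponding Laplacian; this sketch only demonstrates why O(log n) factors suffice.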
SUMMARY
• Algorithm design approach: study problems at the intersection of areas, and extend insights back.
• Can sparsify objects via importance sampling.
• Solve Lx = b efficiently via sparsified squaring.
FEW ITERATIONS OF Lx = b
• [Tutte `61]: graph drawing, embeddings
• [ZGL `03], [ZHS `05]: inference on graphical models
Inverse powering: eigenvectors / heat kernel:
• [AM `85]: spectral clustering
• [OSV `12]: balanced cuts
• [SM `01], [KMST `09]: image segmentation
• [CFMNPW `14]: Helmholtz decomposition on 3D meshes
MANY ITERATIONS OF Lx = b
[Karmarkar, Ye, Renegar, Nesterov, Nemirovski, …]: convex optimization via solving O(m^{1/2}) linear systems
[DS `08]: optimization on graphs via Laplacian systems
[KM `09], [MST `14]: random spanning trees
[CKMST `11]: faster approximate maximum flow
[KMP `12]: multicommodity flow
MAXFLOW
Combinatorics / Discrete algorithms
Numerical analysis /Optimization
Statistics /Randomized algorithms
Maximum flow
First O(m polylog(n)) time algorithm for approximate maxflow (for unweighted, undirected graphs)
MAXIMUM FLOW PROBLEM
Given s, t, find the maximum number of disjoint s-t paths
Dual: separate s and t by removing the fewest edges
Applications:
• Clustering
• Image processing
• Scheduling
WHAT MAKES MAXFLOW HARD
Highly connected: may route up to n paths
Long paths: a single step may involve n vertices
Each is `easy' on its own
Goal: handle both, and do better than (many steps) × (long paths) = n²
ALGORITHMS FOR FLOWS
Ideas introduced over time:
1970s: blocking flows
1980: dynamic trees
1986: dual algorithms
1989: connections to Lx = b
2010: few calls to Lx = b
2013: modify Lx = b
Current fastest maxflow algorithms:
• Exact (weakly polytime): invoke Lx = b
• Approximate: modify algorithms for Lx = b
[P `14]: (1 - ε)-approx maxflow in O(m log^c n · ε⁻²) time
Algebraic formulation of min s-t cut: minimize ║Bx║₂ subject to x_s = 0, x_t = 1, and x integral
MAXIMUM FLOW IN ALMOST LINEAR TIME
[Madry `10]: finds an O(m^{1+θ})-sized approximator requiring O(m^θ) calls, in O(m^{1+θ}) time (for any θ > 0): O(m^{1+2θ}ε⁻²) time overall
[Racke-Shah-Taubig `14]: O(n)-sized approximator requiring O(log^c n) iterations, built by solving maxflows on graphs of total size O(m log^c n)
Chicken-and-egg problem: approximators need maxflow, and maxflow needs approximators. O(m log^c n · ε⁻²) time?
Algebraic formulation of min s-t cut: minimize ║Bx║₁ subject to x_s = 0, x_t = 1 (║·║₁: 1-norm, sum of absolute values)
[Sherman `13], [Kelner-Lee-Orecchia-Sidford `13]: can find approximate maxflow iteratively via several calls to a structure approximator
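As a small sanity check of the 1-norm formulation (the 4-vertex example graph is chosen here, not taken from the slides): for a 0/1 vertex labeling x with x_s = 0 and x_t = 1, each entry of Bx is ±1 exactly on cut-crossing edges, so ║Bx║₁ equals the cut size.

```python
import numpy as np

# 4-vertex graph: s = 0, t = 3, two disjoint s-t paths of length 2.
edges = [(0, 1), (1, 3), (0, 2), (2, 3)]
m, n = len(edges), 4

# Edge-vertex incidence matrix with the usual -1/+1 convention.
B = np.zeros((m, n))
for e, (u, v) in enumerate(edges):
    B[e, u], B[e, v] = -1.0, 1.0

# Min s-t cut formulation: minimize ||Bx||_1 with x_s = 0, x_t = 1.
# |B_e x| = |x_u - x_v| = 1 exactly when edge e = uv crosses the cut.
x = np.array([0.0, 0.0, 1.0, 1.0])   # cut {s, 1, ...}? here: {0, 1} vs {2, 3}
assert np.abs(B @ x).sum() == 2.0    # both paths are severed once: cut size 2
```

The min cut here is 2 (two disjoint paths), and this labeling achieves it; the cited algorithms minimize the same objective over fractional x.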
ALGORITHMIC SOLUTION
Ultra-sparsifiers (e.g. [Koutis-Miller-P `10]): for any k, can find H close to G, but equivalent to a graph of size O(m/k)
Key step: vertex reductions via edge reductions
[P `14]: build the approximator on the smaller graph; absorb the additional (small) error via more calls to the approximator; recurse on instances with smaller total size. Total cost: O(m log^c n)
[CLMPPS `15]: extends to numerical data; has close connections to variants of Nystrom's method
SUMMARY
• Algorithm design approach: study problems at the intersection of areas, and extend insights back.
• Can sparsify objects via importance sampling.
• Solve Lx = b efficiently via sparsified squaring.
• Approximate maximum flow routines and structure approximators can be constructed recursively from each other via graph sparsifiers.
RANDOMIZED NUMERICAL LINEAR ALGEBRA
Combinatorics / Discrete algorithms
Numerical analysis /Optimization
Statistics /Randomized algorithms
L1-preserving row sampling
First near-optimal routine for row sampling matrices in a 1-norm preserving manner
GENERALIZATION
Generalization of row sampling: given A, q, find A' s.t. ║Ax║_q ≈ ║A'x║_q ∀ x
q-norm: ║y║_q = (Σ|yᵢ|^q)^{1/q}
1-norm: standard for representing cuts; used in sparse recovery / robust regression
Applications (for general A):
• Feature selection
• Low-rank approximation / PCA
(omitting the corresponding empirical studies)
ROW SAMPLING ROUTINES
(#rows needed for A' s.t. ║Ax║_q ≈ ║A'x║_q ∀ x; nnz: # of non-zeros in A)

                                 #rows, q=2   #rows, q=1        Runtime
Dasgupta et al. `09                           n^2.5             mn^5
Magdon-Ismail `10                nlog²n                         mn²
Sohler-Woodruff `11                           n^3.5             mn^{ω-1+θ}
Drineas et al. `12               nlogn                          mnlogn
Clarkson et al. `12                           n^4.5 log^1.5 n   mnlogn
Clarkson-Woodruff `12            n²logn       n^8               nnz
Mahoney-Meng `12                 n²           n^3.5             nnz + n^6
Nelson-Nguyen `12                n^{1+θ}                        nnz
Li et al. `13, Cohen et al. `14  nlogn        n^3.66            nnz + n^{ω+θ}

[Naor `11], [Matousek `97]: on graphs, L2 approx ⇒ Lq approx ∀ 1 ≤ q ≤ 2
How special are graphs? How special is L2?
L1 ROW SAMPLING
L1 Lewis weights ([Lewis `78]): w s.t. wᵢ² = aᵢᵀ(AᵀW⁻¹A)⁻¹aᵢ
A recursive definition!
Can check: Σᵢ wᵢ ≤ n, so O(n log n) rows
Sampling with pᵢ ≥ wᵢ·O(log n) gives ║Ax║₁ ≈ ║A'x║₁ ∀x
[Talagrand `90, "Embedding subspaces of L1 into L1^N"]: can be analyzed as row sampling / sparsification
[COHEN-P `14]
Fixed point iteration: update w on the LHS with w on the RHS,
w'ᵢ = (aᵢᵀ(AᵀW⁻¹A)⁻¹aᵢ)^{1/2}
Converges in O(log log n) steps: analyze AᵀW⁻¹A spectrally
Aside: similar to iteratively reweighted least squares
Elementary, optimization-motivated proof of w.h.p. concentration for L1

q          Previous # of rows   New # of rows     Runtime
1          n^2.5                nlogn             nnz + n^{ω+θ}
1 < q < 2  n^{q/2+2}            nlogn(loglogn)²   nnz + n^{ω+θ}
2 < q      n^{q+1}              n^{q/2}logn       nnz + n^{q/2+O(1)}
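The fixed point iteration above is short enough to run directly. A hedged sketch (the Gaussian test matrix and the iteration count of 30 are choices made here; the O(log log n) bound is the theoretical rate):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 500, 8
A = rng.standard_normal((m, n))   # stand-in matrix for illustration

# L1 Lewis weights: the fixed point of w_i^2 = a_i^T (A^T W^{-1} A)^{-1} a_i,
# computed by repeatedly substituting the current w on the right-hand side.
w = np.ones(m)
for _ in range(30):
    # A^T W^{-1} A, with W = diag(w): scale row i of A by 1/w_i.
    M = np.linalg.inv(A.T @ (A / w[:, None]))
    w = np.sqrt(np.einsum('ij,jk,ik->i', A, M, A))

# Verify the fixed point and the row-count bound sum_i w_i <= n.
M = np.linalg.inv(A.T @ (A / w[:, None]))
fp = np.sqrt(np.einsum('ij,jk,ik->i', A, M, A))
assert np.allclose(w, fp, rtol=1e-4)
assert w.sum() <= n + 1e-6
```

At the fixed point the wᵢ are exactly the leverage scores of W^{-1/2}A, which is why they sum to at most n and yield O(n log n) sampled rows.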
SUMMARY
• Algorithm design approach: study problems at the intersection of areas, and extend insights back.
• Can sparsify objects via importance sampling.
• Solve Lx = b efficiently via sparsified squaring.
• Approximate maximum flow routines and cut-approximators can be constructed recursively from each other via graph sparsifiers.
• Wider ranges of structures can be sparsified; key statistical quantities can be computed iteratively.
I'VE ALSO WORKED ON
• Dynamic graph data structures
• Graph partitioning
• Parallel algorithms
• Image processing
• Anomaly / sybil detection in graphs
FUTURE WORK: LINEAR SYSTEM SOLVERS
• Wider classes of linear systems
• Relation to optimization / learning
Combinatorics / Discrete algorithms
Numerical analysis /Optimization
Statistics /Randomized algorithms
Mx = b: solvers for linear systems involving graph Laplacians
FUTURE WORK: COMBINATORIAL OPTIMIZATION
Faster algorithms for more classical algorithmic graph theory problems?
Combinatorics / Discrete algorithms
Numerical analysis /Optimization
Statistics /Randomized algorithms
Maximum flow
FUTURE WORK: RANDOMIZED NUMERICAL LINEAR ALGEBRA
• Other algorithmic applications of Lewis weights?
• Low-rank approximation in L1?
• O(n)-sized L1-preserving row samples? (these exist for L2)
Combinatorics / Discrete algorithms
Numerical analysis /Optimization
Statistics /Randomized algorithms
L1-preserving row sampling
SUMMARY
Combinatorics / Discrete algorithms
Numerical analysis / Optimization
Statistics /Randomized algorithms
Problems at their intersection
Links to arXiv manuscripts and videos of more detailed talks are at:
math.mit.edu/~rpeng/