Sketching as a Tool for Numerical Linear Algebra
David P. Woodruff
presented by Sepehr Assadi
o(n) Big Data Reading Group
University of Pennsylvania
February, 2015
Sepehr Assadi (Penn) Sketching for Numerical Linear Algebra Big Data Reading Group 1 / 25
Goal

New survey by David Woodruff:
- Sketching as a Tool for Numerical Linear Algebra

Topics:
- Subspace Embeddings
- Least Squares Regression
- Least Absolute Deviation Regression
- Low Rank Approximation
- Graph Sparsification
- Sketching Lower Bounds
Introduction

You have "Big" data!
- Computationally expensive to deal with
- Excessive storage requirements
- Hard to communicate
- ...

Summarize your data:
- Sampling
  - A representative subset of the data
- Sketching
  - An aggregate summary of the whole data
Model

Input:
- matrix A ∈ R^{n×d}
- vector b ∈ R^n

Output: function F(A, b, ...)
- e.g. least squares regression

Different goals:
- Faster algorithms
- Streaming
- Distributed
Linear Sketching

Input:
- matrix A ∈ R^{n×d}

Let r ≪ n and let S ∈ R^{r×n} be a random matrix.
Let S·A be the sketch.
Compute F(S·A) instead of F(A).
Linear Sketching (cont.)

Pros:
- Compute on an r×d matrix instead of n×d
- Smaller representation and faster computation
- Linearity:
  - S·(A + B) = S·A + S·B
  - We can compose linear sketches!

Cons:
- F(S·A) is an approximation of F(A)
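The linearity property is easy to verify numerically. A minimal numpy sketch (not from the survey; the dimensions and the Gaussian sketching matrix are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 1000, 20, 100

A = rng.standard_normal((n, d))
B = rng.standard_normal((n, d))
# Gaussian sketching matrix with i.i.d. N(0, 1/r) entries
S = rng.standard_normal((r, n)) / np.sqrt(r)

# Linearity: the sketch of a sum equals the sum of the sketches,
# so sketches of separately held data can be combined after the fact.
lhs = S @ (A + B)
rhs = S @ A + S @ B
print(np.allclose(lhs, rhs))
```

This is what makes linear sketches composable in streaming and distributed settings: each party sketches its own matrix, and the sketches are simply added.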
Least Squares Regression (ℓ2-regression)

Input:
- matrix A ∈ R^{n×d} (full column rank)
- vector b ∈ R^n

Output x* ∈ R^d:

    x* = argmin_x ‖Ax − b‖_2

Closed-form solution:

    x* = (A^T A)^{-1} A^T b

Θ(nd^2)-time algorithm using naive matrix multiplication
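The closed form can be checked against a standard solver. A minimal numpy sketch (dimensions arbitrary); note that in practice QR/SVD-based solvers such as `np.linalg.lstsq` are preferred over forming A^T A explicitly, for numerical stability:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 10
A = rng.standard_normal((n, d))   # full column rank with high probability
b = rng.standard_normal(n)

# Closed form via the normal equations: x* = (A^T A)^{-1} A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Reference solution from an SVD-based solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))
```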
Approximate ℓ2-regression

Input:
- matrix A ∈ R^{n×d} (full column rank)
- vector b ∈ R^n
- parameter 0 < ε < 1

Output x̃ ∈ R^d:

    ‖Ax̃ − b‖_2 ≤ (1 + ε) min_x ‖Ax − b‖_2
Approximate ℓ2-regression (cont.)

A sketching algorithm:
- Sample a random matrix S ∈ R^{r×n}
- Compute S·A and S·b
- Output x̃ = argmin_x ‖(SA)x − (Sb)‖_2

Which randomized family of matrices S, and what value of r?
Approximate ℓ2-regression (cont.)

An introductory construction:
- Let r = Θ(d/ε^2)
- Let S ∈ R^{r×n} be a matrix of i.i.d. normal random variables with mean zero and variance 1/r

Proof Sketch.
On the board.
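The Gaussian construction can be tried out end to end. A numpy illustration, not the survey's analysis: the constant in r = Θ(d/ε^2) is chosen ad hoc, and the comparison is between the exact least-squares residual and the residual of the sketch-and-solve output:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, eps = 5000, 10, 0.5
r = 4 * int(d / eps**2)      # r = Theta(d / eps^2); constant 4 is ad hoc

A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)

# i.i.d. N(0, 1/r) sketching matrix
S = rng.standard_normal((r, n)) / np.sqrt(r)

x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)          # exact: Theta(n d^2)
x_skt, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)  # sketched: r x d problem

res_opt = np.linalg.norm(A @ x_opt - b)
res_skt = np.linalg.norm(A @ x_skt - b)
print(res_skt / res_opt)  # at most 1 + eps with high probability
```

Only the sketched r×d problem is solved exactly; the expensive n-dimensional work is reduced to the single multiplication S·A.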
Approximate ℓ2-regression (cont.)

Problems:
- Computing S·A takes Θ(nrd) time
- Constructing S requires Θ(nr) space

Different constructions for S:
- Fast Johnson-Lindenstrauss transforms:
  O(nd log d) + poly(d/ε) time [Sarlos, FOCS '06]
- Optimal O(nnz(A)) + poly(d/ε) time algorithm [Clarkson, Woodruff, STOC '13]
- Random sign matrices with Θ(d)-wise independent entries:
  O((d^2/ε) log(nd))-space streaming algorithm [Clarkson, Woodruff, STOC '09]
Subspace Embedding

Definition (ℓ2-subspace embedding)
A (1 ± ε) ℓ2-subspace embedding for a matrix A ∈ R^{n×d} is a matrix S for which, for all x ∈ R^d,

    ‖SAx‖_2^2 = (1 ± ε) ‖Ax‖_2^2
This is actually a subspace embedding for the column space of A.

Oblivious ℓ2-subspace embedding:
- The distribution from which S is chosen is oblivious to A

One very common tool for (oblivious) ℓ2-subspace embeddings is the Johnson-Lindenstrauss transform (JLT).
Johnson-Lindenstrauss Transform

Definition (JLT(ε, δ, f))
A random matrix S ∈ R^{r×n} forms a JLT(ε, δ, f) if, with probability at least 1 − δ, for any f-element subset V ⊆ R^n, it holds that:

    ∀ v, v' ∈ V:  |⟨Sv, Sv'⟩ − ⟨v, v'⟩| ≤ ε ‖v‖_2 ‖v'‖_2
Usual statement (i.e. the original Johnson-Lindenstrauss Lemma):

Lemma (JLL)
Given N points q_1, ..., q_N ∈ R^n, there exists a matrix S ∈ R^{t×n} (a linear map) with t = Θ(log N / ε^2) such that, with high probability, simultaneously for all pairs q_i and q_j,

    ‖S(q_i − q_j)‖_2 = (1 ± ε) ‖q_i − q_j‖_2
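The JLL is easy to illustrate empirically with a Gaussian map. A numpy sketch (the constant in t = Θ(log N / ε^2) is ad hoc, and the check is over all pairwise distances):

```python
import numpy as np

rng = np.random.default_rng(3)
n, N, eps = 1000, 50, 0.25
t = 8 * int(np.ceil(np.log(N) / eps**2))   # t = Theta(log N / eps^2), ad hoc constant

Q = rng.standard_normal((N, n))            # N points in R^n
S = rng.standard_normal((t, n)) / np.sqrt(t)

# Ratio of sketched to original distance for every pair of points
ratios = []
for i in range(N):
    for j in range(i + 1, N):
        diff = Q[i] - Q[j]
        ratios.append(np.linalg.norm(S @ diff) / np.linalg.norm(diff))
print(min(ratios), max(ratios))
```

Note the dimension t depends only on the number of points N and on ε, not on the ambient dimension n.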
Johnson-Lindenstrauss Transform (cont.)

A simple construction of JLT(ε, δ, f):

Theorem
Let 0 < ε, δ < 1 and S = (1/√r)·R ∈ R^{r×n}, where the entries R_{i,j} are independent standard normal random variables. Assuming r = Ω(ε^{-2} log(f/δ)), then S is a JLT(ε, δ, f).

Other constructions:
- Random sign matrices
  [Achlioptas, '03], [Clarkson, Woodruff, STOC '09]
- Random sparse matrices
  [Dasgupta, Kumar, Sarlos, STOC '10], [Kane, Nelson, J. ACM '14]
- Fast Johnson-Lindenstrauss transforms
  [Ailon, Chazelle, STOC '06]
JLT results in ℓ2-subspace embedding

Claim
S = JLT(ε, δ, f) is an oblivious ℓ2-subspace embedding for A ∈ R^{n×d}.

Challenge:
- JLT(ε, δ, f) provides a guarantee only for a single finite set in R^n
- An ℓ2-subspace embedding requires the guarantee for an infinite set, i.e. the column space of A
JLT results in ℓ2-subspace embedding (cont.)

Let 𝒮 be the unit sphere in the column space of A:

    𝒮 = { y ∈ R^n | y = Ax for some x ∈ R^d and ‖y‖_2 = 1 }

We seek a finite subset N ⊆ 𝒮 so that if

    ∀ w, w' ∈ N:  ⟨Sw, Sw'⟩ = ⟨w, w'⟩ ± ε

then

    ∀ y ∈ 𝒮:  ‖Sy‖_2 = (1 ± ε) ‖y‖_2
JLT results in ℓ2-subspace embedding (cont.)

Lemma (1/2-net for 𝒮)
It suffices to choose any N such that

    ∀ y ∈ 𝒮  ∃ w ∈ N  s.t.  ‖y − w‖_2 ≤ 1/2

Proof.
1. Decompose y as

       y = y^(0) + y^(1) + y^(2) + ...

   where ‖y^(i)‖_2 ≤ 1/2^i and y^(i)/‖y^(i)‖_2 ∈ N

2. Then ‖Sy‖_2^2 = ‖S(y^(0) + y^(1) + y^(2) + ...)‖_2^2 = 1 ± O(ε)
1/2-net of 𝒮

Lemma
There exists a 1/2-net N of 𝒮 for which |N| ≤ 5^d.

Proof.
1. Find a maximal set N' of points on the unit sphere in R^d such that no two points are within distance 1/2 of each other
2. Let U be the orthonormal basis matrix of the column space of A
3. N = { y ∈ R^n | y = Ux for some x ∈ N' and ‖y‖_2 = 1 }
Subspace Embedding via JLT

Theorem
Let 0 < ε, δ < 1 and S = JLT(ε, δ, 5^d). For any fixed matrix A ∈ R^{n×d}, with probability 1 − δ, S is a (1 ± ε) ℓ2-subspace embedding for A, i.e.

    ∀ x ∈ R^d:  ‖SAx‖_2 = (1 ± ε) ‖Ax‖_2

Results in:
- O(nnz(A) · ε^{-1} log d) time algorithm using the column-sparsity transform of Kane and Nelson [Kane, Nelson, J. ACM '14]
- O(nd log n) time algorithm using the Fast Johnson-Lindenstrauss transform of Ailon and Chazelle [Ailon, Chazelle, STOC '06]
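The subspace-embedding guarantee can be probed numerically: since it must hold for all x ∈ R^d simultaneously, a single Gaussian sketch should preserve ‖Ax‖_2 for many random test directions at once. A numpy illustration (constants are ad hoc, and this uses a plain Gaussian sketch rather than the column-sparsity or FJLT constructions above):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, eps = 2000, 5, 0.25
r = 400                          # comfortably above d / eps^2 for this illustration

A = rng.standard_normal((n, d))
S = rng.standard_normal((r, n)) / np.sqrt(r)
SA = S @ A                       # sketch once, reuse for every direction x

# Probe the (infinite) column space with many random directions x
X = rng.standard_normal((d, 200))
ratios = np.linalg.norm(SA @ X, axis=0) / np.linalg.norm(A @ X, axis=0)
print(ratios.min(), ratios.max())
```

The key point is that S is sampled once, without looking at A, and the distortion bound holds over the whole subspace, not just the probed directions.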
Other Subspace Embedding Algorithms

Non-JLT-based subspace embeddings:
- O(nnz(A)) + poly(d/ε) time algorithm [Clarkson, Woodruff, STOC '13]

Non-oblivious subspace embeddings:
- Based on leverage score sampling [Drineas, Mahoney, Muthukrishnan, SODA '06]
ℓ2-regression via Oblivious Subspace Embedding

Theorem
Let S ∈ R^{r×n} be any oblivious subspace embedding matrix and x̃ = argmin_x ‖SAx − Sb‖_2; then

    ‖Ax̃ − b‖_2 ≤ (1 + ε) min_x ‖Ax − b‖_2

Proof.
1. Let the matrix U ∈ R^{n×(d+1)} be an orthonormal basis for the columns of A together with the vector b
2. Suppose S is an ℓ2-subspace embedding for U
Questions?