Matrix Concentration. Nick Harvey, University of British Columbia.


Page 1: Matrix Concentration

Matrix Concentration

Nick Harvey University of British Columbia

Page 2: Matrix Concentration

The Problem

Given any random n×n symmetric matrices Y1,…,Yk, show that Σi Yi is probably “close” to E[Σi Yi].

Why?
• A matrix generalization of the Chernoff bound.
• Much research studies eigenvalues of a random matrix with independent entries. This is more general.

Page 3: Matrix Concentration

Chernoff/Hoeffding Bound
• Theorem:

Let Y1,…,Yk be independent random scalars in [0,R]. Let Y = Σi Yi. Suppose that μL ≤ E[Y] ≤ μU. Then, for ε ∈ (0,1),

Pr[ Y ≤ (1-ε)·μL ] ≤ exp(-ε²·μL / 2R)   and   Pr[ Y ≥ (1+ε)·μU ] ≤ exp(-ε²·μU / 3R).
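As a quick numeric sanity check of the scalar bound (a sketch; the parameters k, ε, and the number of trials are illustrative, not from the slides), the following simulation compares the empirical upper tail of a sum of Uniform[0,1] variables against exp(-ε²·μU/3R):

```python
import numpy as np

# Sum of k independent Uniform[0,1] scalars: R = 1, E[Y] = k/2,
# so we may take mu_L = mu_U = k/2.
rng = np.random.default_rng(0)
k, R, eps = 1000, 1.0, 0.2
mu = k / 2

trials = rng.random((2000, k)).sum(axis=1)        # 2000 samples of Y
upper_tail = np.mean(trials >= (1 + eps) * mu)    # empirical Pr[Y >= (1+eps) mu]
bound = np.exp(-eps**2 * mu / (3 * R))            # Chernoff upper-tail bound

print(upper_tail, bound)    # empirical tail should not exceed the bound
```

With these parameters the bound is roughly 1e-3, and the empirical tail is essentially zero, consistent with the theorem.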

Page 4: Matrix Concentration

Rudelson’s Sampling Lemma
• Theorem: [Rudelson ‘99]

Let Y1,…,Yk be i.i.d. rank-1, PSD matrices of size n×n s.t. E[Yi] = I, ‖Yi‖ ≤ R. Let Y = Σi Yi, so E[Y] = k·I. Then, with high probability, (1-ε)·k·I ≼ Y ≼ (1+ε)·k·I, provided k ≥ Ω(R log n / ε²).

• Example: Balls and bins
– Throw k balls uniformly into n bins
– Yi = uniform over { n·e_b e_bᵀ : b = 1,…,n } (so E[Yi] = I)
– If k = O(n log n / ε²), all bins have the same load up to a factor 1±ε
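The balls-and-bins example can be simulated directly; this is a minimal sketch (n, ε, and the constant 8 are illustrative choices) that tracks only the diagonal, since each Yi = n·e_b e_bᵀ is diagonal:

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 20, 0.5
k = int(8 * n * np.log(n) / eps**2)    # k = O(n log n / eps^2) balls

bins = rng.integers(0, n, size=k)      # each ball picks a uniform bin
counts = np.bincount(bins, minlength=n)

# Y = sum_i n * e_b e_b^T is diagonal with entries n * counts; E[Y] = k * I.
load = n * counts / k                  # eigenvalues of Y / k
print(load.min(), load.max())          # all within [1 - eps, 1 + eps]
```

Every eigenvalue of Y/k (i.e., every normalized bin load) lands in [1-ε, 1+ε], matching the lemma's conclusion.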

Page 5: Matrix Concentration

Rudelson’s Sampling Lemma
• Theorem: [Rudelson ‘99]

Let Y1,…,Yk be i.i.d. rank-1, PSD matrices of size n×n s.t. E[Yi] = I, ‖Yi‖ ≤ R. Let Y = Σi Yi, so E[Y] = k·I. Then, with high probability, (1-ε)·k·I ≼ Y ≼ (1+ε)·k·I, provided k ≥ Ω(R log n / ε²).

• Pros: We’ve generalized to PSD matrices
• Mild issue: We assume E[Yi] = I.
• Cons:
– Yi’s must be identically distributed
– rank-1 matrices only

Page 6: Matrix Concentration

Rudelson’s Sampling Lemma
• Theorem: [Rudelson-Vershynin ‘07]

Let Y1,…,Yk be i.i.d. rank-1, PSD matrices s.t. E[Yi] = I, ‖Yi‖ ≤ R. Let Y = Σi Yi, so E[Y] = k·I. Then, with high probability, (1-ε)·k·I ≼ Y ≼ (1+ε)·k·I, provided k ≥ Ω(R log n / ε²).

• Pros: We’ve generalized to PSD matrices
• Mild issue: We assume E[Yi] = I.
• Cons:
– Yi’s must be identically distributed
– rank-1 matrices only

Page 7: Matrix Concentration

Rudelson’s Sampling Lemma
• Theorem: [Rudelson-Vershynin ‘07]

Let Y1,…,Yk be i.i.d. rank-1, PSD matrices s.t. E[Yi] = I. Let Y = Σi Yi, so E[Y] = k·I. Assume Yi ≼ R·I. Then, with high probability, (1-ε)·k·I ≼ Y ≼ (1+ε)·k·I, provided k ≥ Ω(R log n / ε²).

• Notation:
• A ≼ B ⇔ B-A is PSD
• α·I ≼ A ≼ β·I ⇔ all eigenvalues of A lie in [α,β]

• Mild issue: We assume E[Yi] = I.

Page 8: Matrix Concentration

Rudelson’s Sampling Lemma
• Theorem: [Rudelson-Vershynin ‘07]

Let Y1,…,Yk be i.i.d. rank-1, PSD matrices. Let Z = E[Yi], Y = Σi Yi, so E[Y] = k·Z. Assume Yi ≼ R·Z. Then, with high probability, (1-ε)·k·Z ≼ Y ≼ (1+ε)·k·Z.

• Apply the previous theorem to { Z^{-1/2} Yi Z^{-1/2} : i = 1,…,k }.
• Use the fact that A ≼ B ⇔ Z^{-1/2} A Z^{-1/2} ≼ Z^{-1/2} B Z^{-1/2}.
• So (1-ε)·k·Z ≼ Σi Yi ≼ (1+ε)·k·Z ⇔ (1-ε)·k·I ≼ Σi Z^{-1/2} Yi Z^{-1/2} ≼ (1+ε)·k·I.
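The congruence fact used in this reduction is easy to verify numerically; a small sketch with random matrices (Z^{-1/2} computed via eigendecomposition), where B - A = I by construction:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T                      # PSD
B = A + np.eye(n)                # A ≼ B by construction (B - A = I)
W = rng.standard_normal((n, n))
Z = W @ W.T + np.eye(n)          # positive definite

w, V = np.linalg.eigh(Z)
Z_inv_half = V @ np.diag(w ** -0.5) @ V.T   # Z^{-1/2}

# If A ≼ B, then Z^{-1/2} (B - A) Z^{-1/2} must also be PSD.
gap = Z_inv_half @ (B - A) @ Z_inv_half
min_eig = np.linalg.eigvalsh(gap).min()
print(min_eig)                   # strictly positive here since B - A = I
```

Congruence by an invertible matrix preserves positive semidefiniteness (xᵀ Z^{-1/2}(B-A)Z^{-1/2} x = yᵀ(B-A)y with y = Z^{-1/2}x), which is exactly why the reduction to E[Yi] = I works.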

Page 9: Matrix Concentration

Ahlswede-Winter Inequality
• Theorem: [Ahlswede-Winter ‘02]

Let Y1,…,Yk be i.i.d. PSD matrices of size n×n. Let Z = E[Yi], Y = Σi Yi, so E[Y] = k·Z. Assume Yi ≼ R·Z. Then

Pr[ (1-ε)·k·Z ≼ Y ≼ (1+ε)·k·Z fails ] ≤ 2n·exp(-Ω(ε²·k / R)).

• Pros:
– We’ve removed the rank-1 assumption.
– The proof is much easier than Rudelson’s proof.

• Cons:
– Still need the Yi’s to be identically distributed.
(More precisely, poor results unless E[Ya] = E[Yb].)

Page 10: Matrix Concentration

Tropp’s User-Friendly Tail Bound
• Theorem: [Tropp ‘12]

Let Y1,…,Yk be independent, PSD matrices of size n×n s.t. ‖Yi‖ ≤ R. Let Y = Σi Yi. Suppose μL·I ≼ E[Y] ≼ μU·I. Then, for ε ∈ (0,1),

Pr[ λmin(Y) ≤ (1-ε)·μL ] ≤ n·exp(-ε²·μL / 2R)   and   Pr[ λmax(Y) ≥ (1+ε)·μU ] ≤ n·( e^ε / (1+ε)^(1+ε) )^(μU/R).

• Pros:
– The Yi’s do not need to be identically distributed
– Poisson-like bound for the right tail
– The proof is not difficult (but uses Lieb’s inequality)

• Mild issue: Poor results unless λmin(E[Y]) ≈ λmax(E[Y]).

Page 11: Matrix Concentration

Tropp’s User-Friendly Tail Bound
• Theorem: [Tropp ‘12]

Let Y1,…,Yk be independent, PSD matrices of size n×n. Let Y = Σi Yi. Let Z = E[Y]. Suppose Yi ≼ R·Z. Then

Pr[ (1-ε)·Z ≼ Y ≼ (1+ε)·Z fails ] ≤ 2n·exp(-Ω(ε² / R)).

Page 12: Matrix Concentration

Tropp’s User-Friendly Tail Bound
• Theorem: [Tropp ‘12]

Let Y1,…,Yk be independent, PSD matrices of size n×n s.t. ‖Yi‖ ≤ R. Let Y = Σi Yi. Suppose μL·I ≼ E[Y] ≼ μU·I. Then, for ε ∈ (0,1),

Pr[ λmin(Y) ≤ (1-ε)·μL ] ≤ n·exp(-ε²·μL / 2R)   and   Pr[ λmax(Y) ≥ (1+ε)·μU ] ≤ n·( e^ε / (1+ε)^(1+ε) )^(μU/R).

• Example: Balls and bins
– For b = 1,…,n
– For t = 1,…,8 log(n)/ε²
– With prob ½, throw a ball into bin b
– That is, Yb,t = e_b e_bᵀ with prob ½, otherwise 0.
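The scheme above is straightforward to simulate; a sketch where each Yb,t is the diagonal indicator e_b e_bᵀ with probability ½, so Y = Σb,t Yb,t is diagonal with E[Y] = (T/2)·I for T = 8 log(n)/ε² rounds per bin (n and ε are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, eps = 50, 0.5
T = int(8 * np.log(n) / eps**2)        # rounds per bin

flips = rng.random((n, T)) < 0.5       # Y_{b,t} = e_b e_b^T with prob 1/2
diag = flips.sum(axis=1)               # diagonal of Y = sum_{b,t} Y_{b,t}
mu = T / 2                             # E[Y] = mu * I

print(diag.min() / mu, diag.max() / mu)   # all bins within 1 ± eps
```

Here the Yb,t are independent but need not be identically distributed for Tropp's bound to apply, which is exactly the flexibility this slide highlights.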

Page 13: Matrix Concentration

Additive Error
• Previous theorems give multiplicative error:

(1-ε)·E[Σi Yi] ≼ Σi Yi ≼ (1+ε)·E[Σi Yi]

• Additive error is also useful: ‖Σi Yi - E[Σi Yi]‖ ≤ ε
• Theorem: [Rudelson & Vershynin ‘07]

Let Y1,…,Yk be i.i.d. rank-1, PSD matrices. Let Z = E[Yi]. Suppose ‖Z‖ ≤ 1, ‖Yi‖ ≤ R. Then

E‖ (1/k)·Σi Yi - Z ‖ ≤ O(√(R log k / k)).

• Theorem: [Magen & Zouzias ‘11]
If instead rank(Yi) ≤ k := Θ(R log(R/ε²)/ε²), then the additive guarantee ‖ (1/k)·Σi Yi - Z ‖ ≤ ε holds with no dependence on the dimension n.

Page 14: Matrix Concentration

Proof of Ahlswede-Winter
• Key idea: Bound the matrix moment generating function
• Let Sk = Σi=1..k Yi. For t > 0,

E[ tr exp(t·Sk) ] = E[ tr exp(t·Sk-1 + t·Yk) ]
                  ≤ E[ tr( exp(t·Sk-1) · exp(t·Yk) ) ]       (Golden-Thompson Inequality: tr e^(A+B) ≤ tr(e^A·e^B))
                  ≤ ‖ E[exp(t·Yk)] ‖ · E[ tr exp(t·Sk-1) ]   (independence of Yk from Sk-1)

By induction, E[ tr exp(t·Sk) ] ≤ n · Πi ‖ E[exp(t·Yi)] ‖.

Weakness: the Golden-Thompson step is brutal (quite lossy).
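Golden-Thompson can be spot-checked numerically for symmetric matrices; a sketch with random A, B, computing matrix exponentials via eigendecomposition:

```python
import numpy as np

def sym_expm(M):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.exp(w)) @ V.T

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # symmetric
B = rng.standard_normal((n, n)); B = (B + B.T) / 2   # symmetric

lhs = np.trace(sym_expm(A + B))
rhs = np.trace(sym_expm(A) @ sym_expm(B))
print(lhs, rhs)                  # lhs <= rhs by Golden-Thompson
```

The gap between the two sides at each induction step is exactly the looseness the slide calls "brutal".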

Page 15: Matrix Concentration

How to improve Ahlswede-Winter?
• Golden-Thompson Inequality:

tr e^(A+B) ≤ tr( e^A·e^B ) for all symmetric matrices A, B.

• Does not extend to three matrices!
tr e^(A+B+C) ≤ tr( e^A·e^B·e^C ) is FALSE.

• Lieb’s Inequality: For any symmetric matrix L, the map f : PSD Cone → ℝ defined by f(A) = tr exp( L + log(A) ) is concave.
– So f interacts nicely with expectation and Jensen’s inequality
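Lieb's concavity can also be spot-checked numerically; a sketch testing midpoint concavity of f(A) = tr exp(L + log A) at random positive definite A, B:

```python
import numpy as np

def lieb_f(L, A):
    """f(A) = tr exp(L + log A) for symmetric L and positive definite A."""
    w, V = np.linalg.eigh(A)
    log_A = V @ np.diag(np.log(w)) @ V.T
    w2 = np.linalg.eigvalsh(L + log_A)
    return np.sum(np.exp(w2))

rng = np.random.default_rng(5)
n = 4
L = rng.standard_normal((n, n)); L = (L + L.T) / 2        # symmetric
M1 = rng.standard_normal((n, n)); A = M1 @ M1.T + np.eye(n)  # PD
M2 = rng.standard_normal((n, n)); B = M2 @ M2.T + np.eye(n)  # PD

mid = lieb_f(L, (A + B) / 2)
avg = (lieb_f(L, A) + lieb_f(L, B)) / 2
print(mid, avg)                  # mid >= avg since f is concave
```

Concavity is what lets the expectation be pulled inside f via Jensen's inequality, replacing the lossy Golden-Thompson step in Tropp's proof.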

Page 16: Matrix Concentration

Beyond the basics

• Hoeffding (non-uniform bounds on Yi’s) [Tropp ‘12]

• Bernstein (use bound on Var[Yi]) [Tropp ‘12]

• Freedman (martingale version of Bernstein) [Tropp ‘12]
• Stein’s Method (slightly sharper results) [Mackey et al. ‘12]

• Pessimistic Estimators for Ahlswede-Winter inequality [Wigderson-Xiao ‘08]

Page 17: Matrix Concentration

Summary

• We now have a beautiful, powerful, flexible extension of the Chernoff bound to matrices.

• Ahlswede-Winter has a simple proof; Tropp’s inequality is very easy to use.

• Several important uses to date; hopefully more uses in the future.