
ELE 520: Mathematics of Data Science

Low-Rank Matrix Recovery

Yuxin Chen

Princeton University, Fall 2020


Outline

• Motivation

• Problem setup

• Nuclear norm minimization

• RIP and low-rank matrix recovery

• Phase retrieval / solving random quadratic systems of equations

• Matrix completion

Matrix recovery 13-2


Motivation

Matrix recovery 13-3


Motivation 1: recommendation systems

[Figure: user × movie rating matrix, with most entries unobserved (?)]

• Netflix challenge: Netflix provides highly incomplete ratings from ≈ 0.5 million users for ≈ 17,770 movies

• How to predict unseen user ratings for movies?

Matrix recovery 13-4


In general, we cannot infer missing ratings

X ? ? ? X ?
? ? X X ? ?
X ? ? X ? ?
? ? X ? ? X
X ? ? ? ? ?
? X ? ? X ?
? ? X X ? ?

Underdetermined system (more unknowns than observations)

Matrix recovery 13-5


... unless rating matrix has other structure

[Figure: rating matrix approximated by a product of a few user factors and movie factors]

A few factors explain most of the data −→ low-rank approximation

How to exploit (approx.) low-rank structure in prediction?

Matrix recovery 13-6


Motivation 2: sensor localization

• n sensors / points xj ∈ R³, j = 1, · · · , n

• Observe partial information about pairwise distances

Di,j = ‖xi − xj‖₂² = ‖xi‖₂² + ‖xj‖₂² − 2 xi⊤xj

• Goal: infer distance between every pair of nodes

Matrix recovery 13-7


Motivation 2: sensor localization

Introduce

X = [x1, x2, · · · , xn]⊤ ∈ Rⁿ×³

then the distance matrix D = [Di,j]_{1≤i,j≤n} can be written as

D = [‖x1‖₂², · · · , ‖xn‖₂²]⊤ 1⊤  +  1 [‖x1‖₂², · · · , ‖xn‖₂²]  −  2XX⊤
         (rank 1)                       (rank 1)                 (rank 3)

so D is low rank: rank(D) ≤ 5 ≪ n −→ low-rank matrix completion

Matrix recovery 13-8
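To make the rank claim concrete, here is a minimal numpy sketch (an added illustration with synthetic sensor locations): build D from random points in R³ and confirm rank(D) ≤ 1 + 1 + 3 = 5.

```python
# Sanity check (illustration only): D = c 1^T + 1 c^T - 2 X X^T with
# c_j = ||x_j||_2^2 is a sum of two rank-1 terms and one rank-3 term.
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = rng.standard_normal((n, 3))              # sensor locations x_j in R^3
c = np.sum(X**2, axis=1, keepdims=True)      # column vector of ||x_j||_2^2
D = c + c.T - 2 * X @ X.T                    # D_ij = ||x_i - x_j||_2^2

print(np.linalg.matrix_rank(D))              # 5, consistent with rank(D) <= 5
```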


Motivation 3: structure from motion

Given multiple images and a few correspondences between image features, how to estimate the locations of 3D points?

Fig. credit: Snavely, Seitz, & Szeliski

Structure from motion: reconstruct 3D scene geometry (“structure”) and camera poses (“motion”) from multiple images

Matrix recovery 13-9


Motivation 3: structure from motion

Tomasi and Kanade’s factorization:

• Consider n 3D points {pj ∈ R³}_{1≤j≤n} in m different 2D frames

• xi,j ∈ R²: location of the jth point in the ith frame

xi,j = Mi pj, where Mi ∈ R²×³ is the projection matrix and pj ∈ R³ is the 3D position

• Matrix of all 2D locations

X = [ x1,1 · · · x1,n ;  ⋮ ⋱ ⋮ ;  xm,1 · · · xm,n ] = [ M1 ; ⋮ ; Mm ] [ p1 · · · pn ] ∈ R²ᵐ×ⁿ
    (low-rank factorization: rank(X) ≤ 3)

Goal: fill in missing entries of X given a small number of entries

Matrix recovery 13-10



Motivation 4: missing phase problem

Detectors record intensities of diffracted rays

• electric field x(t1, t2) −→ Fourier transform x̂(f1, f2)

Fig credit: Stanford SLAC

intensity of electric field: |x̂(f1, f2)|² = | ∫ x(t1, t2) e^{−i2π(f1t1+f2t2)} dt1 dt2 |²

Phase retrieval: recover signal x(t1, t2) from intensity |x̂(f1, f2)|²

Matrix recovery 13-11


A discrete-time model: solving quadratic systems

[Figure: measurement model y = |Ax|²; e.g. Ax = (1, −3, 2, −1, 4, 2, −2, −1, 3, 4)⊤ yields y = (1, 9, 4, 1, 16, 4, 4, 1, 9, 16)⊤]

Solve for x ∈ Rⁿ in m quadratic equations

yk = |ak⊤x|², k = 1, . . . , m,  or  y = |Ax|²  where |z|² := [|z1|², · · · , |zm|²]⊤

Matrix recovery 13-12


An equivalent view: low-rank factorization

Lifting: introduce X = xx∗ to linearize constraints

yk = |ak∗x|² = ak∗(xx∗)ak  =⇒  yk = ak∗Xak = ⟨akak∗, X⟩   (13.1)

find X ⪰ 0
s.t. yk = ⟨akak∗, X⟩, k = 1, · · · , m
     rank(X) = 1

Matrix recovery 13-13
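The lifting identity (13.1) is easy to verify numerically; below is a minimal sketch for the real-valued case (an added illustration with synthetic x and ak):

```python
# Verify (13.1): |a_k^T x|^2 equals the linear measurement <a_k a_k^T, x x^T>.
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 20
x = rng.standard_normal(n)
A = rng.standard_normal((m, n))              # row k is a_k^T

y_quad = (A @ x) ** 2                        # y_k = |a_k^T x|^2
M = np.outer(x, x)                           # lifted matrix X = x x^T
y_lin = np.einsum("ki,ij,kj->k", A, M, A)    # y_k = a_k^T X a_k = <a_k a_k^T, X>

print(np.allclose(y_quad, y_lin))            # True
```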


Problem setup

Matrix recovery 13-14


Setup

• Consider M ∈ Rn×n

• rank(M) = r ≪ n

• Singular value decomposition (SVD) of M :

M = UΣV⊤ = ∑_{i=1}^r σi ui vi⊤   ((2n − r)r degrees of freedom)

where Σ = diag(σ1, · · · , σr) contains all singular values σi; U := [u1, · · · , ur], V := [v1, · · · , vr] consist of singular vectors

Matrix recovery 13-15


Low-rank matrix completion

Observed entries

Mi,j , (i, j) ∈ Ω (the sampling set)

Completion via rank minimization

minimizeX rank(X) s.t. Xi,j = Mi,j , (i, j) ∈ Ω

• An operator PΩ: orthogonal projection onto the subspace of matrices supported on Ω

Completion via rank minimization

minimizeX rank(X) s.t. PΩ(X) = PΩ(M)

Matrix recovery 13-16



More general: low-rank matrix recovery

Linear measurements

yi = ⟨Ai, M⟩ = Tr(Ai⊤M), i = 1, . . . , m

• An operator form

y = A(M) := [⟨A1, M⟩, · · · , ⟨Am, M⟩]⊤ ∈ Rᵐ

Recovery via rank minimization

minimizeX rank(X) s.t. y = A(X)

Matrix recovery 13-17


Nuclear norm minimization

Matrix recovery 13-18


Convex relaxation

minimize_{X∈Rⁿ×ⁿ} rank(X)   (nonconvex)   s.t. PΩ(X) = PΩ(M)

minimize_{X∈Rⁿ×ⁿ} rank(X)   (nonconvex)   s.t. A(X) = A(M)

Question: what is the convex surrogate for rank(·)?

Matrix recovery 13-19


Nuclear norm

Definition 13.1
The nuclear norm of X is

‖X‖∗ := ∑_{i=1}^n σi(X), where σi(X) is the ith largest singular value

• The nuclear norm is a counterpart of the ℓ1 norm for the rank function

• Relations among different norms

‖X‖ ≤ ‖X‖F ≤ ‖X‖∗ ≤ √r ‖X‖F ≤ r ‖X‖

• (Tightness) {X : ‖X‖∗ ≤ 1} is the convex hull of rank-1 matrices uv⊤ obeying ‖uv⊤‖ ≤ 1 (Fazel ’02)

Matrix recovery 13-20
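These inequalities are easy to spot-check numerically; the following numpy snippet (an added illustration) draws a random rank-r matrix and verifies the chain:

```python
# Check ||X|| <= ||X||_F <= ||X||_* <= sqrt(r)||X||_F <= r||X|| on a rank-r matrix.
import numpy as np

rng = np.random.default_rng(2)
n, r = 30, 4
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank-r matrix

s = np.linalg.svd(X, compute_uv=False)       # singular values, descending
spec, fro, nuc = s[0], np.sqrt(np.sum(s**2)), np.sum(s)
print(spec <= fro <= nuc <= np.sqrt(r) * fro <= r * spec)       # True
```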


Additivity of nuclear norm

Fact 13.2

Let A and B be matrices of the same dimensions. If AB⊤ = 0 and A⊤B = 0, then ‖A + B‖∗ = ‖A‖∗ + ‖B‖∗.

• If the row (resp. column) spaces of A and B are orthogonal, then ‖A + B‖∗ = ‖A‖∗ + ‖B‖∗

• Similar to the ℓ1 norm: when x and y have disjoint support,

‖x + y‖₁ = ‖x‖₁ + ‖y‖₁   (a key to studying ℓ1-min under RIP)

Matrix recovery 13-21


Proof of Fact 13.2

Suppose A = UA ΣA VA⊤ and B = UB ΣB VB⊤, which gives

AB⊤ = 0 and A⊤B = 0   ⟺   VA⊤VB = 0 and UA⊤UB = 0

Thus, one can write

A = [UA, UB, UC] diag(ΣA, 0, 0) [VA, VB, VC]⊤
B = [UA, UB, UC] diag(0, ΣB, 0) [VA, VB, VC]⊤

and hence

‖A + B‖∗ = ‖ [UA, UB] diag(ΣA, ΣB) [VA, VB]⊤ ‖∗ = ‖A‖∗ + ‖B‖∗

Matrix recovery 13-22


Dual norm

Definition 13.3 (Dual norm)
For a given norm ‖·‖A, the dual norm is defined as

‖X‖A⋆ := max{⟨X, Y⟩ : ‖Y‖A ≤ 1}

• ℓ1 norm ←dual→ ℓ∞ norm

• nuclear norm ←dual→ spectral norm

• ℓ2 norm ←dual→ ℓ2 norm

• Frobenius norm ←dual→ Frobenius norm

Matrix recovery 13-23


Representing the nuclear norm via SDP

Since the spectral norm is the dual norm of the nuclear norm,

‖X‖∗ = max{⟨X, Y⟩ : ‖Y‖ ≤ 1}

The constraint is equivalent to

‖Y‖ ≤ 1   ⟺   Y Y⊤ ⪯ I   ⟺ (Schur complement)   [I, Y; Y⊤, I] ⪰ 0

Fact 13.4

‖X‖∗ = max_Y { ⟨X, Y⟩ : [I, Y; Y⊤, I] ⪰ 0 }

Fact 13.5 (Dual characterization)

‖X‖∗ = min_{W1,W2} { ½ Tr(W1) + ½ Tr(W2) : [W1, X; X⊤, W2] ⪰ 0 }

• Optimal point: W1 = UΣU⊤, W2 = V ΣV⊤ (where X = UΣV⊤)

Matrix recovery 13-24



Aside: dual of semidefinite program

(primal)  minimize_X ⟨C, X⟩
          s.t. ⟨Ai, X⟩ = bi, 1 ≤ i ≤ m
               X ⪰ 0

⇕

(dual)  maximize_y b⊤y
        s.t. ∑_{i=1}^m yi Ai + S = C
             S ⪰ 0

Exercise: use this to verify Fact 13.5

Matrix recovery 13-25


Nuclear norm minimization via SDP

Convex relaxation of rank minimization

M̂ = argmin_X ‖X‖∗   s.t. y = A(X)

This is solvable via SDP:

minimize_{X,W1,W2}  ½ Tr(W1) + ½ Tr(W2)

s.t. y = A(X),   [W1, X; X⊤, W2] ⪰ 0

Matrix recovery 13-26
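For concreteness, here is a minimal cvxpy sketch of this program for matrix completion measurements (an added illustration, assuming cvxpy is available; it uses the built-in normNuc atom, which cvxpy reformulates into the SDP above internally):

```python
# Nuclear norm minimization for matrix completion with cvxpy (sketch).
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(3)
n, r, p = 20, 2, 0.5
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank-r ground truth
mask = rng.random((n, n)) < p                                   # sampling set Omega
rows, cols = np.nonzero(mask)

X = cp.Variable((n, n))
prob = cp.Problem(cp.Minimize(cp.normNuc(X)),
                  [X[rows, cols] == M[rows, cols]])             # P_Omega(X) = P_Omega(M)
prob.solve()
print(np.linalg.norm(X.value - M, "fro") / np.linalg.norm(M, "fro"))  # near 0
```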


RIP and low-rank matrix recovery

Matrix recovery 13-27


RIP for low-rank matrices

Almost parallel results to compressed sensing ...¹

Definition 13.6
The r-restricted isometry constants δr^ub(A) and δr^lb(A) are the smallest quantities s.t.

(1 − δr^lb)‖X‖F ≤ ‖A(X)‖F ≤ (1 + δr^ub)‖X‖F,   ∀X : rank(X) ≤ r

¹One can also define RIP w.r.t. ‖·‖F² rather than ‖·‖F.

Matrix recovery 13-28


RIP and low-rank matrix recovery

Theorem 13.7 (Recht, Fazel, Parrilo ’10, Candes, Plan ’11)

Suppose rank(M) = r. For any fixed integer K > 0, if

(1 + δKr^ub) / (1 − δ(2+K)r^lb) < √(K/2),

then nuclear norm minimization is exact

• It allows δKr^ub to be larger than 1

• Can be easily extended to account for the noisy case and approximately low-rank matrices

Matrix recovery 13-29


Geometry of nuclear norm ball

Level set of the nuclear norm ball:  ‖ [x, y; y, z] ‖∗ ≤ 1

Fig. credit: Candes ’14

Matrix recovery 13-30


Some notation

Recall M = UΣV⊤

• Let T be the span of matrices of the form (called the tangent space)

T = {UX⊤ + Y V⊤ : X, Y ∈ Rⁿ×ʳ}

• Let PT be the orthogonal projection onto T:

PT(X) = UU⊤X + XV V⊤ − UU⊤XV V⊤

• Its complement PT⊥ = I − PT:

PT⊥(X) = (I − UU⊤) X (I − V V⊤)

=⇒ M PT⊥(X)⊤ = 0 and M⊤ PT⊥(X) = 0

Matrix recovery 13-31
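These projections are one-liners in numpy; the sketch below (an added illustration) checks that PT + PT⊥ is the identity and that M PT⊥(H)⊤ = 0.

```python
# Tangent-space projections P_T and P_{T perp} at M = U Sigma V^T (sketch).
import numpy as np

def P_T(U, V, X):
    return U @ (U.T @ X) + (X @ V) @ V.T - U @ (U.T @ X @ V) @ V.T

def P_Tperp(U, V, X):
    PU = np.eye(U.shape[0]) - U @ U.T        # projector onto complement of col(U)
    PV = np.eye(V.shape[0]) - V @ V.T        # projector onto complement of col(V)
    return PU @ X @ PV

rng = np.random.default_rng(4)
n, r = 15, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
U, _, Vt = np.linalg.svd(M)
U, V = U[:, :r], Vt[:r].T

H = rng.standard_normal((n, n))
print(np.allclose(P_T(U, V, H) + P_Tperp(U, V, H), H))   # P_T + P_Tperp = identity
print(np.allclose(M @ P_Tperp(U, V, H).T, 0))            # M P_Tperp(H)^T = 0
```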


Proof of Theorem 13.7

Suppose X = M + H is feasible and obeys ‖M + H‖∗ ≤ ‖M‖∗. The goal is to show that H = 0 under RIP.

The key is to decompose H into H0 + (H1 + H2 + · · ·), where Hc := H1 + H2 + · · · and

• H0 = PT(H) (rank ≤ 2r)
• Hc = PT⊥(H) (obeying MHc⊤ = 0 and M⊤Hc = 0)
• H1: the best rank-(Kr) approximation of Hc (K is a constant)
• H2: the best rank-(Kr) approximation of Hc − H1
• ...

Matrix recovery 13-32


Proof of Theorem 13.7

Informally, the proof proceeds by showing that

1. H0 “dominates” ∑_{i≥2} Hi (by objective function); see Step 1

2. (converse) ∑_{i≥2} Hi “dominates” H0 + H1 (by RIP + feasibility); see Step 2

These cannot happen simultaneously unless H = 0

Matrix recovery 13-33


Proof of Theorem 13.7

Step 1 (which does not rely on RIP). Show that

∑_{j≥2} ‖Hj‖F ≤ ‖H0‖∗ / √(Kr).   (13.2)

This follows immediately by combining the following 2 observations:

(i) Since M + H is assumed to be a better estimate:

‖M‖∗ ≥ ‖M + H‖∗ ≥ ‖M + Hc‖∗ − ‖H0‖∗   (13.3)
     ≥ ‖M‖∗ + ‖Hc‖∗ − ‖H0‖∗   (Fact 13.2, since MHc⊤ = 0 and M⊤Hc = 0)

=⇒ ‖Hc‖∗ ≤ ‖H0‖∗   (13.4)

(ii) Since the nonzero singular values of Hj−1 dominate those of Hj (j ≥ 2):

‖Hj‖F ≤ √(Kr) ‖Hj‖ ≤ √(Kr) [ ‖Hj−1‖∗ / (Kr) ] ≤ ‖Hj−1‖∗ / √(Kr)

=⇒ ∑_{j≥2} ‖Hj‖F ≤ (1/√(Kr)) ∑_{j≥2} ‖Hj−1‖∗ ≤ ‖Hc‖∗ / √(Kr)   (13.5)

Combining (13.4) and (13.5) gives (13.2).


Proof of Theorem 13.7

Step 2 (using feasibility + RIP). Show that ∃ ρ < √(K/2) s.t.

‖H0 + H1‖F ≤ ρ ∑_{j≥2} ‖Hj‖F   (13.6)

If this claim holds, then

‖H0 + H1‖F ≤ ρ ∑_{j≥2} ‖Hj‖F ≤ ρ ‖H0‖∗ / √(Kr)   (by (13.2))
            ≤ ρ √(2r) ‖H0‖F / √(Kr) = ρ √(2/K) ‖H0‖F
            ≤ ρ √(2/K) ‖H0 + H1‖F   (13.7)

where the second line uses ‖H0‖∗ ≤ √(2r) ‖H0‖F (as rank(H0) ≤ 2r), and the last line holds since, by construction, H0 and H1 lie in orthogonal subspaces.

This bound (13.7) cannot hold with ρ < √(K/2) unless H0 + H1 = 0 (equivalently, H0 = H1 = 0)

Matrix recovery 13-35


Proof of Theorem 13.7

We now prove (13.6). To connect H0 + H1 with ∑_{j≥2} Hj, we use feasibility:

A(H) = 0   ⟺   A(H0 + H1) = −∑_{j≥2} A(Hj),

which taken collectively with RIP yields

(1 − δ(2+K)r^lb) ‖H0 + H1‖F ≤ ‖A(H0 + H1)‖F = ‖ ∑_{j≥2} A(Hj) ‖F
                             ≤ ∑_{j≥2} ‖A(Hj)‖F
                             ≤ ∑_{j≥2} (1 + δKr^ub) ‖Hj‖F

This establishes (13.6) as long as ρ := (1 + δKr^ub) / (1 − δ(2+K)r^lb) < √(K/2).

Matrix recovery 13-36


Gaussian sampling operators satisfy RIP

If the entries of {Ai}_{i=1}^m are i.i.d. N(0, 1/m), then

δ5r(A) < (√3 − √2) / (√3 + √2)

with high prob., provided that

m ≳ nr   (near-optimal sample size)

This satisfies the assumption of Theorem 13.7 with K = 3

Matrix recovery 13-37


Precise phase transition

Using the statistical dimension machinery, we can locate the precise phase transition (Amelunxen, Lotz, McCoy & Tropp ’13):

nuclear norm min  { works if m > stat-dim(D(‖·‖∗, X))
                  { fails if  m < stat-dim(D(‖·‖∗, X))

where

stat-dim(D(‖·‖∗, X)) ≈ n² ψ(r/n)

and

ψ(ρ) = inf_{τ≥0} { ρ + (1 − ρ) [ ρ(1 + τ²) + (1 − ρ) ∫_τ^2 (u − τ)² (√(4 − u²)/π) du ] }

Matrix recovery 13-38
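The function ψ is straightforward to evaluate numerically; the sketch below (an added illustration using scipy) computes ψ(ρ) by one-dimensional integration and minimization over τ.

```python
# Evaluate psi(rho) = inf_{tau>=0} { rho + (1-rho)[rho(1+tau^2)
#   + (1-rho) * int_tau^2 (u-tau)^2 sqrt(4-u^2)/pi du] }   (sketch).
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

def psi(rho):
    def obj(tau):
        integral = 0.0
        if tau < 2:
            integral = quad(lambda u: (u - tau)**2 * np.sqrt(4 - u**2) / np.pi,
                            tau, 2)[0]
        return rho + (1 - rho) * (rho * (1 + tau**2) + (1 - rho) * integral)
    return minimize_scalar(obj, bounds=(0.0, 3.0), method="bounded").fun

for rho in (0.1, 0.25, 0.5):
    print(rho, psi(rho))      # m/n^2 at the predicted phase transition
```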


Aside: subgradient of nuclear norm

Subdifferential (set of subgradients) of ‖ · ‖∗ at M is

∂‖M‖∗ = {UV⊤ + W : PT(W) = 0, ‖W‖ ≤ 1}

• Does not depend on the singular values of M

• Z ∈ ∂‖M‖∗ iff

PT(Z) = UV⊤ and ‖PT⊥(Z)‖ ≤ 1.

Matrix recovery 13-39


Derivation of the statistical dimension

WLOG, suppose X = [Ir, 0; 0, 0]; then ∂‖X‖∗ = { [Ir, 0; 0, W] : ‖W‖ ≤ 1 }.

Let G = [G11, G12; G21, G22] be i.i.d. standard Gaussian.

From the convex geometry lecture, we know that

stat-dim(D(‖·‖∗, X)) ≈ inf_{τ≥0} E[ inf_{Z∈∂‖X‖∗} ‖G − τZ‖F² ]
                     = inf_{τ≥0} E[ inf_{W:‖W‖≤1} ‖ [G11, G12; G21, G22] − τ [Ir, 0; 0, W] ‖F² ]

Matrix recovery 13-40


Derivation of statistical dimension

Observe that

E[ inf_{W:‖W‖≤1} ‖ [G11, G12; G21, G22] − τ [Ir, 0; 0, W] ‖F² ]
= E[ ‖G11 − τIr‖F² + ‖G21‖F² + ‖G12‖F² + inf_{‖W‖≤1} ‖G22 − τW‖F² ]
= r(2n − r + τ²) + E[ ∑_{i=1}^{n−r} (σi(G22) − τ)₊² ],

which involves the empirical distribution of σi(G22)/√(n − r). Recall from random matrix theory (Marchenko-Pastur law)

(1/(n − r)) E[ ∑_{i=1}^{n−r} (σi(G̃22) − τ)₊² ] → ∫_0^2 (u − τ)₊² (√(4 − u²)/π) du,

where G̃22 ∼ N(0, (1/(n − r)) I). Taking ρ = r/n and minimizing over τ lead to a closed-form expression for the phase transition boundary.

Matrix recovery 13-41



Numerical phase transition (n = 30)

Figure credit: Amelunxen, Lotz, McCoy, & Tropp ’13

Matrix recovery 13-42


Sampling operators that do NOT satisfy RIP

Unfortunately, many sampling operators fail to satisfy RIP (e.g. none of the 4 motivating examples in this lecture satisfies RIP)

Two important examples:

• Phase retrieval / solving random quadratic systems of equations

• Matrix completion

Matrix recovery 13-43


Phase retrieval / solving random quadratic systems of equations

Matrix recovery 13-44


Rank-one measurements

Measurements: see (13.1)

yi = ai⊤ (xx⊤) ai = ⟨aiai⊤, M⟩, 1 ≤ i ≤ m,   with M := xx⊤ and Ai := aiai⊤

A(X) = [⟨A1, X⟩, ⟨A2, X⟩, · · · , ⟨Am, X⟩]⊤ = [⟨a1a1⊤, X⟩, ⟨a2a2⊤, X⟩, · · · , ⟨amam⊤, X⟩]⊤

Matrix recovery 13-45


Rank-one measurements

Suppose ai ∼ i.i.d. N(0, In)

• If x is independent of {ai}, then

⟨aiai⊤, xx⊤⟩ = |ai⊤x|² ≍ ‖x‖₂²   ⇒   ‖A(xx⊤)‖F ≍ √m ‖xx⊤‖F

• Consider Ai = aiai⊤: with high prob.,

⟨aiai⊤, Ai⟩ = ‖ai‖₂⁴ ≈ n ‖aiai⊤‖F   =⇒   ‖A(Ai)‖F ≥ |⟨aiai⊤, Ai⟩| ≈ n ‖Ai‖F

• If the sample size m ≍ n (information limit) and K ≍ 1, then

[ max_{X: rank(X)=1} ‖A(X)‖F/‖X‖F ] / [ min_{X: rank(X)=1} ‖A(X)‖F/‖X‖F ] ≳ n/√m ≳ √n

=⇒ (1 + δK^ub) / (1 − δ2+K^lb) ≥ [ max_{X: rank(X)=1} ‖A(X)‖F/‖X‖F ] / [ min_{X: rank(X)=1} ‖A(X)‖F/‖X‖F ] ≳ √n ≫ √K

This violates the RIP condition in Theorem 13.7 unless K is exceedingly large

Matrix recovery 13-46



Why do we lose RIP?

Problems:

• Some low-rank matrices X (e.g. aiai⊤) might be too aligned with some (rank-1) measurement matrices
  ⇒ loss of “incoherence” in some measurements

• Some measurements ⟨Ai, X⟩ might have too high of a leverage on A(X) when measured in ‖·‖F
  ⇒ Solution: replace ‖·‖F by other norms!

Matrix recovery 13-47


Mixed-norm RIP

Solution: modify RIP appropriately ...

Definition 13.8 (RIP-ℓ2/ℓ1)
Let ξr^ub(A) and ξr^lb(A) be the smallest quantities s.t.

(1 − ξr^lb)‖X‖F ≤ ‖A(X)‖₁ ≤ (1 + ξr^ub)‖X‖F,   ∀X : rank(X) ≤ r

• More generally, it only requires A to satisfy

[ sup_{X: rank(X)≤r} ‖A(X)‖₁/‖X‖F ] / [ inf_{X: rank(X)≤r} ‖A(X)‖₁/‖X‖F ] ≤ (1 + ξr^ub) / (1 − ξr^lb)   (13.8)

Matrix recovery 13-48


Analyzing phase retrieval via RIP-ℓ2/ℓ1

Theorem 13.9 (Chen, Chi, Goldsmith ’15)
Theorem 13.7 continues to hold if we replace δr^ub and δr^lb with ξr^ub and ξr^lb (defined in (13.8)), respectively

• Follows the same proof as for Theorem 13.7, except that ‖·‖F is replaced by ‖·‖₁ in Slide 13-36

• Back to the example in Slide 13-46: if x is independent of {ai}, then

⟨aiai⊤, xx⊤⟩ = |ai⊤x|² ≍ ‖x‖₂²   ⇒   ‖A(xx⊤)‖₁ ≍ m ‖xx⊤‖F

‖A(Ai)‖₁ = |⟨aiai⊤, Ai⟩| + ∑_{j: j≠i} |⟨aiai⊤, Aj⟩| ≈ (n + m) ‖Ai‖F

For both cases, ‖A(X)‖₁/‖X‖F is of the same order if m ≳ n

Matrix recovery 13-49



Analyzing phase retrieval via RIP-ℓ2/ℓ1

Informally, a debiased operator satisfies the RIP condition of Theorem 13.9 when m ≳ nr (Chen, Chi, Goldsmith ’15):

B(X) := [⟨A1 − A2, X⟩, ⟨A3 − A4, X⟩, · · ·]⊤ ∈ R^{m/2}

• Debiasing is crucial when r ≫ 1

• A consequence of the Hanson-Wright inequality for quadratic forms (Hanson & Wright ’71, Rudelson & Vershynin ’13)

Matrix recovery 13-50


Theoretical guarantee for phase retrieval

(PhaseLift)  minimize_{X∈Rⁿ×ⁿ}  tr(X)   (‖·‖∗ for PSD matrices)
             s.t. yi = ai⊤Xai, 1 ≤ i ≤ m
                  X ⪰ 0   (since X = xx⊤)

Theorem 13.10 (Candes, Strohmer, Voroninski ’13, Candes, Li ’14)
Suppose ai ∼ i.i.d. N(0, I). With high prob., PhaseLift recovers xx⊤ exactly as soon as m ≳ n

Matrix recovery 13-51
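A compact cvxpy sketch of PhaseLift in the real-valued case (an added illustration, assuming cvxpy is available; sizes are kept small so the SDP stays cheap):

```python
# PhaseLift (real case, sketch): trace minimization over PSD X with
# quadratic measurements y_i = a_i^T X a_i.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(5)
n, m = 10, 60                                # m on the order of n
x = rng.standard_normal(n)
A = rng.standard_normal((m, n))              # rows a_i
y = (A @ x) ** 2                             # y_i = |a_i^T x|^2

X = cp.Variable((n, n), PSD=True)            # X >= 0
constraints = [cp.quad_form(A[i], X) == y[i] for i in range(m)]
cp.Problem(cp.Minimize(cp.trace(X)), constraints).solve()

M_true = np.outer(x, x)
print(np.linalg.norm(X.value - M_true) / np.linalg.norm(M_true))  # near 0
```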


Extension of phase retrieval

(PhaseLift)  minimize_{X∈Rⁿ×ⁿ}  tr(X)   (‖·‖∗ for PSD matrices)
             s.t. ai⊤Xai = ai⊤Mai, 1 ≤ i ≤ m
                  X ⪰ 0

Theorem 13.11 (Chen, Chi, Goldsmith ’15, Cai, Zhang ’15)
Suppose M ⪰ 0, rank(M) = r, and ai ∼ i.i.d. N(0, I). With high prob., PhaseLift recovers M exactly as soon as m ≳ nr

Matrix recovery 13-52


Matrix completion

Matrix recovery 13-53


Sampling operators for matrix completion

Observation operator (projection onto matrices supported on Ω):

Y = PΩ(M)

where (i, j) ∈ Ω with prob. p (random sampling)

• PΩ does NOT satisfy RIP when p ≪ 1!

• For example, take M with a single nonzero entry that the sampling set Ω (entries marked X) misses:

M =  1 0 0 0 0        Ω:  ? X ? X X
     0 0 0 0 0            X ? X ? X
     0 0 0 0 0            ? X X ? ?
     0 0 0 0 0            X ? ? X ?
     0 0 0 0 0            X ? X ? X

‖PΩ(M)‖F = 0, or equivalently, (1 + δK^ub) / (1 − δ2+K^lb) = ∞

Matrix recovery 13-54


Which sampling pattern?

Consider the following sampling pattern:

X X X X X
? ? ? ? ?
X X X X X
X X X X X
X X X X X

• If some rows / columns are not sampled, recovery is impossible

Matrix recovery 13-55


Which low-rank matrices can we recover?

Compare the following rank-1 matrices:

1 0 0 · · · 0        ? 0 ? · · · 0
0 0 0 · · · 0   ←−   0 ? 0 · · · ?
⋮ ⋮ ⋮                ⋮ ⋮ ⋮
0 0 0 · · · 0        ? 0 ? · · · 0

if we miss the top-left entry, then we cannot hope to recover the matrix

Matrix recovery 13-56


Which low-rank matrices can we recover?

Compare the following rank-1 matrices:

1 1 1 · · · 1        ? 1 ? · · · 1
1 1 1 · · · 1   ←−   1 ? 1 · · · ?
⋮ ⋮ ⋮                ⋮ ⋮ ⋮
1 1 1 · · · 1        ? 1 ? · · · 1

it is possible to fill in all missing entries by exploiting the rank-1 structure

Matrix recovery 13-57


Which low-rank matrices can we recover?

Compare the following rank-1 matrices:

1 0 0 · · · 0        1 1 1 · · · 1
0 0 0 · · · 0        1 1 1 · · · 1
⋮ ⋮ ⋮         vs.    ⋮ ⋮ ⋮
0 0 0 · · · 0        1 1 1 · · · 1
   (hard)               (easy)

Column / row spaces cannot be aligned with canonical basis vectors

Matrix recovery 13-58


Coherence

Definition 13.12
The coherence parameter µ of M = UΣV⊤ is the smallest quantity s.t.

max_i ‖U⊤ei‖₂² ≤ µr/n   and   max_i ‖V⊤ei‖₂² ≤ µr/n

[Figure: a canonical basis vector ei and its projection PU(ei) onto the column space U]

• µ ≥ 1 (since ∑_{i=1}^n ‖U⊤ei‖₂² = ‖U‖F² = r)

• µ = 1 if U = V = (1/√n) 1 (most incoherent)

• µ = n/r if ei ∈ U (most coherent)

Matrix recovery 13-59
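Coherence is simple to compute from the SVD; a short numpy sketch (an added illustration covering both extremes):

```python
# Compute the coherence mu of M = U Sigma V^T (sketch).
import numpy as np

def coherence(M, r):
    n = M.shape[0]
    U, _, Vt = np.linalg.svd(M)
    U, V = U[:, :r], Vt[:r].T
    mu_U = (n / r) * np.max(np.sum(U**2, axis=1))   # (n/r) max_i ||U^T e_i||_2^2
    mu_V = (n / r) * np.max(np.sum(V**2, axis=1))
    return max(mu_U, mu_V)

rng = np.random.default_rng(6)
n, r = 100, 5
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
print(coherence(M, r))               # random factors: small (incoherent)

M_hard = np.zeros((n, n)); M_hard[0, 0] = 1.0
print(coherence(M_hard, 1))          # n: e_1 lies in the column space (most coherent)
```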


Performance guarantee

Theorem 13.13 (Candes & Recht ’09, Candes & Tao ’10, Gross ’11, ...)
Nuclear norm minimization is exact and unique with high probability, provided that

m ≳ µnr log² n

• This result is optimal up to a logarithmic factor

• Established via a RIPless theory

Matrix recovery 13-60


Numerical performance of nuclear-norm minimization

n = 50

Fig. credit: Candes, Recht ’09

Matrix recovery 13-61


KKT condition

Lagrangian:

L(X, Λ) = ‖X‖∗ + ⟨Λ, PΩ(X) − PΩ(M)⟩ = ‖X‖∗ + ⟨PΩ(Λ), X − M⟩

When M is the minimizer, the KKT condition reads

0 ∈ ∂X L(X, Λ)|_{X=M}   ⟺   ∃Λ s.t. −PΩ(Λ) ∈ ∂‖M‖∗
                         ⟺   ∃W s.t. UV⊤ + W is supported on Ω, PT(W) = 0, and ‖W‖ ≤ 1

Matrix recovery 13-62


Optimality condition via dual certificate

A slightly stronger condition than KKT guarantees uniqueness:

Lemma 13.14

M is the unique minimizer of nuclear norm minimization if
• the sampling operator PΩ restricted to T is injective, i.e.

PΩ(H) ≠ 0, ∀ nonzero H ∈ T

• ∃W s.t.

UV⊤ + W is supported on Ω, PT(W) = 0, and ‖W‖ < 1

Matrix recovery 13-63


Proof of Lemma 13.14

For any W0 obeying ‖W0‖ ≤ 1 and PT(W0) = 0, one has

‖M + H‖∗ ≥ ‖M‖∗ + ⟨UV⊤ + W0, H⟩
         = ‖M‖∗ + ⟨UV⊤ + W, H⟩ + ⟨W0 − W, H⟩
         = ‖M‖∗ + ⟨PΩ(UV⊤ + W), H⟩ + ⟨PT⊥(W0 − W), H⟩
         = ‖M‖∗ + ⟨UV⊤ + W, PΩ(H)⟩ + ⟨W0 − W, PT⊥(H)⟩
         = ‖M‖∗ + ⟨W0 − W, PT⊥(H)⟩   (feasibility gives PΩ(H) = 0)
         ≥ ‖M‖∗ + ‖PT⊥(H)‖∗ − ‖W‖ · ‖PT⊥(H)‖∗
           (taking W0 s.t. ⟨W0, PT⊥(H)⟩ = ‖PT⊥(H)‖∗; exercise: how to find such a W0)
         = ‖M‖∗ + (1 − ‖W‖) ‖PT⊥(H)‖∗ > ‖M‖∗

unless PT⊥(H) = 0.

But if PT⊥(H) = 0, then H ∈ T and hence H = 0 by injectivity. Thus, ‖M + H‖∗ > ‖M‖∗ for any H ≠ 0. This concludes the proof.

Matrix recovery 13-64


Constructing dual certificates

Use the “golfing scheme” to produce an approximate dual certificate (Gross ’11)

• Think of it as an iterative algorithm (with sample splitting) to find a solution satisfying the KKT condition

Matrix recovery 13-65


(Optional) Proximal algorithm

In the presence of noise, one needs to solve

minimize_X  ½ ‖y − A(X)‖₂² + λ‖X‖∗

which can be solved via proximal methods.

Proximal operator:

prox_{λ‖·‖∗}(X) = argmin_Z { ½ ‖Z − X‖F² + λ‖Z‖∗ } = U Tλ(Σ) V⊤

where the SVD of X is X = UΣV⊤ with Σ = diag({σi}), and

Tλ(Σ) = diag({(σi − λ)₊})

Matrix recovery 13-66
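The proximal operator is singular value soft-thresholding; a minimal numpy sketch (an added illustration):

```python
# Singular value thresholding: prox of lambda*||.||_* (sketch).
import numpy as np

def svt(X, lam):
    """Return U diag((sigma_i - lam)_+) V^T for the SVD X = U Sigma V^T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

X = np.random.default_rng(7).standard_normal((8, 8))
Z = svt(X, 0.5)
print(np.linalg.matrix_rank(Z) <= np.linalg.matrix_rank(X))   # soft-thresholding never raises rank
```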


(Optional) Proximal algorithm

Algorithm 13.1 Proximal gradient method
for t = 0, 1, · · ·:
    Xt+1 = T_{λµt}( Xt − µt A∗(A(Xt) − y) )
where µt: step size / learning rate

Matrix recovery 13-67
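For the matrix completion operator A = PΩ (so A∗ = PΩ as well), the iteration specializes to the sketch below (an added illustration; svt is the soft-thresholding prox from the previous snippet, repeated so the block is self-contained):

```python
# Proximal gradient for 1/2 ||P_Omega(X) - Y||_F^2 + lam ||X||_* (sketch).
import numpy as np

def svt(X, lam):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

rng = np.random.default_rng(8)
n, r, p, lam, mu = 40, 2, 0.5, 0.1, 1.0      # mu = 1 is safe since ||P_Omega|| = 1
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
mask = rng.random((n, n)) < p
Y = mask * (M + 0.01 * rng.standard_normal((n, n)))   # noisy observed entries

X = np.zeros((n, n))
for t in range(300):
    grad = mask * X - Y                       # A*(A(X) - y) for A = P_Omega
    X = svt(X - mu * grad, lam * mu)          # X_{t+1} = T_{lam*mu}(X_t - mu*grad)

print(np.linalg.norm(X - M, "fro") / np.linalg.norm(M, "fro"))   # modest error
```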


Reference

• “Lecture notes, Advanced topics in signal processing (ECE 8201),” Y. Chi, 2015.

• “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization,” B. Recht, M. Fazel, P. Parrilo, SIAM Review, 2010.

• “Exact matrix completion via convex optimization,” E. Candes and B. Recht, Foundations of Computational Mathematics, 2009.

• “Matrix completion from a few entries,” R. Keshavan, A. Montanari, S. Oh, IEEE Transactions on Information Theory, 2010.

• “Matrix rank minimization with applications,” M. Fazel, Ph.D. thesis, 2002.

• “Modeling the world from internet photo collections,” N. Snavely, S. Seitz, and R. Szeliski, International Journal of Computer Vision, 2008.

Matrix recovery 13-68


Reference

• “Computer vision: algorithms and applications,” R. Szeliski, Springer, 2011.

• “Shape and motion from image streams under orthography: a factorization method,” C. Tomasi and T. Kanade, International Journal of Computer Vision, 1992.

• “Topics in random matrix theory,” T. Tao, American Mathematical Society, 2012.

• “A singular value thresholding algorithm for matrix completion,” J. Cai, E. Candes, Z. Shen, SIAM Journal on Optimization, 2010.

• “Recovering low-rank matrices from few coefficients in any basis,” D. Gross, IEEE Transactions on Information Theory, 2011.

• “Incoherence-optimal matrix completion,” Y. Chen, IEEE Transactions on Information Theory, 2015.

Matrix recovery 13-69


Reference

• “The power of convex relaxation: Near-optimal matrix completion,” E. Candes and T. Tao, 2010.

• “Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements,” E. Candes and Y. Plan, IEEE Transactions on Information Theory, 2011.

• “PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming,” E. Candes, T. Strohmer, and V. Voroninski, Communications on Pure and Applied Mathematics, 2013.

• “Solving quadratic equations via PhaseLift when there are about as many equations as unknowns,” E. Candes, X. Li, Foundations of Computational Mathematics, 2014.

Matrix recovery 13-70


Reference

• “A bound on tail probabilities for quadratic forms in independent random variables,” D. Hanson, F. Wright, Annals of Mathematical Statistics, 1971.

• “Hanson-Wright inequality and sub-Gaussian concentration,” M. Rudelson, R. Vershynin, Electronic Communications in Probability, 2013.

• “Exact and stable covariance estimation from quadratic sampling via convex programming,” Y. Chen, Y. Chi, and A. Goldsmith, IEEE Transactions on Information Theory, 2015.

• “ROP: Matrix recovery via rank-one projections,” T. Cai and A. Zhang, Annals of Statistics, 2015.

• “Low rank matrix recovery from rank one measurements,” R. Kueng, H. Rauhut, and U. Terstiege, Applied and Computational Harmonic Analysis, 2017.

Matrix recovery 13-71


Reference

• “Mathematics of sparsity (and a few other things),” E. Candes, International Congress of Mathematicians, 2014.

Matrix recovery 13-72