
Matrix Completion from a Few Entries

Raghunandan Keshavan, Andrea Montanari and Sewoong Oh

Stanford University

Physics of Algorithms, Santa Fe, Aug 31, 2009


Motivating Example: Recommender System

Netflix Challenge

[Figure: a large, sparsely filled user × movie ratings matrix M, with entries in {1, ..., 5}]

5 · 10^5 users, 2 · 10^4 movies, 10^8 ratings

The Model


Matrix Completion Problem

[Figure: M = U Σ V^T, with M of size n × nα, U of size n × r, Σ of size r × r, V of size nα × r; a second panel shows the sampled matrix M^E.]

1. Low-rank matrix M = U Σ V^T, with U^T U = n I, V^T V = nα I, and Σ = diag(Σ_1, Σ_2, ..., Σ_r).

2. Uniformly random sample E ⊂ [n] × [nα], given its size |E|. (Notation: [k] = {1, 2, ..., k}.)
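To make the setup concrete, here is a minimal numpy sketch of this generative model (function and variable names are ours, not from the slides): draw a rank-r matrix M = U Σ V^T and reveal a uniformly random set E of its entries.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_instance(n, alpha, r, num_samples):
    """Sample a rank-r matrix M = U @ S @ V.T and a uniform set E of positions.

    Gaussian factors satisfy the slides' normalization (U^T U = n I,
    V^T V = n*alpha I) only approximately, which is enough for a demo.
    """
    m = int(alpha * n)
    U = rng.standard_normal((n, r))
    V = rng.standard_normal((m, r))
    S = np.diag(rng.uniform(1.0, 2.0, size=r))   # Sigma_1, ..., Sigma_r
    M = U @ S @ V.T
    # E: a uniformly random subset of [n] x [n*alpha], chosen given its size |E|
    flat = rng.choice(n * m, size=num_samples, replace=False)
    mask = np.zeros(n * m, dtype=bool)
    mask[flat] = True
    return M, mask.reshape(n, m)

M, mask = make_instance(n=1000, alpha=1.0, r=10, num_samples=50_000)
```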

Matrix Completion Problems

For any estimate M̂, let RMSE = (1/√(mn)) ||M − M̂||_F, where m = nα and ||A||_F² = Σ_{ij} A_{ij}².

Q1. How many samples do we need to get RMSE ≤ δ?
A rank-r matrix has roughly (1 + α)rn degrees of freedom, so the target is |E| = O(nr).

Q2. How many samples do we need to recover M exactly?
Every row and column must be sampled at least once, which by a coupon-collector argument requires n log n; the target is |E| = O(n log n).

Q3. What if the samples are corrupted by noise?

Pathological Example

M = e_1 e_1^T =
⎡ 1 0 ··· 0 ⎤
⎢ 0 0 ··· 0 ⎥
⎢ ⋮ ⋮ ⋱ ⋮ ⎥
⎣ 0 0 ··· 0 ⎦   (n × n)

P(observing M_11) = |E| / n²

Unless |E| is of order n², the single informative entry M_11 is missed with high probability, and no algorithm can complete M.

Incoherence Property

M is (µ0, µ1)-incoherent if

A1. M_max ≤ µ0 Σ_1 √r ,

A2. Σ_{a=1}^r U_ia² ≤ µ1 r and Σ_{a=1}^r V_ja² ≤ µ1 r, for all i, j.

[Candès, Recht 2008 [1]]
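The slides do not pin down the scaling conventions, so as a reading aid here is a hedged numpy check of A1-A2 under the normalization above (U^T U = n I, V^T V = nα I, which forces Σ = diag(s)/√(nm) for the ordinary SVD singular values s). Treat the µ0, µ1 below as our interpretation, not the paper's definition.

```python
import numpy as np

def incoherence_params(M, r):
    """Empirical (mu0, mu1) under our reading of A1-A2."""
    n, m = M.shape
    X, s, Yt = np.linalg.svd(M, full_matrices=False)
    U = np.sqrt(n) * X[:, :r]          # rescaled so U^T U = n I
    V = np.sqrt(m) * Yt[:r, :].T       # rescaled so V^T V = m I
    Sigma1 = s[0] / np.sqrt(n * m)     # Sigma in the slides' normalization
    mu0 = np.abs(M).max() / (Sigma1 * np.sqrt(r))            # A1
    mu1 = max((U ** 2).sum(axis=1).max(),
              (V ** 2).sum(axis=1).max()) / r                # A2
    return mu0, mu1
```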

Previous Work

Theorem (Candès, Recht 2008 [1])
Let M be an n × nα matrix of rank r satisfying the (µ0, µ1)-incoherence condition. If

|E| ≥ C(α, µ0, µ1) r n^{6/5} log n ,

then w.h.p. semidefinite programming reconstructs M exactly.

Main Contributions

Questions                          Main Results
1. RMSE ≤ δ                        RMSE ≤ C(α) (nr / |E|)^{1/2}
2. Exact reconstruction            |E| = O(n log n)
3. Noisy reconstruction            (1/√(mn)) ||M − M̂||_F ≤ C (n √(αr) / |E|) ||Z^E||_2
   (N = M + Z)
4. Complexity                      O(|E| r log n)

The Algorithm and Main Theorems


Naïve Approach

M^E_ij = M_ij if (i, j) ∈ E, and 0 otherwise.

Write the SVD of the zero-filled matrix as M^E = Σ_{k=1}^n x_k σ_k y_k^T.

Rank-r projection:

P_r(M^E) ≡ (n²α / |E|) Σ_{k=1}^r x_k σ_k y_k^T

(The prefactor n²α/|E| compensates for the fact that only a fraction |E|/(n²α) of the entries is observed.)
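In numpy this projection is a few lines (a sketch; the helper name is ours):

```python
import numpy as np

def rank_r_projection(M, mask, r):
    """P_r(M^E): rescaled best rank-r approximation of the zero-filled samples."""
    n, m = M.shape
    ME = np.where(mask, M, 0.0)          # M^E_ij = M_ij on E, 0 elsewhere
    scale = n * m / mask.sum()           # = n^2 * alpha / |E|, since m = n * alpha
    X, s, Yt = np.linalg.svd(ME, full_matrices=False)
    return scale * (X[:, :r] * s[:r]) @ Yt[:r, :]
```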

Naïve Approach Fails

Define deg(row_i) ≡ # of samples in row i.

For |E| = O(n), a few over-sampled rows and columns create spurious singular values of size Ω(√(log n / log log n)).

Solution: Trimming

M̃^E_ij = 0 if deg(row_i) > 2 E[deg(row_i)] ,
          0 if deg(col_j) > 2 E[deg(col_j)] ,
          M^E_ij otherwise.
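The trimming step is also short in numpy (our sketch; the thresholds follow the rule above, with E[deg] estimated by the empirical mean, which is exact on average under uniform sampling):

```python
import numpy as np

def trim(M, mask):
    """Zero out rows and columns whose degree exceeds twice the mean degree."""
    row_deg = mask.sum(axis=1)
    col_deg = mask.sum(axis=0)
    keep = np.outer(row_deg <= 2 * row_deg.mean(),
                    col_deg <= 2 * col_deg.mean())
    trimmed = mask & keep
    return np.where(trimmed, M, 0.0), trimmed
```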


Main Result

Theorem (Keshavan, Montanari, Oh 2009 [2])
Let M be an n × nα matrix of rank r with entries bounded by M_max. Then, w.h.p.,

(1/(n M_max)) ||M − P_r(M̃^E)||_F ≤ C(α) √(nr / |E|) ,

i.e. the RMSE, measured in units of M_max, is at most C(α) √(nr/|E|).

The Algorithm

OptSpace
Input: sample positions E, sample values M^E, rank r
Output: estimate M̂
1: Trim M^E, and let M̃^E be the output;
2: Compute the rank-r projection P_r(M̃^E) = X_0 S_0 Y_0^T;
3: Minimize F(X, Y) by gradient descent starting at (X_0, Y_0), where

F(X, Y) = min_{S ∈ R^{r×r}} Σ_{(i,j)∈E} (M_ij − (X S Y^T)_ij)² .
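A compact sketch of step 3 (ours, not the authors' reference implementation): given (X, Y), the inner minimization over S is a linear least-squares problem, and we take plain Euclidean gradient steps on X and Y with S held at its optimum. The published OptSpace descends on the Grassmann manifold, so treat this as a simplified stand-in.

```python
import numpy as np

def refine(M, mask, X, Y, n_iter=50, lr=1e-4):
    """Minimize F(X, Y) = min_S sum_{(i,j) in E} (M_ij - (X S Y^T)_ij)^2."""
    I, J = np.nonzero(mask)
    b = M[I, J]
    r = X.shape[1]
    for _ in range(n_iter):
        # Inner problem: the prediction at (i, j) is <vec(S), X_i (x) Y_j>,
        # so the optimal S solves a linear least-squares problem.
        A = np.einsum('ka,kb->kab', X[I], Y[J]).reshape(len(I), r * r)
        S = np.linalg.lstsq(A, b, rcond=None)[0].reshape(r, r)
        # Residual supported on the observed entries E only.
        R = np.zeros_like(M)
        R[I, J] = A @ S.ravel() - b
        # Gradients of F; since S is at its optimum (dF/dS = 0), these are exact.
        gX, gY = 2 * R @ Y @ S.T, 2 * R.T @ X @ S
        X, Y = X - lr * gX, Y - lr * gY
    return X, S, Y
```

Starting from the (X_0, Y_0) of step 2, the final estimate is M̂ = X S Y^T.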

Main Result

Theorem (Keshavan, Montanari, Oh 2009 [2])
Assume r = O(1), and let M be an n × nα matrix satisfying (µ0, µ1)-incoherence with σ_1(M)/σ_r(M) = O(1). If

|E| ≥ C′ n log n ,

then OptSpace returns, w.h.p., the matrix M.

Comparison

Theorem (Candès, Tao 2009 [3])
Assume a strongly incoherent matrix M. If |E| ≥ C r n (log n)^6, then semidefinite programming returns, w.h.p., the matrix M.

Corrupted Observations

N = M + Z, for any n × nα matrix Z.

N^E (defined analogously to M^E) is the input to OptSpace.

Theorem (Keshavan, Montanari, Oh 2009 [4])
Let N = M + Z, with M as above and Z any n × nα matrix. If |E| ≥ C′ n log n, then OptSpace with input N^E returns M̂ such that, w.h.p.,

(1/√(mn)) ||M − M̂||_F ≤ C (n √(αr) / |E|) ||Z^E||_2 ,

provided the right-hand side is smaller than Σ_r. Here ||A||_2 = sup_{v ≠ 0} ||Av|| / ||v|| is the operator norm.

Corrupted Observations

Now let the Z_ij be independent, with E[Z_ij] = 0 and P(|Z_ij| ≥ x) ≤ C e^{−x²/(2σ²)}.

Theorem (Keshavan, Montanari, Oh 2009 [4])
If Z is a random matrix with entries drawn as above, then

||Z^E||_2 ≤ C σ ( √α |E| log |E| / n )^{1/2} .

Corollary
Let N = M + Z, with Z distributed as above. Then OptSpace with input N^E returns M̂ such that

(1/√(mn)) ||M − M̂||_F ≤ C_α σ (nr / |E|)^{1/2} √(log |E|) .
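The ||Z^E||_2 scaling is easy to see numerically: sample Gaussian noise, zero out unobserved entries, and measure the operator norm (our quick check, not from the slides; the bound should dominate up to the constant C).

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha, sigma = 1000, 1.0, 1.0
m = int(alpha * n)

Z = sigma * rng.standard_normal((n, m))       # Z_ij i.i.d. N(0, sigma^2)
mask = rng.random((n, m)) < (10 * n) / (n * m)  # |E| = O(n) regime
ZE = np.where(mask, Z, 0.0)                   # Z^E: noise on observed entries

E_size = mask.sum()
op_norm = np.linalg.norm(ZE, ord=2)           # ||Z^E||_2 = largest singular value
bound = sigma * np.sqrt(np.sqrt(alpha) * E_size * np.log(E_size) / n)
print(op_norm, bound)
```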

Simulation Results

M = U V^T with U_ij ∼ N(0, 1), V_ij ∼ N(0, 1); r = 10, α = 1, n = 1000.
M is declared recovered if ||M − M̂||_F / ||M||_F < 10^{−4}.

[Figure: recovery rate vs. |E|/n for OptSpace, FPCA, SVT, and ADMiRA, together with the information-theoretic lower bound.]
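The success criterion translates directly into code (a one-liner sketch; names ours). Combined with the make_instance, rank_r_projection, trim, and refine sketches above, this is enough to trace a recovery-rate curve like the one on this slide.

```python
import numpy as np

def recovered(M, M_hat, tol=1e-4):
    """Slide's success criterion: relative Frobenius error below 1e-4."""
    return np.linalg.norm(M - M_hat) / np.linalg.norm(M) < tol
```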

Simulation Results

Setup from [5]: U_ij ∼ N(0, σ² = 20/√n), Z_ij ∼ N(0, 1); r = 2, α = 1, n = 600.

[Figure: RMSE vs. |E|/n for Convex Relaxation, ADMiRA, and OptSpace, together with the oracle bound.]

Conclusion

Main Results
1. RMSE ≤ δ: |E| = O(nr)
2. Exact: |E| = O(n log n)
3. Noisy: RMSE ≤ C (n √(αr) / |E|) ||Z^E||_2
4. Complexity: O(|E| r log n)

What's left?
1. Prior knowledge of rank? RankEstimation is exact w.h.p. for |E| = O(n).
2. r = Θ(n^β)? Only a suboptimal bound so far.

Thank you!

References

[1] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. arXiv:0805.4471, 2008.
[2] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. arXiv:0901.3150, January 2009.
[3] E. J. Candès and T. Tao. The power of convex relaxation: Near-optimal matrix completion. arXiv:0903.1476, 2009.
[4] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from noisy entries. arXiv:0906.2027, June 2009.
[5] E. J. Candès and Y. Plan. Matrix completion with noise. arXiv:0903.3131, 2009.

[Backup figure: histograms of the singular values of the untrimmed vs. trimmed SVD. After trimming, the bulk of the spectrum is confined below c √(|E|/n), leaving the informative singular value of order Σ_1 |E|/n clearly separated.]