
Matrix Completion from a Few Entries

Raghunandan Keshavan, Andrea Montanari and Sewoong Oh

Stanford University

Physics of Algorithms, Santa Fe, Aug 31, 2009


Motivating Example: Recommender System

Netflix Challenge

[Figure: a large, sparsely filled user × movie ratings matrix M, with entries in {1, ..., 5}]

5 · 10^5 users, 2 · 10^4 movies, 10^8 ratings

The Model


Matrix Completion Problem

[Figure: M = U Σ V^T, with M of size n × nα, U of size n × r, Σ of size r × r, V of size nα × r; a second panel shows the sampled matrix M^E.]

1. Low-rank matrix M = U Σ V^T, with U^T U = n I, V^T V = nα I, and Σ = diag(Σ_1, Σ_2, ..., Σ_r).

2. Uniformly random sample E ⊂ [n] × [nα], given its size |E|. (Notation: [k] = {1, 2, ..., k}.)
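To make the setup concrete, here is a minimal numpy sketch of this generative model (function and variable names are ours, not from the slides): draw a rank-r matrix M = U Σ V^T and reveal a uniformly random set E of its entries.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_instance(n, alpha, r, num_samples):
    """Sample a rank-r matrix M = U @ S @ V.T and a uniform set E of positions.

    Gaussian factors satisfy the slides' normalization (U^T U = n I,
    V^T V = n*alpha I) only approximately, which is enough for a demo.
    """
    m = int(alpha * n)
    U = rng.standard_normal((n, r))
    V = rng.standard_normal((m, r))
    S = np.diag(rng.uniform(1.0, 2.0, size=r))   # Sigma_1, ..., Sigma_r
    M = U @ S @ V.T
    # E: a uniformly random subset of [n] x [n*alpha], chosen given its size |E|
    flat = rng.choice(n * m, size=num_samples, replace=False)
    mask = np.zeros(n * m, dtype=bool)
    mask[flat] = True
    return M, mask.reshape(n, m)

M, mask = make_instance(n=1000, alpha=1.0, r=10, num_samples=50_000)
```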

Matrix Completion Problems

For any estimate M̂, let RMSE = (1/√(mn)) ||M − M̂||_F, where m = nα and ||A||_F² = Σ_{ij} A_{ij}².

Q1. How many samples do we need to get RMSE ≤ δ?
A rank-r matrix has roughly (1 + α)rn degrees of freedom, so the target is |E| = O(nr).

Q2. How many samples do we need to recover M exactly?
Every row and column must be sampled at least once, which by a coupon-collector argument requires n log n; the target is |E| = O(n log n).

Q3. What if the samples are corrupted by noise?

Pathological Example

M = e_1 e_1^T =
⎡ 1 0 ··· 0 ⎤
⎢ 0 0 ··· 0 ⎥
⎢ ⋮ ⋮ ⋱ ⋮ ⎥
⎣ 0 0 ··· 0 ⎦   (n × n)

P(observing M_11) = |E| / n²

Unless |E| is of order n², the single informative entry M_11 is missed with high probability, and no algorithm can complete M.

Incoherence Property

M is (µ0, µ1)-incoherent if

A1. M_max ≤ µ0 Σ_1 √r ,

A2. Σ_{a=1}^r U_ia² ≤ µ1 r and Σ_{a=1}^r V_ja² ≤ µ1 r, for all i, j.

[Candès, Recht 2008 [1]]
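The slides do not pin down the scaling conventions, so as a reading aid here is a hedged numpy check of A1-A2 under the normalization above (U^T U = n I, V^T V = nα I, which forces Σ = diag(s)/√(nm) for the ordinary SVD singular values s). Treat the µ0, µ1 below as our interpretation, not the paper's definition.

```python
import numpy as np

def incoherence_params(M, r):
    """Empirical (mu0, mu1) under our reading of A1-A2."""
    n, m = M.shape
    X, s, Yt = np.linalg.svd(M, full_matrices=False)
    U = np.sqrt(n) * X[:, :r]          # rescaled so U^T U = n I
    V = np.sqrt(m) * Yt[:r, :].T       # rescaled so V^T V = m I
    Sigma1 = s[0] / np.sqrt(n * m)     # Sigma in the slides' normalization
    mu0 = np.abs(M).max() / (Sigma1 * np.sqrt(r))            # A1
    mu1 = max((U ** 2).sum(axis=1).max(),
              (V ** 2).sum(axis=1).max()) / r                # A2
    return mu0, mu1
```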

Previous Work

Theorem (Candès, Recht 2008 [1])
Let M be an n × nα matrix of rank r satisfying the (µ0, µ1)-incoherence condition. If

|E| ≥ C(α, µ0, µ1) r n^{6/5} log n ,

then w.h.p. semidefinite programming reconstructs M exactly.

Main Contributions

Questions                          Main Results
1. RMSE ≤ δ                        RMSE ≤ C(α) (nr / |E|)^{1/2}
2. Exact reconstruction            |E| = O(n log n)
3. Noisy reconstruction            (1/√(mn)) ||M − M̂||_F ≤ C (n √(αr) / |E|) ||Z^E||_2
   (N = M + Z)
4. Complexity                      O(|E| r log n)

The Algorithm and Main Theorems


Naïve Approach

M^E_ij = M_ij if (i, j) ∈ E, and 0 otherwise.

Write the SVD of the zero-filled matrix as M^E = Σ_{k=1}^n x_k σ_k y_k^T.

Rank-r projection:

P_r(M^E) ≡ (n²α / |E|) Σ_{k=1}^r x_k σ_k y_k^T

(The prefactor n²α/|E| compensates for the fact that only a fraction |E|/(n²α) of the entries is observed.)
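In numpy this projection is a few lines (a sketch; the helper name is ours):

```python
import numpy as np

def rank_r_projection(M, mask, r):
    """P_r(M^E): rescaled best rank-r approximation of the zero-filled samples."""
    n, m = M.shape
    ME = np.where(mask, M, 0.0)          # M^E_ij = M_ij on E, 0 elsewhere
    scale = n * m / mask.sum()           # = n^2 * alpha / |E|, since m = n * alpha
    X, s, Yt = np.linalg.svd(ME, full_matrices=False)
    return scale * (X[:, :r] * s[:r]) @ Yt[:r, :]
```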

Naïve Approach Fails

Define deg(row_i) ≡ # of samples in row i.

For |E| = O(n), a few over-sampled rows and columns create spurious singular values of size Ω(√(log n / log log n)).

Solution: Trimming

M̃^E_ij = 0 if deg(row_i) > 2 E[deg(row_i)] ,
          0 if deg(col_j) > 2 E[deg(col_j)] ,
          M^E_ij otherwise.
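The trimming step is also short in numpy (our sketch; the thresholds follow the rule above, with E[deg] estimated by the empirical mean, which is exact on average under uniform sampling):

```python
import numpy as np

def trim(M, mask):
    """Zero out rows and columns whose degree exceeds twice the mean degree."""
    row_deg = mask.sum(axis=1)
    col_deg = mask.sum(axis=0)
    keep = np.outer(row_deg <= 2 * row_deg.mean(),
                    col_deg <= 2 * col_deg.mean())
    trimmed = mask & keep
    return np.where(trimmed, M, 0.0), trimmed
```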


Main Result

Theorem (Keshavan, Montanari, Oh 2009 [2])
Let M be an n × nα matrix of rank r with entries bounded by M_max. Then, w.h.p.,

(1/(n M_max)) ||M − P_r(M̃^E)||_F ≤ C(α) √(nr / |E|) ,

i.e. the RMSE, measured in units of M_max, is at most C(α) √(nr/|E|).

The Algorithm

OptSpace
Input: sample positions E, sample values M^E, rank r
Output: estimate M̂
1: Trim M^E, and let M̃^E be the output;
2: Compute the rank-r projection P_r(M̃^E) = X_0 S_0 Y_0^T;
3: Minimize F(X, Y) by gradient descent starting at (X_0, Y_0), where

F(X, Y) = min_{S ∈ R^{r×r}} Σ_{(i,j)∈E} (M_ij − (X S Y^T)_ij)² .
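A compact sketch of step 3 (ours, not the authors' reference implementation): given (X, Y), the inner minimization over S is a linear least-squares problem, and we take plain Euclidean gradient steps on X and Y with S held at its optimum. The published OptSpace descends on the Grassmann manifold, so treat this as a simplified stand-in.

```python
import numpy as np

def refine(M, mask, X, Y, n_iter=50, lr=1e-4):
    """Minimize F(X, Y) = min_S sum_{(i,j) in E} (M_ij - (X S Y^T)_ij)^2."""
    I, J = np.nonzero(mask)
    b = M[I, J]
    r = X.shape[1]
    for _ in range(n_iter):
        # Inner problem: the prediction at (i, j) is <vec(S), X_i (x) Y_j>,
        # so the optimal S solves a linear least-squares problem.
        A = np.einsum('ka,kb->kab', X[I], Y[J]).reshape(len(I), r * r)
        S = np.linalg.lstsq(A, b, rcond=None)[0].reshape(r, r)
        # Residual supported on the observed entries E only.
        R = np.zeros_like(M)
        R[I, J] = A @ S.ravel() - b
        # Gradients of F; since S is at its optimum (dF/dS = 0), these are exact.
        gX, gY = 2 * R @ Y @ S.T, 2 * R.T @ X @ S
        X, Y = X - lr * gX, Y - lr * gY
    return X, S, Y
```

Starting from the (X_0, Y_0) of step 2, the final estimate is M̂ = X S Y^T.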

Main Result

Theorem (Keshavan, Montanari, Oh 2009 [2])
Assume r = O(1), and let M be an n × nα matrix satisfying (µ0, µ1)-incoherence with σ_1(M)/σ_r(M) = O(1). If

|E| ≥ C′ n log n ,

then OptSpace returns, w.h.p., the matrix M.

Comparison

Theorem (Candès, Tao 2009 [3])
Assume a strongly incoherent matrix M. If |E| ≥ C r n (log n)^6, then semidefinite programming returns, w.h.p., the matrix M.

Corrupted Observations

N = M + Z, for any n × nα matrix Z.

N^E (defined analogously to M^E) is the input to OptSpace.

Theorem (Keshavan, Montanari, Oh 2009 [4])
Let N = M + Z, with M as above and Z any n × nα matrix. If |E| ≥ C′ n log n, then OptSpace with input N^E returns M̂ such that, w.h.p.,

(1/√(mn)) ||M − M̂||_F ≤ C (n √(αr) / |E|) ||Z^E||_2 ,

provided the right-hand side is smaller than Σ_r. Here ||A||_2 = sup_{v ≠ 0} ||Av|| / ||v|| is the operator norm.

Corrupted Observations

Now let the Z_ij be independent, with E[Z_ij] = 0 and P(|Z_ij| ≥ x) ≤ C e^{−x²/(2σ²)}.

Theorem (Keshavan, Montanari, Oh 2009 [4])
If Z is a random matrix with entries drawn as above, then

||Z^E||_2 ≤ C σ ( √α |E| log |E| / n )^{1/2} .

Corollary
Let N = M + Z, with Z distributed as above. Then OptSpace with input N^E returns M̂ such that

(1/√(mn)) ||M − M̂||_F ≤ C_α σ (nr / |E|)^{1/2} √(log |E|) .
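The ||Z^E||_2 scaling is easy to see numerically: sample Gaussian noise, zero out unobserved entries, and measure the operator norm (our quick check, not from the slides; the bound should dominate up to the constant C).

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha, sigma = 1000, 1.0, 1.0
m = int(alpha * n)

Z = sigma * rng.standard_normal((n, m))       # Z_ij i.i.d. N(0, sigma^2)
mask = rng.random((n, m)) < (10 * n) / (n * m)  # |E| = O(n) regime
ZE = np.where(mask, Z, 0.0)                   # Z^E: noise on observed entries

E_size = mask.sum()
op_norm = np.linalg.norm(ZE, ord=2)           # ||Z^E||_2 = largest singular value
bound = sigma * np.sqrt(np.sqrt(alpha) * E_size * np.log(E_size) / n)
print(op_norm, bound)
```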

Simulation Results

M = U V^T with U_ij ∼ N(0, 1), V_ij ∼ N(0, 1); r = 10, α = 1, n = 1000.
M is declared recovered if ||M − M̂||_F / ||M||_F < 10^{−4}.

[Figure: recovery rate vs. |E|/n for OptSpace, FPCA, SVT, and ADMiRA, together with the information-theoretic lower bound.]
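The success criterion translates directly into code (a one-liner sketch; names ours). Combined with the make_instance, rank_r_projection, trim, and refine sketches above, this is enough to trace a recovery-rate curve like the one on this slide.

```python
import numpy as np

def recovered(M, M_hat, tol=1e-4):
    """Slide's success criterion: relative Frobenius error below 1e-4."""
    return np.linalg.norm(M - M_hat) / np.linalg.norm(M) < tol
```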

Simulation Results

Setup from [5]: U_ij ∼ N(0, σ² = 20/√n), Z_ij ∼ N(0, 1); r = 2, α = 1, n = 600.

[Figure: RMSE vs. |E|/n for Convex Relaxation, ADMiRA, and OptSpace, together with the oracle bound.]

Conclusion

Main Results
1. RMSE ≤ δ: |E| = O(nr)
2. Exact: |E| = O(n log n)
3. Noisy: RMSE ≤ C (n √(αr) / |E|) ||Z^E||_2
4. Complexity: O(|E| r log n)

What's left?
1. Prior knowledge of rank? RankEstimation is exact w.h.p. for |E| = O(n).
2. r = Θ(n^β)? Only a suboptimal bound so far.

Thank you!

References

[1] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. arXiv:0805.4471, 2008.
[2] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. arXiv:0901.3150, January 2009.
[3] E. J. Candès and T. Tao. The power of convex relaxation: Near-optimal matrix completion. arXiv:0903.1476, 2009.
[4] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from noisy entries. arXiv:0906.2027, June 2009.
[5] E. J. Candès and Y. Plan. Matrix completion with noise. arXiv:0903.3131, 2009.

[Backup figure: histograms of the singular values of the untrimmed vs. trimmed SVD. After trimming, the bulk of the spectrum is confined below c √(|E|/n), leaving the informative singular value of order Σ_1 |E|/n clearly separated.]