ANKUR MOITRA MASSACHUSETTS INSTITUTE OF TECHNOLOGY
BEYOND MATRIX COMPLETION
Based on joint work with Boaz Barak (Harvard)
Part I:
MATRIX COMPLETION
THE NETFLIX PROBLEM
users × movies, ratings of 1-5 stars
480,189 users, 17,770 movies, ~1% of entries observed
Can we (approximately) fill in the missing entries?
MATRIX COMPLETION
Let M be an unknown, approximately low-rank matrix:
M ≈ (comedy) + (drama) + … + (sports), a sum of a few rank-one factors, one per genre
Model: we are given random observations M_{i,j} for all (i,j) ∈ Ω
Is there an efficient algorithm to recover M?
MATRIX COMPLETION
The natural formulation is non-convex, and NP-hard:
min rank(X)  s.t.  (1/|Ω|) Σ_{(i,j)∈Ω} |X_{i,j} − M_{i,j}| ≤ η
There is a powerful, convex relaxation…
THE NUCLEAR NORM
Consider the singular value decomposition of X:
X = U Σ Vᵀ   (U orthogonal, Σ diagonal, Vᵀ orthogonal)
Let σ₁ ≥ σ₂ ≥ … ≥ σᵣ > σᵣ₊₁ = … = σₘ = 0 be the singular values
Then rank(X) = r, and ‖X‖_* = σ₁ + σ₂ + … + σᵣ   (the nuclear norm)
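A minimal numpy sketch of these definitions (our illustration, not part of the talk): rank(X) counts the nonzero singular values, and the nuclear norm is their sum.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 3

# Build a rank-3 matrix as a sum of three rank-one factors.
X = sum(np.outer(rng.standard_normal(n), rng.standard_normal(n)) for _ in range(r))

sigma = np.linalg.svd(X, compute_uv=False)    # singular values, descending
print("numerical rank:", int(np.sum(sigma > 1e-10)))             # -> 3
print("nuclear norm:", sigma.sum())
print("agrees with numpy:", np.isclose(sigma.sum(), np.linalg.norm(X, ord="nuc")))
```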
[Fazel], [Srebro, Shraibman], [Recht, Fazel, Parrilo], [Candes, Recht], [Candes, Tao], [Candes, Plan], [Recht], …
This yields a convex relaxation that can be solved efficiently:
(P)  min ‖X‖_*  s.t.  (1/|Ω|) Σ_{(i,j)∈Ω} |X_{i,j} − M_{i,j}| ≤ η
Theorem: If M is n × n and has rank r, and is C-incoherent, then (P) recovers M exactly from C⁶ nr log²n observations
This is nearly optimal, since there are O(nr) parameters
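As a hedged illustration of (P), the sketch below solves the nuclear-norm relaxation with cvxpy on a small synthetic instance; the solver, sampling rate, and sizes are our assumptions, not the talk's.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
n, r = 30, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-r ground truth

obs = np.argwhere(rng.random((n, n)) < 0.5)   # observed positions Omega
rows, cols = obs[:, 0], obs[:, 1]
eta = 1e-3

X = cp.Variable((n, n))
# Average absolute deviation on Omega, as in the constraint of (P).
avg_dev = cp.sum(cp.abs(X[rows, cols] - M[rows, cols])) / len(obs)
prob = cp.Problem(cp.Minimize(cp.norm(X, "nuc")), [avg_dev <= eta])
prob.solve()
print("relative error:", np.linalg.norm(X.value - M) / np.linalg.norm(M))
```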
Robust PCA [Candes et al.], [Chandrasekaran et al.], …
Can we recover a low-rank matrix from sparse corruptions?
Superresolution, compressed sensing off-the-grid [Candes, Fernandez-Granda], [Tang et al.], …
Can we recover well-separated points (point sources f1, f2, f3, f4 in the figure) from low-frequency measurements?
Higher order structure?
Part II:
TENSOR COMPLETION
TENSOR COMPLETION
Can using more than two attributes lead to better recommendations?
e.g. Groupon: users × activities × time
time: season, time of day, weekday/weekend, etc.
T ≈ Σ_{i=1}^{r} a_i ⊗ b_i ⊗ c_i   (e.g. one factor for snow sports)
Can we (approximately) fill in the missing entries?
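A small numpy sketch of this model (the sizes and random factors are our assumptions): a rank-r third-order tensor over users × activities × time, with only ~1% of entries observed.

```python
import numpy as np

rng = np.random.default_rng(2)
n_users, n_activities, n_times, r = 40, 30, 20, 3

A = rng.standard_normal((n_users, r))
B = rng.standard_normal((n_activities, r))
C = rng.standard_normal((n_times, r))

# T[i,j,k] = sum_l A[i,l] * B[j,l] * C[k,l], i.e. T = sum_l a_l (x) b_l (x) c_l
T = np.einsum("il,jl,kl->ijk", A, B, C)

mask = rng.random(T.shape) < 0.01        # the observed set Omega
print("observed:", int(mask.sum()), "of", T.size, "entries")
```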
THE TROUBLE WITH TENSORS
Natural approach (suggested by many authors):
(P)  min ‖X‖_*  s.t.  (1/|Ω|) Σ_{(i,j,k)∈Ω} |X_{i,j,k} − T_{i,j,k}| ≤ η   (tensor nuclear norm)
[Gurvits], [Liu], [Harrow, Montanaro]: The tensor nuclear norm is NP-hard to compute!
In fact, most of the linear algebra toolkit is ill-posed, or computationally hard, for tensors…
e.g. [Hillar, Lim] "Most Tensor Problems are NP-Hard":
Table I. Tractability of Tensor Problems
Problem Complexity
Bivariate Matrix Functions over R, C Undecidable (Proposition 12.2)
Bilinear System over R, C NP-hard (Theorems 2.6, 3.7, 3.8)
Eigenvalue over R NP-hard (Theorem 1.3)
Approximating Eigenvector over R NP-hard (Theorem 1.5)
Symmetric Eigenvalue over R NP-hard (Theorem 9.3)
Approximating Symmetric Eigenvalue over R NP-hard (Theorem 9.6)
Singular Value over R, C NP-hard (Theorem 1.7)
Symmetric Singular Value over R NP-hard (Theorem 10.2)
Approximating Singular Vector over R, C NP-hard (Theorem 6.3)
Spectral Norm over R NP-hard (Theorem 1.10)
Symmetric Spectral Norm over R NP-hard (Theorem 10.2)
Approximating Spectral Norm over R NP-hard (Theorem 1.11)
Nonnegative Definiteness NP-hard (Theorem 11.2)
Best Rank-1 Approximation NP-hard (Theorem 1.13)
Best Symmetric Rank-1 Approximation NP-hard (Theorem 10.2)
Rank over R or C NP-hard (Theorem 8.2)
Enumerating Eigenvectors over R #P-hard (Corollary 1.16)
Combinatorial Hyperdeterminant NP-, #P-, VNP-hard (Theorems 4.1, 4.2, Corollary 4.3)
Geometric Hyperdeterminant Conjectures 1.9, 13.1
Symmetric Rank Conjecture 13.2
Bilinear Programming Conjecture 13.4
Bilinear Least Squares Conjecture 13.5
Note: Except for positive definiteness and the combinatorial hyperdeterminant, which apply to 4-tensors, all problems refer to the 3-tensor case.
FLATTENING A TENSOR
Many tensor methods rely on flattening:
flat(T) is the n₁ × n₂n₃ matrix (users × (activities, time)) with entries flat(T)_{i,(j,k)} = T_{i,j,k}
This is a rearrangement of the entries, into a matrix, that does not increase its rank:
flat( Σ_{i=1}^{r} a_i ⊗ b_i ⊗ c_i ) = Σ_{i=1}^{r} a_i vec(b_i c_iᵀ)ᵀ
where vec(b_i c_iᵀ) is an n₂n₃-dimensional vector
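In numpy terms, flattening is just a reshape; a quick sketch (our illustration) checks both that the flattened matrix has rank at most r and that the identity above holds entry-by-entry.

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2, n3, r = 20, 15, 10, 4
A = rng.standard_normal((n1, r))
B = rng.standard_normal((n2, r))
C = rng.standard_normal((n3, r))
T = np.einsum("il,jl,kl->ijk", A, B, C)

flat_T = T.reshape(n1, n2 * n3)          # row i holds T[i, :, :] flattened
print("rank of flat(T):", np.linalg.matrix_rank(flat_T))   # at most r -> 4

# flat(sum a_i (x) b_i (x) c_i) = sum a_i vec(b_i c_i^T)^T, checked directly:
flat_direct = sum(np.outer(A[:, l], np.outer(B[:, l], C[:, l]).ravel())
                  for l in range(r))
print("identity holds:", np.allclose(flat_T, flat_direct))
```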
Let n₁ = n₂ = n₃ = n. We would need O(n²r) observations to fill in flat(T)
There are many other variants of flattening, but with comparable guarantees:
[Liu, Musialski, Wonka, Ye], [Gandy, Recht, Yamada], [Signoretto, De Lathauwer, Suykens], [Tomioka, Hayashi, Kashima], [Mu, Huang, Wright, Goldfarb], …
Can we beat flattening?
Can we make better predictions than we do by treating each activity × time pair as unrelated?
Part III:
Nearly optimal algorithms for noisy tensor completion
OUR RESULTS
Suppose we are given |Ω| = m noisy observations from T:
T = Σ_{i=1}^{r} σ_i a_i ⊗ b_i ⊗ c_i + noise (bounded by η), with |σ_i|, ‖a_i‖∞, ‖b_i‖∞, ‖c_i‖∞ ≤ C
Theorem: There is an efficient algorithm that with probability 1 − δ outputs X with
(1/n³) Σ_{i,j,k} |X_{i,j,k} − T_{i,j,k}| ≤ 2C³r √(n^{3/2} log⁴n / m) + C³r √(ln(2/δ) / m) + 2η
SOME REMARKS
In many settings, a 1 − o(1) fraction of the entries of T are at least r^{1/2}/polylog(n) in magnitude
When m = Ω(n^{3/2}r), the average error is asymptotically smaller, and hence X_{i,j,k} = (1 ± o(1)) T_{i,j,k} for a 1 − o(1) fraction of the entries
"Almost all of the entries, almost entirely correct"
For r = n^{3/2−δ} (highly overcomplete), we only need to observe an n^{−δ} fraction of the entries
Algorithms for decomposing such tensors given all the entries need stronger assumptions (e.g. that the factors are random)
LOWER BOUNDS
Not only is the tensor nuclear norm hard to compute, but…
noisy tensor completion with m observations ⟹ refute random 3-SAT with m clauses,
which is believed to be hard if m = n^{3/2−δ}
Part IV:
Matrix completion revisited: Connections to random CSPs
Can we distinguish between low-rank and random?
Case #1: Approximately low-rank. M = aaᵀ + noise, where each a_i = ±1:
for each (i,j) ∈ Ω, M_{i,j} = a_i a_j w/ probability ¾, and a random ±1 w/ probability ¼
Case #2: Random. For each (i,j) ∈ Ω, M_{i,j} = random ±1
In Case #1 the entries are (somewhat) predictable, but in Case #2 they are completely unpredictable
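A hedged sketch of this distinguishing problem: in Case #1 the observed matrix correlates with aaᵀ, so its top singular value stands out. Thresholding on the spectral norm is our illustration of the test, not necessarily the talk's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 200, 20000                         # m = omega(n) observations

a = rng.choice([-1, 1], size=n)
idx = rng.integers(0, n, size=(m, 2))     # random observed positions (i, j)

def observed_matrix(case):
    M = np.zeros((n, n))
    for i, j in idx:
        if case == 1 and rng.random() < 0.75:
            M[i, j] = a[i] * a[j]         # planted low-rank signal
        else:
            M[i, j] = rng.choice([-1, 1]) # pure noise
    return M

for case in (1, 2):
    top = np.linalg.svd(observed_matrix(case), compute_uv=False)[0]
    print(f"Case #{case}: top singular value = {top:.1f}")
```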
There are two very different communities that (essentially) attacked this same distinguishing problem:
the community working on matrix completion, and the community working on refuting random CSPs
AN INTERPRETATION
We can interpret (i₁, j₁; σ₁), (i₂, j₂; σ₂), …, (i_m, j_m; σ_m) with ±1 r.v.'s as a random 2-XOR formula ψ (and vice-versa)
In particular, each observation/function value maps to a clause:
(i, j, σ)  ↦  v_i · v_j = σ   (variables v_i, v_j; right-hand side σ is the constraint)
STRONG REFUTATION
We will say that an algorithm strongly refutes* random 2-XOR with m clauses if:
(1) on any 2-XOR formula ψ, it outputs val(ψ) with OPT(ψ) ≤ val(ψ), where OPT(ψ) is the largest fraction of clauses of ψ that can be satisfied
(2) with high probability (for random ψ with m clauses): val(ψ) = 1/2 + o(1)
Lemma: If (i₁, j₁; σ₁), …, (i_m, j_m; σ_m) ∼ ψ and A is the n × n matrix with entry σ_a in position (i_a, j_a) for each clause, then
OPT(ψ) − 1/2 ≤ (n / 2m) ‖A‖
Proof: Map the assignment to a unit vector x so that x_i = ±1/√n, and take the quadratic form xᵀAx on A
For random ψ, (1/m)‖A‖ ≤ 1/√(mn), so OPT(ψ) ≤ 1/2 + o(1) whenever m = ω(n)
This solves the strong refutation problem…
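A sketch of the resulting certificate, using the bound as reconstructed above (the constant in front of ‖A‖ is our reading of the garbled slide): val = 1/2 + (n/2m)·‖A‖ upper-bounds OPT(ψ) and tends to 1/2 once m = ω(n).

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 300, 30000                       # m = omega(n) random clauses

i_idx = rng.integers(0, n, size=m)
j_idx = rng.integers(0, n, size=m)
sigma = rng.choice([-1, 1], size=m)

A = np.zeros((n, n))
np.add.at(A, (i_idx, j_idx), sigma)     # entry sigma_a at position (i_a, j_a)

spec = np.linalg.svd(A, compute_uv=False)[0]
print(f"certified bound: OPT(psi) <= {0.5 + (n / (2 * m)) * spec:.3f}")
```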
There are two very different communities that (essentially) attacked this same distinguishing problem:
the community working on matrix completion, and the community working on refuting random CSPs
The same spectral bound implies:
(1) an algorithm for strongly refuting random 2-XOR
(2) an algorithm for the distinguishing problem
(3) generalization bounds for the nuclear norm
It also yields bounds on how well the solution to the convex program generalizes [Srebro, Shraibman]…
min ‖X‖_*  s.t.  (1/|Ω|) Σ_{(i,j)∈Ω} |X_{i,j} − M_{i,j}| ≤ η
An approach through statistical learning theory:
empirical error: (1/|Ω|) Σ_{(i,j)∈Ω} |X_{i,j} − M_{i,j}|   (≤ η)
prediction error: (1/n²) Σ_{i,j} |X_{i,j} − M_{i,j}|
Then if we let K = { X s.t. ‖X‖_* ≤ 1 } = conv{ abᵀ s.t. ‖a‖, ‖b‖ ≤ 1 },
generalization error: sup_{X∈K} | empirical error(X) (on Ω) − prediction error(X) |
Theorem: "generalization error ≤ best agreement with a random function"   (the Rademacher complexity)
More precisely:
Rademacher complexity: R_m(K) = E sup_{X∈K} (1/m) | Σ_{a=1}^{m} σ_a X_{i_a,j_a} | = (1/m) ‖A‖
Since (1/m)‖A‖ ≤ 1/√(mn), we get R_m(K) = o(1/n) whenever m = ω(n)
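Spelling out the duality step (our own reconstruction of the calculation behind these slides): the spectral norm is dual to the nuclear norm, so the supremum over K = {X : ‖X‖_* ≤ 1} is exactly ‖A‖/m.

```latex
\[
  R_m(K)
  = \mathop{\mathbb{E}}_{\sigma} \sup_{X \in K}
      \frac{1}{m}\Bigl|\sum_{a=1}^{m} \sigma_a X_{i_a, j_a}\Bigr|
  = \frac{1}{m}\,\mathop{\mathbb{E}}_{\sigma}
      \sup_{\|X\|_* \le 1} \bigl|\langle A, X\rangle\bigr|
  = \frac{1}{m}\,\mathop{\mathbb{E}}_{\sigma} \|A\|,
  \qquad A = \sum_{a=1}^{m} \sigma_a\, e_{i_a} e_{j_a}^{\mathsf{T}}.
\]
\[
  \frac{1}{m}\|A\| \le \frac{1}{\sqrt{mn}}
  \;\Longrightarrow\;
  R_m(K) = o\!\left(\frac{1}{n}\right)
  \text{ whenever } m = \omega(n).
\]
```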
There are two very different communities that (essentially) attacked this same distinguishing problem:
the community working on matrix completion, and the community working on refuting random CSPs
noisy matrix completion with m observations ⟷ (Rademacher complexity) ⟷ strongly refute* random 2-XOR/2-SAT with m clauses
noisy tensor completion with m observations ⟷ (Rademacher complexity) ⟷ strongly refute* random 3-XOR/3-SAT with m clauses [Coja-Oghlan, Goerdt, Lanka]
*Want an algorithm that certifies a formula is far from satisfiable
[Coja-Oghlan, Goerdt, Lanka]: Reduce 3-XOR to 4-XOR
Pair up clauses that share their first variable, and multiply them to eliminate it:
(vi · vj · vk = σ) and (vi · vj′ · vk′ = σ′)  ⟹  (vj · vk · vj′ · vk′ = σ · σ′)
This yields n⁵p² = n² log^{O(1)}n clauses
Warning: The 4-XOR clauses are not independent!
Build the matrix A with rows indexed by pairs (j, j′), columns indexed by pairs (k, k′), and entry σ · σ′ for each derived clause
Hence the paired variables for the rows (and columns) come from different clauses!
A + trace method
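A sketch of the pairing step (our illustration only; the actual analysis also needs the trace method to bound ‖A‖ under the dependencies noted above).

```python
import numpy as np
from collections import defaultdict
from itertools import combinations

rng = np.random.default_rng(6)
n, m = 25, 2000

# Random 3-XOR clauses (i, j, k; sigma).
clauses = [(int(rng.integers(n)), int(rng.integers(n)), int(rng.integers(n)),
            int(rng.choice([-1, 1]))) for _ in range(m)]

by_first = defaultdict(list)
for i, j, k, s in clauses:
    by_first[i].append((j, k, s))

# Multiply pairs of clauses sharing v_i: entry sigma*sigma' goes to
# row (j, j'), column (k, k') of an n^2 x n^2 matrix.
A = np.zeros((n * n, n * n))
for group in by_first.values():
    for (j, k, s), (j2, k2, s2) in combinations(group, 2):
        A[j * n + j2, k * n + k2] += s * s2

derived = sum(len(g) * (len(g) - 1) // 2 for g in by_first.values())
print("derived 4-XOR clauses:", derived)
print("spectral norm of A:", round(float(np.linalg.svd(A, compute_uv=False)[0]), 2))
```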
There are two very different communities that (essentially) attacked this same distinguishing problem:
the community working on matrix completion, and the community working on refuting random CSPs
strongly refute* random 3-XOR/3-SAT with m clauses [Coja-Oghlan, Goerdt, Lanka] ⟹ (embedding in SOS) ⟹ noisy tensor completion with m observations
*Want an algorithm that certifies a formula is far from satisfiable
We then embed this algorithm into the sixth level of the sum-of-squares hierarchy, to get a relaxation for tensor prediction
GENERALIZATION BOUNDS
Suppose we are given |Ω| = m noisy observations T_{i,j,k} ± η, and the factors of T are C-incoherent
Theorem: There is an efficient algorithm that with probability 1 − δ outputs X with
(1/n³) Σ_{i,j,k} |X_{i,j,k} − T_{i,j,k}| ≤ 2C³r √(n^{3/2} log⁴n / m) + C³r √(ln(2/δ) / m) + 2η
This comes from giving an efficiently computable norm ‖·‖_K whose Rademacher complexity is asymptotically smaller than the trivial bound whenever m = Ω(n^{3/2} log⁴n)
SUMMARY
New Algorithm: We gave an algorithm for 3rd-order tensor prediction that uses m = n^{3/2} r log⁴n observations
An Inefficient Algorithm: There is an inefficient algorithm (via the tensor nuclear norm) that uses m = nr log n observations
A Phase Transition: Even for n^δ rounds of the powerful sum-of-squares hierarchy, no norm solves tensor prediction with m = n^{3/2−δ} r observations
Epilogue:
New directions in computational vs. statistical tradeoffs
DISCUSSION
Convex programs are unreasonably effective for linear inverse problems!
But we gave simple linear inverse problems that exhibit striking gaps between efficient and inefficient estimators
Where else are there computational vs. statistical tradeoffs?
New Direction: Explore computational vs. statistical tradeoffs through the powerful sum-of-squares hierarchy