
Page 1:

Smoothed Analysis of Tensor Decompositions

Aravindan Vijayaraghavan (NYU ⇒ Northwestern University)

based on joint work with Aditya Bhaskara (Google Research), Moses Charikar (Princeton), and Ankur Moitra (MIT)

Page 2:

Tensors: multi-dimensional arrays

(figure: an 𝑛 × 𝑛 matrix vs. an 𝑛 × 𝑛 × 𝑛 array)

• Tensor of order 𝑝 ≡ 𝑝-tensor of size 𝑛 × 𝑛 × ⋯ × 𝑛 (𝑝 times)

• Elements are over ℝ

Page 3:

Low-rank Decompositions

A low-rank tensor can be written as a sum of a few rank-one tensors. For 3-tensors:

𝑇 = ∑ᵢ₌₁ᵏ 𝑎𝑖 ⊗ 𝑏𝑖 ⊗ 𝑐𝑖

Rank(𝑇) = smallest 𝑘 such that 𝑇 can be written as a sum of 𝑘 rank-1 tensors.

(figure: 𝑇 drawn as a sum of 𝑘 rank-one tensors with factors 𝑎𝑖, 𝑏𝑖, 𝑐𝑖)

• Rank of a 𝑝-tensor 𝑇 (𝑛 × ⋯ × 𝑛) is at most 𝑛^(𝑝−1)

Low-rank 𝜖-approximation: a low-rank decomposition approximating 𝑇 up to error 𝜖 in Frobenius norm, i.e. ‖𝑇 − ∑ᵢ₌₁ᵏ 𝑎𝑖 ⊗ 𝑏𝑖 ⊗ 𝑐𝑖‖_F ≤ 𝜖
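To make the notation concrete, here is a minimal numpy sketch; the sizes, the noise level, and the helper name rank_k_tensor are illustrative assumptions, not values from the talk.

```python
import numpy as np

def rank_k_tensor(A, B, C):
    """Build the rank-k 3-tensor sum_i a_i ⊗ b_i ⊗ c_i from n x k factors A, B, C."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

rng = np.random.default_rng(0)
n, k = 10, 4                                   # illustrative sizes
A, B, C = (rng.standard_normal((n, k)) for _ in range(3))

# A tensor that is close to rank k: exact rank-k part plus a small error term.
T = rank_k_tensor(A, B, C) + 1e-3 * rng.standard_normal((n, n, n))
eps = np.linalg.norm(T - rank_k_tensor(A, B, C))   # Frobenius norm of the residual
print(eps)                                         # the rank-k factors give an eps-approximation
```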

Page 4:

Tensor Decomposition: Uniqueness

Thm [Kruskal'77]. Rank-𝑘 decompositions for 3-tensors are unique (non-algorithmic) under a rank condition (𝑘 ≤ 3𝑛/2 − 1).
• 𝑝-tensors: the rank condition gives 𝑘 ≤ (𝑝𝑛 − 𝑝 + 1)/2 [SB'01]

Thm [Jennrich via Harshman'70]. Finds the unique rank-𝑘 decomposition of a 3-tensor when the vectors of the decomposition are linearly independent (hence 𝑘 ≤ 𝑛).
• "Full-rank" case. Rediscovered in [Leurgans et al. '93, Chang '96]

Thm [De Lathauwer, Castaing, Cardoso '07]. Algorithm for 4-tensors of rank 𝑘, generically, when 𝑘 ≤ 𝑐 ⋅ 𝑛².
• 𝑝-tensors: generically handle 𝑘 ≤ 𝑐 ⋅ 𝑛^⌊𝑝/2⌋

Thm [Chiantini, Ottaviani '14]. Uniqueness for 3-tensors of rank 𝑘 ≤ 𝑛²/3, generically.

Page 5:

Algorithms for Tensor Decompositions

NP-hard in general when the rank 𝑘 ≫ 𝑛 (except in special settings) [Hillar-Lim]

This talk:
• Polynomial time algorithms* for robust tensor decompositions
• Introduce smoothed analysis to overcome worst-case intractability
• Handle rank 𝒌 ≫ 𝒏 for higher order tensors (𝒑 ≥ 𝟓)

*Algorithms run in time 𝑝𝑜𝑙𝑦_𝑝(𝑛, 𝑘, 1/𝜖) and recover the vectors in the decomposition up to 𝜖 error in ℓ₂ norm.

Page 6:

Talk Plan

1. Applications of Tensor Decompositions to ML

– Motivating Algorithm properties

2. Smoothed Analysis Model and Results

3. Overview of the Proof

Page 7:

Learning Probabilistic Models: Parameter Estimation

Question: Can given data be "explained" by a simple probabilistic model?
Examples: HMMs for speech recognition, mixtures of Gaussians for clustering points, multiview models.

Learning goal: Can the parameters of the model be learned from polynomially many samples generated by the model?

Page 8:

Probabilistic model for clustering in 𝒏 dimensions: Mixtures of (axis-aligned) Gaussians

Parameters:
• Mixing weights 𝑤1, 𝑤2, … , 𝑤𝑘
• Gaussian 𝐺𝑖 with mean 𝜇𝑖 and diagonal covariance Σ𝑖

Learning problem: Given many sample points, find (𝑤𝑖, 𝜇𝑖, Σ𝑖).

(figure: sample points 𝑥 ∈ ℝⁿ clustered around means 𝜇𝑖)

• Algorithms use O(exp(𝒌) ⋅ 𝒑𝒐𝒍𝒚(𝒏)) samples and time [FOS'06, MV'10]
• Lower bound of Ω(exp(𝒌)) [MV'10] in the worst case

Aim: 𝒑𝒐𝒍𝒚(𝒌, 𝒏) guarantees in realistic settings

Page 9:

Method of Moments and Tensor decompositions

step 1. compute a tensor whose decomposition encodes the model parameters

step 2. find the decomposition (and hence the parameters)

Third moment tensor (𝑛 × 𝑛 × 𝑛), with entries 𝐸[𝑥𝑖𝑥𝑗𝑥𝑘]:

𝑻 = ∑ᵢ₌₁ᵏ 𝒘𝒊 𝝁𝒊 ⊗ 𝝁𝒊 ⊗ 𝝁𝒊

• Uniqueness ⟹ recover the parameters 𝑤𝑖 and 𝜇𝑖
• Algorithm for the decomposition ⟹ efficient learning

[Chang] [Allman, Matias, Rhodes] [Anandkumar, Ge, Hsu, Kakade, Telgarsky]
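As a rough illustration of step 1, the sketch below uses a multiview-style toy model (the weights, means, noise level, and sizes are made up for the example), in which the cross moment of three independent views equals ∑ᵢ 𝑤𝑖 𝜇𝑖 ⊗ 𝜇𝑖 ⊗ 𝜇𝑖 exactly; for actual Gaussian mixtures the third moment needs extra lower-order correction terms.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 8, 3, 200_000                     # dimension, clusters, samples (illustrative)
w = np.array([0.2, 0.3, 0.5])               # mixing weights
mu = rng.standard_normal((k, n))            # cluster means (rows)

# Multiview-style toy model: pick cluster i w.p. w_i, then draw three independent
# views x1, x2, x3 = mu_i + zero-mean noise, so E[x1 ⊗ x2 ⊗ x3] = sum_i w_i mu_i^{⊗3}.
labels = rng.choice(k, size=m, p=w)
views = [mu[labels] + 0.1 * rng.standard_normal((m, n)) for _ in range(3)]

T_hat = np.einsum('si,sj,sk->ijk', *views) / m           # empirical third moment
T = np.einsum('r,ri,rj,rk->ijk', w, mu, mu, mu)          # population tensor
print(np.linalg.norm(T_hat - T))                         # sampling error, shrinks like 1/sqrt(m)
```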

Page 10:

Robustness to Errors

Beware: sampling error. The empirical estimate is T̃ ≈𝜖 ∑ᵢ₌₁ᵏ 𝑤𝑖 𝜇𝑖 ⊗ 𝜇𝑖 ⊗ 𝜇𝑖; with 𝑝𝑜𝑙𝑦(𝑛) samples, the error is 𝜖 ≈ 1/𝑝𝑜𝑙𝑦(𝑛, 𝑘).

Aim:
1. Uniqueness of tensor decompositions
2. Algorithms taking time 𝒑𝒐𝒍𝒚(𝒏, 𝒌), robust to noise 𝝐 = 𝟏/𝒑𝒐𝒍𝒚(𝒏, 𝒌)

Thm [Goyal-Vempala-Xiao]. Jennrich's polytime algorithm for tensor decompositions of rank 𝑘 ≤ 𝑛 is robust up to 1/𝑝𝑜𝑙𝑦(𝑛) error.

⇒ Efficient learning for many probabilistic models when the number of clusters/topics 𝑘 ≤ dimension 𝑛 [Chang '96, Mossel-Roch '06, Hsu-Kakade '12, Anandkumar et al. '09-'14]

Page 11:

Overcomplete Setting

Number of clusters/topics/states 𝐤 ≫ dimension 𝐧

(examples: computer vision, speech)

NP-hard in the worst case (for rank 𝑘 ≥ 6𝑛)

Polytime decomposition of tensors of rank 𝒌 ≫ 𝑛? (recall the rank can be as large as 𝑛^(𝑝−1))

Page 12:

Smoothed Analysis

[Spielman & Teng 2000]: the simplex algorithm solves LPs efficiently under smoothed analysis (explains practice).

• A small random perturbation of the input makes instances easy
• Best polytime guarantees in the absence of any worst-case guarantees
• Good smoothed analysis guarantees ⟹ worst instances are isolated

Page 13:

Smoothed Analysis for Learning

Learning setting (e.g. Mixtures of Gaussians)

Worst-case instances: Means 𝝁𝒊 in pathological configurations

Means are not in adversarial configurations in the real world!

What if the means 𝜇𝑖 are perturbed slightly (𝜇𝑖 → μ̃𝑖)?

Parameters of the model are perturbed slightly.

Page 14:

Smoothed Analysis for Tensor Decompositions

1. Given a tensor 𝑇 (of size 𝑛 × 𝑛 × ⋯ × 𝑛):   𝑇 = ∑ᵢ₌₁ᵏ 𝑎𝑖⁽¹⁾ ⊗ 𝑎𝑖⁽²⁾ ⊗ ⋯ ⊗ 𝑎𝑖⁽ᵖ⁾

2. ã𝑖⁽ʲ⁾ is a random 𝜌-perturbation of 𝑎𝑖⁽ʲ⁾, i.e. add an independent (Gaussian) random vector of length ≈ 𝜌.

3. Input: T̃ = ∑ᵢ₌₁ᵏ ã𝑖⁽¹⁾ ⊗ ã𝑖⁽²⁾ ⊗ ⋯ ⊗ ã𝑖⁽ᵖ⁾ + noise. Analyse the algorithm on T̃.

Factors of the decomposition are perturbed. For mixtures of Gaussians, the means 𝜇𝑖 are perturbed slightly:

𝑇 = ∑ᵢ₌₁ᵏ 𝑤𝑖 𝜇𝑖 ⊗ 𝜇𝑖 ⊗ ⋯ ⊗ 𝜇𝑖
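A one-function sketch of the perturbation step described above; the function name and the exact scaling are illustrative assumptions (i.i.d. Gaussian entries of variance 𝜌²/𝑛 give an added vector of expected length about 𝜌).

```python
import numpy as np

def rho_perturb(A, rho, rng):
    """Add an independent Gaussian vector of expected length ~ rho to each column of A."""
    n = A.shape[0]
    return A + (rho / np.sqrt(n)) * rng.standard_normal(A.shape)
```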

Page 15:

Smoothed Analysis model

• Different from the elements of T̃ being perturbed:   T̃ = ∑ᵢ₌₁ᵏ ã𝑖⁽¹⁾ ⊗ ã𝑖⁽²⁾ ⊗ ⋯ ⊗ ã𝑖⁽ᵖ⁾ + noise

• More similar in spirit to generic results than to average-case analysis: no regions of the instance space that are hard

(figure: problem instance space, with easy and hard regions)

• A robust analog of generic results?

Page 16:

Algorithmic Guarantees

Thm. Polynomial time algorithm for decomposing a 𝑝-tensor (size 𝑛^𝑝) under smoothed analysis, when the rank 𝒌 ≤ 𝒏^⌊(𝒑−𝟏)/𝟐⌋, with probability 1 − exp(−𝑛^𝑓(𝑝)).
Running time and sample complexity are 𝒑𝒐𝒍𝒚_𝒑(𝒏, 𝟏/𝝆).

(Guarantees are for order-𝑝 tensors with each dimension 𝑛; the rank of the tensor is 𝑘, the number of clusters.)

Previous algorithms: 𝑘 ≤ 𝑛
Our algorithms (smoothed case): 𝑘 ≤ 𝑛^((𝑝−1)/2)

Corollary. Polytime algorithms (smoothed analysis) for learning the parameters of mixtures of axis-aligned Gaussians, multiview models, etc. when the number of clusters 𝑘 ≤ (dim)^C for any constant C, w.h.p.

Page 17:

Interpreting Smoothed Analysis Guarantees

Time, sample complexity = 𝑝𝑜𝑙𝑦_𝑝(𝑛, 1/𝜌). Works with probability 1 − exp(−𝜌𝑛^(3−𝑝)).

• Exponentially small failure probability in polynomial time (for constant 𝑝)

Smooth interpolation between worst case and average case [Anderson, Belkin, Goyal, Rademacher, Voss '14]:
Time, sample complexity = 𝑝𝑜𝑙𝑦_𝑝(𝑛, 1/𝜌, 1/𝜏). Works with probability 1 − 𝜏.

Page 18:

Algorithm Details

Page 19:

Algorithm Outline

Recall: 𝑇 = ∑ᵢ₌₁ᵏ 𝑎𝑖 ⊗ 𝑏𝑖 ⊗ 𝑐𝑖, with factor matrices 𝐴, 𝐵, 𝐶 (each 𝑛 × 𝑘, columns 𝑎𝑖, 𝑏𝑖, 𝑐𝑖). Aim: recover 𝐴, 𝐵, 𝐶.

1. An algorithm for 3-tensors in the ``full rank setting'' (𝐤 ≤ 𝒏).
   [Jennrich 70] A simple (robust) algorithm for a 3-tensor 𝑇 when 𝜎ₖ(𝐴), 𝜎ₖ(𝐵), 𝜎₂(𝐶) ≥ 1/𝑝𝑜𝑙𝑦(𝑛, 𝑘)
   • Any algorithm for full-rank tensors suffices

2. For higher order tensors, use ``tensoring / flattening''.
   • Helps handle the overcomplete setting (𝑘 ≫ 𝑛)

Page 20:

Blast from the Past

Recall: 𝑻 ≈𝝐 ∑ᵢ₌₁ᵏ 𝑎𝑖 ⊗ 𝑏𝑖 ⊗ 𝑐𝑖, with factor matrices 𝐴, 𝐵, 𝐶 (each 𝑛 × 𝑘). Aim: recover 𝐴, 𝐵, 𝐶.

[Jennrich via Harshman '70] Algorithm for a 3-tensor 𝑇 = ∑ᵢ₌₁ᵏ 𝑎𝑖 ⊗ 𝑏𝑖 ⊗ 𝑐𝑖 when:
• 𝐴, 𝐵 have rank 𝑘, i.e. the 𝑎𝑖 (and the 𝑏𝑖) are linearly independent
• 𝐶 has rank ≥ 2
• Reduces to matrix eigen-decompositions

Qn. Is this algorithm robust to errors?
Yes! It needs perturbation bounds for eigenvectors. [Stewart-Sun]

Thm. Efficiently decompose 𝑇 ≈𝜖 ∑ᵢ₌₁ᵏ 𝑎𝑖 ⊗ 𝑏𝑖 ⊗ 𝑐𝑖 and recover 𝐴, 𝐵, 𝐶 up to 𝜖 ⋅ 𝑝𝑜𝑙𝑦(𝑛, 𝑘) error when
1) 𝐴, 𝐵 have min-singular-value ≥ 1/𝑝𝑜𝑙𝑦(𝑛)
2) 𝐶 doesn't have parallel columns (robustly).

Page 21:

Decomposition algorithm [Jennrich] for 𝑇 ≈𝜖 ∑ᵢ₌₁ᵏ 𝑎𝑖 ⊗ 𝑏𝑖 ⊗ 𝑐𝑖:

1. Take a random combination of the slices of 𝑇 along a random direction 𝑤1 to get 𝑀1.
2. Take a random combination along another random direction 𝑤2 to get 𝑀2.
3. Find the eigen-decomposition of 𝑀1𝑀2† to get 𝐴. Similarly obtain 𝐵, 𝐶.

Thm. Efficiently decompose 𝑇 ≈𝜖 ∑ᵢ₌₁ᵏ 𝑎𝑖 ⊗ 𝑏𝑖 ⊗ 𝑐𝑖 and recover 𝐴, 𝐵, 𝐶 up to 𝜖 ⋅ 𝑝𝑜𝑙𝑦(𝑛, 𝑘) error (in Frobenius norm) when
1) 𝐴, 𝐵 are full rank, i.e. min-singular-value ≥ 1/𝑝𝑜𝑙𝑦(𝑛)
2) 𝐶 doesn't have parallel columns (in a robust sense).
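Below is a hedged, noise-free numpy sketch of these three steps for the square full-rank case 𝑘 = 𝑛. Pairing the eigenvectors of 𝑀1𝑀2⁻¹ and (𝑀2⁻¹𝑀1)ᵀ by sorting their shared eigenvalues, and recovering 𝐶 by least squares from the mode-3 unfolding, are implementation choices made for this example, not a description of the exact robust procedure analyzed in the talk.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: column r is a_r ⊗ b_r."""
    return np.einsum('ir,jr->ijr', A, B).reshape(A.shape[0] * B.shape[0], -1)

def jennrich(T, rng):
    """Noiseless sketch of Jennrich's algorithm for T = sum_i a_i ⊗ b_i ⊗ c_i,
    assuming k = n and A, B invertible. Returns factors up to scaling/permutation."""
    n = T.shape[0]
    w1, w2 = rng.standard_normal(n), rng.standard_normal(n)
    M1 = np.einsum('ijk,k->ij', T, w1)                 # M1 = A diag(C^T w1) B^T
    M2 = np.einsum('ijk,k->ij', T, w2)                 # M2 = A diag(C^T w2) B^T
    # M1 M2^{-1} = A D A^{-1} and (M2^{-1} M1)^T = B D B^{-1} share the eigenvalues
    # d_i = <c_i, w1>/<c_i, w2>, so sorting by eigenvalue pairs up the columns of A and B.
    la, A_hat = np.linalg.eig(M1 @ np.linalg.inv(M2))
    lb, B_hat = np.linalg.eig((np.linalg.inv(M2) @ M1).T)
    A_hat = np.real(A_hat[:, np.argsort(la.real)])
    B_hat = np.real(B_hat[:, np.argsort(lb.real)])
    # Recover C by least squares from the mode-3 unfolding of T.
    C_hat = np.linalg.lstsq(khatri_rao(A_hat, B_hat), T.reshape(n * n, n), rcond=None)[0].T
    return A_hat, B_hat, C_hat

# Tiny usage check: rebuild T from the recovered factors (column scalings are absorbed into C).
rng = np.random.default_rng(2)
n = 6
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))
T = np.einsum('ir,jr,kr->ijk', A, B, C)
A_hat, B_hat, C_hat = jennrich(T, rng)
print(np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', A_hat, B_hat, C_hat)))  # ~ 0
```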

Page 22:

Handling high rank: into the techniques

Page 23:

Mapping to Higher Dimensions

How do we handle the case rank 𝒌 = 𝛀(𝒏²)? (Or vectors with "many" linear dependencies?)

Consider a 6th order tensor of rank 𝑘 ≤ 𝑛²:

𝑇 = ∑ᵢ₌₁ᵏ 𝑎𝑖 ⊗ 𝑏𝑖 ⊗ 𝑐𝑖 ⊗ 𝑑𝑖 ⊗ 𝑒𝑖 ⊗ 𝑓𝑖

Trick: view 𝑇 as an 𝑛² × 𝑛² × 𝑛² object; the vectors in its decomposition are {𝑎𝑖 ⊗ 𝑏𝑖}, {𝑐𝑖 ⊗ 𝑑𝑖}, {𝑒𝑖 ⊗ 𝑓𝑖}.

Qn: Are the vectors {𝒂𝒊 ⊗ 𝒃𝒊}ᵢ₌₁…ₖ linearly independent? Is the ``dimensionality'' 𝛀(𝒏²)?
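A small numpy illustration of the flattening trick (all sizes are toy assumptions): reshaping the order-6 tensor as an 𝑛² × 𝑛² × 𝑛² tensor gives a rank-𝑘 3-tensor whose factor vectors are exactly the products 𝑎𝑖 ⊗ 𝑏𝑖, 𝑐𝑖 ⊗ 𝑑𝑖, 𝑒𝑖 ⊗ 𝑓𝑖.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 4, 10                                              # note: k > n is allowed here
A, B, C, D, E, F = (rng.standard_normal((n, k)) for _ in range(6))

T6 = np.einsum('ir,jr,kr,lr,mr,or->ijklmo', A, B, C, D, E, F)  # order-6, rank k
T3 = T6.reshape(n * n, n * n, n * n)                           # flattened 3-tensor view

AB = np.einsum('ir,jr->ijr', A, B).reshape(n * n, k)           # columns a_i ⊗ b_i
CD = np.einsum('ir,jr->ijr', C, D).reshape(n * n, k)
EF = np.einsum('ir,jr->ijr', E, F).reshape(n * n, k)
print(np.allclose(T3, np.einsum('ir,jr,kr->ijk', AB, CD, EF)))  # True
```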

Page 24:

Bad cases

Recall the question: are {𝒂𝒊 ⊗ 𝒃𝒊}ᵢ₌₁…ₖ linearly independent? Is the ``dimensionality'' 𝛀(𝒏²)?

Setup: 𝐴, 𝐵 (𝑛 × 𝑘) have rank 𝑛, and 𝑍 (𝑛² × 𝑘) has columns 𝑧𝑖 = 𝑎𝑖 ⊗ 𝑏𝑖 ∈ ℝ^(𝑛²) for 𝑖 = 1 … 𝑘 (𝑘 up to 𝑛²).

Smoothed analysis: can we hope for the "dimension" to multiply, typically?

Bad example (already for 𝑘 = 2𝑛):
• Columns of 𝐴 = 𝐵 are composed of two orthonormal bases of ℝⁿ
• Every 𝑛 vectors of 𝐴 (and of 𝐵) are linearly independent
• But the columns of 𝑍 are linearly dependent: only 2𝑛 − 1 of them are independent!

Dimension does not grow multiplicatively in the worst case. But such bad examples are pathological and hard to construct!
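The snippet below numerically checks the bad example (a random orthonormal basis Q stands in for the second basis): A has full row rank and every 𝑛 of its columns are independent, yet the 2𝑛 product vectors 𝑎𝑖 ⊗ 𝑎𝑖 span only a (2𝑛 − 1)-dimensional space.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]            # a second orthonormal basis
A = np.hstack([np.eye(n), Q])                               # n x 2n: two orthonormal bases
Z = np.einsum('ir,jr->ijr', A, A).reshape(n * n, 2 * n)     # columns z_i = a_i ⊗ a_i
# sum_i e_i ⊗ e_i = vec(I) = sum_j q_j ⊗ q_j, so the 2n columns of Z are dependent.
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(Z))   # n and 2n - 1
```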

Page 25:

Product vectors & linear structure

• ``Flattening'' of the 3𝑝-order moment tensor.
• The new factor matrix is full rank, using smoothed analysis.

Khatri-Rao product: 𝑍 (𝑛^𝑝 × 𝑘) is the matrix whose 𝑖-th column is ã𝑖 ⊗ ã𝑖 ⊗ ⋯ ⊗ ã𝑖 (𝑝 times), where the ã𝑖 are the 𝜌-perturbed columns of 𝐴 (𝑛 × 𝑘).

Theorem. For any matrix 𝐴 (𝑛 × 𝑘) with 𝑘 < 𝑛^𝑝/2, 𝜎ₖ(𝑍) ≥ 1/𝑝𝑜𝑙𝑦(𝑘, 𝑛, 1/𝜌) with probability 1 − exp(−𝑝𝑜𝑙𝑦(𝑛)).
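A quick numerical sanity check of this statement for 𝑝 = 2 (the adversarial-looking starting matrix, 𝜌, and the sizes are illustrative): even when 𝐴 is rank one, the Khatri-Rao square of its 𝜌-perturbation has a strictly positive 𝑘-th singular value.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, rho = 12, 60, 0.1                                         # k < n^2 / 2
A = np.repeat(rng.standard_normal((n, 1)), k, axis=1)           # rank-one "worst case" input
A_pert = A + (rho / np.sqrt(n)) * rng.standard_normal((n, k))   # rho-perturbed columns
Z = np.einsum('ir,jr->ijr', A_pert, A_pert).reshape(n * n, k)   # columns a~_i ⊗ a~_i
print(np.linalg.svd(Z, compute_uv=False)[-1])                   # sigma_k(Z) is bounded away from 0
```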

Page 26:

Proof sketch (two-wise product p=2)

Prop. For any matrix 𝐴 (𝑛 × 𝑘) with 𝑘 < 𝑛²/2, the matrix 𝑍 (𝑛² × 𝑘) with columns ã𝑖 ⊗ ã𝑖 has 𝜎ₖ(𝑍) ≥ 1/𝑝𝑜𝑙𝑦(𝑘, 𝑛, 1/𝜌) with probability 1 − exp(−𝑝𝑜𝑙𝑦(𝑛)).

Main issue: the perturbation happens before the product.
• Easy if the columns were perturbed after the tensor product (simple anti-concentration bounds)
• Here there are only 2𝑛 "bits" of randomness in 𝑛² dimensions, with block dependencies

Technical component: show that products of perturbed vectors behave like random vectors in ℝ^(𝑛²).

Page 27:

Projections of product vectors

Easy case: if x̃ is a 𝜌-perturbation of an 𝑛²-dimensional vector 𝑥 (perturbed directly in ℝ^(𝑛²)), then x̃ has projection > 1/𝑝𝑜𝑙𝑦(𝜌) onto 𝑆 w.h.p.; anti-concentration for polynomials implies this with probability 1 − 1/𝑝𝑜𝑙𝑦(𝑛).

Much tougher for a product of perturbations (inherent block structure)!

Question. Given any vectors 𝑎, 𝑏 ∈ ℝⁿ and Gaussian 𝜌-perturbations ã, b̃, does ã ⊗ b̃ have projection ≥ 𝑝𝑜𝑙𝑦(𝜌, 1/𝑛) onto any given 𝑛²/2-dimensional subspace 𝑆 ⊂ ℝ^(𝑛²) with probability 1 − exp(−√𝑛)?

Page 28:

Projections of product vectors

Question (restated). Given any vectors 𝑎, 𝑏 ∈ ℝⁿ and Gaussian 𝜌-perturbations ã, b̃, does ã ⊗ b̃ have projection ≥ 𝑝𝑜𝑙𝑦(𝜌, 1/𝑛) onto any given 𝑛²/2-dimensional subspace 𝑆 ⊂ ℝ^(𝑛²) with probability 1 − exp(−√𝑛)?

Π_S is the (𝑛²/2 × 𝑛²) projection matrix onto 𝑆. View each row of Π_S as 𝑛 blocks of length 𝑛 and take the dot product of each block with b̃; this gives an (𝑛²/2 × 𝑛) matrix Π_S(b̃) with

Π_S(ã ⊗ b̃) = Π_S(b̃) ã
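A short numpy check of this identity; a random wide matrix stands in for Π_S, and the sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
P = rng.standard_normal((n * n // 2, n * n))               # stand-in for Pi_S (n^2/2 x n^2)
a, b = rng.standard_normal(n), rng.standard_normal(n)

# View each row of P as n blocks of length n and contract the blocks with b:
P_b = np.einsum('rij,j->ri', P.reshape(-1, n, n), b)       # shape (n^2/2, n)
print(np.allclose(P @ np.kron(a, b), P_b @ a))             # True: P (a ⊗ b) = P(b) a
```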

Page 29:

Two steps of Proof..

1. W.h.p. (over the perturbation of b̃), Π_S(b̃) has at least 𝑟 eigenvalues > 𝑝𝑜𝑙𝑦(𝜌, 1/𝑛).
   (We will show this with 𝑟 = √𝑛.)

2. If Π_S(b̃) has 𝑟 eigenvalues > 𝑝𝑜𝑙𝑦(𝜌, 1/𝑛), then w.p. 1 − exp(−𝑟) (over the perturbation of ã), ã ⊗ b̃ has a large projection onto 𝑆.
   (Follows easily by analyzing the projection of a vector onto a dim-𝑘 space.)

Page 30:

Structure in any subspace S

Suppose (choosing Π_S first) that the 𝑛 × 𝑛 "blocks" in Π_S were orthogonal…

Π_S(b̃) (restricted to 𝑛 cols):
• Entry (i, j) is ⟨𝑣𝑖𝑗, b̃⟩, for blocks 𝑣𝑖𝑗 ∈ ℝⁿ
• A translated i.i.d. Gaussian matrix!
• So it has many (≈ √𝑛) big eigenvalues

Page 31:

Finding Structure in any subspace S

Main claim: every 𝑐 ⋅ 𝑛²-dimensional space 𝑆 has ~√𝑛 vectors with such a structure…

Property: the picked blocks (𝑛-dimensional vectors 𝑣1, 𝑣2, …) have a "reasonable" component orthogonal to the span of the rest…

The earlier argument goes through even with blocks that are not fully orthogonal!

Page 32:

Main claim (sketch)…

Idea: obtain "good" columns one by one.
• Show there exists a block with many linearly independent "choices"
• Fix some choices and argue that the same property holds, …
• Uses a delicate inductive argument; crucially uses the fact that we have an Ω(𝑛²)-dimensional subspace

Generalization: a similar result holds for higher-order products, and implies the main result.

Page 33:

Summary

• Polynomial time algorithms when rank 𝒌 ≫ 𝒏:
  – Tensor decompositions when rank 𝑘 = 𝑛^𝑂(1)
  – Learning when the number of clusters/topics 𝑘 = 𝑛^𝑂(1)

• Smoothed Analysis for Tensor Decompositions & Learning

Page 34:

Future Directions

Smoothed Analysis for Tensor Decompositions, Learning

• Handling ranks that match the generic results for uniqueness and for algorithms?
• Polynomially robust analogs of [Chiantini-Ottaviani] or [Cardoso]?
• Proofs of generic results that are more amenable to noise?

Better Robustness to Errors
• Tensor decomposition algorithms that are more robust to errors? Promising: [Barak-Kelner-Steurer'14] using the Lasserre hierarchy
• Modelling errors?

Page 35:

Thank You!

Questions?