
Dictionary Learning Using Tensor Methods

Anima Anandkumar

U.C. Irvine

Joint work with Rong Ge, Majid Janzamin and Furong Huang.


Feature learning as a cornerstone of ML

Find efficient representations of data, e.g. based on sparsity, invariances, low-dimensional structure, etc.

[Figure: cartoon contrasting ML practice and ML papers]

Feature engineering is typically critical for good performance.

Deep learning has shown considerable promise for feature learning.

Can we provide principled approaches which are guaranteed to learn good features?


Applications of Representation Learning

Compressed sensing

Extensive literature on compressed sensing: a few linear measurements suffice to recover sparse signals.

What if the signal is not sparse in the input representation?

What if the dictionary has invariances, e.g. shift or rotation?

Can we learn a representation in which the signal is sparse?

Topic Modeling

Unsupervised learning of admixtures: text documents, social networks (community modeling), biological models, . . .


Dictionary Learning Model

Goal: Find a dictionary A with k elements such that each data point is a sparse linear combination of dictionary elements:

X = A H

Topic models: x_i is a document, A contains topics, h_i gives the topics in document i.

Compressed sensing: x_i are the signals, A is a basis in which they have a sparse representation.

Images: x_i is an image, A contains filters, h_i gives the filters present in image i (also need to incorporate invariances).
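To make the model concrete, here is a minimal NumPy sketch that samples data from X = A H with a random unit-norm dictionary and s-sparse coefficients. All sizes (d, k, n, s) are illustrative choices, not values from the talk.

```python
# Synthetic data from the dictionary model X = A H: each column of X is a
# sparse linear combination of dictionary elements (columns of A).
import numpy as np

rng = np.random.default_rng(0)
d, k, n, s = 50, 20, 1000, 3          # ambient dim, dictionary size, samples, sparsity

A = rng.standard_normal((d, k))
A /= np.linalg.norm(A, axis=0)        # unit-norm dictionary elements

H = np.zeros((k, n))
for i in range(n):
    support = rng.choice(k, size=s, replace=False)   # s active elements per sample
    H[support, i] = rng.standard_normal(s)

X = A @ H                             # observed data, shape (d, n)
```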


Outline

1 Introduction

2 Tensor Methods for Dictionary Learning

3 Convolutional Dictionary Models

4 Conclusion


Learning Dictionary Models

Computational Challenges

Maximum likelihood: non-convex optimization; NP-hard in general.

Practice: local search approaches such as gradient descent, EM, and variational Bayes have no consistency guarantees.

They can get stuck in bad local optima, have poor convergence rates, and are hard to parallelize.

Tensor methods can yield guaranteed learning.


Moment Matrices and Tensors

Multivariate Moments

M1 := E[x], M2 := E[x ⊗ x], M3 := E[x ⊗ x ⊗ x].

Matrix

E[x ⊗ x] ∈ R^{d×d} is a second-order tensor.

E[x ⊗ x]_{i1,i2} = E[x_{i1} x_{i2}].

For matrices: E[x ⊗ x] = E[x x^⊤].

Tensor

E[x ⊗ x ⊗ x] ∈ R^{d×d×d} is a third-order tensor.

E[x ⊗ x ⊗ x]_{i1,i2,i3} = E[x_{i1} x_{i2} x_{i3}].
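As a sketch of how these moments are estimated in practice: the empirical versions of M1, M2, M3 are just averaged outer products of the samples (the columns of X from the earlier sketch). The helper name `empirical_moments` is hypothetical, not from the talk.

```python
# Empirical moment matrix/tensor estimates from data X (one sample per column).
import numpy as np

def empirical_moments(X):
    d, n = X.shape
    M1 = X.mean(axis=1)                           # E[x], shape (d,)
    M2 = X @ X.T / n                              # E[x ⊗ x], shape (d, d)
    M3 = np.einsum('in,jn,kn->ijk', X, X, X) / n  # E[x ⊗ x ⊗ x], shape (d, d, d)
    return M1, M2, M3
```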


Spectral Decomposition of Tensors

M2 = ∑_i λ_i u_i ⊗ v_i

[Figure: matrix M2 decomposed as λ_1 u_1 ⊗ v_1 + λ_2 u_2 ⊗ v_2 + · · ·]

M3 = ∑_i λ_i u_i ⊗ v_i ⊗ w_i

[Figure: tensor M3 decomposed as λ_1 u_1 ⊗ v_1 ⊗ w_1 + λ_2 u_2 ⊗ v_2 ⊗ w_2 + · · ·]

u ⊗ v ⊗ w is a rank-1 tensor since its (i1, i2, i3)-th entry is u_{i1} v_{i2} w_{i3}.


Moment forms for Dictionary Models

x_i = A h_i, i ∈ [n].

Independent components analysis (ICA)

h_i are independent, e.g. Bernoulli-Gaussian.

M4 := E[x ⊗ x ⊗ x ⊗ x] − T, where

T_{i1,i2,i3,i4} := E[x_{i1} x_{i2}] E[x_{i3} x_{i4}] + E[x_{i1} x_{i3}] E[x_{i2} x_{i4}] + E[x_{i1} x_{i4}] E[x_{i2} x_{i3}].

Let κ_j := E[h_j^4] − 3 (E[h_j^2])^2, j ∈ [k]. Then, we have

M4 = ∑_{j∈[k]} κ_j a_j ⊗ a_j ⊗ a_j ⊗ a_j.
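A sketch of the same construction empirically: estimate E[x ⊗ x ⊗ x ⊗ x], subtract the paired second-moment term T, and what remains is (up to sampling error) the rank-k sum above. This materializes all d^4 entries, so it is only meant for small d.

```python
# Empirical fourth-order cumulant M4 = E[x⊗x⊗x⊗x] - T, with T as on the slide.
import numpy as np

def fourth_cumulant(X):
    d, n = X.shape
    M4 = np.einsum('in,jn,kn,ln->ijkl', X, X, X, X) / n  # E[x⊗x⊗x⊗x]
    M2 = X @ X.T / n                                     # E[x⊗x]
    T = (np.einsum('ij,kl->ijkl', M2, M2)    # E[x_i1 x_i2] E[x_i3 x_i4]
         + np.einsum('ik,jl->ijkl', M2, M2)  # E[x_i1 x_i3] E[x_i2 x_i4]
         + np.einsum('il,jk->ijkl', M2, M2)) # E[x_i1 x_i4] E[x_i2 x_i3]
    return M4 - T
```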


Moment forms for Dictionary Models

General (sparse) coefficients

x_i = A h_i, i ∈ [n], with expected sparsity E[‖h_i‖_0] = s.

Moment conditions on the coefficients:

E[h_i^4] = E[h_i^2] = βs/k,

E[h_i^2 h_j^2] ≤ τ, i ≠ j,

E[h_i^3 h_j] = 0, i ≠ j.

Then

E[x ⊗ x ⊗ x ⊗ x] = ∑_{j∈[k]} κ_j a_j ⊗ a_j ⊗ a_j ⊗ a_j + E, where ‖E‖ ≤ τ‖A‖^4.


Tensor Rank and Tensor Decomposition

Rank-1 tensor: T = w · a ⊗ b ⊗ c ⇔ T(i, j, l) = w · a(i) · b(j) · c(l).

CANDECOMP/PARAFAC (CP) Decomposition

T = ∑_{j∈[k]} w_j a_j ⊗ b_j ⊗ c_j ∈ R^{d×d×d}, a_j, b_j, c_j ∈ S^{d−1}.

[Figure: tensor T decomposed as w_1 · a_1 ⊗ b_1 ⊗ c_1 + w_2 · a_2 ⊗ b_2 ⊗ c_2 + · · ·]

k: tensor rank, d: ambient dimension.

k ≤ d: undercomplete; k > d: overcomplete.
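For concreteness, a short sketch that materializes a rank-k CP tensor from unit-norm components; the dimensions and weights are arbitrary illustrative choices.

```python
# Build T = sum_j w_j * a_j ⊗ b_j ⊗ c_j with components on the unit sphere.
import numpy as np

rng = np.random.default_rng(1)
d, k = 10, 4

def unit_columns(d, k):
    M = rng.standard_normal((d, k))
    return M / np.linalg.norm(M, axis=0)   # each column a_j in S^{d-1}

a, b, c = unit_columns(d, k), unit_columns(d, k), unit_columns(d, k)
w = rng.uniform(0.5, 1.5, size=k)

# einsum sums the k rank-1 terms into a d x d x d tensor
T = np.einsum('j,ij,kj,lj->ikl', w, a, b, c)
```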


Orthogonal Tensor Power Method

Symmetric orthogonal tensor T ∈ R^{d×d×d}:

T = ∑_{i∈[k]} λ_i v_i ⊗ v_i ⊗ v_i.

Recall the matrix power method: v ↦ M(I, v) / ‖M(I, v)‖.

Algorithm: tensor power method: v ↦ T(I, v, v) / ‖T(I, v, v)‖.

How do we avoid spurious solutions (not part of the decomposition)?

• The v_i's are the only robust fixed points.
• All other eigenvectors are saddle points.

For an orthogonal tensor, no spurious local optima!
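A minimal sketch of the update above: repeatedly apply v ↦ T(I, v, v)/‖T(I, v, v)‖ from a random start. It recovers a single robust eigenpair; the restarts and deflation needed to extract all k components are omitted.

```python
# Tensor power method for a symmetric d x d x d tensor T.
import numpy as np

def tensor_power_method(T, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    d = T.shape[0]
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        Tv = np.einsum('ijk,j,k->i', T, v, v)   # multilinear map T(I, v, v)
        v = Tv / np.linalg.norm(Tv)             # normalize: the power update
    lam = np.einsum('ijk,i,j,k->', T, v, v, v)  # eigenvalue T(v, v, v)
    return lam, v
```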


Putting it together

Non-orthogonal tensor M3 = ∑_i w_i a_i ⊗ a_i ⊗ a_i, with M2 = ∑_i w_i a_i ⊗ a_i.

Whitening matrix W: chosen so that W^⊤ M2 W = I.

Multilinear transform: T = M3(W, W, W) is then orthogonally decomposable.

[Figure: whitening W maps the components a_1, a_2, a_3 of M3 to orthonormal components v_1, v_2, v_3 of T]

Tensor Decomposition in the Undercomplete Case: Solved!
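A sketch of the whitening step under the undercomplete assumption k ≤ d: build W from the top-k eigenpairs of M2 so that W^⊤ M2 W = I, then apply the multilinear transform to M3. The helper name `whiten` is hypothetical.

```python
# Whitening: reduce the non-orthogonal decomposition of M3 to an orthogonal one.
import numpy as np

def whiten(M2, M3, k):
    # Rank-k eigendecomposition of M2 (eigh returns eigenvalues in ascending order)
    evals, evecs = np.linalg.eigh(M2)
    U, s = evecs[:, -k:], evals[-k:]            # top-k eigenpairs of M2
    W = U / np.sqrt(s)                          # d x k matrix with W.T @ M2 @ W = I_k
    # Multilinear transform T = M3(W, W, W), a k x k x k (orthogonal) tensor
    T = np.einsum('abc,ai,bj,ck->ijk', M3, W, W, W)
    return W, T
```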


Overcomplete Setting

In general, tensor decomposition is NP-hard.

Tractable when A is incoherent, i.e. 〈a_i, a_j〉 ≈ 1/√d for i ≠ j.

SVD Initialization

Find the top singular vectors of T(I, I, θ) for θ ∼ N(0, I).

Use them to initialize the power method. L trials.

Assumptions

Number of initializations: L ≥ k^{Ω((k/d)^2)}. Tensor rank: k = O(d).

Number of iterations: N = Θ(log(1/‖E‖)). Recall ‖E‖: recovery error.

Theorem (Global Convergence) [AGJ, COLT 2015]:

‖a_1 − a^{(N)}‖ ≤ O(‖E‖), where a^{(N)} is the power method iterate after N iterations.
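A sketch of the SVD initialization: contract the tensor against a Gaussian vector θ to get the matrix slice T(I, I, θ), and keep its top singular vector, repeated over L trials. The helper name `svd_initializations` is hypothetical.

```python
# SVD initialization for the power method in the overcomplete regime.
import numpy as np

def svd_initializations(T, L, seed=0):
    rng = np.random.default_rng(seed)
    inits = []
    for _ in range(L):
        theta = rng.standard_normal(T.shape[2])   # θ ~ N(0, I)
        M = np.einsum('ijk,k->ij', T, theta)      # matrix slice T(I, I, θ)
        U, _, _ = np.linalg.svd(M)
        inits.append(U[:, 0])                     # top left singular vector
    return inits
```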


Improved Sample Complexity Analysis

Dictionary A ∈ R^{d×k} satisfying RIP; sparse-ICA model with sub-Gaussian variables.

Sparsity level s. Number of samples n.

‖M̂4 − M4‖ = O( s^2/n + √(s^4/(d^3 n)) )

Proof technique: careful ε-net covering and bucketing.


Outline

1 Introduction

2 Tensor Methods for Dictionary Learning

3 Convolutional Dictionary Models

4 Conclusion


Convolutional Dictionary Model

So far, invariances in the dictionary are not incorporated.

Convolutional models incorporate invariances such as shift invariance.

[Figure: an image composed of shifted copies of the dictionary elements]


Rewriting as a standard dictionary model

[Figure: (a) convolutional model x = ∑_i f_i^* ∗ w_i^*; (b) reformulated model x = F^* w^*]

x = ∑_i f_i ∗ w_i = ∑_i Cir(f_i) w_i = F^* w^*
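A quick numerical check of the identity behind the reformulation, x = ∑_i f_i ∗ w_i = ∑_i Cir(f_i) w_i: circular convolution with a filter equals multiplication by its circulant matrix. The filter and coefficient vectors here are arbitrary random choices.

```python
# Circular convolution f ∗ w equals Cir(f) @ w.
import numpy as np
from scipy.linalg import circulant

n = 8
rng = np.random.default_rng(2)
f, w = rng.standard_normal(n), rng.standard_normal(n)

conv = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(w)))  # circular convolution
Cf = circulant(f)                                           # Cir(f): shifted copies of f
assert np.allclose(Cf @ w, conv)
```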


Moment forms and optimization

x = ∑_i f_i ∗ w_i = ∑_i Cir(f_i) w_i = F^* w^*

Assume the coefficients w_i are independent (convolutional ICA model).

The cumulant tensor then has a decomposition with components F_i^*:

Cumulant = λ_1 (F_1^*)^{⊗3} + λ_2 (F_2^*)^{⊗3} + · · ·


Efficient Optimization Techniques

Cumulant = ∑_j λ_j F_j^{⊗3}, or in matricized form: Cumulant = F^* Λ^* (F^* ⊙ F^*)^⊤.

Objective function:

min_F ‖Cumulant − F Λ (F ⊙ F)^⊤‖_F^2

s.t. blk_l(F) = U Diag(FFT(f_l)) U^H, ‖f_l‖_2 = 1.

Alternating minimization: relax F Λ (F ⊙ F)^⊤ to F Λ (H ⊙ G)^⊤.

When H ⊙ G has full column rank, form T := Cumulant · ((H ⊙ G)^⊤)†.

Main Result: the optimal filter f_l^{opt} has a closed form. For all p ∈ [n], with q := (i − j) mod n,

f_l^{opt}(p) = ( ∑_{i,j∈[n]} ‖blk_l(T)_j‖^{−1} · blk_l(T)_{ij} · 𝟙{q = p−1} ) / ( ∑_{i,j∈[n]} 𝟙{q = p−1} ).


Efficient Optimization Techniques

When H ⊙ G has full column rank, form T := Cumulant · ((H ⊙ G)^⊤)†.

The optimal solution is then computed in closed form.

Bottleneck computation: ((H ⊙ G)^⊤)†. A naive implementation takes O(n^6) time, where n is the length of the signal.

Running time of our method: for length-n signals and L filters, O(log n + log L) time with O(L^2 n^3) processors.

Involves 2L FFTs, some matrix multiplications, and inversion of diagonal matrices.
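The FFT count comes from the diagonalization used in the constraint blk_l(F) = U Diag(FFT(f_l)) U^H: a circulant block is diagonalized by the (unitary) DFT, so applying or inverting it reduces to one FFT plus elementwise work on the diagonal. A small sketch verifying this identity numerically:

```python
# Circulant diagonalization: Cir(f) = U Diag(FFT(f)) U^H.
import numpy as np
from scipy.linalg import circulant, dft

n = 8
rng = np.random.default_rng(3)
f = rng.standard_normal(n)

U = dft(n).conj() / np.sqrt(n)     # unitary inverse-DFT matrix
D = np.diag(np.fft.fft(f))         # Diag(FFT(f))
assert np.allclose(U @ D @ U.conj().T, circulant(f))
```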


Experiments (synthetic)

Convolutional tensor (CT) vs. alternating minimization (AM).

[Figure: (a) reconstruction error vs. iteration for CT and AM (overall reconstruction and filters f1, f2); (b) running time in seconds vs. number of filters L; (c) running time in seconds vs. number of samples N, for the proposed CT and the baseline AM.]


Experiments (NLP)

Microsoft paraphrase dataset: 4096 sentence pairs. The unsupervised convolutional tensor method uses no outside information. Metric: F score.

Method | Description | Outside Information | F score
Vector Similarity | cosine similarity with tf-idf weights | word similarity | 75.3%
ESA | explicit semantic space | word semantic profiles | 79.3%
LSA | latent semantic space | word semantic profiles | 79.9%
RMLMG | graph subsumption | lexical, syntactic & synonymy info | 80.5%
CT (proposed) | convolutional dictionary learning | none | 80.7%
MCS | combined word similarity measures | word similarity | 81.3%
STS | combined semantic & string similarity | semantic similarity | 81.3%
SSA | salient semantic space | word semantic profiles | 81.4%
matrixJcn | JCN WordNet similarity with matrix | word similarity | 82.4%

Paraphrase detected: (1) Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence. (2) Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.

Non-paraphrase detected: (1) I never organised a youth camp for the diocese of Bendigo. (2) I never attended a youth camp organised by that diocese.


Outline

1 Introduction

2 Tensor Methods for Dictionary Learning

3 Convolutional Dictionary Models

4 Conclusion


Summary and Outlook

Summary

Method of moments for learning dictionary elements.

Invariances in convolutional models can be handled efficiently.

Outlook

Analyze the optimization landscape of tensor methods for convolutional models.

Extend to other kinds of invariances (e.g. rotation).

How is feature learning useful for classification?

Precise characterization for training neural networks: first polynomial-time methods!

"Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods" by Majid Janzamin, Hanie Sedghi, and A.