Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and Statistics University of California, Berkeley

Source: lcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf


Page 1:

Latent Structure Beyond Sparse Codes

Benjamin Recht
Department of EECS and Statistics
University of California, Berkeley

Page 2:

Sparse Codes

[Figure panels: 1.25x, 2.5x, 5x, 10x]

Figure 1. Learned dictionaries. Each panel shows 100 basis functions selected at random from the dictionary of a given overcompleteness ratio.

resulting in dictionaries containing more specialized elements such as straight contours, blobs, local curvature, and gratings. The specialized elements are better matched to the structures occurring in natural images, as evidenced by the fact that they yield lower L1 norm representations, steeper coefficient decay, and better denoising. It seems plausible that they may also result in improved image compression, though this remains to be seen.

These results are of relevance to neuroscience because the input layer of V1 is thought to be at least 100x

redundancy

Which mathematical representations can be learned robustly?

robustness and sparsity

Gabor-like thingies...

Page 3:

Sparse Approximation

• Use the fact that images are sparse in a wavelet basis to reduce the number of measurements required for signal acquisition.

[Figure: pixels and their large wavelet coefficients; wideband signal samples and their large Gabor coefficients (axes: time, frequency)]

Compressed Sensing

• $n_{\text{patients}} \ll n_{\text{peaks}}$

• If very few are needed for diagnosis, search for a sparse set of markers

Lasso

Page 4:

Cardinality Minimization

• PROBLEM: Find the vector of lowest cardinality that satisfies/approximates the underdetermined linear system

$\Phi x = y, \quad \Phi : \mathbb{R}^p \to \mathbb{R}^n$

• NP-HARD:
– Reduce to EXACT-COVER
– Hard to approximate
– Known exact algorithms require enumeration

• HEURISTIC: Replace cardinality with $\ell_1$ norm

Page 5:

• Recommender Systems. Rank of: Data Matrix

• Quantum Tomography. Rank of: Density Matrix

• Seismic Imaging. Rank of: Unfolded Tensor

• Geometric Structure. Rank of: Gram Matrix

Page 6:

Affine Rank Minimization

• PROBLEM: Find the matrix of lowest rank that satisfies/approximates the underdetermined linear system

$\Phi(X) = y, \quad \Phi : \mathbb{R}^{p_1 \times p_2} \to \mathbb{R}^n$

• NP-HARD:
– Reduce to solving polynomial equations
– Hard to approximate
– Exact algorithms are awful

• HEURISTIC: Replace rank with nuclear norm

Page 7:

IDEA: Replace rank with nuclear norm:

minimize $\|X\|_*$ subject to $\Phi(X) = b$

Succeeds when number of samples is $\tilde{O}(r(p_1 + p_2))$

Some guy on livejournal, 2006
Fazel, Parrilo, Recht, 2007
Candes and Recht, 2008

Heuristic: Gradient Descent

$M = L R^*$, with $M$ of size $p_1 \times p_2$, $L$ of size $p_1 \times r$, and $R^*$ of size $r \times p_2$

• Step 1: Pick (i,j) and compute residual:

$e = L_i R_j^T - M_{ij}$

• Step 2: Take a mixture of current model and corrected model ($\alpha, \beta > 0$):

$L_i \leftarrow \alpha L_i - \beta e R_j$

$R_j \leftarrow \alpha R_j - \beta e L_i$
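The two steps above can be sketched in NumPy as follows. This is a minimal illustration, not the implementation behind the slides: the function name `sgd_factorize`, the fully observed test matrix, and the values of `alpha`, `beta`, and `n_steps` are all assumptions chosen for the demo.

```python
import numpy as np

def sgd_factorize(M, r, alpha=0.9999, beta=0.03, n_steps=60_000, seed=0):
    """Fit M ~ L @ R.T by repeatedly sampling one entry (i, j).

    alpha, beta, and n_steps are illustrative values, not from the slides.
    """
    rng = np.random.default_rng(seed)
    p1, p2 = M.shape
    L = 0.1 * rng.standard_normal((p1, r))
    R = 0.1 * rng.standard_normal((p2, r))
    for _ in range(n_steps):
        i = rng.integers(p1)
        j = rng.integers(p2)
        e = L[i] @ R[j] - M[i, j]              # Step 1: residual at (i, j)
        Li_old = L[i].copy()                   # keep the old row for the R update
        L[i] = alpha * L[i] - beta * e * R[j]  # Step 2: shrink, then correct
        R[j] = alpha * R[j] - beta * e * Li_old
    return L, R

# Demo on a synthetic rank-2 matrix (every entry is treated as observed).
rng = np.random.default_rng(1)
M = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))
L, R = sgd_factorize(M, r=2)
rel_err = np.linalg.norm(L @ R.T - M) / np.linalg.norm(M)
print(f"relative error: {rel_err:.3e}")
```

Taking `alpha` slightly below 1 shrinks the factors at every update, which acts like a small penalty on their size; setting `alpha = 1` gives plain stochastic gradient descent on the squared residual.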

Page 8:

System Identification: find a dynamical model that agrees with time series data

• All linear systems are combinations of single pole filters.
• Leverage this structure for new algorithms and analysis.

Observe a time series $y_1, y_2, \ldots, y_T$ driven by the input $u_1, u_2, \ldots, u_T$

What is a principled way to build a parsimonious model for the input-output responses?

Na et al, 2012

Shah, Bhaskar, Tang, and Recht 2012

Page 9:

Linear Inverse Problems

• Find me a solution of $y = \Phi x$

• $\Phi$ is $n \times p$, $n < p$

• Of the infinite collection of solutions, which one should we pick?

• Leverage structure: Sparsity, Rank, Smoothness, Symmetry

• How do we design algorithms to solve underdetermined systems problems with priors?

Page 10:

Sparsity

• 1-sparse vectors of Euclidean norm 1

• Convex hull is the unit ball of the $\ell_1$ norm

$\|x\|_1 = \sum_{i=1}^{p} |x_i|$

[Figure: the $\ell_1$ unit ball, with vertices at $\pm 1$ on each axis]

Page 11:

minimize $\|x\|_1$ subject to $\Phi x = y$

[Figure: axes $x_1$, $x_2$, with the affine set $\Phi x = y$]

Compressed Sensing: Candes, Romberg, Tao, Donoho, Tanner, Etc...
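The program above can be solved as a linear program by splitting $x = u - v$ with $u, v \ge 0$. The sketch below uses `scipy.optimize.linprog`; the helper name `basis_pursuit`, the Gaussian $\Phi$, and the problem sizes are assumptions made for the demo.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    """Solve min ||x||_1 s.t. Phi x = y, using x = u - v with u, v >= 0."""
    n, p = Phi.shape
    c = np.ones(2 * p)                    # objective: sum(u) + sum(v)
    A_eq = np.hstack([Phi, -Phi])         # Phi (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:]

# Demo: recover a 5-sparse vector in R^100 from 60 Gaussian measurements.
rng = np.random.default_rng(0)
n, p, s = 60, 100, 5
x_true = np.zeros(p)
x_true[rng.choice(p, size=s, replace=False)] = rng.standard_normal(s)
Phi = rng.standard_normal((n, p))
x_hat = basis_pursuit(Phi, Phi @ x_true)
recovery_error = np.linalg.norm(x_hat - x_true)
print(f"recovery error: {recovery_error:.2e}")
```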

Page 12:

Rank

• 2x2 matrices
• plotted in 3d

rank 1: $x^2 + z^2 + 2y^2 = 1$

Page 13:

Rank

• 2x2 matrices
• plotted in 3d

rank 1: $x^2 + z^2 + 2y^2 = 1$

Convex hull:

$\|X\|_* = \sum_i \sigma_i(X)$

Page 14:

Nuclear Norm Heuristic

• 2x2 matrices
• plotted in 3d

$\|X\|_* = \sum_i \sigma_i(X)$

Rank Minimization/Matrix Completion

Fazel 2002. R, Fazel, and Parrilo 2007
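As a small numerical check of the definition, the nuclear norm can be computed from the singular values. The helper name `nuclear_norm` and the example matrices are illustrative assumptions.

```python
import numpy as np

def nuclear_norm(X):
    """Sum of the singular values of X."""
    return np.linalg.svd(X, compute_uv=False).sum()

# A rank-one matrix u v^T has the single singular value ||u|| * ||v||.
u = np.array([3.0, 4.0])
v = np.array([1.0, 0.0])
X_rank1 = np.outer(u, v)
print(nuclear_norm(X_rank1))              # ||u|| * ||v|| = 5 up to rounding

# The identity has two unit singular values.
print(nuclear_norm(np.eye(2)))

# NumPy exposes the same quantity as the 'nuc' matrix norm.
print(np.linalg.norm(X_rank1, ord="nuc"))
```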

Page 15:

Integer Programming

• Integer solutions: all components of x are ±1

• Convex hull is the unit ball of the $\ell_\infty$ norm

[Figure: square with vertices (1,1), (1,-1), (-1,1), (-1,-1)]

Page 16:

minimize $\|x\|_\infty$ subject to $\Phi x = y$

[Figure: axes $x_1$, $x_2$, with the affine set $\Phi x = y$]

Donoho and Tanner 2008
Mangasarian and Recht 2009

Page 17:

Parsimonious Models

• Search for best linear combination of fewest atoms
• “rank” = fewest atoms needed to describe the model

[Figure: the model written as a weighted sum of atoms; labels: model, weights, atoms, rank]

Page 18:

Atomic Norms

• Given a basic set of atoms, $\mathcal{A}$, define the function

$\|x\|_{\mathcal{A}} = \inf\{t > 0 : x \in t\,\mathrm{conv}(\mathcal{A})\}$

• When $\mathcal{A}$ is centrosymmetric, we get a norm

$\|x\|_{\mathcal{A}} = \inf\left\{\sum_{a \in \mathcal{A}} |c_a| : x = \sum_{a \in \mathcal{A}} c_a a\right\}$

IDEA: minimize $\|z\|_{\mathcal{A}}$ subject to $\Phi z = y$

• When can we compute this?
• When does this work?
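Two instances of the definition, both already used earlier in the deck, can be written out as a worked example. The symbols $e_i$ (standard basis vectors) and $u, v$ (unit vectors) are notation introduced here for illustration.

```latex
\[
\mathcal{A} = \{\pm e_1, \ldots, \pm e_p\}
\;\Longrightarrow\;
\|x\|_{\mathcal{A}}
= \inf\Big\{ \sum_{a \in \mathcal{A}} |c_a| : x = \sum_{a \in \mathcal{A}} c_a a \Big\}
= \sum_{i=1}^{p} |x_i|
= \|x\|_1 ,
\]
\[
\mathcal{A} = \{\, u v^T : \|u\|_2 = \|v\|_2 = 1 \,\}
\;\Longrightarrow\;
\|X\|_{\mathcal{A}} = \sum_i \sigma_i(X) = \|X\|_* .
\]
```

In the first case the atoms are the 1-sparse unit vectors, so the cheapest decomposition puts weight $|x_i|$ on $\mathrm{sign}(x_i)\, e_i$. In the second case the atoms are the rank-one matrices of unit norm, and the singular value decomposition gives the cheapest decomposition.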

Page 19:

Hierarchical dictionary for image patches

Union of Subspaces

• X has structured sparsity: linear combination of elements from a set of subspaces {Ug}.

• Atomic set: unit norm vectors living in one of the Ug

Permutations and Rankings

• X a sum of a few permutation matrices

• Examples: Multiobject Tracking, Ranked elections, BCS

• Convex hull of permutation matrices: doubly stochastic matrices.

Page 20:

• Moments: convex hull of $[1, t, t^2, t^3, t^4, \ldots]$, $t \in T$, some basic set.

• System Identification, Image Processing, Numerical Integration, Statistical Inference

• Solve with semidefinite programming

• Cut-matrices: sums of rank-one sign matrices.

• Collaborative Filtering, Clustering in Genetic Networks, Combinatorial Approximation Algorithms

• Approximate with semidefinite programming

• Low-rank Tensors: sums of rank-one tensors

• Computer Vision, Image Processing, Hyperspectral Imaging, Neuroscience

• Approximate with alternating least-squares

Page 21:

Atomic norms in sparse approximation

• Greedy approximations

$\|f - f_n\|_{L_2} \le \dfrac{c_0 \|f\|_{\mathcal{A}}}{\sqrt{n}}$

• Best n term approximation to a function f in the convex hull of $\mathcal{A}$.

• Maurey, Jones, and Barron (1980s-90s)
• DeVore and Temlyakov (1996)
• Random Feature Heuristics (Rahimi and R, 2007)

Page 22:

Tangent Cones

minimize $\|z\|_{\mathcal{A}}$ subject to $\Phi z = y$

• Set of directions that decrease the norm from x form a cone:

$T_{\mathcal{A}}(x) = \{d : \|x + \alpha d\|_{\mathcal{A}} \le \|x\|_{\mathcal{A}} \text{ for some } \alpha > 0\}$

• x is the unique minimizer if the intersection of this cone with the null space of $\Phi$ equals {0}

[Figure: the set $\{z : \|z\|_{\mathcal{A}} \le \|x\|_{\mathcal{A}}\}$, the tangent cone $T_{\mathcal{A}}(x)$ at $x$, and the affine set $y = \Phi z$]

Page 23:

Mean Width

Support Function: $S_C(d) = \sup_{x \in C} d^\top x$

[Figure: the set C with the labels $d^\top x$ and $-d^\top x$]

$S_C(d) + S_C(-d)$ measures width of C when projected onto span of d.

mean width: $w(C) = \int_{S^{p-1}} S_C(u)\,du$
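For a set that is the convex hull of finitely many atoms, the support function is a maximum over the atoms, and the mean width can be estimated by Monte Carlo. This sketch assumes that $du$ denotes the uniform probability measure on the sphere; the function names and the choice of the $\ell_1$ ball as the example are illustrative.

```python
import numpy as np

def support_function(atoms, d):
    """S_C(d) = sup over C of d^T x, for C = conv(rows of atoms)."""
    return np.max(atoms @ d)

def mean_width(atoms, n_samples=20_000, seed=0):
    """Monte Carlo estimate of the average of S_C(u) over uniform unit vectors u."""
    rng = np.random.default_rng(seed)
    p = atoms.shape[1]
    U = rng.standard_normal((n_samples, p))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # uniform on the sphere
    return np.mean(np.max(U @ atoms.T, axis=1))

# The l1 unit ball in R^3 is the convex hull of +/- e_i.
atoms3 = np.vstack([np.eye(3), -np.eye(3)])
d = np.array([0.5, -2.0, 1.0])
width = support_function(atoms3, d) + support_function(atoms3, -d)
print(width)   # twice the largest |d_i|

# In R^2 the exact average of max(|cos t|, |sin t|) is 2*sqrt(2)/pi.
atoms2 = np.vstack([np.eye(2), -np.eye(2)])
w2 = mean_width(atoms2)
print(w2, 2 * np.sqrt(2) / np.pi)
```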

Page 24:

• When does a random subspace, U in $\mathbb{R}^p$, intersect a convex cone C only at the origin?

• Gordon (1988): with high probability if

$\mathrm{codim}(U) \ge p\, w(C \cap S^{p-1})^2$

where $w(C \cap S^{p-1}) = \int_{S^{p-1}} S_C(u)\,du$ is the mean width.

• Corollary: For inverse problems, if $\Phi$ is a random Gaussian matrix with n rows, need

$n \ge p\, w(T_{\mathcal{A}}(x) \cap S^{p-1})^2$

for exact recovery of x.

Page 25:

Rates

• Hypercube:

$n \ge p/2$

• Sparse Vectors, p vector, sparsity s

$n \ge 2s \log\left(\frac{p}{s}\right) + \frac{5s}{4}$

• Block sparse, M groups (possibly overlapping), maximum group size B, k active groups

$n \ge k\left(\sqrt{2 \log(M - k)} + \sqrt{B}\right)^2 + kB$

• Low-rank matrices: p1 x p2, (p1<p2), rank r

$n \ge 3r(p_1 + p_2 - r)$
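The four bounds above are easy to evaluate for concrete problem sizes. The function names below are hypothetical, and the natural logarithm is assumed.

```python
import math

def n_hypercube(p):
    """Measurements needed for a vertex of the hypercube in R^p."""
    return p / 2

def n_sparse(p, s):
    """Measurements needed for an s-sparse vector in R^p."""
    return 2 * s * math.log(p / s) + 5 * s / 4

def n_block_sparse(M, B, k):
    """M groups, maximum group size B, k active groups."""
    return k * (math.sqrt(2 * math.log(M - k)) + math.sqrt(B)) ** 2 + k * B

def n_low_rank(p1, p2, r):
    """Measurements needed for a rank-r matrix of size p1 x p2."""
    return 3 * r * (p1 + p2 - r)

print(n_hypercube(100))
print(n_sparse(1000, 10))
print(n_block_sparse(100, 4, 3))
print(n_low_rank(100, 200, 5))
```

For example, a rank-5 matrix of size 100 x 200 has 20,000 entries, while the bound asks for 3 * 5 * 295 = 4425 Gaussian measurements.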

Page 26:

Robust Recovery (deterministic)

• Suppose we observe $y = \Phi x + w$, with $\|w\|_2 \le \delta$

minimize $\|z\|_{\mathcal{A}}$ subject to $\|\Phi z - y\| \le \delta$

• If $\hat{x}$ is an optimal solution, then $\|x - \hat{x}\| \le \dfrac{2\delta}{\epsilon}$ provided that

$n \ge \dfrac{p\, w(T_{\mathcal{A}}(x) \cap S^{p-1})^2}{(1 - \epsilon)^2}$

[Figure: the set $\{z : \|z\|_{\mathcal{A}} \le \|x\|_{\mathcal{A}}\}$ and the region $\|\Phi z - y\| \le \delta$]

Page 27:

Robust Recovery (statistical)

• Suppose we observe $y = \Phi x + w$

minimize $\|\Phi z - y\|^2 + \mu \|z\|_{\mathcal{A}}$

• If $\hat{x}$ is an optimal solution, then $\|\Phi x - \Phi \hat{x}\|_2 \le \sqrt{\mu \|x\|_{\mathcal{A}}}$ provided that $\mu \ge E_w[\|\Phi^* w\|_{\mathcal{A}}^*]$

And under an additional “cone condition”, stated in terms of $\mathrm{cone}\{u : \|x + u\|_{\mathcal{A}} \le \|x\|_{\mathcal{A}} + \gamma \|u\|\}$:

$\|x - \hat{x}\|_2 \le \eta(x, \mathcal{A}, \Phi, \gamma)\, \mu$

Bhaskar, Tang, and Recht 2011
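When the atoms are the signed standard basis vectors, the atomic norm is the $\ell_1$ norm and the regularized program above can be attacked with proximal gradient descent (ISTA). This is a sketch of one standard solver, not the method used in the cited work; the function names, problem sizes, noise level, and the value of `mu` are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(Phi, y, z, mu):
    return np.sum((Phi @ z - y) ** 2) + mu * np.sum(np.abs(z))

def ista(Phi, y, mu, n_iter=500):
    """Proximal gradient for min_z ||Phi z - y||_2^2 + mu * ||z||_1."""
    # The gradient 2 Phi^T (Phi z - y) has Lipschitz constant 2 * ||Phi||_2^2.
    step = 1.0 / (2.0 * np.linalg.norm(Phi, 2) ** 2)
    z = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * Phi.T @ (Phi @ z - y)
        z = soft_threshold(z - step * grad, step * mu)
    return z

# Demo: noisy Gaussian measurements of a 5-sparse vector with +/-1 entries.
rng = np.random.default_rng(0)
n, p, s = 50, 100, 5
x_true = np.zeros(p)
x_true[rng.choice(p, size=s, replace=False)] = rng.choice([-1.0, 1.0], size=s)
Phi = rng.standard_normal((n, p))
y = Phi @ x_true + 0.1 * rng.standard_normal(n)
mu = 5.0
z_hat = ista(Phi, y, mu)
print(np.linalg.norm(z_hat - x_true))
```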

Page 28:

Denoising Rates (re-derivations)

• Sparse Vectors, p vector, sparsity s

$\dfrac{1}{p} \|\hat{x} - x^\star\|_2^2 = O\left(\dfrac{\sigma^2 s \log(p)}{p}\right)$

• Low-rank matrices: p1 x p2, (p1<p2), rank r

$\dfrac{1}{p_1 p_2} \|\hat{x} - x^\star\|_F^2 = O\left(\dfrac{\sigma^2 r}{p_1}\right)$

Page 29:

Atomic Norm Minimization

IDEA: minimize $\|z\|_{\mathcal{A}}$ subject to $\Phi z = y$

• Generalizes existing, powerful methods
• Rigorous formula for developing new analysis algorithms
• Tightest bounds on number of measurements needed for model recovery in all common models
• One algorithm prototype for many data-mining applications

Chandrasekaran, Recht, Parrilo, and Willsky 2010

Page 30:

Learning representations

• ASSUME:
– very sparse vectors: $s < N^{1/2}/\log(N)$
– very incoherent dictionary (much more than RIP)
– number of observations is much bigger than N

$|\langle \Phi x, \Phi z \rangle| \approx |\langle x, z \rangle|$

• Gram matrix of y vectors indicates overlapping support

• Use graph algorithms to identify single dictionary elements at a time

Arora, Ge, and Moitra
Agarwal, Anandkumar, and Netrapalli

Page 31:

Extended representations

$C = \pi(K \cap L)$, where $C$ is a convex body, $\pi$ a linear map, $K$ a cone, and $L$ an affine space

this non-regular hexagon only has the trivial LP-lift

regular hexagon is the projection of a 3-diml slice of $\mathbb{R}^5_+$:

$\{y \in \mathbb{R}^5_+ : y_1 + y_2 + y_3 + y_5 = 2,\ y_3 + y_4 + y_5 = 1\}$

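Page 32:

$C = \pi(K \cap L)$

[Figure: diamond with vertices at $\pm 1$ on each axis]

$K = \mathbb{R}^{2d}_+$, $\quad L = \left\{y : \sum_{i=1}^{2d} y_i = 1\right\}$, $\quad \pi = \begin{bmatrix} I & -I \end{bmatrix}$

[Figure: square with vertices (1,1), (1,-1), (-1,1), (-1,-1)]

$K = \mathbb{R}^{2d}_+$, $\quad L = \{y : y_i + y_{i+d} = 1,\ 1 \le i \le d\}$, $\quad \pi = \begin{bmatrix} I & -I \end{bmatrix}$

$K = \mathcal{S}^{d_1 + d_2}_+$, $\quad L = \{Z : \mathrm{trace}(Z) = 1\}$, $\quad \pi\left(\begin{bmatrix} A & B \\ B^T & C \end{bmatrix}\right) = B$

$K = \mathcal{S}^{d+1}_+$, $\quad L = \left\{Z = \begin{bmatrix} T & x \\ x^T & u \end{bmatrix} : T \text{ Toeplitz},\ T_{11} = u = 1\right\}$, $\quad \pi\left(\begin{bmatrix} T & x \\ x^T & u \end{bmatrix}\right) = x$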
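The first lift on this page can be checked numerically: every point of the $\ell_1$ unit ball is the image under $\pi = [I \; -I]$ of a nonnegative vector whose entries sum to one. The function names `lift_l1_ball` and `project` are hypothetical, and the way the leftover mass is spread evenly over the coordinates is just one convenient choice.

```python
import numpy as np

def lift_l1_ball(x):
    """Return y >= 0 with sum(y) = 1 and [I, -I] y = x, for ||x||_1 <= 1."""
    x = np.asarray(x, dtype=float)
    d = x.size
    slack = 1.0 - np.abs(x).sum()
    if slack < 0:
        raise ValueError("x is outside the l1 unit ball")
    # Positive and negative parts, plus equal padding that cancels under pi.
    pos = np.maximum(x, 0.0) + slack / (2 * d)
    neg = np.maximum(-x, 0.0) + slack / (2 * d)
    return np.concatenate([pos, neg])

def project(y):
    """Apply pi = [I, -I] to y in R^{2d}."""
    d = y.size // 2
    return y[:d] - y[d:]

x = np.array([0.3, -0.5])
y_lift = lift_l1_ball(x)
print(y_lift, y_lift.sum(), project(y_lift))
```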

Page 33:

Extended representations

$C = \pi(K \cap L)$, where $\pi$ is a linear map, $K$ a cone, and $L$ an affine space

this non-regular hexagon only has the trivial LP-lift

regular hexagon is the projection of a 3-diml slice of $\mathbb{R}^5_+$:

$\{y \in \mathbb{R}^5_+ : y_1 + y_2 + y_3 + y_5 = 2,\ y_3 + y_4 + y_5 = 1\}$

polar body: $C^\circ = \{y : \langle x, y \rangle \le 1 \ \ \forall x \in C\}$

C has a lift into K if there are maps

$A : C \to K, \quad B : C^\circ \to K^*$

such that

$1 - \langle x, y \rangle = \langle A(x), B(y) \rangle$

for all extreme points $x \in C$ and $y \in C^\circ$

Gouveia, Parrilo, and Thomas

Representation learning becomes matrix factorization

Page 34:

Learning extended representations?

$C = \pi(K \cap L)$, where $C$ is a convex body, $\pi$ a linear map, $K$ a cone, and $L$ an affine space

• Learning representation through NMF?
• Ties immediately with Gaussian width analysis
• Could obviate graph structured arguments
• What are the right features?