Stochastic Subsampling for Massive Matrix Factorization

Arthur Mensch, Julien Mairal, Gaël Varoquaux, Bertrand Thirion

Inria Parietal, Université Paris-Saclay

April 13, 2018



Matrix factorization

$X = DA$, with $X \in \mathbb{R}^{p \times n}$, $D \in \mathbb{R}^{p \times k}$, $A \in \mathbb{R}^{k \times n}$

Flexible tool for unsupervised data analysis

The data have a lower underlying complexity than their apparent size suggests

How can it scale to very large datasets? (Brain imaging: 4 TB; hyperspectral imaging: 100 GB)
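A toy sketch of what the factorization stores, in plain Python (sizes and entries are arbitrary choices for illustration): a p × n matrix built as D A is described by k(p + n) numbers instead of pn.

```python
def matmul(D, A):
    """Product of a p x k matrix D and a k x n matrix A, both stored as lists of rows."""
    k, n = len(A), len(A[0])
    return [[sum(row[l] * A[l][j] for l in range(k)) for j in range(n)] for row in D]

p, n, k = 6, 8, 2                                                  # toy sizes
D = [[(i + 1) * (l + 1) % 5 for l in range(k)] for i in range(p)]  # p x k dictionary
A = [[(l + j) % 3 - 1 for j in range(n)] for l in range(k)]        # k x n codes
X = matmul(D, A)                                                   # p x n data, rank <= k

print(len(X), len(X[0]))   # 6 8: X is p x n
print(p * n, k * (p + n))  # 48 28: entries of X vs entries of (D, A)
```

The gap between pn and k(p + n) widens quickly as p and n grow with k fixed.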

Arthur Mensch Stochastic Subsampling for Matrix Factorization 1 / 24


Example: resting-state fMRI

Resting-state data analysis:

Input: 3D brain images across time for 900 subjects

$X \in \mathbb{R}^{p \times n}$, $n = 5 \cdot 10^6$, $p = 2 \cdot 10^5$

Goal: Extract representative sparse brain components D

Functional networks: correlated brain activity in localized areas (e.g., auditory, visual, motor cortex)

[Figure: the voxels × time data matrix written as k spatial maps multiplied by their time courses]
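A quick size check on these numbers, assuming (this is not stated on the slide) that each entry is stored as a 4-byte float; under that assumption it matches the 4 TB quoted for brain imaging:

```python
# Size of the resting-state fMRI matrix X (p voxels x n time points).
n = 5 * 10**6          # samples: time points across the 900 subjects
p = 2 * 10**5          # features: voxels
entries = n * p        # number of entries in X
bytes_per_entry = 4    # assumption: 32-bit floats
size_tb = entries * bytes_per_entry / 10**12
print(entries, size_tb)  # 1000000000000 4.0
```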


Other examples

Computer vision:

Patches of an image

Decompose onto a dictionary

Sparse loadings

Collaborative filtering:

Incomplete user-item rating matrix

Decompose into a low-rank factorization


Designing new efficient algorithms

[Figure: voxels × time matrix = k spatial maps × time courses]

X is large (5 TB) in both the number of samples n and the sample dimension p

New stochastic algorithms that scale in both directions


Formalism and methods

Non-convex matrix factorization:

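$\min_{D \in \mathcal{C},\, A \in \mathbb{R}^{k \times n}} \|X - DA\|_F^2 + \lambda\, \Omega(A)$

Constraints on the dictionary D: each column $d^{(j)}$ lies in a ball $\mathcal{B}_2$ ($\ell_2$) or $\mathcal{B}_1$ ($\ell_1$)

Penalty on the code A: $\ell_1$ or $\ell_2$ (+ non-negativity)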

Naive approach: alternating minimization, using the full X at each iteration

Slow: each iteration costs O(np)
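A minimal sketch of the objective, picking Ω = ℓ1 among the listed penalties and the unit ℓ2 ball for each column of D (the radius 1 is our assumption). Function names are illustrative. Evaluating the fit term touches all pn entries of X, which is where the O(np) cost of a full iteration comes from.

```python
import math

def objective(X, D, A, lam):
    """||X - D A||_F^2 + lam * Omega(A), with Omega taken to be the l1 norm."""
    p, k, n = len(D), len(A), len(A[0])
    err = 0.0
    for i in range(p):
        for j in range(n):
            r = X[i][j] - sum(D[i][l] * A[l][j] for l in range(k))
            err += r * r
    return err + lam * sum(abs(a) for row in A for a in row)

def project_columns_l2(D):
    """Project every column d^(j) of D onto the unit l2 ball (our choice for C)."""
    p, k = len(D), len(D[0])
    out = [row[:] for row in D]
    for l in range(k):
        norm = math.sqrt(sum(D[i][l] ** 2 for i in range(p)))
        if norm > 1.0:                   # columns inside the ball are left unchanged
            for i in range(p):
                out[i][l] = D[i][l] / norm
    return out

X = [[1.0, 0.0], [0.0, 2.0]]
D = project_columns_l2([[3.0, 0.0], [4.0, 0.5]])  # first column (norm 5) is rescaled
A = [[1.0, 0.0], [0.0, 2.0]]
print(D)                              # [[0.6, 0.0], [0.8, 0.5]]
print(objective(X, D, A, lam=0.1))    # about 2.1 = 1.8 (fit) + 0.1 * 3 (penalty)
```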


Online matrix factorization [Mairal et al., 2010]

Scaling in n:

Stream $(x_t)$ and update $(D_t)$ at each iteration t

Single iteration cost in O(p)

Convergence in a few epochs → large speed-up

[Figure: stream a column of X, compute its code, update D]

Use case:

Large n, regular p, e.g., image patches:

$p = 256$, $n \approx 10^6$: about 1 GB

Low-rank factorization / sparse coding


Scaling-up for massive matrices

Out-of-the-box online algorithm?

[Figure: stream a column, compute its code, update the dictionary]

Limited time budget?

Need to accommodate large p

235 h run time: 1 full epoch

10 h run time: 1/24 epoch


Scaling-up in both directions


[Figure: data access, code computation and dictionary update at iteration t for alternate minimization, online matrix factorization (stream columns) and the proposed algorithm (stream columns and subsample rows); legend: seen at t, seen at t+1, unseen at t, updated at t]


Online dictionary learning: details

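We learn the left factor $D^\star$, solution of

$\min_{D \in \mathcal{C}} \frac{1}{n} \sum_{i=1}^{n} \Big( \|x^{(i)} - D\alpha^{(i)}(D)\|_2^2 + \lambda\, \Omega(\alpha^{(i)}(D)) \Big)$

$\alpha^{(i)}(D) = \operatorname{argmin}_{\alpha \in \mathbb{R}^k} \|x^{(i)} - D\alpha\|_2^2 + \lambda\, \Omega(\alpha)$

Expected risk minimization problem, with indices $(i_t)_t$ sampled uniformly:

$\min_{D \in \mathcal{C}} \mathbb{E}[f_t(D)], \quad f_t(D) \triangleq \|x^{(i_t)} - D\alpha^{(i_t)}(D)\|_2^2 + \lambda\, \Omega(\alpha^{(i_t)}(D))$

How can it be minimized?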
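A sketch of the inner problem α(D) for Ω = ℓ1 (a lasso), solved here with ISTA (proximal gradient), which is our choice of solver and not necessarily the one used in the original work. `lasso_code` and `f_sample` are hypothetical names; the toy dictionary is the identity, for which the solution is known in closed form (soft-thresholding of x at λ/2).

```python
def lasso_code(x, D, lam, n_iter=200):
    """Approximate alpha(D) = argmin_a ||x - D a||_2^2 + lam * ||a||_1 with ISTA.

    x: list of p floats; D: p x k list of rows.
    """
    p, k = len(D), len(D[0])
    # Step 1/L, with L = 2 ||D||_F^2 an upper bound on the gradient's Lipschitz constant.
    L = 2.0 * sum(v * v for row in D for v in row)
    a = [0.0] * k
    for _ in range(n_iter):
        r = [sum(D[i][l] * a[l] for l in range(k)) - x[i] for i in range(p)]  # D a - x
        grad = [2.0 * sum(D[i][l] * r[i] for i in range(p)) for l in range(k)]
        z = [a[l] - grad[l] / L for l in range(k)]                          # gradient step
        t = lam / L
        a = [max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0) for v in z]    # soft-threshold
    return a

def f_sample(x, D, lam):
    """f_t(D): penalized reconstruction loss of one sample at its (approximate) code."""
    a = lasso_code(x, D, lam)
    res = [x[i] - sum(D[i][l] * a[l] for l in range(len(a))) for i in range(len(x))]
    return sum(r * r for r in res) + lam * sum(abs(v) for v in a)

D = [[1.0, 0.0], [0.0, 1.0]]                 # orthonormal toy dictionary
print(lasso_code([3.0, 0.2], D, lam=1.0))    # close to [2.5, 0.0]
print(f_sample([3.0, 0.2], D, lam=1.0))      # close to 0.25 + 0.04 + 2.5 = 2.79
```

Evaluating f_t this way requires solving one such problem per sample, which is why the stochastic algorithm on the next slides avoids recomputing codes for past samples.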


A majorization-minimization algorithm

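Expected risk minimization, writing $x_t \triangleq x^{(i_t)}$:

$\min_{D \in \mathcal{C}} f(D) = \mathbb{E}[f_t(D)], \quad f_t(D) \triangleq \min_{\alpha \in \mathbb{R}^k} \|x_t - D\alpha\|_2^2 + \lambda\, \Omega(\alpha)$

At iteration t, we build a surrogate that majorizes $f_t$ everywhere and is tight at $D_{t-1}$:

$\alpha_t = \operatorname{argmin}_{\alpha \in \mathbb{R}^k} \|x_t - D_{t-1}\alpha\|_2^2 + \lambda\, \Omega(\alpha)$

$g_t(D) = \|x_t - D\alpha_t\|_2^2 + \lambda\, \Omega(\alpha_t) \ge f_t(D), \qquad g_t(D_{t-1}) = f_t(D_{t-1})$

(the inequality holds because $\alpha_t$ is one of the candidates in the minimization that defines $f_t(D)$)

We minimize the aggregated surrogate over $\mathcal{C}$

$\bar g_t(D) \triangleq \frac{1}{t} \sum_{s=1}^{t} g_s(D) \ge \frac{1}{t} \sum_{s=1}^{t} f_s(D) \triangleq \bar f_t(D)$

and obtain $D_t$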
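A numerical check of the majorization property in the simplest case k = 1, Ω = |·|, where the code has a closed form (a sketch with arbitrary toy data; the function names are hypothetical): the surrogate freezes the code computed with D_{t−1}, so it can only overestimate f_t, and it is exact at D_{t−1}.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def code_1d(x, d, lam):
    """Closed-form argmin_a ||x - a d||^2 + lam * |a| for a single atom d (k = 1)."""
    c = dot(d, x)
    return max(abs(c) - lam / 2.0, 0.0) * (1.0 if c > 0 else -1.0) / dot(d, d)

def loss(x, d, a, lam):
    """||x - a d||^2 + lam * |a|, the penalized loss for code a."""
    r = [xi - a * di for xi, di in zip(x, d)]
    return dot(r, r) + lam * abs(a)

def f(x, d, lam):
    """f_t(d): the penalized loss at the optimal code."""
    return loss(x, d, code_1d(x, d, lam), lam)

x, lam = [1.0, 2.0, -0.5], 0.3
d_prev = [1.0, 0.0, 0.0]         # plays the role of D_{t-1}
a_t = code_1d(x, d_prev, lam)    # code computed once, with D_{t-1}

def g(d):
    """Surrogate g_t(d): the same loss, with the code frozen at a_t."""
    return loss(x, d, a_t, lam)

for d in ([1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, -0.7]):
    print(d, round(g(d), 4), round(f(x, d, lam), 4))  # g(d) >= f(d); equal at d_prev
```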


A majorization-minimization algorithm

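The aggregated surrogate is a simple quadratic in D:

$\bar g_t(D) = \mathrm{Tr}(D^\top D C_t) - 2\,\mathrm{Tr}(D^\top B_t) + \text{const.}$ (the constant does not depend on D)

with parameters that can be updated online:

$C_t = \frac{1}{t} \sum_{s=1}^{t} \alpha_s \alpha_s^\top = \big(1 - \tfrac{1}{t}\big) C_{t-1} + \tfrac{1}{t}\, \alpha_t \alpha_t^\top, \qquad B_t = \frac{1}{t} \sum_{s=1}^{t} x_s \alpha_s^\top = \big(1 - \tfrac{1}{t}\big) B_{t-1} + \tfrac{1}{t}\, x_t \alpha_t^\top$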

The surrogate is minimized using block coordinate descent
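A sketch (toy data, plain Python, hypothetical helper names) checking the expansion above: with C_t and B_t maintained as running averages, Tr(DᵀDC_t) − 2 Tr(DᵀB_t) plus the constant (1/t)Σ‖x_s‖² reproduces the average reconstruction error for any D. The penalty terms, also constant in D, are left out here.

```python
def outer(u, v):
    """u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]

def running_mean(M, N, t):
    """(1 - 1/t) M + (1/t) N, entrywise: keeps M equal to the average of the first t N's."""
    return [[(1 - 1 / t) * m + nv / t for m, nv in zip(rm, rn)] for rm, rn in zip(M, N)]

def inner(P, Q):
    """Tr(P^T Q) for two matrices of the same shape."""
    return sum(a * b for rp, rq in zip(P, Q) for a, b in zip(rp, rq))

def matmul(P, Q):
    return [[sum(P[i][l] * Q[l][j] for l in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

# Toy stream of (x_s, alpha_s) pairs with p = 3, k = 2; the codes are arbitrary here.
stream = [([1.0, 2.0, 0.0], [0.5, -1.0]),
          ([0.0, 1.0, 3.0], [1.0, 0.0]),
          ([2.0, -1.0, 1.0], [0.2, 0.7])]
p, k = 3, 2
C = [[0.0] * k for _ in range(k)]
B = [[0.0] * k for _ in range(p)]
for t, (x, a) in enumerate(stream, start=1):
    C = running_mean(C, outer(a, a), t)   # C_t = (1/t) sum_s alpha_s alpha_s^T
    B = running_mean(B, outer(x, a), t)   # B_t = (1/t) sum_s x_s alpha_s^T

D = [[1.0, 0.0], [0.5, 1.0], [0.0, -1.0]]   # any p x k dictionary
direct = sum(sum((x[i] - sum(D[i][l] * a[l] for l in range(k))) ** 2 for i in range(p))
             for x, a in stream) / len(stream)
const = sum(sum(v * v for v in x) for x, _ in stream) / len(stream)  # (1/t) sum ||x_s||^2
# Tr(D^T D C_t) = Tr((D C_t)^T D) because C_t is symmetric.
quadratic = inner(matmul(D, C), D) - 2 * inner(D, B) + const
print(direct, quadratic)   # equal up to rounding
```

Only C_t (k × k) and B_t (p × k) are stored, so past samples never need to be revisited.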


Convergence guarantees (informal)

$(D_t)_t$ converges almost surely to a critical point of the expected risk

$\min_{D \in \mathcal{C}} f(D) = \mathbb{E}[f_t(D)]$

Key lemma: $\bar g_t$ is a tighter and tighter surrogate:

$\bar g_t(D_{t-1}) - \bar f_t(D_{t-1}) \to 0$

so the algorithm is asymptotically a majorization-minimization algorithm.


Algorithm design: summary

Online dictionary learning [Mairal et al., 2010]

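1. Compute code (O(p)):

$\alpha_t = \operatorname{argmin}_{\alpha \in \mathbb{R}^k} \|x_t - D_{t-1}\alpha\|_2^2 + \lambda\, \Omega(\alpha)$

2. Update surrogate (O(p)):

$\bar g_t(D) = \frac{1}{t} \sum_{s=1}^{t} \|x_s - D\alpha_s\|_2^2 + \text{const.} = \mathrm{Tr}(D^\top D C_t) - 2\,\mathrm{Tr}(D^\top B_t) + \text{const.}$

3. Minimize surrogate (O(p)):

$D_t = \operatorname{argmin}_{D \in \mathcal{C}} \bar g_t(D) = \operatorname{argmin}_{D \in \mathcal{C}} \mathrm{Tr}(D^\top D C_t) - 2\,\mathrm{Tr}(D^\top B_t)$

Each iteration accesses the full column $x_t$, so its cost is O(p): it grows linearly with the dimension p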
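A minimal pure-Python sketch of the three steps, under assumptions that are ours: Ω = ℓ1 with codes computed approximately by ISTA, C = unit ℓ2 ball for every column, and a single block coordinate descent pass per iteration. The function names are hypothetical, and this is not the implementation of the modl package.

```python
import math
import random

def ista_code(x, D, lam, n_iter=100):
    """Step 1: alpha_t ~ argmin_a ||x - D a||^2 + lam ||a||_1, by ISTA (Omega = l1)."""
    p, k = len(D), len(D[0])
    L = 2.0 * sum(v * v for row in D for v in row) + 1e-12   # Lipschitz upper bound
    a = [0.0] * k
    for _ in range(n_iter):
        r = [sum(D[i][l] * a[l] for l in range(k)) - x[i] for i in range(p)]  # D a - x
        z = [a[l] - 2.0 * sum(D[i][l] * r[i] for i in range(p)) / L for l in range(k)]
        a = [max(abs(v) - lam / L, 0.0) * (1.0 if v > 0 else -1.0) for v in z]
    return a

def bcd_pass(D, C, B):
    """Step 3: one block coordinate descent pass on Tr(D^T D C) - 2 Tr(D^T B),
    projecting each updated column onto the unit l2 ball."""
    p, k = len(D), len(C)
    for j in range(k):
        if C[j][j] < 1e-12:                  # atom unused so far: keep it as is
            continue
        Dc = [sum(D[i][l] * C[l][j] for l in range(k)) for i in range(p)]
        col = [D[i][j] - (Dc[i] - B[i][j]) / C[j][j] for i in range(p)]
        scale = max(math.sqrt(sum(v * v for v in col)), 1.0)
        for i in range(p):
            D[i][j] = col[i] / scale
    return D

def online_dictionary_learning(X_cols, k, lam, n_steps, seed=0):
    """Stream columns x_t drawn uniformly at random and return the final dictionary."""
    rng = random.Random(seed)
    p = len(X_cols[0])
    D = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(p)]
    for j in range(k):                                   # unit-norm initial atoms
        norm = math.sqrt(sum(D[i][j] ** 2 for i in range(p)))
        for i in range(p):
            D[i][j] /= norm
    C = [[0.0] * k for _ in range(k)]
    B = [[0.0] * k for _ in range(p)]
    for t in range(1, n_steps + 1):
        x = X_cols[rng.randrange(len(X_cols))]          # x_t = x^(i_t)
        a = ista_code(x, D, lam)                         # step 1, with D_{t-1}
        for l in range(k):                               # step 2: running averages
            for m in range(k):
                C[l][m] += (a[l] * a[m] - C[l][m]) / t
            for i in range(p):
                B[i][l] += (x[i] * a[l] - B[i][l]) / t
        D = bcd_pass(D, C, B)                            # step 3
    return D

cols = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [2.0, 0.0, 0.0, 2.0], [0.0, 3.0, 3.0, 0.0]]
D = online_dictionary_learning(cols, k=2, lam=0.1, n_steps=50)
print([[round(v, 3) for v in row] for row in D])
```

Every step reads or writes all p rows of x_t, B_t and D, which is the O(p) dependency that the next slides reduce.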


Stochastic subsampling

How can the single-iteration cost O(p) be reduced?

Sample a masking matrix $M_t$: a diagonal matrix with rescaled Bernoulli coefficients, $\mathbb{E}[\operatorname{rank} M_t] = q$

$x_t \to M_t x_t$, with $\mathbb{E}[M_t x_t] = x_t$

Use only $M_t x_t$ in the algorithm's computations

Noisier updates, but each iteration costs O(q)

[Figure: stream columns, subsample rows]

Subsampled Online Matrix Factorization (SOMF)

Adapt the three parts of the algorithm to obtain O(q) complexity:

1. Code computation

2. Surrogate update

3. Surrogate minimization
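One way to draw M_t as described (a sketch; the independent-Bernoulli scheme follows the slide, the toy vector and sizes are illustrative): each diagonal entry is p/q with probability q/p and 0 otherwise, so on average q rows are kept and M_t x_t is an unbiased estimate of x_t. A Monte Carlo average checks both properties.

```python
import random

def sample_mask(p, q, rng):
    """Diagonal of M_t: each entry is p/q with probability q/p, else 0."""
    return [p / q if rng.random() < q / p else 0.0 for _ in range(p)]

rng = random.Random(0)
x = [1.0, -2.0, 3.0, 0.5, 4.0]
p, q = len(x), 2
n_draws = 100_000
mean = [0.0] * p
kept = 0
for _ in range(n_draws):
    m = sample_mask(p, q, rng)
    kept += sum(1 for v in m if v > 0)       # rank of the diagonal mask
    for i in range(p):
        mean[i] += m[i] * x[i] / n_draws     # Monte Carlo estimate of E[M_t x_t]
print([round(v, 2) for v in mean], kept / n_draws)  # close to x, and close to q
```

Only the rows with a nonzero mask entry need to be read, which is what brings the per-iteration cost from O(p) down to O(q) on average.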


1. Code computation

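Linear regression with random row sampling:

$\alpha_t = \operatorname{argmin}_{\alpha \in \mathbb{R}^k} \|M_t(x_t - D_{t-1}\alpha)\|_2^2 + \lambda\, \Omega(\alpha)$

is an approximate (sketched) solution of

$\alpha_t^\star = \operatorname{argmin}_{\alpha \in \mathbb{R}^k} \|x_t - D_{t-1}\alpha\|_2^2 + \lambda\, \Omega(\alpha)$

The masks $(M_t)_t$ introduce errors in the computed codes $(\alpha_t)_t$

The error can be controlled and reduced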
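A sketch of the sketched code in the simplest setting we could write in closed form, which is an assumption: a single atom (k = 1) and a ridge penalty Ω(α) = α². We weight each kept squared residual by p/q so that the data term is an unbiased estimate of the full one; how the paper rescales this term is not shown on the slide. `ridge_code` is a hypothetical name, and only the q selected entries of x_t and d are read.

```python
import random

def ridge_code(x, d, lam, rows=None, weight=1.0):
    """argmin_a  weight * sum_{i in rows} (x_i - a d_i)^2 + lam * a^2, single atom d.

    rows=None uses all p rows (exact code); passing the rows kept by M_t with
    weight = p/q gives the sketched code, which reads only those entries.
    """
    if rows is None:
        rows = range(len(x))
    num = weight * sum(d[i] * x[i] for i in rows)
    den = weight * sum(d[i] ** 2 for i in rows) + lam
    return num / den

rng = random.Random(1)
p, q, lam = 1000, 100, 1.0
d = [rng.gauss(0.0, 1.0) for _ in range(p)]
x = [2.0 * di + rng.gauss(0.0, 0.5) for di in d]       # x_t is roughly 2 d plus noise
exact = ridge_code(x, d, lam)
rows = [i for i in range(p) if rng.random() < q / p]   # rows kept by M_t (about q)
sketched = ridge_code(x, d, lam, rows, weight=p / q)
print(round(exact, 3), round(sketched, 3), len(rows))  # both close to 2
```

The sketched code fluctuates around the exact one; its error shrinks as q grows, at the price of a larger per-iteration cost.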


3. Surrogate minimization

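Original OMF: block coordinate descent with projection on $\mathcal{C}$, to solve $\min_{D \in \mathcal{C}} \bar g_t(D)$:

$d^{(j)} \leftarrow \mathcal{P}_{\mathcal{C}}\Big(d^{(j)} - \frac{1}{C_{j,j}}\big(D c_t^{(j)} - b_t^{(j)}\big)\Big)$, where $c_t^{(j)}$ and $b_t^{(j)}$ are the j-th columns of $C_t$ and $B_t$, and $\mathcal{P}_{\mathcal{C}}$ projects the column onto its constraint ball

SOMF: freeze the rows not selected by $M_t$ ($P_t^\perp$ keeps only those rows):

$\min_{D \in \mathcal{C},\ P_t^\perp D = P_t^\perp D_{t-1}} \bar g_t(D)$

This reduces to a block coordinate descent in $\mathbb{R}^{q \times k}$!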
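A sketch of the frozen-row update for one column, assuming C = unit ℓ2 ball. With the other rows fixed, the constraint on the selected rows becomes a ball of radius √(1 − ‖frozen part‖²), and since the objective is an isotropic quadratic in those entries, projecting the unconstrained minimizer onto that ball solves the subproblem exactly. The function name and toy data are hypothetical.

```python
import math

def somf_column_update(D, C, B, rows, j):
    """Minimize Tr(D^T D C) - 2 Tr(D^T B) over the entries of column j lying in `rows`
    (a set), all other rows frozen, keeping the whole column in the unit l2 ball."""
    k = len(C)
    # O(p) here for clarity; an efficient version would maintain this norm incrementally.
    frozen_sq = sum(D[i][j] ** 2 for i in range(len(D)) if i not in rows)
    radius = math.sqrt(max(1.0 - frozen_sq, 0.0))    # norm budget left for selected rows
    new = {i: D[i][j] - (sum(D[i][l] * C[l][j] for l in range(k)) - B[i][j]) / C[j][j]
           for i in rows}                             # unconstrained coordinate minimizer
    norm = math.sqrt(sum(v * v for v in new.values()))
    shrink = radius / norm if norm > radius else 1.0  # projection onto that smaller ball
    for i in rows:
        D[i][j] = new[i] * shrink
    return D

D = [[0.6, 0.0], [0.0, 0.6], [0.8, 0.0], [0.0, 0.8]]  # unit-norm columns, p = 4, k = 2
C = [[1.0, 0.2], [0.2, 1.0]]
B = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 0.0]]
rows = {0, 1}                                         # rows selected by M_t, q = 2
before = [row[:] for row in D]
for j in range(2):
    somf_column_update(D, C, B, rows, j)
print(D)                                              # rows 2 and 3 are unchanged
```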


Theoretical analysis

Convergence theorem (informal)

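$f(D_t)$ converges with probability one, and every limit point $D_\infty$ of $(D_t)_t$ is a stationary point of f: for all $D \in \mathcal{C}$,

$\nabla f(D_\infty, D - D_\infty) \ge 0$ (directional derivative of f at $D_\infty$ in the direction $D - D_\infty$)

Proof: control the perturbations that SOMF introduces with respect to the online matrix factorization algorithm, namely the surrogate approximation and the partial minimization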


Results: up to 12x speed-up


Resting-state fMRI

Online dictionary learning

235 h run time: 1 full epoch

10 h run time: 1/24 epoch

Proposed method

10 h run time: 1/2 epoch, reduction r = 12

Qualitatively, usable maps are obtained 10× faster
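The epoch counts follow from simple arithmetic, under the idealized assumption (ours) that a reduction factor r divides the cost of an epoch by r:

```python
epoch_hours_omf = 235.0   # one full pass of online dictionary learning
budget_hours = 10.0       # available time budget
r = 12                    # reduction factor p / q of the proposed method
omf_epochs = budget_hours / epoch_hours_omf
somf_epochs = budget_hours / (epoch_hours_omf / r)  # assumes epoch cost scales as 1/r
print(round(omf_epochs, 3), round(somf_epochs, 2))  # 0.043 0.51
```

That is roughly 1/24 of an epoch for online dictionary learning against roughly half an epoch for the proposed method within the same 10 h.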


Hyperspectral imaging

[Figure: components 1 to 3 learned from hyperspectral image patches, in three runs: 14 h with 841k patches; OMF (r = 1), 177 s with 3k patches; SOMF (r = 24), 179 s with 87k patches]

For a given time budget, SOMF atoms are more focal and less noisy


Collaborative filtering

$M_t x_t$: movie ratings given by user t

Compared with coordinate descent for the MMMF loss (no hyperparameters)

[Figure: test RMSE (0.93 to 0.99) vs. time (100 s to 1000 s) on Netflix (140M ratings), for coordinate descent and for the proposed method with full or partial projection]

| Dataset | Test RMSE (CD) | Test RMSE (MODL) | Speed-up |
|---|---|---|---|
| ML 1M | 0.872 | 0.866 | ×0.75 |
| ML 10M | 0.802 | 0.799 | ×3.7 |
| NF (140M) | 0.938 | 0.934 | ×6.8 |

Outperforms coordinate descent beyond 10M ratings

Same prediction performance (slightly lower test RMSE)

Speed-up of 6.8× on Netflix


Conclusion

New efficient algorithm with many potential use cases.

Subsampling of mini-batches at each iteration.

Python package (github.com/arthurmensch/modl)

Perspectives:

Efficient heuristics and an adaptive subsampling ratio

Can this kind of approach be transposed to the SGD setting?


Publications

[Mairal et al., 2010] Mairal, J., Bach, F., Ponce, J., and Sapiro, G. (2010). Online learning for matrix factorization and sparse coding. The Journal of Machine Learning Research, 11:19–60.

[Mensch et al., 2016] Mensch, A., Mairal, J., Thirion, B., and Varoquaux, G. (2016). Dictionary learning for massive matrix factorization. In 33rd International Conference on Machine Learning (ICML).

[Mensch et al., 2017] Mensch, A., Mairal, J., Thirion, B., and Varoquaux, G. (2017). Stochastic Subsampling for Factorizing Huge Matrices. arXiv:1701.05363 [cs, math, q-bio, stat].
