24
Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

Embed Size (px)

Citation preview

Page 1: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

Ariadna QuattoniXavier Carreras

An Efficient Projection for l1,∞ Regularization

Michael Collins Trevor Darrell

MIT CSAIL

Page 2: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

Goal : Efficient training of jointly sparse models in high dimensional spaces.

Joint Sparsity

Why? : Learn from fewer

examples. Build more efficient

classifiers. Interpretability.

Page 3: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

3

Church Airport Grocery Store

Flower-Shop

2,1w

2,5w

11w 11w

2,4w

2,3w

2,2w

3,1w

3,5w

3,4w

3,3w

3,2w

4,1w

4,5w

4,4w

4,3w

4,2w

1,1w

1,5w

1,4w

1,3w

1,2w

Page 4: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

4

Church Airport Grocery Store

Flower-Shop

2,1w

2,5w

11w 11w

2,4w

2,3w

2,2w

3,5w

3,4w

3,2w

4,1w

4,5w

4,4w

4,3w

4,2w

1,1w

1,5w

1,4w

1,3w

1,2w

Page 5: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

5

Church Airport Grocery Store

Flower-Shop

11w 11w

2,4w

2,3w

2,2w

3,5w

3,4w

3,2w

4,4w

4,2w

1,4w

1,2w

Page 6: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

l1,∞ Regularization How do we promote joint (i.e. row) sparsity ?

Coefficients forfeature 2

Coefficients for task 2

An l1 norm on the maximum absolute values of the coefficients across tasks promotes sparsity.

Use few features

The l∞ norm on each row promotes non-sparsity on each row.

Share parameters

Page 7: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

Contributions An efficient projected gradient method for l1,∞ regularization

Our projection works on O(n log n) time, same cost as l1 projection

Experiments in Multitask image classification problems

We can discover jointly sparse solutions

l1,∞ regularization leads to better performance than l2 and l1 regularization

Page 8: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

Multitask Application

},...,,{ 21 mDDDD

Joint SparseApproximation

1D2D mD

Collection of Tasks

Page 9: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

l1,∞ Regularization: Constrained Convex

Optimization Formulation

We use a Projected SubGradient method. Main advantages: simple, scalable, guaranteed convergence rates.

A convex function

Convex constraints

Projected SubGradient methods have been recently proposed:

l2 regularization, i.e. SVM [Shalev-Shwartz et al. 2007]

l1 regularization [Duchi et al. 2008]

Ariadna
computing the subgradients is trivial.So the question is weather the projection can be computed efficiently.
Page 10: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

10

Euclidean Projection into the l1-∞ ball

Page 11: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

11

Characterization of the solution

Page 12: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

Series1

02468

10

Series1

02468

10

Series1

02468

10

Series1

02468

10

Characterization of the solution

Feature I Feature II Feature III Feature IV

Page 13: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

Mapping to a simpler problem We can map the projection problem to the following problem which finds the optimal maximums μ:

Page 14: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

Efficient Algorithm

14

Se-ries

1

02468

10

Se-ries

1

02468

10

Se-ries

1

02468

10

Se-ries

1

02468

10

Page 15: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

Efficient Algorithm

15

Se-ries

1

02468

10

Se-ries

1

02468

10

Se-ries

1

02468

10

Se-ries

1

02468

10

Page 16: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

16

Complexity

The total cost of the algorithm is dominated by sorting the entries of A.

The total cost is in the order of:

Page 17: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

17

Synthetic Experiments

Generate a jointly sparse parameter matrix W:

For every task we generate pairs:where:

We compared three different types of regularization :

l1,∞ projection l1 projection l2 projection

Page 18: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

18

Synthetic Experiments

Page 19: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

19

Dataset: Image Annotation

40 top content words Raw image representation: Vocabulary Tree(Nister and Stewenius 2006)

11000 dimensions

president actress team

Page 20: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

20

Results

Most of the differences are statistically significant

Page 21: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

21

Results

Page 22: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

22

Dataset: Indoor Scene Recognition

67 indoor scenes. Raw image representation: similarities to a set of unlabeled images. 2000 dimensions.

bakery bar Train station

Page 23: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

Results

Page 24: Ariadna Quattoni Xavier Carreras An Efficient Projection for l 1,∞ Regularization Michael Collins Trevor Darrell MIT CSAIL

Conclusions

We proposed an efficient global optimization algorithm for l1,∞ regularization.

We presented experiments on image classification tasks and shown that our method can recover jointly sparse solutions.

A simple an efficient tool to implement an l1,∞ penalty, similar to standard l1 and l2 penalties.