An Efficient Projection for l1,∞ Regularization
Ariadna Quattoni, Xavier Carreras, Michael Collins, Trevor Darrell
MIT CSAIL
Goal: Efficient training of jointly sparse models in high-dimensional spaces.
Joint Sparsity
Why? Learn from fewer examples. Build more efficient classifiers. Interpretability.
[Figure, repeated over three slides: a grid of parameters w_{k,i} with one row of weights per task (Church, Airport, Grocery Store, Flower-Shop) and five features; from slide to slide, entries for the same features disappear across all tasks, until only a few features shared by all tasks remain.]
l1,∞ Regularization: How do we promote joint (i.e., row) sparsity?
[Diagram: the coefficient matrix W, with a row highlighted as the coefficients for feature 2 and a column highlighted as the coefficients for task 2.]
An l1 norm on the maximum absolute values of the coefficients across tasks promotes sparsity.
Use few features
The l∞ norm on each row promotes non-sparsity within the row.
Share parameters
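Concretely, the penalty combines the two norms just described. Writing W for the matrix with one row per feature and one column per task, the l1,∞ norm is

\[ \|W\|_{1,\infty} = \sum_{i=1}^{d} \max_{1 \le k \le m} |w_{i,k}|, \]

an l1 norm over the per-row l∞ norms: within a row the max is shared across tasks, and the outer sum drives entire rows to zero.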
Contributions
An efficient projected gradient method for l1,∞ regularization.
Our projection runs in O(n log n) time, the same cost as the l1 projection.
Experiments on multitask image classification problems:
We can discover jointly sparse solutions.
l1,∞ regularization leads to better performance than l2 and l1 regularization.
Multitask Application
Collection of Tasks: D = {D_1, D_2, …, D_m}
[Diagram: the datasets D_1, D_2, …, D_m feed a single Joint Sparse Approximation.]
l1,∞ Regularization: Constrained Convex Optimization Formulation
We use a Projected SubGradient method. Main advantages: simple, scalable, guaranteed convergence rates.
The objective is a convex function and the constraints are convex.
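A sketch of the formulation the slide depicts, with ℓ a convex per-example loss and w_k the weight vector of task k (the particular loss is not specified on the slide):

\[ \min_{W} \; \sum_{k=1}^{m} \sum_{(x,y) \in D_k} \ell\big(y,\, w_k^{\top} x\big) \quad \text{subject to} \quad \sum_{i=1}^{d} \max_{k} |w_{i,k}| \le C \]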
Projected SubGradient methods have recently been proposed for:
l2 regularization, i.e. SVM [Shalev-Shwartz et al. 2007]
l1 regularization [Duchi et al. 2008]
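As a minimal sketch of the method, each pass alternates a subgradient step with a projection back onto the constraint set; `subgrad` and `project` below are assumed oracles (the projection itself is the subject of the rest of the talk):

```python
import numpy as np

def projected_subgradient(subgrad, project, d, m, C, T=1000, eta0=1.0):
    """Minimize a convex total loss subject to ||W||_{1,inf} <= C.

    subgrad(W): assumed oracle returning a subgradient of the loss at W.
    project(W, C): any Euclidean projection onto the l1,inf ball, e.g. the
    sketch given after the Efficient Algorithm section below.
    """
    W = np.zeros((d, m))  # one row per feature, one column per task
    for t in range(1, T + 1):
        W = W - (eta0 / np.sqrt(t)) * subgrad(W)  # step with a decaying rate
        W = project(W, C)                         # re-enter the feasible set
    return W
```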
Euclidean Projection onto the l1,∞ ball
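Written out, the projection step solves (A is the matrix to be projected, C the ball radius):

\[ \min_{B} \; \tfrac{1}{2} \|A - B\|_{F}^{2} \quad \text{subject to} \quad \sum_{i} \max_{k} |b_{i,k}| \le C \]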
Characterization of the solution
[Figure: four bar charts, one per feature (Feature I, Feature II, Feature III, Feature IV), showing the magnitudes of that feature's coefficients across tasks; y-axis from 0 to 10.]
Mapping to a simpler problem
We can map the projection problem to the following problem, which finds the optimal row maxima μ:
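A sketch of that reduced problem, reconstructed from the characterization above. Assuming the entries of A are nonnegative (signs can be stripped first and restored afterward), each entry is capped at its row maximum μ_i, so the only squared error is the excess above μ_i, and an active constraint spends the full budget C:

\[ \min_{\mu \ge 0} \; \tfrac{1}{2} \sum_{i} \sum_{k} \max\big(a_{i,k} - \mu_i,\, 0\big)^{2} \quad \text{subject to} \quad \sum_{i} \mu_i = C \]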
Efficient Algorithm
[Figure, two slides: the four per-feature bar charts from the characterization, stepped through as the algorithm caps the row maxima at μ_i.]
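As an illustration only, here is a bisection-based Python sketch that computes the same projection from the KKT conditions of the reduced problem (every row with μ_i > 0 sheds the same excess mass θ). It is not the paper's sorted-threshold routine, which reaches the answer with a single global sort:

```python
import numpy as np

def project_l1inf(A, C):
    """Euclidean projection of A onto the l1,inf ball of radius C.

    Rows index features, columns index tasks. Uses the characterization
    B[i,k] = sign(A[i,k]) * min(|A[i,k]|, mu[i]) plus the KKT condition
    that every row with mu[i] > 0 sheds the same mass theta.
    """
    Aabs = np.abs(A)
    if Aabs.max(axis=1).sum() <= C:
        return A.copy()                  # already inside the ball

    S = -np.sort(-Aabs, axis=1)          # each row sorted in descending order
    P = np.cumsum(S, axis=1)             # per-row prefix sums
    d, m = Aabs.shape

    def mu_of_theta(theta):
        """Per row, solve sum_k max(a_k - mu_i, 0) = theta for mu_i >= 0."""
        mu = np.zeros(d)
        for i in range(d):
            if P[i, -1] <= theta:        # even mu_i = 0 sheds at most theta
                continue
            for r in range(1, m + 1):    # r = number of entries above mu_i
                cand = (P[i, r - 1] - theta) / r
                lo = S[i, r] if r < m else 0.0
                if lo <= cand <= S[i, r - 1]:
                    mu[i] = cand
                    break
        return mu

    # sum_i mu_of_theta(theta) is continuous and nonincreasing: bisect on theta.
    lo, hi = 0.0, P[:, -1].max()
    for _ in range(100):
        theta = 0.5 * (lo + hi)
        if mu_of_theta(theta).sum() > C:
            lo = theta                   # maxima still too large: shed more
        else:
            hi = theta
    mu = mu_of_theta(hi)
    return np.sign(A) * np.minimum(Aabs, mu[:, None])
```

The inner search makes this O(d·m²) per bisection step; it is meant only to make the characterization concrete, not to replace the O(n log n) algorithm.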
Complexity
The total cost of the algorithm is dominated by sorting the entries of A: with n entries in total, the cost is O(n log n).
Synthetic Experiments
Generate a jointly sparse parameter matrix W.
For every task, we generate training pairs (x, y).
We compared three different types of regularization: l1,∞ projection, l1 projection, and l2 projection.
Synthetic Experiments
Dataset: Image Annotation
40 top content words. Raw image representation: Vocabulary Tree (Nister and Stewenius 2006), 11,000 dimensions.
Example annotation words: president, actress, team.
Results
Most of the differences are statistically significant.
Results
Dataset: Indoor Scene Recognition
67 indoor scenes. Raw image representation: similarities to a set of unlabeled images; 2,000 dimensions.
Example scenes: bakery, bar, train station.
Results
Conclusions
We proposed an efficient global optimization algorithm for l1,∞ regularization.
We presented experiments on image classification tasks and showed that our method can recover jointly sparse solutions.
A simple and efficient tool for implementing an l1,∞ penalty, analogous to standard l1 and l2 penalties.