1 / 46

Transfer Learning Algorithms for Image Classification

Ariadna Quattoni, MIT CSAIL

Advisors: Michael Collins, Trevor Darrell

2 / 46

Motivation

Goal:

We want to build classifiers for thousands of visual categories.

We want to exploit rich and complex feature representations.

Problem:

We might only have a few labeled samples per category.

Solution: Transfer Learning. Leverage labeled data from multiple related tasks.

3 / 46

Thesis Contributions

We study efficient transfer algorithms for image classification which can exploit supervised training data from a set of related tasks:

Learn an image representation using supervised data from auxiliary tasks automatically derived from unlabeled images + meta-data.

A transfer learning model based on joint regularization, and an efficient optimization algorithm for training jointly sparse classifiers in high-dimensional feature spaces.

4 / 46

A method for learning image representations from unlabeled images + meta-data

[Diagram: a large dataset of unlabeled images + meta-data → create auxiliary problems → Structure Learning [Ando & Zhang, JMLR 2005] → visual representation $h: \mathcal{F} \rightarrow \mathbb{R}$.]

Structure Learning: $f_k(x) = v_k^\top \theta(x)$, with task-specific parameters $v_k$ and shared parameters $\theta$.
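A tiny Python sketch of this predictor shape, assuming the reconstruction above; the dimensions, the shared map theta, and the task weights v_k are illustrative, not the thesis code:

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, m = 1000, 50, 10              # raw dimension, shared dimension, tasks
theta = rng.normal(size=(h, d))     # shared parameters: the learned representation
V = rng.normal(size=(m, h))         # task-specific parameters v_k

def f(k, x):
    """Structure-learning predictor: f_k(x) = v_k . theta(x)."""
    return V[k] @ (theta @ x)

score = f(3, rng.normal(size=d))    # score of task 3 on a random input
```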

5 / 46

A method for learning image representations from unlabeled images + meta-data

[Figure: average AUC vs. number of training samples (4 to 356) on the Reuters dataset; the Structure Learning representation is compared against the Baseline and PCA models.]

6 / 46

Outline

An overview of transfer learning methods.

A joint sparse approximation model for transfer learning.

Asymmetric transfer experiments.

An efficient training algorithm.

Symmetric transfer image annotation experiments.

7 / 46

Transfer Learning: A brief overview

The goal of transfer learning is to use labeled data from related tasks to make learning easier. Two settings:

Asymmetric transfer. Resource: large amounts of supervised data for a set of related tasks. Goal: improve performance on a target task for which training data is scarce.

Symmetric transfer. Resource: a small amount of training data for a large number of related tasks. Goal: improve average performance over all classifiers.

8 / 46

Transfer Learning: A brief overview

Three main approaches:

Learning intermediate latent representations: [Thrun 1996, Baxter 1997, Caruana 1997, Argyriou 2006, Amit 2007]

Learning priors over parameters: [Raina 2006, Lawrence et al. 2004]

Learning relevant shared features: [Torralba 2004, Obozinski 2006]

9 / 46

Feature Sharing Framework:

Work with a rich representation: complex features in a high-dimensional space. Some of them will (hopefully) be very discriminative; most will be irrelevant.

Related problems may share relevant features.

If we knew the relevant features we could learn from fewer examples and build more efficient classifiers.

We can train classifiers from related problems together using a regularization penalty designed to promote joint sparsity.

Speaker note: Here is our framework for image recognition tasks. Something that works well is to use unlabeled data to generate a large set of features, some of which might be complex. The advantage is that in such a representation we expect to find discriminative features. However, since the features were generated without regard to the classification tasks at hand, many of them will be irrelevant.
10 / 46

[Figure: a coefficient matrix for four classifiers (Church, Airport, Grocery Store, Flower-Shop) over five features; every coefficient $w_{k,i}$ is present.]

11 / 46

[Figure: the same four-classifier coefficient matrix with some coefficients removed.]

12 / 46

[Figure: the same matrix with most coefficients removed; the surviving coefficients concentrate on a few features shared across the four classifiers.]

13 / 46

Related Formulations of Joint Sparse Approximation

Obozinski et al. [2006] proposed an L1-L2 joint penalty and developed a blockwise boosting scheme based on Boosted-Lasso.

Torralba et al. [2004] developed a joint boosting algorithm based on the idea of learning additive models for each class that share weak learners.

14 / 46

Our Contribution

A new model and optimization algorithm for training jointly sparse classifiers:

Previous approaches to joint sparse approximation have relied on greedy coordinate descent methods.

We propose a simple and efficient global optimization algorithm with guaranteed convergence rates.

Our algorithm can scale to large problems involving hundreds of problems and thousands of examples and features.

We test our model on real image classification tasks, where we observe improvements in both asymmetric and symmetric transfer settings.

15 / 46

Outline

An overview of transfer learning methods.

A joint sparse approximation model for transfer learning.

Asymmetric transfer experiments.

An efficient training algorithm.

Symmetric transfer image annotation experiments.

16 / 46

Notation

Collection of tasks: $D = \{D_1, D_2, \ldots, D_m\}$, where each task's training set is $D_k = \{(x_1^k, y_1^k), \ldots, (x_{n_k}^k, y_{n_k}^k)\}$ with $x \in \mathbb{R}^d$ and $y \in \{+1, -1\}$.

Joint sparse approximation: collect the classifier coefficients in a $d \times m$ matrix, one row per feature and one column per task:

$W = \begin{bmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,m} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{d,1} & w_{d,2} & \cdots & w_{d,m} \end{bmatrix}$

17 / 46

Single Task Sparse Approximation

Consider learning a single sparse linear classifier of the form:

$f(x) = w \cdot x$

We want a few features with non-zero coefficients.

Recent work suggests using L1 regularization:

$\hat{w} = \arg\min_w \sum_{(x,y) \in D} l(f(x), y) + Q \sum_{j=1}^{d} |w_j|$

The first term is the classification error; the L1 term penalizes non-sparse solutions.

Donoho [2004] proved (in a regression setting) that the solution with smallest L1 norm is also the sparsest solution.
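As a hedged illustration (not the thesis code), an off-the-shelf L1-regularized linear classifier recovers a sparse w. scikit-learn's L1-penalized LinearSVC uses the squared hinge rather than the plain hinge, and its C plays the role of 1/Q; the data here is synthetic:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
w_true = np.zeros(50)
w_true[:5] = 1.0                     # only 5 of 50 features are relevant
y = np.sign(X @ w_true)

clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=0.1).fit(X, y)
print((np.abs(clf.coef_) > 1e-6).sum(), "non-zero coefficients")
```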

18 / 46

Joint Sparse Approximation

Setting: joint sparse approximation with one linear classifier per task, $f_k(x) = w_k \cdot x$:

$\arg\min_{w_1, \ldots, w_m} \sum_{k=1}^{m} \frac{1}{|D_k|} \sum_{(x,y) \in D_k} l(f_k(x), y) + Q\, R(w_1, \ldots, w_m)$

The first term is the average loss on the collection D; the penalty R penalizes solutions that utilize too many features.

19 / 46

Joint Regularization Penalty

How do we penalize solutions that use too many features?

$R(W) = \#\{\text{non-zero rows of } W\}$

In W, row i holds the coefficients of feature i across all classifiers, and column k holds the coefficients of classifier k.

This penalty would lead to a hard combinatorial problem.

Speaker note: Simultaneous Orthogonal Matching Pursuit, greedy approximations. [Add citations.]
20 / 46

Joint Regularization Penalty

We will use an L1-∞ norm [Tropp 2006]:

$R(W) = \sum_{i=1}^{d} \max_k |W_{i,k}|$

This norm combines:

An L1 norm on the maximum absolute values of the coefficients across tasks, which promotes sparsity: use few features.

An L∞ norm on each row, which promotes non-sparsity within each row: share features.

The combination of the two norms results in a solution where only a few features are used, but the features that are used contribute to solving many classification problems.
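In code the penalty is a one-liner; a small numpy illustration (the example matrix is made up):

```python
import numpy as np

W = np.array([[0.9, -0.8, 0.7],    # a shared feature: large in every task
              [0.0,  0.0, 0.0],    # an unused feature: contributes nothing
              [0.1,  0.0, 0.0]])   # a feature used by a single task
R = np.abs(W).max(axis=1).sum()    # R(W) = sum_i max_k |W_ik|  ->  1.0
```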

21 / 46

Joint Sparse Approximation

Using the L1-∞ norm we can rewrite our objective function as:

$\min_W \sum_{k=1}^{m} \frac{1}{|D_k|} \sum_{(x,y) \in D_k} l(f_k(x), y) + Q \sum_{i=1}^{d} \max_k |W_{i,k}|$

For any convex loss this is a convex objective.

For the hinge loss, $l(f(x), y) = \max(0, 1 - y f(x))$, the optimization problem can be expressed as a linear program.
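A small helper evaluating this objective for a candidate W (an illustrative sketch; the names and data shapes are assumptions):

```python
import numpy as np

def joint_objective(W, X, Y, Q):
    """W: (d, m); X[k]: (n_k, d); Y[k]: (n_k,) with labels in {-1, +1}."""
    hinge = sum(np.maximum(0.0, 1.0 - Y[k] * (X[k] @ W[:, k])).mean()
                for k in range(W.shape[1]))
    return hinge + Q * np.abs(W).max(axis=1).sum()   # average loss + L1-inf penalty
```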

22 / 46

Joint Sparse Approximation

Linear program formulation (hinge loss):

Objective:

$\min_{[W, \varepsilon, t]} \sum_{k=1}^{m} \frac{1}{|D_k|} \sum_{j=1}^{|D_k|} \varepsilon_j^k + Q \sum_{i=1}^{d} t_i$

Max-value constraints, for $k = 1{:}m$ and $i = 1{:}d$:

$-t_i \le w_{i,k} \le t_i$

Slack-variable constraints, for $k = 1{:}m$ and $j = 1{:}|D_k|$:

$y_j^k f_k(x_j^k) \ge 1 - \varepsilon_j^k$ and $\varepsilon_j^k \ge 0$
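For concreteness, here is one way this LP could be assembled with scipy.optimize.linprog on toy data. This is a hedged sketch, not the solver used in the thesis; the variable layout, problem sizes, and the value of Q are assumptions:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, d, n, Q = 3, 5, 8, 0.1                   # tasks, features, examples/task, penalty
X = [rng.normal(size=(n, d)) for _ in range(m)]
Y = [np.sign(rng.normal(size=n)) for _ in range(m)]

# Variable vector: [W (d*m entries, task-major), t (d), eps (m*n)].
nW, nt, ne = d * m, d, m * n
c = np.concatenate([np.zeros(nW), Q * np.ones(nt), np.ones(ne) / n])

rows, b = [], []
for k in range(m):
    for i in range(d):                      # max-value constraints
        r = np.zeros(nW + nt + ne); r[k * d + i] = 1.0;  r[nW + i] = -1.0
        rows.append(r); b.append(0.0)       #  w_ik - t_i <= 0
        r = np.zeros(nW + nt + ne); r[k * d + i] = -1.0; r[nW + i] = -1.0
        rows.append(r); b.append(0.0)       # -w_ik - t_i <= 0
    for j in range(n):                      # slack constraints
        r = np.zeros(nW + nt + ne)
        r[k * d:(k + 1) * d] = -Y[k][j] * X[k][j]
        r[nW + nt + k * n + j] = -1.0
        rows.append(r); b.append(-1.0)      # -y (w_k . x) - eps <= -1

bounds = [(None, None)] * nW + [(0, None)] * nt + [(0, None)] * ne
res = linprog(c, A_ub=np.vstack(rows), b_ub=np.array(b), bounds=bounds)
W = res.x[:nW].reshape(m, d).T              # column k = classifier of task k
print(res.status, np.abs(W).max(axis=1).sum())   # 0 = solved; sum of row maxima
```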

23 / 46

Outline

An overview of transfer learning methods.

A joint sparse approximation model for transfer learning.

Asymmetric transfer experiments.

An efficient training algorithm.

Symmetric transfer image annotation experiments.

24 / 46

Setting: Asymmetric Transfer

Ten news topics: SuperBowl, Danish Cartoons, Sharon, Australian Open, Trapped Miners, Golden Globes, Grammys, Figure Skating, Academy Awards, Iraq.

Learn a representation using labeled data from 9 topics:

Learn the matrix W using our transfer algorithm.

Define the set of relevant features to be $R = \{ r : \max_k |w_{r,k}| > 0 \}$.

Train a classifier for the 10th held-out topic using the relevant features R only.
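A toy illustration of the feature-selection step (the matrix values are made up):

```python
import numpy as np

W = np.array([[0.0, 0.0, 0.0],     # feature 0: unused by every task
              [0.7, -0.2, 0.1],    # feature 1: shared across tasks
              [0.0, 0.0, 0.3]])    # feature 2: used by one task
R = np.flatnonzero(np.abs(W).max(axis=1) > 0)    # relevant features -> [1, 2]

X_new = np.random.default_rng(1).normal(size=(4, 3))
X_reduced = X_new[:, R]            # train the held-out classifier on R only
```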

25 / 46

Results

[Figure: asymmetric transfer results. Average AUC vs. number of training samples (4 to 140) for the baseline representation and the transferred representation.]

26 / 46

Outline

An overview of transfer learning methods.

A joint sparse approximation model for transfer learning.

Asymmetric transfer experiments.

An efficient training algorithm.

Symmetric transfer image annotation experiments.

27 / 46

Limitations of the LP formulation

The LP formulation can be optimized using standard LP solvers.

It is feasible for small problems but becomes intractable for larger data-sets with thousands of examples and dimensions.

We might want a more general optimization algorithm that can handle arbitrary convex losses.

28 / 46

L1-∞ Regularization: Constrained Convex Optimization Formulation

$\arg\min_W \sum_{k=1}^{m} \frac{1}{|D_k|} \sum_{(x,y) \in D_k} l(f_k(x), y) \quad \text{s.t.} \quad \sum_{i=1}^{d} \max_k |W_{i,k}| \le C$

A convex function minimized subject to convex constraints.

We will use a projected subgradient method. Main advantages: simple, scalable, guaranteed convergence rates.

Projected subgradient methods have recently been proposed for:

L2 regularization, i.e. SVM [Shalev-Shwartz et al. 2007]

L1 regularization [Duchi et al. 2008]

Speaker note: Computing the subgradients is trivial, so the question is whether the projection can be computed efficiently.
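A generic projected-subgradient loop, as a sketch under assumptions (hinge loss, a 1/sqrt(t) step schedule, equal-sized tasks). The projection is passed in as a callable; a simple Frobenius-ball projection stands in here, and a sketch of the L1-∞ projection itself follows the Complexity slide below:

```python
import numpy as np

def hinge_subgradient(W, X, Y):
    """Subgradient of the average hinge loss. W: (d, m); X[k]: (n, d); Y[k]: (n,)."""
    G = np.zeros_like(W)
    for k in range(W.shape[1]):
        margin = Y[k] * (X[k] @ W[:, k])
        active = margin < 1.0                        # margin violations
        G[:, k] = -(X[k][active] * Y[k][active, None]).sum(axis=0) / len(Y[k])
    return G

def project_l2(W, C):
    """Stand-in projection onto a Frobenius ball of radius C."""
    norm = np.linalg.norm(W)
    return W if norm <= C else W * (C / norm)

def projected_subgradient(X, Y, d, m, C, steps=200, project=project_l2):
    W = np.zeros((d, m))
    for t in range(1, steps + 1):
        W = project(W - hinge_subgradient(W, X, Y) / np.sqrt(t), C)
    return W
```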
29 / 46

Euclidean Projection into the L1-∞ ball

$\min_B \frac{1}{2} \|A - B\|_2^2 \quad \text{s.t.} \quad \sum_{i=1}^{d} \max_k |B_{i,k}| \le C$

30 / 46

Characterization of the solution

The projection truncates each feature row of A at an optimal maximum: $B_{i,k} = \operatorname{sign}(A_{i,k}) \min(|A_{i,k}|, \mu_i)$.

31 / 46

Characterization of the solution

[Figure: coefficients for Features I-IV; each row is truncated at its optimal maximum $\mu_i$.]

32 / 46

Mapping to a simpler problem

We can map the projection problem to the following problem, which finds the optimal maximums μ:

$\min_{\mu \ge 0} \frac{1}{2} \sum_{i=1}^{d} \sum_{k=1}^{m} \max(|A_{i,k}| - \mu_i, 0)^2 \quad \text{s.t.} \quad \sum_{i=1}^{d} \mu_i = C$

33 / 46

Efficient Algorithm, in pictures

Example: 4 features, 6 problems, C = 14; initially $\sum_{i=1}^{d} \max_k |A_{i,k}| = 29$.

[Figure: the row maxima $\mu_i$ are lowered jointly until their sum drops from 29 to C = 14.]

34 / 46

Complexity

The total cost of the algorithm is dominated by a sort of the entries of A.

The total cost is of the order of $O(dm \log(dm))$.
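A sketch of the projection consistent with the characterization above. This is not the O(dm log(dm)) sort-and-merge routine of the talk; it is a simpler (and slower) variant that bisects on the common slack θ shared by the truncated rows, under the assumption that the reduced problem is the one reconstructed on the "Mapping to a simpler problem" slide:

```python
import numpy as np

def project_l1inf(A, C, tol=1e-8):
    """Euclidean projection of A onto {B : sum_i max_k |B_ik| <= C}."""
    S = np.abs(A)
    if S.max(axis=1).sum() <= C:
        return A.copy()                         # already inside the ball
    Ssort = -np.sort(-S, axis=1)                # rows sorted in decreasing order
    cums = np.cumsum(Ssort, axis=1)

    def mu_of_theta(theta):
        # Per row, solve sum_k max(S_ik - mu_i, 0) = theta (piecewise linear).
        mu = np.zeros(S.shape[0])
        for i in range(S.shape[0]):
            if cums[i, -1] <= theta:
                continue                        # row is driven entirely to zero
            for r in range(1, S.shape[1] + 1):
                cand = (cums[i, r - 1] - theta) / r
                low = Ssort[i, r] if r < S.shape[1] else 0.0
                if low <= cand <= Ssort[i, r - 1]:
                    mu[i] = cand
                    break
        return mu

    lo, hi = 0.0, cums[:, -1].max()             # bracket for the common slack theta
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mu_of_theta(mid).sum() > C:
            lo = mid                            # row maxima still sum past C
        else:
            hi = mid
    mu = mu_of_theta(0.5 * (lo + hi))
    return np.sign(A) * np.minimum(S, mu[:, None])  # truncate each row at mu_i
```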

35 / 46

Outline

An overview of transfer learning methods.

A joint sparse approximation model for transfer learning.

Asymmetric transfer experiments.

An efficient training algorithm.

Symmetric transfer image annotation experiments.

36 / 46

Synthetic Experiments

Generate a jointly sparse parameter matrix W.

For every task we generate pairs $(x_i^k, y_i^k)$, where $y_i^k = \operatorname{sign}(w_k \cdot x_i^k)$.

We compared three different types of regularization (i.e., projections): the L1-∞ projection, the L2 projection, and the L1 projection.
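A sketch of this synthetic setup in numpy (the dimensions follow the next slide; the Gaussian design and the seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, n = 60, 200, 100                        # tasks, features, examples per task
relevant = rng.choice(d, size=d // 10, replace=False)   # 10% relevant feature rows
W = np.zeros((d, m))
W[relevant] = rng.normal(size=(len(relevant), m))       # jointly sparse W

X = rng.normal(size=(m, n, d))
Y = np.sign(np.einsum('knd,dk->kn', X, W))    # y_i^k = sign(w_k . x_i^k)
```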

37 / 46

Synthetic Experiments

Results for 60 problems, 200 features, 10% relevant:

[Figure: left, test error vs. number of training examples per task (10 to 640) for L2, L1, and L1-∞ regularization; right, precision and recall of relevant-feature selection for L1 and L1-∞.]

38 / 46

Dataset: Image Annotation

40 top content words.

Raw image representation: Vocabulary Tree (Grauman and Darrell 2005; Nister and Stewenius 2006), 11,000 dimensions.

Example content words: president, actress, team.

39 / 46

Results

The differences are statistically significant.

40 / 46

Results

41 / 46

Dataset: Indoor Scene Recognition

67 indoor scenes.

Raw image representation: similarities to a set of unlabeled images.

2000 dimensions.

Example scenes: bakery, bar, train station.

42 / 46

Results:

43 / 46

44 / 46

Summary of Thesis Contributions

A method that learns image representations using unlabeled images + meta-data.

A transfer model based on performing a joint loss minimization over the training sets of related tasks with a shared regularization. Previous approaches used greedy coordinate descent methods; we propose an efficient global optimization algorithm for learning jointly sparse models.

We presented experiments on real image classification tasks in both asymmetric and symmetric transfer settings.

A tool that makes implementing an L1−∞ penalty as easy and almost as efficient as implementing the standard L1 and L2 penalties.

45 / 46

Future Work

Task Clustering.

Online Optimization.

Generalization properties of L1−∞ regularized models.

Combining feature representations.

46 / 46

Thanks!