Curriculum Learning for Latent Structural SVM (under submission)
M. Pawan Kumar, Daphne Koller, Benjamin Packer

Page 1

Curriculum Learning for Latent Structural SVM

M. Pawan Kumar

(under submission)

Daphne Koller

Benjamin Packer

Page 2

Aim: To learn accurate parameters for latent structural SVM

Input $x$

Output $y \in Y$

“Deer”

Hidden variable $h \in H$

$Y$ = {“Bison”, “Deer”, “Elephant”, “Giraffe”, “Llama”, “Rhino”}

Page 3

Aim: To learn accurate parameters for latent structural SVM

Feature $\Phi(x, y, h)$ (HOG, BoW)

$(y^*, h^*) = \operatorname{argmax}_{y \in Y, h \in H} w^\top \Phi(x, y, h)$

Parameters $w$
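
The prediction rule on this slide can be illustrated with a brute-force search. This is only a minimal sketch: the symbol Φ for the joint feature vector, the helper names `predict` and `toy_feature`, and the toy data are all assumptions made for illustration, and real detectors would use a far more efficient search over the latent variable.

```python
import numpy as np

def predict(w, x, labels, hiddens, feature):
    """Return the (y, h) pair maximizing w . feature(x, y, h), plus its score."""
    best, best_score = None, -np.inf
    for y in labels:
        for h in hiddens:
            score = float(w @ feature(x, y, h))
            if score > best_score:
                best, best_score = (y, h), score
    return best, best_score

def toy_feature(x, y, h, n_labels=2):
    """Hypothetical feature map: copy window x[h] into the block for label y."""
    d = x.shape[1]
    phi = np.zeros(n_labels * d)
    phi[y * d:(y + 1) * d] = x[h]
    return phi

# Two candidate windows (the latent choices), each a 2-d descriptor.
x = np.array([[1.0, 0.0], [0.0, 2.0]])
# Label 0 responds to dimension 0, label 1 responds to dimension 1.
w = np.array([1.0, 0.0, 0.0, 1.0])

(y_star, h_star), score = predict(w, x, labels=[0, 1], hiddens=[0, 1],
                                  feature=toy_feature)
print(y_star, h_star, score)
```

The output label and the hidden variable are inferred jointly, which is why the search runs over every (y, h) pair rather than over y alone.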

Page 4

Motivation

Real Numbers

Imaginary Numbers

$e^{i\pi} + 1 = 0$

Math is for losers!!

FAILURE … BAD LOCAL MINIMUM

Page 5

Motivation

Real Numbers

Imaginary Numbers

$e^{i\pi} + 1 = 0$

Euler was a genius!!

SUCCESS … GOOD LOCAL MINIMUM

Curriculum Learning: Bengio et al., ICML 2009

Page 6

Motivation

Start with “easy” examples, then consider “hard” ones

Easy vs. Hard

Expensive

Easy for human Easy for machine

Simultaneously estimate easiness and parameters

Easiness is a property of data sets, not single instances

Page 7

Outline

• Latent Structural SVM

• Concave-Convex Procedure

• Curriculum Learning

• Experiments

Page 8

Latent Structural SVM

Training samples xi

Ground-truth label yi

Loss function $\Delta(y_i, y_i(w), h_i(w))$

Felzenszwalb et al, 2008, Yu and Joachims, 2009

Page 9

Latent Structural SVM

$(y_i(w), h_i(w)) = \operatorname{argmax}_{y \in Y, h \in H} w^\top \Phi(x_i, y, h)$

$\min_{w} \|w\|^2 + C \sum_i \Delta(y_i, y_i(w), h_i(w))$

Non-convex Objective

Minimize an upper bound

Page 10

Latent Structural SVM

$\min_{w, \xi} \|w\|^2 + C \sum_i \xi_i$

subject to $\max_{h_i} w^\top \Phi(x_i, y_i, h_i) - w^\top \Phi(x_i, y, h) \ge \Delta(y_i, y, h) - \xi_i$

Still non-convex: a difference of convex functions

CCCP algorithm: converges to a local minimum

$(y_i(w), h_i(w)) = \operatorname{argmax}_{y \in Y, h \in H} w^\top \Phi(x_i, y, h)$
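
The "difference of convex" structure can be made explicit by eliminating the slack variables. The derivation below is a sketch that assumes the notation Φ for the joint feature vector, Δ for the loss, and ξ_i for the slack, and assumes the constraint holds for every y in Y and h in H with Δ(y_i, y_i, h) = 0.

```latex
\begin{align*}
\xi_i &= \max_{y \in Y,\, h \in H} \bigl[ w^\top \Phi(x_i, y, h) + \Delta(y_i, y, h) \bigr]
        - \max_{h_i \in H} w^\top \Phi(x_i, y_i, h_i), \\
\min_{w} \; & \underbrace{\|w\|^2 + C \sum_i \max_{y, h}
        \bigl[ w^\top \Phi(x_i, y, h) + \Delta(y_i, y, h) \bigr]}_{\text{convex in } w}
        \; - \; \underbrace{C \sum_i \max_{h_i} w^\top \Phi(x_i, y_i, h_i)}_{\text{convex in } w}.
\end{align*}
```

Each maximum is taken over functions that are linear in w, so both bracketed terms are convex, and the objective is a convex function minus a convex function. Fixing the maximizing h_i in the second term replaces it by a linear function of w, which is what makes each CCCP step a convex problem.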

Page 11

Outline

• Latent Structural SVM

• Concave-Convex Procedure

• Curriculum Learning

• Experiments

Page 12

Concave-Convex Procedure

Start with an initial estimate $w_0$

Update $h_i = \operatorname{argmax}_{h \in H} w_t^\top \Phi(x_i, y_i, h)$

Update $w_{t+1}$ by solving a convex problem

$\min_{w, \xi} \|w\|^2 + C \sum_i \xi_i$

subject to $w^\top \Phi(x_i, y_i, h_i) - w^\top \Phi(x_i, y, h) \ge \Delta(y_i, y, h) - \xi_i$
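
The two alternating updates can be sketched in a few lines of Python. This is a hedged illustration, not the authors' implementation: the feature map `phi`, the toy data, and every function name are hypothetical, the loss is assumed to be 0/1 on the label, and the convex step is solved with plain subgradient descent rather than a dedicated structural SVM solver.

```python
import numpy as np

def phi(x, y, h, n_labels):
    """Hypothetical joint feature map: copy window x[h] into the block for label y."""
    d = x.shape[1]
    out = np.zeros(n_labels * d)
    out[y * d:(y + 1) * d] = x[h]
    return out

def best_latent(w, x, y, n_labels):
    """h = argmax_h w . phi(x, y, h), with the label y held fixed."""
    return max(range(len(x)), key=lambda h: w @ phi(x, y, h, n_labels))

def loss_augmented(w, x, y_true, n_labels):
    """Maximize w . phi(x, y, h) + Delta over (y, h), using the 0/1 loss on y."""
    pairs = [(y, h) for y in range(n_labels) for h in range(len(x))]
    vals = [w @ phi(x, y, h, n_labels) + float(y != y_true) for y, h in pairs]
    k = int(np.argmax(vals))
    return pairs[k], vals[k]

def upper_bound(w, X, Y, H, C, n_labels):
    """||w||^2 + C * sum_i xi_i with the latent variables fixed to H."""
    slack = sum(loss_augmented(w, x, y, n_labels)[1] - w @ phi(x, y, h, n_labels)
                for x, y, h in zip(X, Y, H))
    return float(w @ w + C * slack)

def solve_convex(w, X, Y, H, C, n_labels, steps=100, lr=0.05):
    """Subgradient descent on the convex problem; returns the best iterate seen."""
    best_w, best_val = w.copy(), upper_bound(w, X, Y, H, C, n_labels)
    for t in range(steps):
        g = 2.0 * w
        for x, y, h in zip(X, Y, H):
            (yb, hb), _ = loss_augmented(w, x, y, n_labels)
            g += C * (phi(x, yb, hb, n_labels) - phi(x, y, h, n_labels))
        w = w - lr / np.sqrt(t + 1.0) * g
        val = upper_bound(w, X, Y, H, C, n_labels)
        if val < best_val:
            best_w, best_val = w.copy(), val
    return best_w

def cccp(X, Y, n_labels, C=1.0, iters=5):
    w = np.zeros(n_labels * X[0].shape[1])
    history = []
    for _ in range(iters):
        # Step 1: impute the latent variables with the current parameters.
        H = [best_latent(w, x, y, n_labels) for x, y in zip(X, Y)]
        history.append(upper_bound(w, X, Y, H, C, n_labels))
        # Step 2: update the parameters with the latent variables fixed.
        w = solve_convex(w, X, Y, H, C, n_labels)
    return w, history

# Toy data: each sample has two candidate windows; Y holds the labels.
X = [np.array([[1.0, 0.0], [0.2, 0.1]]), np.array([[0.1, 0.2], [0.9, 0.1]]),
     np.array([[0.0, 1.0], [0.1, 0.1]]), np.array([[0.2, 0.1], [0.1, 1.1]])]
Y = [0, 0, 1, 1]
w, history = cccp(X, Y, n_labels=2)
print(history)
```

Because the convex step returns an iterate that is no worse than its starting point, the recorded objective values never increase from one outer iteration to the next, which mirrors the descent property of CCCP.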

Page 13

Concave-Convex Procedure

Looks at all samples simultaneously

“Hard” samples will cause confusion

Start with “easy” samples, then consider “hard” ones

Page 14

Outline

• Latent Structural SVM

• Concave-Convex Procedure

• Curriculum Learning

• Experiments

Page 15

Curriculum Learning

REMINDER

Simultaneously estimate easiness and parameters

Easiness is a property of data sets, not single instances

Page 16

Curriculum Learning

Start with an initial estimate $w_0$

Update $h_i = \operatorname{argmax}_{h \in H} w_t^\top \Phi(x_i, y_i, h)$

Update $w_{t+1}$ by solving a convex problem

$\min_{w, \xi} \|w\|^2 + C \sum_i \xi_i$

subject to $w^\top \Phi(x_i, y_i, h_i) - w^\top \Phi(x_i, y, h) \ge \Delta(y_i, y, h) - \xi_i$

Page 17

Curriculum Learning

$\min_{w, \xi} \|w\|^2 + C \sum_i \xi_i$

subject to $w^\top \Phi(x_i, y_i, h_i) - w^\top \Phi(x_i, y, h) \ge \Delta(y_i, y, h) - \xi_i$

Page 18

Curriculum Learning

$\min_{w, \xi, v} \|w\|^2 + C \sum_i v_i \xi_i$

subject to $w^\top \Phi(x_i, y_i, h_i) - w^\top \Phi(x_i, y, h) \ge \Delta(y_i, y, h) - \xi_i$

$v_i \in \{0, 1\}$

Trivial solution: $v_i = 0$ for all $i$

Page 19

Curriculum Learning

$v_i \in \{0, 1\}$

Large K, Medium K, Small K

$\min_{w, \xi, v} \|w\|^2 + C \sum_i v_i \xi_i - \frac{1}{K} \sum_i v_i$

subject to $w^\top \Phi(x_i, y_i, h_i) - w^\top \Phi(x_i, y, h) \ge \Delta(y_i, y, h) - \xi_i$

Page 20

Curriculum Learning

$v_i \in [0, 1]$

$\min_{w, \xi, v} \|w\|^2 + C \sum_i v_i \xi_i - \frac{1}{K} \sum_i v_i$

subject to $w^\top \Phi(x_i, y_i, h_i) - w^\top \Phi(x_i, y, h) \ge \Delta(y_i, y, h) - \xi_i$

Large K, Medium K, Small K

Biconvex problem
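
With the parameters (and hence every slack ξ_i) held fixed, the objective is linear in the selection variables v_i: each one contributes v_i (C ξ_i - 1/K). Minimizing over [0, 1] therefore sets v_i = 1 exactly when C ξ_i < 1/K. The sketch below illustrates that thresholding step; the function name `select_easy` and the slack values are made up for illustration.

```python
import numpy as np

def select_easy(slacks, C, K):
    """Return v with v_i = 1 when C * xi_i < 1 / K, and v_i = 0 otherwise."""
    slacks = np.asarray(slacks, dtype=float)
    return (C * slacks < 1.0 / K).astype(float)

# Hypothetical slack values for four training samples.
slacks = [0.0, 0.3, 1.2, 2.5]

v_large_K = select_easy(slacks, C=1.0, K=10.0)   # threshold 0.1
v_medium_K = select_easy(slacks, C=1.0, K=1.0)   # threshold 1.0
v_small_K = select_easy(slacks, C=1.0, K=0.25)   # threshold 4.0

print(v_large_K, v_medium_K, v_small_K)
```

A large K keeps only the samples with the smallest slack, and a small K admits all of them, which matches the Large K, Medium K, Small K progression on the slide.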

Page 21

Curriculum Learning

Start with an initial estimate $w_0$

Update $h_i = \operatorname{argmax}_{h \in H} w_t^\top \Phi(x_i, y_i, h)$

Update $w_{t+1}$ by solving a convex problem

$\min_{w, \xi, v} \|w\|^2 + C \sum_i v_i \xi_i - \frac{1}{K} \sum_i v_i$

subject to $w^\top \Phi(x_i, y_i, h_i) - w^\top \Phi(x_i, y, h) \ge \Delta(y_i, y, h) - \xi_i$

Decrease $K \leftarrow K / \mu$
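
The full loop, including the annealing of K, might look like the following simplified sketch. Several things here are assumptions rather than details from the slides: the feature map `phi`, the toy data, the random initialization, the annealing factor `mu`, and the choice to perform a single selection step followed by subgradient descent in each outer iteration instead of alternating the two to convergence.

```python
import numpy as np

def phi(x, y, h, n_labels):
    """Hypothetical joint feature map: copy window x[h] into the block for label y."""
    d = x.shape[1]
    out = np.zeros(n_labels * d)
    out[y * d:(y + 1) * d] = x[h]
    return out

def best_latent(w, x, y, n_labels):
    """h = argmax_h w . phi(x, y, h), with the label y held fixed."""
    return max(range(len(x)), key=lambda h: w @ phi(x, y, h, n_labels))

def slack(w, x, y, h, n_labels):
    """Return xi_i for fixed h_i, plus the loss-augmented maximizer (0/1 loss)."""
    pairs = [(yy, hh) for yy in range(n_labels) for hh in range(len(x))]
    vals = [w @ phi(x, yy, hh, n_labels) + float(yy != y) for yy, hh in pairs]
    k = int(np.argmax(vals))
    return vals[k] - w @ phi(x, y, h, n_labels), pairs[k]

def curriculum(X, Y, n_labels, C=1.0, K=1.0, mu=1.3, iters=8,
               steps=50, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n_labels * X[0].shape[1])
    log = []
    for _ in range(iters):
        # Impute the latent variables with the current parameters.
        H = [best_latent(w, x, y, n_labels) for x, y in zip(X, Y)]
        # Select the "easy" samples: v_i = 1 when C * xi_i < 1 / K.
        xi = np.array([slack(w, x, y, h, n_labels)[0] for x, y, h in zip(X, Y, H)])
        v = (C * xi < 1.0 / K).astype(float)
        # Update w on the selected samples only (subgradient descent).
        for t in range(steps):
            g = 2.0 * w
            for x, y, h, vi in zip(X, Y, H, v):
                if vi:
                    _, (yb, hb) = slack(w, x, y, h, n_labels)
                    g += C * (phi(x, yb, hb, n_labels) - phi(x, y, h, n_labels))
            w = w - lr / np.sqrt(t + 1.0) * g
        log.append((K, int(v.sum())))
        # Anneal K so that harder samples are admitted in later iterations.
        K = K / mu
    return w, log

# Toy data: each sample has two candidate windows; Y holds the labels.
X = [np.array([[1.0, 0.0], [0.2, 0.1]]), np.array([[0.1, 0.2], [0.9, 0.1]]),
     np.array([[0.0, 1.0], [0.1, 0.1]]), np.array([[0.2, 0.1], [0.1, 1.1]])]
Y = [0, 0, 1, 1]
w, log = curriculum(X, Y, n_labels=2)
for K_value, n_selected in log:
    print(f"K = {K_value:.3f}, selected = {n_selected}")
```

The log records how many samples were selected at each value of K, which makes it easy to inspect how quickly the curriculum grows on a given data set.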

Page 22

Outline

• Latent Structural SVM

• Concave-Convex Procedure

• Curriculum Learning

• Experiments

Page 23

Object Detection

Feature $\Phi(x, y, h)$ - HOG

Input $x$ - Image

Output $y \in Y$

Latent $h$ - Box

$\Delta$ - 0/1 Loss

$Y$ = {“Bison”, “Deer”, “Elephant”, “Giraffe”, “Llama”, “Rhino”}

Page 24

Object Detection

271 images, 6 classes

90/10 train/test split

5 folds

Mammals Dataset

Page 25

Object Detection

[Figure panels: CCCP, Curriculum]

Page 26

Object Detection

[Figure panels: CCCP, Curriculum]

Page 27

Object Detection

[Figure panels: CCCP, Curriculum]

Page 28

Object Detection

[Figure panels: CCCP, Curriculum]

Page 29

Object Detection

[Figure: objective value and test error for Fold 1 to Fold 5; legend: CCCP, Curriculum]

Page 30

Handwritten Digit Recognition

Feature $\Phi(x, y, h)$ - PCA + Projection

Input $x$ - Image

Output $y \in Y$

$Y = \{0, 1, \ldots, 9\}$

Latent $h$ - Rotation

MNIST Dataset

$\Delta$ - 0/1 Loss

Page 31

Handwritten Digit Recognition

[Figure: three panels, each with an axis labeled C; legend: Significant Difference]

Page 32

Handwritten Digit Recognition

[Figure: three panels, each with an axis labeled C; legend: Significant Difference]

Page 33

Handwritten Digit Recognition

[Figure: three panels, each with an axis labeled C; legend: Significant Difference]

Page 34

Handwritten Digit Recognition

[Figure: three panels, each with an axis labeled C; legend: Significant Difference]

Page 35

Motif Finding

Feature $\Phi(x, y, h)$ - Ng and Cardie, ACL 2002

Input $x$ - DNA Sequence

Output $y \in Y$

$Y = \{0, 1\}$

Latent $h$ - Motif Location

$\Delta$ - 0/1 Loss

Page 36

Motif Finding

40,000 sequences

50/50 train/test split

5 folds

UniProbe Dataset

Page 37

Motif Finding

Average Hamming Distance of Inferred Motifs

Page 38

Motif Finding

[Figure: objective value for Fold 1 to Fold 5; legend: CCCP, Curr]

Page 39

Motif Finding

[Figure: test error for Fold 1 to Fold 5; legend: CCCP, Curr]

Page 40

Noun Phrase Coreference

Feature $\Phi(x, y, h)$ - Yu and Joachims, ICML 2009

Input $x$ - Nouns

Output $y$ - Clustering

Latent $h$ - Spanning Forest over Nouns

Page 41

Noun Phrase Coreference

60 documents

50/50 train/test split

1 predefined fold

MUC6 Dataset

Page 42

Noun Phrase Coreference

- Significant Improvement

- Significant Decrement

MITRE Loss

Pairwise Loss

Page 43

Noun Phrase Coreference

MITRE Loss

Pairwise Loss

Page 44

Noun Phrase Coreference

MITRE Loss

Pairwise Loss

Page 45

Summary

• Automatic Curriculum Learning

• Concave-Biconvex Procedure

• Generalization to other latent models

– Expectation-Maximization

– E-step remains the same

– M-step includes indicator variables $v_i$