
Curriculum Learning for Latent Structural SVM

M. Pawan Kumar, Benjamin Packer, Daphne Koller

(under submission)

Aim: To learn accurate parameters for a latent structural SVM

Input x

Output y ∈ Y

“Deer”

Hidden variable h ∈ H

Y = {“Bison”, “Deer”, “Elephant”, “Giraffe”, “Llama”, “Rhino”}

Aim: To learn accurate parameters for a latent structural SVM

Feature Φ(x,y,h) - HOG, BoW

(y*, h*) = argmax_{y ∈ Y, h ∈ H} w^T Φ(x, y, h)

Parameters w
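As an illustration of the prediction rule above, here is a minimal brute-force sketch in Python. The helper joint_feature is a hypothetical stand-in for Φ(x, y, h) (the slides only name HOG/BoW features), and Y and H are assumed small enough to enumerate.

```python
import numpy as np

def predict(x, w, labels, boxes, joint_feature):
    """Brute-force joint inference:
    (y*, h*) = argmax over y in Y, h in H of w^T Phi(x, y, h).

    joint_feature(x, y, h) is a hypothetical stand-in for Phi(x, y, h),
    e.g. HOG/BoW features of box h placed in the block for label y.
    """
    best_score, best_pair = -np.inf, None
    for y in labels:        # Y, e.g. the six mammal classes
        for h in boxes:     # H, e.g. candidate bounding boxes
            score = w @ joint_feature(x, y, h)
            if score > best_score:
                best_score, best_pair = score, (y, h)
    return best_pair
```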

Motivation

Real Numbers

Imaginary Numbers

e^{iπ} + 1 = 0

Math is for losers!!

FAILURE … BAD LOCAL MINIMUM

Motivation

Real Numbers

Imaginary Numbers

e^{iπ} + 1 = 0

Euler was a genius!!

SUCCESS … GOOD LOCAL MINIMUM

Curriculum Learning: Bengio et al., ICML 2009

Motivation

Start with “easy” examples, then consider “hard” ones

Easy vs. Hard: expensive to determine

Easy for a human ≠ easy for the machine

Simultaneously estimate easiness and parameters

Easiness is a property of data sets, not single instances

Outline

• Latent Structural SVM

• Concave-Convex Procedure

• Curriculum Learning

• Experiments

Latent Structural SVM

Training samples xi

Ground-truth label yi

Loss function Δ(yi, yi(w), hi(w))

Felzenszwalb et al., 2008; Yu and Joachims, 2009

Latent Structural SVM

(yi(w), hi(w)) = argmax_{y ∈ Y, h ∈ H} w^T Φ(xi, y, h)

min ||w||² + C ∑i Δ(yi, yi(w), hi(w))

Non-convex Objective

Minimize an upper bound

Latent Structural SVM

min ||w||² + C ∑i ξi

max_{hi ∈ H} w^T Φ(xi, yi, hi) − w^T Φ(xi, y, h) ≥ Δ(yi, y, h) − ξi
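Why the slack upper-bounds the loss (a one-step derivation added for clarity; it is not on the original slide): instantiating the constraint at the prediction (y, h) = (yi(w), hi(w)) gives

```latex
\xi_i \;\ge\; \Delta(y_i, y_i(w), h_i(w))
      \;+\; w^\top \Phi(x_i, y_i(w), h_i(w))
      \;-\; \max_{h_i \in H} w^\top \Phi(x_i, y_i, h_i)
      \;\ge\; \Delta(y_i, y_i(w), h_i(w))
```

where the second inequality holds because (yi(w), hi(w)) maximizes w^T Φ(xi, y, h) over all (y, h), so the last two terms sum to a non-negative quantity.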

Still non-convex, but a difference of convex functions

CCCP Algorithm - converges to a local minimum


Outline

• Latent Structural SVM

• Concave-Convex Procedure

• Curriculum Learning

• Experiments

Concave-Convex Procedure

Start with an initial estimate w0

Update hi = argmax_{h ∈ H} wt^T Φ(xi, yi, h)

Update wt+1 by solving the convex problem

min ||w||² + C ∑i ξi

w^T Φ(xi, yi, hi) − w^T Φ(xi, y, h) ≥ Δ(yi, y, h) − ξi

Repeat until convergence
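A minimal sketch of this alternation in Python. The helpers impute_h and solve_convex_qp are hypothetical stand-ins for the two update steps above; the slides do not specify either solver.

```python
import numpy as np

def cccp(X, Y, w0, impute_h, solve_convex_qp, max_iters=50, tol=1e-4):
    """CCCP for the latent structural SVM (sketch).

    impute_h(x, y, w)        -> argmax_{h in H} w^T Phi(x, y, h)
    solve_convex_qp(X, Y, H) -> argmin ||w||^2 + C sum_i xi_i subject to
                                the margin constraints with h_i held fixed
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iters):
        # Step 1: impute the latent variables under the current model
        H = [impute_h(x, y, w) for x, y in zip(X, Y)]
        # Step 2: update w by solving the now-convex structural SVM
        w_new = solve_convex_qp(X, Y, H)
        if np.linalg.norm(w_new - w) < tol:  # converged to a local minimum
            return w_new
        w = w_new
    return w
```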

Concave-Convex Procedure

Looks at all samples simultaneously

“Hard” samples will cause confusion

Start with “easy” samples, then consider “hard” ones

Outline

• Latent Structural SVM

• Concave-Convex Procedure

• Curriculum Learning

• Experiments

Curriculum Learning

REMINDER

Simultaneously estimate easiness and parameters

Easiness is a property of data sets, not single instances

Curriculum Learning

Start with an initial estimate w0

Update hi = argmax_{h ∈ H} wt^T Φ(xi, yi, h)

Update wt+1 by solving the convex problem

min ||w||² + C ∑i ξi

w^T Φ(xi, yi, hi) − w^T Φ(xi, y, h) ≥ Δ(yi, y, h) − ξi

Curriculum Learning

min ||w||² + C ∑i ξi

w^T Φ(xi, yi, hi) − w^T Φ(xi, y, h) ≥ Δ(yi, y, h) − ξi

Curriculum Learning

min ||w||² + C ∑i vi ξi

w^T Φ(xi, yi, hi) − w^T Φ(xi, y, h) ≥ Δ(yi, y, h) − ξi

vi ∈ {0,1}

Trivial solution: vi = 0 for all i

Curriculum Learning

vi ∈ {0,1}

[Illustration: samples selected for large, medium, and small K]

min ||w||² + C ∑i vi ξi − ∑i vi / K

w^T Φ(xi, yi, hi) − w^T Φ(xi, y, h) ≥ Δ(yi, y, h) − ξi

Curriculum Learning

vi ∈ [0,1]

min ||w||² + C ∑i vi ξi − ∑i vi / K

w^T Φ(xi, yi, hi) − w^T Φ(xi, y, h) ≥ Δ(yi, y, h) − ξi

[Illustration: samples selected for large, medium, and small K]

Biconvex problem
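With v relaxed to [0,1] and w held fixed, the v-step even has a closed form (a short derivation added for clarity): each vi multiplies the linear term C ξi − 1/K, so the optimum sits at an endpoint of [0,1]:

```latex
% For fixed w, the objective decouples over samples:
%   min_{v_i \in [0,1]} \; (C \xi_i - 1/K) \, v_i
v_i^\star =
\begin{cases}
  1 & \text{if } \xi_i < \frac{1}{CK} \quad \text{(``easy'' sample)} \\
  0 & \text{otherwise} \quad \text{(``hard'' sample)}
\end{cases}
```

Large K keeps the threshold 1/(CK) small, so only samples with small slack are selected; as K decreases, harder samples enter, matching the progression in the illustration above.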

Curriculum Learning

Start with an initial estimate w0

Update hi = argmax_{h ∈ H} wt^T Φ(xi, yi, h)

Update wt+1 by solving the convex problem

min ||w||² + C ∑i vi ξi − ∑i vi / K

w^T Φ(xi, yi, hi) − w^T Φ(xi, y, h) ≥ Δ(yi, y, h) − ξi

Decrease K ← K/μ (μ > 1)
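Putting the steps together, a minimal Python sketch of the full loop. It reuses the hypothetical helpers from the CCCP sketch, adds a slacks stand-in for reading off the ξi, and assumes an annealing factor μ = 1.3 (the slide does not state a value).

```python
import numpy as np

def curriculum_learning(X, Y, w0, impute_h, solve_weighted_qp, slacks,
                        C=1.0, K0=1.0, mu=1.3, max_iters=50):
    """Curriculum learning for the latent structural SVM (sketch).

    Hypothetical helpers, none specified on the slides:
      impute_h(x, y, w)             -> argmax_{h in H} w^T Phi(x, y, h)
      solve_weighted_qp(X, Y, H, v) -> w-step with fixed sample weights v
      slacks(X, Y, H, w)            -> slack values xi_i at the current w
    """
    w, K = np.asarray(w0, dtype=float), K0
    for _ in range(max_iters):
        # Impute latent variables, exactly as in plain CCCP
        H = [impute_h(x, y, w) for x, y in zip(X, Y)]
        # v-step (closed form): select sample i iff xi_i < 1/(C*K)
        xi = np.asarray(slacks(X, Y, H, w))
        v = (xi < 1.0 / (C * K)).astype(float)
        # w-step: convex problem over the currently "easy" samples
        w = solve_weighted_qp(X, Y, H, v)
        # Anneal K so that harder samples enter in later iterations
        K /= mu
    return w
```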

Outline

• Latent Structural SVM

• Concave-Convex Procedure

• Curriculum Learning

• Experiments

Object Detection

Feature Φ(x,y,h) - HOG

Input x - Image

Output y ∈ Y

Latent h - Box

Loss Δ - 0/1 Loss

Y = {“Bison”, “Deer”, “Elephant”, “Giraffe”, “Llama”, “Rhino”}

Object Detection

271 images, 6 classes

90/10 train/test split

5 folds

Mammals Dataset

Object Detection

CCCP vs. Curriculum

[Plots: objective value and test error on folds 1–5, CCCP vs. Curriculum]


Handwritten Digit Recognition

Feature Φ(x,y,h) - PCA + Projection

Input x - Image

Output y ∈ Y

Y = {0, 1, … , 9}

Latent h - Rotation

MNIST Dataset

Loss Δ - 0/1 Loss

Handwritten Digit Recognition

[Plots: results for three values of C, CCCP vs. Curriculum; significant differences marked]

Motif Finding

Feature Φ(x,y,h) - Yu and Joachims, ICML 2009

Input x - DNA Sequence

Output y ∈ Y

Y = {0, 1}

Latent h - Motif Location

Loss Δ - 0/1 Loss

Motif Finding

40,000 sequences

50/50 train/test split

5 folds

UniProbe Dataset

Motif Finding

Average Hamming distance of inferred motifs

Motif Finding

[Plot: objective value per fold (1–5), CCCP vs. Curriculum]

Motif Finding

[Plot: test error per fold (1–5), CCCP vs. Curriculum]

Noun Phrase Coreference

Feature Φ(x,y,h) - Ng and Cardie, ACL 2002

Input x - Nouns

Output y - Clustering

Latent h - Spanning Forest over Nouns

Noun Phrase Coreference

60 documents

50/50 train/test split, 1 predefined fold

MUC6 Dataset

Noun Phrase Coreference

[Plots: MITRE loss and pairwise loss, CCCP vs. Curriculum; markers indicate significant improvement / significant decrement]


Summary

• Automatic Curriculum Learning

• Concave-Biconvex Procedure

• Generalization to other latent models
– Expectation-Maximization
– E-step remains the same
– M-step includes indicator variables vi (see the sketch below)
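As a sketch of that generalization (an illustration of the same selection rule; the exact objective is not on the slides), the M-step's expected complete-data log-likelihood gains per-sample indicators vi with the usual 1/K reward for inclusion:

```latex
% Hypothetical self-paced M-step (maximization form):
\max_{\theta,\; v \in \{0,1\}^n} \;
  \sum_i v_i \, \mathbb{E}_{h \sim p(h \mid x_i, y_i, \theta^{t})}
    \bigl[ \log p(x_i, y_i, h \mid \theta) \bigr]
  \;+\; \frac{1}{K} \sum_i v_i
% A sample is included (v_i = 1) exactly when its expected complete-data
% log-likelihood exceeds -1/K; the E-step itself is unchanged.
```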