Learning Structural SVMs with Latent Variables

Chun-Nam Yu
Dept. of Computer Science, Cornell University
October 8-9, IBM SMiLe Workshop

Structured Output Prediction

- Traditional classification and regression
- Structured output prediction

Introduction to Structural SVMs

Structural SVM (margin rescaling) [Tsochantaridis et al. '04]:

\min_{\vec w, \vec\xi}\; \frac{1}{2}\|\vec w\|^2 + C \sum_{i=1}^{n} \xi_i

s.t. for 1 ≤ i ≤ n and all output structures y ∈ Y:

\vec w \cdot \Phi(x_i, y_i) - \vec w \cdot \Phi(x_i, y) \ge \Delta(y_i, y) - \xi_i

[figure: \vec w \cdot \Phi(\text{sentence}, \text{correct parse tree}) \ge \vec w \cdot \Phi(\text{sentence}, \text{wrong parse tree})]

The loss function \Delta controls the penalty of predicting y instead of y_i.

Solving Margin-based Training Problems with the Cutting-Plane Algorithm

- Exponentially many constraints, but solvable in polynomial time (a sketch of the cutting-plane loop follows below)
- Using the cutting-plane algorithm to speed up training of structural SVMs [Joachims, Finley & Yu, MLJ '09]
- Using approximate cutting-plane models to build faster and sparser kernel SVMs [Yu & Joachims, KDD '08], [Joachims & Yu, ECML '09; Best Machine Learning Paper]

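To make the loop concrete, here is a minimal sketch of cutting-plane training, not the actual SVM-struct implementation: the caller supplies the joint feature map phi, the loss, a separation oracle performing loss-augmented inference, and a QP solver over the working set. All of these names are assumptions made for illustration.

```python
import numpy as np

def cutting_plane_train(examples, phi, loss, separation_oracle, solve_qp,
                        C=1.0, eps=1e-3, max_iter=100):
    """Cutting-plane outer loop (schematic).

    examples:          list of (x, y) pairs.
    phi(x, y):         joint feature vector as an np.ndarray.
    separation_oracle: (w, x, y) -> most violated output y_hat, i.e.
                       argmax_y' [ loss(y, y') + w . phi(x, y') ].
    solve_qp:          (cuts, C) -> weight vector solving the QP
                       restricted to the working set of constraints.
    """
    w = np.zeros_like(phi(*examples[0]))
    cuts = []                                   # working set of constraints
    for _ in range(max_iter):
        added = False
        for x, y in examples:
            y_hat = separation_oracle(w, x, y)  # loss-augmented inference
            violation = loss(y, y_hat) - w.dot(phi(x, y) - phi(x, y_hat))
            if violation > eps:                 # violated by more than eps
                cuts.append((phi(x, y) - phi(x, y_hat), loss(y, y_hat)))
                added = True
        if not added:
            break                               # eps-accurate solution found
        w = solve_qp(cuts, C)                   # re-solve QP over the cuts
    return w
```

Because each round adds only the most violated constraints, the working set stays small even though the full constraint set is exponentially large.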

Incomplete Label Information and Latent Variables

- Discriminative motif finding
- Noun phrase coreference

Latent Structural Support Vector Machines

Latent Structural SVM [Yu & Joachims, ICML '09]:

\min_{\vec w, \vec\xi}\; \frac{1}{2}\|\vec w\|^2 + C \sum_{i=1}^{n} \xi_i

s.t. for 1 ≤ i ≤ n and all outputs y ∈ Y:

\max_{h \in H} \vec w \cdot \Phi(x_i, y_i, h) - \max_{h \in H} \vec w \cdot \Phi(x_i, y, h) \ge \Delta(y_i, y, h) - \xi_i

[figure: the best latent completion of the correct output, \max_{h \in H} \vec w \cdot \Phi(x_i, y_i, h), must outscore the best latent completion of every wrong output y]
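At prediction time the model maximizes jointly over outputs and latent variables, f(x) = \operatorname{argmax}_{(y,h) \in Y \times H} \vec w \cdot \Phi(x, y, h). Below is a minimal brute-force sketch of this rule, assuming Y and H are small enough to enumerate; in the applications that follow, a task-specific inference procedure replaces the double loop. All names are illustrative.

```python
import numpy as np

def latent_predict(w, x, outputs, latents, phi):
    """Joint prediction f(x) = argmax over (y, h) of w . phi(x, y, h).

    outputs and latents enumerate the candidate sets Y and H; real
    applications replace this double loop with task-specific inference
    (spanning trees for coreference, position enumeration for motifs,
    sorting for precision@k).
    """
    best_pair, best_score = None, -np.inf
    for y in outputs:
        for h in latents:
            score = w.dot(phi(x, y, h))
            if score > best_score:
                best_pair, best_score = (y, h), score
    return best_pair
```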

Solving the Non-Convex Optimization

Concave-Convex Procedure (CCCP) [Yuille & Rangarajan '03]:

1. Decompose the objective into a convex and a concave part
2. Upper-bound the concave part with a hyperplane
3. Minimize the resulting convex sum; iterate until convergence

Recent work employing the CCCP algorithm: [Collobert et al. '06], [Smola et al. '05], [Chapelle et al. '08]

Solving the Non-Convex Optimization

CCCP step (1): decompose the objective into convex and concave parts:

\underbrace{\left[ \frac{1}{2}\|\vec w\|^2 + C \sum_{i=1}^{n} \max_{(y,h) \in Y \times H} \left[ \vec w \cdot \Phi(x_i, y, h) + \Delta(y_i, y, h) \right] \right]}_{\text{convex}} - \underbrace{\left[ C \sum_{i=1}^{n} \max_{h \in H} \vec w \cdot \Phi(x_i, y_i, h) \right]}_{\text{concave}}

Solving the Non-Convex Optimization

CCCP step (2): upper-bound the concave part with a hyperplane at \vec w_t:

\forall \vec w: \quad -\underbrace{\left[ C \sum_{i=1}^{n} \max_{h \in H} \vec w \cdot \Phi(x_i, y_i, h) \right]}_{\text{concave}} \le -\underbrace{\left[ C \sum_{i=1}^{n} \vec w \cdot \Phi(x_i, y_i, h_i^*) \right]}_{\text{linear}}

where h_i^* = \operatorname{argmax}_{h \in H} \vec w_t \cdot \Phi(x_i, y_i, h).

Solving the Non-Convex Optimization

CCCP step (3): minimize the resulting convex sum to get \vec w_{t+1}:

\vec w_{t+1} = \operatorname{argmin}_{\vec w} \left[ \underbrace{\frac{1}{2}\|\vec w\|^2 + C \sum_{i=1}^{n} \max_{(y,h) \in Y \times H} \left[ \vec w \cdot \Phi(x_i, y, h) + \Delta(y_i, y, h) \right]}_{\text{convex}} - \underbrace{C \sum_{i=1}^{n} \vec w \cdot \Phi(x_i, y_i, h_i^*)}_{\text{linear}} \right]
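Putting the three steps together gives the training loop sketched below. It assumes a user-supplied convex structural-SVM solver (for instance, a cutting-plane routine like the one sketched earlier) and an enumerable set of latent candidates per example; both names are illustrative, not the paper's actual code.

```python
import numpy as np

def cccp_train(examples, phi, latents, solve_convex_ssvm,
               n_rounds=20, tol=1e-4):
    """CCCP outer loop for the latent structural SVM (schematic).

    examples:          list of (x, y) pairs.
    phi(x, y, h):      joint feature vector as an np.ndarray.
    latents(x, y):     iterable of candidate latent values h.
    solve_convex_ssvm: takes [(x, y, h_star), ...] and returns (w, obj),
                       solving the convex upper bound with the latent
                       variables held fixed (e.g. by cutting planes).
    """
    x0, y0 = examples[0]
    w = np.zeros_like(phi(x0, y0, next(iter(latents(x0, y0)))))
    prev_obj = np.inf
    for _ in range(n_rounds):
        # Step (2): impute h_i* = argmax_h  w_t . phi(x_i, y_i, h);
        # this fixes the linear upper bound on the concave part.
        completed = [(x, y, max(latents(x, y),
                                key=lambda h: w.dot(phi(x, y, h))))
                     for x, y in examples]
        # Step (3): minimize the convex sum with the h_i* held fixed.
        w, obj = solve_convex_ssvm(completed)
        if prev_obj - obj < tol:    # the objective decreases monotonically
            break
        prev_obj = obj
    return w
```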

Analogy to Expectation-Maximization

- E-step: equivalent to computing the upper-bounding hyperplane
- M-step: equivalent to minimizing the convex sum
- Point estimate for latent variables; no normalization with a partition function required
- Discriminative probabilistic models with latent variables: [Gunawardana et al. '05], [Wang et al. '06], [Petrov & Klein '07]

Noun Phrase Coreference

- Input x: noun phrases with edge features
- Label y: clusters of noun phrases
- Latent variable h: 'strong' links as trees
- Task: cluster the noun phrases using single-link agglomerative clustering
- Inference: minimum spanning tree

[figure from Cardie & Wagstaff '99]

Noun Phrase Coreference: Results

- Test on MUC-6 data, using the same features as in [Ng & Cardie '02]
- Initialize spanning trees by chronological order
- 10-fold CV results:

  Algorithm                            MITRE loss
  SVMcluster [Finley & Joachims '05]   41.3
  Latent Structural SVM                35.6

Discriminative Motif Finding

- Input x: DNA sequences containing ARS from S. cerevisiae and S. kluyveri
- Label y: whether the sequence replicates in S. cerevisiae
- Latent variable h: position of the motif
- Task: find the predictive motif
- Inference: enumerate all positions h

Discriminative Motif Finding: Results

- Data: 197 yeast DNA sequences from S. cerevisiae and S. kluyveri; ~6000 intergenic sequences for background estimation
- 10-fold CV, 10 random restarts for each parameter setting

  Algorithm                       Error Rate
  Gibbs Sampler (w=11)            37.9%
  Gibbs Sampler (w=17)            35.06%
  Latent Structural SVM (w=11)    11.09%
  Latent Structural SVM (w=17)    12.00%

Conclusions and Future Directions

- A new formulation of the latent-variable structural SVM with an efficient solution algorithm
- A modular algorithm that achieves very good accuracy on two example structured prediction tasks
- Potential extensions to semi-supervised settings
- Also investigating structured output learning settings where unlabeled data in the output domain Y are plentiful

Discriminative Motif Finding - Formulation

Feature vector Φ: position-specific weight matrix plus parameters for a Markov background model:

\Phi(x, y, h) = \underbrace{\sum_{i=1}^{h} \phi_{BG}(x_i)}_{\text{background}} + \underbrace{\sum_{j=1}^{l} \phi^{(j)}_{PSM}(x_{h+j})}_{\text{motif}} + \underbrace{\sum_{i=h+l+1}^{n} \phi_{BG}(x_i)}_{\text{background}}

[figure from Wasserman 2004]

- Loss function Δ: zero-one loss
- Inference: enumeration, as y is binary and h is linear in sequence length (see the sketch below)
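As an illustration of this enumeration, the sketch below scores every candidate motif position h for one sequence. The background scores w_bg and the position-specific matrix w_psm stand in for the learned weights applied to \phi_{BG} and \phi_{PSM}; both names are assumptions made for illustration.

```python
import numpy as np

def best_motif_position(w_bg, w_psm, seq, l):
    """Enumerate all motif start positions h and return the best one.

    seq:   DNA sequence as a string over {A, C, G, T}.
    w_bg:  dict mapping a nucleotide to its background score.
    w_psm: (l x 4) array of position-specific motif scores.
    Returns (h*, score) maximizing the latent-variable score for this x.
    """
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    n = len(seq)
    bg = np.array([w_bg[c] for c in seq])   # background score per position
    total_bg = bg.sum()
    best_h, best_score = None, -np.inf
    for h in range(n - l + 1):              # h is linear in sequence length
        motif = sum(w_psm[j, idx[seq[h + j]]] for j in range(l))
        # Replace the background contribution inside the window by the
        # motif contribution, as in the feature decomposition above.
        score = total_bg - bg[h:h + l].sum() + motif
        if score > best_score:
            best_h, best_score = h, score
    return best_h, best_score
```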

Noun Phrase Coreference - Formulation

Feature vector Φ: sum of tree edge features:

\Phi(x, y, h) = \sum_{(i,j) \in h} x_{ij}

Loss function Δ:

\Delta(y, \hat{y}, \hat{h}) = \underbrace{n(y)}_{\#\text{nodes}} - \underbrace{k(y)}_{\#\text{components}} + \sum_{(i,j) \in \hat{h}} \underbrace{\ell(y, (i,j))}_{+1/-1}

Inference: any maximum spanning tree algorithm (a sketch follows below)
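The latent inference step is a maximum spanning tree/forest computation over the pairwise edge scores \vec w \cdot x_{ij}. A minimal Kruskal-style sketch, assuming the caller has already scored each edge (names are illustrative):

```python
def max_spanning_tree(n_nodes, scored_edges):
    """Kruskal's algorithm for a maximum spanning tree/forest: take edges
    in decreasing score order, skipping any that would close a cycle.

    scored_edges: list of (score, i, j) tuples with score = w . x_ij.
    Returns the chosen edge set h.
    """
    parent = list(range(n_nodes))          # union-find forest

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    tree = []
    for score, i, j in sorted(scored_edges, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:                       # edge joins two components
            parent[ri] = rj
            tree.append((i, j))
    return tree
```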

Optimizing Precision@k

- Input x: a query with an associated collection of documents
- Label y: relevance judgments for each document
- Latent variable h: the top k relevant documents

Example query q: ICML 2009

Optimizing Precision@k - Formulation

Feature vector Φ: sum of features from the top k documents:

\Phi(x, y, h) = \sum_{j=1}^{k} x_{h_j}

Loss function Δ: one minus precision@k:

\Delta(y, \hat{y}, h) = 1 - \frac{1}{k} \sum_{j=1}^{k} [y_{h_j} = 1]

- Δ depends only on the top k documents selected by h
- Inference: sorting (see the sketch below)
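Both inference and the loss are straightforward to compute here; a minimal sketch, assuming documents are given as a feature matrix and y as 0/1 relevance labels (names are illustrative):

```python
import numpy as np

def top_k_inference(w, doc_features, k):
    """Inference by sorting: h = indices of the k highest-scoring docs."""
    scores = doc_features @ w          # one score per document
    return np.argsort(-scores)[:k]

def precision_at_k_loss(y, h):
    """Delta = 1 - precision@k over the k documents selected by h."""
    return 1.0 - np.mean([y[j] == 1 for j in h])
```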

Optimizing Precision@k - Results

- OHSUMED dataset from the LETOR 3.0 benchmark
- Initialize h with a weight vector trained on classification accuracy
- 5-fold CV results: [results figure not recovered from source]