Hidden–Variable Models for Discriminative Reranking Terry Koo and Michael Collins {maestro|mcollins}@csail.mit.edu

Terry Koo and Michael Collins - Massachusetts Institute of ...people.csail.mit.edu/maestro/papers/koo05emnlp-slides.pdf · Representing NLP structures Proper representation is critical


Overview of reranking

The reranking approach
  Use a baseline model to get the N-best candidates
  “Rerank” the candidates using a more complex model

Parse reranking
  Collins (2000): 88.2% ⇒ 89.8%
  Charniak and Johnson (2005): 89.7% ⇒ 91.0%
  Talk by Brooke Cowan in 7B: 83.6% ⇒ 85.1%

Also applied to
  MT (Och and Ney, 2002; Shen et al., 2004)
  NL Generation (Walker et al., 2001)


Representing NLP structures

Proper representation is critical to success

Hand–crafted feature vector representations

Φ([structure]) = {0, 1, 2, 0, 0, 3, 0, 1}

Features defined through kernels

K([structure 1], [structure 2]) = Φ([structure 1]) · Φ([structure 2])

This talk: A new approach using hidden variables


Two facets of lexical items

Different lexical items can have similar meanings, e.g. president and chairman

Clustering: president, chairman ∈ NounCluster4

A single lexical item can have different meanings, e.g. [river] bank vs [financial] bank

Refinement: bank1, bank2 ∈ bank

Model clusterings and refinements as hidden variables that support the reranking task


Highlights of the approach

Conditional log–linear model with hidden variables

Dynamic programming is used for training and decoding

Clustering and refinement done automatically using a discriminative criterion


Overview of talk

Motivation

Design

General form of the model

Training and decoding efficiently

Creating specific instantiations

Results

Discussion

Conclusion


The parse reranking framework

Sentences s_i for 1 ≤ i ≤ n

s_1: Pierre Vinken , 61 years old , will join ...

s_2: Mr. Vinken is chairman of Elsevier N.V. ...

s_3: Big Board Chairman John Phelan said yesterday ...

Each s_i has candidate parses t_{i,j} for 1 ≤ j ≤ n_i

t_{i,1} is the best candidate parse for s_i
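
As a minimal sketch of the reranking step in this framework, the Python below picks, for one sentence, the candidate that a scoring function rates highest. The `Candidate` type, the `rerank` helper, the bracketed strings, and the toy scores are all hypothetical illustrations and do not come from the talk.

```python
from typing import Callable, List, NamedTuple


class Candidate(NamedTuple):
    """One candidate parse t_{i,j}; `tree` is a stand-in for a real parse."""
    tree: str
    baseline_score: float  # assumed score from the N-best baseline parser


def rerank(candidates: List[Candidate],
           score: Callable[[Candidate], float]) -> Candidate:
    """Return the candidate that the more complex model scores highest."""
    return max(candidates, key=score)


# Toy N-best list for one sentence s_i (all values are made up).
nbest = [
    Candidate("(S (NP Mr. Vinken) (VP is chairman ...))", -10.2),
    Candidate("(S (NP Mr.) (VP Vinken is chairman ...))", -9.8),
]


def toy_score(c: Candidate) -> float:
    """Hypothetical reranker: baseline score plus a bonus for one pattern."""
    bonus = 1.0 if "(NP Mr. Vinken)" in c.tree else 0.0
    return c.baseline_score + bonus


best = rerank(nbest, toy_score)
```

Here the baseline prefers the second candidate, while the toy reranker's bonus moves the first candidate to the top.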


The parse reranking framework

t_{i,j} has phrase structure and dependency tree

[Figure: parse tree of "Mr. Vinken is chairman of Elsevier N.V." with part-of-speech tags NNP NNP VB NN IN NNP NNP and nonterminals S, NP, VP, PP]


Adding hidden variables

Hidden–value domains H_w(t_{i,j}) for 1 ≤ w ≤ len(s_i)

[Figure: the parse tree of "Mr. Vinken is chairman of Elsevier N.V." with a hidden-value domain attached to each word: NNP1, NNP2, NNP3 for each NNP; VB1, VB2, VB3 for "is"; NN1, NN2, NN3 for "chairman"; IN1, IN2, IN3 for "of"]


Adding hidden variables

Assignment h ∈ H_1(t_{i,j}) × ... × H_{len(s_i)}(t_{i,j})

[Figure: the same tree with one hidden value selected from each word's domain]
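
To make the notation concrete, the sketch below builds a domain for each word of the example sentence and enumerates assignments as elements of the Cartesian product. The domain size `K = 3` and the naming scheme (tag plus index) are assumptions read off the figure, not a specification from the talk.

```python
from itertools import product
from math import prod

words = ["Mr.", "Vinken", "is", "chairman", "of", "Elsevier", "N.V."]
tags = ["NNP", "NNP", "VB", "NN", "IN", "NNP", "NNP"]

K = 3  # assumed number of hidden values per word, as in the figure

# H_w: the hidden-value domain of word w, here the tag plus an index 1..K.
domains = [[f"{tag}{k}" for k in range(1, K + 1)] for tag in tags]

# The number of assignments is the product of the domain sizes.
num_assignments = prod(len(d) for d in domains)

# Enumerate assignments lazily; each h picks one value per word.
assignments = product(*domains)
first = next(assignments)
```

With seven words and three values each there are already 3^7 = 2187 assignments, and the count multiplies by K for every extra word.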


Marginalized probability model

Φ(t_{i,j}, h) produces a descriptive vector of feature occurrence counts, e.g.

Φ_2(t_{i,j}, h) = Count(chairman has hidden value NN1)

Φ_13(t_{i,j}, h) = Count(NNP2 is a direct object of VB1)

Φ_19(t_{i,j}, h) = Count(NN1 coordinates with NN2)


Marginalized probability model

Log–linear distribution over (t_{i,j}, h) with parameters Θ:

\[ p(t_{i,j}, h \mid s_i, \Theta) = \frac{e^{\Phi(t_{i,j}, h) \cdot \Theta}}{\sum_{j', h'} e^{\Phi(t_{i,j'}, h') \cdot \Theta}} \]

Marginalize over assignments h:

\[ p(t_{i,j} \mid s_i, \Theta) = \sum_{h} p(t_{i,j}, h \mid s_i, \Theta) \]
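
The two definitions can be checked on a toy problem by direct enumeration. The sketch below is a brute-force illustration only; `marginal_probs`, `toy_domains`, `toy_features`, and the feature names are hypothetical, and features are stored as sparse dictionaries for convenience.

```python
import math
from itertools import product


def dot(features, theta):
    """Sparse dot product Φ·Θ; `features` maps feature name -> count."""
    return sum(count * theta.get(name, 0.0) for name, count in features.items())


def marginal_probs(candidates, domains_of, feature_fn, theta):
    """Return [p(t_j | s, Θ)] by summing exp(Φ(t_j, h)·Θ) over every h."""
    unnormalized = []
    for t in candidates:
        total = 0.0
        for h in product(*domains_of(t)):
            total += math.exp(dot(feature_fn(t, h), theta))
        unnormalized.append(total)
    z = sum(unnormalized)  # the denominator: sum over all j' and h'
    return [u / z for u in unnormalized]


# Toy example: two candidates, two words, two hidden values per word.
def toy_domains(t):
    return [[1, 2], [1, 2]]


def toy_features(t, h):
    return {f"{t}:first={h[0]}": 1}


theta = {"A:first=1": math.log(3.0)}
probs = marginal_probs(["A", "B"], toy_domains, toy_features, theta)
```

Candidate A collects weight 3 + 3 + 1 + 1 = 8 and candidate B collects 4, so the marginals are 2/3 and 1/3.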


Optimizing the parameters

Define loss as negative log-likelihood

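\[ L(\Theta) = -\sum_{i=1}^{n} \log p(t_{i,1} \mid s_i, \Theta) \]

Minimize L(Θ) through gradient descent

\[ \frac{\partial L}{\partial \Theta} = -\sum_{i} \sum_{h} p(h \mid t_{i,1}, s_i, \Theta)\, \Phi(t_{i,1}, h) + \sum_{i,j} p(t_{i,j} \mid s_i, \Theta) \sum_{h} p(h \mid t_{i,j}, s_i, \Theta)\, \Phi(t_{i,j}, h) \]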
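
The gradient follows from differentiating the log of a sum of exponentials. The intermediate steps are sketched below under the model definitions given earlier; the shorthand Z_{i,j} is introduced only for this note and does not appear in the slides.

```latex
\begin{align*}
Z_{i,j}(\Theta) &= \sum_{h} e^{\Phi(t_{i,j}, h) \cdot \Theta},
\qquad
-\log p(t_{i,1} \mid s_i, \Theta) = -\log Z_{i,1}(\Theta) + \log \sum_{j} Z_{i,j}(\Theta) \\
\frac{\partial}{\partial \Theta} \log Z_{i,j}(\Theta)
  &= \sum_{h} \frac{e^{\Phi(t_{i,j}, h) \cdot \Theta}}{Z_{i,j}(\Theta)} \, \Phi(t_{i,j}, h)
   = \sum_{h} p(h \mid t_{i,j}, s_i, \Theta) \, \Phi(t_{i,j}, h) \\
\frac{\partial}{\partial \Theta} \log \sum_{j} Z_{i,j}(\Theta)
  &= \sum_{j} \frac{Z_{i,j}(\Theta)}{\sum_{j'} Z_{i,j'}(\Theta)} \, \frac{\partial}{\partial \Theta} \log Z_{i,j}(\Theta)
   = \sum_{j} p(t_{i,j} \mid s_i, \Theta) \sum_{h} p(h \mid t_{i,j}, s_i, \Theta) \, \Phi(t_{i,j}, h)
\end{align*}
```

Summing the first line over i and substituting the two derivatives gives the two terms of the gradient: expected features under the best candidate, subtracted from expected features under all candidates.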


Problems with efficiency

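|H_1(t_{i,j}) × ... × H_{len(s_i)}(t_{i,j})| grows exponentially with sentence length, so training the model by direct enumeration is intractable:

\[ \frac{\partial L}{\partial \Theta} = -\sum_{i} \sum_{h} p(h \mid t_{i,1}, s_i, \Theta)\, \Phi(t_{i,1}, h) + \sum_{i,j} p(t_{i,j} \mid s_i, \Theta) \sum_{h} p(h \mid t_{i,j}, s_i, \Theta)\, \Phi(t_{i,j}, h) \]

Decoding the model by direct enumeration is also intractable:

\[ p(t_{i,j} \mid s_i, \Theta) = \sum_{h} p(t_{i,j}, h \mid s_i, \Theta) \]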


Locality constraint on features

Features have pairwise local scope on hidden variables

Features still have global scope on non-hidden information

Φ can be factored into local feature vectors, allowing dynamic programming


Local feature vectors

Define two kinds of local feature vector φ:

Single-variable φ(t_{i,j}, w, h_w) look at a single hidden variable

Pairwise φ(t_{i,j}, u, v, h_u, h_v) look at two hidden variables in a dependency relationship


Local feature vectors

Φ(t_{i,j}, h) looks at every hidden variable

[Figure: the parse tree of "Mr. Vinken is chairman of Elsevier N.V." with the hidden-value domains of all seven words shown]


Local feature vectors

φ(t_{i,j}, chairman, NN3) only sees NN3

[Figure: the parse tree of "Mr. Vinken is chairman of Elsevier N.V." with only the domain of "chairman" (NN1, NN2, NN3) shown]


Local feature vectors

φ(t_{i,j}, chairman, of, NN3, IN2) sees NN3 and IN2

[Figure: the parse tree of "Mr. Vinken is chairman of Elsevier N.V." with only the domains of "chairman" (NN1, NN2, NN3) and "of" (IN1, IN2, IN3) shown]


Local feature vectors

Rewrite global Φ as a sum over local φ

\[ \Phi(t_{i,j}, h) = \sum_{w \in t_{i,j}} \phi(t_{i,j}, w, h_w) + \sum_{(u,v) \in D(t_{i,j})} \phi(t_{i,j}, u, v, h_u, h_v) \]
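
The factorization can be sketched in a few lines: the global count vector is built by adding up local count vectors, one per word and one per dependency pair. The function names, the feature tuples, and the toy dependency list below are hypothetical and stand in for whatever local feature definitions a real system would use.

```python
from collections import Counter


def global_features(words, deps, h, single_fn, pair_fn):
    """Φ(t, h) as a sum of local φ over words and dependency pairs.

    words: list of words in the tree
    deps: list of (u, v) index pairs, standing in for D(t)
    h: one hidden value per word
    single_fn, pair_fn: hypothetical local feature functions returning Counters
    """
    phi = Counter()
    for w in range(len(words)):
        phi.update(single_fn(words, w, h[w]))      # adds counts
    for u, v in deps:
        phi.update(pair_fn(words, u, v, h[u], h[v]))
    return phi


def single_fn(words, w, hw):
    return Counter({("word", words[w], hw): 1})


def pair_fn(words, u, v, hu, hv):
    return Counter({("dep", hu, hv): 1})


words = ["is", "chairman", "of"]
deps = [(0, 1), (1, 2)]  # assumed dependencies: is -> chairman, chairman -> of
h = ["VB1", "NN3", "IN2"]
phi = global_features(words, deps, h, single_fn, pair_fn)
```

Because each local function sees at most two hidden values, the score Φ·Θ decomposes over the nodes and edges of the dependency tree, which is what makes dynamic programming applicable.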


Applying belief propagation

New restrictions enable dynamic–programming approaches, e.g. belief propagation

BP generalizes the forward–backward algorithm from a chain to a tree

Runtime O(len(s_i) H^2), H = max |H_w(t_{i,j})|

BP efficiently computes

\[ \sum_{h} p(t_{i,j}, h \mid s_i, \Theta) \quad \text{and} \quad \sum_{h} p(h \mid t_{i,j}, s_i, \Theta)\, \Phi(t_{i,j}, h) \]
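
A minimal sketch of the idea, assuming scores that decompose over nodes and edges of a rooted tree: the upward pass below computes the sum over all assignments of exp(score) and is checked against explicit enumeration. It covers only the upward half of sum-product, which is enough for the per-tree normalizer; the expected feature counts also need a downward pass, omitted here. All names and the toy scores are hypothetical.

```python
import math
from itertools import product


def tree_partition(root, children, domains, node_score, edge_score):
    """Sum over all assignments h of exp(total score), by an upward pass.

    children: dict node -> list of child nodes (a rooted dependency tree)
    domains: dict node -> list of hidden values
    node_score(w, hw) and edge_score(u, v, hu, hv) play the role of φ·Θ.
    """
    def inside(u):
        # inside(u)[hu] sums over assignments to u's subtree with u fixed to hu
        child_tables = [(v, inside(v)) for v in children.get(u, [])]
        table = {}
        for hu in domains[u]:
            value = math.exp(node_score(u, hu))
            for v, tv in child_tables:
                # message from child v: sum out hv (H^2 work per edge)
                value *= sum(math.exp(edge_score(u, v, hu, hv)) * tv[hv]
                             for hv in domains[v])
            table[hu] = value
        return table

    return sum(inside(root).values())


def brute_force_partition(nodes, edges, domains, node_score, edge_score):
    """Same quantity by explicit enumeration, exponential in len(nodes)."""
    total = 0.0
    for values in product(*(domains[w] for w in nodes)):
        h = dict(zip(nodes, values))
        s = sum(node_score(w, h[w]) for w in nodes)
        s += sum(edge_score(u, v, h[u], h[v]) for u, v in edges)
        total += math.exp(s)
    return total


# Toy tree: node 0 is the root with children 1 and 2; node 3 hangs off node 1.
children = {0: [1, 2], 1: [3]}
nodes = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (1, 3)]
domains = {w: [0, 1, 2] for w in nodes}


def node_score(w, hw):
    return 0.1 * w * hw - 0.05 * hw


def edge_score(u, v, hu, hv):
    return 0.3 if hu == hv else -0.1


z_bp = tree_partition(0, children, domains, node_score, edge_score)
z_brute = brute_force_partition(nodes, edges, domains, node_score, edge_score)
```

Each edge is visited once and costs one pass over pairs of hidden values, which matches the O(len(s_i) H^2) runtime on the slide.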


Two areas for choice in the model

Definition of the hidden–value domains H_w(t_{i,j})

Definition of the feature vectors φ


Hidden–value domains

Lexical domains allow word refinement

[Figure: each word of "Mr. Vinken is chairman of Elsevier N.V." has its own domain of three refinements, e.g. chairman1, chairman2, chairman3]


Hidden–value domains

Part-of-speech domains allow word clustering

[Figure: each word of "Mr. Vinken is chairman of Elsevier N.V." has a domain of five values derived from its part-of-speech tag: NNP1 to NNP5, VB1 to VB5, NN1 to NN5, IN1 to IN5]


Hidden–value domains

Supersense domains model WordNet ontology (Ciaramita and Johnson, 2003; Miller et al., 1993)

[Figure: for "Mr. Vinken is chairman of Elsevier N.V.", "chairman" has the values noun.person1 to noun.person3; "is" has the values verb.stative1 to verb.stative3, verb.social1 to verb.social3, and verb.possession1 to verb.possession3; the remaining words keep part-of-speech values NNP1 to NNP5 and IN1 to IN5]


Hidden–value domains

Hidden–value domains that didn’t work well

Word clustering without part-of-speech subdivisions

WordNet hyper/hyponym ontology

Domains containing mixed values


Examples of features

The highest nonterminal headed by chairman

[Figure: the parse tree of "Mr. Vinken is chairman of Elsevier N.V." with hidden-value domains; NP(chairman) is marked as the highest nonterminal headed by "chairman"]

(NN3, Word=chairman, Highest Nonterminal=NP) ∈ φ(t_{i,j}, chairman, NN3)


Examples of features

The governing rule

[Figure: the parse tree of "Mr. Vinken is chairman of Elsevier N.V." with hidden-value domains, illustrating the rule that governs the dependency between "is" and "chairman"]

(VB1, NN3, Rule=VP → VB NP) ∈ φ(t_{i,j}, is, chairman, VB1, NN3)


Experimental Setup

N-best lists generated by Collins parser, n_i ≈ 30

Training set: WSJ sections 2–21

Development set: WSJ section 0

Test set: WSJ sections 22–24


Final test models

Two baseline models

The Collins (1999) base parser

The Collins (2000) reranker

Two mixed models

MIX combines clustering, refinement, and WordNet

MIX+ augments MIX with features of Collins reranker


Results on Sections 22–24

                   LR     LP     F1
Collins parser     88.19  88.60  88.39
MIX                89.41  89.87  89.64
Collins reranker   89.46  90.07  89.76
MIX+               89.78  90.29  90.03

All comparisons except MIX vs. Collins reranker are significant at p ≤ 0.01 using the sign test
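
As a quick arithmetic check on the table, the sketch below recomputes F1 from the LR and LP columns, assuming F1 is the usual harmonic mean of labeled recall and labeled precision (the slide does not define it). The `f1` helper and the dictionary layout are illustrative only.

```python
def f1(lr: float, lp: float) -> float:
    """Harmonic mean of labeled recall (LR) and labeled precision (LP)."""
    return 2 * lr * lp / (lr + lp)


# (LR, LP) pairs copied from the table above.
results = {
    "Collins parser": (88.19, 88.60),
    "MIX": (89.41, 89.87),
    "Collins reranker": (89.46, 90.07),
    "MIX+": (89.78, 90.29),
}

f1_scores = {name: round(f1(lr, lp), 2) for name, (lr, lp) in results.items()}
```

Under that assumption the recomputed values agree with the F1 column to two decimal places.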


Previous work

Parsing approaches that use hidden variables

  Riezler et al. (2002)
  Clark and Curran (2004)
  Matsuzaki et al. (2005)

Differences with our approach

  Use of reranking
  Definition of hidden variables
  Use of belief propagation


Using packed representations

Candidates t_{i,j} represented as a packed forest

Compact representation of many parse trees

Packed representation forces local scope

Features would become locally scoped on non-hidden information

Decoding becomes NP-hard, must approximate with Viterbi (cf. Matsuzaki et al., 2005):

\[ \arg\max_{t_{i,j}} p(t_{i,j} \mid s_i, \Theta) \approx \arg\max_{t_{i,j},\, h} p(t_{i,j}, h \mid s_i, \Theta) \]
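
The approximation can change the answer. The toy sketch below uses made-up joint probabilities to show a case where the candidate with the largest marginal differs from the candidate in the single most probable (parse, assignment) pair; the numbers and names are hypothetical.

```python
# Hypothetical joint probabilities p(t, h | s) for two candidates.
joint = {
    "A": {"h1": 0.4},
    "B": {"h1": 0.3, "h2": 0.3},
}

# Exact decoding: marginalize h, then take the argmax over candidates.
marginal = {t: sum(ph.values()) for t, ph in joint.items()}
exact_choice = max(marginal, key=marginal.get)

# Viterbi approximation: argmax over (t, h) pairs jointly, then keep t.
viterbi_choice, _ = max(
    ((t, h) for t, ph in joint.items() for h in ph),
    key=lambda th: joint[th[0]][th[1]],
)
```

Candidate B wins after marginalization because its mass is spread over two assignments, while the Viterbi approximation picks A on the strength of a single assignment.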


Empirical analysis of hidden values

The model makes hidden–value assignments on the basis of the reranking criterion

i.e. maximize \[ \sum_{i} \log p(t_{i,1} \mid s_i, \Theta) \]

The empirical distribution of assignments shows linguistically reasonable trends


Concluding remarks

The hidden–variable model defines a new representation for NLP structures

Conditional log–linear model with hidden variables

BP enables efficient and exact training and decoding

Significant improvement over the Collins (2000) reranker when its features are added (MIX+)


Example

I [will [give/VB2 an example]]

I expected [to [give/VB1 an example]]

I expected [to/TO4 [give an example]]

You expected [me [to/TO1,5 [give an example]]]