A Joint Model For Semantic Role Labeling Aria Haghighi, Kristina Toutanova, Christopher D. Manning Computer Science Department Stanford University

A Joint Model For Semantic Role Labeling

Aria Haghighi, Kristina Toutanova, Christopher D. Manning

Computer Science Department, Stanford University

Most Previous Work: Local Models

• Extract features for each node and the predicate

• Classify nodes independently

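[Figure: parse tree for "The ogre cooked the children": S over NP ("The ogre") and VP; VP over "cooked" and NP ("the children"), with a candidate node marked (n).]

Features for the node "the children":
• Phrase Type: NP
• Path: NP-up-VP-down-V
• Head Word: children
• Predicate: cook
• Passive: false
• Position: after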
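The Path feature above can be illustrated with a minimal sketch. The `Node` class, the helper names, and the `-up-`/`-down-` separators are assumptions made for illustration; the slide only shows the resulting feature value.

```python
class Node:
    """A bare-bones constituency tree node (hypothetical helper)."""

    def __init__(self, label, *children):
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in children:
            child.parent = self


def ancestors(node):
    """Return the chain [node, parent, ..., root]."""
    chain = [node]
    while chain[-1].parent is not None:
        chain.append(chain[-1].parent)
    return chain


def path_feature(node, predicate):
    """Labels from node up to the lowest common ancestor, then down to the predicate."""
    up = ancestors(node)
    down = ancestors(predicate)
    common = next(a for a in up if a in down)  # lowest common ancestor
    up_labels = [n.label for n in up[: up.index(common) + 1]]
    down_labels = [n.label for n in reversed(down[: down.index(common)])]
    return "-up-".join(up_labels) + "".join("-down-" + l for l in down_labels)


# "The ogre cooked the children"
subject = Node("NP")
verb = Node("V")
obj = Node("NP")
tree = Node("S", subject, Node("VP", verb, obj))

print(path_feature(obj, verb))
print(path_feature(subject, verb))
```

A local model would compute features like this for every node and classify each node on its own, which is the independence the later slides question.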

A Drawback of Local Models

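[Figure: parse tree for "The ogre cooked the children a meal" (S over NP "The ogre" and VP; VP over "cooked", NP "the children" and NP "a meal"), with the node labels NP-AGENT, NP-PATIENT, NP-BENEFICIARY and NP-PATIENT.]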

What we’d like to capture

• Core argument frame constraints
  • Hard constraints: no overlapping arguments
  • Soft constraints: AGENT occurs before other core arguments in an active sentence
  • Highly non-local constraints

• Model a predicate’s argument preferences
  • Having no core arguments is bad, and so is having 10
  • Verb-specific rules: obligatory arguments

• We’d like to do this statistically, without hand-coding constraints or conditions

Previous Joint Approaches

• Argument Language Model and Viterbi Decoding (Gildea and Jurafsky, 02)

• Linear Programming over Local Scores (Punyakanok et al., 04 and 05)

• Our approach: Capture joint information between features and labels discriminatively

Joint Discriminative Reranking

• Use a reranking approach (Collins 00)

• Start with a local model with strong independence assumptions

• Find the top N non-overlapping assignments under the local model using a simple dynamic program (Toutanova, 05)

• Use a joint log-linear model to select the best assignment among the top N
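The final reranking step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature names, the weights, and the toy candidates are all made up, and the dynamic program that produces the top N list is assumed to have run already.

```python
import math


def joint_score(weights, features):
    """Dot product of a sparse feature dict with a weight dict."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rerank(candidates, feature_fn, weights):
    """Return (best candidate, log-linear distribution over the N candidates)."""
    scores = [joint_score(weights, feature_fn(c)) for c in candidates]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(candidates)), key=lambda i: probs[i])
    return candidates[best], probs


def toy_features(candidate):
    """Hypothetical joint features: local log-probability plus a repeated-label flag."""
    labels, local_logprob = candidate
    return {
        "local_logprob": local_logprob,
        "repeated_core_label": float(len(set(labels)) < len(labels)),
    }


# Illustrative weights and a two-element top N list (not learned, not from the slides).
toy_weights = {"local_logprob": 1.0, "repeated_core_label": -2.0}
top_n = [
    (("AGENT", "PATIENT", "PATIENT"), -1.0),
    (("AGENT", "BENEFICIARY", "PATIENT"), -1.2),
]

best, probs = rerank(top_n, toy_features, toy_weights)
print(best[0])
```

In this toy case the local model prefers the assignment with two PATIENT labels, and the joint feature over the whole label sequence lets the reranker pick the other candidate.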

Reranking Upper Bounds

[Figure: performance (y-axis, 72 to 100) against top N (x-axis, 0 to 20), with four curves: core args f-measure, core args whole frame acc, all args f-measure, all args whole frame acc.]

• Reranking is not a serious bottleneck

• Core arguments top 20: f-measure 99.2, whole frame acc 97.4

• All arguments top 20: f-measure 98.8, whole frame acc 95.3
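An upper bound of this kind can be computed with an oracle that picks the best candidate in each top N list. The sketch below covers whole-frame accuracy only and uses made-up data; the function name and data layout are assumptions.

```python
def oracle_frame_accuracy(nbest_lists, gold_frames, n):
    """Fraction of predicates whose gold frame appears among the top n candidates."""
    hits = sum(
        1 for candidates, gold in zip(nbest_lists, gold_frames)
        if gold in candidates[:n]
    )
    return hits / len(gold_frames)


# Toy data: three predicates, each with a ranked list of candidate frames.
nbest_lists = [
    [("AGENT", "PATIENT"), ("AGENT", "BENEFICIARY")],
    [("PATIENT",), ("AGENT",)],
    [("AGENT",)],
]
gold_frames = [("AGENT", "BENEFICIARY"), ("AGENT",), ("PATIENT",)]

for n in (1, 2):
    print(n, oracle_frame_accuracy(nbest_lists, gold_frames, n))
```

The value can only grow with n, which is why the curves in the plot rise toward the top 20 figures quoted above.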

Global Reranking Features

[AGENT The company] offered [PATIENT a 20% stake] [BENEFICIARY to the public]

• Core Argument Sequence with predicate and voice
  • [NP-AGENT active:pred NP-PATIENT PP-to-BENEFICIARY]
  • Lexicalized version: active:pred becomes active:offer
  • [AGENT active:pred PATIENT BENEFICIARY]
  • [NP active:pred NP PP-to]

• Frame Feature
  • [NP active:pred NP-PATIENT PP-to]
  • Compare to the less likely [NP active:pred NP-PATIENT NP]
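The core argument sequence templates can be generated mechanically from a labeled candidate. The sketch below is an assumption about how such strings might be built; the function names, the `PRED` marker, and the hyphen joining phrase type and role are illustrative choices, not taken from the slides.

```python
PRED = ("PRED", "PRED")  # hypothetical marker for the predicate's position


def _render(items, pred_token, part):
    """Render a left-to-right sequence, substituting pred_token for the predicate."""
    tokens = [pred_token if item == PRED else part(*item) for item in items]
    return "[" + " ".join(tokens) + "]"


def core_sequence_features(items, voice, lemma):
    """Full, lexicalized, role-only and phrase-type-only versions of the sequence."""
    generic = f"{voice}:pred"
    lexical = f"{voice}:{lemma}"
    return [
        _render(items, generic, lambda phrase, role: f"{phrase}-{role}"),
        _render(items, lexical, lambda phrase, role: f"{phrase}-{role}"),
        _render(items, generic, lambda phrase, role: role),
        _render(items, generic, lambda phrase, role: phrase),
    ]


# "[AGENT The company] offered [PATIENT a 20% stake] [BENEFICIARY to the public]"
items = [("NP", "AGENT"), PRED, ("NP", "PATIENT"), ("PP-to", "BENEFICIARY")]
features = core_sequence_features(items, "active", "offer")
for f in features:
    print(f)
```

The less specific versions act as back-offs: the role-only and phrase-type-only strings are shared across many more training examples than the fully specified one.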

Joint Results and Improvements

• Improvement does not match that obtained with gold parses (Toutanova, 05)

• Argument Identification Bottleneck

                              Flat Model   Joint Model   Error Reduction
Dev Set F-Measure             74.52        76.71         8.6 %
Dev Set Whole Frame Accuracy  51.02 %      54.92 %       7.1 %
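As a worked case for the F-measure row, the sketch below assumes the usual definition of relative error reduction, the gain divided by the baseline's distance from 100. The slides do not state which definition was used, so the formula is an assumption.

```python
def relative_error_reduction(baseline, improved, ceiling=100.0):
    """Percentage of the baseline's remaining error removed by the improved score."""
    return 100.0 * (improved - baseline) / (ceiling - baseline)


# Dev set F-measure: flat model 74.52, joint model 76.71.
f_reduction = relative_error_reduction(74.52, 76.71)
print(f"{f_reduction:.1f} %")
```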

Using Multiple Trees

• Argument identification is sensitive to parser errors

• PP attachment, coordination, etc.

• Path feature becomes very noisy

• Use top K trees (Charniak Parser ’05)

• For the top local assignments and trees, choose the assignment and tree to maximize: [equation not preserved in this text]

• Only a small boost in performance…
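The objective being maximized is not preserved in this text. One plausible form, assumed here purely for illustration, is a weighted sum of the parser's log-probability for a tree and the role-labeling model's log-probability for an assignment given that tree. The weight `alpha`, the tuple layout, and the toy numbers are all hypothetical.

```python
def select_tree_and_assignment(candidates, alpha=1.0):
    """Pick the (tree, assignment) pair with the highest combined score.

    Each candidate is (tree_id, assignment, log_p_tree, log_p_assignment).
    The score alpha * log_p_tree + log_p_assignment is an assumed objective.
    """
    return max(candidates, key=lambda c: alpha * c[2] + c[3])


# Toy candidates from two trees (made-up log-probabilities).
candidates = [
    ("tree-1", ("AGENT", "PATIENT"), -1.0, -3.0),
    ("tree-2", ("AGENT", "BENEFICIARY"), -2.0, -1.0),
]

print(select_tree_and_assignment(candidates, alpha=1.0)[0])
print(select_tree_and_assignment(candidates, alpha=3.0)[0])
```

A larger `alpha` trusts the parser's ranking more; a smaller one lets a well-scoring assignment pull the choice toward a lower-ranked tree.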

Dealing with Dislocations

• Argument dislocation via control, subject raising, etc.

• IsMissingSubject and Path

• For local with overlap: 73.80 to 74.52

• AGENT improvement: 81.02 to 83.08

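[Figure: parse tree for "The trade gap is expected to widen"; the subject NP-i ("The trade gap") is co-indexed with an empty NP-i (-NONE-) in the embedded clause headed by "to widen".]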

Final Results

              F-Measure   Whole Frame
Test WSJ      78.45       56.52 %
Test Brown    67.71       37.06 %
Combined      77.04       44.83 %

• Generalizing to other domains

Thanks!

Why hasn’t it been done?

• Exponential blowup!

• A normal-sized tree in the Wall Street Journal will have about 40 internal nodes to be classified

• About 1 trillion possible assignments (binary ARG/NONE)
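The "1 trillion" figure follows from treating each of the roughly 40 internal nodes as an independent binary ARG/NONE choice. A quick check:

```python
internal_nodes = 40                # approximate count quoted on the slide
assignments = 2 ** internal_nodes  # one binary ARG/NONE decision per node
print(f"{assignments:,}")          # 1,099,511,627,776
```

With a full label set instead of a binary choice the count grows much faster still, which is why the joint model is applied only to a top N list.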


What we’d like to capture…

• Model predicate’s argument preferences
  • Bad: no core arguments, 10 core arguments
  • Verb-specific rules: require A0 or A1 args

• Model dependencies between labels and features of the argument sequence
  • Discourage repeated arguments
  • Model syntactic alternations: [NP-A0, gave, NP-A2, NP-A1] vs. [NP-A0, gave, NP-A1, PP_to-A2]

• Principled parameter estimation

Previous Work: Local Classifiers

• Extract features and classify each node independently

[Figure: parse tree for "Harry Potter gave the Dursleys a lesson in magic last week", with candidate nodes marked (n).]

Features for the node "the Dursleys":
• Phrase Type: NP
• Path: NP-up-VP-down-V
• Head Word: Dursleys

Problems with Local Classifiers…

• Core argument frame is strongly interdependent
  • Hard constraints: no overlapping arguments
  • Soft constraints: A0 occurs before A1, A2, etc.

• Doesn’t capture statistical tendencies in core argument sequences and their syntactic realization

[Figure: parse tree for "Harry Potter gave the Dursleys a lesson in magic last week" with the node labels NP-A0, NP-A0, NP-A1 and NP-TMP; A0 appears twice.]
