PropBanks, 10/30/03

Putting Meaning Into Your Trees

Martha Palmer

Paul Kingsbury, Olga Babko-Malaya, Scott Cotton, Nianwen Xue, Shijong Ryu, Ben Snyder

PropBanks I and II site visit

University of Pennsylvania, October 30, 2003


Proposition Bank: From Sentences to Propositions

Powell met Zhu Rongji
Powell met with Zhu Rongji
Powell and Zhu Rongji met
Powell and Zhu Rongji had a meeting
. . .

Proposition: meet(Powell, Zhu Rongji)

When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane.

meet(Powell, Zhu) discuss([Powell, Zhu], return(X, plane))

debate, consult, join, wrestle, battle

meet(Somebody1, Somebody2)
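The predicate-argument notation on this slide can be made concrete with a small data structure. The following is a minimal sketch in Python; the class name `Proposition` and its fields are illustrative assumptions, not part of PropBank.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical types for illustration only; PropBank defines no such classes.
Arg = Union[str, Tuple[str, ...], "Proposition"]


@dataclass(frozen=True)
class Proposition:
    predicate: str
    args: Tuple[Arg, ...]

    def __str__(self) -> str:
        return f"{self.predicate}({', '.join(_show(a) for a in self.args)})"


def _show(arg: Arg) -> str:
    # A tuple stands for a group of participants, e.g. [Powell, Zhu].
    if isinstance(arg, tuple):
        return "[" + ", ".join(arg) + "]"
    return str(arg)


# The surface variants listed on the slide all map to this one proposition.
meet = Proposition("meet", ("Powell", "Zhu Rongji"))

# Propositions can nest: the thing discussed is itself a proposition.
discuss = Proposition(
    "discuss",
    (("Powell", "Zhu"), Proposition("return", ("X", "plane"))),
)

print(meet)     # meet(Powell, Zhu Rongji)
print(discuss)  # discuss([Powell, Zhu], return(X, plane))
```

Keeping arguments as an ordered tuple mirrors the positional notation on the slide; a real annotation would attach role labels to each argument.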

Capturing semantic roles*

JK broke [ARG1 the LCD Projector].
[ARG1 The windows] were broken by the hurricane.
[ARG1 The vase] broke into pieces when it toppled over.

SUBJ (syntactic subject): JK; The windows; The vase

*See also Framenet, http://www.icsi.berkeley.edu/~framenet/

Outline

Introduction
Proposition Bank
  Starting with Treebanks
  Frames files
  Annotation process and status
PropBank II
Automatic labelling of semantic roles
Chinese Proposition Bank

A TreeBanked Sentence

Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.

(S (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               (NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S (NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    (NP the U.S. car maker)
                                    (NP (NP an eventual (ADJP 30 %) stake)
                                        (PP-LOC in (NP the British company))))))))))))
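The bracketed form of the tree is easy to read into a nested structure. Below is a minimal sketch of a recursive-descent reader in Python; the function names `parse_tree` and `leaves` are illustrative, and the sketch assumes well-formed input with one tree per string.

```python
def parse_tree(text):
    """Parse a Treebank-style bracketing into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # nested constituent
            else:
                children.append(tokens[pos])  # a word or trace (leaf)
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree


def leaves(tree):
    """Return the leaf tokens of a parsed tree, left to right."""
    _, children = tree
    out = []
    for child in children:
        out.extend(leaves(child) if isinstance(child, tuple) else [child])
    return out


BRACKETED = """
(S (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               (NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S (NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    (NP the U.S. car maker)
                                    (NP (NP an eventual (ADJP 30 %) stake)
                                        (PP-LOC in (NP the British company))))))))))))
"""

tree = parse_tree(BRACKETED)
print(tree[0])                 # S
print(" ".join(leaves(tree)))
```

The leaf sequence includes the trace `*T*-1`, which is what lets the relative clause's subject be linked back to its antecedent.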

The same sentence, PropBanked

(S Arg0 (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               Arg1 (NP (NP a GM-Jaguar pact)
                        (SBAR (WHNP-1 that)
                              (S Arg0 (NP-SBJ *T*-1)
                                 (VP would
                                     (VP give
                                         Arg2 (NP the U.S. car maker)
                                         Arg1 (NP (NP an eventual (ADJP 30 %) stake)
                                                  (PP-LOC in (NP the British company))))))))))))

have been expecting
  Arg0: Analysts
  Arg1: a GM-Jaguar pact

that would give
  Arg0: *T*-1
  Arg2: the US car maker
  Arg1: an eventual 30% stake in the British company

expect(Analysts, GM-J pact)
give(GM-J pact, US car maker, 30% stake)
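Going from labeled arguments to the two propositions requires resolving the trace `*T*-1` to its antecedent. The sketch below shows that step in Python under simplifying assumptions: the annotation records and the antecedent table are written by hand, and the names `annotations`, `antecedents` and `to_proposition` are hypothetical.

```python
# Hypothetical, simplified record of the two annotations on the slide.
# Each predicate maps to its labeled arguments in surface order.
annotations = {
    "expect": [("Arg0", "Analysts"), ("Arg1", "a GM-Jaguar pact")],
    "give": [
        ("Arg0", "*T*-1"),
        ("Arg2", "the US car maker"),
        ("Arg1", "an eventual 30% stake in the British company"),
    ],
}

# The trace *T*-1 is co-indexed with WHNP-1 ("that"); its antecedent is the
# noun phrase the relative clause modifies. The link is entered by hand here.
antecedents = {"*T*-1": "a GM-Jaguar pact"}


def to_proposition(predicate, args, antecedents):
    """Render predicate(arg, ...) with traces replaced by their antecedents."""
    resolved = [antecedents.get(text, text) for _, text in args]
    return f"{predicate}({', '.join(resolved)})"


for predicate, args in annotations.items():
    print(to_proposition(predicate, args, antecedents))
# expect(Analysts, a GM-Jaguar pact)
# give(a GM-Jaguar pact, the US car maker, an eventual 30% stake in the British company)
```

Arguments are printed in surface order, matching the slide, where the recipient (Arg2) precedes the thing given (Arg1).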

Frames File Example: expect

Roles:
  Arg0: expecter
  Arg1: thing expected

Example: Transitive, active:
  Portfolio managers expect further declines in interest rates.
  Arg0: Portfolio managers
  REL: expect
  Arg1: further declines in interest rates

Frames File Example: give

Roles:
  Arg0: giver
  Arg1: thing given
  Arg2: entity given to

Example: double object
  The executives gave the chefs a standing ovation.
  Arg0: The executives
  REL: gave
  Arg2: the chefs
  Arg1: a standing ovation
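A frames file pairs each numbered argument of a verb with a verb-specific description. The sketch below holds the two role lists shown for expect and give in a Python dict and uses them to gloss an annotation. It is an illustration only: the real frames file format is not reproduced here, and `FRAMES` and `describe` are hypothetical names.

```python
# Illustrative only: this dict mirrors the role lists shown on the slides,
# not the actual frames file format.
FRAMES = {
    "expect": {"Arg0": "expecter", "Arg1": "thing expected"},
    "give": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "entity given to"},
}


def describe(lemma, annotation):
    """Attach each role's description to the annotated text.

    Raises KeyError for a numbered argument the frame does not define.
    """
    roles = FRAMES[lemma]
    out = []
    for label, text in annotation:
        if label == "REL":
            out.append((label, "predicate", text))
        else:
            out.append((label, roles[label], text))
    return out


example = [
    ("Arg0", "The executives"),
    ("REL", "gave"),
    ("Arg2", "the chefs"),
    ("Arg1", "a standing ovation"),
]

for label, meaning, text in describe("give", example):
    print(f"{label} ({meaning}): {text}")
```

Looking roles up per verb reflects the point of the frames files: Arg2 means "entity given to" for give, while expect defines no Arg2 at all, so an Arg2 label on expect is rejected.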

Trends in Argument Numbering

Arg0 = agent
Arg1 = direct object / theme / patient
Arg2 = indirect object / benefactive / instrument / attribute / end state
Arg3 = start point / benefactive / instrument / attribute
Arg4 = end point

Ergative/Unaccusative Verbs

Roles (no ARG0 for unaccusative verbs)
  Arg1 = Logical subject, patient, thing rising
  Arg2 = EXT, amount risen
  Arg3* = start point
  Arg4 = end point

Sales rose 4% to $3.28 billion from $3.16 billion.

The Nasdaq composite index added 1.01 to 456.6 on paltry volume.

Function tags for English/Chinese (arguments or adjuncts?)

Variety of ArgM’s (Arg#>4):
  TMP - when?
  LOC - where at?
  DIR - where to?
  MNR - how?
  PRP - why?
  TPC - topic
  PRD - this argument refers to or modifies another
  ADV - others
  CND - conditional
  DGR - degree
  FRQ - frequency

Inflection

Verbs also marked for tense/aspect
  Passive/Active
  Perfect/Progressive
  Third singular (is has does was)
  Present/Past/Future
  Infinitives/Participles/Gerunds/Finites

Modals and negation marked as ArgMs

Word Senses in PropBank

Orders to ignore word sense not feasible for 700+ verbs
  Mary left the room
  Mary left her daughter-in-law her pearls in her will

Frameset leave.01 "move away from":
  Arg0: entity leaving
  Arg1: place left

Frameset leave.02 "give":
  Arg0: giver
  Arg1: thing given
  Arg2: beneficiary

How do these relate to traditional word senses as in WordNet?
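The two framesets for leave show why argument labels alone cannot pick the sense. The sketch below is a toy filter in Python that lists the framesets whose role inventory covers a given set of labels. It is not the project's procedure, since frameset tags are assigned by annotators; `FRAMESETS` and `compatible_framesets` are hypothetical names.

```python
# Role inventories copied from the two framesets shown for "leave".
FRAMESETS = {
    "leave.01": {"Arg0": "entity leaving", "Arg1": "place left"},
    "leave.02": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "beneficiary"},
}


def compatible_framesets(lemma, labels):
    """Return the framesets of `lemma` that define every label in `labels`."""
    wanted = set(labels)
    return sorted(
        frameset_id
        for frameset_id, roles in FRAMESETS.items()
        if frameset_id.split(".")[0] == lemma and wanted <= set(roles)
    )


# "Mary left the room": Arg0 and Arg1 fit both framesets, so the labels
# by themselves leave the sense open.
print(compatible_framesets("leave", ["Arg0", "Arg1"]))
# ['leave.01', 'leave.02']

# "Mary left her daughter-in-law her pearls": an Arg2 rules out leave.01.
print(compatible_framesets("leave", ["Arg0", "Arg1", "Arg2"]))
# ['leave.02']
```

The first call returning both framesets is the point: a sentence with only Arg0 and Arg1 still needs an explicit frameset tag.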

Overlap between Groups and Framesets - 95%

[Figure: WordNet senses of "develop" (WN1 to WN14, WN19, WN20) arranged in sense groups under Frameset1 and Frameset2]

Palmer, Dang & Fellbaum, NLE 2004

Annotator accuracy - ITA 84%

[Figure: Annotator Accuracy, primary labels only. x-axis: # of annotations (log scale, 1000 to 1000000); y-axis: accuracy (0.86 to 0.96). Points labeled by annotator: hertlerb, forbesk, solaman2, istreit, wiarmstr, kingsbur, ksledge, nryant, malayao, jaywang, delilkan, ptepper, cotter]
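The slide does not say how accuracy or ITA was computed. As a rough illustration, the Python sketch below assumes plain percent agreement: accuracy compares one annotator's labels with the adjudicated labels, and inter-annotator agreement compares two annotators with each other. The label sequences are invented toy data, not project results.

```python
def percent_match(labels_a, labels_b):
    """Fraction of items on which two label sequences agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must cover the same items")
    if not labels_a:
        raise ValueError("no items to compare")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Invented toy data for five argument slots (not from the PropBank project).
adjudicated = ["Arg0", "Arg1", "Arg2", "ArgM-TMP", "Arg1"]
annotator_1 = ["Arg0", "Arg1", "Arg2", "ArgM-LOC", "Arg1"]
annotator_2 = ["Arg0", "Arg1", "Arg1", "ArgM-TMP", "Arg1"]

print(percent_match(annotator_1, adjudicated))  # accuracy of annotator 1: 0.8
print(percent_match(annotator_2, adjudicated))  # accuracy of annotator 2: 0.8
print(percent_match(annotator_1, annotator_2))  # agreement between the two: 0.6
```

In this toy case agreement between the annotators is lower than either annotator's accuracy, because they err on different items.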

English PropBank Status - (w/ Paul Kingsbury & Scott Cotton)

Create Frame File for that verb - DONE
  3282 lemmas, 4400+ framesets
First pass: Automatic tagging (Joseph Rosenzweig)
Second pass: Double blind hand correction
  118K predicates - all but 300 done
Third pass: Solomonization (adjudication)
  Betsy Klipple, Olga Babko-Malaya - 400 left
Frameset tags
  700+, double blind, almost adjudicated, 92% ITA
Quality Control and general cleanup

Quality Control and General Cleanup

Frame File consistency checking
Coordination with NYU
  Ensuring compatibility of frames and format
Leftover tasks
  have, be, become
  Adjectival usages
General cleanup
  Tense tagging
  Finalizing treatment of split arguments, ex. say, and symmetric arguments, ex. match
Supplementing sparse data w/ Brown for selected verbs

Summary of English PropBank
Paul Kingsbury, Olga Babko-Malaya, Scott Cotton

Genre | Words | Frames Files | Frameset Tags | Released
Wall Street Journal* (financial subcorpus) | 300K | < 2000 | 400 | July, 02
Wall Street Journal* (Penn TreeBank II) | 1000K | < 4000 | 700 | Dec, 03? (March, 03)
English Translation of Chinese TreeBank* (ITIC funding) | 100K | < 1500 | | July, 04
Sinorama, English corpus (NSF-ITR funding) | 150K | < 2000 | | July, 05
English half of DLI Military Corpus (ARL funding) | 50K | < 1000 | | July, 05

PropBank II

Nominalizations NYU
Lexical Frames DONE
Event Variables (including temporals and locatives)
More fine-grained sense tagging
  Tagging nominalizations w/ WordNet sense
  Selected verbs and nouns
Nominal Coreference
  not names
Clausal Discourse connectives - selected subset

PropBank I

Also, [Arg0 substantially lower Dutch corporate tax rates] helped [Arg1 [Arg0 the company] keep [Arg1 its tax outlay] [Arg3-PRD flat] [ArgM-ADV relative to earnings growth]].

help
  REL: help
  Arg0: tax rates
  Arg1: the company keep its tax outlay flat

keep
  REL: keep
  Arg0: the company
  Arg1: its tax outlay
  Arg3-PRD: flat
  ArgM-ADV: relative to earnings…

PropBank II

Event variables; nominal reference; sense tags; discourse connectives
  ID#: h23, k16
  Sense tags: help2,5 tax rate1 keep1 company1
  { }

Summary of Multilingual TreeBanks, PropBanks

Parallel Corpora | Text | Treebank | PropBank I | Prop II
Chinese Treebank | Chinese 500K, English 400K | Chinese 500K, English 100K | Chinese 500K, English 350K* | Ch 100K, En 100K
Arabic Treebank | Arabic 500K, English 500K | Arabic 500K, English 100K | |
Korean Treebank | Korean 180K, English 50K | Korean 180K, English 50K | Korean 100K+, English 50K |

* Also 1M word English monolingual PropBank

Agenda

PropBank I 10:30 - 10:50
  Automatic labeling of semantic roles
  Chinese Proposition Bank
Proposition Bank II 10:50 - 11:30
  Event variables - Olga Babko-Malaya
  Sense tagging - Hoa Dang
  Nominal coreference - Edward Loper
  Discourse tagging - Aravind Joshi
Research Areas 11:30 - 12:00
  Moving forward - Mitch Marcus
  Alignment improvement via dependency structures - Yuan Ding
  Employing syntactic features in MT - Libin Shen
Lunch 12:00 - 1:30 White Dog
Research Area 1:30 - 1:45
  Clustering - Paul Kingsbury
DOD Program presentation 1:45 - 2:15
Discussion 2:15 - 3:00