PropBanks, 10/30/03 1
Penn
Putting Meaning Into Your Trees
Martha Palmer
Paul Kingsbury, Olga Babko-Malaya, Scott Cotton,
Nianwen Xue, Shijong Ryu, Ben Snyder
PropBanks I and II site visit
University of Pennsylvania,
October 30, 2003
Proposition Bank: From Sentences to Propositions
Powell met Zhu Rongji
Proposition: meet(Powell, Zhu Rongji)
Powell met with Zhu Rongji
Powell and Zhu Rongji met
Powell and Zhu Rongji had a meeting
. . .
When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane.
meet(Powell, Zhu) discuss([Powell, Zhu], return(X, plane))
debate
consult
join
wrestle
battle
meet(Somebody1, Somebody2)
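The mapping from surface variants to a single proposition can be sketched as follows. This is a minimal illustration, assuming the analyses are assigned by hand (no parser is involved); `Proposition`, `render`, and `analyses` are hypothetical names, not part of PropBank.

```python
from typing import NamedTuple

class Proposition(NamedTuple):
    """A predicate with ordered arguments; an argument may itself be a Proposition."""
    predicate: str
    args: tuple

def render(p):
    """Render a proposition in the predicate(arg, ...) notation used on the slide."""
    parts = [render(a) if isinstance(a, Proposition) else a for a in p.args]
    return f"{p.predicate}({', '.join(parts)})"

meet = Proposition("meet", ("Powell", "Zhu Rongji"))

# Hand-assigned analyses (an assumption for illustration): every surface
# variant from the slide maps to the same proposition.
analyses = {
    "Powell met Zhu Rongji": meet,
    "Powell met with Zhu Rongji": meet,
    "Powell and Zhu Rongji met": meet,
    "Powell and Zhu Rongji had a meeting": meet,
}

# Nested proposition from the second example on the slide.
discuss = Proposition(
    "discuss",
    ("[Powell, Zhu]", Proposition("return", ("X", "plane"))),
)

print(render(meet))                 # meet(Powell, Zhu Rongji)
print(len(set(analyses.values())))  # 1: four sentences, one proposition
print(render(discuss))              # discuss([Powell, Zhu], return(X, plane))
```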
Capturing semantic roles*
JK broke [ARG1 the LCD Projector].
[ARG1 The windows] were broken by the hurricane.
[ARG1 The vase] broke into pieces when it toppled over.
SUBJ marks the grammatical subject of each sentence: JK, The windows, The vase.
*See also Framenet, http://www.icsi.berkeley.edu/~framenet/
Outline
Introduction
Proposition Bank
  Starting with Treebanks
  Frames files
  Annotation process and status
PropBank II
Automatic labelling of semantic roles
Chinese Proposition Bank
A TreeBanked Sentence
(S (NP-SBJ Analysts) (VP have (VP been (VP expecting
(NP (NP a GM-Jaguar pact) (SBAR (WHNP-1 that)
(S (NP-SBJ *T*-1) (VP would
(VP give (NP the U.S. car maker)
(NP (NP an eventual (ADJP 30 %) stake) (PP-LOC in (NP the British company))))))))))))
Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.
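The bracketed notation above is easy to read mechanically. The following is a minimal sketch of a recursive reader, assuming a tree is represented as a `(label, children)` tuple; `tokenize`, `parse`, and `leaves` are illustrative names, not Treebank tooling.

```python
def tokenize(text):
    """Split a bracketed tree into '(', ')' and plain tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens, pos=0):
    """Parse one constituent starting at tokens[pos] == '('.

    Returns ((label, children), next_pos); children are subtrees or word strings.
    """
    assert tokens[pos] == "(", "constituent must start with '('"
    label = tokens[pos + 1]
    pos += 2
    children = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = parse(tokens, pos)
        else:
            child, pos = tokens[pos], pos + 1
        children.append(child)
    return (label, children), pos + 1

def leaves(tree, skip_traces=True):
    """Collect the words under a tree, optionally dropping traces such as *T*-1."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child, skip_traces))
        elif not (skip_traces and child.startswith("*")):
            words.append(child)
    return words

TREE = (
    "(S (NP-SBJ Analysts) (VP have (VP been (VP expecting "
    "(NP (NP a GM-Jaguar pact) (SBAR (WHNP-1 that) "
    "(S (NP-SBJ *T*-1) (VP would "
    "(VP give (NP the U.S. car maker) "
    "(NP (NP an eventual (ADJP 30 %) stake) "
    "(PP-LOC in (NP the British company"
    + ")" * 12  # closes the 12 constituents still open at this point
)

tree, end = parse(tokenize(TREE))
print(tree[0])                 # S
print(" ".join(leaves(tree)))  # the sentence, with the trace *T*-1 dropped
```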
The same sentence, PropBanked
have been expecting: Arg0 = Analysts; Arg1 = a GM-Jaguar pact
(S Arg0 (NP-SBJ Analysts) (VP have (VP been (VP expecting
Arg1 (NP (NP a GM-Jaguar pact) (SBAR (WHNP-1 that)
(S Arg0 (NP-SBJ *T*-1) (VP would
(VP give Arg2 (NP the U.S. car maker)
Arg1 (NP (NP an eventual (ADJP 30 %) stake)
(PP-LOC in (NP the British company))))))))))))
that would give: Arg0 = *T*-1; Arg2 = the US car maker; Arg1 = an eventual 30% stake in the British company
expect(Analysts, GM-J pact)
give(GM-J pact, US car maker, 30% stake)
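Going from labelled arguments to the two propositions can be sketched as below. The annotations are copied by hand from the slide in surface order, and the antecedent of the trace `*T*-1` is supplied by hand as well; `ANNOTATIONS`, `ANTECEDENTS`, and `to_proposition` are hypothetical names used only for this illustration.

```python
# Hand-copied from the slide: each predicate with its labelled arguments,
# in surface order.
ANNOTATIONS = [
    {"rel": "expect",
     "args": [("Arg0", "Analysts"), ("Arg1", "GM-J pact")]},
    {"rel": "give",
     "args": [("Arg0", "*T*-1"), ("Arg2", "US car maker"), ("Arg1", "30% stake")]},
]

# Assumption: the trace is resolved by hand to the noun phrase it stands for.
ANTECEDENTS = {"*T*-1": "GM-J pact"}

def to_proposition(annotation, antecedents):
    """Render rel(arg, ...) with traces replaced by their antecedents."""
    fillers = [antecedents.get(text, text) for _, text in annotation["args"]]
    return f"{annotation['rel']}({', '.join(fillers)})"

for annotation in ANNOTATIONS:
    print(to_proposition(annotation, ANTECEDENTS))
# expect(Analysts, GM-J pact)
# give(GM-J pact, US car maker, 30% stake)
```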
Frames File Example: expect
Roles:
  Arg0: expecter
  Arg1: thing expected
Example: Transitive, active:
  Portfolio managers expect further declines in interest rates.
  Arg0: Portfolio managers
  REL: expect
  Arg1: further declines in interest rates
Frames File example: give
Roles:
  Arg0: giver
  Arg1: thing given
  Arg2: entity given to
Example: double object
  The executives gave the chefs a standing ovation.
  Arg0: The executives
  REL: gave
  Arg2: the chefs
  Arg1: a standing ovation
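A frames entry can be thought of as a lookup from argument number to a verb-specific role description. The sketch below uses a plain dictionary, which is an assumption for illustration and not the actual frames file format; `FRAMES` and `describe` are hypothetical names.

```python
# Role descriptions copied from the two frames file slides.
FRAMES = {
    "expect": {"Arg0": "expecter", "Arg1": "thing expected"},
    "give": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "entity given to"},
}

def describe(lemma, labelled):
    """Pair each labelled span with the role description for that verb."""
    roles = FRAMES[lemma]
    return [f"{label} ({roles[label]}): {text}" for label, text in labelled]

# The double-object example, in surface order (Arg2 precedes Arg1).
example = [
    ("Arg0", "The executives"),
    ("Arg2", "the chefs"),
    ("Arg1", "a standing ovation"),
]

for line in describe("give", example):
    print(line)
# Arg0 (giver): The executives
# Arg2 (entity given to): the chefs
# Arg1 (thing given): a standing ovation
```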
Trends in Argument Numbering
Arg0 = agent
Arg1 = direct object / theme / patient
Arg2 = indirect object / benefactive / instrument / attribute / end state
Arg3 = start point / benefactive / instrument / attribute
Arg4 = end point
Ergative/Unaccusative Verbs
Roles (no ARG0 for unaccusative verbs)
Arg1 = Logical subject, patient, thing rising
Arg2 = EXT, amount risen
Arg3* = start point
Arg4 = end point
Sales rose 4% to $3.28 billion from $3.16 billion.
The Nasdaq composite index added 1.01 to 456.6 on paltry volume.
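Applying the role list to the first example sentence might look like the sketch below. The labelling is one reading of the roles listed on this slide, not a copy of the released annotation, and `RISE_ROLES` and `bracket` are hypothetical names.

```python
# Roles for "rise" as listed on the slide; note there is no Arg0.
RISE_ROLES = {
    "Arg1": "logical subject, patient, thing rising",
    "Arg2": "EXT, amount risen",
    "Arg3": "start point",
    "Arg4": "end point",
}

def bracket(segments):
    """Render (label, text) segments as [label text]; label None means unlabelled text."""
    out = []
    for label, text in segments:
        out.append(text if label is None else f"[{label} {text}]")
    return " ".join(out)

# Assumed labelling of "Sales rose 4% to $3.28 billion from $3.16 billion."
sales = [
    ("Arg1", "Sales"),
    ("REL", "rose"),
    ("Arg2", "4%"),
    ("Arg4", "to $3.28 billion"),
    ("Arg3", "from $3.16 billion"),
]

print(bracket(sales))
# [Arg1 Sales] [REL rose] [Arg2 4%] [Arg4 to $3.28 billion] [Arg3 from $3.16 billion]
```

The surface order puts the end point (Arg4) before the start point (Arg3); the argument numbers come from the verb's role list, not from word order.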
Function tags for English/Chinese (arguments or adjuncts?)
Variety of ArgM’s (Arg# > 4):
  TMP - when?
  LOC - where at?
  DIR - where to?
  MNR - how?
  PRP - why?
  TPC - topic
  PRD - this argument refers to or modifies another
  ADV - others
  CND - conditional
  DGR - degree
  FRQ - frequency
Inflection
Verbs also marked for tense/aspect
  Passive/Active
  Perfect/Progressive
  Third singular (is has does was)
  Present/Past/Future
  Infinitives/Participles/Gerunds/Finites
Modals and negation marked as ArgMs
Word Senses in PropBank
Orders to ignore word sense not feasible for 700+ verbs
  Mary left the room
  Mary left her daughter-in-law her pearls in her will
Frameset leave.01 "move away from":
  Arg0: entity leaving
  Arg1: place left
Frameset leave.02 "give":
  Arg0: giver
  Arg1: thing given
  Arg2: beneficiary
How do these relate to traditional word senses as in WordNet?
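Why the sense distinction cannot be ignored can be seen in a small sketch. It checks which framesets define every numbered argument found in a labelled sentence. This check is purely illustrative and is not how framesets were assigned; `FRAMESETS` and `compatible` are hypothetical names, and the two labellings are a reading of the slide's examples.

```python
# The two framesets for "leave" as given on the slide.
FRAMESETS = {
    "leave.01": {
        "gloss": "move away from",
        "roles": {"Arg0": "entity leaving", "Arg1": "place left"},
    },
    "leave.02": {
        "gloss": "give",
        "roles": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "beneficiary"},
    },
}

def compatible(labelled):
    """Return ids of framesets that define every argument label in `labelled`."""
    return sorted(
        fid for fid, fs in FRAMESETS.items()
        if set(labelled) <= set(fs["roles"])
    )

# Assumed labellings of the two example sentences.
room = {"Arg0": "Mary", "Arg1": "the room"}
pearls = {"Arg0": "Mary", "Arg2": "her daughter-in-law", "Arg1": "her pearls"}

print(compatible(room))    # ['leave.01', 'leave.02']
print(compatible(pearls))  # ['leave.02']
```

For "Mary left the room" the argument labels alone fit both framesets, so Arg1 could be read as either "place left" or "thing given"; an explicit frameset tag is what settles the meaning of the numbered arguments.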
Overlap between Groups and Framesets – 95%
Figure: WordNet senses of "develop" (WN1 to WN14, WN19, WN20) shown against Frameset1 and Frameset2.
Palmer, Dang & Fellbaum, NLE 2004
Annotator accuracy – ITA 84%
Figure: Annotator Accuracy, primary labels only. x-axis: # of annotations (log scale, 1000 to 1000000); y-axis: accuracy (0.86 to 0.96). Points are labelled by annotator ID: hertlerb, forbesk, solaman2, istreit, wiarmstr, kingsbur, ksledge, nryant, malayao, jaywang, delilkan, ptepper, cotter.
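The slides do not say how ITA or annotator accuracy was computed. As a rough sketch, simple percentage agreement between two annotators over the same items could be calculated as below; the function name `agreement` and the toy labels are invented for illustration and are not PropBank data.

```python
def agreement(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("annotators must label the same items")
    if not labels_a:
        raise ValueError("no items to compare")
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    return same / len(labels_a)

# Toy labels (invented): the annotators disagree on the third item only.
ann1 = ["Arg0", "Arg1", "Arg2", "ArgM-TMP", "Arg1"]
ann2 = ["Arg0", "Arg1", "Arg1", "ArgM-TMP", "Arg1"]

print(agreement(ann1, ann2))  # 0.8
```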
English PropBank Status - (w/ Paul Kingsbury & Scott Cotton)
Create Frame File for that verb - DONE
  3282 lemmas, 4400+ framesets
First pass: Automatic tagging (Joseph Rosenzweig)
Second pass: Double blind hand correction
  118K predicates – all but 300 done
Third pass: Solomonization (adjudication)
  Betsy Klipple, Olga Babko-Malaya – 400 left
Frameset tags
  700+, double blind, almost adjudicated, 92% ITA
Quality Control and general cleanup
Quality Control and General Cleanup
Frame File consistency checking
Coordination with NYU
  Ensuring compatibility of frames and format
Leftover tasks
  have, be, become
  Adjectival usages
General cleanup
  Tense tagging
  Finalizing treatment of split arguments, ex. say, and symmetric arguments, ex. match
  Supplementing sparse data w/ Brown for selected verbs
Summary of English PropBank
Paul Kingsbury, Olga Babko-Malaya, Scott Cotton
Genre                                                   | Words | Frames Files | Frameset Tags | Released
Wall Street Journal* (financial subcorpus)              | 300K  | < 2000       | 400           | July, 02
Wall Street Journal* (Penn TreeBank II)                 | 1000K | < 4000       | 700           | Dec, 03? (March, 03)
English Translation of Chinese TreeBank* (ITIC funding) | 100K  | < 1500       |               | July, 04
Sinorama, English corpus (NSF-ITR funding)              | 150K  | < 2000       |               | July, 05
English half of DLI Military Corpus (ARL funding)       | 50K   | < 1000       |               | July, 05
PropBank II
Nominalizations NYU
  Lexical Frames DONE
Event Variables (including temporals and locatives)
More fine-grained sense tagging
  Tagging nominalizations w/ WordNet sense
  Selected verbs and nouns
Nominal Coreference
  not names
Clausal Discourse connectives – selected subset
PropBank I
Also, [Arg0 substantially lower Dutch corporate tax rates] helped [Arg1 [Arg0 the company] keep [Arg1 its tax outlay] [Arg3-PRD flat] [ArgM-ADV relative to earnings growth]].
REL: help; Arg0: tax rates; Arg1: the company keep its tax outlay flat
REL: keep; Arg0: the company; Arg1: its tax outlay; Arg3-PRD: flat; ArgM-ADV: relative to earnings…
Event variables; ID#: h23, k16
Nominal reference; sense tags: help2,5, tax rate1, keep1, company1
Discourse connectives: { }
Summary of Multilingual TreeBanks, PropBanks
Parallel Corpora | Text                       | Treebank                   | PropBank I                  | Prop II
Chinese Treebank | Chinese 500K, English 400K | Chinese 500K, English 100K | Chinese 500K, English 350K* | Ch 100K, En 100K
Arabic Treebank  | Arabic 500K, English 500K  | Arabic 500K, English 100K  |                             |
Korean Treebank  | Korean 180K, English 50K   | Korean 180K, English 50K   | Korean 100K+, English 50K   |
* Also 1M word English monolingual PropBank
Agenda
PropBank I 10:30 – 10:50
  Automatic labeling of semantic roles
  Chinese Proposition Bank
Proposition Bank II 10:50 – 11:30
  Event variables – Olga Babko-Malaya
  Sense tagging – Hoa Dang
  Nominal coreference – Edward Loper
  Discourse tagging – Aravind Joshi
Research Areas – 11:30 – 12:00
  Moving forward – Mitch Marcus
  Alignment improvement via dependency structures – Yuan Ding
  Employing syntactic features in MT – Libin Shen
Lunch 12:00 – 1:30 White Dog
Research Area - 1:30 – 1:45
  Clustering – Paul Kingsbury
DOD Program presentation – 1:45 – 2:15
Discussion 2:15 – 3:00