10/9/01 PropBank 1
Proposition Bank: a resource of
predicate-argument relations
Martha Palmer
University of Pennsylvania
October 9, 2001
Columbia University
Outline
• Overview (ACE consensus: BBN, NYU, MITRE, Penn)
• Motivation
• Approach
  • Guidelines, lexical resources, frame sets
  • Tagging process, hand correction of automatic tagging
• Status: accuracy, progress
• Colleagues: Joseph Rosenzweig, Paul Kingsbury, Hoa Dang, Karin Kipper, Scott Cotton, Lauren Delfs, Christiane Fellbaum
Proposition Bank: Generalizing from Sentences to Propositions

Powell met Zhu Rongji
Powell met with Zhu Rongji
Powell and Zhu Rongji met
Powell and Zhu Rongji had a meeting
. . .
Proposition: meet(Powell, Zhu Rongji)

When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane.
meet(Powell, Zhu)
discuss([Powell, Zhu], return(X, plane))

meet(Somebody1, Somebody2), alongside related predicates: debate, consult, join, wrestle, battle
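The normalization above can be sketched as a small data structure: the surface variants all map to one proposition. A minimal illustrative sketch (the `Proposition` class is hypothetical, not PropBank tooling):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Proposition:
    """A predicate with an ordered tuple of arguments."""
    predicate: str
    args: Tuple[str, ...]

# All four surface variants from the slide share one proposition.
variants = [
    "Powell met Zhu Rongji",
    "Powell met with Zhu Rongji",
    "Powell and Zhu Rongji met",
    "Powell and Zhu Rongji had a meeting",
]
prop = Proposition("meet", ("Powell", "Zhu Rongji"))
```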
Penn English Treebank
• 1.3 million words, Wall Street Journal and other sources
• Tagged with part-of-speech
• Syntactically parsed
• Widely used in the NLP community
• Available from the Linguistic Data Consortium
A TreeBanked Sentence
[Tree diagram of the parse; the bracketed form below shows the same structure.]
(S (NP-SBJ Analysts) (VP have (VP been (VP expecting
(NP (NP a GM-Jaguar pact) (SBAR (WHNP-1 that)
(S (NP-SBJ *T*-1) (VP would
(VP give (NP the U.S. car maker)
(NP (NP an eventual (ADJP 30 %) stake) (PP-LOC in (NP the British
company))))))))))))
Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.
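The bracketed notation above is plain nested parentheses, so it can be read back into a tree with a few lines of code. A sketch (not the Penn tools; `parse_tree` is a hypothetical helper returning nested `(label, children)` tuples):

```python
import re

def parse_tree(s):
    """Parse a bracketed Treebank string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0
    def parse():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]; pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos]); pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)
    return parse()

tree = parse_tree("(S (NP-SBJ Analysts) (VP have (VP been (VP expecting))))")
```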
The same sentence, PropBanked
For "expecting": Arg0 = Analysts; Arg1 = a GM-Jaguar pact that would give the US car maker an eventual 30% stake in the British company
For "give": Arg0 = *T*-1 (the pact); Arg2 = the US car maker; Arg1 = an eventual 30% stake in the British company

(S Arg0 (NP-SBJ Analysts) (VP have (VP been (VP expecting
  Arg1 (NP (NP a GM-Jaguar pact) (SBAR (WHNP-1 that)
    (S Arg0 (NP-SBJ *T*-1) (VP would
      (VP give Arg2 (NP the U.S. car maker)
        Arg1 (NP (NP an eventual (ADJP 30 %) stake)
          (PP-LOC in (NP the British company))))))))))))

expect(Analysts, GM-J pact)
give(GM-J pact, US car maker, 30% stake)
Motivation
Why do we need accurate predicate-argument relations? They have a major impact on information processing.
Example: Korean/English Machine Translation (ARL/SBIR)
• CoGenTex, Penn, Systran (K/E bilingual lexicon, 20K)
• 4K words (< 500 words from Systran, military messages)
• Plug-and-play architecture based on DSyntS (rich dependency structure)
• A converter bug led to random relabeling of predicate arguments
• Correcting the predicate argument labels alone tripled the acceptable sentence output
Focusing on parser comparisons
200 sentences hand-selected to represent "good" translations given a correct parse. Used to compare:
• Corrected DSyntS output
• Juntae's parser output (off-the-shelf)
• Anoop's parser output (Treebank-trained, 95% F)
Evaluating translation quality
Compare DLI human translation to system output (200 sentences). Criteria used by human judges (2 or more, not blind):
• [g] = good, exactly right
• [f1] = fairly good, but small grammatical mistakes
• [f2] = needs fixing, but vocabulary basically there
• [f3] = needs quite a bit of fixing, usually some untranslated vocabulary, but most vocabulary is right
• [m] = seems grammatical, but semantically wrong, actually misleading
• [i] = irredeemable, really wrong, major problems
Results comparison (200 sentences):

            Anoop   Juntae   Correct
  Bad         5        9        3
  Fixable    85       67       11
  Good       10       24       85
Plug and play?
• A converter was used to map parser outputs into the MT DSyntS format
• A bug in the converter affected both systems
• Predicate argument structure labels were being lost in the conversion process and relabeled randomly
• The converter was also still tuned to Juntae's parse output and needed to be customized to Anoop's
Anoop's parse -> MTW DSyntS

0010 Target: Unit designations are normally transmitted in code.
0010 Corrected: Normally unit designations are notified in the code.
0010 Anoop: Normally it is notified unit designations in code.

[Dependency tree for "notified": unit designations (C = Arg1, P = Arg0), normally, code]
Anoop's parse -> MTW DSyntS

0022 Target: Under what circumstances does radio interference occur?
0022 Corrected: In what circumstances does the interference happen in the radio?
0022 Anoop: Do in what circumstance happen interference in radio?

[Dependency tree for "happen": what circumstances (C = Arg0, P = ArgM), interference (C = Arg1, P = Arg0), radio]
New and old results comparison (%):

            A2     A1    J2     J1   Correct
  Bad       4.5     5     4      9      3
  Fixable  60.5    85   64.5    67     11
  Good     37      10   31      24     85
English PropBank
• 1M words of Treebank over 2 years, May '01 to May '03
• New semantic augmentations
  • Predicate-argument relations for verbs
  • Label arguments: Arg0, Arg1, Arg2, ...
  • First subtask: 300K-word financial subcorpus (12K sentences, 35K+ predicates)
• Spin-off: guidelines (necessary for annotators)
  • English lexical resource
  • 6000+ verbs with labeled examples, rich semantics
Task: not just undoing passives

The earthquake shook the building.
<arg0> <WN3> <arg1>

The walls shook; the building rocked.
<arg1> <WN3>; <arg1> <WN1>

The guidelines = a lexicon with examples: the Frames Files
Guidelines: Frames Files
• Created manually (Paul Kingsbury); working on semi-automatic expansion
• Refer to VerbNet, WordNet, and FrameNet
• Currently in place for 230 verbs
  • Can expand to 2000+ using VerbNet
  • Will need hand correction
• Use "semantic role glosses" unique to each verb (mapped to the Arg0, Arg1 labels appropriate to the class)
Frames example: expect
Roles:
  Arg0: expecter
  Arg1: thing expected
Example, transitive, active:
  Portfolio managers expect further declines in interest rates.
  Arg0: Portfolio managers
  REL: expect
  Arg1: further declines in interest rates
Frames File example: give
Roles:
  Arg0: giver
  Arg1: thing given
  Arg2: entity given to
Example, double object:
  The executives gave the chefs a standing ovation.
  Arg0: The executives
  REL: gave
  Arg2: the chefs
  Arg1: a standing ovation
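The two rolesets above can be sketched as plain dictionaries to show how an annotated instance is checked against its frame; the layout is illustrative, not the actual Frames File format:

```python
# Illustrative rolesets for expect and give (not the real file format).
frames = {
    "expect": {"Arg0": "expecter", "Arg1": "thing expected"},
    "give":   {"Arg0": "giver", "Arg1": "thing given", "Arg2": "entity given to"},
}

# The double-object example from the slide, as a labeled instance.
instance = {
    "REL":  "gave",
    "Arg0": "The executives",
    "Arg2": "the chefs",
    "Arg1": "a standing ovation",
}

def uses_only_frame_roles(instance, roleset):
    """True if every numbered argument in the instance is licensed by the roleset."""
    return set(instance) - {"REL"} <= set(roleset)
```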
The same sentence, PropBanked
For "expecting": Arg0 = Analysts; Arg1 = a GM-Jaguar pact that would give the US car maker an eventual 30% stake in the British company
For "give": Arg0 = *T*-1 (the pact); Arg2 = the US car maker; Arg1 = an eventual 30% stake in the British company

(S Arg0 (NP-SBJ Analysts) (VP have (VP been (VP expecting
  Arg1 (NP (NP a GM-Jaguar pact) (SBAR (WHNP-1 that)
    (S Arg0 (NP-SBJ *T*-1) (VP would
      (VP give Arg2 (NP the U.S. car maker)
        Arg1 (NP (NP an eventual (ADJP 30 %) stake)
          (PP-LOC in (NP the British company))))))))))))

expect(Analysts, GM-J pact)
give(GM-J pact, US car maker, 30% stake)
Complete Sentence
Analysts have been expecting a GM-Jaguar pact that *T*-1 would give the U.S. car maker an eventual 30% stake in the British company and create joint ventures that *T*-2 would produce an executive-model range of cars.
How are arguments numbered?
• Examination of example sentences
• Determination of required / highly preferred elements
• Sequential numbering: Arg0 is the typical first argument, except for ergative/unaccusative verbs (shake example)
• Arguments mapped for "synonymous" verbs
Additional tags (arguments or adjuncts?)
Variety of ArgMs (Arg# > 4):
• TMP - when?
• LOC - where at?
• DIR - where to?
• MNR - how?
• PRP - why?
• REC - himself, themselves, each other
• PRD - this argument refers to or modifies another
• ADV - others
Tense/aspect
Verbs are also marked for tense/aspect:
• Passive
• Perfect
• Progressive
• Infinitival
Modals and negation are marked as ArgMs.
Ergative/Unaccusative verbs: rise
Roles:
  Arg1 = logical subject, patient, thing rising
  Arg2 = EXT, amount risen
  Arg3* = start point
  Arg4 = end point
Sales rose 4% to $3.28 billion from $3.16 billion.
*Note: the preposition has to be mentioned explicitly (Arg3-from, Arg4-to); ArgM-Source and ArgM-Goal could have been used instead. The distinction is arbitrary.
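Spelled out, the rise example gets these labels (a sketch; the key names follow the slide's role glosses and are not an official file format):

```python
# "Sales rose 4% to $3.28 billion from $3.16 billion."
instance = {
    "REL":       "rose",
    "Arg1":      "Sales",           # thing rising
    "Arg2-EXT":  "4%",              # amount risen
    "Arg4-to":   "$3.28 billion",   # end point
    "Arg3-from": "$3.16 billion",   # start point, preposition made explicit
}
```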
Synonymous verbs: add in the sense of rise
Roles:
  Arg1 = logical subject, patient, thing rising/gaining/being added to
  Arg2 = EXT, amount risen
  Arg4 = end point
The Nasdaq composite index added 1.01 to 456.6 on paltry volume.
Phrasal verbs
• put together
• put in
• put off
• put on
• put out
• put up
• ...
Frames: multiple rolesets
• Rolesets are not necessarily consistent between different senses of the same verb
  • A verb with multiple senses can have multiple frames, but not necessarily
• Roles and mappings onto argument labels are consistent between different verbs that share similar argument structures (similar to FrameNet)
  • Levin / VerbNet classes
  • http://www.cis.upenn.edu/~dgildea/VerbNet/
• Out of the 179 most frequent verbs:
  • 1 roleset: 92
  • 2 rolesets: 45
  • 3+ rolesets: 42 (includes light verbs)
Annotation procedure
• Extraction of all sentences with a given verb
• First pass: automatic tagging
• Second pass: double-blind hand correction
  • Annotators come from a variety of backgrounds
  • Less syntactic training than for treebanking
• Script to discover discrepancies
• Third pass: Solomonization (adjudication)
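The discrepancy script could be as simple as a per-label comparison of the two annotators' output. A sketch, assuming each annotation is stored as a label-to-string dict (the storage format is an assumption):

```python
def discrepancies(ann_a, ann_b):
    """Map each label on which the two annotators disagree to their two values."""
    labels = set(ann_a) | set(ann_b)
    return {lab: (ann_a.get(lab), ann_b.get(lab))
            for lab in labels
            if ann_a.get(lab) != ann_b.get(lab)}

# Example: two annotators differ only on the span of arg1.
kate  = {"arg0": "Intel", "arg1": "the company will resume shipments",
         "arg2": "analysts"}
erwin = {"arg0": "Intel", "arg1": "that the company will resume shipments",
         "arg2": "analysts"}
```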
Inter-annotator agreement (% per verb):
Buy 48, Begin 70, Bid 70, Base 46, See 34, End 84, Cost 67, Keep 52, Sell 52, Leave 50, Announce 87, Close 80, Decline 53, Call 59, Tell 18, Want 75, Comment 92, Gain 29, Name 41, Seem 83, Offer 43, Know 61, Add 51, Compare 91, Hit 57, Result 83, Believe 11, Find 61, Quote 100, Earn 90, Bring 39, Fall 76, Work 63, Approve 81, Elect 75, Cause 55, Resign 82, Result 82, Return 73, Climb 62, Change 84
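Per-verb figures like these can be computed as simple observed agreement between two annotators' labelings of the same predicate instance. A sketch (the project's actual agreement metric is not specified on the slide):

```python
def observed_agreement(ann_a, ann_b):
    """Fraction of argument labels on which two annotators assigned the same span."""
    labels = set(ann_a) | set(ann_b)
    if not labels:
        return 1.0
    agree = sum(1 for lab in labels if ann_a.get(lab) == ann_b.get(lab))
    return agree / len(labels)

a = {"Arg0": "Intel", "Arg1": "shipments of the chips", "Arg2": "analysts"}
b = {"Arg0": "Intel", "Arg1": "the shipments",          "Arg2": "analysts"}
```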
Annotator Accuracy vs. Gold Standard

Verb (annotators: Darren, Erwin, Kate, Katherine; two scores per verb)
  Acquire    85%   96%
  Add        86%   93%
  Announce   90%   99%
  Bid        50%   95%
  Cost       78%   89%
  Decline    96%   61%
  Hit        96%   60%
  Keep       92%   53%
  Know       89%   69%

One version of the annotation is chosen (the senior annotator's); Solomon modifies it => Gold Standard.
Status
• 179 verbs framed (+ Senseval-2 verbs)
• 97 verbs first-passed: 12,300+ predicates (does not include ~3000 predicates tagged for Senseval)
• 54 verbs second-passed: 6,600+ predicates
• 9 verbs solomonized: 885 predicates
Throughput
• Framing: approximately 2 verbs per hour
• Annotation: approximately 50 sentences per hour
• Solomonization: approximately 1 hour per verb
Automatic Predicate Argument Tagger
• Predicate argument labels
  • Uses TreeBank "cues"
  • Consults a lexical semantic KB:
    - hierarchically organized verb subcategorization frames and alternations associated with tree templates
    - an ontology of noun-phrase referents
    - multi-word lexical items
  • Matches annotated tree templates against the parse, Tree-Adjoining-Grammar style
  • Standoff annotation in an external file referencing tree nodes
• Preliminary accuracy rate of 83.7% (800+ predicates)
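Standoff annotation keeps the labels out of the parse file and points back at tree nodes instead. A minimal sketch of the idea (record layout, file name, and node-path scheme are all hypothetical, not the actual PropBank format):

```python
# Each record says: in this file, this sentence, this tree node carries this label.
records = [
    ("wsj_0001.mrg", 3, "S/NP-SBJ",      "Arg0"),  # hypothetical node paths
    ("wsj_0001.mrg", 3, "S/VP/VP/VP/NP", "Arg1"),
]

def labels_for(records, fname, sent):
    """Collect the labels attached to one sentence's tree nodes."""
    return {node: label for f, s, node, label in records
            if f == fname and s == sent}
```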
Summary
• Predicate-argument structure labels are arbitrary to a certain degree, but still consistent, and generic enough to be mappable to particular theoretical frameworks
• Automatic tagging as a first pass makes the task feasible
• Agreement and accuracy figures are reassuring
Solomonization
Source tree: Intel told analysts that the company will resume shipments of the chips within two to three weeks.

*** kate said:
arg0: Intel
arg1: the company will resume shipments of the chips within two to three weeks
arg2: analysts

*** erwin said:
arg0: Intel
arg1: that the company will resume shipments of the chips within two to three weeks
arg2: analysts
Solomonization
Such loans to Argentina also remain classified as non-accruing, *TRACE*-1 costing the bank $ 10 million *TRACE*-*U* of interest income in the third period.

*** kate said:
argM-TMP: in the third period
arg3: the bank
arg2: $ 10 million *TRACE*-*U* of interest income
arg1: *TRACE*-1

*** erwin said:
argM-TMP: in the third period
arg3: the bank
arg2: $ 10 million *TRACE*-*U* of interest income
arg1: *TRACE*-1 Such loans to Argentina
Solomonization
Also, substantially lower Dutch corporate tax rates helped the company keep its tax outlay flat relative to earnings growth.

*** kate said:
argM-MNR: relative to earnings growth
arg3-PRD: flat
arg1: its tax outlay
arg0: the company

*** katherine said:
argM-ADV: relative to earnings growth
arg3-PRD: flat
arg1: its tax outlay
arg0: the company