GALE Banks 11/9/06
Parsing Arabic: Key Aspects of Treebank Annotation
Seth Kulick, Ryan Gabbard, Mitch Marcus
Outline
Summary of recent results
Part of Speech/Treebank “mismatches”
Components of Flat NPs
Test and Train Results
Conclusion
Recent Results
Effect of sentence splitting: S -> S (wa) S (wa) S
Breaking these up improves F-measure by 1.25%
Investigating the accuracy of automatic S splitting
Effect of “spurious NPs” in coordination: (NP (NP x) and (NP y) and (NP z)) changed to (NP x and y and z)
Improves F-measure by 0.5%
POS/Treebank Mismatches
“Ideal”: an XP projection is headed by an X
Ideal and reality in the PTB and ATB
Ambiguity in which phrase a given (POS word) projects makes the parser’s job harder
VP headed by noun
6% of VPs in the ATB have a nonverbal head
Changed these heads to a new POS tag, “DV”
Temporary approximation to current annotation changes
0.7 increase in F-measure
(VP (NOUN mugAdar+at+i- [departure]) (NP-SBJ (POSS_PRON -hi [his])) (NP-OBJ (DET+NOUN Al+bayot+a [the house]) (DET+ADJ Al+>aboyaD+a [the white])))
NP headed by adj – #1
(S (NP-SBJ (PRON_1S -niy [I])) (NP-PRD (ADJ saEiyd+N [happy])))
ADJ heads NP-PRD, elsewhere ADJP-PRD
(VP (PV+PVSUFF_SUBJ kAn+a [be+he]) (NP-SBJ-1 (-NONE- *T*)) (ADJP-PRD (ADJ saEiyd+AF [happy]) (PP … [with the voting])))
NP headed by adj - #2
(VP (IV ta+Eomal+a [they work]) (NP-SBJ rAbiT+ap+u Al+maxAtyr+i [league of the mukhtars (village chiefs)]) (NP-ADV (ADJ dA}im+AF [always])))
ADJ heads NP-ADV, elsewhere ADVP, ADJP
(VP (IV na+>omal+a [we hope for]) (NP-SBJ (-NONE- *)) (ADVP (ADJ dA}im+AF [always])))
(VP (IV ya+SiH~+u [he/it+be correct]) (NP-SBJ-1 (-NONE- *T*)) (ADJP (ADJ dA}im+AF [always])))
ADJP headed by noun
(S (NP-SBJ (NOUN >um~ah+At+u- [mothers]) (POSS_PRON_3P -hum [their])) (ADJP-PRD (NOUN >amiyrokiy~+At+N [American])))
The same word also occurs tagged ADJ:
(NP (NOUN >um~ah+At+K [mothers]) (ADJ >amiyrokiy~+At+K [American]))
ADVP headed by conj
(S (ADVP (FOCUS_PART >am~A [as_for/concerning])) (NP-TPC-1 Haqiyb+ap+u Al+xArijiy~+ap+I [the foreign ministry’s portfolio]) (ADVP (CONJ fa- [and/so])) (VP ….
(CONJ fa-) also appears as a child of S
(S (S …) (PUNC ,) (CONJ fa- [and/so]) (S …))
Mismatches in ATB and PTB: % of phrases of each type whose head's POS does not match the phrase category
Phrase  ATB3    PTB2.0
VP      6.0%    0.5%
NP      5.0%    1.6%
ADJP    7.3%    23.4%
ADVP    45.37%  8.0%
PP      0.8%    1.8%
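The table reports, per phrase category, the share of phrases whose head does not have a matching POS (for example, the 6% of ATB VPs with a nonverbal head). A count like this could be sketched as below. The category-to-tag mapping, the nested-list encoding, and the rule of following the first non-trace child down to the head (a rough fit for head-initial Arabic phrases, not for English) are assumptions, not the head rules behind these figures.

```python
import re
from collections import Counter

# Hypothetical mapping from phrase category to tag components allowed to head it.
EXPECTED_HEADS = {
    "VP": {"PV", "IV", "CV"},
    "NP": {"NOUN", "PRON", "NUM"},
    "ADJP": {"ADJ"},
    "ADVP": {"ADV"},
    "PP": {"PREP"},
}


def is_preterminal(t):
    return len(t) == 2 and isinstance(t[1], str)


def head_tag(t):
    """Follow the first non-trace child down to a POS tag (head-initial assumption)."""
    while not is_preterminal(t):
        kids = [k for k in t[1:] if not (is_preterminal(k) and k[0] == "-NONE-")]
        if not kids:
            return None  # e.g. an empty subject (NP-SBJ (-NONE- *))
        t = kids[0]
    return t[0]


def mismatch_rates(trees):
    """Percent of phrases per category whose head tag does not project that category."""
    total, bad = Counter(), Counter()
    stack = list(trees)
    while stack:
        t = stack.pop()
        if is_preterminal(t):
            continue
        stack.extend(t[1:])
        cat = t[0].split("-")[0]
        head = head_tag(t)
        if cat not in EXPECTED_HEADS or head is None:
            continue
        total[cat] += 1
        # "DET+NOUN" -> {"DET", "NOUN"}; "POSS_PRON" -> {"POSS", "PRON"}
        if EXPECTED_HEADS[cat].isdisjoint(re.split(r"[+_]", head)):
            bad[cat] += 1
    return {cat: 100.0 * bad[cat] / total[cat] for cat in total}


t1 = ["VP", ["NOUN", "mugAdar+at+i-"],
      ["NP-SBJ", ["POSS_PRON", "-hi"]],
      ["NP-OBJ", ["DET+NOUN", "Al+bayot+a"], ["DET+ADJ", "Al+>aboyaD+a"]]]
t2 = ["S", ["NP-SBJ", ["PRON_1S", "-niy"]], ["NP-PRD", ["ADJ", "saEiyd+N"]]]
print(mismatch_rates([t1, t2]))
```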
XP/X mismatches: Summary
This matters:
Changing headless VPs to “DV”: +0.7%
PTB: 23.4% mismatch for ADJP; F-measure is 88.28 overall but only 70.68 for ADJP
Real-life linguistic complexity
Need guidelines - visual prop time
Some automatic changes likely
No guarantee of level of improvement, but: should be a priority
Flat NPs
Flat NPs: NPs with only (POS word) children
Experiment: evaluate with flat NPs as a different bracket label
This affects the overall score
(Gold) (NP (NOUN -<ijorA'+i [conducting]) (NP (NOUN {inotixAb+At+K [elections]) (ADJ niyAbiy~+ap+K [representative])))
Flat NPs
(Gold) (NP (NOUN -<ijorA'+i [conducting]) (NP (NOUN {inotixAb+At+K [elections]) (ADJ niyAbiy~+ap+K [representative])))
(Test) (NP (NN -<ijorA'+i [conducting]) (NNS {inotixAb+At+K [elections]) (JJ niyAbiy~+ap+K [representative]))
Under regular evaluation, the top NPs match; only the gold inner NP is missed
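Labeled bracket scoring counts a constituent as correct when its label and word span match a gold constituent. The minimal sketch below uses nested-list trees and none of the preprocessing a real scorer such as evalb applies; on the example above, the test parse's single NP matches the gold top NP, so precision is perfect and only recall suffers.

```python
from collections import Counter


def is_preterminal(t):
    return len(t) == 2 and isinstance(t[1], str)


def brackets(tree):
    """Multiset of labeled spans (label, start, end); preterminals are not brackets."""
    spans = Counter()

    def walk(t, start):
        if is_preterminal(t):
            return start + 1
        end = start
        for k in t[1:]:
            end = walk(k, end)
        spans[(t[0], start, end)] += 1
        return end

    walk(tree, 0)
    return spans


def labeled_prf(gold, test):
    """Labeled precision, recall and F-measure of one test tree against gold."""
    g, t = brackets(gold), brackets(test)
    matched = sum((g & t).values())
    p = matched / sum(t.values()) if t else 0.0
    r = matched / sum(g.values()) if g else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


gold = ["NP", ["NOUN", "-<ijorA'+i"],
        ["NP", ["NOUN", "{inotixAb+At+K"], ["ADJ", "niyAbiy~+ap+K"]]]
test = ["NP", ["NN", "-<ijorA'+i"], ["NNS", "{inotixAb+At+K"], ["JJ", "niyAbiy~+ap+K"]]
print(labeled_prf(gold, test))
```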
Flat NPs
(Gold) (NP (NOUN -<ijorA'+i [conducting]) (FLATNP (NOUN {inotixAb+At+K [elections]) (ADJ niyAbiy~+ap+K [representative])))
(Test) (FLATNP (NN -<ijorA'+i [conducting]) (NNS {inotixAb+At+K [elections]) (JJ niyAbiy~+ap+K [representative]))
With FLATNP evaluation, the top brackets no longer match
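The FLATNP relabeling can be sketched as a pass that renames every NP whose children are all (POS word) pairs before scoring. The nested-list encoding and the choice to keep function tags (NP-SBJ becoming FLATNP-SBJ) are our assumptions. On the example above, the test NP becomes FLATNP while the gold top NP stays NP, so the two trees share no labeled span.

```python
def is_preterminal(t):
    return len(t) == 2 and isinstance(t[1], str)


def mark_flat_nps(tree):
    """Copy tree, relabeling each NP whose children are all (POS word) pairs as FLATNP."""
    if is_preterminal(tree):
        return list(tree)
    label = tree[0]
    if label.split("-")[0] == "NP" and all(is_preterminal(k) for k in tree[1:]):
        label = "FLAT" + label  # "NP" -> "FLATNP", "NP-SBJ" -> "FLATNP-SBJ"
    return [label, *(mark_flat_nps(k) for k in tree[1:])]


def labeled_spans(tree):
    """Set of (label, start, end) for every non-preterminal node."""
    spans = set()

    def walk(t, start):
        if is_preterminal(t):
            return start + 1
        end = start
        for k in t[1:]:
            end = walk(k, end)
        spans.add((t[0], start, end))
        return end

    walk(tree, 0)
    return spans


gold = ["NP", ["NOUN", "-<ijorA'+i"],
        ["NP", ["NOUN", "{inotixAb+At+K"], ["ADJ", "niyAbiy~+ap+K"]]]
test = ["NP", ["NN", "-<ijorA'+i"], ["NNS", "{inotixAb+At+K"], ["JJ", "niyAbiy~+ap+K"]]
g, t = mark_flat_nps(gold), mark_flat_nps(test)
print(g[0], g[2][0], t[0])
print(labeled_spans(g) & labeled_spans(t))
```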
Flat NPs
Importance of flat NPs
30% of brackets are flat NPs
Errors percolate up
ATB3 score on flat NPs is not good enough
Unclear why, but we need some things from the ATB
         Flat NPs  Overall
PTB2.0   94.20     87.54
ATB3     86.77     77.27
Flat NPs
Clear statement of what can go in flat NPs
Regular expressions for each head
Certain things fall out:
Questionable categories, e.g. (DET+NOUN DET+NOUN): (NP Al+baHor+i [the sea] Al+>aHomar+i [the red])
Nouns that occur before a head noun are limited to a small class: quantifiers
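One way to make the flat-NP content explicit is a regular expression over each flat NP's POS sequence, one pattern per head type. The sketch covers NOUN heads only; the pattern, the symbol Q for quantifier nouns, and the short quantifier list are hypothetical illustrations, not proposed guidelines.

```python
import re

# Hypothetical, incomplete list of quantifier stems (Buckwalter transliteration).
QUANTIFIERS = {"kul~", "baEoD", "jamiyE"}

# Hypothetical pattern for a NOUN-headed flat NP: an optional quantifier,
# the head noun, then any number of adjectives.
NOUN_HEADED = re.compile(r"(Q )?(DET\+)?NOUN(_PROP)?( (DET\+)?ADJ)*")


def token(tag, word):
    """Map a (POS, word) pair to a pattern symbol; quantifier nouns become Q."""
    if tag == "NOUN" and word.split("+")[0] in QUANTIFIERS:
        return "Q"
    return tag


def licensed(flat_np):
    """flat_np is a list of (tag, word) pairs from one flat NP."""
    sequence = " ".join(token(tag, word) for tag, word in flat_np)
    return NOUN_HEADED.fullmatch(sequence) is not None


examples = {
    "quantifier + noun + adj": [("NOUN", "kul~+a"), ("DET+NOUN", "Al+nuSuws+I"),
                                ("DET+ADJ", "Al+tijAriy~+ap+I")],
    "noun + adj": [("NOUN", "{inotixAb+At+K"), ("ADJ", "niyAbiy~+ap+K")],
    "DET+NOUN DET+NOUN": [("DET+NOUN", "Al+baHor+i"), ("DET+NOUN", "Al+>aHomar+i")],
}
for name, pairs in examples.items():
    print(name, licensed(pairs))
```

With this pattern the (DET+NOUN DET+NOUN) sequence falls out as unlicensed, which is the kind of questionable case a written statement would surface.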
Flat NPs
(NP (NOUN kul~+a [every/all/each_one]) (DET+NOUN Al+nuSuws+I [the texts]) (DET+ADJ Al+tijAriy~+ap+I [the business]))
Quantifier as prenominal modifier in flat NP
Quantifier taking an NP complement
(NP (NOUN kul~+a [every/all/each_one]) (NP (DET+NOUN Al+duwal+i [the countries]) (DET+ADJ Al+Earabiy~+ap+I [the Arabic])))
Quantifiers take an NP complement 15% of the time
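A count behind a figure like the 15% above could look like the following. The quantifier list, the nested-list encoding, and the test for "takes an NP complement" (a quantifier followed by an NP sister) are hypothetical; the two toy trees are the examples above, so they give 50%, not the treebank figure.

```python
from collections import Counter

# Hypothetical, incomplete list of quantifier stems (Buckwalter transliteration).
QUANTIFIERS = {"kul~", "baEoD", "jamiyE"}


def is_preterminal(t):
    return len(t) == 2 and isinstance(t[1], str)


def is_quantifier(t):
    return is_preterminal(t) and t[0] == "NOUN" and t[1].split("+")[0] in QUANTIFIERS


def quantifier_structures(trees):
    """Count NP-initial quantifiers used prenominally in a flat NP vs. taking an NP complement."""
    counts = Counter()
    stack = list(trees)
    while stack:
        t = stack.pop()
        if is_preterminal(t):
            continue
        stack.extend(t[1:])
        kids = t[1:]
        if t[0].split("-")[0] != "NP" or len(kids) < 2 or not is_quantifier(kids[0]):
            continue
        if all(is_preterminal(k) for k in kids):
            counts["prenominal"] += 1
        elif not is_preterminal(kids[1]) and kids[1][0].split("-")[0] == "NP":
            counts["np_complement"] += 1
    return counts


modifier = ["NP", ["NOUN", "kul~+a"], ["DET+NOUN", "Al+nuSuws+I"],
            ["DET+ADJ", "Al+tijAriy~+ap+I"]]
complement = ["NP", ["NOUN", "kul~+a"],
              ["NP", ["DET+NOUN", "Al+duwal+i"], ["DET+ADJ", "Al+Earabiy~+ap+I"]]]
counts = quantifier_structures([modifier, complement])
share = counts["np_complement"] / (counts["np_complement"] + counts["prenominal"])
print(counts, f"{share:.0%} take an NP complement")
```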
Flat NPs: Summary
Real-life linguistic complexity
Need guidelines for NP structure and quantifiers
Some automatic changes likely
Maybe a different POS tag for NOUNs with a different distribution?
No guarantee of level of improvement, but: should be a priority
Test on Train
ATB3 is lower, but not by much
Analysis of dependency errors
         All     ≤40 words
PTB2.0   96.80   97.10
ATB3     94.31   95.34
Dependency Analysis
                                   PTB2.0            ATB3
Dependency (parent: head, mod)     % all   Fmeas     % all   Fmeas
NPB: head, mod                     31.08%  99.19%    16.33%  95.83%
NP: head, NP                       0.0%    N/A       10.13%  97.08%
% all = % of all dependencies
NPB = “base NP”, i.e. a non-recursive NP
More evidence that minimal NPs matter a lot
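The NPB relabeling that a Collins-style parser performs can be sketched with the definition given here: an NP that dominates no other NP. This is a simplified illustration on nested-list trees; actual parser preprocessing may treat some cases differently, so the function below is an assumption-laden sketch rather than any parser's code.

```python
def is_preterminal(t):
    return len(t) == 2 and isinstance(t[1], str)


def is_np(t):
    return not is_preterminal(t) and t[0].split("-")[0] == "NP"


def dominates_np(t):
    """True if any proper descendant of t is an NP."""
    return any(not is_preterminal(k) and (is_np(k) or dominates_np(k)) for k in t[1:])


def mark_npb(tree):
    """Copy tree, relabeling every NP that dominates no other NP as NPB."""
    if is_preterminal(tree):
        return list(tree)
    label = tree[0]
    if is_np(tree) and not dominates_np(tree):
        label = "NPB" + label[2:]  # keeps function tags: NP-SBJ -> NPB-SBJ
    return [label, *(mark_npb(k) for k in tree[1:])]


tree = ["NP", ["NOUN", "-<ijorA'+i"],
        ["NP", ["NOUN", "{inotixAb+At+K"], ["ADJ", "niyAbiy~+ap+K"]]]
print(mark_npb(tree))
```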
Dependency Analysis
                                   PTB2.0            ATB3
Dependency (parent: head, mod)     % all   Fmeas     % all   Fmeas
NP: NPB, PP                        5.23%   94.78     5.74%   89.05
NP: NP, PP                         0.04%   30.40     1.28%   65.08
Why the difference for PPs adjoining to a recursive NP, rather than only to an NPB?
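Per-type figures like these can be computed once gold and test dependencies have been extracted with head rules (extraction not shown). In this sketch each dependency is a (type, head index, modifier index) triple whose type is a (parent, head child, modifier) label triple such as ("NP", "NPB", "PP"); we assume "% all" is the type's share of gold dependencies, and the indices in the usage example are made up.

```python
from collections import Counter


def per_type_scores(gold, test):
    """Per dependency type: (% of all gold dependencies, F-measure in %).

    gold and test are lists of (dep_type, head_index, modifier_index), where
    dep_type is a (parent, head child, modifier) label triple.
    """
    g, t = Counter(gold), Counter(test)
    n_gold = sum(g.values())
    scores = {}
    for dep_type in {d[0] for d in g} | {d[0] for d in t}:
        g_ty = Counter({d: n for d, n in g.items() if d[0] == dep_type})
        t_ty = Counter({d: n for d, n in t.items() if d[0] == dep_type})
        matched = sum((g_ty & t_ty).values())
        n_g, n_t = sum(g_ty.values()), sum(t_ty.values())
        f = 2 * matched / (n_g + n_t)  # equals 2PR / (P + R); n_g + n_t > 0 here
        share = 100.0 * n_g / n_gold if n_gold else 0.0
        scores[dep_type] = (share, 100.0 * f)
    return scores


gold = [(("NPB", "NOUN", "ADJ"), 1, 2),
        (("NP", "NPB", "PP"), 1, 3),
        (("NP", "NP", "PP"), 1, 5)]
test = [(("NPB", "NOUN", "ADJ"), 1, 2),
        (("NP", "NPB", "PP"), 1, 5)]  # PP attached to the wrong head
for dep_type, (share, f) in sorted(per_type_scores(gold, test).items()):
    print(dep_type, f"{share:.2f}% of all", f"F={f:.2f}")
```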
PP attachment in PTB
Adjuncts at the same level
Okay: (NP (NP …) (PP …) (PP …))
Not okay: (NP (NP (NP …) (PP …)) (PP …))
This is also true for the ATB
PP attachment in PTB
(NP (NP streets) (PP of (NP (NP the city) (PP of (NP Long Beach)) (PP in (NP the state…)))))
(NP (NP streets) (PP of (NP (NP (NP the city) (PP of (NP Long Beach))) (PP in (NP the state…)))))
The first is okay, the second is not
PPs in the PTB do not adjoin to recursive NPs
PPs in the ATB do, because of Al<DAfp (iDafa, the genitive construct)
PP attachment in PTB and ATB
(NP (NP streets) (PP of (NP (NP (NP the city) (PP of (NP Long Beach))) (PP in (NP the state…)))))
(NP ($awAriE [streets]) (NP (NP (NP madiyn+ap [the city]) (NP luwnog byt$ [Long Beach])) (PP fiy [in] (NP wilAy+ap [the state] ..))))
PTB: a PP adjoining to a recursive NP is bad structure
ATB: a PP adjoining to a recursive NP is good structure
Dependency Analysis
                                   PTB2.0            ATB3
Dependency (parent: head, mod)     % all   Fmeas     % all   Fmeas
NP: NPB, PP                        5.23%   94.78     5.74%   89.05
NP: NP, PP                         0.04%   30.40     1.28%   65.08
The parser distinguishes NPB, which helps for the PTB
The ATB has a wider range of attachment possibilities
Challenge for the parser
Conclusion
We need guidelines
We need to create the guidelines
Interaction between parsing and treebanking
Identify useful consistency checks
Run them as part of each release
Better understanding of problematic areas
What sort of changes are necessary?
Parsing: automatic transformations
Treebank: POS changes, etc.
Proper time allocation?