Learning and Inference for Hierarchically Split PCFGs
Slav Petrov and Dan Klein
The Game of Designing a Grammar

Annotation refines base treebank symbols to improve the statistical fit of the grammar:
- Parent annotation [Johnson ’98]
- Head lexicalization [Collins ’99, Charniak ’00]
- Automatic clustering?
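As a minimal illustration of the first refinement, parent annotation can be sketched as a recursive relabeling of treebank trees; the tuple encoding and the `^` separator here are illustrative choices, not the paper's implementation:

```python
# Sketch of parent annotation [Johnson '98]: each nonterminal is
# annotated with its parent's label, so e.g. an NP under S (a subject)
# gets a different symbol than an NP under VP (an object).
# Trees are (label, child, ...) tuples; leaves are plain word strings.

def parent_annotate(tree, parent="ROOT"):
    if isinstance(tree, str):          # leaf: a word, left unchanged
        return tree
    label, *children = tree
    new_label = f"{label}^{parent}"
    return (new_label, *[parent_annotate(c, label) for c in children])

tree = ("S", ("NP", "He"), ("VP", ("VBD", "was"), ("ADJP", "right")))
print(parent_annotate(tree))
# -> ('S^ROOT', ('NP^S', 'He'), ('VP^S', ('VBD^VP', 'was'), ('ADJP^VP', 'right')))
```

Head lexicalization would attach a head word instead of (or in addition to) the parent label; automatic clustering, the subject of this talk, learns the refinement instead of stipulating it.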
Learning Latent Annotations [Matsuzaki et al. ‘05]

EM algorithm, just like Forward-Backward for HMMs:
- Brackets are known
- Base categories are known
- Only induce subcategories

[Figure: parse tree with latent nodes X1-X7 over the sentence "He was right .", with forward and backward passes over the tree]
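A minimal sketch of the E-step's upward (inside) pass under these assumptions: the tree structure and base categories are fixed, so EM only sums over latent subcategories at each node, exactly as Forward-Backward sums over hidden states. The data structures (`binary_rule`, `lex`) and the toy probabilities are illustrative, not the Berkeley parser's API:

```python
import numpy as np

# Inside (upward) pass over latent subcategories on a FIXED tree.
# binary_rule[A][B][C] is a (kA, kB, kC) array of P(A_x -> B_y C_z);
# lex[A][word] is a (kA,) array of emission probabilities.

def inside(tree, binary_rule, lex):
    """Return the inside score vector over the root's subcategories."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return lex[label][children[0]]             # preterminal over a word
    left, right = children
    b = inside(left, binary_rule, lex)             # (kB,)
    c = inside(right, binary_rule, lex)            # (kC,)
    rule = binary_rule[label][left[0]][right[0]]   # (kA, kB, kC)
    return np.einsum("abc,b,c->a", rule, b, c)     # sum out child subcats

# Toy grammar with k = 2 subcategories per symbol.
k = 2
lex = {"DT": {"the": np.array([0.5, 0.3])},
       "NN": {"noise": np.array([0.4, 0.1])}}
binary_rule = {"NP": {"DT": {"NN": np.full((k, k, k), 0.25)}}}
tree = ("NP", ("DT", "the"), ("NN", "noise"))
print(inside(tree, binary_rule, lex))  # both subcats: 0.25 * 0.8 * 0.5 = 0.1
```

A matching downward (outside) pass would complete the E-step, giving posterior counts for every refined rule at every node.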
[Chart: Parsing accuracy (F1), 60-90, vs. total number of grammar symbols, 50-1650, for k = 1, 2, 4, 8, 16 subcategories per base symbol]
Overview

The limit of computational resources motivates:
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
Refinement of the DT tag

[Figure: flat refinement splits DT directly into DT-1, DT-2, DT-3, DT-4]

Hierarchical refinement of the DT tag

[Figure: DT is split in two repeatedly, yielding DT-1 through DT-4 via a binary split tree]
Hierarchical Estimation Results

[Chart: Parsing accuracy (F1), 74-90, vs. total number of grammar symbols, 100-1700]

Model                  F1
Baseline               87.3
Hierarchical Training  88.4
Refinement of the , tag
Splitting all categories the same amount is wasteful:
Adaptive Splitting

- Want to split complex categories more
- Idea: split everything, then roll back the splits which were least useful

A split's usefulness is measured by comparing the likelihood with the split against the likelihood with the split reversed.
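The roll-back step above can be sketched as follows, assuming each candidate split pair has an estimated likelihood loss from merging it back (in training these losses come from the inside/outside scores at every node where the category occurs; the numbers below are placeholders):

```python
# Adaptive splitting criterion: split every category, then roll back
# the split pairs whose merging costs the least likelihood.
# The paper's setting merges the bottom 50% of splits.

def select_rollbacks(merge_loss, keep_fraction=0.5):
    """Return the split pairs to roll back: those with the smallest
    estimated likelihood loss when merged."""
    ranked = sorted(merge_loss, key=merge_loss.get)     # smallest loss first
    n_rollback = int(len(ranked) * (1.0 - keep_fraction))
    return ranked[:n_rollback]

# Toy losses: NP/VP splits help a lot, X/NAC splits barely help.
merge_loss = {"NP": 120.0, "VP": 95.0, "PP": 40.0,
              "X": 0.3, "NAC": 0.8, "PRT": 2.1}
print(select_rollbacks(merge_loss))  # -> ['X', 'NAC', 'PRT']
```

This is why, in the subcategory charts below, complex categories like NP and VP end up heavily split while rare ones like X and NAC stay nearly unsplit.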
Adaptive Splitting Results

[Chart: Parsing accuracy (F1), 74-90, vs. total number of grammar symbols, 100-1700; curves for Flat Training, Hierarchical Training, and 50% Merging]

Model             F1
Previous          88.4
With 50% Merging  89.5
Number of Phrasal Subcategories

[Bar chart: number of subcategories (0-40) per phrasal category, from NP, VP, PP (most split) down to X, ROOT, LST (least split); NP, VP, and PP are highlighted as the most refined categories, X and NAC among the least]
Number of Lexical Subcategories

[Bar chart: number of subcategories (0-70) per part-of-speech tag, from NNP, JJ, NNS, NN, VBN (most split) down to punctuation and closed-class tags such as #, TO, and POS (least split); the open-class tags NN, NNS, NNP, and JJ are highlighted as the most refined]
Smoothing

- Heavy splitting can lead to overfitting
- Idea: smoothing allows us to pool statistics
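One way to sketch this pooling, assuming a simple linear interpolation: each subcategory's rule distribution is shrunk toward the mean over all subcategories of the same base symbol (the interpolation weight `alpha = 0.01` below is an illustrative value, not taken from the paper):

```python
import numpy as np

# Parameter smoothing across subcategories of one base symbol:
# p'_x = (1 - alpha) * p_x + alpha * mean_over_subcategories(p),
# so rarely-seen subcategories borrow statistics from their siblings.

def smooth(rule_probs, alpha=0.01):
    """rule_probs: (k, n_rules) array, one row per subcategory;
    each row is a probability distribution over rules."""
    mean = rule_probs.mean(axis=0, keepdims=True)   # pooled statistics
    return (1.0 - alpha) * rule_probs + alpha * mean

rule_probs = np.array([[0.9, 0.1],
                       [0.2, 0.8]])
print(smooth(rule_probs))   # rows still sum to 1, pulled toward [0.55, 0.45]
```

Because the mean of distributions is itself a distribution, the interpolation keeps every row normalized while damping extreme subcategory-specific estimates.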
[Chart: Parsing accuracy (F1), 74-90, vs. total number of grammar symbols, 100-1100; curves for Flat Training, Hierarchical Training, 50% Merging, and 50% Merging and Smoothing]

Model           F1
Previous        89.5
With Smoothing  90.7
Result Overview

Linguistic Candy

Proper Nouns (NNP):
NNP-14  Oct. Nov. Sept.
NNP-12  John Robert James
NNP-2   J. E. L.
NNP-1   Bush Noriega Peters
NNP-15  New San Wall
NNP-3   York Francisco Street

Personal pronouns (PRP):
PRP-0  It He I
PRP-1  it he they
PRP-2  it them him

Relative adverbs (RBR):
RBR-0  further lower higher
RBR-1  more less More
RBR-2  earlier Earlier later

Cardinal Numbers (CD):
CD-7   one two Three
CD-4   1989 1990 1988
CD-11  million billion trillion
CD-0   1 50 100
CD-3   1 30 31
CD-9   78 58 34
Inference

She heard the noise.

Exhaustive parsing: 1 min per sentence
Coarse-to-Fine Parsing [Goodman ‘97, Charniak & Johnson ‘05]

Treebank → parse with coarse grammar (NP, VP, …) → prune → parse with refined grammar (NP-17, NP-12, NP-1, VP-6, VP-31, …)
Hierarchical Pruning

Consider again the span 5 to 12:

coarse:          … QP NP VP …
split in two:    … QP1 QP2 NP1 NP2 VP1 VP2 …
split in four:   … QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 …
split in eight:  … … … … … …

At each level, symbols whose posterior probability on the span falls below a threshold t are pruned before parsing with the next, more refined grammar.
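The per-span pruning step can be sketched as follows; the symbol names, the refinement map, and the threshold value are illustrative:

```python
# Coarse-to-fine pruning for one span: a refined symbol is only
# considered if the posterior probability of its coarse ancestor on
# that span is at least a threshold t.

def prune(coarse_posterior, refinements, t=1e-4):
    """Keep refined symbols whose coarse ancestor survives the threshold.

    coarse_posterior: {coarse_symbol: posterior prob on this span}
    refinements:      {coarse_symbol: [refined symbols at the next level]}
    """
    allowed = []
    for coarse, post in coarse_posterior.items():
        if post >= t:                     # coarse symbol survives
            allowed.extend(refinements[coarse])
    return allowed

# Span 5-12: QP is very unlikely here, NP and VP are plausible.
coarse_posterior = {"QP": 1e-7, "NP": 0.6, "VP": 0.2}
refinements = {"QP": ["QP1", "QP2"], "NP": ["NP1", "NP2"], "VP": ["VP1", "VP2"]}
print(prune(coarse_posterior, refinements))  # -> ['NP1', 'NP2', 'VP1', 'VP2']
```

Applying this level by level (2-way, 4-way, 8-way splits, …) is what keeps the refined chart small enough to parse quickly.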
Intermediate Grammars

Learning proceeds through a sequence of increasingly refined grammars:

X-Bar = G0 → G1 → G2 → G3 → G4 → G5 → G6 = G

[Figure: DT is split hierarchically at each stage: DT → DT1 DT2 → DT1 … DT4 → DT1 … DT8]
Projected Grammars

The final grammar G is projected onto each coarser level, yielding π0(G), π1(G), π2(G), π3(G), π4(G), π5(G), with X-Bar = G0 at the coarsest level.
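A simplified sketch of such a projection: each refined rule A_x → B_y C_z contributes to the coarse rule A → B C, weighted by the probability of subcategory x given A. (The paper derives these weights from the expected symbol counts of the refined grammar; the uniform weights and symbol names below are illustrative.)

```python
from collections import defaultdict

# Project a refined grammar onto a coarser symbol set by collapsing
# subcategories: coarse rule mass is the weighted sum of the refined
# rules that map onto it.

def project(refined_rules, subcat_weight, base):
    """refined_rules: {(A_x, B_y, C_z): prob};  base: A_x -> A;
    subcat_weight:  {A_x: P(subcategory x | base symbol A)}."""
    coarse = defaultdict(float)
    for (a, b, c), p in refined_rules.items():
        coarse[(base[a], base[b], base[c])] += subcat_weight[a] * p
    return dict(coarse)

refined_rules = {("NP1", "DT1", "NN1"): 0.4, ("NP1", "DT1", "NN2"): 0.6,
                 ("NP2", "DT1", "NN1"): 0.7, ("NP2", "DT1", "NN2"): 0.3}
subcat_weight = {"NP1": 0.5, "NP2": 0.5}   # P(subcategory | NP)
base = {"NP1": "NP", "NP2": "NP", "DT1": "DT", "NN1": "NN", "NN2": "NN"}
print(project(refined_rules, subcat_weight, base))
# -> {('NP', 'DT', 'NN'): 1.0}
```

The projected grammars serve as the coarse levels for the hierarchical pruning above, so no separate coarse grammars need to be trained.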
Final Results (Efficiency)

Parsing the development set (1600 sentences):
- Berkeley Parser: 10 min (implemented in Java)
- Charniak & Johnson ‘05 parser: 19 min (implemented in C)
Final Results (Accuracy)

Language  Parser                               ≤ 40 words F1   all F1
ENG       Charniak & Johnson ‘05 (generative)  90.1            89.6
ENG       This Work                            90.6            90.1
GER       Dubey ‘05                            76.3            -
GER       This Work                            80.8            80.1
CHN       Chiang et al. ‘02                    80.0            76.6
CHN       This Work                            86.3            83.4
Extensions

- Acoustic modeling [Petrov, Pauls & Klein ‘07]
- Infinite Grammars: Nonparametric Bayesian Learning [Liang, Petrov, Jordan & Klein ‘07]
Conclusions

Split & Merge Learning:
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing

Hierarchical Coarse-to-Fine Inference:
- Projections
- Marginalization

Multi-lingual Unlexicalized Parsing
Thank You!
http://nlp.cs.berkeley.edu