Learning and Inference for Hierarchically Split PCFGs Slav Petrov and Dan Klein


Page 1: Learning and Inference  for Hierarchically Split PCFGs

Learning and Inference for Hierarchically Split PCFGs

Slav Petrov and Dan Klein

Page 2: Learning and Inference  for Hierarchically Split PCFGs

The Game of Designing a Grammar

Annotation refines base treebank symbols to improve the statistical fit of the grammar:
- Parent annotation [Johnson ’98]

Page 3: Learning and Inference  for Hierarchically Split PCFGs

The Game of Designing a Grammar

Annotation refines base treebank symbols to improve the statistical fit of the grammar:
- Parent annotation [Johnson ’98]
- Head lexicalization [Collins ’99, Charniak ’00]

Page 4: Learning and Inference  for Hierarchically Split PCFGs

The Game of Designing a Grammar

Annotation refines base treebank symbols to improve the statistical fit of the grammar:
- Parent annotation [Johnson ’98]
- Head lexicalization [Collins ’99, Charniak ’00]
- Automatic clustering?
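The parent-annotation transform can be sketched in a few lines; the tree encoding and the helper name below are illustrative assumptions, not from the slides:

```python
# A minimal sketch of parent annotation [Johnson '98]: each nonterminal
# is refined with its parent's label. Internal nodes are (label, child, ...)
# tuples; words are plain strings (an illustrative encoding).
def parent_annotate(node, parent="ROOT"):
    if isinstance(node, str):          # a word: left unchanged
        return node
    label, *children = node
    new_label = f"{label}^{parent}"    # e.g. NP under S becomes NP^S
    return (new_label, *(parent_annotate(c, label) for c in children))

tree = ("S", ("NP", "He"), ("VP", ("V", "was"), ("ADJ", "right")))
annotated = parent_annotate(tree)
```

Refitting a PCFG on trees transformed this way yields the finer-grained symbols the slide describes, at the cost of a larger grammar.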

Page 5: Learning and Inference  for Hierarchically Split PCFGs

Learning Latent Annotations

EM algorithm, just like Forward-Backward for HMMs:
- Brackets are known
- Base categories are known
- Only induce subcategories

(Figure: tree of latent symbols X1–X7 over the sentence “He was right .”, with forward and backward passes along the tree.)

[Matsuzaki et al. ‘05]
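Because the bracketing is fixed, the E-step is exactly tree-shaped forward-backward: inside scores go bottom-up, outside scores top-down. The following toy sketch (all parameters hypothetical and randomly initialized, K = 2 subcategories per symbol) computes the subcategory posteriors at the leaves of a single known tree:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 2  # latent subcategories per base symbol (illustrative)

# Internal node: (label, left, right); leaf: (tag, word).
tree = ("S", ("NP", "He"), ("VP", ("V", "was"), ("ADJ", "right")))

binary, emit = {}, {}  # (A,B,C) -> (K,K,K) probs; leaf -> (K,) emissions

def init(node):
    if isinstance(node[1], str):                 # leaf (tag, word)
        emit[node] = rng.random(K)
    else:
        A, L, R = node
        p = rng.random((K, K, K))
        binary[(A, L[0], R[0])] = p / p.sum()    # toy normalization
        init(L); init(R)

def inside(node):
    """Bottom-up scores (the 'backward'-style pass on the tree)."""
    if isinstance(node[1], str):
        return emit[node]
    A, L, R = node
    p = binary[(A, L[0], R[0])]
    return np.einsum('abc,b,c->a', p, inside(L), inside(R))

def leaf_posteriors(node, out_vec, post):
    """Top-down outside pass; collects unnormalized leaf posteriors."""
    if isinstance(node[1], str):
        post[node] = out_vec * emit[node]
        return
    A, L, R = node
    p = binary[(A, L[0], R[0])]
    iL, iR = inside(L), inside(R)
    leaf_posteriors(L, np.einsum('a,abc,c->b', out_vec, p, iR), post)
    leaf_posteriors(R, np.einsum('a,abc,b->c', out_vec, p, iL), post)

init(tree)
likelihood = inside(tree).sum()      # uniform outside score at the root
post = {}
leaf_posteriors(tree, np.ones(K), post)
# Invariant: at every leaf, the posterior mass sums to the tree likelihood.
```

Normalizing each posterior by the likelihood and accumulating over the treebank gives the expected counts for the M-step.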

Page 6: Learning and Inference  for Hierarchically Split PCFGs

Overview

(Figure: parsing accuracy (F1) vs. total number of grammar symbols, for k = 2, 4, 8, 16, … subcategories per symbol.)

Limit of computational resources motivates:
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing

Page 7: Learning and Inference  for Hierarchically Split PCFGs

Refinement of the DT tag

(Figure: DT split into subcategories DT-1, DT-2, DT-3, DT-4.)

Page 8: Learning and Inference  for Hierarchically Split PCFGs

Refinement of the DT tag

Page 9: Learning and Inference  for Hierarchically Split PCFGs

Hierarchical refinement of the DT tag
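One step of such hierarchical refinement can be sketched as splitting every subcategory in two, copying its parameters and adding a little noise to break symmetry before EM refits them. The noise level and the toy DT distribution below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def split_in_two(probs, eps=0.01):
    """probs: (K, V) rule probabilities for K subcategories -> (2K, V)."""
    doubled = np.repeat(probs, 2, axis=0)          # DT-i -> two children
    noise = rng.uniform(-eps, eps, doubled.shape)  # break symmetry for EM
    perturbed = np.clip(doubled * (1 + noise), 1e-12, None)
    return perturbed / perturbed.sum(axis=1, keepdims=True)

dt = np.array([[0.6, 0.3, 0.1]])   # DT with one subcategory, 3 words (toy)
for _ in range(3):                 # DT -> 2 -> 4 -> 8 subcategories
    dt = split_in_two(dt)
```

Because each child starts near its parent, EM at each level only has to learn the new, finer distinctions rather than re-learn everything from scratch.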

Page 10: Learning and Inference  for Hierarchically Split PCFGs

Hierarchical Estimation Results

(Figure: parsing accuracy (F1) vs. total number of grammar symbols.)

Model                  F1
Baseline               87.3
Hierarchical Training  88.4

Page 11: Learning and Inference  for Hierarchically Split PCFGs

Refinement of the , tag

Splitting all categories the same amount is wasteful:

Page 12: Learning and Inference  for Hierarchically Split PCFGs

Adaptive Splitting

- Want to split complex categories more
- Idea: split everything, roll back the splits which were least useful

Merge criterion: compare the likelihood with the split against the likelihood with the split reversed.

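The rollback step can be sketched as a ranking problem: score each candidate split by the likelihood sacrificed if it were reversed, then merge the least useful half. The split names and log-likelihoods below are made up for illustration:

```python
# For each split: (log-likelihood with the split, with the split reversed).
# A small gap means the split buys almost nothing and can be rolled back.
splits = {
    "NP-1/NP-2": (-1000.0, -1080.0),
    "VP-1/VP-2": (-1000.0, -1050.0),
    "DT-1/DT-2": (-1000.0, -1002.0),
    ",-1/,-2":   (-1000.0, -1000.1),
}

def rollback_half(splits):
    """Return the half of the splits whose reversal costs the least."""
    ranked = sorted(splits, key=lambda s: splits[s][0] - splits[s][1])
    return set(ranked[:len(ranked) // 2])

merged = rollback_half(splits)   # the least useful 50% get merged back
```

This matches the intuition on the earlier slide: complex categories like NP keep their splits, while near-useless splits such as those of the comma tag are undone.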

Page 14: Learning and Inference  for Hierarchically Split PCFGs

Adaptive Splitting Results

(Figure: parsing accuracy (F1) vs. total number of grammar symbols for flat training, hierarchical training, and 50% merging.)

Model             F1
Previous          88.4
With 50% Merging  89.5

Page 15: Learning and Inference  for Hierarchically Split PCFGs

Number of Phrasal Subcategories

(Figure: bar chart, 0–40 subcategories per phrasal category, in decreasing order: NP, VP, PP, ADVP, S, ADJP, SBAR, QP, WHNP, PRN, NX, SINV, PRT, WHPP, SQ, CONJP, FRAG, NAC, UCP, WHADVP, INTJ, SBARQ, RRC, WHADJP, X, ROOT, LST.)

Page 16: Learning and Inference  for Hierarchically Split PCFGs

Number of Phrasal Subcategories

(Figure: the same bar chart with NP, VP, and PP highlighted as the most heavily split categories.)

Page 17: Learning and Inference  for Hierarchically Split PCFGs

Number of Phrasal Subcategories

(Figure: the same bar chart with NAC and X highlighted as among the least split categories.)

Page 18: Learning and Inference  for Hierarchically Split PCFGs

Number of Lexical Subcategories

(Figure: bar chart, 0–70 subcategories per part-of-speech tag, in decreasing order: NNP, JJ, NNS, NN, VBN, RB, VBG, VB, VBD, CD, IN, VBZ, VBP, DT, NNPS, CC, JJR, JJS, :, PRP, PRP$, MD, RBR, WP, POS, PDT, WRB, -LRB-, ., EX, WP$, WDT, -RRB-, '', FW, RBS, TO, $, UH, ,, ``, SYM, RP, LS, #; the lightly split TO, ",", and POS are highlighted.)

Page 19: Learning and Inference  for Hierarchically Split PCFGs

Number of Lexical Subcategories

(Figure: the same bar chart with NN, NNS, NNP, and JJ highlighted as the most heavily split tags.)

Page 20: Learning and Inference  for Hierarchically Split PCFGs

Smoothing

- Heavy splitting can lead to overfitting
- Idea: smoothing allows us to pool statistics
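One simple way to pool statistics is to interpolate each subcategory's rule distribution with the mean over its siblings; this sketch assumes linear interpolation with an illustrative weight alpha:

```python
import numpy as np

def smooth(probs, alpha=0.01):
    """probs: (K, V) distributions for the K subcategories of one symbol.
    Shrinks each subcategory toward the shared mean, pooling statistics
    across siblings so rare events are not overfit to one subcategory."""
    mean = probs.mean(axis=0, keepdims=True)
    return (1 - alpha) * probs + alpha * mean   # rows still sum to 1

dt = np.array([[0.9, 0.1, 0.0],     # toy DT subcategory distributions
               [0.1, 0.2, 0.7]])
sm = smooth(dt, alpha=0.5)          # large alpha to make the effect visible
```

Note how an event with zero probability under one subcategory picks up a small probability from its sibling, which is exactly the overfitting protection the slide describes.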

Page 21: Learning and Inference  for Hierarchically Split PCFGs

Result Overview

(Figure: parsing accuracy (F1) vs. total number of grammar symbols for flat training, hierarchical training, 50% merging, and 50% merging with smoothing.)

Model           F1
Previous        89.5
With Smoothing  90.7

Page 22: Learning and Inference  for Hierarchically Split PCFGs

Linguistic Candy

Proper nouns (NNP):
NNP-14  Oct. Nov. Sept.
NNP-12  John Robert James
NNP-2   J. E. L.
NNP-1   Bush Noriega Peters
NNP-15  New San Wall
NNP-3   York Francisco Street

Personal pronouns (PRP):
PRP-0  It He I
PRP-1  it he they
PRP-2  it them him

Page 23: Learning and Inference  for Hierarchically Split PCFGs

Linguistic Candy

Relative adverbs (RBR):
RBR-0  further lower higher
RBR-1  more less More
RBR-2  earlier Earlier later

Cardinal numbers (CD):
CD-7   one two Three
CD-4   1989 1990 1988
CD-11  million billion trillion
CD-0   1 50 100
CD-3   1 30 31
CD-9   78 58 34

Page 24: Learning and Inference  for Hierarchically Split PCFGs

Inference

She heard the noise.

Exhaustive parsing: 1 min per sentence

Page 25: Learning and Inference  for Hierarchically Split PCFGs

Coarse-to-Fine Parsing
[Goodman ‘97, Charniak & Johnson ‘05]

(Figure: pipeline. The treebank yields a coarse grammar (NP, VP, …); parsing with it prunes the chart, and the refined grammar (NP-17, NP-12, NP-1, VP-6, VP-31, …) then parses what survives.)

Page 26: Learning and Inference  for Hierarchically Split PCFGs

Hierarchical Pruning

Consider again the span 5 to 12:

coarse:          … QP NP VP …
split in two:    … QP1 QP2 NP1 NP2 VP1 VP2 …
split in four:   … QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 …
split in eight:  … (only the items that survived the previous round)

At each level, items whose posterior falls below a threshold t are pruned before the next, more refined grammar is applied.
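One pruning round can be sketched as a posterior threshold followed by refinement of the survivors; the labels, spans, posteriors, and threshold below are invented for illustration:

```python
def prune(coarse_posteriors, t):
    """Keep only (label, span) items whose coarse posterior reaches t."""
    return {item for item, p in coarse_posteriors.items() if p >= t}

def refine(item):
    """Split a surviving item in two for the next, finer grammar."""
    label, span = item
    return [(f"{label}{i}", span) for i in (1, 2)]

coarse = {("QP", (5, 12)): 1e-9,    # negligible mass: gets pruned
          ("NP", (5, 12)): 0.21,
          ("VP", (5, 12)): 0.03}
survivors = prune(coarse, t=1e-4)
agenda = [fine for item in survivors for fine in refine(item)]
```

Because each grammar only has to score the split versions of the previous round's survivors, the work per level stays roughly constant even as the number of subcategories doubles.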

Page 27: Learning and Inference  for Hierarchically Split PCFGs

Intermediate Grammars

Learning: X-Bar = G0 → G1 → G2 → G3 → G4 → G5 → G6 = G

(Figure: the DT tag along the way: DT; then DT1 DT2; then DT1 DT2 DT3 DT4; …; finally DT1 … DT8.)

Page 28: Learning and Inference  for Hierarchically Split PCFGs

Projected Grammars

Learning: X-Bar = G0 → G1 → G2 → G3 → G4 → G5 → G6 = G

Projection: instead of the learned intermediate grammars, use projections of the final grammar: π0(G), π1(G), π2(G), π3(G), π4(G), π5(G), G.
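Projection can be sketched as an expectation over the parent symbol's subcategory frequencies; the rules, masses, and frequencies below are made up, and the child subcategories are assumed to be summed out already:

```python
import numpy as np

# Refined-rule mass for each of A's two subcategories (child subcats
# already summed out). Each column forms a distribution: given A-1,
# rule "A -> B C" has mass 0.7 and "A -> D E" has mass 0.3, etc.
rules = {
    "A -> B C": np.array([0.7, 0.2]),
    "A -> D E": np.array([0.3, 0.8]),
}
freq = np.array([0.5, 0.5])   # how often the subcategories A-1 / A-2 occur

# Coarse rule probability = expectation under the subcategory frequencies.
coarse = {r: float(freq @ mass) for r, mass in rules.items()}
```

The projected rule probabilities still form a proper distribution for A, so each πi(G) is itself a valid PCFG that can be used for pruning.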

Page 29: Learning and Inference  for Hierarchically Split PCFGs

Final Results (Efficiency)

Parsing the development set (1600 sentences):

Berkeley Parser: 10 min (implemented in Java)
Charniak & Johnson ‘05 parser: 19 min (implemented in C)

Page 30: Learning and Inference  for Hierarchically Split PCFGs

Final Results (Accuracy)

      Parser                               ≤ 40 words F1   all F1
ENG   Charniak & Johnson ‘05 (generative)  90.1            89.6
      This Work                            90.6            90.1
GER   Dubey ‘05                            76.3            -
      This Work                            80.8            80.1
CHN   Chiang et al. ‘02                    80.0            76.6
      This Work                            86.3            83.4

Page 31: Learning and Inference  for Hierarchically Split PCFGs

Extensions

Acoustic modeling [Petrov, Pauls & Klein ‘07]

Infinite Grammars: Nonparametric Bayesian Learning [Liang, Petrov, Jordan & Klein ‘07]

Page 32: Learning and Inference  for Hierarchically Split PCFGs

Conclusions

- Split & Merge Learning: Hierarchical Training, Adaptive Splitting, Parameter Smoothing
- Hierarchical Coarse-to-Fine Inference: Projections, Marginalization
- Multi-lingual Unlexicalized Parsing

Page 33: Learning and Inference  for Hierarchically Split PCFGs

Thank You!

http://nlp.cs.berkeley.edu