Probabilistic Parsing Chapter 14, Part 2

Page 1: Probabilistic Parsing Chapter 14, Part 2

Probabilistic Parsing

Chapter 14, Part 2

• This slide set was adapted from J. Martin, R. Mihalcea, Rebecca Hwa, and Ray Mooney

Page 2: Probabilistic Parsing Chapter 14, Part 2


Vanilla PCFG Limitations

• Since probabilities of productions do not rely on specific words or concepts, only general structural disambiguation is possible (e.g. prefer to attach PPs to Nominals).

• Consequently, vanilla PCFGs cannot resolve syntactic ambiguities that require semantics to resolve, e.g. "ate spaghetti with a fork" vs. "ate spaghetti with meatballs".

• In order to work well, PCFGs must be lexicalized, i.e. the production probabilities must be conditioned on the actual words.

Page 3: Probabilistic Parsing Chapter 14, Part 2

Example of Importance of Lexicalization

• A general preference for attaching PPs to NPs rather than VPs can be learned by a vanilla PCFG.

• But the desired preference can depend on specific words.

S → NP VP      0.9
S → VP         0.1
NP → Det A N   0.5
NP → NP PP     0.3
NP → PropN     0.2
A → ε          0.6
A → Adj A      0.4
PP → Prep NP   1.0
VP → V NP      0.7
VP → VP PP     0.3

[Diagram: the English grammar above is fed to a PCFG parser, which parses "John put the dog in the pen." as S → NP VP with the PP "in the pen" attached to the VP (the correct attachment for "put").]

Page 4: Probabilistic Parsing Chapter 14, Part 2


Example of Importance of Lexicalization

• A general preference for attaching PPs to NPs rather than VPs can be learned by a vanilla PCFG.

• But the desired preference can depend on specific words.

(Same grammar and probabilities as on the previous slide.)

[Diagram: here the PCFG parser attaches the PP "in the pen" inside the NP ("the dog in the pen"), producing S → NP VP with VP → V NP; this parse is marked with an X as incorrect for "John put the dog in the pen."]

Page 5: Probabilistic Parsing Chapter 14, Part 2


Page 6: Probabilistic Parsing Chapter 14, Part 2

Head Words

• Syntactic phrases usually have a word in them that is most “central” to the phrase.

• Linguists have defined the concept of a lexical head of a phrase.

• Simple rules can identify the head of any phrase by percolating head words up the parse tree.

– Head of a VP is the main verb

– Head of an NP is the main noun

– Head of a PP is the preposition

– Head of a sentence is the head of its VP
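The head rules above can be sketched as a small percolation function. This is a minimal, hypothetical sketch: the `(label, children)` tree encoding and the tiny `HEAD_RULES` table are simplifying assumptions, not a full head-finding table such as Collins'.

```python
# Hypothetical sketch of head percolation. Trees are (label, children)
# tuples; a preterminal's single child is the word itself (a string).
# The head table below covers only the four rules from the slide.

HEAD_RULES = {
    "VP": "V",   # head of a VP is the main verb
    "NP": "NN",  # head of an NP is the main noun
    "PP": "IN",  # head of a PP is the preposition
    "S": "VP",   # head of a sentence is the head of its VP
}

def find_head(label, children):
    """Return the head word of a phrase by percolating heads upward."""
    # Preterminal such as ("NN", ["man"]): the word is the head.
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]
    wanted = HEAD_RULES.get(label)
    for child_label, child_children in children:
        if child_label == wanted:
            return find_head(child_label, child_children)
    # Fallback for labels not in the table: head of the first child.
    first_label, first_children = children[0]
    return find_head(first_label, first_children)

tree = ("S", [("NP", [("NN", ["man"])]),
              ("VP", [("V", ["bought"]),
                      ("NP", [("NN", ["book"])])])])
print(find_head(*tree))  # bought
```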


Page 7: Probabilistic Parsing Chapter 14, Part 2

Head propagation in parse trees

[Tree diagram: "the man bought the book", with head words percolated upward:
(S(bought)
  (NP(man) (DT(the) the) (NN(man) man))
  (VP(bought) (V(bought) bought)
    (NP(book) (DT(the) the) (NN(book) book))))]

Page 8: Probabilistic Parsing Chapter 14, Part 2

Lexicalized Productions

• Specialized productions can be generated by including each non-terminal's head word and its POS tag as part of that non-terminal's symbol.

[Tree diagram: lexicalized parse of "John liked the dog in the pen.", with each non-terminal annotated with its head word and POS tag:
(S(liked-VBD)
  (NP(John-NNP) (NNP John))
  (VP(liked-VBD) (VBD liked)
    (NP(dog-NN) (DT the)
      (Nominal(dog-NN)
        (Nominal(dog-NN) (NN dog))
        (PP(in-IN) (IN in)
          (NP(pen-NN) (DT the) (Nominal(pen-NN) (NN pen))))))))]

Example lexicalized production:

Nominal(dog-NN) → Nominal(dog-NN) PP(in-IN)

Page 9: Probabilistic Parsing Chapter 14, Part 2

Lexicalized Productions

[Tree diagram: lexicalized parse of "John put the dog in the pen.", with the PP attached to the VP:
(S(put-VBD)
  (NP(John-NNP) (NNP John))
  (VP(put-VBD)
    (VP(put-VBD) (VBD put)
      (NP(dog-NN) (DT the) (Nominal(dog-NN) (NN dog))))
    (PP(in-IN) (IN in)
      (NP(pen-NN) (DT the) (Nominal(pen-NN) (NN pen))))))]

Example lexicalized production:

VP(put-VBD) → VP(put-VBD) PP(in-IN)

Page 10: Probabilistic Parsing Chapter 14, Part 2

Parameters

• We used to have:

– VP → V NP PP, with parameter P(r | VP)

• That's the count of this rule divided by the number of VPs in a treebank.

• Now we have:

– VP(dumped-VBD) → V(dumped-VBD) NP(sacks-NNS) PP(into-P)

– P(r | VP whose head is dumped-VBD), where dumped is the verb, sacks is the head of the NP, and into is the head of the PP

– Not likely to have significant counts in any treebank
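The maximum-likelihood estimate described above (the count of a rule divided by the count of its left-hand side) can be sketched as follows. The tiny "treebank" of rule occurrences is invented for illustration.

```python
from collections import Counter

# Hypothetical sketch of maximum-likelihood rule probabilities:
# P(lhs -> rhs | lhs) = count(lhs -> rhs) / count(lhs).
# The rule occurrences below are made up for illustration.

rules = [
    ("VP", ("V", "NP")),
    ("VP", ("V", "NP")),
    ("VP", ("V", "NP", "PP")),
    ("VP", ("VP", "PP")),
]

rule_counts = Counter(rules)
lhs_counts = Counter(lhs for lhs, _ in rules)

def rule_prob(lhs, rhs):
    """MLE probability of expanding lhs with rhs."""
    return rule_counts[(lhs, rhs)] / lhs_counts[lhs]

print(rule_prob("VP", ("V", "NP")))        # 0.5
print(rule_prob("VP", ("V", "NP", "PP")))  # 0.25
```

With lexicalized non-terminals the same formula applies, but as the slide notes, most lexicalized rules occur so rarely that the counts are unreliable.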


Page 11: Probabilistic Parsing Chapter 14, Part 2

Challenges for Probabilistic Lexicalized Grammar

• Huge number of rules

• Most rules are too specific, so it is hard to make good probability estimates for them

• Need to make more manageable independence assumptions

• Many approaches have been developed; we'll look at one (a simplified version)


Page 12: Probabilistic Parsing Chapter 14, Part 2

Collins Parser [1997]

• Replaces production rules with a model of bilexical relationships

• Focus on head and dependency relationship (see notes in lecture)

• Generate parses in smaller steps; generate the head, then generate siblings assuming they are conditionally independent given the head


Page 13: Probabilistic Parsing Chapter 14, Part 2

Dependency Representation

• Represented as an 8-tuple:

(head word, head tag, modifier word, modifier tag, parent non-terminal, head non-terminal, modifier non-terminal, direction)

Example CFG rule:

VP(bought, VBD) → V(bought, VBD) NP(book, NN) PP(with, Prep)

decomposes into two dependencies:

(bought, VBD, book, NN, VP, V, NP, right)

(bought, VBD, with, Prep, VP, V, PP, right)


Page 14: Probabilistic Parsing Chapter 14, Part 2

Collins: (see lecture)

VP(bought, VBD) → V(bought, VBD) NP(book, NN) PP(with, Prep)
(head word, head tag, modifier word, modifier tag, parent non-terminal, head non-terminal, modifier non-terminal, direction)

The probability of an expansion decomposes as:

1. P_head(headNT | parentNT, headword, headtag) ×

2. P_left(modNT, modword, modtag | parentNT, headNT, headword, headtag) × … ×

3. P_left(STOP | parentNT, headNT, headword, headtag) ×

4. P_right(modNT, modword, modtag | parentNT, headNT, headword, headtag) × … ×

5. P_right(STOP | parentNT, headNT, headword, headtag)

For the example rule:

6. P_head(V | VP, bought, VBD) ×

7. P_left(STOP | VP, V, bought, VBD) ×

8. P_right(NP, book, NN | VP, V, bought, VBD) ×

9. P_right(PP, with, Prep | VP, V, bought, VBD) ×

10. P_right(STOP | VP, V, bought, VBD)

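The product of head, left-modifier, and right-modifier events can be sketched numerically. Every probability value below is a made-up placeholder for illustration, not an estimate from any treebank.

```python
# Hypothetical sketch of the Collins-style decomposition: the probability
# of expanding VP(bought, VBD) is one head event, times left modifiers up
# to STOP, times right modifiers up to STOP. All values are invented.

p_head  = {("V", "VP", "bought", "VBD"): 0.8}
p_left  = {("STOP", "VP", "V", "bought", "VBD"): 0.9}
p_right = {("NP", "book", "NN", "VP", "V", "bought", "VBD"): 0.3,
           ("PP", "with", "Prep", "VP", "V", "bought", "VBD"): 0.2,
           ("STOP", "VP", "V", "bought", "VBD"): 0.5}

# Steps 6-10 from the slide, multiplied together.
prob = (p_head[("V", "VP", "bought", "VBD")]
        * p_left[("STOP", "VP", "V", "bought", "VBD")]
        * p_right[("NP", "book", "NN", "VP", "V", "bought", "VBD")]
        * p_right[("PP", "with", "Prep", "VP", "V", "bought", "VBD")]
        * p_right[("STOP", "VP", "V", "bought", "VBD")])
print(round(prob, 4))  # 0.0216
```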

Page 15: Probabilistic Parsing Chapter 14, Part 2

Estimating Production Generation Parameters

• Smooth estimates by linearly interpolating with simpler models conditioned on just the POS tag or on no lexical information. Using the version in the text:

P_smooth(PP(in-IN) | VP(put-VBD)) =
  λ1 · P(PP(in-IN) | VP(put-VBD))
  + (1 − λ1) · (λ2 · P(PP(in-IN) | VP(VBD)) + (1 − λ2) · P(PP(in-IN) | VP))

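The interpolation above can be sketched directly. The λ weights and the three component probabilities below are invented values for illustration.

```python
# Hypothetical sketch of linear interpolation smoothing: back off from
# the fully lexicalized estimate to a tag-only and an unlexicalized one.
# The lambda weights and component probabilities are invented values.

def smoothed(p_lex, p_tag, p_unlex, lam1=0.6, lam2=0.7):
    """lam1*P(lexicalized) + (1-lam1)*(lam2*P(tag-only) + (1-lam2)*P(unlex))."""
    return lam1 * p_lex + (1 - lam1) * (lam2 * p_tag + (1 - lam2) * p_unlex)

# P(PP(in-IN) | VP(put-VBD)), P(PP(in-IN) | VP(VBD)), P(PP(in-IN) | VP)
print(round(smoothed(0.20, 0.10, 0.05), 3))  # 0.154
```

The weights λ1 and λ2 are typically set on held-out data so that sparse lexicalized estimates lean more heavily on the backed-off models.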

Page 16: Probabilistic Parsing Chapter 14, Part 2

Missed Context Dependence

• Another problem with CFGs is that which production is used to expand a non-terminal is independent of its context.

• However, this independence assumption is frequently violated in natural language.

– NPs that are subjects are more likely to be pronouns than NPs that are objects.


Page 17: Probabilistic Parsing Chapter 14, Part 2

Splitting Non-Terminals

• To provide more contextual information, non-terminals can be split into multiple new non-terminals based on their parent in the parse tree, using parent annotation.

– A subject NP becomes NP^S, since its parent node is an S.

– An object NP becomes NP^VP, since its parent node is a VP.


Page 18: Probabilistic Parsing Chapter 14, Part 2

Parent Annotation Example

[Tree diagram: the parse of "John liked the dog in the pen.", with every non-terminal annotated with its parent category, e.g. NP^S (subject), VP^S, VBD^VP, NP^VP (object), Nominal^NP, PP^Nominal, IN^PP, NP^PP, DT^NP, NN^Nominal.]

Example production with parent annotation:

VP^S → VBD^VP NP^VP
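The parent-annotation transform can be sketched as a short recursive function. This is a hypothetical minimal implementation over `(label, children)` tuples, not the split-and-merge machinery itself.

```python
# Hypothetical sketch of parent annotation: every non-terminal is renamed
# to "label^parent", so a subject NP becomes NP^S and an object NP becomes
# NP^VP. Trees are (label, children) tuples; a preterminal's child is the
# word itself (a string).

def annotate(node, parent=None):
    label, children = node
    new_label = f"{label}^{parent}" if parent else label
    if len(children) == 1 and isinstance(children[0], str):
        return (new_label, children)  # preterminal: keep the word as-is
    return (new_label, [annotate(child, label) for child in children])

tree = ("S", [("NP", [("NNP", ["John"])]),
              ("VP", [("VBD", ["liked"]),
                      ("NP", [("NN", ["dog"])])])])
print(annotate(tree))
# ('S', [('NP^S', [('NNP^NP', ['John'])]),
#        ('VP^S', [('VBD^VP', ['liked']), ('NP^VP', [('NN^NP', ['dog'])])])])
```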

Page 19: Probabilistic Parsing Chapter 14, Part 2

Split and Merge

• Non-terminal splitting greatly increases the size of the grammar and the number of parameters that need to be learned from limited training data.

• Best approach is to only split non-terminals when it improves the accuracy of the grammar.

• May also help to merge some non-terminals to remove unhelpful distinctions and learn more accurate parameters for the merged productions.

• Method: Heuristically search for a combination of splits and merges that produces a grammar that maximizes the likelihood of the training treebank.


Page 20: Probabilistic Parsing Chapter 14, Part 2


Treebanks

• English Penn Treebank: the standard corpus for testing syntactic parsing, consisting of 1.2M words of text from the Wall Street Journal (WSJ).

• Typical to train on about 40,000 parsed sentences and test on an additional standard disjoint test set of 2,416 sentences.

• Chinese Penn Treebank: 100K words from the Xinhua news service.

• Other corpora exist in many languages; see the Wikipedia article “Treebank”

Page 21: Probabilistic Parsing Chapter 14, Part 2

First WSJ Sentence


( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken) ) (, ,) (ADJP (NP (CD 61) (NNS years) ) (JJ old) ) (, ,) ) (VP (MD will) (VP (VB join) (NP (DT the) (NN board) ) (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director) )) (NP-TMP (NNP Nov.) (CD 29) ))) (. .) ))

Page 22: Probabilistic Parsing Chapter 14, Part 2

WSJ Sentence with Trace (NONE)


( (S (NP-SBJ (DT The) (NNP Illinois) (NNP Supreme) (NNP Court) ) (VP (VBD ordered) (NP-1 (DT the) (NN commission) ) (S (NP-SBJ (-NONE- *-1) ) (VP (TO to) (VP (VP (VB audit) (NP (NP (NNP Commonwealth) (NNP Edison) (POS 's) ) (NN construction) (NNS expenses) )) (CC and) (VP (VB refund) (NP (DT any) (JJ unreasonable) (NNS expenses) )))))) (. .) ))

Page 23: Probabilistic Parsing Chapter 14, Part 2


Parsing Evaluation Metrics

• PARSEVAL metrics measure the fraction of the constituents that match between the computed and human parse trees. If P is the system’s parse tree and T is the human parse tree (the “gold standard”):

– Recall = (# correct constituents in P) / (# constituents in T)

– Precision = (# correct constituents in P) / (# constituents in P)

• Labeled precision and labeled recall require getting the non-terminal label on the constituent node correct to count as correct.

• F1 is the harmonic mean of precision and recall.
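The PARSEVAL computation can be sketched over labeled spans, where each constituent is a `(label, start, end)` tuple. The example gold and predicted spans below are invented for illustration.

```python
from collections import Counter

# Hypothetical sketch of labeled PARSEVAL: compare the system's
# (label, start, end) constituents against the gold ones as multisets.

def parseval(gold, predicted):
    g, p = Counter(gold), Counter(predicted)
    correct = sum((g & p).values())  # multiset intersection = matched spans
    recall = correct / sum(g.values())
    precision = correct / sum(p.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented spans: the predicted tree gets one NP span wrong.
gold = [("S", 0, 6), ("NP", 0, 1), ("VP", 1, 6), ("NP", 2, 6), ("PP", 4, 6)]
pred = [("S", 0, 6), ("NP", 0, 1), ("VP", 1, 6), ("NP", 2, 4), ("PP", 4, 6)]
p, r, f = parseval(gold, pred)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.8 0.8 0.8
```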

Page 24: Probabilistic Parsing Chapter 14, Part 2

Computing Evaluation Metrics

[Diagram: the correct tree T and the computed tree P for "book the flight through Houston". In T the PP "through Houston" attaches to the Nominal ("the flight through Houston"); in P it attaches to the VP.]

# Constituents in T: 12   # Constituents in P: 12   # Correct constituents in P: 10

Recall = 10/12 = 83.3%   Precision = 10/12 = 83.3%   F1 = 83.3%

Page 25: Probabilistic Parsing Chapter 14, Part 2


Treebank Results

• Results of current state-of-the-art systems on the English Penn WSJ treebank are slightly greater than 90% labeled precision and recall.

Page 26: Probabilistic Parsing Chapter 14, Part 2

Discriminative Parse Reranking

• Motivation: Even when the top-ranked parse is not correct, the correct parse is frequently among those ranked highly by a statistical parser.

• Use a classifier that is trained to select the best parse from the N-best parses produced by the original parser.

• Reranker can exploit global features of the entire parse whereas a PCFG is restricted to making decisions based on local info.


Page 27: Probabilistic Parsing Chapter 14, Part 2

2-Stage Reranking Approach

• Adapt the PCFG parser to produce an N-best list of the most probable parses in addition to the most-likely one.

• Extract from each of these parses a set of global features that help determine whether it is a good parse tree.

• Train a classifier (e.g. logistic regression) using the best parse in each N-best list as a positive example and the others as negative examples.
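The selection step can be sketched as a linear scorer over global features. The feature names and weight values below are invented for illustration; in practice a trained classifier such as logistic regression would supply the weights.

```python
# Hypothetical sketch of the second reranking stage: score each parse in
# the N-best list with a linear model over global features and return the
# highest-scoring tree. All feature values and weights are invented.

def rerank(nbest, weights):
    """nbest: list of (tree, feature_dict); return the best-scoring tree."""
    def score(features):
        return sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return max(nbest, key=lambda item: score(item[1]))[0]

weights = {"log_pcfg_prob": 1.0, "right_branching": 0.5,
           "parallel_conjuncts": 0.3}
nbest = [("tree_1", {"log_pcfg_prob": -10.2, "right_branching": 0.9}),
         ("tree_2", {"log_pcfg_prob": -10.0, "right_branching": 0.2})]
print(rerank(nbest, weights))  # tree_1
```

Here tree_1 wins because its right-branching bonus (0.45) outweighs tree_2's slightly higher PCFG log-probability, illustrating how global features can overturn the parser's own ranking.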


Page 28: Probabilistic Parsing Chapter 14, Part 2

Parse Reranking


[Diagram: sentence → PCFG Parser → N-best parse trees → Parse Tree Feature Extractor → parse tree descriptions → Parse Tree Classifier → best parse tree]

Page 29: Probabilistic Parsing Chapter 14, Part 2

Sample Parse Tree Features

• Probability of the parse from the PCFG.

• The number of parallel conjuncts.

– “the bird in the tree and the squirrel on the ground”

– “the bird and the squirrel in the tree”

• The degree to which the parse tree is right-branching.

– English parses tend to be right-branching (cf. the parse of “Book the flight through Houston”).

• Frequency of various tree fragments, i.e. specific combinations of 2 or 3 rules.


Page 30: Probabilistic Parsing Chapter 14, Part 2

Evaluation of Reranking

• Reranking is limited by oracle accuracy, i.e. the accuracy that results when an omniscient oracle picks the best parse from the N-best list.

• Typical current oracle accuracy is around F1 = 97%.

• Reranking can generally improve the test accuracy of current PCFG models by a percentage point or two.


Page 31: Probabilistic Parsing Chapter 14, Part 2

Dependency Grammars

• An alternative to phrase-structure grammar is to define a parse as a directed graph between the words of a sentence representing dependencies between the words.


[Diagram: two dependency parses of "John liked the dog in the pen". Left: an unlabeled parse with "liked" as the root governing "John" and "dog", "dog" governing "the" and "in", and "in" governing "pen". Right: the same structure as a typed dependency parse, with arc labels nsubj (liked → John), dobj (liked → dog), and det on each determiner "the".]

Page 32: Probabilistic Parsing Chapter 14, Part 2

Dependency Graph from Parse Tree

• Can convert a phrase-structure parse to a dependency tree by making the head of each non-head child of a node depend on the head of the head child.
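The conversion rule above can be sketched as follows. The `(label, head_word, children)` tree encoding, with head words assumed to be already percolated, is a simplifying assumption for illustration.

```python
# Hypothetical sketch of phrase-structure-to-dependency conversion: after
# head words have been percolated, each non-head child's head word
# depends on the head child's head word. Nodes are
# (label, head_word, children) tuples; leaves are plain word strings.

def dependencies(node, deps=None):
    if deps is None:
        deps = []
    label, head, children = node
    for child in children:
        if isinstance(child, str):
            continue  # leaf word: nothing below it
        _, child_head, _ = child
        if child_head != head:  # non-head child depends on the head
            deps.append((head, child_head))
        dependencies(child, deps)
    return deps

tree = ("S", "liked",
        [("NP", "John", ["John"]),
         ("VP", "liked",
          [("V", "liked", ["liked"]),
           ("NP", "dog",
            [("DT", "the", ["the"]), ("NN", "dog", ["dog"])])])])
print(dependencies(tree))  # [('liked', 'John'), ('liked', 'dog'), ('dog', 'the')]
```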


[Diagram: the lexicalized parse tree of "John liked the dog in the pen" (heads percolated, as on the earlier slide) shown next to the dependency graph derived from it, with "liked" as the root governing "John" and "dog".]

Page 33: Probabilistic Parsing Chapter 14, Part 2

Statistical Parsing Conclusions

• Statistical models such as PCFGs allow for probabilistic resolution of ambiguities.

• PCFGs can be easily learned from treebanks.

• Lexicalization and non-terminal splitting are required to effectively resolve many ambiguities.

• Current statistical parsers are quite accurate but not yet at the level of human-expert agreement.
