PCFG: Probabilistic Context Free Grammars

Presenter: Ba Dat Nguyen
Advisor: Dr. Martin Theobald
Max-Planck-Institut für Informatik, Saarbrücken, Germany

Source: resources.mpi-inf.mpg.de/d5/teaching/ss11/ie/slides/dat_ba_nguyen.pdf (Max Planck Society)



Outline

• Introduction

• Probabilistic Context Free Grammars
    Parsing
    Context Free Grammars
    Probabilistic Context Free Grammars
    Inside-Outside Algorithm

• Extension
    Distance
    Complement/adjunct distinction
    Traces and Wh-movement

The World is a big ambiguity


Solution

PCFGs are a good way to resolve ambiguity in syntactic structure.


Language and Grammar

• Language
    Structural
    Ambiguous

• Grammar
    Generalization of regularities in language structures
    Morphology and syntax

Parsing

• Parsing is the process of working out the grammatical structure of sentences.

• Basic parsing algorithms
    Parsing strategies
    CYK algorithm
    Earley algorithm

Example of parsing

• “She is a nice girl”

(S (NP (PRP She))
   (VP (VBZ is)
       (NP (DT a) (JJ nice) (NN girl))))


Chomsky hierarchy

Context-sensitive: $\alpha A \beta \to \alpha \gamma \beta$
Context-free: $A \to \gamma$
Regular: $A \to a$ or $A \to aB$

Where:
    $A$, $B$ are nonterminals
    $a$ is a terminal
    $\alpha$, $\beta$, $\gamma$ are strings of terminals and nonterminals

Context Free Grammars (CFG)

• A context-free grammar consists of
    A set of terminals $\{w^k\}$, $k = 1, \dots, V$
    A set of nonterminals $\{N^i\}$, $i = 1, \dots, n$
    A designated start symbol $N^1$
    A set of rules $\{N^i \to \xi^j\}$, where $\xi^j$ is a sequence of terminals and nonterminals

Example of CFG

S  -> NP VP        NP -> NP PP
PP -> P NP         NP -> astronomers
VP -> V NP         NP -> ears
VP -> VP PP        NP -> saw
P  -> with         NP -> stars
V  -> saw          NP -> telescopes

Ambiguous sentences

The sentence “astronomers saw stars with ears” has two parses:

(S (NP astronomers)
   (VP (V saw)
       (NP (NP stars)
           (PP (P with) (NP ears)))))

(S (NP astronomers)
   (VP (VP (V saw) (NP stars))
       (PP (P with) (NP ears))))

Which one is better?
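
The ambiguity can be checked mechanically. The following is a minimal sketch, assuming the example CFG is used as-is (it is already in Chomsky normal form) and counting trees with a CYK-style chart; the names `BINARY`, `LEXICAL` and `count_parses` are illustrative and not part of the slides.

```python
from collections import defaultdict

# Rules of the example CFG, split into binary and lexical rules.
BINARY = [("S", "NP", "VP"), ("PP", "P", "NP"), ("VP", "V", "NP"),
          ("VP", "VP", "PP"), ("NP", "NP", "PP")]
LEXICAL = [("P", "with"), ("V", "saw"), ("NP", "astronomers"), ("NP", "ears"),
           ("NP", "saw"), ("NP", "stars"), ("NP", "telescopes")]

def count_parses(words, start="S"):
    """Count the parse trees of `words` rooted in `start` with a CYK-style chart."""
    m = len(words)
    # chart[(p, q)][A] = number of trees rooted in A spanning words p..q (0-based, inclusive)
    chart = defaultdict(lambda: defaultdict(int))
    for k, word in enumerate(words):
        for lhs, terminal in LEXICAL:
            if terminal == word:
                chart[(k, k)][lhs] += 1
    for span in range(2, m + 1):              # span length, shortest first
        for p in range(0, m - span + 1):
            q = p + span - 1
            for d in range(p, q):             # split point: left = p..d, right = d+1..q
                for lhs, left, right in BINARY:
                    chart[(p, q)][lhs] += chart[(p, d)][left] * chart[(d + 1, q)][right]
    return chart[(0, m - 1)][start]

n_parses = count_parses("astronomers saw stars with ears".split())
print(n_parses)
```

The chart only counts trees; it gives no reason to prefer one parse over the other, which is what the rule probabilities add.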


Probabilistic CFG

• A probabilistic context-free grammar (PCFG) consists of
    A CFG
    A corresponding set of probabilities on rules such that:

$$\forall i: \quad \sum_{j} P(N^i \to \xi^j) = 1$$

Example of PCFG

S  -> NP VP    1.0        NP -> NP PP         0.4
PP -> P NP     1.0        NP -> astronomers   0.1
VP -> V NP     0.7        NP -> ears          0.18
VP -> VP PP    0.3        NP -> saw           0.04
P  -> with     1.0        NP -> stars         0.18
V  -> saw      1.0        NP -> telescopes    0.1
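
The constraint that the probabilities of all rules with the same left-hand side sum to 1 can be verified directly for this grammar. A small sketch, assuming the rules are stored in a dictionary keyed by `(lhs, rhs)`; the names `PCFG` and `rule_mass` are illustrative only.

```python
import math
from collections import defaultdict

# The example PCFG: (left-hand side, right-hand side) -> probability.
PCFG = {
    ("S", ("NP", "VP")): 1.0,
    ("PP", ("P", "NP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("P", ("with",)): 1.0,
    ("V", ("saw",)): 1.0,
    ("NP", ("NP", "PP")): 0.4,
    ("NP", ("astronomers",)): 0.1,
    ("NP", ("ears",)): 0.18,
    ("NP", ("saw",)): 0.04,
    ("NP", ("stars",)): 0.18,
    ("NP", ("telescopes",)): 0.1,
}

def rule_mass(pcfg):
    """Sum the rule probabilities for each left-hand-side nonterminal."""
    totals = defaultdict(float)
    for (lhs, _rhs), prob in pcfg.items():
        totals[lhs] += prob
    return dict(totals)

masses = rule_mass(PCFG)
# isclose absorbs floating-point rounding in the NP sum.
is_proper = all(math.isclose(total, 1.0) for total in masses.values())
print(masses, is_proper)
```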

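Probability of a tree

(NP (NP stars)
    (PP (P with) (NP ears)))

By the chain rule:

$$
\begin{aligned}
& P(\mathrm{NP} \to \mathrm{NP\ PP},\ \mathrm{NP} \to \text{stars},\ \mathrm{PP} \to \mathrm{P\ NP},\ \mathrm{P} \to \text{with},\ \mathrm{NP} \to \text{ears}) \\
&= P(\mathrm{NP} \to \mathrm{NP\ PP}) \\
&\quad \times P(\mathrm{NP} \to \text{stars} \mid \mathrm{NP} \to \mathrm{NP\ PP}) \\
&\quad \times P(\mathrm{PP} \to \mathrm{P\ NP} \mid \mathrm{NP} \to \mathrm{NP\ PP},\ \mathrm{NP} \to \text{stars}) \\
&\quad \times P(\mathrm{P} \to \text{with} \mid \mathrm{NP} \to \mathrm{NP\ PP},\ \mathrm{NP} \to \text{stars},\ \mathrm{PP} \to \mathrm{P\ NP}) \\
&\quad \times P(\mathrm{NP} \to \text{ears} \mid \mathrm{NP} \to \mathrm{NP\ PP},\ \mathrm{NP} \to \text{stars},\ \mathrm{PP} \to \mathrm{P\ NP},\ \mathrm{P} \to \text{with})
\end{aligned}
$$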

Assumptions

• Place invariance: $\forall k,\ P(N^j_{k(k+c)} \to \xi)$ is the same

• Context-free: $P(N^j_{kl} \to \xi \mid \text{anything outside } k \text{ through } l) = P(N^j_{kl} \to \xi)$

• Ancestor-free: $P(N^j_{kl} \to \xi \mid \text{any ancestor nodes outside } N^j_{kl}) = P(N^j_{kl} \to \xi)$

Here $N^j_{k(k+c)}$ denotes the nonterminal $N^j$ dominating the words $w_k \dots w_{k+c}$.

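Probability of a tree

(NP (NP stars)
    (PP (P with) (NP ears)))

Under the independence assumptions, the probability of the tree is the product of its rule probabilities:

$$
\begin{aligned}
& P(\mathrm{NP} \to \mathrm{NP\ PP},\ \mathrm{NP} \to \text{stars},\ \mathrm{PP} \to \mathrm{P\ NP},\ \mathrm{P} \to \text{with},\ \mathrm{NP} \to \text{ears}) \\
&= P(\mathrm{NP} \to \mathrm{NP\ PP}) \times P(\mathrm{NP} \to \text{stars}) \times P(\mathrm{PP} \to \mathrm{P\ NP}) \\
&\quad \times P(\mathrm{P} \to \text{with}) \times P(\mathrm{NP} \to \text{ears})
\end{aligned}
$$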

Ambiguity

First parse (each node is subscripted with the probability of the rule that expands it):

(S_1.0 (NP_0.1 astronomers)
       (VP_0.7 (V_1.0 saw)
               (NP_0.4 (NP_0.18 stars)
                       (PP_1.0 (P_1.0 with) (NP_0.18 ears)))))

1.0 x 0.1 x 0.7 x 1.0 x 0.4 x 0.18 x 1.0 x 1.0 x 0.18 = 0.0009072

Second parse:

(S_1.0 (NP_0.1 astronomers)
       (VP_0.3 (VP_0.7 (V_1.0 saw) (NP_0.18 stars))
               (PP_1.0 (P_1.0 with) (NP_0.18 ears))))

1.0 x 0.1 x 0.3 x 0.7 x 1.0 x 0.18 x 1.0 x 1.0 x 0.18 = 0.0006804
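
The two products can be reproduced by multiplying the rule probabilities along each tree. This is a minimal sketch, assuming trees are written as nested tuples `(label, child, ...)` with words as plain strings; `RULE_PROB`, `T1`, `T2` and `tree_prob` are names introduced here for illustration.

```python
# Rule probabilities of the example PCFG: (lhs, rhs) -> probability.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0, ("PP", ("P", "NP")): 1.0,
    ("VP", ("V", "NP")): 0.7, ("VP", ("VP", "PP")): 0.3,
    ("P", ("with",)): 1.0, ("V", ("saw",)): 1.0,
    ("NP", ("NP", "PP")): 0.4, ("NP", ("astronomers",)): 0.1,
    ("NP", ("ears",)): 0.18, ("NP", ("saw",)): 0.04,
    ("NP", ("stars",)): 0.18, ("NP", ("telescopes",)): 0.1,
}

# First parse: the PP attaches to the noun phrase "stars".
T1 = ("S", ("NP", "astronomers"),
      ("VP", ("V", "saw"),
       ("NP", ("NP", "stars"), ("PP", ("P", "with"), ("NP", "ears")))))
# Second parse: the PP attaches to the verb phrase "saw stars".
T2 = ("S", ("NP", "astronomers"),
      ("VP", ("VP", ("V", "saw"), ("NP", "stars")),
       ("PP", ("P", "with"), ("NP", "ears"))))

def tree_prob(tree):
    """Multiply the probabilities of all rules used in `tree`."""
    label, *children = tree
    # The right-hand side is the sequence of child labels (or the word itself).
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = RULE_PROB[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            prob *= tree_prob(child)
    return prob

p1, p2 = tree_prob(T1), tree_prob(T2)
print(p1, p2)
```

Under this grammar the first parse (attachment to the noun phrase) receives the higher probability, so a PCFG parser would prefer it.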


Probability of a rule

• Given a training set of annotated sentences:

$$\hat{P}(N^j \to \xi) = \frac{C(N^j \to \xi)}{\sum_{\gamma} C(N^j \to \gamma)}$$

where $C(\cdot)$ is the number of times that a particular rule is used.
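
The relative-frequency estimate is a one-line computation once the rule counts are available. The sketch below assumes the counts have already been extracted from a treebank; the counts shown are made up for illustration and `estimate_rule_probs` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

def estimate_rule_probs(rule_counts):
    """Relative-frequency estimate: C(N -> xi) divided by the total count of rules with lhs N."""
    lhs_totals = defaultdict(int)
    for (lhs, _rhs), count in rule_counts.items():
        lhs_totals[lhs] += count
    return {rule: count / lhs_totals[rule[0]] for rule, count in rule_counts.items()}

# Made-up counts, standing in for counts read off an annotated corpus.
counts = Counter({
    ("VP", ("V", "NP")): 7,
    ("VP", ("VP", "PP")): 3,
    ("PP", ("P", "NP")): 4,
})
probs = estimate_rule_probs(counts)
print(probs)
```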

Probability of a rule

How can rule probabilities be calculated if there is no annotated data?

Maximum Likelihood Estimation

• Maximum likelihood estimation:

$$\hat{\mu} = \arg\max_{\mu} P(O_{\text{training}} \mid \mu)$$

where $\mu$ is the set of parameters of the current grammar.

• There is no known analytic method for choosing $\mu$ to maximize $P(O \mid \mu)$.

• $P(O \mid \mu)$ can be maximized locally by iterative hill-climbing, a special case of the Expectation Maximization (EM) method.

• The Inside-Outside algorithm is a form of EM that uses the inside and outside probabilities estimated from the training set.

Training a PCFG

• We are given
    A set of training sentences
    A set of terminals
    A set of nonterminals

• Initial rule probabilities are estimated (perhaps chosen at random)

• The inside-outside algorithm is then used to train the grammar

Inside-Outside probabilities

• Outside probability $\alpha_j(p, q)$

• Inside probability $\beta_j(p, q)$

[Figure: the start symbol $N^1$ spans the whole sentence $w_1 \dots w_m$; the nonterminal $N^j$ spans $w_p \dots w_q$, with $w_1 \dots w_{p-1}$ to its left and $w_{q+1} \dots w_m$ to its right.]

Inside probabilities

• The inside probability $\beta_j(p, q)$ is the probability of the sequence $w_p \dots w_q$ being generated by a tree rooted at node $N^j$:

$$\beta_j(p, q) = P(w_{pq} \mid N^j_{pq})$$

• The calculation can be carried out bottom-up:

$$\beta_j(k, k) = P(N^j \to w_k)$$

$$\beta_j(p, q) = \sum_{r,s} \sum_{d=p}^{q-1} P(N^j \to N^r N^s)\, \beta_r(p, d)\, \beta_s(d+1, q)$$
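
The bottom-up computation can be sketched for the example PCFG as follows. This is an illustrative implementation under the assumption that the grammar is in Chomsky normal form; positions are 1-based as in the formulas, and the names `BINARY`, `LEXICAL` and `inside` are not from the slides.

```python
from collections import defaultdict

# Example PCFG, split into binary rules (j, r, s) and lexical rules (j, word).
BINARY = {("S", "NP", "VP"): 1.0, ("PP", "P", "NP"): 1.0, ("VP", "V", "NP"): 0.7,
          ("VP", "VP", "PP"): 0.3, ("NP", "NP", "PP"): 0.4}
LEXICAL = {("P", "with"): 1.0, ("V", "saw"): 1.0, ("NP", "astronomers"): 0.1,
           ("NP", "ears"): 0.18, ("NP", "saw"): 0.04, ("NP", "stars"): 0.18,
           ("NP", "telescopes"): 0.1}

def inside(words):
    """Return beta[(j, p, q)], the inside probability of N^j over words p..q (1-based)."""
    m = len(words)
    beta = defaultdict(float)
    # Base case: beta_j(k, k) = P(N^j -> w_k)
    for k, word in enumerate(words, start=1):
        for (j, terminal), prob in LEXICAL.items():
            if terminal == word:
                beta[(j, k, k)] = prob
    # Induction: shorter spans are filled before longer ones.
    for span in range(2, m + 1):
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (j, r, s), prob in BINARY.items():
                for d in range(p, q):
                    beta[(j, p, q)] += prob * beta[(r, p, d)] * beta[(s, d + 1, q)]
    return beta

words = "astronomers saw stars with ears".split()
beta = inside(words)
sentence_prob = beta[("S", 1, len(words))]
print(sentence_prob)
```

The inside probability of the start symbol over the whole sentence is the sentence probability; here it equals the sum of the probabilities of the two parses, 0.0009072 + 0.0006804.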

Outside probabilities

• The outside probability $\alpha_j(p, q)$ is the total probability of beginning with the start symbol $N^1$ and generating the nonterminal $N^j_{pq}$ and all the words outside $w_p \dots w_q$:

$$\alpha_j(p, q) = P(w_{1(p-1)},\ N^j_{pq},\ w_{(q+1)m})$$

Base case:

$$\alpha_1(1, m) = 1 \quad \text{and} \quad \alpha_j(1, m) = 0 \ \text{for } j \neq 1$$

Recursion:

$$
\begin{aligned}
\alpha_j(p, q) &= \sum_{f,g} \sum_{e=q+1}^{m} \alpha_f(p, e)\, P(N^f \to N^j N^g)\, \beta_g(q+1, e) \\
&\quad + \sum_{f,g} \sum_{e=1}^{p-1} \alpha_f(e, q)\, P(N^f \to N^g N^j)\, \beta_g(e, p-1)
\end{aligned}
$$
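
A possible top-down implementation is sketched below. It assumes a Chomsky-normal-form grammar and uses a "push" formulation: each parent span distributes its outside probability to its two children, which accumulates the same sums as the recursion for $\alpha_j(p, q)$. All names (`inside`, `outside`, `NONTERMINALS`) are illustrative.

```python
from collections import defaultdict

BINARY = {("S", "NP", "VP"): 1.0, ("PP", "P", "NP"): 1.0, ("VP", "V", "NP"): 0.7,
          ("VP", "VP", "PP"): 0.3, ("NP", "NP", "PP"): 0.4}
LEXICAL = {("P", "with"): 1.0, ("V", "saw"): 1.0, ("NP", "astronomers"): 0.1,
           ("NP", "ears"): 0.18, ("NP", "saw"): 0.04, ("NP", "stars"): 0.18,
           ("NP", "telescopes"): 0.1}
NONTERMINALS = {"S", "NP", "VP", "PP", "P", "V"}

def inside(words):
    """Inside probabilities beta[(j, p, q)], 1-based inclusive spans."""
    m = len(words)
    beta = defaultdict(float)
    for k, word in enumerate(words, start=1):
        for (j, terminal), prob in LEXICAL.items():
            if terminal == word:
                beta[(j, k, k)] = prob
    for span in range(2, m + 1):
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (j, r, s), prob in BINARY.items():
                for d in range(p, q):
                    beta[(j, p, q)] += prob * beta[(r, p, d)] * beta[(s, d + 1, q)]
    return beta

def outside(words, beta, start="S"):
    """Outside probabilities alpha[(j, p, q)], computed from the widest span downwards."""
    m = len(words)
    alpha = defaultdict(float)
    alpha[(start, 1, m)] = 1.0                  # base case; all other alpha_j(1, m) stay 0
    for span in range(m, 1, -1):                # parent spans, widest first
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (f, left, right), prob in BINARY.items():
                a = alpha[(f, p, q)]
                if a == 0.0:
                    continue
                for d in range(p, q):
                    # Left child sees the parent's outside mass times the right sibling's inside mass.
                    alpha[(left, p, d)] += a * prob * beta[(right, d + 1, q)]
                    # Right child: symmetric, with the left sibling's inside mass.
                    alpha[(right, d + 1, q)] += a * prob * beta[(left, p, d)]
    return alpha

words = "astronomers saw stars with ears".split()
beta = inside(words)
alpha = outside(words, beta)
sentence_prob = beta[("S", 1, len(words))]
# Sanity check: for every word position k, sum_j alpha_j(k, k) * beta_j(k, k) is the sentence probability.
per_word = [sum(alpha[(j, k, k)] * beta[(j, k, k)] for j in NONTERMINALS)
            for k in range(1, len(words) + 1)]
print(sentence_prob, per_word)
```

Because every parent span is strictly wider than its children, processing spans from widest to narrowest guarantees that a parent's outside probability is complete before it is pushed down.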

Inside-Outside Algorithm

We have:

$$\alpha_j(p, q)\, \beta_j(p, q) = P(N^1 \Rightarrow^* w_{1m},\ N^j \Rightarrow^* w_{pq})$$

Call $\pi = P(N^1 \Rightarrow^* w_{1m})$. Then:

$$P(N^j \Rightarrow^* w_{pq} \mid N^1 \Rightarrow^* w_{1m}) = \frac{\alpha_j(p, q)\, \beta_j(p, q)}{\pi}$$

Inside-Outside Algorithm

$$E(N^j \text{ is used}) = \sum_{p=1}^{m} \sum_{q=p}^{m} \frac{\alpha_j(p, q)\, \beta_j(p, q)}{\pi}$$

$$E(N^j \to N^r N^s,\ N^j \text{ used}) = \sum_{p=1}^{m-1} \sum_{q=p+1}^{m} \sum_{d=p}^{q-1} \frac{\alpha_j(p, q)\, P(N^j \to N^r N^s)\, \beta_r(p, d)\, \beta_s(d+1, q)}{\pi}$$

where $\pi = P(N^1 \Rightarrow^* w_{1m})$ is the probability of the sentence.

Inside-Outside Algorithm

Therefore:

$$\hat{P}(N^j \to N^r N^s) = \frac{\sum_{p=1}^{m-1} \sum_{q=p+1}^{m} \sum_{d=p}^{q-1} \alpha_j(p, q)\, P(N^j \to N^r N^s)\, \beta_r(p, d)\, \beta_s(d+1, q)}{\sum_{p=1}^{m} \sum_{q=p}^{m} \alpha_j(p, q)\, \beta_j(p, q)}$$

$$\hat{P}(N^j \to w^k) = \frac{\sum_{h=1}^{m} \alpha_j(h, h)\, P(w_h = w^k)\, \beta_j(h, h)}{\sum_{p=1}^{m} \sum_{q=p}^{m} \alpha_j(p, q)\, \beta_j(p, q)}$$

Inside-Outside Algorithm

For each sentence $W_i$ in the corpus:

$$f_i(p, q, j, r, s) = \frac{\sum_{d=p}^{q-1} \alpha_j(p, q)\, P(N^j \to N^r N^s)\, \beta_r(p, d)\, \beta_s(d+1, q)}{P(N^1 \Rightarrow^* W_i)}$$

$$g_i(h, j, k) = \frac{\alpha_j(h, h)\, P(w_h = w^k)\, \beta_j(h, h)}{P(N^1 \Rightarrow^* W_i)}$$

$$h_i(p, q, j) = \frac{\alpha_j(p, q)\, \beta_j(p, q)}{P(N^1 \Rightarrow^* W_i)}$$

Inside-Outside Algorithm

We have:

$$\hat{P}(N^j \to N^r N^s) = \frac{\sum_{i=1}^{l} \sum_{p=1}^{m_i - 1} \sum_{q=p+1}^{m_i} f_i(p, q, j, r, s)}{\sum_{i=1}^{l} \sum_{p=1}^{m_i} \sum_{q=p}^{m_i} h_i(p, q, j)}$$

$$\hat{P}(N^j \to w^k) = \frac{\sum_{i=1}^{l} \sum_{h=1}^{m_i} g_i(h, j, k)}{\sum_{i=1}^{l} \sum_{p=1}^{m_i} \sum_{q=p}^{m_i} h_i(p, q, j)}$$

where $l$ is the number of sentences in the corpus and $m_i$ is the length of sentence $W_i$.

Discussion

• The Inside-Outside algorithm is quite slow: $O(m^3 n^3)$ for each sentence, where
    $m$ is the length of the sentence
    $n$ is the number of nonterminals

• The algorithm is very sensitive to the initialization of the parameters.

• In practice, a PCFG is a worse language model for English than an n-gram model (n > 1).


More features

• Add lexical features to the non-terminals: words and POS tags.

• Each phrase has a head-child, which inherits the head-word from its parent.

TOP -> S(bought)
S(bought) -> NP(week) NP(Marks) VP(bought)
NP(week) -> JJ(Last) NN(week)
NP(Marks) -> NNP(Marks)
VP(bought) -> VB(bought) NP(Brooks)
NP(Brooks) -> NNP(Brooks)

Example

(TOP
  (S(bought)
    (NP(week) (JJ Last) (NN week))
    (NP(Marks) (NNP Marks))
    (VP(bought) (VB bought)
                (NP(Brooks) (NNP Brooks)))))
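
The head-word annotation in this example can be reproduced by passing head words up from each head child. The sketch below is only illustrative: the table `HEAD_PRIORITY` is a hypothetical, minimal stand-in for real head-finding rules, chosen so that it covers exactly the labels in this tree, and the tree is written as nested tuples.

```python
# Hypothetical head table: for each parent label, head-child labels in priority order.
HEAD_PRIORITY = {"TOP": ["S"], "S": ["VP"], "VP": ["VB"], "NP": ["NN", "NNP"]}

TREE = ("TOP",
        ("S",
         ("NP", ("JJ", "Last"), ("NN", "week")),
         ("NP", ("NNP", "Marks")),
         ("VP", ("VB", "bought"), ("NP", ("NNP", "Brooks")))))

def lexicalize(tree):
    """Return (annotated_tree, head_word); phrase labels become 'LABEL(head)'."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return tree, children[0]              # POS tag over a word: the word is its own head
    annotated, head_of = [], {}
    for child in children:
        sub, head = lexicalize(child)
        annotated.append(sub)
        head_of.setdefault(child[0], head)    # keep the first child carrying each label
    for candidate in HEAD_PRIORITY[label]:
        if candidate in head_of:
            head = head_of[candidate]
            return (f"{label}({head})", *annotated), head
    raise ValueError(f"no head child found for {label}")

lex_tree, root_head = lexicalize(TREE)
print(lex_tree)
```

Unlike the slide, this sketch also annotates the root, producing `TOP(bought)`; the remaining labels match the example.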

Using Distance

• The model uses the following probabilities
    – Head constituent label of the phrase: $P_H(H \mid P, h)$
    – Modifiers to the right of the head: $\prod_{i=1..m+1} P_R(R_i(r_i) \mid P, h, H, R_1(r_1) \dots R_{i-1}(r_{i-1}))$
    – Modifiers to the left of the head: $\prod_{i=1..n+1} P_L(L_i(l_i) \mid P, h, H, L_1(l_1) \dots L_{i-1}(l_{i-1}))$

Using Distance

For example:

$$
\begin{aligned}
& P(\text{S(bought)} \to \text{NP(week)}\ \text{NP(Marks)}\ \text{VP(bought)}) \\
&= P_h(\text{VP} \mid \text{S}, \text{bought}) \times P_l(\text{NP(Marks)} \mid \text{S}, \text{VP}, \text{bought}) \\
&\quad \times P_l(\text{NP(week)} \mid \text{S}, \text{VP}, \text{bought}, \text{NP(Marks)})
\end{aligned}
$$

With distance 0:

$$
\begin{aligned}
& P(\text{S(bought)} \to \text{NP(week)}\ \text{NP(Marks)}\ \text{VP(bought)}) \\
&= P_h(\text{VP} \mid \text{S}, \text{bought}) \times P_l(\text{NP(Marks)} \mid \text{S}, \text{VP}, \text{bought}) \\
&\quad \times P_l(\text{NP(week)} \mid \text{S}, \text{VP}, \text{bought})
\end{aligned}
$$


Adding the complement / adjunct distinction

(TOP
  (S(bought)
    (NP(week) (JJ Last) (NN week))
    (NP-C(Marks) (NNP Marks))
    (VP(bought) (VB bought)
                (NP-C(Brooks) (NNP Brooks)))))

It would be useful to identify “Marks” as a subject and “Last week” as an adjunct!

Adding the complement / adjunct distinction

• Add a “-C” suffix to every non-terminal in the training data that satisfies both conditions:
    The non-terminal must be: an NP, SBAR, or S whose parent is an S; an NP, SBAR, S, or VP whose parent is a VP; or an S whose parent is an SBAR.
    The non-terminal must not have one of the following semantic tags: ADV, VOC, BNF, DIR, EXT, LOC, MNR, TMP, CLR or PRP.
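
The two conditions translate directly into a lookup. A minimal sketch, assuming each node is described by its label, its parent's label and a collection of semantic tags; the function name `mark_complement` and the argument layout are assumptions made for illustration.

```python
# Which child labels may be complements under which parent label.
COMPLEMENT_CHILDREN = {
    "S": {"NP", "SBAR", "S"},
    "VP": {"NP", "SBAR", "S", "VP"},
    "SBAR": {"S"},
}
# Semantic tags that rule out complement status.
ADJUNCT_TAGS = {"ADV", "VOC", "BNF", "DIR", "EXT", "LOC", "MNR", "TMP", "CLR", "PRP"}

def mark_complement(label, parent, tags=()):
    """Return the label with a '-C' suffix if both conditions hold, else unchanged."""
    allowed = COMPLEMENT_CHILDREN.get(parent, set())
    if label in allowed and not ADJUNCT_TAGS.intersection(tags):
        return label + "-C"
    return label

print(mark_complement("NP", "S"))             # subject such as "Marks"
print(mark_complement("NP", "S", ["TMP"]))    # temporal adjunct such as "Last week"
```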

Adding the complement / adjunct distinction

• The model uses the following probabilities
    Head constituent label of the phrase: $P_H(H \mid P, h)$
    Left and right subcat frames: $P_{lc}(LC \mid P, H, h)$ and $P_{rc}(RC \mid P, H, h)$
    Modifiers to the right of the head: $P_R(R_i, r_i \mid P, H, h, \text{distance}(i-1), RC)$
    Modifiers to the left of the head: $P_L(L_i, l_i \mid P, H, h, \text{distance}(i-1), LC)$

Adding the complement / adjunct distinction

• For example

$$
\begin{aligned}
& P(\text{S(bought)} \to \text{NP(week)}\ \text{NP-C(Marks)}\ \text{VP(bought)}) \\
&= P_h(\text{VP} \mid \text{S}, \text{bought}) \\
&\quad \times P_{lc}(\{\text{NP-C}\} \mid \text{S}, \text{VP}, \text{bought}) \\
&\quad \times P_l(\text{NP-C(Marks)} \mid \text{S}, \text{VP}, \text{bought}, \{\text{NP-C}\}) \\
&\quad \times P_l(\text{NP(week)} \mid \text{S}, \text{VP}, \text{bought})
\end{aligned}
$$


Traces and Wh-movement

• Add a “gap” feature to each non-terminal in the tree and propagate gaps through the tree until they are finally discharged as a trace complement.

• For example
    (1) NP -> NP SBAR(+gap)
    (2) SBAR(+gap) -> WHNP S-C(+gap)
    (3) S(+gap) -> NP-C VP(+gap)
    (4) VP(+gap) -> VB TRACE NP

Traces and Wh-movement

(NP(store)
  (NP(store) The store)
  (SBAR(that)(+gap)
    (WHNP(that) (WDT that))
    (S(bought)(+gap)
      (NP-C(Marks) Marks)
      (VP(bought)(+gap) (VBD bought)
                        TRACE
                        (NP(week) last week)))))

$$
\begin{aligned}
& P(\text{VP(bought)(+gap)} \to \text{VB(bought)}\ \text{TRACE}\ \text{NP(week)}) \\
&= P_h(\text{VB} \mid \text{VP}, \text{bought}) \times P_G(\text{Right} \mid \text{VP}, \text{bought}, \text{VB}) \\
&\quad \times P_{RC}(\{\text{NP-C}\} \mid \text{VP}, \text{bought}, \text{VB}) \\
&\quad \times P_R(\text{TRACE} \mid \text{VP}, \text{bought}, \text{VB}, \{\text{NP-C}, \text{+gap}\}) \\
&\quad \times P_R(\text{NP(week)} \mid \text{VP}, \text{bought}, \text{VB})
\end{aligned}
$$

Experiment

• Models are trained on sections 02-21 of the Wall Street Journal portion of the Penn Treebank and tested on section 23.

$$\text{Labelled precision} = \frac{\text{number of correct constituents in proposed parse}}{\text{number of constituents in proposed parse}}$$

$$\text{Labelled recall} = \frac{\text{number of correct constituents in proposed parse}}{\text{number of constituents in treebank parse}}$$

Crossing brackets = number of constituents which violate constituent boundaries with a constituent in the treebank parse
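
These measures can be computed from sets of labelled spans. The sketch below represents each constituent as `(label, start, end)` with 0-based inclusive word positions and, purely as an assumption for illustration, treats the noun-phrase attachment of “astronomers saw stars with ears” as the treebank parse and the verb-phrase attachment as the proposed parse. The function names are hypothetical.

```python
from collections import Counter

def labelled_precision_recall(proposed, gold):
    """Labelled precision and recall over (label, start, end) constituents."""
    correct = sum((Counter(proposed) & Counter(gold)).values())
    precision = correct / len(proposed) if proposed else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

def crossing_brackets(proposed, gold):
    """Number of proposed constituents whose span partially overlaps some gold span."""
    def crosses(a, b):
        (_, s1, e1), (_, s2, e2) = a, b
        return s1 < s2 <= e1 < e2 or s2 < s1 <= e2 < e1
    return sum(any(crosses(c, g) for g in gold) for c in proposed)

# Assumed treebank parse: the PP attaches to the noun phrase "stars".
GOLD = [("S", 0, 4), ("NP", 0, 0), ("VP", 1, 4), ("V", 1, 1), ("NP", 2, 4),
        ("NP", 2, 2), ("PP", 3, 4), ("P", 3, 3), ("NP", 4, 4)]
# Proposed parse: the PP attaches to the verb phrase "saw stars".
PROPOSED = [("S", 0, 4), ("NP", 0, 0), ("VP", 1, 4), ("VP", 1, 2), ("V", 1, 1),
            ("NP", 2, 2), ("PP", 3, 4), ("P", 3, 3), ("NP", 4, 4)]

precision, recall = labelled_precision_recall(PROPOSED, GOLD)
n_crossing = crossing_brackets(PROPOSED, GOLD)
print(precision, recall, n_crossing)
```

In this toy comparison only the inner VP over “saw stars” is wrong: it is missing from the treebank parse and crosses the treebank NP over “stars with ears”.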



Thanks!
