Probabilistic and Lexicalized Parsing


Probabilistic CFGs

• Weighted CFGs
  – Attach weights to the rules of a CFG
  – Compute weights of derivations
  – Use weights to pick preferred parses
• Utility: pruning and ordering the search space, disambiguation, language model for ASR
• Parsing with weighted grammars (like weighted FAs): T* = argmax_T W(T, S)
• Probabilistic CFGs are one form of weighted CFG.

Probability Model

• Rule probability:
  – Attach probabilities to grammar rules
  – Expansions for a given non-terminal sum to 1
    R1: VP → V         .55
    R2: VP → V NP      .40
    R3: VP → V NP NP   .05
  – Estimate the probabilities from annotated corpora, e.g. P(R1) = count(R1) / count(VP)

• Derivation probability:
  – A derivation (parse tree) T consists of the rules used: T = {R1 … Rn}
  – Probability of a derivation: P(T) = ∏_{i=1..n} P(Ri)
  – Most likely parse: T* = argmax_T P(T)
  – Probability of a sentence: P(S) = Σ_T P(T, S)
    • Sum over all possible derivations for the sentence

• Note the independence assumption: parse probability does not change based on where in the derivation a rule is expanded.
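As a minimal illustration (not from the slides), the sketch below estimates rule probabilities from hypothetical rule counts and scores a derivation as the product of its rule probabilities; all rule names and counts are made up.

```python
# Sketch: estimate PCFG rule probabilities from counts and score a derivation.
from collections import defaultdict
import math

# Hypothetical rule counts, as they might be gathered from an annotated corpus.
rule_counts = {
    ("VP", ("V",)): 55,
    ("VP", ("V", "NP")): 40,
    ("VP", ("V", "NP", "NP")): 5,
}

# P(rule) = count(rule) / count(LHS non-terminal)
lhs_totals = defaultdict(int)
for (lhs, rhs), c in rule_counts.items():
    lhs_totals[lhs] += c
rule_prob = {rule: c / lhs_totals[rule[0]] for rule, c in rule_counts.items()}

def derivation_log_prob(rules):
    """P(T) = product of P(Ri); computed in log space to avoid underflow."""
    return sum(math.log(rule_prob[r]) for r in rules)

print(rule_prob[("VP", ("V", "NP"))])   # 0.40
```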

Structural ambiguity

• S → NP VP
• VP → V NP
• NP → NP PP
• VP → VP PP
• PP → P NP
• NP → John | Mary | Denver
• V → called
• P → from

John called Mary from Denver

[Two parse trees for "John called Mary from Denver": in one, the PP "from Denver" attaches to the VP (VP → VP PP); in the other, it attaches to the NP "Mary" (NP → NP PP).]
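To see both analyses of the sentence, one can run an off-the-shelf chart parser over this grammar; the sketch below assumes NLTK is installed and simply encodes the rules listed above.

```python
# Sketch (assumes NLTK is available): encode the toy grammar and print
# both parses of the ambiguous sentence.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP | VP PP
NP -> NP PP | 'John' | 'Mary' | 'Denver'
PP -> P NP
V -> 'called'
P -> 'from'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("John called Mary from Denver".split()):
    print(tree)   # one tree per attachment of "from Denver"
```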

Cocke-Younger-Kasami (CKY) Parser

• Bottom-up parser with top-down filtering
• Start state(s): (A, i, i+1) for each rule A → w_{i+1}
• End state: (S, 0, n), where n is the input size
• Next-state rule:
  – (B, i, k) and (C, k, j) yield (A, i, j) if A → B C
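A minimal CKY recognizer sketch (my own illustration, not the slides' pseudocode), assuming the grammar is in Chomsky normal form; it uses the toy grammar from the ambiguity example above.

```python
# Minimal CKY recognizer: lexical rules A -> w fill the diagonal,
# binary rules A -> B C fill larger spans bottom-up.
from collections import defaultdict

lexical = {                       # A -> w
    "John": {"NP"}, "Mary": {"NP"}, "Denver": {"NP"},
    "called": {"V"}, "from": {"P"},
}
binary = [                        # A -> B C
    ("S", "NP", "VP"), ("VP", "V", "NP"), ("VP", "VP", "PP"),
    ("NP", "NP", "PP"), ("PP", "P", "NP"),
]

def cky_recognize(words):
    n = len(words)
    chart = defaultdict(set)                      # (i, j) -> set of non-terminals
    for i, w in enumerate(words):                 # base case: A -> w_{i+1}
        chart[(i, i + 1)] |= lexical.get(w, set())
    for span in range(2, n + 1):                  # recursive case: A -> B C
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for a, b, c in binary:
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        chart[(i, j)].add(a)
    return "S" in chart[(0, n)]

print(cky_recognize("John called Mary from Denver".split()))   # True
```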

Example

John called Mary from Denver

Base case (A → w): the diagonal of the chart is filled with NP(John), V(called), NP(Mary), P(from), NP(Denver).

Recursive cases (A → B C): successive chart snapshots show VP(called Mary), PP(from Denver), NP(Mary from Denver), and S(John called Mary) being added. Over the span "called Mary from Denver" the chart ends up with two VP analyses (VP1 via VP → VP PP and VP2 via VP → V NP with NP → NP PP), so the full sentence receives two S entries, corresponding to the two parses above. Cells marked X have no analysis.

Probabilistic CKY

• Assign probabilities to constituents as they are completed and placed in the table
• Computing the probability
  – Since we are interested in the max P(S, 0, n), use the max probability for each constituent
• Maintain back-pointers to recover the parse.

P(A → B C, i, j) = P(B, i, k) * P(C, k, j) * P(A → B C)

P(A, i, j) = max over rules A → B C and split points k of P(A → B C, i, j)
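A sketch of the probabilistic variant (illustrative only; the rule probabilities below are invented): each cell keeps the best probability per non-terminal plus a back-pointer for recovering the parse.

```python
# Probabilistic CKY sketch: same chart as the recognizer above, but each cell
# stores the best probability per non-terminal and a back-pointer.
from collections import defaultdict

lexical = {("NP", "John"): 0.25, ("NP", "Mary"): 0.25, ("NP", "Denver"): 0.25,
           ("V", "called"): 1.0, ("P", "from"): 1.0}          # P(A -> w), invented
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 0.6, ("VP", "VP", "PP"): 0.4,
          ("NP", "NP", "PP"): 0.25, ("PP", "P", "NP"): 1.0}   # P(A -> B C), invented

def pcky(words):
    n = len(words)
    best = defaultdict(dict)      # (i, j) -> {A: best probability}
    back = {}                     # ((i, j), A) -> (k, B, C), for recovering the parse
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                best[(i, i + 1)][a] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p_rule in binary.items():
                    if b in best[(i, k)] and c in best[(k, j)]:
                        p = best[(i, k)][b] * best[(k, j)][c] * p_rule
                        if p > best[(i, j)].get(a, 0.0):
                            best[(i, j)][a] = p
                            back[((i, j), a)] = (k, b, c)
    return best[(0, n)].get("S", 0.0), back

prob, back = pcky("John called Mary from Denver".split())
print(prob)   # probability of the most likely parse
```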

Problems with PCFGs

• The probability model we're using is based only on the rules in the derivation.
• Lexical insensitivity:
  – Doesn't use the words in any real way
  – But structural disambiguation is lexically driven
    • PP attachment often depends on the verb, its object, and the preposition
    • I ate pickles with a fork.
    • I ate pickles with relish.
• Context insensitivity of the derivation:
  – Doesn't take into account where in the derivation a rule is used
    • Pronouns are more often subjects than objects
    • She hates Mary.
    • Mary hates her.
• Solution: lexicalization
  – Add lexical information to each rule

An example of lexical information: Heads

• Make use of the notion of the head of a phrase
  – Head of an NP is a noun
  – Head of a VP is the main verb
  – Head of a PP is its preposition
• Each LHS of a rule in the PCFG gets a lexical item (its head)
• Each RHS non-terminal gets a lexical item
  – One of the RHS lexical items is shared with the LHS.
• If R is the number of binary-branching rules in the CFG, the lexicalized CFG has O(2 * |Σ| * |R|) rules
• Unary rules: O(|Σ| * |R|)
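To make the head notion concrete, here is a toy head-finding sketch; the head-child preference lists are simplified stand-ins for real head-percolation tables (e.g., the Magerman/Collins rules), not a definition from the slides.

```python
# Toy head-finding sketch using the simplified rules above
# (noun heads NP, main verb heads VP, preposition heads PP).
HEAD_CHILD = {
    "NP": ["NN", "NNS", "NNP", "NP"],   # prefer a noun (or nested NP)
    "VP": ["VBD", "VB", "VBZ", "VP"],   # prefer the main verb
    "PP": ["IN"],                        # the preposition
    "S":  ["VP"],                        # a sentence is headed by its VP
}

def find_head(label, children):
    """children: list of (child_label, child_head_word); return the head word."""
    for wanted in HEAD_CHILD.get(label, []):
        for child_label, head_word in children:
            if child_label == wanted:
                return head_word
    return children[-1][1]               # fallback: rightmost child's head

# VP(dumped) -> V(dumped) NP(sacks) PP(in): the verb supplies the VP's head.
print(find_head("VP", [("VBD", "dumped"), ("NP", "sacks"), ("PP", "in")]))  # dumped
```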

Example (correct parse) [parse-tree figure not reproduced]

Attribute grammar [figure not reproduced]

Example (less preferred) [parse-tree figure not reproduced]

Computing Lexicalized Rule Probabilities

• We started with rule probabilities
  – VP → V NP PP        P(rule | VP)
    • E.g., count of this rule divided by the number of VPs in a treebank
• Now we want lexicalized probabilities
  – VP(dumped) → V(dumped) NP(sacks) PP(in)
  – P(rule | VP ∧ dumped is the verb ∧ sacks is the head of the NP ∧ in is the head of the PP)
  – Not likely to have significant counts in any treebank (see the sketch below)
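One standard response to this sparsity, shown here only as a generic illustration (not any particular parser's smoothing scheme), is to interpolate the lexicalized estimate with the plain rule probability.

```python
# Sketch: back off from a sparse lexicalized estimate toward the plain PCFG
# rule probability. The counts and the weight `lam` are invented.
def lexicalized_prob(count_lex_rule, count_lex_lhs, count_rule, count_lhs, lam=0.3):
    """P ≈ lam * P(rule | LHS, head words) + (1 - lam) * P(rule | LHS)."""
    p_lex = count_lex_rule / count_lex_lhs if count_lex_lhs else 0.0
    p_rule = count_rule / count_lhs
    return lam * p_lex + (1 - lam) * p_rule

# VP(dumped) -> V NP PP(in): seen once or never, so lean on P(VP -> V NP PP).
print(lexicalized_prob(count_lex_rule=1, count_lex_lhs=3,
                       count_rule=500, count_lhs=10000))
```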

Another Example

• Consider the VPs
  – ate spaghetti with gusto
  – ate spaghetti with marinara
• The relevant dependency is not between mother and child.

[Two tree fragments: in "ate spaghetti with gusto", PP(with) attaches to VP(ate); in "ate spaghetti with marinara", PP(with) attaches to NP(spaghetti).]

Log-linear models for Parsing

• Why restrict the conditioning to the elements of a rule?
  – Use even larger context
  – Word sequence, word types, sub-tree context, etc.
• In general, compute P(y|x), where the features fi(x, y) test properties of the context and λi is the weight of feature i.
• Use these as scores in the CKY algorithm to find the best-scoring parse.

P(y|x) = exp(Σ_i λi * fi(x, y)) / Σ_{y' ∈ Y} exp(Σ_i λi * fi(x, y'))
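A small sketch of this computation (the features and weights are invented for illustration):

```python
# Log-linear (maxent) score: P(y|x) is a softmax over weighted feature sums.
import math

def log_linear_prob(x, y, candidates, features, weights):
    """P(y|x) = exp(sum_i w_i * f_i(x,y)) / sum_{y'} exp(sum_i w_i * f_i(x,y'))."""
    def score(cand):
        return sum(w * f(x, cand) for f, w in zip(features, weights))
    denom = sum(math.exp(score(c)) for c in candidates)
    return math.exp(score(y)) / denom

# Toy example: score two candidate PP attachments given the verb.
features = [lambda x, y: 1.0 if y == "VP-attach" and x["verb"] == "ate" else 0.0,
            lambda x, y: 1.0 if y == "NP-attach" else 0.0]
weights = [1.5, 0.5]
x = {"verb": "ate"}
print(log_linear_prob(x, "VP-attach", ["VP-attach", "NP-attach"], features, weights))
```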

Supertagging: Almost parsing

Poachers now control the underground trade

[Figure: for each word of "Poachers now control the underground trade", a column of candidate supertags (elementary trees), e.g. several S/NP/VP trees anchored on "poachers", "control", and "trade", adverb trees for "now", a determiner tree for "the", and adjective trees for "underground". Supertagging picks one elementary tree per word, after which only the attachments remain to be decided, hence "almost parsing".]
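A bare-bones sketch of the disambiguation step (my own illustration; the candidate supertags and scores are invented, and real supertaggers condition on surrounding context rather than the word alone):

```python
# Toy supertagging sketch: each word has several candidate elementary trees;
# pick the most probable one per word.
candidates = {
    "poachers":    {"NP-head": 0.7, "N-modifier": 0.3},
    "now":         {"VP-adverb": 0.6, "S-adverb": 0.4},
    "control":     {"S[np,v,np]": 0.8, "N-head": 0.2},
    "the":         {"Det": 1.0},
    "underground": {"Adj-modifier": 0.9, "N-head": 0.1},
    "trade":       {"N-head": 0.6, "N-modifier": 0.4},
}

def supertag(words):
    return [max(candidates[w], key=candidates[w].get) for w in words]

print(supertag("poachers now control the underground trade".split()))
```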

Summary

• Parsing context-free grammars
  – Top-down and bottom-up parsers
  – Mixed approaches (CKY, Earley parsers)
• Preferences over parses using probabilities
  – Parsing with PCFG and probabilistic CKY algorithms
• Enriching the probability model
  – Lexicalization
  – Log-linear models for parsing
