Upload
others
View
10
Download
0
Embed Size (px)
Citation preview
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
Parsing with Lexical-Functional Grammars
Nikos Engonopoulos
Universitat des Saarlandes
January 19, 2012
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
Outline
1 Introduction
2 LFG parsing using Discriminative EstimationRobust LFG parsingDiscriminative Estimation
3 PCFG-based LFG approximationsAutomatic f-structure annotationInducing LFG approximationsLong-distance dependencies
4 Experimental Evaluation
5 Discussion
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
Outline
1 Introduction
2 LFG parsing using Discriminative EstimationRobust LFG parsingDiscriminative Estimation
3 PCFG-based LFG approximationsAutomatic f-structure annotationInducing LFG approximationsLong-distance dependencies
4 Experimental Evaluation
5 Discussion
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
LFG basics
Lexical Functional Grammar:
lexical (not transformational): richly structured lexicon whichencodes grammatical structure (e.g. verbal frames)
functional (not configurational): grammatical functions (e.g.subject) object are explicit primitives
LFG distinguishes (minimally) two representations for a sentence:
c(onstituent)-structure Phonological realization
Represented as a tree structureSimilar to CFG structures
f(unctional)-structure Abstract functional organization:predicate-argument structures, functional relations
Represented as an attribute-value matrix (AVM)Similar to dependency structures
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
LFG structure example
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
Parsing with LFG
Deep parsing with LFG (or HPSG) presents a number of challenges:
Annotation with LFG structures is very difficult andtime-consuming
Very small LFG-annotated training corpora
Difficult to automatically induce stochastic LFG grammarsdirectly using the same data-driven approaches as for PCFGs
Hand-crafted grammars have usually low coverage, and areextremely time-consuming and expensive
Two different approaches will be presented:
The PARC approach Carefully building a hand-crafted LFG, thenusing stochastic disambiguation guided by treebankPSG annotations (linguistically motivated)
The DCU approach Automatically extracting an approximation of anLFG using treebank PCFG and functional annotations,plus advanced heuristics for building missinginformation (data-driven)
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
Outline
1 Introduction
2 LFG parsing using Discriminative EstimationRobust LFG parsingDiscriminative Estimation
3 PCFG-based LFG approximationsAutomatic f-structure annotationInducing LFG approximationsLong-distance dependencies
4 Experimental Evaluation
5 Discussion
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
LFG Parsing using Discriminative Estimation
Riezler et al. (2002) address the LFG parsing problem by using:
A hand-crafted LFG grammar (ParGram, 1999)
A widely used PSG-annotated corpus (WSJ), also containingsome partial functional annotations (e.g. NP-SBJ, ADJP-PRD)
A deterministic LFG parser which produces packedrepresentations of all possible parses for each input
A discriminative statistical estimation method to stochasticallydisambiguate between the possible parses
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
Robust, polynomial-time deterministic parsing withLFG
An LFG grammar with 314 rules is used, compiled into acollection of FSMs with ∼ 8, 700 states and ∼ 20, 000 transitions
Grammar modified to parse POS tags and labeled bracketings,instead of raw text
The XLE parser (Maxwell and Kaplan, 1993) is used to tagpartially labeled data from WSJ and creating packedrepresentations of all possible parses for each input
Robustness: When no parse is found, a partial one is assigned(FRAGMENT parse)Efficiency: If a parsing time threshold is exceeded, a SKIMMED
parse is produced (full or partial)
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
Conditional exponential model for stochasticdisambiguation
The conditional probability of a parse x given a sentence y ispλ(x |y) = Zλ(y)−1eλ·f(x), where:
f = (f1, ..., fn) is a vector of feature functions which map parseto numeric values
λ = (λ1, ..., λn) is a vector of log-parameters for these propertyfunctions
Zλ(y) =∑
x∈X (y) eλ·f(x) is a normalization constant
Feature functions can represent various complex properties of theparses
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
Feature functions
Around 1000 feature functions are used, for example:
c-structure nodes and subtrees
number of recursively embedded phrases (to account forhigh-vs-low attachment)
values of specific f-structure attributes
presence of particular attribute-value pairs
branching behaviour of c-structures
properties based on soft clustering of grammatical relationsencoded as lexicalized head-argument-relation tuples
...
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
Discriminative Estimation method
Goal: maximize the conditional likelihood of the correct parse giventhe training example
P(λ) = −L(λ)− G (λ) = −logm∏j=1
pλ(zj |yj) +n∑
i=1
λi2
2σi 2(1)
In this case: no gold-standard LFG parse
Instead, consider for each training sentence y the set of parsesX (y , z) consistent with the partial treebank annotation z :
P(λ) = −m∑j=1
log
∑X (yj ,zj )
eλ·f(x)∑X (yj )
eλ·f(x)+
n∑i=1
λi2
2σi 2(2)
= −m∑j=1
log∑
X (yj ,zj )
eλ·f(x) +m∑j=1
log∑X (yj )
eλ·f(x) +n∑
i=1
λi2
2σi 2(3)
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
Outline
1 Introduction
2 LFG parsing using Discriminative EstimationRobust LFG parsingDiscriminative Estimation
3 PCFG-based LFG approximationsAutomatic f-structure annotationInducing LFG approximationsLong-distance dependencies
4 Experimental Evaluation
5 Discussion
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
Data-driven, PCFG-based LFG approximations
Problem: Building a broad-coverage, large-scale LFG (orHPSG) grammar is time-consuming and expensive
However, large PCFG annotated corpora exist (e.g. Penn-IItreebank), with some functional, trace, head annotations
The DCU approach:
Automatically generate PCFG-based approximations of LFGgrammars, using an shallow f-structure annotation algorithm
Automatically extract semantic subcategorization frames andfinite approximations of Functional Uncertainty equations fromthe training data
Resolve long-distance dependencies using the inducedf-structures, frames and FU equations
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
Automatic f-structure annotation algorithm
An algorithm exploting Penn-II annotations to produce f-structures.
1 The Penn-II trees are automatically head-lexicalized
2 All local subtrees are partitioned into left-right context (relativeto head)
3 A pipeline of 4 modules is applied for annotating with
f-equations:
4 A constraint solver produces full f-structures
The algorithm has 99.82% coverage and 96.57% F-1 on Penn-IItreebank data.
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
Automatic extraction of LFG approximations usingPCFG annotations
Two alternative architectures used:
1 The pipeline model
extract a PCFG from a treebankparse unseen datagenerate f-structures using the f-structure annotation algorithm
2 The integrated model
annotate treebank with f-structure equationsextract a PCFG from the augmented treebank annotationsparse unseen data using the extended PCFGresolve f-structure equations into f-structures using a constraintsolver
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
Long-distance dependencies
Problem: Resulting f-structures are still “shallow”: they don’taccount for long-distance dependencies
Solution: Resolve LDDs by using Functional uncertainty (FU)equations and subcategorization frames
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
LDD resolution with extracted frames and FUequations
1 Subcategorization frames:
2 Functional uncertainty (FU) equations:
3 Instead of being hand-coded, they are learned from the trainingdata and assigned probabilities
4 The LDD resolution algorithm then recursively traverses thef-structure looking for paths that match FU equations, whilesatisfying subcategorization constraints
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
Outline
1 Introduction
2 LFG parsing using Discriminative EstimationRobust LFG parsingDiscriminative Estimation
3 PCFG-based LFG approximationsAutomatic f-structure annotationInducing LFG approximationsLong-distance dependencies
4 Experimental Evaluation
5 Discussion
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
Experimental Evaluation: PARC approach
Two evaluation methods for produced f-structures:
1 LFG f-relations2 Dependency relations (to enable comparison with other
formalisms)
Results on the PARC 700 dataset (700 randomly selected sentencesfrom section 23 of WSJ):
Later results with a better disambiguation technique (Kaplan et al.,2004):
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
Experimental Evaluation: DCU approach
On the same dataset (PARC 700), the DCU approach achieves betterF-1 than the PARC approach.
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
Outline
1 Introduction
2 LFG parsing using Discriminative EstimationRobust LFG parsingDiscriminative Estimation
3 PCFG-based LFG approximationsAutomatic f-structure annotationInducing LFG approximationsLong-distance dependencies
4 Experimental Evaluation
5 Discussion
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
Discussion
LFG parsing is challenging due to costly grammar engineeringand annotating
Two approaches were presented:1 Linguistically motivated (PARC): hand-crafted LFG using
stochastic disambiguation guided by existing treebankannotations
more robust and consistentno annotation efforthuge engineering effort, over-reliance on a single person’sexpertise
2 Data-driven (DCU): automatically induced LFG approximationfrom existing treebank annotations
no grammar engineering or annotation effortmore fragmented grammar, no linguistic motivation
Both approaches achieve similar evaluation results
Parsing withLexical-
FunctionalGrammars
NikosEngonopoulos
Introduction
LFG parsing usingDiscriminativeEstimation
Robust LFGparsing
DiscriminativeEstimation
PCFG-based LFGapproximations
Automaticf-structureannotation
Inducing LFGapproximations
Long-distancedependencies
ExperimentalEvaluation
Discussion
References
Aoife Cahill, Michael Burke, Ruth O’Donovan, Josef vanGenabith, and Andy Way, Long-distance dependency resolutionin automatically acquired wide-coverage pcfg-based lfgapproximations, Proceedings of the 42nd Annual Meeting of theAssociation for Computational Linguistics (ACL-04), 2004, July21–26, Barcelona, Spain, pp. 320–327.
R.M. Kaplan, S. Riezler, T.H. King, J.T. Maxwell,A. Vasserman, and R. Crouch, Speed and accuracy in shallowand deep stochastic parsing, Proceedings of HLT-NAACL, vol. 4,2004, pp. 97–104.
Stefan Riezler, Tracy H. King, Ronald M. Kaplan, RichardCrouch, John T. Maxwell, and Mark Johnson, Parsing the WallStreet Journal using a Lexical-Functional Grammar anddiscriminative estimation techniques, Proceedings of the 40thAnnual Meeting of the Association for Computational Linguistics(ACL’02) (Philadephia), 2002.