Constructions, Constraints, Transducers, and TAGs: A unifying view through Feature Logic
Part I: Constructions and Constraints
Bill Rounds, CSLI, Stanford
Fall 2007
1
HPSG
• Conceived as a constraint-based theory of grammar
• Principal grammatical mechanisms: implicational constraints and inheritance
• Reformulated by Sag as a system of constructions: Sign-Based Construction Grammar
2
Construction grammar
• Fillmore and Kay
• Constructions are the primary mechanism
• A construction simultaneously displays syntax, semantics, phonology, . . .
• All of this information is represented in feature structures.
3
Example Lexical Construction (Sag)
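$$
\mathit{pn\text{-}wrd} \Rightarrow
\left[\mathrm{SYN}\ \left[\begin{array}{ll}
\mathrm{CAT} & \left[\begin{array}{l} \mathit{noun} \\ \mathrm{SELECT}\ \langle\rangle \\ \mathrm{XARG}\ \mathit{none} \end{array}\right] \\
\mathrm{VAL} & \langle\rangle \\
\mathrm{MRKG} & \mathit{det}
\end{array}\right]\right]
$$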
• pn-wrd is a type (proper-noun word).
• ⇒ signifies that the constraint given by the RHS feature structure is to be enforced on all pn-wrds.
• In logical terms, I view the RHS feature structure as a clause containing exactly one feature logic formula.
4
The construction in another notation
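$$\mathit{pn\text{-}wrd} \rightarrow \mathrm{SYN}: \big(\mathrm{CAT}: (\mathit{noun} \wedge \mathrm{SELECT}: \langle\rangle \wedge \mathrm{XARG}: \mathit{none}) \wedge \mathrm{VAL}: \langle\rangle \wedge \mathrm{MRKG}: \mathit{det}\big)$$
If your lexicon specified john: pn-wrd, then by this construction you could infer the FL formula
$$\mathit{john}: \mathit{pn\text{-}wrd} \wedge \mathrm{SYN}: \big(\mathrm{CAT}: (\mathit{noun} \wedge \mathrm{SELECT}: \langle\rangle \wedge \mathrm{XARG}: \mathit{none}) \wedge \mathrm{VAL}: \langle\rangle \wedge \mathrm{MRKG}: \mathit{det}\big)$$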
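As a rough illustration of this inference, here is a minimal Python sketch that enforces a type-triggered constraint by unification. It assumes feature structures are nested dicts (the key `type` holds a node's type, every other key is a feature name), that types are flat (two different types never unify), and that there is no re-entrancy. The names `unify`, `CONSTRAINTS`, `expand`, and the type names `elist` (standing for ⟨⟩) and `unmk` are hypothetical.

```python
def unify(f, g):
    """Unify two nested-dict feature structures; return None on a type clash."""
    out = dict(f)
    for key, gval in g.items():
        if key not in out:
            out[key] = gval                  # feature only in g: copy it over
        elif key == "type":
            if out[key] != gval:             # flat types: must be identical
                return None
        else:
            sub = unify(out[key], gval)      # feature in both: unify the values
            if sub is None:
                return None
            out[key] = sub
    return out


# Hypothetical encoding of the pn-wrd construction: type => RHS structure.
CONSTRAINTS = {
    "pn-wrd": {"SYN": {"CAT": {"type": "noun",
                               "SELECT": {"type": "elist"},
                               "XARG": {"type": "none"}},
                       "VAL": {"type": "elist"},
                       "MRKG": {"type": "det"}}},
}


def expand(entry):
    """Enforce the constraint attached to the entry's type, if there is one."""
    rhs = CONSTRAINTS.get(entry.get("type"))
    return entry if rhs is None else unify(entry, rhs)


john = expand({"type": "pn-wrd"})
print(john["SYN"]["CAT"]["type"], john["SYN"]["MRKG"]["type"])  # noun det
```

A lexical entry whose own information contradicts the construction (say, an MRKG value other than det) makes `expand` return `None`, which is the sketch's stand-in for an unsatisfiable description.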
5
Kripke Structure
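[Figure: a Kripke structure with worlds $q_0$, $q_1$, $q_2$; each world assigns a truth value (0 or 1) to the propositional variables $p$, $q$, $r$, and a modal formula over $p$, $q$, $r$ is shown to hold at $q_0$.]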
• This is an NFA with just one letter in the alphabet.
• State ≡ “world”.
• In each world we have a setting of propositional variables.
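To make the automaton reading concrete, here is a small Python evaluator for modal formulas. It is a sketch over a hypothetical three-world model: the valuations follow the figure in extraction order, but the accessibility arcs in `ACCESS` are assumed. Formulas are nested tuples (an encoding chosen for illustration); `dia` (◇) asks whether some successor satisfies a formula and `box` (□) whether every successor does.

```python
# Hypothetical Kripke model: valuations in extraction order; arcs assumed.
ACCESS = {"q0": {"q1", "q2"}, "q1": {"q2"}, "q2": set()}  # one-letter NFA moves
VAL = {"q0": {"p", "q"}, "q1": {"q", "r"}, "q2": {"r"}}    # variables set to 1


def holds(w, phi):
    """Decide whether world w satisfies phi (a formula as a nested tuple)."""
    op = phi[0]
    if op == "var":
        return phi[1] in VAL[w]
    if op == "not":
        return not holds(w, phi[1])
    if op == "and":
        return holds(w, phi[1]) and holds(w, phi[2])
    if op == "or":
        return holds(w, phi[1]) or holds(w, phi[2])
    if op == "dia":   # ◇φ: some successor satisfies φ
        return any(holds(v, phi[1]) for v in ACCESS[w])
    if op == "box":   # □φ: every successor satisfies φ
        return all(holds(v, phi[1]) for v in ACCESS[w])
    raise ValueError(f"unknown connective {op!r}")


p, q, r = (("var", name) for name in "pqr")
print(holds("q0", ("and", p, ("dia", r))))  # True
```

Note that a world with no successors satisfies every □φ vacuously and no ◇φ.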
6
Feature Structure
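[Figure: a feature structure with states $q_0$ to $q_5$. The root $q_0$ has type sent, with a SUBJ arc to a state of type np and a PRED arc to a state of type vp; both of these have an AGR arc to the shared state $q_3$ of type syn, whose PERSON and NUMBER arcs lead to states of type 3rd and sing.]
$$q_0 \models \mathit{sent} \wedge \mathrm{PRED}: \mathit{verb} \wedge \mathrm{SUBJ}: \big(\mathit{np} \wedge \mathrm{AGR}: (\mathit{syn} \wedge \mathrm{PERSON}: \mathit{3rd} \wedge \mathrm{NUMBER}: \mathit{sing})\big) \wedge \mathrm{SUBJ}\ \mathrm{AGR} \doteq \mathrm{PRED}\ \mathrm{AGR}$$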
• This is a DFA over an alphabet of feature names $\Sigma = \{\mathrm{SUBJ}, \mathrm{PRED}, \ldots, \mathrm{NUMBER}\}$.
• State ≡ “head vertex of a data structure bearing linguistic information.”
• Each state bears a type instead of a propositional variable setting.
• We still have a modal formula that this structure satisfies. Instead of $\Diamond\varphi$ we have $a : \varphi$, where $a$ is a feature name or “attribute.”
7
• The formula SUBJ AGR ≐ PRED AGR indicates that two paths lead to the same state ($q_3$). This is called re-entrancy.
8
Attribute-value Matrices
• The earliest use of feature structures seems to have been by Chomsky and Halle in The Sound Pattern of English, on phonology.
• For example, here is a classification of a phoneme: N: [+cons, −voc, −nasal].
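• Consider the feature structure
[Figure: a root of type phrase whose SUBJ and PRED arcs both lead to a single shared node of type syn, which has a PERSON arc to 1st.]
• This has an attribute-value matrix
$$\left[\begin{array}{l} \mathit{phrase} \\ \mathrm{SUBJ}\ \boxed{1}\left[\begin{array}{l} \mathit{syn} \\ \mathrm{PERSON}\ \mathit{1st} \end{array}\right] \\ \mathrm{PRED}\ \boxed{1} \end{array}\right]$$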
9
Feature Structures formalized
• Let $\Sigma$ be a set of feature names, and $A$ be a set of type names. We call the pair $\langle \Sigma, A \rangle$ a feature signature.
• A typed feature structure of signature $\langle \Sigma, A \rangle$ is a tuple
$$\mathcal{A} = \langle D, d_0, \{f_\sigma\}_{\sigma \in \Sigma}, \tau \rangle,$$
where (i) $d_0 \in D$, (ii) for each feature name $\sigma$, $f_\sigma$ is a partial function on $D$, and (iii) $\tau : D \to A$.
• We are thinking of the usual transition function $\delta(q, \sigma)$ as being the same as $f_\sigma(q)$.
• We impose a partial ordering on $A$, called the subsumption ordering (or an inheritance hierarchy). We say “$a$ subsumes $b$”, and write $a \sqsubseteq b$, to mean that “$b$ is more informative than $a$.”
• Technically, we impose more restrictions on this ordering. It has to be a bounded-complete partial order, which means that if $c$ and $d$ have an upper bound at all, then they have a least upper bound. This is sometimes called a “semilattice”. In addition, we have “appropriateness conditions” between types and feature names. (For example, “ssno” is appropriate for “employee”, but not for “mollusk.”)
• We can extend the subsumption order on types to one on arbitrary feature structures of a given signature. This is done using the idea of automaton homomorphisms. They are akin to “functional bisimulations.”
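At the level of types, unification is just the least upper bound in the type order. A Python sketch for a finite hierarchy follows; the hierarchy in `PAIRS` (with types such as `headed` and `hd-phrase`) is hypothetical. `lub` returns `None` when two types have no common upper bound, and raises an error if the order fails to be bounded-complete at that pair.

```python
from itertools import product

TYPES = {"bot", "sign", "phrase", "word", "headed", "hd-phrase"}
# (a, b) means a ⊑ b, i.e. b is more informative than a.
PAIRS = {("bot", "sign"), ("sign", "phrase"), ("sign", "word"),
         ("bot", "headed"), ("phrase", "hd-phrase"), ("headed", "hd-phrase")}


def closure(types, pairs):
    """Reflexive-transitive closure of the subsumption pairs."""
    leq = {(t, t) for t in types} | set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(leq), repeat=2):
            if b == c and (a, d) not in leq:
                leq.add((a, d))
                changed = True
    return leq


LEQ = closure(TYPES, PAIRS)


def lub(a, b):
    """Least upper bound a ⊔ b, or None if a and b have no upper bound."""
    ups = [u for u in TYPES if (a, u) in LEQ and (b, u) in LEQ]
    if not ups:
        return None                      # incompatible types
    least = [u for u in ups if all((u, v) in LEQ for v in ups)]
    if len(least) != 1:
        raise ValueError(f"order is not bounded-complete at {a}, {b}")
    return least[0]


print(lub("phrase", "headed"))  # hd-phrase
print(lub("phrase", "word"))    # None
```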
10
Feature Structure Subsumption
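• Let $\mathcal{A} = \langle D^{\mathcal{A}}, d^{\mathcal{A}}_0, \{f^{\mathcal{A}}_\sigma\}_{\sigma \in \Sigma}, \tau^{\mathcal{A}} \rangle$ and $\mathcal{B} = \langle D^{\mathcal{B}}, d^{\mathcal{B}}_0, \{f^{\mathcal{B}}_\sigma\}_{\sigma \in \Sigma}, \tau^{\mathcal{B}} \rangle$ be two feature structures. We say that $\mathcal{A}$ subsumes $\mathcal{B}$, and write $\mathcal{A} \sqsubseteq \mathcal{B}$, if there is an automaton homomorphism $h : D^{\mathcal{A}} \to D^{\mathcal{B}}$ (so $h(d^{\mathcal{A}}_0) = d^{\mathcal{B}}_0$, and $h(f^{\mathcal{A}}_\sigma(d)) = f^{\mathcal{B}}_\sigma(h(d))$ whenever $f^{\mathcal{A}}_\sigma(d)$ is defined) with the property that $\tau^{\mathcal{A}}(d) \sqsubseteq \tau^{\mathcal{B}}(h(d))$ for all $d \in D^{\mathcal{A}}$.
• Example (assuming $\mathit{sign} \sqsubseteq \mathit{phrase}$):
[Figure: on the left, a structure of type sign whose SUBJ and PRED arcs lead to two distinct nodes of type syn, each with a PERSON arc to 1st; on the right, a structure of type phrase whose SUBJ and PRED arcs lead to a single shared node of type syn with a PERSON arc to 1st. Left $\sqsubseteq$ right.]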
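• If $\mathcal{A} \sqsubseteq \mathcal{B}$ and $\mathcal{B} \sqsubseteq \mathcal{A}$, then $\mathcal{A}$ and $\mathcal{B}$ are isomorphic. Thus $\sqsubseteq$ is a partial order on feature structures up to isomorphism.
• $\sqsubseteq$ is also a bounded-complete partial order. The least upper bound $\mathcal{A} \sqcup \mathcal{B}$ of two compatible feature structures is called their unification.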
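Because feature structures are rooted and deterministic, the homomorphism $h$ in the definition, if it exists, is forced: it must send root to root and follow matching arcs. The Python sketch below checks $\mathcal{A} \sqsubseteq \mathcal{B}$ this way. The `(root, arcs, tau)` representation, the node names, and the one-pair type order `BELOW` (encoding $\mathit{sign} \sqsubseteq \mathit{phrase}$) are assumptions for illustration; the example encodes the sign/phrase pair from this slide.

```python
from collections import deque

BELOW = {("sign", "phrase")}  # assumed type order: sign ⊑ phrase


def type_leq(a, b):
    return a == b or (a, b) in BELOW


def subsumes(A, B):
    """A ⊑ B iff the forced homomorphism from A's nodes to B's nodes exists.

    A structure is (root, arcs, tau) with arcs[node] = {feature: node}.
    """
    (ra, arcs_a, tau_a), (rb, arcs_b, tau_b) = A, B
    h = {ra: rb}                         # roots must correspond
    todo = deque([ra])
    while todo:
        d = todo.popleft()
        if not type_leq(tau_a[d], tau_b[h[d]]):
            return False
        for feat, d2 in arcs_a.get(d, {}).items():
            e2 = arcs_b.get(h[d], {}).get(feat)
            if e2 is None:
                return False             # B lacks an arc that A has
            if d2 in h:
                if h[d2] != e2:          # sharing in A must be preserved in B
                    return False
            else:
                h[d2] = e2
                todo.append(d2)
    return True


LEFT = ("n0", {"n0": {"SUBJ": "n1", "PRED": "n2"},
               "n1": {"PERSON": "n3"}, "n2": {"PERSON": "n4"}},
        {"n0": "sign", "n1": "syn", "n2": "syn", "n3": "1st", "n4": "1st"})
RIGHT = ("m0", {"m0": {"SUBJ": "m1", "PRED": "m1"}, "m1": {"PERSON": "m2"}},
         {"m0": "phrase", "m1": "syn", "m2": "1st"})
print(subsumes(LEFT, RIGHT), subsumes(RIGHT, LEFT))  # True False
```

The reverse check fails twice over: phrase is not below sign, and the shared node of RIGHT would have to map to two different nodes of LEFT.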
11
Feature Logic - background
• We use modal logic to put linguistic conditions on the kinds of descriptions we can have in a grammar. This comes from the grammar formalisms GPSG, HPSG, and LFG. All of these formalisms are called unification-based.
• The original logic of this kind is called Kasper-Rounds logic. A better version, due to Carpenter, is called typed feature logic.
• There is now a very active area of mathematical linguistics called model-theoretic syntax, which stems from the idea of using logic to constrain syntactic structure.
12
Feature logic – formal definition
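Fix a bcpo $(A, \sqsubseteq)$ of types and a set $\Sigma$ of feature names. The formulas of typed feature logic are the following:
• $\mathit{true}$;
• $a$, for $a \in A$;
• $x : \varphi$, for $x \in \Sigma^+$ and $\varphi$ a formula;
• $x \doteq y$, for $x, y \in \Sigma^*$;
• $\varphi \vee \psi$;
• $\varphi \wedge \psi$.
Let $\mathcal{A}$ be a feature structure, and $x \in \Sigma^*$. By $\mathcal{A}/x$ we mean the feature structure headed by the node $\delta(q_0, x)$, provided there is a path to this node labeled by $x$. We then define the relation $\mathcal{A} \models \varphi$ recursively:
• $\mathcal{A} \models \mathit{true}$ always;
• $\mathcal{A} \models a$ if $a \sqsubseteq \tau(q_0)$;
• $\mathcal{A} \models x : \varphi$ if $\mathcal{A}/x$ is defined and $\mathcal{A}/x \models \varphi$;
• $\mathcal{A} \models x \doteq y$ if $\mathcal{A}/x$ and $\mathcal{A}/y$ are defined and $\mathcal{A}/x = \mathcal{A}/y$ (the paths $x$ and $y$ lead to the same node);
• $\mathcal{A} \models \varphi \vee \psi$ if $\mathcal{A} \models \varphi$ or $\mathcal{A} \models \psi$;
• $\mathcal{A} \models \varphi \wedge \psi$ if $\mathcal{A} \models \varphi$ and $\mathcal{A} \models \psi$.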
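The satisfaction clauses translate almost line by line into a recursive checker. In this Python sketch, formulas are nested tuples such as `("type", a)`, `("path", x, φ)`, `("eq", x, y)`, `("and", φ, ψ)`, `("or", φ, ψ)` and `("true",)`, an encoding chosen for illustration. The structure `FS` encodes the phrase/SUBJ/PRED example from the attribute-value matrix slide, and the order `BELOW` ($\mathit{sign} \sqsubseteq \mathit{phrase}$) is assumed.

```python
BELOW = {("sign", "phrase")}  # assumed type order: sign ⊑ phrase


def type_leq(a, b):
    return a == b or (a, b) in BELOW


def walk(arcs, node, path):
    """A/x: follow path x from node; None if some arc is missing."""
    for feat in path:
        node = arcs.get(node, {}).get(feat)
        if node is None:
            return None
    return node


def sat(fs, phi, node=None):
    """Decide fs |= phi, evaluating at `node` (default: the root)."""
    root, arcs, tau = fs
    node = root if node is None else node
    op = phi[0]
    if op == "true":
        return True
    if op == "type":    # A |= a  iff  a ⊑ τ(q0)
        return type_leq(phi[1], tau[node])
    if op == "path":    # A |= x:φ  iff  A/x is defined and A/x |= φ
        target = walk(arcs, node, phi[1])
        return target is not None and sat(fs, phi[2], target)
    if op == "eq":      # A |= x ≐ y  iff  x and y lead to the same node
        a, b = walk(arcs, node, phi[1]), walk(arcs, node, phi[2])
        return a is not None and a == b
    if op == "or":
        return sat(fs, phi[1], node) or sat(fs, phi[2], node)
    if op == "and":
        return sat(fs, phi[1], node) and sat(fs, phi[2], node)
    raise ValueError(f"unknown connective {op!r}")


# phrase with SUBJ and PRED sharing one syn node whose PERSON is 1st
FS = ("r", {"r": {"SUBJ": "s", "PRED": "s"}, "s": {"PERSON": "p"}},
      {"r": "phrase", "s": "syn", "p": "1st"})
phi = ("and", ("type", "sign"),
       ("and", ("path", ("PRED", "PERSON"), ("type", "1st")),
               ("eq", ("SUBJ",), ("PRED",))))
print(sat(FS, phi))  # True
```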
13
Feature logic - Properties
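• There is no negation in FL. This gives us the persistence property:
If $\mathcal{A} \models \varphi$ and $\mathcal{A} \sqsubseteq \mathcal{B}$, then $\mathcal{B} \models \varphi$.
• If the set of types is finite, as well as the set of feature names, then every formula $\varphi$ has a finite set of most general satisfiers, where a most general satisfier (mgs) is a $\sqsubseteq$-minimal feature structure $\mathcal{A}$ with $\mathcal{A} \models \varphi$.
• Conjunction captures unification: if $\mathcal{A} \models \varphi$ and $\mathcal{B} \models \psi$, and $\mathcal{A}$ and $\mathcal{B}$ are compatible, then $\mathcal{A} \sqcup \mathcal{B} \models \varphi \wedge \psi$ by persistence. If $\varphi$ and $\psi$ are disjunction-free and $\mathcal{A}$, $\mathcal{B}$ are their most general satisfiers, then $\mathcal{A} \sqcup \mathcal{B}$ is the most general satisfier of $\varphi \wedge \psi$.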
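Unification itself can be computed by merging nodes, in the style of union-find with congruence closure: identify the two roots, and whenever two merged nodes both have an arc for the same feature, identify the targets as well. This Python sketch assumes a `(root, arcs, tau)` representation and a tiny hypothetical type order whose only non-trivial pair is $\mathit{sign} \sqsubseteq \mathit{phrase}$, so `type_lub` covers only that case; `_tag` and `unify` are illustrative names.

```python
BELOW = {("sign", "phrase")}  # assumed type order: sign ⊑ phrase, rest flat


def type_lub(a, b):
    """Least upper bound of two types in this tiny order, or None on a clash."""
    if a == b or (b, a) in BELOW:
        return a
    if (a, b) in BELOW:
        return b
    return None


def _tag(side, fs):
    """Rename nodes apart so the two structures share no node names."""
    _, arcs, tau = fs
    new_arcs = {(side, n): {f: (side, m) for f, m in out.items()}
                for n, out in arcs.items()}
    return new_arcs, {(side, n): t for n, t in tau.items()}


def unify(A, B):
    """A ⊔ B for (root, arcs, tau) structures, or None if types clash."""
    arcs_a, tau_a = _tag("a", A)
    arcs_b, tau_b = _tag("b", B)
    arcs, tau = {**arcs_a, **arcs_b}, {**tau_a, **tau_b}
    parent = {n: n for n in tau}             # union-find forest over all nodes

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    pending = [(("a", A[0]), ("b", B[0]))]   # start by identifying the roots
    while pending:
        x, y = map(find, pending.pop())
        if x == y:
            continue
        t = type_lub(tau[x], tau[y])
        if t is None:
            return None
        parent[y], tau[x] = x, t
        for f, target in arcs.pop(y, {}).items():   # move y's arcs onto x
            out = arcs.setdefault(x, {})
            if f in out:
                pending.append((out[f], target))    # shared feature: merge
            else:
                out[f] = target
    root = find(("a", A[0]))                 # read off the merged structure
    res_arcs, res_tau, todo = {}, {}, [root]
    while todo:
        n = todo.pop()
        if n in res_tau:
            continue
        res_tau[n] = tau[n]
        res_arcs[n] = {f: find(m) for f, m in arcs.get(n, {}).items()}
        todo.extend(res_arcs[n].values())
    return root, res_arcs, res_tau


A = ("r", {"r": {"SUBJ": "s"}, "s": {"PERSON": "p"}},
     {"r": "phrase", "s": "syn", "p": "1st"})
B = ("r", {"r": {"SUBJ": "s", "PRED": "s"}}, {"r": "sign", "s": "syn"})
root, arcs, tau = unify(A, B)
print(arcs[root]["SUBJ"] == arcs[root]["PRED"], tau[root])  # True phrase
```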
14
DCPOs - Motivation
• We’re going to do computation using feature structures.
• Instead of using Turing machines, we’re going to write recursive programs stated in feature logic.
• The prototype of this is context-free grammar: you can think of a CFG as a recursive program stating what trees are legitimate. But programming with FS is much more general.
• In order to say what a feature logic program is, and what its meaning is, we are going to go very general. We will locate finite and infinite feature structures in a class of directed-complete partial orders (dcpos).
15
DCPOs and BCPO’s
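[Figure: a directed subset $X$ of a dcpo $D$, containing elements $a$ and $b$ together with an upper bound $c$ of both, and the least upper bound $\bigsqcup X$.]
• A directed-complete partial order is a poset $(D, \sqsubseteq)$ such that every directed subset $X$ of $D$ has a least upper bound $\bigsqcup X \in D$.
• A directed set is a generalization of a totally ordered set. We say a nonempty $X$ is directed if for all $a, b \in X$ there is a $c \in X$ which is an upper bound for $a$ and $b$.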
• A crude way to think about this is that every increasing sequencehas a limit.
16
Compact (finite) elements and algebraic DCPOs
• An element $d$ in a dcpo $D$ is said to be finite or compact if for any directed $X \subseteq D$ such that $d \sqsubseteq \bigsqcup X$, we must have an $x \in X$ such that $d \sqsubseteq x$.
• A finite element must participate in the actual construction of every limit point which it subsumes.
• The black elements in the picture do not participate in the construction process given by the white elements. They are not finite.
• If you moved the left-hand black element to be just over the top one, then it would be finite! Any directed set with that element as limit would have to contain it as a member.
• A dcpo is algebraic if every element is the least upper bound of the finite elements it subsumes.
17
Continuous functions on DCPO’s
• Definition 1. Let $(S, \sqsubseteq)$ be a dcpo. A function $T : S \to S$ is said to be continuous if (i) $T$ is monotonic, and (ii) for every directed $D \subseteq S$, we have
$$T\Big(\bigsqcup D\Big) = \bigsqcup \{T(d) \mid d \in D\}.$$
• The definition makes sense because, by monotonicity, $\{T(d) \mid d \in D\}$ is a directed set.
• Theorem 2. Every continuous function on a dcpo with a least element $\bot$ has a least fixed point.
• The least fixed point is $\bigsqcup_{n \ge 0} T^n(\bot)$.
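On a finite lattice the chain $\bot \sqsubseteq T(\bot) \sqsubseteq T^2(\bot) \sqsubseteq \cdots$ must stabilise, so the least fixed point can be computed by plain iteration. Here is a minimal Python sketch; the operator `T` is a hypothetical example that collects the strings derivable from $S \rightarrow aS \mid a$, cut off at length `MAX_LEN = 4` so that the lattice of string sets is finite.

```python
def lfp(T, bottom):
    """Iterate bottom, T(bottom), T(T(bottom)), ... until the value is stable.

    Returns the fixed point and the number n of steps with T^n(bottom) equal
    to it. Terminates whenever the chain stabilises, e.g. on a finite lattice.
    """
    x, n = bottom, 0
    while True:
        y = T(x)
        if y == x:
            return x, n
        x, n = y, n + 1


MAX_LEN = 4  # hypothetical cut-off that keeps the lattice finite


def T(words):
    """One derivation step for S -> aS | a, restricted to length <= MAX_LEN."""
    return frozenset({"a"}) | frozenset("a" + w for w in words if len(w) < MAX_LEN)


fix, steps = lfp(T, frozenset())
print(sorted(fix), steps)  # ['a', 'aa', 'aaa', 'aaaa'] 4
```

Without the cut-off the chain never stabilises, and the least fixed point is the infinite union $\bigsqcup_{n} T^n(\bot)$ of the theorem rather than a value reached after finitely many steps.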
18
BCPO’s
[Figure: two elements $a$ and $b$ with an upper bound $c$, and their least upper bound $a \sqcup b$.]
• A DCPO $D$ is bounded-complete if any set $Y$ of elements which has an upper bound has, in fact, a least upper bound $\bigsqcup Y$ in $D$.
• If the DCPO is algebraic, you can replace the set $Y$ with a two-element set $\{a, b\}$.
• A Scott domain is an algebraic BCPO with a countable set of compact elements (countable basis).
• There is a way to make a Scott domain out of feature structures.
19
The Feature Structure Domain
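[Figure: on the left, a structure of type sign whose SUBJ and PRED arcs lead to two distinct nodes of type syn, each with a PERSON arc to 1st; on the right, a structure of type phrase whose SUBJ and PRED arcs lead to a single shared node of type syn with a PERSON arc to 1st. Left $\sqsubseteq$ right.]
• Using $s$, $p$, $r$ to stand for “subject”, “predicate”, and “person”, the set of paths (in both structures) is $\{\epsilon, s, p, sr, pr\}$.
• To distinguish the right-hand structure, we impose a “right-invariant” equivalence relation $\equiv$ which makes $s \equiv p$ and $sr \equiv pr$.
• (Moshier.) An abstract feature structure is a triple $(P, \equiv, \tau)$, where $P$ is a nonempty prefix-closed subset of $\Sigma^*$, $\equiv$ is a right-invariant equivalence relation on $P$ (if $x \equiv y$ and $xz \in P$, then $yz \in P$ and $xz \equiv yz$), and $\tau : P \to A$ is a typing function such that $x \equiv y$ implies $\tau(x) = \tau(y)$.
• Under the “inclusion” ordering on these triples, the domain of feature structures is an algebraic bcpo.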
20
What about the circular structures?
• Even if we have a circular feature structure, like one representing self-employed persons,
[Figure: a single node of type employee with an employed-by arc that loops back to the node itself.]
you can still assign an abstract feature structure to it. The fact that it is a finite structure is represented by the relation $\equiv$ having finite index; in fact, there is just one equivalence class, since all paths are equivalent.
• So this feature structure is near the top of our domain, because ithas the biggest equivalence relation, and the largest set of paths.
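Both examples can be checked mechanically. The Python sketch below computes, from a graph `(root, arcs, tau)`, the paths up to a hypothetical length bound `max_len` ($P$ is infinite for the circular structure, so only a finite part is enumerated), the $\equiv$-classes (paths reaching the same node), and the typing of paths. The two structures are illustrative encodings of the SUBJ/PRED example (with the single-letter features $s$, $p$, $r$) and of the self-employed employee.

```python
def abstract(fs, max_len):
    """Paths P (up to max_len), the ≡-classes, and τ on paths, for (root, arcs, tau)."""
    root, arcs, tau = fs
    node_of = {(): root}          # each path in P mapped to the node it reaches
    frontier = [()]
    for _ in range(max_len):
        nxt = []
        for path in frontier:
            for feat, m in arcs.get(node_of[path], {}).items():
                node_of[path + (feat,)] = m
                nxt.append(path + (feat,))
        frontier = nxt
    classes = {}                  # x ≡ y iff x and y reach the same node
    for path, n in node_of.items():
        classes.setdefault(n, set()).add(path)
    tau_p = {path: tau[n] for path, n in node_of.items()}
    return set(node_of), list(classes.values()), tau_p


SHARED = ("q0", {"q0": {"s": "q1", "p": "q1"}, "q1": {"r": "q2"}},
          {"q0": "phrase", "q1": "syn", "q2": "1st"})
SELF_EMPLOYED = ("e", {"e": {"employed-by": "e"}}, {"e": "employee"})

P, classes, tau_p = abstract(SHARED, 3)
print(len(P), len(classes))                # 5 3
print(len(abstract(SELF_EMPLOYED, 3)[1]))  # 1
```

For a finite graph, the number of $\equiv$-classes is the number of reachable nodes, which is why a circular but finite structure gives an equivalence of finite index.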
21
Clauses
• Conjunctive formulas: $\mathit{true}$; $a$ for $a \in A$; $x : \varphi$ for $x \in \Sigma^*$; $p \doteq q$ for $p, q \in \Sigma^*$; $\varphi \wedge \psi$.
• Every finite feature structure is describable by a conjunctive formula: it will be the minimal satisfier of that formula.
• Every conjunction of basic formulas has at most one minimum satisfier.
• A clause is a disjunction of conjunctive formulas.
• We can represent a clause as a set of subsumption-minimal feature structures.
• For example, the clause
$$(S2 \wedge 1 : a \wedge 2 : S1) \vee (S1 \wedge 1 : a)$$
is represented by
$$\left\{ \left[\begin{array}{l} S2 \\ 1\ \ a \\ 2\ \ S1 \end{array}\right],\ \left[\begin{array}{l} S1 \\ 1\ \ a \end{array}\right] \right\}.$$
22
Theories and Models
• Let $K, L$ be clauses. We say $K \models L$ if every satisfier of $K$ is a satisfier of $L$. Using the FS representation of clauses, this means that every element of $K$ is subsumed by some element of $L$.
• A theory is a set of clauses.
• $f$ is a model of $T$ if $f \models K$ for all $K \in T$.
• A theory $T$ logically implies a clause $L$ ($T \models L$) if every model of $T$ is a model of $L$. Notice that the theory consisting of the empty clause logically implies any clause.
• The logical closure $\mathrm{Cl}(T)$ of $T$ is the set $\{L \mid T \models L\}$.
• A theory $T$ is logically closed if $T = \mathrm{Cl}(T)$.
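Under a strong simplifying assumption, the clause-level definitions become a few lines of Python: take a feature structure to be just a finite set of (path, type) facts, with flat types and no re-entrancy, so that $f \sqsubseteq g$ is set inclusion. This is a toy model for illustration, not the general setting of these slides; `fs`, `satisfies` and `entails` are hypothetical helper names, and the example clause is the one from the Clauses slide.

```python
def fs(*facts):
    """A toy feature structure: a frozenset of (path, type) facts."""
    return frozenset(facts)


def subsumes(f, g):
    """f ⊑ g: every fact of f also holds in g."""
    return f <= g


def satisfies(g, clause):
    """g |= K: some minimal satisfier in K subsumes g."""
    return any(subsumes(k, g) for k in clause)


def entails(K, L):
    """K |= L: every element of K is subsumed by some element of L."""
    return all(any(subsumes(m, k) for m in L) for k in K)


# (S2 ∧ 1:a ∧ 2:S1) ∨ (S1 ∧ 1:a)
K = [fs(((), "S2"), (("1",), "a"), (("2",), "S1")),
     fs(((), "S1"), (("1",), "a"))]
print(entails(K, [fs((("1",), "a"))]))  # True: both disjuncts have 1:a
print(entails(K, [fs(((), "S1"))]))     # False: the first disjunct is S2
```

The empty clause has no satisfiers, and `entails([], L)` is `True` for every `L`, matching the remark that the empty clause implies any clause.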
23
Compactness
• Compactness Theorem. If $L$ is a logical consequence of $T$, then $L$ is a logical consequence of a finite subset of $T$. Here’s a sketch of the proof.
• One can prove that the space of logically closed theories is an algebraic bcpo under the inclusion ordering; further, that the compact (finite) elements of this bcpo are exactly the logical closures of finite theories. (This is not trivial.)
• To prove the compactness theorem from this, notice that the closures $\mathrm{Cl}(F)$ of the finite subsets $F \subseteq T$ form a directed set whose least upper bound is $\mathrm{Cl}(T)$.
• If $T \models L$, then $L \in \mathrm{Cl}(T)$, so the compact element $\mathrm{Cl}(\{L\})$ lies below this least upper bound. By compactness, $\mathrm{Cl}(\{L\}) \subseteq \mathrm{Cl}(F)$ for some finite $F \subseteq T$, that is, $F \models L$. This is our desired result.
24
Resolution
• We introduce a (generalized) resolution rule. In the following, $K, L, M$ are clauses.
$$\frac{K \qquad L \qquad f \in K \qquad g \in L \qquad \{f \sqcup g\} \models M}{M \cup (K - f) \cup (L - g)}$$
• The side condition $\{f \sqcup g\}$ means the singleton clause consisting of the unification of $f$ and $g$ if that exists, and the empty clause otherwise.
• To see how this rule generalizes the usual one, let $a$ and $a'$ be inconsistent types, and consider clauses $K, L$ with $a \in K$ and $a' \in L$. Then $\{a \sqcup a'\} = \emptyset \models \emptyset$, so the conclusion of the rule is $(K - a) \cup (L - a')$.
• This rule is sound. Suppose the side conditions in the rule hypothesis hold. We claim that for any $h$, if $h \models K$ and $h \models L$, then $h \models M \cup (K - f) \cup (L - g)$. The only way this could fail is if $h$ satisfies $K$ only through $f$ and $L$ only through $g$, so that $f \sqsubseteq h$ and $g \sqsubseteq h$. But then $f \sqcup g$ exists and $f \sqcup g \sqsubseteq h$, so $h \models \{f \sqcup g\}$, and by the last condition, $h \models M$.
• We add two more standard rules to our proof system:
– Initial: $$\frac{}{\{\bot\}}$$
– Inconsistency: $$\frac{\emptyset}{M}$$
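In a toy model where a feature structure is a finite set of (path, type) facts with flat types (so $\sqsubseteq$ is inclusion), one application of the generalized resolution rule can be sketched in Python as follows. `unify` returns `None` when some path would receive two different types, so the side-condition clause $\{f \sqcup g\}$ is empty in that case; the example replays the inconsistent-types case $a$, $a'$ from the slide. All helper names are hypothetical.

```python
def fs(*facts):
    return frozenset(facts)


def entails(K, M):
    """K |= M: every element of K is subsumed (⊆) by some element of M."""
    return all(any(m <= k for m in M) for k in K)


def unify(f, g):
    """f ⊔ g, or None if some path would get two different types."""
    u = f | g
    paths = [p for p, _ in u]
    return u if len(paths) == len(set(paths)) else None


def resolve(K, L, f, g, M):
    """From K, L, f ∈ K, g ∈ L and {f ⊔ g} |= M, derive M ∪ (K - f) ∪ (L - g).

    Returns None when a side condition fails.
    """
    u = unify(f, g)
    side = [] if u is None else [u]      # {f ⊔ g}, empty if f and g clash
    if f not in K or g not in L or not entails(side, M):
        return None
    return set(M) | (set(K) - {f}) | (set(L) - {g})


a, a2 = fs(((), "a")), fs(((), "a'"))     # inconsistent root types
b, c = fs((("1",), "b")), fs((("2",), "c"))
print(resolve({a, b}, {a2, c}, a, a2, set()) == {b, c})  # True
```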
25
Completeness
• Completeness Theorem. If $T \models L$, then $L$ is provable from $T$ using resolution and the two other rules.
• Proof sketch: By the compactness theorem, $L$ is a logical consequence of a finite subset $\{K_1, \ldots, K_n\}$ of $T$. We show that if $\{K_1, \ldots, K_n\} \models L$, then we can prove $L$ from the finitely many assumptions $K_i$.
• One case of this is relatively easy: the theory is a single clause $K$. If $K \models L$, we can show $K \vdash L$.
• Now let $P$ and $Q$ be clauses. The cross-unification $P \bowtie Q$ is the clause $\{p \sqcup q \mid p \in P, q \in Q\}$, where pairs without a unification contribute nothing.
• Since clauses are disjunctions, and unification produces the most general satisfier of a conjunction, the distributive law makes this clause logically equivalent to $P \wedge Q$.
• Lemma. $\{P, Q\} \vdash P \bowtie Q$.
• Example. Proof that $\{f\}, \{g, h\} \vdash \{f \sqcup g, f \sqcup h\}$:
$$\dfrac{\{f\} \qquad \dfrac{\{f\} \qquad \{g, h\} \qquad \{f \sqcup g\} \models \{f \sqcup g\}}{\{f \sqcup g, h\}} \qquad \{f \sqcup h\} \models \{f \sqcup h\}}{\{f \sqcup g, f \sqcup h\}}$$
In the second inference we reuse the assumption $\{f\}$.
26
• This lemma together with the single-clause case is enough to finish the proof.
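Cross-unification is easy to compute in a toy model where a feature structure is a finite set of (path, type) facts with flat types; pairs that fail to unify simply contribute nothing. The Python sketch below, with hypothetical helper names, reproduces the shape of the example $\{f\} \bowtie \{g, h\} = \{f \sqcup g, f \sqcup h\}$.

```python
def fs(*facts):
    return frozenset(facts)


def unify(f, g):
    """f ⊔ g, or None if some path would get two different types."""
    u = f | g
    paths = [p for p, _ in u]
    return u if len(paths) == len(set(paths)) else None


def cross_unify(P, Q):
    """P ⋈ Q = {p ⊔ q | p ∈ P, q ∈ Q}, dropping pairs that do not unify."""
    return {u for p in P for q in Q if (u := unify(p, q)) is not None}


f = fs(((), "S"))
g = fs((("1",), "a"))
h = fs((("2",), "b"))
print(cross_unify({f}, {g, h}) == {f | g, f | h})  # True
```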
27
Logic Programming with feature logic
• A logic programming rule is a construct
$$f \rightarrow L$$
where $f$ is a feature structure and $L$ is a clause.
• A logic program is a set of logic programming rules.
• Example. Consider a CFG with one nonterminal $S$, one terminal $a$, start symbol $S$, and the productions $S \rightarrow aS \mid a$.
• We create a type system as follows. The nonterminal $S$ will have two incarnations, one as a binary type $S2$ and one as a unary type $S1$. The symbol $a$ is a constant. There are two attributes, $1$ and $2$, indicating left and right branches.
• Now consider the productions modelled as logic programming rules
$$\bot \rightarrow \{S\} \qquad S \rightarrow \{[1 : a \sqcup 2 : S],\ a\}$$
28
A program resolution rule
• We want to use logic programming rules to build up a theory. Themodels of this theory should be the structures we desire.
• To do this, we define a program resolution rule as follows:
$$\frac{M \qquad g \in M \qquad f \rightarrow L \in P \qquad f \sqsubseteq g}{L \cup (M \setminus \{g\})}$$
• We define the theory $\mathrm{Th}(P)$ of a program $P$ to be the set of all clauses derivable using the logical and program resolution rules.
• Example: What’s the theory of this program? (Assume $S \sqsubseteq a$.)
$$\bot \rightarrow \{S\} \qquad S \rightarrow \{[1 : a \sqcup 2 : S],\ a\}$$
• We get $\{\bot\}$ for free. Then we get $\{S\}$ using the program rule. From this we get the clause $\{[1 : a \sqcup 2 : S], a\}$, but we already have $\{S\}$, so we can cross-unify this with $\{[1 : a \sqcup 2 : S], a\}$, getting $\{[S \sqcup [1 : a \sqcup 2 : S]], a\}$ (since $S \sqsubseteq a$, we have $S \sqcup a = a$).
• The program inference rule cannot be applied to this clause in an irredundant way. We need a way to be able to rewrite the inner $S$.
29
Rule generators
The way to get more rules is to introduce the prefixing rule generator as follows: if $f \rightarrow L \in P$, then for any attribute $\sigma$ we add a rule $\sigma : f \rightarrow \{\sigma : g \mid g \in L\}$ to $P$.
• Example: By the prefixing scheme, we can add the rule
$$2 : S \rightarrow \{2 : [1 : a \sqcup 2 : S],\ 2 : a\}$$
to our CFG program.
• The structure $[S \sqcup [1 : a \sqcup 2 : S]]$ in the clause $M = \{[S \sqcup [1 : a \sqcup 2 : S]], a\}$ is subsumed by $2 : S$, so the program resolution rule allows us to derive the clause $\{2 : [1 : a \sqcup 2 : S], 2 : a, a\}$. We can then cross-unify this clause with $M$. Try it!
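The prefixing generator and the program resolution rule can be sketched in a toy model: a structure is a frozenset of (path, type) facts, $\bot$ is the empty set, and $f \sqsubseteq g$ is inclusion. Flat types cannot express the assumption $S \sqsubseteq a$, so this Python sketch replays only the step just described, applying the prefixed rule to the clause $M$ from the slide. Function names are hypothetical.

```python
def fs(*facts):
    return frozenset(facts)


def prefix(sigma, f):
    """σ:f, i.e. push every path of f one step down, under attribute σ."""
    return frozenset(((sigma,) + path, t) for path, t in f)


def prefix_rule(sigma, rule):
    """From f -> L build σ:f -> {σ:g | g ∈ L}."""
    f, L = rule
    return prefix(sigma, f), frozenset(prefix(sigma, g) for g in L)


def program_step(M, g, rule):
    """Program resolution: from M, g ∈ M, f -> L and f ⊑ g, derive L ∪ (M - {g})."""
    f, L = rule
    if g not in M or not f <= g:
        return None
    return set(L) | (set(M) - {g})


S_RULE = (fs(((), "S")), {fs((("1",), "a"), (("2",), "S")), fs(((), "a"))})
rule2 = prefix_rule("2", S_RULE)                 # 2:S -> {2:[1:a ⊔ 2:S], 2:a}
g = fs(((), "S"), (("1",), "a"), (("2",), "S"))  # S ⊔ [1:a ⊔ 2:S]
M = {g, fs(((), "a"))}
derived = program_step(M, g, rule2)              # {2:[1:a ⊔ 2:S], 2:a, a}
print(len(derived))  # 3
```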
30
Some theory
• Consider the program resolution rule
$$\frac{M \qquad g \in M \qquad f \rightarrow L \in P \qquad f \sqsubseteq g}{L \cup (M \setminus \{g\})}$$
We say that $L \cup (M \setminus \{g\})$ is a $P$-consequence of $M$.
• Further, we say that $N$ is a $P$-consequence of a theory $T$ if it is a $P$-consequence of some $M \in T$.
• Let $T$ be a logically closed theory over $D$, and let $P$ be a program. We define
$$T_P(T) = \mathrm{Cl}\{Y \mid Y \text{ is a } P\text{-consequence of } T\}.$$
• $T_P(T)$ is the logical closure of the theory obtained from $T$ by taking one step of the program inference rule in all possible ways. Clearly it is monotonic in the set inclusion ordering on theories.
• Theorem 3. $T_P$ is continuous in the inclusion order on logically closed theories.
• By the standard least-fixed-point theorem, $T_P$ has the least fixed point $\mu T_P = \bigcup_n T_P^n(\bot)$, where $\bot$ is the theory consisting of all consequences of $\{\mathit{true}\}$.
• Proposition 4. $\mu T_P$ is the theory of $P$.
31
Model-theoretic consequences of a program
• Let $P$ be a logic program. A feature structure $g$ is said to be a model of $P$ if for every rule $f \rightarrow L$ in $P$, if $f \sqsubseteq g$, then $g \models L$.
• A clause $M$ is a model-theoretic consequence of $P$ if every model of $P$ satisfies $M$. We write $\mathrm{ModCons}(P)$ for the set of all clauses which are model-theoretic consequences of $P$.
• Theorem 5 (Rounds and Zhang). $\mathrm{ModCons}(P) = \mu T_P$.
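The definition of a model can be checked directly for a single finite structure. This Python sketch works in a toy setting (frozensets of (path, type) facts, flat types, $f \sqsubseteq g$ as inclusion, $\bot$ as the empty set) with hypothetical names; it uses only the two unprefixed CFG rules, since flat types cannot express $S \sqsubseteq a$.

```python
def fs(*facts):
    return frozenset(facts)


def satisfies(g, clause):
    """g |= L: some element of the clause subsumes g."""
    return any(k <= g for k in clause)


def is_model(g, program):
    """g is a model of P iff g |= L for every rule f -> L with f ⊑ g."""
    return all(satisfies(g, L) for f, L in program if f <= g)


BOT, S = frozenset(), fs(((), "S"))
PROGRAM = [(BOT, {S}),                                              # ⊥ -> {S}
           (S, {fs((("1",), "a"), (("2",), "S")), fs(((), "a"))})]  # S -> {...}

print(is_model(fs(((), "S"), (("1",), "a"), (("2",), "S")), PROGRAM))  # True
print(is_model(S, PROGRAM))  # False: S alone has neither 1:a, 2:S nor type a
```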
32
Preview: Modelling TAG in feature logic
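[Figure: two panels, (i) and (ii), showing trees whose nodes are indexed by tree addresses such as $\langle 0 \rangle$, $\langle 1 \rangle$, $\langle 2 \rangle$, $\langle 1, 0 \rangle$, $\langle 1, 1 \rangle$, with node labels $a$, $b$, $c$.]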
33