Disjunctive Unification
Andreas Eisele, Jochen Dörre
Institut für maschinelle Sprachverarbeitung
Universität Stuttgart
A Bradford Book
The MIT Press
Cambridge, Massachusetts
London, England
Contents
4 Andreas Eisele & Jochen Dörre: Disjunctive Unification
4.1 Introduction
4.1.1 Unification-based grammar formalisms
4.1.2 Disjunction and Ambiguity in Unification Grammars
4.1.3 The Processing of Ambiguous Unification Grammars
4.1.4 Organisation of Paper
4.2 Unification of Feature Structures with Disjunctive Information
4.2.1 Karttunen's Approach
4.2.2 Disjunction in [Eisele & Dörre, 1988]
4.2.3 Representation of General Disjunction in [Kasper, 1987c]
4.2.4 Conclusion and Motivation of a New Approach
4.3 Disjunction Names
4.3.1 Exporting Disjunction, Revisited
4.3.2 Other Applications of Named Disjunction
4.4 Feature Terms
4.4.1 Syntax of Feature Terms
4.4.2 Semantics of Feature Terms
4.4.3 Differences to Smolka's Feature Terms
4.5 Feature Clauses
4.5.1 From Feature Terms to Sets of Constraints
4.5.2 Normal Form and Rewrite Rules without Disjunction
4.6 Applying the Maxwell/Kaplan Scheme
4.6.1 Conditional Constraints
4.6.2 Normal Form for Conditional Constraints
4.6.3 Discussion
4.7 From Conditional Constraints to Contexted Variables
4.7.1 Context-Uniqueness and Variants of Variables
4.7.2 Context-Unique Feature Descriptions
4.7.3 Translation to Context-Unique Form
4.7.4 Normal Form and Rewrite Rules
4.7.5 Soundness, Completeness, and Termination
4.7.6 An Example
4.7.7 Algorithmic Considerations
4.8 The Feature Graph Interpretation
4.9 Conclusion
Bibliography
4 Andreas Eisele & Jochen Dörre: Disjunctive Unification
Abstract This paper describes techniques for the representation and
efficient unification of feature structures in the presence of disjunctive
information.
For the specification of feature structures we introduce feature terms
which may contain sorts, variables, negation and named disjunction.
Disjunction names allow for a compact representation and efficient
management of multiple solutions even in cases where the differences between
the solutions affect different parts of the structures. We present an
algorithm for the unification of such structures in the form of simplification
rules for logical formulas which come with an open-world semantics.
Departing from a single algorithm for the non-disjunctive case, we show
how an algorithm for the disjunctive case can be derived using a general
scheme originally proposed in [Maxwell & Kaplan, 1989]. We investigate
an alternative representation, called context-unique feature descriptions,
where a closer coupling between variables and disjunctive contexts
allows major simplifications of the algorithm, which facilitate an efficient
implementation.
The methods proposed here are in no way limited to the field of natural
language processing, but could be useful in other applications of logic
programming, where the treatment of disjunctive information leads to
efficiency problems within conventional approaches.
Keywords: Unification, Feature Structures, Disjunction, Constraint
Satisfaction
4.1 Introduction
4.1.1 Unification-based grammar formalisms
For about a decade now, recursive feature structures have been used as a
means of description in formal and computational linguistics. Among the
current linguistic theories and formalisms which are based upon the
unification of such feature structures or which have been implemented with
their help, one can find formalisms such as FUG [Kay, 1979], [Kay, 1985],
LFG [Kaplan & Bresnan, 1982], GPSG [Gazdar et al., 1985], CUG
[Uszkoreit, 1986], UCG [Zeevat/Klein/Calder, 1987], Systemic Grammar
[Kasper, 1987b], HPSG [Pollard & Sag, 1987], and many others.
Feature structures and the formalisms based on their unification have
properties which make them very suitable for the definition of grammars
as well as for computation.
• Feature structures allow partial information about various levels of
  linguistic description (phonology, syntax, semantics ...) to be expressed
  in a modular and uniform way, where interactions between different
  levels can also be expressed easily.
• The combination of such partial descriptions into more complete ones
  can be performed by unification of feature structures.
• Various notions of "grammar" which have been investigated in formal
  language theory can be generalized in a natural way by the
  introduction of feature structures.
• Unification formalisms have clearly defined denotational semantics. In
  particular, the outcome of the unification of several structures can be
  defined independently of the details of implementation, such as, for
  instance, the order of computation.
• Through their close relation to the terms used in logic programming
  and to finite automata, efficient algorithms are known which make
  unification possible in almost linear time.
These properties not only make unification grammars attractive as
formalisms for linguistic theories, but at the same time open up ways
for practical applications in natural language processing. The
implementations of natural language analysis systems which are directly or
indirectly based upon unification grammars are innumerable, and it has been
shown that unification grammars can also be successfully employed for
natural language generation [Momma & Dörre, 1987], [Shieber et al., 1989].
A further advantage of the use of feature structures in computational
linguistics is the close relationship to languages used for knowledge
representation (see [Smolka, 1988] for a more detailed discussion). The
similarities between the description languages used in both areas facilitate
the design of interfaces and systems where both areas are involved.
Two different approaches have been used to give a clear semantic base
to feature-unification formalisms. One is due to [Kasper & Rounds, 1986],
where the meaning of feature descriptions is defined relative to the
domain of feature structures, which are labeled graphs. The other
approach, due to [Smolka, 1988], does not make specific assumptions about
the domain, but uses an open world semantics instead. In this paper,
we will adopt the latter view and we will use the terminology given in
Smolka. In particular, we will not use the term feature structure in the
technical sense defined in [Kasper & Rounds, 1986], but as an informal
term referring to some representation of feature information which is
used in a practical implementation.
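As a rough illustration of how unification combines partial descriptions,
the following minimal sketch treats feature structures as nested Python
dictionaries with atomic leaves. This encoding and the function name
`unify` are assumptions made for the example only; variables (path
equivalences), sorts and disjunction are deliberately left out.

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts with atomic leaves.

    Returns the combined structure, or None if the inputs clash.
    Assumption of this sketch: no shared variables or path equivalences.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # copy, so the arguments stay unchanged
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # clash somewhere below this feature
                result[feature] = merged
            else:
                result[feature] = value
        return result
    # Two atoms (or an atom and a complex value): compatible only if equal.
    return a if a == b else None


noun_phrase = {"agr": {"num": "sg"}}
verb_demand = {"agr": {"per": "3"}, "cat": "np"}
combined = unify(noun_phrase, verb_demand)
```

Because the result does not depend on the argument order,
`unify(noun_phrase, verb_demand)` and `unify(verb_demand, noun_phrase)`
describe the same structure, which mirrors the order-independence noted
in the list of properties above.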
4.1.2 Disjunction and Ambiguity in Unification Grammars
Although efficient algorithms for the unification of feature structures are
well-known, most of the existing implementations of unification grammar
formalisms encounter increasing efficiency problems when applied
to larger grammars. This can be explained by the fact that, as soon
as the described fragments of natural language grow to a useful size,
the need to handle a number of peculiar cases leads to the introduction
of disjunctive information into the linguistic description. Usually,
this goes along with the existence of ambiguous derivations for certain
phrases. There are several sources of disjunctive information, including
lexical ambiguity, where differing analyses are possible for a given word
concerning part of speech, subcategorization for complements,
morphological features, or any other information assigned to it, and structural
ambiguity introduced by different possible groupings of subphrases or
different interpretations of these phrases due to disjunctive annotations
of grammar rules.
Although most of these forms of ambiguity can also appear in context
free grammars, they are much more troublesome for an efficient processing
of unification grammars, since the techniques used in CF analysis to
represent ambiguous constituent structures (structure sharing, local
representation of ambiguity, see e.g. [Earley, 1970], [Graham/Harrison/Ruzzo, 1980],
[Tomita, 1987]) cannot be generalized simply to the treatment of feature
structures. In context free analysis, different c-structures of the same
part of a sentence can be combined and can be seen, from higher nodes
in the tree, as one single node whose complex internal structure does
not matter. However, a corresponding combination of a set of differing
feature structures is not always possible, since additional information
can interact in different ways with the feature structures of this set, so
that it cannot be easily treated as one unit.
Thus, unification grammars can describe more complex languages
than context free grammars and consequently the recognition with
unification grammars is much more complex in the worst case. Without
any restriction of the formalism, the recognition problem in the worst
case is not even decidable (this happens if an infinite set of different
feature structures can be constructed in some cyclic derivation chain). It
can be made decidable by imposing conditions such as off-line parsability
(see [Pereira & Warren, 1983]) on the grammar or on the formalism,
but still, the formalism can be used to state NP-hard problems (see
e.g. [Kasper, 1987a]). From this point of view, we can see the principal
limits of what an efficient implementation of unification grammars can
attain. On the other hand, the inefficiency in actual implementations of
unification formalisms does not seem to be a consequence of this fact.
Often, one has the impression that a more sophisticated strategy could
save much computational overhead for a given grammar, although it
would of course not avoid exponential blow-up in the worst case.
4.1.3 The Processing of Ambiguous Unification Grammars
All implementations of unification-based grammar formalisms have to
face the problem of an efficient treatment of ambiguity in one way or
another. This can be achieved by employing heuristics to resolve
ambiguity as soon as possible, by using backtracking to process different
possibilities sequentially, or by trying to represent disjunctive
information explicitly and pursue all possibilities in parallel.
The early resolution of disjunctions is limited to cases where a decision
can be found on the basis of the information which is present locally.
This is often impossible in practice, or it demands additional knowledge
sources and representational layers (for example, semantic or pragmatic
information or inferences where world knowledge is involved). If this
method is applied in situations where not enough information for a
decision exists, correct solutions can be lost.
The backtracking method is frequently applied in practice, since it
finds all possible solutions in a simple and systematic way. In particular,
implementations in Prolog can make use of the depth-first search
algorithm which, just like the unification of terms, is already a part
of the programming language. However, efficiency problems appear if
disjunctions cannot be resolved early by the appearance of further
information. By the division of computation into several mutually
independent branches, there is a danger that in these branches almost identical
work is carried out and thus the work is unnecessarily multiplied. In
particular, if many mutually independent disjunctions are treated by
backtracking, this leads to a combinatorial explosion of possibilities.
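The combinatorial effect described above can be made concrete with a
small sketch. It is not a parser; it only enumerates the branches that a
backtracking search would have to visit when it meets a number of
mutually independent disjunctions. The function name and the toy
alternatives are assumptions made for the illustration.

```python
from itertools import product


def enumerate_branches(disjunctions):
    """Yield every combination that picks one alternative per disjunction.

    This is what a backtracking search visits when none of the
    disjunctions can be resolved early.
    """
    yield from product(*disjunctions)


# Ten independent binary disjunctions (hypothetical alternatives a_i / b_i).
disjunctions = [(f"a{i}", f"b{i}") for i in range(10)]
branches = list(enumerate_branches(disjunctions))
print(len(branches))  # 2 ** 10 branches for only 20 alternatives in total
```

The description itself grows linearly (two alternatives per disjunction),
whereas the number of branches doubles with every additional
disjunction.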
The quasi-parallel processing of different possibilities alone does not
provide any improvement over the backtracking method. On the
contrary: as long as no steps of computation are avoided, but only their
temporal order is modified, the space consumption can increase drastically,
since the data structures which are used in different branches have
to be available at the same time. In practice, this can lead to additional
time consumption (for paging, garbage collection, etc.). On the other
hand, quasi-parallel processing in principle provides the opportunity
to exploit similarities between different branches for a more efficient
representation. The most popular method of quasi-parallel execution is the
use of a table, where intermediate results are stored in order to avoid
computing them repeatedly. This table is called a well-formed substring
table or chart.
Basically speaking, quasi-parallel processing is already a form of
explicit representation of disjunctions, although a very inefficient one.
It can be improved by applying the techniques known from context free
analysis, namely structure sharing and ambiguity packing, to the
representation of feature structures. In this context, structure sharing means
that parts of feature structures used more than once are represented only
once. In [Karttunen & Kay, 1985], [Pereira, 1985], very detailed proposals
have already been put forward which help to decrease the amount
of copying needed. It should not be overlooked that even sophisticated
methods of structure sharing can, at best, avoid the copying of partial
structures which is not necessary in a good implementation of the
backtracking method anyway. More remarkable savings can be obtained
by combining different partial structures into one disjunctive structure
since, in this way, the number of different branches that have to be
considered can be reduced.
4.1.4 Organisation of Paper
The paper is organized as follows: We first give an introduction to
unification-based grammar formalisms and motivate the use of disjunctive
information in such systems. Some general strategies that can be
used for the processing of disjunctive information are discussed. In
Section 4.2, we review some of the approaches to the unification of feature
structures containing value disjunction and general disjunction that have been
proposed in the literature. Section 4.3 motivates the introduction of named
disjunction. This extension combines advantages of value disjunction
with those of general disjunction and is useful both under aspects of
implementation and as an extension of the formalism. Section 4.4
introduces the specification of feature structures with formulas called feature
terms. These formulas use variables to express path equivalence
and allow named disjunction and (classical) negation to be expressed. For
such feature terms, a denotational open world semantics is defined.
Section 4.5 introduces simple feature descriptions, which allow for a
constraint-based specification of feature structures not containing
disjunction. Both a normal form and a rewrite system for the normalization
of simple descriptions are defined. Section 4.6 introduces conditional
constraints, which allow feature descriptions to be extended to the
disjunctive case. The normalization procedure of Section 4.5 is extended using
the scheme given in [Maxwell & Kaplan, 1989] and some properties of
the resulting algorithm are discussed. In Section 4.7, an alternative
representation for disjunctive feature descriptions is given based on the
notion of context-uniqueness, which simplifies both the feature descriptions
and the normalization rules. A translation procedure from feature
terms to the new representation is defined and the computational
properties of the new method are compared to those of the algorithm from
Section 4.6. Section 4.8 investigates the domain of feature graphs as a
possible interpretation of the formulas given in the sections before and
shows that this interpretation is canonical, i.e. that it provides a model
for every consistent description. The last section gives some prospects
for possible applications and extensions of the methods discussed so far.
4.2 Unification of Feature Structures with Disjunctive Information
The first step towards a representation of disjunctive information in
feature structures is to allow as the value of an attribute not only an
atom or an embedded feature structure, but also a disjunction of different
values. Following Kasper [Kasper, 1987a], we will call this value
disjunction. Both the method sketched in [Karttunen, 1984] and the
efficiency-oriented normal form (ENF) given in [Eisele & Dörre, 1988]
allow for the representation of such disjunctive values. In the sequel,
we will sketch some of the problems that arise during the unification of
such feature structures and the solutions proposed by [Karttunen, 1984]
and [Eisele & Dörre, 1988]. The straightforward approach to unification
in the presence of value disjunction is to traverse the structures to be
unified in the usual way and to multiply out all possibilities whenever
some substructure is being unified with a disjunctive value. However,
the situation gets more complicated as soon as disjunction interacts with
path equivalences.
Consider the unification of the feature structures¹ shown in Fig. 4.1,
where the first one contains an equivalence between the paths ⟨obj⟩ and
⟨vcomp subj⟩ (expressed by the shared variable O) and the second one
a disjunctive value for the attribute vcomp. These structures represent
part of the information an LFG-style lexicon would assign to the verbs
"force" and "explode", respectively, where the latter has both a
transitive and an intransitive reading. The unification could e.g. be performed
by a parser analyzing a sentence "...he forced them to explode...". When
the unification algorithm gets to the point where the disjunctive value
for the attribute vcomp has to be unified with the value containing the
coreference, it is not obvious what a reasonable result of this unification
should look like and how it could be found. A simple enumeration of
the possibilities would not be sufficient, since the first branch of the
unification would lead to a different instantiation of the variable O (namely
an identification with S1) than the second one, where it would be
identified with S2 and where the feature animate would be restricted to the
value +. Hence, the value of the obj-feature would not be unique any
more, but would depend on the choice of a value for the vcomp-feature.
In other words, parts of the disjunction get linked to remote parts of the
resulting structure, or stated differently, disjunctive information is exported
through the path equivalence.
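An implementation that modifies the data structures used for the
representation of the feature structures would have problems keeping
the information from both disjuncts separate. In such representations,
the law of distributivity cannot be applied freely to distribute
conjunctive information over disjunctions, as long as information that originates
from a disjunctive context can give rise to global effects (be it destructive
modification or the instantiation of logical variables).

¹ Capital letters are used to denote labels of f-structures (variables). The
variables appearing in the values of the pred-feature are not crucial for the
example, but rather added for the sake of completeness.

\[
\begin{bmatrix}
\text{subj:} & S \\
\text{pred:} & \text{force}(S,O,V) \\
\text{obj:} & O \\
\text{vcomp:} & V \begin{bmatrix} \text{subj:} & O \end{bmatrix}
\end{bmatrix}
\;\sqcap\;
\begin{bmatrix}
\text{vcomp:} &
\left\{
\begin{array}{l}
\begin{bmatrix} \text{subj:} & S_1 \\ \text{pred:} & \text{explode}_i(S_1) \end{bmatrix} \\[1ex]
\begin{bmatrix} \text{subj:} & S_2 \begin{bmatrix} \text{animate:} & + \end{bmatrix} \\ \text{pred:} & \text{explode}_t(S_2,O_2) \\ \text{obj:} & O_2 \end{bmatrix}
\end{array}
\right\}
\end{bmatrix}
\;\Rightarrow\;
\begin{bmatrix}
\text{subj:} & S \\
\text{pred:} & \text{force}(S,O,V) \\
\text{obj:} & ?? \\
\text{vcomp:} &
\left\{
\begin{array}{l}
\begin{bmatrix} \text{subj:} & S_1 \\ \text{pred:} & \text{explode}_i(S_1) \end{bmatrix} \\[1ex]
\begin{bmatrix} \text{subj:} & S_2 \begin{bmatrix} \text{animate:} & + \end{bmatrix} \\ \text{pred:} & \text{explode}_t(S_2,O_2) \\ \text{obj:} & O_2 \end{bmatrix}
\end{array}
\right\}
\end{bmatrix}
\]
Figure 4.1: A Unification Involving Disjunction and Path Equivalence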
4.2.1 Karttunen's Approach
Karttunen's treatment of the problem is as follows. Both unifications
are performed, but only if one of the unifications fails is the other one
treated in the usual way. If, however, both unifications are successful,
their effects on the binding of variables (or the modification of the data
structures) are undone immediately afterwards. (We will call this a
test-unification in the sequel.) As the resulting value for the attribute vcomp,
a disjunction is used, where the disjuncts are tuples of structures which
are thought of as being conjunctively connected (see Fig. 4.2). Whenever
a disjunction of such tuples is resolved during later unifications, the
structures in the chosen tuple have to be unified. The difficult point,
however, is to keep track of whether later modifications of substructures render
some disjunct containing such tuples inconsistent. In order to achieve
this, Karttunen uses so-called constraints on the structures involved,
which indicate which disjuncts have to be re-checked for consistency
after the modification of a structure. These tests, again, are performed
by test-unifications, which might exclude a disjunct when failing, but
have to be undone if successful. Moreover, Karttunen's treatment does
not seem to catch all possible clashes, and hence structures containing
inconsistent information can remain undetected in certain circumstances
(see [Bear, 1987] for details and a discussion of possible improvements).
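The notion of a test-unification can be sketched with a binding trail
that is rolled back after the attempt. The sketch below is a strong
simplification and not Karttunen's implementation: it assumes flat
structures whose values are atoms or logical variables, and it omits
the tuples and constraints described above. All names (`Var`,
`try_unify`, `unify_with_disjunction`) are hypothetical.

```python
class Var:
    """A logical variable; `ref` stays None while the variable is unbound."""
    def __init__(self, name):
        self.name, self.ref = name, None


def deref(term):
    """Follow variable bindings to the current value."""
    while isinstance(term, Var) and term.ref is not None:
        term = term.ref
    return term


def unify(a, b, trail):
    """Destructively unify atoms and variables, logging bindings on `trail`."""
    a, b = deref(a), deref(b)
    if a is b:
        return True
    if isinstance(a, Var):
        a.ref = b
        trail.append(a)
        return True
    if isinstance(b, Var):
        b.ref = a
        trail.append(b)
        return True
    return a == b


def unify_flat(fs1, fs2, trail):
    """Unify two flat structures on the features they have in common."""
    return all(unify(fs1[f], fs2[f], trail) for f in fs1.keys() & fs2.keys())


def try_unify(fs1, fs2):
    """Test-unification: report success, then undo every binding made."""
    trail = []
    ok = unify_flat(fs1, fs2, trail)
    while trail:
        trail.pop().ref = None
    return ok


def unify_with_disjunction(fs, disjuncts):
    """Commit to a disjunct only if it is the single one that survives."""
    survivors = [d for d in disjuncts if try_unify(fs, d)]
    if len(survivors) == 1:
        unify_flat(fs, survivors[0], [])  # treated "in the usual way"
    return survivors
```

If both disjuncts survive, the variable stays unbound and the same
unifications have to be repeated whenever consistency is re-checked
later, which is the source of the repeated work criticised in the
following sections.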
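\[
\begin{bmatrix}
\text{subj:} & S \\
\text{pred:} & \text{force}(S,O,V) \\
\text{obj:} & O \\
\text{vcomp:} &
\left\{
\begin{array}{l}
\left(
V \begin{bmatrix} \text{subj:} & O \end{bmatrix}
\quad
\begin{bmatrix} \text{subj:} & S_1 \\ \text{pred:} & \text{explode}_i(S_1) \end{bmatrix}
\right) \\[1ex]
\left(
V \begin{bmatrix} \text{subj:} & O \end{bmatrix}
\quad
\begin{bmatrix} \text{subj:} & S_2 \begin{bmatrix} \text{animate:} & + \end{bmatrix} \\ \text{pred:} & \text{explode}_t(S_2,O_2) \\ \text{obj:} & O_2 \end{bmatrix}
\right)
\end{array}
\right\}
\end{bmatrix}
\]
Figure 4.2: [Karttunen, 1984]: Disjunction of Conjunctively Connected Tuples of Structures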
4.2.2 Disjunction in [Eisele & Dörre, 1988]
The ENF-representation of feature structures the authors gave in
[Eisele & Dörre, 1988] was developed in order to avoid the repeated checking
of the consistency of disjuncts by test-unifications, since such tests can
lead to an unnecessary repetition of similar computations in cases where the
structures being processed contain many disjunctions that cannot be resolved
early.
Instead, our strategy was to apply the distributive law (i.e. to
multiply out) in all cases where a disjunction has to be unified with some
other structure, but to handle the effects of such unifications in a proper
way, based on the logic calculus given in [Kasper & Rounds, 1986].
According to this logical foundation, it is e.g. safe to distribute conjunctive
information over disjuncts without affecting the meaning of a
representation. In the unification algorithm given in [Eisele & Dörre, 1988], the
case in which such a distribution of information has a non-local effect
is recognized and leads, if necessary, to an extension of the scope
of the disjunction involved. In the example given above, the effect on
the class of equivalent paths [[⟨vcomp subj⟩, ⟨obj⟩]] would result, depending
on the way the input formulas are represented, in one of the
results displayed in Fig. 4.3.²
Although extensions of the scope of disjunctions are limited to the
minimal necessary amount, they can lead to the multiplication of
obviously unrelated disjunctions which happen to be lifted to the top-level
of the representation due to such effects. This is aggravated by the
fact that the representation in [Kasper & Rounds, 1986] does not
support the description of systems of structures with common parts, but
without common root, which occur frequently e.g. during parsing. The
introduction of an artificial common root where the structures of the
system are embedded under artificial labels increases the risk of getting
top-level disjunctions.
4.2.3 Representation of General Disjunction in [Kasper, 1987c]
Even if the interactions between path equivalences and value disjunction
are handled in some satisfactory way, one principal weakness of the
representation remains. Multiplying out each disjunction with all
information which is combined conjunctively with it leads to a duplication of
this information. In particular, if several disjunctive values for the same
attribute are unified, all possibilities have to be multiplied out and, if
none of the combinations contains inconsistent information, all have to
be represented. A better approach would be to allow disjunction not
only on the level of values, but to use a representation where
conjunctive and disjunctive information can be mixed freely. For example, one
would like to take a formula³ like f: ((s1 ⊔ s2) ⊓ s3) as a representation
of the unification of f: (s1 ⊔ s2) with f: s3 if both disjuncts are
compatible with s3. Of course interactions between different parts of such an
AND-OR-structure must be taken care of, since the representation no
longer guarantees consistency per se.

² In this treatment, all paths in a class of equivalent paths besides the
representative of this class carry so-called non-local path expressions
referring to the representative. The result of the unification in the example
depends on the choice of the representative.

³ We will always use the symbols ⊓ and ⊔ for conjunction (unification) and
disjunction of feature terms, since ∧ and ∨ will be used for other purposes
below.

Case 1:
\[
\begin{bmatrix}
\text{subj:} & [\,] \\
\text{pred:} & \text{force}(\langle\text{subj}\rangle,\langle\text{obj}\rangle,\langle\text{vcomp}\rangle) \\
\text{obj:} & \langle\text{vcomp subj}\rangle \\
\text{vcomp:} &
\left\{
\begin{array}{l}
\begin{bmatrix} \text{subj:} & [\,] \\ \text{pred:} & \text{explode}_i(\langle\text{vcomp subj}\rangle) \end{bmatrix} \\[1ex]
\begin{bmatrix} \text{subj:} & \begin{bmatrix} \text{animate:} & + \end{bmatrix} \\ \text{pred:} & \text{explode}_t(\langle\text{vcomp subj}\rangle,\langle\text{vcomp obj}\rangle) \\ \text{obj:} & [\,] \end{bmatrix}
\end{array}
\right\}
\end{bmatrix}
\]
Case 2:
\[
\left\{
\begin{array}{l}
\begin{bmatrix}
\text{subj:} & [\,] \\
\text{pred:} & \text{force}(\langle\text{subj}\rangle,\langle\text{obj}\rangle,\langle\text{vcomp}\rangle) \\
\text{obj:} & [\,] \\
\text{vcomp:} & \begin{bmatrix} \text{subj:} & \langle\text{obj}\rangle \\ \text{pred:} & \text{explode}_i(\langle\text{vcomp subj}\rangle) \end{bmatrix}
\end{bmatrix} \\[2ex]
\begin{bmatrix}
\text{subj:} & [\,] \\
\text{pred:} & \text{force}(\langle\text{subj}\rangle,\langle\text{obj}\rangle,\langle\text{vcomp}\rangle) \\
\text{obj:} & \begin{bmatrix} \text{animate:} & + \end{bmatrix} \\
\text{vcomp:} & \begin{bmatrix} \text{subj:} & \langle\text{obj}\rangle \\ \text{pred:} & \text{explode}_t(\langle\text{vcomp subj}\rangle,\langle\text{vcomp obj}\rangle) \\ \text{obj:} & [\,] \end{bmatrix}
\end{bmatrix}
\end{array}
\right\}
\]
Figure 4.3: Two Possible Results in [Eisele & Dörre, 1988]
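The difference between the factored formula f: ((s1 ⊔ s2) ⊓ s3) and its
multiplied-out counterpart can be written out as a short worked
equation. The second line is an illustration added here, with
hypothetical disjuncts a_i and b_i; it only counts the size of the two
representations and says nothing about consistency.

```latex
\begin{align*}
f{:}(s_1 \sqcup s_2) \;\sqcap\; f{:}s_3
  &\;\equiv\; f{:}\bigl((s_1 \sqcup s_2) \sqcap s_3\bigr)
   \;\equiv\; f{:}\bigl((s_1 \sqcap s_3) \sqcup (s_2 \sqcap s_3)\bigr) \\
f{:}\bigl((a_1 \sqcup b_1) \sqcap \dots \sqcap (a_n \sqcup b_n)\bigr)
  &\;\equiv\; f{:}\Bigl(\bigsqcup_{(c_1,\dots,c_n),\; c_i \in \{a_i, b_i\}}
     c_1 \sqcap \dots \sqcap c_n\Bigr)
\end{align*}
```

The factored form on the left of the second line mentions 2n disjuncts,
whereas the multiplied-out form on the right contains up to 2^n
conjunctions, each of which repeats material shared with the others.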
This way of representing general disjunction is used in [Kasper, 1987c],
where on the outermost level an AND-OR-tree is allowed, whose leaves
are ordinary (non-disjunctive) representations of feature structures. Fig. 4.4
shows what an AND-OR-tree representation for the example given above
would look like according to [Kasper, 1987c]. In the extreme case, this
representation can be exponentially more succinct than is possible
with value disjunction alone, since different disjunctions are not
multiplied out, but are accumulated as such. However, contrary to the
logical calculus given in [Kasper & Rounds, 1986], the use of disjunction
is restricted to the top level in Kasper's implementation. This makes
it possible to employ an ordinary unification algorithm which does not
have to care about disjunction. On the other hand, it increases the cost
of representing deeply embedded disjunctive values and makes it more
difficult to maintain consistency during the unification of such
AND-OR-trees, since unrelated parts of the tree also have to be taken into
account.

\[
\text{AND}\left\{
\begin{array}{l}
\begin{bmatrix}
\text{subj:} & S \\
\text{pred:} & \text{force}(S,O,V) \\
\text{obj:} & O \\
\text{vcomp:} & V \begin{bmatrix} \text{subj:} & O \end{bmatrix}
\end{bmatrix} \\[2ex]
\text{OR}\left\{
\begin{array}{l}
\begin{bmatrix} \text{vcomp:} & \begin{bmatrix} \text{subj:} & S_1 \\ \text{pred:} & \text{explode}_i(S_1) \end{bmatrix} \end{bmatrix} \\[1ex]
\begin{bmatrix} \text{vcomp:} & \begin{bmatrix} \text{subj:} & S_2 \begin{bmatrix} \text{animate:} & + \end{bmatrix} \\ \text{pred:} & \text{explode}_t(S_2,O_2) \\ \text{obj:} & O_2 \end{bmatrix} \end{bmatrix}
\end{array}
\right.
\end{array}
\right.
\]
Figure 4.4: Representation of Disjunctive Information in an AND-OR-Tree
The unification of two such AND-OR-trees is pursued in three steps.
First, the non-disjunctive (fixed) parts of the structures are unified
(using a non-disjunctive algorithm). In a second step, disjunctions are
pruned by removing all disjuncts which are incompatible with the fixed
part found so far. Here, compatibility is determined by the use of
test-unifications, i.e. unifications without remaining effect on the arguments.
In the last step, combinations of the remaining disjunctions are tried out
for each disjunct to determine if there is actually a possibility where this
disjunct is used. This step, again, makes heavy use of test-unifications.
In cases where most of the disjunctions can be ruled out during a
unification, Kasper's algorithm has the advantage that the inconsistencies
are very likely to be found in the earlier steps, so that they do not cause
trouble in the most expensive third step. In principle, however, the
algorithm needs an exponential amount of time to test for consistency in
each unification, which becomes a problem if more than a few
disjunctions remain unresolved during several unifications. Even the locality of
the information cannot be exploited, since partial structures which do
not stand in relation to each other have to be combined every time.
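The three steps can be sketched as follows. This is a simplified reading
of the description above, not Kasper's implementation: structures are
assumed to be flat dictionaries from features to atoms, a description is
a pair of a fixed part and a list of top-level disjunctions, and the
non-destructive `unify` plays the role of a test-unification. All names
are hypothetical.

```python
from itertools import product


def unify(a, b):
    """Unify two flat structures (feature -> atom); None signals a clash."""
    out = dict(a)
    for feature, value in b.items():
        if out.setdefault(feature, value) != value:
            return None
    return out


def unify_all(structures):
    """Unify a sequence of flat structures from left to right."""
    out = {}
    for s in structures:
        out = unify(out, s)
        if out is None:
            return None
    return out


def unify_and_or(desc1, desc2):
    """Unify two (fixed, disjunctions) descriptions in three steps."""
    (fixed1, disj1), (fixed2, disj2) = desc1, desc2
    # Step 1: unify the fixed (non-disjunctive) parts.
    fixed = unify(fixed1, fixed2)
    if fixed is None:
        return None
    # Step 2: prune disjuncts that are incompatible with the fixed part.
    pruned = [[d for d in alts if unify(fixed, d) is not None]
              for alts in disj1 + disj2]
    # Step 3: keep a disjunct only if some combination of disjuncts from
    # the other disjunctions is consistent with it (exponential).
    checked = []
    for i, alts in enumerate(pruned):
        others = pruned[:i] + pruned[i + 1:]
        checked.append([
            d for d in alts
            if any(unify_all((fixed, d) + combo) is not None
                   for combo in product(*others))
        ])
    if any(not alts for alts in checked):
        return None  # some disjunction has no consistent disjunct left
    return fixed, checked
```

Step 3 iterates over the Cartesian product of all other disjunctions for
every disjunct, which makes the exponential cost mentioned in the text
directly visible, including for disjunctions that share no features.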
Kasper acknowledges this problem and proposes two improvements.
The first is to restrict the application of the last step to situations where
the disappearance of disjuncts is very likely (so-called strategic
moments). Secondly, he proposes to employ an indexing scheme on the
structures in the AND-OR-tree in such a way that only those
disjunctions have to be re-checked after a modification that can actually interact
with the modified parts. This could eliminate some of the problems that
arise from the restriction to top-level disjunction. However, he does not
elaborate upon this proposal in detail.
4.2.4 Conclusion and Motivation of a New Approach
The three approaches described so far can be characterized according
to two criteria: a) the kind of disjunction they can handle, and b) the
method they use for consistency checking. Both Karttunen's method and
our ENF-representation restrict themselves to value disjunction, whereas
in Kasper's approach the use of disjunction is not restricted to a single
attribute or path. Both Karttunen's and Kasper's methods rely on
test-unifications, i.e. they unify feature structures and undo the effect of the
unification afterwards, which has the consequence that they might often
have to repeat similar computations in later steps. In contrast, the
ENF-unification algorithm always tries to keep the results of intermediate
computations as long as a later use seems possible.
Given these criteria, it appears that a method which is able to
represent general disjunction locally and which does not rely on test-unifications
would be an interesting alternative. Such an approach should attempt
to represent the result of intermediate computations in a way that it is
always available in later steps. However, this has to be done in such a
way that the computations which are performed for different branches of
a disjunct do not interact in an unwanted way and that as little structure
as possible has to be copied.
4.3 Disjunction Names
4.3.1 Exporting Disjunction, Revisited
Consider once more the example from Fig. 4.1, where the unification
of a substructure with different disjuncts has different effects on the
binding of the variable O. Another way to represent the result would be
to bind this variable to a disjunctive value. However, the disjunction
in the value of O must be synchronized with the disjunctive value for
the attribute vcomp, i.e. it is not legal to choose the first branch in
one of the disjunctions and the second one in the other. To express this
synchronization by position in the disjunction, we give both disjunctions
the same index d1. Such an index then stands for a global choice for one
of the positions in all disjunctions carrying this index. We then get the
situation shown in Fig. 4.5.
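\[
\begin{bmatrix}
\text{subj:} & S \\
\text{pred:} & \text{force}(S,O,V) \\
\text{obj:} & O = d_1\{\, S_1 \quad S_2 \,\} \\
\text{vcomp:} & V = d_1
\left\{
\begin{array}{l}
\begin{bmatrix} \text{subj:} & S_1 \\ \text{pred:} & \text{explode}_i(S_1) \end{bmatrix} \\[1ex]
\begin{bmatrix} \text{subj:} & S_2 \begin{bmatrix} \text{animate:} & + \end{bmatrix} \\ \text{pred:} & \text{explode}_t(S_2,O_2) \\ \text{obj:} & O_2 \end{bmatrix}
\end{array}
\right\}
\end{bmatrix}
\]
Figure 4.5: Disjunction with Index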
This way of representing the non-local effects of disjunctive values has
several advantages:
• If the scope of a disjunction has to be extended by an "export" of
  disjunctive information through a path equivalence, the extension
  is limited to exactly the necessary amount.
• The representation is capable of expressing disjunctive information
  concerning more than one path, and is not limited to value
  disjunction in the usual sense.
• Nevertheless, disjunctive information is represented locally as
  disjunctive values of certain features, and consequently unifications
  concerning other features do not have to touch the disjunctive
  information at all. We do not need global consistency checking as in
  Kasper's algorithm.
• Once a scheme for the correct treatment of labeled disjunctions is
  implemented, it can be used to decrease the scope of disjunctions
  even further. For instance, the current example could be
  represented as shown in Fig. 4.6, where arbitrary values embedded
  on paths other than ⟨obj⟩, ⟨vcomp subj animate⟩, ⟨vcomp pred⟩
  and ⟨vcomp obj⟩ could be added without any interaction with the
  disjunction d1.
    [ subj:  S
      pred:  force(S,O1,V)
      obj:   O1
      vcomp: V [ subj: O1 [animate: d1{ X  + }]
                 pred: d1{ explode_i(O1)  explode_t(O1,O2) }
                 obj:  d1{ NONE  O2 } ] ]
Figure 4.6 Disjunctive Information More Deeply Embedded
To summarize, one could say that the possibility of distributing
disjunctions with the same index over feature structures combines the
advantages of all representations discussed so far, if we succeed in
giving a unification algorithm for this representation.
4.3.2 Other Applications of Named Disjunction
The possibility to identify with a name the choice that is expressed in
disjunctive formulas allows for a clear and concise statement of
dependencies between different parts of a feature structure. For
instance, we might write a formula
$\mathit{syn} : \mathit{arg} : \mathit{case} : (\mathit{dat} \sqcup_{d_1} \mathit{acc}) \;\sqcap\; \mathit{sem} : \mathit{rel} : (\mathit{stat\_in} \sqcup_{d_1} \mathit{dir\_in})$
to express the fact that the semantic interpretation of the German
preposition "in" depends on the grammatical case of the following noun
phrase. The directional reading (translated as "into") goes along with
the accusative case of its argument, whereas for the stative reading
(translated as "in"), the argument stands in the dative case.
4.4 Feature Terms
Before we can give a unification algorithm for feature structures
containing labeled disjunction, we have to introduce a language for the
specification of such structures and define its exact meaning. In
defining both this language and its model-theoretic semantics we borrow
several definitions from [Smolka, 1988]; however, we will generalize the
representation given there to support disjunction names.
The expressions of the language are so-called feature terms, where
each feature term describes a set of possible feature structures. The
language also allows for negative information (including negated path
equivalence) and the use of sort symbols, on which some semi-lattice is
defined. We will use variables to express (both positive and negative)
path equivalence, since they simplify both the use of the formalism (by
enabling clear and concise descriptions) and its computational processing
(by replacing explicit statements of classes of equivalent paths).
On the other hand, our formalization has some minor restrictions,
mainly to keep the presentation simple. For instance, it does not provide
a special notation for path equivalence, which can always be expressed
using variables, nor the possibility for negating complex subformulae,
since such negation can always be brought to the innermost level of
embedding. The use of disjunction is limited to named disjunction, i.e.
even for disjunctions that appear only once in a formula, a label has
to be introduced. We also limit ourselves to binary disjunction. These
restrictions do not decrease the expressive power of the formalism, since
it is straightforward to rewrite formulas containing such constructs into
equivalent formulas of our syntax in linear time.
One aspect under which our treatment differs slightly from Smolka's
is that we introduce a special sort symbol NONE in order to express the
non-existence of features. The differences are discussed in section 4.4.3
in more detail.
4.4.1 Syntax of Feature Terms
For the following we assume a signature containing the following sets:
• a set $S$ of sort symbols, which forms a lower semilattice with
  respect to the partial order $\le$, i.e. the greatest lower bound (GLB)
  of two sorts is always a sort in $S$. $\top$ and $\bot$ are the greatest
  and least element, respectively. We use capital letters $A, B, C, \ldots$
  for sort symbols.
• a set $\mathit{Sg} \subseteq S$ of so-called singleton sorts. The GLB of
  a singleton sort with any other sort yields either $\bot$ or the
  singleton sort itself, i.e. there is no sort smaller than a singleton
  sort except $\bot$. We use lowercase letters $a, b, c, \ldots$ for
  singleton sort symbols.
• a special sort $\mathrm{NONE} \in \mathit{Sg}$, which is incomparable to
  any other sort besides $\top$ and $\bot$.
• a set $F$ of feature symbols. Letters $f, g, h, \ldots$ will denote
  feature symbols.
Additionally we assume:
• an infinite set $V$ of variables, written $x, y, z, x_1, y_1, \ldots$
• an infinite set $D$ of disjunction names, written $d, d_1, d_2, \ldots$
The sets $S$, $F$, $V$ and $D$ are pairwise disjoint.
Definition 1 (Feature Terms) We define feature terms with variables,
simple negation and named disjunction by the context-free production
rule given in Fig. 4.7. The set of feature terms is called FT. The letters
$s, t, t_1, \ldots$ will always denote feature terms.
\[
\begin{array}{rcll}
s, t & \to  & A              & \text{a sort} \\
     & \mid & \neg A         & \text{its complement} \\
     & \mid & x              & \text{a variable} \\
     & \mid & \neg x         & \text{its complement} \\
     & \mid & f : s          & \text{selection} \\
     & \mid & s \sqcap t     & \text{conjunction (intersection)} \\
     & \mid & s \sqcup_d t   & \text{named disjunction (union)}
\end{array}
\]
Figure 4.7 The Syntax of Feature Terms
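The grammar of Fig. 4.7 maps directly onto an algebraic data type. The following Python sketch is only an illustration: the class names, the helper `path` and the function `disjunction_names` are assumptions made here and are not part of the formalism. It encodes the formula for the German preposition "in" from section 4.3.2.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Sort:            # A
    name: str


@dataclass(frozen=True)
class Var:             # x
    name: str


@dataclass(frozen=True)
class Neg:             # negation, restricted to sorts and variables
    arg: Union[Sort, Var]


@dataclass(frozen=True)
class Sel:             # f : s
    feature: str
    term: "Term"


@dataclass(frozen=True)
class And:             # conjunction of two terms
    left: "Term"
    right: "Term"


@dataclass(frozen=True)
class Or:              # named disjunction; `name` is the disjunction name d
    name: str
    left: "Term"
    right: "Term"


Term = Union[Sort, Var, Neg, Sel, And, Or]


def path(features, term):
    """Abbreviation for nested selection f1 : f2 : ... : term."""
    for f in reversed(features):
        term = Sel(f, term)
    return term


def disjunction_names(t):
    """Collect all disjunction names occurring in a feature term."""
    if isinstance(t, Or):
        return {t.name} | disjunction_names(t.left) | disjunction_names(t.right)
    if isinstance(t, And):
        return disjunction_names(t.left) | disjunction_names(t.right)
    if isinstance(t, Sel):
        return disjunction_names(t.term)
    return set()


# Both disjunctions carry the same name d1, so their choices are synchronized.
german_in = And(
    path(["syn", "arg", "case"], Or("d1", Sort("dat"), Sort("acc"))),
    path(["sem", "rel"], Or("d1", Sort("stat_in"), Sort("dir_in"))),
)
print(disjunction_names(german_in))
```

Because `Neg` only accepts a sort or a variable, the restriction to simple negation is part of the data type itself.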
Using this notation, we can now describe the feature structures
depicted in Fig. 4.5 and Fig. 4.6 using the feature term in Fig. 4.8.⁴
⁴ The terms $\mathit{force}(x_S, x_{O_1}, x_V)$, $\mathit{explode}_i(x_{O_1})$ and $\mathit{explode}_t(x_{O_1}, x_{O_2})$ can be
thought of as abbreviations (templates) for feature terms whose internal
structure does not matter here. These terms might involve the variables
mentioned as parameters.
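\[
\begin{array}{l}
\mathit{subj} : x_S \\
\sqcap\ \mathit{pred} : \mathit{force}(x_S, x_{O_1}, x_V) \\
\sqcap\ \mathit{obj} : x_{O_1} \\
\sqcap\ \mathit{vcomp} : (\ \mathit{subj} : x_{O_1} \\
\qquad \sqcap\ (\ (\ \mathit{pred} : \mathit{explode}_i(x_{O_1}) \sqcap \mathit{obj} : \mathrm{NONE}\ ) \\
\qquad\quad \sqcup_{d_1} \\
\qquad\quad (\ \mathit{pred} : \mathit{explode}_t(x_{O_1}, x_{O_2}) \\
\qquad\qquad \sqcap\ \mathit{subj} : \mathit{animate} : + \\
\qquad\qquad \sqcap\ \mathit{obj} : x_{O_2}\ )\ )\ )
\end{array}
\]
Figure 4.8 An Example for a Feature Term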
4.4.2 Semantics of Feature Terms
We can now define the semantics of our feature terms as follows. We
require an interpretation of a signature to be a pair $(\mathcal{U}, \mathcal{I})$ of a
universe of interpretation and an interpretation function such that:
• $\top^{\mathcal{I}} = \mathcal{U}$
• $\bot^{\mathcal{I}} = \emptyset$
• for all sorts $A, B$: $\mathrm{GLB}(A, B)^{\mathcal{I}} = A^{\mathcal{I}} \cap B^{\mathcal{I}}$
• singleton sorts are mapped onto singleton sets
• for every feature $f$: $f^{\mathcal{I}}$ is a function $\mathcal{U} \to \mathcal{U}$
• if $a$ is a singleton sort and $f$ is a feature symbol, then $f^{\mathcal{I}}$ maps
  $a^{\mathcal{I}}$ into $\mathrm{NONE}^{\mathcal{I}}$ ⁵
When interpreting a feature term with variables and named disjunctions,
we have to make sure that the same value is assigned to each
occurrence of a variable and that the same branch is chosen for each
occurrence of a named disjunction. To achieve this, we introduce variable
assignments and disjunctive contexts. Variable assignments assign to
each variable some element of the universe. The idea behind disjunctive
contexts is that they assign to each disjunction name the branch that
has to be taken for this disjunction and hence specify a possible
interpretation of a formula with named disjunction. Since we limit
ourselves to binary disjunctions, a branch of a disjunction can be
specified by one of the symbols l or r.
⁵ This restriction could be generalized quite naturally to non-singleton
sorts, for which only certain features make sense. We could e.g. require
that $\mathit{phonology}^{\mathcal{I}}$ maps $\mathit{agreement\text{-}value}^{\mathcal{I}}$ into $\mathrm{NONE}^{\mathcal{I}}$ if an appropriate
definition tells us so.
Definition 2 ($\mathcal{U}$-Assignment) A $\mathcal{U}$-assignment is an element of $\mathcal{U}^V$,
i.e. a function from $V$ to $\mathcal{U}$. The symbol $\alpha$ will always denote a
$\mathcal{U}$-assignment.
Definition 3 (Context) A context is an element of $\{l, r\}^D$, i.e. a
function from $D$ to the set $\{l, r\}$. The symbols $\kappa$, $\kappa'$, etc. will always
denote contexts.
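For a given interpretation, we define the denotation of a feature term
in a context $\kappa \in \{l, r\}^D$ under an assignment $\alpha \in \mathcal{U}^V$ as shown in
Fig. 4.9.
\[
\begin{array}{lcl}
[\![A]\!]_{\kappa,\alpha} & := & A^{\mathcal{I}} \\
[\![x]\!]_{\kappa,\alpha} & := & \{\alpha(x)\} \\
[\![\neg s]\!]_{\kappa,\alpha} & := & \mathcal{U} - [\![s]\!]_{\kappa,\alpha} \\
[\![f : s]\!]_{\kappa,\alpha} & := & \{a \in \mathcal{U} \mid f^{\mathcal{I}}(a) \in [\![s]\!]_{\kappa,\alpha}\} \\
[\![s \sqcap t]\!]_{\kappa,\alpha} & := & [\![s]\!]_{\kappa,\alpha} \cap [\![t]\!]_{\kappa,\alpha} \\
[\![s \sqcup_d t]\!]_{\kappa,\alpha} & := &
  \begin{cases}
  [\![s]\!]_{\kappa,\alpha} & \text{if } \kappa(d) = l \\
  [\![t]\!]_{\kappa,\alpha} & \text{if } \kappa(d) = r
  \end{cases}
\end{array}
\]
Figure 4.9 Denotation of Feature Terms in context $\kappa$ under assignment $\alpha$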
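The clauses of Fig. 4.9 can be executed directly over a finite interpretation. The sketch below is an illustration under stated assumptions, not part of the formalism: terms are encoded as nested tuples, the context and the assignment are passed as dictionaries named `kappa` and `alpha`, and the toy universe (four atoms, a NONE element and four objects `p1` to `p4`) is invented for the example. It evaluates a simplified version of the formula for the German preposition "in".

```python
from itertools import product


def denote(t, U, sorts, feats, kappa, alpha):
    """Denotation of term t in context kappa under assignment alpha."""
    rec = lambda s: denote(s, U, sorts, feats, kappa, alpha)
    tag = t[0]
    if tag == "sort":
        return set(sorts[t[1]])
    if tag == "var":
        return {alpha[t[1]]}
    if tag == "not":
        return set(U) - rec(t[1])
    if tag == "sel":                      # ("sel", f, s)
        inner = rec(t[2])
        return {a for a in U if feats[t[1]][a] in inner}
    if tag == "and":
        return rec(t[1]) & rec(t[2])
    if tag == "or":                       # ("or", d, s, t)
        return rec(t[2] if kappa[t[1]] == "l" else t[3])
    raise ValueError(f"unknown term: {t!r}")


def denote_all(t, U, sorts, feats, names, variables):
    """Union of the denotations over all contexts and all assignments."""
    result = set()
    for choice in product("lr", repeat=len(names)):
        kappa = dict(zip(names, choice))
        for values in product(sorted(U), repeat=len(variables)):
            alpha = dict(zip(variables, values))
            result |= denote(t, U, sorts, feats, kappa, alpha)
    return result


# Hypothetical toy interpretation: p1..p4 realize all case/rel combinations.
atoms = ["dat", "acc", "stat", "dir"]
U = set(atoms) | {"none", "p1", "p2", "p3", "p4"}
sorts = {a: {a} for a in atoms}


def feature(table):
    # Total function on U: everything not listed is mapped to the NONE element.
    return {a: table.get(a, "none") for a in U}


feats = {
    "case": feature({"p1": "dat", "p2": "acc", "p3": "dat", "p4": "acc"}),
    "rel": feature({"p1": "stat", "p2": "dir", "p3": "dir", "p4": "stat"}),
}
term = ("and",
        ("sel", "case", ("or", "d1", ("sort", "dat"), ("sort", "acc"))),
        ("sel", "rel", ("or", "d1", ("sort", "stat"), ("sort", "dir"))))
readings = denote_all(term, U, sorts, feats, ["d1"], [])
print(sorted(readings))  # ['p1', 'p2']
```

Only `p1` (dative with stative reading) and `p2` (accusative with directional reading) survive; the mixed combinations `p3` and `p4` are excluded because both disjunctions share the name `d1`.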
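We can leave out the reference to an assignment and define the
denotation of a feature term in a context or without reference to a
context as follows:
\[
[\![s]\!]_{\kappa} := \bigcup_{\alpha \in \mathcal{U}^V} [\![s]\!]_{\kappa,\alpha}
\qquad\qquad
[\![s]\!] := \bigcup_{\kappa \in \{l, r\}^D} [\![s]\!]_{\kappa}
\]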
We will call a feature term consistent iff it has a non-empty denotation
for some interpretation. A feature term which has an empty denotation
in every interpretation is called inconsistent.
The reader might wonder why we restricted the use of negation in
feature terms, whereas the semantics of negation is defined without
restriction. The reason is that this helps us to avoid some unnecessary
problems due to the fact that negation of complex feature terms
containing variables or named disjunctions does not have the intuitively
expected meaning. According to our semantics, a term $\neg(t_1 \sqcup_d t_2)$ turns
out to be equivalent to $\neg t_1 \sqcup_d \neg t_2$ and not, as one might expect, to
$\neg t_1 \sqcap \neg t_2$. To get the expected meaning, we would have to introduce
existential quantification over variables and disjunctions before
negating a feature term containing them. It is not yet clear how such
existential quantification could be implemented.
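The equivalence just mentioned follows in two steps from the clauses of Fig. 4.9. As a short worked derivation, writing $\kappa$ for the context and $\alpha$ for the assignment:

\[
\begin{array}{rcl}
[\![\neg(t_1 \sqcup_d t_2)]\!]_{\kappa,\alpha}
 & = & \mathcal{U} - [\![t_1 \sqcup_d t_2]\!]_{\kappa,\alpha} \\[2pt]
 & = & \begin{cases}
       \mathcal{U} - [\![t_1]\!]_{\kappa,\alpha} & \text{if } \kappa(d) = l \\
       \mathcal{U} - [\![t_2]\!]_{\kappa,\alpha} & \text{if } \kappa(d) = r
       \end{cases} \\[2pt]
 & = & [\![\neg t_1 \sqcup_d \neg t_2]\!]_{\kappa,\alpha}
\end{array}
\]

The context is fixed before the complement is taken, so the negation never ranges over both branches at once. This is why the result is not $\neg t_1 \sqcap \neg t_2$.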
4.4.3 Differences to Smolka's Feature Terms
It might be important to point out some differences between our feature
terms and those proposed in [Smolka, 1988]. Our treatment of definedness
and undefinedness of features differs from Smolka's. We assume
that the interpretation of each feature symbol is a total function
defined on the universe of the interpretation. Instead of saying that such
a function is undefined for a certain individual, we assume that this
function maps this individual onto the (only) member of $\mathrm{NONE}^{\mathcal{I}}$. This
special object itself is a fixpoint for all functions associated to the
features. This modification simplifies a couple of technical details, while
keeping the conditions for consistency essentially unchanged⁶. Among
other things, this modification will allow us to represent disjunction on
a deeper level of embedding in some cases. (See Fig. 4.6 for an example.)
Another advantage is that we do not need disjunction in order
to express undefinedness of a path without implying definedness of its
prefixes. (The term $f : g : h : \mathrm{NONE}$ would have to be written as
$\neg f : \top \;\sqcup\; f : \neg g : \top \;\sqcup\; f : g : \neg h : \top$ otherwise.)
⁶ As far as we can see, the only difference is that $f : \top$ is equivalent to $\top$ in our
treatment, whereas it is more specific than $\top$ in Smolka's logic. To get the exact
counterpart of Smolka's interpretation of $f : \top$ in our syntax, we would have to write
$f : \neg\mathrm{NONE}$.
For the sake of simplicity, we did not introduce a special notation
for agreement or disagreement of paths, since we can use variables and
their negation to express the same constraints. A term $p \downarrow q$ expressing
equivalence (including definedness) of the paths $p$ and $q$ can be written as
$p : (x \sqcap \neg\mathrm{NONE}) \sqcap q : x$, and the disagreement $p \uparrow q$ (including definedness
of both paths) as $p : (x \sqcap \neg\mathrm{NONE}) \sqcap q : (\neg x \sqcap \neg\mathrm{NONE})$. However, the
additional constraints requiring definedness can of course be omitted at
will (in which case disagreement requires that at least one of the paths
must be defined).
Since our disjunctions carry names, not all the logical equivalences or
tautologies known from propositional logic apply to them. For instance,
we may not simplify the formula $\top \sqcup_{d_i} t$ to $\top$ unless all left-hand sides
of all disjunctions labeled $d_i$ are less informative than the respective
right-hand sides. Similarly, labeled disjunction is not commutative in
the usual sense if the name of the affected disjunction appears more
than once in the overall formula. Again, we have to swap all occurrences
of a disjunction with the same name in order to maintain the semantics
of a feature term. However, there are not only restrictions. Named
disjunctions also allow for tautologies that are not valid for ordinary
disjunction. While distribution of a conjunct over a disjunction and
vice versa works as usual, the rule for distributing a disjunct over a
conjunction can be generalized. A term of the form $t_1 \sqcup_d (t_2 \sqcap t_3)$ can
be replaced by $(t_1 \sqcup_d t_2) \sqcap (\top \sqcup_d t_3)$ or by $(\top \sqcup_d t_2) \sqcap (t_1 \sqcup_d t_3)$, since
the label $d$ of the disjunction excludes the possibility of choosing one of
$t_2$ and $t_3$ without choosing the other.
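The generalized distribution rule can be checked by a case split on the branch chosen for $d$, using only the clauses of Fig. 4.9 and $\top^{\mathcal{I}} = \mathcal{U}$. As a short worked derivation, with $\kappa$ the context and $\alpha$ the assignment:

\[
\begin{array}{ll}
\kappa(d) = l: &
[\![(t_1 \sqcup_d t_2) \sqcap (\top \sqcup_d t_3)]\!]_{\kappa,\alpha}
 = [\![t_1]\!]_{\kappa,\alpha} \cap \mathcal{U}
 = [\![t_1]\!]_{\kappa,\alpha}
 = [\![t_1 \sqcup_d (t_2 \sqcap t_3)]\!]_{\kappa,\alpha} \\[4pt]
\kappa(d) = r: &
[\![(t_1 \sqcup_d t_2) \sqcap (\top \sqcup_d t_3)]\!]_{\kappa,\alpha}
 = [\![t_2]\!]_{\kappa,\alpha} \cap [\![t_3]\!]_{\kappa,\alpha}
 = [\![t_1 \sqcup_d (t_2 \sqcap t_3)]\!]_{\kappa,\alpha}
\end{array}
\]

The second variant, $(\top \sqcup_d t_2) \sqcap (t_1 \sqcup_d t_3)$, is verified in the same way with the roles of the two conjuncts exchanged.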
Even if these differences to the standard approaches might look
disadvantageous at first sight, they turn out to be a reasonable price, given
the other advantages of naming disjunction.
4.5 Feature Clauses
Since feature terms are a very useful way to encode linguistic
specifications, they might be employed directly as a part of a grammar
formalism. However, the computational mechanisms needed for an
implementation are best described in terms of constraints over variables.
We introduce a relational language similar to Smolka's set descriptions,
a normal form in which inconsistencies become obvious, and simplification
rules that transform a given specification into this normal
form. Unlike Smolka, we use this language also to express descriptions
containing disjunctive information.
4.5.1 From Feature Terms to Sets of Constraints
In this section, we will develop a constraint calculus for feature terms
without disjunction which will be extended in different ways in the
following sections. In order to keep things simple and perspicuous, we do
not want to deal with many different types of constraints. Hence, we
will assume only one type that relates a variable with a feature term
describing possible values of this variable, and another that indicates
inconsistency. Such constraints will be called simple constraints.
Definition 4 (Simple Constraint) A simple constraint is either a pair
$x \mid t$ where $x \in V$ and $t \in$ FT, or the symbol $\bot$. $\mathbf{SC}$ is the set of simple
constraints, and $sc, sc_1, \ldots$ will always denote elements of $\mathbf{SC}$.
Definition 5 (Satisfaction of simple constraints) For a given
interpretation we will say that a simple constraint $sc$ is satisfied in a context
$\kappa$ under an assignment $\alpha$ (written $\kappa, \alpha \models sc$) iff $sc$ has the form $x \mid t$ and
$\alpha(x) \in [\![t]\!]_{\kappa,\alpha}$. The simple constraint $\bot$ is never satisfied for arbitrary
$\kappa$ and $\alpha$. A set of simple constraints is satisfied in a context $\kappa$ under an
assignment $\alpha$, if each of its members is satisfied in $\kappa$ under $\alpha$.
Definition 6 (Simple Rooted Feature Description) A simple rooted
f-description is a pair $(x_0, SC)$, where $x_0 \in V$ and $SC \subseteq \mathbf{SC}$.
The denotation of a simple rooted f-description is defined as follows:
\[
[\![(x_0, SC)]\!] := \{\alpha(x_0) \mid \text{there are } \alpha \in \mathcal{U}^V, \kappa \in \{l, r\}^D \text{ such that for every } sc \in SC : \kappa, \alpha \models sc\}
\]
We can easily verify that for every feature term $t$ not containing the
variable $x_0$, we can get an equivalent feature description $(x_0, \{x_0 \mid t\})$,
i.e.:
\[
[\![t]\!] = [\![(x_0, \{x_0 \mid t\})]\!]
\]
4.5.2 Normal Form and Rewrite Rules without Disjunction
One reason for switching from terms to descriptions is the fact that for
the definition of a normal form and of rewrite rules for the normalization
we can exploit the work done by [Smolka, 1988]. Another reason is that
it is easy to apply the scheme given in [Maxwell & Kaplan, 1989] to
such a rewrite system. The idea of a normal form for constraints is to
restrict the syntax in such a way that inconsistent information cannot
be hidden somewhere among different constraints, but has to show up
explicitly. Hence, we have to consider all possible kinds of contradiction
and to exclude them syntactically. The normal forms both for simple
constraints and for simple rooted f-descriptions are essentially the same
as in [Smolka, 1988], the only differences resulting from our different
treatment of non-definedness.
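A simple constraint is called normal if it has one of the following forms:
• $x \mid A$ or $x \mid \neg A$, where $A \in S \setminus \{\top, \bot\}$
• $x \mid \neg y$, where $x, y \in V$, $x \neq y$
• $x \mid f : y$, with $y \in V$
• $\bot$
Stated the other way, a non-normal constraint has one of the following
forms:
(1) $x \mid \top$; (2) $x \mid \bot$; (3) $x \mid y$; (4) $x \mid \neg\top$; (5) $x \mid \neg\bot$; (6) $x \mid \neg x$;
(7) $x \mid f : t$, where $t \notin V$; (8) $x \mid t_1 \sqcup_d t_2$; or (9) $x \mid t_1 \sqcap t_2$.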
A simple rooted f-description $(x_0, SC)$ is called normal if $SC$ is a set
of normal simple constraints which satisfies the following conditions:
1. If $SC$ contains $\bot$, it does not contain any other constraint.
2. If $SC$ contains $x \mid A$ and $x \mid B$ with $A, B \in S$, then $A = B$.
3. If $SC$ contains $x \mid a$ and $x \mid t$ with $a \in \mathit{Sg}$, then $t = a$.
4. If $SC$ contains $x \mid A$ and $x \mid \neg B$, then $A$ is not a subsort of (or equal
   to) $B$, and $\mathrm{GLB}(A, B) \neq \bot$.
5. If $SC$ contains $x \mid \neg A$ and $x \mid \neg B$, then $A$ is not a proper subsort
   of $B$.
6. If $SC$ contains $x \mid f : y$ and $x \mid f : z$, then $y = z$.
We know that every normal simple feature description $(x_0, SC)$ with
$SC \neq \{\bot\}$ has a non-empty denotation in some interpretation $(\mathcal{U}, \mathcal{I})$
(see [Smolka, 1988] for a proof).
For each way a simple rooted f-description could fail to be normal,
one of the following rules can rewrite the description into an equivalent
one that no longer contains the conflict with the normal form. For ease
of notation, we write $sc \;\&\; SC$ to denote $\{sc\} \cup SC$ where $sc \notin SC$.
$SC_{x \to y}$ is the set of constraints obtained from $SC$ where all occurrences
of the variable $x$ have been substituted by $y$. Since we do not want
to substitute the root variable away, the rewrite rules carry a reference
to this variable. The rules that handle single non-normal constraints
are numbered (Ss1) ... (Ss8), whereas the rules (Ms1) ... (Ms6) handle
cases where two normal simple constraints conflict with one of the
co-occurrence conditions. We then get the rules shown in Fig. 4.10.⁷
For each of these rewrite rules, it can be shown that it does not modify
the denotation of the feature description. Furthermore, we can easily
check that for any non-normal feature description, one of the rules applies,
i.e. that if none of the rules matches, the description must be normal.
Finally, we can prove that there is no infinite chain of rule applications
by defining the size of a set of constraints as a non-negative integer
in such a way that this number is decreased by each rule application⁸.
Hence, we know that the rewrite rules constitute an effective procedure
for the normalization of an arbitrary feature description not containing
disjunction.
We did not give a rule (Ss9) for rewriting a constraint containing a
disjunction. Although the rules (Ss7), (Ss8) handle terms with embedded
disjunctions correctly, we cannot rewrite a constraint of the form
$x \mid t_1 \sqcup_d t_2$ into an equivalent set of normal simple constraints.
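The two structural rules (Ss7) and (Ss8) of Fig. 4.10 are easy to try out in isolation. The following sketch is an illustration under assumptions: terms are nested tuples, constraints are pairs of a variable name and a term, and fresh variables are named `_y1`, `_y2`, ... (so the input is assumed not to use such names). It decomposes a constraint until every selection has a variable as argument and no conjunction is left; the remaining rules of the figure are not covered.

```python
from itertools import count


def decompose(x, t, fresh=None):
    """Apply (Ss7) and (Ss8) exhaustively to the constraint x | t."""
    fresh = fresh if fresh is not None else count(1)
    tag = t[0]
    if tag == "and":
        # (Ss8): a conjunction is split into two constraints on x.
        return decompose(x, t[1], fresh) + decompose(x, t[2], fresh)
    if tag == "sel" and t[2][0] != "var":
        # (Ss7): x | f:t becomes x | f:y and y | t for a new variable y.
        y = f"_y{next(fresh)}"
        return [(x, ("sel", t[1], ("var", y)))] + decompose(y, t[2], fresh)
    if tag == "or":
        # No rule (Ss9): simple constraints cannot express the disjunction.
        raise ValueError("named disjunction needs conditional constraints")
    return [(x, t)]


term = ("and",
        ("sel", "subj", ("var", "xS")),
        ("sel", "vcomp", ("sel", "subj", ("sel", "animate", ("sort", "+")))))
constraints = decompose("x0", term)
for c in constraints:
    print(c)
```

The path `vcomp : subj : animate : +` is unfolded into a chain of three feature constraints linked by fresh variables, followed by one sort constraint. A disjunction reached during decomposition raises an error, which mirrors the missing rule (Ss9).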
4.6 Applying the Maxwell/Kaplan Scheme
Until now, our descriptions contain sets of constraints which are
connected conjunctively, i.e. all constraints have to be satisfied
simultaneously by an adequate solution. Hence the rewrite system used for the
normalization can concentrate on the syntactic form of the constraints
and does not have to consider more complex relations between the
constraints in the description. However, in order to deal with disjunctive
information, we have to give up some of this simplicity, since such cases
cannot be represented with the simple constraints used so far. We might
consider allowing for arbitrary boolean combinations of constraints or at
least some AND-OR-structures, but such a modification would be a very
drastic step and would make the rewrite system much more complicated
than it is now. Our goal will be to generalize the system in such a way
that as many as possible of its current properties are preserved.
⁷ The treatment of inconsistency and equality in these rules may look unnecessarily
long-winded, but this formulation will facilitate the generalization to the disjunctive
case.
⁸ Such an assignment is given e.g. by the sum $\mathit{cons} + \mathit{negv} + 2 \cdot \mathit{conj} + 2 \cdot \mathit{featv} + 4 \cdot \mathit{featn}$,
where $\mathit{cons}$ is the number of constraints $\neq \bot$, $\mathit{negv}$ the number of $\neg$-symbols
directly followed by a variable, $\mathit{conj}$ the number of $\sqcap$-symbols, $\mathit{featv}$ the number of
:-symbols directly followed by a variable and $\mathit{featn}$ the number of other :-symbols
contained in the description.
(Ss1)   $x \mid \top \;\&\; SC \;\to_{x_0}\; SC$
(Ss2)   $x \mid \bot \;\&\; SC \;\to_{x_0}\; \bot \;\&\; SC$
(Ss3a)  $x \mid x \;\&\; SC \;\to_{x_0}\; SC$
(Ss3b)  $x \mid y \;\&\; SC \;\to_{x_0}\; SC_{x \to y}$  if $x \neq x_0$
(Ss3c)  $x_0 \mid y \;\&\; SC \;\to_{x_0}\; SC_{y \to x_0}$
(Ss4)   $x \mid \neg\top \;\&\; SC \;\to_{x_0}\; \bot \;\&\; SC$
(Ss5)   $x \mid \neg\bot \;\&\; SC \;\to_{x_0}\; SC$
(Ss6)   $x \mid \neg x \;\&\; SC \;\to_{x_0}\; \bot \;\&\; SC$
(Ss7)   $x \mid f : t \;\&\; SC \;\to_{x_0}\; x \mid f : y \;\&\; y \mid t \;\&\; SC$  where $t \notin V$ and $y$ is new
(Ss8)   $x \mid t_1 \sqcap t_2 \;\&\; SC \;\to_{x_0}\; x \mid t_1 \;\&\; x \mid t_2 \;\&\; SC$
(Ss9)   $x \mid t_1 \sqcup_d t_2 \;\&\; SC \;\to_{x_0}\;$ cannot be handled yet

(Ms1)   $\bot \;\&\; x \mid t \;\&\; SC \;\to_{x_0}\; \bot \;\&\; SC$
(Ms2)   $x \mid A \;\&\; x \mid B \;\&\; SC \;\to_{x_0}\; x \mid \mathrm{GLB}(A, B) \;\&\; SC$
(Ms3a)  $x \mid a \;\&\; x \mid \neg y \;\&\; SC \;\to_{x_0}\; x \mid a \;\&\; y \mid \neg a \;\&\; SC$
(Ms3b)  $x \mid a \;\&\; x \mid f : y \;\&\; SC \;\to_{x_0}\; x \mid a \;\&\; y \mid \mathrm{NONE} \;\&\; SC$
(Ms4a)  $x \mid A \;\&\; x \mid \neg B \;\&\; SC \;\to_{x_0}\; \bot \;\&\; SC$  if $A \le B$
(Ms4b)  $x \mid A \;\&\; x \mid \neg B \;\&\; SC \;\to_{x_0}\; x \mid A \;\&\; SC$  where $\mathrm{GLB}(A, B) = \bot$
(Ms5)   $x \mid \neg A \;\&\; x \mid \neg B \;\&\; SC \;\to_{x_0}\; x \mid \neg B \;\&\; SC$  if $A < B$
(Ms6)   $x \mid f : y \;\&\; x \mid f : z \;\&\; SC \;\to_{x_0}\; x \mid f : y \;\&\; z \mid y \;\&\; SC$
Figure 4.10 Rewrite Rules for Simple Constraints
This is indeed possible, and one of the possibilities we have is to apply
the scheme given in [Maxwell & Kaplan, 1989] to our calculus. The
principle of their scheme is simple and fits very well into our current
framework.
4.6.1 Conditional Constraints
When dealing with disjunctive information, we need a way to relate our
constraints to the disjunction names that appear in feature
terms. More specifically, we want to express that a given constraint has
to be satisfied if certain disjunctive branches are chosen. To this end
we will attach a so-called context description to our constraints, which
denotes the disjunctive contexts in which the constraint has to be valid.
The key idea is that a simple constraint of the form $x \mid t_1 \sqcup_d t_2$ can be
replaced by the conjunction $(d{:}l \to x \mid t_1) \wedge (d{:}r \to x \mid t_2)$ expressing that
$x \mid t_1$ has to hold if the left branch of disjunction $d$ is chosen, and $x \mid t_2$
otherwise. Since these new constraints are still conjunctively connected,
we can hope to achieve a representation for such conditional constraints
and a generalization of our rewrite system without destroying the overall
structure of our method. We will use arbitrary boolean combinations of
disjunctive choices, called context descriptions, in order to express
conditions under which constraints have to hold.
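The key idea can be sketched in a few lines. The code below is an illustration under assumptions: terms are nested tuples, and a context description is restricted to a conjunction of choices, stored as a frozenset of pairs `(d, branch)`. It splits conjunctions and named disjunctions and drops constraints whose context description contains both branches of the same disjunction (these steps appear as rules (Sc8), (Sc9) and (Sc10) in Fig. 4.11).

```python
def expand(x, t, k=frozenset()):
    """Turn x | t [k] into conditional constraints (x, term, context)."""
    if any((d, "l") in k and (d, "r") in k for d, _ in k):
        return []                          # contradictory context description
    if t[0] == "and":                      # ("and", t1, t2)
        return expand(x, t[1], k) + expand(x, t[2], k)
    if t[0] == "or":                       # ("or", d, t1, t2)
        d = t[1]
        return (expand(x, t[2], k | {(d, "l")})
                + expand(x, t[3], k | {(d, "r")}))
    return [(x, t, k)]


# t1, t2, t3 are stand-ins for arbitrary terms; t2 sits under d:r and d:l.
t1, t2, t3 = ("sort", "A"), ("sort", "B"), ("sort", "C")
nested = ("or", "d", t1, ("or", "d", t2, t3))
result = expand("x", nested)
for constraint in result:
    print(constraint)
```

The inner left branch `t2` would require choosing both the right and the left branch of `d`, so no constraint is produced for it. What remains is `x | t1` under `d:l` and `x | t3` under `d:r`.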
Definition 7 (Context Descriptions) A context description is a
propositional formula where the constant true, variables written $d_i{:}l$ and $d_i{:}r$
with $d_i \in D$, and the operators $\wedge$, $\vee$ and $\neg$ may be employed.
$\mathbf{CD}$ will denote the set of context descriptions. The symbols $k, k_1, \ldots$
will always denote members of $\mathbf{CD}$.
The set of purely conjunctive context descriptions, i.e. those that do not
contain the operators $\vee$ and $\neg$, is denoted by $\mathbf{CD}_c$.
Definition 8 (Satisfaction of Context Descriptions) A context $\kappa$
satisfies a context description $k$ (written $\kappa \models_c k$) according to the
following conditions:
• $\kappa \models_c \mathit{true}$ always
• $\kappa \models_c d{:}b$ iff $\kappa(d) = b$ ($b \in \{l, r\}$)
• $\kappa \models_c k_1 \wedge k_2$ iff $\kappa \models_c k_1$ and $\kappa \models_c k_2$
• $\kappa \models_c k_1 \vee k_2$ iff $\kappa \models_c k_1$ or $\kappa \models_c k_2$
• $\kappa \models_c \neg k$ iff $\kappa \not\models_c k$
If $\kappa \models_c k$, we will also say that $k$ describes or covers $\kappa$ or that $\kappa$ lies
in $k$.
A context description is called contradictory if no context satisfies it,
otherwise it is consistent.
Two context descriptions which are satisfied by exactly the same
contexts are called equivalent (written $\equiv$).
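Definition 8 translates into a small evaluator, and consistency and equivalence can then be decided by enumerating the contexts over the disjunction names that actually occur. The sketch below is only an illustration: the tuple encoding and the function names are assumptions, and the enumeration is exponential in the number of names.

```python
from itertools import product

# Assumed encoding of context descriptions:
#   True                 the constant true
#   ("choice", d, b)     the variable d:b with b in {"l", "r"}
#   ("and", k1, k2), ("or", k1, k2), ("not", k)


def satisfies(kappa, k):
    """Does the context kappa (a dict from names to 'l'/'r') satisfy k?"""
    if k is True:
        return True
    tag = k[0]
    if tag == "choice":
        return kappa[k[1]] == k[2]
    if tag == "and":
        return satisfies(kappa, k[1]) and satisfies(kappa, k[2])
    if tag == "or":
        return satisfies(kappa, k[1]) or satisfies(kappa, k[2])
    if tag == "not":
        return not satisfies(kappa, k[1])
    raise ValueError(f"unknown context description: {k!r}")


def names(k):
    """Disjunction names mentioned in k."""
    if k is True:
        return set()
    if k[0] == "choice":
        return {k[1]}
    return set().union(*(names(sub) for sub in k[1:]))


def contexts(ds):
    """All assignments of a branch to each name in ds."""
    ds = sorted(ds)
    for branches in product("lr", repeat=len(ds)):
        yield dict(zip(ds, branches))


def consistent(k):
    return any(satisfies(kappa, k) for kappa in contexts(names(k)))


def equivalent(k1, k2):
    ds = names(k1) | names(k2)
    return all(satisfies(c, k1) == satisfies(c, k2) for c in contexts(ds))


left, right = ("choice", "d", "l"), ("choice", "d", "r")
print(consistent(("and", left, right)))   # False
print(equivalent(("not", left), right))   # True
```

Restricting the enumeration to the names occurring in the descriptions is sound because satisfaction depends only on the branches chosen for those names.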
We can now annotate a simple constraint with a description of the
disjunctive contexts under which this constraint has to be valid.
Definition 9 (Conditional Constraint) A conditional constraint is a
pair $sc[k]$ where $sc \in \mathbf{SC}$ and $k \in \mathbf{CD}$. $\mathbf{CC}$ is the set of conditional
constraints, and $cc, cc_1, \ldots$ will always denote elements of $\mathbf{CC}$.⁹
For a given interpretation we will say that a conditional constraint
$sc[k]$ is satisfied in a context $\kappa$ under an assignment $\alpha$ according to the
following conditions:
\[
\begin{array}{lcl}
\kappa, \alpha \models x \mid t\,[k] & \text{iff} & \kappa \not\models_c k \text{ or } \alpha(x) \in [\![t]\!]_{\kappa,\alpha} \\
\kappa, \alpha \models \bot[k] & \text{iff} & \kappa \not\models_c k
\end{array}
\]
Clearly, a conditional constraint $sc[k]$ is trivially satisfied in every
context $\kappa$ with $\kappa \not\models_c k$. Stated differently, the constraint $sc[k]$ is effective
in contexts described by $k$. Suppose for instance $k = d_1{:}l \wedge d_2{:}r \wedge d_4{:}l$,
then we could express this as: "either $sc$ has to be true or we have to
choose $d_1{:}r$ or $d_2{:}l$ or $d_4{:}r$".
A constraint of the form $\bot[k]$ is used to mark the contexts in $k$ as
inconsistent in order to exclude them from further consideration.
Definition 10 (Rooted Feature Description) A rooted f-description
is a pair $(x_0, CC)$, where $x_0 \in V$ and $CC \subseteq \mathbf{CC}$.
The denotation of a rooted f-description is defined as follows:
\[
[\![(x_0, CC)]\!] := \{\alpha(x_0) \mid \alpha \in \mathcal{U}^V \wedge \exists \kappa \in \{l, r\}^D\ \forall cc \in CC : \kappa, \alpha \models cc\}
\]
We can easily verify that for every feature term $t$ not containing $x_0$:
\[
[\![t]\!] = [\![(x_0, \{x_0 \mid t\,[\mathit{true}]\})]\!]
\]
⁹ A constraint $sc[k]$ might also be written $k \to sc$ as in [Maxwell & Kaplan, 1989],
which would make its semantics more explicit. We do not use this notation since the
implication sign could be confused with the arrow used in rewrite rules.
4.6.2 Normal Form for Conditional Constraints
A conditional constraint $sc[k]$ is normal if $sc$ is normal and $k$ is
consistent. A rooted f-description $(x_0, CC)$ is called normal if $CC$ is a set of
normal conditional constraints which satisfies the following conditions:
1. if $CC$ contains $\bot[k]$ and $x \mid t\,[k']$, then $k' \wedge k$ is contradictory
2. if $CC$ contains $x \mid A\,[k]$ and $x \mid B\,[k']$ where $k \wedge k'$ is consistent, then
   $A = B$
3. if $CC$ contains $x \mid a\,[k]$ and $x \mid t\,[k']$ where $k \wedge k'$ is consistent, then
   $t = a$
4. if $CC$ contains $x \mid A\,[k]$ and $x \mid \neg B\,[k']$ where $k \wedge k'$ is consistent,
   then $A \not\le B$
5. if $CC$ contains $x \mid \neg A\,[k]$ and $x \mid \neg B\,[k']$ where $k \wedge k'$ is consistent,
   then $A \not< B$ and $\mathrm{GLB}(A, B) \neq \bot$
6. if $CC$ contains $x \mid f : y\,[k]$ and $x \mid f : z\,[k']$ where $k \wedge k'$ is consistent,
   then $y = z$
7. if $CC$ contains $sc[k]$ and $sc[k']$, then $k = k'$
The first six conditions are a straightforward generalization of the
normal form conditions for simple feature descriptions. Two conditional
constraints $sc_1[k]$, $sc_2[k']$ where $sc_1$, $sc_2$ would violate one of the normal
form conditions for simple descriptions can only coexist if their context
descriptions $k$ and $k'$ are incompatible.¹⁰
The last condition disallows constraints with different context
descriptions, but otherwise carrying the same information. In particular, this
has the consequence that all information about inconsistent contexts has
to be concentrated in one constraint $\bot[k]$. If the overall description is
inconsistent, this will eventually result in a constraint $\bot[k]$ where $k$ is
equivalent to true. In this case, the first condition enforces that no
other constraint can be contained in the description. Hence, a normal
rooted f-description $(x_0, CC)$ where $CC$ is different from $\{\bot[k]\}$ with
$k \equiv \mathit{true}$ has a non-empty denotation in some interpretation $(\mathcal{U}, \mathcal{I})$.
¹⁰ Some rewrite steps might be saved by weakening the first condition to:
If $CC$ contains $\bot[k]$ and $x \mid t\,[k']$, then $k' \wedge \neg k$ is consistent.
In this case, (Mc1) could be simplified to:
(Mc1) $\bot[k_1] \;\&\; x \mid t\,[k_2] \;\&\; CC \;\to_{x_0}\; \bot[k_1] \;\&\; CC$, if $k_2 \wedge \neg k_1$ is contradictory
In order to generalize the rewrite system above to work for conditional
constraints, we can apply the scheme given in [Maxwell & Kaplan, 1989].
In our notation, this scheme says: Take a rewrite rule for simple
constraints of the form $sc_1 \;\&\; sc_2 \;\&\; SC \to sc_3 \;\&\; SC$ and replace it by
the rule
\[
sc_1[k_1] \;\&\; sc_2[k_2] \;\&\; CC \;\to\; sc_1[k_1 \wedge \neg k_2] \;\&\; sc_2[k_2 \wedge \neg k_1] \;\&\; sc_3[k_1 \wedge k_2] \;\&\; CC
\quad \text{if } k_1 \wedge k_2 \text{ is consistent}
\]
In cases where $sc_2 = sc_3$ (i.e. rules where $sc_1$ is eliminated if $sc_2$ is
present), we can simplify the outcome of the scheme to
\[
sc_1[k_1] \;\&\; sc_2[k_2] \;\&\; CC \;\to\; sc_1[k_1 \wedge \neg k_2] \;\&\; sc_2[k_2] \;\&\; CC
\quad \text{if } k_1 \wedge k_2 \text{ is consistent}
\]
Using this scheme, we can find rewrite rules for conditional constraints
that replace the rules (Ms1), (Ms2), (Ms4a, b) and (Ms5). The other
cases in the second group differ only in so far as they rewrite two
conflicting constraints into two different constraints. For these cases, the
scheme can be slightly generalized in the obvious way, such that both
resulting constraints are marked with the context description $[k_1 \wedge k_2]$.
With this generalization, for the remaining rules of the second group a
conditional variant can be found (see Fig. 4.12).
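The scheme can be written as a higher-order function that lifts a binary rule on simple constraints to conditional constraints. The sketch below rests on several assumptions made for illustration only: a context description is represented semantically as the set of partial contexts (over a fixed tuple `NAMES`) that satisfy it, so that conjunction is set intersection and $k_1 \wedge \neg k_2$ is set difference; sorts are modelled as frozensets whose GLB is the intersection; and the rule `ms2` stands in for (Ms2).

```python
from itertools import product

NAMES = ("d1", "d2")                                # relevant disjunction names
ALL = frozenset(product("lr", repeat=len(NAMES)))   # all partial contexts


def choice(d, b):
    """The contexts described by d:b."""
    i = NAMES.index(d)
    return frozenset(c for c in ALL if c[i] == b)


def lift(rule):
    """Conditional variant of a rule sc1 & sc2 -> sc3 on simple constraints."""
    def conditional(sc1, k1, sc2, k2):
        both = k1 & k2
        if not both:                 # k1 and k2 incompatible: rule not applicable
            return None
        out = [(sc1, k1 - k2), (sc2, k2 - k1), (rule(sc1, sc2), both)]
        # Constraints with a contradictory context description are dropped.
        return [(sc, k) for sc, k in out if k]
    return conditional


def ms2(c1, c2):
    """Stand-in for (Ms2): x | A and x | B give x | GLB(A, B)."""
    (x, a), (y, b) = c1, c2
    assert x == y
    return (x, a & b)


animate = frozenset({"human", "animal"})   # toy sorts, GLB = intersection
human = frozenset({"human"})
result = lift(ms2)(("x", animate), choice("d1", "l"),
                   ("x", human), choice("d2", "r"))
for sc, k in result:
    print(sc, sorted(k))
```

The combined constraint is effective only where both `d1:l` and `d2:r` hold, while each original constraint survives in the contexts not covered by the other one. This corresponds to the shape of rule (Mc2) in Fig. 4.12.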
Most rules of the first group have the form $sc_1 \;\&\; SC \to SC_1 \cup SC$,
where $SC_1$ is a set of 0, 1, or 2 simple constraints that carry the
information contained in $sc_1$. These rules can be generalized straightforwardly
to the conditional system. The context description of the input
constraint just has to be copied to the constraints in $SC_1$, as shown in
Fig. 4.11. It is now straightforward to give a rule (Sc9) for rewriting
a constraint containing a disjunction to a pair of constraints with more
specific context descriptions.
The normal form for conditional descriptions additionally prohibits
constraints with contradictory context descriptions and also equal
constraints with different context descriptions. These are eliminated by
using the rules (Sc10) and (Mc7). The former of these rules will eliminate
constraints produced by (Mc1) ... (Mc6) if either $k_1 \wedge \neg k_2$ or $k_2 \wedge \neg k_1$
is contradictory¹¹.
¹¹ It will also eliminate constraints resulting from inaccessible disjuncts of the input
formula, such as $x_i \mid t_2\,[d{:}l \wedge d{:}r]$ if the original feature term contained $t_1 \sqcup_d (t_2 \sqcup_d t_3)$.
The only case that still has to be handled are constraints of the form
$x \mid y\,[k]$. In the non-disjunctive case, one of the variables has been
substituted by the other and the constraint could be removed from the
description. Unfortunately, this would not be correct in the disjunctive
case, since such a substitution would restrict $x$ and $y$ to the same value
in all contexts, not only in those contexts described by $k$¹².
We can fix this problem by introducing the operation of conditional
substitution. The idea behind this operation is that $x$ is substituted
by $y$ in (all contexts described by) $k$, written $CC_{x \to y[k]}$¹³. This means
that constraints whose context descriptions do not overlap with $k$ (i.e.
describe no common contexts) are not affected by the substitution. For
constraints that are only effective in contexts covered by $k$, the
substitution is done as usual, whereas constraints that contain $x$ and are
effective both in contexts covered by $k$ and in contexts covered by $\neg k$
have to be split into one version that is left unmodified and another
where substitution takes place.
Definition 11 (Conditional Substitution) The substitution of a
variable $x$ by a variable $y$ under condition $k$ in a set of conditional constraints
$CC$, written $CC_{x \to y[k]}$, is defined as follows:
\[
\begin{array}{rcl}
CC_{x \to y[k]} & := & \{sc[k'] \in CC \mid sc \text{ does not contain } x\} \\
 & \cup & \{sc[k' \wedge \neg k] \mid sc[k'] \in CC,\ sc \text{ contains } x,\ k' \wedge \neg k \text{ consistent}\} \\
 & \cup & \{sc_{x \to y}[k' \wedge k] \mid sc[k'] \in CC,\ sc \text{ contains } x,\ k' \wedge k \text{ consistent}\}
\end{array}
\]
where $sc_{x \to y}$ denotes the substitution of all occurrences of $x$ by $y$ in the
constraint $sc$.
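Definition 11 can be transcribed almost literally. In the sketch below, which is an illustration under assumptions, a context description is represented as the frozenset of partial contexts (over a fixed tuple `NAMES`) that satisfy it, so that $k' \wedge \neg k$ is set difference and a description is consistent iff the set is non-empty; constraints are pairs of a variable name and a nested-tuple term.

```python
from itertools import product

NAMES = ("d1",)                                     # relevant disjunction names
ALL = frozenset(product("lr", repeat=len(NAMES)))   # all partial contexts


def choice(d, b):
    """The contexts described by d:b."""
    i = NAMES.index(d)
    return frozenset(c for c in ALL if c[i] == b)


def occurs(t, x):
    """Does variable x occur in the nested-tuple term t?"""
    if t == ("var", x):
        return True
    return isinstance(t, tuple) and any(occurs(part, x) for part in t)


def rename(t, x, y):
    """Replace every occurrence of variable x in term t by y."""
    if t == ("var", x):
        return ("var", y)
    if isinstance(t, tuple):
        return tuple(rename(part, x, y) for part in t)
    return t


def cond_subst(cc, x, y, k):
    """Substitute x by y in the set cc, but only in the contexts covered by k."""
    out = set()
    for (v, term), k0 in cc:
        if v != x and not occurs(term, x):
            out.add(((v, term), k0))            # constraint does not contain x
            continue
        if k0 - k:                              # still effective outside k
            out.add(((v, term), k0 - k))
        if k0 & k:                              # substituted version inside k
            out.add(((y if v == x else v, rename(term, x, y)), k0 & k))
    return out


k = choice("d1", "l")
cc = {(("x", ("sort", "A")), ALL),
      (("z", ("sel", "f", ("var", "x"))), k)}
result = cond_subst(cc, "x", "y", k)
for sc, ctx in sorted(result):
    print(sc, sorted(ctx))
```

The constraint on `x` that holds in every context is split into an unmodified copy for `d1:r` and a substituted copy for `d1:l`, whereas the constraint that is effective only under `d1:l` is substituted as a whole.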
Using conditional substitution, we can also give a conditional version of
the rules (Ss3b, c).
We can show termination as follows: A set of conditional constraints
CC can be seen as mapping from contexts to sets of simple constraint,
12Unless y is the root variable, this can only be a problem if there are constraintssc1[k1]; sc2[k2] (not necessarily dierent) where sc1 contains x, sc2 contains y andk1 ^ k2 ^ :k is consistent.
13[Maxwell & Kaplan, 1989] do not dene conditional substitution. However, theway a constraint x j y[k] is treated in the example they give (their notation wouldbe k ! x y) would lead to exactly the same result as applying the substitutionx! y[k].
Andreas Eisele & Jochen Dorre: Disjunctive Unication 31
(Sc1) x j>[k] & CC !x0 CC(Sc2) x j?[k] & CC !x0 ?[k] & CC(Sc3a) x jx[k] & CC !x0 CC(Sc3b) x jy[k] & CC !x0 CCx!y[k] if x 6= x0(Sc3c) x0 jy[k] & CC !x0 CCy!x0[k](Sc4) x j:>[k] & CC !x0 ?[k] & CC(Sc5) x j:?[k] & CC !x0 CC(Sc6) x j:x[k] & CC !x0 ?[k] & CC(Sc7) x jf: t[k] & CC !x0 x jf:y[k] & y j t[k] & CC;
where t 62 V and y is new(Sc8) x j t1 u t2[k] & CC !x0 x j t1[k] & x j t2[k] & CC(Sc9) x j t1 td t2[k] & CC !x0 x j t1[k ^ d: l] & x j t2[k ^ d:r] & CC(Sc10) sc[k] & CC !x0 CC; if k is contradictory
Figure 4.11Rewrite Rules for Conditional Constraints, Part I
where each context is mapped into the set of constraints that are
effective in this context. For each context, the rewrite system does the
same thing as the unconditional version, hence for each context there
are only finitely many rule applications possible. However, this
argument does not suffice, since there are infinitely many different
contexts. But if we restrict ourselves to the consideration of relevant
disjunction names, i.e. names that do appear in the initial feature
description, we find that there is only a finite (although of course
exponential) number of relevant partial contexts, i.e. contexts that are
defined only for the disjunction names in use. We can define the size of
a set of conditional constraints so that for each constraint the number
of different partial contexts where this constraint is effective is
taken as a factor (this defines the size relative to a given set of
disjunction names). Now each of the rewrite rules obtained by the
schemes above decreases the size of a conditional description. This is
not necessarily true for the combination of equal constraints (Mc7),
where the result can have the same size, nor for the elimination of
constraints with contradictory context descriptions
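Rules (Mc1) ... (Mc6) apply only if $k_1 \wedge k_2$ is consistent.

(Mc1)  $\bot[k_1] \;\&\; x \mid t[k_2] \;\&\; CC$
       $\rightarrow_{x_0}\; \bot[k_1] \;\&\; x \mid t[k_2 \wedge \neg k_1] \;\&\; CC$
(Mc2)  $x \mid A[k_1] \;\&\; x \mid B[k_2] \;\&\; CC$
       $\rightarrow_{x_0}\; x \mid GLB(A,B)[k_1 \wedge k_2] \;\&\; x \mid A[k_1 \wedge \neg k_2] \;\&\; x \mid B[k_2 \wedge \neg k_1] \;\&\; CC$
(Mc3a) $x \mid a[k_1] \;\&\; x \mid \neg y[k_2] \;\&\; CC$
       $\rightarrow_{x_0}\; x \mid a[k_1] \;\&\; y \mid \neg a[k_1 \wedge k_2] \;\&\; x \mid \neg y[k_2 \wedge \neg k_1] \;\&\; CC$
(Mc3b) $x \mid a[k_1] \;\&\; x \mid f{:}y[k_2] \;\&\; CC$
       $\rightarrow_{x_0}\; x \mid a[k_1] \;\&\; y \mid \mathrm{NONE}[k_1 \wedge k_2] \;\&\; x \mid f{:}y[k_2 \wedge \neg k_1] \;\&\; CC$
(Mc4a) $x \mid A[k_1] \;\&\; x \mid \neg B[k_2] \;\&\; CC$, where $A \leq B$
       $\rightarrow_{x_0}\; \bot[k_1 \wedge k_2] \;\&\; x \mid A[k_1 \wedge \neg k_2] \;\&\; x \mid \neg B[k_2 \wedge \neg k_1] \;\&\; CC$
(Mc4b) $x \mid A[k_1] \;\&\; x \mid \neg B[k_2] \;\&\; CC$, where $GLB(A,B) = \bot$
       $\rightarrow_{x_0}\; x \mid A[k_1] \;\&\; x \mid \neg B[k_2 \wedge \neg k_1] \;\&\; CC$
(Mc5)  $x \mid \neg A[k_1] \;\&\; x \mid \neg B[k_2] \;\&\; CC$, where $A < B$
       $\rightarrow_{x_0}\; x \mid \neg B[k_2] \;\&\; x \mid \neg A[k_1 \wedge \neg k_2] \;\&\; CC$
(Mc6)  $x \mid f{:}y[k_1] \;\&\; x \mid f{:}z[k_2] \;\&\; CC$
       $\rightarrow_{x_0}\; x \mid f{:}y[k_1] \;\&\; z \mid y[k_1 \wedge k_2] \;\&\; x \mid f{:}z[k_2 \wedge \neg k_1] \;\&\; CC$
(Mc7)  $sc[k_1] \;\&\; sc[k_2] \;\&\; CC$
       $\rightarrow_{x_0}\; sc[k_1 \vee k_2] \;\&\; CC$

Figure 4.12: Rewrite Rules for Conditional Constraints, Part II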
(Sc10), which are not counted at all. But if we take pairs (s, n), where
s is the weighted size of the descriptions and n is the number of
constraints, then we find that each rule application decreases at least
one of these numbers and s is never increased. Hence the rewrite system
will terminate.
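
The measure used in this argument can be written out as follows. This is only a sketch: the symbol $D_0$ for the set of relevant disjunction names and the size function $|sc|$ on simple constraints are assumptions introduced here for illustration, not notation from the text.

```latex
\[
  s(CC) \;=\; \sum_{sc[k] \in CC} |sc| \cdot
      \bigl|\{\kappa \in \{l,r\}^{D_0} : \kappa \models_c k\}\bigr| ,
  \qquad
  n(CC) \;=\; |CC|
\]
```

Since each rule application decreases $s$ or $n$ and never increases $s$, the pairs $(s, n)$ decrease strictly in the lexicographic order on pairs of natural numbers, which is well-founded.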
4.6.3 Discussion
We have applied the method proposed in [Maxwell & Kaplan, 1989] to
our rewrite system for simple feature descriptions and we have obtained
a rewrite system that supports disjunctive information. This system
has several advantages, one of the most important being the fact that
the locality of disjunctions is fully maintained during the computation.
This means that information that does not interact with a disjunction
needs to be represented only once and never has to be multiplied out
with this disjunction. Consequently, only those disjunctions are
multiplied out with each other for which this is really necessary in
order to test consistency.
However, the formulation given here is still on a very abstract level and
in order to obtain an efficient implementation, a couple of details have
to be clarified. One of the most critical points is the fact that at a
very central point of the algorithm (in an inner loop, to speak in
programmers' terms), context descriptions have to be checked for
compatibility in order to find out if two constraints can coexist or if
they conflict and have to be rewritten. Since context descriptions can
be unrestricted propositional formulas (and the algorithm does in fact
produce formulas containing conjunction, disjunction and negation), the
test for compatibility cannot be implemented efficiently, since it is
known to be NP-complete. Hence in the worst case, in which all
disjunctions interact in some way, we do not only get an exponential
number of rewrite steps, but each step may involve tests that need
exponential time. Since the overall number of constraints for a given
variable might grow exponentially in such a scenario and since each of
them has to be checked against the other constraints concerning the same
variable, each rewrite step could involve an exponential number of
compatibility tests. Overstating it a bit, in the "even worse than
worst" case¹⁴ our algorithm could cube the complexity of expansion to
DNF.
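
To illustrate why this test is costly, the following Python sketch checks the consistency of an unrestricted context description by enumerating all contexts over the disjunction names it mentions. The tuple encoding of descriptions and the function names `names`, `holds` and `consistent` are assumptions made for this illustration; they are not part of the formalism above, and the brute-force search only stands in for whatever test an implementation would actually use.

```python
from itertools import product


def names(k):
    """Collect the disjunction names mentioned in a context description."""
    if isinstance(k, bool):
        return set()
    if k[0] == "choice":          # ("choice", d, "l") encodes d:l
        return {k[1]}
    return set().union(*(names(sub) for sub in k[1:]))


def holds(k, ctx):
    """Evaluate description k in context ctx (a dict name -> 'l' or 'r')."""
    if isinstance(k, bool):
        return k
    op = k[0]
    if op == "choice":
        return ctx[k[1]] == k[2]
    if op == "not":
        return not holds(k[1], ctx)
    if op == "and":
        return holds(k[1], ctx) and holds(k[2], ctx)
    if op == "or":
        return holds(k[1], ctx) or holds(k[2], ctx)
    raise ValueError(f"unknown operator: {op}")


def consistent(k):
    """Brute force: try all 2^n contexts over the n names occurring in k."""
    ds = sorted(names(k))
    return any(holds(k, dict(zip(ds, choice)))
               for choice in product("lr", repeat=len(ds)))


d1l = ("choice", "d1", "l")
d1r = ("choice", "d1", "r")
print(consistent(("and", d1l, d1r)))           # k1 and k2 cannot coexist
print(consistent(("and", d1l, ("not", d1r))))  # k1 and not k2 can
```

The loop in `consistent` visits up to 2^n contexts for n disjunction names, which is the exponential cost per test referred to above.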
For an efficient implementation, it does not suffice to index the
constraints under the variables they refer to, but the constraints
concerning the same variable have to be indexed according to their
context descriptions in a way that reduces the number of compatibility
tests between them. One aspect of this point is the observation (also
mentioned in [Maxwell & Kaplan, 1989]) that the outcome of a rewrite
step consists of constraints with mutually incompatible context
descriptions. Of course, this should be exploited and the mutual
irrelevance of such constraints (and those resulting from them in
further rewrite steps ...) should not have to be recomputed again and
again. However, this is not easy to do, since there are a couple of
different relations in which context descriptions can stand
(incompatibility, compatibility, subsumption) that would all have to be
treated differently in such an indexing scheme.
¹⁴ We do not know whether this case can actually arise.
Another difficulty, which is closely related to the problems mentioned,
lies in the notion of disjunctive substitution. Here, all constraints
concerning a variable have to be compared with the context description
of the substitution. Some of the constraints may remain unchanged, in
some of them the variable has to be replaced, and some have to be split
into one for the old variable (but with a new context description) and
one for the new variable. The constraints obtained for the new variable
must then be unified with those that were already present for this
variable, which might involve a reorganization of the context indices
and trigger some new rewrite steps.
It is clear that the context-indexing scheme yet to be invented for
conditional constraints will have to support several rather intricate
operations. The representation given so far is indifferent to such
questions and gives no indication of an answer. It is also unclear how
the work that has been done to optimize non-disjunctive unification
algorithms (for example the almost linear solution to the union/find
problem, see e.g. [Martelli & Montanari, 1982]) could be exploited in a
simple way or how the implementation of disjunctive constraint
satisfaction could exploit a non-disjunctive unification algorithm
available in the programming language or environment (e.g. Prolog).
4.7 From Conditional Constraints to Contexted Variables
In this section we want to propose another rewrite system for feature
descriptions, which offers some simple answers to the questions
mentioned in the last section. It can be seen as an implementation of
the algorithm given above, since some of the operations which have been
used there, but were not defined (test for compatibility of contexts,
context-indexing) will be (implicitly) part of the algorithm given here.
Although in this respect the algorithm will be more detailed, it will
nevertheless be simpler.
Our method extends the method given in [Dorre & Eisele, 1989] to
feature descriptions containing negation. It is also an extension
inasmuch as we give a translation from our feature terms into
appropriate feature descriptions. Originally, the method evolved from a
prototypical Prolog implementation of a unification algorithm for named
disjunction, where extendable feature structures and path equivalences
are represented with logical variables, which are instantiated during
the unification process. Prolog provides unconditional substitution
(instantiation) of variables as a primitive operation, but it is not
easy (esp. if efficiency is important) to implement a modified variant,
such as conditional substitution. Hence we tried to find a
representation that could save us the need for doing so. We were in fact
able to find such a representation and it turns out that our approach is
simple and has a couple of additional advantages, including the fact
that in the core of our algorithm we can restrict ourselves to the
treatment of purely conjunctive context descriptions, which can be
processed more efficiently.
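
The efficiency gain from purely conjunctive context descriptions can be seen in a small Python sketch. It assumes, purely for illustration, that a conjunctive description is stored as a dictionary from disjunction names to the chosen branch ('l' or 'r'), with the empty dictionary standing for true; the helper names `conjoin` and `entails` are hypothetical.

```python
def conjoin(k1, k2):
    """Conjoin two conjunctive descriptions; return None if contradictory."""
    merged = dict(k1)
    for name, branch in k2.items():
        # setdefault keeps an existing choice, so a clash shows up here
        if merged.setdefault(name, branch) != branch:
            return None
    return merged


def entails(k1, k2):
    """k1 entails k2 if every choice required by k2 is also made in k1."""
    return all(k1.get(name) == branch for name, branch in k2.items())


print(conjoin({"d1": "l"}, {"d2": "r"}))   # compatible: both choices kept
print(conjoin({"d1": "l"}, {"d1": "r"}))   # contradictory: None
print(entails({"d1": "l", "d2": "r"}, {"d1": "l"}))
```

Under this encoding both operations take time linear in the number of disjunction names mentioned, in contrast to the general propositional case.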
4.7.1 Context-Uniqueness and Variants of Variables
The key idea is to restrict the use of variables in such a way that it is
safe to replace conditional substitution by conventional substitution.
For instance, one can easily see that there is no difference between
$CC_{x \rightarrow y}$ and $CC_{x \rightarrow y[k]}$, if for all
constraints $sc[k']$ containing the variable x, $k'$ entails $k$, i.e.
$k'$ describes only contexts also covered by $k$. Our trick will be to
require that essentially all occurrences of a variable x affect the same
set of contexts, e.g. those described by $k'$. Then every conditional
substitution $x \rightarrow y[k]$ can be replaced by the substitution
$x \rightarrow y$, if $k'$ entails $k$, especially if $k' \equiv k$. We
call this condition (which will be defined more precisely below) the
context-uniqueness of variables. We will set up the normal form and the
rewrite system in such a way that conditional substitutions of x always
happen in $k'$ and that context-uniqueness of a description is
maintained during the rewrite process.
Before we define context-uniqueness, we first observe that an
occurrence of a variable $x'$ in a conditional constraint $x \mid t[k]$
is relevant to all contexts in $k$, if $x'$ occurs outside the scope of
a disjunction in $t$, whereas this occurrence is relevant only to
contexts described by $d_i{:}l \wedge k$, if $x'$ is embedded in the
left hand side of a disjunction labeled $d_i$, and analogously for the
right hand side and for deeper embedded occurrences.
Context-uniqueness will require that each occurrence of a variable is
relevant to the same set of contexts. The relevant contexts will be
regarded as an inherent and invariant property of variables, and we will
introduce a function $Con : V \mapsto CD_c$ that maps each variable in
use to a purely conjunctive description of the contexts it is relevant
to. As a consequence of context-uniqueness, it will not be necessary to
represent context descriptions with constraints that contain a variable,
since the possible contexts of a constraint can be seen from the
variable(s) it contains. However, the constraint $\bot[k]$, expressing
inconsistency, will still need its context description.
When representing disjunctive information, we have to connect the
root variable (which is relevant to all contexts) with variables
occurring in conditional constraints without violating
context-uniqueness. In our normal form, we will use constraints of the
form $x \mid x_1 \sqcup_{d_1} x_2$ to make such connections. If $x$ is
relevant to all contexts described by some $k$, context-uniqueness will
enforce that $x_1$ and $x_2$ are relevant only to contexts in
$d_1{:}l \wedge k$ and $d_1{:}r \wedge k$, respectively. The constraint
says that $x$ and $x_1$ have to be identical in contexts described by
$k \wedge d_1{:}l$ and so do $x$ and $x_2$ in contexts described by
$k \wedge d_1{:}r$. Such constraints can be seen as bifurcations that
distribute the information attached to $x$ over the variables on the
right-hand side.
We will call $x_1$ and $x_2$ variants of $x$; to be more precise, $x_1$
will be called the $d_1{:}l$-variant and $x_2$ the $d_1{:}r$-variant of
$x$. Assume an additional constraint $x_1 \mid x_3 \sqcup_{d_2} x_4$,
then $x_3$ will be called the $d_1{:}l \wedge d_2{:}l$-variant of $x$
and so on. $x_1$ and $x_2$ (but not $x_3$ etc.) will be called direct
variants of $x$. We will (e.g. during the translation of a description
into context-unique form) refer to a variant of a variable $x$ without
having a variable name for this variant. To this end, we will use a
special notation $x/k$ to denote the $k$-variant of $x$. Such
expressions will be called contexted variables.
Definition 12 (Contexted Variables) A contexted variable is a pair
$x/k$ where $x \in V$ and $k \in CD_c$.
$V_c$ will denote the union of $V$ with the set of contexted variables.
Elements of $V_c$ will be written with capital letters
$X, Y, Z, X_1, Y_1, \ldots$
To mark the distinction, we will sometimes call the members of $V$ pure
variables.
Now, instead of accumulating constraints on the variable x which might
be effective in different contexts and could interact in complicated
ways, we can introduce new variables as variants of x and attach the
information to them.
In our constraints, we will also employ feature terms containing con-
texted variables.
Definition 13 (Contexted Feature Terms) A contexted feature term
is built according to Definition 1, but where both pure and contexted
variables may occur.
The set of contexted feature terms will be denoted by $FT_c$. We will
generalize our notation so that henceforth $s, t, t_1, \ldots$ might
also denote contexted feature terms.
The denotation of a contexted feature term in a context
$\kappa \in \{l, r\}^D$ under an assignment $\alpha \in U^V$ is defined
as for usual feature terms by adding:
\[
  [\![x/k]\!]_{\kappa,\alpha} :=
  \begin{cases}
    \{\alpha(x)\} & \text{if } \kappa \models_c k \\
    \emptyset & \text{otherwise}
  \end{cases}
\]
4.7.2 Context-Unique Feature Descriptions
Our new descriptions will have three components: a root variable $x_0$,
a set of constraints containing contexted variables and feature terms
$X \mid t$ or conditional inconsistencies $\bot[k]$, and a context
assignment, i.e. a mapping from the variables occurring in the
constraints to the set of context descriptions.
Definition 14 (Context Assignment) A context assignment $Con$ is
a partial mapping from $V$ into $CD_c$, the set of conjunctive context
descriptions. The domain of a context assignment $Con$ can be extended
to contexted variables by defining: $Con(x/k) := Con(x) \wedge k$
In order to restrict ourselves to context-unique descriptions, we have
to define the context compatibility of a feature term. This definition
is somewhat technical and the reader can skip it, since our algorithm
will produce only context-unique descriptions anyway.
Definition 15 (Context compatibility) Given a partial assignment
$Con : V \mapsto CD_c$, a contexted feature term $t$ is context
compatible to a context description $k$ with respect to $Con$, written
$t \sim_{Con} k$, according to the following conditions.
  $A \sim_{Con} k$ for arbitrary $k \in CD_c$
  $X \sim_{Con} k$ iff $Con(X) \equiv k$
  $\neg t \sim_{Con} k$ iff $t \sim_{Con} k$
  $f{:}t \sim_{Con} k$ iff $t \sim_{Con} k$
  $s \sqcap t \sim_{Con} k$ iff $s \sim_{Con} k$ and $t \sim_{Con} k$
  $s \sqcup_d t \sim_{Con} k$ iff $s \sim_{Con} k \wedge d{:}l$ and $t \sim_{Con} k \wedge d{:}r$
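
The conditions of Definition 15 translate directly into a recursive check. The following Python sketch is only an illustration under stated assumptions: terms are encoded as tagged tuples, a conjunctive context description is a frozenset of (name, branch) pairs with the empty set standing for true, and equivalence of consistent conjunctive descriptions is tested by set equality. The names `con_of` and `compatible` are hypothetical.

```python
def con_of(x, k, con):
    """Con extended to contexted variables: Con(x/k) := Con(x) and k."""
    return con[x] | k


def compatible(t, k, con):
    """Check whether term t is context compatible to k with respect to con."""
    tag = t[0]
    if tag == "sort":                      # sorts are compatible to any k
        return True
    if tag == "var":                       # ("var", x, k'); pure x has k' = {}
        return con_of(t[1], t[2], con) == k
    if tag == "not":
        return compatible(t[1], k, con)
    if tag == "feat":                      # ("feat", f, t)
        return compatible(t[2], k, con)
    if tag == "and":
        return compatible(t[1], k, con) and compatible(t[2], k, con)
    if tag == "or":                        # ("or", d, s, u): named disjunction
        _, d, s, u = t
        return (compatible(s, k | {(d, "l")}, con)
                and compatible(u, k | {(d, "r")}, con))
    raise ValueError(f"unknown term tag: {tag}")


TRUE = frozenset()
D1L = frozenset({("d1", "l")})
D1R = frozenset({("d1", "r")})
con = {"x": TRUE, "x1": D1L, "x2": D1R}

# the bifurcation x | x1 OR_d1 x2, checked against Con(x) = true
bifurcation = ("or", "d1", ("var", "x1", TRUE), ("var", "x2", TRUE))
print(compatible(bifurcation, TRUE, con))
```

In this encoding a pure variable is simply a contexted variable with an empty context, which keeps the variable case to a single comparison.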
Definition 16 (Context-unique feature descriptions) A context-unique
feature description is a triple $(x_0, CUC, Con)$ such that:
• $x_0 \in V$
• $CUC$ is a set of constraints which either have the form $\bot[k]$, where $k \in CD$, or $X \mid t$, where $X \in V_c$, $t \in FT_c$
• $Con$ is a context assignment which is defined for all variables in $CUC$
• The constraints in $CUC$ are context-unique with respect to $Con$, i.e. for every $X \mid t \in CUC$: $t \sim_{Con} Con(X)$
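The semantics of context-unique feature descriptions is given by the
satisfaction relation $\models_{Con}$ between variable assignments¹⁵ in
a context and constraints, which is parametrized with a context
assignment.
  $\kappa, \alpha \models_{Con} X \mid t$ iff $\kappa \not\models_c Con(X)$ or $\alpha(X) \in [\![t]\!]_{\kappa,\alpha}$
  $\kappa, \alpha \models_{Con} \bot[k]$ iff $\kappa \not\models_c k$
The denotation of a context-unique f-description is defined as:
\[
  [\![(x_0, CUC, Con)]\!] :=
  \{\alpha(x_0) \mid \alpha \in U^V \wedge \exists \kappa \in \{l, r\}^D :
    \forall cuc \in CUC : \kappa, \alpha \models_{Con} cuc\}
\]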
4.7.3 Translation to Context-Unique Form
We will now give a translation algorithm that computes for a given
feature description $(x_0, \{x_0 \mid t[true]\})$ an equivalent
context-unique feature description $(x_0, CUC, Con)$.
¹⁵ $\alpha$ is extended to contexted variables by defining: $\alpha(x/k) := \alpha(x)$
Decomposition The first step is to decompose constraints containing
complex feature terms into simpler constraints. We do not need new
rules to do that; we can just apply the rules (Sc1), (Sc2), (Sc3a),
and (Sc4) ... (Sc10) as long as possible. This process produces only
constraints of the form $x \mid t[k]$ or $\bot[k]$, where $k \in CD_c$.
We do not use the rules (Sc3b, c) nor those from (Mc1) ... (Mc7), since
they could introduce negation or disjunction into context descriptions
of constraints containing variables. Since this step is a subset of the
algorithm given in the last section, it will terminate.
After decomposition, we will have a feature description $(x_0, CC_1)$
equivalent to the initial description, which contains only constraints
of the following forms:
• $x \mid A[k]$ or $x \mid \neg A[k]$, where $A \in S \setminus \{\top, \bot\}$
• $x \mid y[k]$ or $x \mid \neg y[k]$, where $x, y \in V$, $x \neq y$
• $x \mid f{:}y[k]$, with $y \in V$
• $\bot[k]$
All context descriptions $k$ appearing in $CC_1$ will be purely
conjunctive and consistent. The description is not normal, since it
still contains equalities and the constraints might conflict with each
other. But before normalizing it, we will turn it into context-unique
form.
Constructing a Context-Unique Feature Description In the second
step, we construct a context-unique feature description by replacing
all occurrences of variables in conditional constraints
$x \mid t[k] \in CC_1$ by their respective $k$-variants, if
$k \not\equiv true$. We do not yet have variable names for these
variants, but we can use the contexted variables $x/k$. If
$k \equiv true$, we do not have to introduce variants; we can use the
variables themselves. Since the context descriptions of the constraints
will be implicitly encoded in their variables, we can omit them.
However, the context description of a constraint $\bot[k]$ cannot be
inferred from variables, so we keep such constraints unchanged.
Variables occurring in the original description will be regarded as
relevant to all contexts. Hence we initialize $Con$ so that all
variables appearing in $CC_1$ are mapped to true. We then get $(x_0, CUC, Con)$