
Disjunctive Unification

Andreas Eisele, Jochen Dörre
Institut für maschinelle Sprachverarbeitung
Universität Stuttgart

A Bradford Book
The MIT Press
Cambridge, Massachusetts
London, England

Contents

4 Andreas Eisele & Jochen Dörre: Disjunctive Unification
  4.1 Introduction
    4.1.1 Unification-based grammar formalisms
    4.1.2 Disjunction and Ambiguity in Unification Grammars
    4.1.3 The Processing of Ambiguous Unification Grammars
    4.1.4 Organisation of Paper
  4.2 Unification of Feature Structures with Disjunctive Information
    4.2.1 Karttunen's Approach
    4.2.2 Disjunction in [Eisele & Dörre, 1988]
    4.2.3 Representation of General Disjunction in [Kasper, 1987c]
    4.2.4 Conclusion and Motivation of a New Approach
  4.3 Disjunction Names
    4.3.1 Exporting Disjunction, Revisited
    4.3.2 Other Applications of Named Disjunction
  4.4 Feature Terms
    4.4.1 Syntax of Feature Terms
    4.4.2 Semantics of Feature Terms
    4.4.3 Differences to Smolka's Feature Terms
  4.5 Feature Clauses
    4.5.1 From Feature Terms to Sets of Constraints
    4.5.2 Normal Form and Rewrite Rules without Disjunction
  4.6 Applying the Maxwell/Kaplan Scheme
    4.6.1 Conditional Constraints
    4.6.2 Normal Form for Conditional Constraints
    4.6.3 Discussion
  4.7 From Conditional Constraints to Contexted Variables
    4.7.1 Context-Uniqueness and Variants of Variables
    4.7.2 Context-Unique Feature Descriptions
    4.7.3 Translation to Context-Unique Form
    4.7.4 Normal Form and Rewrite Rules
    4.7.5 Soundness, Completeness, and Termination
    4.7.6 An Example
    4.7.7 Algorithmic Considerations
  4.8 The Feature Graph Interpretation
  4.9 Conclusion
Bibliography

4 Andreas Eisele & Jochen Dörre: Disjunctive Unification

Abstract

This paper describes techniques for the representation and efficient unification of feature structures in the presence of disjunctive information.

For the specification of feature structures we introduce feature terms, which may contain sorts, variables, negation and named disjunction. Disjunction names allow for a compact representation and efficient management of multiple solutions, even in cases where the differences between the solutions affect different parts of the structures. We present an algorithm for the unification of such structures in the form of simplification rules for logical formulas which come with an open-world semantics. Starting from a single algorithm for the non-disjunctive case, we show how an algorithm for the disjunctive case can be derived using a general scheme originally proposed in [Maxwell & Kaplan, 1989]. We investigate an alternative representation, called context-unique feature descriptions, where a closer coupling between variables and disjunctive contexts allows major simplifications of the algorithm, which facilitate an efficient implementation.

The methods proposed here are in no way limited to the field of natural language processing, but could be useful in other applications of logic programming where the treatment of disjunctive information leads to efficiency problems within conventional approaches.

Keywords: Unification, Feature Structures, Disjunction, Constraint Satisfaction

4.1 Introduction

4.1.1 Unification-based grammar formalisms

For about a decade now, recursive feature structures have been used as a means of description in formal and computational linguistics. Among the linguistic theories and formalisms which are based upon the unification of such feature structures, or which have been implemented with their help, one can find FUG [Kay, 1979], [Kay, 1985], LFG [Kaplan & Bresnan, 1982], GPSG [Gazdar et al., 1985], CUG [Uszkoreit, 1986], UCG [Zeevat/Klein/Calder, 1987], Systemic Grammar [Kasper, 1987b], HPSG [Pollard & Sag, 1987], and many others. Feature structures and the formalisms based on their unification have properties which make them very suitable for the definition of grammars as well as for computation.

• Feature structures allow partial information about various levels of linguistic description (phonology, syntax, semantics ...) to be expressed in a modular and uniform way, and interactions between different levels can also be expressed easily.

• Such partial descriptions can be combined into more complete ones by unification of feature structures.

• Various notions of "grammar" which have been investigated in formal language theory can be generalized in a natural way by the introduction of feature structures.

• Unification formalisms have clearly defined denotational semantics. In particular, the outcome of the unification of several structures can be defined independently of the details of implementation, such as, for instance, the order of computation.

• Through the close relation of feature structures to the terms used in logic programming and to finite automata, efficient algorithms are known which make unification possible in almost linear time.
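To make the notion of unification as a combination of partial descriptions concrete, the following minimal sketch may help. It assumes a deliberately simplified encoding that is not taken from this paper: feature structures are nested Python dictionaries with atomic strings as leaves, and path equivalences (coreferences) are ignored. The names `unify` and `UnificationFailure` are hypothetical.

```python
class UnificationFailure(Exception):
    """Raised when two descriptions carry conflicting atomic values."""


def unify(a, b):
    """Unify two feature structures encoded as nested dicts (assumed encoding).

    Returns a new structure and leaves both inputs untouched.
    Coreference between paths is NOT modelled in this sketch.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                # Both descriptions constrain this feature: combine recursively.
                result[feature] = unify(result[feature], value)
            else:
                # Only b constrains this feature: simply add the information.
                result[feature] = value
        return result
    if a == b:
        return a
    raise UnificationFailure(f"clash: {a!r} vs {b!r}")


x = {"agr": {"num": "sg"}}
y = {"agr": {"per": "3"}}
z = {"cat": "np"}

# The outcome does not depend on the order of computation.
left = unify(unify(x, y), z)
right = unify(x, unify(y, z))
print(left == right)
```

Under these assumptions the result is the same whichever pair is combined first, which is the order-independence property mentioned in the list above.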

These properties not only make unification grammars attractive as formalisms for linguistic theories, but at the same time open up ways for practical applications in natural language processing. The implementations of natural language analysis systems which are directly or indirectly based upon unification grammars are innumerable, and it has been shown that unification grammars can also be successfully employed for natural language generation [Momma & Dörre, 1987], [Shieber et al., 1989].

A further advantage of the use of feature structures in computational linguistics is the close relationship to languages used for knowledge representation (see [Smolka, 1988] for a more detailed discussion). The similarities between the description languages used in both areas facilitate the design of interfaces and systems where both areas are involved.

Two different approaches have been used to give a clear semantic base to feature-unification formalisms. One is due to [Kasper & Rounds, 1986], where the meaning of feature descriptions is defined relative to the domain of feature structures, which are labeled graphs. The other approach, due to [Smolka, 1988], does not make specific assumptions about the domain, but uses an open-world semantics instead. In this paper, we will adopt the latter view and we will use the terminology given in Smolka. In particular, we will not use the term feature structure in the technical sense defined in [Kasper & Rounds, 1986], but as an informal term referring to some representation of feature information which is used in a practical implementation.

4.1.2 Disjunction and Ambiguity in Unification Grammars

Although efficient algorithms for the unification of feature structures are well-known, most of the existing implementations of unification grammar formalisms encounter increasing efficiency problems when applied to larger grammars. This can be explained by the fact that, as soon as the described fragments of natural language grow to a useful size, the need to handle a number of peculiar cases leads to the introduction of disjunctive information into the linguistic description. Usually, this goes along with the existence of ambiguous derivations for certain phrases. There are several sources of disjunctive information, including lexical ambiguity, where differing analyses are possible for a given word concerning part of speech, subcategorization for complements, morphological features, or any other information assigned to it, and structural ambiguity, introduced by different possible groupings of subphrases or different interpretations of these phrases due to disjunctive annotations of grammar rules.

Although most of these forms of ambiguity can also appear in context free grammars, they are much more troublesome for an efficient processing of unification grammars, since the techniques used in CF analysis to represent ambiguous constituent structures (structure sharing, local representation of ambiguity, see e.g. [Earley, 1970], [Graham/Harrison/Ruzzo, 1980], [Tomita, 1987]) cannot simply be generalized to the treatment of feature structures. In context free analysis, different c-structures of the same part of a sentence can be combined and can be seen, from higher nodes in the tree, as one single node whose complex internal structure does not matter. However, a corresponding combination of a set of differing feature structures is not always possible, since additional information can interact in different ways with the feature structures of this set, so that it cannot easily be treated as one unit.

Thus, unification grammars can describe more complex languages than context free grammars, and consequently recognition with unification grammars is much more complex in the worst case. Without any restriction of the formalism, the recognition problem is in the worst case not even decidable (this happens if an infinite set of different feature structures can be constructed in some cyclic derivation chain). It can be made decidable by imposing conditions such as off-line parsability (see [Pereira & Warren, 1983]) on the grammar or on the formalism, but still, the formalism can be used to state NP-hard problems (see e.g. [Kasper, 1987a]). From this point of view, we can see the principal limits of what an efficient implementation of unification grammars can attain. On the other hand, the inefficiency in actual implementations of unification formalisms does not seem to be a consequence of this fact. Often, one has the impression that a more sophisticated strategy could save much computational overhead for a given grammar, although it would of course not avoid exponential blow-up in the worst case.

4.1.3 The Processing of Ambiguous Unification Grammars

All implementations of unification-based grammar formalisms have to face the problem of an efficient treatment of ambiguity in one way or another. This can be achieved by employing heuristics to resolve ambiguity as soon as possible, by using backtracking to process different possibilities sequentially, or by trying to represent disjunctive information explicitly and pursue all possibilities in parallel.

The early resolution of disjunctions is limited to cases where a decision can be found on the basis of the information which is present locally. This is often impossible in practice, or it demands additional knowledge sources and representational layers (for example, semantic or pragmatic information or inferences where world knowledge is involved). If this method is applied in situations where not enough information for a decision exists, correct solutions can be lost.

The backtracking method is frequently applied in practice, since it finds all possible solutions in a simple and systematic way. In particular, implementations in Prolog can make use of the depth-first search algorithm which, just like unification of terms, is already a part of the programming language. However, efficiency problems appear if disjunctions cannot be resolved early by the appearance of further information. When the computation is divided into several mutually independent branches, there is a danger that almost identical work is carried out in these branches and thus the work is unnecessarily multiplied. In particular, if many mutually independent disjunctions are treated by backtracking, this leads to a combinatorial explosion of possibilities.
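The combinatorial explosion can be illustrated with a small sketch. It assumes a toy setting that is not part of the paper: ten words, each with two mutually independent readings, and a hypothetical helper `expand` that enumerates every complete choice the way a backtracking search would visit them.

```python
from itertools import product


def expand(disjunctions):
    """Enumerate every complete combination of choices.

    `disjunctions` maps a (hypothetical) word name to its alternative readings.
    Each returned dict corresponds to one branch explored by backtracking.
    """
    return [dict(zip(disjunctions, combo))
            for combo in product(*disjunctions.values())]


# Ten independent two-way ambiguities (made-up data for illustration only).
words = {f"w{i}": ("reading_a", "reading_b") for i in range(10)}
branches = expand(words)
print(len(branches))  # 2 ** 10 = 1024 branches
```

Each additional independent disjunction doubles the number of branches, even though the branches share almost all of their work.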

The quasi-parallel processing of different possibilities alone does not provide any improvement over the backtracking method. On the contrary: as long as no steps of computation are avoided, but only their temporal order is modified, the space consumption can increase drastically, since the data structures which are used in different branches have to be available at the same time. In practice, this can lead to additional time consumption (for paging, garbage collection, etc.). On the other hand, quasi-parallel processing in principle provides the opportunity to exploit similarities between different branches for a more efficient representation. The most popular method of quasi-parallel execution is the use of a table, where intermediate results are stored in order to avoid computing them repeatedly. This table is called a well-formed substring table or chart.

Basically speaking, quasi-parallel processing is already a form of explicit representation of disjunctions, albeit a very inefficient one. It can be improved by applying the techniques known from context free analysis, namely structure sharing and ambiguity packing, to the representation of feature structures. In this context, structure sharing means that parts of feature structures used more than once are represented only once. In [Karttunen & Kay, 1985] and [Pereira, 1985], very detailed proposals have already been put forward which help to decrease the amount of copying needed. It should not be overlooked that even sophisticated methods of structure sharing can, at best, avoid the copying of partial structures which is not necessary in a good implementation of the backtracking method anyway. More remarkable savings can be obtained by combining different partial structures into one disjunctive structure since, in this way, the number of different branches that have to be considered can be reduced.

4.1.4 Organisation of Paper

The paper is organized as follows: We first give an introduction to unification-based grammar formalisms and motivate the use of disjunctive information in such systems. Some general strategies that can be used for the processing of disjunctive information are discussed. In Section 4.2, we review some of the approaches to the unification of feature structures containing value disjunction and general disjunction that have been proposed in the literature. Section 4.3 motivates the introduction of named disjunction. This extension combines advantages of value disjunction with those of general disjunction and is useful both for implementation and as an extension of the formalism. Section 4.4 introduces the specification of feature structures with formulas called feature terms. These formulas use variables to express path equivalence and can express named disjunction and (classical) negation. For such feature terms, a denotational open-world semantics is defined. Section 4.5 introduces simple feature descriptions, which allow for a constraint-based specification of feature structures not containing disjunction. Both a normal form and a rewrite system for the normalization of simple descriptions are defined. Section 4.6 introduces conditional constraints, which extend feature descriptions to the disjunctive case. The normalization procedure of Section 4.5 is extended using the scheme given in [Maxwell & Kaplan, 1989] and some properties of the resulting algorithm are discussed. In Section 4.7, an alternative representation for disjunctive feature descriptions is given, based on the notion of context-uniqueness, that simplifies both the feature descriptions and the normalization rules. A translation procedure from feature terms to the new representation is defined and the computational properties of the new method are compared to those of the algorithm from Section 4.6. Section 4.8 investigates the domain of feature graphs as a possible interpretation of the formulas given in the preceding sections and shows that this interpretation is canonical, i.e. that it provides a model for every consistent description. The last section gives some prospects for possible applications and extensions of the methods discussed so far.

4.2 Unification of Feature Structures with Disjunctive Information

The first step towards a representation of disjunctive information in feature structures is to allow as the value of an attribute not only an atom or an embedded feature structure, but also a disjunction of different values. Following Kasper [Kasper, 1987a], we will call this value disjunction. Both the method sketched in [Karttunen, 1984] and the efficiency-oriented normal form (ENF) given in [Eisele & Dörre, 1988] allow for the representation of such disjunctive values. In the sequel, we will sketch some of the problems that arise during the unification of such feature structures and the solutions proposed by [Karttunen, 1984] and [Eisele & Dörre, 1988]. The straightforward approach to unification in the presence of value disjunction is to traverse the structures to be unified in the usual way and to multiply out all possibilities whenever some substructure is being unified with a disjunctive value. However, the situation gets more complicated as soon as disjunction interacts with path equivalences.

Consider the unification of the feature structures¹ shown in Fig. 4.1, where the first one contains an equivalence between the paths ⟨obj⟩ and ⟨vcomp subj⟩ (expressed by the shared variable O) and the second one a disjunctive value for the attribute vcomp. These structures represent part of the information an LFG-style lexicon would assign to the verbs "force" and "explode", respectively, where the latter has both a transitive and an intransitive reading. The unification could e.g. be performed by a parser analyzing a sentence "...he forced them to explode...". When the unification algorithm gets to the point where the disjunctive value for the attribute vcomp has to be unified with the value containing the coreference, it is not obvious what a reasonable result of this unification should look like and how it could be found. A simple enumeration of the possibilities would not be sufficient, since the first branch of the unification would lead to a different instantiation of the variable O (namely an identification with S1) than the second one, where it would be identified with S2 and where the feature animate would be restricted to the value +. Hence, the value of the obj-feature would no longer be unique, but would depend on the choice of a value for the vcomp-feature. Thus, parts of the disjunction get linked to remote parts of the resulting structure, or stated differently, disjunctive information is exported through the path equivalence.
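The export effect can be made tangible with a small sketch. It is a strong simplification and rests on assumptions of ours: feature structures are Python dictionaries, the arguments of the pred values are omitted, and the path equivalence ⟨obj⟩ = ⟨vcomp subj⟩ is modelled by letting both paths point to the same Python object. The function name `unify_force_with` is hypothetical.

```python
def unify_force_with(vcomp_alternative):
    """Combine the 'force' entry with ONE reading of 'explode'.

    The value of the shared variable O is whatever the chosen reading
    says about its subject, so it differs from branch to branch.
    """
    shared = dict(vcomp_alternative["subj"])      # value of variable O
    vcomp = dict(vcomp_alternative, subj=shared)  # <vcomp subj> is O
    return {"pred": "force", "obj": shared, "vcomp": vcomp}  # <obj> is O


# The two readings of "explode" (simplified, assumed encoding).
intransitive = {"subj": {}, "pred": "explode_i"}
transitive = {"subj": {"animate": "+"}, "pred": "explode_t", "obj": {}}

results = [unify_force_with(alt) for alt in (intransitive, transitive)]
for r in results:
    print(r["vcomp"]["pred"], "-> obj =", r["obj"])
```

The value found under obj depends on which vcomp disjunct was chosen, although obj itself lies outside the disjunction. This is exactly the situation in which a purely local value disjunction is no longer sufficient.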

    [ subj:  S
      pred:  force(S,O,V)
      obj:   O
      vcomp: V [ subj: O ] ]

  ⊓

    [ vcomp: { [ subj: S1
                 pred: explode_i(S1) ]
               [ subj: S2 [ animate: + ]
                 pred: explode_t(S2,O2)
                 obj:  O2 ] } ]

  ⇒

    [ subj:  S
      pred:  force(S,O,V)
      obj:   ??
      vcomp: { [ subj: S1
                 pred: explode_i(S1) ]
               [ subj: S2 [ animate: + ]
                 pred: explode_t(S2,O2)
                 obj:  O2 ] } ]

Figure 4.1: A Unification Involving Disjunction and Path Equivalence

¹ Capital letters are used to denote labels of f-structures (variables). The variables appearing in the values of the pred-feature are not crucial for the example, but rather added for the sake of completeness.

An implementation that modifies the data structures used for the representation of the feature structures would have trouble keeping the information from both disjuncts separate. In such representations, the law of distributivity cannot be applied freely to distribute conjunctive information over disjunctions, as long as information that originates from a disjunctive context can give rise to global effects (be it destructive modification or the instantiation of logical variables).

4.2.1 Karttunen's Approach

Karttunen's treatment of the problem is as follows. Both unifications are performed, but only if one of the unifications fails is the other one treated in the usual way. If, however, both unifications are successful, their effects on the binding of variables (or the modification of the data structures) are undone immediately afterwards. (We will call this a test-unification in the sequel.) As the resulting value for the attribute vcomp, a disjunction is used, where the disjuncts are tuples of structures which are thought of as being conjunctively connected (see Fig. 4.2). Whenever a disjunction of such tuples is resolved during later unifications, the structures in the chosen tuple have to be unified. The difficult point, however, is to keep track of whether later modifications of substructures render some disjunct containing such tuples inconsistent. In order to achieve this, Karttunen uses so-called constraints on the structures involved, which indicate which disjuncts have to be re-checked for consistency after the modification of a structure. These tests, again, are performed by test-unifications, which might exclude a disjunct when failing, but have to be undone if successful. Moreover, Karttunen's treatment does not seem to catch all possible clashes, and hence structures containing inconsistent information can remain undetected in certain circumstances (see [Bear, 1987] for details and a discussion of possible improvements).
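The idea of a test-unification can be sketched as follows. This is not Karttunen's implementation; it is a minimal illustration under our own assumptions, restricted to logical variables and atoms, in which every binding is recorded on a trail so that it can be undone. All names (`Var`, `try_unify`, `unify_with_disjunction`) are hypothetical.

```python
class Var:
    """A logical variable; `ref` is None while the variable is unbound."""

    def __init__(self, name):
        self.name = name
        self.ref = None


def deref(term):
    """Follow variable bindings until an atom or an unbound variable is reached."""
    while isinstance(term, Var) and term.ref is not None:
        term = term.ref
    return term


def unify(a, b, trail):
    """Destructively unify atoms and variables, recording new bindings on `trail`."""
    a, b = deref(a), deref(b)
    if a is b:
        return True
    if isinstance(a, Var):
        a.ref = b
        trail.append(a)
        return True
    if isinstance(b, Var):
        b.ref = a
        trail.append(b)
        return True
    return a == b


def try_unify(a, b):
    """Test-unification: report success or failure, then undo all bindings."""
    trail = []
    ok = unify(a, b, trail)
    for var in trail:
        var.ref = None
    return ok


def unify_with_disjunction(value, disjuncts):
    """Keep the disjuncts that pass the test; commit only if exactly one is left."""
    survivors = [d for d in disjuncts if try_unify(value, d)]
    if len(survivors) == 1:
        unify(value, survivors[0], [])
    return survivors


O = Var("O")
print(unify_with_disjunction(O, ["S1", "S2"]), O.ref)
```

In the call above both disjuncts survive, so O stays unbound and the work done in the two test-unifications is discarded. Any later re-check has to repeat it, which is the source of the repeated computation discussed in the following sections.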

    [ subj:  S
      pred:  force(S,O,V)
      obj:   O
      vcomp: { ( V [ subj: O ] ,
                 [ subj: S1
                   pred: explode_i(S1) ] )
               ( V [ subj: O ] ,
                 [ subj: S2 [ animate: + ]
                   pred: explode_t(S2,O2)
                   obj:  O2 ] ) } ]

Figure 4.2: [Karttunen, 1984]: Disjunction of Conjunctively Connected Tuples of Structures

4.2.2 Disjunction in [Eisele & Dörre, 1988]

The ENF-representation of feature structures the authors gave in [Eisele & Dörre, 1988] was developed in order to avoid the repeated checking of consistency of disjuncts by test-unifications, since such tests can lead to an unnecessary repetition of similar computations in cases where the structures being processed contain many disjunctions that cannot be resolved early.

Instead, our strategy was to apply the distributive law (i.e. to multiply out) in all cases where a disjunction has to be unified with some other structure, but to handle the effects of such unifications in a proper way, based on the logic calculus given in [Kasper & Rounds, 1986]. According to this logical foundation, it is e.g. safe to distribute conjunctive information over disjuncts without affecting the meaning of a representation. In the unification algorithm given in [Eisele & Dörre, 1988], the case in which such a distribution of information has a non-local effect is recognized and leads, if necessary, to an extension of the scope of the disjunction involved. In the example given above, the effect on the class of equivalent paths [⟨vcomp subj⟩, ⟨obj⟩] would result, depending on the way the input formulas are represented, in one of the results displayed in Fig. 4.3.²

Although extensions of the scope of disjunctions are limited to the minimal necessary amount, they can lead to the multiplication of obviously unrelated disjunctions which happen to be lifted to the top level of the representation due to such effects. This is aggravated by the fact that the representation in [Kasper & Rounds, 1986] does not support the description of systems of structures with common parts, but without common root, which occur frequently e.g. during parsing. The introduction of an artificial common root, under which the structures of the system are embedded with artificial labels, increases the risk of getting top-level disjunctions.

4.2.3 Representation of General Disjunction in [Kasper, 1987c]

Even if the interactions between path equivalences and value disjunction are handled in some satisfactory way, one principal weakness of the representation remains. Multiplying out each disjunction with all information which is combined conjunctively with it leads to a duplication of this information. In particular, if several disjunctive values for the same attribute are unified, all possibilities have to be multiplied out and, if none of the combinations contains inconsistent information, all have to be represented. A better approach would be to allow disjunction not only on the level of values, but to use a representation where conjunctive and disjunctive information can be mixed freely. For example, one would like to take a formula³ like f: ((s1 ⊔ s2) ⊓ s3) as a representation of the unification of f: (s1 ⊔ s2) with f: s3 if both disjuncts are compatible with s3. Of course, interactions between different parts of such an AND-OR-structure must be taken care of, since the representation no longer guarantees consistency per se.

² In this treatment, all paths in a class of equivalent paths besides the representative of this class carry so-called non-local path expressions referring to the representative. The result of the unification in the example depends on the choice of the representative.

³ We will always use the symbols ⊓ and ⊔ for conjunction (unification) and disjunction of feature terms, since ∧ and ∨ will be used for other purposes below.

  Case 1:

    [ subj:  []
      pred:  force(⟨subj⟩,⟨obj⟩,⟨vcomp⟩)
      obj:   ⟨vcomp subj⟩
      vcomp: { [ subj: []
                 pred: explode_i(⟨vcomp subj⟩) ]
               [ subj: [ animate: + ]
                 pred: explode_t(⟨vcomp subj⟩,⟨vcomp obj⟩)
                 obj:  [] ] } ]

  Case 2:

    { [ subj:  []
        pred:  force(⟨subj⟩,⟨obj⟩,⟨vcomp⟩)
        obj:   []
        vcomp: [ subj: ⟨obj⟩
                 pred: explode_i(⟨vcomp subj⟩) ] ]

      [ subj:  []
        pred:  force(⟨subj⟩,⟨obj⟩,⟨vcomp⟩)
        obj:   [ animate: + ]
        vcomp: [ subj: ⟨obj⟩
                 pred: explode_t(⟨vcomp subj⟩,⟨vcomp obj⟩)
                 obj:  [] ] ] }

Figure 4.3: Two Possible Results in [Eisele & Dörre, 1988]

This way of representing general disjunction is used in [Kasper, 1987c], where on the outermost level an AND-OR-tree is allowed, whose leaves are ordinary (non-disjunctive) representations of feature structures. Fig. 4.4 shows what an AND-OR-tree representation for the example given above would look like according to [Kasper, 1987c]. In the extreme case, this representation can be exponentially more succinct than is possible with value disjunction alone, since different disjunctions are not multiplied out, but are accumulated as such. However, contrary to the logical calculus given in [Kasper & Rounds, 1986], the use of disjunction is restricted to the top level in Kasper's implementation. This makes it possible to employ an ordinary unification algorithm which does not have to care about disjunction. On the other hand, it increases the cost of representing deeply embedded disjunctive values and makes it more difficult to maintain consistency during the unification of such AND-OR-trees, since unrelated parts of the tree also have to be taken into account.

    AND { [ subj:  S
            pred:  force(S,O,V)
            obj:   O
            vcomp: V [ subj: O ] ]

          OR { [ vcomp: [ subj: S1
                          pred: explode_i(S1) ] ]

               [ vcomp: [ subj: S2 [ animate: + ]
                          pred: explode_t(S2,O2)
                          obj:  O2 ] ] } }

Figure 4.4: Representation of Disjunctive Information in an AND-OR-Tree

The unification of two such AND-OR-trees is pursued in three steps. First, the non-disjunctive (fixed) parts of the structures are unified (using a non-disjunctive algorithm). In a second step, disjunctions are pruned by removing all disjuncts which are incompatible with the fixed part found so far. Here, compatibility is determined by the use of test-unifications, i.e. unifications without remaining effect on the arguments. In the last step, combinations of the remaining disjunctions are tried out for each disjunct to determine if there is actually a possibility where this disjunct is used. This step, again, makes heavy use of test-unifications. In cases where most of the disjunctions can be ruled out during a unification, Kasper's algorithm has the advantage that the inconsistencies are very likely to be found in the earlier steps, so that they do not cause trouble in the most expensive third step. In principle, however, the algorithm needs an exponential amount of time to test for consistency in each unification, which becomes a problem if more than a few disjunctions remain unresolved during several unifications. Even the locality of the information cannot be exploited, since partial structures which do not stand in relation to each other have to be combined every time.
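The three steps described above can be sketched in a few lines. This is only an illustration under simplifying assumptions of ours, not Kasper's algorithm as published: descriptions are flat dictionaries mapping a path to an atom, a top-level disjunction is a list of such dictionaries, and the promotion of disjunctions with a single remaining disjunct into the fixed part is left out. The names `unify` and `and_or_unify` are hypothetical.

```python
from itertools import product


def unify(a, b):
    """Unify two flat descriptions (dict: path -> atom); None signals a clash."""
    out = dict(a)
    for path, value in b.items():
        if out.setdefault(path, value) != value:
            return None
    return out


def and_or_unify(fixed_a, disj_a, fixed_b, disj_b):
    # Step 1: unify the non-disjunctive (fixed) parts.
    fixed = unify(fixed_a, fixed_b)
    if fixed is None:
        return None
    # Step 2: prune disjuncts that are incompatible with the fixed part.
    disjunctions = [[d for d in disj if unify(fixed, d) is not None]
                    for disj in disj_a + disj_b]
    if any(not disj for disj in disjunctions):
        return None
    # Step 3: keep a disjunct only if some complete combination using it is
    # consistent. This enumerates all combinations, hence exponential cost.
    consistent = []
    for combo in product(*disjunctions):
        acc = fixed
        for d in combo:
            acc = unify(acc, d)
            if acc is None:
                break
        else:
            consistent.append(combo)
    if not consistent:
        return None
    pruned = [[d for d in disj if any(combo[i] is d for combo in consistent)]
              for i, disj in enumerate(disjunctions)]
    return fixed, pruned


result = and_or_unify(
    {"cat": "v"}, [[{"num": "sg"}, {"num": "pl"}]],
    {"per": "3"}, [[{"num": "sg", "case": "nom"}, {"cat": "n"}]],
)
print(result)
```

In this made-up example the disjunct with cat: n is already removed in the second step, whereas the disjunct num: pl can only be excluded in the third step, after the combinations have been tried out.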

Kasper acknowledges this problem and proposes two improvements. The first is to restrict the application of the last step to situations where the disappearance of disjuncts is very likely (so-called strategical moments). Secondly, he proposes to employ an indexing scheme on the structures in the AND-OR-tree in such a way that only those disjunctions that can actually interact with the modified parts have to be re-checked after a modification. This could eliminate some of the problems that arise from the restriction to top-level disjunction; however, he does not elaborate upon this proposal in detail.

4.2.4 Conclusion and Motivation of a New Approach

The three approaches described so far can be characterized according to two criteria: a) the kind of disjunction they can handle, and b) the method they use for consistency checking. Both Karttunen's method and our ENF-representation restrict themselves to value disjunction, whereas in Kasper's approach the use of disjunction is not restricted to a single attribute or path. Both Karttunen's and Kasper's methods rely on test-unifications, i.e. they unify feature structures and undo the effect of the unification afterwards, which has the consequence that they might often have to repeat similar computations in later steps. In contrast, the ENF-unification algorithm always tries to keep the results of intermediate computations as long as a later use seems possible.

Given these criteria, it appears that a method which is able to represent general disjunction locally and which does not rely on test-unifications would be an interesting alternative. Such an approach should attempt to represent the result of intermediate computations in such a way that it is always available in later steps. However, this has to be done in such a way that the computations which are performed for different branches of a disjunction do not interact in an unwanted way and that as little structure as possible has to be copied.


4.3 Disjunction Names

4.3.1 Exporting Disjunction, Revisited

Consider once more the example from Fig. 4.1, where the unification of a substructure with different disjuncts has different effects on the binding of the variable O. Another way to represent the result would be to bind this variable to a disjunctive value. However, the disjunction in the value of O must be synchronized with the disjunctive value for the attribute vcomp, i.e. it is not legal to choose the first branch in one of the disjunctions and the second one in the other. To express this synchronization by position in the disjunction, we give both disjunctions the same index d1. Such an index then stands for a global choice of one of the positions in all disjunctions carrying this index. We then get the situation shown in Fig. 4.5.

    [ subj:  S
      pred:  force(S,O,V)
      obj:   O = d1{ S1, S2 }
      vcomp: V = d1{ [ subj: S1
                       pred: explode_i(S1) ] ,
                     [ subj: S2 [ animate: + ]
                       pred: explode_t(S2,O2)
                       obj:  O2 ] } ]

Figure 4.5: Disjunction with Index

This way of representing the non-local effects of disjunctive values has a couple of advantages:

• If the scope of a disjunction has to be extended by an "export" of disjunctive information through a path equivalence, the extension is limited to exactly the necessary amount.

• The representation is capable of expressing disjunctive information concerning more than one path, and is not limited to value disjunction in the usual sense.

• Nevertheless, disjunctive information is represented locally as disjunctive values of certain features, and consequently unifications concerning other features do not have to touch the disjunctive information at all. We do not need global consistency checking as in Kasper's algorithm.

• Once a scheme for the correct treatment of labeled disjunctions is implemented, it can be used to decrease the scope of disjunctions even further. For instance, the current example could be represented as shown in Fig. 4.6, where arbitrary values embedded on paths other than ⟨obj⟩, ⟨vcomp subj animate⟩, ⟨vcomp pred⟩ and ⟨vcomp obj⟩ could be added without any interaction with the disjunction d1.

    [ subj:  S
      pred:  force(S,O1,V)
      obj:   O1
      vcomp: V [ subj: O1 [ animate: d1{ X, + } ]
                 pred: d1{ explode_i(O1), explode_t(O1,O2) }
                 obj:  d1{ NONE, O2 } ] ]

Figure 4.6: Disjunctive Information More Deeply Embedded

To summarize, one could say that the possibility of distributing disjunctions with the same index over feature structures combines the advantages of all representations discussed so far, if we succeed in giving a unification algorithm for this representation.

4.3.2 Other Applications of Named Disjunction

The possibility of identifying by name the choice that is expressed in disjunctive formulas allows for a clear and concise statement of dependencies between different parts of a feature structure. For instance, we might write a formula

\[ \text{syn} : \text{arg} : \text{case} : (\text{dat} \sqcup_{d_1} \text{acc}) \;\sqcap\; \text{sem} : \text{rel} : (\text{stat\_in} \sqcup_{d_1} \text{dir\_in}) \]

to express the fact that the semantic interpretation of the German preposition "in" depends on the grammatical case of the following noun phrase. The directional reading (translated as "into") goes along with the accusative case of its argument, whereas for the stative reading (translated as "in"), the argument stands in the dative case.
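The synchronizing effect of a shared disjunction name can be illustrated with a minimal Python sketch. The dictionary encoding of the description, the tuple encoding `(name, left, right)` of a named disjunction, and the helper names `resolve` and `readings` are assumptions made for this illustration only; they are not part of the formalism.

```python
from itertools import product

def resolve(value, context):
    """Pick the branch selected by `context` if `value` is a named disjunction."""
    if isinstance(value, tuple):              # hypothetical encoding: (name, left, right)
        name, left, right = value
        return left if context[name] == "l" else right
    return value

def readings(description, names):
    """Return one fully resolved reading per global choice for `names`."""
    out = []
    for branches in product("lr", repeat=len(names)):
        context = dict(zip(names, branches))  # one branch per disjunction name
        out.append({path: resolve(v, context) for path, v in description.items()})
    return out

# The German preposition "in": both disjunctions carry the same name d1.
german_in = {
    "syn:arg:case": ("d1", "dat", "acc"),
    "sem:rel": ("d1", "stat_in", "dir_in"),
}
result = readings(german_in, ["d1"])
```

Because both disjunctions share the name d1, only two readings arise (dative with stative, accusative with directional) instead of the four that two independent disjunctions would yield.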


4.4 Feature Terms

Before we can give a unification algorithm for feature structures containing labeled disjunction, we have to introduce a language for the specification of such structures and define its exact meaning. In defining both this language and its model-theoretic semantics we borrow several definitions from [Smolka, 1988]; however, we will generalize the representation given there to support disjunction names.

The expressions of the language are so-called feature terms, where each feature term describes a set of possible feature structures. The language also allows for negative information (including negated path equivalence) and the use of sort symbols, on which some semi-lattice is defined. We will use variables to express (both positive and negative) path equivalence, since they simplify both the use of the formalism (by enabling clear and concise descriptions) and its computational processing (by replacing explicit statements of classes of equivalent paths).

On the other hand, our formalization has some minor restrictions, mainly to keep the presentation simple. For instance, it does not provide a special notation for path equivalence, which can always be expressed using variables, nor the possibility of negating complex subformulae, since such negation can always be brought to the innermost level of embedding. The use of disjunction is limited to named disjunction, i.e. even for disjunctions that appear only once in a formula, a label has to be introduced. We also limit ourselves to binary disjunction. These restrictions do not decrease the expressive power of the formalism, since it is straightforward to rewrite formulas containing such constructs into equivalent formulas of our syntax in linear time.

One aspect under which our treatment differs slightly from Smolka's is that we introduce a special sort symbol NONE in order to express the non-existence of features. The differences are discussed in section 4.4.3 in more detail.

4.4.1 Syntax of Feature Terms

For the following we assume a signature containing the following sets:

• a set $S$ of sort symbols, which forms a lower semilattice with respect to the partial order $\leq$, i.e. the greatest lower bound (GLB) of two sorts is always a sort in $S$. $\top$ and $\bot$ are the greatest and least element, respectively. We use capital letters $A, B, C, \ldots$ for sort symbols.

• a set $Sg \subseteq S$ of so-called singleton sorts. The GLB of a singleton sort with any other sort yields either $\bot$ or the singleton sort itself, i.e. there is no sort smaller than a singleton sort except $\bot$. We use lowercase letters $a, b, c, \ldots$ for singleton sort symbols.

• a special sort NONE $\in Sg$, which is incomparable to any other sort besides $\top$ and $\bot$.

• a set $F$ of feature symbols. Letters $f, g, h, \ldots$ will denote feature symbols.

Additionally we assume:

• an infinite set $V$ of variables, written $x, y, z, x_1, y_1, \ldots$

• an infinite set $D$ of disjunction names, written $d, d_1, d_2, \ldots$

The sets $S$, $F$, $V$ and $D$ are pairwise disjoint.

Definition 1 (Feature Terms) We define feature terms with variables, simple negation and named disjunction by the context-free production rule given in Fig. 4.7. The set of feature terms is called $FT$. The letters $s, t, t_1, \ldots$ will always denote feature terms.

\[
\begin{array}{rcll}
s, t & \to  & A            & \text{a sort} \\
     & \mid & \neg A       & \text{its complement} \\
     & \mid & x            & \text{a variable} \\
     & \mid & \neg x       & \text{its complement} \\
     & \mid & f : s        & \text{selection} \\
     & \mid & s \sqcap t   & \text{conjunction (intersection)} \\
     & \mid & s \sqcup_d t & \text{named disjunction (union)}
\end{array}
\]

Figure 4.7: The Syntax of Feature Terms
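The grammar of Fig. 4.7 maps directly onto an algebraic data type. The following Python sketch is one possible encoding; the class names and the helper `disjunction_names` are hypothetical and chosen for this illustration only, and the template terms of the running example are replaced by plain sorts.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Sort:
    name: str

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Not:
    arg: Union[Sort, Var]        # negation is restricted to sorts and variables

@dataclass(frozen=True)
class Sel:
    feature: str
    term: "Term"

@dataclass(frozen=True)
class And:
    left: "Term"
    right: "Term"

@dataclass(frozen=True)
class Or:
    name: str                    # disjunction name, e.g. "d1"
    left: "Term"
    right: "Term"

Term = Union[Sort, Var, Not, Sel, And, Or]

def disjunction_names(t: Term) -> set:
    """Collect all disjunction names occurring in a feature term."""
    if isinstance(t, Sel):
        return disjunction_names(t.term)
    if isinstance(t, And):
        return disjunction_names(t.left) | disjunction_names(t.right)
    if isinstance(t, Or):
        return {t.name} | disjunction_names(t.left) | disjunction_names(t.right)
    return set()                 # Sort, Var, Not contain no disjunction

# A disjunction in the style of the running example (templates simplified to sorts).
example = Or(
    "d1",
    And(Sel("pred", Sort("explode_i")), Sel("obj", Sort("NONE"))),
    And(Sel("pred", Sort("explode_t")),
        And(Sel("subj", Sel("animate", Sort("+"))), Sel("obj", Var("x_O2")))),
)
names = disjunction_names(example)
```

Typing the argument of `Not` as `Union[Sort, Var]` mirrors the syntactic restriction that only sorts and variables may be negated.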

Using this notation, we can now describe the feature structures depicted in Fig. 4.5 and Fig. 4.6 using the feature term in Fig. 4.8.⁴

⁴ The terms $\text{force}(x_S, x_{O_1}, x_V)$, $\text{explode}_i(x_{O_1})$ and $\text{explode}_t(x_{O_1}, x_{O_2})$ can be thought of as abbreviations (templates) for feature terms whose internal structure does not matter here. These terms might involve the variables mentioned as parameters.


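\[
\begin{array}{l}
\text{subj} : x_S \\
\sqcap\ \text{pred} : \text{force}(x_S, x_{O_1}, x_V) \\
\sqcap\ \text{obj} : x_{O_1} \\
\sqcap\ \text{vcomp} : (\ \text{subj} : x_{O_1} \\
\qquad \sqcap\ (\ (\ \text{pred} : \text{explode}_i(x_{O_1}) \sqcap \text{obj} : \text{NONE}) \\
\qquad\qquad \sqcup_{d_1} \\
\qquad\qquad (\ \text{pred} : \text{explode}_t(x_{O_1}, x_{O_2}) \\
\qquad\qquad\quad \sqcap\ \text{subj} : \text{animate} : + \\
\qquad\qquad\quad \sqcap\ \text{obj} : x_{O_2})))
\end{array}
\]

Figure 4.8: An Example for a Feature Term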

4.4.2 Semantics of Feature Terms

We can now define the semantics of our feature terms as follows. We require an interpretation of a signature to be a pair $(U, I)$ of a universe of interpretation and an interpretation function such that:

• $\top^I = U$

• $\bot^I = \emptyset$

• for all sorts $A, B$: $\mathrm{GLB}(A, B)^I = A^I \cap B^I$

• singleton sorts are mapped onto singleton sets

• for every feature $f$: $f^I$ is a function $U \to U$

• if $a$ is a singleton sort and $f$ is a feature symbol, then $f^I$ maps $a^I$ into $\text{NONE}^I$.⁵

⁵ This restriction could be generalized quite naturally to non-singleton sorts, for which only certain features make sense. We could e.g. require that $\text{phonology}^I$ maps $\text{agreement-value}^I$ into $\text{NONE}^I$ if an appropriate definition tells us so.

When interpreting a feature term with variables and named disjunctions, we have to make sure that the same value is assigned to each occurrence of a variable and that the same branch is chosen for each occurrence of a named disjunction. To achieve this, we introduce variable assignments and disjunctive contexts. Variable assignments assign to each variable some element of the universe. The idea behind disjunctive contexts is that they assign to each disjunction name the branch that has to be taken for this disjunction and hence specify a possible interpretation of a formula with named disjunction. Since we limit ourselves to binary disjunctions, a branch of a disjunction can be specified by one of the symbols $l$ or $r$.

Definition 2 ($U$-Assignment) A $U$-assignment is an element of $U^V$, i.e. a function from $V$ to $U$. The symbol $\alpha$ will always denote a $U$-assignment.

Definition 3 (Context) A context is an element of $\{l, r\}^D$, i.e. a function from $D$ to the set $\{l, r\}$. The symbols $\kappa$, $\kappa'$, etc. will always denote contexts.

For a given interpretation, we define the denotation of a feature term in a context $\kappa \in \{l, r\}^D$ under an assignment $\alpha \in U^V$ as shown in Fig. 4.9.

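\[
\begin{array}{rcl}
[\![A]\!]_{\kappa,\alpha} & := & A^I \\
[\![x]\!]_{\kappa,\alpha} & := & \{\alpha(x)\} \\
[\![\neg s]\!]_{\kappa,\alpha} & := & U \setminus [\![s]\!]_{\kappa,\alpha} \\
[\![f : s]\!]_{\kappa,\alpha} & := & \{ a \in U \mid f^I(a) \in [\![s]\!]_{\kappa,\alpha} \} \\
[\![s \sqcap t]\!]_{\kappa,\alpha} & := & [\![s]\!]_{\kappa,\alpha} \cap [\![t]\!]_{\kappa,\alpha} \\
[\![s \sqcup_d t]\!]_{\kappa,\alpha} & := &
  \begin{cases}
    [\![s]\!]_{\kappa,\alpha} & \text{if } \kappa(d) = l \\
    [\![t]\!]_{\kappa,\alpha} & \text{if } \kappa(d) = r
  \end{cases}
\end{array}
\]

Figure 4.9: Denotation of Feature Terms in context $\kappa$ under assignment $\alpha$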

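We can leave out the reference to an assignment and define the denotation of a feature term in a context, or without reference to a context, as follows:

\[
[\![s]\!]_{\kappa} := \bigcup_{\alpha \in U^V} [\![s]\!]_{\kappa,\alpha}
\qquad\qquad
[\![s]\!] := \bigcup_{\kappa \in \{l,r\}^D} [\![s]\!]_{\kappa}
\]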
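The denotation function can be read as a recursive evaluator. The following Python sketch evaluates it over a small finite interpretation. The tuple encoding of terms, the toy universe, and the convention that every feature value not listed explicitly is the element `"none"` (standing for the single member of the denotation of NONE) are assumptions made for this illustration.

```python
def den(term, kappa, alpha, U, sorts, feats):
    """Denotation of `term` in context `kappa` under assignment `alpha`."""
    tag = term[0]
    if tag == "sort":
        return set(sorts[term[1]])
    if tag == "var":
        return {alpha[term[1]]}
    if tag == "not":
        return U - den(term[1], kappa, alpha, U, sorts, feats)
    if tag == "sel":
        inner = den(term[2], kappa, alpha, U, sorts, feats)
        # features are total functions: unlisted arguments map to "none"
        return {a for a in U if feats[term[1]].get(a, "none") in inner}
    if tag == "and":
        return (den(term[1], kappa, alpha, U, sorts, feats)
                & den(term[2], kappa, alpha, U, sorts, feats))
    if tag == "or":
        _, d, s, t = term
        chosen = s if kappa[d] == "l" else t      # branch fixed by the context
        return den(chosen, kappa, alpha, U, sorts, feats)
    raise ValueError(tag)

# Toy interpretation: three prepositional phrases p1..p3 plus atomic values.
U = {"p1", "p2", "p3", "dat", "acc", "stat", "dir", "none"}
sorts = {name: {name} for name in ("dat", "acc", "stat", "dir", "none")}
feats = {
    "case": {"p1": "dat", "p2": "acc", "p3": "dat"},
    "rel": {"p1": "stat", "p2": "dir", "p3": "dir"},
}
term = ("and",
        ("sel", "case", ("or", "d1", ("sort", "dat"), ("sort", "acc"))),
        ("sel", "rel", ("or", "d1", ("sort", "stat"), ("sort", "dir"))))

left = den(term, {"d1": "l"}, {}, U, sorts, feats)
right = den(term, {"d1": "r"}, {}, U, sorts, feats)
overall = left | right            # union over all contexts
```

The element `p3` (dative case with directional reading) lies in neither contextual denotation, so it is excluded from the overall denotation, although it would be admitted if the two disjunctions were unnamed.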


We will call a feature term consistent iff it has a non-empty denotation for some interpretation. A feature term which has an empty denotation in every interpretation is called inconsistent.

The reader might wonder why we restricted the use of negation in feature terms, whereas the semantics of negation is defined without restriction. The reason is that this helps us to avoid some unnecessary problems due to the fact that negation of complex feature terms containing variables or named disjunctions does not have the intuitively expected meaning. According to our semantics, a term $\neg(t_1 \sqcup_d t_2)$ turns out to be equivalent to $\neg t_1 \sqcup_d \neg t_2$ and not, as expected, to $\neg t_1 \sqcap \neg t_2$. To get the expected meaning, we would have to introduce existential quantification over variables and disjunctions before negating a feature term containing them. It is not yet clear how such existential quantification could be implemented.
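The equivalence of a negated named disjunction with the named disjunction of the negated branches follows directly from the clauses for negation and named disjunction in the denotation function. Writing $\kappa$ for the context and $\alpha$ for the assignment, a short derivation under these definitions is:

```latex
\begin{align*}
[\![\neg(t_1 \sqcup_d t_2)]\!]_{\kappa,\alpha}
  &= U \setminus [\![t_1 \sqcup_d t_2]\!]_{\kappa,\alpha} \\
  &= \begin{cases}
       U \setminus [\![t_1]\!]_{\kappa,\alpha} & \text{if } \kappa(d) = l \\
       U \setminus [\![t_2]\!]_{\kappa,\alpha} & \text{if } \kappa(d) = r
     \end{cases} \\
  &= \begin{cases}
       [\![\neg t_1]\!]_{\kappa,\alpha} & \text{if } \kappa(d) = l \\
       [\![\neg t_2]\!]_{\kappa,\alpha} & \text{if } \kappa(d) = r
     \end{cases} \\
  &= [\![\neg t_1 \sqcup_d \neg t_2]\!]_{\kappa,\alpha}
\end{align*}
```

The complement is taken after the context has already fixed a branch, which is why the other branch never enters the result.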

4.4.3 Differences to Smolka's Feature Terms

It might be important to point out some differences between our feature terms and those proposed in [Smolka, 1988]. Our treatment of definedness and undefinedness of features differs from Smolka's. We assume that the interpretation of each feature symbol is a total function defined on the universe of the interpretation. Instead of saying that such a function is undefined for a certain individual, we assume that this function maps this individual onto the (only) member of $\text{NONE}^I$. This special object itself is a fixpoint for all functions associated with the features. This modification simplifies a couple of technical details, while keeping the conditions for consistency essentially unchanged.⁶ Among other things, this modification will allow us to represent disjunction on a deeper level of embedding in some cases. (See Fig. 4.6 for an example.) Another advantage is that we do not need disjunction in order to express undefinedness of a path without implying definedness of its prefixes. (The term $f : g : h : \text{NONE}$ would have to be written as $\neg f : \top \sqcup f : \neg g : \top \sqcup f : g : \neg h : \top$ otherwise.)

⁶ As far as we can see, the only difference is that $f : \top$ is equivalent to $\top$ in our treatment, whereas it is more specific than $\top$ in Smolka's logic. To get the exact counterpart of Smolka's interpretation of $f : \top$ in our syntax, we would have to write $f : \neg\text{NONE}$.

For the sake of simplicity, we did not introduce a special notation for agreement or disagreement of paths, since we can use variables and their negation to express the same constraints. A term $p \downarrow q$ expressing equivalence (including definedness) of the paths $p$ and $q$ can be written as $p : (x \sqcap \neg\text{NONE}) \sqcap q : x$, and the disagreement $p \uparrow q$ (including definedness of both paths) as $p : (x \sqcap \neg\text{NONE}) \sqcap q : (\neg x \sqcap \neg\text{NONE})$. However, the additional constraints requiring definedness can of course be omitted at will (in which case disagreement requires that at least one of the paths must be defined).

Since our disjunctions carry names, not all the logical equivalences or tautologies known from propositional logic apply to them. For instance, we may not simplify the formula $\top \sqcup_{d_i} t$ to $\top$ unless all left-hand sides of all disjunctions labeled $d_i$ are less informative than the respective right-hand sides. Similarly, labeled disjunction is not commutative in the usual sense if the name of the affected disjunction appears more than once in the overall formula. Again, we have to swap all occurrences of a disjunction with the same name in order to maintain the semantics of a feature term. However, there are not only restrictions. Named disjunctions also allow for tautologies that are not valid for ordinary disjunction. While distribution of a conjunct over a disjunction and vice versa works as usual, the rule for distributing a disjunct over a conjunction can be generalized. A term of the form $t_1 \sqcup_d (t_2 \sqcap t_3)$ can be replaced by $(t_1 \sqcup_d t_2) \sqcap (\top \sqcup_d t_3)$ or by $(\top \sqcup_d t_2) \sqcap (t_1 \sqcup_d t_3)$, since the label $d$ of the disjunction excludes the possibility of choosing one of $t_2$ and $t_3$ without choosing the other.

Even if these differences to the standard approaches might look disadvantageous at first sight, they turn out to be a reasonable price, given the other advantages of naming disjunction.
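The generalized distribution rule for named disjunction can be checked by brute force on a small universe. The sketch below is an illustration only: it treats the three terms as variable-free, so that each is represented simply by its denotation (a subset of a two-element universe), and it models a named disjunction by selecting a branch. The function names are hypothetical.

```python
from itertools import combinations, product

U = frozenset({0, 1})
SUBSETS = [frozenset(c) for r in range(len(U) + 1)
           for c in combinations(sorted(U), r)]

def named(branch, left, right):
    """Denotation of a named disjunction once the context has fixed `branch`."""
    return left if branch == "l" else right

def law_holds_named():
    # t1 OR_d (t2 AND t3)  ==  (t1 OR_d t2) AND (TOP OR_d t3), in every context
    return all(
        named(b, t1, t2 & t3) == named(b, t1, t2) & named(b, U, t3)
        for t1, t2, t3 in product(SUBSETS, repeat=3)
        for b in "lr"
    )

def law_holds_unnamed():
    # The same shape with ordinary union instead of named disjunction
    return all(
        t1 | (t2 & t3) == (t1 | t2) & (U | t3)
        for t1, t2, t3 in product(SUBSETS, repeat=3)
    )
```

The named version holds for all inputs, while the unnamed version fails, for instance when the first and third sets are empty and the second is not.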

4.5 Feature Clauses

Since feature terms are a very useful way to encode linguistic specifications, they might be employed directly as a part of a grammar formalism. However, the computational mechanisms needed for an implementation are best described in terms of constraints over variables. We introduce a relational language similar to Smolka's set descriptions, a normal form where inconsistencies become obvious, and simplification rules that allow us to transform a given specification into this normal form. Unlike Smolka, we use this language also to express descriptions containing disjunctive information.

4.5.1 From Feature Terms to Sets of Constraints

In this section, we will develop a constraint calculus for feature terms without disjunction which will be extended in different ways in the following sections. In order to keep things simple and perspicuous, we do not want to deal with many different types of constraints. Hence, we will assume only one type that relates a variable with a feature term describing possible values of this variable, and another that indicates inconsistency. Such constraints will be called simple constraints.

Definition 4 (Simple Constraint) A simple constraint is either a pair $x \mid t$ where $x \in V$ and $t \in FT$, or the symbol $\bot$. $\mathcal{SC}$ is the set of simple constraints, and $sc, sc_1, \ldots$ will always denote elements of $\mathcal{SC}$.

Definition 5 (Satisfaction of Simple Constraints) For a given interpretation we will say that a simple constraint $sc$ is satisfied in a context $\kappa$ under an assignment $\alpha$ (written $\kappa, \alpha \models sc$) iff $sc$ has the form $x \mid t$ and $\alpha(x) \in [\![t]\!]_{\kappa,\alpha}$. The simple constraint $\bot$ is never satisfied for arbitrary $\kappa$ and $\alpha$. A set of simple constraints is satisfied in a context $\kappa$ under an assignment $\alpha$ if each of its members is satisfied in $\kappa$ under $\alpha$.

Definition 6 (Simple Rooted Feature Description) A simple rooted f-description is a pair $(x_0, SC)$, where $x_0 \in V$ and $SC \subseteq \mathcal{SC}$. The denotation of a simple rooted f-description is defined as follows:

\[
[\![(x_0, SC)]\!] := \{ \alpha(x_0) \mid \text{there are } \alpha \in U^V,\ \kappa \in \{l, r\}^D \text{ such that for every } sc \in SC : \kappa, \alpha \models sc \}
\]

We can easily verify that for every feature term $t$ not containing the variable $x_0$, we can get an equivalent feature description $(x_0, \{x_0 \mid t\})$, i.e.:

\[ [\![t]\!] = [\![(x_0, \{x_0 \mid t\})]\!] \]

4.5.2 Normal Form and Rewrite Rules without Disjunction

One reason for switching from terms to descriptions is the fact that for the definition of a normal form and of rewrite rules for the normalization we can exploit the work done by [Smolka, 1988]. Another reason is that it is easy to apply the scheme given in [Maxwell & Kaplan, 1989] to such a rewrite system. The idea of a normal form for constraints is to restrict the syntax in such a way that inconsistent information cannot be hidden somewhere among different constraints, but has to show up explicitly. Hence, we have to consider all possible kinds of contradiction and to exclude them syntactically. The normal forms both for simple constraints and for simple rooted f-descriptions are essentially the same as in [Smolka, 1988], the only differences resulting from our different treatment of non-definedness:

A simple constraint is called normal if it has one of the following forms:

• $x \mid A$ or $x \mid \neg A$, where $A \in S \setminus \{\top, \bot\}$

• $x \mid \neg y$, where $x, y \in V$, $x \neq y$

• $x \mid f : y$, with $y \in V$

• $\bot$

Stated the other way, a non-normal constraint has one of the following forms:

(1) $x \mid \top$; (2) $x \mid \bot$; (3) $x \mid y$; (4) $x \mid \neg\top$; (5) $x \mid \neg\bot$; (6) $x \mid \neg x$; (7) $x \mid f : t$, where $t \notin V$; (8) $x \mid t_1 \sqcup_d t_2$; or (9) $x \mid t_1 \sqcap t_2$.

A simple rooted f-description $(x_0, SC)$ is called normal if $SC$ is a set of normal simple constraints which satisfies the following conditions:

1. If $SC$ contains $\bot$, it does not contain any other constraint.

2. If $SC$ contains $x \mid A$ and $x \mid B$ with $A, B \in S$, then $A = B$.

3. If $SC$ contains $x \mid a$ and $x \mid t$ with $a \in Sg$, then $t = a$.

4. If $SC$ contains $x \mid A$ and $x \mid \neg B$, then $A$ is not a subsort of (or equal to) $B$, and $\mathrm{GLB}(A, B) \neq \bot$.

5. If $SC$ contains $x \mid \neg A$ and $x \mid \neg B$, then $A$ is not a proper subsort of $B$.

6. If $SC$ contains $x \mid f : y$ and $x \mid f : z$, then $y = z$.

We know that every normal simple feature description $(x_0, SC)$ with $SC \neq \{\bot\}$ has a non-empty denotation in some interpretation $(U, I)$ (see [Smolka, 1988] for a proof).

For each way a simple rooted f-description could fail to be normal, one of the following rules can rewrite the description into an equivalent one which does not contain the conflict with normal form. For ease of notation, we write $sc \,\&\, SC$ to denote $\{sc\} \cup SC$ where $sc \notin SC$. $SC_{x \to y}$ is the set of constraints obtained from $SC$ where all occurrences of the variable $x$ have been substituted by $y$. Since we do not want to substitute the root variable away, the rewrite rules carry a reference to this variable. The rules that handle single non-normal constraints are numbered (Ss1) ... (Ss8), whereas the rules (Ms1) ... (Ms6) handle cases where two normal simple constraints conflict with one of the co-occurrence conditions. The resulting rules are given in Fig. 4.10.⁷

For each of these rewrite rules, it can be shown that it does not modify the denotation of the feature description. Furthermore, we can easily check that for any non-normal feature description, one of the rules applies, i.e. that if none of the rules match, the description must be normal. Finally, we can prove that there is no infinite chain of rule applications by defining the size of a set of constraints as a non-negative integer in such a way that this number is decreased by each rule application.⁸ Hence, we know that the rewrite rules constitute an effective procedure for the normalization of an arbitrary feature description not containing disjunction.

We did not give a rule (Ss9) for rewriting a constraint containing a disjunction. Although the rules (Ss7) and (Ss8) handle terms with embedded disjunctions correctly, we cannot rewrite a constraint of the form $x \mid t_1 \sqcup_d t_2$ into an equivalent set of normal simple constraints.
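The two decomposition rules (Ss7) and (Ss8) of Fig. 4.10 can be sketched as a small worklist procedure. The Python code below covers only these two rules and leaves all other constraints untouched. The tuple encoding of terms and the generated names `y1, y2, ...` are assumptions of this sketch; in particular, it assumes that no such name already occurs in the input.

```python
from itertools import count

def flatten(constraints):
    """Apply (Ss7) and (Ss8) exhaustively to a list of constraints (x, term)."""
    fresh = (f"y{i}" for i in count(1))       # hypothetical supply of new variables
    todo, done = list(constraints), []
    while todo:
        x, t = todo.pop()
        if t[0] == "and":                     # (Ss8): split a conjunction
            todo += [(x, t[1]), (x, t[2])]
        elif t[0] == "sel" and t[2][0] != "var":
            y = next(fresh)                   # (Ss7): name the embedded term
            todo += [(x, ("sel", t[1], ("var", y))), (y, t[2])]
        else:
            done.append((x, t))               # nothing to decompose here
    return done

# x0 | subj : animate : +  AND  obj : NONE
term = ("and",
        ("sel", "subj", ("sel", "animate", ("sort", "+"))),
        ("sel", "obj", ("sort", "NONE")))
result = flatten([("x0", term)])
```

After flattening, every selection constraint has a variable as its argument, which is the shape the normal form requires before the co-occurrence rules (Ms1) to (Ms6) are applied.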

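⁷ The treatment of inconsistency and equality in these rules may look unnecessarily long-winded, but this formulation will facilitate the generalization to the disjunctive case.

⁸ Such an assignment is given e.g. by the sum $\mathit{cons} + \mathit{negv} + 2 \cdot \mathit{conj} + 2 \cdot \mathit{featv} + 4 \cdot \mathit{featn}$, where $\mathit{cons}$ is the number of constraints $\neq \bot$, $\mathit{negv}$ the number of $\neg$-symbols directly followed by a variable, $\mathit{conj}$ the number of $\sqcap$-symbols, $\mathit{featv}$ the number of ":"-symbols directly followed by a variable and $\mathit{featn}$ the number of other ":"-symbols contained in the description.

\[
\begin{array}{llll}
\text{(Ss1)}  & x \mid \top \,\&\, SC & \to_{x_0}\ SC & \\
\text{(Ss2)}  & x \mid \bot \,\&\, SC & \to_{x_0}\ \bot \,\&\, SC & \\
\text{(Ss3a)} & x \mid x \,\&\, SC & \to_{x_0}\ SC & \\
\text{(Ss3b)} & x \mid y \,\&\, SC & \to_{x_0}\ SC_{x \to y} & \text{if } x \neq x_0 \\
\text{(Ss3c)} & x_0 \mid y \,\&\, SC & \to_{x_0}\ SC_{y \to x_0} & \\
\text{(Ss4)}  & x \mid \neg\top \,\&\, SC & \to_{x_0}\ \bot \,\&\, SC & \\
\text{(Ss5)}  & x \mid \neg\bot \,\&\, SC & \to_{x_0}\ SC & \\
\text{(Ss6)}  & x \mid \neg x \,\&\, SC & \to_{x_0}\ \bot \,\&\, SC & \\
\text{(Ss7)}  & x \mid f : t \,\&\, SC & \to_{x_0}\ x \mid f : y \,\&\, y \mid t \,\&\, SC & \text{where } t \notin V \text{ and } y \text{ is new} \\
\text{(Ss8)}  & x \mid t_1 \sqcap t_2 \,\&\, SC & \to_{x_0}\ x \mid t_1 \,\&\, x \mid t_2 \,\&\, SC & \\
\text{(Ss9)}  & x \mid t_1 \sqcup_d t_2 \,\&\, SC & \to_{x_0}\ \text{cannot be handled yet} & \\[1ex]
\text{(Ms1)}  & \bot \,\&\, x \mid t \,\&\, SC & \to_{x_0}\ \bot \,\&\, SC & \\
\text{(Ms2)}  & x \mid A \,\&\, x \mid B \,\&\, SC & \to_{x_0}\ x \mid \mathrm{GLB}(A, B) \,\&\, SC & \\
\text{(Ms3a)} & x \mid a \,\&\, x \mid \neg y \,\&\, SC & \to_{x_0}\ x \mid a \,\&\, y \mid \neg a \,\&\, SC & \\
\text{(Ms3b)} & x \mid a \,\&\, x \mid f : y \,\&\, SC & \to_{x_0}\ x \mid a \,\&\, y \mid \text{NONE} \,\&\, SC & \\
\text{(Ms4a)} & x \mid A \,\&\, x \mid \neg B \,\&\, SC & \to_{x_0}\ \bot \,\&\, SC & \text{if } A \leq B \\
\text{(Ms4b)} & x \mid A \,\&\, x \mid \neg B \,\&\, SC & \to_{x_0}\ x \mid A \,\&\, SC & \text{where } \mathrm{GLB}(A, B) = \bot \\
\text{(Ms5)}  & x \mid \neg A \,\&\, x \mid \neg B \,\&\, SC & \to_{x_0}\ x \mid \neg B \,\&\, SC & \text{if } A < B \\
\text{(Ms6)}  & x \mid f : y \,\&\, x \mid f : z \,\&\, SC & \to_{x_0}\ x \mid f : y \,\&\, z \mid y \,\&\, SC &
\end{array}
\]

Figure 4.10: Rewrite Rules for Simple Constraints

4.6 Applying the Maxwell/Kaplan Scheme

Until now, our descriptions contain sets of constraints which are connected conjunctively, i.e. all constraints have to be satisfied simultaneously by an adequate solution. Hence the rewrite system used for the normalization can concentrate on the syntactic form of the constraints and does not have to consider more complex relations between the constraints in the description. However, in order to deal with disjunctive information, we have to give up some of this simplicity, since such cases cannot be represented with the simple constraints used so far. We might consider allowing for arbitrary boolean combinations of constraints or at least some AND-OR-structures, but such a modification would be a very drastic step and would make the rewrite system much more complicated than it is now. Our goal will be to generalize the system in such a way that as many as possible of its current properties are preserved.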


This is indeed possible, and one of the possibilities we have is to apply the scheme given in [Maxwell & Kaplan, 1989] to our calculus. The principle of their scheme is simple and fits very well into our current framework.

4.6.1 Conditional Constraints

When dealing with disjunctive information, we need a way to relate our constraints to the disjunction names that appear in feature terms. More specifically, we want to express that a given constraint has to be satisfied if certain disjunctive branches are chosen. To this end we will attach a so-called context description to our constraints, which denotes the disjunctive contexts in which the constraint has to be valid. The key idea is that a simple constraint of the form $x \mid t_1 \sqcup_d t_2$ can be replaced by the conjunction $(d{:}l \to x \mid t_1) \wedge (d{:}r \to x \mid t_2)$, expressing that $x \mid t_1$ has to hold if the left branch of disjunction $d$ is chosen, and $x \mid t_2$ otherwise. Since these new constraints are still conjunctively connected, we can hope to achieve a representation for such conditional constraints and a generalization of our rewrite system without destroying the overall structure of our method. We will use arbitrary boolean combinations of disjunctive choices, called context descriptions, in order to express conditions under which constraints have to hold.

Definition 7 (Context Descriptions) A context description is a propositional formula where the constant true, variables written $d_i{:}l$ and $d_i{:}r$ with $d_i \in D$, and the operators $\wedge$, $\vee$ and $\neg$ may be employed. $CD$ will denote the set of context descriptions. The symbols $k, k_1, \ldots$ will always denote members of $CD$.

The set of purely conjunctive context descriptions, i.e. those that do not contain the operators $\vee$ and $\neg$, is denoted by $CD_c$.

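Definition 8 (Satisfaction of Context Descriptions) A context $\kappa$ satisfies a context description $k$ (written $\kappa \models_c k$) according to the following conditions:

• $\kappa \models_c \text{true}$ always

• $\kappa \models_c d{:}b$ iff $\kappa(d) = b$ ($b \in \{l, r\}$)

• $\kappa \models_c k_1 \wedge k_2$ iff $\kappa \models_c k_1$ and $\kappa \models_c k_2$

• $\kappa \models_c k_1 \vee k_2$ iff $\kappa \models_c k_1$ or $\kappa \models_c k_2$

• $\kappa \models_c \neg k$ iff $\kappa \not\models_c k$

If $\kappa \models_c k$, we will also say that $k$ describes or covers $\kappa$, or that $\kappa$ lies in $k$.

A context description is called contradictory if no context satisfies it, otherwise it is consistent.

Two context descriptions which are satisfied by exactly the same contexts are called equivalent (written $\equiv$).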
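Satisfaction, contradictoriness and equivalence of context descriptions can be decided by enumerating the contexts over the disjunction names that actually occur. The Python sketch below does exactly this; the tuple encoding of context descriptions and all function names are assumptions of the illustration, and the enumeration is exponential in the number of names.

```python
from itertools import product

def satisfies(kappa, k):
    """Does the context `kappa` (dict: name -> 'l' or 'r') satisfy `k`?"""
    tag = k[0]
    if tag == "true":
        return True
    if tag == "choice":                       # ("choice", d, b) stands for d:b
        return kappa[k[1]] == k[2]
    if tag == "and":
        return satisfies(kappa, k[1]) and satisfies(kappa, k[2])
    if tag == "or":
        return satisfies(kappa, k[1]) or satisfies(kappa, k[2])
    if tag == "not":
        return not satisfies(kappa, k[1])
    raise ValueError(tag)

def names(k):
    """Disjunction names mentioned in a context description."""
    if k[0] == "choice":
        return {k[1]}
    if k[0] == "true":
        return set()
    return set().union(*(names(sub) for sub in k[1:]))

def contexts(ds):
    """All partial contexts defined exactly on the names in `ds`."""
    ds = sorted(ds)
    return [dict(zip(ds, bs)) for bs in product("lr", repeat=len(ds))]

def contradictory(k):
    return not any(satisfies(c, k) for c in contexts(names(k)))

def equivalent(k1, k2):
    ds = names(k1) | names(k2)
    return all(satisfies(c, k1) == satisfies(c, k2) for c in contexts(ds))

d_l, d_r = ("choice", "d", "l"), ("choice", "d", "r")
```

Since disjunctions are binary, the negation of a left choice is equivalent to the corresponding right choice, and the conjunction of both choices for the same name is contradictory.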

We can now annotate a simple constraint with a description of the

disjunctive contexts under which this constraint has to be valid.

Definition 9 (Conditional Constraint) A conditional constraint is a pair $sc[k]$ where $sc \in \mathcal{SC}$ and $k \in CD$. $\mathcal{CC}$ is the set of conditional constraints, and $cc, cc_1, \ldots$ will always denote elements of $\mathcal{CC}$.⁹

For a given interpretation we will say that a conditional constraint $sc[k]$ is satisfied in a context $\kappa$ under an assignment $\alpha$ according to the following conditions:

\[
\begin{array}{lcl}
\kappa, \alpha \models x \mid t\,[k] & \text{iff} & \kappa \not\models_c k \ \text{ or } \ \alpha(x) \in [\![t]\!]_{\kappa,\alpha} \\
\kappa, \alpha \models \bot[k] & \text{iff} & \kappa \not\models_c k
\end{array}
\]

Clearly, a conditional constraint $sc[k]$ is trivially satisfied in every context $\kappa$ with $\kappa \not\models_c k$. Stated differently, the constraint $sc[k]$ is effective in contexts described by $k$. Suppose for instance $k = d_1{:}l \wedge d_2{:}r \wedge d_4{:}l$, then we could express this as: "either $sc$ has to be true or we have to choose $d_1{:}r$ or $d_2{:}l$ or $d_4{:}r$".

A constraint of the form $\bot[k]$ is used to mark the contexts in $k$ as inconsistent in order to exclude them from further consideration.

Definition 10 (Rooted Feature Description) A rooted f-description is a pair $(x_0, CC)$, where $x_0 \in V$ and $CC \subseteq \mathcal{CC}$. The denotation of a rooted f-description is defined as follows:

\[
[\![(x_0, CC)]\!] := \{ \alpha(x_0) \mid \alpha \in U^V \wedge \exists \kappa \in \{l, r\}^D \ \forall cc \in CC : \kappa, \alpha \models cc \}
\]

We can easily verify that for every feature term $t$ not containing $x_0$:

\[ [\![t]\!] = [\![(x_0, \{x_0 \mid t\,[\text{true}]\})]\!] \]

⁹ A constraint $sc[k]$ might also be written $k \to sc$ as in [Maxwell & Kaplan, 1989], which would make its semantics more explicit. We do not use this notation since the implication sign could be confused with the arrow used in rewrite rules.


4.6.2 Normal Form for Conditional Constraints

A conditional constraint $sc[k]$ is normal if $sc$ is normal and $k$ is consistent. A rooted f-description $(x_0, CC)$ is called normal if $CC$ is a set of normal conditional constraints which satisfies the following conditions:

1. If $CC$ contains $\bot[k]$ and $x \mid t\,[k']$, then $k' \wedge k$ is contradictory.

2. If $CC$ contains $x \mid A\,[k]$ and $x \mid B\,[k']$ where $k \wedge k'$ is consistent, then $A = B$.

3. If $CC$ contains $x \mid a\,[k]$ and $x \mid t\,[k']$ where $k \wedge k'$ is consistent, then $t = a$.

4. If $CC$ contains $x \mid A\,[k]$ and $x \mid \neg B\,[k']$ where $k \wedge k'$ is consistent, then $A \not\leq B$ and $\mathrm{GLB}(A, B) \neq \bot$.

5. If $CC$ contains $x \mid \neg A\,[k]$ and $x \mid \neg B\,[k']$ where $k \wedge k'$ is consistent, then $A \not< B$.

6. If $CC$ contains $x \mid f : y\,[k]$ and $x \mid f : z\,[k']$ where $k \wedge k'$ is consistent, then $y = z$.

7. If $CC$ contains $sc[k]$ and $sc[k']$, then $k = k'$.

The first six conditions are a straightforward generalization of the normal form conditions for simple feature descriptions. Two conditional constraints $sc_1[k]$, $sc_2[k']$ where $sc_1, sc_2$ would violate one of the normal form conditions for simple descriptions can only coexist if their context descriptions $k$ and $k'$ are incompatible.¹⁰

The last condition disallows constraints with different context descriptions, but otherwise carrying the same information. In particular, this has the consequence that all information about inconsistent contexts has to be concentrated in one constraint $\bot[k]$. If the overall description is inconsistent, this will eventually result in a constraint $\bot[k]$ where $k$ is equivalent to true. In this case, the first condition enforces that no other constraint can be contained in the description. Hence, a normal rooted f-description $(x_0, CC)$ where $CC$ is different from $\{\bot[k]\}$ with $k \equiv \text{true}$ has a non-empty denotation in some interpretation $(U, I)$.

¹⁰ Some rewrite steps might be saved by weakening the first condition to: If $CC$ contains $\bot[k]$ and $x \mid t\,[k']$, then $k' \wedge \neg k$ is consistent. In this case, (Mc1) could be simplified to:

(Mc1) $\bot[k_1] \,\&\, x \mid t\,[k_2] \,\&\, CC \ \to_{x_0}\ \bot[k_1] \,\&\, CC$, if $k_2 \wedge \neg k_1$ is contradictory


In order to generalize the rewrite system above to work for conditional constraints, we can apply the scheme given in [Maxwell & Kaplan, 1989]. In our notation, this scheme says: Take a rewrite rule for simple constraints of the form $sc_1 \,\&\, sc_2 \,\&\, SC \to sc_3 \,\&\, SC$ and replace it by the rule

\[
sc_1[k_1] \,\&\, sc_2[k_2] \,\&\, CC \ \to\ sc_1[k_1 \wedge \neg k_2] \,\&\, sc_2[k_2 \wedge \neg k_1] \,\&\, sc_3[k_1 \wedge k_2] \,\&\, CC
\qquad \text{if } k_1 \wedge k_2 \text{ is consistent}
\]

In cases where $sc_2 = sc_3$ (i.e. rules where $sc_1$ is eliminated if $sc_2$ is present), we can simplify the outcome of the scheme to

\[
sc_1[k_1] \,\&\, sc_2[k_2] \,\&\, CC \ \to\ sc_1[k_1 \wedge \neg k_2] \,\&\, sc_2[k_2] \,\&\, CC
\qquad \text{if } k_1 \wedge k_2 \text{ is consistent}
\]
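The scheme can be read as a higher-order function that lifts a rule on pairs of simple constraints to a rule on pairs of conditional constraints. The Python sketch below is an illustration under several assumptions that are not part of the original: context descriptions are represented extensionally as sets of contexts over two hypothetical names d1 and d2 (so conjunction is intersection, negation is set difference, and a description is consistent iff its set is non-empty), sorts are modeled as frozensets with GLB as intersection, and constraints whose context set becomes empty are dropped immediately.

```python
from itertools import product

ALL = frozenset(product("lr", repeat=2))      # all contexts over (d1, d2)

def choice(index, branch):
    """Contexts in which the disjunction at position `index` takes `branch`."""
    return frozenset(c for c in ALL if c[index] == branch)

def lift(simple_rule):
    """Turn a rule (sc1, sc2) -> sc3 into its conditional counterpart."""
    def conditional_rule(cc1, cc2):
        (sc1, k1), (sc2, k2) = cc1, cc2
        if not (k1 & k2):                     # k1 AND k2 contradictory: no rewrite
            return [cc1, cc2]
        sc3 = simple_rule(sc1, sc2)
        out = [(sc1, k1 - k2), (sc2, k2 - k1), (sc3, k1 & k2)]
        return [(sc, k) for sc, k in out if k]    # drop contradictory contexts
    return conditional_rule

def ms2(sc1, sc2):
    """Simple rule in the style of (Ms2): combine two sort constraints on x."""
    (x, a), (_, b) = sc1, sc2
    return (x, a & b)                         # GLB modeled as set intersection

mc2 = lift(ms2)
A = frozenset({"nom", "acc"})
B = frozenset({"acc", "dat"})
result = mc2((("x", A), ALL), (("x", B), choice(0, "l")))
```

In this example the first constraint holds everywhere and the second only where d1 takes its left branch, so the result keeps the first constraint for the right branch and the combined sort for the left branch; the second constraint has no remaining contexts of its own.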

Using this scheme, we can find rewrite rules for conditional constraints that replace the rules (Ms1), (Ms2), (Ms4a, b) and (Ms5). The other cases in the second group differ only in so far as they rewrite two conflicting constraints into two different constraints. For these cases, the scheme can be slightly generalized in the obvious way, such that both resulting constraints are marked with the context description $[k_1 \wedge k_2]$. With this generalization, for the remaining rules of the second group a conditional variant can be found (see Fig. 4.12).

Most rules of the first group have the form $sc_1 \,\&\, SC \to SC_1 \cup SC$, where $SC_1$ is a set of 0, 1, or 2 simple constraints that carry the information contained in $sc_1$. These rules can be generalized straightforwardly to the conditional system. The context description of the input constraint just has to be copied to the constraints in $SC_1$, as shown in Fig. 4.11. It is now straightforward to give a rule (Sc9) for rewriting a constraint containing a disjunction to a pair of constraints with more specific context descriptions.

The normal form for conditional descriptions additionally prohibits constraints with contradictory context descriptions and also equal constraints with different context descriptions. These are eliminated by using the rules (Sc10) and (Mc7). The former of these rules will eliminate constraints produced by (Mc1) ... (Mc6) if either $k_1 \wedge \neg k_2$ or $k_2 \wedge \neg k_1$ is contradictory.¹¹

¹¹ It will also eliminate constraints resulting from inaccessible disjuncts of the input formula, such as $x_i \mid t_2\,[d{:}l \wedge d{:}r]$ if the original feature term contained $t_1 \sqcup_d (t_2 \sqcup_d t_3)$.

The only case that still has to be handled is that of constraints of the form $x \mid y\,[k]$. In the non-disjunctive case, one of the variables has been substituted by the other and the constraint could be removed from the description. Unfortunately, this would not be correct in the disjunctive case, since such a substitution would restrict $x$ and $y$ to the same value in all contexts, not only in those contexts described by $k$.¹²

We can fix this problem by introducing the operation of conditional substitution. The idea behind this operation is that $x$ is substituted by $y$ in (all contexts described by) $k$, written $CC_{x \to y[k]}$.¹³ This means that constraints whose context descriptions do not overlap with $k$ (i.e. describe no common contexts) are not affected by the substitution. For constraints that are only effective in contexts covered by $k$, the substitution is done as usual, whereas constraints that contain $x$ and are effective both in contexts covered by $k$ and in contexts covered by $\neg k$ have to be split into one version that is left unmodified and another where substitution takes place.

Definition 11 (Conditional Substitution) The substitution of a variable $x$ by a variable $y$ under condition $k$ in a set of conditional constraints $CC$, written $CC_{x \to y[k]}$, is defined as follows:

\[
\begin{array}{rcl}
CC_{x \to y[k]} & := & \{ sc[k'] \in CC \mid sc \text{ does not contain } x \} \\
 & \cup & \{ sc[k' \wedge \neg k] \mid sc[k'] \in CC,\ sc \text{ contains } x,\ k' \wedge \neg k \text{ consistent} \} \\
 & \cup & \{ sc_{x \to y}[k' \wedge k] \mid sc[k'] \in CC,\ sc \text{ contains } x,\ k' \wedge k \text{ consistent} \}
\end{array}
\]

where $sc_{x \to y}$ denotes the substitution of all occurrences of $x$ by $y$ in the constraint $sc$.

Using conditional substitution, we can also give a conditional version of the rules (Ss3b, c).

We can show termination as follows: A set of conditional constraints

CC can be seen as mapping from contexts to sets of simple constraint,

12Unless y is the root variable, this can only be a problem if there are constraintssc1[k1]; sc2[k2] (not necessarily dierent) where sc1 contains x, sc2 contains y andk1 ^ k2 ^ :k is consistent.

13[Maxwell & Kaplan, 1989] do not dene conditional substitution. However, theway a constraint x j y[k] is treated in the example they give (their notation wouldbe k ! x y) would lead to exactly the same result as applying the substitutionx! y[k].


(Sc1) x j>[k] & CC !x0 CC(Sc2) x j?[k] & CC !x0 ?[k] & CC(Sc3a) x jx[k] & CC !x0 CC(Sc3b) x jy[k] & CC !x0 CCx!y[k] if x 6= x0(Sc3c) x0 jy[k] & CC !x0 CCy!x0[k](Sc4) x j:>[k] & CC !x0 ?[k] & CC(Sc5) x j:?[k] & CC !x0 CC(Sc6) x j:x[k] & CC !x0 ?[k] & CC(Sc7) x jf: t[k] & CC !x0 x jf:y[k] & y j t[k] & CC;

where t 62 V and y is new(Sc8) x j t1 u t2[k] & CC !x0 x j t1[k] & x j t2[k] & CC(Sc9) x j t1 td t2[k] & CC !x0 x j t1[k ^ d: l] & x j t2[k ^ d:r] & CC(Sc10) sc[k] & CC !x0 CC; if k is contradictory

Figure 4.11Rewrite Rules for Conditional Constraints, Part I

where each context is mapped into the set of constraints that are effective in this context. For each context, the rewrite system does the same thing as the unconditional version, hence for each context only finitely many rule applications are possible. However, this argument does not suffice, since there are infinitely many different contexts. But if we restrict ourselves to the consideration of relevant disjunction names, i.e. names that appear in the initial feature description, we find that there is only a finite (although of course exponential) number of relevant partial contexts, i.e. contexts that are defined only for the disjunction names in use. We can define the size of a set of conditional constraints so that for each constraint the number of different partial contexts where this constraint is effective is taken as a factor (this defines the size relative to a given set of disjunction names). Now each of the rewrite rules obtained by the schemes above decreases the size of a conditional description. This is not necessarily true for the combination of equal constraints (Mc7), where the result can have the same size, nor for the elimination of constraints with contradictory context descriptions


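Rules (Mc1) ... (Mc6) apply only if $k_1 \wedge k_2$ is consistent.

(Mc1) $\bot[k_1] \,\&\, x \mid t[k_2] \,\&\, CC$
    $\rightarrow_{x_0} \bot[k_1] \,\&\, x \mid t[k_2 \wedge \neg k_1] \,\&\, CC$
(Mc2) $x \mid A[k_1] \,\&\, x \mid B[k_2] \,\&\, CC$
    $\rightarrow_{x_0} x \mid GLB(A,B)[k_1 \wedge k_2] \,\&\, x \mid A[k_1 \wedge \neg k_2] \,\&\, x \mid B[k_2 \wedge \neg k_1] \,\&\, CC$
(Mc3a) $x \mid a[k_1] \,\&\, x \mid \neg y[k_2] \,\&\, CC$
    $\rightarrow_{x_0} x \mid a[k_1] \,\&\, y \mid \neg a[k_1 \wedge k_2] \,\&\, x \mid \neg y[k_2 \wedge \neg k_1] \,\&\, CC$
(Mc3b) $x \mid a[k_1] \,\&\, x \mid f{:}y[k_2] \,\&\, CC$
    $\rightarrow_{x_0} x \mid a[k_1] \,\&\, y \mid NONE[k_1 \wedge k_2] \,\&\, x \mid f{:}y[k_2 \wedge \neg k_1] \,\&\, CC$
(Mc4a) $x \mid A[k_1] \,\&\, x \mid \neg B[k_2] \,\&\, CC$, where $A \leq B$
    $\rightarrow_{x_0} \bot[k_1 \wedge k_2] \,\&\, x \mid A[k_1 \wedge \neg k_2] \,\&\, x \mid \neg B[k_2 \wedge \neg k_1] \,\&\, CC$
(Mc4b) $x \mid A[k_1] \,\&\, x \mid \neg B[k_2] \,\&\, CC$, where $GLB(A,B) = \bot$
    $\rightarrow_{x_0} x \mid A[k_1] \,\&\, x \mid \neg B[k_2 \wedge \neg k_1] \,\&\, CC$
(Mc5) $x \mid \neg A[k_1] \,\&\, x \mid \neg B[k_2] \,\&\, CC$, where $A < B$
    $\rightarrow_{x_0} x \mid \neg B[k_2] \,\&\, x \mid \neg A[k_1 \wedge \neg k_2] \,\&\, CC$
(Mc6) $x \mid f{:}y[k_1] \,\&\, x \mid f{:}z[k_2] \,\&\, CC$
    $\rightarrow_{x_0} x \mid f{:}y[k_1] \,\&\, z \mid y[k_1 \wedge k_2] \,\&\, x \mid f{:}z[k_2 \wedge \neg k_1] \,\&\, CC$
(Mc7) $sc[k_1] \,\&\, sc[k_2] \,\&\, CC$
    $\rightarrow_{x_0} sc[k_1 \vee k_2] \,\&\, CC$

Figure 4.12: Rewrite Rules for Conditional Constraints, Part II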

(Sc10), which are not counted at all. But if we take pairs $(s, n)$, where $s$ is the weighted size of the descriptions and $n$ is the number of constraints, then we find that each rule application decreases at least one of these numbers and $s$ is never increased. Hence the rewrite system will terminate.
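The measure behind this argument can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: context descriptions are modelled as Python predicates over partial contexts, every constraint gets a base size of 1, and the helper names `contexts`, `weighted_size` and `measure` are hypothetical and do not come from the paper.

```python
from itertools import product

def contexts(names):
    """All partial contexts over the relevant disjunction names."""
    names = sorted(names)
    return [dict(zip(names, sides)) for sides in product("lr", repeat=len(names))]

def weighted_size(constraints, names):
    """Weighted size s: each constraint counts once for every partial
    context in which it is effective (base size 1 is an assumption)."""
    ctxs = contexts(names)
    return sum(sum(1 for c in ctxs if k(c)) for _sc, k in constraints)

def measure(constraints, names):
    """The pair (s, n) used in the termination argument."""
    return weighted_size(constraints, names), len(constraints)

names = {"d1", "d2"}
k1 = lambda c: c["d1"] == "l"                        # d1:l
k2 = lambda c: c["d1"] == "r" and c["d2"] == "l"     # d1:r ∧ d2:l
dead = lambda c: c["d1"] == "l" and c["d1"] == "r"   # contradictory

before = [("x|A", k1), ("x|A", k2), ("y|B", dead)]
# (Mc7): merge the two equal constraints under k1 ∨ k2.
merged = [("x|A", lambda c: k1(c) or k2(c)), ("y|B", dead)]
# (Sc10): drop the constraint with the contradictory context.
pruned = merged[:1]

print(measure(before, names), measure(merged, names), measure(pruned, names))
# (3, 3) (3, 2) (3, 1)
```

In this toy run neither (Mc7) nor (Sc10) changes $s$, but each of them lowers $n$, which is the situation the pair $(s, n)$ is meant to cover.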


4.6.3 Discussion

We have applied the method proposed in [Maxwell & Kaplan, 1989] to

our rewrite system for simple feature descriptions and we have obtained

a rewrite system that supports disjunctive information. This system

has several advantages, one of the most important being the fact that

the locality of disjunctions is fully maintained during the computation.

This means that information that does not interact with a disjunction

needs to be represented only once and never has to be multiplied out with

this disjunction. Consequently, only those disjunctions are multiplied

out with each other for which this is really necessary in order to test

consistency.

However, the formulation given here is still on a very abstract level and in order to obtain an efficient implementation, a couple of details have to be clarified. One of the most critical points is the fact that at a very central point of the algorithm (in an inner loop, to speak in programmers' terms), context descriptions have to be checked for compatibility in order to find out if two constraints can coexist or if they conflict and have to be rewritten. Since context descriptions can be unrestricted propositional formulas (and the algorithm does in fact produce formulas containing conjunction, disjunction and negation), the test for compatibility cannot be implemented efficiently, since it is known to be NP-complete. Hence in the worst case, in which all disjunctions interact in some way, we do not only get an exponential number of rewrite steps, but each step may involve tests that need exponential time. Since the overall number of constraints for a given variable might grow exponentially in such a scenario and since each of them has to be checked with the other constraints concerning the same variable, each rewrite step could involve an exponential number of compatibility tests. Overstating it a bit, in the "even worse than worst" case¹⁴ our algorithm could cube the complexity of expansion to DNF.
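To make the cost of the compatibility test concrete, the following Python sketch checks whether $k_1 \wedge k_2$ is consistent by enumerating all contexts over the disjunction names involved. The tuple encoding of context descriptions and the function names `names`, `holds` and `compatible` are assumptions made for this illustration; the enumeration is the naive method, not a procedure proposed in the paper.

```python
from itertools import product

# Context descriptions as nested tuples (a hypothetical encoding):
#   True                  the description `true`
#   ("at", d, side)       d:l or d:r, with side in {"l", "r"}
#   ("not", k), ("and", k1, k2), ("or", k1, k2)

def names(k):
    """Collect the disjunction names mentioned in a context description."""
    if k is True:
        return set()
    if k[0] == "at":
        return {k[1]}
    return set().union(*(names(sub) for sub in k[1:]))

def holds(k, ctx):
    """Evaluate description k in a context ctx (dict: name -> 'l' or 'r')."""
    if k is True:
        return True
    op = k[0]
    if op == "at":
        return ctx[k[1]] == k[2]
    if op == "not":
        return not holds(k[1], ctx)
    if op == "and":
        return holds(k[1], ctx) and holds(k[2], ctx)
    if op == "or":
        return holds(k[1], ctx) or holds(k[2], ctx)
    raise ValueError(f"unknown operator: {op!r}")

def compatible(k1, k2):
    """True iff k1 ∧ k2 is consistent; tries up to 2**n contexts."""
    ds = sorted(names(k1) | names(k2))
    for sides in product("lr", repeat=len(ds)):
        ctx = dict(zip(ds, sides))
        if holds(k1, ctx) and holds(k2, ctx):
            return True
    return False

d1l, d1r, d2l = ("at", "d1", "l"), ("at", "d1", "r"), ("at", "d2", "l")
print(compatible(d1l, d2l), compatible(d1l, d1r))   # True False
```

With $n$ disjunction names the loop may visit $2^n$ contexts, which is the kind of per-step cost the paragraph above warns about.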

For an efficient implementation, it does not suffice to index the constraints under the variables they refer to, but the constraints concerning the same variable have to be indexed according to their context descriptions in a way that reduces the number of compatibility tests between them. One aspect of this point is the observation (also mentioned in [Maxwell & Kaplan, 1989]) that the outcome of a rewrite step consists of constraints with mutually incompatible context descriptions. Of course, this should be exploited and the mutual irrelevance of such constraints (and those resulting from them in further rewrite steps ...) should not have to be recomputed again and again. However, this is not easy to do, since there are a couple of different relations in which context descriptions can stand (incompatibility, compatibility, subsumption) that would all have to be treated differently in such an indexing scheme.

¹⁴ We do not know whether this case can actually arise.

Another difficulty, which is closely related to the problems mentioned, lies in the notion of disjunctive substitution. Here, all constraints concerning a variable have to be compared with the context description of the substitution. Some of the constraints may remain unchanged, in some of them the variable has to be replaced, and some have to be split into one for the old variable (but with a new context description) and one for the new variable. The constraints obtained for the new variable must then be unified with those that were already present for this variable, which might involve a reorganization of the context indices and trigger some new rewrite steps.
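The three cases just described (unchanged, replaced, split) can be sketched in Python as follows. This is an illustration of the case analysis only, not the paper's formal definition of the substitution: context descriptions are modelled extensionally as frozensets of contexts, constraints as tuples of symbols, and `cond_subst` is a hypothetical name.

```python
def cond_subst(constraints, x, y, k):
    """Apply the conditional substitution x -> y [k].

    constraints: list of (tokens, ctx) pairs, where tokens is a tuple of
    symbols and ctx is a frozenset of contexts (an extensional stand-in
    for a context description). k is a frozenset of contexts as well.
    """
    result = []
    for tokens, ctx in constraints:
        if x not in tokens or not (ctx & k):
            result.append((tokens, ctx))          # unaffected: stays as it is
            continue
        renamed = tuple(y if tok == x else tok for tok in tokens)
        if ctx <= k:
            result.append((renamed, ctx))         # variable replaced throughout
        else:
            result.append((tokens, ctx - k))      # old variable, context k' ∧ ¬k
            result.append((renamed, ctx & k))     # new variable, context k' ∧ k
    return result

# One disjunction name only, so a context is just "l" or "r".
ALL, L, R = frozenset("lr"), frozenset("l"), frozenset("r")
cc = [(("x", "f", "z"), ALL), (("x", "A"), L), (("x", "B"), R)]
for constraint in cond_subst(cc, "x", "y", L):
    print(constraint)
```

In the example the first constraint is split, the second has its variable replaced, and the third is left alone because its context is incompatible with that of the substitution.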

It is clear that the context-indexing scheme yet to be invented for conditional constraints will have to support several rather intricate operations. The representation given so far is indifferent to such questions and gives no indication of an answer. It is not quite so clear how the work that has been done to optimize non-disjunctive unification algorithms (for example the almost linear solution to the union/find problem, see e.g. [Martelli & Montanari, 1982]) could be exploited in a simple way or how the implementation of disjunctive constraint satisfaction could exploit a non-disjunctive unification algorithm available in the programming language or environment (e.g. Prolog).

4.7 From Conditional Constraints to Contexted Variables

In this section we want to propose another rewrite system for feature descriptions, which offers some simple answers to the questions mentioned in the last section. It can be seen as an implementation of the algorithm given above, since some of the operations which have been used there, but were not defined (test for compatibility of contexts, context-indexing) will be (implicitly) part of the algorithm given here. Although in this respect the algorithm will be more detailed, it will nevertheless be simpler.

Our method extends the method given in [Dörre & Eisele, 1989] to feature descriptions containing negation. It is also an extension inasmuch as we give a translation from our feature terms into appropriate feature descriptions. Originally, the method evolved from a prototypical Prolog implementation of a unification algorithm for named disjunction, where extendable feature structures and path equivalences are represented with logical variables, which are instantiated during the unification process. Prolog provides unconditional substitution (instantiation) of variables as a primitive operation, but it is not easy (esp. if efficiency is important) to implement a modified variant, such as conditional substitution. Hence we tried to find a representation that could save us the need for doing so. We were in fact able to find such a representation and it turns out that our approach is simple and has a couple of additional advantages, including the fact that in the core of our algorithm we can restrict ourselves to the treatment of purely conjunctive context descriptions, which can be processed more efficiently.
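Why purely conjunctive context descriptions are cheap to handle can be seen from a minimal Python sketch. It assumes, for illustration only, that such a description is stored as a dict from disjunction names to `'l'` or `'r'` (the empty dict standing for `true`); `conjoin` and `entails` are hypothetical helper names.

```python
def conjoin(k1, k2):
    """Conjunction of two purely conjunctive context descriptions.

    Returns None if the conjunction is contradictory, i.e. if some
    disjunction name would have to take both branches.
    """
    result = dict(k1)
    for name, side in k2.items():
        if result.setdefault(name, side) != side:
            return None
    return result

def entails(k1, k2):
    """k1 entails k2 iff every choice fixed by k2 is also fixed by k1."""
    return all(k1.get(name) == side for name, side in k2.items())

print(conjoin({"d1": "l"}, {"d2": "r"}))   # {'d1': 'l', 'd2': 'r'}
print(conjoin({"d1": "l"}, {"d1": "r"}))   # None
```

Under this encoding both the consistency test and the entailment test take time linear in the number of names mentioned, in contrast to the general propositional case.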

4.7.1 Context-Uniqueness and Variants of Variables

The key idea is to restrict the use of variables in such a way that it is safe to replace conditional substitution by conventional substitution. For instance, one can easily see that there is no difference between $CC_{x \rightarrow y}$ and $CC_{x \rightarrow y[k]}$, if for all constraints $sc[k']$ containing the variable $x$, $k'$ entails $k$, i.e. $k'$ describes only contexts also covered by $k$. Our trick will be to require that essentially all occurrences of a variable $x$ affect the same set of contexts, e.g. those described by $k'$. Then every conditional substitution $x \rightarrow y[k]$ can be replaced by the substitution $x \rightarrow y$, if $k'$ entails $k$, especially if $k' \equiv k$. We call this condition (which will be defined more precisely below) the context-uniqueness of variables. We will set up the normal form and the rewrite system in such a way that conditional substitutions of $x$ always happen in $k'$ and that context-uniqueness of a description is maintained during the rewrite process.

Before we define context-uniqueness, we first observe that an occurrence of a variable $x'$ in a conditional constraint $x \mid t[k]$ is relevant to all contexts in $k$, if $x'$ occurs outside the scope of a disjunction in $t$, whereas this occurrence is relevant only to contexts described by $d_i{:}l \wedge k$, if $x'$ is embedded in the left hand side of a disjunction labeled $d_i$, and analogously for the right hand side and for deeper embedded occurrences. Context-uniqueness will require that each occurrence of a variable is relevant to the same set of contexts. The relevant contexts will be regarded as an inherent and invariant property of variables, and we will introduce a function $Con : V \mapsto CD_c$ that maps each variable in use to a purely conjunctive description of the contexts it is relevant to. As a consequence of context-uniqueness, it will not be necessary to represent context descriptions with constraints that contain a variable, since the possible contexts of a constraint can be seen from the variable(s) it contains. However, the constraint $\bot[k]$, expressing inconsistency, will still need its context description.

When representing disjunctive information, we have to connect the root variable (which is relevant to all contexts) with variables occurring in conditional constraints without violating context-uniqueness. In our normal form, we will use constraints of the form $x \mid x_1 \sqcup_{d_1} x_2$ to make such connections. If $x$ is relevant to all contexts described by some $k$, context-uniqueness will enforce that $x_1$ and $x_2$ are relevant only to contexts in $d_1{:}l \wedge k$ and $d_1{:}r \wedge k$, respectively. The constraint says that $x$ and $x_1$ have to be identical in contexts described by $k \wedge d_1{:}l$ and so do $x$ and $x_2$ in contexts described by $k \wedge d_1{:}r$. Such constraints can be seen as bifurcations that distribute the information attached to $x$ over the variables on the right-hand side.

We will call $x_1$ and $x_2$ variants of $x$; to be more precise, $x_1$ will be called the $d_1{:}l$-variant and $x_2$ the $d_1{:}r$-variant of $x$. Assume an additional constraint $x_1 \mid x_3 \sqcup_{d_2} x_4$, then $x_3$ will be called the $d_1{:}l \wedge d_2{:}l$-variant of $x$ and so on. $x_1$ and $x_2$ (but not $x_3$ etc.) will be called direct variants of $x$. We will (e.g. during the translation of a description into context-unique form) refer to a variant of a variable $x$ without having a variable name for this variant. To this end, we will use a special notation $x/k$ to denote the $k$-variant of $x$. Such expressions will be called contexted variables.
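How the contexts of variants follow from a chain of bifurcations can be illustrated with a short Python sketch. The encoding is an assumption made for this example: a bifurcation $x \mid x_l \sqcup_d x_r$ is written as the tuple `(x, d, x_l, x_r)`, conjunctive contexts are dicts from disjunction names to `'l'` or `'r'`, and `variant_contexts` is a hypothetical name.

```python
def variant_contexts(con, bifurcations):
    """Derive the contexts of variants from bifurcation constraints.

    con: dict mapping variables to conjunctive contexts ({} is `true`).
    bifurcations: tuples (x, d, x_left, x_right) standing for the
    constraint x | x_left ⊔_d x_right.
    """
    con = dict(con)
    pending = list(bifurcations)
    while pending:
        ready = [b for b in pending if b[0] in con]
        if not ready:
            raise ValueError("bifurcation on a variable with unknown context")
        for x, d, x_left, x_right in ready:
            con[x_left] = {**con[x], d: "l"}    # the d:l-variant of x
            con[x_right] = {**con[x], d: "r"}   # the d:r-variant of x
        pending = [b for b in pending if b not in ready]
    return con

con = variant_contexts({"x": {}},
                       [("x1", "d2", "x3", "x4"), ("x", "d1", "x1", "x2")])
print(con["x3"])   # {'d1': 'l', 'd2': 'l'}
```

The result for `x3` corresponds to the $d_1{:}l \wedge d_2{:}l$-variant of $x$ from the text.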

Definition 12 (Contexted Variables) A contexted variable is a pair $x/k$ where $x \in V$ and $k \in CD_c$. $V_c$ will denote the union of $V$ with the set of contexted variables. Elements of $V_c$ will be written with capital letters $X, Y, Z, X_1, Y_1, \ldots$

To mark the distinction, we will sometimes call the members of $V$ pure variables.


Now, instead of accumulating constraints on the variable $x$ which might be effective in different contexts and could interact in complicated ways, we can introduce new variables as variants of $x$ and attach the information to them.

In our constraints, we will also employ feature terms containing contexted variables.

Definition 13 (Contexted Feature Terms) A contexted feature term is built according to definition 1, but where both pure and contexted variables may occur.

The set of contexted feature terms will be denoted by $FT_c$. We will generalize our notation so that henceforth $s, t, t_1, \ldots$ might also denote contexted feature terms.

The denotation of a contexted feature term in a context $\nu \in \{l, r\}^D$ under an assignment $\alpha \in U^V$ is defined as for usual feature terms by adding:

$$[\![x/k]\!]_{\nu,\alpha} := \begin{cases} \{\alpha(x)\} & \text{if } \nu \models_c k \\ \emptyset & \text{otherwise} \end{cases}$$
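The additional clause for contexted variables can be paraphrased in Python. The sketch assumes that contexts and purely conjunctive context descriptions are both dicts from disjunction names to `'l'` or `'r'`, and that an assignment is a dict from pure variables to elements of the universe; `denote_contexted` is a hypothetical name.

```python
def denote_contexted(x, k, context, assignment):
    """Denotation of the contexted variable x/k.

    context: dict from disjunction names to 'l' or 'r';
    k: purely conjunctive context description in the same encoding;
    assignment: dict from pure variables to elements of the universe.
    """
    if all(context.get(d) == side for d, side in k.items()):
        return {assignment[x]}      # the context satisfies k
    return set()                    # x/k denotes nothing in other contexts

print(denote_contexted("x", {"d1": "l"}, {"d1": "l", "d2": "r"}, {"x": "u1"}))
# {'u1'}
```

Outside the contexts described by `k` the contexted variable denotes the empty set, so any constraint requiring membership in it can only be met where `k` holds.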

4.7.2 Context-Unique Feature Descriptions

Our new descriptions will have three components: a root variable $x_0$, a set of constraints containing contexted variables and feature terms $x \mid t$ or conditional inconsistencies $\bot[k]$, and a context assignment, i.e. a mapping from the variables occurring in the constraints to the set of context descriptions.

Definition 14 (Context Assignment) A context assignment $Con$ is a partial mapping from $V$ into $CD_c$, the set of conjunctive context descriptions. The domain of a context assignment $Con$ can be extended to contexted variables by defining: $Con(x/k) := Con(x) \wedge k$

In order to restrict ourselves to context-unique descriptions, we have to define the context compatibility of a feature term. This definition is somewhat technical and the reader can skip it, since our algorithm will produce only context-unique descriptions anyway.

Definition 15 (Context compatibility) Given a partial assignment $Con : V \mapsto CD_c$, a contexted feature term $t$ is context compatible to a context description $k$ with respect to $Con$, written $t \sim_{Con} k$, according to the following conditions.

$A \sim_{Con} k$ for arbitrary $k \in CD_c$
$X \sim_{Con} k$ iff $Con(X) \equiv k$
$\neg t \sim_{Con} k$ iff $t \sim_{Con} k$
$f{:}t \sim_{Con} k$ iff $t \sim_{Con} k$
$s \sqcap t \sim_{Con} k$ iff $s \sim_{Con} k$ and $t \sim_{Con} k$
$s \sqcup_d t \sim_{Con} k$ iff $s \sim_{Con} k \wedge d{:}l$ and $t \sim_{Con} k \wedge d{:}r$
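The recursion of Definition 15 translates almost directly into code. The Python sketch below makes several assumptions that are not part of the paper: feature terms are nested tuples (`("sort", A)`, `("var", x)`, `("cvar", x, k)`, `("neg", t)`, `("feat", f, t)`, `("and", s, t)`, `("or", d, s, t)`), conjunctive contexts are dicts, a contradictory context is `None`, and equivalence of two conjunctive descriptions is tested by dict equality.

```python
def conj(k1, k2):
    """k1 ∧ k2 for conjunctive contexts (dicts); None if contradictory."""
    if k1 is None or k2 is None:
        return None
    out = dict(k1)
    for d, side in k2.items():
        if out.setdefault(d, side) != side:
            return None
    return out

def con_of(X, con):
    """Con extended to contexted variables: Con(x/k) = Con(x) ∧ k."""
    if X[0] == "var":
        return con[X[1]]
    _tag, x, k = X                      # ("cvar", x, k) encodes x/k
    return conj(con[x], k)

def compatible(t, k, con):
    """Check whether term t is context compatible to k with respect to con."""
    tag = t[0]
    if tag == "sort":                   # sorts are compatible to every k
        return True
    if tag in ("var", "cvar"):          # Con(X) must be equivalent to k
        return con_of(t, con) == k
    if tag == "neg":
        return compatible(t[1], k, con)
    if tag == "feat":                   # ("feat", f, t)
        return compatible(t[2], k, con)
    if tag == "and":
        return compatible(t[1], k, con) and compatible(t[2], k, con)
    if tag == "or":                     # ("or", d, s, t)
        _tag, d, left, right = t
        return (compatible(left, conj(k, {d: "l"}), con)
                and compatible(right, conj(k, {d: "r"}), con))
    raise ValueError(f"unknown term: {t!r}")

con = {"x": {}, "x1": {"d1": "l"}, "x2": {"d1": "r"}}
bifurcation = ("or", "d1", ("var", "x1"), ("var", "x2"))
print(compatible(bifurcation, con["x"], con))   # True
```

The example checks the right-hand side of a bifurcation on `x` against the context of `x`, which is the check a context-unique constraint has to pass.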

Definition 16 (Context-unique feature descriptions) A context-unique feature description is a triple $(x_0, CUC, Con)$ such that:

• $x_0 \in V$
• $CUC$ is a set of constraints which either have the form $\bot[k]$, where $k \in CD$, or $X \mid t$, where $X \in V_c$, $t \in FT_c$
• $Con$ is a context assignment which is defined for all variables in $CUC$
• The constraints in $CUC$ are context-unique with respect to $Con$, i.e. for every $X \mid t \in CUC$: $t \sim_{Con} Con(X)$

The semantics of context-unique feature descriptions is given by the satisfaction relation $\models_{Con}$ between variable assignments¹⁵ in a context and constraints, which is parametrized with a context assignment.

$\nu, \alpha \models_{Con} X \mid t$ iff $\nu \not\models_c Con(X)$ or $\alpha(X) \in [\![t]\!]_{\nu,\alpha}$
$\nu, \alpha \models_{Con} \bot[k]$ iff $\nu \not\models_c k$

The denotation of a context-unique f-description is defined as:

$$[\![(x_0, CUC, Con)]\!] := \{\alpha(x_0) \mid \alpha \in U^V \wedge \exists \nu \in \{l, r\}^D : \forall cuc \in CUC : \nu, \alpha \models_{Con} cuc\}$$

4.7.3 Translation to Context-Unique Form

We will now give a translation algorithm that computes for a given feature description $(x_0, \{x_0 \mid t[true]\})$ an equivalent context-unique feature description $(x_0, CUC, Con)$.

¹⁵ $\alpha$ is extended to contexted variables by defining: $\alpha(x/k) := \alpha(x)$


Decomposition The first step is to decompose constraints containing complex feature terms into simpler constraints. We do not need new rules to do that, we can just apply the rules (Sc1), (Sc2), (Sc3a), and (Sc4) ... (Sc10) as long as possible. This process produces only constraints of the form $x \mid t[k]$ or $\bot[k]$, where $k \in CD_c$. We do not use the rules (Sc3b, c) nor those from (Mc1) ... (Mc7), since they could introduce negation or disjunction into context descriptions of constraints containing variables. Since this step is a subset of the algorithm given in the last section, it will terminate.
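The decomposition step can be sketched as a recursive Python function that applies the rules named above. The encoding is an assumption made for illustration: terms are `"top"`, `"bot"` or nested tuples, negation is taken to apply only to $\top$, $\bot$, sorts and variables, conjunctive contexts are dicts, and `decompose`, `conj` and the fresh names `_y1, _y2, ...` are hypothetical.

```python
from itertools import count

def conj(k, d, side):
    """k ∧ d:side for a conjunctive context k (dict); None if contradictory."""
    if k.get(d, side) != side:
        return None
    return {**k, d: side}

def decompose(x, t, k, out, fresh):
    """Append to `out` simple constraints equivalent to x | t [k]."""
    if k is None:                                   # (Sc10)
        return
    if t in ("top", ("neg", "bot"), ("var", x)):    # (Sc1), (Sc5), (Sc3a)
        return
    if t in ("bot", ("neg", "top"), ("neg", ("var", x))):   # (Sc2), (Sc4), (Sc6)
        out.append(("bot", k))
        return
    tag = t[0]
    if tag == "and":                                # (Sc8)
        decompose(x, t[1], k, out, fresh)
        decompose(x, t[2], k, out, fresh)
    elif tag == "or":                               # (Sc9): ("or", d, t1, t2)
        decompose(x, t[2], conj(k, t[1], "l"), out, fresh)
        decompose(x, t[3], conj(k, t[1], "r"), out, fresh)
    elif tag == "feat" and t[2][0] != "var":        # (Sc7): ("feat", f, t)
        y = f"_y{next(fresh)}"                      # new variable
        out.append((x, ("feat", t[1], ("var", y)), k))
        decompose(y, t[2], k, out, fresh)
    else:                                           # already simple
        out.append((x, t, k))

term = ("and",
        ("feat", "f", ("or", "d1", ("sort", "A"), ("sort", "B"))),
        ("var", "x0"))
cc1 = []
decompose("x0", term, {}, cc1, count(1))
for c in cc1:
    print(c)
```

For the example term the result consists of $x_0 \mid f{:}y[true]$, $y \mid A[d_1{:}l]$ and $y \mid B[d_1{:}r]$ for a new variable $y$ (named `_y1` in the sketch); every context produced this way is purely conjunctive.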

After decomposition, we will have a feature description $(x_0, CC_1)$ equivalent to the initial description, which contains only constraints of the following forms:

• $x \mid A[k]$ or $x \mid \neg A[k]$, where $A \in S \setminus \{\top, \bot\}$
• $x \mid y[k]$ or $x \mid \neg y[k]$, where $x, y \in V$, $x \neq y$
• $x \mid f{:}y[k]$, with $y \in V$
• $\bot[k]$

All context descriptions $k$ appearing in $CC_1$ will be purely conjunctive and consistent. The description is not normal, since it still contains equalities and the constraints might conflict with each other. But before normalizing it, we will turn it into context-unique form.

Constructing a Context-Unique Feature Description In the second step, we construct a context-unique feature description by replacing all occurrences of variables in conditional constraints $x \mid t[k] \in CC_1$ by their respective $k$-variants, if $k \not\equiv true$. We do not yet have variable names for these variants, but we can use the contexted variables $x/k$. If $k \equiv true$, we do not have to introduce variants, we can use the variables themselves. Since the context descriptions of the constraints will be implicitly encoded in their variables, we can omit them. However, the context description of a constraint $\bot[k]$ cannot be inferred from variables, so we keep such constraints unchanged.

Variables occurring in the original description will be regarded as relevant to all contexts. Hence we initialize $Con$ so that all variables appearing in $CC_1$ are mapped to $true$. We then get $(x_0, CUC, Con)$
