
Unification: A Multidisciplinary Survey

KEVIN KNIGHT

Computer Science Department, Carnegie-Mellon University, Pittsburgh, Pennsylvania 15213-3890

The unification problem and several variants are presented. Various algorithms and data structures are discussed. Research on unification arising in several areas of computer science is surveyed; these areas include theorem proving, logic programming, and natural language processing. Sections of the paper include examples that highlight particular uses of unification and the special problems encountered. Other topics covered are resolution, higher order logic, the occur check, infinite terms, feature structures, equational theories, inheritance, parallel algorithms, generalization, lattices, and other applications of unification. The paper is intended for readers with a general computer science background; no specific knowledge of any of the above topics is assumed.

Categories and Subject Descriptors: E.1 [Data Structures]: Graphs; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems: computations on discrete structures, pattern matching; I.1.3 [Algebraic Manipulation]: Languages and Systems; I.2.3 [Artificial Intelligence]: Deduction and Theorem Proving; I.2.7 [Artificial Intelligence]: Natural Language Processing

General Terms: Algorithms

Additional Key Words and Phrases: Artificial intelligence, computational complexity, equational theories, feature structures, generalization, higher order logic, inheritance, lattices, logic programming, natural language processing, occur check, parallel algorithms, Prolog, resolution, theorem proving, type inference, unification

INTRODUCTION

The unification problem has been studied in a number of fields of computer science, including theorem proving, logic programming, natural language processing, computational complexity, and computability theory. Often, researchers in a particular field have been unaware of work outside their specialty. As a result, the problem has been formulated and studied in a variety of ways, and there is a need for a general presentation. In this paper, I will elucidate the relationships among the various conceptions of unification.

The sections are organized as follows. The three sections The Unification Problem, Unification and Computational Complexity, and Unification: Data Structures and Algorithms introduce unification. Definitions are made, basic research is reviewed, and two unification algorithms, along with the data structures they require, are presented. Next, there are four sections on applications of unification. These applications are theorem proving, logic programming, higher order logic, and natural language processing. The section Unification and Feature Structures is placed before the section on natural language processing and is required for understanding that section. Following are four sections covering various topics of interest. Unification and Equational Theories presents abstract

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1989 ACM 0360-0300/89/0300-0093 $1.50

    ACM Computing Surveys, Vol. 21, No. 1, March 1989


CONTENTS

INTRODUCTION
1. THE UNIFICATION PROBLEM
2. UNIFICATION AND COMPUTATIONAL COMPLEXITY
3. UNIFICATION: DATA STRUCTURES AND ALGORITHMS
4. UNIFICATION AND THEOREM PROVING
   4.1 The Resolution Rule
   4.2 Research
5. UNIFICATION AND LOGIC PROGRAMMING
   5.1 Example of Unification in Prolog
   5.2 Research
6. UNIFICATION AND HIGHER ORDER LOGIC
   6.1 Example of Second-Order Unification
   6.2 Research
7. UNIFICATION AND FEATURE STRUCTURES
8. UNIFICATION AND NATURAL LANGUAGE PROCESSING
   8.1 Parsing with a Unification-Based Grammar
   8.2 Research
9. UNIFICATION AND EQUATIONAL THEORIES
   9.1 Unification as Equation Solving
   9.2 Research
10. PARALLEL ALGORITHMS FOR UNIFICATION
11. UNIFICATION, GENERALIZATION, AND LATTICES
12. OTHER APPLICATIONS OF UNIFICATION
    12.1 Type Inference
    12.2 Programming Languages
    12.3 Machine Learning
13. CONCLUSION
    13.1 Some Properties of Unification
    13.2 Trends in Unification Research
ACKNOWLEDGMENTS
REFERENCES

work on unification as equation solving. Parallel Algorithms for Unification reviews recent literature on unification and parallelism. Unification, Generalization, and Lattices discusses unification and its dual operation, generalization, in another abstract setting. Finally, in Other Applications of Unification, research that connects unification to research in abstract data types, programming languages, and machine learning is briefly surveyed. The conclusion gives an abstract of some general properties of unification and projects research trends into the future.

Dividing this paper into sections was a difficult task. Some research involving unification straddles two fields. Take, for example, higher order logic and theorem proving: should this work be classified as Unification and Higher Order Logic or Unification and Theorem Proving? Or consider ideas from logic programming put to work in natural language processing: is it Unification and Logic Programming or Unification and Natural Language Processing? Likewise, some researchers work in multiple fields, making their work hard to classify. Finally, it is sometimes difficult to define the boundary between two disciplines; for example, logic programming grew out of classical theorem proving, and the origin of a particular contribution to unification is not always clear. I have provided a balanced classification, giving cross references when useful.

Finally, the literature on unification is enormous. I present all of the major results, if not the methods used to reach them.

1. THE UNIFICATION PROBLEM

Abstractly, the unification problem is the following: Given two descriptions x and y, can we find an object z that fits both descriptions?

The unification problem is most often stated in the following context: Given two terms of logic built up from function symbols, variables, and constants, is there a substitution of terms for variables that will make the two terms identical? As an example consider the two terms f(x, y) and f(g(y, a), h(a)). They are said to be unifiable, since replacing x by g(h(a), a) and y by h(a) will make both terms look like f(g(h(a), a), h(a)). The nature of such unifying substitutions, as well as the means of computing them, makes up the study of unification.

In order to ensure the consistency, conciseness, and independence of the various sections of this paper, I introduce a few formal definitions:



Definition 1.1
A variable symbol is one of {x, y, z, ...}. A constant symbol is one of {a, b, c, ...}. A function symbol is one of {f, g, h, ...}.

Definition 1.2
A term is either a variable symbol, a constant symbol, or a function symbol followed by a series of terms, separated by commas, enclosed in parentheses. Sample terms are a, x, f(x, y), f(g(x, y), h(z)). Terms are denoted by the symbols {s, t, u, ...}.

Definition 1.3
A substitution is first a function from variables into terms. Substitutions are denoted by the symbols {σ, τ, θ, ...}. A substitution σ in which σ(x) is g(h(a), a) and σ(y) is h(a) can be written as a set of bindings enclosed in curly braces; that is, {x ← g(h(a), a), y ← h(a)}. This set of bindings is usually finite. A substitution is also a function from terms to terms via its application. The application of substitution σ to term t is also written σ(t) and denotes the term in which each variable xi in t is replaced by σ(xi).

Substitutions can be composed: σθ(t) denotes the term t after the application of the substitutions of θ followed by the application of the substitutions of σ. The substitution σθ is a new substitution built from old substitutions σ and θ by (1) modifying θ by applying σ to its terms, then (2) adding variable-term pairs in σ not found in θ. Composition of substitutions is associative, that is, (σθ)τ(s) = σ(θτ)(s), but it is not in general commutative, that is, σθ(s) ≠ θσ(s).

Definition 1.4
Two terms s and t are unifiable if there exists a substitution σ such that σ(s) = σ(t). In such a case, σ is called a unifier of

I will often use the word term to describe the mathematical object just defined. Do not confuse these terms with terms of predicate logic, which include things like ∀x∀y: (f(x) → f(y)) ↔ (¬f(y) → ¬f(x)).

s and t, and σ(s) is called a unification of s and t.

Definition 1.5
A unifier σ of terms s and t is called a most general unifier (MGU) of s and t if for any other unifier θ, there is a substitution τ such that τσ = θ. Consider, for example, the two terms f(x) and f(g(y)). The MGU is {x ← g(y)}, but there are many non-MGU unifiers such as {x ← g(a), y ← a}. Intuitively, the MGU of two terms is the simplest of all their unifiers.

Definition 1.6
Two terms s and t are said to match if there is a substitution σ such that σ(s) = t. Matching is an important variant of unification.

Definition 1.7
Two terms s and t are infinitely unifiable if there is a substitution, possibly containing infinitely long terms, that is a unifier of s and t. For example, x and f(x) are not unifiable, but they are infinitely unifiable under the substitution σ = {x ← f(f(f(...)))}, since σ(x) and σ(f(x)) are both equal to f(f(f(...))). An infinitely long term is essentially one whose string representation (to be discussed in the next section) requires an infinite number of symbols; for a formal discussion, see Courcelle [1983].
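Definitions 1.3 through 1.5 can be made concrete in a short sketch. The encoding below is mine, not the paper's: a variable is a '?'-prefixed string, a constant is a plain string, and f(t1, ..., tn) is the tuple ('f', t1, ..., tn).

```python
# Encoding (mine, not the paper's): '?x' is a variable, 'a' a constant,
# ('f', t1, ..., tn) the application of function symbol f.

def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def apply_subst(sigma, t):
    """Application of a substitution (Definition 1.3)."""
    if is_var(t):
        return sigma.get(t, t)
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply_subst(sigma, a) for a in t[1:])
    return t

def compose(sigma, theta):
    """Composition sigma-theta: apply sigma to theta's terms, then add
    bindings of sigma for variables theta does not bind."""
    out = {v: apply_subst(sigma, t) for v, t in theta.items()}
    for v, t in sigma.items():
        out.setdefault(v, t)
    return out

# The Section 1 example: sigma = {x <- g(h(a), a), y <- h(a)} unifies
# f(x, y) with f(g(y, a), h(a)).
sigma = {'?x': ('g', ('h', 'a'), 'a'), '?y': ('h', 'a')}
assert apply_subst(sigma, ('f', '?x', '?y')) == \
       apply_subst(sigma, ('f', ('g', '?y', 'a'), ('h', 'a')))

# Definition 1.5 on f(x) and f(g(y)): composing tau = {y <- a} with the
# MGU {x <- g(y)} yields the less general unifier {x <- g(a), y <- a}.
mgu = {'?x': ('g', '?y')}
theta = {'?x': ('g', 'a'), '?y': 'a'}
assert compose({'?y': 'a'}, mgu) == theta
```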

These general definitions will be used often throughout the paper. Concepts specific to particular areas of study will be defined in their respective sections.

2. UNIFICATION AND COMPUTATIONAL COMPLEXITY

In his 1930 thesis, Herbrand [1971] presented a nondeterministic algorithm to compute a unifier of two terms. This work was motivated by Herbrand's interest in equation solving. The modern utility and notation of unification, however, originated with Robinson. In his pioneering paper, Robinson [1965] introduced a method of



theorem proving based on resolution, a powerful inference rule. Central to the method was unification of first-order terms. Robinson proved that two first-order terms, if unifiable, have a unique most general unifier. He gave an algorithm for computing the MGU and proved it correct.

Guard [1964] independently studied the unification problem under the name of matching. Five years later, Reynolds [1970] discussed first-order terms using lattice theory and showed that there is also a unique most specific generalization of any two terms. See Section 11 for more details.

Robinson's original algorithm was inefficient, requiring exponential time and space. A great deal of effort has gone into improving the efficiency of unification. The remainder of this section will review that effort. The next section will discuss basic algorithmic and representational issues in detail.

Robinson himself began the research on more efficient unification. He wrote a note [Robinson 1971] about unification in which he argued that a more concise representation for terms was needed. His formulation greatly improved the space efficiency of unification. Boyer and Moore [1972] gave a unification algorithm that shares structure; it was also space efficient but was still exponential in time complexity.

In 1975, Venturini-Zilli [1975] introduced a marking scheme that reduced the complexity of Robinson's algorithm to quadratic time.

Huet's [1976] work on higher order unification (see also Section 6) led to an improved time bound. His algorithm is based on maintaining equivalence classes of subterms and runs in O(nα(n)) time, where α(n) is an extremely slow-growing function. We call this an almost linear algorithm. Robinson also discovered this algorithm, but it was not published in accessible form until Vitter and Simons' [1984, 1986] study of parallel unification. The algorithm is a variant of the algorithm used to test the equivalence of two finite automata. Ait-Kaci [1984] came up with a similar, more general algorithm in his dissertation. Baxter [1973] also discovered an almost-linear algorithm. He presented the algorithm in his thesis [Baxter 1976], in which he also discussed restricted and higher order versions of unification.

In 1976, Paterson and Wegman [1976, 1978] gave a truly linear algorithm for unification. Their method depends on a careful propagation of the equivalence-class relation of Huet's algorithm. The papers of Paterson and Wegman are rather brief; de Champeaux [1986] helps to clarify the issues involved.

Martelli and Montanari [1976] independently discovered another linear algorithm for unification. They further improved efficiency [Martelli and Montanari 1977] by updating Boyer and Moore's structure-sharing approach. In 1982, they gave a thorough description of an efficient unification algorithm [Martelli and Montanari 1982]. This last algorithm is no longer truly linear but runs in time O(n + m log m), where m is the number of distinct variables in the terms. The paper includes a practical comparison of this algorithm with those of Huet and Paterson and Wegman. Martelli and Montanari also cite a study by Trum and Winterstein [1978], who implemented several unification algorithms in Pascal in order to compare actual running times.

Kapur et al. [1982] reported a new linear algorithm for unification. They also related the unification problem to the connected components problem (graph theory) and the online/offline equivalence problems (automata theory). Unfortunately, their proof contained an error, and the running time is actually nonlinear (P. Narendran, personal communication, 1988).

Corbin and Bidoit [1983] rehabilitated Robinson's original unification algorithm by using new data structures. They reduced the exponential time complexity of the algorithm to O(n²) and claimed that the algorithm is simpler than Martelli and Montanari's and superior in practice.

The problem of unifying with infinite terms has received some attention. (Recall the definition of infinitely unifiable.) In his thesis, Huet [1976] showed that in the case of infinite unification, there still exists a single MGU; he gave an almost-linear algorithm for computing it. Colmerauer [1982b] gave two unification algorithms for



function UNIFY(t1, t2) ⇒ (unifiable: Boolean, σ: substitution)
begin
  if t1 or t2 is a variable then
  begin
    let x be the variable, and let t be the other term
    if x = t then (unifiable, σ) ← (true, nil)
    else if occur(x, t) then unifiable ← false
    else (unifiable, σ) ← (true, {x ← t})
  end
  else
  begin
    assume t1 = f(x1, ..., xn) and t2 = g(y1, ..., ym)
    if f ≠ g or m ≠ n then unifiable ← false
    else
    begin
      k ← 0
      unifiable ← true
      σ ← nil
      while k < m and unifiable do
      begin
        k ← k + 1
        (unifiable, τ) ← UNIFY(σ(xk), σ(yk))
        if unifiable then σ ← compose(τ, σ)
      end
    end
  end
  return (unifiable, σ)
end

Figure 1. A version of Robinson's original unification algorithm.
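The Figure 1 pseudocode can be transcribed into Python roughly as follows. This is a sketch, not a definitive implementation: the term encoding ('?'-prefixed strings for variables, tuples for applications) is my own, and it keeps the exponential-time behavior of the original.

```python
# A Python transcription of the Figure 1 pseudocode, occur check included.
# Encoding (mine): '?x' is a variable, 'a' a constant, ('f', ...) a function
# application.

def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def apply_subst(sigma, t):
    if is_var(t):
        return sigma.get(t, t)
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply_subst(sigma, a) for a in t[1:])
    return t

def occur(x, t):
    """The occur check: does variable x appear anywhere in term t?"""
    if x == t:
        return True
    return isinstance(t, tuple) and any(occur(x, a) for a in t[1:])

def compose(sigma, theta):
    out = {v: apply_subst(sigma, t) for v, t in theta.items()}
    for v, t in sigma.items():
        out.setdefault(v, t)
    return out

def unify(t1, t2):
    """Return (unifiable, sigma), as in Figure 1."""
    if is_var(t1) or is_var(t2):
        x, t = (t1, t2) if is_var(t1) else (t2, t1)
        if x == t:
            return True, {}
        if occur(x, t):
            return False, None
        return True, {x: t}
    if not (isinstance(t1, tuple) and isinstance(t2, tuple)):
        return (t1 == t2), ({} if t1 == t2 else None)   # constants
    if t1[0] != t2[0] or len(t1) != len(t2):
        return False, None
    sigma = {}
    for a1, a2 in zip(t1[1:], t2[1:]):
        ok, tau = unify(apply_subst(sigma, a1), apply_subst(sigma, a2))
        if not ok:
            return False, None
        sigma = compose(tau, sigma)
    return True, sigma

# The Section 1 example: f(x, y) and f(g(y, a), h(a)).
ok, sigma = unify(('f', '?x', '?y'), ('f', ('g', '?y', 'a'), ('h', 'a')))
assert ok and apply_subst(sigma, '?x') == ('g', ('h', 'a'), 'a')
assert unify('?x', ('f', '?x')) == (False, None)   # occur check fires
```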

infinite terms, one theoretical and one practical: The practical version is faster but may not terminate. Mukai [1983] gave a new variation of Colmerauer's practical algorithm and supplied a termination proof. Jaffar [1984] presented another algorithm, designed along the lines of Martelli and Montanari. Jaffar claimed that his algorithm is simple and practical, even for finite terms; he discussed the relative merits of several unification algorithms, including Colmerauer [1982b], Corbin and Bidoit [1983], Martelli and Montanari [1977], Mukai [1983], and Paterson and Wegman [1976]. None of these algorithms is truly linear: The theoretical question of detecting infinite unifiability in linear time was left open by Paterson and Wegman and has yet to be solved.

3. UNIFICATION: DATA STRUCTURES AND ALGORITHMS

As mentioned in the previous section, Robinson [1965] published the first modern unification algorithm. A version of that algorithm appears in Figure 1. (This version is borrowed in large part from Corbin and Bidoit [1983].) The function UNIFY takes two terms (t1 and t2) as arguments and returns two items: (1) the Boolean-valued unifiable, which is true if and only if the two terms unify, and (2) σ, the unifying substitution. The algorithm proceeds from left to right, making substitutions when necessary. The routines occur and compose need some explanation. The occur function reports true if the first argument (a variable) occurs anywhere in the second argument (a term). This function call is known as the occur check and is discussed in more detail in Section 5. The compose operator composes two substitutions according to the definition given in the Introduction of this paper.

Robinson's original algorithm was exponential in time and space complexity. The major problem is one of representation. Efficient algorithms often turn on special data structures; such is the case with unification. This paper will now discuss the various data structures that have been proposed.

First-order terms have one obvious representation, namely, the sequence of symbols we have been using to write them



Figure 2. Tree representation of the term f(g(f(x)), h(f(x))).

down. (Recall the inductive definition given at the beginning of this paper.) In other words, a term can be represented as a linear array whose elements are taken from function symbols, variables, constants, commas, and parentheses. We call this the string representation of a term. (As an aside, it is a linguistic fact that commas and parentheses may be eliminated from a term without introducing any ambiguity.) Robinson's [1965] unification algorithm (Figure 1) uses this representation.

The string representation is equivalent to the tree representation, in which a function symbol stands for the root of a subtree whose children are represented by that function's arguments. Variables and constants end up as the leaves of such a tree. Figure 2 shows the tree representation of the term f(g(f(x)), h(f(x))).

The string and tree representations are acceptable methods for writing down terms, and they find applications in areas in which the terms are not very complicated, for example, Tomita and Knight [1988]. They have drawbacks, however, when used in general unification algorithms. The problem concerns structure sharing.

Consider the terms f(g(f(x)), h(f(x))) and f(g(f(a)), h(f(a))). Any unification algorithm will ensure that the function symbols match and ensure that corresponding arguments to the functions are unifiable. In our case, after processing the subterms f(x) and f(a), the substitution {x ← a} will be made. There is, however, no need to process the second occurrences of f(x) and f(a), since we will just be doing the same work over again. What is needed is a more concise representation for the terms: a graph representation. Figure 3 shows graphs for this pair of terms.

The subterm f(x) is shared; the work to unify it with another subterm need be done only once. If f(x) were a much larger structure, the duplication of effort would of course be more serious. In fact, if subterms are not shared, it may be necessary to generate exponentially large structures during unification (see Corbin and Bidoit [1983], de Champeaux [1986], and Huet [1976]).

The algorithms of Huet [1976], Baxter [1973, 1976], and Jaffar [1984] all use the graph representation. Paterson and Wegman's linear algorithm [1976, 1978] uses a graph representation modified to include parent pointers so that leaves contain information about their parents, grandparents, and so on. The Paterson-Wegman algorithm is still linear for terms represented as strings, since terms can be converted from string representation to graph representation (and vice versa) in linear time [de Champeaux 1986]. Corbin and Bidoit [1983] also use a graph representation that includes parent pointers. The algorithms of Martelli and Montanari [1982] and of Kapur et al. [1982] both deal directly with sets of equations produced by a unification problem. Their representations also share structure.

What I have called the graph representation is known in combinatorics as a directed acyclic graph, or dag, in which all vertices of the graph are labeled. If we allow our graphs to contain cycles, we can model infinitely long terms, and unification may produce substitutions involving such terms. This matter is discussed in connection with the occur check in Section 5.

Before closing this section, I will present a unification algorithm more efficient than the one in Figure 1. I have chosen a variant of Huet's algorithm, since it is rather simple



Figure 3. Term graphs for f(g(f(x)), h(f(x))) and f(g(f(a)), h(f(a))).

function UNIFY(t1, t2) ⇒ (unifiable: Boolean, σ: substitution)
begin
  pairs-to-unify ← {(t1, t2)}
  for each node z in t1 and t2, z.class ← z
  while pairs-to-unify ≠ ∅ do
  begin
    (x, y) ← pop(pairs-to-unify)
    u ← FIND(x)
    v ← FIND(y)
    if u ≠ v then
      if u and v are not variables, and u.symbol ≠ v.symbol or
          numberof(u.subnodes) ≠ numberof(v.subnodes) then
        return (false, nil)
      else
      begin
        w ← UNION(u, v)
        if w = u and u is a variable then u.class ← v
        if neither u nor v is a variable then
        begin
          let (u1, ..., un) be u.subnodes
          let (v1, ..., vn) be v.subnodes
          for i ← 1 to n do
            push((ui, vi), pairs-to-unify)
        end
      end
  end
  Form a new graph composed of the root nodes of the equivalence classes.
  This graph is the result of the unification.
  If the graph has a cycle, return (false, nil), but the terms are infinitely unifiable.
  If the graph is acyclic, return (true, σ), where σ is a substitution in which any variable x is mapped
  onto the root of its equivalence class, that is, FIND(x).
end

Figure 4. A version of Huet's unification algorithm.

and uses the graph representation of terms. The algorithm is shown in Figure 4.

Terms are represented as graphs whose nodes have the following structure:

type node =
    symbol: a function, variable, or constant symbol
    subnodes: a list of nodes that are children of this node
    class: a node that represents this node's equivalence class
end

Huet's algorithm maintains a set of equivalence classes of nodes using the FIND and UNION operations for merging disjoint sets (for an explanation of these operations, see Aho et al. [1974]). Each node starts off in its own class, but as the unification proceeds, classes are merged together. If nodes with different function symbols are merged, failure occurs. At the end, the equivalence classes over the nodes in the two-term graphs form a new graph, namely, the result of the unification. If this



graph contains a cycle, the terms are infinitely unifiable; if the graph is acyclic, the terms are unifiable. Omitting the acyclicity test corresponds to omitting the occur check discussed in Section 5.

The running time of Huet's algorithm is O(nα(n)), where n is the number of nodes and α(n) is an extremely slow-growing function. α(n) never exceeds 5 in practice, so the algorithm is said to be almost linear. Note also that it is nonrecursive, which may increase efficiency. For more details, see Huet [1976] and Vitter and Simons [1984, 1986]. A version of this algorithm, which includes inheritance-based information (see Section 7), can be found in Ait-Kaci [1984] and Ait-Kaci and Nasr [1986].
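The equivalence-class mechanism of Figure 4 can be sketched in Python as follows. This simplified version is my own: it uses bare representative pointers with no union-by-rank or path compression, so it is not almost linear, but it shows the merge-and-cycle-test structure.

```python
# A sketch of the Figure 4 idea (simplifications mine): merge equivalence
# classes of graph nodes, then use a cycle test as the occur check.

class Node:
    def __init__(self, symbol, subnodes=(), is_variable=False):
        self.symbol = symbol
        self.subnodes = list(subnodes)
        self.is_variable = is_variable
        self.cls = self                  # representative of this node's class

def find(n):
    while n.cls is not n:
        n = n.cls
    return n

def unify(t1, t2):
    """Return 'unifiable', 'infinitely unifiable', or 'not unifiable'."""
    pairs = [(t1, t2)]
    while pairs:
        x, y = pairs.pop()
        u, v = find(x), find(y)
        if u is v:
            continue
        if u.is_variable:                # a variable's class absorbs anything
            u.cls = v
        elif v.is_variable:
            v.cls = u
        elif u.symbol != v.symbol or len(u.subnodes) != len(v.subnodes):
            return 'not unifiable'
        else:                            # same symbol: merge and recurse
            u.cls = v
            pairs.extend(zip(u.subnodes, v.subnodes))
    def cyclic(n, seen):                 # cycle over representatives
        r = find(n)
        if r in seen:
            return True
        return any(cyclic(c, seen | {r}) for c in r.subnodes)
    return 'infinitely unifiable' if cyclic(t1, set()) else 'unifiable'

# x and f(x): not unifiable, but infinitely unifiable (Definition 1.7).
x = Node('x', is_variable=True)
assert unify(x, Node('f', [x])) == 'infinitely unifiable'
```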

4. UNIFICATION AND THEOREM PROVING

Robinson's [1965] work on resolution theorem proving served to introduce the unification problem into computer science. Automatic theorem proving (a.k.a. computational logic or automated deduction) became an area of concentrated research, and unification suddenly became a topic of interest. In this section, I will only discuss first-order theorem proving, although there are theorem-proving systems for higher order logic. Higher order unification will be discussed in a later section.

4.1 The Resolution Rule

I will briefly demonstrate, using an example, how unification is used in resolution theorem proving. In simplest terms, resolution is a rule of inference that allows one to conclude from "A or B" and "not-A or C" that "B or C". In real theorem proving, resolution is more complex. Consider, for example, the two facts:

(1) Advisors get angry when students don't take their advice.
(2) If someone is angry, then he doesn't take advice.

The first job is to put these statements, together with the conclusion we are after, into logical form:

(1a) ∀z: student(z) → [¬takesadvice(z, advisor(z)) → angry(advisor(z))]
(2a) ∀x, y: angry(x) → ¬takesadvice(x, y)
(3a) ∀w: student(w) → [angry(w) → angry(advisor(w))]

Next, we drop the universal quantifiers and remove the implication symbols (x → y is equivalent to ¬x ∨ y):

(1b) ¬student(z) ∨ takesadvice(z, advisor(z)) ∨ angry(advisor(z))
(2b) ¬angry(x) ∨ ¬takesadvice(x, y)
(3b) ¬student(w) ∨ ¬angry(w) ∨ angry(advisor(w))

(1b), (2b), and (3b) are said to be in clausal form. Resolution is a rule of inference that will allow us to conclude the last clause from the first two. Here it is in its simplest form:

The Resolution Rule
If clause A contains some term s and clause B contains the negation of some term t, and if s and t are unifiable by a substitution σ, then a resolvent of A and B is generated by (1) combining the clauses from A and B, (2) removing terms s and t, and (3) applying the substitution σ to the remaining terms. If clauses A and B have a resolvent C, then C may be inferred from A and B.

In our example, let s be takesadvice(z, advisor(z)) and let t be takesadvice(x, y). (1b) contains s, and (2b) contains the negation of t. Terms s and t are unifiable under the substitution {x ← z, y ← advisor(z)}. Removing s from (1b) and the negation of t from (2b) and applying the substitution to the rest of the terms in (1b) and (2b), we get the resolvent:

(4) ¬student(z) ∨ angry(advisor(z)) ∨ ¬angry(z)

We want to infer the third fact:

(3) If a student is angry, then so is his advisor.

Expression (4) is the same as (3b), subject to disjunct reordering and renaming of the unbound variable, and therefore resolution has made the inference we intended.
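The resolution step just performed can be sketched in Python. The clause encoding and the simplified unifier below are my own, not from the paper: literals are (sign, term) pairs, variables are '?'-prefixed strings, and the unifier is a naive one (no occur check) that is sufficient for this example.

```python
# A sketch of one resolution step on clauses (1b) and (2b); encoding and
# helper names are mine, not the paper's.

def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def subst(s, t):
    if is_var(t):
        return s.get(t, t)
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(s, a) for a in t[1:])
    return t

def unify(t1, t2, s=None):
    """A naive unifier (no occur check), adequate for this example."""
    s = dict(s or {})
    t1, t2 = subst(s, t1), subst(s, t2)
    if t1 == t2:
        return s
    if is_var(t2):
        s[t2] = t1
        return s
    if is_var(t1):
        s[t1] = t2
        return s
    if isinstance(t1, tuple) and isinstance(t2, tuple) and \
            t1[0] == t2[0] and len(t1) == len(t2):
        for a1, a2 in zip(t1[1:], t2[1:]):
            s = unify(a1, a2, s)
            if s is None:
                return None
        return s
    return None

def resolve(A, B):
    """Yield resolvents of clauses A and B (lists of (sign, term) literals)."""
    for sa, ta in A:
        for sb, tb in B:
            if sa != sb:                         # complementary signs
                s = unify(ta, tb)
                if s is not None:
                    rest = [l for l in A if l != (sa, ta)] + \
                           [l for l in B if l != (sb, tb)]
                    yield [(sg, subst(s, t)) for sg, t in rest]

# (1b) and (2b) from the text; advisor(z) is written as a function term.
c1b = [(False, ('student', '?z')),
       (True,  ('takesadvice', '?z', ('advisor', '?z'))),
       (True,  ('angry', ('advisor', '?z')))]
c2b = [(False, ('angry', '?x')),
       (False, ('takesadvice', '?x', '?y'))]

# The first resolvent is (4): not-student(z) or angry(advisor(z)) or not-angry(z).
resolvents = list(resolve(c1b, c2b))
assert resolvents[0] == [(False, ('student', '?z')),
                         (True,  ('angry', ('advisor', '?z'))),
                         (False, ('angry', '?z'))]
```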


Resolution is a powerful inference rule, so powerful that it is the only rule needed for a sound and complete system of logic. The simple example here is intended only to illustrate the use of unification in the resolution method; for more details about resolution, see Robinson's paper [1965] and Chapter 4 of Nilsson's book [1980].

4.2 Research

Much of the theorem-proving community has recently moved into the field of logic programming, while a few others have moved into higher order theorem proving. I will present the work of these two groups in different sections: Unification and Higher Order Logic and Unification and Logic Programming.

5. UNIFICATION AND LOGIC PROGRAMMING

The idea of programming in logic came directly out of Robinson's work: The original (and still most popular) logic programming language, Prolog, was at first a tightly constrained resolution theorem prover. Colmerauer and Roussel turned it into a useful language [Battani and Meloni 1973; Colmerauer et al. 1973; Roussel 1975], and van Emden and Kowalski [1976] provided an elegant theoretical model based on Horn clauses. Warren et al. [1977] presented an accessible early description of Prolog. Through its use of resolution, Prolog inherited unification as a central operation. A great deal of research in logic programming focuses on efficient implementation, and unification has therefore received some attention. I will first give an example of how unification is used in Prolog; then I will review the literature on unification and Prolog, finishing with a discussion of the logic programming languages Concurrent Prolog, LOGIN, and CIL.

5.1 Example of Unification in Prolog

I turn now to unification as it is used in Prolog. Consider a set of four Prolog assertions (stored in a database) followed by a query. (The example is taken from Clocksin and Mellish [1981].)

likes(mary, food).
likes(mary, wine).
likes(john, wine).
likes(john, mary).

?- likes(mary, X), likes(john, X).

The query at the end asks, "Does Mary like something that John likes?" Prolog takes the first term of the query, likes(mary, X), and tries to unify it with some assertion in the database. It succeeds in unifying the terms likes(mary, X) and likes(mary, food) by generating the substitution {X ← food}. Prolog applies this substitution to every term in the query. Prolog then moves on to the second term, which is now likes(john, food). This term fails to unify with any other term in the database.

Upon failure, Prolog backtracks. That is, it undoes a previous unification, in this case, the unification of likes(mary, X) with likes(mary, food). It attempts to unify the first query term with another term in the database: the only other choice is likes(mary, wine). The terms unify under the substitution {X ← wine}, which is also applied to the second query term likes(john, X) to give likes(john, wine). This term can now unify with a term in the database, namely, the third assertion, likes(john, wine). Done with all query terms, Prolog outputs the substitutions it has found; in this case, X = wine.

I used such a simple example only to be clear. The example shows how Prolog makes extensive use of unification as a pattern-matching facility to retrieve relevant facts from a database. Unification itself becomes nontrivial when the terms are more complicated.
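The backtracking search just described can be sketched with a generator-based mini-interpreter. The encoding is mine, not real Prolog: facts and goals are flat tuples, and '?'-prefixed strings are variables.

```python
# A sketch of Prolog's unify-and-backtrack query evaluation over the
# likes/2 database above (encoding mine: flat tuples, '?X' variables).

FACTS = [('likes', 'mary', 'food'),
         ('likes', 'mary', 'wine'),
         ('likes', 'john', 'wine'),
         ('likes', 'john', 'mary')]

def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def unify(t1, t2, s):
    """Unify two flat tuples of atoms/variables under bindings s."""
    if len(t1) != len(t2):
        return None
    s = dict(s)
    for a, b in zip(t1, t2):
        a, b = s.get(a, a), s.get(b, b)
        if is_var(a):
            s[a] = b
        elif is_var(b):
            s[b] = a
        elif a != b:
            return None
    return s

def solve(goals, s=None):
    """Satisfy all goals against FACTS; generator order = backtracking."""
    s = s or {}
    if not goals:
        yield s
        return
    for fact in FACTS:               # try each fact; failure falls through,
        s1 = unify(goals[0], fact, s)  # which is exactly backtracking
        if s1 is not None:
            yield from solve(goals[1:], s1)

# ?- likes(mary, X), likes(john, X).   The only answer is X = wine.
answers = [s['?X'] for s in solve([('likes', 'mary', '?X'),
                                   ('likes', 'john', '?X')])]
assert answers == ['wine']
```

Note how the first binding {X ← food} is silently discarded when the second goal fails: abandoning the recursive call and moving to the next fact undoes the unification, just as the text describes.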

5.2 Research

Colmerauer's original implementation of unification differed from Robinson's in one important respect: Colmerauer deliberately left out the occur check, allowing Prolog to attempt unification of a variable with a term already containing that variable.



foo(X, X).
?- foo(f(Y), Y).
Y = f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f( ...

Figure 5. Infinite unification in Prolog.

Version 1.
less(X, s(X)).        /* any number is less than its successor */
?- less(s(Y), Y).     /* is there a Y whose successor is less than Y? */
Y = s(s(s(s(s(s(s(s(s(s(s(s(s(s(s(s( ...

Version 2.
less(X, s(X)).
foo :- less(s(Y), Y). /* we don't care about the value of Y */
?- foo.               /* is there a number whose successor is less than it? */
yes.                  /* there is?! */

Figure 6. Prolog draws the wrong conclusion because it lacks an occur check.
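What the occur check changes in Figures 5 and 6 can be shown in a minimal sketch. The encoding and the `bind` helper are mine, not from the paper: with the check, unifying x with f(x) fails; without it, the self-referential binding {x ← f(x)} unfolds forever.

```python
# A sketch of skipping the occur check on x and f(x) (encoding mine:
# '?'-prefixed strings are variables, tuples are applications).

def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def occur(x, t):
    """Does variable x appear anywhere in term t?"""
    if x == t:
        return True
    return isinstance(t, tuple) and any(occur(x, a) for a in t[1:])

def bind(x, t, occur_check=True):
    """Unify a variable x with term t, optionally with the occur check."""
    if occur_check and occur(x, t):
        return None                       # Robinson: not unifiable
    return {x: t}                         # Prolog: a self-referential binding

t = ('f', '?x')
assert bind('?x', t) is None              # occur check reports failure
s = bind('?x', t, occur_check=False)      # {'?x': ('f', '?x')}

def apply_once(s, t):
    if is_var(t):
        return s.get(t, t)
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply_once(s, a) for a in t[1:])
    return t

# Applying s repeatedly unfolds x toward f(f(f(...))), the infinite term
# of Figure 5; no finite number of applications reaches a fixed point.
u = '?x'
for _ in range(3):
    u = apply_once(s, u)
assert u == ('f', ('f', ('f', '?x')))
```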

For example, Prolog will unify the terms x and f(x), using the substitution {x ← f(f(f(...)))}; Robinson's occur check, on the other hand, would instantly report nonunifiability.

Discarding the occur check corresponds to moving from unification to infinite unification. Figure 5 shows Prolog's infinite unification in action. In unifying f(Y) and Y, Prolog comes up with an infinite substitution.

Omitting the occur check results in efficiency gains that outweigh the risk incurred. For example, the concatenation of two lists, a linear-time operation without the occur check, becomes an O(n²) time operation with the occur check [Colmerauer 1982b]. For this reason, most current Prolog interpreters omit the occur check. But since its unification is really infinite unification, in this respect Prolog can be said to be unsound. Figure 6 (adapted from Plaisted [1984]) shows a case in which Prolog draws an unexpected conclusion, one that would be blocked by an occur check.

Colmerauer [1982b, 1983] studied infinite terms in an effort to provide a theoretical model for real Prolog implementations

that lack the occur check. He examined the termination problem and produced a unification algorithm for infinite terms. Plaisted [1984] presented a preprocessor for Prolog programs that detects potential problems due to Prolog's lack of an occur check. See the end of Section 2 for a discussion of algorithms for infinite unification.

As noted above, Prolog is a search engine, and when a particular search path fails, backtracking is necessary. As a result, some of the unifications performed along the way must be undone. Studies of efficient backtracking can be found in Bruynooghe and Pereira [1984], Chen et al. [1986], Cox [1984], Matwin and Pietrzykowski [1982], and Pietrzykowski and Matwin [1982]. Mannila and Ukkonen [1986] viewed Prolog execution as a sequence of unifications and deunifications, relating it to the UNION/FIND (disjoint set union) problem. They showed how standard (fast) unification algorithms can have poor asymptotic performance when we consider deunifications; algorithms with better performance are given.

Concurrent Prolog [Shapiro 1983] extends Prolog to concurrent programming

ACM Computing Surveys, Vol. 21, No. 1, March 1989


and parallel execution. In this model, unification may have one of three possible outcomes: succeed, fail, or suspend. Unification suspends when a special read-only variable is encountered; execution of the unification call is restarted when that variable takes on some value. It is not possible to rely on backtracking in the Concurrent Prolog model because all computations run independently and in parallel. Thus, unifications cannot simply be undone as in Prolog, and local copies of structures must be maintained at each unification. Strategies for minimizing these copies are discussed in Levy [1983].
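The three-outcome protocol can be caricatured in a few lines. This is a toy model, not Shapiro's implementation; the `bind` function and its arguments are invented for illustration:

```python
SUCCEED, FAIL, SUSPEND = "succeed", "fail", "suspend"

def bind(current_value, read_only, incoming):
    """One binding step in the three-outcome model: a bound variable
    behaves as usual, but an unbound read-only variable suspends the
    caller until some other process writes it."""
    if current_value is not None:
        return SUCCEED if current_value == incoming else FAIL
    if read_only:
        return SUSPEND                        # wait for a writer
    return SUCCEED                            # bind it ourselves

print(bind(None, True, "a"))                  # → suspend
```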

Aït-Kaci and Nasr [1986] present a new logic programming language called LOGIN, which integrates inheritance-based reasoning directly into the unification process. The language is based on ideas from Aït-Kaci's dissertation [1984], described in more detail in Section 7. Mukai [1985a, 1985b] and Mukai and Yasukawa [1985] introduce a variant of Prolog called CIL aimed at natural language applications, particularly those involving situation semantics [Barwise and Perry 1983]. CIL is based on an extended unification that uses conditional statements, roles, and types and contains many of the ideas also present in LOGIN. Hasida [1986] presents another variant of unification that handles conditions on patterns to be unified.

6. UNIFICATION AND HIGHER ORDER LOGIC

First-order logic is sometimes too limited or too unwieldy for a given problem. Second-order logic, which allows variables to range over functions (and predicates) as well as constants, can be more useful. Consider the statement "Cats have the same annoying properties as dogs." We can express this in second-order logic as

∀(x)∀(y)∀(f): [cat(x) ∧ dog(y) ∧ annoying(f)] → [f(x) ↔ f(y)]

Note that the variable f ranges over predicates, not constant values. In a more mathematical vein, here is a statement of the induction property of natural numbers:

∀(f): [f(0) ∧ ∀(x)[f(x) → f(x + 1)]] → ∀(x)f(x)

That is, for any predicate f, if f holds for 0 and if f(x) implies f(x + 1), then f holds for all natural numbers. For a theorem prover to deal with such statements, it is natural to work with axioms and inference rules of a higher order logic. In such a case, unification of higher order terms has great utility.

Unification in higher order logic requires some special notation for writing down higher order terms. The typed λ-calculus [Church 1940; Henkin 1950] is one commonly used method. I will not introduce any formal definitions, but here is an example: λ(u, v)(u) stands for a function of two arguments, u and v, whose value is always equal to the first argument, u. Unification of higher order terms involves λ-conversion in conjunction with application of substitutions. Finally, we must distinguish between function constants (denoted A, B, C, ...) and function variables (denoted f, g, h, ...), which range over those constants. Now we can look at unification in the higher order realm.

6.1 Example of Second-Order Unification

Consider the two terms f(x, b) and A(y). Looking for a unifying substitution for variables f, x, and y, we find

f ← λ(u, v)(u)
x ← A(y)
y ← y

This substitution produces the unification A(y). But if we keep looking, we find another unifier:

f ← λ(u, v)A(g(u, v))
x ← x
y ← g(x, b)

In this case, the unification is A(g(x, b)). Notice that neither of the two unifiers is


more general than the other! A pair of higher order terms, then, may have more than one "most general" unifier. Clearly, the situation is more complex than that of first-order unification.

6.2 Research

Gould [1966a, 1966b] was the first to investigate higher order unification in his dissertation.* He showed that some pairs of higher order terms possess many "general" unifiers, as opposed to the first-order case, where terms always (if unifiable) have a unique MGU. Gould gave an algorithm to compute general unifiers, but his algorithm did not return a complete set of such unifiers. He conjectured that the unifiability problem for higher order logic was solvable.

Robinson [1968] urged that research in automatic theorem proving be moved into the higher order realm. Robinson was nonplussed by Gödel's incompleteness results and maintained that Henkin's [1950] interpretation of logical validity would provide the completeness needed for a mechanization of higher order logic. Robinson [1969, 1970] followed up these ideas. His theorem provers, however, were not completely mechanized; they worked in interactive mode, requiring a human to guide the search for a proof.

Andrews [1971] took Robinson's original resolution idea and applied it rigorously to higher order logic. But, like Robinson, he only achieved partial mechanization: Substitution for predicate variables could not be done automatically. Andrews describes an interactive higher order theorem prover based on generalized resolution in Andrews and Cohen [1977] and Andrews et al. [1984].

Darlington [1968, 1971] chose a different route. He used a subset of second-order logic that allowed him only a little more expressive power than first-order logic. He was able to mechanize this logic completely using a new unification algorithm (first called "f-matching"). To repeat, he was not using full second-order logic; his algorithm could only unify a predicate variable with a predicate constant of greater arity.

* The above example is taken from that dissertation.

Huet [1972] in his dissertation presented a new refutational system for higher order logic based on a simple unification algorithm for higher order terms. A year later, part of that dissertation was published as Huet [1973b]. It showed that the unification problem for third-order logic was undecidable; that is, there exists no effective procedure to determine whether two third- (or higher) order terms are unifiable. Lucchesi [1972] independently made this same discovery.

Baxter [1978] extended Huet's and Lucchesi's undecidability results to third-order dyadic unification. Three years later, Goldfarb [1981] demonstrated the undecidability of the unification problem for second-order terms by a reduction of Hilbert's Tenth Problem to it.

Huet's refutational system is also described in Huet [1973a], and his higher order unification algorithm can be found in Huet [1975]. Since ω-order unification is in general undecidable, Huet's algorithm may not halt if the terms are not unifiable. The algorithm first searches for the existence of a unifier, and if one is shown to exist, a very general preunifier is returned.

Huet's system was based on the typed λ-calculus introduced by Church [1940]. The λ-calculus was the basis for Henkin's [1950] work on higher order completeness mentioned above in connection with Robinson [1968]. Various types of mathematical theorems can be expressed naturally and compactly in the typed λ-calculus, and for this reason almost all work on higher order unification uses this formalism.

Pietrzykowski and Jensen also studied higher order theorem proving. Pietrzykowski [1973] gave a complete mechanization of second-order logic, which he and Jensen extended to ω-order logic in Jensen and Pietrzykowski [1976] and Pietrzykowski and Jensen [1972, 1973]. Like Huet, they based their mechanization on higher order unification. Unlike Huet, however, they presented an algorithm that computes a complete set of general unifiers.3 The algorithm is more complicated than Huet's,

3 Of course, the algorithm still may not terminate if the terms are nonunifiable.
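The two unifiers from the example in Section 6.1 can be checked mechanically. Here is a toy verification that treats the λ-bindings as Python functions and the first-order parts as strings; the helpers `A` and `g` are stand-ins for the constants of that example, and the whole thing is illustrative only:

```python
def A(t):                        # the function constant A
    return f"A({t})"

def g(u, v):                     # the function constant g
    return f"g({u}, {v})"

# First unifier: f <- lambda(u, v)(u), x <- A(y), y <- y
f1 = lambda u, v: u
x1, y1 = A("y"), "y"
assert f1(x1, "b") == A(y1) == "A(y)"

# Second unifier: f <- lambda(u, v) A(g(u, v)), x <- x, y <- g(x, b)
f2 = lambda u, v: A(g(u, v))
x2, y2 = "x", g("x", "b")
assert f2(x2, "b") == A(y2) == "A(g(x, b))"
```

Both substitutions make f(x, b) and A(y) identical after β-reduction, yet neither is an instance of the other.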


using five basic types of substitutions as opposed to Huet's two, and is generally redundant.

Winterstein [1976] reported that even for the simple case of monadic (one argument per function) second-order terms, unification could require exponential time under the algorithms proposed by Huet, Pietrzykowski, and Jensen. Winterstein and Huet independently arrived at better algorithms for monadic second-order unification.

Darlington [1977] attempted to improve the efficiency of higher order unification by sacrificing completeness. If two terms unify, Darlington's algorithm may not report this fact, but he claims that in the usual case unification runs more swiftly.

As mentioned above, Goldfarb [1981] showed that higher order unification is in general undecidable. There are, however, variants that are decidable. Farmer [1987] shows that second-order monadic unification (discussed two paragraphs ago) is actually decidable. Siekmann [1984] suggests that higher order unification under some theory T might be solvable (see Section 9), but there has been no proof to this effect.

7. UNIFICATION AND FEATURE STRUCTURES

First-order terms turn out to be too limited for many applications. In this section, I will present structures of a somewhat more general nature. The popularity of these structures has resulted in a line of research that has split away from mainstream work in theorem proving and logic programming. An understanding of these structures is necessary for an understanding of much research on unification.

Kay [1979] introduced the idea of using unification for manipulating the syntactic structures of natural languages (e.g., English and Japanese). He formalized the linguistic notion of features in what are now known as feature structures.

Feature structures resemble first-order terms but have several restrictions lifted:

• Substructures are labeled symbolically, not inferred by argument position.
• Fixed arity is not required.

    Figure 7. A feature structure.

• The distinction between function and argument is removed.
• Variables and coreference are treated separately.

I will now discuss each of these differences.

Substructure labeling. Features are labeled symbolically, not inferred by argument position within a term. For example, we might want the term person(john, 23, 70, spouse(mary, 25)) to mean "There is a person named John, who is 23 years old, 70 inches tall, and who has a 25-year-old spouse named Mary." If so, we must remember the convention that the third argument of such a term represents a person's height. An equivalent feature structure, using explicit labels for substructure, is shown in Figure 7. (I have used a standard, but not unique, representation for writing down feature structures.)

Fixed arity. Feature structures do not have the fixed arity of terms. Traditionally, feature structures have been used to represent partial information; unification combines partial structures into larger structures, assuming no conflict occurs. In other words, unification can result in new structures that are wider as well as deeper than the old structures. Figure 8 shows an example (the square cup stands for unification). Notice how all of the information contained in the first two structures ends up in the third structure. If the type features of the two structures had held different values, however, unification would have failed.

The demoted functor. In a term, the functor (the function symbol that begins the term) has a special place. In feature structures, all information has equal status. Sometimes one feature is primary for a


Figure 8. Unification of two feature structures.

Figure 9. A feature structure with variables and internal coreference constraints.
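The widening merge of Figure 8 can be sketched with nested dictionaries. This is a simplified model that ignores coreference, and `fs_unify` is an invented name:

```python
def fs_unify(a, b):
    """Unify two feature structures modeled as nested dicts.
    Atomic values must match exactly; substructures merge recursively."""
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if a == b else None
    merged = dict(a)
    for label, value in b.items():
        if label in merged:
            sub = fs_unify(merged[label], value)
            if sub is None:
                return None                   # feature clash: unification fails
            merged[label] = sub
        else:
            merged[label] = value             # the result widens

    return merged

print(fs_unify({"type": "person", "name": "john"},
               {"type": "person", "age": 23}))
# → {'type': 'person', 'name': 'john', 'age': 23}
print(fs_unify({"type": "person"}, {"type": "dog"}))   # → None
```

All of the information in both inputs survives in the result, exactly the merging behavior the text describes; a clash on an atomic value makes the whole unification fail.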

particular use, but this is not built into the syntax.

Variables and coreference. Variables in a term serve two purposes: (1) They are place holders for future instantiation, and (2) they enforce equality constraints among different parts of the term. For example, the term person(john, Y, Z, spouse(W, Y)) expresses the idea "John can marry anyone exactly as old as he is" (because the variable Y appears in two different places). Note, however, that equality constraints are restricted to the leaves of a term.4 Feature structures allow within-term coreferences to be expressed by separating the two roles of variables.

If we wished, for example, to express the constraint that John must marry his best friend, we might start with a term such as

person(john, Y, Z, spouse(W, X), bestfriend(W, X))

This is awkward and unclear, however; it is tedious to introduce new variables for each possible leaf, and moreover, the term seems to imply that John can marry anyone as long as she has the same name and age as his best friend. What we really want is a variable to equate the fourth and fifth arguments without concern for their internal structures. Figure 9 shows the feature structure formalization.

The boxed number indicates coreference. Features with values marked by the same

4 This is construed in the sense of the term's graph representation.


coreference label share the same value (in the sense of LISP's EQ, not EQUAL). The value itself can be placed after any one of the coreference labels, the choice of which is arbitrary. The symbol [ ] indicates a variable. A variable [ ] can unify with any other feature structure s.

Like terms, feature structures can also be represented as directed graphs (see Section 3). This representation is often more useful for implementation or even presentation. Whereas term graphs are labeled on vertices, feature structure graphs have labels on arcs and leaves. Figure 10 shows the graph version of the feature structure of Figure 7. The graph of Figure 10, of course, is just a tree. In order to express something such as "John must marry a woman exactly as old as he is," we need coreference, indicated by the graph in Figure 11.
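The EQ-style sharing can be illustrated directly: if two features hold the very same mutable object, information added along one path is visible along the other. A toy illustration only, with invented names:

```python
friend = {"name": "mary"}            # one shared substructure
john = {"name": "john",
        "spouse": friend,            # both features hold the very
        "bestfriend": friend}        # same object, not equal copies

john["spouse"]["age"] = 25           # extend it along one path...
print(john["bestfriend"]["age"])     # → 25 (...and it shows along the other)
assert john["spouse"] is john["bestfriend"]
```

Two structurally EQUAL but distinct copies would not behave this way; updating one copy would leave the other unchanged.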

This ends the discussion of feature structures and terms. At this point, a few other structures are worth mentioning: frames, ψ-terms, and LISP functions. Frames [Minsky 1975], a popular representation in artificial intelligence, can be modeled with feature structures. We view a feature as a slot: Feature labels become slot names and feature values become slot values. As in the frame conception, a value may be either atomic or complex, perhaps having substructure of its own.

The ψ-terms of Aït-Kaci [1984, 1986] and Aït-Kaci and Nasr [1986] are very similar to feature structures. Subterms are labeled symbolically, and fixed arity is not required, as in feature structures, but


Figure 10. Graph representation of the feature structure shown in Figure 7.

Figure 11. Graph representation of a feature structure with variables and coreference.

the functor is retained. And like feature structures, variables and coreference are handled separately. Although feature structures predate ψ-terms, Aït-Kaci's discussion of arity, variables, and coreference is superior to those found in the feature structure literature.

The novel aspect of ψ-terms is the use of type inheritance information. Aït-Kaci views a term's functors and variables as filters under unification, since two structures with different functors can never unify, whereas variables can unify with anything. He questions and relaxes this open/closed behavior by allowing type information to be attached to functors and variables. Unification uses information from a taxonomic hierarchy to achieve a more gradual filtering.

As an example, suppose we have the following inheritance information: Birds and fish are animals, a fish-eater is an animal, a trout is a fish, and a pelican is both a bird and a fish-eater. Then unifying the following two ψ-terms,

fish-eater(likes ⇒ trout)
bird(color ⇒ brown; likes ⇒ fish)

yields the new ψ-term

pelican(color ⇒ brown; likes ⇒ trout)
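This gradual filtering can be sketched over the pelican taxonomy. Hypothetical code: the `parents` table and helper names are assumptions of this sketch, and only the head symbols are handled, not full ψ-term unification:

```python
parents = {                         # each type's direct supertypes
    "animal": set(),
    "bird": {"animal"},
    "fish": {"animal"},
    "fish-eater": {"animal"},
    "trout": {"fish"},
    "pelican": {"bird", "fish-eater"},
}

def ancestors(t):
    """t together with everything above it in the taxonomy."""
    seen = {t}
    for p in parents[t]:
        seen |= ancestors(p)
    return seen

def glb(a, b):
    """Greatest lower bound: the most general type lying below both."""
    below_both = [t for t in parents if {a, b} <= ancestors(t)]
    for t in below_both:
        if all(t in ancestors(c) for c in below_both):
            return t
    return None                     # no common subtype: unification fails

print(glb("bird", "fish-eater"))    # → pelican
print(glb("trout", "fish"))         # → trout
```

Where a classical unifier would fail on the distinct head symbols, the lookup returns pelican and trout, mirroring the ψ-term result above.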

Unification does not fail when it sees conflicts between fish-eater and bird or between trout and fish. Instead, it resolves the conflict by finding the greatest lower bound of the two items in the taxonomic hierarchy, in this case pelican and trout, respectively. In this way, Aït-Kaci's system naturally extends the information-merging nature of unification.

As a third structure, I turn to the function in the programming language LISP. Functions are not directly related to feature structures, but some interesting parallels


can be drawn. Older dialects of LISP required that parameters to functions be passed in an arbitrary, specified order. Common LISP [Steele 1984], however, includes the notion of keyword parameter. This allows parameters to be passed along with symbolic names; for example,

(make-node :state '(1 2 3) :score 12 :successors nil)

which is equivalent to

(make-node :score 12 :state '(1 2 3) :successors nil)

Keyword parameters correspond to the explicit labeling of substructure discussed above. Note that Common LISP functions are not required to use the keyword parameter format. Common LISP also allows for optional parameters to a function, which seems to correspond to the lack of fixed arity in feature structures. Common LISP, however, is not quite as flexible, since all possible optional parameters must be specified in advance.

Finally, feature structures may contain variables. LISP is a functional language, and that means a function's parameters must be fully instantiated before the function can begin its computation. In other words, variable binding occurs in only one direction, from parameters to their values. Contrast this behavior with that of Prolog, in which parameter values may be temporarily unspecified.

8. UNIFICATION AND NATURAL LANGUAGE PROCESSING

This section makes extensive use of the ideas covered in Section 7. Only with this background am I able to present examples and literature on unification in natural language processing.

8.1 Parsing with a Unification-Based Grammar

Unification-based parsing systems typically contain grammar rules and lexical entries. Lexical entries define words, and

grammar rules define ways in which words may be combined with one another to form larger units of language called constituents. Sample constituent types are the noun phrase (e.g., "the man") and the verb phrase (e.g., "kills bugs"). Grammar rules in natural language systems share a common purpose with grammar rules found in formal language theory and compiler theory: the description of how smaller constituents can be put together into larger ones.

Figure 12 shows a rule (called an augmented context-free grammar rule) taken from a sample unification-based grammar. It is called an augmented context-free grammar rule because at its core is the simple context-free rule S → NP VP, meaning that we can build a sentence (S) out of a noun phrase (NP) and a verb phrase (VP). The equations (the augmentation) serve two purposes: (1) to block applications of the rule in unfavorable circumstances and (2) to specify structures that should be created when the rule is applied.

The idea is this: The rule builds a feature structure called X0 using (already existent) feature structures X1 and X2. Each feature structure, in this case, has two substructures, one labeled category, the other labeled head. Category tells what general type of syntactic object a structure is, and head stores more information.

Let us walk through the equations. The first equation states that the feature structure X0 will have the category of S (sentence). The next two equations state that the rule only applies if the categories of X1 and X2 are NP and VP, respectively. The fourth equation states that the number agreement feature of the head of X1 must be the same as the number agreement feature of the head of X2. Thus, this rule will not apply to create a sentence like "John swim." The fifth equation states that the subject feature of the head of X0 must be equal to the head of X1; that is, after the rule has applied, the head of X0 should be a structure with at least one feature, labeled subject, and the value of that feature should be the head of the structure X1. The sixth equation states that the head of X0 should contain all of the features of


X0 → X1 X2
(X0 category) = S
(X1 category) = NP
(X2 category) = VP
(X1 head agreement) = (X2 head agreement)
(X0 head subject) = (X1 head)
(X0 head) = (X2 head)
(X0 head mood) = declarative

Figure 12. Sample rule from a unification-based grammar.

Figure 13. Feature structures representing analyses of "the man" and "kills bugs."

Figure 14. Two feature structures combined with dummy features x1 and x2.

the head of X2 (i.e., the sentence inherits the features of the verb phrase). The last equation states that the head of X0 should have a mood feature whose (atomic) value is declarative.

Now where does unification come in? Unification is the operation that simultaneously performs the two tasks of building up structure and blocking rule applications. Rule application works as follows:

(1) Gather constituent structures. Suppose we have already built up the two structures shown in Figure 13. These structures represent analyses of the phrases "the man" and "kills bugs." In parsing, we want to combine these structures into a larger structure of category S.

(2) Temporarily combine the constituent structures. NP and VP are combined into a single structure by way of dummy features x1 and x2 (Figure 14).

(3) Represent the grammar rule itself as a feature structure. The feature structure for the sample rule given above is shown in Figure 15. The boxed coreference labels enforce the equalities expressed in the rule's augmentation.
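The equations of Figure 12 can be collapsed into a direct sketch that tests them and builds X0 by hand. Hypothetical code without general coreference machinery; `apply_s_rule` and the structures are invented for illustration:

```python
def apply_s_rule(np, vp):
    """Test the equations of Figure 12 and, if they hold, build X0.
    Returns None when an equation fails, mimicking failed unification."""
    if np.get("category") != "NP" or vp.get("category") != "VP":
        return None                              # category equations fail
    if np["head"]["agreement"] != vp["head"]["agreement"]:
        return None                              # blocks "John swim"
    head = dict(vp["head"])                      # (X0 head) = (X2 head)
    head["subject"] = np["head"]                 # (X0 head subject) = (X1 head)
    head["mood"] = "declarative"                 # (X0 head mood) = declarative
    return {"category": "S", "head": head}

np = {"category": "NP",
      "head": {"det": "the", "root": "man", "agreement": "singular"}}
vp = {"category": "VP",
      "head": {"root": "kill", "tense": "present", "agreement": "singular",
               "object": {"root": "bug", "agreement": "plural"}}}

s = apply_s_rule(np, vp)
print(s["head"]["subject"]["root"])              # → man
```

A plural VP head would make the agreement test fail and the rule would simply not apply, just as the unification-based account predicts.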


Figure 15. Feature structure representation of the grammar rule in Figure 12.

Figure 16. Result of unifying constituent and rule structures (Figures 14 and 15).

(4) Unify the constituent structure with the rule structure. We then get the structure shown in Figure 16. In this manner, unification builds larger syntactic constituents out of smaller ones.

(5) Retrieve the substructure labeled X0.

In practice, this is the only information in which we are interested. (The final structure is shown in Figure 17.)

Now suppose the agreement feature of the original VP had been plural. Unification would have failed, since the NP has the same feature with a different (atomic) value, and the rule would fail to apply, blocking the parse of "The man kills bugs."

A few issues bear discussion. First, where did the structures X1 and X2 come from? In unification-based grammar, the lexicon, or dictionary, contains basic feature structures for individual words. Figure 18, for example, shows a possible lexical entry for the word "man." All higher level feature structures are built up from these lexical structures (and the grammar rules).

Second, how are grammar rules chosen? It seems that only a few rules can possibly apply successfully to a given set of structures. Here is where the category feature comes into play. Any parsing method, such as Earley's algorithm [Earley 1968, 1970] or generalized LR parsing [Tomita 1985a, 1985b, 1987], can direct the choice of grammar rules to apply, based on the categories of the constituents generated.

Third, what are the structures used for? Feature structures can represent syntactic, semantic, or even discourse-based information. Unification provides a kind of constraint-checking mechanism for merging information from various sources. The feature structure built up from the last rule application is typically the output of the parser.


Figure 17. Feature structure analysis of "The man kills bugs."

Figure 18. Sample lexical entry for the word "man."

Finally, why do the rules use equations rather than explicit tests and structure-building operations? Recall that unification both blocks rule applications if certain tests are not satisfied and builds up new structures. Another widely used grammar formalism, the Augmented Transition Network (ATN) of Woods [1970], separates these two activities. An ATN grammar is a network of states, and parsing consists of moving from state to state in that network. Along the way, the parser maintains a set of registers, which are simply variables that contain arbitrary values. When the parser moves from one state to another, it may test the current values of any registers and may, if the tests succeed, change the values of any registers. Thus, testing and building are separate activities.

An advantage of using unification, however, is that it is, by its nature, bidirectional. Since the equations are stated declaratively, a rule could be used to create smaller constituents from larger ones. One interesting aspect of unification-based grammars is the possibility of using the same grammar to parse and generate natural language.

8.2 Research

An excellent, modern introduction to the use of unification in natural language processing is Shieber's [1986] book. This book

presents motivations and examples far beyond the above exposition. The reader interested in this area is urged to find this source. Now I will discuss research on feature structures, unification algorithms, and unification-based grammar formalisms.

Karttunen [1984] discusses features and values at a basic level and provides linguistic motivation for both unification and generalization (see also Section 11). Structure-sharing approaches to speeding up unification are discussed by Karttunen and Kay [1985] and by Pereira [1985], who imports Boyer and Moore's [1972] approach for terms. Wroblewski [1987] gives a simple nondestructive unification algorithm that minimizes copying.

Much unpublished research has dealt with extensions to unification. For example, it is often useful to check for the presence or absence of a feature during unification rather than simply merging features. Other extensions include (1) multiple-valued features: a feature may take on more than one value, for example, "big, red, round"; (2) negated features: a feature may be specified by the values it cannot have rather than one it can have; and (3) disjunctive features: a feature may be specified by a set of values, at least one of which must be its true value.

Disjunctive feature structures are of great utility in computational linguistics, but their manipulation is difficult. Figure 19 shows an example of unification of disjunctive feature structures. The curly brackets indicate disjunction; a feature's value may be any one of the structures inside the brackets. When performing unification, one must make sure


Figure 19. Unification of disjunctive feature structures.
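The cross multiplication behind Figure 19, pairing up the alternatives and keeping every combination that unifies, can be sketched as a self-contained toy (invented names; in this small example three of the four pairings survive):

```python
def fs_unify(a, b):
    """Minimal feature-structure unification over nested dicts."""
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if a == b else None
    merged = dict(a)
    for k, v in b.items():
        merged[k] = fs_unify(merged[k], v) if k in merged else v
        if merged[k] is None:
            return None
    return merged

def disj_unify(alts_a, alts_b):
    """Cross multiply two disjunctive values (lists of alternatives),
    keeping every pairwise combination that unifies."""
    pairs = (fs_unify(a, b) for a in alts_a for b in alts_b)
    return [r for r in pairs if r is not None]

# Three of the four pairings survive in this toy example.
print(disj_unify([{"c": 1}, {"d": 2}], [{"c": 1}, {"c": 3}]))
```

The quadratic blowup of this pairwise product is exactly why naive disjunctive unification is expensive and why the average-case algorithms cited below matter.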

that all possibilities are maintained. This usually involves cross multiplying the values of the two disjunctive features. In the above example, there are four possible ways to combine the values of the b feature; three are successful, and one fails (because of the conflicting assignment to the d feature).

The mathematical basis of feature structures has been the subject of intensive study. Pereira and Shieber [1984] developed a denotational semantics for unification grammar formalisms. Kasper and Rounds [1986] and Rounds and Kasper [1986] presented a logical model to describe feature structures. Using their formal semantics, they were able to prove that the problem of unifying disjunctive structures is NP-complete. Disjunctive unification has great utility, however, and several researchers, including Kasper [1987] and Eisele and Dorre [1988], have come up with algorithms that perform well in the average case. Rounds and Manaster-Ramer [1987] discussed Kay's Functional Grammar in terms of his logical model. Moshier and Rounds [1987] extended the logic to deal with negation and disjunction by means of intuitionistic logic. Finally, in his dissertation, Aït-Kaci [1984] gave a semantic account of ψ-terms, logical constructs very close to feature structures; these constructs were discussed in Section 7.

The past few years have seen a number of new grammar formalisms for natural languages. A vast majority of these formalisms use unification as the central operation for building larger constituents out of smaller ones. Gazdar [1987] enumerated them in a brief survey, and Sells [1985] described the theories underlying two of the more popular unification grammar formalisms. Gazdar et al. [1987a] present a more complete bibliography. The formalisms can be separated roughly into families:

In the North American Family, we have formalisms arising from work in computational linguistics in North America in the late 1970s. Functional Unification Grammar (FUG; previously FG and UG) is described by Kay [1979, 1984, 1985a]. Lexical Functional Grammar (LFG) is introduced in Bresnan's [1982] book.

The Categorial Family comes out of theoretical linguistic research on categorial grammar. Representatives are Unification Categorial Grammar (UCG), Categorial Unification Grammar (CUG), Combinatory Categorial Grammar (CCG), and Meta-Categorial Grammar (MCG). UCG was described by Zeevat [1987], CUG by Uszkoreit [1986], and CCG by Wittenburg [1986]. Karttunen [1986b] also presented work in this area.

The GPSG Family begins with Generalized Phrase Structure Grammar (GPSG), a modern version of which appears in Gazdar and Pullum [1985]. Work at ICOT in Japan resulted in a formalism called JPSG [Yokoi et al. 1986]. Pollard and colleagues have been working recently on Head-Driven Phrase Structure Grammar (HPSG); references include Pollard [1985] and Sag and Pollard [1987]. Ristad [1987] introduces Revised GPSG (RGPSG).

The Logic Family consists of grammar formalisms arising from work in logic programming. Interestingly, Prolog was developed with natural language applications in mind: Colmerauer's [1978, 1982a] Metamorphosis Grammars were the first grammars in this family. Pereira and Warren's [1980] Definite Clause Grammar (DCG) has become extremely popular. Other logic grammars include Extraposition Grammar


(XG) [Pereira 1981], Gapping Grammars [Dahl 1984; Dahl and Abramson 1984], Slot Grammar [McCord 1980], and Modular Logic Grammar (MLG) [McCord 1985].

Prolog has been used in a variety of ways for analyzing natural languages. Definite Clause Grammar (DCG) is a basic extension of Prolog, for example. Bottom-up parsers for Prolog are described in Matsumoto and Sugimura [1987], Matsumoto et al. [1983], and Stabler [1983]. Several articles on logic programming and natural language processing are collected in Dahl and Saint-Dizier [1985]. Implementations of LFG in Prolog are described in Eisele and Dorre [1986] and Reyle and Frey [1983].

Pereira [1987] presented an interesting investigation of unification grammars from a logic programming perspective. In a similar vein, Kay [1985b] explained the need to go beyond the capabilities of unification present in current logic programming systems.

Finally, in the miscellaneous category, we have PATR-II, an (implemented) grammar formalism developed by Karttunen [1986b], Shieber [1984, 1985], and others. PATR is actually an environment in which grammars can be developed for a wide variety of unification-based formalisms. In fact, many of the formalisms listed above are so similar that they can be translated into each other automatically; for example, see Reyle and Frey [1983]. Hirsh [1986] described a compiler for PATR grammars.

9. UNIFICATION AND EQUATIONAL

THEORIES

Unification under equational theories is a very active area of research. The research has been so intense that there are several papers devoted entirely to classifying previous research results [e.g., Siekmann 1984, 1986; Walther 1986]. Many open problems remain, and the area promises to provide new problems for the next decade.

9.1 Unification as Equation Solving

At the very beginning of this paper, I presented an example of unification. The terms were s = f(x, y) and t = f(g(y, a),

h(a)), and the most general unifier σ was

x ← g(h(a), a)
y ← h(a)
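Robinson-style syntactic (null-theory) unification, which produces this most general unifier, can be sketched in a few lines of Python. The term encoding below is our own illustration, not the survey's notation: a variable is a string such as 'x', a term f(t1, ..., tn) is the tuple ('f', t1, ..., tn), and a constant a is the zero-argument tuple ('a',).

```python
# A minimal sketch of first-order unification with an occur check.

def walk(t, s):
    """Follow variable bindings in substitution s until t is not a bound variable."""
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    """True if variable v occurs in term t under substitution s."""
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])

def unify(t1, t2, s=None):
    """Return a most general unifier as a dict of bindings, or None on failure."""
    s = {} if s is None else s
    t1, t2 = walk(t1, s), walk(t2, s)
    if t1 == t2:
        return s
    if isinstance(t1, str):                      # unbound variable on the left
        return None if occurs(t1, t2, s) else {**s, t1: t2}
    if isinstance(t2, str):                      # unbound variable on the right
        return unify(t2, t1, s)
    if t1[0] != t2[0] or len(t1) != len(t2):     # clash of function symbols
        return None
    for a, b in zip(t1[1:], t2[1:]):
        s = unify(a, b, s)
        if s is None:
            return None
    return s

# s = f(x, y) and t = f(g(y, a), h(a))
mgu = unify(('f', 'x', 'y'), ('f', ('g', 'y', ('a',)), ('h', ('a',))))
# bindings come out triangular: x -> g(y, a) and y -> h(a);
# resolving y inside x's binding gives x = g(h(a), a), the sigma above
```

The bindings are kept in triangular form; composing them yields the σ shown above.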

From an algebraic viewpoint, we can think of this unification as solving the equation s = t by determining appropriate values for the variables x and y.

Now, for a moment, assume that f denotes the function add, g denotes the function multiply, h denotes the function successor, and a denotes the constant zero. Also assume that all the axioms of number theory hold. In this case, the equation s = t is interpreted as add(x, y) = add(mult(y, 0), succ(0)). It turns out that there are many solutions to the equation, including the substitution τ:

x ← h(a)
y ← a

Notice that τ(s) is f(h(a), a) and that τ(t) is f(g(a, a), h(a)). These resulting two terms are not textually identical, but under the interpretation of f, g, h, and a given above, they are certainly equivalent. The former is succ(0) + 0, or succ(0); the latter is (0 · 0) + succ(0), or succ(0). Therefore, we say that τ unifies s and t under the axioms of number theory. It is clear that determining whether two terms are unifiable under the axioms of number theory is the same problem as solving equations in number theory.

Number theory has complex axioms and inference rules. Of special interest to unification are simpler equational axioms such as

f(f(x, y), z) = f(x, f(y, z))   associativity

f(x, y) = f(y, x)   commutativity
f(x, x) = x   idempotence

A theory is a finite collection of axioms such as the ones above. The problem of unifying two terms under theory T is written (s = t)_T.

Consider the problem of unification under commutativity: (s = t)_C. To unify f(x, y) with f(a, b) we have two substitutions available to us: {x ← a; y ← b} and {x ← b; y ← a}. With commutativity, we no

    ACM Computing Surveys, Vol. 21, No. 1, March 1989


longer get the unique most general unifier of Robinson's null-theory unification; compare this situation to Gould's findings on higher order unification.

There are many applications for unification algorithms specialized for certain equational theories. A theorem prover, for example, may be asked to prove theorems about the multiplicative properties of integers, in which case it is critical to know that the multiplication function is associative. One could add an associativity axiom to the theorem prover, but it might waste precious inferences simply bracketing and rebracketing multiplication formulas. A better approach would be to build in the notion of associativity at the core of the theorem prover: the unification algorithm. Plotkin [1972] pioneered this area, and it has since been the subject of much research.
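To make the idea concrete, here is a small Python sketch of unification under commutativity. The term encoding and the enumeration strategy are our own illustration, not an algorithm from the survey: at each occurrence of a commutative binary symbol, both argument orders are tried, so the procedure yields a finite set of unifiers rather than a single MGU.

```python
# Sketch: C-unification. Variables are strings; f(t1, t2) is ('f', t1, t2);
# a constant a is ('a',). Symbols listed in `comm` are treated as commutative.

def is_var(t):
    return isinstance(t, str)

def subst(t, s):
    """Apply substitution s to term t, fully resolving variable bindings."""
    if is_var(t):
        return subst(s[t], s) if t in s else t
    return (t[0],) + tuple(subst(a, s) for a in t[1:])

def occurs(v, t):
    return t == v if is_var(t) else any(occurs(v, a) for a in t[1:])

def unify_c(t1, t2, s, comm=('f',)):
    """Yield every unifier of t1 and t2 modulo commutativity of `comm` symbols."""
    t1, t2 = subst(t1, s), subst(t2, s)
    if t1 == t2:
        yield dict(s)
    elif is_var(t1):
        if not occurs(t1, t2):
            yield {**s, t1: t2}
    elif is_var(t2):
        yield from unify_c(t2, t1, s, comm)
    elif t1[0] == t2[0] and len(t1) == len(t2):
        orders = [t2[1:]]
        if t1[0] in comm and len(t1) == 3:   # binary commutative symbol:
            orders.append(t2[:0:-1])         # also try the swapped arguments
        for args in orders:
            def match(pairs, s):
                if not pairs:
                    yield dict(s)
                else:
                    for s2 in unify_c(pairs[0][0], pairs[0][1], s, comm):
                        yield from match(pairs[1:], s2)
            yield from match(list(zip(t1[1:], args)), s)

# f(x, y) =? f(a, b) with f commutative: exactly the two unifiers from the text
unifiers = list(unify_c(('f', 'x', 'y'), ('f', ('a',), ('b',)), {}))
# [{'x': ('a',), 'y': ('b',)}, {'x': ('b',), 'y': ('a',)}]
```

For associativity the analogous enumeration can produce infinitely many unifiers, which is why a generate-and-test scheme like this one works only for finitary theories.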

9.2 Research

Siekmann's [1984] survey of unification under equational theories is comprehensive. The reader is strongly urged to seek out this source. I will mention here only a few important results.

We saw above that unification under commutativity can produce more than one general unifier; however, there are always a finite number of general unifiers [Livesey et al. 1979]. Under associativity, on the other hand, there may be an infinite number of general unifiers [Plotkin 1972]. Baader [1986] and Schmidt-Schauss [1986] independently showed that unification under associativity and idempotence has the unpleasant feature that no complete set of general unifiers even exists. (That is, there are two unifiable terms s and t, but for every unifier σ, there is a more general unifier τ.)

Results of this kind place any equational theory T into a hierarchy of theories, according to the properties of unification under T. Theories may be unitary (one MGU), finitary (finite number of general unifiers), infinitary (infinite number of general unifiers), or nullary (no complete set of general unifiers). Siekmann [1984] surveys the classifications of many naturally

arising theories. The study of Universal Unification seeks to discover properties of unification that hold across a wide variety of equational theories, just as Universal Algebra abstracts general properties from individual algebras.

To give a flavor for results in Universal Unification, I reproduce a recent theorem due to Book and Siekmann [1986]:

Theorem 9.1

If T is a suitable first-order equational theory that is not unitary, then T is not bounded.

This means the following: Suppose that unification under theory T produces a finite number of general unifiers but no MGU (as with commutativity). Then, there is no particular integer n such that for every pair s and t, the number of general unifiers is less than n (T is not bounded).

Term-rewriting systems, systems for translating sets of equations into sets of rewrite rules, are basic components of unification algorithms for equational theories. Such systems were the object of substantial investigation in Knuth and Bendix [1970]. A survey of work in this area is given in Huet and Oppen [1980].

Some recent work [Goguen and Meseguer 1987; Meseguer and Goguen 1987; Smolka and Ait-Kaci 1987; Walther 1986] investigates the equational theories of order-sorted and many-sorted algebras. Smolka and Ait-Kaci [1987] show how unification of Ait-Kaci's ψ-terms (see Section 7) is an instance of unification in order-sorted Horn logic. In other words, unification with inheritance can be seen as unification with respect to a certain set of equational axioms.

Unification algorithms have been constructed for a variety of theories. Algorithms for particular theories, however, have usually been handcrafted and based on completely different techniques. There is interest in building more general (but perhaps inefficient) universal algorithms [e.g., Fages and Huet 1986; Fay 1979; Gallier and Snyder 1988; Kirchner 1986]. There is also interest in examining the computational complexity of unification



under various theories [e.g., Kapur and Narendran 1986].

10. PARALLEL ALGORITHMS FOR

UNIFICATION

Lewis and Statman [1982] wrote a note showing that nonunifiability could be computed in nondeterministic log space (and thus, by Savitch's theorem, in deterministic log^2 space). By way of the parallel computation thesis [Goldschlager 1978], which relates sequential space to parallel time, this result could have led to a log^2 n parallel time algorithm for unification.

In the process of trying to find such an algorithm, Dwork et al. [1984] found an error in the above-mentioned note. They went on to prove that unifiability is actually log-space complete for P, which means that it is difficult to compute unification using a small amount of space. This means that it is highly unlikely that unification can be computed in O(log^k n) parallel time using a polynomial number of processors (in complexity terminology, unification is in the class NC only if P = NC).

In other words, even massive parallelism will not significantly improve the speed of unification; it is an inherently sequential process. This lower bound result has a wide range of consequences; for example, using parallel hardware to speed up Prolog execution is unlikely to bring significant gains (but see Robinson [1985] for a hardware implementation of unification).

Yasuura [1984] independently derived a complexity result similar to that of Dwork et al. [1984] and gave a parallel unification algorithm that runs in time O(log n + m log m), where m is the total number of variables in the two terms.

Dwork et al. [1984] mentioned the existence of a polylog parallel algorithm for term matching, the variant of unification in which substitution is allowed into only one of the two terms. They gave an O(log^2 n) time bound but required about O(n^5) processors. Using randomization techniques, Dwork et al. [1986] reduced the processor bound to O(n^3).

Maluszynski and Komorowski [1985], motivated by Dwork et al.'s [1984] negative

results, investigated a form of logic programming based on matching rather than unification.

Vitter and Simons [1984, 1986] argued that unification can be helped by multiple processors in a practical setting. They developed complexity classes to formalize this notion of practical speedup and gave a parallel unification algorithm that runs in time O(E/P + V log P), where P is the number of processors, E the number of edges in the term graph, and V the number of vertices.

Harland and Jaffar [1987] introduced another measure for parallel efficiency and compared several parallel algorithms including Yasuura's algorithm [1984], Vitter and Simons's algorithm [1984, 1986], and a parallel version of Jaffar's algorithm [1984].

11. UNIFICATION, GENERALIZATION, AND

LATTICES

Generalization is the dual of unification, and it finds applications in many areas in which unification is used. Abstractly, the generalization problem is the following: Given two objects x and y, can we find a third object z of which both x and y are instances? Formally it is as follows:

Definition 11.1

An antisubstitution (denoted γ, η, ζ, ...) is a mapping from terms into variables.

Definition 11.2

Two terms s and t are generalizable if there exists an antisubstitution γ such that γ(s) = γ(t). In such a case, γ is called a generalizer of s and t, and γ(s) is called a generalization of s and t.

Generalizability is a rather vacuous concept, since any two terms are generalizable under the antisubstitution that maps all terms onto the variable x. Of more interest is the following:

Definition

A generalizer γ of terms s and t is called the most specific generalizer (MSG) of s and





    Figure 20. A portion of the lattice of first-order terms.


t if for any other generalizer γ′, there is an antisubstitution ζ such that ζγ(s) = γ′(s).

Figure 21. Unification of terms f(x, b) and f(a, y).

Intuitively, the most specific generalization of two terms retains information that is common to both terms, introducing new variables when information conflicts. For example, the most specific generalization of f(a, g(x, c)) and f(b, g(x, c)) is f(z, g(x, c)). Since the first argument to f can be a or b, generalization makes an abstraction by introducing the variable z. Unification, on the other hand, would fail because of this conflict.
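The MSG itself can be computed by simple structural recursion in the Reynolds/Plotkin style: descend through matching function symbols, and replace each conflicting pair of subterms with a fresh variable, reusing the same variable for repeated conflicts. A Python sketch, using our own encoding (variables are strings, a term f(t1, ..., tn) is the tuple ('f', t1, ..., tn), a constant a is ('a',)):

```python
# Anti-unification (most specific generalization) of two first-order terms.

def msg(t1, t2, table=None, counter=None):
    """Most specific generalization of t1 and t2."""
    if table is None:
        table, counter = {}, [0]
    if t1 == t2:
        return t1
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        return (t1[0],) + tuple(msg(a, b, table, counter)
                                for a, b in zip(t1[1:], t2[1:]))
    # conflict: abstract the pair with a fresh variable, reused for repeats
    if (t1, t2) not in table:
        table[(t1, t2)] = 'z%d' % counter[0]
        counter[0] += 1
    return table[(t1, t2)]

# msg of f(a, g(x, c)) and f(b, g(x, c)) is f(z0, g(x, c))
g = msg(('f', ('a',), ('g', 'x', ('c',))),
        ('f', ('b',), ('g', 'x', ('c',))))
```

Reusing one variable per conflicting pair (the `table`) is what makes the result most specific: f(a, a) and f(b, b) generalize to f(z0, z0) rather than the more general f(z0, z1).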

As mentioned in a previous section, Reynolds [1970] proved the existence of a unique MSG for first-order terms and came up with a generalization algorithm. (Plotkin [1970] independently discovered this algorithm.) Reynolds used the natural lattice structure of first-order terms, a partial ordering based on subsumption of terms. General terms subsume more specific terms; for example, f(x, a) subsumes f(b, a). Many pairs of terms, of course, do not stand in any subsumption relation, and this is where unification and generalization come in. Figure 20 shows a portion of the lattice of first-order terms augmented with two special terms called top (⊤) and bottom (⊥).

Unification corresponds to finding the greatest lower bound (or meet) of two terms in the lattice. Figure 21 illustrates the





Figure 22. Generalization of terms f(x, b) and f(a, c).

unification of f(x, b) and f(a, y). The bottom of the lattice (⊥), to which all pairs of terms can unify, represents inconsistency. If the greatest lower bound of two terms is ⊥, then they are not unifiable.

Generalization corresponds to finding the least upper bound (or join) of two terms in the lattice. Figure 22 illustrates the generalization of f(x, b) and f(a, c). The top of the lattice (⊤), to which all pairs of terms can generalize, is called the universal term. Everything is an instance of this term.

Feature structures, previously discussed in Section 7, make up a more complex lattice structure.

Inheritance hierarchies, common structures in artificial intelligence, can be seen as lattices that admit unification and generalization. For example, the generalization of concepts bird and fish-eater in a certain knowledge base might be animal, whereas the unification might be pelican. Ait-Kaci extends this kind of unification and generalization to typed record structures (ψ-terms), as discussed in Section 7.
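The bird/fish-eater example can be rephrased as meet and join over a small type hierarchy. The toy Python sketch below invents a taxonomy for illustration, and assumes that the least and greatest common types exist and are unique:

```python
# Meet (unification) and join (generalization) in an inheritance hierarchy.

subtypes = {                 # child -> immediate parents (invented taxonomy)
    'pelican': {'bird', 'fish-eater'},
    'bird': {'animal'},
    'fish-eater': {'animal'},
    'animal': {'top'},
    'top': set(),
}

def ancestors(t):
    """All supertypes of t, including t itself."""
    seen, stack = {t}, [t]
    while stack:
        for p in subtypes[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def join(a, b):
    """Generalization: the most specific common supertype (least upper bound)."""
    common = ancestors(a) & ancestors(b)
    return max(common, key=lambda t: len(ancestors(t)))

def meet(a, b):
    """Unification: the most general common subtype (greatest lower bound);
    None plays the role of bottom, i.e., inconsistency."""
    common = [t for t in subtypes if a in ancestors(t) and b in ancestors(t)]
    return min(common, key=lambda t: len(ancestors(t)), default=None)

print(join('bird', 'fish-eater'))   # animal
print(meet('bird', 'fish-eater'))   # pelican
```

In a genuine lattice the meet and join are unique by definition; a real implementation over an arbitrary hierarchy would have to detect and handle ties.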

    12. OTHER APPLICATIONS OFUNIFICATION

    I have already discussed three applicationareas for unification: automatic theoremproving, logic programming, and naturallanguage processing. Other areas are listedhere.

12.1 Type Inference

Milner [1978] presented work on compile-time type checking for programming languages, with the goal of equating syntactic correctness with well-typedness. His algorithm for well typing makes use of unification, an idea originally due to Hindley [1969]. More recent research in this area includes the work of Cardelli [1984], MacQueen et al. [1984], and Moshier and Rounds [1987].

12.2 Programming Languages

Pattern matching is a very useful feature of programming languages. Logic programming languages make extensive use of pattern matching, as discussed above, but some functional languages also provide matching facilities (e.g., PLANNER, QA4/QLISP). In these cases, functions may be invoked in a pattern-directed manner, making use of unification as a pattern-matcher/variable-binder. Stickel [1978] discusses pattern-matching languages and specialized unification algorithms in his dissertation.

12.3 Machine Learning

Learning concepts from training instances is a task that requires the ability to generalize. Generalization, the dual of unification, is an operation widely used in machine learning.



Plotkin [1970], who opened up work in unification under equational theories, also produced work on inductive generalization, that is, abstracting general properties from a set of specific instances. Mitchell's [1979] work on version spaces investigated the same problem using both positive and negative training instances. Positive examples force the learner to generate more general hypotheses, whereas negative examples force more specific hypotheses. The general-to-specific lattice (see Section 11) is especially useful, and Mitchell's algorithms make use of operations on this lattice.
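As a minimal illustration of generalization in learning (the attribute-vector encoding is our own, not Mitchell's notation), a hypothesis can be a tuple of attribute values in which '?' means "any value"; folding the positive examples through pairwise generalization gives the most specific hypothesis covering them all:

```python
# Generalization over attribute vectors, the operation version-space
# learners build on. Encoding (ours): '?' is the "any value" wildcard.

from functools import reduce

def generalize(h1, h2):
    """Least generalization of two hypotheses: keep agreements, abstract conflicts."""
    return tuple(a if a == b else '?' for a, b in zip(h1, h2))

positives = [('bird', 'white', 'fish-eater'),
             ('bird', 'grey', 'fish-eater')]
h = reduce(generalize, positives)
print(h)   # ('bird', '?', 'fish-eater')
```

Negative examples would then prune hypotheses that have become too general, which is the other half of Mitchell's candidate-elimination scheme.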

13. CONCLUSION

This section contains a list of properties that unification possesses along with a summary of the trends in research on unification.

13.1 Some Properties of Unification

The following properties hold for first-order unification:

Unification is monotonic. Unification adds information, but never subtracts. It is impossible to remove information from one structure by unifying it with another. This is not the case with generalization.

Unification is commutative and associative. The order of unifications is irrelevant to the final result. Unification and generalization are not, however, distributive with respect to each other [Shieber 1986].

Unification is a constraint-merging process. If structures are viewed as encoding constraint information, then unification should be viewed as merging constraints. Unification also detects when combinations of certain constraint sets are inconsistent.

Unification is a pattern-matching process. Unification determines whether two structures match, using the mechanism of variable binding.

Unification is bidirectional since variable binding may occur in both of the structures to be unified. Matching is the

unidirectional variant of unification and appears in the function call of imperative programming languages.

Unification deals in partially defined structures. Unification, unlike many other operations, accepts inputs that contain uninstantiated variables. Its output may also contain uninstantiated variables.

13.2 Trends in Unification Research

One trend in unification research seeks to discover faster algorithms for first-order unification.5 Over the past two decades, many quite different algorithms have been presented, and although the worst-case complexity analyses are very interesting, it is still unclear which algorithms work best in practice. There seems to be no better way to compare algorithms than to implement and test them on practical problems. As it stands, which algorithm is fastest depends on the kinds of structures that are typically unified.

Another trend, begun by Robinson [1968], tackles the problems of unification in higher order logic. Such problems are in general unsolvable, and the behaviors of unification algorithms are less quantifiable. Systems that deal with higher order logic, for example theorem provers and other AI systems [Miller and Nadathur 1987], stand to gain a great deal from research in higher order unification.

A third trend runs toward incorporating more features into the unification operation. Automatic theorem provers once included separate axioms for commutativity, associativity, and so on, but research begun by Plotkin [1972] showed that building these notions directly into the unification routine yields greater efficiency. Logic programming languages are often called on to reason about hierarchically organized data, but they do so in a step-by-step fashion. Ait-Kaci and Nasr [1986] explain how to modify unification to use inheritance information, again improving efficiency.
Many unification routines for natural language parsing perform unification over negated,

5 This trend toward efficiency also explores topics such as structure sharing and space efficiency.



multiple-valued, and disjunctive structures. New applications for unification will no doubt drive further research into new, more powerful unification algorithms.

ACKNOWLEDGMENTS

I would like to thank Yolanda Gil, Ed Clarke, and Frank Pfenning for their useful comments on earlier drafts of this paper. My thanks also go to the reviewers for making many suggestions relating to both accuracy and clarity. Finally, I would like to thank Masaru Tomita for introducing me to this topic and for helping to arrange this survey project.

REFERENCES

AHO, A. V., HOPCROFT, J. E., AND ULLMAN, J. D. 1974. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass.

AIT-KACI, H. 1984. A lattice theoretic approach to computation based on a calculus of partially ordered type structures. Ph.D. dissertation. Univ. of Pennsylvania, Philadelphia, Pa.

AIT-KACI, H. 1986. An algebraic semantics approach to the effective resolution of type equations. Theor. Comput. Sci. 45.

AIT-KACI, H., AND NASR, R. 1986. LOGIN: A logic programming language with built-in inheritance. J. Logic Program. 3. (Also MCC Tech. Rep. AI-068-85, 1985.)

ANDREWS, P. B. 1971. Resolution in type theory. J. Symbolic Logic 36, pp. 414-432.

ANDREWS, P. B., AND COHEN, E. L. 1977. Theorem proving in type theory. In Proceedings of the International Joint Conference on Artificial Intelligence.

ANDREWS, P., MILLER, D., COHEN, E., AND PFENNING, F. 1984. Automating higher-order logic. Vol. 29. In Automated Theorem Proving: After 25 Years, W. Bledsoe and D. Loveland, Eds. American Mathematical Society, Providence, R.I., Contemporary Mathematics Series.

BAADER, F. 1986. The theory of idempotent semigroups is of unification type zero. J. Autom. Reasoning 2, pp. 283-286.

BARWISE, J., AND PERRY, J. 1983. Situations and Attitudes. MIT Press, Cambridge, Mass.

BATTANI, G., AND MELONI, H. 1973. Interpreteur du langage de programmation PROLOG. Groupe Intelligence Artificielle, Université Aix-Marseille II.

BAXTER, L. 1973. An efficient unification algorithm. Tech. Rep. No. CS-73-23. Univ. o

BAXTER, L. 1978. The undecidability of the third order dyadic unification problem. Inf. Control 38, pp. 170-178.

BOOK, R. V., AND SIEKMANN, J. 1986. On unification: Equational theories are not bounded. J. Symbolic Comput. 2, pp. 317-324.

BOYER, R. S., AND MOORE, J. S. 1972. The sharing of structure in theorem-proving programs. Mach. Intell. 7.

BRESNAN, J., AND KAPLAN, R. 1982. Lexical-functional grammar: A formal system for grammatical representation. In The Mental Representation of Grammatical Relations, J. Bresnan, Ed. MIT Press, Cambridge, Mass.

BRUYNOOGHE, M., AND PEREIRA, L. M. 1984. Deduction revision by intelligent backtracking. In Prolog Implementation, J. A. Campbell, Ed. Elsevier, North-Holland, New York.

CARDELLI, L. 1984. A semantics of multiple inheritance. In Semantics of Data Types, Lecture Notes in Computer Science, No. 173, G. Kahn, D. B. MacQueen, and G. Plotkin, Eds., Springer-Verlag, New York.

CHEN, T. Y., LASSEZ, J., AND PORT, G. S. 1986. Maximal unifiable subsets and minimal non-unifiable subsets. New Generation Comput. 4, pp. 133-152.

CHURCH, A. 1940. A formulation of the simple theory of types. J. Symbolic Logic 5, pp. 56-68.

CLOCKSIN, W. F., AND MELLISH, C. S. 1981. Programming in Prolog. Springer-Verlag, New York.

COLMERAUER, A. 1978. Metamorphosis grammars. In Natural Language Communication with Computers, L. Bolc, Ed. Springer-Verlag, New York.

COLMERAUER, A. 1982a. An interesting subset of natural language. In Logic Programming, K. L. Clark and S. Tarnlund, Eds. Academic Press, London.

COLMERAUER, A. 1982b. Prolog and infinite trees. In Logic Programming, K. L. Clark and S. Tarnlund, Eds. Academic Press, Orlando, Fla.

COLMERAUER, A. 1983. Prolog in ten figures. In Proceedings of the International Joint Conference on Artificial Intelligence.

COLMERAUER, A., KANOUI, H., AND VAN CANEGHEM, M. 1973. Un système de communication homme-machine en Français. Research Rep. Groupe Intelligence Artificielle, Université Aix-Marseille II.

CORBIN, J., AND BIDOIT, M. 1983. A rehabilitation of Robinson's unification algorithm. Inf. Process. 83, pp. 73-79.

COURCELLE, B. 1983. Fundamental properties of infinite trees. Theor. Comput. Sci. 25, pp. 95-169.

COX, P. T. 1984. Finding backtrack points for intel-