
The logic of generality

Luc De Raedt
Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, BE-3001 Heverlee, Belgium, email: [email protected]

Definition

One hypothesis is more general than another one if it covers all instances that are also covered by the latter one. The former hypothesis is called a generalization of the latter, and the latter a specialization of the former. When using logical formulae as hypotheses, the generality relation coincides with the notion of logical entailment, which implies that the generality relation can be analyzed from a logical perspective. The logical analysis of generality, which is pursued in this entry, leads to the perspective of induction as the inverse of deduction. This forms the basis for an analysis of various logical frameworks for reasoning about generality and for traversing the space of possible hypotheses. Many of these frameworks (such as, for instance, θ-subsumption) are employed in the field of inductive logic programming and are introduced below.

Synonyms

generality and logic, induction as inverted deduction, is more general than, is more specific than, specialization, inductive inference rules

Motivation

Symbolic machine learning methods typically learn by searching a hypothesis space. The hypothesis space can be (partially) ordered by the generality relation, which serves as the basis for defining operators to traverse the space as well as for pruning away unpromising parts of the search space. This is often realized through the use of refinement operators, that is, generalization and specialization operators. Because many learning methods employ a hypothesis language that is logical or that can be reformulated in logic, it is interesting to analyze the generality relation from a logical perspective. When using logical formulae as hypotheses, the generality relation closely corresponds to logical entailment. This allows us to directly transfer results from logic to a machine learning context. In particular, machine learning operators can be derived from logical inference rules.


The logical theory of generality provides a framework for transferring these results. Within the standard setting of inductive logic programming, learning from entailment, specialization is realized through deduction, and generalization through induction, which is considered to be the inverse of deduction. Different deductive inference rules lead to different frameworks for generalization and specialization. The most popular one is that of θ-subsumption, which is employed by the vast majority of contemporary inductive logic programming systems.

Theory

A hypothesis g is more general than a hypothesis s if and only if g covers all instances that are also covered by s, more formally, if covers(s) ⊆ covers(g), where covers(h) denotes the set of all instances covered by the hypothesis h.

There are several possible ways to represent hypotheses and instances in logic [3, 2], each of which results in a different setting with a corresponding covers relation. Some of the best known settings are learning from entailment, learning from interpretations and learning from proofs.

Learning from entailment

When learning from entailment, both hypotheses and instances are logical formulae, typically definite clauses, which underlie the programming language Prolog [5]. Furthermore, when learning from entailment, a hypothesis h covers an instance e if and only if h |= e, that is, when h logically entails e, or equivalently, when e is a logical consequence of h. For instance, consider the hypothesis h:

flies :- bird, normal.
bird :- blackbird.
bird :- ostrich.

The first clause or rule can be read as flies if normal and bird, that is, normal birds fly. The second and third state that blackbirds, resp. ostriches, are birds. Consider now the example e1:

flies :- blackbird, normal, small.

and e2:

flies :- ostrich, small.


Example e1 is covered by h, because it is a logical consequence of h, that is, h |= e1. On the other hand, example e2 is not covered, which we denote as h ⊭ e2.
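This coverage test can be made concrete with a small sketch. The following Python fragment (an illustration only; the (head, body) clause representation and the function names are choices of the sketch, not part of the entry) checks h |= e for propositional definite clauses by forward chaining: e is covered exactly when the head of e is derivable from h together with the body atoms of e.

def forward_chain(clauses, facts):
    # clauses: iterable of (head, frozenset_of_body_atoms); facts: set of atoms
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in clauses:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived

def covers(h, example):
    head, body = example
    return head in forward_chain(h, body)

# The hypothesis h and the examples e1, e2 from the text:
h = [("flies", frozenset({"bird", "normal"})),
     ("bird", frozenset({"blackbird"})),
     ("bird", frozenset({"ostrich"}))]
e1 = ("flies", frozenset({"blackbird", "normal", "small"}))
e2 = ("flies", frozenset({"ostrich", "small"}))
print(covers(h, e1), covers(h, e2))  # True False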

When learning from entailment, the following property holds.

Property 1 A hypothesis g is more general than a hypothesis s if and only if g logically entails s, that is, g |= s.

This is easy to see. Indeed, g is more general than s if and only if covers(s) ⊆ covers(g), if and only if for all examples e : (s |= e) → (g |= e), if and only if g |= s. For instance, consider the hypothesis h1:

flies :- blackbird, normal.

Because h |= h1, it follows that h covers all examples covered by h1, and hence, h generalizes h1.

Property 1 states that the generality relation coincides with logical entailment when learning from entailment. In other learning settings, such as when learning from interpretations, this relationship also holds, though the direction of the relationship might change.

Learning from interpretations

When learning from interpretations, hypotheses are logical formulae, typically sets of definite clauses, and instances are interpretations. For propositional theories, interpretations are assignments of truth-values to propositional variables. For instance, continuing the flies illustration, two interpretations could be

{blackbird, bird, normal, flies}
{ostrich, small}

where we specify interpretations through the set of propositional variables that are true. An interpretation specifies a kind of possible world. A hypothesis h then covers an interpretation if and only if the interpretation is a model for the hypothesis. An interpretation is a model for a hypothesis if it satisfies all clauses in the hypothesis. In our illustration, the first interpretation is a model for the theory h but the second is not. Because the condition part of the rule bird :- ostrich. is satisfied in the second interpretation (as it contains ostrich), the conclusion part, that is bird, should also belong to the interpretation in order to have a model. Thus, the first example is covered by the theory h, but the second is not.
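The covers relation of this setting is again easy to state operationally. The following sketch (same hypothetical (head, body) clause representation as in the earlier sketch) checks whether an interpretation, given as the set of true propositions, is a model of every clause of the hypothesis.

def is_model(interpretation, clauses):
    # a clause is satisfied when its body being true forces its head to be true
    return all(not body <= interpretation or head in interpretation
               for head, body in clauses)

h = [("flies", frozenset({"bird", "normal"})),
     ("bird", frozenset({"blackbird"})),
     ("bird", frozenset({"ostrich"}))]
i1 = {"blackbird", "bird", "normal", "flies"}
i2 = {"ostrich", "small"}
print(is_model(i1, h), is_model(i2, h))  # True False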

When learning from interpretations, a hypothesis g is more general than a hypothesis s if and only if for all examples e: (e is a model of s) → (e is a model of g), if and only if s |= g.


Because the learning from entailment setting is more popular than the learning from interpretations setting, we shall employ in this section the usual convention which states that one hypothesis g is more general than a hypothesis s if and only if g |= s.

An operational perspective

Property 1 lies at the heart of the theory of inductive logic programming and generalization because it directly relates the central notions of logic with those of machine learning [10]. It is also extremely useful because it allows us to directly transfer results from logic to machine learning.

This can be illustrated using traditional deductive inference rules, which start from a set of formulae and derive a formula that is entailed by the original set. For instance, consider the resolution inference rule for propositional definite clauses:

h ← g, a1, . . . , an   and   g ← b1, . . . , bm
-----------------------------------------------------   (1)
h ← b1, . . . , bm, a1, . . . , an

This inference rule starts from the two rules above the line and derives the so-called resolvent below the line. This rule can be used to infer h1 from h. An alternative deductive inference rule adds a condition to a rule:

h ← a1, . . . , an
-----------------------------   (2)
h ← a, a1, . . . , an

This rule can be used to infer that h1 is more general than the clause used in example e1. In general, a deductive inference rule can be written as

g
-----   (3)
s

If s can be inferred from g and the operator is sound, then g |= s. Thus, applying a deductive inference rule realizes specialization, and hence, deductive inference rules can be used as specialization operators. A specialization operator maps a hypothesis onto a set of its specializations. Because specialization is the inverse of generalization, generalization operators — which map a hypothesis onto a set of its generalizations — can be obtained by inverting deductive inference rules. The inverse of a deductive inference rule written in format (3) works from bottom to top, that is, from s to g. Such an inverted deductive inference rule is called an inductive inference rule. This leads to the view of induction as the inverse of deduction. This view is operational as it implies that each deductive inference rule can be inverted into an inductive one, and also, that each inference rule provides an alternative framework for generalization.


An example generalization operator is obtained by inverting the adding condition rule (2). It corresponds to the well-known “dropping condition” rule [6]. As we will see soon, it is also possible to invert the resolution principle (1).

Before deploying inference rules, it is necessary to determine their properties. Two desirable properties are soundness and completeness. These properties are based on the repeated application of inference rules. Therefore, we write g ⊢r s when there exists a sequence of hypotheses h1, · · · , hn such that each of the steps

g        h1                 hn
--  ,    --  ,   · · ·  ,   --      using r      (4)
h1       h2                  s

is an application of an inference rule in r. A set of inference rules r is then sound whenever g ⊢r s implies g |= s; and complete whenever g |= s implies g ⊢r s. In practice, soundness is always enforced, though completeness is not always required in a machine learning setting. When working with incomplete rules, one should realize that the generality relation “⊢r” is weaker than the logical one “|=”.

The most important logical frameworks for reasoning about generality, such as θ-subsumption and resolution, will be introduced below using the logical theory of generality introduced above.

Frameworks for generality

Propositional Subsumption

Many propositional learning systems employ hypotheses that consist of rules, often definite clauses as in the flies illustration above. The propositional subsumption relation defines a generality relation amongst clauses and is defined through the adding condition rule (2). The properties then follow by applying the logical theory of generalization presented above to this inference rule. More specifically, the generality relation ⊢a induced by the adding condition rule states that a clause g is more general than a clause s if s can be derived from g by adding a sequence of conditions to g. Observing that a clause h ← a1, · · · , an is a disjunction of literals h ∨ ¬a1 ∨ · · · ∨ ¬an allows us to write it in set notation as {h, ¬a1, · · · , ¬an}. The soundness and completeness of propositional subsumption then follow from

g ⊢a s if and only if g ⊆ s if and only if g |= s     (5)

which also clarifies that g subsumes s if and only if g ⊆ s.

The propositional subsumption relation induces a complete lattice on the space of possible clauses. A complete lattice is a partial order — a reflexive, anti-symmetric and transitive relation — where every two elements possess a unique least upper bound and a unique greatest lower bound.


[Figure 1: The Hasse diagram for the predicate flies, ordering the clauses flies., flies :- bird., flies :- normal., flies :- small., flies :- bird, normal., flies :- bird, small., flies :- small, normal. and flies :- bird, normal, small. by propositional subsumption.]

An example lattice for rules defining the predicate flies in terms of bird, normal and small is illustrated in the Hasse diagram depicted in Figure 1.

The Hasse diagram also visualizes the different operators that can be used. The generalization operator ρg maps a clause to the set of its children in the diagram, whereas the specialization operator ρs maps a clause to the set of its parents. So far we have defined such operators implicitly through their corresponding inference rules. In the literature, they are often defined explicitly:

ρg(h ← a1, · · · , an) = {h ← a1, · · · , ai−1, ai+1, · · · , an | i = 1, · · · , n}     (6)

In addition to using the inference rules directly, some systems like Golem [11] also exploit the properties of the underlying lattice by computing the least upper bound of two formulae. The least upper bound operator is known under the name of least general generalization (lgg) in the machine learning literature. It returns the least common ancestor in the Hasse diagram. Using set notation for clauses, the definition of the lgg is:

lgg(c1, c2) = c1 ∩ c2 (7)

The least general generalization operator is used by machine learning systems that follow a cautious generalization strategy. They take two clauses corresponding to positive examples and minimally generalize them.
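A minimal Python sketch summarizes the propositional framework, under the assumption that a clause is represented as a frozenset of literals (a leading "-" marking a condition literal): subsumption is the subset test, the operators of the Hasse diagram drop or add a condition, and the lgg is the intersection.

def subsumes(g, s):                       # g |= s for propositional clauses
    return g <= s

def rho_g(clause):                        # generalization: drop one condition
    return {clause - {lit} for lit in clause if lit.startswith("-")}

def rho_s(clause, vocabulary):            # specialization: add one condition
    return {clause | {"-" + atom} for atom in vocabulary
            if "-" + atom not in clause}

def lgg(c1, c2):                          # least general generalization
    return c1 & c2

c1 = frozenset({"flies", "-bird", "-normal"})   # flies :- bird, normal
c2 = frozenset({"flies", "-bird", "-small"})    # flies :- bird, small
print(lgg(c1, c2))                        # flies :- bird
print(subsumes(lgg(c1, c2), c1))          # True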


θ-subsumption

The most popular framework for generality within inductive logic programming is θ-subsumption [13]. It provides a generalization relation for clausal logic and it extends propositional subsumption to first-order logic. A definite clause is an expression of the form h ← a1, · · · , an where h and the ai are logical atoms. An atom is an expression of the form p(t1, · · · , tm) where p is a predicate name (or, the name of a relation) and the ti are terms. A term is either a constant (denoting an object in the domain of discourse), a variable, or a structured term of the form f(u1, · · · , uk) where f is a functor symbol (denoting a function in the domain of discourse) and the ui are terms; see [5] for more details. Consider for instance the clauses

likes(X,Y) :- neighbours(X,Y).
likes(X,husbandof(Y)) :- likes(X,Y).
likes(X,tom) :- neighbours(X,tom), male(X).

The first clause states that X likes Y if X is a neighbour of Y. The second one states that X likes the husband of Y if X likes Y. The third one states that all male neighbours of tom like tom.

θ-subsumption is not only based on the adding condition rule (2) but also on the substitution rule:

g
-----   (8)
gθ

The substitution rule applies a substitution θ to the definite clause g. A substitution {V1/t1, · · · , Vn/tn} is an assignment of terms to variables. Applying a substitution to a clause c yields the instantiated clause where all variables are simultaneously replaced by their corresponding terms.

θ-subsumption is then the generality relation induced by the rule set t, which combines the substitution rule with the adding condition rule:

g θ-subsumes s if and only if g ⊢t s if and only if ∃θ : gθ ⊆ s     (9)

For instance, the first clause for likes subsumes the third one with the substitution {Y/tom}.
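For function-free clauses, the θ-subsumption test itself can be sketched as a small backtracking search for a substitution θ with gθ ⊆ s. The representation below (literals as tuples, variables as capitalized strings) and the function names are assumptions of this sketch, not part of the entry.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(lit_g, lit_s, theta):
    # extend theta so that lit_g under theta equals lit_s, or return None
    if len(lit_g) != len(lit_s) or lit_g[:2] != lit_s[:2]:
        return None
    theta = dict(theta)
    for tg, ts in zip(lit_g[2:], lit_s[2:]):
        if is_var(tg):
            if theta.get(tg, ts) != ts:
                return None
            theta[tg] = ts
        elif tg != ts:
            return None
    return theta

def theta_subsumes(g, s, theta=None):
    g = list(g)
    theta = dict(theta or {})
    if not g:
        return True
    first, rest = g[0], g[1:]
    return any(theta_subsumes(rest, s, t)
               for lit in s
               for t in [match(first, lit, theta)] if t is not None)

# likes(X,Y) :- neighbours(X,Y)  versus  likes(X,tom) :- neighbours(X,tom), male(X)
g = {("+", "likes", "X", "Y"), ("-", "neighbours", "X", "Y")}
s = {("+", "likes", "X", "tom"), ("-", "neighbours", "X", "tom"), ("-", "male", "X")}
print(theta_subsumes(g, s))   # True, via the substitution {Y/tom}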

θ-subsumption has some interesting properties:

• θ-subsumption is sound.

• θ-subsumption is complete for clauses that are not self-recursive. It is incomplete for self-recursive clauses such as


nat(s(X)) :- nat(X).
nat(s(s(Y))) :- nat(Y).

for which one can use resolution to show that the first clause logically entails the second one, even though it does not θ-subsume it.

• Deciding θ-subsumption is an NP-complete problem.

Because θ-subsumption is relatively simple and decidable, whereas logical entailment between single clauses is undecidable, it is used as the generality relation by the majority of inductive logic programming systems. These systems typically employ a specialization or refinement operator to traverse the search space. To guarantee the systematic enumeration of the search space, the specialization operator ρs can be employed. ρs(c) is obtained by applying the adding condition or substitution rule with the following restrictions.

• the adding condition rule only adds atoms of the form p(V1, · · · , Vn) where the Vi are variables not yet occurring in the clause c;

• the substitution rule only employs elementary substitutions, which are of the form

– {V/Y}, where V and Y are two variables appearing in c

– {V/ct}, where V is a variable in c and ct a constant

– {V/f(V1, · · · , Vn)}, where V is a variable in c, f a functor of arity n and the Vi are variables not yet occurring in c.

A generalization operator can be obtained by inverting ρs, which requires one to invert substitutions. Inverting substitutions is not easy. Whereas applying a substitution θ = {V/a} to a clause c replaces all occurrences of V by a and yields a unique clause cθ, applying the substitution rule in the inverse direction does not necessarily yield a unique clause. If we assume the elementary substitution applied to

c
----------   (10)
q(a,a)

was {V/a}, then there are at least three possibilities for c: q(a,V), q(V,a), and q(V,V).

θ-subsumption is reflexive and transitive but unfortunately not anti-symmetric, which can be seen by considering the clauses


parent(X,Y) :- father(X,Y).
parent(X,Y) :- father(X,Y), father(U,V).

The first clause clearly subsumes the second one because it is a subset of it. The second one subsumes the first with the substitution {U/X, V/Y}. The two clauses are therefore equivalent under θ-subsumption, and hence also logically equivalent. The loss of anti-symmetry complicates the search process. The naive application of the specialization operator ρs may yield syntactic specializations that are logically equivalent. This is illustrated above, where the second clause for parent is a refinement of the first one using the adding condition rule. In this way, useless clauses are generated, and there is a danger that if the resulting clauses are further refined, the search will end up in an infinite loop.

Plotkin [13] has studied the quotient set induced by θ-subsumption and proven various interesting properties. The quotient set consists of classes of clauses that are equivalent under θ-subsumption. The class of clauses equivalent to a given clause c is denoted by

[c] = {c′|c′ is equivalent with c under θ-subsumption} (11)

Plotkin proved that

• the quotient set is well-defined w.r.t. θ-subsumption.

• there is a representative, a canonical form, of each equivalence class, the so-called reduced clause. The reduced clause of an equivalence class is the shortest clause belonging to the class. It is unique up to variable renaming. For instance, in the parent example above, the first clause is in reduced form.

• the quotient set forms a complete lattice, which implies that there is a least general generalization of two equivalence classes. In the inductive logic programming literature, one often talks about the least general generalization of two clauses.

Several variants of θ-subsumption have been developed. One of the most important ones is that of OI-subsumption [4]. For functor-free clauses, it modifies the substitution rule by disallowing substitutions that unify two variables or that substitute a variable by a constant already appearing in the clause. The advantage is that the resulting relation is anti-symmetric, which avoids some of the above mentioned problems with refinement operators. On the other hand, the minimally general generalization of two clauses is not necessarily unique, and hence, there exists no least general generalization operator.


Inverse resolution

Applying resolution, a sound deductive inference rule, realizes specialization. Reversing it yields inductive inference rules or generalization operators [7, 9]. This is typically realized by combining the resolution principle with a copy operator. The resulting rules are called absorption (12) and identification (13). They start from the clauses below the line and induce the clause above the line. They are shown here only for the propositional case, as the first-order case requires one to deal with substitutions as well as inverse substitutions.

h ← g, a1, . . . , an   and   g ← b1, . . . , bm
---------------------------------------------------------------   (12)
h ← b1, . . . , bm, a1, . . . , an   and   g ← b1, . . . , bm

h ← g, a1, . . . , an   and   g ← b1, . . . , bm
---------------------------------------------------------------   (13)
h ← b1, . . . , bm, a1, . . . , an   and   h ← g, a1, . . . , an

Other interesting inverse resolution operators perform predicate invention, that is, they introduce new predicates that were not yet present in the original data. These operators invert two resolution steps. One such operator is the intra-construction operator (14). Applying this operator from bottom to top introduces the new predicate q that was not present before.

q ← l1, . . . , lk   and   p ← k1, . . . , kn, q   and   q ← l′1, . . . , l′m
-------------------------------------------------------------------------------   (14)
p ← k1, . . . , kn, l1, . . . , lk   and   p ← k1, . . . , kn, l′1, . . . , l′m

The idea of inverting the resolution operator is very appealing because it aims at inverting the most popular deductive inference operator, but it is also rather complicated due to the non-determinism and the need to invert substitutions. Due to these complications, there are only a few systems that employ inverse resolution operators.
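In the propositional case, however, absorption (12) is easy to state operationally. The following sketch (clauses again as hypothetical (head, body) pairs) takes the resolvent together with the parent clause g ← b1, . . . , bm and re-introduces g in the condition part.

def absorption(resolvent, parent):
    head_r, body_r = resolvent
    head_p, body_p = parent
    if not body_p <= body_r:
        return None                       # the parent's conditions must occur in the resolvent
    return (head_r, (body_r - body_p) | {head_p})

resolvent = ("flies", frozenset({"blackbird", "normal"}))
parent = ("bird", frozenset({"blackbird"}))
print(absorption(resolvent, parent))      # flies :- bird, normal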

Background knowledge

Inductive logic programming systems employ background knowledge during the learning process. Background knowledge typically takes the form of a set of clauses B, which is then used by the covers relation. When learning from entailment in the presence of background knowledge B, an example e is covered by a hypothesis h if and only if B ∪ h |= e. This notion of coverage is employed in most of the work on inductive logic programming. In the initial flies example, the two clauses defining bird would typically be considered background knowledge.

The incorporation of background knowledge in the induction process has resulted in frameworks for generality relative to a background theory. More formally, a hypothesis g is more general than a hypothesis s relative to the background theory B if and only if B ∪ g |= s. The only inference rules seen so far that deal with multiple clauses are those based on (inverse) resolution. The other frameworks can be extended to cope with this generality relation following the logical theory of generalization. Various frameworks have been developed along these lines. Some of the most important ones are relative subsumption [14] and generalized subsumption [1], which extend θ-subsumption and the notion of least general generalization towards the use of background knowledge. Computing the least general generalization of two clauses relative to the background theory is realized by first computing the most specific clauses covering the examples with regard to the background theory and then generalizing them using the least general generalization operator of θ-subsumption.

The first step is the most interesting one, and has been tackled under the name of saturation [15] and bottom-clauses [8]. We illustrate it within the framework of inverse entailment due to Stephen Muggleton [8]. The bottom clause ⊥(c) of a clause c with regard to a background theory B is the most specific clause ⊥(c) such that

B ∪ ⊥(c) |= c (15)

If B consists of

polygon :- rectangle.
rectangle :- square.
oval :- circle.

and the example c is

positive :- red, square.

Then the bottom-clause ⊥(c) is

positive :- red, rectangle, square, polygon.

The bottom-clause is useful because it only lists those atoms that are relevant to the example, and only generalizations (under θ-subsumption) of ⊥(c) will cover the example. For instance, in the illustration, the bottom-clause mentions neither oval nor circle, as clauses for positive containing these atoms will never cover the example clause c. Once the bottom-clause covering an example has been found, the search process continues as if no background knowledge were present. Either specialization operators (typically under θ-subsumption) would search the space of clauses more general than ⊥(c), or else the least general generalization of multiple bottom-clauses would be computed.

Equation (15) is equivalent to

B ∪ ¬c |= ¬⊥(c) (16)

which explains why the bottom-clause is computed by finding all factual consequences of B ∪ ¬c and then inverting the resulting clause again. On the example:

¬c = {¬ positive, red, square}

and the set of all consequences is

¬⊥(c) = ¬c ∪ {rectangle, polygon}

which then yields ⊥(c) mentioned above. When dealing with first-order logic, bottom-clauses can become infinite, and therefore one typically imposes further restrictions on the atoms that appear in bottom-clauses. These restrictions are part of the syntactic bias.
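For the propositional case, the construction of the bottom-clause amounts to forward chaining, as in the following sketch (the (head, body) clause representation is again an assumption of the sketch): all consequences of B together with the body of c become conditions of ⊥(c).

def bottom_clause(background, example):
    head, body = example
    derived = set(body)
    changed = True
    while changed:
        changed = False
        for h, b in background:
            if b <= derived and h not in derived:
                derived.add(h)
                changed = True
    return (head, frozenset(derived))

B = [("polygon", frozenset({"rectangle"})),
     ("rectangle", frozenset({"square"})),
     ("oval", frozenset({"circle"}))]
c = ("positive", frozenset({"red", "square"}))
print(bottom_clause(B, c))
# positive :- red, square, rectangle, polygon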

Further reading

The textbook by Nienhuys-Cheng and De Wolf [12] is the best reference for an in-depth formal description of various frameworks for generality in logic, in particular, for θ-subsumption and some of its variants. The book by De Raedt [3] contains a more complete introduction to inductive logic programming and relational learning, and also introduces the key frameworks for generality in logic. An early survey of inductive logic programming and the logical theory of generality is contained in [10]. Plotkin [13, 14] studied the use of θ-subsumption and relative subsumption (under a background theory) for machine learning. Buntine [1] extended these frameworks towards generalized subsumption, and [4] introduced OI-subsumption. Inverse resolution was first used in the system Marvin [16], and then elaborated by [7] for propositional logic and by [9] for definite clause logic. Various learning settings are studied by [2] and discussed extensively by [3]. They are also relevant to probabilistic logic learning and statistical relational learning.


References

[1] W. Buntine. Generalized subsumption and its application to induction and redundancy. Artificial Intelligence, 36:375–399, 1988.

[2] L. De Raedt. Logical settings for concept learning. Artificial Intelligence, 95:187–201, 1997.

[3] L. De Raedt. Logical and Relational Learning. Springer, 2008.

[4] F. Esposito, A. Laterza, D. Malerba, and G. Semeraro. Refinement of Datalog programs. In Proceedings of the MLnet Familiarization Workshop on Data Mining with Inductive Logic Programming, pages 73–94, 1996.

[5] P. A. Flach. Simply Logical: Intelligent Reasoning by Example. John Wiley, 1994.

[6] R. S. Michalski. A theory and methodology of inductive learning. Artificial Intelligence, 20(2):111–161, 1983.

[7] S. Muggleton. Duce, an oracle based approach to constructive induction. In Proceedings of the 10th International Joint Conference on Artificial Intelligence, pages 287–292. Morgan Kaufmann, 1987.

[8] S. Muggleton. Inverse entailment and Progol. New Generation Computing, 13(3-4):245–286, 1995.

[9] S. Muggleton and W. Buntine. Machine invention of first order predicates by inverting resolution. In Proceedings of the 5th International Workshop on Machine Learning, pages 339–351. Morgan Kaufmann, 1988.

[10] S. Muggleton and L. De Raedt. Inductive logic programming: Theory and methods. Journal of Logic Programming, 19/20:629–679, 1994.

[11] S. Muggleton and C. Feng. Efficient induction of logic programs. In Proceedings of the 1st Conference on Algorithmic Learning Theory, pages 368–381. Ohmsma, Tokyo, Japan, 1990.

[12] S.-H. Nienhuys-Cheng and R. de Wolf. Foundations of Inductive Logic Programming. Springer, 1997.

[13] G. D. Plotkin. A note on inductive generalization. In Machine Intelligence, volume 5, pages 153–163. Edinburgh University Press, 1970.

[14] G. D. Plotkin. A further note on inductive generalization. In Machine Intelligence, volume 6, pages 101–124. Edinburgh University Press, 1971.

[15] C. Rouveirol. Flattening and saturation: Two representation changes for generalization. Machine Learning, 14(2):219–232, 1994.

[16] C. Sammut and R. B. Banerji. Learning concepts by asking questions. In R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach, volume 2, pages 167–192. Morgan Kaufmann, 1986.


Coverage relation

Definition

In concept-learning (and also in rule-learning and inductive logic programming), the goal is often to find a hypothesis that covers, or matches, all positive examples by means of a search through the hypothesis space. The covers relation specifies the relationship between the hypothesis language, which is used to represent concepts, and the language used to represent instances or examples. The covers relation forms the basis for defining the generality relation, used to structure the search space. More formally, an example e is covered by a hypothesis h when the example e belongs to the concept defined by h, that is, when the hypothesis classifies the example as positive. Sometimes the set of all examples covered by a hypothesis h is denoted as covers(h).

Synonyms

matches

See Also

concept-learning; rule-learning; inductive logic programming; generality relation; learning from entailment; learning from interpretations; learning from proofs; hypothesis language

References and recommended reading

1. T.M. Mitchell. Machine Learning. McGraw-Hill. 1997.


Generality Relation

Definition

The generality relation is used in concept-learning, rule-learning and inductive logic programming. It specifies when one hypothesis g is more general than another hypothesis s. It is defined in terms of the covers relation, which specifies for each hypothesis the set of examples it covers or matches. More precisely, a hypothesis g is more general than a hypothesis s if g covers all instances that are also covered by s. This is sometimes written as covers(s) ⊆ covers(g). The hypothesis g is called a generalization of s, and s a specialization of g. The generality relation (partially) orders the hypothesis space and can therefore be employed to prune the search for consistent hypotheses. For instance, when a hypothesis h does not cover a positive example e, then no specialization of h will cover e, and hence, all specializations of h can be pruned.

Synonyms

See Also

Logic of Generality; Refinement Operators; Subsumption; Minimally general generalization; least general generalization

References and recommended reading

1. T.M. Mitchell. Machine Learning. McGraw-Hill. 1997.


Subsumption

Definition

Subsumption is the term used in inductive logic programming to refer to various generality relations between hypotheses in the form of clauses. In propositional logic, one clause g subsumes a clause s if and only if g |= s, that is, if and only if g ⊆ s. For instance, the clause flies :- bird, normal, which states that normal birds fly, subsumes the clause flies :- bird, normal, black because {flies, ¬bird, ¬normal} ⊆ {flies, ¬bird, ¬normal, ¬black}. While in the propositional case the subsumption relation is relatively straightforward, it is more involved when working with first-order logic as in inductive logic programming. In this case, one typically employs the θ-subsumption relation, which states that a clause g θ-subsumes a clause s if and only if there exists a substitution θ such that gθ ⊆ s. For instance, the clause father(X,Y) :- male(X), parent(X,Y) θ-subsumes the clause father(Z,mary) :- male(Z), parent(Z,mary), female(mary), because applying θ = {X/Z, Y/mary} to the first clause yields {father(Z,mary), ¬male(Z), ¬parent(Z,mary)}, which is a subset of the second clause {father(Z,mary), ¬male(Z), ¬parent(Z,mary), ¬female(mary)}.

Synonyms

See also

Logic of generality; inductive logic programming; refinement operator;

References and recommended reading

1. L. De Raedt. Logical and Relational Learning. Springer. 2008.

2. S.-H. Nienhuys-Cheng and R. de Wolf. Foundations of Inductive Logic Programming. Springer, 1997.


Refinement operator

Definition

Many machine learning methods search through a hypothesis space structured by a generality relation. For instance, in concept-learning, one searches for a hypothesis that covers all positive and none of the negative examples. These search procedures repeatedly consider candidate hypotheses and refine them to generate successor candidates when needed. The successor candidates are generated using a refinement operator, that is, an operator that maps a hypothesis to a set of its refinements. More formally, a refinement operator is a function ρ : L → P(L), where L is the set of hypotheses and P(L) denotes its powerset. Usually, one distinguishes two types of refinement operators: specialization operators, which return a set of specializations of the hypothesis, and generalization operators, which return a set of generalizations.

As an illustration of a refinement operator, consider the “adding condition rule” of Ryszard Michalski, which specializes rules. For instance, consider the rule

class = + IF outlook = sunny AND temperature = high

it can be specialized by adding conditions to yield, for instance,

class = + IF outlook = sunny AND temperature = high AND humidity = low
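A minimal Python sketch of this adding-condition operator, in which a rule is represented as a dictionary of attribute-value conditions; the attribute names and values are hypothetical and only serve the illustration.

def rho_s(rule, attributes):
    # attributes: dict mapping each attribute to its possible values
    return [dict(rule, **{att: val})
            for att, values in attributes.items() if att not in rule
            for val in values]

rule = {"outlook": "sunny", "temperature": "high"}
attributes = {"outlook": ["sunny", "rainy"],
              "temperature": ["high", "low"],
              "humidity": ["high", "low"]}
for refinement in rho_s(rule, attributes):
    print(refinement)   # adds humidity = high and humidity = low in turn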

Various types of refinement operators have been studied in an inductive logic programming context.

See also

Logic of generality; inductive logic programming

References and recommended reading

1. L. De Raedt. Logical and Relational Learning. Springer. 2008.

2. R. S. Michalski. A theory and methodology of inductive learning. Artificial Intelligence, 20(2):111–161, 1983.


Learning from entailment

Definition

Learning from entailment is an inductive logic programming setting that is also relevant to statistical relational learning. In this setting, both hypotheses and instances are logical formulae, typically definite clauses, which underlie the programming language Prolog. Furthermore, when learning from entailment, a hypothesis h covers an instance e if and only if h |= e, that is, when h logically entails e, or equivalently, when e is a logical consequence of h. For instance, consider the hypothesis h:

flies :- bird, normal.
bird :- blackbird.
bird :- ostrich.

The first clause or rule can be read as flies if normal and bird, that is, normal birds fly. The second and third state that blackbirds, resp. ostriches, are birds. Consider now the example e1:

flies :- blackbird, normal, small.

and e2:

flies :- ostrich, small.

Example e1 is covered by h, because it is a logical consequence of h, that is, h |= e1. On the other hand, example e2 is not covered, which we denote as h ⊭ e2.

Learning from entailment is often employed in a concept-learning context, where the goal is to learn a hypothesis that covers all positive and none of the negative examples. It can also be employed in statistical relational learning.

Synonyms

See also

Logic of generality, inductive logic programming, hypothesis language; statistical relational learning; entailment.


References and recommended reading

1. L. De Raedt. Logical and Relational Learning. Springer. 2008.

2. L. De Raedt. Logical settings for concept learning. Artificial Intelligence, 95:187–201, 1997.


Learning from interpretations

Definition

Learning from interpretations is an inductive logic programming setting that is also relevant to statistical relational learning. When learning from interpretations, hypotheses are logical formulae, typically definite clauses, which underlie the programming language Prolog, and instances are logical interpretations. For propositional theories, interpretations are assignments of truth-values to propositional variables. For instance, when describing birds, two possible interpretations could be

{ostrich, small}
{blackbird, bird, normal, flies}

where we specify interpretations through the set of propositional variables that are true. This means that in the first interpretation the only true propositions are ostrich and small. An interpretation specifies a kind of possible world. A hypothesis h then covers an interpretation if and only if the interpretation is a model for the hypothesis. An interpretation is a model for a hypothesis if it satisfies all clauses in the hypothesis. Consider now the hypothesis h consisting of the following clauses.

flies :- bird, normal.
bird :- blackbird.
bird :- ostrich.

The first clause or rule can be read as flies if normal and bird, that is, normal birds fly. The second and third state that blackbirds, resp. ostriches, are birds. The second interpretation is a model for the theory h but the first is not. Because the condition part of the rule bird :- ostrich. is satisfied in the first interpretation (as it contains ostrich), the conclusion part, that is bird, should also belong to the interpretation in order to have a model. Thus, the second example is covered by the theory h, but the first is not.

Learning from interpretations is often employed in a concept-learning context, where the goal is to learn a hypothesis that covers all positive and none of the negative examples. It can also be employed in statistical relational learning.


Synonyms

See also

Logic of generality, inductive logic programming, hypothesis language; statistical relational learning

References and recommended reading

1. L. De Raedt. Logical and Relational Learning. Springer. 2008.

2. L. De Raedt. Logical settings for concept learning. Artificial Intelligence, 95:187–201, 1997.


Learning from proofs

Definition

Learning from proofs is an inductive logic programming setting that is also relevant to statistical relational learning, inductive programming and trace-based programming. When learning from proofs, hypotheses are logical formulae, typically definite clauses, which underlie the programming language Prolog, and instances are logical proofs. A hypothesis h then covers an instance if and only if the instance is a proof in the theory h.

Consider the following theory:

nat(0).
nat(s(X)) :- nat(X).

where the first clause states that 0 is a natural number and the second one that, if X is a natural number, then s(X) is also a natural number. Given this theory, the example

:-nat(s(s(0))) --- :-nat(s(0)) --- :-nat(0) --- []

is a legal proof, but the example

:-nat(s(s(5))) --- :-nat(s(5)) --- :-nat(5) --- []

is not, because the last step is invalid (the proofs are shown as SLD-refutation proofs).

Learning from proofs is often employed in an inductive logic programming context where the goal is to learn a hypothesis that covers all positive and none of the negative examples. It can also be employed in statistical relational learning and inductive programming.

Synonyms

learning from traces.

See also

Logic of generality, inductive logic programming, hypothesis language; statistical relational learning; trace-based programming; inductive programming.


References and recommended reading

1. L. De Raedt. Logical and Relational Learning. Springer. 2008.


Minimally general generalization

Definition

One hypothesis is said to be more general than another one if the former hypothesis covers all instances that are covered by the latter one. The resulting generality relation typically partially orders the hypothesis space, that is, the generality relation is reflexive, anti-symmetric and transitive. One cautious strategy for learning performs cautious generalization by repeatedly computing the minimally general generalizations of hypotheses. The minimally general generalizations of h1 and h2 are the most specific hypotheses that are more general than h1 and h2. Using the notation g ⪰ s to denote that g is more general than s, the set of minimally general generalizations mgg(h1, h2) is defined as follows:

mgg(h1, h2) = min{h ∈ L | h ⪰ h1 and h ⪰ h2}

where L denotes the hypothesis language and min denotes the most specific elements of the set.

As one example of a minimally general generalization, consider the hypothesis language consisting of strings, and as generality relation the substring relation. A string s0...sn is a substring of a string t0...tk if and only if ∃j : s0 = tj and ... and sn = tj+n. For instance, achin is a substring of machine learning. The minimally general generalizations of abcdefabc and defabcdef are defabc and abcdef.
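A small Python sketch of this string example: since a string g is more general than s when g is a substring of s, the minimally general generalizations of two strings are their maximal common substrings.

def substrings(s):
    return {s[i:j] for i in range(len(s)) for j in range(i, len(s) + 1)}

def mgg(s1, s2):
    common = substrings(s1) & substrings(s2)
    # keep only the common substrings that are not contained in a longer one
    return {c for c in common
            if not any(c != d and c in d for d in common)}

print(mgg("abcdefabc", "defabcdef"))   # {'abcdef', 'defabc'}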

If it is guaranteed that there exists a unique minimally general generalization, then one talks about the least general generalization.

Synonyms

See also

least general generalization, logic of generality, inductive logic programming

References and recommended reading

1. L. De Raedt. Logical and Relational Learning. Springer. 2008.

2. T. Mitchell. Machine Learning. McGraw-Hill. 1997.


Least general generalization

Definition

One hypothesis is said to be more general than another one if the former hypothesis covers all instances that are covered by the latter one. The resulting generality relation typically partially orders the hypothesis space, that is, the generality relation is reflexive, anti-symmetric and transitive. One cautious strategy for learning performs cautious generalization by repeatedly computing the least general generalization of hypotheses. The least general generalization of two hypotheses h1 and h2 is the unique most specific generalization of h1 and h2, that is, it is the least upper bound of h1 and h2 in the partial order. The least general generalization of two hypotheses does not always exist. Also, in case the most specific generalizations are not unique, one talks about the minimally general generalization.

For instance, when the hypotheses h1 and h2 are propositional conjunctions, then their least general generalization is the intersection of the conjunctions. Indeed, consider the hypotheses

red and large and square and filled
blue and large and square and closed

then the least general generalization is

large and square

The least general generalization is especially known in the context of inductive logic programming, where it is used to refer to Plotkin's operation for computing the least general generalization of two clauses under θ-subsumption.

Synonyms

See also

least general generalization, logic of generality, generality relation

References and recommended reading

1. L. De Raedt. Logical and Relational Learning. Springer. 2008.

2. T. Mitchell. Machine Learning. McGraw-Hill. 1997.


Resolution

Definition

Resolution is a logical inference rule used in theorem provers and logic programming to perform deductive inference. The resolution inference rule starts from two clauses and generates a so-called resolvent, that is, a clause that is logically entailed by the two clauses. For propositional logic, the resolution rule can be written as follows:

h ← g, a1, . . . , an   and   g ← b1, . . . , bm
-----------------------------------------------------   (17)
h ← b1, . . . , bm, a1, . . . , an

or when writing clauses as disjunctions

h ∨ ¬g ∨ ¬a1 ∨ . . . ∨ ¬an   and   g ∨ ¬b1 ∨ . . . ∨ ¬bm
-----------------------------------------------------------   (18)
h ∨ ¬b1 ∨ . . . ∨ ¬bm ∨ ¬a1 ∨ . . . ∨ ¬an

For instance, consider the clauses

flies :- bird, normal.
bird :- blackbird.

where the first clause states that normal birds fly and the second one that blackbirds are birds. The resolvent of these two clauses is

flies :- blackbird, normal.

and is therefore logically entailed by the two clauses.

Even though we have only introduced resolution for propositional logic, there also exists a (more involved) version of resolution for first-order logic.
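A minimal Python sketch of the propositional resolution step, assuming a definite clause is represented as a pair of a head atom and a set of body atoms: if the head of the second clause occurs in the body of the first, it is replaced by the second clause's body.

def resolve(c1, c2):
    head1, body1 = c1
    head2, body2 = c2
    if head2 not in body1:
        return None
    return (head1, (body1 - {head2}) | body2)

c1 = ("flies", frozenset({"bird", "normal"}))
c2 = ("bird", frozenset({"blackbird"}))
print(resolve(c1, c2))   # flies :- blackbird, normal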

Synonyms

See also

Logic of generality, entailment, inverse resolution


References and recommended reading

1. L. De Raedt. Logical and Relational Learning. Springer. 2008.

2. P. Flach. Simply Logical. Wiley. 1994.

3. S. Russell and P. Norvig. Artificial Intelligence: a Modern Approach. 2nd Edition. Prentice Hall.


Inverse Resolution

Definition

Inverse resolution is, as the name indicates, a rule that inverts resolution. This follows the idea of induction as the inverse of deduction formulated in the logic of generality. The resolution rule is the best-known deductive inference rule, used in many theorem provers and logic programming systems. Resolution starts from two clauses and derives the resolvent, a clause that is entailed by the two clauses. This can be graphically represented using the following schema (for propositional logic).

h ← g, a1, . . . , an   and   g ← b1, . . . , bm
-----------------------------------------------------   (19)
h ← b1, . . . , bm, a1, . . . , an

Inverse resolution operators, such as absorption and identification, invert this process. To this aim, they typically assume that the resolvent is given together with one of the original clauses and then derive the missing clause. This leads to the following two operators, which start from the clauses below the line and induce the clause above the line.

h ← g, a1, . . . , an   and   g ← b1, . . . , bm
---------------------------------------------------------------   (20)
h ← b1, . . . , bm, a1, . . . , an   and   g ← b1, . . . , bm

h ← g, a1, . . . , an   and   g ← b1, . . . , bm
---------------------------------------------------------------   (21)
h ← b1, . . . , bm, a1, . . . , an   and   h ← g, a1, . . . , an

The operators are shown here only for the propositional case, as the first-order case is more involved: it requires one to deal with substitutions as well as inverse substitutions.

As one example, consider the clauses

(1) flies :- bird, normal.
(2) bird :- blackbird.
(3) flies :- blackbird, normal.

Here, (3) is the resolvent of (1) and (2). Furthermore, starting from (3) and the clause (2), the absorption operator would generate (1), and starting from (3) and (1), the identification operator would generate (2).


Synonyms

See also

Logic of generality, resolution,

References and recommended reading

1. L. De Raedt. Logical and Relational Learning. Springer. 2008.

2. S. Muggleton. Duce, an oracle based approach to constructive induction. In Proceedings of the 10th International Joint Conference on Artificial Intelligence, pages 287–292. Morgan Kaufmann, 1987.

3. S. Muggleton and W. Buntine. Machine invention of first order predicates by inverting resolution. In Proceedings of the 5th International Workshop on Machine Learning, pages 339–351. Morgan Kaufmann, 1988.


Entailment

Definition

The term entailment is used in the context of logical reasoning. Formally, a logical formula H entails a formula c if and only if all models of H are also models of c. This is usually denoted as H |= c and means that c is a logical consequence of H, or that c is implied by H.

Let us elaborate this definition for propositional clausal logic, where the formulae are expressions such as:

flies :- bird, normal.
bird :- blackbird.
bird :- ostrich.

Here, the first clause or rule can be read as flies if normal and bird, that is, normal birds fly, the second and third one stating that blackbirds, resp. ostriches, are birds. An interpretation is then an assignment of truth-values to the propositional variables. For instance, for the above domain

{ostrich, bird}
{blackbird, bird, normal}

are interpretations, specified through the set of propositional variables that are true. This means that in the first interpretation the only true propositions are ostrich and bird. An interpretation specifies a kind of possible world. An interpretation I is then a model for a clause h :- b1, ..., bn if and only if {b1, ..., bn} ⊆ I → h ∈ I, and it is a model for a clausal theory if and only if it is a model for all clauses in the theory. Therefore, the first interpretation above is a model for the theory, but the second one is not, because the interpretation is not a model for the first clause (as {bird, normal} ⊆ I but flies ∉ I). Using these notions, it can now be verified that the clausal theory above logically entails the clause

flies :- ostrich, normal.

because all models of the theory are also models of this clause.

In machine learning, the notion of entailment is used as a covers relation in inductive logic programming, where hypotheses are clausal theories, instances are clauses, and an example is covered by the hypothesis when it is entailed by the hypothesis.
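A minimal Python sketch of this definition for propositional definite clause theories, with a clause represented as a pair of a head atom and a set of body atoms; entailment is checked naively by enumerating all interpretations over the occurring propositions.

from itertools import combinations

def is_model(interp, clauses):
    return all(not body <= interp or head in interp for head, body in clauses)

def entails(theory, clause):
    props = sorted({p for h, b in theory + [clause] for p in {h} | b})
    for r in range(len(props) + 1):
        for subset in combinations(props, r):
            interp = set(subset)
            if is_model(interp, theory) and not is_model(interp, [clause]):
                return False
    return True

T = [("flies", frozenset({"bird", "normal"})),
     ("bird", frozenset({"blackbird"})),
     ("bird", frozenset({"ostrich"}))]
print(entails(T, ("flies", frozenset({"ostrich", "normal"}))))  # True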


Synonyms

logical consequence, implication

See also

Learning from entailment, learning from interpretations, logic of generality, inverse entailment.

References and recommended reading

1. L. De Raedt. Logical and Relational Learning. Springer. 2008.

2. P. Flach. Simply Logical. Wiley. 1994.

3. S. Russell and P. Norvig. Artificial Intelligence: a Modern Approach. 2nd Edition. Prentice Hall.


Inverse entailment

Definition

Inverse entailment is a generality relation in inductive logic programming. More specifically, when learning from entailment using a background theory B, a hypothesis H covers an example e, relative to the background theory B, if and only if B ∧ H |= e, that is, the background theory B and the hypothesis H together entail the example (see entailment). For instance, consider the background theory B:

bird :- blackbird.
bird :- ostrich.

and the hypothesis H:

flies :- bird.

Together B ∧H entail the example e:

flies :- blackbird, normal.

This can be decided through deductive inference. Now, when learning from entailment in inductive logic programming, one starts from the example e and the background theory B, and the aim is to induce a rule H that together with B entails the example. Inverse entailment now rests on the observation that B ∧ H |= e is logically equivalent to B ∧ ¬e |= ¬H, which in turn can be used to compute a hypothesis H that will cover the example relative to the background theory. Indeed, the negation of the example is ¬e:

blackbird.
normal.
:- flies.

and together with B this entails ¬H:

bird.
:- flies.

The principle of inverse entailment is typically employed to compute the bottom clause, which is the most specific clause covering the example under entailment. It can be computed by generating the set of all facts (true and false) that are entailed by B ∧ ¬e and negating the resulting formula ¬H.


Synonyms

See also

logic of generality, entailment, bottom clause, inductive logic programming

References and recommended reading

1. S. Muggleton. Inverse entailment and Progol. New Generation Computing, 13(3-4):245–286, 1995.


Bottom Clause

Definition

The bottom clause is a notion from the field of inductive logic programming. It is used to refer to the most specific hypothesis covering a particular example when learning from entailment. When learning from entailment, a hypothesis H covers an example e relative to the background theory B if and only if B ∧ H |= e, that is, B together with H entails the example e. The bottom clause is now the most specific clause satisfying this relationship w.r.t. the background theory B and a given example e.

For instance, given the background theory B

bird :- blackbird.
bird :- ostrich.

and the example e:

flies :- blackbird, normal.

the bottom clause is H

flies :- bird, blackbird, normal.

The bottom clause can be used to constrain the search for clauses covering the given example, because all clauses covering the example relative to the background theory should be more general than the bottom clause. The bottom clause can be computed using inverse entailment.

Synonyms

Saturation, starting clause

See also

logic of generality, entailment, inverse entailment, inductive logic programming


References and recommended reading

1. S. Muggleton. Inverse entailment and Progol. New Generation Computing, 13(3-4):245–286, 1995.

2. L. De Raedt. Logical and Relational Learning. Springer. 2008.


Predicate invention

Definition

Predicate invention is used in inductive logic programming to refer to the automatic introduction of new relations or predicates in the hypothesis language. Inventing relevant new predicates is one of the hardest tasks in machine learning, because there are so many possible ways to introduce such predicates and because it is hard to judge their quality. As one example, consider a situation where the predicates fatherof and motherof are known. Then it would make sense to introduce a new predicate that is true whenever fatherof or motherof is true. The new predicate that would be introduced this way corresponds to the parentof predicate. Predicate invention has been introduced in the context of inverse resolution.
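A minimal Python sketch of propositional intra-construction, the inverse resolution operator behind this kind of predicate invention. The clause representation ((head, body) pairs) and the proposition names caresfor and adult are assumptions of the sketch, and the name q of the invented predicate is arbitrary.

def intra_construction(c1, c2, new_predicate="q"):
    (p1, body1), (p2, body2) = c1, c2
    assert p1 == p2, "both rules must define the same predicate"
    shared = body1 & body2
    return [(p1, shared | {new_predicate}),
            (new_predicate, body1 - shared),
            (new_predicate, body2 - shared)]

c1 = ("caresfor", frozenset({"adult", "fatherof"}))
c2 = ("caresfor", frozenset({"adult", "motherof"}))
for clause in intra_construction(c1, c2):
    print(clause)
# the invented predicate q collects fatherof and motherof, i.e. it plays the role of parentof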

Synonyms

See also

logic of generality, inductive logic programming.

References and recommended reading

1. L. De Raedt. Logical and Relational Learning. Springer. 2008.

2. S. Muggleton. Duce, an oracle based approach to constructive induction. In Proceedings of the 10th International Joint Conference on Artificial Intelligence, pages 287–292. Morgan Kaufmann, 1987.

3. S. Muggleton and W. Buntine. Machine invention of first order predicates by inverting resolution. In Proceedings of the 5th International Workshop on Machine Learning, pages 339–351. Morgan Kaufmann, 1988.
