CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets Carlo Zaniolo Department of Computer Science University of California, Los Angeles January, 2002 Notes From Chapter 9 of Advanced Database Systems by Zaniolo, Ceri, Faloutsos, Snodgrass, Subrahmanian and Zicari Morgan Kaufmann, 1997

Page 1: CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

CS240A: Databases and Knowledge Bases

From Differential Fixpoints to Magic Sets

Carlo Zaniolo

Department of Computer Science

University of California, Los Angeles

January, 2002

Notes From Chapter 9 of Advanced Database Systems by Zaniolo, Ceri, Faloutsos, Snodgrass, Subrahmanian and Zicari Morgan Kaufmann, 1997

Page 2: CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

Recursive Predicates

r1: anc(X, Y) ← parent(X, Y).

r2: anc(X, Z) ← anc(X, Y), parent(Y, Z).

r2 is a recursive rule, and a left-linear one.

r1 is a nonrecursive rule defining a recursive predicate; such a rule is called an exit rule.

An alternative definition for anc:

r3: anc(X, Y) ← parent(X, Y).

r4: anc(X, Z) ← anc(X, Y), anc(Y, Z).

Here r4 is a quadratic rule.

Page 3: CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

Fixpoint Computation

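The inflationary immediate consequence operator for P:

\[ \Gamma_P(I) = T_P(I) \cup I \]

We have:

\[ \Gamma_P^{\uparrow n}(\emptyset) = T_P^{\uparrow n}(\emptyset) \]

\[ \mathrm{lfp}(T_P) = T_P^{\uparrow \omega}(\emptyset) = \mathrm{lfp}(\Gamma_P) = \Gamma_P^{\uparrow \omega}(\emptyset) \]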

Page 4: CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

Fixpoint Computation (cont.)

Naïve Fixpoint Algorithm for \(\Gamma_P\) (\(M = \emptyset\) for now):

{ \(S := M\); \(S' := \Gamma_P(M)\);

while \(S \subset S'\) do { \(S := S'\);

\(S' := \Gamma_P(S)\) } }

We can replace the first \(\Gamma_P\) with \(\Gamma_E\) and the second one with \(\Gamma_R\), respectively denoting the inflationary immediate consequence operators for the exit rules and the recursive ones.
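The naive algorithm above can be sketched in Python for the anc program. This is only an illustration under stated assumptions: the `PARENT` facts are a small hypothetical database, the interpretation tracks anc atoms only (parent is treated as fixed), and the names `t_p`, `gamma_p`, and `naive_fixpoint` are made up for this sketch.

```python
# Hypothetical parent facts: (parent, child).
PARENT = {("anne", "silvia"), ("silvia", "marc")}

def t_p(anc):
    """Immediate consequence operator T_P for the two anc rules."""
    exit_atoms = set(PARENT)                        # anc(X, Y) <- parent(X, Y)
    rec_atoms = {(x, z)                             # anc(X, Z) <- anc(X, Y), parent(Y, Z)
                 for (x, y) in anc
                 for (y2, z) in PARENT if y == y2}
    return exit_atoms | rec_atoms

def gamma_p(anc):
    """Inflationary operator: Gamma_P(I) = T_P(I) union I."""
    return t_p(anc) | anc

def naive_fixpoint(m=frozenset()):
    """S := M; S' := Gamma_P(M); iterate while S is a proper subset of S'."""
    s, s_next = set(m), gamma_p(set(m))
    while s < s_next:                               # '<' is proper subset on sets
        s, s_next = s_next, gamma_p(s_next)
    return s_next

anc = naive_fixpoint()
```

Every call to `gamma_p` joins the whole of the current anc relation with parent, so atoms derived in earlier steps are derived again at each iteration.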

Page 5: CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

Differential Fixpoint (a.k.a. Seminaive Computation)

Redundant computation: the jth iteration step also recomputes all atoms obtained in the (j − 1)th step. Finite-difference techniques trace the derivations over two steps:

1.  \(S\): the set of atoms obtained up to step j − 1

2.  \(S'\): the set of atoms obtained up to step j

3.  \(\delta S = \Gamma_R(S) - S = T_R(S) - S\) denotes the new atoms at step j (i.e., the atoms that were not in \(S\) at step j − 1)

4.  \(\delta' S = \Gamma_R(S') - S' = T_R(S') - S'\) are the new atoms obtained at step j + 1.

Page 6: CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

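Differential Fixpoint Algorithm (\(M = \emptyset\) for now):

{ \(S := M\); \(\delta S := T_E(M)\);

\(S' := S \cup \delta S\); while \(\delta S \neq \emptyset\) do

{ \(\delta' S := T_R(S') - S'\);

\(S := S'\); \(\delta S := \delta' S\);

\(S' := S \cup \delta S\) } }

\(anc\), \(\delta anc\), and \(anc'\), respectively, denote ancestor atoms that are in \(S\), \(\delta S\), and \(S' = S \cup \delta S\).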
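A minimal Python sketch of the differential fixpoint loop above, following it step by step. The `PARENT` facts and the function names are assumptions made for illustration; the recursive operator is still applied to all of S', exactly as in the algorithm as stated.

```python
# Hypothetical parent facts: (parent, child).
PARENT = {("anne", "silvia"), ("silvia", "marc"), ("marc", "lea")}

def t_e():
    """T_E, exit rule: anc(X, Y) <- parent(X, Y)."""
    return set(PARENT)

def t_r(anc):
    """T_R, recursive rule: anc(X, Z) <- anc(X, Y), parent(Y, Z)."""
    return {(x, z) for (x, y) in anc for (y2, z) in PARENT if y == y2}

def differential_fixpoint():
    s = set()                            # S := M (empty for now)
    delta = t_e()                        # delta S := T_E(M)
    s_new = s | delta                    # S' := S union delta S
    while delta:                         # while delta S is not empty
        new_delta = t_r(s_new) - s_new   # delta' S := T_R(S') - S'
        s, delta = s_new, new_delta      # S := S'; delta S := delta' S
        s_new = s | delta                # S' := S union delta S
    return s_new

anc = differential_fixpoint()
```

The loop stops as soon as an iteration contributes no new atoms, which is the same termination condition as in the naive version expressed through the delta set.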

Page 7: CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

Rule Differentiation

To compute \(\delta' S := T_R(S') - S'\) we can use a \(T_R\) defined by the following rule:

\(\delta' anc(X, Z) \leftarrow anc'(X, Y),\ parent(Y, Z).\)

This can be rewritten as:

\(\delta' anc(X, Z) \leftarrow \delta anc(X, Y),\ parent(Y, Z).\)

\(\delta' anc(X, Z) \leftarrow anc(X, Y),\ parent(Y, Z).\)

The second rule can now be eliminated, since it produces only atoms that were already contained in \(anc'\), i.e., in the \(S'\) computed in the previous iteration.

Thus, for linear rules, replace \(\delta' S := T_R(S') - S'\) by

\(\delta' S := T_R(\delta S) - S'\)

For nonlinear rules the rewriting is more complex.
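The effect of the linear-rule replacement can be sketched as follows: the recursive rule is applied to the delta relation only, and the result is compared with a naive evaluation. The chain-shaped parent relation and all function names are assumptions chosen for illustration.

```python
def join(left, right):
    """Pairs (x, z) such that (x, y) is in left and (y, z) is in right."""
    return {(x, z) for (x, y) in left for (y2, z) in right if y == y2}

def seminaive_anc(parent):
    s_new = set(parent)                        # S' after the exit rule
    delta = set(parent)                        # delta S
    while delta:
        delta = join(delta, parent) - s_new    # delta' S := T_R(delta S) - S'
        s_new |= delta
    return s_new

def naive_anc(parent):
    anc = set(parent)
    while True:
        nxt = anc | join(anc, parent)          # re-derives all old atoms too
        if nxt == anc:
            return anc
        anc = nxt

# Hypothetical chain 0 -> 1 -> 2 -> 3 -> 4 -> 5.
parent = {(i, i + 1) for i in range(5)}
result = seminaive_anc(parent)
```

Both functions return the same relation; the seminaive one only ever joins the newest atoms with parent.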

Page 8: CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

Non Linear Rules

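ancs(X, Y) ← parent(X, Y).

ancs(X, Z) ← ancs(X, Y), ancs(Y, Z).

\(r: \delta' ancs(X, Z) \leftarrow ancs'(X, Y),\ ancs'(Y, Z).\)

\(r_1: \delta' ancs(X, Z) \leftarrow \delta ancs(X, Y),\ ancs'(Y, Z).\)

\(r_2: \delta' ancs(X, Z) \leftarrow ancs(X, Y),\ ancs'(Y, Z).\)

Now, we can rewrite \(r_2\) as:

\(r_{2,1}: \delta' ancs(X, Z) \leftarrow ancs(X, Y),\ \delta ancs(Y, Z).\)

\(r_{2,2}: \delta' ancs(X, Z) \leftarrow ancs(X, Y),\ ancs(Y, Z).\)

Rule \(r_{2,2}\) produces only `old' values, and can be eliminated. We are left with rules \(r_1\) and \(r_{2,1}\):

\(\delta' ancs(X, Z) \leftarrow \delta ancs(X, Y),\ ancs'(Y, Z).\)

\(\delta' ancs(X, Z) \leftarrow ancs(X, Y),\ \delta ancs(Y, Z).\)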
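The two remaining differentiated rules can be sketched in Python as two joins per iteration. The parent chain and the function names are hypothetical; `s`, `delta`, and `s_new` play the roles of ancs, the delta relation, and the primed relation.

```python
def join(left, right):
    """Pairs (x, z) such that (x, y) is in left and (y, z) is in right."""
    return {(x, z) for (x, y) in left for (y2, z) in right if y == y2}

def seminaive_quadratic(parent):
    s = set()                    # ancs  (atoms up to the previous step)
    delta = set(parent)          # delta ancs, seeded by the exit rule
    s_new = s | delta            # ancs' = ancs union delta ancs
    while delta:
        derived = (join(delta, s_new)     # delta ancs(X, Y), ancs'(Y, Z)
                   | join(s, delta))      # ancs(X, Y), delta ancs(Y, Z)
        new_delta = derived - s_new       # keep only genuinely new atoms
        s, delta = s_new, new_delta
        s_new = s | delta
    return s_new

# Hypothetical chain a -> b -> c -> d -> e.
parent = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")}
ancs = seminaive_quadratic(parent)
```

Neither join pairs two old relations with each other, which is the case that the eliminated rule would have covered.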

Page 9: CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

Seminaive Fixpoint (cont.)

Analogy with symbolic differentiation.

Performance improvements: it is typically the case that \(n = |\delta S| \ll N = |S| \approx |S'|\). The original ancs rule, for instance, requires the equijoin of two relations of size N; after the differentiation we need to compute two equijoins, each joining a relation of size n with one of size N.
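A back-of-the-envelope comparison makes the saving concrete. The sketch below assumes a plain nested-loop join, so the cost is the number of tuple pairs compared; real systems use indexes or hash joins, and the sizes are made-up figures.

```python
def naive_join_cost(N):
    """One equijoin of two relations of size N (nested-loop pair count)."""
    return N * N

def differentiated_join_cost(n, N):
    """Two equijoins, each pairing a relation of size n with one of size N."""
    return 2 * n * N

N, n = 10_000, 100           # hypothetical sizes with n << N
ratio = naive_join_cost(N) / differentiated_join_cost(n, N)
```

Under these assumptions the ratio is N / (2n), so the benefit grows as the delta becomes smaller relative to the accumulated relation.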

Page 10: CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

General Nonlinear Rules

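A recursive rule of rank k is as follows:

\(r: Q_0 \leftarrow c_0, Q_1, c_1, Q_2, \ldots, Q_k, c_k\)

It is rewritten as follows:

\(r_1: \delta' Q_0 \leftarrow c_0, \delta Q_1, c_1, Q'_2, \ldots, Q'_k, c_k\)

\(r_2: \delta' Q_0 \leftarrow c_0, Q_1, c_1, \delta Q_2, \ldots, Q'_k, c_k\)

\(\ldots\)

\(r_k: \delta' Q_0 \leftarrow c_0, Q_1, c_1, Q_2, \ldots, \delta Q_k, c_k\)

Thus the jth rule has the form:

\(r_j: \delta' Q_0 \leftarrow \ldots, Q_{j-1}, \ldots, \delta Q_j, \ldots, Q'_{j+1}, \ldots\)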
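The rewriting pattern can be sketched as a small rule generator: for the j-th recursive goal, goals to its left stay unprimed, the j-th goal becomes a delta goal, and recursive goals to its right are primed. The textual rule format and the function names are assumptions for illustration, not part of any deductive database system.

```python
def prime(goal):
    """Turn 'ancs(Y, Z)' into "ancs'(Y, Z)"."""
    name, rest = goal.split("(", 1)
    return name + "'(" + rest

def differentiate(head, body):
    """body is a list of (goal_text, is_recursive) pairs, in rule order."""
    rules = []
    for j, (_, rec_j) in enumerate(body):
        if not rec_j:
            continue                          # only recursive goals get a rule
        goals = []
        for i, (goal, rec) in enumerate(body):
            if not rec or i < j:
                goals.append(goal)            # nonrecursive goal, or old Q_i
            elif i == j:
                goals.append("δ" + goal)      # delta Q_j
            else:
                goals.append(prime(goal))     # Q'_i to the right of Q_j
        rules.append("δ'" + head + " <- " + ", ".join(goals) + ".")
    return rules

quadratic = differentiate("ancs(X, Z)",
                          [("ancs(X, Y)", True), ("ancs(Y, Z)", True)])
linear = differentiate("anc(X, Z)",
                       [("anc(X, Y)", True), ("parent(Y, Z)", False)])
```

A rule of rank k yields k differentiated rules, so the linear anc rule yields one and the quadratic ancs rule yields two.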

Page 11: CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

Iterated Fixpoint Computation for program P stratified in n strata

Let \(P_j\), \(1 \le j \le n\), denote the rules with their head in the j-th stratum. Then, let \(M_j\) be inductively constructed as follows:

1.  \(M_0 = \emptyset\) and

2.  \(M_j = \Gamma_{P_j}^{\uparrow \omega}(M_{j-1})\).

The naïve fixpoint algorithm remains the same, but \(M := M_{j-1}\) and \(\Gamma_P\) is replaced by \(\Gamma_{P_j}\).

Theorem: Let P be a positive program stratified in n strata, and let \(M_n\) be the result produced by the iterated fixpoint computation. Then, \(M_n = \mathrm{lfp}(T_P)\).

For programs with negated goals the computation by strata is necessary to produce the correct result (i.e., \(M_n\) is the stable model for P, which is not discussed here).
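A sketch of the iterated fixpoint for a positive program, assuming a hypothetical two-stratum split: stratum 1 holds the facts and the parent rules, stratum 2 holds the anc rules. Atoms are `(predicate, x, y)` tuples, and every name below is invented for this illustration.

```python
FACTS = {("mother", "anne", "silvia"), ("mother", "silvia", "marc")}

def t_p1(i):
    """Stratum 1: database facts plus parent <- father and parent <- mother."""
    return FACTS | {("parent", x, y)
                    for (p, x, y) in i if p in ("father", "mother")}

def t_p2(i):
    """Stratum 2: the two anc rules, reading parent from the stratum below."""
    parent = {(x, y) for (p, x, y) in i if p == "parent"}
    anc = {(x, y) for (p, x, y) in i if p == "anc"}
    out = {("anc", x, y) for (x, y) in parent}
    out |= {("anc", x, z) for (x, y) in anc for (y2, z) in parent if y == y2}
    return out

def saturate(t, m):
    """Naive fixpoint of the inflationary operator for t, starting from m."""
    s = set(m)
    while True:
        s_next = t(s) | s
        if s_next == s:
            return s
        s = s_next

def iterated_fixpoint(strata):
    m = set()                    # M_0 is empty
    for t in strata:             # M_j is the fixpoint for P_j started at M_{j-1}
        m = saturate(t, m)
    return m

model = iterated_fixpoint([t_p1, t_p2])
```

Each stratum is saturated completely before the next one starts, so the anc rules only ever see a finished parent relation.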

Page 12: CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

Bottom-Up versus Top-Down Computation

Compiled rules:

anc(X, Y) ← parent(X, Y).

anc(X, Z) ← anc(X, Y), parent(Y, Z).

parent(X, Y) ← father(X, Y).

parent(X, Y) ← mother(X, Y).

Database:

mother(anne, silvia).

mother(silvia, marc).

The differential fixpoint is computed in a bottom-up fashion. For a query ?anc(X, Y) this is optimal.

But for many queries, such as ?anc(marc, Y), we want to propagate down the ‘marc’ constraint. The same holds for the query forms ?anc($X, Y), ?anc(X, $Y), or ?anc($X, $Y).

Page 13: CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

Specialization for Left-linear Recursive Rules

?anc(tom, Desc).

anc(Old, Young) ← parent(Old, Young).

anc(Old, Young) ← anc(Old, Mid), parent(Mid, Young).

This is changed into:

?anc(tom, Desc).

anc(Old/tom, Young) ← parent(Old/tom, Young).

anc(Old/tom, Young) ← anc(Old/tom, Mid), parent(Mid, Young).

This is similar to pushing selections inside recursion, as done by query optimizers.

This works for left-linear rules with the query form: ?anc($Someone, Desc)
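The specialized program can be sketched as a seminaive loop that only ever holds atoms of the form anc(tom, Young). The parent facts, including the unrelated joe family, and the function name are hypothetical.

```python
def descendants_of(root, parent):
    # anc(root, Young) <- parent(root, Young).
    delta = {young for (old, young) in parent if old == root}
    result = set(delta)
    while delta:
        # anc(root, Young) <- anc(root, Mid), parent(Mid, Young).
        delta = {young for (mid, young) in parent if mid in delta} - result
        result |= delta
    return result

# Hypothetical facts; the joe/sue/tim family is unrelated to tom.
parent = {("tom", "ann"), ("ann", "bob"), ("joe", "sue"), ("sue", "tim")}
desc = descendants_of("tom", parent)
```

Since the first argument is fixed, only the second argument needs to be stored, and atoms about other families are never derived.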

Page 14: CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

Right-linear rules

anc(Old, Young) ← parent(Old, Young).

anc(Old, Young) ← parent(Old, Mid), anc(Mid, Young).

Descendants of Tom: ?anc(tom, X)

This query can no longer be implemented by specializing the program. Solution: turn the rules into equivalent left-linear ones!

The situation is symmetric. A query such as ?anc(X, $Y) cannot be supported on the left-linear version of the program. But that program can be transformed into the right-linear rules above, to which specialization can apply.

For each left (right) linear program there exists an equivalent right (left) linear program, similar to regular grammars in programming languages.

Deductive Database compilers do that.
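The equivalence of the two formulations can be checked on an example by evaluating both bottom-up. This sketch only compares results on one hypothetical parent relation; it is not a proof and does not show how a compiler performs the transformation.

```python
def left_linear(anc, parent):
    """anc(Old, Young) <- anc(Old, Mid), parent(Mid, Young)."""
    return {(x, z) for (x, y) in anc for (y2, z) in parent if y == y2}

def right_linear(anc, parent):
    """anc(Old, Young) <- parent(Old, Mid), anc(Mid, Young)."""
    return {(x, z) for (x, y) in parent for (y2, z) in anc if y == y2}

def closure(parent, recursive_rule):
    anc = set(parent)                        # exit rule
    while True:
        nxt = anc | recursive_rule(anc, parent)
        if nxt == anc:
            return anc
        anc = nxt

# Hypothetical facts: (parent, child).
parent = {("anne", "silvia"), ("silvia", "marc"), ("anne", "tom")}
left = closure(parent, left_linear)
right = closure(parent, right_linear)
```

Both versions define the same anc relation; they differ only in which argument is passed unchanged from head to recursive goal, and therefore in which query form can be specialized.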

Page 15: CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

The Magic Set Method

Specialization only works for left/right linear programs. It does not work in general, even for linear rules. The same-generation example:

sg(A, A).

sg(X, Y) ← parent(XP, X), sg(XP, YP), parent(YP, Y).

?sg(marc, Who).

This program cannot be computed in a bottom-up fashion because the exit rule is not safe.

We can compute a “magic” set containing all the ancestors of marc and add it as a goal to the two rules.

Page 16: CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

Magic Sets for Non-recursive Rules

Find the graduating seniors and their parents' addresses:

spa(SN, PN, Paddr) ← senior(SN), parent(SN, PN), address(PN, Paddr).

senior(SN) ← student(SN, _, senior), graduating(SN).

To find the address of the parent named 'Joe Doe':

?spa(SN, 'Joe Doe', Paddr)

Suppose that computing parent(X, $Y) is safe and not too expensive.

Page 17: CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

Magic Set Rewriting

spa_q('Joe Doe').

m.senior(SN) ← spa_q(PN), parent(SN, PN).

senior(SN) ← m.senior(SN), student(SN, _, senior), graduating(SN).

The rest remains unchanged:

spa(SN, PN, Paddr) ← senior(SN), parent(SN, PN), address(PN, Paddr).

?spa(SN, 'Joe Doe', Paddr).
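The rewritten program can be traced in Python over a toy database. All facts below are invented for illustration, and the sketch assumes that spa_q holds the parent name bound in the query, so the magic predicate collects the students who have that parent.

```python
# Hypothetical base relations.
student = {("ann", "cs", "senior"), ("bob", "ee", "senior"),
           ("cat", "cs", "junior")}
graduating = {"ann", "bob"}
parent = {("ann", "Joe Doe"), ("bob", "Mary Roe"), ("cat", "Joe Doe")}
address = {("Joe Doe", "1 Main St"), ("Mary Roe", "2 Oak Ave")}

# spa_q('Joe Doe').
spa_q = {"Joe Doe"}

# m.senior(SN) <- spa_q(PN), parent(SN, PN).
m_senior = {sn for (sn, pn) in parent if pn in spa_q}

# senior(SN) <- m.senior(SN), student(SN, _, senior), graduating(SN).
senior = {sn for (sn, _, year) in student
          if sn in m_senior and year == "senior" and sn in graduating}

# spa(SN, PN, Paddr) <- senior(SN), parent(SN, PN), address(PN, Paddr).
spa = {(sn, pn, addr)
       for sn in senior
       for (sn2, pn) in parent if sn2 == sn
       for (pn2, addr) in address if pn2 == pn}

# ?spa(SN, 'Joe Doe', Paddr).
answer = {t for t in spa if t[1] == "Joe Doe"}
```

The magic predicate keeps bob out of senior even though he is a graduating senior, because he is not relevant to the query.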

Page 18: CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

The Same Generation Example

sg(A, A).

sg(X, Y) ← parent(XP, X), sg(XP, YP), parent(YP, Y).

?sg(marc, Who).

This program cannot be computed in a bottom-up fashion because the exit rule is not safe.

We can compute a “magic” set containing all the ancestors of marc and add it as a goal to the two rules.

The magic set computation uses the bound arguments and goals in the rules (shown in blue on the original slide). The first argument of sg is bound in the query. Thus X is bound, and through the goal parent(XP, X) the binding is passed to XP in the recursive goal. The variables Y and YP remain unbound.

Page 19: CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

Magic Sets (Cont.)

Magic set rules:

m.sg(marc).

m.sg(XP) ← m.sg(X), parent(XP, X).

Transformed rules:

sg(X, X) ← m.sg(X).

sg(X, Y) ← parent(XP, X), sg(XP, YP), parent(YP, Y), m.sg(X).

Query: ?sg(marc, Who).

The rules for the magic predicates are built by using:

(1) the query constant as the exit rule (a fact);

(2) the bound arguments and predicates from the recursive rules, but with the head and tail switched.
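The magic set rules and the transformed sg rules can be evaluated bottom-up as sketched below. The parent facts (stored as `(parent, child)` pairs) and the function names are hypothetical.

```python
def magic_set(constant, parent):
    """m.sg(constant).  m.sg(XP) <- m.sg(X), parent(XP, X)."""
    magic, delta = {constant}, {constant}
    while delta:
        delta = {xp for (xp, x) in parent if x in delta} - magic
        magic |= delta
    return magic

def same_generation(constant, parent):
    magic = magic_set(constant, parent)
    sg = {(x, x) for x in magic}                # sg(X, X) <- m.sg(X).
    while True:
        # sg(X, Y) <- parent(XP, X), sg(XP, YP), parent(YP, Y), m.sg(X).
        derived = {(x, y)
                   for (xp, x) in parent if x in magic
                   for (xp2, yp) in sg if xp2 == xp
                   for (yp2, y) in parent if yp2 == yp}
        if derived <= sg:
            break
        sg |= derived
    return {y for (x, y) in sg if x == constant}    # ?sg(constant, Who).

# Hypothetical family: anne has children silvia and tom;
# silvia has child marc; tom has child lea.
parent = {("anne", "silvia"), ("anne", "tom"),
          ("silvia", "marc"), ("tom", "lea")}
who = same_generation("marc", parent)
```

With the magic goal in place the exit rule is safe, since X now ranges only over marc and his ancestors.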

Page 20: CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets

Recursive Methods

There are many other recursive methods, but the magic set method is the most general and the most widely used in deductive systems, including LDL++.