CS848: Topics in Databases: Foundations of Query Optimization
Topics covered
Introduction to description logic: Single column QL
The ALC family of dialects
Terminologies
Language extensions
Single column QL
D ::= THING | C
Q ::= D as x
    | (empty x)
    | (THING as x minus C as x)
    | (from Q1, Q2)
    | (elim x from x.A = y, elim y from y = x, Q)
    | (x.Pf1 = x.Pf2)
    | (THING as x minus x.Pf1 = x.Pf2)
    | (elim x x.R = y)
    | (THING as x minus elim x from x.R = y, elim y from y = x, THING as x minus Q)
Initial analysis
The language L2 consists of all formulae of FOPC with equality and constant functions that use at most two distinct variables.
Theorem: The satisfiability problem for L2 is NEXPTIME-complete.
Corollary: The query containment problem for single column QL is decidable for queries that are attribute free.
New syntax (cont’d)
Each single column QL form is replaced by a more compact concept constructor:
    new syntax     single column QL
    ⊤              THING
    ⊥              (empty x)
    ¬C             (THING as x minus C as x)
    C1 ⊓ C2        (from Q1, Q2)
    ∀A.D           (elim x from x.A = y, elim y from y = x, Q)
    Pf1 = Pf2      (x.Pf1 = x.Pf2)
    Pf1 ≠ Pf2      (THING as x minus x.Pf1 = x.Pf2)
    ∃R.⊤           (elim x x.R = y)
    ∀R.D           (THING as x minus elim x from x.R = y, elim y from y = x, THING as x minus Q)
The resulting grammar:
Q ::= D as x
D ::= ⊤ | C | ⊥ | ¬C | C1 ⊓ C2
    | ∀A.D | Pf1 = Pf2
    | Pf1 ≠ Pf2
    | ∃R.⊤ | ∀R.D | (D)
Concept dependencies
On terminology and notation: We call an instance of the language generated by D for a given DL a concept. A concept inclusion dependency C for a given DL is written
D1 ⊑ D2
and corresponds to the query containment dependency
(D1 as x) ⊑ (D2 as x).
A concept definition C for a given DL is written
C ≡ D
and corresponds to the query equivalence dependency
(C as x) ≡ (D as x).
CLASSIC† (our first DL)
                                      (syntax)       (semantics)
(universal concept)             D ::= ⊤              Δ (the domain of I)
(primitive concept)                 | C              (C)^I
(bottom concept)                    | ⊥              ∅
(atomic negation)                   | ¬C             Δ − (C)^I
(intersection)                      | D1 ⊓ D2        (D1)^I ∩ (D2)^I
(attribute value restriction)       | ∀A.D           {e : (A)^I(e) ∈ (D)^I}
(path agreement)                    | Pf1 = Pf2      {e : (Pf1)^I(e) = (Pf2)^I(e)}
(path disagreement)                 | Pf1 ≠ Pf2      {e : (Pf1)^I(e) ≠ (Pf2)^I(e)}
(existential quantification)        | ∃R.D           {e1 : ∃e2 : (e1, e2) ∈ (R)^I ∧ e2 ∈ (D)^I}
(role value restriction)            | ∀R.D           {e1 : ∀(e1, e2) ∈ (R)^I : e2 ∈ (D)^I}
                                    | (D)
†[Borgida and Patel-Schneider, 1994]
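The semantics column above can be read as a recursive evaluator over a finite database. The following is a minimal sketch, not part of the original slides: the tuple encoding of concepts, the dictionary layout of `db`, and the names `ext` and `follow` are all assumptions made for illustration. Path functions are encoded as tuples of attribute names, with the empty tuple standing for id.

```python
def follow(interp, path, e):
    """Apply a path function (tuple of attribute names; () is id) to element e."""
    for a in path:
        e = interp["attrs"][a][e]
    return e

def ext(interp, d):
    """Return the extension (D)^I of concept d in the finite interpretation interp."""
    dom = set(interp["domain"])
    if d == "TOP":                               # universal concept
        return dom
    if d == "BOT":                               # bottom concept
        return set()
    if isinstance(d, str):                       # primitive concept
        return set(interp["concepts"].get(d, ()))
    op = d[0]
    if op == "not":                              # atomic negation
        return dom - ext(interp, d[1])
    if op == "and":                              # intersection
        return ext(interp, d[1]) & ext(interp, d[2])
    if op == "all_attr":                         # ("all_attr", A, D)
        target = ext(interp, d[2])
        return {e for e in dom if interp["attrs"][d[1]][e] in target}
    if op == "agree":                            # ("agree", Pf1, Pf2)
        return {e for e in dom if follow(interp, d[1], e) == follow(interp, d[2], e)}
    if op == "disagree":                         # ("disagree", Pf1, Pf2)
        return {e for e in dom if follow(interp, d[1], e) != follow(interp, d[2], e)}
    if op in ("some", "all_role"):               # ("some", R, D) / ("all_role", R, D)
        pairs = interp["roles"].get(d[1], set())
        target = ext(interp, d[2])
        if op == "some":
            return {e1 for (e1, e2) in pairs if e2 in target}
        return {e for e in dom if all(e2 in target for (e1, e2) in pairs if e1 == e)}
    raise ValueError(f"unknown constructor: {op!r}")

# A hypothetical three-element database: attribute b is a total function, R is a role.
db = {
    "domain": {1, 2, 3},
    "concepts": {"EMP": {1, 2}},
    "attrs": {"b": {1: 2, 2: 2, 3: 3}},
    "roles": {"R": {(1, 3)}},
}
```

For instance, `ext(db, ("and", "EMP", ("agree", (), ("b",))))` computes the employees e with e = b(e) in this hypothetical database.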
Concept dependencies (cont’d)
The concept inclusion problem for a given DL is to determine if a concept inclusion dependency in the DL, D1 ⊑ D2, is an axiom; that is, to determine if (D1)^I ⊆ (D2)^I for every database I.
Theorem: The concept inclusion problem for CLASSIC is solvable in low order polynomial time.
An efficient decision procedure
Theorem: The following procedure decides if C = (D1 ⊑ D2) is an axiom for CLASSIC, and can be implemented in low order polynomial time.
1. Create a partial database I1 consisting of a single individual e in concept D1. Perform a simple chase of I1 to obtain a partial database I2.
2. Return true if the domain of I2 is empty, or if the tuple
⟨x : e, cnt : 1⟩
occurs in ⟦D2 as x⟧(I2)†; otherwise return false.
†Use forced semantics for agreements and disagreements.
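The two steps above can be sketched for a small fragment of CLASSIC. The sketch below is a simplification and not the procedure from the slides: it assumes only ⊤, ⊥, primitive concepts, atomic negation, intersection and attribute value restrictions, and it omits roles, path agreements and the `cnt` column. The tuple encoding and the names `chase`, `holds` and `included` are hypothetical.

```python
def chase(d1):
    """Step 1: build a partial database from one node labelled {d1}.

    Returns (labels, arcs), or None when a clash empties the domain.
    labels maps a node to its set of atomic facts; arcs maps (node, attribute) to a node.
    """
    labels = {0: set()}
    arcs = {}
    todo = [(0, d1)]
    while todo:
        n, d = todo.pop()
        if d == "TOP":
            continue
        if d == "BOT":
            return None
        if isinstance(d, str) or d[0] == "not":   # C or ¬C: record the fact
            labels[n].add(d)
        elif d[0] == "and":                       # split the intersection
            todo += [(n, d[1]), (n, d[2])]
        elif d[0] == "all":                       # ("all", A, D): follow or create the A-arc
            key = (n, d[1])
            if key not in arcs:
                arcs[key] = len(labels)
                labels[arcs[key]] = set()
            todo.append((arcs[key], d[2]))
        else:
            raise ValueError(f"unsupported constructor: {d[0]!r}")
    for lab in labels.values():                   # clash: both C and ¬C on one node
        if any(("not", c) in lab for c in lab if isinstance(c, str)):
            return None
    return labels, arcs

def holds(db, n, d):
    """Evaluate d at node n; n is None for a missing (unconstrained) attribute value."""
    labels, arcs = db
    if d == "TOP":
        return True
    if d == "BOT":
        return False
    if isinstance(d, str) or d[0] == "not":
        return n is not None and d in labels[n]
    if d[0] == "and":
        return holds(db, n, d[1]) and holds(db, n, d[2])
    if d[0] == "all":
        return holds(db, arcs.get((n, d[1])), d[2])
    raise ValueError(f"unsupported constructor: {d[0]!r}")

def included(d1, d2):
    """Step 2: D1 ⊑ D2 holds if the chase emptied the domain or D2 holds at the root."""
    db = chase(d1)
    return db is None or holds(db, 0, d2)
```

Under these assumptions, `included(("and", "EMP", ("all", "boss", "MGR")), ("all", "boss", "MGR"))` holds, while `included("EMP", ("all", "boss", "MGR"))` does not; `EMP`, `MGR` and `boss` are made-up names.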
The simple chase
n : {D1 ⊓ D2} ∪ L   ⇒   n : {D1, D2} ∪ L
n1 : {∀A.D} ∪ L   ⇒   n1 : L --A--> n2 : {D}
n1 : {∃R.D} ∪ L   ⇒   n1 : L --R--> n2 : {D}
The simple chase (cont’d)
n1 : {∀R.D} ∪ L1 --R--> n2 : L2   ⇒   n1 : L1 --R--> n2 : {D} ∪ L2
n : {A1.A2.….Ar = B1.B2.….Bs} ∪ L   ⇒   n : L --A1--> u1 : ∅ --A2--> … --Ar--> ur : ∅
                                         n : L --B1--> v1 : ∅ --B2--> … --Bs--> vs : ∅
                                         (ur and vs joined by an equality arc)
The simple chase (cont’d)
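n : {A1.A2.….Ar ≠ B1.B2.….Bs} ∪ L   ⇒   n : L --A1--> u1 : ∅ --A2--> … --Ar--> ur : ∅
                                         n : L --B1--> v1 : ∅ --B2--> … --Bs--> vs : ∅
                                         (ur and vs joined by an inequality arc)
[Figure: rewrite rule over nodes w : L, u : L1 and v : L2, with two arcs labeled A]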
The simple chase (cont’d)
n1 : L1   n2 : L2   ⇒   n1 : L1 ∪ L2   n2 : L1 ∪ L2
[Figure: rewrite rule over nodes n1 : L1, n2 : L2 and n3 : L3]
[Figure: rewrite rule over nodes u : L1, v : L3, w : L2 and x : L4, with two arcs labeled A]
The simple chase (cont’d)
[Figure: rewrite rule over nodes w : L, u : L1 and v : L2, with two arcs labeled A; the result relabels w as w : {⊥}]
[Figure: rewrite rule over nodes u : L1, v : L3, w : L2 and x : L4, with two arcs labeled A]
The simple chase (cont’d)
n : {⊥} ∪ L
or
n : {C, ¬C} ∪ L
or
[Figure: nodes m : L1 and n : L2]
⇒   (remove all nodes and incident arcs)
Evaluating agreements and disagreements
Note that agreements and disagreements can navigate missing attribute values. In such cases, assume a forced semantics. In particular, a node n satisfies an agreement iff the agreement has the form
Pf1.Pf = Pf2.Pf
where (Pf1)^I(n) and (Pf2)^I(n) are defined and lead to nodes connected by an equality arc; n satisfies a disagreement iff it has the form
Pf1 ≠ Pf2
where (Pf1)^I(n) and (Pf2)^I(n) are defined and lead to nodes connected by an inequality arc.
Example
Observation: The chase decision procedure for CLASSIC can be implemented in O(n log n) time, where n is the length of the component descriptions.
select e from EMP as e where e = e.b.b.b and e = e.b.b.b.b.b
(from (EMP as x), (from (x = x.b.b.b), (x = x.b.b.b.b.b)))
≡ EMP ⊓ (id = b.b.b) ⊓ (id = b.b.b.b.b) as x
EMP ⊓ (id = b.b.b) ⊓ (id = b.b.b.b.b)
≡ EMP ⊓ (id = id.b)
EMP ⊓ (id = b) as x
select e from EMP as e where e = e.b
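The simplification in this example rests on the fact that, for a total function b, the conditions b³(e) = e and b⁵(e) = e together hold exactly when b(e) = e: from the two agreements, b⁶(e) = b³(b³(e)) = e and also b⁶(e) = b(b⁵(e)) = b(e). As a sanity check rather than a proof, the sketch below enumerates every total function on a small domain; the names `iterate` and `check` are hypothetical.

```python
from itertools import product

def iterate(b, e, k):
    """Apply the function b (a dict) to e exactly k times."""
    for _ in range(k):
        e = b[e]
    return e

def check(size):
    """Compare both conditions for every total function b on {0, ..., size-1}."""
    domain = range(size)
    for images in product(domain, repeat=size):
        b = dict(zip(domain, images))
        for e in domain:
            long_form = iterate(b, e, 3) == e and iterate(b, e, 5) == e
            short_form = b[e] == e
            if long_form != short_form:
                return False
    return True
```

Calling `check(4)` covers all 256 functions on a four-element domain.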
The ALC family of DLs
                                            (syntax)      (semantics)
(primitive concept)                   D ::= C             (C)^I
(universal concept)                       | ⊤             Δ (the domain of I)
(bottom concept)                          | ⊥             ∅
(atomic negation)                         | ¬C            Δ − (C)^I
(intersection)                            | D1 ⊓ D2       (D1)^I ∩ (D2)^I
(role value restriction)                  | ∀R.D          {e1 : ∀(e1, e2) ∈ (R)^I : e2 ∈ (D)^I}
(limited existential quantification)      | ∃R.⊤          {e1 : ∃e2 : (e1, e2) ∈ (R)^I}
(union)                                   | D1 ⊔ D2       (D1)^I ∪ (D2)^I
(full existential quantification)         | ∃R.D          {e1 : ∃e2 : (e1, e2) ∈ (R)^I ∧ e2 ∈ (D)^I}
(quantified number restriction)           | (≥ n R)       {e1 : |{e2 : (e1, e2) ∈ (R)^I}| ≥ n}
(quantified number restriction)           | (≤ n R)       {e1 : n ≥ |{e2 : (e1, e2) ∈ (R)^I}|}
(full negation)                           | ¬D            Δ − (D)^I
The ALC family of DLs (cont’d)
                 FL0   FL–   AL    ALN
D ::= C           ✓     ✓     ✓     ✓
    | ⊤                 ✓     ✓     ✓
    | ⊥                 ✓     ✓     ✓
    | ¬C                      ✓     ✓
    | D1 ⊓ D2     ✓     ✓     ✓     ✓
    | ∀R.D        ✓     ✓     ✓     ✓
    | ∃R.⊤              ✓     ✓     ✓
    | D1 ⊔ D2
    | ∃R.D
    | (≥ n R)                       ✓
    | (≤ n R)                       ✓
    | ¬D
The ALC family of DLs (cont’d)
                 ALU   ALE   ALUE  ALC   ALCN
D ::= C           ✓     ✓     ✓     ✓     ✓
    | ⊤           ✓     ✓     ✓     ✓     ✓
    | ⊥           ✓     ✓     ✓     ✓     ✓
    | ¬C          ✓     ✓     ✓     ✓     ✓
    | D1 ⊓ D2     ✓     ✓     ✓     ✓     ✓
    | ∀R.D        ✓     ✓     ✓     ✓     ✓
    | ∃R.⊤        ✓     ✓     ✓     ✓     ✓
    | D1 ⊔ D2     ✓           ✓     ∘     ✓
    | ∃R.D              ✓     ✓     ∘     ✓
    | (≥ n R)                             ✓
    | (≤ n R)                             ✓
    | ¬D                      ∘     ✓     ✓
Some complexity results
Theorem: The concept inclusion problems for ALC and ALCN are PSPACE-complete.
A consistency problem for a given set of concepts is to determine if there exists a database that interprets a given member of the set as nonempty.
Observation: The consistency problem for ALC (resp. ALCN) coincides with the concept inclusion problem for ALC (resp. ALCN). In particular,
D1 ⊑ D2
is an axiom iff the concept (D1 ⊓ ¬D2)
is not consistent.
Testing consistency in ALC
Theorem: The following procedure decides if a given concept D in ALC is consistent.
1. Create a singleton set S1 = {I} of partial databases in which I consists of a single individual e in concept D. Perform a union generalized chase of S1 to obtain a set of partial databases S2 = {I1, … , In}.
2. Return true if the domain of any database in S2 is nonempty; otherwise return false.
Union generalized chase
Repeatedly do the following to a given set of partial databases S until no changes occur.
1. Apply the simple chase augmented with the negation rule to a member of S.
2. If S contains a partial database I that in turn contains a node n with the form on the left below, then replace I with two partial databases I1 and I2 in S in which the labeling of node n is revised to the forms on the right below.
e : {D1 ⊔ D2} ∪ L     ⇒     e : {D1} ∪ L          e : {D2} ∪ L
(old node n in I)           (new node n in I1)    (new node n in I2)
The negation rule
Exhaustively apply the following rewrites to the concept labeling for any given node:†
¬⊤ ⇒ ⊥
¬⊥ ⇒ ⊤
¬¬D ⇒ D
¬(D1 ⊓ D2) ⇒ (¬D1) ⊔ (¬D2)
¬∀A.D ⇒ ∀A.¬D
¬∀R.D ⇒ ∃R.¬D
¬∃R.D ⇒ ∀R.¬D
¬(D1 ⊔ D2) ⇒ (¬D1) ⊓ (¬D2)
†Obtains negation normal form for concept descriptions.
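Combining the negation rule with the union generalized chase gives a consistency test for ALC concepts. The sketch below is a recursive reformulation, not the set-of-partial-databases procedure from the slides: it assumes roles only (no attributes), explores one branch per union choice, and creates one successor per existential. The tuple encoding and the names `nnf`, `satisfiable`, `consistent` and `included` are hypothetical.

```python
def nnf(d, neg=False):
    """Negation normal form: push negation inward, following the rewrites of the negation rule."""
    if d == "TOP":
        return "BOT" if neg else "TOP"
    if d == "BOT":
        return "TOP" if neg else "BOT"
    if isinstance(d, str):                         # primitive concept
        return ("not", d) if neg else d
    op = d[0]
    if op == "not":
        return nnf(d[1], not neg)
    if op in ("and", "or"):
        flipped = "or" if op == "and" else "and"
        return (flipped if neg else op, nnf(d[1], neg), nnf(d[2], neg))
    if op in ("all", "some"):                      # ("all", R, D) / ("some", R, D)
        flipped = "some" if op == "all" else "all"
        return (flipped if neg else op, d[1], nnf(d[2], neg))
    raise ValueError(f"unknown constructor: {op!r}")

def satisfiable(label):
    """Decide whether one individual can belong to every NNF concept in label."""
    label = set(label)
    todo = list(label)
    while todo:                                    # intersection rule
        d = todo.pop()
        if isinstance(d, tuple) and d[0] == "and":
            for part in d[1:]:
                if part not in label:
                    label.add(part)
                    todo.append(part)
    if "BOT" in label:                             # clash on bottom
        return False
    if any(("not", c) in label for c in label if isinstance(c, str)):
        return False                               # clash on C and ¬C
    for d in label:                                # union rule: split into two alternatives
        if isinstance(d, tuple) and d[0] == "or" and d[1] not in label and d[2] not in label:
            rest = label - {d}
            return satisfiable(rest | {d[1]}) or satisfiable(rest | {d[2]})
    for d in label:                                # existential and role value rules
        if isinstance(d, tuple) and d[0] == "some":
            succ = {d[2]} | {e[2] for e in label
                             if isinstance(e, tuple) and e[0] == "all" and e[1] == d[1]}
            if not satisfiable(succ):
                return False
    return True

def consistent(d):
    return satisfiable({nnf(d)})

def included(d1, d2):
    """D1 ⊑ D2 is an axiom iff D1 ⊓ ¬D2 is not consistent."""
    return not consistent(("and", d1, ("not", d2)))
```

For example, `included(("some", "R", ("and", "A", "B")), ("some", "R", "A"))` reduces to the inconsistency of ∃R.(A ⊓ B) ⊓ ∀R.¬A.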
A general membership problem
A database schema T that consists of concept dependencies in which no primitive concept occurs more than once on the left-hand-side of a concept definition is called a terminology.
The membership problem for a DL dialect is to determine, given a set {C1, … , Cn, C} of concept dependencies in the DL, if {C1, … , Cn} ⊨ C; that is, if every database I that models each Ci also models C.
Theorem: The membership problem for CLASSIC is undecidable.
Theorem: The membership problem for ALCN is DEXPTIME-complete.
Varieties of terminologies
A terminology T with only concept definitions is definitional.
For each C1 ≡ D occurring in a terminology T and each primitive concept C2 occurring in D, C1 has a direct use of C2. The use relation is the transitive closure of direct use.
T is cyclic iff there exists an atomic concept in T that has a use of itself.
T is acyclic iff it is definitional and is not cyclic.
An acyclic terminology in ALC
WOMAN ≡ PERSON ⊓ FEMALE
MAN ≡ PERSON ⊓ ¬WOMAN
MOTHER ≡ WOMAN ⊓ ∃hasChild.PERSON
FATHER ≡ MAN ⊓ ∃hasChild.PERSON
PARENT ≡ FATHER ⊔ MOTHER
GRANDMOTHER ≡ MOTHER ⊓ ∃hasChild.PARENT
MOTHERWITHMANYCHILDREN ≡ MOTHER ⊓ ≥ 3 hasChild
MOTHERWITHOUTDAUGHTER ≡ MOTHER ⊓ ∀hasChild.¬WOMAN
WIFE ≡ WOMAN ⊓ ∃hasHusband.MAN
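That this terminology is acyclic can be checked mechanically by computing the use relation as a transitive closure of direct use. The sketch below is illustrative only: the tuple encoding of the definitions, the `("atleast", n, R)` form for the number restriction, and the names `names`, `uses` and `is_cyclic` are assumptions, and the `CYCLIC` terminology is a made-up counterexample that does not come from the slides.

```python
def names(d):
    """Concept names occurring in a definition body (role names are ignored)."""
    if isinstance(d, str):
        return set() if d in ("TOP", "BOT") else {d}
    op = d[0]
    if op == "not":
        return names(d[1])
    if op in ("and", "or"):
        return names(d[1]) | names(d[2])
    if op in ("all", "some"):                  # ("all", R, D) / ("some", R, D)
        return names(d[2])
    if op == "atleast":                        # ("atleast", n, R) mentions no concept
        return set()
    raise ValueError(f"unknown constructor: {op!r}")

def uses(terminology):
    """Transitive closure of direct use, for a dict {defined name: definition body}."""
    closure = {c: names(d) for c, d in terminology.items()}
    changed = True
    while changed:
        changed = False
        for used in closure.values():
            extra = set()
            for u in used:
                extra |= closure.get(u, set())
            if not extra <= used:
                used |= extra
                changed = True
    return closure

def is_cyclic(terminology):
    return any(c in used for c, used in uses(terminology).items())

TERMINOLOGY = {
    "WOMAN": ("and", "PERSON", "FEMALE"),
    "MAN": ("and", "PERSON", ("not", "WOMAN")),
    "MOTHER": ("and", "WOMAN", ("some", "hasChild", "PERSON")),
    "FATHER": ("and", "MAN", ("some", "hasChild", "PERSON")),
    "PARENT": ("or", "FATHER", "MOTHER"),
    "GRANDMOTHER": ("and", "MOTHER", ("some", "hasChild", "PARENT")),
    "MOTHERWITHMANYCHILDREN": ("and", "MOTHER", ("atleast", 3, "hasChild")),
    "MOTHERWITHOUTDAUGHTER": ("and", "MOTHER", ("all", "hasChild", ("not", "WOMAN"))),
    "WIFE": ("and", "WOMAN", ("some", "hasHusband", "MAN")),
}

# Hypothetical cyclic terminology: HUMAN has a use of itself.
CYCLIC = {"HUMAN": ("and", "MAMMAL", ("all", "hasParent", "HUMAN"))}
```

Because a Python dictionary allows each defined name only once, the encoding itself enforces the requirement that no primitive concept occurs more than once on the left-hand side.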
More complexity results
Theorem: The membership problem for FL0 with acyclic terminologies is CoNP-complete.
Theorem: The membership problem for ALC with acyclic terminologies is PSPACE-complete.
The DL ALCF extends ALC with agreements and disagreements of path functions.
Theorem: The concept inclusion problem for ALCF is PSPACE-complete.
Theorem: The membership problem for ALCF with acyclic terminologies is NEXPTIME-complete.
Blocking
Theorem: The membership problem for ALCN is DEXPTIME-complete.
The membership problem for ALCN can be solved by a refinement of the consistency checking algorithm for concepts in ALC. There are two important tricks to note.
1. Each concept dependency occurring in the terminology, e.g. D1 ⊑ D2, is internalized to each new node by adding a corresponding concept, e.g. (¬D1 ⊔ D2), to the node’s label.
2. To ensure termination, no chasing is performed on blocked nodes. A node is blocked if its concepts are included in an older node.
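The two tricks can be illustrated in isolation. The sketch below covers only the bookkeeping, not the full ALCN algorithm; it assumes nodes are numbered in creation order and labels are sets of concepts in a tuple encoding, and the names `internalize`, `new_label` and `is_blocked` are hypothetical.

```python
def internalize(inclusions):
    """Trick 1: turn each dependency D1 ⊑ D2 into the concept ¬D1 ⊔ D2."""
    return [("or", ("not", d1), d2) for (d1, d2) in inclusions]

def new_label(initial, internalized):
    """Label of a freshly created node: its own concepts plus every internalized dependency."""
    return frozenset(initial) | frozenset(internalized)

def is_blocked(node, labels):
    """Trick 2: a node is blocked if its label is included in the label of an older node."""
    return any(labels[node] <= labels[older] for older in range(node))

# Hypothetical terminology {A ⊑ ∃R.A}: every A needs an R-successor that is again an A.
tbox = internalize([("A", ("some", "R", "A"))])
labels = [
    new_label({"A", ("some", "R", "A")}, tbox),   # node 0 after chasing
    new_label({"A"}, tbox),                       # node 1, the R-successor of node 0
]
```

In this made-up example node 1 is blocked by node 0, so the chase stops instead of generating an infinite chain of R-successors.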
Language extensions
Role constructors
Role value maps
Uniqueness constraints