
Constraint preserving and lossless database transformations


Inform. Systems Vol. 9, No. 2, pp. 139-146, 1984. Printed in the U.S.A.

0306-4379/84 $3.00 + .00 Pergamon Press Ltd.

CONSTRAINT PRESERVING AND LOSSLESS DATABASE TRANSFORMATIONS

JOHN GRANT Department of Mathematics and Computer Science, Towson State University, Towson, MD 21204,

U.S.A.

(Received 18 January 1983; in revised form 21 June 1983)

Abstract: The dependency preserving and lossless join properties for relational database decomposition are generalized to the constraint preserving and lossless properties for mappings between database systems. The constraint preserving and lossless properties are investigated both for relational database normalization by decomposition and for relational database distribution by selection. The constraint preserving and lossless properties are also discussed for mappings between heterogeneous databases.

1. INTRODUCTION

In this paper we show that the dependency preserving and lossless join properties [7] for relational database decomposition can be generalized to many different kinds of database transformations. We call these generalized properties constraint preserving and lossless. Our framework for describing database mappings is interpretations between languages [3, 4]. Therefore, our definition can be used not only for relational databases but also for hierarchic and network databases via database logic [2].

The outline of this paper is as follows. In Section 2 we give definitions and terminology. In Section 3 we present our formal definitions for constraint preserving and lossless mappings, and show how they generalize the dependency preserving and lossless join properties. In Section 4 we discuss relational database normalization in more detail. In Section 5 we apply the definitions to database distribution by selection. We conclude the paper in Section 6 by illustrating how the constraint preserving and lossless properties apply to hierarchic and network databases.

2. PRELIMINARIES

In this section we review the definitions needed to use interpretations and state a fundamental result. We illustrate the definitions by an example.

A language in (many-sorted) first-order logic for a relational database schema contains nonlogical symbols for the relations of the schema, and may contain additional constant, function, and/or predicate symbols. (A language for a hierarchic or network schema is in database logic.) Usually a database mapping involves the modification of the schema. Given languages $L$ and $L'$ (representing two schemas), an interpretation $f: L \to L'$ is a language map which describes the change in the schema.

†Research supported by NSF Grant MCS 79-19418.

A structure $M$ for $L$ is a database which conforms to the schema for $L$. Given a sentence $s$ of $L$ and a structure $M$ for $L$, $s$ is either true in $M$ or false in $M$. A structure is specified by the set of atomic formulas true in it. An interpretation $f: L \to L'$ induces a map (also denoted by $f$) from structures for $L'$ to structures for $L$ in the following way: given $M'$, a structure for $L'$, we define $f(M')$, a structure for $L$, by setting $R(a)$ true in $f(M')$ iff $f(R(a))$ is true in $M'$ (for every relation $R$ and constant sequence $a$). Note that the induced map $f$ for structures goes in the opposite direction to the map $f$ for languages. The following theorem connects the two types of interpretations.

Theorem 2.1. For every sentence $s$ of $L$ and structure $M'$ for $L'$, $s$ is true in $f(M')$ iff $f(s)$ is true in $M'$.

We define a system $T = (L, S)$ as a pair consisting of a language $L$ and a set of sentences $S$ of $L$; $S$ represents the constraints. Given $T = (L, S)$ and $T' = (L', S')$, an interpretation $f: L \to L'$ is a transformation if $S'$ logically implies $f(S)$. If every element of $S$ is true in a structure $M$ for $L$ then $M$ is an instance of $T$.

Example 2.2. Let $L_1$ contain the relations EMPINFO(EMP, CHILD, SALARY, YEAR) and DIVINFO(DIVISION, DEPT, LOCATION) and let $L_2$ contain the relations EMPFAM(EMP, CHILD), EMPSAL(EMP, SALARY, YEAR), SALESDIV(DIVISION, DEPT, LOCATION), SERVICEDIV(DIVISION, DEPT, LOCATION), and OTHERDIV(DIVISION, DEPT, LOCATION).

In Fig. 1 we show the tables of a structure $M_1$ for $L_1$; in Fig. 2 we give the tables of a structure $M_2$ for $L_2$.


We define $f_1: L_1 \to L_2$ by setting

$$f_1(\text{EMPINFO(EMP, CHILD, SALARY, YEAR)}) = \text{EMPFAM(EMP, CHILD)} \land \text{EMPSAL(EMP, SALARY, YEAR)}$$

and

$$\begin{aligned}
f_1(\text{DIVINFO(DIVISION, DEPT, LOCATION)}) = {} & (\text{SALESDIV(DIVISION, DEPT, LOCATION)} \land \text{DIVISION} = \text{SALES}) \\
& \lor (\text{SERVICEDIV(DIVISION, DEPT, LOCATION)} \land \text{DIVISION} = \text{SERVICE}) \\
& \lor (\text{OTHERDIV(DIVISION, DEPT, LOCATION)} \land \text{DIVISION} \ne \text{SALES} \land \text{DIVISION} \ne \text{SERVICE}).
\end{aligned}$$

The EMPINFO-table of $M_1$ is the join of the EMPFAM and EMPSAL-tables of $M_2$, while the DIVINFO-table of $M_1$ is the union of the SALESDIV, SERVICEDIV and OTHERDIV-tables of $M_2$ (subject to the proper values for the DIVISION attribute in these tables). Thus $M_1$ is the structure induced by $f_1$ from $M_2$, that is, $M_1 = f_1(M_2)$.
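The induced structure can be computed directly from the definition of the interpretation. The following is a minimal Python sketch, assuming relations are represented as sets of tuples; the function names `induced_empinfo` and `induced_divinfo` are hypothetical, and the rows are a small illustrative sample rather than the full tables of Fig. 2.

```python
def induced_empinfo(empfam, empsal):
    """EMPINFO(e, c, s, y) holds iff EMPFAM(e, c) and EMPSAL(e, s, y) both hold (join on EMP)."""
    return {(e, c, s, y) for (e, c) in empfam for (e2, s, y) in empsal if e == e2}


def induced_divinfo(salesdiv, servicediv, otherdiv):
    """Union of the three tables, each guarded by its condition on DIVISION."""
    sales = {t for t in salesdiv if t[0] == "SALES"}
    service = {t for t in servicediv if t[0] == "SERVICE"}
    other = {t for t in otherdiv if t[0] not in ("SALES", "SERVICE")}
    return sales | service | other


# Illustrative sample rows (assumed for this sketch).
empfam = {("NANCY JONES", "LINDA"), ("MARY TAYLOR", "FRANCIS"), ("MARY TAYLOR", "JOHN")}
empsal = {("NANCY JONES", 22500, 1981), ("NANCY JONES", 24050, 1982),
          ("MARY TAYLOR", 20400, 1982)}
# The second SALESDIV row fails the guard DIVISION = SALES and is not carried over.
salesdiv = {("SALES", "TV", 5), ("SERVICE", "AUTO", 1)}
servicediv = {("SERVICE", "TV", 6)}
otherdiv = {("DP", "PROGRAMMING", 2)}

empinfo = induced_empinfo(empfam, empsal)
divinfo = induced_divinfo(salesdiv, servicediv, otherdiv)
```

The guards in `induced_divinfo` mirror the conjuncts DIVISION = SALES, DIVISION = SERVICE and their negations in the definition of $f_1$; rows that violate the guard of their table do not appear in the induced DIVINFO-table.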

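Let $T_1 = (L_1, S_1)$ and $T_2 = (L_2, S_2)$ where $S_1 = \{\text{EMPINFO}: (\text{EMP}, \text{YEAR}) \to \text{SALARY},\ \text{EMP} \twoheadrightarrow \text{CHILD}\}$ and $S_2 = \{\text{EMPSAL}: (\text{EMP}, \text{YEAR}) \to \text{SALARY};\ \text{SALESDIV}: \text{DEPT} \to \text{LOCATION};\ \text{SERVICEDIV}: \text{DEPT} \to \text{LOCATION}\}$.

Let $s_1 = \text{EMPINFO}: (\text{EMP}, \text{YEAR}) \to \text{SALARY}$. Writing $s_1$ as a formula we get

$$\begin{aligned}
s_1 = {} & \forall \text{EMP}_1 \forall \text{CHILD}_1 \forall \text{CHILD}_2 \forall \text{SALARY}_1 \forall \text{SALARY}_2 \forall \text{YEAR}_1 \\
& ((\text{EMPINFO}(\text{EMP}_1, \text{CHILD}_1, \text{SALARY}_1, \text{YEAR}_1) \land \text{EMPINFO}(\text{EMP}_1, \text{CHILD}_2, \text{SALARY}_2, \text{YEAR}_1)) \to \text{SALARY}_1 = \text{SALARY}_2).
\end{aligned}$$

Then

$$\begin{aligned}
f_1(s_1) = {} & \forall \text{EMP}_1 \forall \text{CHILD}_1 \forall \text{CHILD}_2 \forall \text{SALARY}_1 \forall \text{SALARY}_2 \forall \text{YEAR}_1 \\
& ((\text{EMPFAM}(\text{EMP}_1, \text{CHILD}_1) \land \text{EMPSAL}(\text{EMP}_1, \text{SALARY}_1, \text{YEAR}_1) \\
& \quad \land \text{EMPFAM}(\text{EMP}_1, \text{CHILD}_2) \land \text{EMPSAL}(\text{EMP}_1, \text{SALARY}_2, \text{YEAR}_1)) \to \text{SALARY}_1 = \text{SALARY}_2).
\end{aligned}$$

Clearly $\text{EMPSAL}: (\text{EMP}, \text{YEAR}) \to \text{SALARY}$ logically implies $f_1(s_1)$. In fact, $f_1$ is a transformation from $T_1$ to $T_2$. We also note that $M_1$ is an instance of $T_1$ and $M_2$ is an instance of $T_2$.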
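Whether a structure is an instance of a system whose constraints are fd statements comes down to checking each fd against the corresponding table. A minimal Python sketch follows, assuming rows are dictionaries keyed by attribute name; `satisfies_fd` is a hypothetical helper and the rows are an illustrative sample.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True iff no two rows agree on the attributes in lhs and differ on those in rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Illustrative EMPSAL rows (assumed for this sketch).
empsal = [
    {"EMP": "NANCY JONES", "SALARY": 22500, "YEAR": 1981},
    {"EMP": "NANCY JONES", "SALARY": 24050, "YEAR": 1982},
    {"EMP": "MARY TAYLOR", "SALARY": 20400, "YEAR": 1982},
]
fd_holds = satisfies_fd(empsal, ["EMP", "YEAR"], ["SALARY"])

# Adding a second 1982 salary for the same employee violates (EMP, YEAR) -> SALARY.
conflicting = empsal + [{"EMP": "MARY TAYLOR", "SALARY": 21000, "YEAR": 1982}]
fd_after_conflict = satisfies_fd(conflicting, ["EMP", "YEAR"], ["SALARY"])
```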

EMPINFO

EMP            CHILD      SALARY   YEAR
THOMAS KELLY   SUSAN      31200    1982
NANCY JONES    LINDA      22500    1981
NANCY JONES    LINDA      24050    1982
MARY TAYLOR    FRANCIS    20400    1982
MARY TAYLOR    MICHAEL    20400    1982
MARY TAYLOR    JOHN       20400    1982

DIVINFO

Fig. 1. The tables for $M_1$.

EMPFAM

EMP            CHILD
JOHN SMITH     SUSAN
ROBERT ADAMS   HARRY
ROBERT ADAMS   DAVID
THOMAS KELLY   SUSAN
NANCY JONES    LINDA
MARY TAYLOR    FRANCIS
MARY TAYLOR    MICHAEL
MARY TAYLOR    JOHN

EMPSAL

SALESDIV

SERVICEDIV

OTHERDIV

Fig. 2. The tables for $M_2$.

3. THE CONSTRAINT PRESERVING AND LOSSLESS PROPERTIES

In this section we generalize the dependency preserving and lossless join properties for relational database decomposition ([7], Section 7.3). The dependency preserving and lossless join properties are studied as adequacy properties for a good decomposition in [5] along with other conditions. We prove that the constraint preserving and lossless properties that we define reduce to the dependency preserving and lossless join properties for the case of relational databases with functional dependency constraints.

In the application presented in [4] one system $T_E$ is an external view, the other system $T_C$ is the conceptual view, and $f: L_E \to L_C$ is the external-to-conceptual mapping. Since an external view may comprise only a small portion of the conceptual view, there need not be an inverse interpretation $g: L_C \to L_E$. However, in many cases, there is a natural inverse mapping. In the case of relational database decomposition by projection, the inverse mapping is the join. We will also investigate database distribution by selection where the inverse mapping is the union. From now on we deal only with cases where there is a pair $(f, g)$ of interpretations, $f: L \to L'$ and $g: L' \to L$.

We now generalize the dependency preserving and lossless join properties to interpretations between systems. We say that $(f, g)$ is constraint preserving if both $f$ and $g$ are transformations. We call $(f, g)$ lossless if for every instance $M$ of $T$, $f(g(M)) = M$. We note that the constraint preserving property is symmetric with respect to $L$ and $L'$, while the lossless property is not.
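For decomposition by projection, the condition $f(g(M)) = M$ can be tested on a given instance by projecting and then joining. The following Python sketch assumes a single relation given as a set of tuples over an attribute list and a decomposition into two attribute lists that together cover the schema; all function names are hypothetical. It checks one instance only, whereas the lossless property quantifies over every instance of $T$.

```python
from itertools import product


def project(rows, schema, attrs):
    """Projection of rows (tuples over schema) onto attrs."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(r[i] for i in idx) for r in rows}


def natural_join(rows1, schema1, rows2, schema2):
    """Natural join of two relations; returns the joined rows and their schema."""
    common = [a for a in schema1 if a in schema2]
    extra = [a for a in schema2 if a not in schema1]
    joined = set()
    for r1, r2 in product(rows1, rows2):
        if all(r1[schema1.index(a)] == r2[schema2.index(a)] for a in common):
            joined.add(r1 + tuple(r2[schema2.index(a)] for a in extra))
    return joined, schema1 + extra


def roundtrip_is_identity(rows, schema, x1, x2):
    """True iff projecting onto x1 and x2 (the role of g) and joining (the role of f) gives rows back."""
    joined, joined_schema = natural_join(project(rows, schema, x1), x1,
                                         project(rows, schema, x2), x2)
    idx = [joined_schema.index(a) for a in schema]  # restore the original column order
    return {tuple(r[i] for i in idx) for r in joined} == set(rows)


schema = ["A", "B", "C"]
lossy = roundtrip_is_identity({(1, 1, 1), (2, 1, 2)}, schema, ["A", "B"], ["B", "C"])
exact = roundtrip_is_identity({(1, 1, 1), (2, 2, 2)}, schema, ["A", "B"], ["B", "C"])
```

In the first instance both rows share the value of B, so the join produces two spurious rows and the round trip does not return the original table.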

Example 3.1. We continue 2.2 by defining $g_1: L_2 \to L_1$ as follows:

$$\begin{aligned}
g_1(\text{EMPFAM(EMP, CHILD)}) &= \exists \text{SALARY}\, \exists \text{YEAR}\ \text{EMPINFO(EMP, CHILD, SALARY, YEAR)}, \\
g_1(\text{EMPSAL(EMP, SALARY, YEAR)}) &= \exists \text{CHILD}\ \text{EMPINFO(EMP, CHILD, SALARY, YEAR)}, \\
g_1(\text{SALESDIV(DIVISION, DEPT, LOCATION)}) &= \text{DIVINFO(DIVISION, DEPT, LOCATION)} \land \text{DIVISION} = \text{SALES}, \\
g_1(\text{SERVICEDIV(DIVISION, DEPT, LOCATION)}) &= \text{DIVINFO(DIVISION, DEPT, LOCATION)} \land \text{DIVISION} = \text{SERVICE}, \\
g_1(\text{OTHERDIV(DIVISION, DEPT, LOCATION)}) &= \text{DIVINFO(DIVISION, DEPT, LOCATION)} \land \text{DIVISION} \ne \text{SALES} \land \text{DIVISION} \ne \text{SERVICE}.
\end{aligned}$$


Since $g_1$ is not a transformation, $(f_1, g_1)$ is not constraint preserving. However, $(f_1, g_1)$ does possess the lossless property.

Next we show how the constraint preserving and lossless properties, which are defined for interpretations between systems, reduce to the dependency preserving and lossless join properties for relational database decomposition. We use the following general setup: $L$ contains the relation symbol $R$ whose attribute set is $Z$; $L'$ contains the relation symbols $R_1, \dots, R_k$ with attribute sets $X_1, \dots, X_k$, where $X_i \ne X_j$ for $1 \le i \ne j \le k$ and $Z = \bigcup_{i=1}^{k} X_i$. Also, $f(R(z)) = R_1(x_1) \land \dots \land R_k(x_k)$, where $x_i$ is the projection of $z$ on $X_i$, and $g(R_i(x_i)) = \exists \bar{x}_i\, R(z)$ for $1 \le i \le k$, where $\bar{x}_i$ is a sequence of variables complementary to $x_i$ and $z$ contains the variables in $x_i$ and $\bar{x}_i$.

Intuitively, $R$ is the join of $R_1, \dots, R_k$ and each $R_i$ is the projection of $R$ on $X_i$.

In this section we assume that all the constraints are fds and, without loss of generality, restrict fds to a single attribute on the right side of the arrow. The idea of dependency preserving is that the fds projected on the $R_i$ should logically imply the original fds. The idea of lossless join is that if an instance $M$ is decomposed by $g$ and then joined by $f$, the original $M$ is recreated. Now we formalize these notions in our context.

We say that $(f, g)$ is lossless join if for every instance $M$ of $T$, $f(g(M)) = M$. Next, let $s = R: X \to A$ be an fd statement. If $X \cup \{A\} \subseteq X_i$ for some $i$, $1 \le i \le k$, then we call $s$ projectible and write $s(R_i)$ for the projected fd statement, $s(R_i) = R_i: X \to A$. Let $S$ be a set of fd statements on $R$. We write $P(S)$ for the projectible subset of $S$, and $S^*$ for the set of fd statements logically implied by $S$. We say that $(f, g)$ is dependency preserving if $P(S^*)$ logically implies $S$. Finally, to define $T'$ we set $S'$ to the set of fd statements in $L'$ obtained from $S^*$ by projection on the $R_i$, $1 \le i \le k$. Our main result is given in the following theorem.

Theorem 3.2. Assume that $S$ may contain only fd statements.
(a) $(f, g)$ is constraint preserving iff $(f, g)$ is dependency preserving.
(b) $(f, g)$ is lossless iff $(f, g)$ is lossless join.

Proof. (a) ($\Rightarrow$) We prove the contrapositive. So assume that $P(S^*)$ does not logically imply $S$. This means that there is a structure $M$ for $L$ such that $P(S^*)$ is true in $M$ but some $s \in S$ is false in $M$. We complete the proof by showing that $S'$ is true in $g(M)$ but $f(s)$ is false in $g(M)$. Let $s' \in S'$ where $s' = s^*(R_i)$ for some $i$, $1 \le i \le k$, $s^* \in S^*$ and $s^* = R: X \to A$. Since $s^*$ is true in $M$, the $R$-table of $M$ does not contain two rows that are identical on $X$ and which differ on $A$. Since the $R_i$-table of $g(M)$ is the projection of the $R$-table of $M$, $g(M)$ also cannot contain two rows that are identical on $X$ and which differ on $A$. Thus $s'$, and hence $S'$, is true in $g(M)$.

Now consider $s$ again and let $s = R: Y \to B$. The $R$-table of $M$ must contain two rows which are identical on $Y$ and which differ on $B$. Then $f(s)$ is an interrelational constraint made false by the projections of these two rows. So $f(s)$ is false in $g(M)$.

($\Leftarrow$) Again we prove the contrapositive. So assume that either $S'$ does not logically imply $f(S)$ or $S$ does not logically imply $g(S')$. But we can show that $S$ logically implies $g(S')$. For if $s' \in S'$ and $s' = s^*(R_i)$ then $g(s') = s^* \in S^*$. So we may assume that $S'$ does not logically imply $f(S)$. This means that there is a structure $M'$ for $L'$ such that $S'$ is true in $M'$ but for some $s \in S$, $f(s)$ is false in $M'$. We complete the proof by showing that $P(S^*)$ is true in $f(M')$ but $s$ is false in $f(M')$. The latter is immediate from 2.1. Now let $s^* \in P(S^*)$. Then $s^*(R_i) = R_i: X \to A$ for some $i$, $1 \le i \le k$, is in $S'$. Therefore the $R_i$-table of $M'$ does not contain two rows that are identical on $X$ and which differ on $A$. Since the $R$-table of $f(M')$ is the join of the tables of $M'$, it also cannot contain two rows that are identical on $X$ and which differ on $A$. So $s^*$ is true in $f(M')$.
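By part (a), constraint preservation for fd statements can be decided with any test for dependency preservation. The sketch below uses the standard textbook closure-based test rather than a procedure given in this paper: for each fd $X \to A$ in $S$ it repeatedly adds to a working set the attributes derivable inside a single component. The function names are hypothetical, fds are represented as pairs of frozensets, and the sample schema is the relation R(CITY, ST, ZIP) of Example 4.4(a) with an assumed decomposition into (CITY, ZIP) and (ST, ZIP).

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd_is_preserved(fd, fds, components):
    """True iff fd follows from the fds of S* that are projectible onto some component."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for xi in components:
            # Attributes of xi determined by the part of z lying inside xi.
            gained = closure(z & xi, fds) & xi
            if not gained <= z:
                z |= gained
                changed = True
    return rhs <= z


def dependency_preserving(fds, components):
    return all(fd_is_preserved(fd, fds, components) for fd in fds)


fds = [(frozenset({"CITY", "ST"}), frozenset({"ZIP"})),
       (frozenset({"ZIP"}), frozenset({"CITY"}))]
split = [frozenset({"CITY", "ZIP"}), frozenset({"ST", "ZIP"})]
whole = [frozenset({"CITY", "ST", "ZIP"})]

split_preserves = dependency_preserving(fds, split)
whole_preserves = dependency_preserving(fds, whole)
```

With the split components, (CITY, ST) $\to$ ZIP cannot be recovered because no component contains CITY and ST together, so the test fails; the trivial decomposition into the whole relation preserves everything.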

4. DATABASE NORMALIZATION

In this section we discuss database normalization by decomposition. We view normalization as a pair $(f, g)$, $f: L \to L'$, $g: L' \to L$, between two systems $T = (L, S)$ and $T' = (L', S')$ where $T'$ is in some normal form. We present a normal form which is based on keys and generalizes BCNF. First we allow only fd (functional dependency) statements and later we include ejd (embedded join dependency) statements as constraints.

We say that a set of fd statements forms a key declaration (for some $R'$) if it asserts exactly that a certain set of attributes forms a key (for $R'$). We say that $T = (L, S)$ is in KNF (key normal form) if there is a set of statements $K$ that forms a set of key declarations (for the relations of $L$) such that $K$ and $S$ are logically equivalent. KNF is similar to DK/NF [1]; however, KNF does not deal with domain declarations.

We use the setup from Section 3 with $L$ containing $R$, $L'$ containing $R_1, \dots, R_k$, and the $(f, g)$ given there. We say that $T = (L, S)$ is normalizable by decomposition if there are $T' = (L', S')$, $f: L \to L'$ and $g: L' \to L$ such that $T'$ is in KNF. We start by dealing with systems whose constraints are fd statements. First we define BCNF (Boyce-Codd normal form) in our context. The idea of BCNF is that all fd statements which are constraints are implied by the keys. So we say that $T$ is in BCNF if there is a set of fd statements $K$ that forms a set of key declarations for the relations of $L$ such that $S$ logically implies $K$ and every fd statement logically implied by $S$ is also logically implied by $K$. Our first result follows from the observation that if $S$ contains only fd statements then $T$ is in KNF iff $T$ is in BCNF, as well as from some standard results ([7], Section 7.4).
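For a single relation with fd constraints, the BCNF condition can be checked in its usual textbook form: the left side of every nontrivial fd in $S$ must be a superkey. The Python sketch below assumes that formulation (it is not stated in this form in the paper) and uses hypothetical function names; the sample schema is R(CITY, ST, ZIP) from Example 4.4(a).

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_bcnf(schema, fds):
    """True iff every fd is trivial or has a left side whose closure is the whole schema."""
    return all(rhs <= lhs or closure(lhs, fds) >= schema for lhs, rhs in fds)


schema = frozenset({"CITY", "ST", "ZIP"})
fds = [(frozenset({"CITY", "ST"}), frozenset({"ZIP"})),
       (frozenset({"ZIP"}), frozenset({"CITY"}))]

bcnf_with_zip_fd = is_bcnf(schema, fds)         # ZIP alone is not a key
bcnf_without_zip_fd = is_bcnf(schema, fds[:1])  # (CITY, ST) is a key
```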


Theorem 4.1. Assume that $S$ contains only fd statements.
(a) $T$ is normalizable by a lossless decomposition.
(b) (i)-(iv) are equivalent.
(i) $T$ is normalizable by a constraint preserving decomposition.
(ii) $T$ is normalizable by a constraint preserving lossless decomposition.
(iii) There is a dependency preserving decomposition of $T$ to BCNF.
(iv) There is a dependency preserving lossless join decomposition of $T$ to BCNF.

Next we extend our results to include ejd statements as constraints. The ejd $R: *[X_1, \dots, X_m]X$ expresses the constraint that if the attributes are projected to $\bigcup_{i=1}^{m} X_i$ (where $X = Z - \bigcup_{i=1}^{m} X_i$) then the decomposition to the relations $R_1, \dots, R_m$ with attribute sets $X_1, \dots, X_m$ is lossless join. A jd statement is a special case with $X = \emptyset$.

If $s = R: *[X_1, \dots, X_m]X$, then we set the projected statements $s(R_i) = R_i: *[X_1 \cap X_i, \dots, X_m \cap X_i](X \cap X_i)$ for $1 \le i \le k$. (We omit $X \cap X_i$ if it is empty; $s(R_i)$ is the empty formula if $X_i \subseteq X$.) Now suppose that $S$ may contain fd and ejd statements. We write $S^*$ for the fd and ejd statements logically implied by $S$. To define $T'$ we set $S'$ to the set of fd and ejd statements in $L'$ obtained from $S^*$ by projection on the $R_i$, $1 \le i \le k$. We also define statements corresponding to $s(R_i)$ in $R$ as $s(i) = R: *[X_1 \cap X_i, \dots, X_m \cap X_i](X \cup (Z - X_i))$. We write $P(S)$ for the union of the ejds and the projectible fds in $S$. We say that $(f, g)$ is dependency preserving if $P(S^*)$ logically implies $S$.

To extend 3.2 to ejd statements we define a special closure condition. We say that $S$ is ejd-closed w.r.t. $(f, g)$ if whenever $s \in S$ and $s'$ are ejd statements, then $s(i) \in S$ for each $i$, $1 \le i \le k$, and if $S$ logically implies $s'$ then $s' \in S$. For example, recall 2.2, where we indicate by $''$ the restriction of the languages, mappings and constraints to EMPINFO and EMPFAM, EMPSAL respectively. Then $S_1''$ is not ejd-closed w.r.t. $(f_1'', g_1'')$. However, $(S_1'')^*$ is ejd-closed w.r.t. $(f_1'', g_1'')$ because for $s \in (S_1'')^*$, each $s(i)$ is valid. Generally, ejd-closure holds in the presence of the closure of a single jd if the decomposition is done in accordance with that jd.

Theorem 4.2. Assume that $S$ may contain fd and ejd statements and is ejd-closed w.r.t. $(f, g)$.
(a) $(f, g)$ is constraint preserving iff $(f, g)$ is dependency preserving.
(b) $(f, g)$ is lossless iff $(f, g)$ is lossless join.

Proof. We indicate only the extensions to the proof of 3.2.

(a) ($\Rightarrow$) If $P(S^*)$ is true in $M$ while for some $s \in S$, $s$ is false in $M$, then $s$ must be an fd statement. Again we show that $S'$ is true in $g(M)$ but $f(s)$ is false in $g(M)$. It suffices to show that if $s' \in S'$ is an ejd statement then $s'$ is true in $g(M)$. So suppose that $s' = s^*(R_i)$ for some $s^* \in S^*$. Since $S$ is ejd-closed, $s^*(i) \in P(S^*)$. Hence, $s'$ is true in $g(M)$.

($\Leftarrow$) Again, since $S$ is ejd-closed and $g(s(R_i))$ is equivalent to $s(i)$, $S$ logically implies $g(S')$. Suppose that $S'$ is true in $M'$, while for some $s \in S$, $f(s)$ is false in $M'$. We need to show that $P(S^*)$ is true in $f(M')$. This follows, since if $s^* \in P(S^*)$ is an ejd statement, then the $R_i$-tables contain rows whose join is a row needed for the truth of $s^*$ in $f(M')$.

Next we extend 4.1 to include ejd statements. For this purpose we define a normal form, based on PJ/NF [1], which means that all fd and ejd statements are implied by key constraints. We say that $T = (L, S)$ is in EPJNF (embedded projection join normal form) if there is a set of fd statements $K$ that forms a set of key declarations for the relations of $L$ such that $S$ logically implies $K$ and every fd and ejd statement logically implied by $S$ is also logically implied by $K$. The following result extends 4.1 and shows the connections between EPJNF and KNF.

Theorem 4.3. Assume that $S$ may contain fd and ejd statements and is ejd-closed w.r.t. $(f, g)$.
(a) $T$ is normalizable by the constraint preserving decomposition $(f, g)$ iff $(f, g)$ forms a dependency preserving decomposition of $T$ to EPJNF.
(b) $T$ is normalizable by the lossless decomposition $(f, g)$ iff $(f, g)$ forms a lossless join decomposition of $T$ to EPJNF.

We end this section with the following example.

Example 4.4. (a) Let $T_4 = (L_4, S_4)$ where $L_4$ contains the relation symbol $R(\text{CITY}, \text{ST}, \text{ZIP})$ and $S_4 = \{R: (\text{CITY}, \text{ST}) \to \text{ZIP},\ \text{ZIP} \to \text{CITY}\}$ ([7], Example 7.10). $T_4$ has no dependency preserving decomposition into BCNF. Hence, by 4.1, $T_4$ is normalizable by a lossless decomposition but not by a constraint preserving decomposition.

(b) Let $T_5 = (L_5, S_5)$ where $L_5$ contains the relation symbol $R(A, B, C)$ and $S_5 = \{R: *[AB, AC],\ *[AB, BC]\}$. $T_5$ has a dependency preserving decomposition $(f_5, g_5)$ into EPJNF where $L_6$ contains $R_1(A, B)$ and $R_2(C)$ and $S_6 = \emptyset$. However, $T_5$ is not normalizable by any dependency preserving lossless join decomposition into EPJNF such that $S_5$ is ejd-closed w.r.t. the decomposition. Hence, by 4.3, $T_5$ is not normalizable by any constraint preserving lossless join decomposition such that $S_5$ is ejd-closed w.r.t. the decomposition. This shows that 4.1(b) does not carry over in its entirety to ejd statements.

5. DATABASE DISTRIBUTION

In this section we consider the distribution of a centralized relational database into components. This is sometimes called a horizontal decomposition [6]. We obtain necessary and sufficient conditions for the constraint preserving and lossless properties. We allow fd (functional dependency), mvd (multivalued dependency) and cd (conditional dependency) statements as constraints.


In order to explain our setup for distributions we need some definitions. Let $L$ contain the relation symbol $R$ with attributes $A_1, \dots, A_n$. We say that a sentence $d$ is a domain declaration (for $A_i$) if it has the form $\forall a_1 \dots \forall a_n (R(a_1, \dots, a_n) \to e(a_i))$, where $e(a_i)$ is a formula of $L$ with free variable $a_i$ which does not contain $R$ or any $a_j$, $j \ne i$. In general, $e(a_i)$ includes predicate symbols like $=$ and $<$ and constant symbols. For example, if $0$, $10$ and $<$ are in $L$ and appropriate for use on $A_i$, then $\forall a_1 \dots \forall a_n (R(a_1, \dots, a_n) \to (0 < a_i \land a_i < 10))$ is a domain declaration which states that the values for $A_i$ are between 0 and 10. A domain declaration is a domain dependency [1] expressible in $L$. We also need to combine domain declarations for various attributes. We say that $Q$ is a condition (declaration) if it is (equivalent to) a conjunction of domain declarations (for one or more attributes) and identify a condition $Q(a)$ with the conjunction of the $e(a_i)$'s.

We use the following general setup for distributions: $L$ contains the relation symbol $R$ with attributes $A_1, \dots, A_n$; $L'$ contains the relation symbols $R_1, \dots, R_k$ with attributes $A_1, \dots, A_n$; $Q_1(a), \dots, Q_k(a)$ are conditions. Also

$$f(R(a)) = \bigvee_{i=1}^{k} (R_i(a) \land Q_i(a)),$$

$$g(R_i(a)) = R(a) \land Q_i(a) \quad \text{for } 1 \le i \le k.$$

Intuitively, $R$ is the union of (the proper elements of) $R_1, \dots, R_k$, and each $R_i$ is obtained from $R$ by selection via the condition $Q_i$. An example appears in 2.2 where the DIVINFO relation is distributed into the SALESDIV, SERVICEDIV and OTHERDIV relations. In dealing with the conditions we assume that the axioms for the predicates are built-in and that for any (appropriate) constant sequence $b$, $Q_i(b)$ is either always true or always false. For example, the condition $(0 < a \land a < 10) \land (5 < b \land b < 10)$ is always true for the pair $(1, 7)$ and always false for the pair $(15, 7)$. Also, by the way that conditions are defined, we note for later reference that if $Q_i(b_1, \dots, b_n)$ and $Q_i(b_1', \dots, b_n')$ are both true, then every $Q_i(b_1^*, \dots, b_n^*)$, where $b_j^* = b_j$ or $b_j'$, $1 \le j \le n$, is also true.
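The two directions of a distribution can be sketched in a few lines of Python, assuming relations are sets of tuples and conditions are Boolean functions on tuples; `distribute` plays the role of $g$ (selection) and `reunite` the role of $f$ (guarded union). Both names are hypothetical and the rows are an illustrative sample for DIVINFO.

```python
def distribute(rows, conditions):
    """g: the i-th fragment holds exactly the rows of R that satisfy Q_i."""
    return [{r for r in rows if q(r)} for q in conditions]


def reunite(fragments, conditions):
    """f: R(a) holds iff some fragment R_i contains a and Q_i(a) holds."""
    return {r for frag, q in zip(fragments, conditions) for r in frag if q(r)}


# Illustrative DIVINFO rows (assumed for this sketch).
divinfo = {("SALES", "HARDWARE", 8), ("SERVICE", "AUTO", 1), ("DP", "PROGRAMMING", 2)}
conditions = [
    lambda r: r[0] == "SALES",                   # selects SALESDIV
    lambda r: r[0] == "SERVICE",                 # selects SERVICEDIV
    lambda r: r[0] not in ("SALES", "SERVICE"),  # selects OTHERDIV
]

roundtrip = reunite(distribute(divinfo, conditions), conditions)
# Dropping the third condition leaves rows of other divisions uncovered.
partial = reunite(distribute(divinfo, conditions[:2]), conditions[:2])
```

With all three conditions the round trip returns the original table; with only the first two, the DP row is lost, which is the situation that covering conditions rule out.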

We start by dealing with systems whose constraints are fd and/or mvd statements. If $s = R: X \to Y$ (resp. $X \twoheadrightarrow Y$) then we write $s(R_i) = R_i: X \to Y$ (resp. $X \twoheadrightarrow Y$) for $1 \le i \le k$ and call $s(R_i)$ the distributed statement (corresponding to $s$ and $R_i$). We also define two properties about conditions. We use the notation $Q$ for the sequence $(Q_1, \dots, Q_k)$. We say that $Q$ is a sequence of $s$-conditions for $s = X \to Y$ (resp. $X \twoheadrightarrow Y$) if the sentence

$$\forall x \forall y_1 \forall y_2 \forall w_1 \forall w_2 ((Q_i(x, y_1, w_1) \land Q_j(x, y_2, w_2)) \to y_1 = y_2)$$

(resp.

$$\forall x \forall y_1 \forall y_2 \forall w_1 \forall w_2 ((Q_i(x, y_1, w_1) \land Q_j(x, y_2, w_2)) \to (y_1 = y_2 \lor w_1 = w_2)))$$

is always true for all $i, j$, $1 \le i \ne j \le k$, where $x$, $y_i$, $w_i$ are sequences for the attributes in $X$, $Y$, $W$ (the rest of the attributes) respectively. We call $Q$ a sequence of covering conditions if $\forall z (Q_1(z) \lor \dots \lor Q_k(z))$ is always true.

Theorem 5.1. Assume that $S$ may contain fd and mvd statements and $S'$ contains all the corresponding distributed statements.
(a) The distribution is constraint preserving iff $Q$ is a sequence of $s$-conditions for all $s \in S$.
(b) The distribution is lossless iff $Q$ is a sequence of covering conditions.

Proof. We give the proof of (a).

(a) ($\Rightarrow$) We show the contrapositive. Suppose that $Q$ is not a sequence of $s$-conditions for $Q_1$ and $Q_2$, where $s = R: X \to Y$ (resp. $X \twoheadrightarrow Y$). Let $M'$ be a structure for $L'$ which contains exactly two tuples: a tuple $b_1$ in the $R_1$-table and a tuple $b_2$ in the $R_2$-table such that $Q_1(b_1)$ is true, $Q_2(b_2)$ is true, and $b_1$ and $b_2$ are identical on $X$ but differ on $Y$ (resp. differ on $Y$ and $W$). Then $S'$ is true in $M'$ but $f(s)$, which is an interrelational constraint in $L'$, is false in $M'$. Hence the distribution is not constraint preserving.

($\Leftarrow$) Assume that $Q$ is a sequence of $s$-conditions for all $s \in S$. Clearly, $s$ logically implies $g(s(R_i))$ for all $i$, $1 \le i \le k$. To show that the distribution is constraint preserving it suffices to show that for every $s \in S$, $\{s(R_1), \dots, s(R_k)\}$ logically implies $f(s)$. We now indicate why this is so for the more complex case of mvd statements. If $s = R: X \twoheadrightarrow Y$ for $R(X, Y, W)$ then

$$s = \forall x \forall y_1 \forall y_2 \forall w_1 \forall w_2 ((R(x, y_1, w_2) \land R(x, y_2, w_1)) \to R(x, y_1, w_1))$$

and

$$\begin{aligned}
f(s) = \forall x \forall y_1 \forall y_2 \forall w_1 \forall w_2 \Big( \Big( & \bigvee_{i=1}^{k} (R_i(x, y_1, w_2) \land Q_i(x, y_1, w_2)) \land \bigvee_{i=1}^{k} (R_i(x, y_2, w_1) \land Q_i(x, y_2, w_1)) \Big) \\
& \to \bigvee_{i=1}^{k} (R_i(x, y_1, w_1) \land Q_i(x, y_1, w_1)) \Big).
\end{aligned}$$

There are 2 cases:

(1) For a fixed $i = i_0$, $R_{i_0}(x, y_1, w_2) \land Q_{i_0}(x, y_1, w_2) \land R_{i_0}(x, y_2, w_1) \land Q_{i_0}(x, y_2, w_1)$ holds. Then, by $s(R_{i_0})$ we get $R_{i_0}(x, y_1, w_1)$, and $Q_{i_0}(x, y_1, w_1)$ follows from an earlier note.

(2) For $i_0 \ne i_1$ we get that $R_{i_0}(x, y_1, w_2) \land Q_{i_0}(x, y_1, w_2) \land R_{i_1}(x, y_2, w_1) \land Q_{i_1}(x, y_2, w_1)$ holds. Then, since $Q$ is a sequence of $s$-conditions, we get $y_1 = y_2 \lor w_1 = w_2$. So either we obtain $R_{i_1}(x, y_1, w_1) \land Q_{i_1}(x, y_1, w_1)$ or $R_{i_0}(x, y_1, w_1) \land Q_{i_0}(x, y_1, w_1)$.

Sometimes a relation has constraints which apply only to a subrelation. Consider again 2.2. It may be the case that DEPT determines LOCATION where DIVISION = SALES but not necessarily otherwise.


To deal with this type of situation, we say that a statement is a cd (conditional dependency) statement (for $(f, g)$) if it is obtained from the corresponding fd or mvd statement by substituting $R(a) \land Q_i(a)$ (for some $i$, $1 \le i \le k$) everywhere for $R(a)$. We write $s_i^* = R: (Q_i)s$ where $s$ is the dependency statement and $s_i^*$ is the corresponding cd statement. With each cd statement $s_i^*$ we associate a single distributed dependency statement, $s_i^*(R_i) = s(R_i)$. We can extend 5.1 to include cd statements.

Theorem 5.2. Assume that $S$ may contain fd, mvd and cd statements and $S'$ contains all the corresponding distributed statements.
(a) The distribution is constraint preserving iff $Q$ is a sequence of $s$-conditions for all $s \in S$ which is an fd or mvd statement.
(b) The distribution is lossless iff $Q$ is a sequence of covering conditions.

We end this section with the following example.

Example 5.3. In all parts of this example $L$ contains a relation symbol $R(A, B, C)$, constant symbol $0$ and predicate symbols $<$ for $A$, $B$ and $C$. The distribution $(f, g)$ is done in accordance with the conditions $Q_1, \dots, Q_k$ in each part.

(a) $Q_1(a, b, c) = 0 < a$, $Q_2(a, b, c) = 0 < b$, $Q_3(a, b, c) = (a < 0 \lor a = 0) \land (b < 0 \lor b = 0)$,

$S = \{R: A \to B\}$, $S' = \{R_1: A \to B,\ R_2: A \to B,\ R_3: A \to B\}$.

By 5.1 the distribution is lossless but not constraint preserving.
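The two claims in part (a) can be explored by brute force over a small sample of values. The Python sketch below assumes a finite domain `DOMAIN` chosen only for illustration and hypothetical function names. A finite search can exhibit a counterexample to the $s$-condition property, but a positive answer holds only for the sampled values, not for all of the underlying domain.

```python
from itertools import product

DOMAIN = (-1, 0, 1, 2)  # assumed sample of values for A, B and C


def q1(a, b, c):
    return 0 < a


def q2(a, b, c):
    return 0 < b


def q3(a, b, c):
    return a <= 0 and b <= 0


CONDITIONS = [q1, q2, q3]


def is_covering(conditions):
    """Every sampled tuple satisfies at least one condition."""
    return all(any(q(*z) for q in conditions) for z in product(DOMAIN, repeat=3))


def are_s_conditions_for_a_to_b(conditions):
    """s-conditions for the fd A -> B on R(A, B, C): X = A, Y = B, W = C."""
    for i, qi in enumerate(conditions):
        for j, qj in enumerate(conditions):
            if i == j:
                continue
            for x, y1, y2, w1, w2 in product(DOMAIN, repeat=5):
                # A witness: same A-value, different B-values, in two different fragments.
                if qi(x, y1, w1) and qj(x, y2, w2) and y1 != y2:
                    return False
    return True


covering = is_covering(CONDITIONS)
s_conditions = are_s_conditions_for_a_to_b(CONDITIONS)
```

The search finds, for instance, that $(1, 1, c)$ satisfies $Q_1$ while $(1, 2, c)$ satisfies $Q_2$, so two fragments can disagree on $B$ for the same value of $A$.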

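(b) $Q_1(a, b, c) = 0 < a \land c = 0$, $Q_2(a, b, c) = 0 < b \land c = 0$,

$S = \{R: A \twoheadrightarrow B,\ (Q_1) B \to C\}$, $S' = \{R_1: A \twoheadrightarrow B,\ R_2: A \twoheadrightarrow B,\ R_1: B \to C\}$.

By 5.2 the distribution is constraint preserving but not lossless.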

6. HETEROGENEOUS DATABASES

We have presented a generalization for the concepts of dependency preserving and lossless join decompositions for relational databases in the framework of interpretations. In the previous section we showed how these notions apply to interpretations which describe the distribution of a relational database. However, interpretations can be applied to hierarchic and network databases also by using database logic [2, 3]. Therefore, the constraint preserving and lossless properties are meaningful for mappings between heterogeneous databases, such as between a relational and a network database or between two hierarchic databases. We conclude this paper by giving an example of interpretations between a relational and a hierarchic database.

EMPLOYEES

EMP            CHILDREN   SALARIES
               CHILD      SALARY   YEAR
JOHN SMITH     SUSAN      22500    1982
ROBERT ADAMS   HARRY      16800    1980
                          18100    1981
                          19500    1982
THOMAS KELLY   SUSAN      27600    1981
                          31200    1982
NANCY JONES    LINDA      22500    1981
                          24050    1982
MARY TAYLOR    FRANCIS    20400    1982
               MICHAEL
               JOHN

DIVISIONS

DIVISION   DEPARTMENTS
           DEPT          LOCATION
SALES      HARDWARE      8
           FURNITURE     7
           TV            5
SERVICE    AUTO          1
           TV            6
DP         PROGRAMMING   2
           OPERATIONS    3
EMPLOYEE   PERSONNEL     10

Fig. 3. The tables for $M_3$.

Example 6.1. We continue with 3.1. Let $L_3$ contain the relations EMPLOYEES(EMP, CHILDREN, SALARIES), CHILDREN(CHILD), SALARIES(SALARY, YEAR), DIVISIONS(DIVISION, DEPARTMENTS), and DEPARTMENTS(DEPT, LOCATION). This language describes a schema with two hierarchies. In Fig. 3 we show the tables of a structure $M_3$ for $L_3$. We define $f_2: L_1 \to L_3$ by setting

$$\begin{aligned}
f_2(\text{EMPINFO(EMP, CHILD, SALARY, YEAR)}) = {} & \exists \text{CHILDREN}\, \exists \text{SALARIES}\ (\text{EMPLOYEES(EMP, CHILDREN, SALARIES)} \\
& \land \text{CHILDREN(CHILD)} \land \text{SALARIES(SALARY, YEAR)})
\end{aligned}$$

and

$$f_2(\text{DIVINFO(DIVISION, DEPT, LOCATION)}) = \exists \text{DEPARTMENTS}\ (\text{DIVISIONS(DIVISION, DEPARTMENTS)} \land \text{DEPARTMENTS(DEPT, LOCATION)}).$$

We obtain $M_1 = f_2(M_3)$.

We define $g_2: L_3 \to L_1$ by setting $g_2(\text{CHILDREN}) = \text{EMP}'$, $g_2(\text{SALARIES}) = \text{EMP}''$, $g_2(\text{DEPARTMENTS}) = \text{DIVISION}'$, and

$$\begin{aligned}
g_2(\text{EMPLOYEES(EMP, CHILDREN, SALARIES)}) &= (\text{EMP} = \text{EMP}' \land \text{EMP} = \text{EMP}'' \land \exists \text{CHILD}\, \exists \text{SALARY}\, \exists \text{YEAR}\ \text{EMPINFO(EMP, CHILD, SALARY, YEAR)}), \\
g_2(\text{CHILDREN(CHILD)}) &= \exists \text{SALARY}\, \exists \text{YEAR}\ \text{EMPINFO}(\text{EMP}', \text{CHILD}, \text{SALARY}, \text{YEAR}), \\
g_2(\text{SALARIES(SALARY, YEAR)}) &= \exists \text{CHILD}\ \text{EMPINFO}(\text{EMP}'', \text{CHILD}, \text{SALARY}, \text{YEAR}), \\
g_2(\text{DIVISIONS(DIVISION, DEPARTMENTS)}) &= (\text{DIVISION} = \text{DIVISION}' \land \exists \text{DEPT}\, \exists \text{LOCATION}\ \text{DIVINFO(DIVISION, DEPT, LOCATION)}), \\
g_2(\text{DEPARTMENTS(DEPT, LOCATION)}) &= \text{DIVINFO}(\text{DIVISION}', \text{DEPT}, \text{LOCATION}).
\end{aligned}$$

We obtain $M_3 = g_2(M_1)$.
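The effect of $f_2$ on the EMPLOYEES hierarchy can be sketched in Python by flattening a nested record structure into a flat EMPINFO-table. The nested-dictionary representation and the name `flatten_employees` are assumptions made for this sketch; the rows are taken from two of the employees shown in Fig. 3.

```python
def flatten_employees(employees):
    """EMPINFO(e, c, s, y) holds iff c is a child of e and (s, y) is a salary record of e."""
    return {
        (emp, child, salary, year)
        for emp, record in employees.items()
        for child in record["children"]
        for (salary, year) in record["salaries"]
    }


# Nested records for two employees (representation assumed for this sketch).
employees = {
    "NANCY JONES": {"children": ["LINDA"],
                    "salaries": [(22500, 1981), (24050, 1982)]},
    "MARY TAYLOR": {"children": ["FRANCIS", "MICHAEL", "JOHN"],
                    "salaries": [(20400, 1982)]},
}

empinfo = flatten_employees(employees)
```

Each employee contributes the cross product of its children and salary records, which is why the multivalued dependency EMP $\twoheadrightarrow$ CHILD holds in the flattened table.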

To define the hierarchical system we need to give the constraints. Let $S_3 = \{\text{SALARIES}: \text{YEAR} \to \text{SALARY}\}$. Then we obtain $T_3 = (L_3, S_3)$. We find that $(f_2, g_2)$ is both constraint preserving and lossless. If we modify $S_1$ by removing $\text{EMPINFO}: \text{EMP} \twoheadrightarrow \text{CHILD}$, then the constraint preserving property would still hold but the transformation would not be lossless.

Acknowledgement: I wish to thank the referees for their comments and suggestions.

REFERENCES

[1] R. Fagin: A normal form for relational databases that is based on domains and keys. ACM TODS 6, 387-415 (1981).

[2] B. E. Jacobs: On database logic. JACM 29, 310-332 (1982).

[3] B. E. Jacobs: On interpretations in database logic and their applications. Technical Report, Department of Computer Science, University of Maryland (1980).

[4] B. E. Jacobs, A. R. Aronson and A. C. Klug: On interpretations of relational languages and solutions to the implied constraint problem. ACM TODS 7, 291-315 (1982).

[5] D. Maier, A. O. Mendelzon, F. Sadri and J. D. Ullman: Adequacy of decompositions of relational databases. JCSS 21, 368-379 (1980).

[6] J. Paredaens and P. De Bra: On horizontal decompositions. Presented at the XP2 Workshop, Pennsylvania State University, June 1981.

[7] J. D. Ullman: Principles of Database Systems, 2nd Edn, Computer Science Press (1982).