2 Databases and Lattices
Over the past few years relational databases have become a standard field of study
for researchers in computer science, and the theory enjoys considerable mathematical
support. In this chapter, we have attempted to give interesting results in a
conceptually simple fashion, and the main topics in the theory of relational databases
are discussed in detail with examples. The main body of the chapter is
divided into five sections:
* Relational Model
* Data Dependencies
* Entropy and Information
* Lattices, Boolean Algebra, and Dependencies
* Normal Forms.
2.1 Relational Model
In this section, the basic definitions of the relational model of databases are given.
Most of the definitions are conventional ones but given in a simple form. Some
Databases and Lattices 6
definitions are new. An attempt has been made to include all the fundamental
ideas in relational database theory and examples are provided wherever necessary
to illustrate the concepts.
A data model[13,30,38,56,60] is primarily used for modelling a logical database
in a DBMS. Three data models have been proposed in the literature:
the relational model[12,15,17-19], the network model[14,58], and the hierarchical
model[59]. In this thesis we consider the relational model conceived by Codd in
1970. The relational model has many advantages over the other models, mainly
because of its simplicity and sound theoretical background. The model rests on the
well developed mathematical theory of relations and first-order logic[28,31]. In
data modelling, the method stems from a perception of the real world through finite
objects. The objects are essentially of three kinds: entities, their attributes, and
the relationships among entities. Thus attributes describe entities, which in turn
are associated by relationships.
• An entity is a physical or abstract object that exists and can be
distinguished from other objects.
For example, a Maruti 800 car with a registration number KGX 6018 is an entity
since it uniquely identifies one particular car. By contrast, a book on databases
published in 1982 is not an example of an entity since the features do not identify
any particular book.
• Properties that are to represent an entity in the underlying database are
called attributes.
For example, for an entity named car there may be two attributes, Model and
Registration Number, associated with it. It is important to distinguish between
attribute name and attribute value. For the attribute name Model there may be
attribute values like Maruti 800, Premier 118NE, Standard 2000, etc.
Before going into further definitions let us review the notations used in this
chapter. For the universal set of attributes, i.e., the set of all attributes in a relation, we
use Ω. We use U, V, W with or without subscripts for subsets of Ω. X, Y, Z
with or without subscripts stand for variable attribute names and A, B, C, ...
with or without subscripts stand for specific attribute names. But when we discuss
dependencies in terms of concepts in boolean algebra, small letters instead of capital
letters are used for attribute names.
• The universe of a relational database, denoted by Ω, is a finite, nonempty
set of attributes A1, ..., An, i.e., Ω = {A1, ..., An}.
The domain of an attribute Aj, written as DOM(Aj), is a finite set of attribute
values of Aj which must be of the same datatype. A domain is a simple set if all its
elements are atomic (i.e., nondecomposable by the underlying DBMS). Nonatomic
attributes occur in the theory of nested relational databases[35,52]. In relational
databases only atomic attributes are considered. The domain of a subset U of Ω, denoted
by DOM(U), is the union of the domains of all attributes in U, i.e.,
\[ \mathrm{DOM}(U) = \bigcup_{A_k \in U} \mathrm{DOM}(A_k) \]
where DOM(∅) = ∅.
• A relation R(Ω) is a subset of a cartesian product ×(D1, ..., Dn) where
Di = DOM(Ai) for 1 ≤ i ≤ n.
Thus a relation can be represented by a function with domain Df = ×(D1, ..., Dn)
and range Rf = {0,1}. Let that function be P(d1, ..., dn) where di ∈ Di. P takes
the value 1 if (d1, ..., dn) ∈ R and takes the value 0 otherwise. Speaking informally,
a relation in relational database theory can be visualized as a table with each column
heading as an attribute name and each row as an element of that relation. Under each
attribute name there are attribute values of the same datatype. The cardinality of R,
or CARD(R), is the number of rows in the table and the degree of R, or DEG(R), is the
number of columns. For example, a relation BOOK may be defined as follows.
Title            Author            ISBN           Type
Future Shock     Alvin Toffler     03199-11       Fiction
Bhagavad Gita    S. B. Varma       34-5691-2      Philosophy
One              Richard Bach      94770-08       Fiction
The TeXbook      D. E. Knuth       201-13448-9    Technical
A Suitable Boy   Vikram Seth       0-12-2318-7    Fiction
Power Shift      Alvin Toffler     04729-15       Fiction
Bhagavad Gita    S. Radhakrishnan  01-23-4568-9   Philosophy
A relation in table form has the following properties:
o There is no duplication of column names since these names are the
attributes in the set U.
o There is no duplication of rows since R is the set of tuples on which
the function P takes the value 1.
o Row and column orders are insignificant. When columns are permuted for
any reason, all the attribute values corresponding to them are also permuted.
o All attribute values are atomic.
• Let U be a nonempty subset of Ω, R(U) be a relation over U, and V be a
nonempty subset of U. The projection of R onto V is the relation over V
consisting of the V-value of each tuple in R, i.e.,
R[V] = {r[V] | V ⊆ U and r ∈ R(U)}.
In other words, a projection of a relation is obtained by deleting certain columns
and by removing duplicate rows. In the above example BOOK is a relation over
U = {Title, Author, ISBN, Type}. The projection of BOOK over V = {Author,
Type} will be
Author            Type
Alvin Toffler     Fiction
S. B. Varma       Philosophy
Richard Bach      Fiction
D. E. Knuth       Technical
Vikram Seth       Fiction
S. Radhakrishnan  Philosophy
Note that every projection is also a relation over a smaller set of attributes.
• Let R1(U) and R2(V) be two relations. The natural join (or simply
join)[20] of R1 and R2, written as *(R1, R2), is the relation with attributes
from U ∪ (V − (U ∩ V)) consisting of all tuples such that for each tuple r in
the join there exists some tuple s1 in R1 and s2 in R2 satisfying r[U] = s1
and r[V] = s2.
When U and V are disjoint, a join of R1 and R2 will become the cross product of
R1 and R2. As an example of join, consider the following relations R1 and R2:
R1:  A   B    C    D          R2:  B    E    F
     4   ab   15   20              ab   20   pq
     2   aa   18   25              ac   25   pr
     8   ac   9    10
Then,
*(R1, R2):  A   B    C    D    E    F
            4   ab   15   20   20   pq
            8   ac   9    10   25   pr
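The join example above can be reproduced with a few lines of Python. This is only a minimal sketch: the representation of a relation as a list of dictionaries and the function name natural_join are our own assumptions, not part of the relational model itself.

```python
def natural_join(r1, r2):
    """Natural join of two relations given as lists of dicts (attribute -> value)."""
    if not r1 or not r2:
        return []
    common = set(r1[0]) & set(r2[0])  # shared attribute names, i.e. U ∩ V
    joined = []
    for s1 in r1:
        for s2 in r2:
            # Two tuples combine only if they agree on every shared attribute.
            if all(s1[a] == s2[a] for a in common):
                row = {**s1, **s2}
                if row not in joined:  # a relation is a set: no duplicate rows
                    joined.append(row)
    return joined


R1 = [
    {"A": 4, "B": "ab", "C": 15, "D": 20},
    {"A": 2, "B": "aa", "C": 18, "D": 25},
    {"A": 8, "B": "ac", "C": 9, "D": 10},
]
R2 = [
    {"B": "ab", "E": 20, "F": "pq"},
    {"B": "ac", "E": 25, "F": "pr"},
]
result = natural_join(R1, R2)
for row in result:
    print(row)
```

The tuple of R1 with B = aa has no partner in R2 and therefore does not appear in the join.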
• A decomposition of a relation R into its projections R1, ..., Rn is lossless
if R can be reconstructed by joining these projections, i.e.,
R = *(R1, ..., Rn).
For example, consider the relation R and its projections R1, R2, and R3.
R:   A    B    C
     a1   b1   c1
     a1   b2   c1
     a1   b3   c2
     a2   b1   c3
R1:  A    B         R2:  A    C         R3:  B    C
     a1   b1             a1   c1             b1   c1
     a1   b2             a1   c2             b2   c1
     a1   b3             a2   c3             b3   c2
     a2   b1                                 b1   c3
Here the decomposition of R into R2 and R3 is lossless since *(R2, R3) = R. But
the decomposition R1, R3 is a lossy decomposition since *(R1, R3) ≠ R.
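The lossless and lossy decompositions of this example can be checked mechanically. The sketch below assumes single-letter attribute names and represents each tuple as a frozenset of (attribute, value) pairs; the helper names relation, project, and join are ours.

```python
def relation(attrs, rows):
    """Build a relation as a set of tuples; each tuple is a frozenset of pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}


def project(rel, attrs):
    """Projection onto attrs; duplicates vanish because the result is a set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def join(r1, r2):
    """Natural join: merge every pair of tuples that agree on shared attributes."""
    out = set()
    for s1 in r1:
        d1 = dict(s1)
        for s2 in r2:
            if all(d1.get(a, v) == v for a, v in s2):
                out.add(s1 | s2)
    return out


R = relation("ABC", [("a1", "b1", "c1"), ("a1", "b2", "c1"),
                     ("a1", "b3", "c2"), ("a2", "b1", "c3")])
R1, R2, R3 = project(R, "AB"), project(R, "AC"), project(R, "BC")

lossless_23 = join(R2, R3) == R  # joining on C gives R back
lossless_13 = join(R1, R3) == R  # joining on B creates spurious tuples
print(lossless_23, lossless_13, len(join(R1, R3)))
```

Joining R1 and R3 on B pairs both a1 and a2 with both c1 and c3 for the value b1, which is where the extra tuples come from.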
In the next section, various data dependencies in a relation are discussed. Terms
like keys, superkeys, etc. are defined and explained.
2.2 Data Dependencies
Data dependencies are constraints imposed on data in a database. In addition
to a set of attributes, a set of data dependencies is also an essential part of a
relation or database scheme. The class of functional dependencies was the first
type of data dependencies to be studied and has since been thoroughly investigated
by many researchers including Codd[17], who proposed the concept, Delobel and
Casey[24], who developed a set of inference rules, and Armstrong[3], who gave a
set of inference rules which are independent, sound, and complete. The relevance
of functional dependencies in the design of a relational database scheme has been
based on the observation that, in most cases, a set of functional dependencies plus
one join dependency are enough to express the dependency structure of a relational
database scheme. Also, they are essential for the assumption of the universal
relation scheme[29,39]. Multivalued dependencies were independently introduced by
Fagin[27], Zaniolo[63], and Delobel[23]. This class of dependencies is a
generalization of the class of functional dependencies since multivalued dependencies provide
(many-to-many) relations, whereas functional dependencies provide only
(many-to-one) functions. This section discusses data dependencies briefly. All the terms
are defined and illustrated with examples.
2.2.1 Functional Dependencies
An attribute A functionally depends on another attribute B if, in the relation,
whenever two tuples have equal B-values, they also have equal A-values. More formally:
• Let Ω be the universe and U be a subset of Ω. A Functional Dependency,
abbreviated as FD, is a constraint on U and is of the form V → W where
V and W are subsets of U. Relation R(U) satisfies FD V → W, or V → W
holds in R(U), if for every two tuples in R(U), say r and s, whenever their
V-values are identical, their W-values are also the same, i.e.,
(r[V] = s[V]) ⇒ (r[W] = s[W])
In the relation BOOK defined earlier, we see that the attribute ISBN determines
every other attribute, Author and Title together determine ISBN, Author determines
Type, and Title determines Type. The following relation
A    B    C
a0   b1   c1
a0   b2   c1
a0   b3   c2
a2   b1   c3
has the dependencies AB → C and C → A. It can be verified that whenever the
A-values and B-values coincide, the corresponding C-values also coincide. Similarly,
whenever the C-values coincide, the corresponding A-values also coincide.
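The definition of an FD translates directly into a pairwise test over the tuples. The following sketch (the function name satisfies_fd and the list-of-dicts representation are our own choices) checks the two dependencies claimed for the example relation and two that do not hold.

```python
from itertools import combinations


def satisfies_fd(rows, lhs, rhs):
    """True if every pair of tuples that agrees on lhs also agrees on rhs."""
    for r, s in combinations(rows, 2):
        agree_left = all(r[a] == s[a] for a in lhs)
        agree_right = all(r[a] == s[a] for a in rhs)
        if agree_left and not agree_right:
            return False
    return True


ROWS = [dict(zip("ABC", t)) for t in [
    ("a0", "b1", "c1"),
    ("a0", "b2", "c1"),
    ("a0", "b3", "c2"),
    ("a2", "b1", "c3"),
]]

print(satisfies_fd(ROWS, "AB", "C"))  # AB -> C
print(satisfies_fd(ROWS, "C", "A"))   # C -> A
print(satisfies_fd(ROWS, "A", "C"))   # A -> C fails: a0 occurs with c1 and c2
print(satisfies_fd(ROWS, "B", "C"))   # B -> C fails: b1 occurs with c1 and c3
```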
Inference rules for FDs
* REFLEXIVITY:
(V ⊆ U) ⇒ (U → V)
e.g., for attributes A, B, and C the FD ABC → AB is always true. FDs of this
kind are said to be trivial. Here ABC is an abbreviation for {A, B, C} and similarly
AB for {A, B}.
* AUGMENTATION:
(U1 → U2 and U3 ⊆ U4) ⇒ (U1U4 → U2U3)
Special cases: When U3 = ∅, U1U4 → U2. When U3 = U4, U1U4 → U2U4. When
U3 = U4 = U1, U1U1 → U1U2 or simply, U1 → U1U2.
* TRANSITIVITY:
(U → V and V → W) ⇒ (U → W)
The following inference rules are deduced from the above three rules.
* PSEUDO-TRANSITIVITY:
(U1 → U2 and U2U3 → U4) ⇒ (U1U3 → U4)
If U3 is replaced with ∅ the transitivity rule is obtained. This rule is a generalisation
of transitivity.
* UNION or ADDITIVITY:
(U → V and U → W) ⇒ (U → VW)
* DECOMPOSITION or PROJECTIVITY:
(U → VW) ⇒ (U → V and U → W)
The cardinalities of the projections onto U and onto UV completely determine whether U
functionally determines V or not. The following result states this.
* Let CARD(R[Ω]) or simply CARD(Ω) be the cardinality of the relation R
over the set of attributes Ω as defined earlier. Let U and V be subsets of Ω.
Then U is said to functionally determine V if CARD(U) = CARD(U, V).
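The cardinality criterion is easy to try out on the running example. In this sketch card(rows, attrs) plays the role of CARD of the projection onto attrs; both function names are assumptions made for illustration.

```python
def card(rows, attrs):
    """Cardinality of the projection of the relation onto attrs."""
    return len({tuple(r[a] for a in attrs) for r in rows})


def determines(rows, u, v):
    """U -> V holds exactly when CARD(U) = CARD(U, V)."""
    return card(rows, u) == card(rows, list(u) + list(v))


ROWS = [dict(zip("ABC", t)) for t in [
    ("a0", "b1", "c1"),
    ("a0", "b2", "c1"),
    ("a0", "b3", "c2"),
    ("a2", "b1", "c3"),
]]

print(card(ROWS, "AB"), card(ROWS, "ABC"))  # equal, so AB -> C
print(card(ROWS, "C"), card(ROWS, "CA"))    # equal, so C -> A
print(card(ROWS, "A"), card(ROWS, "AB"))    # different, so A does not determine B
```

Adding attributes to a projection can never reduce its cardinality, so equality means that each U-value occurs with exactly one V-value.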
• A set of attributes U not containing any of the attributes in the attribute
set V is called a determinant of V if V is functionally dependent on U and
no proper subset of U has this property. If V is such that no superset of
V has U as determinant then the set V is called the colony of U. A set of
attributes U is called a saturated set if the following condition is satisfied:
if a determinant is in U then its colony must be in U.
The above definition is explained in the following example. Let R be the
following relation.
A    B    C
a0   b1   c1
a0   b2   c1
a0   b3   c2
a2   b1   c3
There exist two FDs AB → C and C → A in R. Determinants in R are AB and
C. Colony of A = ∅, B = ∅, C = AC, AB = ABC, AC = AC, BC = ABC,
ABC = ABC. Saturated sets of R are A, B, AC, and ABC. Note that the null set ∅
and the full set Ω (set of all attributes) are saturated sets of every relation. Similarly,
colony of ∅ = ∅ and colony of Ω = Ω. We assume that no relation has a constant
column.
Keys in a Relation
The central concept in the relational model is the concept of a key. Intuitively,
an attribute (or a collection of attributes) may uniquely determine any particular
tuple within a relation. The existence of at least one key is guaranteed by definition.
However, the point is to have a key of minimal length, i.e., an attribute is to be
discarded from the key provided this does not destroy its uniqueness.
• A set of attributes U is called a candidate key or simply a key of a relation,
if U determines every attribute in the relation and no proper subset of U
has this property. An FD K → Ω is called a key dependency of R if K is
a key of R.
It is possible that a relation has more than one candidate key. In such cases, one
of them is designated by the database designer as the primary key, i.e., as the
primary means of identifying the corresponding tuples. All other keys are alternate
keys. The choice of the primary key from among the candidate keys depends on the
particular circumstances, though typically the shortest key is the most favourable.
Any set containing a key is called a superkey.
• An attribute is called a prime attribute if it is an element of some key.
Nonprime attributes are referred to as petty attributes. Two sets of attributes
U and V are said to be equivalent if U → V and V → U. A key is a simple
key if it contains only one attribute. Complex keys are those in which there
is more than one attribute. A set of attributes which determines at least
one attribute outside it is called an open set. A set which is not an open
set is called a closed set.
It can be verified that saturated sets and closed sets of a relation are identical. Now,
consider the same example R with dependencies AB → C and C → A.
A    B    C
a0   b1   c1
a0   b2   c1
a0   b3   c2
a2   b1   c3
Keys of the above relation are AB and BC. ABC is a superkey since it contains
a key. All the attributes are prime attributes. Since there are no keys with a
single attribute all the keys are complex. C, AB, and BC are open sets since they
determine A, C, and A respectively. All other sets are closed.
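The keys, prime attributes, and open sets of this example can be derived from the two FDs alone. The sketch below uses the standard attribute-closure computation, which we assume here as the working tool (it is not described in the text), and the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def subsets(universe):
    """Nonempty subsets of the universe, smallest first."""
    for size in range(1, len(universe) + 1):
        yield from combinations(sorted(universe), size)


def candidate_keys(universe, fds):
    keys = []
    for combo in subsets(universe):
        if any(set(k) <= set(combo) for k in keys):
            continue  # contains a smaller key: a superkey, not a key
        if closure(combo, fds) == set(universe):
            keys.append(combo)
    return keys


FDS = [("AB", "C"), ("C", "A")]
keys = candidate_keys("ABC", FDS)
prime = set().union(*keys)
# Open sets determine at least one attribute outside themselves.
open_sets = [s for s in subsets("ABC") if closure(s, FDS) != set(s)]
print(keys, sorted(prime), open_sets)
```

Because subsets are visited smallest first, any set that already contains a recorded key is skipped, which enforces the minimality condition in the definition of a key.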
The close connection between saturated sets (or closed sets) and lattices is
given in section 2.4. Next we consider the representation of FDs in terms of boolean
functions.
Representation of FDs Using Binary Relations
This section deals with a particular binary relation called an attribute relation
and its specializations[45]. This binary relation is not to be confused with the
relations of a relational database. The study of these relations can be helpful for
visualizing FDs.
Let Ω be the set Ω = {X(n−1), X(n−2), ..., X0} where each Xi is an attribute.
Let S be the power set of Ω. Thus the cardinality of S is 2^n. Let the elements of
S be R0, R1, ..., R(2^n − 1).
• An attribute relation (or attribute graph) is a subset of the Cartesian
product S × S. A dependency relation (or dependency graph) is an
attribute graph satisfying the following three axioms.
o T ∪ U → T (projectivity)
o (T → U and T → V) ⇒ (T → U ∪ V) (additivity)
o (T → U and U → V) ⇒ (T → V) (transitivity)
where T, U, V ∈ S and the symbol → stands for the dependency relation.
It can be verified that the above axioms are independent. It is known[3,8,42] that
these three axioms are appropriate for representing FDs and that they are sound
and complete. Consider the attribute graph
BA → C, C → A
where the notation C, B, and A is used instead of X2, X1, X0 and also BA
instead of {B, A} for convenience. The graph can be represented geometrically as
in Fig. 2.1.
[Fig. 2.1: attribute graph on the nodes ∅, A, B, AB, C, AC, BC, and ABC]
A graph can be represented by its adjacency matrix. The order of the adjacency
matrix is equal to the number of vertices of the graph. In representing an attribute
graph a '1' appears at the intersection of row Ri and column Rj if Ri → Rj; otherwise
a '0' appears there. The adjacency matrix of the above graph is shown below.
G =
       ∅   A   B   AB  C   AC  BC  ABC
∅      0   0   0   0   0   0   0   0
A      0   0   0   0   0   0   0   0
B      0   0   0   0   0   0   0   0
AB     0   0   0   0   1   0   0   0
C      0   1   0   0   0   0   0   0
AC     0   0   0   0   0   0   0   0
BC     0   0   0   0   0   0   0   0
ABC    0   0   0   0   0   0   0   0
• Dependency closure of an attribute graph is defined as the minimal
dependency graph which contains the given attribute graph.
The dependency closure of the above attribute graph is shown below.
G =
       ∅   A   B   BA  C   CA  CB  CBA
∅      1   0   0   0   0   0   0   0
A      1   1   0   0   0   0   0   0
B      1   0   1   0   0   0   0   0
BA     1   1   1   1   1   1   1   1
C      1   1   0   0   1   1   0   0
CA     1   1   0   0   1   1   0   0
CB     1   1   1   1   1   1   1   1
CBA    1   1   1   1   1   1   1   1
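The closure matrix above can be regenerated from the two arrows BA → C and C → A. The sketch below relies on one assumption of ours: that T → U belongs to the dependency closure exactly when U is contained in the set of attributes determined by T. The node order follows the matrix, with the empty string standing for ∅.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given arrows."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


NODES = ["", "A", "B", "BA", "C", "CA", "CB", "CBA"]  # "" is the null set
FDS = [("BA", "C"), ("C", "A")]

# Entry (row, col) is 1 when the row node determines the column node.
matrix = [[1 if set(col) <= closure(row, FDS) else 0 for col in NODES]
          for row in NODES]

for label, line in zip(NODES, matrix):
    print(f"{label or 'phi':4}", *line)
```

The rows printed for BA, CB, and CBA consist entirely of ones, reflecting that each of these sets determines every attribute.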
The following results are derived in [45].
* The symmetric part of a dependency graph defines an equivalence
relation.
The symmetric part of the graph in Fig. 2.1 is shown below.
H =
       ∅   A   B   BA  C   CA  CB  CBA
∅      1   0   0   0   0   0   0   0
A      0   1   0   0   0   0   0   0
B      0   0   1   0   0   0   0   0
BA     0   0   0   1   0   0   1   1
C      0   0   0   0   1   1   0   0
CA     0   0   0   0   1   1   0   0
CB     0   0   0   1   0   0   1   1
CBA    0   0   0   1   0   0   1   1
It is easy to verify that this matrix defines an equivalence relation. The equivalence
classes are {∅}, {A}, {B}, {CA, C}, and {BA, CB, CBA}.
* The symmetric part of a dependency graph defines a set of equivalence
classes. In each class there is a unique maximal element, i.e., one element
which contains all the others in the class.
The above result follows from the fact that the union of two nodes (attribute sets)
in an equivalence class belongs to the same class and the maximal element is the
union of all the elements in a class. The maximal elements in the above example
are ∅, A, B, CA, and ABC.
* The maximal elements of the equivalence classes given by the symmetric
part of the dependency graph are precisely the saturated sets.
A saturated set can be constructed as follows. Take any attribute set. If an attribute
is found outside the set which is functionally determined by the set, include that
attribute in the set. Continue this process till the maximum possible attribute set
is obtained. The resulting set will be a saturated set of that attribute relation.
Consider the attribute relation AB → C and C → A. Consider the set ∅.
Since it does not determine any attribute outside it, ∅ is a saturated set. Similarly,
the sets A and B also form saturated sets of the relation. But the set C can be
expanded since it determines the attribute A outside it. Since the set CA does not
determine any attribute outside it, CA forms another saturated set of the relation. Finally,
the sets AB and ABC both expand to the same set and ABC becomes a saturated
set. It can be easily recognized that these sets are the very same maximal elements
defined earlier.
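The construction just described can be sketched as a small program that expands every subset of the universe and collects the results. The function names saturate and saturated_sets are ours, and FDs are written as pairs of attribute strings.

```python
from itertools import combinations


def saturate(attrs, fds):
    """Keep adding determined attributes until nothing outside can be added."""
    result = set(attrs)
    grew = True
    while grew:
        grew = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                grew = True
    return frozenset(result)


def saturated_sets(universe, fds):
    """Saturated sets obtained by expanding every subset of the universe."""
    found = set()
    for size in range(len(universe) + 1):
        for combo in combinations(sorted(universe), size):
            found.add(saturate(combo, fds))
    return found


FDS = [("AB", "C"), ("C", "A")]
result = saturated_sets("ABC", FDS)
print(sorted("".join(sorted(s)) for s in result))
```

The five sets obtained are ∅, A, B, AC, and ABC, the same as the maximal elements of the equivalence classes.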
* A saturated set functionally determines another saturated set if and only
if the first one contains the second one.
Let U and V be two saturated sets. If V ⊆ U then U → V follows from the
definition of functional dependencies. Hence, the 'if' part of the result follows
directly. For the 'only if' part, assume U → V and V ⊄ U. Let Y be an attribute which
is in V but not in U. Since U determines V, U → Y also is true. Since U is a
saturated set, by definition of saturated sets, Y also should be in U, which is a
contradiction.
• An attribute relation is called a dissidence relation if it satisfies the
following axioms. The dissidence relation will be designated by the symbol
~. In the axioms that follow, Ω is the entire set of attributes, ∅ is the null
set, U and V are subsets of Ω and Ū is the complement of U with respect
to Ω.
o Ω ~ ∅
o (U ~ ∅) ⇒ (U = Ω)
o (U ~ Ū and V ~ V̄) ⇒ (U ∩ V ~ W), where W is the complement of U ∩ V
From the above axioms it can be seen that, if there is an arrow going out of a node,
it has to go to the complement of that node. The symbol ~ physically means that
the node on the left side does not functionally determine any of the attributes on
the right side. Saturated sets defined earlier are the same as the sets on the left
side. From the saturated sets the boolean function representing the FDs can be
obtained. This is shown in section 2.4.
More boolean relations like the completion relation (defined later), the equipotence
relation, and duals of all the above relations can be defined. The purpose of
introducing these relations was to have a clear understanding of FDs through different
perspectives. In the following section multivalued dependencies are described.
2.2.2 Multivalued Dependencies
Multivalued dependencies, abbreviated as MVDs, were introduced independently
as a generalization of FDs by Fagin[27], Zaniolo[63], and Delobel[23]. In this
section the properties of MVDs, their logical equivalence, some inference rules etc.
are discussed along with necessary examples.
• A multivalued dependency (MVD) is a statement of the form U →→ V,
where U and V are subsets of Ω. Let W be the set of attributes not in U
or V, i.e., W = Ω − UV. The MVD U →→ V is said to hold for a relation
if whenever there are tuples s and t in that relation where s[U] = t[U],
then there is a tuple u where u[UV] = s[UV] and u[W] = t[W].
Consider the following relation having the MVD BC →→ D.
A   B   C   D
b   a   c   a
b   a   c   f
b   a   e   a
b   a   e   f
d   a   c   a
d   a   c   f
d   a   e   a
d   a   e   f
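The definition of an MVD can be tested directly: for every pair of tuples agreeing on U, the tuple mixing their UV and W parts must be present. The sketch below applies this to the example relation; we read the last row of the table as (d, a, e, f), since a relation cannot contain a duplicate row, and the function name satisfies_mvd is hypothetical.

```python
def satisfies_mvd(rows, universe, u, v):
    """Check U ->-> V by searching for the tuple the definition requires."""
    w = [a for a in universe if a not in u and a not in v]
    present = {tuple(r[a] for a in universe) for r in rows}
    for s in rows:
        for t in rows:
            if all(s[a] == t[a] for a in u):
                # Required tuple: agrees with s on UV and with t on W.
                needed = tuple(t[a] if a in w else s[a] for a in universe)
                if needed not in present:
                    return False
    return True


ROWS = [dict(zip("ABCD", t)) for t in [
    ("b", "a", "c", "a"), ("b", "a", "c", "f"),
    ("b", "a", "e", "a"), ("b", "a", "e", "f"),
    ("d", "a", "c", "a"), ("d", "a", "c", "f"),
    ("d", "a", "e", "a"), ("d", "a", "e", "f"),  # last row: assumed reading
]]

print(satisfies_mvd(ROWS, "ABCD", "BC", "D"))       # BC ->-> D
print(satisfies_mvd(ROWS, "ABCD", "BC", "A"))       # BC ->-> A
print(satisfies_mvd(ROWS[:-1], "ABCD", "BC", "D"))  # fails once a row is removed
```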
* Relation R(Ω) holds an MVD U →→ V if and only if R(Ω) holds U →→ W
where W = Ω − (UV).
It can be seen that in the above example the MVD BC →→ A also exists. This can
be proved with the help of the following formula.
where W = Ω − (UV)
* Let Ω be the universe of the relation R. An MVD U →→ V holds in R(Ω)
if and only if
where U, V ⊆ Ω, W = Ω − (UV), and U ∩ V = ∅.
Inference Rules for MVDs
As in the case of FDs, the following are some of the inference rules for MVDs.
* SYMMETRY:
U →→ V iff U →→ W
where UVW = Ω, and W = Ω − (UV).
* REFLEXIVITY:
(V ⊆ U) ⇒ (U →→ V)
* AUGMENTATION:
* TRANSITIVITY:
(U →→ V and V →→ W) ⇒ (U →→ W − V)
* PSEUDO-TRANSITIVITY:
* UNION or ADDITIVITY:
(U →→ V and U →→ W) ⇒ (U →→ VW)
* PROJECTIVITY:
(U →→ V and U →→ W) ⇒ (U →→ V ∩ W, U →→ V − W and U →→ W − V)
The first four inference rules are sound and complete for a set of MVDs on Ω[8,54].
Other than the above inference rules, there are rules for both FDs and MVDs.
* REPLICATION:
(U → V) ⇒ (U →→ V)
* COALESCENCE:
* (U →→ V and UV → W) ⇒ (U → W − V)
It has been shown that the above three inference rules are sound and complete[8].
These inference rules for FDs and MVDs written as logical formulas are also shown
to be sound and complete[28,54].
Because of the equivalence theorem between data dependencies and a part
of propositional logic, the determination of whether an FD U → V or an MVD
U →→ V holds in a relation R may be made by evaluating the corresponding
formulas Ū + V or Ū + V + (Ω − UV) for two tuples. Here Ω is the universe of R.
In the next section one of the most important data dependencies and its
properties are discussed. For any relation, at least one join dependency is necessary for
the purpose of normalization.
2.2.3 Join Dependencies
These important data dependencies are based on the definition of join given
earlier. In this section some of the results related to these dependencies and key
dependencies are given.
• A join dependency (JD)[50] is a statement of the form *(U1, U2, ..., Un)
where Ui ⊆ Ω. A relation R is said to hold a JD *(U1, U2, ..., Un) if
R = *(R[U1], R[U2], ..., R[Un]). Here the Ui are referred to as components of the JD.
For example, R(ABC) with the FDs C → A and AB → C has the JD
*(AC, BC). Note that the JD *(U1, U2) is equivalent to the MVD U1 ∩ U2 →→ U2.
Let J1 and J2 be two JDs of R. If J2 is obtained by replacing two components
of J1 by their union, then J1 logically implies J2. The membership algorithm[26]
to find out whether a JD is a logical consequence of a set of KDs (key dependencies)
is as follows:
Given a JD *(U1, U2, ..., Un) and a set {K1 → Ω, ..., Ks → Ω} of KDs. Define
a set J = {U1, ..., Un};
Repeat
if Ki ⊆ V ∩ W for some i, 1 ≤ i ≤ s and for some V, W ∈ J
then replace V and W in J by V ∪ W
until no more steps are possible;
If any member of J is equal to Ω then the given JD is a logical consequence of
the given set of KDs;
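The membership algorithm can be sketched in Python as follows. The function name and the small test scheme over the attributes A, B, C, D are hypothetical and chosen only to exercise the steps; attribute sets are written as strings of single-letter names.

```python
def jd_follows_from_keys(components, keys, universe):
    """Merge components whose intersection contains a key, until no step applies."""
    j = [frozenset(c) for c in components]
    key_sets = [frozenset(k) for k in keys]
    merged_something = True
    while merged_something:
        merged_something = False
        for a in range(len(j)):
            for b in range(a + 1, len(j)):
                if any(k <= (j[a] & j[b]) for k in key_sets):
                    union = j[a] | j[b]
                    # Replace the two components V and W by V ∪ W.
                    j = [c for i, c in enumerate(j) if i not in (a, b)] + [union]
                    merged_something = True
                    break
            if merged_something:
                break
    return frozenset(universe) in j


# Hypothetical scheme ABCD with the single key A.
print(jd_follows_from_keys(["AB", "AC", "AD"], ["A"], "ABCD"))
# With the key AB instead, no two components share a key, so nothing merges.
print(jd_follows_from_keys(["AB", "AC", "AD"], ["AB"], "ABCD"))
```

Each merge reduces the number of components by one, so the loop terminates after at most n − 1 steps.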
As in the case of FDs and MVDs there is a set of inference rules proposed for JDs
too. Some of the other results will be discussed in later sections.
2.3 Entropy and Information
In this section we discuss the use of information theory[36] in the analysis of
databases. For a better understanding of the theory, we have included many
results on the concept of entropy. Later we have shown how the entropy and data
dependencies are connected. Further, it is shown that entropy can be used to define
information in a relation. Every probability distribution has some uncertainty
associated with it. The concept of entropy is to provide a quantitative measure
of this uncertainty. Many axiomatic definitions of entropy are available in the
literature[36] but the one available in [44] is particularly simple.
• The entropy of a random variable X is given by
\[ H(X) = -\sum_{k=1}^{n} p_k \log p_k \]
where p_k is the probability of the variable taking the kth value.
For example, consider the following relation.
A    B    C
a0   b1   c1
a0   b2   c1
a0   b3   c2
a2   b1   c3
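The various entropies with different variables (logarithms taken to base 2) are
\begin{align*}
H(A) &= -\tfrac{3}{4}\log\tfrac{3}{4} - \tfrac{1}{4}\log\tfrac{1}{4} = 2 - \tfrac{3}{4}\log 3 \\
H(B) &= -\tfrac{1}{2}\log\tfrac{1}{2} - \tfrac{1}{4}\log\tfrac{1}{4} - \tfrac{1}{4}\log\tfrac{1}{4} = \tfrac{3}{2} \\
H(C) &= -\tfrac{1}{2}\log\tfrac{1}{2} - \tfrac{1}{4}\log\tfrac{1}{4} - \tfrac{1}{4}\log\tfrac{1}{4} = \tfrac{3}{2} \\
H(A+B) &= -\tfrac{1}{4}\log\tfrac{1}{4} - \tfrac{1}{4}\log\tfrac{1}{4} - \tfrac{1}{4}\log\tfrac{1}{4} - \tfrac{1}{4}\log\tfrac{1}{4} = 2 \\
H(A+C) &= -\tfrac{1}{2}\log\tfrac{1}{2} - \tfrac{1}{4}\log\tfrac{1}{4} - \tfrac{1}{4}\log\tfrac{1}{4} = \tfrac{3}{2} \\
H(B+C) &= -\tfrac{1}{4}\log\tfrac{1}{4} - \tfrac{1}{4}\log\tfrac{1}{4} - \tfrac{1}{4}\log\tfrac{1}{4} - \tfrac{1}{4}\log\tfrac{1}{4} = 2 \\
H(A+B+C) &= -\tfrac{1}{4}\log\tfrac{1}{4} - \tfrac{1}{4}\log\tfrac{1}{4} - \tfrac{1}{4}\log\tfrac{1}{4} - \tfrac{1}{4}\log\tfrac{1}{4} = 2
\end{align*}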
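These entropies can be computed numerically by treating each tuple of the relation as equally likely and counting how often each combination of values occurs. In the sketch below, logarithms are to base 2 and the function name entropy is our own choice.

```python
from collections import Counter
from math import log2


def entropy(rows, attrs):
    """Joint entropy (base 2) of the attributes, each tuple being equally likely."""
    counts = Counter(tuple(r[a] for a in attrs) for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


ROWS = [dict(zip("ABC", t)) for t in [
    ("a0", "b1", "c1"),
    ("a0", "b2", "c1"),
    ("a0", "b3", "c2"),
    ("a2", "b1", "c3"),
]]

for attrs in ["A", "B", "C", "AB", "AC", "BC", "ABC"]:
    print(attrs, round(entropy(ROWS, attrs), 4))
```

Note that the joint entropy of A and C equals H(C): since C → A, adding A to C creates no new distinct combinations.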
Now we will see how Venn diagrams can be used to represent entropies. The
Venn diagram for a relation with three attributes is given in Fig. 2.2.
[Fig. 2.2: Venn diagram for three attributes A, B, and C]
The entropy diagram with values of different areas for the previous example is given
below in Fig. 2.3.
[Fig. 2.3: entropy diagram for A, B, and C with the values of the regions]
Here H_2 = 1/2, H_13 = 1/2, H_23 = (3/4) log 3 − 1/2, and H_123 = 3/2 − (3/4) log 3;
the remaining regions have zero entropy. The entropies in
different portions of the entropy diagram can be calculated with the help of the
following formulas which follow directly from the inclusion exclusion principle.
1. \[ H(A+B+C) = H(A) + H(B) + H(C) - H(AB) - H(AC) - H(BC) + H(ABC) \]
2. \begin{align*}
H(ABC\bar{D}\bar{E}\bar{F}) ={}& H(A+D+E+F) + H(B+D+E+F) + H(C+D+E+F) \\
& - H(A+B+D+E+F) - H(A+C+D+E+F) - H(B+C+D+E+F) \\
& + H(A+B+C+D+E+F) - H(D+E+F)
\end{align*}
3. \begin{align*}
H(ABC\bar{D}\bar{E}\bar{F}) ={}& H(ABC) - H(ABCD) - H(ABCE) - H(ABCF) \\
& + H(ABCDE) + H(ABCDF) + H(ABCEF) \\
& - H(ABCDEF)
\end{align*}
4. \[ H(ABC) = H(A) + H(B) + H(C) - H(A+B) - H(A+C) - H(B+C) + H(A+B+C) \]
5. \[ H(A+C) = H(\bar{A}C) + H(A) \]
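As a sketch of how such formulas can be applied, the entropy of any region of the diagram can be computed from joint entropies alone, following the pattern of formula 2: sum the joint entropies over the nonempty subsets of the attributes the region lies inside (each taken together with the excluded attributes, with alternating sign) and subtract the joint entropy of the excluded attributes. The function names and the inside/outside parameters are our own assumptions.

```python
from collections import Counter
from itertools import combinations
from math import log2


def joint_entropy(rows, attrs):
    """Joint entropy (base 2); evaluates to zero for an empty attribute list."""
    counts = Counter(tuple(r[a] for a in attrs) for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


def region(rows, inside, outside):
    """Entropy of the region lying in every 'inside' set and in no 'outside' set."""
    total = 0.0
    for size in range(1, len(inside) + 1):
        for subset in combinations(inside, size):
            sign = (-1) ** (size + 1)
            total += sign * joint_entropy(rows, list(subset) + list(outside))
    return total - joint_entropy(rows, outside)


ROWS = [dict(zip("ABC", t)) for t in [
    ("a0", "b1", "c1"), ("a0", "b2", "c1"),
    ("a0", "b3", "c2"), ("a2", "b1", "c3"),
]]

print(round(region(ROWS, "B", "AC"), 4))   # region inside B only
print(round(region(ROWS, "AC", "B"), 4))   # inside A and C, outside B
print(round(region(ROWS, "BC", "A"), 4))   # inside B and C, outside A
print(round(region(ROWS, "ABC", ""), 4))   # inside all three
print(round(region(ROWS, "A", "BC"), 4))   # inside A only
```

For the example relation the regions inside A only, inside C only, and inside A and B but outside C all evaluate to zero.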
Let H_k be the entropy of a set of variables written in terms of entropies
of at least k variables. We find some interesting results that can be generalized
into a simple formula. We will also see what H_k means in an entropy diagram
of three variables. We will first consider a few examples and later generalize.
1. For three variables we have the following results.
a) H1(A + B + C) =
H(A) + H(B) + H(C)
-H(AB) - H(AC) - H(BC)
+H(ABC)
b) H2(A + B + C) =
H(AB) + H(BC) + H(AC)
-2H(ABC)
c) H3(A + B + C) = H(ABC).
2. For four variables we have
a) H1(A + B + C + D) =
H(A) + H(B) + H(C) + H(D)
-H(AB) - H(AC) - H(AD) - H(BC) - H(BD) - H(CD)
+H(ABC) + H(ABD) + H(ACD) + H(BCD)
-H(ABCD)
b) H2(A + B + C + D) =
H(AB) + H(AC) + H(AD) + H(BC) + H(BD) + H(CD)
-2[H(ABC) + H(ABD) + H(ACD) + H(BCD)]
+3H(ABCD)
c) H3(A + B + C + D) =
H(ABC) + H(ABD) + H(ACD) + H(BCD)
-3H(ABCD)
d) H4(A + B + C + D) = H(ABCD)
3. Proceeding in a similar way we find that
a) H1(A + B + ... + N) =
[sum of entropies with one variable]
- [sum of entropies with two variables]
+ [sum of entropies with three variables]
- ...
+ \( (-1)^{n-1} \) [sum of entropies with all variables]
b) H2(A + B + ... + N) =
[sum of entropies with two variables]
-2 [sum of entropies with three variables]
+3 [sum of entropies with four variables]
- ...
+ \( (-1)^{n-2}(n-1) \) [sum of entropies with all variables]
c) H3(A + B + ... + N) =
[sum of entropies with three variables]
-3 [sum of entropies with four variables]
+6 [sum of entropies with five variables]
- ...
+ \( (-1)^{n-3}\binom{n-1}{2} \) [sum of entropies with all variables]
d) H4(A + B + ... + N) =
[sum of entropies with four variables]
-4 [sum of entropies with five variables]
+10 [sum of entropies with six variables]
- ...
+ \( (-1)^{n-4}\binom{n-1}{3} \) [sum of entropies with all variables]
Continuing in the same fashion we have the general formula
\begin{align*}
H_m(A + B + \cdots + N) ={}& \binom{m-1}{m-1}\,[\text{sum of entropies with } m \text{ variables}] \\
& - \binom{m}{m-1}\,[\text{sum of entropies with } m+1 \text{ variables}] \\
& + \binom{m+1}{m-1}\,[\text{sum of entropies with } m+2 \text{ variables}] \\
& - \cdots \\
& + (-1)^{n-m}\binom{n-1}{m-1}\,[\text{sum of entropies with all variables}]
\end{align*}
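The coefficient pattern of the general formula can be sanity-checked on ordinary finite sets, where the size of a set stands in for the entropy of a region and H_m becomes the number of elements lying in at least m of the sets. This is only an illustrative analogy under that assumption; the sets below are arbitrary test data.

```python
from itertools import combinations
from math import comb


def at_least_m(sets, m):
    """Elements covered by at least m sets, using the H_m coefficient pattern."""
    n = len(sets)
    total = 0
    for j in range(m, n + 1):
        # Sum of the sizes of all j-fold intersections.
        s_j = sum(len(set.intersection(*group)) for group in combinations(sets, j))
        total += (-1) ** (j - m) * comb(j - 1, m - 1) * s_j
    return total


def direct_count(sets, m):
    """The same quantity obtained by counting memberships element by element."""
    return sum(1 for x in set.union(*sets) if sum(x in s for s in sets) >= m)


SETS = [{1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6, 7}, {1, 4, 7}]
for m in range(1, len(SETS) + 1):
    print(m, at_least_m(SETS, m), direct_count(SETS, m))
```

For m = 2 and four sets the coefficients are 1, -2, and 3, the same as in the expression for H2 with four variables.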
2.3.1 Entropy and Dependencies
In this section we see how entropy can be used to obtain the data dependencies
in a relation. First let us consider an example. Let the relation R(A, B, C) be
A    B    C
a0   b1   c1
a0   b2   c1
a0   b3   c2
a2   b1   c3
Using the earlier calculated entropy values (see Section 2.3) we get
\[ H(A\bar{C}) = 0 \quad\text{and}\quad H(\bar{A}\bar{B}C) = 0 \]
The entropy diagram is given below in Fig. 2.4.
[Fig. 2.4: entropy diagram for A, B, and C with the zero-entropy regions shaded]
In the above diagram, entropies in the shaded regions \(A\bar{B}\bar{C}\), \(AB\bar{C}\), and \(\bar{A}\bar{B}C\) are
zero. We can directly obtain the dependency function f.
\begin{align*}
f &= \bar{a}\bar{b}c + a\bar{b}\bar{c} + ab\bar{c} \\
  &= \bar{a}\bar{b}c + \bar{c}a
\end{align*}
The entropy of each region shows how much dependency exists between those
attributes. The more the dependency, the less the entropy. When the attributes
become fully dependent the entropy becomes zero. In section 2.4 we have shown
that a unique lattice can be constructed from the entropy diagram.
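One way to make the connection concrete is through the conditional entropy H(Y | X) = H(X + Y) − H(X), which is the entropy of the part of Y lying outside X and is zero exactly when X functionally determines Y. The sketch below assumes base 2 logarithms and equally likely tuples; the function names are ours.

```python
from collections import Counter
from math import log2


def entropy(rows, attrs):
    """Joint entropy (base 2) of the attributes over equally likely tuples."""
    counts = Counter(tuple(r[a] for a in attrs) for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


def conditional_entropy(rows, y, x):
    """H(Y | X) = H(X + Y) - H(X); zero when X determines Y."""
    return entropy(rows, list(x) + list(y)) - entropy(rows, x)


ROWS = [dict(zip("ABC", t)) for t in [
    ("a0", "b1", "c1"), ("a0", "b2", "c1"),
    ("a0", "b3", "c2"), ("a2", "b1", "c3"),
]]

print(round(conditional_entropy(ROWS, "A", "C"), 4))   # zero: C -> A
print(round(conditional_entropy(ROWS, "C", "AB"), 4))  # zero: AB -> C
print(round(conditional_entropy(ROWS, "C", "A"), 4))   # positive: A does not determine C
```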
2.3.2 Information in a Relation
We have attempted to define information in a relation and in a database which
is in Boyce-Codd normal form. Moreover, information in a lossless join is also
discussed. The starting point of a model for database performance is the measure
of the information stored in the database.
• Information, denoted as I, in a relation is the sum of entropies of all
dependent attributes taken individually.
For example, let us calculate the information in the following relation.
A    B    C
a0   b1   c1
a0   b2   c1
a0   b3   c2
a2   b1   c3
The dependencies in the above relation are AB → C and C → A. Entropies of
attributes A and C are given below.
\begin{align*}
H(A) &= -\tfrac{3}{4}\log\tfrac{3}{4} - \tfrac{1}{4}\log\tfrac{1}{4} = 2 - \tfrac{3}{4}\log 3 \\
H(C) &= -\tfrac{1}{2}\log\tfrac{1}{2} - \tfrac{1}{4}\log\tfrac{1}{4} - \tfrac{1}{4}\log\tfrac{1}{4} = \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{2} = \tfrac{3}{2}
\end{align*}
Information in this relation
\begin{align*}
I &= \text{Entropy of } A + \text{Entropy of } C \\
  &= H(A) + H(C) \\
  &= 2 - \tfrac{3}{4}\log 3 + \tfrac{3}{2} \\
  &= \tfrac{7}{2} - \tfrac{3}{4}\log 3
\end{align*}
The following features of information can be observed.
o The more the dependencies, the more the information.
o If there are no dependencies there is zero information.
o If fully dependent, it will have complete information.
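The information of the example relation can be computed as the sum of the individual entropies of its dependent attributes. In this sketch the dependent attributes A and C are supplied by hand from the FDs AB → C and C → A, logarithms are to base 2, and the function names are our own.

```python
from collections import Counter
from math import log2


def entropy(rows, attr):
    """Entropy (base 2) of a single attribute over equally likely tuples."""
    counts = Counter(r[attr] for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


def information(rows, dependent):
    """Sum of the entropies of the dependent attributes, taken individually."""
    return sum(entropy(rows, a) for a in dependent)


ROWS = [dict(zip("ABC", t)) for t in [
    ("a0", "b1", "c1"), ("a0", "b2", "c1"),
    ("a0", "b3", "c2"), ("a2", "b1", "c3"),
]]

info = information(ROWS, ["A", "C"])  # dependents of AB -> C and C -> A
print(round(info, 4))
```

The value agrees with 7/2 − (3/4) log 3, which is approximately 2.3113.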
In a relation which is in BCNF, we have at least one dependent attribute in each
projection. We take the marginal distribution corresponding to each attribute. Take
the dependent attributes in the key. Thus,
• Information in a database which is in BCNF can be defined as the sum
of informations in each relation (projection) in the decomposition.
Information in a database, i.e., the sum of informations in the projections, should be
equal to the information in the original relation. For example, consider the earlier
relation:
A B c ao bl CJ
ao b2 CJ
ao b:3 c2
(/2 bl C3
Here information $I = \tfrac{7}{2} - \tfrac{3}{4}\log 3$. The BCNF decomposition of the above relation
is given below.

ℜ(B, C)        ℜ(A, C)
B   C          A   C
b1  c1         a1  c1
b2  c1         a1  c2
b3  c2         a2  c3
b1  c3
Information in the relation ℜ(B, C) is $I_1 = \tfrac{3}{2}$ and information in the relation ℜ(A, C) is
$I_2 = 2 - \tfrac{3}{4}\log 3$. Thus the total information in the database is
\[
I = I_1 + I_2 = \tfrac{7}{2} - \tfrac{3}{4}\log 3 .
\]
Thus information in a relation is the same as information in its BCNF decomposition.
From the above results we can have a new definition of lossless decomposition
in terms of information.
• A set of projections of a relation is called a lossless decomposition of the
  relation if and only if the information in the join of the projections is the same
  as the information in the original relation. Also, being in BCNF, the join
  of this decomposition gives back the original relation.
From the properties of join, it can be seen that the information in a projection can
never be greater than the information in the original relation. Inconsistent
projections are those projections whose join does not give the original relation back.
Otherwise, they are consistent.
□
Now we will see how entropy diagrams can be used to find information in a
relation. Consider the earlier relation with dependencies AB → C and C → A.
The corresponding entropy diagram is shown in Fig. 2.5.
[Fig. 2.5: entropy diagram over the attributes A, B, and C]
Due to the dependency AB → C, the region corresponding to the set $\bar{A}\bar{B}C$ will have zero
entropy. Similarly, C → A will make the entropies of the regions $A\bar{B}\bar{C}$ and $AB\bar{C}$
zero. Let the entropies of the other regions be e1, e2, e3, and e4 as given in Fig. 2.5.
Then the entropy of each dependent attribute can be obtained by adding the
entropies of the regions which fall in the set corresponding to that attribute. Thus
Information I = Entropy of A + Entropy of C
[Fig. 2.6: entropy diagram over the attributes A, B, and C]
Consider another example of a relation having the dependencies A → B and
B → C. The corresponding entropy diagram is shown in Fig. 2.6.
Information I = Entropy of B + Entropy of C
= (e1 + e2) + (e1)
Even when we add the dependency A → C, we find that the entropy diagram
remains unchanged and the information remains the same. This is due to the
fact that A → C is not a new dependency, as it can be deduced from the original
dependencies by transitivity.
The next section explores the connection between lattices, boolean algebra,
and data dependencies. All the important terms are defined and explained. Some
of the definitions are altered from conventional ones to make understanding easier.
2.4 Lattices, Boolean Algebra, and Dependencies
This section deals with one of the most important structures in the theory of
relational databases. It is shown that lattices[9,25] and dependencies in relations
have a one-to-one correspondence and the study of these discrete structures can give
deeper insight into the database theory. Similarly, boolean algebra also plays an
important role here. In this section, the close connection between these structures
is studied in detail. All the necessary definitions are provided with examples as the
chapter proceeds.
2.4.1 Lattices and Dependencies
In this section, lattices are visualized in different ways by giving equivalent
definitions. Known results of matrix theory, data dependencies, etc. are used
to arrive at the results. Before going into these details, some basic terms have to
be defined.
• A digraph[46] is said to be nonsymmetric if
  edge (x, y) and edge (y, x) ⟺ (x = y),
  i.e., there are no symmetric edges other than the loops at every node. A
  nonsymmetric, transitive graph is called a partial order.
The reduction of a partial order can be obtained by removing the loops and transitive
edges. A digraph which is a partial order is given in Fig. 2.7.a and its reduction is
given in Fig. 2.7.b. All the edges are assumed to have downward arrows.
Fig. 2.7.a Fig. 2.7.1>
A partial order can be defined in a different way as follows.
• Let S be a set of elements a, b, c, .... Then a relation θ of partial order
  over S is any dyadic relation over S which is:
  i) reflexive: for every x ∈ S, xθx;
  ii) antisymmetric: if xθy and yθx then x = y;
  iii) transitive: if xθy and yθz then xθz.
The set on which the partial order relation is defined is called a partially ordered set.
For example, let S be the set of natural numbers and let the relation be ≤. It can
be easily seen that the relation ≤ is a partial order. The relation ⊆ over the set S
of sets A, B, ... also defines a partial order.
• Let S be a partially ordered set of elements a, b, .... If a < b and if there
  is no element x ∈ S such that a < x < b then b is said to cover a.
Consider the following example. Let S be the set of proper nonempty subsets
of {a, b, c}, i.e.,
S = {{a},{b},{c},{a,b},{a,c},{b,c}}.
Let set inclusion be the ordering relation. The corresponding partial order is given
in Fig. 2.8.
[Fig. 2.8: partial order with nodes D, E, F on the upper level and A, B, C on the lower level]
Here D covers B and A, F covers B and C, and so on. Consider another example.
Let S be the set of nontrivial factors of 12 and let the ordering relation be a | b
(a divides b). The partial order is given in Fig. 2.9. Here 4 covers 2, and 6 covers 2 and 3.
4 6
2 3
Fig. 2.9
Let xθy be one element in the partial order relation θ. If x and y are
interchanged for all x and y, the converse of θ is obtained. It is denoted by θ'.
* If a set S is partially ordered by the relation θ then it is also partially ordered
  by θ'.
The graph corresponding to the set having relation θ' can be obtained by reversing
the direction of the arrows in the graph of θ. In the above example, the converse of
the relation a | b will be b | a.
• A node is a descendant or successor of a set of nodes if there is a path
  from all of them to that node. A node is an ascendant or ancestor of a set
  of nodes if there is a path from that node to all other nodes in that set.
  A descendant is called a greatest lower bound or simply g.l.b. of a set of
  nodes if there is a path from that node to every other descendant of that
  set. An ascendant is called a least upper bound or l.u.b. of a set of nodes if
  there is a path from all the ascendants of that set to that particular node.
For a set of nodes, the g.l.b. and l.u.b. are unique when they exist. The l.u.b. of a and b
is denoted by a + b and the g.l.b. by a · b or simply ab. Consider the following
reduced partial order.
[Fig. 2.10: reduced partial order with nodes A (top); B, C, D; E, F, G; and H (bottom)]
Successors of A = {B, C, D, E, F, G, H}, B = {E, F, H}, C = {E, F, G, H}, and so
on. Ancestors of H = {A, B, C, D, E, F, G}, E = {B, C, A}, F = {B, C, A}, and so
on. G.l.b. of {A,B} = B, {A,C,D} = G, {B,C,D} = H, and so on. L.u.b.
of {H,E} = E, {H,E,G} = C, {G,F,B} = A. Note that there is no g.l.b.
for {B, C} since its successors are {E, F, H} and none of them has both of the
others as successors. Similarly, there is no l.u.b. for {E, F}.
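The g.l.b. and l.u.b. values listed above can be recomputed mechanically. The sketch below assumes a set of covering edges for Fig. 2.10 that is reconstructed from the successor and ancestor lists in the text (the figure itself is not reproduced here), and it treats the order as reflexive so that a node can be the bound of a set containing it. The names `EDGES`, `below`, `above`, `glb`, and `lub` are illustrative.

```python
# Covering edges ASSUMED for Fig. 2.10, inferred from the successor and
# ancestor lists in the text; arrows point downward.
EDGES = {
    "A": ["B", "C", "D"],
    "B": ["E", "F"],
    "C": ["E", "F", "G"],
    "D": ["G"],
    "E": ["H"],
    "F": ["H"],
    "G": ["H"],
    "H": [],
}


def successors(node):
    """All nodes reachable from `node` by a nonempty downward path."""
    seen, stack = set(), list(EDGES[node])
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(EDGES[cur])
    return seen


def below(node):
    """The node together with its successors (reflexive order)."""
    return successors(node) | {node}


def above(node):
    """The node together with its ancestors."""
    return {n for n in EDGES if node in below(n)}


def glb(nodes):
    """Common lower bound that reaches every other common lower bound."""
    common = set.intersection(*(below(n) for n in nodes))
    tops = [c for c in common if common <= below(c)]
    return tops[0] if tops else None


def lub(nodes):
    """Common upper bound that is reached from every other common upper bound."""
    common = set.intersection(*(above(n) for n in nodes))
    bottoms = [c for c in common if common <= above(c)]
    return bottoms[0] if bottoms else None
```

With these edges, `glb(["B", "C"])` and `lub(["E", "F"])` both return `None`, which is why this partial order is not a lattice.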
• A partial order in which every pair of nodes has an l.u.b. and a g.l.b. is called
  a lattice. The reduction of a lattice is called a Hasse diagram. The l.u.b. and g.l.b.
  of all nodes in a lattice are called the full element (Ω) and the null element (φ or ⊥)
  respectively. A node with exactly one outgoing edge is called a prime node.
  Similarly, a node with exactly one incoming edge is called a primitive node.
Normally, a lattice is drawn with the full element at the top and the null element
at the bottom. All the arrows point downwards. A lattice can therefore be drawn
without showing any arrows and without any ambiguity. The partial orders given
below in Fig. 2.11.a and Fig. 2.11.b are lattices. We will refer to them as the "pentagon
lattice" and the "box lattice" respectively. Later, we will be making use of these lattices
often to explain different concepts and algorithms.
[Fig. 2.11.a (pentagon lattice) and Fig. 2.11.b (box lattice); node labels include A, B, C, D, E, F, H, I, J, and φ]
Here A, B, C, E, F, H, I are prime nodes and A, B, C, G, H, I are primitive nodes.
But the partial order given earlier in Fig. 2.10 is not a lattice since nodes E, F do
not have an l.u.b. and, similarly, nodes B, C do not have a g.l.b.
• As defined earlier, a saturated set is a set of attributes which does not
  determine any attribute outside it. Generators are those saturated sets
  whose intersections generate all the saturated sets except the universal
  set.
Consider the dependencies AB → C, C → A. The saturated sets here are ABC,
AC, B, A, φ. The generators are A, B, AC. Note that the universal set can always be
included when we take the intersection of elements in the generator set.
□
In the results which follow, x·y means g.l.b.(x, y) and x + y means l.u.b.(x, y).
The universal element is denoted by 1 and the null element by 0. In a lattice, the
following are true[25].
o 1·x = x            1 + x = 1
o 0·x = 0            0 + x = x
o xy = yx            x + y = y + x
o x(yz) = (xy)z      x + (y + z) = (x + y) + z
o x(x + y) = x       x + xy = x
o xx = x             x + x = x
Thus in set theoretic terms, a lattice is a partially ordered set in which every
pair of elements has a unique g.l.b. and l.u.b. within the set. In this thesis, only
finite lattices are dealt with. The following are some of the important types of
lattices.
o Boolean lattices
o Modular lattices
o Complemented lattices
o Distributive lattices
• Let S be a set of n elements. Then the set of all subsets of S forms a
  boolean lattice if l.u.b.(x, y) = x ∪ y and g.l.b.(x, y) = x ∩ y.
Boolean lattices are of special interest to us and are discussed in later sections.
Examples of boolean lattices are given in Fig. 2.12.a and Fig. 2.12.b.
[Fig. 2.12.a and Fig. 2.12.b: boolean lattices; node labels include A, B, C, AB, and ABC]
• In a distributive lattice, x · (y + z) = x · y + x · z. In a modular lattice,
  whenever x ≥ y, x · (y + z) = x · y + x · z. If for each element x there is
  a complement y such that x · y is the null element and x + y is the full
  element, then the lattice is a complemented lattice.
* Every distributive, complemented lattice is a boolean lattice.
Completion Function and its Dual
Now, we define a function called the completion function and its dual, the depletion
function[45,46]. It is shown that these functions define a lattice completely.
Databases and Lattices . --~--------------------
• A function C : 211 ----+ subset of 211 having the followin.e; properties
i) C(U) 2 U
ii) C(C(U)) = C(U)
iii) c(n) = n iv) U ~ V:::} C(U) ~ C(V)
is called a . completion function.
41
Let n be the set of attributes {A, B, C} or simply, ABC. Let the dependencies
between these attributes be AB ----+ C and C ----+ A. Define the completion of a set
as follows. Keep adding all those attributes which are determined by this set until
no more addition is possible. Thus
C(¢) = ¢, C(A) = A, and C(B) = B since ¢, A, and B does not
determine any attribute outside it.
C( C) = AC since C determines A.
C(AB) = ABC since AB cletennines C.
C(AC) = AC since AC does not determine B.
C(BC) =ABC since C----+ A and thus BC----+ A.
C(ABC) =ABC
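The "keep adding until nothing changes" rule translates directly into a small fixed-point loop. The following is a minimal sketch, assuming dependencies are given as pairs of attribute sets; the names `FDS`, `OMEGA`, `completion`, `subsets`, `saturated_sets`, and `label` are illustrative and not from the original text.

```python
from itertools import combinations

# Functional dependencies as (left side, right side) pairs of attribute sets.
FDS = [(frozenset("AB"), frozenset("C")), (frozenset("C"), frozenset("A"))]
OMEGA = frozenset("ABC")


def completion(attrs, fds=FDS):
    """Keep adding attributes determined by the set until nothing changes."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closed and not rhs <= closed:
                closed |= rhs
                changed = True
    return frozenset(closed)


def subsets(universe):
    """All subsets of the universe, smallest first."""
    items = sorted(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def saturated_sets(universe=OMEGA, fds=FDS):
    """Range of the completion function over the power set."""
    return {completion(s, fds) for s in subsets(universe)}


def label(attr_set):
    """Compact label such as 'AC'; the empty set is written as φ."""
    return "".join(sorted(attr_set)) or "φ"
```

For example, `label(completion("BC"))` gives `ABC`, and the labels of `saturated_sets()` are φ, A, B, AC, and ABC.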
It can be easily verified that the above function satisfies all the properties of the
completion function. Similarly, the dual of the completion function can be defined as
follows.
• A function D : 2^Ω → subset of 2^Ω having the following properties
  i) D(U) ⊆ U
  ii) D(D(U)) = D(U)
  iii) D(φ) = φ
  iv) U ⊇ V ⇒ D(U) ⊇ D(V)
  is called a depletion function.
Let Ω be the same set as in the previous example with the same set of
dependencies. Define the depletion of a set as follows. Keep removing attributes from that set
which are determined by the attributes outside it until no more steps are possible.
D(φ) = φ
D(A) = φ since A is determined by C.
D(B) = B since AC does not determine B.
D(C) = φ since AB → C.
D(AB) = B since C → A.
D(AC) = AC since B does not determine either A or C.
D(BC) = BC since A does not determine either B or C.
D(ABC) = ABC
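The removal rule can be sketched in the same style as the completion rule. The code below is an illustrative reading of the procedure described above, with dependencies given as pairs of attribute sets; `FDS`, `OMEGA`, and `depletion` are hypothetical names introduced for the example.

```python
# Functional dependencies as (left side, right side) pairs of attribute sets.
FDS = [(frozenset("AB"), frozenset("C")), (frozenset("C"), frozenset("A"))]
OMEGA = frozenset("ABC")


def depletion(attrs, universe=OMEGA, fds=FDS):
    """Repeatedly remove attributes determined by attributes outside the set."""
    kept = set(attrs)
    changed = True
    while changed:
        changed = False
        outside = universe - kept
        for lhs, rhs in fds:
            removable = rhs & kept
            if lhs <= outside and removable:
                kept -= removable
                changed = True
                break  # recompute `outside` before looking further
    return frozenset(kept)
```

Each removed attribute joins the outside set, so it can help determine further attributes on the next pass; this is why the loop restarts after every removal.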
It can be easily verified that the above function satisfies all the conditions. Notice
that each element in the range of the depletion function can be obtained by taking
the complement of an element in the range of the completion function with respect to
Ω. For the above example,
Range(C) = {φ, A, B, AC, ABC}
Range(D) = {ABC, BC, AC, B, φ}.
* The range of a completion function is precisely the set of saturated sets and
  it defines a lattice.
The physical meaning of C(A) = B, where C is the completion
function, is that B is the maximal attribute set functionally determined by A. The
largest superset determined by A is obviously a saturated set. From the axioms of
the completion function, it is clear that the members of the range of the completion
function are the saturated sets. In the above example with dependencies AB → C
and C → A, the completed sets are φ, A, B, AC, and ABC, which are precisely the
saturated sets. The corresponding lattice can be obtained by drawing a directed
edge from set A to set B if A ⊇ B. Thus we have the lattice given in
Fig. 2.13.
[Fig. 2.13: pentagon lattice on the saturated sets ABC, AC, A, B, and the null element]
□
Similarly, the range of the depletion function gives the dual of the set of saturated
sets. Here the dual of a set means its complement with respect to the full set. In the
above example, the dual of A is BC, the dual of AC is B, and so on. The depleted sets for
the above example are ABC, BC, AC, B, and φ. The dual of each member gives
a saturated set and thus a lattice corresponding to the given relation. Note that
these sets by themselves are saturated sets. But they form the dual lattice. Thus
* From a given set of dependencies, the corresponding lattice can be
  constructed using the completion function.
Consider another set of dependencies C → A, D → B, AD → C, BC → D.
The saturated sets here can be calculated using the completion function.
C(φ) = φ, C(A) = A, C(B) = B, C(C) = AC, C(D) = BD,
C(AB) = AB, C(AC) = AC, C(AD) = ABCD, C(BC) = ABCD, C(BD) = BD,
C(CD) = ABCD,
C(ABC) = C(ABD) = C(ACD) = C(BCD) = C(ABCD) = ABCD.
The saturated sets here are φ, A, B, AB, AC, BD, ABCD and the corresponding
lattice is given below in Fig. 2.14.
[Fig. 2.14: box lattice on the saturated sets ABCD, AB, AC, BD, A, B, and the null element]
* The completion function partitions the power set of a set of attributes into
  equivalence classes with each class having a unique maximal element. The
  set of these maximal elements is the same as the set of saturated sets.
For the first example above the equivalence classes are {φ}, {A}, {B}, {C, AC}, and
{AB, BC, ABC} and the maximal elements are φ, A, B, AC, and ABC. Similarly,
for the second example, the equivalence classes are {φ}, {A}, {B}, {AB}, {C, AC},
{D, BD}, and {AD, BC, CD, ABC, ABD, ACD, BCD, ABCD} having φ, A, B, AB, AC,
BD, and ABCD as the maximal elements. Similar to the above result we have the
following result.
* The depletion function partitions the power set of a set of attributes into
  equivalence classes having a unique minimal element. The set of these
  minimal elements forms the dual of the set of saturated sets.
The equivalence classes for the first example are {φ, A, C}, {B, AB}, {AC}, {BC}, and
{ABC} and the minimal elements are φ, B, AC, BC, and ABC. They directly
define the pentagon lattice, given in Fig. 2.13. Note that to obtain the original
lattice we have to take the dual of the above sets. For the second example we have
{φ, A, B, C, D, AB, AD, BC}, {AC, ABC}, {BD, ABD}, {CD}, {ACD}, {BCD},
and {ABCD} as the partitioned classes and φ, AC, BD, CD, ACD, BCD, and
ABCD as the minimal elements. The lattice is given in Fig. 2.15.
[Fig. 2.15: lattice on the depleted sets ABCD, ACD, BCD, CD, AC, BD, and the null element]
Consider a relation having no dependencies. The range of the completion
function will be the power set of Ω and we will obtain a boolean lattice. Thus
we have the following result.
* Every boolean lattice corresponds to a relation having no dependencies.
Note that every n-attribute relation having no dependencies will have
\[
\mathrm{DEG}(\Re) = \prod_{i=1}^{n} \mathrm{CARD}(D_i),
\]
where $D_i$ is the domain of the attribute $A_i$.
Completion Graph and its Dual
Here the graph corresponding to the completion function is defined. It is further
shown that every completion graph can be used to represent a lattice.
• A graph with sets as node labels and having 2^n nodes, where n is the
  number of distinct attributes occurring in these labels, is called a completion
  graph if it has the following properties.
  o The out-degree of every node is exactly one.
  o (P1 → P2) ⇒ (P1 ⊆ P2)
  o (P1 → P2) ⇒ (P2 → P2)
  o ((P1 → P3 and P2 → P4) and (P1 ⊆ P2)) ⇒ (P3 ⊆ P4)
  where P1 → P2 means that there is an edge from node P1 to node P2 in
  the graph.
The right hand side of the second condition is equivalent to P1 ∩ P2 = P1. The
physical meaning of each edge P1 → P2 in the completion graph is that C(P1) = P2.
The following is an example of a completion graph.
[Fig. 2.16: a completion graph; node labels include AB, BC, A, B, C, and φ]
It is obvious that if we collect the nodes with loops and draw edges from sets to their
subsets we obtain the lattice. On the other hand, if the lattice is given we can find
the dependencies and then obtain the completed sets and the equivalence classes,
which in turn give the completion graph. Similarly, a depletion graph can also
be defined as follows, and it can be shown that every depletion graph represents a
lattice.
• A graph with sets as node labels and having 2^n nodes, where n is the
  number of distinct attributes occurring in these labels, is called a depletion
  graph if it has the following properties.
  o The out-degree of every node is exactly one.
  o (P1 → P2) ⇒ (P1 ⊇ P2)
  o (P1 → P2) ⇒ (P2 → P2)
  o ((P1 → P3 and P2 → P4) and (P1 ⊆ P2)) ⇒ (P3 ⊆ P4).
Here, the interpretation of the symbol → is the same as in the completion graph.
The following figure illustrates the above concept.
[Fig. 2.17: a depletion graph; node labels include AB, AC, BC, A, and C]
Notice that if we complement each node label in the completion graph, we obtain
the depletion graph.
Boolean Matrices and Lattices
We now proceed to define a closed triangular matrix and some other related
matrices and to show that each one of them can be used to define a lattice.
• A boolean matrix is a matrix with boolean literals as elements. A boolean
  matrix with the following properties is called a closed triangular matrix:
  o all the elements in the first row are 1s
  o all diagonal elements are 1s
  o rows are closed under direct product
For example, consider the following boolean matrix.
\[
A = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 \\
0 & 1 & 1 & 0 & 1 \\
0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]
The above matrix A is a closed triangular matrix since all elements in the first row
are 1s, all diagonal elements are 1s, and the direct product (bit by bit boolean
product) of any two rows is again a row in A.
* Every closed triangular matrix represents a lattice completely and vice
  versa.
First we show, by example, how to obtain the corresponding closed triangular
matrix from a lattice. Consider the box lattice given in Fig. 2.14. Construct a 7 × 7
matrix in which the node labels of the lattice form both row-headings and
column-headings. If a column-heading is a subset of a row-heading, the corresponding
matrix element is 1, otherwise it is 0. Row-headings and column-headings start with
the largest label, continuing with the next largest and so on, following the lexical
order within each group. Then the resulting matrix will be a closed triangular
matrix. Note that this matrix is the same as the adjacency matrix of the partial
order of the lattice. In this example we have

K =
       ABCD  AB  AC  BD  A  B  φ
ABCD     1    1   1   1  1  1  1
AB       0    1   0   0  1  1  1
AC       0    0   1   0  1  0  1
BD       0    0   0   1  0  1  1
A        0    0   0   0  1  0  1
B        0    0   0   0  0  1  1
φ        0    0   0   0  0  0  1
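The construction of K and the three defining properties can be checked with a few lines of code. This is a minimal sketch under the stated conventions (rows and columns ordered from the largest label down, entry 1 when the column heading is a subset of the row heading); `LABELS`, `adjacency_matrix`, and `is_closed_triangular` are names introduced only for this illustration.

```python
# Saturated sets of the box lattice, largest first, lexical order within a size.
# The empty string stands for the null set φ.
LABELS = ["ABCD", "AB", "AC", "BD", "A", "B", ""]


def adjacency_matrix(labels):
    """K[i][j] = 1 when column heading j is a subset of row heading i."""
    return [[int(set(col) <= set(row)) for col in labels] for row in labels]


def is_closed_triangular(matrix):
    """Check the three listed properties of a closed triangular matrix."""
    n = len(matrix)
    first_row_ones = all(matrix[0])
    diagonal_ones = all(matrix[i][i] for i in range(n))
    rows = {tuple(r) for r in matrix}
    closed = all(
        tuple(a & b for a, b in zip(r1, r2)) in rows
        for r1 in matrix
        for r2 in matrix
    )
    return first_row_ones and diagonal_ones and closed


K = adjacency_matrix(LABELS)
```

The bitwise product of the rows for AB and AC is the row for A, which mirrors the fact that the g.l.b. of AB and AC in the box lattice is A.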
It is easy to see that a closed triangular matrix represents a lattice. The first row
of the matrix corresponds to the full element Ω. That the rows are closed under
bitwise product implies that every pair of elements has a unique g.l.b. A partial
order in which every pair of elements has a unique g.l.b. and the full element
Ω exists is a lattice. Now, the Hasse diagram of the lattice can be obtained as
follows. Draw nodes corresponding to each row of the matrix. If one row belongs
to another (that is, its 1s form a subset of the 1s of the other), then draw an edge
from the node corresponding to the latter row to the
node corresponding to the former row. Consider the following example.
\[
A = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 \\
0 & 1 & 1 & 0 & 1 \\
0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]
The graph corresponding to the above matrix is given in Fig. 2.18.
[Fig. 2.18: graph of the partial order defined by the matrix A]
By taking the transitive reduction we obtain the Hasse diagram.
□
Let n be the number of nodes, p the number of prime nodes and q the number
of primitive nodes in a lattice. We define the following matrices.
• A-matrix: an n × q matrix in which the node labels of the lattice form
  row-headings and primitive node labels form column-headings. If a
  row-heading is a subset of a column-heading the corresponding matrix element
  is 1, otherwise it is 0.
• B-matrix: a q × p matrix in which the primitive node labels of the
  lattice form row-headings and prime node labels form column-headings. If
  a column-heading is a subset of a row-heading the corresponding matrix
  element is 0, otherwise it is 1.
• C-matrix: a p × n matrix in which the node labels of the lattice form
  column-headings and prime node labels form row-headings. If a
  row-heading is a subset of a column-heading the corresponding matrix element
  is 1, otherwise it is 0.
• D-matrix: an n × n matrix in which the node labels of the lattice form
  row-headings and column-headings. If a column-heading is a subset of a
  row-heading the corresponding matrix element is 0, otherwise it is 1.
For our box lattice

A =
       AB  AC  BD
ABCD    0   0   0
AB      1   0   0
AC      0   1   0
BD      0   0   1
A       1   1   0
B       1   0   1
φ       1   1   1

B =
     AC  BD  A  B
AB    1   1  0  0
AC    0   1  0  1
BD    1   0  1  0

C =
     ABCD  AB  AC  BD  A  B  φ
AC     1    0   1   0  0  0  0
BD     1    0   0   1  0  0  0
A      1    1   1   0  1  0  0
B      1    1   0   1  0  1  0

D =
       ABCD  AB  AC  BD  A  B  φ
ABCD     0    0   0   0  0  0  0
AB       1    0   1   1  0  0  0
AC       1    1   0   1  0  1  0
BD       1    1   1   0  1  0  0
A        1    1   1   1  0  1  0
B        1    1   1   1  1  0  0
φ        1    1   1   1  1  1  0
Using these four boolean matrices the following results can be derived.
* The matrices A, B, and C are related by the equations
  \[
  CA = \bar{B}^{T}, \qquad AB = \bar{C}^{T}, \qquad BC = \bar{A}^{T},
  \]
  where bar represents complementation and T represents transposition.
We present an informal proof of the above equations.
The (i, j)th element of matrix CA is 1 if node i is a subset of node k and node k
is a subset of node j for some k. That is, node i is a subset of node j. And by
definition of matrix B, the (i, j)th element of $\bar{B}^{T}$ is 1 if node i is a subset of node j.
The (i, j)th element of matrix AB is 1 if node i is a subset of node k and node k is
not a superset of node j for some k. That is, node i is not a superset of node j. And
by definition of matrix C, the (i, j)th element of $\bar{C}^{T}$ is 1 if node j is not a subset
of node i.
We can prove $BC = \bar{A}^{T}$ in a similar fashion. The result can be easily verified for
our example.
* The matrices A, B, C, and D are related by the equations
  \[
  ABC = A\bar{A}^{T} = \bar{C}^{T}C = D.
  \]
The proof becomes clear if we notice that AB generates the columns of D
corresponding to prime nodes. When AB is multiplied by C, the entire D is generated.
The above result can be easily verified for our example.
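The verification for the box lattice can be carried out mechanically. The sketch below builds the four matrices from their definitions and checks the relations between them, read here as CA = complement of Bᵀ, AB = complement of Cᵀ, BC = complement of Aᵀ, and ABC = D under the boolean (OR of ANDs) matrix product. The helper names are illustrative only.

```python
NODES = ["ABCD", "AB", "AC", "BD", "A", "B", ""]  # "" is the null element φ
PRIMITIVE = ["AB", "AC", "BD"]                     # exactly one incoming edge
PRIME = ["AC", "BD", "A", "B"]                     # exactly one outgoing edge


def subset(x, y):
    return set(x) <= set(y)


def table(rows, cols, entry):
    """Boolean matrix with entry(row heading, column heading) in each cell."""
    return [[int(entry(r, c)) for c in cols] for r in rows]


A = table(NODES, PRIMITIVE, lambda r, c: subset(r, c))
B = table(PRIMITIVE, PRIME, lambda r, c: not subset(c, r))
C = table(PRIME, NODES, lambda r, c: subset(r, c))
D = table(NODES, NODES, lambda r, c: not subset(c, r))
K = table(NODES, NODES, lambda r, c: subset(c, r))  # adjacency matrix


def bool_product(x, y):
    """Boolean matrix product: OR over k of x[i][k] AND y[k][j]."""
    return [[int(any(a and b for a, b in zip(row, col))) for col in zip(*y)]
            for row in x]


def complement(x):
    return [[1 - v for v in row] for row in x]


def transpose(x):
    return [list(col) for col in zip(*x)]
```

The check confirms the relations for this one lattice only; it is not a substitute for the informal proof given above.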
* Each of the matrices A, B, C, and D defines the lattice completely.
The proof follows from the methods of construction of the lattice given below. The
lattice can be constructed from the A-matrix as follows. If the bitwise addition of the ith
and jth rows results in the jth row then draw an edge from node i to node j. The
lattice can be constructed from the B-matrix as follows. Take the bitwise addition of
the columns of B to produce as many columns as possible. Take the complement
of the resulting matrix and transpose it to get the A-matrix. The lattice can be
constructed from the C-matrix as follows. If the bitwise addition of the ith and jth columns
results in the ith column then draw an edge from node i to node j. Taking the
complement of the D-matrix gives the adjacency matrix K of the lattice.
Now we move on to define other representations of lattices.
Generator Set and Armstrong Relation
Instead of the set of saturated sets, if we are given the generator set the
corresponding lattice can be obtained. Saturated sets can be obtained by taking the
intersection of elements in the generator set and adding the universal element.
Thus
* A generator set represents a lattice completely.
For example, consider the generator set {AC, A, B}. By taking the intersections we
get the pentagon lattice. Now we consider another representation of generator sets.
* Any finite subset of natural numbers represents a unique lattice and vice
  versa.
Let us illustrate this with the help of an example. Let G = {1, 3, 4} be the finite
subset of natural numbers. Write each element in binary form. Then we have
G = {001, 011, 100}. Then assign an attribute to each position and write a bar if
there is a 0. Thus G = {$\bar{A}\bar{B}C$, $\bar{A}BC$, $A\bar{B}\bar{C}$}. We can then obtain the generator set
{AB, A, BC} by collecting the complemented part of each element. Once we have
the generator set we obtain the set of saturated sets {φ, A, B, AB, BC, ABC}. Note
that we have to add the full set ABC to obtain the saturated sets. Thus we have
the following lattice.
[Fig. 2.19: lattice on the saturated sets ABC, AB, BC, A, B, and the null element]
It is obvious from the above example that for every finite subset of natural numbers
we will have a unique lattice. From the lattice the generator sets and in turn the
subset of natural numbers also can be obtained.
□
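The passage from numbers to saturated sets can be sketched as follows. The code assumes three attributes A, B, C assigned to the binary positions from left to right, as in the example with G = {1, 3, 4}, and assumes every number fits in that many bits; the function names are illustrative.

```python
from itertools import combinations


def generators_from_numbers(numbers, attrs="ABC"):
    """Read each number in binary; the positions holding 0 form one generator."""
    width = len(attrs)
    gens = set()
    for n in numbers:
        bits = format(n, f"0{width}b")
        gens.add(frozenset(a for a, bit in zip(attrs, bits) if bit == "0"))
    return gens


def saturated_from_generators(gens, attrs="ABC"):
    """Close the generators under intersection and add the full set."""
    sets = set(gens) | {frozenset(attrs)}
    changed = True
    while changed:
        changed = False
        for x, y in combinations(list(sets), 2):
            if x & y not in sets:
                sets.add(x & y)
                changed = True
    return sets


def label(attr_set):
    """Compact label such as 'AB'; the empty set is written as φ."""
    return "".join(sorted(attr_set)) or "φ"
```

For G = {1, 3, 4} the generators come out as AB, A, and BC, and closing them under intersection adds B and φ.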
A lattice can also be represented by a class of sets. This becomes trivial from
the above discussion. Thus
* Any class of closed sets represents a lattice.
The property of being closed under intersection assures us of a g.l.b. Moreover, if we add the
full set to it we obtain the set of saturated sets. The following is an example of
such a class.
{a, c}, {a}, {b}, φ
The above class will give us the pentagon lattice and it corresponds to the dependencies
C → A, AB → C.
□
Now we will construct a table called an Armstrong relation which can represent
the corresponding lattice uniquely. From the generator set we write the binary form
of each element to obtain a unique number. The following example illustrates this
in detail. Let the generator set be {A, B, AC}. For A we will write 011 (0 for
position A, 1 for B and C), for B we have 101, and for AC we have 010. Thus we
obtain the numbers 3, 5, and 2 respectively. Then construct the table as follows.
A  B  C
0  0  0
0  2  0
0  3  3
5  0  5
Note that the first row corresponds to the full element. This table has exactly the
same dependencies we started out with. Thus
* An Armstrong relation gives saturated sets and dependencies, and thus
  the lattice directly.
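The table construction and the claim about its dependencies can be checked with a short sketch. It assumes the encoding used in the example (0 for attributes in the generator, the generator's number elsewhere); `armstrong_rows` and `holds` are names introduced for illustration.

```python
ATTRS = "ABC"


def armstrong_rows(generators, attrs=ATTRS):
    """One all-zero row for the full set, then one row per generator."""
    rows = [tuple(0 for _ in attrs)]
    for gen in generators:
        bits = "".join("0" if a in gen else "1" for a in attrs)
        code = int(bits, 2)  # e.g. generator A gives 011, i.e. 3
        rows.append(tuple(0 if a in gen else code for a in attrs))
    return rows


def holds(rows, lhs, rhs, attrs=ATTRS):
    """True when the functional dependency lhs -> rhs holds on the rows."""
    seen = {}
    for row in rows:
        key = tuple(row[attrs.index(a)] for a in lhs)
        val = tuple(row[attrs.index(a)] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


ROWS = armstrong_rows(["A", "B", "AC"])
```

Each generator row agrees with the all-zero row exactly on the attributes of that generator, which is what makes the generators show up as saturated sets of the table.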
Commutative Idempotent Monoid
This section gives another representation for lattices. We need some definitions
first.
• A set S with an operation ·, denoted by (S, ·), is called a commutative
  idempotent monoid if for any three elements x, y, and z the following is
  true:
  i) x · (y · z) = (x · y) · z
  ii) x · e = x where e is the unit element
  iii) x · x = x
  iv) x · y = y · x
It can be easily verified that the set {ABC, BC, B, A, φ} with the operation ∩ is a
commutative idempotent monoid.
* Every commutative idempotent monoid uniquely represents a lattice.
The above result follows directly from the fact that idempotence
guarantees loops and transitive edges in the corresponding graph and commutativity
removes symmetry. Thus we obtain a partial order with the unit element in it,
which is a lattice. From the pentagon lattice we can construct the above given set
directly. To demonstrate the above result consider the following table.
 ·  | e   e1  e2  e3  e4
 e  | e   e1  e2  e3  e4
 e1 | e1  e1  e2  e4  e4
 e2 | e2  e2  e2  e4  e4
 e3 | e3  e4  e4  e3  e4
 e4 | e4  e4  e4  e4  e4
It can be seen that the above multiplication table corresponds to a commutative
idempotent monoid. From the table, the lattice can be constructed as follows.
o Draw nodes for each element in the set.
o Draw an edge from node i to node j if the entry corresponding to the ith row
  and jth column is j; otherwise do not draw an edge at all.
o Remove the loops and reduce the graph to obtain the Hasse diagram.
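The three steps above can be sketched in code. The table is the one given earlier, with e taken as the unit element; `product`, `is_commutative_idempotent_monoid`, and `hasse_edges` are illustrative names, and the transitive reduction is done by brute force, which is adequate for a five-element example.

```python
ELEMENTS = ["e", "e1", "e2", "e3", "e4"]
TABLE = {
    "e":  ["e",  "e1", "e2", "e3", "e4"],
    "e1": ["e1", "e1", "e2", "e4", "e4"],
    "e2": ["e2", "e2", "e2", "e4", "e4"],
    "e3": ["e3", "e4", "e4", "e3", "e4"],
    "e4": ["e4", "e4", "e4", "e4", "e4"],
}


def product(x, y):
    """Entry of the multiplication table in row x and column y."""
    return TABLE[x][ELEMENTS.index(y)]


def is_commutative_idempotent_monoid(unit="e"):
    els = ELEMENTS
    return (
        all(product(x, product(y, z)) == product(product(x, y), z)
            for x in els for y in els for z in els)
        and all(product(x, unit) == x for x in els)
        and all(product(x, x) == x for x in els)
        and all(product(x, y) == product(y, x) for x in els for y in els)
    )


def hasse_edges():
    """Edge i -> j when i·j = j; loops and transitive edges are dropped."""
    order = {(i, j) for i in ELEMENTS for j in ELEMENTS
             if i != j and product(i, j) == j}
    return {(i, j) for (i, j) in order
            if not any((i, k) in order and (k, j) in order for k in ELEMENTS)}
```

The resulting edges form two chains from e down to e4, one of length three and one of length two, which is the shape of the pentagon lattice.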
We will now see how entropy diagrams can be used to represent a lattice.
Entropy Diagram
Every lattice can be represented by a corresponding entropy diagram. Consider
the following entropy diagram.
[Fig. 2.20: entropy diagram over the attributes A, B, and C with shaded regions]
The dependencies of the relation corresponding to the diagram can be obtained by
collecting all boolean terms of the shaded regions. And thus we arrive at
\[
f = \bar{a}\bar{b}c + \bar{c}a.
\]
The saturated sets of the relation can be obtained as follows. Collect all sets
corresponding to the unshaded regions and the outside region. Collect the complemented
part of each term to obtain the set of saturated sets. For the above example, the sets
are $ABC$, $\bar{A}BC$, $A\bar{B}C$, $\bar{A}B\bar{C}$, and $\bar{A}\bar{B}\bar{C}$. Taking the complemented parts we get
φ, A, B, AC, and ABC as the saturated sets. And in turn we obtain the pentagon
lattice. Thus
* An entropy diagram can represent a lattice completely and vice versa.
In the following part we see the connection between boolean algebra and
dependencies.
2.4.2 Boolean Algebra and Dependencies
The close connection between Boolean algebra[40] and relational database
theory is established in this section. Moreover, Boolean algebra is shown to be
useful in deriving some of the database results[24,28,42]. All the algorithms are
explained with the help of examples.
• A set B with two operations, denoted by (B, +, ·), is called a Boolean algebra
  if the elements of B satisfy the following properties.
  o x + y = y + x                  x · y = y · x
  o x · (y + z) = x · y + x · z    x + (y · z) = (x + y) · (x + z)
  o x + 0 = x                      x · 1 = x
  o ∃y ∋ x + y = 1                 x · y = 0
  where 0 and 1 are called the additive and multiplicative identities respectively.
Now we define some terms which will be used later.
• A dependency term is a product of two or more boolean variables in which
  exactly one variable appears uncomplemented. A unate term is a product
  of boolean variables in which all variables appear complemented. A horn
  term is either a dependency term or a unate term.
For example, $\bar{a}\bar{b}c$ is a dependency term and $\bar{a}\bar{b}$ is a unate term. Let us see how we
can obtain the corresponding lattice from the dependency function. Consider the
same pentagon lattice discussed earlier. The saturated sets are φ, A, B, AC, and
ABC. Write $abc$ for φ, $\bar{a}bc$ for A, $a\bar{b}c$ for B, $\bar{a}b\bar{c}$ for AC, and $\bar{a}\bar{b}\bar{c}$ for ABC. Taking
the complement of the sum of these terms we obtain the dependency function f as
\[
f = ab\bar{c} + \bar{a}\bar{b}c + a\bar{b}\bar{c} = a\bar{c} + \bar{a}\bar{b}c
\]
which straight away gives the functional dependencies C → A and AB → C. Consider
the box lattice given in Fig. 2.14. The saturated sets here are φ, A, B, AB,
AC, BD, and ABCD. So we have
\[
\bar{f} = abcd + \bar{a}bcd + a\bar{b}cd + \bar{a}\bar{b}cd + \bar{a}b\bar{c}d + a\bar{b}c\bar{d} + \bar{a}\bar{b}\bar{c}\bar{d}
\]
and the dependency function, the sum of the remaining nine minterms,
\[
f = ab\bar{c}d + abc\bar{d} + \bar{a}bc\bar{d} + a\bar{b}\bar{c}d + ab\bar{c}\bar{d}
  + \bar{a}\bar{b}\bar{c}d + \bar{a}\bar{b}c\bar{d} + \bar{a}b\bar{c}\bar{d} + a\bar{b}\bar{c}\bar{d}
  = a\bar{c} + b\bar{d} + \bar{a}c\bar{d} + \bar{b}\bar{c}d
\]
which gives us the corresponding functional dependencies C → A, D → B, AD → C,
and BC → D.
In the same fashion, if the dependencies are given, we can obtain the
corresponding lattice as follows. Take the minterms in the complement of the dependency
function. Collect the complemented part of each term to obtain the saturated sets.
Thus
* From the dependency function, a unique lattice can be obtained and vice
  versa.
Consider a relation having no dependencies. Using the completion function we saw
that this relation corresponds to a boolean lattice. The same result can be seen to
be true if we consider the dependency function. Since the complement of the dependency
function of this relation will have 2^n minterms, where n is the number of attributes, we will have
2^n subsets of Ω as the saturated sets, which will correspond to a boolean lattice.
Prime Implicants
The prime implicants of the dependency function give the entire set of keys
along with the simple dependencies which are implied by the given set of dependencies.
Consider the following relation
A   B   C
a1  b1  c1
a1  b2  c1
a1  b3  c2
a2  b1  c3
having the dependencies AB → C and C → A. The corresponding dependency
function is $\bar{a}\bar{b}c + \bar{c}a$. The prime implicants $\bar{a}\bar{b}$, $\bar{b}\bar{c}$, and $\bar{c}a$ can be obtained by adding
$\bar{a}\bar{b}\bar{c}$ to the dependency function. Thus we obtain the horn function
\[
h = \bar{a}\bar{b} + \bar{b}\bar{c} + \bar{c}a.
\]
The three terms in the horn function represent all the dependencies in that relation,
i.e., AB → C, BC → A, and C → A. But the dependency BC → A is a logical
consequence of the smaller dependency C → A. The fully complemented terms in
the horn function, $\bar{a}\bar{b}$ and $\bar{b}\bar{c}$, give the keys AB and BC. Any relation having these
dependencies will have AB and BC as its keys. Thus
* A lattice can be uniquely represented by the corresponding horn function.
Consider another example of a relation corresponding to the box lattice, having
dependencies C → A, D → B, AD → C, and BC → D. We have the dependency
function
$f = \bar{c}a + \bar{d}b + \bar{a}\bar{d}c + \bar{b}\bar{c}d$.
Adding the fully complemented term $\bar{a}\bar{b}\bar{c}\bar{d}$, we obtain the horn function
$h = \bar{c}a + \bar{d}b + \bar{a}\bar{d}c + \bar{b}\bar{c}d + \bar{a}\bar{b}\bar{c}\bar{d}$
$\;\; = \bar{a}\bar{d} + \bar{b}\bar{c} + \bar{c}\bar{d} + \bar{c}a + \bar{d}b$.
The keys of a relation having the above set of dependencies are AD, BC, and CD.
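The keys read off from the fully complemented prime implicants can be cross-checked independently. The sketch below does not perform boolean minimization; it assumes the standard closure-based definition of a key (a minimal attribute set whose closure is the whole attribute set), and the function names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closed and not set(rhs) <= closed:
                closed |= set(rhs)
                changed = True
    return frozenset(closed)

def keys(universe, fds):
    """Minimal attribute sets whose closure is the whole universe."""
    full = frozenset(universe)
    found = []
    # Candidates are visited by increasing size, so any superset of an
    # already found key can be rejected as non-minimal.
    for r in range(1, len(universe) + 1):
        for combo in combinations(sorted(universe), r):
            cand = frozenset(combo)
            if closure(cand, fds) == full and not any(k <= cand for k in found):
                found.append(cand)
    return ["".join(sorted(k)) for k in found]

pentagon = [("C", "A"), ("AB", "C")]
box = [("C", "A"), ("D", "B"), ("AD", "C"), ("BC", "D")]

print(keys("ABC", pentagon))  # ['AB', 'BC']
print(keys("ABCD", box))      # ['AD', 'BC', 'CD']
```

Both results agree with the keys obtained from the horn functions of the two examples.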
Legitimate joins
In the first section, the terms legitimate join and lossless decomposition are
defined. We will now discuss an algorithm to check whether a given join is legitimate
or not for a set of dependencies. In other words, the following algorithm checks
whether a join dependency is a valid one for a relation or not. We will explain this
algorithm using the previous relation having the dependencies C → A and
AB → C. Let the attribute names be $X_1$, $X_2$, and $X_3$ instead of A, B, and C. Given
the dependencies $X_1X_2 \to X_3$ and $X_3 \to X_1$, the problem is to check whether the
join $X_1X_3 * X_2X_3$ is legitimate. The algorithm has the following steps.
o Write the dependency function as
$f = \bar{x}_1\bar{x}_2x_3 + x_1\bar{x}_3$
o Take the complement of the dependency function f and call it g. The fully
complemented minterm $\bar{x}_1\bar{x}_2\bar{x}_3$ is left out, so that g has four minterms here. Thus
$g = x_1x_2x_3 + \bar{x}_1x_2x_3 + x_1\bar{x}_2x_3 + \bar{x}_1x_2\bar{x}_3$
o Multiply the function g by the sum of the attributes in each projection of the join
and name the products $g_{13}$, $g_{23}$, and $g_3$ (where $g_3$ corresponds to the common
part of the projections).
For the projections $X_1X_3$ and $X_2X_3$,
$g_{13} = g(x_1 + x_3)$
$\;\; = x_2x_3 + x_1x_3$
$\;\; = x_1x_2x_3 + \bar{x}_1x_2x_3 + x_1\bar{x}_2x_3$
$g_{23} = g(x_2 + x_3)$
$\;\; = x_1x_2x_3 + \bar{x}_1x_2x_3 + x_1\bar{x}_2x_3 + \bar{x}_1x_2\bar{x}_3$
$g_3 = g(x_3)$
$\;\; = x_2x_3 + x_1x_3$
$\;\; = x_1x_2x_3 + \bar{x}_1x_2x_3 + x_1\bar{x}_2x_3$
o Take the Hamming weight of g and of the products (the Hamming weight of
a function is the number of minterms in it). Let the Hamming
weight of g be $H$, that of $g_{13}$ be $H_{13}$, that of $g_{23}$ be $H_{23}$, and that of $g_3$ be
$H_3$. For this example we have $H = 4$, $H_{13} = 3$, $H_{23} = 4$, and $H_3 = 3$. We
say that if the Hamming weight equation $H = H_{13} + H_{23} - H_3$ is satisfied,
the given join is legitimate. Note that this Hamming weight equation is a
direct observation obtained from the inclusion-exclusion principle.
The following are some of the corollaries:
* If we consider any other set of dependencies which is a superset of the
above set, the same join will again be legitimate.
Now let us check if the join $X_1X_2 * X_2X_3$ is legitimate.
$g_{12} = x_1x_2x_3 + \bar{x}_1x_2x_3 + x_1\bar{x}_2x_3 + \bar{x}_1x_2\bar{x}_3$
$g_2 = x_1x_2x_3 + \bar{x}_1x_2x_3 + \bar{x}_1x_2\bar{x}_3$
The Hamming weights are $H = 4$, $H_{12} = 4$, $H_{23} = 4$, and $H_2 = 3$. In this case
the Hamming weight equation $H = H_{12} + H_{23} - H_2$ is not satisfied. Therefore,
$X_1X_2 * X_2X_3$ is not a legitimate join.
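The Hamming weight test can be sketched in a few lines. The sketch rests on two assumptions read off from the worked example rather than stated as general theory: g is taken as the saturated sets other than the full attribute set (four minterms here), and a minterm survives multiplication by a projection when at least one attribute of the projection is uncomplemented, that is, when the saturated set does not contain the whole projection. All function names are illustrative, and A, B, C stand for $X_1$, $X_2$, $X_3$.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closed and not set(rhs) <= closed:
                closed |= set(rhs)
                changed = True
    return frozenset(closed)

def g_minterms(universe, fds):
    """Saturated sets other than the full attribute set (assumed minterms of g)."""
    out = []
    for r in range(len(universe)):  # r < n leaves out the full set
        for combo in combinations(universe, r):
            s = frozenset(combo)
            if closure(s, fds) == s:
                out.append(s)
    return out

def weight(g, projection):
    """Hamming weight of g multiplied by the sum of the projection's variables."""
    proj = frozenset(projection)
    if not proj:
        return 0  # an empty sum of variables is the zero function
    return sum(1 for s in g if not proj <= s)

def is_legitimate(universe, fds, p1, p2):
    """Check H = H(p1) + H(p2) - H(common part) for a two-way join."""
    g = g_minterms(universe, fds)
    common = set(p1) & set(p2)
    return len(g) == weight(g, p1) + weight(g, p2) - weight(g, common)

fds = [("AB", "C"), ("C", "A")]
g = g_minterms("ABC", fds)
print(len(g), weight(g, "AC"), weight(g, "BC"), weight(g, "C"))  # 4 3 4 3
print(is_legitimate("ABC", fds, "AC", "BC"))  # True
print(is_legitimate("ABC", fds, "AB", "BC"))  # False
```

The two calls reproduce the verdicts of the text: $X_1X_3 * X_2X_3$ satisfies the equation and $X_1X_2 * X_2X_3$ does not.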
Boolean Algebra and MVDs
In this section, the representation of MVDs in terms of boolean algebra is discussed.
The MVD ABC →→ DE in the relation R(A, B, ..., X) can be represented by the
entropy equation
$H(\bar{A}\bar{B}\bar{C}(D + E)(F + \cdots + X)) = 0$
(The last factor consists of all those variables which have not appeared in the
statement of the MVD.) The dependency can be stated as: the variables {DE | ABC} and
{F ... X | ABC} are independent. In other words, the variables {D, E} and {F, ..., X}
are independent when {A, B, C} is given. Further, the entropy of every minterm of
$\bar{A}\bar{B}\bar{C}(D + E)(F + \cdots + X)$
is zero. This follows from a slight generalization of the fact that
This implies that $\{A_1, A_2, \cdots, A_r\}$ and $\{A_{r+1}, \cdots, A_n\}$ are independent and hence
the entropy of every minterm of
is zero. From the above statement it follows that the boolean function
$\bar{A}\bar{B}\bar{C}(D + E)(F + \cdots + X)$ can represent an MVD. To fix these ideas we give a simple example
of a relation where the MVD B →→ C holds and we show that $H(\bar{B}AC) = 0$. The
relation is shown below.
Rent A    Occupant B    Pet C
50        Nair          Lion
50        Mehta         Deer
50        Mehta         Snake
75        Taneja        Mouse
500       Mathur        Fish
500       Menon         Dog
500       Menon         Rhino
600       Chhibber      Cat

Assigning the probability of 0.125 to each one of the tuples occurring in the table,
we get
so that
H(A + B + C) = 3
H(A + B) = 5/2
H(B + C) = 3
H(B) = 5/2
$H(\bar{B}AC)$ = H(A + B) + H(B + C) - H(A + B + C) - H(B)
= 5/2 + 3 - 3 - 5/2 = 0
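The four entropies can be recomputed directly from the table. The sketch below assumes, as in the text, that each of the eight tuples has probability 0.125 and that H(A + B) denotes the joint entropy of columns A and B in bits; the helper name `entropy` is illustrative.

```python
from collections import Counter
from math import log2

rows = [
    (50, "Nair", "Lion"),
    (50, "Mehta", "Deer"),
    (50, "Mehta", "Snake"),
    (75, "Taneja", "Mouse"),
    (500, "Mathur", "Fish"),
    (500, "Menon", "Dog"),
    (500, "Menon", "Rhino"),
    (600, "Chhibber", "Cat"),
]
A, B, C = 0, 1, 2  # column positions: Rent, Occupant, Pet

def entropy(columns):
    """Joint entropy in bits of the given columns, all tuples equally likely."""
    counts = Counter(tuple(row[i] for i in columns) for row in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())

h_abc = entropy([A, B, C])
h_ab = entropy([A, B])
h_bc = entropy([B, C])
h_b = entropy([B])

# Entropy of the term with B complemented: information shared by A and C given B.
h_term = h_ab + h_bc - h_abc - h_b
print(h_abc, h_ab, h_bc, h_b, h_term)  # 3.0 2.5 3.0 2.5 0.0
```

The last value is zero, which is the entropy form of the statement that A and C are independent once B is given.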
From the above results we conclude that any set of dependencies can be represented
by a boolean function. For example, if R(A, B, C) has the dependencies A →→ B
and C → A in it, we can write the boolean function as
$f = \bar{a}bc + \bar{c}a$
Boolean Dependencies
We have seen that boolean algebra plays an important role in defining dependencies.
But we notice that many boolean functions, like $\bar{A}\bar{B}CDE$, have not yet
been defined. Here we attempt to describe the dependencies corresponding to these
boolean terms. We call this kind of dependency a boolean dependency. The
corresponding dependency for the boolean term $\bar{A}\bar{B}CDE$ is given by AB → CDE. Now,
since we know that the entropy of dependencies is equal to zero, the entropy of the
above boolean term is $H(\bar{A}\bar{B}CDE) = 0$. Expanding the left hand side, we get
H(A + B + C) + H(A + B + D) + H(A + B + E) - H(A + B + C + D) - H(A +
B + C + E) - H(A + B + D + E) + H(A + B + C + D + E) - H(A + B) = 0.
Or,
H(A + B + C + D) + H(A + B + C + E) + H(A + B + D + E) + H(A + B)
= H(A + B + C + D + E) + H(A + B + C) + H(A + B + D) + H(A + B + E)
On looking at the relation again, we see that the above system gives the natural join
equation
ABCD * ABCE * ABDE = ABCDE
Thus we find that the given boolean dependency can be represented as the join of
particular relations. Now, corresponding to the zero boolean function, i.e., a minterm with
all terms uncomplemented, as in ABC for instance, we find there are many join
dependencies which imply all relations whose join does not give the original relation,
i.e., there is a class of joins for which the boolean dependency is zero.
Adding the zero boolean dependency does not, therefore, alter the original
dependency, and corresponding to a boolean function we thus find a class of joins. So,
corresponding to every minterm there is an equivalence class of joins, i.e., every one
of the joins gives this minterm.
In the next section, normal forms of relations are discussed. We have
concentrated on the most important form, Boyce-Codd Normal Form (BCNF), and
algorithms are given to decompose a relation into BCNF.
2.5 Normal Forms
Data normalization is one of the most important concepts in the design
aspect of databases. It is important to keep the relations in normalized forms for
better storage and retrieval of data. There are many algorithms available in the
literature[5,6,17,21,22,26]. In this section we discuss these normal forms. Moreover,
algorithms are given to decompose a relation into BCNF which are based on the
lattice and boolean algebra concepts discussed in earlier sections.
• A relation is said to be in first normal form (1NF) if the domain of
every attribute in the relation is a simple set. If the determinant of every
nonprime attribute is a superkey, then the relation is in second normal
form (2NF). If the determinant of every nonprime attribute is a key, then
the relation is said to be in third normal form (3NF). A relation is in
Boyce-Codd normal form (BCNF) if every determinant is a key.
Note that in relational database theory we consider relations in 1NF only. The
other two important normal forms related to MVDs are defined below.
• A relation is in fourth normal form (4NF) if every MVD U →→ V that
holds in it is a logical consequence of the set of key dependencies of the schema. A
relation is in projection-join normal form (PJNF), sometimes called fifth
normal form (5NF), if every JD is a logical consequence of the set of key
dependencies of the relation.
Now we discuss BCNF and give two algorithms to decompose a relation into
BCNF.
2.5.1 Boyce-Codd Normal Form
In this section we discuss this particular normal form in detail and give some
algorithms to decompose a relation into BCNF[16]. A procedure is given to decompose
a relation having only functional dependencies into BCNF. This procedure is
based on the boolean algebra theory discussed earlier.
Algorithm 1:
First, we take an example to illustrate the decomposition procedure. Consider
a relation in a database in which there are four attributes {A, B, C, D} and the following
dependencies: C → A, D → B, AD → C, BC → D. Write down the dependency
function f and reduce it to the minimal boolean function. In this case
$f = \bar{c}a + \bar{d}b + \bar{a}\bar{d}c + \bar{b}\bar{c}d$.
To facilitate the decomposition it is necessary to list all the keys of the relation.
As we have seen earlier, these are the fully complemented terms among the prime
implicants of the function obtained by adding to f the fully complemented term in
all the attributes $X_i$ of the relation. Note that if all the terms in the prime
implicants are keys then the relation is already in BCNF. The keys of the above
relation are AD, BC, and CD.
The BCNF decomposition can be obtained as follows. Choose one of the smallest
terms in f. If there is more than one term with the same number of attributes,
then choose the term which has the lexically smallest dependent attribute. Here we
choose CA. Delete all the terms in f having all the attributes of this term. Thus
$f = \bar{d}b + \bar{b}\bar{c}d$.
Delete all the terms from the list of keys having the dependent attribute of the
chosen term in it. Here we delete AD. Now choose the next smallest term BD
and delete $\bar{b}\bar{c}d$ from f and BC from the keys. Since there are no terms left in f,
list out the terms which are left in the keys and all the chosen terms to obtain the
decomposition. Thus a BCNF of the above relation will be CD, AC, BD. It is clear
from the above procedure that we cannot choose a term in f which has a smaller
dependency within it. Thus all the chosen terms having, say, k attributes, will have
determinants of size k - 1 and hence are in BCNF.
From the above example, the general procedure for obtaining a BCNF decomposition
of a relation can be given as follows:
o From the minimal dependency function list all terms.
o From the prime implicants list all keys.
o Choose one of the smallest terms in the given dependency function, making
use of lexical order in case of conflicts.
o Delete all the terms in the dependency function having all the attributes
of the chosen term.
o Delete all the terms in the list of keys having the dependent attribute of
the chosen term.
o Repeat the above steps until no term is left in the dependency function.
o List out all terms which are left in the keys along with the chosen terms.
o Remove all terms which are subsets of other terms listed to obtain a
BCNF.
Consider another example of a relation which has the following set of
dependencies: C → A and AB → C. Note that this corresponds to the pentagon lattice.
We have $f = \bar{c}a + \bar{a}\bar{b}c$. Adding the fully complemented term $\bar{a}\bar{b}\bar{c}$, we obtain the keys
AB and BC. By the third step of the above algorithm, choose the smallest term
AC. When we choose this term from f and delete AB from the list of keys, we
exhaust all terms of f and we obtain the BCNF decomposition AC and BC.
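The steps of Algorithm 1 can be sketched as a short routine. This is an illustrative reading of the procedure, not a definitive implementation: each term of the minimal dependency function is assumed to be given as a (determinant, dependent attribute) pair, the keys are assumed to be known already, and the name `bcnf_algorithm1` is hypothetical.

```python
def bcnf_algorithm1(terms, keys):
    """BCNF decomposition from dependency-function terms and the list of keys.

    terms: (determinant, dependent) pairs, e.g. ("C", "A") for C -> A.
    keys:  key strings, e.g. "AD".
    """
    terms = [(frozenset(det), dep) for det, dep in terms]
    keys = [frozenset(k) for k in keys]
    chosen = []
    while terms:
        # Smallest term first; ties go to the lexically smallest dependent attribute.
        det, dep = min(terms, key=lambda t: (len(t[0]) + 1, t[1]))
        picked = det | {dep}
        chosen.append(picked)
        # Delete every term that has all the attributes of the chosen term.
        terms = [(d, a) for d, a in terms if not picked <= d | {a}]
        # Delete every key that has the dependent attribute of the chosen term.
        keys = [k for k in keys if dep not in k]
    listed = set(keys) | set(chosen)
    # Remove terms that are proper subsets of other listed terms.
    return {t for t in listed if not any(t < other for other in listed)}

box = bcnf_algorithm1(
    [("C", "A"), ("D", "B"), ("AD", "C"), ("BC", "D")], ["AD", "BC", "CD"]
)
pentagon = bcnf_algorithm1([("C", "A"), ("AB", "C")], ["AB", "BC"])

print(sorted("".join(sorted(p)) for p in box))       # ['AC', 'BD', 'CD']
print(sorted("".join(sorted(p)) for p in pentagon))  # ['AC', 'BC']
```

Both runs reproduce the decompositions of the worked examples: CD, AC, BD for the box lattice and AC, BC for the pentagon lattice.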
Algorithm 2:
Now we give a procedure for BCNF decomposition based on lattice theory. A
relation is said to be in reduced form if all the equivalent attributes are removed.
For example, in a relation with dependencies A → B and B → A, we retain only one
attribute between A and B. For this algorithm we consider only relations in reduced
form. As in the first algorithm, we take an example to illustrate the decomposition
procedure. Consider a relation in a database in which there are four attributes
{A, B, C, D} and the following dependencies: C → A, D → B, BC → D, AD → C.
The lattice which represents the above dependencies is shown in Fig. 2.21.a. The
nodes of the Hasse diagram are designated by the corresponding saturated sets
and all the edges in the diagram are assumed to have arrows pointing downwards.
We saw in earlier sections that the saturated sets are closed under intersection and
they by themselves completely define the lattice.
To facilitate the decomposition it is necessary to relabel the Hasse diagram
as follows. Starting from the topmost node, from each node label remove all the
attributes which occur in the labels of its children. The resultant diagram is shown
in Fig. 2.21.b.
[Fig. 2.21.a: Hasse diagram with nodes labeled by the saturated sets. Fig. 2.21.b: the same diagram after relabeling, with node labels C, D, A, and B.]
After relabeling, every node with outdegree one will have exactly one distinct
attribute. Other nodes may or may not have a label. Now remove all unlabeled nodes
except the topmost node by maintaining all existing paths. Here, the middle node
can be removed by drawing edges from the topmost node to node A and node B.
Since there are no paths through the lowermost node, it can be removed without
drawing any edges. For obtaining the decomposition, all the transitive edges are to
be removed. The resultant partial order is shown in Fig. 2.22.
[Fig. 2.22: the resultant partial order, with C and D below the topmost node, A below C, and B below D.]
Using Fig. 2.22 it is easy to decompose the relation into BCNF. For each node,
collect the labels of its children and add them to its label and list them to obtain
the BCNF. Thus a decomposition for our example is CD, AC, BD. Note that we
need not include single attributes in the BCNF. The projection corresponding to
the topmost node CD is a key of the relation, hence obviously in BCNF. In every
other projection with two attributes there will be one determinant, which is the key
of that particular relation. In all other projections with more than two attributes,
there will be two determinants: one is the parent node label and the other is the
set of all its children which determines the parent node. Hence all the projections
obtained will be in BCNF.
From the above example, the general procedure for the decomposition of a
relation can be given as follows:
o From the saturated sets of the relation, draw the Hasse diagram of the
lattice.
o Starting from the topmost node, from each node label remove all the
attributes which occur in the labels of its children. Redraw the Hasse
diagram.
o Remove all unlabeled nodes except the topmost node, by maintaining all
existing paths.
o Remove all transitive edges to obtain a partial order.
o List the projections corresponding to each node after adding all the labels
of its children to its label to obtain a BCNF decomposition.
Finally, if we observe that in each projection the key is in its parent projection,
it becomes clear that the join of our decomposition will always give the original
relation back. The decomposition is obviously unique.
It can be easily verified that this algorithm works for the pentagon lattice too.
After relabeling and reducing the Hasse diagram we get the following partial order.
[Fig. 2.23: the resultant partial order for the pentagon lattice, with C and B below the topmost node and A below C.]
For each node, collecting the labels of its children, we obtain the Boyce-Codd
Normal Form decomposition as BC and AC.
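The lattice-based procedure can also be sketched in code. The sketch assumes a relation in reduced form, takes the saturated sets to be the sets equal to their own attribute closure, and uses set inclusion among the remaining nodes to stand for "maintaining all existing paths"; the children of a node are then the maximal remaining nodes strictly below it, which is the diagram without transitive edges. All function names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closed and not set(rhs) <= closed:
                closed |= set(rhs)
                changed = True
    return frozenset(closed)

def saturated_sets(universe, fds):
    """Sets that equal their own closure (the nodes of the Hasse diagram)."""
    sets = []
    for r in range(len(universe) + 1):
        for combo in combinations(universe, r):
            s = frozenset(combo)
            if closure(s, fds) == s:
                sets.append(s)
    return sets

def maximal_below(node, nodes):
    """Children of node: nodes strictly below it with nothing in between."""
    below = [t for t in nodes if t < node]
    return [t for t in below if not any(t < u for u in below)]

def bcnf_algorithm2(universe, fds):
    nodes = saturated_sets(universe, fds)
    top = frozenset(universe)
    # Relabel: remove from each node the attributes in its children's labels.
    label = {s: s - frozenset().union(*maximal_below(s, nodes)) for s in nodes}
    # Remove unlabeled nodes except the topmost one.
    kept = [s for s in nodes if label[s] or s == top]
    projections = set()
    for s in kept:
        # Add the labels of the children in the reduced partial order.
        proj = label[s].union(*(label[c] for c in maximal_below(s, kept)))
        if len(proj) > 1:  # single attributes are not listed
            projections.add(proj)
    return projections

box = bcnf_algorithm2("ABCD", [("C", "A"), ("D", "B"), ("BC", "D"), ("AD", "C")])
pentagon = bcnf_algorithm2("ABC", [("C", "A"), ("AB", "C")])
print(sorted("".join(sorted(p)) for p in box))       # ['AC', 'BD', 'CD']
print(sorted("".join(sorted(p)) for p in pentagon))  # ['AC', 'BC']
```

The results agree with Figs. 2.22 and 2.23: CD, AC, BD for the box lattice and BC, AC for the pentagon lattice.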
In this chapter we have seen how relational database theory is closely connected
with lattices. We have also given two algorithms to find a BCNF decomposition of
a relation. Though we have considered only reduced relations in the case of the lattice
algorithm, we can extend this algorithm to accommodate nonreduced relations as
well. From the discussion it should be clear that these discrete structures can be
extensively used in the study of databases.