Databases and Lattices - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/16656/8/08_chapter 2.pdf · For example, a Maruti 800 car with a registration number KGX 6018 is an entity

2

Databases

and Lattices

Over the past few years relational databases have becc>me a standard field of study

for researchers in computer science and this theory enjoys considerable mathemati

cal support. In this chapter, we have attempted to give interesting results in a con

ceptually simple fashion and the main topics in the theory of relational databases

have. been discussed in detail with examples. The main body of the chapter is

divided into five sections:

* Relational Model

* Data Dependencies

* Entropy and Information

* Lattices, Boolean Algebra, and Dependencies

* Normal Forms.

2.1 Relational Model

In this section, the basic: definitions of relational model of databases are given.

Most of the definitions are conventional ones but given in a. simple form. Some

Databases and Lattices 6

definitions are new. An attempt has been made to include all the fundamental

ideas in relational database theory and examples are provided wherever necessary

to illustrate the concepts.

A datamodel[13,30,38,56,60J is primarily used for modelling a logical database

in a DBMS. There are three data models proposed and available in the literature

the relationalmodel[12,15,17-19], the network model[14,58], and the hierarchical

moclel[59]. In this thesis we consider the relational model conceived by Codd in

1970. The relational model has many advantages over the other models mainly

because of its simplicity and sound theoretical background. The model rests on the

well developed mathematical theory of relations and the first-order logic[28,31]. In

data modelling, the method stems from a perception of the real world through finite~

objects. The objects are essentially of three kinds: entities, their attributes, and

the relationships among entities. Thus attributes destribe entities which in turn

are associated by relationships.

• An entity is a physical or abstract object that exists and cau be distin

guished from other objects.

For example, a Maruti 800 car with a registration number KGX 6018 is an entity

since it uniquely identifies one particular car. By contrast, a book on databases

published in 1982 is not an example of an entity since the features do not identify

any particular book.

• Properties that are to represent an entity in the underlying database are

called attrib·u.te.~.

For example, for an entity named car there may be two attributes: lVIodel and

Registration Number associated with it. It is important to distinguish between

DatabiiSes and Lattices 7

attributf' name and attribute value. For the attribute name model then~ may b(~

attribute values like Maruti 800, PrPmier 118NE, Standard 2000, de ..

Before going into fttrther definitious let us tevit-'W the uotations used m this

chapter. For universal set of attrilmtes, i.e., set of all attrilmt<-~s iu a rdatiou, \\W

use n. We use U, V, TV with or without subscripts for suhsds of n. X, Y, Z

with or without subscripts stand for variable attrilmte wmws aud A, B, C, · · ·

with or without subscripts stand for specific attribute names. But. wlwu we discuss

dependencies in terms of concepts in boolean algebra, sma.lll<"tt<-~rs iustt-'ad of capital

letters are used for attribute names .

• The univer$C of a relational database, derwted by n. is a finit<;, nonempty

set of attributes AI' ... ' An, i.e., n = {AI' ... ' All}.

The dom.ain of an attribute Aj written as DOM(Aj ), is a fiuit.IJ set of a.ttrilmte

values of Aj which must be of the same datatype. A domaiu is a .~im.plc .>d if all its

elements are atomic (i.e., nmHlecomposable by the mHkrlyiug DBMS). Nonatomic

attributes occur in the theory of nested relational dataha.ses[35,52]. In relational

databases only atomic attributes are considered. Domain of a subset U of D, denoted

by DOM(U), is the union of the domains of all attributes in U, i.<'.,

DOM(U) = U DOM(Ak) AkEU

where DOM( <P) = </J.

• A relation R(D) is a subset of a cartesian product. x(D 1 , · • ·• D 11 ) where

Thus a relation can lw represented by a functioi1 with domain D f = x ( D 1 , • • ·, D 11 )

and range Rt = {0,1}. Ld that function he P(r/1 , · • ·,r/11 ) when-' diED;. P takes

the value 1 if ( d1, · · · , d 11 ) E !R aucl takes the valnt-~ 0 otlwrwis('. Sp(~~a.kiug informally,

a relationinrelational database theory can bevisnalizcdas a L1blewitlwa.ch column


heading as attribute name and each row as an element of that relation. Under each

attribute name there are attribute values of the same datatype. Cardinality of R

or CARD(?R) is the number of rows in the table and degree of ?R or DEG(?R) is the

number of columns. For example, a relation BOOK may be defined as follows.

Title Author ISBN Type Future Shock Alvin Toffier 03199-11 Fiction

Bhagavad Gita S. B. Varma 34-5691-2 Philosophy One Richard Bach 94770-08 Fiction

The 'JEXbook D. E. Knuth 201-13448-9 Technical A Suitable Boy Vikram Seth 0-12-2318-7 Fiction

Power Shift Alvin Toffier 04729-15 Fiction Bhagavad Gi ta S .. Radhakrishnan 01-23-4568-9 Philosophy

A relation in table form has the following properties:

o There is no duplication of column names since these names are the at

tributes in the set U.

o There is no duplication of rows since 1'j is the set of tuples which occur in

the domain of the function P.

o Row and column orders are insignificant. When columns are permuted for

any reason, all the attribute values corresponding to it are also permuted.

o All attribute values are atomic.

0

• Let U be a nonempty subset of n, R(U) be a relation over U, and V be a

nonempty subset of U. The projection of R onto V is the relation over V .

consisting of the V -value of each tuple in R, i.e.,

R[V] = {lt[V]IV ~ U and 1r E R(U)}.

In other words, a projection of a relation is obtained by deleting certain columns

and by removing duplicate rows. In the above example BOOK is a relation over


U ={Title, Author, ISBN, Ty}H~}. The projection of BOO!\ owr V

Type } will be

{ Author,

AnthoT Typr: Alvin Toffier Fiction S. B. Vanna Philosophy

Richard Bach Fiction D. E. Knuth Tedmic·al Vikra.m Sdh Fiction

s. Ra.dll<t.krislm<ul Philosophy

Note that every projection is also a relation over a smaller sd, of attributes.

• Let 2R 1(U) and 2R2 (V) be two relations. The 11:atnml join (or simply

join)[20] of Rt and R2 written as *(2R1, 2R2) is the relation with a.ttributc~s

from u ( v - ( u n v)) consisting of all tuples, such that for ea.dl tuple ,. in

the join then' c~xists some tuple s 1 in 2R 1 and s2 in 'R2 satisfying r·[U] = SJ

and r· [VJ = s2.

\Vhen U and V are disjoint, a join of R1 and 2R2 will lwc·onw the cross product of

R1 and R2~ As an Pxa.mple of join, consider the following relations \R1 a.nd ~R2:

A B c D B E F 4 a.b 15 20 a.b 20 pq 2 a. a 18 25 and ac 25 pr 8 a.c 9 10

Then,

A B c D E F 4 ab 15 20 20 pq 8 ac 9 10 25 pr


• A decomposition of a relation )R into its projections R1, · · ·, Rn is losslc.~.~

if )R can be recow;truded by joining tlH~se projections, i.(~.,

For example, consider the relation )R and its projections R1, Rz, and R1.

A B c al bl CJ al bz CJ al ba Cz az bl ca

A B A c SR3 B c al bl al C] b) C]

al bz al cz hz ,cl a] ba az ca ba Cz

q,z bl bl ca

Here the decomposition of R into R2 and R1 is losslf'ss since *(R2 , R:d = !R: Bnt

the decomposition R 1 , !R1 is a los8y decomposition since *(R1 , R1 ) #- R.

In the next section, various data dependencies in a relation are discussed. Terms

like keys, superkeys, etc. are defined and explained.

2.2 Data Dependencies

Data dependencies are constraints imposed on data in a database. In addition

to a set of attributes, a set of data dependencies is also an essential part of a

relation or database scheme. The class of functional dependencies was the first

type of data dependencies to be studied and has since been thoroughly investigated

by many researchers including Cocld[17], who proposed the concept, Delobel and

Casey[24], who developed a set of inference rules, and Annstrong[3], who gave a

set of inference rules which are independent, sound, and complete. The relevance

of functional dependencies in the design of a relational database scheme has been

based on the observation that, in most cases, a set of functional depeiHl(~nci(-~S plus


one join dependency are enough to expre:-;s the dependency structure of a rda.tional

databa..<;e scheme. Also, they are essential for the assumption of the universal rela

tion scheme[29,39]. Multivalned depenclenci(~S were independ(~ntly introduced by

Fagin[27], Zaniolo[63], and Delobel[23]. This class of dependencies is a generaliza

tion of the class of functional dependencies since nmltivalued dependencies provide

(many-to-many) relations, where as functional dependencies provid(~ only (many

to-one) functions. This section discusses data dependencies briefly. All the terms

are defined and illustrated with examples.

2.2.1 Functional Dependencies

An: attribute A functionally depends on another attribute B if, in the relation,

whenever B-value becomes equal, A-value also becomes equal. More formally:

• Let n be the universe and U be a subset of n. A Fu'nctional Dependency,

abbreviated as ·FD, is a constraint on U and i:-; of the form V -:--+ Hf wh(~re,

V and lV are subsets of U. Relation R(U) satisfies FD V-:--+ W, or V-:--+ lV

holds in R(U), if for every two tuples in R(U), say 1· and s, whenever their

V -values are identical, their lV -values are also the same, i.e.,

(r[V] = s[V]):::} (1·[lV] = s[H!])

In the relation BOOK defined earlier, we see that the attribute ISBN determines ev

ery other attri~)ute, A 1dhor and Titlr: together determine:-; ISBN, An thor determines

Type, and Title determines Type. The following relation

A B c a.o bl C]

a.o bz C]

a.o b3 cz O.z bl c3

has the dependencies AB-:--+ C and C -:--+ A. It can be verified that whenever the A-


values and B-values coincide, the corresponding C-values also coincide. Similarly,

whenever the C-values coincide, the corresponding A-values also coincide.

Inference rules for FDs

* REFLEXIVITY:

(V ~ U) =} (U --+ V)

i.e., for attributes A, B, and C the FD ABC --+ AB is always true. FD's of this

kind are said to be trivial. Here ABC is an abbreviation for {A, B, C} and similarly

AB for {A, B}.

* AUGMENTATION:

Special cases: When U3 = </J, U1 U1 --+ U"-. \iVhen U3 = U4 , U1 U4 --+ U2U4 . When.

u3 = u4 = u1' u1 u1 --+ u1 u2 or simply, u] --+ u1 u2. * TRANSITIVITY:

(U--+ V and V--+ l¥) =} (U--+ l¥)

The following inference rules are deduced from the above three rules.

* PSEUDO-TRANSITIVITY:

If U3 is replaced with <P the transitivity rule is obtained. This rule is a generalisation

of transitivity.

* UNION or ADDITIVITY:

(U --+ V and U --+ W) =} (U --+ Vl¥)

* DECOMPOSITION or· PROJECTIVITY:

(U--+ Vl¥) =} (U--+ V and U--+ W)


Cardinalities of projections U and V completely ddennine whether U func

tionally determines V or not. The following result statt's this.

* Let CARD(~[n]) or simply CARD(n) he tlw canlinality of tht~ relation ~

over the set of attribute:-; n as defined earlin·. Let U awl V lw subsets of n.

Then U is :-;aid to functionally determine V if CARD( U) = CARD( U, V).

• A set of attributes U not containing any of the a.ttrilmtes in the attribute

set V is called a dctermina.nt of V if V is fnndioua.lly dependent on U and

no proper subset of U has this property. If V is such that no super :-;et of

V has U as dett~rminaut then the set V is called the colon:tJ of. U. A set of

attributes U is called a saturated set if the following condition is satisfied:

if a determinant ·is in U then its colony must lw in U.

The above definition is explained in the follnwing t~xa.mple. Let ~ lw the

following relation.

A B c ao b1 CJ

(/.0 b2 CJ

O.o b3 C2

{1.2 b1 C:l

There exist two FDs AB --+ C and C --+ A in ~- Determinants in ~ are AB and

C. Colony of A = ¢>, B = ¢>, C = AC, AB = ABC, AC = AC, BC = ABC,

ABC= ABC. Saturated sets ofR are A, B, AC, and ABC. Note that uullset ¢>

and full set n (set of all att.ri bu te:-;) are saturatt~d st~ts of ewry relation. Similarly,

colony of ¢> = ¢> and colony of n = n. We assume that uo rela.tiou has a constant

colun1n.

14

Keys in a Relation

The central concept in the relatioualmodel i::; the c~oncept of a. key. Intuitiv(-~ly,

an attribute (or a collection of attributes) may uniqudy clet.ennim· auy particular

tuple within a relation. The existence of atleast one k(~y i::; guaranteed by definition.

However, the point is to hav(~ a key of minimal length, i.e, an attribute i::; to be

discarded from the key provided it does not destroy its uniqueness.

• A set of attributes U is called a candidate X:q; or simply a X:cy of a rela.tiou,

if U determines every attribute in the relation and no proper subs(~t of U

has this property. An FD I\ ---+ U i::; called a key dependency of ?R if I\ is

a key of ?R.

It is pos:-;ible that a relation has more than one candidate key. In :-;uch ca::;e:-;, mw

of them is designated by the datahase de:-;igner as the prinwry key, i.e., as the

primary means of identifying the corre:;;ponding tuples. All other keys are alternate

keys. The choice of the primary key from among the candidate keys depends on tlw

particular circumstances, though typically the shorte::;t key i::; the most favourable.

Any set containing a key is called a 8uperkey.

• An attribute is called a prime attrib11.te if it is an PlPnH~nt of some key. Non

prime attributes are referred to a::; petty attributes. Two set::; of attributes

U and V are said to be equivalent if U ---+ V and V ---+ U. A key is a simple

key if it contain:-; only one attribute. Complex key.~ are those in which there

are more than one attribute. A set of attributes which detennines atlea.st

one attribute outside it is called an open 8ci. A set which is not au open

set is called a clo$ed $ct.

It can be verified that ::;aturatecl :-;ets and dosed set:-; of a relation are identicaL Now,

consider the same example ?R with dependencies AB ---+ C and C ---+ A.


A B c ao bl CJ

ao b2 C]

ao b:J c2 0.2 bl c3

Keys of the above relation are AB and BC. ABC is a super key since it contains

a key. All the attributes are prime attributes. Since there are no keys with sin

gle attribute all the keys are complex. C, AB, and BC are open sets since they

determine A, C, and A respectively. All other sets are closed.

D

The close connection between saturated sets (or closed sets) and lattices IS

given in section 2.4. Next we consider the representation of FDs in terms of boolean

functions.

Representation of FDs Using Binary Relations

This s~ction deals with a particular binary relation called an attribute relation

and its specializations[45]. This binary relation is not to be confused with the

relations of a relational database. The study of these relations can be helpful for

visualizing FDs.

Let S1 be the set: S1 = {Xu-1 , Xn-2, · · ·, X0 } where each X; is an attribute.

Let S be the power set of n. Thus the cardinality of S is 211• Let the elements of

S be Ro, R1, · · ·, R2"-1·

• An attribute relation (or attribute graph) is a subset of the Cartesian

product S x S: · A dependency relation (or dependency graph) is an

attribute graph satisfying the following three axioms.

0 TUU---tT ( projectivity)

o (T ---t U and T ---t F) :=;. (T ---t U U V) ( additivity)


o ( T ----+ U an cl U ----+ F ) ::::} ( T ----+ F ) ( trausi t,i vi ty)

where T, U, F E S awl the symbol ----+ staucls for clq><~ucl<~w~y relation.

It can lH· verified that. the ahov<· axioms are iuclepcwlent.. It is kuowu[3,8,42] that

these tluet-~ axioms an~ appropriate for n~pres<~ntinJ,!; FDs awl that tlwy are sound

and complete. Consider the attribute .e;raph

B A ----+ C, C ----+ A

where the notation C, B, and A are used instead of X.2 , X 1 , X 0 aud also B A

instead of { B, A} for convenienr<-'. Tlw graph can lw rqn·<~seuted geometrically as

in Fig. 2.1.

AB

A

• ABC

.. AC

c

• <P

Fig. 2.1

• BC

• B

0

A graph can be represented by its adjacency matrix. The orclPr of the adjac<~ncy

matrix is equal to the nmnber of verti<~Ps of the graph. Iu repn-·s<>uting ;,n attrihnte

graph a '1' appears at the intPrsection of r.ow Ri and cohmm Ri if R; ----+ Rj otherwise

a '0' appears there. The adjacency matrix of the a.l H>V<' p;l'aph is showu hdow.


"' A B A/3 (; i\(.' HC ABC

</> 0 0 0 0 0 () 0 ()

A 0 0 0 0 0 () () ()

B 0 () 0 0 0 0 () 0

G AB 0 0 0 0 1 () () ()

c: 0 1 0 0 0 0 () 0

AC 0 0 0 0 0 () () 0

BC 0 0 0 0 0 0 () 0

A 13(.' 0 0 0 0 0 () 0 0

• Dependency clo8ure of an attribute graph is defined as the minimal

dependency graph which <:ontains the given attribute graph.

The dependency closure of the above attribute graph. is shown lJdow.

1> 1 0 0 0 () 0 0 0

A 1 1 0 0 0 () 0 0

B 1 0 1 0 0 0 0 0

G BA 1 1 1 1 1 1 1 1

c 1 1 0 0 1 1 0 0

CA 1 1 0 0 1 1 0 0

CB 1 1 1 1 1 1 1 1

C:BA 1 1 1 1 1 1 1 1

0

The following results are derived in [45].

* The symmetric part of a. dependency graph defines an ecp ti valence

relation.

Databases 11/ld . Lattices 18

The symmetric part of the graph in Fig. 2.1 is shown hdow.

q, 1 0 0 0 0 0 0 0

A 0 1 0 0 0 0 0 0

B 0 0 1 0 0 0 0 0

H BA 0 0 0 1 0 0 1 1

c 0 0 0 0 1 1 0 0

CA 0 0 0 0 1 1 0 0

CB 0 0 0 1 0 0 1 1

CBA 0 0 0 1 0 0 1 1

It is easy to verify that this matrix defines an equivalence relation. The equivalence

classes are</>, {A}, {B}, {CA,C}, and {BA,CB,CBA}.

* The symmetric part of a dependency graph defines a set of equivalenn~

classes. In each clas!-:> there is a unique maximal element, i.e., one element

which contains all the others in the class.

The above result follows from the fact that the union of two nodes (attribute sets)

in an equivalence class belongs to the same class and the maximal element is the

union of all the elements in a cla!-:>s. The maximal elements in the above example

are</>, A, B, CA, and ABC.

I

* The maximal element!-:> of the ,:quivalence classes given by the symmetric

part of the dependency graph are precisely the saturated sets.

A saturated set can be constructed as follows. Take any attribute sd. If an attribute

is found outside the set which is functionally determined by these, include that

attribute in the set. Continue this process till the maximum possible attribute set

is obtained. The resulting set will be a saturated sd of that attribute relation.

Consider the attribute relation AB --+ C and C: --+ A. Consider the set </>.

Data"bases and Lattices 19

Since it does not determine any attribute outside it, qJ is a saturated set. Similarly,

the sets A and B also form saturated sets of the rdatiou. But Hw sd, C eau lw

expanded since it determines the attribute A outside it. Since the set C A does not

determine any attribut,e, C A forms another satuniJ,t·d sd of the relation. Finally,

the sets A.B and ABC both expand to the same set aml ABC becomes a saturated

set. It can be easily recognized that these sets are the very same maximal elements

defined earlier.

* A saturated set functionally determines another saturated set if and only

if the first one contains the second one.

Let U and V be two saturated sets. If V C U tlwn U ~ V follows from the

definition of functional dt~pendencies. Hence, the 'only if' part of the result follows

directly. For the 'if; part, assume U ~ V and V rJ:. U. Let Y be the attribute which

is in V but not in U. Since U determines V, U ~ Y also is true. Since U is a

saturated set, by definition of saturated sets, Y also should be in U.

• An attribute relation is called a di88idencc rcla.tion if it satisfies the

following axioms. The dissidence relation will lw designated by the symbol

~. In the axioms that follow, n is the en tin~ set of attributes, <P is the null

set, U and V are subsets of n and [T is the complement of U with respect

to n.

on~<P

o (U ~ </1):::;. (U = V)

o ( U ~ [J and V ~ V) :::;. (U n V ~ [J n 1~") From the above axioms it can be seen that, if there is an arrow p;oing out of a node,

it has to go to the complement of that node. The symbol ~ physically means that

the node on the left side does not functionally determine any of the attributes on

the right side. Saturated sets define<l earlier are the same a.s th<-' sets on the left


side. From the saturated ::;ets the boolean function representinp.; Uu~ FDs can he

obtained. This is shown in section 2.4.

More boolean relations like completion relation (defined lat(~r), <>qui potence

relation and duals of all the abow~ relations can be defined. The purpose of intro

ducing these relations was to have a clear understan<linp.; of FDs through different

perspectives. In the following section timltivah1ed depen<lenci(-~S are descrilwd.

2.2.2 Multivalued Dependencies

Multivaluecl dependencies, abbreviated as MVDs, were introduced indepen-

dently as a generalization of FDs by Fagin[27], Zaniolo[63], and Delobe1.[23]. In this

section the properties of MVDs, their logical equivaleuce, some inference rules etc.

are discussed along with necessary examples.

• A multiva.lued dependency (:MVD) is statement of the form U --+-+ V,

where U ancl11 are subsets of rl. Let W be the set of attributes not in U

or V, i.e., W = n- UV. ';I'he MVD U --+-+ Vis said to hold for a relation

if whenever there are tuples s and t in that relation where .s[U] = t[V],

then there is a tuple u where u[UV] = ...;[UV] and u[Hl] = t[1iV].

Consider the following relation having the MVD BC --+-+ D.

A B c D b a c a

b a c .f b a e a

b a e .f d (l c a d a c .f d a e (l

d (1. c f

.Databases and Lattices 21

* Relation ~( n) holds an MVD U --H V if and only if ~( n) holds U --H W

where W = n- (UV).

It can be seen that in the above example the MVD BC --H A_ also exists. This cau

be proved with the help of the following formula.

where W = n- (UV)

* Let n be the universe of the relation~- An MVD U - V holds in ~(n)

if and only if

where U, V ~ n, vV = n- (UV),, and U n V =<I>.

Inference Rules for MVDs

As in the case of FDs, the following are some of the inference rules for MVDs.

* SYMMETRY:

U - V iff U - lV

where UVW = n, and W = n- (UV).

* REFLEXIVITY:

(V ~ U) => (U - V)

* AUGMENTATION:

* TRANSITIVITY:

(U --H V and F --H W) => (U --H lV- F)

Datal>ases and Lattices 22

* PSEUDO-TRA NSITI VJT}':

* UNION or ADDITIVI1T:

( U -rt 11 awl U -rt lV ) :=;. ( U -rt F lV )

* PIWJECT/VITY:

(U -rt F ancl U -rt Hi):=;. (U _, V n lV, 1/ -rt F- lV awl U -H lV- V)

The first four inference rules are sound and compld<· for a sd. of I\'IVDs 011 S1[8,54].

Other than tlw above inference rules, there are rules for both FDs aud MVDs.

* REPLICATION:

(U->F)=;.(U ~ V)

* COALESCENCE:

* ( U -rt V and UV -rt lV) :=;. (U _, l~V- ~'")

It has been shown that th<-~ above three inference rules are sound and complete[8].

These inference rules for FDs and MVDs written as lo.e;ical formulas are also sho\vn

to lw sound and complete[28,54].

0

Because of the equivaknce theorem between data <l<JI><'Udcucies and a pa.rt

of propositional logic, the determination of whether au FD U --> F or a11 MVD

U -H V holds in a r-dation ~R may be rmt.de hy evaluatiug the c.orrcspowlin,e;

formulas [J + v or [! + v + ( n- Ul') for two tupl<·s. Here n is the uniV<'-lS<" of \R.

In th<~ next section one of the most. important data clq><-'11< l<-·ncics and its prop

erties arc discuss<-~cl. For any relation. at. least mw join clq ww k~ncy is IH~<Tssary for

the ·purpose of normalization.


2.2.3 Join Dependencies

These important da.ta dependencies are based 011 th<' definition of join _g;iv<~ll

earlier. In this section sonw of the results related to these dependencies and k<~y .. dependencies are given.

• A Jozn dependency (.JD)[50] is a statement of t.h<~ form *(Ul, U2. · · ·, Uu)

where, Ui ~ !1. A rdation :R is saicl to hold a .JD *( U1, U2, · · ·, Un) if

?R = *(U1 , U2, · · ·, Un)· Here Uis are referred to as wmponcnt8 of .!D.

For example, :R(ABC) with the FDs C -----+ A and AB -----+ C has the .JD

*(AB,BC). Note that the .JD *(U1 ,U2 ) is equivalent to.the MVD U1 nu2 -++ U2. Let J 1 and J2 be two .JDs of R. If .h is obtained by replacing two components

of J1 by their union, then .ft logically implies .h. The membership algorithm[2G]

to find out whether a .JD is a logical consequence of a sd of KDs (key dependencies)

is as follows:

Given a .JD *( U1 , U2 , · · · , U11 ) and a set {1<1 -----+ !1, · · · , I(~ -----+ !1} of KDs. Define

a set]= {U1,···,Un};

Repeat

if J{i ~ V n W for some i, 1 ::::; i ::::; .s and for some V, lV E J

then replace V and W in J by V U lV

until no more steps are possible;

If any member of J is equal to U then the giv<~n .JD is a logical consequence of

the given set of KDs;

As in the case of FDs and MVDs there is a set of infen'Hce rules proposed for .TDs

too. Some of the other results will be discussed in later s<~dio11s.

2.3 Entropy and Information

In this section we discuss the use of information theory[36] i11 the analysis of

databases. For a. better mHlerstanding of th<~ theory, we have incluclecl many


results on the concept of entropy. Later we hav<~ shown how the entropy and data

dependencies are connected. Further, it is shown that <~ntropy nm lw used_ to ddine

information in a. relation. Every probability distribution has some lmcert.a.int.y

associated with it. The concept of entropy is to provide a quant.ita.tiv<~ measure

of this uncertainty. Many axiomatic definitions of entropy are available in the

literature [36] but the one available in [44] is particularly simph~.

• The entropy of a random variable X is given by

11

H(X) =- LPklogpk k=l

where Pk is the probability of the variable taking the J.;lh value.

For examp.le, consider the followi1ig relation.

A B c ao bl CJ

a.o b2 C1

a.o b3 C2

0.2 b1 C3

The various entropies with different variables are

H(A) 31 . 3 1 1 . 1 '"' :~1- . 3 = - 4 og 4 - 4 og 4 = .t. - 4 og

H(B) = _llog l- llog l- llog l = :2.

2 2 4 4 4 -4

H(C) = _llog l- llog l- llog l = 3 2 2 4 4 4 -4 2

H(A+B) = _llogl _llogl_llogl_llogl = 2 4 -4 4 4 4 -4 4 4

H(A +C)= _llog l- !log!- !log!- !log!= 2 4 -4 4 4 4 4 4 4

. . .1 1 1 1 1 1 :~ H ( B + C) = - 2log 2 - 4log 4 - 4log 4 = 2

H(A + B +C) = -~log~ - ~log~ - ~log~ - ~log~ = 2


Now we will see how Venu diagrams can lw nscd t.o n·preseut ~~utropics. The

Venu diagram for a relatiou with thn~e attrilmt.es is giwu in Fig. 2.2.

c

A B

Fig. 2.2

The entropy diagram with va.lnes of different areas for t.lw previons Pxa.mple is giv1~n

below in Fig. 2.3. c

A B

Fig. 2.3

1 1 :l 1 :J :j 0 0

Here H2 = 2 H1:J = Z' Hn = 4log3- 2 , H 1n = 2 - 1 log3. Th<-' ent.ropws m

different portions of the entr<>py diagram can lw c·aknlat1·d with the help of the

following fonnula.s whid1 follow directly from tlw incl-n;;ion c:u/'1/..<;ion pr-inciplt~.

Databases and Lattice.5

1. H(A+B+C)=

H(A) + H(B) + H(C)

-H(AB)- H(A.C)- H(BC)

+H(ABC)

2. H(ABCDEF) =

H(A + D + E +F)+ H(B + D + E +F)+ H( C + D + E +F)

26

:__H(A + B + D + E +F)- H(A + C + D + E +F)- H(B + C + D + E +F)

+H(A + B + C + D + E +F)- H(D + E +F)

3. H(ABCDEF) =

H(ABC)- H(ABCD)- H(ABCE)- H(A.BCF)

+H(A.BCDE) + H(ABCDF) + H(ABCEF)

-H(ABCDEF)

4. H(ABC) =

H(A) + H(B) + H(C)

-H(A +B)- H(A +C)- H(B +C)

+H(A + B+ C)

5. H(A +C)= H(.4C:) + H(A)

Let Hk be the entropy of a set of variables writt.<'ll iu tr~nns of entropies

of atleast k variables. Vve find some interesting results that can he g<~ncralized

into a simple formula. Vve will also see what H, meaus in an entropy diagram

of three variables. We will first consider a few examples awl lat<~r gerwralize it.

1. For three variable::; we have the following n~sults.

a) H1 (A+ B +C)=

H(A) + H(B) + H(C)

-H(AB)- H(AC)- H(BC)

+H(ABC)

Databases and Latotices · -------------------

b) H2(A + B +C) =

H(AB) + H(BC) + H(AC)

-2H(ABC)

c) H 3 (A + B +C)= H(ABC).

2. For four variables we have

a) H1 (A+ B + C +D) =

H(A) + H(B) + H(C) + H(D)

-H(AB)- H(AC)- H(AD)- H(BC)- H(BD)- H(CD)

+H(ABC) + H(ABD) + H(ACD) + H(BCD)

:_H(ABCD)

b) H 2 (A + B + C +D) =

H(AB) + H(AC) + H(AD) + H(BC) + H(BD) + H(CD)

-2[H(ABC) + H(ABD) + H(ACD) + H(BCD)]

+3H(ABCD)

c) H3(A+B+C+D)=

H(ABC) + H(ABD) + H(ACD) + H(BCD)

-3H(ABCD)

d) H 4 (A + B + C +D)= H(ABCD)

3. Proceeding in similar way we find that

a) Ht(A+B+···+N)=

[sum of entropies with one variable]

- [sum of entropies with two variables]

+ [sum of entropr<'s with three variables]

( -l)n-I [sum of entropies with all variables]

b) H 2 (A + B + · · · + N) =

27

Databases nnd Lattices

[sum of entropies with two variables]

-2 [sum of entropies with three vari~t.bles]

+3 [sum of entropies with four variables]

( -1) n -l ( n - 1) [sum of entropies with all variables]

c) H 3 (A + B + · · · + N) =

[sum of entropies with three variables]

-3 [sum of entropies with four variables]

+6 [sum of entropies with five variables]

( -1)n-2 (n~l) [sum of entropies with all variables]

a) H 4 (A + B + · · · + N) =

[sum of entropies with four variables]

-4 [sum of entropies with five variables]

+10 [sum of entropies with six variables]

( -1)n- 3 (n;l) [sum of entropies with all variables]

Continuing in the same fashion we have the general formula

. (~~)[sum of entropies with m variables]

- ( 7~~ 1

) [sum of entropies with m + 1 variables]

+ ( m7~ 2 ) [sum of entropies with m + 2 variables]

( -1)n-m- 1 ( 11

7~1 ) [sum of entropies with all variables]

28


2.3.1 Entropy and Dependencies

In this section we see how entropy can be used to obtain the (lata dependencies

in a relation. First let us consider an example. Let the re.lation \R( A, B, C) he

A B c ao bl CJ

ao b2 CJ

ao b:l c2 .a2 bt ca

Using the earlier calculated entropy values ( se1-~ p. 24) we gd.

H(AC') = 0 and H(.4BC) = o

The entropy diagram is given below in Fig. 2.4.

c

B

Fig. 2.4

In the above diagram, entropies in the shaded regions ABC, ABC:, aml ABC: are

zero. We can directly obtain the dependency function f.

f = a.bc + abc + abc

= a.bc + ca


The entropy of each region shows how much dependency exists between those

attributes. The more the dependency, the less the <~ntropy. When the attributes

become fully dependent the entropy becomes zero. In section 2.4 we have shown

that a unique lattice can be constructed from the entropy diagram.

2.3.2 Information in a Relation

We have attempted to define information in a relatioil and in a database which

IS in Boyce-Codd normal form. Moreover, information in a lossless join is also

discussed. The starting point of a model for database performance is the measure

of the information stored in the database.

• Information, denoted as I in a. relation 1s the sum of entropies of all

' dependent attributes taken individually.

For example, let us calculate the information in the following relation.

A B c ao bl C]

ao b2 CJ

ao b3 c2

0,2 bl c3

The dependencies in the above relation are AB ---> C, and C ---+ A. Entropies of

attributes A and C are given below.

3 3 1 1 H(A) =--log-- -log-

4 4 4 4 3

= 2- -log 3 4

Databases and Lattices

1 1 1 1 1 1 H(C) =--log-- -log-- -lo,!!;-

2 2 4 4 4 4 1 1 1 -+-+-2 2 2 3

2

Infonna.tion m this relation I = Entropy of A + Entropy of C

= H(A) + H(C)

3 3 = 2- -lew 3 + -

4 t-, 2 7 3

=---log 3 2 4

The following features of information can be observed.

o The more tlie dependencies, tlw more the information.

o If there are ~10 dependencies th!~re is zero iuformation.

o If fully clepenc!f~nt, it will have complete infonuation.

31

In a relation which is in BCNF, W!~ have atleast one dqwiHlent a.ttrilmte in each

projection. Vle take marginal distribution corresponding to !'ad1 a.ttrihute. Take

the dependent attributes in tlw key. Thus,

• Information in a dat;tha.s!\ which is in BCNF, can h~~ ckfirwd as the sum

of informations in each relation (projections) in the d~~cmnposi tiou.

Information in a database, i.e., the sum of informa.tious in the projections should lw

equal to the information in the original relation. For ex<•mple, considt~r the earlier

relation:

A B c ao bl CJ

ao b2 CJ

ao b:3 c2

(/2 bl C3

Here information I = ~ - }log 3. The BCNF decomposition of the a.hove relation

is given hdow.


B c A c bl Cl (/.I Cl

b2 Ct ([,I c2

b3 c2 0.2 CJ

bl CJ

Information in the relation SR(B, C), It = ~and information in the relation !R(A, C),

I2 = 2 - ~log 3. Thus total information in the da.tahase

I= It+ I2 7 3

=-- -log3 2 4 -

Thus information in a relation is the same as information in its BCNF decomposi-

tion. From the above results we can have a new definition of lossless decomposition

in terms of ii1forma.tion.

• A set of projections of a relation is called lossles8 decomposition of the

r'elation, if and only if the information in the join of projection is tlw same

as the information in the original relation. Also, being in BCNF, the join

of this decomposition gives back the original relation.

From properties of join, it can be seen that information in the projection can

never be greater than the information in the origi~wl relation. Inconsistent pro

jections are those projections whose join does not give the original relation back.

Otherwise, they are con,~istent.

0

Now we will see how entropy diagrams can lw used to find information in a

relation. Consider the earlier relation with dependencies AB ----> C and C ----> A.

The corresponding entropy diagram will be


A B

Fig. 2.5

Due to dependency AB ---+ C, region corresponding to the set ABC will have zero - .

entropy. Similarly C ---+ A will make the entropies of the regions ABC: and ABC

zero. Let the entropies of the other r'egions be e1, e2, ca, and e4 as given in Fig. 2.5.

Then the entropies of each dependent attribute can be obtained by adding the

entropies of the regions which fall in the set corresponding to that attribute. Thus

Information I = Entropy of A + Entropy of B

A B

Fig. 2.6


Consider another example of a rdation having the d(~I><:~udeuci(~S A - B and

B - C. The corresponding entropy diagram is shown in Fig. 2.G.

Information I = Entropy of B + Entropy of C

= (et + e2) + (el)

Even when we add the dependency A - C, we find that the entropy diagram

remains unchanged and the information remains the sanw. This is due to the

fact that A - C is not a new dependency as it can he deduced from original

dependencies by transitivity.

·The next section explores the connection between lattices, boolean algebra,

and data dependencies. All the important terms are defined aud explained. Some

of the definitions are altered from conventional ones to make understanding easier.

2.4 La~tices, Boolean Algebra, and Dependencies

This section deals with one of the most important structures in the theory of

relational databases. It is shown that lattices[9,25] and dependencies in relations

have a one-to-one correspondence and the study of these discrete structures ca.n give

deeper insight into the database theory. Similarly, boolean algebra also plays an

important role here. In this section, the close connection between these structures

is studied in detail. All the necessary definitions are provided with examples as the

chapter proceeds.

2.4.1 Lattices and Dependencies

In this section, lattices are visualiz<xl in different ways by g1vmg equivalent

definitions. Known results of the matrix theory, data dependencies, etc. are used

to arrive at the results. Before going into these details, some basic terms have to

be defined.


• A digraph[46] is said to lw non.~yrnmdric if <·d.e;('

( :r, y) and ( y, .r) <=> ( .r = y)

1.e., then~ are no symmetric edg<~s otlwr thau loops at evny ll<Hl<-~. A

nonsynunetrie, transitive graph is called a pn.Ttial ()'T'flcT.

Reduction of a partial ord<~r can lw obtained by removing the loops awl transitive

edges. A digraph which is a partial order is given in Fig. 2./.a awl its reduction is - - . .

given in Fig. 2.7.b. All the eclg<~s a.n' a.ssnmt'd to have dowmvanl arrows.

Fig. 2.7.a Fig. 2.7.1>

A partial order can he defined in a different way as follows.

• Let S be a set of elements a, b, c, · · ·. Th<-·n a relation 0 of partial order

over S is any dyadic relation. over S \vhich is:

i) reflexive: for ev<~ry :r E S, _:rO:r;

ii) antis:vmnwtric: if :rOy and yO:r then :r = y:

iii) transitive: if :rOy and yO::. then :rOz;

The set on which tlw partial order rdation is defin<'d is called partially ordered 8ci.

For example, let S lw the set of natural nmnhers awl let t.h(' rdatiou lw :::;: . It can

be easily seen that the rda.tiou :::;: is a. partial ord<-'1'. The relation ~ over the set S

Databases aud Lattice.s 36

of sets A, B, · · · also defines a partial order.

• Let S be a partially ordered set of elements a., h, · · ·. lfo. < h and if tlwn

is no element :r E S such that a < :r < b then h is sai<l to coveT a.

Consider the following example. Let S be the set of proper nonempty subseb

of {a, b, -c}, i.e.,

S = {{a},{b},{c},{a,b},{a,c},{b,c}}.

Let set inclusion be the ordering relation. The corresponding partial order is giver

in Fig. 2.8.

D

A

E

B

Fig. 2.8

F

c

Here D covers B and A, F covers B and C, and so on. Consider another example

Let S be the set ofnontrivial factors of 12 and let the ordering relation be a/b. ThE

partial order is given in Fig. 2.9. Here 4 covers 2, 6 covers 2 and 3.

4 6

2 3

Fig. 2.9

Let xOy be one element in the partial order relation 0. If :r and y are inter·

changed for all x and y the conver-8e of 0 is obtained. It is denot<~d by 0'.

* If a set Sis partially ordered by the relation 0 then it is partially order<'(

by 0'.

Databases 'and -Lattices 37

The graph corresponding to the set having relation 0' can he obtained by reversing

the direction of arrows in tlw graph of 0. In tlw above example, tlw converse of

the relation ajb will be bja.

• A node is a dc8ccndant or 8nccC880r of a set of nodes if tlwre is a path

from all of tlwm to that node. A node is au ascendant or ancc8tor of a set

of nodes if there is a path from that node to all other nodes in that set.

A descendant is called a greatest lower bound or simply g.l. b. of a set of

nodes if there is a path from· that node to every other descendant of that

set. An ascendant is called a len8t v.pper bound or l. n. b. of a set of nodes if

there is a path from all the ascendants of that set to that particular node.

For a set of nodes g.l.b. and l.u.h. are unique. L.u.b. of a and his denoted by a+ b

and the g.l. b. as a · b or simpl:y a b.· Consider the following reduced partial order.

A

B D

E F G

H

Fig. 2.10

Successors of A= {B, C, D, E, F, G, H}, B = {E, F, H}, C = {E, F, G, H}, and so

on. Ancestors of H ={A, B, C, D, E, F, G}, E = {B, C, A.}, F = {B, C, A}, and so

on. G.l.b. of {A,B} = B, {A,C,D} = G, {B,C,D} = H, and so on. L.u.h.


of {H,E} = E, {H,E,G} = C, {G,F,B} =A, and not<~ that there is no g.l.h.

for { B, C} since its successors are { E, F, H} and no successor is a sncn~ssor of the

other two. Similarly, there is no 1. u. b. for { E, F}.

• A partial order in which every pair of nodes has Ln. b. and g.l.b. is call<~d

a lattice. Reduction of a lattice is called Has$e diagra.rn. L.u.b. and g.l.b.

of all nodes in alattice is called full element (n) ancl null element (4> or 1)

respectively. A node with exactly one outgoing edge is called prime node.

Similarly, a node with exactly one incoming edge is called a. primitive node.

Normally, a lattice is drawn with the full element at the top and the null element

at the bottom. All the arrows point downwards. Without having a:ny ambiguity, a

lattice can be drawn without having any arrows shown. The partial orders given

below in Fig. 2.11.a and Fig. 2.11. b are lattices. We· will refer to them as "pentagon

lattice11 and "box lattice, respectively. Later, we will be making use of these lattices

often to explain different concepts and algorithms.

.]

D

H I c

B

A E F

4>

Fig. 2.11.a Fig. 2.11.b


Here A, B, C, E, F, H, I are prinw nodes and A, B, C, G, H, I are primitive nodes.

But the partial order given below in Fig. 2.10 is not a lattice sine<~ nodes E, F do

not have a l.u.b. and similarly, nodes B, C do not have a. g.l.h ..

• As defined earlier a saturated set is a set of attributes which does not

determine any attribute outside it. Genera.tor8 an~ those saturated sets

whose intersections generate all the saturated sets f~xcept the universal

set.

Consider the dependencies AB --> C, C --> A. Saturated sets here are ABC,

AC, B, A, c/J. Generators are A, B, AC. Note that the universal set can always be

included when we take the intersection of elements in the generator set.

0

In the results which follow, :r·y means g.l.b.(:r, y) and :r + y means l.u.b.(.1::, y).

The universal element is denoted by 1 and the null element by 0. In a lattice, the

following are true[25].

o 1·x=x

o O·x=O

o xy = yx ·

o x(yz)=(xy)z

o x(x+y)=x

o :rx = .1:

1+:r=1

O+x=:z::

x+y=y+:z:

x + (y + z) = (:r + y) +;;

x + :ry = :r

:r+.T=X

Thus in set theoretic terms, a lattice is a partially ordered set in which every

pair of elements has a unique g.l.b. and·l.u.b. within the set. In this thesis, only

finite lattices are dealt with. The following are some of the important types of

lattices.

o Boolean lattices

o Modular lattices

o Complemented lattices

Databases lUld Lattices 40

o Distributive lattices

• Let S be a set of n elements. Then the set of all subsets of S form a

boolean lattice ifl.tLb(:r,y) = :rUy and g.l.b.(:r,y) = :rny.

Boolean lattices are of special interest to us and are discussed in later sections.

Examples for boolean lattices are given in Fig. 2.12.a. aiHl Fig. 2.12. h. ABC

AB

AB B(

A B

c

Fig. 2.12.a Fig. 2.12.b

• In a di8tribntive lattice, x · (y + z) = :r · y + .c · z. In a modular lattice,

whenever x ~ y, x · ( y + z) = x · y + :r · z. If for each element :r there is

a complement y such that x · y is the null element and x + y is the full

element, then the lattice is a complemented lattice.

* Every distributive, complemented lattice is a boolean lattice.

Completion Function and its Dual

Now, we define a function called completion function and its dual depletion

function[45,46]. It is shown that these functions defines a lattice completely.

Databases and Lattices . --~--------------------

• A function C : 211 ----+ subset of 211 having the followin.e; properties

i) C(U) 2 U

ii) C(C(U)) = C(U)

iii) c(n) = n iv) U ~ V:::} C(U) ~ C(V)

is called a . completion function.

41

Let n be the set of attributes {A, B, C} or simply, ABC. Let the dependencies

between these attributes be AB ----+ C and C ----+ A. Define the completion of a set

as follows. Keep adding all those attributes which are determined by this set until

no more addition is possible. Thus

C(¢) = ¢, C(A) = A, and C(B) = B since ¢, A, and B does not

determine any attribute outside it.

C( C) = AC since C determines A.

C(AB) = ABC since AB cletennines C.

C(AC) = AC since AC does not determine B.

C(BC) =ABC since C----+ A and thus BC----+ A.

C(ABC) =ABC

It can be easily verified that the above function satisfies all the properties of the

completion function. Similarly, dual of completion function can he defined as fol

lows.

• A function D : 211 ----+ subset of 211 having the following properties

i) D(U) ~ U

ii) D(D(U)) = D(U)

iii) V( 4>) = 4>

iv) U 2 V:::} D(U) 2 D(V)

is called a depletion function.

Databases <md Lattices 42

Let n be the same set as in th(; previous example with tlw same sd of depen

dencies. Define depletion of a set as follows. Keep removing at,trilmtes from that set

which arc determined by tlH~ attrilmtes outside it until no mon~ steps are possible.

V( <P) = <P

V( A) = <P since A is determined hy C.

V(B) = B since AC does not detennine B.

V( C) = <P since AB --+ C.

V(AB) = B since C--+ A.

V(AC) = AC since B does not determine either A or C.

V(BC) = BC since A does not determine eitlH~r B or C .

. V(ABC) =ABC

It can be easily verified that the above function satisfies all the conditions. Notice

that each element in the range of depletion function can be obtained by taking

coinplements of each element in the range of completion function with respect to

n. For the above example,

Range(C) ={¢;,A, B, AC, ABC}

Range(V) ={ABC, BC, AC, B, 1> }.

* Range of a completion function is precisely tlw set of saturated sets and

it defines a lattice.

The physical meaning of the function C( A) = B, representing the completion

function, is that B is the 'maximal attribute set functionally determined by A. The

largest superset determined by A is obvi(msly a saturated set. From the axioms of

the completion function, it is dear that the members of the range of the completion

function are the saturated sets. In the above example with dependencies AB --+ C

and C --+ A , the completed sets are </;, A, B, AC, and ABC which are precisely the

saturated sets. The corresponding lattice can be obtained by drawing a directed

edge from set A to set B if A ~ B. Thus we have the following lattice given iu


Fig. 2.13.

ABC

AC

B

A

1

Fig. 2.13

D

Similarly, rangP- of deplP-tion function also gives out dual of set of saturated

sets. Here dual of a set means the complement with respect to the full set. In the

above example, dual of A is BC, dual of AC is B and so on. DeplPted S('ts {or

the above example are .4BC, BC, AC, B, and </J. Dual set of each member gives

out saturated sets and thus a lattice corresponding to the given relation. Note that

these sets by itself are saturated sets. But they form the dual lattice. Thus

* From a given set of dependencies, the corresponding lattice can be con

structed using the completion function.

Consider another set of dependencies C ---. A, D ---. B, AD ---. C, BC ---. D.

The saturated sets here can be calculated using completion function.

C(<P) = </J, C(A) =A, C(B) = B, C(C) = AC, C(D) = BD,

C(AB) = AB, C(AC) = AC, C(AD) = ABCD, C(BC) = .4BCD, C(BD) = BD,

C(CD) = ABCD,

C(ABC) = C(ABD) = C(ACD) = C(BCD) = C(ABCD) = ABCD.

The saturated sets here are </J, A, B, AB, AC, BD, ABCD and the cmTesponding

lattice is given below in Fig. 2.14.


AC

A

ABCD

1

Fig. 2.14

44

BD

B

* Completion function partitions the power set of a set of attributes into

equivalence classes with each class having a unique maximal element. The

set of these maximal elements is the same as the set of saturated sd.s.

For the first example above the equivalence classes are { </>}, {A}, { B}, { C, AC}, and

{ AB, BC, ABC} and the maximal elements are </>, A, B, AC, and ABC. Similarly,

for the second example, the equivalence classes are { </>}, {A}, { B}, { AB}, { C, AC},

{D, BD}, and {AD, BC, CD, ABC, ABD,BCD, ABCD} having c/>, A, B, AB, AC,

BD, and ABCD as the maximal elements. Similar to the above result we have the

following result.

* Depletion ftmction partitions the power set of a set of attributes into

equivalence classes having a unique minimal element. The set of these

minimal elements forms the dual of the set of saturated sets.

Depleted sets.for the first exmi1ple are {</>,A,C}, {B,AB}, {AC}, {BC}, aml

{ABC} and the minimal dements are </J, B, AC, BC, and ABC. They din~ctly


define the pentagon lattice, given m Fig. 2.13. Note that to obtain the original

lattice we have to take t.lH~ dual of the. above sets. For the second <~xample we haw~

{</>,A,C,D,AB,AD,BC}, {AC,ABC}, {BD,ABD}, {CD}, {ACD}, {BCD},

and {ABCD} as the partitioned classes and</>, AC, BD, CD, ACD, BCD, and

ABC D as the minimal element:-;. The lattice is .e;iven in Fi.e;. 2.15.

ABCD

ACD BCD

CD

AC BD

1

Fig. 2.15

Consider a relation having no dependencie:-;. The ran.e;e of the completion

function will be the power set of D and we will obtain a. boolean lattice. Thus

we have the following result.

* Every boolean lattice corresponds to a relation h~ving no dependencies.

Note that every n-attribute relation having no clependencie:-; will have DEG(?R) 11

IT CARD(Di) where D; is the domain of the attribute A;. i=l

Completion Graph and its Dual

Here graph corresponding to 'the completion function is defined. It is further

shown that every completion graph can b<~ used to represent a. lattice.


• A graph with ~ets as m>dl~ labels and having 211 UIHks where n is the

number of distinct attrilmte~ occ.uriug in these lahds is c.al11~d a completion

_graph if it has the following properties.

o Out degree of every node is exactly OIH'.

o (P1 --t P2) =? (P1 <;;; P2)

o (P1 .--t P2) =? (P2 --t P2)

o ( ( P1 --t PJ and P2 --t P4 ) and (Pi <;;; P2)) =} ( P.1 <;;; P4 )

where P1 --t P2 nwans that tlH~n~ is an edge from node P1 to node P2 m

the graph.

The right hand side of the second condition is equivalent to PI n p2 = p2. TlH~

physical meaning of each edge P1 --t P2 in the completion graph is that C( P1 ) = P2.

The following is an example of a completion graph.

AB BC

0 0 A B c

0¢ Fig. 2.16

It is obvious that if we collect the nodes with loops and draw edges from sets to its

subsets we obtain the lattice. On the other hand, if the lattice is given w1~ can find

the dependencies and then obtain tlw completed sds and the equivalence classes


which in turn gives the completi(m graph. Similarly, a depletion graph can also

be defined as follows, and it can be shown that every depletion graph represents a

lattice.

• A graph with sets as node labels and having 211 nodes when~ n is the

number of distinct attributes occuring in these labels is called a depletion

graph if it has the following properties.

o Out degree of every node is exactly one.

o (Pt ~ P2)::::} (PI 2 P2)

o (PI ~ P2)::::} (P2 ~ P2)

o ((PI ~ P3 and P2 ~ P4) and (PI ~ P2)) ::::} (P.J ~ P4).

Here, the interpretation of the symbol. ~ is the same as in the c.ompletion graph.

The following figure illustrates the above concept.

AB AC BC

0 0 A c

Fig. 2.17

Notice that if we complement each node label in the completion graph, we obtain

the depletion graph.


Boolean Matrices and Lattices

We now proceed to defiiH; a dosed trian~ular matrix and some oth1-'r related

matrices and to show that each one of them can be nsecl to definl' a. lattic1~.

• A boolean matrix is a. matrix with boolean literals as d1~ments. A boolean

matrix with the following properties is call1~cl a clo.~ed triangnlnr mn.tri1::

o all the elements in tlw first row are ls

o all diagonal elements are 1s.

o rows are dosed under direct product

For example, consider the following boolean matrix.

A=

1 1 0 1 0 0 0 0 0 0

1 1 1 0 1 0 0 1 0 0

1 1 1 1 1

The above matrix A is a closed trian~ular matrix since all elements in tlw first row

are 1s, all diagonal elements are 1s, and the direct product (bit by bit boolean

product) of any two rows is again a. row in A.

* Every closed triangular matrix represents a lattice completely and vice

versa..

First we show, by example, how to obtain the corresponding closed triangular

matrix from a lattice. Consider the box lattice given in Fig. 2.14. 'Construct a. 7 x 7

matrix iu which the node labels of the lattice form both row-headings and column

headings. If a column-heading is a subset of a row-heading, the corresponding

matrix element is 1, otherwise it is 0. Row-headings and column-headings start with

the largest label, continuing with the next largest and so on, following the lexical

order within each group. Then the resulting matrix will be a closed triangular

matrix. Note that this matrix is the same as the a(ljacency matrix of the partial

Databases <1nd Lattices 49

order of the lattice. In this example we havt~

ALJCD AH ;\(.' HIJ A IJ

AHCD 1 1 . 1 1 1 1 1

AB () 1 () () 1 1 1

AC () 0 1 () 1 () 1

K= BD 0 0 () 1 0 1 1

A 0 () 0 0 1 0 1

8 () () 0 () () 1 1

0 () () 0 0 0 1

It is easy to see that a closed triangular matrix repres1·'nb a lat tin~. The first row

of the matrix corresponds to tlw full element fl. That the row~ are dosed under

bit wise product implies that every pair of elemenb has a unique g.l. b .. A partial

order in which every pair of elements has a unique g.l.b. aud the full element

fl exists, is a lattice. Now, the Hasse diagram of the lattice can be obtained as

follows. Draw nodes corresponding to each row of the matrix. If 01w row belongs

to another, then draw an edge from the node corresponding to the latter row to the

node corresponding to the former row. Consider the following example

1 1 1 1 1 () 1 1 () 1

A= () () 1 0 1 0 0 0 1 1 0 0 0 0 1

The graph corresponding to the above matrix is given below in Fig. 2.18.

Fig. 2.18

Databases and Lattic<>.s

By taking the transitive reduction we obtain the Hasse diagram.

0

Let n lw the munlwr of nodes, ]I Uw munlwr of prinw 11<>< ks a.11<l (j the munbcr

of primitive nodes in a lattice. We define the followi11g ma.tri!'es.

• A-matrix: an n x q matrix in which the node lalwls of the lattin' form

row-headings and primitiw node labels form column-lwa.dings. If a row

heading is a subset of column-heading the corresponding matrix elemc~nt

is 1, otherwise it is 0.

• B-ma.trix: a q x p matrix in which the primitive node labels of the lat

tice form row-heading:; and prime node labels form column-headings. If

a column-heading is a subset of ~·ow-headi11g the cmT<~srH>IHling matrix

element is 0, otherwise it is 1.

• C -matrix: a p x n matrix in which the node labels of the lattice form

column-headings and prime node labels form row-headings. If a row

heading is a subset of column-heading the corresponding matrix eleme11t

is 1, otherwise it is 0.

• D-ma.trix: an n x n matrix in which the node labels of the lattice form

row-headings and column-headings. If a colmnn-headiug ifi a. subset of

i.-ow-heading the corresponding matrix elenwnt i:; 0, otherwise it is 1.

For our box lattice


AB AC HD

A fWD 0 0 0

AH 1 0 0

AC 0 1 0

A BD 0 0 1

A 1 1 0

B 1 0 1

1 1 1

AC: BD A H

AB 1 1 0 0

B AC 0 1 0 1

BD 1 0 1 0

ABCD AB AC BD A B

AC 1 0 1 0 0 0 0

c BD 1 0 0 1 0 0 0

A 1 1 1 0 1 0, 0

B 1 1 0 1 0 1 0

ABCD AB AC BD A B

ABCD 0 0 0 0 0 0 0

AB' 1 0 1 1 0 0 0

AC 1 1 0 1 0 1 0

D BD 1 1 1 0 1 0 0

A 1 1 1 1 0 1 0

B 1 1 1 1 1 0 0

1 1 1 1 1 1 0

Using these four boolean matri<~es the following results can lw derived.

Data'bases aud Lattices 52

* The matrices A, B, and C an~ related by the <~quations

where bar repre:·>ents complementation and T n•preseuts transposition.

We present an infornia.l proof of the above equations.

The ( i, j )th element of matrix CA is 1 if node i is a. suhsd of node ~: and node k:

is a subset of node j for some ~:. That is, node i is a. subset of node j. And by

definition of matrix B, the (i,j/" element of BT is 1 if w>de i is subset of node _j.

The ( 1:, .i )th element of matrix AB is 1 if node i is a snhsd of node 1.: and node k is

not a. superset of node .i for any k:. That is, node i is uot a superset of node j. An(l

by definition of matrix C, the ( i, .i )1" element of CT is 1 if node j is not a subset

of node i.

We can prove BC = AT in a similar fashion. The result ca.u lw easily verified for

our example.

* The matrices A, B, C, and D are related by tlw equations ABC

AAT =eTc= D.

The proof becomes clear if we notice that AB generates the columns of D corre

sponding to prime nodes. \Vhen AB is multiplied by C; the entire D is genera.tecl.

The above result can be easily verified for our example.

* Each of the matrices A, B, C, and D defines the lattice completely.

The proof follows from the methods of construction of the lattice given below. The

lattice can be constructed from A-matrix as follows. If the bitwist~ addition of i 1"

and j th rows results in the / h row then draw an edge from node i to no(le j. The

lattice can be constructed from B-matrix as follows. Take the bitwise addition of

Databases and Lattices 53-

the columns of B to produce as many columns as possihle. Tah~ tlw complement

of the resulting matrix and transpose it to get the A-matrix. The lattin~ can lw

constructed from C-matrix as follows. If the bitwise addition of i 1" and/" columns

results in the i 1h column then draw an edge from nocle i to nocle .J. Taking the

complement of the D-matrix gives the adjacency matrix K of tlw lattice ..

Now we move on to ddine othf~r representations of lattict-~s.

Generator Set and Armstrong Relation

Instead of the set of saturated sets, if we are ,e;iven the generator set the cor

responding lattice can be obtained. Saturated sets can be obtained hy taking the

intersection of elements in the generator sets and adding t.he universal dement.

Thus

* A generator set represents a lattice completely.

For example, consider the generator set { AC, A, B}. By taking the inten;ection we

get the pentagon lattice. Now we consider another representation of generator sets.

* Any finite subset of natural numbers represents a unique lattice and vice

versa.

Let us illustrate this with the help of an example. Let G = {1,3,4} be the finite

subset of natural numbers.· Write each element in the binary form. Then we have

G = {001, 011, 100}. Then assign an attribute to each position and write a bar if

there is a 0. Thus G = {ABC, ABC, ABC}. Vve can then obtain tlw generator set

{ AB, A, BC} by collecting the complemented part < Jf each element. Once we have

the generator set we obtain the set of saturated sets { qy, A, B, AB, BC, ABC}. Note

that we have to add the full set ABC to obtain the saturated sets. Thus we have

the following lattice.


AB

A

ABC

1

Fig. 2.19

'

BC

B

It is obvious from the above example that for every finite subset of natural m1mb<~rs

we will have a unique lattice. From the lattice the generator sets and in turn the

subset of natural numbers also can be obtained.

0

A lattice can also be represented by a class of sets. This becomes trivial from

the above discussion. Thus

* Any class of dosed sets represents a lattice.

The property closed under intersection assures us of g.l.b .. Moreover, if we add the

fnll set into it we obtain tlw set of saturated sets. The following is an example of

such a class.

{a., c}, {a}, { b}, <P

The above class will give us the pentagon lattice and it correspoi'Hls to dependencies

C---+ A, AB ---+ C.

0

Databases 11nd Lattice,, 5'5

Now we will construct a table called A rm8irong rcla.tion whi.d1 can represent

the corresponding lattice uni<p1dy. From the generator set we write tlw binary form

of each element to obtain a unique munher. The following example illustrates this

in detail. Let the generator set be {A,B,AC}. For A we will write 011 (0 for

position A, 1 for B aml C), for B we have 101, and for AC we have 010. Thus we

obtain the numbers 3, 5, and 2 n~s1wctively. Then coustruct the tahl<-~ as follows.

A B c 0 0 0 0 2 0 0 3 3 5 0 5

Note that the first row corresponds to the full element. This table has exa.ct.ly the

same clepenclenciPs we started out with. Thus

* An Arnistrong relation gives saturated sets and depend<~ncies, and thus

the lattice directly.

Commutative Idempotent Monoid

first.

This section gives another n~presentation for lattices. We need some definitions

• A set S with an operation ·, denoted by (S, · ), is called a. comm:ntati1Jc

idempotent monoid if for any three elements :z:, y, and z the following is

true:

i) x · ( y · z) = ( :r · y) · z

ii) :r · e = e where e is tlu~ unit elenwnt

iii) X· X = X

iv) :r·y=y·x

Da-ta:bases nnd Lattices 56

It can be easily Vt~rified that the set {ABC, BC, B' A,</>} with tlw operation n is a commutative idempotent monoid.

* Every commutativt~ idempotent monoid uuiqudy rq>restmts a lattice.

The above results follow directly from the fact that the conditiou idempotent 1!;Uar

antees loops and transitive ed1!;es in the correspondin1!; 1!;r<tph and commutativity

removes symmetry. Thus we obtain a partial order with the unit d<~nH~nt in it

which is a lattice. From the pentagon lattice we can construct the above given set

directly. To demonstrate the above result consider the following tablt~.

e e1 e2 f:~ (;4

e f f) f2 f:J e4

fJ f) fJ f2 e4 e4

e'2 e2 f2 e2 e4 e4

C:J C:J e4 e4 e:~ C4

e4 e4 e4 C4 e4 e4

It can he seen that the above multiplication tahle corresponds to a commutative

idempotent monoid. From the table, the lattice can lw constructed as follows.

o Draw nodes for each element in the set.

o Draw an edge from node i to node j if the entry cmTesponding to ith row

and Ph column is j otherwise do not draw au edge at all.

o Remove the loops and reduce the graph to obtain the Hasse diagram.

We will now. see how entropy diagrams can be used to represent a lattice.

Entropy Diagram

Every lattice can be represented by a corresponding entropy diagram. Consider

· the following entropy diagram.

DatabliSes and Lattices -57

c

A B

Fig. 2.20

The dependencies of the relation eorrespondiiig to the diap;ram cau lw ohtaiw-~ci hy

collecting all boolean terms of the shaded regions. Awl thus W<' arrivt-> at

f = a.bc +ca.

The saturated sets of the relation can be obtai1wd as follows. Collect all sets cor

responding to the unshaded regions and the ontsick n~giou. Coll<~ct complemented

part of each term to obtain the set of saturated sets. For the a hove exampl<·\ the sets

are ABC, ABC, ABC, ABC, and ABC. Taking the complcmeuted parts we gPt

¢,A, B, AC, and ABC as the saturated sets. And in tnrn w<J ohtai11 the 1wntagon

lattice. Thus

* An entropy diagram can represent a lattin~ <'omplet.ely mHl vice versa.

In the following part we see the connection lwtw<'<>n hookau algtJl >ra aud de

pendencies.

2.4.2 Boolean Algebra and Dependencies

The close conw~dion bdw1-~en Boolean algebra[40] awl relational databases

theory is established in this S<~ction. Mon~ov1~r, Boolt·an a.lgchra is showu to lw


useful in deriving somt> of the database results[24,28,42]. All the aJ~orithms are

explainecl with the help of t~xamples.

• A set with two operations, denoted hy (8, +,·)is called a Boolea.n nlgebm.

if the elements of B satisfies the followin~ propcrti<-~s.

o :r+y=y+:r

o :r · (y + z) = x · y + :r · z

o x+O=:z:

o :Jy 3 :r + y = 0

:r. y = y. :r

:r + ( y · z ) = ( :r + y ) · ( :r + z )

:r ·1 = :r

X· y = 1

wl1ere 0 and 1 are called additive and nmltipliea.tive identities respt~ctively.

Now we define some tt~nns which will be used later.

• A dependency term is a product of two or nH>re boolean variables in which

exactly one variable appears uncomplemented. A nnate term is a product

of boolean variables in which all variables appears complemented. A horn

term is either a dependency term or a una.te term.

For example, abc is a dependency term, a.b is a unate tt~nn. Let us see how we

can obtain the corresponding lattice from the dependency function. Consider the

same pentagon lattice discussed earlier. The saturated sets are </>, A., B, A.C, and

A.BC. V\7rite abc for </>, abc for A., abc for B, abc for A.C, and a.bc for ABC. Taking

the complements of these terms we obtain the dependency funetion f as

f = abc + abc + abc

= ac +abc

which straight away gives the fuiictional dependencies C ----+ A. and A.B ----+ C. Con

sider the box lattice given in Fig. 2.14. The saturated sets here are 4>, A, B, A.B,

A.C, BD, and ABCD. So we have

f = abed + abed + abed + a.bcd + abed + abcrl + a.br(l


and tlw dependency function

f = ahccl + abed + abc(l + abrd + ahrd

= ac + bel+ rtcd + bed

which gives us the corresponding functional depc-·mlencies C--+ A, D --+ B, AD --+

C, and BC--+ D.

In the same fashion, if the dependencies are given, we can obtaiu tlw corTe-

sponding lattice as follows. Take the mintenns in tlw complenwnt of the dc~JWIHlency

function. Collect the complemented part of each term to obtain tht-' saturated sets.

Thus

* From the dependency function, a unique lattic·e cau lw obtained a.nd vice,

versa.

Consider a relation having no dependencies. Using completion function we sa\v

that this relation corresponds to a boolean lattice. The same result can be seen to

be true if we consider the dependency function. Since the dependency function of .

this relation will have 271 elemenb where n is the numbt>r of attributes, we will have

271 subsets of n as the saturated sets which will correspond to a hoolt~an lattice.

Prime Implicants

The prime implicants of the dependency func~tiori give the eutire set of keys

along with simple dependencies which are implied by the given set of dependencies.

Consider the following relation

A B c ao b1 CJ

ao b2 CJ

ao b3 cz O.z b1 c3


having the dependencie:-:; AB ~ C and C ~ A. TlH~ corn~sponding dq)enclency

function i:-:; a.bc +ca. The prinw implicants a};, br, and i'!t. can he obtained by adding

abc to the dependency function. Tlm:-:; we ohtaiu tlw lloru fund.ion

h = ab + be + ca.

The three terms in the horn function rein·esents all tile clqwnd<·'IH~ies iu that rdation,

i.e., AB ~ C, BC ~ A, and C ~ A. But the dqwiHleury BC ~ A is a logical

consequence of the smaHer dependency C ~ A. The fully complemented terms in - -

the horn function ab and be give the keys AB and BC. Any rdatiou having the

dependencies will have AB and BC as their keys. Thus

* A lattice can be uniquely represented by the corresponding horn function.

Consider another example of a relation correspondiug to tlw box lattice having

dependencies C ~ A, D ~ B, AD ~ C, and B ~ CD. We have the dependency

function

f = ca + (Ib + a.clc + bed.

Adding the fully complemented term abed, we obtain the horn function

h = ca + db + adc + bed + a.bcri

= ari + be+ C(I + ca. + rlh.

The keys of a relation having the above set of dependencies are AD, BC, and CD.

Legitimate joins

In the first section, the terms legitimate join and lossless decomposition are

defined. We will now discuss an algorithm to check whdlwr a given join is legitimate

or not for a set of dependencies. In other words, the following algorithm checks

whether a. join dependency is a. valid one for a. relation or not. We will <~xpla.in this

algorithm using the previous rdation having the depeudeucies C ~ A a.ud

Databases aud Lattices 61

AB ~ C. Let tlw attribute ll<tllH's lw X 1 ,X't, all(l X:1 iusteacl of A, B, awl C. Giveu

the dependencies XI x'l ~ X:l awl X:~ -----t x, tlw problem is to dwck vvlwtlwr t.lw

join XI x3 * x'lx3 is legitimat.<·. The algorithm has Hw followiug steps.

o Write the dep(-~udency fundi on as

o Take the comp1enwnt <>f the < lepeudency fnudion f awl call it. .IJ. Tlms

= :r 1 a:2.-r:1 + J~ 1 :r 2 :r:1 + :r 1 :/~2:r:1 + :1: 1:r'L:l::1

o Multiply the function !l by the attributes iu tlw projections of tht-~ join

and name them fJJ:l, :rJn, a.ncl [/:3 (where y:1 is tlw common part of the

projections.)

For the projt~d.ions XI x3 aud x2x3'

= :r 2 :r:3 + :r 1 :r:l

= :t1 :~:2:1:3 + :!:1 :r2:r:l + :r, :1:2:r:1

!f2:l = g(:r2 + :r:l)

= :r'L:r3 + :1:1 :r2 + :rl :r:l + :1: I :r2:!::l

= :r 1 :r2 :c 3 + :l: 1 :r 2 :r3 + :r1 :!: 2 :r:1 + :1: 1:r'L:l::3

g3. = g(:r:l)

= :r2:r:l +:I: I :r2x:l + :rl :I":J

= :r1x2:r3 + i1 x2x:1 + :r1 :l:2:r::1

o Take the Hamming W<~ight of g and the projt~dious (Hamming weight of

the function is the numh<'r of mintenns of the eqnatiou). Ld th<~ Hamming

weight of g lw H, that of f}J:l he H J:l, that of .rf'L:1 he H n, and that of g:l lw

H3. For this example we have H = 4, H1 :~ = 3, H2 :1 = 4, aud H:1 = 3. We

Da·tabases and Lattices 62

say that if th~ Hamming w~ight equationH = H1:1 +H23- H'.l is sa.ti:sfied,

the given join is legi tima.te. Note that this Hamming w1~ight equation is a

dir~ct observation obta.irwd from the incln.~ion cxclu.~ion principle. Thus

The following are some of the corollaries:

* If we consider any' other set of dependencies which is a. superset of th(~

above set, the same join will again be legitimate.

Now let us dwck if the 'join X 1 X 2 * X 2 X.3 is legitimate.

912 = :t1:r2:r:1 + :r1:r2:r3 + :f1:r2:r3 + :r1:l:2:r:1

9:l = x1:r2::r3 + i1:r2:t3 + x1:r2:z::1

The Hamming weights H = 4, H12 = 4, H23 = 4~ and H:1 = 3. In this case

the Hamming weight equation H = H 12 + H2 3 - H2 does not satisfy. Therefore,

xl x2 * x'l.x.J is not a legitimate join.

Boolean Algebra and MVDs

In this section, representation of MVDs in terms of boolean algebra is clisct1ssed.

The MVD ABC --H DE in the relation !R(A, B, ···,X) can be represented b_Y the

entropy equation

H(ABC(D + E)(F + .. ·+X))= 0

(The last factor consists of all those variables which have not appeared in the state

ment of the MVD). The dependency can be stated as, the variables {DE jAB C} and

{F · · · XjABC} are independent. In other words, variables {D, E} and {F, ···,X}

are independent when {A, B, C} is given. Further the entropy of every minterm of

ABC(D + E)(F +···+X)


IS zero. This follows from a slight generalization of the fact that

This implies that {A 1, Az, · · · , Ar} and { Ar+ 1, · · · , A11 } are independent and hence

the entropy of every mintern of

IS zero. From the above statement it follows that the boolean fundiou ABC( D +

E)( F +···+X) can represent an MVD. To fix these ideas we give a simple example

of a relation where MVD B ----++ C holds and we show that H(BAC.) = 0. The

relation is shown below.

Rent A Ocupant B Pet C 50 Nair Lion 50 Mehta Deer 50 Mehta Snake 75 Taneja Mouse

500 Mathur Fish 500 Menon Dog 500 Menon Rhino 600 Chhibber Cat

Assigning the probability of 0.125 for each one of the tuples occuring in the table,

we get

so that

H(A +B +C)= 3

H(A +B)= 5/2

H(B +C)= 3

H(B) = 5/2

H(BAC) = H(A +B)+ H(B +C)- H(A + B +C)- H(B)

= 5/2 + 3- 3- 5/2 = 0


From the above results we conclude that any set of dependencies cau be represented

by a boolean functiou. For example, if R( A, B, C) has the dqwndenci<~s A - B,

and C ___. A in it, we can write the boolean function as

f = a.bc + ca

Boolean. Dependencies

We have seen that boolean algebra plays an important. role in defining depen

dencies. But we notice that many boolean functions like ABCDE h.a.ve not yet

been defined. Here we attempt to describe the dependencies corresponding to these

boolean terms. We call this kind of dependency a boolean dependency. The cor

responding dependency for boolean term ABCDE is given by AB ___. CDE. Now,

since we know that the entropy of dependencies is equal to zero, entropy of the

above boolean term is H(ABCDE) = 0. Expanding the left hand side, we get

H(A + B +C)+ H(A + B +D)+ H(A + B +E)- H(A. + B + C +D)- H(A +

B + C +E)- H(A + B + D +E)+ H(A + B + C + D +E)- H(A +B)= 0.

Or,

H(A + B + C +D) +H(A + B + C +E)+ H(A + B + D +E)+ H(A +B)

= H(A + B + C + D +E)+ H(A + B +C)+ H(A + B +D)+ H(A + B +E)

On looking at the relation again, we see that the above system gives natural join ' ' '

equation

ABD * ABCE * ABDE = ABCDE

Thus we find that the given boolean dependency can be represented as the join of

particular relations. Now, corresponding to zero boolean function i.e., mintern with

all terms uncomplemented, as in ABC for instance, we find there are many join

dependencies which imply all relations whose join does not give the original relation,

i.e., there is a class of joins for which the boolean dependency is zero. Adding the


relation, i.e., there 1s a class of joins for which the boolean dependency is zero.

Adding the zero boolean depend<~ncy does not, therdore, alter t.lw orip;inal clepen

dency, and eorrespondinp; to a boolean function w<-~ thus find a class of joins. So,

corresponding to every miutern there is an equiva.lem~<' class of joins i.e., <~very one

of the joins gives this mintern.

In the next section, normal forms of relations are discussecl: Vv<~ have con

centrated on the most important form, Boyce-Codd Normal Form (BCNF) and

algorithms are given to .clt~compose a relation into BCNF.

2.5 Normal Forms

Data normalization is one of tlw most important concepts in the design as

pect of databases. It is important to keep the relations in normalized forms for

better storage mid retrieval of data. Tht::Te are many algorithms available in the

literature[5,6,17,21,22,2G]. In this section we discuss these normal forms. Moreover,

algorithms are given to decompose a relation into BCNF which are based on the

lattiees and boolean algebra concepts discussed in earlier sections.

• A relation is said to be in fir$t norma.! form ( lNF) if the domain of ev

ery attribute iu the relation is a simple set. If the determinant of every

nonprime attribute is an superkey then, the relation i:-; in 8econd norrna.l

form (2NF). If the determinant of every nonprime attribute is a k<~y, then

the relation is said to be in third normal form ( 3NF). A rel-ation lS Ill

Boyce-Codd normal form (BCNF) if every determinant is a key.

Note that in relational database theory we consider relations in lNF only. The

other two important .normal forms related to MVDs are defined below.

• A ~elation is in fo11.rth normal form ( 4NF) if whenever U -H V Is a

Databases nnd Lattices 1>6

is a logical consequence of the set of key dependencies of the schema. A

relation is in projection-join normal form (P.JNF) sometimes called fifth

normal form (5NF) if every .JD is a lo,e;ica.l c<>HS<~qncnce of the set of key

dependencies of the relation.

Now we discuss BCNF and give two algorithms to decompose a relation into . .

BCNF.

2.5.1 Boyce-Codd N annal Fonn .

In this section we discuss this particular normal form in <letail and give some

algorithms to decompose a relation into BCNF[16]. A procedure is given to decom

pose a relation having only functional dependencies into BCNF. This procedure is

based on boolean algebra theory discussed earlier.

Algorithm 1:

First, we take an example to illustrate the decomposition procedure. Consider

a relation in a database in which there are four attributes {A, B, C, D} and following

dependencies: C --+ A, D --+ B, AD --+ C, BC --+ D. Write down the dependency

function f and reduce it to the minimal boolean function. In this case

f = ca + db+ ii.dc + bed.

To facilitate the decomposition it is necessary to list all the keys of the relation.

As we have seen earlier these are the fully complemented terms in the function

where Xi's are the attributes of the relation. Note that if all the terms in the prune

implicants are keys then the relation is already in BCNF. The keys of the above

relation are AD, BC, and CD.


The BCNF decomposition can be obtained as follows. Choose one of tlw small

est terms in .f. If then~ is more than one term with the s<une number of attributes,

then choose the term which has tlw lexically smallest dependent at.trilmte. H<·n-· we

choose CA. Delete all the terms in .f having all the attributes of this term. Thus

J =db+ bed.

Delete all the terms from the list of keys having the dependent a.ttrihut,c of the

chosen term in it. Here we delete AD. Now choose the next smallest term BD

and delete bed from .f and BC from the keys. Since there are no terms left in f

list out the terms which are left in keys and aU the chosen terms to obtain the

decomposition. Thus a BCNF of the above relation will be CD, AC, BD. It is dear

from the above procedure that we cannot choose a term in f which has a. smaller

dependency within it. Thus all the choser1 terms having, say, k attributes, will have

determinants of size k- 1 and hence are in BCNF.

From the above example, the general procedure for obtaining a BCNF decom-

position of a relation can be given as follows:

o From minimal dependency function list a.ll terms.

o From the prime implicants list all keys.

o Choose one of the smallest terms in the given dependency function making

use of lexical order in case of conflicts.

o Delete all the terms in the dependency function having all the attributes

of the chosen term.

o Delete all the terms in the list of keys having the dependent attribute of

the chosen term.

o Repeat the above steps until no term is left in the dependency f,·mction.

o List out all terms whid1 are left in the keys along with the chosen terms.

o Remove all terms which are subsets of other terms listed to obtain a

BCNF.

Dat·abases .and Lattices- 68

Consider another example of a. relation which ha:-; the followinp; set of depen

dencies: C -t A, and AB -t C. Note that this corresponds to the rwutagou lattice.

We have f = ca +abc. Adding the fully eomplenH~ntl~d term ii.hc, we ohta.in the keys

AB and BC. By the third step of the above algorithm, choose the smalle:-;t term

AC. When we- choose the this term from f and delete AB from list of keys, we will

exhaust all terms of J and we obtain the BCNF decomposition AC and BC.

Algorithm 2:

Now we give a procedure for BCNF decomposition basefl on lattice theory. A

relation is said to be in reduced form if all the equivaleut attributes are removed.

For example, in a relation with dependencies A -t Band B -t A, we retain only one

attribute between A and B. For this algorithm we consider only relations in reduced

form. As in the first ,algorithm we take an example to illustrate the decomposition

procedure. Consider a. relation in a database in which there are four attributes

{A, B, C, D} and the following dependencies: C -t A, D -t B, BC -t D, AD -t C.

The lattice which represents. the above dependencies is shown in Fig. 2.2l.a. The

nodes of the Hasse diagram are designated by tlw corresponding saturated sets

and all the edges in the diagram are assumed to have anows pointing downwards.

We saw in earlier sections that the saturated sets are closed under intersection and

they by themselves completely define the lattice.

To facilitate the decomposition it is necessary to relabel the Hasse diagram

as follows. Starting from the topmost node, from each node label remove all the

attributes which occur in the labels of its children. The· resultant diagram is shown

in Fig. 2.21.b.


ABCD

AC BD c D

A B A B

1

Fig. 2.2l.a Fig. 2.2l.h

After relabeling, every node with outdeg1_·ee on<-' will have exactly ow-~ distinct at

tribute. Other nodes may or may not have a label. Now remove all unlabeled nodes

except the topmost node by maintaining all existing paths. Here, the middle node

can be removed by drawing edges fro~u the topmost node to node A and node B.

Since there are no paths through the lower most node, it can he removed without

drawing any edges. For obtaining the decomposition, all the transitive edges are to

be removed. The resultant partial order is shown in Fig. 2.22.

c D

A B

Fig. 2.22


Using Fig. 2.22 it is easy to decompose the relation into BCNF. For each nocle,

collect the labels of its children and add them to its label and list them to obtain

the BCNF. Thus a decomposition for om example is CD, AC, BD. Note that we

need not include single attributes in the BCNF. The projection corresponding to

the topmost node CD is a key of the relation, hence obviously in BCNF. In every

oth-er projection with two attributes there will be one ddenninant. which is the key

of that particular relation. In all other projections with more than two attributes,

there will be two determinants: one is the parent node label and the other is the

set of all its children which determines the parent node. Hence all the projections

obtained will be in BCNF.

From the above example, the gerwral procediue for the decomposition of a

i·elation can be given as follows:

o From the saturated sets of the relation, draw the Hasse diagram of the

lattice.

o Starting from the topmost node, from each node label remove all the

attributes which occur in the labels of its children. Redraw the Hasse

diagram.

o Remove all unlabeled nodes except the topmost node, by maintaining all

existing paths.

o Remove all transitive edges to obtain a partial order.

o List the projections corresponding to each node after adding all the labels

of its children to its label to obtain a BCNF decomposition.

Finally, if we observe that in each projection the key is in its parent projection,

it becomes clear that the join of our decomposition will always give the original

relation back. The decomposition is obviously unique.

It can be easily verified that this algorithm works for the pentagon lattice too.

After relabeling and reducing the Hasse diagram we get the following partial order.

Databases and· Lattices 71

c B

A

Fig. 2.23

For each node, collecting the labels of its children, we obtain the Boyce-Codd

Normal Form decomposition as BC and AC.

0

In this chapter we have seen how relational database theory is closely connected

with lattices. We have also given two algorithms to find a BCNF decomposition of

a relation. Though we have consiclei·ed only reduced relations in the case of lattice

algorithm·, we can extend this algorithm to accomodate nonreduced relations as

well. From the discussion it should be clear that thest-~ discrete structures can be

extensively used in the study of databases.

Documents

Databases and Lattices - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/16656/8/08_chapter 2.pdf · For example, a Maruti 800 car with a registration number KGX 6018 is an entity