CS 222 Database Management System
Spring 2010-11
Lecture 5 Database Design (Decomposition)
Korra Sathya Babu
Department of Computer Science
NIT Rourkela
• Database design is needed to reduce redundancy and anomalies
• The theory of functional dependencies has been covered
• Better design requires schema refinement
• A solution for schema refinement is synthesis of relations
04/20/23 Database Design 2
Recap
Relation Decomposition
[Figure: relation R decomposed into R1 and R2, with regions labelled R - X+, X and X+ - X]
• Reason for decomposition
  • A solution for reducing redundancy and anomalies
• Rules for synthesis
  • Lossless join (information preservation)
  • Dependency preservation (a special case of information preservation)
• Decomposition (synthesis) types
  • By functional dependency
  • By multi-valued dependency
  • By join dependency
Relation Decomposition
• Definition: A decomposition D = {R1, R2, ..., Rm} of R has the lossless join property with respect to the set of dependencies F on R if, for every relation r of R that satisfies F, the following holds:
  ⋈(π_R1(r), ..., π_Rm(r)) = r
  where π_Ri(r) is the projection of r onto Ri and ⋈ is the natural join of all the relations in D
• The word loss in lossless refers to loss of information, not to loss of tuples.
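The definition can be tried out on a single relation instance. The sketch below is only illustrative: the helper names (`row`, `project`, `natural_join`, `rejoins_to_original`) and the instance r(A, B, C) are assumptions, not part of the lecture. Note that the definition quantifies over every legal r, so passing this check on one instance does not prove the property; failing it does show that the decomposition is lossy.

```python
from functools import reduce

def row(**values):
    """Build one tuple as a hashable set of (attribute, value) pairs."""
    return frozenset(values.items())

def project(relation, attrs):
    """Projection onto attrs; duplicates disappear because the result is a set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}

def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    joined = set()
    for t in r:
        for u in s:
            merged = dict(t)
            if all(merged.get(a, v) == v for a, v in u):
                merged.update(u)
                joined.add(frozenset(merged.items()))
    return joined

def rejoins_to_original(relation, schemas):
    """True if joining the projections onto each schema gives back the relation."""
    parts = (project(relation, s) for s in schemas)
    return reduce(natural_join, parts) == set(relation)

# Hypothetical instance r(A, B, C), chosen only for illustration.
r = {row(A=1, B="x", C="p"), row(A=2, B="x", C="q")}

print(rejoins_to_original(r, [{"A", "B"}, {"A", "C"}]))  # True
print(rejoins_to_original(r, [{"A", "B"}, {"B", "C"}]))  # False
```

In the second call the join produces four tuples instead of two: no tuple is lost, but the extra (spurious) tuples mean information about which A goes with which C is lost.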
Lossless Join
Input: A relation R, a decomposition D = {R1, R2,..., Rm} of R, and a set F of Functional Dependencies
Test for Lossless Join
Lossless Join Test Algorithm:
Step 1: Create an initial matrix S with one row i for each relation Ri in D, and one column j for each attribute Aj in R.
Step 2: Set S(i, j) := bij for all matrix entries.
Step 3: For each row i representing relation schema Ri do
  { for each column j representing attribute Aj do
    { if relation Ri includes attribute Aj then set S(i, j) := aj; } }
Test for Lossless Join
Lossless Join Test Algorithm (continued):
Step 4: Repeat the following loop until a complete loop execution results in no changes to S.
  { for each functional dependency X → Y in F do
    for all rows in S which have the same symbols in the columns corresponding to attributes in X do
    { make the symbols in each column that corresponds to an attribute in Y be the same in all these rows, as follows:
      if any of the rows has an “a” symbol for the column, set the other rows to that same “a” symbol in the column.
      If no “a” symbol exists for the attribute in any of the rows, choose one of the “b” symbols that appear in one of the rows for the attribute and set the other rows to that same “b” symbol in the column; } }
Step 5: If a row is made up entirely of “a” symbols, then the decomposition has the lossless join property;
otherwise it does not.
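The five steps can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: the function name `is_lossless` and the encoding of symbols as tuples are assumptions made for illustration. One deliberate variation: when two symbols are equated, the sketch replaces them everywhere in that column (the usual chase formulation) rather than only in the rows being compared.

```python
def is_lossless(attributes, decomposition, fds):
    """Matrix test for the lossless join property.

    attributes:    list of attribute names of R
    decomposition: list of attribute sets, one per Ri
    fds:           list of (X, Y) pairs of attribute sets, meaning X -> Y
    """
    # Steps 1-3: S[i][A] is the "a" symbol if Ri contains A,
    # otherwise a "b" symbol that is unique to row i and column A.
    S = [{A: ("a", A) if A in Ri else ("b", i, A) for A in attributes}
         for i, Ri in enumerate(decomposition)]

    changed = True
    while changed:                      # Step 4: loop until nothing changes
        changed = False
        for X, Y in fds:
            # Group the rows that carry the same symbols in the X columns.
            groups = {}
            for r in S:
                groups.setdefault(tuple(r[A] for A in sorted(X)), []).append(r)
            for rows in groups.values():
                for A in Y:
                    symbols = {r[A] for r in rows}
                    if len(symbols) > 1:
                        # "a" sorts before "b", so an "a" symbol wins if present.
                        target = min(symbols)
                        for r in S:
                            if r[A] in symbols:
                                r[A] = target
                        changed = True

    # Step 5: lossless iff some row consists only of "a" symbols.
    return any(all(r[A][0] == "a" for A in attributes) for r in S)


attrs = ["SSN", "ENAME", "PNUM", "PNAME", "PLOCATION", "hours"]
F = [({"SSN"}, {"ENAME"}),
     ({"PNUM"}, {"PNAME", "PLOCATION"}),
     ({"SSN", "PNUM"}, {"hours"})]

# Decomposition of Example 1
D1 = [{"SSN", "ENAME"}, {"PNUM", "PNAME", "PLOCATION"}, {"SSN", "PNUM", "hours"}]
# Decomposition of Example 2
D2 = [{"ENAME", "PLOCATION"}, {"SSN", "PNUM", "hours", "PNAME", "PLOCATION"}]

print(is_lossless(attrs, D1, F))  # True
print(is_lossless(attrs, D2, F))  # False
```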
Example 1
Emp_PROJ(SSN, PNUM, hours, ENAME, PNAME, PLOCATION)
F = {SSN → ENAME, PNUM → {PNAME, PLOCATION}, {SSN, PNUM} → hours}
R1(SSN, ENAME)
R2(PNUM, PNAME, PLOCATION)
R3(SSN, PNUM, hours)
Example 1
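Columns: A1 = SSN, A2 = ENAME, A3 = PNUM, A4 = PNAME, A5 = PLOCATION, A6 = hours

Initial matrix S (step 2):
      SSN   ENAME  PNUM  PNAME  PLOCATION  hours
R1    b11   b12    b13   b14    b15        b16
R2    b21   b22    b23   b24    b25        b26
R3    b31   b32    b33   b34    b35        b36

Matrix S after step 3:
      SSN   ENAME  PNUM  PNAME  PLOCATION  hours
R1    a1    a2     b13   b14    b15        b16
R2    b21   b22    a3    a4     a5         b26
R3    a1    b32    a3    b34    b35        a6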
Example 1
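Matrix S after applying SSN → ENAME (R1 and R3 agree on SSN, so b32 becomes a2):
      SSN   ENAME  PNUM  PNAME  PLOCATION  hours
R1    a1    a2     b13   b14    b15        b16
R2    b21   b22    a3    a4     a5         b26
R3    a1    a2     a3    b34    b35        a6

Matrix S after applying PNUM → {PNAME, PLOCATION} (R2 and R3 agree on PNUM, so b34 and b35 become a4 and a5):
      SSN   ENAME  PNUM  PNAME  PLOCATION  hours
R1    a1    a2     b13   b14    b15        b16
R2    b21   b22    a3    a4     a5         b26
R3    a1    a2     a3    a4     a5         a6

Row R3 now consists entirely of “a” symbols, so the decomposition is lossless.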
Example 2
Emp_PROJ(SSN, PNUM, hours, ENAME, PNAME, PLOCATION)
F = {SSN → ENAME, PNUM → {PNAME, PLOCATION}, {SSN, PNUM} → hours}
R1(ENAME, PLOCATION)
R2(SSN, PNUM, hours, PNAME, PLOCATION)
Example 2
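Columns: A1 = SSN, A2 = ENAME, A3 = PNUM, A4 = PNAME, A5 = PLOCATION, A6 = hours

Initial matrix S (step 2):
      SSN   ENAME  PNUM  PNAME  PLOCATION  hours
R1    b11   b12    b13   b14    b15        b16
R2    b21   b22    b23   b24    b25        b26

Matrix S after step 3:
      SSN   ENAME  PNUM  PNAME  PLOCATION  hours
R1    b11   a2     b13   b14    a5         b16
R2    a1    b22    a3    a4     a5         a6

Applying SSN → ENAME, PNUM → {PNAME, PLOCATION} and {SSN, PNUM} → hours changes nothing, because the two rows never agree on the left-hand side of any FD. No row consists entirely of “a” symbols, so the decomposition is lossy.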
• Check whether the following decompositions are lossy or lossless
• Let R = ABCDE, R1 = AD, R2 = AB, R3 = BE, R4 = CDE, R5 = AE. Let F = {A → C, B → C, C → D, DE → C, CE → A}
• R(XYZWQ), FD = {X → Z, Y → Z, Z → W, WQ → Z, ZQ → X}. R1(XW), R2(XY), R3(YQ), R4(ZWQ), R5(XQ)
• R(XYZ), F = {X → Y, Z → Y}. R1(XY), R2(YZ)
• R(XYWZPQ), D = {R1(ZPQ), R2(XYZPQ)}, F = {XY → W, XW → P, PQ → Z, XY → Q}
Problems
R was decomposed (normalisation) into R1, …, Rn
S: the set of FDs for R
S1, …, Sn: the sets of FDs for R1, …, Rn (each Si refers to only the attributes of Ri)
S' = S1 ∪ … ∪ Sn (usually, S' ≠ S)
The decomposition is dependency preserving if S'+ = S+
Dependency Preservation
Test for Dependency Preservation
Dependency Preservation Test:
Input: a decomposition D = {R1, …, Rk} and a set of FDs F
Step 1: For each X → Y ∈ F, initialize a set T of attributes with the attributes of X (the determinant of the FD under consideration), i.e. set T = X, and continue with step 2.
Step 2: Repeat step 3 until the set T no longer changes. When T no longer changes, continue with step 4.
Step 3: For each relation Ri (1 ≤ i ≤ k) of the input decomposition, apply the corresponding Ri operation (on the set of attributes T with respect to the set of dependencies F), i.e. T = T ∪ ((T ∩ Ri)+ ∩ Ri).
Step 4: Test whether Y (the right-hand side of the FD under consideration) satisfies Y ⊆ T. If Y is not a subset of T, stop the execution of the algorithm and report that the decomposition does not preserve the FD. If Y ⊆ T, then X → Y ∈ G+, where G is the union of the FDs that hold on the individual relations Ri. If there are other FDs in F that have not been considered yet, repeat from step 1 with one of them. If no FDs remain, report that the decomposition preserves F.
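A compact Python sketch of this test is given below. The function names (`closure`, `preserves`, `is_dependency_preserving`) and the two sample inputs are illustrative assumptions. The update used in the loop is T = T ∪ ((T ∩ Ri)+ ∩ Ri), where the closure is taken with respect to F.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (X, Y) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for X, Y in fds:
            if X <= result and not Y <= result:
                result |= Y
                changed = True
    return result

def preserves(fd, decomposition, fds):
    """Steps 1-4 for a single FD X -> Y."""
    X, Y = fd
    T = set(X)                                   # Step 1
    while True:                                  # Step 2
        before = set(T)
        for Ri in decomposition:                 # Step 3
            T |= closure(T & Ri, fds) & Ri
        if T == before:
            return Y <= T                        # Step 4

def is_dependency_preserving(decomposition, fds):
    """The decomposition preserves F if every FD in F passes the test."""
    return all(preserves(fd, decomposition, fds) for fd in fds)

# Illustrative input 1: R(ABCD), F = {A -> B, C -> D}, D = {AB, CD}
F1 = [({"A"}, {"B"}), ({"C"}, {"D"})]
print(is_dependency_preserving([{"A", "B"}, {"C", "D"}], F1))   # True

# Illustrative input 2: R(XYZ), F = {Z -> X, XY -> Z}, D = {XY, XZ}
F2 = [({"Z"}, {"X"}), ({"X", "Y"}, {"Z"})]
print(is_dependency_preserving([{"X", "Y"}, {"X", "Z"}], F2))   # False
```

In the second input, Z → X survives inside XZ, but XY → Z cannot be checked within either relation, so T never grows beyond {X, Y}.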
Problems
1. Given R(XYZ) and the set F = {Z → X, XY → Z}. Check if the decomposition R1(XY) and R2(XZ) preserves the set F.
2. Given R(ABCD) and the set F = {A → B, C → D}. Check if the decomposition R1(AB) and R2(CD) preserves the set F.
3. Determine if the decomposition D = {R1(XY), R2(YZ), R3(ZW)} of the relation R(WXYZ) preserves the dependencies of the set F = {X → Y, Y → Z, Z → W, W → X}.
4. Given R(ABCDEF) and the set F = {A → B, C → DF, AC → E, D → F}. Check if the decomposition R1(ACE), R2(CD), R3(DF) and R4(AB) preserves the set F.
• Normalization is the process of successively reducing a given set of relations to a better form (with less redundancy and fewer anomalies)
• The level of normalization to aim for depends on the workflow (a trade-off between fast access and maintaining integrity)
• Assumes that all possible functional dependencies are known
  • First construct a minimal set of FDs
  • Then apply algorithms that construct the required normal form
• Additional criteria may be needed to ensure that the set of relations in a relational database is satisfactory
Normalization
• A relation is in first normal form (1NF) if it does not contain any repeating columns or repeating groups of columns
• Converting to 1NF turns complex data structures into simpler, more stable data structures
• A relvar is in 1NF if and only if, in every legal value of that relvar, every tuple contains exactly one value for each attribute
• First Normal Form (1NF)
  • Unique rows
  • All attributes are atomic
1 NF
• A table is in the second normal form (2NF) if it is in the first normal form and if all non-key columns in the table depend on the entire primary key
• The following relation is in 1NF but not 2NF
2 NF
EMPLOYEE2(Emp_ID, Name, Dept, Salary, Course, Date_Completed)
Functional dependencies:
1. Emp_ID → Name, Dept, Salary (partial key dependency)
2. Emp_ID, Course → Date_Completed

Decompose into 2NF:
EMPLOYEE1(Emp_ID, Name, Dept, Salary)
Functional dependencies: Emp_ID → Name, Dept, Salary
EMPCOURSE(Emp_ID, Course, Date_Completed)
Functional dependency: Emp_ID, Course → Date_Completed
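Partial key dependencies can be found mechanically by computing the closure of every proper subset of the key. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, and the relation is assumed to have a single candidate key, so every attribute outside the key counts as non-key.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (X, Y) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for X, Y in fds:
            if X <= result and not Y <= result:
                result |= Y
                changed = True
    return result

def partial_dependencies(key, fds):
    """Map each proper subset of the key to the non-key attributes it determines."""
    key = set(key)
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            dependents = closure(subset, fds) - key
            if dependents:
                found[frozenset(subset)] = dependents
    return found

employee2_fds = [({"Emp_ID"}, {"Name", "Dept", "Salary"}),
                 ({"Emp_ID", "Course"}, {"Date_Completed"})]
empcourse_fds = [({"Emp_ID", "Course"}, {"Date_Completed"})]

# EMPLOYEE2: Emp_ID alone determines Name, Dept and Salary, so it is not in 2NF.
print(partial_dependencies({"Emp_ID", "Course"}, employee2_fds))
# EMPCOURSE: no partial dependency remains.
print(partial_dependencies({"Emp_ID", "Course"}, empcourse_fds))
```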
• A table is in the third normal form (3NF) if it is in the second normal form and if all non-key columns in the table depend non-transitively on the entire primary key
3 NF
SALES(Customer_ID, Customer_Name, SalesPerson, Region)
Functional dependencies:
1. Customer_ID → Customer_Name, SalesPerson, Region
2. SalesPerson → Region
Transitive dependency: Customer_ID → SalesPerson → Region

Decompose into 3NF:
SALES1(Customer_ID, Customer_Name, SalesPerson)
Functional dependencies: Customer_ID → Customer_Name, SalesPerson
SPERSON(SalesPerson, Region)
Functional dependency: SalesPerson → Region
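The 3NF condition can be checked with the commonly used formulation: for every non-trivial FD X → Y, either X is a superkey or every attribute of Y - X is prime (part of some candidate key). The sketch below uses that formulation, which is an assumption on top of the slide's wording; the helper names are hypothetical and the candidate keys are found by brute force, which is only practical for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (X, Y) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for X, Y in fds:
            if X <= result and not Y <= result:
                result |= Y
                changed = True
    return result

def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            # Smaller keys are found first, so a superset of a key is skipped.
            if closure(s, fds) == attributes and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def third_nf_violations(attributes, fds):
    """FDs X -> Y where X is not a superkey and Y has non-prime attributes."""
    prime = set().union(*candidate_keys(attributes, fds))
    violations = []
    for X, Y in fds:
        non_prime = Y - X - prime
        if closure(X, fds) != set(attributes) and non_prime:
            violations.append((X, non_prime))
    return violations

sales = {"Customer_ID", "Customer_Name", "SalesPerson", "Region"}
sales_fds = [({"Customer_ID"}, {"Customer_Name", "SalesPerson", "Region"}),
             ({"SalesPerson"}, {"Region"})]
print(third_nf_violations(sales, sales_fds))    # [({'SalesPerson'}, {'Region'})]

sales1 = {"Customer_ID", "Customer_Name", "SalesPerson"}
sales1_fds = [({"Customer_ID"}, {"Customer_Name", "SalesPerson"})]
print(third_nf_violations(sales1, sales1_fds))  # []
```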
• A table is in Boyce-Codd normal form (BCNF) if every determinant, i.e. every column (or set of columns) on which some other column is fully functionally dependent, is a candidate key of the table
• A table is in BCNF if the only determinants in the table are the candidate keys
BCNF
SCHOOL(Student, Subject, Teacher)
Functional dependencies:
1. Student, Subject → Teacher
2. Student, Teacher → Subject
3. Teacher → Subject

Decompose into BCNF:
SCHOOL1(Student, Teacher)
SCHOOL2(Teacher, Subject)
The common attribute Teacher determines Subject, so the join of SCHOOL1 and SCHOOL2 is lossless.
All functional dependencies are lost except Teacher → Subject
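The statement "the only determinants are the candidate keys" can be tested by checking that the left side of every non-trivial FD is a superkey. The sketch below is illustrative, with hypothetical helper names. It inspects only the FDs as given, which is sufficient for testing the original relation against F but not for testing a projected sub-relation.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (X, Y) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for X, Y in fds:
            if X <= result and not Y <= result:
                result |= Y
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose left side is not a superkey of the relation."""
    all_attrs = set(attributes)
    return [(X, Y) for X, Y in fds
            if not Y <= X and closure(X, fds) != all_attrs]

school = {"Student", "Subject", "Teacher"}
school_fds = [({"Student", "Subject"}, {"Teacher"}),
              ({"Student", "Teacher"}, {"Subject"}),
              ({"Teacher"}, {"Subject"})]

print(bcnf_violations(school, school_fds))  # [({'Teacher'}, {'Subject'})]
```

Teacher is a determinant but not a candidate key, which is why SCHOOL is in 3NF (Subject is a prime attribute) but not in BCNF.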
• It is always possible to decompose a relation into relations in 3NF such that:
  • the decomposition is lossless
  • the dependencies are preserved
• It is always possible to decompose a relation into relations in BCNF such that:
  • the decomposition is lossless
  • but it may not be possible to preserve dependencies
  • but it may eliminate more redundancy
Comparison between 3NF and BCNF
Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued dependency
  α →→ β
holds on R if in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:
  t1[α] = t2[α] = t3[α] = t4[α]
  t3[β] = t1[β]
  t3[R - β] = t2[R - β]
  t4[β] = t2[β]
  t4[R - β] = t1[R - β]
• An MVD is a tuple-generating dependency
Multivalued Dependency
• A table is in the fourth normal form (4 NF) if it is in BCNF and does not have any independent multi-valued parts of the primary key
• If, for a given value of attribute A, there exists a set of multiple values of attribute B, then we say that an MVD exists between A and B (A →→ B)
• The normal forms after BCNF are mainly of theoretical interest
4 NF
Student Table
4 NF
Student   Subject      Language
Geeta     Mythology    English
Geeta     Psychology   English
Geeta     Mythology    Hindi
Geeta     Psychology   Hindi
Shekher   Gardening    English

Student →→ Subject
Student →→ Language
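Whether an MVD holds on a given instance can be checked directly from the t1, t2, t3 condition of the MVD definition: for every pair of tuples that agree on α, the tuple that takes β from t1 and everything else from t2 must also be present. The sketch below applies this to the Student table; the function name `satisfies_mvd` is an assumption, and a check on one instance only shows that the instance is consistent with the MVD.

```python
def satisfies_mvd(rows, alpha, beta):
    """Check alpha ->> beta on an instance given as a list of dicts."""
    existing = {tuple(sorted(r.items())) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in alpha):
                # t3 takes its beta values from t1 and all other values from t2.
                t3 = {**t2, **{b: t1[b] for b in beta}}
                if tuple(sorted(t3.items())) not in existing:
                    return False
    return True

student = [
    {"Student": "Geeta", "Subject": "Mythology", "Language": "English"},
    {"Student": "Geeta", "Subject": "Psychology", "Language": "English"},
    {"Student": "Geeta", "Subject": "Mythology", "Language": "Hindi"},
    {"Student": "Geeta", "Subject": "Psychology", "Language": "Hindi"},
    {"Student": "Shekher", "Subject": "Gardening", "Language": "English"},
]

print(satisfies_mvd(student, {"Student"}, {"Subject"}))    # True
print(satisfies_mvd(student, {"Student"}, {"Language"}))   # True

# Dropping one of Geeta's rows breaks the MVD: the table would then be
# missing a tuple that the dependency requires (hence "tuple generating").
print(satisfies_mvd(student[:3] + student[4:], {"Student"}, {"Subject"}))  # False
```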
4 NF
Split the independent multi-valued components of the primary key into two tables.
The primary key of the original table is (Student, Subject, Language).

Student_Subject Table
Student   Subject
Geeta     Mythology
Geeta     Psychology
Shekher   Gardening

Student_Language Table
Student   Language
Geeta     English
Geeta     Hindi
Shekher   English

Here we take care of the update anomaly.
• There exist relations that cannot be nonloss-decomposed into two projections, but can be nonloss-decomposed into three or more
Surprise: Lossless Decomposition
• Definition: A relation R satisfies the join dependency (JD) *(X, Y, …, Z) iff R is equal to the join of its projections on X, Y, …, Z, where X, Y, …, Z are subsets of the set of attributes of R.
• Consider the following table of Suppliers (S), Parts (P) and the Locations they supply (L)
SPL Table
Join Dependency
S    P    L
S1   P1   L2
S1   P2   L1
S2   P1   L1
S1   P1   L1

Actual decomposition:

S    P
S1   P1
S1   P2
S2   P1

P    L
P1   L2
P2   L1
P1   L1
Join Dependency
S    P    L
S1   P1   L2
S1   P2   L1
S2   P1   L1
S1   P1   L1

Actual decomposition:

S    P
S1   P1
S1   P2
S2   P1

P    L
P1   L2
P2   L1
P1   L1

Join of the two projections:

S    P    L
S1   P1   L2
S1   P2   L1
S2   P1   L1
S1   P1   L1
S2   P1   L2   (spurious tuple)
Join Dependency
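S    P    L
S1   P1   L2
S1   P2   L1
S2   P1   L1
S1   P1   L1

Decomposition into three projections:

S    P
S1   P1
S1   P2
S2   P1

P    L
P1   L2
P2   L1
P1   L1

L    S
L2   S1
L1   S1
L1   S2

Join of all three projections (the spurious tuple S2 P1 L2 is eliminated because (L2, S2) is not in the L-S projection):

S    P    L
S1   P1   L2
S1   P2   L1
S2   P1   L1
S1   P1   L1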
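The SPL example can be replayed in a few lines of Python. This is a sketch with hypothetical helper names (`project`, `natural_join`); it joins the projections of the SPL instance on (S, P) and (P, L), and then joins the result with the projection on (L, S).

```python
def project(relation, attrs):
    """Projection onto attrs; duplicates disappear because the result is a set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}

def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    joined = set()
    for t in r:
        for u in s:
            merged = dict(t)
            if all(merged.get(a, v) == v for a, v in u):
                merged.update(u)
                joined.add(frozenset(merged.items()))
    return joined

# Each tuple is stored as a frozenset of (attribute, value) pairs.
SPL = {frozenset({"S": s, "P": p, "L": l}.items())
       for s, p, l in [("S1", "P1", "L2"), ("S1", "P2", "L1"),
                       ("S2", "P1", "L1"), ("S1", "P1", "L1")]}

SP = project(SPL, {"S", "P"})
PL = project(SPL, {"P", "L"})
LS = project(SPL, {"L", "S"})

two_way = natural_join(SP, PL)
three_way = natural_join(two_way, LS)

spurious = frozenset({"S": "S2", "P": "P1", "L": "L2"}.items())
print(len(two_way), spurious in two_way)   # 5 True
print(three_way == SPL)                    # True
```

So this instance satisfies the join dependency *(SP, PL, LS) even though joining SP and PL alone is lossy.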
• A table is in fifth normal form (5NF) if it is in the fourth normal form and every join dependency in the table is implied by the candidate keys
• It is also called the Project Join Normal Form (PJNF)
5 NF
Normalization
Un-normalized Relation
  ↓ Arrange every atomic value in the cell (intersection of row and column) of a table
First Normal Form (1NF)
  ↓ Eliminate partial dependencies
Second Normal Form (2NF)
  ↓ Eliminate transitive dependencies
Third Normal Form (3NF)
  ↓ Make every determinant a key
Boyce-Codd Normal Form (BCNF)
  ↓ Eliminate multi-valued dependencies that are not functional dependencies
Fourth Normal Form (4NF)
  ↓ Eliminate join dependencies that are not implied by candidate keys
Fifth Normal Form (5NF)
• Denormalization is a process in which we retain or introduce some amount of redundancy for faster data access
• It is used where trade-offs arise between access speed and the integrity benefits of normalization
Denormalization
• Normalization helps to reduce redundancy and some anomalies
• The first three normal forms (1NF, 2NF and 3NF) are practical, but BCNF, 4NF and 5NF are more of theoretical interest
• Denormalization is done for fast access
Summary