8/8/2019 29SpCS157BL16BCNF&Lossless
BCNF & Lossless Decomposition
Prof. Sin-Min Lee
Department of Computer Science
Normalization: Review on Keys
superkey: a set of attributes that uniquely identifies each tuple in a relation
candidate key: a minimal superkey
primary key: a chosen candidate key
secondary key: any of the remaining candidate keys
prime attribute: an attribute that is part of a candidate key (key column)
nonprime attribute: a nonkey column
Normalization: Functional Dependency Types by Keys
whole (candidate) key -> nonprime attribute: full FD (no violation)
partial key -> nonprime attribute: partial FD (violation of 2NF)
nonprime attribute -> nonprime attribute: transitive FD (violation of 3NF)
not a whole key -> prime attribute: violation of BCNF
Functional Dependencies
Let R be a relation schema, with α ⊆ R and β ⊆ R.
The functional dependency
α -> β
holds on R iff for any legal relation r(R), whenever two tuples t1 and t2 of r have the same values for α, they have the same values for β:
t1[α] = t2[α]  =>  t1[β] = t2[β]
On this instance A -> B does NOT hold, but B -> A does hold.
A  B
1  4
1  5
3  7
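The definition can be checked mechanically on a small instance. The following is a minimal sketch, assuming a relation is stored as a list of dicts; the helper name `fd_holds` is an illustrative choice, not something from the slides.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds on this instance (a list of dicts)."""
    for t1, t2 in combinations(rows, 2):
        same_lhs = all(t1[a] == t2[a] for a in lhs)
        same_rhs = all(t1[a] == t2[a] for a in rhs)
        # A pair that agrees on lhs but differs on rhs is a counterexample.
        if same_lhs and not same_rhs:
            return False
    return True

# The three-tuple instance from the slide.
r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]
print(fd_holds(r, ["A"], ["B"]))  # False
print(fd_holds(r, ["B"], ["A"]))  # True
```

Note that a check like this only shows whether an FD is violated by one instance; an FD holds on the schema only if it holds on every legal instance.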
1. Closure
Given a set of functional dependencies F, its closure, F+, is the set of all FDs that are implied by the FDs in F.
e.g. if A -> B and B -> C,
then clearly A -> C
Armstrong's Axioms
We can find F+ by applying Armstrong's Axioms:
if β ⊆ α, then α -> β (reflexivity)
if α -> β, then γα -> γβ (augmentation)
if α -> β and β -> γ, then α -> γ (transitivity)
These rules are
sound (generate only functional dependencies that actually hold) and
complete (generate all functional dependencies that hold).
Additional rules
If α -> β and α -> γ, then α -> βγ (union)
If α -> βγ, then α -> β and α -> γ (decomposition)
If α -> β and γβ -> δ, then αγ -> δ (pseudotransitivity)
The above rules can be inferred from Armstrong's axioms.
Example
R = (A, B, C, G, H, I)
F = { A -> B
      A -> C
      CG -> H
      CG -> I
      B -> H }
Some members of F+:
A -> H
  by transitivity from A -> B and B -> H
AG -> I
  by augmenting A -> C with G, to get AG -> CG, and then transitivity with CG -> I
CG -> HI
  by augmenting CG -> I to infer CG -> CGI, augmenting CG -> H to infer CGI -> HI, and then transitivity
2. Closure of an attribute set
Given a set of attributes A and a set of FDs F, the closure of A under F is the set of all attributes implied by A.
In other words, the largest B such that: A -> B
Redefining superkeys:
The closure of a superkey is the entire relation schema
Redefining candidate keys:
1. It is a superkey
2. No proper subset of it is a superkey
Computing the closure for A
Simple algorithm
1. Start with B = A.
2. Go over all functional dependencies, β -> γ, in F (F+ need not be computed)
3. If β ⊆ B, then add γ to B
4. Repeat until B no longer changes
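The four steps above can be sketched in a few lines of Python. This is a minimal sketch, assuming single-letter attribute names and FDs written as `(lhs, rhs)` string pairs; that representation is a convenience chosen here, not something the slides prescribe.

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under fds.

    fds is a list of (lhs, rhs) pairs of attribute strings, e.g. ("CG", "H").
    """
    result = set(attrs)                      # step 1: start with B = A
    changed = True
    while changed:                           # step 4: repeat until stable
        changed = False
        for lhs, rhs in fds:                 # step 2: scan every FD
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)           # step 3: lhs covered, add rhs
                changed = True
    return result

F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print("".join(sorted(closure("AG", F))))  # ABCGHI
```

The loop runs at most once per attribute that can still be added, so the cost is polynomial in the size of F and the schema.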
Example
R = (A, B, C, G, H, I)
F = { A -> B
      A -> C
      CG -> H
      CG -> I
      B -> H }
(AG)+ ?
1. result = AG
2. result = ABCG (A -> C and A -> B)
3. result = ABCGH (CG -> H and CG ⊆ AGBC)
4. result = ABCGHI (CG -> I and CG ⊆ AGBCH)
Is (AG) a candidate key?
1. It is a superkey.
2. A+ = ABCH and G+ = G, so no proper subset of AG is a superkey.
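The candidate-key test for AG can be automated with attribute closure. The sketch below assumes the same string-pair encoding of FDs as an illustration; `is_superkey` and `is_candidate_key` are hypothetical helper names.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= set(schema)

def is_candidate_key(attrs, schema, fds):
    if not is_superkey(attrs, schema, fds):
        return False
    # Dropping one attribute at a time is enough: if any smaller subset
    # were a superkey, one of these one-attribute-smaller sets would be too.
    return not any(is_superkey(set(attrs) - {a}, schema, fds) for a in attrs)

R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(is_candidate_key("AG", R, F))   # True
print(is_candidate_key("ACG", R, F))  # False: a superkey, but not minimal
```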
Uses of attribute set closures
Determining superkeys and candidate keys
Determining if A -> B is a valid FD
  Check if A+ contains B
Can be used to compute F+
Database Normalization
Functional dependency (FD): if there is only one possible value of Y for every value of X, then Y is functionally dependent on X.
Do the following FDs hold?
X   Y   Z
10  B1  C1
10  B2  C2
11  B4  C1
12  B3  C4
13  B1  C1
14  B3  C4
X -> Y ?
Y -> X ?
Y -> Z ?
Z -> Y ?
Database Normalization
Functional dependency is good. With functional dependency the primary key (Attribute A) determines the value of all the other non-key attributes (Attributes B, C, D, etc.)
Transitive dependency is bad. Transitive dependency exists if the primary/candidate key (Attribute A) determines non-key Attribute B, and Attribute B determines non-key Attribute C.
If a relation schema has more than one key, each is called a candidate key
An attribute in a relation schema R is called prime if it is a member of some candidate key of R
First Normal Form (1NF)
Each attribute must be atomic (single value)
No repeating columns within a row (composite attributes)
No multi-valued columns.
1NF simplifies attributes
Queries become easier.
1NF
Not in 1NF (Location is multi-valued):
Deptno  Dname      Location
10      IT         Leeds, Bradford, Kent
20      Research   Hundredfold
30      Marketing  Leeds

In 1NF:
Deptno  Dname
10      IT
20      Research
30      Marketing

Deptno  Location
10      Leeds
10      Bradford
10      Kent
20      Hundredfold
30      Leeds
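The split shown in these tables can be reproduced with two comprehensions. This is only a sketch, assuming the multi-valued Location column arrives as a comma-separated string; the variable names are illustrative.

```python
# Unnormalized rows: (Deptno, Dname, Location) with a multi-valued Location.
dept = [
    (10, "IT", "Leeds, Bradford, Kent"),
    (20, "Research", "Hundredfold"),
    (30, "Marketing", "Leeds"),
]

# One row per department, without the multi-valued column.
dept_name = [(no, name) for no, name, _ in dept]

# One row per (department, location) pair, so every value is atomic.
dept_loc = [(no, loc.strip()) for no, _, locs in dept for loc in locs.split(",")]

for row in dept_loc:
    print(row)
```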
Second Normal Form (2NF)
Each non-key attribute must be fully functionally dependent on the primary key.
If the primary key is a single attribute, then the relation is in 2NF
The test for 2NF involves testing for FDs whose left-hand-side attributes are part of the primary key
Disallow partial dependency, where non-key attributes depend on part of a composite primary key
In short, remove partial dependencies
2NF improves data integrity.
Prevents update, insert, and delete anomalies.
2NF
PNo  PName  PLoc  EmpNo  EName  Salary  Address  HoursNo
Given the following FDs:
PNo -> PName, PLoc
EmpNo -> EName, Salary, Address
PNo, EmpNo -> HoursNo
Assuming all attributes are atomic, is the above relation in 1NF, 2NF?
Decomposition into relations X1, X2 and X3:
PNo  PName  PLoc
EmpNo  EName  Salary  Address
PNo  EmpNo  HoursNo
Third Normal Form (3NF)
Remove transitive dependencies.
Transitive dependency
A non-prime attribute is dependent on another non-prime attribute or attributes
Attribute is the result of a calculation
Examples:
Area code attribute based on City attribute of a customer
Total price attribute of order entry based on quantity attribute and unit price attribute (calculated value)
Solution:
Any transitive dependencies are moved into a smaller table.
Transitive Dependence
Given a relation R:
EmpNo  EName  Salary  Address
Assume the following FDs hold:
EmpNo -> EName
EName -> Address
EmpNo -> Address
Note: Both EName and Address are non-key attributes in R, and since Address depends on the non-prime attribute EName, which depends on the primary key (EmpNo), a transitive dependency exists.
Decomposition:
R1: EmpNo  EName  Salary
R2: EName  Address
Note: If Address is a prime attribute, then R is in 3NF
Modification Anomalies
What happens when you want to
add a new book?
change the address of a patron?
delete a patron record?

PatronName  PatronAddress  BookID  BookTitle  BookAuthor  BorrowDate  DueDate  ReturnDate
Smith       12 Elk         AAA     Peace      Bart        2/4         2/18     2/15
Jones       25 Sun         BBB     War        Hine        2/4         2/18     2/19
Hart        73 Sera        CCC     System     Van         2/5         2/19     2/23
Hicks       22 Main        AAA     Peace      Bart        2/12        2/25     2/28
Rice        69 Witt        DDD     Spring     Lyon        2/6         2/20     2/8
Jones       25 Sun         CCC     System     Van         1/26        2/7      2/6
Modification Anomalies
Deletion anomaly: deleting one fact about an entity deletes a fact about another entity
Insertion anomaly: cannot insert one fact about an entity unless a fact about another entity is also added
Update anomaly: changing one fact about an entity requires multiple changes to a table
Referential Integrity Constraint
When we split a relation, we must pay attention to the references across the newly formed relations
E.g., a book must exist before it can be checked out: CHECKOUT[BookID] ⊆ BOOK[BookID]
The DBMS or the applications will have to check/enforce constraints
Boyce-Codd Normal Form
Every determinant is a candidate key
ADVISER(SID, Major, Fname)
STU-ADV(SID, Fname)
ADV-SUBJ(Fname, Subject)
Multi-valued Dependency
Two or more functionally independent multi-valued attributes are dependent on another attribute
EMPLOYEE(Name, Dependent, Project)
Data redundancy and modification anomalies
4NF: BCNF & no multi-valued dependencies
EMPLOYEE(Name, Dependent)
EMPLOYEE(Name, Project)
Database Normalization
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key.
(A determinant is any attribute whose value determines other values within a row.)
If a table contains only one candidate key, then 3NF and BCNF are equivalent.
BCNF is a special case of 3NF.
A Table That Is In 3NF But Not In BCNF
Figure 5.7
The Decomposition of a Table Structure to Meet BCNF Requirements
Figure 5.8
Lossless-join Decomposition
For the case of R = (R1, R2), we require that for all possible relations r on schema R
r = π_R1(r) ⋈ π_R2(r)
A decomposition of R into R1 and R2 is lossless join if and only if at least one of the following dependencies is in F+:
R1 ∩ R2 -> R1
R1 ∩ R2 -> R2
R = (A, B, C)
F = {A -> B, B -> C}
Can be decomposed in two different ways
R1 = (A, B), R2 = (B, C)
  Lossless-join decomposition: R1 ∩ R2 = {B} and B -> BC
  Dependency preserving
R1 = (A, B), R2 = (A, C)
  Lossless-join decomposition: R1 ∩ R2 = {A} and A -> AB
  Not dependency preserving (cannot check B -> C without computing R1 ⋈ R2)
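The lossless-join condition reduces to one attribute closure: the common attributes must determine all of R1 or all of R2. A minimal sketch, assuming single-letter attributes and FDs as string pairs; the third call uses a decomposition that is not on the slide, added here only as a negative case.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary lossless-join test: (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure

F = [("A", "B"), ("B", "C")]
print(is_lossless("AB", "BC", F))  # True  (B -> BC)
print(is_lossless("AB", "AC", F))  # True  (A -> AB)
print(is_lossless("AC", "BC", F))  # False (C determines nothing else)
```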
Dependency Preservation
Let Fi be the set of dependencies in F+ that include only attributes in Ri.
A decomposition is dependency preserving if
(F1 ∪ F2 ∪ ... ∪ Fn)+ = F+
If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive.
Dependency Preservation
To check if a dependency α -> β is preserved in a decomposition of R into R1, R2, ..., Rn we apply the following test (with attribute closure done with respect to F)
result = α
while (changes to result) do
  for each Ri in the decomposition
    t = (result ∩ Ri)+ ∩ Ri
    result = result ∪ t
If result contains all attributes in β, then the functional dependency α -> β is preserved.
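The test above translates almost line for line into Python. The following is a sketch under the assumption that attributes are single letters and FDs are string pairs; `is_preserved` is a name chosen here for illustration.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_preserved(lhs, rhs, decomposition, fds):
    """Check whether lhs -> rhs can be enforced without joining the Ri."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            ri = set(ri)
            # t = (result ∩ Ri)+ ∩ Ri, closure taken with respect to F
            t = closure(result & ri, fds) & ri
            if not t <= result:
                result |= t
                changed = True
    return set(rhs) <= result

F = [("A", "B"), ("B", "C")]
print(is_preserved("B", "C", ["AB", "BC"], F))  # True
print(is_preserved("B", "C", ["AB", "AC"], F))  # False
```

Running the check for every FD in F decides whether the whole decomposition is dependency preserving.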
Dependency Preservation
We apply the test on all dependencies in F to check if a decomposition is dependency preserving
This procedure takes polynomial time, instead of the exponential time required to compute F+ and (F1 ∪ F2 ∪ ... ∪ Fn)+
FD Example
R = (A, B, C)
F = {A -> B, B -> C}
Key = {A}
R is not in BCNF
Decomposition R1 = (A, B), R2 = (B, C)
R1 and R2 now in BCNF
Lossless-join decomposition
Dependency preserving
A Lossy Decomposition
Aim of Normalization
Goal for a relational database design is:
BCNF.
Lossless join.
Dependency preservation.
If we cannot achieve this, we accept one of
Lack of dependency preservation
Redundancy due to use of 3NF
Sample Data for a BCNF Conversion
Table 5.2
Decomposition into BCNF
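Perform lossless-join decompositions of each of the following schemes into BCNF schemes: R(A, B, C, D, E) with dependency set {AB -> CDE, C -> D, D -> E}
[Figure: two decomposition trees for R(A, B, C, D, E), one splitting first on C -> D into (C, D) and (A, B, C, E), the other first on D -> E into (D, E) and (A, B, C, D); both end with (A, B, C).]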
Given the FDs {B -> D, AB -> C, D -> B} and the relation {A, B, C, D}, give two distinct lossless-join decompositions to BCNF, indicating the keys of each of the resulting relations.
Decomposition 1: (A, B, C, D) into (B, D) and (A, B, C)
Decomposition 2: (A, B, C, D) into (B, D) and (A, C, D)
Example
The name-addr-phones-beersLiked example illustrated the MVD
name ->-> phones
and the MVD
name ->-> beersLiked.
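Picture of MVD X ->-> Y
[Figure: two tuples with columns X, Y, others; the tuples are equal on X, and exchanging their Y values gives tuples that are also in the relation.]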
MVD Rules
Every FD is an MVD.
  If X -> Y, then swapping Y's between two tuples that agree on X doesn't change the tuples.
  Therefore, the new tuples are surely in the relation, and we know X ->-> Y.
Complementation: If X ->-> Y, and Z is all the other attributes, then X ->-> Z.
Fourth Normal Form
The redundancy that comes from MVDs is not removable by putting the database schema in BCNF.
There is a stronger normal form, called 4NF, that (intuitively) treats MVDs as FDs when it comes to decomposition, but not when determining keys of the relation.
4NF Definition
A relation R is in 4NF if whenever X ->-> Y is a nontrivial MVD, then X is a superkey.
Nontrivial means that:
1. Y is not a subset of X, and
2. X and Y are not, together, all the attributes.
Note that the definition of superkey still depends on FDs only.
BCNF Versus 4NF
Remember that every FD X -> Y is also an MVD, X ->-> Y.
Thus, if R is in 4NF, it is certainly in BCNF.
  Because any BCNF violation is a 4NF violation.
But R could be in BCNF and not 4NF, because MVDs are invisible to BCNF.
Normalization: Good Decomposition
dependency preserving decomposition
- it is undesirable to lose functional dependencies during decomposition
lossless join decomposition
- join of decomposed relations should be able to create the original relation (no spurious tuples)
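Spurious tuples are easy to see on a small case. The sketch below uses a made-up two-tuple instance of R(A, B, C), not taken from the slides, and splits it on an attribute that determines nothing; `project` and `natural_join` are illustrative helpers written for this example.

```python
def project(rows, attrs):
    """Projection of a list of dict rows onto attrs, as a set of tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}

def natural_join(left, left_attrs, right, right_attrs):
    """Natural join of two sets of tuples; returns (attrs, joined tuples)."""
    common = [a for a in left_attrs if a in right_attrs]
    out_attrs = list(left_attrs) + [a for a in right_attrs if a not in common]
    joined = set()
    for l in left:
        lrow = dict(zip(left_attrs, l))
        for r in right:
            rrow = dict(zip(right_attrs, r))
            if all(lrow[a] == rrow[a] for a in common):
                merged = {**lrow, **rrow}
                joined.add(tuple(merged[a] for a in out_attrs))
    return out_attrs, joined

# Hypothetical instance: C does not determine A or B.
r = [{"A": 1, "B": "x", "C": "p"}, {"A": 2, "B": "y", "C": "p"}]
r1 = project(r, ["A", "C"])
r2 = project(r, ["B", "C"])

attrs, joined = natural_join(r1, ["A", "C"], r2, ["B", "C"])
original = project(r, attrs)
spurious = joined - original
print(sorted(spurious))  # [(1, 'p', 'y'), (2, 'p', 'x')]
```

The join returns every original tuple plus two that were never in r, which is exactly what a lossy decomposition means: information about which A went with which B is gone.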
Decomposition and 4NF
If X ->-> Y is a 4NF violation for relation R, we can decompose R using the same technique as for BCNF.
1. XY is one of the decomposed relations.
2. All but Y - X is the other.
Example
Drinkers(name, addr, phones, beersLiked)
FD: name -> addr
MVDs: name ->-> phones
      name ->-> beersLiked
Key is {name, phones, beersLiked}.
All dependencies violate 4NF.
Example, Continued
Decompose using name -> addr:
1. Drinkers1(name, addr)
   In 4NF, only dependency is name -> addr.
2. Drinkers2(name, phones, beersLiked)
   Not in 4NF. MVDs name ->-> phones and name ->-> beersLiked apply. No FDs, so all three attributes form the key.
Example: Decompose Drinkers2
Either MVD name ->-> phones or name ->-> beersLiked tells us to decompose to:
Drinkers3(name, phones)
Drinkers4(name, beersLiked)
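A small instance shows why this split loses nothing. The data below is invented for illustration (one drinker with two phones and two beers); because both MVDs hold, Drinkers2 contains every phone/beer combination, and joining the two projections on name rebuilds it exactly.

```python
from itertools import product

# Hypothetical data: the independent facts we actually want to store.
phones = {"sue": ["555-0101", "555-0102"]}
beers = {"sue": ["Bud", "Miller"]}

# Drinkers2 must hold every (phone, beer) pair per name for the MVDs to hold.
drinkers2 = {(n, p, b) for n in phones for p, b in product(phones[n], beers[n])}

# 4NF decomposition: project onto (name, phones) and (name, beersLiked).
drinkers3 = {(n, p) for n, p, _ in drinkers2}
drinkers4 = {(n, b) for n, _, b in drinkers2}

# Natural join on name recovers Drinkers2, so the decomposition is lossless.
rejoined = {(n, p, b) for n, p in drinkers3 for m, b in drinkers4 if n == m}

print(len(drinkers2), len(drinkers3), len(drinkers4))  # 4 2 2
print(rejoined == drinkers2)                           # True
```

The row counts also show the redundancy that 4NF removes: the combined table grows with the product of the two lists, the decomposed tables with their sum.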
BCNF
Given a relation schema R, and a set of functional dependencies F, if every FD, A -> B, is either:
1. Trivial
2. A is a superkey of R
Then, R is in BCNF (Boyce-Codd Normal Form)
BCNF
What if the schema is not in BCNF ?
Decompose (split) the schema into two pieces.
Careful: you want the decomposition to be lossless
Achieving BCNF Schemas
For all dependencies A -> B in F+, check if A is a superkey
  By using attribute closure
If not, then
  Choose a dependency in F+ that breaks the BCNF rules, say A -> B
  Create R1 = A B
  Create R2 = A (R - B - A)
  Note that: R1 ∩ R2 = A and A -> AB (= R1), so this is a lossless decomposition
Repeat for R1 and R2
  By defining F1+ to be all dependencies in F+ that contain only attributes in R1
  Similarly F2+
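The procedure can be sketched recursively. This version is an illustration under stated assumptions: attributes are single letters, FDs are string pairs, and violations are found by brute force over subsets of the current schema (exponential, so only suitable for small examples). It also splits on the full closure of the left-hand side within the schema rather than on one chosen FD, so on other inputs it may produce a different, equally valid, BCNF decomposition than a hand derivation.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(schema, fds):
    """Return (X, Y) where X -> Y violates BCNF on schema, else None."""
    attrs = sorted(schema)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            implied = closure(x, fds) & set(schema)  # X+ restricted to schema
            # Nontrivial (adds attributes) and X is not a superkey of schema.
            if implied != x and implied != set(schema):
                return x, implied - x
    return None

def bcnf_decompose(schema, fds):
    schema = frozenset(schema)
    violation = find_violation(schema, fds)
    if violation is None:
        return [schema]
    x, y = violation
    # R1 = X ∪ Y, R2 = R - Y; they share exactly X, and X -> R1, so lossless.
    return bcnf_decompose(x | y, fds) + bcnf_decompose(schema - y, fds)

F = [("A", "B"), ("B", "C")]
for r in bcnf_decompose("ABC", F):
    print("".join(sorted(r)))
```

For R = (A, B, C) with F = {A -> B, B -> C} this yields the schemas BC and AB.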
Example 1
R = (A, B, C)
F = {A -> B, B -> C}
Candidate keys = {A}
BCNF = No. B -> C violates.
Split on B -> C:
R1 = (B, C)
F1 = {B -> C}
Candidate keys = {B}
BCNF = true
R2 = (A, B)
F2 = {A -> B}
Candidate keys = {A}
BCNF = true
Example 2-1
R = (A, B, C, D, E)
F = {A -> B, BC -> D}
Candidate keys = {ACE}
BCNF = Violated by {A -> B, BC -> D} etc.
Split on A -> B:
R1 = (A, B)
F1 = {A -> B}
Candidate keys = {A}
BCNF = true
R2 = (A, C, D, E)
F2 = {AC -> D}
Candidate keys = {ACE}
BCNF = false (AC -> D)
From A -> B and BC -> D by pseudo-transitivity: AC -> D
Split on AC -> D:
R3 = (A, C, D)
F3 = {AC -> D}
Candidate keys = {AC}
BCNF = true
R4 = (A, C, E)
F4 = {} [[only trivial]]
Candidate keys = {ACE}
BCNF = true
Dependency preservation?
We can check: A -> B (R1), AC -> D (R3), but we lost BC -> D
So this is not a dependency-preserving decomposition
Example 2-2
R = (A, B, C, D, E)
F = {A -> B, BC -> D}
Candidate keys = {ACE}
BCNF = Violated by {A -> B, BC -> D} etc.
Split on BC -> D:
R1 = (B, C, D)
F1 = {BC -> D}
Candidate keys = {BC}
BCNF = true
R2 = (B, C, A, E)
F2 = {A -> B}
Candidate keys = {ACE}
BCNF = false (A -> B)
Split on A -> B:
R3 = (A, B)
F3 = {A -> B}
Candidate keys = {A}
BCNF = true
R4 = (A, C, E)
F4 = {} [[only trivial]]
Candidate keys = {ACE}
BCNF = true
Dependency preservation?
We can check: BC -> D (R1), A -> B (R3)
Dependency-preserving decomposition
Example 3
R = (A, B, C, D, E, H)
F = {A -> BC, E -> HA}
Candidate keys = {DE}
BCNF = Violated by {A -> BC} etc.
Split on A -> BC:
R1 = (A, B, C)
F1 = {A -> BC}
Candidate keys = {A}
BCNF = true
R2 = (A, D, E, H)
F2 = {E -> HA}
Candidate keys = {DE}
BCNF = false (E -> HA)
Split on E -> HA:
R3 = (E, H, A)
F3 = {E -> HA}
Candidate keys = {E}
BCNF = true
R4 = (E, D)
F4 = {} [[only trivial]]
Candidate keys = {DE}
Dependency preservation?
We can check: A -> BC (R1), E -> HA (R3)
Dependency-preserving