Functional Dependencies and Normalization for Relational
Databases CHAPTER 10
Slide 2
Informal Design Guidelines for Relational Databases
Semantics of the relation attributes
Reducing the redundant values in tuples
Reducing the null values in tuples
Disallowing the possibility of generating spurious tuples
Slide 3
Semantics of the Relation Attributes
GUIDELINE 1: Each tuple in a relation should represent one entity or relationship instance.
Attributes of different entities should not be mixed in the same relation.
Only foreign keys should be used to refer to other entities.
Entity and relationship attributes should be kept apart as much as possible.
Slide 4
Semantics of the Relation Attributes Example:
Slide 5
Redundant Information in Tuples and Update Anomalies
Mixing attributes of multiple entities may cause problems:
Information is stored redundantly, wasting storage.
Update anomalies arise: insertion anomalies, deletion anomalies, and modification anomalies.
Slide 6
Redundant Information in Tuples and Update Anomalies
Slide 7
Consider the relation: EMP_PROJ (Emp#, Proj#, Ename, Pname, No_hours)
Update Anomaly: Changing the name of project number P1 from Billing to Customer-Accounting may cause this update to be made for all 100 employees working on project P1.
Insert Anomaly: Cannot insert a project unless an employee is assigned to it. Conversely, cannot insert an employee unless he/she is assigned to a project.
Slide 8
Redundant Information in Tuples and Update Anomalies
Delete Anomaly: When a project is deleted, it will result in deleting all the employees who work on that project. Alternatively, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.
Slide 9
Redundant Information in Tuples and Update Anomalies
GUIDELINE 2: Design the base relation schemas so that no insertion, deletion, or modification anomalies are present in the relations. If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly.
Slide 10
Null Values in Tuples
GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible.
Attributes that are NULL frequently could be placed in separate relations.
Reasons for nulls:
Attribute not applicable or invalid
Attribute value unknown (may exist)
Value known to exist, but unavailable
Slide 11
Spurious Tuples
Bad designs for a relational database may result in erroneous results for certain JOIN operations.
The "lossless join" property is used to guarantee meaningful results for join operations.
GUIDELINE 4: The relations should be designed to satisfy the lossless join condition. No spurious tuples should be generated by doing a natural-join of any relations.
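To make the idea of spurious tuples concrete, here is a minimal sketch in Python. The relation state, the attribute names (emp, proj, loc), and the helper functions are all hypothetical and chosen only for illustration. It projects a relation onto two badly chosen parts, joins them back, and collects the tuples that were not in the original state.

```python
# Hypothetical relation state; attribute names and rows are made up.
rows = [
    {"emp": "Smith", "proj": "P1", "loc": "Houston"},
    {"emp": "Wong", "proj": "P2", "loc": "Houston"},
]

def project(relation, attrs):
    """Projection onto attrs, with duplicate elimination."""
    seen = {tuple(row[a] for a in attrs) for row in relation}
    return [dict(zip(attrs, values)) for values in sorted(seen)]

def natural_join(left, right):
    """Natural join on every attribute name the two relations share."""
    if not left or not right:
        return []
    common = sorted(set(left[0]) & set(right[0]))
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]

def as_set(relation):
    """Order-independent view of a relation, for comparing states."""
    return {tuple(sorted(row.items())) for row in relation}

# Bad decomposition: the shared attribute "loc" is a key of neither part.
bad = natural_join(project(rows, ["emp", "loc"]), project(rows, ["proj", "loc"]))
spurious = as_set(bad) - as_set(rows)

# Better decomposition: the shared attribute "emp" identifies each row here.
good = natural_join(project(rows, ["emp", "proj"]), project(rows, ["emp", "loc"]))
```

In this sketch the bad join yields four tuples, two of which (Smith on P2, Wong on P1) never existed in the original state, while the second decomposition reproduces the original state exactly.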
Slide 12
Functional dependencies
Consider the following functional dependencies on a relation schema:
{Plane} -> StartTime
{Pilot, StartDate, StartTime} -> Plane
{Plane, StartDate} -> Pilot
Slide 13
Functional dependencies
A functional dependency is a constraint between two sets of attributes from the database.
Suppose that a relational database schema has n attributes, R = {A1, A2, ..., An}, and that X and Y are subsets of R.
Definition. A functional dependency between two sets of attributes X and Y, denoted X -> Y, specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is that any two tuples t1 and t2 in r that have t1[X] = t2[X] must also have t1[Y] = t2[Y].
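The definition can be checked mechanically against a single relation state. The following is a minimal sketch, assuming tuples are represented as Python dictionaries; the function name satisfies and the sample rows are hypothetical, with attribute names borrowed from the Plane/Pilot example.

```python
def satisfies(relation, lhs, rhs):
    """True if every pair of tuples that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in relation:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first tuple with this X value fixes the required Y value.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical relation state (the rows are invented for illustration).
flights = [
    {"Plane": "N101", "Pilot": "Lee", "StartDate": "05-01", "StartTime": "09:00"},
    {"Plane": "N101", "Pilot": "Kim", "StartDate": "05-02", "StartTime": "09:00"},
    {"Plane": "N202", "Pilot": "Lee", "StartDate": "05-02", "StartTime": "14:00"},
]

plane_determines_time = satisfies(flights, ["Plane"], ["StartTime"])
pilot_determines_plane = satisfies(flights, ["Pilot"], ["Plane"])
```

Here Plane -> StartTime holds in the sample state, while Pilot -> Plane is violated because pilot Lee appears with two planes. A check like this can only refute an FD: a state that happens to satisfy it does not prove that the FD holds for the schema.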
Slide 14
Functional dependencies Example:
Slide 15
Functional dependencies
We say that there is a functional dependency from X to Y, or that Y is functionally dependent on X.
The abbreviation for functional dependency is FD or f.d.
The set of attributes X is called the left-hand side of the FD, and Y is called the right-hand side.
X -> Y holds if and only if, for all tuples t1 and t2 in r: t1.X = t2.X implies t1.Y = t2.Y
Slide 16
Functional dependencies
Example:
Social security number determines employee name: SSN -> ENAME
Project number determines project name and location: PNUMBER -> {PNAME, PLOCATION}
Employee SSN and project number determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS
Slide 17
Functional dependencies
If a constraint on R states that there cannot be more than one tuple with a given X value in any relation instance r(R), that is, X is a candidate key of R, this implies that X -> Y for any subset of attributes Y of R.
If X -> Y in R, this does not say whether or not Y -> X in R.
If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).
Slide 18
Functional dependencies (FDs)
A functional dependency is a property of the semantics or meaning of the attributes.
Relation extensions r(R) that satisfy the functional dependency constraints are called legal relation states of R.
A functional dependency is a property of the relation schema R, not of a particular legal relation state r of R. Hence, an FD cannot be inferred automatically from a given relation extension r but must be defined explicitly by someone who knows the semantics of the attributes of R.
Slide 19
Inference Rules for FDs
Given a set of FDs F on relation schema R, any other functional dependencies that hold in all legal relation instances satisfying the dependencies in F are said to be inferred or deduced from the FDs in F.
There are six commonly used inference rules for functional dependencies; the first three are known as Armstrong's inference rules.
IR1 (Reflexive): If Y is a subset of X, then X -> Y.
IR2 (Augmentation): If X -> Y, then XZ -> YZ.
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z.
Slide 20
Inference Rules for FDs
IR4 (Decomposition): If X -> YZ, then X -> Y and X -> Z.
IR5 (Union): If X -> Y and X -> Z, then X -> YZ.
IR6 (Pseudo-transitive): If X -> Y and WY -> Z, then WX -> Z.
The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property).
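As a worked illustration of that last claim, the derivations below sketch how IR4, IR5, and IR6 follow from IR1 to IR3. They are written in LaTeX notation; the layout is our own, and only the three basic rules are used.

```latex
\begin{align*}
\textbf{IR4: } & X \to YZ \ \text{(given)},\quad YZ \to Y \ \text{(IR1)}
                 \;\Rightarrow\; X \to Y \ \text{(IR3)} \\
\textbf{IR5: } & X \to Y \;\Rightarrow\; X \to XY \ \text{(IR2, augment with } X\text{)},\quad
                 X \to Z \;\Rightarrow\; XY \to YZ \ \text{(IR2, augment with } Y\text{)} \\
               & \Rightarrow\; X \to YZ \ \text{(IR3)} \\
\textbf{IR6: } & X \to Y \;\Rightarrow\; WX \to WY \ \text{(IR2, augment with } W\text{)},\quad
                 WY \to Z \ \text{(given)} \;\Rightarrow\; WX \to Z \ \text{(IR3)}
\end{align*}
```

The second half of IR4 (X -> Z) follows the same way, using YZ -> Z from IR1.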
Slide 21
Inference Rules for FDs
Closure: the set of all dependencies that can be inferred from F is called the closure of F; it is denoted by F+.
The closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X.
X+ can be calculated by repeatedly applying IR1, IR2, and IR3 using the FDs in F.
Slide 22
Inference Rules for FDs
Algorithm: Determining X+, the closure of the attribute set X under F
X+ := X;
repeat
    oldX+ := X+;
    for each functional dependency Y -> Z in F do
        if Y is a subset of X+ then X+ := X+ union Z;
until (X+ = oldX+);
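The closure algorithm translates almost line for line into Python. This is a minimal sketch, not a reference implementation: the representation of an FD as a pair of sets and the helper fd for parsing strings of single-letter attributes are assumptions made for this example. The FD set is the one used in Example 1 of this chapter.

```python
def closure(attrs, fds):
    """Compute X+ for the attribute set attrs under the FDs in fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:                      # plays the role of repeat ... until
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs           # X+ := X+ union Z
                changed = True
    return result

def fd(text):
    """Parse 'DA->CE' into ({'D', 'A'}, {'C', 'E'}); single-letter attributes only."""
    lhs, rhs = text.split("->")
    return set(lhs), set(rhs)

F = [fd(s) for s in ["B->A", "DA->CE", "D->H", "GH->C", "AC->D"]]
print("".join(sorted(closure("AC", F))))  # ACDEH
```

The loop stops as soon as a full pass over F adds no new attribute, which corresponds to the condition X+ = oldX+ in the pseudocode.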
Slide 23
Inference Rules for FDs
Example 1: Let Q be a relation with attributes (A, B, C, D, E, G, H) and let the following functional dependencies hold:
F = {f1: B -> A; f2: DA -> CE; f3: D -> H; f4: GH -> C; f5: AC -> D}
Find the closure X+ of X = {AC}.
Answer: X+ = ACDEH (f5 adds D, f2 adds E, f3 adds H; f1 and f4 never apply because B and G are not in X+).
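Minimal Sets of FDs
Example: Let Q be a relation with attributes Q(A, B, C, D) and FD set F = {AB -> CD, B -> C, C -> D}. Find a minimal cover of F.
In AB -> CD, the attribute A on the left-hand side is redundant, because B+ = BCD and therefore B -> CD.
F is equivalent to {B -> CD; B -> C; C -> D}, which is equivalent to {B -> D; B -> C; C -> D} = F1tt.
In F1tt, B -> D is redundant, since it follows from B -> C and C -> D.
Fmc = {B -> C; C -> D}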
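The steps of this example (split right-hand sides, shrink left-hand sides, drop redundant FDs) can be sketched in Python as follows. This is one possible implementation under our own representation of FDs as pairs of sets; the function names are hypothetical, and because a minimal cover is not unique, a different processing order may return a different but equivalent result.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    """Return a minimal cover as a list of (lhs, attribute) pairs."""
    def as_sets(cover):
        return [(lhs, {a}) for lhs, a in cover]

    # Step 1: split every right-hand side into single attributes.
    cover = [(frozenset(lhs), a) for lhs, rhs in fds for a in sorted(rhs)]

    # Step 2: remove left-hand-side attributes that are not needed.
    for i, (lhs, a) in enumerate(cover):
        for b in sorted(lhs):
            reduced = cover[i][0] - {b}
            if reduced and a in closure(reduced, as_sets(cover)):
                cover[i] = (reduced, a)

    # Step 3: remove FDs that follow from the remaining ones.
    cover = list(dict.fromkeys(cover))      # drop exact duplicates, keep order
    for item in list(cover):
        rest = [c for c in cover if c != item]
        if item[1] in closure(item[0], as_sets(rest)):
            cover = rest
    return cover

F = [({"A", "B"}, {"C", "D"}), ({"B"}, {"C"}), ({"C"}, {"D"})]
Fmc = minimal_cover(F)
```

For the FD set of the example, Fmc contains exactly B -> C and C -> D, matching the result derived by hand.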
Slide 35
Minimal Sets of FDs
Example: Q(MSCD, MSSV, CD, HG)
F = {MSCD -> CD; CD -> MSCD; CD,MSSV -> HG; MSCD,HG -> MSSV; CD,HG -> MSSV; MSCD,MSSV -> HG}
Find a minimal cover for F.
Result: Fmc = {MSCD -> CD; CD -> MSCD; CD,HG -> MSSV; MSCD,MSSV -> HG}
Slide 36
Normalization of Relations
Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations.
Normal form: A condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form.
Slide 37
Normalization of Relations
2NF, 3NF, BCNF: based on keys and FDs of a relation schema
4NF: based on keys and multi-valued dependencies (MVDs)
5NF: based on keys and join dependencies (JDs) (Chapter 11)
Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation; Chapter 11).
Slide 38
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
The database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Slide 39
Definitions of Keys and Attributes Participating in Keys
A super key of a relation schema R = {A1, A2, ..., An} is a set of attributes S subset-of R with the property that no two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a super key with the additional property that removal of any attribute from K will cause K not to be a super key any more.
Slide 40
Definitions of Keys and Attributes Participating in Keys
Algorithm 10.3: Finding all keys for a relation Q and a set of functional dependencies F
Step 1: Determine all subsets of Q+ (here, the set of all attributes of Q): X1, X2, ..., Xn.
Step 2: Find the closure of each Xi.
Step 3: Each Xi whose closure equals Q+ is a super key. Suppose the set of super keys is S = {S1, S2, ..., Sm}.
Step 4: Build the set of all keys of Q from S by considering every pair Si, Sj in S (i ≠ j): if Si is a proper subset of Sj, then remove Sj.
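A brute-force rendering of Algorithm 10.3 in Python might look like the sketch below. It is exponential in the number of attributes, so it is only meant for small teaching examples. Instead of collecting all super keys and pruning afterwards, it enumerates subsets by increasing size and skips any subset that contains a key already found; that shortcut is our own choice and gives the same result. The FD set is the one from Example 1.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def all_keys(attrs, fds):
    """Return every key (minimal super key) of the relation."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue                    # contains a smaller key: not minimal
            if closure(candidate, fds) == attrs:
                keys.append(candidate)      # super key with no smaller key inside
    return keys

# FDs of Example 1 over Q(A, B, C, D, E, G, H).
F = [(set(l), set(r)) for l, r in
     [("B", "A"), ("DA", "CE"), ("D", "H"), ("GH", "C"), ("AC", "D")]]
keys = all_keys("ABCDEGH", F)
```

For this FD set the sketch finds three keys: BCG, BDG, and BGH. B and G appear in every key because neither occurs on the right-hand side of any FD.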
Slide 41
Definitions of Keys and Attributes Participating in Keys
Example:
Slide 42
Definitions of Keys and Attributes Participating in Keys
Example:
Slide 43
Definitions of Keys and Attributes Participating in Keys
Algorithm 10.4: Finding one key K for a relation Q and a set of functional dependencies F
Step 1: Assign K = Q+.
Step 2: For each attribute A of K: if (K - A)+ = Q+, then set K = K - A.
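The key-finding procedure can be sketched in a few lines of Python. The function name find_one_key and the decision to visit attributes in alphabetical order are assumptions of this sketch; a different visiting order may return a different key. The FD set is again the one from Example 1.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_one_key(attrs, fds):
    """Start from all attributes and drop each one that is not needed."""
    all_attrs = set(attrs)
    key = set(all_attrs)                     # Step 1: K = Q+
    for a in sorted(all_attrs):              # Step 2: try each attribute once
        if closure(key - {a}, fds) == all_attrs:
            key -= {a}                       # K - A is still a super key
    return key

# FDs of Example 1 over Q(A, B, C, D, E, G, H).
F = [(set(l), set(r)) for l, r in
     [("B", "A"), ("DA", "CE"), ("D", "H"), ("GH", "C"), ("AC", "D")]]
key = find_one_key("ABCDEGH", F)
```

With alphabetical order the attributes A, C, D, and E are dropped in turn, leaving the key BGH. One pass is enough, because an attribute that cannot be dropped from a larger super key cannot be dropped from a smaller one either.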
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute (key attribute) must be a member of some candidate key.
A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
Slide 46
First Normal Form
First normal form (1NF): Disallow multivalued attributes, composite attributes, and their combinations.
It states that the domain of an attribute must include only atomic values and that the value of any attribute in a tuple must be a single value from the domain of that attribute.
Slide 47
First Normal Form
Slide 48
Second Normal Form
Second normal form (2NF) is based on the concept of full functional dependency.
A functional dependency X -> Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more.
Examples:
{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
Slide 49
Second Normal Form
Definition. A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the primary key of R.
R can be decomposed into 2NF relations via the process of 2NF normalization.
Slide 50
Second Normal Form
Algorithm 10.4: The test for 2NF
Given a relation Q and a set of FDs F, determine whether Q is in 2NF.
Step 1: Find all keys of Q.
Step 2: For each key K, find the closure of every proper subset S of K.
Step 3: If some closure S+ contains a nonprime attribute, then Q is not in 2NF; otherwise Q is in 2NF.
Slide 51
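Second Normal Form
Example: Q(A, B, C, D), F = {AB -> C; B -> D; BC -> A}. Is Q in 2NF?
D is a nonprime attribute. Since B -> D and B is a proper subset of the key BC, D is partially dependent on a key, so Q is not in 2NF.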
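The 2NF test can be sketched in Python as shown below. Two assumptions are worth flagging: the FDs of the example are read as AB -> C, B -> D, and BC -> A (the arrows are missing in the slide text), and the helper names candidate_keys and is_2nf are our own. The key search is brute force and only suitable for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Step 1: all keys, found by trying subsets in order of increasing size."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            if any(k <= set(combo) for k in keys):
                continue
            if closure(combo, fds) == attrs:
                keys.append(set(combo))
    return keys

def is_2nf(attrs, fds):
    attrs = set(attrs)
    keys = candidate_keys(attrs, fds)
    nonprime = attrs - set().union(*keys)
    for key in keys:
        # Step 2: every proper, non-empty subset S of the key.
        for size in range(1, len(key)):
            for subset in combinations(sorted(key), size):
                # Step 3: S+ reaching a nonprime attribute is a partial dependency.
                if closure(subset, fds) & nonprime:
                    return False
    return True

# Assumed reading of the slide's example: F = {AB -> C; B -> D; BC -> A}.
F = [(set(l), set(r)) for l, r in [("AB", "C"), ("B", "D"), ("BC", "A")]]
result = is_2nf("ABCD", F)
```

Under this reading the keys are AB and BC, D is the only nonprime attribute, and B+ = BD exposes the partial dependency, so result is False.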
Slide 52
Third Normal Form
Third normal form (3NF) is based on the concept of transitive dependency.
Transitive functional dependency: a functional dependency X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Example: SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
Slide 53
Third Normal Form Definition: A relation schema R is in 3NF if
it satisfies 2NF and no nonprime attribute of R is transitively
dependent on the primary key. Example:
Slide 54
Third Normal Form
NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency.
Example: EMP (SSN, Emp#, Salary). Here SSN -> Emp# -> Salary, and Emp# is a candidate key.
Slide 55
General Normal Form Definitions (For Multiple Keys)
The above definitions consider the primary key only.
The following more general definitions take into account relations with multiple candidate keys.
Partial and full functional dependencies and transitive dependencies will now be considered with respect to all candidate keys of a relation.
Slide 56
General Definition of 2NF Definition: A relation schema R is in
second normal form (2NF) if every nonprime attribute A in R is not
partially dependent on any key of R. Example:
Slide 57
General Definition of 2NF
In the LOT relation (above example), suppose that there are two candidate keys: PROPERTY_ID# and
{COUNTY_NAME, LOT#}; that is, lot numbers are unique only within
each county, but PROPERTY_ID numbers are unique across counties for
the entire state. Based on the two candidate keys PROPERTY_ID# and
{COUNTY_NAME, LOT#}, we know that functional dependencies FD1 and
FD2 hold. We choose PROPERTY_ID# as the primary key,
Slide 58
General Definition of 3NF
Definition: A relation schema R is in third normal form (3NF) if, whenever a nontrivial functional dependency X -> A holds in R, either X is a super key of R, or A is a prime attribute of R.
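The general 3NF definition lends itself to a direct check. The sketch below is a simplified illustration: it examines only the FDs listed in F rather than every FD in F+, and the helper names are hypothetical. The first test case uses the SSN/DNUMBER/DMGRSSN dependencies from the transitive-dependency example; the second uses the EMP relation and adds Emp# -> SSN as an assumption so that Emp# really is a candidate key.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs, fds):
    """All keys, found by trying subsets in order of increasing size."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            if any(k <= set(combo) for k in keys):
                continue
            if closure(combo, fds) == attrs:
                keys.append(set(combo))
    return keys

def is_3nf(attrs, fds):
    attrs = set(attrs)
    prime = set().union(*candidate_keys(attrs, fds))
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) == attrs
        for a in rhs - lhs:                  # nontrivial part of X -> A
            if not lhs_is_superkey and a not in prime:
                return False                 # X is no super key and A is nonprime
    return True

dept = [({"SSN"}, {"DNUMBER"}), ({"DNUMBER"}, {"DMGRSSN"})]
emp = [({"SSN"}, {"Emp#"}), ({"Emp#"}, {"SSN"}), ({"Emp#"}, {"Salary"})]

dept_ok = is_3nf({"SSN", "DNUMBER", "DMGRSSN"}, dept)
emp_ok = is_3nf({"SSN", "Emp#", "Salary"}, emp)
```

In the first case DNUMBER -> DMGRSSN violates the definition, because DNUMBER is not a super key and DMGRSSN is nonprime. In the second case every left-hand side is a super key, so the transitive chain through Emp# causes no problem.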
Slide 59
BCNF (Boyce-Codd Normal Form)
Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF. That is, every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF.
Definition. A relation schema R is in BCNF if, whenever a nontrivial functional dependency X -> A holds in R, X is a super key of R.
Slide 60
BCNF (Boyce-Codd Normal Form)
Each normal form is strictly stronger than the previous one:
Every 2NF relation is in 1NF.
Every 3NF relation is in 2NF.
Every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
Slide 61
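BCNF (Boyce-Codd Normal Form)
Example: Q(A, B, C, D, E, I), F = {ACD -> EBI; CE -> AD}. Is Q in BCNF?
F is equivalent to Ftt = {ACD -> E, ACD -> B, ACD -> I, CE -> A, CE -> D}.
ACD+ = CE+ = ABCDEI, so the left-hand sides of all FDs in Ftt are super keys, and Q is in BCNF.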
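The BCNF test used in this example reduces to one closure computation per FD. The following Python sketch checks only the FDs listed in F; the function name is_bcnf and the set-based representation are assumptions of the sketch. The second FD set is a hypothetical counterexample over Q(A, B, C, D).

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_bcnf(attrs, fds):
    """Every nontrivial FD X -> Y in fds must have a super key X."""
    attrs = set(attrs)
    return all(
        rhs <= lhs or closure(lhs, fds) == attrs   # trivial FD, or X is a super key
        for lhs, rhs in fds
    )

# The example above: F = {ACD -> EBI; CE -> AD} over Q(A, B, C, D, E, I).
F = [(set("ACD"), set("EBI")), (set("CE"), set("AD"))]
q_is_bcnf = is_bcnf("ABCDEI", F)

# Hypothetical counterexample: B -> D, but B alone is not a super key.
G = [(set("AB"), set("C")), (set("B"), set("D")), (set("BC"), set("A"))]
g_is_bcnf = is_bcnf("ABCD", G)
```

For F both left-hand sides close to all six attributes, so q_is_bcnf is True; for G the closure of B is only BD, so g_is_bcnf is False.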