Functional Dependencies and Normalization for Relational Databases CHAPTER 10

Informal Design Guidelines for Relational Databases Semantics of the Relation Attributes Reducing the redundant values in tuples Reducing the null values in tuples Disallowing the possibility of generating spurious tuples

Semantics of the Relation Attributes GUIDELINE 1: Each tuple in a relation should represent one entity or relationship instance. Attributes of different entities should not be mixed in the same relation. Only foreign keys should be used to refer to other entities Entity and relationship attributes should be kept apart as much as possible.

Semantics of the Relation Attributes Example:

Redundant Information in Tuples and Update Anomalies Mixing attributes of multiple entities may cause problems Information is stored redundantly wasting storage Problems with update anomalies Insertion anomalies Deletion anomalies Modification anomalies

Redundant Information in Tuples and Update Anomalies

Consider the relation: EMP_PROJ ( Emp#, Proj#, Ename, Pname, No_hours) Update Anomaly: Changing the name of project number P1 from Billing to Customer- Accounting may cause this update to be made for all 100 employees working on project P1. Insert Anomaly: Cannot insert a project unless an employee is assigned to. Inversely - Cannot insert an employee unless an he/she is assigned to a project

Redundant Information in Tuples and Update Anomalies Delete Anomaly: When a project is deleted, it will result in deleting all the employees who work on that project. Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project

Redundant Information in Tuples and Update Anomalies GUIDELINE 2: Guideline to Redundant Information in Tuples and Update Anomalies Design the base relation schemas so that no insertion, deletion, or modification anomalies are present in the relations. If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly.

Null Values in Tuples GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible Attributes that are NULL frequently could be placed in separate relations Reasons for nulls: Attribute not applicable or invalid Attribute value unknown (may exist) Value known to exist, but unavailable

Spurious Tuples Bad designs for a relational database may result in erroneous results for certain JOIN operations The "lossless join" property is used to guarantee meaningful results for join operations GUIDELINE 4: The relations should be designed to satisfy the lossless join condition. No spurious tuples should be generated by doing a natural-join of any relations.

Functional dependencies Consider the following relation schema {Plane} StartTime {Pilot, StartDate, StartTime} Plane {Plane, StartDate} Pilot

Functional dependencies A functional dependency is a constraint between two sets of attributes from the database. Suppose that relational database schema has n attributes R{A1, A2, An}, X and Y are subset of R. Definition. A functional dependency between two sets of attributes X and Y: X Y, specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is that, for any two tuples t1 and t2 in r that have t1[X]=t2[X] t1[Y] = t2[Y].

Functional dependencies Example:

Functional dependencies We say that there is a functional dependency from X to Y, or that Y is functionally dependent on X. The abbreviation for functional dependency is FD or f.d. The set of attributes X is called the left-hand side of the FD, and Y is called the right-hand side. X Y (t1.X = t2.X t1.Y = t2.Y)

Functional dependencies Example: Social security number determines employee name: SSN -> ENAME Project number determines project name and location PNUMBER -> {PNAME, PLOCATION} Employee SSN and project number determines the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS

Functional dependencies If a constraint on R states that there cannot be more than one tuple with a given X value in any relation instance r(R), that is, X is a candidate key of R, this implies that X Y for any subset of attributes Y of R If X Y in R, this does not say whether or not Y X in R If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K]=t2[K])

Functional dependencies(FDs) A functional dependency is a property of the semantics or meaning of the attributes. Relation extensions r(R) that satisfy the functional dependency constraints are called legal relation states of R. A functional dependency is a property of the relation schema R, not of a particular legal relation state r of R. Hence, an FD cannot be inferred automatically from a given relation extension r but must be defined explicitly by someone who knows the semantics of the attributes of R

Inference Rules for FDs Given a set of FDs F on relation schema R. Any other functional dependencies hold in all legal relation instances that satisfy the dependencies in F can be said to be inferred or deduced from the FDs in F. The following six inference rules for functional dependencies (Armstrong's inference rules) IR1 (reflexive : phn x): If X Y, then X Y. IR2. (Augmentation) If X Y, then XZ YZ. IR3. (Transitive) If X Y and Y Z, then X Z

Inference Rules for FDs IR4(Decomposition): If X YZ, then X Y and X Z IR5(Union): If X Y and X Z, then X YZ IR6(pseudo-transitive) If X Y and WY Z, then WX Z The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property)

Inference Rules for FDs Closure : the set of all dependencies that can be inferred from F is called the closure of F; it is denoted by F+. Closure of a set of attributes X with respect to F is the set X + of all attributes that are functionally determined by X. X + can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F

Inference Rules for FDs Algorithm: Determining X+, the Closure of : the set of attribute X under F X+= X; repeat oldX+= X+; for each functional dependency Y Z in F do If X+ Y then X += X+ Z; until (X+ = oldX+);

Inference Rules for FDs Example 1: Let Q be a relation with attributes (A,B,C,D,E,G,H) and let the following functional dependencies hold F={f 1 : B A; f 2 : DA CE; f 3 : D H; f 4 : GH C; f 5 : AC D} Find the closure X + of X = {AC}. Answer: X+=ACDEH F{B CD;B C;C D} F{B D; B C;C D}=F 1mc, In F 1tt, B D is redundant. F{B C;C D}=Fmc">

Minimal Sets of FDs Example: Let Q be a relation with attributes Q(A,B,C,D) and FD F={AB CD, B C, C D} Finding a Minimal Cover of F ABCD is FD with the left hand side redundant A because B+=BCD B CD F+ =>F{B CD;B C;C D} F{B D; B C;C D}=F 1mc, In F 1tt, B D is redundant. F{B C;C D}=Fmc

Minimal Sets of FDs Example: Q(MSCD,MSSV,CD,HG) F = {MSCD CD; CD MSCD; CD,MSSV HG; MSCD,HG MSSV; CD,HG MSSV; MSCD,MSSV HG} Find minimal cover for FD F Result: Fmc = { MSCD CD; CD MSCD; CD,HG MSSV; MSCD,MSSV HG}

Normalization of Relations Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations Normal form: Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form

Normalization of Relations 2NF, 3NF, BCNF based on keys and FDs of a relation schema 4NF based on keys, multi-valued dependencies : MVDs; 5NF based on keys, join dependencies : JDs (Chapter 11) Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation; Chapter 11)

Practical Use of Normal Forms Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect The database designers need not normalize to the highest possible normal form. (usually up to 3NF, BCNF or 4NF) Denormalization: the process of storing the join of higher normal form relations as a base relationwhich is in a lower normal form

Definitions of Keys and Attributes Participating in Keys A super key of a relation schema R = {A 1, A 2,...., A n } is a set of attributes S subset-of R with the property that no two tuples t 1 and t 2 in any legal relation state r of R will have t 1 [S] = t 2 [S] A key K is a super key with the additional property that removal of any attribute from K will cause K not to be a super key any more.

Definitions of Keys and Attributes Participating in Keys Algorithm 10.3: Finding all key for a relation Q and set of Functional Dependencies F Determine all subset of Q+: X1, X2, , Xn Find closure of the Xi. Super key is Xi that they are closure =Q+, suppose that set supper key are S= {S1, S2, Sn} Create the set contain all key of Q from S by consider subset Si, Sj of S S (i j), if Si Sj then remove Sj (i,j=1..n).

Definitions of Keys and Attributes Participating in Keys Example:

Definitions of Keys and Attributes Participating in Keys Algorithm 10.4: Finding one key K for a relation Q and set of Functional Dependencies F: Step 1: Assign K = Q+ Step 2: A is an attribute of K, K=K-A. If K+=Q+ then K=K.

Definitions of Keys and Attributes Participating in Keys Example: U={A,B,C,D,E}, F={AB->C, AC->B, BC->DE} Find K Step1: K=U > K=ABCDE Step2:(K\A)+=>(BCDE)+=BCDE U+ K=ABCDE Step3:(K\B)+=>(ACDE)+= ABCDE = U+ K=ACDE Step4: (K\C)+ =>(ADE)+ = ADE U+ K=ACDE Step5: (K\D)+ => (ACE)+ = ACEBD=U+ K=ACE Step6: (K\E)+ =>(AC)+ = ACBDE =U+ K=AC

Definitions of Keys and Attributes Participating in Keys If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys. A Prime attribute (thuc tnh kha) must be a member of some candidate key A Nonprime attribute is not a prime attribute that is, it is not a member of any candidate key.

First Normal Form First normal form (INF): Disallow multivalued attributes, composite attributes, and their combinations. It states that the domain of an attribute must include only atomic values and that the value of any attribute in a tuple must be a single value from the domain of that attribute.

First Normal Form

Second Normal Form Second normal form (2NF) is based on the concept of full functional dependency. A functional dependency X Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more. Examples: {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS hold {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency ) since SSN -> ENAME also holds

Second Normal Form Definition. A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R. A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key R can be decomposed into 2NF relations via the process of 2NF normalization

Second Normal Form Algorithm 10.4 :The test for 2NF Give a relational Q, a set FD F Determine Q belong 2FN ? Step1: Find all key of Q Step2: With each key K, Find closure of all the subsets S of K Step3: If there are S+ contain Nonprime attribute then Q is not 2NF else Q is 2NF

Second Normal Form Example: Q(A,B,C,D) F={ABC; BD; BCA}. Is Q 2NF? D is nonprime attribute Q is not 2NF

Third Normal Form Third normal form (3NF) is based on the concept of transitive dependency. Transitive functional dependency A functional dependency X Z that can be derived from two FDs X Y and Y Z Examples: SSN DMGRSSN is a transitive FD since SSN DNUMBER and DNUMBER DMGRSSN hold

Third Normal Form Definition: A relation schema R is in 3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key. Example:

Third Normal Form NOTE: In X Y and Y Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. EMP (SSN, Emp#, Salary ). SSN Emp# Salary and Emp# is a candidate key.

General Normal Form Definitions (For Multiple Keys) The above definitions consider the primary key only The following more general definitions take into account relations with multiple candidate keys Partial and full functional dependencies and transitive dependencies will now be considered with respect to all candidate keys of a relation.

General Definition of 2NF Definition: A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is not partially dependent on any key of R. Example:

General Definition of 2NF In LOT relation (above example), Suppose that there are two candidate keys: PROPERTY_ID# and {COUNTY_NAME, LOT#}; that is, lot numbers are unique only within each county, but PROPERTY_ID numbers are unique across counties for the entire state. Based on the two candidate keys PROPERTY_ID# and {COUNTY_NAME, LOT#}, we know that functional dependencies FD1 and FD2 hold. We choose PROPERTY_ID# as the primary key,

General Definition of 3NF Definition: A relation schema R is in third normal form (3NF) if, whenever a nontrivial functional dependency X A holds in R, either X is a super key of R, or A is a prime attribute of R.

BCNF (Boyce-Codd Normal Form) Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF. That is, every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF. Definition. A relation schema R is in BCNF if whenever a nontrivial functional dependency X A holds in R, then X is a super key of R

BCNF (Boyce-Codd Normal Form) Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF There exist relations that are in 3NF but not in BCNF The goal is to have each relation in BCNF (or 3NF)

BCNF (Boyce-Codd Normal Form) Example: Q(A,B,C,D,E,I) F={ACDEBI;CEAD}. Q is a BCNF? F F tt ={ACDE,ACDB,ACDI,CEA,CED} The feft hand side of F tt are super key Q is BCNF

BCNF (Boyce-Codd Normal Form) Example:

Documents

Functional Dependencies and Normalization for Relational Databases CHAPTER 10