Databases 6:Normalization
Hossein Rahmani
Databases lecture 6
Design of a Relational DB-Schema
»What is a good relational database schema?
  »Conceptual level (ER-schema, Views)
  »Implementation level (consistency, efficiency of base relations = stored tables)
»How do you achieve a good schema?
  »Bottom-up (start with relations between attributes)
    »design by synthesis
  »Top-down (start with ER-schema and then further decomposition)
    »design by analysis
Obtaining a good design
»We first give (informally) 4 guidelines
»Then we will give formal requirements incorporating the guidelines (based on functional dependencies)
Guideline 1: Semantics
»Make sure that the semantics (meaning) of all base relations and attributes is clear
  »Tuples must be easily interpreted as ‘facts’
  »Do not mix, if possible, attributes of more than one entity or relation type in one base relation
Examples of poor design
Guideline 2: Redundancy and anomalies
»Avoid redundancy: reduce the space that is needed to store the database as much as possible
»Prevent anomalies when changing data in the database
  »update (insertion / deletion / modification) anomalies
Redundancy - example
Update Anomalies
»Cause: doubly stored data, wrong design (example: see next slide)
»insertion anomalies:
  »new tuple contains incorrect attribute value for an already stored entity
  »new entity has a null key
»deletion anomalies
  »incomplete deletion of an entity
  »unwanted deletion of an entity
»modification anomalies
  »incomplete modification of an entity
Guideline 3: NULL-values
»Some base relations contain many attributes that are often ‘NULL’
  »Unnecessary use of space
  »Multiple meanings of ‘NULL’
  »JOIN operations can have undesired effects
  »COUNT and SUM can go wrong
»So: place an attribute in a base relation in which it is ‘NULL’ as rarely as possible
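A minimal sketch of how aggregates behave on NULL values, using Python's built-in sqlite3 module. The table `employee` and its `bonus` column are hypothetical and only serve as illustration.

```python
import sqlite3

# Hypothetical table: bonus is NULL for employees without a bonus.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, bonus INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Ann", 100), ("Bob", None), ("Cid", 300)],
)

count_all, count_bonus, sum_bonus, avg_bonus = conn.execute(
    "SELECT COUNT(*), COUNT(bonus), SUM(bonus), AVG(bonus) FROM employee"
).fetchone()

# COUNT(*) counts all 3 rows, COUNT(bonus) skips the NULL row,
# so AVG(bonus) is 400 / 2 and not 400 / 3.
print(count_all, count_bonus, sum_bonus, avg_bonus)  # 3 2 400 200.0
```

Whether Bob's NULL means "no bonus", "unknown" or "not applicable" cannot be read from the table, which is the "multiple meanings" problem named above.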
Guideline 4: False (Spurious) Tuples
»If we choose the base relations badly, a (NATURAL-)JOIN can create tuples that do not have any connection with the mini world (see next slides)
»So: choose base relations such that a JOIN on primary or foreign keys cannot produce spurious tuples. Do not JOIN on other attributes
Wrong choice - relations
Wrong choice: states
Natural join -> spurious tuples (marked *)
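The effect can be reproduced with a small sketch. The relation and its values are hypothetical (they are not the tables from the slides): a relation over (employee, project, location) is split on the non-key attribute location and then joined again.

```python
# Hypothetical relation state: (employee, project, location).
original = {
    ("Smith", "ProductX", "Bellaire"),
    ("Wong", "ProductY", "Houston"),
    ("Zelaya", "Newbenefits", "Houston"),
}

# Poor decomposition: both parts share only the non-key attribute location.
emp_loc = {(e, loc) for (e, p, loc) in original}
proj_loc = {(p, loc) for (e, p, loc) in original}

# Natural join of the two parts on location.
joined = {
    (e, p, loc)
    for (e, loc) in emp_loc
    for (p, loc2) in proj_loc
    if loc == loc2
}

# Tuples that appear after the join but were never facts in the mini world.
spurious = joined - original
print(sorted(spurious))
```

Here the join returns five tuples instead of three: Wong and Zelaya are each paired with the other's Houston project.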
Normalization
»Using ‘normalization’, you can adhere to these guidelines to a large extent
»In a number of steps (algorithms) you transform a given relational database schema into an ever higher normal form
»Base concept: functional dependency
Functional dependency
»Start with one universal relation schema R containing all attributes A1,..,An
»Given two attribute sets X and Y in R
»Functional dependency X → Y exists (X functionally determines Y; Y is functionally dependent on X) if:
  »∀ r(R): ∀ t1,t2 ∈ r: t1[X] = t2[X] ⇒ t1[Y] = t2[Y]
  »i.e.: component X determines component Y
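As an illustration, the definition can be tested against a single relation state. This is a minimal sketch: the function name `holds` and the sample rows are assumptions made for the example.

```python
def holds(rows, X, Y):
    """Return True if X -> Y holds in this relation state (list of dicts)."""
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # Two tuples that agree on X must also agree on Y.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

# Hypothetical relation state r(R) with R = (A, B, C).
r = [
    {"A": 1, "B": 1, "C": 10},
    {"A": 1, "B": 2, "C": 20},
    {"A": 2, "B": 1, "C": 30},
    {"A": 2, "B": 2, "C": 20},
]

print(holds(r, ["A", "B"], ["C"]))  # True
print(holds(r, ["A"], ["C"]))       # False: A = 1 occurs with C = 10 and C = 20
print(holds(r, ["C"], ["B"]))       # True
```

A single state can only refute a dependency. That X → Y holds in one state does not show that it holds for the schema; that has to come from the semantics of the attributes.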
Functional dependency
A B C
AB → C
Functional dependency
»If X is a superkey of R, then X → Y holds for each set Y of attributes in R
»If X → Y, then nothing can be concluded about whether Y → X holds
»X → Y follows from the semantics of the attributes in X and Y (which means that the designer should note and declare it)
»r(R) is legal if it agrees with all functional dependencies (FDs) declared on R
Inference rules for FDs
»Six rules for deriving FDs:
  »IR1 (reflexive): if X ⊇ Y then X → Y (trivial). As a special case: X → X
  »IR2 (extension): {X → Y} |= XZ → YZ
  »IR3 (transitive): {X → Y, Y → Z} |= X → Z
  »IR4 (project): {X → YZ} |= X → Y
  »IR5 (combine): {X → Y, X → Z} |= X → YZ
  »IR6 (pseudotransitive): {X → Y, WY → Z} |= WX → Z
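IR4 and IR6 do not add anything new: they can be derived from the first three rules. A short worked derivation, written in LaTeX, using only the rule names from this slide:

```latex
% IR4 (project) from IR1 and IR3
\begin{align*}
  &X \to YZ  && \text{(given)} \\
  &YZ \to Y  && \text{(IR1, since } YZ \supseteq Y\text{)} \\
  &X \to Y   && \text{(IR3)}
\end{align*}

% IR6 (pseudotransitive) from IR2 and IR3
\begin{align*}
  &X \to Y    && \text{(given)} \\
  &WX \to WY  && \text{(IR2, extend both sides with } W\text{)} \\
  &WY \to Z   && \text{(given)} \\
  &WX \to Z   && \text{(IR3)}
\end{align*}
```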
Closure F+ of F
»Given a set of FDs for R: F(R)
»IR1-3 is sound & complete (Armstrong)
  »Sound: If a new FD f can be derived from F(R) using IR1-3, and r(R) is legal for F, then r(R) is also legal for F ∪ {f}
  »Complete: If FD f holds in every r(R) that is legal for F, then f can be derived from F using IR1-3
»The set F+(R) of all FDs that can be derived from F is called the closure F+ of F(R)
Closure X+ under F
»Given X → Y ∈ F. The closure X+ of X under F is the set of all attributes that are also functionally dependent on X

Algorithm to determine X+:
X+ := X;
do {
    oldX+ := X+;
    for all Y → Z ∈ F:
        if X+ ⊇ Y then X+ := X+ ∪ Z;
} while (oldX+ ≠ X+)
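The loop can be sketched in Python as follows. The representation of F as a list of (left-hand side, right-hand side) pairs of sets is an assumption of this sketch; the FDs are the ones from the minimal-cover example in these slides.

```python
def closure(X, F):
    """Attribute closure X+ of X under F; F is a list of (lhs, rhs) sets."""
    result = set(X)
    changed = True
    while changed:                      # do { ... } while (oldX+ != X+)
        changed = False
        for lhs, rhs in F:              # for all Y -> Z in F
            if lhs <= result and not rhs <= result:
                result |= rhs           # X+ := X+ ∪ Z
                changed = True
    return result

# F = {AB -> CD, C -> D, A -> CB}
F = [({"A", "B"}, {"C", "D"}), ({"C"}, {"D"}), ({"A"}, {"C", "B"})]

print(closure({"A"}, F) == {"A", "B", "C", "D"})  # True: A determines everything
print(closure({"C"}, F) == {"C", "D"})            # True
print(closure({"B"}, F) == {"B"})                 # True: B determines nothing else
```

Since A+ contains all attributes, A is a superkey of this schema.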
Equivalence
»Two sets of FDs, F and E, are equivalent (F ≡ E) iff F+ = E+
»Semantically: if F ≡ E, then r(R) is legal for F iff r(R) is legal for E
»By definition: F |= f iff F ≡ F ∪ {f}
»For each set F there exist many equivalent sets of FDs. We prefer simplicity: minimal cover
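Computing F+ in full is expensive, so equivalence is usually tested through attribute closures: F ≡ E iff every FD of E follows from F and vice versa. A minimal sketch, assuming FDs are stored as (lhs, rhs) pairs of sets; the names `covers` and `equivalent` are chosen for this example.

```python
def closure(X, F):
    """Attribute closure X+ of X under F; F is a list of (lhs, rhs) sets."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def covers(F, E):
    """True if every FD in E can be derived from F (F |= E)."""
    return all(rhs <= closure(lhs, F) for lhs, rhs in E)

def equivalent(F, E):
    """F ≡ E iff each set covers the other."""
    return covers(F, E) and covers(E, F)

# F = {AB -> CD, C -> D, A -> CB} and G = {A -> C, C -> D, A -> B}
F = [({"A", "B"}, {"C", "D"}), ({"C"}, {"D"}), ({"A"}, {"C", "B"})]
G = [({"A"}, {"C"}), ({"C"}, {"D"}), ({"A"}, {"B"})]
H = [({"A"}, {"C"}), ({"A"}, {"B"})]  # G without C -> D

print(equivalent(F, G))  # True
print(equivalent(F, H))  # False: H cannot derive C -> D
```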
Minimal Cover
»We can translate any F into an equivalent minimal cover G
»A set of FDs G is a minimal cover of F if G ≡ F and
  »for all X → Y ∈ G, Y has exactly one attribute (so, if X → YZ, then split into X → Y and X → Z)
  »We cannot remove any X → Y from G without losing equivalence with F
  »We cannot replace any X → Y in G by W → Y with W ⊂ X, without losing equivalence with F
Algorithm for Minimal Cover
1) Start with G := F;
2) Replace all X → Y with Y = {A1,..,An} by X → Ai; (IR4)
3) For all XY → A: if (G - {XY → A}) ∪ {X → A} ≡ G then replace XY → A with X → A;
4) If G - {X → A} ≡ G then remove X → A;

Example:
1) AB → CD; C → D; A → CB;
2) AB → C; AB → D; C → D; A → C; A → B;
3) A → C; A → D; C → D; A → B;
4) A → C; C → D; A → B;
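The four steps can be sketched in Python and checked against the example. This is one possible implementation, not necessarily the one intended in the lecture: FDs are (lhs, rhs) pairs of sets, equivalence tests are done through attribute closures, and left-hand sides are never reduced to the empty set.

```python
def closure(X, F):
    """Attribute closure X+ of X under F; F is a list of (lhs, rhs) sets."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(F):
    # Steps 1-2: copy F and split every right-hand side into single attributes.
    G = [(frozenset(lhs), frozenset({a})) for lhs, rhs in F for a in sorted(rhs)]
    # Step 3: drop left-hand-side attributes that are not needed.
    for i in range(len(G)):
        lhs, rhs = G[i]
        for attr in sorted(lhs):
            reduced = lhs - {attr}
            # Replacing is allowed iff G already implies reduced -> rhs.
            if reduced and rhs <= closure(reduced, G):
                lhs = reduced
                G[i] = (lhs, rhs)
    # Step 4: drop FDs that follow from the remaining ones (and duplicates).
    for fd in list(G):
        rest = list(G)
        rest.remove(fd)
        if fd[1] <= closure(fd[0], rest):
            G = rest
    return set(G)

# Example from the slide: AB -> CD; C -> D; A -> CB
F = [({"A", "B"}, {"C", "D"}), ({"C"}, {"D"}), ({"A"}, {"C", "B"})]
expected = {
    (frozenset("A"), frozenset("C")),
    (frozenset("C"), frozenset("D")),
    (frozenset("A"), frozenset("B")),
}
print(minimal_cover(F) == expected)  # True: A -> C; C -> D; A -> B
```

A minimal cover is in general not unique; a different processing order in steps 3 and 4 can give a different, equally valid result for other inputs.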
Normal forms
»Invented by Codd as a test on relational database schemas
»The tests (‘normal forms’) grow more severe. The more severe the test, the higher the normal form, the more robust the database
»If a schema does not pass the test, it is decomposed into partial schemas that do pass the test
»It is not always necessary to reach the highest possible normal form
1NF - First Normal Form
»Attributes can only be single-valued
»Is a basic demand of most relational databases
»Example of a non-1NF relation (see next slide). This normally is already ruled out by the definition of a relation (so using the relational database model automatically ensures 1NF)
Non-1NF - example
1NF - First Normal Form
»Solutions for a multi-valued attribute A in R:
  »Preferred: create new relation S with A and a foreign key to R
  »Extend the key of R with an index number for the values of A (redundancy!); e.g. department has no. 5A, or 5B, or 5C
  »Determine the maximum number of values per tuple for A (say k) and replace attribute by k attributes (say, loc1, loc2, and loc3). This introduces null-values!
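The preferred solution can be sketched with plain Python data. The department data and attribute names are hypothetical: the multi-valued attribute `dlocations` moves to a new relation that refers back to the department by its key.

```python
# Hypothetical non-1NF relation: dlocations holds a set of values per tuple.
department = [
    {"dnumber": 5, "dname": "Research",
     "dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"dnumber": 4, "dname": "Administration", "dlocations": ["Stafford"]},
]

# R without the multi-valued attribute; dnumber stays the key.
dept = [{"dnumber": d["dnumber"], "dname": d["dname"]} for d in department]

# New relation S(dnumber, dlocation): dnumber is a foreign key to dept,
# and the key of S is the combination (dnumber, dlocation).
dept_locations = [
    (d["dnumber"], loc) for d in department for loc in d["dlocations"]
]

print(dept_locations)
```

No nulls and no repeated department names are introduced, which is why this option is preferred over the other two.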
2NF - Second Normal form
»Definition: X → Y is a partial functional dependency if there is an attribute A in X s.t. X - {A} → Y
»X → Y is total if it is not partial
»2NF: each non-primary attribute is totally dependent on the primary key (and not on parts of the primary key)
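A minimal sketch of a 2NF test against the primary key, following the definition above. The schema and its FDs are hypothetical (an employee-project relation with key {ssn, pnumber}); the function name `partial_dependencies` is chosen for this example.

```python
from itertools import combinations

def closure(X, F):
    """Attribute closure X+ of X under F; F is a list of (lhs, rhs) sets."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, attrs, F):
    """Map each proper part of the key to the non-key attributes it determines."""
    non_key = set(attrs) - set(key)
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            dependent = closure(part, F) & non_key
            if dependent:
                found[part] = dependent
    return found

# Hypothetical schema EMP_PROJ with primary key {ssn, pnumber}.
attrs = {"ssn", "pnumber", "hours", "ename", "pname", "plocation"}
key = {"ssn", "pnumber"}
F = [
    ({"ssn", "pnumber"}, {"hours"}),
    ({"ssn"}, {"ename"}),
    ({"pnumber"}, {"pname", "plocation"}),
]

violations = partial_dependencies(key, attrs, F)
print(violations)
```

The relation is in 2NF (with respect to this key) exactly when the result is empty. Here `ename` depends on `ssn` alone and `pname`, `plocation` depend on `pnumber` alone, so only `hours` may stay with the full key.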
2NF - Normalizing
»Break up the relation such that each partial key, together with its dependent attributes, is in a separate relation. Only keep those attributes that depend totally on the primary key
»Example (see next slide)
2NF - example
3NF - Third Normal Form
»Definition: X → Y is a transitive dependency if there is a Z that is not (part of) a candidate key s.t. X → Z and Z → Y
»3NF: no non-primary attribute is transitively dependent on the primary key
3NF - Normalizing
»Break up the relation such that the attributes that depend on non-key attributes appear in a separate table (together with the attributes on which they depend)
»Example (see next slide)
3NF - example
General form 2NF and 3NF
»Put the same demands on all candidate keys (super keys), which is more severe
  »2NF: every non-key attribute is totally dependent on all keys
  »3NF: no non-key attribute is transitively dependent on any key
»Other formulation of 3NF: for every non-trivial X → A, A is prime or X is a super key
»Example (see next slide)
General form 2NF and 3NF - example
1NF
2NF
General form 2NF and 3NF - example
2NF
3NF
Boyce-Codd Normal Form
»Simpler, but stronger than 3NF
»BCNF: for each non-trivial dependency X → A holds that X is a super key
»Difference: in 3NF if A is a prime attribute, X does not have to be super key
»In many cases a 3NF schema is also BCNF
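The difference between the two tests can be made concrete with a small sketch. The schema R(A, B, C) with AB → C and C → B is a standard textbook case and not taken from these slides; the function names are assumptions. Candidate keys are found by brute force, which is only practical for small schemas.

```python
from itertools import combinations

def closure(X, F):
    """Attribute closure X+ of X under F; F is a list of (lhs, rhs) sets."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs, F):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if closure(s, F) == set(attrs) and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def violations(attrs, F, allow_prime_rhs):
    """FDs X -> A in F that break BCNF (or 3NF if allow_prime_rhs is True)."""
    prime = set().union(*candidate_keys(attrs, F))
    bad = []
    for lhs, rhs in F:
        if closure(lhs, F) == set(attrs):      # X is a super key: fine
            continue
        for a in sorted(set(rhs) - set(lhs)):  # skip the trivial part
            if allow_prime_rhs and a in prime:  # the 3NF escape clause
                continue
            bad.append((set(lhs), a))
    return bad

attrs = {"A", "B", "C"}
F = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]

print(candidate_keys(attrs, F))                    # keys AB and AC
print(violations(attrs, F, allow_prime_rhs=True))  # []: schema is in 3NF
print(violations(attrs, F, allow_prime_rhs=False)) # C -> B breaks BCNF
```

Every attribute is prime here, so C → B passes the 3NF test although C is not a super key; BCNF has no such exception.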
BCNF example
Decompositions
»Only adhering to a normal form is not enough
»We must not lose attributes in the process!
»Non-additive Join-property:
  »a natural join of the result of a decomposition should result in the original table, without spurious tuples
»There exist algorithms to automatically find good decompositions
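For a decomposition into exactly two relations there is a simple standard test, which is not stated on the slide: the join is non-additive iff the common attributes functionally determine all of R1 - R2 or all of R2 - R1. A minimal sketch, with a hypothetical employee-project schema and the assumed function name `is_lossless_binary`:

```python
def closure(X, F):
    """Attribute closure X+ of X under F; F is a list of (lhs, rhs) sets."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_binary(R1, R2, F):
    """Non-additive join test for a decomposition into two relations only."""
    common_closure = closure(set(R1) & set(R2), F)
    return (set(R1) - set(R2)) <= common_closure or \
           (set(R2) - set(R1)) <= common_closure

# Hypothetical FDs on (ssn, ename, pnumber, pname, plocation, hours).
F = [
    ({"ssn", "pnumber"}, {"hours"}),
    ({"ssn"}, {"ename"}),
    ({"pnumber"}, {"pname", "plocation"}),
]
rest = {"ssn", "pnumber", "hours", "pname", "plocation"}

# Split on the key attribute ssn: ssn -> ename, so the join is non-additive.
print(is_lossless_binary({"ssn", "ename"}, rest, F))        # True
# Split on the non-key attribute plocation: spurious tuples are possible.
print(is_lossless_binary({"ename", "plocation"}, rest, F))  # False
```

Decompositions into more than two relations need a more general test; this sketch does not cover that case.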
ER-schema to relational schemas
»A relational database schema that is mapped from an ER-schema is often in BCNF, but always in 3NF (so, check if BCNF is applicable and useful)
»Many CASE-tools can map an ER-schema automatically into a good relational schema (e.g., SQL create-table commands)