CSE 530A
Normalization
Washington University
Fall 2013
1NF
• A relation is in first normal form if
– the domain of each attribute contains only
atomic values
– the value of each attribute contains only a
single value from that domain
• That is,
– no value in a table should contain a set of
values
1NF
• Example of a table that is not in first
normal form
– Most DBMSs will not allow this (though some
have special array field types)
customers
cust_id last_name first_name phone
1 Bunny Bugs 555-123-4567
2 Duck Daffy 555-123-4568, 555-867-5309 (a set of values)
3 Pig Porky 555-123-4569
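The slides do not prescribe a fix for this table. One common approach, sketched here under that assumption, moves the phone numbers into a separate table with one atomic value per row. The name `split_phones` and the tuple layout are hypothetical.

```python
# Hypothetical in-memory copy of the customers table; customer 2 holds a set of phones.
customers = [
    (1, "Bunny", "Bugs", ["555-123-4567"]),
    (2, "Duck", "Daffy", ["555-123-4568", "555-867-5309"]),
    (3, "Pig", "Porky", ["555-123-4569"]),
]


def split_phones(rows):
    """Return (people, phones) where every phone value is atomic."""
    people = [(cust_id, last, first) for cust_id, last, first, _ in rows]
    phones = [
        (cust_id, phone)
        for cust_id, _, _, numbers in rows
        for phone in numbers
    ]
    return people, phones


people, phones = split_phones(customers)
for row in phones:
    print(row)
```

Customer 2 now appears on two rows of `phones`, each holding a single value.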
Beyond 1NF
• The objectives of normalization beyond 1NF
1. To free the collection of relations from undesirable insertion, update and deletion dependencies;
2. To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the life span of application programs;
3. To make the relational model more informative to users;
4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.
– E.F. Codd, "Further Normalization of the Data Base Relational Model"
Modification Anomalies
• Update anomaly
– Employee 519 has different addresses on
different records
Modification Anomalies
• Insertion anomaly
– New faculty 424 cannot be inserted until assigned a course
Modification Anomalies
• Deletion anomaly
– All record of faculty 389 is lost if that faculty member is
temporarily not assigned to any course
Functional Dependency
• In a given table, an attribute Y is said to
have a functional dependency on a set
of attributes X (written X → Y) if and only if
each X value is associated with precisely
one Y value
– e.g., in an "Employee" table that has attributes
"Employee ID" and "Date of Birth" {Employee
ID} → {Date of Birth} would hold
• Each employee has one Date of Birth
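A minimal sketch of checking the definition against sample rows. The rows and the helper name `fd_holds` are hypothetical. A functional dependency is a rule about the table's meaning, so sample data can refute one but cannot prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every lhs value is paired with exactly one rhs value."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical rows, not taken from the slides.
employees = [
    {"employee_id": 1, "date_of_birth": "1980-01-01", "skill": "Typing"},
    {"employee_id": 1, "date_of_birth": "1980-01-01", "skill": "Shorthand"},
    {"employee_id": 2, "date_of_birth": "1975-06-15", "skill": "Typing"},
]

print(fd_holds(employees, ["employee_id"], ["date_of_birth"]))  # True
print(fd_holds(employees, ["employee_id"], ["skill"]))          # False
```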
Trivial FD
• A trivial functional dependency is a
functional dependency of an attribute on a
superset of itself.
– e.g., {Employee ID, Employee Address} →
{Employee Address} is trivial, as is {Employee
Address} → {Employee Address}
Full FD
• An attribute is fully functionally
dependent on a set of attributes X if it is
– functionally dependent on X, and
– not functionally dependent on any proper
subset of X.
– e.g., {Employee Address} has a functional
dependency on {Employee ID, Skill}, but not
a full functional dependency, because it is
also dependent on just {Employee ID}.
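The two conditions can be sketched as a check over sample rows. The data, the `proficiency` attribute, and the helper names are hypothetical, and only non-empty proper subsets are tried.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if every lhs value is paired with exactly one rhs value."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """True if lhs -> rhs holds and no non-empty proper subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False
    return True


# Hypothetical rows: address depends on the employee alone,
# proficiency depends on the employee and the skill together.
rows = [
    {"employee_id": 1, "skill": "Typing", "address": "114 Main Street", "proficiency": "expert"},
    {"employee_id": 1, "skill": "Shorthand", "address": "114 Main Street", "proficiency": "novice"},
    {"employee_id": 2, "skill": "Typing", "address": "73 Industrial Way", "proficiency": "novice"},
]

print(is_full_fd(rows, ["employee_id", "skill"], ["address"]))      # False
print(is_full_fd(rows, ["employee_id", "skill"], ["proficiency"]))  # True
```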
Superkeys and Candidate Keys
• A superkey is a combination of attributes
that can be used to uniquely identify a
database record
• A candidate key is a minimal superkey
– that is, a superkey is a candidate key if
no proper subset of it is also a
superkey
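A brute-force sketch of these definitions, using rows from the Employees Skills example that appears later in these slides. The helper names are hypothetical. Keys are properties of the schema, so a search over sample rows only suggests which attribute sets could be keys.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on all of attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)


def candidate_keys(rows, attributes):
    """Return the minimal superkeys, trying smaller attribute sets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(k) <= set(attrs) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


data = [
    ("Jones", "Typing", "114 Main Street"),
    ("Jones", "Shorthand", "114 Main Street"),
    ("Jones", "Whittling", "114 Main Street"),
    ("Bravo", "Light Cleaning", "73 Industrial Way"),
    ("Ellis", "Alchemy", "73 Industrial Way"),
    ("Ellis", "Flying", "73 Industrial Way"),
    ("Harrison", "Light Cleaning", "73 Industrial Way"),
]
attributes = ["employee", "skill", "location"]
rows = [dict(zip(attributes, r)) for r in data]

print(candidate_keys(rows, attributes))  # [('employee', 'skill')]
```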
Prime and Non-Prime Attributes
• A non-prime attribute is an attribute that
does not occur in any candidate key
– "Employee Address" would be a non-prime
attribute in the "Employees Skills" table
• A prime attribute is an attribute that does
occur in some candidate key
2NF
• A table is in second normal form if
– it is in first normal form and
– no non-prime attribute is dependent on any
proper subset of any candidate key of the
table
• 2NF is violated when
– a non-key field is a fact about a subset of a
key
• only relevant for tables with composite keys
2NF
• Employees Skills is not in second normal form
– Neither {Employee} nor {Skill} is a candidate key
– Only the composite key {Employee, Skill} qualifies as a candidate key
– Current Work Location is dependent on only part of the candidate key, namely Employee
Employees Skills
Employee Skill Current Work Location
Jones Typing 114 Main Street
Jones Shorthand 114 Main Street
Jones Whittling 114 Main Street
Bravo Light Cleaning 73 Industrial Way
Ellis Alchemy 73 Industrial Way
Ellis Flying 73 Industrial Way
Harrison Light Cleaning 73 Industrial Way
2NF
• The example can be made 2NF by using
two tables
Employees
Employee Current Work Location
Jones 114 Main Street
Bravo 73 Industrial Way
Ellis 73 Industrial Way
Harrison 73 Industrial Way
Employees Skills
Employee Skill
Jones Typing
Jones Shorthand
Jones Whittling
Bravo Light Cleaning
Ellis Alchemy
Ellis Flying
Harrison Light Cleaning
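A short sketch showing that the split loses nothing: projecting the original table onto the two new tables and joining them on Employee gives back the original rows. The variable names are hypothetical.

```python
# Rows of the original Employees Skills table: (employee, skill, location).
original = {
    ("Jones", "Typing", "114 Main Street"),
    ("Jones", "Shorthand", "114 Main Street"),
    ("Jones", "Whittling", "114 Main Street"),
    ("Bravo", "Light Cleaning", "73 Industrial Way"),
    ("Ellis", "Alchemy", "73 Industrial Way"),
    ("Ellis", "Flying", "73 Industrial Way"),
    ("Harrison", "Light Cleaning", "73 Industrial Way"),
}

# Project onto the two 2NF tables; sets drop the repeated locations.
employees = {(e, loc) for e, _, loc in original}
employee_skills = {(e, s) for e, s, _ in original}

# Natural join on employee.
rejoined = {
    (e, s, loc)
    for e, s in employee_skills
    for e2, loc in employees
    if e == e2
}

print(len(employees), len(employee_skills))  # 4 7
print(rejoined == original)                  # True
```

Each work location is now stored once per employee, so changing it touches a single row.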
Transitive Dependency
• A transitive dependency is an indirect
functional dependency
– X→Z only by virtue of X→Y and Y→Z
• e.g., {Tournament, Year} → {Winner} and {Winner}
→ {Winner Date of Birth}, so {Winner Date of Birth}
is transitively dependent on {Tournament, Year}
Tournament Year Winner Winner Date of Birth
Indiana Invitational 1998 Al Fredrickson 21 July 1975
Cleveland Open 1999 Bob Albertson 28 September 1968
Des Moines Masters 1999 Al Fredrickson 21 July 1975
Indiana Invitational 1999 Chip Masterson 14 March 1977
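The chain can be checked against the rows of this table. This is a sketch with hypothetical helper names. It also checks that Y does not determine X, which is what makes the dependency indirect rather than a restatement of the key.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every lhs value is paired with exactly one rhs value."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_transitive(rows, x, y, z):
    """True if x -> y and y -> z hold in rows while y -> x does not."""
    return fd_holds(rows, x, y) and fd_holds(rows, y, z) and not fd_holds(rows, y, x)


columns = ["tournament", "year", "winner", "winner_dob"]
data = [
    ("Indiana Invitational", 1998, "Al Fredrickson", "21 July 1975"),
    ("Cleveland Open", 1999, "Bob Albertson", "28 September 1968"),
    ("Des Moines Masters", 1999, "Al Fredrickson", "21 July 1975"),
    ("Indiana Invitational", 1999, "Chip Masterson", "14 March 1977"),
]
rows = [dict(zip(columns, r)) for r in data]

print(is_transitive(rows, ["tournament", "year"], ["winner"], ["winner_dob"]))  # True
```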
3NF
• A table is in third normal form if
– it is in second normal form and
– every non-prime attribute is directly (non-transitively)
dependent on every candidate key of
the table
• 3NF is violated when
– a non-key field is a fact about another non-
key field
3NF
• This table is not in 3NF because the non-
prime attribute "Winner Date of Birth" is
transitively dependent on the candidate
key {Tournament, Year}
Tournament Year Winner Winner Date of Birth
Indiana Invitational 1998 Al Fredrickson 21 July 1975
Cleveland Open 1999 Bob Albertson 28 September 1968
Des Moines Masters 1999 Al Fredrickson 21 July 1975
Indiana Invitational 1999 Chip Masterson 14 March 1977
3NF
• The example can be made 3NF by using
two tables
Tournament Winners
Tournament Year Winner
Indiana Invitational 1998 Al Fredrickson
Cleveland Open 1999 Bob Albertson
Des Moines Masters 1999 Al Fredrickson
Indiana Invitational 1999 Chip Masterson
Player Dates of Birth
Player Date of Birth
Chip Masterson 14 March 1977
Al Fredrickson 21 July 1975
Bob Albertson 28 September 1968
3NF
• "…the key, the whole key, and nothing but the key." – Bill Kent
• 1NF – implies existence of a "key" (no duplicate rows)
• 2NF – non-prime attributes dependent on "the whole key"
• 3NF – non-prime attributes dependent on "nothing but the
key"
BCNF
• A table is in Boyce-Codd normal form if
– it is in third normal form and
– for every dependency X→Y either
• X→Y is a trivial dependency (Y is a subset of X), or
• X is a superkey
• It is very rare for a table to be in 3NF but not BCNF
– A table must have multiple overlapping candidate keys to possibly be 3NF but not BCNF
BCNF
• Two courts, two rate types per court
• Candidate keys:
– {Court, Start Time}
– {Court, End Time}
– {Rate Type, Start Time}
– {Rate Type, End Time}
Today's Court Bookings
Court Start Time End Time Rate Type
1 09:30 10:30 SAVER
1 11:00 12:00 SAVER
1 14:00 15:30 STANDARD
2 10:00 11:30 PREMIUM-B
2 11:30 13:30 PREMIUM-B
2 15:00 16:30 PREMIUM-A
BCNF
• 3NF but not BCNF
– because of dependency {Rate Type} → {Court}
• Can be made BCNF by splitting
Rate Types
Rate Type Court Member Flag
SAVER 1 Yes
STANDARD 1 No
PREMIUM-A 2 Yes
PREMIUM-B 2 No
Today's Bookings
Rate Type Start Time End Time
SAVER 09:30 10:30
SAVER 11:00 12:00
STANDARD 14:00 15:30
PREMIUM-B 10:00 11:30
PREMIUM-B 11:30 13:30
PREMIUM-A 15:00 16:30
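The violation can be checked against the rows of Today's Court Bookings: {Rate Type} → {Court} holds and is non-trivial, yet {Rate Type} is not a superkey. This is a sketch with hypothetical helper and column names, and it tests only the sample rows.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every lhs value is paired with exactly one rhs value."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_superkey(rows, attrs):
    """True if no two rows agree on all of attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)


def violates_bcnf(rows, lhs, rhs):
    """True if lhs -> rhs is a non-trivial dependency whose lhs is not a superkey."""
    nontrivial = not set(rhs) <= set(lhs)
    return nontrivial and fd_holds(rows, lhs, rhs) and not is_superkey(rows, lhs)


columns = ["court", "start", "end", "rate_type"]
data = [
    (1, "09:30", "10:30", "SAVER"),
    (1, "11:00", "12:00", "SAVER"),
    (1, "14:00", "15:30", "STANDARD"),
    (2, "10:00", "11:30", "PREMIUM-B"),
    (2, "11:30", "13:30", "PREMIUM-B"),
    (2, "15:00", "16:30", "PREMIUM-A"),
]
bookings = [dict(zip(columns, r)) for r in data]

print(violates_bcnf(bookings, ["rate_type"], ["court"]))           # True
print(violates_bcnf(bookings, ["court", "start"], ["rate_type"]))  # False
```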
4NF
• A table is in fourth normal form if
– it is in third normal form and
– for every non-trivial multivalued dependency
X ↠ Y, X is a superkey
Multivalued Dependency
• A multivalued dependency is a
constraint in which the presence of certain
rows in a table implies the presence of
certain other rows
Multivalued Dependency
• Consider a table that tries to map two
independent many-to-many relationships
between three attributes
– independent: no relationship between two of
the attributes
• e.g., employee-skill and employee-language but
no relationship between skill and language
• How are the records maintained?
Multivalued Dependency
• Option 1: disjoint format
– a record contains either a skill or a language
but not both
employee skill language
Jones cook NULL
Jones type NULL
Jones NULL English
Jones NULL French
Jones NULL German
Multivalued Dependency
• Option 2: minimal number of records, with
repetitions
employee skill language
Jones cook English
Jones type French
Jones type German
Multivalued Dependency
• Option 3: minimal number of records, with
null values
employee skill language
Jones cook English
Jones type French
Jones NULL German
Multivalued Dependency
• Option 4: unrestricted
employee skill language
Jones cook English
Jones type NULL
Jones NULL French
Jones type German
Multivalued Dependency
• Option 5: all possible pairings
– This is actually the version that corresponds to a multivalued dependency
• {employee} ↠ {skill}
• {employee} ↠ {language}
employee skill language
Jones cook English
Jones cook French
Jones cook German
Jones type English
Jones type French
Jones type German
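"The presence of certain rows implies the presence of certain other rows" can be sketched as a check: for any two rows that agree on X, the row that takes its Y values from the first and everything else from the second must also be present. The helper name `mvd_holds` is hypothetical.

```python
def mvd_holds(rows, x, y):
    """True if the multivalued dependency x ->> y holds in rows."""
    attrs = list(rows[0].keys())
    present = {tuple(row[a] for a in attrs) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # Take the y values from t1 and all other values from t2.
                mixed = {**t2, **{a: t1[a] for a in y}}
                if tuple(mixed[a] for a in attrs) not in present:
                    return False
    return True


columns = ["employee", "skill", "language"]

# Option 5: all possible pairings.
option5 = [dict(zip(columns, r)) for r in [
    ("Jones", "cook", "English"),
    ("Jones", "cook", "French"),
    ("Jones", "cook", "German"),
    ("Jones", "type", "English"),
    ("Jones", "type", "French"),
    ("Jones", "type", "German"),
]]

# Option 2: minimal number of records, with repetitions.
option2 = [dict(zip(columns, r)) for r in [
    ("Jones", "cook", "English"),
    ("Jones", "type", "French"),
    ("Jones", "type", "German"),
]]

print(mvd_holds(option5, ["employee"], ["skill"]))  # True
print(mvd_holds(option2, ["employee"], ["skill"]))  # False
```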
4NF
• Split into two tables
employee skill
Jones cook
Jones type
employee language
Jones English
Jones French
Jones German
5NF
• A table is in fifth normal form if
– it is in fourth normal form and
– every non-trivial join dependency is implied by the
candidate keys
Join Dependency
• A table T has a join dependency if T can
always be recreated by joining multiple
tables, each having a subset of the
attributes of T
5NF
• Suppose an agent sells specific products
for certain companies
– e.g., agent Jones sells Ford cars and Toyota
trucks but not Ford trucks or Toyota cars
• We would need a table with all three
attributes
agent company product
Jones Ford car
Jones Toyota truck
5NF
• But suppose instead that a rule applies: if an agent sells
a product and represents a company that makes that product,
then the agent sells that product for that company
– the table below satisfies this rule but is not in 5NF
agent company product
Jones Ford car
Jones Toyota truck
Jones Ford truck
Jones Toyota car
5NF
• In 5NF, this would be represented with
three tables
agent company
Jones Ford
Jones Toyota
agent product
Jones car
Jones truck
company product
Ford car
Ford truck
Toyota car
Toyota truck
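A sketch of the join dependency: joining the three two-column tables rebuilds the four-row agent/company/product table exactly. The helper name `join3` is hypothetical, as is the `unrestricted` table and its agent Smith, which are added to show a table where the same three-way join produces a row that was never stored.

```python
def join3(agent_company, agent_product, company_product):
    """Natural join of the three pairwise tables."""
    return {
        (a, c, p)
        for a, c in agent_company
        for a2, p in agent_product
        if a == a2 and (c, p) in company_product
    }


def project(rows):
    """Split (agent, company, product) rows into the three pairwise tables."""
    return (
        {(a, c) for a, c, _ in rows},
        {(a, p) for a, _, p in rows},
        {(c, p) for _, c, p in rows},
    )


# The four-row table from the slides.
with_rule = {
    ("Jones", "Ford", "car"),
    ("Jones", "Toyota", "truck"),
    ("Jones", "Ford", "truck"),
    ("Jones", "Toyota", "car"),
}
print(join3(*project(with_rule)) == with_rule)  # True

# Hypothetical table where agents sell specific products for specific companies.
unrestricted = {
    ("Jones", "Ford", "car"),
    ("Jones", "Toyota", "truck"),
    ("Smith", "Ford", "truck"),
}
spurious = join3(*project(unrestricted)) - unrestricted
print(spurious)  # {('Jones', 'Ford', 'truck')}
```

When the join invents rows like this, the table has no such join dependency and the three-attribute table must be kept.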
More on Normalization
• "A Simple Guide to Five Normal Forms in
Relational Database Theory"
– by William Kent
• http://www.bkent.net/Doc/simple5.htm