CSE 530A Normalization Washington University Fall 2013

CSE 530A Normalization - Washington University in St. Louis · 2013-09-30

CSE 530A

Normalization

Washington University

Fall 2013

1NF

• A relation is in first normal form if
  – the domain of each attribute contains only atomic values
  – the value of each attribute contains only a single value from that domain
• That is,
  – no value in a table should contain a set of values

1NF

• Example of a table that is not in first normal form
  – Most DBMSs will not allow this (though some have special array field types)

customers
cust_id  last_name  first_name  phone
1        Bunny      Bugs        555-123-4567
2        Duck       Daffy       555-123-4568
                                555-867-5309   <- set of values
3        Pig        Porky       555-123-4569
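
One common way to bring a table like this into 1NF is to store each phone number in its own row. The following is a minimal Python sketch of that idea, assuming the rows are held as plain dicts; the names `customers`, `to_1nf`, and `customer_phones` are illustrative and do not come from the slides.

```python
# Hypothetical in-memory copy of the customers table. The "phone" field
# for Daffy Duck holds a set of values, which is what violates 1NF.
customers = [
    {"cust_id": 1, "last_name": "Bunny", "first_name": "Bugs",
     "phone": ["555-123-4567"]},
    {"cust_id": 2, "last_name": "Duck", "first_name": "Daffy",
     "phone": ["555-123-4568", "555-867-5309"]},
    {"cust_id": 3, "last_name": "Pig", "first_name": "Porky",
     "phone": ["555-123-4569"]},
]


def to_1nf(rows):
    """Emit one (cust_id, phone) row per phone number so every value is atomic."""
    return [(row["cust_id"], phone) for row in rows for phone in row["phone"]]


customer_phones = to_1nf(customers)
for cust_id, phone in customer_phones:
    print(cust_id, phone)
```

The customer names would stay in the customers table (without the phone column), and the new table would be keyed on (cust_id, phone).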

Beyond 1NF

• The objectives of normalization beyond 1NF
  1. To free the collection of relations from undesirable insertion, update and deletion dependencies;
  2. To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the life span of application programs;
  3. To make the relational model more informative to users;
  4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.
  – E.F. Codd, "Further Normalization of the Data Base Relational Model"

Modification Anomalies

• Update anomaly
  – Employee 519 has different addresses on different records

Modification Anomalies

• Insertion anomaly
  – New faculty 424 cannot be inserted until assigned a course

Modification Anomalies

• Deletion anomaly
  – All information about faculty 389 is deleted if they are temporarily not assigned to any course

Functional Dependency

• In a given table, an attribute Y is said to have a functional dependency on a set of attributes X (written X → Y) if and only if each X value is associated with precisely one Y value
  – e.g., in an "Employee" table that has attributes "Employee ID" and "Date of Birth", {Employee ID} → {Date of Birth} would hold
    • Each employee has one Date of Birth
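
The definition above can be tested mechanically against a set of rows. The sketch below is one way to do that in Python; the function name `fd_holds` and the sample rows are invented for illustration. Note that a check like this can only refute a dependency on the data at hand: a functional dependency is a constraint on every legal state of the table, not just the current one.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y value seen for each X value; any later
        # row with the same X but a different Y breaks the dependency.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Invented sample rows, not taken from the slides.
employees = [
    {"employee_id": 1, "date_of_birth": "1970-01-01", "skill": "typing"},
    {"employee_id": 1, "date_of_birth": "1970-01-01", "skill": "shorthand"},
    {"employee_id": 2, "date_of_birth": "1982-05-17", "skill": "typing"},
]

print(fd_holds(employees, ["employee_id"], ["date_of_birth"]))  # True
print(fd_holds(employees, ["employee_id"], ["skill"]))          # False
```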

Trivial FD

• A trivial functional dependency is a functional dependency of an attribute on a superset of itself.
  – e.g., {Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee Address}

Full FD

• An attribute is fully functionally dependent on a set of attributes X if it is
  – functionally dependent on X, and
  – not functionally dependent on any proper subset of X.
• e.g., {Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also dependent on just {Employee ID}.

Superkeys and Candidate Keys

• A superkey is a combination of attributes that can be used to uniquely identify a database record
• A candidate key is a minimal superkey
  – that is, a superkey is a candidate key if no proper subset of it is also a superkey
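
Given a list of functional dependencies, candidate keys can be found by computing attribute closures. The following is a brute-force sketch, assuming dependencies are given as (left side, right side) pairs of Python sets; the helper names `closure` and `candidate_keys` are illustrative. It uses the "Employees Skills" table from these slides, with the single dependency {Employee} → {Current Work Location}.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Enumerate minimal superkeys by increasing size (exponential; small schemas only)."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is a superkey but not minimal
            if closure(candidate, fds) == set(all_attrs):
                keys.append(candidate)
    return keys


attrs = {"Employee", "Skill", "Current Work Location"}
fds = [({"Employee"}, {"Current Work Location"})]
keys = candidate_keys(attrs, fds)
print(keys)
```

For this schema the only candidate key found is {Employee, Skill}; every other superkey contains it.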

Prime and Non-Prime Attributes

• A non-prime attribute is an attribute that does not occur in any candidate key
  – "Employee Address" would be a non-prime attribute in the "Employees Skills" table
• A prime attribute is an attribute that does occur in some candidate key

2NF

• A table is in second normal form if
  – it is in first normal form and
  – no non-prime attribute is dependent on any proper subset of any candidate key of the table
• 2NF is violated when
  – a non-key field is a fact about a subset of a key
    • only relevant for tables with composite keys

2NF

• Employees Skills is not in second normal form
  – Neither {Employee} nor {Skill} is a candidate key
  – Only the composite key {Employee, Skill} qualifies as a candidate key
  – Current Work Location is dependent on only part of the candidate key, namely Employee

Employees Skills
Employee  Skill           Current Work Location
Jones     Typing          114 Main Street
Jones     Shorthand       114 Main Street
Jones     Whittling       114 Main Street
Bravo     Light Cleaning  73 Industrial Way
Ellis     Alchemy         73 Industrial Way
Ellis     Flying          73 Industrial Way
Harrison  Light Cleaning  73 Industrial Way

2NF

• The example can be made 2NF by using two tables

Employees
Employee  Current Work Location
Jones     114 Main Street
Bravo     73 Industrial Way
Ellis     73 Industrial Way
Harrison  73 Industrial Way

Employees Skills
Employee  Skill
Jones     Typing
Jones     Shorthand
Jones     Whittling
Bravo     Light Cleaning
Ellis     Alchemy
Ellis     Flying
Harrison  Light Cleaning
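
The split into two tables loses no information: joining them on Employee gives back the original rows. A small Python sketch of that check, assuming the tables are held as sets of tuples (the variable names are illustrative):

```python
# The original Employees Skills table from the slides.
employees_skills = [
    ("Jones", "Typing", "114 Main Street"),
    ("Jones", "Shorthand", "114 Main Street"),
    ("Jones", "Whittling", "114 Main Street"),
    ("Bravo", "Light Cleaning", "73 Industrial Way"),
    ("Ellis", "Alchemy", "73 Industrial Way"),
    ("Ellis", "Flying", "73 Industrial Way"),
    ("Harrison", "Light Cleaning", "73 Industrial Way"),
]

# Project onto (Employee, Current Work Location) and (Employee, Skill).
# Using sets drops the duplicate locations that the original repeated.
employees = {(emp, loc) for emp, _, loc in employees_skills}
skills = {(emp, skill) for emp, skill, _ in employees_skills}

# Natural join on Employee rebuilds the original rows.
rejoined = {
    (emp, skill, loc)
    for emp, skill in skills
    for emp2, loc in employees
    if emp == emp2
}

print(len(employees), len(skills), rejoined == set(employees_skills))  # 4 7 True
```

Each employee's location is now stored once, so it can no longer disagree between rows.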

Transitive Dependency

• A transitive dependency is an indirect functional dependency
  – X→Z only by virtue of X→Y and Y→Z
    • e.g., {Tournament, Year} → {Winner} and {Winner} → {Winner Date of Birth}, so {Winner Date of Birth} is transitively dependent on {Tournament, Year}

Tournament            Year  Winner          Winner Date of Birth
Indiana Invitational  1998  Al Fredrickson  21 July 1975
Cleveland Open        1999  Bob Albertson   28 September 1968
Des Moines Masters    1999  Al Fredrickson  21 July 1975
Indiana Invitational  1999  Chip Masterson  14 March 1977

3NF

• A table is in third normal form if
  – it is in second normal form and
  – every non-prime attribute is directly (non-transitively) dependent on every superkey of the table
• 3NF is violated when
  – a non-key field is a fact about another non-key field

3NF

• This table is not in 3NF because the non-prime attribute "Winner Date of Birth" is transitively dependent on the candidate key {Tournament, Year}

Tournament            Year  Winner          Winner Date of Birth
Indiana Invitational  1998  Al Fredrickson  21 July 1975
Cleveland Open        1999  Bob Albertson   28 September 1968
Des Moines Masters    1999  Al Fredrickson  21 July 1975
Indiana Invitational  1999  Chip Masterson  14 March 1977

3NF

• The example can be made 3NF by using two tables

Tournament Winners
Tournament            Year  Winner
Indiana Invitational  1998  Al Fredrickson
Cleveland Open        1999  Bob Albertson
Des Moines Masters    1999  Al Fredrickson
Indiana Invitational  1999  Chip Masterson

Player Dates of Birth
Player          Date of Birth
Chip Masterson  14 March 1977
Al Fredrickson  21 July 1975
Bob Albertson   28 September 1968

3NF

• "…the key, the whole key, and nothing but the key." – Bill Kent
• 1NF – implies existence of a "key" (no duplicate rows)
• 2NF – non-prime attributes dependent on "the whole key"
• 3NF – non-prime attributes dependent on "nothing but the key"

BCNF

• A table is in Boyce-Codd normal form if
  – it is in third normal form and
  – for every dependency X→Y, at least one of the following holds
    • X→Y is a trivial dependency (Y is a subset of X)
    • X is a superkey
• It is very rare for a table to be in 3NF but not BCNF
  – A table can be in 3NF but not BCNF only if it has multiple overlapping candidate keys

BCNF

• Two courts, two rate types per court
• Candidate keys:
  – {Court, Start Time}
  – {Court, End Time}
  – {Rate Type, Start Time}
  – {Rate Type, End Time}

Today's Court Bookings
Court  Start Time  End Time  Rate Type
1      09:30       10:30     SAVER
1      11:00       12:00     SAVER
1      14:00       15:30     STANDARD
2      10:00       11:30     PREMIUM-B
2      11:30       13:30     PREMIUM-B
2      15:00       16:30     PREMIUM-A

BCNF

• 3NF but not BCNF
  – because of dependency {Rate Type} → {Court}
• Can be made BCNF by splitting

Rate Types
Rate Type  Court  Member Flag
SAVER      1      Yes
STANDARD   1      No
PREMIUM-A  2      Yes
PREMIUM-B  2      No

Today's Bookings
Rate Type  Start Time  End Time
SAVER      09:30       10:30
SAVER      11:00       12:00
STANDARD   14:00       15:30
PREMIUM-B  10:00       11:30
PREMIUM-B  11:30       13:30
PREMIUM-A  15:00       16:30
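
The difference between a 3NF violation and a BCNF-only violation can be sketched in Python. The code below is an illustration under stated assumptions: dependencies are (left side, right side) pairs of sets, the candidate keys are supplied by hand as listed in the slides, and only the dependencies explicitly listed are inspected (not every dependency they imply). The names `closure` and `violations` are hypothetical helpers.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations(all_attrs, fds, keys):
    """Return (bcnf_violations, third_nf_violations) among the listed FDs."""
    prime = set().union(*keys)  # attributes that occur in some candidate key
    bcnf, third = [], []
    for lhs, rhs in fds:
        if rhs <= lhs or closure(lhs, fds) == all_attrs:
            continue  # trivial, or the left side is a superkey
        bcnf.append((lhs, rhs))
        if not (rhs - lhs) <= prime:
            third.append((lhs, rhs))  # a non-prime attribute depends on a non-superkey
    return bcnf, third


# Tournament winners table: violates 3NF (and therefore BCNF).
t_attrs = {"Tournament", "Year", "Winner", "Winner Date of Birth"}
t_fds = [({"Tournament", "Year"}, {"Winner"}),
         ({"Winner"}, {"Winner Date of Birth"})]
t_keys = [{"Tournament", "Year"}]
t_bcnf, t_3nf = violations(t_attrs, t_fds, t_keys)

# Court bookings table: in 3NF, but {Rate Type} -> {Court} violates BCNF.
c_attrs = {"Court", "Start Time", "End Time", "Rate Type"}
c_fds = [({"Court", "Start Time"}, {"End Time", "Rate Type"}),
         ({"Rate Type"}, {"Court"})]
c_keys = [{"Court", "Start Time"}, {"Court", "End Time"},
          {"Rate Type", "Start Time"}, {"Rate Type", "End Time"}]
c_bcnf, c_3nf = violations(c_attrs, c_fds, c_keys)

print(len(t_3nf), len(c_bcnf), len(c_3nf))  # 1 1 0
```

In the court bookings case the dependent attribute Court is prime (it occurs in a candidate key), which is why the dependency is tolerated by 3NF but not by BCNF.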

4NF

• A table is in fourth normal form if
  – it is in third normal form and
  – for every non-trivial multivalued dependency X ↠ Y, X is a superkey

Multivalued Dependency

• A multivalued dependency is a constraint in which the presence of certain rows in a table implies the presence of certain other rows

Multivalued Dependency

• Consider a table that tries to map two independent many-to-many relationships between three attributes
  – independent: no relationship between two of the attributes
    • e.g., employee-skill and employee-language but no relationship between skill and language
• How are the records maintained?

Multivalued Dependency

• Option 1: disjoint format
  – a record contains either a skill or a language but not both

employee  skill  language
Jones     cook
Jones     type
Jones            English
Jones            French
Jones            German

Multivalued Dependency

• Option 2: minimal number of records, with repetitions

employee  skill  language
Jones     cook   English
Jones     type   French
Jones     type   German

Multivalued Dependency

• Option 3: minimal number of records, with null values

employee  skill  language
Jones     cook   English
Jones     type   French
Jones            German

Multivalued Dependency

• Option 4: unrestricted

employee  skill  language
Jones     cook   English
Jones     type
Jones            French
Jones     type   German

Multivalued Dependency

• Option 5: all possible pairings
  – This is actually the version that corresponds to a multivalued dependency
    • {employee} ↠ {skill}
    • {employee} ↠ {language}

employee  skill  language
Jones     cook   English
Jones     cook   French
Jones     cook   German
Jones     type   English
Jones     type   French
Jones     type   German
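
Whether a multivalued dependency holds in a given set of rows can be checked directly from the definition: for any two rows that agree on X, the row combining the Y values of the first with the remaining values of the second must also be present. A minimal Python sketch, with the hypothetical helper `mvd_holds` and rows stored as tuples:

```python
def mvd_holds(rows, attrs, lhs, rhs):
    """Check the multivalued dependency lhs ->> rhs on rows (tuples ordered like attrs)."""
    index = {name: i for i, name in enumerate(attrs)}
    rest = [a for a in attrs if a not in lhs and a not in rhs]

    def pick(row, names):
        return tuple(row[index[a]] for a in names)

    # Represent each row as (X values, Y values, remaining values).
    present = {(pick(r, lhs), pick(r, rhs), pick(r, rest)) for r in rows}
    for x1, y1, _ in present:
        for x2, _, z2 in present:
            # Rows agreeing on X must also appear with Y and the rest swapped.
            if x1 == x2 and (x1, y1, z2) not in present:
                return False
    return True


attrs = ["employee", "skill", "language"]

# Option 5 from the slides: all possible pairings.
all_pairings = [
    ("Jones", "cook", "English"), ("Jones", "cook", "French"),
    ("Jones", "cook", "German"), ("Jones", "type", "English"),
    ("Jones", "type", "French"), ("Jones", "type", "German"),
]

# Option 2 from the slides: minimal number of records, with repetitions.
minimal = [
    ("Jones", "cook", "English"),
    ("Jones", "type", "French"),
    ("Jones", "type", "German"),
]

print(mvd_holds(all_pairings, attrs, ["employee"], ["skill"]))  # True
print(mvd_holds(minimal, attrs, ["employee"], ["skill"]))       # False
```

Option 2 fails because, for example, the row (Jones, cook, French) is missing.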

4NF

• Split into two tables

employee  skill
Jones     cook
Jones     type

employee  language
Jones     English
Jones     French
Jones     German

5NF

• A table is in fifth normal form if
  – it is in fourth normal form and
  – every join dependency is implied by the candidate keys

Join Dependency

• A table T has a join dependency if T can always be recreated by joining multiple tables, each having a subset of the attributes of T

5NF

• Suppose an agent sells specific products for certain companies
  – e.g., agent Jones sells Ford cars and Toyota trucks but not Ford trucks or Toyota cars
• We would need a table with all three attributes

agent  company  product
Jones  Ford     car
Jones  Toyota   truck

5NF

• But suppose instead that if an agent sells a product then he sells that product for every company he represents
  – under this rule, the table below is not in 5NF

agent  company  product
Jones  Ford     car
Jones  Toyota   truck
Jones  Ford     truck
Jones  Toyota   car

5NF

• In 5NF, this would be represented with three tables

agent  company
Jones  Ford
Jones  Toyota

agent  product
Jones  car
Jones  truck

company  product
Ford     car
Ford     truck
Toyota   car
Toyota   truck
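
The join dependency can be illustrated by rebuilding the three-attribute table from the three smaller ones. A short Python sketch, assuming each table is a set of tuples (the variable names are illustrative):

```python
agent_company = {("Jones", "Ford"), ("Jones", "Toyota")}
agent_product = {("Jones", "car"), ("Jones", "truck")}
company_product = {
    ("Ford", "car"), ("Ford", "truck"),
    ("Toyota", "car"), ("Toyota", "truck"),
}

# Three-way join: keep (agent, company, product) only when all three
# pairwise facts are present in the corresponding tables.
joined = {
    (agent, company, product)
    for agent, company in agent_company
    for agent2, product in agent_product
    if agent == agent2 and (company, product) in company_product
}

for row in sorted(joined):
    print(*row)
```

The join yields the four rows of the single-table version, so the three tables carry the same information without repeating each fact.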

More on Normalization

• "A Simple Guide to Five Normal Forms in Relational Database Theory"
  – by William Kent
    • http://www.bkent.net/Doc/simple5.htm