Data Normalization

Normal is not something to aspire to, it's something to get away from.
~ Jodie Foster ~


Activity: Sample User Report

FudgeCo Livery Driver Permissions Report

Driver  Driver        Driver   Driver                Vehicle    Vehicle   Vehicle      Vehicle  Vehicle  Permission
ID #    Name          Chg/Hr   Territories           Lic Plate  Make      Model        Size     Chg/Hr   Exp. Date
101     Bill Melator  $100.00  West, North, Central  PPF673     Cadillac  Escalade     M        $100.00  12/31/2004
                                                     PXK3D7T    Chevy     Tahoe        L        $120.00  12/31/2004
                                                     445GH2     Lincoln   Towncar      S        $80.00   1/30/2005
                                                     59DLLK     Lincoln   Continental  S        $80.00   4/30/2005
102     Willie Work   $75.00   South, East           PXK3D7T    Chevy     Tahoe        L        $120.00  1/15/2005
                                                     663ETMP    Chevy     Suburban     L        $120.00  4/1/2005
103     Sal Ladd      $75.00   Central, East, North  667GM8     Audi      A8           M        $100.00  9/1/2005
                                                     445GH2     Lincoln   Towncar      S        $80.00   9/1/2005
                                                     59DLLK     Lincoln   Continental  S        $80.00   12/31/2004
104     Carol Ling    $100.00  Central, West         667GM8     Audi      A8           M        $100.00  7/31/2004
                                                     PPF673     Cadillac  Escalade     M        $100.00  7/31/2004
                                                     59DLLK     Lincoln   Continental  S        $80.00   10/1/2004

Can you build the underlying data model from this?

How many tables? What are the relationships?

Understanding Functional Dependence

For attributes A and B, "B is functionally dependent on A" means each value in column A determines one and only one value in column B.

Written: A → B
Read: A determines B
A is the determinant
Ex: SSN → Name (Name is functionally dependent on SSN)

SSN          Name
123-45-6789  George Foreman
123-46-9987  Georgeina Forman
123-02-0902  George Foreman
123-02-0993  George Foreman
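The definition above can be tested against sample rows. The sketch below is only an illustration: the helper name `holds` is made up for this example, and passing the check shows that the sample data does not contradict the dependency, not that the dependency is a business rule.

```python
def holds(rows, determinant, dependent):
    """Return True if no two rows share a determinant value but differ in the dependent."""
    seen = {}
    for row in rows:
        key = row[determinant]
        value = row[dependent]
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, value) != value:
            return False
    return True

# The SSN / Name rows from the slide.
people = [
    {"SSN": "123-45-6789", "Name": "George Foreman"},
    {"SSN": "123-46-9987", "Name": "Georgeina Forman"},
    {"SSN": "123-02-0902", "Name": "George Foreman"},
    {"SSN": "123-02-0993", "Name": "George Foreman"},
]

ssn_determines_name = holds(people, "SSN", "Name")
name_determines_ssn = holds(people, "Name", "SSN")
```

With these rows SSN → Name holds, while Name → SSN does not, because three different SSNs share the name George Foreman.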

Normalization Lingo

Prime Attribute = Any attribute which is the primary key (PK) or, in the case of a composite key, part of the PK.

Non-Prime Attribute = Any attribute which is not part of the PK.

Key Attribute = Prime Attribute
Non-Key Attribute = Non-Prime Attribute

Normalization and FD

Technically, normalization is just the analysis of the Functional Dependencies of all columns with respect to the primary key.

There are three “levels” of analysis:
1. Functional Dependence – any non-prime attributes which are FD on the PK.
2. Partial Functional Dependence – any non-key attributes which are FD on part of the PK.
3. Transitive Functional Dependence – any non-key attributes which are FD on some other non-key attribute(s).
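The three levels can be sketched as a small classifier that compares each determinant with the primary key. Everything here is an assumption made for illustration: the function name `classify`, the choice of Driver ID plus Vehicle Lic Plate as the PK, and the dependency list, which is one plausible reading of the FudgeCo Livery report rather than an official answer.

```python
def classify(primary_key, fds):
    """Label each dependent column as 'full', 'partial' or 'transitive'.

    fds maps a determinant (tuple of column names) to the columns it determines.
    """
    pk = set(primary_key)
    labels = {}
    for determinant, dependents in fds.items():
        det = set(determinant)
        if det == pk:
            kind = "full"          # FD on the entire PK
        elif det < pk:
            kind = "partial"       # FD on only part of the PK
        elif det.isdisjoint(pk):
            kind = "transitive"    # FD on some other non-key attribute(s)
        else:
            kind = "mixed"         # determinant mixes key and non-key columns
        for column in dependents:
            labels[column] = kind
    return labels

# Assumed PK and dependencies (hypothetical reading of the report).
PK = ("Driver ID", "Vehicle Lic Plate")
ASSUMED_FDS = {
    ("Driver ID", "Vehicle Lic Plate"): ["Permission Exp. Date"],
    ("Driver ID",): ["Driver Name", "Driver Chg/Hr"],
    ("Vehicle Lic Plate",): ["Vehicle Make", "Vehicle Model", "Vehicle Size"],
    ("Vehicle Size",): ["Vehicle Chg/Hr"],
}

labels = classify(PK, ASSUMED_FDS)
```

Under these assumptions the expiry date is fully dependent, the driver and vehicle descriptions are partially dependent, and the vehicle charge is transitively dependent through Vehicle Size.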

Activity: IYCDTYCN!

Identify the:
  Primary Key?
  Prime Attributes?
  Non-Prime Attributes?

FudgeCo Livery Driver Permissions Report

Driver  Driver        Driver   Driver                Vehicle    Vehicle   Vehicle      Vehicle  Vehicle  Permission
ID #    Name          Chg/Hr   Territories           Lic Plate  Make      Model        Size     Chg/Hr   Exp. Date
101     Bill Melator  $100.00  West, North, Central  PPF673     Cadillac  Escalade     M        $100.00  12/31/2004
                                                     PXK3D7T    Chevy     Tahoe        L        $120.00  12/31/2004
                                                     445GH2     Lincoln   Towncar      S        $80.00   1/30/2005
                                                     59DLLK     Lincoln   Continental  S        $80.00   4/30/2005
102     Willie Work   $75.00   South, East           PXK3D7T    Chevy     Tahoe        L        $120.00  1/15/2005
                                                     663ETMP    Chevy     Suburban     L        $120.00  4/1/2005
103     Sal Ladd      $75.00   Central, East, North  667GM8     Audi      A8           M        $100.00  9/1/2005
                                                     445GH2     Lincoln   Towncar      S        $80.00   9/1/2005
                                                     59DLLK     Lincoln   Continental  S        $80.00   12/31/2004
104     Carol Ling    $100.00  Central, West         667GM8     Audi      A8           M        $100.00  7/31/2004
                                                     PPF673     Cadillac  Escalade     M        $100.00  7/31/2004
                                                     59DLLK     Lincoln   Continental  S        $80.00   10/1/2004

Identify the:
  Functional Dependencies (WRT the PK)
  Partial Functional Dependencies (WRT part of the PK)
  Transitive Functional Dependencies (WRT some non-prime attribute)

The Dependency Diagram

The Dependency Diagram is a Very Useful Tool. It depicts the dependencies which exist among the attributes.

[Figure: dependency diagram over the report attributes (Driver ID #, Driver Name, Driver Chg/Hr, Driver Territories, Vehicle Lic Plate, Vehicle Make, Vehicle Model, Vehicle Size, Vehicle Chg/Hr, Permission Exp. Date), with arrows labeled Primary Key, Not FD (Multi-Valued), Partial Dependency (twice), and Transitive Dependency.]

Normal Forms

A Normal Form represents the current “state” of the data model.

There are 4 basic normal forms:

Zero Normal Form (0NF)
  Non-key attributes exist which are not FD on PK.

First Normal Form (1NF)
  All non-key attributes FD on entire PK.

Second Normal Form (2NF)
  In 1NF and no partial functional dependencies exist.

Third Normal Form (3NF)
  In 2NF and no transitive functional dependencies exist.

First Normal Form (1NF)

Definition: All non-key attributes must be FD on the entire PK. (There must be PKFD for all attributes.)

Rule: Move each non-FD column into its own new table.

How to Apply the Rule:
For each non-FD column:
1. Place the non-FD column into a new table.
2. Copy the PK (or part of it) from the original table into the new table. This will be a FK in the new table.
3. Assign a PK to the new table (typically a composite key of the original non-FD column and the FK).

1NF: Example 1/2

What’s wrong with this data model?
What should the PK be? Why?
Is there an attribute not FD on the PK?
Is it in 1NF already?
What if Erin takes up bass fishing?
I’m planning a ski trip, whom should I contact? (How do I know to look in Hobby3 for skiing, and not Hobby1?)

My Friends & Their Hobbies

FID  Email              Name                Hobby1      Hobby2  Hobby3  Hobby4
101  [email protected]  Seymour Ofu         Basketball  Golf    Skiing  Hiking
102  [email protected]  Isabelle Gunnering  Golf        Skiing
103  [email protected]  Pete Moss           Skiing
104  [email protected]  Erin Dutyres        Basketball
105  [email protected]  Chuck Itupp         Basketball  Golf

1NF: Example 2/2

What was done:
  Hobbies table created. Contains the originally non-FD column, “Hobby”.
  The PK (FID) was copied into the Hobbies table.
  The PK of the Hobbies table is the combination of FID and Hobby.

Questions:
  Is this in 1NF?
  Can you reproduce the previous data model from this one?
  Who likes skiing? Basketball?

Friends

FID  Email              Name
101  [email protected]  Seymour Ofu
102  [email protected]  Isabelle Gunnering
103  [email protected]  Pete Moss
104  [email protected]  Erin Dutyres
105  [email protected]  Chuck Itupp

Hobbies

FID  Hobby
101  Basketball
101  Golf
101  Skiing
101  Hiking
102  Golf
102  Skiing
103  Skiing
104  Basketball
105  Basketball
105  Golf
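As a rough sketch of the 1NF rule on this example, the repeating Hobby1 to Hobby4 columns can be moved into a Hobbies table keyed by (FID, Hobby). Plain Python tuples and sets stand in for database tables here, the Email column is left out for brevity, and the helper name `who_likes` is made up for the illustration.

```python
# Wide rows: (FID, Name, Hobby1, Hobby2, Hobby3, Hobby4); None marks an empty slot.
wide = [
    (101, "Seymour Ofu", "Basketball", "Golf", "Skiing", "Hiking"),
    (102, "Isabelle Gunnering", "Golf", "Skiing", None, None),
    (103, "Pete Moss", "Skiing", None, None, None),
    (104, "Erin Dutyres", "Basketball", None, None, None),
    (105, "Chuck Itupp", "Basketball", "Golf", None, None),
]

# Friends keeps the columns that are FD on FID.
friends = {fid: name for fid, name, *_ in wide}

# Hobbies gets one row per (FID, Hobby); FID is the FK back to Friends.
hobbies = {(fid, hobby) for fid, _, *slots in wide for hobby in slots if hobby}

def who_likes(hobby):
    """Names of friends with the given hobby, ordered by FID."""
    return [friends[fid] for fid, h in sorted(hobbies) if h == hobby]

skiers = who_likes("Skiing")

# "What if Erin takes up bass fishing?" is now one new row, not a new column.
hobbies.add((104, "Bass Fishing"))
```

The ski-trip question no longer depends on which Hobby column a value happened to land in: one lookup on the Hobby column finds every skier.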

Second Normal Form (2NF)

Definition: The data model must be in 1NF AND no partial functional dependencies can exist.

Rule: Move each partially FD non-key column into its own new table.

How to Apply the Rule:
For each partial dependency:
1. Move all partially FD columns into a new table.
2. Copy the determinant into the new table.
3. Make the determinant of the partial dependency the PK for the new table and a FK in the existing table.

2NF: Example 1/2

What’s wrong with this data model?
What should the PK be? Why?
Do any partial dependencies exist? Where?
What is the determinant for each, if any?
Is it in 1NF already? 2NF?
I made a mistake, 81HLV3 is a Poweredge 5500, not a 4400?

Software Installed on IST Servers

Serial Num  Server Name  Make  Model           SWID  SW Title              Date Installed
VNK334      www.ist      HP    Netserver LH4   101   Windows 2000 Server   8/1/2003
VNK334      www.ist      HP    Netserver LH4   201   MS SQL Server 2000    8/1/2003
VNK334      www.ist      HP    Netserver LH4   302   Seagate BackupExec 7  10/1/2003
ASD44P      iststudents  Dell  Poweredge 2550  101   Windows 2000 Server   4/15/2002
ASD44P      iststudents  Dell  Poweredge 2550  202   MS SQL Server 2000    4/17/2003
81HLV3      istwebct     Dell  Poweredge 4400  111   Red Hat Linux 7.3     12/4/2003
81HLV3      istwebct     Dell  Poweredge 4400  301   Webct 4.1             12/12/2003

2NF: Example 2/2

What was done:
  Serial Num + SWID is the primary key.
  Servers and Software tables created from the partial dependencies, where Serial Num and SWID are the determinants.
  Serial Num is the PK for Servers, SWID is the PK for Software; each is also a FK in the SWInstallation table.

Questions:
  Is this in 2NF?
  Can you reproduce the previous data model from this one?

Servers

Serial Num  Server Name  Make  Model
VNK334      www.ist      HP    Netserver LH4
ASD44P      iststudents  Dell  Poweredge 2550
81HLV3      istwebct     Dell  Poweredge 4400

Software

SWID  SW Title
101   Windows 2000 Server
201   MS SQL Server 2000
302   Seagate BackupExec 7
111   Red Hat Linux 7.3
301   Webct 4.1

SWInstallation

Serial Num  SWID  Date Installed
VNK334      101   8/1/2003
VNK334      201   8/1/2003
VNK334      302   10/1/2003
ASD44P      101   4/15/2002
ASD44P      202   4/17/2003
81HLV3      111   12/4/2003
81HLV3      301   12/12/2003
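One way to answer "can you reproduce the previous data model?" is to split the flat report along its partial dependencies and then join the pieces back together. This is a minimal sketch using Python dicts and lists in place of tables; the rows are copied from the flat report as given, and the function name `rejoin` is an assumption made for the example.

```python
# Flat rows: (Serial Num, Server Name, Make, Model, SWID, SW Title, Date Installed)
flat = [
    ("VNK334", "www.ist", "HP", "Netserver LH4", 101, "Windows 2000 Server", "8/1/2003"),
    ("VNK334", "www.ist", "HP", "Netserver LH4", 201, "MS SQL Server 2000", "8/1/2003"),
    ("VNK334", "www.ist", "HP", "Netserver LH4", 302, "Seagate BackupExec 7", "10/1/2003"),
    ("ASD44P", "iststudents", "Dell", "Poweredge 2550", 101, "Windows 2000 Server", "4/15/2002"),
    ("ASD44P", "iststudents", "Dell", "Poweredge 2550", 202, "MS SQL Server 2000", "4/17/2003"),
    ("81HLV3", "istwebct", "Dell", "Poweredge 4400", 111, "Red Hat Linux 7.3", "12/4/2003"),
    ("81HLV3", "istwebct", "Dell", "Poweredge 4400", 301, "Webct 4.1", "12/12/2003"),
]

# Each partial dependency becomes a table keyed by its determinant.
servers = {r[0]: r[1:4] for r in flat}          # Serial Num -> (Server Name, Make, Model)
software = {r[4]: r[5] for r in flat}           # SWID -> SW Title
installs = [(r[0], r[4], r[6]) for r in flat]   # PK is (Serial Num, SWID)

def rejoin():
    """Rebuild the flat report by following both foreign keys."""
    return [(sn, *servers[sn], swid, software[swid], date)
            for sn, swid, date in installs]

lossless = rejoin() == flat

# The model correction for 81HLV3 now touches one Servers row
# instead of every installation row for that server.
name, make, _ = servers["81HLV3"]
servers["81HLV3"] = (name, make, "Poweredge 5500")
```

Because the model is stored once per server, the two 81HLV3 rows in the rebuilt report cannot disagree with each other after the correction.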

Third Normal Form (3NF)

Definition: The data model must be in 2NF AND no transitive functional dependencies can exist.

Rule: Move each transitive FD non-key column into its own new table.

How to Apply the Rule:
For each transitive dependency:
1. Move all transitive FD columns into a new table.
2. Copy the determinant column into the new table.
3. Make the determinant of the transitive dependency the PK for the new table and a FK in the original table.

3NF: Example 1/2

What’s wrong with this data model?
What should the PK be? Why?
Do any transitive dependencies exist? Where?
What is the determinant for each, if any?
Is it in 1NF already? 2NF? 3NF?
I made a mistake, Koors phone number is 4905? What’s wrong?

Fudgebar - Beer Inventory / Distribution Report

Beer ID  Beer Name    Keg Qty  Distrib ID  Distributor    Dist. Phone
101      Mikealobe    3        501         Anhoser-Busch  555-4901
105      Dudweiser    0        501         Anhoser-Busch  555-4901
102      Meisterchau  2        601         Millor         555-6691
106      Mil's Beast  4        601         Millor         555-6691
107      Koors        3        701         Koors          555-4904
108      Koors Lite   1        701         Koors          555-4904

3NF: Example 2/2

What was done:
  Beer ID is the PK.
  All transitive dependencies moved into a new table, Distributors.
  Distrib ID is the determinant. It is the PK of the Distributors table and a FK in the original Beers table.

Questions:
  Is this in 3NF?
  Can you reproduce the previous data model from this one?

Beers

Beer ID  Beer Name    Keg Qty  Distrib ID
101      Mikealobe    3        501
105      Dudweiser    0        501
102      Meisterchau  2        601
106      Mil's Beast  4        601
107      Koors        3        701
108      Koors Lite   1        701

Distributors

Distrib ID  Distributor    Dist. Phone
501         Anhoser-Busch  555-4901
601         Millor         555-6691
701         Koors          555-4904
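The Koors phone number question can be sketched in code: once the transitive dependency lives in its own table, the correction is a single update. This is an illustration only, with Python tuples and a dict standing in for the Beers and Distributors tables; the function name `rebuild` is an assumption.

```python
# Flat rows: (Beer ID, Beer Name, Keg Qty, Distrib ID, Distributor, Dist. Phone)
report = [
    (101, "Mikealobe", 3, 501, "Anhoser-Busch", "555-4901"),
    (105, "Dudweiser", 0, 501, "Anhoser-Busch", "555-4901"),
    (102, "Meisterchau", 2, 601, "Millor", "555-6691"),
    (106, "Mil's Beast", 4, 601, "Millor", "555-6691"),
    (107, "Koors", 3, 701, "Koors", "555-4904"),
    (108, "Koors Lite", 1, 701, "Koors", "555-4904"),
]

beers = [row[:4] for row in report]                  # PK: Beer ID, FK: Distrib ID
distributors = {row[3]: row[4:] for row in report}   # PK: Distrib ID

# One assignment fixes the number for every Koors beer.
name, _ = distributors[701]
distributors[701] = (name, "555-4905")

def rebuild():
    """Recreate the flat report by joining Beers to Distributors on Distrib ID."""
    return [beer + distributors[beer[3]] for beer in beers]

koors_phones = {row[5] for row in rebuild() if row[3] == 701}
```

In the flat report the same fix would need two row updates, and missing one would leave the two Koors beers showing different phone numbers.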


Higher Normal Forms

Yes, there IS more…

… and it will blow your mind.

Boyce-Codd Normal Form (BCNF)

Rule: Eliminate key-transitive dependencies.

A table in BCNF means:
  The table is in 3NF.
  It includes no non-key attribute which determines a key attribute, or part of a key attribute.
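A BCNF check can be sketched with the common textbook formulation: every non-trivial dependency must have a determinant that is a superkey. That formulation is swapped in here because it is easy to compute with an attribute closure, and it catches the case described above. The Student / Course / Instructor table and both function names are hypothetical, chosen only to illustrate the idea.

```python
def closure(attrs, fds):
    """All attributes determined by attrs, given fds as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Non-trivial FDs whose determinant does not determine every attribute."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(all_attrs)]

# Hypothetical table: the key is (Student, Course), but the non-key attribute
# Instructor determines Course, which is part of the key.
attrs = {"Student", "Course", "Instructor"}
fds = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]

violations = bcnf_violations(attrs, fds)
```

Here the only violation reported is Instructor → Course, which is exactly a non-key attribute determining part of the key.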


BCNF: An Example

Fourth Normal Form (4NF)

Rule: Eliminate multiple sets of multi-valued dependencies.

A table in 4NF means:
  The table is in 3NF.
  It includes no sets of attributes which contain multi-valued dependencies.
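To make the multi-valued dependency idea concrete, here is a small sketch with made-up data (the Employee / Skill / Language table is hypothetical and not taken from these slides). When two multi-valued facts about the same key are independent of each other, the table can be split in two and rebuilt by joining on the key without gaining or losing rows.

```python
from itertools import product

# Hypothetical rows: (Employee, Skill, Language). Skills and languages are
# independent facts, so every skill is paired with every language.
rows = {
    ("Ann", "SQL", "English"),
    ("Ann", "SQL", "French"),
    ("Ann", "Python", "English"),
    ("Ann", "Python", "French"),
    ("Bob", "SQL", "English"),
}

# One table per multi-valued fact.
skills = {(e, s) for e, s, _ in rows}
languages = {(e, l) for e, _, l in rows}

# Join the two tables back together on Employee.
rejoined = {(e, s, l)
            for (e, s), (e2, l) in product(skills, languages)
            if e == e2}

independent = rejoined == rows
```

The five original rows shrink to three skill rows and three language rows, and adding a new language for Ann becomes one insert instead of one insert per skill.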

4NF: An Example

Figure 4.14: Multivalued Dependencies

Figure 4.15: Set of Tables in 4NF

How “far” should one Normalize?

For relational databases:
  1NF is required, at minimum, for practical RDBMS implementations.
  The majority of the time data models are normalized to 3NF.
  Sometimes certain tables are left in 1NF or 2NF, for performance or practical reasons.
  Higher normal forms (BCNF, 4NF) are rare.

In general, the higher the NF of your DM:
  The more complicated the internal DM.
  The more “programming” required to reproduce the external DM.
  But, the lesser the chance for data anomalies!!

It’s a total trade-off: Database complexity vs. data anomalies.

Mike’s “Road To 3NF”

To normalize correctly, follow this process for each table in the data model:

1. Designate a candidate key.
2. PKFD for all attributes? If no, apply the 1NF Rule. If yes, the table is in 1NF.
3. Any partial dependencies? If yes, apply the 2NF Rule. If no, the table is in 2NF.
4. Any transitive dependencies? If yes, apply the 3NF Rule. If no, the table is in 3NF.
5. Party Hard!
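The process above can be sketched as a function that asks the three questions in order and reports the next rule to apply. The function name `next_step`, its parameters, and the way dependencies are passed in are all assumptions made for this illustration; the sample inputs come from the earlier hobbies, servers, and beers examples.

```python
def next_step(primary_key, fds, not_fd=()):
    """Return the next rule to apply for one table, or '3NF' when done.

    fds maps each determinant (tuple of columns) to the columns it determines;
    not_fd lists columns that are not FD on the PK (for example multi-valued ones).
    """
    pk = set(primary_key)
    if not_fd:                                        # PKFD for all attributes? No.
        return "Apply 1NF Rule"
    if any(set(det) < pk for det in fds):             # Any partial dependencies?
        return "Apply 2NF Rule"
    if any(set(det).isdisjoint(pk) for det in fds):   # Any transitive dependencies?
        return "Apply 3NF Rule"
    return "3NF"

friends = next_step(("FID",), {("FID",): ["Email", "Name"]}, not_fd=("Hobby",))

servers = next_step(
    ("Serial Num", "SWID"),
    {
        ("Serial Num", "SWID"): ["Date Installed"],
        ("Serial Num",): ["Server Name", "Make", "Model"],
        ("SWID",): ["SW Title"],
    },
)

beers = next_step(
    ("Beer ID",),
    {
        ("Beer ID",): ["Beer Name", "Keg Qty", "Distrib ID"],
        ("Distrib ID",): ["Distributor", "Dist. Phone"],
    },
)

beers_after = next_step(("Beer ID",), {("Beer ID",): ["Beer Name", "Keg Qty", "Distrib ID"]})
```

Each call handles a single table, so after applying a rule the function would be run again on every table that results.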

Normalization Summary Cheat Sheet

0NF → 1NF (Resolve non FD)

1NF → 2NF (Resolve Partial FD)

2NF → 3NF (Resolve Transitive FD)

[Figure: table diagrams for each step, with boxes labeled O, N, N1, and N2.]


Data Normalization

Questions?