Data Normalization
Normal is not something to aspire to, it's something to get away from.
~ Jodie Foster ~
Activity: Sample User Report
FudgeCo Livery Driver Permissions Report

Driver  Driver        Driver   Driver                Vehicle    Vehicle   Vehicle      Vehicle  Vehicle  Permission
ID #    Name          Chg/Hr   Territories           Lic Plate  Make      Model        Size     Chg/Hr   Exp. Date
101     Bill Melator  $100.00  West, North, Central  PPF673     Cadillac  Escalade     M        $100.00  12/31/2004
                                                     PXK3D7T    Chevy     Tahoe        L        $120.00  12/31/2004
                                                     445GH2     Lincoln   Towncar      S        $80.00   1/30/2005
                                                     59DLLK     Lincoln   Continental  S        $80.00   4/30/2005
102     Willie Work   $75.00   South, East           PXK3D7T    Chevy     Tahoe        L        $120.00  1/15/2005
                                                     663ETMP    Chevy     Suburban     L        $120.00  4/1/2005
103     Sal Ladd      $75.00   Central, East, North  667GM8     Audi      A8           M        $100.00  9/1/2005
                                                     445GH2     Lincoln   Towncar      S        $80.00   9/1/2005
                                                     59DLLK     Lincoln   Continental  S        $80.00   12/31/2004
104     Carol Ling    $100.00  Central, West         667GM8     Audi      A8           M        $100.00  7/31/2004
                                                     PPF673     Cadillac  Escalade     M        $100.00  7/31/2004
                                                     59DLLK     Lincoln   Continental  S        $80.00   10/1/2004
Can you build the underlying data model from this?
How many tables? What are the relationships?
Understanding Functional Dependence
For attributes A and B, “B is functionally dependent on A” means each value in column A determines one and only one value in column B.
Written: A → B
Read: A determines B. A is the determinant.
Ex: SSN → Name (Name is functionally dependent on SSN)

SSN          Name
123-45-6789  George Foreman
123-46-9987  Georgeina Forman
123-02-0902  George Foreman
123-02-0993  George Foreman
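A suspected dependency can be tested against sample rows with a few lines of Python. This is a minimal sketch: the function name `is_functionally_dependent` and the list-of-dicts row layout are illustrative assumptions, and the data is the SSN/Name sample from this slide. Keep in mind that sample data can only disprove a dependency; it cannot prove that the dependency holds for every future row.

```python
def is_functionally_dependent(rows, determinant, dependent):
    """Return True if every determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        key = row[determinant]
        value = row[dependent]
        # setdefault stores the first value seen for this key and returns it;
        # any later, different value breaks the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


people = [
    {"SSN": "123-45-6789", "Name": "George Foreman"},
    {"SSN": "123-46-9987", "Name": "Georgeina Forman"},
    {"SSN": "123-02-0902", "Name": "George Foreman"},
    {"SSN": "123-02-0993", "Name": "George Foreman"},
]

print(is_functionally_dependent(people, "SSN", "Name"))  # True
print(is_functionally_dependent(people, "Name", "SSN"))  # False
```

SSN determines Name, but Name does not determine SSN, because the name George Foreman appears with three different SSNs.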
Normalization Lingo
Prime Attribute = Any attribute which is the primary key or, in the case of a composite key, is part of the PK.
Non-Prime Attribute = Any attribute which is not part of the PK.
Key Attribute = Prime Attribute
Non-Key Attribute = Non-Prime Attribute
Normalization and FD
Technically, normalization is just the analysis of the functional dependencies of all columns with respect to the primary key.
There are three “levels” of analysis:
1. Functional Dependence: any non-prime attributes which are FD on the PK.
2. Partial Functional Dependence: any non-key attributes which are FD on part of the PK.
3. Transitive Functional Dependence: any non-key attributes which are FD on some other non-key attribute(s).
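The three levels can be told apart by comparing each determinant with the primary key. The sketch below assumes the dependencies are already known and declared by hand; the function name `classify_dependencies` and the order-line table (OrderID, ProductID, and so on) are hypothetical and do not come from the slides.

```python
def classify_dependencies(primary_key, dependencies):
    """Label each declared dependency relative to the primary key.

    dependencies: list of (determinant_columns, dependent_column) pairs.
    """
    pk = frozenset(primary_key)
    labels = {}
    for determinant, dependent in dependencies:
        det = frozenset(determinant)
        if det == pk:
            labels[dependent] = "functional (whole PK)"
        elif det < pk:  # proper subset of a composite key
            labels[dependent] = "partial"
        elif det.isdisjoint(pk):  # determinant is made of non-key columns
            labels[dependent] = "transitive"
        else:
            labels[dependent] = "other"
    return labels


# Hypothetical order-line table with the composite PK (OrderID, ProductID).
labels = classify_dependencies(
    ("OrderID", "ProductID"),
    [
        (("OrderID", "ProductID"), "Quantity"),
        (("ProductID",), "ProductName"),
        (("OrderID",), "CustomerID"),
        (("CustomerID",), "CustomerName"),
    ],
)
for column, label in labels.items():
    print(column, "->", label)
```

Quantity depends on the whole key, ProductName and CustomerID each depend on one part of it, and CustomerName depends on the non-key column CustomerID.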
Activity: IYCDTYCN!
Identify the:
  Primary Key?
  Prime Attributes?
  Non-Prime Attributes?
FudgeCo Livery Driver Permissions Report

Driver  Driver        Driver   Driver                Vehicle    Vehicle   Vehicle      Vehicle  Vehicle  Permission
ID #    Name          Chg/Hr   Territories           Lic Plate  Make      Model        Size     Chg/Hr   Exp. Date
101     Bill Melator  $100.00  West, North, Central  PPF673     Cadillac  Escalade     M        $100.00  12/31/2004
                                                     PXK3D7T    Chevy     Tahoe        L        $120.00  12/31/2004
                                                     445GH2     Lincoln   Towncar      S        $80.00   1/30/2005
                                                     59DLLK     Lincoln   Continental  S        $80.00   4/30/2005
102     Willie Work   $75.00   South, East           PXK3D7T    Chevy     Tahoe        L        $120.00  1/15/2005
                                                     663ETMP    Chevy     Suburban     L        $120.00  4/1/2005
103     Sal Ladd      $75.00   Central, East, North  667GM8     Audi      A8           M        $100.00  9/1/2005
                                                     445GH2     Lincoln   Towncar      S        $80.00   9/1/2005
                                                     59DLLK     Lincoln   Continental  S        $80.00   12/31/2004
104     Carol Ling    $100.00  Central, West         667GM8     Audi      A8           M        $100.00  7/31/2004
                                                     PPF673     Cadillac  Escalade     M        $100.00  7/31/2004
                                                     59DLLK     Lincoln   Continental  S        $80.00   10/1/2004
Identify the:
  Functional Dependencies (WRT the PK)
  Partial Functional Dependencies (WRT part of the PK)
  Transitive Functional Dependencies (WRT some non-prime attribute)
The Dependency Diagram
The Dependency Diagram is a Very Useful Tool. It depicts the dependencies which exist among the attributes.
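[Dependency diagram of the FudgeCo Livery Driver Permissions Report columns: Driver ID #, Driver Name, Driver Chg/Hr, Driver Territories, Vehicle Lic Plate, Vehicle Make, Vehicle Model, Vehicle Size, Vehicle Chg/Hr, Permission Exp. Date. The arrows in the diagram are labeled: Primary Key, Not FD (Multi-Valued), Partial Dependency (twice), and Transitive Dependency.]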
Normal Forms
A Normal Form represents the current “state” of the data model.
There are 4 basic normal forms:
Zero Normal Form (0NF): Non-key attributes exist which are not FD on the PK.
First Normal Form (1NF): All non-key attributes are FD on the entire PK.
Second Normal Form (2NF): In 1NF and no partial functional dependencies exist.
Third Normal Form (3NF): In 2NF and no transitive functional dependencies exist.
First Normal Form (1NF)
Definition: All non-key attributes must be FD on the entire PK. (There must be PKFD for all attributes.)
Rule: Move each non-FD column into its own new table.
How to Apply the Rule:
For each non-FD column:
1. Place the non-FD column into a new table.
2. Copy the PK (or part of it) from the original table into the new table. This will be a FK in the new table.
3. Assign a PK to the new table (typically a composite key of the original non-FD column and the FK).
1NF: Example 1/2
What’s wrong with this data model?
What should the PK be? Why?
Is there an attribute not FD on the PK?
Is it in 1NF already?
What if Erin takes up bass fishing?
I’m planning a ski trip; whom should I contact? (How do I know to look for skiing in Hobby3 and not Hobby1?)

My Friends & Their Hobbies

FID  Email              Name                Hobby1      Hobby2  Hobby3  Hobby4
101  [email protected]  Seymour Ofu         Basketball  Golf    Skiing  Hiking
102  [email protected]  Isabelle Gunnering  Golf        Skiing
103  [email protected]  Pete Moss           Skiing
104  [email protected]  Erin Dutyres        Basketball
105  [email protected]  Chuck Itupp         Basketball  Golf
1NF: Example 2/2
What was done:
  Hobbies table created. It contains the originally non-FD column, “Hobby”.
  The PK (FID) was copied into the Hobbies table.
  The PK of the Hobbies table is the combination of FID and Hobby.
Questions:
  Is this in 1NF?
  Can you reproduce the previous data model from this one?
  Who likes skiing? Basketball?
Friends

FID  Email              Name
101  [email protected]  Seymour Ofu
102  [email protected]  Isabelle Gunnering
103  [email protected]  Pete Moss
104  [email protected]  Erin Dutyres
105  [email protected]  Chuck Itupp

Hobbies

FID  Hobby
101  Basketball
101  Golf
101  Skiing
101  Hiking
102  Golf
102  Skiing
103  Skiing
104  Basketball
105  Basketball
105  Golf
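The 1NF rule applied to My Friends & Their Hobbies can be sketched in Python. The names `to_1nf` and `who_likes` are illustrative, the Email column is left out, and which hobby cells are blank is an assumption (hobbies are taken to fill Hobby1 first).

```python
# (FID, Name, Hobby1, Hobby2, Hobby3, Hobby4); None marks an empty cell.
friends_wide = [
    (101, "Seymour Ofu", "Basketball", "Golf", "Skiing", "Hiking"),
    (102, "Isabelle Gunnering", "Golf", "Skiing", None, None),
    (103, "Pete Moss", "Skiing", None, None, None),
    (104, "Erin Dutyres", "Basketball", None, None, None),
    (105, "Chuck Itupp", "Basketball", "Golf", None, None),
]


def to_1nf(wide_rows):
    """Split the repeating hobby columns into a separate (FID, Hobby) table."""
    friends, hobbies = [], []
    for fid, name, *hobby_cells in wide_rows:
        friends.append((fid, name))
        # One row per non-empty hobby; (FID, Hobby) is the composite PK.
        hobbies.extend((fid, hobby) for hobby in hobby_cells if hobby is not None)
    return friends, hobbies


def who_likes(hobbies, hobby):
    """Return the FIDs of everyone with the given hobby."""
    return sorted(fid for fid, h in hobbies if h == hobby)


friends, hobbies = to_1nf(friends_wide)
print(who_likes(hobbies, "Skiing"))      # [101, 102, 103]
print(who_likes(hobbies, "Basketball"))  # [101, 104, 105]
```

With one hobby per row, the ski-trip question becomes a single filter on one column, and a new hobby for Erin is just one more (FID, Hobby) row.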
Second Normal Form (2NF)
Definition: The data model must be in 1NF AND no partial functional dependencies can exist.
Rule: Move each partially FD non-key column into its own new table.
How to Apply the Rule:
For each partial dependency:
1. Move all partially FD columns into a new table.
2. Copy the determinant into the new table.
3. Make the determinant of the partial dependency the PK for the new table and a FK in the existing table.
2NF: Example 1/2
What’s wrong with this data model?
What should the PK be? Why?
Do any partial dependencies exist? Where?
What is the determinant for each, if any?
Is it in 1NF already? 2NF?
What if I made a mistake and 81HLV3 is a Poweredge 5500, not a 4400?

Software Installed on IST Servers

Serial Num  Server Name  Make  Model           SWID  SW Title              Date Installed
VNK334      www.ist      HP    Netserver LH4   101   Windows 2000 Server   8/1/2003
VNK334      www.ist      HP    Netserver LH4   201   MS SQL Server 2000    8/1/2003
VNK334      www.ist      HP    Netserver LH4   302   Seagate BackupExec 7  10/1/2003
ASD44P      iststudents  Dell  Poweredge 2550  101   Windows 2000 Server   4/15/2002
ASD44P      iststudents  Dell  Poweredge 2550  202   MS SQL Server 2000    4/17/2003
81HLV3      istwebct     Dell  Poweredge 4400  111   Red Hat Linux 7.3     12/4/2003
81HLV3      istwebct     Dell  Poweredge 4400  301   Webct 4.1             12/12/2003
2NF: Example 2/2
What was done:
  Serial Num + SWID is the primary key.
  Servers and Software tables were created from the partial dependencies, where Serial Num and SWID are the determinants.
  Serial Num is the PK for Servers and SWID is the PK for Software; each is also a FK in the SWInstallation table.
Questions:
  Is this in 2NF?
  Can you reproduce the previous data model from this one?
Servers

Serial Num  Server Name  Make  Model
VNK334      www.ist      HP    Netserver LH4
ASD44P      iststudents  Dell  Poweredge 2550
81HLV3      istwebct     Dell  Poweredge 4400

Software

SWID  SW Title
101   Windows 2000 Server
201   MS SQL Server 2000
302   Seagate BackupExec 7
111   Red Hat Linux 7.3
301   Webct 4.1

SWInstallation

Serial Num  SWID  Date Installed
VNK334      101   8/1/2003
VNK334      201   8/1/2003
VNK334      302   10/1/2003
ASD44P      101   4/15/2002
ASD44P      202   4/17/2003
81HLV3      111   12/4/2003
81HLV3      301   12/12/2003
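The 2NF split and the “can you reproduce it?” question can both be sketched in Python. The sketch uses five rows from the Software Installed on IST Servers report (the VNK334 and 81HLV3 servers); the function names `to_2nf` and `rejoin` and the tuple layout are illustrative assumptions.

```python
# (Serial Num, Server Name, Make, Model, SWID, SW Title, Date Installed)
installs_flat = [
    ("VNK334", "www.ist", "HP", "Netserver LH4", 101, "Windows 2000 Server", "8/1/2003"),
    ("VNK334", "www.ist", "HP", "Netserver LH4", 201, "MS SQL Server 2000", "8/1/2003"),
    ("VNK334", "www.ist", "HP", "Netserver LH4", 302, "Seagate BackupExec 7", "10/1/2003"),
    ("81HLV3", "istwebct", "Dell", "Poweredge 4400", 111, "Red Hat Linux 7.3", "12/4/2003"),
    ("81HLV3", "istwebct", "Dell", "Poweredge 4400", 301, "Webct 4.1", "12/12/2003"),
]


def to_2nf(flat_rows):
    """Move each partial dependency into a table keyed by its determinant."""
    servers, software, installations = {}, {}, []
    for serial, name, make, model, swid, title, installed in flat_rows:
        servers[serial] = (name, make, model)  # Serial Num -> server columns
        software[swid] = title                 # SWID -> SW Title
        installations.append((serial, swid, installed))
    return servers, software, installations


def rejoin(servers, software, installations):
    """Rebuild the flat report by following both foreign keys."""
    return [
        (serial, *servers[serial], swid, software[swid], installed)
        for serial, swid, installed in installations
    ]


servers, software, installations = to_2nf(installs_flat)
print(rejoin(servers, software, installations) == installs_flat)  # True

# Correcting the model now means changing one Servers row, not two report rows.
servers["81HLV3"] = ("istwebct", "Dell", "Poweredge 5500")
fixed = rejoin(servers, software, installations)
```

After the single correction, both 81HLV3 rows in the rebuilt report show the Poweredge 5500.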
Third Normal Form (3NF)
Definition: The data model must be in 2NF AND no transitive functional dependencies can exist.
Rule: Move each transitive FD non-key column into its own new table.
How to Apply the Rule:
For each transitive dependency:
1. Move all transitive FD columns into a new table.
2. Copy the determinant column into the new table.
3. Make the determinant of the transitive dependency the PK for the new table and the FK in the original table.
3NF: Example 1/2
What’s wrong with this data model?
What should the PK be? Why?
Do any transitive dependencies exist? Where?
What is the determinant for each, if any?
Is it in 1NF already? 2NF? 3NF?
What if I made a mistake and the Koors phone number is 4905? What’s wrong?

Fudgebar - Beer Inventory / Distribution Report

Beer ID  Beer Name    Keg Qty  Distrib ID  Distributor    Dist. Phone
101      Mikealobe    3        501         Anhoser-Busch  555-4901
105      Dudweiser    0        501         Anhoser-Busch  555-4901
102      Meisterchau  2        601         Millor         555-6691
106      Mil's Beast  4        601         Millor         555-6691
107      Koors        3        701         Koors          555-4904
108      Koors Lite   1        701         Koors          555-4904
3NF: Example 2/2
What was done:
  Beer ID is the PK.
  All transitive dependencies were moved into a new table, Distributors.
  Distrib ID is the determinant. It is the PK of the Distributors table and a FK in the original Beer table.
Questions:
  Is this in 3NF?
  Can you reproduce the previous data model from this one?
Beers

Beer ID  Beer Name    Keg Qty  Distrib ID
101      Mikealobe    3        501
105      Dudweiser    0        501
102      Meisterchau  2        601
106      Mil's Beast  4        601
107      Koors        3        701
108      Koors Lite   1        701

Distributors

Distrib ID  Distributor    Dist. Phone
501         Anhoser-Busch  555-4901
601         Millor         555-6691
701         Koors          555-4904
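The 3NF version of the beer data can be tried out with Python's built-in `sqlite3` module. This is a sketch under a few assumptions: the column names are written without spaces (BeerID, DistribID, DistPhone), and the corrected Koors number is taken to be 555-4905.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Distributors (
        DistribID   INTEGER PRIMARY KEY,
        Distributor TEXT NOT NULL,
        DistPhone   TEXT NOT NULL
    );
    CREATE TABLE Beers (
        BeerID    INTEGER PRIMARY KEY,
        BeerName  TEXT NOT NULL,
        KegQty    INTEGER NOT NULL,
        DistribID INTEGER NOT NULL REFERENCES Distributors(DistribID)
    );
""")
conn.executemany("INSERT INTO Distributors VALUES (?, ?, ?)", [
    (501, "Anhoser-Busch", "555-4901"),
    (601, "Millor", "555-6691"),
    (701, "Koors", "555-4904"),
])
conn.executemany("INSERT INTO Beers VALUES (?, ?, ?, ?)", [
    (101, "Mikealobe", 3, 501), (105, "Dudweiser", 0, 501),
    (102, "Meisterchau", 2, 601), (106, "Mil's Beast", 4, 601),
    (107, "Koors", 3, 701), (108, "Koors Lite", 1, 701),
])

# The phone correction touches exactly one row in Distributors.
cur = conn.execute(
    "UPDATE Distributors SET DistPhone = '555-4905' WHERE DistribID = 701"
)
updated_rows = cur.rowcount

# A join reproduces the original report layout.
report = conn.execute("""
    SELECT b.BeerID, b.BeerName, b.KegQty, d.DistribID, d.Distributor, d.DistPhone
    FROM Beers AS b
    JOIN Distributors AS d ON d.DistribID = b.DistribID
    ORDER BY b.BeerID
""").fetchall()

for row in report:
    print(row)
```

In the single-table report the same correction would have to be made on both Koors rows, and missing one of them would leave two different phone numbers for the same distributor.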
Higher Normal Forms
Yes, there IS more…
… and it will blow your mind.
Boyce-Codd Normal Form (BCNF)
Rule: Eliminate key-transitive dependencies.
A table in BCNF means:
  The table is in 3NF.
  It includes no non-key attribute which determines a key attribute, or part of a key attribute.
BCNF: An Example
Fourth Normal Form (4NF)
Rule: Eliminate multiple sets of multi-valued dependencies.
A table in 4NF means:
  The table is in 3NF.
  It includes no sets of attributes which contain multi-valued dependencies.
4NF: An Example
Figure 4.14 Multivalued Dependencies
Figure 4.15 Set of Tables in 4NF
How “far” should one Normalize?
For relational databases:
  1NF is required, at minimum, for practical RDBMS implementations.
  The majority of the time, data models are normalized to 3NF.
  Sometimes certain tables are left in 1NF or 2NF for performance or practical reasons.
  Higher normal forms (BCNF, 4NF) are rare.
In general, the higher the NF of your DM:
  The more complicated the internal DM.
  The more “programming” required to reproduce the external DM.
  But the smaller the chance of data anomalies!
It’s a total trade-off: database complexity vs. data anomalies.
Mike’s “Road To 3NF”
To normalize correctly, follow this process for each table in the data model:
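1. Designate a candidate key.
2. PKFD for all attributes? If no, apply the 1NF rule. If yes, the table is in 1NF.
3. Any partial dependencies? If yes, apply the 2NF rule. If no, the table is in 2NF.
4. Any transitive dependencies? If yes, apply the 3NF rule. If no, the table is in 3NF.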
Party Hard !
Normalization Summary Cheat Sheet
0NF → 1NF (Resolve non-FD)
1NF → 2NF (Resolve Partial FD)
2NF → 3NF (Resolve Transitive FD)
Data Normalization
Questions?