edward-blurock
Database Normalization
How to convert a database to an efficient form
Purpose of Normalization
Normalization is a technique for producing a set of suitable relations
that support the data requirements of an enterprise.
© Pearson Education Limited 1995, 2005
Purpose of Normalization
• Characteristics of a suitable set of relations include:
– the minimal number of attributes necessary to support the data requirements of the enterprise;
– attributes with a close logical relationship are found in the same relation;
– minimal redundancy, with each attribute represented only once, with the important exception of attributes that form all or part of foreign keys.
Purpose of Normalization
• The benefits of using a database that has a suitable set of relations are that the database will:
– be easier for the user to access and maintain;
– take up minimal storage space on the computer.
Normal Forms
All attributes depend on the key, the whole key and nothing but the key.
1NF: Keys and no repeating groups
2NF: No partial dependencies
3NF: No transitive dependencies
BCNF: All determinants are candidate keys
4NF: No multivalued dependencies
Update Anomalies
One of the major purposes of normalization: eliminate update anomalies,
mainly by eliminating redundancies, which are found through functional dependencies.
Redundant Information in Tuples and Update Anomalies
• Consider the relation:
• Insert Anomaly:
– For a new employee, you have to assign NULL to projects
– Cannot insert a project unless an employee is assigned to it.
Redundant Information in Tuples and Update Anomalies
• Consider the relation:
• Delete Anomaly:
– If we delete from EMP_DEPT an employee tuple that happens to represent the last employee of a department, the information concerning that department is lost from the database
Redundant Information in Tuples and Update Anomalies
• Consider the relation:
• Modify Anomaly:
– If we change the value of the manager of department 5, we must update the tuples of all employees who work in the department
Null Values in Tuples
• As far as possible, avoid placing attributes in a base relation whose values may frequently be NULL
• For example: if only 10% of employees have individual offices, DO NOT include an attribute OFFICE_NUMBER in the EMPLOYEE relation
• Rather, a relation EMP_OFFICES(ESSN, OFFICE_NUMBER) can be created (just like a WEAK entity type)
Functional Dependence
Key concept in normalization
The key to finding redundancy is to find functional dependence among the keys and attributes
Functional Dependence
An attribute A is functionally dependent on attribute(s) B if: given a value b for B there is one and only one corresponding value a for A (at a time).
Example: functional dependence
All sales representatives in a given pay class have the same commission rate.
SalesRepNumber Name PayClass Commission
PayClass -> Commission
Functional Dependency
• Definition: a functional dependency, denoted by X -> Y, holds if whenever two tuples have the same value for X, they must have the same value for Y
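The definition can be checked mechanically against a table instance. The sketch below is only an illustration: the function name `holds_fd` and the sample sales-rep rows are hypothetical, not taken from the slides. Note that an instance can refute a functional dependency but never prove it, since the dependency is a statement about all legal states of the relation.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if no two rows agree on lhs while differing on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample data for SalesRep(SalesRepNumber, Name, PayClass, Commission)
sales_reps = [
    {"SalesRepNumber": 3, "Name": "Mary Jones", "PayClass": 1, "Commission": 0.05},
    {"SalesRepNumber": 6, "Name": "William Smith", "PayClass": 1, "Commission": 0.05},
    {"SalesRepNumber": 12, "Name": "Sam Brown", "PayClass": 2, "Commission": 0.07},
]

print(holds_fd(sales_reps, ["PayClass"], ["Commission"]))  # PayClass -> Commission
print(holds_fd(sales_reps, ["PayClass"], ["Name"]))        # PayClass -> Name
```

In this sample, PayClass -> Commission is not contradicted, while PayClass -> Name is refuted by the two representatives in pay class 1.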
Functional dependencies
• motivation: ‘good’ tables
takes1 (ssn, c-id, grade, name, address)
‘good’ or ‘bad’?
Functional dependencies
takes1 (ssn, c-id, grade, name, address)
Ssn c-id Grade Name Address
123 413 A smith Main
123 415 B smith Main
123 211 A smith Main
Functional dependencies
‘Bad’ - why?
Ssn c-id Grade Name Address
123 413 A smith Main
123 415 B smith Main
123 211 A smith Main
Functional Dependencies
• Redundancy
– space
– inconsistencies
– insertion/deletion anomalies (later…)
• What caused the problem?
Functional dependencies
• … ‘name’ depends on ‘ssn’
• define ‘depends’
Ssn c-id Grade Name Address
123 413 A smith Main
123 415 B smith Main
123 211 A smith Main
Functional dependencies
Definition: ‘a’ functionally determines ‘b’
Ssn c-id Grade Name Address
123 413 A smith Main
123 415 B smith Main
123 211 A smith Main
a → b
Functional dependencies
Informally: ‘if you know ‘a’, there is only one ‘b’ to match’
Ssn c-id Grade Name Address
123 413 A smith Main
123 415 B smith Main
123 211 A smith Main
Functional dependencies
formally:
X → Y ≡ (t1[x] = t2[x] ⇒ t1[y] = t2[y])
if two tuples agree on the ‘X’ attribute, they *must* agree on the ‘Y’ attribute, too (e.g., if ssn is the same, so should address)
… a functional dependency is a generalization of the notion of a key
Full functional dependency
An attribute is fully functionally dependent on a set of attributes X if it is:
• functionally dependent on X,
• not functionally dependent on any proper subset of X.
{Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also dependent on {Employee ID}.
Even after the removal of {Skill}, the functional dependency still holds between {Employee Address} and {Employee ID}.
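The Employee Address example can be turned into a small check. This is a sketch under assumptions: the helper names `holds_fd` and `is_full_fd` and the three sample rows of the "Employees' Skills" table are made up for illustration, and the test only inspects one table instance.

```python
from itertools import combinations


def holds_fd(rows, lhs, rhs):
    """True if no two rows agree on lhs while differing on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    if not holds_fd(rows, lhs, rhs):
        return False
    for size in range(len(lhs)):  # sizes 0 .. len(lhs)-1: proper subsets only
        for subset in combinations(lhs, size):
            if holds_fd(rows, list(subset), rhs):
                return False
    return True


# Hypothetical rows for the "Employees' Skills" table
employee_skills = [
    {"EmployeeID": 1, "Skill": "Typing", "EmployeeAddress": "12 Oak St"},
    {"EmployeeID": 1, "Skill": "Filing", "EmployeeAddress": "12 Oak St"},
    {"EmployeeID": 2, "Skill": "Typing", "EmployeeAddress": "9 Elm Rd"},
]

print(is_full_fd(employee_skills, ["EmployeeID", "Skill"], ["EmployeeAddress"]))
print(is_full_fd(employee_skills, ["EmployeeID"], ["EmployeeAddress"]))
```

The first call reports that the dependency on {Employee ID, Skill} is not full, because {Employee ID} alone already determines the address; the second reports a full dependency.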
Transitive dependency
A transitive dependency is an indirect functional dependency, one in which X→Z only by virtue of X→Y and Y→Z.
Relation Attributes
A relation is made up of a tuple of attributes, each contributing aspects of the relation:
• Determinants
• Prime and non-prime attributes
• Keys: super, candidate, primary, foreign
The normalization process depends on the relationship between these types of attributes
Determinant
• A determinant in a database table is any attribute that you can use to determine the values assigned to other attribute(s) in the same row.
Non-prime attribute
Non-prime attribute: an attribute that does not occur in any candidate key.
Employee Address would be a non-prime attribute in the "Employees' Skills" table.
Prime attribute: conversely, an attribute that does occur in some candidate key.
Definitions of Keys and Attributes Participating in Keys
• A superkey of a relation schema R = {A1, A2, ...., An} is a set of attributes S ⊆ R with the property that no two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]
• A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
Basically a set of attributes which uniquely identify a row
Candidate Key
A super key reduced to the minimum number of columns required to uniquely identify each row.
In a sense, the candidate key is a minimal super key
Primary Key (C)
• C determines all attributes
• The candidate key actually chosen in the relation as the key
• A key consisting of more than one attribute is called a “composite key.”
Good Primary Keys
• Do not change over the life of the database
• Are not “intelligent keys”
• Are not too long
• Do not consist of too many attributes (3 or fewer is good)
Foreign Keys
A value in the “child” table that matches with the related value in the “parent” table.
SalesRep(SalesRepNumber, Name)
[ 03 | Mary Jones ]
Customer(CustomerNumber, SalesRepNumber)
[ 124 | 03 ]
Foreign Keys
The foreign key in the child table has the same value as the primary key in the parent.
• The foreign key in a one-to-many relationship goes in the many table.
• In a many-to-many relationship, foreign keys from both tables go into an associative entity.
• In a 1-to-1 relationship the foreign key goes into one of the tables (usually the one most likely to change)
Definitions of Keys and Attributes Participating in Keys
• If a relation schema has more than one key, each is called a candidate key.
– One of the candidate keys is arbitrarily designated to be the primary key
• A Prime attribute must be a member of some candidate key
• A Nonprime attribute is not a prime attribute—that is, it is not a member of any candidate key.
Keys
• Determinant
– Any attribute that can determine values in a row
• Superkey
– Set of attributes which uniquely identifies a row
• Candidate Key
– A “minimal” (in terms of attributes) super key
• Primary Key
– The chosen candidate key to identify the row
Normal Form
• Initially Codd (1972) presented three normal forms (1NF, 2NF and 3NF) all based on functional dependencies among the attributes of a relation.
• Later Boyce and Codd proposed another normal form called the Boyce-Codd normal form (BCNF).
• The fourth and fifth normal forms are based on multivalued and join dependencies and were proposed later.
• The primary objective of normalization is to avoid anomalies.
Normalization of Relations
• Normalization:
– The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations
• Normal form:
– Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form
List of Normal Forms
• First Normal Form (1NF)
– Atomic values
• 2NF, 3NF
– based on primary keys
• 4NF
– based on keys, multi-valued dependencies
• 5NF
– based on keys, join dependencies
Practical Use of Normal Forms
• Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties
• The database designers need not normalize to the highest possible normal form
– (usually up to 3NF, BCNF or 4NF)
1st Normal Form
• Table has a primary key
• Table has no repeating groups
A multivalued attribute is an attribute that may have several values for one record
A repeating group is a set of one or more multivalued attributes that are related
Example
• Multivalued attribute:
Orders(OrderNumber, OrderDate, {PartNumber})
[ 12491 | 9/02/2001 | BT04, BZ66 ]
• Repeating group:
Orders(OrderNumber, OrderDate, {PartNumber, NumberOrdered})
[ 12491 | 9/02/2001 | (BT04, 1), (BZ66, 1) ]
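1NF Example – Schema 2 (incorrect)
Employees Table
| EMPID | LNAME | FNAME | DEPT | PHONE | SALARY | SEX | LANGUAGES |
|-------+-------+-------+------+-------+--------+-----+-----------|
| 23 | Jones | Mark | ITR | 555-1087 | 45000 | M | COBOL, JAVA, SQL |
| 25 | Smith | Sara | FINC | 555-2222 | 55000 | F | |
| 26 | Billings | David | ACTG | 555-4356 | 42000 | M | |
| 31 | Dance | Ivanna | ACTG | 444-4887 | 60000 | F | SQL |
| 32 | Jones | Mary | ITR | 555-8745 | 70000 | F | JAVA, SQL, VB, COBOL |
| 35 | Barker | Bob | ACTG | 555-6565 | 44000 | M | |
| 36 | Woods | Robin | ITR | 555-9812 | 90000 | M | VB, SQL, JAVA |
| 37 | Jones | Mary | FINC | 555-1234 | 56000 | F | COBOL, SQL |
Languages Table
| NAME | FULLNAME |
|------+----------|
| COBOL | COmmon Business Oriented Language |
| SQL | Structured Query Language |
| JAVA | JAVA |
| VB | Visual Basic |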
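1NF Example – Schema 3 (incorrect)
Employees Table
| EMPID | LNAME | FNAME | DEPT | PHONE | SALARY | SEX | LANG1 | LANG2 | LANG3 | LANG4 |
|-------+-------+-------+------+-------+--------+-----+-------+-------+-------+-------|
| 23 | Jones | Mark | ITR | 555-1087 | 45000 | M | COBOL | JAVA | SQL | |
| 25 | Smith | Sara | FINC | 555-2222 | 55000 | F | | | | |
| 26 | Billings | David | ACTG | 555-4356 | 42000 | M | | | | |
| 31 | Dance | Ivanna | ACTG | 444-4887 | 60000 | F | SQL | | | |
| 32 | Jones | Mary | ITR | 555-8745 | 70000 | F | JAVA | SQL | VB | COBOL |
| 35 | Barker | Bob | ACTG | 555-6565 | 44000 | M | | | | |
| 36 | Woods | Robin | ITR | 555-9812 | 90000 | M | VB | SQL | JAVA | |
| 37 | Jones | Mary | FINC | 555-1234 | 56000 | F | COBOL | SQL | | |
Languages Table
| NAME | FULLNAME |
|------+----------|
| COBOL | COmmon Business Oriented Language |
| SQL | Structured Query Language |
| JAVA | JAVA |
| VB | Visual Basic |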
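1NF Example – Schema 4 (incorrect)
Employees Table
| EMPID | LNAME | FNAME | DEPT | PHONE | SALARY | SEX | COBOL | JAVA | SQL | VB |
|-------+-------+-------+------+-------+--------+-----+-------+------+-----+----|
| 23 | Jones | Mark | ITR | 555-1087 | 45000 | M | T | T | T | F |
| 25 | Smith | Sara | FINC | 555-2222 | 55000 | F | F | F | F | F |
| 26 | Billings | David | ACTG | 555-4356 | 42000 | M | F | F | F | F |
| 31 | Dance | Ivanna | ACTG | 444-4887 | 60000 | F | F | F | T | F |
| 32 | Jones | Mary | ITR | 555-8745 | 70000 | F | T | T | T | T |
| 35 | Barker | Bob | ACTG | 555-6565 | 44000 | M | F | F | F | F |
| 36 | Woods | Robin | ITR | 555-9812 | 90000 | M | F | T | T | T |
| 37 | Jones | Mary | FINC | 555-1234 | 56000 | F | T | F | T | F |
Languages Table
| NAME | FULLNAME |
|------+----------|
| COBOL | COmmon Business Oriented Language |
| SQL | Structured Query Language |
| JAVA | JAVA |
| VB | Visual Basic |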
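1NF Example – Schema 1 (correct)
Employees Table
| EMPID | LNAME | FNAME | DEPT | PHONE | SALARY | SEX |
|-------+-------+-------+------+-------+--------+-----|
| 23 | Jones | Mark | ITR | 555-1087 | 45000 | M |
| 25 | Smith | Sara | FINC | 555-2222 | 55000 | F |
| 26 | Billings | David | ACTG | 555-4356 | 42000 | M |
| 31 | Dance | Ivanna | ACTG | 444-4887 | 60000 | F |
| 32 | Jones | Mary | ITR | 555-8745 | 70000 | F |
| 35 | Barker | Bob | ACTG | 555-6565 | 44000 | M |
| 36 | Woods | Robin | ITR | 555-9812 | 90000 | M |
| 37 | Jones | Mary | FINC | 555-1234 | 56000 | F |
Programs Table
| EMPID | LANGUAGE |
|-------+----------|
| 23 | COBOL |
| 23 | JAVA |
| 23 | SQL |
| 31 | SQL |
| 32 | JAVA |
| 32 | SQL |
| 32 | VB |
| 32 | COBOL |
| 36 | VB |
| 36 | SQL |
| 36 | JAVA |
| 37 | COBOL |
| 37 | SQL |
Languages Table
| NAME | FULLNAME |
|------+----------|
| COBOL | COmmon Business Oriented Language |
| SQL | Structured Query Language |
| JAVA | JAVA |
| VB | Visual Basic |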
Normalisation to 1NF

Unnormalised
Module Dept Lecturer Texts
M1 D1 L1 T1, T2
M2 D1 L1 T1, T3
M3 D1 L2 T4
M4 D2 L3 T1, T5
M5 D2 L4 T6

1NF
Module Dept Lecturer Text
M1 D1 L1 T1
M1 D1 L1 T2
M2 D1 L1 T1
M2 D1 L1 T3
M3 D1 L2 T4
M4 D2 L3 T1
M4 D2 L3 T5
M5 D2 L4 T6

To convert to a 1NF relation, split up any non-atomic values
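Splitting non-atomic values amounts to emitting one row per value in the multivalued column. A minimal sketch, assuming the unnormalised rows are held as Python tuples whose last element is a list of texts (the variable names are illustrative only):

```python
# Unnormalised relation: the Texts column holds a list of values per row
unnormalised = [
    ("M1", "D1", "L1", ["T1", "T2"]),
    ("M2", "D1", "L1", ["T1", "T3"]),
    ("M3", "D1", "L2", ["T4"]),
    ("M4", "D2", "L3", ["T1", "T5"]),
    ("M5", "D2", "L4", ["T6"]),
]

# 1NF: repeat the atomic columns once for every text in the list
first_nf = [
    (module, dept, lecturer, text)
    for module, dept, lecturer, texts in unnormalised
    for text in texts
]

for row in first_nf:
    print(*row)
```

The result has eight rows, one per (Module, Text) pair, which is why {Module, Text} serves as the key of the 1NF relation.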
Problems in 1NF
• INSERT anomalies
– Can't add a module with no texts
• UPDATE anomalies
– To change lecturer for M1, we have to change two rows
• DELETE anomalies
– If we remove M3, we remove L2 as well
2nd Normal Form
Second Normal Form is violated if:
• First Normal Form is violated
• there exists a non-key field(s) which is functionally dependent on a partial key (partial key → non-key)
2NF Example – Violation
Transactions Table
| JENO | LINENO | DATE | DESCRIPTION | ACCTNO | ACCTNAME | AMOUNT |
|------+--------+------+-------------+--------+----------+--------|
| 1 | 1 | 02-JAN-2003 | Owner investment | 100 | Cash | 20,000 |
| 1 | 2 | 02-JAN-2003 | Owner investment | 310 | Smith-Capital | (20,000) |
| 2 | 1 | 03-JAN-2003 | Borrowed money | 100 | Cash | 30,000 |
| 2 | 2 | 03-JAN-2003 | Borrowed money | 220 | Notes Payable | (30,000) |
| 3 | 1 | 03-JAN-2003 | Purchased Supplies | 120 | Supplies | 5,000 |
| 3 | 2 | 03-JAN-2003 | Purchased Supplies | 100 | Cash | (1,000) |
| 3 | 3 | 03-JAN-2003 | Purchased Supplies | 220 | Notes Payable | (4,000) |
Is there a non-key field which is functionally dependent on a partial key?
2NF Example – Violation
FDs that indicate violation of 2NF
In the Transactions table the key is {JENO, LINENO}, but
JENO → DATE, DESCRIPTION
so DATE and DESCRIPTION depend on only part of the key.
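2NF Example – Corrected
Transactions Table
| JENO | LINENO | ACCTNO | ACCTNAME | AMOUNT |
|------+--------+--------+----------+--------|
| 1 | 1 | 100 | Cash | 20,000 |
| 1 | 2 | 310 | Smith-Capital | (20,000) |
| 2 | 1 | 100 | Cash | 30,000 |
| 2 | 2 | 220 | Notes Payable | (30,000) |
| 3 | 1 | 120 | Supplies | 5,000 |
| 3 | 2 | 100 | Cash | (1,000) |
| 3 | 3 | 220 | Notes Payable | (4,000) |
Journal_Entry Table
| JENO | DATE | DESCRIPTION |
|------+------+-------------|
| 1 | 02-JAN-2003 | Owner investment |
| 2 | 03-JAN-2003 | Borrowed money |
| 3 | 03-JAN-2003 | Purchased Supplies |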
Removing FDs
• Suppose we have a relation R with scheme S and the FD A → B where A ∩ B = { }
• Let C = S – (A U B)
• In other words:
– A – attributes on the left hand side of the FD
– B – attributes on the right hand side of the FD
– C – all other attributes
• It turns out that we can split R into two parts:
– R1, with scheme C U A
– R2, with scheme A U B
• The original relation can be recovered as the natural join of R1 and R2:
– R = R1 NATURAL JOIN R2
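The split and the natural join can be tried out on the Module example, taking A = {Module}, B = {Dept, Lecturer} and C = {Text}. This is a sketch only: the helpers `project`, `natural_join` and `as_set` are hypothetical names written for this illustration, with rows held as Python dicts.

```python
def project(rows, attrs):
    """Projection onto attrs, with duplicate rows eliminated."""
    return [dict(t) for t in {tuple((a, r[a]) for a in attrs) for r in rows}]


def natural_join(r1, r2):
    """Combine every pair of rows that agree on all shared attributes."""
    joined = []
    for a in r1:
        for b in r2:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                joined.append({**a, **b})
    return joined


def as_set(rows):
    """Order-insensitive view of a relation, for comparison."""
    return {frozenset(r.items()) for r in rows}


columns = ("Module", "Dept", "Lecturer", "Text")
data = [
    ("M1", "D1", "L1", "T1"), ("M1", "D1", "L1", "T2"),
    ("M2", "D1", "L1", "T1"), ("M2", "D1", "L1", "T3"),
    ("M3", "D1", "L2", "T4"), ("M4", "D2", "L3", "T1"),
    ("M4", "D2", "L3", "T5"), ("M5", "D2", "L4", "T6"),
]
R = [dict(zip(columns, row)) for row in data]

R1 = project(R, ["Text", "Module"])              # scheme C U A
R2 = project(R, ["Module", "Dept", "Lecturer"])  # scheme A U B

print(len(R1), len(R2))
print(as_set(natural_join(R1, R2)) == as_set(R))
```

Because Module → {Dept, Lecturer} holds, each R1 row matches exactly one R2 row, so the join neither loses rows nor creates spurious ones.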
Second Normal Form
• 1NF is not in 2NF
– We have the FD {Module, Text} → {Lecturer, Dept}
– But also {Module} → {Lecturer, Dept}
– And so Lecturer and Dept are partially dependent on the primary key
1NF to 2NF – Example

1NF
Module Dept Lecturer Text
M1 D1 L1 T1
M1 D1 L1 T2
M2 D1 L1 T1
M2 D1 L1 T3
M3 D1 L2 T4
M4 D2 L3 T1
M4 D2 L3 T5
M5 D2 L4 T6

2NFa
Module Dept Lecturer
M1 D1 L1
M2 D1 L1
M3 D1 L2
M4 D2 L3
M5 D2 L4

2NFb
Module Text
M1 T1
M1 T2
M2 T1
M2 T3
M3 T4
M4 T1
M4 T5
M5 T6
Problems Resolved in 2NF
• Problems in 1NF
– INSERT – Can't add a module with no texts
– UPDATE – To change lecturer for M1, we have to change two rows
– DELETE – If we remove M3, we remove L2 as well
• In 2NF the first two are resolved, but not the third one
Problems Remaining in 2NF
• INSERT anomalies
– Can't add lecturers who teach no modules
• UPDATE anomalies
– To change the department for L1 we must alter two rows
• DELETE anomalies
– If we delete M3 we delete L2 as well
Transitive FDs and 3NF
• Transitive FDs:
– An FD A → C is a transitive FD if there is some set B such that A → B and B → C are non-trivial FDs
– A → B non-trivial means: B is not a subset of A
– We have A → B → C
• Third normal form
– A relation is in third normal form (3NF) if it is in 2NF and no non-key attribute is transitively dependent on a candidate key
3rd Normal Form
Third Normal Form is violated if:
• Second Normal Form is violated
• there exists a non-key field(s) which is functionally dependent on another non-key field(s) (non-key → non-key)
Note: A candidate key is not a non-key field.
3NF from Codd
Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every superkey of R.
Non-prime attribute: an attribute that does not occur in any candidate key.
Superkey: a combination of attributes used to uniquely identify a row. A table might have many superkeys.
3NF Example – Violation
Transactions Table
| JENO | LINENO | ACCTNO | ACCTNAME | AMOUNT |
|------+--------+--------+----------+--------|
| 1 | 1 | 100 | Cash | 20,000 |
| 1 | 2 | 310 | Smith-Capital | (20,000) |
| 2 | 1 | 100 | Cash | 30,000 |
| 2 | 2 | 220 | Notes Payable | (30,000) |
| 3 | 1 | 120 | Supplies | 5,000 |
| 3 | 2 | 100 | Cash | (1,000) |
| 3 | 3 | 220 | Notes Payable | (4,000) |
Journal_Entry Table
| JENO | DATE | DESCRIPTION |
|------+------+-------------|
| 1 | 02-JAN-2003 | Owner investment |
| 2 | 03-JAN-2003 | Borrowed money |
| 3 | 03-JAN-2003 | Purchased Supplies |
Are there any non-key fields which functionally determine another non-key field?
Are there any redundant facts?
3NF Example – Violation
FD that indicates violation of 3NF
Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every superkey of R.
In the Transactions table, a non-prime attribute (ACCTNAME) is dependent on something other than a superkey (ACCTNO):
ACCTNO → ACCTNAME
3NF Example – Violation
FD that indicates violation of 3NF: ACCTNO → ACCTNAME in the Transactions table
Anomalies if not corrected:
• update (if the name of account 100 changes, it must be changed in multiple places, risking inconsistency)
• deletion (can't delete JE#3 and its transactions without losing information about account 120)
• insertion (can't set up a new account, Jones-capital, for a new partner unless we first have a transaction involving that account)
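3NF Example – Corrected
Transactions Table
| JENO | LINENO | ACCTNO | AMOUNT |
|------+--------+--------+--------|
| 1 | 1 | 100 | 20,000 |
| 1 | 2 | 310 | (20,000) |
| 2 | 1 | 100 | 30,000 |
| 2 | 2 | 220 | (30,000) |
| 3 | 1 | 120 | 5,000 |
| 3 | 2 | 100 | (1,000) |
| 3 | 3 | 220 | (4,000) |
Journal_Entry Table
| JENO | DATE | DESCRIPTION |
|------+------+-------------|
| 1 | 02-JAN-2003 | Owner investment |
| 2 | 03-JAN-2003 | Borrowed money |
| 3 | 03-JAN-2003 | Purchased Supplies |
Accounts Table
| ACCTNO | ACCTNAME |
|--------+----------|
| 100 | Cash |
| 310 | Smith-Capital |
| 220 | Notes Payable |
| 120 | Supplies |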
3NF Example – Corrected
Final Dependencies
Transactions: {JENO, LINENO} → ACCTNO, AMOUNT
Journal_Entry: JENO → DATE, DESCRIPTION
Accounts: ACCTNO → ACCTNAME
All non-key fields are FD on the PK and only the PK.
Third Normal Form
• 2NFa is not in 3NF
– We have the FDs {Module} → {Lecturer} and {Lecturer} → {Dept}
– So there is a transitive FD from the primary key {Module} to {Dept}
2NF to 3NF – Example

2NFa
Module Dept Lecturer
M1 D1 L1
M2 D1 L1
M3 D1 L2
M4 D2 L3
M5 D2 L4

3NFa
Lecturer Dept
L1 D1
L2 D1
L3 D2
L4 D2

3NFb
Module Lecturer
M1 L1
M2 L1
M3 L2
M4 L3
M5 L4
Problems Resolved in 3NF
• Problems in 2NF
– INSERT – Can't add lecturers who teach no modules
– UPDATE – To change the department for L1 we must alter two rows
– DELETE – If we delete M3 we delete L2 as well
• In 3NF all of these are resolved (for this relation – but 3NF can still have anomalies!)
4th Normal Form
4th Normal Form is violated if:
• Boyce-Codd Normal Form is violated
• there exists a partial key which has multiple independent multivalued dependencies to other partial keys (partial-key1 →→ partial-key2, partial-key1 →→ partial-key3)
4NF Example – Violation
Instruments_Languages
Name Language Instrument
Fred French Piano
Fred Italian Flute
Fred Spanish Flute
Jane French Piano
Jane French Oboe
Sam French Piano
Sam Spanish Oboe
Sam Spanish Flute
4NF Example – Violation
Does the Instruments_Languages relation violate 1st, 2nd, 3rd, or BCNF?
Are there any redundant facts?
4NF Example – Correction

LanguagesSpoken
Name Language
Fred French
Fred Italian
Fred Spanish
Jane French
Sam French
Sam Spanish

InstrumentsPlayed
Name Instrument
Fred Piano
Fred Flute
Jane Piano
Jane Oboe
Sam Piano
Sam Oboe
Sam Flute
BCNF Normal Form
Boyce-Codd Normal Form is violated if:
• Third Normal Form is violated
• there exists a partial key which is functionally dependent on a non-key field(s) (non-key → partial-key)
Boyce–Codd normal form
A relational schema R is in Boyce–Codd normal form if and only if for every one of its dependencies X → Y, at least one of the following conditions holds:
• X → Y is a trivial functional dependency (Y ⊆ X)
• X is a superkey for schema R
Trivial functional dependency: a functional dependency of an attribute on a superset of itself (Y ⊆ X)
Superkey: set of attributes which uniquely identifies a row
3NF Examples
• ABC {AB → C, C → A}
– In 3NF (but not in BCNF)
– C is not a superkey, but A is in a candidate key
• ABC {A → B, B → C}
– Not in 3NF
– B is not a superkey, and C is not in a candidate key
• ABCD {AB → C, B → D}
– Not in 3NF
– B is not a superkey, and D is not in a candidate key
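The three examples can be checked with a small script. This is a sketch rather than a general-purpose tool: the function names are illustrative, schemas and FD sides are written as strings of single-letter attributes, candidate keys are found by brute force (fine only for tiny schemas), and the tests inspect just the listed FDs rather than every implied one.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if closure(s, fds) == set(schema) and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def is_bcnf(schema, fds):
    """Every listed FD is trivial or has a superkey on its left side."""
    return all(
        set(rhs) <= set(lhs) or closure(lhs, fds) == set(schema)
        for lhs, rhs in fds
    )


def is_3nf(schema, fds):
    """Like BCNF, but an FD may also pass if its right side is prime."""
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(schema):
            continue
        if not (set(rhs) - set(lhs)) <= prime:
            return False
    return True


examples = [
    ("ABC", [("AB", "C"), ("C", "A")]),
    ("ABC", [("A", "B"), ("B", "C")]),
    ("ABCD", [("AB", "C"), ("B", "D")]),
]
for schema, fds in examples:
    print(schema, fds, "3NF:", is_3nf(schema, fds), "BCNF:", is_bcnf(schema, fds))
```

For the first example the candidate keys come out as AB and BC, so every attribute is prime and C → A passes the 3NF test while still failing BCNF.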
BCNF
• {AB → C, C → A}
• AB composite key
• One of the prime attributes in the composite key is dependent on a non-prime attribute
BCNF ExampleSemantics
• A student can have more than one major
• A student has a different advisor for each major.
• Each advisor advises for only one major.
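BCNF Example – Violation
Student_Majors Table
| SID | MAJOR | ADVISOR |
|-----+-------+---------|
| 1 | PHYSICS | EINSTEIN |
| 1 | BIOLOGY | LIVINGSTON |
| 2 | PHYSICS | BOHR |
| 2 | COMPUTER SCIENCE | CODD |
| 3 | PHYSICS | EINSTEIN |
| 4 | BIOLOGY | LIVINGSTON |
| 4 | ACCOUNTING | PACIOLI |
| 5 | PHYSICS | EINSTEIN |
| 6 | PHYSICS | BOHR |
| 6 | BIOLOGY | DARWIN |
| 7 | COMPUTER SCIENCE | CODD |
| 7 | BIOLOGY | DARWIN |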
BCNF Example – Violation
FD that violates BCNF
In the Student_Majors table, ADVISOR → MAJOR holds (each advisor advises for only one major), but ADVISOR is not a superkey.
It is important that you convince yourself that advisor is not FD on major (PHYSICS has both EINSTEIN and BOHR).
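BCNF Example – Corrected
Student_Advisors Table
| SID | ADVISOR |
|-----+---------|
| 1 | EINSTEIN |
| 1 | LIVINGSTON |
| 2 | BOHR |
| 2 | CODD |
| 3 | EINSTEIN |
| 4 | LIVINGSTON |
| 4 | PACIOLI |
| 5 | EINSTEIN |
| 6 | BOHR |
| 6 | DARWIN |
| 7 | CODD |
| 7 | DARWIN |
Advisors Table
| MAJOR | ADVISOR |
|-------+---------|
| PHYSICS | EINSTEIN |
| BIOLOGY | LIVINGSTON |
| PHYSICS | BOHR |
| COMPUTER SCIENCE | CODD |
| ACCOUNTING | PACIOLI |
| BIOLOGY | DARWIN |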
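The dependencies in the student/major/advisor data can be confirmed directly. A minimal sketch, assuming the rows are stored as (SID, MAJOR, ADVISOR) tuples; the helper `holds_fd` and its index-based interface are illustrative only, and the check covers just this table instance.

```python
def holds_fd(rows, lhs, rhs):
    """True if no two rows agree on the lhs positions but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[i] for i in lhs)
        y = tuple(row[i] for i in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


SID, MAJOR, ADVISOR = 0, 1, 2
student_majors = [
    (1, "PHYSICS", "EINSTEIN"), (1, "BIOLOGY", "LIVINGSTON"),
    (2, "PHYSICS", "BOHR"), (2, "COMPUTER SCIENCE", "CODD"),
    (3, "PHYSICS", "EINSTEIN"), (4, "BIOLOGY", "LIVINGSTON"),
    (4, "ACCOUNTING", "PACIOLI"), (5, "PHYSICS", "EINSTEIN"),
    (6, "PHYSICS", "BOHR"), (6, "BIOLOGY", "DARWIN"),
    (7, "COMPUTER SCIENCE", "CODD"), (7, "BIOLOGY", "DARWIN"),
]

print(holds_fd(student_majors, [SID, MAJOR], [ADVISOR]))  # key determines advisor
print(holds_fd(student_majors, [ADVISOR], [MAJOR]))       # the BCNF-violating FD
print(holds_fd(student_majors, [MAJOR], [ADVISOR]))       # refuted by PHYSICS

# Projecting onto (MAJOR, ADVISOR) stores each advisor's major only once
advisors = {(major, advisor) for _, major, advisor in student_majors}
print(len(advisors))
```

Twelve rows in Student_Majors collapse to six distinct (MAJOR, ADVISOR) pairs, which is the redundancy the decomposition removes.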
Fifth Normal Form (5NF)
5NF is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships.
Fifth normal form (5NF) is also known as project-join normal form (PJ/NF).
A table is said to be in the 5NF if and only if every join dependency in it is implied by the candidate keys. A join dependency *{A, B, … Z} on R is implied by the candidate key(s) of R if and only if each of A, B, …, Z is a superkey for R.
Join dependency: a table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.
5NF Example If agents represent companies, companies make products, and agents sell products,
then we might want to keep a record of which agent sells which product for which company.
-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford    | car     |
| Smith | Ford    | truck   |
| Smith | GM      | car     |
| Smith | GM      | truck   |
| Jones | Ford    | car     |
-----------------------------
http://www.bkent.net/Doc/simple5.htm
5NF Example
This form is necessary in the general case.
For example, although agent Smith sells cars made by Ford and trucks made by GM, he does not sell Ford trucks or GM cars.
Thus we need the combination of three fields to know which combinations are valid and which are not.
-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford    | car     |
| Smith | Ford    | truck   |
| Smith | GM      | car     |
| Smith | GM      | truck   |
| Jones | Ford    | car     |
-----------------------------
5NF Example
But suppose that a certain rule was in effect:
if an agent sells a certain product, and he represents a company making that product, then he sells that product for that company.
In this case, we can reconstruct all the true facts from a normalized form consisting of three separate record types, each containing two fields:
-------------------
| AGENT | COMPANY |
|-------+---------|
| Smith | Ford    |
| Smith | GM      |
| Jones | Ford    |
-------------------
---------------------
| COMPANY | PRODUCT |
|---------+---------|
| Ford    | car     |
| Ford    | truck   |
| GM      | car     |
| GM      | truck   |
---------------------
-------------------
| AGENT | PRODUCT |
|-------+---------|
| Smith | car     |
| Smith | truck   |
| Jones | car     |
-------------------
5NF Example
5NF isolates semantically related multiple relationships: we can reconstruct all the true facts from a normalized form consisting of three separate record types, each containing two fields.
A record type is in fifth normal form when its information content cannot be reconstructed from several smaller record types, i.e., from record types each having fewer fields than the original record.
Join dependency: a table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.
5NF Example
-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford    | car     |
| Smith | Ford    | truck   |
| Smith | GM      | car     |
| Smith | GM      | truck   |
| Jones | Ford    | car     |
-----------------------------
-------------------
| AGENT | COMPANY |
|-------+---------|
| Smith | Ford    |
| Smith | GM      |
| Jones | Ford    |
-------------------
---------------------
| COMPANY | PRODUCT |
|---------+---------|
| Ford    | car     |
| Ford    | truck   |
| GM      | car     |
| GM      | truck   |
---------------------
-------------------
| AGENT | PRODUCT |
|-------+---------|
| Smith | car     |
| Smith | truck   |
| Jones | car     |
-------------------
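The reconstruction can be verified by joining the three two-field record types back together. A small sketch using Python sets of tuples (the variable names are illustrative only):

```python
original = {
    ("Smith", "Ford", "car"),
    ("Smith", "Ford", "truck"),
    ("Smith", "GM", "car"),
    ("Smith", "GM", "truck"),
    ("Jones", "Ford", "car"),
}

# The three two-field projections
agent_company = {(a, c) for a, c, p in original}
company_product = {(c, p) for a, c, p in original}
agent_product = {(a, p) for a, c, p in original}

# Joining only two of them can produce facts that were never recorded
two_way = {
    (a, c, p)
    for a, c in agent_company
    for c2, p in company_product
    if c == c2
}

# Joining all three filters those spurious rows out again
three_way = {(a, c, p) for a, c, p in two_way if (a, p) in agent_product}

print(sorted(two_way - original))
print(three_way == original)
```

The two-way join invents (Jones, Ford, truck); only the join of all three projections gives back exactly the original five rows, which is what the join dependency expresses.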
Join dependency: a table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.
Denormalization
• Creation of normalized relations is an important database design goal
• Processing requirements should also be a goal
• If tables are decomposed to conform to normalization requirements:
– Number of database tables expands
Denormalization (continued)
• Joining the larger number of tables takes additional input/output (I/O) operations and processing logic, thereby reducing system speed
• Conflicts between design efficiency, information requirements, and processing speed are often resolved through compromises that may include denormalization
Denormalization (continued)
• Unnormalized tables in a production database tend to suffer from these defects:
– Data updates are less efficient because programs that read and update tables must deal with larger tables
– Indexing is more cumbersome
– Unnormalized tables yield no simple strategies for creating virtual tables known as views
Denormalization (continued)
• Use denormalization cautiously
• Understand why, under some circumstances, unnormalized tables are a better choice
Functional dependencies
• ssn -> name, address
• ssn, c-id -> grade
Ssn c-id Grade Name Address
123 413 A smith Main
123 415 B smith Main
123 211 A smith Main
Functional dependencies
K is a superkey for relation R iff K -> R
K is a candidate key for relation R iff:
K -> R
for no a ⊂ K, a -> R
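Both tests reduce to computing which attributes a set K determines. The sketch below assumes the FDs of the takes1 example (ssn -> name, address and ssn, c-id -> grade); the function names are hypothetical. For the candidate-key test it is enough to drop one attribute at a time, because any subset of a non-superkey is also not a superkey.

```python
def attribute_closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def is_superkey(k, schema, fds):
    """K -> R: the closure of K covers every attribute of the schema."""
    return attribute_closure(k, fds) == set(schema)


def is_candidate_key(k, schema, fds):
    """K is a superkey and removing any single attribute breaks that."""
    k = set(k)
    return is_superkey(k, schema, fds) and not any(
        is_superkey(k - {a}, schema, fds) for a in k
    )


schema = {"ssn", "c-id", "grade", "name", "address"}
fds = [
    ({"ssn"}, {"name", "address"}),
    ({"ssn", "c-id"}, {"grade"}),
]

print(is_superkey({"ssn", "c-id", "grade"}, schema, fds))
print(is_candidate_key({"ssn", "c-id", "grade"}, schema, fds))
print(is_candidate_key({"ssn", "c-id"}, schema, fds))
```

Here {ssn, c-id, grade} is a superkey but not a candidate key, while {ssn, c-id} is both.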
Functional dependencies
Closure of a set of FD: all implied FDs – e.g.:
ssn -> name, address
ssn, c-id -> grade
imply
ssn, c-id -> grade, name, address
ssn, c-id -> ssn
FDs - Armstrong’s axioms
Closure of a set of FD: all implied FDs – e.g.:
ssn -> name, address
ssn, c-id -> grade
how to find all the implied ones, systematically?
FDs - Armstrong’s axioms
“Armstrong’s axioms” guarantee soundness and completeness:
• Reflexivity: Y ⊆ X ⇒ X → Y
e.g., ssn, name -> ssn
• Augmentation: X → Y ⇒ XW → YW
e.g., ssn -> name then ssn, grade -> name, grade
FDs - Armstrong’s axioms
• Transitivity
X → Y, Y → Z ⇒ X → Z
e.g., ssn -> address, address -> county-tax-rate
THEN: ssn -> county-tax-rate
FDs - Armstrong’s axioms
Reflexivity: Y ⊆ X ⇒ X → Y
Augmentation: X → Y ⇒ XW → YW
Transitivity: X → Y, Y → Z ⇒ X → Z
‘sound’ and ‘complete’
FDs – finding the closure F+
F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            then add the resulting functional dependency to F+
until F+ does not change any further
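Enumerating all of F+ grows exponentially with the number of attributes, so in practice a different but equivalent question is usually asked: is a given FD X -> Y in F+? That holds exactly when Y is contained in the attribute closure of X. The sketch below swaps in this attribute-closure technique instead of the loop above; the names `attribute_closure` and `implies` are illustrative, and the FDs are those of the ssn / c-id example.

```python
def attribute_closure(attrs, fds):
    """Repeatedly add the right side of every FD whose left side is covered."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in F+, i.e. follows from Armstrong's axioms."""
    return set(rhs) <= attribute_closure(lhs, fds)


F = [
    ({"ssn"}, {"name", "address"}),
    ({"ssn", "c-id"}, {"grade"}),
]

print(sorted(attribute_closure({"ssn"}, F)))
print(implies(F, {"ssn", "c-id"}, {"grade", "name", "address"}))
print(implies(F, {"ssn", "c-id"}, {"ssn"}))
print(implies(F, {"c-id"}, {"grade"}))
```

The first three queries confirm FDs that belong to F+, while c-id -> grade is not implied.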
• We can further simplify manual computation of F+ by using the following additional rules
FDs - Armstrong’s axioms
Additional rules:
• Union: X → Y, X → Z ⇒ X → YZ
• Decomposition: X → YZ ⇒ X → Y, X → Z
• Pseudo-transitivity: X → Y, YW → Z ⇒ XW → Z
FDs - Armstrong’s axioms
Prove ‘Union’ from the three axioms:
X → Y, X → Z ⇒ X → YZ ?
FDs - Armstrong’s axioms
Prove ‘Union’ from the three axioms:
X → Y (1)
X → Z (2)
(1) + augm. w/ Z ⇒ XZ → YZ (3)
(2) + augm. w/ X ⇒ XX → XZ (4)
but XX is X; thus (3) + (4) and transitivity ⇒ X → YZ
FDs - Armstrong’s axioms
Prove Pseudo-transitivity:
X → Y, YW → Z ⇒ XW → Z ?
Reflexivity: Y ⊆ X ⇒ X → Y
Augmentation: X → Y ⇒ XW → YW
Transitivity: X → Y, Y → Z ⇒ X → Z
FDs - Armstrong’s axioms
Prove Decomposition
X → YZ ⇒ X → Y, X → Z ?
Reflexivity: Y ⊆ X ⇒ X → Y
Augmentation: X → Y ⇒ XW → YW
Transitivity: X → Y, Y → Z ⇒ X → Z
FDs - Closure F+
Given a set F of FD (on a schema), F+ is the set of all implied FD. E.g.,
takes(ssn, c-id, grade, name, address)
F = { ssn, c-id -> grade
      ssn -> name, address }
FDs - Closure F+
F+ = { ssn, c-id -> grade
       ssn -> name, address
       ssn -> ssn
       ssn, c-id -> address
       c-id, address -> c-id
       ... }
Summary
• Normalization
• Characteristics of a suitable set of relations include:
– the minimal number of attributes necessary to support the data requirements of the enterprise;
– attributes with a close logical relationship are found in the same relation;
– minimal redundancy, with each attribute represented only once, with the important exception of attributes that form all or part of foreign keys.
Functional Dependence
Key concept in normalization
The key to finding redundancy is to find functional dependence among the keys and attributes
1st Normal Form
• Table has a primary key
• Table has no repeating groups
A multivalued attribute is an attribute that may have several values for one record
A repeating group is a set of one or more multivalued attributes that are related
2nd Normal Form
Second Normal Form is violated if:
• First Normal Form is violated
• there exists a non-key field(s) which is functionally dependent on a partial key (partial key → non-key)
3rd Normal Form
Third Normal Form is violated if:
• Second Normal Form is violated
• there exists a non-key field(s) which is functionally dependent on another non-key field(s) (non-key → non-key)
Note: A candidate key is not a non-key field.
4th Normal Form
4th Normal Form is violated if:
• Boyce-Codd Normal Form is violated
• there exists a partial key which has multiple independent multivalued dependencies to other partial keys (partial-key1 →→ partial-key2, partial-key1 →→ partial-key3)
BCNF Normal Form
Boyce-Codd Normal Form is violated if:
• Third Normal Form is violated
• there exists a partial key which is functionally dependent on a non-key field(s) (non-key → partial-key)
Fifth Normal Form (5NF)
5NF is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships.
Fifth normal form (5NF) is also known as project-join normal form (PJ/NF).
A table is said to be in the 5NF if and only if every join dependency in it is implied by the candidate keys. A join dependency *{A, B, … Z} on R is implied by the candidate key(s) of R if and only if each of A, B, …, Z is a superkey for R.