Database Normalization How to convert a database to an efficient form


Page 1: Database normalization

Database Normalization

How to convert a database to an efficient form

Page 2: Database normalization


Purpose of Normalization

Normalization is a technique for producing a set of suitable relations

that support the data requirements of an enterprise.

© Pearson Education Limited 1995, 2005

Page 3: Database normalization

Purpose of Normalization

• Characteristics of a suitable set of relations include:
– the minimal number of attributes necessary to support the data requirements of the enterprise;
– attributes with a close logical relationship are found in the same relation;
– minimal redundancy, with each attribute represented only once, with the important exception of attributes that form all or part of foreign keys.

Page 4: Database normalization

Purpose of Normalization

• The benefits of a database with a suitable set of relations are that it will:
– be easier for the user to access and maintain;
– take up minimal storage space on the computer.

Page 5: Database normalization

Normal Forms

All attributes depend on the key, the whole key and nothing but the key.

1NF: Keys and no repeating groups
2NF: No partial dependencies
3NF: No transitive dependencies
BCNF: All determinants are candidate keys
4NF: No multivalued dependencies

Page 6: Database normalization

Update Anomalies

One of the major purposes of normalization: eliminate update anomalies, mainly by eliminating redundancies, which are identified through functional dependencies.

Page 7: Database normalization

Redundant Information in Tuples and Update Anomalies

• Consider the relation:

• Insert Anomaly:
– For a new employee, you have to assign NULL to projects
– Cannot insert a project unless an employee is assigned to it.

Page 8: Database normalization

Redundant Information in Tuples and Update Anomalies

• Consider the relation:

• Delete Anomaly:
– If we delete from EMP_DEPT an employee tuple that happens to represent the last employee working for a particular department, the information concerning that department is lost from the database

Page 9: Database normalization

Redundant Information in Tuples and Update Anomalies

• Consider the relation:

• Modify Anomaly:
– If we change the value of the manager of department 5, we must update the tuples of all employees who work in the department
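The modify and delete anomalies can be made concrete with a small sketch. The relation shown on the slide is not reproduced in this text, so the rows below (employee name, department number, department manager) are purely hypothetical stand-ins for EMP_DEPT.

```python
# Hypothetical EMP_DEPT rows: (employee, dept_number, dept_manager).
# The names and values are illustrative assumptions, not data from the slides.
emp_dept = [
    ("Smith", 5, "Wong"),
    ("Narayan", 5, "Wong"),
    ("Zelaya", 4, "Wallace"),
]


def change_manager(rows, dept, new_manager):
    """Modify anomaly: every row of the department has to be rewritten."""
    updated = [(e, d, new_manager if d == dept else m) for e, d, m in rows]
    touched = sum(1 for _, d, _ in rows if d == dept)
    return updated, touched


def delete_employee(rows, name):
    """Delete anomaly: dropping the last employee of a department
    also drops the only record that the department exists."""
    return [row for row in rows if row[0] != name]


def departments(rows):
    """Set of department numbers still recorded in the relation."""
    return {d for _, d, _ in rows}


updated_rows, rows_touched = change_manager(emp_dept, 5, "Borg")
remaining = departments(delete_employee(emp_dept, "Zelaya"))
```

Here `rows_touched` counts how many tuples one logical change forces us to rewrite, and `remaining` shows that department 4 disappears together with its last employee.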

Page 10: Database normalization

Null Values in Tuples

• As far as possible, avoid placing attributes in a base relation whose values may frequently be NULL

• For example: if only 10% of employees have individual offices, DO NOT include an attribute OFFICE_NUMBER in the EMPLOYEE relation

• Rather, a relation EMP_OFFICES(ESSN, OFFICE_NUMBER) can be created (just like a WEAK entity type).

Page 11: Database normalization

Functional Dependence

Key concept in normalization

The key to finding redundancy is to find functional dependence among the keys and attributes

Page 12: Database normalization

Functional Dependence

An attribute A is functionally dependent on attribute(s) B if, given a value b for B, there is one and only one corresponding value a for A (at a time).

[Figure: mapping diagram with values b1, b2, b3 (B) and a1 (A)]

Page 13: Database normalization

Example: functional dependence

All sales representatives in a given pay class have the same commission rate.

SalesRepNumber Name PayClass Commission

PayClass -> Commission

Page 14: Database normalization

Functional Dependency

• Definition: a Functional Dependency, denoted by X -> Y holds if whenever two tuples have the same value

for X, they must have the same value for Y
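The definition translates directly into a check over the rows of a table. The sketch below is a minimal illustration: the column names follow the sales representative example, but the row values are made up. A check like this can only refute a dependency on one instance; whether X -> Y really holds is a statement about all legal states of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs while differing on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample data for SalesRep(SalesRepNumber, Name, PayClass, Commission)
sales_reps = [
    {"SalesRepNumber": 3, "Name": "Mary Jones", "PayClass": 1, "Commission": 0.05},
    {"SalesRepNumber": 6, "Name": "William Smith", "PayClass": 1, "Commission": 0.05},
    {"SalesRepNumber": 12, "Name": "Sam Brown", "PayClass": 2, "Commission": 0.07},
]

pay_class_determines_commission = fd_holds(sales_reps, ["PayClass"], ["Commission"])
commission_determines_rep = fd_holds(sales_reps, ["Commission"], ["SalesRepNumber"])
```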

Page 15: Database normalization

Functional dependencies

• motivation: ‘good’ tables

takes1 (ssn, c-id, grade, name, address)

‘good’ or ‘bad’?

Page 16: Database normalization

Functional dependencies

takes1 (ssn, c-id, grade, name, address)

Ssn c-id Grade Name Address

123 413 A smith Main

123 415 B smith Main

123 211 A smith Main

Page 17: Database normalization

Functional dependencies

‘Bad’ - why?

Ssn c-id Grade Name Address

123 413 A smith Main

123 415 B smith Main

123 211 A smith Main

Page 18: Database normalization

Functional Dependencies

• Redundancy
– space
– inconsistencies
– insertion/deletion anomalies (later…)

• What caused the problem?

Page 19: Database normalization

Functional dependencies

• … ‘name’ depends on ‘ssn’
• define ‘depends’

Ssn c-id Grade Name Address

123 413 A smith Main

123 415 B smith Main

123 211 A smith Main

Page 20: Database normalization

Functional dependencies

Definition: ‘a’ functionally determines ‘b’

Ssn c-id Grade Name Address

123 413 A smith Main

123 415 B smith Main

123 211 A smith Main

a → b

Page 21: Database normalization

Functional dependencies

Informally: ‘if you know ‘a’, there is only one ‘b’ to match’

Ssn c-id Grade Name Address

123 413 A smith Main

123 415 B smith Main

123 211 A smith Main

Page 22: Database normalization

Functional dependencies

formally:

$X \rightarrow Y \;\equiv\; \big(t_1[x] = t_2[x] \Rightarrow t_1[y] = t_2[y]\big)$

if two tuples agree on the ‘X’ attribute, they *must* agree on the ‘Y’ attribute, too (e.g., if ssn is the same, so should address)

… a functional dependency is a generalization of the notion of a key

Page 23: Database normalization

Full functional dependency

An attribute is fully functionally dependent on a set of attributes X if it is:
• functionally dependent on X,
• not functionally dependent on any proper subset of X.

{Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also dependent on {Employee ID}.

Even after {Skill} is removed, the functional dependency between {Employee ID} and {Employee Address} still holds.

Page 24: Database normalization

Transitive dependency

A transitive dependency is an indirect functional dependency,

one in which

X→Z only by virtue of X→Y and Y→Z.

Page 25: Database normalization

Relation Attributes

A relation is made up of tuples over a set of attributes, each contributing aspects of the relation:

• Determinants
• Prime and non-prime attributes
• Keys: super, candidate, primary, foreign

The normalization process depends on the relationship between these types of attributes

Page 26: Database normalization

Determinant

• A determinant in a database table is any attribute that you can use to determine the values assigned to other attribute(s) in the same row.

Page 27: Database normalization

Non-prime attribute

Non-prime attribute: an attribute that does not occur in any candidate key.

Employee Address would be a non-prime attribute in the "Employees' Skills" table.

Prime attribute: conversely, an attribute that does occur in some candidate key.

Page 28: Database normalization

Definitions of Keys and Attributes Participating in Keys

• A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]

• A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Basically a set of attributes which uniquely identify a row

Page 29: Database normalization

Candidate Key

A super key reduced to the minimum number of columns required to uniquely identify each row.

In a sense, the candidate key is a minimal super key
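As a rough illustration of "minimal super key", the sketch below tests uniqueness and minimality on one table instance. The three sample rows are hypothetical. Uniqueness in a single instance is necessary but not sufficient: a true key must be unique in every legal state of the relation.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)


def is_minimal_key(rows, attrs):
    """True if attrs is a superkey and no proper subset of it is one."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


# Hypothetical rows: a module may use several texts, a text may serve several modules.
rows = [
    {"Module": "M1", "Lecturer": "L1", "Text": "T1"},
    {"Module": "M1", "Lecturer": "L1", "Text": "T2"},
    {"Module": "M2", "Lecturer": "L1", "Text": "T1"},
]

whole_row_is_superkey = is_superkey(rows, ("Module", "Lecturer", "Text"))
module_text_is_candidate = is_minimal_key(rows, ("Module", "Text"))
```

The whole row is a superkey but not minimal; {Module, Text} is the reduced form that still identifies each row.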

Page 30: Database normalization

Primary Key (C)

• C determines all attributes
• The candidate key actually chosen in the relation as the key

• A key consisting of more than one attribute is called a “composite key.”

Page 31: Database normalization

Good Primary Keys

• Do not change over the life of the database
• Are not “intelligent keys”
• Are not too long
• Do not consist of too many attributes (3 or fewer is good)

Page 32: Database normalization

Foreign Keys

A value in the “child” table that matches with the related value in the “parent” table.

SalesRep(SalesRepNumber, Name)
[ 03 | Mary Jones ]

Customer(CustomerNumber, SalesRepNumber)
[ 124 | 03 ]

Page 33: Database normalization

Foreign Keys

The foreign key in the child table has the same value as the primary key in the parent.

• The foreign key in a one-to-many relationship goes in the many table.

• In a many-to-many relationship, foreign keys from both tables go into an associative entity.

• In a 1-to-1 relationship the foreign key goes into one of the tables (usually the one most likely to change)

Page 34: Database normalization

Definitions of Keys and Attributes Participating in Keys

• If a relation schema has more than one key, each is called a candidate key.
– One of the candidate keys is arbitrarily designated to be the primary key

• A Prime attribute must be a member of some candidate key

• A Nonprime attribute is not a prime attribute—that is, it is not a member of any candidate key.

Page 35: Database normalization

Keys

• Determinant
– Any attribute that can determine values in a row

• Superkey
– Set of attributes which uniquely identifies a row

• Candidate Key
– A “minimal” (in terms of attributes) super key

• Primary Key
– The chosen candidate key to identify the row

Page 36: Database normalization

Normal Form

• Initially Codd (1972) presented three normal forms (1NF, 2NF and 3NF) all based on functional dependencies among the attributes of a relation.

• Later Boyce and Codd proposed another normal form called the Boyce-Codd normal form (BCNF).

• The fourth and fifth normal forms are based on multi-value and join dependencies and were proposed later.

• The primary objective of normalization is to avoid anomalies.

Page 37: Database normalization

Normalization of Relations

• Normalization:
– The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations

• Normal form:
– Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form

Page 38: Database normalization

List of Normal Forms

• First Normal Form (1NF)
– Atomic values

• 2NF, 3NF
– based on primary keys

• 4NF
– based on keys, multi-valued dependencies

• 5NF
– based on keys, join dependencies

Page 39: Database normalization

Practical Use of Normal Forms

• Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties

• The database designers need not normalize to the highest possible normal form– (usually up to 3NF, BCNF or 4NF)

Page 40: Database normalization

1st Normal Form

• Table has a primary key
• Table has no repeating groups

A multivalued attribute is an attribute that may have several values for one record

A repeating group is a set of one or more multivalued attributes that are related

Page 41: Database normalization

Example

• Multivalued attribute:
Orders(OrderNumber, OrderDate, {PartNumber})
[ 12491 | 9/02/2001 | BT04, BZ66 ]

• Repeating group:
Orders(OrderNumber, OrderDate, {PartNumber, NumberOrdered})
[ 12491 | 9/02/2001 | (BT04, 1), (BZ66, 1) ]

Page 42: Database normalization

1NF Example – Schema 2 (incorrect)

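Employees Table

| EMPID | LNAME    | FNAME  | DEPT | PHONE    | SALARY | SEX | LANGUAGES            |
|-------+----------+--------+------+----------+--------+-----+----------------------|
| 23    | Jones    | Mark   | ITR  | 555-1087 | 45000  | M   | COBOL, JAVA, SQL     |
| 25    | Smith    | Sara   | FINC | 555-2222 | 55000  | F   |                      |
| 26    | Billings | David  | ACTG | 555-4356 | 42000  | M   |                      |
| 31    | Dance    | Ivanna | ACTG | 444-4887 | 60000  | F   | SQL                  |
| 32    | Jones    | Mary   | ITR  | 555-8745 | 70000  | F   | JAVA, SQL, VB, COBOL |
| 35    | Barker   | Bob    | ACTG | 555-6565 | 44000  | M   |                      |
| 36    | Woods    | Robin  | ITR  | 555-9812 | 90000  | M   | VB, SQL, JAVA        |
| 37    | Jones    | Mary   | FINC | 555-1234 | 56000  | F   | COBOL, SQL           |

Languages Table

| NAME  | FULLNAME                          |
|-------+-----------------------------------|
| COBOL | COmmon Business Oriented Language |
| SQL   | Structured Query Language         |
| JAVA  | JAVA                              |
| VB    | Visual Basic                      |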

Page 43: Database normalization

1NF Example – Schema 3 (incorrect)

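Employees Table

| EMPID | LNAME    | FNAME  | DEPT | PHONE    | SALARY | SEX | LANG1 | LANG2 | LANG3 | LANG4 |
|-------+----------+--------+------+----------+--------+-----+-------+-------+-------+-------|
| 23    | Jones    | Mark   | ITR  | 555-1087 | 45000  | M   | COBOL | JAVA  | SQL   |       |
| 25    | Smith    | Sara   | FINC | 555-2222 | 55000  | F   |       |       |       |       |
| 26    | Billings | David  | ACTG | 555-4356 | 42000  | M   |       |       |       |       |
| 31    | Dance    | Ivanna | ACTG | 444-4887 | 60000  | F   | SQL   |       |       |       |
| 32    | Jones    | Mary   | ITR  | 555-8745 | 70000  | F   | JAVA  | SQL   | VB    | COBOL |
| 35    | Barker   | Bob    | ACTG | 555-6565 | 44000  | M   |       |       |       |       |
| 36    | Woods    | Robin  | ITR  | 555-9812 | 90000  | M   | VB    | SQL   | JAVA  |       |
| 37    | Jones    | Mary   | FINC | 555-1234 | 56000  | F   | COBOL | SQL   |       |       |

Languages Table

| NAME  | FULLNAME                          |
|-------+-----------------------------------|
| COBOL | COmmon Business Oriented Language |
| SQL   | Structured Query Language         |
| JAVA  | JAVA                              |
| VB    | Visual Basic                      |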

Page 44: Database normalization

1NF Example – Schema 4 (incorrect)

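Employees Table

| EMPID | LNAME    | FNAME  | DEPT | PHONE    | SALARY | SEX | COBOL | JAVA | SQL | VB |
|-------+----------+--------+------+----------+--------+-----+-------+------+-----+----|
| 23    | Jones    | Mark   | ITR  | 555-1087 | 45000  | M   | T     | T    | T   | F  |
| 25    | Smith    | Sara   | FINC | 555-2222 | 55000  | F   | F     | F    | F   | F  |
| 26    | Billings | David  | ACTG | 555-4356 | 42000  | M   | F     | F    | F   | F  |
| 31    | Dance    | Ivanna | ACTG | 444-4887 | 60000  | F   | F     | F    | T   | F  |
| 32    | Jones    | Mary   | ITR  | 555-8745 | 70000  | F   | T     | T    | T   | T  |
| 35    | Barker   | Bob    | ACTG | 555-6565 | 44000  | M   | F     | F    | F   | F  |
| 36    | Woods    | Robin  | ITR  | 555-9812 | 90000  | M   | F     | T    | T   | T  |
| 37    | Jones    | Mary   | FINC | 555-1234 | 56000  | F   | T     | F    | T   | F  |

Languages Table

| NAME  | FULLNAME                          |
|-------+-----------------------------------|
| COBOL | COmmon Business Oriented Language |
| SQL   | Structured Query Language         |
| JAVA  | JAVA                              |
| VB    | Visual Basic                      |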

Page 45: Database normalization

1NF Example – Schema 1 (correct)

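Employees Table

| EMPID | LNAME    | FNAME  | DEPT | PHONE    | SALARY | SEX |
|-------+----------+--------+------+----------+--------+-----|
| 23    | Jones    | Mark   | ITR  | 555-1087 | 45000  | M   |
| 25    | Smith    | Sara   | FINC | 555-2222 | 55000  | F   |
| 26    | Billings | David  | ACTG | 555-4356 | 42000  | M   |
| 31    | Dance    | Ivanna | ACTG | 444-4887 | 60000  | F   |
| 32    | Jones    | Mary   | ITR  | 555-8745 | 70000  | F   |
| 35    | Barker   | Bob    | ACTG | 555-6565 | 44000  | M   |
| 36    | Woods    | Robin  | ITR  | 555-9812 | 90000  | M   |
| 37    | Jones    | Mary   | FINC | 555-1234 | 56000  | F   |

Programs Table

| EMPID | LANGUAGE |
|-------+----------|
| 23    | COBOL    |
| 23    | JAVA     |
| 23    | SQL      |
| 31    | SQL      |
| 32    | COBOL    |
| 32    | JAVA     |
| 32    | SQL      |
| 32    | VB       |
| 36    | JAVA     |
| 36    | SQL      |
| 36    | VB       |
| 37    | COBOL    |
| 37    | SQL      |

Languages Table

| NAME  | FULLNAME                          |
|-------+-----------------------------------|
| COBOL | COmmon Business Oriented Language |
| SQL   | Structured Query Language         |
| JAVA  | JAVA                              |
| VB    | Visual Basic                      |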

Page 46: Database normalization

Normalisation to 1NF

Unnormalised

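| Module | Dept | Lecturer | Texts  |
|--------+------+----------+--------|
| M1     | D1   | L1       | T1, T2 |
| M2     | D1   | L1       | T1, T3 |
| M3     | D1   | L2       | T4     |
| M4     | D2   | L3       | T1, T5 |
| M5     | D2   | L4       | T6     |

1NF

| Module | Dept | Lecturer | Text |
|--------+------+----------+------|
| M1     | D1   | L1       | T1   |
| M1     | D1   | L1       | T2   |
| M2     | D1   | L1       | T1   |
| M2     | D1   | L1       | T3   |
| M3     | D1   | L2       | T4   |
| M4     | D2   | L3       | T1   |
| M4     | D2   | L3       | T5   |
| M5     | D2   | L4       | T6   |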

To convert to a 1NF relation, split up any non-atomic values
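The splitting step can be sketched in a few lines. The data is the unnormalised Module table from this slide; representing the Texts column as a comma-separated string is an assumption made for the illustration.

```python
# Unnormalised relation: the last column holds a non-atomic list of texts.
unnormalised = [
    ("M1", "D1", "L1", "T1, T2"),
    ("M2", "D1", "L1", "T1, T3"),
    ("M3", "D1", "L2", "T4"),
    ("M4", "D2", "L3", "T1, T5"),
    ("M5", "D2", "L4", "T6"),
]


def to_1nf(rows):
    """Emit one row per (module, text) pair so that every value is atomic."""
    return [
        (module, dept, lecturer, text.strip())
        for module, dept, lecturer, texts in rows
        for text in texts.split(",")
    ]


first_normal_form = to_1nf(unnormalised)
```

The five original rows become eight, and the key changes from {Module} to {Module, Text}.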

Page 47: Database normalization

Problems in 1NF

• INSERT anomalies
– Can't add a module with no texts

• UPDATE anomalies
– To change lecturer for M1, we have to change two rows

• DELETE anomalies
– If we remove M3, we remove L2 as well

1NF

| Module | Dept | Lecturer | Text |
|--------+------+----------+------|
| M1     | D1   | L1       | T1   |
| M1     | D1   | L1       | T2   |
| M2     | D1   | L1       | T1   |
| M2     | D1   | L1       | T3   |
| M3     | D1   | L2       | T4   |
| M4     | D2   | L3       | T1   |
| M4     | D2   | L3       | T5   |
| M5     | D2   | L4       | T6   |

Page 48: Database normalization

2nd Normal Form

Second Normal Form is violated if:

• First Normal Form is violated
• There exists a non-key field(s) which is functionally dependent on a partial key.

partial key → non-key

Page 49: Database normalization

2NF Example – Violation

Transactions Table

| JENO | LINENO | DESCRIPTION        | ACCTNO | ACCTNAME      | AMOUNT   | DATE        |
|------+--------+--------------------+--------+---------------+----------+-------------|
| 1    | 1      | Owner investment   | 100    | Cash          | 20,000   | 02-JAN-2003 |
| 1    | 2      | Owner investment   | 310    | Smith-Capital | (20,000) | 02-JAN-2003 |
| 2    | 1      | Borrowed money     | 100    | Cash          | 30,000   | 03-JAN-2003 |
| 2    | 2      | Borrowed money     | 220    | Notes Payable | (30,000) | 03-JAN-2003 |
| 3    | 1      | Purchased Supplies | 120    | Supplies      | 5,000    | 03-JAN-2003 |
| 3    | 2      | Purchased Supplies | 100    | Cash          | (1,000)  | 03-JAN-2003 |
| 3    | 3      | Purchased Supplies | 220    | Notes Payable | (4,000)  | 03-JAN-2003 |

Is there a non-key field which is functionally dependent on a partial key?

Page 50: Database normalization

2NF Example – ViolationFDs that indicate violation of 2NF

JENO LINENO DESCRIPTION ACCTNO ACCTNAME AMOUNT1 1 Owner investment 100 Cash 20,000 1 2 Owner investment 310 Smith-Capital (20,000)2 1 Borrowed money 100 Cash 30,000 2 2 Borrowed money 220 Notes Payable (30,000)3 1 Purchased Supplies 120 Supplies 5,000 3 2 Purchased Supplies 100 Cash (1,000)3 3 Purchased Supplies 220 Notes Payable (4,000)

DATE02-JAN-2003

03-JAN-200302-JAN-2003

03-JAN-200303-JAN-200303-JAN-200303-JAN-2003

Page 51: Database normalization

2NF Example – Corrected

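Transactions Table

| JENO | LINENO | ACCTNO | ACCTNAME      | AMOUNT   |
|------+--------+--------+---------------+----------|
| 1    | 1      | 100    | Cash          | 20,000   |
| 1    | 2      | 310    | Smith-Capital | (20,000) |
| 2    | 1      | 100    | Cash          | 30,000   |
| 2    | 2      | 220    | Notes Payable | (30,000) |
| 3    | 1      | 120    | Supplies      | 5,000    |
| 3    | 2      | 100    | Cash          | (1,000)  |
| 3    | 3      | 220    | Notes Payable | (4,000)  |

Journal_Entry Table

| JENO | DESCRIPTION        | DATE        |
|------+--------------------+-------------|
| 1    | Owner investment   | 02-JAN-2003 |
| 2    | Borrowed money     | 03-JAN-2003 |
| 3    | Purchased Supplies | 03-JAN-2003 |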

Page 52: Database normalization

Removing FDs

• Suppose we have a relation R with scheme S and the FD A → B where A ∩ B = { }
• Let C = S – (A U B)
• In other words:
– A – attributes on the left hand side of the FD
– B – attributes on the right hand side of the FD
– C – all other attributes
• It turns out that we can split R into two parts:
• R1, with scheme C U A
• R2, with scheme A U B
• The original relation can be recovered as the natural join of R1 and R2:
• R = R1 NATURAL JOIN R2
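The split and the recovery by natural join can be checked on the Module relation used elsewhere in these slides, taking A = {Module}, B = {Dept, Lecturer} and C = {Text}. The helper functions `project`, `natural_join` and `as_set` are illustrative names introduced here, written as a minimal sketch rather than an efficient implementation.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dicts."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(seen)]


def natural_join(r1, r2):
    """Join rows that agree on all attributes the two relations share."""
    joined = []
    for a in r1:
        for b in r2:
            shared = set(a) & set(b)
            if all(a[k] == b[k] for k in shared):
                joined.append({**a, **b})
    return joined


def as_set(rows):
    """Order-independent representation of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


columns = ("Module", "Dept", "Lecturer", "Text")
r = [dict(zip(columns, values)) for values in [
    ("M1", "D1", "L1", "T1"), ("M1", "D1", "L1", "T2"),
    ("M2", "D1", "L1", "T1"), ("M2", "D1", "L1", "T3"),
    ("M3", "D1", "L2", "T4"), ("M4", "D2", "L3", "T1"),
    ("M4", "D2", "L3", "T5"), ("M5", "D2", "L4", "T6"),
]]

r1 = project(r, ("Module", "Text"))              # scheme C U A
r2 = project(r, ("Module", "Dept", "Lecturer"))  # scheme A U B
recovered = natural_join(r1, r2)
```

Because {Module} determines Dept and Lecturer, each row of `r1` matches exactly one row of `r2`, so the join neither loses nor invents tuples.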

Page 53: Database normalization

Second Normal Form

• 1NF is not in 2NF
– We have the FD {Module, Text} → {Lecturer, Dept}
– But also {Module} → {Lecturer, Dept}
– And so Lecturer and Dept are partially dependent on the primary key

1NF

Module Dept Lecturer Text

M1 D1 L1 T1 M1 D1 L1 T2 M2 D1 L1 T1 M2 D1 L1 T3 M3 D1 L2 T4 M4 D2 L3 T1 M4 D2 L3 T5 M5 D2 L4 T6

Page 54: Database normalization

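1NF to 2NF – Example

1NF

| Module | Dept | Lecturer | Text |
|--------+------+----------+------|
| M1     | D1   | L1       | T1   |
| M1     | D1   | L1       | T2   |
| M2     | D1   | L1       | T1   |
| M2     | D1   | L1       | T3   |
| M3     | D1   | L2       | T4   |
| M4     | D2   | L3       | T1   |
| M4     | D2   | L3       | T5   |
| M5     | D2   | L4       | T6   |

2NFa

| Module | Dept | Lecturer |
|--------+------+----------|
| M1     | D1   | L1       |
| M2     | D1   | L1       |
| M3     | D1   | L2       |
| M4     | D2   | L3       |
| M5     | D2   | L4       |

2NFb

| Module | Text |
|--------+------|
| M1     | T1   |
| M1     | T2   |
| M2     | T1   |
| M2     | T3   |
| M3     | T4   |
| M4     | T1   |
| M4     | T5   |
| M5     | T6   |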

Page 55: Database normalization

Problems Resolved in 2NF

• Problems in 1NF
– INSERT – Can't add a module with no texts
– UPDATE – To change lecturer for M1, we have to change two rows
– DELETE – If we remove M3, we remove L2 as well

• In 2NF the first two are resolved, but not the third one

2NFa

Module Dept Lecturer

M1 D1 L1 M2 D1 L1 M3 D1 L2 M4 D2 L3 M5 D2 L4

Page 56: Database normalization

Problems Remaining in 2NF

• INSERT anomalies
– Can't add lecturers who teach no modules

• UPDATE anomalies
– To change the department for L1 we must alter two rows

• DELETE anomalies
– If we delete M3 we delete L2 as well

2NFa

Module Dept Lecturer

M1 D1 L1 M2 D1 L1 M3 D1 L2 M4 D2 L3 M5 D2 L4

Page 57: Database normalization

Transitive FDs and 3NF

• Transitive FDs:
– A FD, A → C is a transitive FD, if there is some set B such that A → B and B → C are non-trivial FDs
– A → B non-trivial means: B is not a subset of A
– We have A → B → C

• Third normal form
– A relation is in third normal form (3NF) if it is in 2NF and no non-key attribute is transitively dependent on a candidate key

Page 58: Database normalization

3rd Normal Form

Third Normal Form is violated if:

• Second Normal Form is violated
• There exists a non-key field(s) which is functionally dependent on another non-key field(s).

non-key → non-key

Note: A candidate key is not a non-key field.

Page 59: Database normalization

3NF from Codd

Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent)

on every superkey of R.

Non-prime attribute: an attribute that does not occur in any candidate key.

Superkey: a combination of attributes used to uniquely identify a row. A table might have many superkeys.

Page 60: Database normalization

3NF Example – Violation

JENO LINENO ACCTNO ACCTNAME AMOUNT1 1 100 Cash 20,000 1 2 310 Smith-Capital (20,000)2 1 100 Cash 30,000 2 2 220 Notes Payable (30,000)3 1 120 Supplies 5,000 3 2 100 Cash (1,000)3 3 220 Notes Payable (4,000)

Transactions Table

JENO DESCRIPTION1 Owner investment2 Borrowed money3 Purchased Supplies

DATE02-JAN-200303-JAN-200303-JAN-2003

Journal_Entry Table

Are there any non-key fields which functionally determine another non-key field?

Are there any redundant facts?

Page 61: Database normalization

3NF Example – ViolationFD that indicates violation of 3NF

JENO LINENO ACCTNO ACCTNAME AMOUNT1 1 100 Cash 20,000 1 2 310 Smith-Capital (20,000)2 1 100 Cash 30,000 2 2 220 Notes Payable (30,000)3 1 120 Supplies 5,000 3 2 100 Cash (1,000)3 3 220 Notes Payable (4,000)

JENO DESCRIPTION1 Owner investment2 Borrowed money3 Purchased Supplies

DATE02-JAN-200303-JAN-200303-JAN-2003

Journal_Entry Table

Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every superkey of R.

A non-prime attribute, ACCTNAME, is dependent on something other than a superkey: ACCTNO.

Page 62: Database normalization

3NF Example – ViolationFD that indicates violation of 3NF

JENO LINENO ACCTNO ACCTNAME AMOUNT1 1 100 Cash 20,000 1 2 310 Smith-Capital (20,000)2 1 100 Cash 30,000 2 2 220 Notes Payable (30,000)3 1 120 Supplies 5,000 3 2 100 Cash (1,000)3 3 220 Notes Payable (4,000)

JENO DESCRIPTION1 Owner investment2 Borrowed money3 Purchased Supplies

DATE02-JAN-200303-JAN-200303-JAN-2003

Journal_Entry Table

Anomalies if not corrected:

• update (if the name of account 100 changes, it must be changed in multiple places, risking inconsistency)
• deletion (can't delete JE#3 and its transactions without losing information about account 120)
• insertion (can't set up a new account, Jones-capital, for a new partner unless we first have a transaction involving that account)

Page 63: Database normalization

3NF Example – Corrected

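Transactions Table

| JENO | LINENO | ACCTNO | AMOUNT   |
|------+--------+--------+----------|
| 1    | 1      | 100    | 20,000   |
| 1    | 2      | 310    | (20,000) |
| 2    | 1      | 100    | 30,000   |
| 2    | 2      | 220    | (30,000) |
| 3    | 1      | 120    | 5,000    |
| 3    | 2      | 100    | (1,000)  |
| 3    | 3      | 220    | (4,000)  |

Journal_Entry Table

| JENO | DESCRIPTION        | DATE        |
|------+--------------------+-------------|
| 1    | Owner investment   | 02-JAN-2003 |
| 2    | Borrowed money     | 03-JAN-2003 |
| 3    | Purchased Supplies | 03-JAN-2003 |

Accounts Table

| ACCTNO | ACCTNAME      |
|--------+---------------|
| 100    | Cash          |
| 310    | Smith-Capital |
| 220    | Notes Payable |
| 120    | Supplies      |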

Page 64: Database normalization

3NF Example – CorrectedFinal Dependencies

JENO LINENO ACCTNO AMOUNT1 1 100 20,000 1 2 310 (20,000)2 1 100 30,000 2 2 220 (30,000)3 1 120 5,000 3 2 100 (1,000)3 3 220 (4,000)

JENO DESCRIPTION1 Owner investment2 Borrowed money3 Purchased Supplies

DATE02-JAN-200303-JAN-200303-JAN-2003

ACCTNO ACCTNAME100 Cash

310 Smith-Capital220 Notes Payable120 Supplies

All non-key fields are FD on the PK and only the PK.

Page 65: Database normalization

Third Normal Form

• 2NFa is not in 3NF
– We have the FDs {Module} → {Lecturer} and {Lecturer} → {Dept}
– So there is a transitive FD from the primary key {Module} to {Dept}

2NFa

Module Dept Lecturer

M1 D1 L1 M2 D1 L1 M3 D1 L2 M4 D2 L3 M5 D2 L4

Page 66: Database normalization

2NF to 3NF – Example

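2NFa

| Module | Dept | Lecturer |
|--------+------+----------|
| M1     | D1   | L1       |
| M2     | D1   | L1       |
| M3     | D1   | L2       |
| M4     | D2   | L3       |
| M5     | D2   | L4       |

3NFa

| Lecturer | Dept |
|----------+------|
| L1       | D1   |
| L2       | D1   |
| L3       | D2   |
| L4       | D2   |

3NFb

| Module | Lecturer |
|--------+----------|
| M1     | L1       |
| M2     | L1       |
| M3     | L2       |
| M4     | L3       |
| M5     | L4       |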

Page 67: Database normalization

Problems Resolved in 3NF

• Problems in 2NF
– INSERT – Can't add lecturers who teach no modules
– UPDATE – To change the department for L1 we must alter two rows
– DELETE – If we delete M3 we delete L2 as well

• In 3NF all of these are resolved (for this relation, but 3NF can still have anomalies!)

3NFa

Lecturer Dept

L1 D1 L2 D1 L3 D2 L4 D2

3NFb

Module Lecturer

M1 L1 M2 L1 M3 L2 M4 L3 M5 L4

Page 68: Database normalization

4th Normal Form

4th Normal Form is violated if:

• Boyce-Codd Normal Form is violated
• There exists a partial key which has multiple independent multi-valued functional dependencies to other partial keys.

partial-key1 →→ partial-key2 | partial-key3

Page 69: Database normalization

4NF Example – Violation

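Instruments_Languages

| Name | Language | Instrument |
|------+----------+------------|
| Fred | French   | Piano      |
| Fred | Italian  | Flute      |
| Fred | Spanish  | Flute      |
| Jane | French   | Piano      |
| Jane | French   | Oboe       |
| Sam  | French   | Piano      |
| Sam  | Spanish  | Oboe       |
| Sam  | Spanish  | Flute      |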

Page 70: Database normalization

4NF Example – Violation

Name LanguageFred FrenchFred ItalianFred Spanish

InstrumentPianoFluteFlute

Jane FrenchJane French

PianoOboe

Sam FrenchSam SpanishSam Spanish

PianoOboeFlute

Does this relation violate 1NF, 2NF, 3NF, or BCNF?

Are there any redundant facts?

Page 71: Database normalization

4NF Example – Correction

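LanguagesSpoken

| Name | Language |
|------+----------|
| Fred | French   |
| Fred | Italian  |
| Fred | Spanish  |
| Jane | French   |
| Sam  | French   |
| Sam  | Spanish  |

InstrumentsPlayed

| Name | Instrument |
|------+------------|
| Fred | Piano      |
| Fred | Flute      |
| Jane | Piano      |
| Jane | Oboe       |
| Sam  | Piano      |
| Sam  | Oboe       |
| Sam  | Flute      |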

Page 72: Database normalization

Boyce-Codd Normal Form

Boyce-Codd Normal Form is violated if:

• Third Normal Form is violated
• There exists a partial key which is functionally dependent on a non-key field(s).

non-key → partial-key

Page 73: Database normalization

Boyce–Codd normal form

A relational schema R is in Boyce–Codd normal form if and only if for every one of its dependencies X → Y, at least one of the following conditions holds:

• X → Y is a trivial functional dependency (Y ⊆ X)
• X is a superkey for schema R

Trivial functional dependency: a functional dependency of an attribute on a superset of itself (Y ⊆ X)

Superkey: set of attributes which uniquely identifies a row

Page 74: Database normalization

3NF Examples

• ABC {AB → C, C → A}
– In 3NF (but not in BCNF)
– C is not a superkey, but A is in a candidate key

• ABC {A → B, B → C}
– Not in 3NF
– B is not a superkey, and C is not in a candidate key

• ABCD {AB → C, B → D}
– Not in 3NF
– B is not a superkey, and D is not in a candidate key

Page 75: Database normalization

BCNF

• {AB → C, C → A}

• AB composite key
• One of the prime attributes in the composite key (A) is dependent on an attribute (C) that is not a superkey

Page 76: Database normalization

BCNF Example

Semantics

• A student can have more than one major
• A student has a different advisor for each major.
• Each advisor advises for only one major.

Page 77: Database normalization

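BCNF Example – Violation

Student_Majors Table

| SID | MAJOR            | ADVISOR    |
|-----+------------------+------------|
| 1   | PHYSICS          | EINSTEIN   |
| 1   | BIOLOGY          | LIVINGSTON |
| 2   | PHYSICS          | BOHR       |
| 2   | COMPUTER SCIENCE | CODD       |
| 3   | PHYSICS          | EINSTEIN   |
| 4   | BIOLOGY          | LIVINGSTON |
| 4   | ACCOUNTING       | PACIOLI    |
| 5   | PHYSICS          | EINSTEIN   |
| 6   | PHYSICS          | BOHR       |
| 6   | BIOLOGY          | DARWIN     |
| 7   | COMPUTER SCIENCE | CODD       |
| 7   | BIOLOGY          | DARWIN     |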

Page 78: Database normalization

BCNF Example – ViolationFD that violates BCNF

SID MAJOR ADVISOR1 PHYSICS EINSTEIN1 BIOLOGY LIVINGSTON2 PHYSICS BOHR2 COMPUTER SCIENCE CODD3 PHYSICS EINSTEIN4 BIOLOGY LIVINGSTON4 ACCOUNTING PACIOLI5 PHYSICS EINSTEIN6 PHYSICS BOHR6 BIOLOGY DARWIN7 COMPUTER SCIENCE CODD7 BIOLOGY DARWIN

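It is important that you convince yourself that MAJOR is functionally dependent on ADVISOR (each advisor advises for only one major), but that ADVISOR is not functionally dependent on MAJOR.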
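The BCNF condition (every non-trivial FD has a superkey on its left side) can be tested mechanically once the FDs are written down. The sketch below encodes the stated semantics of the Student_Majors example as two FDs, which is an interpretation of the slide text: {SID, MAJOR} → ADVISOR and ADVISOR → MAJOR. The function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(schema)
    ]


schema = {"SID", "MAJOR", "ADVISOR"}
fds = [
    ({"SID", "MAJOR"}, {"ADVISOR"}),  # one advisor per student and major
    ({"ADVISOR"}, {"MAJOR"}),         # each advisor advises for only one major
]

violations = bcnf_violations(schema, fds)
```

Only ADVISOR → MAJOR is reported, since the closure of {ADVISOR} does not reach SID, which matches the violation discussed on this slide.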

Page 79: Database normalization

BCNF Example – Corrected

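Student_Advisors Table

| SID | ADVISOR    |
|-----+------------|
| 1   | EINSTEIN   |
| 1   | LIVINGSTON |
| 2   | BOHR       |
| 2   | CODD       |
| 3   | EINSTEIN   |
| 4   | LIVINGSTON |
| 4   | PACIOLI    |
| 5   | EINSTEIN   |
| 6   | BOHR       |
| 6   | DARWIN     |
| 7   | CODD       |
| 7   | DARWIN     |

Advisors Table

| MAJOR            | ADVISOR    |
|------------------+------------|
| PHYSICS          | EINSTEIN   |
| BIOLOGY          | LIVINGSTON |
| PHYSICS          | BOHR       |
| COMPUTER SCIENCE | CODD       |
| ACCOUNTING       | PACIOLI    |
| BIOLOGY          | DARWIN     |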

Page 80: Database normalization

Fifth Normal Form (5NF)

5NF is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships.

Fifth normal form (5NF) is also known as project-join normal form (PJ/NF).

A table is said to be in the 5NF if and only if every join dependency in it is implied by the candidate keys.

A join dependency *{A, B, … Z} on R is implied by the candidate key(s) of R if and only if each of A, B, …, Z is a superkey for R.

Page 81: Database normalization

Join dependency

A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.

A table is said to be in the 5NF if and only if every join dependency in it is implied by the candidate keys.

Page 82: Database normalization

5NF Example

If agents represent companies, companies make products, and agents sell products, then we might want to keep a record of which agent sells which product for which company.

-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford    | car     |
| Smith | Ford    | truck   |
| Smith | GM      | car     |
| Smith | GM      | truck   |
| Jones | Ford    | car     |
-----------------------------

http://www.bkent.net/Doc/simple5.htm

Page 83: Database normalization

5NF Example

This form is necessary in the general case.

For example, although agent Smith sells cars made by Ford and trucks made by GM, he does not sell Ford trucks or GM cars.

Thus we need the combination of three fields to know which combinations are valid and which are not.

-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford    | car     |
| Smith | GM      | truck   |
-----------------------------

Page 84: Database normalization

5NF Example

But suppose that a certain rule was in effect: if an agent sells a certain product, and he represents a company making that product, then he sells that product for that company.

In this case, we can reconstruct all the true facts from a normalized form consisting of three separate record types, each containing two fields:

-------------------   ---------------------   -------------------
| AGENT | COMPANY |   | COMPANY | PRODUCT |   | AGENT | PRODUCT |
|-------+---------|   |---------+---------|   |-------+---------|
| Smith | Ford    |   | Ford    | car     |   | Smith | car     |
| Smith | GM      |   | Ford    | truck   |   | Smith | truck   |
| Jones | Ford    |   | GM      | car     |   | Jones | car     |
-------------------   | GM      | truck   |   -------------------
                      ---------------------

Page 85: Database normalization

5NF Example

isolating semantically related multiple relationships.

we can reconstruct all the true facts from a normalized form consisting of three separate record types, each containing two fields

A record type is in fifth normal form when its information content cannot be reconstructed from several smaller record types,

i.e., from record types each having fewer fields than the original record.

Join dependency

A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.

Page 86: Database normalization

5NF Example

-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford    | car     |
| Smith | Ford    | truck   |
| Smith | GM      | car     |
| Smith | GM      | truck   |
| Jones | Ford    | car     |
-----------------------------

-------------------   ---------------------   -------------------
| AGENT | COMPANY |   | COMPANY | PRODUCT |   | AGENT | PRODUCT |
|-------+---------|   |---------+---------|   |-------+---------|
| Smith | Ford    |   | Ford    | car     |   | Smith | car     |
| Smith | GM      |   | Ford    | truck   |   | Smith | truck   |
| Jones | Ford    |   | GM      | car     |   | Jones | car     |
-------------------   | GM      | truck   |   -------------------
                      ---------------------

Join dependency

A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.
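Whether a particular table instance can be recreated from its three two-column projections can be checked directly. The sketch below uses the five-row agent/company/product table from this slide, plus a second, hypothetical table in which the rule does not hold. Passing the check on one instance does not prove the join dependency, which is a constraint on all legal states, but failing it is enough to refute it.

```python
def project(table, *positions):
    """Project a set of tuples onto the given column positions."""
    return {tuple(row[i] for i in positions) for row in table}


def rejoin(agent_company, company_product, agent_product):
    """Natural join of the three two-column projections."""
    return {
        (agent, company, product)
        for agent, company in agent_company
        for company2, product in company_product
        if company == company2 and (agent, product) in agent_product
    }


def recreated_by_projections(table):
    """True if joining the three projections gives back exactly the table."""
    return rejoin(
        project(table, 0, 1), project(table, 1, 2), project(table, 0, 2)
    ) == table


with_rule = {
    ("Smith", "Ford", "car"), ("Smith", "Ford", "truck"),
    ("Smith", "GM", "car"), ("Smith", "GM", "truck"),
    ("Jones", "Ford", "car"),
}

# Hypothetical counter-example: the join invents (Smith, Ford, truck).
without_rule = {
    ("Smith", "Ford", "car"), ("Smith", "GM", "truck"),
    ("Jones", "Ford", "truck"),
}

rule_case_ok = recreated_by_projections(with_rule)
general_case_ok = recreated_by_projections(without_rule)
```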

Page 87: Database normalization

Denormalization

• Creation of normalized relations is an important database design goal
• Processing requirements should also be a goal
• If tables are decomposed to conform to normalization requirements:
– Number of database tables expands

Page 88: Database normalization


Denormalization (continued)

• Joining the larger number of tables takes additional input/output (I/O) operations and processing logic, thereby reducing system speed

• Conflicts between design efficiency, information requirements, and processing speed are often resolved through compromises that may include denormalization

Page 89: Database normalization

Denormalization (continued)

• Unnormalized tables in a production database tend to suffer from these defects:
– Data updates are less efficient because programs that read and update tables must deal with larger tables
– Indexing is more cumbersome
– Unnormalized tables yield no simple strategies for creating virtual tables known as views

Page 90: Database normalization

Denormalization (continued)

• Use denormalization cautiously
• Understand why, under some circumstances, unnormalized tables are a better choice

Page 91: Database normalization

Functional dependencies

• ssn -> name, address
• ssn, c-id -> grade

Ssn c-id Grade Name Address

123 413 A smith Main

123 415 B smith Main

123 211 A smith Main

Page 92: Database normalization

Functional dependencies

K is a superkey for relation R iff K -> R

K is a candidate key for relation R iff:
• K -> R
• for no a ⊂ K, a -> R

Page 93: Database normalization

Functional dependencies

Closure of a set of FD: all implied FDs – e.g.:
ssn -> name, address
ssn, c-id -> grade

imply
ssn, c-id -> grade, name, address
ssn, c-id -> ssn

Page 94: Database normalization

FDs - Armstrong’s axioms

Closure of a set of FDs: all implied FDs, e.g.:
  ssn -> name, address
  ssn, c-id -> grade

How to find all the implied ones, systematically?

Page 95: Database normalization

FDs - Armstrong’s axioms

“Armstrong’s axioms” guarantee soundness and completeness:

• Reflexivity: if Y ⊆ X then X -> Y
  e.g., ssn, name -> ssn
• Augmentation: if X -> Y then XW -> YW
  e.g., ssn -> name then ssn, grade -> name, grade

Page 96: Database normalization

FDs - Armstrong’s axioms

• Transitivity: if X -> Y and Y -> Z then X -> Z
  e.g., ssn -> address and address -> county-tax-rate
  THEN: ssn -> county-tax-rate

Page 97: Database normalization

FDs - Armstrong’s axioms

Reflexivity: if Y ⊆ X then X -> Y

Augmentation: if X -> Y then XW -> YW

Transitivity: if X -> Y and Y -> Z then X -> Z

‘sound’ and ‘complete’

Page 98: Database normalization

FDs – finding the closure F+

F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            then add the resulting functional dependency to F+
until F+ does not change any further
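The loop above can be sketched directly in Python for a small schema. This is an illustrative brute-force version, not an efficient algorithm: it enumerates every subset of the attributes, so it is exponential in the number of attributes. The names `all_subsets` and `fd_closure` are assumptions, and FDs are represented as (lhs, rhs) pairs of frozensets.

```python
from itertools import combinations


def all_subsets(attrs):
    """Every subset of attrs, as frozensets (including the empty set)."""
    items = sorted(attrs)
    return [frozenset(c) for n in range(len(items) + 1)
            for c in combinations(items, n)]


def fd_closure(schema, fds):
    """Compute F+ over schema by applying Armstrong's axioms to a fixpoint."""
    subsets = all_subsets(schema)
    # Reflexivity: X -> Y whenever Y is a subset of X.
    f_plus = {(x, y) for x in subsets for y in subsets if y <= x}
    f_plus |= {(frozenset(l), frozenset(r)) for l, r in fds}
    while True:
        new = set()
        # Augmentation: from X -> Y derive XW -> YW for every W.
        for x, y in f_plus:
            for w in subsets:
                new.add((x | w, y | w))
        # Transitivity: from X -> Y and Y -> Z derive X -> Z.
        by_lhs = {}
        for x, y in f_plus:
            by_lhs.setdefault(x, []).append(y)
        for x, y in f_plus:
            for z in by_lhs.get(y, []):
                new.add((x, z))
        if new <= f_plus:  # nothing new was derived: F+ is stable
            return f_plus
        f_plus |= new


schema = {"ssn", "c-id", "grade", "name", "address"}
F = [
    ({"ssn"}, {"name", "address"}),
    ({"ssn", "c-id"}, {"grade"}),
]
f_plus = fd_closure(schema, F)

implied = (frozenset({"ssn", "c-id"}), frozenset({"grade", "name", "address"}))
not_implied = (frozenset({"c-id"}), frozenset({"grade"}))
print(implied in f_plus)      # True
print(not_implied in f_plus)  # False
```

In practice, one usually computes the closure of a single attribute set rather than all of F+, because F+ grows exponentially with the number of attributes.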

• We can further simplify manual computation of F+ by using the following additional rules

Page 99: Database normalization

FDs - Armstrong’s axioms

Additional rules:

• Union: if X -> Y and X -> Z then X -> YZ

• Decomposition: if X -> YZ then X -> Y and X -> Z

• Pseudo-transitivity: if X -> Y and YW -> Z then XW -> Z

Page 100: Database normalization

FDs - Armstrong’s axioms

Prove ‘Union’ from the three axioms:

if X -> Y and X -> Z then X -> YZ ?

Page 101: Database normalization

FDs - Armstrong’s axioms

Prove ‘Union’ from the three axioms:

(1) X -> Y
(2) X -> Z

(1) + augm. w/ Z:  XZ -> YZ   (3)
(2) + augm. w/ X:  XX -> XZ   (4)

but XX is X; thus (3) + (4) and transitivity:  X -> YZ

Page 102: Database normalization

FDs - Armstrong’s axioms

Prove Pseudo-transitivity:

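if X -> Y and YW -> Z then XW -> Z ?

Reflexivity: if Y ⊆ X then X -> Y

Augmentation: if X -> Y then XW -> YW

Transitivity: if X -> Y and Y -> Z then X -> Z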
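One possible derivation is sketched below in LaTeX. It uses only augmentation and transitivity. The step numbering is ours and is not taken from the slides.

```latex
\begin{align*}
&(1)\quad X \rightarrow Y   && \text{given} \\
&(2)\quad YW \rightarrow Z  && \text{given} \\
&(3)\quad XW \rightarrow YW && \text{augmentation of (1) with } W \\
&(4)\quad XW \rightarrow Z  && \text{transitivity on (3) and (2)}
\end{align*}
```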

Page 103: Database normalization

FDs - Armstrong’s axioms

Prove Decomposition

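Reflexivity: if Y ⊆ X then X -> Y

Augmentation: if X -> Y then XW -> YW

Transitivity: if X -> Y and Y -> Z then X -> Z

if X -> YZ then X -> Y and X -> Z ?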
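One possible derivation is sketched below in LaTeX. It uses only reflexivity and transitivity. The step numbering is ours and is not taken from the slides.

```latex
\begin{align*}
&(1)\quad X \rightarrow YZ  && \text{given} \\
&(2)\quad YZ \rightarrow Y  && \text{reflexivity, since } Y \subseteq YZ \\
&(3)\quad X \rightarrow Y   && \text{transitivity on (1) and (2)} \\
&(4)\quad YZ \rightarrow Z  && \text{reflexivity, since } Z \subseteq YZ \\
&(5)\quad X \rightarrow Z   && \text{transitivity on (1) and (4)}
\end{align*}
```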

Page 104: Database normalization

FDs - Closure F+

Given a set F of FDs (on a schema), F+ is the set of all implied FDs. E.g.,

takes(ssn, c-id, grade, name, address)

F = { ssn, c-id -> grade
      ssn -> name, address }

Page 105: Database normalization

FDs - Closure F+

F+ = { ssn, c-id -> grade
       ssn -> name, address
       ssn -> ssn
       ssn, c-id -> address
       c-id, address -> c-id
       ... }

Page 106: Database normalization

Summary

• Normalization
• Characteristics of a suitable set of relations include:
– the minimal number of attributes necessary to support the data requirements of the enterprise;
– attributes with a close logical relationship are found in the same relation;
– minimal redundancy with each attribute represented only once with the important exception of attributes that form all or part of foreign keys.

Page 107: Database normalization

Functional Dependence

Key concept in normalization

The key to finding redundancy is to find functional dependence among the keys and attributes

Page 108: Database normalization

1st Normal Form

• Table has a primary key
• Table has no repeating groups

A multivalued attribute is an attribute that may have several values for one record

A repeating group is a set of one or more multivalued attributes that are related

Page 109: Database normalization

2nd Normal Form

Second Normal Form is violated if:

• First Normal Form is violated, or
• there exist non-key field(s) that are functionally dependent on a partial key.

partial key -> non-key
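The partial-key test can be sketched in Python as follows. The function name `partial_dependencies` is an assumption, and the example reuses the takes(ssn, c-id, grade, name, address) relation with key (ssn, c-id). The sketch only inspects the dependencies it is given. It does not derive implied ones, so it is a first-pass check rather than a complete 2NF test.

```python
def partial_dependencies(key, fds):
    """Return the listed FDs X -> Y where X is a proper subset of the key
    and Y contains at least one non-key attribute."""
    key = frozenset(key)
    found = []
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        non_key = rhs - key
        if lhs < key and non_key:  # '<' is the proper-subset test
            found.append((lhs, non_key))
    return found


fds = [
    ({"ssn"}, {"name", "address"}),
    ({"ssn", "c-id"}, {"grade"}),
]
violations = partial_dependencies({"ssn", "c-id"}, fds)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))  # ['ssn'] -> ['address', 'name']
```

Here name and address depend on ssn alone, which is only part of the key, so the relation violates Second Normal Form.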

Page 110: Database normalization

3rd Normal Form

Third Normal Form is violated if:

• Second Normal Form is violated, or
• there exist non-key field(s) that are functionally dependent on other non-key field(s).

non-key -> non-key

Note: A candidate key is not a non-key field.
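A minimal Python sketch of this criterion follows. It uses a hypothetical relation person(ssn, address, county-tax-rate), built from the ssn -> address and address -> county-tax-rate example used for transitivity. The function name `nonkey_dependencies` is an assumption, and the check follows the informal rule on this slide: it flags a listed dependency whose left side contains no attribute of any candidate key and whose right side adds a non-key attribute. Implied dependencies are not derived.

```python
def nonkey_dependencies(candidate_keys, fds):
    """Return the listed FDs in which non-key fields determine other non-key fields."""
    # Attributes that belong to some candidate key are not non-key fields.
    prime = set().union(*candidate_keys)
    found = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        dependents = rhs - prime - lhs
        if lhs and not (lhs & prime) and dependents:
            found.append((frozenset(lhs), frozenset(dependents)))
    return found


fds = [
    ({"ssn"}, {"address"}),
    ({"address"}, {"county-tax-rate"}),
]
violations = nonkey_dependencies([{"ssn"}], fds)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))  # ['address'] -> ['county-tax-rate']
```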

Page 111: Database normalization

4th Normal Form

4th Normal Form is violated if:

• Boyce-Codd Normal Form is violated, or
• there exists a partial key which has multiple independent multivalued dependencies to other partial keys.

partial-key1 ->> partial-key2
partial-key1 ->> partial-key3

Page 112: Database normalization

Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal Form is violated if:

• Third Normal Form is violated, or
• there exists a partial key that is functionally dependent on non-key field(s).

non-key -> partial-key

Page 113: Database normalization

Fifth Normal Form (5NF)

Fifth normal form (5NF), also known as project-join normal form (PJ/NF), is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships.

A table is said to be in 5NF if and only if every non-trivial join dependency in it is implied by the candidate keys. A join dependency *{A, B, … Z} on R is implied by the candidate key(s) of R if and only if each of A, B, …, Z is a superkey for R.[1]