DATA NORMALISATION Pamela Quick

Data Normalisation 2

Objectives

Data normalisation aims to derive record structures which avoid anomalies in:

Insertion

Deletion

Modification

Accessing

Data normalisation ensures single-valuedness of facts

Facts are represented in fields in keyed records

Data Normalisation 3

The Process of Normalisation

Usually three steps (in industry) giving rise to

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

In academia

Boyce-Codd Normal Form (BCNF)

Fourth Normal Form (4NF)

At each step we consider relationships between an entity's attributes

These relationships are known as functional dependencies

Data Normalisation 4

Steps in Data Normalisation

UN-NORMALISED ENTITY

step 1 ... remove repeating groups

1st NORMAL FORM

step 2 ... remove partial dependencies

2nd NORMAL FORM

step 3 ... remove indirect (transitive) dependencies

3rd NORMAL FORM

step 4 ... remove multivalued dependencies

4th NORMAL FORM

step 4 ... make every determinant a candidate key

BOYCE-CODD NORMAL FORM

Data Normalisation 5

Attributes - Identifiers

An entity identifier uniquely determines an occurrence of the entity

A Superkey - a combination of attributes that uniquely identifies an occurrence

When more than one identifier exists we have Candidate Identifiers (Keys) - each a minimal superkey

Primary Key - the candidate key designated as the identifier

SUPPLIER (Supplier#, Supplier-name, Supp-add)
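The difference between a superkey and a candidate key can be sketched in a few lines of Python. The SUPPLIER occurrences below are invented for illustration, and the function names are hypothetical. A sample of rows can only refute a key, never prove one, so this is a check against the data shown and not a statement about the entity type.

```python
from itertools import combinations

# Hypothetical SUPPLIER occurrences; the values are invented for illustration.
SUPPLIERS = [
    {"Supplier#": "S1", "Supplier-name": "Acme", "Supp-add": "Leeds"},
    {"Supplier#": "S2", "Supplier-name": "Bolt Co", "Supp-add": "Leeds"},
    {"Supplier#": "S3", "Supplier-name": "Acme", "Supp-add": "York"},
]

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows):
    """Return the minimal superkeys of the sample, smallest first."""
    names = sorted(rows[0])
    keys = []
    for size in range(1, len(names) + 1):
        for attrs in combinations(names, size):
            # A combination that contains an already-found key is not minimal.
            if any(set(key) <= set(attrs) for key in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys
```

In this sample both Supplier# alone and the pair (Supp-add, Supplier-name) come out as candidate keys; designating one of them, normally Supplier#, gives the primary key.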

Data Normalisation 6

Attributes - Repeating Groups

When a group of attributes has multiple values then we say there is a repeating group of attributes in the entity

COMPANY   NAME      ADDRESS       BRANCH NAME   BRANCH ADDRESS
A123      ABC Ltd   100 High St   ABC1          Manchester
                                  ABC2          London
                                  ABC3          Glasgow

(BRANCH_NAME, BRANCH_ADDRESS) is a repeating group

Data Normalisation 7

Functional Dependency

A -> B

PART# -> PART-DESCRIPTION

[Diagram over attributes A, B and C not recoverable]

B is functionally dependent on A if a value of A uniquely determines a value of B
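The same definition can be written formally. In this sketch the notation is an assumption and does not come from the slides: r denotes the set of occurrences (tuples) of the entity, and t[A] the value of attribute A in tuple t.

```latex
A \rightarrow B \iff \forall\, t_1, t_2 \in r :\; t_1[A] = t_2[A] \implies t_1[B] = t_2[B]
```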

Data Normalisation 8

Functional Dependency

A -> B: B is functionally dependent on A (A determines B)

all tuples that have the same value of A also have the same value of B

A functional dependency is trivial if it is satisfied by every possible set of tuples

ie A -> A

in general X -> Y is trivial if Y = X or Y is a subset of X

An FD is said to HOLD when every possible instance of the entity complies with it

An FD is said to be SATISFIED when all the instances actually stated comply with it
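Whether an FD is satisfied by a stated set of occurrences can be tested mechanically. The following is a minimal sketch with hypothetical function names; the second Bolt row is invented so that PART# repeats. Note that it tests satisfaction only: whether an FD holds is a statement about every possible instance and cannot be established from sample data.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if rows that agree on the lhs attributes also agree on the rhs attributes."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True

def is_trivial(lhs, rhs):
    """X -> Y is trivial when Y is X or a subset of X."""
    return set(rhs) <= set(lhs)

# Sample occurrences; the second Bolt row is hypothetical.
ITEMS = [
    {"PART#": "1492", "PART-DESCRIPTION": "Bolt", "QUANTITY-ORDERED": 1000},
    {"PART#": "1492", "PART-DESCRIPTION": "Bolt", "QUANTITY-ORDERED": 20},
    {"PART#": "3164", "PART-DESCRIPTION": "Spanner", "QUANTITY-ORDERED": 10},
]
```

Here `satisfies_fd(ITEMS, ["PART#"], ["PART-DESCRIPTION"])` is true, while PART# -> QUANTITY-ORDERED is not satisfied because part 1492 appears with two different quantities.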

Data Normalisation 9

More Examples of Functional Dependency

[Dependency diagrams over attributes X, Y, Z and K; arrows not recoverable]

Data Normalisation 10

Example

ORDER NUMBER   SUPPLIER NUMBER   ORDER DATE   DELIVERY DATE
1023           500028            09/05/88     25/07/88

PART NO.   PART-DESC   QTY-ORD   PRICE
O463       Hook        150       15.00
1492       Bolt        1000      10.00
3164       Spanner     10         5.00
                       TOTAL     30.00

PURCHASE-ORDER (ORDER#, SUPPLIER#, ORDER-DATE, DELIVERY-DATE, (PART#, PART-DESCRIPTION, QUANTITY-ORDERED, PRICE), TOTAL-PRICE)

Data Normalisation 11

First Normal Form

An entity type is in 1NF if there are no repeating groups of attribute types

Any un-normalised entity type is transformed to 1NF

Remove all repeating attribute groups

Repeating attribute groups become new entity types in their own right

The identifier of the original entity type must be an attribute (but not necessarily an identifier) of the derived entity type.
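The transformation to 1NF described above can be sketched as follows. This is an illustration only: the function name `to_1nf` and the `ITEMS` field are hypothetical, and reading 1023 as the order number on the example form is an assumption.

```python
# Un-normalised PURCHASE-ORDER occurrence with its repeating group held as a nested list.
purchase_order = {
    "ORDER#": "1023",  # assumed to be the order number on the example form
    "SUPPLIER#": "500028",
    "ORDER-DATE": "09/05/88",
    "DELIVERY-DATE": "25/07/88",
    "TOTAL-PRICE": 30.00,
    "ITEMS": [  # the repeating group (PART#, PART-DESCRIPTION, QUANTITY-ORDERED, PRICE)
        {"PART#": "O463", "PART-DESCRIPTION": "Hook", "QUANTITY-ORDERED": 150, "PRICE": 15.00},
        {"PART#": "1492", "PART-DESCRIPTION": "Bolt", "QUANTITY-ORDERED": 1000, "PRICE": 10.00},
        {"PART#": "3164", "PART-DESCRIPTION": "Spanner", "QUANTITY-ORDERED": 10, "PRICE": 5.00},
    ],
}

def to_1nf(entity, group, identifier):
    """Split off the repeating group as a new entity that carries the parent identifier."""
    parent = {name: value for name, value in entity.items() if name != group}
    children = [{identifier: entity[identifier], **member} for member in entity[group]]
    return parent, children

order, order_items = to_1nf(purchase_order, "ITEMS", "ORDER#")
```

Each derived occurrence in `order_items` now carries ORDER# alongside PART#, and the two together identify it.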

Data Normalisation 12

Example of First Normal Form

PURCHASE-ORDER (ORDER#, SUPPLIER#, ORDER-DATE, DELIVERY-DATE, (PART#, PART-DESCRIPTION, QUANTITY-ORDERED, PRICE), TOTAL-PRICE)

UN-NORMALISED ENTITY TYPE

Data Normalisation 13

Example in 1NF

PURCHASE-ORDER (ORDER#, SUPPLIER#, ORDER-DATE, DELIVERY-DATE, TOTAL-PRICE)

PURCHASE-ITEM-1 (ORDER#, PART#, PART-DESCRIPTION, QUANTITY-ORDERED, PRICE)

[NOTE: PART# ALONE DOES NOT IDENTIFY PURCHASE-ITEM-1]

ENTITY TYPES IN 1NF

Data Normalisation 14

Example

REGISTRATION FORM

STUDENT NUMBER   STUDENT NAME   STUDENT ADDRESS
S0843215         P. Smith       1, South Downs Hale

COURSE NO   COURSE      TUTOR NAME   TUTOR NO
PM951       Computing   T. Long      037428
S212        Biology     S. Short     096524

STUDENT (Student#, Student-Name, Student-Address)

ENROLMENT (Student#, Course#, Course-Title, Tutor-Name, Tutor-Staff#)

Data Normalisation 15

Benefits from 1ST Normal Form

Any 'hidden' entities are identified

Process results in separation of different objects

BUT anomalies may still exist

PURCHASE-ITEM-1 (ORDER#, PART#, PART-DESCRIPTION, QUANTITY-ORDERED, PRICE)

PART-DESCRIPTION appears on every PURCHASE-ITEM-1 occurrence.

This may result in anomalies when updating or deleting records

The problem in the example is that PART-DESCRIPTION is functionally dependent only on PART# (part of the identifier)
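A short sketch shows how these anomalies arise in practice. Order 1024 is a hypothetical second order, added so that the same part appears on more than one occurrence; the helper name is also invented, and 1023 is assumed to be the order number from the example form.

```python
from collections import defaultdict

# Occurrences of PURCHASE-ITEM-1 (PRICE omitted for brevity).
purchase_items = [
    {"ORDER#": "1023", "PART#": "1492", "PART-DESCRIPTION": "Bolt", "QUANTITY-ORDERED": 1000},
    {"ORDER#": "1023", "PART#": "3164", "PART-DESCRIPTION": "Spanner", "QUANTITY-ORDERED": 10},
    {"ORDER#": "1024", "PART#": "1492", "PART-DESCRIPTION": "Bolt", "QUANTITY-ORDERED": 20},
]

def descriptions_by_part(rows):
    """Collect every description recorded against each part number."""
    found = defaultdict(set)
    for row in rows:
        found[row["PART#"]].add(row["PART-DESCRIPTION"])
    return dict(found)

# Modification anomaly: the description is changed on one occurrence only.
purchase_items[0]["PART-DESCRIPTION"] = "Hex bolt"
conflicts = {
    part: names
    for part, names in descriptions_by_part(purchase_items).items()
    if len(names) > 1
}

# Deletion anomaly: removing the only order line for part 3164 loses its description.
remaining = [row for row in purchase_items if row["PART#"] != "3164"]
lost_parts = set(descriptions_by_part(purchase_items)) - set(descriptions_by_part(remaining))
```

After the partial update, part 1492 carries two different descriptions, and after the deletion nothing records what part 3164 is.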

Data Normalisation 16

Second Normal Form

An entity type is in 2NF if it is in 1NF and each non-identifying attribute depends upon the whole identifier

Any entity type in 1NF is transformed to 2NF

Identify functional dependencies

Re-write entity types so that each non-identifying attribute is functionally dependent on the whole of the identifier
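Given a declared set of functional dependencies, partial dependencies can be found by computing the attribute closure of every proper subset of the identifier. This is a sketch under assumptions: the function names are hypothetical, and the FD list for PURCHASE-ITEM-1 is read from the example, taking QUANTITY-ORDERED and PRICE to depend on the whole identifier.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, attributes, fds):
    """Non-identifying attributes determined by a proper subset of the identifier."""
    non_key = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in sorted(closure(subset, fds) & non_key):
                found.append((subset, attr))
    return found

ATTRIBUTES = ["ORDER#", "PART#", "PART-DESCRIPTION", "QUANTITY-ORDERED", "PRICE"]
KEY = ("ORDER#", "PART#")
FDS = [
    (("ORDER#", "PART#"), ("QUANTITY-ORDERED", "PRICE")),  # assumed from the example
    (("PART#",), ("PART-DESCRIPTION",)),
]
```

For PURCHASE-ITEM-1 the only partial dependency reported is PART# -> PART-DESCRIPTION, which is the attribute that has to move to its own entity type.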

Data Normalisation 17

Example

PURCHASE-ORDER (ORDER#, SUPPLIER#, ORDER-DATE, DELIVERY-DATE, TOTAL-PRICE)

PURCHASE-ITEM-1 (ORDER#, PART#, PART-DESCRIPTION, QUANTITY-ORDERED, PRICE)

ENTITY TYPES IN 1NF

Data Normalisation 18

Functional Dependencies

PURCHASE-ORDER (ORDER#, SUPPLIER#, ORDER-DATE, DELIVERY-DATE, TOTAL-PRICE)

PURCHASE-ITEM-1 (ORDER#, PART#, PART-DESCRIPTION, QUANTITY-ORDERED, PRICE)

ORDER#, PART# -> QUANTITY-ORDERED, PRICE

PART# -> PART-DESCRIPTION

Data Normalisation 19

In 2nd Normal Form

Decompose PURCHASE-ITEM into two entity types

PURCHASE-ITEM (Order#, Part#, Quantity-Ordered, Price)

PART (Part#, Part-Description)

Original entity type decomposed into three entity types in 2nd normal form

PURCHASE-ORDER (Order#, Supplier#, Order-Date, Delivery-Date, Total-Price)

PURCHASE-ITEM (Order#, Part#, Quantity-Ordered, Price)

PART (Part#, Part-Description)
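The decomposition of PURCHASE-ITEM-1 can be sketched as two projections, with a natural join to confirm that no information is lost. The helper names are hypothetical, and the row for order 1024 is an invented second order so that one part appears twice; 1023 is assumed to be the order number on the example form.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate occurrences."""
    seen = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in seen:
            seen.append(projected)
    return seen

def natural_join(left, right):
    """Join occurrences that agree on every attribute they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in shared):
                joined.append({**l_row, **r_row})
    return joined

def as_set(rows):
    """Order-independent form of a list of occurrences, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}

PURCHASE_ITEM_1 = [
    {"Order#": "1023", "Part#": "O463", "Part-Description": "Hook", "Quantity-Ordered": 150, "Price": 15.00},
    {"Order#": "1023", "Part#": "1492", "Part-Description": "Bolt", "Quantity-Ordered": 1000, "Price": 10.00},
    {"Order#": "1023", "Part#": "3164", "Part-Description": "Spanner", "Quantity-Ordered": 10, "Price": 5.00},
    {"Order#": "1024", "Part#": "1492", "Part-Description": "Bolt", "Quantity-Ordered": 20, "Price": 0.20},  # hypothetical
]

PURCHASE_ITEM = project(PURCHASE_ITEM_1, ["Order#", "Part#", "Quantity-Ordered", "Price"])
PART = project(PURCHASE_ITEM_1, ["Part#", "Part-Description"])
```

PART holds each description once, and joining PURCHASE_ITEM with PART on Part# reproduces the original occurrences.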

Data Normalisation 20

Example in 2NF

STUDENT (Student#, Student-Name, Student-Address)

ENROLMENT (Student#, Course#, Tutor-Name, Tutor-Staff#)

COURSE (Course#, Course-Title)

ENTITY TYPES IN 2NF

Data Normalisation 21

Third Normal Form

An entity type is in 3NF if it is in 2NF and all non-identifying attributes are independent of one another

Any entity type in 2NF is transformed to 3NF

Determine functional dependencies between non-identifying attributes

Decompose the entity type into new entity types
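The two steps above can be sketched against a declared FD list. The function names are hypothetical, and the FDs for ENROLMENT are an assumption read from the registration example: the identifier determines the tutor, and Tutor-Staff# determines Tutor-Name.

```python
def non_key_dependencies(key, fds):
    """FDs whose determinant has no identifier attribute and whose dependents are non-identifying."""
    key = set(key)
    violations = []
    for lhs, rhs in fds:
        dependents = set(rhs) - set(lhs) - key
        if dependents and not (set(lhs) & key):
            violations.append((tuple(lhs), tuple(sorted(dependents))))
    return violations

def split_off(attributes, lhs, rhs):
    """Move the dependents to a new entity type; the determinant stays behind as a link."""
    remaining = [a for a in attributes if a not in rhs]
    new_entity = list(lhs) + list(rhs)
    return remaining, new_entity

ENROLMENT = ["Student#", "Course#", "Tutor-Name", "Tutor-Staff#"]
KEY = ("Student#", "Course#")
FDS = [
    (("Student#", "Course#"), ("Tutor-Staff#",)),  # assumed from the example
    (("Tutor-Staff#",), ("Tutor-Name",)),
]

violations = non_key_dependencies(KEY, FDS)
enrolment_3nf, tutor = split_off(ENROLMENT, *violations[0])
```

The result is ENROLMENT (Student#, Course#, Tutor-Staff#) and a new TUTOR (Tutor-Staff#, Tutor-Name) entity type.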

Data Normalisation 22

Example

STUDENT (Student#, Student-Name, Student-Address)

ENROLMENT (Student#, Course#, Tutor-Name, Tutor-Staff#)

COURSE (Course#, Course-Title)

ENTITY TYPES IN 2NF

Data Normalisation 23

Functional Dependencies

STUDENT (Student#, Student-Name, Student-Address)

ENROLMENT (Student#, Course#, Tutor-Name, Tutor-Staff#)

COURSE (Course#, Course-Title)

Student#, Course# -> Tutor-Staff#

Tutor-Staff# -> Tutor-Name

Data Normalisation 24

Example in 3NF

STUDENT (Student#, Student-Name, Student-Address)

ENROLMENT (Student#, Course#, Tutor-Staff#)

COURSE (Course#, Course-Title)

TUTOR (Tutor-Staff#, Tutor-Name)

ENTITY TYPES IN 3NF

Data Normalisation 25

Boyce-Codd Normal Form (BCNF)

A relation is in BCNF if every determinant is a candidate key

For a relation with only one candidate key, 3NF and BCNF are equivalent

Violation of BCNF is rare; it may occur in a relation that:

contains two (or more) composite candidate keys

which overlap, that is, share at least one attribute in common

Data Normalisation 26

BCNF

CLIENT_INTERVIEW

Client_no   Interview_date   Interview_time   Staff_no   Room_no
CR76        13-May-95        10.30            SG5        G101
CR56        13-May-95        12.00            SG5        G101
CR74        13-May-95        12.00            SG37       G102
CR56        10-Jun-95        10.00            SG5        G102

The following FDs hold:

Client_no, Interview_date -> Interview_time, Staff_no, Room_no

Staff_no, Interview_date, Interview_time -> Client_no

Staff_no, Interview_date -> Room_no

(Client_no, Interview_date) and (Staff_no, Interview_date, Interview_time) are composite candidate keys that share the common attribute Interview_date

Data Normalisation 27

BCNF

The relation CLIENT_INTERVIEW is in 3NF but not BCNF

To transform to BCNF, remove the violating FD (Staff_no, Interview_date -> Room_no) and create two relations:

INTERVIEW (Client_no, Interview_date, Interview_time, Staff_no)

STAFF_ROOM (Staff_no, Interview_date, Room_no)
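The BCNF test can be sketched with attribute closures: a determinant violates BCNF when its closure does not cover the whole relation. The function names are hypothetical, and the FD lists for the two decomposed relations were projected by hand from the three FDs stated for CLIENT_INTERVIEW, so treat them as assumptions.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose determinant is not a superkey of the relation."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not set(attributes) <= closure(lhs, fds)
    ]

CLIENT_INTERVIEW = ["Client_no", "Interview_date", "Interview_time", "Staff_no", "Room_no"]
FDS = [
    (("Client_no", "Interview_date"), ("Interview_time", "Staff_no", "Room_no")),
    (("Staff_no", "Interview_date", "Interview_time"), ("Client_no",)),
    (("Staff_no", "Interview_date"), ("Room_no",)),
]

# The decomposed relations with their hand-projected FDs.
INTERVIEW = ["Client_no", "Interview_date", "Interview_time", "Staff_no"]
INTERVIEW_FDS = [
    (("Client_no", "Interview_date"), ("Interview_time", "Staff_no")),
    (("Staff_no", "Interview_date", "Interview_time"), ("Client_no",)),
]
STAFF_ROOM = ["Staff_no", "Interview_date", "Room_no"]
STAFF_ROOM_FDS = [(("Staff_no", "Interview_date"), ("Room_no",))]
```

Only Staff_no, Interview_date -> Room_no is reported for CLIENT_INTERVIEW, and neither INTERVIEW nor STAFF_ROOM has a violating determinant.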

Data Normalisation 28

Fourth Normal Form

An entity type is in 4NF if it is in 3NF and there are no multivalued dependencies between its attribute types

Any entity type in 3NF is transformed to 4NF

Detect any multivalued dependencies

Decompose entity type

Data Normalisation 29

Multivalued Dependencies - 1

AUTHOR_NO   BOOK_NO   SUBJECT     BOOK_TITLE   AUTHOR_NAME
A1          B1        Comp. Sc.   Methods      Jones
A1          B1        Maths       Methods      Jones
A2          B1        Comp. Sc.   Methods      Smith
A2          B1        Maths       Methods      Smith
A3          B2        Maths       Calculus     Brown

AUTHOR (Author_no, Author_name)

BOOK (Book_no, Book_title)

AUTHOR-BOOK-SUBJECT (Author_no, Book_no, Subject)

IN 3rd NORMAL FORM

[Dependency diagram over author_no, book_no, subject, author_name and book_title; arrows not recoverable]

Data Normalisation 30

Multivalued Dependencies - 2

Example models that "each AUTHOR is associated with all the SUBJECTS under which the BOOK is classified"

The attribute SUBJECT contains redundant values. If SUBJECT were deleted from rows 1 & 2 the values could be deduced from rows 3 & 4

The anomaly arises because the same set of SUBJECTs is associated with each AUTHOR of the same BOOK

BOOK_NO multidetermines AUTHOR_NO

BOOK_NO multidetermines SUBJECT

AUTHOR_NO   BOOK_NO   SUBJECT
A1          B1        Comp. Sc.
A1          B1        Maths
A2          B1        Comp. Sc.
A2          B1        Maths
A3          B2        Maths

Data Normalisation 31

Fourth Normal Form

AUTHOR (Author_no, Author_name)

BOOK (Book_no, Book_Title)

AUTHOR-BOOK (Author_no, Book_no)

BOOK-SUBJECT (Book_no, Subject)

IN 4th NORMAL FORM

AUTHOR-BOOK-SUBJECT

AUTHOR_NO   BOOK_NO   SUBJECT
A1          B1        Comp. Sc.
A1          B1        Maths
A2          B1        Comp. Sc.
A2          B1        Maths
A3          B2        Maths

AUTHOR-BOOK

AUTHOR_NO   BOOK_NO
A1          B1
A2          B1
A3          B2

BOOK-SUBJECT

BOOK_NO   SUBJECT
B1        Comp. Sc.
B1        Maths
B2        Maths
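The multivalued dependency and the 4NF decomposition can be checked on the example rows. This is a sketch with hypothetical names: BOOK_NO multidetermines AUTHOR_NO when, for every book, each of its authors appears with each of its subjects.

```python
from itertools import product

# (AUTHOR_NO, BOOK_NO, SUBJECT) occurrences from the example.
AUTHOR_BOOK_SUBJECT = [
    ("A1", "B1", "Comp. Sc."),
    ("A1", "B1", "Maths"),
    ("A2", "B1", "Comp. Sc."),
    ("A2", "B1", "Maths"),
    ("A3", "B2", "Maths"),
]

def book_multidetermines_author(rows):
    """True if, per book, the (author, subject) pairs form a full cross product."""
    by_book = {}
    for author, book, subject in rows:
        authors, subjects, pairs = by_book.setdefault(book, (set(), set(), set()))
        authors.add(author)
        subjects.add(subject)
        pairs.add((author, subject))
    return all(
        pairs == set(product(authors, subjects))
        for authors, subjects, pairs in by_book.values()
    )

# Decompose into the two 4NF entity types by projection.
author_book = sorted({(author, book) for author, book, _ in AUTHOR_BOOK_SUBJECT})
book_subject = sorted({(book, subject) for _, book, subject in AUTHOR_BOOK_SUBJECT})

# Joining the projections on BOOK_NO gives back the original occurrences.
rejoined = {
    (author, book, subject)
    for author, book in author_book
    for joined_book, subject in book_subject
    if book == joined_book
}
```

The five original rows are stored as three AUTHOR-BOOK rows and three BOOK-SUBJECT rows, and the join reproduces them exactly, which is what the multivalued dependency guarantees.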

Data Normalisation 32

Conclusions

Data Normalisation is a bottom-up technique that ensures the basic properties of the relational model

no duplicate tuples

no nested relations

Data normalisation is often used as the only technique for database design - implementation view

A more appropriate approach is to complement conceptual modelling with data normalisation