
Week07 - Normalization

Page 1: Week07 - Normalization

8/8/2019 Week07 - Normalization

http://slidepdf.com/reader/full/week07-normalization 1/37

1

Database I

Methodology

Normalization

Page 2: Week07 - Normalization


Normalization ? (1/2)

Normalization is a technique for analyzing relations based on their primary key (or candidate keys in the case of BCNF) and functional dependencies

Normalization is executed as a series of steps

Each step corresponds to a specific normal form

The different normal forms or levels are called first, second, and so on

Each normal form has certain requirements or conditions, which must be fulfilled

Page 3: Week07 - Normalization


Normalization ? (2/2)

If a relation fulfills the requirements of a particular normal form, it is said to be in that normal form

The lowest normal form that all the tables satisfy is called the normal form of the entire database

Normalization is performed after the logical database design

Page 4: Week07 - Normalization


Why Normalization ? (1/3)

The main aim of relational database design is to group attributes into relations so as to minimize data redundancy and thereby reduce the file storage space

Consider the StaffBranch relation below

It is clear that there is redundant information in the StaffBranch relation

SNo   SName     Position    Salary  BNo  BAddress   PhNo
S123  Asad      Manager     23000   B5   Peshawar   20456789
S125  Jamal     Programmer  40000   B5   Peshawar   20456789
S130  Ghamgeen  Peon        5000    B6   Islamabad  34564567
S133  Khamar    Sweeper     3000    B7   Mardan     23567890

Page 5: Week07 - Normalization


Why Normalization ? (2/3)

The serious problem with relations having redundant information is update anomalies

Update anomalies can be classified as insertion, deletion, and modification anomalies

Insertion Anomalies

There are two insertion anomalies in the StaffBranch relation

To insert the details of a new member of staff, we also have to enter the branch details for the new entry, so we have to take great care that the branch details are consistent with the existing rows

To insert the details of a new branch that currently has no members of staff, the solution could be to insert nulls for the staff attributes, but SNo is the primary key and cannot be null

Page 6: Week07 - Normalization


Why Normalization ? (3/3)

Deletion Anomalies

If we delete the row for staff number S133, the information relating to branch number B7 is also lost from the database

Modification Anomalies

If we want to change the phone number for branch number B5, we must update the rows of all staff located at that branch to avoid data inconsistency
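The deletion and modification anomalies described above can be reproduced on the StaffBranch rows. This is only a minimal sketch: the list-of-dictionaries layout, the helper name `branches`, and the replacement phone number are assumptions made for illustration, not part of the slides.

```python
# StaffBranch rows copied from the slide (Position and Salary omitted for brevity).
# The dictionary layout is an assumption made for this sketch.
staff_branch = [
    {"SNo": "S123", "SName": "Asad", "BNo": "B5", "BAddress": "Peshawar", "PhNo": "20456789"},
    {"SNo": "S125", "SName": "Jamal", "BNo": "B5", "BAddress": "Peshawar", "PhNo": "20456789"},
    {"SNo": "S130", "SName": "Ghamgeen", "BNo": "B6", "BAddress": "Islamabad", "PhNo": "34564567"},
    {"SNo": "S133", "SName": "Khamar", "BNo": "B7", "BAddress": "Mardan", "PhNo": "23567890"},
]


def branches(rows):
    """Return the set of branch numbers still recorded in the relation."""
    return {row["BNo"] for row in rows}


# Deletion anomaly: removing staff member S133 also removes every trace of branch B7.
after_delete = [row for row in staff_branch if row["SNo"] != "S133"]
lost_branches = branches(staff_branch) - branches(after_delete)

# Modification anomaly: updating B5's phone number in only one of its rows
# leaves two different phone numbers recorded for the same branch.
partially_updated = [dict(row) for row in staff_branch]
partially_updated[0]["PhNo"] = "20999999"  # hypothetical new number
phones_for_b5 = {row["PhNo"] for row in partially_updated if row["BNo"] == "B5"}

print(lost_branches)
print(phones_for_b5)
```

Storing branch details once, in their own relation, would make both problems impossible by construction.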

Page 7: Week07 - Normalization


Functional Dependencies (1/4)

Functional dependency describes the relationship between attributes

In a database, we often have the case where one field defines the other

For example, we can say that Employee ID (EmpID) defines a name

What does this mean?

It means that if I have a database with EmpIDs and names, and if I know someone's EmpID, then I can find their name

Page 8: Week07 - Normalization


Functional Dependencies (2/4)

By the word "defines," we mean that for every EmpID we will have one and only one name

So we will say that we have defined name as being functionally dependent on EmpID

A formal definition can be:

If A and B are attributes or sets of attributes of relation R, we say that B is functionally dependent on A if each value of A in R has associated with it exactly one value of B in R

Page 9: Week07 - Normalization


Functional Dependencies (3/4)

We write a functional dependency (FD) as:

EmpID → Name

This can be read as "EmpID defines Name" or "EmpID implies Name" or "Name is functionally dependent on EmpID"

Let us look at an interesting example:

EmpID  Name        Job              Salary
101    Darya Khan  Programmer       40000
104    Shahrukh    Designer         50000
103    Gul Jana    System Engineer  35000
103    Gul Jana    Project Manager  55000

Page 10: Week07 - Normalization


Functional Dependencies (4/4)

We have the FD that EmpID → Name

This means that every time we find 103, we find the name Gul Jana

Thus, just because something is on the left-hand side of an FD, it does not imply that you have a key or that it will be unique in the database

The FD X → Y only means that for every occurrence of X you will get the same value of Y
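The definition can be checked mechanically against the example table. The following is a minimal sketch; the function name `fd_holds` and the dictionary layout are assumptions made for illustration. It confirms that EmpID → Name holds even though EmpID is not unique, while EmpID → Job does not hold.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every value of lhs is associated with exactly one value of rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first right-hand value seen for this left-hand value
        if seen.setdefault(left, right) != right:
            return False
    return True


# Rows copied from the example table in the slides.
employees = [
    {"EmpID": 101, "Name": "Darya Khan", "Job": "Programmer", "Salary": 40000},
    {"EmpID": 104, "Name": "Shahrukh", "Job": "Designer", "Salary": 50000},
    {"EmpID": 103, "Name": "Gul Jana", "Job": "System Engineer", "Salary": 35000},
    {"EmpID": 103, "Name": "Gul Jana", "Job": "Project Manager", "Salary": 55000},
]

empid_defines_name = fd_holds(employees, ["EmpID"], ["Name"])
empid_defines_job = fd_holds(employees, ["EmpID"], ["Job"])
empid_is_unique = len({row["EmpID"] for row in employees}) == len(employees)

print(empid_defines_name, empid_defines_job, empid_is_unique)
```

A check like this can only show that an FD is not violated by the current rows; whether the FD is part of the design is a decision about the meaning of the data.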

Page 11: Week07 - Normalization


FD Inference Axioms or Armstrong

Axioms ( Transitivity Rule )

Let us now consider another example

RNo  Name     School  Location
101  Rasheed  PPS     Peshawar
102  Ani      FCA     Mardan
103  Naseer   ECS     Swabi
104  Ghafoor  FCA     Mardan
105  Bilal    PPS     Peshawar
106  Sohail   PPS     Peshawar

Here, we will define:

RNo → Name

RNo → School

Name → School

Have we violated any FDs with our data?

Because all RNos are unique, there cannot be an FD violation of RNo → Name

The same comment is true for RNo → School and Name → School

Page 12: Week07 - Normalization


FD Inference Axioms or Armstrong

Axioms ( Transitivity Rule )

If we define an FD X → Y and we define an FD Y → Z, then we know by inference that X → Z

Since RNo → School, and School → Location holds in the data (every school appears with a single location), we can infer that RNo → Location

The inference illustrated is called the transitivity rule of FD inference

To see that the FD RNo → Location is true in our data, note that given any value of RNo, you always find a unique location for that person

This illustrates the transitivity rule
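One way to picture the transitivity rule is as the composition of two lookups. The sketch below builds RNo → School and School → Location mappings from the example rows and composes them; the variable names and the tuple layout are assumptions made for illustration.

```python
# Rows copied from the example table: (RNo, Name, School, Location).
students = [
    (101, "Rasheed", "PPS", "Peshawar"),
    (102, "Ani", "FCA", "Mardan"),
    (103, "Naseer", "ECS", "Swabi"),
    (104, "Ghafoor", "FCA", "Mardan"),
    (105, "Bilal", "PPS", "Peshawar"),
    (106, "Sohail", "PPS", "Peshawar"),
]

# RNo -> School: each roll number maps to one school.
rno_to_school = {rno: school for rno, _, school, _ in students}

# School -> Location: verify while building that each school has a single location.
school_to_location = {}
for _, _, school, location in students:
    assert school_to_location.setdefault(school, location) == location

# Transitivity: composing the two mappings gives RNo -> Location.
rno_to_location = {
    rno: school_to_location[school] for rno, school in rno_to_school.items()
}

print(rno_to_location)
```

The composed mapping agrees with the Location column of every row, which is what the inferred FD RNo → Location predicts.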

Page 13: Week07 - Normalization


FD Inference Axioms or Armstrong

Axioms (Pseudo Transitivity Rule )

If A → B and CB → D, then AC → D

If RNo → Name and Name, School → Location, then we can write RNo, School → Location

Page 14: Week07 - Normalization


FD Inference Axioms or Armstrong

Axioms ( Reflexive Rule )

If X is a composite, composed of A and B, then X → A and X → B

Example: X = Name, City. Then we are saying that X → Name and X → City

The rule says: if I give you the combination <Rasheed, Mardan>, what is this person's Name? What is this person's City?

Name     City
Haleem   Peshawar
Rasheed  Mardan
Kaleem   Charsadda

Page 15: Week07 - Normalization


FD Inference Axioms or Armstrong

Axioms (Augmentation Rule)

If X → Y, then XZ → Y

Now, I claim that because Name → City, we also have Name + ShoeSize → City (i.e., we augmented Name with ShoeSize)

Will there be a contradiction here, ever?

No, because we defined Name → City, so Name plus more information will always identify the unique City for that individual

We can always add information to the LHS of an FD and still have the FD be true

Name     City       ShoeSize
Haleem   Peshawar   10
Rasheed  Mardan     6
Kaleem   Charsadda  7

Page 16: Week07 - Normalization


FD Inference Axioms or Armstrong

Axioms (Decomposition Rule)

The decomposition/projectivity rule says that if it is given that X → YZ (that is, X defines both Y and Z), then X → Y and X → Z

Suppose I define Name → City, ShoeSize

This means that for every occurrence of Name, I have a unique value of City and a unique value of ShoeSize, so Name → City and Name → ShoeSize

Page 17: Week07 - Normalization


FD Inference Axioms or Armstrong

Axioms (Union Rule)

The union/additivity rule is the reverse of the decomposition rule in that if X → Y and X → Z, then X → YZ

If we were given that Name → City and given that Name → ShoeSize, we can immediately write Name → City, ShoeSize

Page 18: Week07 - Normalization


Keys and Functional

Dependencies (1/4)

The main reason we identify the FDs and inference rules is to be able to find keys and develop normal forms for relational databases

In any table, we want to find out which attribute(s), if any, will identify the rest of the attributes

An attribute (or minimal set of attributes) that identifies all the other attributes in a row is called a "candidate key"

Consider the example on the next slide

Page 19: Week07 - Normalization


Keys and Functional

Dependencies (2/4)

RNo  Name     School  Location
101  Rasheed  PPS     Peshawar
102  Ani      FCA     Mardan
103  Naseer   ECS     Swabi
104  Ghafoor  FCA     Mardan
105  Bilal    PPS     Peshawar
106  Sohail   PPS     Peshawar

Now suppose we define the following FDs:

RNo → Name

RNo → School

School → Location

What we want is to find the least number of attributes that can identify all the remaining attributes (hopefully only one attribute)

Suppose we take RNo as a candidate key

Page 20: Week07 - Normalization


Keys and Functional

Dependencies (3/4)

Can we show that RNo "defines" all attributes in the relation?

We can use the transitive, reflexive and union rules

RNo → Name (given)

RNo → School (given)

School → Location (given)

RNo → Location (derived by the transitive rule)

RNo → RNo (reflexive rule)

RNo → RNo, Name, School, Location (union rule)

Therefore RNo is a candidate key for this relation

Finding a candidate key amounts to finding an attribute or a set of attributes whose "closure" contains all the other attributes
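The derivation above can be automated by computing an attribute closure: start from a set of attributes and keep adding the right-hand side of every FD whose left-hand side is already covered. The sketch below is one common way to write it; the function name `closure` and the representation of FDs as pairs of sets are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left-hand side is already in the closure.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# The three FDs given in the slides, written as (left-hand side, right-hand side).
fds = [
    ({"RNo"}, {"Name"}),
    ({"RNo"}, {"School"}),
    ({"School"}, {"Location"}),
]
all_attributes = {"RNo", "Name", "School", "Location"}

rno_closure = closure({"RNo"}, fds)
school_closure = closure({"School"}, fds)

print(rno_closure == all_attributes)
print(school_closure)
```

Because the closure of RNo is the whole attribute set, RNo identifies every attribute; the closure of School stops at School and Location, so School cannot be a key.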

Page 21: Week07 - Normalization


Keys and Functional

Dependencies (4/4)

Are there any other candidate keys?

By the augmentation rule we can augment RNo with Name to form RNo, Name, which also defines all the attributes; however, it is not minimal, so it is a superkey rather than a new candidate key

Is School a candidate key? No

Once we have found a set of candidate keys (or perhaps only one, as in this case), we designate one of the candidate keys as the primary key and move on to normal forms

FD rules are useful in developing normal forms
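The search for candidate keys can be sketched as a brute-force enumeration: try attribute sets from smallest to largest, keep those whose closure covers the whole relation, and skip any set that contains a key already found. The helper names below are assumptions made for illustration, and the approach is only practical for small relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the full relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key: a superkey, not minimal
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


fds = [
    ({"RNo"}, {"Name"}),
    ({"RNo"}, {"School"}),
    ({"School"}, {"Location"}),
]
keys = candidate_keys({"RNo", "Name", "School", "Location"}, fds)
print(keys)
```

For the example relation the only set returned is RNo; RNo, Name is skipped because it contains RNo.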

Page 22: Week07 - Normalization


Normal Forms

(1N Form)

A relation is in first normal form if and only if every attribute is single valued for each tuple; that is, there are no repeating fields or groups in the row

This means that each attribute in each row, or each cell of the table, contains only one value

First normal form says that the domains of the attributes of a relation are atomic

The concept of 1NF is demonstrated on the next slide

Page 23: Week07 - Normalization


Normal Forms

(1N Form)

Non-1NF:

Employee
Name         Address        Dependant
Sohail Dar   I-8 Islamabad  Jamal, Kamal, Karim
Shoiab Ali   Wah Cantt.     Haider, Ihsan
Tahira Ijaz  Peshawar       Gul Mast

There is more than one value for Dependant, thus the table is not in 1NF

1NF Tables:

Employee
Name         Address
Sohail Dar   I-8 Islamabad
Shoiab Ali   Wah Cantt.
Tahira Ijaz  Peshawar

Dependant
DependantName  EmployeeName
Jamal          Sohail Dar
Kamal          Sohail Dar
Karim          Sohail Dar
Haider         Shoiab Ali
Ihsan          Shoiab Ali
Gul Mast       Tahira Ijaz

Page 24: Week07 - Normalization


Normal Forms

(1N Form)

Note that the original table could be reconstructed by combining these two tables

This is done by taking all the rows in the EMPLOYEE table and combining them with the corresponding rows in the DEPENDANT table where the names are equal (an equi-join operation)
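The equi-join described above can be sketched with plain Python lists. The tuple layouts and variable names are assumptions made for illustration; the rows are the ones from the two 1NF tables, with the spelling "Tahira Ijaz" assumed for the name that is garbled in the transcript.

```python
# 1NF tables from the slides: Employee(Name, Address) and
# Dependant(DependantName, EmployeeName).
employee = [
    ("Sohail Dar", "I-8 Islamabad"),
    ("Shoiab Ali", "Wah Cantt."),
    ("Tahira Ijaz", "Peshawar"),
]
dependant = [
    ("Jamal", "Sohail Dar"),
    ("Kamal", "Sohail Dar"),
    ("Karim", "Sohail Dar"),
    ("Haider", "Shoiab Ali"),
    ("Ihsan", "Shoiab Ali"),
    ("Gul Mast", "Tahira Ijaz"),
]

# Equi-join: pair each employee row with the dependant rows whose
# EmployeeName equals the employee's Name.
joined = [
    (name, address, dep_name)
    for name, address in employee
    for dep_name, emp_name in dependant
    if name == emp_name
]

# Regrouping the joined rows by employee recovers the original non-1NF layout.
original = {}
for name, address, dep_name in joined:
    original.setdefault((name, address), []).append(dep_name)

for (name, address), deps in original.items():
    print(name, "|", address, "|", ", ".join(deps))
```

In SQL the same result would come from joining the two tables on the name columns; the list comprehension plays the role of that join condition.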

Page 25: Week07 - Normalization


Normal Forms

(2N Form)

A relation is in second normal form (2NF) if and only if it is in first normal form and all the non-key attributes are fully functionally dependent on the key

Partial dependencies are not allowed

The only time we have to be concerned about 2NF is when the key is composite

Second normal form (2NF) addresses the concept of removing duplicated data

A relation that is not in 2NF exhibits the update, insertion and deletion anomalies

Consider the example on the next slide

Page 26: Week07 - Normalization


Normal Forms

(2N Form)

Employee (name, job, salary, address)

name   job         salary  address
Asad   Welder      2300    Peshawar
Asad   Programmer  30000   Peshawar
Karim  Programmer  45000   Lahore
Javad  Designer    40000   Lahore

FDs

name + job → salary

name → address

The problems developing here are redundancy and anomalies

Here address depends only on the name, not the job; this is an example of a partial dependency

Page 27: Week07 - Normalization


Normal Forms

(2N Form)

The table is already in 1NF

The process for transforming a 1NF table to 2NF is:

1. Identify any partial determinants (attributes on the LHS of an FD) other than the composite key, and the columns they determine

2. Create and name a new table for each determinant and the unique columns it determines

3. Move the determined columns from the original table to the new table. The determinant becomes the primary key of the new table

4. Delete the columns you just moved from the original table, except for the determinant, which will serve as a foreign key

5. The original table may be renamed to maintain semantic meaning

Page 28: Week07 - Normalization


Normal Forms

(2N Form)

Consider the non-2NF relation

name   job         salary  address
Asad   Welder      2300    Peshawar
Asad   Programmer  30000   Peshawar
Karim  Programmer  45000   Lahore
Javad  Designer    40000   Lahore

Applying steps 1 to 5 gives two 2NF tables:

NameJob
name   job         salary
Asad   Welder      2300
Asad   Programmer  30000
Karim  Programmer  45000
Javad  Designer    40000

NameAdd
name   address
Asad   Peshawar
Karim  Lahore
Javad  Lahore
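The 1NF-to-2NF steps can be sketched in a few lines: find the FDs whose left-hand side is a proper subset of the composite key, then project the table onto the resulting column groups. The function name `project` and the data layout are assumptions made for illustration; the table names NameJob and NameAdd follow the slide.

```python
# Employee rows from the slide; the composite key is (name, job).
employee = [
    {"name": "Asad", "job": "Welder", "salary": 2300, "address": "Peshawar"},
    {"name": "Asad", "job": "Programmer", "salary": 30000, "address": "Peshawar"},
    {"name": "Karim", "job": "Programmer", "salary": 45000, "address": "Lahore"},
    {"name": "Javad", "job": "Designer", "salary": 40000, "address": "Lahore"},
]
key = {"name", "job"}
fds = [
    ({"name", "job"}, {"salary"}),
    ({"name"}, {"address"}),
]

# Step 1: a partial dependency has a determinant that is a proper subset of the key.
partial = [(lhs, rhs) for lhs, rhs in fds if lhs < key]


def project(rows, columns):
    """Project rows onto columns, dropping duplicate tuples but keeping order."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in seen:
            seen.add(values)
            result.append(values)
    return result


# Steps 2 to 4: move address into its own table keyed by name,
# and keep name in the original table as a foreign key.
name_add = project(employee, ["name", "address"])
name_job = project(employee, ["name", "job", "salary"])

print(partial)
print(name_add)
print(name_job)
```

After the split, Asad's address is stored once instead of once per job.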

Page 29: Week07 - Normalization


Normal Forms

(3N Form)

A relational table is in 3NF if it is already in 2NF and every non-key column is non-transitively dependent upon its primary key

OR

All non-key attributes are functionally dependent only upon the primary key

Transitive Dependency

A transitive dependency occurs when one non-key attribute determines another non-key attribute

E.g.

STD(stId, stName, stAdr, prName, prCrdts)

Page 30: Week07 - Normalization


Normal Forms

(3N Form)

stId → stName, stAdr, prName, prCrdts

prName → prCrdts

Here stId is the key, but prCrdts can also be determined by prName (Transitive Dependency)

Thus a transitive dependency exists, which means the STD relation is not in 3NF

Transitive dependencies cause insertion, deletion, and update anomalies

Page 31: Week07 - Normalization


Normal Forms

(3N Form)

So for 3NF we will concentrate on relations with one candidate key

The process for transforming a 2NF table to 3NF is:

1. Identify any determinants, other than the primary key, and the columns they determine

2. Create and name a new table for each determinant and the unique columns it determines

3. Move the determined columns from the original table to the new table. The determinant becomes the primary key of the new table

4. Delete the columns you just moved from the original table, except for the determinant, which will serve as a foreign key

5. The original table may be renamed to maintain semantic meaning

Page 32: Week07 - Normalization


Normal Forms

(3N Form)

Non-3NF

stdId  stName       stAdr          prName  prCrdts
S1020  Sohail Dar   I-8 Islamabad  MCS     64
S1038  Shoaib Ali   G-6 Islamabad  BCS     132
S1015  Tahira Ejaz  L Rukh Wah     MCS     64

Applying steps 1 to 5 gives two 3NF tables:

Program
prName  prCrdts
MCS     64
BCS     132

Student
stdId  stName       stAdr          prName
S1020  Sohail Dar   I-8 Islamabad  MCS
S1038  Shoaib Ali   G-6 Islamabad  BCS
S1015  Tahira Ejaz  L Rukh Wah     MCS
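The 3NF split can be checked by projecting the original rows into the two tables and joining them back on prName. This is a minimal sketch; the tuple layout and variable names are assumptions made for illustration.

```python
# Non-3NF rows from the slide: (stdId, stName, stAdr, prName, prCrdts).
std = [
    ("S1020", "Sohail Dar", "I-8 Islamabad", "MCS", 64),
    ("S1038", "Shoaib Ali", "G-6 Islamabad", "BCS", 132),
    ("S1015", "Tahira Ejaz", "L Rukh Wah", "MCS", 64),
]

# Program(prName, prCrdts): the determinant prName and the column it determines.
# The set comprehension drops the duplicate (MCS, 64) pair.
program = sorted({(pr_name, credits) for _, _, _, pr_name, credits in std})

# Student(stdId, stName, stAdr, prName): prName stays behind as a foreign key.
student = [(sid, name, adr, pr_name) for sid, name, adr, pr_name, _ in std]

# Joining Student with Program on prName reproduces the original rows,
# so no information was lost by the decomposition.
credits_of = dict(program)
rejoined = [
    (sid, name, adr, pr_name, credits_of[pr_name])
    for sid, name, adr, pr_name in student
]

print(program)
print(rejoined == std)
```

The credits for MCS are now stored once, so changing them requires updating a single row of Program.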

Page 33: Week07 - Normalization


Normal Forms

(Boyce-Codd Normal Form (BCNF))

2NF and 3NF identify partial and transitive dependencies on the basis of the primary key

But what if such dependencies remain on other candidate keys, if any exist?

BCNF considers the functional dependencies on all candidate keys

It means that if a relation has only one candidate key and it is in 3NF, it is also in BCNF

A 3NF relation may fail to be in BCNF when:

The candidate keys in the relation are composite keys

There is more than one candidate key in the relation

Page 34: Week07 - Normalization


Normal Forms

(Boyce-Codd Normal Form (BCNF))

The keys are not disjoint, that is, some attributes in the keys are common

For example, consider the Enroll relation

Enroll (sno, sname, cno, cname, dateofenroll)

Let us assume that the relation has the following candidate keys:

sno, cno

sno, cname

sname, cno

sname, cname

The relation is in 3NF but not in BCNF because there are dependencies

Page 35: Week07 - Normalization


Normal Forms

(Boyce-Codd Normal Form (BCNF))

sno → sname

cno → cname

where attributes that are part of one candidate key are dependent on part of another candidate key

Decomposing the Enroll relation into the following three relations results in BCNF

Std(sno, sname)

Crs(cno, cname)

Std_Crs(sno, cno, dateofenroll)

Page 36: Week07 - Normalization


Normal Forms

(Boyce-Codd Normal Form (BCNF))

Non-BCNF

sno    sname        cno   cname    dateofenroll
S1020  Sohail Dar   C104  E-Com    12/02/2007
S1038  Shoaib Ali   C104  E-Com    10/01/2008
S1015  Tahira Ejaz  H234  English  01/02/2007

Std
sno    sname
S1020  Sohail Dar
S1038  Shoaib Ali
S1015  Tahira Ejaz

Crs
cno   cname
C104  E-Com
H234  English

Std_Crs
sno    cno   dateofenroll
S1020  C104  12/02/2007
S1038  C104  10/01/2008
S1015  H234  01/02/2007
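A BCNF check can be sketched as: for every non-trivial FD, the left-hand side must be a superkey, that is, its closure must cover the whole relation. In the sketch below the FDs sname → sno and cname → cno are assumptions inferred from the stated candidate keys (they are not written on the slides), and the function names are likewise chosen for illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left-hand side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attributes
    ]


enroll = {"sno", "sname", "cno", "cname", "dateofenroll"}
enroll_fds = [
    ({"sno"}, {"sname"}),
    ({"sname"}, {"sno"}),  # assumption implied by the candidate keys
    ({"cno"}, {"cname"}),
    ({"cname"}, {"cno"}),  # assumption implied by the candidate keys
    ({"sno", "cno"}, {"dateofenroll"}),
]

# Enroll violates BCNF: sno, sname, cno and cname alone are not superkeys.
enroll_problems = bcnf_violations(enroll, enroll_fds)

# Each decomposed relation, checked with the FDs that apply to it.
std_problems = bcnf_violations({"sno", "sname"}, enroll_fds[:2])
crs_problems = bcnf_violations({"cno", "cname"}, enroll_fds[2:4])
std_crs_problems = bcnf_violations({"sno", "cno", "dateofenroll"}, enroll_fds[4:])

print(len(enroll_problems), std_problems, crs_problems, std_crs_problems)
```

Enroll reports four violating FDs, while Std, Crs and Std_Crs report none, which matches the claim that the three-way decomposition is in BCNF.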

Page 37: Week07 - Normalization


Other Normal Forms

4NF deals with multi-valued dependencies, and 5NF deals with join dependencies and possible lossless decompositions

Domain Key Normal Form (DKNF) further reduces the chances of any possible inconsistency