Normalization

Are we Normal

Normalization

Normalization is the process of converting complex data structures into simple, stable data structures.

It is also the process of removing certain "anomalies" from a database.

Anomalies

Update anomalies: you have to update a record in a number of different places.

Insertion anomalies: for example, in order to insert a new employee, a project must be assigned. If there is no project yet, a phantom one must be created.

Deletion anomalies come in two types: when you delete a record, other vital information is lost, or you must delete in several places, with the possibility of leaving unattached data islands.
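The insertion and deletion anomalies described above can be reproduced in a few lines. This is only a sketch: the table employee_project, its columns, and the employee name are hypothetical and are not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical un-normalized table: each row mixes employee and project facts.
conn.execute(
    "CREATE TABLE employee_project ("
    " employee_name TEXT NOT NULL,"
    " project_name TEXT NOT NULL)"
)

def try_insert(employee, project):
    """Return True if the row was stored, False if the database rejected it."""
    try:
        conn.execute(
            "INSERT INTO employee_project VALUES (?, ?)", (employee, project)
        )
        return True
    except sqlite3.IntegrityError:
        return False

# Insertion anomaly: a new employee with no project cannot be stored...
rejected = not try_insert("Ada", None)
# ...unless a phantom project is invented just to satisfy the table.
workaround = try_insert("Ada", "PHANTOM")

# Deletion anomaly: removing the phantom project row also removes the employee.
conn.execute("DELETE FROM employee_project WHERE project_name = 'PHANTOM'")
remaining = conn.execute("SELECT COUNT(*) FROM employee_project").fetchone()[0]
print(rejected, workaround, remaining)
```

Once the phantom project is deleted, nothing about the employee is left in the table, which is the first kind of deletion anomaly.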

Normal Forms

There are many normal forms, or stages of normalization, but we will focus only on the first three.

First Normal Form

1. There are no duplicated rows in the table.
2. Each cell is single-valued (i.e., there are no repeating groups or arrays).
3. Entries in a column (attribute, field) are of the same kind.

Example Table 1

CDID  CDTitle              TrackTitle                                            Artist   Artist Country
1     Sergeant Pepper      Sergeant Pepper, Lucy in the Sky, With a little Help  Beatles  UK
2     Blood on the Tracks  Tangled up in Blue, Idiot Wind                        Dylan    US

CDS Table 1

Another Example Table

CDID  CDTitle              Track1                                    Track2                         Track3
1     Sergeant Pepper      Sergeant Peppers lonely hearts club band  Lucy in the sky with diamonds  With a little help
2     Blood on the Tracks  Tangled up in Blue                        Idiot Wind                     Bucket of Rain

Normalizing

The sample tables have repeating groups, i.e., the tracks associated with each CD.

Each column must contain only a single value.

You also don't want to find yourself numbering columns like Track1, Track2, etc.

The next table puts the sample table into first normal form.

First Normal Form Sample

CDID  CDTitle              TrackTitle          Artist   Artist Country
1     Sergeant Pepper      Sergeant Pepper     Beatles  UK
1     Sergeant Pepper      Lucy in the Sky     Beatles  UK
1     Sergeant Pepper      With a little Help  Beatles  UK
2     Blood on the Tracks  Tangled up in Blue  Dylan    US
2     Blood on the Tracks  Idiot Wind          Dylan    US
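The step from Example Table 1 to the first normal form sample can be sketched in code. The function name to_first_normal_form and the dictionary keys are illustrative choices, and the sketch assumes the track titles in a cell are separated by a comma and a space, which would break for a title that itself contains a comma.

```python
# Example Table 1: the TrackTitle cell holds several values at once.
cds_table_1 = [
    {"CDID": 1, "CDTitle": "Sergeant Pepper",
     "TrackTitle": "Sergeant Pepper, Lucy in the Sky, With a little Help",
     "Artist": "Beatles", "ArtistCountry": "UK"},
    {"CDID": 2, "CDTitle": "Blood on the Tracks",
     "TrackTitle": "Tangled up in Blue, Idiot Wind",
     "Artist": "Dylan", "ArtistCountry": "US"},
]

def to_first_normal_form(rows, multi_valued_column, separator=", "):
    """Emit one output row per value found in the multi-valued column."""
    result = []
    for row in rows:
        for value in row[multi_valued_column].split(separator):
            new_row = dict(row)  # copy the single-valued columns unchanged
            new_row[multi_valued_column] = value.strip()
            result.append(new_row)
    return result

first_nf = to_first_normal_form(cds_table_1, "TrackTitle")
for row in first_nf:
    print(row["CDID"], row["CDTitle"], row["TrackTitle"], sep=" | ")
```

Every cell now holds a single value, at the cost of repeating the CD and artist information on each track row.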

Second Normal Form

• A table is in 2NF if it is in 1NF and every non-key attribute depends on the whole key, not on just part of it.

• This is called full functional dependency.
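A functional dependency can be tested against sample rows. The helper below (the name holds is an assumption, not a standard function) checks whether rows that agree on the determinant columns also agree on the dependent columns. It uses three columns of the first normal form sample and treats (CDID, TrackTitle) as the key. Note that sample data can only refute a dependency; it cannot prove that one holds for all future data.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds in the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        # The first row with this key records its value; later rows must match.
        if seen.setdefault(key, value) != value:
            return False
    return True

first_nf = [
    {"CDID": 1, "CDTitle": "Sergeant Pepper", "TrackTitle": "Sergeant Pepper"},
    {"CDID": 1, "CDTitle": "Sergeant Pepper", "TrackTitle": "Lucy in the Sky"},
    {"CDID": 1, "CDTitle": "Sergeant Pepper", "TrackTitle": "With a little Help"},
    {"CDID": 2, "CDTitle": "Blood on the Tracks", "TrackTitle": "Tangled up in Blue"},
    {"CDID": 2, "CDTitle": "Blood on the Tracks", "TrackTitle": "Idiot Wind"},
]

# CDTitle is determined by CDID alone, which is only part of the key,
# so the first normal form sample is not in 2NF.
partial = holds(first_nf, ["CDID"], ["CDTitle"])
# CDID alone does not determine TrackTitle.
cd_to_track = holds(first_nf, ["CDID"], ["TrackTitle"])
print(partial, cd_to_track)
```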

Normalizing...

In our sample table there are really two separate things going on: one is the CD information and one is the track information.

Getting all the track information creates a lot of redundancy in the CD information.

Each should be dependent on its own key.

Second Normal Form Sample

CDID  CDTitle
1     Sergeant Pepper
2     Blood on the Tracks

TrackID  TrackTitle          CDID  Artist   Artist Country
1        Sergeant Pepper     1     Beatles  UK
2        Lucy in the Sky     1     Beatles  UK
3        With a little help  1     Beatles  UK
4        Tangled up in Blue  2     Dylan    US
5        Idiot Wind          2     Dylan    US
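The split into a CD table and a track table can be sketched as follows. The variable names are illustrative, and the sketch assumes that TrackID is a new key assigned in row order, as the numbering in the sample suggests.

```python
# Rows of the first normal form sample:
# (CDID, CDTitle, TrackTitle, Artist, Artist Country)
first_nf = [
    (1, "Sergeant Pepper", "Sergeant Pepper", "Beatles", "UK"),
    (1, "Sergeant Pepper", "Lucy in the Sky", "Beatles", "UK"),
    (1, "Sergeant Pepper", "With a little help", "Beatles", "UK"),
    (2, "Blood on the Tracks", "Tangled up in Blue", "Dylan", "US"),
    (2, "Blood on the Tracks", "Idiot Wind", "Dylan", "US"),
]

cds = {}     # CDID -> CDTitle, stored once per CD
tracks = []  # (TrackID, TrackTitle, CDID, Artist, Artist Country)
for track_id, row in enumerate(first_nf, start=1):
    cd_id, cd_title, track_title, artist, country = row
    cds[cd_id] = cd_title
    tracks.append((track_id, track_title, cd_id, artist, country))

print(cds)
for track in tracks:
    print(track)
```

The CD title now appears once per CD, and each track row refers back to its CD through CDID.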

Third Normal Form

• A table is in 3NF if it is in 2NF and if it has no transitive dependencies.

• This means that the non-primary-key attributes don't depend on each other.

• Look at our second sample table:

Sample Table

TrackID  TrackTitle          CDID  Artist   Artist Country
1        Sergeant Pepper     1     Beatles  UK
2        Lucy in the Sky     1     Beatles  UK
3        With a little help  1     Beatles  UK
4        Tangled up in Blue  2     Dylan    US
5        Idiot Wind          2     Dylan    US

Normalizing

There is a transitive dependency here: Artist Country is dependent on Artist, not on TrackID, which is the key field of the table.

The following tables resolve this:

Better

TrackID  TrackTitle          CDID  ArtistID
1        Sergeant Pepper     1     1
2        Lucy in the Sky     1     1
3        With a little Help  1     1
4        Tangled up in Blue  2     2
5        Idiot Wind          2     2

ArtistID  Artist   ArtistCountry
1         Beatles  UK
2         Dylan    US
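Removing the transitive dependency amounts to moving each artist and its country into a separate table and leaving only an ArtistID behind. A minimal sketch, assuming ArtistID values are assigned in order of first appearance (the variable names are illustrative):

```python
# Track table from the second normal form sample:
# (TrackID, TrackTitle, CDID, Artist, Artist Country)
tracks_2nf = [
    (1, "Sergeant Pepper", 1, "Beatles", "UK"),
    (2, "Lucy in the Sky", 1, "Beatles", "UK"),
    (3, "With a little Help", 1, "Beatles", "UK"),
    (4, "Tangled up in Blue", 2, "Dylan", "US"),
    (5, "Idiot Wind", 2, "Dylan", "US"),
]

artist_ids = {}  # Artist name -> ArtistID
artists = []     # (ArtistID, Artist, ArtistCountry)
tracks = []      # (TrackID, TrackTitle, CDID, ArtistID)
for track_id, title, cd_id, artist, country in tracks_2nf:
    if artist not in artist_ids:
        # First time this artist is seen: give it the next ID.
        artist_ids[artist] = len(artist_ids) + 1
        artists.append((artist_ids[artist], artist, country))
    tracks.append((track_id, title, cd_id, artist_ids[artist]))

print(artists)
print(tracks)
```

Each country is now stored once per artist, so correcting it means changing a single row.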

Summary

Through the process of normalization our original table has become three tables, related by foreign keys:

CDs(CDID, CDTitle)
ARTISTS(ArtistID, Artist, ArtistCountry)
TRACKS(TrackID, TrackTitle, CDID, ArtistID)
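One way to express these three tables and their foreign keys is with SQLite through Python's sqlite3 module. The column types and NOT NULL constraints are assumptions, since the slides give only the column names. The join at the end rebuilds the rows of the first normal form sample, which shows that no information was lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE CDs (CDID INTEGER PRIMARY KEY, CDTitle TEXT NOT NULL);
CREATE TABLE Artists (
    ArtistID INTEGER PRIMARY KEY,
    Artist TEXT NOT NULL,
    ArtistCountry TEXT
);
CREATE TABLE Tracks (
    TrackID INTEGER PRIMARY KEY,
    TrackTitle TEXT NOT NULL,
    CDID INTEGER NOT NULL REFERENCES CDs(CDID),
    ArtistID INTEGER NOT NULL REFERENCES Artists(ArtistID)
);
""")
conn.executemany("INSERT INTO CDs VALUES (?, ?)",
                 [(1, "Sergeant Pepper"), (2, "Blood on the Tracks")])
conn.executemany("INSERT INTO Artists VALUES (?, ?, ?)",
                 [(1, "Beatles", "UK"), (2, "Dylan", "US")])
conn.executemany("INSERT INTO Tracks VALUES (?, ?, ?, ?)", [
    (1, "Sergeant Pepper", 1, 1),
    (2, "Lucy in the Sky", 1, 1),
    (3, "With a little Help", 1, 1),
    (4, "Tangled up in Blue", 2, 2),
    (5, "Idiot Wind", 2, 2),
])

# Join the three tables back into the shape of the first normal form sample.
rows = conn.execute("""
    SELECT c.CDID, c.CDTitle, t.TrackTitle, a.Artist, a.ArtistCountry
    FROM Tracks t
    JOIN CDs c ON c.CDID = t.CDID
    JOIN Artists a ON a.ArtistID = t.ArtistID
    ORDER BY t.TrackID
""").fetchall()

# A track that points at a non-existent artist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO Tracks VALUES (6, 'Orphan', 1, 99)")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False

for row in rows:
    print(row)
```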

MORE…

1. Boyce-Codd Normal Form
2. Fourth Normal Form
3. Fifth Normal Form
