ITD102_Lecture 6-Normalization Process

8/13/2019 ITD102_Lecture 6-Normalization Process

1/39

Database Normalization

Engr. Julius Cansino


2/39

What is Normalization

Normalization allows us to organize data so that it:

Allows faster access (dependencies make sense)

Takes up less space (less redundancy)

Normalization is a method of organizing data elements into tables.


3/39

Normal Forms

Normalization is done by transforming data into various Normal Forms.

There are 5 Normal Forms, but we almost never use 4NF or 5NF.

We will only be concerned with 1NF, 2NF, and 3NF.


4/39

For a database to be in a normal form, it must meet all requirements of the previous forms:

E.g., for a database to be in 2NF, it must already be in 1NF. For a database to be in 3NF, it must already be in 1NF and 2NF.


5/39

Sample Data

Manager Employees

Bob Susie, Eric

Edward Bella, Andrew

Taylor Mark, Jane

This data has some problems:

The Employees column is not atomic.

A column must be atomic, meaning that it can only hold a single item of data. This column holds more than one employee name.


6/39

Data that is not atomic means:

We can't easily sort the data

We can't easily search or index the data

We can't easily change the data

We can't easily reference the data in other tables

Manager  Employees
Bob      Susie, Eric
Edward   Bella, Andrew
Taylor   Mark, Jane
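To see why a non-atomic column is awkward to search, here is a minimal Python sketch using the sample data above. The helper name `manager_of` is made up for illustration. Notice that every Employees value has to be split apart before a single name can be compared.

```python
# Sample data from the slide: the Employees column holds several names per row.
rows = [
    ("Bob", "Susie, Eric"),
    ("Edward", "Bella, Andrew"),
    ("Taylor", "Mark, Jane"),
]

def manager_of(employee):
    """Return the manager of `employee`, or None if the name is not found."""
    for manager, employees in rows:
        # The column is not atomic, so each value must be split before comparing.
        names = [name.strip() for name in employees.split(",")]
        if employee in names:
            return manager
    return None

print(manager_of("Andrew"))  # prints Edward
```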


7/39

Manager Employee1 Employee2

Bob Susie Eric

Edward Bella Andrew

Taylor Mark Jane

Breaking the Employee column into more than 1 column doesn't solve our problems:

The data may look atomic, but only because we have many identical columns storing a single piece of data instead of a single column storing many pieces of data.


8/39

We still can't easily sort, search, or index our employees.

What if a manager has more than 2 employees: 10 employees, 100 employees? We'd need to add columns to our database just for these cases.

It is still hard to reference our employees in other tables.
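The search problem can be sketched with Python's built-in `sqlite3` module. The table name `staff` is an assumption for this example; the columns and rows are the ones from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (Manager TEXT, Employee1 TEXT, Employee2 TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [
        ("Bob", "Susie", "Eric"),
        ("Edward", "Bella", "Andrew"),
        ("Taylor", "Mark", "Jane"),
    ],
)

# Every EmployeeN column has to be named in the WHERE clause. Adding an
# Employee3 column later would mean rewriting this query as well.
query = "SELECT Manager FROM staff WHERE Employee1 = ? OR Employee2 = ?"
manager = conn.execute(query, ("Jane", "Jane")).fetchone()[0]
print(manager)  # prints Taylor
```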




9/39

By the way, what would be a good choice of a Primary Key for this table?

Manager  Employee1  Employee2
Bob      Susie      Eric
Edward   Bella      Andrew
Taylor   Mark       Jane


10/39

First Normal Form

1NF means that we must:

Eliminate duplicate columns from the same table, and

Create separate tables for each group of related data, each with a unique row identifier (primary key)

Let's get started by making our columns atomic


11/39

Atomic Data

By breaking each tuple of our table into an entry for each employee, we have made our data atomic.

What would be the primary key?

Manager  Employee
Bob      Susie
Bob      Eric
Edward   Bella
Edward   Andrew
Taylor   Mark
Taylor   Jane
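The step of breaking each tuple into one entry per employee can be sketched in a few lines of Python. The variable names `wide_rows` and `atomic_rows` are assumptions for this example.

```python
# Original rows: one manager with a comma-separated list of employees.
wide_rows = [
    ("Bob", "Susie, Eric"),
    ("Edward", "Bella, Andrew"),
    ("Taylor", "Mark, Jane"),
]

# Emit one (Manager, Employee) pair per employee so every value is atomic.
atomic_rows = [
    (manager, name.strip())
    for manager, employees in wide_rows
    for name in employees.split(",")
]

for row in atomic_rows:
    print(row)
```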


12/39

Primary Key

The best primary key would be the Employee column.

Every employee has only one manager, so each employee appears in exactly one row and the Employee value is unique.

Employee  Manager
Susie     Bob
Eric      Bob
Bella     Edward
Andrew    Edward
Mark      Taylor
Jane      Taylor
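As a rough check of this choice, the sketch below loads the table into an in-memory SQLite database with Employee declared as the primary key. The table name `staff` and the extra test row are assumptions for illustration. The database then refuses a second row for the same employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (Employee TEXT PRIMARY KEY, Manager TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [
        ("Susie", "Bob"),
        ("Eric", "Bob"),
        ("Bella", "Edward"),
        ("Andrew", "Edward"),
        ("Mark", "Taylor"),
        ("Jane", "Taylor"),
    ],
)

# A second row for Susie violates the primary key and is rejected.
try:
    conn.execute("INSERT INTO staff VALUES (?, ?)", ("Susie", "Taylor"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # prints True
```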


13/39

First Normal Form

Congratulations!

The fact that all our data and columns are atomic and we have a primary key means that we are in 1NF!

Employee  Manager
Susie     Bob
Eric      Bob
Bella     Edward
Andrew    Edward
Mark      Taylor
Jane      Taylor


14/39

First Normal Form Revised

Of course, there may come a day when we hire a second employee or manager with the same name. To avoid this, let's use an employee ID instead of their name.

ID  Employee  ManagerID
1   Susie     7
2   Eric      7
3   Bella     8
4   Andrew    8
5   Mark      9
6   Jane      9
7   Bob
8   Edward
9   Taylor
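One way to read this revised table is as a table that references itself: ManagerID points back at the ID column of the same table. Below is a minimal sketch using Python's `sqlite3` module. The table name `employee` and the use of NULL for people without a manager are assumptions, since the slide simply leaves those cells empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.execute(
    """
    CREATE TABLE employee (
        ID        INTEGER PRIMARY KEY,
        Employee  TEXT NOT NULL,
        ManagerID INTEGER REFERENCES employee(ID)
    )
    """
)

# Managers go in first so that every ManagerID already exists when it is used.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        (7, "Bob", None), (8, "Edward", None), (9, "Taylor", None),
        (1, "Susie", 7), (2, "Eric", 7),
        (3, "Bella", 8), (4, "Andrew", 8),
        (5, "Mark", 9), (6, "Jane", 9),
    ],
)

# Join the table to itself to turn each ManagerID back into a name.
pairs = conn.execute(
    """
    SELECT e.Employee, m.Employee
    FROM employee AS e
    JOIN employee AS m ON e.ManagerID = m.ID
    ORDER BY e.ID
    """
).fetchall()

for employee_name, manager_name in pairs:
    print(employee_name, "reports to", manager_name)
```

Because names are now ordinary data, two people called Bob would simply get two different IDs.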


15/39


16/39

Moving to Second Normal Form

A database in 2NF must also be in 1NF:

Data must be atomic

Every row (or tuple) must have a unique primary key

Plus:

Subsets of data that apply to multiple rows (repeating data) are moved to separate tables


17/39

CustID  FirstName  LastName    Address           City       State  Zip
1       Bob        Smith       123 Main St.      Tucson     AZ     12345
2       John       Brown       555 2nd Ave.      St. Paul   MN     54355
3       Sandy      Jessop      4256 James St.    Chicago    IL     43555
4       Maria      Hernandez   4599 Columbia     Vancouver  BC     V5N 1M0
5       Gameil     Hintz       569 Summit St.    St. Paul   MN     54355
6       James      Richardson  12 Cameron Bay    Regina     SK     S4T 2V8
7       Shiela     Green       12 Michigan Ave.  Chicago    IL     43555
8       Ian        Sampson     56 Manitoba St.   Winnipeg   MB     M5W 9N7
9       Ed         Rodgers     15 Athol St.      Regina     SK     S4T 2V9

This data is in 1NF: all fields are atomic and the CustID serves as the primary key


18/39

But let's pay attention to the City, State, and Zip fields:

There are 2 rows of repeating data: one for Chicago, and one for St. Paul.

Each repeats the same city, state, and Zip code as an earlier row.

City       State  Zip
Tucson     AZ     12345
St. Paul   MN     54355
Chicago    IL     43555
Vancouver  BC     V5N 1M0
St. Paul   MN     54355
Regina     SK     S4T 2V8
Chicago    IL     43555
Winnipeg   MB     M5W 9N7
Regina     SK     S4T 2V9


19/39

The CustID determines all the data in the row, but U.S. Zip codes determine the City and State (e.g., a given Zip code can only belong to one city and state, so storing Zip codes with a City and State is redundant).

This means that City and State are functionally dependent on the value in Zip, and not only on the primary key.
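A functional dependency can be tested against sample data. The sketch below is one possible check, not a formal proof: it loads the City, State, and Zip columns from the customer data on the earlier slide into SQLite and looks for any Zip value that appears with more than one City and State pair. The table name `customer` and the variable names are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (CustID INTEGER PRIMARY KEY, City TEXT, State TEXT, Zip TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        (1, "Tucson", "AZ", "12345"),
        (2, "St. Paul", "MN", "54355"),
        (3, "Chicago", "IL", "43555"),
        (4, "Vancouver", "BC", "V5N 1M0"),
        (5, "St. Paul", "MN", "54355"),
        (6, "Regina", "SK", "S4T 2V8"),
        (7, "Chicago", "IL", "43555"),
        (8, "Winnipeg", "MB", "M5W 9N7"),
        (9, "Regina", "SK", "S4T 2V9"),
    ],
)

# A Zip that maps to more than one (City, State) pair would break the
# dependency Zip -> City, State. An empty result means it holds for this data.
zip_conflicts = conn.execute(
    """
    SELECT Zip FROM customer
    GROUP BY Zip
    HAVING COUNT(DISTINCT City || '|' || State) > 1
    """
).fetchall()

# The reverse direction does not hold: one city can have several codes.
city_conflicts = conn.execute(
    """
    SELECT City, State FROM customer
    GROUP BY City, State
    HAVING COUNT(DISTINCT Zip) > 1
    """
).fetchall()

print(zip_conflicts)   # prints []
print(city_conflicts)  # prints [('Regina', 'SK')]
```

Sample data can only disprove a dependency. Whether Zip always determines City and State is a rule about the real world that the designer has to confirm.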


20/39

To be in 2NF, this repeating data must be in its own table.

So:

Let's create a Zip code table that maps Zip codes to their City and State.

Note that Canadian Postal Codes are different: the same city and province can have many different postal codes.


21/39

Our Data in 2NF

Customer Table

CustID  FirstName  LastName    Address           Zip
1       Bob        Smith       123 Main St.      12345
2       John       Brown       555 2nd Ave.      54355
3       Sandy      Jessop      4256 James St.    43555
4       Maria      Hernandez   4599 Columbia     V5N 1M0
5       Gameil     Hintz       569 Summit St.    54355
6       James      Richardson  12 Cameron Bay    S4T 2V8
7       Shiela     Green       12 Michigan Ave.  43555
8       Ian        Sampson     56 Manitoba St.   M5W 9N7
9       Ed         Rodgers     15 Athol St.      S4T 2V9

Zip Code Table

Zip      City       State
12345    Tucson     AZ
54355    St. Paul   MN
43555    Chicago    IL
V5N 1M0  Vancouver  BC
S4T 2V8  Regina     SK
M5W 9N7  Winnipeg   MB
S4T 2V9  Regina     SK

We see that we can actually save 2 rows in the Zip Code table by removing these redundancies: 9 customer records only need 7 Zip code records.

Zip code becomes a foreign key in the customer table linked to the primary key in the Zip code table.
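The two tables and the foreign key between them might be declared as follows. This is a sketch using Python's `sqlite3` module; the table names `customer` and `zipcode` and the column types are assumptions, while the rows come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript(
    """
    CREATE TABLE zipcode (
        Zip   TEXT PRIMARY KEY,
        City  TEXT NOT NULL,
        State TEXT NOT NULL
    );
    CREATE TABLE customer (
        CustID    INTEGER PRIMARY KEY,
        FirstName TEXT,
        LastName  TEXT,
        Address   TEXT,
        Zip       TEXT REFERENCES zipcode(Zip)
    );
    """
)

# Parent rows first: 7 Zip code records are enough for all 9 customers.
conn.executemany(
    "INSERT INTO zipcode VALUES (?, ?, ?)",
    [
        ("12345", "Tucson", "AZ"),
        ("54355", "St. Paul", "MN"),
        ("43555", "Chicago", "IL"),
        ("V5N 1M0", "Vancouver", "BC"),
        ("S4T 2V8", "Regina", "SK"),
        ("M5W 9N7", "Winnipeg", "MB"),
        ("S4T 2V9", "Regina", "SK"),
    ],
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Bob", "Smith", "123 Main St.", "12345"),
        (2, "John", "Brown", "555 2nd Ave.", "54355"),
        (3, "Sandy", "Jessop", "4256 James St.", "43555"),
        (4, "Maria", "Hernandez", "4599 Columbia", "V5N 1M0"),
        (5, "Gameil", "Hintz", "569 Summit St.", "54355"),
        (6, "James", "Richardson", "12 Cameron Bay", "S4T 2V8"),
        (7, "Shiela", "Green", "12 Michigan Ave.", "43555"),
        (8, "Ian", "Sampson", "56 Manitoba St.", "M5W 9N7"),
        (9, "Ed", "Rodgers", "15 Athol St.", "S4T 2V9"),
    ],
)

# City and State are recovered with a join whenever they are needed.
row = conn.execute(
    """
    SELECT c.FirstName, z.City, z.State
    FROM customer AS c
    JOIN zipcode AS z ON c.Zip = z.Zip
    WHERE c.CustID = ?
    """,
    (5,),
).fetchone()
print(row)  # prints ('Gameil', 'St. Paul', 'MN')
```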


22/39

Advantages of 2NF

Saves space in the database by reducing redundancies

If a customer calls, you can just ask them for their Zip code and you'll know their city and state! (No more spelling mistakes)

If a city name changes, we only need to make one change to the database.
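The last advantage can be illustrated with a small sketch. The table names and the renaming of St. Paul to "Saint Paul" are hypothetical, chosen only to show that a single UPDATE reaches every customer who shares that Zip code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE zipcode (Zip TEXT PRIMARY KEY, City TEXT, State TEXT);
    CREATE TABLE customer (CustID INTEGER PRIMARY KEY, Zip TEXT REFERENCES zipcode(Zip));
    INSERT INTO zipcode VALUES ('54355', 'St. Paul', 'MN'), ('43555', 'Chicago', 'IL');
    INSERT INTO customer VALUES (2, '54355'), (5, '54355'), (3, '43555'), (7, '43555');
    """
)

# Hypothetical rename: only one row in the Zip code table has to change.
cursor = conn.execute(
    "UPDATE zipcode SET City = ? WHERE Zip = ?", ("Saint Paul", "54355")
)
rows_changed = cursor.rowcount

# Both customers with that Zip code now see the new name through the join.
cities = conn.execute(
    """
    SELECT c.CustID, z.City
    FROM customer AS c
    JOIN zipcode AS z ON c.Zip = z.Zip
    WHERE c.Zip = '54355'
    ORDER BY c.CustID
    """
).fetchall()

print(rows_changed)  # prints 1
print(cities)        # prints [(2, 'Saint Paul'), (5, 'Saint Paul')]
```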


23/39

Summary So Far

1NF: All data is atomic

All rows have a unique primary key

2NF: Data is in 1NF

Subsets of data in multiple columns are moved to a new table

These new tables are related using foreign keys


24/39

Moving to 3NF

To be in 3NF, a database must:

Be in 2NF

Have all columns fully functionally dependent on the primary key (there are no transitive dependencies)


25/39

In this table:

CustID and ProdID depend on the OrderID and on no other column (good)

Stated another way, if you know the OrderID, you know the CustID and the ProdID

So: OrderID → CustID, ProdID

OrderID  CustID  ProdID  Price  Quantity  Total
1        1001    AB-111  50     1,000     50,000
2        1002    AB-111  60     500       30,000
3        1001    ZA-245  35     100       3,500
4        1003    MB-153  82     25        2,050
5        1004    ZA-245  42     10        420
6        1002    ZA-245  40     50        2,000
7        1001    AB-111  75     100       7,500


26/39

But there are some fields that are not dependent on OrderID:

Total is the simple product of Price*Quantity. As such, it has a transitive dependency on Price and Quantity.

Because it is a calculated value, it doesn't need to be included at all.
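Dropping the Total column loses nothing, because the value can be computed in the query that needs it. A minimal sketch with Python's `sqlite3` module, assuming a table named `orders` that stores every column from the slide except Total:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        OrderID  INTEGER PRIMARY KEY,
        CustID   INTEGER,
        ProdID   TEXT,
        Price    INTEGER,
        Quantity INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1001, "AB-111", 50, 1000),
        (2, 1002, "AB-111", 60, 500),
        (3, 1001, "ZA-245", 35, 100),
        (4, 1003, "MB-153", 82, 25),
        (5, 1004, "ZA-245", 42, 10),
        (6, 1002, "ZA-245", 40, 50),
        (7, 1001, "AB-111", 75, 100),
    ],
)

# Total is derived at query time instead of being stored in the table.
totals = conn.execute(
    "SELECT OrderID, Price * Quantity AS Total FROM orders ORDER BY OrderID"
).fetchall()

for order_id, total in totals:
    print(order_id, total)
```

The computed values match the Total column shown on the slide, so the stored copy was redundant.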




27/39

Also, we can see that Price isn't really dependent on ProdID, or OrderID. Customer 1001 bought AB-111 for $50 (in order 1) and for $75 (in order 7), while 1002 spent $60 for each item in order 2.




28/39

Maybe price is dependent on the ProdID and Quantity: the more you buy of a given product, the cheaper that product becomes!

So we ask the business manager and she tells us that this is the case.
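Before asking the business manager, the hunch can be checked against the order data. The sketch below is an exploratory check under assumed names (`by_product`, `price_falls_with_quantity`), not proof of the business rule: it lists the distinct prices per product and tests whether the price drops as the quantity rises.

```python
from collections import defaultdict

# (OrderID, CustID, ProdID, Price, Quantity) rows from the order table.
orders = [
    (1, 1001, "AB-111", 50, 1000),
    (2, 1002, "AB-111", 60, 500),
    (3, 1001, "ZA-245", 35, 100),
    (4, 1003, "MB-153", 82, 25),
    (5, 1004, "ZA-245", 42, 10),
    (6, 1002, "ZA-245", 40, 50),
    (7, 1001, "AB-111", 75, 100),
]

# Collect (Quantity, Price) pairs for each product.
by_product = defaultdict(list)
for order_id, cust_id, prod_id, price, quantity in orders:
    by_product[prod_id].append((quantity, price))

# More than one price per product means ProdID alone does not determine Price.
distinct_prices = {
    prod_id: sorted({price for _, price in pairs})
    for prod_id, pairs in by_product.items()
}

def price_falls_with_quantity(pairs):
    """Return True if the price never rises as the quantity increases."""
    ordered = sorted(pairs)
    return all(
        earlier[1] >= later[1] for earlier, later in zip(ordered, ordered[1:])
    )

trend_holds = all(price_falls_with_quantity(p) for p in by_product.values())

print(distinct_prices["AB-111"])  # prints [50, 60, 75]
print(trend_holds)                # prints True
```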




29/39


30/39

Let's diagram the dependencies.

We can see that all fields are dependent on OrderID, the Primary Key (white lines)



31/39

But Total is also determined by Price and Quantity (yellow lines)

This is a derived field (Price x Quantity = Total)

We can save a lot of space by getting rid of it altogether and just calculating total when we need it



32/39

Price is also determined by both ProdID and Quantity rather than the primary key (red lines). This is called a transitive dependency. We must get rid of transitive dependencies to have 3NF.

OrderID CustID ProdID Price Quantity


33/39

We do this by moving the transitive dependency into a second table

OrderID CustID ProdID Price Quantity


34/39

By splitting out the table, we can quickly adjust our price table to match a competitor's prices or to reflect price changes from our suppliers.

OrderID CustID ProdID Quantity

ProdID Price Quantity


35/39

The second table is our pricing list.

Think of Quantity as a range, with each Quantity value in the pricing list marking the start of its range:

AB-111: 1-100, 101-500, 501 and more

ZA-245: 1-10, 11-50, 51 and more

The primary key for this second table is a composite of ProdID and Quantity.

OrderID  CustID  ProdID  Quantity
1        1001    AB-111  1,000
2        1002    AB-111  500
3        1001    ZA-245  100
4        1003    MB-153  25
5        1004    ZA-245  10
6        1002    ZA-245  50
7        1001    AB-111  100

ProdID  Quantity  Price
AB-111  1         75
AB-111  101       60
AB-111  501       50
ZA-245  1         42
ZA-245  11        40
ZA-245  51        35
MB-153  1         82
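Looking up a price now means finding the range that an order's quantity falls into. Here is a sketch of that lookup in Python, assuming the Quantity column of the pricing list holds the lower bound of each range. The names `price_list` and `unit_price` are invented for this example.

```python
from bisect import bisect_right

# Pricing list from the slide: (lower bound of the quantity range, unit price).
price_list = {
    "AB-111": [(1, 75), (101, 60), (501, 50)],
    "ZA-245": [(1, 42), (11, 40), (51, 35)],
    "MB-153": [(1, 82)],
}

def unit_price(prod_id, quantity):
    """Return the unit price for ordering `quantity` units of `prod_id`."""
    tiers = price_list[prod_id]
    lower_bounds = [lower for lower, _ in tiers]
    # Index of the last range whose lower bound is <= quantity.
    index = bisect_right(lower_bounds, quantity) - 1
    if index < 0:
        raise ValueError("quantity is below the first range")
    return tiers[index][1]

# (OrderID, ProdID, Quantity) from the order table.
orders = [
    (1, "AB-111", 1000),
    (2, "AB-111", 500),
    (3, "ZA-245", 100),
    (4, "MB-153", 25),
    (5, "ZA-245", 10),
    (6, "ZA-245", 50),
    (7, "AB-111", 100),
]

recovered = [unit_price(prod_id, quantity) for _, prod_id, quantity in orders]
print(recovered)  # prints [50, 60, 35, 82, 42, 40, 75]
```

The recovered prices match the Price column of the original order table, so no information was lost by moving Price out.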


36/39

Congratulations! We're now in 3NF!

We can also quickly figure out what price to offer our customers for any quantity they want.



37/39


38/39

Summarizing

A database is in 2NF if:

It is in 1NF

There is no repeating data in its tables.

Put another way, if we use a composite primary key, then all attributes are dependent on all parts of the key.
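The pricing list from the 3NF example has a composite key (ProdID, Quantity), so it makes a handy test case. The sketch below uses an assumed helper, `determined_by`, to check that Price follows from the whole key but not from either part alone, which is what depending on all parts of the key means for this data.

```python
# (ProdID, Quantity, Price) rows from the pricing list.
price_rows = [
    ("AB-111", 1, 75), ("AB-111", 101, 60), ("AB-111", 501, 50),
    ("ZA-245", 1, 42), ("ZA-245", 11, 40), ("ZA-245", 51, 35),
    ("MB-153", 1, 82),
]

def determined_by(rows, key_positions, value_position):
    """Return True if the key columns always map to a single value."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in key_positions)
        value = row[value_position]
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

whole_key = determined_by(price_rows, (0, 1), 2)  # ProdID and Quantity together
prod_only = determined_by(price_rows, (0,), 2)    # ProdID alone
qty_only = determined_by(price_rows, (1,), 2)     # Quantity alone

# Price depends on the full key only if neither part is enough by itself.
fully_dependent = whole_key and not prod_only and not qty_only
print(fully_dependent)  # prints True
```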


39/39

And Finally

A database is in 1NF if:

All its attributes are atomic (meaning they contain only a single unit or type of data), and

All rows have a unique primary key.
