
Copyright © 2007 Robinson College of Business, Georgia State University
David S. McDonald, Ph.D. – Director of Emerging Technologies
Tel: 404-413-7368; e-mail: [email protected]

Relation Normalization

Why Normalization?

Functional Dependencies.

First, Second, and Third Normal Forms.

Denormalization

Why Normalization?

An ill-structured relation contains redundant data

Data redundancy causes modification anomalies:

Insertion anomalies -- Suppose we want to enter SCUBA as an activity that costs $100, we can't until a student signs up for it

Update anomalies -- If we change the price of swimming for student 150, there is no guarantee that student 200 will pay the new price

Deletion anomalies -- If we delete Student 100, we lose not only the fact that he/she is a skier, but also the fact that skiing costs $200

Normalization is the process used to remove modification anomalies

ACTIVITY

SID   Activity   Fee
100   Skiing     200
150   Swimming    50
175   Squash      50
200   Swimming    50

How can this table be changed to fix these problems?
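
One possible answer can be sketched in Python. This is only an illustration, not the slides' own solution: it assumes each student has exactly one activity (as in the sample rows), and the names student_activity, activity_fee, and fee_for are hypothetical. The idea is to store the fee once per activity instead of once per student.

```python
# Rows of the ACTIVITY relation from the slide: (SID, Activity, Fee).
activity = [
    (100, "Skiing", 200),
    (150, "Swimming", 50),
    (175, "Squash", 50),
    (200, "Swimming", 50),
]

# Hypothetical decomposition into two relations (names are illustrative).
student_activity = {sid: act for sid, act, _ in activity}  # SID -> Activity
activity_fee = {act: fee for _, act, fee in activity}      # Activity -> Fee

# Insertion: SCUBA can be priced before any student signs up for it.
activity_fee["SCUBA"] = 100

# Update: the swimming fee is stored once, so every swimmer sees the change.
activity_fee["Swimming"] = 60

# Deletion: removing student 100 no longer loses the skiing fee.
del student_activity[100]


def fee_for(sid):
    """Look up a student's fee through the two relations."""
    return activity_fee[student_activity[sid]]
```

With this split, fee_for(150) and fee_for(200) always agree, because both read the single Swimming entry.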


Why Normalization...

Course

SID  Name    Grade  Course#  Text  Major  Dept
s1   Joseph  A      CIS8110  b1    CIS    CIS
s1   Joseph  B      CIS8120  b2    CIS    CIS
s1   Joseph  A      CIS8140  b5    CIS    CIS
s2   Alice   A      CIS8110  b1    CS     MCS
s2   Alice   A      CIS8140  b5    CS     MCS
s3   Tom     B      CIS8110  b1    Acct   Acct
s3   Tom     B      CIS8140  b5    Acct   Acct
s3   Tom     A      CIS8680  b1    Acct   Acct

Is there any redundant data?

Insertion anomalies?

Update anomalies?

Deletion anomalies?

Functional Dependencies

Given two attributes, X and Y, of a Table T, attribute Y is functionally dependent on attribute X iff each attribute X value always occurs with the same attribute Y value in T.

Employee.ID -> Employee.LastName

List all FDs in the Course relation:
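
A candidate FD can be tested against the Course sample rows with a short Python sketch. The helper name holds is hypothetical, and the rows are the sample data from the Course relation. Note that sample rows can only refute an FD; whether an FD really holds depends on the meaning of the attributes, not on one set of rows.

```python
COLUMNS = ("SID", "Name", "Grade", "Course#", "Text", "Major", "Dept")
ROWS = [
    ("s1", "Joseph", "A", "CIS8110", "b1", "CIS", "CIS"),
    ("s1", "Joseph", "B", "CIS8120", "b2", "CIS", "CIS"),
    ("s1", "Joseph", "A", "CIS8140", "b5", "CIS", "CIS"),
    ("s2", "Alice", "A", "CIS8110", "b1", "CS", "MCS"),
    ("s2", "Alice", "A", "CIS8140", "b5", "CS", "MCS"),
    ("s3", "Tom", "B", "CIS8110", "b1", "Acct", "Acct"),
    ("s3", "Tom", "B", "CIS8140", "b5", "Acct", "Acct"),
    ("s3", "Tom", "A", "CIS8680", "b1", "Acct", "Acct"),
]


def holds(lhs, rhs, rows=ROWS, columns=COLUMNS):
    """Return True if no pair of rows contradicts lhs --> rhs."""
    idx = {name: i for i, name in enumerate(columns)}
    seen = {}
    for row in rows:
        x = tuple(row[idx[a]] for a in lhs)
        y = tuple(row[idx[a]] for a in rhs)
        # The first Y value seen for an X value must match all later ones.
        if seen.setdefault(x, y) != y:
            return False
    return True


print(holds(["SID"], ["Name"]))              # True
print(holds(["Course#"], ["Text"]))          # True
print(holds(["SID"], ["Grade"]))             # False: s1 has grades A and B
print(holds(["SID", "Course#"], ["Grade"]))  # True
```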


Functional Dependencies...

Attribute X is called the determinant of attribute Y.

X and Y may be composite (made up of more than one attribute).

Dependency relationships change with attribute semantics.

Attribute X and Attribute Y could be mutually dependent on each other.

Husband --> Wife, Wife --> Husband, Husband <--> Wife

Attribute X may or may not be the primary key of the table.

The same attribute Y value can occur in more than one row of a table.

Course# --> Text

Fully Functional Dependencies

A fully functional dependency (FFD) exists between attributes X and Y if attribute Y is functionally dependent on X but is not functionally dependent on any proper subset of attribute(s) X.

( SID, Course# ) --> Name?

( SID, Course# ) --> Grade?

( SID, Name ) --> Major?

( SID, Name ) --> SID?

Note that if X is not composite, then X --> Y is always an FFD.

By default, the term FD refers to FFD.
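
The FFD questions above can be explored with a small Python sketch. The helpers holds and is_full are hypothetical names, and the rows are a projection of the Course sample data onto the attributes the questions mention. As before, sample rows can only suggest an answer; the real answer depends on attribute semantics. Only non-empty proper subsets of the determinant are checked.

```python
from itertools import combinations

# Projection of the Course sample rows onto the attributes used above.
COLUMNS = ("SID", "Name", "Course#", "Grade", "Major")
ROWS = [
    ("s1", "Joseph", "CIS8110", "A", "CIS"),
    ("s1", "Joseph", "CIS8120", "B", "CIS"),
    ("s1", "Joseph", "CIS8140", "A", "CIS"),
    ("s2", "Alice", "CIS8110", "A", "CS"),
    ("s2", "Alice", "CIS8140", "A", "CS"),
    ("s3", "Tom", "CIS8110", "B", "Acct"),
    ("s3", "Tom", "CIS8140", "B", "Acct"),
    ("s3", "Tom", "CIS8680", "A", "Acct"),
]


def holds(lhs, rhs):
    """Return True if no pair of sample rows contradicts lhs --> rhs."""
    idx = {name: i for i, name in enumerate(COLUMNS)}
    seen = {}
    for row in ROWS:
        x = tuple(row[idx[a]] for a in lhs)
        y = tuple(row[idx[a]] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_full(lhs, rhs):
    """True if lhs --> rhs holds and no non-empty proper subset of lhs determines rhs."""
    if not holds(lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(subset, rhs):
                return False  # partial dependency on a subset
    return True


print(is_full(("SID", "Course#"), ("Name",)))   # False: SID alone determines Name
print(is_full(("SID", "Course#"), ("Grade",)))  # True
print(is_full(("SID", "Name"), ("Major",)))     # False: SID alone determines Major
print(is_full(("SID", "Name"), ("SID",)))       # False: SID alone determines SID
```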


Transitively Functional Dependencies

Given attributes X, Y, and Z of a Table T, attribute Z is transitively dependent on attribute(s) X iff

Attribute X --> Attribute Y and Attribute Y --> Attribute Z.

Given SID --> Dept and Dept --> College,

SID --> ?

Given SID --> Major and Major --> Dept,

SID --> ?
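
Chains like these can be followed mechanically by computing everything a set of attributes determines. The Python sketch below uses a hypothetical helper named closure and represents each FD as a (determinant, dependent) pair of sets; it is a minimal illustration rather than a full normalization tool.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is already covered, its dependents follow.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds_a = [({"SID"}, {"Dept"}), ({"Dept"}, {"College"})]
fds_b = [({"SID"}, {"Major"}), ({"Major"}, {"Dept"})]

print(sorted(closure({"SID"}, fds_a)))  # ['College', 'Dept', 'SID']
print(sorted(closure({"SID"}, fds_b)))  # ['Dept', 'Major', 'SID']
```

In the first case College is reached only through Dept, and in the second Dept is reached only through Major, which is what makes those dependencies transitive.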

A Graphical Representation

Course (SID, Name, Grade, Course#, Text, Major, Dept)

[Dependency diagram of the Course relation over SID, Name, Major, Dept, Course#, Grade, and Text, with the primary key marked]


First Normal Form (1NF)

A Table T is in 1NF iff all attribute domains contain atomic (single) values only.

A Table in 1NF can still have modification anomalies; in other words, the process of normalization must continue.

INVENTORY (Part#, WHouse#, WAddress, QTY)

[Dependency diagram over Part#, WHouse#, WAddress, and QTY]

Second Normal Form (2NF)

A table T is in 2NF iff T is in 1NF and every non-key attribute is fully dependent on the primary key (i.e., T has no partial functional dependencies).

The term non-key attribute refers to any attribute that does not belong to any candidate key.

INVENTORY (Part#, WHouse#, WAddress, QTY)

[Dependency diagram over Part#, WHouse#, WAddress, and QTY]
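
A 2NF check can be sketched in Python as a search for partial dependencies. Several things here are assumptions, since the arrows of the diagram are not recoverable from the text: the primary key is taken to be (Part#, WHouse#), the only candidate key; the FDs are taken to be (Part#, WHouse#) --> QTY and WHouse# --> WAddress; and the helper name partial_dependencies is hypothetical. The sketch inspects only the FDs it is given, not FDs implied by them.

```python
def partial_dependencies(key, fds):
    """Return FDs whose determinant is a proper subset of the key
    and whose dependents include at least one non-key attribute."""
    key = frozenset(key)
    found = []
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        non_key = rhs - key
        if lhs < key and non_key:  # "<" means proper subset for sets
            found.append((lhs, non_key))
    return found


# Assumed key and FDs for INVENTORY (not confirmed by the slides).
KEY = {"Part#", "WHouse#"}
FDS = [
    ({"Part#", "WHouse#"}, {"QTY"}),
    ({"WHouse#"}, {"WAddress"}),
]

violations = partial_dependencies(KEY, FDS)
for lhs, rhs in violations:
    print(sorted(lhs), "-->", sorted(rhs))  # ['WHouse#'] --> ['WAddress']
```

Under these assumptions, moving WAddress into its own relation keyed by WHouse# would remove the partial dependency.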


Modification Anomalies in 2NF

2NF tables have modification anomalies:

Redundant Information?

Update anomalies?

Insertion anomalies?

Deletion anomalies?

Which FD causes the redundant data?

INVENTORY

Part#  WHouse#  WAddress    QTY
123    4        Atlanta      10
456    5        Birmingham    6
456    2        Columbus     10
123    7        Oakland       8
235    1        Denver        2

Third Normal Form (3NF)

A table T is in 3NF iff T is in 2NF and every non-key attribute is non-transitively dependent on the primary key.

Student (SID, Name, Major, Dept)

Discussion:

If a relation does not have any non-key attribute, would it automatically be in 3NF?
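
For the Student relation, a 3NF decomposition can be sketched in Python. The FDs used here are assumptions: SID --> Name, Major and Major --> Dept (the second follows the earlier SID --> Major, Major --> Dept example). The helper name split_transitive is hypothetical, and the sketch handles only directly listed FDs between non-key attributes.

```python
STUDENT = ("SID", "Name", "Major", "Dept")
KEY = {"SID"}
# Assumed FDs for Student (not stated on this slide).
FDS = [
    ({"SID"}, {"Name", "Major"}),
    ({"Major"}, {"Dept"}),
]


def split_transitive(attrs, key, fds):
    """Move each FD between non-key attributes into its own relation."""
    key = frozenset(key)
    remaining = list(attrs)
    new_relations = []
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        # A non-key determinant of a non-key attribute causes a transitive dependency.
        if lhs.isdisjoint(key) and rhs.isdisjoint(key):
            new_relations.append(sorted(lhs) + sorted(rhs))
            remaining = [a for a in remaining if a not in rhs]
    return remaining, new_relations


remaining, extra = split_transitive(STUDENT, KEY, FDS)
print(remaining)  # ['SID', 'Name', 'Major']
print(extra)      # [['Major', 'Dept']]
```

Major stays in the original relation so the two relations can still be joined.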


Modification Anomalies in 3NF

LOCATION (Employee, Department, Location)

Redundant Information?

Update anomalies?

Insertion anomalies?

Deletion anomalies?

All determinants?

[Dependency diagram over Employee, Department, and Location]

Denormalization

Denormalization is needed if

Relations in higher normal form (not mentioned here) cause a performance problem,

Denormalization can speed up data retrieval, and

Denormalization does not introduce severe update anomalies.

Is a Zipcode attribute normalized when kept with street, city, and state attributes?


Exercise

The relation Lab_Usage has been defined as follows.

Lab_usage( SID, Class#, Course, SName, Account#, Lab_Hours )

where

SID is a unique student ID,

Class# is a unique class number,

SName is a student name,

Lab_Hours is the maximum laboratory hours assigned to each student in a class,

Account# is a unique computer account. A student is assigned an account for each class he/she takes. Assume that no student takes the same course twice.

Exercise...

1. Determine all candidate keys and select a primary key.

2. List all FFDs.

3. Discuss update anomalies found in the relation.

4. Decompose the relation into 2NF relations.

5. Discuss update anomalies found in the 2NF relations.

6. Decompose the relation into 3NF relations.

7. Discuss update anomalies found in the 3NF relations, if any.