The Relational Model and Normalization
1. Introduction
2. Relational Model Terminology
4. Normal Forms
5. Multi-valued Dependency
6. The Fifth Normal Form
1. Introduction
■ Questions ?
- Should we store these two tables as they are, or
- Should we combine them into one table in our new database?
■ We need to understand:
- The relational model
- Relational model terminology
2. Relational Model Terminology
■ Relational Model
- Introduced in 1970
- Created by E.F. Codd
- He was an IBM engineer
- The model uses mathematics known as "relational algebra"
- Now the standard model for commercial DBMS products
■ The Most Important Terms Used by the Relational Model
• Entity
• Relation
• Functional Dependency
• Determinant
• Candidate Key
• Composite Key
• Primary Key
• Surrogate Key
• Foreign Key
• Referential Integrity Constraint
• Normal Form
• Multivalued Dependency
(1) Entity
An entity is some identifiable thing that users want to track:
- Customers
- Computers
- Sales
(2) Relation
A Relation :
- Figure 3-5 : Tables that are not relations
(a) Table with Multiple entries per cell
(b) Table with Required Row order
- Alternative Terminology
(3) Functional Dependency
■ A functional dependency occurs when the value of one (a set of) attribute(s)
determines the value of a second (set of) attribute(s):
StudentID → StudentName
StudentID → (DormName, DormRoom, Fee)
■ X → Y :
If the value of X determines the value of Y ,
attribute Y is functionally dependent on attribute X
■ Determinant
- The attribute on the left side of the functional dependency is called the
determinant
- Functional dependencies may be based on equations:
ExtendedPrice = Quantity × UnitPrice
(Quantity, UnitPrice) → ExtendedPrice
- But functional dependencies are not equations!
■ Composite Determinant
- A determinant of a functional dependency that consists of more than one
attribute
(StudentName, ClassName) → (Grade)
■ Functional Dependency Rule
① If X → (Y, Z), then X → Y and X → Z
② But if (X, Y) → Z, it does not follow that X → Z or Y → Z:
in general, neither X nor Y determines Z by itself
■ Formal Description
Let R be a relation schema, with α ⊆ R and β ⊆ R
The functional dependency α → β holds on R
if and only if
for any legal relation r(R),
whenever any two tuples t1 and t2 of r agree on the attributes α,
they also agree on the attributes β
That is, t1[α] = t2[α] ⇒ t1[β] = t2[β]
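The formal condition above can be tested against a relation instance. The following is a minimal sketch, assuming rows are stored as Python dicts; the function name fd_holds is hypothetical, and the rows are the Supplier relation used in these slides. A sample instance can only refute a functional dependency, never prove it for every legal relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs while disagreeing on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(left, right) != right:
            return False
    return True

# Supplier relation from the slides
suppliers = [
    {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Paris"},
    {"S#": "S4", "SNAME": "Clark", "STATUS": 20, "CITY": "London"},
    {"S#": "S5", "SNAME": "Adams", "STATUS": 30, "CITY": "Athens"},
]

key_fd = fd_holds(suppliers, ["S#"], ["SNAME", "STATUS", "CITY"])
city_fd = fd_holds(suppliers, ["CITY"], ["STATUS"])
print(key_fd, city_fd)
```

In this instance S# → (SNAME, STATUS, CITY) is not contradicted, while CITY → STATUS is refuted by S2 and S3, which share Paris but have different STATUS values.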
● Example : Supplier Relation
S# → (SNAME, STATUS, CITY)
S#  SNAME  STATUS  CITY
S1  Smith  20      London
S2  Jones  10      Paris
S3  Blake  30      Paris
S4  Clark  20      London
S5  Adams  30      Athens
(FD diagram: S# determines each of SNAME, STATUS, and CITY)
■ Finding Functional Dependency in SKU_DATA Relation
SKU → (SKU_Description, Department, Buyer)
SKU_Description → (SKU, Department, Buyer)
Buyer → Department
■ Finding Functional Dependency in Order_ITEM Relation
(OrderNumber, SKU) → (Quantity, Price, ExtendedPrice)
(Quantity, Price) → (ExtendedPrice)
SKU ↛ PRICE
(the price can be changed after an order has been processed)
■ When Are Determinant Values Unique?
■ A determinant is unique in a relation if, and only if, it determines every
other column in the relation
■ You cannot find the determinants of all functional dependencies simply by
looking for unique values in one column:
- Data set limitations: sample data can be incomplete
- Must be logically a determinant
※ To determine whether attribute A determines attribute B, ask:
"Every time a value of attribute A appears, is it matched with the
same value of attribute B?"
■ Keys
■ Key
A group of one or more attributes that uniquely identifies a row
■ K is a super key for relation schema R if and only if
K →R
(uniqueness property)
■ K is candidate key for R if and only if
K → R and
there is no α ⊂ K, α → R
( minimality property)
■ A primary key is a candidate key selected as the primary means of
identifying rows in a relation:
- There is one and only one primary key per relation
- The primary key may be a composite key
■ There is at least one candidate key
- because relations do not contain duplicate tuples
- at least the combination of all attributes of the relation has the
uniqueness property
■ A surrogate key is an artificial column added to a relation to serve as a
primary key
■ A foreign key is the primary key of one relation that is placed in another
relation to form a link between the relations:
- A foreign key can be a single column or a composite key
- The term refers to the fact that key values are foreign to the relation
in which they appear as foreign key values
DEPARTMENT (DepartmentName, BudgetCode, ManagerName)
EMPLOYEE (EmployeeNumber, EmployeeName, DepartmentName)
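The superkey and candidate-key definitions above (K → R, plus minimality) can be checked mechanically with an attribute closure. This is a minimal sketch, assuming functional dependencies are given as (determinant, dependent) pairs of frozensets; the helper names closure and candidate_keys are hypothetical, and the brute-force search is only meant for small schemas. The FDs are those of the SKU_DATA relation.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets K with K -> schema (uniqueness + minimality)."""
    keys = []
    # Smaller sets are tried first, so any superset of a found key is skipped.
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            k = set(combo)
            is_superkey = closure(k, fds) >= schema
            if is_superkey and not any(key <= k for key in keys):
                keys.append(k)
    return keys

schema = {"SKU", "SKU_Description", "Department", "Buyer"}
fds = [
    (frozenset({"SKU"}), frozenset({"SKU_Description", "Department", "Buyer"})),
    (frozenset({"SKU_Description"}), frozenset({"SKU", "Department", "Buyer"})),
    (frozenset({"Buyer"}), frozenset({"Department"})),
]
keys = candidate_keys(schema, fds)
print(keys)
```

Under these FDs the search reports SKU and SKU_Description as the two candidate keys; Buyer is a determinant but not a key, because its closure covers only Buyer and Department.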
■ Definition of Foreign Key
■ Let R2 be a base relation.
■ Then a foreign key in R2 is a subset of the set of attributes of R2, say FK,
such that
① There exists a base relation R1 with a candidate key CK (R1 and R2 not
necessarily distinct)
② At all times, each value of FK in the current value of R2 is identical to
the value of CK in some tuple in the current value of R1
(Diagram: FK in R2 references CK in R1)
■ Referential Integrity Rule
- The database must not contain any unmatched foreign key values
- If B references A, A must exist
- "Foreign key" and "referential integrity" are defined in terms of each other
■ Database Designer
- specifies which operations should be rejected
- specifies which compensating operations should be performed
■ What should happen on an attempt to delete the target of a foreign key
reference?
- RESTRICTED : the delete is "restricted" to the case where there are no matching
foreign key values; otherwise it is rejected
- CASCADES : the delete "cascades" to delete the matching rows as well
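The two delete rules can be illustrated with in-memory tables. This is a rough sketch, not how any particular DBMS implements them: delete_parent is a hypothetical helper, and the DEPARTMENT and EMPLOYEE rows are invented for illustration.

```python
def delete_parent(parents, children, fk, key_value, rule="RESTRICT"):
    """Delete one parent row, applying a delete rule to referencing child rows.

    parents: dict mapping primary-key value -> row
    children: list of dict rows; fk names the foreign-key column in them
    """
    matching = [c for c in children if c[fk] == key_value]
    if rule == "RESTRICT":
        if matching:
            # Reject the delete: it would leave unmatched foreign key values.
            raise ValueError(f"{len(matching)} row(s) still reference {key_value!r}")
    elif rule == "CASCADE":
        # Compensating operation: remove the referencing rows too.
        children[:] = [c for c in children if c[fk] != key_value]
    else:
        raise ValueError(f"unknown rule {rule!r}")
    del parents[key_value]

# Hypothetical sample rows (values are assumptions, not from the slides)
departments = {
    "Sales": {"BudgetCode": "B-100", "ManagerName": "Kim"},
    "Legal": {"BudgetCode": "B-200", "ManagerName": "Lee"},
}
employees = [
    {"EmployeeNumber": 1, "EmployeeName": "Park", "DepartmentName": "Sales"},
    {"EmployeeNumber": 2, "EmployeeName": "Choi", "DepartmentName": "Sales"},
    {"EmployeeNumber": 3, "EmployeeName": "Jung", "DepartmentName": "Legal"},
]

try:
    delete_parent(departments, employees, "DepartmentName", "Sales", rule="RESTRICT")
except ValueError as err:
    print("rejected:", err)

delete_parent(departments, employees, "DepartmentName", "Sales", rule="CASCADE")
print(sorted(departments), [e["EmployeeName"] for e in employees])
```

With RESTRICT the Sales department survives because two employees reference it; with CASCADE the department and both referencing employee rows are removed together.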
■ Nulls
- as a basis for dealing with the problem of missing information
- "date of birth unknown","to be announced"
- Nulls are not the same as blank or zero
- Three-valued logic
AND      True     False  Unknown
True     True     False  Unknown
False    False    False  False
Unknown  Unknown  False  Unknown

OR       True  False    Unknown
True     True  True     True
False    True  False    Unknown
Unknown  True  Unknown  Unknown
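The AND and OR tables of three-valued logic can be written as two small functions. This sketch assumes Python's None stands in for "unknown"; the names and3 and or3 are hypothetical.

```python
def and3(a, b):
    """Three-valued AND: False dominates, then unknown (None), else True."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    """Three-valued OR: True dominates, then unknown (None), else False."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

values = [True, False, None]
for a in values:
    print([and3(a, b) for b in values], [or3(a, b) for b in values])
```

The design point is the order of the checks: a single False settles AND and a single True settles OR even when the other operand is unknown.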
● Entity Integrity Rule
■ No component of the primary key of a base relation is allowed to accept
nulls.
■ The definition of every attribute involved in the primary key must include the
specification NULLS NOT ALLOWED
● Integrity Rules - Summary
- Referential Integrity Rule
- Entity Integrity Rule
4. Normal Forms
■ Modification Anomalies
■ Update Anomaly
(Figures: the table before update and after update)
■ Deletion Anomaly
- Deleting one row loses facts about two different things (a machine and a repair)
- This happens if you delete a tuple that has the item number 200 or 300
■ Insertion Anomaly
※ Primary Key : (OrderNumber, SKU)
- Suppose we want to insert a fact that relates only to the SKU
(SKU, SKU_Description, Department, Buyer)
- Is it possible?
■ Normalization Theory
(Diagram: among all possible relational schemas, normalization leads to good database design)
Problems that normalization addresses:
- Repetition of information
- Inability to represent certain information
- Inefficient retrieval and update performance
■ Relations are categorized into normal forms based on the modification
anomalies or other problems to which they are subject
■ Normalization Category
■ 1NF - A table that qualifies as a relation is in 1NF
■ 2NF - A relation is in 2NF if all of its nonkey attributes are dependent on
all of the primary key
■ 3NF
- A relation is in 3NF if it is in 2NF and has no determinants except the
primary key
- The relation R is 3NF iff it is 2NF and has no transitive dependencies
■ Boyce-Codd Normal Form (BCNF) - A relation is in BCNF if every
determinant is a candidate key
■ Eliminating Anomalies from Functional Dependencies
Put all relations into Boyce-Codd Normal Form (BCNF):
■ Eliminating Anomalies from Functional Dependencies : Example 1 (page 84)
- SKU_DATA relation
■ FDs
SKU → (SKU_Description, Department, Buyer)
SKU_Description → (SKU, Department, Buyer)
Buyer → Department
■ SKU and SKU_Description
; each determines all of the attributes in the table, so they are candidate keys
■ Buyer
; it is a determinant, but it does not determine all of the other attributes,
so it is not a candidate key
★ Hence, the SKU_DATA relation is not in BCNF
Step 3-A in Figure 3-11 : move Buyer and Department into a new relation
Step 3-B : make the Buyer attribute the primary key
BUYER(Buyer, Department)
Step 3-C and 3-D : SKU_DATA_2(SKU, SKU_Description, Buyer),
where Buyer remains as a foreign key referencing BUYER
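The steps of Example 1 can be sketched in code: find every determinant that is not a candidate key, then split the relation on that dependency. This is an illustrative sketch only; closure, bcnf_violations, and split_on are hypothetical helper names, and FDs are assumed to be (determinant, dependent) pairs of frozensets.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Nontrivial FDs whose determinant does not determine the whole schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= schema]

def split_on(schema, lhs, rhs):
    """New relation (lhs + rhs); the original keeps lhs as a foreign key."""
    new_relation = set(lhs | rhs)
    remaining = schema - (rhs - lhs)
    return new_relation, remaining

sku_data = {"SKU", "SKU_Description", "Department", "Buyer"}
fds = [
    (frozenset({"SKU"}), frozenset({"SKU_Description", "Department", "Buyer"})),
    (frozenset({"SKU_Description"}), frozenset({"SKU", "Department", "Buyer"})),
    (frozenset({"Buyer"}), frozenset({"Department"})),
]

violations = bcnf_violations(sku_data, fds)
buyer, sku_data_2 = split_on(sku_data, *violations[0])
print(sorted(buyer), sorted(sku_data_2))
```

The only violation found is Buyer → Department, and splitting on it gives the attribute sets of BUYER(Buyer, Department) and SKU_DATA_2(SKU, SKU_Description, Buyer).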
[Report #3]
■ For each of the examples below, describe the whole process of putting the
relation into BCNF. For each step, write your explanation and give the PKs and
FKs of the decomposed relations.
1) Eliminating Anomalies from Functional Dependencies : Example 2 (page 85)
2) Eliminating Anomalies from Functional Dependencies : Example 3 (page 86)
3) Eliminating Anomalies from Functional Dependencies : Example 4 (page 87)
4) Eliminating Anomalies from Functional Dependencies : Example 5 (page 89)
■ Theoretical Approach : The First Normal Form
■ Definition
- Any table of data that meets the definition of a relation
■ Formal Definition
A relation is in 1NF iff all underlying domains contain scalar values only
Example
"Supplier S1, located in the city "London" with status 20, supplies 300 units of part P1"
FIRST
S#  STATUS  CITY    P#  QTY
S1  20      London  P1  300
S1  20      London  P2  200
S1  20      London  P3  400
S1  20      London  P4  200
S2  10      Paris   P1  300
S2  10      Paris   P2  400
S3  10      Paris   P2  200
S5  40      Athens  P4  100
(FD diagram over the attributes S#, P#, QTY, CITY, STATUS)
● Anomalies in 1NF
③ Insertion Anomaly
New Supplier without P#
④ Deletion Anomaly
Unexpected loss of bundled information
⑤ Update Anomaly
London → Paris
FIRST (the repeated STATUS and CITY values are redundancy)
S#  STATUS  CITY      P#    QTY
S1  20      London    P1    300
S1  20      London    P2    200
S1  20      London    P3    400
S1  20      London    P4    200
S2  10      Paris     P1    300
S2  10      Paris     P2    400
S5  40      Athens    P4    100
S3  90      New York  P2    200
S6  70      Florence  Null  Null
● Solution
■ Decomposition
FIRST ⇒ SECOND(S#, STATUS, CITY), SP(S#, P#, QTY)
● Nonloss (lossless) decomposition
- Reversibility
S
S#  STATUS  CITY
S3  30      Paris
S5  30      Athens
(a) SST(S#, STATUS) and SC(S#, CITY) : nonloss
SST : (S3, 30), (S5, 30)
SC : (S3, Paris), (S5, Athens)
(b) SST(S#, STATUS) and STC(STATUS, CITY) : lossy
SST : (S3, 30), (S5, 30)
STC : (30, Paris), (30, Athens)
● Test for the lossless decomposition
(a) SST ⋈ SC gives back S : nonloss
S#  STATUS  CITY
S3  30      Paris
S5  30      Athens
(b) SST ⋈ STC does not give back S : lossy
S#  STATUS  CITY
S3  30      Paris
S3  30      Athens
S5  30      Paris
S5  30      Athens
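The lossless test above amounts to projecting, joining the projections back, and comparing with the original relation. This is a minimal sketch assuming rows are dicts; project, natural_join, and same_relation are hypothetical helper names.

```python
def project(rows, attrs):
    """Projection onto attrs with duplicate rows removed."""
    seen, out = set(), []
    for r in rows:
        t = tuple(r[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out

def natural_join(left, right):
    """Natural join on the attributes the two relations have in common."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]

def same_relation(a, b):
    """Compare two relations as sets of tuples, ignoring row order."""
    def canon(rows):
        return {frozenset(r.items()) for r in rows}
    return canon(a) == canon(b)

S = [
    {"S#": "S3", "STATUS": 30, "CITY": "Paris"},
    {"S#": "S5", "STATUS": 30, "CITY": "Athens"},
]
SST = project(S, ["S#", "STATUS"])
SC = project(S, ["S#", "CITY"])
STC = project(S, ["STATUS", "CITY"])

lossless_a = same_relation(natural_join(SST, SC), S)
lossless_b = same_relation(natural_join(SST, STC), S)
print(lossless_a, lossless_b, len(natural_join(SST, STC)))
```

Decomposition (a) reproduces S exactly, while decomposition (b) joins on STATUS alone and produces four tuples, two of them spurious.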
● Decomposition : from the viewpoint of functional dependencies
S(S#, STATUS, CITY) satisfies S# → STATUS and S# → CITY
(a) SST(S#, STATUS), SC(S#, CITY)
- SST keeps S# → STATUS
- SC keeps S# → CITY
(b) SST(S#, STATUS), STC(STATUS, CITY)
- SST keeps S# → STATUS
- S# → CITY is lost: we cannot tell which supplier has which city
● Heath's Theorem
Let R(A, B, C) be a relation, where A, B, and C are sets of attributes
- If R satisfies the FD A → B,
- then R is equal to the join of its projections on {A, B} and {A, C}
● FIRST ⇒ SECOND(S#, STATUS, CITY), SP(S#, P#, QTY)
SECOND
S#  STATUS  CITY
S1  20      London
S2  10      Paris
S3  10      Paris
S5  40      Athens
SP
S#  P#  QTY
S1  P1  300
S1  P2  200
S1  P3  400
S1  P4  200
S2  P1  300
S2  P2  400
S3  P2  200
S5  P4  100
■ Theoretical Approach : The Second Normal Form
● Definition
- R is in 2NF when all of its nonkey attributes are dependent on the whole of a
key
● Formal Definition
A relation R is in 2NF if and only if
- it is in 1NF, and
- every non-key attribute is irreducibly (fully) dependent on the primary
key
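A 2NF violation is a partial dependency: part of a composite key determines a non-key attribute. The sketch below checks only the FDs as listed (it does not derive implied ones) and assumes a single primary key; partial_dependencies is a hypothetical name, and the FDs S# → CITY, CITY → STATUS, (S#, P#) → QTY are assumed for the FIRST relation.

```python
def partial_dependencies(key, fds):
    """Listed FDs whose determinant is a proper subset of the key
    and whose dependent side includes an attribute outside the key."""
    return [(lhs, rhs) for lhs, rhs in fds if lhs < key and rhs - key]

# FIRST(S#, STATUS, CITY, P#, QTY) with primary key (S#, P#)
first_key = frozenset({"S#", "P#"})
first_fds = [
    (frozenset({"S#", "P#"}), frozenset({"QTY"})),
    (frozenset({"S#"}), frozenset({"CITY"})),
    (frozenset({"CITY"}), frozenset({"STATUS"})),
]

# SP(S#, P#, QTY) after the decomposition, same primary key
sp_fds = [(frozenset({"S#", "P#"}), frozenset({"QTY"}))]

first_partial = partial_dependencies(first_key, first_fds)
sp_partial = partial_dependencies(first_key, sp_fds)
print(first_partial, sp_partial)
```

FIRST is flagged because S# alone determines CITY, whereas SP has no partial dependency left.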
■ Anomaly in 2NF
Insertion Anomaly
- Insertion {Florence, Italy}
Deletion Anomaly
- Deletion of tuple S1 : Unexpected loss of bundled information
Update Anomaly
SECOND
S#  STATUS  CITY
S1  20      London
S2  10      Paris
S3  10      Paris
S5  40      Athens
(FD diagram over S#, CITY, and STATUS)
Cause : transitive dependency
■ Decomposition
SECOND(S#, STATUS, CITY) ⇒ decomposition
SC(S#, CITY), CS(CITY, STATUS)
SC
S#  CITY
S1  London
S2  Paris
S3  Paris
S5  Athens
CS
CITY    STATUS
London  20
Paris   10
Rome    50
Athens  30
■ Theoretical Approach : The Third Normal Form
■ Definition
The relation R is 3NF iff it is 2NF and has no transitive dependencies
■ Formal Definition
A relation R is in 3NF if and only if
- it is in 2NF, and
- every non-key attribute is non-transitively dependent on the
primary key (no mutual dependencies among non-key attributes)
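A transitive dependency shows up as a non-key attribute determining another non-key attribute. The following sketch assumes a single candidate key and inspects only the FDs as listed; transitive_dependencies is a hypothetical name, and the FDs S# → CITY and CITY → STATUS are assumed for SECOND.

```python
def transitive_dependencies(key, fds):
    """Listed FDs where a determinant made only of non-key attributes
    determines some other non-key attribute."""
    found = []
    for lhs, rhs in fds:
        dependents = rhs - key - lhs
        if not (lhs & key) and dependents:
            found.append((lhs, dependents))
    return found

# SECOND(S#, STATUS, CITY) with primary key S#
second = transitive_dependencies(
    frozenset({"S#"}),
    [(frozenset({"S#"}), frozenset({"CITY"})),
     (frozenset({"CITY"}), frozenset({"STATUS"}))],
)

# After decomposition: SC(S#, CITY) keyed on S#, CS(CITY, STATUS) keyed on CITY
sc = transitive_dependencies(
    frozenset({"S#"}), [(frozenset({"S#"}), frozenset({"CITY"}))])
cs = transitive_dependencies(
    frozenset({"CITY"}), [(frozenset({"CITY"}), frozenset({"STATUS"}))])
print(second, sc, cs)
```

SECOND is flagged through CITY → STATUS, and neither SC nor CS is flagged once each determinant is the key of its own relation.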
■ Anomaly in 3NF
SJT
S      J        T
Smith  Math     Prof. White
Smith  Physics  Prof. Green
Jones  Math     Prof. White
Jones  Physics  Prof. Brown
(FD diagram over S, J, and T; SJT is in 3NF)
Two candidate keys : {S, J}, {S, T}
- Semantic constraints
  - For each subject, each student of that subject is taught by only one teacher : {S, J} → T
  - Each teacher teaches only one subject : T → J
  - Each subject is taught by several teachers
- Anomaly
  - Jones drops "Physics"
  - We delete the last tuple
  - We lose the information "Prof. Brown teaches Physics"
■ Solution : Decomposition
- SJT ⇒ ST(S, T), TJ(T, J)
- The result is in BCNF (Boyce-Codd Normal Form)
5. Multi-valued Dependency
■ A multi-valued dependency occurs when a determinant determines a
particular set of values:
Employee ↠ Degree
Employee ↠ Sibling
PartKit ↠ Part
■ The determinant of a multivalued dependency can never be a primary key
■ Multivalued dependencies are not a problem if they are in a separate relation,
so:
- Always put multivalued dependencies into their own relation
- This is known as Fourth Normal Form (4NF)
A relation is in 4NF iff, for every nontrivial multi-valued dependency A ↠ B, all attributes of the relation are functionally dependent on A
CTB
C         T            B
database  Prof. White  Korth
database  Prof. White  Ullman
database  Prof. Green  Korth
database  Prof. Green  Ullman
(Grouped view: database | Prof. White, Prof. Green | Korth, Ullman)
Insertion : adding Prof. Sara for database requires two tuples
database  Prof. Sara  Korth
database  Prof. Sara  Ullman
C ↠ T, C ↠ B
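The multivalued dependencies C ↠ T and C ↠ B mean that, for each C value, every T value must be paired with every B value. This is a minimal sketch of that check on a three-attribute relation; mvd_holds is a hypothetical name, and rows are assumed to be dicts.

```python
from itertools import product

def mvd_holds(rows, x, y, z):
    """Check X ->> Y in a relation over exactly the attributes x, y, z:
    within each X group the (Y, Z) pairs must form a full cross product."""
    groups = {}
    for r in rows:
        groups.setdefault(r[x], set()).add((r[y], r[z]))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        if pairs != set(product(ys, zs)):
            return False
    return True

ctb = [
    {"C": "database", "T": "Prof. White", "B": "Korth"},
    {"C": "database", "T": "Prof. White", "B": "Ullman"},
    {"C": "database", "T": "Prof. Green", "B": "Korth"},
    {"C": "database", "T": "Prof. Green", "B": "Ullman"},
]
sara_korth = {"C": "database", "T": "Prof. Sara", "B": "Korth"}
sara_ullman = {"C": "database", "T": "Prof. Sara", "B": "Ullman"}

before = mvd_holds(ctb, "C", "T", "B")
one_row = mvd_holds(ctb + [sara_korth], "C", "T", "B")
both_rows = mvd_holds(ctb + [sara_korth, sara_ullman], "C", "T", "B")

# 4NF decomposition into CT(C, T) and CB(C, B): one row per new teacher
ct = sorted({(r["C"], r["T"]) for r in ctb})
cb = sorted({(r["C"], r["B"]) for r in ctb})
print(before, one_row, both_rows, len(ct), len(cb))
```

Inserting only one Prof. Sara tuple breaks the dependency, which is the insertion anomaly; after the split into CT and CB a new teacher needs a single row in CT.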
6. The Fifth Normal Form
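A relation is in 5NF iff, for every nontrivial join dependency *(R1, R2, ..., Rn) that holds in it, each of R1, R2, ..., Rn contains a candidate key
SPJ
S#  P#  J#
S1  P1  J2
S1  P2  J1
Projections of SPJ
SP : (S1, P1), (S1, P2)
PJ : (P1, J2), (P2, J1)
JS : (J2, S1), (J1, S1)
Insertion
- Inserting (S2, P1, J1) into SPJ adds (S2, P1) to SP, (P1, J1) to PJ, and (J1, S2) to JS
- Joining the three projections then also yields (S1, P1, J1)
- So (S1, P1, J1) must be inserted together with (S2, P1, J1)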
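The SPJ example can be replayed by projecting onto SP, PJ, and JS and joining the three projections back. This is a small sketch using sets of tuples; projections and join3 are hypothetical helper names.

```python
def projections(spj):
    """Project SPJ(S#, P#, J#) onto SP, PJ, and JS."""
    sp = {(s, p) for s, p, j in spj}
    pj = {(p, j) for s, p, j in spj}
    js = {(j, s) for s, p, j in spj}
    return sp, pj, js

def join3(sp, pj, js):
    """Join SP, PJ, and JS back into (S#, P#, J#) tuples."""
    return {(s, p, j)
            for (s, p) in sp
            for (p2, j) in pj
            if p == p2 and (j, s) in js}

spj = {("S1", "P1", "J2"), ("S1", "P2", "J1")}
round_trip = join3(*projections(spj))

# Insert (S2, P1, J1) alone and rejoin the projections
spj_after = spj | {("S2", "P1", "J1")}
rejoined = join3(*projections(spj_after))
spurious = rejoined - spj_after
print(round_trip == spj, spurious)
```

The two-tuple relation survives the round trip, but after inserting (S2, P1, J1) the join also produces (S1, P1, J1), so the join dependency holds only if that tuple is inserted as well.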
[Report #4]
1. Construct the relation that results from joining the relations CUSTOMER and
ORDER.
CUSTOMER (Cid, Name, Address)
Cid → (Name, Address, Zipcode)
Address → Zipcode
ORDER (BookNumber, Cid, Price, Orderdate)
(BookNumber, Cid) → (Price, Orderdate)
2. Write out the detailed process for each of the following problems.
(1) Describe the FDs in the result relation of problem 1 and illustrate the anomalies
(2) Transform it into the second normal form
(3) Transform it into the third normal form
(4) Transform it into BCNF
(5) Verify that joining the relations from (4) gives the same result as problem 1