vidyasagar-mundroy
Introduction
Normalization is one of the steps in database design.
The input to this step is an ER model.
The output of this step is a relational data model.
What is Normalization?
Normalization is an analytical technique for relational database design.
The technique involves evaluating each relation against the criteria for normal forms of relations and decomposing relations if required.
It concerns logical design only; no implementation details are shown.
Purpose of Normalization
To arrive at a set of relations that represents the logical design of a database.
Characteristics of Normalized Relations
The relations that the normalization process produces have the following characteristics:
1. a minimal number of attributes in each relation
2. closely related attributes grouped together in each relation
3. minimal redundancy of attributes, i.e., as few instances as possible of the same attribute appearing in more than one relation.
Data Redundancy
• Problems with Data Redundancy
• Unavoidable Data Redundancy
• Insertion Anomalies
• Deletion Anomaly
• Modification Anomaly
Data Redundancy & Its Problems
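Data redundancy means that the same attribute is stored in more than one relation when a single copy would suffice.
Example:
emp(eno, ename, salary, dno, dname)
dept(dno, dname, mgr_eno, mgr_name)
Attribute dname of relation emp and attribute mgr_name of relation dept are redundant: the department name is already stored in dept, and the manager's name is already stored in emp as ename.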
Unavoidable Data Redundancy
Foreign keys that refer to primary keys or candidate keys in other relations are a form of redundancy that cannot be avoided.
Example:
emp(eno, ename, salary, dno)
dept(dno, dname, mgr_eno)
Here dno in emp and mgr_eno in dept are foreign keys.
Data Redundancy Problems
Data redundancy in relations can cause the following anomalies:
1. Insertion Anomalies
2. Deletion Anomalies
3. Modification Anomalies
Insertion Anomalies
Types of Insertion Anomalies:
Insertion Anomaly-1: The values inserted for some attributes are not consistent with the values present in existing tuples.
Insertion Anomaly-2: Values for a group of related attributes cannot be inserted in a tuple without also entering values for some other attributes of the tuple.
Insertion Anomalies - Example
Insertion anomaly-1 occurs if, for example, you insert (11, ‘Marketing’) instead of (11, ‘Research’) in attributes dno and dname of a new tuple.
Insertion anomaly-2 occurs when you try to insert a tuple for a new department: in the absence of valid employee details, you have to enter dummy employee details.
Deletion Anomaly
A deletion anomaly is the unintended loss of vital facts when a tuple is deleted from an un-normalized relation.
In the un-normalized relation above, if the tuple of the only employee of a department is deleted, the department information is deleted as well.
Modification Anomaly
A modification anomaly leaves an un-normalized relation in an inconsistent state.
If you modify the department name in one tuple of relation empdepts, you have to modify it in all other tuples that have the same department name, or the data will be inconsistent.
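The modification and deletion anomalies described above can be reproduced on a small in-memory copy of empdepts. The following is a minimal Python sketch; the employee names, the reduced attribute list and the helper inconsistent_departments are assumptions made for illustration and do not come from the original slides.

```python
# Hypothetical sample tuples for empdepts, reduced to (eno, ename, dno, dname).
empdepts = [
    {"eno": 1001, "ename": "Asha", "dno": 11, "dname": "Research"},
    {"eno": 1002, "ename": "Ravi", "dno": 11, "dname": "Research"},
    {"eno": 1003, "ename": "Meena", "dno": 12, "dname": "Marketing"},
]


def inconsistent_departments(rows):
    """Return the dno values that appear with more than one dname."""
    names = {}
    for row in rows:
        names.setdefault(row["dno"], set()).add(row["dname"])
    return sorted(dno for dno, dnames in names.items() if len(dnames) > 1)


before = inconsistent_departments(empdepts)

# Modification anomaly: department 11 is renamed in only one of its tuples.
empdepts[0]["dname"] = "R&D"
after_update = inconsistent_departments(empdepts)

# Deletion anomaly: deleting the only employee of department 12
# also removes every trace of that department.
empdepts = [row for row in empdepts if row["eno"] != 1003]
remaining_dnos = {row["dno"] for row in empdepts}

print(before, after_update, remaining_dnos)
```

After the partial update, department 11 carries two different names, and after the deletion, department 12 no longer appears anywhere in the relation.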
Attribute Dependencies
Attribute dependency means that the values of one or more attributes in a relation depend on the values of one or more other attributes.
Understanding dependencies is important for designing relations. These dependencies help in deciding the primary key of each relation.
Types of dependencies:
• Functional Dependencies
• Multi-Valued Dependencies
Functional Dependency
A functional dependency (FD) describes a relationship between attributes of a relation.
If X and Y are subsets of the attributes of relation R, we write X → Y if each value of X is associated with exactly one value of Y. We then say that X functionally determines Y, that Y is functionally dependent on X, or simply that X determines Y.
[Figure: diagrammatic representation of X → Y]
Functional Dependency Examples
Valid functional dependencies:
eno → ename
eno → dob
eno → salary
eno → gender
eno → dname
An invalid functional dependency:
gender → eno
It is invalid because for the same gender value, for example ‘M’, there are multiple eno values (1001, 1002, etc.).
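Whether a given set of tuples is consistent with a functional dependency can be checked mechanically. The sketch below is one possible way to do it in Python; the function name fd_holds and the sample employee names are assumptions for illustration. Note that sample data can only refute a dependency; it cannot prove that the dependency holds for every legal state of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every lhs value in rows is paired with exactly one rhs value."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen and returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical tuples; only the eno values and the gender 'M' come from the text.
rows = [
    {"eno": 1001, "ename": "Asha", "gender": "M"},
    {"eno": 1002, "ename": "Ravi", "gender": "M"},
]

eno_determines_ename = fd_holds(rows, ["eno"], ["ename"])
gender_determines_eno = fd_holds(rows, ["gender"], ["eno"])
print(eno_determines_ename, gender_determines_eno)
```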
Full Functional Dependency
A functional dependency is a full functional dependency if the dependent attributes are determined by the whole determinant and not by any part of it.
X → Y is a full functional dependency of a relation R if the set of attributes Y is functionally dependent on the set of attributes X but not on any proper subset of X.
Partial Functional Dependency
A functional dependency is a partial functional dependency if only some of the determinant attributes are needed to determine the dependent attributes.
X → Y is a partial functional dependency of a relation R if the set of attributes Y is also functionally dependent on a proper subset of X. An example of a partial functional dependency is eno, dob → salary, because salary depends only on eno and not on the date-of-birth attribute dob.
Transitive Functional Dependency
A functional dependency is a transitive functional dependency if the dependent attributes are determined through an intermediate set of attributes that is itself dependent on the determinant.
If X, Y and Z are sets of attributes of a relation R and there are functional dependencies X → Y and Y → Z, then Z is transitively dependent on X, i.e., X → Z, provided X is not functionally dependent on Y or Z. In the example relation empdepts, there are functional dependencies eno → dno and dno → dname. This implies the transitive dependency eno → dname.
Functional Dependencies in relation empdepts
The following are the functional dependencies of relation empdepts:
Full Functional Dependencies:
eno → ename, dob, salary, gender, dno, dname
eno → dno
dno → dname
Transitive Functional Dependencies:
eno → dname
Identifying Primary Keys and Candidate Keys
The left-hand-side attributes of a functional dependency form a candidate key if the right-hand-side attributes are all the remaining attributes of the relation and no proper subset of the left-hand side determines them all. This means all other attributes are dependent on the candidate key attributes. If the relation has only one candidate key, designate it as the primary key. Otherwise, choose one of the candidate keys as the primary key based on the enterprise requirements.
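Candidate keys can also be derived from the functional dependencies by computing attribute closures. The sketch below is a brute-force illustration, not a prescribed method: the functions closure and candidate_keys are hypothetical helpers, and the FD list encodes only the empdepts dependencies eno → ename, dob, salary, gender, dno and dno → dname.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return the minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


attributes = {"eno", "ename", "dob", "salary", "gender", "dno", "dname"}
fds = [
    (frozenset({"eno"}), frozenset({"ename", "dob", "salary", "gender", "dno"})),
    (frozenset({"dno"}), frozenset({"dname"})),
]

print(sorted(closure({"eno"}, fds)))
print(candidate_keys(attributes, fds))
```

The closure of eno contains dname even though no listed dependency states eno → dname directly, which is the transitive dependency at work. The subset search is exponential in the number of attributes, so it is suitable only for small teaching examples.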
Multi-Valued Dependency
A multi-valued dependency (MVD) exists in a relation with attributes X, Y and Z when, for each value of X, there is a set of values for Y and a set of values for Z, and the two sets are independent of each other.
The multi-valued dependency is represented with a double arrow as follows:
X -->> Y
X -->> Z
Multi-Valued Dependency Example
The relation emp_assets(eno, phone, computer) represents company assets (phones and computers) given to an employee. The phones and the computers given to an employee are not related to each other at all, and neither are their numbers. Hence, the relation has the following MVDs:
eno -->> phone
eno -->> computer
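One way to test a multi-valued dependency on sample data is to check that, for each value of the determinant, the dependent values and the remaining values occur in every combination. The following Python sketch assumes a hypothetical helper mvd_holds and invented asset identifiers (P1, P2, C1, C2).

```python
from itertools import product


def mvd_holds(rows, x, y):
    """Check X ->> Y: per X value, (Y, rest) pairs must form a full cross product."""
    attrs = sorted(rows[0])
    z = [a for a in attrs if a not in x and a not in y]
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        z_val = tuple(row[a] for a in z)
        groups.setdefault(key, set()).add((y_val, z_val))
    for pairs in groups.values():
        y_vals = {pair[0] for pair in pairs}
        z_vals = {pair[1] for pair in pairs}
        if pairs != set(product(y_vals, z_vals)):
            return False
    return True


# Hypothetical emp_assets tuples: employee 1001 has two phones and two computers.
complete = [
    {"eno": 1001, "phone": "P1", "computer": "C1"},
    {"eno": 1001, "phone": "P1", "computer": "C2"},
    {"eno": 1001, "phone": "P2", "computer": "C1"},
    {"eno": 1001, "phone": "P2", "computer": "C2"},
]
incomplete = complete[:3]  # the (P2, C2) combination is missing

print(mvd_holds(complete, ["eno"], ["phone"]))
print(mvd_holds(incomplete, ["eno"], ["phone"]))
```

The four tuples needed to record two phones and two computers for a single employee illustrate the redundancy that such dependencies cause.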
Un-normalized Form (UNF)
A relation that contains multi-valued and/or composite attributes is said to be in un-normalized form.
Example: A relation emp that holds two phone numbers in attribute phone and multiple components in each value of attribute address is un-normalized. Here, attribute phone is multi-valued and attribute address is composite, i.e., non-atomic or divisible.
First Normal Form (1NF)
A relation is said to be in the First Normal Form (1NF) if it does not have multi-valued attributes, composite attributes or a combination of them.
Example: Attributes phone_number and interest of relation customers are both multi-valued, whereas attribute details of relation participants is composite, so neither relation is in 1NF.
Problems with 1NF Relations
First normal form relations have the following problems:
1. Redundant data
2. Update anomalies
The emp relation in 1NF has redundant data in all columns except the phone number. It can also give rise to all the update anomalies studied earlier.
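The conversion of an un-normalized emp relation to 1NF, and the redundancy that results, can be sketched as follows. The sample row, the address layout (street, city, pin code) and the function to_1nf are assumptions chosen for illustration.

```python
# Hypothetical un-normalized row: phone is multi-valued, address is composite.
emp_unf = [
    {
        "eno": 1001,
        "phone": ["555-0101", "555-0102"],
        "address": "12 Lake Road, Pune, 411001",
    },
]


def to_1nf(rows):
    """Emit one row per phone value and split address into atomic parts."""
    flat = []
    for row in rows:
        street, city, pin_code = [part.strip() for part in row["address"].split(",")]
        for phone in row["phone"]:
            flat.append({
                "eno": row["eno"],
                "phone": phone,
                "street": street,
                "city": city,
                "pin_code": pin_code,
            })
    return flat


emp_1nf = to_1nf(emp_unf)
for row in emp_1nf:
    print(row)
```

Every value in the result is atomic, but every column other than phone is now repeated once per phone number, which is the redundancy noted above.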
Second Normal Form (2NF)
A relation is in Second Normal Form (2NF) if it is in First Normal Form and every non-key attribute, i.e., every attribute that is not part of any candidate key, is fully functionally dependent on every candidate key.
Example:
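A 2NF check can be sketched in Python as a search for partial dependencies on a composite key. The relation emp_proj(eno, pno, ename, hours), its dependencies and the function partial_dependencies are hypothetical and are not taken from the slides; the sketch also assumes the relation has a single candidate key.

```python
def partial_dependencies(key, fds):
    """Return (determinant, non-key dependents) pairs that violate 2NF.

    A violation is an FD whose determinant is a proper subset of the key
    and whose right-hand side contains attributes outside the key.
    """
    violations = []
    for lhs, rhs in fds:
        non_key = rhs - key
        if lhs < key and non_key:
            violations.append((lhs, non_key))
    return violations


# Hypothetical relation emp_proj(eno, pno, ename, hours) with key (eno, pno).
key = frozenset({"eno", "pno"})
fds = [
    (frozenset({"eno", "pno"}), frozenset({"hours"})),  # full dependency
    (frozenset({"eno"}), frozenset({"ename"})),         # partial dependency
]

violations = partial_dependencies(key, fds)
print(violations)
```

Under these assumptions, ename depends on only part of the key, so emp_proj is not in 2NF; moving ename into a separate relation keyed by eno removes the violation.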
Third Normal Form (3NF)
A relation is in Third Normal Form (3NF) if it is already in Second Normal Form and no non-primary-key attribute is transitively dependent on the primary key.
Two ways a set of attributes Y can transitively depend on a key:
KEY X Y
KEY Y X
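For the empdepts relation, the transitive dependency eno → dno → dname can be removed by splitting the relation into emp and dept. The sketch below uses hypothetical sample tuples, reduced to four attributes, to show that the split loses no information and stores each department name only once.

```python
# Hypothetical empdepts tuples, reduced to (eno, ename, dno, dname).
empdepts = {
    (1001, "Asha", 11, "Research"),
    (1002, "Ravi", 11, "Research"),
    (1003, "Meena", 12, "Marketing"),
}

# Project onto the two relations of the 3NF design.
emp = {(eno, ename, dno) for eno, ename, dno, _ in empdepts}
dept = {(dno, dname) for _, _, dno, dname in empdepts}

# Natural join on dno reconstructs the original relation.
rejoined = {
    (eno, ename, dno, dname)
    for eno, ename, dno in emp
    for d, dname in dept
    if d == dno
}

print(sorted(dept))
print(rejoined == empdepts)
```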
Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd Normal Form (BCNF) if and only if every determinant is a candidate key.
This normal form is stricter than 3NF.
Every relation in BCNF is also in 3NF.
Example: The FD in the relation emp3 that violates BCNF is
pin_code → block, area, city
because its determinant pin_code is not a candidate key of emp3.
Fourth Normal Form (4NF)
A relation is in Fourth Normal Form (4NF) if it is in Boyce-Codd Normal Form and does not contain non-trivial multi-valued dependencies.
Example: A relation with attributes eno, language and hobby has the MVDs
eno -->> language
eno -->> hobby
Attributes language and hobby are independent of each other.
Fourth Normal Form (4NF) – Repeated Values
The relation specified earlier repeats values in attributes language and hobby. One way to try to avoid the repetition is to not store the repeating values, but the resulting representations are invalid.
[Figure: a relation not in 4th normal form, and invalid representations of it that try to avoid redundant data]
Decomposing into 4th Normal Form
The decomposition involves forming one relation for each MVD, so that each relation contains the determinant as well as the dependent attribute.
[Figure: the original relation and the two relations after decomposing]
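The decomposition into one relation per MVD can be sketched as follows. The relation name emp_interests and the sample languages and hobbies are assumptions for illustration; only the attribute names eno, language and hobby come from the text.

```python
# Hypothetical tuples of emp_interests(eno, language, hobby).
emp_interests = {
    (1001, "English", "Chess"),
    (1001, "English", "Music"),
    (1001, "Hindi", "Chess"),
    (1001, "Hindi", "Music"),
    (1002, "Telugu", "Cricket"),
}

# One relation per MVD: eno ->> language and eno ->> hobby.
emp_language = {(eno, language) for eno, language, _ in emp_interests}
emp_hobby = {(eno, hobby) for eno, _, hobby in emp_interests}

# Joining the two relations on eno gives back the original tuples.
rejoined = {
    (eno, language, hobby)
    for eno, language in emp_language
    for e, hobby in emp_hobby
    if e == eno
}

print(len(emp_interests), len(emp_language), len(emp_hobby))
print(rejoined == emp_interests)
```

Each language and each hobby of an employee is stored once in the decomposed relations, instead of once per combination.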
Fifth Normal Form (5NF)
A relation is in Fifth Normal Form (5NF) if it does not have any non-trivial join dependency (JD) that is not implied by its candidate keys.
A relation has a join dependency if its schema can be decomposed into multiple relations, each with fewer attributes than the original relation, such that the join of those relations is equal to the original relation. A relation with such a join dependency should be decomposed into those relations to remove redundancy. The decomposed relations would be in 5th Normal Form.
If the key of each decomposed relation is the same as that of the original relation, or the original relation cannot be reconstructed from the smaller decomposed relations, the original relation is already in 5th Normal Form and no decomposition is required.
Fifth Normal Form (5NF) Example
Suppose we have the following relation about dealers, vehicle manufacturing companies and products.
Dealer     Company    Product
Trident    Hyundai    Car
Dakshin    Hyundai    Car
Dakshin    Maruti     Car
Dakshin    Maruti     Scooter
Magnum     Maruti     Car
Magnum     Honda      Car
Magnum     Honda      Motor Cycle
Assume that there is a rule that if a dealer represents a company, the company makes a product, and the dealer sells that product, then the dealer definitely sells that product of that company.
Then the relation has redundancy and a join dependency, and it can be split into the following 3 relations.
Relation 1:
Dealer     Company
Trident    Hyundai
Dakshin    Hyundai
Dakshin    Maruti
Magnum     Honda
Magnum     Maruti
Relation 2:
Company    Product
Hyundai    Car
Maruti     Car
Maruti     Scooter
Honda      Car
Honda      Motor Cycle
Relation 3:
Dealer     Product
Trident    Car
Dakshin    Car
Dakshin    Scooter
Magnum     Car
Magnum     Motor Cycle
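The join dependency in this example can be verified with a short Python sketch that uses the tuples given above. The variable names are illustrative assumptions. Joining only two of the three projections yields a spurious tuple, while joining all three reconstructs the original relation exactly.

```python
# The dealer, company and product tuples from the example relation.
sales = {
    ("Trident", "Hyundai", "Car"),
    ("Dakshin", "Hyundai", "Car"),
    ("Dakshin", "Maruti", "Car"),
    ("Dakshin", "Maruti", "Scooter"),
    ("Magnum", "Maruti", "Car"),
    ("Magnum", "Honda", "Car"),
    ("Magnum", "Honda", "Motor Cycle"),
}

# The three binary projections.
dealer_company = {(d, c) for d, c, _ in sales}
company_product = {(c, p) for _, c, p in sales}
dealer_product = {(d, p) for d, _, p in sales}

# Join of the first two projections on company.
two_way = {
    (d, c, p)
    for d, c in dealer_company
    for c2, p in company_product
    if c == c2
}

# Joining with the third projection filters out the spurious tuples.
three_way = {t for t in two_way if (t[0], t[2]) in dealer_product}

print(two_way - sales)
print(three_way == sales)
```

The spurious tuple in the two-way join is (Magnum, Maruti, Scooter): Magnum represents Maruti and Maruti makes scooters, but Magnum does not sell scooters, so the third relation is needed to exclude it.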