
Database Systems - Normalization of Relations (Chapter 4/3)


Normalization

• Introduction

• Data Redundancy

Introduction

The normalization process is one of the steps in designing a database.

The input to this step is an ER model.

The output of this step is a relational data model.

What is Normalization?

Normalization is a technique, or process, for designing relational databases through analysis.

The technique involves evaluating each relation against the criteria for the normal forms and decomposing relations where required.

Implementation details are not shown at this step.

Purpose of Normalization

To arrive at a set of relations that represents the logical design of a database.

Characteristics of Normalized Relations

The characteristics of relations that normalization process produces are as follows:

1. minimal number of attributes in each relation

2. attributes with close relationships in each relation

3. minimal redundancy of attributes, i.e., a minimal number of instances of the same attribute appearing in more than one relation.

Data Redundancy

• Problems with Data Redundancy

• Unavoidable Data Redundancy

• Insertion Anomalies

• Deletion Anomalies

• Modification Anomalies

Data Redundancy & Its Problems

Data redundancy means the presence of the same attributes in multiple relations when they are not needed there, because their values can be obtained from another relation.

Example:

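emp(eno, ename, salary, dno, dname)

dept(dno, dname, mgr_eno, mgr_name)

Attribute dname of relation emp and attribute mgr_name of relation dept are redundant: the department name can be looked up in dept through dno, and the manager's name can be looked up in emp through mgr_eno.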

Unavoidable Data Redundancy

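Foreign keys that refer to primary keys or candidate keys of other relations are redundant data that cannot be avoided, because they are needed to link the relations.

Example: emp(eno, ename, salary, dno) and dept(dno, dname, mgr_eno), where dno in emp refers to dept and mgr_eno in dept refers to emp.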

Data Redundancy Problems

Data redundancy in relations can cause the following anomalies:

1. Insertion Anomalies

2. Deletion Anomalies

3. Modification Anomalies

Insertion Anomalies

Types of Insertion Anomalies:

Insertion Anomaly-1: Inserted values of some attributes are not consistent with values that are present in already existing tuples.

Insertion Anomaly-2: Values for related attributes cannot be inserted in a tuple without also entering values for some other attributes of the tuple.

Insertion Anomalies - Example

Insertion anomaly-1 occurs if, in attributes dno and dname of a new tuple, you insert values such as (11, ‘Marketing’) instead of (11, ‘Research’), which existing tuples already record for department 11.

Insertion anomaly-2 occurs when you try to insert a tuple for a new department that has no employees yet: you have to supply dummy employee details because no valid employee details exist.

Deletion Anomaly

A deletion anomaly is the unintended loss of vital facts when tuples are deleted from an un-normalized relation.

In the un-normalized relation empdepts, if the last remaining employee tuple of a department is deleted, the information about that department is deleted as well.

Modification Anomaly

A modification anomaly leaves un-normalized relations in an inconsistent state.

If you modify the department name in one tuple of relation empdepts, you must modify the department name in all other tuples for the same department; otherwise the data becomes inconsistent.
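The anomalies above can be reproduced on a small in-memory copy of empdepts. The Python sketch below is illustrative only: the tuples, names and department numbers are hypothetical, and `dept_names` and `is_consistent` are helper names chosen for this example. After each change it checks whether every department number still has exactly one department name.

```python
# Hypothetical sample of the un-normalized relation empdepts(eno, ename, dno, dname)
empdepts = [
    {"eno": 1001, "ename": "Asha", "dno": 10, "dname": "Sales"},
    {"eno": 1002, "ename": "Ravi", "dno": 11, "dname": "Research"},
    {"eno": 1003, "ename": "Mina", "dno": 11, "dname": "Research"},
]

def dept_names(rows):
    """Map each dno to the set of dname values recorded for it."""
    names = {}
    for row in rows:
        names.setdefault(row["dno"], set()).add(row["dname"])
    return names

def is_consistent(rows):
    """True if every department number has exactly one department name."""
    return all(len(v) == 1 for v in dept_names(rows).values())

# Insertion anomaly-1: a new tuple records department 11 as 'Marketing'
inserted = empdepts + [{"eno": 1004, "ename": "Leo", "dno": 11, "dname": "Marketing"}]
print(is_consistent(inserted))    # False

# Modification anomaly: department 11 renamed in only one of its tuples
modified = [dict(r) for r in empdepts]
modified[1]["dname"] = "R&D"
print(is_consistent(modified))    # False

# Deletion anomaly: deleting the only employee of department 10 loses its name
deleted = [r for r in empdepts if r["eno"] != 1001]
print(10 in dept_names(deleted))  # False
```

Keeping dname in a separate department relation, as normalization does, removes all three problems because each department name is then stored once.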

Attribute Dependencies

• Functional Dependency

Attribute Dependencies

Attribute dependency means the dependency of the values of one or more attributes on the values of one or more other attributes in a relation.

Understanding dependencies is important for designing relations. Dependencies also help in deciding the primary key of each relation.

Types of dependencies:

• Functional Dependencies

• Multi-Valued Dependencies

Functional Dependency

A functional dependency (FD) describes a relationship between attributes of a relation.

If X and Y are subsets of the attributes of relation R, we say X functionally determines Y (equivalently, Y is functionally dependent on X, or simply X determines Y), denoted X → Y, if each value of X is associated with exactly one value of Y.

Diagrammatic representation is as follows:

Functional Dependency Examples

Valid functional dependencies:

eno → ename

eno → dob

eno → salary

eno → gender

eno → dname

An invalid functional dependency: gender → eno

It is invalid because the same gender, for example ‘M’, is associated with multiple eno values, such as 1001 and 1002.
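Whether an FD is violated can be checked mechanically against sample tuples. The Python sketch below is a minimal illustration with hypothetical data, and `fd_holds` is a helper name chosen for this example. Sample data can only disprove an FD: a real FD is a rule about every valid state of the relation, not a property of one snapshot.

```python
from typing import Iterable, Mapping, Sequence

def fd_holds(rows: Iterable[Mapping], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if lhs -> rhs holds in the given tuples, i.e. tuples
    that agree on lhs always agree on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs seen for x and returns it afterwards
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical employee tuples
emp = [
    {"eno": 1001, "ename": "Ravi", "gender": "M"},
    {"eno": 1002, "ename": "Kiran", "gender": "M"},
    {"eno": 1003, "ename": "Asha", "gender": "F"},
]

print(fd_holds(emp, ["eno"], ["ename"]))   # True
print(fd_holds(emp, ["gender"], ["eno"]))  # False: 'M' maps to 1001 and 1002
```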

Full Functional Dependency

A functional dependency is a full functional dependency if the dependent attributes depend on all of the determinant attributes taken together.

X → Y is a full functional dependency of a relation R if the set of attributes Y is functionally dependent on the set of attributes X but not on any proper subset of X.

Partial Functional Dependency

A functional dependency is a partial functional dependency if only some of the determinant attributes (a proper subset of them) are enough to determine the dependent attributes.

X → Y is a partial functional dependency of a relation R if the set of attributes Y is functionally dependent on a proper subset of X. An example of a partial functional dependency is eno, dob → salary, because salary depends only on eno and not on the date-of-birth attribute dob.
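The full/partial distinction can be tested by trying every proper subset of the determinant. This Python sketch uses hypothetical tuples and illustrative helper names (`fd_holds`, `is_full`); as with any sample-based check, it can only show that a dependency fails in the data, not that it is guaranteed by the schema.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """True if tuples that agree on lhs always agree on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

def is_full(rows, lhs, rhs):
    """lhs -> rhs is full if it holds and no proper subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(len(lhs)):  # sizes 0 .. len(lhs)-1 give the proper subsets
        for subset in combinations(lhs, size):
            if fd_holds(rows, list(subset), rhs):
                return False
    return True

# Hypothetical tuples: eno determines salary, dob does not
emp = [
    {"eno": 1001, "dob": "1990-01-05", "salary": 42000},
    {"eno": 1002, "dob": "1990-01-05", "salary": 50000},
    {"eno": 1003, "dob": "1985-07-20", "salary": 42000},
]

print(is_full(emp, ["eno", "dob"], ["salary"]))  # False: partial, eno alone suffices
print(is_full(emp, ["eno"], ["salary"]))         # True
```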

Transitive Functional Dependency

A functional dependency X → Z is a transitive functional dependency if it holds through an intermediate set of attributes Y: X determines Y, and Y in turn determines Z.

If X, Y and Z are sets of attributes of a relation R and there are functional dependencies X → Y and Y → Z, then we say there is a transitive dependency between X and Z where Z is transitively dependent on X, i.e., X → Z, provided X is not functionally dependent on Y or Z. In the example relation empdepts, there are functional dependencies eno → dno and dno → dname. This implies there is a transitive dependency eno → dname.
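Transitive dependencies such as eno → dname can be derived mechanically with the standard attribute-closure algorithm: repeatedly add the right-hand side of any FD whose left-hand side is already in the set. The Python sketch below is a minimal version; the `closure` name and the (lhs, rhs) frozenset encoding of FDs are choices made for this example.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs
    under fds, where each FD is a (lhs, rhs) pair of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# FDs of empdepts used in the text
fds = [
    (frozenset({"eno"}), frozenset({"dno"})),
    (frozenset({"dno"}), frozenset({"dname"})),
]

print(sorted(closure({"eno"}, fds)))   # ['dname', 'dno', 'eno'], so eno -> dname
print("eno" in closure({"dno"}, fds))  # False: dno does not determine eno
```

The second check confirms the proviso in the definition: the intermediate attribute dno does not determine eno, so eno → dname is genuinely transitive.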

Functional Dependencies in relation empdepts

The following are the functional dependencies of relation empdepts:

Full Functional Dependencies:

eno → ename, dob, salary, gender, dno, dname

eno → dno

dno → dname

Transitive Functional Dependencies:

eno → dname

Identifying Primary Keys and Candidate Keys

The left-hand-side attributes of a functional dependency form a candidate key if the right-hand-side attributes are all the remaining attributes of the relation and no proper subset of the left-hand side has this property. This means all other attributes are dependent on the candidate key attributes. If there is only one candidate key for the relation, designate it as the primary key. Otherwise, choose one of the candidate keys as the primary key based on the enterprise's requirements.
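The rule above can be applied systematically: a set of attributes is a candidate key if its closure is the whole relation and no smaller key is contained in it. The Python sketch below enumerates attribute sets by increasing size, which is exponential and therefore only suitable for small textbook relations; the function names and FD encoding are illustrative choices.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by attrs under fds ((lhs, rhs) frozenset pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys

empdepts = ["eno", "ename", "dob", "salary", "gender", "dno", "dname"]
fds = [
    (frozenset({"eno"}), frozenset({"ename", "dob", "salary", "gender", "dno"})),
    (frozenset({"dno"}), frozenset({"dname"})),
]

print(candidate_keys(empdepts, fds))  # [{'eno'}]
```

Since eno is the only candidate key here, it becomes the primary key of empdepts.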

Multi-Valued Dependency

A multi-valued dependency (MVD) is a dependency of attributes Y and Z on X in a relation such that, for each value of X, there is a set of values for Y and a set of values for Z, and the two sets of values are independent of each other.

The multi-valued dependency is represented with a double arrow as follows:

X -->> Y

X -->> Z

Multi-Valued Dependency Example

The following relation emp_assets(eno, phone, computer) represents company assets (phones and computers) given to an employee. Phones and computer systems are not related to each other at all, and neither are the numbers of phones and computers an employee has. Hence, the relation has the following MVDs:

eno -->> phone

eno -->> computer
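Independence of the two value sets means that, for each eno, every phone must appear with every computer. The Python sketch below checks exactly that on a list of tuples; the emp_assets data and the `mvd_holds` name are hypothetical, and, as with FDs, a sample can only refute an MVD.

```python
from itertools import product

def mvd_holds(rows, x, y):
    """Check X ->> Y on a list of dict tuples. Z is every other attribute.
    For each X value, the (Y, Z) combinations present must be exactly
    the cross product of the Y values and the Z values for that X value."""
    if not rows:
        return True
    z = sorted(set(rows[0]) - set(x) - set(y))
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        pair = (tuple(row[a] for a in y), tuple(row[a] for a in z))
        groups.setdefault(key, set()).add(pair)
    for pairs in groups.values():
        ys = {yv for yv, _ in pairs}
        zs = {zv for _, zv in pairs}
        if pairs != set(product(ys, zs)):
            return False
    return True

# Hypothetical emp_assets tuples: employee 1001 has two phones and two computers
assets = [
    {"eno": 1001, "phone": "P1", "computer": "C1"},
    {"eno": 1001, "phone": "P1", "computer": "C2"},
    {"eno": 1001, "phone": "P2", "computer": "C1"},
    {"eno": 1001, "phone": "P2", "computer": "C2"},
    {"eno": 1002, "phone": "P3", "computer": "C3"},
]

print(mvd_holds(assets, ["eno"], ["phone"]))      # True
print(mvd_holds(assets[1:], ["eno"], ["phone"]))  # False: (P1, C1) is missing
```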

NORMAL FORMS

What is a Normal Form?

Un-normalized Form (UNF)

A relation that contains multi-valued and/or composite attributes is said to be in un-normalized form.

Example: The following relation emp has two phone numbers in attribute phone and multiple components in the values of attribute address. Here, attribute phone is multi-valued and attribute address is composite, i.e., non-atomic or divisible.

First Normal Form (1NF)

A relation is said to be in First Normal Form (1NF) if it does not have multi-valued attributes, composite attributes, or a combination of them.

Example: Relations customers and participants are not in 1NF: attributes phone_number and interest of relation customers are both multi-valued, whereas attribute details of relation participants is composite.
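Bringing a relation into 1NF means splitting composite attributes into separate atomic attributes and storing one tuple per value of a multi-valued attribute. The Python sketch below does this for a hypothetical emp relation; the phone numbers, the street/city components of address, and the `to_1nf` name are all assumptions made for illustration.

```python
# Hypothetical un-normalized tuples: phone is multi-valued (a list),
# address is composite (a dict of components)
emp_unf = [
    {"eno": 1001, "ename": "Ravi", "phone": ["555-0101", "555-0102"],
     "address": {"street": "MG Road", "city": "Bengaluru"}},
    {"eno": 1002, "ename": "Asha", "phone": ["555-0199"],
     "address": {"street": "Park Street", "city": "Kolkata"}},
]

def to_1nf(rows):
    """Make every attribute atomic: split the composite address into
    separate attributes and emit one tuple per phone number."""
    flat = []
    for row in rows:
        for phone in row["phone"]:
            flat.append({
                "eno": row["eno"],
                "ename": row["ename"],
                "phone": phone,
                "street": row["address"]["street"],
                "city": row["address"]["city"],
            })
    return flat

for t in to_1nf(emp_unf):
    print(t)
```

In the output, eno, ename, street and city are repeated once per phone number, which is the kind of redundancy 1NF relations still carry.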

Problems with 1NF Relations

First normal form relations can have the following problems:

1. Redundant data

2. Update anomalies

The emp relation in 1NF has redundant data in every column except phone number. It can also give rise to all of the update anomalies studied earlier.

Second Normal Form (2NF)

A relation is in Second Normal Form (2NF) if it is in First Normal Form and every non-primary-key attribute is fully functionally dependent on every candidate key.
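A 2NF violation is a non-key attribute that depends on only part of a composite key. The Python sketch below finds such partial dependencies by computing the closure of each proper subset of the key. It assumes the relation has a single candidate key, and the works_on relation, its FDs and the function names are hypothetical examples, not taken from the slides.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by attrs under fds ((lhs, rhs) frozenset pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(attributes, fds, key):
    """List (subset, attrs) pairs where a proper subset of the key determines
    non-key attributes, i.e. the 2NF violations for this key."""
    non_key = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(set(subset), fds) & non_key
            if determined:
                found.append((subset, sorted(determined)))
    return found

# Hypothetical relation works_on(eno, pno, hours, ename, pname), key (eno, pno)
attributes = ["eno", "pno", "hours", "ename", "pname"]
fds = [
    (frozenset({"eno", "pno"}), frozenset({"hours"})),
    (frozenset({"eno"}), frozenset({"ename"})),
    (frozenset({"pno"}), frozenset({"pname"})),
]

print(partial_dependencies(attributes, fds, ["eno", "pno"]))
# [(('eno',), ['ename']), (('pno',), ['pname'])]
```

Each reported pair suggests a relation to split off, here (eno, ename) and (pno, pname), leaving (eno, pno, hours) in 2NF.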

Example:

Third Normal Form (3NF)

A relation is in Third Normal Form (3NF) if the relation is already in Second Normal Form and no non-primary-key attribute is transitively dependent on the primary key.

Two ways a set of attributes Y can transitively depend on a key:

KEY X Y

KEY Y X

Third Normal Form (3NF) (contd…)

Example:

eno → dno and dno → dname, hence eno → dname
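Removing the transitive dependency means moving dname into its own relation keyed by dno. The Python sketch below projects hypothetical empdepts tuples into emp(eno, ename, dno) and dept(dno, dname), then checks with a natural join that no information is lost. `project` and `natural_join` are simple illustrative helpers written for this example, not a database API.

```python
def project(rows, attrs):
    """Projection: keep only attrs (as tuples in that order), removing duplicates."""
    return {tuple(row[a] for a in attrs) for row in rows}

def natural_join(left, left_attrs, right, right_attrs):
    """Nested-loop natural join of two projected relations (sets of tuples)."""
    common = [a for a in left_attrs if a in right_attrs]
    out_attrs = left_attrs + [a for a in right_attrs if a not in common]
    result = set()
    for lt in left:
        lrow = dict(zip(left_attrs, lt))
        for rt in right:
            rrow = dict(zip(right_attrs, rt))
            if all(lrow[a] == rrow[a] for a in common):
                merged = {**lrow, **rrow}
                result.add(tuple(merged[a] for a in out_attrs))
    return result, out_attrs

# Hypothetical tuples of empdepts(eno, ename, dno, dname)
empdepts = [
    {"eno": 1001, "ename": "Ravi", "dno": 10, "dname": "Sales"},
    {"eno": 1002, "ename": "Asha", "dno": 11, "dname": "Research"},
    {"eno": 1003, "ename": "Mina", "dno": 11, "dname": "Research"},
]

emp = project(empdepts, ["eno", "ename", "dno"])
dept = project(empdepts, ["dno", "dname"])  # each dname now stored once per dno
joined, attrs = natural_join(emp, ["eno", "ename", "dno"], dept, ["dno", "dname"])

print(len(dept))                           # 2
print(joined == project(empdepts, attrs))  # True: the decomposition is lossless
```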

Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd Normal Form (BCNF) if and only if every determinant is a candidate key.

This normal form is stricter than 3NF: every relation in BCNF is also in 3NF, but not every relation in 3NF is in BCNF.

Example: The FD in the relation emp3 that violates BCNF is

pin_code → block, area, city

because pin_code is a determinant but not a candidate key of emp3.
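A BCNF check asks, for each non-trivial FD, whether its determinant determines every attribute of the relation, i.e., whether it is a key. The slides do not show the full schema of emp3, so the Python sketch below assumes a hypothetical schema with eno → ename, pin_code and pin_code → block, area, city; the function names are illustrative.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds ((lhs, rhs) frozenset pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return non-trivial FDs whose determinant does not determine
    every attribute, i.e. whose determinant is not a key."""
    all_attrs = set(attributes)
    return [(sorted(lhs), sorted(rhs)) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != all_attrs]

# Assumed schema, for illustration only
attributes = ["eno", "ename", "block", "area", "city", "pin_code"]
fds = [
    (frozenset({"eno"}), frozenset({"ename", "pin_code"})),
    (frozenset({"pin_code"}), frozenset({"block", "area", "city"})),
]

print(bcnf_violations(attributes, fds))  # [(['pin_code'], ['area', 'block', 'city'])]
```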

Fourth Normal Form (4NF)

A relation is in Fourth Normal Form (4NF) if the relation is in Boyce-Codd normal form and does not contain non-trivial multi-valued dependencies other than those whose determinant contains a candidate key.

Example: The following relation has the MVDs

eno -->> language

eno -->> hobby

Attributes language and hobby are independent of each other.

Fourth Normal Form (4NF) – Repeated Values

The relation specified earlier has values repeated for attributes language and hobby. One way to try to avoid the repetition is to not store repeating values, as shown below.

[Figure: invalid representations of the relation that try to avoid redundant data]

[Figure: relation not in 4th normal form]

Decomposing into 4th Normal Form

The decomposition forms a relation for each MVD, so that each relation contains the determinant as well as the dependent attributes, as shown below.

[Figure: original relation; two relations after decomposing]
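The decomposition can be checked on sample data: projecting the relation onto (eno, language) and (eno, hobby) and joining the projections on eno should give back exactly the original tuples. The Python sketch below uses hypothetical languages and hobbies for the relation described above.

```python
from itertools import product

# Hypothetical tuples of emp(eno, language, hobby); for each eno every
# language appears with every hobby, as the two MVDs require
emp = {
    (1001, "English", "Chess"),
    (1001, "English", "Music"),
    (1001, "Hindi", "Chess"),
    (1001, "Hindi", "Music"),
    (1002, "Tamil", "Cricket"),
}

# One relation per MVD: determinant plus dependent attribute
emp_language = {(eno, lang) for eno, lang, _ in emp}
emp_hobby = {(eno, hobby) for eno, _, hobby in emp}

# Natural join on eno
rejoined = {
    (e1, lang, hobby)
    for (e1, lang), (e2, hobby) in product(emp_language, emp_hobby)
    if e1 == e2
}

print(rejoined == emp)                               # True
print(len(emp), len(emp_language), len(emp_hobby))  # 5 3 3
```

Each language and each hobby is now stored once per employee instead of once per combination.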

Fifth Normal Form (5NF)

A relation is in Fifth Normal Form (5NF) if the relation does not have any join dependency (JD) other than those implied by its candidate keys.

What this means is: if a relation schema can be decomposed into multiple relations, each with fewer attributes than the original relation, such that the join of all these relations is equal to the original relation, then the relation is said to have a join dependency. A relation that has a join dependency should be decomposed into these relations to remove redundancy; the decomposed relations would be in 5th Normal Form.

If the key of the decomposed relations is the same as that of the original relation, or the original relation cannot be reconstructed from the decomposed smaller relations, the original relation is already in 5th Normal Form and no decomposition is required.

Fifth Normal Form (5NF) Example

Suppose we have the following relation about dealers, vehicle manufacturing companies and products:

Dealer    Company   Product
Trident   Hyundai   Car
Dakshin   Hyundai   Car
Dakshin   Maruti    Car
Dakshin   Maruti    Scooter
Magnum    Maruti    Car
Magnum    Honda     Car
Magnum    Honda     Motor Cycle

Assume that there is a rule that if a dealer sells a product, the dealer represents a company, and that company makes the product, then the dealer definitely sells that product of that company.

Dealer    Company
Trident   Hyundai
Dakshin   Hyundai
Dakshin   Maruti
Magnum    Honda
Magnum    Maruti

Company   Product
Hyundai   Car
Maruti    Car
Maruti    Scooter
Honda     Car
Honda     Motor Cycle

Dealer    Product
Trident   Car
Dakshin   Car
Dakshin   Scooter
Magnum    Car
Magnum    Motor Cycle

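Under this rule, the relation has redundancy and a join dependency, so it can be split into the three relations (Dealer, Company), (Company, Product) and (Dealer, Product) shown above; joining all three reconstructs the original relation.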
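The join dependency can be verified on the example tuples. The Python sketch below projects the relation onto the three attribute pairs, shows that joining only two of them produces a spurious tuple, and confirms that joining all three gives back the original relation. The variable names are illustrative.

```python
# The dealer/company/product relation from the example
R = {
    ("Trident", "Hyundai", "Car"),
    ("Dakshin", "Hyundai", "Car"),
    ("Dakshin", "Maruti", "Car"),
    ("Dakshin", "Maruti", "Scooter"),
    ("Magnum", "Maruti", "Car"),
    ("Magnum", "Honda", "Car"),
    ("Magnum", "Honda", "Motor Cycle"),
}

# The three binary projections
dealer_company = {(d, c) for d, c, _ in R}
company_product = {(c, p) for _, c, p in R}
dealer_product = {(d, p) for d, _, p in R}

# Joining only two projections (on Company) produces a spurious tuple
two_way = {(d, c, p)
           for d, c in dealer_company
           for c2, p in company_product
           if c == c2}
print(two_way - R)     # {('Magnum', 'Maruti', 'Scooter')}

# Joining the result with the third projection on (Dealer, Product) removes it
three_way = {(d, c, p) for d, c, p in two_way if (d, p) in dealer_product}
print(three_way == R)  # True
```

The spurious tuple appears because Magnum represents Maruti and Maruti makes scooters, yet Magnum does not sell scooters; only the third projection rules it out, which is why all three relations are needed.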