Dr. Mohamed Osman Hegaz

Logical data base design (2): Normalization


Normalization: the process of decomposing unsatisfactory ("bad") relations by breaking up their attributes into smaller relations. A bad relation is one that contains redundancy (duplicated values) and therefore causes update anomalies.

Normal form: a condition, stated in terms of the keys and FDs (functional dependencies) of a relation, used to certify whether a relation schema is in a particular normal form.


The Process of Normalization

Formal technique for analyzing a relation based on its primary key and functional dependencies between its attributes.

Often executed as a series of steps. Each step corresponds to a specific normal form, which has known properties.

As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
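Since the whole process rests on functional dependencies, it can help to see how one is tested against sample data. The following is a minimal sketch, not part of the original slides: the function name `fd_holds` and the staff/branch rows are hypothetical. Note that a sample instance can only refute an FD; it cannot prove that the FD holds in every legal state of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is not violated by these rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample data: branch details are repeated for every staff member.
rows = [
    {"staff_no": "S1", "branch_no": "B1", "branch_city": "Riyadh"},
    {"staff_no": "S2", "branch_no": "B1", "branch_city": "Riyadh"},
    {"staff_no": "S3", "branch_no": "B2", "branch_city": "Jeddah"},
]

branch_determines_city = fd_holds(rows, ["branch_no"], ["branch_city"])
city_determines_staff = fd_holds(rows, ["branch_city"], ["staff_no"])
```

The repeated branch city in the first two rows is exactly the kind of redundancy that leads to update anomalies: changing the city of B1 requires touching every staff row of that branch.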


Relationship Between Normal Forms


“Key” Concepts

– Superkey: a set of attributes such that no two tuples have the same values for these attributes.

– Primary key: a selected candidate key.


Unnormalized Form (UNF)

A table that contains one or more repeating groups.

To create an unnormalized table: transform data from the information source (e.g. a form) into table format with columns and rows.


First Normal Form (1NF)

A relation in which the intersection of each row and column contains one and only one value.

A relation schema is in 1NF if the domains of its attributes include only atomic (simple, indivisible) values and the value of an attribute in a tuple is a single value from the domain of that attribute.

1NF disallows:

– having a set of values, a tuple of values, or a combination of both as an attribute value for a single tuple

– "relations within relations" and "relations as attributes of tuples"


UNF to 1NF

Nominate an attribute or group of attributes to act as the key for the unnormalized table.

Identify the repeating group(s) in the unnormalized table that repeat for the key attribute(s).


Remove the repeating group by:

– entering appropriate data into the empty columns of rows containing repeating data ('flattening' the table); or

– placing the repeating data, along with a copy of the original key attribute(s), into a separate relation.
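The first option ('flattening') can be sketched as follows. This is an illustrative example only: the client/rental data and the function name `flatten` are hypothetical and do not come from the slides. Each entry of the repeating group becomes its own row, and the non-repeating values are copied into it.

```python
def flatten(table, group_attr):
    """Turn each entry of the repeating group into a separate flat row."""
    flat = []
    for row in table:
        # Values outside the repeating group are copied into every new row.
        base = {a: v for a, v in row.items() if a != group_attr}
        for item in row[group_attr]:
            flat.append({**base, **item})
    return flat


# Hypothetical unnormalized table: 'rentals' is a repeating group.
unf = [
    {"client_no": "C1", "client_name": "Ali", "rentals": [
        {"property_no": "P4", "rent": 350},
        {"property_no": "P16", "rent": 450},
    ]},
    {"client_no": "C2", "client_name": "Sara", "rentals": [
        {"property_no": "P4", "rent": 350},
    ]},
]

first_nf = flatten(unf, "rentals")
```

The result is in 1NF but repeats the client name on every rental row, which is why later normal forms are still needed.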


Non-1NF Relation


Relations in 1NF



(a) Relation schema that is not in 1NF.

(b) Example relation instance. (c) 1NF relation with redundancy.


(a) Schema of the EMP_PROJ relation with a "nested relation" PROJS.

(b) Example extension of the EMP_PROJ relation showing nested relations within each tuple.


Decomposing EMP_PROJ into 1NF relations EMP_PROJ1 and EMP_PROJ2 by propagating the primary key.
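A small sketch of this decomposition is given below. The figure itself is not reproduced here, so the attribute names (Ssn, Ename, Pnumber, Hours), the sample values, and the helper `decompose_nested` are assumptions made for illustration. The nested relation PROJS is moved into its own relation, and the primary key of the outer relation is propagated into it.

```python
def decompose_nested(rows, key, nested):
    """Split a relation with a nested relation into two 1NF relations."""
    # Outer relation: everything except the nested attribute.
    outer = [{a: v for a, v in r.items() if a != nested} for r in rows]
    # Inner relation: each nested tuple plus a copy of the outer primary key.
    inner = [{key: r[key], **item} for r in rows for item in r[nested]]
    return outer, inner


# Assumed shape of EMP_PROJ: one tuple per employee, projects nested inside.
emp_proj = [
    {"Ssn": "111", "Ename": "Smith", "PROJS": [
        {"Pnumber": 1, "Hours": 32.5},
        {"Pnumber": 2, "Hours": 7.5},
    ]},
    {"Ssn": "222", "Ename": "Wong", "PROJS": [
        {"Pnumber": 2, "Hours": 10.0},
    ]},
]

emp_proj1, emp_proj2 = decompose_nested(emp_proj, "Ssn", "PROJS")
```

Under these assumptions the key of EMP_PROJ2 is the combination of Ssn and Pnumber, since Ssn alone no longer identifies a row.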


Second Normal Form (2NF)

A relation schema is in 2NF if it is in 1NF and every nonprime attribute is fully functionally dependent on the primary key.

A FD X -> Y is termed “full” if removal of any attribute from X means that the FD no longer holds

A FD X -> Y is termed “partial” if some attribute can be removed from X and the dependency still holds
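The full/partial distinction can be tested mechanically with attribute closure. The sketch below is an illustration under assumptions: the helper names `closure` and `is_full` and the sample FDs (a typical employee/project schema) are not taken from the slides. An FD X -> Y is reported as full when it holds and dropping any single attribute from X breaks it.

```python
def closure(attrs, fds):
    """Compute the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full(lhs, rhs, fds):
    """True if lhs -> rhs holds and no attribute can be removed from lhs."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return False  # the FD does not hold at all
    # Checking single removals is enough: if a smaller subset worked,
    # a subset with just one attribute removed would work too.
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


# Assumed FDs, each written as (determinant, dependent attributes).
FDS = [
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]

hours_is_full = is_full({"Ssn", "Pnumber"}, {"Hours"}, FDS)
ename_is_full = is_full({"Ssn", "Pnumber"}, {"Ename"}, FDS)
```

Here `{Ssn, Pnumber} -> Ename` is partial because Ssn alone already determines Ename.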


1NF to 2NF

Identify the primary key for the 1NF relation.

Identify the functional dependencies in the relation.

If partial dependencies exist on the primary key, remove them by placing the partially dependent attributes in a new relation along with a copy of their determinant.


Normalizing EMP_PROJ into 2NF relations
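As the figure is missing here, the following sketch shows one way the 2NF step could be carried out. The attribute names and FDs for EMP_PROJ are assumptions based on the usual form of this example, and `to_2nf` is a hypothetical helper that assumes the given key is the only candidate key. Every nonprime attribute that depends on a proper subset of the key is moved into a new relation together with that subset.

```python
from itertools import combinations


def closure(attrs, fds):
    """Compute the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def to_2nf(attrs, key, fds):
    """Return a list of (key, attributes) pairs, one per 2NF relation."""
    key = set(key)
    nonprime = set(attrs) - key  # assumes `key` is the only candidate key
    relations, placed = [], set()
    # Look at proper subsets of the key, smallest first.
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            dependents = (closure(subset, fds) & nonprime) - placed
            if dependents:
                relations.append((set(subset), set(subset) | dependents))
                placed |= dependents
    # Whatever is left depends fully on the whole key.
    relations.append((key, key | (nonprime - placed)))
    return relations


ATTRS = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
FDS = [
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]

result = to_2nf(ATTRS, {"Ssn", "Pnumber"}, FDS)
```

With these assumed FDs the result has three relations: one keyed by Pnumber, one keyed by Ssn, and one keyed by the full pair that keeps only Hours.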


Third Normal Form (3NF)

Based on the concept of transitive dependency: if A, B and C are attributes of a relation such that A -> B and B -> C, then C is transitively dependent on A through B (provided that A is not functionally dependent on B or C).

3NF - A relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key.


2NF to 3NF

Identify the primary key in the 2NF relation.

Identify functional dependencies in the relation.

If transitive dependencies exist on the primary key, remove them by placing the transitively dependent attributes in a new relation along with a copy of their determinant.


Normalizing EMP_DEPT into 3NF relations
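The figure is not reproduced here, so the sketch below illustrates the 3NF step on an assumed version of EMP_DEPT. The attribute names, the two FDs, and the helper `to_3nf` are assumptions for illustration; the code also assumes the relation is already in 2NF and has a single candidate key. Attributes that depend on a determinant that is not a superkey are moved into a new relation together with that determinant.

```python
def closure(attrs, fds):
    """Compute the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def to_3nf(attrs, key, fds):
    """Return a list of (key, attributes) pairs after removing transitive FDs."""
    attrs, key = set(attrs), set(key)
    remaining = set(attrs)
    split_off = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= attrs
        moved = rhs - key - lhs  # nonprime attributes determined by lhs
        if not is_superkey and moved:
            # key -> lhs -> moved is a transitive dependency: split it off.
            split_off.append((set(lhs), set(lhs) | moved))
            remaining -= moved
    return [(key, remaining)] + split_off


ATTRS = {"Ename", "Ssn", "Bdate", "Address", "Dnumber", "Dname", "Dmgr_ssn"}
FDS = [
    ({"Ssn"}, {"Ename", "Bdate", "Address", "Dnumber"}),
    ({"Dnumber"}, {"Dname", "Dmgr_ssn"}),
]

result = to_3nf(ATTRS, {"Ssn"}, FDS)
```

The determinant Dnumber stays behind in the employee relation, where it acts as the link to the new department relation.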


General Definitions of 2NF and 3NF

Second normal form (2NF): A relation that is in 1NF and every non-primary-key attribute is fully functionally dependent on any candidate key.

Third normal form (3NF): A relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on any candidate key.


Boyce–Codd Normal Form (BCNF)

Based on functional dependencies that take into account all candidate keys in a relation; however, BCNF also has additional constraints compared with the general definition of 3NF.

BCNF - A relation is in BCNF if and only if every determinant is a candidate key.


The difference between 3NF and BCNF is that, for a functional dependency A -> B, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key.

BCNF, by contrast, insists that for this dependency to remain in a relation, A must be a candidate key.

Every relation in BCNF is also in 3NF. However, a relation in 3NF is not necessarily in BCNF.
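The difference can be made concrete with a small check. This is a sketch under assumptions: the relation TEACH(Student, Course, Instructor), its FDs, and the helper names are illustrative and not taken from the slides. The BCNF test requires the determinant of every non-trivial FD to be a superkey, while the 3NF test additionally accepts an FD whose dependent attributes are all prime.

```python
def closure(attrs, fds):
    """Compute the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """List the non-trivial FDs whose determinant is not a superkey."""
    attrs = set(attrs)
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= attrs]


def is_3nf(attrs, fds, prime):
    """3NF also tolerates a violating FD if its dependents are all prime."""
    return all((rhs - lhs) <= set(prime) for lhs, rhs in bcnf_violations(attrs, fds))


# Assumed example: each instructor teaches exactly one course.
ATTRS = {"Student", "Course", "Instructor"}
FDS = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]
# Candidate keys are {Student, Course} and {Student, Instructor},
# so every attribute is prime.
PRIME = {"Student", "Course", "Instructor"}

violations = bcnf_violations(ATTRS, FDS)
in_3nf = is_3nf(ATTRS, FDS, PRIME)
```

Under these assumptions the relation is in 3NF but not in BCNF, because Instructor determines Course without being a candidate key.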


Summary

2NF, 3NF and BCNF are based on the keys and FDs of a relation schema.

4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs).

Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation)
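For a decomposition into exactly two relations, the lossless join property has a simple test: the common attributes of R1 and R2 must functionally determine all of R1 or all of R2. The sketch below illustrates that test; the helper names and the schema R(A, B, C) with the single FD A -> B are hypothetical. Dependency preservation is a separate property and is not checked here.

```python
def closure(attrs, fds):
    """Compute the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """True if joining r1 and r2 cannot produce spurious tuples."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


# Hypothetical schema R(A, B, C) with the single FD A -> B.
FDS = [({"A"}, {"B"})]

good_split = lossless_binary({"A", "B"}, {"A", "C"}, FDS)
bad_split = lossless_binary({"A", "B"}, {"B", "C"}, FDS)
```

Splitting on A is safe because A is a key of the relation {A, B}; splitting on B is not, because B determines nothing else.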


Summary (cont):

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

Database designers need not normalize to the highest possible normal form (usually normalizing up to 3NF, BCNF or 4NF is enough).

Denormalization: the process of storing the join of higher normal form relations as a base relation—which is in a lower normal form


Summary (cont):

Definitions of Keys and Attributes Participating in Keys (1)

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
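When the FDs of a schema are known, both definitions can be checked with attribute closure: S is a superkey exactly when its closure contains every attribute of R, and a key is a superkey that stops being one as soon as any attribute is removed. The following is a minimal sketch; the helper names and the employee/project FDs are assumptions for illustration.

```python
def closure(attrs, fds):
    """Compute the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(s, attrs, fds):
    """S is a superkey if it determines every attribute of the schema."""
    return closure(s, fds) >= set(attrs)


def is_key(k, attrs, fds):
    """A key is a minimal superkey: removing any attribute breaks it."""
    k = set(k)
    return is_superkey(k, attrs, fds) and all(
        not is_superkey(k - {a}, attrs, fds) for a in k
    )


# Assumed schema and FDs.
ATTRS = {"Ssn", "Pnumber", "Hours", "Ename"}
FDS = [
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),
]

pair_is_key = is_key({"Ssn", "Pnumber"}, ATTRS, FDS)
padded_is_superkey = is_superkey({"Ssn", "Pnumber", "Ename"}, ATTRS, FDS)
padded_is_key = is_key({"Ssn", "Pnumber", "Ename"}, ATTRS, FDS)
```

The padded set is still a superkey but not a key, since Ename can be removed without losing the superkey property.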


If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is a member of some candidate key.

A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
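These definitions can be tied together in a short sketch that enumerates the candidate keys of a small schema and derives the prime and nonprime attributes from them. The schema R(EmpId, Email, Name), its FDs, and the helper names are hypothetical; the brute-force search over attribute subsets is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Compute the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Enumerate minimal superkeys, trying smaller attribute sets first."""
    ordered = sorted(attrs)
    keys = []
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) >= set(attrs):
                keys.append(s)
    return keys


def prime_attributes(attrs, fds):
    """Prime attributes are those appearing in at least one candidate key."""
    return set().union(*candidate_keys(attrs, fds))


# Hypothetical schema with two candidate keys.
ATTRS = {"EmpId", "Email", "Name"}
FDS = [
    ({"EmpId"}, {"Email", "Name"}),
    ({"Email"}, {"EmpId"}),
]

keys = candidate_keys(ATTRS, FDS)
prime = prime_attributes(ATTRS, FDS)
nonprime = ATTRS - prime
```

Either EmpId or Email could be designated the primary key here; the other would then be a secondary key, and Name is the only nonprime attribute.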