CS 377 Database Systems
Database Design Theory and Normalization
Li Xiong
Department of Mathematics and Computer Science
Emory University



Relational database design

• So far
  • Conceptual database design - ER Model
  • Logical database design - relational model
    • Mapping from ER to Relational Model
  • Relational Algebra
  • SQL
• Relational database design - relational model
  • Goodness measures of relational schemas

Some “Bad” Designs

Anomalies

• Insert Anomaly:
  • an insert operation that inserts ONE item of information needs to insert multiple tuples into some relation or needs to use NULL values
• Delete Anomaly:
  • a delete operation that deletes ONE item of information needs to delete multiple tuples from some relation or causes "additional" (unintended) information loss
• Update Anomaly:
  • an update operation that updates ONE item of information needs to update multiple tuples and may result in logical inconsistencies
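The three anomalies can be seen on a small denormalized relation. The sketch below is only an illustration: the relation `emp_dept`, its attribute names, and all values are invented for this example and are not taken from the slides.

```python
# Hypothetical denormalized relation: one tuple per employee, with the
# department's data repeated in every tuple (names and values are invented).
emp_dept = [
    {"ename": "Smith", "ssn": "111", "dnumber": 5, "dname": "Research", "mgr_ssn": "333"},
    {"ename": "Wong", "ssn": "333", "dnumber": 5, "dname": "Research", "mgr_ssn": "333"},
    {"ename": "Zelaya", "ssn": "999", "dnumber": 4, "dname": "Administration", "mgr_ssn": "987"},
]


def rename_department(rows, dnumber, new_name):
    """Update anomaly: changing ONE department name touches many tuples."""
    touched = 0
    for row in rows:
        if row["dnumber"] == dnumber:
            row["dname"] = new_name
            touched += 1
    return touched


def delete_employee(rows, ssn):
    """Delete anomaly: removing the last employee of a department
    also removes everything known about that department."""
    return [row for row in rows if row["ssn"] != ssn]


def insert_department(rows, dnumber, dname, mgr_ssn):
    """Insert anomaly: a department without employees needs NULLs (None)."""
    rows.append({"ename": None, "ssn": None, "dnumber": dnumber,
                 "dname": dname, "mgr_ssn": mgr_ssn})


updated = rename_department(emp_dept, 5, "R&D")
after_delete = delete_employee(emp_dept, "999")
remaining_departments = {row["dnumber"] for row in after_delete}
insert_department(emp_dept, 1, "Headquarters", "888")

print(updated)                # number of tuples touched by one logical update
print(remaining_departments)  # department 4 is gone along with its last employee
print(emp_dept[-1])           # the new department tuple carries NULL employee data
```

Splitting the relation into one employee relation and one department relation would let each of these operations touch a single tuple.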

Generation of Spurious Tuples

• Figure 15.5(a)
  • Relation schemas EMP_LOCS and EMP_PROJ1
• NATURAL JOIN
  • Result produces many more tuples than the original set of tuples in EMP_PROJ
  • Called spurious tuples
  • Represent spurious information that is not valid
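A small sketch of how a natural join over a poorly chosen decomposition produces spurious tuples. The helper functions `project` and `natural_join`, the reduced attribute lists, and the sample rows are assumptions made for illustration; the figure referenced above is not reproduced here.

```python
def project(rows, attrs):
    """Projection with duplicate elimination."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(r, s):
    """Join two relations (lists of dicts) on all attributes they share."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**t1, **t2} for t1 in r for t2 in s
            if all(t1[a] == t2[a] for a in common)]


# Hypothetical state of EMP_PROJ (attributes reduced for brevity).
emp_proj = [
    {"ssn": "111", "ename": "Smith", "pnumber": 1, "plocation": "Bellaire"},
    {"ssn": "111", "ename": "Smith", "pnumber": 2, "plocation": "Sugarland"},
    {"ssn": "333", "ename": "Wong", "pnumber": 2, "plocation": "Sugarland"},
    {"ssn": "333", "ename": "Wong", "pnumber": 3, "plocation": "Houston"},
]

# Decompose so that the only shared attribute is plocation,
# which is not a key of either relation.
emp_locs = project(emp_proj, ["ename", "plocation"])
emp_proj1 = project(emp_proj, ["ssn", "pnumber", "plocation"])

joined = natural_join(emp_proj1, emp_locs)
spurious = [row for row in joined if row not in emp_proj]

print(len(emp_proj), len(joined), len(spurious))
```

The join attribute here is neither a key nor a foreign key, which is why the join cannot tell which employee name belongs to which project assignment.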

Problematic Designs

• Anomalies cause redundant work to be done
• Waste of storage space due to NULLs
• Difficulty of performing operations and joins due to NULL values
• Generation of invalid and spurious data during joins

Informal Design Guidelines for “Good” Relation Schemas

• Clear schema and attribute semantics
• No insertion, deletion, or update anomalies are present
• Reducing redundant information in tuples
• Reducing NULL values in tuples
• Can be joined with equality conditions on related attributes with guarantees that no spurious tuples are generated

Database Design Theory

• Normal forms
  • Each normal form defines a set of properties that relations must satisfy
  • When relations possess these properties, they exhibit fewer anomalies
  • Successively higher degrees of stringency
• Database normalization
  • Certify whether a database design satisfies a certain normal form
  • Correct a database design to achieve a certain normal form
• Additional properties
  • Nonadditive join property
  • Dependency preservation property

History

• Relational database model
  • 1970, Codd
• 1NF, 2NF and 3NF (first, second, and third normal form)
  • 1972, Codd
  • Based on the concept of functional dependency
• BCNF (Boyce-Codd Normal Form)
  • 1974, Boyce & Codd
  • New and stronger 3NF
• 4NF
  • 1977, Fagin
  • Multi-valued dependencies
• 5NF (projection-join normal form)
  • 1979, Fagin

First Normal Form

• Part of the formal definition of a relation in the basic (flat) relational model
• Only attribute values permitted are single atomic (or indivisible) values
• Techniques to achieve first normal form
  • Remove attribute violating 1NF and place in separate relation
  • Expand the key
  • Use several atomic attributes if maximum number of values is known
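The first two techniques can be sketched on a relation with one multivalued attribute. The relation `department`, its attribute names, the helper functions, and the data are hypothetical and chosen only for illustration.

```python
# Hypothetical relation that violates 1NF: dlocations holds a set of values.
department = [
    {"dnumber": 5, "dname": "Research", "dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"dnumber": 4, "dname": "Administration", "dlocations": ["Stafford"]},
    {"dnumber": 1, "dname": "Headquarters", "dlocations": ["Houston"]},
]


def split_multivalued(rows, key, attr, new_attr):
    """Technique 1: move the offending attribute into a separate relation
    whose key is (key, new_attr)."""
    base = [{a: v for a, v in row.items() if a != attr} for row in rows]
    separate = [{key: row[key], new_attr: value}
                for row in rows for value in row[attr]]
    return base, separate


def expand_key(rows, attr, new_attr):
    """Technique 2: one tuple per value; new_attr becomes part of the key.
    This repeats the other attributes, so it introduces redundancy."""
    return [{**{a: v for a, v in row.items() if a != attr}, new_attr: value}
            for row in rows for value in row[attr]]


dept, dept_locations = split_multivalued(department, "dnumber", "dlocations", "dlocation")
expanded = expand_key(department, "dlocations", "dlocation")

print(dept)
print(dept_locations)
print(expanded)
```

The first technique is usually preferred because it keeps the department data in one tuple; the second one repeats `dname` once per location.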

Functional Dependency

• Constraint between two sets of attributes X and Y of a relation schema R, written X → Y
  • Any two tuples that agree on X must also agree on Y
• X functionally determines Y
• Y is functionally dependent on X
• Notes
  • If X is a candidate key of R, then X → R
  • If X → Y, not necessarily Y → X

Example FDs


Functional Dependency

• An FD is a property of the semantics or meaning of the attributes
• An FD is a property of the relational schema, not of a particular legal relation state
  • An FD must be defined based on the semantics of the attributes
• An FD cannot be inferred automatically from a given populated relation
  • A relation state can only suggest that an FD may exist
  • Can state that an FD does not hold if there are violations of such an FD
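A minimal sketch of that asymmetry: a relation state can refute an FD, but the absence of a violation only means the FD may hold. The function `find_violation` and the sample relation `teach` are assumptions introduced for this example.

```python
def find_violation(rows, lhs, rhs):
    """Return two tuples that agree on lhs but differ on rhs, or None.

    None does NOT prove the FD; it only means this state does not refute it.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if x in seen and seen[x][0] != y:
            return seen[x][1], row
        seen.setdefault(x, (y, row))
    return None


# Hypothetical relation state.
teach = [
    {"teacher": "Smith", "course": "Data Structures", "text": "Bartram"},
    {"teacher": "Smith", "course": "Data Management", "text": "Martin"},
    {"teacher": "Hall", "course": "Compilers", "text": "Hoffman"},
    {"teacher": "Brown", "course": "Data Structures", "text": "Horowitz"},
]

refuted = find_violation(teach, ["teacher"], ["course"])
not_refuted = find_violation(teach, ["text"], ["course"])

print(refuted)      # two Smith tuples with different courses: teacher -> course fails
print(not_refuted)  # None: text -> course may hold, but only semantics can confirm it
```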

Definitions of Keys and Attributes Participating in Keys

• Definition of superkey and key
• Candidate key
  • If more than one key in a relation schema
    • One is primary key
    • Others are secondary keys
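One way to make superkeys and candidate keys concrete is to derive them from a set of FDs. The sketch below uses attribute closure, a standard technique that these slides do not introduce, so treat it as an assumption; the schema and FDs are also illustrative. A superkey is any attribute set whose closure covers the whole schema, and a key is a superkey with no smaller superkey inside it.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= set(schema)


def candidate_keys(schema, fds):
    """Brute-force search by increasing size; fine for small schemas only."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            minimal = not any(set(k) <= set(combo) for k in keys)
            if minimal and is_superkey(combo, schema, fds):
                keys.append(combo)
    return keys


# Hypothetical schema and FDs, written as (left-hand side, right-hand side).
schema = ("ssn", "pnumber", "hours", "ename", "pname", "plocation")
fds = [
    (("ssn", "pnumber"), ("hours",)),
    (("ssn",), ("ename",)),
    (("pnumber",), ("pname", "plocation")),
]

keys = candidate_keys(schema, fds)
print(keys)
print(is_superkey(("ssn", "pnumber", "hours"), schema, fds))  # superkey, not minimal
print(is_superkey(("ssn",), schema, fds))
```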

Second Normal Form

• Full functional dependency vs. partial functional dependency
  • X → Y is a full functional dependency if for any attribute A in X, (X - {A}) does not functionally determine Y
  • X → Y is a partial functional dependency if for some attribute A in X, (X - {A}) functionally determines Y
• Second normal form (2NF)
  • Every nonprime attribute is fully functionally dependent on the primary key
• Problematic FD
  • Left-hand side is part of primary key
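The full versus partial test can be sketched directly from the definition. Whether (X - {A}) determines Y is decided here with attribute closure, which is an assumption of this example rather than something the slides present, and the schema and FDs are hypothetical. The check also assumes the primary key is the only candidate key.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """X -> Y is full if no (X - {A}) still determines Y."""
    lhs = set(lhs)
    return all(not set(rhs) <= closure(lhs - {a}, fds) for a in lhs)


def partially_dependent(primary_key, schema, fds):
    """Nonprime attributes that depend on only part of the primary key
    (assumes the primary key is the only candidate key)."""
    nonprime = set(schema) - set(primary_key)
    return sorted(a for a in nonprime if not is_full_fd(primary_key, (a,), fds))


# Hypothetical schema with primary key {ssn, pnumber}.
schema = ("ssn", "pnumber", "hours", "ename", "pname", "plocation")
primary_key = ("ssn", "pnumber")
fds = [
    (("ssn", "pnumber"), ("hours",)),
    (("ssn",), ("ename",)),
    (("pnumber",), ("pname", "plocation")),
]

violations = partially_dependent(primary_key, schema, fds)
print(is_full_fd(primary_key, ("hours",), fds))
print(is_full_fd(primary_key, ("ename",), fds))
print(violations)
```

Each attribute reported is determined by a proper part of the key, so moving it into a relation keyed by that part removes the 2NF violation.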

Third Normal Form

• Transitive dependency
  • X → Y is a transitive dependency if for some Z that is not a prime attribute, both X → Z and Z → Y hold
• Third normal form
  • In 2NF, and no nonprime attribute is transitively dependent on the primary key
• Problematic FD
  • Left-hand side is part of primary key
  • Left-hand side is a nonkey attribute

Summary


Boyce-Codd Normal Form

• BCNF
  • Whenever a nontrivial FD X → A holds in R, X is a superkey of R
• Difference from 3NF:
  • 3NF allows A to be prime, so X → A is accepted even when X is not a superkey
• Every relation in BCNF is also in 3NF
• Most relation schemas that are in 3NF are also in BCNF, but not all
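A sketch of a 3NF schema that is not in BCNF, next to a schema that fails both. The checker inspects only the listed FDs and finds candidate keys by brute force over attribute closures; the function names, schemas, and FDs are assumptions made for this illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            minimal = not any(set(k) <= set(combo) for k in keys)
            if minimal and closure(combo, fds) >= set(schema):
                keys.append(combo)
    return keys


def violations(schema, fds):
    """Return (BCNF violations, 3NF violations) among the listed FDs."""
    prime = set().union(*candidate_keys(schema, fds))
    bcnf, third = [], []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(schema):
            continue  # left-hand side is a superkey: fine for both forms
        for a in sorted(set(rhs) - set(lhs)):  # skip the trivial part
            bcnf.append((lhs, a))
            if a not in prime:  # 3NF tolerates the FD when A is prime
                third.append((lhs, a))
    return bcnf, third


# Hypothetical schema 1: in 3NF but not in BCNF.
teach_schema = ("student", "course", "instructor")
teach_fds = [
    (("student", "course"), ("instructor",)),
    (("instructor",), ("course",)),
]
teach_bcnf, teach_3nf = violations(teach_schema, teach_fds)

# Hypothetical schema 2: a transitive dependency, so neither 3NF nor BCNF.
emp_schema = ("ssn", "ename", "dnumber", "dname")
emp_fds = [
    (("ssn",), ("ename", "dnumber")),
    (("dnumber",), ("dname",)),
]
emp_bcnf, emp_3nf = violations(emp_schema, emp_fds)

print(teach_bcnf, teach_3nf)
print(emp_bcnf, emp_3nf)
```

In the first schema every attribute is prime, so `instructor → course` is tolerated by 3NF even though `instructor` is not a superkey; BCNF rejects it.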


Summary

• Informal guidelines for good design
• Functional dependency
• Normal forms
  • 1NF, 2NF, 3NF, BCNF