1 Normalization, Roberts’s Rules and Introduction to Data Modeling CSCI 6442


Normalization, Roberts’s Rules and Introduction to Data Modeling

CSCI 6442


Agenda

Roberts’s Rules

Normalization

Roberts’s Rules and Normalization


Why Are We Talking About This?

To design a database, we choose a set of entities that models a problem

We will store data in tables corresponding to our entity choices

The names of the entity types, and what’s in which table, become embedded in our programs

Changing later on is complex, so we want a stable model of the problem


Midterm Question

The first question on the midterm will deal with normal forms. It will deal with the relationship between normal forms and Roberts’s Rules.

This one question will count more than any other question on the exam.

The homework assignment for next week looks a lot like Question 1 on the midterm.


Syntax and Semantics

Syntax deals with the structure and form of a statement or language

Semantics deals with the meaning that is conveyed by a statement or language


Question

Is normalization a syntactic or a semantic construct?

That is, does it deal with the form of information, or is it involved with meaning?


Intensional vs. Extensional Data

Extensional data: the data that is actually present

Intensional data: all the data that is allowed to be present

Question: does normalization deal with intensional or extensional data?


Entity and Entity Type

An entity is something that we record information about in the database

An entity type is a set of similar things that we store information about

An entity instance is one example of some entity type.

Usually we don’t say entity instance and entity type when context makes the meaning clear; we just say entity.


Relations

We use a relation to model a single entity type

The relation is a set of tuples

Each tuple is an ordered collection of values of attributes of the entity type

Each tuple of the relation corresponds to a single instance of the entity type


Summary of Terminology

Real World        Theory      Database
Entity Type       Relation    Table
Entity Instance   Tuple       Row
Attribute         Fact        Column


Facts

A value of an attribute in a row conveys one fact about an entity instance

An attribute is a fact stating that “This entity instance has the value <value>”

Consider emp(empno, ename, job, deptno)

Each value of ename in a row states that “This person’s name is <value>”.

Each row of this table can be viewed as a collection of four facts


Example of Facts

EMPNO ENAME JOB DEPTNO

10 Wu President 1

20 Liu VP 2

30 Chen VP 2
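As a rough illustration of the "four facts per row" idea, the sketch below expands each row of the example table into one statement per column. The `facts` helper and the wording of each statement are assumptions made for illustration only.

```python
# Column names and rows copied from the example table above.
COLUMNS = ("EMPNO", "ENAME", "JOB", "DEPTNO")
ROWS = [
    (10, "Wu", "President", 1),
    (20, "Liu", "VP", 2),
    (30, "Chen", "VP", 2),
]

def facts(row, key_column="EMPNO"):
    """Return one statement per column about the entity instance in `row`."""
    record = dict(zip(COLUMNS, row))
    key = record[key_column]
    return [
        f"Employee {key}: {column} has the value {value}"
        for column, value in record.items()
    ]

for row in ROWS:
    for fact in facts(row):
        print(fact)
```

Each row yields exactly as many statements as the table has columns, which is the sense in which a row is a collection of facts.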


Data Modeling

The entire relational database, which is a set of relations, models something in the real world

The job of constructing that set of relations is called data modeling.

In general, in data modeling we are designing a collection of relations that models a part of the real world

All of the formality of normalization is all about how to construct a data model that behaves the way we want it to


What We’ll Do Now

First, we’ll talk about Roberts’s Rules, a collection of rules in plain English about how to design a database.

We’ll be careful to fully understand Roberts’s Rules. Then we’ll talk about the basic normal forms: 1NF,

2NF, 3NF, BCNF and 4NF. We’ll take time to understand the normal forms: what

does each actually do? Finally, we’ll look at the correspondence of the normal

forms with Roberts’s Rules. You will finish this exploration by additional exploration

that you will do in your homework.


Roberts’s Rules

Roberts’s Rules are a set of plain English rules that, if followed during database design, result in a highly normalized database design.

We will explore the relationship of Roberts’s Rules to normalization, and vice versa.


Rule 1

Each relation describes exactly one entity type.

A relation models a distinct entity type, and each tuple of the relation models an instance of that entity.

The relation models an entity by storing its attributes. The attributes that identify it are called candidate keys; the other attributes are non-key.


Do these follow Rule 1?

DESK(SER#, HEIGHT, WIDTH, COST, CUSTODIANSALARY)

EMP-CAR (EMP#, ENAME, DEPTNO, CARVIN#, CARMAKE, CARYEAR)

EMP(EMP#, ENAME, JOB,DEPTNO, DEPTCITY)


Rule 2

Each fact is represented only once in the database.

A tuple (aka row) is a collection of facts about an entity instance, one fact per column.

Each fact can appear only once, in one row of one table.


Duplicate Representation?

EMPNO ENAME JOB SAL DEPTNO

34 Liu Pres 200 5

456 Chen VP 150 5

32 Cox Sales 75 9

DEPTNO DNAME LOC

5 HQ NYC

9 Sales DC

20 Research SF


Rule 3

Each tuple can reside in only one relation.

A relation is a model of an entity type, not a station on a factory assembly line.

Instead of moving a tuple from relation to relation, add an attribute that characterizes status.


Rule 3 Example

As a person is being interviewed and hired, they change status:

1. Resume received

2. Resume being evaluated

3. Selected for interview

4. Selected for hire

5. Hired

As status changes, we could move the person’s row from one table to another. Should we?
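Rule 3 suggests keeping the person in one relation and recording the status as an attribute. A minimal sketch of that approach, using Python's built-in `sqlite3` module, is shown here; the `applicant` table, its columns and the `advance` helper are hypothetical names chosen for this example.

```python
import sqlite3

# Statuses taken from the five steps listed above.
STATUSES = (
    "Resume received",
    "Resume being evaluated",
    "Selected for interview",
    "Selected for hire",
    "Hired",
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE applicant ("
    " applicant_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " status TEXT NOT NULL)"
)
conn.execute("INSERT INTO applicant VALUES (1, 'Liu', ?)", (STATUSES[0],))

def advance(conn, applicant_id):
    """Move an applicant to the next status by updating a single column."""
    (current,) = conn.execute(
        "SELECT status FROM applicant WHERE applicant_id = ?", (applicant_id,)
    ).fetchone()
    next_index = min(STATUSES.index(current) + 1, len(STATUSES) - 1)
    conn.execute(
        "UPDATE applicant SET status = ? WHERE applicant_id = ?",
        (STATUSES[next_index], applicant_id),
    )
    return STATUSES[next_index]

advance(conn, 1)
advance(conn, 1)
row_count = conn.execute("SELECT COUNT(*) FROM applicant").fetchone()[0]
final_status = conn.execute(
    "SELECT status FROM applicant WHERE applicant_id = 1"
).fetchone()[0]
print(row_count, final_status)
```

The row never leaves the table; only its status value changes, so programs that query applicants do not need to know which stage a person is in to find them.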


Rule 4

If the cardinality of an attribute is greater than one, then database design must be insensitive to cardinality.

It’s easy—and very risky—to presume that the cardinality of various entity types and subtypes will remain the same.


Rule 4 Examples

Company car

College degree

Telephone number

Home address

Business address

Email address


Example of Roberts’s Rules

EMP (EMPNO, ENAME, DEPTNO, DNAME)

DEPT (DEPTNO, DNAME, DLOC)

This schema violates the following Roberts’s Rules:

Rule 1. The EMP table describes employee as well as department

Rule 2. In the EMP table, if we have the same DEPTNO in multiple rows, DNAME will be represented multiple times.
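To see why the repeated DNAME matters, the sketch below stores a few hypothetical EMP rows (values adapted from the earlier Duplicate Representation slide) and then changes the department name in only one of them. The helper name `names_per_department` is an assumption for this example.

```python
# Hypothetical rows for EMP(EMPNO, ENAME, DEPTNO, DNAME).
emp = [
    {"EMPNO": 34, "ENAME": "Liu", "DEPTNO": 5, "DNAME": "HQ"},
    {"EMPNO": 456, "ENAME": "Chen", "DEPTNO": 5, "DNAME": "HQ"},
    {"EMPNO": 32, "ENAME": "Cox", "DEPTNO": 9, "DNAME": "Sales"},
]

def names_per_department(rows):
    """Map each DEPTNO to the set of DNAME values stored for it."""
    names = {}
    for row in rows:
        names.setdefault(row["DEPTNO"], set()).add(row["DNAME"])
    return names

# Rename department 5 in only one of the rows that repeat the fact.
emp[0]["DNAME"] = "Headquarters"

# Any department with more than one stored name is now inconsistent.
inconsistent = {
    deptno: dnames
    for deptno, dnames in names_per_department(emp).items()
    if len(dnames) > 1
}
print(inconsistent)
```

If DNAME were stored only in DEPT, the rename would be a single change and this kind of disagreement could not arise.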


Another Example

EMP (ENAME, DEGREE1, DEGREE2, DEGREE3)

This schema violates the following Roberts’s Rule:

Rule 4. The design assumes every employee has at most 3 degrees. If an employee has 4 degrees, the database must be restructured by adding DEGREE4 to the EMP table.
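One way to make the design insensitive to the number of degrees is to store one row per employee and degree. The sketch below uses Python's built-in `sqlite3` module; the `emp_degree` table name and the sample degrees are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT PRIMARY KEY)")
# One row per (employee, degree) pair instead of DEGREE1..DEGREE3 columns.
conn.execute(
    "CREATE TABLE emp_degree ("
    " ename TEXT NOT NULL REFERENCES emp(ename),"
    " degree TEXT NOT NULL,"
    " PRIMARY KEY (ename, degree))"
)
conn.execute("INSERT INTO emp VALUES ('Jones')")

# A fourth degree is just one more row; the table structure is unchanged.
for degree in ("BS EE", "MS EE", "MBA", "PhD Comp Sci"):
    conn.execute("INSERT INTO emp_degree VALUES ('Jones', ?)", (degree,))

degree_count = conn.execute(
    "SELECT COUNT(*) FROM emp_degree WHERE ename = 'Jones'"
).fetchone()[0]
print(degree_count)
```

The table gains rows as degrees are added, while its set of columns stays fixed.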

Rule 4 deals with an aspect of data independence. It can be stated informally as:

"Grow down, not across"


A Question

Are Rule 1 and Rule 2 equivalent?

They are equivalent if the set of relations that satisfy Rule 1 is the same as the set of relations that satisfy Rule 2.

This is a homework problem.


Normalization Preliminaries


Normalization

A set of formal rules that are intended to be a definition of a properly-structured database

Each normal form generally targets certain anomalous behavior and removes it from the use of a relation that is in that normal form.


Examples of Anomalies

Insert anomalies

If we want to enter information about a new entity in the database we need to enter information about some other entity first

Delete anomalies

In order to delete information about an entity we must delete information about another entity

Update anomalies

In order to change the value of a single fact we may have to change many stored values in the database


Basic Concepts

Entity Type: a class of objects that we record information about. Aka relation, table

Attribute: a characteristic of an entity. Aka column.

Entity Instance: a single occurrence of an entity type. Aka tuple, row


Candidate Keys

Candidate key: a set of attributes Ai, Aj, …, Ak with two (time-invariant) properties:

1. Uniqueness – no two tuples have the same value for the candidate key

2. Minimality – if any Ai is discarded from the candidate key, then the uniqueness property is lost. It is a minimal set of attributes that identifies a row.

How many candidate keys can a table have?
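The two properties can be tested against a particular set of rows, as in the sketch below. Note the limitation: the properties are time-invariant, so a check over the rows that happen to be present can only suggest candidate keys, not prove them. The function names and the sample rows (taken from the earlier emp example) are assumptions for illustration.

```python
from itertools import combinations

def is_unique(rows, attrs):
    """True if no two rows agree on every attribute in `attrs`."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, columns):
    """Attribute sets that are unique and minimal for the given rows only."""
    keys = []
    for size in range(1, len(columns) + 1):
        for attrs in combinations(columns, size):
            # Skip supersets of keys already found: they are not minimal.
            if any(set(key) <= set(attrs) for key in keys):
                continue
            if is_unique(rows, attrs):
                keys.append(attrs)
    return keys

rows = [
    {"EMPNO": 10, "ENAME": "Wu", "JOB": "President", "DEPTNO": 1},
    {"EMPNO": 20, "ENAME": "Liu", "JOB": "VP", "DEPTNO": 2},
    {"EMPNO": 30, "ENAME": "Chen", "JOB": "VP", "DEPTNO": 2},
]
print(candidate_keys(rows, ["EMPNO", "ENAME", "JOB", "DEPTNO"]))
```

In this small sample ENAME happens to be unique as well as EMPNO, which shows why the decision has to rest on what the data is allowed to contain rather than on what it currently contains.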


Primary Key

One of the candidate keys is selected to be the primary identifier of rows. It is called the primary key.

The selection is usually made based on the usefulness of the attribute that is the primary key.


Functional Dependence

R.X→R.Y or R.X FD R.Y

Given a relation R, attribute Y of R is functionally dependent on attribute X of R iff each X-value in R has associated with it precisely one Y-value in R (at any one time)

In other words, for each value of X in table R, there is one and only one value of Y. A given X value must always occur with the same Y value.


Functional Dependence Examples

X Y

1 A

2 C

3 B

1 A

2 C

4 A

3 B

6 B

Does X→Y?

Does Y→X?
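The definition can be applied mechanically to the rows above. The sketch below is one possible check, with `determines` as a hypothetical helper name; it tests only the rows shown, whereas a functional dependency is a statement about every state the relation may take.

```python
def determines(rows, x, y):
    """True if each x-value in `rows` occurs with exactly one y-value."""
    seen = {}
    for row in rows:
        # setdefault records the first y seen for this x and returns it.
        if seen.setdefault(row[x], row[y]) != row[y]:
            return False
    return True

# (X, Y) pairs copied from the table above.
pairs = [(1, "A"), (2, "C"), (3, "B"), (1, "A"),
         (2, "C"), (4, "A"), (3, "B"), (6, "B")]
rows = [{"X": x, "Y": y} for x, y in pairs]

print("X -> Y holds in this data:", determines(rows, "X", "Y"))
print("Y -> X holds in this data:", determines(rows, "Y", "X"))
```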


Anomalies

Update anomalies: If one copy of repeated data is updated, inconsistency is created unless all copies are similarly updated.

Insert anomalies: It may not be possible to store some information unless some other information is stored as well.

Delete anomalies: It may not be possible to delete some information without losing some other information as well.


Full Functional Dependence

Y is fully functionally dependent on X iff X→Y and no proper subset of X determines Y.

That is, X is a minimal collection of columns that determines Y.
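A small sketch of the distinction follows. The rows are hypothetical supplier-part data in which the supplier name depends on S# alone, and the function names are assumptions for this example. As with any check over sample rows, it tests the data present, not the rule that governs it.

```python
from itertools import combinations

def determines(rows, xs, y):
    """True if each combination of `xs` values occurs with one `y` value."""
    seen = {}
    for row in rows:
        key = tuple(row[x] for x in xs)
        if seen.setdefault(key, row[y]) != row[y]:
            return False
    return True

def fully_determines(rows, xs, y):
    """True if xs -> y holds and no proper subset of xs determines y."""
    if not determines(rows, xs, y):
        return False
    for size in range(1, len(xs)):
        for subset in combinations(xs, size):
            if determines(rows, subset, y):
                return False
    return True

# Hypothetical supplier-part rows: SNAME depends on S# alone.
rows = [
    {"S#": 1, "P#": 65, "SNAME": "Acme", "QTY": 788},
    {"S#": 1, "P#": 34, "SNAME": "Acme", "QTY": 12},
    {"S#": 2, "P#": 34, "SNAME": "Chen", "QTY": 76},
]
print(fully_determines(rows, ("S#", "P#"), "QTY"))
print(fully_determines(rows, ("S#", "P#"), "SNAME"))
```

QTY needs both S# and P#, while SNAME is already determined by S#, so only QTY is fully dependent on the pair.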


“Aboutness”

FD is about “aboutness”

If A is FD on X, then A is “about” X

Suppose X is employee ID, EID; then EID determines salary, SAL

But SAL is “about” the employee identified by EID


Normalization


First Normal Form

A relation is said to be in first normal form iff every attribute of every tuple is atomic.


1NF Example

Empno   Ename   Job              Educ                         Deptno
33      Jones   Pres             BS EE, MS EE, PhD Comp Sci   3
324     Chu     VP               BS EE, MBA                   3
88      Kumar   Sales            BS EE, MA Comm               4
65      Yu      Quality Contr.   BS CS, MS CS, PhD CS         5

Question: Is this relation in 1NF?

Question: Does this relation show any anomalies?


What’s not allowed by 1NF?

1NF doesn’t allow a relation to contain:

Lists

Other relations

Multiple values
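For the Educ column in the 1NF example, one way to remove the list is to move the degrees into their own relation with one row per employee and degree. The sketch below does this in plain Python; `emp_educ` is a hypothetical relation name.

```python
# Rows copied from the 1NF example; Educ holds a comma-separated list.
emp = [
    (33, "Jones", "Pres", "BS EE, MS EE, PhD Comp Sci", 3),
    (324, "Chu", "VP", "BS EE, MBA", 3),
    (88, "Kumar", "Sales", "BS EE, MA Comm", 4),
    (65, "Yu", "Quality Contr.", "BS CS, MS CS, PhD CS", 5),
]

# EMP keeps only the single-valued attributes.
emp_atomic = [(empno, ename, job, deptno)
              for empno, ename, job, _educ, deptno in emp]

# EMP_EDUC gets one row per employee and degree, so every value is atomic.
emp_educ = [(empno, degree.strip())
            for empno, _ename, _job, educ, _deptno in emp
            for degree in educ.split(",")]

for row in emp_educ:
    print(row)
```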


Second Normal Form

A relation is said to be in second normal form iff it is in first normal form and every non-key attribute is fully functionally dependent on the primary key.


2NF Example

SID SNAME City Status

4 Smith NYC 45

6 Liu DC 65

7 Chen NYC 45

9 Jones LA 22

[Dependency diagram over SID, SNAME, City and Status]

Does this relation follow Roberts’s Rules?

Do you see any anomalies?


2NF and RR

What is the relationship between 2NF and Roberts’s Rules?

If Rule 1 is met, is the relation in 2NF?

What about Rule 2?


What does 2NF not permit?

2NF doesn’t allow a relation to have information about more than one entity type


Third Normal Form

A relation is said to be in third normal form iff it is in second normal form and no non-key attribute is transitively dependent on the primary key.


How Do We Convert To 3NF?

SID SNAME City Status

4 Smith NYC 45

6 Liu DC 65

7 Chen NYC 45

9 Jones LA 22

SID SNAME City

4 Smith NYC

6 Liu DC

7 Chen NYC

9 Jones LA

City Status

DC 65

NYC 45

LA 22

[Dependency diagrams: SID determines SNAME and City; City determines Status]
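The conversion above can be checked by projecting the original rows onto the two smaller relations and joining them back on City. The sketch below is one way to do that in plain Python; the variable names are assumptions for illustration.

```python
# Rows copied from the original table: (SID, SNAME, City, Status).
supplier = [
    (4, "Smith", "NYC", 45),
    (6, "Liu", "DC", 65),
    (7, "Chen", "NYC", 45),
    (9, "Jones", "LA", 22),
]

# Project onto (SID, SNAME, City) and onto (City, Status).
supplier_city = sorted({(sid, sname, city) for sid, sname, city, _ in supplier})
city_status = {city: status for _, _, city, status in supplier}

# Joining the projections on City rebuilds the original rows.
rejoined = [(sid, sname, city, city_status[city])
            for sid, sname, city in supplier_city]

print(len(city_status))
print(rejoined == sorted(supplier))
```

After the split, the status of each city is stored once, in the City/Status relation, rather than once per supplier in that city.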


3NF and RR

If a relation is in 3NF, what about rules 1 and 2?


What is not permitted by 3NF?

3NF refines the notion of “aboutness” beyond the restrictions of 2NF


Fourth Normal Form


Multi-Valued Dependency

R.X is said to multi-value determine R.Y if each value of X is associated with a set of values of Y, and that set does not depend on the values of the other attributes of R.

For example, if a course has two textbooks, then there will be an MVD between the course number and the names of the books.


Fourth Normal Form

A relation is said to be in fourth normal form iff it is in third normal form and it does not have more than one multi-valued dependency.


Example of 4NF

Is this relation in 4NF?

SID Sport Instrument

87 Soccer Saxophone

87 Tennis Violin

87 Soccer Violin

87 Tennis Saxophone

[Dependency diagram for SPORT-INSTRUMENT: an MVD from SID to Sport and an MVD from SID to Instrument]


Converting to 4NF

SID Sport

87 Soccer

87 Tennis

SID Instrument

87 Saxophone

87 Violin

[Dependency diagrams: SPORT has an MVD from SID to Sport; INSTRUMENT has an MVD from SID to Instrument]
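A quick way to see that nothing is lost by this split is to join the two smaller relations on SID and compare the result with the original SPORT-INSTRUMENT rows. The sketch below does so in plain Python; the variable names are assumptions for illustration.

```python
from itertools import product

# Rows copied from the SPORT-INSTRUMENT table: (SID, Sport, Instrument).
sport_instrument = {
    (87, "Soccer", "Saxophone"),
    (87, "Tennis", "Violin"),
    (87, "Soccer", "Violin"),
    (87, "Tennis", "Saxophone"),
}

# The two projections shown in the conversion.
sport = {(sid, s) for sid, s, _ in sport_instrument}
instrument = {(sid, i) for sid, _, i in sport_instrument}

# Natural join of the two projections on SID.
rejoined = {(sid, s, i)
            for (sid, s), (sid2, i) in product(sport, instrument)
            if sid == sid2}

print(len(sport), len(instrument), len(sport_instrument))
print(rejoined == sport_instrument)
```

The combined table needs a row for every sport and instrument pairing per student, so each sport and each instrument is stored repeatedly; the two projections store each one once.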


What does 4NF not permit?

4NF does not permit multiple MVDs in a single relation


Boyce-Codd Normal Form

A relation is said to be in Boyce-Codd normal form iff every determinant is a candidate key.

BCNF deals with problems that can be caused by overlapping candidate keys.


Example of BCNF

S# SNAME P# QTY

1 Acme 65 788

2 Chen 34 76

3 Jones 65 34

How does this relation comply with Rule 1 and Rule 2?

[Dependency diagram over S#, SNAME, P# and QTY]

Are there any anomalies?


Converting to BCNF

S# P# QTY

1 65 788

2 34 76

3 65 34

S# SNAME

1 Acme

2 Chen

3 Jones

[Dependency diagrams: one over S# and SNAME; one over S#, P# and QTY]


What does BCNF not allow?

BCNF brings the restrictions on “aboutness” to candidate and composite keys


Roberts’s Rules and Normal Forms


Rule 1: One Entity Type Per Table

Each row must be about a single entity type

Can’t have information about two entity types

Think of FD. RR1 requires FD, does not allow transitive FD.


Rule 2: Each Fact Represented Once

What must happen for a single fact to be represented more than once?

Most likely, there is a transitive dependency

So RR2 seems to disallow transitive dependency


What About 4NF?

Lack of 4NF causes duplicate representation of facts.

Not permitted by RR2.


You will have the opportunity for more exploration of this relationship with your homework for next week.


Data Modeling

When we design a relational database, we search for a set of entity types that will model the problem of interest

If we choose a robust data model, it will last a long time without major changes, even though the programs that use it may change

Now that we have some idea what a good data model is, we will talk about how to design one.