40
11. Normalization Database Design is the process of finding a "good" logical structure for the database. So far, design has been maintaining well-defined relations and choosing good primary keys. Here is a review of some of the definitions: A CANDIDATE KEY is a set of attributes, K, from the schema which always satisfies UNIQUENESS: no 2 distinct tuples have same K-value, MINIMALITY: None of A i ..A k can be discarded and still maintain uniqueness. A PRIMARY KEY is one candidate key that is designated as the primary key (by a database administrator?). An ALTERNATE KEY is non-primary candidate key. A SUPERKEY is an attribute superset of a candidate key. NORMALIZATION is the part of Data Base design activities that precludes operational anomalies, and provide good key properties for projections. Normalization classes are (more exist): 1st Normal Form (1NF) 2nd Normal Form (2NF) 3rd Normal Form (3NF) BCNF Boyce Codd Normal Form (BCNF) 4th Normal Form (4NF) ... Section 11 # 1

11. Normalization

  • Upload
    cid

  • View
    61

  • Download
    1

Embed Size (px)

DESCRIPTION

Section 11 # 1. 11. Normalization. Database Design is the process of finding a "good" logical structure for the database. So far, design has been maintaining well-defined relations and choosing good primary keys. Here is a review of some of the definitions: - PowerPoint PPT Presentation

Citation preview

Page 1: 11. Normalization

11. NormalizationDatabase Design is the process of finding a "good" logical structure for the database. So far, design has been maintaining well-defined relations and choosing good primary keys. Here is a review of some of the definitions: A CANDIDATE KEY is a set of attributes, K, from the schema which always satisfies

UNIQUENESS: no 2 distinct tuples have same K-value,

MINIMALITY: None of Ai..Ak can be discarded and still maintain uniqueness.A PRIMARY KEY is one candidate key that is designated as the primary key (by a database

administrator?).An ALTERNATE KEY is non-primary candidate key.A SUPERKEY is an attribute superset of a candidate key.NORMALIZATION is the part of Data Base design activities that precludes operational

anomalies, and provide good key properties for projections.Normalization classes are (more exist):

1st Normal Form (1NF) 2nd Normal Form (2NF) 3rd Normal Form (3NF) BCNF Boyce Codd Normal Form (BCNF) 4th Normal Form (4NF) ...

Section 11 # 1

Page 2: 11. Normalization

I ask again: Who decides primary key? (and other design choices?)

• Ph D Part Number will be the primary key for, PARTS(P#, COLOR, WT, TIME-OF-ARRIVAL).• MG You're the expert.• Ph D Well, P# should be the primary key, because IT IS the lookup attribute!

. . . later

– Pointy-headed Dbexpert = Ph D

• The Database design expert?– NO! Not in isolation, anyway.– Someone from the enterprise who understands the

data and the procedures should be consulted.– The following story illustrates this point. CAST: – Mr. Goodwrench = MG (parts manager);

Section 11 # 2-3

• MG Why is lookup so slow?• Ph D You do store parts in the stock room ordered by P#, right?

• MG No. We store by weight! When a shipment comes in, I take each part into the back room and throw it. The lighter ones go further than the heavy ones so they get ordered by weight!

• Ph D But, but… weight doesn't have Uniqueness property! Parts with the same weight end up together in a pile!

• MG No they don't. I tire quickly, so the first one I throw goes furthest.• Ph D Then we’ll use a composite primary key, (weight, time-of-arrival).• MG We get our keys primarily from Murt’s Lock and Key.

Point: This conversation should have taken place during the 1st meeting.

Page 3: 11. Normalization

Fucntional DependenciesThe defining conditions get stronger going down this list and therefore the sets of qualifying

relations gets smaller going down this list.

By defining conditions 4NF BCNF 3NF 2NF 1NF ⇒ ⇒ ⇒ ⇒

By classes of relations 4NF BCNF 3NF 2NF 1NF ⊆ ⊆ ⊆ ⊆

Given a relation R with attributes X and Y (possibly composite)

R.Y is FUNCTIONALLY DEPENDENT (FD) on R.Xor equivalently, R.X FUNCTIONALLY DETERMINES R.Y written R.X → R.Y

iff (x,y1) and (x,y2) R[X,Y] implies y∈ 1=y2.

NOTES: All attributes are FD on any attribute with the uniqueness property. Functional Dependencies are like integrity constraints. They are stipulated to hold (all tuples for

all times) by designers.They can't be determined simply by observing the data state at a particular time. They are quite different from Association Rules. (ARs are approximate dependencies that hold at

a given confidence level over a given subset at a particular time).

Section 11 # 4

Page 4: 11. Normalization

Full Functional DependenciesFor composite attribute, X (composed of more than 1 attribute)

R.Y is FULLY FUNCTIONALLY DEPENDENT (FFD) on R.X or R.X FULLY FUNCTIONALLY DETERMINES R.Y

written R.X R.Y ⇒

iff R.X→R.Y but R.Z→R.Y for any proper subset Z X.⊂

Said another way; R.X is a candidate key (has uniqueness and minimality) for R[X,Y].

These are SEMANTIC notions. One needs to know how data is used or what is intended (ask people who created/use data!)

Section 11 # 5

Page 5: 11. Normalization

First Normal Form (1NF)NORMAL FORMS (initially assuming only one candidate key)

A file or relation is in First Normal Form (1NF) if there are no repeating groups (all attribute values are atomic). Allowing repeating groups in an attribute creates a situation in which the key does not determine a value in that attribute (but possibly many values).

We can use a memory technique, that 1NF means: (note: THIS IS A MEMORY TECHNIQUE, NOT A DEFINITION - i.e., don't use this as a definition on an exam) every nonkey attribute is functionally dependent on key.

This EDUCATION1 file is not in First Normal Form (1NF) S#|SNAME |CHILDREN|LCODE |LSTATUS|C#|CNAME|SITE |GR 32|THAISZ|Ed,Jo,Ty|NJ5102| 1 | 8| DSDE| ND |89 25|CLAY |Ann |NJ5101| 1 | 7| CUS | ND |68 32|THAISZ|Ed,Jo,Ty|NJ5102| 1 | 7| CUS | ND |91 25|CLAY |Ann |NJ5101| 1 | 6| 3UA | NJ |76 32|THAISZ|Ed,Jo,Ty|NJ5102| 1 | 6| 3UA | NJ |62

This EDUCATION2 file is in First Normal Form (1NF) S# |SNAME |LCODE |LSTATUS|C#|CNAME|SITE|GR 32|THAISZ|NJ5102| 1 | 8| DSDE| ND |89 25|CLAY |NJ5101| 1 | 7| CUS | ND |68 32|THAISZ|NJ5102| 1 | 7| CUS | ND |91 25|CLAY |NJ5101| 1 | 6| 3UA | NJ |76 32|THAISZ|NJ5102| 1 | 6| 3UA | NJ |62How do you get EDUCATION1 into 1NF? 1. Create a separate CHILDREN file: CHILD|S#

|Ed |32| |Jo |32| |Ty |32| |Ann |25|

Section 11 # 6

Page 6: 11. Normalization

Second Normal Form (2NF)A file or relation is in Second Normal Form (2NF) if it is in 1NF and every nonkey attribute is fully functionally dependent on the primary key.( memory

scheme: every nonkey attribute is functionally dependent on whole key).

Why do we need (want) relations to be in 2NF?

1NF relations which are not 2NF present PROBLEMS (anomalies), e.g.,

S# |SNAME |LCODE |LSTATUS|C#|CNAME|SITE|GR 32|THAISZ|NJ5102| 1 | 8| DSDE| ND |89 25|CLAY |NJ5101| 1 | 7| CUS | ND |68 32|THAISZ|NJ5102| 1 | 7| CUS | ND |91 25|CLAY |NJ5101| 1 | 6| 3UA | NJ |76 32|THAISZ|NJ5102| 1 | 6| 3UA | NJ |62

INSERT ANOMALY: Can't record JONES' LCODE until he takes a course.

DELETE ANOMALY: Delete 1st record (e.g., THAISZ drops DSDE), loose C#=8 is DSDE in ND

UPDATE ANOMALY: Change SITE of C#=6 from NJ to ND search sequentially for all C#=6.

Section 11 # 7

Page 7: 11. Normalization

Second Normal Form (cont.)E.g., In the EDUCATION1 file, with key (S#,C#), FD's which are not FFD are: (S#,C#) → SNAME (S#,C#) → LCODE (S#,C#) → LSTATUS (S#,C#) → CNAME (S#,C#) → SITE We make these FD's into FFD's by breaking (projecting) the file into 3 files. (puts relations in 2NF) STUDENTS = EDUCATION1[S#,SNAME,LCODE,LSTATUS] ENROLL = EDUCATION1[S#,C#,GRADE] COURSE = EDUCATION1[S#,CNAME,SITE]

STUDENTS ENROLL COURSE|S#|SNAME |LCODE |LSTATUS| |S#|C#|GRADE| |C#|CNAME|SITE||25|CLAY |NJ5101| 1 | |32|8 | 89 | |8 |DSDE |ND ||23|THAISZ|NJ5102| 1 | |32|7 | 91 | |7 |CUS |ND ||38|GOOD |FL6321| 4 | |25|7 | 68 | |6 |3UA |NJ ||17|BAID |NY2091| 3 | |25|6 | 76 | |5 |3UA |ND ||57|BROWN |NY2092| 3 | |32|6 | 62 |

STUDENTS is in 2NF since it has a single attribute as key (S#) COURSE is 2NF since it has a single attribute as key (C#) ENROLL is 2NF since GRADE is FFD on (S#,C#)

Still, we have problems: INSERT ANOMALY: Can't record LSTATUS of LOC=ND2987 until a student from ND2987 registers. DELETE ANOMALY: Deleting THAISZ, loose LCODE=NJ5102 has LSTATUS=1 UPDATE ANOMOLY: Change LSTATUS=3 to 2 and 4 to 3 must search STUDENTS sequentially. (so LSTATUS=2 isn't skipped!)

The problem is: we have a transitive dependency: S#→LCODE LSTATUS, i.e., an FD which doesn't involve the key (or even part of the key): LCODE→LSTATUS A more common transitive dependency is "CityState":

COURSE C# |CNAME|CTY|ST|8 |DSDE |Mot|ND||7 |CUS |Mot|ND||6 |3UA |Bay|NJ||5 |3UA |Mot|ND|

Delete 3rd record (cancel course C#=6) loose fact that Bay is in NJ, etc. Section 11 # 8

Page 8: 11. Normalization

Third Normal Form (3NF)A file or relation is in Third Normal Form (3NF) if it is in 2NF and every nonkey attr is non-transitively dependent on

primary key. (there are no transitive dependencies). (memory scheme: every nonkey attribute is functionally dependent on nothing but key)

We need to project STUDENT ≡ STUDENTS[S#, SNAME, LCODE] and LOCATION ≡ STUDENTS[ LCODE, LSTATUS ]

STUDENT LOCATION ENROLL COURSE S# |SNAME |LCODE LCODE |LSTATUS |S#|C#|GRADE |C#|CNAME|SITE |25|CLAY |NJ5101 |NJ5101| 1 |32|8 | 89 |8 |DSDE |ND |23|THAISZ|NJ5102 |NJ5102| 1 |32|7 | 91 |7 |CUS |ND |38|GOOD |FL6321 |FL6321| 4 |25|7 | 68 |6 |3UA |NJ |17|BAID |NY2091 |NY2091| 3 |25|6 | 76 |5 |3UA |ND |57|BROWN |NY2092 |NY2092| 3 |32|6 | 62

In summary, the memory scheme to remember 3NF: (not a definition! Just a memory scheme) every nonkey attribute is functionally dependent upon

the key 1NF the whole key and 2NF nothing but the key 3NF so help me Codd" (E.F. Codd invented relational model and normal forms)

This is an analogy based on the way in which witnesses are sworn into legal proceedings in the US: Do you swear to tell

the truth, the whole truth and nothing but the truth, so help you God?"

Section 11 # 9

Page 9: 11. Normalization

Boyce/Codd Normal Form (BCNF)A file or relation, with possibly multiple candidate keys, is in Boyce/Codd Normal Form (BCNF) if the only determinants

are the superkeys (of candidate keys).

Example of a 3NF relation which is not BCNF. ENROLL |S#|C#|GRADE|TUTOR| |32|8 | 89 |Xin | |32|7 | 91 |Sally||25|7 | 68 |Ahmed| |25|6 | 76 |Ben | |32|6 | 62 |Amit |

Primary key = (S#,C#) and each tutor is assigned to only 1 course. (Course to which a tutor is assigned is determined, so TUTOR → C# ) FDs: (S#,C#)→GRADE (S#,C#)→TUTOR (S#,C#)→(GRADE,TUTOR) TUTOR→C#

This isn't BCNF (since Tutor is not superkey), but it is in 3NF (Strictly speaking, since C# is not a non-key attribute).

Section 11 # 10

Page 10: 11. Normalization

Fourth Normal Form (4NF)A file or relation, with possibly multiple candidate keys, is in Fourth Normal Form (4NF) if it is in BCNF and all

Multivalue Dependencies (MVDs) are just FDs.

What are Multivalue Dependencies? For R(A,X,Y), where A,X,Y are distinct attributes (possibly composite)

R[A,X], R[A,Y] is a LOSSLESS decomposition of R iff R[A,X] JOINA R[A,Y] = R(A,X,Y)

A set of projections of a relation with at a one common attribute and such that every attribute is in at least one projection is called a DECOMPOSITION.

The join of a decomposition is always a superset of the original relation. Sometimes it is a proper superset (i.e., it includes SPURIOUS tuples that weren't in the original relation).

R is always a subset of R[A,B] JOINA R[A,C] PROOF: a,b,c R (a,b) R[A,B] and a,c R[A,C]. ∀ ∈ ∈ ∈

Thus, a,b,c R[A,B] JOIN∈ A R[A,C].

Heath's theorem says when the join is exactly the original relation (a lossless decomposition).

HEATH's THEOREM: Given R(A,B,C), if A→B or A→C then R = R[A,B] JOINA R[A,C] ie, if A→B or A→C then {R[A,B], R[A,C]} is a lossless decomposition Again: the join of a decomposition always contains the original relation but it may be larger (i.e., It may contain spurious tuples).

Why call it a loss when there are actually more tuples (extra spurious ones) and call it lossless when there is no gain in size? (next slide for answer).

Section 11 # 11

Page 11: 11. Normalization

Fourth Normal Form (4NF) cont.Proof of Heath's Theorem (Prove by contrapositive! i.e., P imples Q is true by showing NOT(Q) implies NOT(P) is

true.)

i.e., show NOT(R=R[A,B] JOINA R[A,C]) implies NOT(A→B or A→C ) = NOTA→B and NOTA→C (i.e., spurious tuples destroy at least one functional relationships)

NOT(R = R[A,B] JOINA R[A,C]) means (a,b,c) ( R[A,B] JOIN∃ ∈ A R[A,C] ) not in R (spurious)

(a,b,c) (R[A,B] JOIN∈ A R[A,C]) but (a,b,c) R implies (a,b) R[A,B] and ∉ ∈ (a,c) R[A,C] implies ∈

∃ (a,b,c'), (a,b',c) R such that c ∈ c' and b b' implies

R.A does not functionally determine R.B and R.A does not functionally determine R.C QED.

Does Heath's Theorem say: If R = R[A,B] JOINA R[A,C] then A→B (which would tell us that A→B implies A→C ) No! It is not an "if and only if". Counter example: R[A,B,C]: R[A,B]: R[A,C]: a b c a b a c a b' c a b'

R[A,B] JOINA R[A,C] = R. But A NOT→ B. a b c a b' c

So the FD A→B does not characterize lossless decomposition Is there a condition which does characterize lossless decomposition?

i.e., an IF AND ONLY IF (IFF) condition for lossless decomposition (next slide)

Section 11 # 12

Page 12: 11. Normalization

Fourth Normal Form (4NF) cont.Yes, it is "Multivalued Dependency" or MVD:

Given R(A,B,C), B is Multivalued Dependent on A (or A multi-determines B), written: A→→B,

iff the set of B-values matching a given (a,c) pair in R depends only on the A-value, a

(the same B-set is associated with a for all c's).

Another way: A→→B iff a R.A, R∀ ∈ R.A=a[B,C] (the projection onto B and C of the subscripted selection) is a product.

If RR.A=a[B,C] is a product then clearly the set of B-values matching any pair, (a,c) is just the projection of that product onto the B-attribute and therefore A→→B.

To prove A→→B implies RR.A=a[B,C] is a product, use the contrapositive arguement: If there is an A-value, a, such that RR.A=a[B,C] is not a product, R contains tuples:

a b1 c1 a b1 c2 a b2 c1 (but R does not contain a b2 c2 ).

But then, for c1, a→→b1,b2

but for c2, a→→b1 not the same set of B-values!

MVD is a symmetric condition: Theorem: Given R(A,B,C), A→→B iff A→→C

The proof follows directly from the previous result (the condition, RR.A=a[B,C] is a product and is symmetric in B and C).

Section 11 # 13

Page 13: 11. Normalization

Fourth Normal Form (4NF) cont.MVD is a symmetric condition: Theorem: Given R(A,B,C), A→→B iff A→→C

T

he proof follows directly from the previous result (the condition, RR.A=a[B,C] is a product and is symmetric in B and C).

Fagin's theorem: {R[A,B],R[A,C]} is a lossless decompostion of R(A,B,C) iff A→→B

To prove A→→B implies decomposition is nonloss, prove contrapositive: decomposition is lossy implies A NOT→→ B

If the decomposition is lossy, then there exists at least one (a,b) R[A,B] and (a,c) R[A,C] such that (a,b,c) R.∈ ∈ ∉Therefore, there is a b' b in B such that (a,b',c) is in R and c' c in C such that (a,b,c') is in R.

Therefore pairs (a,c) and (a,c') do not determine the same B-sets in R (true since, b is in the B-set determined by (a,c') while it is not in the B-set determined by (a,c) ).

To prove R=R[A,B] JOINA R[A,C] implies A→→B, prove the contrapositive: A NOT→→ B implies the decomposition is lossy.

A NOT→→ B means there are distinct pairs (a,c) and (a,c') R (and therefore in R[A,C]) which determine different B-sets ∈in R, say b is in the B-set determined by (a,c) in R (and therefore (a,b,c) is in R) but b is not in the B-set determined by (a,c') in R (and therefore (a,b,c') is not in R),

then (a,b,c') would not be in R. But since (a,b,c) is in R, (a,b) is in R[A,B] and (a,c') is in R[A,C]

so (a,b,c') is in R[A,B] JOINA R[A,C]

but not in R and the decomposition is lossy.

4th normal form (4NF)= BCNF and all MVDs are FDs (only dependencies are functional dependencies from superkeys).

Section 11 # 14

Page 14: 11. Normalization

The Normalization ProcessTHE OVERALL NORMALIZATION PROCESS:

0. Project off repeating groups (each as separate files with repeating group attribute as key and the original key as foreign key)

1. Take projections of 1NF to eliminate nonfull FDs - produce 2NF.

2. Take projections of 2NF to eliminate transitive FDs - produce 3NF.

3. Take projections of 3NF to eliminate remaining FDs where determinant is not candidate key - produce BCNF.

4. Take projections of BCNF to eliminate any MVDs not FDs - produce 4NF.

This discussion will stop with 4NF, however, there are 5NF, 6NF...18NF...

Normalization has become an art form.

CAUTION: It's not always completely clear what use will ever be made of some of these higher normal forms (that is; the anomolies being precluded are often somewhat involved and obscure).

Section 11 # 15

Page 15: 11. Normalization

Normalisation

The theory of Relational Database Design

These slides are an entirely separate treatment of normalisation (note the different spelling even!).

These come from the web site of the text. They are included for completeness and another point of view on the subject.

Section 11 # 16

Page 16: 11. Normalization

Introduction

• Normalisation is a theory for designing relational schema that “make sense” and work well.

• Well-normalised tables avoid redundancy and thereby reduce inconsistencies.

• Redundancy is unnecessary duplication.

• In well-normalised DBs semantic dependencies are maintained by primary key uniqueness.

Section 11 # 17

Page 17: 11. Normalization

Goals of Normalisation

• Eliminate certain kinds of redundancy

• avoid certain update anomalies

• good reresentation of real world

• simplify enforcement of DB integrity

Section 11 # 18

Page 18: 11. Normalization

Update anomalies• Undesirable side-effects that occur when

performaing insertion, modification or deletion operations on badly designed relational DBs.

SSN

987

654

333

321

678

467

Name

J Smith

M Burke

A Dolan

K Doyle

O O’Neill

R McKay

Dept

1

2

1

1

3

2

DeptMgr

321

467

321

321

678

467

Representing Department info in the Employee table causes problems.

Dept Name

...

Section 11 # 19

Page 19: 11. Normalization

Sample anomalies

• Modification -– when the manager of a dept changes we have to

change many values.– If we are not careful the DB will contain

inconsistencies.– There is no easy way to get the DB to ensure

that a department has only one manager and only one name.

Section 11 # 20

Page 20: 11. Normalization

Anomalies continued• Deletion -

– if O O’Neill leaves we delete his tuple and lose • the fact that there is a department 3

• the name of dept 3

• who is the manager of dept. 3

• Insertion– how would we create a new department before

any employees are assigned to it ?

Section 11 # 21

Page 21: 11. Normalization

Better design

• Separate entities are represented in separate tables.

SSN

987

654

333

321

678

467

Name

J Smith

M Burke

A Dolan

K Doyle

O O’Neill

R McKay

Dept

1

2

1

1

3

2

Dept

1

2

3

DeptMgr

321

467

678

Dept Name

...

Note that mapping from an ER model following the steps given will give a well-normalised DB.

Section 11 # 22

Page 22: 11. Normalization

Boyce-Codd Normal Form

• After a lot of other approaches Boyce and Codd noticed a simple rule for ensuring tables are well-normalised. Tables which obey the rule are in BCNF (Boyce Codd Normal Form).

• BCNF rule:

Every determinant in a table must be a candidate key for that table.

Section 11 # 23

Page 23: 11. Normalization

Determinants

• A is a determinant of B if each value of A has precisely one (possibly null) associated value of B.

Said another way -

• A is a determinant of B if and only if whenever two tuples agree on their A value they agree on their B value.

A B

Section 11 # 24

Page 24: 11. Normalization

Determinants

• Note that determinancy depends on semantics of data – cannot be decided from individual table

occurences.

• Alternative terminology– if A (functionally) determines B then– B is (functionally) dependent on A

Section 11 # 25

Page 25: 11. Normalization

Example determinants

• SSN determines employee name

• SSN determines employee department

• Dept. No. determines Dept. Name

• Dept. Name determines Dept. No.– assuming Dept. names are also unique

• Emp. Name does not determine Emp. Dept– two John Smiths could be in difft. Depts.

• Emp. Name does not determine SSN.

Section 11 # 26

Page 26: 11. Normalization

Determinancy Diagram

SSN

Name

Department Dept. Name

Dept. Mgr

In general key attributes of an entity determine all the single-valued attributes of the entity.

Section 11 # 27

Page 27: 11. Normalization

Composite Determinants• (SSN, Project#) together determine

the hours that the employee works on the project.

• Suppose packsize of a part depends on the supplier.

SSN

Project#

hours

PName

Name

S#

P#

packsize

PName

Section 11 # 28

Page 28: 11. Normalization

Superfluous Attrbiutes

• Superfluous attributes– If SSN determines name, so does (SSN, Dept)

and (SSN, Dept, salary), etc.– Always remove superfluous attributes from

determinants.

Section 11 # 29

Page 29: 11. Normalization

Transitive Dependencies

• SSN actually determines DeptMgr

• but only because – SSN determines DeptNo and– DeptNo determines DeptMgr.

• Be careful to remove transitive dependencies.– They mess up normalisation.

SSN

DeptNo

Dept. Mgr

Section 11 # 30

Page 30: 11. Normalization

Candidate keys• candidate key = any attribute or set of

attributes which will be unique for a table (set of attributes).– As well as the primary key there may be other

candidate keys.– E.g. DNUMBER and DNAME are both

candidate keys for the Department table.

• Key = row identifier• Candidate key = candidate identifier

Section 11 # 31

Page 31: 11. Normalization

Finding candidate keys

• Every key is by definition a determinant of all other attributes in a relation.– So in a diagram, any attribute (or composite) from

which all other attributes are reachable is a candidate key.

SSN

Project#

hours

PName

Name

(SSN, Project#) is a (composite) candidate

key for a table containing these five

attributes.

Section 11 # 32

Page 32: 11. Normalization

What are the candidate keys ?

DE

F

G H J

K

L

M

N

P Q R

S

T U

student

subject

teacher

V

W

X

Y

AZ

C

B

B

DF

E

GH

Section 11 # 33

Page 33: 11. Normalization

Problems occur when ...

• Redundancy and anomalies occur when there are determinants which are not candidate keys.

SSN Name

DeptNo Dept. Name

Dept. Mgr

• SSN is the only key for a table containing these attributes– all attributes are reachable from SSN.

• SSN, DeptNo and DeptName are determinants– they have arrows coming out of them.

Section 11 # 34

Page 34: 11. Normalization

BCNF rule

• In well-normalised relations (Boyce-Codd normal form)

every determinant is a candidate key.

SSN Name

DeptNo

Dept. Name

Dept. Mgr

DeptNo

The employee/dept table decomposed to BCNF.

Note that both DeptNo and DeptName are candidate keys of the second table.

Section 11 # 35

Page 35: 11. Normalization

Transformation to BCNF• Create new tables such that each

non-key determinant is a candidate key in a new table.

• The new table contains the attributes which are directly determined by the new candidate key.

V

W

X

Y

AZ

C

B

V X

V

WY

V

W

AZAC

B

BCNF tables :

(V, X)

(A, B, C)

(V, W, Z, A)

(V, W, Y)

Section 11 # 36

Page 36: 11. Normalization

Other Normal Forms

• First NF - no multi-valued attributes– all relational DBs are 1NF

• 2NF - every non-key attribute is fully dependent on the primary key

• 3NF - eliminate functional dependencies between non-key attributes

– all dependencies can then be enforced by uniqueness of keys.

G H J

Table is in 2NF but not 3NF

Section 11 # 37

Page 37: 11. Normalization

BCNF vs. 3NF• BCNF goes further than 3NF, some say too far.

• A 3NF table that has no overlapping composite keys is in BCNF.

student

subject

teacher

3NF, not BCNF

keys: (student, subject)

(student, teacher)

teacher is a determinant

student teacher

subjectteacher

BCNF

but tables are not independent

A teacher teaches only one subject.For a given subject a given student has only one teacher.

Section 11 # 38

Page 38: 11. Normalization

4NF : Multi-valued dependencies

• If a course can have multiple teachers and multiple texts, blind mapping to 1NF will give

SubjectPhysicsPhysicsPhysicsPhysicsMathsMathsMaths

TeacherGreenBrownGreenBrownGreenGreenGreen

TextBasic MechanisBasic MechanicsPrinciples of OpticsPrinciples of OpticsBasic MechanicsVector AnalysisTrigonometry

which clearly has redundancy.

Section 11 # 39

Page 39: 11. Normalization

Fully-normalised

• BCNF relations are well-normalised

• Fully-normalised relations are those with no multi-valued dependencies (4NF) and no join dependencies (5NF).

Section 11 # 40

Page 40: 11. Normalization

Thank you.

Section 11