Normalization

Sridhar Narayan
narayans@uncw.edu

EMP_PROJ

SSN  PNUMBER  HOURS  ENAME  PNAME       PLOC
E1   P1       20     Joe    CIS Roof    UNCW
E1   P2       20     Joe    Restaurant  Mayfaire
E2   P1       40     Joe    CIS Roof    UNCW

• Something feels wrong about this design
• Try adding a row – Insertion anomaly
• Try deleting a row – Deletion anomaly
• Try updating a row – Update anomaly

• Need a formal way to reason about what is wrong with it and how to fix it
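
The three anomalies can be made concrete on the EMP_PROJ rows. The following is a minimal Python sketch, not part of the original slides: the list-of-dicts representation, the replacement location "Downtown", and the new project "P3" / "Library" are all made-up values for illustration. It also assumes that (SSN, PNUMBER) is the key of EMP_PROJ.

```python
# EMP_PROJ rows from the slide, one dict per tuple.
COLUMNS = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOC"]
emp_proj = [
    dict(zip(COLUMNS, row))
    for row in [
        ("E1", "P1", 20, "Joe", "CIS Roof", "UNCW"),
        ("E1", "P2", 20, "Joe", "Restaurant", "Mayfaire"),
        ("E2", "P1", 40, "Joe", "CIS Roof", "UNCW"),
    ]
]

# Deletion anomaly: E1's assignment to P2 is the only row mentioning P2,
# so deleting it also loses the project's name and location.
after_delete = [
    r for r in emp_proj if (r["SSN"], r["PNUMBER"]) != ("E1", "P2")
]
projects_left = {r["PNUMBER"] for r in after_delete}

# Update anomaly: P1's location is stored in two rows. Changing only one
# of them ("Downtown" is a made-up value) leaves two locations for P1.
after_update = [dict(r) for r in emp_proj]
after_update[0]["PLOC"] = "Downtown"
p1_locations = {r["PLOC"] for r in after_update if r["PNUMBER"] == "P1"}

# Insertion anomaly: a project with no employees yet has no SSN, but SSN
# is part of the assumed key (SSN, PNUMBER), so the row cannot be stored.
new_project = {"SSN": None, "PNUMBER": "P3", "HOURS": None,
               "ENAME": None, "PNAME": "Library", "PLOC": "UNCW"}
insert_allowed = new_project["SSN"] is not None

print(projects_left)         # {'P1'}
print(sorted(p1_locations))  # ['Downtown', 'UNCW']
print(insert_allowed)        # False
```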

Functional Dependency

• Constraints between attribute sets in a relation

• If X and Y are sets of attributes of a relation R, and whenever two tuples in R have the same X-values they also have the same Y-values, we say that X functionally determines Y.

Functional Dependency

• Written as X -> Y
  – X functionally determines Y
  – Y is functionally determined by X
  – X is the determinant, Y is the dependent

• Examples
  – SSN -> SSN (trivial dependency)
  – PNUMBER -> PNAME
  – SSN -> ENAME
  – SSN, PNUMBER -> HOURS

Functional Dependency

• Between sets of attributes, not just single attributes

• Holds for all time, not just for a particular instance (snapshot) of a relation

• Formally states constraints that exist for the relation
  – These constraints are in addition to those imposed by primary keys and foreign keys
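
The definition of X -> Y translates directly into a check over the tuples of one instance. The sketch below is an illustration only; the function name fd_holds and the list-of-dicts representation are assumptions. Because a functional dependency must hold for all time, a check on one snapshot can refute a dependency but cannot establish it, which the last call is meant to show.

```python
# EMP_PROJ instance from the slides, one dict per tuple.
COLUMNS = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOC"]
EMP_PROJ = [
    dict(zip(COLUMNS, row))
    for row in [
        ("E1", "P1", 20, "Joe", "CIS Roof", "UNCW"),
        ("E1", "P2", 20, "Joe", "Restaurant", "Mayfaire"),
        ("E2", "P1", 40, "Joe", "CIS Roof", "UNCW"),
    ]
]


def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns
        # the stored value on later occurrences of the same x.
        if seen.setdefault(x, y) != y:
            return False
    return True


print(fd_holds(EMP_PROJ, ["SSN"], ["ENAME"]))             # True
print(fd_holds(EMP_PROJ, ["PNUMBER"], ["PNAME"]))         # True
print(fd_holds(EMP_PROJ, ["SSN", "PNUMBER"], ["HOURS"]))  # True
print(fd_holds(EMP_PROJ, ["ENAME"], ["SSN"]))             # False

# Satisfied by this snapshot only by coincidence: nothing says that two
# employees can never work the same number of hours.
print(fd_holds(EMP_PROJ, ["HOURS"], ["SSN"]))             # True
```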

Functional dependencies and keys

• If X functionally determines all attributes of R, then X is a super key

• If X is also irreducible, i.e. every member of X is essential for the functional dependencies to hold, then X is a candidate key.

• Attributes that are a part of a candidate key are key attributes

Examples

Super key:
  – SSN, PNUMBER, PNAME -> SSN, PNUMBER, HOURS, ENAME, PNAME, PLOC

Candidate key:
  – SSN, PNUMBER -> SSN, PNUMBER, HOURS, ENAME, PNAME, PLOC

SSN  PNUMBER  HOURS  ENAME  PNAME       PLOC
E1   P1       20     Joe    CIS Roof    UNCW
E1   P2       20     Joe    Restaurant  Mayfaire
E2   P1       40     Joe    CIS Roof    UNCW
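
Whether a set of attributes is a super key or a candidate key can be tested by computing its attribute closure under the given functional dependencies. The sketch below is one common way to do this and is not taken from the slides; the function names are assumptions, and the dependency list is the one the deck gives for EMP_PROJ (SSN -> ENAME; PNUMBER -> PNAME, PLOC; SSN, PNUMBER -> HOURS).

```python
ALL_ATTRS = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOC"}

# Each dependency is a (determinant, dependent) pair of attribute sets.
FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOC"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, fds, all_attrs):
    return closure(attrs, fds) == all_attrs


def is_candidate_key(attrs, fds, all_attrs):
    """A super key from which no single attribute can be removed."""
    attrs = set(attrs)
    return is_superkey(attrs, fds, all_attrs) and not any(
        is_superkey(attrs - {a}, fds, all_attrs) for a in attrs
    )


print(is_superkey({"SSN", "PNUMBER", "PNAME"}, FDS, ALL_ATTRS))       # True
print(is_candidate_key({"SSN", "PNUMBER", "PNAME"}, FDS, ALL_ATTRS))  # False
print(is_candidate_key({"SSN", "PNUMBER"}, FDS, ALL_ATTRS))           # True
```

Removing one attribute at a time is enough for the irreducibility test, because any superset of a super key is itself a super key.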

Redundancy

• If in a relation R, A -> B and A is not a candidate key for R, then R will involve some redundancy.

SSN PNUMBER HOURS ENAME PNAME PLOC

Intuitively, all functional dependencies in a relation should involve candidate keys to eliminate redundancy

Normalization

• A process that uses functional dependencies to identify relation schemas that have an undesirable form (redundancy) and decomposes them into smaller schemas in which the redundancy has been eliminated.

Decomposition

• Decomposition should be
  – Lossless join
    • Allow exact recovery of the original relation (without spurious tuples)
  – Dependency preserving
    • Allow dependencies to be checked without requiring a join

Lossy decomposition

SSN  PNUMBER  HOURS  ENAME
E1   P1       20     Joe
E1   P2       20     Joe
E2   P1       40     Joe

ENAME  PNAME       PLOC
Joe    CIS Roof    UNCW
Joe    Restaurant  Mayfaire
Joe    CIS Roof    UNCW

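Natural join to recover original

SSN  PNUMBER  HOURS  ENAME  PNAME       PLOC
E1   P1       20     Joe    CIS Roof    UNCW
E1   P1       20     Joe    Restaurant  Mayfaire   (spurious)
E1   P2       20     Joe    CIS Roof    UNCW       (spurious)
E1   P2       20     Joe    Restaurant  Mayfaire
E2   P1       40     Joe    CIS Roof    UNCW
E2   P1       40     Joe    Restaurant  Mayfaire   (spurious)

• The join matches on ENAME alone, so every employee row pairs with every project row; the rows marked spurious were never in EMP_PROJ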
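
The lossy decomposition can be reproduced in a few lines. The sketch below is an illustration under assumptions: relations are lists of dicts, and project and natural_join are small hand-written helpers (with duplicate elimination), not library functions. Joining the two projections on their only shared attribute, ENAME, returns rows that were never in EMP_PROJ, among them (E2, P1, 40, Joe, Restaurant, Mayfaire).

```python
COLUMNS = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOC"]
EMP_PROJ = [
    dict(zip(COLUMNS, row))
    for row in [
        ("E1", "P1", 20, "Joe", "CIS Roof", "UNCW"),
        ("E1", "P2", 20, "Joe", "Restaurant", "Mayfaire"),
        ("E2", "P1", 40, "Joe", "CIS Roof", "UNCW"),
    ]
]


def project(rows, attrs):
    """Projection onto attrs, dropping duplicate tuples."""
    out = []
    for row in rows:
        p = {a: row[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


def natural_join(r, s):
    """Pair up tuples that agree on every attribute they share."""
    out = []
    for a in r:
        for b in s:
            if all(a[c] == b[c] for c in a.keys() & b.keys()):
                merged = {**a, **b}
                if merged not in out:
                    out.append(merged)
    return out


r1 = project(EMP_PROJ, ["SSN", "PNUMBER", "HOURS", "ENAME"])
r2 = project(EMP_PROJ, ["ENAME", "PNAME", "PLOC"])
joined = natural_join(r1, r2)
spurious = [t for t in joined if t not in EMP_PROJ]

print(len(joined))    # 6
print(len(spurious))  # 3
for t in spurious:
    print(t)
```

Every original tuple is still present in the join; the information lost is which of the joined tuples are the real ones.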

Heath’s Theorem

• If relation R = {A, B, C}, where A, B, C are attribute sets,
• and A -> B,
• then R1 = {A, B} and R2 = {A, C} represent a lossless decomposition
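
Heath's theorem can be tried on EMP_PROJ with A = {PNUMBER}, B = {PNAME, PLOC} and C = {SSN, HOURS, ENAME}, using the dependency PNUMBER -> PNAME, PLOC from the deck. The sketch below is an illustration only; heath_split, project and natural_join are hypothetical helper names, and checking one instance demonstrates the theorem rather than proving it.

```python
COLUMNS = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOC"]
EMP_PROJ = [
    dict(zip(COLUMNS, row))
    for row in [
        ("E1", "P1", 20, "Joe", "CIS Roof", "UNCW"),
        ("E1", "P2", 20, "Joe", "Restaurant", "Mayfaire"),
        ("E2", "P1", 40, "Joe", "CIS Roof", "UNCW"),
    ]
]


def project(rows, attrs):
    """Projection onto attrs, dropping duplicate tuples."""
    out = []
    for row in rows:
        p = {a: row[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


def natural_join(r, s):
    """Pair up tuples that agree on every attribute they share."""
    out = []
    for a in r:
        for b in s:
            if all(a[c] == b[c] for c in a.keys() & b.keys()):
                merged = {**a, **b}
                if merged not in out:
                    out.append(merged)
    return out


def heath_split(rows, a, b):
    """Return the projections on A+B and A+C, where C is every
    attribute that is in neither A nor B."""
    c = [col for col in rows[0] if col not in a and col not in b]
    return project(rows, a + b), project(rows, a + c)


r1, r2 = heath_split(EMP_PROJ, ["PNUMBER"], ["PNAME", "PLOC"])
rejoined = natural_join(r1, r2)
same = (all(t in EMP_PROJ for t in rejoined)
        and all(t in rejoined for t in EMP_PROJ))
print(same)  # True
```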

Levels of normalization

• First normal form – 1NF
• Second normal form – 2NF
• Third normal form – 3NF
• Boyce-Codd Normal Form – BCNF

Increasingly stringent requirements

Normal Forms

[Figure: the normal forms 1NF, 2NF, 3NF, and BCNF]

First normal form

• Relation is in 1NF if all attribute values are atomic (By definition, all relations are in 1NF)

D_NAME    D_NUM  MGR_SSN    D_LOCATIONS
RESEARCH  5      334619276  {Lumberton, Red Springs, Raeford}

• Assume that a department can have multiple locations, like {Lumberton, Red Springs, Raeford}
• Relation not in 1NF

Resolution?

D_NAME    D_NUM  MGR_SSN    D_LOCATIONS
RESEARCH  5      334619276  Lumberton
RESEARCH  5      334619276  Red Springs
RESEARCH  5      334619276  Raeford

Decomposition

D_NAME D_NUM MGR_SSN D_LOCATIONS

D_NAME D_NUM MGR_SSN
D_NUM D_LOCATIONS
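
Both ways of removing the multi-valued D_LOCATIONS attribute can be sketched in Python. This is an illustration only: the department is modelled as a dict whose D_LOCATIONS value is a list, and the function names flatten and split_locations are assumptions.

```python
DEPARTMENT = [
    {"D_NAME": "RESEARCH", "D_NUM": 5, "MGR_SSN": "334619276",
     "D_LOCATIONS": ["Lumberton", "Red Springs", "Raeford"]},
]


def flatten(rows, multi):
    """One output row per value of the multi-valued attribute.
    The other attributes are repeated in every row."""
    return [{**row, multi: value} for row in rows for value in row[multi]]


def split_locations(rows):
    """Keep single-valued attributes in one relation and move the
    locations to a second relation keyed by D_NUM."""
    dept = [{k: v for k, v in row.items() if k != "D_LOCATIONS"}
            for row in rows]
    locations = [{"D_NUM": row["D_NUM"], "D_LOCATIONS": loc}
                 for row in rows for loc in row["D_LOCATIONS"]]
    return dept, locations


flat = flatten(DEPARTMENT, "D_LOCATIONS")
dept, locations = split_locations(DEPARTMENT)

print(len(flat))       # 3 rows, department data repeated in each
print(len(dept))       # 1 row, no repetition
print(len(locations))  # 3 rows of (D_NUM, D_LOCATIONS)
```

The flattened form is in 1NF but repeats D_NAME and MGR_SSN once per location; the two-relation form stores them once.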

Second Normal Form: 2NF

• A relation is in 2NF if
  – It is in 1NF, and
  – The non-key attributes are fully (irreducibly) dependent on the primary key

Example: EMP_PROJ

SSN PNUMBER HOURS ENAME PNAME PLOC

• Functional Dependencies?
  – SSN -> ENAME
  – PNUMBER -> PNAME, PLOC
  – {SSN, PNUMBER} -> HOURS

• Relation not in 2NF
  – Non-key attributes ENAME, PNAME, and PLOC are not fully dependent on the primary key
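
The 2NF violation can be found mechanically by looking for dependencies whose determinant is a proper subset of the key. The sketch below is a simplified illustration: the function name partial_dependencies is an assumption, it works with a single key, and it inspects only the dependencies as listed rather than everything they imply.

```python
KEY = {"SSN", "PNUMBER"}

# Each dependency is a (determinant, dependent) pair of attribute sets.
FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOC"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]


def partial_dependencies(key, fds):
    """Dependencies in which part of the key determines non-key
    attributes, which is what 2NF forbids."""
    found = []
    for lhs, rhs in fds:
        non_key = rhs - key
        # lhs < key is True only for a proper subset of the key.
        if lhs < key and non_key:
            found.append((lhs, non_key))
    return found


for lhs, rhs in partial_dependencies(KEY, FDS):
    print(sorted(lhs), "->", sorted(rhs))
# ['SSN'] -> ['ENAME']
# ['PNUMBER'] -> ['PLOC', 'PNAME']
```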

Solution? Decompose

1a  SSN PNUMBER HOURS              2NF
1b  SSN PNUMBER ENAME PNAME PLOC   2NF ?

Decompose further…

2a  SSN ENAME                2NF
2b  SSN PNUMBER PNAME PLOC   2NF ?

And a little more…

3a  PNUMBER PNAME PLOC   2NF
3b  SSN PNUMBER          3b is a part of 1a, so drop it.

2NF Normalization

1a  SSN PNUMBER HOURS    2NF
2a  SSN ENAME            2NF
3a  PNUMBER PNAME PLOC   2NF
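
The same three relations can be derived in one pass from the functional dependencies. The sketch below is a simplified illustration, not a general normalization algorithm: the function name decompose_2nf is an assumption, it handles a single key, and it uses only the dependencies as listed. Each partial dependency becomes its own relation, and whatever remains stays with the full key.

```python
ALL_ATTRS = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOC"}
KEY = {"SSN", "PNUMBER"}
FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOC"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]


def decompose_2nf(all_attrs, key, fds):
    """Split off one relation per partial dependency on the key."""
    key = frozenset(key)
    moved = set()
    relations = []
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        dependents = set(rhs) - key
        if lhs < key and dependents:
            # Part of the key determines these attributes, so they
            # move to a relation keyed by that part.
            relations.append(lhs | dependents)
            moved |= dependents
    # The key and every attribute that was not moved stay together.
    relations.append(frozenset(all_attrs) - moved)
    return relations


for relation in decompose_2nf(ALL_ATTRS, KEY, FDS):
    print(sorted(relation))
# ['ENAME', 'SSN']
# ['PLOC', 'PNAME', 'PNUMBER']
# ['HOURS', 'PNUMBER', 'SSN']
```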

More than one way to get here

SSN PNUMBER HOURS ENAME PNAME PLOC

1a  PNUMBER PNAME PLOC        2NF
1b  SSN PNUMBER HOURS ENAME   Not 2NF

Decompose further…

2a  SSN PNUMBER HOURS   2NF
2b  SSN PNUMBER ENAME   Not 2NF

And a little bit more

3a  SSN PNUMBER   Redundant (contained in 2a)
3b  SSN ENAME     2NF

3NF Normalization

• A relation is in 3NF if
  – It is in 2NF, and
  – The non-key attributes are mutually independent. That is, no functional dependencies exist between non-key attributes.

Example: EMP_DEPT

• Functional Dependencies?
  – SSN -> {ENAME, DOB, ADDRESS, DNUM}
  – DNUM -> {DNAME, DMGRSSN}

• Redundancy?
• Relation in 1NF ?
• 2NF ?
• 3NF ?

SSN ENAME DOB ADDRESS DNUM DNAME DMGRSSN
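
The 3NF question for EMP_DEPT can be answered by searching for dependencies between non-key attributes. The sketch below is an illustration under assumptions: SSN is taken to be the only key, the function name nonkey_dependencies is hypothetical, and only the two listed dependencies are inspected.

```python
KEY = {"SSN"}
FDS = [
    ({"SSN"}, {"ENAME", "DOB", "ADDRESS", "DNUM"}),
    ({"DNUM"}, {"DNAME", "DMGRSSN"}),
]


def nonkey_dependencies(key, fds):
    """Dependencies whose determinant contains no key attribute and
    whose dependents include non-key attributes."""
    found = []
    for lhs, rhs in fds:
        dependents = rhs - key - lhs
        if lhs.isdisjoint(key) and dependents:
            found.append((lhs, dependents))
    return found


violations = nonkey_dependencies(KEY, FDS)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))
# ['DNUM'] -> ['DMGRSSN', 'DNAME']

# After splitting off (DNUM, DNAME, DMGRSSN), each relation is checked
# against its own key and its own dependencies.
print(nonkey_dependencies({"SSN"}, FDS[:1]))   # []
print(nonkey_dependencies({"DNUM"}, FDS[1:]))  # []
```

The key is a single attribute, so no partial dependency is possible and the relation is in 2NF; the dependency DNUM -> DNAME, DMGRSSN between non-key attributes is what keeps it out of 3NF.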

3NF Normalization

1a  SSN ENAME DOB ADDRESS DNUM
1b  DNUM DNAME DMGRSSN

BCNF Normalization

• S# and SNAME – Supplier# and Supplier Name are unique
• FDs
  – S# -> SNAME
  – SNAME -> S#
  – S#, P# -> QTY
  – SNAME, P# -> QTY
• Candidate keys
  – {S#, P#} and {SNAME, P#}

S#  SNAME        P#  QTY
S1  Acme Supply  P1  100
S2  Gem Mfg      P1  200
S1  Acme Supply  P2  400

BCNF Normalization

• Redundancy?
• 1NF?
• 2NF?
• 3NF?

S#  SNAME        P#  QTY
S1  Acme Supply  P1  100
S2  Gem Mfg      P1  200
S1  Acme Supply  P2  400

BCNF

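• A relation is in BCNF if and only if the only determinants are candidate keys (more precisely, the determinant of every nontrivial functional dependency is a super key)

• FDs
  – S# -> SNAME
  – SNAME -> S#
  – S#, P# -> QTY
  – SNAME, P# -> QTY

• S# and SNAME are determinants but not candidate keys, so the relation is not in BCNF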
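
The BCNF condition can be tested with attribute closures: a nontrivial dependency violates BCNF when the closure of its determinant is not the whole relation. The sketch below is an illustration only; the function names and the relation names SP, SUPPLIER and SHIPMENT are assumptions, and it checks the listed dependencies rather than every dependency they imply.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Nontrivial dependencies whose determinant is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attrs
    ]


SP = {"S#", "SNAME", "P#", "QTY"}
SP_FDS = [
    ({"S#"}, {"SNAME"}),
    ({"SNAME"}, {"S#"}),
    ({"S#", "P#"}, {"QTY"}),
    ({"SNAME", "P#"}, {"QTY"}),
]
print(bcnf_violations(SP, SP_FDS))
# [({'S#'}, {'SNAME'}), ({'SNAME'}, {'S#'})]

# Splitting into (S#, SNAME) and (S#, P#, QTY) removes both violations.
SUPPLIER = {"S#", "SNAME"}
SUPPLIER_FDS = [({"S#"}, {"SNAME"}), ({"SNAME"}, {"S#"})]
SHIPMENT = {"S#", "P#", "QTY"}
SHIPMENT_FDS = [({"S#", "P#"}, {"QTY"})]
print(bcnf_violations(SUPPLIER, SUPPLIER_FDS))  # []
print(bcnf_violations(SHIPMENT, SHIPMENT_FDS))  # []
```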

BCNF Normalization

S#  P#  QTY
S1  P1  100
S2  P1  200
S1  P2  400

S#  SNAME
S1  Acme Supply
S2  Gem Mfg

Two candidate keys:
• S#
• SNAME