26
M.P. Johnson, DBMS, Stern/NYU , Sp2004 1 Agenda Last time: Normalization Homework 1 due now Project part 2 is up, due on the 19 th (Thurs.) This time: 1. Finish BCNF 2. 3NF 3. 4NF 4. Relational Algebra…

Agenda

Embed Size (px)

DESCRIPTION

Agenda. Last time: Normalization Homework 1 due now Project part 2 is up, due on the 19 th (Thurs.) This time: Finish BCNF 3NF 4NF Relational Algebra…. BCNF Review. Q: What’s required for BCNF? Q: What’s the slogan for BCNF? Q: Who are B & C? - PowerPoint PPT Presentation

Citation preview

M.P. Johnson, DBMS, Stern/NYU, Sp2004

1

Agenda Last time: Normalization Homework 1 due now Project part 2 is up, due on the 19th (Thurs.) This time:

1. Finish BCNF

2. 3NF

3. 4NF

4. Relational Algebra…

M.P. Johnson, DBMS, Stern/NYU, Sp2004

2

BCNF Review Q: What’s required for BCNF?

Q: What’s the slogan for BCNF?

Q: Who are B & C?

Q: What are the two types of violations?

M.P. Johnson, DBMS, Stern/NYU, Sp2004

3

BCNF Review Q: How do we fix a non-BCNF relation?

Q: If AsBs violates BCNF, what do we do? Q: In this case, could the decomposition be lossy?

Q: Under what circumstances could a decomposition be lossy?

Q: How do we combine two relations?

M.P. Johnson, DBMS, Stern/NYU, Sp2004

4

Decomposition algorithm example R(N,O,R,P) F = {N O, O R, R N}

Key: N,P Violations of BCNF: N O, OR, N OR

which kinds of violations are these? Pick N OR (on board) Can we rejoin? (on board) What happens if we pick N O instead? Can we rejoin? (on board)

Name Office Residence Phone

George Pres. WH 202-…

George Pres. WH 486-…

Dick VP NO 202-…

Dick VP NO 307-…

M.P. Johnson, DBMS, Stern/NYU, Sp2004

5

Lossless BCNF decomposition Consider simple relation: R(A,B,C) Only FD: A B (assume C!A) Key: A,C

Diff vars from text! Also goes through if assumption is false BCNF violation (which kind?): no key on the left Thus: Decomposition to BCNF: Create R1(A,B) and R2(A,C) Could this be lossy? We will join R1 and R2 on A to find out

Q: If C A, then what kind do we have?

Q: Since C ! A, what kind of bad FD do we have?

M.P. Johnson, DBMS, Stern/NYU, Sp2004

6

Lossless BCNF decomposition Suppose R contains (b,a,c) and (b’,a,c’) In projection onto (B,A):

(b,a,c) (b,a), (b’,a,c’) (b’,a) In projection onto (A,C):

(b,a,c) (a,c), (b’,a,c’) (a,c’) In joining, (b’,a), (a,c) (b’,a,c) Q: Is/must/can this be correct? A: Yes! A B, so b = b’ So this was lossless We assumed C!A, but argument also goes

through when CA Moral: BCNF decomp alg is always lossless

M.P. Johnson, DBMS, Stern/NYU, Sp2004

7

BCNF summary BCNF decomposition is lossless

Can reproduce original by joining Saw last time: Every 2-attribute relation is in

BCNF Final set of decomposed relations might be

different depending on Order of bad FDs chosen

Saw last time: But all results will be in BCNF

M.P. Johnson, DBMS, Stern/NYU, Sp2004

8

A problem with BCNF Relation: R(Title, Theater, Neighboorhood) FDs:

Title,N’hood Theater Assume movie can’t play twice in same neighborhood

Theater N’hood Keys:

{Title, N’hood} {Theater, Title}

Title Theater N’hood

City of God Angelica Village

Fog of War Angelica Village

M.P. Johnson, DBMS, Stern/NYU, Sp2004

9

A problem with BCNF BCNF violation: Theater N’hood Decompose:

{Theater, N’Hood} {Theater, Title}

Resulting relations:

VillageAngelica

N’hoodTheater

R1

Fog of WarAngelica

City of GodAngelica

TitleTheater

R2

M.P. Johnson, DBMS, Stern/NYU, Sp2004

10

Problem - continued Suppose we add new rows to R1 and R2:

Their join:

City of GodVillageFilm Forum

Village

Village

N’hood

Fog of War

City of God

Title

Angelica

Angelica

Theater

(R’)

Theater N’hood

Angelica Village

Film Forum Village

Theater Title

Angelica City of God

Angelica Fog of War

Film Forum City of God

R1 R2

A and B could not enforce FD Title,N’hood Theater

M.P. Johnson, DBMS, Stern/NYU, Sp2004

11

Third normal form: motivation There are some situations in which

BCNF is not dependency-preserving, and Efficient checking for FD violation on updates is

important

In these cases BCNF is too severe a req. Solution: define a weaker normal form, called

Third Normal Form in which FDs can be checked on individual relations

without performing a join (no inter-relational FDs) to which relations can be converted, preserving both

data and FDs

M.P. Johnson, DBMS, Stern/NYU, Sp2004

12

Third Normal Form BCNF decomposition is not dependency-preserving! We now define the (weaker) Third Normal Form

Turns out: this example was already in 3NF

A relation R is in 3rd normal form if :

For every nontrivial dependency A1, A2, ..., An Bfor R, {A1, A2, ..., An } is a super-key for R, or B is part of a key, i.e., B is prime

A relation R is in 3rd normal form if :

For every nontrivial dependency A1, A2, ..., An Bfor R, {A1, A2, ..., An } is a super-key for R, or B is part of a key, i.e., B is prime

Tradeoff:BCNF = no FD anomalies, but may lose some FDs3NF = keeps all FDs, but may have some anomalies

M.P. Johnson, DBMS, Stern/NYU, Sp2004

13

BCNF: vices and virtues Be clear on the problem just described v. the

arg. that BCNF decomp is lossless BCNF decomp does not lose data

Resulting relations can be rejoined to obtain the original

But: it can can lose dependencies After decomp, possible to add rows whose

corresponding rows would be illegal in (rejoined) original

M.P. Johnson, DBMS, Stern/NYU, Sp2004

14

Recap: goals of normalization When we decompose a relation R with FDs F into

R1..Rn we want:

1. lossless-join decomposition – no data lost

2. no/little redundancy: the relations Ri should be in either BCNF or at least 3NF

3. Dependency preservation: if Fi be the set of dependencies in F+ that include only attributes in Ri:

F is the “sum” of the FDs of the new relations (F1 F2 F3 … Fn)+ = F+

Otherwise checking updates for violation of FDs may require computing joins, which is expensive

M.P. Johnson, DBMS, Stern/NYU, Sp2004

15

Dependency preservation Saw that last req. didn’t hold in move-theater

example Did it hold in R(N,O,R,P) example?

(on board)

M.P. Johnson, DBMS, Stern/NYU, Sp2004

16

Testing for 3NF For each dependency X Y, use attribute closure

to check if X is a superkey If X is not a superkey, verify that each attribute in Y

is prime This test is rather more expensive, since it involves finding

candidate keys Testing for 3NF is NP-complete (in what?) Interestingly, decomposition into 3NF can be done in

polynomial time Testing for 3NF is harder than decomposing into 3NF!

Optimization: need to check only FDs in F, need not check all FDs in F+ (why?)

M.P. Johnson, DBMS, Stern/NYU, Sp2004

17

3NF Example R = (J, K, L) F = (JK L, L K) Two candidate keys: JK and JL R is in 3NF

JK L JK is a superkey L K K is prime

BCNF decomposition yields R1 = (L,K), R2 = (L,J)

testing for JK L requires a join There is some redundancy in R

M.P. Johnson, DBMS, Stern/NYU, Sp2004

18

BCNF and 3NF Comparison Example of problems due to redundancy in 3NF

R = (J, K, L) F = (JK L, L K)

A schema that is in 3NF but not BCNF has the problems of: redundancy (e.g., the relationship between l1 and k1) need to use null values (if allowed!), e.g. to represent the

relationship between l2 and k2 when there is no corresponding value for attribute J

J K L

j1 k1 l1

j2 k1 l1

j3 k1 l1

NULL k2 l2

M.P. Johnson, DBMS, Stern/NYU, Sp2004

19

Comparison of BCNF and 3NF It is always possible to decompose a relation

into relations in 3NF such that: the decomposition is lossless the dependencies are preserved

It is always possible to decompose a relation into relations in BCNF such that: the decomposition is lossless but it may not be possible to preserve

dependencies But may eliminate more redundancy

M.P. Johnson, DBMS, Stern/NYU, Sp2004

20

The Normal Forms (so far) 1NF: every attribute has an atomic value 2NF: 1NF and no partial dependencies 3NF: for each FD X Y either

it is trivial, or X is a superkey, or Y is a part of some key

BCNF: 3NF and third 3NF option disallowed I.e, 2NF and no transitive dependencies

M.P. Johnson, DBMS, Stern/NYU, Sp2004

21

Distinguishing examples 1NF but not 2NF: R(Name, SSN ,Mailing-

address,Phone) Key: SSN,Phone Partial: ssn name, address

2NF but not 3NF: R(Title,Year,Studio,Pres,Pres-Addr) Key: Title,Year Transitive: studio president

3NF but not BCNF: R(Title, Theater, N’hood) Title,N’hood Theater Prime-on-right: Theater N’hood

M.P. Johnson, DBMS, Stern/NYU, Sp2004

22

Design Goals Goal for a relational database design is:

No redundancy Lossless Join Dependency Preservation

If we cannot achieve this, we accept one of dependency loss use of more expensive inter-relational methods to preserve

dependencies data redundancy due to use of 3NF

Interesting: SQL does not provide a direct way of specifying FDs other than superkeys can specify FDs using assertions, but they are expensive to test

M.P. Johnson, DBMS, Stern/NYU, Sp2004

23

3NF 3NF means we may have anomalies Example: TEACH(student, teacher, subject)

student, subject teacher (students not allowed in the same subject with two teachers)

teacher subject (each teacher teaches one subject) Subject is prime, so this is 3NF

But we have anomalies: Insertion: cannot insert a teacher until we have a

student taking his subject If we convert to BCNF, we lost student,

subject teacher

M.P. Johnson, DBMS, Stern/NYU, Sp2004

24

BCNF and over-normalization What is the problem? Schema overload – trying to capture two meanings:

1) subject X can be taught by teacher Y 2) student Z takes subject W from teacher V

What to do? 3NF has anomalies, normalizing to BCNF loses FDs One soln: keep the 3NF TEACH and another

(BCNF) relation SUBJECT-TAUGHT (teacher, subject)

Still (more!) redundancy, but no more insert and delete anomalies

M.P. Johnson, DBMS, Stern/NYU, Sp2004

25

New topic: MVDs (3.7) Consider this relation

People ~ their jobs ~ their residences Person-address/city: many-many Person-job: many-many Address/city-job: independent

Chappaqua333 Some StreetFirst Lady456Hilary

Washington444 Embassy RowFirst Lady456Hilary

New York111 East 60th StreetCEO123Michael

London222 Brompton RoadCEO123Michael

444 Embassy Row

333 Some Street

444 Embassy Row

333 Some Street

222 Brompton Road

111 East 60th Street

Streets

Lawyer

Lawyer

Senator

Senator

Mayor

Mayor

Jobs

Washington456Hilary

Chappaqua789Hilary

Washington789Hilary

Chappaqua456Hilary

London123Michael

New York123Michael

CitysSSNName

M.P. Johnson, DBMS, Stern/NYU, Sp2004

26

Redundancy in BCNF

Lots of redundancy! Key? All fields

None determined by others! Non-trivial FDs? None! In BCNF? Yes!

Name Streets Citys Jobs

Michael 111 East 60th Street New York Mayor

Michael 222 Brompton Road London Mayor

Michael 111 East 60th Street New York CEO

Michael 222 Brompton Road London CEO

Hilary 333 Some Street Chappaqua Senator

Hilary 444 Embassy Row Washington Senator

Hilary 333 Some Street Chappaqua First Lady

Hilary 444 Embassy Row Washington First Lady

Hilary 333 Some Street Chappaqua Lawyer

Hilary 444 Embassy Row Washington Lawyer

Now what? New concept, leading

to another normal form: Multivalued

dependencies