Handout

Handout - Tarleton State University


Combining schemas

Problems: redundancy, hard to update, possible NULLs


Problems?

Conclusion: Whether the join attribute is a PK or not makes a great difference when combining schemas!


Splitting schemas, a.k.a. decomposition (reverse the arrows below)

Functional dependency: loan_number → amount

Coincidence or not? (And why it matters …)


An even worse decomposition: lossy!

Why do we say lossy when in fact we end up with more data?
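The question above can be explored with a small experiment. The sketch below uses a made-up `employee` relation (the attribute names and tuples are assumptions, not taken from the handout), splits it on a non-key attribute, and joins the pieces back together.

```python
# Hypothetical data: two different employees who happen to share a name.
employee = {
    (1, "Kim", "Main St", 75000),
    (2, "Kim", "North St", 67000),
}

# Split so that the only shared attribute is name, which is not a key.
r1 = {(emp_id, name) for emp_id, name, street, salary in employee}
r2 = {(name, street, salary) for emp_id, name, street, salary in employee}

# Natural join of r1 and r2 on name.
rejoined = {
    (emp_id, name, street, salary)
    for emp_id, name in r1
    for name2, street, salary in r2
    if name == name2
}

# Tuples that appear after the join but were never in the original relation.
spurious = rejoined - employee
print(len(employee), len(rejoined), len(spurious))
```

The join returns every original tuple plus spurious ones, so we can no longer tell which employee lives where. More tuples, less information: that is the sense in which the decomposition is lossy.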


7.3 Decomposition using FDs


FD algebra


Example:

--------------------------------------------------------------------------------------------------


The most useful normal form:


loan = (loan_number, amount)

borrower = (customer_id, loan_number)

Find the set of all (non-trivial) FDs for the relation bor_loan


Another example:

Is this schema in BCNF?


In bor_loan, the violating FD is loan_number → amount, so we set α = loan_number and β = amount.

Why not simply say R – β?


Another example:

It was found earlier that this schema is not in BCNF. The violating FD is B → C.

Apply the BCNF decomposition algorithm!


Is this relation in BCNF?

If no, decompose it!


To do for next time: Rework all the BCNF examples!

----------------------------------------------------------------------------------------------


BCNF and preservation of dependencies

E-R design from Ch.6: a customer can have at most 1 personal banker.

A customer can have more than 1 personal banker, but at most one at any given branch. (?)


A ternary relationship-set is needed:

Implementation:

R = cust_banker_branch = (customer_id, employee_id, branch_name, type)

FDs: FD1: employee_id → branch_name

FD2: (customer_id, branch_name) → (employee_id, type)


Is cust_banker_branch in BCNF?

No.

Apply the decomposition algorithm!

Decomposition: R1 = (employee_id, branch_name)

R2 = (customer_id, employee_id, type)

Problem: FD2 is now “spread” across two relations!
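The BCNF test and the decomposition step above can be sketched in a few lines. The helper names (`closure`, `violates_bcnf`, `decompose`) are illustrative choices, not from the textbook; the test uses the usual criterion that a non-trivial FD violates BCNF when its left-hand side is not a superkey.

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

R = {"customer_id", "employee_id", "branch_name", "type"}
FD1 = ({"employee_id"}, {"branch_name"})
FD2 = ({"customer_id", "branch_name"}, {"employee_id", "type"})
F = [FD1, FD2]

def violates_bcnf(fd, schema, fds):
    """A non-trivial FD violates BCNF if its left-hand side is not a superkey."""
    lhs, rhs = fd
    return not rhs <= lhs and closure(lhs, fds) != schema

def decompose(schema, fd):
    """Split schema on the violating FD lhs -> rhs."""
    lhs, rhs = fd
    return lhs | rhs, schema - (rhs - lhs)

print(violates_bcnf(FD1, R, F), violates_bcnf(FD2, R, F))
R1, R2 = decompose(R, FD1)
print(sorted(R1), sorted(R2))

# FD2 mentions attributes that no longer live together in one relation.
fd2_attrs = FD2[0] | FD2[1]
print(fd2_attrs <= R1 or fd2_attrs <= R2)
```

Only FD1 is a violation, and splitting on it yields the R1 and R2 shown above. The last check confirms that neither relation contains all the attributes of FD2, which is why FD2 can no longer be checked without a join.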


Conclusion: BCNF decomposition is not always dependency preserving

R = cust_banker_branch = (customer_id, employee_id, branch_name, type)

FDs: FD1: employee_id → branch_name

FD2: (customer_id, branch_name) → (employee_id, type)

Extra-credit: What if we started BCNF decomposition with FD2 instead of FD1?

Time: 2’


Because it is not always possible to achieve both BCNF and dependency preservation, we consider a weaker NF, known as third normal form (3NF).

Show that cust_banker_branch is in 3NF.

R = cust_banker_branch = (customer_id, employee_id, branch_name, type)

FDs: FD1: employee_id → branch_name

FD2: (customer_id, branch_name) → (employee_id, type)
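One way to double-check a hand proof is to run the 3NF condition mechanically. The sketch below finds candidate keys by brute force and then tests each given FD: it must be trivial, or have a superkey on the left, or add only attributes that belong to some candidate key. The function names are illustrative, and brute force is an assumption that only suits small schemas like this one.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of the attribute set attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal superkeys, found by trying attribute sets from smallest to largest."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            is_superkey = closure(attrs, fds) == schema
            if is_superkey and not any(k <= attrs for k in keys):
                keys.append(frozenset(attrs))
    return keys

def is_3nf(schema, fds):
    prime = set().union(*candidate_keys(schema, fds))  # attributes in some key
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        superkey = closure(lhs, fds) == schema
        if not (trivial or superkey or (rhs - lhs) <= prime):
            return False
    return True

R = {"customer_id", "employee_id", "branch_name", "type"}
F = [
    ({"employee_id"}, {"branch_name"}),
    ({"customer_id", "branch_name"}, {"employee_id", "type"}),
]
print([sorted(k) for k in candidate_keys(R, F)])
print(is_3nf(R, F))
```

FD1 has no superkey on its left-hand side, but branch_name is part of the candidate key (customer_id, branch_name), so the third alternative applies and the 3NF condition holds.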


Whatever happened to 2NF?

In a nutshell, it forbids attributes to depend on parts of keys.

See the Wikipedia article "Second normal form" for more details.

Another BCNF/3NF example:

books (B-Name, Ed, A-Name, A-SSN, Nr-pag)

A_Name A_SSN

Is it in

BCNF?

3NF?

To do for next time: Rework all the BCNF & 3NF examples!

-----------------------------------------------------------------------------------------------------


Higher NFs

Consider this relation:

classes (course, teacher, book )

(c, t, b) ∈ classes means that t is qualified to teach c, and b is a required textbook for c.

What are the FDs for this relation?

Is it in BCNF?

Is it in 3NF?


We still have redundancies and insertion anomalies – e.g., if Marilyn is a new teacher who can teach database, two tuples need to be inserted:

(database, Marilyn, DB Concepts)

(database, Marilyn, Ullman)


The big picture


7.4 FD Theory

7.4.1 The Closure of a set of FDs

Yes, this is a trivial FD!


Algorithm to compute F+


Although Armstrong’s axioms are sufficient to obtain the closure …

… in practice we want more “tools”

How about these?

• Idempotency: X ∪ X = X

• Commutativity: X ∪ Y = Y ∪ X

They are true, but it is customary to write all attributes as sets with no repeating values and sorted in alphabetical order.


Important lemma:

if and only if

Proof: Left as individual work for next time. Use the definition of a FD from p.271:


Practice exercise 7.4


Example:

Quiz: Generate 4 more FDs that are in F+


7.4.2 The Closure of a set of attributes (under the set of FDs)

Compare to the inefficient algorithm, based on F+ …
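The attribute-closure computation can be sketched as a simple fixpoint loop. The function name and the string encoding of attribute sets are illustrative choices, and the FD set below is an assumed textbook-style example, since the handout's own example is not reproduced in this text.

```python
def attribute_closure(attrs, fds):
    """Closure of the attribute set attrs under fds.

    fds is a list of (lhs, rhs) pairs; each side is a string of
    single-letter attribute names, e.g. ("CG", "H") for CG -> H.
    """
    result = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Assumed example: R = (A, B, C, G, H, I)
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
ag_plus = attribute_closure("AG", F)
print("".join(sorted(ag_plus)))  # ABCGHI
```

Each pass scans the given FDs only, so F+ is never materialized; the loop ends after at most as many passes as there are attributes to add.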

For next time: Read and understand the example on p.281

--------------------------------------------------------------------------------------------------------


Applications of attribute closure:

• Check if a set of attributes is a superkey

• Check if a set of attributes is a candidate key (i.e. superkey + minimal)

• Check if a functional dependency α → β holds (i.e. if it is in F+)

o Find α+ and then check if β ⊆ α+

• Compute the closure F+ of F

o For each set of attributes γ ⊆ R, find the closure γ+, and for each S ⊆ γ+ output a functional dependency γ → S

Attribute closure gives another algorithm to find the FD closure F+! Compare it with the first algorithm from fig. 7.8. Which one do you think is more efficient? Explain!
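Each application in the list above reduces to one or more closure computations. The sketch below shows all four on a small assumed schema R = (A, B, C) with F = {A → B, B → C}; the helper names are illustrative.

```python
from itertools import chain, combinations

def attribute_closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) strings of attribute letters."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def subsets(attrs):
    """All subsets of attrs, including the empty set, as tuples."""
    items = sorted(attrs)
    return chain.from_iterable(combinations(items, n) for n in range(len(items) + 1))

def is_superkey(attrs, schema, fds):
    return attribute_closure(attrs, fds) == set(schema)

def is_candidate_key(attrs, schema, fds):
    # Minimal: dropping any single attribute must break the superkey property.
    attrs = set(attrs)
    return is_superkey(attrs, schema, fds) and not any(
        is_superkey(attrs - {a}, schema, fds) for a in attrs
    )

def fd_holds(lhs, rhs, fds):
    return set(rhs) <= attribute_closure(lhs, fds)

def fd_closure(schema, fds):
    """F+ as a set of (lhs, rhs) frozenset pairs, trivial FDs included."""
    result = set()
    for gamma in subsets(schema):
        for s in subsets(attribute_closure(gamma, fds)):
            result.add((frozenset(gamma), frozenset(s)))
    return result

R = "ABC"
F = [("A", "B"), ("B", "C")]
print(is_superkey("AB", R, F), is_candidate_key("AB", R, F))
print(fd_holds("A", "C", F), fd_holds("C", "A", F))
print(len(fd_closure(R, F)))
```

Note that `fd_closure` enumerates every subset of R and then every subset of each closure, so its output grows exponentially with the number of attributes. That is worth keeping in mind when comparing the two ways of computing F+.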


Example:


In general, a FD is of the form α → β, with α and β sets of attributes, e.g. EFG → KL.

Food for thought:

Can α be the empty set? (“nothing” → β)

Can α be the total set? (“everything” → β)

Can β be the empty set? (α → “nothing”)

Can β be the total set? (α → “everything”)


Extraneous attributes

In English:

If we remove the attribute, the closure F+ does not change

Why is this of practical importance?

This part is trivial, so it doesn’t need to be checked (it was included just for symmetry)


Examples:

Given F = {A → C, AB → C}

B is extraneous in AB → C because AB → C can be derived from A → C (How?)

As seen in this example, sometimes removal of extraneous attributes makes an entire FD disappear (because it’s a duplicate)

Given F = {A → C, AB → CD}

C is extraneous in AB → CD since AB → C can be derived even after deleting C (How?)
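Both examples can be checked with attribute closure. The sketch below follows the usual closure-based tests: an attribute on the left is extraneous if the remaining left-hand attributes still determine the right-hand side under F, and an attribute on the right is extraneous if it can still be derived after it is deleted from that FD. The function names are illustrative.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) strings of attribute letters."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def extraneous_in_lhs(attr, fd, fds):
    """Is attr extraneous on the left of fd? Uses the closure of (lhs - attr) under F."""
    lhs, rhs = fd
    reduced = set(lhs) - {attr}
    return set(rhs) <= attribute_closure(reduced, fds)

def extraneous_in_rhs(attr, fd, fds):
    """Is attr extraneous on the right of fd? Uses F with attr removed from fd."""
    lhs, rhs = fd
    reduced_fd = (lhs, "".join(sorted(set(rhs) - {attr})))
    others = [f for f in fds if f != fd]
    return attr in attribute_closure(lhs, others + [reduced_fd])

F1 = [("A", "C"), ("AB", "C")]
print(extraneous_in_lhs("B", ("AB", "C"), F1))

F2 = [("A", "C"), ("AB", "CD")]
print(extraneous_in_rhs("C", ("AB", "CD"), F2))
print(extraneous_in_rhs("D", ("AB", "CD"), F2))
```

In the first case A+ already contains C, so B adds nothing. In the second case AB+ under {A → C, AB → D} still contains C, whereas D cannot be recovered once it is deleted.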


Algorithm:

Exercise

Add to the list of applications of attribute closure!



Answer:

-------------------------------------------------------------------------------------

Exercise


Answer:

A+ = {A, B, C, D}, so A+ contains C, so C is extraneous in ACD

Exercise

Same scenario as above.

Is D extraneous in ACD?

Exercise

F = {A → B, B → C, A → C}.

Is C extraneous in A → C?

So what do we do about A → C?

For next time: solve all the exercises above, plus the one on p.283!


Why is this of practical importance?

Algorithm:

?


Solve for practice!

Example not from text:


Solution:


Two things must be preserved when we perform decompositions:

Data (tuples)

FDs


Efficient algorithm (uses only attribute closure, not FD closure!)

How much of Ri can we recover, based on the current result?


Example (not in text, but in text slides):

----------------------------------------------------------------------------------

Apply the algorithm above to prove this!

Trivial, don’t need algorithm!


Solution:

Prove that the decomposition R1 = (A, B), R2 = (A, C) is not dependency preserving.

The FD that needs to be recovered is B → C. Apply algorithm:

result = {B}

Consider R1: result ∩ R1 = {B}; {B}+ = {B, C}; {B, C} ∩ R1 = {B}; result ∪ {B} = {B}

Consider R2: result ∩ R2 = Ø; Ø+ = Ø; … result = {B}

No progress, so algorithm stops.

We could not obtain the RHS of B→C, so FD cannot be recovered.
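The trace above can be reproduced with a short sketch of the closure-only preservation test. It assumes R = (A, B, C) with F = {A → B, B → C}; that FD set is inferred from the trace (where {B}+ = {B, C}) and is not stated in this text. The function names are illustrative.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) strings of attribute letters."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_preserved(fd, decomposition, fds):
    """Can fd = (lhs, rhs) be checked using only the decomposed relations?"""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:  # stop when a full pass over all Ri makes no progress
        changed = False
        for ri in decomposition:
            ri = set(ri)
            # What part of Ri can be recovered from the current result?
            gained = attribute_closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result

F = [("A", "B"), ("B", "C")]  # assumed FD set
print(is_preserved(("B", "C"), ["AB", "AC"], F))
print(is_preserved(("A", "B"), ["AB", "AC"], F))
print(is_preserved(("B", "C"), ["AB", "BC"], F))
```

For B → C with R1 = (A, B) and R2 = (A, C) the result never grows beyond {B}, matching the hand computation, while the alternative split (A, B), (B, C) recovers C in one pass.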


Week 12, Lect 1

7.5 Decomposition using FDs

Problem: The definitions of both BCNF and 3NF require F+ → expensive!


FYI there is a sketched proof for this on p.289 (not required for final)


Can you find super-keys?

Intuitively, we can “feel” that AC → BDE … but how to prove it?

Hint: Armstrong’s axioms (and theorems)

So AC is a super-key. But is it a candidate key? (What’s the difference?)


Do you think there are other candidate keys? Why or why not?

Are there any BCNF violations?

Hint: To find BCNF violations, do we need to check F or F+? Why?

Which one do we choose to start the decomposition?


Now write down the two relations resulting from decomposition, including their FDs F1 and F2 and their candidate keys:


SKIP the remainder of Section 7.5, starting with 7.5.1.2. (p.289)

SKIP 7.6, 7.7

Read and take notes: Sections 7.8, 7.9

Homework for Ch.7: 1, 3, 5, 6, 7, 11