Handout
Combining schemas
Problems: redundancy, hard to update, possible NULLs
Problems?
Conclusion: Whether the join attribute is PK or not makes a great difference when
combining schemas!
Splitting schemas, a.k.a. decomposition (reverse the arrows below)
Functional dependency: loan_number → amount
Coincidence or not? (And why it matters …)
An even worse decomposition: lossy!
Why do we say lossy when in
fact we end up with more data?
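A minimal sketch of why "more tuples" still means "lost information". The relation, attribute names and data below are hypothetical (they are not from the handout): two different employees share a name, and the decomposition keeps only the name as the common attribute.

```python
# Hypothetical relation employee(id, name, street): two people share a name.
employee = {
    (1, "Kim", "Main St"),
    (2, "Kim", "North Ave"),
}

# Decompose into (id, name) and (name, street); the only shared attribute is name,
# which is not a key of either part.
emp1 = {(i, n) for (i, n, s) in employee}
emp2 = {(n, s) for (i, n, s) in employee}

# Natural join of the two parts on name.
rejoined = {(i, n, s) for (i, n) in emp1 for (n2, s) in emp2 if n == n2}

# Tuples that appear after the join but were never in the original relation.
spurious = rejoined - employee

print(len(employee), len(rejoined))
print(sorted(spurious))
```

The join returns every original tuple plus spurious ones, so we can no longer tell which Kim lives on which street: that fact is what has been lost.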
7.3 Decomposition using FDs
FD algebra
Example:
--------------------------------------------------------------------------------------------------
The most useful normal form:
loan = (loan_number, amount)
borrower = (customer_id, loan_number)
Find the set of all (non-trivial) FDs for the relation bor_loan
Another example:
Is this schema in BCNF?
In bor_loan, the violating FD is loan_number → amount, so we set
Why not simply say R – β?
Another example:
It was found earlier that this schema is not in BCNF. The violating FD is B → C.
Apply the BCNF decomposition algorithm!
Is this relation in BCNF?
If no, decompose it!
To do for next time: Rework all the BCNF examples!
----------------------------------------------------------------------------------------------
BCNF and preservation of dependencies
E-R design from Ch.6: a customer can have at most 1 personal banker.
Now: a customer can have more than 1 personal banker, but at most one at any given branch. (?)
A ternary relationship-set is needed:
Implementation:
R = cust_banker_branch = (customer_id, employee_id, branch_name, type)
FDs: FD1: employee_id → branch_name
FD2: (customer_id, branch_name) → (employee_id, type)
Is cust_banker_branch in BCNF?
No.
Apply the decomposition algorithm!
Decomposition: R1 = (employee_id, branch_name)
R2 = (customer_id, employee_id, type)
Problem: FD2 is now “spread” across two relations!
Conclusion: BCNF is not dependency preserving
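The decomposition step above can be sketched in code. This is only an illustration under stated assumptions: the helper names (closure, bcnf_violation, split) are made up for this sketch, and checking only the FDs listed in F is enough for the original relation R but not, in general, for the relations produced by the decomposition.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violation(schema, fds):
    """Return the first FD in fds that is non-trivial and whose LHS is not a superkey."""
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        superkey = closure(lhs, fds) >= schema
        if not trivial and not superkey:
            return lhs, rhs
    return None


def split(schema, fd):
    """One BCNF decomposition step: (lhs ∪ rhs) and (schema − (rhs − lhs))."""
    lhs, rhs = fd
    return lhs | rhs, schema - (rhs - lhs)


R = frozenset({"customer_id", "employee_id", "branch_name", "type"})
FD1 = (frozenset({"employee_id"}), frozenset({"branch_name"}))
FD2 = (frozenset({"customer_id", "branch_name"}), frozenset({"employee_id", "type"}))
F = [FD1, FD2]

violation = bcnf_violation(R, F)
R1, R2 = split(R, violation)
print(sorted(R1), sorted(R2))

# FD2 mentions attributes that no single resulting relation contains.
fd2_attrs = FD2[0] | FD2[1]
fd2_fits = any(fd2_attrs <= ri for ri in (R1, R2))
print(fd2_fits)
```

The last check only shows that FD2 cannot be tested inside R1 or R2 alone; it is not a full dependency-preservation test.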
R = cust_banker_branch = (customer_id, employee_id, branch_name, type)
FDs: FD1: employee_id → branch_name
FD2: (customer_id, branch_name) → (employee_id, type)
Extra-credit: What if we started BCNF decomposition with FD2 instead of FD1?
Time: 2’
Because it is not always possible to achieve both BCNF and dependency preservation, we
consider a weaker NF, known as third normal form (3NF).
Show that cust_banker_branch is in 3NF
R = cust_banker_branch = (customer_id, employee_id, branch_name, type)
FDs: FD1: employee_id → branch_name
FD2: (customer_id, branch_name) → (employee_id, type)
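One way to check a 3NF claim mechanically is sketched below. It assumes the usual test: every FD in F is trivial, or its left side is a superkey, or every attribute on the right side (minus the left side) belongs to some candidate key. The function names are hypothetical, and the brute-force key search is only meant for schemas as small as this one.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute force: try attribute sets by increasing size, keep the minimal superkeys."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(k <= attrs for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) >= schema:
                keys.append(attrs)
    return keys


def is_3nf(schema, fds):
    """Check the 3NF condition on each FD in fds."""
    prime = set().union(*candidate_keys(schema, fds))  # attributes in some candidate key
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD
        if closure(lhs, fds) >= schema:
            continue  # LHS is a superkey
        if not (rhs - lhs) <= prime:
            return False
    return True


R = frozenset({"customer_id", "employee_id", "branch_name", "type"})
F = [
    (frozenset({"employee_id"}), frozenset({"branch_name"})),
    (frozenset({"customer_id", "branch_name"}), frozenset({"employee_id", "type"})),
]

keys = candidate_keys(R, F)
print([sorted(k) for k in keys])
print(is_3nf(R, F))
```

FD1 fails the superkey test, but branch_name belongs to the candidate key (customer_id, branch_name), which is exactly the escape clause that 3NF adds to BCNF.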
Whatever happened to 2NF?
In a nutshell, it forbids attributes to depend on parts of keys.
See Second normal form - Wikipedia, the free encyclopedia for more details.
Another BCNF/3NF example:
books (B-Name, Ed, A-Name, A-SSN, Nr-pag)
A-Name → A-SSN
Is it in
BCNF?
3NF?
To do for next time: Rework all the BCNF & 3NF examples!
-----------------------------------------------------------------------------------------------------
Higher NFs
Consider this relation:
classes (course, teacher, book )
Here (c, t, b) ∈ classes means that t is qualified to teach c, and b is a required textbook
for c.
What are the FDs for this relation?
Is it in BCNF?
Is it in 3NF?
We still have redundancies and insertion anomalies – e.g., if Marilyn is a new teacher
who can teach database, two tuples need to be inserted:
(database, Marilyn, DB Concepts)
(database, Marilyn, Ullman)
The big picture
7.4 FD Theory
7.4.1 The Closure of a set of FDs
Yes, this is a
trivial FD!
Algorithm to compute F+
Although Armstrong’s axioms are sufficient to obtain the closure …
… in practice we want more “tools”
How about these?
• Idempotency: XX → X
• Commutativity: XY → YX
They are true, but it is customary to write all attributes as sets w/no repeating values and
sorted in alphabetical order.
Important lemma:
if and only if
Proof: Left as individual work for next time. Use the definition of a FD from
p.271:
Practice exercise 7.4
Example:
Quiz: Generate 4 more FDs that are in F+
7.4.2 The Closure of a set of attributes (under the set of FDs)
Compare to the inefficient algorithm, based on F+ …
For next time: Read and understand the example on p.281
--------------------------------------------------------------------------------------------------------
Applications of attribute closure:
Check if a set of attributes is superkey
Check if a set of attributes is candidate key (i.e. superkey + minimal)
Check if a functional dependency α → β holds (i.e. if α → β is in F+)
o Find α+ and then check if β ⊆ α+
Computing closure F+ of F
o For each set of attributes γ ⊆ R, find the closure γ+, and for each S ⊆ γ+
output a functional dependency γ → S
Attribute closure gives another algorithm to
find the FD closure F+! Compare it with the
first alg. from fig. 7.8. Which one do you
think is more efficient? Explain!
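The first three applications listed above can be sketched as follows. The schema R = (A, B, C, G, H, I), the FD set F and all function names are assumptions chosen for illustration; attributes are single letters so that a string such as "CG" stands for the set {C, G}.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """attrs is a superkey if its closure contains every attribute of the schema."""
    return closure(attrs, fds) >= set(schema)


def is_candidate_key(attrs, schema, fds):
    """Superkey + minimal: dropping any single attribute must break the superkey test."""
    attrs = set(attrs)
    if not is_superkey(attrs, schema, fds):
        return False
    return not any(is_superkey(attrs - {a}, schema, fds) for a in attrs)


def fd_holds(lhs, rhs, fds):
    """lhs → rhs is in F+ exactly when rhs ⊆ lhs+."""
    return set(rhs) <= closure(lhs, fds)


def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attributes."""
    return frozenset(lhs), frozenset(rhs)


R = "ABCGHI"
F = [fd("A", "B"), fd("A", "C"), fd("CG", "H"), fd("CG", "I"), fd("B", "H")]

print(sorted(closure("AG", F)))
print(is_superkey("AG", R, F), is_candidate_key("AG", R, F))
print(fd_holds("A", "H", F), fd_holds("G", "A", F))
```

Testing minimality by removing one attribute at a time is enough because any superset of a superkey is again a superkey.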
Example:
In general, a FD is of the form α → β, with α and β sets of attributes, e.g.
EFG → KL.
Food for thought:
Can α be the empty set? (“nothing” → β)
Can α be the total set? (“everything” → β)
Can β be the empty set? (α → “nothing”)
Can β be the total set? (α → “everything”)
Extraneous attributes
In English:
If we remove the attribute, the closure F+ does not change
Why is this of practical importance?
This part is trivial, so it
doesn’t need to be
checked (it was included
just for symmetry)
Examples:
Given F = {A → C, AB → C}
B is extraneous in AB → C because {AB → C} can be derived
from A → C (How?)
As seen in this example, sometimes removal of extraneous attributes
makes an entire FD disappear (b/c it’s a duplicate)
Given F = {A → C, AB → CD}
C is extraneous in AB → CD since AB → C can be derived even
after deleting C (How?)
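Both examples can be checked with attribute closure. The sketch below assumes the standard tests: an attribute is extraneous on the left side of α → β if (α minus the attribute)+ computed under F already contains β; it is extraneous on the right side if α+ still contains it after the attribute is deleted from β. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def extraneous_in_lhs(attr, fd, fds):
    """Drop attr from the left side and see whether the right side is still implied by F."""
    lhs, rhs = fd
    return rhs <= closure(lhs - {attr}, fds)


def extraneous_in_rhs(attr, fd, fds):
    """Delete attr from the right side of fd, then see whether lhs+ still reaches attr."""
    lhs, rhs = fd
    reduced = [f for f in fds if f != fd] + [(lhs, rhs - {attr})]
    return attr in closure(lhs, reduced)


def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attributes."""
    return frozenset(lhs), frozenset(rhs)


# First example: F = {A → C, AB → C}
F1 = [fd("A", "C"), fd("AB", "C")]
b_extraneous = extraneous_in_lhs("B", fd("AB", "C"), F1)

# Second example: F = {A → C, AB → CD}
F2 = [fd("A", "C"), fd("AB", "CD")]
c_extraneous = extraneous_in_rhs("C", fd("AB", "CD"), F2)
d_extraneous = extraneous_in_rhs("D", fd("AB", "CD"), F2)

print(b_extraneous, c_extraneous, d_extraneous)
```

Note the asymmetry: the left-side test uses F unchanged, while the right-side test must use the modified set, otherwise the attribute would trivially be found.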
Algorithm:
Exercise
Add to the list of
applications of
attribute closure!
Answer:
-------------------------------------------------------------------------------------
Exercise
Answer:
A+ = {A, B, C, D}, so A+ contains C, so C is extraneous in ACD
Exercise
Same scenario as above.
Is D extraneous in ACD?
Exercise
F = {A → B, B → C, A → C}.
Is C extraneous in A → C?
So what do we do about A → C?
For next time: solve all the exercises above, plus the one on p.283!
Why is this of practical importance?
Algorithm:
?
Solve for practice!
Example not from text:
Solution:
Two things must be preserved when we perform decompositions:
Data (tuples)
FDs
Efficient algorithm (uses only attribute closure, not FD closure!)
How much of Ri can we
recover, based on the
current result?
Example (not in text, but in text slides):
----------------------------------------------------------------------------------
Apply the algorithm
above to prove this!
Trivial, don’t need
algorithm!
Solution:
Prove that the decomposition R1=(A, B) R2 = (A,C) is not
dependency preserving.
The FD that needs to be recovered is B→C. Apply algorithm:
result = {B}
Consider R1; result ∩ R1 = {B}; {B}+ = {B, C}; {B, C} ∩ R1 = {B};
result ∪ {B} = {B}
Consider R2; result ∩ R2 = Ø; Ø+ = Ø; … result = {B}
No progress, so algorithm stops.
We could not obtain the RHS of B→C, so FD cannot be recovered.
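The trace above can be reproduced with a short sketch of the closure-based test. The FD set F = {A → B, B → C} on R = (A, B, C) is an assumption (the handout text gives only B → C and {B}+ = {B, C}), and the function names are made up for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserved(fd, decomposition, fds):
    """Grow result from the LHS using only what each Ri can see; succeed if RHS is reached."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # How much of Ri can be recovered from the current result?
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attributes."""
    return frozenset(lhs), frozenset(rhs)


F = [fd("A", "B"), fd("B", "C")]  # assumed FD set, see the note above

bad = [set("AB"), set("AC")]   # R1 = (A, B), R2 = (A, C)
good = [set("AB"), set("BC")]  # R1 = (A, B), R2 = (B, C)

print(preserved(fd("B", "C"), bad, F))
print(preserved(fd("B", "C"), good, F))
print(all(preserved(f, good, F) for f in F))
```

A decomposition is dependency preserving only if every FD in F passes this test, which is what the final line checks for the second decomposition.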
Week 12, Lect 1
7.5 Decomposition using FDs
Problem: The definitions of both BCNF and 3NF require F+ → expensive!
FYI there is a sketched proof
for this on p.289 (not required
for final)
Can you find super-keys?
Intuitively, we can “feel” that AC → BDE … but how to prove it?
Hint: Armstrong’s axioms (and theorems)
So AC is a super-key. But is it a candidate key? (What’s the difference?)
Do you think there are other candidate keys? Why or why not?
Are there any BCNF violations?
Hint: To find BCNF violations, do we need to check F or F+? Why?
Which one do we choose to start the decomposition?
Now write down the two relations resulting from decomposition, including
their FDs F1 and F2 and their candidate keys:
SKIP the remainder of Section 7.5, starting with 7.5.1.2. (p.289)
SKIP 7.6, 7.7
Read and take notes: Sections 7.8, 7.9
Homework for Ch.7: 1, 3, 5, 6, 7, 11