2
Revisit a Previous Example
v Consider relation obtained from Hourly_Emps: § Hourly_Emps(ssn, name, lot, rating, hrly_wages, hrs_worked) § Denote the schema by listing all its attributes: SNLRWH
Contract_Emps
name ssn
Employees
Lot
hourly_wages ISA
Hourly_Emps
contractid
hours_worked rating
3
Example (Contd.)
v Rating (R) determines hourly wages (W): § Redundant storage § Update: Can we change W in just the 1st tuple of rating 8? § Insertion: Insert an employee without knowing the hourly wage for
his rating? Insert the hourly wage for rating 10 with no employee? § Deletion: What if we have deleted all employees with rating 5.
S N L R W H123-22-3666 Attishoo 48 8 10 40231-31-5368 Smiley 22 8 10 30131-24-3650 Smethurst 35 5 7 30434-26-3751 Guldu 35 5 7 32612-67-4134 Madayan 35 8 10 40
4
Will Two Smaller Tables be Better? S N L R W H123-22-3666 Attishoo 48 8 10 40231-31-5368 Smiley 22 8 10 30131-24-3650 Smethurst 35 5 7 30434-26-3751 Guldu 35 5 7 32612-67-4134 Madayan 35 8 10 40
S N L R H123-22-3666 Attishoo 48 8 40231-31-5368 Smiley 22 8 30131-24-3650 Smethurst 35 5 30434-26-3751 Guldu 35 5 32612-67-4134 Madayan 35 8 40
Hourly_Emps2 R W8 105 7
Wages
5
The Evils of Redundancy v Redundant storage causes several operation anomalies:
§ Insert and delete anomalies
v Functional dependencies, a new type of integrity constraint, can be used to identify schemas with such problems.
§ IC’s we have seen: domain constraints, key constraints, foreign key constraints, general constraints (checks or assertions)
§ A new type of IC, functional dependencies, which can be checked using SQL checks or assertions.
6
1. Functional Dependencies (FDs)
v A functional dependency X à Y holds over relation R if: § X and Y are two sets of attributes of R; § ∀ allowable instance r of R:
t1 ∈ r, t2 ∈ r, πX (t1) = πX (t2) implies πY (t1) = πY (t2)
§ E.g., X = {R}, Y = {W}, and R à W
v An FD holds for all allowable instances of a schema.
v Key constraint is a special from of FD: § K is a super key for R means that K à R. § K à R does not require K to be minimal!
7
FDs in the Hourly_Emps Example
v Hourly_Emps(ssn, name, office, rating, hrly_wages, hrs_worked) § Denoted by SNLRWH
v Some FDs on Hourly_Emps: § ssn is the key: S à SNLRWH § rating determines hrly_wages: R à W
8
Reasoning About FDs
v Given some FDs, we can usually infer additional FDs: § {ssn à did, did à building} implies ssn à building
v Given a set of FDs F, closure of F (F+) is the set of all FDs that are implied by F. § All FDs in F+ hold over the relation R.
9
Axioms and Rules
v Armstrong’s Axioms (X, Y, Z are sets of attributes): § Reflexivity: If X ⊆ Y, then Y à X § Augmentation: If X à Y, then XZ à YZ for any Z § Transitivity: If X à Y and Y à Z, then X à Z
v Couple of additional rules (that follow from AA): § Union: If X à Y and X à Z, then X à YZ § Decomposition: If X à YZ, then X à Y and X à Z
v Computing the closure F+ using the axioms/rules: § Compute for all FD’s. § Size of closure is exponential in number of attributes!
10
Attribute Closure v What if we just want to check if a given FD X à Y is in F+? v Simple algorithm for attribute closure X+:
§ X+ := {X} § DO if there is U à V in F, s.t. U ⊆ X+, then X+= X+∪V UNTIL no change § Check if a given FD X à Y is in F+: Simply check if Y ⊆ X+.
v Does F = {A à B, B à C, C D à E } imply A à E? § Is A à E in the closure F+? § Equivalently, is E in A+?
11
2. Normal Forms v Role of FDs in detecting redundancy: R(A,B,C)
– Given A à B:
– No FDs (except key constraints) hold:
v Normal forms: If a reln does not have certain kinds of FDs, certain redundancy related problems are known not to occur.
No redundancy here.
Two tuples have the same A value will have the same B value!
12
Boyce-Codd Normal Form (BCNF)
v Rewrite every FD in the form of X à A, X is a set of attributes, A is a single attribute § Use the decomposition rule
v Reln R with FDs F is in BCNF if ∀ X à A in F+: 1. A ∈ X (called a trivial FD), or 2. X is a superkey (i.e., contains a key) for R.
v In BCNF, the only non-trivial FDs are key constraints!
13
Boyce-Codd Normal Form (contd.)
v Can we infer the value marked by ‘?’ § If X → A, then the relation is not in BCNF § A reln. in BCNF can’t have X → A
X Y A
x y1 a x y2 ?
v Relation in BCNF:
§ Every field of every tuple records information that can’t be inferred using FD’s from other fields.
§ No redundancy can be detected using FDs!
14
Another Example
v When is redundancy possible? § Reserves( Sailor, Boat, Date, Credit_card) with
S à C, C à S § Keys are SBD and CBD. § It is not in BCNF. § For each reservation of sailor S, same (S, C) is stored.
18
3. Decomposing a Relation Scheme
v A decomposition of R breaks R into two or more relns s.t. § Each new reln contains a subset of the attributes of R. § Every attribute of R appears in at least one new reln.
v Decompositions should be used only when: § R has redundancy related problems (not in BCNF), § We can afford the joins in queries later (performance penalty).
19
Example Decomposition
v Hourly_Emps (SNLRWH) § FDs: S à SNLRWH and R à W. § R à W violates BCNF. § And it causes repeated (R,W) storage.
v To fix this, create a relation RW, remove W from the main schema. (SNLRWH) → (SNLRH) and (RW).
20
(1) Lossless Join Decompositions
v Decomposition of schema R into R1 and R2 is lossless-join w.r.t. a set of FDs F if ∀ reln. instance r that satisfies F:
§ r = π R1 (r) π R2 (r)
v It is always true that r ⊆ π R1 (r) π R2 (r).
§ A bad decomposition can cause r ⊂ π R1 (r) π R2 (r).
A B C1 2 34 5 67 2 8
A B1 24 57 2
B C2 35 62 8
A B C1 2 34 5 67 2 81 2 87 2 3
21
A Simple Test for Lossless Join
v Theorem: Decomposition of R into R1 and R2 is lossless-join wrt F iff the F+ contains: § R1 ∩ R2 à R1 or R1 ∩ R2 à R2 (intersection of R1 and R2 is a (super) key of one of them.)
v Algorithm for a lossless join is based on the above result: § If U àV holds over R and violates a BCNF definition, the
decomposition into UV and R - V is lossless-join.
22
(2) Dependency Preserving Decomposition
v Contracts(Contractid, Supplierid, proJectid, Deptid, Partid, Qty, Value), CSJDPQV, with FDs: § C is key. § JP à C: a project buys a given part using a single contract. § SD à P: a department buys at most one part from a supplier.
v What are the keys? Is it in BCNF? § Keys: C, JP, SDJ. § Not in BCNF due to SDàP
v Lossless-join BCNF decomposition: SDP, CSJDQV § Problem: Checking JP à C requires an assertion (using join) § How to write this assertion in SQL?
23
Dependency Preserving Decomposition
v The projection of a FD set F onto a decomposed reln R1, FR1 = F+
R1, contains all U àV s.t. § (i) U, V are both in R1, § (ii) U àV is in closure F+.
v Decomposition of R into R1, R2 is dependency preserving if (FR1 UNION FR2 ) + = F +
v Important to consider F + (not F!) in this definition: § ABC, A à B, B à C, C à A, decomposed into AB and BC. § Is this dependency preserving? Is C à A preserved?
24
Algorithm for Decomposition into BCNF
v Relation R with FDs F. If X à Y violates BCNF, decompose R into R1 = XY and R2 = R – Y. § For each Ri, compute FRi and check if it is in BCNF. § If not, pick a FD violating BCNF and keep composing Ri.
v Repeated application of this process yields a lossless join decomposition into BCNF relations. § But not necessarily dependency-preserving! So need to check
whether some FDs are lost.
25
Steps of BCNF Decomposition v Contracts(CSJDPQV), key C, JP à C, SD à P, J à S.
1. Keys. C, JP, DJ. 2. Normal form. Not in BCNF; SD à P and J à S violate BCNF. 3. Decomposition. To deal with SD à P, decompose into SDP, CSJDQV.
n SDP is in BCNF. But CSJDQV is not because: 1. Projection of FDs and keys. Projection of FDs: keys C and DJ, J à S. 2. Normal form. Not BCNF; J à S violates BCNF. 3. Decomposition. For J à S, decompose CSJDQV into JS and CJDQV.
n JS is in BCNF. So is CJDQV.
v If several FDs violate BCNF, the order of “dealing with” them could lead to very different sets of relations!
26
BCNF and Dependency Preservation
v Is a lossless-join BCNF decomposition dependency-preserving? § CSJDPQV with key C, JP à C, SD à P and J à S. § CSJDPQV decomposed into SDP, JS, and CJDQV. § What about the lost FD, JP à C?
a) Use assertion. Bad performance on inserts/updates. b) Adding JPC as a new relation to preserve JP à C introduces
redundancy across relations. In general, the new relation may not be in BCNF.
c) Don’t decompose. Deal with insertion/deletion anomalies. DBA should examine the tradeoffs.
v There may not exist a lossless join, dependency-preserving decomposition into BCNF. § But there is always a lossless join, dependency-preserving decomposition
into 3NF (more see the textbook)…