1 Normalization

Class Normalization III


DESCRIPTION

This presentation contains information about normalization and its types.


  • Normalization

  • Inference Rules for FDs: Splitting Rule and Combining Rule

    A1, A2, ..., An -> B1, B2, ..., Bm

    is equivalent to

    A1, A2, ..., An -> B1
    A1, A2, ..., An -> B2
    ...
    A1, A2, ..., An -> Bm

  • Inference Rules for FDs (continued): Trivial Rule

    A1, A2, ..., An -> Ai, where i = 1, 2, ..., n

    Why?

  • Inference Rules for FDs (continued): Transitive Closure Rule

    If A1, A2, ..., An -> B1, B2, ..., Bm
    and B1, B2, ..., Bm -> C1, C2, ..., Cp
    then A1, A2, ..., An -> C1, C2, ..., Cp

    Why?

  • Closure of a Set of FDs

    - It is not sufficient to consider just the given set of FDs; we need to consider all FDs that hold.
    - Given F, more FDs can be inferred. Such FDs are said to be logically implied by F.
    - F+ is the set of all FDs logically implied by F.
    - We can compute F+ using the formal definition of an FD, but if F is large this process is lengthy and cumbersome.
    - Axioms, or rules of inference, provide a simpler technique: Armstrong's Axioms.

  • Inference Rules for FDs

    Armstrong's inference rules:
    - IR1 (Reflexive): If Y is a subset of X, then X -> Y
    - IR2 (Augmentation): If X -> Y, then XZ -> YZ (notation: XZ stands for X U Z)
    - IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z

    IR1, IR2, and IR3 form a sound and complete set of inference rules:
    - Sound: they never generate a wrong FD
    - Complete: they generate all FDs that hold

  • Inference Rules for FDs

    Some additional inference rules that are useful:
    - Decomposition: If X -> YZ, then X -> Y and X -> Z
    - Union: If X -> Y and X -> Z, then X -> YZ
    - Pseudotransitivity: If X -> Y and WY -> Z, then WX -> Z

    These three inference rules, as well as any other inference rule, can be deduced from IR1, IR2, and IR3 (completeness property).

  • Example

    R = (A, B, C, G, H, I)
    F = {A -> B, A -> C, CG -> H, CG -> I, B -> H}

    Some members of F+:
    - A -> H, by transitivity from A -> B and B -> H
    - AG -> I, by augmenting A -> C with G to get AG -> CG, and then transitivity with CG -> I
    - CG -> HI, by the union rule on CG -> H and CG -> I

  • Procedure for Computing F+

    To compute the closure of a set of functional dependencies F:

    F+ = F
    repeat
        for each functional dependency f in F+
            apply reflexivity and augmentation rules on f
            add the resulting functional dependencies to F+
        for each pair of functional dependencies f1 and f2 in F+
            if f1 and f2 can be combined using transitivity
                then add the resulting functional dependency to F+
    until F+ does not change any further

    NOTE: We shall see an alternative procedure for this task later.

  • Example on Computing F+

    F = {A -> B, B -> C, CD -> E}

    Step 1: For each f in F, apply the reflexivity rule.
    We get: CD -> C; CD -> D
    Add them to F: F = {A -> B, B -> C, CD -> E, CD -> C, CD -> D}

    Step 2: For each f in F, apply the augmentation rule.
    From A -> B we get: A -> AB; AB -> B; AC -> BC; AD -> BD; ABC -> BC; ABD -> BD; ACD -> BCD
    From B -> C we get: AB -> AC; BC -> C; BD -> CD; ABC -> AC; ABD -> ACD; and so on.

    Step 3: Apply transitivity on pairs of FDs.

    Keep repeating until no new FDs are added.

  • Reasoning About FDs

    Example: Contracts(contract_id, supplier, project, dept, part, qty, value), abbreviated CSJDPQV, and:
    - C is the key: C -> CSJDPQV
    - A project purchases each part using a single contract: JP -> C
    - A dept purchases at most one part from a supplier: SD -> P

    Then:
    - JP -> C and C -> CSJDPQV imply JP -> CSJDPQV
    - SD -> P implies SDJ -> JP
    - SDJ -> JP and JP -> CSJDPQV imply SDJ -> CSJDPQV

  • Reasoning About FDs

    Computing the closure of a set of FDs can be expensive. (The size of the closure is exponential in the number of attributes!)

    Typically, we just want to check if a given FD X -> Y is in the closure of a set of FDs F. An efficient check:
    - Compute the attribute closure of X (denoted X+) wrt F: the set of all attributes Z such that X -> Z is in F+. There is a linear-time algorithm to compute this.
    - Check if Y is in X+.

    Does F = {A -> B, B -> C, CD -> E} imply A -> E? That is, is A -> E in the closure F+? Equivalently, is E in A+?

  • Closure of Attribute Sets

    The closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X.

    X+ can be calculated by repeatedly applying IR1, IR2, and IR3 using the FDs in F.

  • Closure of Attribute Sets

    Given a set of attributes α, define the closure of α under F (denoted by α+) as the set of attributes that are functionally determined by α under F.

    Algorithm to compute α+, the closure of α under F:

    result := α;
    while (changes to result) do
        for each β -> γ in F do
        begin
            if β ⊆ result then result := result ∪ γ
        end

  • Example of Attribute Set Closure

    R = (A, B, C, G, H, I)
    F = {A -> B, A -> C, CG -> H, CG -> I, B -> H}

    (AG)+
    1. result = AG
    2. result = ABCG (A -> C and A -> B)
    3. result = ABCGH (CG -> H and CG ⊆ AGBC)
    4. result = ABCGHI (CG -> I and CG ⊆ AGBCH)

    Is AG a candidate key?
    - Is AG a superkey? Does AG -> R? That is, is (AG)+ ⊇ R?
    - Is any subset of AG a superkey? Does A -> R (is (A)+ ⊇ R)? Does G -> R (is (G)+ ⊇ R)?
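
    The closure algorithm and the (AG)+ walkthrough above can be sketched in Python. This is a minimal illustration, not part of the original slides: the function names `closure`, `is_superkey`, and `is_candidate_key` are hypothetical, and each FD is assumed to be a pair of strings of single-letter attribute names.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (a list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:                      # "while (changes to result)"
        changed = False
        for lhs, rhs in fds:
            # if lhs is contained in result, add rhs to result
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """A set of attributes is a superkey if its closure covers the schema."""
    return closure(attrs, fds) >= set(schema)


def is_candidate_key(attrs, schema, fds):
    """A candidate key is a superkey with no superkey as a proper subset.

    Dropping one attribute at a time is enough, because any subset of a
    non-superkey is also a non-superkey.
    """
    if not is_superkey(attrs, schema, fds):
        return False
    return not any(is_superkey(set(attrs) - {a}, schema, fds) for a in attrs)


R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print("".join(sorted(closure("AG", F))))   # ABCGHI
print(is_superkey("AG", R, F))             # True
print(is_candidate_key("AG", R, F))        # True
```

    Neither A nor G alone is a superkey here (A+ = ABCH and G+ = G), which is why AG comes out as a candidate key.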

  • Uses of Attribute Closure

    There are several uses of the attribute closure algorithm:
    - Testing for superkey: to test if α is a superkey, we compute α+ and check if α+ contains all attributes of R.
    - Testing functional dependencies: to check if a functional dependency α -> β holds (or, in other words, is in F+), just check if β ⊆ α+. That is, we compute α+ by using attribute closure, and then check if it contains β. This is a simple and cheap test, and very useful.
    - Computing the closure of F: for each γ ⊆ R, we find the closure γ+, and for each S ⊆ γ+, we output a functional dependency γ -> S.

  • Computing F+

    Given F = {A -> B, B -> C}, compute F+ (with attributes A, B, C). We'll do an example on A+.

    Step 1: Result = A
    Step 2: Consider A -> B: Result = A ∪ B = AB
            Consider B -> C: Result = AB ∪ C = ABC
    Step 3: A+ = {A, B, C}
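
    Repeating the A+ step for every subset of the attributes gives all of F+. The sketch below is an illustration under assumptions: `fd_closure` and `nonempty_subsets` are hypothetical names, FDs are pairs of strings of single-letter attributes, and FDs with an empty side are left out of the result. The enumeration is exponential in the number of attributes, so it is only practical for tiny schemas like this one.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (a list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def nonempty_subsets(attrs):
    """Yield every non-empty subset of `attrs` as a frozenset."""
    items = sorted(attrs)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def fd_closure(schema, fds):
    """Return F+ as a set of (lhs, rhs) frozenset pairs.

    For each subset X of the schema, every non-empty subset Y of X+
    gives one dependency X -> Y.
    """
    fplus = set()
    for x in nonempty_subsets(schema):
        for y in nonempty_subsets(closure(x, fds)):
            fplus.add((x, y))
    return fplus


fplus = fd_closure("ABC", [("A", "B"), ("B", "C")])
print(len(fplus))                                    # 35
print((frozenset("A"), frozenset("C")) in fplus)     # True: A -> C is implied
```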

  • Boyce-Codd Normal Form (BCNF)

    - A relation is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key. (A determinant is any attribute, or set of attributes, whose value determines other values within a row.)
    - If a table contains only one candidate key, 3NF and BCNF are equivalent.
    - BCNF is a special case of 3NF.

  • A Table That Is In 3NF But Not In BCNF

  • The Decomposition of a Table Structure to Meet BCNF Requirements

  • Sample Data for a BCNF Conversion

  • Decomposition into BCNF

  • BCNF

    - Based on FDs that take into account all candidate keys of a relation
    - For a relation with only one candidate key (CK), 3NF and BCNF are equivalent
    - A relation is said to be in BCNF if every determinant is a CK
    - Is PLOTS in BCNF? No.

  • BCNF vs 3NF

    BCNF: For every functional dependency X -> Y in a set F of functional dependencies over relation R, either:
    - Y is a subset of X, or
    - X is a superkey of R

    3NF: For every functional dependency X -> Y in a set F of functional dependencies over relation R, either:
    - Y is a subset of X, or
    - X is a superkey of R, or
    - Y is a subset of K for some key K of R

    N.b., no proper subset of a key is a key.

  • 3NF Schema

    For every functional dependency X -> Y in a set F of functional dependencies over relation R, either:
    - Y is a subset of X, or
    - X is a superkey of R, or
    - Y is a subset of K for some key K of R

    Client, Office -> Client, Office, Account
    Account -> Office

    Account  Client  Office
    A        Joe     1
    B        Mary    1
    A        John    1
    C        Joe     2

  • BCNF vs 3NF

    For every functional dependency X -> Y in a set F of functional dependencies over relation R, either:
    - Y is a subset of X, or
    - X is a superkey of R, or
    - Y is a subset of K for some key K of R (3NF only)

    3NF has some redundancy; BCNF does not.

    Unfortunately, a BCNF decomposition is not always dependency preserving, but a 3NF decomposition is.

    Client, Office -> Client, Office, Account
    Account -> Office

    3NF schema:

    Account  Client  Office
    A        Joe     1
    B        Mary    1
    A        John    1
    C        Joe     2

    BCNF decomposition:

    Account  Office
    A        1
    B        1
    C        2

    Account  Client
    A        Joe
    B        Mary
    A        John
    C        Joe

  • Equivalence of Sets of FDs

    Two sets of FDs F and G are equivalent if:
    - every FD in F can be inferred from G, and
    - every FD in G can be inferred from F

    Hence, F and G are equivalent if F+ = G+.

    Definition: F covers G if every FD in G can be inferred from F (i.e., if G+ ⊆ F+).

    F and G are equivalent if F covers G and G covers F. There is an algorithm for checking equivalence of sets of FDs.

  • Canonical Cover

    Sets of functional dependencies may have redundant dependencies that can be inferred from the others.
    - For example: A -> C is redundant in {A -> B, B -> C, A -> C}

    Parts of a functional dependency may be redundant.
    - E.g., on the RHS: {A -> B, B -> C, A -> CD} can be simplified to {A -> B, B -> C, A -> D}
    - E.g., on the LHS: {A -> B, B -> C, AC -> D} can be simplified to {A -> B, B -> C, A -> D}

    Intuitively, a canonical cover of F is a minimal set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies.

  • Extraneous Attributes

    Consider a set F of functional dependencies and the functional dependency α -> β in F.
    - Attribute A is extraneous in α if A ∈ α and F logically implies (F − {α -> β}) ∪ {(α − A) -> β}.
    - Attribute A is extraneous in β if A ∈ β and the set of functional dependencies (F − {α -> β}) ∪ {α -> (β − A)} logically implies F.

    Note: implication in the opposite direction is trivial in each of the cases above, since a stronger functional dependency always implies a weaker one.

    Example: Given F = {A -> C, AB -> C}, B is extraneous in AB -> C because {A -> C, AB -> C} logically implies A -> C (i.e., the result of dropping B from AB -> C).

    Example: Given F = {A -> C, AB -> CD}, C is extraneous in AB -> CD since AB -> C can be inferred even after deleting C.

  • Testing if an Attribute is Extraneous

    Consider a set F of functional dependencies and the functional dependency α -> β in F.

    To test if attribute A ∈ α is extraneous in α:
    1. compute ({α} − A)+ using the dependencies in F
    2. check that ({α} − A)+ contains β; if it does, A is extraneous

    To test if attribute A ∈ β is extraneous in β:
    1. compute α+ using only the dependencies in F' = (F − {α -> β}) ∪ {α -> (β − A)}
    2. check that α+ contains A; if it does, A is extraneous

  • Canonical Cover

    A canonical cover for F is a set of dependencies Fc such that:
    - F logically implies all dependencies in Fc, and
    - Fc logically implies all dependencies in F, and
    - no functional dependency in Fc contains an extraneous attribute, and
    - each left side of a functional dependency in Fc is unique.

    To compute a canonical cover for F:

    repeat
        Use the union rule to replace any dependencies in F of the form
            α1 -> β1 and α1 -> β2 with α1 -> β1 β2
        Find a functional dependency α -> β with an extraneous attribute either in α or in β
        If an extraneous attribute is found, delete it from α -> β
    until F does not change

    Note: The union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied.

  • Computing Canonical Cover

    R = (A, B, C)
    F = {A -> BC, B -> C, A -> B, AB -> C}

    - Combine A -> BC and A -> B into A -> BC. The set is now {A -> BC, B -> C, AB -> C}.
    - A is extraneous in AB -> C. Check if the result of deleting A from AB -> C is implied by the other dependencies. Yes: in fact, B -> C is already present! The set is now {A -> BC, B -> C}.
    - C is extraneous in A -> BC. Check if A -> C is logically implied by A -> B and the other dependencies. Yes: using transitivity on A -> B and B -> C. (Attribute closure of A can be used in more complex cases.)
    - The canonical cover is: A -> B, B -> C
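
    The repeat loop and the two extraneous-attribute tests can be sketched as follows. This is one possible reading of the procedure, with assumptions: the name `canonical_cover` is hypothetical, FDs are pairs of strings of single-letter attributes, attributes are tried in alphabetical order, and only one extraneous attribute is removed per pass. A canonical cover is not unique in general, so another order of removal may give a different but equivalent result.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (a list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    """Return a canonical cover as a sorted list of (lhs, rhs) strings."""
    cover = [(frozenset(l), frozenset(r)) for l, r in fds]
    changed = True
    while changed:
        changed = False
        # Union rule: merge all dependencies that share a left side.
        merged = {}
        for lhs, rhs in cover:
            merged[lhs] = merged.get(lhs, frozenset()) | rhs
        cover = list(merged.items())
        # Find one dependency with an extraneous attribute and shrink it.
        for i, (lhs, rhs) in enumerate(cover):
            shrunk = None
            for a in sorted(lhs):
                # A is extraneous on the left if (lhs - A)+ under F contains rhs.
                if len(lhs) > 1 and rhs <= closure(lhs - {a}, cover):
                    shrunk = (lhs - {a}, rhs)
                    break
            if shrunk is None:
                for a in sorted(rhs):
                    # A is extraneous on the right if lhs+ still contains A
                    # once A is dropped from this dependency.
                    reduced = cover[:i] + [(lhs, rhs - {a})] + cover[i + 1:]
                    if a in closure(lhs, reduced):
                        shrunk = (lhs, rhs - {a})
                        break
            if shrunk is not None:
                cover[i] = shrunk
                changed = True
                break
        # Drop dependencies whose right side became empty.
        cover = [(l, r) for l, r in cover if r]
    return sorted(("".join(sorted(l)), "".join(sorted(r))) for l, r in cover)


F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
print(canonical_cover(F))   # [('A', 'B'), ('B', 'C')]
```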

  • Decomposition

    1. Decomposing the schema

    R = (bname, bcity, assets, cname, lno, amt)
    R1 = (bname, bcity, assets, cname)
    R2 = (cname, lno, amt)
    R = R1 U R2

    2. Decomposing the instance

    R:

    bname     bcity  assets  cname    lno   amt
    Downtown  Bkln   9M      Jones    L-17  1000
    Downtown  Bkln   9M      Johnson  L-23  2000
    Mianus    Horse  1.7M    Jones    L-93  500
    Downtown  Bkln   9M      Hayes    L-17  1000

    R1:

    bname     bcity  assets  cname
    Downtown  Bkln   9M      Jones
    Downtown  Bkln   9M      Johnson
    Mianus    Horse  1.7M    Jones
    Downtown  Bkln   9M      Hayes

    R2:

    cname    lno   amt
    Jones    L-17  1000
    Johnson  L-23  2000
    Jones    L-93  500
    Hayes    L-17  1000

  • Goals of Decomposition

    1. Lossless joins: want to be able to reconstruct the big (e.g. universal) relation by joining smaller ones using natural joins (i.e. R1 ⋈ R2 = R)

    2. Dependency preservation: want to minimize the cost of global integrity constraints based on FDs (i.e. avoid big joins in assertions)

    3. Redundancy avoidance: avoid unnecessary data duplication (the motivation for decomposition)

    Why important?
    - LJ: information loss
    - DP: efficiency (time)
    - RA: efficiency (space), update anomalies

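  • Lossy Decomposition

    Original relation:

    A  B  C
    1  2  3
    4  5  6
    7  2  8

    Projection on (A, B):

    A  B
    1  2
    4  5
    7  2

    Projection on (B, C):

    B  C
    2  3
    5  6
    2  8

    JOIN of the two projections (the tuples (1, 2, 8) and (7, 2, 3) are spurious):

    A  B  C
    1  2  3
    4  5  6
    7  2  8
    1  2  8
    7  2  3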
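
    The spurious tuples can be reproduced with a few lines of Python. This is only an illustration with a hypothetical helper `natural_join_on_b`; relations are modelled as sets of tuples.

```python
def natural_join_on_b(ab_rows, bc_rows):
    """Join (A, B) tuples with (B, C) tuples on the shared attribute B."""
    return {(a, b, c) for (a, b) in ab_rows for (b2, c) in bc_rows if b == b2}


# Original relation over (A, B, C).
r = {(1, 2, 3), (4, 5, 6), (7, 2, 8)}

# Projections on (A, B) and (B, C).
ab = {(a, b) for (a, b, c) in r}
bc = {(b, c) for (a, b, c) in r}

joined = natural_join_on_b(ab, bc)
spurious = joined - r

print(len(joined))         # 5
print(sorted(spurious))    # [(1, 2, 8), (7, 2, 3)]
```

    The join contains every original tuple plus two extra ones, so the original instance cannot be recovered from the projections.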

  • Decomposition Goal #1: lossless joins

    A bad decomposition:

    R1:

    bname     bcity  assets  cname
    Downtown  Bkln   9M      Jones
    Downtown  Bkln   9M      Johnson
    Mianus    Horse  1.7M    Jones
    Downtown  Bkln   9M      Hayes

    R2:

    cname    lno   amt
    Jones    L-17  1000
    Johnson  L-23  2000
    Jones    L-93  500
    Hayes    L-17  1000

    R1 ⋈ R2 =

    bname     bcity  assets  cname    lno   amt
    Downtown  Bkln   9M      Jones    L-17  1000
    Downtown  Bkln   9M      Jones    L-93  500
    Downtown  Bkln   9M      Johnson  L-23  2000
    Mianus    Horse  1.7M    Jones    L-17  1000
    Mianus    Horse  1.7M    Jones    L-93  500
    Downtown  Bkln   9M      Hayes    L-17  1000

    Problem: the join adds meaningless tuples.

    Lossy join: by adding noise, we have lost meaningful information as a result of the decomposition.

  • Decomposition Goal #1: lossless joins

    Is the following decomposition lossless or lossy?

    R1:

    bname     assets  cname    lno
    Downtown  9M      Jones    L-17
    Downtown  9M      Johnson  L-23
    Mianus    1.7M    Jones    L-93
    Downtown  9M      Hayes    L-17

    R2:

    lno   bcity  amt
    L-17  Bkln   1000
    L-23  Bkln   2000
    L-93  Horse  500

    Ans: Lossless: R = R1 ⋈ R2, and it has 4 tuples.

  • Ensuring Lossless Joins

    A decomposition of R, R = R1 U R2, is lossless iff
    - R1 ∩ R2 -> R1, or
    - R1 ∩ R2 -> R2

    (i.e., the intersecting attributes must form a superkey for one of the resulting smaller relations)

  • Lossless Decomposition

    Theorem: A decomposition of R into R1 and R2 is lossless-join wrt FDs F if and only if at least one of the following dependencies is in F+:
    - R1 ∩ R2 -> R1
    - R1 ∩ R2 -> R2

    In other words, R1 ∩ R2 forms a superkey of either R1 or R2.
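
    The theorem translates directly into a check based on attribute closure. The sketch below is illustrative: `is_lossless` is a hypothetical name, and the schema R = (A, B, C) with F = {A -> B, B -> C} is an assumed example, with FDs given as pairs of strings of single-letter attributes.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (a list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary decomposition test: the common attributes must determine R1 or R2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


F = [("A", "B"), ("B", "C")]

print(is_lossless("AB", "BC", F))   # True:  B -> BC
print(is_lossless("AB", "AC", F))   # True:  A -> AB
print(is_lossless("AC", "BC", F))   # False: C determines neither side
```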

  • Lossy Decomposition

    S:

    S#  Status  City
    S3  30      Paris
    S5  30      Athens

    First decomposition:

    S#  Status
    S3  30
    S5  30

    S#  City
    S3  Paris
    S5  Athens

    Second decomposition:

    S#  Status
    S3  30
    S5  30

    Status  City
    30      Paris
    30      Athens

  • Lossless Decomposition

    - Observe that S satisfies the FDs S# -> Status and S# -> City.
    - It cannot be a coincidence that S is equal to the join of its projections on {S#, Status} and {S#, City}.
    - Heath's Theorem: Let R{A, B, C} be a relation, where A, B, and C are sets of attributes. If R satisfies A -> B and A -> C, then R is equal to the join of its projections on {A, B} and {A, C}.
    - Observe that in the second decomposition of S the FD S# -> City is lost.

  • Lossless Decomposition

    - The decomposition of R into R1, R2, ..., Rn is lossless if for any instance r of R: r = πR1(r) ⋈ πR2(r) ⋈ ... ⋈ πRn(r)
    - We can replace R by R1 and R2, knowing that the instance of R can be recovered from the instances of R1 and R2.
    - We can use FDs to show that decompositions are lossless.

  • Decomposition Goal #2: Dependency preservation

    Goal: efficient integrity checks of FDs

    An example with no DP:
    R = (bname, bcity, assets, cname, lno, amt)
    bname -> bcity assets
    lno -> amt bname

    Decomposition: R = R1 U R2
    R1 = (bname, assets, cname, lno)
    R2 = (lno, bcity, amt)

    Lossless but not DP. Why?
    Ans: bname -> bcity assets crosses 2 tables

  • Decomposition Goal #2: Dependency preservation

    To ensure the best possible efficiency of FD checks, ensure that only a SINGLE table is needed in order to check each FD,

    i.e. ensure that A1 A2 ... An -> B1 B2 ... Bm can be checked by examining Ri = (..., A1, A2, ..., An, ..., B1, ..., Bm, ...)

    To test if the decomposition R = R1 U R2 U ... U Rn is DP:

    (1) see which FDs of R are covered by R1, R2, ..., Rn

    (2) compare the closure of (1) with the closure of FDs of R

  • Decomposition Goal #2: Dependency preservation

    Example: Given F = {A -> B, AB -> D, C -> D}

    consider R = R1 U R2 s.t. R1 = (A, B, D), R2 = (C, D), and let G be the set of FDs covered by R1 and R2

    (1) F+ = {A -> BD, C -> D}+

    (2) G+ = {A -> BD, C -> D, ...}+

    (3) F+ = G+ (note: G+ cannot introduce new FDs not in F+)

    The decomposition is DP.

  • Dependency Preservation

    Let Fi be the set of dependencies in F+ that include only attributes in Ri.
    - A decomposition is dependency preserving if (F1 ∪ F2 ∪ ... ∪ Fn)+ = F+.
    - If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive.

  • Testing for Dependency Preservation

    To check if a dependency α -> β is preserved in a decomposition of R into R1, R2, ..., Rn, we apply the following test (with attribute closure done with respect to F):

    result = α
    while (changes to result) do
        for each Ri in the decomposition
            t = (result ∩ Ri)+ ∩ Ri
            result = result ∪ t

    If result contains all attributes in β, then the functional dependency α -> β is preserved.

    We apply the test on all dependencies in F to check if a decomposition is dependency preserving.

    This procedure takes polynomial time, instead of the exponential time required to compute F+ and (F1 ∪ F2 ∪ ... ∪ Fn)+.

  • Example

    R = (A, B, C)
    F = {A -> B, B -> C}

    R can be decomposed in two different ways:

    R1 = (A, B), R2 = (B, C)
    - Lossless-join decomposition: R1 ∩ R2 = {B} and B -> BC
    - Dependency preserving

    R1 = (A, B), R2 = (A, C)
    - Lossless-join decomposition: R1 ∩ R2 = {A} and A -> AB
    - Not dependency preserving (cannot check B -> C without computing R1 ⋈ R2)
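
    The polynomial-time preservation test, applied to the two decompositions above, can be sketched like this. The names `is_preserved` and `is_dependency_preserving` are hypothetical, and FDs and schemas are assumed to be strings of single-letter attributes.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (a list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(lhs, rhs, decomposition, fds):
    """Test whether lhs -> rhs can be checked without joining the Ri."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            ri = set(ri)
            # t = (result ∩ Ri)+ ∩ Ri, with the closure taken under F
            t = closure(result & ri, fds) & ri
            if not t <= result:
                result |= t
                changed = True
    return set(rhs) <= result


def is_dependency_preserving(decomposition, fds):
    """A decomposition is DP if every dependency in F is preserved."""
    return all(is_preserved(l, r, decomposition, fds) for l, r in fds)


F = [("A", "B"), ("B", "C")]
print(is_dependency_preserving(["AB", "BC"], F))   # True
print(is_dependency_preserving(["AB", "AC"], F))   # False: B -> C is lost
```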

  • Decomposition Goal #3: Redundancy Avoidance

    A  B  C
    a  x  1
    e  x  1
    g  y  2
    h  y  2
    m  y  2
    n  z  1
    p  z  1

    Redundancy for B = x, y, and z.

    Example:

    (1) An FD that exists in the above relation is: B -> C

    (2) A superkey in the above relation is A (or any set containing A)

    When do you have redundancy?
    Ans: when there is some FD X -> Y covered by a relation and X is not a superkey

  • Problems with Decompositions

    There are three potential problems to consider:
    - Some queries become more expensive, e.g., What is the price of prop# 1?
    - Given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the original relation! Fortunately, not in the PLOTS example.
    - Checking some dependencies may require joining the instances of the decomposed relations. Fortunately, not in the PLOTS example.

    Tradeoff: Must consider these issues vs. redundancy.

  • Example

    R = (A, B, C)
    F = {A -> B, B -> C}
    Key = {A}

    - R is not in BCNF (B -> C but B is not a superkey)
    - Decomposition: R1 = (A, B), R2 = (B, C)
    - R1 and R2 are in BCNF
    - Lossless-join decomposition
    - Dependency preserving

  • Testing for BCNF

    To check if a non-trivial dependency α -> β causes a violation of BCNF:
    1. compute α+ (the attribute closure of α), and
    2. verify that it includes all attributes of R, that is, that α is a superkey of R.

    Simplified test: To check if a relation schema R is in BCNF, it suffices to check only the dependencies in the given set F for violation of BCNF, rather than checking all dependencies in F+. If none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.

    However, the simplified test using only F is incorrect when testing a relation in a decomposition of R:
    - Consider R = (A, B, C, D, E), with F = {A -> B, BC -> D}.
    - Decompose R into R1 = (A, B) and R2 = (A, C, D, E).
    - Neither of the dependencies in F contains only attributes from (A, C, D, E), so we might be misled into thinking R2 satisfies BCNF.
    - In fact, the dependency AC -> D in F+ shows R2 is not in BCNF.

  • BCNF and Dependency Preservation

    It is not always possible to get a BCNF decomposition that is dependency preserving.

    R = (J, K, L)
    F = {JK -> L, L -> K}
    Two candidate keys: JK and JL

    - R is not in BCNF
    - Any decomposition of R will fail to preserve JK -> L
    - This implies that testing for JK -> L requires a join

  • Third Normal Form: Motivation

    There are some situations where
    - BCNF is not dependency preserving, and
    - efficient checking for FD violation on updates is important

    Solution: define a weaker normal form, called Third Normal Form (3NF)
    - Allows some redundancy (with resultant problems; we will see examples later)
    - But functional dependencies can be checked on individual relations without computing a join
    - There is always a lossless-join, dependency-preserving decomposition into 3NF

  • Redundancy in 3NF

    There is some redundancy in this schema. Example of problems due to redundancy in 3NF:

    R = (J, K, L)
    F = {JK -> L, L -> K}

    J     L   K
    j1    l1  k1
    j2    l1  k1
    j3    l1  k1
    null  l2  k2

    - Repetition of information (e.g., the relationship l1, k1; compare (i_ID, dept_name))
    - Need to use null values (e.g., to represent the relationship l2, k2 where there is no corresponding value for J; compare (i_ID, dept_name) if there is no separate relation mapping instructors to departments)

  • Testing for 3NF

    - Optimization: need to check only FDs in F; need not check all FDs in F+.
    - Use attribute closure to check, for each dependency α -> β, if α is a superkey.
    - If α is not a superkey, we have to verify if each attribute in β is contained in a candidate key of R.
    - This test is rather more expensive, since it involves finding candidate keys.
    - Testing for 3NF has been shown to be NP-hard.
    - Interestingly, decomposition into third normal form (described shortly) can be done in polynomial time.

  • 3NF Decomposition Algorithm

    Let Fc be a canonical cover for F;
    i := 0;
    for each functional dependency α -> β in Fc do
        if none of the schemas Rj, 1 ≤ j ≤ i, contains α β
            then begin
                i := i + 1;
                Ri := α β
            end
    if none of the schemas Rj, 1 ≤ j ≤ i, contains a candidate key for R
        then begin
            i := i + 1;
            Ri := any candidate key for R;
        end
    /* Optionally, remove redundant relations */
    repeat
        if any schema Rj is contained in another schema Rk
            then /* delete Rj */
                Rj := Ri;
                i := i - 1;
    until no more schemas can be deleted
    return (R1, R2, ..., Ri)

  • Testing Decomposition for BCNF

    To check if a relation Ri in a decomposition of R is in BCNF:
    - Either test Ri for BCNF with respect to the restriction of F to Ri (that is, all FDs in F+ that contain only attributes from Ri),
    - or use the original set of dependencies F that hold on R, but with the following test: for every set of attributes α ⊆ Ri, check that α+ (the attribute closure of α) either includes no attribute of Ri − α, or includes all attributes of Ri.

    If the condition is violated by some α, the dependency α -> (α+ − α) ∩ Ri can be shown to hold on Ri, and Ri violates BCNF. We use this dependency to decompose Ri.
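
    The subset-based test can be sketched as below, using the R = (A, B, C, D, E), F = {A -> B, BC -> D} example from the "Testing for BCNF" slide. The name `bcnf_violation` is hypothetical, and FDs are assumed to be pairs of strings of single-letter attributes. The loop tries every proper non-empty subset of Ri, so it is exponential in the size of Ri and is meant only as an illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (a list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violation(ri, fds):
    """Return (alpha, beta) for a dependency alpha -> beta that violates
    BCNF on Ri, or None if Ri is in BCNF with respect to the FDs on R."""
    ri = set(ri)
    items = sorted(ri)
    for size in range(1, len(items)):
        for combo in combinations(items, size):
            alpha = set(combo)
            plus = closure(alpha, fds)
            beta = (plus - alpha) & ri          # (alpha+ - alpha) ∩ Ri
            # Violation: alpha determines something in Ri but not all of Ri.
            if beta and not ri <= plus:
                return alpha, beta
    return None


F = [("A", "B"), ("BC", "D")]
print(bcnf_violation("AB", F))     # None
print(bcnf_violation("ACDE", F))   # AC -> D violates BCNF on R2
```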

  • BCNF Decomposition Algorithm

    result := {R};
    done := false;
    compute F+;
    while (not done) do
        if (there is a schema Ri in result that is not in BCNF)
            then begin
                let α -> β be a nontrivial functional dependency that holds on Ri
                    such that α -> Ri is not in F+, and α ∩ β = ∅;
                result := (result − Ri) ∪ (Ri − β) ∪ (α, β);
            end
            else done := true;

    Note: each Ri is in BCNF, and the decomposition is lossless-join.

  • Example of BCNF Decomposition

    class (course_id, title, dept_name, credits, sec_id, semester, year, building, room_number, capacity, time_slot_id)

    Functional dependencies:
    - course_id -> title, dept_name, credits
    - building, room_number -> capacity
    - course_id, sec_id, semester, year -> building, room_number, time_slot_id

    A candidate key: {course_id, sec_id, semester, year}.

    BCNF decomposition:
    - course_id -> title, dept_name, credits holds, but course_id is not a superkey.
    - We replace class by:
      course (course_id, title, dept_name, credits)
      class-1 (course_id, sec_id, semester, year, building, room_number, capacity, time_slot_id)

  • BCNF Decomposition (Cont.)

    - course is in BCNF. How do we know this?
    - building, room_number -> capacity holds on class-1, but {building, room_number} is not a superkey for class-1.
    - We replace class-1 by:
      classroom (building, room_number, capacity)
      section (course_id, sec_id, semester, year, building, room_number, time_slot_id)
    - classroom and section are in BCNF.
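
    The decomposition loop can be sketched in Python and run on the class schema above. This is a simplified illustration, not the slides' exact procedure: `fd`, `find_violation`, and `bcnf_decompose` are hypothetical names, violations are found by trying every attribute subset (exponential, fine for a schema this small), and β is taken to be everything in Ri that α determines. Other choices of violating dependency can lead to a different BCNF decomposition.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build an FD from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(ri, fds):
    """Return (alpha, beta) violating BCNF on Ri, or None."""
    items = sorted(ri)
    for size in range(1, len(items)):
        for combo in combinations(items, size):
            alpha = set(combo)
            plus = closure(alpha, fds)
            beta = (plus - alpha) & ri
            if beta and not ri <= plus:
                return alpha, beta
    return None


def bcnf_decompose(schema, fds):
    """Split schemas until none of them violates BCNF."""
    pending = [frozenset(schema)]
    done = []
    while pending:
        ri = pending.pop()
        violation = find_violation(ri, fds)
        if violation is None:
            done.append(ri)
        else:
            alpha, beta = violation
            pending.append(frozenset(ri - beta))        # Ri − β
            pending.append(frozenset(alpha | beta))     # (α, β)
    return done


class_schema = ("course_id title dept_name credits sec_id semester year "
                "building room_number capacity time_slot_id").split()
F = [
    fd("course_id", "title dept_name credits"),
    fd("building room_number", "capacity"),
    fd("course_id sec_id semester year", "building room_number time_slot_id"),
]

for schema in bcnf_decompose(class_schema, F):
    print(sorted(schema))
```

    On this input the sketch ends with three schemas that match course, classroom, and section from the slides.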
