29SpCS157BL16BCNF&Lossless


  • 8/8/2019 29SpCS157BL16BCNF&Lossless

    BCNF & Lossless Decomposition

    Prof. Sin-Min Lee

    Department of Computer Science

    Normalization: Review on Keys

    superkey: a set of attributes that uniquely identifies each tuple in a relation

    candidate key: a minimal superkey

    primary key: a chosen candidate key

    secondary key: any of the remaining candidate keys

    prime attribute: an attribute that is part of a candidate key (key column)

    nonprime attribute: a nonkey column

    Normalization: Functional Dependency Types by Keys

    whole (candidate) key $\to$ nonprime attribute: full FD (no violation)

    partial key $\to$ nonprime attribute: partial FD (violation of 2NF)

    nonprime attribute $\to$ nonprime attribute: transitive FD (violation of 3NF)

    not a whole key $\to$ prime attribute: violation of BCNF

    Functional Dependencies

    Let R be a relation schema, with $\alpha \subseteq R$ and $\beta \subseteq R$.

    The functional dependency $\alpha \to \beta$ holds on R iff, for any legal relation r(R), whenever two tuples $t_1$ and $t_2$ of r have the same values for $\alpha$, they also have the same values for $\beta$:

    $t_1[\alpha] = t_2[\alpha] \Rightarrow t_1[\beta] = t_2[\beta]$

    On this instance, $A \to B$ does NOT hold, but $B \to A$ does hold:

    A  B
    1  4
    1  5
    3  7

    1. Closure

    Given a set of functional dependencies F, its closure, $F^+$, is the set of all FDs that are implied by the FDs in F.

    e.g., if $A \to B$ and $B \to C$, then clearly $A \to C$.

    Armstrong's Axioms

    We can find $F^+$ by applying Armstrong's axioms:

    if $\beta \subseteq \alpha$, then $\alpha \to \beta$ (reflexivity)

    if $\alpha \to \beta$, then $\gamma\alpha \to \gamma\beta$ (augmentation)

    if $\alpha \to \beta$ and $\beta \to \gamma$, then $\alpha \to \gamma$ (transitivity)

    These rules are

    sound (they generate only functional dependencies that actually hold) and

    complete (they generate all functional dependencies that hold).

    Additional rules

    If $\alpha \to \beta$ and $\alpha \to \gamma$, then $\alpha \to \beta\gamma$ (union)

    If $\alpha \to \beta\gamma$, then $\alpha \to \beta$ and $\alpha \to \gamma$ (decomposition)

    If $\alpha \to \beta$ and $\gamma\beta \to \delta$, then $\alpha\gamma \to \delta$ (pseudotransitivity)

    The above rules can be inferred from Armstrong's axioms.
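    As a worked check of that claim, each additional rule follows in two or three steps from reflexivity, augmentation, and transitivity (attribute sets are sets, so $\alpha\alpha = \alpha$):

```latex
\begin{aligned}
&\text{Union:} && \alpha \to \beta \implies \alpha \to \alpha\beta && \text{(augment with } \alpha\text{)}\\
& && \alpha \to \gamma \implies \alpha\beta \to \beta\gamma && \text{(augment with } \beta\text{)}\\
& && \alpha \to \alpha\beta,\ \alpha\beta \to \beta\gamma \implies \alpha \to \beta\gamma && \text{(transitivity)}\\
&\text{Decomposition:} && \beta \subseteq \beta\gamma \implies \beta\gamma \to \beta && \text{(reflexivity)}\\
& && \alpha \to \beta\gamma,\ \beta\gamma \to \beta \implies \alpha \to \beta && \text{(transitivity; likewise for } \gamma\text{)}\\
&\text{Pseudotransitivity:} && \alpha \to \beta \implies \alpha\gamma \to \gamma\beta && \text{(augment with } \gamma\text{)}\\
& && \alpha\gamma \to \gamma\beta,\ \gamma\beta \to \delta \implies \alpha\gamma \to \delta && \text{(transitivity)}
\end{aligned}
```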

    Example

    R = (A, B, C, G, H, I)

    $F = \{A \to B,\ A \to C,\ CG \to H,\ CG \to I,\ B \to H\}$

    Some members of $F^+$:

    $A \to H$, by transitivity from $A \to B$ and $B \to H$

    $AG \to I$, by augmenting $A \to C$ with G to get $AG \to CG$, and then transitivity with $CG \to I$

    $CG \to HI$, by augmenting $CG \to I$ to infer $CG \to CGI$, augmenting $CG \to H$ to infer $CGI \to HI$, and then transitivity

    2. Closure of an attribute set

    Given a set of attributes A and a set of FDs F, the closure of A under F, written $A^+$, is the set of all attributes functionally determined by A.

    In other words, it is the largest B such that $A \to B$.

    Redefining superkeys: the closure of a superkey is the entire relation schema.

    Redefining candidate keys:

    1. It is a superkey.

    2. No proper subset of it is a superkey.

    Computing the closure of A

    Simple algorithm:

    1. Start with B = A.

    2. Go over all functional dependencies $\beta \to \gamma$ in F.

    3. If $\beta \subseteq B$, then add $\gamma$ to B.

    4. Repeat until B no longer changes.
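    The four steps above translate almost line for line into a small routine. This is a minimal Python sketch, assuming single-letter attribute names and FDs stored as pairs of frozensets; the helper names `fd` and `closure` are illustrative, not from the slides.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# An FD is a pair (lhs, rhs) of attribute sets; attributes are single letters.
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from strings, e.g. fd("CG", "H") stands for CG -> H."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Compute attrs+ with the simple fixpoint algorithm."""
    result: Set[str] = set(attrs)        # step 1: start with B = A
    changed = True
    while changed:                       # step 4: repeat until B stops changing
        changed = False
        for lhs, rhs in fds:             # step 2: go over every FD in F
            if lhs <= result and not rhs <= result:
                result |= rhs            # step 3: lhs is inside B, so add rhs
                changed = True
    return frozenset(result)


F = [fd("A", "B"), fd("A", "C"), fd("CG", "H"), fd("CG", "I"), fd("B", "H")]
print("".join(sorted(closure("AG", F))))  # ABCGHI
```

    Iterating over F rather than $F^+$ is what keeps this cheap: each pass is linear in the number of FDs, and there are at most |R| passes.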

    Example

    R = (A, B, C, G, H, I)

    $F = \{A \to B,\ A \to C,\ CG \to H,\ CG \to I,\ B \to H\}$

    $(AG)^+ = ?$

    1. result = AG

    2. result = ABCG ($A \to C$ and $A \to B$)

    3. result = ABCGH ($CG \to H$ and $CG \subseteq AGBC$)

    4. result = ABCGHI ($CG \to I$ and $CG \subseteq AGBCH$)

    Is AG a candidate key?

    1. It is a superkey, since $(AG)^+ = R$.

    2. $A^+ = ABCH$ and $G^+ = G$; neither is R, so no proper subset of AG is a superkey, and AG is a candidate key.

    Uses of attribute set closures

    Determining superkeys and candidate keys

    Determining if $A \to B$ is a valid FD: check if $A^+$ contains B

    Can be used to compute $F^+$
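    A brute-force Python sketch of the first two uses, assuming single-letter attributes: `implies` tests an FD by checking the closure, and `candidate_keys` tries attribute subsets from smallest to largest and keeps the minimal superkeys. Both names are hypothetical, and the subset search is exponential, so it is only meant for slide-sized schemas.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure under fds (fixpoint iteration)."""
    result: Set[str] = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def implies(fds: List[FD], lhs: str, rhs: str) -> bool:
    """lhs -> rhs is in F+ exactly when rhs is contained in lhs+."""
    return frozenset(rhs) <= closure(lhs, fds)


def candidate_keys(schema: str, fds: List[FD]) -> List[str]:
    """Try subsets smallest first; keep superkeys that contain no smaller key."""
    keys: List[FrozenSet[str]] = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if any(k <= s for k in keys):
                continue  # contains a key already found, so it is not minimal
            if closure(s, fds) == frozenset(schema):
                keys.append(s)
    return ["".join(sorted(k)) for k in keys]


F = [(frozenset(lhs), frozenset(rhs))
     for lhs, rhs in [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]]
print(candidate_keys("ABCGHI", F))                  # ['AG']
print(implies(F, "AG", "I"), implies(F, "A", "G"))  # True False
```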

    Database Normalization

    A functional dependency (FD) $X \to Y$ means that for every value of X there is only one possible value of Y; Y is then functionally dependent on X.

    Do the following FDs hold on this instance?

    X   Y   Z
    10  B1  C1
    10  B2  C2
    11  B4  C1
    12  B3  C4
    13  B1  C1
    14  B3  C4

    $X \to Y$

    $Z \to Y$

    $Y \to Z$

    $Y \to X$
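    The four questions can be checked mechanically by comparing every pair of tuples, exactly as in the definition $t_1[\alpha] = t_2[\alpha] \Rightarrow t_1[\beta] = t_2[\beta]$. This is a hypothetical Python helper; note that an instance can only refute an FD, so "True" below means "not violated by these six rows", not that the FD holds on the schema.

```python
from itertools import combinations
from typing import Dict, List

Row = Dict[str, str]


def fd_holds(rows: List[Row], lhs: str, rhs: str) -> bool:
    """True if no two rows agree on every attribute in lhs but differ on rhs."""
    for t1, t2 in combinations(rows, 2):
        if all(t1[a] == t2[a] for a in lhs) and any(t1[b] != t2[b] for b in rhs):
            return False
    return True


data = [("10", "B1", "C1"), ("10", "B2", "C2"), ("11", "B4", "C1"),
        ("12", "B3", "C4"), ("13", "B1", "C1"), ("14", "B3", "C4")]
rows = [dict(zip("XYZ", t)) for t in data]

for lhs, rhs in [("X", "Y"), ("Z", "Y"), ("Y", "Z"), ("Y", "X")]:
    print(f"{lhs} -> {rhs}:", fd_holds(rows, lhs, rhs))
# X -> Y: False   (X = 10 appears with B1 and B2)
# Z -> Y: False   (Z = C1 appears with B1 and B4)
# Y -> Z: True
# Y -> X: False   (Y = B1 appears with 10 and 13)
```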

    Functional dependency is good. With functional dependency, the primary key (attribute A) determines the value of all the other non-key attributes (attributes B, C, D, etc.).

    Transitive dependency is bad. A transitive dependency exists if the primary/candidate key (attribute A) determines non-key attribute B, and attribute B determines non-key attribute C.

    If a relation schema has more than one key, each is called a candidate key.

    An attribute in a relation schema R is called prime if it is a member of some candidate key of R.


    First Normal Form (1NF)

    Each attribute must be atomic (single value):

    No repeating columns within a row (composite attributes).

    No multi-valued columns.

    1NF simplifies attributes, so queries become easier.

    1NF

    Deptno  Dname      Location
    10      IT         Leeds, Bradford, Kent
    20      Research   Hundredfold
    30      Marketing  Leeds

    In 1NF, this becomes:

    Deptno  Dname
    10      IT
    20      Research
    30      Marketing

    Deptno  Location
    10      Leeds
    10      Bradford
    10      Kent
    20      Hundredfold
    30      Leeds
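    The same 1NF split as a small Python sketch, assuming the unnormalized table is held in memory as rows whose Location value is a Python list; the variable names are illustrative only.

```python
# Unnormalized DEPT rows: Location holds a list of values (violates 1NF).
dept = [
    (10, "IT", ["Leeds", "Bradford", "Kent"]),
    (20, "Research", ["Hundredfold"]),
    (30, "Marketing", ["Leeds"]),
]

# (Deptno, Dname): one row per department, every value atomic.
dept_names = [(deptno, dname) for deptno, dname, _ in dept]

# (Deptno, Location): one row per department/location pair.
dept_locations = [(deptno, loc) for deptno, _, locs in dept for loc in locs]

print(dept_names)
print(dept_locations)
```

    Deptno is repeated in the second table, which is the price of making every value atomic; the pair (Deptno, Location) becomes that table's key.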

    Second Normal Form (2NF)

    Each nonkey attribute must be fully functionally dependent on the primary key.

    If the primary key is a single attribute, then a relation in 1NF is also in 2NF.

    The test for 2NF involves testing for FDs whose left-hand-side attributes are part of the primary key.

    Disallow partial dependencies, where nonkey attributes depend on part of a composite primary key.

    In short, remove partial dependencies.

    2NF improves data integrity: it prevents the update, insertion, and deletion anomalies caused by partial dependencies.

    2NF

    Relation: (PNo, PName, PLoc, EmpNo, EName, Salary, Address, HoursNo)

    Given the following FDs:

    $\text{EmpNo} \to \text{EName}, \text{Salary}, \text{Address}$

    $\text{PNo} \to \text{PName}, \text{PLoc}$

    $\text{EmpNo}, \text{PNo} \to \text{HoursNo}$

    Assuming all attributes are atomic, is the above relation in 1NF? In 2NF?

    Decomposition into Relations X1, X2, and X3:

    (PNo, PName, PLoc)

    (EmpNo, EName, Salary, Address)

    (PNo, EmpNo, HoursNo)
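    A minimal Python sketch for spotting the 2NF violations here, assuming the composite key is {PNo, EmpNo} (neither attribute appears on a right-hand side, and together they determine everything) and that attribute names are plain strings. It computes the closure of each proper subset of the key and reports which nonprime attributes that subset already determines; `partial_dependencies` is a hypothetical helper name.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure under fds (fixpoint iteration)."""
    result: Set[str] = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def partial_dependencies(key: FrozenSet[str], nonprime: Set[str],
                         fds: List[FD]) -> Dict[Tuple[str, ...], List[str]]:
    """Map each proper subset of the key to the nonprime attributes it determines."""
    found: Dict[Tuple[str, ...], List[str]] = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            determined = closure(part, fds) & nonprime
            if determined:
                found[part] = sorted(determined)
    return found


F: List[FD] = [
    (frozenset({"EmpNo"}), frozenset({"EName", "Salary", "Address"})),
    (frozenset({"PNo"}), frozenset({"PName", "PLoc"})),
    (frozenset({"EmpNo", "PNo"}), frozenset({"HoursNo"})),
]
KEY = frozenset({"PNo", "EmpNo"})
NONPRIME = {"PName", "PLoc", "EName", "Salary", "Address", "HoursNo"}

print(partial_dependencies(KEY, NONPRIME, F))
# {('EmpNo',): ['Address', 'EName', 'Salary'], ('PNo',): ['PLoc', 'PName']}
```

    Both EmpNo and PNo on their own determine nonprime attributes, so the relation is in 1NF but not in 2NF; each reported group becomes its own relation, and only HoursNo stays with the full key.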

    Third Normal Form (3NF)

    Remove transitive dependencies.

    Transitive dependency:

    A nonprime attribute is dependent on another nonprime attribute or attributes.

    An attribute is the result of a calculation.

    Examples:

    Area code attribute based on the City attribute of a customer

    Total price attribute of an order entry based on the quantity attribute and unit price attribute (calculated value)

    Solution: any transitive dependencies are moved into a smaller table.

    Transitive Dependence

    Given a relation R(EmpNo, EName, Salary, Address), assume the following FDs hold:

    $\text{EmpNo} \to \text{EName}$, $\text{EName} \to \text{Address}$, $\text{EmpNo} \to \text{Address}$

    Note: both EName and Address are non-key attributes in R. Since Address depends on the nonprime attribute EName, which depends on the primary key (EmpNo), a transitive dependency exists.

    Decomposition: R1(EmpNo, EName, Salary) and R2(EName, Address)

    Note: if Address were a prime attribute, then R would be in 3NF.

    Modification Anomalies

    What happens when you want to add a new book? Change the address of a patron? Delete a patron record?

    PatronName  PatronAddress  BookID  BookTitle  BookAuthor  BorrowDate  DueDate  ReturnDate
    Smith       12 Elk         AAA     Peace      Bart        2/4         2/18     2/15
    Jones       25 Sun         BBB     War        Hine        2/4         2/18     2/19
    Hart        73 Sera        CCC     System     Van         2/5         2/19     2/23
    Hicks       22 Main        AAA     Peace      Bart        2/12        2/25     2/28
    Rice        69 Witt        DDD     Spring     Lyon        2/6         2/20     2/8
    Jones       25 Sun         CCC     System     Van         1/26        2/7      2/6

    Modification Anomalies

    Deletion anomaly: deleting one fact about an entity deletes a fact about another entity.

    Insertion anomaly: cannot insert one fact about an entity unless a fact about another entity is also added.

    Update anomaly: changing one fact about an entity requires multiple changes to a table.

    Referential Integrity Constraint

    When we split a relation, we must pay attention to the references across the newly formed relations.

    E.g., a book must exist before it can be checked out: $\text{CHECKOUT}[\text{BookID}] \subseteq \text{BOOK}[\text{BookID}]$

    The DBMS or the applications will have to check/enforce these constraints.
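    As a hedged illustration of letting the DBMS enforce the CHECKOUT/BOOK constraint, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table and column names are hypothetical stand-ins for BOOK and CHECKOUT; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE book (book_id TEXT PRIMARY KEY, title TEXT, author TEXT)")
conn.execute(
    "CREATE TABLE checkout ("
    " patron_name TEXT, book_id TEXT REFERENCES book(book_id),"
    " borrow_date TEXT, due_date TEXT)"
)

conn.execute("INSERT INTO book VALUES ('AAA', 'Peace', 'Bart')")
conn.execute("INSERT INTO checkout VALUES ('Smith', 'AAA', '2/4', '2/18')")  # book exists

try:
    # 'ZZZ' is not in book, so CHECKOUT[BookID] would no longer be a subset of BOOK[BookID].
    conn.execute("INSERT INTO checkout VALUES ('Jones', 'ZZZ', '2/4', '2/18')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("dangling checkout rejected:", rejected)
```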

    Boyce-Codd Normal Form

    Every determinant is a candidate key.

    ADVISER(SID, Major, Fname) is decomposed into STU-ADV(SID, Fname) and ADV-SUBJ(Fname, Subject).

    Multi-valued Dependency

    Two or more functionally independent multi-valued attributes are dependent on another attribute.

    EMPLOYEE(Name, Dependent, Project)

    Data redundancy and modification anomalies

    4NF: BCNF and no multi-valued dependencies

    EMPLOYEE(Name, Dependent)

    EMPLOYEE(Name, Project)

    Boyce-Codd Normal Form (BCNF)

    A relation is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key.

    (A determinant is any attribute whose value determines other values within a row.)

    If a table contains only one candidate key, 3NF and BCNF are equivalent.

    BCNF is a special case of 3NF.
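    A minimal BCNF test in Python, assuming single-letter attributes: for each FD in F, flag it if it is nontrivial and its left-hand side is not a superkey (its closure is not the whole schema). Checking only the FDs in F is enough when testing the original schema R; for a schema produced by decomposition, FDs of $F^+$ that are not in F also have to be considered. The function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure under fds (fixpoint iteration)."""
    result: Set[str] = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(schema: str, fds: List[FD]) -> List[str]:
    """FDs in F that are nontrivial and whose left-hand side is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        is_superkey = closure(lhs, fds) >= frozenset(schema)
        if not trivial and not is_superkey:
            bad.append("".join(sorted(lhs)) + " -> " + "".join(sorted(rhs)))
    return bad


F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(bcnf_violations("ABC", F))  # ['B -> C']
```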

    A Table That Is in 3NF but Not in BCNF

    Figure 5.7

    The Decomposition of a Table Structure to Meet BCNF Requirements

    Figure 5.8

    Lossless-join Decomposition

    For the case of $R = (R_1, R_2)$, we require that for all possible relations r on schema R:

    $r = \Pi_{R_1}(r) \bowtie \Pi_{R_2}(r)$

    A decomposition of R into $R_1$ and $R_2$ is lossless join if and only if at least one of the following dependencies is in $F^+$:

    $R_1 \cap R_2 \to R_1$

    $R_1 \cap R_2 \to R_2$
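    The binary test translates directly into code: compute the closure of $R_1 \cap R_2$ and check whether it covers $R_1$ or $R_2$. This Python sketch (hypothetical names, single-letter attributes) applies it to R = (A, B, C) with $F = \{A \to B,\ B \to C\}$.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure under fds (fixpoint iteration)."""
    result: Set[str] = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_lossless(r1: str, r2: str, fds: List[FD]) -> bool:
    """Binary decomposition test: (R1 ∩ R2)+ must contain R1 or R2."""
    common_plus = closure(set(r1) & set(r2), fds)
    return frozenset(r1) <= common_plus or frozenset(r2) <= common_plus


F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
for r1, r2 in [("AB", "BC"), ("AB", "AC"), ("AC", "BC")]:
    print(r1, r2, is_lossless(r1, r2, F))
# AB BC True    (B+ = BC)
# AB AC True    (A+ = ABC)
# AC BC False   (C+ = C)
```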

    R = (A, B, C), $F = \{A \to B,\ B \to C\}$

    Can be decomposed in two different ways:

    $R_1 = (A, B)$, $R_2 = (B, C)$

    Lossless-join decomposition: $R_1 \cap R_2 = \{B\}$ and $B \to BC$

    Dependency preserving

    $R_1 = (A, B)$, $R_2 = (A, C)$

    Lossless-join decomposition: $R_1 \cap R_2 = \{A\}$ and $A \to AB$

    Not dependency preserving (cannot check $B \to C$ without computing $R_1 \bowtie R_2$)

    Dependency Preservation

    Let $F_i$ be the set of dependencies in $F^+$ that include only attributes in $R_i$.

    A decomposition is dependency preserving if

    $(F_1 \cup F_2 \cup \dots \cup F_n)^+ = F^+$

    If it is not, then checking updates for violations of functional dependencies may require computing joins, which is expensive.

    Dependency Preservation

    To check whether a dependency $\alpha \to \beta$ is preserved in a decomposition of R into $R_1, R_2, \dots, R_n$, we apply the following test (with attribute closure done with respect to F):

    $\mathit{result} = \alpha$
    while (changes to result) do
        for each $R_i$ in the decomposition
            $t = (\mathit{result} \cap R_i)^+ \cap R_i$
            $\mathit{result} = \mathit{result} \cup t$

    If result contains all attributes in $\beta$, then the functional dependency $\alpha \to \beta$ is preserved.
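    The test above as a Python sketch, assuming single-letter attributes; an attribute-closure helper is included so the block is self-contained, and closures are always taken with respect to the full F. `preserves` is a hypothetical name. With $F = \{A \to B,\ B \to C\}$ it confirms that (A, B), (A, C) loses $B \to C$ while (A, B), (B, C) keeps it.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure under the full F (fixpoint iteration)."""
    result: Set[str] = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def preserves(lhs: str, rhs: str, decomposition: List[str], fds: List[FD]) -> bool:
    """Grow result using only (result ∩ Ri)+ ∩ Ri, never a join of the Ri."""
    result: Set[str] = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in decomposition:
            ri = frozenset(schema)
            t = closure(result & ri, fds) & ri
            if not t <= result:
                result |= t
                changed = True
    return set(rhs) <= result


F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(preserves("B", "C", ["AB", "AC"], F))  # False
print(preserves("B", "C", ["AB", "BC"], F))  # True
```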

    Dependency Preservation

    We apply the test on all dependencies in F to check if a decomposition is dependency preserving.

    This procedure takes polynomial time, instead of the exponential time required to compute $F^+$ and $(F_1 \cup F_2 \cup \dots \cup F_n)^+$.

    FD Example

    R = (A, B, C), $F = \{A \to B,\ B \to C\}$, Key = {A}

    R is not in BCNF.

    Decomposition: $R_1 = (A, B)$, $R_2 = (B, C)$

    $R_1$ and $R_2$ are now in BCNF.

    Lossless-join decomposition

    Dependency preserving

    A Lossy Decomposition
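    The original figure is not reproduced here. As a stand-in, this Python sketch uses a small hypothetical instance of R = (A, B, C) with no FDs at all, decomposes it into (A, B) and (B, C), and joins the projections back on B. Because B determines neither A nor C, the lossless test fails, and the join contains tuples that were never in r.

```python
from typing import Set, Tuple

# Hypothetical instance of R(A, B, C); B does not determine A or C here.
r: Set[Tuple[int, str, str]] = {(1, "x", "p"), (2, "x", "q")}

# Project r onto R1 = (A, B) and R2 = (B, C).
r1 = {(a, b) for a, b, _ in r}
r2 = {(b, c) for _, b, c in r}

# Natural join of the projections on the shared attribute B.
joined = {(a, b, c) for a, b in r1 for b2, c in r2 if b == b2}

spurious = joined - r
print(sorted(spurious))  # [(1, 'x', 'q'), (2, 'x', 'p')]
```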

    Aim of Normalization

    The goal for a relational database design is:

    BCNF.

    Lossless join.

    Dependency preservation.

    If we cannot achieve this, we accept one of:

    lack of dependency preservation, or

    redundancy due to use of 3NF.

    Sample Data for a BCNF Conversion

    Table 5.2

    Decomposition into BCNF


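    Perform lossless-join decompositions of the following scheme into BCNF schemes: R(A, B, C, D, E) with dependency set $\{AB \to CDE,\ C \to D,\ D \to E\}$.

    One solution: split on $C \to D$ into (C, D) and (A, B, C, E); in (A, B, C, E), $C \to E$ (implied by $C \to D$ and $D \to E$) still violates BCNF, so split it into (C, E) and (A, B, C).

    Another solution: split on $D \to E$ into (D, E) and (A, B, C, D); then split (A, B, C, D) on $C \to D$ into (C, D) and (A, B, C).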

    Given the FDs $\{B \to D,\ AB \to C,\ D \to B\}$ and the relation {A, B, C, D}, give two distinct lossless-join decompositions into BCNF, indicating the keys of each of the resulting relations.

    Decomposition 1: (A, B, C, D) into (B, D), with keys B and D, and (A, B, C), with key AB

    Decomposition 2: (A, B, C, D) into (B, D), with keys B and D, and (A, C, D), with key AD


    Example

    The name-addr-phones-beersLiked example illustrated the MVD name ->-> phones and the MVD name ->-> beersLiked.

    Picture of MVD X ->-> Y

    [Figure: columns X, Y, and others; two tuples that are equal on X exchange their Y values]

    MVD Rules

    Every FD is an MVD. If X -> Y, then swapping Ys between two tuples that agree on X doesn't change the tuples. Therefore, the "new" tuples are surely in the relation, and we know X ->-> Y.

    Complementation: if X ->-> Y, and Z is all the other attributes, then X ->-> Z.
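    A sketch of the tuple-swapping test behind the picture and rules above, in Python. `mvd_holds` is a hypothetical helper: for every pair of tuples agreeing on X, it checks that the tuple taking X and Y from the first and everything else from the second is also present. The Drinkers2-style rows are invented data chosen so that name ->-> phones holds; dropping one row breaks it.

```python
from itertools import product
from typing import Dict, List, Sequence

Row = Dict[str, str]


def mvd_holds(rows: List[Row], x: Sequence[str], y: Sequence[str]) -> bool:
    """X ->-> Y: for tuples t1, t2 agreeing on X, the tuple with X and Y from t1
    and all other attributes from t2 must also be in the relation."""
    attrs = list(rows[0])
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1, t2 in product(rows, repeat=2):
        if all(t1[a] == t2[a] for a in x):
            swapped = tuple(t1[a] if (a in x or a in y) else t2[a] for a in attrs)
            if swapped not in present:
                return False
    return True


cols = ["name", "phones", "beersLiked"]
rows = [dict(zip(cols, t)) for t in [
    ("Sue", "555-1111", "Bud"), ("Sue", "555-1111", "WickedAle"),
    ("Sue", "555-9999", "Bud"), ("Sue", "555-9999", "WickedAle"),
]]

print(mvd_holds(rows, ["name"], ["phones"]))      # True
print(mvd_holds(rows, ["name"], ["beersLiked"]))  # True (complementation)
print(mvd_holds(rows[:3], ["name"], ["phones"]))  # False: one combination missing
```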


    Fourth Normal Form

    The redundancy that comes from MVDs is not removable by putting the database schema in BCNF. There is a stronger normal form, called 4NF, that (intuitively) treats MVDs as FDs when it comes to decomposition, but not when determining keys of the relation.

    4NF Definition

    A relation R is in 4NF if, whenever X ->-> Y is a nontrivial MVD, X is a superkey.

    Nontrivial means that:

    1. Y is not a subset of X, and

    2. X and Y are not, together, all the attributes.

    Note that the definition of superkey still depends on FDs only.

    BCNF Versus 4NF

    Remember that every FD X -> Y is also an MVD, X ->-> Y. Thus, if R is in 4NF, it is certainly in BCNF, because any BCNF violation is a 4NF violation.

    But R could be in BCNF and not 4NF, because MVDs are "invisible" to BCNF.

    Normalization

    Good decomposition:

    dependency preserving decomposition: it is undesirable to lose functional dependencies during decomposition

    lossless join decomposition: the join of the decomposed relations should be able to recreate the original relation (no spurious tuples)


    Decomposition and 4NF

    If X ->-> Y is a 4NF violation for relation R, we can decompose R using the same technique as for BCNF:

    1. XY (that is, $X \cup Y$) is one of the decomposed relations.

    2. All attributes except those in $Y - X$ form the other.


    Example

    Drinkers(name, addr, phones, beersLiked)

    FD: name -> addr

    MVDs: name ->-> phones, name ->-> beersLiked

    Key is {name, phones, beersLiked}.

    All dependencies violate 4NF.

    Example, Continued

    Decompose using name -> addr:

    1. Drinkers1(name, addr): in 4NF; the only dependency is name -> addr.

    2. Drinkers2(name, phones, beersLiked): not in 4NF. The MVDs name ->-> phones and name ->-> beersLiked apply. There are no FDs, so all three attributes form the key.

    Example: Decompose Drinkers2

    Either MVD name ->-> phones or name ->-> beersLiked tells us to decompose to:

    Drinkers3(name, phones)

    Drinkers4(name, beersLiked)
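    To see that this split is lossless and removes redundancy, the following Python sketch projects a hypothetical Drinkers2 instance onto Drinkers3 and Drinkers4 and joins them back on name. The data is invented for illustration and satisfies name ->-> phones.

```python
# Hypothetical Drinkers2 instance: every phone of a drinker is paired
# with every beer that drinker likes, as name ->-> phones requires.
drinkers2 = {
    ("Sue", "555-1111", "Bud"), ("Sue", "555-1111", "WickedAle"),
    ("Sue", "555-9999", "Bud"), ("Sue", "555-9999", "WickedAle"),
    ("Joe", "555-4444", "Bud"),
}

drinkers3 = {(name, phone) for name, phone, _ in drinkers2}  # (name, phones)
drinkers4 = {(name, beer) for name, _, beer in drinkers2}    # (name, beersLiked)

# Natural join on name recreates the original relation exactly.
rejoined = {(n, p, b) for n, p in drinkers3 for n2, b in drinkers4 if n == n2}

print(len(drinkers2), len(drinkers3), len(drinkers4))  # 5 3 3
print(rejoined == drinkers2)                            # True
```

    Sue's two phones and two beers need four rows in Drinkers2 but only two rows in each of Drinkers3 and Drinkers4; the saving grows with the product of the two multi-valued sets.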

    BCNF

    Given a relation schema R and a set of functional dependencies F, if every FD $A \to B$ in $F^+$ is either:

    1. trivial, or

    2. such that A is a superkey of R,

    then R is in BCNF (Boyce-Codd Normal Form).

    BCNF

    What if the schema is not in BCNF?

    Decompose (split) the schema into two pieces.

    Careful: you want the decomposition to be lossless.

    Achieving BCNF Schemas

    For all dependencies $A \to B$ in $F^+$, check if A is a superkey, by using attribute closure.

    If not, then:

    Choose a dependency in $F^+$ that breaks the BCNF rules, say $A \to B$.

    Create $R_1 = A \cup B$.

    Create $R_2 = A \cup (R - B - A)$.

    Note that $R_1 \cap R_2 = A$ and $A \to AB\ (= R_1)$, so this is a lossless decomposition.

    Repeat for $R_1$ and $R_2$, by defining $F_1^+$ to be all dependencies in $F^+$ that contain only attributes in $R_1$.

    Similarly for $F_2^+$.
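    The decomposition loop can be sketched in Python as follows. This is a brute-force version under stated assumptions: single-letter attributes, FDs given as pairs of attribute sets, and violations inside a sub-schema found by taking closures with respect to the original F over every proper subset (which covers the $F^+$ restriction described above), so it is exponential in the schema size. The helper names are hypothetical. Which violation gets picked first changes the result; this version tries subsets in size-then-alphabetical order, which happens to reproduce Examples 1, 2-1 and 3 below rather than Example 2-2.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fds_from(pairs: List[Tuple[str, str]]) -> List[FD]:
    return [(frozenset(lhs), frozenset(rhs)) for lhs, rhs in pairs]


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure, always computed with respect to the original F."""
    result: Set[str] = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def find_violation(schema: FrozenSet[str],
                   fds: List[FD]) -> Optional[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """Return (X, Y): X -> Y holds within schema, is nontrivial, X not a superkey."""
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = frozenset(combo)
            reach = closure(x, fds) & schema   # X+ restricted to this schema
            y = reach - x
            if y and reach != schema:
                return x, y
    return None


def bcnf_decompose(schema: str, fds: List[FD]) -> List[str]:
    """Split on a violating X -> Y into XY and schema - Y until all pieces are BCNF."""
    todo = [frozenset(schema)]
    done: List[FrozenSet[str]] = []
    while todo:
        s = todo.pop()
        violation = find_violation(s, fds)
        if violation is None:
            done.append(s)
        else:
            x, y = violation
            todo.append(x | y)   # R1 = X ∪ Y
            todo.append(s - y)   # R2 = X ∪ (S - Y - X), since Y excludes X
    return sorted("".join(sorted(s)) for s in done)


print(bcnf_decompose("ABC", fds_from([("A", "B"), ("B", "C")])))       # ['AB', 'BC']
print(bcnf_decompose("ABCDE", fds_from([("A", "B"), ("BC", "D")])))    # ['AB', 'ACD', 'ACE']
print(bcnf_decompose("ABCDEH", fds_from([("A", "BC"), ("E", "HA")])))  # ['ABC', 'AEH', 'DE']
```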

    Example 1

    R = (A, B, C), $F = \{A \to B,\ B \to C\}$

    Candidate keys = {A}

    BCNF = No; $B \to C$ violates it.

    Decompose on $B \to C$:

    $R_1 = (B, C)$, $F_1 = \{B \to C\}$; candidate keys = {B}; BCNF = true

    $R_2 = (A, B)$, $F_2 = \{A \to B\}$; candidate keys = {A}; BCNF = true

    Example 2-1

    R = (A, B, C, D, E), $F = \{A \to B,\ BC \to D\}$

    Candidate keys = {ACE}

    BCNF = violated by $A \to B$, $BC \to D$, etc.

    Decompose on $A \to B$:

    $R_1 = (A, B)$, $F_1 = \{A \to B\}$; candidate keys = {A}; BCNF = true

    $R_2 = (A, C, D, E)$, $F_2 = \{AC \to D\}$; candidate keys = {ACE}; BCNF = false ($AC \to D$)

    ($AC \to D$ follows from $A \to B$ and $BC \to D$ by pseudotransitivity.)

    Decompose $R_2$ on $AC \to D$:

    $R_3 = (A, C, D)$, $F_3 = \{AC \to D\}$; candidate keys = {AC}; BCNF = true

    $R_4 = (A, C, E)$, $F_4 = \{\}$ (only trivial FDs); candidate keys = {ACE}; BCNF = true

    Dependency preservation? We can check $A \to B$ (in $R_1$) and $AC \to D$ (in $R_3$), but we lost $BC \to D$.

    So this is not a dependency-preserving decomposition.

    Example 2-2

    R = (A, B, C, D, E), $F = \{A \to B,\ BC \to D\}$

    Candidate keys = {ACE}

    BCNF = violated by $A \to B$, $BC \to D$, etc.

    Decompose on $BC \to D$:

    $R_1 = (B, C, D)$, $F_1 = \{BC \to D\}$; candidate keys = {BC}; BCNF = true

    $R_2 = (B, C, A, E)$, $F_2 = \{A \to B\}$; candidate keys = {ACE}; BCNF = false ($A \to B$)

    Decompose $R_2$ on $A \to B$:

    $R_3 = (A, B)$, $F_3 = \{A \to B\}$; candidate keys = {A}; BCNF = true

    $R_4 = (A, C, E)$, $F_4 = \{\}$ (only trivial FDs); candidate keys = {ACE}; BCNF = true

    Dependency preservation? We can check $BC \to D$ (in $R_1$) and $A \to B$ (in $R_3$): this is a dependency-preserving decomposition.

    Example 3

    R = (A, B, C, D, E, H), $F = \{A \to BC,\ E \to HA\}$

    Candidate keys = {DE}

    BCNF = violated by $A \to BC$, etc.

    Decompose on $A \to BC$:

    $R_1 = (A, B, C)$, $F_1 = \{A \to BC\}$; candidate keys = {A}; BCNF = true

    $R_2 = (A, D, E, H)$, $F_2 = \{E \to HA\}$; candidate keys = {DE}; BCNF = false ($E \to HA$)

    Decompose $R_2$ on $E \to HA$:

    $R_3 = (E, H, A)$, $F_3 = \{E \to HA\}$; candidate keys = {E}; BCNF = true

    $R_4 = (E, D)$, $F_4 = \{\}$ (only trivial FDs); candidate keys = {DE}; BCNF = true

    Dependency preservation? We can check $A \to BC$ (in $R_1$) and $E \to HA$ (in $R_3$): this is a dependency-preserving decomposition.