57
Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … • Considerations time/space tradeoff for application balance application characteristics, management requirements, competing theories, ... tool support and tool development based on theory • Approach – systematic – transformational guided by theory, but tempered by application semantics adjusted by application-dependent cost analysis

Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

  • View
    215

  • Download
    2

Embed Size (px)

Citation preview

Page 1: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 1

Database Design

• Purpose: organize to achieve efficiency, …• Considerations

– time/space tradeoff for application– balance application characteristics, management

requirements, competing theories, ...– tool support and tool development based on theory

• Approach– systematic– transformational– guided by theory, but tempered by application semantics– adjusted by application-dependent cost analysis

Page 2: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 2

Design Steps

ORM application model Model-theoretic view

ORM hypergraph

DB schemes and constraints

designtransformations

not efficientlyorganized

time/spaceadjustments

efficiently organized

If transformations and adjustmentspreserve information and constraints,the generated schemes andconstraints are equivalent to theoriginals in the model-theoretic view.Scheme

generation

Generation ofbasic constraints

Page 3: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 3

Hypergraphs

• Graph = (V, E)– V = set of vertices– E = set of edges (two vertices; one for unordered loops)

• e E can be unordered: e = {x, y}

• e E can be ordered: e = (x, y)

• Hypergraph = (V, E)– V = set of vertices– E = set of edges (can have more than two vertices)

• e E can be unordered: e = {x1, x2, …, xn}

• e E can be ordered: e = (x1, …, xm; y1, …, yk) = e: x1, …, xm y1, …, yk = x1, …, xm y1, …, yk

Page 4: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 4

ORM Hypergraphs

• A regular hypergraph with:– object sets as vertices– relationship sets as edges

• Plus:– Generalization/Specialization– Optional-participation markers– General constraints

• Example:

B A

C

D D < 100

B A

C

D D < 100

Page 5: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 5

Conversion to ORM Hypergraph

H

G

E

F

D

C

I J

B A

G

E

F

0:* H

H, I -> J

J

D

I

0:1

0:*

1:*

B 1 0:* 1

1:*

C

B, C -> D, E

A

1:*

1:* 1:*

1:4

H

G

E

F

D

C

I J

B A

G

E

F

0:* H

H, I -> J

J

D

I

0:1

0:*

1:*

B 1 0:* 1

1:*

C

B, C -> D, E

A

1:*

1:* 1:*

1:4

x(H(x)4<y,z,w>(D(y)H(x)I(z)J(w)))

Page 6: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 6

Functional Dependencies (FDs)

Let r(R) be a relation and let t r, then the restriction of t toX R, written t[X], is the projection of t onto X.

R(A B C) 1 2 3 2 2 5 2 3 5 4 4 5

BC ABC

t[BC] = <2, 5> = {(B:2), (C:5)}

Let R be a relation scheme and let X R and Y R.X Y is a functional dependency or FD. A relation r(R)satisfies the FD X Y (or X Y holds) if for any two tuplest1 and t2 in r, t1[X] = t2[X] t1[Y] = t2[Y]; alternatively (andequivalently) if for every (sub)tuple x in Xr, |X=xXYr| = 1.

A C B / C AB C AC / B

t =

r =

Page 7: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 7

Directed Edges in ORM Hypergraphs are Functional Dependencies

B

A

C

A B

A B1

0:1

BC -> A

r

B

A

C

A B

A B1

0:1

BC -> A

r

Generated Constraints: Participation Constraint: x(A(x) 1y(A(x) r B(y))) Referential-Integrity Constraint: xy(A(x) r B(y) A(x) B(y))

Proof that these constraints imply that ArB satisfies A B.

Recall: A B if for every pair of tuples t1 and t2 in ArB, t1[A] = t2[A] t1[B] = t2[B].

Proof: Let t1 and t2 be tuples in ArB such that t1[A] = t2[A].Since t1 ArB, by the referential-integrity constraint, t1[A] A.Thus, by the participation constraint 1y(A(x) r B(y)) andthus there is exactly one y such that A(x) r B(y). Hence,t1[B] = t2[B].

Proofs for other cases are similar (see Appendix C).

Page 8: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 8

Motivational Example

Manager Worker

Start Date

Department

Manager Worker

Start Date

Department

Reduction Transformation: Information Preserving Constraint Preserving

Worker Manager Start Date W1 M1 1 Oct W2 M1 7 Nov

xyz(Worker(x) works for Manager(y) as of Start Date(z) Worker(x) Manager(y) Start Date(z))x(Worker(x) 1<y,z>(Worker(x) works for Manager(y) as of Start Date(z)))

Page 9: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 9

FD Implication

Let r(R) and let F be a set of FDs over R. Then r satisfies Fif each FD in F holds for r.

F may imply that other FDs also hold. F implies X Y(sometimes written F X Y or F X Y) if X Y holdsfor every relation that satisfies F.

r = A B C D 1 2 3 4 5 6 7 8 5 6 7 9 0 1 2 3

F = {A B, B C}r satisfies F.F A C.

Proof:1. Let s[A] = t[A].2. s[A] = t[A] s[B] = t[B]. given, A B3. s[B] = t[B]. 1 & 2, modus ponens4. s[B] = t[B] s[C] = t[C]. given, B C5. s[C] = t[C]. 3 & 4, modus ponens

Page 10: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 10

F+

Let S be a set of attributes. If F is a set of FDs over S, the setof all FDs implied by F is called the closure of F, denoted F+.When we assert an FD X Y, we mean X Y F+.

Rules for computing F+:

(trivial implication) Y X X Y e.g., Name City Name

(accumulation) X Y, W Z, W Y X YZ e.g., GuestNr Name City, Name Room GuestNr Name City Room

(projection) X Y, Z Y X Z e.g., GuestNr Name City Room GuestNr Room

We compute F+ by a least-fixed-point process.

Page 11: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 11

Sound and Complete Rules

The implication rules for F+ are: sound: the derived FDs hold for any relation satisfying F; complete: repeated application of the rules derives all implied FDs.

Proof of Soundness. (Y X X Y): Let s[X] = t[X]. Then since Y X, s[Y] = t[Y]. (X Y, W Z, W Y X YZ): Let s[X] = t[X]. Then since X Y, s[Y] = t[Y]. Then since W Y, s[W] = t[W] and since W Z, s[Z] = t[Z]. Now, since s[Y] = t[Y] and s[Z] = t[Z], s[YZ] = t[YZ]. (X Y, Z Y X Z): Let s[X] = t[X]. Then since X Y, s[Y] = t[Y], and since Z Y, s[Z] = t[Z].

Proof of Completeness (Appendix C).

Page 12: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 12

Example of F+

If the set of attributes is ABC and the set of FDs F = {A B, B C},then F+ = {A B, B C

A A, B B, C C, AB AB, AC AC, BC BC, ABC ABC, AB A, AB B, AC A, AC C, BC B, BC C, ABC A, ABC B, ABC C, ABC AB, ABC AC, ABC BC,

A BC, A C, A AB, A AC, A ABC, B BC, AB C, AB AC, AB BC, AB ABC, AC B, AC AB, AC BC, AC ABC}

Page 13: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 13

Additional FD-Implication Rules

(augmentation) X Y XZ YZ Room Cost Room NrDays Cost NrDays (transitivity) X Y, Y Z X Z GuestNr Name, Name Room GuestNr Room

(union) X Y, X Z X YZ Room NrBeds, Room Cost Room NrBeds Cost

Page 14: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 14

Checking for X Y F+

• Generate F+ and see if X Y is present (expensive)• Derive X Y from F, or determine that it’s not derivable

– Derivation sequence for X Y: sequence of FDs• each FD is given in F or follows by a sound derivation rule

• X Y is the last FD in the sequence

– Examples:

R = ABCD, F = {A B, B C}, AD C F+?, BC A F+?

1. A B given2. B C given3. A C transitivity, 1 & 24. AD CD augmentation, 35. AD C projection, 4

1. BC B trivial implication2. B C given3. BC C transitivity, 1 & 2 . . .How do we know we cannotderive BC A?

Page 15: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 15

TAP Derivation Sequence

• A particular derivation sequence always works!– List the given FDs– T: Trivial Implication– A: Accumulation (repeated zero or more times)– P: Projection (if needed)

• Examples:

R = ABCD, F = {A B, B C}, AD C F+?, BC A F+?

1. A B given2. B C given3. AD AD T4. AD ABD A5. AD ABCD A6. AD C P

1. A B given2. B C given3. BC BC T

How do we know we cannot derive BC A?Accumulation yields nothing more, andprojection cannot yield A on the rhs, andBC A F+ iff there is a TAP derivationsequence for BC A.

Page 16: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 16

X+ – Closure of a Set of Attributes

• X+ = maximal accumulation in a TAP derivation sequence starting with X.

• Algorithm for X+ given a set of FDs F:– 1. Start with X+ = X.– 2. If Y Z F and Y X+, X+ becomes X+Z.– 3. Repeat 2 until no more changes to X+ (least fixed point).

• Examples:R = ABCD, F = {A B, B C}

AD+ = ABCD BC+ = BC BD+ = BCD D+ = D A+ = ABC

Page 17: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 17

X Y F+ iff Y X+

• Significant observation!– X Y F+ looks like a problem requiring exponential time– BUT has a polynomial-time solution (linear with well-chosen

data structures)

• This is an example of the essence of good computer science.

Page 18: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 18

X+ and Hypergraph Reachability

EH

B

C

G

DA

EH

B

C

G

DA

To test X Y F+, mark the vertices in X and see if thevertices in Y are reachable following directed edges.

A D F+? YesA G F+? NoE CD F+? YesA BH F+? NoAE HG F+? Yes. . .

Page 19: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 19

Reduction Transformations• Discard “stuff” in model-theoretic view that we don’t need

– Discard redundant relations (= redundant hypergraph edges)

– Reduce relations (= reducible hypergraph edges)

– Discard implied constraints

• Example

Guest NameRoom

Room Name

Guest

hasoccupies

names guest occupying

hasoccupiesGuest NameRoom

Room Name

Guest

hasoccupies

names guest occupying

hasoccupies

Observe: 1. names_guest_occupying = Room,Name(occupies has) 2. {Room Guest, Guest Name} Room Name

Page 20: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 20

Information Preserving Transformations

A transformation from M to M preserves informationif for any valid interpretation there is a procedure Pthat computes M from M.

Room = RoomGuest = GuestName = Nameoccupies = occupieshas = hasnames_guest_occupying = Room,Name(occupies has)

For our example, P is:

Page 21: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 21

Constraint Preserving TransformationsA transformation from M to M preserves constraints if C, theconstraints of M (along with the constraints implied by thetransformations P and P), implies C, the constraints of M.

For our example: C C

Constraints of M: C = {Room Guest, Guest Name, Room Name, xy(Name(x) names guest occupying Room(y) Name(x) Room(y)), . . . }Constraints of M, plus the constraints implied by P and P: C = { Room Guest, Guest Name, xy(Guest(x) occupies Room(y) Guest(x) Room(y)), xy(Guest(x) has Name(y) Guest(x) Name(y)). . . . , xy(Name(x) names guest occupying Room(y) <=> z(Guest(z) has Name(x) Guest(z) occupies Room(y))) }

Page 22: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 22

Equivalence for Reduction Transformations

• The backward transformations (i.e., the ones just discussed) are the hard ones to guarantee

• The forward transformations are easy to guarantee– There is a “trivial” transformation P that computes M from

M – just delete the discarded part.– The constraints of M imply the constraints of M vacuously –

every constraint in M is a constraint in M.

• Reduction transformations are actually equivalence transformations.

Page 23: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 23

FD Equivalence

• Two sets of FDs F & G are equivalent, written F G, if F implies each FD in G and conversely.

• If F G, then F+ = G+.

F = {A B, AB C, C D} G = {A BC, C D, A D}In F, A+ = ABCD A BC & A D and C+ = CD C D.In G, A+ = ABCD A B, AB+ = ABCD AB C, and C+ = CD C D.

F G

G+

F+

F G

F+G+

F G G F (G+ F+ F+ G+) F+ = G+

Page 24: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 24

Semantics

R1 Kennedy

R3 Carter

103

101

Smith

R3 Carter

R1Smith

Name103

101

Room Name

r

Guest Guest

Room

Guest

Room

Name

s

q

R1 Kennedy

R3 Carter

103

101

Smith

R3 Carter

R1Smith

Name103

101

Room Name

r

Guest Guest

Room

Guest

Room

Name

s

q

q = Room, Name(r s) ??

Case 1: q is name of guest staying in room.

Case 2: q is name of room.

Room Guest and Guest Nameimplies Room Name. But themeaning of the implied relationshipset, Room Name, is the compositionof the meanings of the relationshipsets, Room Guest and Guest Name, which may or may not be themeaning of the relationship set q.

Page 25: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 25

Techniques for Checking Semantics

• Relevant-Edge Sets• Project-Join or Universal-Relation Test• Outer-Join/FD Test• Congruency Test

A search for Semantic Equivalence

Page 26: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 26

Relevant Edge Sets

• Ignore edges not used in derivations.• Multiple edge sets are possible.• A backtracking algorithm can generate all relevant edge sets.• Example: for q below

– Ignore t– Consider: {{p}, {r, s}}

Room Name

Guest

Costt

r s

q

p

Room Name

Guest

Costt

r s

q

p

Page 27: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 27

Universal RelationUniversal Relation: lossless join over all relations (for all valid interpretations)If r1(R1), …, rn(Rn) and u = r1 … rn, then Riu = ri.

Case 1.

Case 2. r = Room Guest s = Guest Name q = Room Name R1 101 101 Smith R1 Kennedy R3 103 103 Carter R3 Carter

r s q = Room Guest Name R3 103 Carter

r = Room Guest s = Guest Name q = Room Name R1 101 101 Smith R1 Smith R3 103 103 Carter R3 Carter

r s q = Room Guest Name R1 101 Smith R3 103 Carter

Page 28: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 28

Outer Join

Instead of discarding “dangling tuples” (tuples that do notjoin), pad them with nulls and add them to the result.

r = Room Guest s = Guest Name q = Room Name R1 101 101 Smith R1 Kennedy R3 103 103 Carter R3 Carter

s q = Room Guest Name R3 103 Carter 101 Smith R1 Kennedy

r (s q) = Room Guest Name R3 103 Carter 101 Smith R1 Kennedy R1 101

Observe that the FDRoom Name fails.In general, we can ask,“Do we always get tothe same place.”

Page 29: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 29

Congruency• If the common properties of the objects in an object set S

coincide with the properties explicitly defined for S, S is congruent; otherwise S is incongruent.

• For ORM application models:– An object set E is an explicitly defined property for an object set S if

E is connected by a relationship set to S or to any generalizations of S.

– An object set C is a common property for an object set S if for any valid interpretation, all objects in S connect to one or more objects in C.

• For ORM hypergraphs, an object set is incongruent if it has a relationship set with an optional connection; otherwise it is congruent.

Page 30: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 30

Congruency – Same Semantics

Guest

NameRoom

Occupied Room Name

Guest

Room

q

r s

r

q

s

Guest

NameRoom

Occupied Room Name

Guest

Room

q

r s

r

q

s

Case 1: q is name of guest staying in room.

Incongruent

Congruent

Page 31: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 31

Congruency – Different Semantics

Room Room Name

Guest Guest NameOccupied Room

Name

Room Name

Guestr s

q

r s

qRoom Room Name

Guest Guest NameOccupied Room

Name

Room Name

Guestr s

q

r s

q

Case 2: q is name of room.

Incongruent

Congruent

Page 32: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 32

Semantic Equivalence

• If a subORM diagram S (a set of object and relationship sets constituting a valid diagram) is congruent and has a universal relation for all valid interpretations, semantic equivalence holds over S.

• Various ways to view this.– Fundamental idea: no conflicting semantics.– URA (Universal-Relation Assumption)

• join-project test• outer-join/FD-paths test

– URSA (Universal Relation-Scheme Assumption)• congruency test• all attributes have one and only one meaning• roles are introduced to resolve multiple meanings

Page 33: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 33

Union-Covered Isolated Root Generalization

Room Name

Guest Name

Room Name

Guest Name

Name

Room Name

Guest Name

Room Name

Guest Name

Name

Page 34: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 34

Partition-Covered Isolated Root Generalization

Room Phone#

Guest Phone#

Room Phone#

Guest Phone#

Phone#

Room Phone#

Guest Phone#

Room Phone#

Guest Phone#

Phone#

x(Guest Phone#(x) Room Phone#(x))x(Room Phone#(x) Guest Phone#(x))

Page 35: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 35

Head-Reduction Test

C

A

D

E

B

C

A

D

E

B

• Procedure– Remove proposed head component.

– Mark tails & run closure.

– If head object set marked, “yes”; else “no”.

• Observations– No head reduction is possible unless multiple arrow heads initially point at an object set.

– We must, of course, check for semantic equivalence.

?

Page 36: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 36

Head Reduction – Case 1(One Tail or Multiple Heads)

C

A

D

E

B

C

A

D

E

B

• Reduction:– For multiple heads, discard the tested head.– For one tail with one head, discard the edge.

• Preserves information– Join over the relevant edge set and project on object sets of edge. ACD(AB BC AD)

• Preserves constraints– A B, B C, A D A CD– Referential integrity and other constraints (trivially) OK

Page 37: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 37

Head Reduction – Case 2(Multiple Tails and Single Head)

A

B

C

E

D

B E

A

C

D A

B

C

E

D

B E

A

C

D

• Reduction:– Discard the head component.– Keep tails connected.

• Preserves information ABC(AB AD DC)

– Not enough (no B): AC(AD DC)

– Not correct: ABC(AD DC BE)

• Preserves constraints: A D, D C AB C

A B C A D D C B E1 1 1 1 4 4 1 1 62 2 2 2 5 5 2 2 7

A D C A C B1 4 1 1 1 12 5 2 1 1 2 2 2 1 2 2 2

Page 38: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 38

Tail-Reduction Test

B E

F

A

C D

B E

F

A

C D

• Procedure– Mark all tails of an edge except the tail proposed for removal. (Be sure to keep the unmarked

(questioned) tail component for the test.)– Run closure.– If head object sets marked, “yes”; else “no”.

• Observations– No tail reduction is possible unless the edge has multiple tails.– We must, of course, check for semantic equivalence.

?

Page 39: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 39

Tail Reduction – Case 1(Edge Used in Closure)

B E

A

CF

D

B E

A

CF

D

• Reduction: Discard the tail component.

• Preserves information– Join over relevant edge set and project on object sets of adjusted edge. ABC(AF FB AC)

• Preserves constraints– A C AB C– (note: forward transformation OK) A F, F B, AB C A C– Referential integrity and other constraints (trivially) OK

A B C A C A F F B A F B C1 3 5 1 5 1 1 1 3 1 1 3 52 4 6 2 6 2 2 2 4 2 2 4 6

Page 40: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 40

Tail Reduction – Case 2(Edge Not Used in Closure)

A

C

B E

A D

C

B E

D

B E

A

C

D A

C

B E

A D

C

B E

D

B E

A

C

D

• Reduction:– Discard the tail component.– Add connection among tails.

• Preserves information ABC(AB AC AD DC)

– Not enough (no B): AC(AC AD DC)

– Not correct: ABC(AC AD DC BE)

• Preserves constraints: – A C AB C– A D, D C A C

A B C A D D C B E1 1 1 1 4 4 1 1 62 2 2 2 5 5 2 2 7

A D C A C B1 4 1 1 1 12 5 2 1 1 2 2 2 1 2 2 2

Page 41: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 41

FD Equivalence Classes

• X Y for a set of FDs F if X Y F+ and Y X F+.– F = {A BC, BC D, D A, A D, BC E}– A D, A BC, BC D, but not BC E

is an equivalence relation, with equivalence classes.– reflexive, symmetric, transitive– {A, BC, D}

• trivial equivalence classes: {B}, {CE}, …• nontrivial, minimal-set equivalence class

– nontrivial & no composite set can be reduced– not: {AB, BC, D}– not: {B}

Page 42: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 42

Circularly Linked Equivalence Class

C

A

B

E

D

C

B

D

AE

C

A

B

E

D

C

B

D

AE

Page 43: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 43

Minimally Consolidated Equivalence Class

AE

C

B

D

C

B

D

AE

AE

C

B

D

C

B

D

AE

Page 44: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 44

Lexicalization

Guest Nr(of Current Guest)

AddedCost

Guest Nr(of Guest)Guest

CurrentGuest

Name

AddedCost

Name Guest Nr

Guest Nr(of Current Guest)

AddedCost

Guest Nr(of Guest)Guest

CurrentGuest

Name

AddedCost

Name Guest Nr

Page 45: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 45

Composite Lexicalization

Guest Nr (of Guest) AddressName

Zip (of Address)

Street Nr (of Address)

Guest Nr (of Guest) City (of Address)Name

Zip

Street Nr

CityGuest Nr (of Guest) AddressName

Zip (of Address)

Street Nr (of Address)

Guest Nr (of Guest) City (of Address)Name

Zip

Street Nr

City

Page 46: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 46

1-1 Nonlexical Object Sets

Guest Nr (of Guest | Account)

Amount (of Account Record)

Service (of Account Record)

Name

AccountRecord

Amount

Service

Guest | Account

Name

Guest Nr

AccountRecord

Amount

Service

AccountGuest

Name

Guest Nr

Guest Nr (of Guest | Account)

Amount (of Account Record)

Service (of Account Record)

Name

AccountRecord

Amount

Service

Guest | Account

Name

Guest Nr

AccountRecord

Amount

Service

AccountGuest

Name

Guest Nr

Page 47: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 47

Further Head Reductions

• Some head reductions are possible only after linking equivalence classes circularly.

• Figure 9.20 gives an example.– 9.20a: no head reductions possible– 9.20b: circular link added– 9.20c: head reduction possible

Page 48: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 48

Redundant NonFD Relationship Sets

Room0:*

will be staying in0:*

0:*

Guest hasreservationfor Roomon Date

0:*1:*

Guest

Date

Room0:*

will be staying in0:*

0:*

Guest hasreservationfor Roomon Date

0:*1:*

Guest

Date

• Looks redundant.• Is redundant if semantic equivalence holds.• Cycles always exist when there are redundant nonFD relationship

sets.• Removal preserves information.• Removal preserves constraints.

Page 49: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 49

Information Preservation – Sufficient for NonFD Edge Reduction

A

E

D C

B

A

D C

BA

D C

B

A

E

D C

B

A

D C

BA

D C

B

Transformable if:

AD = AD(AB BC CD)

Cycles must include the entirehyperedge. We cannot removeADE because we cannot recoverthe connections to objects in E.

Note that the cycles we need have nothing to do with the direction of edges.

Page 50: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 50

Head/Tail Reductions Can Yield a Redundant NonFD Relationship Set

D

F

E

AC

B

D

F

E

A

C

B

D

A

C

F

E

B

D

F

E

AC

B

D

F

E

A

C

B

D

A

C

F

E

B

Page 51: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 51

N-ary Relationship-Set Reduction

Room

View

Guest

Date

Guest

View

Date

Room

Guest has reservation onDate for Room with View

Guest has reservationon Date for Room

has

Room

View

Guest

Date

Guest

View

Date

Room

Guest has reservation onDate for Room with View

Guest has reservationon Date for Room

has

Guest Date Room View = Guest Date Room Room View G1 10 May R1 Sea G1 10 May R1 R1 Sea G1 10 May R1 Forest G1 15 May R2 R1 Forest G1 15 May R2 Forest G2 15 May R1 R2 Forest G2 15 May R1 Sea G2 15 May R1 Forest

Page 52: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 52

Non-reducible N-ary Edge

Room

Guest

Date

Guest has reservationon Date for Room

Room

Guest

Date

Guest has reservationon Date for Room

r = Guest Date Room G1 10 May R1 G1 15 May R2 G2 15 May R1

p = Guest Date q = Date Room s = Guest Room G1 10 May 10 May R1 G1 R1 G1 15 May 15 May R2 G1 R2 G2 15 May 15 May R1 G2 R1

r p qr q sr p sr p q s

Page 53: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 53

Properly Embedded FD – Example

Date

Room

Picky Guest

Picky Guest makes reservationonly for particular Room

Picky Guest hasreservation forRoom on Date

Date

Room

Picky Guest

Picky Guest makes reservationonly for particular Room

Picky Guest hasreservation forRoom on Date

• The second relation can be obtained by projection from the first.• We can discard the second, BUT:

– We must keep the FD.• We can add it as a co-occurrence constraint.• The constraint, however, is not a key constraint and is “hard” to enforce.

– There is another problem: redundancy – for each picky guest reservation we store the same room.

Room Date Picky Guest R1 10 May G1 R1 15 May G1 R5 10 May G3

Room Picky Guest R1 G1 R5 G3

Page 54: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 54

Properly Embedded FD – Example

Room Date Picky Guest = Room Picky Guest Date Picky Guest R1 10 May G1 R1 G1 10 May G1 R1 15 May G1 R5 G3 15 May G1 R5 10 May G3 10 May G3

• The transformation preserves information.• The transformation preserves constraints.

Picky Guest

Date

Room

Date

Room

Picky Guest

Room, Date -> Picky Guest

Picky Guest

Date

Room

Date

Room

Picky Guest

Room, Date -> Picky Guest

Page 55: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 55

Properly Embedded FD – Definition

DA

C

B

DA

C

B

• R is the set of object sets of an edge (or eq. class) (e.g., ABC)• XY R (e.g., AC ABC) • X Y = (e.g., A C = )• X Y is an FD edge or implied by the FD edges (e.g., implied)• Semantic equivalence holds for X Y with respect to R• X+ R (e.g., C+ = ACD ABC)

An FD X Y (e.g., C A) is a properly embedded FD if:

Page 56: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 56

Properly Embedded FD – Reduction

B

C

D

A

DA

C

B AB -> CB

C

D

A

DA

C

B AB -> C

• Project R on R - Y (e.g., ABC - A = BC) and add it as a non-directed edge E.• Discard R.• Create a high-level relationship set S over E (e.g., BC) and the relevant edge

set used to imply X Y (e.g., C D and D A) with R (e.g., ABC) as its object sets.

• Specify the original FD of R (e.g., AB C) as a co-occurrence constraint on S.

An edge R (e.g., ABC) with a properly embedded FD X Y(e.g., C A) is reduced as follows:

Page 57: Chapter 9 - 1 Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,

Chapter 9 - 57

Properly Embedded FDs in Equivalence Classes

A

C

D

B

DC

A B

BC -> D

A

C

D

B

DC

A B

BC -> D

• The FD A B is embedded in the equivalence class {AC, BC, D}.• We disconnect the head of the embedded FD, but keep its tail.• Keeping the tail allows us to preserve information.• We then make the transformed diagram preserve constraints.