Your name here

Your name here. Improving Schemas and Normalization What are redundancies and anomalies? What are functional dependencies and how are they related to


Improving Schemas and Normalization

• What are redundancies and anomalies?
• What are functional dependencies and how are they related to schema quality?
• What is a superkey?
• What is an inference rule and how can we infer functional dependencies?
• How are keys determined by functional dependencies?
• How can we modify a schema to improve it?
• What are normal forms and why are they important?


Redundancy and Anomalies in Relation Schemas

videoId  dateAcquired  movieId  title                  genre            length  rating
115      1/25/98       101      The Thirty-Nine Steps  mystery          101     PG
90987    2/5/97        450      Elizabeth              costume drama    123     PG-13
145      12/31/95      145      Lady and the Tramp     animated drama   93      G
8034     4/5/98        145      Lady and the Tramp     animated drama   93      G
90988    4/5/98        450      Elizabeth              costume drama    123     PG-13
90989    3/25/86       450      Elizabeth              costume drama    123     PG-13
543      5/12/95       101      The Thirty-Nine Steps  mystery          101     R
1243     4/29/91       123      Annie Hall             romantic comedy  110     R

• Anomalies occur when data is inconsistent
• Redundancy of values is the source of anomalies
• Update anomaly occurs when values are inconsistent
  – e.g., if title, genre, length or rating is changed in only one or two of the green rows


Redundancy and Anomalies in Relation Schemas

• Anomalies occur when data is inconsistent
• Redundancy of values is the source of anomalies
• Deletion anomaly caused by deletion of row with videoId 1243 (pink)
  – Information about the movie is deleted along with the video
• Insertion anomaly caused by last row (blue)
  – Length and rating are inconsistent with other rows

videoId  dateAcquired  movieId  title                  genre            length  rating
115      1/25/98       101      The Thirty-Nine Steps  mystery          101     PG
90987    2/5/97        450      Elizabeth              costume drama    123     PG-13
145      12/31/95      145      Lady and the Tramp     animated drama   93      G
8034     4/5/98        145      Lady and the Tramp     animated drama   93      G
90988    4/5/98        450      Elizabeth              costume drama    123     PG-13
90989    3/25/86       450      Elizabeth              costume drama    123     PG-13
543      5/12/95       101      The Thirty-Nine Steps  mystery          101     R
1243     4/29/91       123      Annie Hall             romantic comedy  110     R
114      6/5/98        450      Elizabeth              costume drama    110     R


Functional Dependencies Between Attributes

• A functional dependency is a strong connection between two or more attributes in a table.
  – One attribute is functionally dependent on another attribute when any two rows of the table that have the same value of the second attribute must have the same value for the first
• Example: movieId determines title, genre, length, rating
  – Each row with movieId 123 has the same values for the other attributes
  – FD2: movieId → {title, genre, length, rating}

VideoMovie:(videoId, dateAcquired, movieId, title, genre, length, rating)
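The definition above can be checked mechanically against sample rows. The following is a minimal sketch, not part of the original slides; the helper name `fd_holds` and the dictionary-per-row layout are assumptions made for illustration. It tests whether any two rows that agree on the left side also agree on the right side.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


# A few rows from the VideoMovie table (some columns omitted for brevity)
videos = [
    {"videoId": 90987, "movieId": 450, "title": "Elizabeth", "length": 123, "rating": "PG-13"},
    {"videoId": 90988, "movieId": 450, "title": "Elizabeth", "length": 123, "rating": "PG-13"},
    {"videoId": 1243, "movieId": 123, "title": "Annie Hall", "length": 110, "rating": "R"},
]
ok_before = fd_holds(videos, ["movieId"], ["title", "length", "rating"])

# Row 114 repeats movieId 450 with a different length and rating
videos.append({"videoId": 114, "movieId": 450, "title": "Elizabeth", "length": 110, "rating": "R"})
ok_after = fd_holds(videos, ["movieId"], ["title", "length", "rating"])

print(ok_before, ok_after)  # True False
```

A check like this can only show that data violates a dependency. Whether a dependency should hold is a design decision based on the meaning of the schema.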


City, State, Zipcode Dependencies

• FD4: zipcode → {city, state}
• FD5: {street, city, state} → zipcode

accountId  lastName   firstName  street                 city       state  zipcode
101        Block      Jane       123 Main St.           Apopka     FL     30458
102        Hamilton   Cherry     3230 Dade St.          Dade City  FL     30555
103        Harrison   Kate       103 Dodd Hall          Apopka     FL     30457
104        Breaux     Carroll    76 Main St.            Apopka     FL     30457
106        Morehouse  Anita      9501 Lafayette St.     Houma      LA     44099
111        Deaux      Jane       123 Main St.           Apopka     FL     30458
201        Greaves    Joseph     14325 N. Bankside St.  Godfrey    IL     43580

Customer:(accountId, lastName, firstName, street, city, state, zipcode)


Superkeys and Keys

• A key constraint is a functional dependency
• Example: accountId is key of Customer
  – FD6: accountId → {lastName, firstName, street, city, state, zipcode}
• A superkey is a set of attributes that determines the rest of the attributes of a schema
  – FD7: {accountId, lastName} → {firstName, street, city, state, zipcode}

Customer:(accountId, lastName, firstName, street, city, state, zipcode)


Using Functional Dependencies

• Functional dependencies are used for
  – Determining keys
  – Finding sources of redundancy and hence trouble
• Functional dependencies are declared
  – Designer defines FDs based on the semantics of the schemas
  – Additional dependencies can be found from those that are declared
• Keys and redundancies are based on the full set of FDs
  – All declared FDs
  – FDs inferred by applying inference rules


Inferring Additional Functional Dependencies

• Main inference rules
  – Rule 1: Reflexivity, a set of attributes X determines any subset Y of itself: if Y ⊆ X, then X → Y.
  – Rule 2: Augmentation, a set of attributes Z can be added to both sides of X → Y: if X → Y, then XZ → YZ.
  – Rule 3: Transitivity, we can follow chains of dependencies from X to Y to Z: if X → Y and Y → Z, then X → Z.
• Additional rules for convenience
  – Rule 4: Decomposition, we can remove a set of attributes Z from the right side of X → YZ: if X → YZ, then X → Y.
  – Rule 5: Union, we can put two dependencies X → Y and X → Z together if they have the same left side X: if X → Y and X → Z, then X → YZ.
  – Rule 6: Pseudo-transitivity, a combination of augmentation by adding W to both sides of X → Y and transitivity in going from WX to WY to Z: if X → Y and WY → Z, then WX → Z.
• Apply rules to FDs to find new FDs
  – Closure is the set of all FDs that can be inferred
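Listing every FD in the closure quickly becomes impractical, so a common shortcut is to compute the closure of a set of attributes instead: X → Y can be inferred exactly when Y is contained in the attribute closure of X. The sketch below is illustrative only and is not taken from the slides; the names `closure` and `implies`, and the representation of an FD as a pair of Python sets, are assumptions.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds by the inference rules."""
    return set(rhs) <= closure(lhs, fds)


fd4 = ({"zipcode"}, {"city", "state"})
fd6 = ({"accountId"}, {"lastName", "firstName", "street", "city", "state", "zipcode"})
declared = [fd4, fd6]

# FD7 follows from FD6
fd7_inferred = implies(
    declared,
    {"accountId", "lastName"},
    {"firstName", "street", "city", "state", "zipcode"},
)
# zipcode does not determine street
zip_to_street = implies(declared, {"zipcode"}, {"street"})
print(fd7_inferred, zip_to_street)  # True False
```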


Example of Inference

• Consider how to infer FD7 from FD6
  – FD6: accountId → {lastName, firstName, street, city, state, zipcode}
  – FD7: {accountId, lastName} → {firstName, street, city, state, zipcode}
• Infer FD8 with augmentation: add lastName to both sides (the right side already contains lastName, so it is unchanged)
  – FD8: {accountId, lastName} → {firstName, street, city, state, zipcode, lastName}
• Use decomposition: remove lastName from right side
  – FD7: {accountId, lastName} → {firstName, street, city, state, zipcode}


Determining Keys from Functional Dependencies

• Start with closure of functional dependencies
• Any functional dependency that includes all attributes has a superkey as the left side
• If no proper subset of the left side is a superkey
  – The left side is a key
• A set of attributes is a key if and only if the above holds
• Some terminology
  – Key is a minimal set of attributes that determines all other attributes
  – Key attribute is an attribute that is part of a key
  – Non-key attribute is an attribute that is not part of any key
  – Primary key is one of the keys that has been selected to identify the objects of the schema
  – Secondary key is a key that is not the primary key
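The procedure above can be sketched as a brute-force search: try attribute sets from smallest to largest, keep those whose closure covers the whole schema, and skip any set that contains a key already found. This is an illustrative sketch only; `closure` and `candidate_keys` are hypothetical helper names, and the search is exponential in the number of attributes, so it suits only small schemas. The example uses the street, city, state, zipcode schema with FD4 and FD5.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    attributes = frozenset(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


fd4 = ({"zipcode"}, {"city", "state"})
fd5 = ({"street", "city", "state"}, {"zipcode"})
keys = candidate_keys({"street", "city", "state", "zipcode"}, [fd4, fd5])
for key in keys:
    print(sorted(key))
# ['street', 'zipcode']
# ['city', 'state', 'street']
```

Every attribute of this schema appears in at least one key, so all four are key attributes.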


Normalization

• Normalization is the process of transforming some objects into a structural form that satisfies some collection of rules
• Any schema that is in normal form is guaranteed to have certain quality characteristics
• Each normal form has a rule that describes what kinds of functional dependencies the normal form allows.
  – Normalization is the process of transforming schemas in order to remove violations of the normal form rules.
  – Normalization is applied independently to each relation schema in a database schema.
  – A database schema is said to be in normal form if each of its relation schemas is in the normal form.


Third Normal Form

• A relation schema is in third normal form (3NF) if for every functional dependency
  – The left side (determinant) is a superkey or
  – The right side attributes are all key attributes
• A functional dependency is a 3NF violation if
  – The left side is not a superkey and
  – The right side includes at least one non-key attribute
• Consider the schema and FDs
  – VideoMovie:(videoId, dateAcquired, movieId, title, genre, length, rating)
  – FD1: movieId → title
  – FD2: movieId → {title, genre, length, rating}
  – FD9: videoId → {dateAcquired, movieId}
  – FD10: videoId → movieId
  – FD11: videoId → {title, genre, length, rating}
  – FD12: videoId → {dateAcquired, movieId, title, genre, length, rating}
• FD1, FD2 are 3NF violations
• FD9, FD10, FD11, FD12 are not 3NF violations because videoId (left side) is a key
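The 3NF test can be expressed as a short check over the declared FDs. The sketch below is an illustration under stated assumptions, not the slides' own procedure: `closure` and `violations_3nf` are hypothetical names, the keys are supplied by the caller, and an FD counts as a violation when its left side is not a superkey and its right side is not made up entirely of key attributes (the negation of the 3NF condition).

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_3nf(attributes, named_fds, keys):
    """Return the names of the FDs that violate 3NF for the given keys."""
    fds = list(named_fds.values())
    key_attrs = set().union(*keys)
    found = []
    for name, (lhs, rhs) in named_fds.items():
        nontrivial = rhs - lhs  # ignore attributes repeated on both sides
        is_superkey = closure(lhs, fds) >= attributes
        if nontrivial and not is_superkey and not nontrivial <= key_attrs:
            found.append(name)
    return found


video_movie = {"videoId", "dateAcquired", "movieId", "title", "genre", "length", "rating"}
named_fds = {
    "FD1": ({"movieId"}, {"title"}),
    "FD2": ({"movieId"}, {"title", "genre", "length", "rating"}),
    "FD9": ({"videoId"}, {"dateAcquired", "movieId"}),
    "FD10": ({"videoId"}, {"movieId"}),
}
violations = violations_3nf(video_movie, named_fds, keys=[{"videoId"}])
print(violations)  # ['FD1', 'FD2']
```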


Decomposition

• Remove violations by decomposition
  – Create a new schema from FD
  – Remove right hand attributes of FD from original schema
  – Left side of FD becomes foreign key in original schema
• Consider the schema and 3NF violations
  – VideoMovie:(videoId, dateAcquired, movieId, title, genre, length, rating)
  – FD1: movieId → title
  – FD2: movieId → {title, genre, length, rating}
• Can decompose by either FD1 or FD2
  – Better to use the larger FD
• New schemas
  – Video: (videoId, dateAcquired, movieId references Movie)
  – Movie: (movieId, title, genre, length, rating)
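The decomposition steps can be sketched as a small function over attribute lists. This is an illustrative sketch; the function name `decompose` and the list and set representation are assumptions, and foreign key bookkeeping is left as a comment rather than modeled.

```python
def decompose(schema, fd):
    """Split schema by fd = (lhs, rhs); return (remaining schema, new schema)."""
    lhs, rhs = fd
    # Right side attributes move to the new schema
    moved = [a for a in schema if a in rhs and a not in lhs]
    # New schema: left side of the FD (its key) followed by the moved attributes
    new_schema = [a for a in schema if a in lhs] + moved
    # Original schema keeps the left side, which now acts as a foreign key
    remaining = [a for a in schema if a not in moved]
    return remaining, new_schema


video_movie = ["videoId", "dateAcquired", "movieId", "title", "genre", "length", "rating"]
fd2 = ({"movieId"}, {"title", "genre", "length", "rating"})

video, movie = decompose(video_movie, fd2)
print(video)  # ['videoId', 'dateAcquired', 'movieId']
print(movie)  # ['movieId', 'title', 'genre', 'length', 'rating']
```

Decomposing by FD1 instead would move only title, leaving genre, length and rating behind with FD2 still violated, which is why the larger FD is the better choice.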


First and Second Normal Form

• The traditional presentation of 3NF includes two other normal forms: 1NF and 2NF
  – E.F. Codd (1970) defined several normal forms
  – Subsequent analysis simplified the definition of 3NF
• 1NF specifies that every attribute must be single valued
  – 1NF has been incorporated into definition of relational model
• 2NF makes a technical distinction about why an FD is a violation
  – The goal of normalization is to achieve 3NF
  – 2NF is an intermediate step and never a goal of normalization


Boyce-Codd Normal Form

• A schema is in BCNF if every functional dependency has a superkey as its determinant
  – No exclusion for key attributes in right side
• Important in the context of multi-attribute keys
• Consider the example of schema and FD
  – R6: (street, city, state, zipcode, secondary key {street, zipcode})
  – FD4: zipcode → {city, state}
• FD4 is a BCNF violation even though city and state are key attributes
  – FD4 is not a 3NF violation
• Decomposition of R6 by FD4 into R7 and R8 in BCNF
  – R7: (street, zipcode references R8)
  – R8: (zipcode, city, state)
• Note that the schemas have one key each.
  – Multiple keys have been removed by decomposition
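The BCNF rule is simpler to check than 3NF because it needs no list of key attributes: every nontrivial FD must have a superkey on its left side. The following is a minimal sketch with hypothetical helper names (`closure`, `bcnf_violations`), applied to R6 and to the decomposed schemas R7 and R8.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the nontrivial FDs whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= attributes
    ]


fd4 = ({"zipcode"}, {"city", "state"})
fd5 = ({"street", "city", "state"}, {"zipcode"})

r6 = bcnf_violations({"street", "city", "state", "zipcode"}, [fd4, fd5])
r7 = bcnf_violations({"street", "zipcode"}, [])          # no nontrivial FDs remain
r8 = bcnf_violations({"zipcode", "city", "state"}, [fd4])
print(len(r6), len(r7), len(r8))  # 1 0 0
```

Note that FD5 cannot be checked within R7 or R8 alone after the split, since its attributes are spread across both schemas.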


Case in Point: Normalizing a Car Registration Schema

• Example from text
  – Illustrates the way that normalization can be a source of schema definition
• Process of design
  – Define relevant FDs
  – Apply inference rules
  – Normalize
  – Rename resulting schemas

[Figure: State of Florida Department of Motor Vehicles, Vehicle Registration Certificate. Fields shown include decal number, expiration date, tag number, title number, vehicle identification number, year, make, type, color, weight/length, class, owner name and address, first and second owner driver license numbers, insurance, date issued, and tag, title and grand total fees.]