6
12/12/2005 8:10 PM CS 435 – Fall 2005 1 Final Review Final Review Dr. Rafal A. Angryk Dr. Rafal A. Angryk Outline Outline Covered readings in textbooks Covered readings in textbooks Regular type of test on the final (single Regular type of test on the final (single choice!) choice!) A few exercises A few exercises ER Normalization Readings in textbook (1) Readings in textbook (1) Part 1 Introduction and Conceptual Part 1 Introduction and Conceptual Modeling Modeling Chapter 1.1 Chapter 1.1- 6: 6: Databases and Databases and database Users database Users Chapter 2.1 Chapter 2.1-7: System concepts and System concepts and Architecture Architecture Chapter 3.1 Chapter 3.1- 9: Modeling using the E 9: Modeling using the E- R diagram R diagram Readings in textbook (2) Readings in textbook (2) Part 2 The Relational Model Part 2 The Relational Model Chapter 5: Concepts and Constraints Chapter 5: Concepts and Constraints Chapter 6.1 Chapter 6.1- 7: Relational Algebra 7: Relational Algebra Section 7.1 Section 7.1- 3: Mapping ER diagrams 3: Mapping ER diagrams to tables to tables Chapter 8.1 Chapter 8.1- 8, 9.: SQL schema 8, 9.: SQL schema definition, constraints, queries, views definition, constraints, queries, views Readings in textbook (3) Readings in textbook (3) Part 3 Database Design Part 3 Database Design Chapter 10.1 Chapter 10.1- 6: Functional 6: Functional dependencies and normalization dependencies and normalization Chapter 11.Intro: Two approaches to Chapter 11.Intro: Two approaches to normalization normalization Readings in textbook (4) Readings in textbook (4) Part 4 Data storage, indexing, Part 4 Data storage, indexing, query processing query processing Chapter 13: Disk storage, file Chapter 13: Disk storage, file structures, & hashing structures, & hashing Chapter 14: Indexing structures for Chapter 14: Indexing structures for files files Section 15.1 Section 15.1- 3, 15.7: Query 3, 15.7: Query optimization & algorithms for join optimization & algorithms for join Chapter 16 Practical design & tuning Chapter 16 Practical design & tuning

27 - Review for Final - Montana State University 8:10 PM CS 435 – Fall 2005 5 Normalization to BCNF – Example (2) Universal Table 5 Zane 2 14:00 Smith 4 Robert 1 13:00 McCoy 3

  • Upload
    vumien

  • View
    217

  • Download
    1

Embed Size (px)

Citation preview

Page 1: 27 - Review for Final - Montana State University 8:10 PM CS 435 – Fall 2005 5 Normalization to BCNF – Example (2) Universal Table 5 Zane 2 14:00 Smith 4 Robert 1 13:00 McCoy 3

12/12/2005 8:10 PM

CS 435 – Fall 2005

1

Final ReviewFinal Review

Dr. Rafal A. AngrykDr. Rafal A. Angryk

OutlineOutline

Covered readings in textbooks Covered readings in textbooks Regular type of test on the final (single Regular type of test on the final (single choice!)choice!)A few exercisesA few exercises

ERNormalization

Readings in textbook (1)Readings in textbook (1)

Part 1 Introduction and Conceptual Part 1 Introduction and Conceptual ModelingModeling

Chapter 1.1Chapter 1.1--6:6: Databases and Databases and database Users database Users Chapter 2.1Chapter 2.1--77: System concepts and System concepts and ArchitectureArchitectureChapter 3.1Chapter 3.1--9: Modeling using the E9: Modeling using the E--R diagramR diagram

Readings in textbook (2)Readings in textbook (2)

Part 2 The Relational ModelPart 2 The Relational ModelChapter 5: Concepts and ConstraintsChapter 5: Concepts and ConstraintsChapter 6.1Chapter 6.1--7: Relational Algebra7: Relational AlgebraSection 7.1Section 7.1--3: Mapping ER diagrams 3: Mapping ER diagrams to tablesto tablesChapter 8.1Chapter 8.1--8, 9.: SQL schema 8, 9.: SQL schema definition, constraints, queries, viewsdefinition, constraints, queries, views

Readings in textbook (3)Readings in textbook (3)

Part 3 Database DesignPart 3 Database DesignChapter 10.1Chapter 10.1--6: Functional 6: Functional dependencies and normalizationdependencies and normalizationChapter 11.Intro: Two approaches to Chapter 11.Intro: Two approaches to normalizationnormalization

Readings in textbook (4)Readings in textbook (4)

Part 4 Data storage, indexing, Part 4 Data storage, indexing, query processingquery processing

Chapter 13: Disk storage, file Chapter 13: Disk storage, file structures, & hashingstructures, & hashingChapter 14: Indexing structures for Chapter 14: Indexing structures for filesfilesSection 15.1Section 15.1--3, 15.7: Query 3, 15.7: Query optimization & algorithms for joinoptimization & algorithms for joinChapter 16 Practical design & tuningChapter 16 Practical design & tuning

Page 2: 27 - Review for Final - Montana State University 8:10 PM CS 435 – Fall 2005 5 Normalization to BCNF – Example (2) Universal Table 5 Zane 2 14:00 Smith 4 Robert 1 13:00 McCoy 3

12/12/2005 8:10 PM

CS 435 – Fall 2005

2

Readings in textbook (5)Readings in textbook (5)Part 5 Transaction Processing Part 5 Transaction Processing conceptsconcepts

Chapter 19Chapter 19--19.4: Database recovery 19.4: Database recovery techniquestechniques

This time This time -- regular type of regular type of test for the theory parttest for the theory part

Good news:Good news:Just a single answer will be correctJust a single answer will be correctStrategy: CHOOSE THE BEST ANSWERStrategy: CHOOSE THE BEST ANSWER

Bad news:Bad news:Still tricky… Otherwise it would be trivial!

Why? There is no way you can be good Why? There is no way you can be good in practice, if you do not have strong in practice, if you do not have strong foundations in theoryfoundations in theoryAs usually there will be more practice As usually there will be more practice than theorythan theory

Be sure to practice at home:Be sure to practice at home:Prepare ER Design in Min, Max notation, Prepare ER Design in Min, Max notation, and map it to the Relational Diagramand map it to the Relational DiagramTransform a Universal Relation to 3NF or Transform a Universal Relation to 3NF or BCNF tables, and find a minimal set of BCNF tables, and find a minimal set of full functional dependenciesfull functional dependenciesRelational Algebra and SQL CommandsRelational Algebra and SQL CommandsQuery optimizationQuery optimizationInsert and delete data to/from B+Insert and delete data to/from B+--treestreesRecommend the best type of indexing for Recommend the best type of indexing for given attributes and storage formatsgiven attributes and storage formats

Exe. Exe. -- Model the following Model the following System Requirements:System Requirements:1.1. The bank is organized into branches. Each The bank is organized into branches. Each

branch is located in a particular city and is branch is located in a particular city and is identified by a unique name. The bank identified by a unique name. The bank monitors the assets of each branch.monitors the assets of each branch.

2.2. Bank customers are identified by their Bank customers are identified by their customercustomer--id id values. The bank stores each values. The bank stores each customer's name, and the street and city customer's name, and the street and city where the customer lives. Customers may where the customer lives. Customers may have multiple saving accounts and can take have multiple saving accounts and can take out loans. A customer may be associated out loans. A customer may be associated with a particular banker, who may act as with a particular banker, who may act as either a loan officer or a personal banker for either a loan officer or a personal banker for that customer. that customer.

Exe. Exe. –– cont.cont.3.3. Bank employees are identified by their Bank employees are identified by their employeeemployee--

idid values. The bank administration stores the values. The bank administration stores the name and telephone number of each employee, name and telephone number of each employee, and the and the employeeemployee--idid number of the employee's number of the employee's manager. The bank also keeps track of the manager. The bank also keeps track of the employee's start date and, thus, length of employee's start date and, thus, length of employment. employment.

4.4. The bank offers only two types of services The bank offers only two types of services --saving accounts and loans. Accounts can be held saving accounts and loans. Accounts can be held by more than one customer, and a customer can by more than one customer, and a customer can have more than one account. Each account is have more than one account. Each account is assigned a unique account number. The bank assigned a unique account number. The bank maintains a record of each account's balance. It maintains a record of each account's balance. It also records all dates when the account was also records all dates when the account was accessed by each customer holding the account. accessed by each customer holding the account. Each savings account has a separate interest rate Each savings account has a separate interest rate and information about the branch name where it and information about the branch name where it was opened.was opened.

Exe. Exe. –– cont.cont.5.5. A loan originates at a particular branch and A loan originates at a particular branch and

can be held by one or more customers. A can be held by one or more customers. A loan is identified by a unique loan number. loan is identified by a unique loan number. For each loan, the bank keeps track of the For each loan, the bank keeps track of the loan amount of the loan payments. Although loan amount of the loan payments. Although a loan payment number does not uniquely a loan payment number does not uniquely identify a particular payment among those identify a particular payment among those for all the bank's loans, a payment number for all the bank's loans, a payment number does identify a particular payment for a does identify a particular payment for a specific loan. The date and amount are specific loan. The date and amount are recorded for each payment recorded for each payment Note: Do not worry about making this Note: Do not worry about making this description complete. Just design data description complete. Just design data model so it would reflect all requirements model so it would reflect all requirements mentioned abovementioned above

Page 3: 27 - Review for Final - Montana State University 8:10 PM CS 435 – Fall 2005 5 Normalization to BCNF – Example (2) Universal Table 5 Zane 2 14:00 Smith 4 Robert 1 13:00 McCoy 3

12/12/2005 8:10 PM

CS 435 – Fall 2005

3

Now map it to Relational Now map it to Relational Model and practice SQL Model and practice SQL on iton it

E.g. Retrieve all bank account numbers E.g. Retrieve all bank account numbers (only unique values), which were opened (only unique values), which were opened in Bozeman, and accessed by Mr. John in Bozeman, and accessed by Mr. John Smith (i.e. Smith (i.e. ‘‘John SmithJohn Smith’’) on 10th of ) on 10th of February (i.e. February (i.e. ‘‘Feb/05/2005Feb/05/2005’’).).Practice optimization of your queriesPractice optimization of your queriesPut the same in Relational AlgebraPut the same in Relational AlgebraWhat type of indexes would you specify?What type of indexes would you specify?Are your tables normalized?Are your tables normalized?

ER Design DecisionsER Design DecisionsThe use of an attribute or entity type (set) to The use of an attribute or entity type (set) to represent an object.represent an object.Whether a realWhether a real--world concept is best expressed world concept is best expressed by an entity set or a relationship type (set).by an entity set or a relationship type (set).The use of a ternary relationship versus a pair of The use of a ternary relationship versus a pair of binary relationships.binary relationships.The use of a strong or weak entity set.The use of a strong or weak entity set.

The use of specialization/generalization –contributes to modularity in the design.The use of aggregation – can treat the aggregate entity set as a single unit without concern for the details of its internal structure.

ER Design Crucial ElementsER Design Crucial ElementsUse of entity types vs. attributesUse of entity types vs. attributesChoice mainly depends on the structure of the Choice mainly depends on the structure of the enterprise being modeled, and on the semantics enterprise being modeled, and on the semantics associated with the attribute in question.associated with the attribute in question.Use of entity types (sets) vs. relationship types (sets)Use of entity types (sets) vs. relationship types (sets)Possible guideline is to designate a relationship set Possible guideline is to designate a relationship set to describe an action that occurs between entitiesto describe an action that occurs between entitiesBinary versus Binary versus nn--ary relationship typesary relationship typesAlthough it is possible to replace any nonbinary (Although it is possible to replace any nonbinary (nn--ary, for ary, for nn > 2) relationship set by a number of > 2) relationship set by a number of distinct binary relationship sets, a distinct binary relationship sets, a nn--ary ary relationship set shows more clearly that several relationship set shows more clearly that several entities participate in a single relationship.entities participate in a single relationship.Appropriate placement of relationship attributesAppropriate placement of relationship attributes

More on normalizationMore on normalization

BoyceBoyce--CoddCodd Normal FormNormal Form((BCNF)BCNF)One more example of One more example of normalizationnormalization

Why BCNF is importantWhy BCNF is important

When a relation has more than one When a relation has more than one candidate key, anomalies may result candidate key, anomalies may result even though the relation is in 3NF.even though the relation is in 3NF.3NF does not deal satisfactorily with 3NF does not deal satisfactorily with the case of a relation with overlapping the case of a relation with overlapping candidate keyscandidate keys

i.e. composite candidate keys with at least one attribute in common.

Page 4: 27 - Review for Final - Montana State University 8:10 PM CS 435 – Fall 2005 5 Normalization to BCNF – Example (2) Universal Table 5 Zane 2 14:00 Smith 4 Robert 1 13:00 McCoy 3

12/12/2005 8:10 PM

CS 435 – Fall 2005

4

ReminderReminder

SuperkeySuperkey of relation schema R of relation schema R -- a set of a set of attributes S of R that contains a key of Rattributes S of R that contains a key of RA relation schema R is in A relation schema R is in third normal third normal form (3NF)form (3NF) if whenever a FD X if whenever a FD X --> > A A holds in R, then either: holds in R, then either:

(a) X is a (a) X is a superkeysuperkey of R, or of R, or (b) A is a prime attribute of R(b) A is a prime attribute of R

NOTE:NOTE:BoyceBoyce--CoddCodd normal form disallows normal form disallows condition (b) abovecondition (b) above

What is a Prime AttributeWhat is a Prime Attribute

AA prime attributeprime attribute must bemust be a a member of some candidate keymember of some candidate key

AA nonprime nonprime (or(or nonkeynonkey) ) attributeattribute is not a prime attribute is not a prime attribute -- that is, it is that is, it is not a member of any not a member of any candidate keycandidate key..

In other words In other words ……A relation is in A relation is in 3NF3NF if every if every nonkeynonkey attribute is attribute is in 2NF and in 2NF and nontransitively dependent on nontransitively dependent on a candidate keya candidate key

BCNFBCNF is based on the concept of a is based on the concept of a determinantdeterminant..

A determinant is any attribute (simple or A determinant is any attribute (simple or composite) on which some other attribute is composite) on which some other attribute is fullyfully functionally dependentfunctionally dependent

A relation is in A relation is in BCNFBCNF if and only ifif and only if every every determinant is a candidate keydeterminant is a candidate keyi.e. whenever an FD X i.e. whenever an FD X --> > A holds in R, then X A holds in R, then X must be a must be a superkeysuperkey of Rof R

RememberRemember

Each normal form is strictly stronger Each normal form is strictly stronger than the previous onethan the previous one

Every 2NF relation is in 1NFEvery 2NF relation is in 1NFEvery 3NF relation is in 2NFEvery 3NF relation is in 2NFEvery BCNF relation is in 3NFEvery BCNF relation is in 3NF

BCNF is a simpler, yet stricter form of BCNF is a simpler, yet stricter form of 3NF3NFNOTE: There exist relations that are in NOTE: There exist relations that are in 3NF but not in BCNF3NF but not in BCNF

The theoryThe theoryConsider the following relation and Consider the following relation and FDsFDs: : R(R(a,ba,b,c,d,c,d))

(1) (1) a,ca,c --> > b,db,d(2) (2) a,da,d --> b> b

Here, the first determinant suggests that the primary Here, the first determinant suggests that the primary key of R could be changed from key of R could be changed from a,ba,b to to a,ca,c. . If this change was done all of the nonIf this change was done all of the non--key attributes key attributes present in R could still be determined, and therefore present in R could still be determined, and therefore this change is legal. this change is legal. However, the second determinant indicates that However, the second determinant indicates that a,da,ddetermines determines bb, but, but a,da,d could not be the key of R as could not be the key of R as a,da,d does not determine does not determine allall of the non key attributes of the non key attributes of R (it does not determine of R (it does not determine cc). ). We would say that the determinant in the first FD is We would say that the determinant in the first FD is a candidate key, but the second determinant is not a a candidate key, but the second determinant is not a candidate key, and thus candidate key, and thus this relation is not in BCNFthis relation is not in BCNFHowever, R is in 3NF since in the second FD However, R is in 3NF since in the second FD bb is a is a prime attribute of Rprime attribute of R

Normalization to BCNF Normalization to BCNF ––Example (1)Example (1)

The CLINIC relation above depicts a special The CLINIC relation above depicts a special weightweight--loss surgery clinic where the each loss surgery clinic where the each patient has 4 appointments:patient has 4 appointments:On the first they are weighed, On the first they are weighed, On the second they are exercised, On the second they are exercised, On the third they have gastric surgery, On the third they have gastric surgery, And on the fourth they are reAnd on the fourth they are re--examined and examined and given diet tips to not gain the weight againgiven diet tips to not gain the weight againNot all patients need all four appointments! Not all patients need all four appointments! Appointment 1 is either 09:00 or 13:00, Appointment 1 is either 09:00 or 13:00, appointment 2 10:00 or 14:00, and so onappointment 2 10:00 or 14:00, and so on……

Page 5: 27 - Review for Final - Montana State University 8:10 PM CS 435 – Fall 2005 5 Normalization to BCNF – Example (2) Universal Table 5 Zane 2 14:00 Smith 4 Robert 1 13:00 McCoy 3

12/12/2005 8:10 PM

CS 435 – Fall 2005

5

Normalization to BCNF Normalization to BCNF ––Example (2) Example (2) Universal TableUniversal Table

SmithSmith14:0014:0022ZaneZane55McCoyMcCoy13:0013:0011RobertRobert44SmithSmith10:0010:0022AdamAdam33

McCoyMcCoy09:0009:0011KerrKerr22SmithSmith09:0009:0011JohnJohn11DoctorDoctorTimeTimeAppointmentNoAppointmentNoPaienttNamePaienttNamePatient NoPatient No

CLINICCLINIC

Normalization to BCNF Normalization to BCNF ––Example (3)Example (3)

CLINIC (PatNo, CLINIC (PatNo, PatNamePatName, , AppNoAppNo, Time, Doctor), Time, Doctor)From this scenario DB designer extracted the From this scenario DB designer extracted the following determinants:following determinants:a. PatNo -> PatNameb. PatNo, AppNo -> Time,Doctorc. Time -> AppNo

Two options for 1NF primary key selection:Two options for 1NF primary key selection:1. CLINIC (PatNo, PatName, AppNo, Time, Doctor)2. CLINIC (PatNo, PatName, AppNo, Time, Doctor)

Transitive FD here: (Time-> AppNo), PatNo -> Doctor

First Choice of Primary First Choice of Primary KeyKey

CLINIC (CLINIC (PatNoPatNo, , PatNamePatName, , AppNoAppNo, Time, Doctor), Time, Doctor)

1NF 1NF -- no nonno non--atomic values, so in 1NFatomic values, so in 1NF2NF 2NF –– eliminate Redundant Information eliminate Redundant Information (i.e. Partial key dependencies):(i.e. Partial key dependencies):

VISIT (VISIT (PatNo,AppNoPatNo,AppNo,Time,Doctor,Time,Doctor))PATIENT (PATIENT (PatNoPatNo,PatName,PatName) )

3NF 3NF –– no transient dependences so in no transient dependences so in 3NF3NFNow we test for BCNFNow we test for BCNF

Is EVERY determinant a Is EVERY determinant a Candidate Key?Candidate Key?

Go through Go through allall determinates where ALL determinates where ALL of the left hand side attributes are of the left hand side attributes are present in a relation and at least ONE present in a relation and at least ONE of the right hand side attributes is also of the right hand side attributes is also present in the relation. present in the relation.

Check for VISITCheck for VISITVISIT (VISIT (PatNo, PatNo, AppNoAppNo,Time,Doctor,Time,Doctor))a) a) PatnoPatno --> > PatNamePatName

PatnoPatno is present in DB, but not is present in DB, but not PatNamePatName, so , so irrelevant.irrelevant.

b) b) PatnoPatno, , AppNoAppNo --> Time, Doctor> Time, DoctorAll LHS and RHS present so relevant. Is this a All LHS and RHS present so relevant. Is this a candidate key? candidate key? PatNoPatNo, , AppNoAppNo IS the key, so this IS the key, so this is a candidate key. OK for BCNF complianceis a candidate key. OK for BCNF compliance

c) c) Time Time --> > AppNoAppNoTime is present, and so is Time is present, and so is AppNoAppNo, so relevant. Is , so relevant. Is this a candidate key? this a candidate key? If it was then we could rewrite VISIT as:If it was then we could rewrite VISIT as:

VISIT (VISIT (PatNo,AppNo,PatNo,AppNo,TimeTime,Doctor,Doctor))This will not work, as you need both Time and This will not work, as you need both Time and PatNo together to form a unique key. Thus this PatNo together to form a unique key. Thus this determinate is not a candidate key, and therefore determinate is not a candidate key, and therefore VISIT is not in BCNF.VISIT is not in BCNF.

Now do the same for Now do the same for PATIENT relationPATIENT relation

Page 6: 27 - Review for Final - Montana State University 8:10 PM CS 435 – Fall 2005 5 Normalization to BCNF – Example (2) Universal Table 5 Zane 2 14:00 Smith 4 Robert 1 13:00 McCoy 3

12/12/2005 8:10 PM

CS 435 – Fall 2005

6

Can we fix the VISIT Can we fix the VISIT relation?relation?

Note: Time is enough to work out the Note: Time is enough to work out the appointment number of a particular appointment number of a particular patientpatient

VISIT (VISIT (PatNo, TimePatNo, Time, Doctor), Doctor)PATIENT (PATIENT (PatNoPatNo,PatName,PatName))APPOINTMENT (APPOINTMENT (TimeTime, , AppNoAppNo))

Now BCNF is satisfied, and the final Now BCNF is satisfied, and the final relations shown are in BCNFrelations shown are in BCNF

Second Choice of Primary Second Choice of Primary KeyKeyCLINIC (CLINIC (PatNoPatNo, , PatNamePatName, , AppNoAppNo, , TimeTime, Doctor), Doctor)

1NF 1NF -- no nonno non--atomic values, so in 1NFatomic values, so in 1NF2NF 2NF –– eliminate Redundant Information eliminate Redundant Information (i.e. Partial key dependencies):(i.e. Partial key dependencies):

VISIT (VISIT (PatNo,TimePatNo,Time, Doctor), Doctor)PATIENT (PATIENT (PatNoPatNo,PatName,PatName))APPOINTMENT (APPOINTMENT (TimeTime, , AppNoAppNo))

3NF 3NF –– no transient dependences so the no transient dependences so the same as 2NFsame as 2NFNow we test for BCNF Now we test for BCNF –– we already know we already know that this time 3NF is in BCNFthat this time 3NF is in BCNF

Questions?Questions?

assetsbranch-citybranch-name

BRANCH

originating-branchamountloan-number

LOAN

banker-namecustomer-citycustomer-streetcustomer-namecustomer-id

CUSTOMER

payment-amountpayment-dateloan-numberpayment-number

PAYMENT

loan-numbercustomer-id

BORROWER

manageremployment-lengthstart-datetelephoneemployee-nameemployee-id

EMPLOYEE

access-dateaccount-numbercustomer-name

DEPOSITS

balanceinterest-ratecustomer-idaccount-number

SAVING ACCOUNT