Upload
vumien
View
217
Download
1
Embed Size (px)
Citation preview
12/12/2005 8:10 PM
CS 435 – Fall 2005
1
Final ReviewFinal Review
Dr. Rafal A. AngrykDr. Rafal A. Angryk
OutlineOutline
Covered readings in textbooks Covered readings in textbooks Regular type of test on the final (single Regular type of test on the final (single choice!)choice!)A few exercisesA few exercises
ERNormalization
Readings in textbook (1)Readings in textbook (1)
Part 1 Introduction and Conceptual Part 1 Introduction and Conceptual ModelingModeling
Chapter 1.1Chapter 1.1--6:6: Databases and Databases and database Users database Users Chapter 2.1Chapter 2.1--77: System concepts and System concepts and ArchitectureArchitectureChapter 3.1Chapter 3.1--9: Modeling using the E9: Modeling using the E--R diagramR diagram
Readings in textbook (2)Readings in textbook (2)
Part 2 The Relational ModelPart 2 The Relational ModelChapter 5: Concepts and ConstraintsChapter 5: Concepts and ConstraintsChapter 6.1Chapter 6.1--7: Relational Algebra7: Relational AlgebraSection 7.1Section 7.1--3: Mapping ER diagrams 3: Mapping ER diagrams to tablesto tablesChapter 8.1Chapter 8.1--8, 9.: SQL schema 8, 9.: SQL schema definition, constraints, queries, viewsdefinition, constraints, queries, views
Readings in textbook (3)Readings in textbook (3)
Part 3 Database DesignPart 3 Database DesignChapter 10.1Chapter 10.1--6: Functional 6: Functional dependencies and normalizationdependencies and normalizationChapter 11.Intro: Two approaches to Chapter 11.Intro: Two approaches to normalizationnormalization
Readings in textbook (4)Readings in textbook (4)
Part 4 Data storage, indexing, Part 4 Data storage, indexing, query processingquery processing
Chapter 13: Disk storage, file Chapter 13: Disk storage, file structures, & hashingstructures, & hashingChapter 14: Indexing structures for Chapter 14: Indexing structures for filesfilesSection 15.1Section 15.1--3, 15.7: Query 3, 15.7: Query optimization & algorithms for joinoptimization & algorithms for joinChapter 16 Practical design & tuningChapter 16 Practical design & tuning
12/12/2005 8:10 PM
CS 435 – Fall 2005
2
Readings in textbook (5)Readings in textbook (5)Part 5 Transaction Processing Part 5 Transaction Processing conceptsconcepts
Chapter 19Chapter 19--19.4: Database recovery 19.4: Database recovery techniquestechniques
This time This time -- regular type of regular type of test for the theory parttest for the theory part
Good news:Good news:Just a single answer will be correctJust a single answer will be correctStrategy: CHOOSE THE BEST ANSWERStrategy: CHOOSE THE BEST ANSWER
Bad news:Bad news:Still tricky… Otherwise it would be trivial!
Why? There is no way you can be good Why? There is no way you can be good in practice, if you do not have strong in practice, if you do not have strong foundations in theoryfoundations in theoryAs usually there will be more practice As usually there will be more practice than theorythan theory
Be sure to practice at home:Be sure to practice at home:Prepare ER Design in Min, Max notation, Prepare ER Design in Min, Max notation, and map it to the Relational Diagramand map it to the Relational DiagramTransform a Universal Relation to 3NF or Transform a Universal Relation to 3NF or BCNF tables, and find a minimal set of BCNF tables, and find a minimal set of full functional dependenciesfull functional dependenciesRelational Algebra and SQL CommandsRelational Algebra and SQL CommandsQuery optimizationQuery optimizationInsert and delete data to/from B+Insert and delete data to/from B+--treestreesRecommend the best type of indexing for Recommend the best type of indexing for given attributes and storage formatsgiven attributes and storage formats
Exe. Exe. -- Model the following Model the following System Requirements:System Requirements:1.1. The bank is organized into branches. Each The bank is organized into branches. Each
branch is located in a particular city and is branch is located in a particular city and is identified by a unique name. The bank identified by a unique name. The bank monitors the assets of each branch.monitors the assets of each branch.
2.2. Bank customers are identified by their Bank customers are identified by their customercustomer--id id values. The bank stores each values. The bank stores each customer's name, and the street and city customer's name, and the street and city where the customer lives. Customers may where the customer lives. Customers may have multiple saving accounts and can take have multiple saving accounts and can take out loans. A customer may be associated out loans. A customer may be associated with a particular banker, who may act as with a particular banker, who may act as either a loan officer or a personal banker for either a loan officer or a personal banker for that customer. that customer.
Exe. Exe. –– cont.cont.3.3. Bank employees are identified by their Bank employees are identified by their employeeemployee--
idid values. The bank administration stores the values. The bank administration stores the name and telephone number of each employee, name and telephone number of each employee, and the and the employeeemployee--idid number of the employee's number of the employee's manager. The bank also keeps track of the manager. The bank also keeps track of the employee's start date and, thus, length of employee's start date and, thus, length of employment. employment.
4.4. The bank offers only two types of services The bank offers only two types of services --saving accounts and loans. Accounts can be held saving accounts and loans. Accounts can be held by more than one customer, and a customer can by more than one customer, and a customer can have more than one account. Each account is have more than one account. Each account is assigned a unique account number. The bank assigned a unique account number. The bank maintains a record of each account's balance. It maintains a record of each account's balance. It also records all dates when the account was also records all dates when the account was accessed by each customer holding the account. accessed by each customer holding the account. Each savings account has a separate interest rate Each savings account has a separate interest rate and information about the branch name where it and information about the branch name where it was opened.was opened.
Exe. Exe. –– cont.cont.5.5. A loan originates at a particular branch and A loan originates at a particular branch and
can be held by one or more customers. A can be held by one or more customers. A loan is identified by a unique loan number. loan is identified by a unique loan number. For each loan, the bank keeps track of the For each loan, the bank keeps track of the loan amount of the loan payments. Although loan amount of the loan payments. Although a loan payment number does not uniquely a loan payment number does not uniquely identify a particular payment among those identify a particular payment among those for all the bank's loans, a payment number for all the bank's loans, a payment number does identify a particular payment for a does identify a particular payment for a specific loan. The date and amount are specific loan. The date and amount are recorded for each payment recorded for each payment Note: Do not worry about making this Note: Do not worry about making this description complete. Just design data description complete. Just design data model so it would reflect all requirements model so it would reflect all requirements mentioned abovementioned above
12/12/2005 8:10 PM
CS 435 – Fall 2005
3
Now map it to Relational Now map it to Relational Model and practice SQL Model and practice SQL on iton it
E.g. Retrieve all bank account numbers E.g. Retrieve all bank account numbers (only unique values), which were opened (only unique values), which were opened in Bozeman, and accessed by Mr. John in Bozeman, and accessed by Mr. John Smith (i.e. Smith (i.e. ‘‘John SmithJohn Smith’’) on 10th of ) on 10th of February (i.e. February (i.e. ‘‘Feb/05/2005Feb/05/2005’’).).Practice optimization of your queriesPractice optimization of your queriesPut the same in Relational AlgebraPut the same in Relational AlgebraWhat type of indexes would you specify?What type of indexes would you specify?Are your tables normalized?Are your tables normalized?
ER Design DecisionsER Design DecisionsThe use of an attribute or entity type (set) to The use of an attribute or entity type (set) to represent an object.represent an object.Whether a realWhether a real--world concept is best expressed world concept is best expressed by an entity set or a relationship type (set).by an entity set or a relationship type (set).The use of a ternary relationship versus a pair of The use of a ternary relationship versus a pair of binary relationships.binary relationships.The use of a strong or weak entity set.The use of a strong or weak entity set.
The use of specialization/generalization –contributes to modularity in the design.The use of aggregation – can treat the aggregate entity set as a single unit without concern for the details of its internal structure.
ER Design Crucial ElementsER Design Crucial ElementsUse of entity types vs. attributesUse of entity types vs. attributesChoice mainly depends on the structure of the Choice mainly depends on the structure of the enterprise being modeled, and on the semantics enterprise being modeled, and on the semantics associated with the attribute in question.associated with the attribute in question.Use of entity types (sets) vs. relationship types (sets)Use of entity types (sets) vs. relationship types (sets)Possible guideline is to designate a relationship set Possible guideline is to designate a relationship set to describe an action that occurs between entitiesto describe an action that occurs between entitiesBinary versus Binary versus nn--ary relationship typesary relationship typesAlthough it is possible to replace any nonbinary (Although it is possible to replace any nonbinary (nn--ary, for ary, for nn > 2) relationship set by a number of > 2) relationship set by a number of distinct binary relationship sets, a distinct binary relationship sets, a nn--ary ary relationship set shows more clearly that several relationship set shows more clearly that several entities participate in a single relationship.entities participate in a single relationship.Appropriate placement of relationship attributesAppropriate placement of relationship attributes
More on normalizationMore on normalization
BoyceBoyce--CoddCodd Normal FormNormal Form((BCNF)BCNF)One more example of One more example of normalizationnormalization
Why BCNF is importantWhy BCNF is important
When a relation has more than one When a relation has more than one candidate key, anomalies may result candidate key, anomalies may result even though the relation is in 3NF.even though the relation is in 3NF.3NF does not deal satisfactorily with 3NF does not deal satisfactorily with the case of a relation with overlapping the case of a relation with overlapping candidate keyscandidate keys
i.e. composite candidate keys with at least one attribute in common.
12/12/2005 8:10 PM
CS 435 – Fall 2005
4
ReminderReminder
SuperkeySuperkey of relation schema R of relation schema R -- a set of a set of attributes S of R that contains a key of Rattributes S of R that contains a key of RA relation schema R is in A relation schema R is in third normal third normal form (3NF)form (3NF) if whenever a FD X if whenever a FD X --> > A A holds in R, then either: holds in R, then either:
(a) X is a (a) X is a superkeysuperkey of R, or of R, or (b) A is a prime attribute of R(b) A is a prime attribute of R
NOTE:NOTE:BoyceBoyce--CoddCodd normal form disallows normal form disallows condition (b) abovecondition (b) above
What is a Prime AttributeWhat is a Prime Attribute
AA prime attributeprime attribute must bemust be a a member of some candidate keymember of some candidate key
AA nonprime nonprime (or(or nonkeynonkey) ) attributeattribute is not a prime attribute is not a prime attribute -- that is, it is that is, it is not a member of any not a member of any candidate keycandidate key..
In other words In other words ……A relation is in A relation is in 3NF3NF if every if every nonkeynonkey attribute is attribute is in 2NF and in 2NF and nontransitively dependent on nontransitively dependent on a candidate keya candidate key
BCNFBCNF is based on the concept of a is based on the concept of a determinantdeterminant..
A determinant is any attribute (simple or A determinant is any attribute (simple or composite) on which some other attribute is composite) on which some other attribute is fullyfully functionally dependentfunctionally dependent
A relation is in A relation is in BCNFBCNF if and only ifif and only if every every determinant is a candidate keydeterminant is a candidate keyi.e. whenever an FD X i.e. whenever an FD X --> > A holds in R, then X A holds in R, then X must be a must be a superkeysuperkey of Rof R
RememberRemember
Each normal form is strictly stronger Each normal form is strictly stronger than the previous onethan the previous one
Every 2NF relation is in 1NFEvery 2NF relation is in 1NFEvery 3NF relation is in 2NFEvery 3NF relation is in 2NFEvery BCNF relation is in 3NFEvery BCNF relation is in 3NF
BCNF is a simpler, yet stricter form of BCNF is a simpler, yet stricter form of 3NF3NFNOTE: There exist relations that are in NOTE: There exist relations that are in 3NF but not in BCNF3NF but not in BCNF
The theoryThe theoryConsider the following relation and Consider the following relation and FDsFDs: : R(R(a,ba,b,c,d,c,d))
(1) (1) a,ca,c --> > b,db,d(2) (2) a,da,d --> b> b
Here, the first determinant suggests that the primary Here, the first determinant suggests that the primary key of R could be changed from key of R could be changed from a,ba,b to to a,ca,c. . If this change was done all of the nonIf this change was done all of the non--key attributes key attributes present in R could still be determined, and therefore present in R could still be determined, and therefore this change is legal. this change is legal. However, the second determinant indicates that However, the second determinant indicates that a,da,ddetermines determines bb, but, but a,da,d could not be the key of R as could not be the key of R as a,da,d does not determine does not determine allall of the non key attributes of the non key attributes of R (it does not determine of R (it does not determine cc). ). We would say that the determinant in the first FD is We would say that the determinant in the first FD is a candidate key, but the second determinant is not a a candidate key, but the second determinant is not a candidate key, and thus candidate key, and thus this relation is not in BCNFthis relation is not in BCNFHowever, R is in 3NF since in the second FD However, R is in 3NF since in the second FD bb is a is a prime attribute of Rprime attribute of R
Normalization to BCNF Normalization to BCNF ––Example (1)Example (1)
The CLINIC relation above depicts a special The CLINIC relation above depicts a special weightweight--loss surgery clinic where the each loss surgery clinic where the each patient has 4 appointments:patient has 4 appointments:On the first they are weighed, On the first they are weighed, On the second they are exercised, On the second they are exercised, On the third they have gastric surgery, On the third they have gastric surgery, And on the fourth they are reAnd on the fourth they are re--examined and examined and given diet tips to not gain the weight againgiven diet tips to not gain the weight againNot all patients need all four appointments! Not all patients need all four appointments! Appointment 1 is either 09:00 or 13:00, Appointment 1 is either 09:00 or 13:00, appointment 2 10:00 or 14:00, and so onappointment 2 10:00 or 14:00, and so on……
12/12/2005 8:10 PM
CS 435 – Fall 2005
5
Normalization to BCNF Normalization to BCNF ––Example (2) Example (2) Universal TableUniversal Table
SmithSmith14:0014:0022ZaneZane55McCoyMcCoy13:0013:0011RobertRobert44SmithSmith10:0010:0022AdamAdam33
McCoyMcCoy09:0009:0011KerrKerr22SmithSmith09:0009:0011JohnJohn11DoctorDoctorTimeTimeAppointmentNoAppointmentNoPaienttNamePaienttNamePatient NoPatient No
CLINICCLINIC
Normalization to BCNF Normalization to BCNF ––Example (3)Example (3)
CLINIC (PatNo, CLINIC (PatNo, PatNamePatName, , AppNoAppNo, Time, Doctor), Time, Doctor)From this scenario DB designer extracted the From this scenario DB designer extracted the following determinants:following determinants:a. PatNo -> PatNameb. PatNo, AppNo -> Time,Doctorc. Time -> AppNo
Two options for 1NF primary key selection:Two options for 1NF primary key selection:1. CLINIC (PatNo, PatName, AppNo, Time, Doctor)2. CLINIC (PatNo, PatName, AppNo, Time, Doctor)
Transitive FD here: (Time-> AppNo), PatNo -> Doctor
First Choice of Primary First Choice of Primary KeyKey
CLINIC (CLINIC (PatNoPatNo, , PatNamePatName, , AppNoAppNo, Time, Doctor), Time, Doctor)
1NF 1NF -- no nonno non--atomic values, so in 1NFatomic values, so in 1NF2NF 2NF –– eliminate Redundant Information eliminate Redundant Information (i.e. Partial key dependencies):(i.e. Partial key dependencies):
VISIT (VISIT (PatNo,AppNoPatNo,AppNo,Time,Doctor,Time,Doctor))PATIENT (PATIENT (PatNoPatNo,PatName,PatName) )
3NF 3NF –– no transient dependences so in no transient dependences so in 3NF3NFNow we test for BCNFNow we test for BCNF
Is EVERY determinant a Is EVERY determinant a Candidate Key?Candidate Key?
Go through Go through allall determinates where ALL determinates where ALL of the left hand side attributes are of the left hand side attributes are present in a relation and at least ONE present in a relation and at least ONE of the right hand side attributes is also of the right hand side attributes is also present in the relation. present in the relation.
Check for VISITCheck for VISITVISIT (VISIT (PatNo, PatNo, AppNoAppNo,Time,Doctor,Time,Doctor))a) a) PatnoPatno --> > PatNamePatName
PatnoPatno is present in DB, but not is present in DB, but not PatNamePatName, so , so irrelevant.irrelevant.
b) b) PatnoPatno, , AppNoAppNo --> Time, Doctor> Time, DoctorAll LHS and RHS present so relevant. Is this a All LHS and RHS present so relevant. Is this a candidate key? candidate key? PatNoPatNo, , AppNoAppNo IS the key, so this IS the key, so this is a candidate key. OK for BCNF complianceis a candidate key. OK for BCNF compliance
c) c) Time Time --> > AppNoAppNoTime is present, and so is Time is present, and so is AppNoAppNo, so relevant. Is , so relevant. Is this a candidate key? this a candidate key? If it was then we could rewrite VISIT as:If it was then we could rewrite VISIT as:
VISIT (VISIT (PatNo,AppNo,PatNo,AppNo,TimeTime,Doctor,Doctor))This will not work, as you need both Time and This will not work, as you need both Time and PatNo together to form a unique key. Thus this PatNo together to form a unique key. Thus this determinate is not a candidate key, and therefore determinate is not a candidate key, and therefore VISIT is not in BCNF.VISIT is not in BCNF.
Now do the same for Now do the same for PATIENT relationPATIENT relation
12/12/2005 8:10 PM
CS 435 – Fall 2005
6
Can we fix the VISIT Can we fix the VISIT relation?relation?
Note: Time is enough to work out the Note: Time is enough to work out the appointment number of a particular appointment number of a particular patientpatient
VISIT (VISIT (PatNo, TimePatNo, Time, Doctor), Doctor)PATIENT (PATIENT (PatNoPatNo,PatName,PatName))APPOINTMENT (APPOINTMENT (TimeTime, , AppNoAppNo))
Now BCNF is satisfied, and the final Now BCNF is satisfied, and the final relations shown are in BCNFrelations shown are in BCNF
Second Choice of Primary Second Choice of Primary KeyKeyCLINIC (CLINIC (PatNoPatNo, , PatNamePatName, , AppNoAppNo, , TimeTime, Doctor), Doctor)
1NF 1NF -- no nonno non--atomic values, so in 1NFatomic values, so in 1NF2NF 2NF –– eliminate Redundant Information eliminate Redundant Information (i.e. Partial key dependencies):(i.e. Partial key dependencies):
VISIT (VISIT (PatNo,TimePatNo,Time, Doctor), Doctor)PATIENT (PATIENT (PatNoPatNo,PatName,PatName))APPOINTMENT (APPOINTMENT (TimeTime, , AppNoAppNo))
3NF 3NF –– no transient dependences so the no transient dependences so the same as 2NFsame as 2NFNow we test for BCNF Now we test for BCNF –– we already know we already know that this time 3NF is in BCNFthat this time 3NF is in BCNF
Questions?Questions?
assetsbranch-citybranch-name
BRANCH
originating-branchamountloan-number
LOAN
banker-namecustomer-citycustomer-streetcustomer-namecustomer-id
CUSTOMER
payment-amountpayment-dateloan-numberpayment-number
PAYMENT
loan-numbercustomer-id
BORROWER
manageremployment-lengthstart-datetelephoneemployee-nameemployee-id
EMPLOYEE
access-dateaccount-numbercustomer-name
DEPOSITS
balanceinterest-ratecustomer-idaccount-number
SAVING ACCOUNT