View
218
Download
3
Category
Tags:
Preview:
Citation preview
Chapter 8Chapter 8Normalization
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
Outline Outline
Modification anomaliesFunctional dependenciesMajor normal formsRelationship independencePractical concerns
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
Modification AnomaliesModification Anomalies
Unexpected side effectInsert, modify, and delete more data than
desiredCaused by excessive redundanciesStrive for one fact in one place
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
Big University Database TableBig University Database Table
StdSSN StdClass OfferNo OffYear Grade CourseNo CrsDesc
S1 JUN O1 2000 3.5 C1 DB S1 JUN O2 2000 3.3 C2 VB S2 JUN O3 2000 3.1 C3 OO S2 JUN O2 2000 3.4 C2 VB
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
Functional DependenciesFunctional Dependencies
Constraint on the possible rows in a tableValue neutral like FKs and PKsAssertedUnderstand business rules
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
FD DefinitionFD Definition
X YX (functionally) determines YX: left-hand-side (LHS) or determinantFor each X value, there is at most one Y
valueSimilar to candidate keys
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
FD Diagrams and ListsFD Diagrams and Lists
StdSSN StdCity StdClass OfferNo OffTerm OffYear EnrGradeCourseNo CrsDesc
StdSSN StdCity, StdClass
OfferNo OffTerm, OffYear, CourseNo, CrsDesc
CourseNo CrsDesc
StdSSN, OfferNo EnrGrade
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
FDs in DataFDs in Data
• Prove non-existence (but not existence) by looking at data
• Two rows that have the same X value but a different Y value
StdSSN StdClass OfferNo OffYear Grade CourseNo CrsDesc
S1 JUN O1 2000 3.5 C1 DB S1 JUN O2 2000 3.3 C2 VB S2 JUN O3 2000 3.1 C3 OO S2 JUN O2 2000 3.4 C2 VB
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
NormalizationNormalization
Process of removing unwanted redundancies
Apply normal forms– Identify FDs– Determine whether FDs meet normal form– Split the table to meet the normal form if there
is a violation
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
Relationships of Normal FormsRelationships of Normal Forms1NF
2NF
3NF/BCNF
4NF
5NF
DKNF
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
1NF1NF
Starting point for SQL2 databasesNo repeating groups: flat rows
StdSSN StdClass OfferNo OffYear Grade CourseNo CrsDesc
S1 JUN O1 2000 3.5 C1 DB O2 2000 3.3 C2 VB S2 JUN O3 2000 3.1 C3 OO O2 2000 3.4 C2 VB
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
Combined Definition of Combined Definition of 2NF/3NF2NF/3NFKey column: candidate key or part of
candidate keyAnalogy to the traditional justice oathEvery nonkey depends on a key, the whole
key, and nothing but the keyUsually taught as separate definitions
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
2NF2NF
Every nonkey column depends on a whole key, not part of a key
Violations– Part of key nonkey– Violations only for combined keys
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
2NF Example2NF Example
Many violations for the big university database table– StdSSN StdCity, StdClass– OfferNo OffTerm, OffYear, CourseNo,
CrsDescSplitting the table
– UnivTable1 (StdSSN, StdCity, StdClass) – UnivTable2 (OfferNo, OffTerm, OffYear,
CourseNo, CrsDesc)
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
3NF3NF
Every nonkey column depends only on a key not on nonkey columns
Violations: Nonkey NonkeyAlternative formulation
– No transitive FDs– A B, B C then A C– OfferNo CourseNo, CourseNo CrsDesc
then OfferNo CrsDesc
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
3NF Example3NF Example
One violation in UnivTable2– CourseNo CrsDesc
Splitting the table– UnivTable2-1 (OfferNo, OffTerm, OffYear,
CourseNo, CrsDesc)– UnivTable2-2 (CourseNo, CrsDesc)
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
BCNFBCNF
Every determinant must be a candidate keySimpler definitionApply with simple synthesis procedureSpecial case not covered by 3NF
– Part of key Part of key– Special case is not common
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
BCNF ExampleBCNF Example
Many violations for the big university database table– StdSSN StdCity, StdClass– OfferNo OffTerm, OffYear, CourseNo,
CrsDesc– CourseNo CrsDesc
Splitting into four tables
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
Simple Synthesis ProcedureSimple Synthesis Procedure
1. Eliminate extraneous columns from the LHSs.
2. Remove derived FDs. 3. Arrange the FDs into groups with each
group having the same determinant. 4. For each FD group, make a table with the
determinant as the primary key.5. Merge tables in which one table contains
all columns of the other table.
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
Simple Synthesis ExampleSimple Synthesis Example
Step 1: no extraneous columnsStep 2: eliminate OfferNo CrsDescStep 3: already arranged by LHSStep 4: four tables (Student, Enrollment,
Course, Offering)Step 5: no redundant tables
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
Relationship Independence and Relationship Independence and 4NF4NFM-way relationship that can be derived
from binary relationships Split into binary relationshipsSpecialized problem4NF does not involve FDs
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
Relationship Independence Relationship Independence ProblemProblem
StdSSNStdName
StudentOfferNoOffLocation
Offering
TextNoTextTitle
Textbook
EnrollStd-Enroll
Offer-Enroll
Text-Enroll
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
Relationship Independence Relationship Independence SolutionSolution
StdSSNStdName
Student
OfferNoOffLocation
Offering
TextNoTextTitle
Textbook
Enroll Orders
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
MVDs and 4NFMVDs and 4NF
MVD: difficult to identify– A B | C (multi-determines)– A associated with a collection of B and C
values– B and C are independent– Nontrivial MVD: not also an FD
4NF: no nontrivial MVDs
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
Higher Level Normal FormsHigher Level Normal Forms
5NF for M-way relationshipsDKNF: absolute normal formDKNF is an ideal, not a practical normal
form
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
Role of NormalizationRole of Normalization
Refinement– Use after ERD– Apply to table design or ERD
Initial design– Record attributes and FDs– No initial ERD– May reverse engineer an ERD
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
Normalization ObjectiveNormalization Objective
Update biasedNot a concern for databases without
updates (data warehouses)Denormalization
– Purposeful violation of a normal form– Some FDs may not cause anomalies– May improve performance
© 2001 The McGraw-Hill
Companies, Inc. All rights reserved.
McGraw-Hill/Irwin
SummarySummary
Beware of unwanted redundanciesFDs are important constraintsStrive for BCNFUse a CASE tool for large problemsImportant tool of database developmentFocus on the normalization objective
Recommended