Part 6, Chapter 15
Normalization of Relational Databases
1
• Design methodologies
• Goodness of design
• Functional dependencies
• The normalization process and normal forms
  – First, second, third, BCNF
• Pros and cons of normalization
2
Objectives
• A database system can be designed via
  – Bottom-up (design by synthesis)
  – Top-down (design by analysis)
3
Design Methodology
• Starts with the basic relationships between pairs of attributes
• Uses this information to construct the relations
• Neither scalable nor practical
4
Bottom-up design
• The design process
  – Starts with one relation (set of all attributes)
  – Decomposes it into groups
• Use ER to model the conceptual schema
• Existing design knowledge or experience
  – Maps each entity into a table schema
  – Analyzes each table schema for goodness
    • possible refinement and/or decomposition
5
Top-down design
• Informal design metrics
  – Semantics of the related attributes
  – Reducing the redundant values in tuples
  – Minimizing the NULL values
  – Disallowing spurious tuples
6
Informal Design Guidelines for Relational Schemas
• Based on the semantics of attributes, or how the attribute values in a tuple relate to one another
  – A schema should capture facts about one entity or one relationship type
7
Semantics of the Relation Attributes
• Design a relation schema so that it is easy to explain its meaning
  – Do not combine attributes from multiple entity types and relationship types into a single relation
10
Guideline 1
• The important objectives of schema design
  – to minimize the storage space and effort
  – to minimize problems resulting from updates
• Example
  – Compare the relations in Fig. 15.2 with those in Fig. 15.4
12
Redundant Information in Tuples and Update Anomalies
• Update anomalies
  – Insertion anomalies
  – Deletion anomalies
  – Modification anomalies
15
Update Anomalies
• Insertion anomalies
  • Consistency:
    – E.g., to insert a new employee
      » the department attribute values must be entered consistently with the other tuples of that department,
      » or NULLs must be entered if the employee is not yet assigned to a department
  • Null values:
    – E.g., to insert a new department that has no employees yet
      » the tuple violates entity integrity, because ssn (the PK) cannot be NULL
  • E.g., EMP_DEPT in Fig. 15.4
16
Insertion Anomalies
• Deletion anomalies
  – Loss of information
  • E.g.,
    – deleting the very last employee who works for dnum=1 from EMP_DEPT also loses the information about that department
18
Deletion Anomalies
• Modification anomalies
  – Change one, change all
  • E.g., changing a department's manager or number requires updating the tuples of all employees in that department
20
Modification Anomalies
• Design anomaly-free base relation schemas
  – How? Use formal approaches to validate the design against these guidelines
22
Guideline 2
• NULL values arise when a relation has attributes that do not apply to all tuples
  – E.g., student phone number
    • Not every student has a cell phone or work phone
• Guideline 3
  – Stay away from attributes with NULL values in the base table
    • NULLs waste storage and complicate understanding, aggregate functions, and operations involving comparisons (e.g., join operations)
23
Null Values in Tuples
• Spurious tuples result from an undesirable decomposition of a relation: joining the parts back together yields tuples that were not in the original relation
  – E.g.,
    • EMP_LOC and EMP_PROJ1
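A minimal Python sketch of how such a decomposition can generate spurious tuples. Only the relation names EMP_LOC and EMP_PROJ1 come from the slide; the simplified attribute lists and the sample rows are hypothetical.

```python
# Hypothetical original relation: (ssn, pnumber, ename, plocation)
emp_proj = {
    ("111", 1, "Smith", "Houston"),
    ("222", 2, "Wong", "Houston"),
}

# Decompose into EMP_LOC(ename, plocation) and EMP_PROJ1(ssn, pnumber, plocation)
emp_loc = {(ename, ploc) for (_, _, ename, ploc) in emp_proj}
emp_proj1 = {(ssn, pnum, ploc) for (ssn, pnum, _, ploc) in emp_proj}

# Natural join on plocation, which is neither a PK nor an FK in either relation
joined = {
    (ssn, pnum, ename, ploc)
    for (ssn, pnum, ploc) in emp_proj1
    for (ename, ploc2) in emp_loc
    if ploc == ploc2
}

# Tuples produced by the join that were never in the original relation
spurious = joined - emp_proj
print(sorted(spurious))
```

In this sample both projects share a location, so the join pairs every employee with every project at that location.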
24
Generation of Spurious (or invalid) Tuples
• Design relation schemas so that they can be joined with equality conditions on attributes that are either PKs or FKs
27
Guideline 4
Summary and discussion of design guidelines
• The following problems can be avoided by applying the guidelines
  1. Anomalies that cause redundant work to be done during insertion, deletion, and modification
  2. Waste of storage space due to NULLs
  3. Generation of invalid and spurious data during joins on base relations using non-key attributes
28
• Refers to a requirement between two sets of attributes, X and Y, such that
  – For any two tuples t1 and t2 in r(R)
    • if t1[X] = t2[X], then t1[Y] = t2[Y]
• Used to define normal forms
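The tuple condition above can be tested against a relation instance. The following is a minimal Python sketch; the function name `fd_holds` and the sample rows are hypothetical. A single instance can only refute an FD, it cannot prove that the FD holds for the schema.

```python
def fd_holds(rows, X, Y):
    """Return True if X -> Y is satisfied by this instance (a list of dicts)."""
    seen = {}  # maps each X value to the Y value first seen with it
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # Two tuples agree on X but differ on Y: the FD is violated
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

# Hypothetical sample rows
emp_dept = [
    {"ssn": "111", "ename": "Smith", "dnum": 5, "dname": "Research"},
    {"ssn": "222", "ename": "Wong", "dnum": 5, "dname": "Research"},
    {"ssn": "333", "ename": "Zelaya", "dnum": 4, "dname": "Administration"},
]

print(fd_holds(emp_dept, ["dnum"], ["dname"]))  # dnum -> dname
print(fd_holds(emp_dept, ["dname"], ["ssn"]))   # dname -> ssn
```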
29
Functional Dependencies
• Represented by X → Y
  – X functionally determines Y
  – or, Y functionally depends on X
  – if X is a candidate key (CK), each X value appears in at most one tuple, so X → Y holds for any Y ⊆ R
• Note: an FD is a property of the semantics or meaning of the attributes
• Relation states that satisfy the FDs are called legal relation states (legal extensions) of R
30
Functional Dependencies (FD): Formal definition
• The notion of dependency has to do with a schema-based dependency
  – It is a semantic notion
  – FD is part of the process of understanding what the data means
31
Properties of functional dependencies (FDs)
• Legal extensions (or legal relation states):
  – Refers to the extensions r(R) that satisfy the functional dependency constraints
• An FD is a property of the relation schema, not of a particular relation extension
33
Important Notes on FDs
• Normalization theory:
  – is built around the concept of normal forms
  – is used in the design process
• A relation is in a particular normal form if it satisfies a specified set of requirements
  – E.g.,
    • 1NF (i.e., all underlying domains MUST have atomic values)
35
Normalization
• Types of normal forms
  – 1NF
  – 2NF
  – 3NF
  – BCNF
  – 4NF
  – 5NF (PJ/NF)
  – DKNF (absolute normal form)
36
Normal Form
• 1NF prevents
  – multivalued attributes,
  – composite attributes
  – combinations of the above
    • See Fig. 15.8
    • See Fig. 15.9
  – nested relations or multivalued composite attributes
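A small Python sketch of one common way to remove a multivalued attribute: move it into its own relation keyed by the original PK plus the attribute. The relation and attribute names and the rows are hypothetical and are not taken from the cited figures.

```python
# Hypothetical non-1NF rows: dlocations holds a set of values per tuple
department = [
    {"dnumber": 5, "dname": "Research", "dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"dnumber": 4, "dname": "Administration", "dlocations": ["Stafford"]},
]

# 1NF relation 1: one atomic value per attribute, PK = dnumber
dept = [(d["dnumber"], d["dname"]) for d in department]

# 1NF relation 2: one row per (department, location), PK = (dnumber, dlocation)
dept_locations = [
    (d["dnumber"], loc) for d in department for loc in d["dlocations"]
]

print(dept)
print(dept_locations)
```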
38
First Normal Form (1NF)
• Based on the concept of full functional dependency
• Analogy to the traditional justice oath:
  – Every non-key attribute depends on a key, the whole key, and nothing but the key
• R is in 2NF iff
  – R is in 1NF
  – Every non-key attribute is fully dependent on the PK
41
Second Normal Form (2NF)
• Based on the concept of transitive dependency
• Relation R is in 3NF iff
  – R is in 2NF
  – Every non-key attribute is non-transitively dependent on the PK
44
Third Normal Form
• Formal definition
  – R is in 3NF if, whenever a nontrivial functional dependency X → Y holds, either
    • X is a superkey, or
    • Y is a prime attribute
• E.g.,
  – LOTS2 in Fig. 15.12(b) is in 3NF
  – LOTS1 in Fig. 15.12(b) is NOT in 3NF (because of FD4)
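The formal definition can be checked mechanically with attribute closures. The Python sketch below is illustrative only: the helper names are hypothetical, and the attribute lists and FDs for LOTS1 and LOTS2 are assumptions, since the cited figure is not reproduced in the slides. It tests only the FDs that are listed.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(R, fds):
    """Minimal attribute sets whose closure is all of R (brute force)."""
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            s = set(combo)
            if closure(s, fds) == R and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def violations_3nf(R, fds):
    """Listed FDs X -> A where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(R, fds))
    bad = []
    for lhs, rhs in fds:
        for a in rhs - lhs:  # skip the trivial part of the FD
            if closure(lhs, fds) != R and a not in prime:
                bad.append((lhs, a))
    return bad

# Assumed schema and FDs for LOTS1 (not given in the slides)
LOTS1 = frozenset({"Property_id", "County_name", "Lot", "Area", "Price"})
LOTS1_FDS = [
    (frozenset({"Property_id"}), frozenset({"County_name", "Lot", "Area", "Price"})),
    (frozenset({"County_name", "Lot"}), frozenset({"Property_id", "Area", "Price"})),
    (frozenset({"Area"}), frozenset({"Price"})),  # FD4
]
# Assumed schema and FD for LOTS2
LOTS2 = frozenset({"County_name", "Tax_rate"})
LOTS2_FDS = [(frozenset({"County_name"}), frozenset({"Tax_rate"}))]

print(violations_3nf(LOTS1, LOTS1_FDS))  # FD4 is reported
print(violations_3nf(LOTS2, LOTS2_FDS))  # no violation
```

The brute-force key search is exponential in the number of attributes, which is acceptable only for small teaching examples.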
46
Interpretation of 3NF
Alternative definition of 3NF
• A relation schema R is in 3NF if every nonprime attribute of R satisfies the following conditions:
  – It is fully functionally dependent on every key of R
  – It is non-transitively dependent on every key of R
51
• Boyce-Codd normal form
  – A stricter normal form than 3NF
    • If R is in BCNF, then R is also in 3NF
    • R being in 3NF does not imply that R is in BCNF
  – Attempts to eliminate redundancy not detectable by 3NF
52
Boyce/Codd NF
Example
• Suppose we have thousands of lots in the relation, but the lots are from only two counties
  – DeKalb and Fulton
• Say lot sizes in
  – DeKalb are 0.5, …, 1.0 acres
  – Fulton are 1.1, 1.2, …, 1.9, 2.0 acres
• Also assume that
  – FD5: Area → County_Name
53
• A relation R is in Boyce/Codd normal form (BCNF) iff
  – Every determinant is a CK
    • (i.e., each attribute MUST describe the key, the whole key, and nothing but the key)
• Eliminates redundancy caused by functional dependencies (GOOD)
• Considered the most desirable NF
55
Boyce/Codd NF (Cont'd)
• Consider a relation TEACH with
  – FD1: {Student, Course} → Instructor
  – FD2: Instructor → Course
• The relation is in 3NF
• Is it in BCNF? No: Instructor is a determinant (FD2) but not a candidate key
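The BCNF test for TEACH can be sketched in Python as follows. The function names are hypothetical; the schema and the two FDs are the ones stated on the slide.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(R, fds):
    """Nontrivial listed FDs whose left-hand side is not a superkey of R."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != R
    ]

TEACH = frozenset({"Student", "Course", "Instructor"})
FD1 = (frozenset({"Student", "Course"}), frozenset({"Instructor"}))
FD2 = (frozenset({"Instructor"}), frozenset({"Course"}))

violations = bcnf_violations(TEACH, [FD1, FD2])
print(violations)  # only FD2: Instructor is not a superkey
```

FD2 does not break 3NF because its right-hand side, Course, is a prime attribute (it belongs to the key {Student, Course}).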
56
Example
BCNF Example: Semantics
• A student can take more than one course
• But a student has a different instructor for each course.
• Each instructor (non-key) teaches only one course (partial key).
57
• Possible decompositions are
  1. {Student, Instructor} and {Student, Course}
  2. {Course, Instructor} and {Course, Student}
  3. {Instructor, Course} and {Instructor, Student}
• Which of the decompositions is better? Justify it.
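One way to compare the three candidates is the standard test for a nonadditive (lossless) binary decomposition: the common attributes must functionally determine all of R1 − R2 or all of R2 − R1. The Python sketch below applies that test; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(R1, R2, fds):
    """Binary nonadditive-join test on the shared attributes of R1 and R2."""
    determined = closure(R1 & R2, fds)
    return (R1 - R2) <= determined or (R2 - R1) <= determined

FDS = [
    (frozenset({"Student", "Course"}), frozenset({"Instructor"})),  # FD1
    (frozenset({"Instructor"}), frozenset({"Course"})),             # FD2
]

decompositions = [
    ({"Student", "Instructor"}, {"Student", "Course"}),
    ({"Course", "Instructor"}, {"Course", "Student"}),
    ({"Instructor", "Course"}, {"Instructor", "Student"}),
]

results = [is_lossless(r1, r2, FDS) for r1, r2 in decompositions]
print(results)
```

Under this test only the third decomposition is nonadditive, because Instructor → Course. None of the three keeps FD1 inside a single relation, since FD1 involves all three attributes.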
59
More on Example
Instructor-course Table
Instructor Course
Mark Database
Navathe Database
Schulman Theory
Ahmand OS
Omiecinski Database
Ammar OS
60
Instructor-student Table
Instructor Student
Mark Narayan
Mark Wallace
Navathe Smith
Navathe Zelaya
Ammar Smith
Ammar Narayan
Schulman Smith
Ahmand Wallace
Omiecinski Wong
61
• Decomposition: pros and cons
  – Makes answering complex queries less efficient, because additional joins must be performed at query time (BAD)
  – May increase storage requirements if the degree of redundancy is very low (BAD)
  – May decrease storage requirements if the degree of redundancy is very high (GOOD)
  – Makes simple update transactions more efficient (GOOD)
62
To decompose or not to decompose?
Multivalued Dependency and Fourth Normal Form
• We discussed the concept of functional dependency (FD)
• Another constraint that cannot be specified as a functional dependency is the
  – multivalued dependency (MVD), on which fourth normal form is based
• MVDs are a direct consequence of first normal form (1NF), which disallows an attribute in a tuple from having a set of values
• They arise when two or more independent multivalued attributes appear in the same relation schema
  – i.e., a relation that combines multiple 1:N relationships
63
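• A multivalued dependency (MVD) X →→ Y on R,
  – where X, Y ⊆ R and Z = R − (X ∪ Y), specifies the following condition on r(R): if two tuples t1 and t2 exist in r with t1[X] = t2[X], then two tuples t3 and t4 must also exist in r such that
    • t3[X] = t4[X] = t1[X] = t2[X]
    • t3[Y] = t1[Y] and t4[Y] = t2[Y]
    • t3[Z] = t2[Z] and t4[Z] = t1[Z]
• Normalizing to 4NF typically involves eliminating MVDs by repeated binary decompositions as well.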
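The tuple conditions above can be checked on a relation instance. The following is a minimal Python sketch; the function name `mvd_holds`, the attribute names, and the sample rows are hypothetical. As with FDs, an instance can only refute an MVD.

```python
def mvd_holds(rows, attrs, X, Y):
    """Check X ->> Y on an instance. rows: set of tuples, attrs: names in tuple order."""
    idx = {a: i for i, a in enumerate(attrs)}
    Z = [a for a in attrs if a not in X and a not in Y]

    def proj(t, names):
        return tuple(t[idx[a]] for a in names)

    # Represent each tuple by its (X, Y, Z) projections
    present = {(proj(t, X), proj(t, Y), proj(t, Z)) for t in rows}
    for (x1, y1, _z1) in present:
        for (x2, _y2, z2) in present:
            # t1 and t2 agree on X, so t3 = (X, t1[Y], t2[Z]) must exist;
            # t4 is covered when the pair is visited in the other order
            if x1 == x2 and (x1, y1, z2) not in present:
                return False
    return True

attrs = ["ename", "pname", "dname"]
emp = {
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
}

print(mvd_holds(emp, attrs, ["ename"], ["pname"]))
print(mvd_holds(emp - {("Smith", "Y", "John")}, attrs, ["ename"], ["pname"]))
```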
64
Formal Definition of Multivalued Dependency
Join Dependencies (JD) and Fifth Normal Form (Project-Join)
• Join dependency
  – A constraint on the set of legal relations over a database schema
  – A table T is subject to a join dependency if T can always be recreated by joining multiple tables, each having a subset of the attributes of T
  – The join operation must satisfy the lossless (or nonadditive) join property
• A very specific semantic constraint that is very difficult to detect in practice
  – there is no sound and complete axiomatization for join dependencies
66
Example (JD)
• Suppose that the following additional constraint always holds:
  – Whenever a supplier s supplies part p,
  – and a project j uses part p,
  – and the supplier s supplies at least one part pi to project j,
  – then supplier s will also be supplying part p to project j.
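The constraint above is equivalent to saying that the relation equals the join of its three binary projections. The Python sketch below checks this on an instance; the function name `satisfies_jd`, the (supplier, part, project) tuple layout, and the sample rows are hypothetical.

```python
def satisfies_jd(supply):
    """supply: set of (supplier, part, project) tuples."""
    sp = {(s, p) for (s, p, j) in supply}  # supplier supplies part
    sj = {(s, j) for (s, p, j) in supply}  # supplier supplies something to project
    pj = {(p, j) for (s, p, j) in supply}  # project uses part
    # Join the three projections back together
    rejoined = {
        (s, p, j)
        for (s, p) in sp
        for (p2, j) in pj
        if p == p2 and (s, j) in sj
    }
    return rejoined == supply

supply = {
    ("Smith", "Bolt", "ProjX"),
    ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"),
}

# Smith supplies Bolt, ProjY uses Bolt, and Smith supplies ProjY,
# so the constraint requires (Smith, Bolt, ProjY) as well
print(satisfies_jd(supply))
print(satisfies_jd(supply | {("Smith", "Bolt", "ProjY")}))
```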
67