Upload
elijah-jacobs
View
218
Download
0
Tags:
Embed Size (px)
Citation preview
Data Normalization
• Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data.
• The process of decomposing relations with anomalies to produce smaller, well-structured relations.
• Primary Objective: Reduce Redundancy,Reduce nulls,
• Improve “modify” activities:– insert, – update, – delete, – but not read
• Price: degraded query, display, reporting
Functional Dependency and Keys
• Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
• Candidate Key: Each non-key field is functionally dependent on every candidate key.
Normalization – most used
• Four most commonly used normal forms are first (1NF), second (2NF) and third (3NF) normal forms, and Boyce–Codd normal form (BCNF).
• Based on functional dependencies among the attributes of a relation.
• A relation can be normalized to a specific form to prevent possible occurrence of update anomalies.
First Normal Form
• No multi-valued attributes.• Every attribute value is atomic.
• Why are the following tables not in 1NFEmployee (ssn, Name, Salary, Address, ListOfSkills)
Department (Did, Dname, ssn)
Second Normal Form
• 1NF and every non-key attribute is fully functionally dependent on the primary key.
• Every non-key attribute must be defined by the entire key, not by only part of the key.
• No partial functional dependencies.
Assuming that we have a composite PK (LicensePlate, OwnerSSN) for the Vechicle Table below, why is the table not in 2NF ?
Vehicle (LicensePlate, Brand, Model, PurchasePrice, Year, OwnerSSN, OwnerName)
Third Normal Form & BCNF
• 2NF and no transitive dependencies (or no functional dependency between non-key attributes = BCNF)
Why are the following tables not in 3NF or BCNF ?
• Why is Employee [ssn, name, salary, did, dname]
• Customer
3NF & BCNF
• It is very rare for a Table to be in 3NF and not be in BCNF (violation of BCNF).
• Given a Relation R with attributes A, B and C where A and B are together the composite PK,
IF A, B -> C and C -> BTHEN R is in 3NF and is not in BCNF
Example: Student, course -> InstructorInstructor -> Course
Steps in Normalization
• 1NF: a table, without multivalued attributes– if not, then decompose
• 2NF: 1NF and every non-key attribute is fully functionally dependent on the primary key– if not, then decompose
• 3NF: 2NF and no transitive dependencies– if not, then decompose
• GENERAL: – Each table should describe a single theme– Modification anomalies are minimized
Hint: THE KEY, THE WHOLE KEY AND NOTHING BUT THE KEY
Normalized Tables Must:
• Two important properties of decomposition:- Lossless-join property enables us to find any instance of original relation from corresponding instances in the smaller relations. - Dependency preservation property enables us to enforce a constraint on original relation by enforcing some constraint on each of the smaller relations.
4NF
• Dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, set of values for B and C are independent of each other.
Matching
• 1) A Weak Entity _____ a) skills• 2) A Derived Attribute _____ b) ssn• 3) A Composite Attribute _____ c) Monthly Sales• 4) A Multi-Valued Attribute_____ d) full name• 5) Primary Key _______ e) Section of a Course
Which tables are in 3NF ?
Employee (ssn, Name, Salary, pid) (obs.: employee can only work in one project)
Employee (ssn, Name, Salary, pid) (obs.: employee can work in many projects)
Vehicle (OwnerSSN, OwnerName, OwnerAddress, LicensePlate, Brand, Model, PurchasePrice, Year)(obs.: an owner can have several cars)
Project (pid, pname, did) (pid determines did)
Project (pid, pname, ssn) (ssn determines pid)
Normalize to 3NF
• ClientName -> Address, EmpNo -> Name, Service -> AmountDue• ClientName and Date -> Service, ClientName and Date -> EmpNo