
Module 6: Data Normalization

Module Introduction

Module Overview

This module discusses data normalization and the physical data model. It explains data normalization and the rules of first, second, and third normal form, and discusses the physical data model and why, during design and implementation, the database designer may denormalize a logical data model in order to maximize performance.

Normalization

Data Normalization

Data normalization is a method of organizing data to reduce or eliminate insertion, deletion, and update anomalies. Data normalization is done by making sure we have one fact in one place, and that all attributes are fully dependent on the unique identifier. This reduces data redundancy and increases data integrity.

Normalization of relational databases has developed a reputation as a difficult, complex, and highly technical operation that must be completed by very experienced physical database experts. With this in mind, some data modelers have avoided data normalization. Normalization, however, is no more complex than any other data modeling operation.

Normalization has more to do with understanding and analyzing the business data requirements and the implementation of the business rules than it has to do with understanding relational theory. You are more likely to be successful creating a data model if you understand the business than if you are a database product expert or an experienced data modeler.

There are two basic concepts of data normalization:

• Each attribute in an entity must be functionally dependent on the whole key (unique identifier).
• Each attribute should be named appropriately, according to business and relational conventions, and must be present in only one entity. Attributes should only migrate to other entities under the rules of foreign keys or data denormalization.
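The first concept, functional dependence on the whole key, can be checked against sample data. The sketch below is only an illustration: the function name, the rows, and the attribute names are assumptions made for this example. Note that sample data can disprove a dependency but cannot prove one; the business rules decide.

```python
def is_functionally_dependent(rows, key, attribute):
    """Return True if each distinct key value maps to exactly one attribute value."""
    seen = {}
    for row in rows:
        key_value = tuple(row[name] for name in key)
        if key_value in seen and seen[key_value] != row[attribute]:
            return False  # same key value, two different attribute values
        seen[key_value] = row[attribute]
    return True

# Hypothetical order-line rows (not taken from the course material).
rows = [
    {"order_number": 1001, "product_number": "P10", "product_name": "Widget", "product_quantity": 2},
    {"order_number": 1001, "product_number": "P20", "product_name": "Gadget", "product_quantity": 1},
    {"order_number": 1002, "product_number": "P10", "product_name": "Widget", "product_quantity": 5},
]

whole_key = ("order_number", "product_number")
part_of_key = ("product_number",)

# Quantity needs the whole key; the name is already determined by part of it.
quantity_on_whole_key = is_functionally_dependent(rows, whole_key, "product_quantity")
quantity_on_part = is_functionally_dependent(rows, part_of_key, "product_quantity")
name_on_part = is_functionally_dependent(rows, part_of_key, "product_name")
print(quantity_on_whole_key, quantity_on_part, name_on_part)
```

An attribute that is determined by only part of the key, as the product name is here, is a candidate for moving to another entity.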

Module 6: Data Normalization

© ESI International BAP:DWL:EN:000 ver. 2.0 57


First Normal Form

The example below illustrates first normal form (in other words, no multivalued attributes).

In this example, the attributes Product Number, Product Name, Product Quantity, and Product Price in Entity 1 make up a multivalued (repeating) group and must be moved to a new entity.

The minimum cardinality of the relationship to Entity 2 has been shown as “one” because it makes sense (knowing the outcome) that an order must be for at least one product. However, the normalization process itself does not dictate what the minimum cardinality should be.
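The move to first normal form can be sketched in a few lines. The attribute names follow the example, but the sample values, the "products" list, and the function name are assumptions made for illustration.

```python
# Hypothetical unnormalized order: the "products" list is the multivalued group.
entity_1 = {
    "order_number": 1001,
    "customer_number": 1,
    "customer_name": "Acme Ltd",
    "products": [
        {"product_number": "P10", "product_name": "Widget",
         "product_quantity": 2, "product_price": 4.50},
        {"product_number": "P20", "product_name": "Gadget",
         "product_quantity": 1, "product_price": 9.00},
    ],
}

def to_first_normal_form(order):
    """Split an order into a single-valued row and one Entity 2 row per product."""
    order_row = {name: value for name, value in order.items() if name != "products"}
    product_rows = [
        # Each Entity 2 row carries the order number so it can be tied back to its order.
        {"order_number": order["order_number"], **product}
        for product in order["products"]
    ]
    return order_row, product_rows

order_row, product_rows = to_first_normal_form(entity_1)
print(order_row)
for row in product_rows:
    print(row)
```

After the split, every attribute in both entities holds a single value.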

Second Normal Form

The example below illustrates second normal form (in other words, the entity is already in first normal form and all its attributes are fully dependent on the concatenated unique identifier).


In the example, Product Name and Product Price in Entity 2 are not fully dependent on the concatenated unique identifier (they depend on Product Number alone) and must be moved to a new entity.

As before, the minimum cardinality of the relationship from Entity 3 to Entity 2 is shown as zero based on knowledge of the outcome and the business rules it reflects. The normalization process itself does not dictate it.
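The same step can be sketched in code, assuming Entity 2 is identified by Order Number plus Product Number. The sample rows and the function name are assumptions made for illustration.

```python
# Hypothetical Entity 2 rows, already in first normal form.
entity_2 = [
    {"order_number": 1001, "product_number": "P10", "product_name": "Widget",
     "product_quantity": 2, "product_price": 4.50},
    {"order_number": 1001, "product_number": "P20", "product_name": "Gadget",
     "product_quantity": 1, "product_price": 9.00},
    {"order_number": 1002, "product_number": "P10", "product_name": "Widget",
     "product_quantity": 5, "product_price": 4.50},
]

def to_second_normal_form(rows):
    """Move attributes that depend only on product_number into a new entity."""
    entity_3 = {}        # one row per product, keyed by product_number
    remaining_rows = []  # Entity 2 keeps only what depends on the whole identifier
    for row in rows:
        entity_3[row["product_number"]] = {
            "product_number": row["product_number"],
            "product_name": row["product_name"],
            "product_price": row["product_price"],
        }
        remaining_rows.append({
            "order_number": row["order_number"],
            "product_number": row["product_number"],
            "product_quantity": row["product_quantity"],
        })
    return remaining_rows, list(entity_3.values())

order_lines, products = to_second_normal_form(entity_2)
print(len(order_lines), "order lines,", len(products), "products")
```

The name and price of product P10 are now stored once, however many orders include it.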

Third Normal Form

The following diagram illustrates third normal form (in other words, the entity is already in second normal form and all its attributes are fully dependent on the unique identifier [no transitive dependencies]).

In this example, Customer Name in Entity 1 is not dependent on Order Number. It is dependent on Customer Number (a transitive dependency) and must be moved to a new entity. Customer Number must also be moved because it would be a foreign key.
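A minimal sketch of removing the transitive dependency follows. The sample rows and the function name are assumptions. In this sketch the order rows keep customer_number purely as a foreign key, which is one common way a relational implementation records the relationship to the new entity.

```python
# Hypothetical Entity 1 rows: customer_name depends on customer_number,
# which in turn depends on order_number (a transitive dependency).
entity_1 = [
    {"order_number": 1001, "customer_number": 1, "customer_name": "Acme Ltd"},
    {"order_number": 1002, "customer_number": 1, "customer_name": "Acme Ltd"},
    {"order_number": 1003, "customer_number": 2, "customer_name": "Birch & Co"},
]

def to_third_normal_form(rows):
    """Move customer attributes into their own entity, identified by customer_number."""
    customers = {}
    orders = []
    for row in rows:
        customers[row["customer_number"]] = {"customer_name": row["customer_name"]}
        orders.append({
            "order_number": row["order_number"],
            "customer_number": row["customer_number"],  # foreign key only
        })
    return orders, customers

orders, customers = to_third_normal_form(entity_1)
print(orders)
print(customers)
```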

The Normalized Result

The diagram below illustrates additional changes that should be made to the third normal form example shown in the previous section.

• The entities and relationships have been given meaningful names. (Remember that entity names must reflect normal business conventions.)
• The unique identifier of ORDER ITEM has been changed to use an attribute Order Item Number instead of Product Number. Using Product Number would have restricted the ability to have the same product in an order more than once (for example, same product, different color).
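The effect of the identifier change can be demonstrated with Python's built-in sqlite3 module. This is a sketch under assumptions: the table and column names are invented for the example, the color column is hypothetical, and ORDER ITEM is assumed to be identified by Order Number together with Order Item Number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Identifier built from the product: one row per product per order.
    CREATE TABLE order_item_by_product (
        order_number   INTEGER NOT NULL,
        product_number TEXT    NOT NULL,
        color          TEXT,
        PRIMARY KEY (order_number, product_number)
    );
    -- Identifier built from an item number: the product may repeat.
    CREATE TABLE order_item (
        order_number      INTEGER NOT NULL,
        order_item_number INTEGER NOT NULL,
        product_number    TEXT    NOT NULL,
        color             TEXT,
        PRIMARY KEY (order_number, order_item_number)
    );
""")

def count_accepted(sql, rows):
    """Insert rows one at a time and return how many the database accepted."""
    accepted = 0
    for row in rows:
        try:
            conn.execute(sql, row)
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # rejected: duplicate unique identifier
    return accepted

accepted_by_product = count_accepted(
    "INSERT INTO order_item_by_product VALUES (?, ?, ?)",
    [(1001, "P10", "red"), (1001, "P10", "blue")],
)
accepted_by_item_number = count_accepted(
    "INSERT INTO order_item VALUES (?, ?, ?, ?)",
    [(1001, 1, "P10", "red"), (1001, 2, "P10", "blue")],
)
print(accepted_by_product, accepted_by_item_number)
```

With Product Number in the identifier, the second line for the same product is rejected. With Order Item Number, both lines are stored.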


The Normalization Rules

The table below summarizes the rules of normalization.

First Normal Form (1NF): An entity is in first normal form if it has no multivalued attributes.

Second Normal Form (2NF): An entity is in second normal form if it is already in first normal form, and all its attributes are fully dependent on the whole of the concatenated unique identifier.

Third Normal Form (3NF): An entity is in third normal form if it is already in second normal form, and all its attributes are dependent directly on the unique identifier. No attribute outside the unique identifier may "determine" any other attribute (no transitive dependencies).

Most business-oriented, transactional databases are normalized to the third normal form. Other kinds of databases, most of which are decision support and data warehouse oriented, do not follow the normalization rules and are subject to a different discussion. These databases tend to be highly denormalized and use a dimensional design such as a star schema (or its partly normalized variant, the snowflake schema) instead of a fully normalized schema.

The Physical Data Model

Introduction

The physical data model is a logical model implemented in a specific database management product (for example, Sybase, Oracle, Informix, and so on) in a specific installation.

The physical data model specifies implementation details that may be features of a particular product or version. It also specifies configuration choices for that database instance. These include index construction, decisions on primary and foreign keys, methods of referential integrity, constraints, views, and physical storage objects such as table spaces.
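Several of these physical choices can be seen in a small sketch. SQLite, through Python's built-in sqlite3 module, is used here only because it needs no installation; it is not one of the products named above, and the table, index, and view names are assumptions. Product-specific objects such as table spaces have no SQLite equivalent and are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Product-specific configuration choice: SQLite enforces foreign keys
# only when this setting is switched on for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        customer_number INTEGER PRIMARY KEY,
        customer_name   TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        order_number    INTEGER PRIMARY KEY,
        customer_number INTEGER NOT NULL
            REFERENCES customer (customer_number)
    );
    -- Index to support lookups of orders by customer.
    CREATE INDEX idx_order_customer ON customer_order (customer_number);
    -- View that presents each order with its customer's name.
    CREATE VIEW order_with_customer AS
        SELECT o.order_number, c.customer_name
        FROM customer_order AS o
        JOIN customer AS c USING (customer_number);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme Ltd')")
conn.execute("INSERT INTO customer_order VALUES (1001, 1)")

try:
    # Customer 99 does not exist, so referential integrity rejects the row.
    conn.execute("INSERT INTO customer_order VALUES (1002, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

view_rows = conn.execute("SELECT * FROM order_with_customer").fetchall()
print(orphan_rejected, view_rows)
```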

Reverse Engineering

At times, the database designer may need to convert the physical data model into a logical data model. This is called "reverse engineering."

This normally takes place by reading in a database description (called a database schema) and generating a logical entity relationship diagram (ERD). Most database modeling tools provide this capability.
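The idea can be sketched against a SQLite database, whose schema can be read through the table_info and foreign_key_list pragmas. This is a simplified illustration, not how any particular modeling tool works; the function name, the output structure, and the sample schema are assumptions.

```python
import sqlite3

def reverse_engineer(conn):
    """Read a SQLite schema and return (entities, relationships)."""
    table_names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    entities, relationships = {}, []
    for table in table_names:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        key_columns = sorted((c for c in columns if c[5] > 0), key=lambda c: c[5])
        entities[table] = {
            "attributes": [c[1] for c in columns],
            "unique_identifier": [c[1] for c in key_columns],
        }
        # foreign_key_list rows: (id, seq, table, from, to, ...)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            relationships.append((table, fk[3], fk[2], fk[4]))
    return entities, relationships

# Hypothetical physical schema to read back.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_number INTEGER PRIMARY KEY,
        customer_name   TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        order_number    INTEGER PRIMARY KEY,
        customer_number INTEGER NOT NULL REFERENCES customer (customer_number)
    );
""")

entities, relationships = reverse_engineer(conn)
for name, details in entities.items():
    print(name, details)
print(relationships)
```

Each entry in relationships reads as (child table, foreign key column, parent table, parent column), which is enough to draw the relationship lines of an ERD.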

This is not normally the responsibility of the business analyst.


The Database Designer

Designing the physical data model is the responsibility of the database designer. The business analyst is not usually the database designer.

Unlike the business analyst, the database designer must deal with physical constraints, limitations, and performance issues. These concerns may lead the database designer to “denormalize” some parts of the logical data model.

Denormalization

Denormalization is a well-known and well-understood method of improving the performance of a relational database. It is also used to improve the performance of queries and reports that are run against the database. Denormalization, when performed with thought and careful attention to details, works well. However, every denormalization incurs a certain amount of risk of creating update and insert anomalies if anything goes wrong. The risk must be balanced against performance gains.

There are a number of approaches to denormalization. One common approach is to carry redundant data in more than one database table. This avoids having to “join” data from separate tables, which may cause performance problems. However, it creates more complexity in data maintenance and causes potential problems of data integrity.
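The maintenance cost of redundant data can be shown with a small sketch. The data and function names below are assumptions made for illustration: the customer name is carried redundantly on every order so that reports need no join, and every update must then keep all copies in step.

```python
# Hypothetical denormalized data: customer_name is stored in two places.
customers = {1: {"customer_name": "Acme Ltd"}}
orders = [
    {"order_number": 1001, "customer_number": 1, "customer_name": "Acme Ltd"},
    {"order_number": 1002, "customer_number": 1, "customer_name": "Acme Ltd"},
]

def rename_customer_naive(customer_number, new_name):
    """Update only the customer record and forget the redundant copies."""
    customers[customer_number]["customer_name"] = new_name

def rename_customer_consistent(customer_number, new_name):
    """Update the customer record and every redundant copy on the orders."""
    customers[customer_number]["customer_name"] = new_name
    for order in orders:
        if order["customer_number"] == customer_number:
            order["customer_name"] = new_name

def stale_orders():
    """Return the order numbers whose copied name no longer matches the customer."""
    return [
        order["order_number"]
        for order in orders
        if order["customer_name"] != customers[order["customer_number"]]["customer_name"]
    ]

rename_customer_naive(1, "Acme Corporation")
stale_after_naive = stale_orders()       # update anomaly: copies are out of date
rename_customer_consistent(1, "Acme Corporation")
stale_after_consistent = stale_orders()  # all copies agree again
print(stale_after_naive, stale_after_consistent)
```

The report that reads the orders is simpler and needs no join, but a single missed update leaves the data inconsistent.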

Module Summary

Module Summary

• Data normalization is concerned with ensuring that one fact is in one place and that all attributes are fully dependent on the unique identifier.
• Its objective is to reduce data redundancy and increase data integrity.
• If an attribute violates one of the normalization rules, it must be removed to a new entity.
• During design and implementation, a database designer may denormalize a logical data model to maximize performance.
