Ifsm410 Normal (1)

  • 7/31/2019 Ifsm410 Normal (1)

    1/21

    99978523.doc

    Lecture Normalization

    (I modified the Course Module on Normalization)

    ______________________________________________________

    Outline

    Normalization of Database Tables

    Functional Dependency

    Data Redundancy

    Normal Forms

    First Normal Form (1NF)

    Second Normal Form (2NF)

    Third Normal Form (3NF)

    Invoice Database Normalization (3NF) Example

    Database Design

    Example Homework Problems

    Student Homework

    Normalization of Database Tables

    Good database design must be matched to good table structures. Good table structures are evaluated and designed to control data redundancies, thereby avoiding data anomalies. The process that yields such desirable results is known as normalization.

    Functional Dependency

    A primary key uniquely identifies one and only one row in a table. A functional dependency exists between a primary key and a unique row in a table because the primary key guarantees uniqueness.

    A more formal definition of functional dependency is written A → B: B is functionally dependent on A if A determines B. In plain English, this means each value of A determines one and only one value of B.

    We make a dependency diagram to show all functional dependencies within a table. Examine the dependency diagram in figure 2.6, where C1 and C3 constitute the composite primary key because together they uniquely identify the entire tuple (remember, the formal term for a row or record), that is, all five attributes.

    Figure 2.6 Dependency Diagram (1NF)

    Table 1:

    Primary key: C1, C3

    Foreign key: None

    Normal form: 1NF

    C1, C3 → C2, C4, C5 represents a functional dependency because C2, C4, and C5 depend on the primary key composed of C1 and C3.

    C1 → C2 is a special case of a functional dependency referred to as a partial dependency because C2 depends only on C1 rather than on the entire primary key composed of C1 and C3.

    C4 → C5 is a special case of a functional dependency referred to as a transitive dependency because C5 depends on an attribute (C4) that is not part of the primary key.
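    The definition "A determines one and only one value of B" can be checked mechanically against sample rows. The sketch below is only an illustration: the function name holds and the three sample rows are hypothetical, and a dependency that holds in a sample is evidence for it, not proof that it holds as a business rule.

```python
def holds(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to exactly one combination of rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key; a later mismatch breaks the dependency
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical sample rows for the five-attribute table of figure 2.6
rows = [
    {"C1": 1, "C2": "x", "C3": 10, "C4": "p", "C5": "P"},
    {"C1": 1, "C2": "x", "C3": 20, "C4": "q", "C5": "Q"},
    {"C1": 2, "C2": "y", "C3": 10, "C4": "p", "C5": "P"},
]

print(holds(rows, ["C1", "C3"], ["C2", "C4", "C5"]))  # dependency on the whole primary key
print(holds(rows, ["C1"], ["C2"]))                    # the partial dependency C1 -> C2
print(holds(rows, ["C4"], ["C5"]))                    # the transitive dependency C4 -> C5
print(holds(rows, ["C1"], ["C4"]))                    # C1 alone does not determine C4 here
```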

    Data Redundancy

    Redundant data occur in more than one place, creating a strong probability of inconsistency (anomalies) in updates, additions, and deletions of the redundant data. For example, redundant data may be updated in one place but overlooked in another. The most common anomalies discussed in the text when data redundancy exists are update anomalies, addition anomalies, and deletion anomalies.

    In addition, data redundancy causes data integrity problems when data entry fails to conform to the rule that all copies of redundant data must be equal.

    We can avoid all these difficulties through normalization.
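    An update anomaly is easy to reproduce on a small scale. In the sketch below, a vendor name is stored redundantly in every product row, then updated in one row only. The rows, the helper name inconsistent_values, and the changed vendor name are all hypothetical and serve only to illustrate the idea.

```python
def inconsistent_values(rows, key, attr):
    """Return the key values whose redundant copies of attr disagree."""
    seen = {}
    for row in rows:
        seen.setdefault(row[key], set()).add(row[attr])
    return sorted(k for k, values in seen.items() if len(values) > 1)


# Hypothetical rows: the vendor name is repeated for every product of vendor 211
rows = [
    {"PROD_NUM": "P1", "VEND_CODE": 211, "VEND_NAME": "Never Fail, Inc."},
    {"PROD_NUM": "P2", "VEND_CODE": 211, "VEND_NAME": "Never Fail, Inc."},
    {"PROD_NUM": "P3", "VEND_CODE": 309, "VEND_NAME": "Hypothetical Tools"},
]

before = inconsistent_values(rows, "VEND_CODE", "VEND_NAME")

# Update one copy of the redundant data and overlook the other
rows[0]["VEND_NAME"] = "NeverFail Corp."
after = inconsistent_values(rows, "VEND_CODE", "VEND_NAME")

print(before, after)
```

    Before the update no vendor code has conflicting names; afterward vendor 211 does, which is exactly the inconsistency that storing the name once would rule out.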

    Normal Forms

    Normalization is a technique for designing tables in which data redundancies are minimized by assigning attributes to entities. If the normalization process works properly, it eliminates uncontrolled data redundancies, getting rid of both the data anomalies and the data integrity problems. It's important to realize that normalization does not eliminate data redundancy; it produces carefully controlled redundancies used to link database tables to form relationships.

    The first three normal forms (1NF, 2NF, and 3NF) are most commonly encountered. From a structural point of view, higher normal forms are better than lower ones because higher normal forms yield fewer data redundancies: 3NF is better than 2NF, which is better than 1NF. Almost all business designs use 3NF as the ideal normal form. A special, more restricted 3NF is known as Boyce-Codd normal form (BCNF).

    Let's look at each of the normal forms in turn.

    First Normal Form (1NF)

    A table is in 1NF when all the key attributes are defined and all remaining attributes are dependent on the primary key. When there are repeating groups, the primary key generally needs to be expanded to reach 1NF.

    However, a table in 1NF can still contain both partial and transitive dependencies. A partial dependency is one in which an attribute is functionally dependent on only a part of a multiattribute primary key. A transitive dependency is one in which one attribute is functionally dependent on another non-key attribute. Naturally, a table with a single-attribute primary key can't exhibit partial dependencies.

    The table in the dependency diagram of figure 2.6 is in 1NF.
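    The two definitions above amount to a comparison between a dependency's determinant and the primary key. A minimal sketch, assuming the determinant is given as a list of attribute names (the function name classify and the labels it returns are illustrative choices, not terms from the text):

```python
def classify(primary_key, determinant):
    """Classify a dependency by comparing its determinant with the primary key."""
    pk, det = set(primary_key), set(determinant)
    if not det:
        raise ValueError("determinant must not be empty")
    if det == pk:
        return "full"          # depends on the entire primary key
    if det < pk:
        return "partial"       # depends on only part of a multiattribute key
    if det.isdisjoint(pk):
        return "transitive"    # depends on non-key attribute(s)
    return "other"             # mixes key and non-key attributes


print(classify(["C1", "C3"], ["C1", "C3"]))  # full
print(classify(["C1", "C3"], ["C1"]))        # partial
print(classify(["C1", "C3"], ["C4"]))        # transitive
```

    With a single-attribute primary key the "partial" branch can never be reached, which matches the remark that such a table can't exhibit partial dependencies.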

    Second Normal Form (2NF)

    A table is in 2NF when it's in 1NF and contains no partial dependencies. Therefore, a 1NF table is automatically in 2NF if its primary key is based on only a single attribute.

    A table in 2NF may still contain transitive dependencies. Look at the dependency diagram in figure 2.7 for a database whose tables are at least in 2NF.

    Figure 2.7 Dependency Diagram (2NF)

    Remove partial dependencies by creating new tables. To do this, write each primary key component on a separate line, followed by a line containing the original primary key (C1, C3). Each of these keys potentially starts a new table.

    Write the dependent attributes after each new key. Because no attribute is dependent on C3, a table never materializes for the primary key C3.

    Table 1 is in 3NF because it's in 2NF (no partial dependencies) and contains no transitive dependencies. Table 2 is only in 2NF because it contains a transitive dependency, C4 → C5.
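    The procedure just described (one candidate line per key component, plus one for the whole key, each followed by its dependents) can be sketched as follows. The function name and the dictionary layout are hypothetical, and the sketch assumes that table 1 is keyed on C1 and table 2 on (C1, C3), which is how the text reads without the figure. Keeping the composite-key table even when it has no dependents is also an assumption of this sketch.

```python
def remove_partial_dependencies(primary_key, dependencies):
    """Start one candidate table per key component plus one for the whole key."""
    full_key = tuple(primary_key)
    # dict.fromkeys drops the duplicate that appears when the key has a single attribute
    candidates = list(dict.fromkeys([(attr,) for attr in full_key] + [full_key]))
    tables = []
    for key in candidates:
        dependents = list(dependencies.get(key, []))
        # A key component with no dependents never materializes as a table
        if dependents or key == full_key:
            tables.append((key, dependents))
    return tables


# Dependencies of the figure 2.6 table, keyed by determinant
dependencies = {
    ("C1",): ["C2"],              # partial dependency
    ("C1", "C3"): ["C4", "C5"],   # attributes that stay with the whole key
}

tables = remove_partial_dependencies(["C1", "C3"], dependencies)
for key, dependents in tables:
    print(key, "->", dependents)
```

    No table is produced for C3 because nothing depends on it alone, and C5 stays with the composite key for now because its transitive dependency on C4 is dealt with at the 3NF stage.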

    Third Normal Form (3NF)

    A table is in 3NF if it's in 2NF and contains no transitive dependencies. Given this definition of 3NF, the Boyce-Codd normal form (BCNF) is merely a special 3NF case in which every determinant is a candidate key. So, if a table has only a single candidate key, a 3NF table is automatically in BCNF.

    Split a table that is not in 3NF into new tables until all the tables meet the 3NF requirements. Look at the dependency diagram in figure 2.8 below for a database whose tables are at least in 3NF.

    Figure 2.8 Dependency Diagram (3NF)

    Get rid of transitive dependencies by decomposing the table containing the transitive dependency. Here's how:

    Place the attributes that create the transitive dependency, C4 → C5, in a separate table.

    Keep C4 in the original table 2 to create a foreign-key link to the new table 3.

    Make sure that the primary key attribute C4 for the new table 3 is the foreign key in the original table 2.

    After doing this, tables 1, 2, and 3 are all in 3NF because neither partial nor transitive dependencies exist.

    Pause for a moment to look back at normalization from a nontechnical perspective. When normalization begins, our starting point is a "conglomerate" table, one that has lots of themes. We then decompose it into a number of single-theme tables. When you have arrived at a collection of tables each having one theme, you probably will have achieved 3NF.
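    The three decomposition steps for the transitive dependency C4 → C5 can be sketched in a few lines. The function name split_transitive and the dictionary layout are hypothetical, and the sketch handles a single-attribute determinant only.

```python
def split_transitive(key, attributes, determinant, dependents):
    """Move a transitive dependency into its own table and keep the determinant as a foreign key."""
    # The determinant stays behind in the original table; only its dependents move out
    remaining = [a for a in attributes if a not in dependents]
    original = {
        "primary_key": list(key),
        "attributes": remaining,
        "foreign_key": determinant,
    }
    new_table = {
        "primary_key": [determinant],
        "attributes": list(dependents),
    }
    return original, new_table


# Table 2 of the example: key (C1, C3), non-key attributes C4 and C5, with C4 -> C5
table2, table3 = split_transitive(["C1", "C3"], ["C4", "C5"], "C4", ["C5"])
print(table2)
print(table3)
```

    C4 appears in both results: as a foreign key in table 2 and as the primary key of the new table 3. That is the controlled redundancy that links the two tables.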

    Invoice Database Normalization (3NF) Example

    Here's an INVOICE database. We'll use this data and decompose it to at least 3NF:

    Attribute Name Sample Data

    INV_NUM 211347

    PROD_NUM AA_E3422QW

    SALE_DATE 3/25/96

    PROD_DESCRIPTION D & B rotary sander, 6-in. disk

    VEND_CODE 211

    VEND_NAME Never Fail, Inc.

    NUM_SOLD 2

    PROD_PRICE $49.95

    We know that a table is in 1NF when all the key attributes are defined and all remaining attributes are dependent on the primary key, and that when there are repeating groups, the primary key generally needs to be expanded to reach 1NF. To eliminate repeating groups in this case, we use INV_NUM, PROD_NUM as the composite primary key.

    However, a table in 1NF can still contain both partial and transitive dependencies. In this case, the partial dependencies are INV_NUM → SALE_DATE and PROD_NUM → PROD_PRICE. The transitive dependency is VEND_CODE → VEND_NAME.

    Look at the INVOICE database table (1NF) in figure 2.9.

    Figure 2.9 Invoice Database (1NF)

    Table 1:

    Primary key: INV_NUM, PROD_NUM

    Foreign key: None

    Normal form: 1NF

    We know that a table is in 2NF when it's in 1NF and contains no partial dependencies. So we create three tables from the individual components of the primary key and the composite primary key. These keys start the new tables as primary keys with all their dependent attributes listed in their respective tables.

    Transitive dependencies are allowed in 2NF. In this case, one table (2NF) has the transitive dependency VEND_CODE → VEND_NAME. The other tables are already in 3NF (no partial or transitive dependencies).

    Look at the INVOICE database tables (2NF) in figure 2.10.

    Figure 2.10 Invoice Database (2NF)

    Table 1:

    Primary key: INV_NUM, PROD_NUM

    Foreign keys: INV_NUM (to table 2)

    PROD_NUM (to table 3)

    Normal form: 3NF

    Table 2:

    Primary key: INV_NUM

    Foreign key: None

    Normal form: 3NF

    Table 3:

    Primary key: PROD_NUM

    Foreign key: None

    Normal form: 2NF

    We know that a table is in 3NF if it's in 2NF and contains no transitive dependencies. So we place the attributes that create the transitive dependency, VEND_CODE → VEND_NAME, in a separate table. We keep VEND_CODE in the original table to create a foreign-key link to the new table.

    The other tables are already in 3NF (no partial or transitive dependencies). Once we normalize to 3NF, we can give the INVOICE database tables meaningful names such as LINE, INVOICE, PRODUCT, and VENDOR.

    Look at the INVOICE database tables (3NF) in figure 2.11.

    Figure 2.11 Invoice Database (3NF)

    LINE table:

    Primary key: INV_NUM, PROD_NUM

    Foreign keys: INV_NUM (to table INVOICE)

    PROD_NUM (to table PRODUCT)

    Normal Form: 3NF

    INVOICE table:

    Primary key: INV_NUM

    Foreign key: None

    Normal form: 3NF

    PRODUCT table:

    Primary Key: PROD_NUM

    Foreign Key: VEND_CODE (to table VENDOR)

    Normal Form: 3NF

    VENDOR table:

    Primary Key: VEND_CODE

    Foreign key: None

    Normal form: 3NF
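    The four 3NF tables above can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions, not the schema from figure 2.11: the column types are guesses, and the placement of the non-key attributes (SALE_DATE in INVOICE, NUM_SOLD in LINE, the product attributes plus VEND_CODE in PRODUCT) is inferred from the dependencies listed in the text. The sample row is the one from the attribute table at the start of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE VENDOR (
    VEND_CODE INTEGER PRIMARY KEY,
    VEND_NAME TEXT NOT NULL
);
CREATE TABLE PRODUCT (
    PROD_NUM         TEXT PRIMARY KEY,
    PROD_DESCRIPTION TEXT,
    PROD_PRICE       REAL,
    VEND_CODE        INTEGER REFERENCES VENDOR (VEND_CODE)
);
CREATE TABLE INVOICE (
    INV_NUM   INTEGER PRIMARY KEY,
    SALE_DATE TEXT
);
CREATE TABLE LINE (
    INV_NUM  INTEGER REFERENCES INVOICE (INV_NUM),
    PROD_NUM TEXT REFERENCES PRODUCT (PROD_NUM),
    NUM_SOLD INTEGER,
    PRIMARY KEY (INV_NUM, PROD_NUM)
);
""")

# Each fact from the sample row is stored exactly once
conn.execute("INSERT INTO VENDOR VALUES (211, 'Never Fail, Inc.')")
conn.execute("INSERT INTO PRODUCT VALUES ('AA_E3422QW', 'D & B rotary sander, 6-in. disk', 49.95, 211)")
conn.execute("INSERT INTO INVOICE VALUES (211347, '3/25/96')")
conn.execute("INSERT INTO LINE VALUES (211347, 'AA_E3422QW', 2)")

# Joining along the foreign keys rebuilds the original 1NF row
row = conn.execute("""
SELECT L.INV_NUM, L.PROD_NUM, I.SALE_DATE, P.PROD_DESCRIPTION,
       P.VEND_CODE, V.VEND_NAME, L.NUM_SOLD, P.PROD_PRICE
FROM LINE L
JOIN INVOICE I ON I.INV_NUM = L.INV_NUM
JOIN PRODUCT P ON P.PROD_NUM = L.PROD_NUM
JOIN VENDOR V ON V.VEND_CODE = P.VEND_CODE
""").fetchone()
print(row)
```

    The join shows that nothing was lost in the decomposition: the foreign keys are enough to reassemble the original invoice line.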

    Database Design

    Dependency diagrams don't show the nature of the relationships (1:1, 1:M, M:N). The E-R diagrams remain crucial to our design effort. We can't successfully produce a complex design without some form of modeling. Yet, as we've seen in the preceding examples, dependency diagrams are a valuable addition to our designer's toolbox because:

    Normalization is likely to suggest the existence of entities we may not have considered in the modeling process.

    If transaction management issues require the existence of attributes that create other than 3NF or BCNF conditions, the proper dependency diagrams will at least force us to be aware of these conditions.

    A relational schema, which depicts connecting fields and relationship types, is used in design documentation.

    Figure 2.12 shows the complete design documentation that would accompany the previous INVOICE database (3NF) example.

    Figure 2.12 Invoice E-R Diagram and Relational Schema

    Normalization is part of the design process. As we define entities and attributes during the E-R modeling process, we do normalization checks on each entity (or entity set) and form new ones as required. We incorporate the normalized entities into the E-R diagram and continue the iterative E-R process until all entities and their attributes are defined and all equivalent tables are in 3NF.

    The more tables we have, the more disk I/O and processing logic we need to join them. That's why we sometimes denormalize tables to yield less I/O and thus increase processing speed. Is that a good idea? Unfortunately, we pay for the increased processing speed because updates to a larger table are inefficient, and data redundancies occur that are likely to yield data anomalies. So we should use denormalization sparingly in the design process.

    Normalization offers us evaluation standards for producing good table structures that can also be passed on to the next generation of database designers.

    Example Homework Problems

    Example Problem 1) The following report shows how an inexperienced database developer might create a table. Your mission is to get it into 3rd Normal Form. The key fields are Customer_ID, Movie_ID, Vendor_ID, and Check_Out_Date.

    It would be helpful to do this in stages like the book so you can get partial credit if something goes wrong.

    Normalize_ME

    Customer ID  Last Name  Movie ID  Title             Vendor ID  Type  Check Out Date  Return Date
    1001         Barns      101       Title of Movie 1  ACM        ACT   1/1/2002        1/2/2002
    1001         Barns      102       Title of Movie 2  ACM        COM   1/1/2002        1/2/2002
    1001         Barns      103       Title of Movie 3  ACM        DRA   1/1/2002        1/5/2002
    1001         Barns      104       Title of Movie 4  ACM        DRA   1/1/2002        1/6/2002
    1001         Barns      105       Title of Movie 5  ACM        DRA   1/1/2002
    1001         Barns      106       Title of Movie6   BB         DRA   1/1/2002
    1001         Barns      107       Title of Movie7   BB         COM   1/1/2002
    1001         Barns      108       Title of Movie8   ACM        COM   1/1/2002
    1001         Barns      109       Title of Movie9   ACM        COM

    Solution:

    Step 1) List the key fields separately, and on the last line write the original (composite) key:

    MOVIE_ID

    CUSTOMER_ID

    CHECK_OUT_DATE

    MOVIE_ID, CUSTOMER_ID, CHECK_OUT_DATE

    Step 2) Add the dependent attributes next to each line you made in step 1:

    MOVIE_ID ---- MOVIE_TITLE, TYPE, VENDOR_ID

    CUSTOMER_ID ---- CUSTOMER_LNAME

    CHECK_OUT_DATE ---- none (no dependent attributes, so it does not make a new table)

    MOVIE_ID, CUSTOMER_ID, CHECK_OUT_DATE ---- RETURN_DATE

    Step 3) Make new tables for all the transitive dependencies.

    Notice that MOVIE_ID gives TYPE and VENDOR_ID. So I add two new tables, one for each of these, AND KEEP the linking attributes in the original table:

    MOVIE_ID ---- MOVIE_TITLE, TYPE, VENDOR_ID

    CUSTOMER_ID ---- CUSTOMER_LNAME

    MOVIE_ID, CUSTOMER_ID, CHECK_OUT_DATE ---- RETURN_DATE

    TYPE ---- TYPE, TYPE_DESCRIPTION (new)

    VENDOR ---- VENDOR_ID, VENDOR_NAME (new)

    Step 4) Give appropriate names to the tables and identify the PK of each entity:

    MOVIE: (MOVIE_ID, MOVIE_TITLE, TYPE, VENDOR_ID), PK: MOVIE_ID

    CUSTOMER: (CUSTOMER_ID, CUSTOMER_LNAME), PK: CUSTOMER_ID

    RENTAL: (MOVIE_ID, CUSTOMER_ID, CHECK_OUT_DATE, RETURN_DATE), PK: MOVIE_ID, CUSTOMER_ID, CHECK_OUT_DATE

    TYPE: (TYPE, TYPE_DESCRIPTION), PK: TYPE

    VENDOR: (VENDOR_ID, VENDOR_NAME), PK: VENDOR_ID

    These tables are now in 3rd Normal Form.

    Notice how I had to deal with CHECK_OUT_DATE. It is possible (but unlikely) that a customer could rent the same video (i.e., same MOVIE_ID, MOVIE_TITLE, VENDOR_ID) on different days! So I had to keep CHECK_OUT_DATE as part of the PK.
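    To see what the decomposition buys, the sketch below splits a few rows of the Normalize_ME report into the MOVIE, CUSTOMER, and RENTAL tables. It uses plain dictionaries keyed on each table's PK; the variable names are hypothetical. TYPE_DESCRIPTION and VENDOR_NAME do not appear in the report, so only the key values for the TYPE and VENDOR tables can be collected here.

```python
# Four rows copied from the Normalize_ME report; None marks a blank Return Date
report = [
    # (Customer ID, Last Name, Movie ID, Title, Vendor ID, Type, Check Out Date, Return Date)
    (1001, "Barns", 101, "Title of Movie 1", "ACM", "ACT", "1/1/2002", "1/2/2002"),
    (1001, "Barns", 102, "Title of Movie 2", "ACM", "COM", "1/1/2002", "1/2/2002"),
    (1001, "Barns", 103, "Title of Movie 3", "ACM", "DRA", "1/1/2002", "1/5/2002"),
    (1001, "Barns", 106, "Title of Movie6", "BB", "DRA", "1/1/2002", None),
]

movie, customer, rental = {}, {}, {}
for cust_id, lname, movie_id, title, vendor_id, movie_type, out, back in report:
    movie[movie_id] = (title, movie_type, vendor_id)   # MOVIE, PK: MOVIE_ID
    customer[cust_id] = lname                          # CUSTOMER, PK: CUSTOMER_ID
    rental[(movie_id, cust_id, out)] = back            # RENTAL, PK: all three fields

# Key values that would populate the TYPE and VENDOR tables
types = sorted({m[1] for m in movie.values()})
vendors = sorted({m[2] for m in movie.values()})

print(len(customer), len(movie), len(rental))
print(types, vendors)
```

    The customer's last name, repeated on every line of the report, is now stored once, so changing it is a single update rather than one per rental.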

    Example Problem 2).

    a) To keep track of office furniture, computers, printers, and so on, the FOUNDIT company uses the following table structure:

    Attribute name      Sample value
    ITEM_ID             2311345-678
    ITEM_DESCRIPTION    HP DeskJet 660C printer
    BLDG_ROOM           325
    BLDG_CODE           DEL
    BLDG_NAME           Dawn's Early Light
    BLDG_MANAGER        E. R. Rightonit

    Given this information, draw the dependency diagram. Make sure you label the transitive and/or partial dependencies.

    b) Starting with the dependency diagram drawn for part a, create a set of dependency diagrams that meet 3NF requirements. Rename attributes to meet the naming conventions and create new entities and attributes as necessary.

    Note that the dependency diagrams reflect the notion that each building is managed by one employee.

    SOLUTION:

    [Dependency diagram for part a: ITEM_ID, ITEM_DESCRIPTION, BLDG_ROOM, BLDG_CODE, BLDG_NAME, BLDG_MANAGER, with the transitive dependencies labeled]

    [Dependency diagrams for part b, all tables in 3NF:]

    ITEM_ID, ITEM_DESCRIPTION, ITEM_ROOM, BLDG_CODE

    BLDG_CODE, BLDG_NAME, EMP_CODE

    EMP_CODE, EMP_LNAME, EMP_FNAME, EMP_INITIAL

    STUDENT HOMEWORK

    1) Using the following INVOICE table structure, draw its dependency diagram and identify all dependencies (including all partial and transitive dependencies). You can assume that the table does not contain repeating groups and that any invoice number may reference more than one product. (Hint: This table uses a composite primary key.)

    Attribute name      Sample value
    INV_NUM             211347
    PROD_NUM            AA_E3422QW
    SALE_DATE           06/25/1999
    PROD_DESCRIPTION    B&D Rotary sander, 6 in. disk
    VEND_CODE           211
    VEND_NAME           NeverFail, Inc.
    NUMBER_SOLD         2
    PROD_PRICE          $49.95

    2) Using the initial dependency diagram drawn in problem 1, remove all partial dependencies, draw the new dependency diagrams, and identify the normal forms for each table structure you created.

    Note: You can assume that any given product is supplied by a single vendor, but a vendor can supply many products. Therefore, it is proper to conclude that the following dependency exists:

    PROD_NUM → PROD_DESCRIPTION, PROD_PRICE, VEND_CODE, VEND_NAME

    (Hint: Your actions should produce three new dependency diagrams.)

    3) The following report shows how an inexperienced database developer might create a table. Your mission is to normalize it. The key fields are INV_NUM, S_ID, CAR_NUM, and CUS_ID.

    The field SER_CODE is multi-valued, which implies a table called SERVICE that holds all the descriptions of the services.

    The field S_ID refers to the Salesperson table.

    Normalize ME

    INV_NUM  S_ID  SER_CODE  CAR_NUM     CUS_ID  Last Name   CAR_MAKE  CAR_MODEL      CAR_YEAR
    5001     1001  NCS       00yo5mn832  101     Adams       Mazada    Tribute        2002
    5029     1005  NCS       86gf2de356  123     Palaisa     Mazada    Miata          2001
    5030     1005  NCS       87mn5sw754  128     Sullivan    Lincoln   TownCar        2000
    5031     1006  NCS       87vb2sn427  125     Petty       Volvo     C70 Coupe      2000
    5032     1007  NCS       88za9qr821  127     Scott       Mazada    Protg          2001
    5033     1007  NCS       90as2sw987  126     Regan       Lincoln   Contiential    1997
    5034     1003  NCS       90gh1gf546  129     Washington  Lincoln   LS             1999
    5035           TUP       A1          103     Armstrong   BMW       326            2002
    5036           INS       A2          104     Bell        BMW       327            2002
    5037           TUP       18ms3sw145  107     Cliburn     Volvo     Cross Country  2002
    5038           TUP       A3          109     Edision     BMW       328            2002
    5043           PRE       85de4hg678  122     Nixon       Ford      Windstar       2002
    5044           PRE       55ar6xv772  123     Palaisa     Ford      Windstar       2001