IT 20303 • The Relational DBMS • Section 05

IT 20303 The Relational DBMS Section 05. Relational Database Theory Normalization for Logical Database Design

Page 1: IT 20303 The Relational DBMS Section 05. Relational Database Theory Normalization for Logical Database Design

IT 20303

• The Relational DBMS

• Section 05

Relational Database Theory

• Normalization for Logical Database Design

Relational Database Theory

• Normalization

– Process of analyzing a grouping of data items

• Based on inherent characteristics

• Often applied to existing files or databases

Relational Database Theory

• Normalization

– Principles

• Data items belong together in a logical group

• Each group of items can be identified by its own unique identifier

Relational Database Theory

• Normalization

– Data in the group describes one, and only one, thing

– A Bottom-Up approach

Relational Database Theory

• Why Normalize

– Avoid update anomalies

• Nasty side effects when redundant data is inserted, changed, or deleted

– Minimize storage of redundant data

– Support simpler logic for manipulating data

Relational Database Theory

• Why Not Normalize

– Data is never (or very rarely) updated

– Data warehouse systems are seldom normalized

Relational Database Theory

• Sample Data Not Normalized

WARD NAME  WARD TYPE   NO. OF BEDS  SENIOR NURSE  PATIENT NO  PATIENT NAME  DATE OF BIRTH
Liston     Orthopedic  6            J Bryan       45812       D Carter      21/02/65
                                                  71384       R Willis      08/10/46
                                                  69355       G Barnes      17/06/41
Godlee     General     10           V Fox         52217       M Brown       21/02/35
                                                  10823       R Willis      12/03/54
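To make the update-anomaly risk concrete, here is a minimal Python sketch. It assumes the blank cells in the sample above are filled in so that every patient row repeats its ward details. The helper names and the replacement nurse names ("A Smith", "B Jones") are hypothetical and not part of the course material.

```python
# Flat rows: ward details repeat on every patient row (only three columns kept).
flat_rows = [
    {"ward": "Liston", "senior_nurse": "J Bryan", "patient_no": 45812},
    {"ward": "Liston", "senior_nurse": "J Bryan", "patient_no": 71384},
    {"ward": "Liston", "senior_nurse": "J Bryan", "patient_no": 69355},
    {"ward": "Godlee", "senior_nurse": "V Fox", "patient_no": 52217},
    {"ward": "Godlee", "senior_nurse": "V Fox", "patient_no": 10823},
]


def change_senior_nurse(rows, ward, new_nurse):
    """Update every row for the ward and return how many rows were touched."""
    touched = 0
    for row in rows:
        if row["ward"] == ward:
            row["senior_nurse"] = new_nurse
            touched += 1
    return touched


def nurses_recorded_for(rows, ward):
    """Return the set of senior nurses currently recorded for one ward."""
    return {row["senior_nurse"] for row in rows if row["ward"] == ward}


# A correct update has to touch every Liston row, not just one.
rows_touched = change_senior_nurse(flat_rows, "Liston", "A Smith")

# A program that changes only one row leaves Liston with two senior nurses.
flat_rows[0]["senior_nurse"] = "B Jones"
conflict = nurses_recorded_for(flat_rows, "Liston")

print(rows_touched, sorted(conflict))
```

One real-world change (a new senior nurse) costs three row updates here, and missing any of them leaves the data contradicting itself.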

Relational Database Theory

• How to Normalize Data using Functional Dependencies

– Definition of Functional Dependency

• Given a relation R, attribute Y of R is functionally dependent on attribute X of R if and only if each X-value in R has associated with it precisely one Y-value in R (at any one time)

Relational Database Theory

• Y of R is dependent on X of R

• X functionally determines Y (X --> Y)

WARD NAME  WARD TYPE   NO. OF BEDS  SENIOR NURSE  PATIENT NO  PATIENT NAME  DATE OF BIRTH
Liston     Orthopedic  6            J Bryan       45812       D Carter      21/02/65
Liston     Orthopedic  6            J Bryan       71384       R Willis      08/10/46
Liston     Orthopedic  6            J Bryan       69355       G Barnes      17/06/41
Godlee     General     10           V Fox         52217       M Brown       21/02/35
Godlee     General     10           V Fox         10823       R Willis      12/03/54

X Y
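The definition can be tested mechanically against a set of rows. The sketch below is one possible check, not a method from the course: the function name `holds` and the lowercase column names are assumptions made for illustration. It can only show that the current rows do not violate a dependency. Whether a dependency truly holds is a statement about the real world, not about one snapshot of data.

```python
def holds(rows, x, y):
    """Return True if no two rows share an x-value but differ in y-value."""
    seen = {}
    for row in rows:
        # setdefault stores the first y seen for this x and returns the stored y.
        if seen.setdefault(row[x], row[y]) != row[y]:
            return False
    return True


COLUMNS = ("ward_name", "ward_type", "no_of_beds", "senior_nurse",
           "patient_no", "patient_name", "date_of_birth")
DATA = [
    ("Liston", "Orthopedic", 6, "J Bryan", 45812, "D Carter", "21/02/65"),
    ("Liston", "Orthopedic", 6, "J Bryan", 71384, "R Willis", "08/10/46"),
    ("Liston", "Orthopedic", 6, "J Bryan", 69355, "G Barnes", "17/06/41"),
    ("Godlee", "General", 10, "V Fox", 52217, "M Brown", "21/02/35"),
    ("Godlee", "General", 10, "V Fox", 10823, "R Willis", "12/03/54"),
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]

# Ward Name --> Senior Nurse: each ward has exactly one senior nurse.
ward_determines_nurse = holds(rows, "ward_name", "senior_nurse")

# Patient No --> Ward Name: each patient number maps to one ward.
patient_determines_ward = holds(rows, "patient_no", "ward_name")

# Patient Name does NOT determine Patient No: two patients are named R Willis.
name_determines_no = holds(rows, "patient_name", "patient_no")

print(ward_determines_nurse, patient_determines_ward, name_determines_no)
```

The two R Willis rows are why Patient Name cannot serve as the identifier and Patient No is needed.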

Relational Database Theory

• Functional Dependency Diagram of Hospital Ward Example

Patient No --> Patient Name, Date of Birth, Ward Name

Ward Name --> Ward Type, No of Beds, Senior Nurse

Relational Database Theory

• Table structure based on FD Diagram

WARD

WARD NAME  WARD TYPE   NO OF BEDS  SENIOR NURSE
Liston     Orthopedic  6           J Bryan
Godlee     General     10          V Fox

PATIENT

PATIENT NO  PATIENT NAME  DATE OF BIRTH  WARD NAME
45812       D Carter      21/02/65       Liston
71384       R Willis      08/10/46       Liston
52217       M Brown       21/02/35       Godlee
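As a rough illustration, the two tables can be declared and joined back together with Python's built-in `sqlite3` module. The lowercase table and column names, the column types, and the replacement nurse name "A Smith" are assumptions for this sketch. The rows are the ones shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE ward (
    ward_name    TEXT PRIMARY KEY,
    ward_type    TEXT NOT NULL,
    no_of_beds   INTEGER NOT NULL,
    senior_nurse TEXT NOT NULL
);
CREATE TABLE patient (
    patient_no    INTEGER PRIMARY KEY,
    patient_name  TEXT NOT NULL,
    date_of_birth TEXT NOT NULL,
    ward_name     TEXT NOT NULL REFERENCES ward(ward_name)
);
""")
conn.executemany("INSERT INTO ward VALUES (?, ?, ?, ?)", [
    ("Liston", "Orthopedic", 6, "J Bryan"),
    ("Godlee", "General", 10, "V Fox"),
])
conn.executemany("INSERT INTO patient VALUES (?, ?, ?, ?)", [
    (45812, "D Carter", "21/02/65", "Liston"),
    (71384, "R Willis", "08/10/46", "Liston"),
    (52217, "M Brown", "21/02/35", "Godlee"),
])

# A join on Ward Name rebuilds the flat view whenever it is needed.
flat = conn.execute("""
    SELECT w.ward_name, w.senior_nurse, p.patient_no
    FROM patient AS p
    JOIN ward AS w ON w.ward_name = p.ward_name
    ORDER BY p.patient_no
""").fetchall()

# Each ward fact is stored once, so changing the senior nurse touches one row.
updated = conn.execute(
    "UPDATE ward SET senior_nurse = ? WHERE ward_name = ?",
    ("A Smith", "Liston"),  # hypothetical new nurse
).rowcount

print(flat, updated)
```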

Relational Database Theory

• Normalization using Codd’s Rules

– Origin

• Early enthusiasts wanted to use relational theory

• Sought rules for structuring data in relational model

Relational Database Theory

• Normalization using Codd’s Rules

– Codd and contemporaries developed rules for “Normal Forms”

• 1NF

• 2NF

• 3NF

– 1NF through 3NF are the normal levels to reach in database design

– Higher normal forms

• Boyce/Codd NF (BCNF, sometimes called 3.5NF)

• 4NF

• 5NF

Relational Database Theory

• Customer-Order-Line Item Example

– Assume an existing order-entry program and data file:

ORD-NO  CUST-NO  CUST-NAME  ADDR  PROD-NO  PROD-NAME  UNIT-PRC  QTY  TOT-ITEM-PRC  …  PROD-NO  PROD-NAME  UNIT-PRC  QTY  TOT-ITEM-PRC

Relational Database Theory

• 1NF – Break out repeating groups

Before:

ORDER:     ORD-NO  CUST-NO  CUST-NAME  ADDR  PROD-NO  PROD-NAME  UNIT-PRC  QTY  TOT-ITEM-PRC  …  PROD-NO  PROD-NAME  UNIT-PRC  QTY  TOT-ITEM-PRC

After:

ORDER:     ORD-NO  CUST-NO  CUST-NAME  ADDR
LINEITEM:  ORD-NO  PROD-NO  PROD-NAME  UNIT-PRC  QTY  TOT-ITEM-PRC
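The 1NF step can be sketched in a few lines of Python. Everything here is illustrative: the function name `to_1nf`, the dictionary layout of the flat record, and the sample order values are assumptions, since the slides give only the field names. Prices are whole numbers to keep the arithmetic exact.

```python
def to_1nf(record):
    """Split one flat order record into an ORDER row and its LINEITEM rows."""
    order_keys = ("ord_no", "cust_no", "cust_name", "addr")
    order_row = {key: record[key] for key in order_keys}
    # Each repeating group becomes its own row, keyed by ORD-NO plus PROD-NO.
    line_items = [{"ord_no": record["ord_no"], **item} for item in record["items"]]
    return order_row, line_items


# Hypothetical flat record: the repeating group is modelled as a list.
flat_record = {
    "ord_no": 1001, "cust_no": 17, "cust_name": "Acme Ltd", "addr": "1 Main St",
    "items": [
        {"prod_no": "P1", "prod_name": "Bolt", "unit_prc": 25, "qty": 4, "tot_item_prc": 100},
        {"prod_no": "P2", "prod_name": "Nut", "unit_prc": 10, "qty": 10, "tot_item_prc": 100},
    ],
}

order_row, line_items = to_1nf(flat_record)
print(order_row)
for item in line_items:
    print(item)
```

The order can now carry any number of line items, where the original record layout fixed the maximum in advance.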

Relational Database Theory

• 2NF – Break out attributes dependent on part of the primary key

Before:

ORDER:     ORD-NO  CUST-NO  CUST-NAME  ADDR
LINEITEM:  ORD-NO  PROD-NO  PROD-NAME  UNIT-PRC  QTY  TOT-ITEM-PRC

After:

ORDER:     ORD-NO  CUST-NO  CUST-NAME  ADDR
LINEITEM:  ORD-NO  PROD-NO  QTY  TOT-ITEM-PRC
PRODUCT:   PROD-NO  PROD-NAME  UNIT-PRC

Relational Database Theory

• 3NF – Break out attributes wholly dependent on another key

Before:

ORDER:     ORD-NO  CUST-NO  CUST-NAME  ADDR
LINEITEM:  ORD-NO  PROD-NO  QTY  TOT-ITEM-PRC
PRODUCT:   PROD-NO  PROD-NAME  UNIT-PRC

After:

ORDER:     ORD-NO  CUST-NO
CUSTOMER:  CUST-NO  CUST-NAME  ADDR
LINEITEM:  ORD-NO  PROD-NO  QTY  TOT-ITEM-PRC
PRODUCT:   PROD-NO  PROD-NAME  UNIT-PRC
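One way to picture the resulting four tables is as a schema in Python's built-in `sqlite3` module. This is a sketch under assumptions: the column types, the sample rows, and the table name `orders` (chosen because ORDER is a reserved word in SQL) are not from the slides. TOT-ITEM-PRC is kept as a stored column because the slides keep it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    cust_no   INTEGER PRIMARY KEY,
    cust_name TEXT NOT NULL,
    addr      TEXT NOT NULL
);
CREATE TABLE orders (               -- ORDER is a reserved word in SQL
    ord_no  INTEGER PRIMARY KEY,
    cust_no INTEGER NOT NULL REFERENCES customer(cust_no)
);
CREATE TABLE product (
    prod_no   TEXT PRIMARY KEY,
    prod_name TEXT NOT NULL,
    unit_prc  INTEGER NOT NULL      -- whole-number price (assumed unit)
);
CREATE TABLE lineitem (
    ord_no       INTEGER NOT NULL REFERENCES orders(ord_no),
    prod_no      TEXT    NOT NULL REFERENCES product(prod_no),
    qty          INTEGER NOT NULL,
    tot_item_prc INTEGER NOT NULL,
    PRIMARY KEY (ord_no, prod_no)   -- composite key: one row per product per order
);
-- Hypothetical sample rows
INSERT INTO customer VALUES (17, 'Acme Ltd', '1 Main St');
INSERT INTO orders   VALUES (1001, 17);
INSERT INTO product  VALUES ('P1', 'Bolt', 25), ('P2', 'Nut', 10);
INSERT INTO lineitem VALUES (1001, 'P1', 4, 100), (1001, 'P2', 10, 100);
""")

# Joining the four tables rebuilds the original order-entry view.
rows = conn.execute("""
    SELECT o.ord_no, c.cust_name, p.prod_name, p.unit_prc, l.qty, l.tot_item_prc
    FROM lineitem AS l
    JOIN orders   AS o ON o.ord_no  = l.ord_no
    JOIN customer AS c ON c.cust_no = o.cust_no
    JOIN product  AS p ON p.prod_no = l.prod_no
    ORDER BY l.prod_no
""").fetchall()

for row in rows:
    print(row)
```

Customer and product details are each stored once and reached through their keys, which is the point of the 2NF and 3NF steps.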

Relational Database Theory

• Rules for 1NF, 2NF, & 3NF

– 1NF

• Break out repeating groups into a separate entity

– 2NF

• Break out attributes that are dependent on part of the primary key into a separate entity

• Called Partial Dependency

– 3NF

• Break out attributes that are wholly dependent on another key (not PK) into a separate entity

• Called Transitive Dependency

Relational Database Theory

• Normalization

– A relation R is in 3rd Normal Form (3NF) if and only if the non-key attributes of R (if any) are:

• Mutually independent, and

• Fully dependent on the primary key of R

Relational Database Theory

• Normalization Cont’d

– A relation is in 3NF if all the attributes are functionally dependent

• On the Key

• On the Whole Key, and

• On Nothing but the Key

– (So Help Me Codd)

Relational Database Theory

• Reconcile differences between the Data Model and Normalized Data Structures

– Data model and normalized data structures must be reconciled

– Discard data items from old files that are no longer needed

• Calculation fields

• Redundant fields

– Resolve discrepancies in data item names

– Ensure that new fields are really necessary

• Use standard naming conventions

Relational Database Theory

• Example 01:

PART-NO  SUPP-1  SUPP-2  SUPP-3  SUPP-4
WDGT01   XYZZY   FOOBAR  NULL    NULL

– What happens when a part has more than four suppliers?

– What happens when a supplier is dropped?

– How do you query the parts with two or more suppliers?

– Normalized Table:

PART-NO  SUPP
WDGT01   XYZZY
WDGT01   FOOBAR
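With the normalized table, the three questions have short answers. The sketch below uses Python's built-in `sqlite3` module. The table name `part_supplier` and the extra part `WDGT02` are hypothetical, added only so the query has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE part_supplier (
    part_no TEXT NOT NULL,
    supp    TEXT NOT NULL,
    PRIMARY KEY (part_no, supp)
);
INSERT INTO part_supplier VALUES
    ('WDGT01', 'XYZZY'),
    ('WDGT01', 'FOOBAR'),
    ('WDGT02', 'XYZZY');   -- hypothetical single-supplier part
""")

# Parts with two or more suppliers: one GROUP BY, no per-column checks.
multi = conn.execute("""
    SELECT part_no, COUNT(*) AS n_suppliers
    FROM part_supplier
    GROUP BY part_no
    HAVING COUNT(*) >= 2
""").fetchall()

# Dropping a supplier is a plain DELETE; no columns have to be shifted or nulled.
dropped = conn.execute(
    "DELETE FROM part_supplier WHERE supp = ?", ("FOOBAR",)
).rowcount

print(multi, dropped)
```

A fifth supplier for a part is just another row, so the four-supplier limit of the original layout disappears.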

Relational Database Theory

• Example 02: Normalize this table

PART-NO  SUPP    PART_DESC    SUPP-ADDRESS
WDGT01   XYZZY   Blue Widget  123 Bluejay Way
WDGT01   FOOBAR  Blue Widget  544 Old Orchard

PART-NO  PART_DESC
WDGT01   Blue Widget

PART-NO  SUPP
WDGT01   XYZZY
WDGT01   FOOBAR

SUPP    SUPP-ADDRESS
XYZZY   123 Bluejay Way
FOOBAR  544 Old Orchard
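A quick way to check a decomposition like this is to confirm that joining the pieces gives back exactly the original rows. The Python sketch below does that with plain sets. The variable names are assumptions, and the check covers only these sample rows.

```python
# Original table: (PART-NO, SUPP, PART_DESC, SUPP-ADDRESS)
original = {
    ("WDGT01", "XYZZY", "Blue Widget", "123 Bluejay Way"),
    ("WDGT01", "FOOBAR", "Blue Widget", "544 Old Orchard"),
}

# Project the original onto the three normalized tables.
part = {(p, desc) for p, s, desc, addr in original}
part_supp = {(p, s) for p, s, desc, addr in original}
supplier = {(s, addr) for p, s, desc, addr in original}

# PART-NO determines PART_DESC and SUPP determines SUPP-ADDRESS,
# so each projection can be used as a lookup keyed on its identifier.
part_desc = dict(part)
supp_addr = dict(supplier)

# Join the pieces back together through the PART-NO / SUPP pairs.
rebuilt = {(p, s, part_desc[p], supp_addr[s]) for p, s in part_supp}

print(rebuilt == original, len(part), len(supplier))
```

The part description is now stored once instead of once per supplier, and no information was lost in the split.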

End Section

Relational Database Theory

• Multiple ways to Normalize Data

– Normalization can be accomplished in different ways

• Well-formed E-R model is normalized

• Functional dependencies

• Codd’s Rules for 1NF, 2NF, & 3NF

– Discrepancies indicate something is missing or changed

– One approach validates or checks another approach

Relational Database Theory

• Impact of Normalization

– Improve the integrity of data

• Purpose is to eliminate update anomalies

– Minimize storage of redundant data

– Reduce the complexity of programming logic

• Emphasis now is on maintainability, simplicity of program

• Normalized data can minimize complexity of code that manipulates the data

– Enhance the stability, “goodness” of database design

• Normalized data tends to be easier to understand

• Normalized data can be used by many different applications more easily

Relational Database Theory

• Impact of Normalization on Performance

– Concern that a large number of tables, and table joins, will result in poor performance

• Join can be a very expensive operation

• Test to determine frequency of joins, number of tables joined

– After database is created and available

Relational Database Theory

• Impact of Normalization on Performance Cont’d

– Requirements for application performance, response time dictate corrective actions

– Performance addressed in section on physical database design

• There are alternatives to de-normalizing data to improve performance

Relational Database Theory

• Recommendations for Data that is Updated

– First Normalize

– Don’t be dismayed by too many tables

• Normalization increases number of tables but improves logic

– Normalization is a helpful logical database design technique…for any DBMS

Relational Database Theory

• Objective of the design process is a “Good” design

– The logical database design process

• Is well understood

• Uses complementary techniques

• Can be automated with CASE tools

Relational Database Theory

• Objective of the design process is a “Good” design cont’d

– A “Good” database design

• Contains all the important entities and data items

• Has stable primary keys

• Identifies clearly all relationships

• Has table structures in 3NF

• Is understood by designers and users

• Accurately models the real world, as described in the requirements

Relational Database Theory

• Questions?