76
1 . Ching, Ph.D. • MIS • California State University, Sacramento Week 11 Week 11 November 7 November 7 Data Normalization and ERD Data Normalization and ERD Conceptual, Logical and Physical Conceptual, Logical and Physical Database Design Database Design

1 R. Ching, Ph.D. MIS California State University, Sacramento Week 11 November 7 Data Normalization and ERDData Normalization and ERD Conceptual, Logical

  • View
    215

  • Download
    0

Embed Size (px)

Citation preview

1

R. Ching, Ph.D. • MIS • California State University, Sacramento

Week 11Week 11November 7November 7

• Data Normalization and ERDData Normalization and ERD• Conceptual, Logical and Physical Database DesignConceptual, Logical and Physical Database Design

2

R. Ching, Ph.D. • MIS • California State University, Sacramento

Data NormalizationData Normalization

• The purpose of normalization is to produce a stable set of The purpose of normalization is to produce a stable set of relations that is a faithful model of the operations of the relations that is a faithful model of the operations of the enterprise.enterprise.

– Achieve a design that is highly flexibleAchieve a design that is highly flexible

– Reduce redundancy Reduce redundancy – Ensure that the design is free of certain update, Ensure that the design is free of certain update,

insertion and deletion anomaliesinsertion and deletion anomalies

Catherine Richardo, Catherine Richardo, 19901990

3

R. Ching, Ph.D. • MIS • California State University, Sacramento

4NF4NF4NF4NF

BCNFBCNFBCNFBCNF

3NF3NF3NF3NF

2NF2NF2NF2NF

NormalizationNormalization

1NF1NF1NF1NF Flat file Flat file

Partial dependencies removed Partial dependencies removed

Transitive dependencies removed Transitive dependencies removed

Every determinant is a candidate key Every determinant is a candidate key

Non-tivial multi-valued dependencies Non-tivial multi-valued dependencies removed removed

4

R. Ching, Ph.D. • MIS • California State University, Sacramento

Stereos To GoInvoice

Order No.

Date: / /

Account No.

ItemNumber Product Description/Manufacturer Qty Price

ProductCode

1

2

3

4

5

Date Shipped: / /

Customer:Address:

City State Zip Code

10001

6 15 99

0000-000-0000-0

John Smith2036-26 StreetSacramento CA 95819

SAGX730 Pioneer Remote A/V Receiver

AT10 Cervwin Vega LoudspeakersCDPC725 Sony Disc-Jockey CD Changer

6 18 99

SubtotalShipping & Handling

Sales TaxTotal

1329851000010306

153291

111

569953599539995

Go, HogsGo, Hogs

1/051/05

Stereos To Go

0000 000 0000 00000 000 0000 0John SmithJohn Smith

5

R. Ching, Ph.D. • MIS • California State University, Sacramento

Unnormalized RelationUnnormalized Relation

How would a program process the data to recreate the invoice?How would a program process the data to recreate the invoice?

((Invoice_numberInvoice_number, Invoice_date, Date_delivered, Cust_account , Invoice_date, Date_delivered, Cust_account Cust_name Cust_addr Cust_city Cust_state Zip_code,Cust_name Cust_addr Cust_city Cust_state Zip_code,Item1 Item1_descrip Item1_qty Item1_price,Item1 Item1_descrip Item1_qty Item1_price,Item2 Item2_descrip Item2_qty Item2_price, Item2 Item2_descrip Item2_qty Item2_price, . . . , . . . , Item7 Item7_descrip Item7_qty Item7_price)Item7 Item7_descrip Item7_qty Item7_price)

6

R. Ching, Ph.D. • MIS • California State University, Sacramento

Unnormalized to 1NFUnnormalized to 1NF

((Invoice_numberInvoice_number, Invoice_date, Date_delivered, Cust_account , Invoice_date, Date_delivered, Cust_account Cust_name Cust_addr Cust_city Cust_state Zip_code,Cust_name Cust_addr Cust_city Cust_state Zip_code,Item1, Item1_descrip, Item1_qty, Item1_price,Item1, Item1_descrip, Item1_qty, Item1_price,Item2, Item2_descrip, Item2_qty, Item2_price, Item2, Item2_descrip, Item2_qty, Item2_price, . . . , . . . , Item7, Item7_descrip, Item7_qty, Item7_price)Item7, Item7_descrip, Item7_qty, Item7_price)

A flat file places all the data of a transaction into a single record. A flat file places all the data of a transaction into a single record. A flat file places all the data of a transaction into a single record. A flat file places all the data of a transaction into a single record.

This is reminiscent of a COBOL or BASIC program This is reminiscent of a COBOL or BASIC program processing a single transaction with one read statement.processing a single transaction with one read statement.

Repeating groupsRepeating groups

7

R. Ching, Ph.D. • MIS • California State University, Sacramento

Unnormalized to 1NFUnnormalized to 1NF

Nominated group of attributes Nominated group of attributes to serve as the keyto serve as the key

(form a unique combination)(form a unique combination)

• Eliminate the repeating groups.Eliminate the repeating groups.• Each row retains data for one item.Each row retains data for one item.• If a person bought 5 items, we If a person bought 5 items, we

would have five tupleswould have five tuples

((Invoice_numberInvoice_number, Invoice_date, Date_delivered, Cust_account, , Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code,Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code,Item, Item_descrip, Item_qty, Item_price)Item, Item_descrip, Item_qty, Item_price)

8

R. Ching, Ph.D. • MIS • California State University, Sacramento

1NF1NF

10001 123456 John Smith 10001 123456 John Smith ••• SAGX730••• SAGX730 Pioneer Remote A/V Rec Pioneer Remote A/V Rec 11 569.95 569.9510001 123456 John Smith 10001 123456 John Smith ••• SAGX730••• SAGX730 Pioneer Remote A/V Rec Pioneer Remote A/V Rec 11 569.95 569.95

10001 123456 John Smith10001 123456 John Smith ••• ••• AT10 AT10 Cerwin Vega LoudspeakersCerwin Vega Loudspeakers 1 359.951 359.9510001 123456 John Smith10001 123456 John Smith ••• ••• AT10 AT10 Cerwin Vega LoudspeakersCerwin Vega Loudspeakers 1 359.951 359.95

10001 123456 John Smith10001 123456 John Smith ••• ••• CDPC725 CDPC725 Sony Disc Jockey CD Sony Disc Jockey CD 11 399.95 399.9510001 123456 John Smith10001 123456 John Smith ••• ••• CDPC725 CDPC725 Sony Disc Jockey CD Sony Disc Jockey CD 11 399.95 399.95

10001 123456 John Smith10001 123456 John Smith ••• ••• S/HS/H Shipping Shipping 11 100.00 100.0010001 123456 John Smith10001 123456 John Smith ••• ••• S/HS/H Shipping Shipping 11 100.00 100.00

10001 123456 John Smith10001 123456 John Smith ••• ••• TaxTax Sales Tax Sales Tax 11 103.06 103.0610001 123456 John Smith10001 123456 John Smith ••• ••• TaxTax Sales Tax Sales Tax 11 103.06 103.06

Flat FileFlat File

Invo

ice nu

mber

Invo

ice nu

mber

Accou

nt nu

mber

Accou

nt nu

mber

Custom

er na

me

Custom

er na

me

DescriptionDescriptionItem Item

QuantityQuantityItem Item PricePriceItemItem

9

R. Ching, Ph.D. • MIS • California State University, Sacramento

From 1NFFrom 1NF

((Invoice_numberInvoice_number, Invoice_date, Date_delivered, , Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code,Cust_state, Zip_code,ItemItem, Item_descrip, Item_qty, Item_price), Item_descrip, Item_qty, Item_price)

Functional dependencies and determinantsFunctional dependencies and determinants

Example: item_descrip is functionally dependent on item, Example: item_descrip is functionally dependent on item, such that item is the determinant of item_descript.such that item is the determinant of item_descript.

10

R. Ching, Ph.D. • MIS • California State University, Sacramento

From 1NF to 2NFFrom 1NF to 2NF

((Invoice_numberInvoice_number, Invoice_date, Date_delivered, , Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)Cust_state, Zip_code)

((ItemItem, Item_descrip, Item_qty, Item_price), Item_descrip, Item_qty, Item_price)

Is this unique by itself?Is this unique by itself?What happens if the item is purchased more than once?What happens if the item is purchased more than once?

11

R. Ching, Ph.D. • MIS • California State University, Sacramento

From 1NF to 2NFFrom 1NF to 2NF

((Invoice_numberInvoice_number, Invoice_date, Date_delivered, , Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)Cust_state, Zip_code)

((Invoice_number,Invoice_number, ItemItem, Item_descrip, Item_qty, Item_price), Item_descrip, Item_qty, Item_price)

Composite key (forms a unique combination)Composite key (forms a unique combination)

Partial dependencyPartial dependency

12

R. Ching, Ph.D. • MIS • California State University, Sacramento

From 1NF to 2NFFrom 1NF to 2NF

((Invoice_numberInvoice_number, Invoice_date, Date_delivered, Cust_account, , Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)

((Invoice_number,Invoice_number, ItemItem, Item_qty, Item_price), Item_qty, Item_price)

((ItemItem, Item_descrip), Item_descrip)

13

R. Ching, Ph.D. • MIS • California State University, Sacramento

From 2NF to 3NFFrom 2NF to 3NF

Which attributes are dependent on others?Which attributes are dependent on others?Is there a problem?Is there a problem?

((Invoice_numberInvoice_number, Invoice_date, Date_delivered, Cust_account, , Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)

((Invoice_number,Invoice_number, ItemItem, Item_qty, Item_price), Item_qty, Item_price)

((ItemItem, Item_descrip), Item_descrip)

14

R. Ching, Ph.D. • MIS • California State University, Sacramento

Transitive Dependencies and AnomaliesTransitive Dependencies and Anomalies

• Insertion anomaliesInsertion anomalies

– To add a new row, all customer (name, address, city, To add a new row, all customer (name, address, city, state, zip code, phone) and products (description) must state, zip code, phone) and products (description) must be consistent with previous entriesbe consistent with previous entries

• Deletion anomaliesDeletion anomalies

– By deleting a row, a customer or product may cease to By deleting a row, a customer or product may cease to existexist

• Modification anomaliesModification anomalies

– To modify a customer’s or product’s data in one row, To modify a customer’s or product’s data in one row, all modifications must be carried out to all othersall modifications must be carried out to all others

15

R. Ching, Ph.D. • MIS • California State University, Sacramento

Insertion and Modification AnomaliesInsertion and Modification AnomaliesFor example…For example…

DVD-A110DVD-A110 PanasonicPanasonicPV-4210PV-4210 PanasonicPanasonicPV-4250PV-4250 PanasonicPanasonic

DVD-A110DVD-A110 PanasonicPanasonicPV-4210PV-4210 PanasonicPanasonicPV-4250PV-4250 PanasonicPanasonic

CT-32S35CT-32S35 PANPANCT-32S35CT-32S35 PANPAN

InconsistencyInconsistency

DVD-A110DVD-A110 PanasonicPanasonicPV-4210PV-4210 PanaSonicPanaSonicPV-4250PV-4250 Pana SonicPana SonicCT-32S35CT-32S35 PANPAN

DVD-A110DVD-A110 PanasonicPanasonicPV-4210PV-4210 PanaSonicPanaSonicPV-4250PV-4250 Pana SonicPana SonicCT-32S35CT-32S35 PANPAN

Change all Panasonic Change all Panasonic products’ manufacturer products’ manufacturer

name to “Panasonic USA”name to “Panasonic USA”

Product_codeProduct_code Manufacturer_nameManufacturer_nameInsert a new Panasonic productInsert a new Panasonic product

16

R. Ching, Ph.D. • MIS • California State University, Sacramento

Deletion AnomalyDeletion AnomalyFor Example…For Example…

43771824377182 John SmithJohn Smith SacramentoSacramento CACA 958319583143987114398711 Arnold SArnold S DavisDavis CACA 956919569145784614578461 Gray DavisGray Davis SacramentoSacramento CACA 958319583148731794873179 Lisa CarrLisa Carr RenoReno NVNV 8955789557

By deleting customer Arnold S, we would also be deleting By deleting customer Arnold S, we would also be deleting Davis, California. Davis, California.

17

R. Ching, Ph.D. • MIS • California State University, Sacramento

Transitive Transitive DependenciesDependencies

Invoice_numberInvoice_number

Invoice_dateInvoice_date

Date_deliveredDate_delivered

Cust_accountCust_account

Cust_nameCust_name

Cust_addrCust_addr

Cust_cityCust_city

Cust_stateCust_state

Zip_codeZip_code

ItemItem

Item_descripItem_descrip

Invoice_number+ItemInvoice_number+Item

Item_qtyItem_qty

Item_priceItem_price

A condition where A, B, C A condition where A, B, C are attributes of a relation are attributes of a relation such that if A such that if A B and B and B B C, then C is transitively C, then C is transitively dependent on A via B dependent on A via B (provided that A is not (provided that A is not functionally dependent on B functionally dependent on B or C).or C).

18

R. Ching, Ph.D. • MIS • California State University, Sacramento

Why Should City and State Be Separated Why Should City and State Be Separated from Customer Relation?from Customer Relation?

• City and state are dependent on zip code for their values City and state are dependent on zip code for their values and not the customer’s identifier (i.e., key).and not the customer’s identifier (i.e., key).

Zip_code Zip_code City, State City, State

• Otherwise,Otherwise,

Cust_account Cust_account Cust_addr, Zip_code Cust_addr, Zip_code City, State City, State

In which case, you have transitive dependency.In which case, you have transitive dependency.

19

R. Ching, Ph.D. • MIS • California State University, Sacramento

3NF3NF

Invoice RelationInvoice Relation(Invoice_number, Invoice_date, Date_delivered, Cust_account)(Invoice_number, Invoice_date, Date_delivered, Cust_account)

Customer RelationCustomer Relation(Cust_account, Cust_name, Cust_addr, Zip_code)(Cust_account, Cust_name, Cust_addr, Zip_code)

Zip_code RelationZip_code Relation(Zip_code, City, State)(Zip_code, City, State)

Invoice_items RelationInvoice_items Relation(Invoice_number, Item, Item_qty, Item_price)(Invoice_number, Item, Item_qty, Item_price)

Items RelationItems Relation(Item, Item_descrip)(Item, Item_descrip)

20

R. Ching, Ph.D. • MIS • California State University, Sacramento

3NF3NF

Invoice RelationInvoice Relation(Invoice_number, Invoice_date, Date_delivered, Cust_account)(Invoice_number, Invoice_date, Date_delivered, Cust_account)

Customer RelationCustomer Relation(Cust_account, Cust_name, Cust_addr, Zip_code)(Cust_account, Cust_name, Cust_addr, Zip_code)

Zip_code RelationZip_code Relation(Zip_code, City, State)(Zip_code, City, State)

Invoice_items RelationInvoice_items Relation(Invoice_number, Item, Item_qty, Item_price)(Invoice_number, Item, Item_qty, Item_price)

Items RelationItems Relation(Item, Item_descrip)(Item, Item_descrip)

Since the Items relation contains the manufacturer’s name in Since the Items relation contains the manufacturer’s name in the description, a separate Manufacturers relation can be the description, a separate Manufacturers relation can be createdcreated

Manufacturers RelationManufacturers Relation(Manuf_code, Manuf_name)(Manuf_code, Manuf_name)

21

R. Ching, Ph.D. • MIS • California State University, Sacramento

22

R. Ching, Ph.D. • MIS • California State University, Sacramento

First to Third Normal FormFirst to Third Normal Form(1NF - 3NF)(1NF - 3NF)

• 1NF: 1NF: A relation is in first normal form if and only if every A relation is in first normal form if and only if every attribute is single-valued for each tuple attribute is single-valued for each tuple (remove the (remove the repeating or multi-value attributes and create a flat file)repeating or multi-value attributes and create a flat file)

• 2NF: 2NF: A relation is in second normal form if and only if it A relation is in second normal form if and only if it is in first normal form and the nonkey attributes are fully is in first normal form and the nonkey attributes are fully functionally dependent on the key functionally dependent on the key (remove partial (remove partial dependencies)dependencies)

• 3NF: 3NF: A relation is in third normal form if it is in second A relation is in third normal form if it is in second normal form and no nonkey attribute is transitively normal form and no nonkey attribute is transitively dependent on the keydependent on the key (remove transitive dependencies) (remove transitive dependencies)

23

R. Ching, Ph.D. • MIS • California State University, Sacramento

Putting It TogetherPutting It Together

ERD of the Normalized Data ModelERD of the Normalized Data Model

24

R. Ching, Ph.D. • MIS • California State University, Sacramento

3NF3NF

Invoice RelationInvoice Relation(Invoice_number, Invoice_date, Date_delivered, Cust_account)(Invoice_number, Invoice_date, Date_delivered, Cust_account)

Customer RelationCustomer Relation(Cust_account, Cust_name, Cust_addr, Zip_code)(Cust_account, Cust_name, Cust_addr, Zip_code)

Zip_code RelationZip_code Relation(Zip_code, City, State)(Zip_code, City, State)

Invoice_items RelationInvoice_items Relation(Invoice_number, Item, Item_qty, Item_price)(Invoice_number, Item, Item_qty, Item_price)

Items RelationItems Relation(Item, Item_descrip, Manuf_code)(Item, Item_descrip, Manuf_code)

Manufacturers RelationManufacturers Relation(Manuf_code, Manuf_name)(Manuf_code, Manuf_name)

25

R. Ching, Ph.D. • MIS • California State University, Sacramento

ERDERD

Invoices Customers Zip_Codes

Invoice_numberInvoice_dateDate_deliveredCust_account

Cust_accountCust_nameCust_addrZip_code

Zip_codeCityState

Invoice_items

Invoice_numberItemItem_qtyItem_price

Items

ItemItem_descripManuf_code

Manufacturers

Manuf_codeManuf_name

26

R. Ching, Ph.D. • MIS • California State University, Sacramento

ERDERD

Invoices Customers Zip_Codes

Invoice_numberInvoice_dateDate_deliveredCust_account

Cust_accountCust_nameCust_addrZip_code

Zip_codeCityState

Locate

Invoice_items

Invoice_numberItemItem_qtyItem_price

Have

Items

ItemItem_descripManuf_code

Appear onManufacturers

Manuf_codeManuf_name

Produce

Order

(0..*)(0..*) (1..1)(1..1) (0..*)(0..*) (1..1)(1..1)

(1..1)(1..1)

(1..*)(1..*)

(0..*)(0..*) (1..1)(1..1) (0..*)(0..*) (1..1)(1..1)

27

R. Ching, Ph.D. • MIS • California State University, Sacramento

ERDERD

Invoices Customers Zip_Codes

Invoice_numberInvoice_dateDate_deliveredCust_account

Cust_accountCust_nameCust_addrZip_code

Zip_codeCityState

Locate

Invoice_items

Invoice_numberItemItem_qtyItem_price

Have

Items

ItemItem_descripManuf_code

Appear onManufacturers

Manuf_codeManuf_name

Produce

Order

(0..*)(0..*) (1..1)(1..1) ((00..*)..*) (1..1)(1..1)

(1..1)(1..1)

(1..*)(1..*)

(0..*)(0..*) (1..1)(1..1) (0..*)(0..*) (1..1)(1..1)

Zip codes locate Customers.Zip codes locate Customers.(() ) A zip code can be related to a A zip code can be related to a minimum of zero and a maximum of minimum of zero and a maximum of many customers.many customers.(() ) A customer can be related to a A customer can be related to a minimum and maximum of one zip code.minimum and maximum of one zip code.

Zip codes locate Customers.Zip codes locate Customers.(() ) A zip code can be related to a A zip code can be related to a minimum of zero and a maximum of minimum of zero and a maximum of many customers.many customers.(() ) A customer can be related to a A customer can be related to a minimum and maximum of one zip code.minimum and maximum of one zip code.

PartialPartial

28

R. Ching, Ph.D. • MIS • California State University, Sacramento

ERDERD

Invoices Customers Zip_Codes

Invoice_numberInvoice_dateDate_deliveredCust_account

Cust_accountCust_nameCust_addrZip_code

Zip_codeCityState

Locate

Invoice_items

Invoice_numberItemItem_qtyItem_price

Have

Items

ItemItem_descripManuf_code

Appear onManufacturers

Manuf_codeManuf_name

Produce

Order

((00..*)..*) (1..1)(1..1) (0..*)(0..*) (1..1)(1..1)

(1..1)(1..1)

(1..*)(1..*)

(0..*)(0..*) (1..1)(1..1) (0..*)(0..*) (1..1)(1..1)

Customers order (items) on invoices.Customers order (items) on invoices.(() ) A customer can be related to a A customer can be related to a minimum of zero and a maximum of minimum of zero and a maximum of many invoices.many invoices.(() ) An invoice can be related to a An invoice can be related to a minimum and maximum of one minimum and maximum of one customer.customer.

Customers order (items) on invoices.Customers order (items) on invoices.(() ) A customer can be related to a A customer can be related to a minimum of zero and a maximum of minimum of zero and a maximum of many invoices.many invoices.(() ) An invoice can be related to a An invoice can be related to a minimum and maximum of one minimum and maximum of one customer.customer.

PartialPartial

29

R. Ching, Ph.D. • MIS • California State University, Sacramento

ERDERD

Invoices Customers Zip_Codes

Invoice_numberInvoice_dateDate_deliveredCust_account

Cust_accountCust_nameCust_addrZip_code

Zip_codeCityState

Locate

Invoice_items

Invoice_numberItemItem_qtyItem_price

Have

Items

ItemItem_descripManuf_code

Appear onManufacturers

Manuf_codeManuf_name

Produce

Order

(0..*)(0..*) (1..1)(1..1) (0..*)(0..*) (1..1)(1..1)

(1..1)(1..1)

((11..*)..*)

(0..*)(0..*) (1..1)(1..1) (0..*)(0..*) (1..1)(1..1)

Invoices possess invoice itemsInvoices possess invoice items(() ) An invoice can be related to a An invoice can be related to a minimum of one and a maximum of minimum of one and a maximum of many invoice items.many invoice items.(()) An invoice item can be related to a An invoice item can be related to a minimum and maximum of one invoice.minimum and maximum of one invoice.

Invoices possess invoice itemsInvoices possess invoice items(() ) An invoice can be related to a An invoice can be related to a minimum of one and a maximum of minimum of one and a maximum of many invoice items.many invoice items.(()) An invoice item can be related to a An invoice item can be related to a minimum and maximum of one invoice.minimum and maximum of one invoice.

MandatoryMandatory

30

R. Ching, Ph.D. • MIS • California State University, Sacramento

ERDERD

Invoices Customers Zip_Codes

Invoice_numberInvoice_dateDate_deliveredCust_account

Cust_accountCust_nameCust_addrZip_code

Zip_codeCityState

Locate

Invoice_items

Invoice_numberItemItem_qtyItem_price

Have

Items

ItemItem_descripManuf_code

Appear onManufacturers

Manuf_codeManuf_name

Produce

Order

(0..*)(0..*) (1..1)(1..1) (0..*)(0..*) (1..1)(1..1)

(1..1)(1..1)

(1..*)(1..*)

((00..*)..*) (1..1)(1..1) (0..*)(0..*) (1..1)(1..1)

Items are sold on invoice items.Items are sold on invoice items.(() ) An item can be related to a An item can be related to a minimum of zero and a maximum of minimum of zero and a maximum of many invoice items.many invoice items.(() ) An invoice item can be related to a An invoice item can be related to a minimum and maximum of one item.minimum and maximum of one item.

Items are sold on invoice items.Items are sold on invoice items.(() ) An item can be related to a An item can be related to a minimum of zero and a maximum of minimum of zero and a maximum of many invoice items.many invoice items.(() ) An invoice item can be related to a An invoice item can be related to a minimum and maximum of one item.minimum and maximum of one item.

PartialPartial

31

R. Ching, Ph.D. • MIS • California State University, Sacramento

ERDERD

Invoices Customers Zip_Codes

Invoice_numberInvoice_dateDate_deliveredCust_account

Cust_accountCust_nameCust_addrZip_code

Zip_codeCityState

Locate

Invoice_items

Invoice_numberItemItem_qtyItem_price

Have

Items

ItemItem_descripManuf_code

Appear onManufacturers

Manuf_codeManuf_name

Produce

Order

(0..*)(0..*) (1..1)(1..1) (0..*)(0..*) (1..1)(1..1)

(1..1)(1..1)

(1..*)(1..*)

(0..*)(0..*) (1..1)(1..1) ((00..*)..*) (1..1)(1..1)

Manufacturers produce items.Manufacturers produce items.(() ) A manufacturer can be related to a A manufacturer can be related to a minimum of zero and a maximum of minimum of zero and a maximum of many items.many items.(() ) An item can be related to a An item can be related to a minimum and maximum of one minimum and maximum of one manufacturer.manufacturer.

Manufacturers produce items.Manufacturers produce items.(() ) A manufacturer can be related to a A manufacturer can be related to a minimum of zero and a maximum of minimum of zero and a maximum of many items.many items.(() ) An item can be related to a An item can be related to a minimum and maximum of one minimum and maximum of one manufacturer.manufacturer.

PartialPartial

32

R. Ching, Ph.D. • MIS • California State University, Sacramento

Higher Forms of Data NormalizationHigher Forms of Data Normalization

• Boyce-Codd Normal Form (BCNF)Boyce-Codd Normal Form (BCNF)• Fourth Normal Form (4NF)Fourth Normal Form (4NF)• Fifth Normal Form (5NF)Fifth Normal Form (5NF)• Domain Key Normal Form (DKNF)Domain Key Normal Form (DKNF)

33

R. Ching, Ph.D. • MIS • California State University, Sacramento

Boyce-Codd Normal Form (BCNF)Boyce-Codd Normal Form (BCNF)

• A relation is in Boyce-Codd normal form if and only if A relation is in Boyce-Codd normal form if and only if every determinant is a candidate keyevery determinant is a candidate key

• For a relation with only one candidate key, 3NF and BCNF For a relation with only one candidate key, 3NF and BCNF are are equivalentequivalent..

• Usually occurs when keys of different relations overlapUsually occurs when keys of different relations overlap

DeterminantDeterminant

(B is functionally dependent on A)(B is functionally dependent on A)

Attribute AAttribute AAttribute AAttribute A Attribute BAttribute BAttribute BAttribute BA determines BA determines B

34

R. Ching, Ph.D. • MIS • California State University, Sacramento

BCNF ExampleBCNF Example

User (User (UserID, DeptUserID, Dept, Name, ComputerID, EmpClassification), Name, ComputerID, EmpClassification)User (User (UserID, DeptUserID, Dept, Name, ComputerID, EmpClassification), Name, ComputerID, EmpClassification)

UserComputer (UserComputer (ComputerIDComputerID, Dept), Dept)

User (User (UserID, ComputerIDUserID, ComputerID, Name, EmpClassification), Name, EmpClassification)

UserComputer (UserComputer (ComputerIDComputerID, Dept), Dept)

User (User (UserID, ComputerIDUserID, ComputerID, Name, EmpClassification), Name, EmpClassification)

ComputerID ComputerID Dept ( Dept (a department issues a computera department issues a computer))UserID, Dept UserID, Dept ComputerID, Name, EmpCassification ComputerID, Name, EmpCassification

((Employees may have the same name and Employees may have the same name and UserIDs are unique within the department onlyUserIDs are unique within the department only))

UserID, ComputerID UserID, ComputerID Dept, Name, EmpClassification Dept, Name, EmpClassification

BCNFBCNF

35

R. Ching, Ph.D. • MIS • California State University, Sacramento

From 3NF to BCNFFrom 3NF to BCNF

Invoice RelationInvoice Relation(Invoice_number, Invoice_date, Date_delivered, Cust_account)(Invoice_number, Invoice_date, Date_delivered, Cust_account)

Customer RelationCustomer Relation(Cust_account, Cust_name, Cust_addr, Zip_code)(Cust_account, Cust_name, Cust_addr, Zip_code)

Zip_code RelationZip_code Relation(Zip_code, City, State)(Zip_code, City, State)

Invoice_items RelationInvoice_items Relation(Invoice_number, Item, Item_qty, Item_price)(Invoice_number, Item, Item_qty, Item_price)

Items RelationItems Relation(Item, Item_descrip)(Item, Item_descrip)

Manufacturers RelationManufacturers Relation(Manuf_code, Manuf_name)(Manuf_code, Manuf_name)

Candidate keys?Candidate keys?

36

R. Ching, Ph.D. • MIS • California State University, Sacramento

Fourth Normal Form (4NF)Fourth Normal Form (4NF)

• A relation is in fourth normal form if and only if it is in A relation is in fourth normal form if and only if it is in Boyce-Codd normal form and there are no nontrivial Boyce-Codd normal form and there are no nontrivial dependencies.dependencies.

– Identify all determinants and make sure they are Identify all determinants and make sure they are candidate keyscandidate keys

37

R. Ching, Ph.D. • MIS • California State University, Sacramento

Employee (Employee (EmployeeID, DeptEmployeeID, Dept))

Projects (Projects (EmployeeID, ProjectEmployeeID, Project))

Employee (Employee (EmployeeID, DeptEmployeeID, Dept))

Projects (Projects (EmployeeID, ProjectEmployeeID, Project))

4NF4NF4NF4NF

4NF Example4NF Example

Employee (Employee (EmployeeID, Dept, ProjectEmployeeID, Dept, Project))Employee (Employee (EmployeeID, Dept, ProjectEmployeeID, Dept, Project))

Multivalued Multivalued dependenciesdependencies

100100 FinanceFinance F177-99F177-99100100 MarketingMarketing F177-99F177-99100100 FinanceFinance F288-00F288-00102102 FinanceFinance F288-00F288-00102102 MarketingMarketing F177-99F177-99102102 FinanceFinance F177-99F177-99

Matrix Matrix managementmanagement

38

R. Ching, Ph.D. • MIS • California State University, Sacramento

4NF4NF4NF4NF

4NF Example4NF Example

Multivalued Multivalued dependenciesdependencies

100100 FinanceFinance F177-99F177-99100100 MarketingMarketing F177-99F177-99100100 FinanceFinance F288-00F288-00102102 FinanceFinance F288-00F288-00102102 MarketingMarketing F177-99F177-99102102 FinanceFinance F177-99F177-99

Matrix Matrix managementmanagement

100100 FinanceFinance100100 MarketingMarketing102102 FinanceFinance102102 MarketingMarketing

100100 FinanceFinance100100 MarketingMarketing102102 FinanceFinance102102 MarketingMarketing

100100 F177-99F177-99100100 F288-00F288-00102102 F288-00F288-00102102 F177-99F177-99

100100 F177-99F177-99100100 F288-00F288-00102102 F288-00F288-00102102 F177-99F177-99

39

R. Ching, Ph.D. • MIS • California State University, Sacramento

Fifth Normal Form (5NF)Fifth Normal Form (5NF)aka Project-Join NFaka Project-Join NF

• A relation is in A relation is in fifth normal form fifth normal form if no remaining nonloss if no remaining nonloss projections (i.e., all projects preserve all information projections (i.e., all projects preserve all information contained in the original relation)are possible, except the contained in the original relation)are possible, except the trivial one in which the key appears in each project.trivial one in which the key appears in each project.

– The join of all projects will result in the original The join of all projects will result in the original relationrelation

– No systematic method exists for obtaining 5NF or for No systematic method exists for obtaining 5NF or for ensuring that a set of relations is indeed 5NFensuring that a set of relations is indeed 5NF

Ricardo, 1990 Ricardo, 1990

40

R. Ching, Ph.D. • MIS • California State University, Sacramento

Domain-Key Normal Form (DKNF)Domain-Key Normal Form (DKNF)

• A relation is in A relation is in domain-key normal form domain-key normal form if every constraint if every constraint is a logical consequence of domain constraints or key is a logical consequence of domain constraints or key constraints (i.e., all possible values are a result of an constraints (i.e., all possible values are a result of an imposed constraint)imposed constraint)– There is no proven method of converting a design to There is no proven method of converting a design to

DKNF, so it remains an ideal rather than a state that DKNF, so it remains an ideal rather than a state that can readily be achievedcan readily be achieved

Ricardo, 1990 Ricardo, 1990

41

R. Ching, Ph.D. • MIS • California State University, Sacramento

DKNFDKNF

Emp_ID, Emp_name, Classification, Position, SalaryEmp_ID, Emp_name, Classification, Position, Salary

For example:For example:

Domain for Classification:Domain for Classification:• ExecutiveExecutive• ManagerManager• StaffStaff

Domain for Classification:Domain for Classification:• ExecutiveExecutive• ManagerManager• StaffStaff

Domain for PositionDomain for Position• Strategic PlannerStrategic Planner• CIOCIO• Vice PresidentVice President

Domain for PositionDomain for Position• Strategic PlannerStrategic Planner• CIOCIO• Vice PresidentVice President

Domain for PositionDomain for Position• Programmer/Analyst IProgrammer/Analyst I• Programmer/Analyst IIProgrammer/Analyst II• Database/Analyst IDatabase/Analyst I

Domain for PositionDomain for Position• Programmer/Analyst IProgrammer/Analyst I• Programmer/Analyst IIProgrammer/Analyst II• Database/Analyst IDatabase/Analyst I

42

R. Ching, Ph.D. • MIS • California State University, Sacramento

Database Design MethodologyDatabase Design Methodology

Physical database designPhysical database design

Logical database designLogical database design

Conceptual database designConceptual database design • Build conceptual representation of the Build conceptual representation of the databasedatabase

• Translate conceptual representation to Translate conceptual representation to logical structure of the databaselogical structure of the database

• Operatioanlize logical structure in a Operatioanlize logical structure in a physical implementationphysical implementation

43

R. Ching, Ph.D. • MIS • California State University, Sacramento

Conceptual Database DesignConceptual Database Design

• The process of constructing a model of the data used in an The process of constructing a model of the data used in an enterprise, independent of all physical considerationsenterprise, independent of all physical considerations

• What’s involved…What’s involved…– Identify entity types, relationship typesIdentify entity types, relationship types– Identify and associate attributes with entity or relationship typesIdentify and associate attributes with entity or relationship types– Determine attribute domainsDetermine attribute domains– Determine candidate, primary and alternate key attributesDetermine candidate, primary and alternate key attributes– Consider use of enhanced modeling conceptsConsider use of enhanced modeling concepts– Check model for redundanciesCheck model for redundancies– Validate conceptual model against user transactionsValidate conceptual model against user transactions– Review conceptual data model with the usersReview conceptual data model with the users

44

R. Ching, Ph.D. • MIS • California State University, Sacramento

Logical Database DesignLogical Database Design

• The process of constructing a model of the data used in an The process of constructing a model of the data used in an enterprise based on a specific data model, but independent enterprise based on a specific data model, but independent of a particular DBMS and other physical considerationsof a particular DBMS and other physical considerations

• What’s involved…What’s involved…– Derive relations for logical data modelDerive relations for logical data model– Validate relations using Validate relations using data normalizationdata normalization– Validate relations against user transactionsValidate relations against user transactions– Check integrity constraintsCheck integrity constraints– Review logical data model (ERD) with the usersReview logical data model (ERD) with the users– Merge logical data models into global data modelMerge logical data models into global data model– Check for future growthCheck for future growth

45

R. Ching, Ph.D. • MIS • California State University, Sacramento

Gather InformationGather Information

• Meet with the users to get gather informationMeet with the users to get gather information

– InterviewsInterviews

– DocumentsDocuments

46

R. Ching, Ph.D. • MIS • California State University, Sacramento

Derive Relations Derive Relations

• Strong and weak entity typesStrong and weak entity types• Relationship types (cardinality)Relationship types (cardinality)• Participation (mandatory vs. partial)Participation (mandatory vs. partial)

Invoice ItemsInvoice Items

Invoice number (pk)Invoice number (pk)Product code (pk)Product code (pk)Manufacture codeManufacture codeQuantityQuantitySales PriceSales Price

InvoiceInvoice

Invoice number (pk)Invoice number (pk)Invoice dateInvoice dateDelivery dateDelivery dateSales typeSales typeCustomer accountCustomer account

1..*1..* 1..11..1

HaveHave

Weak entity type Weak entity type (Invoice number is (Invoice number is part of key)part of key)

Strong entity typeStrong entity type

Mandatory (all Mandatory (all invoices must have invoices must have at least one invoice at least one invoice itemitem

One-to-many relationshipOne-to-many relationship Invoices have invoice itemsInvoices have invoice items

47

R. Ching, Ph.D. • MIS • California State University, Sacramento

Validate RelationsValidate Relations

• Normalize relationsNormalize relations

• Validate against transactions - Validate against transactions - Can a transaction be Can a transaction be recreated given the data retained in the relations?recreated given the data retained in the relations?))

• Check integrity constraintsCheck integrity constraints

– Required data (not null)Required data (not null)

– Domain constraints (in, references)Domain constraints (in, references)

– MultiplicityMultiplicity

– Entity integrity (primary key)Entity integrity (primary key)

– Referential integrity (foreign key)Referential integrity (foreign key)

– General constraints (business rules)General constraints (business rules)

48

R. Ching, Ph.D. • MIS • California State University, Sacramento

Review Data Model with the UsersReview Data Model with the Users

• Be pleasant and professional, not arrogant, challenging or Be pleasant and professional, not arrogant, challenging or condescendingcondescending

– Not everyone is receptive to change Not everyone is receptive to change

– Your role is to Your role is to facilitatefacilitate change change

• The user is always “right” – The user is always “right” – It’s his/her dataIt’s his/her data

• Document all change requests (Document all change requests (CYACYA))

• Listen, listen, listen… (Listen, listen, listen… (Even if you don’t agreeEven if you don’t agree))

49

R. Ching, Ph.D. • MIS • California State University, Sacramento

Logical Global Data ModelLogical Global Data Model

InvoiceInvoice

InventoryInventory

Cust BillingCust Billing

Vendor HistoryVendor History

Product SalesProduct Sales

Records TransactionsRecords Transactions

Counts and retail pricesCounts and retail prices

Customer credit salesCustomer credit sales

Vendor performanceVendor performance

Sales historySales history

Cust AccountsCust Accounts

Customer credit accountsCustomer credit accounts

Local Local Data Data

ModelsModels

Global Data Global Data ModelModel

Global Data Global Data ModelModel

Local data models are merged Local data models are merged to create a (near) normalized to create a (near) normalized global data model global data model

50

R. Ching, Ph.D. • MIS • California State University, Sacramento

Physical Database DesignPhysical Database Design

• The process of producing a description of the The process of producing a description of the implementation of the database on secondary storageimplementation of the database on secondary storage

• It describes the base relations, fle organizations and It describes the base relations, fle organizations and indexes used to achieve efficient access to the data and nay indexes used to achieve efficient access to the data and nay associated integrity constraints and security measuresassociated integrity constraints and security measures

• What’s involved…What’s involved…– Translate logical data model for target DBMS: Design base Translate logical data model for target DBMS: Design base

relations, representation of derived data and general constraintsrelations, representation of derived data and general constraints– Design file organizations and indexes: Analyze transactions, Design file organizations and indexes: Analyze transactions,

choose file organizations, choose indexes, choose file organizations, choose indexes, estimate disk space requirementsestimate disk space requirements

– Design user views and security mechanismsDesign user views and security mechanisms– Consider the introduction of controlled redundancyConsider the introduction of controlled redundancy– Monitor and tune the operational systemMonitor and tune the operational system

Dictated by the DB productDictated by the DB product

51

R. Ching, Ph.D. • MIS • California State University, Sacramento

Logical vs. Physical Database DesignLogical vs. Physical Database Design

• LogicalLogicalThe process of constructing a model of the information use The process of constructing a model of the information use the enterprise based on one model of data, BUT the enterprise based on one model of data, BUT independent of a particular DBMS and other physical independent of a particular DBMS and other physical aspects.aspects.

• PhysicalPhysicalThe process of producing a description of the The process of producing a description of the implementation of the database on secondary storage; it implementation of the database on secondary storage; it describes the storage structures and access methods used to describes the storage structures and access methods used to gain access effectively.gain access effectively.

Whereas the logical database design is concerned with the Whereas the logical database design is concerned with the whatwhat, physical database design is concerned with the , physical database design is concerned with the howhow..

52

R. Ching, Ph.D. • MIS • California State University, Sacramento

Physical Database DesignPhysical Database Design

Five steps:Five steps:

• Translate the global (enterprise) logical data model for the Translate the global (enterprise) logical data model for the target DBMStarget DBMS

• Design files organizations and indexes, estimate database Design files organizations and indexes, estimate database space (disk space requirements)space (disk space requirements)

• Design and implement user views and security Design and implement user views and security mechanismsmechanisms

• Consider the introduction of controlled redundancy Consider the introduction of controlled redundancy ((denormalizationdenormalization))

• Monitor and tune the operational systemMonitor and tune the operational system

53

R. Ching, Ph.D. • MIS • California State University, Sacramento

Translate the Global Logical Database Model Translate the Global Logical Database Model for the Target DBMSfor the Target DBMS

• Design the relations for the target DBMSDesign the relations for the target DBMS

– Decide how to represent the base relations in the global Decide how to represent the base relations in the global logical data model in the target DBMSlogical data model in the target DBMS

• Specify keys (primary, foreign), default values, Specify keys (primary, foreign), default values, integrity constraints (table, column), and indexesintegrity constraints (table, column), and indexes

• Design integrity rules for the target DBMSDesign integrity rules for the target DBMS

– Design the enterprise constraints for the target DBMSDesign the enterprise constraints for the target DBMS

• Applies to updates and insertsApplies to updates and inserts

54

R. Ching, Ph.D. • MIS • California State University, Sacramento

Design and Implement the Physical Design and Implement the Physical RepresentationRepresentation

• Determine the file organizations and access methods that Determine the file organizations and access methods that will be used to store the base relations (i.e., the way in will be used to store the base relations (i.e., the way in which relations and tuples will be held in secondary storage)which relations and tuples will be held in secondary storage)

– Understand the system resourcesUnderstand the system resources• Understand the capabilities of the hardware (CPU, Understand the capabilities of the hardware (CPU,

memory, disk I-O)memory, disk I-O)• Analyze the software’s performance and limitations Analyze the software’s performance and limitations

on the network (client/server) and Interneton the network (client/server) and Internet

Depends on the vendor!Depends on the vendor!

55

R. Ching, Ph.D. • MIS • California State University, Sacramento

Design and Implement the Physical Design and Implement the Physical RepresentationRepresentation

• Analyze the transactions - understand the functionality of the Analyze the transactions - understand the functionality of the transactions that will run on the database, and analyze the transactions that will run on the database, and analyze the import transactionsimport transactions

• Choose file organizationChoose file organization

• Choose secondary indexes - determine whether secondary Choose secondary indexes - determine whether secondary indexes will enhance performanceindexes will enhance performance

– Index the primary key (if it is not the key of the file Index the primary key (if it is not the key of the file organization)organization)

– Do not index small relationsDo not index small relations

– Add a secondary index to a heavily used secondary keyAdd a secondary index to a heavily used secondary key

– Add a secondary index to a frequently used foreign keyAdd a secondary index to a frequently used foreign key

56

R. Ching, Ph.D. • MIS • California State University, Sacramento

Design and Implement the Physical Design and Implement the Physical RepresentationRepresentation

((cont.cont.))

– AVOID AVOID INDEXING AN ATTRIBUTE OR INDEXING AN ATTRIBUTE OR RELATION THAT IS FREQUENTLY UPDATEDRELATION THAT IS FREQUENTLY UPDATED

– Avoid indexing an attribute if the query will retrieve a Avoid indexing an attribute if the query will retrieve a large portion of the tuples in a relationlarge portion of the tuples in a relation

– Avoid indexing attributes that consist of long character Avoid indexing attributes that consist of long character stringsstrings

57

R. Ching, Ph.D. • MIS • California State University, Sacramento

Design and Implement the Physical Design and Implement the Physical RepresentationRepresentation

• Consider the introduction of controlled redundancyConsider the introduction of controlled redundancy

– Determine whether introducing redundancy in a Determine whether introducing redundancy in a controlled manner by relaxing the normalization rules controlled manner by relaxing the normalization rules will enhance performancewill enhance performance

• DenormalizeDenormalize only when only when necessarynecessary

– However, denormalizingHowever, denormalizing

» Makes implementation more complexMakes implementation more complex

» Sacrifices flexibilitySacrifices flexibility

» May slow down updates (although retrievals May slow down updates (although retrievals may be increased)may be increased)

58

R. Ching, Ph.D. • MIS • California State University, Sacramento

3NF (Logical Database Design)3NF (Logical Database Design)

Invoice_numberInvoice_number Invoice_date Date_delivered Cust_account Invoice_date Date_delivered Cust_account

Cust_accountCust_account Cust_name Cust_addr Zip_code Cust_name Cust_addr Zip_code

Zip_codeZip_code City State City State

Invoice_numberInvoice_number ItemItem Item_qty Item_price Item_qty Item_price

Invoice RelationInvoice Relation

Customer RelationCustomer Relation

Zip_code RelationZip_code Relation

Invoice_items RelationInvoice_items Relation

ItemItem Item_descrip Manuf_code Item_descrip Manuf_code

Items RelationItems RelationManuf_codeManuf_code Manuf_name Manuf_nameManufacturers RelationManufacturers Relation

59

R. Ching, Ph.D. • MIS • California State University, Sacramento

Cust_account Cust_name Cust_addr Zip_code Account_type Cust_account Cust_name Cust_addr Zip_code Account_type

Credit_limit Current_balance Pay_historyCredit_limit Current_balance Pay_history

DenormalizationDenormalization

Cust_account Cust_name Cust_addr Zip_codeCust_account Cust_name Cust_addr Zip_code

Customers RelationCustomers Relation

Cust_account Account_type Credit_limit Current_balance Pay_historyCust_account Account_type Credit_limit Current_balance Pay_historyCustomer_accounts RelationCustomer_accounts Relation

• Duplicating attributes or combining relationsDuplicating attributes or combining relations– Combining 1:1 relationshipsCombining 1:1 relationships

Customers RelationCustomers Relation

60

R. Ching, Ph.D. • MIS • California State University, Sacramento

Cust_account Cust_name Cust_addr Cust_account Cust_name Cust_addr City StateCity State Zip_code Zip_code

Account_type Credit_limit Current_balance Pay_historyAccount_type Credit_limit Current_balance Pay_history

DenomalizationDenomalization

• Duplicating attributes or combining relationsDuplicating attributes or combining relations– Duplicating nonkey attributes in 1:M relationships to Duplicating nonkey attributes in 1:M relationships to

reduce joins (creating partial or transitive reduce joins (creating partial or transitive dependencies)dependencies)

Zip_code City StateZip_code City State

Cust_account Cust_name Cust_addr Zip_code Account_type Cust_account Cust_name Cust_addr Zip_code Account_type

Credit_limit Current_balance Pay_historyCredit_limit Current_balance Pay_history

Customers RelationCustomers Relation

Zip_codes RelationZip_codes Relation

Customers RelationCustomers Relation

61

R. Ching, Ph.D. • MIS • California State University, Sacramento

DenomalizationDenomalization

((cont.cont.))

– Reference tables (introducing transitive dependencies)Reference tables (introducing transitive dependencies)

Invoice_number Item Item_qty Item_priceInvoice_number Item Item_qty Item_price

Invoice_items RelationInvoice_items Relation

Item Item_descrip Manuf_codeItem Item_descrip Manuf_code Manuf_code Manuf_nameManuf_code Manuf_name

Manufacturers RelationManufacturers Relation

Problem:Problem: In order to know the manufacturer’s name of a In order to know the manufacturer’s name of a customer’s purchased item, a join between Items and customer’s purchased item, a join between Items and Manufacturers Manufacturers mustmust be performed be performed

Items RelationItems Relation

62

R. Ching, Ph.D. • MIS • California State University, Sacramento

Invoice_number Item Invoice_number Item Manuf_codeManuf_code Manuf_nameManuf_name Item_qty Item_qty

Item_priceItem_price

DenomalizationDenomalization

((cont.cont.))

– Reference tables (introducing transitive dependencies)Reference tables (introducing transitive dependencies)

Invoice_number Item Item_qty Item_priceInvoice_number Item Item_qty Item_price

Invoice_items RelationInvoice_items Relation

Item Item_descrip Manuf_codeItem Item_descrip Manuf_code Manuf_code Manuf_nameManuf_code Manuf_name

Manufacturers RelationManufacturers RelationItems RelationItems Relation

63

R. Ching, Ph.D. • MIS • California State University, Sacramento

DenomalizationDenomalization

((cont.cont.))

– Duplicating foreign key attributes in 1:M relationships to Duplicating foreign key attributes in 1:M relationships to reduce joinsreduce joins

Invoice_number Item Item_qty Item_priceInvoice_number Item Item_qty Item_price

Invoice_items RelationInvoice_items Relation

Item Item_descrip Manuf_codeItem Item_descrip Manuf_code Manuf_code Manuf_nameManuf_code Manuf_name

Manufacturers RelationManufacturers RelationItems RelationItems Relation

Problem:Problem: To find the manufacturer’s name of a product (e.g., To find the manufacturer’s name of a product (e.g., Sony CDP-525) from line_items (relation), two joins must be Sony CDP-525) from line_items (relation), two joins must be made: manufacturers to products, and products to manufacturers.made: manufacturers to products, and products to manufacturers.

64

R. Ching, Ph.D. • MIS • California State University, Sacramento

DenomalizationDenomalization

((cont.cont.))

Invoice_number Item Invoice_number Item Manuf_codeManuf_code Item_qty Item_price Item_qty Item_price

Invoice_number Item Item_qty Item_priceInvoice_number Item Item_qty Item_price

Invoice_items RelationInvoice_items Relation

Item Item_descrip Manuf_codeItem Item_descrip Manuf_code Manuf_code Manuf_nameManuf_code Manuf_name

Manufacturers RelationManufacturers RelationItems RelationItems Relation

65

R. Ching, Ph.D. • MIS • California State University, Sacramento

DenomalizationDenomalization

((cont.cont.))– Duplicating attributes in M:N relationships to reduce Duplicating attributes in M:N relationships to reduce

joinsjoins

If joint accounts are allowed and different types of accounts If joint accounts are allowed and different types of accounts (i.e., long term, revolving) are available:(i.e., long term, revolving) are available:

Cust_account Cust_name Cust_addr Zip_code Soc_Sec_NumCust_account Cust_name Cust_addr Zip_code Soc_Sec_Num

Customers RelationCustomers Relation

Cust_account Account_type Credit_limit Current_balance Pay_historyCust_account Account_type Credit_limit Current_balance Pay_history

Soc_Sec_NumSoc_Sec_Num

M:NM:NM:NM:NCustomer_accounts RelationCustomer_accounts Relation

66

R. Ching, Ph.D. • MIS • California State University, Sacramento

DenormalizationDenormalization

123456789123456789 John SmithJohn Smith … … 123-45-6789123-45-6789

123456789 123456789 Jane SmithJane Smith ...... 987-65-4321987-65-4321

112233445112233445 John DoeJohn Doe …… 567-32-1234567-32-1234

123456789123456789 …… 123-45-6789123-45-6789

123456789123456789 …… 987-65-7321987-65-7321

543219876543219876 …… 123-45-6789123-45-6789

678901234678901234 …… 987-65-7321987-65-7321

548794133548794133 …… 567-32-1234567-32-1234

A customer can have A customer can have several accounts...several accounts...

An account can have An account can have several owners...several owners...

67

R. Ching, Ph.D. • MIS • California State University, Sacramento

DenormalizationDenormalization

((cont.cont.))

Cust_account Cust_name Cust_addr Zip_code Soc_Sec_NumCust_account Cust_name Cust_addr Zip_code Soc_Sec_Num

Customers RelationCustomers Relation

Cust_account Account_type Credit_limit Current_balance Pay_historyCust_account Account_type Credit_limit Current_balance Pay_history

Soc_Sec_NumSoc_Sec_Num

M:NM:NM:NM:NCustomer_accounts RelationCustomer_accounts Relation

68

R. Ching, Ph.D. • MIS • California State University, Sacramento

Cust_account Account_type Credit_limit Current_balance Pay_historyCust_account Account_type Credit_limit Current_balance Pay_history

Soc_Sec_Num Soc_Sec_Num Cust_nameCust_name

DenormalizationDenormalization

Cust_account Cust_name Cust_addr Zip_code Soc_Sec_NumCust_account Cust_name Cust_addr Zip_code Soc_Sec_Num

Customers RelationCustomers Relation

Cust_account Account_type Credit_limit Current_balance Pay_historyCust_account Account_type Credit_limit Current_balance Pay_history

Soc_Sec_NumSoc_Sec_Num

Customer_accounts RelationCustomer_accounts Relation

((cont.cont.))• Duplicating attributes in M:N relationships to reduce joinsDuplicating attributes in M:N relationships to reduce joins

69

R. Ching, Ph.D. • MIS • California State University, Sacramento

DenomalizationDenomalization

((cont.cont.))

– Introducing repeating groups (Introducing repeating groups (if the number of if the number of occurrences is known and/or constantoccurrences is known and/or constant))

– Creating extract tables (in an extreme case, an Creating extract tables (in an extreme case, an unnormalized relation) - frees computing resourcesunnormalized relation) - frees computing resources

Cust_account Account_type Credit_limit Current_balance Pay_historyCust_account Account_type Credit_limit Current_balance Pay_history

Soc_Sec_Num1 Cust_name1 Soc_Sec_Num1 Cust_name1 Soc_Sec_num2 Cust_name2Soc_Sec_num2 Cust_name2

70

R. Ching, Ph.D. • MIS • California State University, Sacramento

DenomalizationDenomalization

((cont.cont.))

– Introduction of “codes” toIntroduction of “codes” to

• Simplify the composite keySimplify the composite key

• Retain the original sequence Retain the original sequence

Invoice_items RelationInvoice_items Relation

Invoice_number Item_numberInvoice_number Item_number Item Manuf_code Item_qty Item_price Item Manuf_code Item_qty Item_price

Invoice_number ItemInvoice_number Item Manuf_code Item_qty Item_price Manuf_code Item_qty Item_price

71

R. Ching, Ph.D. • MIS • California State University, Sacramento

Stereos To GoOrder No.

Date: / /

Account No.

Item

Number Product Description/Manufacturer Qty PriceProduct

Code

1

2

3

4

5

Date Shipped: / /

Customer:Address:

City State Zip Code

10001

6 15 93

1005-25941-2

William Tell2036-26 StreetSacramento CA 95819

SAGX730 Pioneer Remote A/V Receiver 1 56995AT10 Cervwin Vega Loudspeakers 1 35995CDPC725 Sony Disc-Jockey CD Changer 1 39995

6 18 93

Invoice No.

72

R. Ching, Ph.D. • MIS • California State University, Sacramento

DenormalizationDenormalization

1000110001 AT10 AT10 CV CV Loudspeakers Loudspeakers 2 2 359.95359.951000110001 CDPC725 CDPC725 SONSON Disc-Jockey CD Changer Disc-Jockey CD Changer 1 1 399.95399.951000110001 SAGX730 SAGX730 PIOPIO Remote A/V Receiver Remote A/V Receiver 1 1 569.95 569.95

Invo

ice_

num

ber

Invo

ice_

num

ber

Item

Item

Man

uf_c

ode

Man

uf_c

ode

Item

_des

crip

tion

Item

_des

crip

tion

QtyQty

Retai

l_pr

ice

Retai

l_pr

ice

KeyKey

Problem:Problem: These items are not in the sequence as they appear on These items are not in the sequence as they appear on the original document when retrieved from the table.the original document when retrieved from the table.

73

R. Ching, Ph.D. • MIS • California State University, Sacramento

DenormalizationDenormalization

1000110001 0101 SAGX730 SAGX730 PIOPIO Remote A/V Receiver Remote A/V Receiver 1 1 569.95 569.95 1000110001 02 02 AT10 AT10 CV CV Loudspeakers Loudspeakers 2 2 359.95359.951000110001 03 03 CDPC725 CDPC725 SONSON Disc-Jockey CD Changer Disc-Jockey CD Changer 1 1 399.95399.95

Invo

ice_

num

ber

Invo

ice_

num

ber

Item

_num

ber

Item

_num

ber

Item

Item

Man

uf_c

ode

Man

uf_c

ode

Item

_des

crip

tion

Item

_des

crip

tion

QtyQty

Retai

l_pr

ice

Retai

l_pr

ice

KeyKey

74

R. Ching, Ph.D. • MIS • California State University, Sacramento

DenomalizationDenomalization

((cont.cont.))

– Introducing calculated attributesIntroducing calculated attributes

• Simplify processingSimplify processing

Invoice_items RelationInvoice_items Relation

Invoice_number Item_numberInvoice_number Item_number Item Manuf_code Item_qty Item_price Item Manuf_code Item_qty Item_price

Invoice_number ItemInvoice_number Item Manuf_code Item_qty Item_price Manuf_code Item_qty Item_price

Extended_priceExtended_priceItem_qty Item_qty xx Item_price Item_price

75

R. Ching, Ph.D. • MIS • California State University, Sacramento

DenormalizationDenormalization

1000110001 0101 SAGX730 SAGX730 PIOPIO Remote A/V Receiver Remote A/V Receiver 1 1 569.95569.95 569.95 569.95 1000110001 02 02 AT10 AT10 CV CV Loudspeakers Loudspeakers 2 2 359.95359.95 719.90719.901000110001 03 03 CDPC725 CDPC725 SONSON Disc-Jockey CD Changer Disc-Jockey CD Changer 1 1 399.95399.95 399.95399.95

Invo

ice_

num

ber

Invo

ice_

num

ber

Item

_num

ber

Item

_num

ber

Item

Item

Man

uf_c

ode

Man

uf_c

ode

Item

_des

crip

tion

Item

_des

crip

tion

QtyQty Ret

ail_

pric

e

Retai

l_pr

ice

Exten

ded_

pric

e

Exten

ded_

pric

e

CalculationCalculation

76

R. Ching, Ph.D. • MIS • California State University, Sacramento