22
Dimensional Modeling Chapter 2

Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

  • Upload
    roy-cox

  • View
    219

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

Dimensional Modeling

Chapter 2

Page 2: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

The Dimensional Data Model

An alternative to the normalized data model

Present information as simply as possible (easier to understand)

Return queries as quickly as possible (efficient for queries)

Track the underlying business processes (process focused)

Page 3: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

The Dimensional Data Model

Contains the same information as the normalized model

Has far fewer tables Grouped in coherent business categories Pre-joins hierarchies and lookup tables

resulting in fewer join paths and fewer intermediate tables

Normalized fact table with denormalized dimension tables.

Page 4: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

GB Video E-R Diagram

Customer#Cust NoF NameL NameAds1Ads2CityStateZipTel NoCC NoExpire

Rental#Rental NoDateClerk NoPay TypeCC NoExpireCC Approval

Line#Line NoDue DateReturn DateOD chargePay type

Requestor of

Owner of

Video

#Video NoOne-day feeExtra daysWeekend

Title

#Title NoNameVendor NoCost

Name for

Holder of

Page 5: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

CustomerCustIDCust NoF NameL Name

RentalRentalIDRental NoClerk NoStorePay Type

LineLineIDOD ChargeOneDayChargeExtraDaysChargeWeekendChargeDaysReservedDaysOverdueCustIDAddressIDRentalIdVideoIDTitleIDRentalDateIDDueDateIDReturnDateID

VideoVideoIDVideo No

TitleTitleIDTitleNoNameCostVendor Name

Rental DateRentalDateIDSQLDateDayWeekQuarterHoliday

Due DateDueDateIDSQLDateDayWeekQuarterHoliday

Return DateReturnDateIDSQLDateDayWeekQuarterHoliday

AddressAddressIDAdddress1Address2CityStateZipAreaCodePhone

GB Video Data Mart

Page 6: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

Fact Table

Measurements associated with a specific business process

Grain: level of detail of the table Process events produce fact records Facts (attributes) are usually

• Numeric• Additive

Derived facts included Foreign (surrogate) keys refer to dimension tables

(entities) Classification values help define subsets

Page 7: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

Dimension Tables

Entities describing the objects of the process Conformed dimensions cross processes Attributes are descriptive

• Text• Numeric

Surrogate keys Less volatile than facts (1:m with the fact table) Null entries Date dimensions Produce “by” questions

Page 8: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

Bus Architecture

An architecture that permits aggregating data across multiple marts

Conformed dimensions and attributes Drill Down vs. Drill Across Bus matrix

Page 9: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

Keys and Surrogate Keys

A surrogate key is a unique identifier for data warehouse records that replaces source primary keys (business/natural keys)

Protect against changes in source systems Allow integration from multiple sources Enable rows that do not exist in source data Track changes over time (e.g. new customer

instances when addresses change) Replace text keys with integers for efficiency

Page 10: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

Slowly Changing Dimensions

Attributes in a dimension that change more slowly than the fact granularity

Type 1: Current only Type 2: All history Type 3: Most recent few (rare)

Note: rapidly changing dimensions usually indicate the presence of a business process that should be tracked as a separate dimension or as a fact table

Page 11: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

CustKey BKCustID CustName CommDist Gender HomOwn?

1552 31421 Jane Rider 3 F N

Date CustKey ProdKey Item Count Amount

1/7/2004 1552 95 1 1,798.00

3/2/2004 1552 37 1 27.95

5/7/2005 1552 87 2 320.26

2/21/2006 1552 2387 42 1 19.95

Cust

Key

BKCust

ID

Cust

Name

Comm

Dist

Gender Hom

Own?

Eff End

1552 31421 Jane Rider 3 F N 1/7/2004 1/1/2006

2387 31421 Jane Rider 31 F N 1/2/2006 12/31/9999

Fact Table

Dimension with a slowly changing attribute

Page 12: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

Date Dimensions

One row for every day for which you expect to have data for the fact table (perhaps generated in a spreadsheet and imported)

Usually use a meaningful integer surrogate key (such as yyyymmdd 20060926 for Sep. 26, 2006). Note: this order sorts correctly.

Include rows for missing or future dates to be added later.

Page 13: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

Degenerate Dimensions

Dimensions without attributes. (Such as a transaction number or order number.)

Put the attribute value into the fact table even though it is not an additive fact.

Page 14: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

Snowflaking (Outrigger Dimensions or Reference Dimensions)

Connects entities to dimension tables rather than the fact table

Complicates coding and requires additional processing for retrievals

Makes type 2 slowly changing dimensions harder to maintain

Useful for seldom used lookups

Page 15: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

M:N Multivalued Dimensions

Fact to Dimension Dimension to Dimension

Try to avoid these. Solutions can be very misleading.

Page 16: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

Multivalued Dimensions

ORDERS (FACT)SalesRepKeyProductKeySalesRepGrpKeyCustomerKeyOrderQty

SALESREPSalesRepKeyNameAddress

SALESREP-ORDER-BRIDGESalesRepKeySalesrepGroupKeyWeight= (1/NumReps)

Page 17: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

Hierarchies

Group data within dimensions: SalesRep• Region

• State

• County

• Neighborhood

Problem structures• Variable depth

• Frequently changing

Page 18: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

Heterogeneous Products

Several different kinds of entry with different attributes for each• (The sub-class problem)

Page 19: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

Aggregate Dimensions

Dimensions that represent data at different levels of granularity• Remove a dimension

• Roll up the hierarchy (provide a new shrunken dimension with new surr-key that represents rolled up data)

Page 20: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

Junk Dimensions

Miscellaneous attributes that don’t belong to another entity, usually representing processing levels • Flags

• Categories

• Types

Page 21: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

Fact Tables

Transaction• Track processes at discrete points in time

when they occur Periodic snapshot

• Cumulative performance over specific time intervals

Accumulating snapshot• Constantly updated over time. May include

multiple dates representing stages.

Page 22: Dimensional Modeling Chapter 2. The Dimensional Data Model An alternative to the normalized data model Present information as simply as possible (easier

Aggregates

Precalculated summary tables • Improve performance

• Record data an coarser granularity