Upload
roy-cox
View
219
Download
0
Tags:
Embed Size (px)
Citation preview
Dimensional Modeling
Chapter 2
The Dimensional Data Model
An alternative to the normalized data model
Present information as simply as possible (easier to understand)
Return queries as quickly as possible (efficient for queries)
Track the underlying business processes (process focused)
The Dimensional Data Model
Contains the same information as the normalized model
Has far fewer tables Grouped in coherent business categories Pre-joins hierarchies and lookup tables
resulting in fewer join paths and fewer intermediate tables
Normalized fact table with denormalized dimension tables.
GB Video E-R Diagram
Customer#Cust NoF NameL NameAds1Ads2CityStateZipTel NoCC NoExpire
Rental#Rental NoDateClerk NoPay TypeCC NoExpireCC Approval
Line#Line NoDue DateReturn DateOD chargePay type
Requestor of
Owner of
Video
#Video NoOne-day feeExtra daysWeekend
Title
#Title NoNameVendor NoCost
Name for
Holder of
CustomerCustIDCust NoF NameL Name
RentalRentalIDRental NoClerk NoStorePay Type
LineLineIDOD ChargeOneDayChargeExtraDaysChargeWeekendChargeDaysReservedDaysOverdueCustIDAddressIDRentalIdVideoIDTitleIDRentalDateIDDueDateIDReturnDateID
VideoVideoIDVideo No
TitleTitleIDTitleNoNameCostVendor Name
Rental DateRentalDateIDSQLDateDayWeekQuarterHoliday
Due DateDueDateIDSQLDateDayWeekQuarterHoliday
Return DateReturnDateIDSQLDateDayWeekQuarterHoliday
AddressAddressIDAdddress1Address2CityStateZipAreaCodePhone
GB Video Data Mart
Fact Table
Measurements associated with a specific business process
Grain: level of detail of the table Process events produce fact records Facts (attributes) are usually
• Numeric• Additive
Derived facts included Foreign (surrogate) keys refer to dimension tables
(entities) Classification values help define subsets
Dimension Tables
Entities describing the objects of the process Conformed dimensions cross processes Attributes are descriptive
• Text• Numeric
Surrogate keys Less volatile than facts (1:m with the fact table) Null entries Date dimensions Produce “by” questions
Bus Architecture
An architecture that permits aggregating data across multiple marts
Conformed dimensions and attributes Drill Down vs. Drill Across Bus matrix
Keys and Surrogate Keys
A surrogate key is a unique identifier for data warehouse records that replaces source primary keys (business/natural keys)
Protect against changes in source systems Allow integration from multiple sources Enable rows that do not exist in source data Track changes over time (e.g. new customer
instances when addresses change) Replace text keys with integers for efficiency
Slowly Changing Dimensions
Attributes in a dimension that change more slowly than the fact granularity
Type 1: Current only Type 2: All history Type 3: Most recent few (rare)
Note: rapidly changing dimensions usually indicate the presence of a business process that should be tracked as a separate dimension or as a fact table
CustKey BKCustID CustName CommDist Gender HomOwn?
1552 31421 Jane Rider 3 F N
Date CustKey ProdKey Item Count Amount
1/7/2004 1552 95 1 1,798.00
3/2/2004 1552 37 1 27.95
5/7/2005 1552 87 2 320.26
2/21/2006 1552 2387 42 1 19.95
Cust
Key
BKCust
ID
Cust
Name
Comm
Dist
Gender Hom
Own?
Eff End
1552 31421 Jane Rider 3 F N 1/7/2004 1/1/2006
2387 31421 Jane Rider 31 F N 1/2/2006 12/31/9999
Fact Table
Dimension with a slowly changing attribute
Date Dimensions
One row for every day for which you expect to have data for the fact table (perhaps generated in a spreadsheet and imported)
Usually use a meaningful integer surrogate key (such as yyyymmdd 20060926 for Sep. 26, 2006). Note: this order sorts correctly.
Include rows for missing or future dates to be added later.
Degenerate Dimensions
Dimensions without attributes. (Such as a transaction number or order number.)
Put the attribute value into the fact table even though it is not an additive fact.
Snowflaking (Outrigger Dimensions or Reference Dimensions)
Connects entities to dimension tables rather than the fact table
Complicates coding and requires additional processing for retrievals
Makes type 2 slowly changing dimensions harder to maintain
Useful for seldom used lookups
M:N Multivalued Dimensions
Fact to Dimension Dimension to Dimension
Try to avoid these. Solutions can be very misleading.
Multivalued Dimensions
ORDERS (FACT)SalesRepKeyProductKeySalesRepGrpKeyCustomerKeyOrderQty
SALESREPSalesRepKeyNameAddress
SALESREP-ORDER-BRIDGESalesRepKeySalesrepGroupKeyWeight= (1/NumReps)
Hierarchies
Group data within dimensions: SalesRep• Region
• State
• County
• Neighborhood
Problem structures• Variable depth
• Frequently changing
Heterogeneous Products
Several different kinds of entry with different attributes for each• (The sub-class problem)
Aggregate Dimensions
Dimensions that represent data at different levels of granularity• Remove a dimension
• Roll up the hierarchy (provide a new shrunken dimension with new surr-key that represents rolled up data)
Junk Dimensions
Miscellaneous attributes that don’t belong to another entity, usually representing processing levels • Flags
• Categories
• Types
Fact Tables
Transaction• Track processes at discrete points in time
when they occur Periodic snapshot
• Cumulative performance over specific time intervals
Accumulating snapshot• Constantly updated over time. May include
multiple dates representing stages.
Aggregates
Precalculated summary tables • Improve performance
• Record data an coarser granularity