Upload
debra-may
View
281
Download
1
Tags:
Embed Size (px)
Citation preview
Dimensional Modeling OverviewDimensional Modeling Overview
Agenda
■ Dimensional Modeling Overview
■ Dimensional Modeling Steps
■ Dimensional Modeling Framework
■ Retail Sales - Case Study
■ Dimensional Modeling Tips
Dimensional Modeling Overview
From Requirement to Data Design
■ The requirements definition completely drives the Data design for the DW.
■ Data design consists of putting together the data structures.
■ A group of data elements form a data structure.
■ Logical data design includes determination of the various data elements that are needed and combination of the data elements into structures of data and also establishing relationships among the data structures.
Requirements Gathering
Rqts. DefinitionDocument
DimensionalModel Data
Design
Information Package
Design Decisions
■ Choosing the process.
Selecting the subjects from the information packages for the first set of logical structures to be designed.
■ Choosing the grain.
Determining the level of detail for the data structures.
■ Identifying and conforming the dimensions.
Choosing the business dimensions (such as product, market, time, etc.) to be included in the first set of structures .
■ Choosing the facts.
Selecting the metrics or units of measurements (such as product sale units, dollar sales, dollar revenue, etc.) included in set of structures.
■ Choosing the duration of the database.
Determining how far back in time you should go for historical data.
Dimensional Modeling Overview
■ What is Dimensional Modeling♦ It is a logical design technique to structure the business
dimensions and the metrics that are analyzed along these dimensions.
♦ A logical design technique that seeks to present the data in a standard framework that is intuitive and allows for high performance access
♦ It is inherently dimensional and adheres to a discipline that uses relational model with some important restrictions
♦ The fundamental idea of dimensional modeling is that nearly every type of business data can be represented as a kind of cube of data
♦ The model has also proved to provide high performance for queries and analysis
Dimensional Modeling Overview
■ Components of Dimensional Model♦ Fact Table
► The fact table contains facts or measurements of the business
♦ Dimension Table► The dimension tables contain textual attributes that describe the
facts
“Dimension”Report,row and column heading
“Facts”Numeric report values
Dimensional Modeling Overview Sample Report Translation
Sales Rep Performance Report
Central Region
Jul-00 Aug-00
(Dollars) (Dollars)
Chicago District 879 878
Adams 345 456
Brown 564 565
Frederickson 657 768
Minneapolis District 890 789
Andersen Smith 909 978
Central Region Total 4244 4434
Dimensional Modeling Overview
■ Dimension Tables♦ Dimension Tables are the entry points into the data warehouse
♦ Dimension tables are designed especially for selection and grouping under a common head
♦ Determine contextual background for facts
♦ Parameters for OLAP
♦ Common Dimensions► Date
► Product
► Location/Region
► Customers
The dimensional model to represent the information contained in the information package, the data structure must be represent with Metrics, business dimensions and attributes for each business dimension.
Dimensional Modeling Overview
■ Dimension Tables♦ Dimension Table Characteristics
► Serve as report labels and query constraints
• “By” words
• “Where” clauses
► Provide Descriptive Information
• Minimal codes
• Embedded meaning as attributes
► Represent hierarchical relationships
Let see Product business dimension example,
When we want to analyze the fact by products.
ITEM KEY
Item DescriptionItem SizePackage TypeCategory
Dimensional Modeling Overview
Product Key
Product Desc.Brand Desc.Class Desc.Size
Product Key
Product Desc Brand Desc Class Size
0001
0002
0003
CHEERIOS 10 OZ
CHEERIOS 24 OZ
LUCKY CHARMS 10 OZ
CHEERIOS
CHEERIOS
LUCKY CHARMS
Family
Family
Kids
10 OZ
24 OZ
10 OZ
Product Dimension Table with Sample Rows
Big Dimensions
Retail_Fact
time_Key
Store_Key
Customer_Key
Product_key
Sales_dollars
Units_sold
Customer Dimension
Customer_key
Customer_ID (natural key)
Customer_name
Customer_address
Date_of_birth
Age
Gender
Annual_Income
Number_of_children
Marital_status
Other_attributes...
Big Dimensions
Customer Demographics Dimension
Customer_demographics_key
Age_Band
Gender
Income_Band
Number_of_children
Marital_status
Customer Dimension
Customer_key
Customer_ID (natural key)
Customer_name
Customer_address
Date_of_birth
Age
Gender
Annual_Income
Number_of_children
Marital_status
Other_attributes...
Customer Dimension
Customer_key
Customer_ID
Customer_name
Customer_address
Date_of_birth
Dirty dimensions
■ A dirty dimension is the one in which data quality cannot be guaranteed♦ Data about the same customer can appear multiple times
Customer Dimension
Customer_key
Customer_ID (natural key)
Customer_name
Customer_address
Date_of_birth
Age
Gender
Annual_Income
Number_of_children
Marital_status
Other_attributes...
Hierarchies in Dimensions
■ Multiple Hierarchies♦ Dimension tables can represent multiple hierarchies roll-ups
♦ For example ,Store Dimension could have
♦ the following hierarchies► Physical Geography
• Zip, City, County, State, Country
► Sales Organization
• District, Region, Zone
► Distribution Roll-up
• Distribution Center, Distribution Center Region
Store Dimension
Store_key
Store_description
Store_type
Zip
City
State
Sales_region
Sales_zone
Distribution Center
Distribution Center Region
Hierarchies in Dimensions
Dimension Tables can represent multiple hierarchical roll-ups
DateDate
MonthMonth
QuarterQuarter
YearYear
WeekWeek
CityCity
DistrictDistrict
StateState
RegionRegion
Sales ZoneSales Zone
ProductProduct
BrandBrand
CategoryCategoryRegionRegion
Dimensional Modeling Overview
■ Fact Table♦ The fact table is where the numerical measurements of the
business are stored
♦ Facts► The detail information in a Fact tables
• For Examples: Sales Quantity, Unit Sales Price, Sales Amount etc.
► Key performance indicators of the business
► Numeric in Nature
► Analyzed across the dimensions
♦ Multi-part key► Foreign keys to dimension tables
► Date is always a key
Sales Fact
DATE KEY
ITEM KEY
STORE KEY
PROMOTION KEY
POS TRXN#
Sales Quantity
Unit Sales Price
Sales $ Amount
Dimensional Modeling Overview
■ So far we have formed fact table and dimension tables.
■ How should these tables be arranged in the dimensional model?
■ What are the relationships and how should we mark the relationships in the model?
■ The dimensional model should primarily facilitate queries and analyses. What would be the types of queries and analyses?
■ Before combining these tables in dimensional model. What are the requirements? The model should provide the best data access.
The whole model must be query-centric.
It must be optimized for queries and analysis.
The model must show that the dimension tables interact with the fact table.
It should structured in such a way that every dimension can interact equally with the fact table.
The model should allow drilling down or rolling up along dimension hierarchies.
With this rqts., each of the dimension tables are directly relates to fact table in the middle.
■ Such an arrangement in the dimensional model looks like a star formation, with the fact table at the core of a star and the dimension tables along the spikes of the star.
■ The dimensional model is therefore called a STAR schema.
■ See figure:
Star like Model
Dimension1
Fact
Dimension3
Dimensionn
Dimension2
STAR schema for AUTO Sales
AUTO SALES
TIME
PRODUCT
DEALER
CUSTOMERDEMO
GRAPHICS
PAYMENTMETHOD
Typical Star model
Date Dimension
Date_key
Day_of_week
Month
Quarter
Year
Holiday_flag
Sales Fact
Date_key
Product_key
Store_key
Dollars_sold
Units_sold
Dollars_cost
Product Dimension
Product_key
Description
Brand
Category
Store Dimension
Store_key
Store_name
Address
Floor_plan_type
Star model…..
■ It consists of sales fact table in the middle of schema diagram. It have 3 dimension tables of Date, Product and Store.
■ The user will analyze the sales using dollar sold, unit sold and dollar cost.
■ From the STAR schema structure intuitively answers the questions for a given amount of dollars, what was the product sold? Who was the customer? Which store sold the product? When was the order placed?
■ Constraints and filters of queries are easily understood by looking at the star.
■ A common type of analysis is the drilling down the summary numbers to get at the details at the lower levels by filtering queries.
Snowflake Design
FactTable
DimTable
DimTable
DimTable
DimTable
Low cardinality redundant attributes moved to “sub dimension” tables
Snowflake Design
■ Issues♦ Only few tools optimized for snowflake schema
♦ When the tool is not optimized for snowflake design► Presentation more complex
► Browsing is slower
► Problems with multiple joins
Snowflake DesignDimensional Modeling Steps
Dimensional Modeling Steps
■ Identify the Business Process
♦ A major operational process that is supported by some kind of legacy system(s) from which data can be collected for the purpose of the data warehouse
♦ Example: orders, invoices, shipments, inventory, sales
■ Identify the Grain
♦ The fundamental lowest level of data represented in a fact table for the business process
♦ Example: individual transactions, individual daily snapshots
■ Identify the Dimensions
♦ Choose the dimensions that will apply to each fact table record
■ Identify the Facts
♦ Choose the measured facts that will populate each fact table record
Dimensional Modeling Steps
Resist the temptation to model data by looking copy books alone
1. Identify the business Process2. Identify the Grain3. Identify the Dimensions4. Identify the Facts
Business Requirements
Data Reality
Dimensional Model
Key Inputs Dimensional Modeling Steps Output
Shared Dimensions Must Conform
■ Option 1: Identical dimensions with the same keys, labels, definitions and values
Sales Schema
Inventory Schema
Conformed Dimension
Item Key
Item Desc.
Brand Desc.
Category
..
DATE KEY
ITEM KEY
STORE KEY
PROMO KEY
Sales Fact
Item Key
Item Desc.
Brand Desc.
Category
..
DATE KEY
ITEM KEY
STORE KEY
Inventory Fact
■ Option 2: “Subset” of base dimension
Sales Schema
Forecast Schema
Item key Item Desc Brand Desc Category Desc0001 Cheerios 10oz Cheerios Cereal
Brand key Brand Desc Category Desc1001 Cheerios Cereal
Conformed Dimension
Item Key
Item Desc.
Brand Desc.
Category
Desc.
..
DATE KEY
ITEM KEY
STORE KEY
PROMO KEY
Sales $
DATE KEY
Day-of-week
Week Desc
Month Desc
Brand Key
Brand Desc.
Category
Desc.
..
Month Key
Brand Key
Estimate
Sales $
Month KEY
Month Desc
Slowly Changing Dimensions
■ Dimension attributes evolve over time♦ For example, customers change their names, move, have
children, adjust their Incomes
■ For every dimension attribute, need to identify “Changes” strategy♦ May use combination of strategies within same dimension table
Type 1 : Overwrite the changed attributes
Original recordItem Key Item Desc Dept12345 Sim City 3000 Educational S/W
Updated recordItem Key Item Desc Dept12345 Sim City 3000 Strategy S/W
Slowly Changing Dimensions
Type 2 : Add a New Dimension Record
Original recordItem Key Item Desc Dept12345 Sim City 3000 Educational S/W
Additional recordItem Key Item Desc Dept12345 Sim City 3000 Strategy S/W
Slowly Changing Dimensions
Type 3 : Add a “Prior” Attribute
Original recordItem Key Item Desc Dept12345 Sim City 3000 Educational S/W
Updated recordItem Key Item Desc Dept Prior Dept12345 Sim City 3000 Strategy S/W Educational S/W
Slowly Changing Dimensions
Slowly Changing Dimensions
■ Data Warehouse Keys ie., STAR schema keys♦ All tables (facts and dimensions) should use Data Warehouse
generated surrogate keys
♦ It is possible that the customer no., of discontinued customers are reassigned to new customers. We will have a problem because the same customer no,. Could relate to the data for the newer customer and also to the data of the retried customer. Therefore do not use such keys as a primary keys for dimensional tables.
♦ The surrogate keys are simply system generated sequence numbers.
♦ Each row in a dimension table is identified by a unique value of an attribute designated as the primary key of the dimension.
♦ Each dimension table is in 1:M relationship with central fact table. So the primary key of each dimension table must be a foreign key in the fact table.
Additive Measures
■ The ability of measures to be added across all dimensions of the fact table.
■ Measures could be fully additive, semi additive or non additive♦ Fully Additive - The values of the attributes summed up by
simple addition, Aggregation is a fully additive measures is done by simple addition. Sales Quantity, Revenue
♦ Semi Additive - Account Balance, Inventory, number of customers (Measure of Intensity, head counts)
♦ Non-Additive - Profit margin (Ratios and Percentages) i.e., Ratio or Percentages should not be added 1:3, 30%, etc.
Factless Fact tables to track attendance of students
Student Attendance Fact
time_Key
Student_Key
Course_Key
Faculty_Key
Facility_key
Student Dimension
Student_key
Student attributes…..
Faculty Dimension
Faculty_key
Faculty attributes…..
Time Dimension
time_key
time attributes…..
Course Dimension
Course_key
Course attributes…..
Facility Dimension
Facility_key
Facility attributes…..
Factless Fact tables – Coverage tables
Date Dimension
Date_key
Day_of_week
Month
Quarter
Year
Holiday_flag
Sales Fact
Date_key
Product_key
Promotion_key
Store_key
Dollars_sold
Units_sold
Dollars_cost
Product Dimension
Product_key
Description
Brand
Category
Store Dimension
Store_key
Store_name
Address
Floor_plan_type
Promotion Dimension
Promotion_key
Promotion_name
Discount
Event
Sales Fact (revisited)
Factless Fact tables – Coverage tables
Date Dimension
Date_key
Day_of_week
Month
Quarter
Year
Holiday_flag
Sales Fact
Date_key
Product_key
Promotion_key
Store_key
Product Dimension
Product_key
Description
Brand
Category
Store Dimension
Store_key
Store_name
Address
Floor_plan_type
Promotion Dimension
Promotion_key
Promotion_name
Discount
Event
Dimensional Modeling Framework
Identify Subject Area, Grain
Detail Facts with Measures
Source-Data Model Mapping
Pre-calculations, Aggregates, Indexes, Data Structures, Source-Physical Model Mapping
Conceptual Level
Logical Level
Level of detail
Physical Level
Identify Major Dimension & Facts, Conform Dimensions across Facts
•Detail Dimensions with Hierarchies &Attributes•Slowly changing Dimensions Policies
Dimensional Modeling Framework
Dimensional Modeling Framework
■ STEP1: Choose the process♦ Chose a process or subject area from the list of subject areas
identified
♦ Examples of this could be Sales Analysis, Strategic Sourcing, Human Resources
■ STEP2 : Choose the Grain♦ Choose the level of detail
♦ Every data mart / warehouse should be based on the most granular (atomic) data that can possibly be collected and stored.
Dimensional Modeling Framework
■ STEP3 : Identifying Dimensions & Dimension Hierarchy
DateDate
MonthMonth
QuarterQuarter
YearYear
WeekWeek
CityCity
DistrictDistrict
StateState
RegionRegion
ProductProduct
BrandBrand
CategoryCategory
Dimensional Modeling Framework
■ STEP4 : Choose the Measures♦ Identify all the measures for the fact table
■ STEP5 : Conforming the dimensions♦ Common dimensions across the Facts/ data marts have to be
exactly same or subset of the main dimension table
Dimensional Modeling Framework
■ STEP6 : Adding Attributes to Dimension Tables♦ Enhance the depth of analysis
♦ Examples Customer Age, Address, Profession, Product color, flavor, product size, packaging type etc.
DateDate
MonthMonth
QuarterQuarter
YearYear
WeekWeek
CityCity
DistrictDistrict
StateState
RegionRegion
ProductProduct
BrandBrand
CategoryCategory
CurrentFlagSequence
Regional Manager
PhoneAddress Size Color
Dimensional Modeling Framework
■ STEP7 : Storing Pre-calculations in the Fact table♦ Calculated based on one or more base measures
■ STEP8 : Choosing the Duration of the Database♦ Need for analyzing the data over a period of time
Dimensional Modeling Framework
■ STEP9 : Track Slowly Changing Dimensions♦ Certain kinds of dimension attribute changes need to be handled
differently in Data Warehouse► Type I – Overwrite
► Type II - History
► Type III – Add new column example :- Organizational changes
Retail Sales – Case Study
Retail Sales - Case Study
■ Background♦ Chain consists of over 100 grocery stores in five states
♦ Stores average 60,000 SKUs in departments such as frozen foods, dairy etc.
♦ Bar codes are scanned directly into the cash registers PoS system
♦ Items are promoted via coupons, temporary price reductions, ads and in-store promotions
■ Analytical Requirements♦ Need to know what is selling in store each day in order to evaluate
product movement, as well as to see how sales are impacted by promotions
♦ Need to understand the mix of items in a consumer market basket
Retail Sales - Case Study
■ Identify the Business Process
■ Identify the Grain
■ Identify the Dimensions
■ Identify the Measures
■ Sales
■ Transaction Item (Daily Sales)
■ Date, Location, Item, Promotion
■ Quantity, Price, Amount
Retail Sales - Case Study
Date DescriptionWeekMonthQuarterYear
DATE KEYITEM KEYSTORE KEYPROMOTION KEYPOS TRXN#
Sales QuantityUnit Sales PriceSales $ Amount
Store NameCityDistrictZone
Item DescriptionItem SizePackage TypeCategory
Promotion DescriptionDiscount
DATE KEY
STORE KEYPROMOTION KEY
ITEM KEY
Resultant Sales Schema
Retail Sales - Case Study
Time Dimension
Date Key Date Day of Week Day Number in Month
Month Quarter Year Holiday Indicator
1 1/1/1999 Friday 1 January Q1 1999 Holiday
2 1/2/1999 Saturday 2 January Q1 1999 Non-Holiday
3 1/3/1999 Sunday 3 January Q1 1999 Non-Holiday
4 1/4/1999 Monday 4 January Q1 1999 Non-Holiday
Item Dimension
Itemkey Item Description SKU Number Dept Size Package Type Brand Category
1 Lasagna 6 OZ 90706287103 Grocery 6 OZ Box Cold Gourmet Frozen Foods
2 Beef Stew 6 OZ 16005393282 Grocery 6 OZ Box Cold Gourmet Frozen Foods
3 Extra Nougat 2 OZ
46817560065 Grocery 6 OZ Can Chewy Candy
Retail Sales - Case Study
Promotion Dimension
Promo Key Promo Name Price Reduction Ad Type Media Type Promo $ Begin Date End Date
1 Blue Ribbon Discounts
Temporary Daily Paper Paper 2000 1/1/1999 1/15/1999
2 Red Carpet Closeout
Markdown Sunday Paper Paper 1000 1/3/1999 1/10/1999
3 Ad Blitz None Paper and Radio Paper and Radio
7000 1/15/1999 1/30/1999
Sales Fact
Date Key Item Key Store Key Promo Key POS Trxn # Sales Qty Unit Sales Price
Sales $Amt
1 1 1 15 763457893 1 4.59 4.59
1 2 1 1 763457893 2 0.89 1.78
1 5 11 19 763457894 1 2.56 2.56
2 13 5 8 763457923 1 0.33 0.33
2 5 11 12 763457998 1 1.29 1.29
Dimensional Modeling Tips
Dimensional Modeling Tips
■ Carefully choose the labels to identify data marts, dimension, attributes and facts
■ An attribute can live in one and only one dimension, whereas a fact can be repeated in multiple fact tables
■ If a single dimension appears to reside in more than one places, several roles are probably being played. Name the roles uniquely and treat them as separate dimensions
■ A single field in the underlying source data can have one or more logical columns associated with it
♦ E.g., A product attribute field may translate to product code, product short description, and product long description
■ Every fact should have a default aggregation rule (sum, min, max, latest, semi-additive, special algorithm, and not aggregatable)
♦ This will serve as a requirements list for query and report writers tools evaluations
Thank You
■ References♦ Ralph Kimball
► The Data Warehouse Toolkit
► The Data Warehouse Lifecycle Toolkit