Copyright © 2003, SAS Institute Inc. All rights reserved.
Best Practices in Data Warehousing Design and Implementation
Rein Mertens
SAS Netherlands
Best Practices
! A best practice is a technique or methodology that, through experience and research, has proven to reliably lead to a desired result. A commitment to using the best practices in any field is a commitment to using all the knowledge and technology at one's disposal to ensure success. The term is used frequently in the fields of health care, government administration, the education system, project management, hardware and software product development, and elsewhere.
! Using past experiences to plan future developments
Data Warehousing approaches over time
! Use operational data directly? ("virtual dwh")
! Monolithic data warehouse? ("big bang")
! Data marts?
! Evolutionary approach? ("think big, start small")
Independent Data Marts
[Figure (b): An Information Architecture based on Data Marts without an Enterprise Layer]
  Operational source systems: Inventory, Product, Orders, Contacts, Billing
  Other sources: External data, Operational data store
  Departmental data warehouses and business unit specific data marts: Marketing, Finance, Sales
  Personal data marts
  Metadata (three separate boxes, marked "?")
  Access: Web, WAP telephone, PDA, Workstation, WWW browser
Data Warehousing in 2003 requires an Architected Environment
! Supports all forms of information use:
  - Suppliers/Organization/Customer/Enterprise
  - Query and reporting
  - Analytics / Data mining
  - Multi-dimensional analysis and OLAP
! Ensures end-to-end consistency:
  - Co-ordinates access to operational data
  - Foundation layer
  - Metadata support
    - Maintenance
    - User view
Data Warehousing in 2003
[Figure: ETL flow from source systems through Staging and the Central Warehouse to data marts, each stage running on SAS]
  Source: Excel, SAS, SAP, Oracle, PeopleSoft
  Step #1 - Extract and Transform from source data into Staging Area (ETL Studio, Data Surveyors)
  Step #2 - Data Quality, Data Validation, Slowly changing dimensions (ETL Studio transformations, Data Quality plugins)
  Step #3 - Transform into dimensional model (More ETL flows)
  Data marts: Finance, Customers, HCM, Suppliers
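Step #2 of the flow lists slowly changing dimensions. The slides do not say which technique is used, so the following is only a minimal sketch of one common option, a Type 2 update that keeps history by closing the current row and adding a new version. The function name apply_scd2, the OPEN_END marker, and the sample rows are assumptions made for illustration; the Start_Date and End_Date column names follow the Orion Star model.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed marker meaning "this version is still current"

def apply_scd2(dimension, incoming, key, tracked, load_date):
    """Apply Type 2 changes in place: close the current row, add a new version."""
    # Index the current (open) version of every dimension member by its key.
    current = {row[key]: row for row in dimension if row["End_Date"] == OPEN_END}
    for new in incoming:
        old = current.get(new[key])
        if old is not None and all(old[c] == new[c] for c in tracked):
            continue  # nothing changed, keep the current version
        if old is not None:
            old["End_Date"] = load_date  # close the outdated version
        version = {key: new[key], "Start_Date": load_date, "End_Date": OPEN_END}
        version.update({c: new[c] for c in tracked})
        dimension.append(version)
    return dimension

# Hypothetical dimension content and staged rows.
dim = [{"Customer_Id": 1, "Customer_Type": "Regular",
        "Start_Date": date(2003, 1, 1), "End_Date": OPEN_END}]
staged = [{"Customer_Id": 1, "Customer_Type": "Gold"},
          {"Customer_Id": 2, "Customer_Type": "Regular"}]
apply_scd2(dim, staged, "Customer_Id", ["Customer_Type"], date(2003, 6, 1))
```

After the call, customer 1 has a closed "Regular" row and an open "Gold" row, and customer 2 has a single open row. Running the same load again changes nothing, which is convenient for restartable ETL jobs.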
Working with operational data
! How do you know about your operational world?
! External metadata
  - Operational databases
  - Operational systems
    - May need navigational assistance
! (Near) real-time?
! Data Profiling!
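Data profiling can start very small. The sketch below is not the SAS tooling from the slides, just an illustration of the idea in plain Python: for one column it counts rows, missing values, distinct values, and the most frequent values. The function name profile_column and the sample rows are assumptions; the column names are borrowed from the Orion Star Customer table.

```python
from collections import Counter

def profile_column(rows, column):
    """Summarize one column of a list of dict rows."""
    values = [row.get(column) for row in rows]
    # Treat None and blank strings as missing.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(3),
    }

# Hypothetical extract from an operational customer table.
customers = [
    {"Customer_Id": 1, "Country": "NL", "Gender": "F"},
    {"Customer_Id": 2, "Country": "NL", "Gender": "M"},
    {"Customer_Id": 3, "Country": "", "Gender": "m"},
    {"Customer_Id": 4, "Country": "US", "Gender": None},
]

country_profile = profile_column(customers, "Country")
gender_profile = profile_column(customers, "Gender")
```

Here the Gender profile reports three distinct values ("F", "M", "m") for a column that should have two, which is the kind of finding that feeds the cleansing rules in the staging area.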
The importance of staging areas
! Support an incremental approach
! A place for various kinds of data validation
  - Data cleansing
  - Data content validation
  - Data integrity
  - Key validation
! Slowly changing dimension management
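The validation checks listed above can be sketched as a single pass over staged rows. This is a minimal illustration under assumptions, not the Data Quality plugins mentioned in the slides: the function name validate_staged_orders, the three rules chosen, and the sample rows are hypothetical, while the column names come from the Orion Star Order table.

```python
def validate_staged_orders(orders, known_customer_ids):
    """Split staged order rows into accepted rows and rejects with reasons."""
    accepted, rejected = [], []
    seen_order_ids = set()
    for row in orders:
        problems = []
        # Data content validation: a required value must be present.
        if row.get("Order_Date") is None:
            problems.append("missing Order_Date")
        # Data integrity: the business key must be unique within the load.
        if row["Order_Id"] in seen_order_ids:
            problems.append("duplicate Order_Id")
        seen_order_ids.add(row["Order_Id"])
        # Key validation: the foreign key must exist in the customer table.
        if row.get("Customer_Id") not in known_customer_ids:
            problems.append("unknown Customer_Id")
        if problems:
            rejected.append((row, problems))
        else:
            accepted.append(row)
    return accepted, rejected

# Hypothetical staged rows; customers 1 and 2 are assumed to exist already.
staged_orders = [
    {"Order_Id": 100, "Customer_Id": 1, "Order_Date": "2003-05-01"},
    {"Order_Id": 101, "Customer_Id": 9, "Order_Date": "2003-05-02"},
    {"Order_Id": 100, "Customer_Id": 2, "Order_Date": None},
]
accepted, rejected = validate_staged_orders(staged_orders, {1, 2})
```

Keeping the rejects together with their reasons, instead of silently dropping them, lets the staging area double as a feedback channel to the owners of the source systems.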
What is a central data model?
! Normalized relational information
! Foundation Layer
! Example: Orion Star sample data model
Continent
  Continent_Id: INTEGER
  Continent_Name: CHARACTER(30)
Country
  Country: CHARACTER(2)
  Country_Name: CHARACTER(45)
  Population: NUMERIC(6)
  Office: CHARACTER(2)
  Dir: CHARACTER(3)
  Country_Id: INTEGER
  Continent_Id: INTEGER (FK)
  Country_Former_Nam: CHARACTER(45)
Customer
  Customer_Id: INTEGER
  Country: CHARACTER(2)
  Gender: CHARACTER(1)
  Personal_Id: CHARACTER(15)
  Customer_Name: CHARACTER(40)
  Customer_Firstname: CHARACTER(20)
  Customer_Lastname: CHARACTER(30)
  Birthday: DATE
  Customer_Address: CHARACTER(40)
  Street_Id: INTEGER (FK)
  Street_Number: CHARACTER(8)
  Customer_Type_Id: INTEGER (FK)
Customer_Type
  Customer_Type_Id: INTEGER
  Customer_Type: CHARACTER(40)
  Customer_Group_Id: INTEGER
  Customer_Group: CHARACTER(40)
Order
  Order_Id: INTEGER
  Employee_Id: INTEGER (FK)
  Customer_Id: INTEGER (FK)
  Order_Date: DATE
  Delivery_Date: DATE
  Order_Type: INTEGER
Order_Item
  Order_Id: INTEGER (FK)
  Order_Item_No: INTEGER
  Product_Id: INTEGER (FK)
  Amount: SMALLINT
  Price: DECIMAL(12,2)
  Unit_Cost_Price: DECIMAL(12,2)
  Promotion: DECIMAL(5,2)
Organization
  Employee_Id: INTEGER
  Org_Name: CHARACTER(40)
  Country: CHARACTER(2)
  Org_Level_Id: INTEGER (FK)
  Start_Date: DATE
  End_Date: DATE
  Org_Ref_Id: INTEGER (FK)
Org_Level
  Org_Level_Id: INTEGER
  Org_Text: CHARACTER(40)
Price_List
  Product_Id: INTEGER (FK)
  Start_Date: DATE
  End_Date: DATE
  Unit_Cost_Price: DECIMAL(12,2)
  Unit_Sales_Price: DECIMAL(12,2)
Product
  Product_Id: INTEGER
  Product_Name: CHARACTER(45)
  Supplier_Id: INTEGER (FK)
  Product_Level_Id: INTEGER (FK)
  Product_Level: NUMERIC(3)
  Product_Ref_Id: INTEGER (FK)
Street_Code
  Street_Id: INTEGER
  Country: CHARACTER(2)
  Street_Name: CHARACTER(30)
  Zip_Code: CHARACTER(10)
  From_Street_No: NUMERIC(8)
  To_Street_No: NUMERIC(8)
  City: CHARACTER(22)
  County: CHARACTER(25)
  State: CHARACTER(2)
  City_Id: INTEGER
  County_Id: INTEGER
  State_Id: INTEGER
Supplier
  Supplier_Id: INTEGER
  Supplier_Name: CHARACTER(30)
  Street_Id: INTEGER (FK)
  Supplier_Address: CHARACTER(30)
  Supplier_Street_Nu: NUMERIC(3)
  Country: CHARACTER(2)
Product_Level
  Product_Level_Id: INTEGER
  Product_Level_Name: CHARACTER(30)
Staff
  Employee_Id: INTEGER (FK)
  Start_Date: DATE
  Salary: DECIMAL(12,2)
  Birthday: DATE
  End_Date: DATE
  Emp_Hire_Date: DATE
  Gender: CHARACTER(1)
  Emp_Term_Date: DATE
  Job_Title: CHARACTER(25)
Promotion
  Product_Id: INTEGER (FK)
  Start_Date: DATE
  End_Date: DATE
  Sales_Price: DECIMAL(12,2)
  Promotion: DECIMAL(5,2)
Physical Normalized Model
Metadata import
! Common Warehouse Metamodel (CWM)
! Meta Integration Model Bridge (MIMB)
! Registers lots of metadata at once
Working with dimensional models
! Purpose-oriented data mart
! Dimension tables
! Fact tables with measures
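Moving from the normalized central model to a dimension table is largely a matter of denormalizing: lookups that were separate tables become plain attributes of the dimension. As a rough sketch, assuming rows are held as Python dicts, the Customer and Customer_Type tables of the Orion Star model could be flattened into Customer_Dim rows as follows. The function name build_customer_dim and the sample values are hypothetical, and only a subset of the Customer_Dim columns is filled.

```python
def build_customer_dim(customers, customer_types):
    """Join each customer to its type and group, producing flat dimension rows."""
    types_by_id = {t["Customer_Type_Id"]: t for t in customer_types}
    dimension = []
    for customer in customers:
        ctype = types_by_id[customer["Customer_Type_Id"]]
        dimension.append({
            "Customer_Id": customer["Customer_Id"],
            "Customer_Name": customer["Customer_Name"],
            "Customer_Country": customer["Country"],
            # Attributes copied in from the lookup table:
            "Customer_Type": ctype["Customer_Type"],
            "Customer_Group": ctype["Customer_Group"],
        })
    return dimension

# Hypothetical rows from the normalized model.
customer_types = [
    {"Customer_Type_Id": 10, "Customer_Type": "Club member, high activity",
     "Customer_Group_Id": 1, "Customer_Group": "Club members"},
]
customers = [
    {"Customer_Id": 1, "Customer_Name": "A. Jansen", "Country": "NL",
     "Customer_Type_Id": 10},
]
customer_dim = build_customer_dim(customers, customer_types)
```

The dimension repeats the type and group text on every row. That redundancy is deliberate: queries against the data mart no longer need the extra joins.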
Time_Dim
  Date: INTEGER
  Weekday_No: SMALLINT
  Weekday_Eu: SMALLINT
  Weekday_Name: VARCHAR(20)
  Week_No: SMALLINT
  Week_Name: CHARACTER(7)
  Month_No: SMALLINT
  Month_Name: VARCHAR(20)
  Quarter: CHARACTER(6)
  Year: CHARACTER(4)
  Holiday_Us: VARCHAR(45)
Product_Dim
  Product_Id: INTEGER
  Product_Name: CHARACTER(45)
  Supplier_Id: INTEGER
  Product_Group: CHARACTER(40)
  Product_Category: CHARACTER(40)
  Product_Line: CHARACTER(40)
Customer_Dim
  Customer_Id: INTEGER
  Customer_Country: CHARACTER(2)
  Gender: CHARACTER(1)
  Customer_Name: CHARACTER(40)
  Customer_Firstname: CHARACTER(20)
  Customer_Lastname: CHARACTER(30)
  Birthday: DATE
  Customer_Type: CHARACTER(40)
  Customer_Group: CHARACTER(40)
  Age: DECIMAL(3)
Order_Fact
  Customer_Id: INTEGER (FK)
  Employee_Id: INTEGER (FK)
  Street_Id: INTEGER (FK)
  Product_Id: INTEGER (FK)
  Order_Date: DATE (FK)
  Order_Id: INTEGER
  Order_Type: SMALLINT
  Delivery_Date: DATE
  Amount: DECIMAL(5)
  Price: NUMERIC(8,2)
  Cost_Price_Pr_Unit: DECIMAL(12,2)
  Promotion: DECIMAL(5,2)
Organization_Dim
  Employee_Id: INTEGER
  Country: CHARACTER(2)
  Employee_Name: CHARACTER(40)
  Group: CHARACTER(40)
  Section: CHARACTER(40)
  Department: CHARACTER(40)
  Company: CHARACTER(40)
  Job_Title: CHARACTER(25)
  Gender: CHARACTER(1)
  Salary: DECIMAL(12,2)
  Birthday: DATE
  Emp_Hire_Date: DATE
  Emp_Term_Date: DATE
  Start_Date: DATE
  End_Date: DATE
Geography_Dim
  Street_Id: INTEGER
  Street_Name: CHARACTER(40)
  Zipcode: CHARACTER(10)
  City: CHARACTER(40)
  County: VARCHAR(40)
  Province: CHARACTER(40)
  Region: CHARACTER(40)
  State: VARCHAR(30)
  Country: CHARACTER(2)
  Continent: CHARACTER(40)
Physical Data Model of Order Star Schema
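To make the shape of a star-schema query concrete, the sketch below builds a cut-down version of the Order star schema in an in-memory SQLite database and runs one report against it. SQLite stands in for the warehouse engine purely for illustration. Only a few columns of Product_Dim, Customer_Dim and Order_Fact are kept, the sample rows are invented, and treating Amount as a unit count and Price as the line total is an assumption about the model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product_Dim (
    Product_Id INTEGER PRIMARY KEY,
    Product_Name TEXT,
    Product_Line TEXT
);
CREATE TABLE Customer_Dim (
    Customer_Id INTEGER PRIMARY KEY,
    Customer_Name TEXT,
    Customer_Country TEXT
);
CREATE TABLE Order_Fact (
    Order_Id INTEGER,
    Customer_Id INTEGER REFERENCES Customer_Dim(Customer_Id),
    Product_Id INTEGER REFERENCES Product_Dim(Product_Id),
    Order_Date TEXT,
    Amount INTEGER,
    Price REAL
);
""")
conn.executemany("INSERT INTO Product_Dim VALUES (?, ?, ?)",
                 [(1, "Tent", "Outdoors"), (2, "Sneakers", "Shoes")])
conn.executemany("INSERT INTO Customer_Dim VALUES (?, ?, ?)",
                 [(10, "A. Jansen", "NL"), (11, "B. Smith", "US")])
conn.executemany("INSERT INTO Order_Fact VALUES (?, ?, ?, ?, ?, ?)",
                 [(100, 10, 1, "2003-05-01", 1, 200.0),
                  (101, 10, 2, "2003-05-02", 2, 100.0),
                  (102, 11, 2, "2003-05-02", 1, 50.0)])

# One join per dimension, then group by the dimension attributes of interest.
sales_by_country_and_line = conn.execute("""
    SELECT c.Customer_Country, p.Product_Line,
           SUM(f.Amount) AS Units, SUM(f.Price) AS Revenue
    FROM Order_Fact AS f
    JOIN Customer_Dim AS c ON c.Customer_Id = f.Customer_Id
    JOIN Product_Dim AS p ON p.Product_Id = f.Product_Id
    GROUP BY c.Customer_Country, p.Product_Line
    ORDER BY c.Customer_Country, p.Product_Line
""").fetchall()
```

Every report against the star follows the same pattern: start from the fact table, join only the dimensions the question needs, and aggregate the measures.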
Next Steps
! OLAP
  - Design cubes for common queries/reports
  - Star schema is a good basis for this
! Semantic layer for end user reporting
  - SAS Information Map Studio
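The OLAP point can be illustrated with a toy cube: a measure is pre-aggregated for every combination of dimensions, so common roll-ups become simple lookups. This is a sketch of the general idea only, not of how a SAS OLAP server stores cubes. The function name build_cube, the "ALL" marker for a rolled-up dimension, and the sample facts are assumptions.

```python
from collections import defaultdict
from itertools import combinations

def build_cube(facts, dimensions, measure):
    """Pre-aggregate `measure` for every subset of `dimensions`.

    Cells are keyed by a tuple with one entry per dimension; "ALL" marks a
    dimension that has been rolled up.
    """
    cube = defaultdict(float)
    subsets = [set(combo) for size in range(len(dimensions) + 1)
               for combo in combinations(dimensions, size)]
    for fact in facts:
        for kept in subsets:
            cell = tuple(fact[d] if d in kept else "ALL" for d in dimensions)
            cube[cell] += fact[measure]
    return dict(cube)

# Hypothetical fact rows, already joined to their dimension attributes.
facts = [
    {"Year": "2002", "Product_Line": "Shoes", "Price": 50.0},
    {"Year": "2003", "Product_Line": "Shoes", "Price": 100.0},
    {"Year": "2003", "Product_Line": "Outdoors", "Price": 200.0},
]
cube = build_cube(facts, ["Year", "Product_Line"], "Price")
total_2003 = cube[("2003", "ALL")]
total_shoes = cube[("ALL", "Shoes")]
```

The number of cells grows quickly with the number of dimensions, which is one reason to design cubes around the common queries and reports instead of around every possible one.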
SAS® Information Map Studio
[Figure: HR Information Map; HR Data]
How to develop
! Document it!
! Multi-developer projects
! Check-in, check-out
! Audit history
! Dev, Test, Prod environments
! Reusable added code
How to deploy
! Execute ETL in a production environment
! Easily managed deployment
! Schedule for periodic runs
  - Uses integrated LSF Scheduler from Platform Computing
  - Ready for use with standard schedulers too
    - cron (UNIX), at (Windows)
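Whatever scheduler is used, it helps if the deployed job reports success or failure through its exit code. The following sketch shows one way to wrap the steps of a flow so that a scheduler such as cron can detect a failed run. It is an assumption about how such a wrapper might look, not code generated by ETL Studio. The step functions are empty placeholders, and the script path in the crontab comment is hypothetical.

```python
import logging

def run_flow(steps):
    """Run ETL steps in order and stop at the first failure.

    Returns a process exit code: 0 on success, 1 on failure.
    """
    for name, step in steps:
        logging.info("starting %s", name)
        try:
            step()
        except Exception:
            logging.exception("step %s failed", name)
            return 1
    return 0

# Placeholder steps mirroring the three steps of the flow.
def extract_to_staging():
    """Step #1 placeholder."""

def validate_and_load_warehouse():
    """Step #2 placeholder."""

def build_dimensional_model():
    """Step #3 placeholder."""

STEPS = [
    ("extract_to_staging", extract_to_staging),
    ("validate_and_load_warehouse", validate_and_load_warehouse),
    ("build_dimensional_model", build_dimensional_model),
]

# A scheduled script would pass this value to sys.exit().
exit_code = run_flow(STEPS)

# Example crontab entry (hypothetical path), running the job daily at 02:00:
#   0 2 * * * /usr/bin/python3 /opt/dwh/run_etl.py
```

Stopping at the first failed step keeps a broken extract from being loaded into the central warehouse and the data marts behind it.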
Successful Data Warehouse implementations
! Shift corporate culture to rely upon DWH: 92%
! Align planning with corporate strategy: 75%
! Provide adequate resources to maintain/expand: 75%
! Demonstrate return-on-investment: 50%
! Build executive support: 42%
! Maintain complete and up-to-date content: 42%
! Employ correct technology / upgrade at best frequency: 42%
! Plan sufficient training: 33%
Summary
! Planning is key
! Architected environment
! Don't be afraid of ERP data
! Integrate cleansing and validation
! Don't underestimate the value of data modeling
! Metadata
! Be sure the DWH is used