Upload
phungthu
View
217
Download
1
Embed Size (px)
Citation preview
IIO30120 Database Design
Data warehouse designbased on the book Hovi, Huotari, Lahdenmäki:
Tietokantojen suunnittelu & indeksointi, Docendo (2003, 2005) chapter 8
© Jouni Huotari & Ari Hovi
Some slides from http://www.slideshare.net/idnats/data-warehousing-and-data-mining-presentation-72547618.10.2015
• What is a data warehouse?
• How the term Business Intelligence relates to data warehousing?
• How data mart differs from data warehouse?
• Why OLAP (online analytical processing) is important?
• What is the basic idea behind ETL?
For discussion
© Jouni Huotari & Ari Hovi2
18.10.2015
What is Data Warehouse?
• Defined in many different ways, but not rigorously. A decision support database that is maintained separately from the
organization’s operational database
Support information processing by providing a solid platform of consolidated, historical data for analysis.
• “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.”—W. H. Inmon
• Data warehousing: The process of constructing and using data warehouses
To support fast summary queries, analysis, and reporting
• This is often difficult in operative databases, but some solutions exist
• can you mention any solutions?
It is important to
• Maintain history for seeing trends etc.
• Clean up and make data consistent from different data sources
• Make the structure clear and understandable
Basic design principles for data warehouses
© Jouni Huotari & Ari Hovi4
18.10.2015
5
Example: a producer wants to know….
Which are ourlowest/highest margin
customers ?Who are my customers
and what products are they buying?
Which customersare most likely to go to the competition ?
What impact will new products/services
have on revenue and margins?
What product prom--otions have the biggest
impact on revenue?
What is the most effective distribution
channel?
18.10.2015
back room and front room of a data warehouse
© Jouni Huotari & Ari Hovi6
© Kimball, Caserta:
The Data Warehouse ETL Toolkit
(2004)
18.10.2015
7
Data Warehouse vs. Operational DBMS
• OLTP (on-line transaction processing)
– Major task of traditional relational DBMS
– Day-to-day operations: purchasing, inventory, banking, manufacturing,
payroll, registration, accounting, etc.
• OLAP (on-line analytical processing)
– Major task of data warehouse system
– Data analysis and decision making
• Distinct features (OLTP vs. OLAP):
– User and system orientation: customer vs. market
– Data contents: current, detailed vs. historical, consolidated
– Database design: ER + application vs. star + subject
– View: current, local vs. evolutionary, integrated
– Access patterns: update vs. read-only but complex queries
• Star schema: A fact table in the middle connected to a set of dimension tables
• Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to snowflake
• Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation
• Dimensions describe who, what, when, where and why for the facts.
Conceptual Modeling of Data Warehouses
8Octobe
r 18,
Example of Star Schema
10Octobe
r 18,
Another example of Star Schema
time_key
day
day_of_the_week
month
quarter
year
time
location_key
street
city
state_or_province
country
location
Sales Fact Table
time_key
item_key
branch_key
location_key
units_sold
dollars_sold
avg_sales
Measures
item_key
item_name
brand
type
supplier_type
item
branch_key
branch_name
branch_type
branch
11Octobe
r 18,
Example of Snowflake Schema
time_key
day
day_of_the_week
month
quarter
year
time
location_key
street
city_key
location
Sales Fact Table
time_key
item_key
branch_key
location_key
units_sold
dollars_sold
avg_sales
Measures
item_key
item_name
brand
type
supplier_key
item
branch_key
branch_name
branch_type
branch
supplier_key
supplier_type
supplier
city_key
city
state_or_province
country
city
12Octobe
r 18,
Example of Fact Constellation
time_key
day
day_of_the_week
month
quarter
year
time
location_key
street
city
province_or_state
country
location
Sales Fact Table
time_key
item_key
branch_key
location_key
units_sold
dollars_sold
avg_sales
Measures
item_key
item_name
brand
type
supplier_type
item
branch_key
branch_name
branch_type
branch
Shipping Fact Table
time_key
item_key
shipper_key
from_location
to_location
dollars_cost
units_shipped
shipper_key
shipper_name
location_key
shipper_type
shipper
13Octobe
r 18,
Multidimensional Data
• Sales volume as a function of product, month, and region
Pro
duct
Month
•Dimensions: Product, Location, Time
•Hierarchical summarization paths
Industry Region Year
Category Country Quarter
Product City Month Week
Office Day
A Sample Data CubeTotal annual sales
of TV in U.S.A.Date
Cou
ntr
ysum
sumTV
VCRPC
1Qtr 2Qtr 3Qtr 4Qtr
U.S.A
Canada
Mexico
sum
15
Browsing a Data Cube
• Visualization
• OLAP capabilities
• Interactive manipulation
• Install http://www.olapcube.com/ to your virtual machine and play with the dimensions, create a cube and examine the result (dashboard)
• http://www.olapcube.com/help/writer/panorama/
• http://www.pentaho.com/testdrive
• http://www.databaseanswers.org/downloads/Data_Warehousing_by_Example.pdf
• Example (real) data: http://www.gapminder.org/data/
Exercises
© Jouni Huotari & Ari Hovi18.10.201516
• BI refers to skills, technologies, applications and practices used to help a business acquire a better understanding of its commercial context (Wikipedia)
• The goal is to gain insight into the business by bringing together data, formatting it in a way that enables better analysis, and then providing tools that give users power—not just to examine and explore the data, but to quickly understand it. (Business Intelligence with Microsoft Office
PerformancePoint Server)
What is Business Intelligence (BI)?
18.10.201517
• http://en.wikipedia.org/wiki/Data_warehouse
• http://en.wikipedia.org/wiki/Data_mart
• http://en.wikipedia.org/wiki/Star_schema
• http://en.wikipedia.org/wiki/Snowflake_schema
• http://en.wikipedia.org/wiki/OLAP
More information from wikipedia:
© Jouni Huotari & Ari Hovi18
18.10.2015