Data Warehousing - Chetan Gadodia

Introduction to Data Warehousing


What’s Warehousing?

• Large volume of data (GB, TB)
• Non-volatile
• Historical
• Time attributes are important
• Updates infrequent
• May be append-only

What’s Data Warehousing?

• Process of extracting.
• Integrating.
• Filtering.
• Standardizing.
• Transforming.
• Cleaning & quality checking.
• Storing it in a consolidated database.

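The steps listed for data warehousing can be sketched as a small pipeline. This is only an illustrative sketch: the source rows, the `SaleRecord` type, and every function name are assumptions made for the example, and an in-memory list stands in for the consolidated database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaleRecord:
    """Hypothetical typed record stored in the consolidated target."""
    store: str
    product: str
    units: int


def extract():
    # Extract: hypothetical raw rows from operational sources with inconsistent formats.
    return [
        {"store": " pune ", "product": "Pen", "units": "3"},
        {"store": "MUMBAI", "product": "pen", "units": "5"},
        {"store": "Pune", "product": "Ink", "units": "n/a"},  # fails the quality check
        {"store": "", "product": "Pen", "units": "2"},         # filtered out: no store
    ]


def standardize(row):
    # Standardize: trim whitespace and normalize capitalization.
    return {
        "store": row["store"].strip().title(),
        "product": row["product"].strip().title(),
        "units": row["units"].strip(),
    }


def is_clean(row):
    # Filter / quality check: require a store name and a numeric unit count.
    return bool(row["store"]) and row["units"].isdigit()


def transform(row):
    # Transform: convert text fields into a typed record.
    return SaleRecord(row["store"], row["product"], int(row["units"]))


def run_etl(warehouse):
    rows = [standardize(r) for r in extract()]
    clean = [transform(r) for r in rows if is_clean(r)]
    warehouse.extend(clean)  # Store: append to the consolidated target.
    return warehouse


warehouse = run_etl([])
for record in warehouse:
    print(record)
```

Only the two rows that pass the quality check reach the target; the rejected rows would normally be logged or routed to an error table rather than silently dropped.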

Need

• Huge amount of operational data
• Knowledge workers want to turn this data into useful information.
• Support strategic decision making.
• From a business perspective
– Marketing weapon
– Valuable tool in today’s world
– Learning more about customer needs

Benefits

• Potentially high returns on investment.

• Substantial competitive advantage.

• Increased productivity of corporate decision-makers.

Definition

Subject Oriented
• Finance
• Marketing
• Inventory

Integrated
• SAP
• Weblog
• Legacy

Time Variant
• Daily
• Monthly
• Quarterly

Non-Volatile
• Same data kept for different periods

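One way to picture the time-variant and non-volatile properties together is an append-only history of snapshots. The sketch below is an assumption-laden illustration, not a warehouse implementation: the class `AppendOnlyHistory`, its methods, and the sample values are all hypothetical.

```python
from datetime import date


class AppendOnlyHistory:
    """Hypothetical in-memory stand-in for a time-variant, non-volatile table."""

    def __init__(self):
        self._rows = []  # each row: (as_of_date, key, value)

    def load(self, as_of, key, value):
        # Non-volatile: new snapshots are appended; existing rows are never changed.
        self._rows.append((as_of, key, value))

    def value_as_of(self, key, as_of):
        # Time-variant: answer with the latest snapshot on or before the given date.
        matches = [(d, v) for d, k, v in self._rows if k == key and d <= as_of]
        if not matches:
            return None
        return max(matches, key=lambda m: m[0])[1]

    def __len__(self):
        return len(self._rows)


history = AppendOnlyHistory()
history.load(date(2024, 1, 31), "pen_price", 10.0)  # monthly snapshot (made-up value)
history.load(date(2024, 2, 29), "pen_price", 12.0)  # later snapshot, earlier row kept

print(history.value_as_of("pen_price", date(2024, 2, 10)))
print(history.value_as_of("pen_price", date(2024, 3, 1)))
```

Because earlier snapshots are retained, the same question can be answered for any past period, which is what makes the time attribute so important.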

Basic Architecture

Architecture with Staging Area

Operational Database (OLTP) vs Data Warehouse (OLAP)

Operational Database (OLTP)
• Perform on-line transaction & query processing.
• Day-to-day operations of an organization

Data Warehouse (OLAP)
• Data analysis & decision making.
• Systems can organize & present data in various formats

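The contrast between the two workloads can be illustrated with Python's built-in `sqlite3` module. This is a toy sketch: the `orders` table, its columns, and the sample rows are assumptions, and a real deployment would keep the two workloads in separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (region, amount, status) VALUES (?, ?, ?)",
    [("North", 100.0, "open"), ("North", 50.0, "open"), ("South", 70.0, "open")],
)

# OLTP-style work: a short transaction that touches a single row by key.
with conn:
    conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = ?", (1,))

# OLAP-style work: a read-only aggregate that scans many rows for analysis.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

status = conn.execute("SELECT status FROM orders WHERE order_id = 1").fetchone()[0]
print(status, totals)
```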

Data Marts: Overview

• Data Mart is a decentralized subset of data

• Data Marts have specific business-related purposes

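As a rough illustration of a data mart as a purpose-specific subset, the sketch below copies only one department's rows out of a larger table using `sqlite3`. The table names `warehouse_sales` and `marketing_mart`, the columns, and the data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse_sales (department TEXT, region TEXT, amount REAL);
INSERT INTO warehouse_sales VALUES
    ('Marketing', 'West', 120.0),
    ('Marketing', 'South', 80.0),
    ('Finance', 'West', 300.0),
    ('Inventory', 'South', 45.0);
-- The mart materializes only the rows and columns one department needs.
CREATE TABLE marketing_mart AS
    SELECT region, amount FROM warehouse_sales WHERE department = 'Marketing';
""")

mart_rows = conn.execute(
    "SELECT region, amount FROM marketing_mart ORDER BY region"
).fetchall()
warehouse_count = conn.execute("SELECT COUNT(*) FROM warehouse_sales").fetchone()[0]
print(mart_rows, warehouse_count)
```

The mart holds fewer rows and fewer columns than the source table, so the department's queries have less data to scan and a simpler structure to navigate.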

Data Marts: Needs

• Queries against a data mart perform much better than queries against the full data warehouse

• Data marts are much easier to navigate

Data Marts: Features

• Low cost
• Controlled locally rather than centrally, conferring power on the user group
• Contain less information
• Rapid response
• More easily understood and navigated than an enterprise Data Warehouse
• Within the range of divisional or departmental budgets

Dimensional Data Modeling

E-R model
• Symmetric
• Divides data into many entities
• Describes entities and relationships
• Seeks to eliminate data redundancy
• Good for high transaction performance

Dimensional model
• Asymmetric
• Divides data into dimensions and facts
• Describes dimensions and measures
• Encourages data redundancy
• Good for high query performance

What is Dimension?

• Single join to the fact table (single primary key)

• Stores business attributes

• Attributes are textual in nature

• Organized into hierarchies

• More or less constant data

• E.g. Time, Product, Customer, Store, etc.


What is Fact?

• Central, dominant table

• Multi-part primary key

• Links directly to dimensions

• Stores business measures

• Constantly varying data


Star Schema

• A single, large and central fact table and one table for each dimension.

• For example, a fact table surrounded by 4-15 dimension tables

• Dimensions are de-normalized

Star Schema Example

Fact Table
• Store Key
• Product Key
• Period Key
• Units
• Price

Store Dimension
• Store Key
• Store Name
• City
• State
• Region

Time Dimension
• Period Key
• Year
• Quarter
• Month

Product Dimension
• Product Key
• Product Desc

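The star schema example can be tried out with Python's built-in `sqlite3` module. The column names follow the slide; the table names (`sales_fact`, `store_dim`, `time_dim`, `product_dim`), the column types, and all sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim (
    store_key INTEGER PRIMARY KEY, store_name TEXT, city TEXT, state TEXT, region TEXT);
CREATE TABLE time_dim (
    period_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER, month INTEGER);
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY, product_desc TEXT);
-- Fact table: multi-part primary key made of the dimension keys, plus measures.
CREATE TABLE sales_fact (
    store_key INTEGER REFERENCES store_dim(store_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    period_key INTEGER REFERENCES time_dim(period_key),
    units INTEGER,
    price REAL,
    PRIMARY KEY (store_key, product_key, period_key));

-- Made-up sample data.
INSERT INTO store_dim VALUES
    (1, 'Store A', 'Pune', 'Maharashtra', 'West'),
    (2, 'Store B', 'Chennai', 'Tamil Nadu', 'South');
INSERT INTO time_dim VALUES (1, 2024, 1, 1), (2, 2024, 1, 2);
INSERT INTO product_dim VALUES (1, 'Pen'), (2, 'Notebook');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 10, 5.0),
    (1, 2, 1, 4, 20.0),
    (2, 1, 2, 6, 5.0),
    (1, 1, 2, 3, 5.0);
""")

# Each dimension is reached with a single join from the fact table.
revenue = conn.execute("""
    SELECT s.region, t.quarter, SUM(f.units * f.price)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
    JOIN time_dim AS t ON t.period_key = f.period_key
    GROUP BY s.region, t.quarter
    ORDER BY s.region
""").fetchall()
print(revenue)
```

Because the store dimension is de-normalized, region is available after one join, at the cost of repeating city, state, and region on every store row.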

Snowflake Schema

• Variant of star schema model.

• A single, large and central fact table and one or more tables for each dimension.

• Dimension tables are normalized, i.e. dimension table data is split into additional tables

Snowflake Schema Example

Fact Table
• Store Key
• Product Key
• Period Key
• Units
• Price

Store Dimension
• Store Key
• Store Name
• City Key

City (normalized out of the Store Dimension)
• City Key
• City
• State
• Region

Time Dimension
• Period Key
• Year
• Quarter
• Month

Product Dimension
• Product Key
• Product Desc

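A minimal `sqlite3` sketch of the snowflaked store dimension shows the extra join it introduces. Only the fact table and the store branch are modeled here; the table names (`sales_fact`, `store_dim`, `city_dim`), column types, and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- City attributes are split out of the store dimension into their own table.
CREATE TABLE city_dim (
    city_key INTEGER PRIMARY KEY, city TEXT, state TEXT, region TEXT);
CREATE TABLE store_dim (
    store_key INTEGER PRIMARY KEY, store_name TEXT,
    city_key INTEGER REFERENCES city_dim(city_key));
CREATE TABLE sales_fact (
    store_key INTEGER REFERENCES store_dim(store_key),
    product_key INTEGER,
    period_key INTEGER,
    units INTEGER,
    price REAL,
    PRIMARY KEY (store_key, product_key, period_key));

-- Made-up sample data: two stores share one city row instead of repeating it.
INSERT INTO city_dim VALUES
    (1, 'Pune', 'Maharashtra', 'West'),
    (2, 'Chennai', 'Tamil Nadu', 'South');
INSERT INTO store_dim VALUES (1, 'Store A', 1), (2, 'Store B', 1), (3, 'Store C', 2);
INSERT INTO sales_fact VALUES
    (1, 1, 1, 10, 5.0),
    (2, 1, 1, 2, 5.0),
    (3, 1, 1, 6, 5.0);
""")

# Reaching region now takes two joins: fact -> store -> city.
units_by_region = conn.execute("""
    SELECT c.region, SUM(f.units)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
    JOIN city_dim AS c ON c.city_key = s.city_key
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
print(units_by_region)
```

The city row for Pune is stored once rather than once per store, which saves a little space, but every query that needs city, state, or region pays for the additional join.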

Avoid Snowflakes

• Avoid the natural desire to normalize the model:
– Complicates end-user query construction
– Adds an additional level of “JOIN” complexity
– Database optimizers do not handle it very well
– Saves some space at the cost of longer queries

So,
• Don’t snowflake to save space
• Snowflake if secondary dimensions have many attributes

Star vs Snowflake

Widely used ETL Tools

• IBM Information Server (DataStage)
• Informatica PowerCenter
• Ab Initio
• SAS Data Integration Studio
• Oracle Warehouse Builder (OWB)
• SQL Server Integration Services (SSIS)

END