MIS 451 Building Business Intelligence Systems Logical Design (1)



[Figure: stages of building a BI system: Project Planning, Requirements Analysis, Logical Design, Physical Design, Data Staging, Data Analysis (OLAP)]


Introduction to Dimensional Modeling

Dimensional modeling is a data warehouse (DW) logical design technique that presents data in a standard framework that is intuitive to query and allows high-performance data access.

Intuitive: SQL is easy to write.

High performance: SQL runs fast.


ER Model:

Customer (1) Places (M) Order
Order (1) Contain (M) OrderLine
OrderLine (M) Order (1) Product
Product (M) Belong to (1) ProductCategory

Dimensional Model (Star Schema):

SALES: # TIME_KEY, # PRODUCT_KEY, # CUSTOMER_KEY, * PRICE, * QUANTITY, * SALES
CUSTOMER: # CUSTOMER_KEY, * CID, * CNAME, * STATE, * CITY
PRODUCT: # PRODUCT_KEY, * PID, * PNAME, * PCNAME
TIME: # TIME_KEY, * ORDERDATE, * DAY_OF_WEEK, * DAY_NUMBER_IN_MONTH, * DAY_NUMBER_IN_YEAR, * WEEK_NUMBER, * MONTH, * QUARTER, * HOLIDAY_FLAG, * FISCAL_YEAR, * FISCAL_QUARTER

SALES references CUSTOMER, PRODUCT, and TIME.

For detailed information, please refer to handout 1.
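The star schema can also be written down as table definitions. The following is a minimal sketch using Python's built-in sqlite3 module. The column names come from the slides; the column types and the use of SQLite are assumptions made for illustration only.

```python
import sqlite3

# Column names follow the star schema in the slides; the column types
# are assumptions, since the slides list names only.
DDL = """
CREATE TABLE customer (
    customer_key INTEGER PRIMARY KEY,
    cid INTEGER, cname TEXT, state TEXT, city TEXT
);
CREATE TABLE product (
    product_key INTEGER PRIMARY KEY,
    pid INTEGER, pname TEXT, pcname TEXT
);
CREATE TABLE time (
    time_key INTEGER PRIMARY KEY,
    orderdate TEXT, day_of_week INTEGER, day_number_in_month INTEGER,
    day_number_in_year INTEGER, week_number INTEGER, month TEXT,
    quarter INTEGER, holiday_flag TEXT, fiscal_year INTEGER,
    fiscal_quarter INTEGER
);
CREATE TABLE sales (
    time_key INTEGER REFERENCES time(time_key),
    product_key INTEGER REFERENCES product(product_key),
    customer_key INTEGER REFERENCES customer(customer_key),
    price REAL, quantity INTEGER, sales REAL,
    PRIMARY KEY (time_key, product_key, customer_key)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# List the tables that the SALES table references (column 2 of the
# PRAGMA result is the referenced table name).
referenced = sorted(row[2] for row in conn.execute("PRAGMA foreign_key_list(sales)"))
print(referenced)
```

SALES sits in the middle and points to each of the three surrounding tables, which is what gives the star schema its shape.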


Introduction to Dimensional Modeling

Analytical report: a two-dimensional January sales report by customer state and product category.

Query: list January sales by customer state and product category.


Introduction to Dimensional Modeling

Query based on ER Model:

-- ORDER is a reserved word in SQL, so a table with this name may need quoting.
SELECT State, PCName, SUM(Price * Quantity)
FROM OrderLine OL, Customer C, Product_Category PC, Product P, Order O
WHERE OL.OID = O.OID AND OL.PID = P.PID AND O.CID = C.CID
  AND P.PCID = PC.PCID AND TO_CHAR(O.OrderDate, 'MON') = 'JAN'
GROUP BY State, PCName

Join: 5 tables

Query based on Dimensional Model:

SELECT State, PCName, SUM(Sales)
FROM Sales S, Customer C, Product P, Time T
WHERE S.Time_Key = T.Time_Key AND S.Product_Key = P.Product_Key
  AND S.Customer_Key = C.Customer_Key AND T.Month = 'JAN'
GROUP BY State, PCName

Join: 4 tables
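The two queries can be compared on a small data set. The sketch below loads the same hypothetical sales into an ER-style database and a star-schema database and runs both queries. Assumptions: the sample rows are invented, the column lists are cut down to what the queries need, and SQLite's strftime stands in for Oracle's to_char, which SQLite does not have.

```python
import sqlite3

# ER (normalized) model. Only the columns the query needs are created.
er = sqlite3.connect(":memory:")
er.executescript("""
CREATE TABLE Customer (CID INTEGER PRIMARY KEY, State TEXT);
CREATE TABLE Product_Category (PCID INTEGER PRIMARY KEY, PCName TEXT);
CREATE TABLE Product (PID INTEGER PRIMARY KEY, PCID INTEGER);
CREATE TABLE "Order" (OID INTEGER PRIMARY KEY, CID INTEGER, OrderDate TEXT);
CREATE TABLE OrderLine (OID INTEGER, PID INTEGER, Price REAL, Quantity INTEGER);
-- Hypothetical sample data.
INSERT INTO Customer VALUES (101, 'Arizona'), (102, 'Texas');
INSERT INTO Product_Category VALUES (1, 'Books'), (2, 'Toys');
INSERT INTO Product VALUES (10, 1), (20, 2);
INSERT INTO "Order" VALUES (1, 101, '2024-01-15'), (2, 102, '2024-02-03');
INSERT INTO OrderLine VALUES (1, 10, 5.0, 2), (1, 20, 3.0, 1), (2, 10, 5.0, 4);
""")
er_rows = er.execute("""
SELECT State, PCName, SUM(Price * Quantity)
FROM OrderLine OL, Customer C, Product_Category PC, Product P, "Order" O
WHERE OL.OID = O.OID AND OL.PID = P.PID AND O.CID = C.CID
  AND P.PCID = PC.PCID
  AND strftime('%m', O.OrderDate) = '01'  -- stand-in for to_char(..., 'MON') = 'JAN'
GROUP BY State, PCName
ORDER BY State, PCName
""").fetchall()

# Dimensional model holding the same sales, already keyed and pre-multiplied.
dm = sqlite3.connect(":memory:")
dm.executescript("""
CREATE TABLE Customer (Customer_Key INTEGER PRIMARY KEY, State TEXT);
CREATE TABLE Product (Product_Key INTEGER PRIMARY KEY, PCName TEXT);
CREATE TABLE Time (Time_Key INTEGER PRIMARY KEY, Month TEXT);
CREATE TABLE Sales (Time_Key INTEGER, Product_Key INTEGER,
                    Customer_Key INTEGER, Sales REAL);
INSERT INTO Customer VALUES (1, 'Arizona'), (2, 'Texas');
INSERT INTO Product VALUES (1, 'Books'), (2, 'Toys');
INSERT INTO Time VALUES (1, 'JAN'), (2, 'FEB');
INSERT INTO Sales VALUES (1, 1, 1, 10.0), (1, 2, 1, 3.0), (2, 1, 2, 20.0);
""")
dm_rows = dm.execute("""
SELECT State, PCName, SUM(S.Sales)
FROM Sales S, Customer C, Product P, Time T
WHERE S.Time_Key = T.Time_Key AND S.Product_Key = P.Product_Key
  AND S.Customer_Key = C.Customer_Key AND T.Month = 'JAN'
GROUP BY State, PCName
ORDER BY State, PCName
""").fetchall()

print(er_rows)
print(dm_rows)
```

Both queries return the same report rows; the dimensional version gets there with one fewer table and without a date function in the WHERE clause.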

 


Fact and Dimension

[Figure: the star schema, with SALES labeled as the fact table and CUSTOMER, PRODUCT, and TIME labeled as dimension tables]


Fact and Dimension

There are two types of tables in dimensional modeling:

Fact table: attributes in fact tables are measurements for analysis, or the content of reports.

Dimension table: attributes in dimension tables are constraints on the measurements, or the headers of reports.


Facts and Dimensions

Criteria | Fact Attributes | Dimension Attributes
Purpose | Measurements for analysis | Constraints for the measurements
Reporting use | Report content | Row or column report headers
Data type | Most facts are numeric and additive; some are semi-additive or non-additive | Textual, descriptive
Size | Larger number of records | Smaller number of records


Facts and Dimensions

How to identify facts and dimensions?

Requirements Analysis:

Analytical requirements: marketing managers want to know the sales performance of different product categories in different states.

Information requirements: quantity of product sold, sales amount, product category, and customer state.

ER Model



F1: Calculation

F refers to special considerations for fact tables, or special types of fact table.


F1: Calculation

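Normalization in RDB: 1NF, 2NF, 3NF

Because data in a data warehouse is non-volatile, the DW design can forgo normalization to improve query performance. For example, the SALES fact stores the calculated value PRICE * QUANTITY, so queries can sum SALES directly instead of recomputing the product.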
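A stored calculation of this kind can be sketched as follows, assuming SALES is PRICE * QUANTITY as in the two queries shown earlier. The order lines are hypothetical and SQLite is used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (price REAL, quantity INTEGER, sales REAL)")

# Hypothetical order lines: (price, quantity).
lines = [(5.0, 2), (3.0, 1), (2.5, 4)]

# At load time, compute SALES = PRICE * QUANTITY once and store it.
conn.executemany(
    "INSERT INTO sales (price, quantity, sales) VALUES (?, ?, ?)",
    [(p, q, p * q) for p, q in lines],
)

# At query time, read the stored value instead of recomputing it per row.
stored = conn.execute("SELECT SUM(sales) FROM sales").fetchone()[0]
computed = conn.execute("SELECT SUM(price * quantity) FROM sales").fetchone()[0]
print(stored, computed)
```

The stored column is redundant in the normalization sense, but since warehouse rows are not updated after loading, it cannot drift out of step with PRICE and QUANTITY.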


D1: Slowly changing dimension

D refers to special considerations for dimension tables, or special types of dimension table.


D1: Slowly changing dimension

Values of attributes in dimension tables may change over time. For example, a customer may move from one city to another.

CID | CName | State | City
101 | Jon | Arizona | Tucson
102 | Tom | Arizona | Tucson
103 | Mark | Arizona | Phoenix

Tom moved from Tucson to Phoenix.


D1: Slowly changing dimension

There are three ways to handle a slowly changing dimension.

Method 1: Overwrite old values with new values.

Before:

CID | CName | State | City
101 | Jon | Arizona | Tucson
102 | Tom | Arizona | Tucson
103 | Mark | Arizona | Phoenix

After:

CID | CName | State | City
101 | Jon | Arizona | Tucson
102 | Tom | Arizona | Phoenix
103 | Mark | Arizona | Phoenix


D1: Slowly changing dimension

Drawbacks of method 1:

Historical information is totally lost.

We will never know that customer 102 lived in Tucson before.

Moreover, when listing sales by city, all the sales of customer 102 will be counted as part of Phoenix sales, although 102 was in Tucson before.
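The second drawback can be reproduced in a few lines. This is a minimal sketch, assuming a single hypothetical sale of 100.0 made while customer 102 still lived in Tucson; for brevity the sale is joined on CID rather than on a warehouse key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cid INTEGER PRIMARY KEY, cname TEXT, state TEXT, city TEXT);
CREATE TABLE sales (cid INTEGER, sales REAL);
INSERT INTO customer VALUES
    (101, 'Jon', 'Arizona', 'Tucson'),
    (102, 'Tom', 'Arizona', 'Tucson'),
    (103, 'Mark', 'Arizona', 'Phoenix');
-- Hypothetical sale made while Tom still lived in Tucson.
INSERT INTO sales VALUES (102, 100.0);
""")

def sales_by_city(conn):
    """Return total sales per customer city as a dict."""
    rows = conn.execute("""
        SELECT c.city, SUM(s.sales)
        FROM sales s JOIN customer c ON s.cid = c.cid
        GROUP BY c.city
    """)
    return dict(rows.fetchall())

before = sales_by_city(conn)

# Method 1: overwrite the old city with the new one.
conn.execute("UPDATE customer SET city = 'Phoenix' WHERE cid = 102")

after = sales_by_city(conn)
print(before, after)
```

After the overwrite, the Tucson sale is reported under Phoenix and nothing in the database records that it ever belonged to Tucson.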


D1: Slowly changing dimension

Method 2: Add a new attribute to record the current value of the changing attribute.

Before:

CID | CName | State | City
101 | Jon | Arizona | Tucson
102 | Tom | Arizona | Tucson
103 | Mark | Arizona | Phoenix

After:

CID | CName | State | Original City | Current City
101 | Jon | Arizona | Tucson | Tucson
102 | Tom | Arizona | Tucson | Phoenix
103 | Mark | Arizona | Phoenix | Phoenix


D1: Slowly changing dimension

Drawbacks of method 2:

Only partial historical information (original and current) is kept.

If customer 102 moved from Tucson to Flagstaff and then to Phoenix, the record for customer 102 would hold only Tucson and Phoenix; the stay in Flagstaff would be lost.
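The Flagstaff case can be sketched directly. The helper name move and the column names original_city and current_city are illustrative choices, not names from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        cid INTEGER PRIMARY KEY, cname TEXT, state TEXT,
        original_city TEXT, current_city TEXT
    )
""")
conn.execute("INSERT INTO customer VALUES (102, 'Tom', 'Arizona', 'Tucson', 'Tucson')")

def move(conn, cid, new_city):
    # Method 2: only the current value is updated; the original is kept.
    conn.execute("UPDATE customer SET current_city = ? WHERE cid = ?", (new_city, cid))

move(conn, 102, "Flagstaff")
move(conn, 102, "Phoenix")

row = conn.execute(
    "SELECT original_city, current_city FROM customer WHERE cid = 102"
).fetchone()
print(row)
```

The second move overwrites Flagstaff, so any intermediate value disappears as soon as the attribute changes again.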


D1: Slowly changing dimension

Method 3: Add a record whenever a dimension attribute changes.

CID | CName | State | City
101 | Jon | Arizona | Tucson
102 | Tom | Arizona | Tucson
103 | Mark | Arizona | Phoenix


D1: Slowly changing dimension

Method 3 keeps all the information. However, is there any problem?
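One problem can be seen by trying it, assuming CID is still the primary key of the customer table (the slides leave the question open at this point, so this is one plausible answer rather than the definitive one):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cid INTEGER PRIMARY KEY, cname TEXT, state TEXT, city TEXT)"
)
conn.execute("INSERT INTO customer VALUES (102, 'Tom', 'Arizona', 'Tucson')")

# Method 3: add a second record for the same customer after the move.
try:
    conn.execute("INSERT INTO customer VALUES (102, 'Tom', 'Arizona', 'Phoenix')")
    inserted = True
except sqlite3.IntegrityError:
    # CID 102 already exists, so the natural key cannot remain the primary key.
    inserted = False

print(inserted)
```

Two records for the same customer share one CID, so the natural key no longer identifies a single row.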


D1: Slowly changing dimension

Method 4: warehouse key + method 3

A warehouse key is a sequence of non-negative integers that serves as the primary key of a table in the data warehouse.

CID | CName | State | City
101 | Jon | Arizona | Tucson
102 | Tom | Arizona | Tucson
103 | Mark | Arizona | Phoenix

[Slide annotation: Warehouse key]
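Method 4 can be sketched as follows. The column name customer_key is borrowed from the star schema; letting SQLite assign the key values automatically is an assumption made for illustration, and the actual key values in a real warehouse would depend on how its keys are generated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_key INTEGER PRIMARY KEY,  -- warehouse key, assigned by SQLite
        cid INTEGER, cname TEXT, state TEXT, city TEXT
    )
""")
conn.executemany(
    "INSERT INTO customer (cid, cname, state, city) VALUES (?, ?, ?, ?)",
    [(101, "Jon", "Arizona", "Tucson"),
     (102, "Tom", "Arizona", "Tucson"),
     (103, "Mark", "Arizona", "Phoenix")],
)

# Tom moves: add a new record, which receives a new warehouse key.
conn.execute(
    "INSERT INTO customer (cid, cname, state, city) VALUES (?, ?, ?, ?)",
    (102, "Tom", "Arizona", "Phoenix"),
)

history = conn.execute(
    "SELECT customer_key, city FROM customer WHERE cid = 102 ORDER BY customer_key"
).fetchall()
print(history)
```

CID 102 now appears twice, but each record has its own warehouse key, so the primary key stays unique and the full address history is kept.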


D1: Slowly changing dimension

Why is a warehouse key needed in a data warehouse?

It solves the slowly changing dimension problem.

Compared with natural keys (i.e., primary keys of tables in the RDB, such as CID in the customer table), warehouse keys give better join performance.


D1: Slowly changing dimension

Warehouse key

Primary keys in dimension tables are warehouse keys.

The primary key of a fact table is composed of the warehouse keys of all or some of its associated dimensions.
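Because fact rows point at warehouse keys rather than at CID, each sale stays attached to the version of the customer that was current when it happened. A minimal sketch, with hypothetical key values and sales amounts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_key INTEGER PRIMARY KEY, cid INTEGER, cname TEXT, city TEXT
);
CREATE TABLE sales (
    time_key INTEGER, product_key INTEGER,
    customer_key INTEGER REFERENCES customer(customer_key),
    sales REAL,
    PRIMARY KEY (time_key, product_key, customer_key)
);
-- Two records for customer 102: before and after the move.
INSERT INTO customer VALUES (2, 102, 'Tom', 'Tucson'), (4, 102, 'Tom', 'Phoenix');
-- Hypothetical sales: one made in Tucson, one after the move to Phoenix.
INSERT INTO sales VALUES (1, 1, 2, 100.0), (2, 1, 4, 40.0);
""")

by_city = dict(conn.execute("""
    SELECT c.city, SUM(s.sales)
    FROM sales s JOIN customer c ON s.customer_key = c.customer_key
    GROUP BY c.city
""").fetchall())
print(by_city)
```

Sales by city now reflect where the customer lived at the time of each sale, which method 1 could not do.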


D1: Slowly changing dimension

Notation: # marks a primary key attribute.


D2: Time Dimension


D2: Time Dimension

A data warehouse needs an explicit time dimension table instead of just a time attribute (e.g., ORDERDATE).

Besides the time attribute, the time dimension table includes the following additional attributes:

Day_of_week (1-7)
Day_number_in_month (1-31)
Day_number_in_year (1-366, allowing for leap years)
Week_number (1-52, or 53 in some years)
Month (1-12)
Quarter (1-4)
Holiday_flag (y/n)
Fiscal_quarter, Fiscal_year


D2: Time Dimension

A time dimension can:

Save computation effort and improve query performance.

Hide complex calendar calculations from end users of the data warehouse.
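The calendar calculations are done once, when the time dimension is populated. The sketch below builds one row from a date. Assumptions: the function name time_dimension_row and the holidays parameter are hypothetical, day_of_week counts Monday as 1, week_number is the ISO week, and the fiscal attributes are left out because the slides do not define the fiscal calendar.

```python
from datetime import date

def time_dimension_row(time_key, d, holidays=frozenset()):
    """Build one time dimension row for calendar date d.

    `holidays` is a hypothetical set of dates treated as holidays.
    """
    return {
        "time_key": time_key,
        "orderdate": d.isoformat(),
        "day_of_week": d.isoweekday(),            # 1 (Monday) to 7 (Sunday)
        "day_number_in_month": d.day,
        "day_number_in_year": d.timetuple().tm_yday,
        "week_number": d.isocalendar()[1],        # ISO week number
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,
        "holiday_flag": "y" if d in holidays else "n",
    }

row = time_dimension_row(1, date(2024, 1, 15), holidays={date(2024, 1, 1)})
print(row)
```

Once rows like this are loaded, a question such as "sales on holidays in quarter 1" becomes a plain filter on two columns instead of a date calculation in every query.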


D3: Snowflake


D3: Snowflake

Snowflake structure:

PRODUCT_CATEGORY: # PRODUCT_CATEGORY_KEY, * PCID, * PCNAME
PRODUCT: # PRODUCT_KEY, * PID, * PNAME, * PRODUCT_CATEGORY_KEY

SALES references CUSTOMER, TIME, and PRODUCT; PRODUCT references PRODUCT_CATEGORY.


D3: Snowflake

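A snowflake structure should be avoided in data warehouse design.

Tradeoff of avoiding snowflake:

Advantage: improves query performance, because fewer joins are needed.

Disadvantage: requires more storage space, because values such as PCNAME are repeated in the dimension table.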
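The tradeoff can be sketched by answering the same question against both layouts. The table names product_star and product_snow and the sample rows are hypothetical; they exist only so both variants fit in one database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the category name is stored directly in the product dimension.
CREATE TABLE product_star (product_key INTEGER PRIMARY KEY, pname TEXT, pcname TEXT);
-- Snowflake: the category is moved to its own table.
CREATE TABLE product_category (product_category_key INTEGER PRIMARY KEY, pcname TEXT);
CREATE TABLE product_snow (
    product_key INTEGER PRIMARY KEY, pname TEXT,
    product_category_key INTEGER REFERENCES product_category(product_category_key)
);
CREATE TABLE sales (product_key INTEGER, sales REAL);

-- Hypothetical sample data.
INSERT INTO product_category VALUES (1, 'Books'), (2, 'Toys');
INSERT INTO product_star VALUES (1, 'Novel', 'Books'), (2, 'Atlas', 'Books'), (3, 'Kite', 'Toys');
INSERT INTO product_snow VALUES (1, 'Novel', 1), (2, 'Atlas', 1), (3, 'Kite', 2);
INSERT INTO sales VALUES (1, 10.0), (2, 5.0), (3, 7.0);
""")

# Star: one join from the fact table to the product dimension.
star = conn.execute("""
    SELECT p.pcname, SUM(s.sales)
    FROM sales s JOIN product_star p ON s.product_key = p.product_key
    GROUP BY p.pcname ORDER BY p.pcname
""").fetchall()

# Snowflake: a second join is needed to reach the category name.
snow = conn.execute("""
    SELECT pc.pcname, SUM(s.sales)
    FROM sales s
    JOIN product_snow p ON s.product_key = p.product_key
    JOIN product_category pc ON p.product_category_key = pc.product_category_key
    GROUP BY pc.pcname ORDER BY pc.pcname
""").fetchall()

print(star, snow)
```

Both queries give the same totals. The star version needs one join fewer, at the cost of storing 'Books' once per product instead of once overall.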