Data Warehouse Ali Kamandi Sharif University of Technology Spring 2007 [email protected]


Data Warehouse

Ali Kamandi

Sharif University of Technology

Spring 2007

[email protected]


Part 1:

Data Warehouse Concepts


Data Management

• A critical success factor: IT applications cannot be done without using data.
• The difficulties of managing data:
  • The amount of data increases exponentially with time
  • Data are scattered throughout the organization
  • An ever-increasing amount of external data needs to be considered
  • Data security, quality, and integrity are critical


Data Life Cycle


Data Sources

• Internal data sources: data about people, products, services, and processes.
• Personal data: IS users or other corporate employees may document their own expertise by creating personal data.
• External data sources: data ranging from commercial databases to sensors and satellites.


Data, Data everywhere yet ...

• I can’t find the data I need
  • data is scattered over the network
  • many versions, subtle differences
• I can’t get the data I need
  • need an expert to get the data
• I can’t understand the data I found
  • available data poorly documented
• I can’t use the data I found
  • results are unexpected
  • data needs to be transformed from one form to another


What is a Data Warehouse?

A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a way they can understand and use in a business context.

[Barry Devlin]


Why Data Warehousing?

• Which are our lowest/highest margin customers?
• Who are my customers and what products are they buying?
• Which customers are most likely to go to the competition?
• What impact will new products/services have on revenue and margins?
• What product promotions have the biggest impact on revenue?
• What is the most effective distribution channel?


Decision Support

• Used to manage and control the business
• Data is historical or point-in-time
• Optimized for inquiry rather than update
• Use of the system is loosely defined and can be ad hoc
• Used by managers and end users to understand the business and make judgements


Evolution of Decision Support

• 60’s: Batch reports
  • hard to find and analyze information
  • inflexible and expensive, reprogram every request
• 70’s: Terminal-based DSS (decision support systems) and EIS (executive information systems)
• 80’s: Desktop data access and analysis tools
  • query tools, spreadsheets, GUIs
  • easy to use, but access only operational db
• 90’s: Data warehousing with integrated OLAP engines and tools


Definition of data warehousing

According to W. H. Inmon:

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.


Characteristics of a Data Warehouse

• Subject-oriented. Data are organized by subject and contain only information relevant for decision support.
• Consistency. Data in different operational databases may be encoded differently. In the data warehouse they are coded in a consistent manner.
• Time-variant. The data are kept for many years so that they can be used for trends, forecasting, and comparisons over time.
• Non-volatile. Data are not updated once entered into the warehouse.
• Multidimensional. Typically the data warehouse uses a multidimensional structure.
• Web-based. Today’s data warehouses are designed to provide an efficient computing environment for web-based applications.
• Often a copy of operational data, with value-added data (e.g., summaries, history).


Why a Warehouse?

• Two approaches:
  • Query-driven (lazy)
  • Warehouse (eager)

[Figure: two sources and a "?" marking the open question of how to query them]


Query-Driven Approach

[Figure: clients send queries to a mediator, which reaches each source through its own wrapper]


Warehouse Architecture

[Figure: sources feed an integration component that loads the warehouse; clients run query and analysis against the warehouse, which also holds metadata]


Advantages of Warehousing

• High query performance
• Queries not visible outside warehouse
• Local processing at sources unaffected
• Can operate when sources unavailable
• Can query data not stored in a DBMS
• Extra information at warehouse
  • Modify, summarize (store aggregates)
  • Add historical information


Advantages of Query-Driven

• No need to copy data
  • less storage
  • no need to purchase data
• More up-to-date data
• Only query interface needed at sources


OLTP vs. OLAP

• OLTP: On-Line Transaction Processing
  • Describes processing at operational sites
• OLAP: On-Line Analytical Processing
  • Describes processing at the warehouse


OLTP vs. OLAP

OLTP:
• Mostly updates
• Many small transactions
• MB-TB of data
• Raw data
• Clerical users
• Up-to-date data
• Consistency, recoverability critical

OLAP:
• Mostly reads
• Queries long, complex
• GB-TB of data
• Summarized, consolidated data
• Decision-makers, analysts as users


OLTP vs. OLAP

                    OLTP                                  OLAP
users               clerk, IT professional                knowledge worker
function            day-to-day operations                 decision support
DB design           application-oriented                  subject-oriented
data                current, up-to-date, detailed,        historical, summarized, multidimensional,
                    flat relational, isolated             integrated, consolidated
usage               repetitive                            ad-hoc
access              read/write, index/hash on prim. key   lots of scans
unit of work        short, simple transaction             complex query
# records accessed  tens                                  millions
# users             thousands                             hundreds
DB size             100MB-GB                              100GB-TB
metric              transaction throughput                query throughput, response


Data Marts

Data mart: a small data warehouse designed for a strategic business unit (SBU) or a department.

The advantages of data marts include:
• low cost (prices under $100,000 versus $1 million or more for data warehouses)
• significantly shorter lead time for implementation (often less than 90 days)
• local rather than central control (conferring power on the using group)
• more rapid response
• easier to understand and navigate than an enterprise-wide data warehouse


Part 2:

Data Warehouse Design


Building a Data Warehouse


Relational and Multidimensional Databases

• Relational databases store data in two-dimensional tables. Multidimensional databases typically store data in arrays, which consist of at least three business dimensions.


Relational data model

• Based on a single structure of data values in a two-dimensional table

CUSTOMER
Cus_id  Cus_name  …
001     Robert    …
002     Lyn       …
…       …         …

ORDER
Ord_no  Cus_id  Ord_date   …
01      002     02 Dec 02  …
02      Lyn     03 Dec 02  …
…       …                  …


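A Sample Data Cube

[Figure: a 3-D data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, VCR, PC, sum), and Country (U.S.A., Canada, Mexico, sum). One highlighted cell holds the total annual sales of TV in the U.S.A.]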


Cuboids Corresponding to the Cube

0-D (apex) cuboid:  all
1-D cuboids:        product; date; country
2-D cuboids:        (product, date); (product, country); (date, country)
3-D (base) cuboid:  (product, date, country)

• Cuboids show the data at different degrees of summarization.
• Given a set of dimensions, we can construct a lattice of cuboids, each showing the data at a different level of summarization, or group by. The lattice of cuboids is then referred to as a data cube.
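The lattice can be enumerated mechanically: every subset of the dimension set is one cuboid, so n dimensions give 2^n cuboids. The following is a minimal Python sketch; the function name `cuboid_lattice` is an illustrative assumption, not something defined in the slides.

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Return the cuboids grouped by level: level k holds all k-D cuboids."""
    return {
        k: [tuple(c) for c in combinations(dimensions, k)]
        for k in range(len(dimensions) + 1)
    }

lattice = cuboid_lattice(["product", "date", "country"])
for level, cuboids in lattice.items():
    # Level 0 is the apex cuboid ("all"); the last level is the base cuboid.
    print(f"{level}-D:", cuboids)
```

Each tuple names the attributes that would appear in the corresponding GROUP BY; the empty tuple is the apex cuboid that aggregates over everything.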


Multidimensional Data Model

• Composed of one fact table and a set of dimension tables.
• Dimension tables: each dimension table has a simple (non-composite) primary key that corresponds exactly to one of the components of the composite key in the fact table.
• A multidimensional data model is typically organized around a central theme, such as sales.


Conceptual Modeling of Data Warehouses

Modeling data warehouses: dimensions & measures

• Star schema: a fact table in the middle connected to a set of dimension tables
• Snowflake schema: a refinement of the star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
• Fact constellations: multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation


Example of Star Schema

Sales Fact Table
  Keys:     time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales

Dimension tables
  time:     time_key, day, day_of_the_week, month, quarter, year
  item:     item_key, item_name, brand, type, supplier_type
  branch:   branch_key, branch_name, branch_type
  location: location_key, street, city, province_or_state, country
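To make the star layout concrete, here is a hedged sketch that builds part of this schema in an in-memory SQLite database and runs a typical star join. Only the time and item dimensions are included to keep it short. The table names `time_dim`, `item_dim`, and `sales_fact`, the column types, and all inserted rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER,
                       quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                       type TEXT);
-- Fact table: one foreign key per dimension, plus the numeric measures.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    units_sold INTEGER,
    dollars_sold REAL
);
INSERT INTO time_dim VALUES (1, 15, 1, 'Q1', 2007), (2, 20, 4, 'Q2', 2007);
INSERT INTO item_dim VALUES (10, 'TV-A', 'Acme', 'TV'), (11, 'PC-B', 'Bolt', 'PC');
INSERT INTO sales_fact VALUES (1, 10, 2, 800.0), (1, 11, 1, 1200.0), (2, 10, 3, 1200.0);
""")

# A typical star join: aggregate a measure, grouped by dimension attributes.
rows = conn.execute("""
    SELECT t.quarter, i.type, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON t.time_key = f.time_key
    JOIN item_dim AS i ON i.item_key = f.item_key
    GROUP BY t.quarter, i.type
    ORDER BY t.quarter, i.type
""").fetchall()
print(rows)
```

Every query follows the same pattern: join the fact table to the dimensions it needs, filter or group on dimension attributes, and aggregate the measures.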


Example of Snowflake Schema

Sales Fact Table
  Keys:     time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales

Dimension tables
  time:     time_key, day, day_of_the_week, month, quarter, year
  item:     item_key, item_name, brand, type, supplier_key
    supplier: supplier_key, supplier_type
  branch:   branch_key, branch_name, branch_type
  location: location_key, street, city_key
    city:     city_key, city, province_or_state, country


Example of Fact Constellation

Sales Fact Table
  Keys:     time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales

Shipping Fact Table
  Keys:     time_key, item_key, shipper_key, from_location, to_location
  Measures: dollars_cost, units_shipped

Dimension tables
  time:     time_key, day, day_of_the_week, month, quarter, year
  item:     item_key, item_name, brand, type, supplier_type
  branch:   branch_key, branch_name, branch_type
  location: location_key, street, city, province_or_state, country
  shipper:  shipper_key, shipper_name, location_key, shipper_type


Typical OLAP Operations

• Roll up (drill-up): summarize data
  • by climbing up a hierarchy or by dimension reduction
• Drill down (roll down): reverse of roll-up
  • from higher-level summary to lower-level summary or detailed data, or introducing new dimensions
• Slice and dice:
  • project and select
• Pivot (rotate):
  • reorient the cube, visualization, 3D to series of 2D planes
• Other operations


Operations

• Rollup: summarize data
  • e.g., given sales data, summarize sales for last year by product category and region
• Drill down: get more details
  • e.g., given summarized sales as above, find the breakup of sales by city within each region, or within the Andhra region


More Cube Operations

• Slice and dice: select and project
  • e.g., sales of soft drinks in Andhra over the last quarter
• Pivot: change the view of data
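The roll-up, drill-down, and slice-and-dice examples can be sketched over a small in-memory fact list. This is only an illustration: the rows, city names, and amounts are hypothetical, and the helper `aggregate` is an assumed name.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, city, category, quarter, amount).
sales = [
    ("Andhra", "Guntur", "soft-drinks", "Q4", 120),
    ("Andhra", "Vijayawada", "soft-drinks", "Q4", 80),
    ("Andhra", "Guntur", "snacks", "Q4", 50),
    ("Kerala", "Kochi", "soft-drinks", "Q4", 70),
    ("Andhra", "Guntur", "soft-drinks", "Q3", 90),
]

def aggregate(facts, key):
    """Sum the amount (last field) of each fact, grouped by key(fact)."""
    totals = defaultdict(int)
    for fact in facts:
        totals[key(fact)] += fact[-1]
    return dict(totals)

# Roll-up: summarize by (category, region), dropping city and quarter.
by_region = aggregate(sales, lambda f: (f[2], f[0]))
# Drill-down: add the city level back underneath each region.
by_city = aggregate(sales, lambda f: (f[2], f[0], f[1]))
# Slice and dice: select soft-drinks in Andhra for the last quarter (Q4).
diced = [f for f in sales
         if f[0] == "Andhra" and f[2] == "soft-drinks" and f[3] == "Q4"]
print(by_region, by_city, sum(f[-1] for f in diced), sep="\n")
```

Roll-up and drill-down differ only in how many levels of the region/city hierarchy appear in the grouping key, while slicing fixes values on some dimensions before aggregating.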


Cube

Fact table view:

sale   prodId  storeId  amt
       p1      c1       12
       p2      c1       11
       p1      c3       50
       p2      c2       8

Multi-dimensional cube (dimensions = 2):

       c1   c2   c3
p1     12        50
p2     11   8
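The two views hold the same information. As a minimal sketch, the fact table rows can be turned into the 2-D cube by keying each amount on its (prodId, storeId) pair; the variable names are illustrative assumptions.

```python
# Fact table rows from the slide: (prodId, storeId, amt).
sale = [("p1", "c1", 12), ("p2", "c1", 11), ("p1", "c3", 50), ("p2", "c2", 8)]

products = sorted({p for p, _, _ in sale})
stores = sorted({s for _, s, _ in sale})

# The cube maps each (prodId, storeId) cell to its measure; missing cells stay empty.
cube = {(p, s): amt for p, s, amt in sale}

print("    " + " ".join(f"{s:>4}" for s in stores))
for p in products:
    cells = (cube.get((p, s), "") for s in stores)
    print(f"{p:<4}" + " ".join(f"{c:>4}" for c in cells))
```

A sparse dictionary is used here because most real cubes have many empty cells; a dense array would be the alternative when nearly every cell is populated.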


3-D Cube

Fact table view:

sale   prodId  storeId  date  amt
       p1      c1       1     12
       p2      c1       1     11
       p1      c3       1     50
       p2      c2       1     8
       p1      c1       2     44
       p1      c2       2     4

Multi-dimensional cube (dimensions = 3):

day 1  c1   c2   c3
p1     12        50
p2     11   8

day 2  c1   c2   c3
p1     44   4
p2


Another Example

sale   prodId  storeId  date  amt
       p1      c1       1     12
       p2      c1       1     11
       p1      c3       1     50
       p2      c2       1     8
       p1      c1       2     44
       p1      c2       2     4

• Add up amounts by day, product
• In SQL: SELECT prodId, date, sum(amt) FROM SALE GROUP BY date, prodId

sale   prodId  date  amt
       p1      1     62
       p2      1     19
       p1      2     48

Rollup goes from the first table to the second; drill-down goes back from the second to the first.
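The rollup can be checked by running the query against the sample rows. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; `prodId` is included in the select list so that each output row is labeled, and the `ORDER BY` clause is an addition that only fixes the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
     ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)],
)

# Roll up over storeId: one row per (prodId, date) with the summed amount.
rollup = conn.execute(
    "SELECT prodId, date, SUM(amt) FROM sale "
    "GROUP BY date, prodId ORDER BY date, prodId"
).fetchall()
print(rollup)
```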


Pivoting

Fact table view:

sale   prodId  storeId  date  amt
       p1      c1       1     12
       p2      c1       1     11
       p1      c3       1     50
       p2      c2       1     8
       p1      c1       2     44
       p1      c2       2     4

Multi-dimensional cube:

day 1  c1   c2   c3
p1     12        50
p2     11   8

day 2  c1   c2   c3
p1     44   4
p2

Resulting 2-D view (amounts summed over date):

       c1   c2   c3
p1     56   4    50
p2     11   8
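The final 2-D table is what remains when the per-day planes of the cube are added together cell by cell. A minimal sketch of that step, with illustrative variable names:

```python
from collections import defaultdict

# Fact table rows from the slide: (prodId, storeId, date, amt).
sale = [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
        ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)]

# 3-D cube: one 2-D (prodId, storeId) plane per day.
planes = defaultdict(dict)
for prod, store, day, amt in sale:
    planes[day][(prod, store)] = amt

# Summing the planes over the date dimension gives the 2-D view.
totals = defaultdict(int)
for plane in planes.values():
    for cell, amt in plane.items():
        totals[cell] += amt
print(dict(totals))
```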


Design a Warehouse?

• Design data warehouse
• Design data marts
• Design representation (star schema, …)
• Gathering data
  • Which data is needed?
  • Where does it come from?
• Cleansing, integrating, ...
• Querying, reporting, analysis
• Data mining
• Monitoring, administering warehouse


Data Gathering

• Periodic snapshots
• Database triggers
• Log shipping
• Data shipping (replication service)
• …


Integration

• Data cleaning
• Data loading
• Derived data

[Figure: the warehouse architecture diagram again: sources, integration, warehouse with metadata, query and analysis, clients]


Data Cleaning

• Migration (e.g., yen → dollars)
• Fusion (e.g., mail list, customer merging)

[Figure: customer1(Joe) from the billing DB and customer2(Joe) from the service DB are fused into merged_customer(Joe)]
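Both cleaning steps can be sketched in a few lines. Everything here is an assumption made for illustration: the exchange rate, the record fields, and the naive rule that two records are the same customer when their normalized names match. Real fusion usually needs stronger matching (addresses, IDs, fuzzy comparison).

```python
YEN_PER_DOLLAR = 120.0  # Hypothetical fixed rate, for illustration only.

def migrate_to_dollars(record):
    """Migration: convert a yen amount to dollars so all sources share one unit."""
    converted = dict(record)
    if converted.get("currency") == "JPY":
        converted["amount"] = round(converted["amount"] / YEN_PER_DOLLAR, 2)
        converted["currency"] = "USD"
    return converted

def fuse_customers(*sources):
    """Fusion: merge records that share a normalized name into one customer."""
    merged = {}
    for source in sources:
        for record in source:
            key = record["name"].strip().lower()
            target = merged.setdefault(key, {})
            for field, value in record.items():
                # Keep the first value seen for a field; add fields that are new.
                target.setdefault(field, value)
    return list(merged.values())

billing_db = [{"name": "Joe", "balance": 40.0}]
service_db = [{"name": "joe ", "open_tickets": 2}]
print(migrate_to_dollars({"amount": 6000, "currency": "JPY"}))
print(fuse_customers(billing_db, service_db))
```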


Loading Data

• Incremental vs. refresh
• Off-line vs. on-line
• Frequency of loading
  • At night, 1x a week/month, continuously
• Parallel/partitioned load


Derived Data

• Derived warehouse data
• When to update derived data?


Part 3:

Data Mining


Data Mining Concepts

Data mining: the process of searching for valuable business information in a large database, data warehouse, or data mart.


Data Mining Application

Retailing and sales

Banking

Manufacturing and production

Insurance

Police work

Health care

Marketing


Text Mining

The application of data mining to non-structured or less-structured text files.


Web Mining

The application of data mining techniques to discover actionable and meaningful patterns from web resources.

Web mining is used in the following areas: information filtering, surveillance, and mining of web-access logs for analyzing usage.


Some basic operations

• Predictive:
  • Regression
  • Classification
• Descriptive:
  • Clustering / similarity matching
  • Association rules and variants


Classification

• Given old data about customers and payments, predict a new applicant’s loan eligibility.

[Figure: records of previous customers (age, salary, profession, location, customer type) train a classifier, which produces decision rules such as "Salary > 5 L" and "Prof. = Exec"; the rules label a new applicant’s data as good or bad]


Classification methods

Goal: predict class Ci = f(x1, x2, ..., xn)

• Regression (linear or any other polynomial)
  • a*x1 + b*x2 + c = Ci
• Nearest neighbor
• Decision tree classifier
• Neural networks
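Of the methods listed, nearest neighbor is the simplest to sketch: a new point receives the class of the closest stored example. The training rows, the two features (age, salary in thousands), and the function name below are all hypothetical, and the features are left unscaled for brevity, which a real application would want to correct.

```python
import math

# Hypothetical training data: (age, salary in thousands) -> credit label.
training = [((25, 30), "bad"), ((45, 90), "good"), ((35, 70), "good"), ((22, 20), "bad")]

def nearest_neighbor(x, examples):
    """Predict the class of x as the label of the closest training example."""
    _, label = min(examples, key=lambda ex: math.dist(x, ex[0]))
    return label

print(nearest_neighbor((42, 85), training))
print(nearest_neighbor((23, 22), training))
```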

Decision trees

• Tree where internal nodes are simple decision rules on one or more attributes and leaf nodes are predicted class labels.

[Figure: a decision tree with internal tests "Salary < 1 M", "Prof = teacher", and "Age < 30", and leaves labeled Good or Bad]


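Neural network

• Set of nodes connected by directed weighted edges

[Figure: a basic NN unit with inputs x1, x2, x3, weights w1, w2, w3, and output o; and a more typical NN with inputs x1, x2, x3, hidden nodes, and output nodes]

Basic NN unit:

$$o = \sigma\left(\sum_{i=1}^{n} w_i x_i\right), \qquad \sigma(y) = \frac{1}{1 + e^{-y}}$$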
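A minimal sketch of the basic unit, assuming it computes the logistic function of the weighted sum of its inputs. The weights and inputs are hypothetical values chosen so the result is easy to verify by hand.

```python
import math

def sigmoid(y):
    """Logistic function: sigma(y) = 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

def unit_output(weights, inputs):
    """Basic NN unit: o = sigma(sum_i w_i * x_i)."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)))

# Hypothetical weights and inputs, chosen only for illustration.
print(unit_output([0.5, -0.25, 0.1], [1.0, 2.0, 0.0]))
```

With these values the weighted sum is 0.5 - 0.5 + 0 = 0, and the logistic function maps 0 to 0.5. A more typical network chains such units, feeding the outputs of the hidden nodes into the output nodes.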


Bayesian learning

• Assume a probability model on generation of data.
• Apply Bayes’ theorem to find the most likely class:

$$\text{predicted class: } c = \arg\max_{c_j} p(c_j \mid d) = \arg\max_{c_j} \frac{p(d \mid c_j)\, p(c_j)}{p(d)}$$
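As a worked instance of the rule, the sketch below picks the class with the largest posterior for a single observation d. The two classes, the priors, and the likelihoods are hypothetical numbers, not values from the slides.

```python
# Hypothetical model: priors p(c) and likelihoods p(d | c) for one observation d.
priors = {"good": 0.7, "bad": 0.3}
likelihoods = {"good": 0.2, "bad": 0.6}

def predict(priors, likelihoods):
    """Return the class maximizing p(d | c) * p(c), plus the posteriors p(c | d)."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())  # p(d), the same for every class
    posteriors = {c: joint[c] / evidence for c in joint}
    return max(posteriors, key=posteriors.get), posteriors

label, posteriors = predict(priors, likelihoods)
print(label, posteriors)
```

Because p(d) is the same for every class, dividing by it does not change which class wins; it is computed here only so that the posteriors sum to one.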


Clustering

• Unsupervised learning, used when old data with class labels are not available, e.g. when introducing a new product.
• Group/cluster existing customers based on the time series of their payment history so that similar customers fall in the same cluster.


What is association rule mining?

[Figure: a market analyst looks at the shopping baskets of Customer 1, Customer 2, Customer 3, ..., Customer n and wonders: "Hmmm, which items are frequently purchased together by my customers?"]

Shopping baskets:

           milk  bread  sugar  butter  cereal  eggs
Basket 1   1     1      0      0       1       0
Basket 2   1     1      1      0       0       1
Basket 3   1     1      0      1       0       0
Basket 4   0     0      1      0       0       1


What is association rule mining? (cont.)

           milk  bread  sugar  butter  cereal  eggs
Basket 1   1     1      0      0       1       0
Basket 2   1     1      1      0       0       1
Basket 3   1     1      0      1       0       0
Basket 4   0     0      1      0       0       1
count      3     3      2      1       1       2

Support (milk) = 3
Support (bread) = 3
Support (sugar) = 2
……
Support (milk U bread) = 3
Support (milk U sugar) = 1
……
Support (milk U bread U sugar) = 1
……
Support (milk U bread U sugar U butter U cereal U eggs) = 0

Confidence (A → B) = Support (A U B) / Support (A)

If Confidence (A → B) >= min_conf, then A → B.

As Confidence (milk → bread) = Support (milk U bread) / Support (milk) = 3/3 = 100%, then milk → bread.
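The support and confidence figures can be reproduced directly from the four baskets. In this sketch the basket contents are read off the basket table, while the function names and the `MIN_CONF` threshold are assumptions, since the slides leave min_conf unspecified.

```python
baskets = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]

def support(items):
    """Number of baskets that contain every item in `items`."""
    return sum(1 for basket in baskets if set(items) <= basket)

def confidence(antecedent, consequent):
    """Confidence(A -> B) = Support(A U B) / Support(A)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

MIN_CONF = 0.8  # Hypothetical threshold; the slides leave min_conf unspecified.
print(support({"milk", "bread"}), confidence({"milk"}, {"bread"}))
print(confidence({"milk"}, {"bread"}) >= MIN_CONF)
```

Note that "Support (A U B)" counts baskets containing all items of both A and B, which is why the code tests the union of the two item sets against each basket.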


How can DM improve your business?

Strategy 1: Placing milk and bread within close proximity may further encourage the sale of these items together within single visits to the store.


How can DM improve your business?

Strategy 2: Placing milk and bread at opposite ends of the store may entice customers who purchase such items to pick up other items along the way.


How can DM improve your business?

Strategy 3: Put these two items into a package at a reduced price.


Stages in the evolution of knowledge discovery

Data collection (1960s)
  Business question:     What was my total revenue in the last 5 years?
  Enabling technologies: Computers, tapes, disks
  Characteristics:       Retrospective, static data delivery

Data access (1980s)
  Business question:     What were unit sales in New England last March?
  Enabling technologies: Relational databases (RDBMS), structured query language (SQL)
  Characteristics:       Retrospective, dynamic data delivery at record level

Data warehousing and decision support (early 1990s)
  Business question:     What were the sales in region A by product, by salesperson?
  Enabling technologies: OLAP, multidimensional databases, data warehouses
  Characteristics:       Retrospective, proactive data delivery at multiple levels

Intelligent data mining (late 1990s)
  Business question:     What’s likely to happen to the Boston unit’s sales next month? Why?
  Enabling technologies: Advanced algorithms, multiprocessor computers, massive databases
  Characteristics:       Prospective, proactive information delivery

Advanced intelligent systems; complete integration (2000-2004)
  Business question:     What is the best plan to follow? How did we perform compared to metrics?
  Enabling technologies: Neural computing, advanced AI models, complex optimization, web services
  Characteristics:       Proactive, integrative; multiple business partners