Morgan-Kaufmann-Jiawei-Han-Micheline-Kamber-DataMining

Figure (page 4): timeline of database technology.

Data collection and database creation (1960’s and earlier)

- primitive file processing

Database management systems (1970’s)

- network and relational database systems
- data modeling tools
- indexing and data organization techniques
- query languages and query processing
- user interfaces
- optimization methods
- on-line transactional processing (OLTP)

Advanced database systems (mid-1980’s - present)

- advanced data models: extended-relational, object-oriented, object-relational
- application-oriented: spatial, temporal, multimedia, active, scientific, knowledge-bases, World Wide Web

Data warehousing and data mining (late-1980’s - present)

- data warehouse and OLAP technology
- data mining and knowledge discovery

New generation of information systems (2000 - ...)
Figure (page 5): cartoon with the caption text "How can I analyze this data?"

Figure (page 6): drawing of mining for knowledge. Elements: Knowledge [gold nuggets], [a mountain of data], [a shovel], [a pick], [beads of sweat].

Figure (page 7): the knowledge discovery process. Labels in flow order:

data bases, flat files -> Cleaning & Integration -> data warehouse -> Selection & Transformation -> Data Mining -> patterns -> Evaluation & Presentation -> knowledge

Figure (page 8): architecture of a data mining system. Components, top layer to bottom layer:

- Graphic User Interface
- Pattern Evaluation
- Data Mining Engine
- Database or Data Warehouse Server
- Data cleaning, data integration, filtering
- Data Base, Data Warehouse

A Knowledge Base sits alongside these layers.

Figure (page 10): data warehouse framework. Labels in flow order:

data source in Chicago, data source in New York, data source in Vancouver, ... -> clean, transform, integrate, load -> data warehouse -> query and analysis tools -> client, client

Figure (page 11): multidimensional data cube with dimensions address (cities: Chicago, New York, Montreal, Vancouver), time (quarters: Q1, Q2, Q3, Q4), and item (types: home entertainment, computer, phone, security).

a) The summarized cube. One marked cell is <Vancouver,Q1,security>. Visible cell values: 605K, 825K, 14K, 400K.

b) Two derived cubes:

- drill-down on time data for Q1: time (months: Jan, Feb, March); visible cell values 150K, 100K, 150K
- roll-up on address: address (regions: North, South, East, West)

Figure (page 18): disciplines contributing to data mining: Database Systems, Statistics, Machine Learning, Information Science, Visualization, Other disciplines.

Figure (page 34): data cube representations of sales data.

3-D cube: dimensions location (cities: Chicago, New York, Montreal, Vancouver), time (quarters: Q1, Q2, Q3, Q4), and item (types: home entertainment, computer, phone, security).

Cell values as extracted (their positions in the cube are not recoverable): 870, 925, 789, 698, 984, 1002, 682, 784, 728, 623, 872, 591, 89, 38, 43, 882, 968, 746, 854, 1087, 818; 580, 38, 1038, 927; 501, 30, 1023, 812; 512, 31, 952, 680.

4-D cube: the same location, time, and item dimensions drawn as three 3-D cubes, one each for supplier = "SUP1", supplier = "SUP2", and supplier = "SUP3". Visible cell values: 605K, 825K, 14K, 400K.

Figure (page 35): lattice of cuboids for the dimensions time, item, location, and supplier.

- 0-D (apex) cuboid: all
- 1-D cuboids: time; item; location; supplier
- 2-D cuboids: time, item; time, location; time, supplier; item, location; item, supplier; location, supplier
- 3-D cuboids: time, item, location; time, item, supplier; time, location, supplier; item, location, supplier
- 4-D (base) cuboid: time, item, location, supplier
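The lattice above has one cuboid for every subset of the four dimensions, so it can be enumerated mechanically. The following is a minimal sketch of that enumeration; the names `DIMENSIONS` and `cuboid_lattice` are illustrative assumptions and do not come from the source.

```python
from itertools import combinations

# The four dimensions named in the lattice figure.
DIMENSIONS = ("time", "item", "location", "supplier")


def cuboid_lattice(dimensions):
    """Group every subset of the dimensions by size.

    Returns a dict mapping k to the list of k-dimensional cuboids,
    each cuboid represented as a tuple of dimension names.
    """
    return {
        k: list(combinations(dimensions, k))
        for k in range(len(dimensions) + 1)
    }


lattice = cuboid_lattice(DIMENSIONS)

for k, cuboids in lattice.items():
    # The empty tuple is the apex cuboid, labeled "all" in the figure.
    names = [", ".join(c) if c else "all" for c in cuboids]
    print(f"{k}-D cuboids: {names}")

# With n dimensions (and no concept hierarchies) there are 2**n cuboids.
total = sum(len(cuboids) for cuboids in lattice.values())
print("total cuboids:", total)
```

For four dimensions this yields 1 + 4 + 6 + 4 + 1 = 16 cuboids, matching the groups listed in the figure.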

Figure (page 35): star schema for sales.

- Sales Fact: time_key, item_key, branch_key, location_key, dollars_sold, units_sold
- Time Dimension: time_key, day, day_of_week, month, quarter, year
- Item Dimension: item_key, item_name, brand, type, supplier_type
- Branch Dimension: branch_key, branch_name, branch_type
- Location Dimension: location_key, street, city, province_or_state, country

Figure (page 36): snowflake schema for sales.

- Sales Fact: time_key, item_key, branch_key, location_key, dollars_sold, units_sold
- Time Dimension: time_key, day, day_of_week, month, quarter, year
- Item Dimension: item_key, item_name, brand, type, supplier_key
- Supplier Dimension: supplier_key, supplier_type
- Branch Dimension: branch_key, branch_name, branch_type
- Location Dimension: location_key, street, city_key
- City Dimension: city_key, city, province_or_state, country

Figure (page 37): fact constellation schema for sales and shipping.

- Sales Fact: time_key, item_key, branch_key, location_key, dollars_sold, units_sold
- Shipping Fact: item_key, time_key, shipper_key, from_location, to_location, dollars_cost, units_shipped
- Time Dimension: time_key, day, day_of_week, month, quarter, year
- Item Dimension: item_key, item_name, brand, type
- Branch Dimension: branch_key, branch_name, branch_type
- Location Dimension: location_key, street, city, province_or_state, country
- Shipper Dimension: shipper_key, shipper_name, location_key, shipper_type

Figure (page 41): concept hierarchy for location, with levels all > country > province_or_state > city. Ellipses mark values the figure omits.

all
  Canada
    British Columbia: Vancouver, Victoria, ...
    Ontario: Toronto, ...
    Quebec: Montreal, ...
    ...
  USA
    New York: New York, ...
    California: Los Angeles, San Francisco, ...
    Illinois: Chicago, ...
    ...
  ...

Figure (page 41): orderings of attributes within a dimension.

- location: street < city < province_or_state < country
- time: day < month < quarter < year, and day < week < year

Figure (page 42): concept hierarchy over dollar ranges.

($0 - $1,000]
  ($0 - $200]: ($0 - $100], ($100 - $200]
  ($200 - $400]: ($200 - $300], ($300 - $400]
  ($400 - $600]: ($400 - $500], ($500 - $600]
  ($600 - $800]: ($600 - $700], ($700 - $800]
  ($800 - $1,000]: ($800 - $900], ($900 - $1,000]
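Each range in this hierarchy is half-open: it excludes its lower bound and includes its upper bound, so $200 falls in ($100 - $200] and not in ($200 - $300]. The sketch below maps a value to its range at each level. The function names `dollar_range` and `generalize` are hypothetical, and the widths of 100 and 200 are read off the ranges in the figure.

```python
import math


def dollar_range(value, width):
    """Return the half-open range (low, high] of the given width containing value."""
    if not 0 < value <= 1000:
        raise ValueError("value must lie in ($0 - $1,000]")
    # Rounding up to a multiple of width gives the inclusive upper bound.
    high = math.ceil(value / width) * width
    return (high - width, high)


def generalize(value):
    """Climb the hierarchy: $100-wide range, $200-wide range, then the root."""
    return [dollar_range(value, 100), dollar_range(value, 200), (0, 1000)]


print(generalize(250))
print(generalize(200))
```

Using `math.ceil` rather than integer floor division is what places a boundary value such as 200 in the lower range, as the `(low, high]` notation requires.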

Figure (page 43): OLAP operations on a central data cube with dimensions location (cities: Chicago, New York, Montreal, Vancouver), time (quarters: Q1, Q2, Q3, Q4), and item (types: home entertainment, computer, phone, security). Visible cell values in the central cube: 605K, 825K, 14K, 400K.

- roll-up on location (from cities to countries): location (countries: US, Canada)
- drill-down on time (from quarters to months): time (months: January through December); visible cell values 150K, 100K, 150K
- slice for time="Q2": a 2-D table of location (cities) by item (types)
- dice for (location="Montreal" or "Vancouver") and (time="Q1" or "Q2") and (item="home entertainment" or "computer")
- pivot: the item (types) and location (cities) axes of the slice are exchanged
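The roll-up, slice, dice, and pivot operations named in this figure can be illustrated on a very small cube. This is a sketch only: the cube is stored as a plain dictionary, the cell values are invented for illustration and are not the figure's numbers, and the function names are assumptions.

```python
from collections import defaultdict

# Hypothetical cube: (location, time, item) -> sales. Values are made up.
cube = {
    ("Vancouver", "Q1", "computer"): 8,
    ("Vancouver", "Q2", "computer"): 9,
    ("Montreal", "Q1", "computer"): 7,
    ("Montreal", "Q2", "phone"): 2,
    ("Chicago", "Q1", "security"): 6,
    ("New York", "Q2", "home entertainment"): 10,
}

# Concept hierarchy for location: city -> country.
COUNTRY = {
    "Vancouver": "Canada",
    "Montreal": "Canada",
    "Chicago": "US",
    "New York": "US",
}


def roll_up_location(cells):
    """Roll-up on location: sum city-level cells into country-level cells."""
    out = defaultdict(int)
    for (city, quarter, item), value in cells.items():
        out[(COUNTRY[city], quarter, item)] += value
    return dict(out)


def slice_time(cells, quarter):
    """Slice: fix time to one quarter, leaving a (location, item) table."""
    return {(loc, item): v for (loc, t, item), v in cells.items() if t == quarter}


def dice(cells, locations, quarters, items):
    """Dice: keep cells whose value on every dimension is in the selection."""
    return {
        key: v
        for key, v in cells.items()
        if key[0] in locations and key[1] in quarters and key[2] in items
    }


def pivot(table):
    """Pivot: exchange the two axes of a 2-D table."""
    return {(b, a): v for (a, b), v in table.items()}


by_country = roll_up_location(cube)
q2 = slice_time(cube, "Q2")
sub_cube = dice(
    cube,
    {"Montreal", "Vancouver"},
    {"Q1", "Q2"},
    {"home entertainment", "computer"},
)
q2_pivoted = pivot(q2)
```

The `dice` call uses the same selection as the figure. Drill-down is left out because it needs finer-grained (monthly) data than this cube stores; aggregates cannot be split back into their parts.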

Figure (page 44): starnet model with four radial lines, each listing the abstraction levels of one dimension.

- time: day, month, quarter, year
- location: street, city, province_or_state, country, continent
- customer: name, category, group
- item: name, brand, category, type

Figure (page 47): multi-tier data warehouse architecture. Layers from bottom to top:

- Data Cleaning and Data Integration: Operational Databases and External sources, processed by Extract, Clean, Transform, Load, Refresh
- Data Storage: Data Warehouse and Data Marts, with Metadata Repository, Monitoring, and Administration
- OLAP Engine: OLAP Server, OLAP Server
- Front-End Tools: Query/Report, Analysis, Data Mining, producing Output

Figure (page 48): approach to data warehouse development. Labels: Define a high-level corporate data model; Data Mart; Data Mart; Enterprise Data Warehouse; model refinement (twice); Distributed Data Marts; Multi-Tier Data Warehouse.
