Data collection and database creation (1960s and earlier)
- primitive file processing

Database management systems (1970s)
- network and relational database systems
- data modeling tools
- indexing and data organization techniques
- query languages and query processing
- user interfaces
- optimization methods
- on-line transactional processing (OLTP)

Advanced database systems (mid-1980s - present)
- advanced data models: extended-relational, object-oriented, object-relational
- application-oriented: spatial, temporal, multimedia, active, scientific, knowledge bases, World Wide Web

Data warehousing and data mining (late-1980s - present)
- data warehouse and OLAP technology
- data mining and knowledge discovery

New generation of information systems (2000 - ...)
[Figure: "How can I analyze this data?"]
[Figure: searching for knowledge (gold nuggets) in a mountain of data, with a shovel, a pick, and beads of sweat]
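[Figure: process flow from raw data to knowledge]
- sources: databases, flat files
- Cleaning & Integration, producing the data warehouse
- Selection & Transformation
- Data Mining, producing patterns
- Evaluation & Presentation, producing knowledge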
[Figure: components of a data mining system]
- Graphic User Interface
- Pattern Evaluation
- Data Mining Engine
- Knowledge Base
- Database or Data Warehouse Server
- data cleaning, data integration, filtering
- Database, Data Warehouse
[Figure: data warehouse framework]
- data sources: Chicago, New York, Vancouver, ...
- clean, transform, integrate, load
- data warehouse
- query and analysis tools
- clients
[Figure: multidimensional data cube]
a) Cube with dimensions address (cities: New York, Montreal, Vancouver, Chicago), time (quarters: Q1, Q2, Q3, Q4), and item (types: home entertainment, computer, phone, security). Visible cell values: 605K, 825K, 14K, 400K. Example cell: <Vancouver,Q1,security>.
b) Drill-down on time data for Q1: time in months (Jan, Feb, March), with visible cell values 150K, 100K, 150K. Roll-up on address: address in regions (North, South, East, West).
[Figure: related disciplines: Database Systems, Statistics, Machine Learning, Information Science, Visualization, other disciplines]
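[Figure: 3-D data cube with dimensions location (cities: Chicago, New York, Montreal, Vancouver), time (quarters: Q1, Q2, Q3, Q4), and item (types: home entertainment, computer, phone, security)]
Visible cell values, in rows of four:
- 605K 825K 14K 400K
- 680 952 31 512
- 812 1023 30 501
- 927 1038 38 580
Other visible cell values: 870, 925, 789, 698, 984, 1002, 682, 784, 728, 623, 872, 591, 89, 38, 43, 882, 968, 746, 854, 1087, 818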
[Figure: 4-D data cube shown as three 3-D cubes, one each for supplier = "SUP1", supplier = "SUP2", and supplier = "SUP3". Each cube has dimensions location (cities: Chicago, New York, Montreal, Vancouver), time (quarters: Q1, Q2, Q3, Q4), and item (types: home entertainment, computer, phone, security). Visible cell values: 605K, 825K, 14K, 400K.]
[Figure: lattice of cuboids for the dimensions time, item, location, supplier]
0-D (apex) cuboid
- all
1-D cuboids
- time
- item
- location
- supplier
2-D cuboids
- time, item
- time, location
- time, supplier
- item, location
- item, supplier
- location, supplier
3-D cuboids
- time, item, location
- time, item, supplier
- time, location, supplier
- item, location, supplier
4-D (base) cuboid
- time, item, location, supplier
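The lattice above is the set of all subsets of the four dimensions, grouped by size. A minimal sketch that enumerates it is shown here; the function name `cuboid_lattice` is an illustrative choice and does not come from the source.

```python
from itertools import combinations

# Dimension names taken from the lattice above.
DIMENSIONS = ("time", "item", "location", "supplier")

def cuboid_lattice(dimensions):
    """Return {k: [cuboids]}, where each cuboid is a tuple of k dimensions."""
    return {
        k: list(combinations(dimensions, k))
        for k in range(len(dimensions) + 1)
    }

lattice = cuboid_lattice(DIMENSIONS)
for k, cuboids in lattice.items():
    # The empty tuple is the apex cuboid, labeled "all" in the figure.
    names = [", ".join(c) if c else "all" for c in cuboids]
    print(f"{k}-D: {names}")

# With n dimensions (and no concept hierarchies) there are 2**n cuboids.
total = sum(len(cuboids) for cuboids in lattice.values())
print("total cuboids:", total)
```

For four dimensions this yields 1 + 4 + 6 + 4 + 1 = 16 cuboids, matching the levels listed above.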
[Figure: star schema]
- Sales Fact: time_key, item_key, branch_key, location_key, dollars_sold, units_sold
- Time Dimension: time_key, day, day_of_week, month, quarter, year
- Item Dimension: item_key, item_name, brand, type, supplier_type
- Branch Dimension: branch_key, branch_name, branch_type
- Location Dimension: location_key, street, city, province_or_state, country
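To show how a fact table is queried through its dimension tables, here is a minimal sketch using Python's built-in `sqlite3`. The table names (`sales_fact`, `time_dim`, `location_dim`) and all row values are hypothetical. Only the column names come from the schema above, and the item and branch dimension tables are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY, day INTEGER, day_of_week TEXT,
    month TEXT, quarter TEXT, year INTEGER);
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
    province_or_state TEXT, country TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER, branch_key INTEGER,
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL, units_sold INTEGER);
""")

# Hypothetical rows, for illustration only.
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 15, "Saturday", "January", "Q1", 2000),
    (2, 10, "Thursday", "February", "Q1", 2000),
    (3, 5, "Wednesday", "April", "Q2", 2000),
])
conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?, ?, ?)", [
    (1, "1 Main St", "Vancouver", "British Columbia", "Canada"),
    (2, "2 Main St", "Chicago", "Illinois", "USA"),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, 100.0, 2),
    (2, 1, 1, 1, 50.0, 1),
    (3, 1, 1, 1, 70.0, 1),
    (1, 2, 1, 2, 30.0, 3),
])

# Join the fact table to two dimensions and aggregate by quarter and city.
rows = conn.execute("""
    SELECT t.quarter, l.city, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON f.time_key = t.time_key
    JOIN location_dim AS l ON f.location_key = l.location_key
    GROUP BY t.quarter, l.city
    ORDER BY t.quarter, l.city
""").fetchall()

for quarter, city, dollars in rows:
    print(quarter, city, dollars)
```

Each dimension is reached with a single join from the fact table, which is what gives the schema its star shape.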
[Figure: snowflake schema]
- Sales Fact: time_key, item_key, branch_key, location_key, dollars_sold, units_sold
- Time Dimension: time_key, day, day_of_week, month, quarter, year
- Item Dimension: item_key, item_name, brand, type, supplier_key
- Supplier Dimension: supplier_key, supplier_type
- Branch Dimension: branch_key, branch_name, branch_type
- Location Dimension: location_key, street, city_key
- City Dimension: city_key, city, province_or_state, country
[Figure: fact constellation schema]
- Sales Fact: time_key, item_key, branch_key, location_key, dollars_sold, units_sold
- Shipping Fact: item_key, time_key, shipper_key, from_location, to_location, dollars_cost, units_shipped
- Time Dimension: time_key, day, day_of_week, month, quarter, year
- Item Dimension: item_key, item_name, brand, type
- Branch Dimension: branch_key, branch_name, branch_type
- Location Dimension: location_key, street, city, province_or_state, country
- Shipper Dimension: shipper_key, shipper_name, location_key, shipper_type
[Figure: concept hierarchy for location, with levels all, country, province_or_state, city]
all
  Canada
    British Columbia: Vancouver, Victoria, ...
    Ontario: Toronto, ...
    Quebec: Montreal, ...
    ...
  USA
    New York: New York, ...
    California: Los Angeles, San Francisco, ...
    Illinois: Chicago, ...
    ...
  ...
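A concept hierarchy like the one above can be stored as a pair of lookup tables, one per step up the hierarchy. The sketch below assumes that representation; the names `CITY_TO_PROVINCE`, `PROVINCE_TO_COUNTRY`, and `generalize` are illustrative and not from the source.

```python
# One mapping per step in the hierarchy: city -> province_or_state -> country.
CITY_TO_PROVINCE = {
    "Vancouver": "British Columbia",
    "Victoria": "British Columbia",
    "Toronto": "Ontario",
    "Montreal": "Quebec",
    "New York": "New York",
    "Los Angeles": "California",
    "San Francisco": "California",
    "Chicago": "Illinois",
}
PROVINCE_TO_COUNTRY = {
    "British Columbia": "Canada",
    "Ontario": "Canada",
    "Quebec": "Canada",
    "New York": "USA",
    "California": "USA",
    "Illinois": "USA",
}
LEVELS = ("city", "province_or_state", "country", "all")

def generalize(city, level):
    """Map a city to its ancestor at the requested level of the hierarchy."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    if level == "city":
        return city
    province = CITY_TO_PROVINCE[city]
    if level == "province_or_state":
        return province
    if level == "country":
        return PROVINCE_TO_COUNTRY[province]
    return "all"

print(generalize("Victoria", "province_or_state"))
print(generalize("Chicago", "country"))
```

Replacing each city value in a data set with `generalize(city, "country")` is the value mapping behind a roll-up from cities to countries.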
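[Figure: hierarchical and lattice structures of dimension attributes]
- location: street < city < province_or_state < country
- time: day < {month < quarter; week} < year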
[Figure: concept hierarchy for price]
($0 - $1,000]
  ($0 - $200]: ($0 - $100], ($100 - $200]
  ($200 - $400]: ($200 - $300], ($300 - $400]
  ($400 - $600]: ($400 - $500], ($500 - $600]
  ($600 - $800]: ($600 - $700], ($700 - $800]
  ($800 - $1,000]: ($800 - $900], ($900 - $1,000]
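The price hierarchy above uses equal-width intervals that are open on the left and closed on the right. A minimal sketch of assigning a price to its interval at each level follows; the function names `price_interval` and `price_path` are hypothetical.

```python
import math

def price_interval(price, width):
    """Return the (low, high] interval of the given width that contains price."""
    if not 0 < price <= 1000:
        raise ValueError("price must lie in ($0 - $1,000]")
    # Rounding up to a multiple of width keeps the upper bound inclusive,
    # so a price of exactly 200 falls in (0, 200], not (200, 400].
    high = math.ceil(price / width) * width
    return (high - width, high)

def price_path(price):
    """Intervals containing price, from the most general level to the most specific."""
    return [price_interval(price, width) for width in (1000, 200, 100)]

print(price_path(250))
print(price_path(200))
```

Here `price_path(250)` returns `[(0, 1000), (200, 400), (200, 300)]`, which is the path from the root of the hierarchy down to a leaf.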
[Figure: OLAP operations on a 3-D data cube with dimensions location (cities: Chicago, New York, Montreal, Vancouver), time (quarters: Q1, Q2, Q3, Q4), and item (types: home entertainment, computer, phone, security). Visible cell values: 605K, 825K, 14K, 400K.]
- roll-up on location (from cities to countries): location becomes countries (US, Canada)
- drill-down on time (from quarters to months): time becomes months (January through December); visible cell values 150K, 100K, 150K
- slice for time="Q2": a 2-D view of location (cities) by item (types)
- dice for (location="Montreal" or "Vancouver") and (time="Q1" or "Q2") and (item="home entertainment" or "computer")
- pivot: the location (cities) and item (types) axes are rotated
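The roll-up, slice, dice, and pivot operations named above can be sketched on a small in-memory cube. This is an illustration under stated assumptions, not a reference implementation: the cube is a plain dictionary keyed by (location, time, item), the function names are hypothetical, the four Vancouver/Q1 numbers reuse the values visible in the figure (in thousands, in an assumed assignment to cells), and every other number is made up.

```python
from collections import defaultdict

# Cells keyed by (location, time, item); values in thousands.
cube = {
    ("Vancouver", "Q1", "home entertainment"): 605,
    ("Vancouver", "Q1", "computer"): 825,
    ("Vancouver", "Q1", "phone"): 14,
    ("Vancouver", "Q1", "security"): 400,
    ("Vancouver", "Q2", "computer"): 10,   # hypothetical
    ("Montreal", "Q1", "computer"): 20,    # hypothetical
    ("Chicago", "Q2", "phone"): 30,        # hypothetical
    ("New York", "Q3", "security"): 40,    # hypothetical
}

CITY_TO_COUNTRY = {
    "Vancouver": "Canada", "Montreal": "Canada",
    "Chicago": "US", "New York": "US",
}

def roll_up_location(cube):
    """Roll up on location: aggregate cities into countries."""
    out = defaultdict(int)
    for (city, time, item), value in cube.items():
        out[(CITY_TO_COUNTRY[city], time, item)] += value
    return dict(out)

def slice_time(cube, quarter):
    """Slice: fix one time value, leaving a 2-D (location, item) view."""
    return {(loc, item): v for (loc, t, item), v in cube.items() if t == quarter}

def dice(cube, locations, times, items):
    """Dice: keep the sub-cube selected on all three dimensions."""
    return {
        key: v for key, v in cube.items()
        if key[0] in locations and key[1] in times and key[2] in items
    }

def pivot(view):
    """Pivot a 2-D view by swapping its two axes."""
    return {(b, a): v for (a, b), v in view.items()}

by_country = roll_up_location(cube)
q2 = slice_time(cube, "Q2")
sub = dice(cube, {"Montreal", "Vancouver"}, {"Q1", "Q2"},
           {"home entertainment", "computer"})
print(by_country[("Canada", "Q1", "computer")])
print(q2)
print(len(sub))
print(pivot(q2))
```

Drill-down is not shown because it needs data at the finer level (months), which cannot be derived from quarterly totals alone.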
location
customer
namestreet
continent
city
province_or_state
country
itemday
month
quarter
year
category
group
brandname typecategory
[Figure: data warehouse architecture]
- Data sources: Operational Databases, External sources
- Data Cleaning and Data Integration: Extract, Clean, Transform, Load, Refresh
- Data Storage: Data Warehouse, Data Marts, Metadata Repository, Monitoring, Administration
- OLAP Engine: OLAP Server, OLAP Server
- Front-End Tools: Query/Report, Analysis, Data Mining
- Output
[Figure: data warehouse development approach]
- Define a high-level corporate data model
- Data Mart, Data Mart (model refinement)
- Distributed Data Marts
- Enterprise Data Warehouse (model refinement)
- Multi-Tier Data Warehouse