Transcript
Page 1: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Background ofOLAP, DW, Data Mining

Page 2: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Introductory Concepts

• Related Terms– OLAP– Data warehouse– Data mart– OLAP cube, multidimensional cube– Star schema– Fact, dimension– Dimensional modeling– Data mining

Page 3: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Introductory Concepts

OLAP

Data Warehouse

Data Mining

Multidimensional Cube

Star Schema

•Relevance of Terms

Page 4: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Introductory Concepts

• OLAP(On-Line Analytical Processing)– Complex query processing for decision making– Report generation for the summary/aggregate /

statistics over source data– Advanced data analysis techniques

• OLTP(On-Line Transaction Processing)– Simple transaction processing for regular (daily)

business operations– Manipulating databases maintained for the orga-

nization’s activities– Optimized for simple operations (select,insert ...)

Page 5: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Introductory Concepts

본부 현업 부서

정보 시스템

총장 , 교육부 , 국회

업무 데이타베이스OLTP DB

Transaction 요청

통계 자료 요청보고서 요청

Page 6: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Introductory Concepts

본부 현업 부서

정보 시스템

총장 , 교육부 , 국회

업무 데이타베이스OLTP DB

Transaction 요청

통계 자료 요청보고서 요청

• Solution (1)

통계 / 보고서생성 모듈

Page 7: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Introductory Concepts

본부 현업 부서

정보 시스템

총장 , 교육부 , 국회

업무 데이타베이스OLTP DB

Transaction 요청

통계 자료 요청보고서 요청

• Solution (2)

임의의 통계생성 도구

OLAP 시스템

Page 8: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Introductory Concepts

본부 현업 부서

정보 시스템

총장 , 교육부 , 국회

업무 데이타베이스OLTP DB

Transaction 요청

통계 자료 요청보고서 요청

• Solution (2)

OLAP 시스템

분석용 / 요약 자료Data Ware-house

Page 9: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Introductory Concepts

• OLAP systems/tools– Present summary/aggregate data for a large OLTP

databases in several ways to decision makers– Manipulate DW over several dimensions dynami-

cally to find useful information

• Data warehouse– A collection of data which are used for OLAP– Summary/aggregate data extracted from source

OLTP databases which can support every possible OLAP requests.

– Represented by star schema

Page 10: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Introductory Concepts

OLAP

Data Warehouse

Data Mining

Multidimensional Cube

Star Schema

•Relevance of Terms

Page 11: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Page 11

Page 12: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Introductory Concepts

• OLTP(Online Transaction Processing)– Typical daily business query and update processing

• OLAP (Online Analytical Processing)– Complex query processing or report generation

• Advanced data analysis techniques• ROLAP: Relational OLAP• MOLAP: Multi-dimensional OLAP

• Data Warehouse– An enterprise-wide data repository for decision sup-

port

• Data Mart– A smaller targeted DW for a business process

• Star Schema– A DB structure for data warehouse

Page 13: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Data Warehouse• DW is an integrated repository of data that is put into a

form that can be easily understood, interpreted, and analyzed by the people who need to use it to make de-cisions

• Data are extracted from operational systems, then cleansed, integrated, transformed, and aggregated, into a read-only database that is optimized for decision making

• Data Warehouse is a – Subject-oriented– Integrated– Time-variant– Non-volatile

– Collection of data in support of management’s decision-making process (W.H. Inmon)

Page 14: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Data Warehouse• Motivation for DW

– An enterprise-wide repository of data, information, knowledge, and meta-data• Gather all the information into a single place for in-depth

analysis• Decouple such analysis from OLTP systems

– Transform the data into information• Provide right information in the right format at the right time

– Perform sophisticated analysis of data• Perform trend analysis, time series analysis, risk analysis,

etc.• Perform DSS exploration such as alternative formation, al-

ternatives testing, decision-making, etc.• Discover/visualize hidden facts, patterns, correlations, rules,

exceptions using data mining techniques

Page 15: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Data Warehouse

• OLTP queries– How many shoes did we sell last month?–What are the age, address, phone of a

certain student (e.g., Hong Gildong)

Page 16: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Data Warehouse• OLAP queries

– How many size 10 shoes in red did we sell last month in the Midwest, the Northeast, compared with the same month last year, actual vs. budget?

– What are the top 25 brands, by products, styles, and re-gions, for this period for total US based on sales dollars?

– How much promotional expenses did we spend on cus-tomers who purchased less than $100 worth of products?

– How much discount should we offer to boost the sales vol-ume significantly?

– Find the correlation between buying patterns of products of type A and those of type B.

– What are sales trends?– What percent of the market do we own?– How are our defect rates improving?– Are our profits are increasing or decreasing?

Page 17: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

OLTP Data Warehouses & OLAP

Purpose Daily business support,Transaction processing

Decision support,Analytic processing

User Data entry clerk, administrator, developer

Decision maker, executives

DB design Application oriented Subject-oriented

DB design model

ER model Star, snowflake, multidimensional model

Data structures Normalized, complex Denormalized, simple

Data Current, up-to-date operational dataAtomicIsolated

HistoricalAtomic and summarizedIntegrated

Usage Repetitive, Routine Ad-hoc

Update Transaction constantly generateNew data, read/write

Data is relatively staticOften refreshed weekly, read mostly

Response time Subsecond to second Seconds, minutes, worse

Index types B+ trees B+ trees, bitmap index, join index

Systems re-quirements

Transaction throughput,Data consistency

Query throughput,Data accurary

• Comparison of OLTP and Data Warehouse– DWs require a new query-centric view of the data

Page 18: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Data Warehouse• Example Benefits of Data Warehousing & Data Mining

– Fast sophisticated report generations– 4-15 times faster delivery– Young men buy beer on Friday nights when they buy diapers– More athletic shoes are sold on Friday evenings and Satur-

days than during the rest of the week combined– More athletic shoes are sold when white tube socks are

prominently displayed as part of a 2-for-1 sale.– In a retail chain, potato chip purchases were accompanied

by a soda purchase in half the cases. That figure increases to 75% when there is a marketing promotion.

– Blue Cross found some providers had superior treatment success rates for some fatal diseases

– Victoria Secret found a particular incentive was ineffective, saving $300K per week.

Page 19: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Data Warehouse• Database Design for DW

– Objective of a DW• Creating a database optimized for decision support

– Limitations of ER model for DW applications• Normalized data model support large numbers of transactions

with very few records.• ER models tend to be very complex and difficult to navigate.• ER model identifies first entities, then relationships

– Four basic requirements of a warehouse design• The schema must be simple.• The data must be clean, consistent, and accurate• Query processing must be fast• Load the data into the warehouse quickly

– Two types of data representation in DW• Star Schema• Multi-dimensional array

Page 20: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Star Schema• Dimensional Model (Star Schema) – A database schema for data warehousing– Initially developed to simplify SQL queries (by

Ralph Kimball)– Consists of a few central fact tables and many di-

mension tables• Dimension = analysis criteria• Fact = measurements aggregated over dimensions to be analyzed

– Simplifies end-user query processing and high query performance

– Used to reduce joins by OLAP/Relational engines– Relatively few tables and well-defined join paths

Page 21: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Star Schema

Page 22: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Star Schema• Dimensional Model (Star Schema) consists of:

– Fact table:• Stores all transactions or factual data that are analyzed• Typically numeric measures• From millions to more than billion rows• Example: Revenue, Actuals, Budgets, Sales, Orders, Bookings, Claims

– Dimension table:• Attributes about facts• Supports grouping, browsing, constraining• Provides the entry points into the DW• Example: Time, Customer, Promotions, Demographics, LifeStyles, Prod-

ucts, Stores, Markets

Page 23: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Star Schema• Dimension hierarchy– Each dimension table has several attributes– These attributes may have a hierarchy

• Different level of aggregation for fact data

– User may want aggregate data by some level of the dimension hierarchy

Year

Month

day

State

County

City

Street

Page 24: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Star Schema• The Strengths of Dimensional Modeling (Kimball 98)

– Provides a predictable, standard framework • Report writers, query tools, and UIs could take advantage of DM structure• Supports presentation and performance• Simplifies the understanding and navigation of metadata

– Is robust against unexpected changes in user behavior • Logical design is independent of the use of the schema• All dimensions are equally entry points to the system

– Can be changed gracefully • Could add new unanticipated facts of the same grain• Could add new dimensions• Add new unanticipated dimensional attributes• Break existing dimension records down to a lower level of granularity

– Availability of common modeling situations • Slowly changing dimensions• Heterogeneous products• Pay-in-advance databases• Factless facts as in event-handling

– Availability of administrative utilities and SW processes that manage and use aggregates

Page 25: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Star Schema• Variations of Star Schema

– Star Schema• All dimensions are denormalized • Wide dimensions and deep facts

– Snowflake Schema• All dimensions are normalized into 3NF

Page 26: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Star Schema• Summary of DW and Star Schema– DW is databases specially maintained for ana-

lytical processing– DW is organized by facts and dimensions– Star schema is a way of representing DW

schema

• Cube, Data Mining??

Page 27: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Page 27

Page 28: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Multidimensional Cube• Motivation of Cube

– Similar OLAP queries can be executed repeatedly to find some valuable information.

– Users want to view data in several different perspectives– It is time-consuming to calculate these data which use

similar facts and dimensions– Cube is a special data structure to meet this kind of OLAP

queries’ requirements

dim1

dim2

dim3

dim4

dim5

dim6

dim7

dim8

dim9

dim10

fact1

fact2

fact3

dim1

dim2dim7

fact1

dim1 dim7fact1

dim1 fact1

Page 29: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Multidimensional Cube• Multidimensional Cube–Maintain every possible pairs of dimensions

and aggregation of fact data– Types of OLAP• ROLAP : maintaining cube by relational table• MOLAP : maintaining cube by multidimensional

array

dim1

dim2

dim3

dim4

dim5

dim6

dim7

dim8

dim9

dim10

fact1

fact2

fact3

dim1

dim7

dim2

Query1

Query2

Query3

Page 30: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Multidimensional Cube• Multidimensional Cube–Maintain every possible pairs of dimensions

and aggregation of fact data– Types of OLAP• ROLAP : maintaining cube by relational table• MOLAP : maintaining cube by multidimensional

array

dim1

dim2

dim3

dim4

dim5

dim6

dim7

dim8

dim9

dim10

fact1

fact2

fact3

Query1

Query2

Query3

Page 31: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Multidimensional Cube• Multidimension Cube– Consist of several dimensions and facts– Its schema can be represented by star schema– Two different perspectives

• Cube is a part of data warehouses• Cube is a data structure for OLAP tool which is ex-

tracted from data warehouses

Page 32: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Introductory Concepts• OLAP Operations

– Drill down. View the ata in a more specialized level within a dimension• Show me total sales for by quarter• Show me total sales for this quarter by department• Show me total sales for this quarter, dept 22, by state

– Roll-up. Converse of drill-down• Drilling down is adding new headers from the dimension tables• Rolling up is moving row headers

– Slice and dice. Slice (dice) data on columns (rows), elimi-nating the rest of the display. Keep a subset of dimensions for selected values.

– Pivot. Switching the arrangement of the dimensions– Cross-tab. A matrix report with X and Y axis– Other Operations

• Ranking and computed attribute definitions• Multiple granularity aggregates (e.g., cube operator)

Page 33: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Introductory Concepts

OLAP

Data Warehouse

Data Mining

Multidimensional Cube

Star Schema

•Relevance of Terms

Page 34: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Page 34Data Ware-house

Page 35: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Page 35

OLAP tools

Page 36: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Data Mining• What is Data Mining? – Is the process of knowledge discovery (useful infor-

mation, patterns, groups, etc.) from a large of data.– Is the principle of soring through large amounts of

data and picking out relevant information (by Wikipedia).

– Is the nontrivial extraction of implicit, previously un-known, and potentially useful information from data.

– Is the science of extracting useful information from large data sets or databases.

– Data may be stored in files, databases, webs, data warehouses, etc.

Page 37: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Data Mining• Data Mining algorithm

MiningAlgorithm

Large input data KnowledgeInformation

Data Ware-house

Multidimen-sional Cube

OLAP tools

Page 38: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Summary

OLAP

Data Warehouse

Data Mining

Multidimensional Cube

Star Schema

•Relevance of Terms

Page 39: Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

Page 39


Recommended