Data Warehouses and OLAP

What are data warehousing systems?
Data Warehouse Architecture & Design
Multidimensional Data Model
ROLAP and MOLAP Systems
View Design in Data Warehouses
Object-Oriented Data Warehousing
Summary and New Aspects




What Is Data Warehousing?

Data warehousing is a collection of decision support technologies, aimed at enabling the knowledge worker (e.g., chief executive, manager, analyst) to make better and faster decisions.

-Chaudhuri and Dayal, SIGMOD Record, March 1997


Characteristics of Data Warehousing Systems

Historical, summarized and consolidated data; very large databases

Query-intensive processing, driven by query throughput and response time

Multidimensional model, new operations

ROLAP, MOLAP, Data Marts


Operational Database Systems

Mainly on-line transaction processing (OLTP) systems.

Have been the bread-and-butter systems for most database system vendors.

A lot of work has gone into building these systems.

But these systems offer only limited support for the decision support functionality (data analysis) required in competitive business environments.


Why Do We Need Data Analysis?

To know your customers and yourself better

To devise effective business strategies

To provide future directions to business organizations

This kind of data analysis has been going on for a long time, but there is now an urgency to get it done faster. The main obstacle has been the disparate and heterogeneous data sources.

Data warehousing systems aim to solve this problem!


Data Warehousing Architecture

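[Figure: data warehousing architecture. Data sources (operational databases and external sources) feed the data warehouse through extract, transform, load and refresh steps. The data warehouse and its data marts, supported by a metadata repository and by monitoring and administration components, serve OLAP servers, which in turn serve the front-end tools for analysis, query/reporting and data mining.]

Taken from Chaudhuri & Dayal, SIGMOD Record, March 1997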


Data Warehouse Design

Define the architecture, do capacity planning, and select storage servers, database and OLAP servers, and tools

Integrate the servers, storage and client tools

Design the warehouse schema and views

Define the physical warehouse organization, data placement, partitioning, and access methods


Data Warehouse Design (Cont.)

Connect the servers using gateways, ODBC drivers, or other wrappers

Design and implement scripts for data extraction, cleaning, transformation, load and refresh

Populate the repository with the schema and view definitions, scripts, and other metadata

Design and implement end-user applications

Roll out the warehouse and applications


Back-end Tools and Utilities

Data Cleansing - consists of data migration, data scrubbing and data auditing

Data Loading - consists of checking integrity constraints; sorting; summarization, aggregation and other computation to build the derived tables stored in the warehouse; building indices and other access paths; and partitioning to multiple target storage areas

Refresh - data shipping (triggers) vs. transaction shipping (based on logs)


Multidimensional Data Model

Multidimensional view of data in the warehouse

Each dimension is described by a set of attributes; the attributes of a dimension may be related via a hierarchy of relationships.

Dimensions: Product, City, Date

Hierarchical summarization paths (most detailed level first):

  Product -> Category -> Industry
  City -> State -> Country
  Date -> Month -> Quarter -> Year
  Date -> Week (alternative path)


On-Line Analytical Processing (OLAP)

OLAP tools provide an environment for decision making and business modeling activities by supporting ad hoc queries

They provide a multidimensional conceptual view of the data, usually as

  a star schema, in which a single fact table relates to each dimension table, or

  a snowflake schema, in which the dimension tables are normalized to simplify the data operations related to each dimension

They provide easy-to-use end-user interfaces


OLAP (Front-end) Tools

The multidimensional data model grew out of the view of business data popularized by PC spreadsheet programs.

Operations supported by the multidimensional data model:

  Aggregation: total sales by store and by year
  Selection (slicing): sales where toys = "soft" and store = "LA" and year = 1996
  Roll-up (multiple group by): sales by city to sales by state
  Drill-down: sales by state to sales by city
  Calculation by positioning: top 5 stores by total sales
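The operations listed on this slide can be sketched as plain SQL group-by queries. The following is a minimal illustration using Python's built-in sqlite3 module; the `sales` table, its columns and all rows are made up for the example and are not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (toy TEXT, store TEXT, city TEXT, state TEXT, "
    "year INTEGER, amount INTEGER)"
)
# Hypothetical sample rows, invented for this sketch.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("soft", "LA", "Los Angeles", "CA", 1996, 100),
        ("soft", "Venice", "Los Angeles", "CA", 1996, 50),
        ("hard", "LA", "Los Angeles", "CA", 1996, 70),
        ("soft", "SF", "San Francisco", "CA", 1996, 30),
        ("soft", "NY", "New York", "NY", 1995, 80),
        ("hard", "NY", "New York", "NY", 1996, 60),
    ],
)


def q(sql, *params):
    """Run a query and return all rows."""
    return conn.execute(sql, params).fetchall()


# Aggregation: total sales by store and by year.
by_store_year = q(
    "SELECT store, year, SUM(amount) FROM sales "
    "GROUP BY store, year ORDER BY store, year"
)

# Selection (slicing): fix a value on each of three dimensions.
sliced = q(
    "SELECT SUM(amount) FROM sales WHERE toy = ? AND store = ? AND year = ?",
    "soft", "LA", 1996,
)

# Drill-down level: sales by city within each state.
by_city = q(
    "SELECT state, city, SUM(amount) FROM sales "
    "GROUP BY state, city ORDER BY state, city"
)

# Roll-up: the same measure one level higher in the City hierarchy.
by_state = q("SELECT state, SUM(amount) FROM sales GROUP BY state ORDER BY state")

# Calculation by positioning: top 2 stores by total sales.
top_stores = q(
    "SELECT store, SUM(amount) AS total FROM sales "
    "GROUP BY store ORDER BY total DESC LIMIT 2"
)
```

Roll-up and drill-down are the same query at two levels of the dimension hierarchy: moving from `by_city` to `by_state` removes `city` from the grouping, and drilling down adds it back.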


Star Schema

Fact Table: OrderNo, SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalPrice

City: CityName, State, Country

Date: DateKey, Date, Month, Year

Product: ProdNo, ProdName, ProdDescr, Category, CategoryDescr, UnitPrice, QOH

Order: OrderNo, OrderDate

Salesperson: SalespersonID, SalespersonName, City, Quota

Customer: CustomerNo, CustomerName, CustomerAddress, City
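A star schema of this shape is queried with a "star join": the fact table is joined to whichever dimension tables the question needs, and the measure is grouped by dimension attributes. The sketch below uses Python's sqlite3 module with a subset of the tables and columns from the slide. The table name `DateDim` (used in place of `Date`) and every row are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (subset of the slide's attributes).
CREATE TABLE City    (CityName TEXT PRIMARY KEY, State TEXT, Country TEXT);
CREATE TABLE DateDim (DateKey INTEGER PRIMARY KEY, Month INTEGER, Year INTEGER);
CREATE TABLE Product (ProdNo INTEGER PRIMARY KEY, ProdName TEXT,
                      Category TEXT, UnitPrice REAL);

-- Fact table: one foreign key per dimension plus the measures.
CREATE TABLE Fact (
    OrderNo    INTEGER,
    ProdNo     INTEGER REFERENCES Product(ProdNo),
    DateKey    INTEGER REFERENCES DateDim(DateKey),
    CityName   TEXT    REFERENCES City(CityName),
    Quantity   INTEGER,
    TotalPrice REAL
);

-- Hypothetical sample rows, invented for this sketch.
INSERT INTO City VALUES ('Los Angeles', 'CA', 'USA'),
                        ('San Francisco', 'CA', 'USA'),
                        ('New York', 'NY', 'USA');
INSERT INTO DateDim VALUES (1, 1, 1996), (2, 2, 1996), (3, 1, 1997);
INSERT INTO Product VALUES (10, 'Teddy', 'Toys', 5.0), (20, 'Drill', 'Tools', 40.0);
INSERT INTO Fact VALUES (1, 10, 1, 'Los Angeles', 3, 15.0),
                        (2, 10, 2, 'San Francisco', 2, 10.0),
                        (3, 20, 1, 'New York', 1, 40.0),
                        (4, 20, 3, 'Los Angeles', 2, 80.0),
                        (5, 10, 2, 'New York', 4, 20.0);
""")

# Star join: revenue by product category and state for 1996.
revenue_1996 = conn.execute("""
    SELECT p.Category, c.State, SUM(f.TotalPrice)
    FROM Fact f
    JOIN Product p ON p.ProdNo   = f.ProdNo
    JOIN City    c ON c.CityName = f.CityName
    JOIN DateDim d ON d.DateKey  = f.DateKey
    WHERE d.Year = 1996
    GROUP BY p.Category, c.State
    ORDER BY p.Category, c.State
""").fetchall()
```

Every question of this kind costs one join per dimension it touches, which is the join overhead usually attributed to relational OLAP.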


Relational OLAP (ROLAP)

Stores the data in specialized relational tables (star schema)

ROLAP offers flexibility; the cost is the many joins needed for each query

ROLAP extends SQL for decision support data requests

Bitmapped indexes are often more useful than B-trees when handling large amounts of data
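To make the bitmap index idea concrete, here is a minimal sketch in Python. It keeps one bitmap per distinct column value (held in an arbitrary-precision integer) and answers a conjunctive selection with a single bitwise AND. The function names and sample rows are hypothetical; production systems additionally compress the bitmaps.

```python
from collections import defaultdict


def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int whose bit i is set
    when row i holds that value."""
    index = defaultdict(int)
    for i, row in enumerate(rows):
        index[row[column]] |= 1 << i
    return dict(index)


def matching_rows(bitmap):
    """Return the positions of the bits that are set in `bitmap`."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical fact rows, invented for this sketch.
rows = [
    {"state": "CA", "category": "Toys"},
    {"state": "NY", "category": "Toys"},
    {"state": "CA", "category": "Tools"},
    {"state": "CA", "category": "Toys"},
]

by_state = build_bitmap_index(rows, "state")
by_category = build_bitmap_index(rows, "category")

# WHERE state = 'CA' AND category = 'Toys' becomes one bitwise AND.
hits = matching_rows(by_state["CA"] & by_category["Toys"])
```

The approach pays off mainly for columns with few distinct values, since the index needs one bitmap per value.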


Multidimensional OLAP (MOLAP)

Stores data in an N-dimensional cube (hypercube) using an array-based storage structure

Each cell is formed by the intersection of all the dimensions; not all cells have a value (e.g., not every product is sold in every store)

Cubes must be built before they can be used, and are static

Suited for small and medium data sets
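The array-based storage described on this slide can be sketched as a flat array addressed by row-major offsets computed from the dimension members. This is only an illustration under assumed dimensions and made-up figures; real MOLAP engines add compression for sparse regions and precomputed aggregates.

```python
from itertools import product

# Hypothetical dimensions and members, invented for this sketch.
products = ["teddy", "drill"]
stores = ["LA", "SF", "NY"]
months = ["Jan", "Feb"]
dims = [products, stores, months]

# One slot per combination of members; None marks an empty cell.
cube = [None] * (len(products) * len(stores) * len(months))


def offset(coords):
    """Row-major position of the cell at (product, store, month)."""
    pos = 0
    for dim, value in zip(dims, coords):
        pos = pos * len(dim) + dim.index(value)
    return pos


# Load phase: the cube is built before any query runs.
facts = {
    ("teddy", "LA", "Jan"): 100,
    ("teddy", "LA", "Feb"): 50,
    ("teddy", "SF", "Feb"): 30,
    ("drill", "NY", "Jan"): 60,
}
for coords, amount in facts.items():
    cube[offset(coords)] = amount


def total(product_name=None, store=None, month=None):
    """Sum the cells matching the fixed coordinates; None means 'all members'."""
    fixed = (product_name, store, month)
    picks = [[v] if v is not None else dim for dim, v in zip(dims, fixed)]
    return sum(cube[offset(c)] or 0 for c in product(*picks))


filled_cells = sum(cell is not None for cell in cube)
```

Here only 4 of the 12 cells hold a value, which illustrates the sparsity the slide mentions: the array reserves space for every intersection whether or not a sale occurred.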


View Design & Data Warehousing

The virtual view approach may be better if the information sources are changing frequently.

The materialized view approach would be superior if the information sources are changing infrequently and very fast query response time is needed.
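The trade-off between the two approaches can be seen in a small sketch. SQLite has no built-in materialized views, so the example below (Python's sqlite3 module) emulates one with a table created from the view's query plus an explicit full refresh. The names `v_total`, `m_total` and `refresh`, and all rows, are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (I_id INTEGER, month INTEGER, year INTEGER, amount INTEGER);
INSERT INTO Sales VALUES (1, 1, 1996, 10), (1, 2, 1996, 20), (2, 1, 1996, 5);

-- Virtual view: recomputed from the source on every read.
CREATE VIEW v_total AS
    SELECT I_id, SUM(amount) AS total FROM Sales GROUP BY I_id;

-- Emulated materialized view: the result is stored once.
CREATE TABLE m_total AS
    SELECT I_id, SUM(amount) AS total FROM Sales GROUP BY I_id;
""")


def read(name):
    """Read a view or table by name (names are fixed in this sketch)."""
    return conn.execute(f"SELECT I_id, total FROM {name} ORDER BY I_id").fetchall()


def refresh():
    """Full recomputation; incremental maintenance would touch only changed rows."""
    conn.executescript("""
        DELETE FROM m_total;
        INSERT INTO m_total
            SELECT I_id, SUM(amount) FROM Sales GROUP BY I_id;
    """)


# The source changes after the materialized copy was built.
conn.execute("INSERT INTO Sales VALUES (2, 2, 1996, 7)")

virtual_after_change = read("v_total")   # always current, pays the query cost
stale_materialized = read("m_total")     # cheap to read, but out of date
refresh()
fresh_materialized = read("m_total")     # current again, paid the maintenance cost
```

Frequent source changes make the refresh cost dominate, which favors the virtual view; rare changes and many reads favor the stored copy.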


A Motivating Example

Suppose the member databases contain the following tables:

Item(I_id, I_name, I_price)

Part(P_id, P_name, I_id)

Supplier(S_id, S_name, P_id, city, cost, preference)

Sales(I_id, month, year, amount)


Example Continued

Assume we have the following frequently asked queries:

Q1: Select Item.I_id, sum(amount*I_price)
    From Item, Sales
    Where I_name in ('MAZDA', 'NISSAN', 'TOYOTA')
      And year=1996
      And Item.I_id=Sales.I_id
    Group by Item.I_id

Q2: Select P_id, month, sum(amount)
    From Item, Sales, Part
    Where I_name in ('MAZDA', 'NISSAN', 'TOYOTA')
      And year=1996
      And Item.I_id=Sales.I_id
      And Part.I_id=Item.I_id
    Group by P_id, month


Example Continued

Q3: Select Part.P_id, min(cost), max(cost)
    From Part, Supplier
    Where Part.P_id=Supplier.P_id
      And P_name in ('spark_plug', 'gas_kit')
    Group by Part.P_id

Q4: Select Item.I_id, sum(amount*min_cost)
    From Item, Sales, Part,
         (Select P_id, min(cost) as min_cost
          From Supplier
          Group by P_id) M
    Where I_name in ('MAZDA', 'NISSAN', 'TOYOTA')
      And year=1996
      And Item.I_id=Sales.I_id
      And Item.I_id=Part.I_id
      And Part.P_id=M.P_id
    Group by Item.I_id
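To check what the example queries compute, the sketch below loads the four tables of the motivating example into an in-memory SQLite database and runs Q1, Q3 and Q4. Two assumptions are made: the name filter is written as an `IN` list, and Q4 obtains the minimum supplier cost per part through a derived table `M`, which is one plausible reading of the slide's subquery. All rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Item (I_id INTEGER, I_name TEXT, I_price INTEGER);
CREATE TABLE Part (P_id INTEGER, P_name TEXT, I_id INTEGER);
CREATE TABLE Supplier (S_id INTEGER, S_name TEXT, P_id INTEGER,
                       city TEXT, cost INTEGER, preference INTEGER);
CREATE TABLE Sales (I_id INTEGER, month INTEGER, year INTEGER, amount INTEGER);

-- Hypothetical sample rows, invented for this sketch.
INSERT INTO Item VALUES (1, 'MAZDA', 100), (2, 'TOYOTA', 200), (3, 'FORD', 150);
INSERT INTO Part VALUES (10, 'spark_plug', 1), (11, 'gas_kit', 1),
                        (12, 'spark_plug', 2), (13, 'gas_kit', 3);
INSERT INTO Supplier VALUES (1, 'A', 10, 'HK', 5, 1), (2, 'B', 10, 'HK', 4, 2),
                            (3, 'C', 11, 'SG', 7, 1), (4, 'D', 12, 'SG', 6, 1),
                            (5, 'E', 13, 'HK', 9, 1);
INSERT INTO Sales VALUES (1, 1, 1996, 2), (1, 2, 1996, 3), (2, 1, 1996, 1),
                         (2, 1, 1995, 9), (3, 1, 1996, 4);
""")

MAKES = "('MAZDA', 'NISSAN', 'TOYOTA')"

# Q1: 1996 revenue per item for the selected makes.
q1 = conn.execute(f"""
    SELECT Item.I_id, SUM(amount * I_price)
    FROM Item, Sales
    WHERE I_name IN {MAKES} AND year = 1996 AND Item.I_id = Sales.I_id
    GROUP BY Item.I_id ORDER BY Item.I_id
""").fetchall()

# Q3: cheapest and dearest supplier cost per selected part.
q3 = conn.execute("""
    SELECT Part.P_id, MIN(cost), MAX(cost)
    FROM Part, Supplier
    WHERE Part.P_id = Supplier.P_id AND P_name IN ('spark_plug', 'gas_kit')
    GROUP BY Part.P_id ORDER BY Part.P_id
""").fetchall()

# Q4: 1996 sales volume weighted by the minimum cost of each part of the item.
q4 = conn.execute(f"""
    SELECT Item.I_id, SUM(amount * min_cost)
    FROM Item, Sales, Part,
         (SELECT P_id, MIN(cost) AS min_cost FROM Supplier GROUP BY P_id) AS M
    WHERE I_name IN {MAKES} AND year = 1996
      AND Item.I_id = Sales.I_id AND Item.I_id = Part.I_id
      AND Part.P_id = M.P_id
    GROUP BY Item.I_id ORDER BY Item.I_id
""").fetchall()
```

Q1 and Q4 share the selection on `I_name`, the selection on `year` and the Item-Sales join. Shared work of this kind is what a combined processing plan for several queries is meant to exploit.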


An MVPP for the Example

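[Figure: MVPP (multiple view processing plan) that merges the access plans of Q1-Q4. The leaf nodes are the base relations Item, Sales, Part and Supplier, annotated 1k, 12k, 10k and 50k respectively. Intermediate nodes include tmp1 (Item restricted by the I_name condition), tmp2 (Sales restricted to year = 1996), tmp5 (the P_name condition {spark_plug, gas_kit}), tmp6 (P_id, min(cost), max(cost)) and the further nodes tmp3, tmp4, tmp7 and tmp8. The root nodes are result1 (I_id, sum(amount*I_price)), result2 (P_id, month, sum(amount*no)), result3 (P_id, min(cost), max(cost)) and result4 (I_id, sum(mincost*amount*no)), one per query. Nodes carry numeric size and cost annotations.]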


Different Materialization Strategies

Materialized Views                    Cost of Query Processing   Cost of Maintenance   Total Cost

Item, Sales, Part, Supplier           8b980m860k                 0                     8b980m860k

tmp3, tmp4, tmp8                      7b201m547k                 1b350m125k            8b551m672k

tmp3, tmp5                            416m747k                   16b32m204k            16b448m951k

tmp3, tmp4, tmp7                      7b276m497k                 1b220m55k             8b496m552k

tmp3, tmp7                            8b281m547k                 126m122k              8b407m669k

result1, result2, result3, result4    1m447k                     17b384m934k           17b386m381k
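The table's arithmetic can be checked mechanically: each total is the query processing cost plus the maintenance cost, and the preferred strategy is the row with the smallest total. The sketch below assumes the shorthand `8b980m860k` means 8 billion + 980 million + 860 thousand; the helper name `parse_cost` is made up for the example.

```python
import re

UNITS = {"b": 10**9, "m": 10**6, "k": 10**3}


def parse_cost(text):
    """Convert the slide's shorthand, e.g. '8b980m860k', to an integer.
    A bare '0' contains no unit group and therefore parses to 0."""
    return sum(int(n) * UNITS[u] for n, u in re.findall(r"(\d+)([bmk])", text))


# (query processing cost, maintenance cost) per strategy, copied from the table.
strategies = {
    "Item, Sales, Part, Supplier": ("8b980m860k", "0"),
    "tmp3, tmp4, tmp8": ("7b201m547k", "1b350m125k"),
    "tmp3, tmp5": ("416m747k", "16b32m204k"),
    "tmp3, tmp4, tmp7": ("7b276m497k", "1b220m55k"),
    "tmp3, tmp7": ("8b281m547k", "126m122k"),
    "result1, result2, result3, result4": ("1m447k", "17b384m934k"),
}

totals = {
    name: parse_cost(query) + parse_cost(maintenance)
    for name, (query, maintenance) in strategies.items()
}
best = min(totals, key=totals.get)
```

Under this reading the cheapest strategy is to materialize tmp3 and tmp7. Materializing only the final results gives the lowest query cost but the highest maintenance cost, while materializing nothing beyond the base relations has no maintenance cost but a high query cost.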


Issues & Problems

Finding all the common subexpressions and combining the individual query access plans into one MVPP, such that all the common subexpressions are merged

Finding a set of intermediate nodes in the MVPP such that, if the members of this set are materialized, the total cost of global query access and view maintenance is minimal


Algorithms for Materialized View Selection

Algorithms for multiple MVPP design:

  a feasible solution - working with individual optimal plans;

  generating optimal plan(s) - applying a 0-1 integer programming technique.

Given an MVPP, use heuristic rules to find a set of nodes to be materialized so that the total cost is minimal.
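The selection problem can be illustrated with a toy cost model. The sketch below is not the heuristic or the integer programming formulation referred to on the slide: it simply enumerates every 0-1 choice of candidate views, which is feasible only for a handful of nodes. All names and figures (`base_cost`, `frequency`, `cost_with_view`, `maintenance`, `tmpA`, `tmpB`) are hypothetical.

```python
from itertools import combinations

# Hypothetical figures, invented for this sketch.
base_cost = {"Q1": 100, "Q2": 80, "Q3": 60}   # cost when answered from base relations
frequency = {"Q1": 10, "Q2": 5, "Q3": 1}      # how often each query is asked
cost_with_view = {                            # cost when a given view is materialized
    "tmpA": {"Q1": 20, "Q2": 30},
    "tmpB": {"Q2": 10, "Q3": 15},
}
maintenance = {"tmpA": 300, "tmpB": 400}      # cost of keeping each view up to date


def query_cost(query, chosen):
    """Cheapest way to answer `query` given the materialized views in `chosen`."""
    options = [base_cost[query]]
    options += [cost_with_view[v][query] for v in chosen if query in cost_with_view[v]]
    return min(options)


def total_cost(chosen):
    """Frequency-weighted query cost plus maintenance cost of the chosen views."""
    querying = sum(frequency[q] * query_cost(q, chosen) for q in base_cost)
    upkeep = sum(maintenance[v] for v in chosen)
    return querying + upkeep


def best_selection():
    """Try every subset of candidate views and keep the cheapest one."""
    views = sorted(maintenance)
    subsets = [
        frozenset(c) for r in range(len(views) + 1) for c in combinations(views, r)
    ]
    return min(subsets, key=total_cost)


chosen = best_selection()
```

With these figures materializing only `tmpA` wins: it serves the most frequent query cheaply, while `tmpB` saves too little to justify its maintenance cost. Exhaustive enumeration grows as 2^n in the number of candidate nodes, which is why heuristics are of interest for realistic plans.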


Dynamic Materialized View Selection

Monitor the queries being executed over time

Maintain the MVPP by incorporating the most frequently executed queries (common subexpressions)

Modify the MVPP incrementally by executing the MVPP generation algorithm (in the background)

Decide on the views to be materialized

Reorganize the existing views


Materialized View Selection Costs

The dynamic materialized view selection problem has to take into consideration:

  Benefit to the future query processing cost
  Cost of maintaining the materialized views
  Cost of reorganization


Materialized View Reorganization

Given a set of views V1, V2, …, Vn currently materialized

Let V'1, V'2, …, V'm be the new views that need to be materialized

Need to design algorithms for efficient view reorganization

  on-line (concurrency, failure recovery) and off-line (efficiency) algorithms
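A natural first step in any reorganization is to compare the two sets of views. The sketch below shows only that comparison; it does not address reusing a dropped view to build a new one, concurrency or failure recovery, which are the open design questions on the slide. The function name and the view names in the usage line are illustrative.

```python
def reorganization_plan(current, new):
    """Split the views into those to keep, drop and create.

    `current` holds the names of the materialized views V1..Vn,
    `new` the names of the target views V'1..V'm.
    """
    current, new = set(current), set(new)
    return {
        "keep": sorted(current & new),     # already materialized and still wanted
        "drop": sorted(current - new),     # materialized but no longer wanted
        "create": sorted(new - current),   # wanted but not yet materialized
    }


plan = reorganization_plan(["tmp3", "tmp4", "tmp8"], ["tmp3", "tmp7"])
```

Here `tmp3` is kept, `tmp4` and `tmp8` are dropped, and `tmp7` is created.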


Relational Schema

Fact Table: OrderNo, SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalPrice

City: CityName, State, Country

Date: DateKey, Date, Month, Year

Product: ProdNo, ProdName, ProdDescr, Category, CategoryDescr, UnitPrice, QOH

Order: OrderNo, OrderDate

Salesperson: SalespersonID, SalespersonName, City, Quota

Customer: CustomerNo, CustomerName, CustomerAddress, City


An Object Model

[Figure: object model for the order schema. Classes shown include Order (OrderNo, Quantity, TotalPrice), Person (Name, DateOfBirth, Address, GetAge()), SalesPerson (SalesPersonID, Quota), Customer (CustomerNo), Product (ProdName, UnitPrice, GetProdName()), Category (CategoryName, CategoryDescr, GetCategName()), City (GetRegion()), State, Country, Date, Month, Year, and the view classes OrderView (OrderSet, Summarize()), OrderPYView and OrderPYCView. Further labels in the figure: OrderDate, GetDate(). Edges denote ISA and IS PART-OF relationships.]


Why Object-Oriented Data Warehouse?

Object identity reduces data redundancy - can it help materialized view maintenance?

Is-a hierarchy facilitates reuse of common data objects and methods (overloading)

Class composition hierarchy helps fast traversal using OIDs

Methods facilitate implementation of complex aggregate functions (over complex objects, such as the volume of a CAD object)


Efficiency Considerations

Structural join index hierarchies and class partitioning can help in:

  evaluating multiple path operations
  processing methods efficiently
  calculating multidimensional aggregate operations, such as data cube and pivoting


Architecture Considerations

The following issues need to be addressed:

  Is the preferred architecture an OO front-end with a relational back-end?
  What about an OO back-end and front-end?
  How does one integrate data mining and OO data warehousing components?
  How does one build distributed object-oriented data warehousing systems?


Summary

Data warehousing systems are about 5 years old

Most of the work has concentrated on materialized view maintenance, preliminaries

New aspects of data warehousing have to be considered to build next-generation systems:

  dynamic materialized view design
  object-orientation, etc.


Some References

Dynamic Materialized View Design/Selection: Timos Sellis group, Stanford Group, CSIRO/HKUST/CityU

Object-Oriented Data Warehousing: Rundensteiner group, Tore Risch Group, CityU/HKUST, Univ. of South Australia