An Analysis of the Publication "An Overview of Data Warehousing and OLAP Technology" by Surajit Chaudhuri and Umeshwar Dayal

Michael Goshey
University of Minnesota, Fall 2006
CSci 8701: Overview of Database Research



Michael Goshey: 9/19/2006 2

Outline

1. Introduction

2. Problem Addressed

3. Major Contributions

4. Key Concepts

5. Validation Methodology

6. Assumptions

7. 2006 Rewrite


Introduction

Selected paper: S. Chaudhuri and U. Dayal, "An Overview of Data Warehousing and OLAP Technology," SIGMOD Record 26(1): 65-74 (1997).

Motivation: personal interest


Problem Addressed

Problem Statement
  Survey: organizing the data warehousing space
  Differing requirements between OLTP and OLAP

Significance
  Growth area
  Reference work establishing consensus on terms, architectures and issues


Major Contributions

Bridging the gulf between industry and academia
OLTP vs. OLAP: clarifying the differences
Concise survey of relevant issues, architectures and tools
Concrete list of data warehouse design and build steps


Key Concepts

Data warehouses and data marts
OLTP, OLAP (ROLAP vs. MOLAP)
Relational and dimensional data models
Bitmap index
ETL
Metadata
Managed query vs. ad hoc environments
Materialized views
SQL extensions (cube, rollup, rank, percentile, etc.)
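As an illustration of the cube and rollup extensions listed above, here is a minimal Python sketch of what those operators compute: rollup aggregates over every prefix of the grouping columns, while cube aggregates over every subset. The `SALES` rows, the `DIMS` names and the use of the string "ALL" for a rolled-up column are assumptions made for this example (SQL engines typically use NULL for that marker); it is not taken from the paper.

```python
from itertools import combinations

# Hypothetical fact rows: (region, product, amount). Illustrative values only.
SALES = [
    ("East", "Widget", 10),
    ("East", "Gadget", 20),
    ("West", "Widget", 30),
]
DIMS = ("region", "product")


def aggregate(rows, keep):
    """Sum amounts grouped by the dimension positions in `keep`.

    Positions not in `keep` are replaced by the marker "ALL".
    """
    totals = {}
    for row in rows:
        key = tuple(row[i] if i in keep else "ALL" for i in range(len(DIMS)))
        totals[key] = totals.get(key, 0) + row[-1]
    return totals


def rollup(rows):
    """Group by each prefix of DIMS: (region, product), (region), ()."""
    result = {}
    for n in range(len(DIMS), -1, -1):
        result.update(aggregate(rows, set(range(n))))
    return result


def cube(rows):
    """Group by every subset of DIMS, i.e. 2 ** len(DIMS) groupings."""
    result = {}
    for n in range(len(DIMS) + 1):
        for keep in combinations(range(len(DIMS)), n):
            result.update(aggregate(rows, set(keep)))
    return result


if __name__ == "__main__":
    for key, total in sorted(cube(SALES).items()):
        print(key, total)
```

With two grouping columns, rollup yields per-product-per-region totals, per-region subtotals and a grand total; cube additionally yields per-product totals across all regions.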


Data Warehouse, Data Mart

[Figure: Typical Data Warehouse Architecture. Components shown: Source Systems; Data Staging Area; ETL Services; Metadata Catalog; Dimensional Data Marts including atomic data; Dimensional Data Marts with only aggregated data; Ad Hoc Query and Analysis Tools; Reporting Tools; Other uses]


Relational or Dimensional?

[Figure: normalized relational schema. Tables and columns, with key and index markers in parentheses:]

Categories: CategoryID (PK), CategoryName (U1), Description, Picture

Shippers: ShipperID (PK), CompanyName, Phone

Order Details: OrderID (PK, FK1, I2, I1), ProductID (PK, FK2, I4, I3), UnitPrice, Quantity, Discount

Customers: CustomerID (PK), CompanyName (I2), ContactName, ContactTitle, Address, City (I1), Region (I4), PostalCode (I3), Country, Phone, Fax

Suppliers: SupplierID (PK), CompanyName (I1), ContactName, ContactTitle, Address, City, Region, PostalCode (I2), Country, Phone, Fax, HomePage

Orders: OrderID (PK), CustomerID (FK1, I2, I1), EmployeeID (FK2, I3, I4), OrderDate (I5), RequiredDate, ShippedDate (I6), ShipVia (FK3, I7), Freight, ShipName, ShipAddress, ShipCity, ShipRegion, ShipPostalCode (I8), ShipCountry

Employees: EmployeeID (PK), LastName (I1), FirstName, Title, TitleOfCourtesy, BirthDate, HireDate, Address, City, Region, PostalCode (I2), Country, HomePhone, Extension, Photo, Notes, ReportsTo

Products: ProductID (PK), ProductName (I3), SupplierID (FK2, I5, I4), CategoryID (FK1, I1, I2), QuantityPerUnit, UnitPrice, UnitsInStock, UnitsOnOrder, ReorderLevel, Discontinued


Relational or Dimensional?

(image from http://www.laynetworks.com)
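To make the relational-versus-dimensional contrast concrete, the following is a rough sketch of flattening a few normalized order rows into a fact table plus one denormalized dimension. Table and column names loosely follow the relational schema on the previous slide; the sample rows, the fact grain (one row per order line), the derived `Amount` measure and the function names are assumptions for illustration, not the design shown in the slide image.

```python
# Normalized, OLTP-style tables. All sample rows are made up for illustration.
categories = {1: {"CategoryName": "Beverages"}, 2: {"CategoryName": "Condiments"}}
products = {
    "P1": {"ProductName": "Tea", "CategoryID": 1},
    "P2": {"ProductName": "Jam", "CategoryID": 2},
}
orders = {
    1: {"CustomerID": "C1", "OrderDate": "2006-09-01"},
    2: {"CustomerID": "C2", "OrderDate": "2006-09-02"},
}
order_details = [
    {"OrderID": 1, "ProductID": "P1", "UnitPrice": 5.0, "Quantity": 2, "Discount": 0.0},
    {"OrderID": 1, "ProductID": "P2", "UnitPrice": 10.0, "Quantity": 1, "Discount": 0.5},
    {"OrderID": 2, "ProductID": "P1", "UnitPrice": 5.0, "Quantity": 4, "Discount": 0.0},
]


def build_star():
    """Return (dim_product, fact_sales) derived from the normalized tables."""
    # Dimension: product attributes with the category name folded in.
    dim_product = {
        pid: {
            "ProductName": p["ProductName"],
            "CategoryName": categories[p["CategoryID"]]["CategoryName"],
        }
        for pid, p in products.items()
    }
    # Fact: one row per order line, holding dimension keys plus numeric measures.
    fact_sales = []
    for line in order_details:
        order = orders[line["OrderID"]]
        fact_sales.append({
            "OrderDate": order["OrderDate"],
            "CustomerID": order["CustomerID"],
            "ProductID": line["ProductID"],
            "Quantity": line["Quantity"],
            "Amount": line["UnitPrice"] * line["Quantity"] * (1 - line["Discount"]),
        })
    return dim_product, fact_sales


def sales_by_category(dim_product, fact_sales):
    """Answer an analytic question with a single fact-to-dimension lookup."""
    totals = {}
    for fact in fact_sales:
        category = dim_product[fact["ProductID"]]["CategoryName"]
        totals[category] = totals.get(category, 0.0) + fact["Amount"]
    return totals


if __name__ == "__main__":
    dim_product, fact_sales = build_star()
    print(sales_by_category(dim_product, fact_sales))
```

The design point is that the analytic query touches one fact table and one dimension, whereas the same question against the normalized tables walks Order Details, Products and Categories.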


Bitmap Indices

customer   age 0-10   age 11-20   age 21-30   age 31-40
Mary       1          0           0           0
John       0          1           0           0
Steve      0          0           1           0
Tom        0          0           0           1
Lisa       0          0           1           0

cardinality: unique values/total rows
B-Tree vs. bitmap: 1% rule, uniqueness
Boolean algebra directly on indices
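The idea of running Boolean algebra directly on the indices can be sketched in a few lines of Python, using one integer per distinct value as the bit vector (bit i is set when row i has that value). The customer names and age bands come from the table above; the `state` column and all function names are assumptions added so that an AND across two indexed columns can be shown.

```python
# Rows from the slide's table; the "state" column is a made-up second attribute.
CUSTOMERS = [
    {"name": "Mary", "age_band": "0-10", "state": "MN"},
    {"name": "John", "age_band": "11-20", "state": "WI"},
    {"name": "Steve", "age_band": "21-30", "state": "MN"},
    {"name": "Tom", "age_band": "31-40", "state": "MN"},
    {"name": "Lisa", "age_band": "21-30", "state": "WI"},
]


def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int used as a bit vector."""
    index = {}
    for i, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << i)
    return index


def names_for(bitmap, rows):
    """Return the names of the rows whose bit is set in `bitmap`."""
    return [row["name"] for i, row in enumerate(rows) if (bitmap >> i) & 1]


def cardinality(rows, column):
    """Cardinality as defined on the slide: unique values / total rows."""
    return len({row[column] for row in rows}) / len(rows)


AGE_INDEX = build_bitmap_index(CUSTOMERS, "age_band")
STATE_INDEX = build_bitmap_index(CUSTOMERS, "state")

# age 21-30 OR age 31-40: a bitwise OR of two bitmaps, no table scan needed.
adults = AGE_INDEX["21-30"] | AGE_INDEX["31-40"]
# ... AND state = MN: a bitwise AND with a bitmap from a different column.
adults_in_mn = adults & STATE_INDEX["MN"]

if __name__ == "__main__":
    print(names_for(adults, CUSTOMERS))
    print(names_for(adults_in_mn, CUSTOMERS))
    print(cardinality(CUSTOMERS, "age_band"))
```

A five-row table is far too small to say anything about when a bitmap beats a B-Tree; the sketch only shows the mechanics of combining predicates with bitwise operations before any rows are fetched.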


Validation Methodology

Survey paper goals
Academic and industry citations
Referencing tools, vendors
Case studies


Assumptions

Read-only environments

Shortcomings
  (occasional) transactional commitments
  the data revision problem


2006 Rewrite

Changes in terminology, tools, vendors
  Fact constellations -> conformed dimensions
  Decision support -> BI
  Vendors and tools in BI, ETL, OLAP

Multiple user constituencies

Data history difficulties
  petabyte databases -> very large warehouses common
  data expiry challenges
  slowly changing dimensions


Slowly Changing Dimensions

Before

CustomerID   Name           Status
001          Mary Johnson   Gold

After: Type 1

CustomerID   Name           Status
001          Mary Johnson   Platinum

After: Type 2

CustomerID   Name           Status
001          Mary Johnson   Gold
001          Mary Johnson   Platinum

After: Type 3

CustomerID   Name           Original Status   Current Status   Effective Date
001          Mary Johnson   Gold              Platinum         10/1/2006
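The three update strategies on this slide can be sketched as small Python functions over a list of dimension rows. The column names and the Mary Johnson example follow the slide; the function names are hypothetical. Warehouse implementations of Type 2 commonly also add a surrogate key and validity dates to tell the versions apart; the slide's table omits them, and so does this sketch.

```python
import copy

# The "Before" row from the slide.
BEFORE = [{"CustomerID": "001", "Name": "Mary Johnson", "Status": "Gold"}]


def scd_type1(dim, customer_id, new_status):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    rows = copy.deepcopy(dim)
    for row in rows:
        if row["CustomerID"] == customer_id:
            row["Status"] = new_status
    return rows


def scd_type2(dim, customer_id, new_status):
    """Type 2: keep the old row and append a new row with the new value."""
    rows = copy.deepcopy(dim)
    latest = [r for r in rows if r["CustomerID"] == customer_id][-1]
    rows.append({**latest, "Status": new_status})
    return rows


def scd_type3(dim, customer_id, new_status, effective_date):
    """Type 3: keep one row with original and current value columns."""
    rows = []
    for row in dim:
        changed = row["CustomerID"] == customer_id
        rows.append({
            "CustomerID": row["CustomerID"],
            "Name": row["Name"],
            "Original Status": row["Status"],
            "Current Status": new_status if changed else row["Status"],
            "Effective Date": effective_date if changed else None,
        })
    return rows


if __name__ == "__main__":
    print(scd_type1(BEFORE, "001", "Platinum"))
    print(scd_type2(BEFORE, "001", "Platinum"))
    print(scd_type3(BEFORE, "001", "Platinum", "10/1/2006"))
```

Type 1 is the simplest but cannot answer questions about past status, Type 2 preserves full history at the cost of extra rows, and Type 3 keeps exactly one prior value per tracked attribute.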


Questions?