Online Analytical Processing (OLAP)
Hweichao Lu
CS157B-02 Spring 2007

What is OLAP?

Basic idea: converting data into information that decision makers need

Concept: analyzing data along multiple dimensions in a structure called a data cube

History

In 1993, E. F. Codd coined the term online analytical processing (OLAP) and proposed 12 criteria to define an OLAP database

The term OLAP fits databases designed to facilitate decision making (analysis) in an organization

Purpose of OLAP

To derive summarized information from large-volume databases

To generate automated reports for human viewing

Why OLAP over Relational Databases (I)

Consistently fast response

OLAP obtains a consistently fast response by prestoring calculated values
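A minimal Python sketch of prestoring, assuming a hypothetical three-column sales table: every summary is computed once at load time, so a later query is a dictionary lookup rather than a scan of the detail rows. The names `FACTS`, `precompute_totals`, and `query` are illustrative only.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, product, sales amount).
FACTS = [
    ("West", "Laptop", 1200.0),
    ("West", "Phone", 800.0),
    ("East", "Laptop", 1500.0),
    ("East", "Phone", 700.0),
    ("East", "Phone", 300.0),
]

def precompute_totals(facts):
    """Scan the detail rows once and store the sum for every (region, product)
    pair, every region, every product, and the grand total ("ALL" = summarized)."""
    totals = defaultdict(float)
    for region, product, amount in facts:
        totals[(region, product)] += amount
        totals[(region, "ALL")] += amount
        totals[("ALL", product)] += amount
        totals[("ALL", "ALL")] += amount
    return dict(totals)

TOTALS = precompute_totals(FACTS)

def query(region="ALL", product="ALL"):
    """Answer a summary query with a lookup instead of re-aggregating."""
    return TOTALS.get((region, product), 0.0)

print(query("East"))           # 2500.0
print(query(product="Phone"))  # 1800.0
print(query())                 # 4500.0
```

The trade-off is the one the MOLAP slide names later: the lookup is fast only because the load step already paid the time and space to compute every total.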

Why OLAP over Relational Databases (II)

Metadata-based queries

Provide analysis functions that are difficult or impossible to express in SQL

SQL was developed primarily for transaction systems, not for reporting applications

Why OLAP over Relational Databases (III)

Spreadsheet-style formulas

Design the data structure with users in mind.

Spreadsheets are key components of business management because they are intuitive to create
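As an illustration of spreadsheet-style formulas, here is a sketch in which derived measures (hypothetical `profit` and `margin`) are defined once as formulas over stored measures and evaluated for any cell, much as a spreadsheet formula is filled down a column. The cell data and names are assumptions, not taken from any OLAP product.

```python
# Hypothetical stored measures for each (region, quarter) cell of a cube.
CELLS = {
    ("West", "Q1"): {"sales": 1000.0, "cost": 600.0},
    ("West", "Q2"): {"sales": 1200.0, "cost": 900.0},
    ("East", "Q1"): {"sales": 800.0, "cost": 500.0},
}

# Derived measures written once as formulas. `get` looks up any measure of the
# same cell, so one formula can build on another, as spreadsheet cells do.
FORMULAS = {
    "profit": lambda get: get("sales") - get("cost"),
    "margin": lambda get: get("profit") / get("sales"),
}

def evaluate(cell, measure):
    """Return a stored measure directly, or compute a derived one by formula."""
    stored = CELLS[cell]
    if measure in stored:
        return stored[measure]
    return FORMULAS[measure](lambda name: evaluate(cell, name))

print(evaluate(("West", "Q2"), "profit"))  # 300.0
print(evaluate(("East", "Q1"), "margin"))  # 0.375
```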

Step I

1. Identify multidimensional data

measure attribute (measures some value; can be aggregated upon)

dimension attribute (defines the dimensions along which the measure attribute and its summaries are viewed)
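A small sketch of the distinction, assuming a hypothetical clothing-sales relation: `item_name`, `color`, and `size` are dimension attributes, `number` is the measure attribute, and summarizing means aggregating the measure over the values of a dimension attribute.

```python
from collections import defaultdict

# Hypothetical sales relation: item_name, color and size are dimension
# attributes; number (units sold) is the measure attribute.
SALES = [
    {"item_name": "skirt", "color": "dark", "size": "small", "number": 2},
    {"item_name": "skirt", "color": "pastel", "size": "medium", "number": 5},
    {"item_name": "dress", "color": "dark", "size": "large", "number": 3},
    {"item_name": "shirt", "color": "white", "size": "medium", "number": 7},
    {"item_name": "dress", "color": "white", "size": "small", "number": 4},
]

DIMENSIONS = ("item_name", "color", "size")
MEASURE = "number"

def summarize(rows, by):
    """Sum the measure attribute for each value of one dimension attribute."""
    if by not in DIMENSIONS:
        raise ValueError(f"{by!r} is not a dimension attribute")
    totals = defaultdict(int)
    for row in rows:
        totals[row[by]] += row[MEASURE]
    return dict(totals)

print(summarize(SALES, "color"))  # {'dark': 5, 'pastel': 5, 'white': 11}
```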

(Cont.)

Each dimension is typically expressed as a “hierarchy”

Hierarchy: the analyst is interested in different levels of detail of a dimension (for example, day, month, and year for a time dimension)
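A sketch of a dimension hierarchy, assuming a hypothetical location dimension (city, state, country) stored as a child-to-parent mapping: the same city-level sales are re-aggregated at whichever level of detail the analyst asks for. All names and figures are illustrative.

```python
# Hypothetical location hierarchy: each member maps to its parent one level up.
PARENT = {
    "San Jose": "California",
    "Los Angeles": "California",
    "Seattle": "Washington",
    "California": "USA",
    "Washington": "USA",
}
# Levels of the hierarchy, from finest to coarsest detail.
LEVELS = ["city", "state", "country"]

def ancestor(member, levels_up):
    """Walk up the hierarchy the given number of levels."""
    for _ in range(levels_up):
        member = PARENT[member]
    return member

def total_by_level(city_sales, level):
    """Re-aggregate city-level sales at the requested level of detail."""
    steps = LEVELS.index(level)
    totals = {}
    for city, amount in city_sales.items():
        key = ancestor(city, steps)
        totals[key] = totals.get(key, 0) + amount
    return totals

sales = {"San Jose": 40, "Los Angeles": 60, "Seattle": 30}
print(total_by_level(sales, "state"))    # {'California': 100, 'Washington': 30}
print(total_by_level(sales, "country"))  # {'USA': 130}
```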

Step II

2. Analyze multidimensional data in a cross-tabulation

row header: values of one attribute

column header: values of another attribute

individual cell: aggregated value for that row and column combination
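One way to build such a cross-tab, sketched in Python under the assumption of hypothetical (item_name, color, units) rows: item names become row headers, colors become column headers, each cell sums units, and an extra `total` row and column carry the aggregates along each axis.

```python
# Hypothetical sales rows: (item_name, color, number of units sold).
ROWS = [
    ("skirt", "dark", 8), ("skirt", "pastel", 35), ("skirt", "white", 10),
    ("dress", "dark", 20), ("dress", "pastel", 10), ("dress", "white", 5),
    ("shirt", "dark", 14), ("shirt", "pastel", 7), ("shirt", "white", 28),
]

def cross_tab(rows):
    """Row header = item_name, column header = color, cell = sum of units.
    'total' entries hold the aggregate for each row, column and the table."""
    row_vals, col_vals, cells = [], [], {}
    for r, c, units in rows:
        if r not in row_vals:
            row_vals.append(r)
        if c not in col_vals:
            col_vals.append(c)
        for key in ((r, c), (r, "total"), ("total", c), ("total", "total")):
            cells[key] = cells.get(key, 0) + units
    return row_vals, col_vals, cells

def render(row_vals, col_vals, cells):
    """Lay the cross-tab out as tab-separated text."""
    cols = col_vals + ["total"]
    lines = ["item\t" + "\t".join(cols)]
    for r in row_vals + ["total"]:
        lines.append(r + "\t" + "\t".join(str(cells.get((r, c), 0)) for c in cols))
    return "\n".join(lines)

print(render(*cross_tab(ROWS)))
```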

Step III

3. Visualize the n-dimensional cube: the data cube

The word CUBE describes what, in the relational world, would be the integration of the fact table with the dimension tables
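To make the cube concrete, here is a sketch (data and names hypothetical) that aggregates the measure for every subset of the dimensions, 2^n group-bys for n dimensions, marking a summarized-away dimension as `ALL`. Each key in the result is one cell of the data cube.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"

def data_cube(rows, dims, measure):
    """Aggregate the measure for every subset of the dimensions (2**n group-bys).
    A dimension that is summarized away appears as 'ALL' in the cell key."""
    cube = defaultdict(int)
    n = len(dims)
    for row in rows:
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                key = tuple(row[dims[i]] if i in kept else ALL for i in range(n))
                cube[key] += row[measure]
    return dict(cube)

# Hypothetical fact rows.
SALES = [
    {"item": "skirt", "color": "dark", "size": "small", "units": 2},
    {"item": "skirt", "color": "pastel", "size": "small", "units": 5},
    {"item": "dress", "color": "dark", "size": "large", "units": 3},
]

cube = data_cube(SALES, ("item", "color", "size"), "units")
print(cube[("skirt", ALL, ALL)])  # 7
print(cube[(ALL, "dark", ALL)])   # 5
print(cube[(ALL, ALL, "small")])  # 7
print(cube[(ALL, ALL, ALL)])      # 10
```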

Step IV

After you design the cube, you will use the cube's structure to build a relational database (known as a star schema) to house the data for the cube

Step V

Once you load data into the relational database, and then into the cube, you'll be able to see how attributes, dimensions, measures, and measure groups fit together within a cube to create a powerful analytical tool.

Star Schema

Cubes are easily stored in relational databases using a denormalized data structure called the star schema, popularized by Ralph Kimball

The schema starts with a central fact table. Each row in the central fact table contains a combination of keys that makes it unique. These keys are called dimensions; each one refers to a row in a dimension table.
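A minimal star schema, sketched with Python's built-in `sqlite3` module; the table names (`fact_sales`, `dim_product`, `dim_store`, `dim_date`) and rows are hypothetical. The fact table's composite key is made of dimension keys, and a summary query joins the fact table to the dimension tables it needs and groups by their attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
-- Central fact table: the combination of dimension keys identifies a row;
-- the remaining column is the measure.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    amount     REAL,
    PRIMARY KEY (product_id, store_id, date_id)
);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Computers'), (2, 'Phone', 'Mobile');
INSERT INTO dim_store VALUES (1, 'San Jose', 'CA'), (2, 'Seattle', 'WA');
INSERT INTO dim_date VALUES (1, '2007-01-15', '2007-01', 2007), (2, '2007-02-03', '2007-02', 2007);
INSERT INTO fact_sales VALUES (1, 1, 1, 1200), (2, 1, 1, 800), (1, 2, 2, 1500), (2, 2, 1, 700);
""")

# Summarize the measure by attributes taken from two dimension tables.
rows = conn.execute("""
    SELECT s.state, p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_store s   ON f.store_id = s.store_id
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY s.state, p.category
    ORDER BY s.state, p.category
""").fetchall()
print(rows)
```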

Slicing & Dicing

Additional functionality that can be thought of as viewing a slice of the data cube, particularly when values for multiple dimensions are fixed.

Slicing/dicing simply consists of selecting specific values for these attributes, which are then displayed on top of the cross-tab
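A sketch of both operations on a tiny hypothetical cube held as a Python dict. It follows the common convention that a slice fixes one dimension to a single value, while a dice selects a subset of values on several dimensions at once; the function names are illustrative.

```python
# Hypothetical cube cells keyed by (year, region, product), holding sales.
DIMS = ("year", "region", "product")
CUBE = {
    (2006, "West", "Laptop"): 900, (2006, "East", "Laptop"): 1100,
    (2007, "West", "Laptop"): 1200, (2007, "East", "Laptop"): 1500,
    (2007, "West", "Phone"): 800, (2007, "East", "Phone"): 1000,
}

def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value; that dimension drops out."""
    i = DIMS.index(dim)
    return {key[:i] + key[i + 1:]: v for key, v in cube.items() if key[i] == value}

def dice_cube(cube, **allowed):
    """Dice: keep only cells whose values lie in the allowed sets on several dimensions."""
    wanted = {DIMS.index(dim): set(values) for dim, values in allowed.items()}
    return {
        key: v for key, v in cube.items()
        if all(key[i] in values for i, values in wanted.items())
    }

print(slice_cube(CUBE, "year", 2007))
print(dice_cube(CUBE, region=["East"], product=["Laptop"]))
```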

Rollup & Drill-down

OLAP permits users to view data at any desired level of granularity.

Rollup: moving from finer-granularity data to coarser granularity

Drill-down: the opposite of rollup, moving from coarser-granularity data to finer granularity
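A sketch of both moves over a hypothetical time hierarchy (day, month, year): `roll_up` and `drill_down` step one level coarser or finer, and `view` re-aggregates the same daily facts at the chosen level. The data and function names are illustrative.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily facts: (sale date, amount).
FACTS = [
    (date(2007, 1, 15), 100),
    (date(2007, 1, 20), 50),
    (date(2007, 2, 3), 75),
    (date(2008, 1, 9), 200),
]

# Time hierarchy from finest to coarsest granularity.
LEVELS = ["day", "month", "year"]
KEY = {
    "day": lambda d: d.isoformat(),
    "month": lambda d: d.strftime("%Y-%m"),
    "year": lambda d: str(d.year),
}

def view(level):
    """Aggregate the facts at the given level of the hierarchy."""
    out = defaultdict(int)
    for d, amount in FACTS:
        out[KEY[level](d)] += amount
    return dict(out)

def roll_up(level):
    """One step coarser (stays at the top level if already there)."""
    return LEVELS[min(LEVELS.index(level) + 1, len(LEVELS) - 1)]

def drill_down(level):
    """One step finer (stays at the bottom level if already there)."""
    return LEVELS[max(LEVELS.index(level) - 1, 0)]

level = "month"
print(view(level))              # {'2007-01': 150, '2007-02': 75, '2008-01': 200}
print(view(roll_up(level)))     # {'2007': 225, '2008': 200}
print(view(drill_down(level)))  # daily totals
```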

OLAP Implementation

Multidimensional OLAP (MOLAP)

Relational OLAP (ROLAP)

Hybrid OLAP (HOLAP)

MOLAP

The database is stored in a special, usually proprietary, structure that is optimized for multidimensional analysis.

+: very fast query response time because data is mostly precalculated

-: practical limit on size because of the time taken to calculate the database and the space required to hold these precalculated values
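To see where the size limit comes from, here is a sketch of a dense, MOLAP-style store (dimension names and members hypothetical): every combination of members gets a slot in one flat array, and prestoring every total adds an `ALL` member to each dimension, so the cell count is the product of (members + 1) over all dimensions and grows multiplicatively with each new dimension.

```python
from math import prod

# Hypothetical dimensions; a member's position in its list is its array index.
DIMS = {
    "month": ["Jan", "Feb", "Mar"],
    "region": ["East", "West"],
    "product": ["Laptop", "Phone"],
}
SHAPE = [len(members) for members in DIMS.values()]
POSITION = {dim: {m: i for i, m in enumerate(members)} for dim, members in DIMS.items()}

def offset(coords):
    """Row-major position of a cell in one flat array. The cell is located by
    arithmetic on member positions instead of scanning rows or joining tables."""
    pos = 0
    for dim, size in zip(DIMS, SHAPE):
        pos = pos * size + POSITION[dim][coords[dim]]
    return pos

# Every cell is allocated up front, including combinations with no sales.
cells = [0.0] * prod(SHAPE)
cells[offset({"month": "Feb", "region": "West", "product": "Phone"})] = 800.0

def full_cube_cells(cardinalities):
    """Cells needed to also prestore every total: each dimension gains an 'ALL' member."""
    return prod(n + 1 for n in cardinalities)

print(full_cube_cells([12, 50, 1000]))       # 663663
print(full_cube_cells([12, 50, 1000, 100]))  # 67029963
```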

ROLAP

The database is a standard relational database, and the database model is a multidimensional model, often referred to as a star or snowflake model or schema.

+: more scalable solution

-: query performance is largely governed by the complexity of the SQL and the number and size of the tables being joined in the query
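A sketch of what a ROLAP engine does with a request, assuming hypothetical star-schema metadata (`LEVELS`, `JOIN_KEYS`): it translates a request such as "sales by state and year" into a single SQL query. Each additional dimension table the request touches adds a join, which is where the performance cost described above comes from.

```python
# Hypothetical star-schema metadata: the dimension table and column for each level.
LEVELS = {
    "year": ("dim_date", "year"),
    "month": ("dim_date", "month"),
    "state": ("dim_store", "state"),
    "category": ("dim_product", "category"),
}
# Key column shared by the fact table and each dimension table.
JOIN_KEYS = {"dim_date": "date_id", "dim_store": "store_id", "dim_product": "product_id"}

def build_rolap_sql(levels, measure="amount", fact="fact_sales"):
    """Translate a list of requested levels into one GROUP BY query.
    Only the dimension tables the request needs are joined, each once."""
    if not levels:
        raise ValueError("request at least one level")
    tables = []
    for lvl in levels:
        table = LEVELS[lvl][0]
        if table not in tables:
            tables.append(table)
    columns = ", ".join(f"{t}.{c}" for t, c in (LEVELS[lvl] for lvl in levels))
    joins = "\n".join(
        f"JOIN {t} ON {fact}.{JOIN_KEYS[t]} = {t}.{JOIN_KEYS[t]}" for t in tables
    )
    return (
        f"SELECT {columns}, SUM({fact}.{measure})\n"
        f"FROM {fact}\n"
        f"{joins}\n"
        f"GROUP BY {columns}"
    )

print(build_rolap_sql(["state", "year"]))
```

Level names come only from the fixed `LEVELS` mapping, so no user text is pasted into the SQL; a real engine would also bind filter values as query parameters.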

HOLAP

A hybrid of ROLAP and MOLAP

Can be thought of as a virtual database whereby the higher levels of the database are implemented as MOLAP and the lower levels of the database as ROLAP
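A sketch of that split under stated assumptions: a hypothetical `sales` table in an in-memory SQLite database plays the ROLAP part, and a Python dict of precomputed year/region totals plays the MOLAP part. Yearly questions are answered from the precomputed store; drilling down to a month falls through to SQL on the detail rows.

```python
import sqlite3

# Lower levels: detail rows stay in a relational table (the ROLAP part).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, month TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (2007, "2007-01", "East", 100.0),
    (2007, "2007-02", "East", 50.0),
    (2007, "2007-01", "West", 75.0),
    (2008, "2008-01", "West", 200.0),
])

# Higher levels: totals precomputed once into a dict (standing in for the MOLAP part).
YEAR_REGION = {
    (year, region): total
    for year, region, total in conn.execute(
        "SELECT year, region, SUM(amount) FROM sales GROUP BY year, region")
}

def total(year, region, month=None):
    """Serve yearly totals from the precomputed store; drill to month via SQL."""
    if month is None:
        return YEAR_REGION.get((year, region), 0.0)
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales "
        "WHERE year = ? AND region = ? AND month = ?",
        (year, region, month)).fetchone()
    return row[0]

print(total(2007, "East"))             # 150.0
print(total(2007, "East", "2007-01"))  # 100.0
```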

DOLAP

The previous terms refer to server-based OLAP technologies

DOLAP (Desktop OLAP)

DOLAP enables users to quickly pull together small cubes that run on their desktops or laptops

Conclusion

OLAP is a significant improvement over query systems

OLAP is an interactive system that shows different summaries of multidimensional data as the user selects attributes in a multidimensional data cube
