Online Analytical Processing by Hweichao Lu


  • 8/7/2019 Online Analytical Processing by Hweichao Lu


    Online Analytical Processing (OLAP)

    Hweichao Lu

    CS157B-02 Spring 2007


    What is OLAP

    Basic idea: converting data into information that decision makers need

    A concept for analyzing data along multiple dimensions in a structure called a data cube


    History

    In 1993, E. F. Codd coined the term online analytical processing (OLAP) and proposed 12 criteria to define an OLAP database

    The term OLAP seems perfect to describe databases designed to facilitate decision making (analysis) in an organization


    Purpose of OLAP

    To derive summarized information from large-volume databases

    To generate automated reports for human viewing


    Why we need OLAP over a relational database I

    Consistently fast response

    OLAP obtains a consistently fast response by prestoring calculated values
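
    A minimal sketch of the prestoring idea, assuming a tiny in-memory list of sales facts. The data and the names SALES, total_by_scan, build_region_totals, and total_prestored are hypothetical and only illustrate the contrast between scanning at query time and looking up a value calculated in advance.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, product, amount). Illustrative data only.
SALES = [
    ("West", "shirt", 120),
    ("West", "pants", 80),
    ("East", "shirt", 200),
    ("East", "pants", 50),
    ("East", "shirt", 30),
]


def total_by_scan(region):
    """Query-time aggregation: scan every row on each request."""
    return sum(amount for r, _, amount in SALES if r == region)


def build_region_totals(rows):
    """Prestoring step: compute every region total once, ahead of time."""
    totals = defaultdict(int)
    for region, _, amount in rows:
        totals[region] += amount
    return dict(totals)


REGION_TOTALS = build_region_totals(SALES)  # done at load time, not per query


def total_prestored(region):
    """Query time is a single dictionary lookup, regardless of row count."""
    return REGION_TOTALS.get(region, 0)
```

    Both functions return the same totals; the difference is that the prestored version pays the aggregation cost once at load time, which is why its response time stays flat as the fact data grows.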


    Why we need OLAP over a relational database II

    Metadata-based queries

    OLAP provides analysis functions that are difficult or impossible to express in SQL

    SQL was developed primarily for transaction systems, not for reporting applications


    Why we need OLAP over a relational database III

    Spreadsheet-style formulas

    Design the data structure with users in mind.

    Spreadsheets are key components of business management because they are intuitive to create


    Step I

    1. Identify multidimensional data

    Measure attribute (measures some value and can be aggregated upon)

    Dimension attribute (defines the dimensions on which measure attributes, and summaries of them, are viewed)


    (Cont.)

    Each dimension is typically expressed as a hierarchy

    Hierarchy: the analyst is interested in different levels of detail of a dimension
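
    As an illustration of a dimension hierarchy, the sketch below assumes a time dimension with the levels year, quarter, month, and day. The function name time_hierarchy and the label formats are hypothetical choices, not part of any particular OLAP product.

```python
from datetime import date


def time_hierarchy(d):
    """Return the levels of a hypothetical time hierarchy, coarsest first."""
    quarter = (d.month - 1) // 3 + 1
    return {
        "year": d.year,
        "quarter": f"{d.year}-Q{quarter}",
        "month": f"{d.year}-{d.month:02d}",
        "day": d.isoformat(),
    }


levels = time_hierarchy(date(2007, 5, 14))
```

    Each fact dated 2007-05-14 can then be summarized at any of the four levels, which is what lets an analyst choose the level of detail.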


    Step II

    2. Analyze multidimensional data in a cross-tabulation

    Row header: values of one attribute

    Column header: values of another attribute

    Individual cell: aggregated value
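
    A cross-tabulation can be sketched in a few lines. The example assumes hypothetical rows of (item, color, quantity), where item and color are dimension attributes and quantity is the measure attribute; the names ROWS, cross_tab, and render are illustrative only.

```python
from collections import defaultdict

# Hypothetical rows: (item, color, quantity). "item" and "color" are
# dimension attributes; "quantity" is the measure attribute.
ROWS = [
    ("skirt", "dark", 8),
    ("skirt", "pastel", 35),
    ("dress", "dark", 20),
    ("dress", "pastel", 10),
    ("skirt", "dark", 2),
]


def cross_tab(rows):
    """Build {row_header: {column_header: summed measure}}."""
    table = defaultdict(lambda: defaultdict(int))
    for item, color, quantity in rows:
        table[item][color] += quantity
    return {item: dict(cols) for item, cols in table.items()}


def render(table):
    """Format the cross-tab as plain text with one line per row header."""
    columns = sorted({c for cols in table.values() for c in cols})
    lines = ["item".ljust(8) + "".join(c.rjust(8) for c in columns)]
    for item in sorted(table):
        cells = "".join(str(table[item].get(c, 0)).rjust(8) for c in columns)
        lines.append(item.ljust(8) + cells)
    return "\n".join(lines)


TABLE = cross_tab(ROWS)
print(render(TABLE))
```

    The two skirt/dark rows collapse into a single cell, which is the aggregation step that distinguishes a cross-tab from a plain listing of rows.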


    Step III

    3. Visualize the n-dimensional cube (data cube)

    The word "cube" describes what in the relational world would be the integration of the fact table with the dimension tables
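
    One way to picture a data cube is as the set of aggregates over every subset of the dimensions, with a placeholder marking each dimension that has been summed away. The sketch below assumes three hypothetical dimensions and a handful of made-up fact rows; build_cube and the "all" placeholder are illustrative names, and real OLAP servers store cubes far more compactly.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("item", "color", "size")  # hypothetical dimension attributes

# Hypothetical fact rows; "quantity" is the measure attribute.
FACTS = [
    {"item": "skirt", "color": "dark", "size": "small", "quantity": 2},
    {"item": "skirt", "color": "dark", "size": "large", "quantity": 6},
    {"item": "dress", "color": "pastel", "size": "small", "quantity": 4},
]

ALL = "all"  # placeholder marking a dimension that has been summed away


def build_cube(facts, dimensions, measure):
    """Aggregate the measure over every subset of the dimensions."""
    cube = defaultdict(int)
    for kept_count in range(len(dimensions) + 1):
        for kept in combinations(dimensions, kept_count):
            for row in facts:
                key = tuple(row[d] if d in kept else ALL for d in dimensions)
                cube[key] += row[measure]
    return dict(cube)


CUBE = build_cube(FACTS, DIMENSIONS, "quantity")
```

    With n dimensions there are 2^n such groupings, so three dimensions give eight; a lookup such as CUBE[("skirt", "all", "all")] then answers "total quantity of skirts" without touching the fact rows.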


    Step IV

    After you design the cube, you will use the cube's structure to build a relational database (known as a star schema) to house the data for the cube


    Step V

    Once you load data into the relational database, and then into the cube, you'll be able to see how attributes, dimensions, measures, and measure groups fit together within a cube to create a powerful analytical tool.


    Star Schema

    Cubes are easily stored in relational databases, using a denormalized data structure called the star schema, developed by Ralph Kimball

    A star schema starts with a central fact table

    Each row in the central fact table contains some combination of keys that makes it unique.

    These keys are called dimensions.
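
    The star schema can be sketched with Python's built-in sqlite3 module. The table and column names (dim_product, dim_store, fact_sales) and the rows are hypothetical; the point is only the shape: a central fact table whose key is a combination of dimension keys, surrounded by dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row per member, with descriptive attributes.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT, region TEXT);

    -- Central fact table: the dimension keys together identify each row.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        quantity   INTEGER,
        PRIMARY KEY (product_id, store_id)
    );

    INSERT INTO dim_product VALUES (1, 'skirt', 'clothing'), (2, 'dress', 'clothing');
    INSERT INTO dim_store   VALUES (1, 'San Jose', 'West'), (2, 'Durham', 'East');
    INSERT INTO fact_sales  VALUES (1, 1, 10), (2, 1, 5), (1, 2, 7);
""")

# Join the fact table to a dimension table and summarize by region.
REGION_TOTALS = conn.execute("""
    SELECT s.region, SUM(f.quantity)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY s.region
    ORDER BY s.region
""").fetchall()
```

    Every analytical query follows the same pattern: start at the fact table, join outward to whichever dimension tables supply the grouping attributes, and aggregate the measure.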


    Slicing & Dicing

    Additional functionality that can be thought of as viewing a slice of the data cube; the term dicing is used particularly when values for multiple dimensions are fixed.

    Slicing/dicing simply consists of selecting specific values for these attributes, which are then displayed on top of the cross-tab
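
    Slicing and dicing amount to filtering cube cells on fixed dimension values. The sketch below assumes a small hypothetical cube keyed by (item, color, size); the helper name dice and its keyword-argument interface are illustrative choices.

```python
# Hypothetical cube cells keyed by (item, color, size) -> quantity.
CELLS = {
    ("skirt", "dark", "small"): 2,
    ("skirt", "dark", "large"): 6,
    ("skirt", "pastel", "small"): 3,
    ("dress", "dark", "small"): 5,
    ("dress", "pastel", "large"): 4,
}
DIMENSIONS = ("item", "color", "size")


def dice(cells, **fixed):
    """Keep only the cells whose coordinates match every fixed value."""
    positions = {DIMENSIONS.index(name): value for name, value in fixed.items()}
    return {
        key: quantity
        for key, quantity in cells.items()
        if all(key[i] == value for i, value in positions.items())
    }


SMALL_SLICE = dice(CELLS, size="small")               # one dimension fixed
DARK_SMALL = dice(CELLS, size="small", color="dark")  # several dimensions fixed
```

    The fixed values (here size and color) are the ones a cross-tab would display above the table, while the remaining dimension supplies the row and column headers.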


    Rollup & Drill-down

    OLAP permits users to view data at any desired level of granularity.

    Rollup: moving from finer-granularity data to coarser granularity

    Drill-down: the opposite of rollup, moving from coarser-granularity data to finer granularity
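
    Rollup and drill-down can be sketched over a hypothetical time hierarchy of year, quarter, and month. The names MONTHLY, rollup, and drill_down and the figures are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sales at the finest granularity: (year, quarter, month) -> amount.
MONTHLY = {
    (2007, "Q1", "Jan"): 100,
    (2007, "Q1", "Feb"): 120,
    (2007, "Q1", "Mar"): 90,
    (2007, "Q2", "Apr"): 150,
}


def rollup(cells, depth):
    """Aggregate to a coarser level by keeping the first `depth` key parts."""
    totals = defaultdict(int)
    for key, amount in cells.items():
        totals[key[:depth]] += amount
    return dict(totals)


def drill_down(cells, prefix):
    """Return the finer-grained cells underneath one coarse cell."""
    return {k: v for k, v in cells.items() if k[:len(prefix)] == prefix}


BY_QUARTER = rollup(MONTHLY, 2)
BY_YEAR = rollup(MONTHLY, 1)
Q1_DETAIL = drill_down(MONTHLY, (2007, "Q1"))
```

    Note the asymmetry: rollup can be computed from finer data, but drill-down cannot be derived from coarser totals, so it needs the finer-granularity data to still be available.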


    OLAP Implementation

    Multidimensional OLAP (MOLAP)

    Relational OLAP (ROLAP)

    Hybrid OLAP (HOLAP)


    MOLAP

    The database is stored in a special, usually proprietary, structure that is optimized for multidimensional analysis.

    +: very fast query response time because data is mostly pre-calculated

    -: practical limit on the size because of the time taken to calculate the database and the space required to hold these pre-calculated values
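
    A rough way to see the size limit is to count the cells of a fully pre-calculated cube. The sketch assumes dense storage in which every dimension gains one extra "all" member for its totals; actual MOLAP engines typically store sparse cubes and pre-calculate selectively, so this is an upper bound for illustration, and full_cube_cells is a hypothetical helper.

```python
from math import prod


def full_cube_cells(cardinalities):
    """Cells in a fully pre-calculated dense cube.

    Each dimension contributes its member count plus one "all" member.
    """
    return prod(c + 1 for c in cardinalities)


small = full_cube_cells([10, 10, 10])        # three dimensions of 10 members
larger = full_cube_cells([1000, 100, 365])   # e.g. products x stores x days
```

    Because the count is a product, adding a dimension or enlarging one multiplies the number of values to calculate and store, which is where the practical limit comes from.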


    ROLAP

    The database is a standard relational database and the database model is a multidimensional model, often referred to as a star or snowflake model or schema.

    +: more scalable solution

    -: performance of the queries will be largely governed by the complexity of the SQL and the number and size of the tables being joined in the query


    HOLAP

    A hybrid of ROLAP and MOLAP

    Can be thought of as a virtual database whereby the higher levels of the database are implemented as MOLAP and the lower levels of the database as ROLAP
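
    The hybrid split can be sketched as a query router: higher-level (region) totals come from a pre-calculated in-memory summary, standing in for the MOLAP part, and lower-level (city) detail is fetched with SQL from a relational table, standing in for the ROLAP part. The schema, data, and the query function are hypothetical, and an in-memory SQLite database is assumed only to keep the example self-contained.

```python
import sqlite3

# ROLAP side: detail rows live in a relational table (in-memory SQLite here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("West", "San Jose", 10), ("West", "Fresno", 4), ("East", "Durham", 7)],
)

# MOLAP side: higher-level totals are pre-calculated once and kept in memory.
SUMMARY = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))


def query(region, city=None):
    """Answer region-level questions from the summary, city-level ones from SQL."""
    if city is None:
        return SUMMARY.get(region, 0)
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ? AND city = ?",
        (region, city),
    ).fetchone()
    return row[0]
```

    A call such as query("West") never touches the relational table, while query("West", "Fresno") does; the user sees one database even though two storage strategies sit behind it.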


    DOLAP

    The previous terms refer to server-based OLAP technologies

    DOLAP (Desktop OLAP)

    DOLAP enables users to quickly pull together small cubes that run on their desktops or laptops


    Conclusion

    OLAP is a significant improvement over query systems

    OLAP is an interactive system that shows different summaries of multidimensional data by interactively selecting the attributes in a multidimensional data cube

