CogNova Technologies. COMP 3503: Deductive Modeling with OLAP, with Daniel L. Silver. Copyright (c), 2007. All Rights Reserved.

Introduction to KDD for Tony's MI Course


Page 1: Introduction to KDD for Tony's MI Course

CogNova Technologies

COMP 3503

Deductive Modeling with OLAP

with Daniel L. Silver

Copyright (c), 2007 All Rights Reserved

Page 2: Introduction to KDD for Tony's MI Course

Agenda

• What is OLAP?
• OLAP, MOLAP and ROLAP
• OLAP Functionality
• Overview of Cognos PowerPlay
• OLAP Pros and Cons

Page 3: Introduction to KDD for Tony's MI Course

What is OLAP?

Page 4: Introduction to KDD for Tony's MI Course

On-Line Analytical Processing

OLAP
• Term coined by E.F. Codd in a document published in 1993, sponsored by Arbor Software Corp (ESSBASE)
• Redefined requirements for tools to implement decision support and business intelligence systems.
• Has had a significant impact on the database and business software market.

Page 5: Introduction to KDD for Tony's MI Course

OLAP Definition

• Online Analytical Processing (OLAP) refers to technology that allows users of multidimensional databases to generate on-line descriptive or comparative summaries ("views") of data and other analytic queries.
• OLAP facilities can (and should) be integrated into enterprise-wide database systems. They allow analysts and managers to monitor the performance of the business (e.g., various aspects of the manufacturing process, or numbers and types of completed transactions at different locations) or the market.

Courtesy Anders Stjarne

Page 6: Introduction to KDD for Tony's MI Course

Multidimensional Requirements

Example: Sales volume as a function of product, time, and geography.

[Figure: data cube with axes Product, Geography, and Time]

Dimensions: Product, Geography, Time

Measure: ‘Sales Volume’

Courtesy Anders Stjarne

A data cube with more than three dimensions is referred to as a hypercube.
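A minimal sketch of the cube idea in Python, assuming a plain dictionary as the storage structure. The fact records and the names `facts` and `cube` are made up for illustration; they are not from the course material.

```python
from collections import defaultdict

# Hypothetical fact records: (product, geography, time, sales_volume)
facts = [
    ("Cola", "West", "Q1", 120),
    ("Cola", "East", "Q1", 80),
    ("Juice", "West", "Q1", 50),
    ("Cola", "West", "Q2", 130),
]

# The cube maps a coordinate (one value per dimension) to the measure.
cube = defaultdict(int)
for product, geography, time, volume in facts:
    cube[(product, geography, time)] += volume

# Look up one cell: sales volume of Cola in the West during Q2.
cell = cube[("Cola", "West", "Q2")]
```

Adding a fourth dimension (a hypercube) only lengthens the coordinate tuple; the lookup stays the same.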

Page 7: Introduction to KDD for Tony's MI Course

Deductive Modelling and Analysis

[Figure: Comprehensive Sales Analysis. Two example combinations (Combination 1 and Combination 2) are chosen from the same five groups of levels.]

• When? Time (1997): Quarter, Month
• Who? Customers (Channels): Type, Customer
• What? Product (Type): Line, Brand, Number
• Where? Location (Region): Country, Branch, Sales Rep
• Result? Indicator (Revenue): Quantity, Cost, Margin

Courtesy Anders Stjarne

Page 8: Introduction to KDD for Tony's MI Course

On-Line Analytical Processing

• Strong connection to the multi-dimensional database model - MOLAP
• Data-cubes are typically constructed off-line due to the time required to build indices
• Dimensions, values, and aggregations are limited to those within the data-cube
• On-line cube development has allowed RDBMS vendors to survive as major players in the OLAP market - ROLAP

Page 9: Introduction to KDD for Tony's MI Course

On-Line Analytical Processing

12 Rules of an OLAP Environment by E.F. Codd
• Multi-dimensional - data-cubes or hypercubes
• Transparent access
• Navigation aids
• Consistent reporting
• Client-server based
• Generic dimensionality
• Efficient data storage
• Multi-user support
• Unrestricted cross-dimensional operations
• Intuitive data manipulation
• Flexible reporting
• Unlimited levels of aggregation

Page 10: Introduction to KDD for Tony's MI Course

OLAP Functionality

Page 11: Introduction to KDD for Tony's MI Course

On-Line Analytical Processing

Deductive Modeling with OLAP
• Model is developed within the user's mind as data is explored
• Verification or rejection is facilitated by multi-dimensional functions which display data numerically and graphically

Best practices:
• Determine suspected variable interaction
• Verify/reject model through exploration
• Drill-down to refine model
• Maintain record of exploratory findings

Page 12: Introduction to KDD for Tony's MI Course

On-Line Analytical Processing

Basic OLAP Functionality
• Dimension selection - slice & dice
• Rotation - allows change in perspective
• Filtration - value range selection
• Hierarchies of aggregation levels
  o drill-downs to lower levels
  o roll-ups to higher levels

Tremendous tool for decision support and executive information delivery and analysis

Page 13: Introduction to KDD for Tony's MI Course

OLAP - Sample Operations

• Roll up: summarize data
  o total sales volume last year by product category by region
• Roll down, drill down, drill through: go from higher level summary to lower level summary or detailed data
  o For a particular product category, find the detailed sales data for each salesperson by date
• Slice and dice: select and project
  o Sales of beverages in the West over the last 6 months
• Pivot or rotate: change visual dimensions

Courtesy Anders Stjarne
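The sample operations above can be sketched over a small in-memory cube. This is an illustration only: the cell values and the helper names `roll_up` and `dice` are assumptions, not the API of any OLAP product.

```python
from collections import defaultdict

# Hypothetical cells: (category, region, month) -> sales
cells = {
    ("Beverages", "West", "Jan"): 10,
    ("Beverages", "West", "Feb"): 12,
    ("Beverages", "East", "Jan"): 7,
    ("Snacks", "West", "Jan"): 5,
}
DIMENSIONS = ("category", "region", "month")

def roll_up(cells, keep):
    """Sum the measure over every dimension whose index is not in `keep`."""
    out = defaultdict(int)
    for coords, value in cells.items():
        out[tuple(coords[i] for i in keep)] += value
    return dict(out)

def dice(cells, **allowed):
    """Select the cells whose coordinates fall inside the allowed sets."""
    return {
        coords: value
        for coords, value in cells.items()
        if all(coords[DIMENSIONS.index(name)] in values
               for name, values in allowed.items())
    }

# Roll up: total sales by category by region (month is summed away).
by_category_region = roll_up(cells, keep=(0, 1))

# Slice and dice: beverages in the West, all months kept.
west_beverages = dice(cells, category={"Beverages"}, region={"West"})

# Pivot: swap the two visible dimensions of the rolled-up view.
pivoted = {(region, category): v
           for (category, region), v in by_category_region.items()}
```

Drill down is the reverse of the roll up here: going from `by_category_region` back to the month-level entries in `cells`.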

Page 14: Introduction to KDD for Tony's MI Course

OLAP and Data Mining

• The final results from OLAP exploration can lead to inductive data mining
• Data Mining techniques can be applied to the data views and summaries generated by OLAP to provide more in-depth and often more multidimensional knowledge
• Data Mining techniques can be considered an analytic extension of OLAP

Page 15: Introduction to KDD for Tony's MI Course

OLAP, MOLAP and ROLAP

Page 16: Introduction to KDD for Tony's MI Course

OLAP Distributed Framework

OLAP functions are independent of:
• Front-end user interface
• Back-end data storage

[Figure: components shown are the Data Source (Data Mart); Staged Multi-Dim Data, or a Multi-Dim Data Structure populated in real time (on the fly); the OLAP Tool (server) “CUBE”; and the Front-end client tool (Web browser, Spread Sheet)]

Courtesy Anders Stjarne

Page 17: Introduction to KDD for Tony's MI Course

MOLAP vs. ROLAP

Multidimensional
• difficulty handling sparsity efficiently
• direct representation of the data “cube”
• rapid drill down on summary data
• proprietary solutions
• better performance response
• does not scale well to handle large amounts of detail
• thin client, analytical processing done on server

Relational
• multidimensional view built on a Relational DBMS
• hampered by the limitations of SQL
• handles sparsity automatically
• stores summary and detail data equally easily
• easy to share common dimensions across DWs
• scales well using well-developed relational technology
• depends on efficient processing of STAR joins and indexes
• analytical processing done on the client (or middle server)

REF: White, “MOLAP vs ROLAP,” (B&A-15)

Courtesy Anders Stjarne
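A rough sketch of why sparsity matters, assuming a naive dense array for the multidimensional case and a fact table with one row per non-empty cell for the relational case. The dimension sizes and rows are hypothetical, and real MOLAP engines typically compress sparse regions rather than allocating every cell.

```python
# Hypothetical dimension sizes.
products, stores, days = 125, 72, 730

# Naive dense cube: one slot for every possible combination.
dense_cells = products * stores * days

# Relational-style fact table: rows exist only for non-empty cells.
# Columns: (product_id, store_id, day_id, revenue)
fact_rows = [
    (17, 3, 0, 42.0),
    (17, 3, 1, 38.5),
    (99, 40, 365, 12.0),
]

# Fraction of the dense cube that actually holds data.
density = len(fact_rows) / dense_cells
```

With these assumed numbers the dense layout reserves millions of cells for three facts, while the fact table grows only with the data that exists.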

Page 18: Introduction to KDD for Tony's MI Course

Overview of Cognos PowerPlay OLAP

Page 19: Introduction to KDD for Tony's MI Course

PowerPlay for Windows Components

PowerPlay includes the following components:
• Transformer
  o Used to define the contents of a cube and create the cube
• PowerPlay
  o Accesses cubes for data exploration and reporting.

Courtesy Anders Stjarne

Page 20: Introduction to KDD for Tony's MI Course

PowerPlay Cubes

• A cube is a structure that stores data multi-dimensionally and provides:
  o secure data access
  o fast retrieval of data.
• Cubes can be distributed across a network or to individual computers.

[Figure: cube labeled Customers, Channels, Products, Locations, Sales Reps, Time]

Courtesy Anders Stjarne

Page 21: Introduction to KDD for Tony's MI Course

Measures

• The numeric (continuous) data that is collected and stored by your organization.
• The performance measures used to evaluate your business.

Examples:
• Revenue
• Cost
• Quantity sold
• Units on-hand
• Hours per Job
• Number of calls
• Defective units.

[Figure: basic and derived measures (#, %)]

Basic measures combine into derived measures: Revenue - Cost = Profit Margin

Courtesy Anders Stjarne
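As a small illustration of a derived measure, the sketch below computes Profit Margin from the two basic measures using the slide's formula. The product names and figures are hypothetical.

```python
# Hypothetical basic measures per product.
basic = {
    "Widget": {"revenue": 200.0, "cost": 150.0},
    "Gadget": {"revenue": 80.0, "cost": 100.0},
}

def with_profit_margin(measures):
    """Return a copy with the derived measure added: revenue - cost."""
    derived = dict(measures)
    derived["profit_margin"] = measures["revenue"] - measures["cost"]
    return derived

report = {name: with_profit_margin(m) for name, m in basic.items()}
```

The derived value is computed from stored measures rather than stored itself, so it stays consistent when the basic measures are re-aggregated.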

Page 22: Introduction to KDD for Tony's MI Course

Dimensions and Levels

• Dimensions are a broad group of descriptive data about the major aspects of your business.
• Levels represent the established hierarchy within dimensions.

Dimensions and their levels:
• When? Date: Years, Months, Days
• What? Products: Line, Type, Product
• Where? Locations: Region, Country, Branch

Courtesy Anders Stjarne

Page 23: Introduction to KDD for Tony's MI Course

Levels and Categories

• A category is a data item that populates a level in a dimension.

Dimension: Locations

Levels and their categories:
• Region: Europe
• Country: United Kingdom
• Branch: London, U.K.; Manchester, U.K.

Courtesy Anders Stjarne
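One way to picture levels and categories is as a parent map that aggregation walks upward. The sketch below uses the Locations categories from the slide; the revenue figures and the function name `roll_up_hierarchy` are assumptions.

```python
from collections import defaultdict

# Parent of each category, following the Locations example.
parent = {
    "London, U.K.": "United Kingdom",
    "Manchester, U.K.": "United Kingdom",
    "United Kingdom": "Europe",
}

# Hypothetical revenue recorded at the lowest level (Branch).
branch_revenue = {"London, U.K.": 300, "Manchester, U.K.": 200}

def roll_up_hierarchy(leaf_values, parent):
    """Add each leaf value to the leaf and to every ancestor above it."""
    totals = defaultdict(int)
    for category, value in leaf_values.items():
        node = category
        while node is not None:
            totals[node] += value
            node = parent.get(node)  # None once the top level is passed
    return dict(totals)

totals = roll_up_hierarchy(branch_revenue, parent)
```

Drilling down from Europe to United Kingdom to a branch is then a matter of reading `totals` at successively lower levels.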

Page 24: Introduction to KDD for Tony's MI Course

Application Development Process

• Plan measures and dimensions
• Obtain the required data
• Develop the PowerPlay model
• Create the cube
• Explore the cube data using PowerPlay

Sales Management Example

Measures: Revenue, Units, Discounts, Quota

Dimensions, with the number of categories at each level:
• All Years: Years 2, Quarters 8, Months 24
• National Sales Force: State 4, City 16, Store 72, Technician 158
• All Products: Business Units 3, Product Lines 6, Brands 18, Products 125

Courtesy Anders Stjarne

Page 25: Introduction to KDD for Tony's MI Course

Explorer and Reporter

PowerPlay offers two report modes:
• Explorer: investigate; replace categories
• Reporter: build custom reports; add categories

Courtesy Anders Stjarne

Page 26: Introduction to KDD for Tony's MI Course

Explorer Crosstab Report

The default Explorer crosstab report contains:
• the first two dimensions in the rows and columns
• values for the first measure
• a summary row and column.

[Figure: crosstab layout with labeled Rows, Columns, Summary column, Summary row, and Measures]

Courtesy Anders Stjarne
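The crosstab layout can be sketched in plain Python. This is not PowerPlay's implementation, only an illustration of a rows-by-columns table with a summary row and column; the cell data and the `crosstab` helper are hypothetical.

```python
def crosstab(cells):
    """Build a rows-by-columns table with a summary row and column."""
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    table = {}
    for r in rows:
        # Missing combinations are shown as 0 in this sketch.
        table[r] = {c: cells.get((r, c), 0) for c in cols}
        table[r]["Total"] = sum(table[r][c] for c in cols)
    # Summary row: column totals, plus the grand total.
    table["Total"] = {
        c: sum(table[r][c] for r in rows) for c in cols + ["Total"]
    }
    return table

# Hypothetical cells: (year, product line) -> revenue
cells = {("1996", "Tents"): 10, ("1996", "Packs"): 4, ("1997", "Tents"): 12}
table = crosstab(cells)
```

Here the first dimension (year) supplies the rows, the second (product line) the columns, and revenue plays the role of the first measure.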

Page 27: Introduction to KDD for Tony's MI Course

PowerPlay Toolbar and Menus

• You can access commonly used features on the PowerPlay toolbar.
• PowerPlay menus offer extended features.
• Right-click a report to view and use the available options from a shortcut menu.

Courtesy Anders Stjarne

Page 28: Introduction to KDD for Tony's MI Course

The Dimension Line

Use the dimension line to:
• filter data
• navigate dimensions and change measures
• view the current level.

Courtesy Anders Stjarne

Page 29: Introduction to KDD for Tony's MI Course

Dimension Viewer

• The dimension viewer is used to view the content and navigation paths of a selected cube, and the cube path.
• The toolbox buttons provide access to commonly used features.

[Figure: dimension viewer with labeled Toolbox, Cube path, and Measures. Dimension = Locations; Level 1 = States, Category = CA; Level 2 = Cities, Category = San Diego]

Courtesy Anders Stjarne

Page 30: Introduction to KDD for Tony's MI Course

PowerPlay File Extensions

• .ppr, .ppx, .pdf for reports
• .mdc for cubes

Courtesy Anders Stjarne

Page 31: Introduction to KDD for Tony's MI Course

Basic OLAP Operations

• Selection (Filter) - within the range of a dimension
• Scope - the range on a dimension
• Slice - a two dimensional ‘page’ from the cube
• Dice - chopping up along the dimensions
• Drill down analysis - to the detail beneath summary data
• Rollup / Consolidate
• Rotate (Pivot) - change dimension orientation
  o Swap rows and columns
  o Swap on or off
  o Change nesting order
• Reach Through - to the source data detail
• Calculations / Derivation formulas on the measured facts
  o Ratios, Rankings, etc.
  o E.g., NetSales = GrossSales - Cost; NetSales = GrossSales*(1 - Margin)

REFS: INMON, Building, Ch. 7, p. 243; White, “MOLAP vs ROLAP,” (B&A-15)

Courtesy Anders Stjarne

Page 32: Introduction to KDD for Tony's MI Course

Advanced OLAP Operations

• Trend analysis - over broad vistas of time
  o handling time series data, time calculations
• Key ratio indicator measurement and tracking
• Comparisons - present to: past, plan, and others
  o competitive market analysis
• Problem monitoring - of variables within control limits
• Alerts and Event-Driven Agent Processing

Courtesy Anders Stjarne
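Problem monitoring and alerts can be sketched as a control-limit check. The rule used here (mean plus or minus `k` standard deviations of past values) is one common convention and an assumption of this example, as are the data and the function name `out_of_control`.

```python
from statistics import mean, pstdev

def out_of_control(history, latest, k=3.0):
    """Flag `latest` if it lies more than k standard deviations
    from the mean of `history`."""
    centre = mean(history)
    spread = pstdev(history)
    lower, upper = centre - k * spread, centre + k * spread
    return not (lower <= latest <= upper)

# Hypothetical defect rates (%) from earlier periods.
defect_rate = [2.0, 2.2, 1.9, 2.1, 2.0, 1.8]

# New observations that would raise an alert.
alerts = [x for x in (2.1, 3.5) if out_of_control(defect_rate, x)]
```

An event-driven agent would run a check like this whenever new data is loaded and notify the analyst only when `alerts` is non-empty.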

Page 33: Introduction to KDD for Tony's MI Course

OLAP Pros and Cons

Page 34: Introduction to KDD for Tony's MI Course

On-Line Analytical Processing

Strengths of OLAP
• Powerful visualization ability via GUI
• Fast, interactive response times
• Analysis of time series
• Deductive discovery of clusters/exceptions
• Many OLAP products available and integrated with DB products

Page 35: Introduction to KDD for Tony's MI Course

On-Line Analytical Processing

Weaknesses of OLAP
• Does not handle continuous variables
• Does not automatically discover patterns and models
• Generation of a hypercube requires some training and experience
• Hypercube generation and update - MOLAP vs. ROLAP

Page 36: Introduction to KDD for Tony's MI Course

On-Line Analytical Processing

Products and Suppliers
• PC OLAP
  o PowerPlay (Cognos)
• High-end ROLAP
  o DSS Agent (Microstrategy)
  o InfoBeacon (Platinum Technology)
• High-end MOLAP
  o Accumate (Kenan)
  o Oracle Express (Oracle)
  o Wired/ESSBASE (AppSource/Arbor Software)

Page 37: Introduction to KDD for Tony's MI Course

Tutorial

• Cognos Transformer and PowerPlay
• Star Schema: http://www.ciobriefings.com/whitepapers/StarSchema.asp

Page 38: Introduction to KDD for Tony's MI Course

THE END

[email protected]@acadiau.ca

Page 39: Introduction to KDD for Tony's MI Course

Codd’s 18 Rules for OLAP

BASIC FEATURES:
• Multidimensional Conceptual View (#1)
• Intuitive data manipulation (#10)
• Accessibility (#3) - OLAP server engine as middleware
• Batch Extraction & Interpretive (on the fly) - implies hybrid
• OLAP Analysis Models - categorical, exegetical, contemplative, goal seeking
• Client-Server Architecture (#5)
• Transparency (#2)
• Multi-User Support (#8) - concurrent access, and update, with security

SPECIAL FEATURES:
• Treatment of Non-Normalized Data
• Storing OLAP Results separate from Source Data
• Extraction of Missing Values - missing (NULL) distinct from zero
• Treatment of Missing Values - excluded from statistical calculations

REPORTING FEATURES:
• Flexible Reporting (#11) - laying out dimensions in any way
• Uniform Reporting Performance (#4) - does not vary by number of dimensions, or size
• Automatic Adjustment of Physical Level (#7) - adjust for sparsity, size

DIMENSION CONTROL:
• Generic Dimensionality (#6) - all dimensions treated uniformly
• Unlimited Dimensions & Aggregation Levels (#12)
• Unrestricted Cross-Dimensional Operations (#9)

Courtesy Anders Stjarne
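The two missing-value rules above can be illustrated with a short sketch. It assumes Python's `None` stands in for a missing (NULL) value; the data and the `average` helper are hypothetical.

```python
def average(values):
    """Mean of the non-missing values; None marks a missing value."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

# Hypothetical units sold: None = not reported, 0 = reported as zero.
units_sold = [10, None, 0, 20]

# Missing value excluded from the calculation.
avg_excluding_missing = average(units_sold)

# What the result would be if missing were wrongly treated as zero.
avg_if_missing_were_zero = sum(v or 0 for v in units_sold) / len(units_sold)
```

The two averages differ, which is why a missing value has to be kept distinct from a genuine zero.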