Building OLAP Applications with Mondrian and MySQL Julian Hyde, lead developer April 26, 2006


Agenda

Introduction to OLAP

Key features and architecture

Getting started

Schema Design

MDX Query language

Tuning

Business Intelligence suite

Q & A

What is OLAP?

View data “dimensionally”, i.e. sales by region, by channel, by time period

Navigate and explore: ad hoc analysis; “drill-down” from year to quarter; pivot; select specific members for analysis

Interact with high performance: technology optimized for rapid interactive response

A brief history of OLAP

[Figure: timeline of OLAP products and milestones, with a time axis running from 1970 to 2010. Labels shown: APL; Express; Comshare; Mainframe OLAP; Spreadsheets; Desktop OLAP; Essbase; Codd’s “12 Rules of OLAP”; Cognos PowerPlay; Business Objects; MicroStrategy; Microsoft OLAP Services; RDBMS support for DW (Sybase IQ; Informix MetaCube; Oracle partitions, bitmap indexes); Open-source OLAP; Mondrian; JPivot; JRubik; Palo; Open-source BI suites (OpenI, BEE); Pentaho.]

Mondrian features and architecture

Key Features

On-Line Analytical Processing (OLAP) cubes: automated aggregation; speed-of-thought response times

Open architecture: 100% Java; J2EE; supports any JDBC data source; no processing required

Analysis viewers: enable ad hoc, interactive data exploration; ability to slice-and-dice, drill down, and pivot; provide insights into problems or successes

How Mondrian Extends MySQL for OLAP Applications

MySQL Provides

Data storage

SQL query execution

Heavy-duty sorting, correlation, aggregation

Integration point for all BI tools

Mondrian Provides

Dimensional view of data

MDX parsing

SQL generation

Caching

Higher-level calculations

Aggregate awareness
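The division of labour above (the database stores and aggregates, the OLAP layer generates SQL and knows about aggregate tables) can be illustrated with a small sketch. This is not Mondrian's implementation: the table names (`store`, `sales_fact`, `agg_sales_region`), the data, and the `generate_sql` helper are all hypothetical, and sqlite3 stands in for MySQL so that the example runs without a server.

```python
import sqlite3

# Hypothetical star schema: one fact table and one dimension table.
# sqlite3 is used in place of MySQL purely so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (store_id INTEGER PRIMARY KEY, region TEXT, city TEXT);
    CREATE TABLE sales_fact (store_id INTEGER REFERENCES store(store_id),
                             unit_sales INTEGER);
    INSERT INTO store VALUES (1, 'West', 'Los Angeles'),
                             (2, 'West', 'Portland'),
                             (3, 'East', 'Boston');
    INSERT INTO sales_fact VALUES (1, 10), (1, 5), (2, 7), (3, 4);
    -- Pre-computed summary at the region level (an aggregate table).
    CREATE TABLE agg_sales_region AS
        SELECT s.region AS region, SUM(f.unit_sales) AS unit_sales
        FROM sales_fact f JOIN store s ON f.store_id = s.store_id
        GROUP BY s.region;
""")

AGG_LEVELS = {"region"}  # levels that the aggregate table can answer


def generate_sql(levels):
    """Build a GROUP BY query for the requested dimension levels.

    `levels` is a non-empty list of trusted column names, not user input.
    """
    cols = ", ".join(levels)
    if set(levels) <= AGG_LEVELS:
        # Aggregate-aware path: the summary table already covers these levels.
        return (f"SELECT {cols}, SUM(unit_sales) FROM agg_sales_region "
                f"GROUP BY {cols} ORDER BY {cols}")
    # Fallback: join the fact table to the dimension table.
    return (f"SELECT {cols}, SUM(f.unit_sales) FROM sales_fact f "
            f"JOIN store s ON f.store_id = s.store_id "
            f"GROUP BY {cols} ORDER BY {cols}")


def query(levels):
    """Generate the SQL for `levels`, run it, and return all rows."""
    return conn.execute(generate_sql(levels)).fetchall()
```

Calling `query(["region"])` reads the small summary table, while `query(["region", "city"])` falls back to the fact table join; both return the same totals that a direct query on the fact table would give.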

[Diagram: three cube schemas, each defined in XML]

Open Architecture

Open Standards (Java, XML, MDX, JOLAP, XML/A, SQL)

Cross Platform (Windows, Unix/Linux, …)

J2EE Architecture: server clustering; fault tolerance

Data Sources: JDBC; JNDI; >10 SQL dialects

[Architecture diagram. Labels shown: Viewers; Web Server (JPivot servlet, XML/A servlet); J2EE Application Server; Mondrian; cubes; File or RDBMS Repository; JDBC; RDBMS.]

Getting started

Getting started with Mondrian

1. Download Mondrian from http://mondrian.sourceforge.net and unzip it.

2. Copy mondrian.war to TOMCAT_HOME/webapps.

3. Create a database:

    $ mysqladmin create foodmart
    $ mysql
    mysql> grant all privileges on *.* to 'foodmart'@'localhost' identified by 'foodmart';
    Query OK, 0 rows affected (0.00 sec)

    mysql> quit
    Bye

4. Install the demo dataset (300,000 rows):

    $ java -cp "/mondrian/lib/mondrian.jar:/mondrian/lib/log4j-1.2.9.jar:/mondrian/lib/eigenbase-xom.jar:/mondrian/lib/eigenbase-resgen.jar:/mondrian/lib/eigenbase-properties.jar:/usr/local/mysql/mysql-connector-java-3.1.12-bin.jar" \
        mondrian.test.loader.MondrianFoodMartLoader \
        -verbose -tables -data -indexes \
        -jdbcDrivers=com.mysql.jdbc.Driver \
        -inputFile=/mondrian/demo/FoodMartCreateData.sql \
        -outputJdbcURL="jdbc:mysql://localhost/foodmart?user=foodmart&password=foodmart"

5. Start Tomcat and browse to http://localhost:8080/mondrian

demonstration

Designing a Mondrian schema

A Mondrian schema consists of…

A dimensional model (logical): cubes and virtual cubes; shared and private dimensions; calculated measures in cube and in query language; parent-child hierarchies

… mapped onto a star/snowflake schema (physical): fact table; dimension tables; joined by foreign key relationships

Writing a Mondrian Schema

Regular cubes, dimensions, hierarchies

Shared dimensions

Virtual cubes

Parent-child hierarchies

Custom readers

Access-control

<?xml version="1.0"?>
<Schema name="FoodMart">
  <!-- Shared dimensions -->
  <Dimension name="Region">
    <Hierarchy hasAll="true" allMemberName="All Regions">
      <Table name="QUADRANT_ACTUALS"/>
      <Level name="Region" column="REGION" uniqueMembers="true"/>
    </Hierarchy>
  </Dimension>
  <Dimension name="Department">
    <Hierarchy hasAll="true" allMemberName="All Departments">
      <Table name="QUADRANT_ACTUALS"/>
      <Level name="Department" column="DEPARTMENT" uniqueMembers="true"/>
    </Hierarchy>
  </Dimension>

(Refer to http://mondrian.sourceforge.net/schema.html.)
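Because the schema is plain XML, it can be inspected with any XML parser before it is handed to Mondrian. The sketch below uses Python's standard library to list each shared dimension with its levels, table, and column. The `describe_dimensions` helper is hypothetical, and the closing `</Schema>` tag is an assumption added so that the excerpt parses (the slide shows only the start of the file).

```python
import xml.etree.ElementTree as ET

# The shared-dimension excerpt from the slide, with a closing </Schema>
# tag added (an assumption) so that the fragment is well-formed.
SCHEMA_XML = """<?xml version="1.0"?>
<Schema name="FoodMart">
  <!-- Shared dimensions -->
  <Dimension name="Region">
    <Hierarchy hasAll="true" allMemberName="All Regions">
      <Table name="QUADRANT_ACTUALS"/>
      <Level name="Region" column="REGION" uniqueMembers="true"/>
    </Hierarchy>
  </Dimension>
  <Dimension name="Department">
    <Hierarchy hasAll="true" allMemberName="All Departments">
      <Table name="QUADRANT_ACTUALS"/>
      <Level name="Department" column="DEPARTMENT" uniqueMembers="true"/>
    </Hierarchy>
  </Dimension>
</Schema>"""


def describe_dimensions(schema_xml):
    """Map each dimension name to a list of (level, table, column) tuples."""
    root = ET.fromstring(schema_xml)
    result = {}
    for dim in root.findall("Dimension"):
        levels = []
        for hierarchy in dim.findall("Hierarchy"):
            table = hierarchy.find("Table").get("name")
            for level in hierarchy.findall("Level"):
                levels.append((level.get("name"), table, level.get("column")))
        result[dim.get("name")] = levels
    return result
```

A check like this catches malformed markup (for example, curly quotes pasted from a slide) early, since `ET.fromstring` raises `ParseError` on input that is not well-formed.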

Mapping

Each catalog object (cube, hierarchy, measure) maps onto a part of a snowflake schema

<DimensionUsage> links a shared dimension into a particular cube

<CalculatedMember> defines a member using a formula

<Cube name="Quadrant Analysis">
  <Table name="QUADRANT_ACTUALS"/>
  <DimensionUsage name="Region" source="Region"/>
  <DimensionUsage name="Department" source="Department"/>
  <DimensionUsage name="Positions" source="Positions"/>
  <Measure name="Actual" column="ACTUAL" aggregator="sum" formatString="#,###.00"/>
  <Measure name="Budget" column="BUDGET" aggregator="sum" formatString="#,###.00"/>
  <Measure name="Variance" column="VARIANCE" aggregator="sum" formatString="#,###.00"/>
  <CalculatedMember name="Variance Percent" dimension="Measures">
    <Formula>([Measures].[Variance] / [Measures].[Budget]) * 100</Formula>
  </CalculatedMember>
</Cube>
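The `Variance Percent` formula is applied to the aggregated `Variance` and `Budget` values of each cell, not to individual fact rows. A small worked example, using made-up figures and hypothetical helper names, shows why that distinction matters:

```python
# Hypothetical fact rows: (region, budget, variance).
ROWS = [
    ("Central", 1000.0, 100.0),
    ("Central", 3000.0, -60.0),
    ("Western", 2000.0, 50.0),
]


def variance_percent(region):
    """Apply (Variance / Budget) * 100 to the summed measures of one cell."""
    budget = sum(b for r, b, v in ROWS if r == region)
    variance = sum(v for r, b, v in ROWS if r == region)
    return variance / budget * 100


def naive_row_average(region):
    """Average of per-row percentages: NOT what the formula computes."""
    pcts = [v / b * 100 for r, b, v in ROWS if r == region]
    return sum(pcts) / len(pcts)
```

For "Central", the summed variance is 40 against a summed budget of 4000, so the calculated member yields about 1%. Averaging the two row-level percentages (10% and -2%) would give about 4%, which overstates the variance because it ignores the size of each budget.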

Security

Role-based access to data

Container authenticates and chooses role (Mondrian does not authenticate)

Examples:

Alan can only see sales for Electronics products

Beatrice can see aggregated sales, but cannot drill down to individual customers

Charles cannot slice using Gender hierarchy

<Role name="California manager">
  <SchemaGrant access="none">
    <CubeGrant cube="Sales" access="all">
      <HierarchyGrant hierarchy="[Store]" access="custom" topLevel="[Store].[Store Country]">
        <MemberGrant member="[Store].[USA].[CA]" access="all"/>
        <MemberGrant member="[Store].[USA].[CA].[Los Angeles]" access="none"/>
      </HierarchyGrant>
      <HierarchyGrant hierarchy="[Customers]" access="custom" topLevel="[Customers].[State Province]" bottomLevel="[Customers].[City]">
        <MemberGrant member="[Customers].[USA].[CA]" access="all"/>
        <MemberGrant member="[Customers].[USA].[CA].[Los Angeles]" access="none"/>
      </HierarchyGrant>
      <HierarchyGrant hierarchy="[Gender]" access="none"/>
    </CubeGrant>
  </SchemaGrant>
</Role>
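One way to read the member grants in the role above is as an ordered list in which each grant covers a member and its descendants, and a later grant overrides an earlier one. The sketch below models only that reading. It is a simplification, not Mondrian's access-control code: it ignores `topLevel` and `bottomLevel`, it does not model the visibility of ancestors of a granted member, and the `access` helper is hypothetical.

```python
# Ordered member grants copied from the [Store] hierarchy of the role above.
# Each member is written as a path from the top of the hierarchy.
STORE_GRANTS = [
    (("USA", "CA"), "all"),
    (("USA", "CA", "Los Angeles"), "none"),
]


def access(member, grants, default="none"):
    """Return the access level of the last grant that covers `member`.

    A grant covers the granted member itself and all of its descendants.
    Members that no grant covers get `default` (simplifying assumption).
    """
    result = default
    for granted, level in grants:
        if member[:len(granted)] == granted:
            result = level
    return result
```

Under this model, `access(("USA", "CA", "San Francisco"), STORE_GRANTS)` is `"all"`, while Los Angeles and every store beneath it resolve to `"none"`, as does any member outside California.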

MDX query language

Who needs a query language?

Your end-users don’t need a query language

A query language helps you build more complex reports

MDX features:

Calculated members and sets

Localized format strings

Parameters

Member properties

Large set of built-in functions

MDX – Multi-Dimensional Expressions

A language for multidimensional queries

Plays the same role in Mondrian’s API as SQL does in JDBC

SQL-like syntax

… but un-SQL-like semantics

A query consists of:

Axes (usually ROWS and COLUMNS)

Slicer expression (WHERE)

Members and measures

SELECT {[Measures].[Unit Sales]} ON COLUMNS,

{[Store].[USA], [Store].[USA].[CA]} ON ROWS

FROM [Sales]

WHERE [Time].[1997].[Q1]

MDX evaluation

1. Every query is based on a cube

2. Slicer sets the context

3. Columns and rows expressions are evaluated

4. Each cell in the resulting grid is evaluated. Calculated members cause recursive evaluation

WITH MEMBER [Measures].[Margin] AS

'([Measures].[Store Sales] - [Measures].[Store Cost]) / [Measures].[Unit Sales]'

SELECT {[Measures].[Unit Sales], [Measures].[Margin]} ON COLUMNS,

Filter({[Store].[USA], [Store].[USA].Children}, [Measures].[Unit Sales] > 1000) ON ROWS

FROM [Sales]

WHERE ([Time].[1997].[Q1], [Product].[Beer])
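The four evaluation steps can be mimicked with a toy evaluator. This is an illustrative model of the query above, not how Mondrian works internally: the fact rows are invented, the cube is a Python list, and `matches`, `evaluate`, and `run_query` are hypothetical names.

```python
# Step 1: the "cube" is a list of hypothetical fact rows.
FACTS = [
    {"store": ("USA", "CA"), "time": ("1997", "Q1"), "product": "Beer",
     "Unit Sales": 1200, "Store Sales": 3000.0, "Store Cost": 1800.0},
    {"store": ("USA", "OR"), "time": ("1997", "Q1"), "product": "Beer",
     "Unit Sales": 400, "Store Sales": 900.0, "Store Cost": 700.0},
    {"store": ("USA", "CA"), "time": ("1997", "Q2"), "product": "Beer",
     "Unit Sales": 5000, "Store Sales": 9000.0, "Store Cost": 4000.0},
    {"store": ("USA", "WA"), "time": ("1997", "Q1"), "product": "Wine",
     "Unit Sales": 2000, "Store Sales": 8000.0, "Store Cost": 5000.0},
]

# Calculated members are formulas over other measures.
CALCULATED = {
    "Margin": lambda ctx: (evaluate("Store Sales", ctx)
                           - evaluate("Store Cost", ctx))
                          / evaluate("Unit Sales", ctx),
}


def matches(row, ctx):
    """True if the fact row falls inside the member context `ctx`."""
    return (row["store"][:len(ctx["store"])] == ctx["store"]
            and row["time"][:len(ctx["time"])] == ctx["time"]
            and row["product"] == ctx["product"])


def evaluate(measure, ctx):
    """Evaluate one cell; calculated members recurse into base measures."""
    if measure in CALCULATED:
        return CALCULATED[measure](ctx)
    return sum(row[measure] for row in FACTS if matches(row, ctx))


def run_query():
    slicer = {"time": ("1997", "Q1"), "product": "Beer"}   # step 2: WHERE
    columns = ["Unit Sales", "Margin"]                      # step 3: COLUMNS
    candidates = [("USA",), ("USA", "CA"), ("USA", "OR"), ("USA", "WA")]
    rows = [m for m in candidates                           # step 3: Filter()
            if evaluate("Unit Sales", {**slicer, "store": m}) > 1000]
    # Step 4: evaluate every cell of the rows x columns grid.
    return {(m, c): evaluate(c, {**slicer, "store": m})
            for m in rows for c in columns}
```

With this data the filter keeps `USA` (1600 units) and `USA.CA` (1200 units) and drops Oregon and Washington, so the grid has two rows. The Q2 and Wine rows never contribute because the slicer excludes them before any cell is evaluated.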

demonstration

Advanced MDX

Calculated members

Calculated sets

Parameters

Time-series functions

Format strings

For additional details, see MDX resources on MSDN: http://msdn.microsoft.com/library/en-us/olapdmad/agmdxbasics_1bzs.asp

Tuning

Tuning Mondrian

Some techniques:

Test performance on a realistic data set!

Use tracing to identify long-running SQL statements

Tune SQL queries

Run the same query several times, to examine cache effectiveness

When Mondrian starts, prime the cache by running queries

Get to know Mondrian’s system properties
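Two of the techniques above, running the same query repeatedly and priming the cache at startup, can be tried out with a small harness. The sketch below does not call Mondrian: `execute_query` is a hypothetical stand-in whose cache is Python's `functools.lru_cache`, and the sleep merely simulates a slow SQL statement.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def execute_query(mdx):
    """Hypothetical stand-in for sending an MDX query to an OLAP server."""
    time.sleep(0.01)  # simulate a slow SQL statement on a cache miss
    return f"result of {mdx}"


def time_runs(mdx, repeats=3):
    """Run the same query several times and return each elapsed time."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        execute_query(mdx)
        timings.append(time.perf_counter() - start)
    return timings


def prime_cache(queries):
    """Run a list of commonly used queries once, e.g. at startup."""
    for mdx in queries:
        execute_query(mdx)
```

After `time_runs(...)`, `execute_query.cache_info()` reports one miss and two hits for a query that was not primed. If the later runs are not markedly faster than the first against a real server, the query is probably not being served from cache.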

Tuning MySQL for Mondrian

Fact table: index foreign keys; try indexing combinations of foreign keys; use MyISAM or MERGE (also known as MRG_MyISAM); use partitioned tables (MySQL 5.1)

Dimension tables: index primary and foreign keys; use MyISAM for large dimension tables, MEMORY for smaller dimension tables

Create aggregate tables: they contain pre-computed summaries and prevent a full table scan followed by a massive GROUP BY

Parent-child hierarchies: create closure tables
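A closure table stores every ancestor/descendant pair of a parent-child hierarchy, so that a roll-up needs one join instead of a recursive walk. The sketch below builds one for a tiny, made-up employee hierarchy. The table and column names are hypothetical, sqlite3 stands in for MySQL, and the code assumes that the hierarchy contains no cycles.

```python
import sqlite3

# Hypothetical parent-child data: (employee_id, supervisor_id, salary).
EMPLOYEES = [(1, None, 100), (2, 1, 60), (3, 1, 50), (4, 2, 30)]


def build_closure(employees):
    """Return (supervisor_id, employee_id, distance) for every ancestor pair.

    Each employee is also paired with itself at distance 0.
    Assumes the hierarchy has no cycles.
    """
    parent = {emp: sup for emp, sup, _ in employees}
    closure = []
    for emp in parent:
        ancestor, distance = emp, 0
        while ancestor is not None:
            closure.append((ancestor, emp, distance))
            ancestor, distance = parent[ancestor], distance + 1
    return closure


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, "
             "supervisor_id INTEGER, salary INTEGER)")
conn.execute("CREATE TABLE employee_closure (supervisor_id INTEGER, "
             "employee_id INTEGER, distance INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", EMPLOYEES)
conn.executemany("INSERT INTO employee_closure VALUES (?, ?, ?)",
                 build_closure(EMPLOYEES))


def rollup_salary():
    """Total salary under each supervisor, using one join and a GROUP BY."""
    return conn.execute("""
        SELECT c.supervisor_id, SUM(e.salary)
        FROM employee_closure c
        JOIN employee e ON e.employee_id = c.employee_id
        GROUP BY c.supervisor_id
        ORDER BY c.supervisor_id
    """).fetchall()
```

Here employee 1 rolls up all four salaries, and employee 2 rolls up its own salary plus that of employee 4, without any recursive query.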

The whole enchilada

Business Intelligence Suite

Mondrian OLAP server

Analysis tools: pivot table; charting

Dashboards

ETL (extract/transform/load)

Integration with operational reporting

Integration with data mining

Actions on operational data

Design/tuning tools

Pentaho Open Source BI Offerings

All available under a free, open-source license

Q & A

Next Steps and Resources

More information: http://www.pentaho.org and http://mondrian.sourceforge.net

Pentaho Community Forum: http://www.pentaho.org (go to Developer Zone, then Discussions)

Pentaho BI Platform including Pentaho Analysis: http://sourceforge.net/project/showfiles.php?group_id=140317

Pentaho Analysis (pentaho_demo-1.1.4.239.zip)

Pentaho Report Design Wizard (pentaho-report-design-wizard-1.1.4.167.zip)

Mondrian OLAP Library only: http://sourceforge.net/project/showfiles.php?group_id=35302

Thank you for attending!

Extra material

Architecture

Storage Layer

Star Layer

Presentation Layer

Dimensional Layer

Extending Mondrian

User-defined functions

User-defined cell, member and property formatters

Member readers

Schema processors

Localized resources

End User UI

JPivot (sister SourceForge project) is a JSP UI

Analytical grids

Charting

Tag library

Administration UI

Workbench (written in Java/SWT)
