Building OLAP Applications with Mondrian and MySQL
Julian Hyde, lead developer
April 26, 2006
Agenda
Introduction to OLAP
Key features and architecture
Getting started
Schema Design
MDX Query language
Tuning
Business Intelligence suite
Q & A
What is OLAP?
View data “dimensionally”
  i.e. Sales by region, by channel, by time period
Navigate and explore
  Ad hoc analysis
  “Drill-down” from year to quarter
  Pivot
  Select specific members for analysis
Interact with high performance
  Technology optimized for rapid interactive response
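As a rough illustration of the dimensional view and of drill-down, the sketch below rolls up a handful of invented fact rows by region and by year, then drills from 1997 into its quarters. The `FACTS` rows and the `rollup` helper are hypothetical and exist only for this example; a real OLAP server does this against a database, not in-memory lists.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, year, quarter, sales). Values are invented.
FACTS = [
    ("West", 1997, "Q1", 100.0),
    ("West", 1997, "Q2", 150.0),
    ("East", 1997, "Q1", 80.0),
    ("East", 1997, "Q2", 70.0),
    ("East", 1998, "Q1", 90.0),
]

def rollup(facts, key):
    """Sum sales grouped by key(row), where key picks the dimension member."""
    totals = defaultdict(float)
    for row in facts:
        totals[key(row)] += row[3]
    return dict(totals)

# Sales by region (one dimension).
by_region = rollup(FACTS, lambda r: r[0])

# Sales by year, then "drill down" from 1997 to its quarters.
by_year = rollup(FACTS, lambda r: r[1])
quarters_1997 = rollup([r for r in FACTS if r[1] == 1997], lambda r: r[2])
```

The quarter totals for 1997 add up to the 1997 year total, which is the property drill-down relies on.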
A brief history of OLAP
(Timeline figure, 1970 to 2010. Items shown: APL; Express; Comshare; Mainframe OLAP; Spreadsheets; Desktop OLAP; Essbase; Cognos PowerPlay; Business Objects; MicroStrategy; Codd’s “12 Rules of OLAP”; Microsoft OLAP Services; RDBMS support for DW: Sybase IQ, Informix MetaCube, Oracle partitions, bitmap indexes; Open-source OLAP; Mondrian; JPivot; JRubik; Palo; Open-source BI suites; OpenI; BEE; Pentaho.)
Mondrian features and architecture
Key Features
On-Line Analytical Processing (OLAP) cubes
  Automated aggregation
  Speed-of-thought response times
Open Architecture
  100% Java
  J2EE
  Supports any JDBC data source
  No processing required
Analysis Viewers
  Enables ad-hoc, interactive data exploration
  Ability to slice-and-dice, drill-down, and pivot
  Provides insights into problems or successes
How Mondrian Extends MySQL for OLAP Applications
MySQL Provides
Data storage
SQL query execution
Heavy-duty sorting, correlation, aggregation
Integration point for all BI tools
Mondrian Provides
Dimensional view of data
MDX parsing
SQL generation
Caching
Higher-level calculations
Aggregate awareness
Cube Schema
XML
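The division of labor above (the OLAP layer generates SQL and caches results, the database executes the aggregation) can be sketched as follows. This is a simplified model, not Mondrian's implementation: SQLite stands in for MySQL, and the `sales_fact` table, `generate_sql`, and `cell_values` are hypothetical names invented for the example.

```python
import sqlite3

# SQLite stands in for MySQL here; table and data are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (region TEXT, unit_sales INTEGER)")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)",
                 [("West", 10), ("West", 5), ("East", 7)])

_cache = {}    # (level column, measure column) -> {member: value}
sql_log = []   # every SQL statement actually sent to the database

def generate_sql(level_column, measure_column):
    """Turn a dimensional request into a GROUP BY statement.

    Column names come from a trusted schema definition, not from user input.
    """
    return (f"SELECT {level_column}, SUM({measure_column}) "
            f"FROM sales_fact GROUP BY {level_column}")

def cell_values(level_column, measure_column):
    """Return aggregated values, hitting the database only on a cache miss."""
    key = (level_column, measure_column)
    if key not in _cache:
        sql = generate_sql(level_column, measure_column)
        sql_log.append(sql)
        _cache[key] = dict(conn.execute(sql).fetchall())
    return _cache[key]

first = cell_values("region", "unit_sales")    # runs SQL
second = cell_values("region", "unit_sales")   # served from the cache
```

After both calls `sql_log` holds a single statement, because the second request never reaches the database.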
Open Architecture
Open Standards (Java, XML, MDX, JOLAP, XML/A, SQL)
Cross Platform (Windows, Unix/Linux, …)
J2EE Architecture
  Server Clustering
  Fault Tolerance
Data Sources
  JDBC
  JNDI
  >10 SQL dialects
(Architecture diagram. Labels: Viewers; Web Server; J2EE Application Server; JPivot servlet; XML/A servlet; Mondrian; cube; File or RDBMS Repository; JDBC; RDBMS.)
Getting started
Getting started with Mondrian
1. Download Mondrian from http://mondrian.sourceforge.net and unzip
2. Copy mondrian.war to TOMCAT_HOME/webapps.
3. Create a database:
   $ mysqladmin create foodmart
   $ mysql
   mysql> grant all privileges on *.* to 'foodmart'@'localhost' identified by 'foodmart';
   Query OK, 0 rows affected (0.00 sec)
   mysql> quit
   Bye
4. Install demo dataset (300,000 rows):
   $ java -cp "/mondrian/lib/mondrian.jar:/mondrian/lib/log4j-1.2.9.jar:/mondrian/lib/eigenbase-xom.jar:/mondrian/lib/eigenbase-resgen.jar:/mondrian/lib/eigenbase-properties.jar:/usr/local/mysql/mysql-connector-java-3.1.12-bin.jar" \
       mondrian.test.loader.MondrianFoodMartLoader \
       -verbose -tables -data -indexes \
       -jdbcDrivers=com.mysql.jdbc.Driver \
       -inputFile=/mondrian/demo/FoodMartCreateData.sql \
       -outputJdbcURL="jdbc:mysql://localhost/foodmart?user=foodmart&password=foodmart"
5. Start Tomcat, and browse to http://localhost:8080/mondrian
demonstration
Designing a Mondrian schema
A Mondrian schema consists of…
A dimensional model (logical)
  Cubes & virtual cubes
  Shared & private dimensions
  Calculated measures in cube and in query language
  Parent-child hierarchies
… mapped onto a star/snowflake schema (physical)
  Fact table
  Dimension tables
  Joined by foreign key relationships
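For readers new to the physical side, here is a minimal star-schema sketch: one fact table joined to one dimension table by a foreign key, then aggregated by a dimension attribute. SQLite stands in for MySQL, and the `store` and `sales_fact` tables with their rows are invented for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store (store_id INTEGER PRIMARY KEY, store_state TEXT);
CREATE TABLE sales_fact (
    store_id   INTEGER REFERENCES store(store_id),  -- foreign key to dimension
    unit_sales INTEGER
);
INSERT INTO store VALUES (1, 'CA'), (2, 'CA'), (3, 'WA');
INSERT INTO sales_fact VALUES (1, 10), (2, 20), (3, 5), (3, 1);
""")

# Fact rows are joined to the dimension table and grouped by a level column.
rows = conn.execute("""
    SELECT s.store_state, SUM(f.unit_sales)
    FROM sales_fact AS f
    JOIN store AS s ON s.store_id = f.store_id
    GROUP BY s.store_state
    ORDER BY s.store_state
""").fetchall()
```

A snowflake schema differs only in that the dimension itself is split across several joined tables.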
Writing a Mondrian Schema
Regular cubes, dimensions, hierarchies
Shared dimensions
Virtual cubes
Parent-child hierarchies
Custom readers
Access-control
<?xml version="1.0"?>
<Schema name="FoodMart">
  <!-- Shared dimensions -->
  <Dimension name="Region">
    <Hierarchy hasAll="true" allMemberName="All Regions">
      <Table name="QUADRANT_ACTUALS"/>
      <Level name="Region" column="REGION" uniqueMembers="true"/>
    </Hierarchy>
  </Dimension>
  <Dimension name="Department">
    <Hierarchy hasAll="true" allMemberName="All Departments">
      <Table name="QUADRANT_ACTUALS"/>
      <Level name="Department" column="DEPARTMENT" uniqueMembers="true"/>
    </Hierarchy>
  </Dimension>
(Refer to http://mondrian.sourceforge.net/schema.html.)
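One way to sanity-check a schema file before deploying it is to parse it and list what each shared dimension maps to. The sketch below does this with Python's standard XML parser on the excerpt above; the closing `</Schema>` tag is added here only so the excerpt parses, and `shared_dimensions` is a hypothetical helper, not part of Mondrian.

```python
import xml.etree.ElementTree as ET

# The slide's excerpt, closed with </Schema> so it is well-formed.
SCHEMA_XML = """<Schema name="FoodMart">
  <Dimension name="Region">
    <Hierarchy hasAll="true" allMemberName="All Regions">
      <Table name="QUADRANT_ACTUALS"/>
      <Level name="Region" column="REGION" uniqueMembers="true"/>
    </Hierarchy>
  </Dimension>
  <Dimension name="Department">
    <Hierarchy hasAll="true" allMemberName="All Departments">
      <Table name="QUADRANT_ACTUALS"/>
      <Level name="Department" column="DEPARTMENT" uniqueMembers="true"/>
    </Hierarchy>
  </Dimension>
</Schema>"""

def shared_dimensions(schema_xml):
    """Map each shared dimension to its (level, table, column) triples."""
    root = ET.fromstring(schema_xml)
    result = {}
    for dim in root.findall("Dimension"):
        levels = []
        for hierarchy in dim.findall("Hierarchy"):
            table = hierarchy.find("Table").get("name")
            for level in hierarchy.findall("Level"):
                levels.append((level.get("name"), table, level.get("column")))
        result[dim.get("name")] = levels
    return result

dims = shared_dimensions(SCHEMA_XML)
```

Each entry shows which table and column a level reads from, which makes typos in `column` or `Table name` easy to spot.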
Mapping
Each catalog object (cube, hierarchy, measure) maps onto a part of a snowflake schema
<DimensionUsage> links a shared dimension into a particular cube
<CalculatedMember> defines a member using a formula
<Cube name="Quadrant Analysis">
  <Table name="QUADRANT_ACTUALS"/>
  <DimensionUsage name="Region" source="Region"/>
  <DimensionUsage name="Department" source="Department"/>
  <DimensionUsage name="Positions" source="Positions"/>
  <Measure name="Actual" column="ACTUAL" aggregator="sum" formatString="#,###.00"/>
  <Measure name="Budget" column="BUDGET" aggregator="sum" formatString="#,###.00"/>
  <Measure name="Variance" column="VARIANCE" aggregator="sum" formatString="#,###.00"/>
  <CalculatedMember name="Variance Percent" dimension="Measures">
    <Formula>
      ([Measures].[Variance] / [Measures].[Budget]) * 100
    </Formula>
  </CalculatedMember>
</Cube>
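To make the mapping concrete: the three `sum` measures correspond to `SUM(...)` over columns of the fact table, and the calculated member's formula is applied to the aggregated values. The sketch below imitates that by hand. SQLite stands in for MySQL, the rows are invented, and the SQL is an illustration of the idea rather than the statement Mondrian would emit.

```python
import sqlite3

# SQLite stands in for MySQL; the data rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE QUADRANT_ACTUALS (
    REGION TEXT, DEPARTMENT TEXT, ACTUAL REAL, BUDGET REAL, VARIANCE REAL)""")
conn.executemany(
    "INSERT INTO QUADRANT_ACTUALS VALUES (?, ?, ?, ?, ?)",
    [("Central", "Sales", 120.0, 100.0, 20.0),
     ("Central", "Finance", 90.0, 100.0, -10.0),
     ("Western", "Sales", 50.0, 50.0, 0.0)])

# Stored measures: aggregator="sum" over the mapped columns, by Region.
rows = conn.execute("""
    SELECT REGION, SUM(ACTUAL), SUM(BUDGET), SUM(VARIANCE)
    FROM QUADRANT_ACTUALS
    GROUP BY REGION
    ORDER BY REGION""").fetchall()

# Calculated member: the formula runs on the aggregated Variance and Budget.
variance_percent = {
    region: (variance / budget) * 100
    for region, actual, budget, variance in rows
}
```

Because the formula is evaluated after aggregation, it stays correct at every level of the hierarchy without storing a percentage column.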
Security
Role-based access to data
Container authenticates and chooses role (Mondrian does not authenticate)
Examples:
  Alan can only see sales for Electronics products
  Beatrice can see aggregated sales, but cannot drill down to individual customers
  Charles cannot slice using Gender hierarchy
<Role name="California manager">
  <SchemaGrant access="none">
    <CubeGrant cube="Sales" access="all">
      <HierarchyGrant hierarchy="[Store]" access="custom"
          topLevel="[Store].[Store Country]">
        <MemberGrant member="[Store].[USA].[CA]" access="all"/>
        <MemberGrant member="[Store].[USA].[CA].[Los Angeles]" access="none"/>
      </HierarchyGrant>
      <HierarchyGrant hierarchy="[Customers]" access="custom"
          topLevel="[Customers].[State Province]"
          bottomLevel="[Customers].[City]">
        <MemberGrant member="[Customers].[USA].[CA]" access="all"/>
        <MemberGrant member="[Customers].[USA].[CA].[Los Angeles]" access="none"/>
      </HierarchyGrant>
      <HierarchyGrant hierarchy="[Gender]" access="none"/>
    </CubeGrant>
  </SchemaGrant>
</Role>
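The effect of the two `[Store]` member grants can be modeled roughly as "walk the grants in order; the last one that covers a member decides". The sketch below is a deliberately simplified model under that assumption. It ignores `topLevel`, `bottomLevel`, and the visibility of ancestor members, and `covers` and `member_access` are hypothetical helpers, not Mondrian functions.

```python
# Member grants from the [Store] hierarchy of the role, in document order.
STORE_GRANTS = [
    ("[Store].[USA].[CA]", "all"),
    ("[Store].[USA].[CA].[Los Angeles]", "none"),
]

def covers(granted, member):
    """True if member is the granted member or one of its descendants."""
    return member == granted or member.startswith(granted + ".")

def member_access(member, grants, default="none"):
    """Apply grants in order; a later covering grant overrides an earlier one."""
    access = default
    for granted, level in grants:
        if covers(granted, member):
            access = level
    return access

san_francisco = member_access("[Store].[USA].[CA].[San Francisco]", STORE_GRANTS)
los_angeles = member_access("[Store].[USA].[CA].[Los Angeles]", STORE_GRANTS)
washington = member_access("[Store].[USA].[WA]", STORE_GRANTS)
```

Under this model the California manager sees San Francisco but neither Los Angeles nor any store outside California.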
MDX query language
Who needs a query language?
Your end-users don’t need a query language
A query language helps you build more complex reports
MDX features:
Calculated members and sets
Localized format strings
Parameters
Member properties
Large set of built-in functions
MDX – Multi-Dimensional Expressions
A language for multidimensional queries
Plays the same role in Mondrian’s API as SQL does in JDBC
SQL-like syntax
… but un-SQL-like semantics
A query consists of:
Axes (usually ROWS and COLUMNS)
Slicer expression (WHERE)
Members and measures
SELECT {[Measures].[Unit Sales]} ON COLUMNS,
{[Store].[USA], [Store].[USA].[CA]} ON ROWS
FROM [Sales]
WHERE [Time].[1997].[Q1]
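Applications that build reports usually assemble MDX text from the three parts listed above. A minimal sketch of that assembly, reproducing the query on this slide, is shown here; `braces` and `build_mdx` are hypothetical helpers written for the example and do no escaping or validation.

```python
def braces(members):
    """Render a list of member names as an MDX set: {a, b, c}."""
    return "{" + ", ".join(members) + "}"

def build_mdx(cube, columns, rows, slicer):
    """Assemble a two-axis MDX query with a slicer (WHERE) expression."""
    return (
        f"SELECT {braces(columns)} ON COLUMNS,\n"
        f"  {braces(rows)} ON ROWS\n"
        f"FROM [{cube}]\n"
        f"WHERE {slicer}"
    )

mdx = build_mdx(
    cube="Sales",
    columns=["[Measures].[Unit Sales]"],
    rows=["[Store].[USA]", "[Store].[USA].[CA]"],
    slicer="[Time].[1997].[Q1]",
)
```

Printing `mdx` gives the same four-line query as above, with the axes and the slicer kept as separate inputs.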
MDX evaluation
1. Every query is based on a cube
2. Slicer sets the context
3. Columns and rows expressions are evaluated
4. Each cell in the resulting grid is evaluated
   Calculated members cause recursive evaluation
WITH MEMBER [Measures].[Margin] AS
  '([Measures].[Store Sales] - [Measures].[Store Cost]) /
   [Measures].[Unit Sales]'
SELECT {[Measures].[Unit Sales], [Measures].[Margin]} ON COLUMNS,
  Filter({[Store].[USA], [Store].[USA].Children},
    [Measures].[Unit Sales] > 1000) ON ROWS
FROM [Sales]
WHERE ([Time].[1997].[Q1], [Product].[Beer])
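The four evaluation steps can be imitated in a few lines for the query above. This is a toy model with invented numbers, not how Mondrian evaluates MDX: the slicer becomes a row filter, `Filter(...)` becomes a list comprehension, and the calculated member `Margin` recurses into the stored measures. `FACTS`, `stored`, and `cell` are hypothetical names.

```python
# Hypothetical fact rows; all numbers are invented.
# (state, quarter, product, unit_sales, store_sales, store_cost)
FACTS = [
    ("CA", "Q1", "Beer", 1200, 3000.0, 1800.0),
    ("OR", "Q1", "Beer", 400, 1000.0, 700.0),
    ("WA", "Q1", "Beer", 1500, 4500.0, 3000.0),
    ("CA", "Q2", "Beer", 900, 2000.0, 1500.0),   # excluded by the slicer
    ("WA", "Q1", "Wine", 800, 6400.0, 4000.0),   # excluded by the slicer
]
STATES = ["CA", "OR", "WA"]   # stands in for [Store].[USA].Children

def stored(measure, store, slicer):
    """Aggregate a stored measure for one store member within the slicer."""
    index = {"Unit Sales": 3, "Store Sales": 4, "Store Cost": 5}[measure]
    quarter, product = slicer
    return sum(
        row[index] for row in FACTS
        if row[1] == quarter and row[2] == product
        and (store == "USA" or row[0] == store)
    )

def cell(measure, store, slicer):
    """Evaluate one cell; the calculated member recurses into other cells."""
    if measure == "Margin":
        return ((cell("Store Sales", store, slicer)
                 - cell("Store Cost", store, slicer))
                / cell("Unit Sales", store, slicer))
    return stored(measure, store, slicer)

slicer = ("Q1", "Beer")                                  # step 2: context
columns = ["Unit Sales", "Margin"]                       # step 3: COLUMNS axis
rows = [s for s in ["USA"] + STATES                      # step 3: ROWS axis,
        if cell("Unit Sales", s, slicer) > 1000]         #   Filter(...) applied
grid = {(r, c): cell(c, r, slicer)                       # step 4: each cell
        for r in rows for c in columns}
```

With this data the `OR` row is filtered out, and every `Margin` cell is computed from three other cells in the same context.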
demonstration
Advanced MDX
Calculated members
Calculated sets
Parameters
Time-series functions
Format strings
For additional details, see MDX resources on MSDN:
  http://msdn.microsoft.com/library/en-us/olapdmad/agmdxbasics_1bzs.asp
Tuning
Tuning Mondrian
Some techniques:
Test performance on a realistic data set!
Use tracing to identify long-running SQL statements
Tune SQL queries
Run the same query several times, to examine cache effectiveness
When Mondrian starts, prime the cache by running queries
Get to know Mondrian’s system properties
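A small timing harness is often enough to see whether the cache is helping: run the same query several times and compare the first run with the rest. The sketch below shows the shape of such a harness. `time_runs` is a hypothetical helper, and `fake_query` is a stand-in that sleeps on its first call; in practice you would pass a function that executes a real MDX query.

```python
import time

def time_runs(run_query, repeats=3):
    """Run the same query several times; return elapsed seconds per run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings

# Stand-in for a real query: slow on a cold cache, fast once the cache is warm.
_cache = {}

def fake_query():
    if "result" not in _cache:
        time.sleep(0.05)          # pretend to run SQL against the database
        _cache["result"] = 42
    return _cache["result"]

timings = time_runs(fake_query)
```

If later runs are not clearly faster than the first, the query is probably not being answered from the cache.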
Tuning MySQL for Mondrian
Fact table:
  Index foreign keys
  Try indexing combinations of foreign keys
  Use MyISAM or MERGE (also known as MRG_MyISAM)
  Partitioned tables (MySQL 5.1)
Dimension tables:
  Index primary and foreign keys
  Use MyISAM for large dimension tables, MEMORY for smaller dimension tables
Create aggregate tables:
  Contain pre-computed summaries
  Prevent full table scan followed by massive GROUP BY
Parent-child hierarchies
  Create closure tables
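Two of these techniques, aggregate tables and closure tables, are easier to picture with a tiny example. The sketch below builds both. SQLite stands in for MySQL (storage engines and partitioning are not shown), and all table names, column names, and rows are invented for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (store_id INTEGER, month INTEGER, unit_sales INTEGER);
INSERT INTO sales_fact VALUES (1, 1, 10), (1, 1, 5), (1, 2, 7), (2, 1, 3);
CREATE INDEX idx_sales_store ON sales_fact (store_id);  -- index a foreign key

-- Aggregate table: a pre-computed summary with one row per store.
CREATE TABLE agg_sales_by_store AS
    SELECT store_id, SUM(unit_sales) AS unit_sales, COUNT(*) AS fact_count
    FROM sales_fact GROUP BY store_id;

-- Parent-child hierarchy: each employee row points at a supervisor.
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, supervisor_id INTEGER);
INSERT INTO employee VALUES (1, NULL), (2, 1), (3, 2), (4, 2);
CREATE TABLE employee_closure (
    supervisor_id INTEGER, employee_id INTEGER, distance INTEGER);
""")

# Closure table: one row per (ancestor, descendant) pair, including self pairs.
parents = dict(conn.execute("SELECT employee_id, supervisor_id FROM employee"))
closure = []
for emp in parents:
    ancestor, distance = emp, 0
    while ancestor is not None:          # walk up the chain to the root
        closure.append((ancestor, emp, distance))
        ancestor, distance = parents[ancestor], distance + 1
conn.executemany("INSERT INTO employee_closure VALUES (?, ?, ?)", closure)

agg = conn.execute(
    "SELECT store_id, unit_sales, fact_count FROM agg_sales_by_store "
    "ORDER BY store_id").fetchall()
team_of_2 = [r[0] for r in conn.execute(
    "SELECT employee_id FROM employee_closure "
    "WHERE supervisor_id = 2 ORDER BY employee_id")]
```

Totals per store now come from a short summary table instead of a scan of the fact table, and "everyone under employee 2" becomes a single indexed lookup rather than a recursive walk.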
The whole enchilada
Business Intelligence Suite
Mondrian OLAP server
Analysis tools:
  Pivot table
  Charting
Dashboards
ETL (extract/transform/load)
Integration with operational reporting
Integration with data mining
Actions on operational data
Design/tuning tools
Pentaho Open Source BI Offerings
All available under a free open-source license
Q & A
Next Steps and Resources
More information
  http://www.pentaho.org and http://mondrian.sourceforge.net
Pentaho Community Forum
  http://www.pentaho.org
  Go to Developer Zone
  Discussions
Pentaho BI Platform including Pentaho Analysis
  http://sourceforge.net/project/showfiles.php?group_id=140317
  Pentaho Analysis (pentaho_demo-1.1.4.239.zip)
  Pentaho Report Design Wizard (pentaho-report-design-wizard-1.1.4.167.zip)
Mondrian OLAP Library only
  http://sourceforge.net/project/showfiles.php?group_id=35302
Thank you for attending!
Extra material
Architecture
Storage Layer
Star Layer
Presentation Layer
Dimensional Layer
Extending Mondrian
User-defined functions
User-defined cell, member and property formatters
Member readers
Schema processors
Localized resources
End User UI
JPivot (sister SourceForge project) is a JSP UI
Analytical grids
Charting
Tag library
Administration UI
Workbench (written in Java/SWT)