OLAP – On Line Analytical Processing
Objectives:
At the end of this session, you will be able to:
> Define On Line Analytical Processing
> Understand the need for OLAP and applications of OLAP in BI
> Describe the various OLAP solutions and Architecture
> Compare different OLAP architectures
> Identify the evaluation parameters to consider when selecting an OLAP tool
Session Objectives
> OLAP (On Line Analytical Processing) applications are designed
for online, ad hoc data access and analysis.
> Data is organized into multiple dimensions.
> Provides access to analytical content such as time series and trend
analysis views and summary-level information.
> A set of functionality that attempts to facilitate
multidimensional analysis.
> Offers drill-down, drill-across and slice and dice capabilities.
What is OLAP?
• On Line: no piles of paper, please!
• Analytical: establish patterns
• Processing: data-based
• Fast Analysis of Shared Multidimensional Information
OLAP - Fast Analysis
• How many dimensions can we think in? 2 or 3.
E.g. analysis by branch, product, agent, year.
• How many types of values can we handle? 1 or 2.
E.g. Sales, Profit, Cost.
• How many levels can we handle?
E.g. the number of products we can analyze.
Need for OLAP
Many parameters affect a measure (value),
e.g. sales are influenced by product, region, time,
distribution channel, etc.
Linear analysis = reports
Many totals are at one level
Difficult to identify the key parameters
Need for OLAP
OLAP in an Enterprise
Departments:
Finance
Marketing
Sales
Manufacturing
Analytical Capabilities:
> Used by analysts and managers.
> Offers aggregated view of the data, such as total revenues by
customer profile, by product line, by geographical regions.
Uses of OLAP
> Provides the decision support front-end for data warehousing.
> Advanced statistical, financial, and analytical calculations.
> Appropriate tools to access data from a relational database.
> Appropriate tools to access or manage multidimensional data.
Functionality of OLAP Tools
OLAP analytical features
> Multi-dimensional views of data
> Calculation intensive capabilities
> Time intelligence
The calculation engine in OLAP tools has a wide range of
built-in calculations, such as:
> Ratios
> Time calculations
> Statistics
> Ranking
> Custom formulas/algorithms
> Forecasting and modeling
Features of OLAP Applications
Evolution of OLAP
Star Schema
> A Star Schema is a dimensional model created by mapping data entities from operational systems.
> It has a central table (the fact table) that links all the other tables (the dimension tables) together.
> Dimension: A category of related information. For example, year, month, day, and week are all part of the Time dimension.
> Measure: A property that can be summed or averaged, often using pre-computed aggregates.
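As a rough illustration of the fact table linking dimension tables, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, product names, region assignments and amounts are all hypothetical, chosen only to show the shape of a star schema.

```python
import sqlite3

# Hypothetical star schema: one fact table (sale) that references two
# dimension tables (product, store). All names and values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (prodId TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE store   (storeId TEXT PRIMARY KEY, region TEXT);
    CREATE TABLE sale (
        prodId  TEXT REFERENCES product(prodId),
        storeId TEXT REFERENCES store(storeId),
        date    INTEGER,
        amt     INTEGER
    );
    INSERT INTO product VALUES ('p1', 'Milk'), ('p2', 'Bread');
    INSERT INTO store VALUES ('s1', 'A'), ('s2', 'B'), ('s3', 'B');
    INSERT INTO sale VALUES
        ('p1', 's1', 1, 12), ('p2', 's1', 1, 11), ('p1', 's3', 1, 50),
        ('p2', 's2', 1, 8),  ('p1', 's1', 2, 44), ('p1', 's2', 2, 4);
""")

# Join the fact table to its dimensions and sum the measure (amt)
# by two dimension attributes: store region and product name.
rows = conn.execute("""
    SELECT st.region, p.name, SUM(s.amt)
    FROM sale s
    JOIN product p ON p.prodId = s.prodId
    JOIN store st  ON st.storeId = s.storeId
    GROUP BY st.region, p.name
    ORDER BY st.region, p.name
""").fetchall()
print(rows)
```

The fact table holds only keys and the measure; the descriptive attributes (name, region) live in the dimension tables and are reached through joins.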
Facts and Measures
> Facts or Measures are the Key Performance
Indicators of an enterprise
> Factual data about the subject area
> Numeric, summarized
Examples: Net Profit, Sales Revenue, Gross Margin, Profitability, Cost
Dimension
> Dimensions put measures in perspective
> What, when and where qualifiers to the measures
> Dimensions could be products, customers, time, geography
etc.
Sales Revenue (measure): What was sold? Whom was it sold to? When was it sold? Where was it sold?
Star Schema
Star Schema Example
Star Schema with Sample Data
Cube
– Multi dimensional databases store information in the form of cubes.
– A cube is a collection of facts and related dimensions stored together in arrays.
[Figure: Sales and HR cubes with dimensions Geography, Time and Product]
> Hierarchy: A hierarchy defines the navigating path for drilling up and drilling down. All attributes in a hierarchy belong to the same dimension.
> Levels: These are organized into one or more hierarchies, typically from a coarse-grained level (for example, Year) down to the most detailed one (for example, Day).
> Members: The individual category values (for example, 2002 or 21Jan2002).
> Measures: These are the data values that are summarized and analyzed. Examples of measures are sales figures or operational costs.
> Cells: Each cell is the intersection of one member from every dimension and stores the data for the measures.
Basic Terminology of a Cube
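To make these terms concrete, the following is a small sketch in Python. The Time hierarchy, the member names, the product member and the cell values are all made up for illustration; it only shows how levels, members and cells relate, and how drilling up follows the hierarchy.

```python
# Levels of a hypothetical Time hierarchy, from coarse to fine.
levels = ["Year", "Quarter"]

# Members of the Quarter level, each mapped to its parent Year member.
parent = {"1999-Q1": "1999", "1999-Q2": "1999", "2000-Q1": "2000"}

# Cells: one member per dimension (time, product) -> measure value.
cells = {
    ("1999-Q1", "p1"): 10,
    ("1999-Q2", "p1"): 15,
    ("2000-Q1", "p1"): 7,
}

def drill_up(cells, parent):
    """Aggregate Quarter-level cells to the Year level by summing."""
    out = {}
    for (quarter, product), value in cells.items():
        key = (parent[quarter], product)
        out[key] = out.get(key, 0) + value
    return out

yearly = drill_up(cells, parent)
print(yearly)
```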
> Dimensions consist of:
– Dimension Name
– Level
– Hierarchy
– Member
Example: the Time dimension, by level of detail
YEAR: 1999, 2000, 2001
QUARTER: Q1, Q2, Q3, Q4 (within each year)
Aggregates
sale
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4
Add up amounts for day 1.
In SQL: SELECT sum(amt) FROM SALE WHERE date = 1
Result: 81
Add up amounts by day (over the same sale table).
In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date
ans
date  sum
1     81
2     48
Aggregates
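The two queries above can be tried out with a minimal sketch like the following, which loads the six sale rows into an in-memory SQLite database. The use of Python's sqlite3 module and the column types are assumptions made for the example; the slides do not name a database.

```python
import sqlite3

# The six rows of the sale fact table: (prodId, storeId, date, amt).
SALE_ROWS = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)"
)
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", SALE_ROWS)

# Total for a single day.
day1_total = conn.execute(
    "SELECT sum(amt) FROM sale WHERE date = 1"
).fetchone()[0]

# One total per day.
by_day = conn.execute(
    "SELECT date, sum(amt) FROM sale GROUP BY date ORDER BY date"
).fetchall()

print(day1_total)  # 81
print(by_day)      # [(1, 81), (2, 48)]
```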
Another Example
Add up amounts by day and product.
In SQL: SELECT date, prodId, sum(amt) FROM SALE GROUP BY date, prodId
sale (summarized)
prodId  date  amt
p1      1     62
p2      1     19
p1      2     48
Rollup goes from the detailed sale table (prodId, storeId, date, amt) to this summary; drill-down goes from the summary back to the detail.
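A short sketch of this grouping, again assuming an in-memory SQLite database as a stand-in for the warehouse. It also rolls the (day, product) summary up one more step, to day totals, to show that rollup only needs the summary and not the detail rows.

```python
import sqlite3

# The six rows of the sale fact table: (prodId, storeId, date, amt).
SALE_ROWS = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)"
)
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", SALE_ROWS)

# Group by day and product; storeId is aggregated away.
by_day_product = conn.execute(
    "SELECT date, prodId, sum(amt) FROM sale "
    "GROUP BY date, prodId ORDER BY date, prodId"
).fetchall()

# Roll the summary up further: drop prodId and keep one total per day.
by_day = {}
for date, _prod, total in by_day_product:
    by_day[date] = by_day.get(date, 0) + total

print(by_day_product)  # [(1, 'p1', 62), (1, 'p2', 19), (2, 'p1', 48)]
print(by_day)          # {1: 81, 2: 48}
```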
Aggregates
> Operators: sum, count, max, min, median and avg
> “Having” clause
> Using dimension hierarchy
– average by region (within store)
– maximum by month (within date)
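The sketch below illustrates a few of these operators and a HAVING clause on the same six sale rows. It assumes SQLite, which provides sum, count, max, min and avg as built-in aggregates; median is left out because it is not a built-in there. The threshold of 50 is arbitrary.

```python
import sqlite3

# The six rows of the sale fact table: (prodId, storeId, date, amt).
SALE_ROWS = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)"
)
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", SALE_ROWS)

# Several aggregate operators over the whole table.
stats = conn.execute(
    "SELECT count(*), max(amt), min(amt), avg(amt) FROM sale"
).fetchone()

# HAVING filters groups after aggregation: keep stores whose total
# is at least 50 (an arbitrary cut-off for the example).
big_stores = conn.execute(
    "SELECT storeId, sum(amt) FROM sale "
    "GROUP BY storeId HAVING sum(amt) >= 50 ORDER BY storeId"
).fetchall()

print(stats)       # (6, 50, 4, 21.5)
print(big_stores)  # [('s1', 67), ('s3', 50)]
```

WHERE filters rows before grouping, while HAVING filters the groups themselves, which is why the condition can refer to sum(amt).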
The MOLAP Cube
Fact table view:
sale
prodId  storeId  amt
p1      s1       12
p2      s1       11
p1      s3       50
p2      s2       8
Multi-dimensional cube (dimensions = 2):
      s1  s2  s3
p1    12      50
p2    11  8
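The step from the fact table view to the 2-D cube can be sketched as filling a product-by-store array. This is a plain Python illustration, not how any particular MOLAP server stores its cubes; empty cells are represented here as None.

```python
# Fact table view: (prodId, storeId, amt).
facts = [("p1", "s1", 12), ("p2", "s1", 11), ("p1", "s3", 50), ("p2", "s2", 8)]

# Members of the two dimensions, in display order.
products = ["p1", "p2"]
stores = ["s1", "s2", "s3"]

# Cube view: one array cell per (product, store) pair; None marks an
# empty cell (no sale recorded for that combination).
cube = [[None] * len(stores) for _ in products]
for prod, store, amt in facts:
    cube[products.index(prod)][stores.index(store)] = amt

for prod, row in zip(products, cube):
    print(prod, row)
```

In the array, a cell is addressed by position rather than by searching rows, which is the main reason cube lookups can be fast; the price is that empty combinations still occupy cells.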
3-D Cube
Fact table view:
sale
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4
Multi-dimensional cube (dimensions = 3):
day 1
      s1  s2  s3
p1    12      50
p2    11  8
day 2
      s1  s2  s3
p1    44  4
p2
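A 3-D cube can be sketched the same way, here as a dictionary keyed by one member per dimension. The dictionary representation and the density calculation are illustrative choices, not something the slides prescribe; they show how many of the possible cells are actually populated.

```python
# Fact table view: (prodId, storeId, date, amt).
facts = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]

# Cube view: each cell is addressed by one member per dimension.
cube = {(prod, store, date): amt for prod, store, date, amt in facts}

# Cell lookups; a missing key means the cell is empty.
print(cube.get(("p1", "s3", 1)))  # 50
print(cube.get(("p2", "s3", 2)))  # None

# Density: populated cells over all possible cells (2 x 3 x 2 = 12).
products = {key[0] for key in cube}
stores = {key[1] for key in cube}
dates = {key[2] for key in cube}
density = len(cube) / (len(products) * len(stores) * len(dates))
print(density)  # 0.5
```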
Example
[Figure: a 3-D cube with axes Time (M, T, W, Th, F, S, S), Product (Juice, Milk, Coke, Cream, Soap, Bread) and Store (NY, SF, LA); for example, 56 units of bread sold in LA on M.]
Dimensions: Time, Product, Store
Attributes: Product (upc, price, …), Store (…)
Hierarchies:
Product → Brand → …
Day → Week → Quarter
Store → Region → Country
Roll-ups: to week, to brand, to region
Cube Aggregation: Roll-up
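Example: computing sums
Start from the 3-D cube:
day 1
      s1  s2  s3
p1    12      50
p2    11  8
day 2
      s1  s2  s3
p1    44  4
p2
Roll up over time (sum the days):
      s1  s2  s3
p1    56  4   50
p2    11  8
Roll up over product:
      s1  s2  s3
sum   67  12  50
Roll up over store:
      sum
p1    110
p2    19
Roll up over everything: 129
Rollup moves toward the totals; drill-down moves back toward the detail.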
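The roll-up sums above can be reproduced with a small sketch. The `rollup` helper is a hypothetical function written for this example: it keeps the chosen dimensions of each cell key and sums the amounts over the dimensions that are dropped.

```python
from collections import defaultdict

# 3-D cube: (prodId, storeId, date) -> amt.
cube = {
    ("p1", "s1", 1): 12, ("p2", "s1", 1): 11, ("p1", "s3", 1): 50,
    ("p2", "s2", 1): 8, ("p1", "s1", 2): 44, ("p1", "s2", 2): 4,
}

def rollup(cells, keep):
    """Sum cells over every dimension whose index is not in `keep`."""
    out = defaultdict(int)
    for key, amt in cells.items():
        out[tuple(key[i] for i in keep)] += amt
    return dict(out)

by_prod_store = rollup(cube, (0, 1))  # roll up over date
by_store = rollup(cube, (1,))         # roll up over date and product
by_prod = rollup(cube, (0,))          # roll up over date and store
total = rollup(cube, ())              # roll up over everything

print(by_prod_store[("p1", "s1")])  # 56
print(by_store)
print(by_prod)
print(total)  # {(): 129}
```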
Aggregation Using Hierarchies
Hierarchy: store → region → country
(store s1 in Region A; stores s2, s3 in Region B)
Rolling up the 3-D cube (day 1, day 2) over time and from store to region:
      region A  region B
p1    56        54
p2    11        8
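A sketch of aggregating along the store → region hierarchy, using the store-to-region assignment stated above (s1 in Region A; s2 and s3 in Region B). The dictionary-based lookup is an illustrative choice.

```python
from collections import defaultdict

# 3-D cube: (prodId, storeId, date) -> amt.
cube = {
    ("p1", "s1", 1): 12, ("p2", "s1", 1): 11, ("p1", "s3", 1): 50,
    ("p2", "s2", 1): 8, ("p1", "s1", 2): 44, ("p1", "s2", 2): 4,
}

# One step of the store -> region hierarchy.
region_of = {"s1": "A", "s2": "B", "s3": "B"}

# Sum over dates and replace each store by its parent region.
by_prod_region = defaultdict(int)
for (prod, store, _date), amt in cube.items():
    by_prod_region[(prod, region_of[store])] += amt
by_prod_region = dict(by_prod_region)

print(by_prod_region)
```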
Slicing
3-D cube:
day 1
      s1  s2  s3
p1    12      50
p2    11  8
day 2
      s1  s2  s3
p1    44  4
p2
Slice at TIME = day 1:
      s1  s2  s3
p1    12      50
p2    11  8
In SQL: SELECT * FROM SALE WHERE date = 1
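On the dictionary form of the cube, slicing amounts to fixing one dimension to a single member and dropping it from the key. The sketch below is a plain Python equivalent of the SQL above, not any vendor's API.

```python
# 3-D cube: (prodId, storeId, date) -> amt.
cube = {
    ("p1", "s1", 1): 12, ("p2", "s1", 1): 11, ("p1", "s3", 1): 50,
    ("p2", "s2", 1): 8, ("p1", "s1", 2): 44, ("p1", "s2", 2): 4,
}

def slice_by_date(cells, date):
    """Fix the date dimension to one member, leaving a 2-D cube."""
    return {
        (prod, store): amt
        for (prod, store, d), amt in cells.items()
        if d == date
    }

day1 = slice_by_date(cube, 1)
print(day1)
```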
OLAP Solutions and Architecture
Online Analytical Processing (OLAP) can be done on:
> Relational databases
> Multidimensional databases
OLAP products are grouped into three categories:
> Relational OLAP (ROLAP)
> Multidimensional OLAP (MOLAP)
> Hybrid OLAP (HOLAP)
OLAP - Classification
Multi-dimensional OLAP
MOLAP is a technology that uses a multidimensional
database, which stores data as an n-dimensional cube.
[Figure: a cube with dimensions Geography, Age Group and Brand]
MOLAP
Architecture of MOLAP
Data Mart Server
• RDBMS
• Connectivity Middleware
(non-live connection to the MOLAP server, used for updating the MOLAP data cube only)
MOLAP Server
• MDDBMS/Data Cube
• MOLAP Application
Desktop Systems
• MOLAP Client Tools
Thin Clients
• WWW Browser
Network: LAN, Router/Firewall, Intranet/Internet
Issues:
• Size of Data Cube (cube size is critical)
• Cubes deployment
• Size of Update Data Set
Oracle Express Server (Oracle)
PowerPlay Transformer (Cognos)
Essbase (Hyperion Software)
Holos (Seagate Software)
MOLAP Products
Data Mart Server
• RDBMS
• Connectivity Middleware
ROLAP Server
• ROLAP Application
Desktop Systems
• ROLAP Client Tools
Thin Clients
• WWW Browser
Network: LAN, Router/Firewall, Intranet/Internet
Issues:
• Aggregate Awareness
• Response Time
• Network Capacity
Architecture of ROLAP
Brio Query Enterprise
Business Objects
Metacube
DSS Server
Information Advantage
ROLAP Products
ROLAP Server
• ROLAP Application
MOLAP Server
• MDDBMS/Data Cube
• MOLAP Application
Desktop Systems
• HOLAP Client Tools
Network: LAN, Router/Firewall
Issues:
• Cube elements
• Integration with RDBMS
Architecture of HOLAP
Holos (Seagate Software)
Microsoft SQL Server OLAP Services
Pilot Software's Pilot Decision Support Suite
SAS
HOLAP Products
MOLAP Vs ROLAP
Architectural Features | MOLAP | ROLAP
Number of dimensions | Ten or less | Unlimited
Support for large number of users | Limited support | Good
Scalability | Poor | Good
Complex multidimensional analysis | Easier to achieve | Difficult to achieve
Volume of data storage | Up to 50 GB | Hundreds of gigabytes and terabytes
Storage of information | Through cubes | SQL result sets
User interface & functionality | Good | Normal
Common access language | NA | SQL
Nature of data | Stores summarized data | Stores detailed as well as summarized data
Comparison of Architectures
Parameters | MOLAP | ROLAP
Application design | Essentially the definition of the dimensional model and calculation rules | Uses two-dimensional tables stored in an RDBMS (data is stored in a star or snowflake schema)
Aggregation techniques | Measures are pre-calculated and stored at each hierarchy summary level during load time | Summary tables are implemented in the relational database
Multidimensional analysis | Drill down, drill up, drill across and slicing/dicing | Drill down, drill up, slicing and dicing
Query performance | Instant response | Slower
Value-added functions | Supports complex functions like % change, ranking, etc. | Limited value-added functions
User-defined calculations | Calculated from cubes | Calculated on the fly from the database
Strengths and Weaknesses of MOLAP/ROLAP
Parameters | MOLAP | ROLAP
Processing overhead for large input data sets | High | Low
Support for frequent updates | Cannot handle frequent updates of cubes | Suitable for frequent updates
Resource requirements | High | Low
Industry standard | No current standards | SQL standard
Access to the database through ODBC | The databases have proprietary APIs and do not provide access through ODBC | Provides access through ODBC
Strengths and Weaknesses of MOLAP/ROLAP
OLAP Tool Selection
Parameters | Features
Openness | Openness to standard reporting tools
Ad hoc reporting | Ad hoc query performance and reporting capabilities
Read-write | Multi-user read-write applications
Integration | Integration with the organization’s enterprise-wide environment
Cost | Cost of ownership, training, and installation
Compatibility | Compatibility with the enterprise computing environment
Database | Database size capacity of the product
Scalability | Ability of the tool to scale to the required number of dimensions
Analysis of detail data | Ability of the tool to support analysis against atomic data sets
Parameters to be Considered for OLAP Tool Selection
Parameters | Features
RDBMS integration | Ability of the OLAP tool to integrate directly with relational databases and non-numeric relational data
Run-time calculations | Ability to perform calculations at run time
Data loading | Data loading performance of the OLAP product
Key features | Key features offered by the tool, such as write-back, allocation calculations, sophisticated currency conversions, printed report quality, spreadsheet interface, etc.
Integration with other systems | Integration with other related systems, such as e-mail and data warehouses
Deployment architectures | Ability to support various deployments, such as stand-alone, high-speed client/server, intranet, extranet, Internet
Parameters to be Considered for OLAP Tool Selection
Features to weigh for MOLAP vs ROLAP:
• Calculation intensity, complexity
• Data sparsity
• Database update
• Data volatility
• Volume of data
• Development time, learning curve
• Standards, interoperability
• Query response time
• Consistency, reliability
• Data loading time
• Security
• Network impact
• Vendor stability
Which is Preferred ?
> OLAP offers fast, flexible data summarization and analysis.
> OLAP servers are a superior technology for BI applications.
> Ability to summarize data in multiple ways and view trends over
time.
> OLAP servers and relational databases can work in harmony.
OLAP - Summary
In this session, we have:
> Understood the need for OLAP and the significance of
multidimensional analysis in a Data Warehouse.
> Discussed the evolution of OLAP.
> Explained architectures, characteristics as well as the merits and
demerits of various OLAP solutions.
Session Summary
Thank you