Online Analytical Online Analytical Processing (OLAP)Processing (OLAP)
ByByAman SachdevaAman Sachdeva
23090012309001CSECSE
Topics CoveredTopics Covered1.What is OLAP ?1.What is OLAP ?2.OLAP v/s OLTP2.OLAP v/s OLTP3.Three-Tier Design Support Systems3.Three-Tier Design Support Systems4. Approaches to OLAP servers4. Approaches to OLAP servers5. Multi-dimensional data model5. Multi-dimensional data model6. ROLAP6. ROLAP7. MOLAP7. MOLAP8. HOLAP8. HOLAP9.OLAP Benefits9.OLAP Benefits10.OLAP Applications10.OLAP Applications11. Conclusion11. Conclusion
What is OLAP ?What is OLAP ?
• The dynamic synthesis, analysis, and The dynamic synthesis, analysis, and consolidation of large volumes of multi-consolidation of large volumes of multi-dimensional data, Codd (1993).dimensional data, Codd (1993).
• Describes a technology that uses a multi-Describes a technology that uses a multi-dimensional view of aggregate data to dimensional view of aggregate data to provide quick access to strategic information provide quick access to strategic information for the purposes of advanced analysis. for the purposes of advanced analysis.
• Enables users to gain a deeper Enables users to gain a deeper understanding and knowledge about various understanding and knowledge about various aspects of their corporate data through fast, aspects of their corporate data through fast, consistent, interactive access to a wide consistent, interactive access to a wide variety of possible views of the datavariety of possible views of the data
HHistoryistory
In 1993, E. F. Codd came up with the term In 1993, E. F. Codd came up with the term online analytical processing (OLAP) and and proposed 12 criteria to define an OLAP proposed 12 criteria to define an OLAP databasedatabase
The term OLAP seems perfect to describe The term OLAP seems perfect to describe databases designed to facilitate decision databases designed to facilitate decision making (analysis) in an organizationmaking (analysis) in an organization
OLAP v/s OLTPOLAP v/s OLTPOLTPOLTP OLAPOLAP
• Clerk, IT ProfessionalClerk, IT Professional
• Day to day Day to day operationsoperations
• Application-oriented Application-oriented (E-R based)(E-R based)
• Current, IsolatedCurrent, Isolated
• Structured, RepetitiveStructured, Repetitive
• Short, Simple Short, Simple transactiontransaction
• Read/writeRead/write
• TensTens
• ThousandsThousands
• 100 MB-GB100 MB-GB
• Knowledge workerKnowledge worker
• Decision supportDecision support
• Subject-oriented (Star, Subject-oriented (Star, snowflake)snowflake)
• Historical, ConsolidatedHistorical, Consolidated
• Ad hocAd hoc
• Complex queryComplex query
• Read MostlyRead Mostly
• MillionsMillions
• HundredsHundreds
• 100GB-TB100GB-TB
User
Function
DB Design
Data
View
Unit of work
Access
Records accessed
Users
Db size
Three-Tier Decision Three-Tier Decision Support SystemsSupport Systems
• Warehouse database serverWarehouse database server– Almost always a relational DBMS, rarely flat filesAlmost always a relational DBMS, rarely flat files
• OLAP serversOLAP servers– Relational OLAP (ROLAP): extended relational DBMS Relational OLAP (ROLAP): extended relational DBMS
that maps operations on multidimensional data to that maps operations on multidimensional data to standard relational operatorsstandard relational operators
– Multidimensional OLAP (MOLAP): special-purpose Multidimensional OLAP (MOLAP): special-purpose server that directly implements multidimensional data server that directly implements multidimensional data and operationsand operations
• ClientsClients– Query and reporting toolsQuery and reporting tools– Analysis toolsAnalysis tools– Data mining toolsData mining tools
The Complete Decision Support The Complete Decision Support SystemSystem
Information Sources Data Warehouse Server(Tier 1)
OLAP Servers(Tier 2)
Clients(Tier 3)
OperationalDB’s
SemistructuredSources
extracttransformloadrefreshetc.
Data Marts
DataWarehouse
e.g., MOLAP
e.g., ROLAP
serve
OLAP
Query/Reporting
Data Mining
serve
serve
Approaches to OLAP Approaches to OLAP ServersServers
Three possibilities for OLAP serversThree possibilities for OLAP servers
(1) Relational OLAP (ROLAP)(1) Relational OLAP (ROLAP)– Relational and specialized relational DBMS to store Relational and specialized relational DBMS to store
and manage warehouse dataand manage warehouse data– OLAP middleware to support missing piecesOLAP middleware to support missing pieces
((2) Multidimensional OLAP (MOLAP)2) Multidimensional OLAP (MOLAP)– Array-based storage structuresArray-based storage structures– Direct access to array data structuresDirect access to array data structures
(3) Hybrid OLAP (HOLAP)(3) Hybrid OLAP (HOLAP)– Storing detailed data in RDBMSStoring detailed data in RDBMS– Storing aggregated data in MDBMSStoring aggregated data in MDBMS– User access via MOLAP toolsUser access via MOLAP tools
The Multi-Dimensional Data The Multi-Dimensional Data ModelModel
““Sales by product line over the past six months”Sales by product line over the past six months”
““Sales by store between 1990 and 1995”Sales by store between 1990 and 1995”
Prod Code Time Code Store Code Sales Qty
Store Info
Product Info
Time Info
. . .
Numerical MeasuresKey columns joining fact table
to dimension tables
Fact table for measures
Dimension tables
ROLAP: Dimensional Modeling Using ROLAP: Dimensional Modeling Using Relational DBMSRelational DBMS
• Special schema design: Special schema design: star, snowflakestar, snowflake
• Special indexes: bitmap, multi-table joinSpecial indexes: bitmap, multi-table join
• Proven technology (relational model, DBMS), Proven technology (relational model, DBMS), tend to outperform specialized MDDB tend to outperform specialized MDDB especially on large data setsespecially on large data sets
• ProductsProducts– IBM DB2, Oracle, Sybase IQ, RedBrick, IBM DB2, Oracle, Sybase IQ, RedBrick,
InformixInformix
Star Schema (in RDBMS)Star Schema (in RDBMS)
Star Schema ExampleStar Schema Example
The “Classic” Star SchemaThe “Classic” Star Schema
A single fact table, with A single fact table, with detail and summary datadetail and summary data
Fact table primary key Fact table primary key has only one key column has only one key column per dimensionper dimension
Each key is generatedEach key is generated Each dimension is a Each dimension is a
single table, highly de-single table, highly de-normalizednormalized
Benefits: Easy to understand, easy to define hierarchies, reduces # of physical joins, low maintenance, very simple metadata
PERIOD KEY
Store Dimension Time Dimension
Product Dimension
STORE KEYPRODUCT KEYPERIOD KEY
DollarsUnitsPrice
Period DescYearQuarterMonthDayCurrent FlagResolutionSequence
Fact Table
PRODUCT KEY
Store DescriptionCityStateDistrict IDDistrict Desc.Region_IDRegion Desc.Regional Mgr.Level
Product Desc.BrandColorSizeManufacturerLevel
STORE KEY
The “Snowflake” SchemaThe “Snowflake” Schema
STORE KEY
Store Dimension
Store DescriptionCityStateDistrict IDRegion_IDRegional Mgr.
District_IDDistrict Desc.Region_ID
Region_ID
Region Desc.Regional Mgr.
STORE KEYPRODUCT KEYPERIOD KEY
DollarsUnitsPrice
Store Fact Table
Aggregation in a Single Fact TableAggregation in a Single Fact Table
Drawbacks: Summary data in the fact table yields poorer performance for summary levels, huge dimension tables a problem
PERIOD KEY
Store Dimension Time Dimension
Product Dimension
STORE KEYPRODUCT KEYPERIOD KEY
DollarsUnitsPrice
Period DescYearQuarterMonthDayCurrent FlagResolutionSequence
Fact Table
PRODUCT KEY
Store DescriptionCityStateDistrict IDDistrict Desc.Region_IDRegion Desc.Regional Mgr.Level
Product Desc.BrandColorSizeManufacturerLevel
STORE KEY
PERIOD KEY
Store Dimension Time Dimension
Product Dimension
STORE KEYPRODUCT KEYPERIOD KEY
DollarsUnitsPrice
Period DescYearQuarterMonthDayCurrent FlagSequence
Fact Table
PRODUCT KEY
Store DescriptionCityStateDistrict IDDistrict Desc.Region_IDRegion Desc.Regional Mgr.
Product Desc.BrandColorSizeManufacturer
STORE KEY
The “Fact Constellation” The “Fact Constellation” SchemaSchema
DollarsUnitsPrice
District Fact Table
District_IDPRODUCT_KEYPERIOD_KEY
DollarsUnitsPrice
Region Fact Table
Region_IDPRODUCT_KEYPERIOD_KEY
TheThe
Aggregations using Aggregations using “Snowflake” Schema and “Snowflake” Schema and Multiple Fact TablesMultiple Fact Tables
• No No LEVELLEVEL in dimension tables in dimension tables
• Dimension tables are normalized by Dimension tables are normalized by decomposing at the attribute leveldecomposing at the attribute level
• Each dimension table has one key for Each dimension table has one key for each level of the dimensionís each level of the dimensionís hierarchy hierarchy
• The lowest level key joins the The lowest level key joins the dimension table to both the fact table dimension table to both the fact table and the lower level attribute tableand the lower level attribute table
How does it work? The best way is for the query to be built by understanding which summary levels exist, and finding the proper snowflaked attribute tables, constraining there for keys, then selecting from the fact table.
STORE KEY
Store Dimension
Store DescriptionCityStateDistrict IDDistrict Desc.Region_ IDRegion Desc.Regional Mgr.
District_ IDDistrict Desc.Region_ ID
Region_ ID
Region Desc.Regional Mgr.
STORE KEYPRODUCT KEYPERIOD KEY
DollarsUnitsPrice
Store Fact Table
DollarsUnitsPrice
District Fact Table
District_IDPRODUCT_KEYPERIOD_KEY Dollars
UnitsPrice
RegionFact Table
Region_IDPRODUCT_KEYPERIOD_KEY
Aggregation Contd …Aggregation Contd …
Advantage: Best performance when queries involve aggregation
Disadvantage: Complicated maintenance and metadata, explosion in the number of tables in the database
STORE KEY
Store Dimension
Store DescriptionCityStateDistrict IDDistrict Desc.Region_ IDRegion Desc.Regional Mgr.
District_ IDDistrict Desc.Region_ ID
Region_ ID
Region Desc.Regional Mgr.
STORE KEYPRODUCT KEYPERIOD KEY
DollarsUnitsPrice
Store Fact Table
DollarsUnitsPrice
District Fact Table
District_IDPRODUCT_KEYPERIOD_KEY Dollars
UnitsPrice
RegionFact Table
Region_IDPRODUCT_KEYPERIOD_KEY
AggregatesAggregates
sale prodId storeId date amtp1 s1 1 12p2 s1 1 11p1 s3 1 50p2 s2 1 8p1 s1 2 44p1 s2 2 4
Add up amounts for day 1 In SQL: SELECT sum(amt) FROM SALE WHERE date = 1
81
AggregatesAggregates Add up amounts by day In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date
ans date sum1 812 48
sale prodId storeId date amtp1 s1 1 12p2 s1 1 11p1 s3 1 50p2 s2 1 8p1 s1 2 44p1 s2 2 4
Points to be noticed about ROLAPPoints to be noticed about ROLAP
• Defines complex, multi-dimensional data with Defines complex, multi-dimensional data with simple modelsimple model
• Reduces the number of joins a query has to Reduces the number of joins a query has to processprocess
• Allows the data warehouse to evolve with rel. Allows the data warehouse to evolve with rel. low maintenancelow maintenance
• Can contain both detailed and summarized Can contain both detailed and summarized data.data.
• ROLAP is based on familiar, proven, and ROLAP is based on familiar, proven, and already selected technologies.already selected technologies.
BUT!!!BUT!!!
• SQL for multi-dimensional manipulation of SQL for multi-dimensional manipulation of calculations.calculations.
MOLAP: Dimensional Modeling MOLAP: Dimensional Modeling Using the Multi Dimensional ModelUsing the Multi Dimensional Model
• MDDB: a special-purpose data modelMDDB: a special-purpose data model
• Facts stored in multi-dimensional Facts stored in multi-dimensional arraysarrays
• Dimensions used to index arrayDimensions used to index array
• Sometimes on top of relational DBSometimes on top of relational DB
• ProductsProducts– Pilot, Arbor Essbase, GentiaPilot, Arbor Essbase, Gentia
The MOLAP CubeThe MOLAP Cube
sale prodId storeId amtp1 s1 12p2 s1 11p1 s3 50p2 s2 8
s1 s2 s3p1 12 50p2 11 8
Fact table view: Multi-dimensional cube:
dimensions = 2
3-D Cube3-D Cube
dimensions = 3
Multi-dimensional cube:Fact table view:
sale prodId storeId date amtp1 s1 1 12p2 s1 1 11p1 s3 1 50p2 s2 1 8p1 s1 2 44p1 s2 2 4
day 2 s1 s2 s3p1 44 4p2 s1 s2 s3
p1 12 50p2 11 8
day 1
ExampleExample
Store
Product
Time
M T W Th F S S
Juice
Milk
Coke
Cream
Soap
Bread
NYSF
LA
10
34
56
32
12
56
56 units of bread sold in LA on M
Dimensions:Time, Product, Store
Attributes:Product (upc, price, …)Store ……
Hierarchies:Product Brand …Day Week QuarterStore Region Country
roll-up to week
roll-up to brand
roll-up to region
Cube Aggregation: Roll-upCube Aggregation: Roll-up
day 2 s1 s2 s3p1 44 4p2 s1 s2 s3
p1 12 50p2 11 8
day 1
s1 s2 s3p1 56 4 50p2 11 8
s1 s2 s3sum 67 12 50
sump1 110p2 19
129
. . .
drill-down
rollup
Example: computing sums
Cube Operators for Roll-upCube Operators for Roll-up
day 2 s1 s2 s3p1 44 4p2 s1 s2 s3
p1 12 50p2 11 8
day 1
s1 s2 s3p1 56 4 50p2 11 8
s1 s2 s3sum 67 12 50
sump1 110p2 19
129
. . .
sale(s1,*,*)
sale(*,*,*)sale(s2,p2,*)
s1 s2 s3 *p1 56 4 50 110p2 11 8 19* 67 12 50 129
Extended CubeExtended Cube
day 2 s1 s2 s3 *p1 44 4 48p2* 44 4 48s1 s2 s3 *
p1 12 50 62p2 11 8 19* 23 8 50 81
day 1
*
sale(*,p2,*)
Points to be noticed about MOLAPPoints to be noticed about MOLAP
• Pre-calculating or pre-consolidating transactional data Pre-calculating or pre-consolidating transactional data improves speed. improves speed.
BUTBUTFully pre-consolidating incoming data, MDDs require an Fully pre-consolidating incoming data, MDDs require an enormous amount of overhead both in processing time and in enormous amount of overhead both in processing time and in storage. An input file of 200MB can easily expand to 5GBstorage. An input file of 200MB can easily expand to 5GB
MDDs are great candidates for the MDDs are great candidates for the <<50GB department data 50GB department data marts.marts.
• Rolling up and Drilling down through aggregate data.Rolling up and Drilling down through aggregate data.
• With MDDs, application design is essentially the definition of With MDDs, application design is essentially the definition of dimensions and calculation rules, while the RDBMS requires dimensions and calculation rules, while the RDBMS requires that the database schema be a star or snowflake.that the database schema be a star or snowflake.
Hybrid OLAP (HOLAP)Hybrid OLAP (HOLAP)
• HOLAP = Hybrid OLAP:HOLAP = Hybrid OLAP:
– Best of both worldsBest of both worlds
– Storing detailed data in RDBMSStoring detailed data in RDBMS
– Storing aggregated data in MDBMSStoring aggregated data in MDBMS
– User access via MOLAP toolsUser access via MOLAP tools
ExamplesExamples
• ROLAPROLAP– Telecommunication startup: call data records Telecommunication startup: call data records
(CDRs) (CDRs) – ECommerce SiteECommerce Site– Credit Card CompanyCredit Card Company
• MOLAPMOLAP– Analysis and budgeting in a financial departmentAnalysis and budgeting in a financial department– Sales analysisSales analysis
• HOLAPHOLAP– Sales department of a multi-national companySales department of a multi-national company– Banks and Financial Service ProvidersBanks and Financial Service Providers
• Increased productivity of end-usersIncreased productivity of end-users..• Reduced backlog of applications Reduced backlog of applications
development for IT staff.development for IT staff.• Retention of organizational control over Retention of organizational control over
the integrity of corporate data.the integrity of corporate data.• Reduced query drag and network traffic on Reduced query drag and network traffic on
OLTP systems or on the data warehouse. OLTP systems or on the data warehouse. • Improved potential revenue and Improved potential revenue and
profitability.profitability.
OLAP BenefitsOLAP Benefits
OLAP APPLICATIONSOLAP APPLICATIONS
• JIT(Just-In-Time) information is JIT(Just-In-Time) information is computed data that usually reflects computed data that usually reflects complex relationships and is often complex relationships and is often calculated on the fly. Also as data calculated on the fly. Also as data relationships may not be known in relationships may not be known in advance, the data model must be advance, the data model must be flexible.flexible.
OLAP APPLICATIONSOLAP APPLICATIONS
• Although OLAP applications are found in Although OLAP applications are found in widely divergent functional areas, they all widely divergent functional areas, they all have the following key features:have the following key features:
– multi-dimensional views of datamulti-dimensional views of data
– support for complex calculationssupport for complex calculations
– Time intelligenceTime intelligence
Examples of OLAP applications Examples of OLAP applications in various functional areasin various functional areas
ConclusionConclusion
• IT organizations are faced with the challenge of delivering systems that allow knowledge workers to make strategic and tactical decisions based on corporate information. These decision support systems are referred to as Online Analytical Processing (OLAP) systems, and they allow knowledge workers to intuitively, quickly, and flexibly manipulate operational data using familiar business terms, in order to provide analytical insight.
ThanksThanks
Any Queries ??Any Queries ??