Business Intelligence & Multi-Dimensional Databases Nirmal Jonnalagedda


Business Intelligence & Multi-Dimensional Databases

Nirmal Jonnalagedda

Outline

1. BI: History
2. BI: Overview
3. Common Functions of BI
4. BI: What can you do with it?
5. Multidimensional Databases
6. Contrast MDD and Relational Databases
7. When is MDD (In)appropriate?
8. MDD Features
9. Pros/Cons of MDD

BI: History

1958 - Term first used by IBM researcher Hans Peter Luhn

He defined intelligence as: “the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal”

BI is understood to have evolved from the decision support systems (DSS) of the 1960s

In the 1980s, DSS concepts evolved and split into data warehouses, Executive Information Systems, and OLAP

BI : an overview

There are many different opinions; the definition depends on where you work

Generally BA is a subset of BI

BI - The ability of an organization to take its capabilities and convert them into knowledge

Often includes the implementation of Key Performance Indicators (KPIs), Trending Analysis, and Predictive Modeling

What does BI provide?
• Historical, current and predictive views of business operations

Where does BI get this information?
• From within your business
• Not necessarily focused on the actions of others (that is called competitive analysis)

End Goal: Support better decision making

BI is sometimes called a decision support system (DSS)

BI applications can often vary in scope

• Can be enterprise wide, focusing on critical business applications
  - Monitoring the popularity of a product in a nationwide grocery chain
  - Tracking responses to mail offers and only mailing those who respond
• Can be department or project specific, focused on individual decisions and how those affect an organization
  - Monitoring employee productivity and department spending

Common Functions of BI

• Reporting
• Online Analytical Processing
• Analytics
• Data, Process, Text Mining
• Complex Event Processing
• Business Performance Management
• Benchmarking
• Predictive, Prescriptive Analytics

BI : What can you do with it?

• Identify cost cutting ideas and practices
• Uncover new business opportunities
• React to and even predict retail demand
• Avoid repeating costly mistakes

Especially useful in large enterprises with many departments

Easily correlate and group business information and metrics into an understandable format

Understand customer behavior

Database Evolution

• Flat files
• Hierarchical and Network
• Relational
• Distributed Relational
• Multidimensional

MDDB: Why?

No single "best" data structure for all applications within an enterprise

Organizations have abandoned the search for the holy grail of a globally accepted database

Instead, they select the most appropriate data structure on a case-by-case basis from a palette of standard database structures

Multidimensional Databases for OLAP?

From econometric research conducted at MIT in the 1960s, the multidimensional database has matured into the database engine of choice for data analysis applications

Inherent ability to integrate and analyze large volumes of enterprise data

Offers a good conceptual fit with the way end-users visualize business data

Most business people already think about their businesses in multidimensional terms

Managers tend to ask questions about product sales in different markets over specific time periods

Spreadsheets: a 2D database?
• Functionalities
• What about a stack of similar spreadsheets for different times?
• Limitations? We cannot relate data in different sheets easily

What is a Multi-Dimensional Database?

A multidimensional database (MDDB) is a computer software system designed to allow for the efficient and convenient storage and retrieval of large volumes of data that are

(1) intimately related and
(2) stored, viewed and analyzed from different perspectives.

These perspectives are called dimensions.

A Motivating Example

An automobile manufacturer wants to increase sales volumes by examining sales data collected throughout the organization. The evaluation would require viewing historical sales volume figures from multiple dimensions such as

• Sales volume by model
• Sales volume by color
• Sales volume by dealer
• Sales volume over time

Contrasting Relational and Multi-Dimensional Models

SALES VOLUMES FOR GLEASON DEALERSHIP

MODEL          COLOR   SALES VOLUME
MINI VAN       BLUE    6
MINI VAN       RED     5
MINI VAN       WHITE   4
SPORTS COUPE   BLUE    3
SPORTS COUPE   RED     5
SPORTS COUPE   WHITE   5
SEDAN          BLUE    4
SEDAN          RED     3
SEDAN          WHITE   2

The Relational Structure

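Sales Volumes (rows: MODEL, columns: COLOR)

            Blue  Red  White
Mini Van      6    5     4
Coupe         3    5     5
Sedan         4    3     2

Multidimensional Structure

• Measurement: Sales Volumes
• Dimensions: MODEL, COLOR
• Dimension positions: Mini Van, Coupe, Sedan (MODEL); Blue, Red, White (COLOR)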
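As a rough illustration of the two structures, the Gleason figures can be held either as a list of records or as a two-dimensional array indexed by dimension positions. This is a minimal Python sketch; the variable and function names are illustrative assumptions and do not come from any MDDB product.

```python
# Relational form: one record per (model, color) combination.
records = [
    ("MINI VAN", "BLUE", 6), ("MINI VAN", "RED", 5), ("MINI VAN", "WHITE", 4),
    ("SPORTS COUPE", "BLUE", 3), ("SPORTS COUPE", "RED", 5), ("SPORTS COUPE", "WHITE", 5),
    ("SEDAN", "BLUE", 4), ("SEDAN", "RED", 3), ("SEDAN", "WHITE", 2),
]

# Multidimensional form: dimensions and their positions are part of the structure.
models = ["MINI VAN", "SPORTS COUPE", "SEDAN"]  # positions on the MODEL dimension
colors = ["BLUE", "RED", "WHITE"]               # positions on the COLOR dimension
sales_volume = [[0] * len(colors) for _ in models]
for model, color, volume in records:
    sales_volume[models.index(model)][colors.index(color)] = volume


def volume_from_records(model, color):
    """Scan the records until the matching row is found."""
    for m, c, v in records:
        if m == model and c == color:
            return v
    return None


def volume_from_array(model, color):
    """Locate the cell directly from the two dimension positions."""
    return sales_volume[models.index(model)][colors.index(color)]


print(sales_volume)
print(volume_from_records("SEDAN", "RED"), volume_from_array("SEDAN", "RED"))
```

Both lookups return the same figure. The difference is that the array form carries the perspectives (MODEL, COLOR) in its shape, while the record form carries them only as field values.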

Differences between MDDB and Relational Databases

Normalized Relational vs. MDDB

• Relational: Data is reorganized based on the query. Perspectives are placed in the fields, which tells us nothing about the contents.
  MDDB: Perspectives are embedded directly in the structure.
• Relational: Browsing and data manipulation are not intuitive to the user.
  MDDB: Data retrieval and manipulation are easy.
• Relational: Slows down for large datasets due to the multiple JOIN operations needed.
  MDDB: Fast retrieval for large datasets due to the predefined structure.
• Relational: Flexible. Anything an MDDB can do can be done this way.
  MDDB: Relatively inflexible. Changes in perspectives necessitate reprogramming of the structure.

Contrasting Relational Model and MDD - Example 2

SALES VOLUMES FOR ALL DEALERSHIPS

MODEL          COLOR   DEALERSHIP   VOLUME
MINI VAN       BLUE    CLYDE        6
MINI VAN       BLUE    GLEASON      6
MINI VAN       BLUE    CARR         2
MINI VAN       RED     CLYDE        3
MINI VAN       RED     GLEASON      5
MINI VAN       RED     CARR         5
MINI VAN       WHITE   CLYDE        2
MINI VAN       WHITE   GLEASON      4
MINI VAN       WHITE   CARR         3
SPORTS COUPE   BLUE    CLYDE        2
SPORTS COUPE   BLUE    GLEASON      3
SPORTS COUPE   BLUE    CARR         2
SPORTS COUPE   RED     CLYDE        7
SPORTS COUPE   RED     GLEASON      5
SPORTS COUPE   RED     CARR         2
SPORTS COUPE   WHITE   CLYDE        4
SPORTS COUPE   WHITE   GLEASON      5
SPORTS COUPE   WHITE   CARR         1
SEDAN          BLUE    CLYDE        6
SEDAN          BLUE    GLEASON      4
SEDAN          BLUE    CARR         2
SEDAN          RED     CLYDE        1
SEDAN          RED     GLEASON      3
SEDAN          RED     CARR         4
SEDAN          WHITE   CLYDE        2
SEDAN          WHITE   GLEASON      2
SEDAN          WHITE   CARR         3

Multidimensional Representation

[Figure: Sales Volumes cube with three dimensions: MODEL (Mini Van, Coupe, Sedan), COLOR (Blue, Red, White), and DEALERSHIP (Clyde, Gleason, Carr)]

Viewing Data - An Example

[Figure: Sales Volumes cube with dimensions MODEL, COLOR, and DEALERSHIP]

• Assume that each dimension has 10 positions, as shown in the cube above
• How many records would there be in a relational table?
• Implications for viewing data from an end-user standpoint?

Performance Advantages

Volume figure when car type = SEDAN, color=BLUE, & dealer=GLEASON?

• RDBMS: all 1000 records might need to be searched to find the right record
• MDB has more ‘knowledge’ about where the data lies: a maximum of 30 position searches (10 per dimension)
• Average case: 15 vs. 500

Total Sales across all colors and dealers when model = SEDAN?

• RDBMS: all 1000 records must be searched to get the answer
• MDB: sum the contents of one 10x10 ‘slice’

Data manipulation that requires a minute in an RDBMS may require only a few seconds in an MDB

For such queries, MDBs can be an order of magnitude faster than RDBMSs

Performance benefits are greatest for queries that generate cross-tab views of data

The performance advantages offered by multidimensional technology facilitate the development of interactive decision support applications like OLAP that can be impractical in a relational environment.
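The search counts quoted above (1000 records versus at most 30 position searches, and one 10x10 slice for a per-model total) can be reproduced with a small sketch. It assumes unindexed linear searches on both sides, as the slide does; the position labels and the placeholder volumes (i + j + k) are invented for illustration and are not real sales data.

```python
from itertools import product

N = 10  # positions per dimension, as assumed on the slide
models = [f"MODEL_{i}" for i in range(N)]
colors = [f"COLOR_{i}" for i in range(N)]
dealers = [f"DEALER_{i}" for i in range(N)]

# Placeholder volumes (i + j + k); hypothetical values, not real data.
table = [(models[i], colors[j], dealers[k], i + j + k)
         for i, j, k in product(range(N), repeat=3)]
cube = [[[i + j + k for k in range(N)] for j in range(N)] for i in range(N)]


def relational_lookup(model, color, dealer):
    """Linear scan of the table; returns (volume, records examined)."""
    for examined, (m, c, d, v) in enumerate(table, start=1):
        if (m, c, d) == (model, color, dealer):
            return v, examined
    return None, len(table)


def find_position(positions, label):
    """Linear search along one dimension; returns (index, comparisons made)."""
    for compared, p in enumerate(positions, start=1):
        if p == label:
            return compared - 1, compared
    raise KeyError(label)


def cube_lookup(model, color, dealer):
    """Search each dimension separately, then read the cell directly."""
    i, a = find_position(models, model)
    j, b = find_position(colors, color)
    k, c = find_position(dealers, dealer)
    return cube[i][j][k], a + b + c


def relational_total(model):
    """Total for one model: every record must be inspected."""
    return sum(v for m, _, _, v in table if m == model)


def slice_total(model):
    """Total for one model: sum a single 10x10 slice of the cube."""
    return sum(sum(row) for row in cube[models.index(model)])


worst = (models[-1], colors[-1], dealers[-1])
print(relational_lookup(*worst))  # examines all 1000 records
print(cube_lookup(*worst))        # makes 10 + 10 + 10 = 30 comparisons
```

A relational engine with suitable indexes would narrow this gap, which is the tuning trade-off discussed later in the deck.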

Real World Benefits

• Ease of data presentation and navigation
• Ease of maintenance
• Performance

Ease of Data Presentation and Navigation

Intuitive spreadsheet-like data views are a natural output of MDDBs

Obtaining the same views in a relational environment requires either complex SQL or a SQL generator against an RDB to convert the table outputs into a more intuitive format

Even for end users well skilled in SQL, some forms of output, such as ranking reports (e.g. top ten, bottom 20%), are awkward to produce in basic SQL and typically need window functions or vendor extensions

Ease of Maintenance

Ease of maintenance because data is stored as it is viewed

No additional overhead is required to translate user queries into requests for data

To provide same intuitiveness, RDBs use indexes and sophisticated joins which require significant maintenance and storage

Performance

Multidimensional databases achieve performance levels that are difficult to match in a relational environment.

These high performance levels enable and encourage OLAP applications

Performance of MDBs can be matched by RDBs through database tuning

Not possible to tune the database for all possible adhoc queries

Tuning requires resources of an expensive DB specialist

Adding Dimensions- An Example

[Figure: three Sales Volumes cubes, one each for JANUARY, FEBRUARY, and MARCH, each with dimensions MODEL (Mini Van, Coupe, Sedan), COLOR (Blue, Red, White), and DEALERSHIP (Clyde, Gleason, Carr), showing time added as a further dimension]

When is MDD (In)appropriate?

First, consider situation 1

PERSONNEL

LAST NAME   EMPLOYEE#   EMPLOYEE AGE
SMITH       01          21
REGAN       12          19
FOX         31          63
WELD        14          31
KELLY       54          27
LINK        03          56
KRANZ       41          45
LUCAS       33          41
WEISS       23          19

When is MDD (In)appropriate?

Now consider situation 2

SALES VOLUMES FOR GLEASON DEALERSHIP

MODEL          COLOR   VOLUME
MINI VAN       BLUE    6
MINI VAN       RED     5
MINI VAN       WHITE   4
SPORTS COUPE   BLUE    3
SPORTS COUPE   RED     5
SPORTS COUPE   WHITE   5
SEDAN          BLUE    4
SEDAN          RED     3
SEDAN          WHITE   2

1. Set up an MDD structure for situation 1, with LAST NAME and EMPLOYEE# as dimensions, and AGE as the measurement.
2. Set up an MDD structure for situation 2, with MODEL and COLOR as dimensions, and SALES VOLUME as the measurement.

When is MDD (In)appropriate?

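Sales Volumes (rows: MODEL, columns: COLOR)

            Blue  Red  White
Mini Van      6    5     4
Coupe         3    5     5
Sedan         4    3     2

Employee Age (rows: LAST NAME, columns: EMPLOYEE #; "." marks an empty cell)

         01  03  12  14  23  31  33  41  54
Smith    21   .   .   .   .   .   .   .   .
Link      .  56   .   .   .   .   .   .   .
Regan     .   .  19   .   .   .   .   .   .
Weld      .   .   .  31   .   .   .   .   .
Weiss     .   .   .   .  19   .   .   .   .
Fox       .   .   .   .   .  63   .   .   .
Lucas     .   .   .   .   .   .  41   .   .
Kranz     .   .   .   .   .   .   .  45   .
Kelly     .   .   .   .   .   .   .   .  27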

Note the sparseness in the second MDD representation

MDD Structures for the Situations

When is MDD (In)appropriate?

Our sales volume dataset has a great number of meaningful interrelationships

These interrelationships are more meaningful than the individual data elements themselves.

The greater the number of inherent interrelationships between the elements of a dataset, the more likely it is that a study of those interrelationships will yield business information of value to the company.

Highly interrelated dataset types should be placed in a multidimensional data structure for greatest ease of access and analysis

When is MDD (In)appropriate?

No last name matches more than one employee number, and no employee number matches more than one last name

In contrast, there is a sales figure associated with every combination of model and color, resulting in a completely filled 3x3 matrix

Performance suffers for the employee data: at most 9 record searches in the RDB vs. up to 18 position searches (9 + 9) in the MDB
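The contrast between the two situations can be quantified as the fraction of array cells that actually hold a value. The sketch below is a minimal illustration using the two tables from the slides; the `density` helper is a hypothetical name introduced here.

```python
personnel = [
    ("SMITH", "01", 21), ("REGAN", "12", 19), ("FOX", "31", 63),
    ("WELD", "14", 31), ("KELLY", "54", 27), ("LINK", "03", 56),
    ("KRANZ", "41", 45), ("LUCAS", "33", 41), ("WEISS", "23", 19),
]
sales = [
    ("MINI VAN", "BLUE", 6), ("MINI VAN", "RED", 5), ("MINI VAN", "WHITE", 4),
    ("SPORTS COUPE", "BLUE", 3), ("SPORTS COUPE", "RED", 5), ("SPORTS COUPE", "WHITE", 5),
    ("SEDAN", "BLUE", 4), ("SEDAN", "RED", 3), ("SEDAN", "WHITE", 2),
]


def density(records):
    """Return (filled cells, total cells, fill ratio) for a 2-D array
    built from the first two fields of each record."""
    rows = {r[0] for r in records}
    cols = {r[1] for r in records}
    cells = len(rows) * len(cols)
    return len(records), cells, len(records) / cells


print(density(personnel))  # 9 of 81 cells filled
print(density(sales))      # 9 of 9 cells filled
```

Only one cell in nine of the employee array is used, and that ratio gets worse as employees are added, whereas the sales array stays fully populated.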

When is MDD (In)appropriate?

The relative performance advantages of storing multidimensional data in a multidimensional array increase as the size of the dataset increases

The relative performance disadvantages of storing non-multidimensional data in a multidimensional array increase as the size of the dataset increases.

There is no inherent value in storing non-multidimensional data (such as the employee data) in multidimensional arrays


When is MDD Appropriate?

The greater the number of inherent interrelationships between the elements of a dataset, the more likely it is that a study of those interrelationships will yield business information of value to the company.

Most companies have limited time and resources to devote to analyzing data.

It therefore becomes critical that these highly interrelated dataset types be placed in a multidimensional data structure for greatest ease of access and analysis.

When is MDD Appropriate?

Examples of applications that are suited for multidimensional technology:

• Financial Analysis and Reporting
• Budgeting
• Promotion Tracking
• Quality Assurance and Quality Control
• Product Profitability
• Survey Analysis

MDD Features - Rotation

Sales Volumes

View #1 (rows: MODEL, columns: COLOR)

            Blue  Red  White
Mini Van      6    5     4
Coupe         3    5     5
Sedan         4    3     2

( ROTATE 90° )

View #2 (rows: COLOR, columns: MODEL)

         Mini Van  Coupe  Sedan
Blue         6       3      4
Red          5       5      3
White        4       5      2

• Also referred to as “data slicing.”
• Each rotation yields a different slice or two-dimensional table of data, a different face of the cube.
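For a two-dimensional array, the rotation from View #1 to View #2 amounts to swapping the row and column dimensions. A minimal Python sketch follows; `rotate` is an illustrative name, and it builds a new list, whereas an MDDB would typically just change how the same stored array is presented.

```python
models = ["Mini Van", "Coupe", "Sedan"]
colors = ["Blue", "Red", "White"]

# View #1: rows are MODEL positions, columns are COLOR positions.
view1 = [
    [6, 5, 4],  # Mini Van
    [3, 5, 5],  # Coupe
    [4, 3, 2],  # Sedan
]


def rotate(view):
    """Swap the row and column dimensions of a 2-D view."""
    return [list(column) for column in zip(*view)]


# View #2: rows are COLOR positions, columns are MODEL positions.
view2 = rotate(view1)
for color, row in zip(colors, view2):
    print(color, row)
```

Rotating View #2 again returns View #1, since no values are changed, only the orientation.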

MDD Features - Rotation

[Figure: six views (View #1 to View #6) of the Sales Volumes cube, each obtained from the previous one by a 90° rotation, so that a different arrangement of the MODEL, COLOR, and DEALERSHIP dimensions faces the viewer]

MDD Features - Rotation

All six views can be obtained by simple rotation

In MDBs, rotations are simple because no rearrangement of data is required

Rotation is also referred to as “data slicing”
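One way to see why a three-dimensional cube has exactly six views is to count the ways its three dimensions can be assigned to rows, columns, and depth. This short sketch is only a counting aid; the role names (rows, columns, depth) are assumptions about how the figure lays out each view.

```python
from itertools import permutations

dimensions = ("MODEL", "COLOR", "DEALERSHIP")

# Each view assigns the three dimensions to (rows, columns, depth).
views = list(permutations(dimensions))
for number, (rows, columns, depth) in enumerate(views, start=1):
    print(f"View #{number}: rows={rows}, columns={columns}, depth={depth}")

print(len(views))  # 3! orderings of three dimensions
```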

MDD Features - Ranging

How does the sales volume of models painted with the new metallic blue compare with the sales of normal blue models?

The user knows that only Sports Coupe and Mini Van models have received the new paint treatment

The user also knows that only two dealers, Carr and Clyde, have an unconstrained supply of these models

MDD Features - Ranging

[Figure: the Sales Volumes cube ranged down to MODEL (Mini Van, Coupe), COLOR (Normal Blue, Metal Blue), and DEALERSHIP (Clyde, Carr)]

• The end user selects the desired positions along each dimension.
• Also referred to as "data dicing."
• The data is scoped down to a subset grouping

MDD Features - Ranging

The reduced array can now be rotated and used in computations in the same way as the parent array

Referred to as “Data Dicing” as data is scoped down to a subset grouping

A complex SQL query is required in an RDB

Performance is better in an MDB as less resource-consuming searches are required
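Ranging can be sketched as selecting a subset of positions along each dimension and keeping the result as a smaller array of the same kind. The slides give no figures for metallic blue, so this sketch assumes the all-dealership table from earlier and dices on BLUE only; `dice` and the other names are illustrative.

```python
models = ["MINI VAN", "SPORTS COUPE", "SEDAN"]
colors = ["BLUE", "RED", "WHITE"]
dealers = ["CLYDE", "GLEASON", "CARR"]

# cube[model][color][dealer], taken from the all-dealership table.
cube = [
    [[6, 6, 2], [3, 5, 5], [2, 4, 3]],  # MINI VAN
    [[2, 3, 2], [7, 5, 2], [4, 5, 1]],  # SPORTS COUPE
    [[6, 4, 2], [1, 3, 4], [2, 2, 3]],  # SEDAN
]


def dice(keep_models, keep_colors, keep_dealers):
    """Return the sub-array restricted to the chosen positions."""
    return [[[cube[models.index(m)][colors.index(c)][dealers.index(d)]
              for d in keep_dealers]
             for c in keep_colors]
            for m in keep_models]


# Only Mini Van and Sports Coupe, only Blue, only Clyde and Carr.
subset = dice(["MINI VAN", "SPORTS COUPE"], ["BLUE"], ["CLYDE", "CARR"])
print(subset)
```

The result has the same nested shape as the parent cube, so it can be rotated or summed with the same operations.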

MDD Features - Roll-Ups & Drill Downs

• Users want different views of the same data
  - E.g., sales volume by model vs. sales volume by dealership
• Many times views are similar
  - Sales volume by dealership vs. volume by district
• Natural relationship between Sales Volumes at the DEALERSHIP level and Sales Volumes at the DISTRICT level
• Sales Volumes for all the dealerships in a district sum to the Sales Volumes for that district

MDD Features - Roll-Ups & Drill Downs

Multidimensional database technology is specially designed to facilitate the handling of natural relationships

Define two related aggregates on the same dimension

One aggregation is dealership and the other district

District is at a higher level of aggregation than dealership

MDD Features - Roll-Ups & Drill Downs

ORGANIZATION DIMENSION

REGION:      Midwest
DISTRICT:    Chicago, St. Louis, Gary
DEALERSHIP:  Clyde, Gleason, Carr, Levi, Lucas, Bolton

• The figure presents a definition of a hierarchy within the organization dimension.
• Aggregations perceived as being part of the same dimension.
• Moving up and moving down levels in a hierarchy is referred to as “roll-up” and “drill-down.”
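Roll-up and drill-down along a hierarchy can be sketched with a simple child-to-parent mapping. The dealership totals below are summed from the all-dealership table earlier in the deck. The assignment of dealerships to districts is a hypothetical assumption, because the figure's mapping is not recoverable from the text, and dealerships without sales figures in the deck (Levi, Lucas, Bolton) are left out.

```python
# Totals per dealership, summed from the all-dealership table.
dealer_totals = {"Clyde": 33, "Gleason": 37, "Carr": 24}

# HYPOTHETICAL mapping: the slides do not state which district each dealership is in.
district_of = {"Clyde": "Chicago", "Gleason": "Chicago", "Carr": "St. Louis"}
region_of = {"Chicago": "Midwest", "St. Louis": "Midwest"}


def roll_up(totals, parent_of):
    """Aggregate child totals into their parents (one level up)."""
    rolled = {}
    for child, value in totals.items():
        parent = parent_of[child]
        rolled[parent] = rolled.get(parent, 0) + value
    return rolled


def drill_down(parent, totals, parent_of):
    """Return the child totals that make up one parent (one level down)."""
    return {child: value for child, value in totals.items()
            if parent_of[child] == parent}


by_district = roll_up(dealer_totals, district_of)
by_region = roll_up(by_district, region_of)
print(by_district)
print(by_region)
print(drill_down("Chicago", dealer_totals, district_of))
```

Because both levels belong to the same dimension, the district totals are always the sum of their dealership totals, whichever mapping is actually in force.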

MDD Features - Roll-Ups & Drill Downs

Queries

High degree of structure in MDB makes the query language very simple and efficient

Query language is intuitive

Output is immediately useful to end user

Queries: Example

Display sales volume by model for each dealership

PRINT TOTAL.(SALES_VOLUME KEEP MODEL DEALERSHIP)

Queries: Example

Corresponding SQL

SELECT MODEL, DEALERSHIP, SUM(SALES_VOLUME)
FROM SALES_VOLUME
GROUP BY MODEL, DEALERSHIP
ORDER BY MODEL, DEALERSHIP
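The SQL above can be tried against the all-dealership table from earlier in the deck. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table layout (columns MODEL, COLOR, DEALERSHIP, SALES_VOLUME) is an assumption inferred from the query, not a schema given in the slides.

```python
import sqlite3

models = ["MINI VAN", "SPORTS COUPE", "SEDAN"]
colors = ["BLUE", "RED", "WHITE"]
dealers = ["CLYDE", "GLEASON", "CARR"]

# volumes[model][color][dealer], from the all-dealership table.
volumes = [
    [[6, 6, 2], [3, 5, 5], [2, 4, 3]],  # MINI VAN
    [[2, 3, 2], [7, 5, 2], [4, 5, 1]],  # SPORTS COUPE
    [[6, 4, 2], [1, 3, 4], [2, 2, 3]],  # SEDAN
]

conn = sqlite3.connect(":memory:")
# Assumed schema: one row per (model, color, dealership) combination.
conn.execute(
    "CREATE TABLE SALES_VOLUME "
    "(MODEL TEXT, COLOR TEXT, DEALERSHIP TEXT, SALES_VOLUME INTEGER)"
)
conn.executemany(
    "INSERT INTO SALES_VOLUME VALUES (?, ?, ?, ?)",
    [(m, c, d, volumes[i][j][k])
     for i, m in enumerate(models)
     for j, c in enumerate(colors)
     for k, d in enumerate(dealers)],
)

# The query from the slide, unchanged.
result = conn.execute(
    "SELECT MODEL, DEALERSHIP, SUM(SALES_VOLUME) "
    "FROM SALES_VOLUME "
    "GROUP BY MODEL, DEALERSHIP "
    "ORDER BY MODEL, DEALERSHIP"
).fetchall()

for row in result:
    print(row)
conn.close()
```

The output is one row per model and dealership with the volumes summed over color, which is the same information the one-line MDB query requests.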


Pros/Cons of MDD

Pros
• Cognitive Advantages for the User
• Ease of Data Presentation and Navigation, Time dimension
• Performance

Cons
• Less flexible
• Requires greater initial effort

?