92
1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Paul Chen Data Warehouse Fundamentals

1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

Embed Size (px)

Citation preview

Page 1: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

1

Chapter 8

OLAP IN THE DATA WAREHOUSE

Paul ChenPaul Chen

Data Warehouse Fundamentals

Page 2: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

2

Summary of Topics

1. OLAP Definition, Key features and Benefits

2. How OLAP differs from OATP?

3. Multidimensional data ? What? Why? How To Use? 4. OLAP Server Query, Features and Applications.

5. Category of OLAP tools -Multi-Dimensional OLAP (MOLAP), Relational OLAP (ROLAP), and Managed Query Environment (MQE).

6. OLAP extensions to SQL.

7. The Microsoft Data Warehousing Framework.

Page 3: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

3

Data Warehousing and End-User Access Tools

Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities.

Key developments include:

– Online analytical processing (OLAP)

– SQL extensions for complex data analysis

– Data mining tools.

Page 4: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

4

Limitations of Other Analysis Methods SQL has been the accepted interface for retrieving and manipulating data

from relational databases. These methods are used in OLTP systems and in data warehousing environments (referring to the environments with simple queries and routine reports).

Now consider information retrieval and manipulation in these environments- reports writers and spreadsheets.

Report writers: two features- the ability to point and click for generating and issuing SQL calls, and the capability to format the output reports. However, report writers do not support multidimensionality. With basic report writers, you cannot drill down to lower levels in the dimensions. You cannot rotate the results by switching rows and columns. The report writers do not provide aggregate navigation. Once the report is formatted and run, you cannot alter the presentation of the result data sets. Spreadsheets are still cumbersome for showing all the aggregate levels and multidimensional views, let alone doing calculations for “roll-up” and “drill-down”.

Page 5: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

5

Topic 1: OLAP Definition

“On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user”

-- DBMS Magazine, April, 1995

Page 6: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

6

Key Features of OLAP

Supports analysis, dynamic synthesis and consolidation of large volumes of multi-dimensional data. Types of analysis ranges from basic navigation and browsing (slicing and dicing) to calculations, to more complex analyses such as time series and complex modeling.

Is able to drill down or roll up with each dimension.

Is capable of applying mathematical formulas and calculations to measures.

Page 7: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

7

Key Features of OLAP

Can easily answer ‘who?’ and ‘what?’. Ability to answer ‘what if?’ and ‘why?’ type questions.

Distinguishes OLAP from general-purpose query tools.

Enables users to gain a deeper understanding and knowledge about various aspects of their corporate data through fast, consistent, interactive access to a wide variety of possible views of the data.

Can be implemented on the web.

Page 8: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

8

OLAP Benefits

Increased productivity of end-users. Reduced backlog of applications development for IT

staff. Retention of organizational control over the integrity

of corporate data. Reduced query drag and network traffic on OLTP

systems or on the data warehouse. Improved potential revenue and profitability.

Page 9: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

9

Topic 2: OLAP VS OLTP (ON-LINE TRANSACTION PROCESSING)

OLTP (RELATIONAL)

ATOMIZED PRESENT ‘RECORD-AT-A-TIME’

PROCESS ORIENTED

OLAP (MULTIDIMENSIONAL)

SUMMARIZED HISTORICAL ‘MANY RECORDS-AT-A-

TIME’ SUBJECT ORIENTED

Page 10: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

10

OLAP VS OLTP

WHILE OLTP APPLICATIONS GENERALLY DO NOT REQUIRE HISTORICAL DATA, NEARLY EVERY OLAP APPLICATION IS CONCERNED WITH VIEWING TRENDS AND THEREFORE REQUIRES HISTORICAL DATA. OLTP APPLICATIONS AND DATABASE TEND TO BE ORGANIZED AROUND SPECIFIC PROCESSES (SUCH AS ORDER ENTRY), OLAP APPLICATIONS TEND TO BE “SUBJECT-ORIENTED” ANSWERING SUCH QUESTIONS AS “WHAT PRODUCTS ARE SELLING WELL” OR “ WHAT ARE MY WEAKEST SALES OFFICES?”

Page 11: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

11

Topic 3: Multi-dimensional Database

In a multidimensional database, data is stored as Facts and Dimensions instead of rows and columns

Multi-dimensional structures are best visualized as cubes of data, and cubes within cubes of data. Each side of a cube is a dimension.

Page 12: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

12

Sample Star Schema

Product

Store

Month

Sales

Product KeyStore KeyTime KeyFixed CostVariable costProfit marginYTD_Sales_dollars_by_storeYTD_Sales_dollar_by_categoryYTD_Sales_By_department

Time keyDateMonthquarteryear

Product

Product Key

Product NameCategory

Fact Table Time

Product line

Department

Store keyStore name

region

Store

Page 13: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

13

Kinds of Queries

Display the total sales of all products for past five years in all stores.

Compare total sales for all stores, product by product, between years 2000 and 1999

Show comparison of total sales for all stores, product by product, between years 2000 and 1999 only for those products for reduced sales.

Page 14: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

14

Cube

Users often view and analyze data multidimensionality, using hierarchical segmentation along each dimension. Thus a user may analyze sales along the time dimension (such as months within quarters with years), along geographical dimension (cities with regions within countries), along the organizational dimension (sales persons within branches within territories). We can conceptualize the approach as a cube.

Page 15: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

15

Cube

sale p Branch

week

price

week 1

week 2

P1P4

p1p2p3p4

c1c2c3c1

C1 C2 C3

1221

1243

1

p2

p3

c1

c2 c3

3

Fact Table View Multi-Dimensional Cube

Property sale

P = property #

24

Cells: roughly equivalent to records in a relational database.

Page 16: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

16

WHAT IS MULTIDIMENSIONAL DATA?

RELATIONAL DATABASES ARE ORGANIZED AROUND A LIST OF “RECORDS”. EACH RECORD CONTAINS RELATED INFORMATION WHICH ARE ORGANIZED INTO “FIELDS”.

CUS NAME CUSTOMER # TELEPHONE ADDRESS JACK 10001 345-4444 40 MAIN WALTER 10002 345-6666 30 ELM HOOVER 10003 345-8588 6

BELLRED

THIS TABLE HAS ONLY ONE DIMENSION.

Page 17: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

17

WHAT IS MULTIDIMENSIONAL DATA?

LOOKING AT “CUSTOMER BY TELEPHONE #” OR

“TELEPHONE # BY CUSTOMER” ONLY PRODUCES A ONE-FOR-ONE CORRESPONDENCE.

Page 18: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

18

WHAT IS MULTIDIMENSIONAL DATA?

Let’s take a look at an example of a relational table where

there is more than one-to-one correspondence between

the fields.

In the following table, we have sales data for each product

In each region-- four products sold in three regions. The

Sales data is a two-dimensional matrix (“Product” and “Region”).

Page 19: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

19

product/region/sales table

Product Region Sales Nuts East 50 Nuts West 40 Nuts Central 30 Screws East 60 Screws West 50 Screws Central 60 Bolts East 100 Bolts West 120 Bolts Central 80 Washers East 90 Washers West 100 Washers Central 40

Page 20: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

20

A Much Better Way To Represent the Data: Two-Dimensional Matrix

PRODUCTEAST WEST CENTRALNUTS 50 40 30 SCREWS 60 50 60BOLTS 100 120 80WASHERS 90 100 40

Page 21: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

21

QUERY ON MULTIDIMENSIONAL DATA

QUESTIONS LIKE ‘WHAT WERE TOTAL SALES OF NUTS?’ OR WHAT WERE TOTAL SALES FOR THE EAST?’ TO FIND THE ANSWER IN THE TWO DIMENSIONAL TABLE, JUST FIND THE CELL CALLED ‘EAST’ AND ADD UP ALL THE NUMBERS IN

THE COLUMN.

Page 22: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

22

AGGREGATION: THE KEY TO CONSISTENTLY FAST RESPONSE

PRODUCTEAST WEST CENTRAL TOTALNUTS 50 40 30

120SCREWS 60 50 60 170BOLTS 100 120 80 300WASHERS 90 100 40 230TOTAL 300 310 210 820

Page 23: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

23

Multiple Reads & Database Writes

In the above example, computing the totals involves 28 (4*4 +3*4) database reads and eight database writes. A typical relational database can read about 200 records per second and writes perhaps 20 records per second. So consolidating this tiny database would take less than one second. However, for some larger tables, computing for totals could take days or even weeks to consolidate.

Page 24: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

24

Relational Database vs. Multidimensional OLAP Server

With a multidimensional OLAP server, we can perform the same consolidation with row and column arithmetic. Whereas a relational database can access a few hundred records per second, a good OLP server should be capable of consolidating 20,000 to 30,000 cells (equivalent to records in the relational table) per second, including the time to write the totals to the database. The multidimensional OLAP database will take up less space since the names of the regions and products are not repeated in the multidimensional database as they are in the relational database.

Page 25: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

25

Multiple Hierarchies And Classes Within Dimensions

The single biggest factor in determining how many dimensions for a database is the existence of multiple hierarchies and classes within dimensions.

Classes are typically attributes such as “size”, “color” and other characteristics that define a subset of the members of a dimension.

For example, a database of shampoo sales might want to roll up product sales by size (6 oz, 15 oz), by type(dry hair, oily hair) and possibly by other attributes such as scented/unscented, brand name, and so on.

Page 26: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

26

TERMINOLOGY

Dimension: roughly equivalent to Fields in a relational database.

In the multidimensional data, “Product” and “Region” are both dimensions.

Cells: roughly equivalent to Records in a relational

database.

Page 27: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

27

Simple Hierarchies (Roll up) Within Dimensions

The preceding table can be represented graphically as follows:

Region Total

Product Total

East West Central

nuts Bolts Screws Washers

Individual products roll up into a Product Total

Page 28: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

28

Multiple Levels of HierarchiesMultiple Levels of HierarchiesRegion Total

EAST West Central

Calif Washington Oregon

Seattle Bellevue

Page 29: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

29

Slice and Dice

A three-dimensional array has a total of six faces, or views. A four-dimensional array has twelve views. An n-dimensional array has n(n-1) views. The ability to rotate the data cube is the main technology for multi-dimensional reporting and is sometimes called “slice and dice.”

Product

Region

Actual/Forecast

Page 30: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

30

Practical Limitations on Database Size

In general, as the number of dimensions increases, the number of cells in the database increases exponentially.

for ex., a two-dimensional database with 100 products and 100 regions would have 10,000 cells. If we add a third dimension for time with 52 weeks, we now have 520,000 cells.

Most commercial OLAP servers hit the cell limit long before they run out of dimensions.

Page 31: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

31

Topic 4: A list of some important features supported by some OLAP Servers

Special time-series data types Special dimensions for variables (complex mathematical

relationships, such as computed averages, and simultaneous equations)

Multiple hierarchies within a dimension Classes with a dimension Virtual variables (computed on the fly at run time, such as “gross

margin derived from “revenues” and “expenses.”

Page 32: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

32

Examples of OLAP applications in various functional areas

Page 33: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

33

OLAP Applications

Although OLAP applications are found in widely divergent functional areas, all have following key features:

– multi-dimensional views of data

– support for complex calculations

– time intelligence.

Page 34: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

34

OLAP Applications - multi-dimensional views of data

Core requirement of building a ‘realistic’ business model.

Provides basis for analytical processing through flexible access to corporate data.

The underlying database design that provides the multi-dimensional view of data should treat all dimensions equally.

Page 35: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

35

OLAP Applications - support for complex calculations

Must provide a range of powerful computational methods such as that required by sales forecasting, which uses trend algorithms such as moving averages and percentage growth.

Mechanisms for implementing computational methods should be clear and non-procedural.

Page 36: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

36

OLAP Applications – time intelligence

Key feature of almost any analytical application as performance is almost always judged over time.

Time hierarchy is not always used in same manner as other hierarchies.

Concepts such as year-to-date and period-over-period comparisons should be easily defined.

Page 37: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

37

Representing Multi-Dimensional Data

Example of two-dimensional query.

– What is the total revenue generated by property sales in each city, in each quarter of 1997?’

Choice of representation is based on types of queries end-user may ask.

Compare representation - three-field relational table versus two-dimensional matrix.

Page 38: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

38

Multi-dimensional Data as Three-field table versus Two-dimensional Matrix

Page 39: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

39

Representing Multi-Dimensional Data

Example of three-dimensional query.

– ‘What is the total revenue generated by property sales for each type of property (Flat or House) in each city, in each quarter of 1997?’

Compare representation - four-field relational table versus three-dimensional cube.

Page 40: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

40

Multi-dimensional Data as Four-field Table versus Three-dimensional Cube

Page 41: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

41

Representing Multi-Dimensional Data

Cube represents data as cells in an array.

Relational table only represents multi-dimensional data in two dimensions.

Page 42: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

42

Multi-Dimensional OLAP Servers

Use multi-dimensional structures to store data and relationships between data.

Multi-dimensional structures are best visualized as cubes of data, and cubes within cubes of data. Each side of cube is a dimension.

A cube can be expanded to include other dimensions.

Page 43: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

43

Multi-Dimensional OLAP Servers

A cube supports matrix arithmetic.

Multi-dimensional query response time depends on how many cells have to be added ‘on the fly’.

As number of dimensions increases, number of the cube’s cells increases exponentially.

Page 44: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

44

Multi-Dimensional OLAP Servers

However, majority of multi-dimensional queries use summarized, high-level data.

Solution is to pre-aggregate (consolidate) all logical subtotals and totals along all dimensions.

Pre-aggregation is valuable, as typical dimensions are hierarchical in nature.

– (e.g. Time dimension hierarchy - years, quarters, months, weeks, and days)

Page 45: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

45

Multi-Dimensional OLAP Servers

Predefined hierarchy allows logical pre-aggregation and, conversely, allows for a logical ‘drill-down’.

Supports common analytical operations

– Consolidation

– Drill-down

– Slicing and dicing.

Page 46: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

46

Multi-Dimensional OLAP Servers

Consolidation - aggregation of data such as simple ‘roll-ups’ or complex expressions involving inter-related data.

Drill-Down - is reverse of consolidation and involves displaying the detailed data that comprises the consolidated data.

Slicing and Dicing - (also called pivoting) refers to the ability to look at the data from different viewpoints.

Page 47: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

47

Multi-Dimensional OLAP servers

Can store data in a compressed form by dynamically selecting physical storage organizations and compression techniques that maximize space utilization.

Dense data (ie., data that exists for high percentage of cells) can be stored separately from sparse data (ie., significant percentage of cells are empty).

Page 48: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

48

Multi-Dimensional OLAP Servers

Ability to omit empty or repetitive cells can greatly reduce the size of the cube and the amount of processing.

Allows analysis of exceptionally large amounts of data.

Page 49: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

49

Multi-Dimensional OLAP Servers

In summary, pre-aggregation, dimensional hierarchy, and sparse data management can significantly reduce the size of the cube and the need to calculate values ‘on-the-fly’.

Removes need for multi-table joins and provides quick and direct access to arrays of data, thus significantly speeding up execution of multi-dimensional queries.

Page 50: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

50

Codd’s Rules for OLAP Systems

In 1993, E.F. Codd formulated twelve rules as the basis for selecting OLAP tools.

– Multi-dimensional conceptual view

– Transparency

– Accessibility

– Consistent reporting performance

– Client-server architecture

– Generic dimensionality

Page 51: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

51

Codd’s rules for OLAP Systems

– Dynamic sparse matrix handling

– Multi-user support

– Unrestricted cross-dimensional operations

– Intuitive data manipulation

– Flexible reporting

– Unlimited dimensions and aggregation levels.

Page 52: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

52

Codd’s Rules for OLAP Systems

There are proposals to re-defined or extended the rules. For example, to also include:– Comprehensive database management tools

– Ability to drill down to detail (source record) level

– Incremental database refresh

– SQL interface to the existing enterprise environment

Page 53: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

53

Categories of OLAP Tools

OLAP tools are categorized according to the architecture of the underlying database.

Three main categories of OLAP tools include

– Multi-dimensional OLAP (MOLAP or MD-OLAP)

– Relational OLAP (ROLAP), also called multi-relational OLAP

– Managed query environment (MQE)

Page 54: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

54

Topic 5: Multi-Dimensional OLAP (MOLAP)

Use specialized data structures and multi-dimensional Database Management Systems (MDDBMSs) to organize, navigate, and analyze data.

Data is typically aggregated and stored according to predicted usage to enhance query performance.

Page 55: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

55

Multi-Dimensional OLAP (MOLAP)

Use array technology and efficient storage techniques that minimize the disk space requirements through sparse data management.

Provides excellent performance when data is used as designed, and the focus is on data for a specific decision-support application.

Page 56: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

56

Multi-Dimensional OLAP (MOLAP)

Traditionally, require a tight coupling with the application layer and presentation layer.

Recent trends segregate the OLAP from the data structures through the use of published application programming interfaces (APIs).

Page 57: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

57

Typical Architecture for MOLAP Tools

Page 58: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

58

MOLAP Tools - Development Issues

Underlying data structures are limited in their ability to support multiple subject areas and to provide access to detailed data.

Navigation and analysis of data is limited because the data is designed according to previously determined requirements.

Page 59: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

59

MOLAP Tools - Development Issues

MOLAP products require a different set of skills and tools to build and maintain the database, thus increasing the cost and complexity of support.

Page 60: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

60

Relational OLAP (ROLAP)

Fastest-growing style of OLAP technology.

Supports RDBMS products using a metadata layer - avoids need to create a static multi-dimensional data structure - facilitates the creation of multiple multi-dimensional views of the two-dimensional relation.

Page 61: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

61

Relational OLAP (ROLAP)

To improve performance, some products use SQL engines to support complexity of multi-dimensional analysis, while others recommend, or require, the use of highly denormalized database designs such as the star schema.

Page 62: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

62

Typical Architecture for ROLAP Tools

Page 63: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

63

ROLAP Tools - Development Issues

Middleware to facilitate the development of multi-dimensional applications. (Software that converts the two-dimensional relation into a multi-dimensional structure).

Development of an option to create persistent, multi-dimensional structures with facilities to assist in the administration of these structures.

Page 64: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

64

Managed Query Environment (MQE)

Relatively new development.

Provide limited analysis capability, either directly against RDBMS products, or by using an intermediate MOLAP server.

Page 65: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

65

Managed Query Environment (MQE)

Deliver selected data directly from DBMS or via a MOLAP server to desktop (or local server) in form of a datacube, where it is stored, analyzed, and maintained locally.

Promoted as being relatively simple to install and administer with reduced cost and maintenance.

Page 66: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

66

Typical Architecture for MQE Tools

Page 67: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

67

MQE Tools - Development Issues

Architecture results in significant data redundancy and may cause problems for networks that support many users.

Ability of each user to build a custom datacube may

cause a lack of data consistency among users.

Only a limited amount of data can be efficiently maintained.

Page 68: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

68

Topic 6: OLAP Extensions to SQL

SQL promoted as easy-to-learn, nonprocedural, free-format, DBMS-independent, and international standard.

However, major disadvantage has been inability to represent many of the questions most commonly asked by business analysts.

IBM and Oracle jointly proposed OLAP extensions to SQL early in 1999, adopted as an amendment to SQL.

Page 69: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

69

OLAP Extensions to SQL

Many database vendors including IBM, Oracle, Informix, and Red Brick Systems have already implemented portions of specifications in their DBMSs.

Red Brick Systems was first to implement many essential OLAP functions (as Red Brick Intelligent SQL (RISQL)), albeit in advance of the standard.

Page 70: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

70

OLAP Extensions to SQL - RISQL

Designed for business analysts.

Set of extensions that augments SQL with a variety of powerful operations appropriate to data analysis and decision-support applications such as ranking, moving averages, comparisons, market share, this year versus last year.

Page 71: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

71

Use of the RISQL CUME Function

Show the quarterly sales for branch office B003, along with the monthly year-to-date figures.

SELECT quarter, quarterlySales, CUME(quarterlySales) AS Year-to-Date

FROM BranchSales

WHERE branchNo = 'B003';

Page 72: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

72

Use of the RISQL MOVINGAVG / MOVINGSUM Function

Show the first six monthly sales for branch office B003 without the effect of seasonality.

SELECT month, monthlySales,

MOVINGAVG(monthlySales) AS 3-MonthMovingAvg,

MOVINGSUM(monthlySales) AS 3-MonthMovingSum

FROM BranchSales

WHERE branchNo = 'B003';

Page 73: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

73

Topic 7: The Microsoft Data Warehousing Framework

The Microsoft Data Warehousing Framework is an open architecture that is easily integrated with existing systems. The Microsoft SQL Server DTS tool is used to import, export, and repair or transform data (where it is necessary). The Framework contains a rich object-oriented programming interface for customized data warehousing implementations. There is also a user interface, the Microsoft SQL Server Analysis Services Manager that can be used to configure the data warehouse and to populate or update the content in a cube. It can be used to schedule tasks, monitor performance, and perform queries on the data warehouse.

Page 74: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

74

Microsoft Data WarehouseProduct

Data warehousing software has been included with Microsoft® SQL Server™ since the release of version 7.0 in 1998.

Page 75: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

75

The Cube as a Model for the Data Warehouse

The cube is an imperfect yet satisfactory name for a data warehouse repository. How does a data warehouse cube differ from a geometrical cube? There are a few important differences. A data warehouse cube is defined by any number of dimensions (it is not limited to three, and sometimes a data-warehousing cube may have fewer than three dimensions). Dimensions describe a data-warehousing cube just as width, height, and depth describe a geometrical cube. Where it is appropriate, dimensions can be organized into any number of levels.

Page 76: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

76

The Cube as a Model for the Data Warehouse

The relationship between two dimensions can be modeled using a grid. Dimensions are like the labels of along the axes of the grid. The cells are facts. Facts correspond to the cross product of each dimension of the cube. The data in the cell is a measure. Measures are the whole reason for the cube. If the cube is about the number of items sold, the measure is a count of the number of items sold. To repeat the grid example, the measure is the number that you would find in the grid cell.

Page 77: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

77

Dimensions and Levels

Levels are used to organize dimensions into smaller units where necessary. Levels may also contain other levels, depending on how they are configured in the cube. For example, consider that there is a region dimension. Perhaps this grocery store operates in three states and uses the state boundaries as territorial boundaries. Let's say that the region dimension contains three levels: California, Oregon, and Washington. If the business has additional sub-regions such as Seattle, Olympia, Yakima, and Spokane in the state of Washington, these levels can be added as levels to the Washington region, even if such detail is not needed for the California and Oregon region. Levels are just a convenient way to organize facts for a dimension.

Page 78: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

78

Facts and Measures

A fact is about the combination of the various dimensions. Locating a fact is like using a coordinate system. Just like a position in a mathematical cube such as the origin, which might be represented by (x=0,y=0,z=0), a fact would be represented by specific combination of dimensions such as

(Product=broccoli, Region=Seattle, Time=Wednesday) yielding a specific fact about broccoli being sold in Seattle on Wednesday. Depending on the way that the cube is being used, the fact may show a measure of something like 580 units sold or perhaps a different measure like $860.00 in sales. The meaning of the measure depends on how the cube is defined.

Page 79: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

79

Aggregations

The mathematical operations of count and sum are an essential part of the reason data warehouses are useful. These are aggregations. Once dimensions are organized and a cube is being processed, the aggregations are calculated. Generally, aggregations are calculated immediately after the cube is initially populated or there is a change to the content of the cube.

Page 80: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

80

Using a Data Warehouse to Make Decisions

Consider a grocery store. Let's say that a promotion has been running for a few days and the grocer needs to decide whether or not to run the promotion again. A question that the grocer might pose would be something like, "Has more of the product been sold during the promotion period compared to prior periods?"

Page 81: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

81

Using a Data Warehouse to Make Decisions

A grocery store inventory system may record prices, products, sales, and promotions in a transactional database using normalized structures. Inventory systems are optimized for inserting and updating records and perhaps for simple procedural selections such as retrieving the cost of an item. It is much less likely that the system is organized in such a way that

it would be just as efficient to produce a report that details on a day-by-day and product-by-product basis the effectiveness of a sale. In fact, there is usually a contradiction between systems that are designed for

transactional efficiency and those that are designed for efficient queries. This is where a data warehouse should be used. The data warehouse is a separate repository that uses the relevant data from existing sources in a structure that has been optimized for selection.

Page 82: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

82

Using a Data Warehouse to Make Decisions

In this case, it is easy to use a data warehouse to answer the grocer's question. The sum of the fact records that measure the number of items sold using the cube dimensions of products, promotion, and time can produce the needed results.

Page 83: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

83

Using a Data Warehouse to Make Decisions

To contrast this technique, the information in the other systems may not even be in the same database. The inventory data source may not be the same data source as the customer data source or the employee data source. Even if the systems are in the same database, it would still be a chore to build a system of queries that will combine and aggregate the results in a way that will produce the correct answer. In fact, this effort of combining data sources and aggregating the results is just what the data warehousing software does best.

Page 84: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

84

Viewing a Slice and the Programming Interface to A Cube

Although fashioning a data warehouse in multiple dimensions may be a simple design choice, and performing queries that produce results that span several dimensions may not be a significant chore for the processor, the very constitution of multi-dimensional output can make it difficult to display. Charts, graphs, and tables are almost always presented in two dimensions. There are some good three-dimensional charting tools, but beyond that, the chart becomes more of a puzzle than a visual aid. A common technique of viewing multi-dimensional output is to view the output one two-dimensional "slice" of a cube at a time. This is the way that the Microsoft SQL Server Analysis Services Tool displays output.

Page 85: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

85

Using DSO (Decision Support Object)

Fortunately, output is not restricted to two dimensions. Microsoft SQL Server Analysis Services provides a programming interface to multi-dimensional data warehouse output: the DSO (Decision Support Object). DSO can be used programmatically to access the various dimensions.

Page 86: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

86

Using MDX

MDX (multidimensional extensions) is a syntax designed for querying multidimensional objects and data. For such systems, it is more efficient and intuitive to use than SQL, which was designed for an entirely different set of objects. The grammar of an MDX query has a similar feel to the grammar of an SQL query. Observe the MDX query below which would select the sales figures for the broccoli sold in Seattle on Wednesday that I used in the example above:

Page 87: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

87

Using MDX

Microsoft SQL Server OLAP Services provides an architecture for access to multidimensional data. This data is summarized, organized, and stored in multidimensional structures for rapid response to user queries. Through OLE DB for OLAP, a PivotTable Service® provides client access to this multidimensional online analytical processing

(OLAP) data. For expressing queries to this data, OLE DB for OLAP employs a full-fledged, highly functional expression syntax: multidimensional expressions (MDX).

 

Page 88: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

88

Using MDX

OLE DB for OLAP is a set of Component Object Model (COM) interfaces designed to extend OLE DB for efficient access to multidimensional data. ADO has been extended with new objects, collections, and methods that are designed to take advantage of OLE DB for OLAP. These extensions are collectively known as ADO MD (multidimensional) and are designed to provide a simple, high-level object model for accessing tabular and OLAP data.

 

Page 89: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

89

MDX expression

SELECT axis specification ON COLUMNS, axis specification ON ROWS FROM cube_name WHERE slicer_specification

Page 90: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

90

SELECT

[Measures].[Sales] ON COLUMNS [Time].[Wednesday] ON ROWS FROM MySalesCube WHERE [Region].[Washington].[Seattle] AND [Product].[Vegetable].[Broccholi]   The output of this query would be a

column labeled Sales, a row labeled Wednesday and a single grid cell at the intersection with the sales figure

$860.00.

Page 91: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

91

SELECT

The SQL Server Analysis Server Manager has an interface that accepts MDX queries. Alternatively, MDX queries can be incorporated into programs that employ the DSO.

Page 92: 1 Chapter 8 OLAP IN THE DATA WAREHOUSE Paul Chen Data Warehouse Fundamentals

92

The Microsoft Data Warehousing Framework

There is more information about how to build a cube using the Microsoft SQL Server Analysis Services Manager at MSDN Online, as well as guidelines

to consider for the design and configuration of your data warehouse. See the "How to" article at this link:

http://msdn.microsoft.com/library/psdk/sql/aghtintro_2vov.htm.