
A Gentle Introduction to Microsoft SSAS



Shows you how to build an OLAP cube in Microsoft Analysis Services


Analysis Services Overview

• Included with the SQL Server license

• Developed in the Business Intelligence Development Studio (BIDS), a special version of Visual Studio

• Microsoft’s application for creating multidimensional OLAP databases, which are queried with the MDX language

• Microsoft’s powerful data mining platform, which includes sophisticated algorithms that can operate on relational or OLAP data

• This presentation demonstrates the creation of an OLAP database, also called a cube


Why Store Data in a Cube?


Analysis Services, like the SQL Server relational database, is a platform for storing data. But unlike the relational database, it does not store data in tables; instead, the data is stored in the other types of structures that make up a cube.

Why store data in cubes instead of tables? There are a number of reasons:

• better query performance

• fast, optimized aggregation calculations

• more efficient storage of data through a simplified, read-oriented design

• richer calculation possibilities, supporting stored measures, calculated measures, and key performance indicators (KPIs)

• a data model that is easier for the end user to understand

OLAP data is engineered to provide dimensional data views, hierarchical browsing, and attribute-based breakouts and filtering. And OLAP data does not require, or even use, table joins!


Creating an Analysis Services (OLAP) Database

Creating an Analysis Services database consists of creating the database structure, then populating that structure with data from external data sources. The structures comprising an OLAP cube include:

• Dimensions and their associated elements

– Hierarchies

– Attributes

• Measures and their associated elements

– Stored Measures

– Calculated Measures

– Measure Groups

– Key Performance Indicators (KPIs)

– Measure Profiles
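To make the list above concrete, here is a minimal Python sketch of how these structures relate to one another. It is a simplified, hypothetical model and not the object model SSAS itself uses. The class names, the `FACT_UNITS` table name, and the sample dimensions are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Hierarchy:
    name: str
    levels: list[str]  # ordered from the coarsest level to the finest


@dataclass
class Dimension:
    name: str
    attributes: list[str]
    hierarchies: list[Hierarchy] = field(default_factory=list)


@dataclass
class Measure:
    name: str
    aggregate: str  # how the measure rolls up, e.g. "sum" or "count"
    calculated: bool = False  # True for measures defined by an expression


@dataclass
class MeasureGroup:
    name: str
    fact_table: str  # the fact table that supplies the stored measures
    measures: list[Measure]


@dataclass
class Cube:
    name: str
    dimensions: list[Dimension]
    measure_groups: list[MeasureGroup]

    def measure_names(self) -> list[str]:
        return [m.name for g in self.measure_groups for m in g.measures]


# Hypothetical example: a Time dimension with a Year > Quarter > Month hierarchy,
# a Product dimension without a hierarchy, and one measure group from a fact table.
time_dim = Dimension("Time", ["Year", "Quarter", "Month"],
                     [Hierarchy("Calendar", ["Year", "Quarter", "Month"])])
product_dim = Dimension("Product", ["Category", "Item"])
units_group = MeasureGroup("Fact Units", "FACT_UNITS",
                           [Measure("Units", "sum"), Measure("Fact Units Count", "count")])
cube = Cube("Sales", [time_dim, product_dim], [units_group])

print(cube.measure_names())  # ['Units', 'Fact Units Count']
```

In this model, dimensions and measure groups are independent lists on the cube. This mirrors the split in the outline: dimensions supply the ways to slice the data, and measures supply the numbers being sliced.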


Development Steps

1. Create Analysis Services Project

2. Create Data Source

3. Create Data Source View

4. Create cube object definitions

5. Deploy definitions to the OLAP server and load in data

6. Specify partitions and aggregations



Open BIDS and create a new Analysis Services project.



Create a Data Source and a Data Source View


In the data source, you identify the relational data warehouse (DW) that hosts the source data and provide the connection information.

In the data source view, you specify which tables in the DW you want to use to supply data to the cube.

If the DW has been designed using the classic star schema or snowflake approach, the SSAS Cube Wizard can examine the DW’s structure and derive a cube definition from it, making your job easier.

When the initial design from the wizard is complete, you can go back and make modifications. For example, you can add or modify hierarchies and attribute relationships.

Once you have the cube design you want, you load the data in.


In this illustration, all tables from the star schema DW are used; only the sysdiagrams table (which holds the database diagram data) remains unselected.



Once you have defined the data source view, you can inspect the schema. By right-clicking a table, you can browse its data and define derived columns. But at this point, we still have no OLAP database. Creating the OLAP database begins with creating a cube.


Creating the Cube Structure


1. With the data source view in place, you are now ready to create the cube. Right-click the Cubes folder and select “New Cube.” This launches the Cube Wizard.

2. Make sure the wizard has correctly identified which tables are dimension tables and which are fact tables, and tell it which table contains the data for the time dimension.
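The wizard's actual detection logic is not described in this presentation. The Python sketch below is a hypothetical heuristic that shows the kind of inference involved: a table whose foreign keys point to two or more other tables is treated as a fact table, and every other table is treated as a dimension table. The schema and table names are assumptions. The sketch also illustrates why step 2 matters: a rule like this would misclassify, for example, a snowflaked dimension table that references two parent tables.

```python
def classify_tables(references: dict[str, list[str]]) -> dict[str, str]:
    """Map each table name to "fact" or "dimension".

    `references` maps a table name to the tables its foreign keys point to.
    Hypothetical rule: two or more outgoing references means a fact table.
    """
    return {table: "fact" if len(refs) >= 2 else "dimension"
            for table, refs in references.items()}


# Hypothetical star schema with one snowflaked branch (product -> category).
schema = {
    "FACT_UNITS": ["DIM_TIME", "DIM_PRODUCT", "DIM_STORE"],
    "DIM_TIME": [],
    "DIM_PRODUCT": ["DIM_CATEGORY"],
    "DIM_CATEGORY": [],
    "DIM_STORE": [],
}

roles = classify_tables(schema)
print(roles["FACT_UNITS"], roles["DIM_PRODUCT"])  # fact dimension
```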


Setting the Time Dimension Parameters


Time is a unique dimension with inherent assumptions about how it should work.

You identify the time dimension as such so that the MDX functions specific to it (such as PrevMember and ParallelPeriod) will work.

You also specify which of the source data columns map to well-known time concepts such as years, quarters, and months.
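Once the columns are mapped, the server knows how members at one time level relate to members at the others. The Python sketch below is an illustrative model only and does not reflect how SSAS computes anything internally. It derives the year, quarter, and month levels from a date, then mimics what PrevMember and a one-year ParallelPeriod return for a month member. Representing a month member as a `(year, month)` tuple is an assumption of the sketch.

```python
from datetime import date


def time_levels(d: date) -> tuple[int, int, int]:
    """Return the (year, quarter, month) a date belongs to."""
    return d.year, (d.month - 1) // 3 + 1, d.month


def prev_member(year: int, month: int) -> tuple[int, int]:
    """The previous member at the month level; January steps back to December."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def parallel_period(year: int, month: int, years_back: int = 1) -> tuple[int, int]:
    """The same month a given number of years earlier."""
    return year - years_back, month


print(time_levels(date(2008, 5, 17)))  # (2008, 2, 5)
print(prev_member(2008, 1))            # (2007, 12)
print(parallel_period(2008, 5))        # (2007, 5)
```

The January case in `prev_member` is the reason the time structure matters: the previous month crosses a year boundary, which the server can only resolve if it knows how months nest inside years.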


After you define the time properties, the wizard displays the measures that it defined. Here you can select which ones you want to keep. I am keeping all of them in this illustration.

The “Fact Units Count” measure counts the number of records in the source table. This information is used in optimizing the aggregation process.

The Review New Dimensions panel displays the dimensions that were created, along with their hierarchies and attributes.


A cube has now been defined. This panel lets you review the data model (the Unified Dimensional Model, or UDM) that has been created.



To complete the wizard, you give the cube a name.



Solution Explorer, on the right side of the screen, now shows the cube and dimensions that were defined. These objects can be modified by clicking on them.



Click the Product dimension to observe that no hierarchy was defined by the wizard. I will create a hierarchy for it.



A new hierarchy is created by dragging attributes into the center panel. I use Category Code and Dim Product to form the hierarchy; Category and Item provide the labels the user will see (they are mapped to each attribute’s NameColumn property).

Next I define attribute relationships between these hierarchy levels. This lets the aggregation process add up the data from the level directly beneath each level instead of always going back to the leaf level, which would be less efficient and require significantly more calculations.
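The benefit of attribute relationships can be sketched in Python using hypothetical monthly figures. Quarterly subtotals are built from the months, and the yearly total is then built from the four quarterly subtotals instead of the twelve months. The data, the `roll_up` helper, and the use of 2008 as the year key are assumptions for illustration. The "inputs" count is a simple stand-in for the amount of work each roll-up does.

```python
from collections import defaultdict

# Hypothetical leaf-level data: units sold per month of one year.
month_units = {1: 10, 2: 12, 3: 9, 4: 14, 5: 11, 6: 13,
               7: 8, 8: 15, 9: 10, 10: 12, 11: 9, 12: 16}
# Attribute relationship Month -> Quarter, expressed as a parent lookup.
month_to_quarter = {m: (m - 1) // 3 + 1 for m in range(1, 13)}


def roll_up(child_values: dict, child_to_parent: dict) -> tuple[dict, int]:
    """Sum child values into their parents.

    Returns the parent totals and the number of input values read.
    """
    totals = defaultdict(int)
    for child, value in child_values.items():
        totals[child_to_parent[child]] += value
    return dict(totals), len(child_values)


quarter_units, quarter_inputs = roll_up(month_units, month_to_quarter)
# Quarter -> Year: the yearly total reuses the four quarterly subtotals.
year_units, year_inputs = roll_up(quarter_units, {q: 2008 for q in quarter_units})
# Without the relationship, the year would be summed from the 12 leaf months.
year_from_leaf, leaf_inputs = roll_up(month_units, {m: 2008 for m in month_units})

print(quarter_units)                # {1: 31, 2: 38, 3: 33, 4: 37}
print(year_units, year_inputs)      # {2008: 139} 4
print(year_from_leaf, leaf_inputs)  # {2008: 139} 12
```

Both paths produce the same total. The path through the intermediate level reads a third as many values, and the saving grows as the leaf level gets larger.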



In the case of the time dimension, there is an additional step. We want the months to display in chronological order, not alphabetical order, so we assign a value to the month attribute’s OrderByAttribute property. The data source has a column called CHRON_ORDER that contains this ordering information.
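The difference between the two orderings is easy to see in Python. This sketch assumes hypothetical time-dimension rows of the form (month name, CHRON_ORDER). Sorting by the name alone gives alphabetical order, while sorting by the CHRON_ORDER value gives calendar order.

```python
# Hypothetical rows from the time dimension table: (month name, CHRON_ORDER).
months = [("January", 1), ("February", 2), ("March", 3), ("April", 4),
          ("May", 5), ("June", 6), ("July", 7), ("August", 8),
          ("September", 9), ("October", 10), ("November", 11), ("December", 12)]

# Ordering by the member name: alphabetical.
by_name = [name for name, _ in sorted(months)]
# Ordering by the CHRON_ORDER column: chronological.
by_chron = [name for name, _ in sorted(months, key=lambda row: row[1])]

print(by_name[:3])   # ['April', 'August', 'December']
print(by_chron[:3])  # ['January', 'February', 'March']
```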


The information entered to define the hierarchies and attributes is stored in XML files. The database doesn’t actually know what you have done yet. You must “process” the dimension. This brings that information into the OLAP database, which then uses it to create its internal structures.



Once a dimension has been processed, you can inspect it in the browser pane to verify that the hierarchy has the expected structure. Note that the months appear in chronological order.



After all the elements have been defined and the cube has been completely processed, you can inspect the data in the SSAS data browser.



Partitioning and Pre-Aggregating


The cube has been defined and created. The leaf level data from the data warehouse was loaded. We inspected the data in the multidimensional data browser.

We can tweak the physical design of the cube to improve scalability and query performance. Three primary mechanisms for doing this are:

• Selecting ROLAP/HOLAP/MOLAP data storage options

• Partitioning

• Pre-aggregating

MOLAP is probably the most commonly used data storage option, and it is the default. It means all the data is stored in the multidimensional cube. The illustrations on the following slides use MOLAP.

At the opposite end of the spectrum is ROLAP, where all data comes from relational data tables. With ROLAP, the SSAS database only provides metadata structures for presenting information in the dimensional style. You can expect performance to be much slower. This mode is used when the source data changes frequently and you want the reports to reflect those changes immediately.

HOLAP is a hybrid approach: all the aggregate data is stored in the SSAS database, while the leaf-level data remains in relational tables.

When the volume of data is very large, it can be helpful to chop the data store into pieces, or partitions. The storage mode is selectable per partition. This example uses a small amount of data, so only one partition is employed.
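The storage modes and per-partition settings described above can be summarized in a small Python model. It assumes a hypothetical cube split by year, with older data in a MOLAP partition and the current year in a ROLAP partition so that reports reflect changes immediately. The partition names, year ranges, and the `source_for` helper are illustrative assumptions, not SSAS settings.

```python
from dataclasses import dataclass
from enum import Enum


class Storage(Enum):
    MOLAP = "MOLAP"  # leaf data and aggregates stored in the cube
    HOLAP = "HOLAP"  # aggregates in the cube, leaf data left in relational tables
    ROLAP = "ROLAP"  # all data read from the relational source


@dataclass
class Partition:
    name: str
    years: range  # hypothetical slice of the fact data this partition covers
    storage: Storage

    def source_for(self, leaf_level: bool) -> str:
        """Where a query against this partition reads its data from."""
        if self.storage is Storage.MOLAP:
            return "cube"
        if self.storage is Storage.HOLAP:
            return "relational" if leaf_level else "cube"
        return "relational"


# Hypothetical design: static history in MOLAP, frequently changing data in ROLAP.
partitions = [
    Partition("History", range(2000, 2008), Storage.MOLAP),
    Partition("Current", range(2008, 2009), Storage.ROLAP),
]


def partition_for(year: int) -> Partition:
    return next(p for p in partitions if year in p.years)


print(partition_for(2005).name, partition_for(2005).source_for(leaf_level=True))  # History cube
print(partition_for(2008).source_for(leaf_level=False))  # relational
```

Because the storage mode is stored on each partition rather than on the cube, each slice of the data can trade query speed for freshness independently.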


Aggregates are summary-level data computed from the leaf-level data that was loaded from the source. Often the aggregates are totals and subtotals, but other summary statistics such as averages or maximum values can also be used.

Pre-calculating and storing the aggregate values normally improves query performance, at the cost of the storage space and time required to compute them. By default, no aggregates are pre-calculated, as the Partitions panel shows.

The data display shown earlier from the SSAS data browser included many aggregate data values. Those aggregates were all calculated on the fly.

You can pre-calculate all aggregates or only some of them. If you are going to pre-calculate only some, different strategies can be employed to determine which are chosen for calculation. You’ll see this ahead.
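The difference between stored and on-the-fly aggregates can be sketched in Python. The sketch assumes hypothetical fact rows with two dimensions (year and category). Every subset of the dimensions is treated as a candidate aggregation. This simplifies real SSAS aggregations, which are defined over attribute levels. Only the yearly totals are stored, and other requests are computed from the leaf rows at query time.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical leaf-level fact rows: (year, category, units).
facts = [(2007, "Toys", 5), (2007, "Books", 3), (2008, "Toys", 7),
         (2008, "Books", 4), (2008, "Toys", 2)]
DIMS = ("year", "category")


def aggregate(rows, group_by: tuple[str, ...]) -> dict:
    """Total units for each combination of the group_by dimensions."""
    idx = [DIMS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


# Every subset of the dimensions is a candidate aggregation (2**n of them).
candidates = [c for r in range(len(DIMS) + 1) for c in combinations(DIMS, r)]

# Pre-calculate only some of them: here, just the yearly totals.
stored = {("year",): aggregate(facts, ("year",))}


def query(group_by: tuple[str, ...]) -> dict:
    """Answer from a stored aggregate if one exists, else compute on the fly."""
    if group_by in stored:
        return stored[group_by]
    return aggregate(facts, group_by)


print(len(candidates))         # 4
print(query(("year",)))        # {(2007,): 8, (2008,): 13}
print(query(("category",)))    # {('Toys',): 14, ('Books',): 7}
```

The number of candidates doubles with each added dimension, and grows faster still once hierarchy levels are counted. This is why pre-calculating everything is rarely practical and why a strategy for picking a subset is needed.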


Let’s go through the aggregation process. Click the “Design Aggregations” hyperlink to bring up the wizard. In the first panel of the wizard, push the Count button to compute the statistics that drive the aggregation optimization process.



After a few seconds, the source data has been analyzed: the wizard counts the number of records per partition.

Once the statistics have been computed, you can ask the system to identify a set of aggregations to perform. You can direct the system to aggregate until A) a certain amount of storage has been used, B) a certain level of performance gain has been achieved, or C) you click Stop; or D) you can tell it not to design any aggregations.

In this illustration, I am asking it to aggregate until it reaches a performance gain of 75%. The system runs an optimization algorithm to determine the best aggregations to use.
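The wizard's optimization algorithm is not described in this presentation, so the Python sketch below uses a generic greedy strategy as a stand-in. It is not the SSAS algorithm. The sketch ranks hypothetical candidate aggregations by estimated saving per stored row, then adds them until a target gain is reached (option B) or a storage budget would be exceeded (option A). The candidate names, sizes, and savings are invented. "Gain" here is simply the fraction of the total estimated saving, which is also an assumption.

```python
# Hypothetical candidates: (name, estimated size in rows, estimated query-cost saving).
candidates = [
    ("Year x Category", 40, 30.0),
    ("Quarter x Category", 160, 25.0),
    ("Month x Item", 5000, 20.0),
    ("Year", 5, 15.0),
    ("Quarter x Item", 2000, 10.0),
]


def design_aggregations(candidates, target_gain=0.75, max_rows=None):
    """Greedily choose aggregations with the best saving per stored row.

    Stops once the cumulative gain reaches target_gain; skips any candidate
    that would push total storage past max_rows (if a budget is given).
    """
    total_saving = sum(saving for _, _, saving in candidates)
    chosen, gain, rows = [], 0.0, 0
    ranked = sorted(candidates, key=lambda c: c[2] / c[1], reverse=True)
    for name, size, saving in ranked:
        if gain >= target_gain:
            break
        if max_rows is not None and rows + size > max_rows:
            continue
        chosen.append(name)
        rows += size
        gain += saving / total_saving
    return chosen, gain, rows


chosen, gain, rows = design_aggregations(candidates, target_gain=0.75)
print(chosen)  # ['Year', 'Year x Category', 'Quarter x Category', 'Quarter x Item']
print(f"{gain:.0%}", rows)  # 80% 2205
```

The result overshoots the 75% target because the last aggregation chosen adds 10 points at once. A real optimizer faces the same granularity, which may be why the wizard's chart rises in steps rather than smoothly.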


The system generates a chart showing what percentage of the total possible number of aggregations it has identified and what level of performance gain would be achieved by computing them.

At the completion of that step, the wizard has identified which aggregations to compute. You may elect to have it compute them now, or you can defer the calculations until later. (They could take a while.)


After you select “Deploy and Process now” and push Finish, you arrive at this screen.

Push the Run button to perform the calculations.

When it finishes, you get a message announcing the successful completion of the deployment.

The information under the Aggregations tab will be updated.


Creating Derived Measures and KPIs


Different Kinds of Reporting Data

• Calculated measures

– Percents

– Shares

– Differences

• Key Performance Indicators (KPIs)


Thus far, all the measures that have been constructed have been displays of stored data, or of aggregates either stored or calculated on the fly. There are other kinds of information that can be made available to an end user.

Calculated measures are computed on the fly using MDX expressions. KPIs are measures with associated goals and graphics. I will show an example of both.


In this example, I create a calculated measure that gives the difference between the data value at a given time and its value in the previous time period. The calculation is defined on the Calculations tab, where it is given a name and an MDX expression. In this example I make use of the PrevMember function.
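The logic of this calculated measure can be sketched in Python with hypothetical monthly Units values. The sketch computes the value for a month minus the value for its previous month, which is roughly what an MDX expression such as `[Measures].[Units] - ([Measures].[Units], [Time].[Calendar].CurrentMember.PrevMember)` does. That expression and its dimension and hierarchy names are assumptions, because the actual expression is not shown here. Returning `None` when no previous month exists is a choice made for this sketch and does not model how MDX handles an empty cell.

```python
# Hypothetical monthly Units values keyed by (year, month).
units = {(2008, 1): 150, (2008, 2): 165, (2008, 3): 160}


def prev_month(year: int, month: int) -> tuple[int, int]:
    """The previous month member; January steps back to December."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def units_increase(year: int, month: int):
    """Units at this month minus Units at the previous month.

    Returns None when the previous month is not in the data.
    """
    prev = units.get(prev_month(year, month))
    if prev is None:
        return None
    return units[(year, month)] - prev


for y, m in sorted(units):
    print((y, m), units[(y, m)], units_increase(y, m))
```

Printing the two values side by side reproduces the check described on the next slide: each increase should equal the current month's Units minus the previous month's Units.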


Displaying the Units measure and the Units Increase measure side by side demonstrates that the calculated measure correctly computes the difference between the current value and the value a month earlier.

In the next series of slides I will use this calculated measure to construct a KPI.


What Is a Key Performance Indicator (KPI)?


Every KPI starts life as a measure, presumably one that indicates company performance. With each KPI, we assume that the company has established a target value, or goal, for what that indicator should be. For instance, sales revenue might be a performance indicator. The goal might be to sell at least $100,000 in a given quarter.

The KPI calculates the difference between the goal and the actual result. We assume the company can assess those differences, declaring them either good, so-so, or bad. For instance, the company may say that revenue above $100,000 is good, $90,000 to $100,000 is so-so, and revenue below $90,000 is bad.

This brings us to an essential distinguishing feature of the KPI: a graphical icon, known as an indicator, that is displayed to communicate the status of the KPI to the end user. That graphic might be a happy face to show good, a neutral face to show so-so, and a frowning face to show bad. Traffic lights with green, amber, and red are often used. The choice of graphics is up to the client.

Setting up a KPI in Analysis Services entails computing a status value. The difference between the performance measure and the goal is calculated, and the differences that are “good” are mapped to the number 1, so-so to 0, and bad to -1. That number is the KPI’s status.

Optionally, you can define a trend for the KPI. The trend shows whether, over time, the performance measure has been moving upward or downward.
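The status and trend logic from the revenue example can be sketched in Python. The thresholds come from the example above: revenue above $100,000 is good, $90,000 to $100,000 is so-so, and revenue below $90,000 is bad. In Analysis Services these would be MDX expressions. The function names, the `tolerance` parameter, and the simple up/down/flat trend rule are assumptions for illustration.

```python
def kpi_status(value: float, goal: float = 100_000, tolerance: float = 10_000) -> int:
    """1 = good (above goal), 0 = so-so (within tolerance below goal), -1 = bad."""
    if value > goal:
        return 1
    if value >= goal - tolerance:
        return 0
    return -1


def kpi_trend(current: float, previous: float) -> int:
    """1 if the measure rose since the previous period, -1 if it fell, 0 if flat."""
    return (current > previous) - (current < previous)


# The client chooses the graphics; these labels follow the example in the text.
INDICATOR = {1: "happy face", 0: "neutral face", -1: "frowning face"}

print(INDICATOR[kpi_status(95_000)], kpi_trend(95_000, 88_000))  # neutral face 1
```

Status and trend are computed independently. For example, a KPI can be "bad" and still trending upward, which is why the two are usually shown as separate graphics.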


KPI Summary

• Begin with a measure indicating company performance

• Have goals associated with that performance measure

• Translate the difference between performance and goal into its status, with values of 1, 0, and -1 (corresponding to good, so-so, and bad)

• Display the status of the performance measure to the user as a graphic



You define KPIs on the KPIs tab of the Cube Designer. In this simple illustration, our calculated measure, “Units difference,” is the performance indicator, and the goal is a constant value of 180. MDX expressions can be used to provide more complex goal statements.


Once you have defined the KPI, you can inspect it in the browser view of the KPIs tab. Here you see that the performance metric has a value of 179, just under the target value. This is “so-so,” and you see the neutral face showing.