Data Warehouse and SSAS Terms

Karan Gulati (SSAS Maestro)

About Presenter

Karan Gulati is a SQL Server Analysis Services Maestro (MCM) and has worked as a Support Escalation Engineer at Microsoft for the last five years. He currently focuses on SQL BI and SQL PDW. He is an active blogger and has contributed to multiple whitepapers published on the MSDN and TechNet sites. He has also written tools that are available on CodePlex.

Data Warehousing Concepts

Overview of Data Warehousing and Analysis Services terms

What We Are Covering

Understanding the terms used in the SSAS and data warehousing world:

• What is a data warehouse?
• OLAP
• Cube
  • Measures
  • Dimensions
• Schema
  • Star
  • Snowflake
• Surrogate keys
• Slowly Changing Dimensions
  • SCD1
  • SCD2
  • SCD3

Data Warehousing

A data warehouse is a general structure for storing the data needed for good business intelligence (BI).

Data in a warehouse is of little use until it is converted into the information that decision makers need.

The large relational databases typical of data warehouses need additional help to convert that data into information.

Why Use OLAP?

• Provides fast, interactive access to aggregated data and the ability to drill down to detail.
• Lets users view and interrogate large volumes of data (often millions of rows) by pre-aggregating the information.
• Puts the data needed to make strategic decisions directly into the hands of decision makers, both through predefined queries and reports and by letting end users run their own ad hoc queries, which minimizes their dependence on database developers.
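The drill-down idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Analysis Services works internally: the sales rows and the helper names pre_aggregate and drill_down are hypothetical. Totals are computed once at two levels (year and year/month), so a user can start from yearly totals and then drill into the months of one year without rescanning the detail rows.

```python
from collections import defaultdict

# Hypothetical detail rows: (year, month, sales_amount)
detail_rows = [
    (2012, 1, 100.0),
    (2012, 1, 50.0),
    (2012, 2, 75.0),
    (2013, 1, 200.0),
    (2013, 3, 30.0),
]

def pre_aggregate(rows):
    """Build year-level and month-level totals in a single pass over the detail rows."""
    by_year = defaultdict(float)
    by_month = defaultdict(float)
    for year, month, amount in rows:
        by_year[year] += amount
        by_month[(year, month)] += amount
    return dict(by_year), dict(by_month)

def drill_down(by_month, year):
    """Return the month-level totals for one year, read from the pre-aggregated data."""
    return {month: total for (y, month), total in sorted(by_month.items()) if y == year}

year_totals, month_totals = pre_aggregate(detail_rows)
print(year_totals)                     # summary level
print(drill_down(month_totals, 2012))  # detail level for one year
```

The detail rows are read once; every later question is answered from the much smaller summary dictionaries.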

OLAP Secret

OLAP leverages existing data from a relational schema or data warehouse (the data source) by placing key performance indicators (measures) into context (dimensions). Once the data is processed into a multidimensional database (a cube), the measures are pre-aggregated, which makes data retrieval significantly faster. The processed cube can then be made available to business users, who can browse the data with a variety of tools, making ad hoc analysis an interactive, analytical process rather than a development effort. SQL Server 2005's BI Workbench substantially improves on the BI capabilities of SQL Server 2000.
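Placing measures into the context of dimensions and pre-aggregating them can be pictured with a small Python model. This is a toy sketch under stated assumptions, not how Analysis Services stores or processes a cube: it takes hypothetical fact rows with two dimensions (region, product) and one measure (sales), computes the total for every combination of dimension levels up front (including the "All" level), and then answers queries by dictionary lookup instead of scanning the facts.

```python
from collections import defaultdict
from itertools import combinations

ALL = "All"  # marker for a dimension rolled up to its top level

# Hypothetical fact rows: (region, product, sales)
facts = [
    ("East", "Bikes", 10),
    ("East", "Helmets", 5),
    ("West", "Bikes", 7),
    ("West", "Bikes", 3),
]
DIMENSIONS = ("region", "product")

def process_cube(rows):
    """Pre-aggregate the sales measure for every subset of dimensions."""
    cube = defaultdict(int)
    for region, product, sales in rows:
        members = {"region": region, "product": product}
        # For each subset of dimensions kept at detail level, roll the others up to ALL.
        for size in range(len(DIMENSIONS) + 1):
            for kept in combinations(DIMENSIONS, size):
                key = tuple(members[d] if d in kept else ALL for d in DIMENSIONS)
                cube[key] += sales
    return dict(cube)

cube = process_cube(facts)

def query(region=ALL, product=ALL):
    """Answer a query by looking up a pre-computed cell; empty cells return 0."""
    return cube.get((region, product), 0)

print(query())                 # grand total
print(query(region="West"))    # all products in the West
print(query(product="Bikes"))  # bikes in all regions
```

The cost moves to processing time: the number of stored cells grows with the number of dimension-member combinations, which is why real OLAP engines choose which aggregations to build rather than always materializing every one.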

SQL BI Tools

The SQL Server BI Workbench suite consists of five basic tools:
• SQL Server Relational Database: used to create the relational database.
• Analysis Services: used to create the multidimensional model (measures, dimensions, and schema).
• Integration Services (formerly Data Transformation Services, DTS): used to extract, transform, and load data from the source(s) into the data warehouse or schema.
• Reporting Services: used to build and manage enterprise reporting from relational or multidimensional sources.
• Data Mining: used to extract information based on predetermined algorithms.

Architecture

What Is a Cube?

A cube is a collection of one or more related measure groups and their associated dimensions.

Cube Example

Consider the following Imports cube. It contains:
• Two measures:
  • Packages
  • Last
• Three related dimensions:
  • Route
  • Source
  • Time
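One way to picture the Imports cube is as a small data structure: each measure carries an aggregation function, and each cell is addressed by choosing members from the dimensions. The Python sketch below is hypothetical. The slide names only the measures (Packages, Last) and dimensions (Route, Source, Time), so the member names, the sample rows, and the choice of sum for Packages and max (latest import date) for Last are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, Dict, List, Tuple

@dataclass
class Measure:
    name: str                              # measure name shown to users
    column: str                            # fact column it reads
    aggregate: Callable[[List[Any]], Any]  # how detail values roll up

@dataclass
class Cube:
    name: str
    dimensions: Tuple[str, ...]
    measures: List[Measure]
    facts: List[Dict[str, Any]]

    def cell(self, **members: str) -> Dict[str, Any]:
        """Aggregate every measure over the fact rows that match the given members.
        Dimensions not mentioned are rolled up across all of their members."""
        unknown = set(members) - set(self.dimensions)
        if unknown:
            raise ValueError(f"unknown dimensions: {sorted(unknown)}")
        rows = [f for f in self.facts if all(f[d] == m for d, m in members.items())]
        if not rows:
            return {}
        return {m.name: m.aggregate([r[m.column] for r in rows]) for m in self.measures}

# Hypothetical Imports cube: Packages is summed, Last keeps the latest import date.
imports = Cube(
    name="Imports",
    dimensions=("Route", "Source", "Time"),
    measures=[Measure("Packages", "packages", sum), Measure("Last", "last", max)],
    facts=[
        {"Route": "Air", "Source": "Europe", "Time": "Q1", "packages": 60, "last": date(2013, 3, 28)},
        {"Route": "Sea", "Source": "Europe", "Time": "Q1", "packages": 40, "last": date(2013, 2, 15)},
        {"Route": "Air", "Source": "Asia", "Time": "Q2", "packages": 25, "last": date(2013, 6, 10)},
    ],
)

print(imports.cell(Route="Air"))
print(imports.cell(Source="Europe", Time="Q1"))
```

Attaching the aggregation function to the measure, rather than to the query, reflects the point that different measures roll up differently: adding package counts is meaningful, while the latest date is a maximum, not a sum.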

Elements of Cubes

• Measures
• Dimensions
• Schema
  • Star
  • Snowflake

Measures

Measures are the key performance indicators that you want to evaluate.

To determine which of the numbers in the data might be measures, here is a rule of thumb:

If a number makes sense when it is aggregated, then it is a measure.

Dimensions

Dimensions are the categories of data analysis.

Here is the rule of thumb: When a report is requested "by" something, that something is usually a dimension.
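Both rules of thumb show up in a single report request such as "sales amount by region": the number being totalled is the measure, and whatever follows "by" becomes the grouping key, that is, the dimension. The rows, column names, and the report helper in this Python sketch are hypothetical. Summing a column such as order_no would produce a number that means nothing, which is why it fails the measure test.

```python
from collections import defaultdict

# Hypothetical order lines: order_no is an identifier, sales_amount is a measure
orders = [
    {"order_no": 1001, "region": "North", "sales_amount": 120.0},
    {"order_no": 1002, "region": "South", "sales_amount": 80.0},
    {"order_no": 1003, "region": "North", "sales_amount": 30.0},
]

def report(rows, measure, by):
    """Total `measure` for each member of the dimension named in `by`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[by]] += row[measure]
    return dict(totals)

print(report(orders, measure="sales_amount", by="region"))
```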

Schema

The methodology for arranging your fact and master (dimension) tables:

• Star schema
• Snowflake schema

Star Schema

The figure shows a basic star schema, with the dimension tables arranged around a central fact table that contains the measures. A fact table contains a column for each measure as well as a column for each dimension. Each dimension column has a foreign-key relationship to the related dimension table, and the dimension columns taken together form the key of the fact table.
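A star schema of this shape can be sketched with Python's built-in sqlite3 module. The table and column names (fact_sales, dim_date, dim_product, dim_store) and all rows are hypothetical: the fact table holds one measure column plus one foreign-key column per dimension, the dimension columns together form its primary key, and a typical query joins each dimension directly to the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, city TEXT);

-- Central fact table: one foreign-key column per dimension plus the measure.
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    sales_amount REAL,
    PRIMARY KEY (date_key, product_key, store_key)  -- dimension columns form the key
);

INSERT INTO dim_date    VALUES (1, 2013, 1), (2, 2013, 2);
INSERT INTO dim_product VALUES (10, 'Road Bike', 'Bikes'), (11, 'Helmet', 'Accessories');
INSERT INTO dim_store   VALUES (100, 'Seattle'), (101, 'Boston');
INSERT INTO fact_sales  VALUES (1, 10, 100, 500.0), (1, 11, 100, 40.0),
                               (2, 10, 101, 450.0), (2, 11, 101, 35.0);
""")

# Star join: every dimension table joins directly to the fact table.
rows = conn.execute("""
    SELECT p.category, s.city, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_store   AS s ON s.store_key   = f.store_key
    GROUP BY p.category, s.city
    ORDER BY p.category, s.city
""").fetchall()
print(rows)
```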

Snowflake

Normalizing each of the dimension tables, so that each dimension is spread across several joined tables, results in a snowflake schema. It is called a snowflake schema because the “points” of the star get broken up into little branches that look like a snowflake.
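A snowflaked product dimension can be sketched with Python's built-in sqlite3 module. The table names and rows are hypothetical: the category attribute is moved out of the product table into its own table, so a query by category has to walk an extra join (fact, then product, then category) instead of reading the category directly from the product dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The product dimension is normalized: category moves into its own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name TEXT,
                           category_key INTEGER REFERENCES dim_category(category_key));
CREATE TABLE fact_sales   (product_key  INTEGER REFERENCES dim_product(product_key),
                           sales_amount REAL);

INSERT INTO dim_category VALUES (1, 'Bikes'), (2, 'Accessories');
INSERT INTO dim_product  VALUES (10, 'Road Bike', 1), (11, 'Mountain Bike', 1), (12, 'Helmet', 2);
INSERT INTO fact_sales   VALUES (10, 500.0), (11, 650.0), (12, 40.0), (12, 35.0);
""")

# Reaching the category now takes two joins: fact -> product -> category.
rows = conn.execute("""
    SELECT c.category_name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_key  = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(rows)
```

The trade-off is visible in the query: each category name is stored once rather than once per product, at the cost of an extra join in every query that groups by category.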

Which Schema Works for You?

Good question: it all depends on your requirements. A star schema is much simpler to understand and manage than a snowflake schema, but in the real world you can't always fit everything about a dimension into one table, so some normalization has to be done.

Surrogate Keys

Also known as:
• Meaningless keys
• Substitute keys
• Non-natural keys
• Artificial keys

A surrogate key is a unique value, usually an integer, assigned to each row in the dimension. This surrogate key becomes the primary key of the dimension table and is used to join the dimension to the associated foreign key field in the fact table.
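Surrogate key assignment can be sketched in Python, assuming a hypothetical customer dimension whose natural (business) key is a customer code. Each new dimension row receives the next integer, and fact rows arriving with natural keys are translated to surrogate keys before being loaded. The DimensionKeyMap class and its fields are illustrative only, not part of any SQL Server API.

```python
from itertools import count

class DimensionKeyMap:
    """Hypothetical helper that assigns integer surrogate keys to a dimension's natural keys."""

    def __init__(self, start=1):
        self._next_key = count(start)
        self._by_natural_key = {}
        self.rows = []  # the dimension table: (surrogate_key, natural_key, attributes)

    def add_member(self, natural_key, **attributes):
        """Insert a new dimension row, or reuse the existing one, and return its surrogate key."""
        if natural_key in self._by_natural_key:
            return self._by_natural_key[natural_key]
        key = next(self._next_key)
        self._by_natural_key[natural_key] = key
        self.rows.append((key, natural_key, attributes))
        return key

    def lookup(self, natural_key):
        """Translate a natural key from the source system into the surrogate key."""
        return self._by_natural_key[natural_key]

customers = DimensionKeyMap()
customers.add_member("CUST-A17", name="Contoso")
customers.add_member("CUST-B02", name="Fabrikam")

# Source fact rows carry natural keys; the loaded fact rows carry surrogate keys.
source_sales = [("CUST-B02", 250.0), ("CUST-A17", 90.0)]
fact_sales = [(customers.lookup(code), amount) for code, amount in source_sales]
print(fact_sales)
```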

What Is the Benefit of Surrogate Keys?

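Surrogate keys help maintain history for Slowly Changing Dimensions: because each version of a dimension member gets its own surrogate key, the dimension table can hold several rows for the same natural (business) key.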

Slowly Changing Dimensions

There are three classic types of SCD.

SCD 1

The Type 1 methodology overwrites old data with new data and therefore does not track historical data at all. This is most appropriate when correcting certain types of data errors, such as the spelling of a name (assuming you will never need to know how it used to be misspelled).
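A Type 1 change can be sketched in a few lines of Python, assuming a hypothetical supplier dimension held as a list of dicts. The row is found by its natural key and the changed attribute is overwritten in place, so the surrogate key stays the same and the old value is lost. The apply_scd1 function and the sample row are illustrative assumptions.

```python
# Hypothetical supplier dimension: surrogate key, natural key (code), attributes
supplier_dim = [
    {"supplier_key": 1, "supplier_code": "ABC", "name": "Example Supplier", "state": "CA"},
]

def apply_scd1(dim, code, **changes):
    """Overwrite the changed attributes of the row with this natural key (no history kept)."""
    for row in dim:
        if row["supplier_code"] == code:
            row.update(changes)
            return row
    raise KeyError(code)

apply_scd1(supplier_dim, "ABC", state="IL")
print(supplier_dim)  # still one row; the old state "CA" is gone
```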

So, What Is the Disadvantage of SCD1?

The obvious disadvantage of this method of managing SCDs is that no historical record is kept in the data warehouse. You can't tell, for example, whether your suppliers are tending to move to the Midwest. The advantage, on the other hand, is that Type 1 dimensions are very easy to maintain.

SCD 2

The Type 2 method tracks historical data by creating multiple records in the dimension tables with separate (surrogate) keys. With Type 2, history preservation is unlimited, because a new record is inserted each time a change is made.

For example, if a supplier moves to Illinois, a new row is inserted for that supplier with a new surrogate key and the new state, while the original row is kept. Another popular method for tuple versioning is to add effective date columns that record when each version of a row was valid.
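The Type 2 approach with effective date columns can be sketched in Python. It assumes a hypothetical supplier dimension with start_date and end_date columns, where an end_date of None marks the current row. A change closes the current row and inserts a new row with a new surrogate key, so facts loaded earlier keep pointing at the old version. The function names and sample dates are illustrative.

```python
from datetime import date

# Hypothetical Type 2 supplier dimension with effective dates
supplier_dim = [
    {"supplier_key": 1, "supplier_code": "ABC", "state": "CA",
     "start_date": date(2010, 1, 1), "end_date": None},
]

def apply_scd2(dim, code, change_date, **changes):
    """Expire the current row for `code` and insert a new version with a new surrogate key."""
    current = next(r for r in dim if r["supplier_code"] == code and r["end_date"] is None)
    current["end_date"] = change_date  # close the old version
    new_row = {**current, **changes,
               "supplier_key": max(r["supplier_key"] for r in dim) + 1,
               "start_date": change_date, "end_date": None}
    dim.append(new_row)
    return new_row["supplier_key"]

def key_as_of(dim, code, as_of):
    """Find the surrogate key of the version that was valid on `as_of`."""
    for r in dim:
        if (r["supplier_code"] == code and r["start_date"] <= as_of
                and (r["end_date"] is None or as_of < r["end_date"])):
            return r["supplier_key"]
    raise KeyError((code, as_of))

new_key = apply_scd2(supplier_dim, "ABC", date(2013, 6, 1), state="IL")
print(key_as_of(supplier_dim, "ABC", date(2012, 1, 1)))  # version before the move
print(key_as_of(supplier_dim, "ABC", date(2014, 1, 1)))  # version after the move
```

Treating each validity period as half-open (start inclusive, end exclusive) means the change date belongs to exactly one version, so a lookup never matches two rows.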

SCD 3

The Type 3 method tracks changes using separate columns. Whereas Type 2 offers unlimited history preservation, Type 3 offers only limited history, because it is bounded by the number of columns designated for storing historical data. While the table structures in Type 1 and Type 2 stay very similar to the original, Type 3 adds columns to the table to hold the earlier value(s) of the tracked attribute.

Note: Type 3 keeps separate columns for both the old and new attribute values, sometimes called “alternate realities.” In our experience, Type 3 is less common because it involves changing the physical tables and is not very scalable.
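A Type 3 change can be sketched in Python, assuming a hypothetical supplier dimension extended with previous_state, current_state, and effective_date columns. The row is updated in place and the prior value moves into its own column, so only one level of history survives: after a second change the oldest value is lost. The column names and apply_scd3 function are illustrative assumptions.

```python
from datetime import date

# Hypothetical Type 3 dimension: one extra column holds the prior value
supplier_dim = [
    {"supplier_key": 1, "supplier_code": "ABC",
     "previous_state": None, "current_state": "CA", "effective_date": date(2010, 1, 1)},
]

def apply_scd3(dim, code, new_state, change_date):
    """Shift the current value into previous_state, then overwrite current_state."""
    for row in dim:
        if row["supplier_code"] == code:
            row["previous_state"] = row["current_state"]
            row["current_state"] = new_state
            row["effective_date"] = change_date
            return row
    raise KeyError(code)

apply_scd3(supplier_dim, "ABC", "IL", date(2013, 6, 1))
apply_scd3(supplier_dim, "ABC", "NY", date(2014, 3, 1))  # "CA" is no longer recorded
print(supplier_dim[0])
```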

Thanks

Contact the speaker:

http://karanspeaks.com

http://blogs.msdn.com/karang

https://twitter.com/karangspeaks

http://in.linkedin.com/in/karanspeaks