37
Richard Currey | Senior Technical Trainer–New Horizons United George Squillace | Senior Technical Trainer–New Horizons Great Lakes Implementing a Data Warehouse with SQL Server Jump Start

MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Embed Size (px)

Citation preview

Page 1: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Richard Currey | Senior Technical Trainer–New Horizons UnitedGeorge Squillace | Senior Technical Trainer–New Horizons Great Lakes

Implementing a Data Warehouse with SQL Server Jump Start

Page 2: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Meet Richard Currey• Senior Technical Trainer – New Horizons United– Focused on database and development technologies– MCDBA, MCITP Dev / Admin / BI, MCSE Data

Platform, BI– MCSD, MCPD Web Developer, ASP .NET Developer,

Windows Developer

• 28 Years Industry Experience– Designed, developed and managed BI-centric

projects at several Fortune 500 organizations– Extensive consulting and project management

background

Page 3: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Meet George SQUILLACE• Senior Technical Trainer – New Horizons Great

Lakes (20 Years)– “SQL” is in his name!– Focused on database technologies– MCT Since 1997

• 28 Years Industry Experience– SQL Server 2012: certified as MCSA, MCSE: Data

Platform, & MSCE: Business Intelligence– Certified in every version of SQL Server since SQL

2000– Certified in every version of Windows Server from

NT 4.0 through Windows Server 2008– Certified in Exchange Server 5.5 through Exchange

Server 2007

Page 4: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Course Modules

Implementing a Data Warehouse with SQL Server01 | Design and Implement Dimensions and Fact Tables 04 | Control Flow

02 | Data Flow - Extract Data 05 | Configure and Deploy SSIS

03 | Data Flow - Transform Data 06 | Manage Enterprise Data

Page 5: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Setting Expectations• Target Audience – Data warehousing specialists who want to expand their knowledge

of SQL Server Integration Services (SSIS) – Database professionals who want to take exam 70-463 and get

certified in data warehouse implementations

• Suggested Prerequisites/Supporting Material– SQL Server development experience and exposure to extract,

transform, and load (ETL) processes– Course 10777, Implementing a Data Warehouse with Microsoft SQL

Server 2012– MS Press Book: Training Kit (Exam 70-463): Implementing a Data

Warehouse with Microsoft SQL Server 2012

Page 6: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Click to edit Master subtitle style01 | Design and Implement

Dimensions and Fact Tables

Richard Currey | Senior Technical Trainer–New Horizons UnitedGeorge Squillace | Senior Technical Trainer–New Horizons Great Lakes

Page 7: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

• Schema Design: Star vs. Snowflake• Facts and Fact Tables• Fact and Dimension Granularity• Conformed and Non-Conformed Dimensions• Time Dimensions

Module 1 Overview

Page 8: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Topic: Schema Design: Star vs. Snowflake

Page 9: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Topic: Schema Design: Star versus Snowflake• Star Schema• Snowflake Schema• Processing and Performance Considerations

Page 10: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Star Schema• A star schema has a

single table for each dimension

• Each table supports all attributes for that dimension

• Typically a de-normalized solution

DimSalesPersonSalesPersonKeySalesPersonNameStoreNameStoreCityStoreRegion

DimProductProductKeyProductNameProductLineSupplierName

DimCustomerCustomerKeyCustomerNameCityRegion

FactOrdersCustomerKeySalesPersonKeyProductKeyShippingAgentKeyTimeKeyOrderNoLineItemNoQuantityRevenueCostProfit

DimDateDateKeyYearQuarterMonthDay

DimShippingAgentShippingAgentKeyShippingAgentName

Page 11: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

DEMOImplementing a Star Schema

Page 12: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Snowflake Schema• More normalized

solution• Typically contains

multiple tables per dimension

• Each table contains dimension key, value, and the foreign key value for the parent

DimSalesPersonSalesPersonKeySalesPersonNameStoreKey

DimProductProductKeyProductNameProductLineKeySupplierKey

DimCustomerCustomerKeyCustomerNameGeographyKey

FactOrdersCustomerKeySalesPersonKeyProductKeyShippingAgentKeyTimeKeyOrderNoLineItemNoQuantityRevenueCostProfit

DimDateDateKeyYearQuarterMonthDay

DimShippingAgentShippingAgentKeyShippingAgentName

DimProductLineProductLineKeyProductLineName

DimGeographyGeographyKeyCityRegion

DimSupplierSupplierKeySupplierName

DimStoreStoreKeyStoreNameGeographyKey

Page 13: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

DEMOImplementing a Snowflake Schema

Page 14: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Processing and Performance Considerations• Star schema requires de-normalization during the

load process– Can impact the ETL times

• Snowflake schema can increase dimension complexity– Can impact Analysis Services solutions, negatively

affecting cube performance

Page 15: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Topic: Facts and Fact Tables

Page 16: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Topic: Facts and Fact Tables• What Is a Fact?• Grouping Facts• What Is Granularity?• Design Considerations

Page 17: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

What Is a Fact?• Facts are the key metrics used to measure business

results:– Sales– Production– Inventory

• Can be additive, semi-additive, or non-additive

Page 18: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Grouping Facts• Facts are grouped into fact tables• Related facts should be in the same fact table• Facts with different granularity should be in different

tables

Page 19: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

What Is Granularity?• Granularity refers to the level of detail in which facts

are recorded• Facts can be at different levels of granularity

Page 20: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Design Considerations• Fact tables should have all keys relating to

dimensions• Primary key should be composite of all dimension

keys• Separate additive, semi-additive, and non-additive

facts

Page 21: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Topic: Fact and Dimension Granularity

Page 22: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Topic: Fact and Dimension Granularity• How to Determine Fact Granularity• Dimension Granularity

Page 23: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

How to Determine Fact Granularity• Granularity is determined based on business needs• Should be the lowest level of detail that needs to be

examined• If data from transactional systems has more detail

than needed for analysis, ETL should aggregate the details

Page 24: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Dimension Granularity• Dimension granularity needs to be matched with fact

granularity• Each dimension has its own granularity• Fact tables are keyed to the granularity of the

dimensions

Page 25: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Topic: Conformed and Non-Conformed Dimensions

Page 26: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Topic: Conformed and Non-Conformed Dimensions• What Are Conformed and Non-Conformed

Dimensions?• Shared and Degenerate Dimensions• What Is a Slowly Changing Dimension?

Page 27: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

What Are Conformed and Non-Conformed Dimensions?• Conformed dimension– Shared by multiple fact tables– Used when all business users have the same definitions for

the dimension• Non-conformed dimension– Dimension table targeted to a single fact table– Used when dimensions have different definitions for

different business units

Page 28: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

DEMOCreating Conformed and Non-Conformed Dimensions

Page 29: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Shared and Degenerate Dimensions• Shared dimension– Used by multiple facts– Dimension key is stored in the fact table– Dimension value is stored in the dimension table with

other attributes of that dimension• Degenerate dimension– Used by a single fact table– Dimension value is stored directly in the fact table– No corresponding dimension table

Page 30: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

What Is a Slowly Changing Dimension?• When the historical attribute values are retained if

the attributes are updated• Used when the organization does not want to lose

track of what actually happened– Example: customer moves from Connecticut to Seattle

• Slowly changing dimension types:– Type 1: Attribute history is not retained– Type 2: Attribute change creates a new record– Type 3: Original attribute value recorded and latest value

recorded with an effective date

Page 31: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

DEMOImplementing a Slowly Changing Dimension

Page 32: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Topic: Time Dimensions

Page 33: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Topic: Time Dimensions• Types of Time Dimensions• Time Dimensions and Hierarchies

Page 34: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Types of Time Dimensions• Based on standard calendar breakdowns– Year => Month => Day– Year => Quarter => Week => Day

• Based on fiscal calendar– Year => Fiscal Quarter => Fiscal Month => Fiscal Week

=> Day• Time dimension needs to contain all hierarchy

elements to the lowest granularity for the fact tables

Page 35: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

Time Dimensions and Hierarchies• Establishes the “buckets” that the business uses• Typically there are multiple hierarchies in the

dimension– Calendar– Business

• Created using Microsoft Excel, scripts, or are auto-generated

Page 36: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

DEMOCreating a Time Dimension

Page 37: MVA Implementing a Data Warehouse With SQL Jump Start Mod 1 Final

©2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Office, Azure, System Center, Dynamics and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.