James Snape
Application Development Consulting
Microsoft Limited
Microsoft Analysis Services Physical Design
Agenda
Hardware
Dimensions
Facts
Relational stuff
Performance tuning next steps
NB: Relational design is not complete; logging, auditing, etc. are discussed in the next session
Hardware
SQL Server Fast Track Data Warehouse
www.microsoft.com/sqlserver/2008/en/us/fasttrack.aspx
Pre-tested hardware configurations
Specific disk, filegroup, layouts
Minimal indexing
To feed CPU at maximum capacity
Dimensions vs Facts
Dimension
Small (relatively)
Repeating data
Fact
Large
Numeric data + keys
Treat them differently
Dimensions in Relational Terms
Table structure
Keys
Indexes
Null handling
Managing change
Processing
Customer
Full Name
Post Code
City
State
Country
Gender
Occupation
Marital Status
Email Address
Customer Geography
1. Country
2. State
3. City
4. Post Code
5. Full Name
Star vs. Snowflake Schemas
dbo.Customer
CustomerKey
FullName
PostCode
City
State
Country
Gender
Occupation
MaritalStatus
EmailAddress
dbo.Customer
CustomerKey
GeographyKey
FullName
Gender
Occupation
MaritalStatus
EmailAddress
dbo.Geography
GeographyKey
PostCode
City
State
Country
OR
NB: both are denormalized, one more than the other
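To make the two layouts concrete, here is a minimal sketch that builds the snowflake form and flattens it back into the star form with a view. It uses Python's built-in sqlite3 purely as a stand-in for SQL Server; the view name CustomerStar and the sample post code and state are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Geography (
    GeographyKey INTEGER PRIMARY KEY,
    PostCode TEXT, City TEXT, State TEXT, Country TEXT
);
CREATE TABLE Customer (
    CustomerKey  INTEGER PRIMARY KEY,
    GeographyKey INTEGER REFERENCES Geography (GeographyKey),
    FullName     TEXT
);
-- Illustrative rows only; the post code and state are invented.
INSERT INTO Geography VALUES (1, 'SW1A 1AA', 'London', 'England', 'United Kingdom');
INSERT INTO Customer  VALUES (1243, 1, 'John Smith');

-- Hypothetical view that presents the snowflake as a single star table.
CREATE VIEW CustomerStar AS
SELECT c.CustomerKey, c.FullName, g.PostCode, g.City, g.State, g.Country
FROM Customer AS c
JOIN Geography AS g ON g.GeographyKey = c.GeographyKey;
""")

star_rows = conn.execute("SELECT * FROM CustomerStar").fetchall()
print(star_rows)
```

The join in the view is the same kind of join clause that dimension processing has to issue against a snowflake design.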
Primary Keys
Use smallest possible integer as surrogate primary key
Primary key is a “row identifier”
Multiple row “versions” are possible
“None” and “Unknown” special values are useful
Do NOT use business/source system keys
Clustered primary key is OK for dimensions
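As a rough illustration of "smallest possible integer", the sketch below picks the narrowest SQL Server integer type that can hold a given key range. The helper name smallest_int_type is hypothetical; the type ranges are the standard SQL Server ones. Note that reserving negative keys such as -1 and -2 for the special rows rules out tinyint, which is unsigned.

```python
# SQL Server integer types, narrowest first, with their value ranges.
SQL_INT_TYPES = [
    ("tinyint", 0, 255),
    ("smallint", -2**15, 2**15 - 1),
    ("int", -2**31, 2**31 - 1),
    ("bigint", -2**63, 2**63 - 1),
]

def smallest_int_type(min_key, max_key):
    """Return the narrowest type whose range covers [min_key, max_key]."""
    for name, low, high in SQL_INT_TYPES:
        if low <= min_key and max_key <= high:
            return name
    raise ValueError("key range does not fit in any SQL Server integer type")

# A dimension with special rows -1/-2 and up to 20,000 members.
print(smallest_int_type(-2, 20_000))
# The same special rows with five million members.
print(smallest_int_type(-2, 5_000_000))
```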
Dimension Indexes
Dimension processing queries of the form:
SELECT DISTINCT .... FROM ....
WHERE (filter) clauses never used
WHERE (join) clauses are used in snowflake dimensions
Non-processing queries may end up in SQL
ROLAP dimensions
Direct to SQL queries
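The shape of a processing query can be tried out with a small sketch. It runs a SELECT DISTINCT with no filter over a toy Customer table, using sqlite3 as a stand-in for SQL Server. The third customer is invented to show duplicates collapsing, and the ORDER BY is added only to make the printed result deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    CustomerKey INTEGER PRIMARY KEY,
    FullName TEXT, City TEXT, Country TEXT
);
INSERT INTO Customer VALUES
    (1243, 'John Smith', 'London',  'United Kingdom'),
    (1244, 'Mary Jones', 'Glasgow', 'United Kingdom'),
    (1245, 'Hypothetical Third', 'London', 'United Kingdom');
""")

# One distinct scan per attribute, with no WHERE filter on the table.
city_members = conn.execute(
    "SELECT DISTINCT City, Country FROM Customer ORDER BY City"
).fetchall()
print(city_members)
```

Because the query reads every row and filters nothing, an index built to support filtering would not be used here.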
Null Handling in Dimensions
By default NULL converts to 0 or an empty string
NULL attribute keys can invoke special “Unknown Member” handling
Prefer to create a specific “Unknown” row
CustomerKey  FullName    City     Country
-1           Unknown     Unknown  Unknown
-2           None        None     None
1243         John Smith  London   United Kingdom
1244         Mary Jones  Glasgow  United Kingdom
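One way to use the "Unknown" row is to resolve missing or unmatched source keys to -1 while loading facts. The sketch below does this with a LEFT JOIN and COALESCE, with sqlite3 standing in for SQL Server. The SourceId lookup column, the StagingSales table and its rows are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    CustomerKey INTEGER PRIMARY KEY,
    SourceId    TEXT,      -- hypothetical business key, kept only for lookup
    FullName    TEXT
);
INSERT INTO Customer VALUES
    (-1,   NULL,    'Unknown'),
    (-2,   NULL,    'None'),
    (1243, 'C-001', 'John Smith'),
    (1244, 'C-002', 'Mary Jones');

-- Hypothetical staging rows: one match, one NULL key, one unmatched key.
CREATE TABLE StagingSales (SourceCustomerId TEXT, SalesAmount REAL);
INSERT INTO StagingSales VALUES ('C-001', 10.0), (NULL, 5.0), ('C-999', 7.5);
""")

# Rows that find no dimension member fall back to the Unknown row (-1).
fact_rows = conn.execute("""
    SELECT COALESCE(c.CustomerKey, -1) AS CustomerKey, s.SalesAmount
    FROM StagingSales AS s
    LEFT JOIN Customer AS c ON c.SourceId = s.SourceCustomerId
    ORDER BY s.SalesAmount
""").fetchall()
print(fact_rows)
```

With this approach the fact table never carries a NULL dimension key, so the default NULL conversion does not come into play.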
Dimension Attributes
Attributes have keys, names (and values)
Integer attribute keys are smaller and faster
Keys must be unique
SELECT [Month] as [Month],
       [Month] + ' ' + [Year] as [Month of Year]
FROM dbo.Time
Attribute      Key       Name        (Value)
Year           2009      CY 2009     2009
Month          4         April       4
Month of Year  20090400  April 2009  4
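The uniqueness rule is easiest to see with a small sketch. A bare month number repeats every year, while a key that combines year and month does not. The YYYYMM00 layout is inferred from the 20090400 value in the table above; the function names are hypothetical.

```python
MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)

def month_of_year_key(year, month):
    """Integer key in YYYYMM00 form (layout assumed from the table above)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return year * 10000 + month * 100

def month_of_year_name(year, month):
    """Display name for the member, e.g. 'April 2009'."""
    return f"{MONTH_NAMES[month - 1]} {year}"

dates = [(2008, 4), (2009, 4)]

# The bare month number collides across years...
month_keys = {month for _, month in dates}
# ...while the combined key stays unique.
composite_keys = {month_of_year_key(year, month) for year, month in dates}

print(month_keys, sorted(composite_keys))
print(month_of_year_key(2009, 4), month_of_year_name(2009, 4))
```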
Slowly Changing Dimensions
PK = row identifier
Multiple rows = multiple versions
Add effective dating columns
Which can be exposed as new dimensional attributes
dbo.Customer
CustomerKey
FullName
PostCode
City
State
Country
Gender
Occupation
MaritalStatus
EmailAddress
EffectiveFrom (smalldatetime)
EffectiveTo (smalldatetime)
CurrentFlag (tinyint)
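A minimal sketch of the versioning idea follows: when an attribute changes, the current row is closed off and a new row with a new surrogate key is added. sqlite3 stands in for SQL Server. The SourceId column, the add_customer_version helper, the sample dates and the use of NULL for an open-ended EffectiveTo are all assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    CustomerKey   INTEGER PRIMARY KEY AUTOINCREMENT,  -- row identifier
    SourceId      TEXT,     -- hypothetical business key, not the primary key
    FullName      TEXT,
    City          TEXT,
    EffectiveFrom TEXT,
    EffectiveTo   TEXT,     -- NULL means "still current" (assumption)
    CurrentFlag   INTEGER
);
""")

def add_customer_version(conn, source_id, full_name, city, change_date):
    """Expire the current row for this customer, then insert a new version."""
    with conn:  # both statements commit together
        conn.execute(
            "UPDATE Customer SET EffectiveTo = ?, CurrentFlag = 0 "
            "WHERE SourceId = ? AND CurrentFlag = 1",
            (change_date, source_id),
        )
        conn.execute(
            "INSERT INTO Customer "
            "(SourceId, FullName, City, EffectiveFrom, EffectiveTo, CurrentFlag) "
            "VALUES (?, ?, ?, ?, NULL, 1)",
            (source_id, full_name, city, change_date),
        )

# Illustrative history: the customer moves from London to Glasgow.
add_customer_version(conn, "C-001", "John Smith", "London", "2008-01-01")
add_customer_version(conn, "C-001", "John Smith", "Glasgow", "2009-04-01")

versions = conn.execute(
    "SELECT CustomerKey, City, EffectiveFrom, EffectiveTo, CurrentFlag "
    "FROM Customer ORDER BY CustomerKey"
).fetchall()
print(versions)
```

Facts loaded before the change keep pointing at the first row, which is why the primary key has to identify a row version rather than a customer.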
Facts in Relational Terms
Keys
Indexing
Partitioning
Processing
Consider Row and Page compression
Internet Sales
Sales Amount
Order Quantity
Tax Amount
Unit Price
Transaction Count
Fact Keys and Indexes
Is a surrogate/primary key required?
Beware the clustered index/primary key
Prefer the date FK as the clustered index
Add NOCHECK to foreign keys
Indexes are usually not useful
Unless processing degenerate dimensions
Or servicing ROLAP/direct to SQL queries
Fact Partitioning – Why?
Parallel processing
Only process most recent data
Multiple storage engine threads during query
Archive off data
Multiple aggregation strategies
NB: Partitions require Enterprise Edition
Fact Partitioning – Guidelines
Partition when fact tables are 50-100GB+
Ideal partition size 2M-20M rows
Less than 1000 partitions per measure group
This wins over partition size
Prefer to partition over time
Cannot aggregate above the partition grain
Align AS and SQL partitions!
Calculated time keys become very useful
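The guidelines above can be turned into a rough sizing sketch. It picks a time grain that keeps the partition count under 1000 and, where possible, keeps each partition within 2M-20M rows, with the count limit taking precedence. The function names, the approximate day counts per grain and the YYYYMM key layout are assumptions made for the example, not product rules.

```python
import math
from datetime import date

# Candidate time grains, finest first, with approximate days per partition.
GRAINS = [("day", 1), ("month", 30), ("quarter", 91), ("year", 365)]
MAX_PARTITIONS = 1000                      # "less than 1000 partitions"
MIN_ROWS, MAX_ROWS = 2_000_000, 20_000_000  # ideal partition size

def choose_partition_grain(rows_per_day, days_of_history):
    """Return (grain, partition_count, rows_per_partition)."""
    candidates = []
    for name, days in GRAINS:
        partitions = math.ceil(days_of_history / days)
        if partitions < MAX_PARTITIONS:    # the count limit wins over size
            candidates.append((name, partitions, rows_per_day * days))
    if not candidates:
        raise ValueError("no grain keeps the partition count under the limit")

    def distance_from_ideal(candidate):
        rows = candidate[2]
        if rows < MIN_ROWS:
            return MIN_ROWS - rows
        if rows > MAX_ROWS:
            return rows - MAX_ROWS
        return 0

    # Closest to the ideal size band; ties go to the finest grain.
    return min(candidates, key=distance_from_ideal)

def monthly_partition_key(day):
    """Calculated time key (YYYYMM) usable as a partition boundary."""
    return day.year * 100 + day.month

print(choose_partition_grain(200_000, 5 * 365))
print(monthly_partition_key(date(2009, 4, 15)))
```

A key calculated this way can be used on both the SQL and Analysis Services sides, which helps keep the two sets of partitions aligned.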
Proactive Caching
Cube = “Cache”
Automatic invalidation of cube
Automatic rebuild of cube
[Diagram: Query, "Valid?" check, SQL Query]
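The "cube as cache" idea can be sketched as a generic cache with an invalidation flag. This is only an analogy, not how Analysis Services implements proactive caching: the CubeCache class and its method names are hypothetical, and the real feature has latency and notification settings that are not modelled here.

```python
class CubeCache:
    """Toy cache: answers from stored data while valid, rebuilds otherwise."""

    def __init__(self, load_from_sql):
        self._load_from_sql = load_from_sql  # callable that reads the source
        self._data = None
        self._valid = False

    def notify_source_changed(self):
        """Invalidate the cache when the relational data changes."""
        self._valid = False

    def query(self):
        """Serve from the cache if valid; otherwise rebuild from the source."""
        if not self._valid:
            self._data = self._load_from_sql()
            self._valid = True
        return self._data

# Hypothetical source that counts how often it is read.
source_reads = []

def read_sales_total():
    source_reads.append(1)
    return 42.0

cache = CubeCache(read_sales_total)
first = cache.query()          # cache is empty, so the source is read
second = cache.query()         # still valid, served from the cache
cache.notify_source_changed()  # source data changed
third = cache.query()          # invalid, so the source is read again
print(first, second, third, len(source_reads))
```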
Quick Storage Engine Tuning
Ensure attribute relations are implemented
Turn on query log
Run Usage Based Optimisation (UBO) wizard