Best Practices for Implementing Enterprise BI Solution
Teo Lachev, Prologika
teo.lachev@prologika.com
Why BI projects fail
70-80% of corporate BI projects fail (Gartner, http://bit.ly/YRi028)
Top reasons:
Poor communication between IT and Business
Failure to ask the right questions
Other reasons from my experience:
Business doesn't know about BI
Inexperience and lack of technical knowledge
When all you have is a hammer
Data inaccuracy
Performance degradation with large datasets
Agenda
Share best practices and lessons learned
BI architecture
Data warehouse design
ETL
Semantic layer
Presentation layer
Assumptions
Experience with Microsoft BI and database design
Microsoft case study: Records Management Firm Saves $1 Million (http://bit.ly/15exUpM)
Most performance practices concern biggish data
Ground rules
Ask questions
Turn cellphones off
Tweet away (@tlachev #BestBI)
About me
Consultant, author, and mentor with focus on Microsoft BI
Owner of Prologika BI consulting and training company based in Atlanta (www.prologika.com)
Microsoft SQL Server MVP for 10 years
Leader of Atlanta BI group (atlantabi.sqlpass.org)
Use a phased approach
Identify critical success factors
Break project into phases
Phase 1
Most important
Scope it relatively small
Sets foundation
Business process to model
First conformed dimensions
A few fact tables
Use classic BI solution architecture
Data Sources: Data is extracted from data sources, transformed, and loaded into DW
ETL: Integration Services
Data Warehouse: Data is stored in dimensional schema consisting of dimension and fact tables (Dimension Tables, Fact Tables)
Semantic Layer (Multidimensional OR Tabular): Great performance, Business calculations, Single version of truth, Client support, Security, Isolation
Presentation Layer: Standard reporting, Ad-hoc reporting, Dashboards
Diagram labels: Ad-hoc reports, Operational reports, Dashboards, Third party tools, Transactional reporting, Historical & trend reporting
Keep it simple!
[Diagram: regions NA, Europe, Asia, shown twice]
Teo's insight: Remove complexity until it cannot be simplified anymore
Consider active-active clustering
[Diagram: Cluster with a Database server and an SSAS server]
Check your environment I/O
BACKUP DATABASE [ContosoRetailDW] TO DISK='NUL';
Or use tools such as IOMeter or CrystalMark
I/O should be above 500 MB/sec
Network speed
select * from
(consider discarding query results)
Num rows/sec = row count / execution time (sec)
Aim for > 100K rows/sec
Virtualization
Disk pass-through enabled
Dedicated resources
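The backup-to-NUL test on this slide prints its throughput in the completion message. As a rough sketch, and assuming the run was recorded in the msdb backup history, the rate of the most recent backup can also be computed with a query like the one below. ContosoRetailDW is the database named on the slide; the column aliases are illustrative.

```sql
-- Sketch: approximate throughput of the most recent backup of ContosoRetailDW.
-- Assumes the BACKUP ... TO DISK='NUL' run was recorded in msdb.dbo.backupset.
SELECT TOP (1)
    bs.database_name,
    bs.backup_start_date,
    bs.backup_finish_date,
    bs.backup_size / 1048576.0 AS backup_size_mb,
    -- NULLIF avoids a divide-by-zero when the backup took under one second
    bs.backup_size / 1048576.0
        / NULLIF(DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date), 0) AS mb_per_sec
FROM msdb.dbo.backupset AS bs
WHERE bs.database_name = N'ContosoRetailDW'
ORDER BY bs.backup_finish_date DESC;
```

Compare mb_per_sec against the 500 MB/sec guideline above.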
Agenda: Data warehouse design
Star schema is your best friend
Your dimensional model is the foundation
Design it with the end user in mind
Teo's insight: The fact that Tabular supports more flexible relationships doesn't mean that star schema is obsolete - just the opposite.
Avoid normalization
Avoid summarized tables
Use smart keys (YYYYMMDD) or [date] keys for Date tables
Use referential integrity
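A minimal sketch of the last two points, assuming hypothetical dbo.DimDate and dbo.FactSales tables (names and columns are illustrative, not from the deck): the Date table uses a YYYYMMDD smart key, and the fact table references it through a foreign key.

```sql
-- Hypothetical tables for illustration only.
CREATE TABLE dbo.DimDate (
    DateKey      int  NOT NULL PRIMARY KEY,   -- smart key in YYYYMMDD form, e.g. 20130415
    CalendarDate date NOT NULL UNIQUE
);

CREATE TABLE dbo.FactSales (
    DateKey     int   NOT NULL
        CONSTRAINT FK_FactSales_DimDate REFERENCES dbo.DimDate (DateKey),  -- referential integrity
    SalesAmount money NOT NULL
);

-- Derive the smart key from a date: style 112 formats a date as yyyymmdd.
DECLARE @d date = '20130415';
INSERT INTO dbo.DimDate (DateKey, CalendarDate)
VALUES (CONVERT(int, CONVERT(char(8), @d, 112)), @d);

-- The fact row is accepted only because DateKey 20130415 exists in DimDate.
INSERT INTO dbo.FactSales (DateKey, SalesAmount)
VALUES (20130415, 99.95);
```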
Optimize physical storage
Set database recovery to Simple
Index considerations
Clustered index on the DateKey column in fact tables
Other indexes as needed
File groups
One file group per large table
Files on different drives
Avoid using the Primary file group
Use partitioning
Partition large tables (above 50 GB)
Partition switching
Better manageability
Partition elimination when querying data
Good read: Partitioned Table and Index Strategies Using SQL Server 2008 whitepaper by Ron Talmage
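As a hedged sketch of the storage and partitioning points above, the following creates a yearly-partitioned fact table with a clustered index on DateKey. All object names and boundary values are assumptions for illustration. For brevity every partition maps to PRIMARY, whereas the slides recommend dedicated file groups.

```sql
-- Hypothetical names; boundaries are yearly DateKey (YYYYMMDD) cut-offs.
CREATE PARTITION FUNCTION pfYearly (int)
    AS RANGE RIGHT FOR VALUES (20110101, 20120101, 20130101);

-- Simplification: all partitions on PRIMARY (use separate file groups in practice).
CREATE PARTITION SCHEME psYearly
    AS PARTITION pfYearly ALL TO ([PRIMARY]);

CREATE TABLE dbo.FactShipments (
    DateKey  int   NOT NULL,
    Quantity int   NOT NULL,
    Amount   money NOT NULL
) ON psYearly (DateKey);

-- Clustered index on DateKey, aligned with the partition scheme.
CREATE CLUSTERED INDEX CIX_FactShipments_DateKey
    ON dbo.FactShipments (DateKey)
    ON psYearly (DateKey);

-- A filter on the partitioning column allows partition elimination:
-- only the 2012 partition needs to be read.
SELECT SUM(Amount) AS TotalAmount
FROM dbo.FactShipments
WHERE DateKey >= 20120101 AND DateKey < 20130101;

-- Row counts per partition (index_id 1 = the clustered index).
SELECT partition_number, rows
FROM sys.partitions
WHERE object_id = OBJECT_ID(N'dbo.FactShipments') AND index_id = 1;
```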
Use compression
Consider page compression above 1 TB
50-80% saving in disk space
To estimate storage savings:
Use SSMS Data Compression Wizard
sp_estimate_data_compression_savings stored procedure
EXEC sp_estimate_data_compression_savings 'dbo', 'FactResellerSales', 1, NULL, 'PAGE'
Good read: Data Compression: Strategy, Capacity Planning and Best Practices whitepaper by Sanjay Mishra
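For readability, the estimate call from the slide can be written with named parameters, followed by a rebuild that applies page compression. This is a sketch that assumes dbo.FactResellerSales exists with a clustered index (index_id 1), as in the AdventureWorksDW sample database.

```sql
-- Same call as on the slide, with named parameters.
-- @index_id = 1 targets the clustered index; NULL means all partitions.
EXEC sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'FactResellerSales',
    @index_id         = 1,
    @partition_number = NULL,
    @data_compression = N'PAGE';

-- If the estimate looks worthwhile, rebuild the table with page compression.
-- This covers the heap or clustered index; nonclustered indexes are
-- compressed separately with ALTER INDEX ... REBUILD.
ALTER TABLE dbo.FactResellerSales
    REBUILD PARTITION = ALL
    WITH (DATA_COMPRESSION = PAGE);
```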
Agenda: ETL
Consider merge design pattern
[Diagram: Data Sources (LOB, Files), incremental extraction, Staging Database (work table), stored procedure with T-SQL merge statement, Data Warehouse (dimension or fact table)]
Example query in the diagram: select a,b from st1 inner join st2 where...
Staging Database
More efficient than SSIS transforms
More flexible than SSIS lookups
Easier to maintain
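The merge design pattern above can be sketched as a stored procedure that upserts a dimension from a staging table. The tables, columns, and procedure name are hypothetical; GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Hypothetical staging and dimension tables.
CREATE TABLE dbo.StageProduct (
    ProductCode varchar(20)  NOT NULL PRIMARY KEY,
    ProductName varchar(100) NOT NULL
);

CREATE TABLE dbo.DimProduct (
    ProductKey  int IDENTITY(1,1) PRIMARY KEY,   -- surrogate key
    ProductCode varchar(20)  NOT NULL UNIQUE,    -- business key
    ProductName varchar(100) NOT NULL
);
GO

CREATE PROCEDURE dbo.usp_MergeDimProduct
AS
BEGIN
    SET NOCOUNT ON;

    MERGE dbo.DimProduct AS tgt
    USING dbo.StageProduct AS src
        ON tgt.ProductCode = src.ProductCode
    -- Existing member whose attribute changed: update in place
    WHEN MATCHED AND tgt.ProductName <> src.ProductName THEN
        UPDATE SET tgt.ProductName = src.ProductName
    -- New member: insert
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (ProductCode, ProductName)
        VALUES (src.ProductCode, src.ProductName);
END;
GO

-- Typically called from an SSIS Execute SQL task after staging is loaded.
EXEC dbo.usp_MergeDimProduct;
```

The set-based MERGE replaces row-by-row SSIS lookups and transforms, which is where the efficiency claim on the slide comes from.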
Consider Operational Data Store
ODS advantages
Offloads transactional data
Maintains data history
Smarter staging database
Start_Date  End_Date    Store     Product
1/1/2010    5/1/2010    Atlanta   Mountain Bike 1
5/2/2010    3/8/2012    Atlanta   Mountain Bike 2
3/9/2012    12/31/9999  Norcross  Mountain Bike 2
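To illustrate how such a history table is queried, here is a small sketch that loads the three rows above into a temporary table and looks up the version valid on a given date. The table name is hypothetical, and the slide's dates are assumed to be in M/D/YYYY format.

```sql
-- Hypothetical temp table holding the history rows from the slide.
CREATE TABLE #ProductStoreHistory (
    Start_Date date        NOT NULL,
    End_Date   date        NOT NULL,
    Store      varchar(50) NOT NULL,
    Product    varchar(50) NOT NULL
);

INSERT INTO #ProductStoreHistory (Start_Date, End_Date, Store, Product)
VALUES
    ('20100101', '20100501', 'Atlanta',  'Mountain Bike 1'),
    ('20100502', '20120308', 'Atlanta',  'Mountain Bike 2'),
    ('20120309', '99991231', 'Norcross', 'Mountain Bike 2');  -- open-ended current row

-- Point-in-time lookup: which version was in effect on 15 June 2011?
DECLARE @AsOf date = '20110615';

SELECT Store, Product
FROM #ProductStoreHistory
WHERE @AsOf BETWEEN Start_Date AND End_Date;
-- Returns the second row: Atlanta, Mountain Bike 2
```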
Index considerations
Eliminate read locks
Indexes: ALLOW_PAGE_LOCKS = OFF and ALLOW_ROW_LOCKS = OFF
View hints: WITH (NOLOCK) or SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
Drop non-clustered indexes and constraints
With massive updates (10% or more)
Enables minimally logged load
Consider COLUMNSTORE indexes when queries aggregate data
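A hedged sketch of these index settings on a hypothetical dbo.FactReturns table follows. Note that it disables and rebuilds the non-clustered index around the load rather than dropping and recreating it, which is a variation on the slide's advice.

```sql
-- Hypothetical fact table with one clustered and one non-clustered index.
CREATE TABLE dbo.FactReturns (
    DateKey    int   NOT NULL,
    ProductKey int   NOT NULL,
    Amount     money NOT NULL
);
CREATE CLUSTERED INDEX CIX_FactReturns_DateKey ON dbo.FactReturns (DateKey);
CREATE NONCLUSTERED INDEX IX_FactReturns_ProductKey ON dbo.FactReturns (ProductKey);

-- Turn off row and page locks on every index of the table.
ALTER INDEX ALL ON dbo.FactReturns
    SET (ALLOW_PAGE_LOCKS = OFF, ALLOW_ROW_LOCKS = OFF);

-- Around a massive load: disable the non-clustered index, load, then rebuild.
ALTER INDEX IX_FactReturns_ProductKey ON dbo.FactReturns DISABLE;
-- ... bulk load into dbo.FactReturns goes here ...
ALTER INDEX IX_FactReturns_ProductKey ON dbo.FactReturns REBUILD;

-- Columnstore index for aggregate queries.
-- In SQL Server 2012 this makes the table read-only until the index
-- is disabled or dropped, so create it after the load.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactReturns
    ON dbo.FactReturns (DateKey, ProductKey, Amount);
```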
Take advantage of partitioning
Consider partition switching
Fast incremental load
Parallel partition load
Faster updates
Use Manage Partition Wizard to generate Switch in/out scripts
Staging table
Sliding window
For parallel partition load, change the table lock escalation:
ALTER TABLE SET ( LOCK_ESCALATION = AUTO)
To find the table lock escalation:
SELECT lock_escalation_desc FROM sys.tables WHERE name = '
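The following sketch puts partition switching and the lock escalation setting together. It assumes a hypothetical dbo.FactOrders table partitioned by month and a staging table with an identical structure. The object names, boundary values, and the use of PRIMARY for all partitions are illustrative assumptions.

```sql
-- Monthly boundaries on a YYYYMMDD key:
-- partition 1 < 20130101, partition 2 = January 2013, partition 3 >= 20130201.
CREATE PARTITION FUNCTION pfMonthly (int)
    AS RANGE RIGHT FOR VALUES (20130101, 20130201);
CREATE PARTITION SCHEME psMonthly
    AS PARTITION pfMonthly ALL TO ([PRIMARY]);

CREATE TABLE dbo.FactOrders (
    DateKey int   NOT NULL,
    Amount  money NOT NULL
) ON psMonthly (DateKey);

-- Staging table: same columns, same file group, and a CHECK constraint
-- that matches the boundaries of the target partition.
CREATE TABLE dbo.FactOrders_Stage (
    DateKey int   NOT NULL,
    Amount  money NOT NULL,
    CONSTRAINT CK_FactOrders_Stage_Jan2013
        CHECK (DateKey >= 20130101 AND DateKey < 20130201)
) ON [PRIMARY];

INSERT INTO dbo.FactOrders_Stage (DateKey, Amount) VALUES (20130115, 250.00);

-- Allow lock escalation to the partition level instead of the whole table.
ALTER TABLE dbo.FactOrders SET (LOCK_ESCALATION = AUTO);
SELECT lock_escalation_desc FROM sys.tables WHERE name = N'FactOrders';

-- Metadata-only switch of the staged rows into (empty) partition 2.
ALTER TABLE dbo.FactOrders_Stage SWITCH TO dbo.FactOrders PARTITION 2;

-- Verify where the rows landed.
SELECT $PARTITION.pfMonthly(DateKey) AS partition_number, COUNT(*) AS row_count
FROM dbo.FactOrders
GROUP BY $PARTITION.pfMonthly(DateKey);
```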
Optimize big joins
Set OPTION (HASH JOIN or LOOP JOIN)
http://bit.ly/108HuHR
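A join hint is specified with the OPTION clause at the end of the query. The sketch below uses two hypothetical temporary tables. Whether a hash or loop join is faster depends on the data, so treat the hint as something to test rather than a default.

```sql
-- Hypothetical tables for illustration.
CREATE TABLE #Fact (ProductKey int NOT NULL, Amount money NOT NULL);
CREATE TABLE #Dim  (ProductKey int NOT NULL PRIMARY KEY, ProductName varchar(50) NOT NULL);

INSERT INTO #Dim  (ProductKey, ProductName) VALUES (1, 'Mountain Bike 1'), (2, 'Mountain Bike 2');
INSERT INTO #Fact (ProductKey, Amount)      VALUES (1, 100.00), (2, 150.00), (2, 50.00);

SELECT d.ProductName, SUM(f.Amount) AS TotalAmount
FROM #Fact AS f
INNER JOIN #Dim AS d
    ON d.ProductKey = f.ProductKey
GROUP BY d.ProductName
OPTION (HASH JOIN);   -- forces hash joins for the whole query; OPTION (LOOP JOIN) forces nested loops
```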
Agenda: Semantic layer
BI Semantic Layer
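[Diagram: BI Semantic Model]
Clients: Third-Party BI Applications, Reporting Services Reports, Excel Workbooks, PowerPivot Applications, SharePoint Dashboards & Scorecards
Data sources shown: Files, OData Feeds
Data model: Multidimensional or Tabular
Languages: MDX, DAX
Storage: MOLAP or ROLAP (Multidimensional); xVelocity (VertiPaq) or DirectQuery (Tabular)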
Choose semantic layer wisely
Decision checkpoints
Data volumes
Complexity
Scenarios for considering Multidimensional
Data warehousing
Large data volumes
Complex models
Scenarios for considering Tabular
Promoting PowerPivot models to organizational models
Rapid development for simple models
Transactional reporting? (be careful)
Optimize Multidimensional
Don't be afraid of biggish data
Avoid complex scope assignments
Centralize business logic
Consider fast storage
Consider single cube
Tabular Considerations
Improve your design experience (http://bit.ly/106iKjt)
Small dataset during dev
Disable automatic calculation
Remove unnecessary columns
Be careful about transactional reporting
No cross-fact table support
Performance degradation with big data - http://bit.ly/136h60U
[Diagram: Dim Date related to Fact Orders and Fact Receipts]
Partition when it makes sense
Partition large measure groups (above 100 million rows)
Mostly a management technique
Useful for incremental processing
Partition slice: ~50 million rows
Automate with partition generator: http://bit.ly/partitiongenerator
Use SQL views to wrap tables
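One way to read this tip: the semantic model binds to a view instead of the base table, so columns can be renamed, hidden, or filtered without changing the model's source binding. A minimal sketch with hypothetical object names:

```sql
-- Hypothetical base table in the data warehouse.
CREATE TABLE dbo.FactInventory (
    DateKey    int          NOT NULL,
    ProductKey int          NOT NULL,
    Qty        int          NOT NULL,
    LoadBatch  varchar(20)  NULL      -- ETL housekeeping column, not needed by the model
);
GO

-- The cube or Tabular model reads from the view, not the table.
CREATE VIEW dbo.vFactInventory
AS
SELECT
    DateKey,
    ProductKey,
    Qty AS OnHandQuantity   -- friendlier name exposed to the semantic layer
FROM dbo.FactInventory;
GO

SELECT DateKey, ProductKey, OnHandQuantity
FROM dbo.vFactInventory;
```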
When to use self-service BI?
Know your end users
Power users
Financial analysts
When does self-service BI make sense?
Waiting for organizational BI to happen
Ideate and promote lateral thinking
Consider the 80/20 rule
80% organizational BI
20% self-service BI
Agenda: Presentation layer
Dashboards
"A dashboard is a visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance."
Stephen Few, Information Dashboard Design book
From Information Dashboard Design book
PerformancePoint in real life
Power View in real life
Excel Services in SharePoint 2013
Consider your dashboard options
Technology: PerformancePoint
Pros: Designed for scorecards and KPIs; Supporting views (reports, Excel spreadsheets, PP reports); Decomposition tree; Customizable
Cons: BI pro-oriented; No wow effect
Technology: Power View
Pros: Highly interactive; Easy to implement; End user-oriented
Cons: No extensibility; No mobile support yet (but promised); Currently requires Silverlight (MS working on HTML5)
Technology: Excel Services
Pros: Use Excel pivot reports; Easy to implement; Reports updatable in SP 2013
Cons: Reports not updatable in SP 2010; No wow effect
Technology: Reporting Services reports
Pros: Highly customizable; Rich visualizations
Cons: Require report experience; Reports not updatable; Drillthrough requires new reports
Summary
I shared proven practices and tips from past experience
Keep things simple but have sound design
How to contact me:
Email: teo.lachev@prologika.com
Web: www.prologika.com
Blog: http://prologika.com/cs/blogs/
Newsletter: http://prologika.com/Newsroom/News.aspx