Best Practices for Implementing Enterprise BI Solution Teo Lachev, Prologika [email protected]

  • Best Practices for Implementing Enterprise BI Solution

    Teo Lachev, Prologika

    [email protected]

  • Why BI projects fail

    70-80% of corporate BI projects fail (Gartner, http://bit.ly/YRi028)

    Top reasons
      Poor communication between IT and Business
      Failure to ask the right questions

    Other reasons from my experience
      Business doesn't know about BI
      Inexperience and lack of technical knowledge
      When all you have is a hammer
      Data inaccuracy
      Performance degradation with large datasets

  • Agenda

    Share best practices and lessons learned
      BI architecture
      Data warehouse design
      ETL
      Semantic layer
      Presentation layer

    Assumptions
      Experience with Microsoft BI and database design

    Microsoft case study
      Records Management Firm Saves $1 Million (http://bit.ly/15exUpM)
      Most performance practices concern biggish data

  • Ground rules

    Ask questions

    Turn cellphones off

    Tweet away (@tlachev #BestBI)

  • About me

    Consultant, author, and mentor with focus on Microsoft BI

    Owner of Prologika BI consulting and training company based in Atlanta (www.prologika.com)

    Microsoft SQL Server MVP for 10 years

    Leader of Atlanta BI group (atlantabi.sqlpass.org)

  • Use phased approach

    Identify critical success factors

    Break project into phases

    Phase 1
      Most important
      Scope it relatively small
      Sets foundation
        Business process to model
        First conformed dimensions
        A few fact tables

  • Use classic BI solution architecture

    Data Sources
      Data is extracted from data sources, transformed, and loaded into DW

    ETL
      Integration Services

    Data Warehouse
      Data is stored in dimensional schema consisting of dimension and fact tables
      Dimension Tables
      Fact Tables

    Semantic Layer (Multidimensional OR Tabular)
      Great performance
      Business calculations
      Single version of truth
      Client support
      Security
      Isolation

    Presentation Layer
      Standard reporting
      Ad-hoc reporting
      Dashboards
      Clients: ad-hoc reports, operational reports, dashboards, third-party tools

    Reporting types shown in the diagram
      Transactional reporting
      Historical & trend reporting

  • Keep it simple!

    [Diagram: regions NA, Europe, Asia]

    Teo's insight: Remove complexity until it cannot be simplified anymore

  • Consider active-active clustering

    Database server

    SSAS server

    Cluster

  • Check your environment I/O

    BACKUP DATABASE [ContosoRetailDW] TO DISK='NUL';

    Or use tools such as IOMeter or CrystalMark
      I/O should be above 500 MB/sec

    Network speed
      select * from
      (consider discarding query results)
      Num rows/sec = row count / execution time (sec)
      Aim for > 100K rows/sec

    Virtualization
      Disk pass-through enabled
      Dedicated resources
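The throughput check on this slide is simple arithmetic, and a minimal sketch may make it easier to apply. The function names and the sample measurement below are hypothetical; only the formula (row count / execution time) and the 100K rows/sec target come from the slide.

```python
def rows_per_second(row_count: int, execution_seconds: float) -> float:
    """Throughput as defined on the slide: row count / execution time (sec)."""
    if execution_seconds <= 0:
        raise ValueError("execution_seconds must be positive")
    return row_count / execution_seconds


# Target from the slide: "Aim for > 100K rows/sec"
TARGET_ROWS_PER_SECOND = 100_000


def meets_target(row_count: int, execution_seconds: float) -> bool:
    """True when the measured throughput exceeds the slide's target."""
    return rows_per_second(row_count, execution_seconds) > TARGET_ROWS_PER_SECOND


# Hypothetical measurement: 5 million rows returned in 40 seconds
print(rows_per_second(5_000_000, 40))  # 125000.0
print(meets_target(5_000_000, 40))     # True
```

Discarding the query results on the client, as the slide suggests, keeps rendering time out of the measured execution time.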

  • Agenda

    BI architecture

    Data warehouse design

    ETL

    Semantic Layer

    Presentation layer

  • Star schema is your best friend

    Your dimensional model is the foundation

    Design it with the end user in mind

    Teo's insight: The fact that Tabular supports more flexible relationships doesn't mean that star schema is obsolete; just the opposite.

    Avoid normalization

    Avoid summarized tables

    Use smart keys (YYYYMMDD) or [date] keys for Date tables

    Use referential integrity
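A smart key in the YYYYMMDD form is an integer that can be derived from the date itself, so no lookup is needed when loading fact rows. The sketch below shows one way to convert in both directions; the function names are hypothetical and not from the presentation.

```python
from datetime import date, datetime


def to_smart_key(d: date) -> int:
    """Encode a date as a YYYYMMDD integer, e.g. 2012-03-09 -> 20120309."""
    return d.year * 10_000 + d.month * 100 + d.day


def from_smart_key(key: int) -> date:
    """Decode a YYYYMMDD integer back into a date (raises ValueError if invalid)."""
    return datetime.strptime(str(key), "%Y%m%d").date()


print(to_smart_key(date(2012, 3, 9)))  # 20120309
print(from_smart_key(20120309))        # 2012-03-09
```

Because the keys sort in calendar order, range filters on the key column behave like range filters on the date.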

  • Optimize physical storage

    Set database recovery to Simple

    Index considerations
      Clustered index on the DateKey column in fact tables
      Other indexes as needed

    File groups
      File group per each large table
      Files on different drives
      Avoid using Primary file group

  • Use partitioning

    Partition large tables (above 50 GB)
      Partition switching
      Better manageability
      Partition elimination when querying data

    Good read: Partitioned Table and Index Strategies Using SQL Server 2008 whitepaper by Ron Talmage

  • Use compression

    Consider page compression above 1 TB

    50-80% saving in disk space

    To estimate storage savings
      Use SSMS Data Compression Wizard
      sp_estimate_data_compression_savings stored procedure

    EXEC sp_estimate_data_compression_savings 'dbo', 'FactResellerSales', 1, NULL, 'PAGE'

    Good read: Data Compression: Strategy, Capacity Planning and Best Practices whitepaper by Sanjay Mishra

  • Agenda

    BI architecture

    Data warehouse design

    ETL

    Semantic Layer

    Presentation layer

  • Consider merge design pattern

    Data Sources
      LOB
      Files

    Staging Database
      Incremental extraction from the data sources
      Work table populated with a query such as: select a,b from st1 inner join st2 where...

    Data Warehouse
      Dimension or fact table
      Loaded by a stored procedure with T-SQL merge statement

    More efficient than SSIS transforms

    More flexible than SSIS lookups

    Easier to maintain
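The merge step itself is written in T-SQL in the pattern above. The sketch below is not T-SQL; it only illustrates, in Python, the matched/not-matched logic that a merge applies when staged rows are loaded into a dimension. The function name, the dictionary-based "tables", and the sample rows are all hypothetical.

```python
def merge_into(target: dict, staged_rows: list, key: str) -> dict:
    """Apply merge-style upsert logic: insert rows whose key is missing from
    the target, update rows whose key matches but whose values differ."""
    counts = {"inserted": 0, "updated": 0, "unchanged": 0}
    for row in staged_rows:
        existing = target.get(row[key])
        if existing is None:           # not matched: insert
            target[row[key]] = dict(row)
            counts["inserted"] += 1
        elif existing != row:          # matched with changes: update
            target[row[key]] = dict(row)
            counts["updated"] += 1
        else:                          # matched, no changes
            counts["unchanged"] += 1
    return counts


# Hypothetical dimension keyed by business key, and a hypothetical work table
dim_product = {"P1": {"product_key": "P1", "name": "Mountain Bike"}}
work_table = [
    {"product_key": "P1", "name": "Mountain Bike 2"},
    {"product_key": "P2", "name": "Road Bike"},
]
print(merge_into(dim_product, work_table, key="product_key"))
```

Handling inserts and updates in one set-based pass is what makes the pattern simpler to maintain than separate lookup and update branches.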

  • Consider Operational Data Store

    ODS advantages
      Offloads transactional data
      Maintains data history
      Smarter staging database

    Start_Date End_Date Store Product

    1/1/2010 5/1/2010 Atlanta Mountain Bike 1

    5/2/2010 3/8/2012 Atlanta Mountain Bike 2

    3/9/2012 12/31/9999 Norcross Mountain Bike 2
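The history rows above use Start_Date/End_Date ranges, with 12/31/9999 marking the current version. A minimal sketch of an "as of" lookup over those three rows follows. It assumes the Product column holds the values "Mountain Bike 1" and "Mountain Bike 2" and that both range ends are inclusive; the function name is hypothetical.

```python
from datetime import date

# The three history rows from the slide (dates are M/D/YYYY in the original)
HISTORY = [
    {"start": date(2010, 1, 1), "end": date(2010, 5, 1),
     "store": "Atlanta", "product": "Mountain Bike 1"},
    {"start": date(2010, 5, 2), "end": date(2012, 3, 8),
     "store": "Atlanta", "product": "Mountain Bike 2"},
    {"start": date(2012, 3, 9), "end": date(9999, 12, 31),
     "store": "Norcross", "product": "Mountain Bike 2"},
]


def version_as_of(history, day):
    """Return the row whose [start, end] range contains the given day, or None."""
    for row in history:
        if row["start"] <= day <= row["end"]:
            return row
    return None


print(version_as_of(HISTORY, date(2011, 1, 1))["store"])  # Atlanta
print(version_as_of(HISTORY, date(2013, 1, 1))["store"])  # Norcross
```

Because the ranges do not overlap, at most one row matches any given day.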

  • Index considerations

    Eliminate read locks
      Indexes: ALLOW_PAGE_LOCKS = OFF and ALLOW_ROW_LOCKS = OFF
      View hints: WITH (NOLOCK) or SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

    Drop non-clustered indexes and constraints
      With massive updates (10% or more)
      Enables minimally logged load

    Consider COLUMNSTORE indexes when queries aggregate data

  • Take advantage of partitioning

    Consider partition switching
      Fast incremental load
      Parallel partition load
      Faster updates

    Use Manage Partition Wizard to generate switch in/out scripts
      Staging table
      Sliding window

    For parallel partition load, change the table lock escalation

    ALTER TABLE SET ( LOCK_ESCALATION = AUTO)

    To find the table lock escalation:

    SELECT lock_escalation_desc FROM sys.tables WHERE name = '

  • Optimize big joins

    Set OPTION (HASH JOIN or LOOP JOIN)

    http://bit.ly/108HuHR

  • Agenda

    BI architecture

    Data warehouse design

    ETL

    Semantic Layer

    Presentation layer

  • BI Semantic Layer

    Clients
      Third-Party BI Applications
      Reporting Services Reports
      Excel Workbooks
      PowerPivot Applications
      SharePoint Dashboards & Scorecards

    Query languages
      MDX
      DAX

    Multidimensional
      Business logic: MDX
      Storage: MOLAP, ROLAP

    Tabular
      Business logic: DAX
      Storage: xVelocity (VertiPaq), DirectQuery

    Data sources
      Files
      OData Feeds

  • Choose semantic layer wisely

    Decision checkpoints
      Data volumes
      Complexity

    Scenarios for considering Multidimensional
      Data warehousing
      Large data volumes
      Complex models

    Scenarios for considering Tabular
      Promoting PowerPivot models to organizational models
      Rapid development for simple models
      Transactional reporting? (be careful)

  • Optimize Multidimensional

    Don't be afraid of biggish data

    Avoid complex scope assignments

    Centralize business logic

    Consider fast storage

    Consider single cube

  • Tabular Considerations

    Improve your design experience (http://bit.ly/106iKjt)
      Small dataset during dev
      Disable automatic calculation
      Remove unnecessary columns

    Be careful about transactional reporting
      No cross-fact table support
      Performance degradation with big data (http://bit.ly/136h60U)

    [Diagram labels: Dim Date, Fact Orders, Fact Receipts]

  • Partition when it makes sense

    Partition large measure groups (above 100 million rows)
      Mostly a management technique
      Useful for incremental processing
      Partition slice: ~50 million

    Automate with partition generator (http://bit.ly/partitiongenerator)

    Use SQL views to wrap tables
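The two numbers on this slide can be combined into a rough sizing rule. The sketch below assumes both figures are row counts and that "partition slice" means the target size of each partition; the function and constant names are hypothetical.

```python
import math

PARTITION_THRESHOLD_ROWS = 100_000_000  # slide: partition above 100 million
TARGET_SLICE_ROWS = 50_000_000          # slide: partition slice ~50 million


def partition_count(total_rows: int) -> int:
    """Estimate how many partitions a measure group needs.

    At or below the threshold a single partition is kept; above it, the
    rows are divided into slices of roughly TARGET_SLICE_ROWS each.
    """
    if total_rows <= PARTITION_THRESHOLD_ROWS:
        return 1
    return math.ceil(total_rows / TARGET_SLICE_ROWS)


print(partition_count(80_000_000))   # 1
print(partition_count(420_000_000))  # 9
```

In practice partitions usually follow a natural boundary such as month or year, so the result is a starting point for choosing that grain rather than an exact count.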

  • When to use self-service BI?

    Know your end users
      Power users
      Financial analysts

    When does self-service BI make sense?
      Waiting for organizational BI to happen
      Ideate and promote lateral thinking

    Consider 80/20 rule
      80% organizational BI
      20% self-service BI

  • Agenda

    BI architecture

    Data warehouse design

    ETL

    Analytical layer

    Presentation layer

  • Dashboards

    "A dashboard is a visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance."

    Stephen Few, Information Dashboard Design book

  • PerformancePoint in real life

  • Power View in real life

  • Excel Services in SharePoint 2013

  • Consider your dashboard options

    PerformancePoint
      Pros: Designed for scorecards and KPIs; supporting views (reports, Excel spreadsheets, PP reports); decomposition tree; customizable
      Cons: BI pro-oriented; no wow effect

    Power View
      Pros: Highly interactive; easy to implement; end user-oriented
      Cons: No extensibility; no mobile support yet (but promised); currently requires Silverlight (MS working on HTML5)

    Excel Services
      Pros: Use Excel pivot reports; easy to implement; reports updatable in SP 2013
      Cons: Reports not updatable in SP 2010; no wow effect

    Reporting Services reports
      Pros: Highly customizable; rich visualizations
      Cons: Require report experience; reports not updatable; drillthrough requires new reports

  • Summary

    I shared proven practices and tips from past experience

    Keep things simple but have sound design

    How to contact me: Email: [email protected]

    Web: www.prologika.com

    Blog: http://prologika.com/cs/blogs/

    Newsletter: http://prologika.com/Newsroom/News.aspx