How to Buy Data Warehouse DBMS February 2009 Final

  • 8/12/2019 How to Buy Data Warehouse DBMS February 2009 Final

    How to Select an Analytic DBMS

    Overview, checklists, and tips

    by

    Curt A. Monash, Ph.D.

    President, Monash Research

    Editor, DBMS2

    contact@monash.com

    http://www.monash.com

    http://www.DBMS2.com

    Curt Monash

    Analyst since 1981, own firm since 1987

    Covered DBMS since the pre-relational days

    Also analytics, search, etc.

    Publicly available research

    Blogs, including DBMS2 (www.DBMS2.com -- the source for most of this talk)

    Feed at www.monash.com/blogs.html

    White papers and more at www.monash.com

    User and vendor consulting

    Our agenda

    Why are there such things as specialized analytic DBMS?

    What are the major analytic DBMS product alternatives?

    What are the most relevant differentiations among analytic DBMS users?

    What's the best process for selecting an analytic DBMS?

    Software strategies to optimize analytic I/O

    Minimize data returned

    Classic query optimization

    Minimize index accesses

    Page size

    Precalculate results

    Materialized views

    OLAP cubes

    Return data sequentially

    Store data in columns

    Stash data in RAM
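To make "store data in columns" concrete, here is a minimal Python sketch. It is not from the talk: the table, its field names, and the use of "values touched" as a rough stand-in for I/O are all illustrative assumptions.

```python
# Illustrative sketch only: counts values touched as a rough proxy for I/O.
# The table and its field names are hypothetical.

rows = [
    {"order_id": 1, "region": "east", "amount": 120.0, "status": "paid"},
    {"order_id": 2, "region": "west", "amount": 75.5, "status": "paid"},
    {"order_id": 3, "region": "east", "amount": 310.0, "status": "open"},
]


def sum_amount_row_store(rows):
    """Row layout: every field of every row is read to reach 'amount'."""
    touched = 0
    total = 0.0
    for row in rows:
        touched += len(row)  # the whole row is read together
        total += row["amount"]
    return total, touched


def to_column_store(rows):
    """Pivot the same data so each column is stored contiguously."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def sum_amount_column_store(columns):
    """Column layout: only the 'amount' column is read, sequentially."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


row_total, row_touched = sum_amount_row_store(rows)
col_total, col_touched = sum_amount_column_store(to_column_store(rows))
```

Both layouts return the same total, but the column layout touches one value per row instead of every field. For wide tables and narrow queries, that gap is the point of the columnar approach.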

    Hardware strategies to optimize analytic I/O

    Lots of RAM

    Parallel disk access!!!

    Lots of networking

    Tuned MPP (Massively Parallel Processing) is ideal.

    Recommended configurations are a mixed bag.

    Specialty hardware strategies

    Custom or unusual chips (rare)

    Custom or unusual interconnects

    Fixed configurations of common parts

    Appliances or recommended configurations

    And there's also SaaS.

    General areas of feature differentiation

    Most influenced by architecture

    Query performance

    Update/load performance

    Alternate datatypes

    Most influenced by product maturity

    Compatibilities

    Advanced analytics

    Manageability and availability

    Encryption and security

    Major analytic DBMS product groupings

    Architecture is a good first categorization

    Traditional OLTP

    Row-based MPP

    Columnar

    (Not covered tonight) MOLAP/array-based

    Traditional OLTP examples

    Oracle (especially pre-Exadata)

    IBM DB2 (especially mainframe)

    Microsoft SQL Server (pre-Madison)

    Analytic optimizations for OLTP DBMS

    Performance

    Two major kinds of precalculation

    Star indexes

    Materialized views

    Other specialized indexes

    Query optimization tools

    Other

    OLAP extensions

    SQL 2003

    Other embedded analytics
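As a rough illustration of precalculation in the materialized-view style, the sketch below aggregates once and then reads the stored result. The schema and data are hypothetical. SQLite has no materialized views, so a summary table built with CREATE TABLE ... AS SELECT stands in for one here.

```python
import sqlite3

# Hypothetical schema and data, for illustration only. SQLite has no
# materialized views, so a summary table built with CREATE TABLE ... AS
# SELECT stands in for one here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 25.0)],
)

# Without precalculation: aggregate the detail rows at query time.
live = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# With precalculation: pay the aggregation cost once, then read the result.
conn.execute(
    "CREATE TABLE sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)
precalculated = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
```

The summary table goes stale as soon as new rows land in the detail table, so precalculation trades query speed against load work and data freshness.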

    Drawbacks

    Complexity and people cost

    Hardware cost

    Software cost

    Absolute performance

    Legitimate use scenarios

    When TCO isn't an issue

    Undemanding performance (and therefore administration too)

    When specialized features matter

    OLTP-like

    Integrated MOLAP

    Edge-case analytics

    Rigid enterprise standards

    Small enterprise/true single-instance

    Row-based MPP examples

    Teradata

    DB2 (open systems version)

    Netezza

    Oracle Exadata (sort of)

    DATAllegro/Microsoft Madison

    Greenplum

    Aster Data

    Kognitio

    HP Neoview

    Typical design choices in row-based MPP

    Random (hashed or round-robin) data distribution among nodes

    Large block sizes

    Suitable for scans rather than random accesses

    Limited indexing alternatives

    Or little optimization for using the full boat

    Carefully balanced hardware

    High-end networking
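The two distribution schemes named above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's implementation: the node count, the CRC32 hash, and the `customer_id` key are all hypothetical choices.

```python
import zlib
from itertools import cycle

NUM_NODES = 4  # hypothetical cluster size


def hash_node(key, num_nodes=NUM_NODES):
    """Hashed placement: the same key always lands on the same node."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute_hashed(rows, key, num_nodes=NUM_NODES):
    """Assign each row to the node chosen by hashing its distribution key."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[hash_node(row[key], num_nodes)].append(row)
    return nodes


def distribute_round_robin(rows, num_nodes=NUM_NODES):
    """Round-robin placement: rows are dealt out in turn, ignoring content."""
    nodes = [[] for _ in range(num_nodes)]
    for row, node_id in zip(rows, cycle(range(num_nodes))):
        nodes[node_id].append(row)
    return nodes


rows = [{"customer_id": i % 5, "amount": i} for i in range(20)]
hashed = distribute_hashed(rows, "customer_id")
round_robin = distribute_round_robin(rows)
```

Round-robin yields evenly sized nodes regardless of content. Hashing keeps all rows for a given key on one node, which can help joins and aggregations on that key but may leave nodes unevenly loaded when keys are skewed.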

    Tradeoffs among row MPP alternatives

    Enterprise standards

    Vendor size

    Hardware lock-in

    Total system price

    Features

    Columnar DBMS examples

    Sybase IQ

    Vertica

    InfoBright

    SAND

    ParAccel

    Kickfire

    Exasol

    MonetDB

    SAP BI Accelerator (sort of)

    Segmentation made (too) simple

    One database to rule them all

    One analytic database to rule them all

    Frontline analytic database

    Very, very big analytic database

    Big analytic database handled very cost-effectively

    Use cases -- a first cut

    Light reporting

    Diverse EDW

    Big Data

    Operational analytics

    Metrics -- a first cut

    Total raw/user data

    Below 1-2 TB, references abound

    10 TB is another major breakpoint

    Total concurrent users

    5, 15, 50, or 500?

    Data freshness

    Hours

    Minutes

    Seconds

    Basic platform issues

    Enterprise standards

    Appliance-friendliness

    Need for MPP?

    Cloud/SaaS

    The selection process in a nutshell

    Figure out what you're trying to buy

    Make a shortlist

    Do free POCs*

    Evaluate and decide

    *The only part that's even slightly specific to the analytic DBMS category

    Use-case checklist -- generalities

    Database growth

    As time goes by

    More detail

    New data sources

    Users (human)

    Users/usage (automated)

    Freshness (data and query results)

    Use-case checklist -- traditional BI

    Reports

    Today

    Future

    Dashboards and alerts

    Today

    Future

    Latency

    Ad-hoc

    Users

    Now that we have great response time

    Use-case checklist -- predictive analytics

    How much do you think it would improve results to:

    Run more models?

    Model on more data?

    Add more variables?

    Increase model complexity?

    Which of those can the DBMS help with anyway?

    What about scoring?

    Real-time

    Other latency issues

    SLA realism

    What kind of turnaround truly matters?

    Customer or customer-facing users

    Executive users

    Analyst users

    How bad is downtime?

    Customer or customer-facing users

    Executive users

    Analyst users

    Short list constraints

    Cash cost

    But purchases are heavily negotiated

    Deployment effort

    Appliances can be good

    Platform politics

    You might as well consider incumbent(s)

    Appliances can be frowned on

    Filling out the shortlist

    Who matches your requirements in theory?

    What kinds of evidence do you require?

    References? How many? How relevant?

    A careful POC?

    Analyst recommendations?

    General buzz?

    Proof-of-Concept basics

    The better you match your use cases, the more reliable the POC is

    Most of the effort is in the set-up

    You might as well do POCs for several vendors at (almost) the same time!

    Where is the POC being held?

    The three big POC challenges

    Getting data

    Real?

    Politics

    Privacy

    Synthetic?

    Hybrid?

    Picking queries

    And more?

    Realistic simulation(s)

    Workload

    Platform

    Talent
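One way to keep the "picking queries" and "workload" pieces of a POC repeatable across vendors is a small timing harness. The sketch below is an assumption-laden illustration, not a method from the talk: SQLite stands in for the candidate DBMS, and the table, data, and query names are hypothetical.

```python
import sqlite3
import statistics
import time


def time_queries(conn, queries, repeats=3):
    """Run each named query `repeats` times; return median seconds per query."""
    results = {}
    for name, sql in queries.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            # conn.execute is a sqlite3 shortcut; other drivers typically
            # need an explicit cursor. Fetch all rows so the work is done.
            conn.execute(sql).fetchall()
            timings.append(time.perf_counter() - start)
        results[name] = statistics.median(timings)
    return results


# Stand-in database and queries; a real POC would use the candidate DBMS
# and queries drawn from the actual workload.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)
queries = {
    "full_scan_sum": "SELECT SUM(value) FROM events",
    "group_by_user": "SELECT user_id, COUNT(*) FROM events GROUP BY user_id",
}
medians = time_queries(conn, queries)
```

Running the same named query set against each shortlisted system keeps the comparison in the buyer's hands. A single-user loop like this says nothing about concurrency, which needs its own test.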

    POC tips

    Don't underestimate requirements

    Don't overestimate requirements

    Get SOME data ASAP

    Don't leave the vendor in control

    Test what you'll actually be buying

    Use the baseball bat

    Evaluate and decide

    It all comes down to

    Cost

    Speed

    Risk

    and in some cases

    Time to value

    Upside
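If it helps to put the criteria above side by side, a simple weighted score is one possible approach. The talk does not prescribe a scoring method; every weight, candidate name, and score in this sketch is made up for illustration.

```python
# All weights, candidate names, and scores below are made up for illustration.
# Scores are on a 1-5 scale where higher is better (so a high "cost" score
# means low cost, and a high "risk" score means low risk).
WEIGHTS = {
    "cost": 0.4,
    "speed": 0.3,
    "risk": 0.2,
    "time_to_value": 0.05,
    "upside": 0.05,
}

candidates = {
    "vendor_a": {"cost": 4, "speed": 3, "risk": 5, "time_to_value": 3, "upside": 2},
    "vendor_b": {"cost": 2, "speed": 5, "risk": 3, "time_to_value": 4, "upside": 4},
}


def weighted_score(scores, weights=WEIGHTS):
    """Weighted average of criterion scores; weights are normalized first."""
    total_weight = sum(weights.values())
    return sum(scores[c] * w for c, w in weights.items()) / total_weight


ranking = sorted(
    candidates,
    key=lambda name: weighted_score(candidates[name]),
    reverse=True,
)
```

The ranking is only as good as the weights, so it is worth checking how far they must shift before the order flips.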

    Further information

    Curt A. Monash, Ph.D.

    President, Monash Research

    Editor, DBMS2

    contact@monash.com

    http://www.monash.com

    http://www.DBMS2.com