1

MicroStrategy PRIME High Performance In-memory Analytics

2

Speaker Introduction

Bala Chandran – Dir. Enterprise BI, MicroStrategy

• 15 years of experience implementing and designing Big Data and analytics solutions
• Hands-on experience with many MPP and in-memory systems
• @BG_Chandran – ask questions #MSTRPrime

3

High Performance Is No Longer A “Nice To Have” In Analytical Applications

Drivers Of High Performance

1 Users expect “Google-like” performance from analytic applications, especially on mobile devices

2 Exploding data volumes & variety require in-memory consolidation and aggregation

3 Modern analytical applications contain 100s of visualizations, distributed to 1,000s of users daily

4

Strangeloopnetworks.com

Users Expect Sub 3 Second Response From Applications

5

Torbit.com

Performance Directly Correlates to Revenue

• Google found that a 500 ms slowdown equals a 20% decrease in ad revenue.
• Amazon found that a 100 ms slowdown can mean a 1% decrease in revenue.
• Yahoo! found that a 400 ms improvement translated to a 9% increase in traffic.
• Mozilla mapped a 2.2 s improvement to 60 million additional Firefox downloads.

Source: http://blog.edgecast.com/post/42404930702/ecommerce-performance-website-speed-impacts-your#sthash.1Hn7Y4dr.dpuf

6

INTRODUCING MicroStrategy PRIME

10

INTRODUCING MicroStrategy PRIME

PARALLEL: Linear scalability to 1,000s of CPUs
RELATIONAL: Flexible schema & partitioned data
IN-MEMORY: 3x to 10x faster, 7x to 20x more users
ENGINE: Tightly-coupled interactive exploration

11

MicroStrategy PRIME In Action At Facebook

“We have this thing that’s running. It’s one of the most amazing things I’ve seen. It’s running against the entire Facebook user base, 1.1 billion users.”

Guy Bayes, Head of Enterprise BI, Facebook

• 200+ petabytes of Hadoop source data
• 30+ terabytes analyzed in PRIME
• 200+ node cluster
• 3,500+ cores
• 175 billion rows

15

Traditional Technologies Cannot Deliver Performance At High Scale

Custom Approaches Are Expensive And Risky

[Figure: Data Scale vs. User Scale chart positioning Hadoop, MPP databases, and in-memory DBs]

High-Scale Information-Driven Apps

Custom Development: Java + Transactional DB clusters + Web 2.0 + In-memory + BI Tools + …….

• Expensive
• Complex
• Risky
• Slow

16

MicroStrategy PRIME – Purpose Built For Performance @ Scale

[Figure: Data Scale vs. User Scale chart positioning Hadoop, MPP databases, and in-memory DBs]

MicroStrategy PRIME: first out-of-the-box solution in the market

18

MicroStrategy PRIME – Interactive Big Data Exploration

Example Applications
• CRM analysis across a large customer base
• Interactive analysis: large clickstream data
• Merchant analytics for a credit card issuer
• Store manager application for a large chain

Application Characteristics
• Large data volumes
• Sub 3 second response time
• Highly dimensional data
• Complex dashboards with multiple visualizations
• Highly interactive app with users filtering and slicing across many dimensions
• Web & mobile deployments
• Large user populations

19

MicroStrategy PRIME - 7x more users and 3x faster than the next best in-memory technology

7x More Users

3x Faster

Complex analytical dashboard
High user interactivity
200 GB data set with 50+ dimensions
Equivalent hardware configurations – 30 nodes

20

MicroStrategy PRIME is like In-Memory on steroids

             OLAP Services (SMP architecture)   PRIME (MPP architecture)
Data Size    100 GB limit                       No theoretical limit; tested to 4.6 TB
Data Rows    2B limit                           No theoretical limit; tested to 200B
Load Rate    8 GB/hr                            No theoretical limit; tested to 7 TB/hr

21

MicroStrategy PRIME – World’s First Technology to Combine 3 Key Breakthroughs

1 In-Memory Data Store
2 Massively Parallel Processing on Commodity HW
3 Look-Ahead Analytics – Integrated Data & Visualization Layers

Interactive exploration of terabyte datasets by 100,000s of users

22

The Evolution Of Storage

23

1. In-Memory Data Store – How much Faster Is It?

• Traditional Disk speed is a banana slug with a top speed of 0.007 mph

• In-Memory is an F-18 Hornet with a max speed of 1,190 mph
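
Taken at face value, the two speeds in this analogy imply a ratio of roughly 170,000 to 1. The short sketch below only works through that arithmetic; the names are illustrative, and the result describes the analogy, not a measured disk-versus-RAM benchmark.

```python
# Speeds quoted on the slide, in miles per hour.
SLUG_MPH = 0.007      # "traditional disk" in the analogy
HORNET_MPH = 1190.0   # "in-memory" in the analogy


def speed_ratio(fast: float, slow: float) -> float:
    """Return how many times faster `fast` is than `slow`."""
    return fast / slow


ratio = speed_ratio(HORNET_MPH, SLUG_MPH)
print(f"Implied speed-up in the analogy: about {ratio:,.0f}x")
```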

24

RAM Prices Have Fallen Drastically

25

2. Massively Parallel Processing On Commodity Hardware

• Distribute data across 1,000s of nodes
• Parallel query execution and loading
• Inexpensive commodity hardware

[Figure: Traditional BI (query engines on shared memory, creating a bottleneck) vs. PRIME Parallel Execution (distributed data in separate memory, parallel execution)]
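
One common way to distribute data across nodes is hash partitioning on a key column. The slides do not say which scheme PRIME uses, so the sketch below is an assumption for illustration only; `node_for_key`, `partition_rows`, and the `customer_id` column are hypothetical names.

```python
import zlib
from collections import defaultdict


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a partition key to a node index using a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition_rows(rows, key_field, num_nodes):
    """Group rows by the node that owns each row's partition key."""
    partitions = defaultdict(list)
    for row in rows:
        node = node_for_key(str(row[key_field]), num_nodes)
        partitions[node].append(row)
    return dict(partitions)


# Hypothetical sample data: eight rows spread over three nodes.
rows = [{"customer_id": i, "amount": 10 * i} for i in range(1, 9)]
partitions = partition_rows(rows, "customer_id", num_nodes=3)

for node, owned in sorted(partitions.items()):
    print(node, [r["customer_id"] for r in owned])
```

Because the hash is stable, a given key always lands on the same node, so loading and querying can both be routed without a central lookup.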

26

2. Parallel Processing: Scaling The Solution

http://blog.delloem.com/2010/12/talking-hpc-with-sagiv-tech/image001/

PRIME Parallel Efficiency

27

Parallel Processing: Breaking The Problem Down

Vertical Scaling (Scale-up): Generally refers to adding more processors and RAM, or buying a more robust server.

Pros
• Less power consumption / cooling
• Less network hardware than scaling horizontally

Cons
• More expensive
• Greater risk of hardware failure
• Limited upgradeability

Horizontal Scaling (Scale-out): Generally refers to adding more servers, each with fewer processors and less RAM.

Pros
• More cost effective than scaling vertically
• Easier to implement fault tolerance
• Easy to upgrade

Cons
• Bigger footprint in the data center
• Higher utility cost (electricity and cooling)
• Possible need for more networking equipment (switches/routers)

28

Data Movement: The Performance Killer

http://www.edn.com/design/communications-networking/4313434/The-evolution-to-network-flow-processing

Oracle, 2012

90+% YoY growth

50% YoY growth

29

Minimizing Data Movement: Bringing Query To The Data

[Figure: PRIME Parallel Execution, with query engines running in parallel against data distributed across separate memory]

• Query partitioned and executed on the core where the data lives
• Only summary information is sent across the network
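
The two bullets above describe a scatter-gather pattern: each partition reduces its own rows to a small summary, and only those summaries travel to a coordinator. The following is a minimal sketch under stated assumptions, not PRIME's implementation: threads stand in for nodes, and `PARTITIONS`, `local_summary`, and `merge_summaries` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory partitions, one list of rows per node.
PARTITIONS = [
    [{"region": "East", "sales": 120.0}, {"region": "West", "sales": 80.0}],
    [{"region": "East", "sales": 30.0}, {"region": "South", "sales": 55.0}],
    [{"region": "West", "sales": 20.0}, {"region": "South", "sales": 45.0}],
]


def local_summary(rows):
    """Runs where the data lives: reduce raw rows to per-region (sum, count)."""
    summary = {}
    for row in rows:
        total, count = summary.get(row["region"], (0.0, 0))
        summary[row["region"]] = (total + row["sales"], count + 1)
    return summary


def merge_summaries(summaries):
    """Runs on the coordinator: combine the small partial summaries."""
    merged = {}
    for summary in summaries:
        for region, (total, count) in summary.items():
            m_total, m_count = merged.get(region, (0.0, 0))
            merged[region] = (m_total + total, m_count + count)
    return merged


# Threads stand in for nodes; each one sees only its own partition.
with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
    partials = list(pool.map(local_summary, PARTITIONS))

totals = merge_summaries(partials)
averages = {region: total / count for region, (total, count) in totals.items()}
print(averages)
```

Each partial carries a (sum, count) pair instead of an average so that the merge step stays correct: averages of averages would be wrong when partitions hold different numbers of rows.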

30

Commodity Hardware vs. Specialized Appliances

Example PRIME configuration

• 100 clusters of 2 worker nodes; 1 cluster of 20 master nodes
• Each node: 16 cores, 144 GB RAM
• Total: 1920 cores, ~17 TB RAM
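
The totals for this example configuration follow from the per-node figures. The sketch below works through the arithmetic; it assumes 1 TB = 1,000 GB, and `cluster_totals` is a hypothetical helper. Note that the stated totals (1920 cores, ~17 TB) correspond to 120 nodes, while reading the node counts literally (100 × 2 workers plus 20 masters) gives 220 nodes; the slide does not say which reading is intended, so both are computed.

```python
CORES_PER_NODE = 16      # from the slide
RAM_GB_PER_NODE = 144    # from the slide


def cluster_totals(num_nodes):
    """Return (total cores, total RAM in TB), assuming 1 TB = 1000 GB."""
    cores = num_nodes * CORES_PER_NODE
    ram_tb = num_nodes * RAM_GB_PER_NODE / 1000
    return cores, ram_tb


# 120 nodes reproduces the totals stated on the slide.
stated = cluster_totals(120)

# Literal reading of the bullets: 100 clusters x 2 workers + 20 masters.
literal = cluster_totals(100 * 2 + 20)

print("120 nodes:", stated)
print("220 nodes:", literal)
```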

31

3. Look Ahead Analytics – Tightly Integrated Data & Visualization Layers

Traditional BI (loosely coupled visualization layer and data layer)
• Data layer has no knowledge of analytics layer design
• Connections optimized for the lowest common denominator

PRIME – Look Ahead Analytics (visualization layer and data layer)
• Analytics layer optimizes queries for data
• Data layer analyzes dashboard and optimizes structures
• Tightly integrated layers enable optimization
• Analytics layer globally optimizes queries sent to data based on data structures
• Data layer “looks ahead” and plans based on knowledge of dashboard

32

Taking Co-Location One Step Further: In-Process Analytics

[Figure: Query Processing and App Engine shown in Process 0 and Process 1]

Traditional BI
• Even if you install BI and DB on the same server, they run in separate processes

MicroStrategy PRIME
• Query Engine and Application Engine run in-process (in-process analytics)

33

3. Look Ahead Analytics – The Secret Sauce

Typical PRIME Application
• 75+ data sets
• Multiple views of similar data
• Share joins, filters and cohorts

Look Ahead Analytics
• Visualizations with identical information processed once
• Filtering and cohorts processed once and reused - processed into machine code
• Re-use of joined results for analytics with similar information
• ….. Many More
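
The "processed once and reused" idea can be illustrated with a cached cohort that several visualizations share. This is a sketch under stated assumptions, not PRIME's mechanism: it uses plain memoization via `functools.lru_cache`, does not compile filters to machine code, and all names (`ROWS`, `FILTERS`, `cohort`, the dashboard cards) are hypothetical.

```python
from functools import lru_cache

# Hypothetical data set shared by several visualizations.
ROWS = (
    {"user": "a", "country": "US", "age": 25, "spend": 40.0},
    {"user": "b", "country": "US", "age": 41, "spend": 15.0},
    {"user": "c", "country": "DE", "age": 33, "spend": 60.0},
    {"user": "d", "country": "US", "age": 29, "spend": 22.0},
)

# Named filters, so identical filters can be recognized and shared.
FILTERS = {
    "us_under_35": lambda r: r["country"] == "US" and r["age"] < 35,
}

evaluations = {"count": 0}  # tracks how often a filter is actually evaluated


@lru_cache(maxsize=None)
def cohort(filter_name: str) -> tuple:
    """Evaluate a named filter once; later calls reuse the cached row indexes."""
    evaluations["count"] += 1
    predicate = FILTERS[filter_name]
    return tuple(i for i, row in enumerate(ROWS) if predicate(row))


def total_spend(filter_name):
    """Visualization 1: total spend within the cohort."""
    return sum(ROWS[i]["spend"] for i in cohort(filter_name))


def user_count(filter_name):
    """Visualization 2: number of users within the same cohort."""
    return len(cohort(filter_name))


# Two "visualizations" on one dashboard share a single cohort evaluation.
dashboard = {
    "spend_card": total_spend("us_under_35"),
    "users_card": user_count("us_under_35"),
}
print(dashboard, "filter evaluations:", evaluations["count"])
```

Keying the cache on the filter's name is the simplification here; a real engine would need a canonical form of the filter expression to detect that two visualizations ask for the same cohort.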

34

MicroStrategy PRIME in Action

35

MicroStrategy PRIME - Architecture

Architecture diagram labels:
• SOURCE DATA: parallel data loading
• Analytics Engines (DATA … DATA): parallel query execution, optimized in-memory data structure, data partitioning within and across nodes
• Application Engines (VISUALIZATION, API): web and mobile output
• Tightly coupled for minimal computational distance
• Commodity hardware

36

MicroStrategy PRIME Co-exists With Existing Enterprise Databases

[Figure: SOURCE DATA and Data Warehouse alongside MicroStrategy PRIME]

• Does not replace databases
• Functions as hot data layer for apps requiring high performance
• Load from databases or directly from files and Hadoop

37

Thank You

@BG_Chandran #MSTRPrime