
SnappyData: A Unified Cluster for Streaming, Transactions, & Interactive Analytics | Version 0.7 | © SnappyData Inc 2017

www.Snappydata.io

Barzan Mozafari | Jags Ramnarayan | Sudhir Menon

Yogesh Mahajan | Soubhik Chakraborty | Hemant Bhanawat | Kishor Bachhav

Our Pedigree

Backed by GTD Ventures. Team: Pune (25+), Portland (5), Ann Arbor (1).

Mixed Workloads Are Everywhere

Stream Processing | Transactions | Interactive Analytics

• Analytics on mutating data

• Correlating and joining streams with large histories

• Maintaining state or counters while ingesting streams

Mixed Workloads Are Everywhere

[Figure: IoT devices feed STREAMS into a data-in-motion analytics application]

• Enrich: requires a reference-data join (Reference DB)

• State: manage windows with mutating state

• Analytics: joins to history, trend patterns; model maintenance (Models)

• Outputs: alerts, transaction writes, and updates to a scalable store (HDFS, MPP DB)

• Interactive OLAP queries run against that store

Why Is Supporting Mixed Workloads Difficult?

Data Structures | Query Processing Paradigm | Scheduling & Provisioning
Columnar        | Batch processing          | Long-running
Row stores      | Point lookups             | Short-lived
Sketches        | Delta / incremental       | Bursty

Each workload favors a different data structure, processing paradigm, and scheduling model.

Lambda Architecture

[Figure: the classic Lambda pipeline]

1. New data is dispatched to both the batch and speed layers.
2. The batch layer appends it to the immutable master dataset.
3. The serving layer exposes precomputed batch views.
4. The speed layer maintains real-time views over recent data.
5. Queries merge batch views with real-time views.
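To see where the complexity comes from, here is a minimal, hypothetical sketch (not from the talk) of the same per-key aggregate implemented twice, once per layer, and reconciled at query time:

// Hypothetical sketch of Lambda's duplicated logic: the same per-key sum must
// exist twice (batch + speed) and be merged on every read. Names are illustrative.
object LambdaSketch {
  // (2)-(3) Batch layer: recompute a view over the immutable master dataset.
  def batchView(master: Seq[(String, Double)]): Map[String, Double] =
    master.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

  // (4) Speed layer: the same aggregate, maintained over recent events only.
  def realtimeView(recent: Seq[(String, Double)]): Map[String, Double] =
    recent.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

  // (5) Query: merge the two views; every query pays this reconciliation cost.
  def query(batch: Map[String, Double], rt: Map[String, Double]): Map[String, Double] =
    (batch.keySet ++ rt.keySet).map { k =>
      k -> (batch.getOrElse(k, 0.0) + rt.getOrElse(k, 0.0))
    }.toMap
}

In a real deployment the two code paths live in different products with different APIs, which is exactly the complexity the next slide calls out.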

Lambda Architecture is Complex

[Figure: the same IoT pipeline, now assembled from separate products]

• Streams: Storm, Spark Streaming, Samza, ...

• Reference DB: enterprise DBs such as Oracle, Postgres, ...

• Scalable store: HDFS plus NoSQL such as Cassandra, Redis, HBase, ...

• Interactive OLAP: Teradata, Greenplum, Vertica, ...

• Complexity

- Learn and master multiple products, data models, disparate APIs & configs

• Wasted resources

• Slower

- Excessive copying, serialization, and shuffles
- Impossible to achieve interactive-speed analytics on large or mutating data

Can We Simplify & Optimize?

Our Solution: SnappyData

[Figure: fusing a batch engine with a real-time store into one cluster]

• Batch design, high throughput; rapidly maturing

• Real-time design: low latency, HA, concurrency; matured over 13 years

• Deep scale, high volume: MPP DB

Single Unified HA Cluster: OLTP + OLAP + Streaming for Real-time Analytics

We Transform Spark from This …

[Figure: Spark today]

• Each user/app runs its own Spark Master, Executors (workers), framework for streaming SQL, ML, and an immutable cache

• The cache cannot be updated and is repeated for each user/app

• All apps funnel into shared HDFS / SQL / NoSQL stores: the bottleneck

… Into an “Always-On” Hybrid Database!

[Figure: the SnappyData design]

• Long-lived Spark Executor JVMs (workers) host both the framework for streaming SQL, ML and an in-memory ROW + COLUMN store with indexing: mutable and transactional

• A Spark Driver coordinates the cluster; a single cluster serves JDBC, ODBC, and Spark jobs, with shared-nothing persistence

• HDFS, SQL, NoSQL, and MPP DBs remain as deep-scale, high-volume history stores

Unified Data Model & API

• Cluster Manager & Scheduler, Distributed Membership Service, HA

• SnappyData Server = Spark Executor + Store

• Engine: Parser, OLAP, TRX, Data Synopsis Engine, Stream Processing

• HYBRID Store: probabilistic structures, rows, columns, indexes

• Query Optimizer; servers can be added/removed online

• Two access paths: Tables via ODBC/JDBC (low latency) and DataFrame/RDD (high latency)

Key additions over stock Spark:

• Mutability (DML + transactions)

• Indexing

• SQL-based streaming
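As a sketch of what the unified API buys, assuming SnappyData's SnappySession and its CREATE TABLE ... USING row|column DDL (the schema and local mode below are made up for illustration):

import org.apache.spark.sql.{SnappySession, SparkSession}

// A minimal sketch of the unified model: one session handles DML, transactions,
// and OLAP. Table schemas are hypothetical.
object UnifiedApiSketch {
  def main(args: Array[String]): Unit = {
    val spark  = SparkSession.builder().appName("sketch").master("local[*]").getOrCreate()
    val snappy = new SnappySession(spark.sparkContext)

    // Mutable, transactional row table (low-latency point lookups and DML).
    snappy.sql("CREATE TABLE IF NOT EXISTS publishers (id INT PRIMARY KEY, name VARCHAR(64)) USING row")

    // Column table for analytics, partitioned for scale.
    snappy.sql(
      """CREATE TABLE IF NOT EXISTS impressions (pubId INT, geo VARCHAR(2), bid DOUBLE)
        |USING column OPTIONS (PARTITION_BY 'pubId')""".stripMargin)

    // Same session, no second system, no copy.
    snappy.sql("INSERT INTO publishers VALUES (1, 'acme')")
    snappy.sql("UPDATE publishers SET name = 'acme media' WHERE id = 1")
    snappy.sql("SELECT geo, avg(bid) FROM impressions GROUP BY geo").show()
  }
}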


Hybrid Store

[Figure: one store, many access paths]

• Unbounded stream ingestion lands in the row buffer and ages into columns

• Real-time sampling feeds the probabilistic store

• Transactional state updates and random writes (reference data) hit indexed rows

• Columns serve OLAP; together the pieces serve stream analytics

Updates & Deletes on Column Tables

[Figure: one partition of a column table over time]

• Writes land in a row buffer under MVCC

• Periodic compaction turns the row buffer into a new immutable column segment covering a time range (t1-t2, t2-t3, ...)

• Each segment holds keys (K11, K12, ...), column values (C11 ..., C21 ...), a 0/1 bitmap, and summary metadata

• New segments are replicated for HA


Probabilistic Store: Synopses + Uniform & Stratified Samples

1. Streaming CMS (Count-Min Sketch)

• Higher resolution for more recent time ranges

• [Figure: time windows [t1, t2), [t2, t3), [t3, t4), [t4, now) with widths 4T, 2T, T, ≤T]

2. Top-K queries w/ arbitrary filters

• Maintain a small sample at each CMS cell

• [Figure: traditional CMS vs. CMS + samples]
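For reference, a toy Count-Min Sketch, i.e. the generic data structure named above rather than SnappyData's time-windowed, sample-augmented implementation:

import scala.util.hashing.MurmurHash3

// Toy Count-Min Sketch: d hash rows x w counters per row. Point estimates may
// overcount (hash collisions) but never undercount.
final class CountMinSketch(d: Int = 5, w: Int = 2048) {
  private val counts = Array.ofDim[Long](d, w)

  private def bucket(item: String, row: Int): Int = {
    val h = MurmurHash3.stringHash(item, row) // the row index doubles as the hash seed
    ((h % w) + w) % w                         // map to a non-negative bucket
  }

  def add(item: String, n: Long = 1L): Unit =
    (0 until d).foreach(r => counts(r)(bucket(item, r)) += n)

  def estimate(item: String): Long =
    (0 until d).map(r => counts(r)(bucket(item, r))).min
}

Keeping one such sketch per time window, with older windows covering wider ranges (4T, 2T, T, ≤T above), yields the decaying resolution the slide describes; attaching a small sample to each cell is what enables top-k with arbitrary filters.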

3. Fully distributed stratified samples

• Always include timestamp as a stratified column for streams

• [Figure: stream samples age from an in-memory row store to an on-disk column store, stratified by timestamp]
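The sampling side can be sketched with a generic one-pass, per-stratum reservoir; names are illustrative, and SnappyData's distributed version additionally stratifies on timestamp as noted above:

import scala.collection.mutable
import scala.util.Random

// Generic one-pass stratified sampling: a bounded uniform reservoir per stratum,
// e.g. stratum = (geo, timeBucket). Names are illustrative only.
final class StratifiedSampler[K, V](perStratum: Int, rng: Random = new Random(7)) {
  private val reservoirs = mutable.Map.empty[K, mutable.ArrayBuffer[V]]
  private val seen = mutable.Map.empty[K, Int].withDefaultValue(0)

  def observe(stratum: K, value: V): Unit = {
    val n = seen(stratum) + 1                  // items seen in this stratum so far
    seen(stratum) = n
    val res = reservoirs.getOrElseUpdate(stratum, mutable.ArrayBuffer.empty[V])
    if (res.size < perStratum) res += value    // fill phase
    else {
      val j = rng.nextInt(n)                   // keep new item with prob perStratum/n
      if (j < perStratum) res(j) = value
    }
  }

  def sample(stratum: K): Seq[V] = reservoirs.getOrElse(stratum, Nil).toSeq
}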


Supporting Real-time & HA

[Figure: cluster anatomy: Locator (JDBC/ODBC entry, catalogue service); Lead Node (managed Spark driver and Spark Context, Snappy cluster manager, REST, Spark jobs and programs); Executor JVMs i.e. servers (memory management, blocks, Snappy store, stream and table data, DataFrames); shared-nothing persistence]

• Spark Executors are long-running; a driver failure doesn't shut down the Executors

• Driver HA: drivers are "managed" by SnappyData, with a standby secondary

• Data HA: consensus-based clustering integrated for eager replication


Query Optimization

• Bypass the scheduler for transactions and low-latency jobs

• Minimize shuffles aggressively

- Dynamic replication for reference data
- Retain 'join indexes' whenever possible
- Collocate and co-partition related tables and streams

• Optimized 'Hash Join', 'Scan', 'GroupBy' relative to Spark

- Use more variables (eliminating virtual function calls) to generate better code
- Use vectorized structures
- Avoid Spark's single-node bottlenecks in broadcast joins

• Column segment pruning through metadata

Co-partitioning & Co-location

[Figure: each Spark Executor holds one subscriber range (A-M, N-Z) plus replicated reference data, fed by the matching Kafka queue partition]

Linearly scale with partition pruning.
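A hedged DDL sketch of this pattern, using SnappyData's documented PARTITION_BY and COLOCATE_WITH options with made-up tables:

import org.apache.spark.sql.SnappySession

// Sketch of co-partitioning/co-location via SnappyData-style DDL; tables and
// columns are hypothetical.
object ColocationSketch {
  def defineColocatedTables(snappy: SnappySession): Unit = {
    snappy.sql(
      """CREATE TABLE subscribers (subId VARCHAR(32), plan VARCHAR(16))
        |USING column OPTIONS (PARTITION_BY 'subId')""".stripMargin)

    // Same partitioning key + COLOCATE_WITH => matching partitions live on the
    // same server, so subscriber joins are local (no shuffle).
    snappy.sql(
      """CREATE TABLE callRecords (subId VARCHAR(32), durationSec INT)
        |USING column OPTIONS (PARTITION_BY 'subId', COLOCATE_WITH 'subscribers')""".stripMargin)

    // Reference data as a row table without PARTITION_BY: replicated to every
    // server (per the docs), so enrichment joins also stay local.
    snappy.sql("CREATE TABLE refGeo (code VARCHAR(2), region VARCHAR(32)) USING row")
  }
}

With both tables partitioned on subId and colocated, the subscriber join in the figure prunes to matching local partitions, which is what lets throughput scale linearly.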



Approximate Query Processing (AQP): Academia vs. Industry

25+ years of successful research in academia: AQUA, Online Aggregation, MapReduce Online, STRAT, ABS, BlinkDB / G-OLA, ...

BUT: user-facing AQP is almost non-existent in the commercial world (only scattered approximate features in Infobright, Yahoo's Druid, Facebook's Presto, Oracle 12c, ...).

Example:

select geo, avg(bid) from adImpressions
group by geo
having avg(bid) > 10
with error 0.05 at confidence 95

geo | avg(bid) | error | prob_existence
MI  | 21.5     | ±0.4  | 0.99
CA  | 18.3     | ±5.1  | 0.80
MA  | 15.6     | ±2.4  | 0.81
... | ...      | ...   | ...

WHY?

1. Incompatible w/ BI tools
2. Complex semantics
3. Bad sales pitch!


A First Industrial-Grade AQP Engine

1. High-Level Accuracy Contract (HAC)

• The user picks a single number p, where 0 ≤ p ≤ 1 (by default p = 0.95)

• Snappy guarantees that the user only sees results that are at least p% accurate

• Snappy handles (and hides) everything else!

geo | avg(bid)
MI  | 21.5
WI  | 42.3
NY  | 65.6
... | ...

2. Fully compatible w/ BI tools

• Set HAC behavior at the JDBC/ODBC connection level

Why approximation pays off:

• Concurrency: tens of queries in shared clusters

• Resource usage: everyone hates their AWS bill

• Fewer network shuffles

• Immediate results while waiting for the final ones

3. Better marketing: iSight (Immediate inSight)
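Because HAC rides on plain SQL, any JDBC client can issue the query shown earlier; a sketch, where the thin-client URL and table are assumptions:

import java.sql.DriverManager

// Issuing the error-bounded query from the earlier slide over plain JDBC, which
// is how HAC stays BI-tool compatible. The connection URL below is illustrative.
object AqpOverJdbc {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection("jdbc:snappydata://localhost:1527/")
    val rs = conn.createStatement().executeQuery(
      """select geo, avg(bid) from adImpressions
        |group by geo having avg(bid) > 10
        |with error 0.05 at confidence 95""".stripMargin)
    while (rs.next()) println(s"${rs.getString("geo")} -> ${rs.getDouble(2)}")
    conn.close()
  }
}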

Benchmarks

• Mixed benchmark: Ad Analytics

- Ad impressions arrive on a message bus
- Aggregate by publisher and geo
- Report avg bid, # of impressions, and # of uniques every few secs
- Write to a partitioned store
- Transactionally update the profiles during ingestion
- Q1: Top-20 ads receiving the most impressions per region
- Q2: Top-20 ads receiving the largest bids per geo
- Q3: Top-20 publishers receiving the largest sum of bids overall
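For concreteness, a sketch of Q3 in SQL; the adImpressions schema is assumed from the description above, and Q1/Q2 are per-group top-20s that would additionally need window functions:

object BenchmarkQueries {
  // Hypothetical schema: adImpressions(publisher, geo, region, ad, bid).
  val q3: String =
    """SELECT publisher, sum(bid) AS totalBid
      |FROM adImpressions
      |GROUP BY publisher
      |ORDER BY totalBid DESC
      |LIMIT 20""".stripMargin
}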

• TPC-H

[Figure: the mixed benchmark spans Streaming, OLAP, and Trx]

Setup


• HW: 5 c4.2xlarge EC2 instances

- 1 coordinator + 4 workers

• Software: latest GA versions available

- Kafka 2.10_0.8.2.2
- Spark 2.0.0
- Cassandra 3.9 w/ Spark-Cassandra connector 2.0.0_M3
- MemSQL Ops-5.5.10 Community Edition w/ Spark-MemSQL Connector 2.10_1.3.3
- SnappyData 0.6.1

Ad Analytics

1.5-2x faster ingestion; 7-142× faster analytics (at 300M records)

Data Synopsis Engine

TPC-H (100 GB)

       | SnappyData | MemSQL | Spark 2.0
Avg    | 5.7s       | 12.0s  | 66.9s
Median | 2.5s       | 6.4s   | 55.1s

SnappyData was 2.6x faster than MemSQL & 22.4x faster than Spark 2.0 (on medians).

Conclusion

Where Are We Today?

• Current customers: investment banking, industrial IoT, telco, ad analytics, & healthcare

• Current release: 0.7 (GA 1.0 in Q1 2017)

• Next funding round: Q2 2017

• Upcoming features:

- Integration of Spark ML w/ our Data Synopsis Engine
- Cost-based query optimizer
- Physical designer & workload miner (http://CliffGuard.org)

Lessons Learned

1. A unique experience marrying two different breeds of distributed systems: lineage-based for high throughput vs. (consensus-based) replication for low latency

2. A unified cluster is simpler, cheaper, and faster

- By sharing state across apps, we decouple apps from data servers and provide HA
- Saves memory, data copying, serialization, and shuffles
- Co-partitioning and co-location enable faster joins and stream analytics

3. Advantages over HTAP engines: deep stream integration + AQP

- Stream processing ≠ stream analytics
- Top-k w/ almost arbitrary predicates + one-pass stratified sampling over streams

4. Commercializing academic work is lots of work, but also lots of fun

THANK YOU !

Try our iSight cloud for free: http://snappydata.io/iSight
