INTRODUCING: CREATE PIPELINE


Exactly-once semantics with Kafka and MemSQL Pipelines

OCTOBER 2016

Gary Orenstein, CMO; Steven Camiña, Product Manager

How important are real-time initiatives to your organization?

Important: 32%
Very Important: 29%
Somewhat Important: 22%
Critical: 14%
Not Important: 3%

Source: MemSQL State of Real Time Survey 2016, n=125

Why are you pursuing real-time initiatives?

Deliver real-time dashboards: 52%
Improve customer experience: 52%
Power new applications (Example: IoT data processing): 43%
Increase efficiency and reduce cost: 36%
Drive new revenue opportunities: 29%
Other: 4%

Source: MemSQL State of Real Time Survey 2016, n=125


Want real-time dashboards and a better customer experience?

Start with fresh data


A Modern Approach to Real-Time Data

Streaming, Database, Data Warehouse


Streaming, database, and data warehouse workloads


Real-time pipelines, OLTP, and OLAP

High-volume transactions (OLTP)

Fast, scalable SQL analytics (OLAP)

Real-time pipelines


The Database Platform for Real-Time Analytics

Real-Time
• Fast ingest
• Low-latency queries
• High concurrency

Scalable
• Horizontal scale
• On-premises or cloud
• Highly available and secure

Proven
• ANSI SQL
• Transactional and relational
• Universal ecosystem


Today’s Streaming Discussion: web logs, mobile apps, IoT, and sensors


The Enterprise Opportunity: Past and Future

Web logs, mobile apps, IoT, and sensors

Take Existing Batch Processes Real-Time


Nothing Is Closer to Real Time Than Streaming

Let’s look at the leading edge: Apache Kafka messaging semantics

• At most once
• At least once
• Exactly once


At most once


At least once


Exactly-Once


Understanding Streaming Semantics

At most once
• Message pulled once
• May or may not be received
• No duplicates
• Possible missing data

At least once
• Message pulled one or more times; processed each time
• Receipt guaranteed
• Likely duplicates
• No missing data

Exactly once
• Message pulled one or more times; processed once
• Receipt guaranteed
• No duplicates
• No missing data
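The difference between the first two guarantees comes down to when a consumer commits its offset relative to processing a message. As a minimal sketch (plain Python, not the Kafka client API; the in-memory log and the names `run_consumer`, `commit_first`, and `crash_at` are hypothetical), this simulation crashes a consumer once, between its commit and process steps, and restarts it from the last committed offset.

```python
class Crash(Exception):
    """Simulated consumer failure."""


def run_consumer(log, commit_first, crash_at):
    """Consume `log`, injecting one crash at position `crash_at` between the
    commit step and the process step. Returns what was delivered downstream."""
    committed = 0      # durable offset: where a restarted consumer resumes
    output = []        # records the downstream system received
    crashed = False    # the crash is injected only once

    def maybe_crash(position):
        nonlocal crashed
        if position == crash_at and not crashed:
            crashed = True
            raise Crash()

    while committed < len(log):
        position = committed                  # (re)start from the committed offset
        try:
            while position < len(log):
                if commit_first:              # at most once: commit, then process
                    committed = position + 1
                    maybe_crash(position)
                    output.append(log[position])
                else:                         # at least once: process, then commit
                    output.append(log[position])
                    maybe_crash(position)
                    committed = position + 1
                position += 1
        except Crash:
            pass                              # restart: the outer loop resumes at `committed`
    return output


log = ["m0", "m1", "m2", "m3"]
print(run_consumer(log, commit_first=True, crash_at=1))   # ['m0', 'm2', 'm3']: m1 lost
print(run_consumer(log, commit_first=False, crash_at=1))  # ['m0', 'm1', 'm1', 'm2', 'm3']: m1 duplicated
```

Committing first loses `m1`; processing first replays it. Neither ordering on its own gives exactly-once delivery, which requires the processed result and the offset to be committed atomically.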


MemSQL Pipelines


CREATE TABLE


CREATE PIPELINE


Introducing MemSQL Pipelines

CREATE PIPELINE is a database construct that enables data ingestion with exactly-once semantics:
• MemSQL stores the Kafka offsets in a table
• Exactly-once delivery is facilitated by co-locating data and offsets, so a batch of rows and the offsets it advances can be committed in the same transaction

• Extract, transform, and load external data natively
• Fully distributed workloads
• User-defined transformations
• Scalable, highly performant, online ALTER TABLE and ALTER PIPELINE
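To illustrate why co-locating data and offsets helps, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for MemSQL. The `events` and `offsets` tables and the `load_batch` helper are hypothetical and do not reflect MemSQL's internal schema; the point is only that the loaded rows and the advanced offset are written in one transaction, so a crash rolls back both together.

```python
import sqlite3


def make_db():
    """Stand-in database: a data table plus a hypothetical offsets table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (payload TEXT)")
    db.execute(
        "CREATE TABLE offsets (topic TEXT, part INTEGER, next_offset INTEGER, "
        "PRIMARY KEY (topic, part))"
    )
    db.execute("INSERT INTO offsets VALUES ('tweets', 0, 0)")
    db.commit()
    return db


def load_batch(db, topic, part, log, batch_size, fail=False):
    """Load the next batch from `log` and advance the stored offset atomically.
    Returns the number of messages committed (0 if the batch was rolled back)."""
    (start,) = db.execute(
        "SELECT next_offset FROM offsets WHERE topic = ? AND part = ?", (topic, part)
    ).fetchone()
    batch = log[start:start + batch_size]
    try:
        with db:  # commits on normal exit, rolls back if an exception escapes
            db.executemany("INSERT INTO events VALUES (?)", [(m,) for m in batch])
            if fail:
                raise RuntimeError("simulated crash before commit")
            db.execute(
                "UPDATE offsets SET next_offset = ? WHERE topic = ? AND part = ?",
                (start + len(batch), topic, part),
            )
    except RuntimeError:
        return 0  # rows and offset rolled back together; a retry rereads `start`
    return len(batch)


db = make_db()
log = [f"tweet-{i}" for i in range(5)]
load_batch(db, "tweets", 0, log, 3, fail=True)   # crash mid-batch
while load_batch(db, "tweets", 0, log, 3):       # retry until caught up
    pass
rows = [r[0] for r in db.execute("SELECT payload FROM events ORDER BY rowid")]
print(rows)  # ['tweet-0', 'tweet-1', 'tweet-2', 'tweet-3', 'tweet-4']
```

Because the retry reads `next_offset` from the same database that holds the data, the rolled-back batch is loaded once rather than twice. If the offset lived in a separate store committed independently, a crash between the two commits would reintroduce either a gap or a duplicate.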


MemSQL Pipelines Sequence

1. Extract from data sources
2. Transform extracted data
3. Load transformed data into database tables in parallel
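The three steps can be sketched in plain Python under stated assumptions: in-memory lists stand in for Kafka partitions and for database storage, and `transform` is a hypothetical user-defined function that parses `user,text` lines. Each partition is extracted, transformed, and loaded by its own worker thread, loosely mirroring the parallel load; this is an illustration, not how MemSQL schedules its work.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical source data: each partition is a list of raw "user,text" lines.
PARTITIONS = {
    0: ["alice,hello world", "bob,kafka rocks"],
    1: ["carol,exactly once"],
}

# Hypothetical destination: one in-memory slice per partition, so each worker
# writes only to its own list and no locking is needed.
TABLE_SLICES = {pid: [] for pid in PARTITIONS}


def extract(pid):
    """1. Extract: read the raw records for one partition."""
    return list(PARTITIONS[pid])


def transform(raw_line):
    """2. Transform: user-defined function turning one raw line into a row."""
    user, text = raw_line.split(",", 1)
    return (user, text.upper(), len(text))


def load(pid, rows):
    """3. Load: append transformed rows to this partition's slice of the table."""
    TABLE_SLICES[pid].extend(rows)


def run_partition(pid):
    load(pid, [transform(line) for line in extract(pid)])


with ThreadPoolExecutor() as pool:
    # list() waits for every partition and re-raises any worker exception.
    list(pool.map(run_partition, PARTITIONS))

table = [row for pid in sorted(TABLE_SLICES) for row in TABLE_SLICES[pid]]
print(table)
# [('alice', 'HELLO WORLD', 11), ('bob', 'KAFKA ROCKS', 11), ('carol', 'EXACTLY ONCE', 12)]
```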


MemSQL Pipelines Architecture: Kafka Example

Diagram components: Kafka broker; MemSQL master (Pipelines); MemSQL nodes (Pipelines). Labeled flows: 1. Extract, 2. Transform, 3. Load; data reshuffle; metadata query.
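A rough sketch of the two coordination steps named in the diagram, with every policy an assumption: the master's metadata query is reduced to a known partition count, partitions are assigned to nodes round-robin, and rows are reshuffled to the node that owns their shard key using `zlib.crc32`. The node names and helpers are hypothetical; MemSQL's actual partition assignment and shard-key hashing may differ.

```python
import zlib

NODES = ["leaf-1", "leaf-2", "leaf-3"]  # hypothetical MemSQL leaf nodes


def assign_partitions(num_partitions, nodes):
    """Assumed policy: the master learns the partition count from a metadata
    query, then spreads Kafka partitions across nodes round-robin."""
    return {p: nodes[p % len(nodes)] for p in range(num_partitions)}


def owner(shard_key, nodes):
    """Pick the node that stores a row. crc32 is used because it is stable
    across processes, unlike Python's randomized str hash()."""
    return nodes[zlib.crc32(shard_key.encode("utf-8")) % len(nodes)]


def reshuffle(extracted, nodes):
    """Route each (key, value) row from the node that extracted it to the
    node that owns its shard key."""
    storage = {n: [] for n in nodes}
    for rows in extracted.values():
        for key, value in rows:
            storage[owner(key, nodes)].append((key, value))
    return storage


assignment = assign_partitions(4, NODES)
# Each node extracts the partitions assigned to it (toy data: one row per partition).
extracted = {node: [] for node in NODES}
for partition, node in assignment.items():
    extracted[node].append((f"user-{partition}", f"tweet from partition {partition}"))

storage = reshuffle(extracted, NODES)
print(assignment)  # {0: 'leaf-1', 1: 'leaf-2', 2: 'leaf-3', 3: 'leaf-1'}
for node, rows in storage.items():
    print(node, rows)
```

The extracting node and the storing node are decoupled: a partition's rows can be read anywhere, but each row lands on exactly one owner, which is what makes the reshuffle step necessary.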




MemSQL Pipelines Demo


Demo Architecture

Diagram components: Twitter, Kafka, Pipelines, MemSQL; SQL insights; visualization dashboard.


THANK YOU!

www.memsql.com/download

Data & Analytics
