INTRODUCING: CREATE PIPELINE

Exactly-once semantics with Kafka and MemSQL Pipelines

OCTOBER 2016

Gary Orenstein, CMO
Steven Camiña, Product Manager

How important are real-time initiatives to your organization?

• Important: 32%
• Very Important: 29%
• Somewhat Important: 22%
• Critical: 14%
• Not Important: 3%

Source: MemSQL State of Real Time Survey 2016, n=125

Why are you pursuing real-time initiatives?

• Deliver real-time dashboards: 52%
• Improve customer experience: 52%
• Power new applications (Example: IoT data processing): 43%
• Increase efficiency and reduce cost: 36%
• Drive new revenue opportunities: 29%
• Other: 4%

Source: MemSQL State of Real Time Survey 2016, n=125

Want real-time dashboards and better customer experiences?

Start with fresh data

A Modern Approach to Real-Time Data

 Streaming           Database         Data Warehouse

Streaming, database, and data warehouse workloads

Database

Data Warehouse

Streaming

Real-time pipelines, OLTP, and OLAP

• High Volume Transactions (OLTP)
• Fast, Scalable SQL Analytics (OLAP)
• Real-time Pipelines

The Database Platform For Real-Time Analytics

Real-Time
• Fast Ingest
• Low Latency Queries
• High Concurrency

Scalable
• Horizontal Scale
• On-premises or Cloud
• Highly Available and Secure

Proven
• ANSI SQL
• Transactional and relational
• Universal Ecosystem

Today’s Streaming Discussion

Web logs, Mobile apps, IoT, Sensors

The Enterprise Opportunity – Past and Future

Web logs, Mobile apps, IoT, Sensors

Take Existing Batch Processes Real-Time

Nothing Closer To Real Time Than Streaming

Let’s look at the leading edge
Apache Kafka Messaging Semantics

• At most once
• At least once
• Exactly once

At most once

[Figure: at-most-once delivery]

At least once

[Figure: at-least-once delivery]

Exactly-Once

[Figure: exactly-once delivery]

Understanding Streaming Semantics

At most once
• Message pulled once
• May or may not be received
• No duplicates
• Possible missing data

Understanding Streaming Semantics

At most once
• Message pulled once
• May or may not be received
• No duplicates
• Possible missing data

At least once
• Message pulled one or more times; processed each time
• Receipt guaranteed
• Likely duplicates
• No missing data

Understanding Streaming Semantics

At most once
• Message pulled once
• May or may not be received
• No duplicates
• Possible missing data

At least once
• Message pulled one or more times; processed each time
• Receipt guaranteed
• Likely duplicates
• No missing data

Exactly-once
• Message pulled one or more times; processed once
• Receipt guaranteed
• No duplicates
• No missing data
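
The three semantics above can be sketched as a toy simulation. This is not Kafka's or MemSQL's API: the message list, the single simulated crash, and the `consume` function are all assumptions made for illustration. The only difference between the modes is where the offset commit sits relative to the write.

```python
# Toy model of the three delivery semantics.
# Assumption: one consumer, one crash while handling offset 2.

MESSAGES = ["a", "b", "c", "d"]
CRASH_AT = 2  # the consumer crashes once, while handling "c"


def consume(mode):
    """Return the sink contents after consuming MESSAGES with one crash."""
    sink = []        # destination store
    committed = 0    # last durable consumer offset
    crashed = False  # make sure the crash happens only once

    while committed < len(MESSAGES):
        offset = committed  # after a restart, resume from the committed offset
        try:
            while offset < len(MESSAGES):
                crash_now = offset == CRASH_AT and not crashed
                if mode == "at-most-once":
                    committed = offset + 1         # commit first ...
                    if crash_now:
                        crashed = True
                        raise RuntimeError("crash before processing")
                    sink.append(MESSAGES[offset])  # ... then process
                elif mode == "at-least-once":
                    sink.append(MESSAGES[offset])  # process first ...
                    if crash_now:
                        crashed = True
                        raise RuntimeError("crash before commit")
                    committed = offset + 1         # ... then commit
                elif mode == "exactly-once":
                    # Write and offset advance form one atomic step, so a
                    # crash can fall before or after it, never in between.
                    if crash_now:
                        crashed = True
                        raise RuntimeError("crash; nothing applied")
                    sink.append(MESSAGES[offset])
                    committed = offset + 1
                else:
                    raise ValueError(mode)
                offset += 1
        except RuntimeError:
            continue  # restart the consumer
    return sink


print(consume("at-most-once"))   # ['a', 'b', 'd']
print(consume("at-least-once"))  # ['a', 'b', 'c', 'c', 'd']
print(consume("exactly-once"))   # ['a', 'b', 'c', 'd']
```

At most once loses "c", at least once stores it twice, and exactly once stores it a single time.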

MemSQL Pipelines

CREATE TABLE

CREATE PIPELINE

Introducing MemSQL Pipelines

• CREATE PIPELINE is a database construct that enables data ingestion with exactly-once semantics
    • MemSQL stores the Kafka offset in a table
    • Exactly-once delivery facilitated by co-locating data and offsets
• Extract, transform, and load external data natively
• Fully distributed workloads
• User-defined transformations
• Scalable, highly performant, online ALTER TABLE and ALTER PIPELINE
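
Why co-locating data and offsets gives exactly-once delivery can be sketched with any transactional database. The example below uses Python's built-in `sqlite3` as a stand-in, not MemSQL. The `events` and `offsets` tables, the `PARTITION` list, and `load_batch` are hypothetical names, and the deck does not show how MemSQL implements this internally. The idea is only that the rows and the new offset commit or roll back together.

```python
import sqlite3

# Hypothetical stand-in for one Kafka partition: a list indexed by offset.
PARTITION = ["m0", "m1", "m2", "m3", "m4"]


def setup():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (body TEXT)")
    db.execute(
        "CREATE TABLE offsets (part INTEGER PRIMARY KEY, next_offset INTEGER)"
    )
    db.execute("INSERT INTO offsets VALUES (0, 0)")
    db.commit()
    return db


def load_batch(db, batch_size, fail=False):
    """Insert one batch and advance the offset in a single transaction."""
    (start,) = db.execute(
        "SELECT next_offset FROM offsets WHERE part = 0"
    ).fetchone()
    batch = PARTITION[start:start + batch_size]
    try:
        with db:  # commits on success, rolls back on exception
            db.executemany(
                "INSERT INTO events VALUES (?)", [(m,) for m in batch]
            )
            if fail:
                raise RuntimeError("simulated crash mid-batch")
            db.execute(
                "UPDATE offsets SET next_offset = ? WHERE part = 0",
                (start + len(batch),),
            )
    except RuntimeError:
        pass  # rows and offset were rolled back together


db = setup()
load_batch(db, 2)             # m0, m1 stored; offset becomes 2
load_batch(db, 2, fail=True)  # crash: m2, m3 rolled back; offset stays 2
load_batch(db, 2)             # retry stores m2, m3 once
load_batch(db, 2)             # m4

rows = [r[0] for r in db.execute("SELECT body FROM events ORDER BY rowid")]
(next_offset,) = db.execute("SELECT next_offset FROM offsets").fetchone()
print(rows, next_offset)  # ['m0', 'm1', 'm2', 'm3', 'm4'] 5
```

Because the failed batch leaves no rows behind and does not move the offset, the retry re-reads the same messages without creating duplicates.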

MemSQL Pipelines Sequence

1. Extract from data sources
2. Transform extracted data
3. Load transformed data into Database tables in parallel

[Diagram: Data Sources, Pipelines (1. Extract, 2. Transform extracted data, 3. Load into Database tables), MemSQL]
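
The three-step sequence can be sketched in a few lines of Python. This is a minimal illustration, not MemSQL code: `SOURCES`, `extract`, `transform`, and `run_pipeline` are hypothetical names, a list stands in for the database table, and threads stand in for parallel loading.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data sources: each holds raw "id,name" records.
SOURCES = {
    "source_0": ["1,alice", "2,bob"],
    "source_1": ["3,carol"],
}


def extract(name):
    """Step 1: pull raw records from one data source."""
    return list(SOURCES[name])


def transform(line):
    """Step 2: user-defined transformation from a raw record to a row."""
    ident, name = line.split(",")
    return int(ident), name.upper()


def run_pipeline(sources, transform_fn):
    table = []               # stand-in for the destination table
    lock = threading.Lock()  # guards concurrent appends to the table

    def one_source(name):
        rows = [transform_fn(r) for r in extract(name)]
        with lock:           # Step 3: load, one worker per source
            table.extend(rows)
        return len(rows)

    with ThreadPoolExecutor() as pool:
        loaded = sum(pool.map(one_source, sources))
    return sorted(table), loaded


table, loaded = run_pipeline(SOURCES, transform)
print(table, loaded)  # [(1, 'ALICE'), (2, 'BOB'), (3, 'CAROL')] 3
```

Passing `transform_fn` as an argument mirrors the idea of a user-defined transformation: the extract and load steps stay the same while the middle step is swappable.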

MemSQL Pipelines Architecture: Kafka Example

[Diagram: three Kafka Brokers, each paired with a MemSQL Node running Pipelines (1. Extract, 2. Transform, 3. Load); a MemSQL Master running Pipelines; labels: Metadata query, Data reshuffle]
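
One possible reading of the diagram labels, sketched in Python: a master discovers the topic's partitions (the metadata query), each node extracts from the partitions assigned to it, and rows are routed to the node that owns their key (the data reshuffle). The slide does not spell out these mechanics, so the round-robin assignment, the CRC32 routing, and every name below are assumptions for illustration only.

```python
import zlib

# Hypothetical topic: partition id -> list of (key, value) messages.
TOPIC = {
    0: [("u1", "a"), ("u2", "b")],
    1: [("u3", "c"), ("u1", "d")],
    2: [("u2", "e")],
}
NODES = ["node_0", "node_1", "node_2"]


def metadata_query(topic):
    """Master step: discover which partitions exist."""
    return sorted(topic)


def assign(partitions, nodes):
    """Master step: spread partitions across nodes round-robin."""
    plan = {n: [] for n in nodes}
    for i, p in enumerate(partitions):
        plan[nodes[i % len(nodes)]].append(p)
    return plan


def owner(key, nodes):
    """Pick the node that stores a key, using a stable hash."""
    return nodes[zlib.crc32(key.encode()) % len(nodes)]


def ingest(topic, nodes):
    plan = assign(metadata_query(topic), nodes)
    stored = {n: [] for n in nodes}
    for node, parts in plan.items():     # in a real cluster, nodes run in parallel
        for p in parts:
            for key, value in topic[p]:  # extract from the assigned partition
                # Reshuffle: send the row to the node that owns its key.
                stored[owner(key, nodes)].append((key, value))
    return plan, stored


plan, stored = ingest(TOPIC, NODES)
print(plan)  # {'node_0': [0], 'node_1': [1], 'node_2': [2]}
```

With three partitions and three nodes, each node reads exactly one partition, and all rows sharing a key end up on the same node regardless of which partition they arrived on.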

MemSQL Pipelines Demo

Demo Architecture

[Diagram labels: Twitter, Kafka, Pipelines, MemSQL, SQL Insights, Visualization Dashboard]

THANK YOU!

www.memsql.com/download