INTRODUCING: CREATE PIPELINE
Exactly-once semantics with Kafka and MemSQL Pipelines
OCTOBER 2016
Gary Orenstein, CMO
Steven Camiña, Product Manager
How important are real-time initiatives to your organization?
• Important: 32%
• Very Important: 29%
• Somewhat Important: 22%
• Critical: 14%
• Not Important: 3%
Source: MemSQL State of Real Time Survey 2016, n=125
Why are you pursuing real-time initiatives?
• Deliver real-time dashboards: 52%
• Improve customer experience: 52%
• Power new applications (example: IoT data processing): 43%
• Increase efficiency and reduce cost: 36%
• Drive new revenue opportunities: 29%
• Other: 4%
Source: MemSQL State of Real Time Survey 2016, n=125
Want real-time dashboards and better customer experiences?
Start with fresh data
A Modern Approach to Real-Time Data
Streaming, Database, Data Warehouse
Streaming, database, and data warehouse workloads
Database
Data Warehouse
Streaming
Real-time pipelines, OLTP, and OLAP
High Volume Transactions (OLTP)
Fast, Scalable SQL Analytics (OLAP)
Real-time Pipelines
The Database Platform For Real-Time Analytics
Real-Time
• Fast Ingest
• Low Latency Queries
• High Concurrency
Scalable
• Horizontal Scale
• On-premises or Cloud
• Highly Available and Secure
Proven
• ANSI SQL
• Transactional and relational
• Universal Ecosystem
Today’s Streaming Discussion
Web logs, Mobile apps, IoT, Sensors
The Enterprise Opportunity – Past and Future
Web logs, Mobile apps, IoT, Sensors
Take Existing Batch Processes Real-Time
Nothing Closer To Real Time Than Streaming
Let’s look at the leading edge
Apache Kafka Messaging Semantics
• At most once
• At least once
• Exactly once
At most once
At least once
Exactly-Once
Understanding Streaming Semantics
At most once
• Message pulled once
• May or may not be received
• No duplicates
• Possible missing data
Understanding Streaming Semantics
At most once
• Message pulled once
• May or may not be received
• No duplicates
• Possible missing data
At least once
• Message pulled one or more times; processed each time
• Receipt guaranteed
• Likely duplicates
• No missing data
Understanding Streaming Semantics
At most once
• Message pulled once
• May or may not be received
• No duplicates
• Possible missing data
At least once
• Message pulled one or more times; processed each time
• Receipt guaranteed
• Likely duplicates
• No missing data
Exactly-once
• Message pulled one or more times; processed once
• Receipt guaranteed
• No duplicates
• No missing data
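The three semantics can be reproduced with a toy consumer that differs only in when it commits its read offset relative to processing a message. The following is a minimal sketch, not Kafka client code: `consume`, `crash_at`, and the in-memory `sink` are hypothetical names introduced for illustration, and a "crash" is simulated as a single restart from the last committed offset.

```python
def consume(messages, mode, crash_at):
    """Replay `messages` through one simulated crash; return what the sink received."""
    sink = []          # records that reached the destination
    committed = 0      # offset the consumer resumes from after a restart
    crashed = False    # the simulated crash fires only once

    while committed < len(messages):
        offset = committed
        msg = messages[offset]
        crash_now = (offset == crash_at and not crashed)

        if mode == "at_most_once":
            committed = offset + 1       # commit first ...
            if crash_now:
                crashed = True
                continue                 # ... crash before processing: message lost
            sink.append(msg)
        elif mode == "at_least_once":
            sink.append(msg)             # process first ...
            if crash_now:
                crashed = True
                continue                 # ... crash before commit: message re-read
            committed = offset + 1
        elif mode == "exactly_once":
            if crash_now:
                crashed = True
                continue                 # crash undoes both the write and the commit
            sink.append(msg)             # write and commit succeed or fail together
            committed = offset + 1
        else:
            raise ValueError(f"unknown mode: {mode}")
    return sink


print(consume(["a", "b", "c"], "at_most_once", crash_at=1))   # ['a', 'c']
print(consume(["a", "b", "c"], "at_least_once", crash_at=1))  # ['a', 'b', 'b', 'c']
print(consume(["a", "b", "c"], "exactly_once", crash_at=1))   # ['a', 'b', 'c']
```

With a crash while handling message `b`, at most once drops it, at least once delivers it twice, and exactly-once delivers it a single time.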
MemSQL Pipelines
CREATE TABLE
CREATE PIPELINE
Introducing MemSQL Pipelines
CREATE PIPELINE is a database construct that enables data ingestion with exactly-once semantics
• MemSQL stores the Kafka offset in a table
• Exactly-once delivery facilitated by co-locating data and offsets
Extract, transform, and load external data natively
Fully distributed workloads
User-defined transformations
Scalable, highly performant, online ALTER TABLE and ALTER PIPELINE
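Co-locating data and offsets gives exactly-once delivery because the rows and the new offset can be written in a single transaction: either both are committed or neither is. The sketch below illustrates that idea with SQLite standing in for the database. It is an assumption-laden illustration, not MemSQL's implementation; the table names `events` and `pipeline_offsets` and the helpers `open_store` and `load_batch` are hypothetical.

```python
import sqlite3


def open_store():
    """Create an in-memory database holding both the data and the offsets."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (payload TEXT)")
    conn.execute(
        "CREATE TABLE pipeline_offsets ("
        "partition_id INTEGER PRIMARY KEY, next_offset INTEGER)"
    )
    conn.commit()
    return conn


def load_batch(conn, partition_id, start_offset, batch, fail=False):
    """Insert a batch and advance the stored offset in ONE transaction.

    Returns True if the batch was committed, False if it was skipped or rolled back.
    """
    row = conn.execute(
        "SELECT next_offset FROM pipeline_offsets WHERE partition_id = ?",
        (partition_id,),
    ).fetchone()
    expected = row[0] if row else 0
    if start_offset != expected:
        return False  # batch was already loaded (a replay): skip it

    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO events (payload) VALUES (?)", [(m,) for m in batch]
            )
            if fail:
                raise RuntimeError("simulated crash before commit")
            conn.execute(
                "INSERT OR REPLACE INTO pipeline_offsets VALUES (?, ?)",
                (partition_id, start_offset + len(batch)),
            )
    except RuntimeError:
        return False  # rows and offset were rolled back together
    return True


conn = open_store()
load_batch(conn, 0, 0, ["a", "b"])          # committed
load_batch(conn, 0, 0, ["a", "b"])          # replay of the same batch: skipped
load_batch(conn, 0, 2, ["c"], fail=True)    # crash mid-transaction: rolled back
load_batch(conn, 0, 2, ["c"])               # retry: committed
```

After these four calls the `events` table holds `a`, `b`, `c` exactly once each, even though one batch was replayed and one attempt crashed.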
MemSQL Pipelines Sequence
1. Extract from data sources
2. Transform extracted data
3. Load transformed data into database tables in parallel
Diagram: Data Sources, Pipelines (1. Extract, 2. Transform, 3. Load), MemSQL
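The three-step sequence can be sketched as a chain of small functions, each consuming the output of the previous one. This is only an illustration of the extract, transform, load ordering: the `id,text` record format and the list used as a destination table are assumptions, and the parallel load is not modeled here.

```python
def extract(source):
    """Step 1: pull raw records from a data source (here, any iterable of strings)."""
    for raw in source:
        yield raw


def transform(records):
    """Step 2: reshape each raw 'id,text' record into an (id, text) row."""
    for raw in records:
        record_id, text = raw.split(",", 1)
        yield int(record_id), text.strip()


def load(rows, table):
    """Step 3: append transformed rows to the destination table; return the row count."""
    count = 0
    for row in rows:
        table.append(row)
        count += 1
    return count


table = []
loaded = load(transform(extract(["1, hello", "2, world"])), table)
print(loaded, table)  # 2 [(1, 'hello'), (2, 'world')]
```

Because each step is a generator, records stream through all three stages one at a time rather than being staged in full between steps.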
MemSQL Pipelines Architecture: Kafka Example
Diagram: three Kafka Brokers, each paired with a MemSQL Node running Pipelines (1. Extract, 2. Transform, 3. Load); a MemSQL Master running Pipelines; labels: Metadata query, Data reshuffle
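The architecture above suggests that a coordinator learns which partitions exist and that each node then runs extract, transform, and load on its own share in parallel. The sketch below models only that division of work. It rests on several assumptions: round-robin assignment is a stand-in for whatever placement MemSQL actually uses, the data reshuffle step is omitted, and `assign_partitions`, `run_node`, and `run_pipeline` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_partitions(partition_ids, node_names):
    """Spread partitions across nodes round-robin (a simplified 'metadata query' result)."""
    plan = {name: [] for name in node_names}
    for i, pid in enumerate(sorted(partition_ids)):
        plan[node_names[i % len(node_names)]].append(pid)
    return plan


def run_node(partitions, topic):
    """One node extracts, transforms, and loads only its own partitions."""
    rows = []
    for pid in partitions:
        for raw in topic[pid]:               # 1. extract from the partition
            rows.append((pid, raw.upper()))  # 2. transform (placeholder: uppercase)
    return rows                              # 3. load: rows handed back to the caller


def run_pipeline(topic, node_names):
    """Run every node concurrently and gather the loaded rows."""
    plan = assign_partitions(topic.keys(), node_names)
    with ThreadPoolExecutor(max_workers=len(node_names)) as pool:
        futures = [pool.submit(run_node, plan[name], topic) for name in node_names]
        results = [f.result() for f in futures]
    return plan, sorted(row for rows in results for row in rows)


topic = {0: ["a", "b"], 1: ["c"], 2: ["d"]}   # partition id -> messages
plan, rows = run_pipeline(topic, ["leaf1", "leaf2"])
print(plan)  # {'leaf1': [0, 2], 'leaf2': [1]}
print(rows)  # [(0, 'A'), (0, 'B'), (1, 'C'), (2, 'D')]
```

Each partition is owned by exactly one node, so no message is read by two workers and adding nodes spreads the same partitions over more parallel loaders.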
MemSQL Pipelines Demo
Demo Architecture
Diagram: Twitter, Kafka, Pipelines, MemSQL, SQL Insights, Visualization Dashboard
THANK YOU!
www.memsql.com/download