Real-Time Data Pipelines
Nikita Shamgunov, MemSQL CTO and co-founder
February 17, 2016

PowerStream Demo



MemSQL Confidential

Designed for Modern Hardware, Trends, and Workloads

Scalable SQL; In-Memory and Solid-State; Distributed; Datacenter or Cloud
Multi-mode: OLTP, OLAP, HTAP
Multi-model: ANSI SQL, Key-value, Document/JSON, Geospatial
In-Memory rowstore
Solid-state columnstore
Stream directly to rowstore or columnstore
Distributed query optimizer and execution
Scale-out on commodity hardware
Deploy on-premises
Cloud agnostic: Amazon, Microsoft, Google, Digital Ocean

Simple, Real-Time, Affordable, Flexible

SSD


Creating Real-Time Pipelines with Streamliner

One-click deployment of integrated Apache Spark
Create real-time data pipelines through a graphical UI
Eliminate batch ETL
Open sourced on GitHub at memsql.github.io/spark-streamliner

[Diagram labels: Real-Time Inputs; Streamliner on Apache Spark (Extract, Transform, Load); Real-Time Application]
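
The extract, transform, and load stages named on this slide can be illustrated with a minimal sketch. It is plain Python rather than Spark, and it does not use Streamliner's actual API: the function names, the JSON input format, the `id` and `value` fields, and the in-memory list standing in for a database table are all assumptions made for illustration.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def extract(raw_lines: Iterable[str]) -> Iterator[dict]:
    """Extract: parse each raw line as JSON, skipping malformed input."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # drop lines that are not valid JSON
        if isinstance(record, dict):
            yield record


def transform(records: Iterable[dict]) -> Iterator[Tuple[object, float]]:
    """Transform: keep records with a numeric `value` and shape them as rows."""
    for record in records:
        value = record.get("value")
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            yield (record.get("id"), float(value))


def load(rows: Iterable[Tuple[object, float]], table: List[tuple]) -> int:
    """Load: append rows to the target table (a list stands in for a database)."""
    batch = list(rows)
    table.extend(batch)
    return len(batch)


def run_batch(raw_lines: Iterable[str], table: List[tuple]) -> int:
    """Run one micro-batch through extract, transform, and load."""
    return load(transform(extract(raw_lines)), table)
```

Calling `run_batch` once per incoming micro-batch, instead of once per night, is the sense in which a streaming pipeline replaces batch ETL.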


MemSQL in Energy

Real-Time Scoring for Predictive Applications

Sensor reading and predictive model score appear simultaneously in database table

[Diagram labels: Input (User Jar, SAS Generated PMML); Industrial Equipment; Sensor Data; Streamliner; table columns S1, S2, S3 (sensors) and P1, P2, P3 (predictive models)]
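
One way to make a sensor reading and its model score appear in the table at the same time is to compute the score before the insert and write both in a single row. The sketch below assumes a hypothetical linear model in place of a PMML-defined one, a hypothetical `readings` table, and SQLite as a stand-in for the database; none of these come from the slides.

```python
import sqlite3

# Hypothetical model coefficients; a real deployment would load the model
# definition (for example from PMML) instead of hard-coding weights.
WEIGHTS = (0.5, 0.25, 0.25)
BIAS = 0.0


def score(s1: float, s2: float, s3: float) -> float:
    """Score one reading with the illustrative linear model."""
    return BIAS + WEIGHTS[0] * s1 + WEIGHTS[1] * s2 + WEIGHTS[2] * s3


def make_table(conn: sqlite3.Connection) -> None:
    """Create a table holding sensor columns and the prediction column."""
    conn.execute(
        "CREATE TABLE readings ("
        "sensor_id TEXT, s1 REAL, s2 REAL, s3 REAL, p1 REAL)"
    )


def ingest(conn: sqlite3.Connection, sensor_id: str,
           s1: float, s2: float, s3: float) -> float:
    """Score the reading, then insert reading and score as one row."""
    p1 = score(s1, s2, s3)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO readings VALUES (?, ?, ?, ?, ?)",
            (sensor_id, s1, s2, s3, p1),
        )
    return p1
```

Because the score is part of the same row as the reading, a query never sees a reading without its prediction.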

Internet-of-Things simulation depicting health of wind turbines globally.

7 machines (AWS c4.2xlarge instances) at $0.311 per hour per machine; annual cost approximately $19,000.
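
The annual figure follows directly from the hourly rate. As a quick check, assuming the machines run continuously for a 365-day year (the helper name and the always-on assumption are illustrative, not from the slides):

```python
MACHINES = 7
HOURLY_RATE_USD = 0.311     # per machine, as quoted on the slide
HOURS_PER_YEAR = 24 * 365   # assumes always-on, non-leap year


def annual_cost(machines: int, hourly_rate: float,
                hours: int = HOURS_PER_YEAR) -> float:
    """Total yearly cost for a fleet billed per machine-hour."""
    return machines * hourly_rate * hours


print(f"${annual_cost(MACHINES, HOURLY_RATE_USD):,.2f}")
```

This comes to roughly $19,070, consistent with the rounded ~$19,000 on the slide.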

MemSQL PowerStream

197,000 wind turbines around the world

[Diagram labels: Sensors; Wind Turbine; Wind Farm]

MemSQL PowerStream Architecture

[Diagram labels: Data Producers (simulating sensor activity); Apache Spark with Streamliner; MemSQL PowerStream; PowerStream User Interface]

Thank You
www.memsql.com