Stream Processing in “Big Data” world Jags Ramnarayan, Chief Architect, GemFire Pivotal Milind Bhandarkar , Chief Scientist, Pivotal Sunday, September 22, 2013

Stream Processing in “Big Data” world - HPTS · 2013-10-09


Stream Processing in “Big Data” world

Jags Ramnarayan, Chief Architect, GemFire, Pivotal

Milind Bhandarkar, Chief Scientist, Pivotal

Sunday, September 22, 2013


Hype Curve

Prediction: Hadoop Will Avoid Hype Curve by Being Flexible....


...Instead, Hype Curve will Apply to Individual Hadoop Components


Hadoop, the Project...

• Core
• YARN
• HDFS
• MapReduce


...vs Hadoop, the Product

Pivotal HD Enterprise

Apache
• HDFS
• HBase
• Pig, Hive, Mahout
• MapReduce
• Sqoop, Flume
• Resource Management & Workflow: YARN, ZooKeeper

Pivotal HD Added Value
• Command Center: Configure, Deploy, Monitor, Manage
• Hadoop Virtualization (HVE)
• Data Loader

HAWQ – Advanced Database Services
• Xtension Framework
• Catalog Services
• Query Optimizer
• Dynamic Pipelining
• ANSI SQL + Analytics


e.g. MapReduce

MapReduce: Fault Tolerance, Scalability, & Flexibility at the cost of Performance


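Performance Impact of MapReduce

(Column labels and units did not survive extraction; the last column equals the second value divided by the first.)

Workload            Value 1   Value 2   Ratio
User intelligence   4.2       198       47X
Sales analysis      8.7       161       19X
Click analysis      2.0       415       208X
Data exploration    2.7       1,285     476X
BI drill down       2.8       1,815     648X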
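The multipliers on this slide appear to be the second value divided by the first, rounded to the nearest whole number. A quick check in Python, using only the figures shown; the names `first` and `second` are placeholders, since the slide's column labels are not preserved:

```python
# Figures as shown on the slide. "first" and "second" are placeholder
# names; the original column labels and units are not preserved.
timings = {
    "User intelligence": (4.2, 198),
    "Sales analysis": (8.7, 161),
    "Click analysis": (2.0, 415),
    "Data exploration": (2.7, 1285),
    "BI drill down": (2.8, 1815),
}


def ratio(first, second):
    """Return second / first, rounded half up to a whole multiple."""
    return int(second / first + 0.5)


ratios = {name: ratio(first, second) for name, (first, second) in timings.items()}
for name, value in ratios.items():
    print(f"{name}: {value}X")
```

The computed values agree with the 47X, 19X, 208X, 476X, and 648X figures on the slide.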


Rise of Fast OLAP-On-Hadoop

• Pivotal HAWQ (aka Greenplum DB on Hadoop)

• Cloudera Impala

• Hortonworks Stinger (Hive over Tez)

• Drill, BigSQL, PolyBase, Optiq/Lingual

• More to come... (and go, such as Spire...)


GemFire XD: Bringing OLTP/Operational DB to Hadoop


Latency Spectrum

Workload class        Latency
Machine latency       Milliseconds
Human interactions    Seconds
Interactive reports   Seconds, Minutes
Batch processing      Minutes, Hours

Low-latency end: GemFire XD (Online/OLTP/Operational DBs)
High-latency end: Pivotal HD, HAWQ (Analytics, Data Warehousing)


Natural Next Step: Streaming in Hadoop


Large-Scale Stream Processing

• Storm (Backtype/Twitter 2010, Apache Incubator 2013)

• S4 (Yahoo 2010, Apache Incubator 2011)

• Spark Streaming (Berkeley AMPLab, 2012)

• Dempsy (Nokia, 2012)

• MUPD8 (@WalmartLabs, 2012)

• MillWheel (Google, 2013)

• Apache Samza (LinkedIn, Apache Incubator 2013)


Properties - 1

• Decoupling Logical Model from Physical Deployment
    • Partitioning, Replication, Colocation with Distributed Reference Datasets

• Event Delivery Model
    • At Least Once, At Most Once, Exactly Once

• Processor State
    • Stateless, Local State, Distributed State
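To make the delivery models concrete, here is a minimal sketch of one common way to get an exactly-once effect on top of at-least-once delivery: the processor keeps the IDs it has already handled in local state and ignores redeliveries. The class and function names are hypothetical and do not come from any of the systems listed in this talk.

```python
class DedupingProcessor:
    """Counts events per key and ignores event IDs it has already seen."""

    def __init__(self):
        self.seen_ids = set()  # local processor state
        self.counts = {}       # key -> number of distinct events

    def process(self, event_id, key):
        if event_id in self.seen_ids:
            return False       # duplicate delivery, no state change
        self.seen_ids.add(event_id)
        self.counts[key] = self.counts.get(key, 0) + 1
        return True


def deliver_at_least_once(events, processor, redeliver_ids):
    """Deliver every event once, and those in redeliver_ids a second time,
    simulating a retry after a lost acknowledgement."""
    for event_id, key in events:
        processor.process(event_id, key)
        if event_id in redeliver_ids:
            processor.process(event_id, key)


events = [(1, "a"), (2, "b"), (3, "a")]
proc = DedupingProcessor()
deliver_at_least_once(events, proc, redeliver_ids={2, 3})
print(proc.counts)
```

Without the `seen_ids` check, the two redelivered events would be counted twice. At-most-once delivery is the opposite trade: no retries, so no duplicates, but a lost event is never counted.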


Properties - 2

• Flexible Data Model

• Stream Slice
    • Event-at-a-time, Micro-Batch

• Integration with Hadoop
    • Resource Sharing, Persistence on HDFS, Exactly once writes to persistent store

• Intuitive Programming Model
    • Node, Channel, Flow
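The difference between event-at-a-time and micro-batch slicing can be sketched with a small generator. This is an illustration only: it slices by event count, whereas real systems often slice by time interval, and the function name `micro_batches` is hypothetical.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Yield consecutive slices of at most batch_size events from stream."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


events = range(5)
print(list(micro_batches(events, 2)))  # micro-batch slices
print(list(micro_batches(events, 1)))  # event-at-a-time is the batch_size=1 case
```

Larger slices amortize per-batch overhead (scheduling, commits) across more events, while smaller slices lower per-event latency.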


GemStreams

• In-Database Partitioned-Stream Processing

• Integrated Distributed State Management

• Replicated (Slow-Changing) Reference Datasets

• Reliable, Transparent, Exactly-Once persistence to HDFS

• Flexible Event Batching - Single Event or Micro-Batch

• Flexible Data Model - POJOs or Tuples
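The combination of partitioned streams, local state, and replicated reference data can be sketched as follows. This is not the GemStreams API, which the slides do not show; every name here is hypothetical. The idea illustrated is that each partition holds its own copy of a slow-changing reference table, so enrichment is a local lookup rather than a remote call.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    reference: dict                              # replicated, slow-changing lookup table
    totals: dict = field(default_factory=dict)   # state owned by this partition only

    def on_event(self, customer_id, amount):
        # Enrich from the local replica, then update partition-local state.
        region = self.reference.get(customer_id, "unknown")
        self.totals[region] = self.totals.get(region, 0) + amount


reference = {1: "EU", 2: "US", 3: "EU"}
# Every partition receives its own copy of the reference data.
partitions = [Partition(dict(reference)) for _ in range(2)]


def route(customer_id):
    """Pick the partition that owns this key (simple modulo for illustration)."""
    return partitions[customer_id % len(partitions)]


for customer_id, amount in [(1, 10), (2, 5), (3, 7), (1, 3)]:
    route(customer_id).on_event(customer_id, amount)

for index, partition in enumerate(partitions):
    print(index, partition.totals)
```

Replication suits reference data that changes rarely, because each update has to reach every partition.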


Use Case: Trade Matching

Architecture – One hundred foot view

[Architecture diagram: four Trade Feeders publish Trade Feeds, which are routed by TradeID to the Matching Engines and then routed by RiskEntityID to the Aggregation/Alerting Engines, which maintain Risk Entity Positions and raise Risk Entity Alerts.]
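The two routing steps in this architecture can be sketched with stable hash routing: the same key always lands on the same engine, so both sides of a trade meet at one matching engine and all updates for a risk entity meet at one aggregation engine. The engine counts, field names, and `route` helper are assumptions made for illustration.

```python
from zlib import crc32

NUM_MATCHERS = 3      # hypothetical number of matching engines
NUM_AGGREGATORS = 2   # hypothetical number of aggregation/alerting engines


def route(key, num_workers):
    """Stable hash routing: the same key always maps to the same worker."""
    return crc32(str(key).encode("utf-8")) % num_workers


trade_feed = [
    {"trade_id": "T1", "risk_entity_id": "R1"},
    {"trade_id": "T2", "risk_entity_id": "R2"},
    {"trade_id": "T1", "risk_entity_id": "R1"},  # other side of T1
    {"trade_id": "T3", "risk_entity_id": "R1"},
]

matcher_of = {}      # trade_id -> set of matching engines that saw it
aggregator_of = {}   # risk_entity_id -> set of aggregation engines that saw it
for trade in trade_feed:
    matcher = route(trade["trade_id"], NUM_MATCHERS)
    aggregator = route(trade["risk_entity_id"], NUM_AGGREGATORS)
    matcher_of.setdefault(trade["trade_id"], set()).add(matcher)
    aggregator_of.setdefault(trade["risk_entity_id"], set()).add(aggregator)

print(matcher_of)
print(aggregator_of)
```

`crc32` is used instead of Python's built-in `hash` because string hashes are randomized per process, which would break routing across separate feeder processes.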


Event Flow

[Flow diagram. Elements: network, tradeQueue, rawTrades, tradeMatching, unmatchedTrades, matchedTrades, riskEntityPositions, summaryAlerting, and a key/value (k, v) store labeled positions.]
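One plausible reading of this flow, sketched in Python: raw trades enter trade matching, unmatched sides wait in a key/value store, matched trades update per-entity positions, and an alerting step watches those positions. The wiring between elements, the field names, and the `ALERT_LIMIT` threshold are all assumptions, since the diagram's edges are not preserved in the text.

```python
unmatched_trades = {}   # trade_id -> first side seen (key/value state)
positions = {}          # risk_entity_id -> accumulated quantity (key/value state)
ALERT_LIMIT = 100       # hypothetical alert threshold


def trade_matching(raw_trade):
    """Return a matched trade once both sides of a trade_id have arrived."""
    first_side = unmatched_trades.pop(raw_trade["trade_id"], None)
    if first_side is None:
        unmatched_trades[raw_trade["trade_id"]] = raw_trade
        return None
    return {
        "trade_id": raw_trade["trade_id"],
        "risk_entity_id": raw_trade["risk_entity_id"],
        "quantity": raw_trade["quantity"],
    }


def risk_entity_positions(matched):
    """Add the matched quantity to the entity's running position."""
    entity = matched["risk_entity_id"]
    positions[entity] = positions.get(entity, 0) + matched["quantity"]
    return entity, positions[entity]


def summary_alerting(entity, position):
    """Return an alert string when the position exceeds the limit."""
    if abs(position) > ALERT_LIMIT:
        return f"ALERT {entity}: position {position}"
    return None


def run(raw_trades):
    alerts = []
    for raw in raw_trades:
        matched = trade_matching(raw)
        if matched is None:
            continue
        alert = summary_alerting(*risk_entity_positions(matched))
        if alert:
            alerts.append(alert)
    return alerts


raw_trades = [
    {"trade_id": "T1", "risk_entity_id": "R1", "quantity": 60},
    {"trade_id": "T2", "risk_entity_id": "R1", "quantity": 70},
    {"trade_id": "T1", "risk_entity_id": "R1", "quantity": 60},
    {"trade_id": "T2", "risk_entity_id": "R1", "quantity": 70},
    {"trade_id": "T3", "risk_entity_id": "R2", "quantity": 10},
]
alerts = run(raw_trades)
print(alerts)
```

In this run T1 and T2 each match on their second arrival, R1's position reaches 130 and triggers one alert, and T3 stays in `unmatched_trades` waiting for its other side.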
