Stream Processing in “Big Data” world
Jags Ramnarayan, Chief Architect, GemFire, Pivotal
Milind Bhandarkar, Chief Scientist, Pivotal
Sunday, September 22, 2013
Hype Curve
Prediction: Hadoop Will Avoid Hype Curve by Being Flexible...
...Instead, Hype Curve will Apply to Individual Hadoop Components
Hadoop, the Project...
• Core
• YARN
• HDFS
• MapReduce
...vs Hadoop, the Product
• Apache: HDFS, HBase, Pig, Hive, Mahout, MapReduce, Sqoop, Flume
• Apache, Resource Management & Workflow: YARN, ZooKeeper
• Pivotal HD Added Value, Pivotal HD Enterprise: Command Center (Configure, Deploy, Monitor, Manage), Hadoop Virtualization (HVE), Data Loader
• Pivotal HD Added Value, HAWQ (Advanced Database Services): Xtension Framework, Catalog Services, Query Optimizer, Dynamic Pipelining, ANSI SQL + Analytics
e.g. MapReduce
MapReduce: Fault Tolerance, Scalability, & Flexibility at the Cost of Performance
Performance Impact of MapReduce
• User intelligence: 4.2 vs. 198 (47X)
• Sales analysis: 8.7 vs. 161 (19X)
• Click analysis: 2.0 vs. 415 (208X)
• Data exploration: 2.7 vs. 1,285 (476X)
• BI drill down: 2.8 vs. 1,815 (648X)
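The multipliers on this slide can be checked by dividing the larger figure in each row by the smaller one and rounding to a whole number. The sketch below does only that arithmetic. The names `MEASUREMENTS`, `speedup`, and `RATIOS` are illustrative, and because the slide text does not state units or which system produced which figure, the two values are left unlabeled.

```python
import math

# (workload, smaller figure, larger figure) as printed on the slide.
# Units and system names are not stated in the text, so none are assumed here.
MEASUREMENTS = [
    ("User intelligence", 4.2, 198),
    ("Sales analysis", 8.7, 161),
    ("Click analysis", 2.0, 415),
    ("Data exploration", 2.7, 1285),
    ("BI drill down", 2.8, 1815),
]


def speedup(smaller: float, larger: float) -> int:
    """Ratio of the two figures, rounded half up to a whole multiple."""
    return math.floor(larger / smaller + 0.5)


# Workload name -> whole-number ratio between the two figures.
RATIOS = {name: speedup(small, large) for name, small, large in MEASUREMENTS}

if __name__ == "__main__":
    for name, ratio in RATIOS.items():
        print(f"{name}: {ratio}X")
```

Each computed ratio matches the multiplier listed for the corresponding workload.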
Rise of Fast OLAP-On-Hadoop
• Pivotal HAWQ (aka Greenplum DB on Hadoop)
• Cloudera Impala
• Hortonworks Stinger (Hive over Tez)
• Drill, BigSQL, PolyBase, Optiq/Lingual
• More to come... (and go, such as Spire...)
GemFire XD: Bringing OLTP/Operational DB to Hadoop
Latency Spectrum
• Machine latency: milliseconds
• Human interactions: seconds
• Interactive reports: seconds, minutes
• Batch processing: minutes, hours
Systems: GemFire XD, Online/OLTP/Operational DBs; Analytics, Data Warehousing; Pivotal HD; HAWQ
Natural Next Step: Streaming in Hadoop
Large-Scale Stream Processing
• Storm (BackType/Twitter 2010, Apache Incubator 2013)
• S4 (Yahoo 2010, Apache Incubator 2011)
• Spark Streaming (Berkeley AMPLab, 2012)
• Dempsy (Nokia, 2012)
• MUPD8 (@WalmartLabs, 2012)
• MillWheel (Google, 2013)
• Apache Samza (LinkedIn, Apache Incubator 2013)
Properties - 1
• Decoupling Logical Model from Physical Deployment
  • Partitioning, Replication, Colocation with Distributed Reference Datasets
• Event Delivery Model
  • At Least Once, At Most Once, Exactly Once
• Processor State
  • Stateless, Local State, Distributed State
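The difference between at-least-once delivery and an exactly-once effect can be illustrated with a small simulation. This is a minimal sketch, not how any of the systems listed earlier implement their guarantees: `CountingProcessor`, `deliver_at_least_once`, and the event-ID scheme are all hypothetical. It assumes every event carries a unique ID and that the processor keeps local state recording the IDs it has already handled, which is one common way to make redelivery harmless.

```python
from dataclasses import dataclass, field


@dataclass
class CountingProcessor:
    """Counts events per key; optionally remembers processed event IDs."""
    deduplicate: bool
    counts: dict = field(default_factory=dict)
    seen_ids: set = field(default_factory=set)

    def on_event(self, event_id: int, key: str) -> None:
        if self.deduplicate:
            if event_id in self.seen_ids:
                return  # redelivered event: already applied, so ignore it
            self.seen_ids.add(event_id)
        self.counts[key] = self.counts.get(key, 0) + 1


def deliver_at_least_once(events, processor, redeliver_ids):
    """Deliver every event once, and the events in redeliver_ids twice,
    simulating a retry after a lost acknowledgement."""
    for event_id, key in events:
        processor.on_event(event_id, key)
        if event_id in redeliver_ids:
            processor.on_event(event_id, key)


EVENTS = [(1, "A"), (2, "B"), (3, "A")]

# Without deduplication the retried event is counted twice.
naive = CountingProcessor(deduplicate=False)
deliver_at_least_once(EVENTS, naive, redeliver_ids={3})

# With deduplication the retry has no additional effect.
idempotent = CountingProcessor(deduplicate=True)
deliver_at_least_once(EVENTS, idempotent, redeliver_ids={3})

if __name__ == "__main__":
    print(naive.counts, idempotent.counts)
```

At-most-once delivery would correspond to never retrying at all, which avoids duplicates but can drop events.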
Properties - 2
• Flexible Data Model
• Stream Slice
  • Event-at-a-time, Micro-Batch
• Integration with Hadoop
  • Resource Sharing, Persistence on HDFS, Exactly once writes to persistent store
• Intuitive Programming Model
  • Node, Channel, Flow
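The two stream-slice options named above, event-at-a-time and micro-batch, can be viewed as the same operation with different slice sizes. The sketch below is an assumption-laden illustration: the function name `slices` and the fixed-count batching rule are hypothetical, and real systems often slice by time window rather than by event count.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def slices(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Cut a stream into consecutive slices of at most batch_size events.

    batch_size == 1 gives event-at-a-time processing;
    batch_size > 1 gives count-based micro-batches.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # stream exhausted
        yield batch


if __name__ == "__main__":
    for batch in slices(range(5), batch_size=2):
        print(batch)
```

The last slice may be shorter than `batch_size` when the stream length is not a multiple of it.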
GemStreams
• In-Database Partitioned-Stream Processing
• Integrated Distributed State Management
• Replicated (Slow-Changing) Reference Datasets
• Reliable, Transparent, Exactly-Once persistence to HDFS
• Flexible Event Batching - Single Event or Micro-Batch
• Flexible Data Model - POJOs or Tuples
Use Case: Trade Matching
Architecture – One hundred foot view
[Figure: architecture diagram. Trade Feeders (x4) -> Trade Feeds (Route by TradeID) -> Matching Engines -> Risk Entity Positions (Route by RiskEntityID) -> Aggregation/Alerting Engines -> Risk Entity Alerts]
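Routing by TradeID and by RiskEntityID means that all events sharing a key are sent to the same engine. A minimal sketch of key-based routing follows. The field names `trade_id` and `risk_entity_id`, the helper names, and the choice of CRC32 as the hash are assumptions for illustration only; the slides do not say how the routing is implemented.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index with a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, key_field: str, num_partitions: int):
    """Group events into partitions so that equal keys always land together."""
    partitions = [[] for _ in range(num_partitions)]
    for event in events:
        index = partition_for(str(event[key_field]), num_partitions)
        partitions[index].append(event)
    return partitions


# Hypothetical trade events; both sides of a trade share a trade_id.
TRADES = [
    {"trade_id": "T1", "risk_entity_id": "R1"},
    {"trade_id": "T2", "risk_entity_id": "R1"},
    {"trade_id": "T1", "risk_entity_id": "R2"},
    {"trade_id": "T2", "risk_entity_id": "R2"},
]

# First hop: by trade ID, so both sides of a trade reach the same matching engine.
by_trade = route(TRADES, "trade_id", num_partitions=4)
# Second hop: by risk entity ID, so one aggregation engine sees a whole position.
by_entity = route(TRADES, "risk_entity_id", num_partitions=4)
```

`zlib.crc32` is used instead of the built-in `hash` because Python randomizes string hashes per process, which would make routing differ between nodes.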
Event Flow
[Figure: event flow diagram. Labels: rawTrades, tradeMatching, unmatchedTrades, matchedTrades, riskEntityPositions, summaryAlerting, network, tradeQueue, positions (key/value entries)]
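One possible reading of this flow is a two-stage pipeline: a matching stage that holds unmatched trades in keyed state until the other side arrives, followed by an alerting stage that keeps a position per risk entity. The sketch below follows that reading only. The matching rule (pair by trade ID), the alert rule (absolute position above a limit), and the field names `trade_id`, `risk_entity_id`, and `amount` are all assumptions, not taken from the slides.

```python
from collections import defaultdict


def trade_matching(raw_trades, unmatched_trades):
    """Pair the two sides of each trade by trade ID (assumed matching rule).

    unmatched_trades is keyed state: trade ID -> the side seen so far.
    """
    for trade in raw_trades:
        other = unmatched_trades.pop(trade["trade_id"], None)
        if other is None:
            unmatched_trades[trade["trade_id"]] = trade  # wait for other side
        else:
            yield (other, trade)  # a matched trade


def summary_alerting(matched_trades, positions, limit):
    """Update per-entity positions and yield an alert when one exceeds limit."""
    for first, second in matched_trades:
        for side in (first, second):
            entity = side["risk_entity_id"]
            positions[entity] += side["amount"]
            if abs(positions[entity]) > limit:
                yield (entity, positions[entity])


# Hypothetical input: two trades, each reported once per counterparty.
RAW_TRADES = [
    {"trade_id": "T1", "risk_entity_id": "R1", "amount": 60},
    {"trade_id": "T2", "risk_entity_id": "R1", "amount": 50},
    {"trade_id": "T1", "risk_entity_id": "R2", "amount": -60},
    {"trade_id": "T2", "risk_entity_id": "R2", "amount": -50},
]

unmatched_trades = {}
positions = defaultdict(int)
matched = trade_matching(RAW_TRADES, unmatched_trades)
alerts = list(summary_alerting(matched, positions, limit=100))

if __name__ == "__main__":
    print(alerts)
```

Both stages are generators, so events move through the pipeline one at a time rather than being collected between stages.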