© 2016 MapR Technologies© 2016 MapR TechnologiesMapR Confidential © 2017 MapR Technologies1
Fast Cars, Big Data: How Streaming Can Help Formula 1
Tugdual Grall (@tgrall)
{“about” : “me”}
Tugdual “Tug” Grall
• MapR: Technical Evangelist
• MongoDB: Technical Evangelist
• Couchbase: Technical Evangelist
• eXo: CTO
• Oracle: Developer/Product Manager, mainly Java/SOA
• Developer in consulting firms
• Web: @tgrall, http://tgrall.github.io, tgrall
• NantesJUG co-founder
• Pet project: http://www.resultri.com
MapR Converged Data Platform
[Diagram: Data Processing layer (Open Source Engines & Tools, Commercial Engines & Applications, Cloud & Managed Services, Search & Others, Custom Apps) on top of Utility-Grade Platform Services (Enterprise Storage: MapR-FS; Database: MapR-DB; Event Streaming: MapR Streams), with Unified Management and Monitoring alongside. Platform properties: Global Namespace, High Availability, Data Protection, Self-healing, Unified Security, Real-time, Multi-tenancy]
Agenda
• What’s the point of data in motorsports?
• Live demo
• Architecture
• What’s next?

How data plays in F1 motorsports
Data in Motorsports
http://f1framework.blogspot.de/2013/08/short-guide-to-f1-telemetry-spa-circuit.html
Race Strategy
F1 Framework - http://f1framework.blogspot.be/2013/08/race-strategy-explained.html
Difference is due to later and sharper braking
Got examples?
• Up to 300 sensors per car
• Up to 2000 channels
• Sensor data is sent to the paddock in 2 ms
• 1.5 billion data points for a race
• 5 billion for a full race weekend
• 5 to 6 Gb of compressed data per car over 90 minutes
US Grand Prix 2014: 243 Tb (race teams combined)
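To get a feel for these volumes, the arithmetic can be sketched in a few lines of Java. This is only a back-of-the-envelope estimate: it assumes the 1.5 billion data points are spread evenly over a 90-minute race, which the slides do not state, and the class and method names are made up for the illustration.

```java
public class TelemetryRate {

    // Average data points per second over a session of the given length.
    static double pointsPerSecond(double totalPoints, double minutes) {
        return totalPoints / (minutes * 60.0);
    }

    public static void main(String[] args) {
        // 1.5 billion points (from the slide) over an assumed 90-minute race.
        double rate = pointsPerSecond(1.5e9, 90.0);
        System.out.printf("%.0f data points per second on average%n", rate);
    }
}
```

Under that assumption the average is roughly 280,000 data points per second, which is the kind of sustained rate a streaming layer has to absorb.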
So how does that work? Especially for real-time data?
Production System Outline
Simplified Demo System Outline
[Diagram components: TORCS race simulator, MapR Streams, Archive (MapR DB), Apache Drill (SQL access), Jetty / Bootstrap / d3]
TORCS for Cars, Physics and Drivers
The Open Racing Car Simulator http://torcs.sourceforge.net/
Car racing game AI & Research Platform
Demo time!
IoT: Racing Cars
[Diagram: Events Producers send sensors data (JSON) to Events Consumers for Real Time Analytics and Analytics with SQL; the two sides are labeled Producer and Consumer]
https://github.com/mapr-demos/racing-time-series
Where and how to store data?

Big Datastore
• Distributed File System: HDFS/MapR-FS
• NoSQL Database: HBase/MapR-DB
• ….
Data Streaming
Moving millions of events per hour, minute, or second
What is Kafka?
• http://kafka.apache.org/
• Created at LinkedIn, open sourced in 2011
• Implemented in Scala / Java
• Distributed messaging system built to scale
Big Picture
[Diagram: three producers and three consumers]
Topics and Partitions
• Split topics into partitions for scalability
[Diagram: one topic split into Partition 0, Partition 1, and Partition 2; each partition is a numbered sequence of messages starting at 0, and writes are appended at the end]
Consumer Groups
• Single consumer abstraction for scalability
• At most 1 consumer per partition within a group
• Any number of consumer groups
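The partition and consumer-group rules above can be illustrated with a small JDK-only sketch. It spreads keyed events over partitions and then gives each partition exactly one owner inside a group. The class name, the `String.hashCode()` based partitioning, and the round-robin assignment are simplifying assumptions for illustration; they are not Kafka's actual partitioner or group coordination logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionSketch {

    // Map a message key to a partition. Simplified: a real Kafka client
    // hashes the serialized key with a different function.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Assign every partition to exactly one consumer of a group (round-robin).
    // Consumers beyond the number of partitions end up with an empty list.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        for (String car : Arrays.asList("car1", "car2", "car3", "car4")) {
            System.out.println(car + " -> partition " + partitionFor(car, numPartitions));
        }
        // Two consumers share three partitions; a fourth consumer would sit idle.
        System.out.println(assign(Arrays.asList("c1", "c2"), numPartitions));
        System.out.println(assign(Arrays.asList("c1", "c2", "c3", "c4"), numPartitions));
    }
}
```

Because the same key always maps to the same partition, all events for one car stay in order on one partition, while adding consumers to a group only helps up to the number of partitions.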
More real life Kafka …
[Diagram: Zookeeper plus three brokers (Broker 1, Broker 2, Broker 3), each hosting Topic A and Topic B, with three producers and three consumers]
MapR Streams
• Distributed messaging system built to scale
• Uses the Apache Kafka API 0.9.0
• No code change
• Does not use the same “broker” architecture
  • Log stored in MapR Storage (Scalable, Secured, Fast, Multi DC)
  • No Zookeeper
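Produce Messages
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Topic name in MapR Streams form: <stream path>:<topic>
    ProducerRecord<String, byte[]> rec = new ProducerRecord<>(
        "/apps/racing/stream:sensors_data",
        eventName,
        value.toString().getBytes());

    // Asynchronous send; the callback receives an exception on failure
    producer.send(rec, (recordMetadata, e) -> {
        if (e != null) { … }
    });

    producer.flush();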
Consume Messages
    import java.util.stream.StreamSupport;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;

    long pollTimeOut = 800; // milliseconds
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(pollTimeOut);
        if (!records.isEmpty()) {
            Iterable<ConsumerRecord<String, String>> iterable = records::iterator;
            StreamSupport.stream(iterable.spliterator(), false).forEach((record) -> {
                // work with record object
                … record.value(); …
            });
            // Commit offsets once the whole batch has been processed
            consumer.commitAsync();
        }
    }
What’s next?
Sensor Data V1
• 3 main data points:
  • Speed (m/s)
  • RPM
  • Distance (m)
• Buffered

{
  "_id": "1.458141858E9/0.324",
  "car": "car1",
  "timestamp": 1458141858,
  "racetime": 0.324,
  "records": [
    {
      "sensors": { "Speed": 3.588583, "Distance": 2003.023071, "RPM": 1896.575806 },
      "racetime": 0.324,
      "timestamp": 1458141858
    },
    {
      "sensors": { "Speed": 6.755624, "Distance": 2004.084717, "RPM": 1673.264526 },
      "racetime": 0.556,
      "timestamp": 1458141858
    },
    …
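One way such a buffered document could be assembled is sketched below in plain Java. The `_id` in the sample looks like the first sample's timestamp printed as a double, a slash, and its race time; that reading is an inference from the sample, not something the slides state. The class names, the hand-rolled JSON formatting, and the flush behaviour are likewise hypothetical, and the demo's real code may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SensorBuffer {

    /** One sensor sample; field names follow the JSON shown on the slide. */
    static final class Sample {
        final double speed, distance, rpm, racetime;
        final long timestamp;

        Sample(double speed, double distance, double rpm, double racetime, long timestamp) {
            this.speed = speed;
            this.distance = distance;
            this.rpm = rpm;
            this.racetime = racetime;
            this.timestamp = timestamp;
        }

        String toJson() {
            return String.format(Locale.ROOT,
                "{\"sensors\":{\"Speed\":%f,\"Distance\":%f,\"RPM\":%f},\"racetime\":%s,\"timestamp\":%d}",
                speed, distance, rpm, Double.toString(racetime), timestamp);
        }
    }

    private final String car;
    private final List<Sample> records = new ArrayList<>();

    SensorBuffer(String car) {
        this.car = car;
    }

    void add(Sample sample) {
        records.add(sample);
    }

    // Assumed _id scheme: "<timestamp as double>/<racetime>" of the first sample.
    // Requires at least one buffered sample.
    String id() {
        Sample first = records.get(0);
        return Double.toString((double) first.timestamp) + "/" + Double.toString(first.racetime);
    }

    // Serialize the whole buffer as one document, then empty the buffer.
    String flush() {
        Sample first = records.get(0);
        StringBuilder sb = new StringBuilder();
        sb.append("{\"_id\":\"").append(id()).append("\",")
          .append("\"car\":\"").append(car).append("\",")
          .append("\"timestamp\":").append(first.timestamp).append(",")
          .append("\"racetime\":").append(first.racetime).append(",")
          .append("\"records\":[");
        for (int i = 0; i < records.size(); i++) {
            if (i > 0) {
                sb.append(",");
            }
            sb.append(records.get(i).toJson());
        }
        sb.append("]}");
        records.clear();
        return sb.toString();
    }

    public static void main(String[] args) {
        SensorBuffer buffer = new SensorBuffer("car1");
        buffer.add(new Sample(3.588583, 2003.023071, 1896.575806, 0.324, 1458141858L));
        buffer.add(new Sample(6.755624, 2004.084717, 1673.264526, 0.556, 1458141858L));
        System.out.println(buffer.flush());
    }
}
```

Buffering several samples into one document trades a little latency for far fewer messages and writes, which matters at the data rates quoted earlier.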
Capture more data
Sensor Data V2
• Main data points:
  • Speed (m/s)
  • RPM
  • Distance (m)
  • Throttle
  • Gear
  • …
• Buffered

{
  "_id": "1.458141858E9/0.324",
  "car": "car1",
  "timestamp": 1458141858,
  "racetime": 0.324,
  "records": [
    {
      "sensors": { "Speed": 3.588583, "Distance": 2003.023071, "RPM": 1896.575806, "gear": 2 },
      "racetime": 0.324,
      "timestamp": 1458141858
    },
    {
      "sensors": { "Speed": 6.755624, "Distance": 2004.084717, "RPM": 1673.264526, "gear": 2 },
      "racetime": 0.556,
      "timestamp": 1458141858
    },
    …
Add new services
New Data Service
[Diagram: the sensors data stream feeds a new service: Alerts, Stream Processing]
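As a rough idea of what an alerting service added on the same stream might do, the sketch below scans sensor readings and emits a message whenever RPM exceeds a limit. Everything here is hypothetical: the slides do not describe the alert rules, the 1800 RPM limit is an arbitrary value chosen to fit the sample data, and a real service would read from the stream rather than from an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AlertSketch {

    /** Minimal reading: which car, when in the race, and the RPM value. */
    static final class Reading {
        final String car;
        final double racetime;
        final double rpm;

        Reading(String car, double racetime, double rpm) {
            this.car = car;
            this.racetime = racetime;
            this.rpm = rpm;
        }
    }

    // Return one alert message per reading whose RPM is above the limit.
    static List<String> alerts(List<Reading> readings, double rpmLimit) {
        List<String> out = new ArrayList<>();
        for (Reading r : readings) {
            if (r.rpm > rpmLimit) {
                out.add(r.car + " RPM " + r.rpm + " above " + rpmLimit
                        + " at racetime " + r.racetime);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Reading> readings = Arrays.asList(
            new Reading("car1", 0.324, 1896.575806),
            new Reading("car1", 0.556, 1673.264526));
        // Hypothetical limit, picked only so that one sample triggers an alert.
        for (String alert : alerts(readings, 1800.0)) {
            System.out.println(alert);
        }
    }
}
```

The producer side is untouched: the alert service is simply one more consumer of the same sensor data.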
Apache Spark
• Cluster computing platform
• Extends “MapReduce” with extensions
  – Streaming
  – Interactive analytics
• Runs in memory
• http://spark.apache.org/
Apache Flink
• Streaming dataflow engine
• DataStream/DataSet APIs
• CEP, Graph, ML
• Runs in memory
• https://flink.apache.org/
DEMO
Stream Processing with Flink
Streaming Architecture & Formula 1
• Stream data in real time
• Big Data Store to deal with the scale
  • NoSQL Database, Distributed File System
• Decouple the source from the consumer(s)
  • Dashboard, Analytics, Machine Learning
  • Add new use cases…
This is not only about Formula 1! (Telco, Finance, Retail, Content, IT)