Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
1
William Markito@william_markito
Fred Melo@fredmelo_br
An Open-Source Streaming Machine Learning and Real-Time Analytics Architecture
Using an IoT example
(incubating) (incubating)
2
Traditional Data Analytics - Limitations
HDFS
Data Lake
Store Analytics
Hard to change Labor intensive
Inefficient
No real-time information ETL based Data-source specific
3
Stream-based, Real-Time Closed-Loop Analytics
HDFSData LakeExpert System /
Machine Learning
In-Memory Real-Time Data
Continuous Learning Continuous
Improvement Continuous Adapting
Data Stream Pipeline
Multiple Data Sources Real-Time Processing Store Everything
4
A Streaming Machine Learning for IoT Example
Sensor Data
Smart System
Learns with HISTORICAL TRENDS
"How were the temperature and vibration sensors reading when the latest failures happened? "
Live data becomes historical over time
Real-Time
Evaluates LIVE DATA“According to historical trends, there’s an 80% chance this equipment would fail in the next 12 hours"
Historical
Predictive Maintenance Scenario
5
Info
Analysis
Look at past trends (for similar input)
Evaluate current input
Score / Predict
Machine Learning
Streaming Machine Learning
6
Info
Analysis
Filter
[ json ]
Machine Learning
Streaming Machine Learning
7
Info
Analysis
Filter Enrich Machine Learning
Streaming Machine Learning
8
Info
Analysis
Filter Enrich Transform Machine Learning
Streaming Machine Learning
9
Info
Analysis
Filter Enrich TransformML Model
Streaming Machine Learning
10
Info
Analysis
Filter Enrich Transform
Transform
ML Model
Streaming Machine Learning
11
In-Memory Data Grid
Front-end
Update Push
ML Model
Streaming Machine Learning
12
Neural Network
In-Memory Data GridReal-time scoring
Train
Supervised Learning ExampleStreaming Machine Learning
13
Ingest Transform SinkSpringXD
Store / Analyze
Fast Data
Distributed Computing
Predict / Machine Learning
Other Sources and Destinations
JMS
A Streaming Machine Learning Reference Architecture
Indoors Localization - Applied Example
14
Trilateration and its limitations
15
Noisy Data
Physical Barriers
Large Overlap Areas
Moving Targets
Innacuracy
Large Overlap Areas
Particle Filters - Calculating the optimum solution
16
Particle Filters - Calculating the optimum solution
17
The Solution
18
1. Capture signal strength 2. Calculate distance from
antenna 3. Trilaterate different sensors
to predict location in real-time 4. Show on a map with live
updates
Architecture Overview
19
Ingest
SpringXD
Groovy
JSON HTTP
+ Distance
Transform Sink
Calculate Device Distance Predict
Location
Spring Boot
Application Platform
GUI
20
• Cache • Configurable through XML, ,Java
• Region • Distributed j.u.Map on steroids • Highly available, redundant
• Member • Locator, Server, Client
• Callbacks • Listener, Writer, AsyncEventListener, Parallel/Serial
Geode Basic Concepts
Introduction to SpringXD
21
Runs as a distributed application or as a single node
Spring XD
22
A stream is composed from modules. Each module is deployed to a container and its channels are bound to the transport.
Demo
24
Why have we selected those projects
• Iterative & Exploratory model
• Web based REPL • Multiple Interpreters
• Apache Geode • Apache Spark • Markdown • Flink • Python…
• In-memory & Persistent • Highly Consistent • Extreme transaction
processing • Thousands of concurrent
clients • Reliable event model
• Productivity • Built-in connectors • Cloud Agnostic • Highly Scalable • Easy to setup • Streams without coding
25
https://github.com/Pivotal-Open-Source-Hub/WifiAnalyticsIoTSource code and detailed instructions available at:
25
William Markito@william_markito
Fred Melo@fredmelo_br
Follow us on GitHub!
26
26
William Markito@william_markito
Fred Melo@fredmelo_br
Implementing a Highly Scalable In-Memory Stock Prediction System with Apache Geode (incubating), R and Spring XD
Room: Tohotom - 14:30, Sep 30 Fred Melo, Pivotal, William Markito, Pivotal