Building a highly scalable open-source Real-time Streaming

Page 1: Building a highly scalable open-source Real-time Streaming

William Markito @william_markito

Fred Melo @fredmelo_br

An Open-Source Streaming Machine Learning and Real-Time Analytics Architecture

Using an IoT example

(incubating) (incubating)

Page 2: Building a highly scalable open-source Real-time Streaming

Traditional Data Analytics - Limitations

HDFS

Data Lake

Store Analytics

• Hard to change
• Labor intensive
• Inefficient
• No real-time information
• ETL based
• Data-source specific

Page 3: Building a highly scalable open-source Real-time Streaming

Stream-based, Real-Time Closed-Loop Analytics

HDFS

Data Lake

Expert System / Machine Learning

In-Memory Real-Time Data

Continuous Learning

Continuous Improvement

Continuous Adapting

Data Stream Pipeline

Multiple Data Sources

Real-Time Processing

Store Everything

Page 4: Building a highly scalable open-source Real-time Streaming

A Streaming Machine Learning for IoT Example

Sensor Data

Smart System

Learns from HISTORICAL TRENDS

"How were the temperature and vibration sensors reading when the latest failures happened?"

Live data becomes historical over time

Real-Time

Evaluates LIVE DATA

"According to historical trends, there’s an 80% chance this equipment will fail in the next 12 hours"

Historical

Predictive Maintenance Scenario

Page 5: Building a highly scalable open-source Real-time Streaming

Info

Analysis

Look at past trends (for similar input)

Evaluate current input

Score / Predict

Machine Learning

Streaming Machine Learning
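
The three steps on this slide (look at past trends for similar input, evaluate the current input, score) can be sketched as a k-nearest-neighbours lookup. This is an illustration only, not the model used in the talk: the `failure_probability` function, the (temperature, vibration) record layout and the sample readings are all assumptions made for the example.

```python
import math

def failure_probability(history, reading, k=3):
    """Estimate the chance of failure from the k most similar past readings.

    history: list of ((temperature, vibration), failed) records (hypothetical layout).
    reading: (temperature, vibration) tuple for the live input.
    """
    if not history:
        raise ValueError("history must contain at least one record")
    # Look at past trends: rank historical records by distance to the live reading.
    nearest = sorted(history, key=lambda record: math.dist(record[0], reading))[:k]
    # Score: fraction of the nearest historical readings that ended in a failure.
    return sum(1 for _, failed in nearest if failed) / len(nearest)

# Made-up sample data, for illustration only.
history = [
    ((70.0, 0.20), False),
    ((72.0, 0.30), False),
    ((71.0, 0.25), False),
    ((95.0, 1.10), True),
    ((97.0, 1.30), True),
    ((93.0, 0.90), True),
]
live_reading = (96.0, 1.2)
risk = failure_probability(history, live_reading)
```

The features here are not scaled, so temperature dominates the distance; a real system would likely normalize each sensor before comparing readings.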

Page 6: Building a highly scalable open-source Real-time Streaming

Info

Analysis

Filter

[ json ]

Machine Learning

Streaming Machine Learning

Page 7: Building a highly scalable open-source Real-time Streaming

Info

Analysis

Filter Enrich Machine Learning

Streaming Machine Learning

Page 8: Building a highly scalable open-source Real-time Streaming

Info

Analysis

Filter Enrich Transform Machine Learning

Streaming Machine Learning

Page 9: Building a highly scalable open-source Real-time Streaming

Info

Analysis

Filter Enrich Transform ML Model

Streaming Machine Learning

Page 10: Building a highly scalable open-source Real-time Streaming

Info

Analysis

Filter Enrich Transform

Transform

ML Model

Streaming Machine Learning
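
The Filter, Enrich, Transform and ML Model stages built up over the previous slides can be sketched as chained Python generators. This is a minimal sketch under stated assumptions: the field names (`sensor_id`, `temperature`), the `locations` lookup table and the threshold rule standing in for the ML model are hypothetical and do not come from the talk.

```python
import json

def parse(lines):
    """Ingest: decode each raw line as JSON, skipping lines that are not valid JSON."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue

def filter_valid(events):
    """Filter: keep only events that carry the fields the later stages need."""
    for event in events:
        if isinstance(event, dict) and "sensor_id" in event and "temperature" in event:
            yield event

def enrich(events, locations):
    """Enrich: attach reference data (here, a location looked up by sensor id)."""
    for event in events:
        yield {**event, "location": locations.get(event["sensor_id"], "unknown")}

def transform(events):
    """Transform: reduce each event to the feature tuple the model expects."""
    for event in events:
        yield (event["location"], float(event["temperature"]))

def score(features, threshold=90.0):
    """ML Model placeholder: a fixed threshold rule, not a trained model."""
    for location, temperature in features:
        yield {"location": location, "alert": temperature > threshold}

raw = [
    '{"sensor_id": "s1", "temperature": 95.5}',
    'not json',
    '{"sensor_id": "s2"}',
    '{"sensor_id": "s3", "temperature": 70}',
]
locations = {"s1": "boiler room"}
results = list(score(transform(enrich(filter_valid(parse(raw)), locations))))
```

Because every stage is a generator, events flow through one at a time, which mirrors how a stream pipeline processes data as it arrives rather than in batches.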

Page 11: Building a highly scalable open-source Real-time Streaming

In-Memory Data Grid

Front-end

Update Push

ML Model

Streaming Machine Learning

Page 12: Building a highly scalable open-source Real-time Streaming

Neural Network

In-Memory Data Grid

Real-time scoring

Train

Supervised Learning Example

Streaming Machine Learning

Page 13: Building a highly scalable open-source Real-time Streaming

Ingest Transform Sink

Spring XD

Store / Analyze

Fast Data

Distributed Computing

Predict / Machine Learning

Other Sources and Destinations

JMS

A Streaming Machine Learning Reference Architecture

Page 14: Building a highly scalable open-source Real-time Streaming

Indoor Localization - Applied Example

Page 15: Building a highly scalable open-source Real-time Streaming

Trilateration and its limitations

Noisy Data

Physical Barriers

Large Overlap Areas

Moving Targets

Inaccuracy

Page 16: Building a highly scalable open-source Real-time Streaming

Particle Filters - Calculating the optimum solution

Page 17: Building a highly scalable open-source Real-time Streaming

Particle Filters - Calculating the optimum solution
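
The slides give only the title here, so the following is a generic, minimal particle filter sketch for a stationary target, not the implementation from the talk. It assumes three antennas at known positions and noise-free distance measurements; `particle_filter_step`, `estimate`, the Gaussian likelihood and every numeric parameter are choices made for this example.

```python
import math
import random

def particle_filter_step(particles, anchors, measured, sigma=0.5, jitter=0.1, rng=random):
    """One update: weight particles by measurement likelihood, resample, then jitter."""
    weights = []
    for px, py in particles:
        # Squared error between the distances this particle implies and the measured ones.
        err2 = sum((math.hypot(px - ax, py - ay) - d) ** 2
                   for (ax, ay), d in zip(anchors, measured))
        weights.append(math.exp(-err2 / (2.0 * sigma ** 2)))
    if sum(weights) == 0.0:
        # No particle is plausible (weights underflowed); fall back to uniform weights.
        weights = [1.0] * len(particles)
    # Resample in proportion to the weights, then add small noise to keep diversity.
    resampled = rng.choices(particles, weights=weights, k=len(particles))
    return [(x + rng.gauss(0.0, jitter), y + rng.gauss(0.0, jitter)) for x, y in resampled]

def estimate(particles):
    """Location estimate: the mean of the particle cloud."""
    n = len(particles)
    return (sum(p[0] for p in particles) / n, sum(p[1] for p in particles) / n)

rng = random.Random(42)                       # fixed seed so the run is repeatable
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pos = (3.0, 4.0)                         # used only to simulate the measurements
measured = [math.hypot(true_pos[0] - ax, true_pos[1] - ay) for ax, ay in anchors]

# Start with particles spread uniformly over a 10 x 10 area.
particles = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(2000)]
for _ in range(10):
    particles = particle_filter_step(particles, anchors, measured, rng=rng)
est = estimate(particles)
```

For a moving target, a motion model would replace the plain jitter step, and fresh measurements would be fed in at every iteration.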

Page 18: Building a highly scalable open-source Real-time Streaming

The Solution

1. Capture signal strength
2. Calculate distance from antenna
3. Trilaterate different sensors to predict location in real time
4. Show on a map with live updates
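
Steps 2 and 3 can be sketched as follows. The talk does not say which formula converts signal strength to distance, so this example assumes the common log-distance path-loss model; the reference power of -40 dBm at 1 m, the path-loss exponent of 2.0 and the function names are illustrative assumptions.

```python
import math

def rssi_to_distance(rssi_dbm, tx_power_dbm=-40.0, path_loss_exponent=2.0):
    """Estimate distance in metres from RSSI with the log-distance path-loss model.

    tx_power_dbm is the assumed RSSI at 1 m from the antenna.
    """
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10.0 * path_loss_exponent))

def trilaterate(anchors, distances):
    """Solve for (x, y) from three antenna positions and the distance to each.

    Subtracting the first circle equation from the other two leaves two linear
    equations in x and y, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1 ** 2 - d2 ** 2 + x2 ** 2 - x1 ** 2 + y2 ** 2 - y1 ** 2
    b2 = d1 ** 2 - d3 ** 2 + x3 ** 2 - x1 ** 2 + y3 ** 2 - y1 ** 2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("antennas are collinear; the position is not unique")
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)

anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
distances = [5.0, math.sqrt(65.0), math.sqrt(45.0)]  # exact distances to the point (3, 4)
x, y = trilaterate(anchors, distances)
```

With noisy readings the three circles rarely meet in a single point, which is the limitation the earlier trilateration slide lists and the reason the deck turns to particle filters.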

Page 19: Building a highly scalable open-source Real-time Streaming

Architecture Overview

Ingest

Spring XD

Groovy

JSON HTTP

+ Distance

Transform Sink

Calculate Device Distance

Predict Location

Spring Boot

Application Platform

GUI

Page 20: Building a highly scalable open-source Real-time Streaming

• Cache
  • Configurable through XML, Java
• Region
  • Distributed java.util.Map on steroids
  • Highly available, redundant
• Member
  • Locator, Server, Client
• Callbacks
  • Listener, Writer, AsyncEventListener, Parallel/Serial

Geode Basic Concepts
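
To make the Region and Callbacks ideas concrete, here is a plain-Python analogy: a map that notifies listeners on every update. It is not the Geode API and has no distribution or redundancy; the `Region` class, its methods and the listener signature are hypothetical names used only for illustration.

```python
class Region:
    """In-process stand-in for a region: a key/value map with update listeners."""

    def __init__(self):
        self._data = {}
        self._listeners = []

    def add_listener(self, callback):
        """Register a callback invoked as callback(key, old_value, new_value)."""
        self._listeners.append(callback)

    def put(self, key, value):
        """Store the value, then notify every listener of the change."""
        old_value = self._data.get(key)
        self._data[key] = value
        for callback in self._listeners:
            callback(key, old_value, value)

    def get(self, key, default=None):
        return self._data.get(key, default)

# Example: record each location update, as a front end receiving pushes might.
pushed = []
region = Region()
region.add_listener(lambda key, old, new: pushed.append((key, old, new)))
region.put("device-1", (3.0, 4.0))
region.put("device-1", (3.5, 4.0))
```

The listener plays the role of the "Update / Push" arrow from the earlier in-memory data grid slide: writes into the map trigger notifications to interested consumers.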

Page 21: Building a highly scalable open-source Real-time Streaming

Introduction to Spring XD

Runs as a distributed application or as a single node

Page 22: Building a highly scalable open-source Real-time Streaming

Spring XD

A stream is composed from modules. Each module is deployed to a container and its channels are bound to the transport.

Page 23: Building a highly scalable open-source Real-time Streaming

Demo

Page 24: Building a highly scalable open-source Real-time Streaming

Why we selected these projects

• Iterative & Exploratory model
• Web-based REPL
• Multiple Interpreters
  • Apache Geode
  • Apache Spark
  • Markdown
  • Flink
  • Python…

• In-memory & Persistent
• Highly Consistent
• Extreme transaction processing
• Thousands of concurrent clients
• Reliable event model

• Productivity
• Built-in connectors
• Cloud Agnostic
• Highly Scalable
• Easy to set up
• Streams without coding

Page 25: Building a highly scalable open-source Real-time Streaming

Source code and detailed instructions available at:

https://github.com/Pivotal-Open-Source-Hub/WifiAnalyticsIoT

William Markito @william_markito

Fred Melo @fredmelo_br

Follow us on GitHub!

Page 26: Building a highly scalable open-source Real-time Streaming

William Markito @william_markito

Fred Melo @fredmelo_br

Implementing a Highly Scalable In-Memory Stock Prediction System with Apache Geode (incubating), R and Spring XD

Room: Tohotom - 14:30, Sep 30

Fred Melo, Pivotal, William Markito, Pivotal