
Michael Hausenblas: Scalable time series and stream processing for IoT applications


© 2016 Mesosphere, Inc. All Rights Reserved.

SCALABLE TIME SERIES AND STREAM PROCESSING FOR IOT APPLICATIONS


Michael Hausenblas, Developer & Cloud Advocate | 2016-01-16


MOTIVATION


AIRLINES


LOGISTICS


HEALTH CARE


TRADERS


FARMERS


CITIES


© 2014, Wired magazine


YOU


THE TOOLBOX


LET'S TALK ABOUT WORKLOADS* …

• batch (e.g. MapReduce)
• streaming
• PaaS

*) kudos to Timothy St. Clair, @timothysc


MESSAGE QUEUES & ROUTERS

• Apache Kafka
• ØMQ, RabbitMQ, Disque (Redis-based), etc.
• fluentd, Logstash, Flume
• Akka streams
• cloud-only: AWS SQS, Google Cloud Pub/Sub
• see also queues.io


APACHE KAFKA


• High-throughput, distributed, persistent publish-subscribe messaging system

• Originates from LinkedIn

• Typically used as buffer/de-coupling layer in online stream processing


kafka.apache.org
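
To make the buffer/de-coupling role concrete, here is a minimal in-memory sketch in Python. It is a toy model and not Kafka's API: the class `TopicLog` and its methods `publish` and `poll` are hypothetical names chosen for illustration, and the sensor readings are made up. It shows the core idea that producers append to a retained log while each consumer group tracks its own read offset, so a slow consumer does not hold back a fast one.

```python
from collections import defaultdict


class TopicLog:
    """Toy append-only log with per-consumer offsets (not the Kafka API)."""

    def __init__(self):
        self._messages = []                # retained messages, in arrival order
        self._offsets = defaultdict(int)   # consumer group -> next offset to read

    def publish(self, message):
        """Append a message and return its offset in the log."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, group, max_messages=10):
        """Return up to max_messages unread messages for this consumer group."""
        start = self._offsets[group]
        batch = self._messages[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch


log = TopicLog()
for reading in (21.5, 21.7, 22.1):           # hypothetical sensor readings
    log.publish({"sensor": "s1", "temp": reading})

fast = log.poll("dashboard")                  # reads everything published so far
slow = log.poll("archiver", max_messages=1)   # reads only the first message
rest = log.poll("archiver")                   # resumes from the archiver's own offset
```

Because messages stay in the log after being read, the two consumer groups progress independently; a real deployment would use a Kafka client library and a broker cluster instead of a Python list.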


FLUENTD


www.fluentd.org


STREAM PROCESSING PLATFORMS

• Apache Storm
• Apache Spark
• Apache Samza
• Apache Flink
• Concord
• cloud-only: AWS Kinesis, Google Cloud Dataflow
• see also my webinar on stream processing


APACHE STORM


• Distributed, fault-tolerant stream-processing platform

• Guaranteed message processing (replaying messages on failure)

• Concepts: tuples, streams, spouts, bolts, topologies


storm.apache.org
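
The concepts above can be sketched in a few lines of plain Python. This is a toy, single-process illustration and not Storm's API: `SensorSpout`, `make_flaky_bolt` and `run_topology` are hypothetical names, the failure is simulated, and real Storm tracks acknowledgements across a distributed tuple tree. The sketch only shows the shape of the idea: a spout emits tuples, a bolt processes them, and a tuple that fails is replayed.

```python
from collections import deque


class SensorSpout:
    """Toy spout: emits (msg_id, tuple) pairs and re-queues failed ones."""

    def __init__(self, readings):
        self._pending = deque(enumerate(readings))

    def next_tuple(self):
        return self._pending.popleft() if self._pending else None

    def fail(self, msg_id, tup):
        # Re-queue the tuple so it is emitted again later.
        self._pending.append((msg_id, tup))


def make_flaky_bolt(limit=30.0):
    """Toy bolt: fails once per tuple (simulated), then flags values above limit."""
    seen = set()

    def bolt(tup):
        if tup not in seen:
            seen.add(tup)
            raise RuntimeError("simulated transient failure")
        sensor, value = tup
        return (sensor, value, value > limit)

    return bolt


def run_topology(spout, bolt, max_attempts=3):
    """Wire spout -> bolt and run until the spout has nothing left to emit."""
    results, attempts = [], {}
    while True:
        item = spout.next_tuple()
        if item is None:
            return results, attempts
        msg_id, tup = item
        attempts[msg_id] = attempts.get(msg_id, 0) + 1
        try:
            results.append(bolt(tup))
        except RuntimeError:
            if attempts[msg_id] < max_attempts:
                spout.fail(msg_id, tup)


readings = [("s1", 21.5), ("s2", 35.0)]   # hypothetical sensor tuples
results, attempts = run_topology(SensorSpout(readings), make_flaky_bolt())
```

Each tuple is processed successfully on its second attempt, which is the at-least-once behaviour that replay on failure gives you; bolts therefore need to tolerate seeing the same tuple more than once.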


APACHE SPARK


spark.apache.org

Spark stack (diagram):

• Libraries: Spark SQL, Spark Streaming, MLlib (machine learning), GraphX (graph processing)
• Engine: Spark core (RDD)
• Cluster managers: Standalone, YARN, Mesos
• Storage: filesystem (local, HDFS, S3) or data store (HBase, Cassandra, Elasticsearch, etc.)


TIME SERIES DATASTORES

• InfluxDB
• OpenTSDB
• KairosDB
• Prometheus
• see also iot-a.info


OPENTSDB


• Distributed time series database on top of HBase

• Store, index, query & plot metrics

• Extremely scalable

• Low-level monitoring


opentsdb.net
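
As a rough illustration of what storing a metric looks like: OpenTSDB's telnet-style interface accepts lines of the form `put <metric> <timestamp> <value> <tagk=tagv> ...`. The helper below only formats such a line and sends nothing over the network; the function name `format_put` and the metric and tag names are made up for this sketch.

```python
def format_put(metric, timestamp, value, tags):
    """Format one data point as an OpenTSDB telnet-style `put` line."""
    if not tags:
        # OpenTSDB expects at least one tag per data point.
        raise ValueError("at least one tag is required")
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {int(timestamp)} {value} {tag_str}"


# Hypothetical reading: timestamp is in seconds since the Unix epoch.
line = format_put("sensor.temperature", 1452902400, 21.5,
                  {"device": "s1", "site": "berlin"})
```

Tags such as `device` and `site` are what queries later filter and group by, so choosing them is the main schema decision for IoT data.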


INFLUXDB


• No-dependency, time series database written in Go

• SQLish query language (incl. regex, fan out)

• Single-node or Raft-based distributed mode


influxdb.com
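
A small sketch of how a data point and a SQLish query might look. InfluxDB ingests text records in its line protocol (`measurement,tag=value field=value timestamp`); the helper `to_line_protocol` is a hypothetical name, it handles only float fields and does no escaping, and the measurement, tag and field names are invented. The query string is an illustrative InfluxQL statement and is not executed here.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one line-protocol record (assumes names without spaces or commas)."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={float(v)}" for k, v in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


# Hypothetical reading; the timestamp is in nanoseconds since the Unix epoch.
record = to_line_protocol("temperature", {"device": "s1"},
                          {"value": 21.5}, 1452902400000000000)

# Illustrative InfluxQL query: 10-minute means per device over the last hour.
query = ('SELECT mean("value") FROM "temperature" '
         'WHERE time > now() - 1h GROUP BY time(10m), "device"')
```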


CHALLENGES


• Setup and operation of components

• Elasticity: static vs. dynamic partitioning

• Efficient usage of resources (TCO)


MEET THE DATACENTER OPERATING SYSTEM (DCOS)


LOCAL OS VS. DISTRIBUTED OS

http://bitly.com/os-vs-dcos


DCOS IS A DISTRIBUTED OPERATING SYSTEM

• local OS per node (+container enabled)
• scheduling (long-lived, batch)
• networking
• service discovery
• stateful services
• security
• monitoring, logging, debugging


BENEFITS


DCOS

• Run stateless services such as web servers or app servers and Big Data services like Kafka, Spark, or Cassandra together on one cluster

• Dynamic partitioning of your cluster, depending on your business requirements

• Increased utilization (10% → 80%++)
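
A back-of-the-envelope sketch of why dynamic partitioning tends to raise utilization. All numbers below are hypothetical and are not meant to reproduce the 10% → 80% figure; the point is only that statically partitioned silos must each be sized for their own peak, whereas a shared pool is sized for the peak of the combined load.

```python
# Hypothetical node demand per time slot for three workloads (toy numbers).
demand = {
    "web":   [8, 8, 2, 2],
    "spark": [1, 1, 9, 1],
    "kafka": [3, 3, 3, 3],
}


def utilization(used, capacity):
    """Average fraction of the available nodes that is actually in use."""
    return sum(used) / (capacity * len(used))


# Combined demand across all workloads in each time slot.
combined = [sum(slot) for slot in zip(*demand.values())]

# Static partitioning: every workload owns a silo sized for its own peak.
static_capacity = sum(max(series) for series in demand.values())

# Dynamic partitioning: one shared pool sized for the peak of the combined load.
dynamic_capacity = max(combined)

static_util = utilization(combined, static_capacity)
dynamic_util = utilization(combined, dynamic_capacity)
print(f"static: {static_util:.0%}, dynamic: {dynamic_util:.0%}")
```

The gain comes from workloads peaking at different times; if all peaks coincided, the two capacities would be the same.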


AN EXAMPLE


https://mesosphere.com/blog/2015/11/18/dcos-time-series-demo

https://github.com/mesosphere/time-series-demo


Q & A


• @mhausenblas

• mhausenblas.info

• @mesosphere

• mesosphere.io/product

• mesosphere.com/infinity