Spark @ Virdata
BigDataBe Meetup, July 09, 2014
Gerard Maas - Data Processing Team Lead

@maasg - bigdata.be/wp-content/uploads/2014/07/Spark-at-Virdata.pdf



Page 2

@bout me

@maasg

Page 3

Virdata: A cloud platform for the Internet of Things

Page 4

Big Data Developers - Virdata, Internet of Things #virdata

Virdata - 2 COMPONENTS: A CLOUD & A LIBRARY

A cloud:
★ Elastic and scalable cutting-edge technologies
★ APIs for different types of information/data consumption
★ Cloud agnostic through self-built monitoring tools
★ Running on both public and private cloud infrastructure
★ Bi-directional messaging
★ High-performance broker architecture

A library:
★ Lightweight and portable library
★ Multiple programming languages
★ Supports multiple transport protocols
★ Available for all HW and OS
★ Supports any type of data in any format/syntax
★ Payload is compressed and encrypted

Page 5

Scala @ Virdata


Page 7

Databricks Keynote - Spark Summit 2014

Page 8

Batch Streaming

HDFS Cassandra


Page 10

Spark: RDD Transformation

INPUT DATA: HDFS (Text / Sequence File)
.textFile -> RDD
Transformations (RDD -> RDD): MAP, FLATMAP, GROUP, FILTER, join, ...
SAVE -> OUTPUT: HDFS (Text / Sequence File), Cassandra
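The batch flow on this slide (read a text file from HDFS, chain transformations, save the result) could be sketched roughly as follows. This is an illustration only, written against the Spark 1.x Scala API: the `deviceId,value` record layout, the `BatchSketch` and `parse` names, and the use of `reduceByKey` for the GROUP step are assumptions, not something the slides show.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD operations on Spark 1.x

object BatchSketch {
  // Hypothetical record layout: one "deviceId,value" pair per line.
  def parse(line: String): Option[(String, Double)] =
    line.split(",") match {
      case Array(id, v) => scala.util.Try(id.trim -> v.trim.toDouble).toOption
      case _            => None
    }

  // args(0) = input path, args(1) = output path (e.g. hdfs:// URLs).
  def main(args: Array[String]): Unit = {
    // The master is expected to come from spark.master (e.g. via spark-submit).
    val sc = new SparkContext(new SparkConf().setAppName("batch-sketch"))
    try {
      val lines  = sc.textFile(args(0))                         // INPUT DATA -> RDD
      val parsed = lines.flatMap(l => parse(l).toList)          // FLATMAP: drops malformed lines
      val valid  = parsed.filter { case (_, v) => !v.isNaN }    // FILTER
      val maxima = valid.reduceByKey((a, b) => math.max(a, b))  // GROUP by key, reduced to a max
      maxima.map { case (id, v) => s"$id,$v" }                  // MAP back to text
        .saveAsTextFile(args(1))                                // SAVE -> OUTPUT
    } finally sc.stop()
  }
}
```

Each step returns a new RDD, so the pipeline reads like the chain of boxes in the diagram. Nothing runs until the final save action is called.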

Page 11

Spark: RDD Transformation

INPUT STREAM: Kafka
DStream: RDD, RDD, RDD, ...
Transformations (DStream -> DStream): MAP, FLATMAP, GROUP, FILTER, ...
OUTPUT: Cassandra, Web Sockets, ...
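The streaming variant applies the same operators to a DStream, which Spark Streaming processes as a sequence of RDDs, one per batch interval. The sketch below assumes the receiver-based Kafka integration that shipped with Spark 1.x (the `spark-streaming-kafka` artifact). The ZooKeeper address, group id, topic name, payload layout and threshold are all placeholders. It prints instead of writing to Cassandra or web sockets, because those sinks need connectors the slides do not describe.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object StreamingSketch {
  // Hypothetical payload: "deviceId,value"; malformed messages yield nothing.
  def parse(msg: String): List[(String, Double)] = msg.split(",") match {
    case Array(id, v) => scala.util.Try(id.trim -> v.trim.toDouble).toOption.toList
    case _            => Nil
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-sketch")
    val ssc  = new StreamingContext(conf, Seconds(5)) // one RDD per 5-second batch

    // INPUT STREAM: (key, message) pairs from Kafka; connection values are placeholders.
    val messages =
      KafkaUtils.createStream(ssc, "zk-host:2181", "sketch-group", Map("readings" -> 1))

    val readings = messages.map(_._2).flatMap(m => parse(m)) // MAP, FLATMAP
    val high     = readings.filter(_._2 > 100.0)             // FILTER (arbitrary threshold)

    // OUTPUT: a real job would write to Cassandra or push to web sockets here.
    high.foreachRDD(rdd => rdd.take(10).foreach(println))

    ssc.start()
    ssc.awaitTermination()
  }
}
```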


Page 13

[Diagram: HDFS with three Worker nodes]


Page 15

Memory
CPUs (and don’t forget to throw some disks in the mix)
Network

Page 16

Spark Deployment Options

Local: spark.master=local[8]
Standalone Cluster: spark.master=spark://host:port
Using a ClusterManager: spark.master=mesos://host:port
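The three options differ only in the value given to `spark.master`. As a small sketch, the URL formats from the slide can be captured in a few helper types. The type names are hypothetical; the default ports assumed here are the usual 7077 for a standalone master and 5050 for a Mesos master.

```scala
// Hypothetical helper types; only the URL formats come from the slide.
sealed trait Deployment { def master: String }

// Runs everything in one JVM with the given number of worker threads.
case class Local(threads: Int) extends Deployment {
  val master = s"local[$threads]"
}

// Spark's own standalone cluster manager.
case class Standalone(host: String, port: Int = 7077) extends Deployment {
  val master = s"spark://$host:$port"
}

// An external cluster manager, here Mesos.
case class Mesos(host: String, port: Int = 5050) extends Deployment {
  val master = s"mesos://$host:$port"
}
```

The resulting string is what would be passed on, for example `new SparkConf().setMaster(Mesos("host").master)`, so the job code itself stays the same across the three modes.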

Page 17

Apache Mesos

“Apache Mesos is a cluster manager that simplifies the complexity of running applications on a shared pool of servers.”

http://mesos.apache.org/

Page 18

Why Mesos?

Think in terms of Resources, not Machines

Pages 19-22

How Mesos Works

[Diagram built up over four pages; labels added along the way:]

Frameworks
- Scheduler
- Executor

Master (M, M, M and ZK)

Slaves
- run tasks

Pages 23-26

How Mesos Works - resource offers

[Diagram build-up; labels shown on each page:]
- Page 23: H1, 4CPU, 2GB
- Page 24: H1, 4CPU, 2GB | 2C, 2G | 2C, 4G
- Page 25: H1, 2CPU, 2GB | 2C, 2G | 2C, 4G
- Page 26: 2C, 4G
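One way to read the resource-offer sequence is that the master offers a slave's free resources to a framework, and the framework's scheduler launches whichever pending tasks fit. The sketch below models that reading in plain Scala. It assumes that "H1, 4CPU, 2GB" is an offer and that "2C, 2G" and "2C, 4G" are task requirements, which the slides do not state. The types and the first-fit policy are hypothetical and are not the Mesos API.

```scala
// Hypothetical model of a resource offer; not the Mesos API.
case class Resources(cpus: Int, memGb: Int) {
  def fits(in: Resources): Boolean = cpus <= in.cpus && memGb <= in.memGb
  def -(r: Resources): Resources = Resources(cpus - r.cpus, memGb - r.memGb)
}
case class Offer(host: String, available: Resources)

object OfferSketch {
  // Walks the pending tasks in order and launches each one that still fits
  // into what is left of the offer; the rest wait for a later offer.
  // Returns (launched, stillPending).
  def schedule(offer: Offer, pending: List[Resources]): (List[Resources], List[Resources]) = {
    val start = (offer.available, List.empty[Resources], List.empty[Resources])
    val (_, launched, later) = pending.foldLeft(start) {
      case ((left, run, queued), task) =>
        if (task.fits(left)) (left - task, task :: run, queued)
        else (left, run, task :: queued)
    }
    (launched.reverse, later.reverse)
  }
}
```

With the offer from page 23 and the two tasks from page 24, `OfferSketch.schedule(Offer("H1", Resources(4, 2)), List(Resources(2, 2), Resources(2, 4)))` launches the 2C, 2G task and leaves 2C, 4G pending, which matches the last page of the sequence.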

Page 27

The Mesos Paper… Where Spark Started

https://www.usenix.org/legacy/event/nsdi11/tech/full_papers/Hindman.pdf

Page 28

Marathon

“Keep your services running”

Pages 29-31

Marathon

https://github.com/mesosphere/marathon

Page 32

Spark Job Server

“Spark as a Service”

Page 33

Job Server

val sc = new spark.SparkContext(conf)

https://github.com/ooyala/spark-jobserver

Page 34

Job Server

Job Impl:

object Job extends SparkJob {
  def runJob(...): Any
  def validate(...): SparkJobValidation
}
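A concrete job implementation might look like the word count below. It is a sketch that assumes the `runJob(sc, config)` and `validate(sc, config)` signatures from the ooyala/spark-jobserver README of that period; later releases may differ. The `input.string` setting and the `words` helper are illustrative choices, not something the slides specify.

```scala
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // pair-RDD operations on Spark 1.x
import spark.jobserver.{SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}
import scala.util.Try

object WordCountJob extends SparkJob {
  // Pure part of the job, kept separate so it can be checked without a cluster.
  def words(input: String): Seq[String] =
    input.split("\\s+").filter(_.nonEmpty).toSeq

  // Reject the request early if the (hypothetical) input.string setting is missing.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    Try(config.getString("input.string"))
      .map(_ => SparkJobValid)
      .getOrElse(SparkJobInvalid("missing input.string"))

  // The job server supplies the SparkContext; the result is returned to the caller.
  override def runJob(sc: SparkContext, config: Config): Any =
    sc.parallelize(words(config.getString("input.string")))
      .map(w => (w, 1))
      .reduceByKey(_ + _)
      .collect()
      .toMap
}
```

The job never creates its own SparkContext. Because the server owns the context, several jobs can share one long-running context.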

Page 35

Job Server

HTTP: /jars, /contexts, /jobs

Page 36

“The Datacenter as the computer” - Luiz Barroso

HDFS: FileSystem
Mesos: Kernel
Marathon: init.d

Page 37

What about System Monitor?

Page 38

Ganglia

Page 39

How we put it all together at Virdata

(Live Demo)

Page 40

Questions?

@virdata_iot | @maasg

Page 41

Thank you

virdata.com