Introduction to Apache Spark
Scott Deeg – Sr. Field Engineer, Pivotal

Unless otherwise indicated, these slides are © 2013-2014 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/


Who Am I?

A Plain Old Java Geek
•  Came to Silicon Valley seeking fame and fortune in 1995 (still looking)
•  Started working in Java in January 1996, with Symantec Visual Café 1.0
•  Hacker on a J2EE-based BPM product for 10 years
•  Joined VMware in 2009 / rolled into Pivotal on April 1, 2013
•  Primarily pre-sales consulting for large/medium enterprises

Random Facts: CalPoly SLO, Physics, Guitar/Lutherie, Arduino, 3yr old boy, 100 yr old house (aka: Lots’O’work), spaces not tabs


Agenda

•  What is Spark?
•  Programming Model
•  Product ecosystem
•  Spark and Spring
•  A bit on Internals

(with demos along the way)


What people have been asking me about Spark

•  It’s one of those in-memory things, right? (yes)
•  Is it “Big Data”? (yes)
•  Is it Hadoop? (no)
•  JVM, Java, Scala? (yes)
•  Is it “real”, or just another shiny technology with a long but ultimately small tail? (?)


What is Spark?


Official Definition

Apache Spark is a fast and general engine for large-scale data processing


Spark is …

•  Distributed/cluster compute engine
•  A toolset for data scientists / analysts
•  Runs “batch” workloads in memory
•  Hadoop compatible
•  Implementation of Resilient Distributed Dataset (RDD) in Scala
•  Programmatic interface via API or interactive
    •  Scala, Java 7/8, Python


Spark is also …

•  An ASF top-level project: http://spark.apache.org
•  Came out of the AMPLab project at UCB
•  An active community
    •  ~100-200 contributors across 25-35 companies
    •  More active than Hadoop MapReduce
    •  1000 people (max) attended Spark Summit 2014 in SF
•  An ecosystem of domain-specific tools
    •  Different models, but interoperable
•  Backed by a commercial entity: Databricks


Spark is not …

•  An OLTP data store
•  A permanent or stable data store
•  An app cache

It’s also not mature
•  Lots of room to grow


Short History

•  2009: Started as a research project at UCB
•  2010: Open sourced
•  January 2011: AMPLab created
•  October 2012: Version 0.6
    •  Java, standalone cluster, Maven
•  June 21, 2013: Spark accepted into the ASF Incubator
•  Feb 27, 2014: Spark becomes a top-level ASF project
•  May 30, 2014: Spark 1.0
•  August 2014: 1.0.2


Spark Team Goals

•  Make life easy and productive for data scientists
•  Provide well-documented and expressive APIs
•  Powerful domain-specific libraries
•  Easy integration with common Big Data storage systems
•  High performance
•  Well-defined releases, stable API


Spark is not Hadoop, but is compatible

•  Often better than Hadoop
    •  M/R is fine for “Data Parallel”, but awkward for some workloads
        •  Low latency, iterative, streaming
•  Natively accesses Hadoop data
•  Spark is YAYJ (Yet Another YARN Job)
    •  Utilize current investments in Hadoop
    •  Brings Spark (closer) to the data
•  Similar scalability and fault tolerance characteristics as Hadoop

It’s not OR … it’s AND


Improvements over Map/Reduce

•  Efficiency
    •  General execution graphs (not just map->reduce->store)
    •  In memory
    •  Useful for iterative processing
•  Usability
    •  Rich APIs in Scala, Java, Python
    •  Interactive REPL

Can Spark be the R for Big Data?


Topologies

•  Local in JVM or through REPL
    •  Great for dev
•  Spark cluster (master/slaves)
    •  Improving rapidly
•  Cluster resource managers
    •  YARN
    •  Mesos
•  (PaaS?)


Spark Programming Model


Core Spark Concept

In the Spark model, a program is a set of transformations and actions on a Resilient Distributed Dataset (RDD), a dataset with the following properties:

•  Read-only collection of objects spread across a cluster
•  RDDs are built through parallel transformations (map, filter, …)
•  Results are generated by actions (reduce, collect, …)
•  Automatically rebuilt on failure using lineage
•  Controllable persistence (RAM, HDFS, etc.)


Two Categories of Operations

•  Transform
    •  Create from stable storage (HDFS, Tachyon, etc.)
    •  Generate new RDDs from other RDDs
    •  Lazy operations that build a DAG
    •  Once Spark knows your transformations it can build a plan
•  Action
    •  Return a result or write to storage
    •  Actions cause the DAG to execute

Transformations: map, filter, flatMap, sample, groupByKey, reduceByKey, union, join, sort

Actions: count, collect, reduce, lookup, save
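The lazy-transformation / eager-action split can be felt without a cluster. The sketch below is an analogy in plain Java 8 (`java.util.stream`), not the Spark API: intermediate stream operations play the role of transformations and the terminal `collect` plays the role of an action. The class name `LazyPipelineSketch` and its methods are hypothetical, and a Java stream is single-use and local, unlike an RDD.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyPipelineSketch {
    // Records every element the map step touches, so we can see when work happens.
    public static final List<String> touched = new ArrayList<>();

    // "Transformations": filter and map only describe the pipeline; nothing runs yet.
    public static Stream<String> buildPlan(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains("error"))
                .map(line -> {
                    touched.add(line);
                    return line.toUpperCase();
                });
    }

    // "Action": collecting forces the whole pipeline to execute.
    public static List<String> runAction(Stream<String> plan) {
        return plan.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<String> plan = buildPlan(Arrays.asList("ok", "error: disk", "error: net"));
        System.out.println("elements touched before the action: " + touched.size());
        System.out.println("result: " + runAction(plan));
        System.out.println("elements touched after the action: " + touched.size());
    }
}
```

Building the plan does no work on the data; only the terminal call does, which is the same reason Spark can look at the whole chain before deciding how to run it.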


Demo

WordCount (of course)

    val file = sc.textFile("hdfs://bfm1/…")
    val words = file.flatMap(line => line.split(" "))
    val wordOneMap = words.map(word => (word, 1))
    val counts = wordOneMap.reduceByKey(_ + _)
    counts.collect()
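For readers more at home in Java 8 than Scala, the same split / pair-with-one / sum-by-key shape can be sketched on a local list. This is plain `java.util.stream`, not the Spark Java API; `WordCountSketch` and `countWords` are hypothetical names, and the empty-string filter is an addition that the Scala demo does not have.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordCountSketch {
    public static Map<String, Integer> countWords(List<String> lines) {
        return lines.stream()
                // Same role as flatMap(line => line.split(" ")) in the Scala demo.
                .flatMap(line -> Arrays.stream(line.split(" ")))
                // Added here: drop empty tokens produced by leading or repeated spaces.
                .filter(word -> !word.isEmpty())
                // Map each word to 1 and sum per key, the local analogue of
                // map(word => (word, 1)) followed by reduceByKey(_ + _).
                .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        System.out.println(countWords(Arrays.asList("to be or", "not to be")));
    }
}
```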


RDD Fault Tolerance

•  RDDs maintain lineage information that can be used to reconstruct lost partitions

    val cachedMsgs = sc.textFile(...)
      .filter(_.contains("error"))
      .map(_.split('\t')(2))
      .cache()

Lineage: HdfsRDD (path: hdfs://…) -> FilteredRDD (func: contains(...)) -> MappedRDD (func: split(…)) -> CachedRDD


Source: http://spark.apache.org/
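The lineage idea can be illustrated with a toy model: each node keeps the recipe that produced it rather than a backup of its data, so a lost in-memory copy is simply recomputed. The sketch below is plain Java and only an analogy; `LineageSketch`, `Node`, `lose`, and `errorMessages` are hypothetical names, a single list stands in for a partition, and the sample input format (tab-separated, message in the third field) is assumed to match the filter/split chain on the slide.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.stream.Collectors;

public class LineageSketch {
    /** A node that remembers how to rebuild its data instead of storing a backup copy. */
    public static final class Node<T> {
        private final Supplier<List<T>> recipe; // the lineage: how to compute this node
        private List<T> cached;                  // in-memory copy, which may be lost
        public int computations = 0;             // how many times the recipe has run

        Node(Supplier<List<T>> recipe) {
            this.recipe = recipe;
        }

        public static <T> Node<T> source(List<T> stableStorage) {
            return new Node<T>(() -> stableStorage);
        }

        public Node<T> filter(Predicate<T> keep) {
            return new Node<T>(() -> get().stream().filter(keep).collect(Collectors.toList()));
        }

        public <R> Node<R> map(Function<T, R> f) {
            return new Node<R>(() -> get().stream().map(f).collect(Collectors.toList()));
        }

        // Returns the cached data, recomputing it from the recipe if it is missing.
        public List<T> get() {
            if (cached == null) {
                cached = recipe.get();
                computations++;
            }
            return cached;
        }

        // Simulates losing the in-memory copy, for example when a worker dies.
        public void lose() {
            cached = null;
        }
    }

    // Mirrors the chain on the slide: keep "error" lines, then take the third tab-separated field.
    public static Node<String> errorMessages(List<String> lines) {
        return Node.source(lines)
                .filter(line -> line.contains("error"))
                .map(line -> line.split("\t")[2]);
    }

    public static void main(String[] args) {
        Node<String> msgs = errorMessages(Arrays.asList(
                "t1\terror\tdisk full", "t2\tinfo\tall good", "t3\terror\tnet down"));
        System.out.println(msgs.get());
        msgs.lose();
        System.out.println(msgs.get() + " (recomputed from lineage)");
    }
}
```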


Optimizing Dataflow


Source: Aaron Davidson of Databricks


RDDs are Foundational

•  General purpose enough to use to implement other programming models
    •  SQL
    •  Streaming
    •  Machine Learning
    •  Graph


Spark Ecosystem


Spark SQL

•  Models RDDs as relations
    •  SchemaRDD
•  Replaces Shark
    •  Lighter weight version with no code from Hive
•  Import/export in different storage formats
    •  Parquet, learn schema from existing Hive warehouse

    JavaRDD<Person> people = ctx.textFile("people.txt").map(…);
    JavaSchemaRDD schemaPeople = sqlCtx.applySchema(people, Person.class);
    schemaPeople.registerAsTable("people");
    JavaSchemaRDD teens = sqlCtx.sql("SELECT name FROM people WHERE age >= 13");


Streaming

•  Extend Spark to do large-scale stream processing
    •  100s of nodes with second-scale end-to-end latency
•  Simple, batch-like API with RDDs
    •  Input is broken up into micro-batches that become RDDs


Image from http://spark.apache.org/
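The micro-batch idea, cutting a continuous input into small fixed-length slices that are then handled like ordinary batch datasets, can be sketched in plain Java as follows. This is not the Spark Streaming API: `MicroBatchSketch`, `Event`, and `toMicroBatches` are hypothetical names, timestamps are assumed to be non-negative milliseconds, and the timestamp on each event stands in for its arrival time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MicroBatchSketch {
    /** One incoming record with the time (in ms) at which it arrived. */
    public static final class Event {
        public final long timestampMs;
        public final String payload;

        public Event(long timestampMs, String payload) {
            this.timestampMs = timestampMs;
            this.payload = payload;
        }
    }

    // Groups events into consecutive fixed-length intervals, keyed by interval start time.
    // Each resulting list is one micro-batch that could be processed like a small batch job.
    public static Map<Long, List<String>> toMicroBatches(List<Event> events, long batchIntervalMs) {
        Map<Long, List<String>> batches = new TreeMap<>();
        for (Event e : events) {
            long batchStart = (e.timestampMs / batchIntervalMs) * batchIntervalMs;
            batches.computeIfAbsent(batchStart, k -> new ArrayList<>()).add(e.payload);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event(100, "a"), new Event(900, "b"),
                new Event(1000, "c"), new Event(2500, "d"));
        System.out.println(toMicroBatches(events, 1000));
    }
}
```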


Streaming

•  DStream is the primary construct
•  Sources: HDFS, Flume, Kafka, Twitter, ZeroMQ, custom
•  Raw data needs to be replicated in memory for fault tolerance
•  Other features
    •  Window-based transformations
    •  Arbitrary join of streams

    JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, …);
    JavaReceiverInputDStream<String> lines = ssc.socketTextStream(…);
    JavaDStream<String> words = lines.flatMap(…);
    JavaPairDStream<String, Integer> wordCounts = words.mapToPair(…);


MLbase (“Young Project”)

•  Machine learning toolset
    •  Library and higher-level abstractions
•  General tool in this space is MATLAB
    •  Difficult for end users to learn, debug, scale solutions
•  Starting with MLlib
    •  Low-level distributed machine learning library
•  Many different algorithms
    •  Classification, regression, collaborative filtering, etc.


GraphX (alpha)

•  Graph processing library
    •  Replaces Spark Bagel
•  Graph parallel, not data parallel
    •  Reason in the context of neighbors
    •  GraphLab API
•  Graph creation => algorithm => post-processing
    •  Existing systems mainly deal with the algorithm and are not interactive
    •  Unify collection and graph models


Image from http://spark.apache.org/


Others

•  Mesos
    •  Enables multiple frameworks to share the same cluster resources
    •  Twitter is the largest user: over 6,000 servers
•  Tachyon
    •  In-memory, fault-tolerant file system that exposes an HDFS-compatible interface
•  Catalyst
    •  SQL query optimizer


Spark and Spring


Sample App: Rocket Telemetry

•  Rockets generate data, and we want to understand it
•  Batch processing to look for patterns across flights
•  Streaming for watching it happen and alerting
•  Boot, Java Config, MVC, etc.

WHY?
•  Similar to telematics
    •  Very important to the auto insurance industry
•  It’s my friend’s project
•  It’s real (model) rocket data!


Basics

•  Spark’s a library, so just include it
    •  Some lib conflicts, but not much
    •  Logging loop
•  Packaging not fun
    •  Have to exclude Spark and Hadoop clients IF they’re running on a cluster, as they’re provided by the runtime
    •  mvn “shade” plugin, Gradle being a pain
    •  Executable Boot jars don’t just run on the Spark cluster


Demo

Show us some code already!


Spark and Spring XD

•  Two different problems in enterprise data
    •  Primary data pipeline(s)
        •  24/7/365 rock solid
        •  Operations oriented
        •  Well-defined transformations and routing rules with long-term deployment
    •  Data analysis
        •  Batch and realtime aspects
        •  Transformation and processing exploration
        •  Frequently short-term deployment
        •  Should not impact stability or operations of primary pipeline


Pretty Picture

[Diagram. Labels: Source (three of them), Primary Stream, Transform / Filter, Processing, Sink, Stable Storage (HDFS), Batch Analysis, Stream Analysis, Operational Data (Redis, Gem), Application]


A bit on Internals


About this Sample

I can’t come up with a better example, so I use this one from Aaron Davidson of Databricks. This is a summary from his slides, and my notes from his talk at Spark Summit. All the images are from his deck. For more detail I highly recommend: http://spark-summit.org/2014/talk/a-deeper-understanding-of-spark-internals


Sample


What happens

•  Create RDDs
•  Pipeline operations as much as possible
    •  When a result doesn’t depend on other results, we can pipeline
    •  But when data needs to be reorganized, we can no longer pipeline
•  A stage is a merged operation
•  Each stage gets a set of tasks
•  A task is data and computation


RDDs and Stages


Tasks


Stages running

•  The number of partitions matters for concurrency
•  Rule of thumb is at least 2x the number of cores
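A small arithmetic sketch of why the partition count matters. It assumes one task per partition and one running task per core, which is a simplification; `PartitionRuleSketch` and its methods are hypothetical names, not Spark settings.

```java
public class PartitionRuleSketch {
    // The rule of thumb from the slide: at least twice as many partitions as cores.
    public static int minPartitions(int totalCores) {
        return 2 * totalCores;
    }

    // Cores that can be busy at once: never more than there are partitions to work on.
    public static int busyCores(int partitions, int totalCores) {
        return Math.min(partitions, totalCores);
    }

    // Rounds of tasks needed to get through every partition (ceiling division).
    public static int waves(int partitions, int totalCores) {
        return (partitions + totalCores - 1) / totalCores;
    }

    public static void main(String[] args) {
        int cores = 16;
        System.out.println("suggested minimum partitions: " + minPartitions(cores));
        System.out.println("busy cores with 4 partitions: " + busyCores(4, cores));
        System.out.println("waves with 32 partitions: " + waves(32, cores));
    }
}
```

With too few partitions most cores sit idle; with a couple of partitions per core, every core has work and a slow task delays only part of a wave.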


The Shuffle

•  Redistributes data among partitions
•  Hash keys into buckets
•  Pull, not push
•  Writes intermediate files to disk
•  Becoming pluggable

Optimizations:
–  Avoided when possible, if data is already properly partitioned
–  Partial aggregation reduces data movement
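Two of the points above, hashing keys into buckets and partial aggregation, can be sketched in plain Java. This is an in-memory illustration only (no pull step, no intermediate files), and `ShuffleSketch` and its methods are hypothetical names rather than Spark internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {
    // Hash a key into one of numBuckets buckets; floorMod keeps the index non-negative.
    public static int bucketFor(String key, int numBuckets) {
        return Math.floorMod(key.hashCode(), numBuckets);
    }

    // Partial aggregation: sum counts per key inside one input partition,
    // so each key leaves the partition as a single record.
    public static Map<String, Integer> combineLocally(List<String> partition) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String key : partition) {
            partial.merge(key, 1, Integer::sum);
        }
        return partial;
    }

    // Shuffle: route each locally combined record to the bucket chosen by its key's hash,
    // merging counts for the same key as they arrive.
    public static List<Map<String, Integer>> shuffle(List<List<String>> partitions, int numBuckets) {
        List<Map<String, Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new TreeMap<>());
        }
        for (List<String> partition : partitions) {
            for (Map.Entry<String, Integer> e : combineLocally(partition).entrySet()) {
                buckets.get(bucketFor(e.getKey(), numBuckets))
                        .merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return buckets;
    }

    // Number of records that cross partition boundaries after partial aggregation.
    public static int recordsMoved(List<List<String>> partitions) {
        int moved = 0;
        for (List<String> partition : partitions) {
            moved += combineLocally(partition).size();
        }
        return moved;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "a", "b"), Arrays.asList("a", "b", "b"));
        System.out.println("buckets: " + shuffle(partitions, 2));
        System.out.println("records moved: " + recordsMoved(partitions) + " of 6 raw records");
    }
}
```

Because every occurrence of a key hashes to the same bucket, the final per-key totals end up in one place, and combining first means fewer records have to move to get there.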


Other thoughts on Memory

•  By default Spark (assumes it) owns 90% of the memory
•  Partitions don’t have to fit in memory, but some things do
    •  E.g., values for large sets in groupBys must fit in memory
•  Shuffle memory is 20%
    •  If it goes over that, it’ll spill the data to disk
    •  Shuffle always writes to disk
•  Turn on compression to keep objects serialized
    •  Saves space, but takes compute to serialize/deserialize


This and That


Release cycle

•  1.0 came out at end of May
•  1.X expected to be current for several years
•  API stability in 1.X for all non-alpha projects
    •  Can recompile jobs, but hoping for binary compatibility
    •  Internal APIs are marked @DeveloperApi or @Experimental
•  Plan (was?) for quarterly .X release cycle
    •  2 mo dev / 1 mo QA
    •  1.0.1 July, 1.0.2 August


Resources

Main Spark page
•  http://spark.apache.org/

An initial paper on Spark
•  https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf

Demo code for this session
•  https://github.com/SpringOne2GX-2014/SparkForSpring


Upcoming

•  Blog post on executing Spring based Spark apps on clusters (Spark native, YARN, and Mesos)

•  Sample app with SpringXD as a source and Spark Streaming as a processor


Thanks!


Misc


Abstract

Apache Spark is one of the most exciting, active, and talked about ASF projects today, but how should Spring developers and enterprise architects view it? Is it the second coming of the Bean spec, or just another shiny distraction? This talk will introduce Spark and its core concepts, the ecosystem of services on top of it, types of problems it can solve, similarities and differences from Hadoop, integration with Spring XD, deployment topologies, and an exploration of uses in enterprise. Concepts will be illustrated with several demos covering: the programming model with Spring/Java8, development experience, “realistic” infrastructure simulation with local virtual deployments, and Spark cluster monitoring tools.


Bio

A self-described Plain Old Java Geek, Scott Deeg began his journey with Java in 1996 as a member of the Visual Café team at Symantec. From there he worked primarily as a consultant and solution architect dealing with enterprise Java applications. He joined VMware in 2009 and is now a part of the EMC/VMware spin-out Pivotal, where he continues to work with large enterprises on their application platform and data needs. A big fan of open source software and technology, he tries to occasionally get out of the corporate world to talk about interesting things happening in the Java/OSS community.
