Meet Spark

DESCRIPTION

The Spark software stack includes a core data-processing engine, an interface for interactive querying, Spark Streaming for streaming data analysis, and growing libraries for machine learning and graph analysis. Spark is quickly establishing itself as a leading environment for fast, iterative in-memory and streaming analysis. This talk gives an introduction to the Spark stack, explains how Spark delivers lightning-fast results, and shows how it complements Apache Hadoop.

Page 1: Meet Spark

© 2014 MapR Technologies

Chug Spark: Hello Spark

Mike Emerick, Senior Architect, MapR

April 2014

Page 2: Meet Spark

Agenda

• Introductions

• Log File enrichment

• ETL with ML

• Recommendation Engine

• Ad hoc SQL queries

• The Future case

Page 3: Meet Spark

Who is Mike Emerick?

My bio: the highlights.

Architect at MapR for 2.5 years.

“creative hours at Workshop 88.”

Page 4: Meet Spark

Approach to this presentation

1. No API discussion

2. Architecture features and utilization

3. Use cases, and why Spark?

Page 5: Meet Spark

Spark at 10,000 feet

• Fundamentally, Spark is an MPP (massively parallel processing) engine.

• Can use many storage subsystems (great for development).

• RDDs, accumulators, and broadcast variables.

• MapReduce, plus more.

• The Apache Spark site has great resources on architecture and API.
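
The three building blocks named above (partitioned datasets, broadcast values, accumulators) can be pictured without any Spark API at all. The following is a minimal plain-Python sketch, not Spark code: the sample events, the `country_names` table, and both helper functions are hypothetical names invented for illustration. Each "partition" is processed independently, the lookup table plays the role of a broadcast value, and the unknown-code counter plays the role of an accumulator.

```python
from collections import Counter
from functools import reduce

# Hypothetical sample data: (user_id, country_code) events.
events = [("u1", "us"), ("u2", "de"), ("u3", "us"), ("u4", "??"), ("u5", "de")]

# Stand-in for a broadcast value: a small read-only table every task can read.
country_names = {"us": "United States", "de": "Germany"}


def split_into_partitions(records, num_partitions):
    """Deal records round-robin into partitions, like slices of a distributed dataset."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def process_partition(partition, lookup):
    """Map step for one partition.

    Returns partial counts per country name plus a local count of unknown
    codes, which plays the role of an accumulator update.
    """
    counts = Counter()
    unknown = 0
    for _user, code in partition:
        name = lookup.get(code)
        if name is None:
            unknown += 1
        else:
            counts[name] += 1
    return counts, unknown


partitions = split_into_partitions(events, 2)
results = [process_partition(p, country_names) for p in partitions]

# Reduce step: merge the partial counts and sum the accumulator updates.
total_counts = reduce(lambda a, b: a + b, (c for c, _ in results), Counter())
unknown_total = sum(u for _, u in results)

print(dict(total_counts))
print(unknown_total)
```

In a real cluster the partitions would live on different machines and the merge would happen over the network; the sketch only shows the shape of the computation.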

Page 6: Meet Spark

Use case: SQL queries

• “Interactive SQL on Hadoop...”

• How does Spark make this easier?

– Native HiveQL (SQL-92-ish)

– In memory and from disk

– Usually the first thought...

• Spark SQL

Page 8: Meet Spark

Use case: Log file enrichment

• Why enrich my log data?

• This is not Storm; it is batch

– Similar to the HBase async API

• How does Spark make this easier?

– Streaming API

– Sliding windows

– SQL through Hive/Shark (connect to HBase)

– NoSQL connectors (HBase)
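
The enrichment-plus-sliding-window idea from this slide can be sketched in plain Python. This is a conceptual stand-in, not the Spark Streaming API: the `host_owner` dictionary is a hypothetical substitute for a NoSQL lookup such as HBase, and the log records, field names, and the 30-second window are assumptions made up for the example.

```python
from collections import deque

# Hypothetical lookup table standing in for a NoSQL store such as HBase.
host_owner = {"10.0.0.1": "web-team", "10.0.0.2": "db-team"}

# Hypothetical parsed log records: (timestamp_seconds, host_ip, status_code).
logs = [
    (0, "10.0.0.1", 200),
    (5, "10.0.0.2", 500),
    (12, "10.0.0.1", 500),
    (14, "10.0.0.9", 200),
    (40, "10.0.0.2", 500),
]


def enrich(record, lookup):
    """Attach the owning team to a raw log record; unknown hosts get 'unknown'."""
    ts, ip, status = record
    return {"ts": ts, "ip": ip, "status": status, "owner": lookup.get(ip, "unknown")}


def sliding_error_counts(records, window_seconds):
    """For each record (in time order), count errors in the trailing window.

    An error is any status >= 500. The window covers the half-open interval
    (ts - window_seconds, ts].
    """
    recent_errors = deque()  # timestamps of errors still inside the window
    out = []
    for rec in records:
        if rec["status"] >= 500:
            recent_errors.append(rec["ts"])
        # Evict errors that have fallen out of the window.
        while recent_errors and recent_errors[0] <= rec["ts"] - window_seconds:
            recent_errors.popleft()
        out.append((rec["ts"], len(recent_errors)))
    return out


enriched = [enrich(r, host_owner) for r in logs]
error_counts = sliding_error_counts(enriched, window_seconds=30)

for row in enriched:
    print(row)
print(error_counts)
```

The point of the sketch is that enrichment is a per-record lookup while the window is state carried across records; a streaming framework manages both across many machines.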

Page 10: Meet Spark

Use case: Mixing SQL with ML

• Why are folks doing this?

• How does Spark make this easier?

– Native machine learning with MLlib

– Access to near-real-time ad hoc SQL queries

– R and SQL in the same place

– Handles data bigger than memory, faster than MapReduce

Page 12: Meet Spark

Use case: Recommendation engine

• It is a recommendation engine...

• How does Spark make this easier?

– ETL and enrichment

– MLlib makes it easy to import data.

– MLlib training in the same cluster

– NoSQL serves recommendations ad hoc

– Dynamic
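
The train-then-serve flow on this slide can be illustrated with a deliberately simple technique. The sketch below uses item co-occurrence counting in plain Python; it is not MLlib and not the algorithm MLlib provides. The interaction data and function names are hypothetical, and the final `served` dictionary stands in for a NoSQL table that answers recommendation lookups.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical enriched interaction data: user -> set of items viewed.
user_items = {
    "alice": {"a", "b", "c"},
    "bob": {"a", "b"},
    "carol": {"b", "c", "d"},
    "dave": {"a"},
}


def train_cooccurrence(interactions):
    """'Training': count how often each pair of items appears for the same user."""
    co = defaultdict(lambda: defaultdict(int))
    for items in interactions.values():
        for x, y in combinations(sorted(items), 2):
            co[x][y] += 1
            co[y][x] += 1
    return co


def recommend(user, interactions, model, top_n=2):
    """Score unseen items by their co-occurrence with the user's seen items."""
    seen = interactions[user]
    scores = defaultdict(int)
    for item in seen:
        for other, count in model.get(item, {}).items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken by item name so output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


model = train_cooccurrence(user_items)

# Precompute results; this dict stands in for a NoSQL table serving lookups.
served = {user: recommend(user, user_items, model) for user in user_items}

for user in sorted(served):
    print(user, served[user])
```

Retraining on fresh interaction data and rewriting `served` is what makes the setup dynamic; the sketch leaves out scale, which is the part a cluster framework addresses.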

Page 14: Meet Spark

Use cases build in complexity

• Adoption follows a curve of complexity

– Ingestion and query

– Ingestion, enrichment, query

– Ingestion, enrichment, machine learning, query

– Ingestion, enrichment, machine learning, serving recommendations

– .....

• Spark is flattening the curve

• Why?

– One framework

– Less data movement

– Access to preferred language

Page 15: Meet Spark

Future state: ~ in the year 2000

• ADAM - Genomics

• GraphX – Graph is near...

• MLlib – Look for lots of work here

• PySpark – Fastest evolving

• SparkR – Just getting started

• BlinkDB – Approximate queries

• OEM...

Page 16: Meet Spark

MapR is hiring in Chicago

Apache Drill Beta this Summer

Happy National Making Day!

Check out W88 for Hadoop classes