PART TIME SPARK USER Rajiv Shah www.rajivshah.com Chicago Spark Users Meetup Nov 5, 2015

Using Spark Part Time

ROADMAP

• Status of Spark

• My take

• Examples

Status of Spark

Strata + Hadoop World mentions of Spark

Cloudera Blog Post on Sparkling Water

http://blog.cloudera.com/blog/2015/10/how-to-build-a-machine-learning-app-using-sparkling-water-and-apache-spark

My personal take

Insufficient Algorithms

http://projects.rajivshah.com/shiny/outlier/

Surfing for algorithms

ML and MLlib

Language schizophrenia: Scala, Python, R

Lack of Documentation

Difficult to tune

Not for small or big data

USING SPARK

Spark makes the impossible possible

Spark is hard

COOL THINGS ABOUT SPARK

• Scales up

• Streaming

• Enterprise-worthy

• It looks like it will play nice

SUGGESTIONS

• Get data engineers who will work with your data scientists

• If you can't take advantage of Spark's strengths, don't use it

EXAMPLES

• Spark Streaming: streaming k-means clustering

• Anomaly detection using H2O

• Recommenders
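The deck names streaming k-means as an example without showing code. Spark MLlib documents its streaming k-means update as a "forgetful" weighted average applied per mini-batch: for each cluster with center c and weight n, given the m new points assigned to it with mean x, the new center is (c·n·α + x·m) / (n·α + m) and the new weight is n·α + m, where α is the decay factor. The plain-Python sketch below shows only that update rule, not Spark's API. Assumptions: decay is applied once per batch, the helper names are hypothetical, and Spark's handling of clusters that die out is omitted.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(centers: Sequence[Point], p: Sequence[float]) -> int:
    """Index of the center closest to p (the first one wins on ties)."""
    return min(range(len(centers)), key=lambda i: squared_distance(centers[i], p))


def streaming_kmeans_update(
    centers: List[Point],
    weights: List[float],
    batch: Sequence[Point],
    decay: float,
) -> Tuple[List[Point], List[float]]:
    """Apply one mini-batch (hypothetical helper, not Spark's API).

    For each cluster: c' = (c * n * decay + sum_of_assigned_points) / (n * decay + m)
                      n' = n * decay + m
    where m is the number of batch points assigned to that cluster.
    """
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)

    # Assign each new point to its nearest current center and accumulate sums.
    for p in batch:
        i = nearest_center(centers, p)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += p[d]

    new_centers: List[Point] = []
    new_weights: List[float] = []
    for c, n, s, m in zip(centers, weights, sums, counts):
        discounted = n * decay  # forget part of the old history
        if m == 0:
            # No new points: the center stays put and only its weight decays.
            new_centers.append(c)
            new_weights.append(discounted)
            continue
        total = discounted + m
        new_centers.append(tuple((c[d] * discounted + s[d]) / total for d in range(dim)))
        new_weights.append(total)
    return new_centers, new_weights
```

A decay of 1.0 weights all history equally, while 0.0 keeps only the most recent batch, which matches how Spark describes its decay factor. In Spark itself the corresponding model is `StreamingKMeans` in `pyspark.mllib.clustering`, configured through its `decayFactor` parameter and trained on a DStream.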
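Recommenders also appear in the list without code. Spark MLlib's built-in recommender is collaborative filtering by alternating least squares (ALS): with item factors held fixed, each user's factor vector is a small ridge-regression solve over that user's ratings, then the roles swap and the process repeats. The plain-Python sketch below illustrates that idea on a toy problem. It is not Spark's implementation: the function names and the ratings are hypothetical, and Spark's distributed version differs in details (for example, it scales the regularization term by the number of ratings each user or item has).

```python
import random
from typing import Dict, List, Tuple

Ratings = Dict[Tuple[int, int], float]  # (user id, item id) -> explicit rating
Factors = Dict[int, List[float]]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares_step(fixed: Factors, rows: Dict[int, List[Tuple[int, float]]],
                       rank: int, reg: float) -> Factors:
    """Holding `fixed` constant, solve (sum v v^T + reg*I) x = sum r*v for every row."""
    out: Factors = {}
    for row, entries in rows.items():
        a = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for col, r in entries:
            v = fixed[col]
            for i in range(rank):
                b[i] += r * v[i]
                for j in range(rank):
                    a[i][j] += v[i] * v[j]
        out[row] = solve(a, b)
    return out


def train_als(ratings: Ratings, rank: int = 2, reg: float = 0.1,
              iterations: int = 20, seed: int = 0) -> Tuple[Factors, Factors]:
    """Alternate between solving for user factors and item factors."""
    by_user: Dict[int, List[Tuple[int, float]]] = {}
    by_item: Dict[int, List[Tuple[int, float]]] = {}
    for (u, i), r in ratings.items():
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    rng = random.Random(seed)
    items: Factors = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users: Factors = {}
    for _ in range(iterations):
        users = least_squares_step(items, by_user, rank, reg)
        items = least_squares_step(users, by_item, rank, reg)
    return users, items


def predict(users: Factors, items: Factors, u: int, i: int) -> float:
    return sum(x * y for x, y in zip(users[u], items[i]))


def rmse(users: Factors, items: Factors, ratings: Ratings) -> float:
    err = sum((predict(users, items, u, i) - r) ** 2 for (u, i), r in ratings.items())
    return (err / len(ratings)) ** 0.5


def recommend(users: Factors, items: Factors, ratings: Ratings, u: int, n: int = 1) -> List[int]:
    """Top-n items that user u has not rated yet, ranked by predicted rating."""
    unseen = [i for i in items if (u, i) not in ratings]
    return sorted(unseen, key=lambda i: predict(users, items, u, i), reverse=True)[:n]


# Toy data (hypothetical): users 0 and 1 like items 0, 1, 4; users 2 and 3 like items 2, 3.
ratings: Ratings = {
    (0, 0): 5.0, (0, 1): 4.0, (0, 2): 1.0,
    (1, 0): 4.0, (1, 1): 5.0, (1, 3): 1.0, (1, 4): 5.0,
    (2, 1): 1.0, (2, 2): 5.0, (2, 3): 4.0, (2, 4): 1.0,
    (3, 0): 1.0, (3, 2): 4.0, (3, 3): 5.0, (3, 4): 2.0,
}
users, items = train_als(ratings, rank=2, reg=0.1, iterations=100)
print(recommend(users, items, ratings, u=0, n=1))
```

In Spark the same knobs appear as the `rank`, `maxIter`, and `regParam` parameters of `pyspark.ml.recommendation.ALS`; what Spark adds is distributing the per-user and per-item solves across a cluster.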