PART TIME SPARK USER
Rajiv Shah
www.rajivshah.com
Chicago Spark Users Meetup
Nov 5, 2015
ROADMAP
• Status of Spark
• My take
• Examples
status of spark
Strata+Hadoop mentions of Spark
Cloudera Blog Post on Sparkling Water
http://blog.cloudera.com/blog/2015/10/how-to-build-a-machine-learning-app-using-sparkling-water-and-apache-spark
my personal take
Insufficient Algorithms
http://projects.rajivshah.com/shiny/outlier/
surfing for algorithms
ML - MLLIB
http://spark.apache.org/docs/latest/mllib-guide.html
Language Schizophrenia
Scala, Python, R
Lack of Documentation
Difficult to tune
Not for small or big data
USING SPARK
Spark makes the impossible possible
Spark is hard
COOL THINGS ABOUT SPARK
• Scales up
• Streaming
• Enterprise worthy
• It looks like it will play nice
SUGGESTIONS
• Get data engineers that will work with your data scientists
• If you can’t take advantage of Spark’s strengths, don't use it
EXAMPLES
• Spark streaming - Streaming Kmeans clustering
• Anomaly Detection using H2O
• Recommenders
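
For the streaming k-means example, the core idea can be sketched without a cluster. The snippet below is a minimal, plain-Python illustration of a mini-batch center update with a decay factor, in the spirit of what Spark's streaming k-means does. It is not the talk's code and does not call Spark. The names `nearest`, `update_centers`, and `decay` are hypothetical, and the discounted-weight update is an assumption chosen for illustration.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def nearest(centers: List[List[float]], point: Point) -> int:
    """Return the index of the center closest to point (squared Euclidean)."""
    def dist2(c: Sequence[float]) -> float:
        return sum((ci - pi) ** 2 for ci, pi in zip(c, point))
    return min(range(len(centers)), key=lambda i: dist2(centers[i]))


def update_centers(
    centers: List[List[float]],
    weights: List[float],
    batch: List[Point],
    decay: float = 1.0,
) -> Tuple[List[List[float]], List[float]]:
    """Fold one batch of points into the running cluster centers.

    decay=1.0 keeps all history; decay=0.0 forgets it and uses only the batch.
    (Assumed update: old weight is discounted by decay, then batch count added.)
    """
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]   # per-cluster sum of batch points
    counts = [0] * len(centers)             # per-cluster batch point count

    # Assign each point in the batch to its nearest current center.
    for p in batch:
        k = nearest(centers, p)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]

    new_centers, new_weights = [], []
    for c, n, s, m in zip(centers, weights, sums, counts):
        old = n * decay                     # discounted historical weight
        total = old + m
        if m == 0 or total == 0:
            # No new points for this cluster: keep the center where it is.
            new_centers.append(list(c))
        else:
            # Weighted average of the old center and the batch points.
            new_centers.append([(ci * old + si) / total for ci, si in zip(c, s)])
        new_weights.append(total)
    return new_centers, new_weights


if __name__ == "__main__":
    centers = [[0.0, 0.0], [10.0, 10.0]]
    weights = [1.0, 1.0]
    batch = [[1.0, 1.0], [3.0, 3.0], [9.0, 9.0]]
    print(update_centers(centers, weights, batch, decay=1.0))
```

In a real stream, `update_centers` would be called once per incoming batch, carrying the returned centers and weights forward. A lower `decay` makes the clusters follow recent data more closely.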