Bringing Deep Learning into production

Page 1: Bringing Deep Learning into production
Page 2: Bringing Deep Learning into production

Brief introduction

• CTO & co-founder of Agile Lab

• Data & tech addict

• Contributor to Spark Notebook

• Spark early adopter

• Certified Cassandra Architect

• Deep Learning enthusiast

Page 3: Bringing Deep Learning into production

Who is Agile Lab?

GO BIG (data) or GO HOME

http://www.meetup.com/it-IT/Torino-Scala-Programming-Big-Data-Meetup/

Page 4: Bringing Deep Learning into production

What we do

• Applications: high scalability

• Decision Support Systems: data engineering, data mining and data «meaning»

• Big Data Strategies

• Training: Reactive, NoSQL, Big Data, Machine learning

Page 5: Bringing Deep Learning into production

Why Deep Learning

Page 6: Bringing Deep Learning into production

Deep Learning is trending

Page 7: Bringing Deep Learning into production

What is Deep Learning

• Deep learning is just another name for artificial neural networks

• An algorithm is deep if the input passes through several non-linearities before being output

• Deep learning is discovering the features that best represent the problem, rather than just a way to combine them
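
As a rough illustration of "several non-linearities", here is a minimal sketch of a forward pass through stacked layers. It is plain Java written for this note, not the API of any framework; the class name `DeepForward`, the choice of ReLU as the non-linearity, and the example weights are all assumptions.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical illustration of a deep forward pass; not taken from any framework. */
final class DeepForward {

    /** One dense layer: out[j] = activation(bias[j] + sum_i weights[j][i] * in[i]). */
    static double[] layer(double[] in, double[][] weights, double[] bias,
                          DoubleUnaryOperator activation) {
        double[] out = new double[weights.length];
        for (int j = 0; j < weights.length; j++) {
            double sum = bias[j];
            for (int i = 0; i < in.length; i++) {
                sum += weights[j][i] * in[i];
            }
            out[j] = activation.applyAsDouble(sum);
        }
        return out;
    }

    /** Passes the input through every layer in order; each layer applies one non-linearity. */
    static double[] forward(double[] input, double[][][] weights, double[][] biases) {
        DoubleUnaryOperator relu = v -> Math.max(0.0, v); // assumed activation
        double[] activations = input;
        for (int l = 0; l < weights.length; l++) {
            activations = layer(activations, weights[l], biases[l], relu);
        }
        return activations;
    }
}
```

With first-layer weights `{{1, -1}, {-1, 1}}`, second-layer weights `{{1, 1}}` and zero biases, `forward` returns the absolute difference of its two inputs, something a single linear layer cannot represent.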

Page 8: Bringing Deep Learning into production

Deep Learning: Use cases

Page 9: Bringing Deep Learning into production

Do you want to start with Deep Learning?

Let’s choose the right tools!

Page 10: Bringing Deep Learning into production

Deep Learning Frameworks

• Deeplearning4J
• TensorFlow
• Caffe
• Theano
• Torch
• Spark ML MultilayerPerceptrons
• H2O
• CNTK
• MatLab
• maxDNN

And many others

Page 11: Bringing Deep Learning into production

How to choose

Background

Target Environment

Vision

Page 12: Bringing Deep Learning into production

Background

Productivity!!

• Big Data Engineer: Scala, Java

• Math Engineer: Java, Python

• Statistician: R, Python

Page 13: Bringing Deep Learning into production

Target Environment

Trained model should be deployable!!

[Diagram: the Trained Model moves from Dev Env to Prod Env]

Page 14: Bringing Deep Learning into production

Target Environment

[Diagram: Dev Env and Prod Env, with an ML Pipeline covering Training, Data Cleaning, ETL and Scheduling]

- Track model performance over time
- Care about SLA
- Continuous tweaks
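
Tracking model performance over time can be as simple as a sliding window over recent predictions. The sketch below is only one possible approach, assuming that the true label becomes available after each prediction; the class `AccuracyMonitor`, its window size and its threshold are hypothetical names and parameters chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical monitor: keeps the accuracy of the last N predictions. */
final class AccuracyMonitor {
    private final int windowSize;
    private final double minAccuracy;
    private final Deque<Boolean> window = new ArrayDeque<>();
    private int correct = 0;

    AccuracyMonitor(int windowSize, double minAccuracy) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
        this.minAccuracy = minAccuracy;
    }

    /** Records whether the latest prediction matched the observed outcome. */
    void record(boolean wasCorrect) {
        window.addLast(wasCorrect);
        if (wasCorrect) {
            correct++;
        }
        if (window.size() > windowSize) {
            boolean evicted = window.removeFirst(); // drop the oldest outcome
            if (evicted) {
                correct--;
            }
        }
    }

    /** Accuracy over the current window, or NaN if nothing was recorded yet. */
    double accuracy() {
        return window.isEmpty() ? Double.NaN : (double) correct / window.size();
    }

    /** True only once the window is full and accuracy fell below the threshold. */
    boolean isDegraded() {
        return window.size() == windowSize && accuracy() < minAccuracy;
    }
}
```

A scheduled job could call `record` for each scored event and raise an alert, or trigger retraining, when `isDegraded()` becomes true.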

Page 15: Bringing Deep Learning into production

Enterprise Architecture

[Architecture diagram. Components shown: Hadoop, Online DataStore, Enterprise Service Bus, Data Integration Layers, External Sources, Internal Business Sources, Internal System Sources, Analytics, Value Added Services, API Services, Deep Learning]

Page 16: Bringing Deep Learning into production

Easy Wins

Training pipeline should run on Spark or Hadoop

Trained Model should be represented in Java objects
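
To make "represented in Java objects" concrete, here is a minimal sketch of a trained model held as a plain serializable Java object. The class `LogisticModel` is hypothetical and uses logistic regression only as a stand-in for whatever the training pipeline produces; it is not the output format of any specific framework.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical trained model: learned parameters plus a predict method. */
final class LogisticModel implements Serializable {
    private static final long serialVersionUID = 1L;
    private final double[] weights;
    private final double bias;

    LogisticModel(double[] weights, double bias) {
        this.weights = weights.clone();
        this.bias = bias;
    }

    /** Returns the predicted probability of the positive class. */
    double predict(double[] features) {
        if (features.length != weights.length) {
            throw new IllegalArgumentException("Expected " + weights.length + " features");
        }
        double z = bias;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Serializes the model so it can be shipped from Dev Env to Prod Env. */
    byte[] toBytes() {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(this);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Restores a model previously written by toBytes. */
    static LogisticModel fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (LogisticModel) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the model is just an object with no native dependencies, the same class can be loaded by any JVM process in the production environment.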

Page 17: Bringing Deep Learning into production

Vision: keep in mind Scaling

High-level dynamic languages are incredibly productive for prototyping and data exploration

Scaling to larger data sets quickly runs into performance limitations

Keep scaling requirements in mind from the beginning

Page 18: Bringing Deep Learning into production

Vision: simplify the pipeline

Copy & Sample data from Dev Env to Data Scientist Env

Prototype in Python or R

Train model

Predict on validation Data

Translate the model to match the Prod Env (Java, MapReduce, Spark)

Deploy training pipeline and model

Page 19: Bringing Deep Learning into production

Easy Wins

Data scientists should work directly on the distributed environment

Data scientists and big data engineers should cooperate on the same platform

Page 20: Bringing Deep Learning into production

SWOT Analysis

Page 21: Bringing Deep Learning into production

TensorFlow

Strengths:
- Powered by Google
- Nice UI

Weaknesses:
- Powered by Google
- No support for “inline” matrix operations
- Slow

Opportunities:
- Awesome community

Threats:
- No Scala or Java integration
- No commercial support

Page 22: Bringing Deep Learning into production

Theano

Strengths:
- Granddaddy of deep learning
- RNN and CNN
- Computational graph abstraction
- Python

Weaknesses:
- No support for Hadoop or Spark
- No plug & play nets

Opportunities:
- Great community

Threats:
- No Scala or Java integration
- No commercial support

Page 23: Bringing Deep Learning into production

Torch

Strengths:
- GPU support
- Lots of pretrained models and packages
- Easy to use

Weaknesses:
- Lua language

Opportunities:
- Backed by DeepMind and Facebook

Threats:
- No Scala or Java integration
- No commercial support

Page 24: Bringing Deep Learning into production

Caffe

Strengths:
- C++ & Python
- Good performance
- GPU support

Weaknesses:
- Focused on image processing

Opportunities:
- Backed by Yahoo for Spark integration
- GPU clustering

Threats:
- No commercial support

Page 25: Bringing Deep Learning into production

DeepLearning4j

Strengths:
- GPU support
- Java and Scala
- Full DNN set
- Supports Hadoop, Spark & Akka

Weaknesses:
- Not for dummies

Opportunities:
- Commercial support (SkyMind)

Threats:
- Not so sexy for data scientists because of Java/Scala

Page 26: Bringing Deep Learning into production

H2O

• Easy to use Web UI
• Multi-language API
• Runs directly on HDFS or S3
• Model is a Java POJO
• Big Data ready
• Really fast
• Compressed data
• Regularization
• Grid search

• GPU is still on the roadmap
• CNN and RNN too
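
Regularization and grid search fit together naturally: the search tries each candidate regularization strength and keeps the one that scores best on held-out data. The sketch below illustrates the idea with a one-dimensional ridge regression in plain Java. It does not use or imitate the H2O API; the class `RidgeGridSearch`, the ridge model and the lambda grid are assumptions made for the example.

```java
/** Hypothetical example: grid search over an L2 (ridge) penalty. */
final class RidgeGridSearch {

    /** Closed-form 1-D ridge fit without intercept: w = sum(x*y) / (sum(x*x) + lambda). */
    static double fit(double[] x, double[] y, double lambda) {
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
        }
        return sxy / (sxx + lambda);
    }

    /** Mean squared error of the prediction w * x against y. */
    static double mse(double w, double[] x, double[] y) {
        double total = 0.0;
        for (int i = 0; i < x.length; i++) {
            double error = w * x[i] - y[i];
            total += error * error;
        }
        return total / x.length;
    }

    /** Tries every lambda in the grid and returns the one with the lowest validation MSE. */
    static double bestLambda(double[] lambdas,
                             double[] trainX, double[] trainY,
                             double[] validX, double[] validY) {
        double best = lambdas[0];
        double bestError = Double.POSITIVE_INFINITY;
        for (double lambda : lambdas) {
            double w = fit(trainX, trainY, lambda);
            double error = mse(w, validX, validY);
            if (error < bestError) {
                bestError = error;
                best = lambda;
            }
        }
        return best;
    }
}
```

A larger lambda shrinks the fitted weight toward zero, which can help when the training data is noisy; the validation set decides how much shrinkage is worthwhile.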

Page 27: Bringing Deep Learning into production

H2O - Flow

Page 28: Bringing Deep Learning into production

H2O – Sparkling Water

• Python, R and Scala API
• Best Kagglers use H2O
• Tons of tools for profiling and tuning
• Leverages Spark
• Best-in-class algorithms, battle tested
• Regularization
• Grid search

Page 29: Bringing Deep Learning into production

H2O – Sparkling Water

Page 30: Bringing Deep Learning into production

Workflow

[Diagram: Training Set → training → Java POJO]

Embeddable in:
• J2EE App
• Spark Job
• MR Job
• DWH as UDF

Page 31: Bringing Deep Learning into production

Spark as middleware

Using Spark as middleware, you can leverage:

• Deeplearning4J
• H2O
• TensorFlow (Arimo extension)
• Caffe (Yahoo extension)
• ML MultilayerPerceptrons and future implementations

NO tech provider lock-in
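
One way to keep application code free of lock-in is to depend on a small interface and plug each framework in behind it. The sketch below shows that pattern only; `Scorer` and `ScorerRegistry` are hypothetical names, not part of Spark or of any framework listed above, and the registered backends would be thin adapters written per framework.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical abstraction: the only type application code depends on. */
interface Scorer {
    double score(double[] features);
}

/** Hypothetical registry that maps a backend name to its adapter. */
final class ScorerRegistry {
    private final Map<String, Scorer> backends = new LinkedHashMap<>();

    /** Registers an adapter, for example one wrapping a framework-specific model. */
    void register(String name, Scorer scorer) {
        backends.put(name, scorer);
    }

    /** Scores with the named backend; callers never see framework classes. */
    double score(String backend, double[] features) {
        Scorer scorer = backends.get(backend);
        if (scorer == null) {
            throw new IllegalArgumentException("Unknown backend: " + backend);
        }
        return scorer.score(features);
    }
}
```

Switching provider then means registering a different adapter under the same name, with no change to the calling code.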

Page 32: Bringing Deep Learning into production

Our Stack for Enterprise

• Ready for Enterprise and the Hadoop world
• Deployable into a Java environment
• Notebook (Flow)
• H2O for out-of-the-box algorithms
• Deeplearning4J for advanced DNN and n-dimensional array manipulation
• Good usability for both data scientists and Big Data engineers
• Enterprise support along the whole stack

Page 33: Bringing Deep Learning into production

Thanks!

We are hiring!

[email protected]