Deep Learning through Examples - Kaggle #1

0xdata / H2O.ai: Scalable In-Memory Machine Learning

Silicon Valley Big Data Science Meetup, Vendavo, Mountain View, 9/11/14

Arno Candel

Who am I?

PhD in Computational Physics, 2005, from ETH Zurich, Switzerland

- 6 years at SLAC: Accelerator Physics Modeling
- 2 years at Skytree, Inc.: Machine Learning
- 9 months at 0xdata/H2O: Machine Learning

15 years in HPC/Supercomputing/Modeling

Named "2014 Big Data All-Star" by Fortune Magazine

@ArnoCandel

H2O Deep Learning, @ArnoCandel

H2O Deep Learning: Kaggle #1 rank (out of 413), 40 days left

matlabulous (Jo-fai Chow, "Blend it like a Bayesian!") says:

"I am 99.99999999999% sure that I can still go further with H2O."

Achieved with H2O Deep Learning from R

Outline

- Intro & Live Demo (10 mins)
- Methods & Implementation (20 mins)
- Results & Live Demos (25 mins)
  - Higgs boson detection
  - MNIST handwritten digits
  - text classification
- Q & A (5 mins)


About H2O (aka 0xdata)

Java, Apache v2 Open Source

Join the www.h2o.ai/community

#1 Java Machine Learning in GitHub


Customer Demands for Practical Machine Learning

Requirements      Value
In-Memory         Fast (Interactive)
Distributed       Big Data (No Sampling)
Open Source       Ownership of Methods
API / SDK         Extensibility

H2O was developed by 0xdata from scratch to meet these requirements.


H2O Integration

[Diagram labels: H2O (Java); interfaces: R, Scala, JSON, Python; storage: HDFS; deployment: standalone, over YARN, or on Hadoop MRv1.]

H2O Architecture

[Diagram components: distributed in-memory K-V store with column compression; memory manager; MapReduce; machine learning algorithms (e.g., Deep Learning); R engine; nano-fast scoring engine; prediction engine.]


H2O - The Killer App on Spark

http://databricks.com/blog/2014/06/30/sparkling-water-h20-spark.html


H2O Deep Learning on Spark

    // Test if we can correctly learn A, B where Y = logistic(A + B*X)
    test("deep learning log regression") {
      val nPoints = 10000
      val A = 2.0
      val B = -1.5

      // Generate testing data
      val trainData = DeepLearningSuite.generateLogisticInput(A, B, nPoints, 42)
      // Create RDD from testing data
      val trainRDD = sc.parallelize(trainData, 2)
      trainRDD.cache()

      import H2OContext._
      // Create H2O data frame (will be implicit in the future)
      val trainH2ORDD = toDataFrame(sc, trainRDD)
      // Create a H2O DeepLearning model
      val dlParams = new DeepLearningParameters()
      dlParams.source = trainH2ORDD
      dlParams.response = trainH2ORDD.lastVec()
      dlParams.classification = true
      val dl = new DeepLearning(dlParams)
      val dlModel = dl.train().get()

      // Score validation data
      val validationData = DeepLearningSuite.generateLogisticInput(A, B, nPoints, 17)
      val validationRDD = sc.parallelize(validationData, 2)
      val validationH2ORDD = toDataFrame(sc, validationRDD)
      val predictionH2OFrame = new DataFrame(dlModel.score(validationH2ORDD))('predict)
      val predictionRDD = toRDD[DoubleHolder](sc, predictionH2OFrame) // will be implicit in the future

      // Validate prediction
      validatePrediction(
        predictionRDD.collect().map(_.predict.getOrElse(Double.NaN)),
        validationData)
    }

Brand-Sparkling-New Sneak Preview!


H2O + R = Happy Data Scientist

John Chambers (creator of the S language, R-core member) names H2O R API in top three promising R projects

H2O R CRAN package

Machine Learning on Big Data with R: data resides on the H2O cluster


Higgs Particle Discovery

Higgs vs. Background

Large Hadron Collider: largest experiment of mankind! $13+ billion, 16.8 miles long, 120 megawatts, -456F, 1 PB/day, etc. The Higgs boson discovery (July '12) led to the 2013 Nobel prize.

http://arxiv.org/pdf/1402.4735v2.pdf

Images courtesy CERN / LHC

Machine Learning Meets Physics

Or rather: back to the roots (the WWW was invented at CERN in '89...)


Higgs: Binary Classification Problem

Current methods of choice for physicists:
- Boosted Decision Trees
- Neural networks with 1 hidden layer
BUT: must first add derived high-level features (physics formulae)

HIGGS UCI Dataset: 21 low-level features AND 7 high-level derived features
Train: 10M rows, Test: 500k rows

Algorithm                     low-level H2O AUC    all features H2O AUC
Generalized Linear Model      0.596                0.684
Random Forest                 0.764                0.840
Gradient Boosted Trees        0.753                0.839
Neural Net, 1 hidden layer    0.760                0.830

(The "all features" column adds the derived features.)

Metric: AUC = Area under the ROC curve (range 0.5...1, higher is better)
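To make the AUC metric above concrete, here is a minimal sketch in plain Python. It relies on the standard interpretation of AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one (ties count as one half). The function name `auc` and the toy labels and scores are illustrative assumptions, not part of the talk or of H2O.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 0/1 class labels (1 = signal, 0 = background)
    scores: predicted scores, higher means "more likely signal"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count half
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration): 3 of the 4 positive/negative pairs
# are ordered correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of rows, so it only suits small examples; a rank-based formula would be the usual choice for data the size of the HIGGS set.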


Higgs: Can Deep Learning Do Better?

Let's build an H2O Deep Learning model and find out! (That was my last weekend.)

Deep Learning: <Your guess goes here>

Reference paper results: baseline 0.733


What is Deep Learning?

Wikipedia: "Deep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using architectures composed of multiple non-linear transformations."

Example: input data (image) -> prediction (who is it?)

Facebook's DeepFace (Yann LeCun) recognises faces as well as humans


What is NOT Deep

- Linear models are not deep (by definition)
- Neural nets with 1 hidden layer are not deep (only 1 layer, no feature hierarchy)
- SVMs and kernel methods are not deep (2 layers: kernel + linear)
- Classification trees are not deep (operate on original input space, no new features generated)


Deep Learning is Trending

[Chart: Google Trends, 2009 to 2013.]

Businesses are using Deep Learning techniques:
- Google Brain (Andrew Ng, Jeff Dean & Geoffrey Hinton)
- FBI FACE: $1 billion face recognition project
- Chinese Search Giant Baidu Hires Man Behind the "Google Brain" (Andrew Ng)


Deep Learning History

slides by Yann LeCun (now Facebook)

Deep Learning wins competitions AND makes humans, businesses and machines (cyborgs!?) smarter


Deep Learning in H2O

1970s multi-layer feed-forward Neural Network (supervised learning with stochastic gradient descent using back-propagation)
+ distributed processing for big data (H2O in-memory MapReduce paradigm on distributed data)
+ multi-threaded speedup (H2O Fork/Join worker threads update the model asynchronously)
+ smart algorithms for accuracy (weight initialization, adaptive learning rate, momentum, dropout regularization, L1/L2 regularization, grid search, checkpointing, auto-tuning, model averaging)
= Top-notch prediction engine!


Example Neural Network

"Fully connected" directed graph of neurons

[Diagram: information flows from the input layer (3 neurons: age, income, employment) through hidden layer 1 (4 neurons) and hidden layer 2 (3 neurons) to the output layer (2 neurons: married, single), with 3x4, 4x3 and 3x2 connections.]


Prediction: Forward Propagation

"Neurons activate each other via weighted sums"

Inputs $x_i$ (age, income, employment), hidden activations $y_j$ and $z_k$, outputs $p_l$ (married, single):

$y_j = \tanh\left(\sum_i x_i u_{ij} + b_j\right)$

$z_k = \tanh\left(\sum_j y_j v_{jk} + c_k\right)$

$p_l = \mathrm{softmax}\left(\sum_k z_k w_{kl} + d_l\right)$

$\mathrm{softmax}(x_k) = \exp(x_k) / \sum_{k'} \exp(x_{k'})$

Per-class probabilities: $\sum_l p_l = 1$

$b_j$, $c_k$, $d_l$: bias values (independent of inputs)

Activation function: tanh; alternative: $x \to \max(0, x)$ ("rectifier")

$p_l$ is a non-linear function of $x_i$: can approximate ANY function with enough layers!
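The forward-propagation formulas on this slide can be sketched in a few lines of plain Python. This is only an illustration of the math for the 3-4-3-2 example network, not H2O's implementation. The helper names (`dense`, `forward`) and all weight and input values are made up for the example.

```python
import math


def softmax(xs):
    """softmax(x_k) = exp(x_k) / sum_k' exp(x_k'), shifted by max(xs) for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Weighted sums: out_j = sum_i inputs[i] * weights[i][j] + biases[j]."""
    return [
        sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j]
        for j in range(len(biases))
    ]


def forward(x, u, b, v, c, w, d):
    """Two tanh hidden layers followed by a softmax output layer."""
    y = [math.tanh(a) for a in dense(x, u, b)]   # hidden layer 1
    z = [math.tanh(a) for a in dense(y, v, c)]   # hidden layer 2
    return softmax(dense(z, w, d))               # per-class probabilities


# Hypothetical standardized inputs (age, income, employment) and weights.
x = [0.5, -1.2, 0.3]
u = [[0.1, -0.2, 0.3, 0.0], [0.4, 0.1, -0.3, 0.2], [-0.1, 0.2, 0.1, -0.4]]  # 3x4
b = [0.0, 0.1, -0.1, 0.0]
v = [[0.2, -0.1, 0.3], [0.0, 0.4, -0.2], [0.1, 0.1, 0.1], [-0.3, 0.2, 0.0]]  # 4x3
c = [0.0, 0.0, 0.1]
w = [[0.5, -0.5], [0.2, 0.1], [-0.3, 0.4]]                                   # 3x2
d = [0.0, 0.0]

p = forward(x, u, b, v, c, w, d)   # [P(married), P(single)]
```

Swapping `math.tanh` for `lambda a: max(0.0, a)` gives the rectifier alternative mentioned on the slide.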


Data Preparation & Initialization

Neural networks are sensitive to numerical noise and operate best in the linear regime (not saturated).

Automatic standardization of data:
- $x_i$: mean = 0, stddev = 1
- horizontalize categorical variables, e.g. {full-time, part-time, none, self-employed} -> {0,1,0} = part-time, {0,0,0} = self-employed

Automatic initialization of weights $w_{kl}$:
- Poor man's initialization: random weights
- Default (better): uniform distribution in $\pm\sqrt{6 / (\text{units} + \text{units\_previous\_layer})}$
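The three preparation steps on this slide can be sketched in plain Python as follows. This illustrates the ideas and is not H2O's code. Two assumptions are made: the standard deviation is the population one, and the last categorical level is the all-zeros reference level, which is one encoding consistent with the slide's "010 = part-time, 000 = self-employed" example.

```python
import math
import random


def standardize(column):
    """Rescale a numeric column to mean 0 and (population) standard deviation 1."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    if std == 0.0:
        return [0.0] * n          # constant column carries no information
    return [(v - mean) / std for v in column]


def one_hot(value, levels):
    """Expand a categorical value into len(levels) - 1 binary columns.

    Assumption: the last level is the reference level (all zeros).
    """
    return [1 if value == level else 0 for level in levels[:-1]]


def uniform_init(units_previous_layer, units, rng):
    """Weights drawn uniformly from +- sqrt(6 / (units + units_previous_layer))."""
    limit = math.sqrt(6.0 / (units + units_previous_layer))
    return [
        [rng.uniform(-limit, limit) for _ in range(units)]
        for _ in range(units_previous_layer)
    ]


EMPLOYMENT_LEVELS = ["full-time", "part-time", "none", "self-employed"]

ages = standardize([23.0, 35.0, 47.0, 59.0])             # made-up values
part_time = one_hot("part-time", EMPLOYMENT_LEVELS)      # [0, 1, 0]
self_employed = one_hot("self-employed", EMPLOYMENT_LEVELS)  # [0, 0, 0]
weights = uniform_init(3, 4, random.Random(42))          # 3x4 weight matrix
```

Scaling the initialization range with the layer sizes keeps the weighted sums small, which matches the slide's point about staying in the non-saturated regime of the activation function.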


Training: Update Weights & Biases

For each training row, we make a prediction and compare it with the actual label (supervised learning):

           predicted   actual
married    0.8         1
single     0.2         0

Objective: minimize prediction error (MSE or cross-entropy)

Mean Square Error = $(0.2^2 + 0.2^2)/2$ ("penalize differences per-class")
Cross-entropy = $-\log(0.8)$ ("strongly penalize non-1-ness")

Stochastic Gradient Descent: update weights and biases via the gradient of the error (via back-propagation):

$w \leftarrow w - \text{rate} \cdot \partial E / \partial w$

[Plot: error E versus weight w, with the step size labeled "rate".]
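As a worked check of the two error measures on this slide, the following sketch evaluates them for the married/single example (predicted 0.8 and 0.2, actual 1 and 0) and applies one gradient-descent update. The function names and the numbers passed to `sgd_update` are illustrative assumptions.

```python
import math


def mean_squared_error(predicted, actual):
    """Average of the squared per-class differences."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


def cross_entropy(predicted, actual):
    """-sum(actual * log(predicted)); with a one-hot label only the true class contributes."""
    return -sum(a * math.log(p) for p, a in zip(predicted, actual) if a > 0)


def sgd_update(w, grad, rate):
    """One stochastic gradient descent step: w <- w - rate * dE/dw."""
    return w - rate * grad


predicted = [0.8, 0.2]   # married, single
actual = [1, 0]

mse = mean_squared_error(predicted, actual)   # (0.2^2 + 0.2^2) / 2
xent = cross_entropy(predicted, actual)       # -log(0.8)
new_w = sgd_update(1.0, grad=0.5, rate=0.1)   # hypothetical weight and gradient
```

Cross-entropy grows without bound as the probability of the true class approaches zero, which is the "strongly penalize non-1-ness" behaviour the slide refers to.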


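Backward Propagation

How to compute $\partial E / \partial w_i$ for $w_i \leftarrow w_i - \text{rate} \cdot \partial E / \partial w_i$?

Naive: for every $i$, evaluate $E$ twice at $(w_1, \ldots, w_i \pm \Delta, \ldots, w_N)$... Slow!

Backprop: compute $\partial E / \partial w_i$ via the chain rule, going backwards:

$\text{net} = \sum_i w_i x_i + b$

$y = \text{activation}(\text{net})$

$E = \text{error}(y)$

$\partial E / \partial w_i = \partial E / \partial y \cdot \partial y / \partial \text{net} \cdot \partial \text{net} / \partial w_i = \partial(\text{error}(y)) / \partial y \cdot \partial(\text{activation}(\text{net})) / \partial \text{net} \cdot x_i$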
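The chain rule above can be checked against the naive "evaluate E twice" approach for a single neuron. The sketch below assumes a tanh activation and a squared error $E = (y - \text{target})^2 / 2$. The slide leaves both functions generic, so these choices, like the function names and sample values, are illustrative only.

```python
import math


def forward(w, b, x):
    """y = activation(net) with net = sum_i(w_i * x_i) + b and a tanh activation."""
    net = sum(wi * xi for wi, xi in zip(w, x)) + b
    return math.tanh(net)


def error(y, target):
    """Assumed error function: E = (y - target)^2 / 2."""
    return 0.5 * (y - target) ** 2


def backprop_gradient(w, b, x, target):
    """dE/dw_i = dE/dy * dy/dnet * dnet/dw_i, evaluated with one forward pass."""
    y = forward(w, b, x)
    dE_dy = y - target            # derivative of (y - target)^2 / 2
    dy_dnet = 1.0 - y * y         # derivative of tanh(net)
    return [dE_dy * dy_dnet * xi for xi in x]   # dnet/dw_i = x_i


def numerical_gradient(w, b, x, target, delta=1e-6):
    """Naive approach: evaluate E twice per weight, at w_i + delta and w_i - delta."""
    grads = []
    for i in range(len(w)):
        up = list(w)
        up[i] += delta
        down = list(w)
        down[i] -= delta
        e_up = error(forward(up, b, x), target)
        e_down = error(forward(down, b, x), target)
        grads.append((e_up - e_down) / (2.0 * delta))
    return grads


w = [0.3, -0.2, 0.5]     # made-up weights
x = [1.0, 2.0, -1.5]     # made-up inputs
analytic = backprop_gradient(w, 0.1, x, target=1.0)
numeric = numerical_gradient(w, 0.1, x, target=1.0)
```

The naive version needs two forward passes per weight, while the chain-rule version reuses a single pass for all weights, which is why the slide calls the former slow.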


H2O Deep Learning Architecture

[Diagram labels: K-V store and HTTPD on each node; nodes/JVMs: sync; threads: async; communication.]

1. Initial model: weights and biases $w$ (H2O atomic in-memory K-V store).
2. Map: each node trains a copy of the weights and biases with (some or all of) its local data, using asynchronous F/J threads, giving $w_1, w_2, w_3, w_4$ on four nodes.
3. Reduce (model averaging): average weights and biases from all nodes; the partial sums $w_1 + w_3$ and $w_2 + w_4$ are combined into $w^* = (w_1 + w_2 + w_3 + w_4)/4$.
4. Updated model: $w^*$.

The number of training points per MapReduce iteration is auto-tuned (default) or user-specified.

Keep iterating over the data ("epochs"), score from time to time.

Query & display the model via JSON, WWW.

Speedup is at least #nodes/log(#rows) (arXiv:1209.4129v3).
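The reduce step on this slide, averaging the per-node copies of the weights, can be sketched as below. This is a single-process illustration of the arithmetic only; it does not model H2O's distributed K-V store or its threads. The function name and the four weight vectors are made up.

```python
def average_models(node_weights):
    """Element-wise average of the weight vectors trained on each node.

    node_weights: one flat list of weights (and biases) per node,
    all of the same length.
    """
    if not node_weights:
        raise ValueError("need at least one node")
    length = len(node_weights[0])
    if any(len(ws) != length for ws in node_weights):
        raise ValueError("all nodes must hold the same number of weights")
    n = len(node_weights)
    return [sum(ws) / n for ws in zip(*node_weights)]


# Hypothetical weights after the map step on four nodes.
w1 = [1.0, 2.0]
w2 = [3.0, 4.0]
w3 = [5.0, 6.0]
w4 = [7.0, 8.0]

w_star = average_models([w1, w2, w3, w4])   # (w1 + w2 + w3 + w4) / 4
```

Because addition is associative, the sum can be built pairwise (for example $w_1 + w_3$ and $w_2 + w_4$ first), which is what makes the averaging fit a tree-shaped reduce.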


"Secret" Sauce to Higher Accuracy

Adaptive learning rate, ADADELTA (Google): automatically set the learning rate for each neuron based on its training history

Grid Search and Checkpointing: run a grid search to scan many hyper-parameters, then continue training the most promising model(s)

Regularization:
- L1: penalizes non-zero weights
- L2: penalizes large weights
- Dropout: randomly ignore certain inputs


Detail: Adaptive Learning Rate

Compute a moving average of $\Delta w_i^2$ at time $t$ for window length $\rho$:

$E[\Delta w_i^2]_t = \rho \, E[\Delta w_i^2]_{t-1} + (1 - \rho) \, \Delta w_i^2$

Compute the RMS of $\Delta w_i$ at time $t$ with smoothing $\epsilon$:

$\mathrm{RMS}[\Delta w_i]_t = \sqrt{E[\Delta w_i^2]_t + \epsilon}$

Do the same for $\partial E / \partial w_i$, then obtain the per-weight learning rate:

$\text{rate}(w_i, t) = \mathrm{RMS}[\Delta w_i]_{t-1} / \mathrm{RMS}[\partial E / \partial w_i]_t$

Adaptive annealing / progress: gradient-dependent learning rate; the moving window prevents "freezing" (unlike ADAGRAD: no window)

Adaptive acceleration / momentum: accumulate previous weight updates, but over a window of time

cf. ADADELTA paper
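A minimal single-weight sketch of the update rule above, following the formulas on this slide. It is an illustration and not H2O's implementation. The function name, the default values of `rho` and `epsilon`, and the toy objective $E = w^2$ are assumptions made for the example.

```python
import math


def adadelta_step(w, grad, state, rho=0.99, epsilon=1e-8):
    """One ADADELTA update for a single weight.

    state = (E[grad^2], E[dw^2]), the two moving averages from the previous step.
    Returns the updated weight and the updated state.
    """
    avg_grad_sq, avg_dw_sq = state
    # Moving average of the squared gradient at time t.
    avg_grad_sq = rho * avg_grad_sq + (1.0 - rho) * grad * grad
    # rate(w, t) = RMS[dw]_{t-1} / RMS[dE/dw]_t
    rate = math.sqrt(avg_dw_sq + epsilon) / math.sqrt(avg_grad_sq + epsilon)
    dw = -rate * grad
    # Moving average of the squared weight update at time t.
    avg_dw_sq = rho * avg_dw_sq + (1.0 - rho) * dw * dw
    return w + dw, (avg_grad_sq, avg_dw_sq)


# Toy objective E = w^2 with gradient dE/dw = 2w, starting from w = 1.
w = 1.0
state = (0.0, 0.0)
history = [w]
for _ in range(50):
    w, state = adadelta_step(w, 2.0 * w, state, rho=0.99, epsilon=1e-6)
    history.append(w)
```

No global learning rate appears anywhere: the step size comes entirely from the two moving averages, with `epsilon` setting the size of the very first steps.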


Detail: Dropout Regularization

Training: for each hidden neuron, for each training sample, for each iteration, ignore (zero out) a different random fraction $p$ of input activations.

[Diagram: the example network (age, income, employment -> married, single) with some activations crossed out (X).]

Testing: use all activations, but scale them down by a factor of $1 - p$ (to "simulate" the missing activations during training).

cf. Geoff Hinton's paper
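The training and testing behaviour of dropout can be sketched as follows. Here `p` is the fraction of activations that is dropped, so the test-time scaling keeps a share of 1 - p of each activation (for p = 0.5 this is the familiar halving). The function names are illustrative and the code is a plain-Python sketch, not H2O's implementation.

```python
import random


def dropout_train(activations, p, rng):
    """Training: zero out each input activation independently with probability p."""
    return [0.0 if rng.random() < p else a for a in activations]


def dropout_test(activations, p):
    """Testing: keep all activations, scaled by the expected kept share 1 - p."""
    return [a * (1.0 - p) for a in activations]


rng = random.Random(0)
activations = [0.7, -0.3, 1.2, 0.1]           # made-up hidden-layer inputs

dropped = dropout_train(activations, 0.5, rng)   # a different mask on every call
scaled = dropout_test(activations, 0.5)          # every value halved
```

Drawing a fresh mask for every training sample is what makes each neuron unable to rely on any particular input being present.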


MNIST: digits classification

MNIST = digitized handwritten digits database (Yann LeCun)

Data: 28x28 = 784 pixels with (gray-scale) values in 0...255

Train: 60,000 rows, 784 integer columns, 10 classes
Test: 10,000 rows, 784 integer columns, 10 classes

Standing world record: without distortions or convolutions, the best-ever published error rate on the test set is 0.83% (Microsoft)

Yann LeCun: "Yet another advice: don't get fooled by people who claim to have a solution to Artificial General Intelligence. Ask them what error rate they get on MNIST or ImageNet."

Let's see how H2O does on the MNIST dataset!


H2O Deep Learning on MNIST: 0.87% test set error (so far)

Test set error: 1.5% after 10 mins, 1.0% after 1.5 hours, 0.87% after 4 hours

Frequent errors: confuse 2/7 and 4/9

World-class results!
- No pre-training
- No distortions
- No convolutions
- No unsupervised training

Running on 4 nodes with 16 cores each

H2O Deep Learning, A. Candel

Weather Dataset

Predict "RainTomorrow" from Temperature, Humidity, Wind, Pressure, etc.

Live Demo: Weather Prediction

Interactive ROC curve with real-time updates

3 hidden Rectifier layers, Dropout, L1-penalty

12.7% 5-fold cross-validation error is at least as good as GBM/RF/GLM models


Live Demo: Grid Search

How did I find those parameters? Grid Search! (works for multiple hyper-parameters at once)

Then continue training the best model
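A grid search over several hyper-parameters at once amounts to evaluating every combination and keeping the best one. The sketch below shows that loop in plain Python. `train_and_score` stands in for "train a model and return its validation error". It, the parameter names `hidden` and `l1`, and the toy scoring function are hypothetical placeholders, not H2O API calls.

```python
import itertools


def grid_search(train_and_score, grid):
    """Try every combination in `grid` and return (best_params, best_error).

    grid: dict mapping a hyper-parameter name to the list of values to try.
    train_and_score: callable taking the hyper-parameters as keyword
    arguments and returning a validation error (lower is better).
    """
    names = sorted(grid)
    best_params, best_error = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        err = train_and_score(**params)
        if err < best_error:          # on ties, the first combination wins
            best_params, best_error = params, err
    return best_params, best_error


def toy_score(hidden, l1):
    """Made-up stand-in for model training: pretends 200 units and a small L1 are best."""
    return abs(hidden - 200) / 1000.0 + l1 * 100.0


toy_grid = {"hidden": [50, 100, 200, 400], "l1": [1e-5, 1e-4, 1e-3]}
best_params, best_error = grid_search(toy_score, toy_grid)
```

The number of combinations is the product of the list lengths (12 here), so in practice each candidate is trained only briefly and the most promising one is then trained further, as the slide suggests.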


Text Classification

Goal: predict the item from the seller's text description

"Vintage 18KT gold Rolex 2 Tone in great condition"

Data: binary word vector 0,0,1,0,0,0,0,0,1,0,0,0,1,...,0 (the set bits mark the words "vintage", "gold" and "condition")

Train: 578,361 rows, 8,647 cols, 467 classes
Test: 64,263 rows, 8,647 cols, 143 classes

Let's see how H2O does on the eBay dataset!
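The binary word vector described on this slide can be built as in the sketch below. The six-word vocabulary is a made-up stand-in for the 8,647 columns of the real dataset, and the whitespace tokenization is a simplifying assumption, since the talk does not say how the text was tokenized.

```python
def binary_word_vector(description, vocabulary):
    """Return one 0/1 entry per vocabulary word: 1 if the word occurs in the text."""
    words = set(description.lower().split())
    return [1 if term in words else 0 for term in vocabulary]


# Hypothetical vocabulary; the real dataset has 8,647 word columns.
VOCABULARY = ["watch", "vintage", "silver", "gold", "new", "condition"]

vector = binary_word_vector(
    "Vintage 18KT gold Rolex 2 Tone in great condition", VOCABULARY
)
```

Since most descriptions contain only a handful of vocabulary words, almost all entries are zero, which is why such data compresses so well in a columnar store.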


Text Classification

Out-of-the-box: 11.6% test set error after 10 epochs! Predicts the correct class (out of 143) 88.4% of the time!

Note 1: the H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Note 2: No tuning was done (results are for illustration only)


Parallel Scalability (for 64 epochs on MNIST, with "0.87%" parameters)

[Charts: speedup (axis 0 to 40) and training time in minutes (axis 0 to 100) versus number of H2O nodes (1, 2, 4, 8, 16, 32, 63); 4 cores per node, 1 epoch per node per MapReduce; one point is annotated "2.7 mins".]


Deep Learning Auto-Encoders for Anomaly Detection

Toy example: find anomalies in ECG heart beat data. First, train a model on what's "normal": 20 time-series samples of 210 data points each.

Deep Auto-Encoder: learn the low-dimensional non-linear "structure" of the data that allows reconstructing the original data. Also works for categorical data.

Test set with anomaly + model of what's "normal" => the test set prediction is the reconstruction, which looks "normal"

Found anomaly! Large reconstruction error
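The last step of this recipe, flagging samples whose reconstruction error is large, can be sketched as follows. The sketch assumes the reconstructions have already been produced by an auto-encoder trained on "normal" data. It does not train one, and the function names, the threshold, and the tiny series are made up for illustration.

```python
def reconstruction_errors(samples, reconstructions):
    """Mean squared error between each sample and its reconstruction."""
    return [
        sum((a - b) ** 2 for a, b in zip(sample, recon)) / len(sample)
        for sample, recon in zip(samples, reconstructions)
    ]


def flag_anomalies(errors, threshold):
    """Indices of samples whose reconstruction error exceeds the threshold."""
    return [i for i, e in enumerate(errors) if e > threshold]


# Made-up time series: the second one contains a spike that a model of
# "normal" data would not reproduce.
samples = [[0.0, 1.0, 0.0, -1.0], [0.0, 1.0, 3.0, -1.0]]
reconstructions = [[0.0, 1.0, 0.0, -1.0], [0.0, 1.0, 0.0, -1.0]]

errors = reconstruction_errors(samples, reconstructions)
anomalies = flag_anomalies(errors, threshold=0.5)   # hypothetical threshold
```

Because the model only learned what "normal" looks like, its reconstruction of the anomalous series also looks normal, and the gap between the two is what exposes the anomaly.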



H2O brings Deep Learning to R

All parameters are available from R...

R Vignette with example R scripts: http://0xdata.com/h2o/algorithms


POJO Model Export for Production Scoring

Plain old Java code is auto-generated to take your H2O Deep Learning models into production!


Higgs Particle Discovery with H2O

How well did H2O Deep Learning do?

Let's see how H2O did in the past 30 minutes!

Any guesses for AUC on low-level features? AUC = 0.76 was the best for RF/GBM/NN (H2O).

Reference paper results


H2O Steam: Scoring Platform

Higgs Dataset Demo on a 10-node cluster: let's score all our H2O models and compare them!

http://server:port/steam/index.html

Live Demo: Scoring Higgs Models in H2O Steam

Live demo on a 10-node cluster: <10 minutes runtime for all algos! Better than the LHC baseline of AUC = 0.73!


Higgs Particle Detection with H2O

Parameters are not heavily tuned; H2O running on 10 nodes. "l-l" = low-level features.

Algorithm                        Paper's l-l AUC   low-level H2O AUC   all features H2O AUC   Parameters
Generalized Linear Model         -                 0.596               0.684                  default, binomial
Random Forest                    -                 0.764               0.840                  50 trees, max depth 50
Gradient Boosted Trees           0.73              0.753               0.839                  50 trees, max depth 15
Neural Net, 1 layer              0.733             0.760               0.830                  1x300 Rectifier, 100 epochs
Deep Learning, 3 hidden layers   0.836             0.850               -                      3x1000 Rectifier, L2=1e-5, 40 epochs
Deep Learning, 4 hidden layers   0.868             0.869               -                      4x500 Rectifier, L1=L2=1e-5, 300 epochs
Deep Learning, 6 hidden layers   0.880             running             -                      6x500 Rectifier, L1=L2=1e-5

Deep Learning on low-level features alone beats everything else! H2O preliminary results compare well with the paper's results (TMVA & Theano).

Nature paper: http://arxiv.org/pdf/1402.4735v2.pdf



Tips for H2O Deep Learning

General:
- More layers for more complex functions (exp. more non-linearity)
- More neurons per layer to detect finer structure in data ("memorizing")
- Add some regularization for less overfitting (lower validation set error)

Specifically:
- Do a grid search to get a feel for convergence, then continue training
- Try Tanh/Rectifier; try max_w2 = 10...50, L1 = 1e-5...1e-3 and/or L2 = 1e-5...1e-3
- Try Dropout (input: up to 20%, hidden: up to 50%) with a test/validation set. Input dropout is recommended for noisy high-dimensional input

Distributed:
- More training samples per iteration: faster, but less accuracy?

With ADADELTA:
- Try epsilon = 1e-4, 1e-6, 1e-8, 1e-10 and rho = 0.9, 0.95, 0.99

Without ADADELTA:
- Try rate = 1e-4...1e-2, rate_annealing = 1e-5...1e-9, momentum_start = 0.5...0.9, momentum_stable = 0.99, momentum_ramp = 1/rate_annealing

- Try balance_classes = true for datasets with large class imbalance
- Enable force_load_balance for small datasets
- Enable replicate_training_data if each node can hold all the data


Extensions for H2O Deep Learning

- Vision: Convolutional & Pooling Layers (PUB-644)
- Anomaly Detection (PUB-806)
- Pre-Training: Stacked Auto-Encoders (PUB-1014)
- Faster Training: GPGPU support (PUB-1013)
- Language/Sequences: Recurrent Neural Networks
- Benchmark vs other Deep Learning packages
- Investigate other optimization algorithms

Contribute to H2O! Add your own JIRA tickets!


Key Take-Aways

H2O is a distributed in-memory data science platform. It was designed for high-performance machine learning applications on big data.

H2O Deep Learning is ready to take your advanced analytics to the next level. Try it on your data!

Join our Community and Meetups!
https://github.com/h2oai
http://docs.h2o.ai
www.h2o.ai/community
@h2oai

Thank you!

Who am IPhD in Computational Physics 2005

from ETH Zurich Switzerland

6 years at SLAC - Accelerator Physics Modeling 2 years at Skytree Inc - Machine Learning 9 months at 0xdataH2O - Machine Learning

15 years in HPCSupercomputingModeling

Named ldquo2014 Big Data All-Starrdquo by Fortune Magazine

ArnoCandel






1

17







text classification

Q amp A (5 mins)

4




5



6

Requirements Value







H2O Integration

H2O

HDFS HDFS HDFS

YARN Hadoop MR

R ScalaJSON Python


7

H2O H2O

Java


H2O Architecture


Machine Learning

Algorithms

R EngineNano fast

Scoring Engine

Prediction Engine

Memory manager

eg Deep Learning

8

MapReduce










H2O R CRAN package



12




Higgsvs

Background















add derived

features









Deep Learning









16







17



20132009

Google trends

2011

18





19









age

income

employment

married

single


Hidden layer 2

Output layer


information flow


4 3 2neurons 3



age

income


uij

xi

yj



vjk

zk pl


wkl







with enough layers


22

married

single


age

income

employment

xi









23

married

single

wkl









1

24

single002

E

wrate






wi


xiE = error(y)

y = activation(net)



25



K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4




threads


updated model w







JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i


26





27










RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =


cf ADADELTA paper

28




age

income

employment

married

singleX

X

X







30









31


World-class results



training



Weather Dataset32





33


L1-penalty






34




35






Text Classification



36




Text Classification



37

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes


27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes



38









+

=gt








41











43



Live Demo






low-level H2O AUC

all featuresH2O AUC















46















48

Thank you






1

17







text classification

Q amp A (5 mins)

4




5



6

Requirements Value







H2O Integration

H2O

HDFS HDFS HDFS

YARN Hadoop MR

R ScalaJSON Python


7

H2O H2O

Java


H2O Architecture


Machine Learning

Algorithms

R EngineNano fast

Scoring Engine

Prediction Engine

Memory manager

eg Deep Learning

8

MapReduce










H2O R CRAN package



12




Higgsvs

Background















add derived

features









Deep Learning









16







17



20132009

Google trends

2011

18





19









age

income

employment

married

single


Hidden layer 2

Output layer


information flow


4 3 2neurons 3



age

income


uij

xi

yj



vjk

zk pl


wkl







with enough layers


22

married

single


age

income

employment

xi









23

married

single

wkl









1

24

single002

E

wrate






wi


xiE = error(y)

y = activation(net)



25



K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4




threads


updated model w







JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i


26





27










RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =


cf ADADELTA paper

28




age

income

employment

married

singleX

X

X







30









31


World-class results



training



Weather Dataset32





33


L1-penalty






34




35






Text Classification



36




Text Classification



37

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes


27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes



38









+

=gt








41











43



Live Demo






low-level H2O AUC

all featuresH2O AUC















46















48

Thank you







text classification

Q amp A (5 mins)

4




5



6

Requirements Value







H2O Integration

H2O

HDFS HDFS HDFS

YARN Hadoop MR

R ScalaJSON Python


7

H2O H2O

Java


H2O Architecture


Machine Learning

Algorithms

R EngineNano fast

Scoring Engine

Prediction Engine

Memory manager

eg Deep Learning

8

MapReduce










H2O R CRAN package



12




Higgsvs

Background















add derived

features









Deep Learning









16







17



20132009

Google trends

2011

18





19









age

income

employment

married

single


Hidden layer 2

Output layer


information flow


4 3 2neurons 3



age

income


uij

xi

yj



vjk

zk pl


wkl







with enough layers


22

married

single


age

income

employment

xi









23

married

single

wkl









1

24

single002

E

wrate






wi


xiE = error(y)

y = activation(net)



25



K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4




threads


updated model w







JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i


26





27










RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =


cf ADADELTA paper

28




age

income

employment

married

singleX

X

X







30









31


World-class results



training



Weather Dataset32





33


L1-penalty






34




35






Text Classification



36




Text Classification



37

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes


27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes



38









+

=gt








41











43



Live Demo






low-level H2O AUC

all featuresH2O AUC















46















48

Thank you




5



6

Requirements Value







H2O Integration

H2O

HDFS HDFS HDFS

YARN Hadoop MR

R ScalaJSON Python


7

H2O H2O

Java


H2O Architecture


Machine Learning

Algorithms

R EngineNano fast

Scoring Engine

Prediction Engine

Memory manager

eg Deep Learning

8

MapReduce










H2O R CRAN package



12




Higgsvs

Background















add derived

features









Deep Learning









16







17



20132009

Google trends

2011

18





19









age

income

employment

married

single


Hidden layer 2

Output layer


information flow


4 3 2neurons 3



age

income


uij

xi

yj



vjk

zk pl


wkl







with enough layers


22

married

single


age

income

employment

xi









23

married

single

wkl









1

24

single002

E

wrate






wi


xiE = error(y)

y = activation(net)



25



K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4




threads


updated model w







JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i


26





27










RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =


cf ADADELTA paper

28




age

income

employment

married

singleX

X

X







30









31


World-class results



training



Weather Dataset32





33


L1-penalty






34




35






Text Classification



36




Text Classification



37

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes


27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes



38









+

=gt








41











43



Live Demo






low-level H2O AUC

all featuresH2O AUC















46















48

Thank you



6

Requirements Value







H2O Integration

H2O

HDFS HDFS HDFS

YARN Hadoop MR

R ScalaJSON Python


7

H2O H2O

Java


H2O Architecture


Machine Learning

Algorithms

R EngineNano fast

Scoring Engine

Prediction Engine

Memory manager

eg Deep Learning

8

MapReduce










H2O R CRAN package



12




Higgsvs

Background















add derived

features









Deep Learning









16







17



20132009

Google trends

2011

18





19









age

income

employment

married

single


Hidden layer 2

Output layer


information flow


4 3 2neurons 3



age

income


uij

xi

yj



vjk

zk pl


wkl







with enough layers


22

married

single


age

income

employment

xi









23

married

single

wkl









1

24

single002

E

wrate






wi


xiE = error(y)

y = activation(net)



25



K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4




threads


updated model w







JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i


26





27










RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =


cf ADADELTA paper

28




age

income

employment

married

singleX

X

X







30









31


World-class results



training



Weather Dataset32





33


L1-penalty






34




35






Text Classification



36




Text Classification



37

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes


27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes



38









+

=gt








41











43



Live Demo






low-level H2O AUC

all featuresH2O AUC















46















48

Thank you


H2O Integration

H2O

HDFS HDFS HDFS

YARN Hadoop MR

R ScalaJSON Python


7

H2O H2O

Java


H2O Architecture


Machine Learning

Algorithms

R EngineNano fast

Scoring Engine

Prediction Engine

Memory manager

eg Deep Learning

8

MapReduce










H2O R CRAN package



12




Higgsvs

Background















add derived

features









Deep Learning









16







17



20132009

Google trends

2011

18





19









age

income

employment

married

single


Hidden layer 2

Output layer


information flow


4 3 2neurons 3



age

income


uij

xi

yj



vjk

zk pl


wkl







with enough layers


22

married

single


age

income

employment

xi









23

married

single

wkl









1

24

single002

E

wrate






wi


xiE = error(y)

y = activation(net)



25



K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4




threads


updated model w







JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i


26





27










RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =


cf ADADELTA paper

28




age

income

employment

married

singleX

X

X







30









31


World-class results



training



Weather Dataset32





33


L1-penalty






34




35






Text Classification



36




Text Classification



37

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes


27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes



38









+

=gt








41











43



Live Demo






low-level H2O AUC

all featuresH2O AUC















46















48

Thank you


H2O Architecture


Machine Learning

Algorithms

R EngineNano fast

Scoring Engine

Prediction Engine

Memory manager

eg Deep Learning

8

MapReduce










H2O R CRAN package



12




Higgsvs

Background















add derived

features









Deep Learning









16







17



20132009

Google trends

2011

18





19









age

income

employment

married

single


Hidden layer 2

Output layer


information flow


4 3 2neurons 3



age

income


uij

xi

yj



vjk

zk pl


wkl







with enough layers


22

married

single


age

income

employment

xi









23

married

single

wkl









1

24

single002

E

wrate






wi


xiE = error(y)

y = activation(net)



25



K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4




threads


updated model w







JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i


26





27










RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =


cf ADADELTA paper

28




age

income

employment

married

singleX

X

X







30









31


World-class results



training



Weather Dataset32





33


L1-penalty






34




35






Text Classification



36




Text Classification



37

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes


27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes



38









+

=gt








41











43



Live Demo






low-level H2O AUC

all featuresH2O AUC















46















48

Thank you










H2O R CRAN package



12




Higgsvs

Background















add derived

features









Deep Learning









16







17



20132009

Google trends

2011

18





19









age

income

employment

married

single


Hidden layer 2

Output layer


information flow


4 3 2neurons 3



age

income


uij

xi

yj



vjk

zk pl


wkl







with enough layers


22

married

single


age

income

employment

xi









23

married

single

wkl









1

24

single002

E

wrate






wi


xiE = error(y)

y = activation(net)



25



K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4




threads


updated model w







JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i


26





27










RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =


cf ADADELTA paper

28




age

income

employment

married

singleX

X

X







30









31


World-class results



training



Weather Dataset32





33


L1-penalty






34




35






Text Classification



36




Text Classification



37

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes


27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes



38









+

=gt








41











43



Live Demo






low-level H2O AUC

all featuresH2O AUC















46















48

Thank you







H2O R CRAN package



12




Higgsvs

Background















add derived

features









Deep Learning









16







17



20132009

Google trends

2011

18





19









age

income

employment

married

single


Hidden layer 2

Output layer


information flow


4 3 2neurons 3



age

income


uij

xi

yj



vjk

zk pl


wkl







with enough layers


22

married

single


age

income

employment

xi









23

married

single

wkl









1

24

single002

E

wrate






wi


xiE = error(y)

y = activation(net)



25



K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4




threads


updated model w







JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i


26





27










RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =


cf ADADELTA paper

28




age

income

employment

married

singleX

X

X







30









31


World-class results



training



Weather Dataset32





33


L1-penalty






34




35






Text Classification



36




Text Classification



37

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes


27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes



38









+

=gt








41











43



Live Demo






low-level H2O AUC

all featuresH2O AUC















46















48

Thank you


