An Introduction to Deep Learning with RapidMiner
Philipp Schlunder - RapidMiner Research
©2017 RapidMiner, Inc. All rights reserved. - 2 -
What’s in store for today?
1. Things to know before getting started
2. What’s Deep Learning anyway?
3. How to use it inside RapidMiner
Before getting up to speed
Hardware
What do I actually need?
• Minimum: PC or Mac running RapidMiner
• Recommended starter kit: PC with a supported NVIDIA GPU running Linux/Windows
• Optimum: Dedicated server with multiple GPUs running Linux
Software
• RapidMiner Studio or RapidMiner Server
• Python installation (at least version 3.5) with:
  – Pandas
  – Linux recommended: TensorFlow (for GPUs)
  – Windows recommended: Microsoft CNTK
• Keras extension
(Figure: Keras runs on top of one of the backends TensorFlow, Cognitive Toolkit, or Theano)
I don’t have a server with GPUs!
We’ve got you covered with RapidMiner Server on the Amazon AWS and Microsoft Azure marketplaces:
• Put your server where your data is
• Server installation in minutes
• No need for your own hardware
• Bring-your-own-license images (Amazon AMIs & Microsoft VHDs)
When should I try Deep Learning?
1. Stuck with the current model
   – Accuracy not high enough
2. Huge amount of training data available
3. Fixed, specific use case
4. Need a notion of memory
   – Understanding context
5. Multi-dimensional input
   – Pictures: 3 colors per pixel
   – Text: word vectors
Things to consider
• The large number of parameters to tune often requires a huge amount of data
  – Is my infrastructure suited for that?
  – Is the potential ROI high enough for the resource investment?
• For many general ML tasks, only minor improvement
• Often not that flexible and very domain-specific
• Hard to understand the reasoning (consider LIME)
• General Data Protection Regulation (25th of May 2018, EU-based)
  – “right to an explanation”
  – “prevent discriminatory effects based on race, opinions, health”
What is Deep Learning anyway?
Machine Learning
Hierarchical Learning
(Figure: the input is transformed into a first representation, then into a second representation, and so on)
Deep Learning
(Figure: the hierarchical transformation repeated 3x across stacked layers)
Layers
(Figure: Input Layer → Hidden Layer → Output Layer)
Neurons
(Figure: Input Layer → Hidden Layer → Output Layer)
• Input layer: one neuron per attribute (number of attributes)
• Output layer: one neuron per label value (number of label values)
• Hidden layer: each neuron forms a new attribute based on the previous ones and some activation function (e.g. ReLU)
Layer: Fully-Connected
(Figure: in a fully-connected (dense) hidden layer, every neuron is connected to every neuron of the previous layer)
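The computation behind a dense layer can be sketched in a few lines of plain Python. This is a minimal illustration of the slides' description (weighted sum plus bias, passed through an activation such as ReLU); the weights and inputs below are made-up example values, not anything from the deck.

```python
def relu(x):
    """Rectified Linear Unit, the example activation from the slides."""
    return max(0.0, x)

def dense_layer(inputs, weights, biases):
    """A fully-connected (dense) layer: every neuron sees every input.
    Each neuron's output is activation(weighted sum of inputs + bias)."""
    return [relu(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
            for neuron_weights, b in zip(weights, biases)]

# Two input attributes feeding two hidden neurons (illustrative weights):
hidden = dense_layer([1.0, 2.0],
                     weights=[[0.5, -1.0], [1.0, 1.0]],
                     biases=[0.0, 0.5])
# First neuron: relu(0.5 - 2.0) = 0.0; second: relu(1.0 + 2.0 + 0.5) = 3.5
```

Stacking several such layers, each consuming the previous layer's outputs as its inputs, gives exactly the hierarchical representations described above.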
Change of Perspective
Layer: Convolutional
(Figure: an example filter of size F slides over an input of size N with a given stride; the result is an activation map)
Size of the activation map: (N – F) / Stride + 1
Here: N = 4, F = 2, Stride = 1, so (4 – 2) / 1 + 1 = 3
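The output-size formula from the slide is easy to express directly; the `padding` parameter anticipates the padding slide further down (with padding P, the formula becomes (N – F + 2·P) / Stride + 1, a standard extension).

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Size of the activation map along one dimension:
    (N - F + 2 * Padding) / Stride + 1."""
    return (n - f + 2 * padding) // stride + 1

conv_output_size(4, 2, stride=1)  # the slide's example: (4 - 2) / 1 + 1 = 3
```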
Layer: Convolutional
Worked example: the 2×2 example filter
1 0
0 1
slides over a 4×4 input with Stride = 1. For the first window: 4x1 + 3x0 + 1x0 + 7x1 = 11. Repeating this for every position yields the 3×3 activation map:
11 11  9
 7 10 10
13  6  7
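The sliding-window computation can be sketched in plain Python. Note: the slide's original 4×4 matrix did not survive extraction cleanly; the `image` below is a reconstruction from the numbers that did survive, chosen so that it reproduces the slide's stated first value (4x1 + 3x0 + 1x0 + 7x1 = 11) and the full activation map.

```python
def conv2d(image, kernel, stride=1):
    """Valid 2D convolution (cross-correlation): slide the kernel across the
    image and sum the element-wise products at every position."""
    n, f = len(image), len(kernel)
    size = (n - f) // stride + 1
    return [[sum(image[i * stride + a][j * stride + b] * kernel[a][b]
                 for a in range(f) for b in range(f))
             for j in range(size)]
            for i in range(size)]

# Reconstructed 4x4 input and the slide's 2x2 filter:
image = [[4, 3, 1, 7],
         [1, 7, 8, 8],
         [9, 6, 3, 2],
         [3, 4, 0, 4]]
kernel = [[1, 0],
          [0, 1]]
conv2d(image, kernel)  # [[11, 11, 9], [7, 10, 10], [13, 6, 7]]
```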
Layer: Convolutional
Padding = 1: the 4×4 input is surrounded by a one-value border of zeros (making it 6×6) before the filter is applied. With padding P, the output size becomes (N – F + 2·P) / Stride + 1.
Layer: Convolutional
(Figure: applying a filter such as
1 0
0 1
to an image yields one activation map; a convolutional layer applies several filters, each producing its own activation map)
Layer: Pooling
Max(imum) Pooling: a 2×2 window slides over the input with Stride = 2, and only the largest value of each window is kept. For the 4×4 example input:
7 8
9 4
Average Pooling keeps the mean of each window instead:
15/4 24/4
22/4  9/4
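Both pooling variants are the same sliding-window loop with a different reduction; a minimal sketch, reusing the reconstructed 4×4 input from the convolution example:

```python
def pool2d(image, size=2, stride=2, op=max):
    """Pooling: slide a window over the input and reduce each window to a
    single value (max for max pooling, mean for average pooling)."""
    n = len(image)
    out_size = (n - size) // stride + 1
    return [[op([image[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size)])
             for j in range(out_size)]
            for i in range(out_size)]

def mean(values):
    return sum(values) / len(values)

image = [[4, 3, 1, 7],
         [1, 7, 8, 8],
         [9, 6, 3, 2],
         [3, 4, 0, 4]]
pool2d(image, op=max)   # [[7, 8], [9, 4]]
pool2d(image, op=mean)  # [[3.75, 6.0], [5.5, 2.25]], i.e. 15/4, 24/4, 22/4, 9/4
```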
Layer: Pooling
• The representation becomes smaller → fewer values to optimize
• Applied to each activation map of the previous layer
Layer: Recurrent
• Neurons with hidden states: the state of the layer (s0, s1, s2, …) changes with each repetition
• “I drank some coffee” → the context fits into a few states
• “Drinking a hot beverage that […] for me it is coffee.” → the state must carry the context across many timesteps (s0, s1, s2, … s93)
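The state update of a recurrent neuron can be sketched as a loop over the sequence. This is a deliberately tiny scalar illustration of the idea (state feeds back into the next step); the weights and input values are made up, and real recurrent layers operate on vectors.

```python
import math

def rnn_step(state, x, w_state=0.5, w_input=1.0):
    """One recurrent step: the new state s_t is computed from the previous
    state s_(t-1) and the current input, then reused at the next timestep."""
    return math.tanh(w_state * state + w_input * x)

# The state changes with each repetition while reading a sequence:
state = 0.0                            # s0
for token_value in [0.2, -0.4, 0.9]:   # e.g. one encoded word per timestep
    state = rnn_step(state, token_value)
# state is now s3, carrying information from all three inputs
```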
Layer: Long Short-Term Memory
(Figure: LSTM cell with previous state s_prev and input (i), gate (g), forget (f), and output (o) units)
Information can be:
• Not stored (input)
• Only stored partially (gate)
• Forgotten
• Considered only partially (output)
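The four behaviours listed above map onto the LSTM gates. The sketch below is a heavily simplified scalar version: real LSTMs work on vectors, include biases, and also feed the previous output back into each gate; the gate weights here are purely illustrative.

```python
import math

def sigmoid(z):
    """Squashes a value into (0, 1): a gate's "how much passes through"."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(state, x, w):
    """One simplified LSTM step. Gates decide how much information is
    forgotten, stored, and exposed; w holds illustrative gate weights."""
    forget = sigmoid(w["f"] * x)    # how much of the previous state to keep
    store  = sigmoid(w["i"] * x)    # how much new information to store
    cand   = math.tanh(w["g"] * x)  # candidate information from the input
    state  = forget * state + store * cand
    expose = sigmoid(w["o"] * x)    # how much of the state to consider
    return state, expose * math.tanh(state)

state, output = lstm_step(1.0, 0.5, {"f": 1.0, "i": 1.0, "g": 1.0, "o": 1.0})
```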
Layer: Dropout
Dropout Rate
• Probability of dropping a neuron
• Between 0 and 1
Dropout is a form of regularization: it helps reduce complexity and constrain the model.
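A minimal sketch of what a dropout layer does during training, assuming the common "inverted dropout" formulation (survivors are rescaled so the expected output is unchanged); the input values are illustrative.

```python
import random

def dropout(activations, rate, training=True):
    """Dropout: during training, zero each activation with probability `rate`
    and scale the survivors by 1 / (1 - rate) so the expected output stays
    the same. At prediction time the layer passes values through unchanged."""
    if not training:
        return activations
    keep = 1.0 - rate
    return [a / keep if random.random() < keep else 0.0 for a in activations]

random.seed(0)  # reproducibility for this sketch only
thinned = dropout([1.0, 2.0, 3.0, 4.0], rate=0.5)
# roughly half the activations become 0.0, the rest are doubled
```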
Input & Setup
Input
(Figure: Input Layer → Hidden Layer → Output Layer; the input layer is what gets configured first)
Input
Input Layer: input_shape depends on the layers used
• Convolutional/Recurrent: (timesteps, input_dim) = (1, 3)
• Others: (input_dim,) = (3,)
Input
Input Layer: input_shape
• Convolutional/Recurrent: (timesteps, input_dim) = (4, 3), since the data contains timesteps
Input
Input Data Set: batch_size
• Number of entries (pictures, sentences, …) to use in one model-building step (obtaining weights)
• Here: 4 entries per batch
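Conceptually, batching just splits the data set into consecutive chunks of `batch_size` entries, one chunk per weight-update step; a one-line sketch:

```python
def batches(entries, batch_size):
    """Split a data set into consecutive batches of batch_size entries;
    the last batch may be smaller if the sizes don't divide evenly."""
    return [entries[i:i + batch_size]
            for i in range(0, len(entries), batch_size)]

batches(list(range(8)), 4)  # two batches of 4 entries each
```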
The Model – Weights
• A weight determines the amount of a neuron’s contribution
• Weights are changed using an optimizer
• An optimizer reduces the loss
• One full cycle of weight changes over the training data is called an epoch
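The weight/optimizer/epoch loop can be illustrated with the simplest possible optimizer, plain gradient descent on a single weight; the toy data and learning rate are made up for this sketch.

```python
# Fit y = w * x on toy data (true w is 2): each epoch is one pass over the
# data, and each step moves the weight against the gradient of the squared
# error, reducing the loss.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (x, y) pairs
w, learning_rate = 0.0, 0.05
for epoch in range(100):
    for x, y in data:
        error = w * x - y                     # prediction minus target
        w -= learning_rate * 2 * error * x    # gradient descent step
# w has converged close to 2.0
```

Optimizers such as rmsprop and Adam (next slide) refine exactly this update rule with per-weight learning rates and momentum.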
The Model – Building it
Parameter   Binary Classification   Multi-Class Classification   Regression
optimizer   adam/rmsprop            adam/rmsprop                 adam/rmsprop
loss        binary_crossentropy     categorical_crossentropy     mse
• rmsprop: especially good choice for recurrent layers
• adam: rmsprop with extra momentum
While training, observe the loss (log/TensorBoard):
• Bump → change the weight initialization
• Lowers slowly → learning rate too low
• Fast decline, then stagnating → learning rate too high
• Brief decline, rises extremely afterwards → learning rate way too high
The Model – Architecture
(Figure: an example architecture stacking input, convolution, pooling, convolution, convolution, pooling, convolution, fully-connected, recurrent, and output layers)
• Start small and simple
• Test overfitting on a small batch (small loss, ~1.0 accuracy)
• Add regularization until the loss goes down
  – Not going down → learning rate too low
  – Constant or NaN → learning rate too high
• Hyper-parameter tuning
• Often:
  – Number of filters rises (64 → 128 → 256)
  – Filter size shrinks (5 → 3)
  – Stride shrinks (2 → 1)
  – Padding is introduced (0 → 1)
Deep Learning Summary
Summary
• Different types of layers:
  – Fully-connected, convolutional, recurrent, pooling, dropout, …
• The model is defined by:
  – Setup of layers (architecture)
  – Weights obtained through optimization of a loss over several epochs
• Classification vs. Regression:
  – Change the number of units in the last layer (number of possible classes vs. number of targeted values to predict)
  – Change the loss from crossentropy to mse
How to use it in RapidMiner?
Options in RapidMiner
Feature                 Neural Net   Deep Learning H2O   Deep Learning Keras
Fully-Connected Layer   X            X                   X
Basic Optimization      X            X                   X
Advanced Optimization                X                   X
Multi-Threading                      X                   X
GPU Support                                              X
Advanced Layers                                          X
(Complexity increases from left to right)
Demo Time
Classification
(Demo screenshots: building a classification process; the example’s label has 3 values)
Regression
(Demo screenshots)
• batch_size: 32 entries
• 256 iterations of changing/optimizing the weights with a method called ‘Adam’
• 64 filters → 64 activation maps
Regression
• … 250 neurons
• 1 value to estimate (closing date of the 30th timestep)
Regression
Start TensorBoard with the log storage directory provided on the command line:
tensorboard --logdir=C:/Users/PhilippSchlunder/Desktop/logs/
View the loss graph in the browser at localhost:6006
Q & A
Philipp Schlunder
@WortPixel
inquiries@rapidminer.com
@RapidMiner
www.rapidminer.com
Download RapidMiner 7.6
Sources
• Stanford CS231n
• Keras documentation
Recommended
• When not to use deep learning
• Deep Learning is not the AI future