Introduction to Deep Neural Networks
by Sang-Beom Nam
Bioimaging Lab., Department of Biomedical Engineering
Kyung Hee University
Contents
• History of Deep Learning
• Perceptron
• Deep Learning
• Deep Belief Network (Pre-Training and Fine-Tuning)
• Parameters of DNN
• Other DNN algorithms
• Applications
Deep Learning
<Figure: timeline of deep learning history. 1957 — F. Rosenblatt's single-layered neural network (perceptron: inputs, bias, processor, output), limited by the XOR problem. 1986 — D. Rumelhart, G. Hinton, and R. Williams' artificial neural network (ANN), with limitations of the local minimum problem, overfitting, and training time. 2006 — G. Hinton's Restricted Boltzmann Machine and Deep Belief Networks, aided by big data, unsupervised & generative training, drop-out, and Contrastive Divergence-k. Deeper layer = higher-level features.>
§ History of Deep Learning
• Deep learning is NOT a new technique, but an improved version of the Artificial Neural Network (ANN), which was introduced in 1986 to solve the XOR problem of the perceptron.
• A deeper ANN structure can represent higher-level features.
• However, deep structures have limitations of local minima, over-fitting, and training time. (SVM and GMM got attention instead.)
• G. Hinton had been doing research on ANNs, and in 2006 he proposed the <Deep Belief Network> to avoid these limitations.
<History of Deep Neural Network>
Deep Learning
§ Perceptron
• A perceptron consists of an input layer, weights, and a processor.
• The processor performs a “weighted sum” followed by an “activation function”.

Ex) x1 = 0.5, x2 = −0.3, w1 = 4, w2 = 5, activation func. = sgn(x)
x1·w1 + x2·w2 = 2 + (−1.5) = 0.5, and sgn(0.5) = 1

If we expect −1 as the output when x1 = 0.5, x2 = −0.3, the weights should be modified, e.g. w1 = 1 → x1·w1 + x2·w2 = 0.5 + (−1.5) = −1.0, and sgn(−1.0) = −1
< Structure of a Perceptron >
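A minimal sketch of the worked example above in Python (NumPy assumed; sgn is implemented with np.sign):

import numpy as np

def perceptron(x, w, activation=np.sign):
    # Weighted sum followed by an activation function
    return activation(np.dot(x, w))

x = np.array([0.5, -0.3])      # x1, x2 from the example
w = np.array([4.0, 5.0])       # w1, w2
print(perceptron(x, w))        # 0.5*4 + (-0.3)*5 = 0.5 -> sgn(0.5) = 1
w = np.array([1.0, 5.0])       # modify w1 to 1 to obtain -1
print(perceptron(x, w))        # 0.5 + (-1.5) = -1.0 -> sgn(-1.0) = -1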
Deep Learning
§ Deep Neural Network (DNN)
• Each circle in the figure indicates a perceptron.
• The computation of a DNN consists of multiple ‘Linear Multiplications’ and ‘Non-Linear Activations’.

§ New Algorithms
• Deep Belief Network: pre-training with the Restricted Boltzmann Machine
  - Re-birth of deep learning
  - Unsupervised & generative training
• Drop-Out
  - Efficient methodology to avoid overfitting
  - Regularizer by randomness
< Structure of Deep Neural Network > < Process of Deep Neural Network >
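A minimal sketch of this alternating linear/non-linear computation, assuming a hypothetical 3-4-2 network with random parameters and sigmoid activations:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dnn_forward(x, weights, biases):
    a = x
    for W, bias in zip(weights, biases):
        a = sigmoid(W @ a + bias)   # linear multiplication, then non-linear activation
    return a

rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [np.zeros(4), np.zeros(2)]
print(dnn_forward(np.array([0.5, -0.3, 0.1]), weights, biases))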
Deep Belief Network
§ Pre-Training with Restricted Boltzmann Machine (RBM)

<Figure: a DBN stacked as data → 500 hidden units → 200 hidden units → 50 hidden units → output units. Each adjacent pair of layers is trained as an RBM (unsupervised learning); afterwards, all initialized weights are adjusted via back-propagation (supervised learning).>
Pre-training
§ What is RBM?

Ø Each pair of adjacent layers in a DBN forms an RBM.
Ø An RBM is an ‘Energy-Based Model (EBM)’ with a Boltzmann distribution:
$p(v,h) = \frac{e^{-E(v,h)}}{Z}$, where $E(v,h) = -b^\top v - c^\top h - h^\top W v$
Ø There is no connection between nodes in the same layer.

v: visible layer, h: hidden layer, b: bias of the visible layer, c: bias of the hidden layer, W: weights ($\sigma$: logistic sigmoid)

Joint probability: $p(v,h) = \frac{e^{-E(v,h)}}{Z}$
Marginal probability: $p(v) = \sum_h p(v,h) = \frac{1}{Z}\sum_h e^{-E(v,h)} = \frac{e^{b^\top v}}{Z}\prod_j \sum_{h_j \in \{0,1\}} e^{h_j (c_j + W_{j:} v)} = \frac{e^{b^\top v}}{Z}\prod_j \left(1 + e^{c_j + W_{j:} v}\right)$
Conditional probability: $p(h|v) = \frac{p(v,h)}{p(v)} = \prod_j \frac{e^{h_j (c_j + W_{j:} v)}}{1 + e^{c_j + W_{j:} v}}$
$\therefore\; p(h_j = 1 \mid v) = \frac{e^{c_j + W_{j:} v}}{1 + e^{c_j + W_{j:} v}} = \sigma\left(c_j + \sum_i W_{ji} v_i\right)$
$\therefore\; p(v_i = 1 \mid h) = \sigma\left(b_i + \sum_j W_{ji} h_j\right)$
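A minimal sketch of the two conditional distributions just derived (NumPy; W is assumed to have shape n_hidden × n_visible so that W[j, i] connects h_j and v_i):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def p_h_given_v(v, W, c):
    # p(h_j = 1 | v) = sigma(c_j + sum_i W_ji * v_i)
    return sigmoid(c + W @ v)

def p_v_given_h(h, W, b):
    # p(v_i = 1 | h) = sigma(b_i + sum_j W_ji * h_j)
    return sigmoid(b + W.T @ h)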
Pre-training
§ Likelihood gradient for an RBM with observed inputs v and hidden outputs h

$Z = \sum_v \sum_h e^{-E(v,h)}$, $\quad p(v,h) = \frac{e^{-E(v,h)}}{Z}$, $\quad p(h|v) = \frac{p(v,h)}{\sum_h p(v,h)}$

Cost function $L$ (training samples $v^{(1)}, v^{(2)}, \dots, v^{(T)}$; $\theta$: W, b, c):
$L(\theta) = -\sum_t \log p(v^{(t)}) = -\sum_t \log\left(\sum_h e^{-E(v^{(t)},h)}\right) + T \log Z$
$\frac{\partial L}{\partial \theta} = \sum_t \sum_h p(h \mid v^{(t)})\, \frac{\partial E(v^{(t)},h)}{\partial \theta} \;-\; T \sum_{v,h} p(v,h)\, \frac{\partial E(v,h)}{\partial \theta}$
$= \left\langle \frac{\partial E(v,h)}{\partial \theta} \right\rangle_{data} - \left\langle \frac{\partial E(v,h)}{\partial \theta} \right\rangle_{model}$

$\langle \cdot \rangle$: expectation
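The model expectation requires the partition function Z, which is intractable for realistic layer sizes. A tiny hypothetical RBM (2 visible, 2 hidden units) where Z can be enumerated exactly makes the cost L concrete:

import numpy as np
from itertools import product

def energy(v, h, W, b, c):
    # E(v,h) = -b^T v - c^T h - h^T W v
    return -(b @ v) - (c @ h) - (h @ W @ v)

rng = np.random.default_rng(1)
W = 0.1 * rng.standard_normal((2, 2))     # 2 hidden x 2 visible
b, c = np.zeros(2), np.zeros(2)

states = [np.array(s, dtype=float) for s in product([0, 1], repeat=2)]
Z = sum(np.exp(-energy(v, h, W, b, c)) for v in states for h in states)
p_v = lambda v: sum(np.exp(-energy(v, h, W, b, c)) for h in states) / Z
print(-np.log(p_v(np.array([1.0, 0.0]))))  # one term of L = -sum_t log p(v^(t))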
Pre-training
§ Training RBM

To minimize L, use the gradient descent method ($\alpha$: learning rate):
$\theta \leftarrow \theta - \alpha \frac{\partial L}{\partial \theta}$, where $\frac{\partial E(v,h)}{\partial W_{ji}} = -h_j v_i$, $\frac{\partial E(v,h)}{\partial b_i} = -v_i$, $\frac{\partial E(v,h)}{\partial c_j} = -h_j$

$\Delta W = \alpha\left(\langle vh \rangle^{(0)} - \langle vh \rangle^{(\infty)}\right)$
$\Delta b = \alpha\left(\langle v \rangle^{(0)} - \langle v \rangle^{(\infty)}\right)$
$\Delta c = \alpha\left(\langle h \rangle^{(0)} - \langle h \rangle^{(\infty)}\right)$

<Figure: Gibbs sampling — the alternating chain $v^{(0)}, h^{(0)} \to v^{(1)}, h^{(1)} \to \dots \to v^{(\infty)}, h^{(\infty)}$ (“fantasy” states), yielding the statistics $\langle vh \rangle^{(0)}, \langle vh \rangle^{(1)}, \dots, \langle vh \rangle^{(\infty)}$.>

However, $\langle \partial E(v,h)/\partial\theta \rangle_{model}$ is the value at the steady state, and if we perform Gibbs sampling infinitely many times, the RBM reaches the steady state; then $\langle \partial E/\partial\theta \rangle_{model} = \langle \partial E/\partial\theta \rangle^{(\infty)}$.
$\therefore\; \frac{\partial L}{\partial \theta} = \langle \partial E/\partial\theta \rangle_{data} - \langle \partial E/\partial\theta \rangle_{model} = \langle \partial E/\partial\theta \rangle^{(0)} - \langle \partial E/\partial\theta \rangle^{(\infty)}$
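A minimal sketch of one Gibbs sampling step, using the conditional probabilities derived earlier (NumPy; W assumed to have shape n_hidden × n_visible):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_step(v, W, b, c, rng):
    # Sample h ~ p(h|v), then v ~ p(v|h): one alternation of the chain;
    # repeating this many times approaches the steady state ("fantasy" samples)
    h = (rng.random(c.shape) < sigmoid(c + W @ v)).astype(float)
    v = (rng.random(b.shape) < sigmoid(b + W.T @ h)).astype(float)
    return v, h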
Pre-training
§ Greedy and Layer-wise Training
<Figure: greedy, layer-wise training — the RBM between the data and $h^1$ is trained first; its weights are then held while the RBM between $h^1$ and $h^2$ is trained, then $h^2$–$h^3$, and finally the output layer is attached on top.>
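A sketch of the greedy, layer-wise loop; train_rbm is a hypothetical helper (e.g., running the CD-1 updates shown next over the data), and the layer sizes follow the earlier DBN figure:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_dbn(data, layer_sizes, train_rbm):
    # data: (n_samples, n_visible); layer_sizes: e.g. [500, 200, 50]
    stack, x = [], data
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(x, n_hidden)   # hypothetical CD-1 trainer
        stack.append((W, b, c))
        x = sigmoid(c + x @ W.T)           # propagate activations upward
    return stack                           # initial weights for fine-tuning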
§ Contrastive Divergence-1 Algorithm: only one sampling step is enough for most applications in terms of pre-training.

$\Delta W = \alpha\left(\langle vh \rangle^{(0)} - \langle vh \rangle^{(1)}\right)$
$\Delta b = \alpha\left(\langle v \rangle^{(0)} - \langle v \rangle^{(1)}\right)$
$\Delta c = \alpha\left(\langle h \rangle^{(0)} - \langle h \rangle^{(1)}\right)$

<Figure: Gibbs sampling truncated after one step — $v^{(0)}, h^{(0)} \to v^{(1)}, h^{(1)}$, yielding $\langle vh \rangle^{(0)}$ and $\langle vh \rangle^{(1)}$.>

$p(h_j = 1 \mid v) = \sigma\left(c_j + \sum_i W_{ji} v_i\right)$
$p(v_i = 1 \mid h) = \sigma\left(b_i + \sum_j W_{ji} h_j\right)$
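A minimal CD-1 update for a single binary sample v0, following the Δ rules above (NumPy; the activation probabilities are used for the <vh> statistics, a common choice):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, b, c, alpha, rng):
    ph0 = sigmoid(c + W @ v0)                            # p(h=1 | v^(0))
    h0 = (rng.random(c.shape) < ph0).astype(float)       # sample h^(0)
    v1 = (rng.random(b.shape) < sigmoid(b + W.T @ h0)).astype(float)  # v^(1)
    ph1 = sigmoid(c + W @ v1)                            # p(h=1 | v^(1))
    W += alpha * (np.outer(ph0, v0) - np.outer(ph1, v1)) # <vh>^(0) - <vh>^(1)
    b += alpha * (v0 - v1)                               # <v>^(0) - <v>^(1)
    c += alpha * (ph0 - ph1)                             # <h>^(0) - <h>^(1)
    return W, b, c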
Fine-Tuning
§ Fine-tuning via Backpropagation
- The difference between the output data and the teacher data (label) should be minimized.
- Update the weights and biases of each layer by calculating gradients from the output layer back to the input layer.
<Figure: fine-tuning flowchart — data flows through the 1st, 2nd, …, Nth hidden units; compute the error; if error < e, end (Y); otherwise update the weights and repeat (N).>
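A sketch of the loop in the flowchart; net.forward and net.backward are hypothetical methods standing in for a full backpropagation implementation:

import numpy as np

def fine_tune(net, data, labels, alpha=0.1, e=1e-3, max_epochs=1000):
    for _ in range(max_epochs):
        out = net.forward(data)              # propagate through all hidden units
        err = np.mean((out - labels) ** 2)   # compute the error
        if err < e:                          # Error < e? -> End
            break
        net.backward(out - labels, alpha)    # update weights, output -> input
    return net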
Fine-Tuning
§ Drop-out
• To prevent the over-fitting problem, we previously had to increase the size of the database.
• However, in general, the size of the database is limited.
• We want to avoid the over-fitting problem while maintaining the same network complexity under the limited database.
• We also add some randomness as noise to make the network less dependent on the given data.
⇒ Drop some nodes out at a random rate; the output of a dropped node is zero.
<Figure: a network between the input layer and the output layer with randomly dropped-out nodes.>
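A minimal sketch of the drop-out mask applied to one layer's activations (NumPy; the drop rate is a hyperparameter):

import numpy as np

def dropout(activations, rate, rng):
    # Each node is dropped independently with probability `rate`;
    # a dropped node's output is zero
    mask = (rng.random(activations.shape) >= rate).astype(float)
    return activations * mask

# Example: half of the nodes are silenced on average
rng = np.random.default_rng(0)
print(dropout(np.ones(10), rate=0.5, rng=rng))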
Parameters of DNN
§ DNN Structure
- Number of nodes in the input layer
- Number of nodes in the output layer
- Number of hidden layers
- Number of nodes in each hidden layer

§ Training Parameters
- Momentum
- Learning rate
- Weight initial value
- Drop-out rate
- Mini-batch size
- Big-batch size

Remaining problem: there is no optimal way to define these parameters. See the illustrative configuration below.
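Since there is no optimal recipe, these parameters are usually collected into a configuration and tuned by experiment. The values below are purely illustrative assumptions (an MNIST-sized input and the 500-200-50 stack from the DBN figure):

dnn_config = {
    "n_input": 784,                    # number of nodes in the input layer
    "n_output": 10,                    # number of nodes in the output layer
    "hidden_layers": [500, 200, 50],   # number and sizes of hidden layers
    "momentum": 0.9,
    "learning_rate": 0.01,
    "weight_init_std": 0.01,           # scale of random weight initial values
    "dropout_rate": 0.5,
    "mini_batch_size": 100,
}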
Other DNN Algorithms
§ Convolutional Neural Network (CNN)
§ Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM)
§ Region-based Convolutional Neural Network (R-CNN)
Applications
§ Handwriting Recognition ( http://www.cs.toronto.edu/~hinton/adi/ )
§ Automatic Speech Recognition
§ Image Recognition
§ Natural Language Processing
§ Drug Discovery and Toxicology
§ Customer Relationship Management
§ Bioinformatics
§ Real-time Human Activity Recognition via Deep Learning
§ Etc.