Introduction to Deep Neural Networks
by Sang-Beom Nam
Bioimaging Lab., Department of Biomedical Engineering
Kyung Hee University
Contents
• History of Deep Learning
• Perceptron
• Deep Learning
• Deep Belief Network (Pre-Training and Fine-Tuning)
• Parameters of DNN
• Other DNN algorithms
• Applications
Deep Learning
<Figure: timeline of deep learning history. 1957 — F. Rosenblatt's single-layered neural network (perceptron: inputs, bias, processor, output), limited by the XOR problem. 1986 — D. Rumelhart, G. Hinton, and R. Williams' artificial neural network (ANN), with limitations of the local minimum problem, overfitting, and training time. 2006 — G. Hinton's Restricted Boltzmann Machine and Deep Belief Networks, aided by big data, unsupervised & generative training, drop-out, and Contrastive Divergence-k. Deeper layer = higher-level features.>
§ History of Deep Learning
• Deep learning is NOT a new technique, but an improved version of the Artificial Neural Network (ANN), which was introduced in 1986 to solve the XOR problem of the perceptron.
• A deeper ANN structure can represent higher-level features.
• However, deep structures have limitations of local minima, over-fitting, and training time. (SVM and GMM got attention instead.)
• G. Hinton had been doing research on ANNs, and in 2006 he proposed the <Deep Belief Network> to avoid these limitations.
<History of Deep Neural Network>
Deep Learning
§ Perceptron
• A perceptron consists of an input layer, weights, and a processor.
• The processor performs a “weighted sum” followed by an “activation function”.

Ex) x1 = 0.5, x2 = −0.3, w1 = 4, w2 = 5, activation func. = sgn(x)
x1·w1 + x2·w2 = 2 + (−1.5) = 0.5, and sgn(0.5) = 1

If we expect −1 as the output when x1 = 0.5, x2 = −0.3, the weights should be modified, e.g. w1 = 1 → x1·w1 + x2·w2 = 0.5 + (−1.5) = −1.0, and sgn(−1.0) = −1
< Structure of a Perceptron >
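A minimal sketch of the worked example above in Python (NumPy assumed; sgn is implemented with np.sign):

import numpy as np

def perceptron(x, w, activation=np.sign):
    # Weighted sum followed by an activation function
    return activation(np.dot(x, w))

x = np.array([0.5, -0.3])      # x1, x2 from the example
w = np.array([4.0, 5.0])       # w1, w2
print(perceptron(x, w))        # 0.5*4 + (-0.3)*5 = 0.5 -> sgn(0.5) = 1
w = np.array([1.0, 5.0])       # modify w1 to 1 to obtain -1
print(perceptron(x, w))        # 0.5 + (-1.5) = -1.0 -> sgn(-1.0) = -1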
Deep Learning
§ Deep Neural Network (DNN)
• Each circle in the figure indicates a perceptron.
• The computation of a DNN consists of multiple ‘Linear Multiplications’ and ‘Non-Linear Activations’.

§ New Algorithms
• Deep Belief Network: pre-training with the Restricted Boltzmann Machine
  - Re-birth of deep learning
  - Unsupervised & generative training
• Drop-Out
  - Efficient methodology to avoid overfitting
  - Regularizer by randomness
< Structure of Deep Neural Network > < Process of Deep Neural Network >
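A minimal sketch of this alternating linear/non-linear computation, assuming a hypothetical 3-4-2 network with random parameters and sigmoid activations:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dnn_forward(x, weights, biases):
    a = x
    for W, bias in zip(weights, biases):
        a = sigmoid(W @ a + bias)   # linear multiplication, then non-linear activation
    return a

rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [np.zeros(4), np.zeros(2)]
print(dnn_forward(np.array([0.5, -0.3, 0.1]), weights, biases))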
Deep Belief Network
§ Pre-Training with Restricted Boltzmann Machine (RBM)

<Figure: a DBN stacked as data → 500 hidden units → 200 hidden units → 50 hidden units → output units. Each adjacent pair of layers is trained as an RBM (unsupervised learning); afterwards, all initialized weights are adjusted via back-propagation (supervised learning).>
Pre-training
§ What is RBM?

Ø Each pair of adjacent layers in a DBN forms an RBM.
Ø An RBM is an ‘Energy-Based Model (EBM)’ with a Boltzmann distribution:
$p(v,h) = \frac{e^{-E(v,h)}}{Z}$, where $E(v,h) = -b^\top v - c^\top h - h^\top W v$
Ø There is no connection between nodes in the same layer.

v: visible layer, h: hidden layer, b: bias of the visible layer, c: bias of the hidden layer, W: weights ($\sigma$: logistic sigmoid)

Joint probability: $p(v,h) = \frac{e^{-E(v,h)}}{Z}$
Marginal probability: $p(v) = \sum_h p(v,h) = \frac{1}{Z}\sum_h e^{-E(v,h)} = \frac{e^{b^\top v}}{Z}\prod_j \sum_{h_j \in \{0,1\}} e^{h_j (c_j + W_{j:} v)} = \frac{e^{b^\top v}}{Z}\prod_j \left(1 + e^{c_j + W_{j:} v}\right)$
Conditional probability: $p(h|v) = \frac{p(v,h)}{p(v)} = \prod_j \frac{e^{h_j (c_j + W_{j:} v)}}{1 + e^{c_j + W_{j:} v}}$
$\therefore\; p(h_j = 1 \mid v) = \frac{e^{c_j + W_{j:} v}}{1 + e^{c_j + W_{j:} v}} = \sigma\left(c_j + \sum_i W_{ji} v_i\right)$
$\therefore\; p(v_i = 1 \mid h) = \sigma\left(b_i + \sum_j W_{ji} h_j\right)$
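A minimal sketch of the two conditional distributions just derived (NumPy; W is assumed to have shape n_hidden × n_visible so that W[j, i] connects h_j and v_i):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def p_h_given_v(v, W, c):
    # p(h_j = 1 | v) = sigma(c_j + sum_i W_ji * v_i)
    return sigmoid(c + W @ v)

def p_v_given_h(h, W, b):
    # p(v_i = 1 | h) = sigma(b_i + sum_j W_ji * h_j)
    return sigmoid(b + W.T @ h)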
Pre-training
§ Likelihood gradient for an RBM with observed inputs v and hidden outputs h

$Z = \sum_v \sum_h e^{-E(v,h)}$, $\quad p(v,h) = \frac{e^{-E(v,h)}}{Z}$, $\quad p(h|v) = \frac{p(v,h)}{\sum_h p(v,h)}$

Cost function $L$ (training samples $v^{(1)}, v^{(2)}, \dots, v^{(T)}$; $\theta$: W, b, c):
$L(\theta) = -\sum_t \log p(v^{(t)}) = -\sum_t \log\left(\sum_h e^{-E(v^{(t)},h)}\right) + T \log Z$
$\frac{\partial L}{\partial \theta} = \sum_t \sum_h p(h \mid v^{(t)})\, \frac{\partial E(v^{(t)},h)}{\partial \theta} \;-\; T \sum_{v,h} p(v,h)\, \frac{\partial E(v,h)}{\partial \theta}$
$= \left\langle \frac{\partial E(v,h)}{\partial \theta} \right\rangle_{data} - \left\langle \frac{\partial E(v,h)}{\partial \theta} \right\rangle_{model}$

$\langle \cdot \rangle$: expectation
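The model expectation requires the partition function Z, which is intractable for realistic layer sizes. A tiny hypothetical RBM (2 visible, 2 hidden units) where Z can be enumerated exactly makes the cost L concrete:

import numpy as np
from itertools import product

def energy(v, h, W, b, c):
    # E(v,h) = -b^T v - c^T h - h^T W v
    return -(b @ v) - (c @ h) - (h @ W @ v)

rng = np.random.default_rng(1)
W = 0.1 * rng.standard_normal((2, 2))     # 2 hidden x 2 visible
b, c = np.zeros(2), np.zeros(2)

states = [np.array(s, dtype=float) for s in product([0, 1], repeat=2)]
Z = sum(np.exp(-energy(v, h, W, b, c)) for v in states for h in states)
p_v = lambda v: sum(np.exp(-energy(v, h, W, b, c)) for h in states) / Z
print(-np.log(p_v(np.array([1.0, 0.0]))))  # one term of L = -sum_t log p(v^(t))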
Pre-training
§ Training RBM

To minimize L, use the gradient descent method ($\alpha$: learning rate):
$\theta \leftarrow \theta - \alpha \frac{\partial L}{\partial \theta}$, where $\frac{\partial E(v,h)}{\partial W_{ji}} = -h_j v_i$, $\frac{\partial E(v,h)}{\partial b_i} = -v_i$, $\frac{\partial E(v,h)}{\partial c_j} = -h_j$

$\Delta W = \alpha\left(\langle vh \rangle^{(0)} - \langle vh \rangle^{(\infty)}\right)$
$\Delta b = \alpha\left(\langle v \rangle^{(0)} - \langle v \rangle^{(\infty)}\right)$
$\Delta c = \alpha\left(\langle h \rangle^{(0)} - \langle h \rangle^{(\infty)}\right)$

<Figure: Gibbs sampling — the alternating chain $v^{(0)}, h^{(0)} \to v^{(1)}, h^{(1)} \to \dots \to v^{(\infty)}, h^{(\infty)}$ (“fantasy” states), yielding the statistics $\langle vh \rangle^{(0)}, \langle vh \rangle^{(1)}, \dots, \langle vh \rangle^{(\infty)}$.>

However, $\langle \partial E(v,h)/\partial\theta \rangle_{model}$ is the value at the steady state, and if we perform Gibbs sampling infinitely many times, the RBM reaches the steady state; then $\langle \partial E/\partial\theta \rangle_{model} = \langle \partial E/\partial\theta \rangle^{(\infty)}$.
$\therefore\; \frac{\partial L}{\partial \theta} = \langle \partial E/\partial\theta \rangle_{data} - \langle \partial E/\partial\theta \rangle_{model} = \langle \partial E/\partial\theta \rangle^{(0)} - \langle \partial E/\partial\theta \rangle^{(\infty)}$
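A minimal sketch of one Gibbs sampling step, using the conditional probabilities derived earlier (NumPy; W assumed to have shape n_hidden × n_visible):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_step(v, W, b, c, rng):
    # Sample h ~ p(h|v), then v ~ p(v|h): one alternation of the chain;
    # repeating this many times approaches the steady state ("fantasy" samples)
    h = (rng.random(c.shape) < sigmoid(c + W @ v)).astype(float)
    v = (rng.random(b.shape) < sigmoid(b + W.T @ h)).astype(float)
    return v, h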
Pre-training
§ Greedy and Layer-wise Training
<Figure: greedy, layer-wise training — the RBM between the data and $h^1$ is trained first; its weights are then held while the RBM between $h^1$ and $h^2$ is trained, then $h^2$–$h^3$, and finally the output layer is attached on top.>
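A sketch of the greedy, layer-wise loop; train_rbm is a hypothetical helper (e.g., running the CD-1 updates shown next over the data), and the layer sizes follow the earlier DBN figure:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_dbn(data, layer_sizes, train_rbm):
    # data: (n_samples, n_visible); layer_sizes: e.g. [500, 200, 50]
    stack, x = [], data
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(x, n_hidden)   # hypothetical CD-1 trainer
        stack.append((W, b, c))
        x = sigmoid(c + x @ W.T)           # propagate activations upward
    return stack                           # initial weights for fine-tuning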
§ Contrastive Divergence-1 Algorithm: only one sampling step is enough for most applications in terms of pre-training.

$\Delta W = \alpha\left(\langle vh \rangle^{(0)} - \langle vh \rangle^{(1)}\right)$
$\Delta b = \alpha\left(\langle v \rangle^{(0)} - \langle v \rangle^{(1)}\right)$
$\Delta c = \alpha\left(\langle h \rangle^{(0)} - \langle h \rangle^{(1)}\right)$

<Figure: Gibbs sampling truncated after one step — $v^{(0)}, h^{(0)} \to v^{(1)}, h^{(1)}$, yielding $\langle vh \rangle^{(0)}$ and $\langle vh \rangle^{(1)}$.>

$p(h_j = 1 \mid v) = \sigma\left(c_j + \sum_i W_{ji} v_i\right)$
$p(v_i = 1 \mid h) = \sigma\left(b_i + \sum_j W_{ji} h_j\right)$
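A minimal CD-1 update for a single binary sample v0, following the Δ rules above (NumPy; the activation probabilities are used for the <vh> statistics, a common choice):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, b, c, alpha, rng):
    ph0 = sigmoid(c + W @ v0)                            # p(h=1 | v^(0))
    h0 = (rng.random(c.shape) < ph0).astype(float)       # sample h^(0)
    v1 = (rng.random(b.shape) < sigmoid(b + W.T @ h0)).astype(float)  # v^(1)
    ph1 = sigmoid(c + W @ v1)                            # p(h=1 | v^(1))
    W += alpha * (np.outer(ph0, v0) - np.outer(ph1, v1)) # <vh>^(0) - <vh>^(1)
    b += alpha * (v0 - v1)                               # <v>^(0) - <v>^(1)
    c += alpha * (ph0 - ph1)                             # <h>^(0) - <h>^(1)
    return W, b, c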
Fine-Tuning
§ Fine-tuning via Backpropagation
- The difference between the output data and the teacher data (label) should be minimized.
- Update the weights and biases of each layer by calculating gradients from the output layer back to the input layer.
<Figure: fine-tuning flowchart — data flows through the 1st, 2nd, …, Nth hidden units; compute the error; if error < e, end (Y); otherwise update the weights and repeat (N).>
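A sketch of the loop in the flowchart; net.forward and net.backward are hypothetical methods standing in for a full backpropagation implementation:

import numpy as np

def fine_tune(net, data, labels, alpha=0.1, e=1e-3, max_epochs=1000):
    for _ in range(max_epochs):
        out = net.forward(data)              # propagate through all hidden units
        err = np.mean((out - labels) ** 2)   # compute the error
        if err < e:                          # Error < e? -> End
            break
        net.backward(out - labels, alpha)    # update weights, output -> input
    return net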
Fine-Tuning
§ Drop-out
• To prevent the over-fitting problem, we previously had to increase the size of the database.
• However, in general, the size of the database is limited.
• We want to avoid the over-fitting problem while maintaining the same network complexity under the limited database.
• We also add some randomness as noise to make the network less dependent on the given data.
⇒ Drop some nodes out at a random rate; the output of a dropped node is zero.
<Figure: a network between the input layer and the output layer with randomly dropped-out nodes.>
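A minimal sketch of the drop-out mask applied to one layer's activations (NumPy; the drop rate is a hyperparameter):

import numpy as np

def dropout(activations, rate, rng):
    # Each node is dropped independently with probability `rate`;
    # a dropped node's output is zero
    mask = (rng.random(activations.shape) >= rate).astype(float)
    return activations * mask

# Example: half of the nodes are silenced on average
rng = np.random.default_rng(0)
print(dropout(np.ones(10), rate=0.5, rng=rng))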
Parameters of DNN
§ DNN Structure
- Number of nodes in the input layer
- Number of nodes in the output layer
- Number of hidden layers
- Number of nodes in each hidden layer

§ Training Parameters
- Momentum
- Learning rate
- Weight initial value
- Drop-out rate
- Mini-batch size
- Big-batch size

Remaining problem: there is no optimal way to define these parameters. See the illustrative configuration below.
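Since there is no optimal recipe, these parameters are usually collected into a configuration and tuned by experiment. The values below are purely illustrative assumptions (an MNIST-sized input and the 500-200-50 stack from the DBN figure):

dnn_config = {
    "n_input": 784,                    # number of nodes in the input layer
    "n_output": 10,                    # number of nodes in the output layer
    "hidden_layers": [500, 200, 50],   # number and sizes of hidden layers
    "momentum": 0.9,
    "learning_rate": 0.01,
    "weight_init_std": 0.01,           # scale of random weight initial values
    "dropout_rate": 0.5,
    "mini_batch_size": 100,
}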
Other DNN Algorithms
§ Convolutional Neural Network (CNN)
§ Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM)
§ Region-based Convolutional Neural Network (R-CNN)
Applications
§ Handwriting Recognition ( http://www.cs.toronto.edu/~hinton/adi/ )
§ Automatic Speech Recognition
§ Image Recognition
§ Natural Language Processing
§ Drug Discovery and Toxicology
§ Customer Relationship Management
§ Bioinformatics
§ Real-time Human Activity Recognition via Deep Learning
§ Etc.