
Neural Networks: An Introduction

Warith HARCHAOUI

MAP5, UMR 8145, Université Paris-Descartes, Sorbonne Paris Cité
&
Oscaro.com, Research and Development

March 2017

Outline

Supervised Classification and Regression: Classification, Regression

One Neuron: for Regression, for Classification

Gradient Descent: Batch Gradient Descent, Stochastic Gradient Descent

Several Neurons

Convolutional Neural Networks for Images

Adversarial Networks

Conclusion


Supervised Classification: The Binary Case

Given a training set that consists of:

- $x_i \in \mathbb{R}^D$
- $y_i \in \{0, 1\}$

for $i = 1, \dots, n$, find $F$ s.t. $F(x_i) \simeq y_i$.

Example: $x_i$ is an image; $y_i = 1$ corresponds to "cat", $y_i = 0$ corresponds to "non-cat".

Supervised Classification: More than 2 Classes

Given a training set that consists of:

- $x_i \in \mathbb{R}^D$
- $y_i \in \{0, 1\}^K$ (one-hot representation)

for $i = 1, \dots, n$, find $F$ s.t. $F(x_i) \simeq y_i$.

Example: $x_i$ is an image; $y_i = [1, 0, 0]$ corresponds to "cat", $y_i = [0, 1, 0]$ to "dog", $y_i = [0, 0, 1]$ to "elephant".

Regression

Given a training set that consists of:

- $x_i \in \mathbb{R}^D$
- $y_i \in \mathbb{R}^K$

for $i = 1, \dots, n$, find $F$ s.t. $F(x_i) \simeq y_i$.

Example: $x_i$ is a building; $y_i$ is the rent value of the building.


One Neuron: An Input-Output Machine

Figure: One Neuron (inputs $x_1, x_2, x_3$ with weights $w_1, w_2, w_3$, activation $a$, output $y$)

$$y = a(w_1 x_1 + w_2 x_2 + w_3 x_3 + b)$$
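To make this concrete, here is a minimal NumPy sketch of a single neuron (the particular weights, bias, and sigmoid activation are arbitrary choices for the example):

```python
import numpy as np

def sigmoid(a):
    """Sigmoid activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def neuron(x, w, b, activation=sigmoid):
    """One neuron: weighted sum of the inputs plus a bias,
    passed through a non-linearity a."""
    return activation(np.dot(w, x) + b)

# y = a(w1*x1 + w2*x2 + w3*x3 + b) with three inputs
x = np.array([0.5, -1.0, 2.0])   # inputs x1, x2, x3
w = np.array([0.1, 0.4, -0.2])   # weights w1, w2, w3
b = 0.3                          # bias
print(neuron(x, w, b))           # a single scalar output y
```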

One Neuron for Regression: Least Mean Squares

Prediction:

$$F(x_i) = \hat{y}_i = W x_i + b$$

Loss:

$$L(W, b) = \frac{1}{n} \sum_{i=1}^{n} \| \hat{y}_i - y_i \|_2^2$$
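A minimal NumPy sketch of this prediction and loss, on synthetic data (the dimensions and random inputs are arbitrary choices for the example):

```python
import numpy as np

def predict(X, W, b):
    """Linear prediction: y_hat_i = W x_i + b for each row x_i of X."""
    return X @ W.T + b

def lms_loss(Y_hat, Y):
    """Least-mean-squares loss: (1/n) * sum_i ||y_hat_i - y_i||^2."""
    return np.mean(np.sum((Y_hat - Y) ** 2, axis=1))

# Toy data: n=100 examples, D=3 input dims, K=2 output dims
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Y = rng.normal(size=(100, 2))
W = rng.normal(size=(2, 3))
b = np.zeros(2)
print(lms_loss(predict(X, W, b), Y))
```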

One Neuron for Binary Classification: Logistic Function

Prediction:

$$\mathrm{score}_i = w^\top x_i + b$$

$$P(y_i = 1) = p_i = \mathrm{Sigmoid}(\mathrm{score}_i) = \frac{1}{1 + \exp(-\mathrm{score}_i)}$$

Loss:

$$\ell(w, b) = \prod_{i : y_i = 1} p_i \prod_{i : y_i = 0} (1 - p_i) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}$$

$$L(w, b) = -\frac{1}{n} \log(\ell(w, b)) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]$$
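A minimal NumPy sketch of the prediction and this negative log-likelihood loss (data and parameters are arbitrary; the small epsilon guards against log(0)):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def binary_cross_entropy(X, y, w, b, eps=1e-12):
    """L(w, b): negative mean log-likelihood of the logistic model."""
    p = sigmoid(X @ w + b)  # p_i = P(y_i = 1)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Toy data: n=100 points in D=3 dimensions with binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)
print(binary_cross_entropy(X, y, np.zeros(3), 0.0))  # log(2) for w=0, b=0
```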

One Neuron for Binary Classification: Logistic Function

Figure: The Sigmoid Function

$$\mathrm{Sigmoid}(a) = \frac{1}{1 + \exp(-a)}$$

One Neuron for Classification of $K > 2$ Classes: SoftMax Function

Prediction:

$$\mathrm{score}_{ik} = w_k^\top x_i + b_k$$

$$p_{i,k} = \mathrm{SoftMax}(\mathrm{score}_i)_k = \frac{\exp(\mathrm{score}_{ik})}{\sum_{k'=1}^{K} \exp(\mathrm{score}_{ik'})}$$

$y_{i,k} = 1 \Leftrightarrow x_i$ belongs to the $k$-th class
$y_{i,k} = 0 \Leftrightarrow x_i$ does not belong to the $k$-th class

Loss:

$$\ell(W, b) = \prod_{i=1}^{n} \prod_{k=1}^{K} p_{i,k}^{y_{i,k}}$$

$$L(W, b) = -\frac{1}{n} \log(\ell(W, b)) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{K} y_{i,k} \log(p_{i,k})$$
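The same idea for $K$ classes, as a minimal NumPy sketch (subtracting the row maximum before exponentiating is a standard numerical-stability trick, not part of the formula):

```python
import numpy as np

def softmax(scores):
    """Row-wise SoftMax: p_{i,k} = exp(s_{ik}) / sum_{k'} exp(s_{ik'})."""
    z = np.exp(scores - scores.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def cross_entropy(X, Y, W, b, eps=1e-12):
    """L(W, b): negative mean log-likelihood for one-hot labels Y."""
    P = softmax(X @ W.T + b)  # P[i, k] = p_{i,k}
    return -np.mean(np.sum(Y * np.log(P + eps), axis=1))

# Toy data: n=100 points, D=3 dims, K=4 classes with one-hot labels
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Y = np.eye(4)[rng.integers(0, 4, size=100)]
print(cross_entropy(X, Y, np.zeros((4, 3)), np.zeros(4)))  # log(4) here
```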


Batch Gradient Descent: The Common Problem

Loss function:

$$L(W, b) = \frac{1}{n} \sum_{i=1}^{n} L_i(W, b)$$

Problem:

$$\min_{W, b} L(W, b)$$

Batch Gradient Descent: A Universal Learning Procedure

$$\min_{w} \frac{1}{n} \sum_{i=1}^{n} L_i(w)$$

1. Choose a random $w$ and a constant $\alpha > 0$
2. Iterate:

$$w_{\text{new}} = w_{\text{old}} - \alpha \nabla L(w_{\text{old}}), \qquad \nabla L(w_{\text{old}}) = \frac{1}{n} \sum_{i=1}^{n} \nabla L_i(w_{\text{old}})$$
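A minimal NumPy sketch of this procedure on the least-mean-squares neuron, where the per-example gradients have a simple closed form (data, learning rate, and iteration count are arbitrary):

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.1, n_iters=500):
    """Minimize L(w, b) = (1/n) * sum_i (w . x_i + b - y_i)^2
    by following the full-batch gradient at every step."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        residual = X @ w + b - y                   # shape (n,)
        w -= alpha * (2.0 / n) * (X.T @ residual)  # w_new = w_old - alpha * grad_w
        b -= alpha * (2.0 / n) * residual.sum()    # same update for the bias
    return w, b

# Toy data generated with w_true = [1, -2], b_true = 0.5
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.0, -2.0]) + 0.5
print(batch_gradient_descent(X, y))  # recovers roughly ([1, -2], 0.5)
```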

Stochastic Gradient Descent: A Universal Learning Procedure

$$\min_{w} \frac{1}{n} \sum_{i=1}^{n} L_i(w)$$

1. Choose a random $w$ and a constant $\alpha > 0$
2. Iterate:
   2.1 Choose a random subset $J \subset \{1, \dots, n\}$ (sometimes reduced to a singleton)
   2.2 Update:

$$w_{\text{new}} = w_{\text{old}} - \frac{\alpha}{|J|} \sum_{j \in J} \nabla L_j(w_{\text{old}})$$
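The stochastic variant only changes the inner step: each update uses a small random subset $J$ instead of all $n$ examples. A minimal NumPy sketch on the same least-mean-squares problem:

```python
import numpy as np

def sgd(X, y, alpha=0.05, batch_size=8, n_iters=3000):
    """Stochastic gradient descent: each step estimates the gradient
    from a random subset J of the training set."""
    rng = np.random.default_rng(0)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        J = rng.choice(n, size=batch_size, replace=False)  # random subset J
        residual = X[J] @ w + b - y[J]
        w -= alpha * (2.0 / batch_size) * (X[J].T @ residual)
        b -= alpha * (2.0 / batch_size) * residual.sum()
    return w, b

# Same toy problem as above: each step now costs O(|J|) instead of O(n),
# which is what makes the method scale to very large training sets.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.0, -2.0]) + 0.5
print(sgd(X, y))  # close to ([1, -2], 0.5), up to gradient noise
```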


Several Neurons: The Power of Back-Propagation

Figure: A Multi-Layer Perceptron (input layer $x_i^1, \dots, x_i^4$, one hidden layer, output layer producing $p_i$)

Several Neurons: The Power of Back-Propagation

Back-Propagation is just an iterated version of the Chain Rule, applied to compositions of many functions:

$$(F \circ G)' = \left( F' \circ G \right) \times G'$$

NB: $(F \circ G)(x) = F(G(x))$
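A minimal NumPy sketch of back-propagation through one hidden layer: the forward pass composes the layers, and the backward pass applies the chain rule above to each one in reverse (architecture, target, and learning rate are arbitrary choices for the example):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                        # n=100 inputs, D=4
y = (X[:, 0] * X[:, 1] > 0).astype(float)            # a non-linear target
W1, b1 = 0.5 * rng.normal(size=(8, 4)), np.zeros(8)  # hidden layer
W2, b2 = 0.5 * rng.normal(size=8), 0.0               # output neuron

alpha = 0.5
for _ in range(2000):
    # Forward pass: the network is a composition F(G(x))
    h = sigmoid(X @ W1.T + b1)        # hidden activations, G(x)
    p = sigmoid(h @ W2 + b2)          # output, F(G(x))
    # Backward pass: chain rule from the squared loss down to each weight
    dp = (2.0 / len(X)) * (p - y)                # dL/dp
    ds = dp * p * (1 - p)                        # through the output sigmoid
    dh = np.outer(ds, W2) * h * (1 - h)          # through the hidden sigmoid
    W2 -= alpha * (h.T @ ds); b2 -= alpha * ds.sum()
    W1 -= alpha * (dh.T @ X); b1 -= alpha * dh.sum(axis=0)

print(((p > 0.5) == y).mean())  # training accuracy after the last pass
```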

Three Remarks

1. Non-linearity: Sigmoid, SoftMax, ReLU, with $\mathrm{ReLU}(x) = \max(x, 0)$
2. Automatic Differentiation thanks to: Theano, Torch, Caffe, TensorFlow, PyTorch (see the sketch after this list)
3. GPU Acceleration
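As an illustration of remark 2, a minimal sketch with PyTorch (any of the frameworks above would do; the composed function being differentiated is an arbitrary example):

```python
import torch

# Automatic differentiation: build a computation from tensors that
# require gradients, call backward(), and the framework runs the
# chain rule for you -- no hand-derived gradients needed.
w = torch.tensor([1.0, -2.0], requires_grad=True)
x = torch.tensor([0.5, 3.0])

loss = torch.sigmoid(w @ x).pow(2)  # an arbitrary composition of functions
loss.backward()                     # back-propagation in one call

print(w.grad)  # dloss/dw, computed automatically
```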


Convolutional Neural Networks for Images: Convolutions

- $x$: a pixel
- $I$: an image in gray levels
- $K$: a kernel = a filter
- $I * K$: convolution of image $I$ by filter $K$
- $\mathrm{nonLinearity}(I * K)$: element-wise non-linearity on the convolution result, producing a feature map

$$(I * K)(x) = \sum_{y \in \mathrm{Supp}(K)} I(x - y) K(y)$$

The same neuron of weights $K$ is applied many times (as many times as there are pixels in $I$), producing a new image called a feature map.
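A minimal NumPy sketch of this convolution (border handling varies between libraries; here the kernel only slides over positions where it fully fits, a "valid" convolution, and the kernel flip implements the $I(x - y) K(y)$ indexing):

```python
import numpy as np

def conv2d(I, K):
    """Convolve gray-level image I with kernel K ('valid' borders):
    the same weights K are applied at every pixel position."""
    kh, kw = K.shape
    Kf = K[::-1, ::-1]  # flip the kernel: sum of I(x - y) * K(y)
    H, W = I.shape[0] - kh + 1, I.shape[1] - kw + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(I[i:i + kh, j:j + kw] * Kf)
    return out

# Example: a vertical-edge filter on a random "image"; the element-wise
# ReLU on the result gives one feature map.
rng = np.random.default_rng(0)
I = rng.random((28, 28))
K = np.array([[1.0, 0.0, -1.0]] * 3)         # 3x3 edge-detection kernel
feature_map = np.maximum(conv2d(I, K), 0.0)  # nonLinearity(I * K)
print(feature_map.shape)                     # (26, 26)
```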

Convolutional Neural Networks for Images: Convolutions

Figure: LeNet architecture


Adversarial Networks: A Desired Network

Figure: Scheme for a Desired Network ($x$ or $z$ → Generator → $\hat{y}$)

Adversarial Networks: Binary Classification Networks

Figure: Scheme for Binary Classification Networks ($y$ or $\hat{y}$ → Discriminator → $p$)

Adversarial Networks: The Full System

Figure: Scheme for Adversarial Networks ($x$ or $z$ → Generator → $\hat{y}$; $\hat{y}$ or $y$ → Discriminator → $p$)

Adversarial Networks: A New Kind of Loss

- $G$: Generator (e.g. of images) from random noise or a real image
- $D$: Discriminator that distinguishes fake examples from real examples

$$\min_{w_D} \max_{w_G} L$$

Figure: Adversarial Networks Example
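A minimal PyTorch sketch of this min-max game on 1-D data (the architectures, data distribution, and hyper-parameters are arbitrary choices; the generator step uses the common "non-saturating" reformulation of its maximization):

```python
import torch
import torch.nn as nn

# G maps noise z to fake samples; D outputs p = P(example is real).
# D minimizes the binary classification loss L; G tries to maximize it.
G = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = 0.5 * torch.randn(32, 1) + 2.0   # "real" data: N(2, 0.5^2)
    fake = G(torch.randn(32, 4))            # generated samples
    # Discriminator step: push D(real) towards 1 and D(fake) towards 0
    d_loss = bce(D(real), torch.ones(32, 1)) + \
             bce(D(fake.detach()), torch.zeros(32, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()
    # Generator step: fool D by pushing D(fake) towards 1
    g_loss = bce(D(G(torch.randn(32, 4))), torch.ones(32, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()

print(G(torch.randn(1000, 4)).mean().item())  # should drift towards ~2
```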


Conclusion: A Great Book

Figure: The Deep Learning Book (Goodfellow, Bengio and Courville, MIT Press, 2016)

Recommended.