Overview of Convolutional Neural network
Seoul National University Deep Learning September-December, 2019 1 / 54
Overview of Convolutional Neural network Artificial Neural Networks
Perceptron: Building block
• The perceptron was intended to be a machine rather than a program, and the perceptron machine was designed for image recognition with an array of 400 photocells.
• The perceptron is an algorithm for a binary classifier: $f(x) = 1$ if $w \cdot x + b > 0$, and $0$ otherwise.
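The decision rule above can be sketched in a few lines of plain Python. This is only an illustration: the weights `w = [1.0, 1.0]` and bias `b = -1.5` are hypothetical values chosen so that the perceptron computes a logical AND; they do not come from the slides.

```python
def perceptron(x, w, b):
    """Binary perceptron: return 1 if w . x + b > 0, otherwise 0."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if activation > 0 else 0


# Hypothetical weights and bias that implement a logical AND of two binary inputs.
w, b = [1.0, 1.0], -1.5
inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
outputs = [perceptron(x, w, b) for x in inputs]
print(outputs)  # [0, 0, 0, 1]
```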
Single-layered neural network
• The perceptron model is called a single-layered neural network.
An example of learned filters or weights for input images
• Note that the filter size is the same as the input size.
Multi-layered feedforward neural network
figure from slides of Andrej Karpathy
Feedforward neural networks take input x and predict
$P(y = 1 \mid x, \theta) = f_k(\cdots f_3(f_2(f_1(x; \theta_1); \theta_2); \theta_3) \cdots; \theta_k)$.
Layers of Artificial Neural Network (ANN)
$f_l(\cdot)$ is commonly a repeated composition of linear and nonlinear transformations.
The goal is to estimate an invariant function in a compositional manner.
A unit of layers is composed of known and unknown transformations.
Convolutional layer: at the $l$th layer, $Z^l = W^l h^{l-1} + b^l$, where $h^0 = x$.
$W$ = filters, $Z^l$ = neurons. The $W$'s and $b$'s are unknown and are to be estimated (trained).
Pooling layer
Activation layer: $h^l = g_l(Z^l)$, a nonlinear transformation.
The last layer: softmax: $h^K_i = \exp(Z_i) / \sum_{l=1}^{k} \exp(Z_l)$.
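A minimal sketch of the softmax in the last layer, in plain Python. Subtracting the maximum before exponentiating is a common numerical-stability trick that is not on the slide; it leaves the result unchanged. The input scores are made-up values.

```python
import math


def softmax(z):
    """Map a list of scores z to probabilities exp(z_i) / sum_l exp(z_l)."""
    m = max(z)  # shifting by the max avoids overflow and does not change the ratios
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])  # hypothetical scores for three classes
print(probs)
```

The outputs are positive, sum to one, and preserve the ordering of the scores.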
Convolutional neural network (CNN)
A CNN is a special case of a feedforward neural network with locality and sharing restrictions.
This characteristic is referred to as ‘shift invariance’.
The restrictions reduce the number of parameters and help capture local characteristics.
Convolutional layer
figure from slides of Andrej Karpathy
Resulting output is a 28 by 28 activation map.
Convolutional layer
figure from slides of Andrej Karpathy
Apply 6 filters and obtain 6 activation maps.
Role of locality and sharing of convolutional layer
How do locality and sharing reduce the number of parameters?
If a 32x32x3 volume is mapped to a 28x28x6 volume, as in the figure, using a fully connected layer, the number of parameters is (32*32*3)*(28*28*6) ≈ 14.5 million.
With six 5x5 filters, we use only (5*5*3)*6 = 450 parameters.
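The two counts above can be reproduced with a short calculation. The helper names below are hypothetical, and bias terms are ignored, as on the slide.

```python
def fc_params(in_shape, out_shape):
    """Weights of a fully connected layer between two volumes (biases ignored)."""
    n_in = in_shape[0] * in_shape[1] * in_shape[2]
    n_out = out_shape[0] * out_shape[1] * out_shape[2]
    return n_in * n_out


def conv_params(filter_size, in_channels, n_filters):
    """Weights of a convolutional layer with square filters (biases ignored)."""
    return filter_size * filter_size * in_channels * n_filters


fc = fc_params((32, 32, 3), (28, 28, 6))
conv = conv_params(5, 3, 6)
print(fc, conv)  # 14450688 450
```

Under these assumptions the convolutional layer uses roughly 32,000 times fewer weights than the fully connected one.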
Pooling layer
figure from slides of Andrej Karpathy
Average pooling or max pooling shrinks the representations.
Recall that averaging or integration can extract invariant features of the images.
Example: integration over all rotations.
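As a small illustration of how pooling shrinks a representation, the sketch below applies 2x2 pooling with stride 2 to a made-up 4x4 input. The function name `pool2d` and the input values are assumptions for this example only.

```python
def pool2d(x, size=2, stride=2, mode="max"):
    """Pool a 2-D list of numbers with a size x size window and the given stride."""
    rows = (len(x) - size) // stride + 1
    cols = (len(x[0]) - size) // stride + 1
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            window = [x[i * stride + a][j * stride + b]
                      for a in range(size) for b in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out


x = [[1, 3, 2, 4],
     [5, 6, 1, 0],
     [7, 2, 9, 8],
     [3, 4, 6, 5]]
max_pooled = pool2d(x, mode="max")  # [[6, 4], [7, 9]]
avg_pooled = pool2d(x, mode="avg")  # [[3.75, 1.75], [4.0, 7.0]]
print(max_pooled, avg_pooled)
```

In both modes the 4x4 input becomes a 2x2 output.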
Activation layer
$\mathrm{sigm}(Z) = \frac{1}{1 + \exp(-Z)}$
$\tanh(Z)$
Rectified Linear Unit: $\mathrm{ReLU}(Z) = \max(Z, 0)$
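The three activation functions can be written directly in Python for scalar inputs. This is a sketch; `math.tanh` is the standard-library hyperbolic tangent, and the identity relating tanh and the sigmoid in the last comment is a standard fact rather than something stated on the slide.

```python
import math


def sigm(z):
    """Logistic sigmoid: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Rectified linear unit: max(z, 0)."""
    return max(z, 0.0)


for z in (-2.0, 0.0, 2.0):
    print(z, sigm(z), math.tanh(z), relu(z))

# tanh is a rescaled sigmoid: tanh(z) = 2 * sigm(2z) - 1
```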
Stacked layers
The first layer:
$Z^1 = W^1 h^0 + b^1$, where $h^0 = x$.
$h^1 = g_1(Z^1)$, where $g_1(\cdot)$ is the activation function.
The $l$th layer:
$Z^l = W^l h^{l-1} + b^l$
$h^l = g_l(Z^l)$
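The recursion above can be sketched as a loop over layers. The example uses dense weight matrices stored as nested lists rather than convolutions, and the weights, biases and input are hypothetical values picked so the result is easy to check by hand.

```python
def affine(W, h, b):
    """Compute Z = W h + b for a matrix W (list of rows), vector h and bias b."""
    return [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W, b)]


def relu(z):
    """Elementwise ReLU, used here as the activation g_l."""
    return [max(v, 0.0) for v in z]


def forward(x, layers):
    """Apply h^l = g_l(W^l h^{l-1} + b^l) for each layer, starting from h^0 = x."""
    h = x
    for W, b, g in layers:
        h = g(affine(W, h, b))
    return h


# Two hypothetical layers: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu),
    ([[1.0, 2.0]], [0.5], relu),
]
out = forward([2.0, 1.0], layers)
print(out)  # [2.5]
```

Here the first layer gives h^1 = [1.0, 0.5], and the second gives 1.0 + 2 * 0.5 + 0.5 = 2.5.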
Stride
• Shrink dimensions by subsampling.
Source: http://adeshpande4.github.io/A-Beginner%
Padding
Source: https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2
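The effect of stride and padding on the output size can be summarized with the commonly used formula floor((n + 2p - f) / s) + 1 for input width n, filter width f, padding p and stride s. The formula is not written on the slides; it is included here as a sketch, and the function name is hypothetical.

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Output width of a convolution or pooling window along one dimension."""
    return (n + 2 * pad - f) // stride + 1


print(conv_output_size(32, 5))                   # 28: a 5x5 filter on a 32x32 input
print(conv_output_size(32, 3, stride=1, pad=1))  # 32: padding 1 preserves the size for 3x3
print(conv_output_size(55, 3, stride=2))         # 27: 3x3 window with stride 2
print(conv_output_size(7, 3, stride=2))          # 3: stride 2 subsamples the output
```

With stride 1 and no padding the output shrinks by f - 1; padding restores the size, while a larger stride subsamples.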
Role of multiple layers via visualization
Different architectures
The popularity of CNNs was triggered by the debut of ‘AlexNet’ by Krizhevsky et al. (2012), which won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).
The ImageNet competition is an annual computer vision contest that began in 2010, after Li launched ImageNet, a free database of more than 14 million labeled images.
Successful training is due to a large dataset, computational power from GPUs, and some aspects of the algorithm.
Every year, new architectures and optimization tips have been proposed through the ImageNet competition and have improved classification accuracy. We cover AlexNet, VGGNet and ResNet.
AlexNet by Krizhevsky et al. (2012)
Start with 224x224x3 input. End with three fully connected layers.
layer  filter size (stride)  # filters  maxpool (stride)  output
1.1    11x11x3 (4)           48x2                         55x55x96
1.2                                     3x3 (2)           27x27x96
2.1    5x5x96                128x2                        27x27x256
2.2                                     3x3 (2)           13x13x256
3      3x3x256               192x2                        13x13x384
4      3x3x384               192x2                        13x13x384
5.1    3x3x384               128x2                        13x13x256
5.2                                     3x3 (2)           6x6x256 = 9216
AlexNet (Krizhevsky et al. 2012)
Used ReLU
Heavy data augmentation
Dropout
SGD with batch size 128 and momentum 0.9; the learning rate starts at 0.01 and is reduced manually.
Ensemble of 7 CNNs
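To make the optimizer settings above concrete, here is a minimal sketch of an SGD step with momentum, using the learning rate 0.01 and momentum 0.9 from the slide. It is a plain momentum update on a toy one-dimensional problem, not a reproduction of the AlexNet training procedure; the function name and the objective f(w) = w^2 are assumptions for illustration.

```python
def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9):
    """One momentum update: v <- momentum * v - lr * grad, then w <- w + v."""
    v_new = [momentum * vi - lr * gi for vi, gi in zip(v, grad)]
    w_new = [wi + vi for wi, vi in zip(w, v_new)]
    return w_new, v_new


# Toy problem (hypothetical): minimize f(w) = w^2, whose gradient is 2w.
w, v = [1.0], [0.0]
first_w, first_v = sgd_momentum_step(w, v, [2.0 * w[0]])
for _ in range(200):
    w, v = sgd_momentum_step(w, v, [2.0 * w[0]])
print(first_w, w)  # w moves from 1.0 toward the minimizer 0
```

In practice the gradient would be averaged over a mini-batch (128 images on the slide), and the learning rate would be lowered when progress stalls.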
VGGNet, OxfordNet (Simonyan and Zisserman, 2014)
Deeper model. More layers (16 layers excluding maxpool and softmax, compared to 5 convolutional and 3 fully connected layers for AlexNet).
Simpler structure.
Only 3x3 filters with stride 1 and pad 1, and 2x2 maxpool with stride 2, are used.
The number of filters is multiplied by two from block to block (64, 128, 256, 512).
Source: https://blog.heuritech.com/2016/02/29
VGGNet, OxfordNet (Simonyan and Zisserman, 2014)
Table: Structure of VGGNet
block  # conv or fully connected layers  filter size  # filters or units   followed by
1      2 conv                            3x3          64                   maxpool
2      2 conv                            3x3          128                  maxpool
3      3 conv                            3x3          256                  maxpool
4      3 conv                            3x3          512                  maxpool
5      3 conv                            3x3          512                  maxpool
6      3 fully connected                              4096 (2), 1000 (1)   softmax
• maxpool after each block
• 140M parameters (heavy from FC layers)
VGGNet: Number of parameters and memory
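The parameter count can be checked from the block structure of the 16-layer model. The sketch below assumes a 224x224x3 input (so the last convolutional block outputs 7x7x512 after five 2x2 maxpools) and includes bias terms; both are assumptions not stated on the slides, and the function name is hypothetical.

```python
def vgg_param_count(input_hw=224, in_channels=3,
                    blocks=((2, 64), (2, 128), (3, 256), (3, 512), (3, 512)),
                    fc_sizes=(4096, 4096, 1000)):
    """Return (conv_params, fc_params) for a VGG-style network with 3x3 filters."""
    conv_total, c_in, hw = 0, in_channels, input_hw
    for n_layers, c_out in blocks:
        for _ in range(n_layers):
            conv_total += 3 * 3 * c_in * c_out + c_out  # weights + biases
            c_in = c_out
        hw //= 2  # 2x2 maxpool with stride 2 after each block
    fc_total, n_in = 0, hw * hw * c_in
    for n_out in fc_sizes:
        fc_total += n_in * n_out + n_out  # weights + biases
        n_in = n_out
    return conv_total, fc_total


conv_total, fc_total = vgg_param_count()
print(conv_total, fc_total, conv_total + fc_total)
# 14714688 123642856 138357544
```

Under these assumptions the total is about 138 million, and roughly 89% of it sits in the three fully connected layers.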
Role of a small filter
If we stack two 3x3 convolutional layers, a neuron in the second layer covers a 5x5 input region.
If we stack three 3x3 convolutional layers, a neuron in the third layer covers a 7x7 input region.
If the number of filters is C, a 7x7 layer needs Cx(7x7xC) parameters, while three 3x3 layers need 3xCx(3x3xC). Three 3x3 layers need fewer parameters and provide more nonlinearity.
How about an even smaller filter?
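The receptive-field and parameter arithmetic for stacked 3x3 layers can be verified with a short sketch. It assumes stride 1, C input and C output channels per layer, and no bias terms; the helper names and the choice C = 64 are illustrative.

```python
def receptive_field(n_layers, k=3):
    """Input region seen by one neuron after n stacked k x k, stride-1 layers."""
    rf = 1
    for _ in range(n_layers):
        rf += k - 1  # each layer widens the region by k - 1 pixels
    return rf


def stacked_params(n_layers, k, channels):
    """Weights of n stacked k x k layers with `channels` inputs and outputs each."""
    return n_layers * channels * (k * k * channels)


C = 64  # hypothetical number of filters
print(receptive_field(2), receptive_field(3))         # 5 7
print(stacked_params(1, 7, C), stacked_params(3, 3, C))  # 200704 110592
```

The ratio is 27/49 regardless of C, so three 3x3 layers use about 55% of the weights of one 7x7 layer.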
Role of a 1x1 filter
• For an HxWxC input, 1x1x(C/2) filters output HxWx(C/2) (with stride 1 and padding to preserve H, W).
• (1) 1x1x(C/2), (2) 3x3x(C/2), (3) 1x1xC vs. a single 3x3xC? The former needs fewer parameters and less computation, with more nonlinearity.
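A quick count makes the comparison concrete. The sketch below assumes the single layer maps C channels to C channels, ignores biases, and uses C = 256 as an illustrative value; the function names are hypothetical.

```python
def conv_weights(k, c_in, c_out):
    """Weights of one k x k convolution from c_in to c_out channels (no biases)."""
    return k * k * c_in * c_out


def bottleneck_weights(c):
    """1x1 (C -> C/2), then 3x3 (C/2 -> C/2), then 1x1 (C/2 -> C)."""
    half = c // 2
    return (conv_weights(1, c, half)
            + conv_weights(3, half, half)
            + conv_weights(1, half, c))


C = 256  # hypothetical channel count
single = conv_weights(3, C, C)
bottleneck = bottleneck_weights(C)
print(single, bottleneck)  # 589824 212992
```

In general the bottleneck needs 3.25 C^2 weights against 9 C^2 for the single 3x3 layer, and because every output position applies each weight once, the multiply count shrinks by the same factor.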
GoogLeNet (Szegedy et al., 2014)
Design a good local network topology and stack these modules.
Use of average pooling before the classification
Computationally expensive
Auxiliary classifiers connected to intermediate layers
ResNet (He, Zhang, Ren and Sun, 2015)
Is deeper always better? He et al. (2015) showed that deeper models can have higher training error than shallower models.
Instead of $f_2(f_1(x w_1) w_2)$ as in AlexNet or VGGNet, ResNet models the residual, i.e., $f_1(x w_1) + f_2(f_1(x w_1) w_2)$, so that setting $w_2 = 0$ reduces it to a shallow model (provided $f_2(0) = 0$, as with ReLU).
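The difference between the plain and residual forms can be sketched with small dense layers. Both activations are taken to be ReLU, the weight matrices are stored as lists of rows and applied as W h, and all numeric values are hypothetical.

```python
def relu(z):
    return [max(v, 0.0) for v in z]


def linear(h, W):
    """Multiply the matrix W (list of rows) by the vector h."""
    return [sum(w * v for w, v in zip(row, h)) for row in W]


def plain_block(x, W1, W2):
    """Plain stacking: f2(f1(x w1) w2)."""
    return relu(linear(relu(linear(x, W1)), W2))


def residual_block(x, W1, W2):
    """Residual form: f1(x w1) + f2(f1(x w1) w2)."""
    h1 = relu(linear(x, W1))
    return [a + b for a, b in zip(h1, relu(linear(h1, W2)))]


x = [1.0, -2.0]
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2_zero = [[0.0, 0.0], [0.0, 0.0]]
shallow = relu(linear(x, W1))
print(plain_block(x, W1, W2_zero), residual_block(x, W1, W2_zero), shallow)
```

With the second weight matrix set to zero, the residual block returns exactly the shallow model's output, whereas the plain block collapses to zero.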
ResNet
• 152-layer model
• Every residual block has 3x3 conv layers
• Periodically, double the number of filters and downsample spatially using stride 2
• Additional conv layer at the beginning
• No FC layers at the end
• For deeper networks (50+ layers), use a bottleneck layer to improve efficiency: 1x1 → 3x3 → 1x1
• No dropout
• Batch normalization
• No maxpooling
ResNet (He, Zhang, Ren and Sun, 2015)
Performance of various architectures
Source: Canziani, Culurciello and Paszke (2017)
Regularizations
In most cases, the number of parameters exceeds the number of training samples. To avoid overfitting, some regularization is necessary.
ReLU (non-negative thresholding operator)
Early stopping
L1, L2 penalty on weights
Dropout
Batch normalization
Data augmentation
Ensemble
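As one concrete example from the list above, dropout can be sketched as follows. This is the "inverted dropout" variant, which rescales the kept units during training so that nothing changes at test time; the slides do not specify a variant, so this choice, the function name, and the drop probability 0.5 are assumptions.

```python
import random


def dropout(h, p, rng, training=True):
    """Zero each unit with probability p and rescale the rest by 1 / (1 - p)."""
    if not training or p == 0.0:
        return list(h)  # at test time the activations pass through unchanged
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in h]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
h = [1.0] * 1000
dropped = dropout(h, 0.5, rng)
mean_activation = sum(dropped) / len(dropped)
print(mean_activation)  # close to 1.0: the expected activation is preserved
```

Each training pass samples a different mask, which can be viewed as training a large ensemble of thinned networks that share weights.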