Computer Vision - Neural Networks

Ing. Ivan Gruber Ph.D.

Department of Cybernetics
Faculty of Applied Sciences
University of West Bohemia

ESF project of the University of West Bohemia, reg. no. CZ.02.2.69/0.0/0.0/16 015/0002287

Content

• Neural Networks
  – General knowledge
  – Artificial neuron
• Neural network properties
  – Activation functions
  – Layers
  – Training
  – Parameters
• Important architectures


Deep neural networks

• The most popular machine learning technique nowadays
• Models are inspired by the biological brain
• Non-linearity is achieved by stacking layers with activation functions
• Many different neural network architectures
• Feedforward vs. recurrent
• Supervised vs. unsupervised learning
• Weights are updated via the back-propagation algorithm
• Advantages: end-to-end training (including feature extraction), state-of-the-art results
• Disadvantages: needs a huge amount of training data; choosing the correct architecture is a bit of an alchemy


Artificial neuron

• A biological neuron is composed of:
  – Soma - the body of the neuron
  – Axon - the output; each neuron has only one axon
  – Dendrites - the inputs; each neuron can have up to several thousand dendrites
  – Synapses - links between axons and dendrites; one-way gates with different synaptic strengths
  – Inputs (electrical impulses) are summed and sent down the axon if they exceed a certain threshold
• Artificial neuron (see the sketch below):
  – The strength of the synapses is modeled by the weights W
  – The threshold is ensured by the activation function f
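A minimal NumPy sketch of a single artificial neuron, assuming a vector input x, a weight vector w, a bias b, and a sigmoid chosen here as the activation f; all names are illustrative.

```python
import numpy as np

def sigmoid(xi):
    """Sigmoid activation used as the thresholding function f."""
    return 1.0 / (1.0 + np.exp(-xi))

def neuron_forward(x, w, b):
    """Single artificial neuron: weighted sum of inputs plus bias, then activation."""
    xi = np.dot(w, x) + b      # the weights w model the synaptic strengths
    return sigmoid(xi)

# toy usage
x = np.array([0.5, -1.2, 3.0])   # inputs (dendrites)
w = np.array([0.1, 0.4, -0.2])   # weights (synaptic strengths)
b = 0.05                         # bias (threshold shift)
print(neuron_forward(x, w, b))
```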


Activation function

• Defines the output of the neuron based on the input(s) and some fixed mathematical operation
• A neuron does not have to have an activation function
• Many different activation functions exist


Sigmoid Function

f(ξ) = 1 / (1 + e^(−ξ)),  where ξ = Σ_{i=1}^{n} (w_i^T x_i + b),  (1)

• Frequently used historically
• Two major drawbacks:
  – The function saturates outside roughly (−5, 5), which causes problems during back-propagation (vanishing gradients; illustrated in the sketch below)
  – It is not zero-centered, so the gradient during back-propagation will always be all positive or all negative
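A small NumPy sketch, assuming Equation (1): it evaluates the sigmoid and its derivative σ(ξ)(1 − σ(ξ)) to show how the gradient shrinks once |ξ| leaves roughly (−5, 5).

```python
import numpy as np

def sigmoid(xi):
    return 1.0 / (1.0 + np.exp(-xi))

def sigmoid_grad(xi):
    s = sigmoid(xi)
    return s * (1.0 - s)   # derivative used during back-propagation

for xi in [-10.0, -5.0, 0.0, 5.0, 10.0]:
    print(f"xi={xi:+.1f}  sigmoid={sigmoid(xi):.5f}  grad={sigmoid_grad(xi):.5f}")
# the gradient shrinks toward zero as |xi| grows - the saturation problem
```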


Tanh activation

f(ξ) = 2 / (1 + e^(−2ξ)) − 1,  (2)

• Zero-centered
• Saturation problem again


ReLU (Rectified Linear Unit)

f(ξ) = max(0, ξ),  (3)

• The most popular activation function
• Computational simplicity
• Danger of creating dead neurons
• Modifications: Leaky ReLU, PReLU (see the sketch below)
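A minimal NumPy sketch of ReLU and its Leaky ReLU modification; the small negative-side slope (0.01 here, an illustrative choice) keeps neurons from dying completely.

```python
import numpy as np

def relu(xi):
    return np.maximum(0.0, xi)

def leaky_relu(xi, alpha=0.01):
    # negative inputs get a small slope alpha instead of a hard zero
    return np.where(xi > 0, xi, alpha * xi)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))        # [0.  0.  0.  2.]
print(leaky_relu(x))  # [-0.03  -0.005  0.  2.]
```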


Softmax

f(ξ)_j = e^(ξ_j) / Σ_{k=1}^{N} e^(ξ_k),  (4)

• Used in the classification layer
• Converts raw values into posterior probabilities (see the sketch below)
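A sketch of Equation (4) in NumPy, with the usual max-subtraction trick for numerical stability (an implementation detail not on the slide).

```python
import numpy as np

def softmax(xi):
    """Convert a vector of raw scores into a posterior probability distribution."""
    shifted = xi - np.max(xi)    # stability trick: softmax is invariant to a constant shift
    exps = np.exp(shifted)
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs, probs.sum())        # the probabilities sum to 1
```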


Layers

• A NN is formed by connecting artificial neurons together (acyclically in the feedforward case)
• The final purpose and function of the ANN are determined by these connections (the architecture of the network), by the weights, and by the types of neurons (activation functions)
• Neurons are organized into distinct layers
• The most common ones:
  – Fully-connected layer
  – Convolution layer
  – Pooling layer
  – Regularization layer


Fully-Connected Layer

• Each neuron is connected to all neurons in the previous layer (see the sketch below)
• The most common layer
• (Optionally) the last few layers in CNNs
• Prone to over-fitting → used together with dropout
• Hyperparameters: number of neurons
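A minimal sketch of a fully-connected layer forward pass in NumPy: every output neuron sees every input, so the layer is a matrix-vector product plus bias (ReLU is used here only as an example activation; the sizes are illustrative).

```python
import numpy as np

def fully_connected(x, W, b):
    """x: (n_in,), W: (n_out, n_in), b: (n_out,) -> (n_out,)"""
    return np.maximum(0.0, W @ x + b)   # affine map followed by ReLU

rng = np.random.default_rng(0)
x = rng.normal(size=4)                  # 4 input neurons
W = rng.normal(size=(3, 4))             # 3 output neurons, each connected to all 4 inputs
b = np.zeros(3)
print(fully_connected(x, W, b))
```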


Convolution Layer

• Neurons are connected only to a local region of the previous layer
• The size of the region is a hyperparameter called the kernel size (or receptive field)
• The size of the convolution step is called the stride (usually 1)
• The layer can be imagined as a set of filters (see the sketch below)
• The number of filters is called the depth
• All neurons within the same filter share weights
• Hyperparameters: kernel size, stride, number of filters
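A naive single-channel 2D convolution sketch in NumPy (valid padding), illustrating the kernel size, the stride, and weight sharing; real convolution layers add channels, padding, and many filters.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Naive valid convolution of a 2D image with a single 2D kernel."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)   # the same weights are shared at every position
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)   # a simple vertical-edge detector
print(conv2d(image, edge_kernel, stride=1))
```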


Kernel’s properties

• Kernels in the first layers = low-level features (edge detectors, for example)
• Kernels in the middle layers = higher-level features
• Kernels at the end = class-specific features


Pooling Layer

• No trainable weights, no activation function
• Performs a specific mathematical operation over the related region (see the sketch below)
• Hyperparameters: kernel size, stride
• Typical operations: maximum, average
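A 2×2 max-pooling sketch in NumPy (stride equal to the kernel size, the common non-overlapping case); swapping np.max for np.mean would give average pooling.

```python
import numpy as np

def max_pool2d(x, k=2):
    """Non-overlapping k x k max pooling over a 2D feature map."""
    h, w = x.shape[0] // k * k, x.shape[1] // k * k    # crop to a multiple of k
    x = x[:h, :w].reshape(h // k, k, w // k, k)
    return x.max(axis=(1, 3))                          # no weights, just a maximum

fmap = np.array([[1., 3., 2., 0.],
                 [4., 6., 1., 1.],
                 [0., 2., 5., 7.],
                 [1., 1., 8., 2.]])
print(max_pool2d(fmap))   # [[6. 2.]
                          #  [2. 8.]]
```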


Dropout

• Regularization technique
• Prevents over-fitting
• During training, each neuron's output has a probability p of being ignored (see the sketch below)
• Hyperparameters: the probability p
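A sketch of (inverted) dropout in NumPy: during training each activation is zeroed with probability p and the survivors are rescaled by 1/(1 − p), so inference needs no change; the inverted-scaling detail is a common convention, not stated on the slide.

```python
import numpy as np

def dropout(activations, p=0.5, training=True, rng=np.random.default_rng(0)):
    """Inverted dropout: drop each unit with probability p during training."""
    if not training or p == 0.0:
        return activations                      # at test time the layer is the identity
    keep = rng.random(activations.shape) >= p   # mask of surviving neurons
    return activations * keep / (1.0 - p)       # rescale so the expected value is unchanged

h = np.ones(8)
print(dropout(h, p=0.5))   # roughly half the entries zeroed, the rest scaled to 2.0
```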


Loss layer

• The last layer of the NN
• In the case of classification or regression we try to find such values of ω_i that minimize a chosen criterion
• The criterion usually incorporates information from the teacher t

Classification criteria (see the sketch below):
• Binary cross-entropy:

  E_k = −Σ_i [ t_i log o_i + (1 − t_i) log(1 − o_i) ],  (5)

• Categorical cross-entropy (used with a softmax layer):

  E_k = −Σ_i t_{k,i} log o_{k,i},  (6)

Regression criterion: mean squared / absolute error

  E_k = Σ_i (t_i − o_i)²,  (7)
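NumPy sketches of the three criteria above, assuming o are the network outputs (probabilities or values) and t are the teacher targets; the small epsilon guarding the logarithms is an implementation detail, not from the slide.

```python
import numpy as np

EPS = 1e-12   # avoid log(0)

def binary_cross_entropy(t, o):
    """Eq. (5)."""
    o = np.clip(o, EPS, 1.0 - EPS)
    return -np.sum(t * np.log(o) + (1.0 - t) * np.log(1.0 - o))

def categorical_cross_entropy(t, o):
    """Eq. (6); t is one-hot, o is a softmax output."""
    return -np.sum(t * np.log(np.clip(o, EPS, 1.0)))

def squared_error(t, o):
    """Eq. (7); divide by the number of outputs for the mean squared error."""
    return np.sum((t - o) ** 2)

t = np.array([0.0, 1.0, 0.0])
o = np.array([0.2, 0.7, 0.1])
print(binary_cross_entropy(t, o), categorical_cross_entropy(t, o), squared_error(t, o))
```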


Back-propagation

• The most common training method for NNs
• Used in conjunction with an optimization method
• Algorithm steps (a worked sketch follows below):
  1. The forward pass - the NN predicts an output
  2. The error is calculated based on the loss function
  3. The backward pass - by recursive application of the chain rule, the gradient for the individual parameters (W, b) is calculated, i.e. the loss is back-propagated to the individual neurons
  4. Using the gradients, the parameter update is performed according to the optimization method
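A compact sketch of one back-propagation step on a tiny two-layer network (sigmoid hidden layer, linear output neuron, squared-error loss); the shapes and the learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3,))            # input
t = np.array([1.0])                  # teacher target
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
lr = 0.1

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# 1. forward pass
z1 = W1 @ x + b1
h = sigmoid(z1)
o = W2 @ h + b2                      # linear output neuron

# 2. error calculation (squared-error loss)
loss = np.sum((t - o) ** 2)

# 3. backward pass (recursive chain rule)
d_o = 2.0 * (o - t)                  # dL/do
dW2 = np.outer(d_o, h); db2 = d_o
d_h = W2.T @ d_o                     # propagate the loss to the hidden layer
d_z1 = d_h * h * (1.0 - h)           # through the sigmoid
dW1 = np.outer(d_z1, x); db1 = d_z1

# 4. parameter update (plain gradient descent step)
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
print(loss)
```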


Loss function

• Cross-entropy loss (classification tasks), always used with softmax:

  L_ce = −Σ_{i=1}^{N} t_i log p_i,  (8)

• Mean-squared error (regression tasks):

  L_2 = ||f − y||_2^2,  (9)

• Others:
  – Contrastive loss
  – Triplet loss
  – Angular softmax loss
  – Arc loss


Weight initialization

• Before the training process, it is necessary to initialize the parameters
• A non-trivial task
• A popular subject of research
• Common initializers (see the sketch below):
  – Zeros
  – Normal
  – Gaussian random variables (µ = 0 and σ = 0.01 … 10^−5)
  – Xavier (Glorot)
  – LeCun
  – etc.
• The parameter update is then performed by an optimizer
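A sketch of two of the initializers listed above, assuming a weight matrix of shape (fan_out, fan_in): a zero-mean Gaussian with a small σ, and the Xavier/Glorot uniform scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_init(fan_out, fan_in, sigma=0.01):
    """Zero-mean Gaussian random variables with a small standard deviation."""
    return rng.normal(0.0, sigma, size=(fan_out, fan_in))

def xavier_uniform_init(fan_out, fan_in):
    """Xavier (Glorot) uniform: keeps the activation variance roughly constant across layers."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

W1 = gaussian_init(128, 64)
W2 = xavier_uniform_init(128, 64)
print(W1.std(), W2.std())
```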


Stochastic Gradient Descent (SGD)

• First-order optimization algorithm
• The training data are divided into batches (due to memory limitations)
• The gradient descent step is computed over those batches (see the sketch below)

  ω_{t+1} = ω_t − γ_t Σ_{i=1}^{n} ∇L_i(ω_t),  (10)

• Advantages: low computational cost; the best results with the right learning rate policy
• Disadvantages: the necessity of finding the right learning rate policy
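A minimal sketch of Equation (10) for a linear least-squares model: the data are split into mini-batches and each update uses only the gradients of that batch; the model, batch size, and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))                 # training inputs
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=256)  # targets

w = np.zeros(3)                               # parameters omega
gamma = 0.1                                   # learning rate gamma_t (kept constant here)
batch_size = 32

for epoch in range(20):
    perm = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)   # gradient of the batch loss
        w = w - gamma * grad                            # SGD update, Eq. (10)

print(w)   # close to [2.0, -1.0, 0.5]
```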


Learning rate

• Effect of the learning rate
• Learning rate decay:
  – The learning rate changes during the training
  – Step decay
  – Exponential decay
  – Etc.


Momentum

• Improves results in most cases
• Weighted average of the newly computed gradient and the past gradients (see the sketch below)

  ω_{t+1} = ω_t + Δω_t = ω_t − γ_t ∇L(ω_t) + α Δω_{t−1},  (11)
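A sketch of the momentum update in Equation (11): the previous step Δω_{t−1} is reused, weighted by α (0.9 is a common but illustrative choice), so past gradients keep contributing to the current step.

```python
import numpy as np

def momentum_step(w, grad, prev_delta, gamma=0.01, alpha=0.9):
    """One update of Eq. (11): new step = -gamma * gradient + alpha * previous step."""
    delta = -gamma * grad + alpha * prev_delta
    return w + delta, delta

w = np.array([1.0, -2.0])
prev_delta = np.zeros_like(w)
for _ in range(3):
    grad = 2.0 * w                      # gradient of the toy loss ||w||^2
    w, prev_delta = momentum_step(w, grad, prev_delta)
print(w)
```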


Adaptive optimizers

• Change the learning rate adaptively
• Adagrad
• RMSprop
• Adam
• Etc.


AlexNet (2012)

• Winner of ImageNet 2012
• The first time a neural-network approach overcame the other approaches
• 5 convolutional layers + 3 fully-connected layers
• Innovations: ReLU non-linearity, dropout


VGGNet (2014)

• Usage of small kernels (3×3)
• Constant computational complexity across all convolutional layers
• State-of-the-art results


InceptionNet (2014)

• Large kernels are preferred for more global information, while smaller ones are preferred for local information
• The size of important objects can vary a lot
• Different operations are applied at the same depth = the Inception module
• Usage of 1×1 convolutions ("max pooling for the channel dimension")


Fully-Convolutional Networks

• No fully-connected layers
• A fully-connected layer has a huge number of parameters and is prone to overfitting
• Global average pooling is used instead of the last fully-connected layer
• Advantages:
  1. The correspondence between feature maps and categories is enforced
  2. Overfitting is avoided
  3. Global average pooling is more robust to spatial transformations
  4. Fully-convolutional networks have a great ability to encode localization without any further information


ResNet (2016)

• Problems with vanishing and exploding gradients
• The ease of learning is not the same for all transformations
• Inclusion of shortcut connections (see the sketch below)
• Winner of ImageNet 2015

  y = F(x, {W_i}) + x,  (12)
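A sketch of the residual connection in Equation (12) in NumPy: the residual branch F is here an arbitrary small two-layer transform (an illustrative stand-in, not the actual ResNet block), and its output is added to the unchanged input x through the shortcut.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(0.0, z)

def residual_block(x, W1, W2):
    """y = F(x, {W_i}) + x, with F a small two-layer transform (illustrative)."""
    f = W2 @ relu(W1 @ x)    # the residual branch F(x, {W1, W2})
    return f + x             # shortcut connection: add the input back

d = 8
x = rng.normal(size=d)
W1 = 0.1 * rng.normal(size=(d, d))
W2 = 0.1 * rng.normal(size=(d, d))
print(residual_block(x, W1, W2))
```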


Autoencoders

• Bow-tie structure:
  – Encoder - a fully-convolutional network
  – Decoder - a deconvolutional network
• Semantic segmentation tasks


Challenges, codes and examples

• Kaggle
• ImageNet
• Papers with Code
• CS231n: Convolutional Neural Networks for Visual Recognition
• CS231n: YouTube
• Andrew Ng's Coursera courses
• Siraj Raval - YouTube (a fraud and a thief, but still very informative)
• 3Blue1Brown
• Two Minute Papers
• Deep learning news


Thank you for your attention!

Questions?
