
Image Classification with Deep Neural Networks

Presented By:
Yogendra Tamang
Sabin Devkota

February 6, 2016

ImageNet Classification with Deep Convolutional Neural Networks
A. Krizhevsky, I. Sutskever, G. Hinton

#pwlnepal
PWL Kathmandu

Papers We Love, Kathmandu

Outline
• Background
• Image Classification
• Activation Functions
• Convolutional Neural Networks
• Other Issues
• Techniques used in other papers
• And so on…

Background
Machine learning problems are divided by the type of learning and the AI task:
• Supervised learning: classification and regression
• Unsupervised learning: clustering

Background
• Classification assigns data to one of a set of discrete classes, e.g. classifying digits. Cost functions for classification tasks include logistic regression and log-likelihood.
• Regression predicts continuous real-valued output, e.g. stock price prediction. A typical cost function for regression problems is MSE (mean squared error).


Introduction
• To a computer, an input image is just an array of numbers.
• Image classification assigns a label to an input image from a fixed set of categories.
• It is one of the core problems in computer vision.


Neural Networks and Multilayer Perceptrons

$a_i^{(j)}$: "activation" of unit $i$ in layer $j$; $\Theta^{(j)}$: matrix of weights controlling the function mapping from layer $j$ to layer $j+1$.

Each unit applies an activation function to its weighted inputs.
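As a concrete illustration of this layer-to-layer mapping, here is a minimal NumPy sketch of a forward pass through a multilayer perceptron; the layer sizes, random weights, and sigmoid activation are illustrative assumptions, not values from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, thetas):
    """Compute a^(j+1) = g(Theta^(j) a^(j)), prepending a bias unit at each layer."""
    a = x
    for theta in thetas:
        a = np.concatenate(([1.0], a))  # add the bias unit
        a = sigmoid(theta @ a)          # activation of the next layer
    return a

rng = np.random.default_rng(0)
# Illustrative sizes: 4 inputs -> 3 hidden units -> 2 outputs
thetas = [rng.normal(size=(3, 5)), rng.normal(size=(2, 4))]
print(forward(rng.normal(size=4), thetas))
```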

ReLU Activation
• The ReLU, $f(x) = \max(0, x)$, is a non-saturating non-linearity.
• When training with gradient descent, ReLUs are faster than saturating non-linearities such as the sigmoid and tanh functions.
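A small sketch of why this matters for gradient descent: the ReLU's gradient stays at 1 for any positive input, while tanh's gradient vanishes for large inputs. The example values are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)          # non-saturating non-linearity

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
relu_grad = (z > 0).astype(float)      # stays 1 for all positive z
tanh_grad = 1.0 - np.tanh(z) ** 2      # shrinks toward 0 for large |z| (saturation)
print(relu(z))       # [0. 0. 0. 1. 5.]
print(relu_grad)     # [0. 0. 0. 1. 1.]
print(tanh_grad)     # approx [0.0002 0.42 1.0 0.42 0.0002]
```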


Convolutional Neural Networks
• Replicated feature approach
• Local connectivity: receptive fields (RFs)
• Shared weights: the same filter is applied across the whole image
• Pooling: subsampling layers

A CNN consists of one or more convolutional layers followed by one or more fully connected layers, resulting in networks that are easy to train and have many fewer parameters than fully connected networks of the same depth (see the sketch below).
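A minimal NumPy sketch of the three ideas above (local receptive fields, shared weights, pooling); the 8×8 image, 3×3 averaging filter, and 2×2 pooling window are illustrative assumptions.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: one shared kernel slides over all receptive fields."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            # local receptive field, same (shared) weights at every position
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling (subsampling)."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.random.default_rng(0).normal(size=(8, 8))
kernel = np.ones((3, 3)) / 9.0                 # an averaging filter, for illustration
print(max_pool(conv2d(image, kernel)).shape)   # (3, 3)
```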

Convolution Layers (figure)

Pooling Layer (figure)

Basic Architecture (figure)


Learning a Classifier
• Gradient descent: calculate the cost (loss) function J(θ), calculate its gradient, and update the weights.
• Stochastic gradient descent (SGD): updates the weights after each example.
• Mini-batch SGD: updates the weights after each batch of examples.
• For classification we do not train with a squared-error measure; instead the output layer uses the softmax function.

The appropriate cost function is then the negative log-likelihood.
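A minimal mini-batch SGD skeleton in NumPy; the learning rate, batch size, and the toy squared-error regression problem are illustrative assumptions (per the Background slide, squared error fits regression, while classification uses softmax with the NLL cost shown next).

```python
import numpy as np

def minibatch_sgd(params, grad_fn, data, lr=0.1, batch_size=32, epochs=10):
    """Shuffle the data each epoch and update the weights after every batch."""
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            params = params - lr * grad_fn(params, batch)  # step down the gradient
    return params

# Toy regression with an MSE cost: gradient of mean (p - x)^2 is 2 * mean(p - x)
data = np.random.default_rng(1).normal(loc=3.0, size=(100, 1))
grad_fn = lambda p, batch: 2.0 * (p - batch).mean(axis=0)
print(minibatch_sgd(np.zeros(1), grad_fn, data))  # moves toward the data mean, ~3.0
```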

Learning a Classifier: Negative Log-Likelihood

$$\mathrm{NLL}(\theta, \mathcal{D}) = -\sum_{i=0}^{|\mathcal{D}|} \log P(Y = y^{(i)} \mid x^{(i)}, \theta)$$

where $\mathcal{D}$ is the dataset, $\theta$ is the weight parameter, $x^{(i)}$ is the $i$th training example, and $y^{(i)}$ is its target.
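A sketch of the softmax output and the NLL cost above in NumPy; the toy logits and target labels are made up for illustration.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(probs, y):
    """NLL(theta, D) = -sum_i log P(Y = y_i | x_i, theta)."""
    return -np.log(probs[np.arange(len(y)), y]).sum()

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])    # model outputs for two examples
y = np.array([0, 2])                    # their target classes
print(nll(softmax(logits), y))          # small when the model is confident and right
```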

Overfitting
When the number of training examples is small and the architecture is deep, the network performs well on training data but much worse on test data, i.e. it overfits.

Overfitting Mitigation
• Data augmentation: artificially creating more samples from existing data through various transformations of the images (rotation, reflection, skewing, etc.) and/or dividing images into small patches and averaging all their predictions.
• PCA color augmentation: applying PCA to the training examples to find the principal components of the pixel values, which correspond to the intensity and color of the illumination, then creating artificial data by adding randomly scaled eigenvectors to the training examples (a sketch follows this list).

• Dropout (detailed on the next slide).
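The PCA color augmentation above can be sketched as follows. The toy image batch is made up, and drawing one random scale per call (rather than one per image per epoch, as in the paper) is a simplifying assumption.

```python
import numpy as np

def pca_color_augment(images, sigma=0.1, rng=None):
    """Perturb images along the principal components of their RGB covariance."""
    rng = rng or np.random.default_rng()
    pixels = images.reshape(-1, 3)              # all RGB values in the set
    cov = np.cov(pixels, rowvar=False)          # 3x3 RGB covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    alpha = rng.normal(0.0, sigma, size=3)      # random scales for the eigenvectors
    shift = eigvecs @ (alpha * eigvals)         # randomly scaled eigenvector mix
    return images + shift                       # same color shift for every pixel

images = np.random.default_rng(0).random((10, 32, 32, 3))  # toy image batch
print(pca_color_augment(images).shape)                     # (10, 32, 32, 3)
```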


Dropout
• A technique to reduce overfitting: dropout prevents complex co-adaptation on the training data.
• Randomly omit each hidden unit with probability 0.5.
• It is like randomly sampling from 2^H architectures, where H is the number of units in a hidden layer.
• An efficient way to average many large neural nets.
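A minimal dropout sketch: each hidden unit is zeroed with probability 0.5 at training time, and activations are scaled at test time to approximate averaging over the 2^H sampled architectures. The toy activations are illustrative.

```python
import numpy as np

def dropout(h, p=0.5, train=True, rng=None):
    """Training: omit each unit with probability p (a random sub-network).
    Test: scale by (1 - p) to approximate averaging the 2^H sampled nets."""
    if not train:
        return h * (1.0 - p)
    rng = rng or np.random.default_rng()
    mask = rng.random(h.shape) >= p     # keep each unit with probability 1 - p
    return h * mask

h = np.ones(10)
print(dropout(h, rng=np.random.default_rng(0)))  # roughly half the units zeroed
print(dropout(h, train=False))                   # test time: all units, scaled by 0.5
```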

ImageNet Classification with Deep CNNs
• Improvement increases with larger datasets.
• Need a model with large learning capacity.
• A CNN's capacity can be controlled with depth and breadth.
• Best results in ILSVRC-2010 and ILSVRC-2012.

CNN Architecture Design
• 5 convolutional layers
• 3 fully connected layers
• The last output is a 1000-way softmax (a sketch follows).
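A sketch of this architecture in PyTorch with the layer sizes from the paper; assuming a 227×227 RGB input, and omitting local response normalization, the two-GPU split, and initialization details as simplifications.

```python
import torch
import torch.nn as nn

# 5 convolutional layers followed by 3 fully connected layers, 1000-way output.
alexnet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(), nn.MaxPool2d(3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(3, stride=2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(4096, 1000),                  # 1000-way output
)

x = torch.randn(1, 3, 227, 227)             # one RGB image (assumed input size)
print(alexnet(x).shape)                     # torch.Size([1, 1000])
```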

Training on Multiple GPUs
• Current GPUs are well suited to cross-GPU parallelization.
• Half of the neurons are put on each GPU, and the GPUs communicate only in certain layers.
• The connectivity pattern is chosen by cross-validation.

Local Response Normalization
The activity of neuron $i$ at position $(x, y)$ is normalized across adjacent channels to simulate the lateral inhibition found in real neurons.
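The paper's normalization is $b^i_{x,y} = a^i_{x,y} \big/ \big(k + \alpha \sum_{j=\max(0,\,i-n/2)}^{\min(N-1,\,i+n/2)} (a^j_{x,y})^2\big)^{\beta}$ with $k=2$, $n=5$, $\alpha=10^{-4}$, $\beta=0.75$. A NumPy sketch over a toy activation volume:

```python
import numpy as np

def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Normalize each channel by the squared activities of its n neighbors.
    Constants are the values reported in the AlexNet paper."""
    N = a.shape[0]                      # a has shape (channels, H, W)
    b = np.empty_like(a)
    for i in range(N):
        lo, hi = max(0, i - n // 2), min(N, i + n // 2 + 1)
        denom = (k + alpha * (a[lo:hi] ** 2).sum(axis=0)) ** beta
        b[i] = a[i] / denom
    return b

a = np.random.default_rng(0).random((8, 4, 4))   # toy: 8 channels, 4x4 map
print(local_response_norm(a).shape)              # (8, 4, 4)
```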

Datasets
• ILSVRC started in 2010 as part of the Pascal Visual Object Challenge.
• 1.2 million training images, 50K validation images, and 150K testing images.
• ILSVRC uses roughly 1000 images for each of 1000 categories.
• Two error measures: top-1 error and top-5 error. The top-5 error is the fraction of test images for which the correct label is not among the five best labels predicted by the model (see the sketch below).
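A sketch of the top-5 error computation; the random scores and labels are illustrative (random guessing over 1000 classes gives roughly 99.5% top-5 error).

```python
import numpy as np

def top5_error(scores, labels):
    """Fraction of examples whose correct label is not among the 5 highest scores."""
    top5 = np.argsort(scores, axis=1)[:, -5:]     # indices of the 5 best classes
    hits = (top5 == labels[:, None]).any(axis=1)  # correct label among them?
    return 1.0 - hits.mean()

rng = np.random.default_rng(0)
scores = rng.random((100, 1000))                  # toy scores over 1000 classes
labels = rng.integers(0, 1000, size=100)
print(top5_error(scores, labels))                 # ~0.995 for random scores
```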

Visualization of Learned Features

Tools used against overfitting: data augmentation and dropout.

Results


References
[1] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," 2012.
[2] T. N. Sainath, B. Kingsbury, A.-r. Mohamed and B. Ramabhadran, "Learning Filter Banks within a Deep Neural Network Framework," IEEE, 2013.
[3] A. Graves, A.-r. Mohamed and G. Hinton, "Speech Recognition with Deep Recurrent Neural Networks," University of Toronto.
[4] A. Graves, "Generating Sequences with Recurrent Neural Networks," arXiv, 2014.
[5] O. Vinyals and Q. V. Le, "A Neural Conversational Model," arXiv, 2015.
[6] R. Girshick, J. Donahue, T. Darrell and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," UC Berkeley.
[7] A. Karpathy, "CS231n Convolutional Neural Networks for Visual Recognition," Stanford University. [Online]. Available: http://cs231n.github.io/convolutional-networks/.
[8] I. Sutskever, "Training Recurrent Neural Networks," University of Toronto, 2013.
[9] "Convolutional Neural Networks (LeNet)." [Online]. Available: http://deeplearning.net/tutorial/lenet.html.
[10] E. Culurciello, A. Dundar, J. Jin and J. Bates, "An Analysis of the Connections Between Layers of Deep Neural Networks," arXiv, 2013.


[11] M. D. Zeiler and R. Fergus, "Visualizing and Understanding Convolutional Networks," arXiv, 2013.
[12] G. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever and R. Salakhutdinov, "Improving Neural Networks by Preventing Co-adaptation of Feature Detectors," Toronto: arXiv, 2012.
[13] A. Karpathy and L. Fei-Fei, "Deep Visual-Semantic Alignments for Generating Image Descriptions," Stanford University, 2014.
[14] O. Vinyals, A. Toshev, S. Bengio and D. Erhan, "Show and Tell: A Neural Image Caption Generator," Google Inc., 2014.
[15] I. Sutskever, J. Martens and G. E. Hinton, "Generating Text with Recurrent Neural Networks," in 28th International Conference on Machine Learning, Bellevue, 2011.
[16] "Theano." [Online]. Available: http://deeplearning.net/software/theano/index.html. [Accessed 27 10 2015].
[17] "What is GPU Computing?," NVIDIA. [Online]. Available: http://www.nvidia.com/object/what-is-gpu-computing.html. [Accessed 27 12 2015].
[18] "GeForce 820M | Specifications," NVIDIA. [Online]. Available: http://www.geforce.com/hardware/notebook-gpus/geforce-820m/specifications. [Accessed 28 10 2015].
