Introduction Visualization Training Details CNN Visualization Experiments Discussion
Visualizing and Understanding Convolutional Networks
Matthew D. Zeiler and Rob Fergus, 12 Nov 2013
Baek Gyuseung
Seoul National University
December 2, 2016
1 Introduction
2 Visualization
3 Training Details
4 CNN Visualization
5 Experiments
6 Discussion
Introduction
• Since their introduction by LeCun in the early 1990s, Convolutional Neural Networks (CNNs) have demonstrated excellent performance at image classification
• CNNs have developed continuously - Krizhevsky et al. won the ImageNet 2012 classification benchmark with their own CNN (AlexNet)
• There are several reasons that CNNs perform well - complex structure, parsimonious coefficients
• However, from a scientific standpoint, this is deeply unsatisfactory
• Little insight into the internal operation and behavior - we have no idea how they achieve such good performance
• Without a clear understanding of how and why they work, the development of better models is reduced to trial-and-error
• Visualization - reveals the input stimuli that excite individual feature maps at any layer in the model
• The result of visualization is not just crops of input images, but rather a top-down projection
Deconvolution Network
• Proposed by Zeiler et al.
• An approximate inverse mapping of a CNN
• No learning is involved - it simply uses an already trained CNN
• Unpooling: max pooling is non-invertible - approximately invert it by recording the locations of the maxima (switch variables)
• Rectification
• Filtering: use transposed versions of the same filters (like an auto-encoder)
• Contrast normalization is not needed
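The unpooling step above can be sketched in NumPy - a minimal 2D illustration, assuming non-overlapping 2x2 windows; the function names are my own, not from the paper:

```python
import numpy as np

def max_pool_with_switches(x, k=2):
    """k x k max pooling that also records the argmax locations ("switches")."""
    h, w = x.shape
    pooled = np.zeros((h // k, w // k))
    switches = np.zeros((h // k, w // k), dtype=int)  # flat index within each window
    for i in range(h // k):
        for j in range(w // k):
            win = x[i*k:(i+1)*k, j*k:(j+1)*k]
            switches[i, j] = win.argmax()
            pooled[i, j] = win.max()
    return pooled, switches

def unpool(pooled, switches, k=2):
    """Place each pooled value back at its recorded maximum location; zeros elsewhere."""
    h, w = pooled.shape
    out = np.zeros((h * k, w * k))
    for i in range(h):
        for j in range(w):
            r, c = divmod(switches[i, j], k)
            out[i*k + r, j*k + c] = pooled[i, j]
    return out
```

Only the positions of the maxima survive the round trip - everything else is reconstructed as zero, which is exactly the approximation the deconvnet accepts.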
Figure: Example of unpooling using switch variables
Training details
• Compared with AlexNet
• Data set - ImageNet 2012
• Model fitting by stochastic gradient descent
• mini-batch size: 128
• learning rate: 10^-2, momentum term: 0.9
• Dropout with a rate of 0.5
• initial weights: 10^-2, biases: 0
• produce multiple crops
• Renormalize each filter in the conv. layers whose RMS value exceeds a fixed radius of 10^-1 to this fixed radius
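The filter renormalization above can be sketched as follows - a minimal NumPy version, assuming filters are stacked along the first axis; the function name and shapes are illustrative, not from the paper:

```python
import numpy as np

def renormalize_filters(weights, radius=1e-1):
    """Rescale each filter whose RMS value exceeds `radius` back down to `radius`.

    `weights` has shape (num_filters, ...); each filter is treated independently,
    and filters already inside the radius are left untouched.
    """
    flat = weights.reshape(weights.shape[0], -1)
    rms = np.sqrt((flat ** 2).mean(axis=1))
    scale = np.where(rms > radius, radius / np.maximum(rms, 1e-12), 1.0)
    return weights * scale.reshape(-1, *([1] * (weights.ndim - 1)))
```

This acts as a crude constraint on filter magnitude, preventing any single filter from dominating during training.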
Architecture
A few differences from AlexNet:
• Sparse connections in AlexNet are replaced with dense connections
• The filter size and stride applied to the input image are adjusted
Feature Visualization
Figure: Top 9 activations in a random subset of feature maps across the validation data
Feature Evolution during Training
Figure: Evolution of features through training. The visualization shows the strongest activation for a given feature map at epochs [1, 2, 5, 10, 20, 30, 40, 64]
Feature invariance
Figure: Translation, scale, and rotation invariance of the CNN
Architecture selection
• (b) and (d) are obtained by visualizing AlexNet
• (b): too many features are dead / (d): aliasing artifacts
• Fixed by adjusting the filter size and stride of the 1st layer
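The effect of changing the first layer's filter size and stride can be checked with the standard convolution output-size formula - a sketch, where the 224-pixel input and the 11x11/stride-4 vs. 7x7/stride-2 settings are illustrative values, not quoted from the slides:

```python
def conv_output_size(input_size, kernel, stride, pad=0):
    """Spatial size of a conv layer's output feature map (valid integer strides)."""
    return (input_size + 2 * pad - kernel) // stride + 1

# Larger filters with a large stride produce a coarse first-layer map;
# smaller filters with a smaller stride retain much more spatial detail.
coarse = conv_output_size(224, kernel=11, stride=4)  # AlexNet-style first layer
fine = conv_output_size(224, kernel=7, stride=2)     # smaller filter, smaller stride
```

The finer map keeps more mid-frequency information in layer 1, which is one way to read the reduction in dead features and aliasing.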
Occlusion Sensitivity
Figure: (a) Original image (b) Strongest feature map, layer 5 (c) Visualization of (b) (d) Probability of correct class (e) Most probable class
The CNN identifies the location of the object in the image.
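The occlusion experiment can be sketched as a sliding grey patch - a minimal NumPy version, assuming a single-channel image and a `classifier` callable that returns the correct-class probability; both names are placeholders, not from the paper:

```python
import numpy as np

def occlusion_map(image, classifier, patch=8, stride=8, fill=0.5):
    """Slide a grey occluder over the image and record the correct-class
    probability at each occluder position. Low values mark regions the
    classifier depends on."""
    h, w = image.shape[:2]
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            occluded[i*stride:i*stride+patch, j*stride:j*stride+patch] = fill
            heat[i, j] = classifier(occluded)
    return heat
```

Plotting `heat` reproduces panels like (d): the probability drops sharply only when the occluder covers the object itself.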
Correspondence Analysis
• Occlude the same configuration (eyes and nose) in each image and calculate the Hamming distance
• ∆ = Σ_{i≠j} H(sign(ε_i), sign(ε_j)), where ε_i = x_i − x̃_i
• A lower value indicates greater consistency in the change resulting from the masking operation
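The ∆ measure above can be sketched directly - a minimal NumPy version, assuming `features` and `occluded_features` each hold one feature vector per image (original x_i and occluded x̃_i); the function name is my own:

```python
import numpy as np

def correspondence_delta(features, occluded_features):
    """Sum of Hamming distances between sign(eps_i) and sign(eps_j) over all
    ordered image pairs i != j, where eps_i = x_i - x~_i is the feature change
    caused by occluding image i."""
    eps = np.sign(features - occluded_features)  # one signed-change row per image
    n = eps.shape[0]
    delta = 0
    for i in range(n):
        for j in range(n):
            if i != j:
                delta += int((eps[i] != eps[j]).sum())  # Hamming distance
    return delta
```

If occluding the same part (e.g. the left eye) changes every image's features in the same direction, all sign vectors agree and ∆ is small.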
ImageNet 2012
Varying ImageNet Model Sizes
Feature Generalization - Caltech-101
• The foregoing results show that the features of a CNN represent unique topological properties
• Apply this image classifier, trained on ImageNet 2012, to other image datasets!
Caltech-256
PASCAL 2012
Feature Analysis
• Compare models having different numbers of layers
• As the feature hierarchies become deeper, they learn increasingly powerful features!
Discussion
• Introduced a novel way to visualize the activity within the model
• By visualizing the CNN, AlexNet was improved
• The CNN is highly sensitive to local structure
• Deep models show good performance
• An ImageNet-trained model can generalize well to other datasets
• Shortcoming of this visualization: it only visualizes a single activation, not the joint activity
Bibliography
• Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
• LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)
• Zeiler, M., Taylor, G., Fergus, R.: Adaptive deconvolutional networks for mid and high level feature learning. In: ICCV (2011)