Understanding Deep Image Representations by Inverting Them
Paper by Aravindh Mahendran and Andrea Vedaldi
Presentation by Anthony Chen
Background
● Feature extraction methods such as SIFT, HOG, and CNNs are widely used, but they are difficult to understand from an information-preservation standpoint: what does the representation keep about the input, and what does it discard?
Contributions
● A novel method to invert representations.
– That is, given a function and its output, recover the original input.
● Analysis of the information preservation of different types of representation (CNN, HOG, SIFT).
Related Work
● DeConvNets
Your thoughts on similarities/differences?
Related Work (2)
● DeConvNets – My thoughts
– DeConvNet reconstructions are encouraged to look like the original image, while this paper enforces no such constraint.
– Therefore, while both can be thought of as inverses, DeConvNet studies how results are obtained, whereas this paper studies information representation and preservation.
Inverting Images
● Let Φ : ℝ^(H×W×C) → ℝ^d be the function representing the CNN.
● Let x0 be the original image, with code Φ0 = Φ(x0).
● Goal: find an image x such that Φ(x) is close to Φ0.
Inverting Images (2)
● We want to find an image x, which we will call x*, such that:
  x* = argmin_x ℓ(Φ(x), Φ0) + λ·R(x),  x ∈ ℝ^(H×W×C)
● Here, we add a regularizer R(x) to ensure that the optimization searches only over “natural images” (a sketch of this search follows below).
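A minimal sketch of this optimization in PyTorch, assuming a differentiable feature extractor `phi` (the network truncated at the layer of interest) and hypothetical hyperparameters; the paper's actual schedule and regularizer weights differ:

```python
import torch

def invert(phi, x0, lam=1e-2, steps=200, lr=0.1):
    """Search for an image x whose code phi(x) matches phi(x0)."""
    phi0 = phi(x0).detach()                                  # target code Φ0
    x = (0.01 * torch.randn_like(x0)).requires_grad_(True)   # start from noise
    opt = torch.optim.SGD([x], lr=lr, momentum=0.9)
    for _ in range(steps):
        opt.zero_grad()
        loss = (phi(x) - phi0).pow(2).sum() / phi0.pow(2).sum()  # normalized loss
        reg = lam * x.abs().pow(6).sum()   # α-norm prior (α = 6); the paper also
        (loss + reg).backward()            # adds a total-variation term, omitted here
        opt.step()
    return x.detach()
```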
Inverting Images (3)
● Given a candidate reconstruction x, the reconstruction error is the Euclidean distance between codes:
  ℓ(Φ(x), Φ0) = ||Φ(x) − Φ0||²
● Additional modification: to ensure that the loss near the solution is bounded in a [0, 1) range, it is normalized by ||Φ0||² and the image is rescaled:
  ℓ = ||Φ(σ·x) − Φ0||² / ||Φ0||²
  where σ is the mean (Euclidean) norm of the images in our test set; rescaling by σ keeps the loss and regularizer terms comparable.
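A toy numeric check of why this normalization bounds the loss near the solution (made-up 2-D “codes”, not the paper's features):

```python
import numpy as np

phi0 = np.array([3.0, 4.0])              # target code, ||Φ0||² = 25
for phi_x in (np.array([3.0, 4.0]),      # perfect reconstruction
              np.array([2.0, 4.0]),      # nearby reconstruction
              np.zeros(2)):              # degenerate all-zero image
    err = np.sum((phi_x - phi0) ** 2) / np.sum(phi0 ** 2)
    print(err)                           # 0.0, 0.04, 1.0
```

At the optimum the loss is exactly 0, and any reconstruction whose code is closer to Φ0 than the zero code scores below 1, so values near the solution stay in [0, 1).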
Regularizers
● Let x be a mean-subtracted image vector.
● The α-norm Rα(x) = ||x||α^α enforces the range of pixel values.
● Total variation RVβ(x):
– Penalizes images with large total gradients.
– Discrete version (both regularizers are sketched in code below):
  RVβ(x) = Σ over i,j of ((x(i,j+1) − x(i,j))² + (x(i+1,j) − x(i,j))²)^(β/2)
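A minimal NumPy sketch of both regularizers, assuming x is an H×W grayscale image (the paper applies them to full color images):

```python
import numpy as np

def alpha_norm(x, alpha=6):
    """Rα(x) = ||x||α^α on the mean-subtracted image: penalizes large pixels."""
    x = x - x.mean()
    return np.sum(np.abs(x) ** alpha)

def total_variation(x, beta=2):
    """Discrete RVβ(x): penalizes images with large total gradients."""
    dx = x[:, 1:] - x[:, :-1]            # horizontal finite differences
    dy = x[1:, :] - x[:-1, :]            # vertical finite differences
    # sum over the interior region where both differences exist
    return np.sum((dx[:-1, :] ** 2 + dy[:, :-1] ** 2) ** (beta / 2))
```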
Regularizers (2)
● Rα allows us to set the range of the pixel values: for large α the penalty acts like a soft constraint on the maximum pixel magnitude, so scaling λα appropriately confines pixels to roughly [−B, B].
● RVβ allows us to say how much spatial variability the reconstruction should have.
Final objective function

E(x) = ||Φ(σ·x) − Φ0||² / ||Φ0||² + λα·Rα(x) + λVβ·RVβ(x)

(the normalized loss combined with the two regularizers defined above)
Optimization
● Momentum-based gradient descent is used to minimize the objective function.
● The momentum has a decay factor of 0.9:
  μ(t+1) = 0.9·μ(t) − ηt·∇E(x(t)),  x(t+1) = x(t) + μ(t+1)
● Because the CNN's function is differentiable, the objective is easy to optimize; HOG and SIFT are not natively differentiable, so they are reimplemented as CNNs (next slides).
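A sketch of the momentum update above, with a hypothetical `grad_E` returning the gradient of the objective; the paper's learning-rate schedule is omitted:

```python
import numpy as np

def momentum_descent(grad_E, x, lr=0.01, m=0.9, steps=1000):
    """Gradient descent with momentum: mu accumulates past gradients,
    decaying by the factor m = 0.9 at each step."""
    mu = np.zeros_like(x)
    for _ in range(steps):
        mu = m * mu - lr * grad_E(x)  # decay old momentum, add new gradient step
        x = x + mu                    # move along the smoothed direction
    return x
```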
Representations: CNN
Representations: SIFT and HOG
● DSIFT and HOG are implemented with a CNN architecture, which makes their gradients easy to compute.
● Orientation binning is approximated using a ReLU layer (see the sketch after this list).
● Pooling into cell histograms is done by a linear filter.
● Cell blocks are then normalized by a normalization layer.
● Maximum values are then capped using a ReLU unit.
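A rough sketch of the ReLU-based orientation binning step above; the directional filters, bin count, and exact approximation used in the paper may differ:

```python
import numpy as np

def orientation_binning(gx, gy, num_bins=8):
    """Soft-assign image gradients (gx, gy) to orientation bins.

    Projecting the gradient onto each bin's direction and rectifying with
    a ReLU approximates hard binning: only gradients roughly aligned with
    a bin contribute to it."""
    responses = []
    for k in range(num_bins):
        theta = 2.0 * np.pi * k / num_bins
        proj = gx * np.cos(theta) + gy * np.sin(theta)  # directional response
        responses.append(np.maximum(proj, 0.0))         # ReLU gating
    return np.stack(responses)  # shape: (num_bins, H, W)
```

Pooling these per-bin maps into cell histograms is then just a linear (box) filter, and block normalization is a pointwise normalization layer, as the remaining bullets describe.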
Results
● Normalized reconstruction error: ||Φ(x*) − Φ0|| / NΦ
● NΦ is the normalization constant: the average pairwise Euclidean distance computed across 100 test images.
● Hyperparameters: λα = 2.16×10⁸, λVβ = 5, β = 2.
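A sketch of this error measure, assuming NΦ is computed over the representation vectors of the test images (one vector per image):

```python
import numpy as np

def normalization_constant(reps):
    """NΦ: average pairwise Euclidean distance over the test set."""
    n = len(reps)
    dists = [np.linalg.norm(reps[i] - reps[j])
             for i in range(n) for j in range(i + 1, n)]
    return np.mean(dists)

def normalized_error(phi_xstar, phi0, n_phi):
    """Reconstruction error, scaled to be comparable across representations."""
    return np.linalg.norm(phi_xstar - phi0) / n_phi
```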
Results: SIFT and HOG
● Using bilinear orientation binning (the “b” in HOGb) greatly improves the reconstruction.
Results: SIFT and HOG (2)
Results: CNN
● Experiments are run allowing different amounts of total variation:
– λ1 = 0.5, λ2 = 5, λ3 = 50 (increasing weight on the TV regularizer)
Test Images
Results: CNN (2)
Results: CNN (3)
● Reconstructing the image from only a subset of the network's layers illustrates what that subset encodes.
Results (4): Variance in Reconstruction
Effects of parameter tuning
Decreasing the regularization constant leads to higher-variance reconstructions. These images are indistinguishable to the representation (they yield nearly identical codes) and still achieve good reconstruction errors.
Future Work
● Use this inversion technique to guide improvements to CNN architectures.
● Apply the technique to other kinds of neural networks (e.g., LSTMs)?
Conclusion
● This paper provides a novel method to study and visualize information preservation in a CNN.
● It formalizes the relationship between CNNs and shallow feature representations such as HOG and SIFT.