Understanding Deep Image Representations by Inverting Them. Paper by Aravindh Mahendran and Andrea Vedaldi. Presentation by Anthony Chen.

Slides: web.cs.ucdavis.edu/~yjlee/teaching/ecs289g-fall2015/anthony2.pdf



Understanding Deep Image Representations by Inverting Them

Paper by Aravindh Mahendran, Andrea Vedaldi

Presentation by Anthony Chen




● Feature extraction methods like SIFT, HOG, and CNNs are widely used, but they are difficult to understand from an information-preservation standpoint.




● Novel method to invert representations.

– That is, given a function and its output, recover the original input.

● Analysis of the information preservation of different types of representation (CNN, HOG, SIFT).



Related Work

● DeConvNets

Your thoughts on similarities/differences?



Related Work (2)

● DeConvNets – My thoughts

– DeConvNet reconstructions are encouraged to look like the original image, while this paper enforces no such constraint.

– Therefore, while both can be thought of as inverses, DeConvNet studies how results are obtained, whereas this paper studies information representation/preservation.



Inverting Images

● $\Phi$ is the function representing the CNN.

● Let $x_0$ be the original image.

● Goal: find an $x$ such that $\Phi(x)$ is close to $\Phi(x_0)$.



Inverting Images (2)

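● We want to find an $x$, which we will call $x^*$, such that $x^* = \operatorname{argmin}_{x} \; \ell(\Phi(x), \Phi_0) + \lambda R(x)$, where $\Phi_0 = \Phi(x_0)$ and $\ell$ is a loss comparing the two representations.

● Here, we add a regularizer $R(x)$ so that the optimization favors “natural images”.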



Inverting Images (3)

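● Given an image reconstruction $x$, the reconstruction error is given by: $\ell(\Phi(x), \Phi_0) = \|\Phi(x) - \Phi_0\|_2^2$, where $\Phi_0 = \Phi(x_0)$.

● Additional modification to ensure that the loss near the solution is bounded in a [0, 1) range: $\|\Phi(\sigma x) - \Phi_0\|_2^2 / \|\Phi_0\|_2^2$

where $\sigma$ is the average Euclidean norm of natural images in a training set.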
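
A minimal NumPy sketch of the normalized loss, assuming the two representations are already available as flat vectors: the squared Euclidean distance is divided by the squared norm of the target representation. The function name `normalized_loss` and the toy vectors are illustrative only, and the rescaling of the image by sigma is left out for brevity.

```python
import numpy as np

def normalized_loss(phi_x, phi_0):
    """Squared Euclidean distance between phi_x and phi_0, divided by ||phi_0||^2."""
    phi_x = np.asarray(phi_x, dtype=float)
    phi_0 = np.asarray(phi_0, dtype=float)
    return float(np.sum((phi_x - phi_0) ** 2) / np.sum(phi_0 ** 2))

phi_0 = np.array([3.0, 4.0])            # toy target representation (hypothetical values)
exact = normalized_loss(phi_0, phi_0)   # zero at an exact match
near = normalized_loss(0.9 * phi_0, phi_0)  # small value close to the solution
print(exact, near)
```

Dividing by the norm of the target keeps the loss on a comparable scale across layers whose activations differ widely in magnitude.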




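Regularizers

● Let $x$ be a mean-subtracted image vector.

● $\alpha$-norm: $R_\alpha(x) = \|x\|_\alpha^\alpha$ enforces range.

● Total variation: $R_{V^\beta}(f) = \int_\Omega \left( \left(\frac{\partial f}{\partial u}\right)^2 + \left(\frac{\partial f}{\partial v}\right)^2 \right)^{\beta/2} du\,dv$

– Penalizes images with large total gradients.

– Discrete version: $R_{V^\beta}(x) = \sum_{i,j} \left( (x_{i,j+1} - x_{i,j})^2 + (x_{i+1,j} - x_{i,j})^2 \right)^{\beta/2}$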
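
The two regularizers can be sketched in NumPy as follows, assuming a single-channel image stored as a 2-D array. The range term sums |x|^alpha over all pixels, and the total variation term sums (horizontal difference squared + vertical difference squared)^(beta/2) using forward finite differences. The function names and the default exponents are choices made for this sketch, not values taken from the slides.

```python
import numpy as np

def alpha_norm(x, alpha=6):
    """Range regularizer: sum of |x|^alpha over all pixels (alpha default is an assumption)."""
    return float(np.sum(np.abs(x) ** alpha))

def total_variation(x, beta=2):
    """Discrete total variation of a 2-D image using forward differences.

    The last row and column are dropped so that both differences exist
    at every position that is summed.
    """
    dx = x[:-1, 1:] - x[:-1, :-1]   # x[i, j+1] - x[i, j]
    dy = x[1:, :-1] - x[:-1, :-1]   # x[i+1, j] - x[i, j]
    return float(np.sum((dx ** 2 + dy ** 2) ** (beta / 2)))

flat = np.zeros((4, 4))                   # constant image, no gradients
ramp = np.tile(np.arange(3.0), (3, 1))    # each row is [0, 1, 2]
print(total_variation(flat), total_variation(ramp))
```

A constant image has zero total variation, while the ramp is penalized for its horizontal gradient.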



Regularizers (2)

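● $\lambda_\alpha$ allows us to set the range of the pixel values. If we want to set the range between [-B, B], then $\lambda_\alpha \approx \sigma^\alpha / (H W B^\alpha)$ for an $H \times W$ image.

● $\lambda_{V^\beta}$ allows us to say how much variability the reconstruction should have.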
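
As a rough illustration of how the range weight could be chosen, the sketch below assumes the rule of thumb lambda_alpha ≈ sigma^alpha / (H W B^alpha), where sigma is the image scaling constant and H and W are the image height and width. With that choice, a normalized image whose pixels all sit at the bound B / sigma contributes a penalty of about 1. The function name `alpha_weight` and all numeric values are hypothetical.

```python
import numpy as np

def alpha_weight(sigma, height, width, bound, alpha=6):
    """Weight for the range regularizer under the assumed rule of thumb."""
    return sigma ** alpha / (height * width * bound ** alpha)

# Hypothetical values chosen only to exercise the formula.
sigma, height, width, bound = 100.0, 4, 4, 128.0
lam = alpha_weight(sigma, height, width, bound)

# A normalized image with every pixel at the bound B / sigma.
x = np.full((height, width), bound / sigma)
penalty = lam * np.sum(np.abs(x) ** 6)
print(penalty)  # approximately 1
```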



Final objective function
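
Combining the normalized loss with the two regularizers from the preceding slides, the objective can be written in the following form. The notation is assumed here: $\Phi_0 = \Phi(x_0)$ is the target representation, $\sigma$ is the image scaling constant, $R_\alpha$ is the range regularizer, $R_{V^\beta}$ is the total variation regularizer, and $\lambda_\alpha$, $\lambda_{V^\beta}$ are their weights.

```latex
x^* = \operatorname*{argmin}_{x} \;
      \frac{\|\Phi(\sigma x) - \Phi_0\|_2^2}{\|\Phi_0\|_2^2}
      + \lambda_\alpha R_\alpha(x)
      + \lambda_{V^\beta} R_{V^\beta}(x)
```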




● Momentum-based gradient descent is used to minimize the objective function.

● The momentum has a decay factor of 0.9.

● Because the CNN is a differentiable function, its gradient is easy to compute; this is not the case for standard HOG and SIFT implementations. Therefore, HOG and SIFT are implemented as CNNs.
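
The optimization loop can be sketched on a toy problem. This is not the paper's setup: the representation is assumed to be a linear map Phi(x) = A x so the gradient can be written by hand, the regularizer is a squared 2-norm stand-in, and total variation is omitted. Only the momentum factor of 0.9 comes from the slide; the function name `invert`, the step size, and the test matrix are hypothetical.

```python
import numpy as np

def invert(A, phi_0, lam=1e-3, lr=0.1, momentum=0.9, steps=500, seed=0):
    """Minimize ||A x - phi_0||^2 / ||phi_0||^2 + lam * ||x||^2 with momentum."""
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=0.1, size=A.shape[1])   # random starting point
    v = np.zeros_like(x)                         # velocity (momentum buffer)
    norm = np.sum(phi_0 ** 2)
    for _ in range(steps):
        residual = A @ x - phi_0
        # Gradient of the normalized loss plus the squared-norm regularizer.
        grad = 2.0 * A.T @ residual / norm + 2.0 * lam * x
        v = momentum * v - lr * grad             # decay old velocity, add new step
        x = x + v
    return x

A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])   # toy "representation"
x0 = np.array([1.0, -1.0])                           # toy "original image"
x_star = invert(A, A @ x0)
print(x_star)  # close to x0, shrunk slightly by the regularizer
```

For a real network, the hand-written gradient would be replaced by backpropagation through the representation.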



Representations: CNN



Representations: SIFT and HOG

● DSIFT and HOG are implemented with a CNN architecture, which makes it easy to compute gradients.

● Orientation binning is approximated using a ReLU layer.

● Pooling into cell histograms is done by a linear filter.

● Cell blocks are then normalized by a normalization layer.

● Values are then clamped to a maximum using a ReLU unit.
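
To make the binning step concrete, here is one possible ReLU-based soft orientation binning in NumPy. The slides do not give the exact formula, so this is an assumption: each gradient vector is projected onto K bin directions (a linear filter), and a ReLU keeps only the bins within one bin width of the gradient's orientation, scaled by the gradient magnitude. The function name and the choice of 8 bins are hypothetical.

```python
import numpy as np

def soft_orientation_bins(gx, gy, num_bins=8):
    """Soft-assign gradient vectors to orientation bins using a ReLU."""
    angles = 2.0 * np.pi * np.arange(num_bins) / num_bins
    u = np.stack([np.cos(angles), np.sin(angles)], axis=-1)   # bin directions, shape (K, 2)
    g = np.stack([gx, gy], axis=-1)                           # gradients, shape (..., 2)
    proj = g @ u.T                                            # <g, u_k> for every bin
    mag = np.linalg.norm(g, axis=-1, keepdims=True)           # gradient magnitude
    a = np.cos(2.0 * np.pi / num_bins)                        # cosine of one bin width
    # ReLU: full response when aligned with a bin, zero one bin width away.
    return np.maximum(0.0, (proj - a * mag) / (1.0 - a))

h = soft_orientation_bins(np.array([1.0]), np.array([0.0]))
print(h.shape)   # one gradient, eight bins
```

Because every step is a linear filter, a norm, or a ReLU, the whole operation is differentiable almost everywhere, which is what makes gradient-based inversion possible.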




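● Normalized reconstruction error: $\|\Phi(x^*) - \Phi(x_i)\|_2 / N_\Phi$

● $N_\Phi$ is the normalization constant: the average pairwise Euclidean distance between representations $\Phi(x_i)$ across 100 images.

● $\lambda_\alpha = 2.16 \times 10^8$, $\lambda_{V^\beta} = 5$, $\beta = 2$.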
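
A small sketch of how the normalization constant and the normalized error might be computed, assuming the representations of the test images are stacked as rows of an array. Averaging over distinct pairs only is an assumption of this sketch, and the function names are illustrative.

```python
import numpy as np

def average_pairwise_distance(features):
    """Mean Euclidean distance over all distinct pairs of rows."""
    features = np.asarray(features, dtype=float)
    diffs = features[:, None, :] - features[None, :, :]   # all pairwise differences
    dists = np.linalg.norm(diffs, axis=-1)
    upper = np.triu_indices(len(features), k=1)           # each pair counted once
    return float(dists[upper].mean())

def normalized_error(phi_rec, phi_ref, n_phi):
    """Euclidean distance between two representations, divided by the constant."""
    diff = np.asarray(phi_rec, dtype=float) - np.asarray(phi_ref, dtype=float)
    return float(np.linalg.norm(diff) / n_phi)

n_phi = average_pairwise_distance([[0.0, 0.0], [3.0, 4.0]])   # toy features
print(n_phi, normalized_error([3.0, 4.0], [0.0, 0.0], n_phi))
```

An error of 1 then means the reconstruction is as far from its target as two unrelated images are from each other on average.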



Results: SIFT and HOG

● Using bilinear orientation assignment (HOGb) greatly improves HOG reconstruction.



Results: SIFT and HOG (2)



Results: CNN

● Experiments are run with different strengths of the total variation regularizer.

– λ1 = 0.5, λ2 = 5, λ3 = 50



Test Images



Results: CNN (2)



Results: CNN (3)

● Reconstruction from a subset of the network illustrates that subset's purpose.



Results (4): Variance in Reconstruction



Effects of parameter tuning

Decreasing the regularization constant leads to higher-variance reconstructions. These indistinguishable images still lead to low reconstruction errors.



Future Work

● Use this inverse technique to improve CNN architecture.

● Use this technique on other forms of neural networks (LSTM)?




● This paper provides a novel method to study and visualize information preservation in a CNN.

● Formalizes relationship between CNN and shallow feature representation.