
Pruning Convolutional Neural Networks for Resource Efficient Inference

Presented by: Kaushalya Madhawa, 27th January 2017

Molchanov, Pavlo, et al. "Pruning Convolutional Neural Networks for Resource Efficient Transfer Learning." arXiv preprint arXiv:1611.06440 (2016).


The paper


● Will be presented at ICLR 2017 - 24-26th April

● Anonymous reviewer ratings
○ 9
○ 6
○ 6

https://openreview.net/forum?id=SJGCiw5gl


Optimizing neural networks

Goal: running trained neural networks on mobile devices

1. Designing optimized networks from scratch

2. Optimizing pre-trained networks

e.g., Deep Compression (Han et al.)


Optimizing pre-trained neural networks

Reasons for pruning pre-trained networks

Transfer learning: fine-tuning an existing deep neural network previously trained on a larger, related dataset results in higher accuracy

Objectives of pruning:

Improving the speed of inference

Reducing the size of the trained model

Better generalization


Which parameters should be pruned?

Saliency: a measure of importance

Parameters with the least saliency are deleted.
“Magnitude equals saliency”: parameters with smaller magnitudes have low saliency.

Criteria for pruning:

Magnitude of weight

A convolutional kernel with a low ℓ2 norm detects less important features than one with a high norm.

Magnitude of activation

If an activation value is small, the corresponding feature detector is not important for the prediction task.

Prune the parameters that have the least effect on the trained model.



Contributions of this paper

A new saliency measure based on a first-order Taylor expansion of the loss function

A significant reduction in floating point operations (FLOPs) without a significant loss in accuracy

Oracle pruning as a general method for comparing network pruning criteria


Pruning as an optimization problem

Find a subset of parameters which preserves the accuracy of the trained network

Impractical to solve this combinatorial optimization problem for current networks

e.g., VGG-16 has 4,224 convolutional feature maps
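A compact restatement of the problem (notation sketched here following the paper's setup, with W the original parameters, W' the pruned parameters, D the data, and C the cost function):

$\min_{\mathcal{W}'} \; \big|\, \mathcal{C}(\mathcal{D} \mid \mathcal{W}') - \mathcal{C}(\mathcal{D} \mid \mathcal{W}) \,\big| \quad \text{subject to} \quad \|\mathcal{W}'\|_0 \le B$

where B is the budget of remaining non-zero parameters. Selecting the best subset exactly is combinatorial: even deciding which of VGG-16's 4,224 feature maps to keep gives 2^4224 candidate subsets.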


Taylor series approximation

A Taylor expansion is used to approximate the change in the loss function caused by removing a particular parameter h_i

Parameters are assumed to be independent

First order Taylor polynomial
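Written out (reconstructed from the paper; h_i is the activation of the unit under consideration and C the cost function), the first-order criterion is the absolute value of the activation multiplied by the gradient of the cost with respect to it:

$\Theta_{TE}(h_i) = \big| \Delta \mathcal{C}(h_i) \big| = \big| \mathcal{C}(\mathcal{D}, h_i = 0) - \mathcal{C}(\mathcal{D}, h_i) \big| \approx \Big| \frac{\partial \mathcal{C}}{\partial h_i} \, h_i \Big|$

For a feature map, the activation-gradient product is averaged over the map's spatial positions, and the resulting criterion is averaged over the mini-batch. Both quantities are available from a standard backward pass, which is what makes the criterion cheap to evaluate.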


Optimal Brain Damage (Le Cun et al., 1990)

The change in the loss function is approximated by a second-order Taylor polynomial


The effects of parameters are assumed to be independent

Parameter pruning is performed once training has converged

OBD is 30 times slower than the proposed Taylor method for saliency calculation
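For comparison, OBD's second-order saliency (sketched here from LeCun et al.'s formulation, with w_i a weight and h_ii the corresponding diagonal element of the Hessian of the cost) drops the first-order term, which vanishes at a converged solution, as well as the cross terms:

$\Delta \mathcal{C} \approx \tfrac{1}{2} \sum_i h_{ii} \, \delta w_i^2 \;\;\Rightarrow\;\; s_i = \tfrac{1}{2} \, h_{ii} \, w_i^2$

Estimating the diagonal of the Hessian is the step that accounts for the difference in cost.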


Experiments

Data sets

Flowers-102

Birds-200

ImageNet

Implemented using Theano

Layerwise ℓ2 normalization of the criterion

FLOPs regularization

Feature maps in different layers require different amounts of computation, depending on the number of input feature maps and the kernel size (both steps are sketched below)
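Sketching these two steps (notation assumed here; Θ(z_l^(k)) denotes the raw criterion for feature map k of layer l): the criterion is re-scaled by the ℓ2-norm of all criteria in the same layer, and a per-layer FLOPs cost can then be subtracted with a small weight λ:

$\hat{\Theta}(z_l^{(k)}) = \frac{\Theta(z_l^{(k)})}{\sqrt{\sum_j \Theta(z_l^{(j)})^2}}, \qquad \Theta_{\text{flops}}(z_l^{(k)}) = \hat{\Theta}(z_l^{(k)}) - \lambda \, \Theta^{l}_{\text{flops}}$

Without the normalization, layers with systematically larger criterion values would never be pruned; the FLOPs term biases pruning towards feature maps that are expensive to compute.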


Experiments...

Compared against

Oracle pruning: the effect of removing each parameter is computed, and the parameter with the least effect on the cost function is pruned at each iteration

Optimal Brain Damage (OBD)

Minimum weight

Magnitude of activation (mean or standard deviation of the activations)

Average Percentage of Zeros (APoZ): neurons with a low average percentage of positive activations are pruned (Hu et al., 2016); see the sketch after this list


Feature maps in the first few layers have similar APoZ values regardless of the network’s target task
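A minimal sketch of the APoZ computation for one convolutional layer (NumPy; the batch x channels x height x width activation layout is an assumption, and this is not the authors' code):

import numpy as np

def apoz_per_channel(activations: np.ndarray) -> np.ndarray:
    """Average Percentage of Zeros for each feature map.

    activations: post-ReLU outputs of one layer,
                 shape (batch, channels, height, width).
    Channels with a high APoZ (i.e. a low percentage of positive
    activations) are candidates for pruning.
    """
    zeros = (activations == 0)            # where the ReLU output is exactly zero
    return zeros.mean(axis=(0, 2, 3))     # average over batch and spatial positions

# Example with random post-ReLU activations for 8 feature maps.
acts = np.maximum(np.random.randn(32, 8, 14, 14), 0.0)
print(apoz_per_channel(acts))             # one APoZ value per channel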


Results

The Spearman rank correlation against the oracle ranking is calculated for each criterion
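As an illustration of this comparison (using scipy.stats.spearmanr; the saliency values below are made up for the example, not taken from the paper):

import numpy as np
from scipy.stats import spearmanr

# Hypothetical saliency scores for six feature maps under two criteria.
oracle_scores = np.array([0.90, 0.10, 0.40, 0.75, 0.05, 0.60])  # true loss increase when removed
taylor_scores = np.array([0.80, 0.15, 0.35, 0.70, 0.08, 0.55])  # first-order Taylor criterion

# Spearman's rho compares the orderings, which is what matters for pruning:
# it equals 1.0 when both criteria rank the feature maps identically.
rho, _ = spearmanr(oracle_scores, taylor_scores)
print(f"Spearman rank correlation vs. oracle: {rho:.2f}")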


Layerwise contribution to the loss

Oracle pruning of VGG-16 trained on the Birds-200 dataset

Layers with max-pooling tend to be more important than those without (layers 2, 4, 7, 10, and 13)


Importance of normalization across layers


Pruning VGG-16 (Simonyan & Zisserman, 2015)


[Figure: test accuracy versus the number of remaining parameters and versus FLOPs, comparing the Taylor criterion, OBD, and a network with 50% of the original parameters trained from scratch]

● Pruning of feature maps in VGG-16 trained on the Birds-200 dataset (30 mini-batch SGD updates after pruning a feature map)
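A simplified sketch of this iterative procedure is shown below in PyTorch (the framework choice is an assumption; the authors used Theano). Rather than physically removing a feature map, it zeroes the map's filters and keeps them masked, which is enough to illustrate the compute-saliency / prune / brief-fine-tune loop; model, layer, mask, loader, loss_fn, and optimizer are assumed to exist in the caller.

import torch
import torch.nn as nn

def taylor_saliency(acts: torch.Tensor, grads: torch.Tensor) -> torch.Tensor:
    """First-order Taylor criterion, one value per feature map.

    acts, grads: activations of a conv layer and the gradient of the loss
    w.r.t. them, both of shape (batch, channels, height, width).
    """
    # Per example: average activation*gradient over spatial positions,
    # take the absolute value, then average over the batch.
    return (acts * grads).mean(dim=(2, 3)).abs().mean(dim=0)

def prune_one_feature_map(model: nn.Module, layer: nn.Conv2d, mask: torch.Tensor,
                          loader, loss_fn, optimizer, finetune_updates: int = 30) -> int:
    """Score the layer's maps, zero the least salient one, then fine-tune briefly."""
    captured = {}

    def forward_hook(_module, _inputs, output):
        output.retain_grad()          # keep the gradient of this non-leaf tensor
        captured["acts"] = output

    handle = layer.register_forward_hook(forward_hook)

    # Estimate saliency from a single mini-batch (the paper averages over more data).
    inputs, targets = next(iter(loader))
    optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()
    acts = captured["acts"]
    saliency = taylor_saliency(acts.detach(), acts.grad.detach())
    handle.remove()

    # Never re-select an already-pruned map, then zero out the least salient one.
    saliency[mask == 0] = float("inf")
    victim = int(saliency.argmin())
    mask[victim] = 0.0
    with torch.no_grad():
        layer.weight[victim].zero_()
        if layer.bias is not None:
            layer.bias[victim] = 0.0

    # A few mini-batch SGD updates after each pruning step, as described above.
    for _ in range(finetune_updates):
        inputs, targets = next(iter(loader))
        optimizer.zero_grad()
        loss_fn(model(inputs), targets).backward()
        optimizer.step()
        with torch.no_grad():         # keep pruned maps at exactly zero
            layer.weight[mask == 0] = 0.0
            if layer.bias is not None:
                layer.bias[mask == 0] = 0.0
    return victim

In the full pipeline this step is repeated, with saliencies re-estimated after every fine-tuning phase, until the desired trade-off between accuracy and FLOPs is reached.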


Pruning AlexNet (Krizhevsky et al., 2012)

● Pruning of feature maps in AlexNet trained on the Flowers-102 dataset (10 mini-batch SGD updates after pruning a feature map)


Speedup of networks pruned by Taylor criterion


● All experiments performed in Theano with cuDNN v5.1.0


Conclusion

An efficient saliency measure to decide which parameters can be pruned without a significant loss of accuracy

Provides a thorough evaluation of many aspects of network pruning

A theoretical explanation of how the gradient carries information about the magnitude of the activations is still needed


References

[1] Molchanov, Pavlo, et al. "Pruning Convolutional Neural Networks for Resource Efficient Transfer Learning." arXiv preprint arXiv:1611.06440, 2016.

[2] Hu, Hengyuan, Rui Peng, Yu-Wing Tai, and Chi-Keung Tang. "Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures." arXiv preprint arXiv:1607.03250, 2016.

[3] Han, Song, Huizi Mao, and William J. Dally. "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding." ICLR, 2016.

[4] Simonyan, Karen, and Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition." ICLR, 2015.

[5] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks." NIPS, 2012.
