Deep Residual Learning for Image Recognition
K. He, X. Zhang, S. Ren and J. Sun, Microsoft Research
ILSVRC 2015 and MS COCO 2015 winner
Article overview by Ilya Kuzovkin, Computational Neuroscience Seminar, University of Tartu, 2016


Page 1

Article overview by Ilya Kuzovkin

K. He, X. Zhang, S. Ren and J. Sun Microsoft Research

Computational Neuroscience Seminar University of Tartu

2016

Deep Residual Learning for Image Recognition ILSVRC 2015

MS COCO 2015 WINNER

Page 2

THE IDEA

Page 3

1000 classes

Pages 4-11

ILSVRC top-5 error by year:

2012: 8 layers, 15.31% error
2013: 9 layers, 2x params, 11.74% error
2014: 19 layers, 7.41% error
2015: ?

Is learning better networks as easy as stacking more layers?

Obstacle: vanishing / exploding gradients — largely addressed by normalized initialization and intermediate normalization.

Remaining obstacle: the degradation problem.

Pages 12-13

Degradation problem: “with the network depth increasing, accuracy gets saturated” (and then degrades rapidly).

Not caused by overfitting: the deeper network shows higher training error as well, not only higher test error.

Pages 14-21

[Diagram, built up across the slides: a network of 4 Conv layers is trained and tested, reaching accuracy X%. A deeper network constructed from the same 4 trained Conv layers plus 4 identity layers is tested and shows the same performance by construction. Yet an 8-Conv-layer network trained from scratch tests worse!]

“Our current solvers on hand are unable to find solutions that are comparably good or better than the constructed solution (or unable to do so in feasible time)”

“Solvers might have difficulties in approximating identity mappings by multiple nonlinear layers”

Add explicit identity connections and “solvers may simply drive the weights of the multiple nonlinear layers toward zero”
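The constructed-solution argument can be checked numerically. This is a toy sketch (dense layers standing in for convolutions, random "trained" weights), not the authors' code: stacking explicit identity layers on top of a shallow network leaves its outputs, and hence its accuracy, exactly unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def identity(x):
    return x

# A "trained" shallow network: 4 layers (dense here for simplicity).
weights = [rng.standard_normal((8, 8)) * 0.5 for _ in range(4)]

def shallow(x):
    for w in weights:
        x = relu(w @ x)
    return x

def constructed_deep(x):
    # The same 4 trained layers, followed by 4 explicit identity layers.
    x = shallow(x)
    for _ in range(4):
        x = identity(x)
    return x

x = rng.standard_normal(8)
# The deeper constructed network matches the shallow one exactly.
assert np.allclose(shallow(x), constructed_deep(x))
```

A solver training the deeper network from scratch would have to rediscover this solution, which, as the quotes above note, it fails to do in feasible time.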

Pages 22-24

Add explicit identity connections and “solvers may simply drive the weights of the multiple nonlinear layers toward zero”

H(x) is the true function we want to learn.

Let’s pretend we want to learn F(x) = H(x) − x instead.

The original function is then H(x) = F(x) + x.
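The reformulation can be written out directly (a minimal numpy sketch; `F` stands for the stacked nonlinear layers): the block fits the residual F(x) = H(x) − x and outputs F(x) + x. If the optimal H is close to the identity, the optimal F is close to zero — which is easy for the solver to reach.

```python
import numpy as np

def residual_block(x, F):
    """Output of a residual block: the learned residual plus the identity shortcut."""
    return F(x) + x

# If the weights of the nonlinear layers are driven to zero, F(x) == 0
# and the whole block reduces to the identity mapping.
zero_F = lambda x: np.zeros_like(x)
x = np.array([1.0, -2.0, 3.0])
assert np.allclose(residual_block(x, zero_F), x)
```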

Pages 25-26: [figure-only slides; no text content]

Page 27

Network can decide how deep it needs to be…

“The identity connections introduce neither extra parameter nor computation complexity”

Pages 28-29

2012: 8 layers, 15.31% error
2013: 9 layers, 2x params, 11.74% error
2014: 19 layers, 7.41% error
2015: 152 layers, 3.57% error

Page 30

EXPERIMENTS AND DETAILS

Pages 31-33

• Lots of convolutional 3x3 layers
• VGG complexity is 19.6 billion FLOPs; the 34-layer ResNet is 3.6 billion FLOPs

• Batch normalization
• SGD with batch size 256
• (up to) 600,000 iterations
• Learning rate 0.1, divided by 10 when error plateaus
• Momentum 0.9
• No dropout
• Weight decay 0.0001

• 1.28 million training images
• 50,000 validation images
• 100,000 test images

Pages 34-36

• The 34-layer ResNet has lower training error than the 18-layer ResNet. This indicates that the degradation problem is well addressed and accuracy gains are obtained from increased depth.

• The 34-layer ResNet reduces the top-1 error by 3.5%.

• The 18-layer ResNet converges faster; ResNet eases the optimization by providing faster convergence at the early stage.

Page 37

GOING DEEPER

Page 38

Due to time complexity, the usual building block is replaced by the bottleneck block.

The 50-, 101- and 152-layer ResNets are built from these blocks.
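The motivation for the bottleneck can be checked with a quick parameter count (a back-of-the-envelope sketch using the channel sizes from the paper: 1x1 layers shrink 256 channels to 64 before the expensive 3x3 convolution, then expand back):

```python
def conv_params(k, c_in, c_out):
    """Weight count of a k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

# Basic block on 256 channels: two 3x3 convolutions.
basic = 2 * conv_params(3, 256, 256)

# Bottleneck block: 1x1 reduce, 3x3 on 64 channels, 1x1 expand.
bottleneck = (conv_params(1, 256, 64)
              + conv_params(3, 64, 64)
              + conv_params(1, 64, 256))

print(basic, bottleneck)  # the bottleneck is roughly 17x cheaper
```

This is why depth can be tripled (50/101/152 layers) while keeping total FLOPs below VGG's.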

Pages 39-40: [figure-only slides; no text content]

Page 41

ANALYSIS ON CIFAR-10

Pages 42-43: [figure-only slides; no text content]

Page 44

ImageNet Classification 2015 1st 3.57% error

ImageNet Object Detection 2015 1st 194 / 200 categories

ImageNet Object Localization 2015 1st 9.02% error

COCO Detection 2015 1st 37.3%

COCO Segmentation 2015 1st 28.2%

http://research.microsoft.com/en-us/um/people/kahe/ilsvrc15/ilsvrc2015_deep_residual_learning_kaiminghe.pdf