DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

DENSELY CONNECTED CONVOLUTIONAL NETWORKS

Gao Huang*, Zhuang Liu*, Laurens van der Maaten, Kilian Q. Weinberger

CVPR 2017

Cornell University Tsinghua University

Facebook AI Research

Best paper award

CONVOLUTIONAL NETWORKS

LeNet AlexNet

VGG Inception

ResNet

STANDARD CONNECTIVITY

RESNET CONNECTIVITY

Deep residual learning for image recognition: [He, Zhang, Ren, Sun] (CVPR 2015)

Identity mappings promote gradient propagation.

: Element-wise addition

DENSE CONNECTIVITY

C C C C

: Channel-wise concatenation C

DENSE AND SLIM

k channels k channels k channels k channels

k : Growth Rate

C C C C

x4x2x1h1 h2 h3 h4x0

FORWARD PROPAGATION

x4x3x2x1x0

x0 x1x0x3x2x1 x0

x3x2x1x0 Batc

ReL

Con

COMPOSITE LAYER IN DENSENET

x5 =h5([x0, …, x4])

kchannels

x4

Batc

h N

orm

ReL

U

Con

volu

tion

(3x3

)

x3x2x1 x0x3x2x1x0

COMPOSITE LAYER IN DENSENET

Higher parameter and computational efficiency

WITH BOTTLENECK LAYER

lxkchannels

4xkchannels

Batc

h N

orm

ReL

U

Con

volu

tion

(3x3

)

Batc

h N

orm

ReL

U

Con

volu

tion

(1x1

)k

channels

x3x2x1 x0

x4

DENSENETC

onvo

lutio

n

Pool

ing

Con

volu

tion

Pool

ing

Con

volu

tion

Pool

ing

Line

ar

Dense Block 1 Dense Block 2 Dense Block 3

Pooling reduces feature map sizes

Output

Feature map sizes match within each block

ADVANTAGES OF DENSE CONNECTIVITY

ADVANTAGE 1: STRONG GRADIENT FLOW

Error Signal

Deeply supervised Net: [Lee, Xie, Gallagher, Zhang, Tu] (2015)

Implicit “deep supervision”

ADVANTAGE 2: PARAMETER & COMPUTATIONAL EFFICIENCY

ResNet connectivity:

DenseNet connectivity:

O(CxC)C C

lXkO(lxkxk)

k

Correlated features

k: Growth rate

#parameters:

k<<C

Diversified features

Input Output

Input

Output

hl

hl

ADVANTAGE 3: MAINTAINS LOW COMPLEXITY FEATURES

y = w4h4(x)w4

Classifier uses most complex (high level) features

classifier

Standard Connectivity:

h1(x) h2(x) h3(x) h4(x)x

Increasingly complex features

ADVANTAGE 3: MAINTAINS LOW COMPLEXITY FEATURESDense Connectivity:

h1(x) h2(x) h3(x) h4(x)x

C C C C

y = w0x + +w1h1(x) +w2h2(x) +w3h3(x) +w4h4(x)

classifier

w4

w0

w1

w2

w3

Classifier uses features of all complexity levels

Increasingly complex features

RESULTS

Test

Err

or (

%)

2.0

3.0

4.0

5.0

6.0

7.0

8.0

9.0

10.0

11.0

12.0

3.6

4.54.62

6.41

4.2

RESULTS ON CIFAR-10

ResNet (110 Layers, 1.7 M) ResNet (1001 Layers, 10.2 M)DenseNet (100 Layers, 0.8 M) DenseNet (250 Layers, 15.3 M)

Previous SOTA

2.0

3.0

4.0

5.0

6.0

7.0

8.0

9.0

10.0

11.0

12.0

5.25.9

10.5611.26

7.3

Previous SOTA

With data augmentation Without data augmentation

Test

Err

or (

%)

10.0

15.0

20.0

25.0

30.0

35.0

17.6

22.322.71

27.22

20.5

RESULTS ON CIFAR-100

ResNet (110 Layers, 1.7 M) ResNet (1001 Layers, 10.2 M)DenseNet (100 Layers, 0.8 M) DenseNet (250 Layers, 15.3 M)

Previous SOTA

10.0

15.0

20.0

25.0

30.0

35.0

19.6

24.2

33.4735.58

28.2

Previous SOTA

With data augmentation Without data augmentation

Top-

1 er

ror

(%)

20.0

22.0

24.0

26.0

28.0

GFLOPs

3 10 16 23 29

DenseNet ResNet

RESULTS ON IMAGENET

Top-

1 er

ror

(%)

20.0

22.0

24.0

26.0

28.0

# Parameters (M)

0 20 40 60 80

DenseNet ResNet

ResNet-152ResNet-101

ResNet-50

ResNet-34

ResNet-152ResNet-101

ResNet-50

ResNet-34

DenseNet-264(k=48)

DenseNet-264

DenseNet-201

DenseNet-169

DenseNet-121 DenseNet-121

DenseNet-169

DenseNet-201

DenseNet-264DenseNet-264(k=48)

Top-1: 20.27% Top-5: 5.17%

MULTI-SCALE DENSENET

Classifier 4Classifier 2 Classifier 3Classifier 1

…

…

…

…Classifier 2 Classifier 3Classifier 1

cat: 0.20.2 ≱ threshold

cat: 0.40.4 ≱ threshold

cat: 0.60.6 > threshold

Multi-Scale DenseNet: [Huang, Chen, Li, Wu, van der Maaten, Weinberger] (arXiv Preprint: 1703.09844)

(Preview)

MULTI-SCALE DENSENET

Classifier 4Classifier 2 Classifier 3Classifier 1

…

…

…

Test Input

“Easy” examples “Hard” examples

Inference Speed:~ 2.6x faster than ResNets~ 1.3x faster than DenseNets

(Preview)

…

Memory efficient Torch implementation:https://github.com/liuzhuang13/DenseNet

Our Caffe ImplementationOur memory-efficient Caffe Implementation.Our memory-efficient PyTorch Implementation.PyTorch Implementation by Andreas Veit.PyTorch Implementation by Brandon Amos.MXNet Implementation by Nicatio.MXNet Implementation (supports ImageNet) by Xiong Lin.Tensorflow Implementation by Yixuan Li.Tensorflow Implementation by Laurent Mazare.

Tensorflow Implementation (with BC structure) by Illarion Khlestov.Lasagne Implementation by Jan Schlüter.Keras Implementation by tdeboissiere.Keras Implementation by Roberto de Moura Estevão Filho.Keras Implementation (with BC structure) by Somshubra Majumdar.Chainer Implementation by Toshinori Hanya.Chainer Implementation by Yasunori Kudo.

Other implementations:

https://github.com/liuzhuang13/DenseNet

https://github.com/liuzhuang13/DenseNetCaffe

https://github.com/Tongcheng/DN_CaffeScript

https://github.com/gpleiss/efficient_densenet_pytorch

https://github.com/andreasveit/densenet-pytorch

https://github.com/bamos/densenet.pytorch

https://github.com/Nicatio/Densenet/tree/master/mxnet

https://github.com/bruinxiong/densenet.mxnet

https://github.com/YixuanLi/densenet-tensorflow

https://github.com/LaurentMazare/deep-models/tree/master/densenet

https://github.com/ikhlestov/vision_networks

https://github.com/Lasagne/Recipes/tree/master/papers/densenet

https://github.com/tdeboissiere/DeepLearningImplementations/tree/master/DenseNet

https://github.com/robertomest/convnet-study

https://github.com/titu1994/DenseNet

https://github.com/t-hanya/chainer-DenseNet

https://github.com/yasunorikudo/chainer-DenseNet

REFERENCES

• Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016• Chen-Yu Lee, et al. "Deeply-supervised nets" AISTATS 2015• Gao Huang, et al. "Deep networks with stochastic depth" ECCV 2016• Gao Huang, et al. "Multi-Scale Dense Convolutional Networks for Efficient Prediction" arXiv

preprint arXiv:1703.09844 (2017) • Geoff Pleiss, et al. "Memory-Efficient Implementation of DenseNets”, arXiv preprint arXiv:

1707.06990 (2017)

Documents

DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"