Upload
others
View
7
Download
0
Embed Size (px)
Citation preview
DENSELY CONNECTED CONVOLUTIONAL NETWORKS
Gao Huang*, Zhuang Liu*, Laurens van der Maaten, Kilian Q. Weinberger
CVPR 2017
Cornell University Tsinghua University
Facebook AI Research
Best paper award
CONVOLUTIONAL NETWORKS
LeNet AlexNet
VGG Inception
ResNet
STANDARD CONNECTIVITY
RESNET CONNECTIVITY
Deep residual learning for image recognition: [He, Zhang, Ren, Sun] (CVPR 2015)
Identity mappings promote gradient propagation.
: Element-wise addition
DENSE CONNECTIVITY
C C C C
: Channel-wise concatenation C
DENSE AND SLIM
k channels k channels k channels k channels
k : Growth Rate
C C C C
x4x2x1h1 h2 h3 h4x0
FORWARD PROPAGATION
x4x3x2x1x0
x0 x1x0x3x2x1 x0
x3x2x1x0 Batc
ReL
Con
COMPOSITE LAYER IN DENSENET
x5 =h5([x0, …, x4])
kchannels
x4
Batc
h N
orm
ReL
U
Con
volu
tion
(3x3
)
x3x2x1 x0x3x2x1x0
COMPOSITE LAYER IN DENSENET
Higher parameter and computational efficiency
WITH BOTTLENECK LAYER
lxkchannels
4xkchannels
Batc
h N
orm
ReL
U
Con
volu
tion
(3x3
)
Batc
h N
orm
ReL
U
Con
volu
tion
(1x1
)k
channels
x3x2x1 x0
x4
DENSENETC
onvo
lutio
n
Pool
ing
Con
volu
tion
Pool
ing
Con
volu
tion
Pool
ing
Line
ar
Dense Block 1 Dense Block 2 Dense Block 3
Pooling reduces feature map sizes
Output
Feature map sizes match within each block
ADVANTAGES OF DENSE CONNECTIVITY
ADVANTAGE 1: STRONG GRADIENT FLOW
Error Signal
Deeply supervised Net: [Lee, Xie, Gallagher, Zhang, Tu] (2015)
Implicit “deep supervision”
ADVANTAGE 2: PARAMETER & COMPUTATIONAL EFFICIENCY
ResNet connectivity:
DenseNet connectivity:
O(CxC)C C
lXkO(lxkxk)
k
Correlated features
k: Growth rate
#parameters:
k<<C
Diversified features
Input Output
Input
Output
hl
hl
ADVANTAGE 3: MAINTAINS LOW COMPLEXITY FEATURES
y = w4h4(x)w4
Classifier uses most complex (high level) features
classifier
Standard Connectivity:
h1(x) h2(x) h3(x) h4(x)x
Increasingly complex features
ADVANTAGE 3: MAINTAINS LOW COMPLEXITY FEATURESDense Connectivity:
h1(x) h2(x) h3(x) h4(x)x
C C C C
y = w0x + +w1h1(x) +w2h2(x) +w3h3(x) +w4h4(x)
classifier
w4
w0
w1
w2
w3
Classifier uses features of all complexity levels
Increasingly complex features
RESULTS
Test
Err
or (
%)
2.0
3.0
4.0
5.0
6.0
7.0
8.0
9.0
10.0
11.0
12.0
3.6
4.54.62
6.41
4.2
RESULTS ON CIFAR-10
ResNet (110 Layers, 1.7 M) ResNet (1001 Layers, 10.2 M)DenseNet (100 Layers, 0.8 M) DenseNet (250 Layers, 15.3 M)
Previous SOTA
2.0
3.0
4.0
5.0
6.0
7.0
8.0
9.0
10.0
11.0
12.0
5.25.9
10.5611.26
7.3
Previous SOTA
With data augmentation Without data augmentation
Test
Err
or (
%)
10.0
15.0
20.0
25.0
30.0
35.0
17.6
22.322.71
27.22
20.5
RESULTS ON CIFAR-100
ResNet (110 Layers, 1.7 M) ResNet (1001 Layers, 10.2 M)DenseNet (100 Layers, 0.8 M) DenseNet (250 Layers, 15.3 M)
Previous SOTA
10.0
15.0
20.0
25.0
30.0
35.0
19.6
24.2
33.4735.58
28.2
Previous SOTA
With data augmentation Without data augmentation
Top-
1 er
ror
(%)
20.0
22.0
24.0
26.0
28.0
GFLOPs
3 10 16 23 29
DenseNet ResNet
RESULTS ON IMAGENET
Top-
1 er
ror
(%)
20.0
22.0
24.0
26.0
28.0
# Parameters (M)
0 20 40 60 80
DenseNet ResNet
ResNet-152ResNet-101
ResNet-50
ResNet-34
ResNet-152ResNet-101
ResNet-50
ResNet-34
DenseNet-264(k=48)
DenseNet-264
DenseNet-201
DenseNet-169
DenseNet-121 DenseNet-121
DenseNet-169
DenseNet-201
DenseNet-264DenseNet-264(k=48)
Top-1: 20.27% Top-5: 5.17%
MULTI-SCALE DENSENET
Classifier 4Classifier 2 Classifier 3Classifier 1
…
…
…
…Classifier 2 Classifier 3Classifier 1
cat: 0.20.2 ≱ threshold
cat: 0.40.4 ≱ threshold
cat: 0.60.6 > threshold
Multi-Scale DenseNet: [Huang, Chen, Li, Wu, van der Maaten, Weinberger] (arXiv Preprint: 1703.09844)
(Preview)
MULTI-SCALE DENSENET
Classifier 4Classifier 2 Classifier 3Classifier 1
…
…
…
Test Input
“Easy” examples “Hard” examples
Inference Speed:~ 2.6x faster than ResNets~ 1.3x faster than DenseNets
(Preview)
…
Memory efficient Torch implementation:https://github.com/liuzhuang13/DenseNet
Our Caffe ImplementationOur memory-efficient Caffe Implementation.Our memory-efficient PyTorch Implementation.PyTorch Implementation by Andreas Veit.PyTorch Implementation by Brandon Amos.MXNet Implementation by Nicatio.MXNet Implementation (supports ImageNet) by Xiong Lin.Tensorflow Implementation by Yixuan Li.Tensorflow Implementation by Laurent Mazare.
Tensorflow Implementation (with BC structure) by Illarion Khlestov.Lasagne Implementation by Jan Schlüter.Keras Implementation by tdeboissiere.Keras Implementation by Roberto de Moura Estevão Filho.Keras Implementation (with BC structure) by Somshubra Majumdar.Chainer Implementation by Toshinori Hanya.Chainer Implementation by Yasunori Kudo.
Other implementations:
REFERENCES
• Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016• Chen-Yu Lee, et al. "Deeply-supervised nets" AISTATS 2015• Gao Huang, et al. "Deep networks with stochastic depth" ECCV 2016• Gao Huang, et al. "Multi-Scale Dense Convolutional Networks for Efficient Prediction" arXiv
preprint arXiv:1703.09844 (2017) • Geoff Pleiss, et al. "Memory-Efficient Implementation of DenseNets”, arXiv preprint arXiv:
1707.06990 (2017)