19
Jifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen Wei Visual Computing Group Microsoft Research Asia (*interns at MSRA) Deformable Convolutional Networks -- MSRA COCO Detection & Segmentation Challenge 2017 Entry

Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen

Embed Size (px)

Citation preview

Page 1: Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen

Jifeng DaiWith Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen Wei

Visual Computing Group

Microsoft Research Asia

(*interns at MSRA)

Deformable Convolutional Networks-- MSRA COCO Detection & Segmentation Challenge 2017 Entry

Page 2: Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen

Outline

• Deformable ConvNets idea

• Deformable ConvNets for COCO challenge

Page 3: Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen

Highlights

• Enabling effective modeling of spatial transformation in ConvNets

• No additional supervision for learning spatial transformation

• Significant accuracy improvements on sophisticated vision tasks

Code is available at https://github.com/msracver/Deformable-ConvNets

Page 4: Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen

Modeling Spatial Transformations

• A long standing problem in computer visionDeformation: Scale:

Viewpoint variation: Intra-class variation:

(Some examples are taken from Li Fei-fei’s course CS223B, 2009-2010)

Page 5: Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen

Traditional Approaches

• 1) To build training datasets with sufficient desired variations

• 2) To use transformation-invariant features and algorithms

• Drawbacks: geometric transformations are assumed fixed and known, hand-crafted design of invariant features and algorithms

Scale Invariant Feature Transform (SIFT) Deformable Part-based Model (DPM)

Page 6: Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen

Spatial Transformations in CNNs

• Regular CNNs are inherently limited to model large unknown transformations• The limitation originates from the fixed geometric structures of CNN modules

regular convolution regular RoI Pooling2 layers of regular convolution

Page 7: Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen

Spatial Transformer Networks

• Learning a global, parametric transformation on feature maps• Prefixed transformation family, infeasible for complex vision tasks

Page 8: Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen

Deformable Convolution

• Local, dense, non-parametric transformation• Learning to deform the sampling locations in the convolution/RoI Pooling modules

regular deformed scale & aspect ratio rotation

Page 9: Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen

Deformable Convolution

Regular convolution

Deformable convolution

where is generated by a sibling branch of regular convolution

Page 10: Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen

Deformable RoI Pooling

deformable RoI Pooling

Regular RoI pooling

Deformable RoI pooling

where is generated by a sibling fc branch

Page 11: Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen

Deformable ConvNets

• Same input & output as the plain versions• Regular convolution -> deformable convolution

• Regular RoI pooling -> deformable RoI pooling

• End-to-end trainable without additional supervision

Page 12: Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen

Sampling Locations of Deformable Convolution

(a) standard convolution (b) deformable convolution

Page 13: Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen

Part Offsets in Deformable RoI Pooling

Page 14: Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen

Deformable ConvNets for Object Detection

Fast(er) RCNN R-FCN FPN

RoI Pooling

Car Person

Car

Person

Position SensitiveRoI Pooling

Car Person

RoI Pooling

Conv5 Position SensitiveScore Map

Conv3

Conv4

Conv5 P5

P4

P3

• Regular object detectors

Page 15: Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen

Deformable ConvNets for Object Detection

Fast(er) RCNN R-FCN FPN

DeformableRoI Pooling

Car Person

Car

Person

DeformablePS RoI Pooling

Car Person

DeformableRoI Pooling

: Deformable Convolution / RoI Pooling

Conv5 Position SensitiveScore Map

Conv3

Conv4

Conv5 P5

P4

P3

• Deformable object detectors

Page 16: Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen

XCeption -> Aligned XCeption

14x14x728 feature maps

separable conv 728, 3x3, pad 1

separable conv 1024, 3x3, pad 1

max pool 3x3, stride2, pad 1

conv 1024 1x1 stride2, pad 0

separable conv 1536, 3x3, pad 1

separable conv 1536, 3x3, pad 1

7x7x2048 feature maps

separable conv 2048, 3x3, pad 1

entry flow

conv 32, 3x3, stride2, pad 1

conv 64, 3x3, pad 1

separable conv 128, 3x3, pad 1

separable conv 128, 3x3, pad 1

max pool 3x3, stride 2, pad 1

conv 128 1x1 stride2, pad 0

separable conv 728, 3x3, pad 1

separable conv 728, 3x3, pad 1

max pool 3x3, stride2, pad 1

14x14x728 feature maps

224x224x3 images

separable conv 256, 3x3, pad 1

separable conv 256, 3x3, pad 1conv 256 1x1

separable conv 256, 3x3, pad 1

separable conv 256, 3x3, pad 1

separable conv 256, 3x3, pad 1

max pool 3x3, stride2, pad 1

separable conv 728, 3x3, pad 1

separable conv 728, 3x3, pad 1

separable conv 728, 3x3, pad 1

conv 256 1x1, stride2, pad 0

conv 728 1x1

conv 728 1x1, stride2, pad 0

separable conv 728, 3x3, pad 1

separable conv 728, 3x3, pad 1

separable conv 728, 3x3, pad 1

14x14x728 feature maps

Repeat 16 times

14x14x728 feature maps

middle flow

exit flow

• Proper feature alignment in XCeption• Efficient: 9.5 GFLOPS on 224*224 img (ResNet-101, 7.6 GFLOPS)

• Accurate: mAP 2.8% better than ResNet-101 using FPN on COCO (det, test-dev)

Page 17: Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen

Object Detection on COCO (Test-dev)

• MSRA 2017 Entry• ~3% mAP improvements by Deformable ConvNets

• Best single model performance: 48.5%

37.4

40.7

41.6

45.2

40.5

43.3

44.2

45.3

46.9

47.9

48.5

50.7

30 35 40 45 50

FPN+OHEM (RESNET-101)

FPN+OHEM (ALIGNED XCEPTION)

+ MASK

+ SOFT NMS

+ MULTI-SCALE TESTING

+ ITERATIVE TESTING

+ HORIZONTAL FLIP

+ ENSEMBLE (6 MODELS)

mAP (%)

Deformable Regular

Page 18: Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen

Object Detection on COCO (Test-dev)

• Deformable ConvNets v.s. regular ConvNets• Noticeable improvements for varies baselines

• Marginal parameter & computation overhead

23.2

30.3

32.1

34.5

37.4

40.2

45.2

25.8

35

35.7

37.5

40.5

43.3

48.5

20 25 30 35 40 45 50

CLASS-AWARE RPN (RESNET-101)

FASTER R-CNN, 2FC (RESNET-101)

R-FCN (RESNET-101)

R-FCN (ALIGNED-INCEPTION-RESNET)

FPN+OHEM (RESNET-101)

FPN+OHEM (ALIGNED-XCEPTION)

FPN++ (ALIGNED-XCEPTION)

mAP (%)

Deformable Regular

Page 19: Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen

Conclusion

• Deformable ConvNets for dense spatial modeling• Simple, efficient, deep, and end-to-end

• No additional supervision

• Feasible and effective on sophisticated vision tasks for the first time

• Our team

Jifeng DaiHaozhi Qi* Yichen Wei

* interns at MSRA

Zheng Zhang Bowen Cheng*Bin Xiao Han Hu