Deep Learning for Image Instance Segmentation ----Mask R-CNN · object instance segmentation....

Deep Learning for Image Instance Segmentation----Mask R-CNN

Jianping Fan Dept of Computer Science

UNC-Charlotte

Course Website: http://webpages.uncc.edu/jfan/itcs5152.html

Definition of Image Instance Segmentation

Instance segmentation = object detection + semantic segmentation?

Background Review

• R-CNN [4]: The Region-based CNN (R-CNN)• Replace sliding windows with “selective search” region proposals(Uijilings et

al. IJCV 2013)

• Extract rectangles around regions and resize to 227x227 pixels

• Extract features with fine-tuned CNN (that was initialized with network trained on ImageNet before training)

• Classify last layer of network features with SVM, refine bounding box localization (bbox regression) simultaneously

6Slide credit to Ke-Shuan Cheng

• R-CNN: The Region-based CNN (R-CNN)

Slide credit to Ke-Shuan Cheng

Region warping is performed for fixed size features

R-CNN Architecture

Regional Proposal Network (RPN)

● Foreground vs Background ● Bounding Box regression● Feed bounding boxes into Fast RCNN

(0,0)(1,3)

(16,48)

Mapping the center of the receptive fields

● k anchor boxes ○ 3 scales (8,16, 32)○ 3 aspect ratios (.5, 1, 2)○ Stride 16

● WHk anchors

Anchor Boxes

H Center(x,y)

Feature Map

See here.

Anchor Boxes

RPN(Region Proposal Network) Object vs Not an Object

Anchor

Object = 1 to:a) Anchors with the highest

Intersection-over-Union(IoU)b) IoU > 0.7 with any ground truth

box.Not object = -1

a) If IoU <0.3

(Sx,Sy)

Mapping the center of the receptive fields

SigmoidCrossEntropyLoss SmoothL1Loss

(512 × (2 + 4) × 9) parameters for VGG-16)

Multi-task loss:

Only if p*= 1Hyper parameter =10

Mini batch size =256

Number of Anchor locations

Fast R-CNN Architecture

Fast R-CNN

• Fast R-CNN [5]• Improvement: It only feed the whole image into CNN only once! Then crop

features instead of image itself.

Fast R-CNN

• RoI Pooling• The RoI pooling layer uses max pooling to convert the features inside any valid

region of interest into a small feature map with a fixed spatial extent of H × W. (e.g., 2*2 )

RoI Pooling in Fast R-CNN

RoI Pooling

Slide credit to Silvio Galesso

For each proposalNMS RoI Pooling

Fully connected layers

softmax

Bbox regression

Fast R-CNN

Region of Interest (RoI):

Fast R-CNN

.74 | .39 | .34

.2 | .16 | .73

.83 | .97 | .88

3X3 RoI pooling

Fast R-CNN

.74 | .39 | .34

.2 | .16 | .73

.83 | .97 | .88

7X7 RoI poolingPer proposal

Only a problem for segmentation

Faster R-CNN

• Faster R-CNN [6]• Improvement: Generate RoI by Region Proposal Network.

● Two main parts○ Region Proposal

Network○ Fast R-CNN○ (also this) Pre-

trained network

Comparison

• Compare with 3 model

Mask-RCNN

Mask R-CNN• Mask RCNN is a simple, flexible, and general framework for

object instance segmentation.

• Mask R-CNN, extends Faster R-CNN by adding a branch for predicting segmentation masks on each Region of Interest (RoI), in parallel with the existing branch for classification and bounding box regression.

• Mask R-CNN is simple to trained and adds only a small overhead to Faster R-CNN.

• The mask branch is a small Fully Convolutional Network (FCN) applied to each RoI, predicting a segmentation mask in a pixel-to-pixel manner.

Mask R-CNN Architecture

Kaiming He, ICCV 2017 Tutorial

Mask R-CNN

• Mask R-CNN [3] is conceptually simple: Faster R-CNN has two outputs for each candidate object, a class label and a bounding-box offset; to this R-CNN added a third branch that outputs the object mask.

Faster R-CNN

3rd Branch Object Mask

Slide credit to Ross Girshick

Slide credit to T. Kim

RoI Align in Mask R-CNN

RoI Align

Slide credit to T. Kim

Slide credit to Ross Girshick

Mask R-CNN

• RoI Align is Improves miss-align problems of RoI pooling• RoI Align use bilinear interpolation to generate new feature map.

• Do RoI Pooling with aligned feature map

RoI Pooling

Bilinear interpolation

FPN (Feature Pyramid Network)

Mask R-CNN

• They proposed 2 architecture and compared they for Object

Mask branch

• ResNet

• Branch from last Convolutional layer

• Feature Pyramid Network(FPN)

• Branch from RoI

Mask R-CNN

• Loss function is defined as below:𝐿 = 𝐿𝑐𝑙𝑠 + 𝐿𝑏𝑜𝑥 + 𝐿𝑚𝑎𝑠𝑘

• 𝐿𝑐𝑙𝑠: Cross-Entropy

• 𝐿𝑏𝑜𝑥: IoU𝐴𝑟𝑒𝑎 𝑜𝑓 𝑂𝑣𝑒𝑟𝐿𝑎𝑝

𝐴𝑟𝑒𝑎 𝑜𝑓 𝑈𝑛𝑖𝑜𝑛• 𝐿𝑚𝑎𝑠𝑘: Cross-Entropy between pixel-to-pixel

Mask R-CNN• Result

References https://arxiv.org/pdf/1506.01497.pdf (Faster R-CNN)https://arxiv.org/pdf/1504.08083.pdf (Fast R-CNN)https://arxiv.org/pdf/1506.06981.pdf (R-CNN minus R)https://koen.me/research/pub/uijlings-ijcv2013-draft.pdf (Selective Search for Object Detection)https://arxiv.org/pdf/1703.06870.pdf (Mask R-CNN)http://host.robots.ox.ac.uk/pascal/VOC/https://www.dropbox.com/s/xtr4yd4i5e0vw8g/iccv15_tutorial_training_rbg.pdf?dl=0http://kaiminghe.com/iccv15tutorial/iccv2015_tutorial_convolutional_feature_maps_kaiminghe.pdfhttps://lovesnowbest.site/2018/02/27/Intro-to-Object-Detection/https://blog.deepsense.ai/region-of-interest-pooling-explained/https://tryolabs.com/blog/2018/01/18/faster-r-cnn-down-the-rabbit-hole-of-modern-object-detection/

Deep Learning for Image Instance Segmentation ----Mask R-CNN · object instance segmentation....

Documents

Weed Recognition in Agriculture: A Mask R-CNN Approach

A Multi-Modal, Discriminative and Spatially Invariant CNN ...static.tongtianta.site/paper_pdf/dac96026-036e-11e... · semantic segmentation to a multi-modal CNN architecture, where

Sequential Context Encoding for Duplicate Removalwith DCN [8]. FPN can yield high-quality object detection results. Mask R-CNN is designed for instance segmentation, suitable for multi-task

Mask R-CNN for Moving Shadow Detection and Segmentation

Mask R-CNN ICCV 2017(Oral)imlab.postech.ac.kr/dkim/class/csed703G_2019f/maskrcn.pdf · 2019-11-20 · Mask R-CNN extends Faster R-CNN by adding a branch for predicting an object mask

Lab2 Object Detection & Mask · 2020-01-03 · Faster R-cnn. Mask R-Cnn •Extending Faster R-CNN for Pixel Level Segmentation •Use RoI Align. FPN(feature pyramid network) You Only

Face mask detection using YOLOv3 and faster R-CNN models

Mask r-cnnfidler/teaching/2018/slides/CSC2548/MaskRCNN.pdfMask R-CNN for Human Pose Estimation •Model keypoint location as a one-hot binary mask •Generate a mask for each keypoint

Detection and Segmentation of Manufacturing Defects with ...eil.stanford.edu/publications/max_ferguson/SMSS_Casting_Defects.pdf · Mask R-CNN is an extension of Faster R-CNN that

Mask R-CNN - Facebook Research · PDF fileMask R-CNN is conceptually simple: Faster R-CNN has two outputs for each candidate object, a class label and a ... (RPN), proposes candidate

MRI Tumor Segmentation with Densely Connected 3D CNN · MRI Tumor Segmentation with Densely Connected 3D CNN Lele Chen y1, Yue Wu , Adora M. DSouza2, Anas Z. Abidin3, Axel Wismuller

Constrained-CNN Losses for Weakly Supervised Segmentation

Measuring particle statistics using a CNN-based segmentation

Mask R-CNN with Pyramid Attention Network for Scene Text … · 2018-11-26 · Mask R-CNN with Pyramid Attention Network for Scene Text Detection Zhida Huang1,3,y,, Zhuoyao Zhong2,3,y,,

Person Search via A Mask-guided Two-stream CNN Modelopenaccess.thecvf.com/content_ECCV_2018/papers/Di_Chen... · 2018-08-28 · Person Search via A Mask-Guided Two-Stream CNN Model

Deep CNN for Segmentation of Myocardial ASL … › files › talks › 2018-10-28-HungDo...2018/10/28 · Deep CNN for Segmentation of Myocardial ASL Short-Axis Data: Accuracy, Uncertainty,

cs230.stanford.educs230.stanford.edu/files_winter_2018/projects/6937153.pdf · 3.3.2 Mask RCNN Mask R-CNN is an extension of the Faster RCNN model [2]. Faster R-CNN is a Region Proposal

Contributions from Chiang, et al. Mask R-CNN

BUILDING SEGMENTATION FROM AIRBORNE VHR ......BUILDING SEGMENTATION FROM AIRBORNE VHR IMAGES USING MASK R-CNN K. Zhou1,, Y. Chen2, I. Smal1, R. Lindenbergh1 1 Dept. of Geoscience and

Tracking by Segmentation - cvut.cz · 2) Segmentation pre-training (DAVIS, PASCAL VOC, …) Strip fully connected layers Convert into fully convolutional segmentation CNN 3) Fine-tuning