Panoptic Feature Pyramid Networks
Presented by: Saikiran Komatineni
ECE 285: Autonomous Driving Systems
A. Kirillov, R. Girshick, K. He and P. Dollár, "Panoptic Feature Pyramid Networks," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 6392-6401, doi: 10.1109/CVPR.2019.00656.
Facebook AI Research (FAIR)
Background
“Thing”
“Stuff”
[30] Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollár. Panoptic segmentation. In CVPR, 2019.
Background - FPN
● Feature Pyramid Network (FPN) is a feature extractor designed for a pyramid concept with accuracy and speed in mind.
● Combines low-resolution, semantically strong features with high-resolution, semantically weak features via a top-down pathway and lateral connections
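The top-down merge just described can be illustrated with a toy 1-D sketch (assumptions: nearest-neighbor 2x upsampling and an identity lateral connection, purely for illustration; the real FPN applies learned convolutions to 2-D feature maps):

```python
# Toy 1-D sketch of an FPN top-down merge. The integer values stand in
# for feature activations; real FPN uses learned 1x1 convs on laterals.

def upsample2x(x):
    """Nearest-neighbor 2x upsampling: repeat each element twice."""
    return [v for v in x for _ in range(2)]

def fpn_merge(top_down, lateral):
    """Add the upsampled top-down feature to the same-resolution lateral."""
    return [t + l for t, l in zip(upsample2x(top_down), lateral)]

# A coarse, semantically strong map merged with a finer lateral map:
print(fpn_merge([1, 2], [10, 20, 30, 40]))  # [11, 21, 32, 42]
```

The merged map keeps the fine resolution of the lateral input while carrying the semantics of the coarse top-down input, which is exactly the property the semantic branch later exploits.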
[36] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In CVPR, 2017.
Motivation
● Top-performing methods:
○ Semantic segmentation - Fully Convolutional Networks (FCN) with specialized backbones enhanced by dilated convolutions
○ Instance segmentation - region-based Mask R-CNN with a Feature Pyramid Network (FPN) backbone
● For the combined task, current approaches use separate networks with no shared computation
○ This is computationally expensive
○ Instance and semantic segmentation cannot be done simultaneously with one FPN
Objective
● To design a simple single-network baseline that can achieve top performance on both instance and semantic segmentation and their joint task
● Approach:
○ Start with Mask R-CNN with FPN - a strong baseline for instance segmentation - and make minimal changes to generate a dense-pixel semantic segmentation output
● Additionally:
○ Show that FPN can be highly effective for semantic segmentation (in place of dilated FCNs)
○ Study the benefits of multi-task training for stuff and thing segmentation
Methodology
● Feature Pyramid Network
● Instance Segmentation Branch
● Semantic Segmentation Branch
● Weighted loss for semantic and Instance
Segmentation
Methodology - Semantic Segmentation branch
● Goal: merge information from all levels of the FPN pyramid into a single output
● Each FPN level is upsampled in stages to 1/4 scale; the deepest level needs three stages
○ Each stage consists of a 3 x 3 convolution, group norm, ReLU, and 2x bilinear upsampling
● The resulting 1/4-scale feature maps are summed; a 1 x 1 convolution, 4x bilinear upsampling, and softmax then generate per-pixel class labels at the original image resolution.
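The upsampling schedule above can be sketched by tracking only the spatial stride of each level (a minimal sketch, assuming FPN levels at strides 32, 16, 8, and 4 relative to the input, as in the paper; the learned layers are omitted):

```python
# Each stage (3x3 conv, group norm, ReLU, 2x bilinear upsampling) halves
# the stride of a feature map; here we only track the spatial scale.

def stages_to_quarter_scale(stride):
    """Number of 2x upsampling stages to bring one FPN level to 1/4 scale."""
    stages = 0
    while stride > 4:
        stride //= 2  # one (conv, GN, ReLU, 2x upsample) stage
        stages += 1
    return stages

schedule = {s: stages_to_quarter_scale(s) for s in (32, 16, 8, 4)}
print(schedule)  # {32: 3, 16: 2, 8: 1, 4: 0}

# After summing the 1/4-scale maps, a 1x1 conv, 4x bilinear upsampling,
# and a softmax produce per-pixel class labels at full image resolution.
```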
Methodology - Inference and Training
● Post-processing removes overlap between the instance and semantic segmentation outputs
● Instance segmentation loss:
○ Classification loss (Lc) - normalized by the number of sampled RoIs
○ Bounding-box loss (Lb) - normalized by the number of sampled RoIs
○ Mask loss (Lm) - normalized by the number of foreground RoIs
● Semantic segmentation loss (Ls):
○ Per-pixel cross-entropy loss between predicted and ground-truth labels
○ Normalized by the number of labeled image pixels
● Total loss (L):
○ L = λi (Lc + Lb + Lm) + λs Ls
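The total loss is a plain weighted sum, which can be sketched directly (the loss values and default weights below are illustrative placeholders, not the paper's tuned settings):

```python
# Sketch of the total training loss: L = λi (Lc + Lb + Lm) + λs Ls.
# In practice Lc, Lb, Lm come from the Mask R-CNN instance head and
# Ls from the semantic head; the λ weights are tuned hyperparameters.

def panoptic_loss(l_c, l_b, l_m, l_s, lambda_i=1.0, lambda_s=0.5):
    """Weighted sum of the instance losses and the semantic loss."""
    return lambda_i * (l_c + l_b + l_m) + lambda_s * l_s

# Made-up loss values: 1.0 * (0.4 + 0.3 + 0.5) + 0.5 * 0.8 ≈ 1.6
print(panoptic_loss(0.4, 0.3, 0.5, 0.8))
```

The slides report the best λ values per dataset (e.g. λs = 0.1 on COCO); the defaults here are only for the example.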
Methodology - Dataset and Metrics
● Datasets:
○ COCO - focus on instance segmentation | 80 thing classes | 118k/5k/20k train/val/test images
○ Cityscapes - ego-centric street-scene dataset | 19 classes, 8 of which have instance-level masks | 5k finely annotated images
● Single-task metrics:
○ mIoU (mean Intersection-over-Union) - COCO/Cityscapes
○ fIoU (frequency-weighted IoU) - COCO
○ iIoU (instance-level IoU) - Cityscapes
○ AP (Average Precision)
● Panoptic segmentation metric:
○ PQ (Panoptic Quality) captures both recognition and segmentation quality and treats stuff and thing categories in a unified manner
Panoptic Quality (Segment Matching)
● A predicted segment is matched to a ground-truth segment only if their IoU is strictly greater than 0.5
● This threshold yields a unique matching: at most one predicted segment can be matched with each ground-truth segment
Panoptic Quality (PQ computation)
● PQ is insensitive to class imbalance (it is computed per class, then averaged)
● Void labels (out-of-class and ambiguous pixels) are discarded
● Group labels - an alternative to instance ids for adjacent segments
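Given the matching above, per-class PQ reduces to a short computation (a minimal sketch; `panoptic_quality` and its inputs are illustrative and assume matching has already been done):

```python
# Panoptic Quality for a single class, given the IoUs of matched pairs
# (TP) plus counts of unmatched predicted (FP) and ground-truth (FN)
# segments:  PQ = sum(IoU over TP) / (|TP| + 0.5*|FP| + 0.5*|FN|)

def panoptic_quality(tp_ious, num_fp, num_fn):
    denom = len(tp_ious) + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return 0.0  # class absent from both prediction and ground truth
    return sum(tp_ious) / denom

# Two matches with IoUs 0.8 and 0.6, one false positive, no false
# negatives: 1.4 / 2.5 ≈ 0.56
print(panoptic_quality([0.8, 0.6], num_fp=1, num_fn=0))
```

Averaging this per-class value over all classes is what makes PQ insensitive to class imbalance.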
Results - Semantic Segmentation using FPN
COCO-Stuff 2017 Challenge
Cityscapes Semantic FPN
Results - Multi-Task Training (Panoptic FPN for Instance Segmentation)
● Adding a semantic segmentation branch can slightly improve instance segmentation results over a single-task baseline
● Best results: λs = 0.1 for COCO and λs = 1.0 for Cityscapes
Results - Multi-Task Training (Panoptic FPN for Semantic Segmentation)
● Adding an instance segmentation branch can provide significant benefits for semantic segmentation
● Best results: λi = 1.0 for COCO and λi = 0.25 for Cityscapes
Results - Panoptic Segmentation
● Using a single FPN for both instance and semantic segmentation (where previously two were used) yields comparable accuracy with half the compute
● mIoU is better with a single FPN
● Given approximately the same computational budget, a single FPN performs slightly better than two independent FPN networks for both instance and semantic segmentation
● All metrics reflect the better performance
Results - Panoptic Segmentation
● Panoptic FPN improves PQ by 8.8 points
● Greater improvement for PQ^St (stuff)
○ Impressive because their semantic FPN is lightweight compared to the state of the art
● Panoptic FPN improves PQ by 4.3 points
● mIoU is greater for DIN because it performs pixel-wise semantic segmentation and then groups pixels to extract instances

Single-Network FPN on COCO test-dev
Alternative to Region-based Instance Segmentation on Cityscapes
Results - Qualitative
Advantages
● Baseline for panoptic segmentation
● Instance and semantic segmentation can be performed simultaneously with no compromise in accuracy
● Provides impressive results both when computation is budgeted and when it is not
Disadvantages
● There is only one other approach (DIN [1, 34]) to compare against for results on Cityscapes data
● mIoU has significant variation across experiments
Takeaway
● Panoptic FPN is a single network that can simultaneously generate region-based outputs (for instance segmentation) and dense-pixel outputs (for semantic segmentation).
● Using a single FPN for solving both tasks simultaneously yields accuracy equivalent to two separate FPNs with half the compute.
● Using roughly equal computational budget, Panoptic FPN significantly outperforms two separate networks.
Question for the class
What post-processing steps are needed to resolve overlap between the instance and semantic segmentation outputs?
https://forms.gle/XJHRR5YJCpDek5HX7
Thank you
Questions?
References
1. A. Kirillov, R. Girshick, K. He and P. Dollár, "Panoptic Feature Pyramid Networks," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 6392-6401, doi: 10.1109/CVPR.2019.00656.
2. A. Kirillov, K. He, R. Girshick, C. Rother and P. Dollár, "Panoptic Segmentation," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 9396-9405, doi: 10.1109/CVPR.2019.00963.
3. Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In CVPR, 2017.
4. Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In ICCV, 2017.
5. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.