
  • Panoptic Feature Pyramid Networks

    Presented by: Saikiran KomatineniECE 285: Autonomous Driving Systems

    A. Kirillov, R. Girshick, K. He and P. Dollár, "Panoptic Feature Pyramid Networks," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 6392-6401, doi: 10.1109/CVPR.2019.00656.

    Facebook AI Research (FAIR)

    1

  • Background

    “Thing”

“Stuff”

    [30] Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollár. Panoptic segmentation. In CVPR, 2019.

    2

  • Background - FPN

● Feature Pyramid Network (FPN) is a feature extractor that builds a pyramid of multi-scale features, designed with both accuracy and speed in mind.

    ● Combines low-resolution, semantically strong features with high-resolution, semantically weak features via a top-down pathway and lateral connections
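The top-down pathway can be illustrated with a short NumPy sketch (an illustration, not the paper's implementation): nearest-neighbor upsampling stands in for interpolation, and the 1 × 1 lateral convolutions are assumed to have already projected every backbone map to a common channel depth.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def top_down_merge(laterals):
    """FPN top-down pathway: walk from the coarsest lateral map to the
    finest, upsampling 2x and adding at each level. The inputs are
    assumed to already share a channel depth (the job of the 1x1
    lateral convolutions in the real network)."""
    merged = [laterals[-1]]
    for lat in reversed(laterals[:-1]):
        merged.append(lat + upsample2x(merged[-1]))
    return merged[::-1]  # finest-to-coarsest, matching the input order

# Toy pyramid: 256-channel maps at strides 4, 8, 16, 32 of a 64x64 image.
laterals = [np.random.randn(256, 64 // s, 64 // s) for s in (4, 8, 16, 32)]
pyramid = top_down_merge(laterals)
print([p.shape for p in pyramid])  # [(256, 16, 16), (256, 8, 8), (256, 4, 4), (256, 2, 2)]
```

In the full FPN each merged map also passes through a 3 × 3 convolution to reduce upsampling artifacts.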

[36] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In CVPR, 2017.

    3

  • Motivation

● Top-performing methods:

    ○ Semantic Segmentation - Fully Convolutional Networks (FCN) with specialized backbones enhanced by dilated convolutions

    ○ Instance Segmentation - Region-based Mask R-CNN with a Feature Pyramid Network (FPN) backbone

● For the combined task, current approaches use separate networks with no shared computation

    ○ This is computationally expensive

    ○ Instance and semantic segmentation cannot be done simultaneously with one FPN

    4

  • Objective

    ● To design a simple single-network baseline that can achieve top performance on both instance and semantic segmentation and their joint task

● Approach:

    ○ Start with Mask R-CNN with FPN - a baseline for instance segmentation - and make minimal changes to generate a semantic segmentation dense-pixel output

● Additionally:

    ○ FPN can also be highly effective for semantic segmentation (rather than FCN)

    ○ Study the benefits of multi-task training for stuff and thing segmentation

    5

  • Methodology

    ● Feature Pyramid Network

    ● Instance Segmentation Branch

    ● Semantic Segmentation Branch

● Weighted loss for semantic and instance segmentation

    6

  • Methodology - Semantic Segmentation branch

    7

  • Methodology - Semantic Segmentation branch

● Goal: To merge information from all levels of the FPN pyramid into a single output

    ● Three upsampling stages starting from the deepest FPN level bring it to ¼ scale

    ○ Yields a feature map at ¼ scale

    ○ Each upsampling stage consists of a 3 × 3 convolution, group norm, ReLU, and 2× bilinear upsampling

    ● All feature maps are summed, and a 1 × 1 convolution, 4× bilinear upsampling, and softmax are used to generate per-pixel class labels at the original image resolution.
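A minimal NumPy sketch of this merge, with nearest-neighbor standing in for bilinear upsampling and a random channel-mixing matrix standing in for the learned 1 × 1 convolution (the per-stage 3 × 3 conv, group norm, and ReLU are omitted for brevity):

```python
import numpy as np

def upsample2x(x):
    # nearest-neighbor stands in for the paper's bilinear upsampling
    return x.repeat(2, axis=1).repeat(2, axis=2)

def semantic_head(pyramid, num_classes, rng):
    """Bring every FPN level to the 1/4-scale resolution of the finest
    level, sum them, then apply a 1x1 'convolution' (here a random
    channel-mixing matrix) and a per-pixel softmax over classes."""
    target_hw = pyramid[0].shape[1:]          # finest level is at 1/4 scale
    summed = np.zeros_like(pyramid[0])
    for level in pyramid:
        while level.shape[1:] != target_hw:   # deepest level needs 3 stages
            level = upsample2x(level)
        summed += level
    w = rng.standard_normal((num_classes, summed.shape[0]))  # 1x1 conv weights
    logits = np.einsum("kc,chw->khw", w, summed)
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)   # per-pixel class distribution

rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((256, 64 // s, 64 // s)) for s in (4, 8, 16, 32)]
probs = semantic_head(pyramid, num_classes=19, rng=rng)
print(probs.shape)  # (19, 16, 16); a final 4x upsample reaches full resolution
```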

    8

  • Methodology - Inference and Training

● Post-processing is used to remove overlap between instance and semantic segmentation outputs

    ● Instance Segmentation Loss:

    ○ Classification loss (Lc) - normalized by the number of sampled RoIs

    ○ Bounding-box loss (Lb) - normalized by the number of sampled RoIs

    ○ Mask loss (Lm) - normalized by the number of foreground RoIs

    ● Semantic Segmentation Loss (Ls):

    ○ Per-pixel cross-entropy loss between predicted and ground-truth labels

    ○ Normalized by the number of labeled image pixels

● Total Loss (L):

    ○ L = λi (Lc + Lb + Lm) + λs Ls
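The weighted total is simple to express in code; the λ values in this toy call are illustrative, not the paper's tuned settings:

```python
def panoptic_loss(l_cls, l_box, l_mask, l_sem, lambda_i=1.0, lambda_s=0.5):
    """Total loss from the slide: L = lambda_i * (Lc + Lb + Lm) + lambda_s * Ls.
    Each term is assumed to be pre-normalized as described above (Lc, Lb by
    sampled RoIs; Lm by foreground RoIs; Ls by labeled pixels)."""
    return lambda_i * (l_cls + l_box + l_mask) + lambda_s * l_sem

total = panoptic_loss(0.4, 0.3, 0.2, 0.6, lambda_i=1.0, lambda_s=0.5)
print(round(total, 6))  # 1.2
```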

    9

  • Methodology - Dataset and Metrics

● Dataset:

    ○ COCO - focus on instance segmentation | 80 thing classes | 118k/5k/20k train/val/test images

    ○ Cityscapes - ego-centric street-scene dataset | 19 classes, 8 with instance-level masks | 5k total images

● Single-task metrics:

    ○ mIoU (mean Intersection-over-Union) - COCO/Cityscapes

    ○ fIoU (frequency-weighted IoU) - COCO

    ○ iIoU (instance-level IoU) - Cityscapes

    ○ AP (Average Precision)

● Panoptic Segmentation metrics:

    ○ PQ (Panoptic Quality) captures both recognition and segmentation quality and treats both stuff and thing categories in a unified manner
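As a concrete illustration, mIoU for a pair of label maps can be computed as below (a sketch, not the official COCO or Cityscapes evaluation code):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Per-class intersection-over-union of two integer label maps,
    averaged over classes that appear in either map."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union:
            ious.append(inter / union)
    return float(np.mean(ious))

gt   = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
print(mean_iou(pred, gt, num_classes=2))  # class 0: 1/2, class 1: 2/3, mean ~0.583
```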

    10

  • Panoptic Quality (Segment Matching)

    ● Gives unique matching where there can be at most one predicted segment matched with each ground truth segment

    11

  • Panoptic Quality (PQ computation)

● PQ is insensitive to class imbalance

    ● Void labels (out-of-class pixels and ambiguous pixels) are discarded

    ● Group labels - alternative to ids for adjacent segments
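The PQ formula can be sketched directly from its definition; `ious`, `n_pred`, and `n_gt` are hypothetical inputs standing in for a real segment-overlap computation:

```python
def panoptic_quality(ious, n_pred, n_gt):
    """PQ for one class. `ious` maps candidate (pred_id, gt_id) pairs to
    their IoU; the IoU > 0.5 threshold makes the matching unique, since at
    most one prediction can cover more than half of a ground-truth segment.
    PQ = (sum of matched IoUs) / (TP + FP/2 + FN/2), which factors into
    segmentation quality (mean matched IoU) times recognition quality (F1)."""
    matched = {pair: iou for pair, iou in ious.items() if iou > 0.5}
    tp = len(matched)
    fp = n_pred - tp   # unmatched predictions
    fn = n_gt - tp     # unmatched ground-truth segments
    return sum(matched.values()) / (tp + 0.5 * fp + 0.5 * fn)

# One confident match (IoU 0.8) and one near-miss (IoU 0.4, discarded):
pq = panoptic_quality({("p1", "g1"): 0.8, ("p2", "g2"): 0.4}, n_pred=2, n_gt=2)
print(round(pq, 2))  # 0.4: SQ = 0.8 on the single TP, RQ = 1 / (1 + 0.5 + 0.5)
```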

    12

  • Results - Semantic Segmentation using FPN

    COCO-Stuff 2017 Challenge

    Cityscapes Semantic FPN

    13

  • Results - Multi-Task Training (Panoptic FPN for Instance Segmentation)

    ● Adding a semantic segmentation branch can slightly improve instance segmentation results over a single-task baseline

● Best Results: λs = 0.1 for COCO and λs = 1.0 for Cityscapes

    14

  • Results - Multi-Task Training (Panoptic FPN for Semantic Segmentation)

    ● Adding an instance segmentation branch can provide significant benefits for semantic segmentation

● Best Results: λi = 1.0 for COCO and λi = 0.25 for Cityscapes

    15

  • Results - Panoptic Segmentation

● Using a single FPN for both instance and semantic segmentation (where previously two were used) results in comparable accuracies, but with half the compute

    ● mIoU shows better results for a single FPN

    ● Given approximately the same computational budget, a single FPN performs slightly better than two independent FPN networks for both instance and semantic segmentation

    ● All metrics reflect better performance

    16

  • Results - Panoptic Segmentation

Single-Network FPN on COCO test-dev:

    ● Panoptic FPN improves PQ by 8.8 points

    ● Greater improvement for PQSt

    ○ Impressive because their semantic FPN is lightweight compared to the state of the art

    Alternative to Region-based Instance Segmentation on Cityscapes:

    ● Panoptic FPN improves PQ by 4.3 points

    ● mIoU is greater for DIN because they perform pixel-wise semantic segmentation and then perform grouping to extract instances

    17

  • Results - Qualitative

    18

  • Results - Qualitative

    19

  • Advantages

    ● Baseline for Panoptic Segmentation

● Instance and semantic segmentation can be performed simultaneously with no compromise in accuracy

    ● Provides impressive results both when computation is budgeted and when it is not

    20

• Disadvantages

    ● There is only one other approach (DIN [1, 34]) to compare against for results on Cityscapes data

    ● mIoU has significant variation across experiments

  • Takeaway

    ● Panoptic FPN is a single network that can simultaneously generate region-based outputs (for instance segmentation) and dense-pixel outputs (for semantic segmentation).

● Using a single FPN for solving both tasks simultaneously yields accuracy equivalent to two separate FPNs with half the compute.

    ● Using roughly equal computational budget, Panoptic FPN significantly outperforms two separate networks.

    21

  • Question for the class

    What post-processing steps need to be taken to resolve overlap between results from instance and

    semantic segmentation?

    22

https://forms.gle/XJHRR5YJCpDek5HX7

  • Thank you

    Questions?

    23

  • References

    1. A. Kirillov, R. Girshick, K. He and P. Dollár, "Panoptic Feature Pyramid Networks," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 6392-6401, doi: 10.1109/CVPR.2019.00656.

    2. A. Kirillov, K. He, R. Girshick, C. Rother and P. Dollár, "Panoptic Segmentation," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 9396-9405, doi: 10.1109/CVPR.2019.00963.

3. Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In CVPR, 2017.

    4. Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In ICCV, 2017.

    5. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.

    24