
  • Panoptic Feature Pyramid Networks

    Presented by: Saikiran KomatineniECE 285: Autonomous Driving Systems

    A. Kirillov, R. Girshick, K. He and P. Dollár, "Panoptic Feature Pyramid Networks," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 6392-6401, doi: 10.1109/CVPR.2019.00656.

    Facebook AI Research (FAIR)

    1

  • Background

    “Thing”

“Stuff”

    [30] Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollár. Panoptic segmentation. In CVPR, 2019.

    2

  • Background - FPN

● Feature Pyramid Network (FPN) is a feature extractor that builds a pyramid of multi-scale features, designed with both accuracy and speed in mind.

    ● Combines low-resolution, semantically strong features with high-resolution, semantically weak features via a top-down pathway and lateral connections
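The top-down pathway can be illustrated with a short NumPy sketch (an illustration, not the paper's implementation): nearest-neighbor upsampling stands in for interpolation, and the 1 × 1 lateral convolutions are assumed to have already projected every backbone map to a common channel depth.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def top_down_merge(laterals):
    """FPN top-down pathway: walk from the coarsest lateral map to the
    finest, upsampling 2x and adding at each level. The inputs are
    assumed to already share a channel depth (the job of the 1x1
    lateral convolutions in the real network)."""
    merged = [laterals[-1]]
    for lat in reversed(laterals[:-1]):
        merged.append(lat + upsample2x(merged[-1]))
    return merged[::-1]  # finest-to-coarsest, matching the input order

# Toy pyramid: 256-channel maps at strides 4, 8, 16, 32 of a 64x64 image.
laterals = [np.random.randn(256, 64 // s, 64 // s) for s in (4, 8, 16, 32)]
pyramid = top_down_merge(laterals)
print([p.shape for p in pyramid])  # [(256, 16, 16), (256, 8, 8), (256, 4, 4), (256, 2, 2)]
```

In the full FPN each merged map also passes through a 3 × 3 convolution to reduce upsampling artifacts.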

[36] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In CVPR, 2017.

    3

  • Motivation

● Top-performing methods:

    ○ Semantic Segmentation - Fully Convolutional Networks (FCN) with specialized backbones enhanced by dilated convolutions

    ○ Instance Segmentation - Region-based Mask R-CNN with a Feature Pyramid Network (FPN) backbone

● For the combined task, current approaches use separate networks with no shared computation

    ○ This is computationally expensive

    ○ Instance and semantic segmentation cannot be done simultaneously with one FPN

    4

  • Objective

    ● To design a simple single-network baseline that can achieve top performance on both instance and semantic segmentation and their joint task

● Approach:

    ○ Start with Mask R-CNN with FPN - a baseline for instance segmentation - and make minimal changes to generate a semantic segmentation dense-pixel output

● Additionally:

    ○ FPN can also be highly effective for semantic segmentation (rather than FCN)

    ○ Study the benefits of multi-task training for stuff and thing segmentation

    5

  • Methodology

    ● Feature Pyramid Network

    ● Instance Segmentation Branch

    ● Semantic Segmentation Branch

● Weighted loss for semantic and instance segmentation

    6

  • Methodology - Semantic Segmentation branch

    7

  • Methodology - Semantic Segmentation branch

● Goal: To merge information from all levels of the FPN pyramid into a single output

    ● Three upsampling stages starting from the deepest FPN level bring it to ¼ scale

    ○ Yields a feature map at ¼ scale

    ○ Each upsampling stage consists of a 3 × 3 convolution, group norm, ReLU, and 2× bilinear upsampling

    ● All feature maps are summed, and a 1 × 1 convolution, 4× bilinear upsampling, and softmax are used to generate per-pixel class labels at the original image resolution.
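A minimal NumPy sketch of this merge, with nearest-neighbor standing in for bilinear upsampling and a random channel-mixing matrix standing in for the learned 1 × 1 convolution (the per-stage 3 × 3 conv, group norm, and ReLU are omitted for brevity):

```python
import numpy as np

def upsample2x(x):
    # nearest-neighbor stands in for the paper's bilinear upsampling
    return x.repeat(2, axis=1).repeat(2, axis=2)

def semantic_head(pyramid, num_classes, rng):
    """Bring every FPN level to the 1/4-scale resolution of the finest
    level, sum them, then apply a 1x1 'convolution' (here a random
    channel-mixing matrix) and a per-pixel softmax over classes."""
    target_hw = pyramid[0].shape[1:]          # finest level is at 1/4 scale
    summed = np.zeros_like(pyramid[0])
    for level in pyramid:
        while level.shape[1:] != target_hw:   # deepest level needs 3 stages
            level = upsample2x(level)
        summed += level
    w = rng.standard_normal((num_classes, summed.shape[0]))  # 1x1 conv weights
    logits = np.einsum("kc,chw->khw", w, summed)
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)   # per-pixel class distribution

rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((256, 64 // s, 64 // s)) for s in (4, 8, 16, 32)]
probs = semantic_head(pyramid, num_classes=19, rng=rng)
print(probs.shape)  # (19, 16, 16); a final 4x upsample reaches full resolution
```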

    8

  • Methodology - Inference and Training

● Post-processing is used to remove overlap between instance and semantic segmentation outputs

    ● Instance Segmentation Loss:

    ○ Classification loss (Lc) - normalized by the number of sampled RoIs

    ○ Bounding-box loss (Lb) - normalized by the number of sampled RoIs

    ○ Mask loss (Lm) - normalized by the number of foreground RoIs

    ● Semantic Segmentation Loss (Ls):

    ○ Per-pixel cross-entropy loss between predicted and ground-truth labels

    ○ Normalized by the number of labeled image pixels

● Total Loss (L):

    ○ L = λi (Lc + Lb + Lm) + λs Ls
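The weighted total is simple to express in code; the λ values in this toy call are illustrative, not the paper's tuned settings:

```python
def panoptic_loss(l_cls, l_box, l_mask, l_sem, lambda_i=1.0, lambda_s=0.5):
    """Total loss from the slide: L = lambda_i * (Lc + Lb + Lm) + lambda_s * Ls.
    Each term is assumed to be pre-normalized as described above (Lc, Lb by
    sampled RoIs; Lm by foreground RoIs; Ls by labeled pixels)."""
    return lambda_i * (l_cls + l_box + l_mask) + lambda_s * l_sem

total = panoptic_loss(0.4, 0.3, 0.2, 0.6, lambda_i=1.0, lambda_s=0.5)
print(round(total, 6))  # 1.2
```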

    9

  • Methodology - Dataset and Metrics

● Dataset:

    ○ COCO - focus on instance segmentation | 80 thing classes | 118k/5k/20k train/val/test images

    ○ Cityscapes - ego-centric street-scene dataset | 19 classes, 8 with instance-level masks | 5k total images

● Single-task metrics:

    ○ mIoU (mean Intersection-over-Union) - COCO/Cityscapes

    ○ fIoU (frequency-weighted IoU) - COCO

    ○ iIoU (instance-level IoU) - Cityscapes

    ○ AP (Average Precision)

● Panoptic Segmentation metrics:

    ○ PQ (Panoptic Quality) captures both recognition and segmentation quality and treats both stuff and thing categories in a unified manner
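As a concrete illustration, mIoU for a pair of label maps can be computed as below (a sketch, not the official COCO or Cityscapes evaluation code):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Per-class intersection-over-union of two integer label maps,
    averaged over classes that appear in either map."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union:
            ious.append(inter / union)
    return float(np.mean(ious))

gt   = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
print(mean_iou(pred, gt, num_classes=2))  # class 0: 1/2, class 1: 2/3, mean ~0.583
```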

    10

  • Panoptic Quality (Segment Matching)

    ● Gives unique matching where there can be at most one predicted segment matched with each ground truth segment

    11

  • Panoptic Quality (PQ computation)

● PQ is insensitive to class imbalance

    ● Void labels (out-of-class pixels and ambiguous pixels) are discarded

    ● Group labels - alternative to ids for adjacent segments
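The PQ formula can be sketched directly from its definition; `ious`, `n_pred`, and `n_gt` are hypothetical inputs standing in for a real segment-overlap computation:

```python
def panoptic_quality(ious, n_pred, n_gt):
    """PQ for one class. `ious` maps candidate (pred_id, gt_id) pairs to
    their IoU; the IoU > 0.5 threshold makes the matching unique, since at
    most one prediction can cover more than half of a ground-truth segment.
    PQ = (sum of matched IoUs) / (TP + FP/2 + FN/2), which factors into
    segmentation quality (mean matched IoU) times recognition quality (F1)."""
    matched = {pair: iou for pair, iou in ious.items() if iou > 0.5}
    tp = len(matched)
    fp = n_pred - tp   # unmatched predictions
    fn = n_gt - tp     # unmatched ground-truth segments
    return sum(matched.values()) / (tp + 0.5 * fp + 0.5 * fn)

# One confident match (IoU 0.8) and one near-miss (IoU 0.4, discarded):
pq = panoptic_quality({("p1", "g1"): 0.8, ("p2", "g2"): 0.4}, n_pred=2, n_gt=2)
print(round(pq, 2))  # 0.4: SQ = 0.8 on the single TP, RQ = 1 / (1 + 0.5 + 0.5)
```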

    12

  • Results - Semantic Segmentation using FPN

    COCO-Stuff 2017 Challenge

    Cityscapes Semantic FPN

    13

  • Results - Multi-Task Training (Panoptic FPN for Instance Segmentation)

    ● Adding a semantic segmentation branch can slightly improve instance segmentation results over a single-task baseline

● Best Results: λs = 0.1 for COCO and λs = 1.0 for Cityscapes

    14

  • Results - Multi-Task Training (Panoptic FPN for Semantic Segmentation)

    ● Adding an instance segmentation branch can provide significant benefits for semantic segmentation

● Best Results: λi = 1.0 for COCO and λi = 0.25 for Cityscapes

    15

  • Results - Panoptic Segmentation

● Using a single FPN for both instance and semantic segmentation (where previously two were used) results in comparable accuracies, but with half the compute

    ● mIoU shows better results for a single FPN

    ● Given approximately the same computational budget, a single FPN performs slightly better than two independent FPN networks for both instance and semantic segmentation

    ● All metrics reflect better performance

    16

  • Results - Panoptic Segmentation

Single-Network FPN on COCO test-dev:

    ● Panoptic FPN improves PQ by 8.8 points

    ● Greater improvement for PQSt

    ○ Impressive because their semantic FPN is lightweight compared to the state of the art

    Alternative to Region-based Instance Segmentation on Cityscapes:

    ● Panoptic FPN improves PQ by 4.3 points

    ● mIoU is greater for DIN because they perform pixel-wise semantic segmentation and then perform grouping to extract instances

    17

  • Results - Qualitative

    18

  • Results - Qualitative

    19

  • Advantages

    ● Baseline for Panoptic Segmentation

● Instance and semantic segmentation can be performed simultaneously with no compromise in accuracy

    ● Provides impressive results both when computation is budgeted and when it is not

    20

• Disadvantages

    ● There is only one other approach (DIN [1, 34]) to compare against for results on Cityscapes data

    ● mIoU has significant variation across experiments

  • Takeaway

    ● Panoptic FPN is a single network that can simultaneously generate region-based outputs (for instance segmentation) and dense-pixel outputs (for semantic segmentation).

● Using a single FPN for solving both tasks simultaneously yields accuracy equivalent to two separate FPNs with half the compute.

    ● Using roughly equal computational budget, Panoptic FPN significantly outperforms two separate networks.

    21

  • Question for the class

    What post-processing steps need to be taken to resolve overlap between results from instance and

    semantic segmentation?

    22

https://forms.gle/XJHRR5YJCpDek5HX7

  • Thank you

    Questions?

    23

  • References

    1. A. Kirillov, R. Girshick, K. He and P. Dollár, "Panoptic Feature Pyramid Networks," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 6392-6401, doi: 10.1109/CVPR.2019.00656.

    2. A. Kirillov, K. He, R. Girshick, C. Rother and P. Dollár, "Panoptic Segmentation," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 9396-9405, doi: 10.1109/CVPR.2019.00963.

3. Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In CVPR, 2017.

    4. Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In ICCV, 2017.

    5. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.

    24