Recurrent Instance Segmentation (UPC Reading Group)


Recurrent Instance Segmentation

Slides by Manel Baradad, Computer Vision Reading Group, UPC

9th September, 2016

Bernardino Romera-Paredes, Philip H. S. Torr [arXiv] (25 Nov 2015), ECCV 2016

Contents

● Introduction
● Structure
  ○ FCN
  ○ ConvLSTM
  ○ Spatial inhibition module
  ○ Post processing
● Loss function
● Experiments
  ○ Multiple Person Segmentation
  ○ Plants Leaf Segmentation
● Conclusions

Introduction

● Goal: detecting and delineating each distinct object of a specific class appearing in an image
● Contributions:
  ○ End-to-end approach for semantic instance segmentation
  ○ Derivation of a loss function for this problem
● Two particular classes tested:
  ○ Multiple Person Segmentation
  ○ Plants Leaf Segmentation and Counting
● It is not an attention-based model, though the goal is attention on regions

Structure

[Diagram: at each step t the recurrent network emits a hidden state h_t, a predicted mask Ŷ_t, and a confidence score s_t.]


Structure

Stopping condition:

[Diagram: the network keeps emitting (h_t, Ŷ_t, s_t) until an extra step produces (ĥ_{n+1}, Ŷ̂_{n+1}, ŝ_{n+1}), whose low score signals that no instances remain.]

Fully convolutional network

[Diagram: the FCN block highlighted within the recurrent structure; its output feeds the ConvLSTM, which produces (h_t, Ŷ_t, s_t) at each step.]

● Objective: obtain features that serve as the input of the ConvLSTM
● The article builds upon other well-performing FCNs for the semantic segmentation task
● The FCN is specific to each of the two experiments performed (explained later)

Example: for Multiple Person Segmentation, FCN-8 is used.

ConvLSTM

[Diagram: the ConvLSTM block highlighted within the recurrent structure.]

LSTM: Recurrent structure

● Ability to produce sequential output
● Provides memory:
  ○ Implicitly models occlusion, segmenting non-occluded instances first and keeping in its state the regions of the image that have already been segmented
  ○ Considers potential relationships between different instances in the image (e.g. certain instances are always, or never, found together)

ConvLSTM

ConvLSTM: a "standard" LSTM in which the fully connected layers are replaced by convolutional layers.

[Diagram: LSTM cell with previous state (h_{t-1}, c_{t-1}), gates f_t, i_t, o_t, and new state (h_t, c_t). Extracted from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/]

ConvLSTM

Why Conv instead of FC layers for the LSTM?

● Similar advantages of Conv layers with respect to FC layers:
  ○ Suitable for learning filters
  ○ Useful for spatially invariant inputs such as images
  ○ Require less memory for the parameters
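To make the gate structure concrete, here is a minimal numpy sketch of one ConvLSTM step. The plain-loop `conv2d` helper, the gate dictionary keys `"f"/"i"/"o"/"g"`, and the absence of biases are illustrative simplifications, not the paper's implementation:

```python
import numpy as np

def conv2d(x, w):
    """'Same' 2D cross-correlation of a (C_in, H, W) input with a
    (C_out, C_in, k, k) kernel, using zero padding. Slow but explicit."""
    c_out, c_in, k, _ = w.shape
    _, h, wd = x.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c_out, h, wd))
    for o in range(c_out):
        for i in range(h):
            for j in range(wd):
                out[o, i, j] = np.sum(xp[:, i:i + k, j:j + k] * w[o])
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x, h_prev, c_prev, W):
    """One ConvLSTM step: the four gate pre-activations are convolutions
    of [x, h_prev], where a plain LSTM would use fully connected products."""
    z = np.concatenate([x, h_prev], axis=0)  # stack input and previous state
    f = sigmoid(conv2d(z, W["f"]))           # forget gate
    i = sigmoid(conv2d(z, W["i"]))           # input gate
    o = sigmoid(conv2d(z, W["o"]))           # output gate
    g = np.tanh(conv2d(z, W["g"]))           # candidate cell state
    c = f * c_prev + i * g                   # new cell state
    h = o * np.tanh(c)                       # new hidden state
    return h, c
```

Because every gate is a convolution, the parameter count depends only on the kernel size and channel counts, not on the spatial resolution, which is the memory advantage mentioned above.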

ConvLSTM

● k(h_t): sum of the absolute values across the channels of h_t
● Ŷ_t: predicted mask

Spatial inhibition module

[Diagram: the spatial inhibition module highlighted within the recurrent structure, taking h_t and producing the mask Ŷ_t and score s_t.]

Spatial inhibition module

[Diagram: the module maps the hidden state to region proposals and their scores.]

● Discriminate only one instance: convolution + log-softmax
● Adapt the output to a binary mask
● At inference time, a pixel is assigned to an instance if the predicted value is higher than 0.5 (though values are usually saturated, very close to 0 or 1)
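The "convolution + log-softmax, then binarize at 0.5" step can be sketched as follows; `mask_from_logits` is a hypothetical helper and the two-channel (background vs. instance) layout is an assumption, not necessarily the paper's exact layer arrangement:

```python
import numpy as np

def mask_from_logits(logits):
    """Per-pixel two-way softmax (channel 0: background, channel 1: instance);
    a pixel is assigned to the instance when its probability exceeds 0.5."""
    # logits: (2, H, W)
    m = logits - logits.max(axis=0, keepdims=True)  # stabilize the exponent
    e = np.exp(m)
    prob = e[1] / e.sum(axis=0)                     # instance probability per pixel
    return prob > 0.5                               # binary mask
```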

Scores

Really simple: the "intelligence" of the scoring must be learned in the previous stages. The module just looks at the values of the hidden state and applies a linear function.

Post Processing

Results are further improved using a Conditional Random Field (CRF):

● Refines regions, as the ConvLSTM operates on a low-resolution representation of the image
● Applied outside of the trainable modules

[Figures: qualitative comparison of RIS (Recurrent Instance Segmentation) vs. RIS + CRF post processing.]


Loss Function

● End-to-end training

Loss Function

1. Compute the Intersection over Union (IoU) for all pairs of predicted masks Ŷ_t and ground-truth masks Y_t.

Example IoU matrix (rows: predicted masks, columns: ground-truth masks):

    0.9  0    0
    0.1  0.8  0.1
    ...  ...  ...
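The pairwise IoU table above can be computed as in this sketch; `pairwise_iou` is an illustrative helper name:

```python
import numpy as np

def pairwise_iou(pred_masks, gt_masks):
    """IoU between every predicted and every ground-truth binary mask.
    pred_masks: (n_pred, H, W); gt_masks: (n_gt, H, W); 0/1 arrays."""
    iou = np.zeros((len(pred_masks), len(gt_masks)))
    for i, p in enumerate(pred_masks):
        for j, g in enumerate(gt_masks):
            inter = np.logical_and(p, g).sum()
            union = np.logical_or(p, g).sum()
            iou[i, j] = inter / union if union > 0 else 0.0
    return iou
```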

Loss Function

2. Find the best matching:

● Simply find the maximum-weight bipartite matching, with the pairwise IoU values as the weights
● Easily solved using the Hungarian Algorithm (polynomial time)
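The matching step can be reproduced with SciPy's off-the-shelf Hungarian-style solver; the IoU values below extend the illustrative matrix from the earlier slide:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Pairwise IoU matrix (rows: predicted masks, columns: ground-truth masks);
# the numbers are illustrative, matching the example slide.
iou = np.array([[0.9, 0.0, 0.0],
                [0.1, 0.8, 0.1],
                [0.0, 0.2, 0.7]])

# linear_sum_assignment minimizes total cost, so negate to maximize total IoU.
rows, cols = linear_sum_assignment(-iou)
matched_iou = iou[rows, cols].sum()  # total IoU of the best matching
```

Here the diagonal assignment wins (0.9 + 0.8 + 0.7 = 2.4), i.e. each prediction is paired with the ground-truth mask it overlaps most, subject to the one-to-one constraint.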

Loss Function

Loss: the negative of the sum of the IoU values for the best matching.

Loss Function

3. Also take into account the scores s_1, …, s_{n+1}.

Where:
● one term is the binary cross-entropy between each score and its target
● [·] is the Iverson bracket, which is 1 if the condition is true and 0 otherwise

Loss Function

3 (continued). Simply:

● For matched instances, the score s_t should be close to 1
● For unmatched instances, the score should be close to 0 (in the five-output example, the last contribution involves 1 - s_5)

This term is > 0, and we want it small.
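This score term can be sketched as a binary cross-entropy with Iverson-bracket targets; `score_loss` is an illustrative helper and the unweighted sum is an assumption, not the paper's exact formulation:

```python
import numpy as np

def score_loss(scores, n_matched):
    """Binary cross-entropy on the stopping scores: target 1 for the first
    n_matched (matched) outputs, target 0 for the remaining ones."""
    scores = np.asarray(scores, dtype=float)
    # Iverson bracket [t <= n]: 1 for matched steps, 0 otherwise
    targets = (np.arange(len(scores)) < n_matched).astype(float)
    eps = 1e-12  # avoid log(0)
    bce = -(targets * np.log(scores + eps)
            + (1 - targets) * np.log(1 - scores + eps))
    return bce.sum()
```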

Loss Function

4. Add everything together: the matched-IoU term is > 0 and we want it big; the score term is > 0 and we want it small.

● For each iteration:
  ○ Forward propagate
  ○ Find the optimal matching
  ○ Once we have the matching, backpropagate the gradients of the loss function with the matching values previously found
● The discrete minimization over matchings is treated as constant (ignored) when backpropagating
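The "forward, match, then backpropagate with the matching held fixed" recipe can be sketched as a single loss evaluation; `ris_loss` and its unweighted sum of the two terms are illustrative, not the paper's exact formulation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ris_loss(iou, scores):
    """Combined objective: minus the matched IoU (want it big) plus the
    score cross-entropy (want it small). The matching is computed first
    and then treated as constant, as in the training procedure above."""
    rows, cols = linear_sum_assignment(-iou)  # optimal matching (held fixed)
    matched = iou[rows, cols].sum()           # > 0, and we want it big
    # Target 1 for scores matched to a ground-truth instance, 0 otherwise
    targets = (np.arange(len(scores)) < iou.shape[1]).astype(float)
    s = np.asarray(scores, dtype=float)
    eps = 1e-12
    bce = -(targets * np.log(s + eps)
            + (1 - targets) * np.log(1 - s + eps)).sum()
    return -matched + bce                     # total loss to minimize
```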

Experiments: Multiple Person Segmentation

● The model is integrated on the FCN-8s network developed in Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation
● The ConvLSTM is introduced before the upsampling layer

Multiple Person Segmentation

Trained using the MSCOCO dataset and the training images of the Pascal VOC 2012 dataset:

1. Fix the weights of the FCN-8s except for the last layer, and learn the parameters of that last layer together with the ConvLSTM and the spatial inhibition module
2. Fine-tune the whole network

Multiple Person Segmentation

The FCN features show a "mix" between semantic and instance segmentation.

Experiments: Plants Leaf Segmentation

● The fully convolutional network is learned from scratch: 5 convolutional layers + ReLU
● Computer Vision Problems in Plant Phenotyping (CVPPP) dataset: 161 images
● Low SBD because of the low resolution (though the Difference in Count is good)

SBD is a measure of the accuracy of the segmentation of the instances.

Plants Leaf Segmentation

There are systems that perform better at the moment (better resolution, …): Mengye Ren, Richard S. Zemel: End-to-End Instance Segmentation and Counting with Recurrent Attention (30 May 2016). The article studied in this presentation was published on 25 Nov 2015.

[Table: quantitative comparison between RIS+CRF and Ren & Zemel.]

Conclusions

● The model integrates in a single pipeline all the functions required to segment instances, and their parameters are jointly learned end-to-end
● The model uses a recurrent structure that is able to track visited areas in the image as well as to handle occlusion among instances, similarly to humans
● The defined loss function accurately represents the instance segmentation objective
● The experiments show promising performance

Appendix

[Figures extracted from Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation]
