Module 5 Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016


Previously, end-to-end..

Slide credit: Jose M Àlvarez

Part I: End-to-end learning (E2E)

[Figure: an input image is mapped to a learned representation, trained end-to-end for Task A (e.g. image classification); example output: "Dog".]

Previously, fine-tuning..

Slide credit: X. Giro

Part I: End-to-end learning (E2E)

Domain A: Learned Representation

Part I’: End-to-End Fine-Tuning (FT)

Domain B: Fine-tuned Learned Representation

[Figure: the representation learned on Domain A is transferred to Domain B and fine-tuned.]

Previously, fine-tuning..

Fine-tuning a pre-trained network

Fine-tuning: use a high learning rate in the new layer and a low learning rate in all other layers.

Slide credit: Victor Campos, “Layer-wise CNN surgery for Visual Sentiment Prediction” (ETSETB 2015)
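The fine-tuning rule (high learning rate in the new layer, low learning rate elsewhere) can be sketched as a plain SGD step with one multiplier per layer. The layer names, the multipliers 0.1 and 10, and the helper `sgd_step` are illustrative assumptions, not values from the lecture.

```python
def sgd_step(params, grads, base_lr, lr_mult):
    """One SGD step where each layer uses base_lr * lr_mult[layer]."""
    updated = {}
    for name, weights in params.items():
        lr = base_lr * lr_mult.get(name, 1.0)
        updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
    return updated

# Hypothetical two-layer example: 'conv1' is pre-trained,
# 'fc_new' is the freshly initialised layer for the new task.
params = {"conv1": [1.0, 2.0], "fc_new": [0.0, 0.0]}
grads = {"conv1": [1.0, 1.0], "fc_new": [1.0, 1.0]}
lr_mult = {"conv1": 0.1, "fc_new": 10.0}  # assumed multipliers

new_params = sgd_step(params, grads, base_lr=0.01, lr_mult=lr_mult)
```

With the same gradient, the new layer moves 100 times further than the pre-trained one, which keeps the transferred features close to their initial values.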

Previously, off-the-shelf features..

Slide credit: X. Giro

Part I: End-to-end learning (E2E)

Task A (e.g. image classification): Learned Representation

Part II: Off-the-shelf features

Task B (e.g. image retrieval)

[Figure: the representation learned end-to-end on Task A is reused as off-the-shelf features for Task B.]

Previously, off-the-shelf features..

Image classification: image as an input, label as output (e.g. "Orange")

• spatially coded image representations (like spatial pyramids): $d_x \times d_y \times d_F$

• orderless image representation (like BOW): $1 \times 1 \times d_F$

Two deep lectures in M5

Deep ConvNets for Recognition at...

• Global Scale (today’s lecture)

• Local Scale (next lecture)

Image Classification

Image classification: image as an input, label as output (e.g. "Orange")

How to process non-square images?

• resize

• zero padding

• largest centred square
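The three options for non-square images (resize, zero padding, largest centred square) can be sketched with NumPy as follows. The function names are hypothetical, and nearest-neighbour sampling is an assumption chosen to keep the example short; real pipelines usually interpolate.

```python
import numpy as np

def resize_nearest(img, size):
    """Resize to size x size, ignoring aspect ratio (nearest neighbour)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def zero_pad_to_square(img):
    """Pad the shorter side with zeros so the image becomes square."""
    h, w = img.shape[:2]
    side = max(h, w)
    top, left = (side - h) // 2, (side - w) // 2
    out = np.zeros((side, side) + img.shape[2:], dtype=img.dtype)
    out[top:top + h, left:left + w] = img
    return out

def largest_centred_square(img):
    """Crop the largest square centred in the image."""
    h, w = img.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return img[top:top + side, left:left + side]

img = np.ones((4, 6, 3), dtype=np.uint8)  # toy 4x6 RGB image
```

Resizing distorts the aspect ratio, padding keeps all pixels but adds empty borders, and cropping keeps the aspect ratio but discards the image sides.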


Local object recognition

object localization

(single object)

object detection

semantic segmentation

Classification + localization

Slide credit: Li, Karpathy, Johnson

Localization as regression

[Figure: a shared network feeds two heads, a classification head and a regression head.]

Problem: multiple classes

• Classification head: C class scores

• Regression head: C×4 numbers (four box coordinates per class)
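With a class-specific regression head, the network outputs C×4 numbers and the box belonging to the top-scoring class is read out. A minimal sketch, assuming the regression output is a flat list ordered class by class (the layout and the helper name are assumptions):

```python
def predict_class_and_box(class_scores, box_outputs):
    """Pick the top-scoring class and the 4 box numbers regressed for it.

    class_scores: list of C scores (classification head)
    box_outputs:  flat list of C*4 numbers (regression head)
    """
    num_classes = len(class_scores)
    assert len(box_outputs) == 4 * num_classes
    best = max(range(num_classes), key=lambda c: class_scores[c])
    return best, box_outputs[4 * best:4 * best + 4]

scores = [0.1, 0.7, 0.2]
boxes = [0, 0, 1, 1,   10, 20, 50, 60,   5, 5, 9, 9]
label, box = predict_class_and_box(scores, boxes)
```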

Localization as regression (example)

Example of localization of clothes. Regression is done in two steps: first the person bounding box, then the clothing bounding boxes (master project 2015).

Esteve Cervantes: Evaluating deep features for Fashion Recognition


Local object recognition

object localization

(single object)

object detection

semantic segmentation

Any ideas?

Sliding window

[Figure: a 227×227 window slides over the image; each position gets a classification score (e.g. 0.03, 0.83) and a regressed bounding box.]

Compute a new regressed bounding box and classification score for all sliding window positions.

Repeat for different scales and combine all results (e.g. with non-maximum suppression).

[Figure: 227×227 windows at a second scale, with scores 0.83 and 0.99.]
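Combining the scored windows with non-maximum suppression can be sketched as the usual greedy procedure: keep the best-scoring box, drop boxes that overlap it too much, repeat. The IoU threshold of 0.5 and the toy boxes are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_maximum_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.83, 0.99, 0.03]
kept = non_maximum_suppression(boxes, scores)
```

In practice a score threshold is usually applied first, so that low-confidence windows such as the 0.03 one are discarded before suppression.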

Sliding window (efficient computation)

Let us for simplicity consider a simple three-layer network that classifies a 10×10 window as car/not car:

• conv1: 5 filters of (5x5x3), 10×10×3 window → 6×6×5
• fc1: 6×6×5 → 1×1×10
• fc2: 1×1×10 → 1×1×2 (car/not car)

What are the spatial coordinates of conv1 when the window slides over a 12×17 image? Part of the convolutional features are the same for neighbouring windows and do not need recomputation!

How many 10×10 windows are there in this 12×17 image? (12 − 10 + 1) × (17 − 10 + 1) = 3 × 8 = 24.

The convolutions can be computed in a single pass over the whole image, and the fully connected layers can be rewritten as convolutions:

• conv1: 5 filters of (5x5x3), 12×17×3 image → 8×13×5
• fc1 = conv2: 10 filters of (6x6x5), 8×13×5 → 3×8×10
• fc2 = conv3: 2 filters of (1x1x10), 3×8×10 → 3×8×2

We have the 8x3=24 classification scores, sharing the computation of the convolutional features.
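The equivalence between evaluating the network on every 10×10 window and running it once as a fully convolutional network can be checked numerically. This is a sketch with random weights; the ReLU placement, the layer names fc1/fc2, and the helper `conv_valid` are assumptions, while the layer sizes follow the example above.

```python
import numpy as np

def conv_valid(x, filters):
    """Valid cross-correlation. x: (H, W, C); filters: (K, kh, kw, C)."""
    K, kh, kw, _ = filters.shape
    H, W, _ = x.shape
    out = np.empty((H - kh + 1, W - kw + 1, K))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw, :]
            out[i, j] = np.tensordot(filters, patch, axes=3)
    return out

rng = np.random.default_rng(0)
w_conv1 = rng.standard_normal((5, 5, 5, 3))   # 5 filters of 5x5x3
w_fc1 = rng.standard_normal((10, 6 * 6 * 5))  # fc1: 180 -> 10
w_fc2 = rng.standard_normal((2, 10))          # fc2: 10 -> 2

def net_on_window(window):
    """Original network applied to one 10x10x3 window."""
    h = np.maximum(conv_valid(window, w_conv1), 0)  # 6x6x5
    h = np.maximum(w_fc1 @ h.reshape(-1), 0)        # 10
    return w_fc2 @ h                                # 2

def net_fully_conv(image):
    """Same weights, with fc1 and fc2 reshaped into convolution filters."""
    h = np.maximum(conv_valid(image, w_conv1), 0)
    h = np.maximum(conv_valid(h, w_fc1.reshape(10, 6, 6, 5)), 0)
    return conv_valid(h, w_fc2.reshape(2, 1, 1, 10))

image = rng.standard_normal((12, 17, 3))
dense_scores = net_fully_conv(image)  # one pass over the whole image
window_scores = np.stack([
    np.stack([net_on_window(image[i:i + 10, j:j + 10]) for j in range(8)])
    for i in range(3)
])                                    # 24 separate window evaluations
```

Both arrays hold the same 3×8 grid of car/not car scores; the fully convolutional version computes the conv1 features once instead of 24 times.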

Sliding window (efficient computation)

Networks can be written as fully convolutional networks to speed up computation at test time.

Example of bear and fish detection on multiple scales.

Sermanet et al., ‘Integrated Recognition, Localization and Detection using Convolutional Networks’, ICLR 2014

object proposals

• Object proposal methods compute boxes which potentially contain an object.

• Features are extracted for each box and a classifier is applied.

• Typically thousands of boxes (but far fewer than with a sliding window).

• Many different approaches: selective search, edge boxes, GOP, etc.

Selective search: K. Van de Sande et al. Segmentation as selective search for object recognition. ICCV 2011.

object proposals (RCNN)

Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." CVPR 2014.

1. compute object proposals (~2k)

2. warp dilated bounding box

3. compute CNN features

4. classify regions (e.g. car: yes, person: no)

Bounding box regression then gives an improved bounding box.

The CNN features come from AlexNet:

• Remove the last layer and fine-tune for the 20 PASCAL classes.

• Use the fc7 4096-d vector as the description of the bounding box.

• Train an SVM on this representation for classification.

Slide credits: Girshick; Li, Karpathy, Johnson

drawbacks:

• not end-to-end

• warping of boxes

• lots of double computation (overlap of bounding boxes)

object proposals (Fast R-CNN)

• Compute the convolutional features once per image (shared computation, conv1-conv5).

• Extract features from ‘conv 5’ for all bounding boxes.

This was first proposed by: He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." PAMI 2015

object proposals (Fast R-CNN)

For all bounding boxes: Region of Interest pooling (ROI pooling)

• Pool the features in a spatial grid.
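ROI pooling turns a bounding box of any size on the shared feature map into a fixed-size grid of features. A minimal NumPy sketch, assuming max pooling, a 2×2 grid, and boxes given in feature-map coordinates (all three are illustrative choices, as is the helper name):

```python
import numpy as np

def roi_max_pool(feature_map, roi, grid=(2, 2)):
    """Max-pool the features inside roi into a fixed grid.

    feature_map: (H, W, C) array, e.g. conv5 features
    roi: (y1, x1, y2, x2) in feature-map cells, end-exclusive
    """
    y1, x1, y2, x2 = roi
    gh, gw = grid
    ys = np.linspace(y1, y2, gh + 1)  # bin edges along y
    xs = np.linspace(x1, x2, gw + 1)  # bin edges along x
    out = np.empty((gh, gw, feature_map.shape[2]), dtype=feature_map.dtype)
    for i in range(gh):
        ya = int(np.floor(ys[i]))
        yb = max(int(np.ceil(ys[i + 1])), ya + 1)  # at least one cell per bin
        for j in range(gw):
            xa = int(np.floor(xs[j]))
            xb = max(int(np.ceil(xs[j + 1])), xa + 1)
            out[i, j] = feature_map[ya:yb, xa:xb].max(axis=(0, 1))
    return out

fmap = np.arange(6 * 8, dtype=float).reshape(6, 8, 1)  # toy 6x8 feature map
pooled_small = roi_max_pool(fmap, (0, 0, 2, 2))  # 2x2 box
pooled_large = roi_max_pool(fmap, (1, 2, 6, 8))  # 5x6 box
```

Both boxes yield a 2×2×C output, so the same fully connected layers can follow regardless of box size.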

object proposals (Fast R-CNN)

• ROI pooling: pool the features in a spatial grid.

• FCs with two outputs: classification (log loss) and regression (smooth L1 loss).

• End-to-end training.
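The smooth L1 loss used for the regression output is quadratic for small errors and linear for large ones, which makes it less sensitive to outliers than a squared loss. A short sketch, summing over the four box coordinates (the helper names are hypothetical):

```python
def smooth_l1(x):
    """0.5 * x^2 if |x| < 1, otherwise |x| - 0.5."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def box_regression_loss(predicted, target):
    """Sum of smooth L1 terms over the box coordinates."""
    return sum(smooth_l1(p - t) for p, t in zip(predicted, target))

loss = box_regression_loss([0.0, 0.0, 0.0, 0.0], [0.5, -2.0, 0.0, 1.0])
```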

object proposals (Fast R-CNN)

                    Fast R-CNN   R-CNN
Train time (h)      9.5          84
Train speedup       8.8x         -
Test time/image     0.32 s       47 s
Test speedup        146x         -
mAP                 66.9%        66.0%

• Multi-task training also improves classification performance, and end-to-end training improves results.

• Test time does not include object proposal computation (which is now the bottleneck).

object proposals (Faster R-CNN)

Compute the object proposals directly in the network.

[Figure: the shared computation up to ‘conv5’ feeds a Region Proposal Network (RPN); its proposals go through ROI pooling and the FCs.]

object proposals (Faster R-CNN)

Slide credit: Kaiming He

• Slide a window over the feature map.

• Add a network which classifies and regresses the bounding boxes.

• The classification score provides the confidence that an object is present.

• Use N anchors for proposals of varying aspect ratios.
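Anchors of varying aspect ratio can be generated around each sliding-window position as reference boxes, which the network then scores and refines. A minimal sketch, assuming one fixed area per position and the ratios 1:4, 1:1 and 4:1 (both assumptions; the helper name is hypothetical):

```python
def make_anchors(center_x, center_y, base_size, aspect_ratios):
    """Return one (x1, y1, x2, y2) anchor per aspect ratio (height / width).

    Every anchor keeps the same area, base_size ** 2.
    """
    anchors = []
    for ratio in aspect_ratios:
        width = base_size / ratio ** 0.5
        height = base_size * ratio ** 0.5
        anchors.append((center_x - width / 2, center_y - height / 2,
                        center_x + width / 2, center_y + height / 2))
    return anchors

anchors = make_anchors(100.0, 100.0, base_size=64.0,
                       aspect_ratios=[0.25, 1.0, 4.0])
```

Here N = 3; each anchor would receive its own objectness score and its own four regression outputs.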

object proposals (Faster R-CNN)

Slide credits: Kaiming He; Li, Karpathy, Johnson

Computation for 1000 boxes:

Model                      Time
Edge boxes + R-CNN         0.25 sec + 1000*ConvTime + 1000*FcTime
Edge boxes + fast R-CNN    0.25 sec + 1*ConvTime + 1000*FcTime
faster R-CNN               1*ConvTime + 1000*FcTime

object localization

Winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2015, with residual networks and Faster R-CNN.

summary object detection

Slide credit: Li, Karpathy, Johnson

• Object localization: when there is one object, or a known number of objects/classes, you can do object localization by adding a ‘regression head’ to your network.

• Sliding window + CNN can be computed efficiently by writing the network as a fully convolutional network.

• Object proposal methods are straightforwardly combined with CNNs, but for fast/good results consider:

    • adding a regression head to improve bounding box estimation.
    • sharing computation of the convolutional features (SPP).
    • end-to-end training of the network (Fast R-CNN).
    • including a Region Proposal Network for fast object proposals within the network (Faster R-CNN).


Local object recognition

object localization

(single object)

object detection

semantic segmentation

semantic segmentation

• semantic segmentation: assign a class to all pixels

• instance segmentation: assign pixels to a particular instance of a class (chair1, etc.)

semantic segmentation

• ConvNet: predict the class of the center pixel.

• Write the network as a fully convolutional network and apply it to the whole image.

• Because of the convolutions the output resolution is smaller than the input, and upsampling is required.

semantic segmentation

pixelwise loss

Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015

semantic segmentation

Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015

[Figures: each operation below is shown applied to an input grid.]

• Convolution (3x3), padding [1 1 1 1], stride [1 1]

• Convolution (3x3), padding [1 1 1 1], stride [2 2]

• Deconvolution (3x3), padding [1 1 1 1], stride [2 2]

• Deconvolutions are also called fractionally strided convolutions or transposed convolutions.
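The effect of stride and padding on resolution can be checked with the standard output-size formulas for a convolution and for its transpose. This is a sketch along one spatial dimension with symmetric padding; the input size of 5 and the function names are assumptions.

```python
def conv_output_size(n, kernel, pad, stride):
    """Output length of a convolution along one dimension."""
    return (n + 2 * pad - kernel) // stride + 1

def deconv_output_size(n, kernel, pad, stride):
    """Output length of the transposed convolution along one dimension."""
    return (n - 1) * stride - 2 * pad + kernel

same = conv_output_size(5, kernel=3, pad=1, stride=1)           # stride [1 1]
down = conv_output_size(5, kernel=3, pad=1, stride=2)           # stride [2 2]
restored = deconv_output_size(down, kernel=3, pad=1, stride=2)  # deconvolution
```

A 3x3 convolution with padding 1 keeps the resolution at stride 1 and roughly halves it at stride 2; the deconvolution with the same settings maps the reduced grid back to the original size.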

semantic segmentation

Noh et al. ICCV 2015

semantic segmentation

Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015

Combine where (local, shallow) with what (global, deep).

‘Skip layers’: predictions from shallower layers are combined with upsampled deeper predictions (interp + sum, applied twice) to produce the dense output.

[Figure: input image, ground truth, and results for stride 32 (no skips), stride 16 (1 skip), and stride 8 (2 skips).]
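The ‘interp + sum’ fusion of skip layers can be sketched as: upsample the coarse deep scores by a factor of 2, add the scores predicted from a shallower layer, and repeat. Nearest-neighbour upsampling is used here as a simplifying assumption in place of the interpolation in the original network, and the constant score maps are toy data.

```python
import numpy as np

def upsample2x(scores):
    """Nearest-neighbour 2x upsampling of an (H, W, C) score map."""
    return scores.repeat(2, axis=0).repeat(2, axis=1)

def fuse_skip(deep_scores, shallow_scores):
    """'interp + sum': upsample the deep scores, then add the skip scores."""
    up = upsample2x(deep_scores)
    assert up.shape == shallow_scores.shape
    return up + shallow_scores

num_classes = 3
stride32 = np.ones((2, 2, num_classes))       # coarse, deep prediction
stride16 = np.full((4, 4, num_classes), 0.5)  # skip from a shallower layer
stride8 = np.full((8, 8, num_classes), 0.25)  # skip from an even shallower layer

fused16 = fuse_skip(stride32, stride16)  # 1 skip
fused8 = fuse_skip(fused16, stride8)     # 2 skips
labels = fused8.argmax(axis=2)           # per-pixel class at stride 8
```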

semantic segmentation

Eigen, Fergus, Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture, ICCV 2015

Surface normals results

instance segmentation

Dai et al. ‘Instance aware Semantic Segmentation via Multi-task Network Cascades’, arXiv 2015.

[Figure: results compared with ground truth.]

Generative Adversarial Networks

Fractionally strided convolutions (deconvolutions) can be used to generate images from noise.

Generative Adversarial Networks

Consider I would like to generate images of horses. My generated horse images G(z) are generated from noise z.

1. I can train a discriminative network D to distinguish real horse images x from generated horse images G(z):

$$\max_D \; \log D(x) + \log\bigl(1 - D(G(z))\bigr)$$

2. I can then optimize my generative network G to fool the discriminative network:

$$\min_G \max_D \; \log D(x) + \log\bigl(1 - D(G(z))\bigr)$$

3. You can re-optimize the discriminative network D, etc., until D gives in...

Goodfellow et al., Generative Adversarial Nets, NIPS 2014
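The objective log D(x) + log(1 − D(G(z))) can be evaluated for a single real/generated pair to see what each player wants. The discriminator outputs below are made-up numbers, and the function name is hypothetical.

```python
import math

def gan_objective(d_real, d_fake):
    """log D(x) + log(1 - D(G(z))) for discriminator outputs in (0, 1)."""
    return math.log(d_real) + math.log(1.0 - d_fake)

# D is confident: real horses score 0.9, generated horses score 0.1.
confident = gan_objective(d_real=0.9, d_fake=0.1)

# D has given in: it outputs 0.5 for everything.
fooled = gan_objective(d_real=0.5, d_fake=0.5)
```

D is trained to push the value up (towards 0), while G is trained to push it down; once D cannot tell the two apart the value settles at 2 log 0.5.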

Generative Adversarial Networks

Examples of generated bedrooms.

Interpolation between points in z.

Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016
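Interpolating between two points in z amounts to generating an image for each point on the line between two noise vectors. A minimal sketch of the latent path only, assuming linear interpolation and at least two steps; the generator itself is not modelled here.

```python
def interpolate_z(z_start, z_end, steps):
    """Return `steps` latent vectors evenly spaced from z_start to z_end."""
    path = []
    for k in range(steps):
        t = k / (steps - 1)  # runs from 0 to 1; requires steps >= 2
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path

z_path = interpolate_z([0.0, 1.0], [1.0, -1.0], steps=5)
```

Feeding each vector of `z_path` to a trained generator would give the sequence of images that morphs from one sample into the other.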

summary semantic segmentation

Slide credit: Li, Karpathy, Johnson

• Fully convolutional networks can be applied for efficient classification of all pixels.

• To get high-quality segmentations, deep features of multiple scales need to be combined (e.g. with skip layers).

• Upsampling can be done by de-convolution and de-pooling operations.

• Instance segmentation can be performed by combining object detection and semantic segmentation pipelines.