Deep Learning for Vision Part II-CNN and Recognition Associate Prof. Bingbing Ni (倪冰冰) Shanghai Jiao Tong University

Source: medialab.sjtu.edu.cn/teaching/CV/Lec/Lec7-DP-CNN_Recognition.pdf

Convolutional Neural Network

AlphaGo

Consider an image classification problem

Input image: 200x200, label: “face”

Fully-connected, 400000 hidden units: 200 x 200 x 400000 = 16 billion parameters!

Idea: local connection

Input image: 200x200

Locally-connected, 400000 hidden units, 40 million parameters!

1. Captures local 10x10 region (100 weights)

2. Weights sharing (the same 𝒘 at every location). Leads to Conv Filter!

3. Like “convolution”

4. Can have different local filters to generate different responses
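
The parameter counts quoted on these two slides can be checked with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, biases are ignored, and the last count assumes (this is not stated on the slide) that weight sharing is applied with a single 10x10 filter.

```python
# Parameter counts for a 200x200 input image (biases ignored).
num_pixels = 200 * 200        # 40,000 input pixels
hidden_units = 400_000

# Fully connected: every hidden unit is connected to every pixel.
fc_params = num_pixels * hidden_units

# Locally connected: every hidden unit sees one 10x10 region (100 weights).
local_params = hidden_units * 10 * 10

# Weight sharing (assumption: one 10x10 filter reused at every location).
shared_params = 10 * 10

print(f"fully connected:   {fc_params:,}")
print(f"locally connected: {local_params:,}")
print(f"shared filter:     {shared_params:,}")
```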

Evidence: biological inspiration

Hubel and Wiesel, 1959

Convolutional filter

Convolve the filter with the image, i.e., “slide over the image spatially, computing dot products”

Filters always extend the full depth of the input volume

Convolutional filter

Output a single number

- The result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e., a 5*5*3 = 75-dimensional dot product + bias)

- Called “convolution” for historical reasons; the operation is in fact “correlation”
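
To make the single-number output concrete, here is a small NumPy sketch of what happens at one spatial location. The random patch, filter and bias value are made up for illustration. The last lines show the difference between correlation (no flip) and a true per-channel 2-D convolution (filter flipped along both spatial axes).

```python
import numpy as np

rng = np.random.default_rng(0)
chunk = rng.standard_normal((5, 5, 3))   # one 5x5x3 patch of the image
w = rng.standard_normal((5, 5, 3))       # one 5x5x3 filter
b = 0.1                                  # bias (arbitrary value)

# 75-dimensional dot product plus bias: a single number.
out = float(np.dot(w.ravel(), chunk.ravel()) + b)

# Correlation multiplies element-wise without flipping the filter,
# so it is exactly the dot product above.
corr = float(np.sum(w * chunk) + b)

# A true convolution flips the filter along both spatial axes first.
conv = float(np.sum(w[::-1, ::-1, :] * chunk) + b)

print(w.size, out, corr, conv)
```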

Convolutional layer

Convolve (slide) over all spatial locations

- If we have 6 5x5x3 filters, we get 6 activation maps

- Stack up these maps to get a new “image” of size 28x28x6

- The set of 6 5x5x3 filters is called a “convolutional layer”
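
The stacking of activation maps can be sketched with a naive NumPy loop. The 32x32x3 input size is an assumption (it is the size that yields 28x28 maps with 5x5 filters, stride 1 and no padding), and `conv_layer` is a hypothetical helper written for clarity, not speed.

```python
import numpy as np

def conv_layer(image, filters, biases):
    """Slide each filter over all spatial locations (stride 1, no padding)."""
    H, W, C = image.shape
    K, F, _, _ = filters.shape               # K filters of size F x F x C
    out_h, out_w = H - F + 1, W - F + 1
    out = np.zeros((out_h, out_w, K))
    for k in range(K):
        for i in range(out_h):
            for j in range(out_w):
                patch = image[i:i + F, j:j + F, :]
                # Dot product over the full depth, plus bias.
                out[i, j, k] = np.sum(patch * filters[k]) + biases[k]
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))      # assumed input size
filters = rng.standard_normal((6, 5, 5, 3))   # 6 filters of size 5x5x3
biases = np.zeros(6)

maps = conv_layer(image, filters, biases)
print(maps.shape)   # (28, 28, 6)
```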

An example

Image 6x6

Conv filter 3x3

We set stride = 1

Output map 4x4

Another example

Image 7x7 (N = 7), Filter 3x3 (F = 3)

- If stride = 1, output map size 5x5

- If stride = 2, output map size 3x3

Formula for output size: (N − F)/stride + 1

What happens when stride = 3? (7 − 3)/3 + 1 is not an integer, so the filter does not fit.
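
The output-size formula is easy to turn into a helper. In this sketch, `conv_output_size` is a hypothetical name, and returning `None` when the stride does not divide N − F evenly is a choice made here for illustration; real frameworks often round down instead.

```python
def conv_output_size(n, f, stride):
    """Return (N - F) / stride + 1, or None if the filter does not fit evenly."""
    if (n - f) % stride != 0:
        return None
    return (n - f) // stride + 1

print(conv_output_size(7, 3, 1))   # 5
print(conv_output_size(7, 3, 2))   # 3
print(conv_output_size(7, 3, 3))   # None: (7 - 3) / 3 is not an integer
```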

Zero padding

- In practice, it is common to pad the border with 0

- In this case N = 7 + 2, F = 3, stride = 3, so the output map size is 3 by the formula

- In general, it is common to see CONV layers with stride = 1, filters of size FxF, and zero-padding of (F − 1)/2

(N + 2 x (F − 1)/2 − F)/1 + 1 = N, so the size is preserved!
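
The padding arithmetic can be verified numerically. The sketch below uses `np.pad` (whose default mode pads with zeros) and a hypothetical helper `padded_output_size`; the odd filter sizes 3, 5 and 7 are example values.

```python
import numpy as np

def padded_output_size(n, f, stride, pad):
    """Output size (N + 2*pad - F) / stride + 1, with `pad` zeros on each side."""
    return (n + 2 * pad - f) // stride + 1

image = np.arange(49).reshape(7, 7)
padded = np.pad(image, 1)            # one ring of zeros around the border
print(padded.shape)                  # (9, 9), i.e. N = 7 + 2

# N = 7 + 2, F = 3, stride = 3 gives an output map of size 3.
print(padded_output_size(7, 3, 3, 1))

# Stride 1 with padding (F - 1) / 2 preserves the input size.
for f in (3, 5, 7):
    print(f, padded_output_size(7, f, 1, (f - 1) // 2))
```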

In Caffe, convolution is computed via vector/matrix operations

First we convert the image patches to columns (im2col), then calculate 𝒘𝒙 + 𝒃
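
The image-to-column idea can be sketched in NumPy as follows. This is not Caffe's actual code (Caffe is written in C++ and uses a channel-first memory layout); `im2col` here is a hypothetical helper, and the 7x7x3 image and four 3x3x3 filters are example sizes.

```python
import numpy as np

def im2col(image, F, stride=1):
    """Unroll every F x F x C patch of an H x W x C image into one column."""
    H, W, C = image.shape
    out_h = (H - F) // stride + 1
    out_w = (W - F) // stride + 1
    cols = np.empty((F * F * C, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            cols[:, i * out_w + j] = image[r:r + F, c:c + F, :].ravel()
    return cols, out_h, out_w

rng = np.random.default_rng(0)
image = rng.standard_normal((7, 7, 3))
filters = rng.standard_normal((4, 3, 3, 3))   # K = 4 filters of size 3x3x3
b = rng.standard_normal(4)

cols, out_h, out_w = im2col(image, F=3)
w_mat = filters.reshape(4, -1)                # each filter becomes one row
# The whole layer is now a single matrix product: w x + b.
out = (w_mat @ cols + b[:, None]).reshape(4, out_h, out_w)
print(cols.shape, out.shape)                  # (27, 25) (4, 5, 5)
```

The price of this trick is memory: overlapping patches are copied, so `cols` is larger than the original image.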

Compose the network

A conv net is a sequence of conv layers, interspersed with activation functions

Need to shrink the image step by step to extract higher-level information

The receptive field should become larger and larger as we go deeper
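
How the receptive field grows with depth can be computed with the standard recurrence: each layer adds (kernel − 1) times the product of all earlier strides. The sketch below assumes square kernels and no dilation; `receptive_fields` and the example layer stacks are hypothetical.

```python
def receptive_fields(layers):
    """Receptive field (in input pixels) after each (kernel, stride) layer."""
    r, jump = 1, 1          # jump = distance in input pixels between adjacent units
    sizes = []
    for kernel, stride in layers:
        r += (kernel - 1) * jump
        jump *= stride
        sizes.append(r)
    return sizes

# Three 3x3 convolutions with stride 1.
print(receptive_fields([(3, 1), (3, 1), (3, 1)]))   # [3, 5, 7]

# 3x3 conv, 2x2 pooling with stride 2, then another 3x3 conv.
print(receptive_fields([(3, 1), (2, 2), (3, 1)]))   # [3, 4, 8]
```

Shrinking the maps with strided layers makes the receptive field of later layers grow faster.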

Max pooling

Max pooling with 2x2 filter and stride = 2
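
A minimal NumPy sketch of 2x2 max pooling with stride 2 on a single channel. The input values are made up, and `max_pool_2x2` is a hypothetical helper that assumes even height and width.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an H x W map (H and W must be even)."""
    H, W = x.shape
    assert H % 2 == 0 and W % 2 == 0
    # Split into non-overlapping 2x2 blocks, then take the max of each block.
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

print(max_pool_2x2(x))
# [[6 8]
#  [3 4]]
```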

Connect conv activation maps to fully connected layers (FC)

Fully connected layer (FC)

FC layers may also be converted to CONV layers, i.e., by setting the filter size equal to the size of the input volume
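
The FC-to-CONV equivalence can be checked numerically. In this sketch the 7x7x16 input volume and the 10 outputs are assumed example sizes; each row of the FC weight matrix is reshaped into one filter that covers the whole volume, so the "convolution" has exactly one spatial position.

```python
import numpy as np

rng = np.random.default_rng(0)
volume = rng.standard_normal((7, 7, 16))        # assumed input volume
num_out = 10
w_fc = rng.standard_normal((num_out, 7 * 7 * 16))
b = rng.standard_normal(num_out)

# Fully connected layer on the flattened volume.
fc_out = w_fc @ volume.ravel() + b

# Same weights viewed as 10 filters of size 7x7x16 (filter size = input volume).
filters = w_fc.reshape(num_out, 7, 7, 16)
conv_out = np.array([np.sum(volume * f) for f in filters]) + b

print(np.allclose(fc_out, conv_out))   # True
```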

Local Contrast Normalization

- Also performed across features and in the higher layers

- Improves invariance, optimization and sparsity

Local Contrast Normalization Layer

Implementation of LeNet

Training Deep CNN

Batch Normalization (BN)


Troubleshooting the training

AlexNet

GoogLeNet

LeNet

VGGNet

In practice: small scale, novel class

- Often a small problem, e.g., hundreds of categories and thousands of samples

- Training a CNN from scratch is not stable

Idea: knowledge transfer via CNN

[Figure: a deep CNN model becomes a fine-tuned deep CNN model through domain adaptation; the general low-level features are shared]

Idea: knowledge transfer via CNN

- Take a pre-trained model from a model zoo

- Remove the last fully connected layer and connect a new objective

- Fine-tune the new network with a higher learning rate on the FC layers and a lower learning rate on the early CONV layers
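
The per-layer learning rates in the last step can be illustrated with a framework-free NumPy sketch. Everything here is hypothetical: the layer names, the shapes, the learning-rate values (1e-4 and 1e-2) and the placeholder gradients. A real setup would load pre-trained weights and compute gradients by backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter groups: two pre-trained CONV layers and one new FC layer.
params = {
    "conv1": rng.standard_normal((6, 5, 5, 3)),        # early layer, pre-trained
    "conv2": rng.standard_normal((16, 5, 5, 6)),       # pre-trained
    "fc_new": rng.standard_normal((100, 400)) * 0.01,  # re-initialized for the new objective
}

# Lower learning rate on early CONV layers, higher on the new FC layer (assumed values).
learning_rates = {"conv1": 1e-4, "conv2": 1e-4, "fc_new": 1e-2}

def sgd_step(params, grads, learning_rates):
    """One plain SGD update with a separate learning rate per parameter group."""
    return {name: p - learning_rates[name] * grads[name] for name, p in params.items()}

# Placeholder gradients of all ones, only to show the different step sizes.
grads = {name: np.ones_like(p) for name, p in params.items()}
updated = sgd_step(params, grads, learning_rates)

for name in params:
    print(name, float(np.abs(params[name] - updated[name]).max()))
```

With identical gradients, the new FC layer moves 100 times further per step than the pre-trained CONV layers, which keeps the shared low-level features close to their pre-trained values.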

Application: image retrieval

Application: OCR and logo

Application: texture

Application: object detection

Application: scene parsing

Application: action recognition

DCNN Object Detection

R-CNN

How to extend the CNN classification results to object detection?

- Localization: apply CNNs to region proposals

- Scarce data: fine-tune the pre-trained model

SSD: Single Shot MultiBox Detector

Default boxes and aspect ratios

Each feature map cell has a set of default bounding boxes, and the position of each box relative to its corresponding cell is fixed.
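
A sketch of how such default boxes can be laid out, following the parameterization in the SSD paper (box centered on the cell, width s·√a and height s/√a for scale s and aspect ratio a). The feature map size, scale and aspect ratios below are example values, `default_boxes` is a hypothetical helper, and the extra box SSD adds for aspect ratio 1 is omitted.

```python
import numpy as np

def default_boxes(fmap_size, scale, aspect_ratios):
    """Return (cx, cy, w, h) default boxes in normalized [0, 1] image coordinates."""
    boxes = []
    for i in range(fmap_size):              # row of the feature map cell
        for j in range(fmap_size):          # column of the feature map cell
            cx = (j + 0.5) / fmap_size      # box center = cell center (fixed)
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                w = scale * np.sqrt(ar)
                h = scale / np.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return np.array(boxes)

boxes = default_boxes(fmap_size=4, scale=0.3, aspect_ratios=(1.0, 2.0, 0.5))
print(boxes.shape)    # (48, 4): 4 x 4 cells, 3 boxes per cell
print(boxes[:3])      # the three boxes of the top-left cell
```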

Recurrent Neural Network

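Deep RNN for sequence

[Figure: an RNN cell with input $x_t$, hidden state $h_t$ and output $y_t$, shown equal to its unrolled form with inputs $x_0, x_1, x_2, \ldots, x_t$, hidden states $h_0, h_1, h_2, \ldots, h_t$ and outputs $y_0, y_1, y_2, \ldots, y_t$]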

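Training: back propagation through time (BPTT)

Once unrolled, there is no difference from a vanilla neural network!

[Figure: unrolled RNN with inputs $x_{t-1}, x_t, x_{t+1}$, hidden states $h_{t-1}, h_t, h_{t+1}$ and per-step errors $\varepsilon_{t-1}, \varepsilon_t, \varepsilon_{t+1}$. Each error sends the gradient $\partial \varepsilon_t / \partial h_t$ into its hidden state, and the gradients $\partial h_{t-1} / \partial h_{t-2}$, $\partial h_t / \partial h_{t-1}$, $\partial h_{t+1} / \partial h_t$, $\partial h_{t+2} / \partial h_{t+1}$ flow backward between consecutive hidden states]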
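
The two kinds of gradient in the figure combine into one backward recursion. The block below is a sketch under the assumption (not written on the slide) that the total loss is the sum of the per-step errors, $E = \sum_t \varepsilon_t$, and that gradients are column vectors so the Jacobian appears transposed.

```latex
\frac{\partial E}{\partial h_t}
  = \frac{\partial \varepsilon_t}{\partial h_t}
  + \left( \frac{\partial h_{t+1}}{\partial h_t} \right)^{\top}
    \frac{\partial E}{\partial h_{t+1}},
\qquad E = \sum_t \varepsilon_t
```

At the last time step the second term is absent, so the recursion starts there and runs backward in time, exactly like backpropagation through the layers of a feed-forward network.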

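Image Captioning

$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)$

Now: $h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t + W_{vh} v)$

[Figure: unrolled captioning RNN. Inputs $x_0$ = START, $x_1$ = straw, $x_2$ = hat; hidden states $h_0$, $h_1$, $h_2$; outputs straw, hat, END; the image feature V enters the hidden-state update]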
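
The modified recurrence can be sketched in NumPy. All sizes, the random weights and the random "word embeddings" are hypothetical stand-ins, the image feature is added at every step as the formula is written, and the output layer that predicts the next word is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, embed, feat = 8, 5, 12          # hypothetical dimensions

W_hh = rng.standard_normal((hidden, hidden)) * 0.1
W_xh = rng.standard_normal((hidden, embed)) * 0.1
W_vh = rng.standard_normal((hidden, feat)) * 0.1

def rnn_step(h_prev, x_t, v=None):
    """h_t = tanh(W_hh h_{t-1} + W_xh x_t [+ W_vh v])."""
    a = W_hh @ h_prev + W_xh @ x_t
    if v is not None:
        a = a + W_vh @ v                # extra term from the image feature
    return np.tanh(a)

v = rng.standard_normal(feat)             # stand-in for a CNN image feature
words = rng.standard_normal((3, embed))   # stand-ins for START, "straw", "hat"

h = np.zeros(hidden)
states = []
for x_t in words:
    h = rnn_step(h, x_t, v)
    states.append(h)

print(len(states), states[-1].shape)    # 3 (8,)
```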

Attention Model


Thank you!