Deep Learning for Computer Vision MIT 6.S191 Ava Soleimany January 29, 2019

Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the

Deep Learning for Computer Vision
MIT 6.S191

Ava Soleimany
January 29, 2019

6.S191 Introduction to Deep Learning, introtodeeplearning.com, 1/29/19

What Computers “See”

Images are Numbers

What the computer sees

An image is just a matrix of numbers in [0, 255]! E.g., 1080x1080x3 for an RGB image
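A minimal sketch of this representation, assuming NumPy and a randomly filled array standing in for a real photo (the variable names are illustrative only):

```python
import numpy as np

# Hypothetical 1080x1080 RGB image filled with random pixel intensities.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(1080, 1080, 3), dtype=np.uint8)

print(image.shape)  # (height, width, channels)
print(image.dtype)  # uint8 stores integers in [0, 255]
print(image.size)   # total number of values: 1080 * 1080 * 3
```

The third axis holds the red, green and blue channels, so a single pixel is `image[row, col]`, a vector of three numbers.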

[1]

Tasks in Computer Vision

- Regression: output variable takes a continuous value
- Classification: output variable takes a class label. Can produce the probability of belonging to a particular class

[Figure: Input Image → Pixel Representation → classification]

Lincoln: 0.8
Washington: 0.1
Jefferson: 0.05
Obama: 0.05

High Level Feature Detection

Let’s identify key features in each image category

Wheels, License Plate, Headlights

Door, Windows, Steps

Nose, Eyes, Mouth

Manual Feature Extraction

Problems?

Domain knowledge → Define features → Detect features to classify

[2]

Learning Feature Representations

Can we learn a hierarchy of features directly from the data instead of hand engineering?

Low level features: Edges, dark spots

Mid level features: Eyes, ears, nose

High level features: Facial structure

[3]

Learning Visual Features

Fully Connected Neural Network

Input:
• 2D image
• Vector of pixel values

Fully Connected:
• Connect neuron in hidden layer to all neurons in input layer
• No spatial information!
• And many, many parameters!
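To put a number on "many, many parameters", here is a back-of-the-envelope count for one fully connected hidden layer. It reuses the 1080x1080x3 image size mentioned earlier; the hidden layer width of 1000 is an assumption chosen only for illustration.

```python
height, width, channels = 1080, 1080, 3  # image size from the earlier example
hidden_units = 1000                      # assumed hidden layer width

inputs = height * width * channels       # length of the flattened pixel vector
fc_weights = inputs * hidden_units       # one weight per (input, neuron) pair
fc_params = fc_weights + hidden_units    # plus one bias per hidden neuron

print(f"{inputs:,} inputs -> {fc_params:,} parameters")
```

Flattening also discards which pixels were neighbours, which is the "no spatial information" point above.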

How can we use spatial structure in the input to inform the architecture of the network?

Using Spatial Structure

Neuron connected to region of input. Only “sees” these values.

Idea: connect patches of input to neurons in hidden layer.

Input: 2D image. Array of pixel values

Connect patch in input layer to a single neuron in subsequent layer. Use a sliding window to define connections.

How can we weight the patch to detect particular features?
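The sliding-window idea can be sketched as follows. The helper name `extract_patches`, the 6x6 toy image and the 3x3 window are all hypothetical choices for illustration.

```python
import numpy as np

def extract_patches(image, size, step):
    """Return (row, col, patch) for every position of a size x size window."""
    patches = []
    for r in range(0, image.shape[0] - size + 1, step):
        for c in range(0, image.shape[1] - size + 1, step):
            patches.append((r, c, image[r:r + size, c:c + size]))
    return patches

image = np.arange(36).reshape(6, 6)  # toy 6x6 "image"
patches = extract_patches(image, size=3, step=1)
print(len(patches))                  # number of window positions
```

Each patch would feed exactly one neuron in the next layer, so that neuron only "sees" the values inside its window.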

Applying Filters to Extract Features

1) Apply a set of weights – a filter – to extract local features

2) Use multiple filters to extract different features

3) Spatially share parameters of each filter (features that matter in one part of the input should matter elsewhere)

Feature Extraction with Convolution

- Filter of size 4x4: 16 different weights
- Apply this same filter to 4x4 patches in input
- Shift by 2 pixels for next patch

This “patchy” operation is convolution
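As a rough check on how many patches a 4x4 filter shifted by 2 pixels visits, the sketch below counts filter placements along one dimension. It assumes no padding, and the 28-pixel input width is a made-up example.

```python
def num_positions(input_size, filter_size=4, stride=2):
    """Number of filter placements along one dimension (no padding assumed)."""
    return (input_size - filter_size) // stride + 1

# Hypothetical 28-pixel-wide input with the 4x4 filter and a shift of 2 pixels.
print(num_positions(28))
```

The same 16 weights are reused at every placement, which is what keeps the parameter count small.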

Feature Extraction and Convolution: A Case Study

X or X?

Image is represented as matrix of pixel values… and computers are literal! We want to be able to classify an X as an X even if it’s shifted, shrunk, rotated, deformed.

[4]

Features of X

[4]

Filters to Detect X Features

filters

[4]

The Convolution Operation

Element-wise multiply, then add outputs:

1 x 1 = 1 for each matching entry, and the products sum to 9
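A small sketch of that arithmetic, assuming a 3x3 diagonal-stroke filter with stroke pixels encoded as +1 and background as -1 (the encoding and the filter values are assumptions, not taken from the slide):

```python
import numpy as np

# Assumed encoding: +1 for a stroke pixel, -1 for background.
diagonal_filter = np.array([[ 1, -1, -1],
                            [-1,  1, -1],
                            [-1, -1,  1]])

patch = diagonal_filter.copy()        # an image patch that matches the filter exactly
products = diagonal_filter * patch    # element-wise multiply: every entry becomes 1
response = products.sum()             # add the outputs

mirrored = patch[:, ::-1]             # the opposite diagonal, a poor match
weak_response = (diagonal_filter * mirrored).sum()

print(response, weak_response)
```

A perfect match gives the largest possible response for this filter; a mismatched patch gives a smaller one.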

[4]

Suppose we want to compute the convolution of a 5x5 image and a 3x3 filter:

We slide the 3x3 filter over the input image, element-wise multiply, and add the outputs…

[Figure: image, filter]

[5]
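The slide-multiply-add procedure can be sketched in NumPy. The 5x5 image and 3x3 filter values below are illustrative assumptions, since the figure did not survive extraction. As is common in CNNs, the filter is applied without flipping (strictly speaking a cross-correlation).

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1), multiply and sum."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w), dtype=image.dtype)
    for r in range(out_h):
        for c in range(out_w):
            patch = image[r:r + kh, c:c + kw]
            feature_map[r, c] = np.sum(patch * kernel)
    return feature_map

# Assumed example values for the 5x5 image and the 3x3 filter.
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

feature_map = convolve2d_valid(image, kernel)
print(feature_map)
```

With these values the 3x3 feature map is `[[4, 3, 4], [2, 4, 3], [2, 3, 4]]`: one number per filter position.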

Producing Feature Maps

[Figure panels: Original, Sharpen, Edge Detect, “Strong” Edge Detect]
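Different filter weights produce different feature maps. The kernels below are commonly used textbook choices for sharpening and edge detection; they are assumptions for illustration and not necessarily the ones behind the figure.

```python
import numpy as np

# Commonly used 3x3 kernels (assumed; the slide does not list its weights).
KERNELS = {
    "sharpen": np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]]),
    "edge_detect": np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]]),
    "strong_edge_detect": np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]),
}

def filter_image(image, kernel):
    """Apply a kernel at every valid position (no padding, stride 1)."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1), dtype=int)
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

flat = np.full((5, 5), 10)         # constant region: nothing to detect
step = np.zeros((5, 6), dtype=int)
step[:, 3:] = 10                   # vertical edge between columns 2 and 3

flat_sharpened = filter_image(flat, KERNELS["sharpen"])
flat_edges = filter_image(flat, KERNELS["edge_detect"])
step_edges = filter_image(step, KERNELS["strong_edge_detect"])
print(step_edges)
```

On the constant region the sharpen kernel leaves values unchanged and the edge kernel returns zeros; on the step image the edge kernel responds only near the boundary.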

Convolutional Neural Networks (CNNs)

CNNs for Classification

1. Convolution: Apply filters with learned weights to generate feature maps.
2. Non-linearity: Often ReLU.
3. Pooling: Downsampling operation on each feature map.

Train model with image data. Learn weights of filters in convolutional layers.

Convolutional Layers: Local Connectivity

For a neuron in hidden layer:
- Take inputs from patch
- Compute weighted sum
- Apply bias

4x4 filter: matrix of weights $w_{ij}$

$$\sum_{i=1}^{4} \sum_{j=1}^{4} w_{ij}\, x_{i+p,\, j+q} + b$$

for neuron (p,q) in hidden layer

1) applying a window of weights
2) computing linear combinations
3) activating with non-linear function
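The three steps for a single hidden neuron (p,q) can be sketched as below. The code uses 0-based indices, the function name `hidden_neuron` is hypothetical, and ReLU is assumed as the non-linear function.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def hidden_neuron(x, w, b, p, q):
    """Output of neuron (p, q): activation(sum_ij w[i, j] * x[i + p, j + q] + b)."""
    patch = x[p:p + w.shape[0], q:q + w.shape[1]]  # 1) window of weights over a patch
    z = np.sum(w * patch) + b                      # 2) weighted sum plus bias
    return relu(z)                                 # 3) non-linear activation

x = np.ones((8, 8))        # toy input
w = np.full((4, 4), 0.5)   # 4x4 filter: 16 weights
active = hidden_neuron(x, w, b=-1.0, p=2, q=3)
silent = hidden_neuron(x, w, b=-10.0, p=2, q=3)
print(active, silent)
```

Every neuron in the same feature map shares `w` and `b`; only the offsets `p` and `q` change.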

CNNs: Spatial Arrangement of Output Volume

[Figure: output volume with axes height, width, depth]

Layer Dimensions: h x w x d
where h and w are spatial dimensions and d (depth) = number of filters

Stride: Filter step size

Receptive Field: Locations in input image that a node is path connected to
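A sketch of how the h x w x d output volume arises when several filters are applied with a given stride. It assumes a single-channel input and no padding; the sizes (32x32 input, eight 4x4 filters, stride 2) are made up.

```python
import numpy as np

def conv_layer(image, filters, stride=1):
    """Apply each 2D filter to a 2D image and stack the maps along depth."""
    d, fh, fw = filters.shape
    h = (image.shape[0] - fh) // stride + 1
    w = (image.shape[1] - fw) // stride + 1
    volume = np.zeros((h, w, d))
    for k in range(d):
        for r in range(h):
            for c in range(w):
                rs, cs = r * stride, c * stride  # top-left corner of the patch
                volume[r, c, k] = np.sum(image[rs:rs + fh, cs:cs + fw] * filters[k])
    return volume

rng = np.random.default_rng(0)
image = rng.random((32, 32))     # hypothetical single-channel input
filters = rng.random((8, 4, 4))  # d = 8 filters of size 4x4
volume = conv_layer(image, filters, stride=2)
print(volume.shape)              # (h, w, d)
```

A larger stride shrinks h and w, while d always equals the number of filters.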

[3]

Introducing Non-Linearity

Rectified Linear Unit (ReLU): $g(z) = \max(0, z)$

- Apply after every convolution operation (i.e., after convolutional layers)
- ReLU: pixel-by-pixel operation that replaces all negative values by zero. Non-linear operation!
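ReLU on a feature map is a one-liner in NumPy; the values below are made up for illustration.

```python
import numpy as np

feature_map = np.array([[-3, 2],
                        [ 0, -1],
                        [ 5, -7]])

rectified = np.maximum(0, feature_map)  # negatives become 0, the rest pass through
print(rectified)
```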

[5]

Pooling

How else can we downsample and preserve spatial invariance?

1) Reduced dimensionality
2) Spatial invariance
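One common pooling choice is max pooling with a 2x2 window and stride 2. The slide text does not name the variant, so treat this as an assumed example; the input values are made up.

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    pooled = np.zeros((out_h, out_w), dtype=feature_map.dtype)
    for r in range(out_h):
        for c in range(out_w):
            rs, cs = r * stride, c * stride
            pooled[r, c] = feature_map[rs:rs + size, cs:cs + size].max()
    return pooled

feature_map = np.array([[1, 1, 2, 4],
                        [5, 6, 7, 8],
                        [3, 2, 1, 0],
                        [1, 2, 3, 4]])
pooled = max_pool(feature_map)
print(pooled)
```

The 4x4 map shrinks to 2x2, and shifting a strong activation by one pixel inside a window leaves the pooled value unchanged.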

[3]

Representation Learning in Deep CNNs

Conv Layer 1: Low level features (Edges, dark spots)

Conv Layer 2: Mid level features (Eyes, ears, nose)

Conv Layer 3: High level features (Facial structure)

[3]

CNNs for Classification: Feature Learning

1. Learn features in input image through convolution
2. Introduce non-linearity through activation function (real-world data is non-linear!)
3. Reduce dimensionality and preserve spatial invariance with pooling

CNNs for Classification: Class Probabilities

- CONV and POOL layers output high-level features of input
- Fully connected layer uses these features for classifying input image
- Express output as probability of image belonging to a particular class

$$\text{softmax}(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}}$$
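A minimal softmax sketch. Subtracting the maximum score before exponentiating is a standard numerical-stability trick that does not change the result; the score values are made up.

```python
import numpy as np

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # stability: avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])     # hypothetical fully connected outputs
probs = softmax(scores)
print(probs, probs.sum())
```

The largest score gets the largest probability, and every class keeps a non-zero share.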

CNNs: Training with Backpropagation

Learn weights for convolutional filters and fully connected layers

Backpropagation: cross-entropy loss

$$J(\theta) = -\sum_i y^{(i)} \log \hat{y}^{(i)}$$
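A sketch of the cross-entropy loss for one image, written in its usual form with a leading minus sign so that better predictions give a smaller loss. The predicted probabilities reuse the 0.8, 0.1, 0.05, 0.05 values from the classification example earlier; treating the first class as the true label is an assumption.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Negative sum of true labels times log predicted probabilities."""
    return -np.sum(y_true * np.log(y_pred + eps))  # eps guards against log(0)

y_true = np.array([1.0, 0.0, 0.0, 0.0])            # one-hot label (assumed)
y_pred = np.array([0.8, 0.1, 0.05, 0.05])          # predicted class probabilities
loss = cross_entropy(y_true, y_pred)

worse = cross_entropy(y_true, np.array([0.1, 0.8, 0.05, 0.05]))
print(loss, worse)
```

Only the probability assigned to the correct class contributes, so the loss here is about `-log(0.8)`; backpropagation adjusts filter and fully connected weights to reduce it.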

CNNs for Classification: ImageNet

ImageNet Dataset

Dataset of over 14 million images across 21,841 categories

1409 pictures of bananas.

“Elongated crescent-shaped yellow fruit with soft sweet flesh”

[6,7]

ImageNet Challenge

Classification task: produce a list of object categories present in image. 1000 categories.

“Top 5 error”: rate at which the model does not output correct label in top 5 predictions

Other tasks include: single-object localization, object detection from video/image, scene classification, scene parsing
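The top 5 error metric can be sketched as follows. The function name `top_k_error` and the toy scores (3 images, 10 classes) are hypothetical.

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is not among the k highest scores."""
    top_k = np.argsort(scores, axis=1)[:, -k:]         # indices of the k best classes
    hits = (top_k == labels[:, None]).any(axis=1)      # is the true label among them?
    return 1.0 - hits.mean()

# Toy scores: every row ranks classes 0..9 in increasing order,
# so the top 5 predictions are always classes 5 to 9.
scores = np.tile(np.arange(10.0), (3, 1))
labels = np.array([9, 5, 0])                           # the third label is missed
error = top_k_error(scores, labels, k=5)
print(error)
```

Two of the three true labels fall inside the top 5, so one third of the predictions count as errors.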

[6,7]

ImageNet Challenge: Classification Task

[Chart: classification error % by year]

2010: 28.2
2011: 25.8
2012: 16.4
2013: 11.7
2014: 6.7
2015: 3.57
Human: 5.1

2012: AlexNet. First CNN to win.
- 8 layers, 61 million parameters

2013: ZFNet
- 8 layers, more filters

2014: VGG
- 19 layers

2014: GoogLeNet
- “Inception” modules
- 22 layers, 5 million parameters

2015: ResNet
- 152 layers

[6,7]

[Charts: classification error % by year, now with a second 2014 entry (2010: 28.2, 2011: 25.8, 2012: 16.4, 2013: 11.7, 2014: 6.7 and 7.3, 2015: 3.57, Human: 5.1), and number of layers by year (2010 to 2015, axis 0 to 150)]

[6,7]

An Architecture for Many Applications

Object detection with R-CNNs

Segmentation with fully convolutional networks

Image captioning with RNNs

Beyond Classification

Object Detection: CAT, DOG, DUCK

Semantic Segmentation: CAT

Image Captioning: The cat is in the grass.

Semantic Segmentation: FCNs

FCN: Fully Convolutional Network. Network designed with all convolutional layers, with downsampling and upsampling operations

[3,8,9]

Driving Scene Segmentation

[10]

[11, 12]

Object Detection with R-CNNs

R-CNN: Find regions that we think have objects. Use CNN to classify.

[13]

Image Captioning using RNNs

[14,15]

Deep Learning for Computer Vision: Impact and Summary

Data, Data, Data

MNIST: handwritten digits

ImageNet: 22K categories. 14M images.

places: natural scenes

CIFAR-10: Airplane, Automobile, Bird, Cat, Deer, Dog, Frog, Horse, Ship, Truck

Deep Learning for Computer Vision: Impact

Impact: Face Detection

6.S191 Lab!

Impact: Self-Driving Cars

[16]

Impact: Healthcare

[17]

Identifying facial phenotypes of genetic disorders using deep learning. Gurovich et al., Nature Med. 2019

Deep Learning for Computer Vision: Summary

Foundations
• Why computer vision?
• Representing images
• Convolutions for feature extraction

CNNs
• CNN architecture
• Application to classification
• ImageNet

Applications
• Segmentation, object detection, image captioning
• Visualization

References: goo.gl/hbLkF6
