Convolutional Neural Networks (CNN) Prof. Seungchul Lee Industrial AI Lab.


Page 1:

Convolutional Neural Networks

(CNN)

Prof. Seungchul Lee

Industrial AI Lab.

Page 2:

Machine Learning vs. Deep Learning

• Machine learning: Input → Hand-engineered features → Classifier → Output

• Deep learning: Input → Deep Learning → Output (end-to-end learning)

Page 3:

Convolution

Page 4:

1D Convolution

• (actually cross-correlation)

Source: Dr. Francois Fleuret at EPFL

Input: 1 3 2 3 0 -1 1 2 2 1

Page 5:

1D Convolution

• (actually cross-correlation)

Source: Dr. Francois Fleuret at EPFL

[Figure: input (width W): 1 3 2 3 0 -1 1 2 2 1; kernel (width w): 1 3 0 -1; output length L = W − w + 1]

Page 6:

1D Convolution

• (actually cross-correlation)

Source: Dr. Francois Fleuret at EPFL

[Figure: input 1 3 2 3 0 -1 1 2 2 1, kernel 1 3 0 -1; first output value: 7; output length L = W − w + 1]

Page 7:

1D Convolution

• (actually cross-correlation)

Source: Dr. Francois Fleuret at EPFL

[Figure: input 1 3 2 3 0 -1 1 2 2 1, kernel 1 3 0 -1; first two output values: 7 9; output length L = W − w + 1]
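The sliding computation shown on these slides can be reproduced in a few lines of Python. This is a minimal sketch using the slides' input and kernel; `conv1d` is a name chosen here for illustration:

```python
# 1D "convolution" as used in CNNs (actually cross-correlation):
# slide the kernel over the input and sum the element-wise products.
def conv1d(x, w):
    L = len(x) - len(w) + 1          # output length: L = W - w + 1
    return [sum(x[i + j] * w[j] for j in range(len(w))) for i in range(L)]

x = [1, 3, 2, 3, 0, -1, 1, 2, 2, 1]  # input, width W = 10
w = [1, 3, 0, -1]                    # kernel, width w = 4
y = conv1d(x, w)
print(y)         # [7, 9, 12, 2, -5, 0, 6] -- starts with 7, 9 as on the slides
print(len(y))    # 7 = 10 - 4 + 1
```

The first window gives 1·1 + 3·3 + 2·0 + 3·(−1) = 7, matching the slide.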

Page 8:

2D Convolution

Page 9:

Convolution on Image (= Convolution in 2D)

• Filter (or Kernel)

– Discrete convolution can be viewed as an element-wise multiplication of the kernel with each image patch, summed into one output value

– Modify or enhance an image by filtering

– Filter images to emphasize certain features or remove other features

– Filtering includes smoothing, sharpening and edge enhancement

[Figure: Image * Kernel → Output]
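Filtering can be sketched in plain Python. The image and the vertical-edge kernel below are hypothetical examples, not taken from the slides:

```python
# 2D cross-correlation (valid mode): each output pixel is the element-wise
# product of the kernel with one image patch, summed to a single value.
def conv2d(img, k):
    H, W = len(img), len(img[0])
    h, w = len(k), len(k[0])
    return [[sum(img[i + a][j + b] * k[a][b]
                 for a in range(h) for b in range(w))
             for j in range(W - w + 1)]
            for i in range(H - h + 1)]

img = [[0, 0, 1, 1],      # a 4x4 image with a vertical 0 -> 1 edge
       [0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
edge = [[-1, 0, 1],       # a simple vertical-edge kernel
        [-1, 0, 1],
        [-1, 0, 1]]
out = conv2d(img, edge)
print(out)                # [[3, 3], [3, 3]] -- strong response along the edge
```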

Page 10:

Convolution Mask + Neural Network

Page 11:

Locality

• Locality: objects tend to have a local spatial support

– fully-connected layer → locally-connected layer

Page 12:

Locality

• Locality: objects tend to have a local spatial support

– fully-connected layer → locally-connected layer

We are not designing the kernel but learning it from data → learning the feature extractor from data

[Figure: 3 × 3 kernel with learnable weights 𝜔1, 𝜔2, …, 𝜔9]

Page 13:

Nonlinear Activation Function

Page 14:

Pooling

Page 15:

Pooling

• Compute a maximum value in a sliding window (max pooling)

• Reduce spatial resolution for faster computation

• Achieve invariance to local translation

• Max pooling introduces invariances

– Pooling size: 2×2

– No parameters: max or average of 2×2 units

Page 16:

Pooling

• Such an operation aims at grouping several activations into a single “more meaningful” one.

• Average pooling computes average values per block instead of max values.

Source: Dr. Francois Fleuret at EPFL

[Figure: input 1 3 2 3 0 -1 1 2 2 1; max pooling with window w = 2, stride 2 → output 3 3 0 2 2]
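The pooled output on this slide can be checked with a short sketch (`max_pool1d` is a name chosen here for illustration):

```python
# 1D max pooling: take the maximum in each window; no learnable parameters.
def max_pool1d(x, size=2, stride=2):
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, stride)]

x = [1, 3, 2, 3, 0, -1, 1, 2, 2, 1]
print(max_pool1d(x))   # [3, 3, 0, 2, 2], as on the slide
```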

Page 17:

Pooling: Invariance

• Pooling provides invariance to any permutation inside one of the cells

• More practically, it provides a pseudo-invariance to deformations that result in local translations

Source: Dr. Francois Fleuret at EPFL

Page 18:

Pooling: Invariance

• Pooling provides invariance to any permutation inside one of the cells

• More practically, it provides a pseudo-invariance to deformations that result in local translations

Source: Dr. Francois Fleuret at EPFL

Page 19:

CNNs for Classification: Feature Learning

• Learn features in input image through convolution

• Introduce non-linearity through activation function (real-world data is non-linear!)

• Reduce dimensionality and preserve spatial invariance with pooling

Source: 6.S191 Intro. to Deep Learning at MIT

Page 20:

CNNs for Classification: Class Probabilities

• CONV and POOL layers output high-level features of input

• Fully connected layer uses these features for classifying input image

• Express output as probability of image belonging to a particular class

Source: 6.S191 Intro. to Deep Learning at MIT

Page 21:

Real-time lecture materials

Page 22:

Images

Page 23:

Images Are Numbers

Source: 6.S191 Intro. to Deep Learning at MIT

Page 24:

Images Are Numbers

Source: 6.S191 Intro. to Deep Learning at MIT

Page 25:

Images Are Numbers

Source: 6.S191 Intro. to Deep Learning at MIT

Page 26:

Images

[Figure: original color image split into its R, G, B channels, and a grayscale version]

Page 27:

Multiple Filters (or Kernels)

Page 28:

Channels

• Colored image = tensor of shape (height, width, channels)

• Convolutions are usually computed for each channel and summed over channels

• Kernel size aka receptive field (usually 1, 3, 5, 7, 11)

Source: Dr. Francois Fleuret at EPFL

Page 29:

Multi-channel 2D Convolution

Source: Dr. Francois Fleuret at EPFL

[Figure: input of size W × H × C]

Page 30:

Multi-channel 2D Convolution

Source: Dr. Francois Fleuret at EPFL

[Figure: input of size W × H × C and kernel of size w × h × c]

Page 31:

Multi-channel 2D Convolution

[Figure: the kernel slides over the input to fill the output map]

Page 32:

Multi-channel 2D Convolution

Source: Dr. Francois Fleuret at EPFL

[Figure: animation step: the kernel slides over the input, producing one output value per location]

Page 33:

Multi-channel 2D Convolution

Source: Dr. Francois Fleuret at EPFL

[Figure: animation step: the kernel slides over the input, producing one output value per location]

Page 34:

Multi-channel 2D Convolution

Source: Dr. Francois Fleuret at EPFL

[Figure: animation step: the kernel slides over the input, producing one output value per location]

Page 35:

Multi-channel 2D Convolution

Source: Dr. Francois Fleuret at EPFL

[Figure: animation step: the kernel slides over the input, producing one output value per location]

Page 36:

Multi-channel 2D Convolution

Source: Dr. Francois Fleuret at EPFL

[Figure: animation step: the kernel slides over the input, producing one output value per location]

Page 37:

Multi-channel 2D Convolution

Source: Dr. Francois Fleuret at EPFL

[Figure: animation step: the kernel slides over the input, producing one output value per location]

Page 38:

Multi-channel 2D Convolution

Source: Dr. Francois Fleuret at EPFL

[Figure: animation step: the kernel slides over the input, producing one output value per location]

Page 39:

Multi-channel 2D Convolution

Source: Dr. Francois Fleuret at EPFL

[Figure: animation step: the kernel slides over the input, producing one output value per location]

Page 40:

Multi-channel 2D Convolution

Source: Dr. Francois Fleuret at EPFL

[Figure: one kernel of size w × h × c applied to a W × H × C input gives an output of size (W − w + 1) × (H − h + 1) × 1]

Page 41:

Multi-channel and Multi-kernel 2D Convolution

Source: Dr. Francois Fleuret at EPFL

[Figure: D kernels of size w × h × c applied to a W × H × C input give an output of size (W − w + 1) × (H − h + 1) × D]

Page 42:

Dealing with Shapes

• Activation (feature map) shapes

– Input (𝑊𝑖 , 𝐻𝑖 , 𝐶)

– Output (𝑊𝑜, 𝐻𝑜, 𝐷)

• Kernel (or filter) shape (𝑤, ℎ, 𝐶, 𝐷)

– 𝑤 × ℎ kernel size

– 𝐶 input channels

– 𝐷 output channels

• Number of parameters: (𝑤 × ℎ × 𝐶 + 1) × 𝐷

– the +1 is one bias per output channel

Source: Dr. Francois Fleuret at EPFL
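The parameter count can be checked with a quick sketch; the layer sizes in the example are hypothetical:

```python
# Parameters of a convolution layer: (w * h * C + 1) * D,
# i.e. one w x h x C kernel plus one bias for each of the D output channels.
def conv_params(w, h, C, D):
    return (w * h * C + 1) * D

# e.g. 3x3 kernels over C = 3 input channels producing D = 16 output channels:
print(conv_params(3, 3, 3, 16))   # (3*3*3 + 1) * 16 = 448
```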

Page 43:

Multi-channel 2D Convolution

• The kernel is not swiped across channels, just across rows and columns.

• Note that a convolution preserves the signal support structure.

– A 1D signal is converted into a 1D signal, a 2D signal into a 2D signal, and neighboring parts of the input signal influence neighboring parts of the output signal.

• We usually refer to one of the channels generated by a convolution layer as an activation map.

• The sub-area of an input map that influences a component of the output is called the receptive field of the latter.
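A naive sketch of this operation in plain Python (an all-ones example chosen so the result is easy to verify by hand):

```python
# Multi-channel 2D convolution with a single kernel: at each spatial location
# the kernel spans all C channels, and all products are summed to ONE value.
def conv2d_mc(x, k):
    H, W, C = len(x), len(x[0]), len(x[0][0])   # input  (H, W, C)
    h, w, c = len(k), len(k[0]), len(k[0][0])   # kernel (h, w, c), c == C
    return [[sum(x[i + a][j + b][ch] * k[a][b][ch]
                 for a in range(h) for b in range(w) for ch in range(c))
             for j in range(W - w + 1)]
            for i in range(H - h + 1)]

x = [[[1] * 3 for _ in range(5)] for _ in range(5)]   # 5 x 5 x 3 input of ones
k = [[[1] * 3 for _ in range(3)] for _ in range(3)]   # 3 x 3 x 3 kernel of ones
y = conv2d_mc(x, k)
print(len(y), len(y[0]))   # 3 3 -> output is (H - h + 1) x (W - w + 1)
print(y[0][0])             # 27 -> 3 * 3 * 3 ones summed
```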

Page 44:

Padding and Stride

Page 45:

Strides

• Strides: increment step size for the convolution operator

• Reduces the size of the output map


Example with kernel size 3×3 and a stride of 2 (image in blue)

Source: https://github.com/vdumoulin/conv_arithmetic

Page 46:

Padding

• Padding: artificially fill borders of image

• Useful to keep spatial dimension constant across filters

• Useful with strides and large receptive fields

• Usually fill with 0s

Source: https://github.com/vdumoulin/conv_arithmetic

Page 47:

Padding and Stride

• Here with 5 × 5 × 𝐶 as input, a padding of (1,1), a stride of (2,2), and a kernel of size 3 × 3 × 𝐶

Source: Dr. Francois Fleuret at EPFL

[Figure: input with a one-pixel zero-padding border]

Page 48:

Padding and Stride

• Here with 5 × 5 × 𝐶 as input, a padding of (1,1), a stride of (2,2), and a kernel of size 3 × 3 × 𝐶
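The 3 × 3 output produced in this animation follows the standard output-size formula (not stated explicitly on the slides): out = ⌊(in + 2·padding − kernel) / stride⌋ + 1. A quick check:

```python
# Output spatial size of a convolution with padding p and stride s.
def conv_out_size(n, k, p, s):
    return (n + 2 * p - k) // s + 1

print(conv_out_size(5, 3, 1, 2))   # 3 -> a 3 x 3 output, as in the animation
print(conv_out_size(5, 3, 0, 1))   # 3 -> reduces to W - w + 1 when p=0, s=1
```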

Source: Dr. Francois Fleuret at EPFL

[Figure: padding and stride animation step]

Page 49:

Padding and Stride

• Here with 5 × 5 × 𝐶 as input, a padding of (1,1), a stride of (2,2), and a kernel of size 3 × 3 × 𝐶

Source: Dr. Francois Fleuret at EPFL

[Figure: padding and stride animation step]

Page 50:

Padding and Stride

• Here with 5 × 5 × 𝐶 as input, a padding of (1,1), a stride of (2,2), and a kernel of size 3 × 3 × 𝐶

Source: Dr. Francois Fleuret at EPFL

[Figure: padding and stride animation step]

Page 51:

Padding and Stride

• Here with 5 × 5 × 𝐶 as input, a padding of (1,1), a stride of (2,2), and a kernel of size 3 × 3 × 𝐶

Source: Dr. Francois Fleuret at EPFL

[Figure: padding and stride animation step]

Page 52:

Padding and Stride

• Here with 5 × 5 × 𝐶 as input, a padding of (1,1), a stride of (2,2), and a kernel of size 3 × 3 × 𝐶

Source: Dr. Francois Fleuret at EPFL

[Figure: padding and stride animation step]

Page 53:

Padding and Stride

• Here with 5 × 5 × 𝐶 as input, a padding of (1,1), a stride of (2,2), and a kernel of size 3 × 3 × 𝐶

Source: Dr. Francois Fleuret at EPFL

[Figure: padding and stride animation step]

Page 54:

Padding and Stride

• Here with 5 × 5 × 𝐶 as input, a padding of (1,1), a stride of (2,2), and a kernel of size 3 × 3 × 𝐶

Source: Dr. Francois Fleuret at EPFL

[Figure: padding and stride animation step]

Page 55:

Padding and Stride

• Here with 5 × 5 × 𝐶 as input, a padding of (1,1), a stride of (2,2), and a kernel of size 3 × 3 × 𝐶

Source: Dr. Francois Fleuret at EPFL

[Figure: padding and stride animation step]

Page 56:

Padding and Stride

• Here with 5 × 5 × 𝐶 as input, a padding of (1,1), a stride of (2,2), and a kernel of size 3 × 3 × 𝐶

Source: Dr. Francois Fleuret at EPFL

[Figure: padding and stride animation step]

Page 57:

Nonlinear Activation Function

Page 58:

CNN in TensorFlow

Page 59:

Lab: CNN with TensorFlow

• MNIST example

• To classify handwritten digits

Page 60:

CNN Structure
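As a sketch of the shape bookkeeping in such a structure (the layer sizes below are hypothetical, not necessarily those of the lab code): a valid convolution shrinks each side by the kernel size minus one, and 2×2 pooling halves it.

```python
# Trace activation shapes and parameter counts through a hypothetical
# MNIST-style CNN, using the formulas from the earlier slides.
def conv(shape, k, D):                 # valid conv with a k x k kernel
    H, W, C = shape
    params = (k * k * C + 1) * D       # (w*h*C + 1) * D, +1 bias per channel
    return (H - k + 1, W - k + 1, D), params

def pool(shape, s=2):                  # s x s max pooling, no parameters
    H, W, C = shape
    return (H // s, W // s, C), 0

shape = (28, 28, 1)                    # an MNIST digit
shape, p1 = conv(shape, 3, 32)         # -> (26, 26, 32)
shape, _  = pool(shape)                # -> (13, 13, 32)
shape, p2 = conv(shape, 3, 64)         # -> (11, 11, 64)
shape, _  = pool(shape)                # -> (5, 5, 64)
flat = shape[0] * shape[1] * shape[2]  # 1600 features into the dense layer
p3 = (flat + 1) * 10                   # dense layer to 10 digit classes
print(shape, flat)                     # (5, 5, 64) 1600
print(p1, p2, p3)                      # 320 18496 16010
```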

Page 61:

Loss and Optimizer

• Loss

– Classification: cross entropy

– With a softmax output, this is the multi-class generalization of logistic regression

• Optimizer

– GradientDescentOptimizer

– AdamOptimizer: the most popular optimizer
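The classification loss can be sketched in plain Python: a softmax turns the network's outputs into class probabilities, and cross entropy penalizes a low probability on the true class. A minimal illustration, not the TensorFlow implementation:

```python
import math

def softmax(z):
    m = max(z)                          # subtract the max for stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(logits, label):
    # negative log-probability assigned to the true class
    return -math.log(softmax(logits)[label])

low = cross_entropy([5.0, 0.0, 0.0], 0)   # confident and correct: small loss
high = cross_entropy([5.0, 0.0, 0.0], 1)  # confident and wrong: large loss
print(low < high)                          # True
```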

Page 62:

Test or Evaluation
