Deep Learning Basics Lecture 6: Convolutional NN Princeton University COS 495 Instructor: Yingyu Liang

Deep Learning Basics Lecture 6: Convolutional NN · 2018. 8. 1. · Lecture 6: Convolutional NN Princeton University COS 495 Instructor ... Input. Convolutional layers Figure from


Deep Learning Basics Lecture 6: Convolutional NN

Princeton University COS 495

Instructor: Yingyu Liang


Review: convolutional layers


Convolution: two dimensional case

Input:

a b c d
e f g h
i j k l

Kernel/filter:

w x
y z

Feature map (first two entries):

aw + bx + ey + fz
bw + cx + fy + gz
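The sliding-window computation on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, assuming stride 1, no padding, and no kernel flip (which matches the slide's entries such as aw + bx + ey + fz); the function name `conv2d_valid` and the numeric stand-ins for a..l and w..z are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the kh x kw patch whose top-left corner is (r, c)
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical numeric stand-ins for the symbolic input a..l and kernel w..z
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
kernel = [[1, 2],
          [3, 4]]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[44, 54, 64], [84, 94, 104]]
```

A 3x4 input with a 2x2 kernel gives a 2x3 feature map; the first entry, 1·1 + 2·2 + 5·3 + 6·4 = 44, plays the role of aw + bx + ey + fz.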


Convolutional layers

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

the same weight shared for all output nodes

𝑚 output nodes

𝑛 input nodes

𝑘 kernel size
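To make the weight sharing concrete, the sketch below writes a 1D convolution as an m x n matrix in which every row reuses the same k kernel values. It assumes stride 1 and no padding, so m = n - k + 1 (the figure may use a different padding convention); the helper name `conv1d_as_matrix` and the kernel values are hypothetical.

```python
def conv1d_as_matrix(kernel, n):
    """Build the m x n matrix equivalent to a 1D convolution (stride 1, no padding)."""
    k = len(kernel)
    m = n - k + 1  # number of output nodes under these assumptions
    matrix = [[0.0] * n for _ in range(m)]
    for row in range(m):
        for j in range(k):
            # Every row holds the same k weights, shifted one position to the right
            matrix[row][row + j] = kernel[j]
    return matrix


n = 6                      # input nodes
kernel = [0.5, -1.0, 2.0]  # k = 3 shared weights (arbitrary values)
W = conv1d_as_matrix(kernel, n)
m = len(W)                 # output nodes

dense_weights = n * m          # a fully connected layer would need n * m weights
shared_weights = len(kernel)   # the convolutional layer needs only k
print(m, dense_weights, shared_weights)  # 4 24 3
```

All other entries of the matrix are fixed at zero, which is the sparse-connectivity half of the picture: each output node sees only k of the n inputs.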


Terminology

Figure from Deep Learning, by Goodfellow, Bengio, and Courville


Case study: LeNet-5


LeNet-5

• Proposed in “Gradient-based learning applied to document recognition” , by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the IEEE, 1998


• Apply convolution on 2D images (MNIST) and use backpropagation


• Structure: 2 convolutional layers (with pooling) + 3 fully connected layers

• Input size: 32x32x1

• Convolution kernel size: 5x5

• Pooling: 2x2
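The spatial sizes implied by these numbers can be traced with the usual output-size formula, (n - k) / stride + 1. The sketch below assumes no padding, stride 1 for the convolutions, and stride 2 for the pooling (the strides are the ones annotated on the LeNet-5 figure slides); the helper `out_size` is a hypothetical name.

```python
def out_size(n, k, stride):
    """Output width/height for an n x n input and a k x k window, no padding."""
    return (n - k) // stride + 1


sizes = [32]  # input is 32x32
for name, k, stride in [("conv1", 5, 1), ("pool1", 2, 2),
                        ("conv2", 5, 1), ("pool2", 2, 2)]:
    sizes.append(out_size(sizes[-1], k, stride))
    print(f"{name}: {sizes[-1]}x{sizes[-1]}")

# sizes is now [32, 28, 14, 10, 5]
```

The final 5x5 maps are what the first fully connected layer consumes.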

LeNet-5

Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

Layer annotations on the figure, in order from input to output:

• First convolutional layer: filter 5x5, stride 1x1, #filters: 6

• First pooling layer: 2x2, stride 2

• Second convolutional layer: filter 5x5x6, stride 1x1, #filters: 16

• Second pooling layer: 2x2, stride 2

• First fully connected layer: weight matrix 400x120

• Second fully connected layer: weight matrix 120x84

• Third fully connected layer: weight matrix 84x10
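As a sanity check on the layer annotations above, the following sketch counts the parameters they imply. It assumes one bias per filter and per fully connected output unit, full connectivity between the 6 input maps and the 16 second-layer filters (as the 5x5x6 filter size suggests), and parameter-free pooling; none of these three assumptions is stated on the slides.

```python
# After the second pooling layer: 5x5 spatial map with 16 channels
flattened = 5 * 5 * 16

# (name, number of weights, number of biases)
# conv weights = kernel_h * kernel_w * in_channels * filters; biases are an assumption
layers = [
    ("conv1", 5 * 5 * 1 * 6, 6),
    ("conv2", 5 * 5 * 6 * 16, 16),
    ("fc1", flattened * 120, 120),
    ("fc2", 120 * 84, 84),
    ("fc3", 84 * 10, 10),
]

for name, weights, biases in layers:
    print(f"{name}: {weights + biases} parameters")

total = sum(weights + biases for _, weights, biases in layers)
print("flattened input to fc1:", flattened)  # 400, matching the 400x120 weight matrix
print("total:", total)                       # 61706 under the assumptions above
```

Most of the parameters sit in the first fully connected layer; the two convolutional layers together account for fewer than 3,000 because their weights are shared across positions.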


Software platforms for CNN

Updated in April 2016; check more recent ones online


Platform: Marvin (marvin.is)



LeNet in Marvin: convolutional layer


LeNet in Marvin: pooling layer


LeNet in Marvin: fully connected layer


Platform: Caffe (caffe.berkeleyvision.org)


LeNet in Caffe


Platform: Tensorflow (tensorflow.org)



Others

• Theano – CPU/GPU symbolic expression compiler in Python (from the MILA lab at the University of Montreal)

• Torch – provides a Matlab-like environment for state-of-the-art machine learning algorithms in Lua

• Lasagne – a lightweight library to build and train neural networks in Theano

• See: http://deeplearning.net/software_links/


Optimization: momentum


Basic algorithms

• Minimize the (regularized) empirical loss

$$L_R(\theta) = \frac{1}{n} \sum_{t=1}^{n} l(\theta, x_t, y_t) + R(\theta)$$

where the hypothesis is parametrized by $\theta$

• Gradient descent

$$\theta_{t+1} = \theta_t - \eta_t \nabla L_R(\theta_t)$$
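A minimal sketch of the gradient descent update on a toy problem may help here. It assumes a one-parameter model θ·x, squared loss for l, the regularizer R(θ) = λθ², and a constant learning rate in place of η_t; the data, `lam`, and `eta` values are all made up for illustration.

```python
def loss_grad(theta, data, lam):
    """Gradient of (1/n) * sum (theta*x - y)^2 + lam * theta^2 with respect to theta."""
    n = len(data)
    data_term = sum(2 * (theta * x - y) * x for x, y in data) / n
    reg_term = 2 * lam * theta
    return data_term + reg_term


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy points on the line y = 2x
theta, eta, lam = 0.0, 0.05, 0.1             # hypothetical settings, constant step size

for step in range(200):
    # theta_{t+1} = theta_t - eta * grad L_R(theta_t)
    theta = theta - eta * loss_grad(theta, data, lam)

# Closed-form minimizer of this toy objective, for comparison
expected = (28 / 3) / (14 / 3 + lam)
print(theta, expected)
```

The regularizer pulls the solution slightly below the unregularized value of 2, which the closed-form comparison makes visible.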


Mini-batch stochastic gradient descent

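• Instead of one data point, work with a small batch of $b$ points

$(x_{tb+1}, y_{tb+1}), \ldots, (x_{tb+b}, y_{tb+b})$

• Update rule

$$\theta_{t+1} = \theta_t - \eta_t \nabla \left[ \frac{1}{b} \sum_{1 \le i \le b} l(\theta_t, x_{tb+i}, y_{tb+i}) + R(\theta_t) \right]$$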
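The mini-batch update can be sketched as a loop that slices the data into consecutive batches of b points. The example assumes the same toy setup as a one-parameter least-squares fit (squared loss, R(θ) = λθ², constant learning rate); the dataset, the batch size of 8, and the number of passes are hypothetical choices.

```python
import random


def batch_grad(theta, batch, lam):
    """Gradient of (1/b) * sum (theta*x - y)^2 + lam * theta^2 over one mini-batch."""
    b = len(batch)
    data_term = sum(2 * (theta * x - y) * x for x, y in batch) / b
    return data_term + 2 * lam * theta


random.seed(0)
data = [(x / 10, 2.0 * x / 10) for x in range(1, 41)]  # 40 noiseless points on y = 2x
random.shuffle(data)

theta, eta, lam, b = 0.0, 0.05, 0.0, 8  # hypothetical settings
for epoch in range(50):                 # several passes over the data
    for t in range(len(data) // b):
        # Points tb+1, ..., tb+b in the 1-indexed notation of the update rule
        batch = data[t * b:(t + 1) * b]
        theta = theta - eta * batch_grad(theta, batch, lam)

print(theta)  # close to 2.0 for this noiseless toy problem
```

Each step touches only b points, so its cost does not grow with the dataset size; the price is a noisier gradient estimate than the full-batch version.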


Momentum

• Drawback of SGD: progress can be slow when the gradient is small

• Observation: when the gradient is consistent across consecutive steps, we can take larger steps

• Metaphor: a marble rolling down a gentle slope
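The observation about consistent gradients can be made quantitative with a short worked derivation. As a simplifying assumption, suppose the gradient is a constant $g$ at every step, the learning rate is a constant $\eta$, the momentum variable starts at $v_0 = 0$, and the decay rate satisfies $0 \le \alpha < 1$; the momentum update $v_t = \alpha v_{t-1} - \eta g$ then unrolls as a geometric series.

```latex
\begin{align*}
v_t &= \alpha v_{t-1} - \eta g, \qquad v_0 = 0 \\
v_t &= -\eta g \sum_{j=0}^{t-1} \alpha^{j}
     = -\eta g \, \frac{1 - \alpha^{t}}{1 - \alpha} \\
\lim_{t \to \infty} v_t &= -\frac{\eta g}{1 - \alpha}
\end{align*}
```

Under these assumptions the step size grows until it is $1/(1-\alpha)$ times the plain gradient step $\eta g$; with $\alpha = 0.9$ that is a factor of 10.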


Momentum

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

Contour: loss function

Path: SGD with momentum

Arrow: stochastic gradient


Momentum

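• Work with a small batch of $b$ points

$(x_{tb+1}, y_{tb+1}), \ldots, (x_{tb+b}, y_{tb+b})$

• Keep a momentum variable $v_t$, and set a decay rate $\alpha$

• Update rule

$$v_t = \alpha v_{t-1} - \eta_t \nabla \left[ \frac{1}{b} \sum_{1 \le i \le b} l(\theta_t, x_{tb+i}, y_{tb+i}) + R(\theta_t) \right]$$

$$\theta_{t+1} = \theta_t + v_t$$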


• Practical guide: 𝛼 is set to 0.5 until the initial learning stabilizes and then is increased to 0.9 or higher.
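The momentum update and the practical guide can be tried out on a toy problem. The sketch below is an illustration under stated assumptions, not the lecture's implementation: it uses the exact gradient of a two-parameter quadratic loss (one gentle and one steep direction) instead of a stochastic mini-batch gradient, a constant learning rate, and an arbitrary switch from α = 0.5 to α = 0.9 after 20 steps.

```python
def grad(theta):
    """Gradient of the toy loss 0.5 * (0.01 * theta1^2 + theta2^2)."""
    return [0.01 * theta[0], 1.0 * theta[1]]


def run(alpha_schedule, eta=0.5, steps=300):
    """Apply v_t = alpha * v_{t-1} - eta * grad, theta_{t+1} = theta_t + v_t."""
    theta = [10.0, 1.0]  # hypothetical starting point
    v = [0.0, 0.0]       # momentum variable
    for t in range(steps):
        alpha = alpha_schedule(t)
        g = grad(theta)
        v = [alpha * vi - eta * gi for vi, gi in zip(v, g)]
        theta = [ti + vi for ti, vi in zip(theta, v)]
    return theta


# alpha = 0 recovers plain gradient descent
plain = run(lambda t: 0.0)
# Assumed schedule: 0.5 for the first 20 steps, then 0.9
momentum = run(lambda t: 0.5 if t < 20 else 0.9)

print("plain:   ", plain)
print("momentum:", momentum)
```

After the same number of steps, the plain run is still far from the minimum along the gentle direction (the first coordinate), while the momentum run has moved much closer, which is the behavior the marble metaphor describes.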