Introduction to Deep Learning CMPT 733 Steven Bergner

Introduction to Deep Learning CMPT 733 - sfu-db.github.io · DL4J Cafe And many

Overview
● Renaissance of artificial neural networks
  – Representation learning vs feature engineering
● Background
  – Linear Algebra, Optimization
  – Regularization
● Construction and training of layered learners
● Frameworks for deep learning

Representations matter
● Transform into the right representation
● Classify points simply by threshold on radius axis
● Single neuron with non-linearity can do this

[Goodfellow, Bengio, Courville 2016]
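
The change of representation on this slide can be sketched in a few lines of NumPy. The data set, the sample size, and the threshold value below are assumptions made for illustration; they are not taken from the slide's figure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: class 0 lies inside a disc, class 1 on a surrounding ring.
n = 200
radius = np.concatenate([rng.uniform(0.0, 1.0, n), rng.uniform(2.0, 3.0, n)])
angle = rng.uniform(0.0, 2.0 * np.pi, 2 * n)
labels = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

# Cartesian representation: no single straight line separates the two classes.
x = radius * np.cos(angle)
y = radius * np.sin(angle)

# Polar representation: the radius axis alone separates them.
r = np.hypot(x, y)
threshold = 1.5  # assumed value, chosen to fall in the gap between the classes
predicted = (r > threshold).astype(int)

accuracy = np.mean(predicted == labels)
```

In Cartesian coordinates the same task would need a curved decision boundary; after the transform a single comparison is enough.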

Depth: layered composition

[Goodfellow, Bengio, Courville 2016]
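
As a rough sketch of what layered composition means, a network can be written as a chain of simple functions, each feeding the next. The layer sizes, the random weights, and the choice of ReLU are assumptions for illustration only.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def layer(W, b):
    """Return a function computing one layer: relu(W @ h + b)."""
    return lambda h: relu(W @ h + b)

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 inputs -> 5 hidden -> 3 hidden -> 2 outputs.
sizes = [4, 5, 3, 2]
layers = [layer(rng.normal(size=(m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

def network(x):
    # Depth is composition: f3(f2(f1(x))).
    h = x
    for f in layers:
        h = f(h)
    return h

x = rng.normal(size=4)
out = network(x)
```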

Computational graph

[Goodfellow, Bengio, Courville 2016]
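
A computational graph can be evaluated node by node in the forward direction and differentiated node by node in reverse. The sketch below uses a hypothetical four-node graph (multiply, add, sigmoid, squared error) that is not the one in the slide's figure, and checks the hand-derived gradient against a finite difference.

```python
import math

def forward_backward(w, b, x, t):
    # Forward pass: evaluate each node of the graph in order.
    u = w * x                        # multiply node
    z = u + b                        # add node
    y = 1.0 / (1.0 + math.exp(-z))   # sigmoid node
    loss = (y - t) ** 2              # squared-error node

    # Backward pass: apply the chain rule node by node, in reverse order.
    dloss_dy = 2.0 * (y - t)
    dy_dz = y * (1.0 - y)
    dloss_dz = dloss_dy * dy_dz
    dloss_dw = dloss_dz * x          # because z = w * x + b
    dloss_db = dloss_dz
    return loss, dloss_dw, dloss_db

loss, grad_w, grad_b = forward_backward(w=0.5, b=-0.2, x=2.0, t=1.0)

# Finite-difference check of the gradient with respect to w.
eps = 1e-6
loss_plus, _, _ = forward_backward(0.5 + eps, -0.2, 2.0, 1.0)
loss_minus, _, _ = forward_backward(0.5 - eps, -0.2, 2.0, 1.0)
numeric_grad_w = (loss_plus - loss_minus) / (2 * eps)
```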

Components of learning
● Hand designed program
  – Input → Output
● Increasingly automated
  – Simple features
  – Abstract features
  – Mapping from features

[Goodfellow, Bengio, Courville 2016]

Growing Dataset Size

MNIST dataset

[Goodfellow, Bengio, Courville 2016]

Basics

Linear Algebra and Optimization

Linear Algebra
● Tensor is an array of numbers
  – Multi-dim: 0d scalar, 1d vector, 2d matrix/image, 3d RGB image
● Matrix (dot) product
● Dot product of vectors A and B (m = p = 1 in above notation, n = 2)

[Goodfellow, Bengio, Courville 2016]
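
The formula the slide refers to as "above notation" did not survive extraction. Assuming it is the usual definition from the cited book, C = AB with A of shape m × n, B of shape n × p, and C_ij = Σ_k A_ik B_kj, the bullets can be illustrated as follows. All array values are made up for the example.

```python
import numpy as np

scalar = np.array(3.0)                 # 0d tensor
vector = np.array([1.0, 2.0])          # 1d tensor
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])        # 2d tensor, shape (3, 2)
rgb_image = np.zeros((32, 32, 3))      # 3d tensor: height x width x colour channel

# Matrix product: A is m x n, B is n x p, C is m x p.
A = matrix                             # m = 3, n = 2
B = np.array([[1.0, 0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0, 3.0]])   # n = 2, p = 4
C = A @ B

# Dot product of two vectors as the special case m = p = 1, n = 2.
a = np.array([[1.0, 2.0]])             # 1 x 2
b = np.array([[3.0], [4.0]])           # 2 x 1
dot_as_matrix_product = a @ b          # 1 x 1 matrix
dot = np.dot(a.ravel(), b.ravel())     # the same number as a plain scalar
```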

Linear algebra: Norms

[Goodfellow, Bengio, Courville 2016]
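
The slide's formulas are not preserved in this transcript. Assuming it covers the norms most often used in deep learning (L1, L2, squared L2, max norm, and the Frobenius norm for matrices), they can be computed like this; the example vector and matrix are arbitrary.

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

l1 = np.sum(np.abs(x))            # L1 norm: sum of absolute values
l2 = np.sqrt(np.sum(x ** 2))      # L2 (Euclidean) norm
squared_l2 = x @ x                # squared L2 norm, cheaper to differentiate
max_norm = np.max(np.abs(x))      # max (L-infinity) norm: largest absolute entry

M = np.array([[1.0, 2.0], [3.0, 4.0]])
frobenius = np.sqrt(np.sum(M ** 2))   # Frobenius norm of a matrix
```

`np.linalg.norm` returns the same values via its `ord` argument (`1`, `2`, `np.inf`, and the default for matrices).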

Nonlinearities
● ReLU
● Softplus
● Logistic Sigmoid

[Goodfellow, Bengio, Courville 2016]

[(c) public domain]
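
A minimal sketch of the three nonlinearities, using their standard definitions: ReLU is max(0, z), softplus is log(1 + exp(z)), and the logistic sigmoid is 1 / (1 + exp(−z)). The sample inputs are arbitrary.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softplus(z):
    # log(1 + exp(z)), written with logaddexp for numerical stability.
    return np.logaddexp(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])
relu_out = relu(z)
softplus_out = softplus(z)
sigmoid_out = sigmoid(z)
```

Softplus is a smooth upper bound on ReLU, and its derivative is the logistic sigmoid.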

Approximate Optimization

[Goodfellow, Bengio, Courville 2016]

Gradient descent

[Goodfellow, Bengio, Courville 2016]
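
Gradient descent repeatedly moves the parameters a small step against the gradient. The sketch below applies it to a hypothetical quadratic bowl; the objective, the starting point, and the learning rate are assumptions for illustration.

```python
import numpy as np

def f(x):
    return 0.5 * np.sum(x ** 2)

def grad_f(x):
    return x

x = np.array([3.0, -2.0])     # arbitrary starting point
learning_rate = 0.1           # assumed step size
history = [f(x)]
for _ in range(100):
    x = x - learning_rate * grad_f(x)   # step in the direction of steepest descent
    history.append(f(x))
```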

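Critical points

[Goodfellow, Bengio, Courville 2016]

Saddle point – in 1D, the 1st and 2nd derivatives both vanish; in higher dimensions the gradient vanishes while the curvature is positive in some directions and negative in others

Poor conditioning: 1st derivative is large in one direction and small in another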
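
One common way to examine a critical point is through the eigenvalues of the Hessian: mixed signs indicate a saddle, and a large ratio between the largest and smallest eigenvalue indicates poor conditioning. The helper name `classify_critical_point` and both example functions are hypothetical.

```python
import numpy as np

def classify_critical_point(hessian):
    eigenvalues = np.linalg.eigvalsh(hessian)
    if np.all(eigenvalues > 0):
        return "local minimum"
    if np.all(eigenvalues < 0):
        return "local maximum"
    if np.any(eigenvalues > 0) and np.any(eigenvalues < 0):
        return "saddle point"
    return "inconclusive"  # some eigenvalues are zero, the test cannot decide

# f(x, y) = x^2 - y^2 has a critical point at the origin with this Hessian.
saddle_hessian = np.array([[2.0, 0.0], [0.0, -2.0]])
saddle_kind = classify_critical_point(saddle_hessian)

# f(x, y) = 0.5 * (x^2 + 100 y^2): a minimum, but poorly conditioned.
bowl_hessian = np.array([[1.0, 0.0], [0.0, 100.0]])
bowl_kind = classify_critical_point(bowl_hessian)
eigs = np.linalg.eigvalsh(bowl_hessian)
condition_number = eigs.max() / eigs.min()
```

For the 1D example x³ at 0 the Hessian is the 1 × 1 zero matrix, so the test returns "inconclusive": the second derivative alone cannot identify that saddle.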

TensorFlow Playground
● http://playground.tensorflow.org/
  – Try out simple network configurations
● https://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html
  – Visualize linear and non-linear mappings

Regularization

Aim: reduce generalization error rather than training error (which may even increase)

Constrained optimization
● Squared L2 encourages small weights
● L1 encourages sparsity of model parameters (weights)

[Goodfellow, Bengio, Courville 2016]

[Figure labels: Unregularized objective, L2 regularizer]
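
The different effects of the two penalties show up clearly in a simplified setting. Assume the unregularized objective is 0.5 · ‖w − w*‖² (an identity Hessian, chosen only to keep the algebra short); then both regularized minimizers have closed forms. The values of `w_star` and `alpha` are hypothetical.

```python
import numpy as np

w_star = np.array([3.0, -0.5, 0.2, -2.0])  # hypothetical unregularized optimum
alpha = 1.0                                # assumed regularization strength

# Minimizer of 0.5 * ||w - w_star||^2 + 0.5 * alpha * ||w||_2^2:
# every weight is shrunk by the same factor, none becomes exactly zero.
w_l2 = w_star / (1.0 + alpha)

# Minimizer of 0.5 * ||w - w_star||^2 + alpha * ||w||_1 (soft thresholding):
# weights with magnitude below alpha are set exactly to zero.
w_l1 = np.sign(w_star) * np.maximum(np.abs(w_star) - alpha, 0.0)

nonzero_l2 = np.count_nonzero(w_l2)
nonzero_l1 = np.count_nonzero(w_l1)
```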

Dataset augmentation

[Goodfellow, Bengio, Courville 2016]
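
A minimal sketch of dataset augmentation for images, assuming small random shifts and horizontal mirroring are label-preserving for the task at hand (mirroring would not be for digits or letters). The function name `augment` and the toy image are hypothetical.

```python
import numpy as np

def augment(image, rng, max_shift=2):
    """Return a randomly shifted, possibly mirrored copy of a 2d image."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # np.roll wraps pixels around the border; real pipelines often pad instead.
    shifted = np.roll(image, shift=(int(dy), int(dx)), axis=(0, 1))
    if rng.random() < 0.5:
        shifted = shifted[:, ::-1]   # horizontal flip
    return shifted

rng = np.random.default_rng(0)
image = np.arange(64, dtype=float).reshape(8, 8)  # hypothetical 8x8 grey image
augmented = [augment(image, rng) for _ in range(4)]
```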

Learning curves

● Early stopping before validation error starts to increase
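
Early stopping can be sketched as a scan over the validation curve with a patience counter. The function name, the patience value, and the error values below are all made up for illustration; in practice the errors come from evaluating the model after each epoch.

```python
def early_stopping_epoch(validation_errors, patience=3):
    """Return the index of the epoch whose parameters should be kept."""
    best_epoch = 0
    best_error = float("inf")
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch

# Hypothetical validation curve: falls, then rises as the model overfits.
validation_errors = [0.90, 0.60, 0.45, 0.40, 0.42, 0.41, 0.47, 0.55, 0.70]
stop_at = early_stopping_epoch(validation_errors)
```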

Bagging
● Average multiple models trained on subsets of the data
● First subset: learns top loop, Second subset: bottom loop
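
A small sketch of bagging on a regression problem. It assumes the classic formulation in which each "subset" is a bootstrap resample drawn with replacement; the data, the polynomial model, and the number of models are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy 1d regression data.
x = np.linspace(-1.0, 1.0, 40)
y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=x.size)

def fit_polynomial(x_train, y_train, degree=5):
    return np.polyfit(x_train, y_train, degree)

n_models = 10
models = []
for _ in range(n_models):
    # Bootstrap resample: draw len(x) indices with replacement.
    idx = rng.integers(0, x.size, size=x.size)
    models.append(fit_polynomial(x[idx], y[idx]))

# Average the predictions of all models.
x_test = np.linspace(-1.0, 1.0, 5)
predictions = np.array([np.polyval(coeffs, x_test) for coeffs in models])
bagged_prediction = predictions.mean(axis=0)
```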

Dropout
● Random sample of unit activations is set to zero during training
● Train different network model each time
● Learn more robust, generalizable features

[Goodfellow, Bengio, Courville 2016]
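
A minimal sketch of dropout in its common "inverted" form, which zeroes unit activations at random during training and rescales the survivors so that the expected activation is unchanged. The function signature, the keep probability, and the activation array are assumptions for illustration.

```python
import numpy as np

def dropout(activations, keep_prob, rng, training=True):
    """Inverted dropout: zero units at random, rescale the survivors."""
    if not training:
        return activations          # no units are dropped at test time
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones((1000, 50))             # hypothetical hidden-layer activations
h_train = dropout(h, keep_prob=0.8, rng=rng)
h_test = dropout(h, keep_prob=0.8, rng=rng, training=False)
```

Each call draws a new mask, so every training step effectively uses a different sub-network.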

Multitask learning
● Shared parameters are trained with more data
● Improved generalization error due to increased statistical strength

[Goodfellow, Bengio, Courville 2016]

Components of popular architectures

Convolution as edge detector

[Goodfellow, Bengio, Courville 2016]
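
A sketch of convolution as an edge detector, assuming the simplest possible kernel: the difference between each pixel and its left-hand neighbour, which responds only where the intensity changes horizontally. The toy image is hypothetical.

```python
import numpy as np

# Hypothetical image: dark left half, bright right half.
image = np.zeros((4, 6))
image[:, 3:] = 1.0

# Cross-correlating each row with the two-element kernel [-1, 1]
# (equivalently, np.convolve with [1, -1]) subtracts from every pixel
# its left-hand neighbour.
edges = image[:, 1:] - image[:, :-1]

# The same result for one row, using NumPy's 1d convolution.
first_row_edges = np.convolve(image[0], [1.0, -1.0], mode="valid")
```

The output is zero everywhere except along the column where the image switches from dark to bright.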

Gabor wavelets (kernels)
● Local average, first derivative
● Second derivative (curvature)
● Directional second derivative

[Goodfellow, Bengio, Courville 2016]
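
A Gabor kernel is a sinusoid multiplied by a Gaussian envelope. The sketch below uses one widely used parametrization (wavelength, orientation `theta`, envelope width `sigma`, aspect ratio `gamma`, phase); the parameter names and default values are assumptions and need not match the notation of the cited figure.

```python
import numpy as np

def gabor_kernel(size=9, wavelength=4.0, theta=0.0, sigma=2.0, gamma=0.5, phase=0.0):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates by theta.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope times cosine carrier along the rotated x axis.
    envelope = np.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_rot / wavelength + phase)
    return envelope * carrier

kernel = gabor_kernel()

# A small bank of kernels at four orientations.
bank = [gabor_kernel(theta=t) for t in np.linspace(0.0, np.pi, 4, endpoint=False)]
```

Changing `phase` switches between even (averaging, second-derivative-like) and odd (first-derivative-like) kernels.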

Gabor-like learned kernels

[Goodfellow, Bengio, Courville 2016]

● Feature extractors provided by pretrained networks

Max pooling translation invariance
● Take max of certain neighbourhood
● Often followed by downsampling

[Goodfellow, Bengio, Courville 2016]
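
The invariance effect can be sketched in one dimension: shifting the input by one position changes most input values but fewer pooled values. The function name `max_pool_1d`, the pooling width, and the detector values are hypothetical.

```python
import numpy as np

def max_pool_1d(values, width=3, stride=1):
    values = np.asarray(values, dtype=float)
    starts = range(0, len(values) - width + 1, stride)
    return np.array([values[s:s + width].max() for s in starts])

detector = np.array([0.1, 1.0, 0.2, 0.1, 0.0, 0.0])
shifted = np.roll(detector, 1)   # same input moved one position to the right

pooled = max_pool_1d(detector)
pooled_shifted = max_pool_1d(shifted)

# Pooling with stride equal to the width also downsamples the signal.
downsampled = max_pool_1d(detector, width=2, stride=2)

changed_inputs = int(np.sum(detector != shifted))
changed_outputs = int(np.sum(pooled != pooled_shifted))
```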

Max pooling transform invariance

[Goodfellow, Bengio, Courville 2016]

Types of connectivity

[Goodfellow, Bengio, Courville 2016]

Choosing architecture family
● No structure → fully connected
● Spatial structure → convolutional
● Sequential structure → recurrent

Optimization Algorithm
● Lots of variants address choice of learning rate
● See Visualization of Algorithms
● AdaDelta and RMSprop often work well
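
As an illustration of how such variants adapt the learning rate, here is a sketch of the basic RMSprop update, which divides each gradient component by a running root-mean-square of its recent values. The hyperparameter defaults and the badly scaled test objective are assumptions; library implementations differ in details.

```python
import numpy as np

def rmsprop_step(params, grad, cache, learning_rate=0.01, decay=0.9, eps=1e-8):
    # Running average of squared gradients, one entry per parameter.
    cache = decay * cache + (1.0 - decay) * grad ** 2
    # Per-parameter step: large recent gradients lead to smaller steps.
    params = params - learning_rate * grad / (np.sqrt(cache) + eps)
    return params, cache

# Hypothetical objective 0.5 * (x0^2 + 100 * x1^2) with very different curvatures.
scales = np.array([1.0, 100.0])

def grad(x):
    return scales * x

x = np.array([1.0, 1.0])
cache = np.zeros_like(x)
for _ in range(500):
    x, cache = rmsprop_step(x, grad(x), cache)
```

Because the update is normalized per parameter, both coordinates make progress at a similar pace despite the 100-fold difference in gradient scale.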

Software for Deep Learning

Current Frameworks
● TensorFlow / Keras
● PyTorch
● DL4J
● Caffe
● And many more
● Most have CPU-only mode but much faster on NVIDIA GPU

Development strategy
● Identify needs: High accuracy or low accuracy?
● Choose metric
  – Accuracy (% of examples correct), Coverage (% examples processed)
  – Precision TP/(TP+FP), Recall TP/(TP+FN)
  – Amount of error in case of regression
● Build end-to-end system
  – Start from baseline, e.g. initialize with pre-trained network
● Refine driven by data
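
The classification metrics listed above can be computed directly from the counts of true and false positives and negatives. The helper name `confusion_counts` and the label lists are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

# Hypothetical binary labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / len(y_true)   # fraction of examples correct
precision = tp / (tp + fp)           # TP / (TP + FP)
recall = tp / (tp + fn)              # TP / (TP + FN)
```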

Sources
● I. Goodfellow, Y. Bengio, A. Courville, “Deep Learning”, MIT Press, 2016 [link]