Deep Learning: An Engineer’s Dream and a Mathematician’s Nightmare

John McKay
Information Processing & Algorithms Laboratory
Dept. of Electrical Engineering
Pennsylvania State University

Dartmouth Applied Math Seminar, 2017



Intro Neural Network Design Training Neural Networks Transfer Learning References

Contents

1 Intro
2 Neural Network Design
   • Convolutional Neural Networks
3 Training Neural Networks
   • Backpropagation
   • Stochastic Gradient Descent
   • Optimization Issues
4 Transfer Learning
   • Pretraining
   • Fine Tuning

[email protected] Skeptical Math Intro to DL 0 / 55


Why Give a Talk to Mathematicians on DL?

• Deep learning is dominating machine learning (image classification, object detection, text parsing, etc.).

• Industry is keeping pace with, or beating, academia in terms of breakthroughs and adoption. We need mathematicians to provide rigor to modern strategies.

Figure: Per-month publication totals of papers on arXiv, 1990-2014: "Convolutional Neural Networks" (Category: Computer Science; 86K articles, 946M words) against "topology" (Category: Mathematics; 303K articles, 4B words), with topology (orange) provided as a reference.


Gatys et al. (2015), Taigman et al. (2014)


Courtesy Nvidia, Krause et al. (2016)


This talk will...

• Provide a comprehensive introduction to Deep Learning
• Cover most of the modern ideas and trends
• Illuminate the issues underpinning popular strategies


Traditional Supervised Image Classification

Training:
1. Input Labeled Data
      ↓
2. Extract Features
      ↓
3. Train Classifier


Training

http://www.cs.trincoll.edu



Feature Extraction

• Popular features: scale-invariant feature transforms (SIFT), speeded-up robust features (SURF), sparse reconstruction-based classification (SRC) coefficients.

• They are hand-crafted from analytical work trying to decipher invariant (translation, rotation, scale, etc.) and discriminatory image characteristics.

• Why not have the computer do it?

Lowe (1999)



Single Hidden Layer Network

• Input x ∈ R^d, NN output a^(2):

    a^(2) = W_2 σ(W_1 x + b_1) + b_2

  with σ(W_1 x + b_1) the 1st layer (hidden) and the outer map W_2(·) + b_2 the 2nd layer (output).

• W_1, W_2 are the weights and b_1, b_2 the biases.
• σ is an activation function (more on this later).
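As a concrete sketch of the formula above (NumPy, with made-up layer sizes and a logistic σ chosen purely for illustration; neither is prescribed by the slide):

```python
import numpy as np

def sigma(z):
    # activation function; here the logistic sigmoid, one common choice
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d, h, k = 4, 8, 3                       # input, hidden, output widths (arbitrary)
W1, b1 = rng.standard_normal((h, d)), np.zeros(h)
W2, b2 = rng.standard_normal((k, h)), np.zeros(k)

x = rng.standard_normal(d)
a2 = W2 @ sigma(W1 @ x + b1) + b2       # a^(2) = W_2 σ(W_1 x + b_1) + b_2
print(a2.shape)                         # (3,)
```

Deeper networks are just more compositions of the same affine-map-plus-σ step.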


Nielsen (2015)


Universal Approximation Theorem

• Universal Approximation Theorem: given enough hidden layers, weight depth, and σ from a certain set of nonlinear functions, a NN with linear output can represent any Borel measurable function mapping one finite-dimensional space to another.

• Nice, but representation ≠ prediction.

Hornik et al. (1989), Leshno et al. (1993)
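For reference, one common precise form (stated here for continuous targets; this paraphrase is mine, and the cited papers give the exact hypotheses on σ and the Borel measurable case):

```latex
\text{If } \sigma \text{ is, e.g., non-polynomial (Leshno et al.), then for every }
f \in C(K,\mathbb{R}),\ K \subset \mathbb{R}^d \text{ compact},\ \varepsilon > 0,
\text{ there exist } N,\ W_1 \in \mathbb{R}^{N \times d},\ b_1 \in \mathbb{R}^N,\
W_2 \in \mathbb{R}^{1 \times N} \text{ with }
\sup_{x \in K} \bigl\lvert W_2\,\sigma(W_1 x + b_1) - f(x) \bigr\rvert < \varepsilon .
```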


More Parameters, More Descriptiveness

Li/Karpathy/Johnson, Stanford CS231


Activation Functions

• Feature spaces require nonlinear activation functions.

• There is no one “best” choice for an activation function, to say nothing of how many layers have one specific activation function.

• Since 2009, the go-to: Rectified Linear Units (ReLU)

    σ(z) = a such that a_i = max(0, z_i)
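As a sketch (NumPy), ReLU is a single elementwise maximum:

```python
import numpy as np

def relu(z):
    # a_i = max(0, z_i), applied elementwise
    return np.maximum(0.0, z)

print(relu(np.array([-2.0, 0.0, 3.5])))   # negative entries clamp to 0
```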


Activation Functions

[Figure: common activation functions; imiloainf.wordpress.com (her typo)]


Output Layer

• Never mind, sigmoids are okay.
• For a binary problem, the sigmoid acts as a probability.
• For multinomial problems, it is popular to use softmax:

    softmax(z)_i = exp(z_i) / Σ_j exp(z_j)

• softmax ≈ argmax function (one-hot vector output)
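A minimal NumPy sketch of the softmax (the max-subtraction is a standard numerical-stability trick, not part of the definition):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtracting max(z) avoids overflow; result unchanged
    return e / e.sum()

p = softmax(np.array([1.0, 2.0, 5.0]))
print(int(p.argmax()))           # 2: mass concentrates on the largest entry
print(round(float(p.sum()), 6))  # 1.0
```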


Convolutional Neural Networks

• Convolutions with filters are a way to do weight sharing and exploit sparse interactions.

• Fewer weights? Less to train/save.

• Modern CNNs are the most accurate algorithms humans have ever devised (by a bit).

LeCun et al. (1998)
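A rough sketch of the weight-sharing point (NumPy; image and filter sizes are made up, and this is the unflipped cross-correlation CNNs typically implement): one 3×3 filter reused at every position of a 28×28 image has 9 parameters, while a dense map to the same output size would need hundreds of thousands.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.standard_normal((28, 28))     # toy "image"
filt = rng.standard_normal((3, 3))      # 9 shared weights, reused everywhere

out = np.zeros((26, 26))                # 'valid' convolution output
for i in range(26):
    for j in range(26):
        out[i, j] = np.sum(img[i:i+3, j:j+3] * filt)

print(filt.size)                        # 9 parameters
print(26 * 26 * 28 * 28)                # 529984 for an equivalent dense map
```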


Convolutional Neural Network


Li/Karpathy/Johnson, Stanford CS231


Recurrent Neural Networks


Regularization

Li/Karpathy/Johnson, Stanford CS231; Gal and Ghahramani (2015)


How to train Neural Networks?

• Suppose we are given a set x_1, ..., x_n with known labels y(x_i) = y_i. Training entails tuning parameters to minimize the error between y and the model’s outputs for each x_i.

• Cost functions represent a surrogate for classification error. We indirectly improve classification performance.

• In minimizing the cost function, gradient descent methods have become the go-to strategy. We will later discuss issues with this.

• What is the gradient of a NN w.r.t. its parameters?


Backpropagation

• Calculating the gradient was a significant challenge until 1986, when Rumelhart et al. proposed backpropagation.

• What was the problem before them?
   • Nested nonlinear activation functions and nontrivial cost functions are messy and require a lot of work for slight tweaks.
   • NNs with even a little depth involve several matrix multiplications. If one were to use a finite difference scheme, several forward passes through a network get expensive.

Rumelhart et al. (1986)


Backpropagation

• As we will show, backpropagation allows us to compute the gradient of a NN at the cost of two forward passes, regardless of the architecture.

• It plays mainly off of implementations of the chain rule.


Example Network: C cost function, L number of layers, training inputs x ∈ R^m (n in total), y(x) desired output.

    a^l_j(x) = a^l_j = σ( Σ_k w^l_{jk} a^{l-1}_k + b^l_j ),    a^1_j = x_j

C has two requirements: it can be written as a function of the network outputs a^L, and it can be arranged as a sum of cost functions for each training sample. Example:

    C(a^L) = (1 / 2n) Σ_x ||y(x) − a^L(x)||_2^2

Writing z^l = W^l a^{l-1} + b^l, the layer map in vector form is

    a^l = σ(W^l a^{l-1} + b^l) = σ(z^l)

Goal: find ∂C/∂w^l_{jk} and ∂C/∂b^l_j.

Nielsen (2015)
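The quadratic cost above, as a quick sketch (NumPy; the two samples and their 2-dimensional outputs are made-up numbers):

```python
import numpy as np

def C(aL, y):
    # quadratic cost: C = 1/(2n) * sum over samples of ||y(x) - a^L(x)||_2^2
    n = aL.shape[0]                      # number of training samples
    return np.sum((y - aL) ** 2) / (2 * n)

aL = np.array([[0.9, 0.1],               # network outputs for 2 samples
               [0.2, 0.8]])
y  = np.array([[1.0, 0.0],               # desired outputs
               [0.0, 1.0]])
print(round(C(aL, y), 6))                # 0.025
```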

    a^l = σ(W^l a^{l-1} + b^l) = σ(z^l)

Let δ^l_j = ∂C/∂z^l_j.

Error in the output layer:

    δ^L_j = (∂C/∂a^L_j)(∂a^L_j/∂z^L_j) = (∂C/∂a^L_j) σ′(z^L_j)    (1)

δ^l in terms of δ^{l+1}:

    δ^l = ((W^{l+1})^T δ^{l+1}) ⊙ σ′(z^l)    (2)

Partial w.r.t. the biases:

    ∂C/∂b^l_j = δ^l_j    (3)

Partial w.r.t. the weights:

    ∂C/∂w^l_{jk} = a^{l-1}_k δ^l_j    (4)

Backpropagation Algorithm

1 Input x and set a^1 = x
2 Feedforward through the network, i.e. for l = 2, ..., L find

    z^l = W^l a^{l-1} + b^l,    a^l = σ(z^l)

3 Find δ^L = ∇_{a^L} C ⊙ σ′(z^L)
4 Backpropagate the error by, for l = L−1, L−2, ..., 2, calculating

    δ^l = ((W^{l+1})^T δ^{l+1}) ⊙ σ′(z^l)

5 The gradients of the cost function are found as

    ∂C/∂b^l_j = δ^l_j,    ∂C/∂w^l_{jk} = a^{l-1}_k δ^l_j

Nielsen (2015)
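The five steps above, sketched in NumPy for the quadratic cost on a single sample (the layer sizes and the logistic σ are illustrative assumptions), with a central finite-difference check that one computed weight gradient matches:

```python
import numpy as np

def sigma(z):                        # logistic activation (illustrative choice)
    return 1.0 / (1.0 + np.exp(-z))

def dsigma(z):                       # its derivative, σ'(z)
    s = sigma(z)
    return s * (1.0 - s)

rng = np.random.default_rng(1)
sizes = [3, 5, 2]                    # layer widths, l = 1..L with L = 3
Ws = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [rng.standard_normal(m) for m in sizes[1:]]
x = rng.standard_normal(sizes[0])
y = np.array([1.0, 0.0])             # desired output for this sample

def forward(Ws, bs):
    # steps 1-2: set a^1 = x, then feed forward storing every z^l and a^l
    a, zs, As = x, [], [x]
    for W, b in zip(Ws, bs):
        z = W @ a + b
        a = sigma(z)
        zs.append(z)
        As.append(a)
    return zs, As

zs, As = forward(Ws, bs)
delta = (As[-1] - y) * dsigma(zs[-1])          # step 3: δ^L = ∇_{a^L}C ⊙ σ'(z^L)
grads_W, grads_b = [], []
for l in range(len(Ws) - 1, -1, -1):           # steps 4-5, from layer L down
    grads_W.insert(0, np.outer(delta, As[l]))  # ∂C/∂w^l_jk = a^{l-1}_k δ^l_j
    grads_b.insert(0, delta)                   # ∂C/∂b^l_j  = δ^l_j
    if l > 0:
        delta = (Ws[l].T @ delta) * dsigma(zs[l - 1])   # eq. (2)

# sanity check: compare one weight gradient against a central finite difference
eps, i, j = 1e-6, 0, 0
Ws[0][i, j] += eps
Cp = 0.5 * np.sum((forward(Ws, bs)[1][-1] - y) ** 2)
Ws[0][i, j] -= 2 * eps
Cm = 0.5 * np.sum((forward(Ws, bs)[1][-1] - y) ** 2)
Ws[0][i, j] += eps
print(abs((Cp - Cm) / (2 * eps) - grads_W[0][i, j]) < 1e-6)   # True
```

Note the cost: one backward pass reuses the z^l and a^l already stored during the forward pass, which is the "two forward passes" economy the slides describe.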

Page 61: Deep Learning: An Engineer's Dream and a Mathematician's ... · Intro Neural Network Design Training Neural Networks Transfer Learning References Contents 1 Intro 2 Neural Network

Intro Neural Network Design Training Neural Networks Transfer Learning References

Learned Filters

Courtesy Yann LeCun


Gradient Descent

• Backpropagation → gradient
• (Batch) gradient descent:

  W^l = W^l − α_W ∂C/∂W^l

  b^l = b^l − α_b ∂C/∂b^l

• Issue: C is the sum of costs associated with each training sample.
• Many problems use millions of training samples. . .
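As a toy instance of the batch update above (on a least-squares cost rather than a network; the sizes and learning rate are illustrative assumptions):

```python
import numpy as np

# Batch gradient descent on C(w) = (1/2n) * sum_i (x_i^T w - y_i)^2:
# every step touches all n training samples, which is exactly the
# expense stochastic gradient descent is designed to avoid.
rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true

w, alpha = np.zeros(d), 0.1
for _ in range(200):
    grad = X.T @ (X @ w - y) / n   # full gradient: a sum over all n samples
    w -= alpha * grad

print(np.linalg.norm(w - w_true))  # converges toward 0
```

With n in the millions, that per-step sum over the whole training set is the bottleneck.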


Stochastic Gradient Descent

• Randomly choose m < n training samples x_1, . . . , x_m:

  C^(m) = Σ_{i=1}^{m} C_{x_i}

• Update step:

  W^l_{k+1} = W^l_k − (n α_W / m) Σ_{i=1}^{m} ∂C_{x_i} / ∂W^l_k

  b^l_{k+1} = b^l_k − (n α_b / m) Σ_{i=1}^{m} ∂C_{x_i} / ∂b^l_k

• Writing the sampled quantity as

  g_W = (n/m) Σ_{i=1}^{m} ∂C_{x_i} / ∂W^l_k

  this is an unbiased estimator of the true gradient: E[g_W] = ∇_W C.


Stochastic Gradient “Descent”


• The learning rate α_W is typically fixed and “small” (but not “too small”).
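The same toy least-squares setup makes the minibatch version concrete; the batch size and rate below are illustrative assumptions:

```python
import numpy as np

# Minibatch SGD for the update above: each step randomly chooses m < n
# samples and uses their averaged gradient as a noisy (but unbiased)
# stand-in for the full gradient.
rng = np.random.default_rng(1)
n, d, m = 1000, 5, 32
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true

w, alpha = np.zeros(d), 0.1    # fixed, "small" (but not "too small") rate
for _ in range(500):
    idx = rng.choice(n, size=m, replace=False)   # randomly choose m samples
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / m  # minibatch gradient estimate
    w -= alpha * grad

print(np.linalg.norm(w - w_true))  # "close enough" to the true solution
```

Each step costs m gradient evaluations instead of n, at the price of a noisier trajectory, which is exactly the GD-vs-SGD picture on the next slide.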


Stochastic Gradient Descent

Figure: GD on left, SGD on right


Still acceptable - we just want “close enough”

Ng, Stanford Machine Learning (Fall 2011)


Quick Note on C

• Least squares is useful for examples, but usually not suitable in practice (slow with sigmoid output).
• Cross-entropy:

  C = −(1/n) Σ_x [ yᵀ ln(a^L(x)) + (1 − y)ᵀ ln(1 − a^L(x)) ]

• Chosen for “nice” analytical attributes, but with no idea of the resulting topology of the cost surface, which influences everything.
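The cost above is short enough to write out directly; the clipping constant is an implementation guard (an assumption), not part of the formula:

```python
import numpy as np

# Cross-entropy cost for sigmoid outputs a^L(x) in (0, 1).
def cross_entropy(A, Y):
    """A, Y: arrays of shape (n_samples, n_outputs); rows are a^L(x) and y."""
    A = np.clip(A, 1e-12, 1 - 1e-12)          # avoid log(0)
    return -np.mean(np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A), axis=1))

Y = np.array([[1.0, 0.0]])
print(cross_entropy(np.array([[0.9, 0.1]]), Y))  # small: confident and right
print(cross_entropy(np.array([[0.1, 0.9]]), Y))  # large: confident and wrong
```

Unlike least squares with a sigmoid, the gradient of this cost does not pick up a σ′ factor at the output, which is the “slow with sigmoid output” issue it was chosen to fix.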


Optimization Issues

• Training remains one of the least analytically sound aspects of NNs.

• What does the cost function look like?
• Where are its local minima (if there are any)?
• How will SGD jump around?


Local Minima Problem

Sontag and Sussmann (1989), Brady et al. (1989), Gori and Tesi (1992)


Saddle Points

• Suppose H = ∇²C with eigenvalues λ_i.
• If each P(λ_i > 0) = 1/2 independently, then P(λ_1 > 0, . . . , λ_K > 0) = (1/2)^K.
• Since (1/2)^K → 0 as K → ∞, the probability that a critical point is a local minimum vanishes: in high dimensions, almost every critical point is a saddle point.

Dauphin et al. (2014)
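The heuristic can be illustrated numerically by drawing random symmetric “Hessians” (a stand-in assumption, not the Hessian of any real network cost) and counting how often every eigenvalue is positive:

```python
import numpy as np

# Fraction of random symmetric K x K matrices that are positive definite,
# i.e. would correspond to a local minimum. It collapses as K grows.
rng = np.random.default_rng(0)

def frac_positive_definite(K, trials=2000):
    hits = 0
    for _ in range(trials):
        A = rng.normal(size=(K, K))
        H = (A + A.T) / 2                      # random symmetric matrix
        hits += bool(np.all(np.linalg.eigvalsh(H) > 0))
    return hits / trials

for K in (1, 2, 5, 10):
    print(K, frac_positive_definite(K))
```

For Gaussian symmetric ensembles the eigenvalues are not actually independent (they repel), so the decay is even faster than (1/2)^K; the qualitative conclusion is the same.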


No Minima Problem


In Practice. . .

• Even in cases where a local minimum is not found, results can be state-of-the-art; we just want SGD to find a “very small value” of the cost.
• Initialization of the weights is a major area of research, aiming to place the model in a “good” region for finding local minima (more on this later).
• Structural elements have adapted to (experimentally) fix these issues (ReLU, cross-entropy cost, convolutional layers, etc.).

Goodfellow et al. (2016)


Unstable Gradient vs. Architecture

Layers (Parameters*)   Multi-Class Error   Top-5 Error
11 (133)               29.6                10.4
13 (133)               28.7                 9.9
16 (134)               28.1                 9.4
16 (138)               27.3                 8.8
19 (144)               25.5                 8.0

* Number in millions

Simonyan and Zisserman (2014)


Unstable Gradient

• Consider a simplified model: a single chain of neurons, one per layer.
• Typical initialization: w_i ∼ N(0, 1), b_i = 1.

Nielsen (2015)


Vanishing Gradient

• The first layers learn more slowly than the later ones.

Nielsen (2015)[email protected] Skeptical Math Intro to DL 40 / 55


Unstable Gradients for FFN

Nielsen (2015)[email protected] Skeptical Math Intro to DL 41 / 55


Overcoming Unstable Gradients

• Initialization (again):

  1 Random Walk Initialization¹:
    W̃^l = ω W^l,  where ω_ReLU = √2 · exp( 1.2 / (max(N_l, 6) − 2.4) )
  2 Orthonormal Initialization²
  3 Unit Variance Initialization³,⁴

Sussillo and Abbott (2014)¹, Saxe et al. (2013)², He et al. (2015)³, Mishkin and Matas (2015)⁴
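A sketch of the Random Walk scale factor printed above; the base 1/√N normalization of the underlying W^l is an assumption for illustration, while the ω_ReLU expression follows the slide:

```python
import numpy as np

def omega_relu(N):
    """Random Walk scaling for ReLU layers of width N (formula from slide)."""
    return np.sqrt(2.0) * np.exp(1.2 / (max(N, 6) - 2.4))

def random_walk_init(n_in, n_out, rng):
    W = rng.normal(size=(n_out, n_in)) / np.sqrt(n_in)  # assumed base scaling
    return omega_relu(n_out) * W

rng = np.random.default_rng(0)
W = random_walk_init(256, 256, rng)
print(omega_relu(256))   # just above sqrt(2) for wide layers
```

For wide layers the correction term is tiny and ω_ReLU ≈ √2, recovering the familiar He-style gain; the exp term only matters for narrow layers.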


Overcoming Unstable Gradients

• Student-teacher model
• Drop in layers as you go
• Change the cost function and/or activation functions?

Romero et al. (2014)


Transfer Learning & Optimization Problems

• Transfer learning: taking a model primed for task X and applying it to an unrelated problem Y. For example, taking a CNN designed to discern amongst vehicles and applying it to classifying different animals without manipulating its parameters.

• Transfer learning and its predecessor, pretraining, are unintentional solutions to the initialization problems above.
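A minimal runnable sketch of that recipe, with a frozen random projection standing in for the pretrained CNN and a nearest-centroid rule standing in for a light classifier such as an SVM (both stand-ins are assumptions; a real pipeline would use actual pretrained weights):

```python
import numpy as np

rng = np.random.default_rng(0)
W_pre = rng.normal(size=(100, 64))        # frozen "pretrained" weights

def features(X):
    return np.maximum(X @ W_pre, 0.0)     # ReLU features; never updated

# New task Y: two synthetic classes in the raw 100-d input space.
X0 = rng.normal(size=(50, 100))
X1 = rng.normal(size=(50, 100)) + 1.0     # class 1 shifted by one unit

# Light classifier on top: nearest class centroid in feature space.
c0 = features(X0).mean(axis=0)
c1 = features(X1).mean(axis=0)

def predict(X):
    F = features(X)
    d0 = np.linalg.norm(F - c0, axis=1)
    d1 = np.linalg.norm(F - c1, axis=1)
    return (d1 < d0).astype(int)

acc = np.mean(np.r_[predict(X0) == 0, predict(X1) == 1])
print(acc)   # high accuracy without ever touching W_pre
```

Only the small classifier on top is fit to task Y; the “pretrained” map is reused as-is, which is the whole point of the recipe.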


Pretraining

• Pretraining revived deep neural networks from obscurity.
• Without convolutions or recursion, deep fully connected networks could be trained while avoiding local-minima problems.
• Unsupervised pretraining has mostly been abandoned, but it inspired much of the modern work in transfer learning.

Bengio et al. (2007)


Unsupervised Pretraining

Let f be the identity function, X the input data matrix (one row per example), L an unsupervised layer-wise learning algorithm, and K the number of layers:

for k ∈ {1, . . . , K}:
    f^(k) = L(X)
    f = f^(k) ∘ f
    X = f^(k)(X)

Each iteration trains one layer; its input X is the previous layer’s output.

Goodfellow et al. (2016)
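The greedy loop above can be made runnable with a stand-in for L (here: top principal directions followed by tanh, an assumption for illustration; the procedure in the literature would train e.g. an autoencoder or RBM per layer):

```python
import numpy as np

def L_train(X, width=8):
    """Toy per-layer learner: fit one layer on X and return the map f^(k)."""
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    W = Vt[:width].T                      # assumed stand-in for learned features
    return lambda Z: np.tanh(Z @ W)

def greedy_pretrain(X, K):
    f = lambda Z: Z                       # f starts as the identity
    for _ in range(K):
        fk = L_train(X)                   # f^(k) = L(X)
        f = (lambda fk, prev: lambda Z: fk(prev(Z)))(fk, f)   # f = f^(k) ∘ f
        X = fk(X)                         # X = f^(k)(X): next layer's input
    return f

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
f = greedy_pretrain(X, K=3)
print(f(X).shape)
```

The returned f is the stacked network; in the original recipe its weights would then initialize supervised fine-tuning rather than be used directly.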


Unsupervised Pretraining

Unsupervised pretraining is thought to work because:
1 A model is sensitive to its initializations.
2 Learning the input distribution is useful.

Goodfellow et al. (2016)


1 A model is sensitive to its initializations.
  • This is the least mathematically understood part of pretraining.
  • It may start the model at local minima otherwise inaccessible, given the shape of the cost function.
  • How much of the initialization survives training?
2 Learning the input distribution is useful.

Goodfellow et al. (2016)


Unsupervised Pretraining

1 A model is sensitive to its initializations.
2 Learning the input distribution is useful.
  • Suppose a car/motorcycle model: will pretraining spot wheels? Will its representation of a wheel be helpful?
  • There is no coherent theory as to if/how/why/when unsupervised features help with NN training.

Goodfellow et al. (2016)


Supervised Pretraining

x → σ(W^(1) x + b^(1)) → y(x),   defining x^(1) = σ(W^(1) x + b^(1))

x^(1) → σ(W^(2) x^(1) + b^(2)) → y(x)

⋮

x → σ(W^(1) x + b^(1)) → σ(W^(2) x^(1) + b^(2)) → · · · → y(x)

Bengio et al. (2007)


Oquab et al. (2014)


Finer Details

Figure: three-stage pipeline: (1) input image fed through a pretrained network; (2) output-layer features extracted; (3) an SVM model fit on those features (McKay 2017)


Figure: precision vs. recall (both axes 70-100) for VGG-f, VGG16, VGG19, and AlexNet features (McKay 2017)


Thank You

Acknowledgements
• Dr. Anne Gelb
• Dr. Suren Jayasuriya
• iPal NN group: Tiep Vu, Hojjat Mousavi, Tiantong Guo


References I

Bengio, Y. et al. (2007). “Greedy layer-wise training of deep networks”. In: NIPS.

Brady, M. L., R. Raghavan, and J. Slawny (1989). “Back propagation fails to separate where perceptrons succeed”. In: IEEE Transactions on Circuits and Systems 36.5.

Dauphin, Y. N. et al. (2014). “Identifying and attacking the saddle point problem in high-dimensional non-convex optimization”. In: Advances in Neural Information Processing Systems.

Gal, Y. and Z. Ghahramani (2015). “Dropout as a Bayesian approximation: Representing model uncertainty in deep learning”. In: arXiv preprint arXiv:1506.02142 2.

Gatys, L. A., A. S. Ecker, and M. Bethge (2015). “A neural algorithm of artistic style”. In: arXiv preprint arXiv:1508.06576.

Goodfellow, I., Y. Bengio, and A. Courville (2016). Deep Learning. http://www.deeplearningbook.org. MIT Press.

Gori, M. and A. Tesi (1992). “On the problem of local minima in backpropagation”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 14.1.

He, K. et al. (2015). “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification”. In: Proceedings of the IEEE International Conference on Computer Vision.

Hornik, K., M. Stinchcombe, and H. White (1989). “Multilayer feedforward networks are universal approximators”. In: Neural Networks 2.5.


References II

Krause, J. et al. (2016). “A Hierarchical Approach for Generating Descriptive Image Paragraphs”. In: arXiv preprint arXiv:1611.06607.

LeCun, Y. et al. (1998). “Gradient-based learning applied to document recognition”. In: Proceedings of the IEEE 86.11.

Leshno, M. et al. (1993). “Multilayer feedforward networks with a nonpolynomial activation function can approximate any function”. In: Neural Networks 6.6.

Lowe, D. G. (1999). “Object recognition from local scale-invariant features”. In: Proceedings of the Seventh IEEE International Conference on Computer Vision. Vol. 2. IEEE.

Mishkin, D. and J. Matas (2015). “All you need is a good init”. In: arXiv preprint arXiv:1511.06422.

Nielsen, M. A. (2015). Neural Networks and Deep Learning. http://neuralnetworksanddeeplearning.com. Determination Press.

Oquab, M. et al. (2014). “Learning and transferring mid-level image representations using convolutional neural networks”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Romero, A. et al. (2014). “Fitnets: Hints for thin deep nets”. In: arXiv preprint arXiv:1412.6550.

Rumelhart, D. E., G. E. Hinton, and R. J. Williams (1986). “Learning representations by back-propagating errors”. In: Nature 323.


References III

Saxe, A. M., J. L. McClelland, and S. Ganguli (2013). “Exact solutions to the nonlinear dynamics of learning in deep linear neural networks”. In: arXiv preprint arXiv:1312.6120.

Simonyan, K. and A. Zisserman (2014). “Very deep convolutional networks for large-scale image recognition”. In: arXiv preprint arXiv:1409.1556.

Sontag, E. D. and H. J. Sussmann (1989). “Backpropagation can give rise to spurious local minima even for networks without hidden layers”. In: Complex Systems 3.1.

Sussillo, D. and L. Abbott (2014). “Random walk initialization for training very deep feedforward networks”. In: arXiv preprint arXiv:1412.6558.

Taigman, Y. et al. (2014). “Deepface: Closing the gap to human-level performance in face verification”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
