
Deep Learning: An Introduction for Applied Mathematicians

Manuel Baumann

June 13, 2018

CSC Reading Group Seminar

Disclaimer

Claim: I am not an expert in machine learning – But we are!


Reference

C. F. Higham and D. J. Higham (2018). Deep Learning: An Introduction for Applied Mathematicians. arXiv:1801.05894v1.



New names for old friends

Machine Learning           |  Numerical Analysis
neuron                     |  smoothed step function
neural network             |  directed graph
training phase             |  parameter fitting → p
stochastic gradient        |  steepest descent variant
learning rate              |  step size in line search
back propagation           |  adjoint equation
hidden layers              |  parameter (over-)fitting
'deep' learning            |  large-scale
artificial intelligence    |  f_p(x)
???                        |  model-order reduction



Overview

1. The General Set-up

2. Stochastic Gradient

3. Back Propagation

4. MATLAB Example

5. Current Research Directions


The General Set-up

The sigmoid function,

σ(x) = 1 / (1 + e^(−x)),

models the behavior of a neuron in the brain.

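As a quick illustration, a minimal MATLAB sketch of the sigmoid and of its derivative σ'(z) = σ(z)(1 − σ(z)), which reappears later in the back propagation proof (the function-handle names are my own):

sigma  = @(z) 1 ./ (1 + exp(-z));          % acts componentwise on vectors
dsigma = @(z) sigma(z) .* (1 - sigma(z));  % sigma'(z) = sigma(z)*(1 - sigma(z))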

A neural network

a^[1] = x ∈ R^{n_1},
a^[l] = σ( W^[l] a^[l−1] + b^[l] ) ∈ R^{n_l},    l = 2, 3, ..., L.

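A minimal MATLAB sketch of this recursion; the layer widths and random weights are arbitrary choices of mine, included only so the snippet runs on its own:

% Forward recursion a^[1] = x, a^[l] = sigma(W^[l] a^[l-1] + b^[l]).
n = [2 3 3 2];  L = numel(n);                 % arbitrary layer widths n_1,...,n_L
sigma = @(z) 1 ./ (1 + exp(-z));
for l = 2:L
    W{l} = randn(n(l), n(l-1));  b{l} = randn(n(l), 1);
end
x = [0.1; 0.9];                               % one input point in R^{n_1}
a = x;                                        % a^[1] = x
for l = 2:L
    a = sigma(W{l}*a + b{l});                 % a^[l]
end
% a now holds a^[L], the network output for input x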

The entering example

min_{p ∈ R^23}  (1/10) Σ_{i=1}^{10} (1/2) ‖y(x_i) − F(x_i)‖_2^2,

where p ∈ R^23 collects the entries of W^[j] and b^[j], j ∈ {2, 3, 4}, and

F(x) = σ( W^[4] σ( W^[3] σ( W^[2] x + b^[2] ) + b^[3] ) + b^[4] ) ∈ R^2.

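As a sanity check on p ∈ R^23, a small MATLAB sketch that counts the parameters, assuming the layer widths 2, 2, 3, 2 of the example network in Higham and Higham (the widths are my reading of the paper, not stated on this slide):

% Assumed layer widths (input, two hidden layers, output).
n = [2 2 3 2];
nparams = 0;
for l = 2:numel(n)
    nparams = nparams + n(l)*n(l-1) + n(l);   % entries of W^[l] plus entries of b^[l]
end
fprintf('total number of parameters in p: %d\n', nparams);   % prints 23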


Stochastic Gradient

Objective function:

J(p) = (1/N) Σ_{i=1}^{N} (1/2) ‖y(x_i) − a^[L](x_i, p)‖_2^2 =: (1/N) Σ_{i=1}^{N} C_i(x_i, p)

Steepest descent:

p ← p − η ∇J(p),   where η ∈ R_+ is called the 'learning rate'.

Stochastic gradient:

∇J(p) = (1/N) Σ_{i=1}^{N} ∇_p C_i(x_i, p) ≈ (1/|I|) Σ_{i∈I} ∇_p C_i(x_i, p),

where I is a randomly chosen subset (mini-batch) of {1, ..., N}.

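A toy, self-contained MATLAB sketch of the stochastic-gradient idea on a simple quadratic cost; the cost, data, batch size, and step size are illustrative choices of mine, not the network objective above:

% Minimize J(p) = (1/N) * sum_i 0.5*||p - d_i||^2 with stochastic gradient.
N = 100;  d = randn(3, N);          % toy 'data' points d_i
p = zeros(3, 1);  eta = 0.1;  m = 10;
for it = 1:500
    I = randperm(N, m);             % random mini-batch I of size m
    g = zeros(size(p));
    for i = I
        g = g + (p - d(:, i));      % gradient of C_i(p) = 0.5*||p - d_i||^2
    end
    p = p - eta * (g / m);          % p <- p - eta * (1/|I|) * sum_{i in I} grad C_i
end
% p approaches the mean of the d_i, which minimizes J.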

Back Propagation (1/3)

Now, p ∼ { [W^[l]]_{j,k}, [b^[l]]_j }. Let z^[l] := W^[l] a^[l−1] + b^[l] and δ^[l]_j := ∂C/∂z^[l]_j.

Lemma: Back Propagation

The partial derivatives are given by

δ^[L] = σ'(z^[L]) ∘ (a^[L] − y),                            (1)
δ^[l] = σ'(z^[l]) ∘ (W^[l+1])^T δ^[l+1],    2 ≤ l ≤ L−1,    (2)
∂C/∂b^[l]_j = δ^[l]_j,                       2 ≤ l ≤ L,      (3)
∂C/∂w^[l]_{jk} = δ^[l]_j a^[l−1]_k,          2 ≤ l ≤ L,      (4)

where ∘ denotes the componentwise (Hadamard) product.


Back Propagation (2/3)

Proof.

We prove (1) component-wise:

δ^[L]_j = ∂C/∂z^[L]_j = (∂C/∂a^[L]_j) (∂a^[L]_j/∂z^[L]_j) = (a^[L]_j − y_j) σ'(z^[L]_j) = (a^[L]_j − y_j) (σ(z^[L]_j) − σ^2(z^[L]_j)).

Next, we prove (2) component-wise:

δ^[l]_j = ∂C/∂z^[l]_j = Σ_{k=1}^{n_{l+1}} (∂C/∂z^[l+1]_k) (∂z^[l+1]_k/∂z^[l]_j)
        = Σ_{k=1}^{n_{l+1}} δ^[l+1]_k (∂z^[l+1]_k/∂z^[l]_j)
        = Σ_{k=1}^{n_{l+1}} δ^[l+1]_k w^[l+1]_{kj} σ'(z^[l]_j),

where z^[l+1]_k = Σ_{s=1}^{n_l} w^[l+1]_{ks} σ(z^[l]_s) + b^[l+1]_k.

(3) and (4) follow similarly.



Back Propagation (3/3)

Interpretation:

Evaluation of a^[L] requires a so-called forward pass: a^[1], z^[2] → a^[2], z^[3] → ... → a^[L].

Compute δ^[L] via (1).

Backward pass via (2): δ^[L] → δ^[L−1] → ... → δ^[2].

Gradients via (3)-(4).

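A self-contained MATLAB sketch of one forward and one backward pass implementing (1)-(4) for a single training point; the layer widths, data, and variable names are illustrative assumptions of mine:

% Setup (illustrative widths, random weights, one training point).
n = [2 2 3 2];  L = numel(n);
sigma  = @(z) 1 ./ (1 + exp(-z));
dsigma = @(z) sigma(z) .* (1 - sigma(z));
for l = 2:L
    W{l} = 0.5*randn(n(l), n(l-1));  b{l} = 0.5*randn(n(l), 1);
end
x = [0.3; 0.7];  y = [1; 0];                        % one training point (x_i, y(x_i))

% Forward pass: store z^[l] and a^[l].
a{1} = x;
for l = 2:L
    z{l} = W{l}*a{l-1} + b{l};
    a{l} = sigma(z{l});
end

% Backward pass: deltas via (1)-(2), then gradients via (3)-(4).
delta{L} = dsigma(z{L}) .* (a{L} - y);              % (1)
for l = L-1:-1:2
    delta{l} = dsigma(z{l}) .* (W{l+1}' * delta{l+1});   % (2)
end
for l = 2:L
    grad_b{l} = delta{l};                           % (3): dC/db^[l]_j = delta^[l]_j
    grad_W{l} = delta{l} * a{l-1}';                 % (4): dC/dw^[l]_{jk} = delta^[l]_j * a^[l-1]_k
end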

MATLAB Example

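The original slide showed a live MATLAB demonstration along the lines of the scripts accompanying Higham and Higham; that listing is not reproduced here. In its place, a rough sketch of a stochastic-gradient training run that combines the forward/backward pass above with the update from the stochastic gradient slide; layer widths, toy data, learning rate, and iteration count are all assumptions of mine:

% Rough sketch of stochastic-gradient training of the small network (illustrative only).
rng(1);
n = [2 2 3 2];  L = numel(n);  N = 10;  eta = 0.05;
X = rand(2, N);
Y = double([X(1,:) > X(2,:); X(1,:) <= X(2,:)]);    % toy targets y(x_i) in R^2
sigma  = @(z) 1 ./ (1 + exp(-z));
dsigma = @(z) sigma(z) .* (1 - sigma(z));
for l = 2:L
    W{l} = 0.5*randn(n(l), n(l-1));  b{l} = 0.5*randn(n(l), 1);
end
for it = 1:1e4
    i = randi(N);                                   % pick one training point at random
    a{1} = X(:, i);
    for l = 2:L                                     % forward pass
        z{l} = W{l}*a{l-1} + b{l};  a{l} = sigma(z{l});
    end
    delta{L} = dsigma(z{L}) .* (a{L} - Y(:, i));    % backward pass, (1)
    for l = L-1:-1:2
        delta{l} = dsigma(z{l}) .* (W{l+1}' * delta{l+1});   % (2)
    end
    for l = 2:L                                     % gradient step using (3)-(4)
        W{l} = W{l} - eta * delta{l} * a{l-1}';
        b{l} = b{l} - eta * delta{l};
    end
end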

Some more details

Convolutional Neural Networks:

The presented approach is infeasible for large data (W^[l] is dense).

Layers can be pre- and post-processing steps used in image analysis: filtering, max pooling, average pooling, ...

Avoiding Overfitting:

A trained network may work well on the given data, but not on new data.

Splitting: training and validation data.

Dropout: independently remove neurons during training (see the sketch below).

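A minimal MATLAB sketch of the common 'inverted dropout' variant of that idea, applied to one layer's activations; the keep probability and scaling convention are my own choices, not from the talk:

% Inverted dropout on one layer's activations (training time only).
a = rand(5, 1);                                   % placeholder activations a^[l]
keep_prob = 0.8;                                  % assumed keep probability
mask = (rand(size(a)) < keep_prob) / keep_prob;   % drop ~20% of neurons, rescale the rest
a = mask .* a;                                    % use the masked activations downstream
% At test time no mask is applied; the 1/keep_prob scaling keeps expected values consistent.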

Current Research Directions

Research directions:

proofs – e.g. when data is assumed to be i.i.d.

in practice: design of layers

perturbation theory: update trained network

autoencoders: ‖x − G(F(x))‖_2^2 → min

Two software examples:

http://www.vlfeat.org/matconvnet/

http://scikit-learn.org/


Summary

New names for old friends.

Machine Learning           |  Numerical Analysis
neuron                     |  smoothed step function
neural network             |  directed graph
training phase             |  parameter fitting → p
stochastic gradient        |  steepest descent variant
learning rate              |  step size in line search
back propagation           |  adjoint equation
hidden layers              |  parameter (over-)fitting
'deep' learning            |  large-scale
artificial intelligence    |  f_p(x)
PCA                        |  POD


Further readings

Andrew Ng, Machine Learning, Coursera (online course).

M. Nielsen, Neural Networks and Deep Learning, Determination Press, 2015.

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Boston, 2016.

Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, 521 (2015), pp. 436–444.

S. Mallat, Understanding deep convolutional networks, Philosophical Transactions of the Royal Society of London A, 374 (2016).

