37
Scientific Computing Lecture Series Introduction to Deep Learning Mustafa K¨ ut¨ uk * * Scientific Computing, Institute of Applied Mathematics Lecture III Deep Learning: An Introduction for Applied Mathematicians M. K¨ ut¨ uk (METU) Introduction to Deep Learning 1 / 37

Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Scientific Computing Lecture SeriesIntroduction to Deep Learning

Mustafa Kutuk*

∗Scientific Computing, Institute of Applied Mathematics

Lecture IIIDeep Learning: An Introduction for Applied Mathematicians

M. Kutuk (METU) Introduction to Deep Learning 1 / 37

Page 2: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Lecture III–Outline

1 Motivation

2 General Overview of Deep Learning

3 Matlab Example

M. Kutuk (METU) Introduction to Deep Learning 2 / 37

Page 3: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

1 Motivation

2 General Overview of Deep Learning

3 Matlab Example

M. Kutuk (METU) Introduction to Deep Learning 3 / 37

Page 4: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Introduction

Deep learning is a powerful function that mimics the human brain in terms of

its working style for decision making with data processing and pattern

creation.

Figure 1: Classification with deep learning1

1https://databricks.com/blog/2017/06/06/databricks-vision-simplify-large-scale-deep-

learning.htmlM. Kutuk (METU) Introduction to Deep Learning 4 / 37

Page 5: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Introduction

Deep learning is widely used areas such that

Image/Text Classification,

Speech Recognition2,

Image Segmentation3.

2https://medium.com/@ageitgey/machine-learning-is-fun-part-6-how-to-do-speech-

recognition-with-deep-learning-28293c162f7a3http://blog.qure.ai/notes/semantic-segmentation-deep-learning-review

M. Kutuk (METU) Introduction to Deep Learning 5 / 37

Page 6: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Introduction & Goals

There are also some areas of mathematics that uses deep learning:

Approximation Theory,

Numerical Optimization,

Linear Algebra.

Our aims are

to give brief introduction for deep learning,

to define some terms related to this area,

to apply deep learning for a small example in Matlab.

M. Kutuk (METU) Introduction to Deep Learning 6 / 37

Page 7: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

1 Motivation

2 General Overview of Deep Learning

3 Matlab Example

M. Kutuk (METU) Introduction to Deep Learning 7 / 37

Page 8: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Scheme of the Supervised Learning

M. Kutuk (METU) Introduction to Deep Learning 8 / 37

Page 9: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Example Problem

A map, which shows the oil drilling sites, is given below. The circles (class A)

denote the successful outcome while crosses (class B) are the unsuccessful

outcome.

Figure 2: Oil Drilling Sites

M. Kutuk (METU) Introduction to Deep Learning 9 / 37

Page 10: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

A Simple Model:The Perceptron

Figure 3: MIT’s 6.S191:Introduction to Deep Learning Course4

4http://introtodeeplearning.com/2019/materials/2019 6S191 L1.pdfM. Kutuk (METU) Introduction to Deep Learning 10 / 37

Page 11: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

A Simple Model:The Perceptron

M. Kutuk (METU) Introduction to Deep Learning 11 / 37

Page 12: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

A Simple Model:The Perceptron

M. Kutuk (METU) Introduction to Deep Learning 12 / 37

Page 13: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

A Simple Model:The Perceptron

M. Kutuk (METU) Introduction to Deep Learning 13 / 37

Page 14: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Activation Functions:Sigmoid Function

M. Kutuk (METU) Introduction to Deep Learning 14 / 37

Page 15: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Activation Functions:Hyperbolic Tangent Function

M. Kutuk (METU) Introduction to Deep Learning 15 / 37

Page 16: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Activation Functions:ReLU Function

M. Kutuk (METU) Introduction to Deep Learning 16 / 37

Page 17: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Activation Functions:Leaky ReLU Function

M. Kutuk (METU) Introduction to Deep Learning 17 / 37

Page 18: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Multilayer Perceptron(MLP)

M. Kutuk (METU) Introduction to Deep Learning 18 / 37

Page 19: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Multilayer Perceptron(MLP)

The activation function can be written as

σ(WX + b)

Input of these model can be shown as

X =

x1

x2

M. Kutuk (METU) Introduction to Deep Learning 19 / 37

Page 20: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Multilayer Perceptron(MLP)

The weight matrix and bias vector of the 2nd layer can be shown as

W [2] =

W 31 W 32

W 41 W 42

b[2] =

b3

b4

The output of the 2nd layer can be obtained asx3

x4

= σ

(W 31x1 + W 32x2 + b3

W 41x1 + W 42x2 + b4

)

M. Kutuk (METU) Introduction to Deep Learning 20 / 37

Page 21: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Multilayer Perceptron(MLP)

The activation function of the 3rd layer can be written as

σ(W [3]σ(W [2]X + b[2]) + b[3])

The weight matrix and bias vector of the second layer can be shown as

W [3] =

W 53 W 54

W 63 W 64

W 73 W 74

b[3] =

b5

b6

b7

M. Kutuk (METU) Introduction to Deep Learning 21 / 37

Page 22: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Multilayer Perceptron(MLP)

The output of the 3rd layer can be obtained asx5

x6

x7

= σ

(W 53x3 + W 54x4 + b5

W 63x3 + W 64x4 + b6

W 73x3 + W 74x4 + b7

)

M. Kutuk (METU) Introduction to Deep Learning 22 / 37

Page 23: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Multilayer Perceptron(MLP)

The activation function of the 4th layer can be written as

σ(W [4]σ(W [3]σ(W [2]X + b[2]) + b[3]) + b[4])

The weight matrix and bias vector of the third layer can be shown as

W [4] =

W 85 W 86 W 87

W 95 W 96 W 97

b[4] =

b8

b9

M. Kutuk (METU) Introduction to Deep Learning 23 / 37

Page 24: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Multilayer Perceptron(MLP)

The output of the 4th layer can be obtained asx8

x9

= σ

(W 85x5 + W 86x6 + W 87x7 + b8

W 95x5 + W 96x6 + W 97x7 + b9

)

M. Kutuk (METU) Introduction to Deep Learning 24 / 37

Page 25: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Multilayer Perceptron(MLP)

As a result, the overall model can be summarized as

Input:

x1

x2

Output:F (X ) =

x8

x9

=⇒ F : R2 −→ R2,

and this model includes totally 23 unknown parameters (16 weight

parameters, 7 bias parameters).

M. Kutuk (METU) Introduction to Deep Learning 25 / 37

Page 26: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Multilayer Perceptron(MLP)

Aim is to produce a classifier by optimizing over all unknown parameters.

We will require F (x) to be close to [1, 0]T for data points in class A and

close to [0, 1]T for data points in class B. Then, the classifier is:

class A, if F 1(x) > F 2(x)

class B, if F 1(x) < F 2(x)

This requirement on F is specified through a cost function.

y(x i ) =

{[1 0

]T, if x i is in class A[

0 1]T, if x i is in class B

M. Kutuk (METU) Introduction to Deep Learning 26 / 37

Page 27: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Cost Function

Then the cost function can be shown as

Cost(W [2],W [3],W [4], b[2], b[3], b[4]) =1

10

10∑i=1

1

2||y(x i )− F (x i )||22

where y(x i ) is the ground truth (labeled data) and F (x i ) is the model output.

This is a quadratic cost function (aka L2-loss function).

Choosing the weights and biases in a way that minimizes the cost function is

referred to as training the network.

M. Kutuk (METU) Introduction to Deep Learning 27 / 37

Page 28: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Steepest Descent Method

The unknown parameters can be stored as a single vector that we call p.

For our example, p ∈ R23.

Generally, p ∈ Rs and Cost : Rs −→ R.

The classical method is steepest descent or gradient descent.

Cost(p + ∆p) ≈ Cost(p) +s∑

r=1

∂Cost(p)

∂pr∆pr =⇒ From Taylor Series Exp.

(∇Cost(p))r =∂Cost(p)

∂pr=⇒ Cost(p + ∆p) ≈ Cost(p) +∇Cost(p)T∆p

We have to choose ∆p such that ∇Cost(p)T∆p < 0.

Therefore, we should choose ∆p to lie in the direction −∇Cost(p). We can

obtain

pk+1 = pk − η∇Cost(p)

where η is the stepsize (aka learning rate).

M. Kutuk (METU) Introduction to Deep Learning 28 / 37

Page 29: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Steepest Descent Method

The cost function for individual terms is

C (x i ) =1

2||y(x i )− a[L](x i )||22

∇Cost(p) =1

N

N∑i=1

∇C (x i )(p)

When there are a large number of parameters and a large number of training

points, computing the gradient vector ∇Cost(p) at every iteration of the

steepest descent method can be expensive.

M. Kutuk (METU) Introduction to Deep Learning 29 / 37

Page 30: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Stochastic Gradient Descent(SGD)

An alternative way is to replace the mean of the individual gradients over all

training points by the gradient at a single, randomly chosen, training point.

1 Choose an integer i uniformly at random from {1,2,3,...,N}2 Update p −→ p − η∇C(x i )(p)

As the iteration proceeds, the method sees more training points. So the cost

decreases after a while.

M. Kutuk (METU) Introduction to Deep Learning 30 / 37

Page 31: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Backpropagation

An application of the chain rule.

To compute the gradient of the error in the output layer, one has to compute

the gradient iteratively layer by layer from the output layer to the input layer.

M. Kutuk (METU) Introduction to Deep Learning 31 / 37

Page 32: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Backpropagation

Let’s consider the example given below

o = g(neto) = g

(7∑

i=5

W oix i

)∂E

∂W oi=∂E

∂o

∂o

∂neto

∂neto

∂W oi

where ∂neto∂W oi

= x i and ∂o∂neto

= g ′ (derivative of the activation function).

M. Kutuk (METU) Introduction to Deep Learning 32 / 37

Page 33: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Oil Drilling Sites Problem

Let’s turn back to our problem and try to solve it in Matlab by using

4-layered MLP which is shown before.

M. Kutuk (METU) Introduction to Deep Learning 33 / 37

Page 34: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

References & Useful Links

C.F. Higham, and D.J. Higham, ”Deep Learning: An Introduction for Applied

Mathematicians”, SIAM Review, 2019.

MIT 6.S191:Introduction to Deep Learning website,

http://introtodeeplearning.com

http://playground.tensorflow.org

https://www.wikiwand.com/en/Backpropagation

M. Kutuk (METU) Introduction to Deep Learning 34 / 37

Page 35: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Deep Learning Related Courses

CENG562 - Machine Learning

CENG783 - Deep Learning

CENG564 - Pattern Recognition

MMI727 - Deep Learning: Methods and Applications

EE583 - Pattern Recognition

IAM557 - Statistical Learning and Simulation

M. Kutuk (METU) Introduction to Deep Learning 35 / 37

Page 36: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

Optimization Related Courses

MATH402 - Introduction to Optimization

IAM566 - Numerical Optimization

EE553 - Optimization

M. Kutuk (METU) Introduction to Deep Learning 36 / 37

Page 37: Scientific Computing Lecture Series Introduction to Deep Learning · There are also some areas of mathematics that uses deep learning: Approximation Theory, Numerical Optimization,

For More Information

http://iam.metu.edu.tr/scientific-computing

https://iam.metu.edu.tr/scientific-computing-lecture-series

https://www.facebook.com/SCiamMETU/

...thank you for your attention !

M. Kutuk (METU) Introduction to Deep Learning 37 / 37