Deep Learning
Hung-yi Lee (李宏毅)

Source: NTU Speech Processing Laboratory, speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/DL.pdf


Page 1:

Deep Learning
Hung-yi Lee (李宏毅)

Page 2:

Deep learning attracts a lot of attention.
• I believe you have seen many exciting results before.

Deep learning trends at Google. Source: SIGMOD 2016, Jeff Dean

Page 3:

Ups and downs of Deep Learning

• 1958: Perceptron (linear model)
• 1969: Perceptron has limitations
• 1980s: Multi-layer perceptron
  • Not significantly different from today's DNNs
• 1986: Backpropagation
  • Usually more than 3 hidden layers did not help
• 1989: 1 hidden layer is "good enough", so why deep?
• 2006: RBM initialization
• 2009: GPU
• 2011: Started to become popular in speech recognition
• 2012: Won the ILSVRC image competition
• 2015.2: Image recognition surpassing human-level performance
• 2016.3: AlphaGo beats Lee Sedol
• 2016.10: Speech recognition system as good as humans

Page 4:

Three Steps for Deep Learning

Step 1: define a set of functions (Neural Network)
Step 2: goodness of function
Step 3: pick the best function

Deep Learning is so simple ……

Page 5:

Neural Network

"Neuron": a weighted sum of the inputs plus a bias, z = a₁w₁ + ⋯ + a_K w_K + b, passed through an activation function to give the output a = σ(z).

A neural network is many neurons connected together; different connections lead to different network structures.

Network parameter 𝜃: all the weights and biases in the "neurons"

Page 6:

Fully Connect Feedforward Network

Sigmoid function as the activation: σ(z) = 1 / (1 + e⁻ᶻ).

Example: input (1, −1).
• Neuron 1: weights (1, −2), bias 1 → z = 1·1 + (−1)·(−2) + 1 = 4 → σ(4) = 0.98
• Neuron 2: weights (−1, 1), bias 0 → z = 1·(−1) + (−1)·1 + 0 = −2 → σ(−2) = 0.12
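To make the computation concrete, here is a minimal Python sketch of these two sigmoid neurons (NumPy and the variable names are my own; the weights, biases, and input are the ones on the slide):

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes any real z into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, -1.0])        # the slide's input vector
W = np.array([[1.0, -2.0],       # row 1: weights of neuron 1
              [-1.0, 1.0]])      # row 2: weights of neuron 2
b = np.array([1.0, 0.0])         # biases of the two neurons

z = W @ x + b                    # weighted sums: [ 4. -2.]
print(np.round(sigmoid(z), 2))   # outputs:       [0.98 0.12]
```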

Page 7:

Fully Connect Feedforward Network

Following the input (1, −1) through all three layers:
• Layer 1: weights (1, −2) and (−1, 1), biases (1, 0) → z = (4, −2) → outputs (0.98, 0.12)
• Layer 2: weights (2, −1) and (−2, −1), biases (0, 0) → outputs (0.86, 0.11)
• Layer 3: weights (3, −1) and (−1, 4), biases (−2, 2) → outputs (0.62, 0.83)

Page 8:

Fully Connect Feedforward Network

The same network on input (0, 0):
• Layer 1: z = (1, 0) → outputs (0.73, 0.5)
• Layer 2: outputs (0.72, 0.12)
• Layer 3: outputs (0.51, 0.85)

So 𝑓(1, −1) = (0.62, 0.83) and 𝑓(0, 0) = (0.51, 0.85).

This is a function: input vector in, output vector out.

Given a network structure, we have defined a function set.
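As a sanity check on the two forward passes above, here is a short sketch of this exact 2-2-2 network (weights and biases read off the slides; NumPy assumed):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One (W, b) pair per layer; each row of W is one neuron's weights.
layers = [
    (np.array([[1.0, -2.0], [-1.0, 1.0]]),  np.array([1.0, 0.0])),
    (np.array([[2.0, -1.0], [-2.0, -1.0]]), np.array([0.0, 0.0])),
    (np.array([[3.0, -1.0], [-1.0, 4.0]]),  np.array([-2.0, 2.0])),
]

def f(x):
    a = np.asarray(x, dtype=float)
    for W, b in layers:                 # forward pass, layer by layer
        a = sigmoid(W @ a + b)
    return a

print(np.round(f([1, -1]), 2))          # [0.62 0.83]
print(np.round(f([0, 0]), 2))           # [0.51 0.85]
```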

Page 9:

Fully Connect Feedforward Network

Input Layer (x₁, x₂, …, x_N) → Hidden Layers (Layer 1, Layer 2, …, Layer L) → Output Layer (y₁, y₂, …, y_M)

Every node is a neuron, and each layer is fully connected to the next.

Page 10:

Deep = Many hidden layers

AlexNet (2012): 8 layers, 16.4% error
VGG (2014): 19 layers, 7.3% error
GoogleNet (2014): 22 layers, 6.7% error

http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf

Page 11:

Deep = Many hidden layers

AlexNet (2012): 16.4% → VGG (2014): 7.3% → GoogleNet (2014): 6.7% → Residual Net (2015): 152 layers, 3.57% error

(For scale: Taipei 101 is only 101 stories tall.)

Residual Net uses a special structure. Ref: https://www.youtube.com/watch?v=dxB6299gpvI

Page 12:

Matrix Operation

The first layer of the earlier example, written as one matrix operation:

𝜎( [1 −2; −1 1] [1; −1] + [1; 0] ) = 𝜎( [4; −2] ) = [0.98; 0.12]

Each row of the weight matrix holds one neuron's weights; the input, bias, and outputs y₁, y₂ are column vectors.

Page 13:

Neural Network

With weight matrices W¹, W², …, Wᴸ and bias vectors b¹, b², …, bᴸ, the input x = (x₁, …, x_N) maps to the output y = (y₁, …, y_M) layer by layer:

a¹ = 𝜎(W¹ x + b¹)
a² = 𝜎(W² a¹ + b²)
⋮
y = 𝜎(Wᴸ aᴸ⁻¹ + bᴸ)

Page 14:

Neural Network

Composing the layers gives the whole network as one function:

y = 𝑓(x) = 𝜎(Wᴸ … 𝜎(W² 𝜎(W¹ x + b¹) + b²) … + bᴸ)

Using parallel computing techniques to speed up the matrix operations.
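The nested form is nothing but repeated matrix products, which is what makes parallel hardware effective: stacking many input vectors as the columns of one matrix evaluates the network on a whole batch with the same code. A sketch under that assumption (the sizes here are arbitrary, chosen only for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(layers, X):
    # X holds one input vector per column, so W @ A below is a single
    # matrix-matrix product per layer instead of a loop over examples.
    A = X
    for W, b in layers:
        A = sigmoid(W @ A + b[:, None])   # broadcast the bias across columns
    return A

rng = np.random.default_rng(0)
sizes = [256, 100, 100, 10]               # input dim -> hidden -> output dim
layers = [(rng.standard_normal((m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

X = rng.standard_normal((256, 32))        # a batch of 32 inputs
print(forward(layers, X).shape)           # (10, 32): one column per input
```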

Page 15:

Output Layer as Multi-Class Classifier

x (x₁, x₂, …) → Input Layer → Hidden Layers → Output Layer (Softmax) → y₁, y₂, …, y_M

The hidden layers act as a feature extractor, replacing feature engineering; the output layer, with Softmax, acts as a multi-class classifier.
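The slide names Softmax without spelling it out; for reference, a minimal sketch (the max-subtraction is a standard numerical-stability trick, my addition rather than the slide's):

```python
import numpy as np

def softmax(z):
    # y_i = exp(z_i) / sum_j exp(z_j); subtracting max(z) avoids
    # overflow without changing the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(np.round(softmax(np.array([3.0, 1.0, -3.0])), 2))
# [0.88 0.12 0.  ] -- non-negative and summing to 1, a valid distribution
```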

Page 16:

Example Application

Input: a 16 × 16 = 256-pixel image of a handwritten digit, flattened to x₁, x₂, …, x₂₅₆ (ink → 1, no ink → 0).

Output: y₁, y₂, …, y₁₀, where each dimension represents the confidence of a digit (y₁: is "1", y₂: is "2", …, y₁₀: is "0").

E.g. with y₁ = 0.1, y₂ = 0.7, …, y₁₀ = 0.2, the largest confidence is y₂, so the image is "2".
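The decision rule the slide describes is just an argmax over the ten confidences. A sketch (the function names and the stub network are illustrative, not from the slides):

```python
import numpy as np

def classify_digit(image, network):
    # image: 16x16 array, 1 where there is ink and 0 elsewhere.
    x = image.reshape(256)                    # flatten to the 256-dim input
    y = network(x)                            # 10 confidences, one per class
    digits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]   # slide order: y1 is "1", ..., y10 is "0"
    return digits[int(np.argmax(y))]

# Stub network that always answers "2", just to exercise the rule:
stub = lambda x: np.eye(10)[1]
print(classify_digit(np.zeros((16, 16)), stub))   # 2
```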

Page 17:

Example Application

• Handwriting Digit Recognition: the machine maps an image to a label, e.g. "2".

What is needed is a function (the Neural Network):
Input: 256-dim vector (x₁, …, x₂₅₆)
Output: 10-dim vector (y₁: is "1", y₂: is "2", …, y₁₀: is "0")

Page 18:

Example Application

Input Layer (x₁, x₂, …, x_N) → Layer 1 → Layer 2 → … → Layer L → Output Layer (y₁: is "1", y₂: is "2", …, y₁₀: is "0") → "2"

The network structure gives a function set containing the candidates for Handwriting Digit Recognition.

You need to decide the network structure so that a good function is in your function set.

Page 19:

FAQ

• Q: How many layers? How many neurons for each layer?
  A: Trial and error + intuition
• Q: Can the structure be automatically determined?
  A: E.g. Evolutionary Artificial Neural Networks
• Q: Can we design the network structure?
  A: Yes, e.g. Convolutional Neural Network (CNN)

Page 20:

Three Steps for Deep Learning

Step 1: define a set of functions (Neural Network)
Step 2: goodness of function
Step 3: pick the best function

Deep Learning is so simple ……

Page 21:

Loss for an Example

Given a set of parameters, feed an example of "1" into the network (x₁, x₂, …, x₂₅₆ → … → Softmax → y₁, y₂, …, y₁₀) and compare the output y with the target ŷ: for "1", the target is ŷ₁ = 1, ŷ₂ = 0, …, ŷ₁₀ = 0.

The loss is the cross entropy between output y and target ŷ:

𝑙(y, ŷ) = −Σᵢ₌₁¹⁰ ŷᵢ ln yᵢ
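A minimal sketch of this loss in code (NumPy assumed; the small eps guards against ln 0 and is my addition, not the slide's):

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    # l(y, y_hat) = -sum_i y_hat_i * ln(y_i)
    return -np.sum(y_hat * np.log(y + eps))

y_hat = np.eye(10)[0]                       # target for "1": (1, 0, ..., 0)
y = np.full(10, 0.02); y[0] = 0.82          # some network output (sums to 1)
print(round(cross_entropy(y, y_hat), 3))    # 0.198, i.e. -ln(0.82)
```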

Page 22:

Total Loss

For all training data x¹, x², x³, …, xᴺ: each xⁿ goes through the NN to give yⁿ, with per-example loss 𝑙ⁿ against its target ŷⁿ.

Total Loss: L = Σₙ₌₁ᴺ 𝑙ⁿ

Find the network parameters 𝜽* that minimize the total loss L; equivalently, find a function in the function set that minimizes total loss L.
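In code, the total loss is just the per-example losses summed over the training set; a sketch reusing the cross entropy above (names again illustrative):

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    return -np.sum(y_hat * np.log(y + eps))

def total_loss(network, xs, y_hats):
    # L = sum over n of l(y^n, y_hat^n), where y^n = network(x^n)
    return sum(cross_entropy(network(x), y_hat)
               for x, y_hat in zip(xs, y_hats))
```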

Page 23:

Three Steps for Deep Learning

Step 1: define a set of functions (Neural Network)
Step 2: goodness of function
Step 3: pick the best function

Deep Learning is so simple ……

Page 24:

Gradient Descent

Start from initial values 𝜃 (e.g. w₁ = 0.2, w₂ = −0.1, b₁ = 0.3, …). For each weight and bias, compute the partial derivative of L and move against it (𝜇 is the learning rate):

• w₁: compute ∂L/∂w₁, then w₁ ← w₁ − 𝜇 ∂L/∂w₁ (0.2 → 0.15)
• w₂: compute ∂L/∂w₂, then w₂ ← w₂ − 𝜇 ∂L/∂w₂ (−0.1 → 0.05)
• b₁: compute ∂L/∂b₁, then b₁ ← b₁ − 𝜇 ∂L/∂b₁ (0.3 → 0.2)
• ……

Collecting all the partial derivatives gives the gradient:

𝛻L = (∂L/∂w₁, ∂L/∂w₂, …, ∂L/∂b₁, …)ᵀ
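One update step, as a sketch (grad_L stands in for whatever computes the gradient, e.g. backpropagation, introduced later; mu is the learning rate):

```python
import numpy as np

def gradient_descent_step(theta, grad_L, mu=0.1):
    # theta <- theta - mu * gradient, applied to all parameters at once
    return theta - mu * grad_L(theta)

# Toy check: L(theta) = ||theta||^2 has gradient 2*theta, so every
# step multiplies theta by (1 - 2*mu) and walks it toward the minimum.
theta = np.array([0.2, -0.1, 0.3])
for _ in range(3):
    theta = gradient_descent_step(theta, lambda t: 2 * t)
print(np.round(theta, 3))   # [ 0.102 -0.051  0.154]
```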

Page 25:

Gradient Descent

Repeat the update: recompute each partial derivative at the new 𝜃 and step again.

• w₁: 0.2 → 0.15 → 0.09 → ……
• w₂: −0.1 → 0.05 → 0.15 → ……
• b₁: 0.3 → 0.2 → 0.10 → ……

Page 26:

Gradient Descent

This is the "learning" of machines in deep learning ……

Even AlphaGo uses this approach.

What people imagine …… vs. what it actually is …..

I hope you are not too disappointed :p

Page 27:

Backpropagation

• Backpropagation: an efficient way to compute ∂L/∂w in a neural network

• libdnn: a toolkit developed by NTU student 周伯威

Ref: http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/DNN%20backprop.ecm.mp4/index.html
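The mechanics of backpropagation are left to the referenced video. As an illustration of what it must produce, here is the slow finite-difference definition of ∂L/∂𝜃ᵢ that a backprop implementation is commonly checked against (a sketch, with a toy loss of my own):

```python
import numpy as np

def numerical_gradient(L, theta, h=1e-6):
    # dL/dtheta_i ~ (L(theta + h*e_i) - L(theta - h*e_i)) / (2h).
    # Two loss evaluations per parameter: fine for checking backprop,
    # far too slow for training -- hence backpropagation.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = h
        grad[i] = (L(theta + e) - L(theta - e)) / (2 * h)
    return grad

L_toy = lambda t: np.sum(t ** 2)    # toy loss with known gradient 2*theta
print(np.round(numerical_gradient(L_toy, np.array([0.2, -0.1, 0.3])), 3))
# [ 0.4 -0.2  0.6]
```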

Page 28:

Three Steps for Deep Learning

Step 1: define a set of functions (Neural Network)
Step 2: goodness of function
Step 3: pick the best function

Deep Learning is so simple ……
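With all three steps on the table, here is one compact end-to-end sketch putting them together: a tiny one-hidden-layer network trained by gradient descent, with gradients derived by backpropagation, on a made-up toy dataset. Everything here (sizes, data, learning rate) is illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
softmax = lambda z: np.exp(z - z.max()) / np.exp(z - z.max()).sum()

# Step 1: define a set of functions -- a 2-4-2 network; the parameters
# theta = (W1, b1, W2, b2) index the functions in the set.
W1, b1 = 0.5 * rng.standard_normal((4, 2)), np.zeros(4)
W2, b2 = 0.5 * rng.standard_normal((2, 4)), np.zeros(2)

# Toy data: class 1 when x1 + x2 < 0, class 0 otherwise.
xs = rng.standard_normal((200, 2))
labels = (xs.sum(axis=1) < 0).astype(int)

mu = 0.1                                    # learning rate
for epoch in range(30):
    for x, label in zip(xs, labels):
        y_hat = np.eye(2)[label]            # one-hot target
        a = sigmoid(W1 @ x + b1)            # hidden layer
        y = softmax(W2 @ a + b2)            # output layer
        # Step 2: goodness of function is l = -sum_i y_hat_i ln y_i;
        # Step 3: pick the best function by gradient descent, with the
        # gradients below obtained by backpropagation for this loss.
        d2 = y - y_hat                      # dl/dz at the output layer
        d1 = (W2.T @ d2) * a * (1 - a)      # dl/dz at the hidden layer
        W2 -= mu * np.outer(d2, a); b2 -= mu * d2
        W1 -= mu * np.outer(d1, x); b1 -= mu * d1

pred = [np.argmax(softmax(W2 @ sigmoid(W1 @ x + b1) + b2)) for x in xs]
print("training accuracy:", np.mean(np.array(pred) == labels))
```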

Page 29:

Acknowledgment

• Thanks to Victor Chen for spotting the typos on the slides.