IN5400 Machine learning for image classification

Lecture 11: Recurrent neural networks

March 18, 2020

Tollef Jahren



Page 2:

Outline

• Overview (feed-forward and convolutional neural networks)

• Vanilla Recurrent Neural Network (RNN)

• Input-output structure of RNNs

• Training Recurrent Neural Networks

• Simple examples

• Advanced models

• Advanced examples

Page 3:

About today

Goals:

• Understand how to build a recurrent neural network

• Understand why long-range dependencies are a challenge for recurrent neural networks

Page 4:

Readings

• Video:

– cs231n: Lecture 10 | Recurrent Neural Networks

• Read:

– The Unreasonable Effectiveness of Recurrent Neural Networks

• Optional:

– Video: cs224d L8: Recurrent Neural Networks and Language Models

– Video: cs224d L9: Machine Translation and Advanced Recurrent LSTMs and GRUs

Page 5:

Progress

• Overview (feed-forward and convolutional neural networks)

• Vanilla Recurrent Neural Network (RNN)

• Input-output structure of RNNs

• Training Recurrent Neural Networks

• Simple examples

• Advanced models

• Advanced examples

Page 6:

Overview

• What have we learnt so far?

– Fully connected neural networks

– Convolutional neural networks

• What worked well with these?

– Fully connected neural networks work well when the input data is structured

– Convolutional neural networks work well on images

• What does not work well?

– Processing data of unknown length

Page 7:

Progress

• Overview (feed-forward and convolutional neural networks)

• Vanilla Recurrent Neural Network (RNN)

• Input-output structure of RNNs

• Training Recurrent Neural Networks

• Simple examples

• Advanced models

• Advanced examples

Page 8:

Recurrent Neural Network (RNN)

• Takes a new input

• Updates the hidden state

• Reuses the same weights at every step

• Gives a new output

[Figure: RNN cell with input 𝑥𝑡 and output 𝑦𝑡]

Page 9:

Recurrent Neural Network (RNN)

• Unrolled view

• 𝑥𝑡: the input vector

• ℎ𝑡: the hidden state of the RNN

• 𝑦𝑡: the output vector

Page 10:

Recurrent Neural Network (RNN)

• Recurrence formula:

– Input vector: 𝑥𝑡

– Hidden state vector: ℎ𝑡

Note: We use the same function and parameters for every “time” step

Page 11:

(Vanilla) Recurrent Neural Network

• Input vector: 𝑥𝑡

• Hidden state vector: ℎ𝑡

• Output vector: 𝑦𝑡

• Weight matrices: 𝑊ℎℎ, 𝑊𝑥ℎ, 𝑊ℎ𝑦

• Bias vectors: 𝑏ℎ, 𝑏𝑦

General form:

ℎ𝑡 = 𝑓𝑊(ℎ𝑡−1, 𝑥𝑡)

Vanilla RNN (simplest form):

ℎ𝑡 = tanh(𝑊ℎℎℎ𝑡−1 + 𝑊𝑥ℎ𝑥𝑡 + 𝑏ℎ)

𝑦𝑡 = 𝑊ℎ𝑦ℎ𝑡 + 𝑏𝑦
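The recurrence above can be sketched in a few lines of NumPy. This is a minimal illustration only; the dimensions and the random initialization are assumptions, not values from the slides.

```python
import numpy as np

# Minimal sketch of the vanilla RNN recurrence:
#   h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h),  y_t = W_hy h_t + b_y
rng = np.random.default_rng(0)
D, H, O = 4, 3, 4                    # input, hidden, output sizes (assumed)
W_hh = rng.normal(size=(H, H)) * 0.1
W_xh = rng.normal(size=(H, D)) * 0.1
W_hy = rng.normal(size=(O, H)) * 0.1
b_h, b_y = np.zeros(H), np.zeros(O)

def rnn_step(h_prev, x):
    """One recurrence step: returns the new hidden state and the output."""
    h = np.tanh(W_hh @ h_prev + W_xh @ x + b_h)
    y = W_hy @ h + b_y
    return h, y

h = np.zeros(H)                      # initial hidden state
for x in np.eye(D):                  # a toy sequence of one-hot inputs
    h, y = rnn_step(h, x)            # same weights reused at every step
```

Note that only `h` is carried between steps; the weights are shared across all of them, which is what makes the network "recurrent".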

Pages 12–18:

RNN: Computational Graph

[Figure-only slides: the unrolled computational graph is built up step by step.]

Page 19:

Example: Language model

• Task: Predicting the next character

• Training sequence: “hello”

• Vocabulary: [h, e, l, o]

• Encoding: one-hot


Page 21:

RNN: Predicting the next character

• Task: Predicting the next character

• Training sequence: “hello”

• Vocabulary: [h, e, l, o]

• Encoding: one-hot

• Model:

ℎ𝑡 = tanh(𝑊ℎℎℎ𝑡−1 + 𝑊𝑥ℎ𝑥𝑡)

𝑦𝑡 = 𝑊ℎ𝑦ℎ𝑡
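The forward pass of this character model can be sketched as below. The weights are untrained random values and the hidden size is an arbitrary assumption; the point is only the one-hot encoding and the recurrence.

```python
import numpy as np

# Sketch of the "hello" setup: one-hot encode the vocabulary [h, e, l, o]
# and run h_t = tanh(W_hh h_{t-1} + W_xh x_t), y_t = W_hy h_t.
vocab = ['h', 'e', 'l', 'o']
idx = {c: i for i, c in enumerate(vocab)}
V, H = len(vocab), 3                 # hidden size H is an assumption

rng = np.random.default_rng(1)
W_xh = rng.normal(size=(H, V)) * 0.1
W_hh = rng.normal(size=(H, H)) * 0.1
W_hy = rng.normal(size=(V, H)) * 0.1

h = np.zeros(H)
logits = []
for c in "hell":                     # inputs; the targets would be "ello"
    x = np.zeros(V)
    x[idx[c]] = 1.0                  # one-hot encoding of the character
    h = np.tanh(W_hh @ h + W_xh @ x)
    logits.append(W_hy @ h)          # unnormalized scores for the next char
```

Training would compare each score vector against the next character in the sequence (e.g. with a softmax cross-entropy loss).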

Page 22:

RNN: Predicting the next character

• Vocabulary: [h, e, l, o]

• At test time, we can sample from the model to synthesize new text.
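Sampling can be sketched as follows, again with assumed untrained weights: the output scores are turned into a probability distribution with a softmax, the next character is drawn from it, and it is fed back in as the next input.

```python
import numpy as np

# Test-time sampling sketch for the [h, e, l, o] character model.
vocab = ['h', 'e', 'l', 'o']
V, H = len(vocab), 3
rng = np.random.default_rng(2)
W_xh = rng.normal(size=(H, V)) * 0.1
W_hh = rng.normal(size=(H, H)) * 0.1
W_hy = rng.normal(size=(V, H)) * 0.1

def softmax(z):
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

h = np.zeros(H)
ch = 'h'                             # seed character
out = [ch]
for _ in range(5):
    x = np.zeros(V)
    x[vocab.index(ch)] = 1.0
    h = np.tanh(W_hh @ h + W_xh @ x)
    p = softmax(W_hy @ h)            # distribution over the next character
    ch = vocab[rng.choice(V, p=p)]   # sample rather than take the argmax
    out.append(ch)
text = "".join(out)
```

Sampling (instead of always taking the most likely character) is what makes the generated text vary from run to run.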


Page 26:

Progress

• Overview (feed-forward and convolutional neural networks)

• Vanilla Recurrent Neural Network (RNN)

• Input-output structure of RNNs

• Training Recurrent Neural Networks

• Simple examples

• Advanced models

• Advanced examples

Page 27:

Input-output structure of RNNs

• One-to-one

• One-to-many

• Many-to-one

• Many-to-many

• Many-to-many (encoder-decoder)

Page 28:

RNN: One-to-one

Normal feed-forward neural network

• Task:

– Predict housing prices

Page 29:

RNN: One-to-many

Task:

• Image Captioning (Image → Sequence of words)

Page 30:

RNN: Many-to-one

Tasks:

• Sentiment classification

• Video classification

Page 31:

RNN: Many-to-many

Tasks:

• Video classification on frame level

• Text analysis (nouns, verbs)

Page 32:

RNN: Many-to-many (encoder-decoder)

Task:

• Machine Translation (English → French)

Page 33:

Progress

• Overview (feed-forward and convolutional neural networks)

• Vanilla Recurrent Neural Network (RNN)

• Input-output structure of RNNs

• Training Recurrent Neural Networks

• Simple examples

• Advanced models

• Advanced examples

Page 34:

Training Recurrent Neural Networks

• The challenge in training a recurrent neural network is to preserve long-range dependencies.

• Vanilla recurrent neural network:

– ℎ𝑡 = 𝑓(𝑊ℎℎℎ𝑡−1 + 𝑊ℎ𝑥𝑥𝑡 + 𝑏)

– 𝑓 = tanh() can easily cause vanishing gradients.

– 𝑓 = relu() can easily cause exploding values and/or gradients.

• Finite memory available

Page 35:

Exploding or vanishing gradients

• tanh() solves the exploding value problem.

• tanh() does NOT solve the exploding gradient problem; think of a scalar input and a scalar hidden state:

ℎ𝑡 = tanh(𝑊ℎℎℎ𝑡−1 + 𝑊ℎ𝑥𝑥𝑡 + 𝑏)

𝜕ℎ𝑡/𝜕ℎ𝑡−1 = (1 − tanh²(𝑊ℎℎℎ𝑡−1 + 𝑊ℎ𝑥𝑥𝑡 + 𝑏)) ⋅ 𝑊ℎℎ

The gradient can explode/vanish exponentially in time (steps):

• If |𝑊ℎℎ| < 1: vanishing gradients

• If |𝑊ℎℎ| > 1: exploding gradients
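The scalar argument can be checked numerically. In this toy sketch the state is kept at zero (x = 0, b = 0) so that tanh′(·) = 1 and the backpropagated product over T steps is exactly 𝑊ℎℎ^T; all values here are illustrative assumptions.

```python
import numpy as np

# Each backprop step multiplies the gradient by (1 - tanh^2(.)) * W_hh.
def gradient_magnitude(W_hh, T=50, x=0.0, W_hx=1.0, b=0.0):
    h, grad = 0.0, 1.0
    for _ in range(T):
        pre = W_hh * h + W_hx * x + b
        h = np.tanh(pre)
        grad *= (1.0 - np.tanh(pre) ** 2) * W_hh   # dh_t / dh_{t-1}
    return abs(grad)

small = gradient_magnitude(0.5)   # |W_hh| < 1: gradient vanishes
large = gradient_magnitude(3.0)   # |W_hh| > 1: gradient explodes
```

With nonzero inputs the picture is messier (tanh saturation also shrinks the factor), but the exponential dependence on 𝑊ℎℎ is the same mechanism.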

Page 36:

Exploding or vanishing gradients (for reference only)

• Conditions for vanishing and exploding gradients for the weight matrix 𝑊ℎℎ:

– If the largest singular value > 1: exploding gradients

– If the largest singular value < 1: vanishing gradients

“On the difficulty of training Recurrent Neural Networks”, Pascanu et al., 2013

Page 37:

Gradient clipping

• To avoid exploding gradients, a simple trick is to set a threshold on the norm of the gradient, 𝜕ℎ𝑡/𝜕ℎ𝑡−1.

• Error surface of a single-hidden-unit RNN

• High-curvature walls

• Solid lines: standard gradient descent trajectories

• Dashed lines: gradients rescaled to a fixed size

“On the difficulty of training Recurrent Neural Networks”, Pascanu et al., 2013
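A common recipe for this trick, sketched below following the norm-clipping idea from Pascanu et al. 2013: if the global L2 norm of the gradients exceeds a threshold, rescale all of them so the norm equals the threshold. The threshold and toy gradients are arbitrary.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their joint L2 norm is <= max_norm."""
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads], total

# Toy "exploding" gradients with a global norm of sqrt(700) ~ 26.5
grads = [np.full((2, 2), 10.0), np.full(3, 10.0)]
clipped, norm_before = clip_by_global_norm(grads, max_norm=5.0)
```

Rescaling the whole gradient (rather than clipping each element) keeps its direction, only shrinking the step size.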

Page 38:

Vanishing gradients - “less of a problem”

• In contrast to feed-forward networks, RNNs will not stop learning despite vanishing gradients.

• The network gets “fresh” inputs at each step, so the weights will still be updated.

• The challenge is to learn long-range dependencies. This can be improved using more advanced architectures.

Page 39:

Backpropagation through time (BPTT)

• Training of a recurrent neural network is done using backpropagation and gradient descent algorithms.

– To calculate gradients, you have to keep the internal RNN values in memory until you get the backpropagating gradient.

– Then what if you are reading a book of 100 million characters?

Page 40:

Truncated backpropagation through time

• The solution is to stop at some point and update the weights as if you were done.


Page 43:

Truncated backpropagation through time

• Advantages:

– Reduces the memory requirement

– Faster parameter updates

• Disadvantage:

– Not able to capture dependencies longer than the truncation length
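The loop structure of truncated BPTT can be sketched as follows. The chunk size of 25 and the stand-in sequence are arbitrary assumptions; the point is that updates happen per chunk while the hidden state carries across chunk boundaries.

```python
import numpy as np

def chunks(seq, k):
    """Split a sequence into consecutive chunks of length k."""
    for i in range(0, len(seq), k):
        yield seq[i:i + k]

sequence = np.arange(100)   # stand-in for a sequence of 100 time steps
h = np.zeros(3)             # hidden state carried across chunk boundaries
updates = 0
for chunk in chunks(sequence, 25):
    # forward + backward pass over this chunk only, starting from h;
    # h is treated as a constant, so no gradient flows further back
    updates += 1            # one parameter update per chunk
```

This is why truncated BPTT cannot learn dependencies longer than the chunk: gradients never cross a chunk boundary, even though the state does.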

Page 44:

Progress

• Overview (feed-forward and convolutional neural networks)

• Vanilla Recurrent Neural Network (RNN)

• Input-output structure of RNNs

• Training Recurrent Neural Networks

• Simple examples

• Advanced models

• Advanced examples

Page 45:

Watch cs231n

• Watch from 31:35 to 39:00

• cs231n: Lecture 10 | Recurrent Neural Networks

Page 46:

Inputting Shakespeare

• Input a lot of text

• Try to predict the next character

• Learns spelling, quotes and punctuation

https://gist.github.com/karpathy/d4dee566867f8291f086

Page 47:

Inputting Shakespeare

https://gist.github.com/karpathy/d4dee566867f8291f086

Page 48:

Inputting LaTeX

Page 49:

Inputting C code (Linux kernel)

Page 50:

Interpreting cell activations

• Semantic meaning of the hidden state vector

Visualizing and Understanding Recurrent Networks


Page 55:

Progress

• Overview (feed-forward and convolutional neural networks)

• Vanilla Recurrent Neural Network (RNN)

• Input-output structure of RNNs

• Training Recurrent Neural Networks

• Simple examples

• Advanced models

• Advanced examples

Page 56:

Gated Recurrent Unit (GRU)

• A more advanced recurrent unit

• Uses gates to control the information flow

• Improves learning of long-term dependencies

Page 57:

Gated Recurrent Unit (GRU)

Vanilla RNN:

• Cell: ℎ𝑡 = tanh(𝑊ℎ𝑥𝑥𝑡 + 𝑊ℎℎℎ𝑡−1 + 𝑏)

GRU:

• Update gate: Γ𝑢 = 𝜎(𝑊𝑢𝑥𝑡 + 𝑈𝑢ℎ𝑡−1 + 𝑏𝑢)

• Reset gate: Γ𝑟 = 𝜎(𝑊𝑟𝑥𝑡 + 𝑈𝑟ℎ𝑡−1 + 𝑏𝑟)

• Candidate cell state: ℎ̃𝑡 = tanh(𝑊𝑥𝑡 + 𝑈(Γ𝑟 ∘ ℎ𝑡−1) + 𝑏)

• Final cell state: ℎ𝑡 = Γ𝑢 ∘ ℎ𝑡−1 + (1 − Γ𝑢) ∘ ℎ̃𝑡

The GRU has the ability to add to and remove from the state, not only “transform” it.

With Γ𝑟 as ones and Γ𝑢 as zeros, the GRU reduces to a vanilla RNN.
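One GRU step following these equations can be sketched as below; the sizes and the random initialization are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

D, H = 4, 3
rng = np.random.default_rng(3)
W_u, W_r, W = [rng.normal(size=(H, D)) * 0.1 for _ in range(3)]
U_u, U_r, U = [rng.normal(size=(H, H)) * 0.1 for _ in range(3)]
b_u, b_r, b = np.zeros(H), np.zeros(H), np.zeros(H)

def gru_step(h_prev, x):
    g_u = sigmoid(W_u @ x + U_u @ h_prev + b_u)        # update gate
    g_r = sigmoid(W_r @ x + U_r @ h_prev + b_r)        # reset gate
    h_cand = np.tanh(W @ x + U @ (g_r * h_prev) + b)   # candidate state
    return g_u * h_prev + (1.0 - g_u) * h_cand         # interpolation

h = gru_step(np.zeros(3), np.ones(4))
```

The last line of `gru_step` is the key difference from a vanilla RNN: the new state is a gated interpolation between the old state and the candidate, so the unit can copy its state unchanged.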

Page 58:

Gated Recurrent Unit (GRU)

• Update gate: Γ𝑢 = 𝜎(𝑊𝑢𝑥𝑡 + 𝑈𝑢ℎ𝑡−1 + 𝑏𝑢)

• Reset gate: Γ𝑟 = 𝜎(𝑊𝑟𝑥𝑡 + 𝑈𝑟ℎ𝑡−1 + 𝑏𝑟)

• Candidate cell state: ℎ̃𝑡 = tanh(𝑊𝑥𝑡 + 𝑈(Γ𝑟 ∘ ℎ𝑡−1) + 𝑏)

• Final cell state: ℎ𝑡 = Γ𝑢 ∘ ℎ𝑡−1 + (1 − Γ𝑢) ∘ ℎ̃𝑡

• The update gate Γ𝑢 can easily be close to 1 due to the sigmoid activation function. Copying the previous hidden state prevents vanishing gradients.

• With Γ𝑢 = 0 and Γ𝑟 = 0, the GRU unit can forget the past.

• In a standard RNN, we transform the hidden state regardless of how useful the input is.

Page 59:

Long Short-Term Memory (LSTM) (for reference only)

GRU:

• Update gate: Γ𝑢 = 𝜎(𝑊𝑢𝑥𝑡 + 𝑈𝑢ℎ𝑡−1 + 𝑏𝑢)

• Reset gate: Γ𝑟 = 𝜎(𝑊𝑟𝑥𝑡 + 𝑈𝑟ℎ𝑡−1 + 𝑏𝑟)

• Candidate cell state: ℎ̃𝑡 = tanh(𝑊𝑥𝑡 + 𝑈(Γ𝑟 ∘ ℎ𝑡−1) + 𝑏)

• Final cell state: ℎ𝑡 = Γ𝑢 ∘ ℎ𝑡−1 + (1 − Γ𝑢) ∘ ℎ̃𝑡

LSTM:

• Forget gate: Γ𝑓 = 𝜎(𝑊𝑓𝑥𝑡 + 𝑈𝑓ℎ𝑡−1 + 𝑏𝑓)

• Input gate: Γ𝑖 = 𝜎(𝑊𝑖𝑥𝑡 + 𝑈𝑖ℎ𝑡−1 + 𝑏𝑖)

• Output gate: Γ𝑜 = 𝜎(𝑊𝑜𝑥𝑡 + 𝑈𝑜ℎ𝑡−1 + 𝑏𝑜)

• Potential cell memory: 𝑐̃𝑡 = tanh(𝑊𝑐𝑥𝑡 + 𝑈𝑐ℎ𝑡−1 + 𝑏𝑐)

• Final cell memory: 𝑐𝑡 = Γ𝑓 ∘ 𝑐𝑡−1 + Γ𝑖 ∘ 𝑐̃𝑡

• Final cell state: ℎ𝑡 = Γ𝑜 ∘ tanh(𝑐𝑡)

LSTM training:

• One can initialize the biases of the forget gate such that the values in the forget gate are close to one.

Page 60:

Multi-layer Recurrent Neural Networks

• Multi-layer RNNs can be used to increase model complexity

• As with feed-forward neural networks, stacking layers creates higher-level feature representations

• Normally 2 or 3 layers deep, not as deep as conv nets

• Can capture more complex relationships in time

Page 61:

Bidirectional recurrent neural network

• The blocks can be vanilla, LSTM or GRU recurrent units

• Real time vs. post-processing

Forward state: ℎ𝑡(→) = 𝑓(𝑊ℎ𝑥𝑥𝑡 + 𝑊ℎℎℎ𝑡−1(→))

Backward state: ℎ𝑡(←) = 𝑓(𝑊ℎ𝑥𝑥𝑡 + 𝑊ℎℎℎ𝑡+1(←))

Combined: ℎ𝑡 = [ℎ𝑡(→), ℎ𝑡(←)]

Output: 𝑦𝑡 = 𝑔(𝑊ℎ𝑦ℎ𝑡 + 𝑏)

(The forward and backward directions have their own weight matrices.)
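A bidirectional pass can be sketched as below: one vanilla RNN runs left-to-right, a second runs right-to-left, and the two hidden states are concatenated at each step. The sizes and weights are illustrative assumptions.

```python
import numpy as np

D, H = 4, 3
rng = np.random.default_rng(4)
Wf_x, Wb_x = rng.normal(size=(H, D)) * 0.1, rng.normal(size=(H, D)) * 0.1
Wf_h, Wb_h = rng.normal(size=(H, H)) * 0.1, rng.normal(size=(H, H)) * 0.1

def run(xs, W_x, W_h):
    """Run a vanilla RNN over xs and return all hidden states."""
    h, states = np.zeros(H), []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h)
        states.append(h)
    return states

xs = list(np.eye(D))                          # toy sequence of length 4
forward = run(xs, Wf_x, Wf_h)                 # processes t = 1 .. T
backward = run(xs[::-1], Wb_x, Wb_h)[::-1]    # processes t = T .. 1
states = [np.concatenate([f, b]) for f, b in zip(forward, backward)]
```

Because the backward pass needs the whole sequence before it can start, this architecture suits post-processing rather than real-time use.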

Page 62:

Bidirectional recurrent neural network

• Example:

– Task: Is the word part of a person’s name?

– Text 1: He said, “Teddy bears are on sale!”

– Text 2: He said, “Teddy Roosevelt was a great President!”

Page 63:

Progress

• Overview (feed-forward and convolutional neural networks)

• Vanilla Recurrent Neural Network (RNN)

• Input-output structure of RNNs

• Training Recurrent Neural Networks

• Simple examples

• Advanced models

• Advanced examples

Page 64:

Image Captioning

• Combining a CNN with an RNN to generate descriptive text about an image

• Training an RNN to interpret top-level features from a CNN

Deep Visual-Semantic Alignments for Generating Image Descriptions

Pages 65–70:

Image Captioning

[Figure-only slides.]

Page 71:

Image Captioning: Results

NeuralTalk2

Page 72:

Image Captioning: Failures

NeuralTalk2

Page 73:

CNN + RNN

• With an RNN on top of CNN features, you capture a larger time horizon

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

Page 74:

Alternatives to RNNs (for reference only)

• Temporal convolutional networks (TCN)

– An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

• Attention-based models

– Attention Is All You Need