Deep-Learning:
Recurrent Neural Networks (RNN)

Pr. Fabien MOUTARDE
Center for Robotics
MINES ParisTech
PSL Université Paris

Fabien.Moutarde@mines-paristech.fr
http://people.mines-paristech.fr/fabien.moutarde
Acknowledgements
While preparing these slides, I drew inspiration and borrowed some slide content from several sources, in particular:
• Fei-Fei Li + J. Johnson + S. Yeung: slides on "Recurrent Neural Networks" from the "Convolutional Neural Networks for Visual Recognition" course at Stanford
http://cs231n.stanford.edu/slides/2019/cs231n_2019_lecture10.pdf
• Yingyu Liang: slides on "Recurrent Neural Networks" from the "Deep Learning Basics" course at Princeton
https://www.cs.princeton.edu/courses/archive/spring16/cos495/slides/DL_lecture9_RNN.pdf
• Arun Mallya: slides "Introduction to RNNs" from the "Trends in Deep Learning and Recognition" course of Svetlana LAZEBNIK at the University of Illinois at Urbana-Champaign
http://slazebni.cs.illinois.edu/spring17/lec02_rnn.pdf
• Tingwu Wang: slides on "Recurrent Neural Network" for a course at the University of Toronto
https://www.cs.toronto.edu/%7Etingwuwang/rnn_tutorial.pdf
• Christopher Olah: online tutorial "Understanding LSTM Networks"
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Outline
• Standard Recurrent Neural Networks
• Training RNN: BackPropagation Through Time
• LSTM and GRU
• Applications of RNNs
Recurrent Neural Networks (RNN)
[Figure: a recurrent network with a time delay (0, 1 or 2 steps) on each connection between the inputs x1, x2, x3 and the output, and its equivalent form in which the delayed signals x2(t-1), x3(t-1), x2(t-2) appear as additional inputs to a feedforward network]
Canonical form of RNN
[Figure: canonical form of an RNN: a non-recurrent network takes the external input U(t) together with the state variables X(t-1), fed back through unit delays, and produces the output Y(t) and the new state variables X(t)]
Time unfolding of RNN
[Figure: the non-recurrent network is replicated at times t-2, t-1 and t; each copy receives the external input U(t-k) and the state variables X(t-k-1) from the previous copy, and produces the output Y(t-k) and the state variables X(t-k)]
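To make the unfolding concrete, here is a minimal NumPy sketch (all names and dimensions are illustrative, not from the slides) of the same non-recurrent block being reused at every time step:

```python
import numpy as np

def step(x_prev, u, W, V):
    # One application of the non-recurrent block:
    # new state X(t) from previous state X(t-1) and external input U(t)
    return np.tanh(W @ x_prev + V @ u)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(4, 4))   # state-to-state weights
V = rng.normal(scale=0.5, size=(4, 3))   # input-to-state weights
U = rng.normal(size=(10, 3))             # input sequence U(1)..U(10)

x = np.zeros(4)                          # initial state X(0)
states = []
for u in U:                              # time unfolding: same weights at every step
    x = step(x, u, W, V)
    states.append(x)
```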
Dynamic systems & RNN
The state equation of a discrete-time dynamic system is X(t+1) = f(X(t), U(t+1)).
If using a Neural Net for f, this is EXACTLY a RNN!
[Figures from Deep Learning, Goodfellow, Bengio and Courville]
Standard (“vanilla”) RNN
State vector s ↔ vector h of hidden neurons:
ht = tanh(Whh ht-1 + Wxh xt)
yt = Why ht, or yt = softMax(Why ht)
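A minimal NumPy sketch of one vanilla RNN step, consistent with the equations above (the weight names Wxh, Whh, Why follow the convention of the cited Stanford course):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def vanilla_rnn_step(x_t, h_prev, Wxh, Whh, Why):
    h_t = np.tanh(Wxh @ x_t + Whh @ h_prev)   # new hidden state (the "memory")
    y_t = softmax(Why @ h_t)                  # output, e.g. class probabilities
    return h_t, y_t
```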
Advantages of RNN
The hidden state s of the RNN builds a kind of lossy summary of the past.

RNNs are perfectly suited to processing SEQUENTIAL data: the same computation formula is applied at each time step, but modulated by the evolving "memory" contained in state s.

Universality of RNNs: any function computable by a Turing Machine can be computed by a finite-size RNN (Siegelmann and Sontag, 1995).
RNN hyper-parameters
• As for MLPs, the main hyperparameter is the size of the hidden layer (= the dimension of vector h)
Outline
• Standard Recurrent Neural Networks
• Training RNN: BackPropagation Through Time
• LSTM and GRU
• Applications of RNNs
RNN training
• BackPropagation Through Time (BPTT):
one gradient update for a whole sequence
• or Real-Time Recurrent Learning (RTRL):
a gradient update at each frame of a sequence
[Figure: a temporal sequence unrolled over horizon Nt = 4; the shared weights W(t) are applied at steps t to t+3, errors e3 and e4 are computed, and the weights are updated to W(t+4) at the end of the horizon]
BackPropagation THROUGH TIME (BPTT)
• Forward through entire sequence to compute SUM of losses at ALL (or part of) time steps
• Then backprop through ENTIRE sequence to compute gradients
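In an autodiff framework, BPTT amounts to exactly this: accumulate the per-step losses over the sequence, then call backward once on the total. A hedged PyTorch sketch (model, sizes and data are placeholders):

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=3, hidden_size=16, batch_first=True)
head = nn.Linear(16, 5)                  # per-step classifier on top of h(t)
criterion = nn.CrossEntropyLoss()

x = torch.randn(8, 20, 3)                # batch of 8 sequences of 20 steps
targets = torch.randint(0, 5, (8, 20))   # a target class at EVERY time step

out, _ = rnn(x)                          # forward through the ENTIRE sequence
loss = criterion(head(out).reshape(-1, 5), targets.reshape(-1))  # loss averaged over all steps
loss.backward()                          # backprop through the ENTIRE sequence
```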
BPTT computation principle
[Figure: the network is replicated into blocks 1, 2 and 3 sharing the same weights W(t); from the state Xd(t-1) and the inputs U(t), U(t+1), U(t+2), the blocks produce the states X(t), X(t+1), X(t+2), compared to the targets D(t+1), D(t+2); the error gradients dE/dXn, dE/dXn+1 flow backwards from block to block, and each block contributes dW1, dW2, dW3 to the total weight gradient:]

dW = dW1 + dW2 + dW3
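This additive structure can be checked directly on a toy linear recurrence; a minimal NumPy sketch (illustrative, with an error at the final step only):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(scale=0.3, size=(3, 3))
U = rng.normal(size=(4, 3))               # inputs u(1)..u(4)
d = rng.normal(size=3)                    # desired final state

# Forward pass: linear recurrence x(t) = W x(t-1) + u(t)
xs = [np.zeros(3)]
for u in U:
    xs.append(W @ xs[-1] + u)

# Backward pass, with E = 0.5 * ||x(T) - d||^2 at the final step
delta = xs[-1] - d                        # dE/dx(T)
dW = np.zeros_like(W)
for t in range(len(U), 0, -1):            # walk back through the blocks
    dW += np.outer(delta, xs[t - 1])      # this block's contribution dW_t
    delta = W.T @ delta                   # propagate dE/dx(t-1) to the previous block
# dW is now the sum of the per-block contributions: dW = dW1 + dW2 + ...
```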
BPTT algorithm
Weight update over a horizon of $N_t$ steps:
$$W(t+N_t) = W(t) - \lambda\,\nabla_W E, \quad \text{with } E = \sum_t (Y_t - D_t)^2$$

Chain rule at each time step, and sum over the horizon:
$$\forall t,\quad \frac{\partial E_t}{\partial W} = \frac{\partial E_t}{\partial Y_t}\,\frac{\partial Y_t}{\partial X_{t-1}}\,\frac{\partial X_{t-1}}{\partial W} \qquad\qquad \frac{\partial E}{\partial W} = \sum_{t=1}^{N_t} \frac{\partial E_t}{\partial W}$$

[Figure: canonical form: a Feedforward Network receives U(t) and, through a delay, the state X(t-1), and outputs Y(t) and the new state X(t)]

The gradient of the state with respect to the weights unfolds recursively:
$$\frac{\partial X_t}{\partial W} = \sum_{k=1}^{t-1} \frac{\partial X_t}{\partial X_{t-k}}\,\frac{\partial X_{t-k}}{\partial W} \qquad \text{and (chain rule)} \qquad \frac{\partial X_t}{\partial X_{t-k}} = \prod_{j=t-k+1}^{t} \frac{\partial X_j}{\partial X_{j-1}}$$

where each $\partial X_j/\partial X_{j-1}$ is the Jacobian matrix of the Feedforward net.
Vanishing/exploding gradient problem
• If the eigenvalues of the Jacobian matrix are > 1, then gradients tend to EXPLODE
→ learning will never converge.
• Conversely, if the eigenvalues of the Jacobian matrix are < 1, then gradients tend to VANISH
→ error signals can only affect small time lags → short-term memory only.

→ Possible solution for exploding gradients: the CLIPPING trick (see the sketch below)
→ Possible solutions for vanishing gradients:
– use ReLU instead of tanh
– change what is inside the RNN!
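A sketch of the clipping trick in PyTorch, using the standard clip_grad_norm_ utility (model, loss and threshold are placeholders):

```python
import torch
from torch import nn

model = nn.RNN(input_size=3, hidden_size=16, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(4, 50, 3)                 # a placeholder training batch
out, _ = model(x)
loss = out.pow(2).mean()                  # stand-in loss, for illustration only
loss.backward()

# CLIPPING trick: rescale the gradient vector if its norm exceeds the threshold
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
optimizer.step()
```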
Outline
• Standard Recurrent Neural Networks
• Training RNN: BackPropagation Through Time
• LSTM and GRU
• Applications of RNNs
Long Short-Term Memory (LSTM)
Problem of standard RNNs = no actual LONG-TERM memory

LSTM = RNN variant designed to solve this issue
(proposed by Hochreiter & Schmidhuber in 1997)

• Key idea = use "gates" that modulate the respective influences of the input and of the memory

[Figures from https://colah.github.io/posts/2015-08-Understanding-LSTMs/]
LSTM gates
Gate = pointwise multiplication by a sigmoid output σ ∈ ]0;1[
→ modulates between "let nothing through" and "let everything through"

• FORGET gate: decides what to keep of the old cell memory
• INPUT gate: decides how much new content to write
→ next state = mix between pure memory and pure new input

[Figures from https://colah.github.io/posts/2015-08-Understanding-LSTMs/]
LSTM summary
• OUTPUT gate: decides which part of the cell state is exposed as the hidden state h

ALL weights Wf, Wi, Wc and Wo (and biases) are LEARNT

[Figure from the Deep Learning book by I. Goodfellow, Y. Bengio & A. Courville]
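Putting the three gates together, a minimal NumPy sketch of one LSTM step (the weight names follow the slide's Wf, Wi, Wc, Wo convention; shapes and the toy usage are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, Wf, Wi, Wc, Wo, bf, bi, bc, bo):
    z = np.concatenate([h_prev, x_t])    # [h(t-1), x(t)]
    f = sigmoid(Wf @ z + bf)             # FORGET gate: what to keep of the old memory
    i = sigmoid(Wi @ z + bi)             # INPUT gate: how much new content to write
    c_tilde = np.tanh(Wc @ z + bc)       # candidate new content
    c_t = f * c_prev + i * c_tilde       # next cell state = mix of memory and new
    o = sigmoid(Wo @ z + bo)             # OUTPUT gate: what to expose as h(t)
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Toy usage with hidden size 4 and input size 3
H, X = 4, 3
rng = np.random.default_rng(0)
Ws = [rng.normal(scale=0.3, size=(H, H + X)) for _ in range(4)]
bs = [np.zeros(H) for _ in range(4)]
h, c = lstm_step(rng.normal(size=X), np.zeros(H), np.zeros(H), *Ws, *bs)
```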
Why does LSTM avoid vanishing gradients?
Because the cell state is updated ADDITIVELY (Ct = ft ⊙ Ct-1 + it ⊙ C̃t): along the cell-state path, the backward gradient is multiplied only by the forget-gate activations (which can stay close to 1), not repeatedly by the same squashing Jacobian as in a standard RNN.
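A back-of-the-envelope illustration of the difference (the per-step factors 0.4 and 0.95 are typical assumed values, not measured ones):

```python
T = 50
# Vanilla RNN: the backward pass multiplies by W^T diag(tanh') at every step;
# with ||W|| ~ 1 and a typical tanh' ~ 0.4, the per-step factor is ~ 0.4
vanilla = 0.4 ** T
# LSTM cell-state path: the backward pass multiplies only by the forget
# activations, which can stay close to 1 when the gate is open
lstm = 0.95 ** T
print(f"vanilla gradient factor after {T} steps: {vanilla:.2e}")  # ~ 1e-20
print(f"LSTM cell-path factor after {T} steps:   {lstm:.2e}")     # ~ 8e-02
```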
Gated Recurrent Unit (GRU)
Simplified variant of LSTM, with only 2 gates: a RESET gate & an UPDATE gate
(proposed by Cho et al. in 2014)
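A minimal NumPy sketch of one GRU step (the interpolation convention h(t) = (1-z)·h(t-1) + z·h̃(t) follows Colah's tutorial cited above; names and shapes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, Wz, Wr, Wh):
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ hx)                 # UPDATE gate: keep old state vs take new
    r = sigmoid(Wr @ hx)                 # RESET gate: how much of the past to use
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]))  # candidate state
    return (1 - z) * h_prev + z * h_tilde  # interpolated new state
```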
Outline
• Standard Recurrent Neural Networks
• Training RNN: BackPropagation Through Time
• LSTM and GRU
• Applications of RNNs
Typical usages of RNNs
[Figure: typical RNN input/output configurations, ranging from a single input/output up to Sequence and Sequence-to-Sequence processing]
Combining RNN with CNN
Feed into the RNN the features from the last convolutional layer of a CNN; for example, for image captioning (see the sketch below).
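A hedged PyTorch sketch of this wiring (the backbone, sizes and module names are all illustrative stand-ins, not the architecture used in the slides):

```python
import torch
from torch import nn

class CaptionerSketch(nn.Module):
    def __init__(self, vocab_size=1000, feat_dim=512, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(                  # stand-in for a pretrained CNN backbone
            nn.Conv2d(3, feat_dim, kernel_size=7, stride=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.init_h = nn.Linear(feat_dim, hidden)  # image features -> initial hidden state
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.RNN(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, image, caption_tokens):
        feats = self.cnn(image).flatten(1)                # (B, feat_dim) conv features
        h0 = torch.tanh(self.init_h(feats)).unsqueeze(0)  # condition the RNN on the image
        seq, _ = self.rnn(self.embed(caption_tokens), h0)
        return self.out(seq)                              # next-word scores at each step
```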
Deep RNNs
Several RNN layers stacked (like the hidden layers of an MLP), as in the snippet below.
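In PyTorch, for instance, stacking is a single flag (illustrative sizes):

```python
import torch.nn as nn

# Two stacked recurrent layers: the output sequence of layer 1
# is the input sequence of layer 2, like hidden layers in an MLP
deep_rnn = nn.LSTM(input_size=32, hidden_size=64, num_layers=2, batch_first=True)
```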
Bi-directional RNNs
(e.g. for offline classification of a sequence of words)
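A short sketch of the corresponding flag in PyTorch (illustrative sizes): one RNN reads the sequence forwards, a second backwards, and their hidden states are concatenated at each step, which is why this only applies offline, when the whole sequence is available.

```python
import torch
import torch.nn as nn

birnn = nn.LSTM(input_size=32, hidden_size=64, bidirectional=True, batch_first=True)
out, _ = birnn(torch.randn(4, 10, 32))  # out.shape == (4, 10, 128): forward + backward states
```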
Encoder-decoder RNN
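A hedged sketch of the encoder-decoder pattern (all names and sizes are illustrative): the encoder compresses the input sequence into its final state, which then conditions the decoder that generates the output sequence step by step.

```python
import torch
from torch import nn

encoder = nn.GRU(input_size=16, hidden_size=64, batch_first=True)
decoder = nn.GRU(input_size=16, hidden_size=64, batch_first=True)
readout = nn.Linear(64, 16)

src = torch.randn(2, 12, 16)              # source sequence (batch of 2, 12 steps)
_, h = encoder(src)                       # final state = fixed-size summary of the input

step = torch.zeros(2, 1, 16)              # decoder "start" input
outputs = []
for _ in range(8):                        # generate 8 output steps
    o, h = decoder(step, h)               # decoder conditioned on the running state
    step = readout(o)                     # feed the prediction back as next input
    outputs.append(step)
```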
Applications of RNN/LSTM
Wherever data is intrinsically SEQUENTIAL:
• Speech recognition
• Natural Language Processing (NLP)
– Machine Translation
– Image caption generation
• Gesture recognition
• Music generation
• Potentially any kind of time series!
Summary and perspectives on Recurrent Neural Networks
• For SEQUENTIAL data (speech, text, …, gestures, …)
• Impressive results in Natural Language Processing (in particular automated real-time translation)
• Training of standard RNNs can be tricky (vanishing gradients…)
• LSTM / GRU are now used more than standard RNNs