
Page 1: LSTM: A Search Space Odyssey

LSTM: A Search Space Odyssey

Authors: Klaus Greff, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber

Page 2: LSTM: A Search Space Odyssey

Outline

• Introduction

• Long Short-Term Memory (LSTM) with peephole connections

• Experiment and discussion

• Conclusion

Page 3: LSTM: A Search Space Odyssey

Definition:

• Recurrent Neural Networks

• Importance and its applications

• Gradient problem

  • Vanishing gradient

  • Exploding gradient

• What is the LSTM?
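The gradient problem listed above can be illustrated with a small numerical sketch. It assumes a purely linear recurrent step (no activation function) and a hypothetical recurrent matrix `R` built as a scaled orthogonal matrix, so that every backpropagation step multiplies the gradient norm by exactly `scale`. The function name and all parameter values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def backprop_norms(scale, steps=50, size=8, seed=0):
    """Norms of a gradient vector pushed back through `steps` linear recurrent steps."""
    rng = np.random.default_rng(seed)
    # Scaled orthogonal matrix: every singular value equals `scale` (assumption for clarity).
    q, _ = np.linalg.qr(rng.standard_normal((size, size)))
    R = scale * q
    grad = np.ones(size) / np.sqrt(size)  # unit-norm gradient at the last time step
    norms = []
    for _ in range(steps):
        grad = R.T @ grad                 # one step of backpropagation through time
        norms.append(np.linalg.norm(grad))
    return norms

vanishing = backprop_norms(scale=0.9)  # norm shrinks like 0.9 ** k
exploding = backprop_norms(scale=1.1)  # norm grows like 1.1 ** k
print(vanishing[-1], exploding[-1])
```

With a scale below 1 the gradient shrinks geometrically over time steps (vanishing), and with a scale above 1 it grows geometrically (exploding).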

Introduction LSTM with peephole connections Results and discussion Conclusion

Page 4: LSTM: A Search Space Odyssey

LSTM History:

• LSTM was proposed in 1997 by Sepp Hochreiter and Jürgen Schmidhuber.

• In 1999, Felix Gers, Jürgen Schmidhuber, and Fred Cummins introduced the forget gate into the LSTM architecture.

• In 2000, Gers, Schmidhuber, and Cummins added peephole connections.

• In 2014, Kyunghyun Cho et al. put forward a simplified variant called the gated recurrent unit (GRU).


Page 5: LSTM: A Search Space Odyssey

Simple RNN


Page 6: LSTM: A Search Space Odyssey

Block diagram

• Three gates:

  • Input gate

  • Forget gate

  • Output gate

• Two blocks:

  • Block input

  • Block output

• One cell state


Page 7: LSTM: A Search Space Odyssey

Block Diagram

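Block input:

$W_z$: input weight ($\mathbb{R}^{N \times M}$)

$R_z$: recurrent weight ($\mathbb{R}^{N \times N}$)

$b_z$: bias weight ($\mathbb{R}^{N}$)

$x^t$: input vector at time t

$y^{t-1}$: output at time t-1

[Diagram: block input z, fed by the input and recurrent connections]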
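With the symbols defined on this slide, and writing $g$ for the input activation function (usually tanh), the block input is commonly written as follows. This is the form used in the Greff et al. paper, reproduced here as a reference sketch.

```latex
\bar{z}^t = W_z x^t + R_z y^{t-1} + b_z, \qquad z^t = g(\bar{z}^t)
```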


Page 8: LSTM: A Search Space Odyssey

Block Diagram

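Input gate:

$W_i$: input weight ($\mathbb{R}^{N \times M}$)

$R_i$: recurrent weight ($\mathbb{R}^{N \times N}$)

$b_i$: bias weight ($\mathbb{R}^{N}$)

$p_i$: peephole weight ($\mathbb{R}^{N}$)

$c^{t-1}$: cell state at time t-1

$x^t$: input vector at time t

$y^{t-1}$: output at time t-1

[Diagram: input gate i, fed by the input, the recurrent connections, and $c^{t-1}$]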
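Using the symbols above, with $\sigma$ the logistic sigmoid and $\odot$ element-wise multiplication, the input gate is commonly written as follows (the form used in the Greff et al. paper):

```latex
\bar{i}^t = W_i x^t + R_i y^{t-1} + p_i \odot c^{t-1} + b_i, \qquad i^t = \sigma(\bar{i}^t)
```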


Page 9: LSTM: A Search Space Odyssey

Block Diagram

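Forget gate:

$W_f$: input weight ($\mathbb{R}^{N \times M}$)

$R_f$: recurrent weight ($\mathbb{R}^{N \times N}$)

$b_f$: bias weight ($\mathbb{R}^{N}$)

$p_f$: peephole weight ($\mathbb{R}^{N}$)

$c^{t-1}$: cell state at time t-1

$x^t$: input vector at time t

$y^{t-1}$: output at time t-1

[Diagram: forget gate f, fed by the input, the recurrent connections, and $c^{t-1}$]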
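The forget gate has the same structure as the input gate, with its own weights. In the notation of the Greff et al. paper, with $\sigma$ the logistic sigmoid:

```latex
\bar{f}^t = W_f x^t + R_f y^{t-1} + p_f \odot c^{t-1} + b_f, \qquad f^t = \sigma(\bar{f}^t)
```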


Page 10: LSTM: A Search Space Odyssey

Block Diagram

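Output gate:

$W_o$: input weight ($\mathbb{R}^{N \times M}$)

$R_o$: recurrent weight ($\mathbb{R}^{N \times N}$)

$b_o$: bias weight ($\mathbb{R}^{N}$)

$p_o$: peephole weight ($\mathbb{R}^{N}$)

$c^{t}$: cell state at time t

$x^t$: input vector at time t

$y^{t-1}$: output at time t-1

[Diagram: output gate o, fed by the input, the recurrent connections, and $c^{t}$]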
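Unlike the input and forget gates, the output gate's peephole looks at the current cell state $c^t$ rather than $c^{t-1}$. In the notation of the Greff et al. paper, with $\sigma$ the logistic sigmoid:

```latex
\bar{o}^t = W_o x^t + R_o y^{t-1} + p_o \odot c^{t} + b_o, \qquad o^t = \sigma(\bar{o}^t)
```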


Page 11: LSTM: A Search Space Odyssey

Block Diagram

Cell state:

$z^t$: output of the block input at time t

$i^t$: output of the input gate at time t

$c^{t-1}$: cell state at time t-1

$f^t$: output of the forget gate at time t

[Diagram: cell state, combining $z^t$ with $i^t$ and $c^{t-1}$ with $f^t$]
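The four quantities listed on this slide combine into the cell state update. In the form used in the Greff et al. paper, with $\odot$ denoting element-wise multiplication:

```latex
c^t = z^t \odot i^t + c^{t-1} \odot f^t
```

The input gate scales how much of the new block input is written, and the forget gate scales how much of the previous cell state is kept.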


Page 12: LSTM: A Search Space Odyssey

Block Diagram

Block output:

$o^t$: output of the output gate at time t

$c^t$: cell state at time t

[Diagram: block output y, which feeds the recurrent connections]
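The block output is commonly written as $y^t = h(c^t) \odot o^t$, where $h$ is the output activation function. Putting the block input, the three gates, the cell state, and the block output together gives one forward step, sketched below in NumPy. The sketch assumes $g = h = \tanh$ and logistic-sigmoid gates; the names `lstm_step` and `init_params`, the parameter dictionary layout, and the initialization scale are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, y_prev, c_prev, p):
    """One forward step of a peephole LSTM layer.

    x: input (M,); y_prev: previous block output (N,); c_prev: previous cell state (N,).
    p: dict with W_* of shape (N, M), R_* of shape (N, N), b_* and peepholes p_* of shape (N,).
    """
    z = np.tanh(p["W_z"] @ x + p["R_z"] @ y_prev + p["b_z"])                      # block input
    i = sigmoid(p["W_i"] @ x + p["R_i"] @ y_prev + p["p_i"] * c_prev + p["b_i"])  # input gate
    f = sigmoid(p["W_f"] @ x + p["R_f"] @ y_prev + p["p_f"] * c_prev + p["b_f"])  # forget gate
    c = z * i + c_prev * f                                                        # cell state
    o = sigmoid(p["W_o"] @ x + p["R_o"] @ y_prev + p["p_o"] * c + p["b_o"])       # output gate (peeks at c^t)
    y = np.tanh(c) * o                                                            # block output
    return y, c

def init_params(n_blocks, n_inputs, seed=0):
    """Small random weights and zero biases (illustrative initialization only)."""
    rng = np.random.default_rng(seed)
    p = {}
    for name in "zifo":
        p[f"W_{name}"] = 0.1 * rng.standard_normal((n_blocks, n_inputs))
        p[f"R_{name}"] = 0.1 * rng.standard_normal((n_blocks, n_blocks))
        p[f"b_{name}"] = np.zeros(n_blocks)
    for name in "ifo":
        p[f"p_{name}"] = 0.1 * rng.standard_normal(n_blocks)
    return p

N, M = 4, 3                       # N LSTM blocks, M inputs
params = init_params(N, M)
y, c = np.zeros(N), np.zeros(N)   # initial output and cell state
for x in np.random.default_rng(1).standard_normal((5, M)):
    y, c = lstm_step(x, y, c, params)
print(y.shape, c.shape)
```

The order of operations matters: the input and forget gates read the previous cell state through their peepholes, while the output gate is computed after the cell state has been updated.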


Page 13: LSTM: A Search Space Odyssey

LSTM Variants

• NIG: No Input Gate: $i^t = 1$

• NFG: No Forget Gate: $f^t = 1$

• NOG: No Output Gate: $o^t = 1$

• NIAF: No Input Activation Function: $g(x) = x$

• NOAF: No Output Activation Function: $h(x) = x$

• CIFG: Coupled Input and Forget Gate: $f^t = 1 - i^t$

• NP: No Peepholes

• FGR: Full Gate Recurrence
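The gate-level variants above (NIG, NFG, NOG, CIFG) amount to overriding one gate activation before the cell update. The sketch below shows only those four; NIAF and NOAF would instead swap the activation functions, NP would drop the peephole terms, and FGR would add recurrent connections between the gates. The helper name `apply_variant`, the label "V" for the unmodified LSTM, and the example values are hypothetical.

```python
import numpy as np

def apply_variant(i, f, o, variant):
    """Return (i, f, o) gate activations after applying one gate-level variant."""
    ones = np.ones_like(i)
    if variant == "V":      # vanilla LSTM: gates unchanged
        return i, f, o
    if variant == "NIG":    # no input gate
        return ones, f, o
    if variant == "NFG":    # no forget gate
        return i, ones, o
    if variant == "NOG":    # no output gate
        return i, f, ones
    if variant == "CIFG":   # coupled input and forget gate
        return i, 1.0 - i, o
    raise ValueError(f"unsupported variant: {variant}")

# Example gate activations and states for two LSTM blocks (made-up values).
i = np.array([0.2, 0.9])
f = np.array([0.5, 0.5])
o = np.array([0.7, 0.7])
z = np.array([1.0, -1.0])
c_prev = np.array([2.0, 2.0])

i_c, f_c, o_c = apply_variant(i, f, o, "CIFG")
c_cifg = z * i_c + c_prev * f_c   # cell update with the coupled gates
print(f_c, c_cifg)
```

Under CIFG, a block that writes strongly (large $i^t$) necessarily forgets strongly, which removes one set of gate parameters.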


Page 14: LSTM: A Search Space Odyssey

Experiment setup

Datasets:

• TIMIT speech corpus

• IAM Online Handwriting Database

• JSB Chorales


Page 15: LSTM: A Search Space Odyssey

Experiment setup

Features:

• TIMIT speech corpus:

  • 12 MFCCs plus energy, together with their first and second derivatives

• IAM Online Handwriting Database:

  • x, y, t, and a feature marking the time of the pen lifting

• JSB Chorales:

  • Each MIDI sequence is transposed to C major or C minor, and frames are sampled every quarter note.
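For the TIMIT features, 12 MFCCs plus energy give 13 static values per frame, and stacking first and second derivatives triples that to 39. A minimal sketch of the stacking step follows. It assumes the derivatives are approximated with `np.gradient` (central differences); speech toolkits typically use a regression-based delta instead, and the slides do not say which was used. The random array stands in for real MFCC frames.

```python
import numpy as np

def add_derivatives(static):
    """Stack static features with approximate first and second time derivatives.

    static: array of shape (T, D), one row per frame.
    Returns an array of shape (T, 3 * D).
    """
    delta = np.gradient(static, axis=0)    # first derivative along time
    delta2 = np.gradient(delta, axis=0)    # second derivative along time
    return np.concatenate([static, delta, delta2], axis=1)

# Placeholder for 100 frames of 12 MFCCs + energy (random data, not real speech).
frames = np.random.default_rng(0).standard_normal((100, 13))
features = add_derivatives(frames)
print(features.shape)  # (100, 39)
```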


Page 16: LSTM: A Search Space Odyssey

Experiment setup

Network Architectures and training:

Dataset       Network type        Hidden layers  Output layer  Loss function  Training
TIMIT         Bidirectional LSTM  2              Softmax       Cross-entropy  SGD
IAM Online    Bidirectional LSTM  2              Softmax       CTC            SGD
JSB Chorales  LSTM                1              Sigmoid       Cross-entropy  SGD


Page 17: LSTM: A Search Space Odyssey

Comparison of the Variants

• Test set performance for all 200 trials:
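Each of the 200 trials corresponds to one sampled hyperparameter setting. A minimal sketch of such a random search sampler is shown below. The sampling ranges, the dictionary keys, and the function name `sample_hyperparameters` are illustrative assumptions and are not stated on the slides; only the count of 200 trials comes from the source.

```python
import numpy as np

def sample_hyperparameters(rng):
    """Draw one random hyperparameter setting (ranges are illustrative assumptions)."""
    return {
        # Log-uniform learning rate between 1e-6 and 1e-2.
        "learning_rate": 10 ** rng.uniform(-6, -2),
        # Log-uniform number of LSTM blocks per hidden layer between 20 and 200.
        "hidden_units": int(round(10 ** rng.uniform(np.log10(20), np.log10(200)))),
        # Momentum as 1 minus a log-uniform value between 0.01 and 1.
        "momentum": 1 - 10 ** rng.uniform(-2, 0),
        # Standard deviation of Gaussian input noise, uniform between 0 and 1.
        "input_noise_std": rng.uniform(0, 1),
    }

rng = np.random.default_rng(0)
trials = [sample_hyperparameters(rng) for _ in range(200)]
print(len(trials), trials[0])
```

Sampling the learning rate and network size on a log scale spreads the trials evenly across orders of magnitude, which suits hyperparameters whose useful values span several decades.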


Page 18: LSTM: A Search Space Odyssey

Comparison of the Variants

• Test set performance for the best 10% of trials:


Page 19: LSTM: A Search Space Odyssey

Impact of Hyperparameters


Page 20: LSTM: A Search Space Odyssey

Interaction of Hyperparameters


Page 21: LSTM: A Search Space Odyssey

Total marginal predicted performance

TIMIT:


Page 22: LSTM: A Search Space Odyssey

Total marginal predicted performance

IAM Online:


Page 23: LSTM: A Search Space Odyssey

Total marginal predicted performance

JSB Chorales:


Page 24: LSTM: A Search Space Odyssey

Conclusion

• The most commonly used LSTM architecture performs reasonably well on various datasets.

• Coupling the input and forget gates (CIFG) or removing peephole connections (NP) simplified LSTMs in these experiments without significantly decreasing performance.

• The forget gate and the output activation function are the most critical components of the LSTM block.

• The learning rate is the most crucial hyperparameter, followed by the network size.

• Hyperparameters are virtually independent.


Page 25: LSTM: A Search Space Odyssey

References:

• K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, "LSTM: A Search Space Odyssey," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222-2232, Oct. 2017.

• https://www.youtube.com/watch?v=lycKqccytfU

• https://www.youtube.com/watch?v=lWkFhVq9-nc

• https://en.wikipedia.org/wiki/Long_short-term_memory
