Page 1

A Brief Introduction on Recurrent Neural Network and Its Application

Qiang Gan

All contents are collected online, listed in Reference page.

For Nanjing Deep Learning Meetup Only

Page 2

Outline

1. RNN
   o Model structure
   o Parameters
   o Learning algorithm
2. Long-Term Dependencies & Vanishing Gradient Problem
   o LSTM / GRU
3. Neural Machine Translation
   o Encoder-decoder framework
4. Attention Mechanism
   o Extract information needed from source
5. Other RNN applications
   o Image captioning
   o Question Answering

Page 3

Before we start …

Page 4

Memory

• We are all familiar with the song 《Two Tigers》
   o Two tigers, two tigers …
• What is the 10th word?
• We learned the words as a sequence, a kind of conditional memory.
• More examples: driving steps, movie scenes, …

Page 5

“Memory” in Neural Network

• Traditional Neural Network

o Output relies only on current input

o input -> hidden -> output

• Network with “Memory”

o Output relies on current input and history information

o (input + prev_hidden) -> hidden -> output
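The two data flows above can be written as equations. This is a sketch under assumed notation that does not appear on the slide: $x$ is the input, $h$ the hidden state, $y$ the output, $W_{xh}$, $W_{hh}$, $W_{hy}$ are weight matrices, $b_h$, $b_y$ are biases, and tanh is one common choice of hidden activation.

```latex
% Traditional network: the output depends only on the current input
h = \tanh(W_{xh}\, x + b_h), \qquad y = W_{hy}\, h + b_y

% Network with "memory": the hidden state also depends on the previous hidden state
h_t = \tanh(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h), \qquad y_t = W_{hy}\, h_t + b_y
```

The only structural difference is the extra term $W_{hh}\, h_{t-1}$, which carries history information into the current step.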

Page 6

“Memory” in Neural Network

• Four Steps in Network with “Memory”

1. (input + empty_hidden) -> hidden -> output

• Memory only contains blue information

2. (input + prev_hidden) -> hidden -> output

• Memory contains blue and red information

3. (input + prev_hidden) -> hidden -> output

• Memory contains blue, red and green information

4. (input + prev_hidden) -> hidden -> output

• Memory contains blue, red, green and purple information
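The four steps above can be sketched in NumPy. This is a minimal illustration, not the code behind the slide: the sizes, the random weights, the tanh activation and the names `rnn_step` and `run` are all assumptions, and the output layer is left out to keep the focus on the hidden "memory".

```python
import numpy as np

def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    # (input + prev_hidden) -> hidden
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)

def run(inputs, W_xh, W_hh, b_h):
    h = np.zeros(W_hh.shape[0])  # step 1 starts from an empty hidden state
    history = []
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
        history.append(h)
    return history

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
D, H = 3, 5
W_xh = rng.normal(scale=0.5, size=(H, D))
W_hh = rng.normal(scale=0.5, size=(H, H))
b_h = np.zeros(H)

inputs = [rng.normal(size=D) for _ in range(4)]  # "blue", "red", "green", "purple"
states = run(inputs, W_xh, W_hh, b_h)

# Changing only the first input changes the hidden state after step 4,
# which shows that the memory still carries information from step 1.
changed = [inputs[0] + 1.0] + inputs[1:]
states_changed = run(changed, W_xh, W_hh, b_h)
```

Comparing `states[-1]` with `states_changed[-1]` is a quick way to check that early inputs still influence the final hidden state.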

Page 7

Recurrent Neural Network

Page 8

Recurrent Neural Network

• Previous example

Page 9

Recurrent Neural Network

Page 10

Recurrent Neural Network

• Learning algorithm (Backpropagation Through Time)

o Unfold the RNN over time into a deep feedforward network (weights shared across time steps)

o In the figure, black is the prediction, bright yellow the errors, and mustard the derivatives.
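Backpropagation Through Time can be sketched for a small tanh RNN as follows. This is an illustrative sketch under stated assumptions, not the code the slide's figure comes from: the squared-error loss, the absence of biases, the sizes and the names `forward`, `loss` and `bptt` are all choices made here.

```python
import numpy as np

def forward(params, xs, h0):
    W_xh, W_hh, W_hy = params
    hs, ys = [h0], []          # hs[t + 1] is the hidden state after input t
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ hs[-1])
        hs.append(h)
        ys.append(W_hy @ h)
    return hs, ys

def loss(params, xs, targets, h0):
    _, ys = forward(params, xs, h0)
    return 0.5 * sum(np.sum((y - t) ** 2) for y, t in zip(ys, targets))

def bptt(params, xs, targets, h0):
    W_xh, W_hh, W_hy = params
    hs, ys = forward(params, xs, h0)
    dW_xh, dW_hh, dW_hy = (np.zeros_like(W) for W in params)
    dh_next = np.zeros_like(h0)            # gradient arriving from later steps
    for t in reversed(range(len(xs))):     # walk the unfolded network backwards
        dy = ys[t] - targets[t]            # error at step t
        dW_hy += np.outer(dy, hs[t + 1])
        dh = W_hy.T @ dy + dh_next         # from the output and from the future
        dz = (1.0 - hs[t + 1] ** 2) * dh   # through tanh
        dW_xh += np.outer(dz, xs[t])       # shared weights: gradients add up
        dW_hh += np.outer(dz, hs[t])
        dh_next = W_hh.T @ dz
    return dW_xh, dW_hh, dW_hy

# Hypothetical toy problem: 6 time steps, 3 inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
D, H, K, T = 3, 4, 2, 6
params = (rng.normal(scale=0.5, size=(H, D)),
          rng.normal(scale=0.5, size=(H, H)),
          rng.normal(scale=0.5, size=(K, H)))
xs = [rng.normal(size=D) for _ in range(T)]
targets = [rng.normal(size=K) for _ in range(T)]
h0 = np.zeros(H)
grads = bptt(params, xs, targets, h0)
```

Because the weights are shared, each gradient is a sum over all time steps. Comparing `grads` against finite differences of `loss` is a reasonable sanity check for a hand-written backward pass like this.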

Page 11

Long-Term Dependencies Problem

• Consider trying to predict the last word in the text “I grew up in France… I speak fluent French.”

• We need the context of France, from further back.

Page 12

Vanishing Gradient Problem

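$w_1, w_2, \ldots$ are the weights, $b_1, b_2, \ldots$ are the biases, and $C$ is some cost function. $a_j = \sigma(z_j)$ is the output of neuron $j$, where $\sigma$ is the activation function and $z_j = w_j a_{j-1} + b_j$ is the weighted input to the neuron.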
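With these definitions, the gradient with respect to an early bias can be written out by the chain rule. The slide's figure is not available here, so the sketch below assumes a chain of four single-neuron layers; the depth is an assumption, and the pattern is the same for any depth.

```latex
\frac{\partial C}{\partial b_1}
  = \sigma'(z_1)\, w_2\, \sigma'(z_2)\, w_3\, \sigma'(z_3)\, w_4\, \sigma'(z_4)\,
    \frac{\partial C}{\partial a_4}
```

Each later layer contributes one factor $w_j \sigma'(z_j)$. When these factors are smaller than 1 in magnitude, the product shrinks roughly exponentially with depth, so the gradient reaching early layers (or early time steps in an unfolded RNN) tends to vanish.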

Page 13

Vanishing Gradient Problem

Tanh and its derivative
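A short numerical sketch shows why the tanh derivative matters. The derivative never exceeds 1, so a long product of terms of the form weight times derivative tends to shrink. The recurrent weight of 0.9 and the randomly drawn pre-activations are hypothetical values chosen only for illustration.

```python
import numpy as np

def tanh_grad(z):
    # d/dz tanh(z) = 1 - tanh(z)^2, which lies in (0, 1] and peaks at z = 0
    return 1.0 - np.tanh(z) ** 2

z = np.linspace(-4.0, 4.0, 9)
grads = tanh_grad(z)

# Hypothetical scalar RNN: 50 time steps, recurrent weight 0.9,
# pre-activations drawn at random.
rng = np.random.default_rng(0)
zs = rng.normal(0.0, 1.5, size=50)
w = 0.9
factors = w * tanh_grad(zs)     # one factor per time step
running = np.cumprod(factors)   # gradient scale after 1, 2, ..., 50 steps back
```

Here every factor is at most 0.9, so `running[-1]` is bounded by 0.9 to the power 50, which is well below 0.01.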

Page 14

Long Short-Term Memory

• Standard RNN

• LSTM

o Forget gate, input gate, output gate, cell state
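One step of an LSTM cell with the three gates and the cell state can be sketched as below. This follows the commonly used formulation; stacking the four blocks into a single weight matrix `W`, the block order, and the name `lstm_step` are assumptions made here, not details taken from the slide.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4*H, D+H) and b has shape (4*H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0:H])            # forget gate: how much old cell state to keep
    i = sigmoid(z[H:2 * H])        # input gate: how much new candidate to write
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much cell state to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate values
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(scale=0.1, size=(4 * H, D + H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)
```

The cell state update is additive. When the forget gate is close to 1 and the input gate close to 0, `c` is carried forward almost unchanged, which is what helps gradients survive over long spans.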

Page 15

Long Short-Term Memory

Page 16

Long Short-Term Memory

Page 17

LSTM / GRU

LSTM vs. GRU (GRU has fewer parameters)

[1] An Empirical Exploration of Recurrent Network Architectures
[2] Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
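The "fewer parameters" remark can be checked with a quick count. The sketch assumes the textbook formulation, in which each gate or candidate block has one weight matrix over the concatenated input and hidden state plus one bias vector; an LSTM has four such blocks and a GRU three. Some library implementations keep two bias vectors per block, which changes the totals slightly. The sizes 100 and 128 are hypothetical.

```python
def gated_rnn_params(input_size, hidden_size, num_blocks):
    # One block: weights over [x, h_prev] plus one bias vector.
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return num_blocks * per_block

# LSTM: forget, input, output gates + candidate = 4 blocks.
lstm_params = gated_rnn_params(100, 128, num_blocks=4)
# GRU: update, reset gates + candidate = 3 blocks.
gru_params = gated_rnn_params(100, 128, num_blocks=3)
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same size.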

Page 18

Neural Machine Translation

• Encoder-decoder
   o Input reversing: 《Sequence to Sequence Learning with Neural Networks》
   o Input doubling: 《Learning to Execute》
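Both tricks are preprocessing steps applied to the source sequence before it reaches the encoder. A minimal sketch, with hypothetical helper names and a made-up example sentence:

```python
def reverse_source(tokens):
    # Input reversing: feed the source sequence to the encoder back to front.
    # Only the source is reversed; the target keeps its normal order.
    return tokens[::-1]

def double_source(tokens):
    # Input doubling: present the source sequence to the encoder twice.
    return tokens + tokens

source = ["I", "speak", "fluent", "French"]
reversed_source = reverse_source(source)
doubled_source = double_source(source)
```

Reversing puts the first source words close to the first target words in the unfolded network, which shortens some of the dependencies the model has to learn.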

Page 19

Attention Mechanism in NMT

Neural machine translation by jointly learning to align and translate. ICLR 2015

Page 20

Visualization of Attention Matrix

• Translating from English to French
• Elements in each row add up to 1
   o Shown in grayscale (0: black, 1: white)
• Alignments found
   o La Syrie -> Syria
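The attention matrix described above can be sketched as additive attention in NumPy. Each row holds one target position's weights over the source positions, and a softmax makes each row sum to 1. The names `W_s`, `W_h`, `v` and all sizes are assumptions. Scoring all decoder states at once is also a simplification: in the cited model each decoder state depends on the previous context vector, so the weights are computed step by step.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def additive_attention(dec_states, enc_states, W_s, W_h, v):
    # dec_states: (T_out, H), enc_states: (T_in, H)
    # score[i, j] = v . tanh(W_s^T s_i + W_h^T h_j)
    proj = np.tanh((dec_states @ W_s)[:, None, :] + (enc_states @ W_h)[None, :, :])
    scores = proj @ v                  # (T_out, T_in)
    weights = softmax(scores)          # each row sums to 1
    contexts = weights @ enc_states    # (T_out, H): weighted sum of source states
    return weights, contexts

# Hypothetical sizes and random values, for illustration only.
rng = np.random.default_rng(0)
T_out, T_in, H, A = 3, 5, 4, 6
dec_states = rng.normal(size=(T_out, H))
enc_states = rng.normal(size=(T_in, H))
W_s = rng.normal(size=(H, A))
W_h = rng.normal(size=(H, A))
v = rng.normal(size=A)
weights, contexts = additive_attention(dec_states, enc_states, W_s, W_h, v)
```

Plotting `weights` as a grayscale image gives the kind of alignment picture the slide refers to.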

Neural machine translation by jointly learning to align and translate. ICLR 2015

Page 21

RNN Applications

• Image captioning

o Encode the image with a CNN, and decode the embedded information into a description with an RNN.

Fei-Fei Li, Stanford Vision Lab

Page 22

RNN Applications

• Question answering

o Encode the document and query with an RNN, and predict the answer token.

Attentive Reader: Teaching Machines to Read and Comprehend. NIPS 2015

Page 23

Summary

1. RNN
   o Model structure
   o Parameters
   o Learning algorithm
2. Long-Term Dependencies & Vanishing Gradient Problem
   o LSTM / GRU
3. Neural Machine Translation
   o Encoder-decoder framework
4. Attention Mechanism
   o Extract information needed from source
5. Other RNN applications
   o Image captioning
   o Question Answering

Page 24

Reference

1. Anyone Can Learn To Code an LSTM-RNN in Python
2. Recurrent Neural Network Tutorial (WildML)
3. Attention and Memory in Deep Learning and NLP (WildML)
4. Neural Networks and Deep Learning
5. Understanding LSTM Networks
6. Sequence to Sequence Learning with Neural Networks. NIPS 2014
7. Teaching Machines to Read and Comprehend. NIPS 2015

Page 25

Thanks!