An Introduction to Recurrent Neural Networks

Alex Atanasov
Dept. of Physics, Yale University

April 24, 2018

Overview

1 Introduction to RNNs

2 Historical Background

3 Mathematical Formulation

4 Unrolling

5 Computing Gradients

Throughout this presentation I will be using the notation from the book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.

Motivation

Two motivations for recurrent neural network models:

Sequential Processing

Modeling of Neuronal Connectivity

Most human brains don't look like this1:

1 Obligatory political reference here

Instead, we have neurons connecting in a dense web called a recurrent network.

RNN Examples

We can also combine RNNs with other networks we've seen before

Recurrent neural networks are

Designed for processing sequential data

Like CNNs, motivated by biological example

Unlike CNNs and deep neural networks in that their neural connections can contain cycles

A very general form of neural network

Turing complete

Visual Representation

From a neuroscience paper1:

1 Song et al. 2014, Training Excitatory-Inhibitory Recurrent Neural Networks for Cognitive Tasks: A Simple and Flexible Framework

Hopfield Networks

The first formulation of a “recurrent-like” neural network was made by John Hopfield (1982)

Very simple form of neural network

Built in the context of theoretical neuroscience

First-order attempt at understanding the mechanism underlying associative memory

Still an important object of study, primarily in neuroscience

LSTM Networks

Invented by Hochreiter and Schmidhuber in 1997

Only recently (2007), when computational resources were powerful enough to train LSTM networks, did they begin to revolutionize the field

First came to fame for outperforming all traditional models in certain speech applications

In 2009, RNNs set records for handwriting recognition

In 2015, Google’s speech recognition saw a 47% jump using an LSTM network

Significantly improved machine translation, language modeling, and multilingual language processing (e.g. Google Translate)

Together with CNNs, significantly improved image captioning

Let’s actually define what an RNN is.

Our Variables

Our input will be a sequence of vectors:

x(1) . . . x(τ)

Let 1 ≤ t ≤ τ index these vectors. Our input is then denoted x(t). Thinking of t as “time” gives us the dynamical evolution of a system.

The activity of the neurons inside the RNN at time t will be denoted by the vector h(t)

The output at time t will be denoted o(t)

The target output at time t will be denoted y(t)

The RNN’s goal is to minimize a loss L(o(t), y(t)) over all times.
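To make the notation concrete, here is a sketch of these objects as NumPy arrays. The dimensions, and the use of softmax cross-entropy for L, are illustrative assumptions; the slides leave both generic.

```python
import numpy as np

tau, n_in, n_out = 5, 8, 4        # assumed sizes, for illustration only

xs = [np.random.randn(n_in) for _ in range(tau)]      # x(1) ... x(tau)
ys = [np.random.randint(n_out) for _ in range(tau)]   # targets y(t) as class indices

# One common concrete choice for L(o(t), y(t)) is softmax cross-entropy;
# the RNN then minimizes the sum of step_loss over all t:
def step_loss(o_t, y_t):
    p = np.exp(o_t - o_t.max())   # softmax, shifted by the max for stability
    p /= p.sum()
    return -np.log(p[y_t])
```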

So:

x(t) → h(t) → o(t)

Equations of Evolution

Given an input x(t) with

Input weights Uij connecting input j to RNN neuron i

Internal weights Wij connecting RNN neuron j to RNN neuron i, and internal bias bi on RNN neuron i

Output weights Vij connecting RNN neuron j to output neuron i, and internal bias ci on output neuron i

the time evolution of h(t), o(t) is given by:

h(t) = tanh[U · x(t) + W · h(t−1) + b]

o(t) = V · h(t) + c

Often, we want a probability as our output, so the RNN’s prediction is

ŷ(t) = softmax(o(t))

(written ŷ(t) to keep the prediction distinct from the target y(t))

So, with the parameters attached:

x(t) →U→ h(t) (recurrent weights W, bias b) →V, c→ o(t)
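A minimal NumPy sketch of one step of these equations; the parameter shapes (U: hidden × input, W: hidden × hidden, V: output × hidden) follow from the index conventions above.

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W, V, b, c):
    """One step of the evolution equations:
    h(t) = tanh(U x(t) + W h(t-1) + b),   o(t) = V h(t) + c."""
    h_t = np.tanh(U @ x_t + W @ h_prev + b)
    o_t = V @ h_t + c
    return h_t, o_t

def softmax(o_t):
    """yhat(t) = softmax(o(t)), when a probability output is wanted."""
    e = np.exp(o_t - o_t.max())   # shift by the max for numerical stability
    return e / e.sum()
```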

Computational Graph

Often in the field of deep learning, RNNs are pictorially described by computational graphs.

Notice this is different from the usual deep network picture.

To recover a deep network picture, we perform an operation known as unrolling the graph.
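Computationally, unrolling is just running the same cell once per time step: the unrolled graph is a deep feedforward network whose “layers” all share the parameters U, W, V, b, c. A sketch, reusing rnn_step from above:

```python
def forward(xs, h0, params):
    """Unroll the recurrence over the input sequence. Every time step
    reuses the same parameters: a deep network with shared weights."""
    U, W, V, b, c = params
    hs, os = [], []
    h = h0
    for x_t in xs:                 # one unrolled 'layer' per time step
        h, o_t = rnn_step(x_t, h, U, W, V, b, c)
        hs.append(h)
        os.append(o_t)
    return hs, os
```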

Examples of Computational Graphs

Feeding the output in:
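The slide’s figure is not reproduced here; in this variant the recurrence feeds the previous output, rather than the previous hidden state, back into the network. A sketch of that connectivity, where the name R for the output-to-hidden weights is hypothetical:

```python
def rnn_step_output_fed(x_t, o_prev, U, R, V, b, c):
    """Variant where h(t) sees the previous output o(t-1) instead of
    h(t-1); R (hypothetical name) maps outputs back to hidden units."""
    h_t = np.tanh(U @ x_t + R @ o_prev + b)
    o_t = V @ h_t + c
    return h_t, o_t
```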

Summarizing a sequence:
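Here only the final state matters: after reading x(1) . . . x(τ), the last hidden state h(τ) serves as a fixed-length summary of the whole sequence. A sketch, using forward from above:

```python
def summarize(xs, h0, params):
    """Read the full sequence and keep only h(tau) as its summary."""
    hs, _ = forward(xs, h0, params)
    return hs[-1]
```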

Gradient Descent on the Unrolled Graph

After unrolling the computational graph of an RNN, we can use the same gradient methods that we’re familiar with for deep networks.

Our trainable variables are U, W, V, b and c.
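A minimal sketch of backpropagation through time on the unrolled graph, assuming the softmax cross-entropy loss from the earlier sketches; each gradient line is the chain rule applied to one of the evolution equations.

```python
def bptt(xs, ys, h0, params):
    """Gradients of the summed cross-entropy loss with respect to the
    trainable variables U, W, V, b, c, by backprop through time."""
    U, W, V, b, c = params
    hs, os = forward(xs, h0, params)
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    db, dc = np.zeros_like(b), np.zeros_like(c)
    dh_next = np.zeros_like(h0)            # gradient arriving from step t+1

    for t in reversed(range(len(xs))):
        do = softmax(os[t])
        do[ys[t]] -= 1.0                   # dL/do(t) for cross-entropy
        dV += np.outer(do, hs[t]); dc += do
        dh = V.T @ do + dh_next            # from the output and the future
        da = (1.0 - hs[t] ** 2) * dh       # back through tanh
        h_prev = hs[t - 1] if t > 0 else h0
        dU += np.outer(da, xs[t]); dW += np.outer(da, h_prev); db += da
        dh_next = W.T @ da                 # pass gradient on to step t-1
    return dU, dW, dV, db, dc
```

Each parameter then takes a step against its gradient, exactly as in an ordinary deep network.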

Next lecture

More on the training of RNNs: vanishing and exploding gradients

A tour of the many variations of RNNs

Hopfield

LSTM

Neural Turing Machines