An Introduction to Recurrent Neural Networks

Alex Atanasov
Dept. of Physics, Yale University

April 24, 2018

Overview

1 Introduction to RNNs

2 Historical Background

3 Mathematical Formulation

4 Unrolling

5 Computing Gradients

Throughout this presentation I will be using the notation from the book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.

Motivation

Two motivations for recurrent neural network models:

Sequential Processing

Modeling of Neuronal Connectivity

Most human brains don't look like this1:

1 Obligatory political reference here

Instead, we have neurons connecting in a dense web called a recurrent network.

RNN Examples

We can also combine RNNs with other networks we've seen before

Recurrent neural networks are

Designed for processing sequential data

Like CNNs, motivated by biological example

Unlike CNNs and deep neural networks in that their neural connections can contain cycles

A very general form of neural network

Turing complete

Visual Representation

From a neuroscience paper1:

1 Song et al. 2014, Training Excitatory-Inhibitory Recurrent Neural Networks for Cognitive Tasks: A Simple and Flexible Framework

Hopfield Networks

The first formulation of a “recurrent-like” neural network was made by John Hopfield (1982)

Very simple form of neural network

Built in the context of theoretical neuroscience

First-order attempt at understanding the mechanism underlying associative memory

Still an important object of study, primarily in neuroscience

LSTM Networks

Invented by Hochreiter and Schmidhuber in 1997

Only recently (2007), when computational resources were powerful enough to train LSTM networks, did they begin to revolutionize the field

First came to fame for outperforming all traditional models in certain speech applications

In 2009, RNNs set records for handwriting recognition

In 2015, Google’s speech recognition saw a 47% jump using an LSTM network

Significantly improved machine translation, language modeling, and multilingual language processing (e.g. Google Translate)

Together with CNNs, significantly improved image captioning

Let’s actually define what an RNN is.

Our Variables

Our input will be a sequence of vectors:

x(1) . . . x(τ)

Let 1 ≤ t ≤ τ index these vectors. Our input is then denoted x(t). Thinking of t as “time” gives us the dynamical evolution of a system.

The activity of the neurons inside the RNN at time t will be denoted by the vector h(t)

The output at time t will be denoted o(t)

The target output at time t will be denoted y(t)

The RNN’s goal is to minimize a loss L(o(t), y(t)) over all times.
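To make the notation concrete, here is a sketch of these objects as NumPy arrays. The dimensions, and the use of softmax cross-entropy for L, are illustrative assumptions; the slides leave both generic.

```python
import numpy as np

tau, n_in, n_out = 5, 8, 4        # assumed sizes, for illustration only

xs = [np.random.randn(n_in) for _ in range(tau)]      # x(1) ... x(tau)
ys = [np.random.randint(n_out) for _ in range(tau)]   # targets y(t) as class indices

# One common concrete choice for L(o(t), y(t)) is softmax cross-entropy;
# the RNN then minimizes the sum of step_loss over all t:
def step_loss(o_t, y_t):
    p = np.exp(o_t - o_t.max())   # softmax, shifted by the max for stability
    p /= p.sum()
    return -np.log(p[y_t])
```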

So:

x(t) → h(t) → o(t)

Equations of Evolution

Given an input x(t) with

Input weights Uij connecting input j to RNN neuron i

Internal weights Wij connecting RNN neuron j to RNN neuron i, and internal bias bi on RNN neuron i

Output weights Vij connecting RNN neuron j to output neuron i, and internal bias ci on output neuron i

the time evolution of h(t), o(t) is given by:

h(t) = tanh[U · x(t) + W · h(t−1) + b]

o(t) = V · h(t) + c

Often, we want a probability as our output, so the RNN’s prediction is

ŷ(t) = softmax(o(t))

(written ŷ(t) to keep the prediction distinct from the target y(t))

So, with the parameters attached:

x(t) →U→ h(t) (recurrent weights W, bias b) →V, c→ o(t)
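A minimal NumPy sketch of one step of these equations; the parameter shapes (U: hidden × input, W: hidden × hidden, V: output × hidden) follow from the index conventions above.

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W, V, b, c):
    """One step of the evolution equations:
    h(t) = tanh(U x(t) + W h(t-1) + b),   o(t) = V h(t) + c."""
    h_t = np.tanh(U @ x_t + W @ h_prev + b)
    o_t = V @ h_t + c
    return h_t, o_t

def softmax(o_t):
    """yhat(t) = softmax(o(t)), when a probability output is wanted."""
    e = np.exp(o_t - o_t.max())   # shift by the max for numerical stability
    return e / e.sum()
```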

Computational Graph

Often in the field of deep learning, RNNs are pictorially described by computational graphs.

Notice this is different from the usual deep network picture.

To recover a deep network picture, we perform an operation known as unrolling the graph.
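Computationally, unrolling is just running the same cell once per time step: the unrolled graph is a deep feedforward network whose “layers” all share the parameters U, W, V, b, c. A sketch, reusing rnn_step from above:

```python
def forward(xs, h0, params):
    """Unroll the recurrence over the input sequence. Every time step
    reuses the same parameters: a deep network with shared weights."""
    U, W, V, b, c = params
    hs, os = [], []
    h = h0
    for x_t in xs:                 # one unrolled 'layer' per time step
        h, o_t = rnn_step(x_t, h, U, W, V, b, c)
        hs.append(h)
        os.append(o_t)
    return hs, os
```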

Examples of Computational Graphs

Feeding the output in:
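The slide’s figure is not reproduced here; in this variant the recurrence feeds the previous output, rather than the previous hidden state, back into the network. A sketch of that connectivity, where the name R for the output-to-hidden weights is hypothetical:

```python
def rnn_step_output_fed(x_t, o_prev, U, R, V, b, c):
    """Variant where h(t) sees the previous output o(t-1) instead of
    h(t-1); R (hypothetical name) maps outputs back to hidden units."""
    h_t = np.tanh(U @ x_t + R @ o_prev + b)
    o_t = V @ h_t + c
    return h_t, o_t
```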

Summarizing a sequence:
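Here only the final state matters: after reading x(1) . . . x(τ), the last hidden state h(τ) serves as a fixed-length summary of the whole sequence. A sketch, using forward from above:

```python
def summarize(xs, h0, params):
    """Read the full sequence and keep only h(tau) as its summary."""
    hs, _ = forward(xs, h0, params)
    return hs[-1]
```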

Gradient Descent on the Unrolled Graph

After unrolling the computational graph of an RNN, we can use the same gradient methods that we’re familiar with for deep networks.

Our trainable variables are U, W, V, b and c.
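A minimal sketch of backpropagation through time on the unrolled graph, assuming the softmax cross-entropy loss from the earlier sketches; each gradient line is the chain rule applied to one of the evolution equations.

```python
def bptt(xs, ys, h0, params):
    """Gradients of the summed cross-entropy loss with respect to the
    trainable variables U, W, V, b, c, by backprop through time."""
    U, W, V, b, c = params
    hs, os = forward(xs, h0, params)
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    db, dc = np.zeros_like(b), np.zeros_like(c)
    dh_next = np.zeros_like(h0)            # gradient arriving from step t+1

    for t in reversed(range(len(xs))):
        do = softmax(os[t])
        do[ys[t]] -= 1.0                   # dL/do(t) for cross-entropy
        dV += np.outer(do, hs[t]); dc += do
        dh = V.T @ do + dh_next            # from the output and the future
        da = (1.0 - hs[t] ** 2) * dh       # back through tanh
        h_prev = hs[t - 1] if t > 0 else h0
        dU += np.outer(da, xs[t]); dW += np.outer(da, h_prev); db += da
        dh_next = W.T @ da                 # pass gradient on to step t-1
    return dU, dW, dV, db, dc
```

Each parameter then takes a step against its gradient, exactly as in an ordinary deep network.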

Next lecture

More on the training of RNNs: vanishing and exploding gradients

A tour of the many variations of RNNs

Hopfield

LSTM

Neural Turing Machines