
CS 4803 / 7643: Deep Learning

Zsolt Kira, Georgia Tech

Topics:
– Recurrent Neural Networks (RNNs)
– BackProp Through Time (BPTT)

Plan for Today
• Model
– Recurrent Neural Networks (RNNs)
• Learning
– BackProp Through Time (BPTT)

(C) Dhruv Batra and Zsolt Kira

New Topic: RNNs

Image Credit: Andrej Karpathy

New Words
• Recurrent Neural Networks (RNNs)
• Recursive Neural Networks
– General family; think graphs instead of chains
• Types:
– “Vanilla” RNNs (Elman Networks)
– Long Short-Term Memory (LSTMs)
– Gated Recurrent Units (GRUs)
– …
• Algorithms:
– BackProp Through Time (BPTT)
– BackProp Through Structure (BPTS)


What’s wrong with MLPs?
• Problem 1: Can’t model sequences
– Fixed-sized inputs & outputs
– No temporal structure
• Problem 2: Pure feed-forward processing
– No “memory”, no feedback

Image Credit: Alex Graves, book

Why model sequences?

Figure Credit: Carlos Guestrin

Image Credit: Alex Graves

Sequences are everywhere…

Image Credit: Alex Graves and Kevin Gimpel


Even where you might not expect a sequence…

Image Credit: Vinyals et al.

Ba, Mnih, and Kavukcuoglu, “Multiple Object Recognition with Visual Attention”, ICLR 2015.
Gregor et al., “DRAW: A Recurrent Neural Network For Image Generation”, ICML 2015.
Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with permission.

Classify images by taking a series of “glimpses”

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

• Output ordering = sequence

Image Credit: Ba et al.; Gregor et al.

Sequences in Input or Output?
• It’s a spectrum…

Input: No sequence → Output: No sequence
Example: “standard” classification / regression problems

Input: No sequence → Output: Sequence
Example: Im2Caption

Input: Sequence → Output: No sequence
Example: sentence classification, multiple-choice question answering

Input: Sequence → Output: Sequence
Example: machine translation, video classification, video captioning, open-ended question answering

Image Credit: Andrej Karpathy

2 Key Ideas
• Parameter Sharing
– in computation graphs = adding gradients

Slide Credit: Marc'Aurelio Ranzato

Computational Graph
Gradients add at branches: when one value feeds multiple downstream paths, the gradients flowing back along those paths are summed.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
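To make the branching rule concrete, here is a minimal sketch (the names and values are illustrative, not from the slides): a shared scalar w feeds two branches, and its total gradient is the sum of the per-branch gradients.

```python
# Minimal sketch: y = w*a + w*b uses w twice (two branches in the graph).
w, a, b = 2.0, 3.0, 4.0
y = w * a + w * b

dw_branch1 = a                    # gradient of y w.r.t. w through the first use
dw_branch2 = b                    # gradient through the second use
dw = dw_branch1 + dw_branch2      # gradients add at branches: dy/dw = a + b

# numerical check with a finite difference
eps = 1e-6
assert abs(((w + eps) * a + (w + eps) * b - y) / eps - dw) < 1e-4
```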

2 Key Ideas
• Parameter Sharing
– in computation graphs = adding gradients
• “Unrolling”
– in computation graphs with parameter sharing

How do we model sequences?
• No input
• With inputs

Image Credit: Bengio, Goodfellow, Courville

2 Key Ideas
• Parameter Sharing
– in computation graphs = adding gradients
• “Unrolling”
– in computation graphs with parameter sharing
• Parameter sharing + Unrolling
– Allows modeling arbitrary sequence lengths!
– Keeps the number of parameters in check

Recurrent Neural Network
[Figure: input x → RNN → output y; we usually want to predict a vector at some time steps]
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Recurrent Neural Network
We can process a sequence of vectors x by applying a recurrence formula at every time step:

h_t = f_W(h_{t-1}, x_t)

where h_t is the new state, h_{t-1} the old state, x_t the input vector at time step t, and f_W some function with parameters W.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Notice: the same function f_W and the same set of parameters W are used at every time step.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

(Vanilla) Recurrent Neural Network
The state consists of a single “hidden” vector h:

h_t = tanh(W_hh h_{t-1} + W_xh x_t)
y_t = W_hy h_t

Sometimes called a “Vanilla RNN” or an “Elman RNN” after Prof. Jeffrey Elman.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
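As a concrete sketch of these equations in NumPy (the sizes, seed, and bias-free form are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 4, 3                              # input and hidden sizes (illustrative)
Wxh = rng.standard_normal((H, D)) * 0.1
Whh = rng.standard_normal((H, H)) * 0.1
Why = rng.standard_normal((D, H)) * 0.1

def rnn_step(h_prev, x):
    """One step: h_t = tanh(W_hh h_{t-1} + W_xh x_t), y_t = W_hy h_t."""
    h = np.tanh(Whh @ h_prev + Wxh @ x)
    return h, Why @ h

h = np.zeros(H)
for x in rng.standard_normal((5, D)):    # a length-5 input sequence
    h, y = rnn_step(h, x)                # the same weights are reused each step
```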

RNN: Computational Graph
[Figure: unrolled graph h_0 → f_W → h_1 → f_W → h_2 → f_W → h_3 → … → h_T, with inputs x_1, x_2, x_3, … feeding the f_W at each step]
Re-use the same weight matrix W at every time-step.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

RNN: Computational Graph: Many to Many
[Figure: each hidden state h_t also produces an output y_t; each y_t has a per-step loss L_t, and the total loss is L = Σ_t L_t]
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
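A minimal sketch of this many-to-many unrolling with a per-step softmax cross-entropy loss (all sizes and data are made up for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(1)
D, H, V, T = 4, 3, 4, 6                  # illustrative sizes
Wxh = rng.standard_normal((H, D)) * 0.1
Whh = rng.standard_normal((H, H)) * 0.1
Why = rng.standard_normal((V, H)) * 0.1
xs = rng.standard_normal((T, D))         # input sequence x_1 .. x_T
targets = rng.integers(0, V, size=T)     # ground-truth class at each step

h = np.zeros(H)
L = 0.0
for t in range(T):
    h = np.tanh(Whh @ h + Wxh @ xs[t])   # h_t
    p = softmax(Why @ h)                 # distribution from y_t
    L += -np.log(p[targets[t]])          # per-step loss L_t; total L = sum_t L_t
```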

RNN: Computational Graph: Many to One
[Figure: the inputs x_1, x_2, x_3, … are processed through the chain, and a single output y is produced from the final hidden state h_T]
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

RNN: Computational Graph: One to Many
[Figure: a single input x initializes the chain; an output y_t is produced at every time step]
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Sequence to Sequence: Many-to-one + One-to-many
Many to one: encode the input sequence into a single vector (encoder, parameters W_1).
One to many: produce the output sequence from that single vector (decoder, parameters W_2).
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
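A sketch of this encoder-decoder split, with separate parameter sets standing in for W_1 and W_2 (the sizes and bias-free form are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
D, H = 4, 3
enc_Wxh = rng.standard_normal((H, D)) * 0.1   # encoder parameters (W_1)
enc_Whh = rng.standard_normal((H, H)) * 0.1
dec_Whh = rng.standard_normal((H, H)) * 0.1   # decoder parameters (W_2)
dec_Why = rng.standard_normal((D, H)) * 0.1

def encode(xs):
    """Many to one: compress the whole input sequence into one vector."""
    h = np.zeros(H)
    for x in xs:
        h = np.tanh(enc_Whh @ h + enc_Wxh @ x)
    return h

def decode(h, steps):
    """One to many: unroll outputs from the encoder's summary vector."""
    ys = []
    for _ in range(steps):
        h = np.tanh(dec_Whh @ h)
        ys.append(dec_Why @ h)
    return ys

ys = decode(encode(rng.standard_normal((5, D))), steps=3)
```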

Example: Character-level Language Model
Vocabulary: [h, e, l, o]
Example training sequence: “hello”
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Distributed Representations Toy Example
• Local vs Distributed
Slide Credit: Moontae Lee

Distributed Representations Toy Example
• Can we interpret each dimension?
Slide Credit: Moontae Lee

Power of distributed representations!
[Figure: comparison of a local code vs a distributed code]
Slide Credit: Moontae Lee

Example: Character-level Language Model (continued)
Vocabulary: [h, e, l, o]; example training sequence: “hello”

Training Time: MLE / “Teacher Forcing” — at each step, feed the ground-truth previous character (not the model’s own prediction) and maximize the likelihood of the true next character.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
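A minimal sketch of one teacher-forced training pass over “hello” (untrained weights, no gradient step shown; the sizes are illustrative):

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']
ix = {c: i for i, c in enumerate(vocab)}
rng = np.random.default_rng(3)
V, H = len(vocab), 8
Wxh = rng.standard_normal((H, V)) * 0.1
Whh = rng.standard_normal((H, H)) * 0.1
Why = rng.standard_normal((V, H)) * 0.1

def one_hot(c):
    v = np.zeros(V)
    v[ix[c]] = 1.0
    return v

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

h, loss = np.zeros(H), 0.0
for prev, nxt in zip("hello", "hello"[1:]):
    h = np.tanh(Whh @ h + Wxh @ one_hot(prev))  # feed the TRUE previous char
    p = softmax(Why @ h)
    loss += -np.log(p[ix[nxt]])                 # MLE: maximize P(true next char)
```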

Example: Character-level Language Model — Sampling
Vocabulary: [h, e, l, o]
At test time, sample characters one at a time and feed them back into the model.

Softmax output at each step (a distribution over [h, e, l, o]):

        t=1   t=2   t=3   t=4
h       .03   .25   .11   .11
e       .13   .20   .17   .02
l       .00   .05   .68   .08
o       .84   .50   .03   .79
Sample: “e”   “l”   “l”   “o”

Test Time: Sample / Argmax
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
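A self-contained sketch of the test-time loop (the weights here are random and untrained, purely to show the mechanics; swap the sampling line for np.argmax(p) to get argmax decoding):

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']
ix = {c: i for i, c in enumerate(vocab)}
rng = np.random.default_rng(4)
V, H = len(vocab), 8
Wxh = rng.standard_normal((H, V)) * 0.1   # untrained weights, for illustration
Whh = rng.standard_normal((H, H)) * 0.1
Why = rng.standard_normal((V, H)) * 0.1

h, c, out = np.zeros(H), 'h', []
for _ in range(4):
    x = np.zeros(V)
    x[ix[c]] = 1.0                        # one-hot of the previous character
    h = np.tanh(Whh @ h + Wxh @ x)
    z = Why @ h
    p = np.exp(z - z.max())
    p /= p.sum()                          # softmax over [h, e, l, o]
    c = vocab[rng.choice(V, p=p)]         # sample a character, feed it back in
    out.append(c)
print(''.join(out))
```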


Backpropagation Through Time
Forward through the entire sequence to compute the loss, then backward through the entire sequence to compute the gradient.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Truncated Backpropagation Through Time
Run forward and backward through chunks of the sequence instead of the whole sequence: carry hidden states forward in time forever, but only backpropagate for some smaller number of steps.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
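A sketch of the truncated-BPTT schedule (the gradient computation itself is elided; the chunk size and tensor sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
D, H, T, trunc = 4, 3, 100, 25
Wxh = rng.standard_normal((H, D)) * 0.1
Whh = rng.standard_normal((H, H)) * 0.1
xs = rng.standard_normal((T, D))

h = np.zeros(H)                           # carried forward across ALL chunks
for start in range(0, T, trunc):
    cached = [h]                          # cache states for this chunk only
    for x in xs[start:start + trunc]:
        h = np.tanh(Whh @ h + Wxh @ x)
        cached.append(h)
    # ...compute the chunk's loss and backprop through `cached` here...
    # Gradients stop at the chunk boundary: the carried-in h is treated as a
    # constant (the equivalent of h.detach() in an autograd framework).
```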

Deep (Stacked) RNNs
[Figure: several RNN layers stacked between input x and output y; the hidden-state sequence of one layer is the input sequence of the next]
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
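A sketch of a two-layer stacked RNN, where layer 1's hidden-state sequence is the input sequence of layer 2 (the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
D, H = 4, 3
layers = [
    {"Wxh": rng.standard_normal((H, D)) * 0.1,   # layer 1 reads the input x
     "Whh": rng.standard_normal((H, H)) * 0.1},
    {"Wxh": rng.standard_normal((H, H)) * 0.1,   # layer 2 reads layer 1's state
     "Whh": rng.standard_normal((H, H)) * 0.1},
]

hs = [np.zeros(H) for _ in layers]               # one hidden state per layer
for x in rng.standard_normal((5, D)):            # a length-5 input sequence
    inp = x
    for l, layer in enumerate(layers):
        hs[l] = np.tanh(layer["Whh"] @ hs[l] + layer["Wxh"] @ inp)
        inp = hs[l]                              # feeds the layer above
```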

[Figure: text sampled from a character-level RNN — near-random at first, becoming increasingly text-like as we train more]
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

The Stacks Project: open source algebraic geometry textbook
LaTeX source: http://stacks.math.columbia.edu/
The Stacks Project is licensed under the GNU Free Documentation License.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

[Figure: fake “algebraic geometry” LaTeX sampled from a model trained on the Stacks Project source]
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Generated C code
[Figure: C-like source code sampled from the model]
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Searching for interpretable cells
Karpathy, Johnson, and Fei-Fei, “Visualizing and Understanding Recurrent Networks”, ICLR Workshop 2016. Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission.
[Figures: activations of individual hidden cells visualized over text, revealing interpretable units:]
• quote detection cell
• line length tracking cell
• if statement cell
• quote/comment cell
• code depth cell
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Neural Image Captioning
[Figure: CNN pipeline — Convolution Layer + Non-Linearity → Pooling Layer → Convolution Layer + Non-Linearity → Pooling Layer → Fully-Connected MLP → 4096-dim Image Embedding (VGGNet)]

Neural Image Captioning
[Figure: the 4096-dim VGGNet image embedding passes through a Linear layer into an RNN; starting from <start>, the RNN outputs P(next word) at each step, generating “Two people and two horses.”]
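A schematic of this decode loop (everything here — the toy vocabulary, sizes, random stand-in “VGGNet embedding”, and greedy decoding — is an illustrative assumption, not the slide's trained model):

```python
import numpy as np

rng = np.random.default_rng(8)
VOCAB = ['<start>', 'two', 'people', 'and', 'horses', '.', '<end>']
V, H, E = len(VOCAB), 16, 4096

W_img = rng.standard_normal((H, E)) * 0.01   # the Linear layer: embedding -> h
Wxh = rng.standard_normal((H, V)) * 0.1
Whh = rng.standard_normal((H, H)) * 0.1
Why = rng.standard_normal((V, H)) * 0.1

img = rng.standard_normal(E)                 # stand-in 4096-dim image embedding
h = np.tanh(W_img @ img)                     # condition the RNN on the image

word, caption = '<start>', []
for _ in range(6):
    x = np.zeros(V)
    x[VOCAB.index(word)] = 1.0               # one-hot of the previous word
    h = np.tanh(Whh @ h + Wxh @ x)
    z = Why @ h
    p = np.exp(z - z.max())
    p /= p.sum()                             # P(next word)
    word = VOCAB[int(np.argmax(p))]          # greedy choice (untrained weights)
    caption.append(word)
```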


Beam Search
• Proceed from left to right
• Maintain N partial captions
• Expand each caption with possible next words
• Discard all but the top N new partial captions
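A minimal sketch of this procedure over a toy vocabulary; next_word_probs is a deterministic stand-in (an assumption) for the RNN's P(next word | partial caption):

```python
import numpy as np

VOCAB = ['a', 'cat', 'sat', '<end>']

def next_word_probs(partial):
    # toy stand-in for the model: deterministic pseudo-probabilities
    rng = np.random.default_rng(sum(len(w) for w in partial) + 1)
    p = rng.random(len(VOCAB))
    return p / p.sum()

def beam_search(N=2, max_len=4):
    beams = [([], 0.0)]                         # (partial caption, log prob)
    for _ in range(max_len):                    # proceed left to right
        candidates = []
        for words, lp in beams:
            if words and words[-1] == '<end>':
                candidates.append((words, lp))  # finished caption; keep as-is
                continue
            p = next_word_probs(words)
            for i, w in enumerate(VOCAB):       # expand with every next word
                candidates.append((words + [w], lp + np.log(p[i])))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:N]
    return beams                                # top-N partial captions

for words, lp in beam_search():
    print(' '.join(words), round(lp, 2))
```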

Image Captioning: Example Results
• A cat sitting on a suitcase on the floor
• A cat is sitting on a tree branch
• A dog is running in the grass with a frisbee
• A white teddy bear sitting in the grass
• Two people walking on the beach with surfboards
• Two giraffes standing in a grassy field
• A man riding a dirt bike on a dirt track
• A tennis player in action on the court
Captions generated using neuraltalk2. All images are CC0 Public domain: cat suitcase, cat tree, dog, bear, surfers, tennis, giraffe, motorcycle.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Image Captioning: Failure Cases
• A woman is holding a cat in her hand
• A woman standing on a beach holding a surfboard
• A person holding a computer mouse on a desk
• A bird is perched on a tree branch
• A man in a baseball uniform throwing a ball
Captions generated using neuraltalk2. All images are CC0 Public domain: fur coat, handstand, spider web, baseball.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Image Captioning with Attention
[Figure: a CNN processes the image (H x W x 3) into a grid of features (L x D). At each step, the hidden state produces a distribution a_t over the L locations; the weighted combination of the features gives a context vector z_t (dimension D), which is fed into the RNN together with the previous word y_{t-1}. Each step outputs a distribution d_t over the vocabulary and the next attention distribution a_{t+1}.]
Xu et al., “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
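A sketch of the soft-attention step itself — score the locations, softmax into a_t, weighted-sum into z_t. The dot-product scorer and matching sizes are illustrative choices, not the paper's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(7)
L, D = 6, 5                              # L locations, D-dim features
features = rng.standard_normal((L, D))   # CNN feature grid, flattened to L x D
h = rng.standard_normal(D)               # decoder state (same size as D here)

scores = features @ h                    # one score per location
a = np.exp(scores - scores.max())
a /= a.sum()                             # attention distribution over L locations
z = a @ features                         # weighted combination: D-dim context z_t

assert np.isclose(a.sum(), 1.0) and z.shape == (D,)
```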

Image Captioning with Attention: Soft attention vs. Hard attention
[Figure: attention maps over generation steps; soft attention spreads weight over all locations, hard attention selects a single location]
Xu et al., “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015. Figure copyright Kelvin Xu, Jimmy Lei Ba, Jamie Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio, 2015. Reproduced with permission.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
