Deep Learning for Natural Language Processing

Stephen Clark et al.

DeepMind and University of Cambridge

5. Recurrent Neural Networks

Felix Hill, DeepMind

What are neural nets for?

How can you apply a neural net to language?

“language does not naturally go here, ahem, but fortunately…”

What’s the issue here?

That’s the whole point!

What is James doing in the store room?

searching for a book…

What is that empty cup doing over there?

err… being a cup?

time flies like an arrow

fruit flies like a banana

The networks that are good at Go and Atari were first developed for this reason!

Finding structure in time - Elman, 1990

The simple recurrent network (now RNN)

The network reads the sentence one word at a time; at each step the hidden state carries the context of all the words seen so far:

input: now      context: (empty)
input: this     context: now
input: is       context: now this
input: a        context: now this is
input: story    context: now this is a
input: all      context: now this is a story
input: about    context: now this is a story all
input: how      context: now this is a story all about
input: my       context: now this is a story all about how
input: life     context: now this is a story all about how my
input: got      context: now this is a story all about how my life
input: flipped  context: now this is a story all about how my life got
input: turned   context: now this is a story all about how my life got flipped
input: upside   context: now this is a story all about how my life got flipped turned
input: down     context: now this is a story all about how my life got flipped turned upside

Suppose we have a vocabulary of 100k words.

How many weights are there in Elman’s network?

h_t = tanh(U h_{t-1} + W x_t)

y_t = V h_t
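A minimal NumPy sketch of one step of Elman's network (not from the slides; the hidden size of 128 is assumed, the slide fixes only the 100k vocabulary). Counting the entries of W, U and V answers the quiz for any hidden size d: 100k·d + d·d + d·100k.

```python
import numpy as np

vocab_size = 100_000    # vocabulary of 100k words (from the slide)
hidden_size = 128       # assumed hidden size; the slide does not fix it

# Parameters of h_t = tanh(U h_{t-1} + W x_t), y_t = V h_t
W = 0.01 * np.random.randn(hidden_size, vocab_size)   # input  -> hidden
U = 0.01 * np.random.randn(hidden_size, hidden_size)  # hidden -> hidden
V = 0.01 * np.random.randn(vocab_size, hidden_size)   # hidden -> output

# Weight count: 100k*d + d*d + d*100k (biases omitted, as in the equations)
print(W.size + U.size + V.size)   # 25,616,384 for d = 128

def step(h_prev, x):
    """One timestep: consume a one-hot word x, update the hidden state."""
    h = np.tanh(U @ h_prev + W @ x)
    y = V @ h                      # scores over the next word
    return h, y
```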


What is represented here?

(the hidden state whose context is “now this is a story all about how my life got flipped turned upside”, as the network reads “down”)

What is represented here?

(the hidden state after the full sentence “now this is a story all about how my life got flipped turned upside down”)
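Concretely, continuing the toy NumPy sketch above (word ids are assumed for illustration), the “representation” is simply the hidden vector left over after the last word:

```python
# Feed a sentence through the sketch above; the final h encodes the prefix.
sentence = "now this is a story all about how my life".split()
word_id = {w: i for i, w in enumerate(dict.fromkeys(sentence))}  # toy ids

h = np.zeros(hidden_size)
for w in sentence:
    x = np.zeros(vocab_size)
    x[word_id[w]] = 1.0            # one-hot encoding of the current word
    h, y = step(h, x)
# h is now a fixed-size vector summarising the whole word sequence
```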

Finding structure in time

Finding more structure in time

Any downsides?

input: down     context: now this is a story all about how my life got flipped turned upside

(diagram: the network unrolled over the sentence, one word per timestep; the signal tying the prediction “down” to the early words must pass through every intermediate step)

“Vanishing” gradients

(diagram: a chain of units x → a_1 → a_2 → … → a_n, with weights w_1, …, w_n along the chain and cost C(f(x), y) at the end)

dC/dw_1 ∝ σ′(z_1) × w_2 × σ′(z_2) × w_3 × … × w_n × σ′(z_n) × dC/da_n,   where a_i = σ(z_i)
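A small numeric illustration of why this product shrinks (not from the slides; the depth n = 20 and the random weights and pre-activations are assumed for the demo). The logistic sigmoid's derivative never exceeds 0.25, so a chain of such factors decays geometrically:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)      # at most 0.25, attained at z = 0

rng = np.random.default_rng(0)
n = 20                        # chain depth (assumed for the demo)
zs = rng.normal(size=n)       # pre-activations along the chain
ws = rng.normal(size=n)       # weights along the chain

# dC/dw_1 is proportional to a product of w_i * sigmoid'(z_i) factors;
# each factor is typically well below 1, so the product all but vanishes.
grad_factor = np.prod([w * sigmoid_prime(z) for w, z in zip(ws, zs)])
print(grad_factor)            # a vanishingly small number for n = 20
```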

“Vanishing” (or exploding) gradients in an RNN

(plot: σ(x) and σ′(x); the sigmoid’s derivative is at most 0.25)

In an RNN the recurrent weight is shared across timesteps, so the same w enters the product at every step:

dC/dw ∝ w^n × σ′(z_1) × … × σ′(z_n) × dC/da_n

Small change to w, big consequences: the gradient vanishes when |w| < 1 and can explode when |w| > 1.
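A quick check of that claim (an illustration; the length n = 50 is assumed):

```python
# Powers of a shared recurrent weight over n = 50 timesteps.
for w in (0.9, 1.0, 1.1):
    print(f"w = {w}: w**50 = {w ** 50:.4g}")
# w = 0.9: w**50 = 0.005154   (vanishes)
# w = 1.0: w**50 = 1
# w = 1.1: w**50 = 117.4      (explodes)
```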

One final thing… no output words…

BPTT: backpropagation through time
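A minimal sketch of BPTT using PyTorch autograd (an assumed toy setup; the sizes, optimiser and loss are illustrative, not from the lecture): unroll the network over the sequence, sum the next-word losses, and let backward() carry gradients through every timestep.

```python
import torch
import torch.nn.functional as F

vocab, d = 100, 32                                   # toy sizes (assumed)
W = torch.nn.Parameter(0.1 * torch.randn(d, vocab))  # input  -> hidden
U = torch.nn.Parameter(0.1 * torch.randn(d, d))      # hidden -> hidden
V = torch.nn.Parameter(0.1 * torch.randn(vocab, d))  # hidden -> output
opt = torch.optim.SGD([W, U, V], lr=0.1)

def bptt_step(tokens):
    """Unroll the Elman network over `tokens` and backpropagate once."""
    h = torch.zeros(d)
    loss = torch.zeros(())
    for t in range(len(tokens) - 1):
        x = F.one_hot(torch.tensor(tokens[t]), vocab).float()
        h = torch.tanh(U @ h + W @ x)        # h_t = tanh(U h_{t-1} + W x_t)
        logits = V @ h                       # y_t = V h_t
        loss = loss + F.cross_entropy(logits.unsqueeze(0),
                                      torch.tensor([tokens[t + 1]]))
    opt.zero_grad()
    loss.backward()    # gradients flow back through every timestep: BPTT
    opt.step()
    return loss.item()

print(bptt_step([3, 1, 4, 1, 5, 9]))         # toy word-id sequence
```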

But, more typically…

http://www.cs.toronto.edu/~ilya/rnn.html

References

Finding structure in time (Elman, 1990)
Description and analysis of a recurrent neural network; inference of structure in unsegmented sequences.

Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks (Graves et al., 2006)
Scales Elman up to the ML age.

Recurrent neural network based language model (Mikolov et al., 2010)
Scales Graves up to running text.

Learning to understand phrases by embedding the dictionary (Hill et al., 2015)
Learns to predict words from dictionary definitions. Recommended.