45
Deep Learning for Natural Language Processing Stephen Clark et al… DeepMind and University of Cambridge

Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

Deep Learning for Natural Language

ProcessingStephen Clark et al…

DeepMind and University of Cambridge

Page 2: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

5. Recurrent Neural Networks

Felix HillDeepMind

Page 3: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

What are neural nets for?

Page 4: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

What are neural nets for?

Page 5: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

How can you apply a neural net to language?

“language does not naturally go here, ahem, but fortunately…..”

Page 6: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

How can you apply a neural net to language?

“language does not naturally go here, ahem, but fortunately…..”

what’s the issue here????

Page 7: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

That’s the whole point!!

Page 8: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

What is James doing in the store room?

Page 9: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

searching for a book…

Page 10: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

What is that empty cup doing over there?

Page 11: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

err..being a cup?

Page 12: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

time flies like an arrow

Page 13: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

fruit flies like a banana

Page 14: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

The networks that are good at Go and Atari were first developed for this reason!

Page 15: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

Finding structure in time - Elman, 1990

Page 16: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

The simple recurrent network (now RNN)

Page 17: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in
Page 18: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

now

Page 19: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

now

this

Page 20: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

this now

is

Page 21: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

is now this

a

Page 22: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

a now this is

story

Page 23: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

story now this is a

all

Page 24: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

all now this is a story

about

Page 25: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

about now this is a story all

how

Page 26: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

how now this is a story all about

my

Page 27: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

my now this is a story all about how

life

Page 28: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

life now this is a story all about how my

got

Page 29: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

got now this is a story all about how my life

flipped

Page 30: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

flipped now this is a story all about how my life got

turned

Page 31: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

turned now this is a story all about how my life got flipped

upside

Page 32: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

upside now this is a story all about how my life got flipped turned

down

Page 33: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

down now this is a story all about how my life got flipped turned upside

Page 34: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

Suppose we have a vocabulary of 100k words.

How many weights are there in Elman’s network?

Page 35: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

ht = tanh(Uht�1 +Wxt)

yt = V ht

A

B

C

D

E

F

G

Page 36: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

down

what is represented here?

now this is a story all about how my life got flipped turned upside

Page 37: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

what is represented here?

now this is a story all about how my life got flipped turned upside down

Page 38: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

Finding structure in time

Page 39: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

Finding more structure in time

Page 40: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

Any downsides?

down now this is a story all about how my life got flipped turned upside

now this

is a

stor

y all a

bout

how

my

life

got flipp

ed tur

ned

=

upside

Page 41: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

“Vanishing” gradients

now this

is a

stor

y all a

bout

how

my

life

got flipp

ed tur

ned

upside =

a1

a2

wn

wn�1

w1

c(f(x), y) y

x =

dC

dw1/ �0(z1)⇥ w2 ⇥ �0(z2)⇥ w3 · · ·⇥ wn ⇥ �0(zn)⇥

dC

dan

whereai = �(zi)

an

f

Page 42: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

“Vanishing” gradients

now this

is a

stor

y all a

bout

how

my

life

got flipp

ed tur

ned

upside =

a1

a2

wn

wn�1

w1

c(f(x), y) y

x =

an

�(x)�

0(x)

dC

dw1/ wn ⇥ �0(z1)⇥ · · ·⇥ �0(zn)⇥

dC

dan

(or exploding)

small change, big consequencesf

in an RNN

Page 43: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

One final thing…no output words…..

BPTT

Page 44: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

But, more typically…

http://www.cs.toronto.edu/~ilya/rnn.html

Page 45: Deep Learning for Natural Language Processing...References Finding structure in time (Elman, 1990) Description and analysis of a recurrent neural network, inference of structure in

ReferencesFinding structure in time (Elman, 1990)

Description and analysis of a recurrent neural network, inference of structure in unsegmented sequences

Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks (Graves et al, 2006)

Scales Elman up to the ML age

Recurrent neural network-based language model (Mikolov et al. 2010)

Scale Graves up to running text

Learning to understand phrases by embedding the dictionary (Hill et al. 2015)

Learns to predict words from dictionary definitions