Recurrent Neural Network: The Deepest of Deep Learning
Chen Liang

RNNs and Tensorflow


Page 1: RNNs and Tensorflow

Recurrent Neural Network: The Deepest of Deep Learning

Chen Liang

Page 2: RNNs and Tensorflow

Deep Learning


Page 4: RNNs and Tensorflow

Does deep learning work like the human brain?

Demystify Deep Learning


Page 7: RNNs and Tensorflow

Deep Learning: Building Blocks
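Presumably the building block shown here is a single layer: an affine map followed by a nonlinearity. In standard notation (added for reference, not copied from the slide):

  $a = \sigma(Wx + b)$

where $x$ is the input, $W$ and $b$ are learned parameters, and $\sigma$ is a nonlinearity such as the sigmoid or tanh.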

Page 8: RNNs and Tensorflow

Deep Learning: Deep Composition
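Depth comes from composing these blocks layer after layer (standard formulation, added for reference):

  $f(x) = f_L\big(f_{L-1}(\cdots f_1(x))\big), \qquad f_\ell(z) = \sigma(W_\ell z + b_\ell)$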

Page 9: RNNs and Tensorflow

Deep Learning: Gradient Descent

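Training adjusts all the weights by gradient descent on a loss $L$; the standard update rule (added for reference) is

  $\theta \leftarrow \theta - \eta \, \nabla_{\theta} L(\theta)$

where $\eta$ is the learning rate (the TensorFlow example later in the deck uses $\eta = 0.5$).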

Page 12: RNNs and Tensorflow

Deep Learning: Weight Sharing

Page 13: RNNs and Tensorflow

Recurrent Neural Network: Deepest of Deep Learning?

● Can be infinitely deep

Equation
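The equation here is presumably the standard vanilla-RNN recurrence (standard formulation, not copied from the slide): the same weights are applied at every time step, so unrolling over a long input makes the network arbitrarily deep.

  $h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t + b_h), \qquad y_t = W_{hy} h_t + b_y$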

Page 14: RNNs and Tensorflow

BPTT: Backpropagation Through Time
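BPTT unrolls the recurrence and applies the ordinary chain rule to the unrolled graph; because the weights are shared across time, the gradient sums a contribution from every step. In the standard form (added for reference, with the loss taken at the final step $T$ and $\partial h_t / \partial W$ denoting the immediate partial):

  $\dfrac{\partial L}{\partial W} = \sum_{t=1}^{T} \dfrac{\partial L}{\partial h_t}\,\dfrac{\partial h_t}{\partial W}, \qquad \dfrac{\partial L}{\partial h_t} = \dfrac{\partial L}{\partial h_T} \prod_{k=t}^{T-1} \dfrac{\partial h_{k+1}}{\partial h_k}$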

Page 15: RNNs and Tensorflow

Recurrent Neural Network

Short-term dependency

Long-term dependency

Exploding/vanishing gradient
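Why long spans are hard follows from the Jacobian product in BPTT above (standard analysis, not copied from the slide): for the vanilla recurrence,

  $\dfrac{\partial h_{k+1}}{\partial h_k} = \operatorname{diag}\!\big(1 - h_{k+1}^2\big)\, W_{hh}, \qquad \Big\lVert \prod_{k=t}^{T-1} \dfrac{\partial h_{k+1}}{\partial h_k} \Big\rVert \lesssim \big(\gamma \lVert W_{hh} \rVert\big)^{T-t}$

with $\gamma$ bounding the tanh derivative, so the factor linking a distant input to the loss shrinks toward zero (vanishing) or blows up (exploding) roughly exponentially in the span length $T - t$.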

Page 16: RNNs and Tensorflow

LSTM (Long Short-Term Memory): add a direct pathway for the gradient

Page 17: RNNs and Tensorflow

LSTM (Long Short-Term Memory): forget gate

Page 18: RNNs and Tensorflow

LSTM (Long Short-Term Memory): input gate

Page 19: RNNs and Tensorflow

LSTM (Long Short-Term Memory): update the memory using the forget gate and input gate

Page 20: RNNs and Tensorflow

LSTM (Long Short-Term Memory): output gate

Page 21: RNNs and Tensorflow

LSTM (Long Short-Term Memory): putting it together
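For reference, the standard LSTM equations that these slides build up, in the notation of Colah's blog (added here, not copied from the slides):

  $f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$   (forget gate)
  $i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$   (input gate)
  $\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)$   (candidate memory)
  $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$   (memory update: the direct pathway)
  $o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$   (output gate)
  $h_t = o_t \odot \tanh(c_t)$   (new hidden state)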

Page 22: RNNs and Tensorflow

RNN: A General Framework

Machine Translation

Speech Recognition

Language Modeling

Sentiment Analysis

Image Recognition

Image Caption Generation

Page 23: RNNs and Tensorflow

Char-RNN: How does it work?

Vocabulary:

[“h”, “e”, “l”, “o”]

Training sequence:

“hello”
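A minimal NumPy sketch of what one unrolled pass looks like on this toy example (a hypothetical illustration with random, untrained weights, loosely following Karpathy's min-char-rnn; it is not the homework code):

import numpy as np

# One vanilla char-RNN pass over the toy example from the slide.
vocab = ["h", "e", "l", "o"]
char_to_ix = {ch: i for i, ch in enumerate(vocab)}
V, H = len(vocab), 8                     # vocabulary size, hidden size

def one_hot(ch):
    x = np.zeros((V, 1))
    x[char_to_ix[ch]] = 1.0
    return x

Wxh = np.random.randn(H, V) * 0.01       # input  -> hidden
Whh = np.random.randn(H, H) * 0.01       # hidden -> hidden
Why = np.random.randn(V, H) * 0.01       # hidden -> output
bh, by = np.zeros((H, 1)), np.zeros((V, 1))

h = np.zeros((H, 1))                     # initial hidden state
for ch, target in zip("hell", "ello"):   # training sequence "hello"
    h = np.tanh(Wxh @ one_hot(ch) + Whh @ h + bh)    # recurrence
    scores = Why @ h + by                             # logits over the vocab
    probs = np.exp(scores) / np.sum(np.exp(scores))   # softmax
    print(ch, "-> p(next char = %r) = %.3f" % (target, probs[char_to_ix[target], 0]))

Training would nudge the weights so that the probability assigned to each target character goes up.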

Page 24: RNNs and Tensorflow

Char-RNN

Linux

LaTeX

Wikipedia

Music

Check out the blog:

The Unreasonable Effectiveness of Recurrent Neural Networks

Page 25: RNNs and Tensorflow

TensorFlow

TensorFlow™ is an open source software library for numerical computation using data flow graphs.

Page 26: RNNs and Tensorflow

TensorFlow: Computation Graph

Import TensorFlow and NumPy:

import tensorflow as tf
import numpy as np

Page 27: RNNs and Tensorflow

Synthesize some noisy data from a linear model:

# Create 100 phony x, y data points in NumPy, y = x * 0.1 + 0.3
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.1 + 0.3

Page 28: RNNs and Tensorflow

# Try to find values for W and b that compute y_data = W * x_data + b
# (We know that W should be 0.1 and b 0.3, but Tensorflow will
# figure that out for us.)
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = W * x_data + b

(Graph so far: W and x_data feed a "*" node; its output and b feed a "+" node, producing y.)

Page 29: RNNs and Tensorflow

# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

(Graph now extended: y and y_data feed the loss node, which the optimizer's train op minimizes.)

Page 30: RNNs and Tensorflow

TensorFlow: Session

# Before starting, initialize the variables. We will 'run' this first.
init = tf.initialize_all_variables()

# Launch the graph.
sess = tf.Session()
sess.run(init)

# Fit the line.
for step in xrange(201):
    sess.run(train)
    if step % 20 == 0:
        print(step, sess.run(W), sess.run(b))

# Learns best fit is W: [0.1], b: [0.3]


Page 32: RNNs and Tensorflow

TensorBoard Demo
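A minimal sketch of how one might hook the linear-regression example above up to TensorBoard (hypothetical; it uses the TF 1.x summary API, which differs slightly from the older API used elsewhere in these slides):

# Assumes `loss`, `train`, and `sess` from the previous slides already exist.
tf.summary.scalar("loss", loss)                  # record the loss value each step
merged = tf.summary.merge_all()
writer = tf.summary.FileWriter("/tmp/linreg_logs", sess.graph)

for step in range(201):
    summary, _ = sess.run([merged, train])
    writer.add_summary(summary, step)

# Then view the graph and loss curve with:  tensorboard --logdir=/tmp/linreg_logs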


Page 35: RNNs and Tensorflow

Now the part that everybody hates...

Page 36: RNNs and Tensorflow

Jon Snow is dead

Page 37: RNNs and Tensorflow

Homework

Part 1: Backpropagation and gradient check (a gradient-check sketch follows below)

● NumPy

Part 2: Char-RNN

● Undergrad/Grad Descent

○ Gradient descent => graduate descent

○ Systematic search of hyperparameters

● Do something fun with it!

Use gradients to find the best parameters.

Use graduate students to find the best hyperparameters.
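For Part 1, a common way to verify a backpropagation implementation is a centered finite-difference gradient check; here is a minimal NumPy sketch (the function and helper names are hypothetical, not the homework's actual API):

import numpy as np

# Compare an analytic gradient against centered finite differences.
def numerical_gradient(f, x, eps=1e-5):
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"], op_flags=["readwrite"])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + eps; fp = f(x)      # f(x + eps)
        x[idx] = old - eps; fm = f(x)      # f(x - eps)
        x[idx] = old                       # restore the original value
        grad[idx] = (fp - fm) / (2 * eps)
        it.iternext()
    return grad

# Example: f(x) = sum(x**2), whose analytic gradient is 2*x.
x = np.random.randn(3, 4)
num = numerical_gradient(lambda z: np.sum(z ** 2), x)
ana = 2 * x
rel_err = np.max(np.abs(num - ana) / (np.abs(num) + np.abs(ana) + 1e-12))
print("max relative error:", rel_err)      # should be around 1e-8 or smaller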

Page 38: RNNs and Tensorflow

References

Christopher Olah’s Blog: http://colah.github.io/

Andrej Karpathy’s Blog: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

David Silver’s Talk: http://videolectures.net/rldm2015_silver_reinforcement_learning/

Geoffrey Hinton’s Coursera Talk: https://class.coursera.org/neuralnets-2012-001/lecture