LE Thien Hoa
30th Annual Conference on
Neural Information Processing Systems
NIPS 2016, Barcelona
Topics
• Deep Reinforcement Learning & Robotics
• Generative Adversarial Networks
• RNN variants
• Meta-learning
• Neuroscience
• Optimization
• Machine Learning
• Natural Language Processing
• …
In this talk
• Nuts and Bolts of Applying Deep Learning
• RNN variants & limitations
• Natural Language Processing
Nuts and Bolts of Applying Deep Learning
Source: Andrew Ng, NIPS 2016
End-to-End Deep Learning
Source: Andrew Ng, NIPS 2016
Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/
Effective when working with
Big Data
End-to-End Deep Learning (2)
Source: Andrew Ng, NIPS 2016
Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/
Remove pre-processing steps
to make learning End-to-End
Bias – Variance Tradeoff
Source: Andrew Ng, NIPS 2016
Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/
Split the Dev set into Train-Dev & Test-Dev
Source: Andrew Ng, NIPS 2016
Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/
Bias – Variance Tradeoff (2)
Bias – Variance Tradeoff (3)
Source: Andrew Ng, NIPS 2016
Example (human-level error: 1%):
• Train error 8%, Dev error 10% → high bias (underfitting, not overfitting)
• Train error 2%, Dev error 10% → high variance (overfitting)
• Train and Dev error both near human level → good
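These gap comparisons can be written down as a rule of thumb directly; a toy sketch of the diagnostic (the function name and the tie-breaking rule are illustrative, not from the talk):

```python
def diagnose(human_err, train_err, dev_err):
    """Andrew Ng's bias/variance diagnostic: the gap between training
    and human-level error indicates bias; the gap between dev and
    training error indicates variance."""
    bias_gap = train_err - human_err
    variance_gap = dev_err - train_err
    if bias_gap >= variance_gap:
        return "high bias: try a bigger model or train longer"
    return "high variance: try more data or regularization"

print(diagnose(0.01, 0.08, 0.10))  # -> high bias (matches the slide)
print(diagnose(0.01, 0.02, 0.10))  # -> high variance (matches the slide)
```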
Bias – Variance Tradeoff (4)
Source: Andrew Ng, NIPS 2016
Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/
Human Level Performance
Example: error rates on a medical diagnosis task
• Typical human: 5%
• General doctor: 1%
• Specialized doctor: 0.8%
• Group of specialized doctors: 0.5%
Deep Learning models tend to plateau once they have
reached or surpassed human-level accuracy
RNN variants & limitations
RNN & LSTM
Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Learn “long-term dependencies”
Core components of many AI applications
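For reference, a minimal NumPy sketch of one LSTM step (shapes and variable names are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step; W: (4H, D+H), b: (4H,).
    The additive cell-state update lets gradients flow across many
    timesteps, which is what 'long-term dependencies' refers to."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[:H])        # input gate
    f = sigmoid(z[H:2*H])     # forget gate
    o = sigmoid(z[2*H:3*H])   # output gate
    g = np.tanh(z[3*H:])      # candidate update
    c = f * c_prev + i * g    # cell state: mostly copied, partly rewritten
    h = o * np.tanh(c)        # hidden state exposed to the next layer
    return h, c
```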
Fast Weights RNN
Source: Jimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu.
Using Fast Weights to Attend to the Recent Past. NIPS 2016
Fast weights store recent hidden states as an associative memory,
letting the network attend to the recent past
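A simplified sketch of the fast-weights update (the decay and learning rates `lam` and `eta` follow the paper's notation; layer normalization is omitted for brevity):

```python
import numpy as np

def fast_weights_step(x, h, A, W_h, W_x, lam=0.95, eta=0.5, S=1):
    """One step of a fast-weights RNN, simplified from Ba et al. 2016.
    The fast weight matrix A is an associative memory of recent hidden
    states: A <- lam*A + eta * h h^T."""
    A = lam * A + eta * np.outer(h, h)   # decay old memories, store h
    slow = W_h @ h + W_x @ x             # slow-weight contribution, held fixed
    hs = np.tanh(slow)
    for _ in range(S):                   # inner loop "attends" to the
        hs = np.tanh(slow + A @ hs)      # recent past stored in A
    return hs, A
```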
Phased LSTM
Source: Daniel Neil, Michael Pfeiffer, and Shih-Chii Liu.
Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences. NIPS 2016
An oscillating time gate opens each unit only briefly in each period,
accelerating recurrent net training on long or event-based sequences
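A sketch of the paper's rhythmic time gate k_t (default parameter values here are illustrative):

```python
import numpy as np

def time_gate(t, tau, s, r_on=0.05, alpha=0.001):
    """Phased LSTM time gate k_t (Neil et al. 2016). Each unit has an
    oscillation period tau and phase shift s; the gate is open only
    during a fraction r_on of each period, with a small leak alpha
    otherwise, so most units skip most timesteps."""
    phi = ((t - s) % tau) / tau                        # phase in [0, 1)
    return np.where(phi < 0.5 * r_on, 2.0 * phi / r_on,
           np.where(phi < r_on, 2.0 - 2.0 * phi / r_on,
                    alpha * phi))

# The gate blends the proposed and previous states:
#   c_t = k_t * c_proposed + (1 - k_t) * c_prev
#   h_t = k_t * h_proposed + (1 - k_t) * h_prev
```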
Quasi-RNN
Source: James Bradbury, Stephen Merity, Caiming Xiong & Richard Socher
Quasi-Recurrent Neural Networks. Under review at ICLR 2017
Uses convolution & pooling to mimic a recurrent layer,
which allows parallelism across timesteps
Up to 16× faster, with better predictive accuracy,
than stacked LSTMs of the same hidden size
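A minimal sketch of the idea, assuming filter width 1 so that each "convolution" reduces to a matrix multiply (the paper uses wider causal filters):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def qrnn_layer(X, Wz, Wf, Wo):
    """Quasi-RNN layer. All gates depend only on the input, so they are
    computed for every timestep in parallel; only the cheap elementwise
    fo-pooling loop below is sequential. X: (T, D); Wz, Wf, Wo: (D, H)."""
    Z = np.tanh(X @ Wz)       # candidate vectors, all timesteps at once
    F = sigmoid(X @ Wf)       # forget gates
    O = sigmoid(X @ Wo)       # output gates
    c = np.zeros(Z.shape[1])
    H = np.empty_like(Z)
    for t in range(Z.shape[0]):
        c = F[t] * c + (1.0 - F[t]) * Z[t]   # fo-pooling recurrence
        H[t] = O[t] * c
    return H
```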
WaveNet
(CNN model)
Source: Aaron van den Oord et al.
WaveNet: A Generative Model for Raw Audio
A deep generative model of raw audio waveforms
(16,000 samples per second or more,
with important structure at many time-scales)
Sounds more natural than
the best existing Text-to-Speech systems
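A sketch of the core operation, a dilated causal convolution (written as a naive loop for clarity, not as the real implementation):

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution with dilation, the core WaveNet op.
    y[t] depends only on x[t], x[t-d], x[t-2d], ..., keeping the model
    autoregressive; stacking layers with dilations 1, 2, 4, ... grows
    the receptive field exponentially with depth."""
    y = np.zeros(len(x))
    for t in range(len(x)):
        for j, wj in enumerate(w):
            idx = t - j * dilation
            if idx >= 0:               # never look into the future
                y[t] += wj * x[idx]
    return y

# A stack covering raw audio might repeat dilations [1, 2, 4, ..., 512]
# several times to span thousands of samples.
```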
RNN with Stochastic Layers
Source: Marco Fraccaro, Søren Kaae Sønderby, Ulrich Paquet, Ole Winther
Sequential Neural Models with Stochastic Layers. NIPS 2016
Extend the modeling capabilities of
RNNs by combining them with
nonlinear state space models
The variational approximation tracks the factorization
of the model’s true posterior distribution
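A rough generative sketch under stated assumptions (placeholder weights, Gaussian latents; the real model also emits observations from the latent and deterministic states, and is trained variationally):

```python
import numpy as np

rng = np.random.default_rng(0)

def srnn_generate(x_seq, Wd, Wz, noise=0.1):
    """Generative sketch of a stochastic RNN (Fraccaro et al.): a
    deterministic recurrent state d_t drives a nonlinear state space
    model over latents z_t, i.e. z_t ~ N(mu(d_t, z_{t-1}), sigma^2).
    Wd: (D, D + input_dim), Wz: (Z, D + Z) are placeholder weights."""
    D, Z = Wd.shape[0], Wz.shape[0]
    d, z, latents = np.zeros(D), np.zeros(Z), []
    for x in x_seq:
        d = np.tanh(Wd @ np.concatenate([d, x]))    # deterministic layer
        mu = np.tanh(Wz @ np.concatenate([d, z]))   # state space prior mean
        z = mu + noise * rng.normal(size=Z)         # stochastic layer
        latents.append(z)
    return np.stack(latents)
```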
Learning to Learn
Source: Marcin Andrychowicz, Misha Denil et al
Learning to learn by gradient descent by gradient descent.
NIPS 2016
An LSTM learns the optimization algorithm itself,
replacing hand-designed update rules
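A sketch of the learned update rule: a tiny LSTM is applied to each parameter coordinate with shared weights, so θ_{t+1} = θ_t + m(∇f(θ_t)). The LSTM weights below are placeholders that would normally be meta-trained on the optimizee's loss:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def learned_update(grad, h, c, Wx, Wh, b, w_out):
    """Coordinatewise learned optimizer m (Andrychowicz et al. 2016).
    grad: (n,); h, c: (n, H); Wx: (1, 4H); Wh: (H, 4H); b: (4H,);
    w_out: (H,). Every coordinate runs the same tiny LSTM."""
    H = h.shape[1]
    z = grad[:, None] @ Wx + h @ Wh + b   # (n, 4H) gate pre-activations
    i = sigmoid(z[:, :H])                 # input gate
    f = sigmoid(z[:, H:2*H])              # forget gate
    o = sigmoid(z[:, 2*H:3*H])            # output gate
    g = np.tanh(z[:, 3*H:])
    c = f * c + i * g
    h = o * np.tanh(c)
    return h @ w_out, h, c                # per-coordinate updates

# usage: update, h, c = learned_update(grad, h, c, Wx, Wh, b, w_out)
#        theta = theta + update
```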
Natural Language Processing
Machine Translation
Source: Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi et al
Google’s Neural Machine Translation System: Bridging the Gap
between Human and Machine Translation
Google replaced its traditional phrase-based MT system with an LSTM-based one
Zero-Shot Translation
Source: Melvin Johnson, Mike Schuster, Quoc V. Le et al
Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
Benefit: exploits transfer learning
across different languages
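The paper's mechanism is a single shared model for all language pairs, with an artificial token prepended to the source sentence to select the target language; a minimal sketch:

```python
def add_target_token(source_sentence, target_lang):
    """Multilingual NMT input format (Johnson et al.): the prepended
    token tells the shared model which language to emit. Pairs never
    seen together in training (e.g. Portuguese -> Spanish) can still
    be requested: that is the zero-shot case."""
    return f"<2{target_lang}> {source_sentence}"

print(add_target_token("Hello, how are you?", "es"))
# -> <2es> Hello, how are you?
```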
Multitasking
Source: Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks, NIPS 2016 Workshop
Construct the deep model to follow
hierarchical linguistic structure
(lower layers syntactic, higher layers semantic)
Multitasking (2)
Source: Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, Pavel Kuksa
Natural language processing (almost) from scratch. JMLR 2011
Share Embedding Space
Free to choose the Depth Structure
Multitasking (3)
Source: Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks, NIPS 2016 Workshop
Multiplicative Interaction
Source: Bhuwan Dhingra, Hanxiao Liu, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov
Gated-Attention Readers for Text Comprehension. Under review at ICLR 2017
Gated-Attention
Multiplicative Operation
Figure: performance of different gating functions on the Who-Did-What (WDW) dataset
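A minimal sketch of one gated-attention layer (unnormalized dot-product scores; the paper's exact parametrization differs in details):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_attention(D, Q):
    """Gated-Attention layer (Dhingra et al.): every document token
    attends over the query tokens, then is gated by an elementwise
    (multiplicative) product with its query summary.
    D: (doc_len, H), Q: (query_len, H)."""
    alpha = softmax(D @ Q.T, axis=1)   # per-token attention over query
    Q_tilde = alpha @ Q                # query summary for each doc token
    return D * Q_tilde                 # multiplicative interaction
```

In the paper's comparison, this multiplicative gate outperformed sum and concatenation as the interaction function.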
Words or Characters?
Source: Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov
Words or Characters? Fine-grained Gating for Reading Comprehension. Under review at ICLR 2017
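A sketch of the fine-grained gate, with hypothetical weight names `Wg`, `bg`:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fine_grained_gate(word_emb, char_emb, feats, Wg, bg):
    """Fine-grained gating (Yang et al.): mix word-level and
    character-level representations per dimension,
        h = g * char + (1 - g) * word,
    where the gate g is predicted from token features such as POS,
    NER and word frequency, so rare words can lean on characters."""
    g = sigmoid(Wg @ feats + bg)                 # gate in (0, 1)^H
    return g * char_emb + (1.0 - g) * word_emb
```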
Extreme case: Rare words
Source: Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher
Pointer Sentinel Mixture Models. NIPS 2016 Workshop
RNNs struggle to predict rare words in language modeling
Pointer sentinel mixture architecture:
can either reproduce a word from the recent context
or produce a word from a standard softmax classifier
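A minimal sketch of the mixture, assuming the attention scores over the context window and the vocabulary softmax are already computed:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def pointer_sentinel(p_vocab, context_ids, ptr_scores, sentinel_score):
    """Pointer sentinel mixture (Merity et al.): the next-word
    distribution mixes a softmax over the vocabulary with a pointer
    over words in the recent context. The learned sentinel score sets
    the mixture weight g, so p(w) = g*p_vocab(w) + (1-g)*p_ptr(w), and
    rare words can simply be copied from context."""
    a = softmax(np.append(ptr_scores, sentinel_score))
    g = a[-1]                       # probability mass on the sentinel
    p = g * p_vocab
    for tok, w in zip(context_ids, a[:-1]):
        p[tok] += w                 # pointer mass copies context words
    return p                        # sums to g + (1 - g) = 1
```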
Thank you for your attention