LE Thien Hoa 30th Annual Conference on Neural Information Processing Systems NIPS 2016, Barcelona

30th Annual Conference on - Inriadeeploria.gforge.inria.fr/intranet/NIPS16review.pdf · 2017-02-06 · Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot



LE Thien Hoa

30th Annual Conference on

Neural Information Processing Systems

NIPS 2016, Barcelona


Topics

• Deep Reinforcement Learning & Robotics

• Generative Adversarial Network

• RNN variants

• Meta-learning

• Neuroscience

• Optimization

• Machine Learning

• Natural Language Processing

• …


In this talk

• Nuts and Bolts of Applying Deep Learning

• RNN variants & limitations

• Natural Language Processing


Nuts and Bolts of Applying Deep Learning

Source: Andrew Ng, NIPS 2016


End-to-End Deep Learning

Source: Andrew Ng, NIPS 2016

Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/

Effective when working with

big data


End-to-End Deep Learning (2)

Source: Andrew Ng, NIPS 2016

Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/

Remove hand-engineered pre-processing steps

to obtain end-to-end learning


Bias – Variance Tradeoff

Source: Andrew Ng, NIPS 2016

Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/

Split Dev into Train-Dev & Test-Dev


Bias – Variance Tradeoff (2)

Source: Andrew Ng, NIPS 2016

Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/


Bias – Variance Tradeoff (3)

Source: Andrew Ng, NIPS 2016

Human error: 1%, Dev error: 10%

• Train error 2%: bias is good, but the gap to dev error indicates overfitting

• Train error 8%: not overfitting, but the gap to human error indicates bias
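
The comparison on this slide (human error 1%, dev error 10%, train error 2% or 8%) can be worked through as simple arithmetic. The sketch below is only an illustration: the function name `diagnose`, the term "avoidable bias" for the train-minus-human gap, and the rule "focus on the larger gap" are assumptions, not something stated on the slide.

```python
def diagnose(human_error, train_error, dev_error):
    """Split dev error into two gaps (all values in percent)."""
    # Gap between training error and human-level error (assumed name: avoidable bias).
    avoidable_bias = train_error - human_error
    # Gap between dev error and training error (variance, i.e. overfitting).
    variance = dev_error - train_error
    # Hypothetical rule of thumb: work on whichever gap is larger.
    focus = "bias" if avoidable_bias > variance else "variance"
    return avoidable_bias, variance, focus


# The two training-error scenarios from the slide.
for train_error in (2, 8):
    bias, variance, focus = diagnose(human_error=1, train_error=train_error, dev_error=10)
    print(f"train error {train_error}%: bias gap {bias}, variance gap {variance}, focus on {focus}")
```

With a 2% training error the larger gap is between train and dev (overfitting); with an 8% training error the larger gap is between human level and train (bias).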


Bias – Variance Tradeoff (4)

Source: Andrew Ng, NIPS 2016

Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/


Human Level Performance

Source: Andrew Ng, NIPS 2016

Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/

• Typical human: 5%

• General doctor: 1%

• Specialized doctor: 0.8%

• Group of specialized doctors: 0.5%

Deep Learning models tend to plateau once they have

reached or surpassed human-level accuracy


RNN variants & limitations


RNN & LSTM

Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Learn “long-term dependencies”

Core component of many AI applications


Fast Weights RNN

Source: Jimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu.

Using Fast Weights to Attend to the Recent Past. NIPS 2016

Using Fast Weights

to Attend to the Recent Past


Phased LSTM

Source: Daniel Neil, Michael Pfeiffer, and Shih-Chii Liu.

Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences. NIPS 2016

Accelerating Recurrent Net Training

for Long or Event-based Sequences


Quasi-RNN

Source: James Bradbury, Stephen Merity, Caiming Xiong & Richard Socher

Quasi-Recurrent Neural Networks. Under review at ICLR 2017

Uses convolution & pooling to mimic a recurrent layer,

which allows parallelism across timesteps

Up to 16x faster & better predictive accuracy

than stacked LSTMs of the same hidden size
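
How a convolution plus a pooling step can stand in for a recurrent layer can be sketched in a few lines. This is a simplified single-channel illustration, assuming the "f-pooling" form h_t = f_t * h_{t-1} + (1 - f_t) * z_t; the function names, the filter width of 2, and the toy weights are hypothetical and not taken from the slide.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def causal_conv(xs, weights):
    """Causal 1-D convolution: output t only sees xs[t-k+1 .. t] (zero-padded on the left)."""
    k = len(weights)
    padded = [0.0] * (k - 1) + list(xs)
    return [sum(w * padded[t + j] for j, w in enumerate(weights)) for t in range(len(xs))]


def qrnn_layer(xs, w_z, w_f):
    """Single-channel quasi-recurrent layer with f-pooling (illustrative sketch)."""
    # Convolution step: each output depends only on the inputs, not on earlier
    # hidden states, so all timesteps could be computed in parallel.
    z = [math.tanh(v) for v in causal_conv(xs, w_z)]  # candidate values
    f = [sigmoid(v) for v in causal_conv(xs, w_f)]    # forget gates
    # Pooling step: a cheap elementwise recurrence with no weight matrices.
    h, hs = 0.0, []
    for z_t, f_t in zip(z, f):
        h = f_t * h + (1.0 - f_t) * z_t
        hs.append(h)
    return hs


random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(6)]  # toy input sequence
print(qrnn_layer(xs, w_z=[0.5, -0.3], w_f=[0.2, 0.1]))
```

The only sequential part is the pooling loop, which is elementwise; the expensive part (the convolutions) has no dependence between timesteps.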


WaveNet

(CNN model)

Source: Aaron van den Oord et al.

WaveNet: A Generative Model for Raw Audio

Deep generative model

of raw audio waveforms

(16000 samples / second or

more, with important structure

at many time-scales)

Sounds more natural than

the best existing Text-to-Speech

systems


RNN with Stochastic Layers

Source: Marco Fraccaro, Søren Kaae Sønderby, Ulrich Paquet, Ole Winther

Sequential Neural Models with Stochastic Layers. NIPS 2016

Extends the modeling capabilities of

RNNs by combining them with

nonlinear state space models

The inference network is able to track the factorization of the

model’s posterior distribution


Learning to Learn

Source: Marcin Andrychowicz, Misha Denil et al

Learning to learn by gradient descent by gradient descent.

NIPS 2016

An LSTM learns the optimization algorithm

instead of relying on a hand-designed update rule


Natural Language Processing


Machine Translation

Source: Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi et al

Google’s Neural Machine Translation System: Bridging the Gap

between Human and Machine Translation

Google replaces traditional phrase-based MT with an LSTM-based neural system


Zero-Shot Translation

Source: Melvin Johnson, Mike Schuster, Quoc V. Le et al

Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

Benefit: exploits transfer learning

across different languages


Multitasking

Source: Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher

A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks, NIPS 2016 Workshop

Builds the deep model by following

the hierarchical linguistic structure of the tasks


Multitasking (2)

Source: Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, Pavel Kuksa

Natural language processing (almost) from scratch. JMLR 2011

Share Embedding Space

Free to choose the Depth Structure


Multitasking (3)

Source: Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher

A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks, NIPS 2016 Workshop


Multiplicative Interaction

Source: Bhuwan Dhingra, Hanxiao Liu, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov

Gated-Attention Readers for Text Comprehension. Under review at ICLR 2017

Gated-Attention

Multiplicative Operation

Performance of

different gating functions

on WDW dataset
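
The multiplicative gating operation named on this slide can be sketched as follows. This is a minimal illustration based on one reading of the cited paper: each document token attends over the query tokens, and the attended query vector gates the token by elementwise multiplication. The function names and toy vectors are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_attention(doc, query):
    """For each document token d: x = d * q_tilde, with q_tilde a soft mix of query tokens."""
    gated = []
    for d in doc:
        # Attention of this document token over all query tokens (dot-product scores).
        scores = [sum(a * b for a, b in zip(q, d)) for q in query]
        alpha = softmax(scores)
        # Token-specific query representation.
        q_tilde = [sum(a * q[k] for a, q in zip(alpha, query)) for k in range(len(d))]
        # Multiplicative interaction: elementwise product acts as a gate on d.
        gated.append([d_k * q_k for d_k, q_k in zip(d, q_tilde)])
    return gated


doc = [[1.0, 0.0, 2.0], [0.5, 1.0, -1.0]]   # two document tokens (toy values)
query = [[1.0, 1.0, 0.0], [0.0, 2.0, 1.0]]  # two query tokens (toy values)
print(gated_attention(doc, query))
```

Replacing the elementwise product in the last step with a sum or a concatenation gives the alternative gating functions that such a comparison would evaluate.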


Words or Characters?

Source: Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov

Words or Characters? Fine-grained Gating for Reading Comprehension. Under review at ICLR 2017


Extreme case: Rare words

Source: Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher

Pointer Sentinel Mixture Models. NIPS 2016 Workshop

RNNs struggle to predict rare words in language modeling tasks

Pointer sentinel mixture architecture:

ability to either reproduce a word from the recent context

or produce a word from a standard softmax classifier
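
The choice between copying from the recent context and using the softmax classifier can be written as a mixture of two distributions. The sketch below assumes the gate is already given as a number between 0 and 1 (in the cited model it is computed by the network); the function name, the toy vocabulary, and all probabilities are hypothetical.

```python
def mixture_probability(word, p_vocab, p_ptr, gate):
    """p(word) = gate * p_vocab(word) + (1 - gate) * p_ptr(word)."""
    return gate * p_vocab.get(word, 0.0) + (1.0 - gate) * p_ptr.get(word, 0.0)


# Softmax distribution over a small fixed vocabulary (toy values).
p_vocab = {"the": 0.6, "bank": 0.3, "<unk>": 0.1}
# Pointer distribution over words seen in the recent context (toy values).
# "Fed" is a rare word that is missing from the softmax vocabulary.
p_ptr = {"Fed": 0.75, "bank": 0.25}
gate = 0.5  # assumed fixed here; 1.0 would mean "use only the softmax"

for word in ("the", "bank", "Fed"):
    print(word, mixture_probability(word, p_vocab, p_ptr, gate))
```

A rare word such as "Fed" receives probability only through the pointer term, which is how the mixture helps with words the softmax cannot predict well.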


Thank you for your attention