LE Thien Hoa
30th Annual Conference on
Neural Information Processing Systems
NIPS 2016, Barcelona
Topics
• Deep Reinforcement Learning & Robotics
• Generative Adversarial Networks
• RNN variants
• Meta-learning
• Neuroscience
• Optimization
• Machine Learning
• Natural Language Processing
• …
In this talk
• Nuts and Bolts of Applying Deep Learning
• RNN variants & limitations
• Natural Language Processing
Nuts and Bolts of Applying Deep Learning
Source: Andrew Ng, NIPS 2016
End-to-End Deep Learning
Source: Andrew Ng, NIPS 2016
Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/
Effective when working with
Big Data
End-to-End Deep Learning (2)
Source: Andrew Ng, NIPS 2016
Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/
Remove pre-processing steps
to make learning End-to-End
Bias – Variance Tradeoff
Source: Andrew Ng, NIPS 2016
Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/
Split the Dev set into Train-Dev & Test-Dev
Source: Andrew Ng, NIPS 2016
Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/
Bias – Variance Tradeoff (2)
Bias – Variance Tradeoff (3)
Source: Andrew Ng, NIPS 2016
Example (human-level error: 1%):
• Train error 8%, Dev error 10% → high bias (underfitting, not overfitting)
• Train error 2%, Dev error 10% → high variance (overfitting)
• Train and Dev error both near human level → good
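These gap comparisons can be written down as a rule of thumb directly; a toy sketch of the diagnostic (the function name and the tie-breaking rule are illustrative, not from the talk):

```python
def diagnose(human_err, train_err, dev_err):
    """Andrew Ng's bias/variance diagnostic: the gap between training
    and human-level error indicates bias; the gap between dev and
    training error indicates variance."""
    bias_gap = train_err - human_err
    variance_gap = dev_err - train_err
    if bias_gap >= variance_gap:
        return "high bias: try a bigger model or train longer"
    return "high variance: try more data or regularization"

print(diagnose(0.01, 0.08, 0.10))  # -> high bias (matches the slide)
print(diagnose(0.01, 0.02, 0.10))  # -> high variance (matches the slide)
```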
Bias – Variance Tradeoff (4)
Source: Andrew Ng, NIPS 2016
Figure from https://kevinzakka.github.io/2016/09/26/applying-deep-learning/
Human Level Performance
Example: error rates on a medical diagnosis task
• Typical human: 5%
• General doctor: 1%
• Specialized doctor: 0.8%
• Group of specialized doctors: 0.5%
Deep Learning models tend to plateau once they have
reached or surpassed human-level accuracy
RNN variants & limitations
RNN & LSTM
Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Learn “long-term dependencies”
Core components of many AI applications
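For reference, a minimal NumPy sketch of one LSTM step (shapes and variable names are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step; W: (4H, D+H), b: (4H,).
    The additive cell-state update lets gradients flow across many
    timesteps, which is what 'long-term dependencies' refers to."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[:H])        # input gate
    f = sigmoid(z[H:2*H])     # forget gate
    o = sigmoid(z[2*H:3*H])   # output gate
    g = np.tanh(z[3*H:])      # candidate update
    c = f * c_prev + i * g    # cell state: mostly copied, partly rewritten
    h = o * np.tanh(c)        # hidden state exposed to the next layer
    return h, c
```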
Fast Weights RNN
Source: Jimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu.
Using Fast Weights to Attend to the Recent Past. NIPS 2016
Fast weights store recent hidden states as an associative memory,
letting the network attend to the recent past
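A simplified sketch of the fast-weights update (the decay and learning rates `lam` and `eta` follow the paper's notation; layer normalization is omitted for brevity):

```python
import numpy as np

def fast_weights_step(x, h, A, W_h, W_x, lam=0.95, eta=0.5, S=1):
    """One step of a fast-weights RNN, simplified from Ba et al. 2016.
    The fast weight matrix A is an associative memory of recent hidden
    states: A <- lam*A + eta * h h^T."""
    A = lam * A + eta * np.outer(h, h)   # decay old memories, store h
    slow = W_h @ h + W_x @ x             # slow-weight contribution, held fixed
    hs = np.tanh(slow)
    for _ in range(S):                   # inner loop "attends" to the
        hs = np.tanh(slow + A @ hs)      # recent past stored in A
    return hs, A
```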
Phased LSTM
Source: Daniel Neil, Michael Pfeiffer, and Shih-Chii Liu.
Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences. NIPS 2016
An oscillating time gate opens each unit only briefly in each period,
accelerating recurrent net training on long or event-based sequences
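A sketch of the paper's rhythmic time gate k_t (default parameter values here are illustrative):

```python
import numpy as np

def time_gate(t, tau, s, r_on=0.05, alpha=0.001):
    """Phased LSTM time gate k_t (Neil et al. 2016). Each unit has an
    oscillation period tau and phase shift s; the gate is open only
    during a fraction r_on of each period, with a small leak alpha
    otherwise, so most units skip most timesteps."""
    phi = ((t - s) % tau) / tau                        # phase in [0, 1)
    return np.where(phi < 0.5 * r_on, 2.0 * phi / r_on,
           np.where(phi < r_on, 2.0 - 2.0 * phi / r_on,
                    alpha * phi))

# The gate blends the proposed and previous states:
#   c_t = k_t * c_proposed + (1 - k_t) * c_prev
#   h_t = k_t * h_proposed + (1 - k_t) * h_prev
```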
Quasi-RNN
Source: James Bradbury, Stephen Merity, Caiming Xiong & Richard Socher
Quasi-Recurrent Neural Networks. Under review at ICLR 2017
Uses convolution & pooling to mimic a recurrent layer,
which allows parallelism across timesteps
Up to 16× faster, with better predictive accuracy,
than stacked LSTMs of the same hidden size
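A minimal sketch of the idea, assuming filter width 1 so that each "convolution" reduces to a matrix multiply (the paper uses wider causal filters):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def qrnn_layer(X, Wz, Wf, Wo):
    """Quasi-RNN layer. All gates depend only on the input, so they are
    computed for every timestep in parallel; only the cheap elementwise
    fo-pooling loop below is sequential. X: (T, D); Wz, Wf, Wo: (D, H)."""
    Z = np.tanh(X @ Wz)       # candidate vectors, all timesteps at once
    F = sigmoid(X @ Wf)       # forget gates
    O = sigmoid(X @ Wo)       # output gates
    c = np.zeros(Z.shape[1])
    H = np.empty_like(Z)
    for t in range(Z.shape[0]):
        c = F[t] * c + (1.0 - F[t]) * Z[t]   # fo-pooling recurrence
        H[t] = O[t] * c
    return H
```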
WaveNet
(CNN model)
Source: Aaron van den Oord et al.
WaveNet: A Generative Model for Raw Audio
A deep generative model of raw audio waveforms
(16,000 samples per second or more,
with important structure at many time-scales)
Sounds more natural than
the best existing Text-to-Speech systems
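A sketch of the core operation, a dilated causal convolution (written as a naive loop for clarity, not as the real implementation):

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution with dilation, the core WaveNet op.
    y[t] depends only on x[t], x[t-d], x[t-2d], ..., keeping the model
    autoregressive; stacking layers with dilations 1, 2, 4, ... grows
    the receptive field exponentially with depth."""
    y = np.zeros(len(x))
    for t in range(len(x)):
        for j, wj in enumerate(w):
            idx = t - j * dilation
            if idx >= 0:               # never look into the future
                y[t] += wj * x[idx]
    return y

# A stack covering raw audio might repeat dilations [1, 2, 4, ..., 512]
# several times to span thousands of samples.
```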
RNN with Stochastic Layers
Source: Marco Fraccaro, Søren Kaae Sønderby, Ulrich Paquet, Ole Winther
Sequential Neural Models with Stochastic Layers. NIPS 2016
Extend the modeling capabilities of
RNNs by combining them with
nonlinear state space models
The variational approximation tracks the factorization
of the model’s true posterior distribution
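A rough generative sketch under stated assumptions (placeholder weights, Gaussian latents; the real model also emits observations from the latent and deterministic states, and is trained variationally):

```python
import numpy as np

rng = np.random.default_rng(0)

def srnn_generate(x_seq, Wd, Wz, noise=0.1):
    """Generative sketch of a stochastic RNN (Fraccaro et al.): a
    deterministic recurrent state d_t drives a nonlinear state space
    model over latents z_t, i.e. z_t ~ N(mu(d_t, z_{t-1}), sigma^2).
    Wd: (D, D + input_dim), Wz: (Z, D + Z) are placeholder weights."""
    D, Z = Wd.shape[0], Wz.shape[0]
    d, z, latents = np.zeros(D), np.zeros(Z), []
    for x in x_seq:
        d = np.tanh(Wd @ np.concatenate([d, x]))    # deterministic layer
        mu = np.tanh(Wz @ np.concatenate([d, z]))   # state space prior mean
        z = mu + noise * rng.normal(size=Z)         # stochastic layer
        latents.append(z)
    return np.stack(latents)
```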
Learning to Learn
Source: Marcin Andrychowicz, Misha Denil et al
Learning to learn by gradient descent by gradient descent.
NIPS 2016
An LSTM learns the optimization algorithm itself,
replacing hand-designed update rules
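A sketch of the learned update rule: a tiny LSTM is applied to each parameter coordinate with shared weights, so θ_{t+1} = θ_t + m(∇f(θ_t)). The LSTM weights below are placeholders that would normally be meta-trained on the optimizee's loss:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def learned_update(grad, h, c, Wx, Wh, b, w_out):
    """Coordinatewise learned optimizer m (Andrychowicz et al. 2016).
    grad: (n,); h, c: (n, H); Wx: (1, 4H); Wh: (H, 4H); b: (4H,);
    w_out: (H,). Every coordinate runs the same tiny LSTM."""
    H = h.shape[1]
    z = grad[:, None] @ Wx + h @ Wh + b   # (n, 4H) gate pre-activations
    i = sigmoid(z[:, :H])                 # input gate
    f = sigmoid(z[:, H:2*H])              # forget gate
    o = sigmoid(z[:, 2*H:3*H])            # output gate
    g = np.tanh(z[:, 3*H:])
    c = f * c + i * g
    h = o * np.tanh(c)
    return h @ w_out, h, c                # per-coordinate updates

# usage: update, h, c = learned_update(grad, h, c, Wx, Wh, b, w_out)
#        theta = theta + update
```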
Natural Language Processing
Machine Translation
Source: Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi et al
Google’s Neural Machine Translation System: Bridging the Gap
between Human and Machine Translation
Google replaced its traditional phrase-based MT system with an LSTM-based one
Zero-Shot Translation
Source: Melvin Johnson, Mike Schuster, Quoc V. Le et al
Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
Benefit: exploits transfer learning
across different languages
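The paper's mechanism is a single shared model for all language pairs, with an artificial token prepended to the source sentence to select the target language; a minimal sketch:

```python
def add_target_token(source_sentence, target_lang):
    """Multilingual NMT input format (Johnson et al.): the prepended
    token tells the shared model which language to emit. Pairs never
    seen together in training (e.g. Portuguese -> Spanish) can still
    be requested: that is the zero-shot case."""
    return f"<2{target_lang}> {source_sentence}"

print(add_target_token("Hello, how are you?", "es"))
# -> <2es> Hello, how are you?
```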
Multitasking
Source: Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks, NIPS 2016 Workshop
Construct the deep model to follow
hierarchical linguistic structure
(lower layers syntactic, higher layers semantic)
Multitasking (2)
Source: Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, Pavel Kuksa
Natural language processing (almost) from scratch. JMLR 2011
Share Embedding Space
Free to choose the Depth Structure
Multitasking (3)
Source: Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks, NIPS 2016 Workshop
Multiplicative Interaction
Source: Bhuwan Dhingra, Hanxiao Liu, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov
Gated-Attention Readers for Text Comprehension. Under review at ICLR 2017
Gated-Attention
Multiplicative Operation
Figure: performance of different gating functions on the Who-Did-What (WDW) dataset
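A minimal sketch of one gated-attention layer (unnormalized dot-product scores; the paper's exact parametrization differs in details):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_attention(D, Q):
    """Gated-Attention layer (Dhingra et al.): every document token
    attends over the query tokens, then is gated by an elementwise
    (multiplicative) product with its query summary.
    D: (doc_len, H), Q: (query_len, H)."""
    alpha = softmax(D @ Q.T, axis=1)   # per-token attention over query
    Q_tilde = alpha @ Q                # query summary for each doc token
    return D * Q_tilde                 # multiplicative interaction
```

In the paper's comparison, this multiplicative gate outperformed sum and concatenation as the interaction function.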
Words or Characters?
Source: Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov
Words or Characters? Fine-grained Gating for Reading Comprehension. Under review at ICLR 2017
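A sketch of the fine-grained gate, with hypothetical weight names `Wg`, `bg`:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fine_grained_gate(word_emb, char_emb, feats, Wg, bg):
    """Fine-grained gating (Yang et al.): mix word-level and
    character-level representations per dimension,
        h = g * char + (1 - g) * word,
    where the gate g is predicted from token features such as POS,
    NER and word frequency, so rare words can lean on characters."""
    g = sigmoid(Wg @ feats + bg)                 # gate in (0, 1)^H
    return g * char_emb + (1.0 - g) * word_emb
```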
Extreme case: Rare words
Source: Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher
Pointer Sentinel Mixture Models. NIPS 2016 Workshop
RNNs struggle to predict rare words in language modeling
Pointer sentinel mixture architecture:
can either reproduce a word from the recent context
or produce a word from a standard softmax classifier
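A minimal sketch of the mixture, assuming the attention scores over the context window and the vocabulary softmax are already computed:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def pointer_sentinel(p_vocab, context_ids, ptr_scores, sentinel_score):
    """Pointer sentinel mixture (Merity et al.): the next-word
    distribution mixes a softmax over the vocabulary with a pointer
    over words in the recent context. The learned sentinel score sets
    the mixture weight g, so p(w) = g*p_vocab(w) + (1-g)*p_ptr(w), and
    rare words can simply be copied from context."""
    a = softmax(np.append(ptr_scores, sentinel_score))
    g = a[-1]                       # probability mass on the sentinel
    p = g * p_vocab
    for tok, w in zip(context_ids, a[:-1]):
        p[tok] += w                 # pointer mass copies context words
    return p                        # sums to g + (1 - g) = 1
```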
Thank you for your attention