Trends and Developments in Deep Learning Research
Jeff Dean, Google Brain team
g.co/brain
In collaboration with many other people at Google
Growing Use of Deep Learning at Google
Across many products/areas: Android, Apps, drug discovery, Gmail, image understanding, Maps, natural language understanding, Photos, robotics research, speech, Translation, YouTube, … many others
[Chart: growth over time in the number of directories containing model description files]
Google Brain Team
● Research: 27 papers in ICML, NIPS, and ICLR in 2016, plus others in venues like ACL, ICASSP, CVPR, ICER, and OSDI; Brain Residency Program; hosted ~50 interns in 2016, ...
● Product impact: Many dozens of high impact collaborations in Google products/efforts like Search, Ads, Photos, Translate, GMail, Maps, Cloud ML, speech recognition, self-driving cars, robotics, …
● Open-source tools for ML: TensorFlow, Magenta, visualization tools, …
g.co/brain
What do you want in a machine learning system?
● Ease of expression: for lots of crazy ML ideas/algorithms
● Scalability: can run experiments quickly
● Portability: can run on wide variety of platforms
● Reproducibility: easy to share and reproduce research
● Production readiness: go from research to real products
Open, standard software for general machine learning
Great for Deep Learning in particular
First released Nov 2015
Apache 2.0 license
http://tensorflow.org/
and
https://github.com/tensorflow/tensorflow
Why Did We Build TensorFlow?
Wanted a system that was flexible, scalable, and production-ready
DistBelief, our first system, was good on two of these, but lacked flexibility
Most existing open-source packages were also good on 2 of 3 but not all 3
TensorFlow GoalsEstablish common platform for expressing machine learning ideas and systems
Make this platform the best in the world for both research and production use
Open source it so that it becomes a platform for everyone, not just Google
Facts and Figures
Launched on Nov. 9, 2015
Initial launch was reasonably fully-featured: auto differentiation, queues, control flow, fairly comprehensive set of ops, …
Tutorials made the system accessible
Out-of-the-box support for CPUs, GPUs, multiple devices, multiple platforms
Some Stats
500+ contributors, most of them outside Google
12,000+ commits since Nov. 2015 (~30 per day)
1M+ binary downloads
#15 most popular repository on GitHub by stars (just passed Linux!)
Used in ML classes at quite a few universities now: Toronto, Berkeley, Stanford, …
Many companies/organizations using TensorFlow: Google, DeepMind, OpenAI, Twitter, Snapchat, Airbus, Uber, ...
Appeared in OSDI 2016: https://arxiv.org/abs/1605.08695
http://tensorflow.org/whitepaper2015.pdf
Strong External Adoption
[Chart: GitHub stars over time for TensorFlow (launched Nov. 2015) vs. other ML frameworks launched Sep. 2013, Jan. 2012, Jan. 2008, and Apr. 2015]
1M+ binary installs since November 2015
Experiment Turnaround Time and Research Productivity
● Minutes, hours:
○ Interactive research! Instant gratification!
● 1-4 days:
○ Tolerable
○ Interactivity replaced by running many experiments in parallel
● 1-4 weeks:
○ High-value experiments only
○ Progress stalls
● >1 month:
○ Don't even try
Just-In-Time Compilation via TensorFlow's XLA, the "Accelerated Linear Algebra" compiler

0x00000000 movq (%rdx), %rax
0x00000003 vmovaps (%rax), %xmm0
0x00000007 vmulps %xmm0, %xmm0, %xmm0
0x0000000b vmovaps %xmm0, (%rdi)
...

TF graphs go in,
Optimized & specialized assembly comes out.
Let's explain that!
Demo: Inspect JIT code in TensorFlow, iPython shell
XLA:CPU
XLA:GPU
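The demo itself needs a TensorFlow build with XLA, but the core idea — a graph goes in, a specialized executable comes out — can be sketched in a few lines of plain Python. Everything below (the tuple graph format, `compile_graph`) is invented for illustration and is not the XLA API:

```python
# Toy illustration of the JIT idea (not XLA itself): a tiny expression
# "graph" is compiled once into a specialized Python function, so repeated
# calls skip graph interpretation entirely.

def compile_graph(expr):
    """Compile a nested-tuple expression graph over input 'x' into a function.

    Graph nodes: ('mul', a, b), ('add', a, b), the input 'x', or a constant.
    """
    def codegen(node):
        if node == 'x':
            return 'x'
        if isinstance(node, (int, float)):
            return repr(node)
        op, a, b = node
        sym = {'mul': '*', 'add': '+'}[op]
        return '(' + codegen(a) + ' ' + sym + ' ' + codegen(b) + ')'
    source = 'lambda x: ' + codegen(expr)
    return eval(source)

# The graph for f(x) = x * x -- the same computation as the vmulps
# instruction in the assembly listing above.
square = compile_graph(('mul', 'x', 'x'))
print(square(3.0))  # 9.0
```

The real compiler of course emits machine code rather than Python source, and fuses and specializes far more aggressively; this only shows the graph-in, specialized-function-out shape of the pipeline.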
What are some ways that deep learning is having a significant impact at Google?
All of these examples implemented using TensorFlowor our predecessor system
Speech Recognition
“How cold is it outside?”
Acoustic Input → Deep Recurrent Neural Network → Text Output
Reduced word errors by more than 30%
Google Research Blog - August 2012, August 2015
Google Photos Search
Your Photo → Deep Convolutional Neural Network → Automatic Tag (“ocean”)
Search personal photos without tags.
Google Research Blog - June 2013
Reuse same model for completely different problems
Same basic model structure trained on different data, useful in completely different contexts
Example: given image → predict interesting pixels
We have tons of vision problems
Image search, StreetView, Satellite Imagery, Translation, Robotics, Self-driving Cars, …
www.google.com/sunroof
Performance on par or slightly better than the median of 8 U.S. board-certified ophthalmologists (F-score of 0.95 vs. 0.91).
http://research.googleblog.com/2016/11/deep-learning-for-detection-of-diabetic.html
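As a reminder of what that metric means, the F-score used above is the harmonic mean of precision and recall. A minimal sketch of the computation, with invented counts (not data from the study):

```python
# F-score (here F1, the harmonic mean of precision and recall), the metric
# used above to compare the model against the ophthalmologists.

def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts, just to show the mechanics:
tp, fp, fn = 90, 5, 10          # true positives, false positives, false negatives
precision = tp / (tp + fp)      # ~0.947
recall = tp / (tp + fn)         # 0.9
print(round(f1_score(precision, recall), 3))  # 0.923
```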
Combining Vision with Robotics
“Deep Learning for Robots: Learning from Large-Scale Interaction”, Google Research Blog, March, 2016
“Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection”, Sergey Levine, Peter Pastor, Alex Krizhevsky, & Deirdre Quillen, arXiv, arxiv.org/abs/1603.02199
RankBrain in Google Search Ranking
Query & document features → Deep Neural Network → Score for (doc, query) pair
Query: “car parts for sale”, Doc: “Rebuilt transmissions …”
Launched in 2015
Third most important search ranking signal (of 100s)
Bloomberg, Oct 2015: “Google Turning Its Lucrative Web Search Over to AI Machines”
Sequence-to-Sequence Model
Input sequence: A B C D → Deep LSTM → Target sequence: X Y Z Q
[Sutskever & Vinyals & Le, NIPS 2014]
Sequence-to-Sequence Model: Machine Translation
Input sentence: Quelle est votre taille? <EOS>
Target sentence, generated one word at a time: How tall are you?
[Sutskever & Vinyals & Le, NIPS 2014]
At inference time: beam search to choose the most probable over possible output sequences
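That inference step can be sketched in a self-contained way, with a toy stand-in for the decoder's next-token distribution (`next_probs` and its hard-coded target sentence are invented for illustration, not the real translation model):

```python
import math

# Minimal beam-search sketch: keep the beam_size highest-scoring partial
# output sequences, extend each by every candidate word, and re-prune.

def next_probs(prefix):
    # Toy distribution: after any prefix, strongly prefer continuing the
    # target sentence "how tall are you ?" and then stopping.
    target = ['how', 'tall', 'are', 'you', '?', '<eos>']
    probs = {w: 0.02 for w in target}
    if len(prefix) < len(target):
        probs[target[len(prefix)]] = 0.9
    return probs

def beam_search(beam_size=3, max_len=8):
    beams = [([], 0.0)]              # (tokens, log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for word, p in next_probs(tokens).items():
                candidates.append((tokens + [word], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            (finished if tokens[-1] == '<eos>' else beams).append((tokens, score))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])[0]

print(' '.join(beam_search()))  # how tall are you ? <eos>
```

A real decoder would also length-normalize scores; this sketch only shows the expand-score-prune loop.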
April 1, 2009: April Fool’s Day joke
Nov 5, 2015: Launched Real Product
Feb 1, 2016: >10% of mobile Inbox replies
Smart Reply
Smart Reply (Google Research Blog - Nov 2015)
Incoming Email → Small Feed-Forward Neural Network → Activate Smart Reply? (yes/no)
If yes: Incoming Email → Deep Recurrent Neural Network → Generated Replies
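The two-stage design above — a cheap trigger network gating an expensive generator — can be sketched with stand-in models (both functions below are invented placeholders, not the production networks):

```python
# Sketch of the two-stage Smart Reply pipeline: a cheap trigger model
# decides whether to run the expensive reply generator at all.

def trigger(email):
    # Stand-in for the small feed-forward net: fire only on short,
    # question-like emails.
    return '?' in email and len(email.split()) < 30

def generate_replies(email):
    # Stand-in for the deep recurrent net that writes candidate replies.
    return ['Yes, that works.', 'No, sorry.', 'Let me check.']

def smart_reply(email):
    return generate_replies(email) if trigger(email) else []

print(smart_reply('Are you free for lunch tomorrow?'))   # three suggestions
print(smart_reply('Quarterly report attached for your records.'))  # []
```

The point of the split is cost: the trigger runs on every email, while the recurrent generator runs only on the fraction worth replying to.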
Model: A close up of a child holding a stuffed animal.
Human: A young girl asleep on the sofa cuddling a stuffed bear.
Model: A baby is asleep next to a teddy bear.
Image Captions Research
Great quality improvements
...but challenging scalability issues
https://arxiv.org/abs/1609.08144
Google Neural Machine Translation Model
[Diagram: Encoder LSTMs (8 layers, reversed input X3 X2 … </s>) → Attention → Decoder LSTMs (input <s> Y1 …) → SoftMax → output Y1 Y2 … </s>; layers partitioned across Gpu1 … Gpu8]
One model replica: one machine w/ 8 GPUs
Model + Data Parallelism
Many replicas
Parameters distributed across many parameter server machines (Params … Params)
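A toy, single-process simulation of that setup, with a plain dict standing in for the sharded parameter servers (the model, loss, learning rate, and data shards are all invented for illustration):

```python
# Toy simulation of data parallelism: several replicas compute gradients
# against a shared parameter store and apply updates to it. Real parameter
# servers shard parameters across machines; here the "server" is one dict.

params = {'w': 0.0}                                  # parameter server state
data_shards = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # one shard per replica
lr = 0.01
target_w = 2.0                                       # "train" w toward this

def replica_step(shard):
    # Gradient of 0.5 * ((w - target_w) * x)^2 averaged over the shard.
    w = params['w']                       # read current params from server
    grad = sum((w - target_w) * x * x for x in shard) / len(shard)
    params['w'] -= lr * grad              # push update back to the server

for step in range(50):
    for shard in data_shards:             # replicas run round-robin here;
        replica_step(shard)               # in reality they run asynchronously

print(round(params['w'], 2))  # 2.0
```

In the asynchronous version each replica reads and writes the server concurrently, so updates can be applied to slightly stale parameters; in practice training tolerates that staleness well.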
[Chart: Translation quality on a 0-6 scale (6 = perfect translation) for phrase-based (PBMT), neural (GNMT), and human translation, for English > Spanish, English > French, English > Chinese, Spanish > English, French > English, and Chinese > English]
Neural Machine Translation
Closes gap between old system and human-quality translation by 58% to 87%
Enables better communication across the world
research.googleblog.com/2016/09/a-neural-network-for-machine.html
Currently: most models are trained from scratch for a single task
This is quite inefficient:
● Data inefficient: needs lots of data for each task
● Computation inefficient: starting from scratch is a lot of work
● Human ML expert inefficient: substantial effort required for each task
Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation, Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, and Jeffrey Dean
https://arxiv.org/abs/1611.04558
https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le & Jeff Dean
Submitted to ICLR 2017, https://openreview.net/pdf?id=B1ckMDqlg
Per-Example Routing
Current: Solution = ML expertise + data + computation
Can we turn this into: Solution = data + 100X computation?
Idea: model-generating model trained via RL
(1) Generate ten models
(2) Train them for a few hours
(3) Use loss of the generated models as reinforcement learning signal
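The three steps above can be sketched as a toy loop, with an invented configuration space and a stand-in for "training" (nothing below is the actual controller from the work described; it only shows loss-as-reward reshaping a sampling distribution):

```python
import math, random

# Toy architecture-search loop: a "controller" samples model configurations,
# each is "trained" briefly (a stand-in loss function here), and the loss is
# fed back as a reinforcement signal that reshapes the sampling distribution.

random.seed(0)
widths = [4, 8, 16, 32]           # candidate hidden-layer widths (invented)
weights = [1.0] * len(widths)     # controller's unnormalized preferences
baseline = 0.5                    # running-average loss (the RL baseline)
best_width, best_loss = None, float('inf')

def train_and_evaluate(width):
    # Stand-in for "train them for a few hours": pretend width 16 is best.
    return abs(width - 16) / 16 + random.uniform(0.0, 0.1)

for generation in range(30):
    for _ in range(10):                                  # (1) generate ten models
        i = random.choices(range(len(widths)), weights)[0]
        loss = train_and_evaluate(widths[i])             # (2) train them briefly
        if loss < best_loss:
            best_width, best_loss = widths[i], loss
        weights[i] *= math.exp(0.5 * (baseline - loss))  # (3) loss as RL signal
        baseline = 0.9 * baseline + 0.1 * loss

print(best_width)  # 16
```

Configurations that beat the running-average loss see their sampling weight grow, so the controller spends most of its evaluations near the good region of the space.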
Special computation properties
● Reduced precision ok: about 1.2 × about 0.6 ≈ about 0.7 (NOT 1.21042 × 0.61127 = 0.73989343)
● Handful of specific operations
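The "reduced precision ok" point can be sketched by quantizing values to a few fractional bits and checking that a neural-net-style dot product barely moves (the numbers below are arbitrary examples, not model weights):

```python
# Quantize values to a small fixed number of fractional bits, then compare
# an exact dot product with its reduced-precision counterpart.

def quantize(x, frac_bits=8):
    scale = 1 << frac_bits           # 2**frac_bits quantization levels per unit
    return round(x * scale) / scale

weights = [1.21042, -0.33891, 0.61127]
inputs = [0.61127, 0.99218, -0.27035]

exact = sum(w * x for w, x in zip(weights, inputs))
approx = sum(quantize(w) * quantize(x) for w, x in zip(weights, inputs))

print(round(exact, 3), round(approx, 3))
assert abs(exact - approx) < 0.01    # low precision, nearly the same answer
```

Neural nets tolerate this kind of rounding noise, which is exactly what lets special-purpose hardware trade numerical precision for speed and power.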
Tensor Processing Unit
Custom Google-designed chip for neural net computations
In production use for >20 months: used on every search query, for neural machine translation, for the AlphaGo match, ...
Example queries of the future
“Which of these eye images shows symptoms of diabetic retinopathy?”
“Please fetch me a cup of tea from the kitchen”
“Describe this video in Spanish”
“Find me documents related to reinforcement learning for robotics and summarize them in German”
Conclusions
Deep neural networks are making significant strides in speech, vision, language, search, robotics, healthcare, …
If you’re not considering how to use deep neural nets to solve some problems, you almost certainly should be