Trends and Developments in Deep Learning Research

Jeff Dean, Google Brain team
g.co/brain

In collaboration with many other people at Google





ConvNets

[Figure: accuracy vs. scale (data size, model size). In the 1980s and 1990s, other approaches outperformed neural networks; with more compute, neural networks now surpass other approaches.]

[Figure: ImageNet classification error over time. 2011: 26% errors; humans: ~5% errors; 2016: 3% errors.]

Growing Use of Deep Learning at Google

Across many products/areas: Android, Apps, drug discovery, Gmail, image understanding, Maps, natural language understanding, Photos, robotics research, speech, translation, YouTube, ... many others ...

[Chart: growth over time in the number of directories containing model description files.]

Google Brain Team

● Research: 27 papers in ICML, NIPS, and ICLR in 2016, plus others in venues like ACL, ICASSP, CVPR, ICER, and OSDI; Brain Residency Program; hosted ~50 interns in 2016, ...

● Product impact: many dozens of high-impact collaborations in Google products/efforts like Search, Ads, Photos, Translate, Gmail, Maps, Cloud ML, speech recognition, self-driving cars, robotics, ...

● Open-source tools for ML: TensorFlow, Magenta, visualization tools, …

g.co/brain

Need to build the right tools

What do you want in a machine learning system?
● Ease of expression: for lots of crazy ML ideas/algorithms
● Scalability: can run experiments quickly
● Portability: can run on a wide variety of platforms
● Reproducibility: easy to share and reproduce research
● Production readiness: go from research to real products

Open, standard software for general machine learning

Great for Deep Learning in particular

First released Nov 2015

Apache 2.0 license

http://tensorflow.org/
and

https://github.com/tensorflow/tensorflow

Why Did We Build TensorFlow?
Wanted a system that was flexible, scalable, and production-ready

DistBelief, our first system, was good on two of these, but lacked flexibility

Most existing open-source packages were also good on 2 of 3 but not all 3

TensorFlow Goals
Establish a common platform for expressing machine learning ideas and systems

Make this platform the best in the world for both research and production use

Open source it so that it becomes a platform for everyone, not just Google

Facts and Figures
Launched on Nov. 9, 2015

Initial launch was reasonably fully featured: automatic differentiation, queues, control flow, a fairly comprehensive set of ops, ...

Tutorials made the system accessible

Out-of-the-box support for CPUs, GPUs, multiple devices, multiple platforms
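To make that concrete, here is a minimal sketch of the graph-building style the initial launch supported (a hypothetical example assuming a TensorFlow 1.x install, not from the original slides): declare ops, get gradients via automatic differentiation, then run everything in a session.

import tensorflow as tf

x = tf.Variable(3.0)
loss = x * x                       # a tiny "model": f(x) = x^2
grad = tf.gradients(loss, [x])[0]  # automatic differentiation: d(loss)/dx = 2x

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([loss, grad]))  # [9.0, 6.0]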

Some Stats
500+ contributors, most of them outside Google

12,000+ commits since Nov. 2015 (~30 per day)

1M+ binary downloads

#15 most popular repository on GitHub by stars (just passed Linux!)

Used in ML classes at quite a few universities now: Toronto, Berkeley, Stanford, ...

Many companies/organizations using TensorFlow: Google, DeepMind, OpenAI, Twitter, Snapchat, Airbus, Uber, ...

Release timeline:
● December 2015 (0.6): Python 3.3+, faster on GPUs
● February 2016 (0.7): dynamic loading of kernels, CuDNN v4
● April 2016 (0.8): distributed TensorFlow
● June 2016 (0.9): iOS, GPUs on Mac
● August 2016 (0.10): Slim
● October 2016 (0.11): CuDNN v5
● November 2016 (0.12): languages

Version 1.0 upcoming: stability, usability, backwards compatibility, documentation, libraries, models

Appeared in OSDI 2016: https://arxiv.org/abs/1605.08695

http://tensorflow.org/whitepaper2015.pdf

Strong External Adoption

[Chart: GitHub stars over time for TensorFlow (launched Nov. 2015) vs. other ML frameworks (launched Sep. 2013, Jan. 2012, Jan. 2008, and Apr. 2015).]

1M+ binary installs since November 2015

Experiment Turnaround Time and Research Productivity

● Minutes, hours:
○ Interactive research! Instant gratification!
● 1-4 days:
○ Tolerable
○ Interactivity replaced by running many experiments in parallel
● 1-4 weeks:
○ High-value experiments only
○ Progress stalls
● >1 month:
○ Don't even try

Just-In-Time Compilation via TensorFlow's XLA, "Accelerated Linear Algebra" compiler

0x00000000 movq (%rdx), %rax
0x00000003 vmovaps (%rax), %xmm0
0x00000007 vmulps %xmm0, %xmm0, %xmm0
0x0000000b vmovaps %xmm0, (%rdi)

...

TF graphs go in,

Optimized & specialized assembly comes out.

Let's explain that!

Demo: inspect JIT code in a TensorFlow IPython shell

XLA:CPU

XLA:GPU
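A minimal sketch of turning on the XLA JIT (a hypothetical example assuming a TensorFlow 1.x build with XLA compiled in; not from the original slides): with the session-wide JIT level enabled, small graphs like the squaring op above get compiled to specialized code.

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[4])
y = x * x  # the kind of op XLA can lower to a few vector instructions

config = tf.ConfigProto()
# Turn on the XLA JIT for everything run in this session.
config.graph_options.optimizer_options.global_jit_level = (
    tf.OptimizerOptions.ON_1)

with tf.Session(config=config) as sess:
    print(sess.run(y, feed_dict={x: [1.0, 2.0, 3.0, 4.0]}))  # [ 1. 4. 9. 16.]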

What are some ways that deep learning is having a significant impact at Google?

All of these examples are implemented using TensorFlow or our predecessor system

Speech Recognition

"How cold is it outside?"

[Diagram: Acoustic Input → Deep Recurrent Neural Network → Text Output]

Reduced word errors by more than 30%

Google Research Blog - August 2012, August 2015

Google Photos Search

[Diagram: Your Photo → Deep Convolutional Neural Network → automatic tag, e.g. "ocean"]

Search personal photos without tags.

Google Research Blog - June 2013


Reuse same model for completely different problems

Same basic model structure trained on different data, useful in completely different contexts

Example: given image → predict interesting pixels

We have tons of vision problems

Image search, Street View, satellite imagery, translation, robotics, self-driving cars, ...

www.google.com/sunroof

Medical Imaging
Using a similar model for detecting diabetic retinopathy in retinal images

Performance on par with or slightly better than the median of 8 U.S. board-certified ophthalmologists (F-score of 0.95 vs. 0.91).

http://research.googleblog.com/2016/11/deep-learning-for-detection-of-diabetic.html

Computers can now see

Large implications for machine learning for robotics

Combining Vision with Robotics

“Deep Learning for Robots: Learning from Large-Scale Interaction”, Google Research Blog, March, 2016

"Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection", Sergey Levine, Peter Pastor, Alex Krizhevsky, & Deirdre Quillen, arXiv, arxiv.org/abs/1603.02199

Better language understanding

RankBrain in Google Search Ranking

[Diagram: query & document features → Deep Neural Network → score for the (doc, query) pair. Example query: "car parts for sale"; example doc: "Rebuilt transmissions ..."]

Launched in 2015
Third most important search ranking signal (of 100s)

Bloomberg, Oct 2015: "Google Turning Its Lucrative Web Search Over to AI Machines"

Sequence-to-Sequence Model

[Diagram: a deep LSTM encodes the input sequence A B C D into a fixed vector v, then decodes the target sequence X Y Z Q one symbol at a time, feeding each emitted symbol back in as the next input.]

[Sutskever & Vinyals & Le, NIPS 2014]

Sequence-to-Sequence Model: Machine Translation

[Diagram, animated across several slides: the encoder reads the input sentence "Quelle est votre taille ?" followed by <EOS>; the decoder then emits the target sentence "How tall are you?" one word at a time, feeding each emitted word back in as the next decoder input.]

[Sutskever & Vinyals & Le, NIPS 2014]

At inference time: beam search to choose the most probable of the possible output sequences
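Beam search itself is simple to sketch. In the generic sketch below, next_token_logprobs is a hypothetical stand-in for the decoder's next-word log-probability distribution; in the real system that role is played by the LSTM decoder above.

import heapq

def beam_search(next_token_logprobs, beam_width=4, max_len=20, eos="</s>"):
    # Each beam is (cumulative log-probability, tokens emitted so far).
    beams = [(0.0, [])]
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            if seq and seq[-1] == eos:        # finished hypotheses carry over
                candidates.append((logp, seq))
                continue
            for tok, tok_logp in next_token_logprobs(seq).items():
                candidates.append((logp + tok_logp, seq + [tok]))
        # Keep only the beam_width most probable (partial) output sequences.
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        if all(seq and seq[-1] == eos for _, seq in beams):
            break
    return max(beams, key=lambda c: c[0])[1]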

Smart Reply

April 1, 2009: April Fool's Day joke
Nov 5, 2015: launched as a real product
Feb 1, 2016: >10% of mobile Inbox replies

Smart Reply (Google Research Blog, Nov 2015)

[Diagram: Incoming Email → Small Feed-Forward Neural Network → "Activate Smart Reply?" (yes/no); if yes, a Deep Recurrent Neural Network produces the Generated Replies.]
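The design point is the cascade: a cheap model decides whether to run the expensive one at all. A hedged sketch, where ffn_score and rnn_generate are hypothetical stand-ins for the two networks:

def smart_reply(email_features, threshold=0.5, num_replies=3):
    # Stage 1: small feed-forward network answers "Activate Smart Reply?"
    if ffn_score(email_features) < threshold:
        return []  # most mail gets no suggestions and skips the costly model
    # Stage 2: deep recurrent network generates candidate replies.
    return rnn_generate(email_features, num_replies)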

Combining vision and language

Image Captioning

[Diagram: image features seed an LSTM that emits the caption one word at a time: "A young girl asleep ..."]

[Vinyals et al., CVPR 2015]

Model: A close up of a child holding a stuffed animal.

Human: A young girl asleep on the sofa cuddling a stuffed bear.

Model: A baby is asleep next to a teddy bear.

Image Captions Research

Translation as a sign of better language understanding

Great quality improvements

...but challenging scalability issues

https://arxiv.org/abs/1609.08144

Google Neural Machine Translation Model

[Diagram: 8 layers of encoder LSTMs feed 8 layers of decoder LSTMs through an attention module, ending in a softmax over output words; the stacked layers are partitioned across GPUs (Gpu1 ... Gpu8).]

One model replica: one machine w/ 8 GPUs

Model + Data Parallelism

[Diagram: many model replicas train in parallel, each reading and updating parameters distributed across many parameter server machines ("Params" shards).]
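A conceptual sketch of one replica's training step under this pattern (fetch, compute_gradients, and shard_of are hypothetical helpers, not real APIs):

def replica_step(param_shards, batch, learning_rate=0.1):
    params = fetch(param_shards)              # pull current parameter values
    grads = compute_gradients(params, batch)  # forward/backward on this batch
    for name, grad in grads.items():
        # Asynchronous update: send each gradient to the shard that owns it.
        param_shards[shard_of(name)].apply_update(name, -learning_rate * grad)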

[Chart: translation quality, rated from 0 to 6 (perfect translation), for phrase-based (PBMT), neural (GNMT), and human translations on six language pairs: English > Spanish, English > French, English > Chinese, Spanish > English, French > English, Chinese > English.]

Neural Machine Translation

Closes gap between old system and human-quality translation by 58% to 87%

Enables better communication across the world

research.googleblog.com/2016/09/a-neural-network-for-machine.html

More widespread use of: transfer and multi-task learning, zero-shot learning

Currently: most models are trained from scratch for a single task

This is quite inefficient:
● Data inefficient: needs lots of data for each task
● Computation inefficient: starting from scratch is a lot of work
● Human ML expert inefficient: substantial effort required for each task

"Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation", Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, and Jeffrey Dean
https://arxiv.org/abs/1611.04558

https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html

Bigger models, but sparsely activated

Per-Example Routing

"Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer", Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le & Jeff Dean. Submitted to ICLR 2017, https://openreview.net/pdf?id=B1ckMDqlg
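To make per-example routing concrete, here is a small NumPy sketch (illustrative only, not the paper's implementation): a gating network scores the experts for each example, and only the top-k experts actually run.

import numpy as np

rng = np.random.default_rng(0)
num_experts, k, d = 4, 2, 8
experts = [rng.standard_normal((d, d)) * 0.1 for _ in range(num_experts)]
w_gate = rng.standard_normal((d, num_experts)) * 0.1  # gating network

def moe_layer(x):
    """Route each example to its top-k experts, mixed by softmax gates."""
    logits = x @ w_gate                      # (batch, num_experts) gate scores
    out = np.zeros_like(x)
    for i, xi in enumerate(x):
        top = np.argsort(logits[i])[-k:]     # indices of the k best experts
        gates = np.exp(logits[i, top])
        gates /= gates.sum()                 # softmax over the selected experts
        for g, e in zip(gates, top):
            out[i] += g * (xi @ experts[e])  # only k of num_experts run
    return out

print(moe_layer(rng.standard_normal((3, d))).shape)  # (3, 8)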

Automated machine learning (“learning to learn”)

Current: Solution = ML expertise + data + computation

Can we turn this into: Solution = data + 100X computation?

Idea: a model-generating model trained via RL (sketched below):
(1) Generate ten models
(2) Train them for a few hours
(3) Use the loss of the generated models as the reinforcement learning signal
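In pseudocode, the loop looks roughly like this (controller, train_and_evaluate, and the methods on them are hypothetical stand-ins, not a real API):

def architecture_search(controller, num_rounds=100):
    for _ in range(num_rounds):
        # (1) The controller model proposes ten candidate architectures.
        candidates = [controller.sample_architecture() for _ in range(10)]
        # (2) Train each candidate for a few hours and measure its quality.
        rewards = [train_and_evaluate(c) for c in candidates]
        # (3) Use the results as the RL signal to update the controller.
        controller.reinforce(candidates, rewards)
    return controller.best_architecture()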

[Figures: a "normal" LSTM cell vs. a cell discovered by architecture search, with results on the CIFAR-10 image recognition task and the Penn Tree Bank language modeling task.]

More computational power needed

Deep learning is transforming how we design computers

Special computation properties:

● Reduced precision is OK: "about 1.2 × about 0.6 ≈ about 0.7" is good enough; you do NOT need 1.21042 × 0.61127 = 0.73989343

● Only a handful of specific operations are needed
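A quick illustration of the reduced-precision point (plain NumPy, nothing TPU-specific): snap the operands to a coarse grid and the product is still "about 0.7".

import numpy as np

def quantize(x, scale=64.0):
    """Snap x to a coarse grid, mimicking a low-precision representation."""
    return np.round(x * scale) / scale

a, b = 1.21042, 0.61127
print(a * b)                      # 0.7398934... (full precision)
print(quantize(a) * quantize(b))  # 0.7331... -- still "about 0.7"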

Tensor Processing Unit
Custom Google-designed chip for neural net computations

In production use for >20 months: used on every search query, for neural machine translation, for the AlphaGo match, ...

Example queries of the future

"Which of these eye images shows symptoms of diabetic retinopathy?"

"Please fetch me a cup of tea from the kitchen"

"Describe this video in Spanish"

"Find me documents related to reinforcement learning for robotics and summarize them in German"

Conclusions
Deep neural networks are making significant strides in speech, vision, language, search, robotics, healthcare, ...

If you’re not considering how to use deep neural nets to solve some problems, you almost certainly should be

More info about our work: g.co/brain

Thanks!

g.co/brain