Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016


Applying Deep Learning at Facebook Scale

Director of Engineering, Applied Machine Learning

Hussein Mehanna

Applications of deep learning
• Event prediction
• Machine translation
• Large scale computer vision
• Natural language processing

Applications of deep learning: event prediction

Why should I like this story?

1B+ new stories every day

+ billions of stories from this day in years past

A billion people, thousands of stories each

Ranked in milliseconds

Deep learning for ranking

[Diagram: the story title feeds a deep learning text model; sparse user features such as "I like soccer", "I am from Australia", "I am 26", "I traveled to Argentina" feed the ranking model]

Massive sparse logistic regression + deep neural networks

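To make the "massive sparse logistic regression + deep neural networks" combination concrete, here is a minimal NumPy sketch of that hybrid scoring idea. Everything here (shapes, feature ids, the predict helper) is an illustrative assumption, not Facebook's actual ranking stack.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature space: 1M sparse binary features ("likes soccer",
# "from Australia", ...), only a handful active per user/story pair.
NUM_SPARSE, NUM_DENSE, HIDDEN = 1_000_000, 16, 32

w_sparse = rng.normal(0, 0.01, NUM_SPARSE)       # sparse logistic regression weights
W1 = rng.normal(0, 0.1, (NUM_DENSE, HIDDEN))     # small deep net on dense features
w2 = rng.normal(0, 0.1, HIDDEN)

def predict(active_ids, dense_x):
    """Score = sigmoid(sparse-LR logit + deep-net logit)."""
    lr_logit = w_sparse[active_ids].sum()        # dot product with one-hot sparse features
    h = np.maximum(dense_x @ W1, 0.0)            # one ReLU hidden layer
    dnn_logit = h @ w2
    return 1.0 / (1.0 + np.exp(-(lr_logit + dnn_logit)))

# One user/story pair: a few active sparse features plus dense features.
p = predict(np.array([42, 1337, 999_999]), rng.normal(size=NUM_DENSE))
print(f"P(like) = {p:.3f}")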

Applications of deep learning: machine translation

Machine translation with neural networks

Recurrent neural networks with an attention decoder

[Diagram: encoder input → encoded states → attention model → decoder. Example: "Gonna have some fun today" → "Vamos a divertirnos hoy" (Spanish: "Let's have fun today")]
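The attention model above can be sketched in a few lines: at each decoder step, score every encoded source state against the current decoder state, softmax the scores, and take the weighted average as context. This toy NumPy version uses random vectors and plain dot-product scoring; the actual models learn these states with RNNs.

import numpy as np

def attention(decoder_state, encoder_states):
    """Dot-product attention: weight each encoded source word by its
    relevance to the current decoder state, then average."""
    scores = encoder_states @ decoder_state        # one score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over source positions
    context = weights @ encoder_states             # weighted sum of encoder states
    return context, weights

rng = np.random.default_rng(0)
src_len, d = 5, 8                                  # e.g. "Gonna have some fun today"
encoder_states = rng.normal(size=(src_len, d))     # one encoded state per source word
decoder_state = rng.normal(size=d)                 # state before emitting the next word

context, weights = attention(decoder_state, encoder_states)
print("attention over source words:", np.round(weights, 3))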

Applications of deep learning: large scale computer vision

[Video demos]

Hundreds of convolutional neural networks run on photos uploaded to Facebook

Classification · Detection · Segmentation (e.g. person, plate, drink)

Improving Inference for deep learning
• Compute faster
• Memory usage in deep networks
• Compress models

Compute faster: faster convolution algorithms for deep learning

Convolutions account for 90%+ of runtime for modern vision models

Convolution implementation strategies:
• 2013: im2col + sgemm
• 2014: FFT
• 2015: tiled FFT, Winograd

NNPACK: "CuDNN for CPUs"
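As a quick illustration of the 2013-era strategy in the list above, a convolution can be lowered to a single large matrix multiply via im2col. A minimal NumPy sketch, assuming unit stride, no padding, and made-up shapes:

import numpy as np

def conv2d_im2col(x, w):
    """Convolution as im2col + one GEMM. x: (C, H, W), w: (K, C, R, S)."""
    C, H, W = x.shape
    K, _, R, S = w.shape
    out_h, out_w = H - R + 1, W - S + 1
    cols = np.empty((C * R * S, out_h * out_w))
    for i in range(out_h):                         # unroll each receptive field
        for j in range(out_w):                     # into one column
            cols[:, i * out_w + j] = x[:, i:i + R, j:j + S].ravel()
    return (w.reshape(K, -1) @ cols).reshape(K, out_h, out_w)  # the sgemm

x = np.random.rand(3, 8, 8)                        # 3 channels, 8x8 image
w = np.random.rand(4, 3, 3, 3)                     # 4 filters, 3x3 kernels
print(conv2d_im2col(x, w).shape)                   # (4, 6, 6)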

NNPACK (github.com/Maratyszcza/NNPACK)

Easy integration: CuDNN-style C interface, easy to integrate

Supports the computationally-intensive layers:
• Convolutions (tiled FFT, Winograd)
• Pooling
• Fully connected layers (GEMM/GEMV)

Implementation: via an x86-64 meta-assembler (PeachPy)

Excellent performance: 2x-6x vs. baseline CPU

Open source, integrated into several deep learning frameworks:
• Caffe/Caffe2: github.com/ajtulloch/caffe/tree/nnpack-pr
• Torch: github.com/szagoruyko/nnpack.torch
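The Winograd algorithms behind NNPACK's fast convolutions can be seen in miniature in F(2,3), which computes two outputs of a 3-tap filter with 4 multiplications instead of 6. A 1-D sketch for illustration only; production kernels use tiled 2-D variants:

import numpy as np

def winograd_f23(d, g):
    """Winograd F(2,3): two 3-tap filter outputs from four inputs,
    using 4 multiplies (m1..m4) instead of the direct method's 6."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d = np.array([1.0, 2.0, 3.0, 4.0])                 # input tile
g = np.array([0.5, -1.0, 2.0])                     # 3-tap filter
direct = np.array([d[0:3] @ g, d[1:4] @ g])        # plain sliding-window result
print(np.allclose(winograd_f23(d, g), direct))     # True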

Improving Inference for deep learning: memory usage in deep networks

The Memory Andy-Bill Theorem

Trend:
• ResNets in vision
• deep LSTMs in language modeling

Scale constraints:
• GPU memory relatively stable (12GB on Titan X/M40, 16GB on P100)
• CPU memory has many constraints, especially in applied settings

Spend is in activations: the bulk of memory is in the activations – must reuse

Memory savings for modern ConvNets: 50%-90%

Ideas from compilers: view activations as virtual registers and run a register allocator (graph coloring on the interference graph)

Run inference in an O(N)-ResNet in O(1) memory!

[Chart: memory savings for AlexNet and the Inception Network]
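One way to picture the register-allocation trick: treat each activation's lifetime as an interval and reuse any buffer whose current occupant is already dead. This toy sketch assumes a linear chain in which each activation dies as soon as the next layer consumes it; real allocators color a general interference graph:

def assign_buffers(live):
    """live: (first_use, last_use) per activation, sorted by first_use.
    Greedily maps each activation to a reusable buffer id."""
    buffers = []                          # last_use of each buffer's occupant
    assignment = []
    for start, end in live:
        for b, busy_until in enumerate(buffers):
            if busy_until < start:        # occupant dead: reuse this buffer
                buffers[b] = end
                assignment.append(b)
                break
        else:
            buffers.append(end)           # all buffers busy: allocate a new one
            assignment.append(len(buffers) - 1)
    return assignment

# 8 chained activations, each dead after the next layer consumes it:
print(assign_buffers([(i, i + 1) for i in range(8)]))
# [0, 1, 0, 1, 0, 1, 0, 1]: two physical buffers, i.e. O(1) memory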

Some implementations:
• MXNet: github.com/dmlc/mxnet-memonger
• Caffe/Caffe2: github.com/facebook/fb-caffe-exts
• Torch: github.com/fmassa/optimize-net

Can go further and explicitly trade off compute for memory: ResNet-1000 from 48GB to 7GB for 30% slower timings
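The explicit trade-off can be sketched as checkpointing: keep only every k-th activation and recompute the others from the nearest checkpoint when needed, so resident memory shrinks at the cost of extra forward compute. A toy version with scalar stand-in "layers"; the function names are hypothetical, not any framework's API:

def forward_with_checkpoints(x, layers, k=4):
    """Run the chain, keeping only every k-th activation resident."""
    checkpoints = {0: x}
    h = x
    for i, layer in enumerate(layers, start=1):
        h = layer(h)
        if i % k == 0:
            checkpoints[i] = h            # 1-in-k activations stay in memory
    return h, checkpoints

def activation_at(i, layers, checkpoints, k=4):
    """Recompute activation i from the nearest earlier checkpoint."""
    base = (i // k) * k
    h = checkpoints[base]
    for layer in layers[base:i]:
        h = layer(h)                      # extra compute instead of extra memory
    return h

layers = [lambda v, j=j: v * 2 + j for j in range(16)]   # stand-in layers
out, cps = forward_with_checkpoints(3.0, layers)
print(activation_at(7, layers, cps))      # matches a full forward pass to layer 7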

Improving Inference for deep learning: compress models

Deep compression pipeline (Han et al.)

Original network: original size, original accuracy

Pruning (fewer weights):
• Train connectivity
• Prune connections
• Train weights
→ 10x reduction, same accuracy

Quantization (fewer bits per weight):
• Cluster the weights
• Generate code book
• Quantize the weights with the code book
• Retrain code book
→ 27x-31x reduction, same accuracy

Huffman encoding:
• Encode weights
• Encode index
→ 35x-50x reduction, same accuracy

All together: Pruning + Quantization + Huffman coding (VGG-16)
• Model size: 552MB → 11.3MB (49x reduction)
• Top-1 error: 31.50% → 31.17%
• Top-5 error: 11.32% → 10.91%
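A toy rendition of the pipeline's first two stages, with a crude 1-D k-means standing in for the paper's codebook learning (the retraining and Huffman steps are omitted, and all shapes and thresholds here are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(0, 1, (64, 64))                     # a toy weight matrix

# Pruning: drop small-magnitude weights (fewer weights).
mask = np.abs(W) > np.quantile(np.abs(W), 0.9)     # keep only the top 10%
W_pruned = W * mask

def kmeans_1d(values, k=16, iters=20):
    """Cluster surviving weights into a 16-entry code book (4-bit codes)."""
    centroids = np.linspace(values.min(), values.max(), k)
    for _ in range(iters):
        idx = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        for c in range(k):
            if np.any(idx == c):
                centroids[c] = values[idx == c].mean()
    return centroids, idx

codebook, codes = kmeans_1d(W_pruned[mask])        # store 4-bit codes + 16 floats
W_quantized = np.zeros_like(W)
W_quantized[mask] = codebook[codes]                # decode: look up each weight
print(f"kept {mask.mean():.0%} of weights, "
      f"max quantization error {np.abs(W_quantized - W_pruned).max():.3f}")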

Applications of deep learning: event prediction, machine translation, large scale computer vision, natural language processing

Improving Inference for deep learning: compute faster, memory usage in deep networks, compress models
