Deep Learning at Twitter's Scale
Cibele Montez Halasz, Machine Learning Engineer @ Twitter Cortex
October 11, 2018

on-demand.gputechconf.com/gtc-eu/2018/pdf/e8449-deep... · 2018. 10. 7.

Agenda

1. Background
2. Workflow/Platform
3. Modeling and Optimizations
4. Additional Performance Gains
5. GPU

Background

● Challenges: Characteristics of Platform

● Data Shift

Challenges: Sparsity of Data

VERY SPARSE DATA

Source: Luca Belli/Dan Shiebler, June, 2018

Challenges: Speed

Source: Jacob Kastrenakes, The Verge, July, 2018

Data Shift

Source: Luca Belli/Dan Shiebler, June, 2018

Machine Learning at the Company

● Environment
● Modeling: some use cases

Environment: ML Overview

[Diagram: Team A’s Data Aggregation and Feature Extraction Job; Cortex Embedding Generation Pipeline; Feature Registry; Team A’s, Team B’s, and Team C’s Machine Learning Models]

Source: Luca Belli/Dan Shiebler, June, 2018

Environment: ML Workflows

Source: Devin Goodsell, NYC Machine Learning Meetup, June 20, 2018

Environment: ML Training

Environment: Priorities

● Feature Addition → Scalable data
● Data Addition → Scalable data
● Training → Fast, robust training engine
● Deployment → Seamless and tested ML services
● A/B test → Good A/B test environment

Modeling: Modeling and Optimizations with TensorFlow

Modeling

Discretizer Full Sparse Dense MLP Full Sparse

Sparse Linear Layer

[Diagram: input values V1, V2, ..., Vn-2, Vn-1, Vn; output units 1, 2, ..., K]
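
A minimal sketch of what a sparse linear layer computes, assuming each example arrives as a list of (feature index, value) pairs and the layer holds one K-wide weight row per input feature. The function name, the toy sizes, and the weight values are illustrative assumptions, not taken from the talk, and the sketch is plain Python rather than the TensorFlow implementation.

```python
def sparse_linear(features, weights, bias):
    """Compute a K-dimensional output from a sparse input.

    features: list of (index, value) pairs for the non-zero inputs
    weights:  N rows of K floats (one row per input feature)
    bias:     K floats
    """
    output = list(bias)
    # Only rows of features that are present contribute, so the cost
    # scales with the number of non-zeros rather than with N.
    for index, value in features:
        row = weights[index]
        for k in range(len(output)):
            output[k] += value * row[k]
    return output


# Hypothetical toy sizes: N = 5 input features, K = 2 output units.
weights = [[0.1, 0.2], [0.0, 1.0], [0.5, -0.5], [1.0, 1.0], [-1.0, 0.0]]
bias = [0.0, 0.1]
example = [(1, 2.0), (4, 3.0)]  # features 1 and 4 are present
result = sparse_linear(example, weights, bias)
```

The work done per example depends on how many features are present, which is what makes this formulation attractive for very sparse inputs.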

Sparse Linear Layer: First Approach

Source: TensorFlow

Sparse Linear Layer: Final Approach

Source: TensorFlow

Sparse Linear Layer

● Optimizers

    ○ SGD
    ○ Lazy Adam

Source: Berkeley Artificial Intelligence Research Lab
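
A hedged sketch of how the two optimizers treat a sparse gradient, assuming one scalar weight per feature and a gradient given as a dict of touched indices. The "lazy" part is that Adam's moment estimates are updated only for rows that appear in the gradient, which is how lazy Adam is commonly described; the class name, defaults, and toy values here are illustrative assumptions, not the talk's code.

```python
import math


def sgd_sparse_update(weights, sparse_grad, lr):
    """Plain SGD: only rows that received a gradient are touched."""
    for index, grad in sparse_grad.items():
        weights[index] -= lr * grad


class LazyAdam:
    """Adam variant that updates moment estimates only for touched rows."""

    def __init__(self, size, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * size  # first-moment estimate per weight
        self.v = [0.0] * size  # second-moment estimate per weight
        self.step = 0

    def update(self, weights, sparse_grad):
        self.step += 1
        # Bias correction uses the global step count, as in standard Adam.
        correction = (math.sqrt(1 - self.beta2 ** self.step)
                      / (1 - self.beta1 ** self.step))
        for index, grad in sparse_grad.items():
            # Rows absent from sparse_grad keep their old m, v and weight.
            self.m[index] = self.beta1 * self.m[index] + (1 - self.beta1) * grad
            self.v[index] = self.beta2 * self.v[index] + (1 - self.beta2) * grad * grad
            step_size = self.lr * correction
            weights[index] -= step_size * self.m[index] / (math.sqrt(self.v[index]) + self.eps)


adam_weights = [1.0, 1.0, 1.0, 1.0]
optimizer = LazyAdam(size=4, lr=0.1)
optimizer.update(adam_weights, {2: 0.5})  # only feature 2 was active

sgd_weights = [1.0, 1.0, 1.0, 1.0]
sgd_sparse_update(sgd_weights, {2: 0.5}, lr=0.1)
```

With millions of mostly inactive features, skipping the untouched rows avoids a dense pass over the whole weight table on every step.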

Sparse Linear Layer: Variable Partitioning

output:

Sparse Linear Layer: Variable Partitioning: Profiling

~33% reduction
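
One way to picture variable partitioning, sketched under assumptions: the weight rows are split across shards, each shard computes the contribution of the features it owns, and the partial outputs are summed. The modulo assignment of rows to shards, the function names, and the toy values are all hypothetical; the talk's own partitioning scheme is not spelled out in the text.

```python
def partition_rows(weights, num_partitions):
    """Split weight rows into shards; row i goes to shard i % num_partitions."""
    shards = [dict() for _ in range(num_partitions)]
    for index, row in enumerate(weights):
        shards[index % num_partitions][index] = row
    return shards


def partial_output(shard, features, num_outputs):
    """Contribution of one shard: only features whose rows live here."""
    out = [0.0] * num_outputs
    for index, value in features:
        row = shard.get(index)
        if row is not None:
            for k in range(num_outputs):
                out[k] += value * row[k]
    return out


def partitioned_sparse_linear(shards, features, num_outputs):
    """Sum the per-shard contributions to get the full layer output."""
    total = [0.0] * num_outputs
    for shard in shards:  # each shard could be processed in parallel
        part = partial_output(shard, features, num_outputs)
        total = [t + p for t, p in zip(total, part)]
    return total


weights = [[0.1, 0.2], [0.0, 1.0], [0.5, -0.5], [1.0, 1.0], [-1.0, 0.0]]
features = [(1, 2.0), (2, 4.0), (4, 3.0)]
shards = partition_rows(weights, num_partitions=2)
output = partitioned_sparse_linear(shards, features, num_outputs=2)
```

Because the layer is linear, the sum of the shard outputs equals the output of the unpartitioned layer, so partitioning changes how the work is scheduled and not the result.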

Sparse Linear Layer: Online Normalization

● Example:

Input: input_feature (value == 1M)
⇒ weight_gradient == 1M
⇒ update = 1M * learning_rate
⇒ ?

Source: Nicolas Koumchatzky

● Normalization of input values
    ○ Belongs to [-1, 1]
    ○ Trainable per-feature bias: discriminate absence and presence of features

Source: Nicolas Koumchatzky
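
A small sketch of why a raw value of 1M is a problem and how online normalization bounds it. The exact normalization formula is not given in the text, so scaling by a running per-feature maximum of the absolute value is an assumption, as are the class name, the learning rate, and the bias and weight values.

```python
class OnlineNormalizer:
    """Scale each feature by the largest magnitude seen so far for it."""

    def __init__(self):
        self.max_abs = {}  # feature index -> running max of |value|

    def normalize(self, index, value):
        seen = max(self.max_abs.get(index, 0.0), abs(value))
        self.max_abs[index] = seen
        # Dividing by the running max keeps the result in [-1, 1].
        return value / seen if seen > 0 else 0.0


def weight_gradient(input_value, upstream_grad=1.0):
    """For y = w * x, dL/dw = dL/dy * x, so the gradient scales with x."""
    return upstream_grad * input_value


learning_rate = 0.01
raw_value = 1_000_000.0

# Without normalization a single example moves the weight by about 10,000.
raw_update = learning_rate * weight_gradient(raw_value)

normalizer = OnlineNormalizer()
normalized_value = normalizer.normalize(index=7, value=raw_value)
normalized_update = learning_rate * weight_gradient(normalized_value)

# A per-feature bias is added only when the feature is present, so a
# present feature with a small normalized value still differs from an
# absent one.
feature_bias = {7: 0.05}  # hypothetical trained value
weight = 0.3              # hypothetical trained value
contribution_present = weight * normalized_value + feature_bias[7]
contribution_absent = 0.0
```

After normalization the update is on the order of the learning rate itself, regardless of the raw scale of the feature.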

Additional Performance Gains

Hogwild

Source: Hogwild!, UC Berkeley

Hogwild

Source: TensorFlow
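
The Hogwild idea, several workers applying SGD updates to shared weights without locking, can be sketched as follows. This is a toy illustration under assumptions: the target weights, step counts, and learning rate are made up, and because CPython's global interpreter lock serializes bytecode, the sketch shows the unsynchronized access pattern rather than any real speedup.

```python
import random
import threading

TRUE_WEIGHTS = [1.0, -2.0, 0.5, 3.0]        # hypothetical target values
shared_weights = [0.0] * len(TRUE_WEIGHTS)  # shared by all workers, no lock


def worker(seed, steps, lr):
    rng = random.Random(seed)
    for _ in range(steps):
        # Each example touches a single feature, so updates are sparse
        # and two workers rarely collide on the same weight.
        i = rng.randrange(len(TRUE_WEIGHTS))
        target = TRUE_WEIGHTS[i]
        grad = shared_weights[i] - target   # d/dw of 0.5 * (w - target)^2
        shared_weights[i] -= lr * grad      # unsynchronized write


workers = [
    threading.Thread(target=worker, args=(seed, 2000, 0.1))
    for seed in range(4)
]
for thread in workers:
    thread.start()
for thread in workers:
    thread.join()

max_error = max(
    abs(learned - target)
    for learned, target in zip(shared_weights, TRUE_WEIGHTS)
)
```

An occasional overwritten update only delays convergence slightly when updates are sparse, which is the property that makes dropping the lock acceptable.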

Custom Ops

GPU x CPU metrics

GPU Benchmarks

Throughput by batch size and model optimization:

Batch size | CPU: Baseline (tf.sparse_tensor_dense_matmul) | CPU: After optimizations | GPU: Baseline (tf.sparse_tensor_dense_matmul) | GPU: After optimizations
256 | 1024 samples/s | 7372 samples/s | 5504 samples/s | 22528 samples/s
512 | 1638 samples/s | 11264 samples/s | 8448 samples/s | 21504 samples/s
1024 | 2355 samples/s | 13312 samples/s | 10752 samples/s | 22528 samples/s

GPU benchmarks were run with NVIDIA Tesla K80 processors.
CPU benchmarks were run with Intel Xeon Platinum 8180 processors.
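
The relative gains implied by these throughput figures can be worked out with a few divisions. The sketch below only restates the reported numbers; the dictionary keys and the function name are illustrative.

```python
# Throughput in samples/s, copied from the benchmark figures above.
benchmarks = {
    256:  {"cpu_base": 1024, "cpu_opt": 7372,  "gpu_base": 5504,  "gpu_opt": 22528},
    512:  {"cpu_base": 1638, "cpu_opt": 11264, "gpu_base": 8448,  "gpu_opt": 21504},
    1024: {"cpu_base": 2355, "cpu_opt": 13312, "gpu_base": 10752, "gpu_opt": 22528},
}


def speedups(row):
    """Ratios between configurations for one batch size."""
    return {
        "cpu_opt_vs_cpu_base": row["cpu_opt"] / row["cpu_base"],
        "gpu_opt_vs_gpu_base": row["gpu_opt"] / row["gpu_base"],
        "gpu_opt_vs_cpu_opt": row["gpu_opt"] / row["cpu_opt"],
        "gpu_opt_vs_cpu_base": row["gpu_opt"] / row["cpu_base"],
    }


for batch_size, row in benchmarks.items():
    ratios = speedups(row)
    summary = ", ".join(f"{name}={value:.1f}x" for name, value in ratios.items())
    print(f"batch {batch_size}: {summary}")
```

At batch size 256, for example, the optimized GPU configuration is 22 times the CPU baseline (22528 / 1024), and the optimizations alone give roughly a 7x gain on CPU.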

Acknowledgements

● Andrew Bean
● Ricardo Cervera-Navarro
● Priyank Jain
● Ruhua Jiang
● Nicholas Léonard
● Briac Marcatté
● Mahak Patidar
● Tim Sweeney
● Pavan Yalamanchili
● Yi Zhuang

Thank you! Questions?

Follow me on Twitter: @cibelemh
Email me at: [email protected]