Urs Köster Presenting at RE-Work DL Summit in Boston

Proprietary and confidential. Do not distribute.

Deep Learning at Scale

May 2016 Urs Köster, PhD

Nervana

MAKING MACHINES SMARTER.

About nervana

• A platform for machine intelligence

• Enables deep learning at scale

• Optimized from algorithms to silicon

The Nervana Platform: a full-stack solution

• neon deep learning framework

• Nervana Cloud

• Solutions: images, text, tabular data, speech, time series, video

neon: Nervana Python deep learning library

• User-friendly, extensible, fast

• Support for many deep learning models

• Interface to Nervana Cloud

• Multiple backends:

  • Nervana Engine

  • GPU (optimized assembler kernels)

  • CPU cluster

Open source (Apache 2.0) on github.com/nervanaSystems/neon

Nervana Cloud

• Web interface

• Command line

Deep learning as a core technology

• ‘Google Brain’ model: deep learning behind Photos, Maps, Voice Search, the self-driving car, ad targeting, and machine translation

• Nervana Platform: deep learning for image classification, object localization, video indexing, speech recognition, and natural language

Video recognition with 3D convolution

[Figure: Training speed in epochs/hour (axis 0 to 1), neon vs. Caffe]
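The slide reports only training speed. As background, a 3D convolution slides its kernel along time as well as height and width, so a single filter can respond to changes across consecutive frames. Below is a minimal pure-Python sketch under stated assumptions (one input channel, stride 1, no padding, and cross-correlation, which is what deep learning libraries usually call convolution). It is an illustration only, not neon's or Caffe's implementation; the function name `conv3d_valid` is hypothetical.

```python
def conv3d_valid(volume, kernel):
    """Valid 3D cross-correlation, single channel, stride 1 (illustrative sketch).

    volume: nested lists indexed [t][y][x] (frames x rows x cols)
    kernel: nested lists indexed [kt][ky][kx]
    Returns nested lists of shape (T-KT+1) x (H-KH+1) x (W-KW+1).
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    KT, KH, KW = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - KT + 1):            # slide over time
        plane = []
        for y in range(H - KH + 1):        # slide over rows
            row = []
            for x in range(W - KW + 1):    # slide over columns
                acc = 0.0
                # Sum the elementwise product over the 3D kernel window.
                for dt in range(KT):
                    for dy in range(KH):
                        for dx in range(KW):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Usage: brightness rises by 1 each frame; a (-1, +1) kernel along time
# computes next frame minus current frame at every pixel.
frames = [[[float(t)] * 2 for _ in range(2)] for t in range(3)]
diff_kernel = [[[-1.0]], [[1.0]]]
print(conv3d_valid(frames, diff_kernel))
```

The temporal-difference kernel in the usage example is a crude motion detector; a learned 3D filter generalizes this idea by mixing spatial and temporal weights.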

Object Localization / Segmentation

[Figure: CamVid dataset, SegNet model]

[Figure: KITTI dataset, Fast R-CNN model]

Model                              neon (ms)   Caffe (ms)   Speedup
Fast R-CNN (batch size 4)                360          670      1.8x
SegNet (batch size 4)                    267         1455      5.4x
SegNet (4 GPUs, batch size 16)           348           --     *5.9x
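The speedup column appears to be the Caffe time divided by the neon time, truncated to one decimal place (670 / 360 ≈ 1.86 is listed as 1.8x, and 1455 / 267 ≈ 5.45 as 5.4x). The truncation is inferred from the numbers rather than stated on the slide. The starred 5.9x entry cannot be reproduced from the table alone, because there is no 4-GPU Caffe time. A small sketch of the arithmetic; the helper names are hypothetical:

```python
import math


def speedup(baseline_ms, optimized_ms):
    """Ratio of baseline time to optimized time (higher means faster)."""
    return baseline_ms / optimized_ms


def truncate1(x):
    """Truncate to one decimal place (assumed to match the table's rounding)."""
    return math.floor(x * 10) / 10


# (Caffe ms, neon ms) for the single-GPU rows of the table
rows = {"Fast R-CNN": (670, 360), "SegNet": (1455, 267)}
for name, (caffe_ms, neon_ms) in rows.items():
    s = speedup(caffe_ms, neon_ms)
    print(f"{name}: {s:.3f} -> {truncate1(s)}x")
```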

Image Classification (Residual Network)

Speech to text

Imagenet ILSVRC Challenge

[Figure: ImageNet ILSVRC top-5 error rate by year, 2010 to 2015 (axis 0% to 30%); deep learning entries AlexNet, Clarifai, GoogLeNet and ResNet compared with human performance]
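The chart's metric, top-5 error, counts a prediction as correct when the true class is among the model's five highest-scoring classes. A minimal sketch, assuming each image's scores are given as a plain list of per-class scores; `top5_error` is a hypothetical helper, and ties favor the lower class index because `heapq.nlargest` behaves like a stable descending sort.

```python
import heapq


def top5_error(scores, labels):
    """Fraction of examples whose true label is NOT among the 5 highest scores.

    scores: one list of per-class scores per example
    labels: the true class index for each example
    """
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the five highest-scoring classes for this example.
        top5 = heapq.nlargest(5, range(len(row)), key=row.__getitem__)
        if label not in top5:
            misses += 1
    return misses / len(labels)


# Usage: with scores 0..9, the top five classes are 9, 8, 7, 6, 5,
# so label 5 counts as a hit and label 4 as a miss.
scores = [list(range(10)), list(range(10))]
print(top5_error(scores, [5, 4]))
```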

Speeding up Deep Learning

• Same model, better performance:

  • Hardware improvements

  • Algorithmic improvements

[Figure: Soumith's AlexNet benchmark, ms per iteration on CPU, GTX580, TitanX and neon (CPU bar labeled 15,000 ..., off the 0 to 600 ms scale)]

[Figure: Soumith's AlexNet benchmark, ms (0 to 500), neon vs. cuDNN at 4/2015, 8/2015 and 3/2016]

[Figure: Soumith's GoogleNet benchmark, ms (0 to 500), neon vs. cuDNN at 4/2015, 8/2015 and 3/2016]

Dennard scaling has ended

[Figure: Transistors, clock speed, power and performance per clock over time]

[Figure: Learning speed vs. number of processors. Industry standard: communication overhead = performance ceiling. Nervana: better communication fabric, near-linear scaling]
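The claim that communication overhead sets a performance ceiling can be illustrated with a deliberately simple toy model (an assumption for illustration, not Nervana's analysis): computation divides evenly across n processors, but each training step also pays a fixed communication time whenever n > 1. Speedup then approaches t_compute / t_comm however many processors are added, and it stays near-linear only while communication is small relative to per-processor compute. The function name `scaling_speedup` is hypothetical.

```python
def scaling_speedup(n, t_compute, t_comm):
    """Toy model of data-parallel speedup on n processors.

    Assumes compute splits perfectly across processors and every step
    pays a fixed communication cost t_comm when n > 1.
    """
    t_parallel = t_compute / n + (t_comm if n > 1 else 0.0)
    return t_compute / t_parallel


# Usage: slow fabric (t_comm=10) vs. fast fabric (t_comm=0.1), t_compute=100.
for n in (1, 8, 64, 512):
    slow = scaling_speedup(n, 100.0, 10.0)   # ceiling is 100 / 10 = 10x
    fast = scaling_speedup(n, 100.0, 0.1)    # ceiling is 100 / 0.1 = 1000x
    print(n, round(slow, 2), round(fast, 2))
```

Real communication costs usually grow with n and depend on the interconnect and the reduction algorithm, so this captures only the qualitative shape of the two curves.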

Nervana Engine (coming in 2017)

• Unprecedented computing power

• 10x speedup over current GPUs

• More memory on-chip

• High-Bandwidth Memory off-chip

• Six bidirectional high-bandwidth links for 3D torus interconnect

• 8 chips in a box, seamlessly scale to multiple chassis
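The slide does not detail how the chips are wired beyond the torus topology. As a generic illustration of that topology, each node in a 3D torus links to its two neighbors along each of the three axes, with wraparound at the edges, which is why six links per node suffice. A minimal sketch, assuming integer (x, y, z) coordinates and every dimension of size at least 3 (with size 2, the +1 and -1 neighbors coincide); `torus_neighbors` is a hypothetical helper.

```python
def torus_neighbors(node, dims):
    """Return the six neighbors of `node` in a 3D torus with the given dims.

    Each node links to the +1 and -1 positions along x, y and z,
    wrapping around at the edges (modulo the dimension size).
    """
    neighbors = []
    for axis in range(3):
        for step in (1, -1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


# Usage: a corner node in a 4 x 4 x 4 torus still has six distinct neighbors,
# because stepping -1 from coordinate 0 wraps around to coordinate 3.
print(torus_neighbors((0, 0, 0), (4, 4, 4)))
```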

Summary

• Deep learning is a new computational paradigm

• Learning and inference on data

• neon with state-of-the-art GPU kernels

• Nervana Cloud with multi-GPU training

• Watch for Nervana Engine deep learning processor
