Proprietary and confidential. Do not distribute.
Deep Learning at Scale
May 2016 Urs Köster, PhD
Nervana
MAKING MACHINES SMARTER.
About nervana
• A platform for machine intelligence
• Enable deep learning at scale
• Optimized from algorithms to silicon
The Nervana Platform - a full-stack solution
neon deep learning framework
nervana cloud
Solutions
Images
Text
Tabular
Speech
Time series
Video
neon: nervana python deep learning library
• User-friendly, extensible, fast
• Support for many deep learning models
• Interface to nervana cloud
• Multiple backends
  • nervana engine
  • GPU (optimized assembler kernels)
  • CPU cluster
Open source (Apache 2.0) on github.com/nervanaSystems/neon
Nervana Cloud
web interface
command line
Deep learning as a core technology
‘Google Brain’ model: DL at the core of Photos, Maps, Voice Search, Self-driving car, Ad Targeting, Machine Translation
Nervana Platform: DL at the core of Image Classification, Object Localization, Video Indexing, Speech Recognition, Natural Language
Video recognition with 3D convolution
[Figure: Training speed in epochs / hour (axis 0 to 1), neon vs. caffe]
Object Localization / Segmentation
CamVid dataset: SegNet model
KITTI dataset: Fast R-CNN model

Model                            neon (ms)   caffe (ms)   Speedup
Fast R-CNN (batch size=4)        360         670          1.8x
SegNet (batch size=4)            267         1455         5.4x
SegNet (4 GPUs, batch size=16)   348         --           *5.9x
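
The speedup column can be checked as the ratio of the caffe time to the neon time. The short sketch below recomputes it for the two rows that list both timings; the dictionary layout and the function name `speedup` are illustrative choices, not part of the slides. The 4-GPU row is left out because no caffe time is given for it.

```python
# Times in milliseconds, copied from the table above.
timings_ms = {
    "Fast R-CNN (batch size=4)": {"neon": 360, "caffe": 670},
    "SegNet (batch size=4)": {"neon": 267, "caffe": 1455},
}


def speedup(neon_ms, caffe_ms):
    """Return how many times faster neon is than caffe on the same workload."""
    return caffe_ms / neon_ms


# Recompute the speedup for every row that has both timings.
speedups = {
    name: speedup(t["neon"], t["caffe"]) for name, t in timings_ms.items()
}

for name, value in speedups.items():
    print(f"{name}: {value:.2f}x")
```

The Fast R-CNN ratio comes out at roughly 1.86, which the table lists as 1.8x, so the slide values appear to be truncated rather than rounded.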
Image Classification (Residual Network)
Speech to text
ImageNet ILSVRC Challenge
[Figure: Top-5 error rate (axis 0% to 30%) by year, 2010 to 2015. Labels: Deep learning, human performance, AlexNet, Clarifai, GoogleNet, ResNet]
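
For readers unfamiliar with the metric on this slide: a prediction counts as correct under top-5 scoring if the true label is among the five classes the model scores highest. The sketch below is a minimal pure-Python illustration. The function name `top5_error` and the tiny score table are made up for the example and are not taken from the challenge or from neon.

```python
def top5_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Hypothetical scores for 2 examples over 6 classes (illustrative only).
scores = [
    [0.10, 0.30, 0.05, 0.20, 0.15, 0.20],  # true class 2 has the lowest score
    [0.50, 0.10, 0.10, 0.10, 0.10, 0.10],  # true class 0 has the highest score
]
labels = [2, 0]

error = top5_error(scores, labels)
print(f"top-5 error: {error:.0%}")
```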
Speeding up Deep Learning
• Same model, better performance:
  • Hardware improvements
  • Algorithmic improvements
[Figure: AlexNet ms / iteration for CPU, GTX580, TitanX, and neon (axis 0 to 600 ms; one value is marked "15,000 ...")]
[Figure: Soumith's AlexNet Benchmark, ms (axis 0 to 500), neon vs. CuDNN at 4/2015, 8/2015, 3/2016]
[Figure: Soumith's GoogleNet Benchmark, ms (axis 0 to 500), neon vs. CuDNN at 4/2015, 8/2015, 3/2016]
Dennard scaling has ended
[Figure: Learning speed vs. number of processors. Industry standard: communication overhead = performance ceiling. Nervana: better communication fabric, near-linear scaling.]
[Figure: Trends in transistors, clock speed, power, and perf / clock]
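
The contrast between a performance ceiling and near-linear scaling can be illustrated with a toy model. The sketch below assumes that every additional processor adds a fixed fraction of communication cost. The formula and the two overhead values are hypothetical and chosen only for illustration; they are not measurements or claims from the slides.

```python
def speedup(n_processors, comm_overhead):
    """Toy scaling model (an assumption, not Nervana's data).

    comm_overhead is a hypothetical per-processor communication cost,
    expressed as a fraction of the compute time. As n_processors grows,
    the speedup approaches a ceiling of 1 / comm_overhead.
    """
    return n_processors / (1.0 + comm_overhead * (n_processors - 1))


for n in (1, 8, 64, 512):
    costly = speedup(n, comm_overhead=0.05)      # assumed: slow interconnect
    efficient = speedup(n, comm_overhead=0.001)  # assumed: fast fabric
    print(f"{n:4d} processors: {costly:6.1f}x vs. {efficient:6.1f}x")
```

With the larger overhead the curve flattens below 20x no matter how many processors are added, while the smaller overhead stays close to linear over the same range.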
Nervana Engine (coming in 2017)
• Unprecedented computing power
• 10x speedup over current GPUs
• More memory on-chip
• High-Bandwidth Memory off-chip
• Six bi-directional high-bandwidth links for 3D torus interconnect
• 8 chips in a box, seamlessly scale to multiple chassis
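
Six links per chip is what a 3D torus needs: each node connects to one neighbor in each direction along three axes, with wraparound at the edges. The sketch below shows that generic topology only. The grid size and the function name `torus_neighbors` are assumptions for illustration; the slides do not say how chips are mapped onto the torus.

```python
def torus_neighbors(coord, dims):
    """Return the six neighbors of a node in a 3D torus with wraparound."""
    x, y, z = coord
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),  # +x and -x links
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),  # +y and -y links
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),  # +z and -z links
    ]


dims = (4, 4, 4)  # hypothetical grid size, not taken from the slides
corner = torus_neighbors((0, 0, 0), dims)
print(corner)
```

Because of the wraparound, a node on the edge of the grid has the same number of links as one in the middle, so no node is a special case when traffic is routed.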
Summary
• Deep learning is a new computational paradigm
• Learning and Inference on data
• neon with state-of-the-art GPU kernels
• Nervana Cloud with multi-GPU training
• Watch for Nervana Engine deep learning processor