9/11/2016
Solutions Showcase: Machine Learning Workloads on Cray Systems
Mark Staveley
Machine Learning Research @ Cray
Machine Learning
Analytics
Artificial Intelligence
eResearch 2016 - (c) Cray Inc
[Figure: workload landscape from CPU focus to GPU focus. Labels: SQL; NoSQL, Streaming, Graph, Machine Learning, Some Deep Learning (CPU focus); MapD, SQream, Some Machine Learning, Deep Learning (GPU focus).]
Create code from data
Insight into data
Emulate human mind
Machine Learning Workflow
DL Example: Radiograph Classifier
Developer / Researcher:
1) Designs the neural net architecture
2) Training on the training images defines the linking weights
3) The trained model scores images
Image Results:
Bounding boxes
Classification scores for: tumors, broken bones, lesions, etc.
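The three steps above can be sketched in miniature. This is a toy stand-in, not the classifier on the slide. It assumes each image has already been reduced to a two-element feature vector (hypothetical data) and uses a single logistic neuron instead of a deep convolutional network. Choosing the model corresponds to step 1. Fitting its weights by gradient descent is step 2. Scoring new inputs with the frozen weights is step 3.

```python
import math
from typing import List, Sequence, Tuple

# Step 1 (architecture): one logistic neuron, score = sigmoid(w . x + b).
# This toy model is an assumption; a real radiograph classifier is a deep CNN.
Weights = Tuple[List[float], float]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def score(model: Weights, x: Sequence[float]) -> float:
    """Step 3 (scoring): apply frozen weights to one input."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(data: List[Tuple[Sequence[float], int]],
          epochs: int = 200, lr: float = 0.5) -> Weights:
    """Step 2 (training): full-batch gradient descent on the log loss
    sets the 'linking weights'."""
    n_features = len(data[0][0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in data:
            err = score((w, b), x) - y  # dLoss/dz for sigmoid + log loss
            for i in range(n_features):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * gi / len(data) for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

# Hypothetical toy features: label 1 = finding present, 0 = absent.
training_set = [((2.0, 2.0), 1), ((1.5, 2.5), 1),
                ((-2.0, -1.0), 0), ((-1.5, -2.0), 0)]
model = train(training_set)
```

`score` is kept apart from `train`, which mirrors the point that scoring is separate from training. Once fitted, the weights can be reused for inference without retraining.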
Important Points:
• Parallelism and local memory: GPUs
• Single / half-precision floats speed up learning without loss of accuracy
• All-to-all communication is required for scale and works well on the Aries interconnect
• Scoring (inference) is separate from training
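The following single-process simulation shows why data-parallel training needs all-to-all communication. It assumes each worker holds an equal-sized data shard and has already computed its local gradient. Every worker must see every other worker's gradient (an all-reduce) before the next weight update, so that all model replicas stay identical. Real systems perform this exchange over MPI or a similar library across the network. The function names `allreduce_average` and `sgd_step` are hypothetical.

```python
from typing import List

Vector = List[float]

def allreduce_average(local_grads: List[Vector]) -> List[Vector]:
    """Simulate an all-reduce (average) across workers in one process.

    Each worker receives the gradient of every worker, sums element-wise
    and divides by the worker count, so all workers hold the same result.
    """
    n_workers = len(local_grads)
    results: List[Vector] = []
    for rank in range(n_workers):
        # Worker `rank` gathers all gradients, including its own.
        received = [local_grads[src] for src in range(n_workers)]
        summed = [sum(column) for column in zip(*received)]
        results.append([s / n_workers for s in summed])
    return results

def sgd_step(weights: Vector, grad: Vector, lr: float) -> Vector:
    """Apply one plain SGD update with the averaged gradient."""
    return [w - lr * g for w, g in zip(weights, grad)]

# Hypothetical gradients from three workers for a two-parameter model.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
averaged = allreduce_average(grads)
new_weights = [sgd_step([0.0, 0.0], g, lr=0.5) for g in averaged]
```

Suppose the shards are equal-sized and each local gradient is the mean over its shard. Then the averaged gradient equals the full-batch gradient, which is why the exchange cannot be skipped. The volume exchanged per step grows with model size, and that is where the interconnect matters.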
Artificial Intelligence Pipeline
Why Cray?
Copyright 2016 Cray Inc.
HPC-type Scale and Problems
[Figure: the workload landscape (SQL, NoSQL, Streaming, Graph, Machine Learning, Some Deep Learning, MapD, SQream, Some Machine Learning, Deep Learning; CPU focus to GPU focus), annotated with "HPC-type Scale and Problems", "100+ Layer Neural Networks" and "Specific Scope and Use Cases", together with the labels Convergence, Data chains & Analytics workflows, Teams & Tools, and Platforms and Technologies.]
Landscape of Parallel Computing Research (Berkeley – 2006/2008):
• Map Reduce
• N-body methods
• Graph traversal
• Graphical models
• Dense and sparse linear algebra
• Spectral methods
• Structured and unstructured grids
• Combinational logic
• Dynamic programming
• Backtrack and branch-and-bound
• Finite-state machines
State of Big Data: Use Cases and Ogre Patterns (NIST 2014):
• Basic statistics – simple Map Reduce implementation
• Generalized n-body problems
• Graph-theoretic computations
• Linear algebraic computations
• Optimizations – e.g., linear programming
• Integration/machine learning
• Alignment problems – e.g., BLAST
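The first entry of the second list, basic statistics as a simple Map Reduce, can be made concrete with a minimal sketch. Each map task summarises its shard as a (count, mean, sum of squared deviations) triple using Welford's update. The reduce step merges triples with the standard pairwise combination formula, so the variance is computed in one pass without collecting the raw data. The shard contents and function names are illustrative.

```python
from functools import reduce
from typing import Iterable, Tuple

Summary = Tuple[int, float, float]  # (count, mean, M2 = sum of squared deviations)

def map_shard(values: Iterable[float]) -> Summary:
    """Map step: summarise one shard with Welford's online update."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return n, mean, m2

def merge(a: Summary, b: Summary) -> Summary:
    """Reduce step: combine two partial summaries (pairwise update)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2

# Illustrative data split across three "map tasks".
shards = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
n, mean, m2 = reduce(merge, map(map_shard, shards))
variance = m2 / n  # population variance
```

Because `merge` is associative, the reduce can run as a tree across nodes. This is the same communication shape as the gradient aggregation in data-parallel training.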
Components
Hardware
Data
OS Software
Management
Application Software
HW: Single Machine, Cloud, Cluster Cloud
Storage: NAS, HDFS
OS: Linux (CoreOS, CentOS, Ubuntu, RedHat)
Mgmt: Docker, Mesos, Kubernetes, Marathon, Fleet
Toolkits: CNTK, TensorFlow, MXNet, Caffe, Torch, Warp, DSSTNE
Where is Cray headed?
CS-Storm
Three Focus Areas
• Computation
• Storage
• Analytics
NVIDIA M40 24GB + CS-Storm
● Variant of CS-Storm designed for Machine Learning (ML)
● 8 x M40 24GB per machine (each with 3072 CUDA cores + 24 GB GPU memory)
● 512 GB – 1 TB of RAM
● Up to 6 SSDs
● Dual-rail InfiniBand (IB)
Key Features / Data Points
● Workloads have seen a 1.2 – 1.8x improvement
● Optimizations in CUDA not available with K40 or K80
● Building block for a deep learning compute solution
● Power and cooling integrity
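The per-machine figures above multiply out as follows. This is a small sketch that assumes all 8 GPUs in a machine are M40 24GB cards with the 3072 CUDA cores and 24 GB of memory quoted on the slide. The constant and function names are illustrative.

```python
from typing import Dict

GPUS_PER_MACHINE = 8
CUDA_CORES_PER_GPU = 3072   # NVIDIA M40, as quoted on the slide
GPU_MEMORY_GB_PER_GPU = 24  # M40 24GB variant

def machine_totals(machines: int = 1) -> Dict[str, int]:
    """Aggregate GPU count, CUDA cores and GPU memory for `machines` nodes."""
    gpus = machines * GPUS_PER_MACHINE
    return {
        "gpus": gpus,
        "cuda_cores": gpus * CUDA_CORES_PER_GPU,
        "gpu_memory_gb": gpus * GPU_MEMORY_GB_PER_GPU,
    }

print(machine_totals())  # {'gpus': 8, 'cuda_cores': 24576, 'gpu_memory_gb': 192}
```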
Artificial Intelligence Pipeline
Deep Learning @ Cray
Many supported frameworks, common challenges:
● Need for data
● Support for 3rd-party libraries (e.g. MKL-DNN and cuDNN)
● Highly scalable SGD codes are on the way, and some are already here:
  ● CNTK: 1-bit SGD and Block Momentum
  ● Distributed TensorFlow
  ● MXNet: MPI + OpenMP parallelism
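The idea behind 1-bit SGD can be sketched in a few lines. This is a simplified illustration, not CNTK's implementation, and it makes four assumptions:
● The gradient is a single flat vector.
● Each entry is encoded as one sign bit.
● Decoding uses the mean of the non-negative entries and the mean of the negative entries.
● The quantization error is carried into the next step (error feedback), so it is delayed rather than lost.

Only the bits plus two floats would need to cross the network. The function names are hypothetical.

```python
from typing import List, Tuple

def one_bit_quantize(grad: List[float], residual: List[float]
                     ) -> Tuple[List[bool], float, float, List[float]]:
    """Quantize grad (+ carried-over residual) to one bit per element.

    Returns (sign bits, value for True bits, value for False bits, new residual).
    """
    # Error feedback: add the quantization error left over from the last step.
    v = [g + r for g, r in zip(grad, residual)]
    bits = [x >= 0.0 for x in v]
    pos = [x for x, b in zip(v, bits) if b]
    neg = [x for x, b in zip(v, bits) if not b]
    # Reconstruction values: mean of the non-negative and of the negative entries.
    pos_val = sum(pos) / len(pos) if pos else 0.0
    neg_val = sum(neg) / len(neg) if neg else 0.0
    decoded = dequantize(bits, pos_val, neg_val)
    # Whatever the 1-bit code could not represent is kept for the next step.
    new_residual = [x - d for x, d in zip(v, decoded)]
    return bits, pos_val, neg_val, new_residual

def dequantize(bits: List[bool], pos_val: float, neg_val: float) -> List[float]:
    """Rebuild an approximate gradient from the sign bits and two floats."""
    return [pos_val if b else neg_val for b in bits]

grad = [0.5, -0.25, 0.75, -1.0]
bits, pos_val, neg_val, residual = one_bit_quantize(grad, [0.0] * len(grad))
decoded = dequantize(bits, pos_val, neg_val)
```

On the next minibatch, `residual` is passed back in. The error from this step is then added to the new gradient before quantization.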
Cray Deep Learning Research
https://blogs.technet.microsoft.com/inside_microsoft_research/2015/12/07/microsoft-computational-network-toolkit-offers-most-efficient-distributed-deep-learning-computational-performance/
CNTK Scaling on XC40s and Clusters
[Figure: CNTK 1-bit SGD FFN benchmark, XC (Aries) vs. Storm (IB). Average epoch time (y-axis, 0 to 30) against number of GPUs (x-axis: 1, 2, 4, 8, 16, 32). Series: Storm-K40 + default OpenMPI; Storm-K40 + tuned OpenMPI; XC-K40 + Cray MPICH defaults; XC-K40 + Cray MPICH tuned.]
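Scaling plots like this one are usually read through two derived quantities:
● Speedup: S(p) = T(1) / T(p)
● Parallel efficiency: E(p) = S(p) / p

Here T(p) is the average epoch time on p GPUs. The minimal helper below uses hypothetical timings, not values read from the chart.

```python
from typing import Dict, Tuple

def scaling_table(epoch_times: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Return {gpus: (speedup, efficiency)} relative to the 1-GPU epoch time."""
    t1 = epoch_times[1]
    table: Dict[int, Tuple[float, float]] = {}
    for gpus, t in sorted(epoch_times.items()):
        speedup = t1 / t
        table[gpus] = (speedup, speedup / gpus)
    return table

# Hypothetical average epoch times (seconds), NOT taken from the benchmark above.
times = {1: 20.0, 2: 10.0, 4: 5.5, 16: 2.5}
for gpus, (s, e) in scaling_table(times).items():
    print(f"{gpus:>2} GPUs: speedup {s:5.2f}, efficiency {e:4.0%}")
```

Efficiency is the quantity that exposes communication overhead. For example, it drops from 1.0 on 2 GPUs to 0.5 on 16 GPUs in these made-up numbers. Tuning the MPI stack aims to keep it high as the GPU count grows.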
Wrap Up
Summary
● Close relationship between HPC workloads and machine learning
● Machine learning is changing how we think about HPC (data movement, workload resiliency, etc.)
● Desire to make ML easy on Cray systems (choice of hardware and toolkits)
● Fake data / small data: can steer performance optimization toward the wrong targets
● Real data sets / large-scale workloads: expose challenges with libraries, implementations and hardware
● Engineering and research development across Cray hardware platforms and components
● Understanding and learning (different ML toolkits + data movement + network performance optimizations)
Thank You
Mark_Staveley
mstaveley@cray.com
Thursday – 10:30-10:50 – Industry State 1/2 – Scaling Out Deep Learning Workloads on Cray Systems