
9/11/2016


Solutions Showcase: Machine Learning Workloads on Cray Systems

Mark Staveley

Machine Learning Research @ Cray


Machine Learning

Analytics

Artificial Intelligence

eResearch 2016 - (c) Cray Inc

Figure: technology landscape, arranged from CPU focus to GPU focus. Labels: SQL, NoSQL, Streaming, Graph, Machine Learning, Some Deep Learning, MapD, SQream, Some Machine Learning, Deep Learning.


Create code from data

Insight Into Data

Emulate human mind


Machine Learning Workflow


DL Example: Radiograph Classifier

1) Developer / researcher designs the neural net architecture
2) Training on the training images defines the linking weights
3) Trained model scores images

Image Results:
• Bounding boxes
• Classification scores for tumors, broken bones, lesions, etc.

Important Points:
• Parallelism and local memory: GPUs
• Single / half-precision floats speed up learning without loss of accuracy
• All-to-all communication is required for scale and works well on Aries
• Scoring is separate from training
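The three workflow steps, and the point that scoring is separate from training, can be sketched in a few lines. This is only an illustration under loud assumptions: the "architecture" is a single logistic unit rather than a deep network, the two-feature synthetic points stand in for radiographs, and the names `train` and `score` are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(samples, labels, epochs=200, lr=0.5):
    """Step 2: training defines the linking weights (plain SGD on log loss)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss w.r.t. the pre-activation
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def score(model, x):
    """Step 3: the trained model scores an input; no weights are updated."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical stand-in for labeled training images: two features each.
random.seed(0)
positives = [[random.gauss(2.0, 0.5), random.gauss(2.0, 0.5)] for _ in range(50)]
negatives = [[random.gauss(-2.0, 0.5), random.gauss(-2.0, 0.5)] for _ in range(50)]
samples = positives + negatives
labels = [1] * 50 + [0] * 50

model = train(samples, labels)     # done once, offline
print(score(model, [2.0, 2.0]))    # close to 1
print(score(model, [-2.0, -2.0]))  # close to 0
```

Because `score` only reads the weights, the scoring side can run on different hardware from the training side, which is the separation the slide points out.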


Artificial Intelligence Pipeline


Why Cray?



Figure: technology landscape from CPU focus to GPU focus (SQL, NoSQL, Streaming, Graph, Machine Learning, Some Deep Learning, MapD, SQream, Some Machine Learning, Deep Learning), annotated with:
• HPC-type scale and problems
• 100+ layer neural networks
• Specific scope and use cases

Convergence:
• Data chains & analytics workflows
• Teams & tools
• Platforms and technologies

Copyright 2016 Cray Inc.


Landscape of Parallel Computing Research (Berkeley – 2006/2008):
• Map Reduce
• N-body methods
• Graph traversal
• Graphical models
• Dense and sparse linear algebra
• Spectral methods
• Structured and unstructured grids
• Combinational logic
• Dynamic programming
• Backtrack and branch-and-bound
• Finite-state machines

State of Big Data: Use Cases and Ogre Patterns (NIST 2014):
• Basic statistics – simple Map Reduce implementation
• Generalized n-body problems
• Graph-theoretic computations
• Linear algebraic computations
• Optimizations – e.g., linear programming
• Integration/machine learning
• Alignment problems – e.g., BLAST
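The "Basic statistics – simple Map Reduce implementation" pattern can be sketched as follows. This is a minimal single-process illustration, not code from either cited report: the function names are hypothetical, and each inner list stands in for a data partition that would live on a separate node.

```python
from functools import reduce


def map_chunk(chunk):
    """Map step: summarize one partition as (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))


def combine(a, b):
    """Reduce step: partial summaries add component-wise."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def mean_and_variance(chunks):
    """Mean and population variance computed from per-partition summaries."""
    n, total, total_sq = reduce(combine, map(map_chunk, chunks), (0, 0.0, 0.0))
    mean = total / n
    variance = total_sq / n - mean * mean
    return mean, variance


# Three hypothetical partitions of one data set.
chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
mean, variance = mean_and_variance(chunks)
print(mean, variance)
```

Only three numbers per partition cross the network, which is why this pattern scales easily. The sum-of-squares formula can lose precision on large values with small spread, so a production version would likely use a more stable merge of means and variances.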


Components


Hardware

Data

OS Software

Management

Application Software


HW: Single Machine, Cluster, Cloud
Storage: NAS, HDFS, Cloud
OS: Linux (CoreOS, CentOS, Ubuntu, RedHat)
Mgmt: Docker, Mesos, Kubernetes, Marathon, Fleet
Toolkits: CNTK, TensorFlow, MXNet, Caffe, Torch, Warp, DSSTNE


Where is Cray headed?


CS-Storm

Three Focus Areas

• Computation

• Storage

• Analytics

● NVIDIA M40 24GB + CS-Storm
  ● Variant of CS-Storm designed for Machine Learning (ML)
  ● 8 x M40 24GB / machine (3072 CUDA cores + 24 GB GPU memory per GPU)
  ● 512 GB – 1 TB of RAM
  ● Up to 6 SSDs
  ● Dual-rail IB

● Key Features / Data Points
  ● Workloads have seen a 1.2 – 1.8x improvement
  ● Optimizations in CUDA not available with K40 or K80
  ● Building block for Deep Learning compute solution
  ● Power and cooling integrity
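For sizing, the per-machine totals implied by the configuration above follow from simple arithmetic. The sketch below assumes the core count and memory figures are per GPU, as on the M40; the function and constant names are hypothetical.

```python
GPUS_PER_MACHINE = 8
CUDA_CORES_PER_GPU = 3072
GPU_MEMORY_GB_PER_GPU = 24


def machine_totals(gpus=GPUS_PER_MACHINE,
                   cores_per_gpu=CUDA_CORES_PER_GPU,
                   memory_gb_per_gpu=GPU_MEMORY_GB_PER_GPU):
    """Aggregate CUDA cores and GPU memory for one machine."""
    return {
        "cuda_cores": gpus * cores_per_gpu,
        "gpu_memory_gb": gpus * memory_gb_per_gpu,
    }


totals = machine_totals()
print(totals)  # {'cuda_cores': 24576, 'gpu_memory_gb': 192}

# Host RAM relative to total GPU memory, for the two ends of the RAM range.
for host_ram_gb in (512, 1024):
    ratio = host_ram_gb / totals["gpu_memory_gb"]
    print(f"{host_ram_gb} GB RAM is {ratio:.2f}x the GPU memory")
```

Even at the low end of the RAM range, host memory exceeds the combined GPU memory, which leaves room to stage training data on the host before it is copied to the GPUs.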



Deep Learning @ Cray

Many supported frameworks, common challenges

Need for Data

Support for 3rd Party Libraries (e.g. MKL-DNN & cuDNN)

Highly scalable SGD codes are on the way, and some are already here:

● CNTK 1-bit SGD and BlockMomentum

● TensorFlow distributed

● MXNet MPI+OpenMP parallelism
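The idea behind 1-bit SGD is to cut communication volume: each gradient value is sent as one sign bit plus a shared scale, and the quantization error is carried into the next step instead of being discarded. The sketch below is a simplified illustration of that idea, not CNTK's implementation. It assumes a single scale per vector (the mean absolute value), and the function names are hypothetical.

```python
import random


def decode(bits, scale):
    """Receiver side: rebuild an approximate gradient from bits and a scale."""
    return [scale if bit else -scale for bit in bits]


def one_bit_quantize(gradient, residual):
    """Sender side: quantize to sign bits, carrying the error forward."""
    corrected = [g + r for g, r in zip(gradient, residual)]
    scale = sum(abs(c) for c in corrected) / len(corrected)
    bits = [c >= 0 for c in corrected]  # one bit per gradient element
    sent = decode(bits, scale)
    new_residual = [c - s for c, s in zip(corrected, sent)]
    return bits, scale, new_residual


random.seed(1)
size = 4
residual = [0.0] * size
sum_true = [0.0] * size
sum_sent = [0.0] * size
for _ in range(100):
    gradient = [random.gauss(0.0, 1.0) for _ in range(size)]
    bits, scale, residual = one_bit_quantize(gradient, residual)
    sent = decode(bits, scale)
    sum_true = [a + g for a, g in zip(sum_true, gradient)]
    sum_sent = [a + s for a, s in zip(sum_sent, sent)]

# What was sent plus what is still owed equals the true gradient sum.
gap = max(abs(t - (s + r)) for t, s, r in zip(sum_true, sum_sent, residual))
print(gap)  # tiny, floating-point rounding only
```

The final check shows why error feedback matters: no gradient information is lost, it is only delayed, so the accumulated update stays close to what full-precision SGD would have applied.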


Cray Deep Learning Research


Three Focus Areas

• Computation

• Storage

• Analytics

https://blogs.technet.microsoft.com/inside_microsoft_research/2015/12/07/microsoft-computational-network-toolkit-offers-most-efficient-distributed-deep-learning-computational-performance/


CNTK Scaling on XC-40s and Clusters

Figure: CNTK 1-bit SGD FFN benchmark, XC (Aries) vs Storm (IB). Y-axis: average epoch time (0 to 30); x-axis: number of GPUs (1, 2, 4, 8, 16, 32). Series: Storm-K40 + default OpenMPI; Storm-K40 + tuned OpenMPI; XC-K40 Cray MPICH defaults; XC-K40 Cray MPICH tuned.
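A scaling chart of this kind is usually read in terms of speedup and parallel efficiency relative to the smallest GPU count. The sketch below shows that calculation. The epoch times in it are placeholders chosen for round numbers; they are not values from the benchmark, and the function name is hypothetical.

```python
def scaling_table(epoch_times):
    """Return (gpus, speedup, efficiency) rows from average epoch times.

    epoch_times maps GPU count -> average epoch time in any consistent unit.
    Speedup and efficiency are relative to the smallest GPU count present.
    """
    base_gpus = min(epoch_times)
    base_time = epoch_times[base_gpus]
    rows = []
    for gpus in sorted(epoch_times):
        speedup = base_time / epoch_times[gpus]
        efficiency = speedup / (gpus / base_gpus)
        rows.append((gpus, speedup, efficiency))
    return rows


# Placeholder values only; NOT read from the benchmark chart.
placeholder_times = {1: 32.0, 2: 16.0, 4: 10.0, 8: 8.0}
for gpus, speedup, efficiency in scaling_table(placeholder_times):
    print(f"{gpus:>2} GPUs  speedup {speedup:.2f}  efficiency {efficiency:.2f}")
```

Efficiency near 1.0 means added GPUs are fully used. A falling value suggests that communication or another serial cost is starting to dominate, which is where interconnect and MPI tuning would be expected to matter.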


Wrap Up


Summary

● Close Relationship between HPC workloads and Machine Learning

● Machine Learning is changing how we think about HPC (Data Movement, Workload Resiliency, etc.)

● Desire to make ML easy on Cray Systems (choice of HW and Toolkits)

● Fake Data / Small Data – negative influence on performance optimization targets

● Real Data Sets / Large Scale Workloads – challenges with libraries, implementations and HW

● Engineering and Research Development across Cray HW platforms & components

● Understanding and Learning (Different ML Toolkits + Data Movement + Network Performance Optimizations)


Thank You

Mark Staveley

mstaveley@cray.com

Thursday – 10:30-10:50 – Industry State 1/2 – Scaling Out Deep Learning Workloads on Cray Systems
