Construct a Sharable GPU Farm for Data Scientists
Layne Peng
DellEMC OCTO TRIGr
Agenda
• Introduction
• Problems of consuming GPUs
• GPU-as-a-Service (project Asaka)
• How to manage GPUs in a farm?
• How to schedule jobs to GPUs?
• How to integrate with a cloud platform?
• Next steps and contacts
Technology Research Innovation Group (TRIGr)
• Innovation
• Advanced Research
• Proof of Concept
• User Feedback
• Agile Roadmap
About me
Layne Peng, Principal Technologist at DellEMC OCTO, experienced in the cloud computing and SDDC areas, responsible for cloud-computing-related initiatives since joining in 2011; holder of 10+ patents and author of a big data and cloud computing book.
Email: [email protected]
@layne_peng
Wechat:
What is GPU & GPGPU?
• Graphics Processing Unit
• Designed for parallel processing:
  – Latency: increase clock speed
  – Throughput: increase concurrent task execution
• General-Purpose GPU:
  – Why: scientific computing and graphics processing => matrix & vector operations
  – Prerequisite: key features supported in the GPU, such as floating-point computing
  – Easy to use: CUDA => Theano, TensorFlow, etc.
  – Widely used in machine learning and deep learning
Problems of consuming GPU today
Today:
• Not widely deployed
• High cost
• Low availability when sharing
• Hard to monitor at fine granularity

Desired:
✓ Remote GPU as a Service
✓ Fine-grained resource control
✓ Sharable
✓ Fine-grained resource monitoring and tracing

Offering GPUs in the cloud makes it more complex:
• How to abstract the GPU resource?
  – Pass-through is the solution today
• Performance issues caused by the network
• Resource discovery and provisioning
Project Asaka – GPU-as-a-Service
Remote GPU Access
• Consume remote GPU resource transparently
• Based on queue model and various network fabric
GPU Sharing
• N:1 model: multiple users consume the same GPU accelerator
• Fine-grained control and management
GPU Chaining
• 1:N model: multiple GPU hosts chained as one to serve a single large application
• Dynamic GPU count fitted to the application
Smart Scheduling
• GPU pooling, discovery, and provisioning
• Fine-grained resource monitoring and tracing
• Pluggable intelligent scheduling algorithms:
  – Heterogeneous resources
  – Network tradeoffs
Overview of project Asaka
[Architecture diagram: CUDA applications (C) on the client link against the Asaka GPU Library, which exposes vGPUs and forwards CUDA API calls over a network fabric (TCP or RDMA) to Asaka Servers fronting the physical GPUs.]
• Provide transparent remote GPU resource access:
  – CUDA API interception
  – Intelligent selection of network fabric: TCP or RDMA
• Abstract the GPU resource as a vGPU:
  – From the client's view, a device identical to a GPU
  – A small management granule
• GPU farm:
  – GPU host management
  – Sharable resource pool
  – Chaining features
  – Intelligent scheduling
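The intercept-and-forward path can be sketched as a toy RPC loop: a client-side proxy stands in for the Asaka GPU Library, serializing each "CUDA" call and sending it over TCP to a stand-in server. All the names here (`VGPUProxy`, `FakeGPUServer`) and the pickle wire format are invented for illustration; the real data path intercepts the native CUDA APIs and can also ride RDMA.

```python
import pickle
import socket
import struct
import threading

def send_msg(sock, obj):
    """Length-prefixed pickle framing over a stream socket."""
    data = pickle.dumps(obj)
    sock.sendall(struct.pack(">I", len(data)) + data)

def recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed")
        buf += chunk
    return buf

def recv_msg(sock):
    size = struct.unpack(">I", recv_exact(sock, 4))[0]
    return pickle.loads(recv_exact(sock, size))

class FakeGPUServer(threading.Thread):
    """Stands in for an Asaka Server: executes forwarded 'CUDA' calls."""
    HANDLERS = {
        "cudaMalloc": lambda size: f"devptr:{size}",  # pretend allocation
        "cudaGetDeviceCount": lambda: 1,
    }

    def __init__(self):
        super().__init__(daemon=True)
        self.listener = socket.socket()
        self.listener.bind(("127.0.0.1", 0))
        self.listener.listen(1)
        self.port = self.listener.getsockname()[1]

    def run(self):
        conn, _ = self.listener.accept()
        with conn:
            while True:
                try:
                    fn, args = recv_msg(conn)
                except ConnectionError:
                    return
                send_msg(conn, self.HANDLERS[fn](*args))

class VGPUProxy:
    """Client side: conceptually what the Asaka GPU Library does --
    intercept an API call and forward it to the remote server."""
    def __init__(self, host, port):
        self.sock = socket.create_connection((host, port))

    def call(self, fn, *args):
        send_msg(self.sock, (fn, args))
        return recv_msg(self.sock)

server = FakeGPUServer()
server.start()
gpu = VGPUProxy("127.0.0.1", server.port)
print(gpu.call("cudaGetDeviceCount"))  # 1
print(gpu.call("cudaMalloc", 1024))    # devptr:1024
```

To the application, `gpu.call` looks local; whether the result came over TCP or RDMA is the library's concern, which is the point of the transparency claim above.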
How to manage GPU in farm?
[Diagram: each Asaka Server (CUDA APIs) runs a Controller Agent; a cluster of Controller Managers tracks membership. Actions: (a) request to add to the GPU farm; (b) request/force to leave the GPU farm; (c) failure detected and the failed node expelled.]
• Manage the GPU farm:
  – Consul-based
  – Basic cluster features:
    › Add new nodes
    › Remove nodes
    › Detect failures
    › Expel failed nodes and resurrect recovered ones
• Key takeaways:
  – Abstract the hardware resource at a high enough level
  – Learn from microservice architecture (MSA) experience

Note:
• Controller Manager: an executable binary based on Consul Server with added features; it can also be deployed on a node with GPUs.
• Controller Agent: an extension of the Consul Agent with health-check scripts and special configuration.
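Conceptually, the membership lifecycle above (join, leave, failure detection, expulsion, resurrection) behaves like the toy registry below. The real farm delegates this to Consul's gossip protocol and health checks; `FarmRegistry` and its heartbeat-timeout heuristic are invented purely to illustrate the concept.

```python
import time

class FarmRegistry:
    """Toy membership table mirroring the farm actions in the slide:
    (a) join, (b) leave, (c) detect failure and expel, plus resurrect.
    Consul does this with gossip + health checks; this is the idea only."""

    def __init__(self, timeout=3.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.members = {}        # node -> last heartbeat time
        self.expelled = set()

    def join(self, node):        # (a) request to add to the GPU farm
        self.members[node] = self.clock()
        self.expelled.discard(node)

    def leave(self, node):       # (b) request/force to leave the farm
        self.members.pop(node, None)

    def heartbeat(self, node):
        if node in self.expelled:    # resurrect a recovered node
            self.join(node)
        elif node in self.members:
            self.members[node] = self.clock()

    def sweep(self):             # (c) detect failures and expel nodes
        now = self.clock()
        for node, last in list(self.members.items()):
            if now - last > self.timeout:
                del self.members[node]
                self.expelled.add(node)
        return sorted(self.expelled)

# Deterministic demo with a fake clock:
t = [0.0]
reg = FarmRegistry(timeout=3.0, clock=lambda: t[0])
reg.join("gpu-1"); reg.join("gpu-2")
t[0] = 2.0; reg.heartbeat("gpu-1")   # gpu-2 stays silent
t[0] = 4.5
print(reg.sweep())                   # ['gpu-2'] -- expelled for silence
```

A later `heartbeat("gpu-2")` would resurrect the node, matching the expel/resurrect pair in the cluster features list.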
How to select GPU hosts to chain?
[Diagram: the XaaS_Controller's Global Scheduler, through its Allocation API, (1) selects nodes according to capacity and network radius, then (2) chains GPUs from the selected hosts to serve as one big machine with nine vGPUs.]
• GPUs can be chained and exposed to the application as one large machine
• GPU hosts are selected for chaining by:
  – Capacity
  – Queue length (crowdedness)
  – Network radius:
    › Hops
    › Bandwidth
• Key takeaway:
  – The network is the first-class citizen when deciding which GPUs serve an application, both for remote access and for chaining servers
[Diagram, continued: (3) the vGPUs are offered to the client's CUDA application through the Asaka GPU Library.]
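A greedy version of that selection (filter by network radius, prefer uncrowded high-bandwidth hosts, accumulate capacity until the request is met) might look like the sketch below. The field names, weights, and sort order are assumptions for illustration, not Asaka's actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    free_gpus: int        # capacity
    queue_len: int        # crowdedness
    hops: int             # network radius from the client
    bandwidth_gbps: float

def chain_hosts(hosts, gpus_needed, max_hops=2):
    """Greedy sketch of chaining selection: keep hosts within the
    network radius, order by queue length then bandwidth then hops,
    and take hosts until enough vGPUs are gathered."""
    nearby = [h for h in hosts if h.hops <= max_hops and h.free_gpus > 0]
    nearby.sort(key=lambda h: (h.queue_len, -h.bandwidth_gbps, h.hops))
    chain, got = [], 0
    for h in nearby:
        chain.append(h.name)
        got += h.free_gpus
        if got >= gpus_needed:
            return chain
    raise RuntimeError("not enough GPU capacity within the network radius")

farm = [
    Host("a", free_gpus=4, queue_len=0, hops=1, bandwidth_gbps=100),
    Host("b", free_gpus=4, queue_len=5, hops=1, bandwidth_gbps=100),
    Host("c", free_gpus=8, queue_len=0, hops=4, bandwidth_gbps=100),  # too far
    Host("d", free_gpus=4, queue_len=0, hops=2, bandwidth_gbps=25),
]
print(chain_hosts(farm, gpus_needed=8))   # ['a', 'd']
```

Note how host "c" loses despite having the most free GPUs: the network radius filter runs first, which is exactly the "network is the first-class citizen" takeaway above.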
How to schedule jobs to GPU hosts?
[Diagram: a frontend receives requests from the client application via the Asaka GPU Library; a Smart Dispatcher, backed by a Distributed Global State Store and Intelligent Analysis, feeds per-server task queues, while a Host Dispatcher applies pluggable strategies (Network, Q-Length, H-Res, …).]
• Two-level scheduling:
  – 1st level: find GPU hosts in the farm
  – 2nd level: find devices for CUDA requests
• Global smart scheduling:
  – Global status monitoring and tracing
  – Network topology and analysis
  – Queue-based priority
  – Pluggable scheduling strategies
• Key takeaway:
  – Think about resource scheduling from a global view
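The pluggable-strategy idea in the first scheduling level can be sketched as a strategy interface whose scores a dispatcher sums. The class names echo the diagram's Q-Length and Network strategies, but the scoring functions themselves are invented for illustration.

```python
class Strategy:
    """Pluggable scoring strategy; higher score means a better host."""
    def score(self, host):
        raise NotImplementedError

class QLengthStrategy(Strategy):
    def score(self, host):
        return -host["queue_len"]     # shorter queue is better

class NetworkStrategy(Strategy):
    def score(self, host):
        return -host["hops"]          # nearer is better

class Dispatcher:
    """First level of the two-level scheduler: pick a GPU host for a
    job by summing the plugged-in strategies' scores."""
    def __init__(self, strategies):
        self.strategies = strategies

    def pick_host(self, hosts):
        return max(hosts,
                   key=lambda h: sum(s.score(h) for s in self.strategies))

hosts = [
    {"name": "gpu-1", "queue_len": 3, "hops": 1},
    {"name": "gpu-2", "queue_len": 0, "hops": 2},
]
d = Dispatcher([QLengthStrategy(), NetworkStrategy()])
print(d.pick_host(hosts)["name"])   # gpu-2
```

Swapping the strategy list changes the placement decision without touching the dispatcher, which is what makes the scheduling "pluggable".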
Leverage ML to solve heterogeneous scheduling?
[Diagram: training phase and inference phase. The XaaS_Controller's Global Scheduler exposes a Scheduler API to pluggable schedulers (Colocation_Scheduler, External_Scheduler); the Cluster Manager syncs node attributes and status. (1) Job requests arrive; (2) allocation strategies are produced, learned from the trained model; (3) the job executes on the servers.]
• Heterogeneous scheduling:
  – Heterogeneous hardware resources
  – Heterogeneous network fabrics
  – An NP-hard problem:
    › Greedy algorithm
    › Or …?
• Pluggable scheduler design:
  – Abstract scheduler API
  – Machine learning jobs have workload patterns
  – Metadata matters for scheduling
• Use machine learning to solve the scheduling problem of machine learning
• Status: ongoing
How to integrate with cloud platform?
[Diagram: the XaaS_Controller's Global Scheduler and Cluster Manager sync node attributes and status. (1) Job requests arrive; (2) allocation strategies are returned through the Allocation API; (3) the job executes on the servers; (4) the application transparently consumes GPU as a service.]
• Offer allocation strategies as a service:
  – High-level abstraction of allocation/scheduling functions
  – Non-invasive design, easy to integrate with the existing schedulers used in cloud-native platforms
• Applications deployed in a cloud-native platform access GPUs transparently
• Easy to integrate with cloud-native platforms:
  – Mesos/Marathon
  – Kubernetes
  – Cloud Foundry (demoed at DellEMC World 2017)
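One way to picture "allocation strategies as a service" is a small HTTP endpoint that a platform's own scheduler can POST a job spec to and get a placement back, leaving the platform's scheduler untouched. The route, payload shape, and the most-free-GPUs strategy below are all hypothetical, standing in for whatever the real Allocation API exposes.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Toy stand-in for the farm state the real XaaS_Controller would track.
FARM = {"gpu-1": {"free_gpus": 2}, "gpu-2": {"free_gpus": 6}}

class AllocationAPI(BaseHTTPRequestHandler):
    """Hypothetical REST shape: POST a job spec, receive a placement."""
    def do_POST(self):
        job = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        # Trivial strategy: pick the host with the most free GPUs.
        host = max(FARM, key=lambda h: FARM[h]["free_gpus"])
        body = json.dumps({"job": job["name"], "host": host}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):   # keep the sketch quiet
        pass

server = HTTPServer(("127.0.0.1", 0), AllocationAPI)
threading.Thread(target=server.serve_forever, daemon=True).start()

req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/allocate",
    data=json.dumps({"name": "train-job"}).encode(),
    headers={"Content-Type": "application/json"},
)
resp = json.loads(urllib.request.urlopen(req).read())
print(resp)   # {'job': 'train-job', 'host': 'gpu-2'}
```

Because the contract is just "job spec in, placement out", an existing Mesos, Kubernetes, or Cloud Foundry scheduler can call it without internal changes, which is the non-invasive design point above.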
Example: Cloud Foundry Integration
• Cloud Foundry acts as a friendly machine learning front end to project Asaka:
  – Receives jobs from users
  – Manages the cluster
  – Starts the containers for jobs
• With the Cloud Foundry offering:
  – The machine learning platform doesn't need to consider environment configuration or hardware acceleration
  – "Run my app with GPU! I don't care how!"
• Reach out:
  – Victor Fong from the DellEMC Dojo
    › [email protected]
    › @victorfong
[Diagram: data scientist users work through a CLI or GUI against Cloud Foundry (Cloud Controller, Diego Brain, Loggregator, Marketplace, Diego Cells, Router), which fronts the XaaS_Controller's Global Scheduler and its Allocation API over the GPU farm of Asaka Servers and Controller Agents. (1) Deploy a machine learning task/application with the CLI; (2) provision a GPU service for the application according to the XaaS_Controller's allocation strategies and bind it to the application; (3) users consume the task or application.]
Next step and contacts
• Intelligent scheduling:
  – Schedule according to application metadata
  – Model and calculate network distance
  – Introduce machine learning to solve heterogeneous resource scheduling
  – Keep improving Asaka performance
• Intelligent data movement and loading:
  – Enterprise features such as snapshots, etc.
• XaaS:
  – More hardware resources as a service, such as FPGAaaS
  – Globally maximized performance
• Reach out to the team:
  – Jack Harwood: [email protected] (Distinguished Engineer at DellEMC)
  – Frank Zhao: [email protected] (Consultant Engineer, leads the data path)
  – Layne Peng: [email protected]