Construct a Sharable GPU Farm for Data Scientists
Layne Peng
DellEMC OCTO TRIGr
Agenda
• Introduction
• Problems of consuming GPUs
• GPU-as-a-Service (project Asaka)
• How to manage GPUs in a farm?
• How to schedule jobs to GPUs?
• How to integrate with a cloud platform?
• Next steps and contacts
Technology Research Innovation Group (TRIGr)
• Innovation
• Advanced Research
• Proof of Concept
• User Feedback
• Agile Roadmap
About me
Layne Peng, Principal Technologist at DellEMC OCTO, experienced in the cloud computing and SDDC areas, responsible for cloud-computing-related initiatives since joining in 2011; holder of 10+ patents and author of a big data and cloud computing book.
Email: [email protected]
@layne_peng
Wechat:
What is GPU & GPGPU?
• Graphics Processing Unit
• Designed for parallel processing:
  – Latency: increase clock speed
  – Throughput: increase concurrent task execution
• General-Purpose GPU:
  – Why: scientific computing and graphics processing => matrix & vector operations
  – Prerequisite: key features supported in the GPU, such as floating-point computing
  – Easy to use: CUDA => Theano, TensorFlow, etc.
  – Widely used in machine learning and deep learning
Problems of consuming GPU today
Today:
• Not widely deployed
• High cost
• Low availability when sharing
• Hard to monitor at fine granularity

Desired:
✓ Remote GPU as a Service
✓ Fine-grained resource control
✓ Sharable
✓ Fine-grained resource monitoring and tracing

Offering GPUs in the cloud makes it more complex:
• How to abstract the GPU resource?
  – Pass-through is the solution today
• Performance issues caused by the network
• Resource discovery and provisioning
Project Asaka – GPU-as-a-Service
Remote GPU Access
• Consume remote GPU resource transparently
• Based on queue model and various network fabric
GPU Sharing
• N:1 model: multiple users consume the same GPU accelerator
• Fine-grained control and management
GPU Chaining
• 1:N model: multiple GPU hosts chained as one to serve a single large application
• Dynamic GPU count fitted to the application
Smart Scheduling
• GPU pooling, discovery, and provisioning
• Fine-grained resource monitoring and tracing
• Pluggable intelligent scheduling algorithms:
  – Heterogeneous resources
  – Network tradeoffs
Overview of project Asaka
[Architecture diagram: CUDA applications (C) on the client link against the Asaka GPU Library, which exposes vGPUs and forwards CUDA API calls over a network fabric (TCP or RDMA) to Asaka Servers fronting the physical GPUs.]
• Provide transparent remote GPU resource access:
  – CUDA API interception
  – Intelligent selection of network fabric: TCP or RDMA
• Abstract the GPU resource as a vGPU:
  – From the client's view, a device identical to a GPU
  – A small management granule
• GPU farm:
  – GPU host management
  – Sharable resource pool
  – Chaining features
  – Intelligent scheduling
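The intercept-and-forward path can be sketched as a toy RPC loop: a client-side proxy stands in for the Asaka GPU Library, serializing each "CUDA" call and sending it over TCP to a stand-in server. All the names here (`VGPUProxy`, `FakeGPUServer`) and the pickle wire format are invented for illustration; the real data path intercepts the native CUDA APIs and can also ride RDMA.

```python
import pickle
import socket
import struct
import threading

def send_msg(sock, obj):
    """Length-prefixed pickle framing over a stream socket."""
    data = pickle.dumps(obj)
    sock.sendall(struct.pack(">I", len(data)) + data)

def recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed")
        buf += chunk
    return buf

def recv_msg(sock):
    size = struct.unpack(">I", recv_exact(sock, 4))[0]
    return pickle.loads(recv_exact(sock, size))

class FakeGPUServer(threading.Thread):
    """Stands in for an Asaka Server: executes forwarded 'CUDA' calls."""
    HANDLERS = {
        "cudaMalloc": lambda size: f"devptr:{size}",  # pretend allocation
        "cudaGetDeviceCount": lambda: 1,
    }

    def __init__(self):
        super().__init__(daemon=True)
        self.listener = socket.socket()
        self.listener.bind(("127.0.0.1", 0))
        self.listener.listen(1)
        self.port = self.listener.getsockname()[1]

    def run(self):
        conn, _ = self.listener.accept()
        with conn:
            while True:
                try:
                    fn, args = recv_msg(conn)
                except ConnectionError:
                    return
                send_msg(conn, self.HANDLERS[fn](*args))

class VGPUProxy:
    """Client side: conceptually what the Asaka GPU Library does --
    intercept an API call and forward it to the remote server."""
    def __init__(self, host, port):
        self.sock = socket.create_connection((host, port))

    def call(self, fn, *args):
        send_msg(self.sock, (fn, args))
        return recv_msg(self.sock)

server = FakeGPUServer()
server.start()
gpu = VGPUProxy("127.0.0.1", server.port)
print(gpu.call("cudaGetDeviceCount"))  # 1
print(gpu.call("cudaMalloc", 1024))    # devptr:1024
```

To the application, `gpu.call` looks local; whether the result came over TCP or RDMA is the library's concern, which is the point of the transparency claim above.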
How to manage GPU in farm?
[Diagram: each Asaka Server (CUDA APIs) runs a Controller Agent; a cluster of Controller Managers tracks membership. Actions: (a) request to add to the GPU farm; (b) request/force to leave the GPU farm; (c) failure detected and the failed node expelled.]
• Manage the GPU farm:
  – Consul-based
  – Basic cluster features:
    › Add new nodes
    › Remove nodes
    › Detect failures
    › Expel failed nodes and resurrect recovered ones
• Key takeaways:
  – Abstract the hardware resource at a high enough level
  – Learn from microservice architecture (MSA) experience

Note:
• Controller Manager: an executable binary based on Consul Server with added features; it can also be deployed on a node with GPUs.
• Controller Agent: an extension of the Consul Agent with health-check scripts and special configuration.
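Conceptually, the membership lifecycle above (join, leave, failure detection, expulsion, resurrection) behaves like the toy registry below. The real farm delegates this to Consul's gossip protocol and health checks; `FarmRegistry` and its heartbeat-timeout heuristic are invented purely to illustrate the concept.

```python
import time

class FarmRegistry:
    """Toy membership table mirroring the farm actions in the slide:
    (a) join, (b) leave, (c) detect failure and expel, plus resurrect.
    Consul does this with gossip + health checks; this is the idea only."""

    def __init__(self, timeout=3.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.members = {}        # node -> last heartbeat time
        self.expelled = set()

    def join(self, node):        # (a) request to add to the GPU farm
        self.members[node] = self.clock()
        self.expelled.discard(node)

    def leave(self, node):       # (b) request/force to leave the farm
        self.members.pop(node, None)

    def heartbeat(self, node):
        if node in self.expelled:    # resurrect a recovered node
            self.join(node)
        elif node in self.members:
            self.members[node] = self.clock()

    def sweep(self):             # (c) detect failures and expel nodes
        now = self.clock()
        for node, last in list(self.members.items()):
            if now - last > self.timeout:
                del self.members[node]
                self.expelled.add(node)
        return sorted(self.expelled)

# Deterministic demo with a fake clock:
t = [0.0]
reg = FarmRegistry(timeout=3.0, clock=lambda: t[0])
reg.join("gpu-1"); reg.join("gpu-2")
t[0] = 2.0; reg.heartbeat("gpu-1")   # gpu-2 stays silent
t[0] = 4.5
print(reg.sweep())                   # ['gpu-2'] -- expelled for silence
```

A later `heartbeat("gpu-2")` would resurrect the node, matching the expel/resurrect pair in the cluster features list.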
How to select GPU hosts to chain?
[Diagram: the XaaS_Controller's Global Scheduler, through its Allocation API, (1) selects nodes according to capacity and network radius, then (2) chains GPUs from the selected hosts to serve as one big machine with nine vGPUs.]
• GPUs can be chained and exposed to the application as one large machine
• GPU hosts are selected for chaining by:
  – Capacity
  – Queue length (crowdedness)
  – Network radius:
    › Hops
    › Bandwidth
• Key takeaway:
  – The network is the first-class citizen when deciding which GPUs serve an application, both for remote access and for chaining servers
[Diagram, continued: (3) the vGPUs are offered to the client's CUDA application through the Asaka GPU Library.]
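A greedy version of that selection (filter by network radius, prefer uncrowded high-bandwidth hosts, accumulate capacity until the request is met) might look like the sketch below. The field names, weights, and sort order are assumptions for illustration, not Asaka's actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    free_gpus: int        # capacity
    queue_len: int        # crowdedness
    hops: int             # network radius from the client
    bandwidth_gbps: float

def chain_hosts(hosts, gpus_needed, max_hops=2):
    """Greedy sketch of chaining selection: keep hosts within the
    network radius, order by queue length then bandwidth then hops,
    and take hosts until enough vGPUs are gathered."""
    nearby = [h for h in hosts if h.hops <= max_hops and h.free_gpus > 0]
    nearby.sort(key=lambda h: (h.queue_len, -h.bandwidth_gbps, h.hops))
    chain, got = [], 0
    for h in nearby:
        chain.append(h.name)
        got += h.free_gpus
        if got >= gpus_needed:
            return chain
    raise RuntimeError("not enough GPU capacity within the network radius")

farm = [
    Host("a", free_gpus=4, queue_len=0, hops=1, bandwidth_gbps=100),
    Host("b", free_gpus=4, queue_len=5, hops=1, bandwidth_gbps=100),
    Host("c", free_gpus=8, queue_len=0, hops=4, bandwidth_gbps=100),  # too far
    Host("d", free_gpus=4, queue_len=0, hops=2, bandwidth_gbps=25),
]
print(chain_hosts(farm, gpus_needed=8))   # ['a', 'd']
```

Note how host "c" loses despite having the most free GPUs: the network radius filter runs first, which is exactly the "network is the first-class citizen" takeaway above.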
How to schedule jobs to GPU hosts?
[Diagram: a frontend receives requests from the client application via the Asaka GPU Library; a Smart Dispatcher, backed by a Distributed Global State Store and Intelligent Analysis, feeds per-server task queues, while a Host Dispatcher applies pluggable strategies (Network, Q-Length, H-Res, …).]
• Two-level scheduling:
  – 1st level: find GPU hosts in the farm
  – 2nd level: find devices for CUDA requests
• Global smart scheduling:
  – Global status monitoring and tracing
  – Network topology and analysis
  – Queue-based priority
  – Pluggable scheduling strategies
• Key takeaway:
  – Think about resource scheduling from a global view
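The pluggable-strategy idea in the first scheduling level can be sketched as a strategy interface whose scores a dispatcher sums. The class names echo the diagram's Q-Length and Network strategies, but the scoring functions themselves are invented for illustration.

```python
class Strategy:
    """Pluggable scoring strategy; higher score means a better host."""
    def score(self, host):
        raise NotImplementedError

class QLengthStrategy(Strategy):
    def score(self, host):
        return -host["queue_len"]     # shorter queue is better

class NetworkStrategy(Strategy):
    def score(self, host):
        return -host["hops"]          # nearer is better

class Dispatcher:
    """First level of the two-level scheduler: pick a GPU host for a
    job by summing the plugged-in strategies' scores."""
    def __init__(self, strategies):
        self.strategies = strategies

    def pick_host(self, hosts):
        return max(hosts,
                   key=lambda h: sum(s.score(h) for s in self.strategies))

hosts = [
    {"name": "gpu-1", "queue_len": 3, "hops": 1},
    {"name": "gpu-2", "queue_len": 0, "hops": 2},
]
d = Dispatcher([QLengthStrategy(), NetworkStrategy()])
print(d.pick_host(hosts)["name"])   # gpu-2
```

Swapping the strategy list changes the placement decision without touching the dispatcher, which is what makes the scheduling "pluggable".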
Leverage ML to solve heterogeneous scheduling?
[Diagram: training phase and inference phase. The XaaS_Controller's Global Scheduler exposes a Scheduler API to pluggable schedulers (Colocation_Scheduler, External_Scheduler); the Cluster Manager syncs node attributes and status. (1) Job requests arrive; (2) allocation strategies are produced, learned from the trained model; (3) the job executes on the servers.]
• Heterogeneous scheduling:
  – Heterogeneous hardware resources
  – Heterogeneous network fabrics
  – An NP-hard problem:
    › Greedy algorithm
    › Or …?
• Pluggable scheduler design:
  – Abstract scheduler API
  – Machine learning jobs have workload patterns
  – Metadata matters for scheduling
• Use machine learning to solve the scheduling problem of machine learning
• Status: ongoing
How to integrate with cloud platform?
[Diagram: the XaaS_Controller's Global Scheduler and Cluster Manager sync node attributes and status. (1) Job requests arrive; (2) allocation strategies are returned through the Allocation API; (3) the job executes on the servers; (4) the application transparently consumes GPU as a service.]
• Offer allocation strategies as a service:
  – High-level abstraction of allocation/scheduling functions
  – Non-invasive design, easy to integrate with the existing schedulers used in cloud-native platforms
• Applications deployed in a cloud-native platform access GPUs transparently
• Easy to integrate with cloud-native platforms:
  – Mesos/Marathon
  – Kubernetes
  – Cloud Foundry (demoed at DellEMC World 2017)
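One way to picture "allocation strategies as a service" is a small HTTP endpoint that a platform's own scheduler can POST a job spec to and get a placement back, leaving the platform's scheduler untouched. The route, payload shape, and the most-free-GPUs strategy below are all hypothetical, standing in for whatever the real Allocation API exposes.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Toy stand-in for the farm state the real XaaS_Controller would track.
FARM = {"gpu-1": {"free_gpus": 2}, "gpu-2": {"free_gpus": 6}}

class AllocationAPI(BaseHTTPRequestHandler):
    """Hypothetical REST shape: POST a job spec, receive a placement."""
    def do_POST(self):
        job = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        # Trivial strategy: pick the host with the most free GPUs.
        host = max(FARM, key=lambda h: FARM[h]["free_gpus"])
        body = json.dumps({"job": job["name"], "host": host}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):   # keep the sketch quiet
        pass

server = HTTPServer(("127.0.0.1", 0), AllocationAPI)
threading.Thread(target=server.serve_forever, daemon=True).start()

req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/allocate",
    data=json.dumps({"name": "train-job"}).encode(),
    headers={"Content-Type": "application/json"},
)
resp = json.loads(urllib.request.urlopen(req).read())
print(resp)   # {'job': 'train-job', 'host': 'gpu-2'}
```

Because the contract is just "job spec in, placement out", an existing Mesos, Kubernetes, or Cloud Foundry scheduler can call it without internal changes, which is the non-invasive design point above.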
Example: Cloud Foundry Integration
• Cloud Foundry acts as a friendly machine learning front end to project Asaka:
  – Receives jobs from users
  – Manages the cluster
  – Starts the containers for jobs
• With the Cloud Foundry offering:
  – The machine learning platform doesn't need to consider environment configuration or hardware acceleration
  – "Run my app with GPU! I don't care how!"
• Reach out:
  – Victor Fong from the DellEMC Dojo
    › [email protected]
    › @victorfong
[Diagram: data scientist users work through a CLI or GUI against Cloud Foundry (Cloud Controller, Diego Brain, Loggregator, Marketplace, Diego Cells, Router), which fronts the XaaS_Controller's Global Scheduler and its Allocation API over the GPU farm of Asaka Servers and Controller Agents. (1) Deploy a machine learning task/application with the CLI; (2) provision a GPU service for the application according to the XaaS_Controller's allocation strategies and bind it to the application; (3) users consume the task or application.]
Next step and contacts
• Intelligent scheduling:
  – Schedule according to application metadata
  – Model and calculate network distance
  – Introduce machine learning to solve heterogeneous resource scheduling
  – Keep improving Asaka performance
• Intelligent data movement and loading:
  – Enterprise features such as snapshots, etc.
• XaaS:
  – More hardware resources as a service, such as FPGAaaS
  – Globally maximized performance
• Reach out to the team:
  – Jack Harwood: [email protected] (Distinguished Engineer at DellEMC)
  – Frank Zhao: [email protected] (Consultant Engineer, leads the data path)
  – Layne Peng: [email protected]