rCUDA: from virtual machines to hybrid CPU-GPU clusters



rCUDA: from virtual machines to hybrid CPU-GPU clusters

HPC ADMINTECH 2018

Federico Silla

Universitat Politècnica de València

What is rCUDA?

Outline

F. Silla @ HPC ADMINTECH 2018

Basics of GPU computing: basic behavior of CUDA

Remark: GPUs can only be used within the node they are attached to.

[Figure: a node with its local GPU, and a node with no GPU]

A different approach: remote GPU virtualization


A software technology that enables a more flexible use of GPUs in computing facilities

rCUDA is a development by Universitat Politècnica de València

rCUDA … remote CUDA

[Figure: a node with no GPU]

Access to remote GPUs is transparent to applications: no source code modification is needed.
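How can a remote GPU be transparent to an unmodified application? The general idea behind API remoting is that the application links against a library exposing the usual CUDA calls, but whose implementation forwards each call to a server process that owns the GPU. A minimal sketch of the client side of one such call, serializing a host-to-device copy into a message. The opcodes, function names and message layout are hypothetical, not rCUDA's actual protocol; the fixed little-endian encoding is one way to let clients and servers with different CPU architectures interoperate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcodes, for illustration only. */
enum remote_op { OP_MALLOC = 1, OP_MEMCPY_H2D = 2, OP_FREE = 3 };

/* Store the low nbytes of v in little-endian order, independently of the
   host CPU, so a client and a server of different architectures agree. */
size_t put_le(uint8_t *buf, uint64_t v, int nbytes) {
    for (int i = 0; i < nbytes; i++)
        buf[i] = (uint8_t)(v >> (8 * i));
    return (size_t)nbytes;
}

/* Inverse of put_le: decode nbytes little-endian bytes. */
uint64_t get_le(const uint8_t *buf, int nbytes) {
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v |= (uint64_t)buf[i] << (8 * i);
    return v;
}

/* Hypothetical client-side stub for a host-to-device copy: instead of
   touching a local GPU, it serializes the request as
   [opcode:4][device pointer:8][size:8][payload] into msg.
   Returns the message length, or 0 if cap is too small. */
size_t pack_memcpy_h2d(uint8_t *msg, size_t cap, uint64_t dev_ptr,
                       const void *src, uint64_t nbytes) {
    assert(msg != NULL && (src != NULL || nbytes == 0));
    size_t need = 4 + 8 + 8 + (size_t)nbytes;
    if (cap < need) return 0;
    size_t off = put_le(msg, OP_MEMCPY_H2D, 4);
    off += put_le(msg + off, dev_ptr, 8);
    off += put_le(msg + off, nbytes, 8);
    if (nbytes > 0)
        memcpy(msg + off, src, (size_t)nbytes);
    return off + (size_t)nbytes;
}
```

On the server side, mirror-image code would decode the header with `get_le` and issue a real `cudaMemcpy` on its local GPU, so the application itself never notices that the GPU is remote.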


Basics of rCUDA


rCUDA supports RDMA transfers

[Figure: physical configuration]

rCUDA allows a new vision of a GPU deployment, moving from the usual cluster configuration …

… to the following one:

[Figure: logical configuration]
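The logical configuration can be thought of as a table that maps the device indices an application sees to GPUs installed in one or several remote servers. A minimal sketch of such a table, assuming a hypothetical `host:gpu` text format for each entry; rCUDA's own configuration syntax is not shown in these slides, and all names below are illustrative.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define MAX_HOST 64

/* One logical GPU: the server that hosts it and its index on that server. */
struct remote_gpu {
    char host[MAX_HOST];
    int gpu;
};

/* Parse a hypothetical "host:gpu" entry such as "node07:1".
   Returns 0 on success, -1 on malformed input. */
int parse_remote_gpu(const char *spec, struct remote_gpu *out) {
    assert(spec != NULL && out != NULL);
    const char *colon = strrchr(spec, ':');
    if (colon == NULL || colon == spec || colon[1] == '\0') return -1;
    size_t len = (size_t)(colon - spec);
    if (len >= MAX_HOST) return -1;
    char *end;
    long g = strtol(colon + 1, &end, 10);
    if (*end != '\0' || g < 0 || g > INT_MAX) return -1;
    memcpy(out->host, spec, len);
    out->host[len] = '\0';
    out->gpu = (int)g;
    return 0;
}

/* Logical device i is entries[i]; the GPUs may live on one or several
   servers. Returns the number of logical devices, or -1 on the first
   malformed entry. */
int build_device_table(const char *const *entries, int n,
                       struct remote_gpu *table) {
    for (int i = 0; i < n; i++)
        if (parse_remote_gpu(entries[i], &table[i]) != 0) return -1;
    return n;
}
```

The application keeps calling `cudaSetDevice(i)` with a logical index; in this sketch only the table records that, for example, devices 0 and 1 sit on one server and device 2 on another.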


rCUDA vision

Performance of rCUDA?


Lower is better

• K20 GPU and FDR InfiniBand

• K40 GPU and EDR InfiniBand


Performance of rCUDA

CUDA-MEME, BarraCUDA

Lower is better

P100 GPU and EDR InfiniBand

Performance of data movements among GPUs

[Figure: rCUDA scenario 1 and rCUDA scenario 2, rCUDA versus CUDA; higher is better]

Performance of data movements to/from GPUs

[Figure: CPU to GPU and GPU to CPU, rCUDA versus CUDA; higher is better]

New communication module in progress

Benefits of rCUDA?


Benefits of rCUDA:

1. Many GPUs for an application

2. Server consolidation

3. GPU acceleration for virtual machines

4. Increased cluster throughput


Providing many GPUs to an application with rCUDA

Lower is better

Monte Carlo multi-GPU program running on 14 NVIDIA Tesla K20 GPUs

K20 GPUs and FDR InfiniBand
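A multi-GPU program like this one usually divides its work among the GPUs it can see; with rCUDA, that device count is no longer limited by what fits in a single node. A minimal sketch of an even block partition of Monte Carlo samples across GPUs. This is a generic technique, not the benchmark's actual code, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Block-partition `total` samples over `ngpus` devices as evenly as
   possible: the first (total % ngpus) devices receive one extra sample.
   Writes the first sample index and the sample count for device `dev`. */
void partition_samples(uint64_t total, int ngpus, int dev,
                       uint64_t *first, uint64_t *count) {
    assert(ngpus > 0 && dev >= 0 && dev < ngpus);
    uint64_t base = total / (uint64_t)ngpus;
    uint64_t rem  = total % (uint64_t)ngpus;
    uint64_t d    = (uint64_t)dev;
    *count = base + (d < rem ? 1 : 0);
    /* Devices before d each got `base` samples, plus one for each of the
       first min(d, rem) devices. */
    *first = d * base + (d < rem ? d : rem);
}
```

In a real program, one host thread per device would call `cudaSetDevice(dev)` and launch its kernels on samples `[first, first + count)`; the slices are contiguous and cover all samples exactly once.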

64 GPUs!!

K20 GPUs. Work in progress!! Non-optimized (yet) version of rCUDA!!!

[Figure: GPUs 1 to 16 used by a single application]

[Figure: GPU utilization (%) per GPU; some GPUs shown as off]


Server consolidation with rCUDA

• The GPU-Blast application is migrated up to 5 times among K40 GPUs

• The aggregated volume of GPU data is 1300 MB (consisting of 9 memory regions)

The “Reference” line is the execution time of the application when using CUDA with a local GPU and without any migration

Lower is better
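Migrating a running application between GPUs, as in the GPU-Blast experiment, requires knowing every live device allocation so that its contents can be copied to the destination GPU (here, 9 regions adding up to 1300 MB). A minimal sketch of such an allocation registry, with hypothetical names; the slides do not describe how rCUDA actually tracks or moves this state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 64

/* One live device allocation that would have to be copied on migration. */
struct region {
    uint64_t dev_addr; /* device address on the current GPU */
    uint64_t bytes;    /* size of the allocation */
};

struct registry {
    struct region r[MAX_REGIONS];
    int n;
};

/* Record an allocation. Returns 0 on success, -1 if the registry is full. */
int reg_add(struct registry *g, uint64_t addr, uint64_t bytes) {
    assert(g != NULL);
    if (g->n == MAX_REGIONS) return -1;
    g->r[g->n].dev_addr = addr;
    g->r[g->n].bytes = bytes;
    g->n++;
    return 0;
}

/* Forget an allocation when it is freed. Returns 0 if found, -1 otherwise. */
int reg_remove(struct registry *g, uint64_t addr) {
    for (int i = 0; i < g->n; i++) {
        if (g->r[i].dev_addr == addr) {
            g->r[i] = g->r[g->n - 1]; /* order does not matter */
            g->n--;
            return 0;
        }
    }
    return -1;
}

/* Total bytes a migration would have to move to the destination GPU. */
uint64_t reg_migration_volume(const struct registry *g) {
    uint64_t total = 0;
    for (int i = 0; i < g->n; i++)
        total += g->r[i].bytes;
    return total;
}
```

In an API-remoting design every allocation and release already passes through the remoting layer, so a registry like this can be kept up to date without modifying the application.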


• How can the GPU in the native domain be accessed from inside virtual machines?


Virtual machines may need access to GPUs

• The GPU is assigned exclusively to a single virtual machine by using PCI passthrough

• Concurrent usage of the GPU is not possible


• If InfiniBand is available, the rCUDA server can be placed in another node

• Several GPUs can be provided to the VMs, either in a single remote node or in several remote nodes

High-performance network fabric available


Using rCUDA to access the GPU

• When InfiniBand is not available, the rCUDA server can be placed in the native domain and the rCUDA client inside the VMs

High-performance network is not available

• The virtual network provided by the hypervisor would be used to exchange data between the rCUDA clients and the rCUDA server

• This configuration allows the use of more than one GPU in the host


One rCUDA box serves multiple clients


Increased cluster throughput

1. BarraCUDA

2. CUDA-MEME

3. CUDASW++

4. GPU-Blast

5. Gromacs

6. Magma

Lower is better

-58%

[Figure: GPUs assigned but not used]

One more benefit: Heterogeneous² environments


rCUDA is available for x86, POWER and ARM processors


rCUDA availability

Performance of rCUDA on ARM systems


ThunderX


From ARM to x86 with rCUDA

Work in progress. A couple of applications have already been analyzed:

1. Cloverleaf: a mini-app that solves the compressible Euler equations on a Cartesian grid

2. Flow: a mini-app that implements a 2D hydrodynamics simulator

Application performance: Cloverleaf

[Figure, lower is better: single node executions; estimation over multiple nodes]

Rough energy estimation:

ThunderX TDP = 80 watts

P100 TDP = 250 watts

Xeon TDP = 140 watts

40*80 versus 1*80+3*250+2*140

3200 watts versus 1110 watts
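The rough estimate above simply adds up thermal design power: 40 ThunderX processors against one ThunderX plus three P100 GPUs and two Xeon processors. A small helper makes the arithmetic explicit and reusable for the Flow case (60 ThunderX processors). The TDP constants are the ones quoted on the slide and the function name is hypothetical; like the slide, it ignores memory, network and cooling power.

```c
#include <assert.h>

/* TDP values (watts) quoted on the slide. */
#define THUNDERX_TDP_W 80
#define P100_TDP_W     250
#define XEON_TDP_W     140

/* Rough power of a configuration, computed as the sum of component TDPs. */
int config_tdp_watts(int thunderx, int p100, int xeon) {
    assert(thunderx >= 0 && p100 >= 0 && xeon >= 0);
    return thunderx * THUNDERX_TDP_W + p100 * P100_TDP_W + xeon * XEON_TDP_W;
}
```

`config_tdp_watts(40, 0, 0)` returns 3200 and `config_tdp_watts(1, 3, 2)` returns 1110, a ratio of about 2.9; for Flow, 4800 / 1110 is about 4.3.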


Application performance: Flow

[Figure, lower is better: single node executions; estimation over multiple nodes]

Rough energy estimation:

ThunderX TDP = 80 watts

P100 TDP = 250 watts

Xeon TDP = 140 watts

60*80 versus 1*80+3*250+2*140

4800 watts versus 1110 watts


Hybrid CPU-GPU clusters

[Figure: high-density ARM-based nodes with no GPU; rCUDA servers and rCUDA clients]

Get a free copy of rCUDA at

http://www.rcuda.net

@rcuda_

More than 900 requests worldwide

rCUDA is a development by Universitat Politècnica de València, Spain

Tony Díaz · Pablo Higueras · Javier Prades · Jaime Sierra · Cristian Peñaranda · Federico Silla · Carlos Reaño
