Using GPU Virtualization with
TensorFlow
Carlos Reaño, Universitat Politècnica de València, Spain
http://mural.uv.es/caregon
HPC Advisory Council Swiss Conference 2018
April 9-12, 2018, Lugano, Switzerland
HPCAC Swiss Conference 2018 2/41
Outline
What is rCUDA?
Installing and using rCUDA
rCUDA over HPC networks
InfiniBand
How to benefit from rCUDA
Sample scenarios
Questions & Answers
What is rCUDA?
CUDA:
[Diagram: a GPU is attached directly to Node 1; Node 2, connected to Node 1 only over the network, has no GPU]
rCUDA (remote CUDA):
With rCUDA, Node 2 can use Node 1's GPU!
Installing and using rCUDA
Where to obtain rCUDA?
◦ www.rCUDA.net: Software Request Form
Package contents. Important folders:
doc: rCUDA user’s guide & quick start guide
bin: rCUDA server daemon
lib: rCUDA library
Installing rCUDA
◦ Just untar the tarball on both the server and client nodes
Installing and using rCUDA
Starting the rCUDA server:
◦ Set env. vars as if you were going to run a CUDA program:
export PATH=$PATH:/usr/local/cuda/bin                          # path to CUDA binaries
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64  # path to CUDA libraries
◦ Start the rCUDA server:
cd $HOME/rCUDA/bin   # path to the rCUDA server
./rCUDAd             # starts the rCUDA server daemon in the background
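The two steps above can be collected into a small launcher helper. This is a minimal sketch, not part of the rCUDA distribution: `start_rcudad` is a hypothetical name, and `CUDA_HOME`/`RCUDA_HOME` are assumed overridable paths defaulting to the locations shown in the slides.

```shell
# Sketch: prepare the environment and start the rCUDA server daemon.
# Assumes the default install paths from the slides; override with
# CUDA_HOME / RCUDA_HOME if your layout differs.
start_rcudad() {
    cuda_home=${CUDA_HOME:-/usr/local/cuda}
    rcuda_home=${RCUDA_HOME:-$HOME/rCUDA}

    # Same environment you would use to run a CUDA program locally.
    export PATH="$PATH:$cuda_home/bin"
    export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$cuda_home/lib64"

    # Refuse to continue if the daemon is not where we expect it.
    if [ ! -x "$rcuda_home/bin/rCUDAd" ]; then
        echo "error: rCUDAd not found in $rcuda_home/bin" >&2
        return 1
    fi

    ( cd "$rcuda_home/bin" && ./rCUDAd )   # rCUDAd runs in the background
}
```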
Installing and using rCUDA
Running a CUDA program with rCUDA:
◦ Set env. vars as follows:
export PATH=$PATH:/usr/local/cuda/bin                    # path to CUDA binaries
export LD_LIBRARY_PATH=$HOME/rCUDA/lib:$LD_LIBRARY_PATH  # path to the rCUDA library
export RCUDA_DEVICE_COUNT=1                              # number of remote GPUs: 1, 2, 3...
export RCUDA_DEVICE_0=<server_name_or_ip_address>:0      # name/IP of rCUDA server : GPU of the remote server to use
◦ Compile the CUDA program using dynamic libraries (very important!):
cd $HOME/NVIDIA_CUDA_Samples/1_Utilities/deviceQuery
make EXTRA_NVCCFLAGS=--cudart=shared
◦ Run the CUDA program as usual:
./deviceQuery
...
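The client-side setup above can be wrapped in a small helper; a sketch, where `use_remote_gpu` is a hypothetical name and the variables are the ones documented in the slides:

```shell
# Sketch: point the rCUDA client library at a single remote GPU.
use_remote_gpu() {
    server=$1       # rCUDA server name or IP address
    gpu=${2:-0}     # GPU index on that server (default 0)

    export PATH="$PATH:/usr/local/cuda/bin"                    # CUDA binaries (nvcc)
    export LD_LIBRARY_PATH="$HOME/rCUDA/lib:$LD_LIBRARY_PATH"  # rCUDA library first!
    export RCUDA_DEVICE_COUNT=1                                # one remote GPU
    export RCUDA_DEVICE_0="$server:$gpu"                       # where it lives
}
```

Note that the rCUDA lib directory is prepended to `LD_LIBRARY_PATH` so that the dynamic linker finds the rCUDA runtime before NVIDIA's; this is why the sample must be compiled with `--cudart=shared`.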
Installing and using rCUDA
Live demonstration:
◦ deviceQuery
◦ bandwidthTest
Problem: bandwidth with rCUDA is too low!
◦ Why? We are using TCP
Solution: HPC networks
◦ InfiniBand (IB)
rCUDA over HPC networks: InfiniBand
Starting the rCUDA server using IB:
export RCUDA_NETWORK=IB   # tell rCUDA we want to use IB (also in the client!)
cd $HOME/rCUDA/bin
./rCUDAd
Run a CUDA program using rCUDA over IB:
export RCUDA_NETWORK=IB
cd $HOME/NVIDIA_CUDA_Samples/1_Utilities/bandwidthTest
./bandwidthTest
Live demonstration:
◦ bandwidthTest using IB
◦ Bandwidth is no longer a problem!
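Since the transport choice must match on both ends, a tiny guard helps catch typos early. A sketch, assuming only the two values shown in the slides (IB and the TCP default); `rcuda_set_network` is a hypothetical helper name:

```shell
# Sketch: select the rCUDA transport, rejecting unknown values.
rcuda_set_network() {
    case "$1" in
        IB|TCP) export RCUDA_NETWORK="$1" ;;
        *) echo "error: unknown transport '$1' (expected IB or TCP)" >&2
           return 1 ;;
    esac
    # Remember: the same value must be exported on BOTH client and server.
}
```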
How to benefit from rCUDA
Sample scenarios:
◦ Typical behavior of CUDA applications: moving data to the GPU and
performing a lot of computation there to compensate for the overhead
of having moved the data
This benefits rCUDA: more computation, less rCUDA overhead
◦ Scalable applications: more GPUs, less execution time
rCUDA can use all the GPUs in the cluster, while CUDA can only use those
directly attached to one node: for some applications, rCUDA can achieve better
results than CUDA
◦ Heterogeneous clusters: access to GPU servers from Atom, ARM…
rCUDA can be used to access GPU servers on x86 or POWER8 machines from
systems with different architectures (Atom, ARM, Intel D...)
How to benefit from rCUDA
Three main types of applications:
◦ Bandwidth-bound: more transfers, more rCUDA overhead
◦ Computation-bound: more computation, less rCUDA overhead
◦ Intermediate
TensorFlow
GPU vs. remote GPU
What is the overhead of remote GPUs?
Live demonstration:
TensorFlow with CUDA
TensorFlow with rCUDA
TensorFlow
CPU vs. remote GPU
Which is better: a local CPU or a remote GPU?
Live demonstration:
TensorFlow on CPU (without CUDA)
Multi-GPU scenario
CUDA:
[Diagram: a multi-GPU app running in Node 1 can only use the GPUs inside Node 1]
rCUDA (remote CUDA):
[Diagram: a multi-GPU app running in Node 0 can use all the GPUs in the cluster (Node 1, Node 2, Node 3, ... Node n), reached over the network]
Multi-GPU Configuration
Configure rCUDA for Multi-GPU:
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$HOME/rCUDA/framework/rCUDAl:$LD_LIBRARY_PATH
export RCUDA_DEVICE_COUNT=5     # number of remote GPUs
export RCUDA_DEVICE_0=node1:0   # location of each GPU (node:gpu_index)
export RCUDA_DEVICE_1=node1:1
export RCUDA_DEVICE_2=node2:0
export RCUDA_DEVICE_3=node3:0
export RCUDA_DEVICE_4=node4:0
◦ Check the configuration by running the deviceQuery sample
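The numbered `RCUDA_DEVICE_*` variables above follow a simple pattern, so they can be generated from a list of `node:gpu` specs. A sketch: `rcuda_set_devices` is a hypothetical helper name, not part of the rCUDA package.

```shell
# Sketch: build RCUDA_DEVICE_COUNT and RCUDA_DEVICE_0..N-1 from a list
# of node:gpu_index arguments, mirroring the five-GPU example above.
rcuda_set_devices() {
    i=0
    for dev in "$@"; do
        export "RCUDA_DEVICE_$i=$dev"   # e.g. RCUDA_DEVICE_0=node1:0
        i=$((i + 1))
    done
    export RCUDA_DEVICE_COUNT=$i        # number of remote GPUs
}

# Usage matching the slide:
# rcuda_set_devices node1:0 node1:1 node2:0 node3:0 node4:0
```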
Multi-GPU TensorFlow
Live demonstration:
deviceQuery sample with multiple GPUs
Multi-GPU TensorFlow
How to benefit from rCUDA
Heterogeneous clusters:
◦ Access from low-power nodes (Atom, ARM, Intel D…) to x86 GPU-accelerated nodes
◦ Access from non-POWER8 nodes to POWER8 GPU-accelerated nodes
Questions & Answers
Get a free copy of rCUDA at
http://www.rcuda.net
@rcuda_