
Table of Contents

Lab Overview - HOL-1947-01-EMT - Machine Learning Workloads in vSphere Using GPUs - Getting Started
    Lab Guidance

Module 1 - Machine Learning Apps in vSphere VMs Using GPUs (15 minutes)
    Introduction
    Conclusion

Module 2 - Using NVIDIA GRID vGPUs in vSphere (15 minutes)
    Introduction
    Hands-on Labs Interactive Simulation: NVIDIA GRID vGPUs in vSphere
    Conclusion

Module 3 - Using GPUs in Pass-through Mode (15 Minutes)
    Introduction
    Hands-on Labs Interactive Simulation: Configuring Passthrough for NVIDIA P100 on vSphere
    Conclusion

Module 4 - Using Bitfusion GPU virtualization in vSphere (15 minutes)
    Introduction
    Hands-on Labs Interactive Simulation: Using Bitfusion GPU virtualization in vSphere
    Conclusion

Module 5 - Performing Infrastructure Maintenance when VMs are using GPUs (15 minutes)
    Introduction
    Conclusion

Module 6 - Running Machine Learning Workloads using TensorFlow in vSphere (30 minutes)
    Introduction
    Hands-on Labs Interactive Simulation: Running Machine Learning Workloads using TensorFlow in vSphere
    Conclusion

Module 7 - vGPU Scheduling Options (15 minutes)
    Introduction
    Hands-on Labs Interactive Simulation: vGPU Scheduling Options
    Conclusion

Module 8 - Maximizing the Power of vSphere for Diverse Workloads using GPUs (15 minutes)
    Introduction
    Hands-on Labs Interactive Simulation: Maximizing the Power of vSphere for Diverse Workloads using GPUs
    Conclusion

HOL-1947-01-EMT

Lab Overview - HOL-1947-01-EMT - Machine Learning Workloads in vSphere Using GPUs - Getting Started


Lab Guidance

Note: It may take more than 90 minutes to complete this lab. You should expect to only finish 2-3 of the modules during your time. The modules are independent of each other so you can start at the beginning of any module and proceed from there. You can use the Table of Contents to access any module of your choosing.

The Table of Contents can be accessed in the upper right-hand corner of the Lab Manual.

This lab explores how to accelerate Machine Learning workloads by using GPUs and vGPUs in vSphere. For this lab we will be using NVIDIA GRID GPUs installed in the ESXi hosts. Throughout all 8 modules, we will show you mechanisms to access GPUs either directly or through passthrough mode from a VM, how to run machine learning workloads using TensorFlow, and how to maximize your datacenter resources, including GPUs, by running diverse workloads.

Lab Module List:

• Module 1 - Machine Learning Apps in vSphere VMs Using GPUs (15 minutes) - Basic - In this module, you will get a basic overview of what Machine Learning is and how to run ML workloads with TensorFlow in vSphere VMs.

• Module 2 - Using NVIDIA GRID vGPUs in vSphere (15 minutes) - Basic - In this module, you will enable NVIDIA GRID vGPU in vSphere.

• Module 3 - Using GPUs in Pass-through Mode on vSphere (15 minutes) - Basic - In this module, you will access GPUs in Pass-through mode.

• Module 4 - Using Bitfusion GPU virtualization in vSphere (15 minutes) - Basic - In this module, you will enable Bitfusion Elastic GPUs on vSphere.

• Module 5 - Performing Infrastructure Maintenance when VMs are using GPUs (15 minutes) - Basic - In this module, you will perform vMotion on VMs running applications that are using GPUs for compute acceleration in order to remediate an ESXi host.

• Module 6 - Running Machine Learning Workloads using TensorFlow in vSphere (15 minutes) - Basic - In this module, you will learn how to run machine learning workloads on NVIDIA GPUs using TensorFlow in vSphere.

• Module 7 - vGPU Scheduling Options (15 minutes) - Basic - In this module, you will learn about the different vGPU schedulers and how to select among them.

• Module 8 - Maximizing the Power of vSphere for Diverse Workloads using GPUs (15 minutes) - Basic - In this module, you will maximize your datacenter resources, including GPUs.

Lab Captains:

• Module 1 - Uday Kurkure, Staff Engineer 1, USA

• Module 2 - Uday Kurkure, Staff Engineer 1, USA

• Module 3 - Uday Kurkure, Staff Engineer 1, USA

• Module 4 - Uday Kurkure, Staff Engineer 1, USA

• Module 5 - Uday Kurkure, Staff Engineer 1, USA

• Module 6 - Uday Kurkure, Staff Engineer 1, USA

• Module 7 - Uday Kurkure, Staff Engineer 1, USA

• Module 8 - Uday Kurkure, Staff Engineer 1, USA

This lab manual can be downloaded from the Hands-on Labs Document site found here:

http://docs.hol.vmware.com

Location of the Main Console

1. The area in the RED box contains the Main Console. The Lab Manual is on the tab to the right of the Main Console.

2. A particular lab may have additional consoles found on separate tabs in the upper left. You will be directed to open another specific console if needed.

3. Your lab starts with 90 minutes on the timer. The lab cannot be saved. All your work must be done during the lab session, but you can click the EXTEND button to increase your time. If you are at a VMware event, you can extend your lab time twice, for up to 30 minutes. Each click gives you an additional 15 minutes. Outside of VMware events, you can extend your lab time up to 9 hours and 30 minutes. Each click gives you an additional hour.


Alternate Methods of Keyboard Data Entry

During this module, you will input text into the Main Console. Besides directly typing it in, there are two very helpful methods of entering data which make it easier to enter complex data.

Click and Drag Lab Manual Content Into Console Active Window

You can also click and drag text and Command Line Interface (CLI) commands directly from the Lab Manual into the active window in the Main Console.

Accessing the Online International Keyboard

You can also use the Online International Keyboard found in the Main Console.

1. Click on the Keyboard Icon found on the Windows Quick Launch Task Bar.

Video: http://www.youtube.com/watch?v=xS07n6GzGuo


Click once in active console window

In this example, you will use the Online Keyboard to enter the "@" sign used in email addresses. The "@" sign is Shift-2 on US keyboard layouts.

1. Click once in the active console window.

2. Click on the Shift key.

Click on the @ key

1. Click on the "@" key.

Notice the @ sign entered in the active console window.



Activation Prompt or Watermark

When you first start your lab, you may notice a watermark on the desktop indicating that Windows is not activated.

One of the major benefits of virtualization is that virtual machines can be moved and run on any platform. The Hands-on Labs utilizes this benefit and we are able to run the labs out of multiple datacenters. However, these datacenters may not have identical processors, which triggers a Microsoft activation check through the Internet.

Rest assured, VMware and the Hands-on Labs are in full compliance with Microsoft licensing requirements. The lab that you are using is a self-contained pod and does not have full access to the Internet, which is required for Windows to verify the activation. Without full access to the Internet, this automated process fails and you see this watermark.

This cosmetic issue has no effect on your lab.

Look at the lower right portion of the screen

Please check that your lab has finished all of its startup routines and is ready for you to start. If you see anything other than "Ready", please wait a few minutes. If after 5 minutes your lab has not changed to "Ready", please ask for assistance.

Module 1 - Machine Learning Apps in vSphere VMs Using GPUs (15 minutes)


Introduction

In this module, you will learn about Machine Learning (ML) and how to run ML workloads using TensorFlow in vSphere VMs.

Machine learning is an exciting area of technology that allows computers to learn from data without being explicitly programmed, much in the way a person might learn. This technology is increasingly applied in many areas such as health, science, finance, and intelligent systems, among others.

In recent years, the emergence of deep learning and improvements in accelerators like GPUs have driven tremendous adoption of machine learning applications in ever more aspects of our lives. Some application areas include facial recognition in images, medical diagnosis from MRIs, robotics, automobile safety, and text and speech recognition.

GPUs can reduce the time it takes for a machine learning or deep learning algorithm to learn (known as the training time) from hours to minutes.

Machine learning (ML) workloads, especially deep learning (DL) workloads, are growing in datacenters and cloud environments. The use of ML in intelligent applications usually includes two main stages: building models using ML methods (Neural Networks, Support Vector Machines, Hidden Markov Models, etc.), which is known as the training stage, and then applying the models to intelligent tasks like recognition, prediction, or classification, which is known as the inference stage.
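The two stages can be illustrated with a deliberately tiny example. The sketch below is not part of the lab environment: it fits a one-variable linear model with plain gradient descent (the training stage) and then applies the learned parameters to a new input (the inference stage). The data, learning rate, and function names are all made up for illustration.

```python
# Minimal sketch of the two ML stages: training, then inference.
# Everything here (data, learning rate, names) is hypothetical.

def train(xs, ys, learning_rate=0.05, epochs=2000):
    """Training stage: fit y = w * x + b by gradient descent on squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict(model, x):
    """Inference stage: apply the trained model to an unseen input."""
    w, b = model
    return w * x + b


# Toy training set generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

model = train(xs, ys)            # slow, iterative stage
prediction = predict(model, 5.0) # fast, single evaluation
print(round(prediction, 3))
```

Training loops over the whole data set many times, while inference is a single evaluation. Real deep learning workloads follow the same pattern at a vastly larger scale, which is why the training stage benefits most from GPU acceleration.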

There are several ways you can run ML applications using GPUs, one of which is to use GPU compute applications inside virtual machines on VMware vSphere. In this lab we present three of these options:

• Using NVIDIA vGPUs in vSphere

• Using GPUs in Passthrough

• Using Bitfusion FlexDirect

What to expect from each Module

NVIDIA GRID vGPU is a GPU virtualization solution by NVIDIA. It is a suitable option when you want multiple VMs to share the same physical GPU. It also enables well-known virtualization benefits, such as cloning a VM or suspending and resuming a VM. We will show you this in Module 2.

The NVIDIA GRID vGPU manager is installed in vSphere to virtualize the underlying physical GPUs. The graphics memory of the physical GPU is divided into equal chunks and those chunks are given to each VM. The type of vGPU profile determines the amount of graphics memory each VM can have.

Passthrough on vSphere (also known as VMware DirectPath I/O) allows direct access from the guest operating system in a virtual machine (VM) to the physical PCI or PCIe hardware devices of the server controlled by the vSphere hypervisor layer. Each VM is assigned one or more GPUs as PCI devices. Pass-through is a suitable option when you want a VM to have one or multiple physical GPUs for the heavy computation needs of applications running inside the VM. Since the guest OS bypasses the virtualization layer to access the GPUs, the overhead of using pass-through mode is low. There is no GPU sharing amongst VMs when using this mode. We will show you this in Module 3.

Bitfusion FlexDirect is a GPU virtualization solution provided by a company named Bitfusion. It allows ML workflows running inside a VM to use one or more GPUs on the same vSphere host and/or on remote hosts. It also supports multiple VMs sharing a single physical GPU. We will show you this in Module 4.

Machine Learning training and High Performance Computing jobs can take weeks to complete even with GPUs. Without live migration, weeks of work are lost if the server needs maintenance and has to be powered down. VMware vSphere has now added the ability to perform live VM migrations using vMotion for vGPU-enabled VMs. The live VMs are migrated to another server before the maintenance begins, so no work is lost due to maintenance. We will show you this in Module 5.

Most ML methods are very computationally intensive. The training time for building prediction models can take hours, days, or even weeks for large datasets, and fast inference time is a critical requirement in many real-time applications. Hence, accelerators such as GPUs, TPUs, and FPGAs are used to accelerate ML tasks. In this lab, we focus on the GPU because of its popular use for ML. We can use CUDA and its cuDNN library for developing ML applications for NVIDIA's GPUs, or OpenCL for applications running on AMD's GPUs. Some ML frameworks supporting cuDNN are TensorFlow, Keras, Theano, Caffe, Torch, and MXNet. We will show you this in Module 6.

Our performance studies have shown that adding vGPU to VMs often leads to underutilization of CPU resources. One can run CPU-only workloads concurrently with GPU workloads without significant performance penalties. One can also run Machine Learning training batch jobs at night and interactive 3D-CAD jobs during daytime hours by suspending and resuming VMs. We will show you this in Module 8.

Conclusion

In this module, we reviewed the basics of what Machine Learning (ML) is and what you can expect in each module.

You've finished Module 1

Congratulations on completing Module 1.

If you are looking for additional information on Machine Learning at VMware, try one of these:

• Visit https://blogs.vmware.com/apps/machine-learning-resources

• Or use your smart device to scan the QR code.

Proceed to any module below which interests you most.

• Module 2 - Using NVIDIA GRID vGPUs in vSphere (15 minutes) - Basic

• Module 3 - Using GPUs in Pass-through Mode on vSphere (15 minutes) - Basic

• Module 4 - Using Bitfusion GPU virtualization in vSphere (15 minutes) - Basic

• Module 5 - Performing Infrastructure Maintenance when VMs are using GPUs (15 minutes) - Basic

• Module 6 - Running Machine Learning Workloads using TensorFlow in vSphere (15 minutes) - Intermediate

• Module 7 - vGPU Scheduling Options (15 minutes) - Intermediate

• Module 8 - Maximizing the Power of vSphere for Diverse Workloads using GPUs (15 minutes) - Intermediate

How to End Lab

To end your lab click on the END button.

Module 2 - Using NVIDIA GRID vGPUs in vSphere (15 minutes)


Introduction

In this module, we will take a closer look at how the NVIDIA GRID vGPU is integrated into a vSphere environment. We will show you how to install the NVIDIA drivers in a VM to take advantage of the vSphere driver, and then run an ML workload to show you how it works.

The NVIDIA GRID vGPU is a GPU virtualization solution by NVIDIA. This solution allows multiple VMs to share a physical GPU and is also called a mediated pass-through solution.

To enable this solution, you need to install the NVIDIA GRID vGPU manager, also known as the NVIDIA vGPU Driver or NVIDIA-ESX-HOST driver.

To run ML workloads using GPUs, you need to install the CUDA and cuDNN libraries from NVIDIA in a VM. cuDNN stands for CUDA Deep Neural Network. It is a GPU-accelerated library for deep neural networks. Many ML frameworks like TensorFlow and Caffe2 use this library to accelerate machine learning performance.

Once the driver is installed in the ESXi host, the graphics memory of the physical GPU is divided into equal chunks and given to each VM. The type of vGPU profile determines the amount of graphics memory each VM can have. The Pascal P40 card has 24 GB of memory that will be distributed across the VMs based on the assigned profile.
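Because the graphics memory is split into equal chunks, the number of VMs that can share one card follows from simple division. The sketch below illustrates that arithmetic for the 24 GB P40 mentioned above. The profile sizes in the example list are hypothetical values chosen for illustration and are not taken from Table 1.

```python
# Sketch: how many equal-sized vGPU chunks fit on one physical GPU.
# The profile sizes below are illustrative assumptions, not NVIDIA's list.

def vms_per_gpu(gpu_memory_gb, profile_memory_gb):
    """Return the number of VMs that can each receive one chunk of the given size."""
    if profile_memory_gb <= 0 or profile_memory_gb > gpu_memory_gb:
        raise ValueError("profile size must be between 1 GB and the GPU memory size")
    return gpu_memory_gb // profile_memory_gb


P40_MEMORY_GB = 24                      # from the text above
example_profiles_gb = [1, 2, 4, 8, 24]  # hypothetical profile sizes

density = {size: vms_per_gpu(P40_MEMORY_GB, size) for size in example_profiles_gb}
for size, count in density.items():
    print(f"{size:>2} GB per VM -> up to {count} VMs per GPU")
```

A larger profile gives each VM more graphics memory but lowers the number of VMs that can share the card, so the profile choice is a trade-off between per-VM capacity and consolidation.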

Table 1 lists the available NVIDIA Pascal P40 vGPU profiles. You would use the different vGPU profiles to give the VMs the resources needed to drive each type of ML workload. Currently, only one vGPU can be assigned to a VM.

ML frameworks allow rapid development of machine learning applications. We will use TensorFlow in this lab. TensorFlow is an open source machine learning framework.

Once we have TensorFlow installed, we will run a machine learning workload. The workload we will run is a handwriting recognition benchmark known as MNIST. The benchmark employs a Convolutional Neural Network and has a training set of 60,000 examples.


Hands-on Labs Interactive Simulation: NVIDIA GRID vGPUs in vSphere

This part of the lab is presented as a Hands-on Labs Interactive Simulation. This will allow you to experience steps which are too time-consuming or resource intensive to do live in the lab environment. In this simulation, you can use the software interface as if you are interacting with a live environment.

1. Click here to open the interactive simulation (http://docs.hol.vmware.com/hol-isim/HOL-2019/hol-1947-01-vspherenvidiagrid.htm). It will open in a new browser window or tab.

2. When finished, click the "Return to the lab" link to continue with this lab.

The lab continues to run in the background. If the lab goes into standby mode, you can resume it after completing the module.

Conclusion

In this module, we reviewed the basics of what Machine Learning (ML) is and how you can utilize vSphere, GPUs, and vGPUs to process ML methods. We installed the NVIDIA drivers in both vSphere and a VM, and showed you how you could use the NVIDIA GPU by running a TensorFlow workload.






• Module 1 - Machine Learning Apps in vSphere VMs Using GPUs (15 minutes) - Basic

• Module 3 - Using GPUs in Pass-through Mode on vSphere (15 minutes) - Basic

• Module 4 - Using Bitfusion GPU virtualization in vSphere (15 minutes) - Basic

• Module 5 - Performing Infrastructure Maintenance when VMs are using GPUs (15




Module 3 - Using GPUs in Pass-through Mode (15 Minutes)


Introduction

In this module, we will walk through the major steps for configuring DirectPath I/O (Passthrough) for an NVIDIA P100 GPU on vSphere 6.7.

In vSphere, a GPU can be configured in DirectPath I/O (passthrough) mode, which allows a guest OS to directly access the device, essentially bypassing the hypervisor. Because of the shortened access path, performance of applications accessing GPUs in this way can be very close to that of bare-metal systems.

With DirectPath I/O, we can configure one or multiple GPU devices into a single VM. Each GPU device is dedicated to a VM and there is no GPU sharing among the VMs.

Please note that some features are unavailable for VMs configured with DirectPath I/O, including hot-adding of virtual devices, taking snapshots, suspending/resuming VMs, and vMotion.

Requirements for configuring large-BAR GPU devices in Passthrough mode

Some high-end compute GPUs like the NVIDIA V100, P100, K80, and K40 use large, multi-gigabyte passthrough memory-mapped I/O (MMIO) device memory regions to transfer data between the host and the device. For example, the NVIDIA P100's PCI MMIO space is slightly larger than 16 GB. To enable a device that uses large PCI MMIO regions, including the NVIDIA V100, P100, K80, and K40, the following prerequisites must be met before configuring it in Passthrough mode:

1. Server BIOS
   ◦ In the server BIOS, 4G mapping/decoding should be enabled. The step to enable it depends on the server OEM. You can search for the keywords "above 4G decoding", "memory mapped I/O above 4GB", or "PCI 64 bit resource handling above 4G".

2. UEFI installation of the VM
   ◦ Ensure that the virtual machine is UEFI enabled.

3. Advanced VM configuration parameters
   ◦ Large PCI MMIO regions require 64-bit MMIO support. To enable 64-bit MMIO support, add this line to the VM's vmx file: pciPassthru.use64bitMMIO=TRUE
   ◦ Specify an MMIO region size in the VM's vmx file that is a power of two, in GB, and large enough for all passthrough devices. For example, to pass four NVIDIA P100s through to one VM, add this line to the VM's vmx file: pciPassthru.64bitMMIOSizeGB = 128
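The sizing in step 3 can be sketched as a small calculation. The helper below is hypothetical and not a VMware tool. It assumes the MMIO size should be the smallest power of two strictly greater than the combined GPU memory, an assumption chosen because each P100 needs slightly more than 16 GB of MMIO space and because it reproduces the 128 GB value given above for four P100s. The quoted `key = "value"` form follows the usual layout of .vmx files.

```python
# Sketch: derive the two advanced vmx settings for large-BAR GPU passthrough.
# Assumption: size = smallest power of two strictly greater than the total
# GPU memory, which matches the four-P100 example (4 x 16 GB -> 128 GB).

def mmio_size_gb(num_gpus, gpu_memory_gb):
    """Return a power-of-two MMIO size (GB) large enough for all GPUs."""
    if num_gpus < 1 or gpu_memory_gb < 1:
        raise ValueError("num_gpus and gpu_memory_gb must be positive")
    total = num_gpus * gpu_memory_gb
    size = 1
    while size <= total:  # strictly greater, since MMIO space exceeds GPU memory
        size *= 2
    return size


def vmx_lines(num_gpus, gpu_memory_gb):
    """Return the two configuration lines described in step 3."""
    return [
        'pciPassthru.use64bitMMIO = "TRUE"',
        'pciPassthru.64bitMMIOSizeGB = "{}"'.format(mmio_size_gb(num_gpus, gpu_memory_gb)),
    ]


for line in vmx_lines(num_gpus=4, gpu_memory_gb=16):
    print(line)
```

Check the result against the MMIO limits of your vSphere version before applying it, since the supported maximum differs between releases.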

Please note that MMIO limitations differ across vSphere versions. If your GPU card doesn't use large PCI MMIO regions, you don't need to configure the special BIOS settings and advanced VM configuration parameters. For more details, please refer to VMware vSphere VMDirectPath I/O: Requirements for Platforms and Devices (https://kb.vmware.com/s/article/2142307).

Hands-on Labs Interactive Simulation: Configuring Passthrough for NVIDIA P100 on vSphere

This part of the lab is presented as a Hands-on Labs Interactive Simulation. This will allow you to experience steps which are too time-consuming or resource intensive to do live in the lab environment. In this simulation, you can use the software interface as if you are interacting with a live environment.


http://docs.hol.vmware.com/hol-isim/HOL-2019/hol-1947-01-vspheregpupassthrough.htm

Conclusion

In this module, we showed you how to configure DirectPath I/O (Passthrough) as a way of using GPUs on vSphere.







• Module 2 - Using NVIDIA GRID vGPUs in vSphere (15 minutes) - Basic

• Module 4 - Using Bitfusion GPU virtualization in vSphere (15 minutes) - Basic

• Module 5 - Performing Infrastructure Maintenance when VMs are using GPUs (15




Module 4 - Using Bitfusion GPU virtualization in vSphere (15 minutes)


Introduction

In this module, you will learn about Bitfusion FlexDirect and how a VM without a GPU can use the GPU on another VM.

Bitfusion FlexDirect is a GPU virtualization solution provided by a company named Bitfusion. The GPU accelerators can be shared over the network and accessed remotely by VMs. With Bitfusion, GPU accelerators are now part of a common infrastructure resource pool and available for use by VMs in the vSphere-based environment.

Bitfusion FlexDirect runs as a userspace application within each VM instance, without the need for changes or special software in the ESXi hypervisor or the AI applications. On the GPU-accelerated server VM, FlexDirect also runs as a transparent software layer and exposes the individual physical GPUs as a pooled resource to be consumed by client VMs (VMs that have no GPUs of their own). When the AI runtime code completes, the shared GPU resources go back into the resource pool.

Bitfusion use-cases on vSphere can be broadly categorized into 3 types.

Dynamic and Remote Attached GPUs

Bitfusion FlexDirect allows remote attach of GPUs dynamically to client VMs, as shown in Fig 4.1. GPUs can also be dynamically detached after use.


Fig 4.1 Dynamic and Remote Attached GPUs

Partial GPUs

Bitfusion FlexDirect can be used to slice GPUs into partial GPUs of unequal size. This serves as an optimal architecture for machine learning, in which each user or workload type is unpredictable and requires different performance and response times. The GPUs are sliced by GPU memory. For instance, given a GPU with 16 GB of GPU memory, one could use FlexDirect to create multiple partial GPUs, namely two 4 GB partial GPUs and four 2 GB partial GPUs. This allows sharing the same GPU across multiple users in a multi-tenant environment, as shown in Fig 4.2.

• Fig 4.2 Bitfusion FlexDirect Partial GPUs. Here, vGPU means the memory-sliced partial GPU.
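The memory-slicing arithmetic in the example above can be checked with a few lines of code. This is only an illustrative sketch of the bookkeeping; `slice_gpu` is a hypothetical helper and does not represent the FlexDirect command line or API.

```python
# Sketch: validate that a set of unequal partial-GPU slices fits in one GPU.
# slice_gpu is a made-up helper, not part of Bitfusion FlexDirect.

def slice_gpu(gpu_memory_gb, slice_sizes_gb):
    """Return the GPU memory (GB) left over after carving out the slices."""
    if any(size <= 0 for size in slice_sizes_gb):
        raise ValueError("every slice must have a positive size")
    used = sum(slice_sizes_gb)
    if used > gpu_memory_gb:
        raise ValueError("slices exceed the physical GPU memory")
    return gpu_memory_gb - used


# The example from the text: two 4 GB and four 2 GB partial GPUs on a 16 GB GPU.
slices = [4, 4, 2, 2, 2, 2]
remaining = slice_gpu(16, slices)
print(f"{len(slices)} partial GPUs created, {remaining} GB left unallocated")
```

The six slices use exactly 16 GB, so six tenants can share the card with nothing left over. Unlike the equal chunks of a vGPU profile, the slices do not have to be the same size.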



Dynamic and Remote Attached Partial GPUs

Bitfusion FlexDirect can also be leveraged to remotely attach partial GPUs dynamically. A group of GPUs can be re-configured into partial GPUs of different sizes and combinations, and can be remotely attached to client VMs, as shown in Fig 4.3.

• Fig 4.3 Bitfusion FlexDirect Remote Partial GPUs. Here, Virtual GPU means the memory-sliced partial GPU.


Summary

With VMware vSphere and Bitfusion, GPUs can be a shared pool of resources that can be attached to any VM, as shown in Fig 4.4. A full-fledged GPU as a Service can be created with VMware vSphere and Bitfusion FlexDirect. FlexDirect GPU resource schedulers are started on all the GPU server VMs in the pool. Each of the client VMs uses FlexDirect to attach full or partial remote GPUs from the GPU pool. For more information, see the Bitfusion FlexDirect documentation at https://docs.bitfusion.io

Hands-on Labs Interactive Simulation: Using Bitfusion GPU virtualization in vSphere

This part of the lab is presented as a Hands-on Labs Interactive Simulation. This will allow you to experience steps which are too time-consuming or resource intensive to do live in the lab environment. In this simulation, you can use the software interface as if you are interacting with a live environment.


http://docs.hol.vmware.com/hol-isim/HOL-2019/hol-1947-01-bitfusionvsphere.htm

Conclusion

In this module, you have learned one of the ways to use GPUs on vSphere: leveraging the Bitfusion GPU virtualization solution.







• Module 2 - Using NVIDIA GRID vGPUs in vSphere (15 minutes) - Basic

• Module 3 - Using GPUs in Pass-through Mode on vSphere (15 minutes) - Basic

• Module 5 - Performing Infrastructure Maintenance when VMs are using GPUs (15




Module 5 - Performing Infrastructure Maintenance when VMs are using GPUs (15 minutes)


Introduction

In this module, we will discuss why live vMotion of a GPU-enabled VM is such a big deal.

vMotion's ability to move running VMs between physical machines is well known, so why are we showing this in the ML lab? Because some significant challenges had to be overcome for vMotion to work with a GPU-enabled VM.

The first challenge was to enable a VM to have direct access to physical hardware and still be able to move from physical host to physical host. How many times have you tried to vMotion a VM that has a CD-ROM drive attached, and what happens? It fails because we don't allow that. But what we are doing here is giving a VM direct access to the NVIDIA GPU installed in the ESXi host.

The second challenge is to pass the workload of the GPU between physical hosts. This seems like a simple addition to the capability of vMotion. However, the NVIDIA GRID vGPU allocates anywhere from 1 GB to 24 GB of RAM on the GPU and holds thousands of state variables along with the state information of a sophisticated graphics pipeline. All of this must be transferred to the destination server and set up correctly there so that the application in the VM that uses the GPU can continue without missing a beat. Simply transferring the contents of the graphics RAM is an achievement. Transferring the state information as well, and loading it correctly at the destination, makes this quite a feat.

Watch this video to see a live vMotion of a GPU-enabled VM between two ESXi hosts.

Video - vMotion Demo (1:18)

http://www.youtube.com/watch?v=GU4AMB0pG9M


Conclusion

In this module, we showed you how you can vMotion a VM so that maintenance can be done on a host without affecting the ML workloads.







• Module 2 - Using NVIDIA GRID vGPUs in vSphere (15 minutes) - Basic

• Module 3 - Using GPUs in Pass-through Mode on vSphere (15 minutes) - Basic

• Module 4 - Using Bitfusion GPU virtualization in vSphere (15 minutes) - Basic

• Module 6 - Running Machine Learning Workloads using TensorFlow in vSphere



Module 6 - Running Machine Learning Workloads using TensorFlow in vSphere (30 minutes)


Introduction

In this module, we will run a complex language modeling ML workload.

Given the history of words, this benchmark predicts the next word. The benchmark uses the Penn Treebank (PTB) dataset. It has 929K training words, 73K validation words, and 82K test words, with a vocabulary of 10K words. The benchmark employs a Recurrent Neural Network.

It has 3 models. (LSTM stands for Long Short-Term Memory.)

• The small model has 200 LSTM units per layer.

• The medium model has 650 LSTM units per layer.

• The large model has 1500 LSTM units per layer.
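To get a feel for why the larger models take longer to train, it helps to count the weights in a single LSTM layer. A standard LSTM layer has four gates, each with input weights, recurrent weights, and a bias, giving 4 × (h × (i + h) + h) parameters for h units and input size i. The sketch below applies that formula to the three model sizes. It assumes the input size equals the number of units, which the lab text does not state, and it ignores the embedding and output layers.

```python
# Sketch: parameter count of one standard LSTM layer for each model size.
# Assumption: input size equals the number of LSTM units (not stated in the lab).

def lstm_layer_params(hidden_units, input_size=None):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    if input_size is None:
        input_size = hidden_units
    return 4 * (hidden_units * (input_size + hidden_units) + hidden_units)


models = {"small": 200, "medium": 650, "large": 1500}
params = {name: lstm_layer_params(units) for name, units in models.items()}

for name, count in params.items():
    print(f"{name:>6}: {count:,} parameters per layer")
# small:  320,800
# medium: 3,382,600
# large:  18,006,000
```

Under these assumptions the large model has roughly 56 times as many weights per layer as the small one, which gives a sense of how quickly the training cost grows with model size.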

The bigger models give better accuracy but take more time to train. For example, the large model takes 56 hours to train without GPUs. Using a Pascal P40 GPU brings this time down to 3 hours.

Hands-on Labs Interactive Simulation: Running Machine Learning Workloads using TensorFlow in vSphere

This part of the lab is presented as a Hands-on Labs Interactive Simulation. This will allow you to experience steps which are too time-consuming or resource intensive to do live in the lab environment. In this simulation, you can use the software interface as if you are interacting with a live environment.


http://docs.hol.vmware.com/hol-isim/HOL-2019/hol-1947-01-vspheretensorflow.htm

Conclusion

In this module, we reviewed the basics of what Machine Learning (ML) is and how you can utilize vSphere, GPUs, and vGPUs to process ML methods.








minutes) - Basic

• Module 7 - vGPU Scheduling Options (15 minutes) - Intermediate

• Module 8 - Maximizing the Power of vSphere for Diverse Workloads using GPUs


Module 7 - vGPU Scheduling Options (15 minutes)


IntroductionIn this module, we will introduce you to vGPU scheduling options

Multiple VMs share a physical GPU by using NVIDIA Virtual GPU manager. vGPUscheduling policy specifies how GPU is shared among VMs. NVIDIA GRID supports threevGPU scheduling options: Best Effort, Equal Share and Fixed. The selection of a vGPUscheduling option depends on use cases.

The Best Effort scheduler optimizes GPU utilization. In some circumstances, a VM running a GPU-intensive application may affect the performance of GPU-lightweight applications running in other VMs. To avoid this performance impact and ensure QoS (Quality of Service), you can switch to the Equal Share or Fixed Share scheduler. The Equal Share scheduler ensures an equal share of GPU time for each powered-on VM. The Fixed Share scheduler gives each VM a fixed share of GPU time, determined by the vGPU profile associated with the VMs on the physical GPU.
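
The difference between the three policies can be illustrated with a toy model. The sketch below is not NVIDIA's implementation; it is a simplified picture that assumes Best Effort splits the GPU evenly among VMs that currently have work, Equal Share splits it evenly among all powered-on VMs, and Fixed Share gives each VM one slot out of the maximum number of vGPUs the profile allows per physical GPU. The function name, the VM names, and the `max_vgpus_per_gpu` parameter are hypothetical.

```python
# Toy model of how GPU time is divided under each vGPU scheduling policy.
# All names here are illustrative; this is not NVIDIA's implementation.

def gpu_time_shares(policy, vms, max_vgpus_per_gpu):
    """Return {vm_name: fraction of GPU time}.

    vms maps each powered-on VM name to True if it currently has GPU
    work queued, False if it is idle.
    max_vgpus_per_gpu is the number of vGPUs of the chosen profile that
    fit on one physical GPU (hypothetical parameter).
    """
    if policy == "best_effort":
        # Idle VMs give up their slices; busy VMs split the whole GPU.
        busy = [name for name, has_work in vms.items() if has_work]
        return {name: (1 / len(busy) if name in busy else 0.0)
                for name in vms}
    if policy == "equal_share":
        # Every powered-on VM gets the same slice, busy or not.
        return {name: 1 / len(vms) for name in vms}
    if policy == "fixed_share":
        # Slice is fixed by the vGPU profile, independent of VM count.
        return {name: 1 / max_vgpus_per_gpu for name in vms}
    raise ValueError(f"unknown policy: {policy}")

vms = {"cad-vm": True, "ml-vm": True, "idle-vm": False}
for policy in ("best_effort", "equal_share", "fixed_share"):
    print(policy, gpu_time_shares(policy, vms, max_vgpus_per_gpu=4))
```

In this model, Best Effort keeps the GPU fully used but lets busy VMs crowd each other, while Equal Share and Fixed Share trade some utilization for predictable per-VM time.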

NVIDIA supports the Best Effort vGPU scheduler on all supported architectures. On the NVIDIA Pascal and Volta architectures, it also supports the Equal Share and Fixed Share schedulers.

The diagrams below illustrate the Best Effort and Equal Share schedulers.

Best Effort Scheduler

Equal Share Scheduler

Hands-on Labs Interactive Simulation: vGPU Scheduling Options

This part of the lab is presented as a Hands-on Labs Interactive Simulation. This will allow you to experience steps which are too time-consuming or resource intensive to do live in the lab environment. In this simulation, you can use the software interface as if you are interacting with a live environment.




http://docs.hol.vmware.com/hol-isim/HOL-2019/hol-1947-01-vgpuscheduling.htm










(15 minutes) - Intermediate
• Module 8 - Maximizing the Power of vSphere for Diverse Workloads using GPUs



How to End Lab


Module 8 - Maximizing the Power of vSphere for Diverse Workloads using GPUs (15 minutes)

Introduction

In this module, we will use benchmarks to show what NVIDIA GPUs can do.

The benchmarks are started by a script that runs in a controller VM running Ubuntu Linux. Once the script is started, it remotely invokes the SPECapc 3ds Max 2015 benchmark on two VMs and MNIST on the CentOS VM. When the benchmarks run to completion, the VMs reboot automatically, which signals completion to the controller VM. We will start the benchmark now.
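
The controller's job can be pictured as "start everything at once, then time each run". The sketch below illustrates that pattern with Python's standard thread pool. It is not the lab's actual script: the VM names, the `run_remote` callable, and the `fake_run_remote` placeholder are all hypothetical, and a real controller would start the benchmarks over the network and wait for each VM to reboot.

```python
# Sketch of the controller logic: launch every benchmark at once and
# time how long each one takes. The job names and the run_remote callable
# are hypothetical stand-ins for the lab's real script.
import time
from concurrent.futures import ThreadPoolExecutor

def run_concurrently(jobs, run_remote):
    """jobs maps a VM name to the benchmark to start on it.
    run_remote(vm, benchmark) must block until that VM signals completion.
    Returns {vm: wall-clock seconds}."""
    def timed(vm, benchmark):
        start = time.perf_counter()
        run_remote(vm, benchmark)
        return vm, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(timed, vm, b) for vm, b in jobs.items()]
        return dict(f.result() for f in futures)

def fake_run_remote(vm, benchmark):
    # Placeholder: a real controller would start the benchmark over the
    # network and wait for the VM to reboot.
    time.sleep(0.05)

jobs = {
    "cad-vm-1": "SPECapc 3ds Max 2015",
    "cad-vm-2": "SPECapc 3ds Max 2015",
    "centos-vm": "MNIST",
}
wall_times = run_concurrently(jobs, fake_run_remote)
for vm, seconds in wall_times.items():
    print(f"{vm}: {seconds:.2f} s")
```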

The metric we'll use is simply the wall-clock time to complete the CAD and ML benchmarks. We'll compare the wall-clock time to run the ML and CAD benchmarks stand-alone with the time to run the CAD+ML benchmarks concurrently.

Prior to this lab, we ran the CAD benchmark stand-alone and recorded its wall-clock run time. Subsequently, we ran the ML benchmark stand-alone and recorded its wall-clock run time. These times are recorded in the file WT.txt, which is printed out once the concurrent ML+CAD run completes. From the data we can see that the ML benchmark sees no increase in run time due to sharing the server with CAD. The CAD benchmarks do not show any increase in run time due to sharing either (data for this is not shown in this lab).
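
The comparison described above comes down to a percentage change in wall-clock time. A minimal sketch follows; the two timings are made-up placeholders, not results from the lab, and the function name is illustrative.

```python
# Compare stand-alone and concurrent wall-clock times.
# The numbers below are made-up placeholders, not results from the lab.

def slowdown_percent(standalone_seconds, concurrent_seconds):
    """Percentage increase in run time caused by sharing the server."""
    return (concurrent_seconds - standalone_seconds) / standalone_seconds * 100

ml_standalone = 600.0   # hypothetical stand-alone ML run time
ml_concurrent = 606.0   # hypothetical ML run time alongside CAD

print(f"ML slowdown: {slowdown_percent(ml_standalone, ml_concurrent):.1f}%")
```

A value near zero corresponds to the "no increase in run time" outcome reported for this lab.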

What we have demonstrated in this lab is that NVIDIA GRID vGPU on vSphere is sufficiently powerful and capable of running diverse workloads concurrently with no noticeable drop in performance and with little or no special effort.

Hands-on Labs Interactive Simulation: Maximizing the Power of vSphere for Diverse Workloads using GPUs

This part of the lab is presented as a Hands-on Labs Interactive Simulation. This will allow you to experience steps which are too time-consuming or resource intensive to do live in the lab environment. In this simulation, you can use the software interface as if you are interacting with a live environment.




http://docs.hol.vmware.com/hol-isim/HOL-2019/hol-1947-01-gpudiverse.htm






Proceed to any module below which interests you most.




(15 minutes) - Intermediate
• Module 7 - vGPU Scheduling Options (15 minutes) - Intermediate


How to End Lab


Conclusion

Thank you for participating in the VMware Hands-on Labs. Be sure to visit http://hol.vmware.com/ to continue your lab experience online.

Lab SKU: HOL-1947-01-EMT

Version: 20200210-211126
