gScale : Improve vGPU Scalability - 01.org§ Slide20 is measured by running different 3D workload...

gScale : Improve vGPU Scalability Using Dynamic Resource Sharing

Aug 2016

Pei Zhang Pei.zhang@intel.com

Kevin Tian Kevin.tian@intel.com

Xiao Zheng Xiao.zheng@intel.com

Agenda

• vGPU Scalability of GVT-g

• gScale Design for Doubled vGPU Density

• Evaluation

• Summary

vGPU Scalability in GVT-g

GVT-g Introduction

• Full GPU virtualization solution with mediated pass-through approach

• Open source implementation for Xen/KVM (aka XenGT/KVMGT)

• Support a rich span of Intel® Processor Graphics

• Available in https://01.org/igvt-g

Near to native GPU performance

Full GPU capabilities

Flexible VMs sharing

(Up to 7VMs)+ +

>= 2X higher density!

gScale

• Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

• For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

Processor Graphics: Components

DisplayEngine

RenderEngine

System Memory

Global Graphics Virtual Memory

PPGTTPPGTTPPGTT

Graphics Virtual Memory

Per-Process Graphics Virtual Memory

Registers/MMIO

Aperture

• Parallel access from multiple engines due to split vCPU/vGPU scheduling

Shared Global Graphics Memory in GVT-g

DisplayEngine

RenderEngine

Global Graphics Virtual Memory

Aperture

By guest gfx driversthru

vCPU scheduling

By guest GPU commandsthru

vGPU scheduling

Upon Foreground/background

Display switch

Static Partitioning of Global Graphics Memory

0 4GB1GB

Global Graphics Memory

CPU Visible Memory CPU Invisible Memory

PCI MMIO Aperture

- Minimal 128MB visible/384MB invisible per vGPU- Means up to 7vGPUs possible (besides reserved for Dom0)- Some BIOS may support a smaller aperture which means more limitation

gScale Design for Doubled vGPU Density

gScale: per-VM Global Graphics Memory

ViewVM2

View Host View

Ballooned graphic memory

ViewVM2

View Host View

Switch between per-VM GGTT table

Shared GGTT space Per-VM GGTT space

Key challenge is to remove parallel accesses!

3 Components in Parallel Access to GGTT

• Render engine access

• Display engine access

• CPU tiled/non-tiled memory access

Render Engine Accesses

• At anytime, only one vGPU’s graphics memory is accessible by render engine

Controlled by vGPU scheduler

• Dynamically switch per-VM GGTT table at vGPU context switch

Render Engine

per-VM Global GTT

VM2 System MemoryVM1 System Memory

Part of vGPU context switch

Display Engine Accesses

• Display engine accesses is restricted to the Dom0 reserved region

Alias mapping of guest framebuffer into reserved graphics memory for Dom0

Display Engine

per-VM Global GTT

Dom0 reserved region

Tiled/non-Tiled and Aperture Memory

• What is tiled memory

• Tiled memory – aperture access

Linear surface Tiled surface

CPU Accesses to Non-Tiled Memory

per-VM Global GTT

EPT1 EPT2

vAperture1 vAperture2

Remove use of aperture

Bypass aperture and directly access system memory.

Mem access is coherent between CPU cores/GPU.

Aperture

CPU Access for Tiled Memory

per-VM Global GTT

EPT1 EPT2

vAperture1 vAperture2

Tiled Memory

FenceRegisters

Dynamic allocation of fence register and

aperture for tiled memory!

Aperture

Summary: Global Graphic memory access

DisplayEngine

RenderEngine

Aperture

Access Tiled mem:remap to DOM-0Reserved region

Only render owner can access memory

Remap to only accesses DOM-0 reserved region

Per-VM Global memory

System Memory

Access None-Tiled mem:pass through to

system memory directly

Optimization: Split per-VM GGTT into slots

• per-VM GTT switching is based on slots

• Switching for those only slots that shared

between other VMs

* Notes:

1. vGPU0 (for DOM0) has dedicated mem space slots which won’t be shared by other vGPUs.

Slot0(DOM0)

Slot1(DOMu)

Slot2(DOMu)

High Mem space

vGPU0*1

DOM0 Reserved

regionSlot0

(DOM0)

Slot1(DOMu)

Low Mem space

… …Available low/high space slot

Split sample

GGTT space

Whole per-VM GTT table size is about 2MB

Switching entire GTT table is time consumed.

Linux Windows

1. gScale-Basic is the data without memory slot split.

2. Performance data is sampled with 12 VMs.

Context switch performance- Private GTT table copy

Source: USENIX ATC (2015), gScale: Scaling up GPU Virtualization with Dynamic Sharing of Graphics Memory Space

Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.*Other names and brands may be claimed as the property of others.For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

Evaluation

Windows Guest: Scalability 3D

performance with 12 VMs could achieves

up to 80% of 1 VM.

Linux Guest: Scalability of 3D performance with 15VMs could achieves

up to 80% of 1 VM.

2D performance is better by burn out GPU power.

Linux VM

Windows VM

Scalability Evaluation

Source: USENIX ATC (2015), gScale: Scaling up GPU Virtualization with Dynamic Sharing of Graphics Memory Space

Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.*Other names and brands may be claimed as the property of others.For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

Summary

• gScale breaks graphics memory resource limitation by introducing per-VM GGTT design

• gScale doubles vGPU density in Intel®@ GVT-g with good scalability

• Converged performance of 15 vGPUs reaches 96% of native performance

• Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

Resource Links

• Project webpage and release: https://01.org/igvt-g

• Project public papers and document: https://01.org/group/2230/documentation-list

• Intel® IDF: GVT-g in Media Cloud: https://01.org/sites/default/files/documentation/sz15_sfts002_100_engf.pdf

• XenGT introduction in summit in 2015: http://events.linuxfoundation.org/sites/events/files/slides/XenGT-

Xen%20Summit-REWRITE%203RD%20v4.pdf

• XenGT introduction in summit in 2014: http://events.linuxfoundation.org/sites/events/files/slides/XenGT-

LinuxCollaborationSummit-final_1.pdf

Legal Notices and Disclaimers

§ Slide4 is measured by computing the VM creation number in HSW platform.

§ Slide18 is measured by computing the context switch time.

§ Slide20 is measured by running different 3D workload and compute the total FPS.

§ Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are

measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other

information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

§ All test in this document are with same hardware configuration: CPU Intel E3-1285 v3(4Cores, 3.6GHz), GPU Intel HD Graphics P4700, Memory 32GB, Storage SAMSUNG 850Pro 256GB*3.

Test is made by Xue, Mochi.

§ For more information go to http://www.intel.com/performance.

Intel, the Intel logo Intel® are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others

gScale : Improve vGPU Scalability - 01.org§ Slide20 is measured by running different 3D workload...

Documents

Have a question? Contact Us. - Engineer Ambitiously - NI · 2018. 1. 2. · Frame Rate (mono) 292 fps 85 fps 45.8 fps 21 fps Frame Rate (color) 147 fps 35 fps 19.9 fps 8.6 fps Pixel

VGPU ON KVM - linux-kvm.org · VGPU ON KVM VFIO BASED MEDIATED DEVICE FRAMEWORK. 2 AGENDA Background / Motivation Mediated Device Framework –Overview Mediated Device Framework –Deep-Dive

CORROSION INHIBITING SOLUTIONS PRODUCTGUIDE.... NITROGEN GENERATOR SPECIFICATIONS: FPS-500 FPS-900 FPS-1650 FPS-3250 FPS-5000 FPS-10000 FPS-16500 FPS-22500 Maximum FPS Capacity (Gallons)

Autodesk AutoCAD GRID vGPU scalability guide

Live Migration of vGPU - 01.org · 2019-06-27 · Live Migration of vGPU Aug 2016 Xiao Zheng xiao.zheng@Intel.com Kevin Tian kevin.tian@Intel.com. 2 ... Resume/Post-Copy Restore vGPU

FPS – What is RTI? FPS RTI wiki:

Nvidia grid and vGPU

FPS-650 and FPS-1250 - South-Tek Systems...FPS-650 and FPS-1250 South-Tek Systems Version: 6 Revision Date: 8/14/17 Page 2 of 31

GRID vGPU for VMware vSphere Version 367.43/369us.download.nvidia.com/Windows/Quadro_Certified/... · Validated Platforms GRID vGPU for VMware vSphere Version 367.43/369.17 RN-07347-001

Fps Pneumaticmechanic

FPS Prezentacija

Changelogcr4bd/3330/F2017/...Changelog Correctionsmadeinthisversionnotseeninfirstlecture: 24August2017: slide3(anoteonnextweek’sreading): adjustforlack ofquiz 24August2017: slide20

N2-Blast - Corrosion Inhibiting System - Viking Group Inc...FPS-650 FPS-1250 FPS-1750 FPS-3000 FPS-6000 FPS-15000 FPS-20000 Maximum FPS Capacity (Gallons) 650 1,250 1,750 3,000 6,000

GRID VGPU FOR CITRIX XENSERVER - …uk.download.nvidia.com/Windows/Quadro_Certified/GRID/332.83/331.59...1.3 GUEST OS This release of GRID vGPU includes support for the following guest

IDG MarketPulse: Virtual Graphics Processing Unit (vGPU)

Fps Brochure

AHV + NVIDIA VGPU INTEGRATIONon-demand.gputechconf.com/.../s7779-malcolm-crossley-nutanix.pdf · AHV + NVIDIA VGPU INTEGRATION Malcolm Crossley –AHV GPU Architect. NutanixAHV The

Instructions FPS-80, FPS-150, and FPS-210

FPS Gaming

FPS Calibration