A Look at Real World Performance
Capabilities of NVIDIA GRID™ 2.0 Synthetic Benchmarks vs The Real World
Jason Lee, NVIDIA; Fred Devoir, Textron; Luke Wignall, NVIDIA
Agenda
Introductions
Benchmarks: Introduction to the Real World
Click-to-Photons: The Next Level UX Test
Q & A
The Benchmark Introduced to Reality
Synthetic
Typical test is run on only a single VM
Benchmarks don't depict real workloads
Assume unrealistic scenarios like constant clicks or constant motion
Only test the raw performance of the environment
Simplified models testing only basic functions in an unrealistic way
Real Users
Typical users don't behave like any benchmark
Users don't do continuous work, which lets the environment take advantage of GPU time slicing and RAM overcommit
Most CAD/CAM apps have single-core vCPU requirements
Realistic model sizes - 40 GB in my case - exceed the frame buffer's maximum limits, requiring RAM overflow
Typical Performance of Server, 1 day : 7 days
Workday is exhibited in the graphs as hard offsets.
Behaviorally, Textron has room to improve server workload optimization during off-peak hours.
Leverage the environment for global engineering teams to get better utilization.
Network performance is not a limiting factor: 300+ users consume a total of <3 Gbps at the core switch the entire stack is connected to.
Each server consumes <30 Mbps.
The average server has 8 engineers performing full-assembly Catia manipulation.
Full-assembly models occupy roughly 40 GB of RAM per user.
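The bandwidth claims above can be sanity-checked with simple arithmetic. A minimal sketch, assuming roughly 300 users at 8 engineers per server (the server count is inferred, not stated in the talk):

```python
# Rough sanity check of the aggregate bandwidth numbers above.
# Assumption: ~300 users at 8 engineers per server (server count is inferred).
users = 300
engineers_per_server = 8
mbps_per_server = 30                          # upper bound per server from the talk

servers = -(-users // engineers_per_server)   # ceiling division -> 38 servers
aggregate_mbps = servers * mbps_per_server    # worst case if every server peaks

print(servers)                                # 38
print(aggregate_mbps)                         # 1140 Mbps, well under the 3 Gbps core
```

Even if every server peaked simultaneously, the aggregate stays comfortably below the observed <3 Gbps at the core switch.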
Early Performance Eval of GRID M60
All testing was performed on production workloads, on fully utilized servers, with 4-7 other engineers doing real work side by side with the test VM.
Density of GRID M60 virtual machines was 2x that of GRID K2 with no compromise on performance.
Tests were conducted on fully utilized production servers with 7 engineers running real work + 1 synthetic transaction VM, or 15 engineers running real work + 1 synthetic transaction VM.
[Chart: ViewPerf 11, single VM with user workload on a Lenovo M4 server; FPS axis 0-70; comparing M60-0Q vs K2 220Q, M60-2Q vs K2 260Q, M60-4Q vs K2 280Q]
Early Performance Eval of GRID M60
[Chart: Total throughput, two K2 vs two M60; FPS axis 0-500; 16 VMs vs 16 VMs vs 32 VMs, 4 vCPU each]
[Chart: ViewPerf 12, Catia average FPS, two K2 vs two M60; axis 0-25; K2-K240Q (16 VMs) vs M60-1Q (16 VMs) vs M60-1Q (32 VMs); catia-04 viewset, 4 vCPU each]
M60 shows 34% better performance than K2 at the same density (16 VMs)
M60 shows a 12% performance drop versus K2 at double the density (32 VMs vs 16 VMs)
But total throughput increases 74%: K2 with 16 VMs vs M60 with 32 VMs
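The per-VM and total-throughput figures above are mutually consistent; a quick check (the ~12% drop and VM counts come from the slide, the arithmetic is ours):

```python
# Per-VM performance drops ~12% on M60 at double density, but there are
# twice as many VMs, so aggregate throughput still rises sharply.
k2_vms, m60_vms = 16, 32
per_vm_ratio = 1 - 0.12          # M60 per-VM fps relative to the K2 baseline

throughput_gain = (m60_vms * per_vm_ratio) / k2_vms - 1
print(round(throughput_gain * 100))  # ~76% with these rounded inputs;
                                     # the measured figure in the talk is 74%
```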
Benchmarking Virtualized Environments
Typical workstation benchmarks are designed to stress all available system resources.
Multiple VMs running the same task at the same time is not a realistic test scenario.
Most scalability tests can only simulate the worst-case real-user scenario.
The ViewPerf 12 Catia viewset is a GPU-"heavy" process (zooming).
Benchmark vs. human workflow
User Load Testing - Iterative User Subsections (potential capacity: 32)
First subsection: add 50% of potential capacity (16 of 32) - PASS -> resulting capacity 16
Second subsection: add 50% of remaining potential capacity (+8) - PASS -> resulting capacity 24
Third subsection: add 50% of remaining potential capacity (+4, to 28) - FAIL -> resulting capacity stays 24
Fourth subsection: add 50% of the failed added quantity (+2) - PASS -> resulting capacity 26
Resulting capacity after user load testing: 26 of a potential 32
Baseline single user:
Assume potential capacity based on the single-user baseline.
Each subsection:
Baseline the system.
Evaluate performance against user feedback.
If performance is good, proceed to the next subsection increase.
If performance is bad, repeat the subsection at 50% of the increase.
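The procedure above behaves like a damped search toward true capacity: grow by 50% steps, and halve the step whenever user feedback fails. A minimal sketch, where `passes_user_feedback` stands in for the subjective per-subsection evaluation and the numbers mirror the 32-VM example:

```python
def find_capacity(potential, passes_user_feedback):
    """Grow the user count in 50% steps, halving the step on failure.

    `passes_user_feedback` is a stand-in for the subjective evaluation the
    talk describes (baseline the system, then ask users how it feels).
    """
    current, step = 0, potential // 2          # first subsection: 50% of potential
    while step > 0:
        candidate = current + step
        if passes_user_feedback(candidate):
            current = candidate
            step = (potential - current) // 2  # 50% of remaining potential
        else:
            step //= 2                         # retry at 50% of the failed increase
    return current

# Mirrors the slide's example: 28 concurrent users fails, 26 passes.
print(find_capacity(32, lambda n: n <= 26))    # -> 26
```

The search keeps halving past the slide's four subsections until the step reaches zero, converging on the same final capacity of 26.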
Click to Photons: The Next Level UX Testing
Click to Photons: Testing Latency
Test end-to-end latency: from a click at the client, to the change at the guest, and back.
From "click" to when the photons appear on the screen.
Measured with a high-speed camera, or a photodiode and oscilloscope.
Baseline - round trip on LAN: ~65 ms
Goals - increase snappiness, reduce sponginess
Benefits - a UX like local, or the same experience from further away
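With the high-speed-camera method, latency falls out of simple frame counting between the frame where the click is visible and the frame where the screen first changes. A hypothetical sketch (the frame rate and frame indices are made-up inputs, not measurements from the talk):

```python
def click_to_photon_ms(click_frame, photon_frame, camera_fps):
    """Latency between the frame where the mouse click becomes visible
    (e.g. an LED wired to the button) and the frame where the screen
    pixels first change."""
    return (photon_frame - click_frame) / camera_fps * 1000

# Hypothetical 240 fps capture: click seen at frame 100, pixels change at 129.
print(click_to_photon_ms(100, 129, 240))  # ~120.8 ms
```

Camera frame rate bounds the measurement resolution: at 240 fps each frame is ~4.2 ms, which is plenty for latencies in the 65-280 ms range discussed here.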
Click to Photon Demonstration: Local PowerPoint (Black-White Transition)
Round trip from mouse click (client) to screen update:
CLIENT (Kybd/Mse) -> SERVER with GRID GPU (Render -> Capture -> Encode) -> IP Network -> CLIENT (CPU/NIC -> Decode -> Render -> Screen)
Click Demonstration
Motion Demonstration
Reduces Overall latency
Source: NVIDIA GRID Performance Engineering Lab
End(Client)-to-End(Client) latency: round trip from mouse click (client) to screen update (ms; lower is better)

                      Host idle   Host fully loaded
Local laptop             65              -
VMware PCoIP            215             280
VMware Blast CPU        150             250
VMware Blast NVENC      120             120
• Protocol acceleration decreases latency:
• 20 ms~130 ms - Blast (H.264 SW vs H.264 NVENC HW)
• 95 ms~160 ms - PCoIP vs Blast (H.264 NVENC HW)
• If the host is fully loaded, latency increases.
• NVENC HW maintains the exact same latency even with a fully loaded host.
Available Resources
NVIDIA Performance Engineering Labs
NVIDIA GRID DASSAULT CATIA V5/V6 SCALABILITY GUIDE
Published March 2016: http://images.nvidia.com/content/pdf/grid/guides/dassault-systemes-catia-application-scalability-guide.pdf
NVIDIA Performance Engineering Labs
Click to Photon Lab guide available soon.
Request a copy by sending a tweet with the #clicktophoton hashtag to @NVIDIAGRID.
Questions