A Look at Real World Performance
Capabilities of NVIDIA GRID™ 2.0 Synthetic Benchmarks vs The Real World
Jason Lee, NVIDIA; Fred Devoir, Textron; Luke Wignall, NVIDIA
Agenda
Introductions
Benchmarks: Introduction to the Real World
Click-to-Photons: The Next Level UX Test
Q & A
The Benchmark Introduced to Reality
Synthetic
Typical test is run on only a single VM
Benchmarks don't depict real workloads
Assume unrealistic scenarios like constant clicks or constant motion
Only test the raw performance of the environment
Simplified models testing only basic functions in an unrealistic way
Real Users
Typical users don't behave like any benchmark
Users don't do continuous work, which lets the environment take advantage of GPU time slicing and RAM overcommit
Most CAD/CAM apps have single-core vCPU requirements
Realistic model sizes - 40 GB in my case - exceed the frame buffer's maximum limits, requiring RAM overflow
Typical Performance of Server, 1 day : 7 days
Workday is exhibited in the graphs as hard offsets.
Behaviorally, Textron has room to improve server workload optimization during off-peak hours.
Leverage the environment for global engineering teams to get better utilization.
Network performance is not a limiting factor: 300+ users consume a total of <3 Gbps at the core switch the entire stack is connected to.
Each server consumes <30 Mbps.
The average server has 8 engineers performing full-assembly Catia manipulation.
Full-assembly models occupy roughly 40 GB of RAM per user.
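The bandwidth claims above can be sanity-checked with simple arithmetic. A minimal sketch, assuming roughly 300 users at 8 engineers per server (the server count is inferred, not stated in the talk):

```python
# Rough sanity check of the aggregate bandwidth numbers above.
# Assumption: ~300 users at 8 engineers per server (server count is inferred).
users = 300
engineers_per_server = 8
mbps_per_server = 30                          # upper bound per server from the talk

servers = -(-users // engineers_per_server)   # ceiling division -> 38 servers
aggregate_mbps = servers * mbps_per_server    # worst case if every server peaks

print(servers)                                # 38
print(aggregate_mbps)                         # 1140 Mbps, well under the 3 Gbps core
```

Even if every server peaked simultaneously, the aggregate stays comfortably below the observed <3 Gbps at the core switch.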
Early Performance Eval of GRID M60
All testing was performed on production workloads, on fully utilized servers, with 4-7 other engineers doing real work side by side with the test VM.
Density of GRID M60 virtual machines was 2x that of GRID K2 with no compromise on performance.
Tests were conducted on fully utilized production servers with 7 engineers running real work + 1 synthetic transaction VM, or 15 engineers running real work + 1 synthetic transaction VM.
[Chart: ViewPerf 11, single VM with user workload on a Lenovo M4 server; FPS axis 0-70; comparing M60-0Q vs K2 220Q, M60-2Q vs K2 260Q, M60-4Q vs K2 280Q]
Early Performance Eval of GRID M60
[Chart: Total throughput, two K2 vs two M60; FPS axis 0-500; 16 VMs vs 16 VMs vs 32 VMs, 4 vCPU each]
[Chart: ViewPerf 12, Catia average FPS, two K2 vs two M60; axis 0-25; K2-K240Q (16 VMs) vs M60-1Q (16 VMs) vs M60-1Q (32 VMs); catia-04 viewset, 4 vCPU each]
M60 shows 34% better performance than K2 at the same density (16 VMs)
M60 shows a 12% performance drop versus K2 at double the density (32 VMs vs 16 VMs)
But total throughput increases 74%: K2 with 16 VMs vs M60 with 32 VMs
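The per-VM and total-throughput figures above are mutually consistent; a quick check (the ~12% drop and VM counts come from the slide, the arithmetic is ours):

```python
# Per-VM performance drops ~12% on M60 at double density, but there are
# twice as many VMs, so aggregate throughput still rises sharply.
k2_vms, m60_vms = 16, 32
per_vm_ratio = 1 - 0.12          # M60 per-VM fps relative to the K2 baseline

throughput_gain = (m60_vms * per_vm_ratio) / k2_vms - 1
print(round(throughput_gain * 100))  # ~76% with these rounded inputs;
                                     # the measured figure in the talk is 74%
```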
Benchmarking Virtualized Environments
Typical workstation benchmarks are designed to stress all available system resources.
Multiple VMs running the same task at the same time is not a realistic test scenario.
Most scalability tests can only simulate the worst-case real-user scenario.
The ViewPerf 12 Catia viewset is a GPU-"heavy" process (zooming).
Benchmark vs. human workflow
User Load Testing - Iterative User Subsections (potential capacity: 32)
First subsection: add 50% of potential capacity (16 of 32) - PASS -> resulting capacity 16
Second subsection: add 50% of remaining potential capacity (+8) - PASS -> resulting capacity 24
Third subsection: add 50% of remaining potential capacity (+4, to 28) - FAIL -> resulting capacity stays 24
Fourth subsection: add 50% of the failed added quantity (+2) - PASS -> resulting capacity 26
Resulting capacity after user load testing: 26 of a potential 32
Baseline single user:
Assume potential capacity based on the single-user baseline.
Each subsection:
Baseline the system.
Evaluate performance against user feedback.
If performance is good, proceed to the next subsection increase.
If performance is bad, repeat the subsection at 50% of the increase.
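The procedure above behaves like a damped search toward true capacity: grow by 50% steps, and halve the step whenever user feedback fails. A minimal sketch, where `passes_user_feedback` stands in for the subjective per-subsection evaluation and the numbers mirror the 32-VM example:

```python
def find_capacity(potential, passes_user_feedback):
    """Grow the user count in 50% steps, halving the step on failure.

    `passes_user_feedback` is a stand-in for the subjective evaluation the
    talk describes (baseline the system, then ask users how it feels).
    """
    current, step = 0, potential // 2          # first subsection: 50% of potential
    while step > 0:
        candidate = current + step
        if passes_user_feedback(candidate):
            current = candidate
            step = (potential - current) // 2  # 50% of remaining potential
        else:
            step //= 2                         # retry at 50% of the failed increase
    return current

# Mirrors the slide's example: 28 concurrent users fails, 26 passes.
print(find_capacity(32, lambda n: n <= 26))    # -> 26
```

The search keeps halving past the slide's four subsections until the step reaches zero, converging on the same final capacity of 26.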
Click to Photons: The Next Level UX Testing
Click to Photons: Testing Latency
Test end-to-end latency: from a click at the client, to the change at the guest, and back.
From "click" to when the photons appear on the screen.
Measured with a high-speed camera, or a photodiode and oscilloscope.
Baseline - round trip on LAN: ~65 ms
Goals - increase snappiness, reduce sponginess
Benefits - a UX like local, or the same experience from further away
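With the high-speed-camera method, latency falls out of simple frame counting between the frame where the click is visible and the frame where the screen first changes. A hypothetical sketch (the frame rate and frame indices are made-up inputs, not measurements from the talk):

```python
def click_to_photon_ms(click_frame, photon_frame, camera_fps):
    """Latency between the frame where the mouse click becomes visible
    (e.g. an LED wired to the button) and the frame where the screen
    pixels first change."""
    return (photon_frame - click_frame) / camera_fps * 1000

# Hypothetical 240 fps capture: click seen at frame 100, pixels change at 129.
print(click_to_photon_ms(100, 129, 240))  # ~120.8 ms
```

Camera frame rate bounds the measurement resolution: at 240 fps each frame is ~4.2 ms, which is plenty for latencies in the 65-280 ms range discussed here.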
Click to Photon Demonstration: Local PowerPoint (Black-White Transition)
Round trip from mouse click (client) to screen update:
CLIENT (Kybd/Mse) -> SERVER with GRID GPU (Render -> Capture -> Encode) -> IP Network -> CLIENT (CPU/NIC -> Decode -> Render -> Screen)
Click Demonstration
Motion Demonstration
Reduces Overall latency
Source: NVIDIA GRID Performance Engineering Lab
End(Client)-to-End(Client) latency: round trip from mouse click (client) to screen update (ms; lower is better)

                      Host idle   Host fully loaded
Local laptop             65              -
VMware PCoIP            215             280
VMware Blast CPU        150             250
VMware Blast NVENC      120             120
• Protocol acceleration decreases latency:
• 20 ms~130 ms - Blast (H.264 SW vs H.264 NVENC HW)
• 95 ms~160 ms - PCoIP vs Blast (H.264 NVENC HW)
• If the host is fully loaded, latency increases.
• NVENC HW maintains the exact same latency even with a fully loaded host.
Available Resources
NVIDIA Performance Engineering Labs
NVIDIA GRID DASSAULT CATIA V5/V6 SCALABILITY GUIDE
Published March 2016: http://images.nvidia.com/content/pdf/grid/guides/dassault-systemes-catia-application-scalability-guide.pdf
NVIDIA Performance Engineering Labs
Click to Photon Lab guide available soon.
Request a copy by sending a tweet with the #clicktophoton hashtag to @NVIDIAGRID.
Questions