
Dell EMC Ready Solutions for VDI
Designs for VMware Horizon on VxRail and vSAN Ready Nodes
Validation Guide

Abstract

This validation guide describes the architecture and performance of the integration of VMware Horizon components for virtual desktop infrastructure (VDI) on Dell EMC vSAN Ready Nodes and Dell EMC VxRail appliances.

Dell Technologies Solutions

Part Number: H17748.2
July 2020

Notes, cautions, and warnings

NOTE: A NOTE indicates important information that helps you make better use of your product.

CAUTION: A CAUTION indicates either potential damage to hardware or loss of data and tells you how to avoid the problem.

WARNING: A WARNING indicates a potential for property damage, personal injury, or death.

© 2019-2020 Dell Inc. or its subsidiaries. All rights reserved. Dell, EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be trademarks of their respective owners.

Contents

Chapter 1: Executive Summary
    Document purpose
    Audience
    We value your feedback

Chapter 2: Test Environment Configuration and Best Practices
    Validated hardware resources
    Validated software resources
    Virtual networking configuration
    Management server infrastructure
        NVIDIA Virtual GPU Software License Server
        SQL Server databases
        DNS
    High availability
    VMware Horizon
        VMware Horizon solution architecture
        VMware Horizon clone technology

Chapter 3: Login VSI Performance Testing
    Login VSI performance testing process
        Resource monitoring
        Load generation
        Login VSI workloads
        Desktop VM test configurations
    Login VSI test results and analysis
        Summary of test results
        Knowledge Worker, 510 users, ESXi 6.7u3, Horizon 7.10, Windows 1803
        Knowledge Worker, 485 users, ESXi 6.7u3, Horizon 7.10, Windows 1909
        Power Worker, 385 users, ESXi 6.7u3, Horizon 7.10, Windows 1909
        Multimedia Worker, 48 users, ESXi 6.7u3, Horizon 7.10, Windows 1909

Chapter 4: NVIDIA nVector Performance Testing
    nVector performance testing process
        Load generation
        nVector Knowledge Worker workload
        Resource monitoring
        Desktop VM configurations
    nVector performance test results and analysis
        Summary of test results
        nVector Knowledge Worker, 48 vGPU users, ESXi 6.7u3, Horizon 7.10
        nVector Knowledge Worker, 48 users, non-graphics, ESXi 6.7u3, Horizon 7.10

Chapter 5: Conclusion
    Test results and density recommendations
    Summary

Chapter 6: References
    Dell Technologies documentation
    VMware documentation
    NVIDIA documentation

Appendix A: Login VSI metrics

Executive Summary

This chapter presents the following topics:

Topics:

• Document purpose
• Audience
• We value your feedback

Document purpose

This validation guide details the architecture, components, testing methods, and test results for Dell EMC VxRail appliances and vSAN Ready Nodes with VMware Horizon 7. It includes the test environment configuration and best practices for systems that have undergone testing.

Audience

This guide is intended for architects, developers, and technical administrators of IT environments. It provides an in-depth explanation of the testing methodology and the basis for VDI densities. It also demonstrates the value of Dell EMC Ready Solutions for VDI, which deliver Microsoft Windows virtual desktops to users of VMware Horizon 7 VDI components on VxRail appliances or vSAN Ready Nodes.

We value your feedback

Dell Technologies and the authors of this document welcome your feedback on the solution and the solution documentation. Contact the Dell EMC Solutions team by email ([email protected]) or provide your comments by completing our documentation survey (https://www.surveymonkey.com/r/SolutionsSurveyExt).

Authors: Dell EMC Ready Solutions for VDI Team.

NOTE: The following website provides additional documentation for VDI Ready Solutions: VDI Info Hub for Ready Solutions (https://infohub.delltechnologies.com/t/vdi/).

Test Environment Configuration and Best Practices

This chapter presents the following topics:

Topics:

• Validated hardware resources
• Validated software resources
• Virtual networking configuration
• Management server infrastructure
• High availability
• VMware Horizon

Validated hardware resources

The Dell EMC Ready Solutions for VDI team validated the Horizon solution on Dell EMC VxRail appliances with the specific hardware resources listed in this section.

Enterprise platforms

We performed the testing with the Density Optimized configuration. Configuration details are given in the following table:

Table 1. Validated hardware configurations

Server configuration:  Density Optimized
Platform:              VxRail V570F
CPU:                   2 x Intel Xeon Gold 6248 (20 cores, 2.5 GHz)
Memory:                768 GB @ 2,933 MT/s
RAID controller:       HBA330 adapter, 16.17.00.03
HD configuration:      Cache: 2 x 800 GB SAS SSD, 2.5-inch disk drives
                       Capacity: 6 x 1.92 TB NL SAS SSD, 2.5-inch disk drives
Network:               Broadcom dual-port 25 GbE rNDC, 21.40.16.60

NOTE: With the introduction of the six-channels-per-CPU requirement for Skylake, and now Cascade Lake, processors, the Density Optimized memory configuration recommendation has increased from the previous guidance of 512 GB to 768 GB. This change was necessary to ensure a balanced memory configuration and optimized performance for your VDI solution. The additional memory is advantageous, considering the resulting increase in operating system resource utilization and the enhanced experience for users when they have access to additional memory allocations.
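
As a worked example of what a balanced configuration means here (assuming the common one-DIMM-per-channel population, which this guide does not specify): 2 CPUs x 6 memory channels per CPU x one 64 GB DIMM per channel = 12 x 64 GB = 768 GB, which populates every channel on both Cascade Lake CPUs evenly.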

Graphics hardware

We used NVIDIA T4 Tensor Core GPUs for the graphics workload testing. The NVIDIA T4 is a single-slot, 70 W, PCI Express Gen3 universal GPU with 16 GB GDDR6 memory, built for data center workflows. The NVIDIA T4 is flexible enough to run Knowledge Worker VDI or professional graphics workloads. You can configure up to six NVIDIA T4 GPU cards in a VxRail V570F appliance to enable 96 GB of graphics frame buffer. For modernized data centers, you can use this card during off-peak hours to run inferencing workloads.

Network hardware

We used the following network hardware for this testing:

● Dell EMC Networking S3048-ON (1 GbE ToR switch): The S3048-ON switch accelerates applications in high-performance environments with a low-latency top-of-rack (ToR) switch that features 48 x 1 GbE and 4 x 10 GbE ports, a dense 1U design, and up to 260 Gbps performance. This switch also supports ONIE for zero-touch installation of network operating systems.

● Dell EMC Networking S5248F-ON (25 GbE ToR switch): The S5248F-ON switch provides optimum flexibility and cost-effectiveness for demanding compute and storage traffic environments. This ToR switch features 48 x 25 GbE SFP28 ports, 4 x 100 GbE QSFP28 ports, and 2 x 100 GbE QSFP28-DD ports. The switch also supports ONIE.

For more information, see PowerSwitch Data Center Switches (https://www.dell.com/en-ie/work/shop/networking/sc/switches/high-performance-ethernet-switches).

Validated software resources

We validated the solution with the software components listed in the following table:

Table 2. Software components

Component                         Software
Hypervisor                        VMware ESXi 6.7u3
Broker technology                 VMware Horizon 7.10
Broker database                   Microsoft SQL Server 2016
Management VM operating system    Microsoft Windows Server 2016 (connection server and database)
Virtual desktop operating system  Microsoft Windows 10 Enterprise, 64-bit
Office application suite          Microsoft Office 2019 Professional Plus
Platform                          Dell EMC VxRail 4.7.410
Testing software                  Login VSI 4.1.32.1; NVIDIA vGPU 10.1 (for graphics testing)

Virtual networking configuration

We used 25 GbE networking for this validation effort. The VLAN configurations used in the testing were as follows:

● VLAN configuration:
○ Management VLAN: Configured for hypervisor infrastructure traffic; L3 routed using the core switch
○ VDI VLAN: Configured for VDI session traffic; L3 routed using the core switch
○ VMware vSAN VLAN: Configured for VMware vSAN traffic; L2 switched only, using the ToR switch
○ vMotion VLAN: Configured for live migration traffic; L2 switched only, trunked from the core (HA only)
○ VDI Management VLAN: Configured for VDI infrastructure traffic; L3 routed using the core switch
● An iDRAC VLAN was configured for all hardware management traffic; L3 routed using the core switch

Management server infrastructure

The following table lists the management server infrastructure components and their sizing recommendations:

Table 3. Management server sizing

Component                     vCPU  RAM (GB)  NIC  OS + data vDisk (GB)  Tier 2 volume (GB)
VMware vCenter Appliance      2     16        1    290                   -
Platform Services Controller  2     2         1    30                    -
Horizon Connection Server     4     16        1    40                    -
SQL Server                    4     8         1    40                    210 (VMDK)
File server                   1     4         1    40                    2,048 (VMDK)
VxRail Appliance Manager      2     8         1    32                    -
nVector Management VM         8     32        1    250                   -
NVIDIA vGPU License Server    2     4         1    40 + 5                -

NVIDIA Virtual GPU Software License Server

When using NVIDIA vGPUs, graphics-enabled VMs must obtain a license from an NVIDIA vGPU Software License Server on your network.

We installed the vGPU license server software on a system running the Windows Server 2016 operating system to test vGPU configurations.

We made the following changes to the NVIDIA license server to address licensing requirements:

● Used a reserved fixed IP address
● Configured a single MAC address
● Applied time synchronization to all hosts on the same network

SQL Server databases

During validation, a single dedicated SQL Server 2016 VM hosted the VMware databases in the management layer. We separated SQL data, logs, and tempdb onto their respective volumes and created a single database for Horizon Connection Server.

DNS

DNS is the basis for Microsoft Active Directory and also controls access to various software components for VMware services. All hosts, VMs, and consumable software components must have a presence in DNS. We used a dynamic namespace integrated with Active Directory and adhered to Microsoft's best practices.

High availability

Although we did not enable high availability (HA) during the validation that is documented in this guide, we strongly recommend that you factor HA into any VDI design and deployment. This process follows the N+1 model with redundancy at both the hardware and software layers. The design guide for this architecture provides additional recommendations for HA and is available at the VDI Info Hub for Ready Solutions (https://infohub.delltechnologies.com/t/vdi/).

VMware Horizon

VMware Horizon 7 provides the centralized management, agility, and simplicity that is required for your virtual desktop infrastructure. With Horizon 7, your workstations reside inside the data center premises, making the provisioning, maintenance, and recovery of virtual workstations easier. Horizon 7 with the VMware Just-in-Time Management Platform (JMP) can provision and deliver virtual desktops and applications in a fast, flexible, and personalized manner. JMP uses Instant Clones for ultra-fast provisioning of desktops, App Volumes for real-time application delivery, and Dynamic Environment Manager for user-profile management, personalization, and dynamic policy configuration to deliver an experience with the simplicity of non-persistent management. For more information, see the Horizon resources page on the VMware product resources website (https://www.vmware.com/products/horizon.html#resources).

VMware Horizon solution architecture

Figure 1 depicts the architecture of the validated solution, including the network, compute and graphics, management, and storage layers. The solution runs on the VxRail HCI platform based on VMware vSAN software-defined storage. See the design guide for this solution on the VDI Info Hub (https://infohub.delltechnologies.com/t/vdi/) for more information about the solution design.

This architecture aligns with the VMware Horizon block and pod design. A pod is made up of management servers and a group of interconnected Horizon Connection Servers that broker connections to desktops or published applications. A pod has multiple blocks to provide scalability. A block is a collection of one or more resource vSphere clusters hosting pools of desktops or applications. Each block has a dedicated vCenter Server and Composer servers (if linked clones are used). A vSphere cluster can have a maximum of 64 nodes and 6,400 VMs per vSAN cluster. To expand beyond this limit, you can add clusters and balance the VMs and nodes across the new clusters. For more information about Horizon component design, see the Horizon Reference Architecture available on VMware TechZone (https://techzone.vmware.com/resource/workspace-one-and-horizon-reference-architecture#component_design_horizon_7).
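
As a simple capacity check using these limits (the desktop count is illustrative, not from this validation): a deployment of 10,000 desktop VMs would need at least ceil(10,000 / 6,400) = 2 blocks, each with its own dedicated vCenter Server.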

We validated this solution with the Login VSI and NVIDIA nVector performance tools. For this validation effort, we used a 4-node VxRail cluster. One of the hosts was used for both management and compute VMs, and the other three hosts were used only for compute VMs. For the test involving graphics workloads, only one compute node was used, with six NVIDIA T4 Tensor Core GPUs configured on that host. The deployment option for this Dell EMC Ready Solutions for VDI solution supports all cloning techniques available from VMware: full, instant, and linked clones.

Figure 1. VMware Horizon on VxRail

VMware Horizon clone technology

VMware Horizon 7 offers the following methods for cloning desktops:

● Full clones: These are typically used for testing purposes or to create management VMs. Full clones are not ideal for VDI because full copies have no connection to the original VM; updates must be performed on each VM with this approach.

● Instant clones: These are available only with Horizon 7 Enterprise licenses. This technology provisions a VM the instant a user requests one. The result is a far easier approach to operating system updates and patch management because the VM is created near the time of login. You can use the combination of JMP features such as App Volumes and Dynamic Environment Manager to emulate persistence. For the Login VSI testing, we created desktop VMs with instant clones. The instant clone VMs are re-created after the user logs off.

● Linked clones: These require fewer storage resources than full clones. This technology is appropriate for many VDI use cases. Differences between the parent VM and the clone are maintained in a delta file. While updates can be rolled out effectively, multiple VM rebuilds are required to deploy a patch correctly at the operating system level. Operating system updates are rolled out to the parent images, and then the desktop pool is pointed to the new snapshot with the updates. A Horizon Composer instance is required with linked clones to manage the recompose functions of the pool. For the nVector graphics performance testing, we created desktop VMs with linked clones.

Login VSI Performance Testing

This chapter presents the following topics:

Topics:

• Login VSI performance testing process
• Login VSI test results and analysis

Login VSI performance testing process

We conducted the performance analysis and characterization (PAAC) testing on this solution using the Login VSI load-generation tool. Login VSI is an industry-standard tool for benchmarking VDI workloads. It uses a carefully designed, holistic methodology that monitors both hardware resource utilization parameters and end-user experience (EUE) during load testing.

We tested each user load with four runs: a pilot run to validate that the infrastructure was performing properly and valid data could be captured, and three subsequent runs to enable data correlation.

During testing, while the environment was under load, we logged in to a session and completed tasks that correspond to the user workload. While this test is subjective, it helps to provide a better understanding of the EUE in the desktop sessions, particularly under high load. It also helps to ensure reliable data gathering.

Resource monitoring

To ensure that the user experience was not compromised, we monitored the following important resources:

● Compute host servers: Solutions based on VMware vCenter for VMware vSphere gather key data (CPU, memory, disk, and network usage) from each of the compute hosts during each test run. This data is exported to .csv files for single hosts and then consolidated to show data from all hosts. While the report does not include specific performance metrics for the management host servers, these servers are monitored during testing to ensure that they are performing at an expected level with no bottlenecks.

● Hardware resources: Resource overutilization can cause poor EUE. We monitored the relevant resource utilization parameters and compared them to relatively conservative thresholds. The thresholds, as shown in the following table, were selected based on industry best practices and our experience to provide an optimal trade-off between good EUE and cost-per-user while also allowing sufficient burst capacity for seasonal or intermittent spikes in demand.

Table 4. Parameter pass/fail thresholds for steady state utilization

Parameter                         Pass/fail threshold
Physical host CPU utilization     85% (a)
Physical host memory utilization  85%
Network throughput                85%
Disk latency                      20 milliseconds
Login VSI failed sessions         2%

(a) The Ready Solutions for VDI team recommends that average CPU utilization not exceed 85 percent in a production environment. A 5 percent margin of error was allocated for this validation effort, so CPU utilization sometimes exceeds our recommended percentage. Because of the nature of Login VSI testing, these exceptions are reasonable for determining our sizing guidance.

● GPU resources: vSphere Client monitoring collects data about GPU resource use from a script that is run on ESXi 6.7 and later hosts. The script runs for the duration of the test and contains NVIDIA System Management Interface (nvidia-smi) commands. The commands query each GPU and log the GPU processor, temperature, and memory use data to a .csv file, as illustrated in the sketch that follows.
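
The guide does not reproduce the monitoring script itself. The following Python sketch illustrates the approach under stated assumptions: the nvidia-smi query fields are real options of that tool, while the output file name, sampling interval, and test duration are placeholders.

```python
# Minimal sketch of a GPU metrics collector (not the validation team's script).
# Polls nvidia-smi for each GPU and appends processor, temperature, and memory
# use to a .csv file for the duration of a test run.
import csv
import subprocess
import time

QUERY_FIELDS = "timestamp,index,utilization.gpu,temperature.gpu,memory.used,memory.total"

def sample_gpus():
    """Return one row per GPU from nvidia-smi's CSV query interface."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [line.split(", ") for line in out.stdout.strip().splitlines()]

def monitor(path="gpu_metrics.csv", interval_s=5, duration_s=3600):
    """Log a sample for every GPU each interval_s seconds until duration_s elapses."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(QUERY_FIELDS.split(","))
        end = time.time() + duration_s
        while time.time() < end:
            writer.writerows(sample_gpus())
            time.sleep(interval_s)

if __name__ == "__main__":
    monitor()
```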


Load generation

Login VSI installs a standard collection of desktop application software, including Microsoft Office and Adobe Acrobat Reader, on each VDI desktop testing instance. It then uses a configurable launcher system to connect a specified number of simulated users to the available desktops within the environment. When the simulated user is connected, a login script configures the user environment and starts a defined workload. Each launcher system can launch connections to several VDI desktops (target machines). A centralized management console configures and manages the launchers and the Login VSI environment.

We used the following login and boot conditions:

● Users were logged in within a login timeframe of 1 hour, except during testing of low-density solutions such as GPU/graphics-based configurations, in which case users were logged in every 10 to 15 seconds.
● All desktops were started before users were logged in.

Login VSI workloads

The following table describes the Login VSI workloads that we tested:

Table 5. Login VSI workloads

Knowledge Worker: Designed for virtual machines with 2 vCPUs. This workload includes the following activities:
● Microsoft Outlook: Browse messages.
● Internet Explorer: Browse websites and open a YouTube-style video (480p movie trailer) three times in every loop.
● Microsoft Word: Start one instance to measure response time and another to review and edit a document.
● Doro PDF Printer and Acrobat Reader: Print a Word document and export it to PDF.
● Microsoft Excel: Open a large randomized sheet.
● Microsoft PowerPoint: Review and edit a presentation.
● FreeMind: Run a Java-based mind-mapping application.
● Other: Perform various copy and zip actions.

Power Worker: The most intensive of the standard Login VSI workloads. The following activities are performed with this workload:
● Begin by opening four instances of Internet Explorer and two instances of Adobe Reader, which remain open throughout the workload.
● Perform more PDF printer actions than in the other workloads.
● Watch a 720p and a 1080p video.
● Reduce the idle time to two minutes.
● Perform various copy and zip actions.

Multimedia Worker: A workload designed to heavily stress the CPU when using software graphics acceleration. GPU-accelerated computing offloads the most compute-intensive sections of an application to the GPU while the CPU processes the remaining code. This modified workload uses the following applications for its GPU/CPU-intensive operations:
● Adobe Acrobat
● Google Chrome
● Google Earth
● Microsoft Excel
● HTML5 3D spinning balls
● Internet Explorer
● MP3
● Microsoft Outlook
● Microsoft PowerPoint
● Microsoft Word
● Streaming video

Desktop VM test configurations

The following table summarizes the desktop VM configurations used for the Login VSI workloads that we tested:

Table 6. Desktop VM specifications

Login VSI workload  vCPUs  ESXi configured memory  ESXi reserved memory  Screen resolution  Operating system
Knowledge Worker    2      4 GB                    2 GB                  1920 x 1080        Windows 10 Enterprise 64-bit
Power Worker        2      8 GB                    4 GB                  1920 x 1080        Windows 10 Enterprise 64-bit
Multimedia Worker   4      8 GB                    8 GB                  1920 x 1080        Windows 10 Enterprise 64-bit

Login VSI test results and analysis

Summary of test results

The following table summarizes the host utilization metrics for the different Login VSI workloads that we tested and the user density derived from Login VSI performance testing:

Table 7. Login VSI test results summary

Server configuration        Login VSI workload              Operating system  User density  Avg CPU (a)  Avg GPU  Avg active memory  Avg IOPS per user  Avg network per user
Density Optimized           Knowledge Worker                Windows 10, 1803  130           86%          N/A      140 GB             15.9               1.55 Mbps
Density Optimized           Knowledge Worker                Windows 10, 1909  125           85%          N/A      134 GB             7.49               1.52 Mbps
Density Optimized           Power Worker                    Windows 10, 1909  100           87%          N/A      154 GB             8.45               2.31 Mbps
Density Optimized + 6 x T4  Multimedia Worker (T4-2B vGPU)  Windows 10, 1909  48            80%          22%      392 GB             16                 21 Mbps

(a) The Ready Solutions for VDI team recommends that average CPU utilization not exceed 85 percent in a production environment. A 5 percent margin of error was allocated for this validation effort, so CPU utilization sometimes exceeds our recommended percentage. Because of the nature of Login VSI testing, these exceptions are reasonable for determining our sizing guidance.

As shown in the table, the CPU was the bottleneck in all the test cases. In all but the Multimedia Worker test, the CPU utilization metric reached the 85 percent (+5 percent margin) threshold that we set for CPU utilization. These threshold values, as shown in Table 4, are carefully selected to deliver an optimal combination of excellent EUE and cost-per-user while also providing burst capacity for seasonal or intermittent spikes in usage. We do not load the system beyond these thresholds to reach a Login VSImax (Login VSImax shows the number of sessions that can be active on a system before the system is saturated).

Memory was not a constraint during testing. The total memory of 768 GB was enough for all of the Login VSI workloads to run without any constraints. With a dual-port 25 GbE NIC available on the hosts, network bandwidth was also not an issue. Disk latency was also under the threshold that we set, and disk performance was good.

For the multimedia workload test, the maximum number of users that can be accommodated on the host, with each user having an NVIDIA T4-2B vGPU profile, is 48. The total available frame buffer on the host with six NVIDIA T4 GPUs configured is 96 GB. The Login VSI scores and host metric results indicate that user experience and performance were good during the running of this graphics-intensive workload.
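
The 48-user figure follows directly from the frame buffer arithmetic: 6 GPUs x 16 GB = 96 GB of total frame buffer, and 96 GB / 2 GB per T4-2B profile = 48 vGPU-enabled desktops.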

We recommend these user densities based on the Login VSI test results and the thresholds that we set for the host utilization parameters. To maintain good EUE, do not exceed these thresholds. You can load more user sessions and exceed these thresholds, but you might experience a degradation in user experience.

The host utilization metrics mentioned in the table are defined as follows:

● User density: The number of users per compute host that successfully completed the workload test within the acceptable resource limits for the host. For clusters, this number reflects the average of the density achieved for all compute hosts in the cluster.

● Average CPU: The average CPU usage over the steady state period. For clusters, this number represents the combined average CPU usage of all compute hosts. On the latest Intel processors, the ESXi host CPU metrics exceed the rated 100 percent for the host if Turbo Boost is enabled, which is the default setting. An additional 35 percent of CPU is available from the Turbo Boost feature, but this additional CPU headroom is not reflected in the VMware vSphere metrics where the performance data is gathered.

● Average active memory: For ESXi hosts, the amount of memory that is actively used, as estimated by the VMkernel based on recently touched memory pages. For clusters, this is the average amount of physical guest memory that is actively used across all compute hosts over the steady state period.

● Average IOPS per user: IOPS calculated from the average cluster disk IOPS over the steady state period divided by the number of users (see the sketch after this list).

● Average network usage per user: Average network usage on all hosts calculated over the steady state period divided by the number of users.
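
A minimal sketch of this per-user arithmetic (an illustrative helper, not part of Login VSI or vCenter):

```python
# Sketch: derive the per-user averages from steady state cluster samples.
from statistics import mean

def per_user_averages(read_iops, write_iops, net_mbps, users):
    """Average the steady state samples, then divide by the user count."""
    avg_iops = mean(read_iops) + mean(write_iops)  # combined read + write IOPS
    return avg_iops / users, mean(net_mbps) / users

# Using the 510-user Knowledge Worker steady state averages reported below
# (read 5,362 IOPS, write 2,749 IOPS, network 792 Mbps):
iops_pu, mbps_pu = per_user_averages([5362], [2749], [792], 510)
print(f"{iops_pu:.1f} IOPS/user, {mbps_pu:.2f} Mbps/user")  # ~15.9 and ~1.55
```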


Knowledge Worker, 510 users, ESXi 6.7u3, Horizon 7.10, Windows 1803

We performed this test with the Login VSI Knowledge Worker workload. The test was performed on a 4-node VxRail cluster. We created the desktop VMs using VMware Horizon instant clone technology. We used the VMware Horizon Blast Extreme display protocol. Host 1 hosted both management and desktop VMs. We populated the compute hosts with 130 desktop VMs each and the management host with 120 desktop VMs.

CPU usage

The following graphs show the CPU utilization across the four hosts during the testing. CPU usage with all VMs powered on was approximately 10 percent before the test started. The CPU usage steadily increased during the login phase, as shown in Figure 2.

Figure 2. CPU usage

During the steady state phase, an average CPU utilization of 86 percent was recorded. This value is close to the pass/fail threshold that we set for average CPU utilization (see Table 4). To maintain good EUE, do not exceed this threshold. You can load more user sessions while exceeding this threshold for CPU, but you might experience a degradation in user experience.

As shown in Figure 3, the CPU readiness was well below the 5 percent threshold that we set. The average steady state CPU core utilization across the four hosts was 75 percent, as shown in Figure 4.

Figure 3. CPU readiness

Figure 4. CPU core utilization

Memory

We observed no memory constraints during the testing on either the management or compute hosts. Out of 768 GB of available memory per node, the compute host reached a maximum consumed memory of 581 GB and a steady state average of 531 GB. Active memory usage reached a maximum of 326 GB and recorded a steady state average of 140 GB. There was no memory ballooning or swapping on the hosts.

Figure 5. Consumed memory usage

Figure 6. Active memory usage

Network usage

Network bandwidth was not an issue during the testing. The network usage recorded a steady state average of 792 Mbps. The busiest period for network traffic was during the logout phase, when a peak value of 5,852 Mbps was recorded. The average steady state network usage per user was 1.55 Mbps.

Figure 7. Network usage

Cluster IOPS

Cluster IOPS reached a maximum value of 17,368 for read IOPS and 5,152 for write IOPS during the steady state phase. The average steady state read and write IOPS were 5,362 and 2,749, respectively. The average disk IOPS (read + write) per user was 15.9.

Figure 8. Cluster IOPS

Disk I/O latency

Cluster disk latency reached a maximum read latency of 0.12 milliseconds and a maximum write latency of 0.2 milliseconds during the logout phase. The average steady state read latency was 0.25 milliseconds, and the average steady state write latency was 0.83 milliseconds.

Figure 9. Disk latency

User experience

The baseline score for the Login VSI test was 835. This score falls in the 800 to 1,199 range rated as "Good" by Login VSI. For more information about Login VSI baseline ratings and baseline calculations, see the VSImax baseline scores article (https://support.loginvsi.com/hc/en-us/articles/115004421905-VSImax-baseline-scores). We ran the Login VSI test for 510 user sessions with the Knowledge Worker workload. As indicated by the blue line in Figure 10, the system reached a VSI index average score of 1,259 when 510 sessions were loaded. This value is well below the VSI threshold score of 1,836 set by the Login VSI tool. During the testing, VSImax was never reached, which typically indicates a stable system and a better user experience.

The Login VSImax user experience score for this test was not reached. When manually interacting with the sessions during the steady state phase, the mouse and window movement were responsive, and video playback was good. No "stuck sessions" were reported during the testing, indicating that the system was not overloaded at any point. See Appendix A, which explains the Login VSI metrics.

Figure 10. Login VSI graph

Table 8. Login VSI score summary

Login VSI baseline  VSI index average  VSImax reached  VSI threshold
835                 1,259              No              1,836

Knowledge Worker, 485 users, ESXi 6.7u3, Horizon 7.10, Windows 1909

We performed this test with the Login VSI Knowledge Worker workload. The test was performed on a 4-node VxRail cluster. We created the desktop VMs using VMware Horizon instant clone technology. Host 1 hosted both management and desktop VMs. We populated each compute host with 125 desktop VMs and the management host with 110 desktop VMs. We used the VMware Horizon Blast Extreme display protocol.

CPU usage

The following graphs show the CPU utilization across the four hosts during the testing. CPU usage with all VMs powered on was approximately 7 percent before the test started. The CPU usage steadily increased during the login phase. During the steady state phase, an average CPU utilization of 85 percent was recorded. This value is close to the pass/fail threshold that we set for average CPU utilization (see Table 4). To maintain good EUE, do not exceed this threshold. You can load more user sessions while exceeding this threshold, but you might experience a degradation in user experience.

Figure 11. CPU usage

As shown in Figure 12, the CPU readiness was well below the 5 percent threshold that we set.

Figure 12. CPU readiness

The average steady state CPU core utilization across the four hosts was 75 percent, as shown in Figure 13.

Figure 13. CPU core utilization

Memory

We observed no memory constraints during the testing on either the management or compute hosts. Out of 768 GB of available memory per node, the compute host reached a maximum consumed memory of 550 GB and a steady state average of 529 GB. Active memory usage reached a maximum of 293 GB and recorded a steady state average of 134 GB. There was no memory ballooning or swapping on the hosts.

Active memory utilization was reduced to a minimum when users logged out of their sessions. There was an increase in active memory usage during the re-create phase. This peak in active memory usage is expected during the instant clone re-creation process because all VMs that are destroyed after user logout have to be re-created, which is a memory-intensive task.

Figure 14. Consumed memory usage

Figure 15. Active memory usage

Network usage

Network bandwidth was not an issue in this test. A steady state average of 740 Mbps was recorded. The busiest period for network traffic was during the steady state phase. Compute host 2 recorded a peak network usage of 2,662 Mbps. The average steady state network usage per user was 1.52 Mbps.

Figure 16. Network usage

Cluster IOPS

Cluster IOPS reached a maximum read value of 13,462 and a maximum write value of 2,787 in the logout phase. The average steady state read and write IOPS were 1,247 and 2,389, respectively. The average disk IOPS per user during the steady state period was 7.49.

Figure 17. Cluster IOPS

Disk latency

Cluster disk latency reached a maximum read latency value of 0.69 milliseconds and a maximum write latency value of 1.89 milliseconds during the logout phase. The average steady state read latency was 0.24 milliseconds, and the average steady state write latency was 0.67 milliseconds.

Figure 18. Disk latency

User experience

The baseline score for the Login VSI test was 938. This score falls in the 800 to 1,199 range rated as "Good" by Login VSI. For more information about Login VSI baseline ratings and baseline calculations, see the VSImax baseline scores article (https://support.loginvsi.com/hc/en-us/articles/115004421905-VSImax-baseline-scores). We ran the Login VSI test for 485 user sessions with the Knowledge Worker workload. As indicated by the blue line in Figure 19, the system reached a VSI index average score of 1,289 when 485 sessions were loaded. This value is well below the VSI threshold score of 1,938 set by the Login VSI tool. During the testing, VSImax was never reached, which normally indicates a stable system and a better user experience.

When manually interacting with the sessions during the steady state, the mouse and window movements were responsive, and video playback was good. No "stuck sessions" were reported during the testing, indicating that the system was not overloaded at any point. See Appendix A, which explains the Login VSI metrics.

Figure 19. Login VSI graph

Table 9. Login VSI score summary

Login VSI baseline  VSI index average  VSImax reached  VSI threshold
938                 1,289              No              1,938

Power Worker, 385 users, ESXi 6.7u3, Horizon 7.10, Windows 1909

We performed this test with the Login VSI Power Worker workload. The test was performed on a 4-node VxRail cluster. We created the desktop VMs using VMware Horizon instant clone technology. We used the VMware Horizon Blast Extreme display protocol. Host 1 was provisioned with both management and desktop VMs. We populated each compute host with 100 desktop VMs and the management host with 85 desktop VMs.

CPU usage

The following graphs show the CPU utilization across the four hosts during the testing. CPU usage with all VMs powered on was approximately 10 percent before the test started. The CPU usage steadily increased during the login phase. During steady state, an average CPU utilization of 87 percent was recorded. This value is close to the pass/fail threshold that we set for average CPU utilization (see Table 4). For good EUE, do not exceed this threshold. You can load more user sessions while exceeding this threshold for CPU, but you might experience a degradation in user experience.

Figure 20. CPU usage

As shown in Figure 21, the CPU readiness was well below the 5 percent threshold that we set. The average steady state CPU core utilization across the four hosts was 79 percent, as shown in Figure 22.

Figure 21. CPU readiness

Figure 22. CPU core utilization

Memory

We observed no memory constraints during the testing on either the management or compute hosts. Out of 768 GB of available memory per node, the compute host reached a maximum consumed memory of 741 GB and a steady state average of 691 GB. Active memory usage reached a maximum of 459 GB and recorded a steady state average of 154 GB. There was no memory ballooning or swapping on the hosts.

Active memory utilization was reduced to a minimum when users logged out of their sessions. There was an increase in active memory usage during the re-create phase. This peak in active memory usage is expected during the instant clone re-creation process because all VMs destroyed after user logout have to be re-created, which is a memory-intensive task.

Figure 23. Consumed memory usage

Figure 24. Active memory usage

Network usage

Network bandwidth was not an issue in this test. A steady state average of 891 Mbps was recorded. The busiest period for network traffic was toward the end of the logout phase and during the start of the re-creation of instant clones. Compute host 3 reached a peak value of 3,769 Mbps toward the end of the logout phase.

Figure 25. Network usage

IOPS

As shown in the following figure, the cluster IOPS reached a maximum read value of 21,090 in the re-create phase. It reached a maximum write value of 6,283 toward the end of the logout phase. The average steady state read and write IOPS were 1,310 and 1,941, respectively. The average disk IOPS per user during the steady state period was 8.45.

Figure 26. IOPS

Disk latency

As shown in the following figure, cluster disk latency reached a maximum read value of 12.83 milliseconds and a maximum write value of 3.12 milliseconds during the logout phase. The average steady state read latency was 0.77 milliseconds, and the average steady state write latency was 3.80 milliseconds.

Figure 27. Disk latency

User experience

The baseline score for the Login VSI test was 877. This score falls within the 800 to 1,199 range rated as "Good" by Login VSI. For more information about Login VSI baseline ratings and baseline calculations, see the VSImax baseline scores article (https://support.loginvsi.com/hc/en-us/articles/115004421905-VSImax-baseline-scores). We ran the Login VSI test for 385 user sessions with the Power Worker workload. As indicated by the blue line in the following figure, the system reached a VSI index average score of 1,271 when 385 sessions were loaded. This score is well below the VSI threshold score of 1,878 set by the Login VSI tool. During the testing, VSImax was never reached, which normally indicates a stable system and a better user experience.

When manually interacting with the sessions during the steady state phase, the mouse and window movement were responsive, and video playback was good. Only one "stuck session" was reported during the testing, indicating that the system was not overloaded at any point. See Appendix A, which explains the Login VSI metrics.

Figure 28. Login VSI graph

Table 10. Login VSI score summary

Login VSI baseline  VSI index average  VSImax reached  VSI threshold
877                 1,271              No              1,878

Multimedia Worker, 48 users, ESXi 6.7u3, Horizon 7.10, Windows 1909

We performed this test with the Login VSI Multimedia Worker workload. The test was performed on compute host 4, which was configured with six NVIDIA T4 GPUs. We provisioned the host with 48 vGPU-enabled VMs, and each of the desktop VMs used the NVIDIA T4-2B vGPU profile. The infrastructure VMs ran on management host 1. The desktop VMs were created using VMware Horizon instant clone technology. We used the VMware Horizon Blast Extreme display protocol for this testing. Hosts 2 and 3 did not carry any load.

CPU usage

The following graph shows the CPU utilization on the GPU-enabled compute host during the testing. CPU usage with all VMs on the host powered on was approximately 4 percent before the test started. The CPU usage steadily increased during the login phase, as shown in the following figure. During steady state, an average CPU utilization of 80 percent was recorded. This value was below the pass/fail threshold that we set for average steady state CPU utilization (see Table 4).

Figure 29. CPU usage

As shown in the following figure, the CPU readiness was well below the 5 percent threshold that we set. The average steady state CPU core utilization was 72 percent, as shown in Figure 31.

Figure 30. CPU readiness

Figure 31. CPU core utilization

GPU

The following graph shows the GPU usage across the six NVIDIA T4 GPUs configured on the host. The GPU usage during the steady state period averaged approximately 22 percent across the six GPUs. A peak GPU usage of 36 percent was recorded on GPU 3 during the logout phase.

  • Figure 32. GPU usage

Memory

We observed no memory constraints during the testing on either the management or compute hosts. Out of 768 GB of available memory per node, the compute host reached a maximum consumed memory of 440 GB and a maximum active memory of 392 GB during the steady state phase. There were no variations in memory usage throughout the test because all vGPU-enabled VM memory was reserved. There was no memory ballooning or swapping on the hosts.

Figure 33. Consumed memory usage

Figure 34. Active memory usage

Network usage

Network bandwidth was not an issue during the test. A steady state peak value of 1,448 Mbps was recorded on the GPU-enabled compute host. The busiest period for network traffic was during the steady state phase. The steady state average network usage was 1,009 Mbps. The average network usage per user was 21 Mbps.

Figure 35. Network usage

IOPS

As shown in the following figure, the cluster IOPS reached a maximum value of 616 for read IOPS during the login phase and 938 for write IOPS during the steady state phase of the testing. The average steady state value for read IOPS was 228, and the average steady state value for write IOPS was 542. The average disk IOPS (read and write) per user was 16.

Figure 36. IOPS

Disk latency

Cluster disk latency reached a maximum read latency value of 0.6 milliseconds and a maximum write latency value of 0.9 milliseconds during the logout phase. An average read latency of 0.38 milliseconds and an average write latency of 0.7 milliseconds were recorded during the steady state phase.

Figure 37. Disk latency

User experience

The baseline score for the Login VSI test was 877. This score falls within the 800 to 1,199 range rated as "Good" by Login VSI. For more information about Login VSI baseline ratings and baseline calculations, see the VSImax baseline scores article (https://support.loginvsi.com/hc/en-us/articles/115004421905-VSImax-baseline-scores). We ran the Login VSI test for 48 user sessions with the Login VSI Multimedia Worker workload. As indicated by the blue line in the following figure, the system reached a VSI index average score of 1,223 when 48 sessions were loaded. This value is well below the VSI threshold score of 1,877 set by the Login VSI tool. During the testing, VSImax was never reached, which normally indicates a stable system and a better user experience.

The Login VSImax user experience score for this test was not reached. When manually interacting with the sessions during steady state, the mouse and window movements were responsive, and video playback was good. There were no "stuck sessions" recorded during the testing, indicating that the system was not overloaded at any point. See Appendix A for details of the Login VSI metrics.

Figure 38. Login VSI graph

Table 11. Login VSI score summary

Login VSI baseline  VSI index average  VSImax reached  VSI threshold
877                 1,223              No              1,877

NVIDIA nVector Performance Testing

This chapter presents the following topics:

Topics:

• nVector performance testing process
• nVector performance test results and analysis

nVector performance testing process

NVIDIA nVector is a performance testing tool for benchmarking VDI workloads. The nVector tool creates a load on the system by simulating a workload that matches a typical VDI environment. The tool assesses the experience at the endpoint device rather than the response time of the virtual desktop.

The nVector tool captures performance metrics from the endpoints that quantify user experience, including image quality, frame rate, and user latency. These metrics, when combined with resource utilization information from the servers under test, enable IT teams to assess the needs of their graphics-accelerated VDI environment.

We tested multiple runs for each user load scenario to eliminate single-test bias. We used a pilot run to validate that the solution was functioning as expected and that testing data was being captured. We then used subsequent runs to provide data confirming that the results we obtained were consistent.

To assess the EUE, we logged in to a VDI session and completed several tasks that are typical of a normal user workload. This small incremental load on the system did not significantly affect our ability to provide reproducible results. While the assessment is undoubtedly subjective, it helps to provide a better understanding of the EUE under high load. It also helps to assess the reliability of the overall testing data.

Load generation

The nVector tool runs the simulated workflow of a typical VDI workload at a predesignated scale. This part of the test requires performance monitoring to measure resource utilization. Acting as an execution engine, nVector orchestrates the following stages that are involved in measuring EUE for a predefined number of VDI instances:

1. Provision VDI instances with predefined settings such as vCPU, vRAM, and frame buffer, and provision an equal number of VMs that act as virtual thin clients.
2. Establish remote connections to the VDI desktops using the virtual clients.
3. Measure resource utilization statistics on the server and on the guest operating system of the VDI desktop.
4. Run the designated workload on all the VDI instances.
5. Collect and analyze performance data and EUE measurements.
6. Generate a report that reflects the trade-off between EUE and user density (scale).

The following figure shows the stages in the NVIDIA benchmarking tool's measurement of user experience:

Figure 39. NVIDIA nVector workflow

nVector Knowledge Worker workload

We performed this testing exercise with NVIDIA's nVector Knowledge Worker workload. This workload contains a mixture of typical office applications, including some multimedia usage, and is representative of what a typical office worker does during the working day. The activities performed include:

● Working on Microsoft Excel files
● Scrolling through PDFs
● Opening and working on Microsoft Word documents
● Opening and presenting a Microsoft PowerPoint presentation
● Opening and viewing web pages and web videos using Google Chrome
● Opening and closing applications and saving or copying content

Resource monitoring

Host metrics

We used VMware vCenter to gather key host utilization metrics, including CPU, GPU, memory, disk, and network usage, from the compute host during each test run. This data was exported to .csv files for each host and then consolidated for reporting, as sketched below.
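
The consolidation step can be as simple as concatenating the per-host exports. The following is a minimal sketch under assumed file naming and column layout, not the team's actual tooling:

```python
# Sketch: merge per-host .csv exports from vCenter into one file for reporting.
# File naming and column layout are assumptions for illustration.
import csv
import glob

def consolidate(pattern="host*_metrics.csv", out_path="all_hosts.csv"):
    """Concatenate per-host exports, tagging each row with its source file."""
    writer = None
    with open(out_path, "w", newline="") as out:
        for path in sorted(glob.glob(pattern)):
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    row["source_host"] = path
                    if writer is None:
                        writer = csv.DictWriter(out, fieldnames=list(row))
                        writer.writeheader()
                    writer.writerow(row)

if __name__ == "__main__":
    consolidate()
```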

Resource overutilization can cause poor EUE. We monitored the relevant resource utilization parameters and compared them to relatively conservative thresholds. The thresholds were selected based on industry best practices and our experience to provide an optimal trade-off between good EUE and cost-per-user while also allowing enough burst capacity for seasonal or intermittent spikes in demand. The following table shows the pass/fail thresholds for the host utilization metrics:

Table 12. Resource utilization thresholds

Parameter                         Pass/fail threshold
Physical host CPU utilization     85%
Physical host memory utilization  85%
Network throughput                85%
Physical host CPU readiness       10%
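
A minimal sketch of how these pass/fail thresholds can be applied to consolidated steady state averages (the metric names and example values are illustrative assumptions, not a defined data format):

```python
# Sketch: check steady state averages against the Table 12 thresholds.
# Metric names and example values are illustrative, not a defined data format.
THRESHOLDS = {
    "cpu_util_pct": 85.0,
    "memory_util_pct": 85.0,
    "network_util_pct": 85.0,
    "cpu_readiness_pct": 10.0,
}

def evaluate(steady_state_avgs):
    """Yield (metric, value, limit, passed) for each monitored parameter."""
    for metric, limit in THRESHOLDS.items():
        value = steady_state_avgs[metric]
        yield metric, value, limit, value <= limit

example = {"cpu_util_pct": 80.0, "memory_util_pct": 57.0,
           "network_util_pct": 4.0, "cpu_readiness_pct": 1.0}
for metric, value, limit, ok in evaluate(example):
    print(f"{metric}: {value}% vs {limit}% -> {'PASS' if ok else 'FAIL'}")
```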

    Measuring EUE

    This section explains the EUE metrics measured by the nVector tool. These metrics include image quality, frame rate, and end-user latency.

Metric 1: Image quality—NVIDIA nVector uses a lightweight agent on the VDI desktop and on the client to measure image quality. These agents take multiple screen captures on the VDI desktop and on the thin client for later comparison. The structural similarity (SSIM) of each screen capture taken on the client is computed by comparing it to the corresponding capture taken on the VDI desktop. When the two images are similar, the SSIM value is close to 1.0 and the heatmap shows colors toward the top of the spectrum, as shown on the right-hand side of the following figure. As the images become less similar, the SSIM value drops below 1.0 and the heatmap shows colors toward the bottom of the spectrum. More than a hundred pairs of images are captured across an entire set of user sessions, and the average SSIM index of all pairs is computed to give the overall remote session quality for all users.

    Figure 40. SSIM as a measure of image quality
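The SSIM computation itself can be reproduced with standard image-processing libraries. The following is a minimal sketch using scikit-image's structural_similarity function, assuming matched pairs of desktop-side and client-side screenshots of the same resolution saved to disk; it illustrates the comparison, not the nVector agents' actual implementation.

```python
import numpy as np
from skimage.io import imread
from skimage.color import rgb2gray
from skimage.metrics import structural_similarity

def session_quality(pairs):
    """Average SSIM over (desktop_png, client_png) screenshot pairs.

    Values close to 1.0 mean the client rendered almost exactly
    what the desktop produced.
    """
    scores = []
    for desktop_path, client_path in pairs:
        desktop = rgb2gray(imread(desktop_path))  # grayscale floats in [0, 1]
        client = rgb2gray(imread(client_path))
        scores.append(structural_similarity(desktop, client, data_range=1.0))
    return float(np.mean(scores))

# Hypothetical file names for one user's captures.
print(session_quality([("desktop_001.png", "client_001.png")]))
```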

Metric 2: Frame rate—Frame rate is a common measure of user experience that defines how smooth the experience is. It measures the rate at which frames are delivered to the screen of the endpoint device. During the workload testing, NVIDIA nVector collects data on the frames per second (FPS) sent to the display device on the end client. This data is collected from thousands of samples, and the value of the 90th percentile is taken for reporting. A higher FPS indicates a more fluid user experience.
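The percentile reduction that produces the reported figure is simple to reproduce. A minimal sketch, with synthetic FPS samples standing in for nVector's collected data:

```python
import numpy as np

# Synthetic FPS samples standing in for the thousands nVector collects.
rng = np.random.default_rng(0)
fps_samples = rng.normal(loc=15.5, scale=2.0, size=5000).clip(min=0)

# The reported figure is the 90th percentile of all samples.
reported_fps = np.percentile(fps_samples, 90)
print(f"90th-percentile FPS: {reported_fps:.1f}")
```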

Metric 3: End-user latency—The end-user latency metric defines the level of response of a remote desktop or application. It measures the duration of any lag that an end user experiences when interacting with a remote desktop or application.


Desktop VM configurations

    The following table summarizes the compute VM configurations for the nVector workloads that we tested:

Table 13. Desktop VM specifications

Test case   nVector workload         vCPUs   Memory   Reserved memory   vGPU profile   Operating system     HD size   Screen resolution
Non-GPU     Knowledge Worker         2       4 GB     2 GB              N/A            Windows 10, 64-bit   60 GB     1920 x 1080
GPU         Knowledge Worker + GPU   2       4 GB     4 GB              T4-2B          Windows 10, 64-bit   60 GB     1920 x 1080


nVector performance test results and analysis

    Summary of test results

For the GPU testing, we used a single compute host with six NVIDIA T4 GPUs. We enabled 48 VMs with the NVIDIA T4-2B vGPU profile. The vGPU scheduling policy was set to "Fixed Share Scheduler." For the non-GPU test, we performed testing on a compute host running 48 VMs without vGPU profiles.

The compute host was part of a 4-node VMware vSAN software-defined storage cluster. We used VMware linked clones to create VMs in both tests. We used the Horizon Blast Extreme protocol as the remote display protocol with H.264 hardware encoding.

We performed both tests with the NVIDIA nVector Knowledge Worker workload. The following table compares the utilization metrics gathered from vCenter for both tests, while the second table compares the EUE metrics generated by the nVector tool. Both tests produced almost the same image quality (SSIM value above 0.9). However, with GPUs enabled, the FPS increased by 14 percent, and the end-user latency decreased by 18 percent.

Table 14. Host utilization metrics

Test case   Server configuration                nVector workload                  Density per host   Average CPU usage   Average GPU usage   Average active memory   Average IOPS per user   Average net Mbps per user
GPU         Density Optimized + 6 x NVIDIA T4   Knowledge Worker (NVIDIA T4-2B)   48                 68%                 19%                 193 GB                  84                      8.7
Non-GPU     Density Optimized                   Knowledge Worker                  48                 69%                 N/A                 67 GB                   65                      8.9

Table 15. NVIDIA nVector EUE metrics

Test configuration   nVector workload   GPU profile    Density per host   End-user latency   Frame rate   Image quality
GPU                  Knowledge Worker   NVIDIA T4-2B   48                 100 milliseconds   16           0.94
Non-GPU              Knowledge Worker   N/A            48                 122 milliseconds   14           0.99


nVector Knowledge Worker, 48 vGPU users, ESXi 6.7u3, Horizon 7.10

For this test, we configured compute host 1 with six NVIDIA T4 GPUs. Host 1 ran 48 vGPU-enabled desktop VMs with the NVIDIA T4-2B profile. Host 2 ran the nVector launcher VMs. A launcher is an endpoint VM from which the desktop VM session is initiated. Host 2 was configured with three P40 GPUs, and the launcher VMs on host 2 were GPU-enabled through an NVIDIA P40-1B profile; the nVector tool requires launcher VMs to be GPU-enabled. Host 3 hosted management VMs only, and host 4 did not carry any load.

We used linked clone provisioning for this testing. Linked clones, as opposed to instant clones, are not re-created automatically after logout. We used the VMware Horizon Blast Extreme protocol as the remote display protocol.

The total GPU frame buffer available on compute host 1 was 96 GB. With vGPU VMs enabled with the NVIDIA T4-2B profile, the maximum number of GPU-enabled users that can be hosted on compute host 1 is 48.
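The 48-user ceiling follows from simple frame-buffer arithmetic: each NVIDIA T4 provides 16 GB of frame buffer, six cards give 96 GB, and the T4-2B profile assigns 2 GB per VM. A one-line check:

```python
# 6 NVIDIA T4 GPUs x 16 GB frame buffer each, 2 GB per T4-2B vGPU.
gpus, fb_per_gpu_gb, fb_per_vm_gb = 6, 16, 2
max_vgpu_users = gpus * fb_per_gpu_gb // fb_per_vm_gb
print(max_vgpu_users)  # 48
```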

    CPU

The following figure shows the CPU utilization of the four hosts during the testing. We can see a spike in CPU usage for compute host 1 and launcher host 2 during linked-clone creation and the login phase. During the steady state phase, an average CPU utilization of 68 percent was recorded on the GPU-enabled compute host 1. This value was lower than the pass/fail threshold that we set for average CPU utilization (see Table 12). Launcher host 2 and management host 3 had very low CPU usage during steady state.

    Figure 41. CPU usage

As shown in the following figure, the CPU readiness percentage was well below the 10 percent threshold that we set (see Table 12). The average steady state CPU core utilization was 56 percent on the GPU-enabled compute host 1, as shown in Figure 43.


Figure 42. CPU readiness

    Figure 43. CPU core utilization
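For context, vCenter reports CPU readiness as a summation counter in milliseconds per sampling interval; converting that counter to the percentage plotted above follows the standard VMware formula of ready time divided by interval length. A minimal sketch, assuming the 20-second real-time chart interval:

```python
def cpu_ready_pct(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a vCenter CPU-ready summation (ms) to a percentage.

    The result is per vCPU; divide by the vCPU or core count for a
    per-core view of a multi-vCPU object.
    """
    return ready_ms / (interval_s * 1000.0) * 100.0

# Example: 200 ms of ready time in a 20 s sample is 1 percent readiness.
print(cpu_ready_pct(200.0))  # 1.0
```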

    GPU usage

The following graph shows the GPU usage across the six NVIDIA T4 GPUs configured on the GPU-enabled compute host 1. The GPU usage during the steady state period across the six GPUs averaged approximately 19 percent. During the steady state phase, peak usage of 39 percent was recorded on GPU 3.


Figure 44. GPU usage

    Memory

We observed no memory constraints during the testing on the compute or management hosts. Out of 768 GB of available memory per node, compute host 1 reached a maximum consumed memory of 244 GB. Active memory usage reached a maximum of 193 GB. There were no variations in memory usage throughout the test because all vGPU-enabled VM memory was reserved. There was no memory ballooning or swapping on the hosts.

    Figure 45. Consumed memory usage


Figure 46. Active memory usage

    Network usage

Network bandwidth was not an issue in this test. An average network usage of 422 Mbps was recorded on compute host 1 during the steady state phase, which equates to 8.7 Mbps per user. The busiest period for network traffic was the start of the steady state phase, when network usage on compute host 1 spiked to 5,831 Mbps.

    Figure 47. Network usage


Cluster IOPS

Cluster IOPS reached a maximum of 2,795 read IOPS and 1,987 write IOPS during the testing. The average steady state read IOPS was 2,496 and the average steady state write IOPS was 1,557. The average steady state IOPS (read + write) per user was 84.
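The per-user figure is simply the combined steady state average divided by the 48 sessions:

```python
users = 48
avg_read_iops, avg_write_iops = 2496, 1557  # steady state cluster averages

iops_per_user = (avg_read_iops + avg_write_iops) / users
print(round(iops_per_user))  # 84
```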

    Figure 48. IOPS

    Disk latency

Cluster disk latency reached a maximum read latency of 0.28 milliseconds and a maximum write latency of 0.81 milliseconds. The average steady state read latency was 0.23 milliseconds, and the average steady state write latency was 0.7 milliseconds.

    Figure 49. Disk latency


nVector Knowledge Worker, 48 users, non-graphics, ESXi 6.7u3, Horizon 7.10

We ran compute host 1 with 48 desktop VMs. No GPUs were configured on this host. Host 2 ran the nVector launcher VMs. A launcher is an endpoint VM from which the desktop VM session is initiated. We configured host 2 with three P40 GPUs, and the launcher VMs on host 2 were GPU-enabled through an NVIDIA GRID P40-1B profile; the nVector tool requires launcher VMs to be vGPU-enabled. Host 3 was provisioned with management VMs only, and host 4 did not carry any load.

We used linked clone provisioning for this testing. Linked clones, as opposed to instant clones, are not re-created automatically after logout. We used the VMware Horizon Blast Extreme protocol as the remote display protocol.

    CPU

The following graph shows the CPU utilization across the four hosts during the testing. We can see a spike in CPU usage for the compute and launcher hosts during linked-clone creation and the login phase. During the steady state phase, an average CPU utilization of 69 percent was recorded on the compute host. This value was lower than the pass/fail threshold that we set for average CPU utilization (see Table 12). Launcher host 2 and management host 3 had very low CPU usage during the steady state phase.

    Figure 50. CPU usage

As shown in the following figure, the CPU readiness was well below the 10 percent threshold that we set (see Table 12).


Figure 51. CPU readiness

    As shown in the following figure, the average steady state CPU core utilization on compute host 1 was 56 percent.

    Figure 52. CPU core utilization


Memory

We observed no memory constraints during the testing on the compute or management hosts. Out of 768 GB of available memory per node, compute host 1 reached a maximum consumed memory of 241 GB. Active memory usage reached a maximum of 67 GB. There was no memory ballooning or swapping on the hosts.

    Figure 53. Consumed memory usage

    Figure 54. Active memory usage


Network usage

Network bandwidth was not an issue in this test. An average network usage of 427 Mbps was recorded on compute host 1 during the steady state phase, which equates to 8.9 Mbps per user. The busiest period for network traffic was the linked-clone creation phase.

    Figure 55. Network usage

    Cluster IOPS

The cluster IOPS reached a maximum of 2,035 read IOPS and 1,984 write IOPS during the testing. The average steady state read IOPS was 1,680, and the average steady state write IOPS was 1,455. The average steady state disk IOPS (read + write) per user was 65.


Figure 56. IOPS

    Disk latency

Cluster disk latency reached a maximum read latency of 0.3 milliseconds and a maximum write latency of 0.8 milliseconds. The average steady state read latency was 0.24 milliseconds, and the average steady state write latency was 0.66 milliseconds.

    Figure 57. Disk latency


Conclusion

This guide describes the integration of vSAN-based appliances from Dell Technologies and VMware Horizon 7 brokering software to create virtual application and desktop environments. This architecture provides exceptional scalability and an excellent user experience, and empowers IT teams to play a proactive strategic role in the organization.

Dell Technologies offers comprehensive, flexible, and efficient VDI solutions that are designed and optimized for the organization's needs. These VDI solutions are easy to plan, deploy, and run.

    Dell EMC Ready Solutions for VDI offer several key benefits to clients:

● Predictable costs, performance, and scalability to support a growing workforce
● Rapid deployments
● Rapid scaling, ready to serve enterprises of any size
● Dell Technologies support

All the Dell EMC Ready Solutions for VDI are configured to produce similar results. You can be sure that the vSAN-based appliances you choose have been designed and optimized for your organization's needs.

    Topics:

• Test results and density recommendations
• Summary

Test results and density recommendations

The recommended user densities in the following table were achieved during the performance testing on VxRail appliances. We followed the VMware best practices of FTT = 1 and configured a reserved slack space of 30 percent. All configurations were tested with Microsoft Windows 10, 64-bit, and Microsoft Office 2019. We implemented all mitigations to patch the Spectre, Meltdown, and L1TF vulnerabilities at the hardware, firmware, and software levels; the performance impact of these mitigations is reflected in the achieved user densities.

Table 16. User density recommendations for VMware vSphere ESXi 6.7 with VMware Horizon 7.10

Server configuration         Workload                                        Windows version    User density
Density Optimized            Login VSI Knowledge Worker                      Windows 10, 1803   130
Density Optimized            Login VSI Knowledge Worker                      Windows 10, 1909   125
Density Optimized            Login VSI Power Worker                          Windows 10, 1909   100
Density Optimized + 6 x T4   Login VSI Multimedia (Virtual PC: T4-2B)        Windows 10, 1909   48
Density Optimized + 6 x T4   nVector Knowledge Worker (Virtual PC: T4-2B)    Windows 10, 1909   48
Density Optimized            nVector Knowledge Worker                        Windows 10, 1909   48

Summary

We have provided extensive performance testing results and guidance based on the PAAC testing carried out with the Login VSI Knowledge Worker, Power Worker, Multimedia Worker, and nVector Knowledge Worker workloads. The 2nd Generation Intel Xeon Scalable processors in our Density Optimized configuration provide performance, density, and agility for your VDI workloads. The NVIDIA GPU options provide exceptional performance for graphics-intensive workloads.

The configurations for the VxRail appliances are optimized for VDI. We selected the memory and CPU configurations that provide optimal performance. You can change these configurations to meet your environmental requirements, but keep in mind that changing the memory and CPU configurations from those that have been validated in this document will affect the user density per host. The guidance provided for VxRail appliances also applies to solutions based on VMware vSAN Ready Nodes. Use VxRail sizing tools for sizing the solution, and reserve resources for management tools when designing your VDI environment. For further assistance with our solutions, contact your Dell Technologies account representative.

Dell Technologies offers comprehensive, flexible, and efficient VDI solutions that are designed and optimized for your organization's needs. Our VDI solutions are easy to plan, deploy, and run. Dell EMC Ready Solutions for VDI offer several key benefits to customers, including:

● Rapid deployment and scaling
● Access from anywhere, on any device
● Hardened security
● High performance
● Predictable cost of ownership
● Dell Technologies comprehensive support

With VDI solutions from Dell Technologies, you can streamline the design and implementation process, and be assured that you have a solution that is optimized for performance, density, and cost-effectiveness.


References

This chapter presents the following topics:

    Topics:

• Dell Technologies documentation
• VMware documentation
• NVIDIA documentation

Dell Technologies documentation

The following links provide additional information from Dell Technologies. Access to these documents depends on your login credentials. If you do not have access to a document, contact your Dell Technologies representative. Also see the VDI Info Hub (https://infohub.delltechnologies.com/t/solutions/vdi/) for a complete list of VDI resources.

● Dell Technologies Virtual Desktop Infrastructure: https://www.dellemc.com/en-us/solutions/vdi/index.htm
● Dell EMC VxRail Hyperconverged Infrastructure: https://www.delltechnologies.com/en-ie/converged-infrastructure/vxrail/index.htm
● Dell EMC vSAN Ready Nodes: https://www.delltechnologies.com/en-us/converged-infrastructure/hyper-converged-infrastructure/vsan-ready-nodes.htm

    Previous versions

    Previous versions of the documentation for this solution can be found here:

● VDI Info Hub Archive: https://infohub.delltechnologies.com/t/archive/

    VMware documentation

    The following VMware documentation provides additional and relevant information:

● VMware vSphere documentation: https://docs.vmware.com/en/VMware-vSphere/index.html
● VMware Horizon 7 documentation: https://docs.vmware.com/en/VMware-Horizon-7/index.html
● Best Practices for Published Applications and Desktops in VMware Horizon Apps and VMware Horizon 7: https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vmware-horizon-7-apps-published-applications-desktops-best-practices.pdf
● vSAN Ready Node Configurator: http://vsanreadynode.vmware.com/RN/RN
● VMware Compatibility Guide: https://www.vmware.com/resources/compatibility/search.php?deviceCategory=vsan&details=1&vsan_type=vsanreadynode&vsan_partner=23&vsan_releases=278&page=1&display_interval=10&sortColumn=Partner&sortOrder=Asc
● Horizon 7 Enterprise Edition Reference Architecture: https://techzone.vmware.com/resource/workspace-one-and-horizon-reference-architecture#component-design-horizon-architecture
● Horizon 7 Enterprise Edition Multi-Site Reference Architecture: https://techzone.vmware.com/resource/workspace-one-and-horizon-reference-architecture#multi-site-architecture

    NVIDIA documentation

    The following NVIDIA documentation provides additional and relevant information:

● NVIDIA Virtual GPU Software Quick Start Guide: http://docs.nvidia.com/grid/latest/grid-software-quick-start-guide/index.html


Appendix A: Login VSI metrics

Table 17. Description of Login VSI metrics

VSImax: VSImax shows the number of sessions that can be active on a system before the system is saturated. It is the point where the VSImax v4 average graph line meets the VSImax v4 threshold graph line. The intersection is indicated by a red X in the Login VSI graph. This number gives you an indication of the scalability of the environment (higher is better).

VSIbase: VSIbase is the best performance of the system during a test (the lowest response times). This number is used to determine what the performance threshold will be. VSIbase gives an indication of the base performance of the environment (lower is better).

VSImax v4 average: VSImax v4 average is calculated on the number of active users that are logged in to the system, but removes the two highest and two lowest samples to provide a more accurate measurement.

VSImax v4 threshold: VSImax v4 threshold indicates when the environment's saturation point is reached (based on VSIbase).
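As a concrete reading of the VSImax v4 average definition, the trimming step can be sketched as follows; the samples are synthetic and the function reflects the description above rather than Login VSI's exact implementation.

```python
def vsimax_v4_average(response_times_ms):
    """Mean response time after dropping the two highest and two lowest samples."""
    if len(response_times_ms) <= 4:
        raise ValueError("need more than four samples to trim")
    trimmed = sorted(response_times_ms)[2:-2]  # discard two extremes at each end
    return sum(trimmed) / len(trimmed)

# Synthetic response-time samples (ms) for the active sessions.
print(vsimax_v4_average([820, 840, 2500, 790, 860, 3100, 810, 30, 45, 880]))
```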

