Best practices for deploying Citrix XenApp
on XenServer on HP ProLiant servers
Technical white paper
Table of contents
Executive summary
Business case
Virtualization overview
  Benefits
  Why virtualize XenApp?
Performance testing
  Overview
  Test results
Best practices
  Unable to migrate to a 64-bit environment?
  Only virtualize suitable platforms
  Consider the cost
  Do not oversubscribe vCPUs
  Fully utilize CPU resources
  Avoid spikes in processor utilization
  Allocate sufficient memory to each VM
  Using write cache
  Monitor network performance
  Enhance availability
  Balance the distribution of VMs
  Optimize resource use
  Optimize the XenServer kernel
  Enhance manageability
Enhancing the scalability of a modern 4P server
  Bare-metal 64-bit platform
  32-bit platform
Consolidation example
Appendix A – Testing
  Test tools
  User profile
  Test scenarios
  Test topology
For more information
Executive summary
Since the first hypervisor came on the market, businesses have been challenged to consolidate
underutilized servers onto a single physical machine. While early implementations were
disappointing, resulting in large performance penalties due to virtualization overhead, there have
been significant technological advances. Today’s AMD Opteron™ and Intel® Xeon® processors
provide on-chip instructions to handle direct hardware calls from the hypervisor, minimizing the
associated overhead. As a result, the scalability of virtual machines (VMs) is much improved.
Meanwhile, businesses continue to deploy non-virtualized x86 platforms that are inherently restricted
in the number of users they can support due to limited memory addressability. The decision not to
move to a 64-bit platform is often driven by driver and application incompatibilities that would
make migration prohibitively expensive. Virtualization offers a viable solution to this dilemma,
allowing 32-bit platforms to scale to unprecedented levels.
This paper explores a number of options for using Citrix XenServer to consolidate 32-bit workloads on
both 32- and 64-bit Microsoft® Windows® platforms, with emphasis on best practices, tuning, and
tips for virtualizing a Citrix XenApp on XenServer environment.
Target audience: This white paper provides information for IT professionals interested in virtualization.
This white paper describes testing performed in July 2008 – August 2010.
Business case
Whether you are running a small business, a remote office or a data center, the problem is the same:
you need to support multiple applications that may not co-exist well on the same server. However, no
single application is likely to overload the multi-core processors in one of today’s HP ProLiant servers;
thus, dedicating a server to a particular application would waste valuable resources. In addition, you
may need infrastructure servers to act as firewalls, Domain Name System (DNS)/Dynamic Host
Configuration Protocol (DHCP) servers, virtual desktops, desktop application servers, web servers, or
mail servers, depending on your particular environment. Are you going to dedicate a physical server
to each of these functions?
Another part of the equation may be the need to move old software and operating systems from
outdated and, possibly, failing or hard-to-repair hardware to more modern servers. Unfortunately,
your new hardware may not be able to support the older operating systems and applications, while
the alternative – updating both the hardware and applications – is expensive and increases the
potential for error. Virtualization is one way to address these issues, by better utilizing resources and,
through consolidation, reducing the number of physical servers you need.
XenApp customers are generally seeking ways to reduce their overall costs. Within a XenApp
environment, the cost of powering and cooling servers is high, constituting a significant – if not the
most significant – portion of total IT infrastructure costs. Indeed, more and more studies are indicating
that server hardware is no longer the leading data center expense; for example, the purchase price of
a new, 1U server has already been exceeded by the capital cost of the power and cooling
infrastructure needed to support it and may soon be exceeded by its lifetime energy costs. As a result,
power consumption is a key concern for enterprise customers considering the purchase of HP servers.
Many would like to reduce their overall power footprint – but without sacrificing performance. While
this goal has traditionally been difficult to achieve, recent performance characterizations performed
by HP demonstrate that a balance between performance and power consumption is possible.
With virtualization, you can now consolidate multiple applications on a single HP ProLiant server,
depending on the memory, CPUs, and disk space available in the host machine, and the particular
applications you wish to support. Furthermore, your VMs can be configured to run troublesome, older
operating systems.
In addition to consolidation, the benefits of hardware virtualization typically include:
• VMs are isolated and can be configured to use specific hardware resources
• VMs are easily copied and deployed, and can be moved between physical machines without
service interruption
• VMs can be administered centrally
Virtualization overview
XenServer, the Citrix virtualization solution, is based on Xen, an open-source virtualization project that
supports both Intel Virtualization Technology (VT) and AMD Virtualization™ (AMD-V™) capabilities.
Xen allows a single machine to host multiple isolated environments, each running an operating system
and an application software instance.
In 2004, the Xen project released the first version of its virtualization platform (hypervisor); the
following year, the project founders formed XenSource, which subsequently introduced products such
as XenServer and XenEnterprise. In 2007, Citrix purchased XenSource, renaming these products
XenServer Standard Edition and XenServer Enterprise Edition.
The hypervisor is the layer responsible for managing partitions (domains), instantiating VMs into
domains, and scheduling and allocating resources for the domains. In some virtualized environments,
the guest is unaware that it is virtualized since the underlying hypervisor is able to emulate all system
components. I/O calls and other privileged requests are performed as normal by the guest but must
be trapped and emulated by the underlying hypervisor, thereby degrading performance. This
scenario is often referred to as full virtualization.
While first-generation hypervisors relied on emulation technology to virtualize operating systems, Xen
takes advantage of advances in operating system and CPU technologies to provide paravirtualization
and hardware-assisted virtualization, along with full 64-bit support.
With paravirtualization, the operating system is modified so that it can directly call virtualized I/O
services and other privileged operations supported by the hypervisor, eliminating the need for
resource-intensive binary translation and emulation. Drivers for storage and network interface cards
(NICs) are replaced with virtualization-aware drivers that provide a fast I/O channel through the host
domain, delivering excellent guest I/O performance.
While operating systems that are not virtualization-aware can be used with Xen, these operating
systems rely on processor extensions to assist virtualization. For a hardware-assisted domain to run on
Xen, the underlying hardware must be either Intel VT or AMD-V capable and have that feature
enabled. All current HP ProLiant servers support hardware-assisted virtualization in 32- or 64-bit
environments.
Note:
Hardware assistance for virtualization is disabled by default in HP ProLiant
servers. To enable this feature during boot, press F9 to enter the setup
utility; then select Advanced Options > Processor Options; and, lastly, select
and enable the Virtualization Technology option. You should also enable
No-Execute Memory Protection. Press F10 to save these settings and exit
the utility.
For more information, refer to the “HP ROM-Based Setup Utility User
Guide.”
If you re-flash your firmware, these settings will return to their default values.
Benefits
The benefits of virtualization include:
Enhanced server utilization
Average server utilization in the data center may be as low as 5% – 10%1, making infrastructure
servers (such as Domain Name System (DNS) servers or Microsoft Active Directory controllers) and
other lightly-used machines excellent candidates for consolidation.
Consolidating underutilized servers and application silos allows you to maximize the utilization of IT
resources and comply with conservation (green) initiatives.
Business continuity solution
Costly clusters of physical machines are typically used to minimize the risk of a loss of a single
server. However, virtualization allows you to provide failover and redundancy for multiple
applications on a single cluster – and the machines that make up this cluster need not be
configured identically. The provisioning of these servers is simple and flexible.
Disaster recovery solution
To help eliminate the risk of the loss of a whole location or data center, you can replicate VMs to
another site in near real-time.
Dynamic workload management
You can use VMs to support dynamic workload management, moving VMs to accommodate spikes
in demand.
Enhanced management flexibility
VMs can help increase levels of automation in the data center. Scripting and programmatic
exposure via management application programming interfaces (APIs) can enhance management
flexibility.
Why virtualize XenApp?
While the benefits of delivering applications to users through XenApp are proven, the growth in scale
and complexity of XenApp deployments has created an opportunity for you to achieve an even
greater return on your investment.
1 The DataCenter Journal, 12 March 2009
The benefits of XenApp on XenServer include:
Reduced server/data center footprint
Server consolidation can reduce the number of physical servers required in your data center (see
Consolidation example). XenServer allows you to deploy multiple applications on the same servers
– even if these applications would be incompatible in a non-virtualized environment.
Improved failover and redundancy
In a non-virtualized environment, silos are often created to simplify application-specific redundancy
and failover, typically resulting in significant unused capacity. To retain the availability benefits of
redundancy while reducing physical server footprint, you can virtualize underutilized XenApp
servers.
Zero-downtime hardware maintenance
In a non-virtualized environment, hardware maintenance is usually associated with reduced
application availability. You must typically schedule maintenance after-hours so that you can power
down servers to replace faulty or outdated hardware.
However, XenServer’s XenMotion feature allows running VMs to be migrated from one physical
server to another with no service interruption, supporting zero-downtime maintenance.
Rapid server, application, and capacity provisioning
In a non-virtualized environment, it may take hours or even days to manually increase XenApp
capacity. With XenServer, however, VMs preinstalled with XenApp can be converted into templates
and, in conjunction with a resource pool, used for rapid provisioning.
Fast, easy, portable test and demonstration environments
If you cannot justify the hardware required to create test, demonstration, and training environments,
you can use XenServer to deliver copies of production environments. As a result, you can test the
quality and impact of applications, hot fixes, and configuration changes prior to rolling them out
into production. In addition, you can create complete, portable training and demonstration
environments to introduce new services and applications throughout the organization.
Performance testing
HP has performed a number of performance characterizations designed to compare the scalability of
virtualized HP ProLiant servers deployed in 32- and 64-bit XenApp environments. To provide
baselines, bare-metal configurations were also tested.
For more information on tested configurations and the test environment, refer to Appendix A –
Testing.
Overview
HP bases the workload for tested servers on a Microsoft Office 2003-based Heavy User profile.
Heavy Users (also known as Structured Task Workers) tend to open multiple applications
simultaneously and remain active for long periods; they often leave applications open when not in
use.
To characterize scalability, HP focuses on the following criteria:
• System performance reported by Windows Performance Monitor (Perfmon)
• User response times measured using a canary script
Ideally, HP would prefer to use exactly the same tools to characterize performance in both virtualized
and non-virtualized environments. However, because of the wide range of discrete data values2
generated in a virtualized environment, HP prefers to employ moving averages rather than discrete
values. HP has also developed custom tools to aggregate the large amount of performance data
generated by VMs.
Despite these departures from the methodology established for bare-metal servers, HP expects the
margin of error in metrics for a virtualized server to be less than 10%.
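The moving-average approach described above can be illustrated with a short sketch. The window size and sample values below are invented for illustration; this is not HP's actual tooling, just a minimal example of smoothing discrete, oscillating utilization samples.

```python
def moving_average(samples, window=5):
    # Smooth a series of discrete utilization readings (e.g., Perfmon
    # "% Processor Time" samples) with a trailing moving average,
    # damping the ringing seen in virtualized environments.
    smoothed = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Illustrative samples oscillating around ~50% utilization
raw = [20, 80, 25, 75, 30, 70]
print(moving_average(raw, window=3))
```

Each output point averages the current sample with the preceding samples in the window, so the smoothed series converges toward the underlying utilization level rather than tracking each oscillation.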
Note:
One option for monitoring VM performance is the use of round-robin
databases (RRDs). XenServer records persistent performance metrics in
RRDs to provide long-term access to this data and support the analysis of
historical trends. RRDs are maintained for the host server and for VMs. For
more information, refer to the HP white paper, “Analyzing Citrix XenServer
persistent performance metrics from Round Robin Database logs.”
In general, 80% processor utilization has been considered the critical performance threshold, typically
used to help specify the optimal number of users supported by a particular server configuration.
However, processor utilization does not always reach 80% during a test run. In such cases, HP
analyzes the Perfmon results to determine what has limited scalability; for example, when bare-metal
32-bit platforms are tested, scalability tends to be limited by lack of system page table entries (PTEs).
HP uses the results of the associated canary run to validate that response times were acceptable when
the optimal number of users indicated by Perfmon was active. If, however, user response times have
already become unacceptable before the 80% threshold is reached, HP accepts as optimal the
number of users supported just before response times began to degrade.
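HP's decision rule — accept the highest user count reached before either the 80% utilization threshold is crossed or response times become unacceptable — can be sketched as follows. The checkpoint data and the 2,000 ms response ceiling are illustrative assumptions, not figures from HP's test harness.

```python
CPU_THRESHOLD = 80.0  # percent; the critical utilization threshold

def optimal_users(checkpoints, max_response_ms=2000):
    # checkpoints: list of (active_users, cpu_percent, response_ms)
    # sampled as the harness ramps up load. Returns the highest user
    # count reached before either limit is violated.
    optimal = 0
    for users, cpu, response in checkpoints:
        if cpu >= CPU_THRESHOLD or response > max_response_ms:
            break
        optimal = users
    return optimal

# Illustrative ramp-up: CPU crosses 80% between 500 and 600 users
run = [(100, 22, 400), (200, 41, 450), (300, 58, 600),
       (400, 71, 900), (500, 79, 1400), (600, 88, 3100)]
print(optimal_users(run))  # 500
```

If response times degrade before the CPU threshold is reached, the loop stops at that earlier checkpoint, matching the rule described above.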
Sample test results are shown in Figure 1.
2 Due to ringing (that is, significant oscillation of processor utilization values)
Figure 1. Sample test results showing that, for this 64-bit HP ProLiant BL685c G6 platform, response times were acceptable
when 500 users – the optimal number – were active
For more information on the test methodology, refer to Appendix A – Testing.
Test results
This section summarizes test results for virtualized 32- and 64-bit platforms.
Table 1. Optimal numbers of users supported by a range of virtualized HP ProLiant servers
Server model     Cores   64-bit platforms               32-bit platforms
                         Config. (3)  Users  Overhead   Config. (3)  Users  Consolidation factor
DL380 G7         24 *    –            –      –          6/4          680    5.0
DL585 G5         16      4/4          242    16%        –            –      –
DL785 G5         32      8/4          430    3%         –            –      –
BL460c G6        16 *    4/4          340    16%        4/4          401    2.7
BL465c G5        8       4/2          139    10%        –            –      –
BL465c G6        12      6/2          360    0%         6/2          378    2.1
BL465c G6 (i)    12      6/2          303    0%         –            –      –
BL465c G7        24      –            –      –          6/4          645    4.6
BL680c G5        24      6/4          291    25%        6/4          483    3.5
BL685c G6 (ii)   16      4/4          404    2%         –            –      –
BL685c G6 (iii)  24      6/4          500    0%         –            –      –
BL685c G7 (i)    32      –            –      –          8/4          731    (iv)

(i) Low-power processors
(ii) Four-core processors
(iii) Six-core processors
(iv) HP was unable to perform the bare-metal testing required to obtain a consolidation factor because the Enterprise Edition
of Windows Server 2003 (deployed on the tested server) does not support 32 processor threads.
* Intel HT Technology enabled
Important:
When Intel Hyper-Threading Technology (Intel HT Technology) is enabled,
the number of processor cores seen by the operating system doubles.
3 Virtualized configuration – formatted as x/y, where x denotes the number of VMs; y denotes the number of virtual CPUs (vCPUs) allocated to
each VM. Note that, in some cases, the configuration is expressed as x/y/z, where z denotes the amount of memory (in GB) allocated to each
VM.
Best practices
To take best advantage of the benefits delivered by virtualization, you need to understand your
XenApp environment. To size cost-effective host servers when virtualizing such an environment, you
must consider the particular applications, along with the numbers of users and the specific user
profiles you wish to support.
Be aware that VM performance can vary depending on the application, the guest operating system,
and other factors. Thus, one of the biggest challenges when planning a virtualized environment is
how to address the variables that can impact host server sizing and performance, such as:
• How many VMs should be deployed on a single host?
• How many virtual CPUs should be allocated to each VM?
• How much memory is required on each VM to help eliminate memory and I/O bottlenecks?
• Would a storage area network (SAN) be a better choice than internal storage?
• Are there enough network interface cards (NICs)?
To compound this level of complexity, the XenApp environment presents unique challenges due to the
vast number of processes running simultaneously and the large memory dependencies of many of the
applications deployed. As a result, you must refine your sizing process when virtualizing such an
environment so that the host server you select can deliver the appropriate resources (processor,
memory, I/O, and network) and scalability.
Before you start planning for virtualization, however, you should first determine if your application is
a good candidate. For example, underutilized XenApp servers, XenApp data store servers, and
servers running infrastructure services may be suitable for virtualization. Conversely, XenApp servers
running resource-intensive applications or highly-utilized infrastructure servers may not be such good
candidates.
This section provides guidelines for optimizing the scalability of a virtualized HP ProLiant server.
Note:
In general terms, your virtualized server configuration (6/4/8, for example)
is considered optimal if scalability is degraded when you increase the
number of VMs, change the number of vCPUs per VM, or reduce the
amount of memory per VM.
However, it is important to point out that your environment is unique; testing is a critical part of
maximizing server scalability and consolidation ratio.
Unable to migrate to a 64-bit environment?
The ideal solution to addressing a memory-constrained 16- or 32-bit application is migration to a 64-
bit environment, where the amount of addressable memory is no longer an issue – indeed, it is not
uncommon for the latest server products to support 512 GB of memory or more. While a 64-bit
operating system can fully utilize this 512 GB, the best-case for a 32-bit operating system is support
for 128 GB4.
4 Accessing more than 4 GB of memory requires Physical Address Extension (PAE) support. For more information, refer to
http://www.microsoft.com/whdc/system/platform/server/PAE/PAEdrv.mspx.
Note that virtualizing a 64-bit platform is associated with resource overhead that decreases the
overall user density compared with a bare-metal implementation. However, this overhead is typically
acceptable given the range of benefits delivered by virtualization (such as server consolidation,
energy-efficiency, enhanced disaster recovery capabilities, and easier system maintenance).
There are a number of reasons that may make it impractical or uneconomical to migrate to a 64-bit
environment, including the following:
If you are supporting a 16-bit application that cannot be ported to a 32- or 64-bit deployment, you
have no choice but to run the existing application in a 32-bit environment (whether virtualized or
not).
If a device driver or application is incompatible with the 64-bit environment, you again have no
choice but to deploy your environment on a 32-bit edition of Windows.
It is possible to obtain excellent user densities by virtualizing your 32-bit platforms – without
making any changes to your applications. The scalability of older platforms is often restricted to
150 users or fewer; however, modern HP ProLiant servers can deliver consolidation factors as high
as 5.0. Thus, virtualization may allow you to replace five legacy servers with a single virtualized
server such as an HP ProLiant DL380 G7.
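The arithmetic behind such a consolidation claim can be sketched as follows. The 680-user figure comes from the DL380 G7 results in this paper; the ~135 users per legacy server is an illustrative assumption, not a measured value.

```python
def servers_replaced(virt_users, users_per_legacy_server):
    # Number of fully loaded legacy servers whose users fit on one
    # virtualized host, by user capacity alone (ignores headroom,
    # failover, and peak-load considerations).
    return virt_users // users_per_legacy_server

# 680 users measured on a virtualized DL380 G7 (6/4 configuration);
# ~135 users per legacy 32-bit server is an assumed figure
print(servers_replaced(680, 135))  # 5
```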
Note:
Virtualization adds complexity5 to a deployment. Despite this, if you are
unable to migrate to a 64-bit environment, virtualization may be an
appropriate choice.
Only virtualize suitable platforms
Many options are available to you when identifying good candidates for virtualization. In general,
opportunities exist for the dramatic consolidation of any under-utilized legacy servers, whatever the
makes and models, whether 32- or 64-bit platforms.
Note:
HP offers tools to help you migrate from third-party servers. For example,
HP Insight Server Migration software for ProLiant supports physical-to-
ProLiant application migrations.
In many cases, 32-bit platforms make the best candidates for consolidation. Businesses have long
been striving to extract every last ounce of productivity from their legacy servers and applications.
Indeed, many are now unable to move to 64-bit platforms due to driver incompatibilities and/or
porting issues with custom applications.
With XenServer, legacy servers can be efficiently migrated and hosted on HP’s latest server families
without sacrificing performance – or, thanks to generational improvements in processor, memory, and
I/O capabilities, performance may even be enhanced. Given the well-documented memory limitations
of the non-virtualized 32-bit platform and its inability to scale, virtualization can deliver dramatic
improvements in scalability – as much as 400%, as described in the HP white paper, “Consolidation
of x86 HP Server Based Computing environment with Citrix XenServer on HP ProLiant BL680c G5.”
Although the 64-bit platform eliminates the drastic memory limitations that plague 32-bit environments,
the requirement for emulation means that executing 32-bit workloads in a 64-bit environment places a
limit on scalability.
5 A significant learning curve may be required.
Note:
Since most of the emulation is handled at the chip level, the associated
overhead is significantly less than that associated with 16-bit Windows on
Windows (WoW) emulation.
Indeed, it is possible to get better performance when virtualizing a 32-bit workload on a 32-bit
platform than when virtualizing the same workload on an equivalently configured 64-bit platform.
Contrast the results HP obtained when running a 32-bit workload on an HP ProLiant BL680c G5
server blade: in a virtualized 32-bit environment, 483 users were supported; in a virtualized 64-bit
environment, 291.
Even though there may be significant overhead when virtualizing a 32-bit workload on a 64-bit
platform6, consolidating older servers onto newer machines can still deliver substantial benefits.
In addition, reducing the number of physical machines can reduce costs associated with power,
cooling, data center real estate, and licensing. In short, the savings start to add up.
Consider the cost
When planning your virtualized environment, consider the costs involved.
As shown in the Test results, HP tested a broad range of configurations, demonstrating that
virtualization overhead can vary significantly (0% – 25% for optimal configurations) in the 64-bit
environment. While the level of overhead may be significant, careful tuning can optimize scalability,
thus maximizing the return on your investment.
Note:
In a 32-bit environment, modern HP ProLiant servers support more users
when virtualized compared to a bare-metal configuration. Thus, there is
effectively no virtualization overhead.
Remember, however, that your operating system licensing costs are directly related to the number of
VMs deployed. In practice, it may be more beneficial to minimize licensing costs than deploy the
optimal number of VMs. For example, testing performed on an HP ProLiant DL785 G5 server
demonstrated that an 8/4 configuration was able to support 430 users, while a 4/8 configuration
could support 420. It is hard to imagine how support for 10 more users could justify the cost of the
four additional OS licenses that would be required.
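The trade-off can be expressed as users supported per OS license; the figures below are the DL785 G5 results quoted above.

```python
def users_per_license(total_users, vm_count):
    # Each VM needs its own OS license, so configurations can be
    # compared by users supported per license rather than raw users.
    return total_users / vm_count

# DL785 G5 figures from the text:
# 8 VMs / 4 vCPUs each -> 430 users; 4 VMs / 8 vCPUs each -> 420 users
print(users_per_license(430, 8))  # 53.75
print(users_per_license(420, 4))  # 105.0
```

The 4/8 configuration supports nearly twice as many users per license, which is why the marginal 10 users rarely justify the four extra licenses.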
Do not oversubscribe vCPUs
It might seem that the more virtual CPUs (vCPUs) you allocate to a particular VM, the more
users the VM will be able to support. However, HP has found that oversubscribing vCPUs (that is,
allocating more vCPUs in total than there are processor cores) tends to degrade server scalability
because processor resources must then be shared between VMs.
Fully utilize CPU resources
Typically, a CPU core can only run a single thread at any one time. However, Intel HT Technology
allows a core to support two threads; indeed, a quad-core processor with Intel HT Technology
enabled is recognized by the operating system to be an eight-core processor (see Enabling Intel HT
Technology).
6 The Test results section provides examples of this overhead.
Each core is capable of supporting a virtual CPU (vCPU).
While it is a best practice not to oversubscribe vCPUs, you should utilize all available processor cores
to help maximize performance. Thus, if you are virtualizing an HP ProLiant server featuring two quad-
core AMD Opteron processors, you should deploy eight vCPUs; however, if you are virtualizing a
server that features two quad-core Intel Xeon processors, you can deploy 16 vCPUs when Intel HT
Technology is enabled.
Creating balance
One of the keys to optimizing server scalability is achieving a balance between the number of VMs
you configure in a particular server and the number of vCPUs allocated to each VM. For example,
consider the following virtualized HP ProLiant DL585 G5 server configurations tested by HP:
• 8/2: 184 users
• 4/4: 242 users (that is, 30% more)
Thus, reducing the number of VMs and doubling the number of vCPUs per VM increased the
scalability of this particular server by 30%.
HP offers the following guidelines to provide a starting point when you are configuring VMs:
• 8 cores: 4/2
• 16 cores: 4/4
• 24 cores: 6/4
• 32 cores: 8/4
HP strongly recommends carrying out performance tests to determine the ideal configuration for your
particular environment.
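These starting points can be captured in a small lookup that also enforces the no-oversubscription rule from the previous section. This is a sketch for planning purposes, not an HP tool.

```python
# HP's suggested starting points, keyed by physical core count:
# value is (number_of_VMs, vCPUs_per_VM)
STARTING_CONFIGS = {8: (4, 2), 16: (4, 4), 24: (6, 4), 32: (8, 4)}

def starting_config(cores):
    # Return the suggested VM/vCPU split for a core count, verifying
    # that the total vCPU allocation never oversubscribes the cores.
    vms, vcpus = STARTING_CONFIGS[cores]
    assert vms * vcpus <= cores, "configuration would oversubscribe vCPUs"
    return vms, vcpus

print(starting_config(24))  # (6, 4)
```

Note that with Intel HT Technology enabled, the core count seen by the operating system doubles, so the lookup would be keyed on logical rather than physical cores.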
Enabling Intel HT Technology
If you are using a later-generation7 Intel Xeon-powered HP ProLiant server, you can enable Intel HT
Technology to double the number of processor cores available to VMs.
Note:
Due to the associated overhead, you should not enable earlier
implementations of Intel HT Technology.
HP demonstrated the benefits of enabling Intel HT Technology on a 2P/12C8 HP ProLiant DL380 G7
server9. A 6/2/6 configuration was able to support 500 users in a XenApp environment, as
shown in Figure 2.
7 G6 or later
8 Signifying support for two processors (P) and a total of 12 cores (C)
9 For more information, refer to the HP white papers, “Performance of HP ProLiant DL380 G7 with Intel Xeon Processor X5680 (3.33 GHz) in 32-
and 64-bit HP SBC environments” and “Performance of HP ProLiant DL380 G7 with Intel Xeon Processor X5680 (3.3 GHz) in a 32-bit
virtualized HP SBC environment.”
Figure 2. Configured to use 12 cores, this virtualized server was able to support 500 users
Using Intel HT Technology effectively increased the number of available cores from 12 to 24. Taking
full advantage of these additional resources, HP doubled the number of vCPUs allocated to each VM.
Note:
This is not considered to be over-subscription.
As shown in Figure 3, the resulting 6/4/6 configuration was able to support 680 users, an increase
of 36%.
Figure 3. Doubling the number of vCPUs (from 12 to 24) fully utilized the CPU resources of this virtualized server and increased
the number of users supported by 36%
Avoid spikes in processor utilization
To avoid spikes in processor utilization, ensure your VMs are online before applying the workload.
Do not simultaneously add large numbers of users; if possible, balance the workload across your
VMs.
Allocate sufficient memory to each VM
Allocating sufficient memory to each VM is also important. For example, although test systems are
often configured with less, HP recommends configuring at least 8 GB for each production VM
(whether running 32- or 64-bit applications). This allocation should accommodate a typical workload
and provide some space for growth. Allocating 8 GB to 32-bit VMs also means that you will not have
to physically upgrade your host servers when you migrate to a 64-bit platform, which may impose an
additional memory overhead.
However, if you are running an operating system that cannot support 8 GB (such as the 32-bit version
of Windows Server 2003 Standard Edition, which can only support 4 GB), you should allocate less
memory to each VM.
Important:
To maximize scalability, HP does not recommend deploying page files
within VMs. However, in a test environment, HP deploys page files within
VMs to provide consistency with bare-metal server configurations.
Note that there are some benefits to deploying page files within VMs. If a
VM were to trap or BSOD10, for example, you would be unable to obtain a
dump file for analysis purposes unless a local page file exists.
Although XenServer does not currently allow you to over-subscribe memory, HP does not believe this
capability would add value in a XenApp environment. You merely need to ensure your total VM
memory allocation does not exceed the size of physical memory; individual VM allocations must not
exceed the limit supported by the guest operating system. Remember to reserve approximately 1 GB
for the hypervisor.
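These allocation rules lend themselves to a quick sanity check. The following sketch is hypothetical (the function name and parameters are ours, not an HP or Citrix utility); it simply encodes the constraints above: no memory over-subscription, roughly 1 GB reserved for the hypervisor, and a per-VM cap set by the guest operating system.

```python
def validate_memory_plan(physical_gb, per_vm_gb, num_vms, guest_os_max_gb=4):
    """Check a VM memory plan against the rules above (hypothetical helper).

    - XenServer does not over-subscribe memory, so the total allocation
      must fit in physical RAM minus ~1 GB reserved for the hypervisor.
    - No single VM may exceed the guest OS limit (e.g. 4 GB for the
      32-bit version of Windows Server 2003 Standard Edition).
    Returns a list of problems; an empty list means the plan fits.
    """
    problems = []
    if per_vm_gb > guest_os_max_gb:
        problems.append("per-VM allocation exceeds guest OS limit")
    if per_vm_gb * num_vms > physical_gb - 1:  # reserve ~1 GB for hypervisor
        problems.append("total allocation exceeds physical RAM minus reserve")
    return problems

# A 64 GB host running six VMs at 6 GB each fits comfortably:
# 6 x 6 = 36 GB against 63 GB available.
```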
However, allocating insufficient memory resources can lead to a significant performance penalty.
Example of insufficient memory resources
To determine the impact of allocating insufficient memory to VMs, HP tested an HP ProLiant BL465c
G7 server blade11 in a 32-bit XenApp environment when configured as follows:
6/4/4
6/4/6
Figure 4 shows the scalability of the 6/4/4 configuration.
10 A reference to the so-called blue screen of death
11 For more information, refer to the HP white paper, “Performance of HP ProLiant BL465c G7 with AMD Opteron processor Model 6174 (2.2 GHz) in a 32-bit virtualized HP SBC environment.”
Figure 4. Scalability of a 6/4/4 configuration
Lack of available memory resources in the 6/4/4 configuration caused the number of stopped
sessions to increase exponentially when 445 Heavy Users were active.
Processor utilization never reached 80%, the criterion typically used to characterize server scalability.
For this particular server configuration, HP determined that the optimal number of users was 435 (445
active sessions less 10 stopped sessions), limited by lack of memory rather than processor resources.
Figure 5 shows what happened to an individual VM when it ran low on memory resources.
Figure 5. When this VM ran low on memory resources, disk idle time decreased exponentially, limiting the number of users that
could be supported (Note that stopped sessions are not included)
When memory size was increased from 4 to 6 GB, scalability increased significantly, as shown in
Figure 6.
Figure 6. Scalability of a 6/4/6 configuration
Average processor utilization reached 80% when 650 Heavy Users were active; the number of
stopped sessions began to increase exponentially when 630 – 680 Heavy Users were active.
Thus, HP determined that the optimal number of users was 645 (650 users less five stopped sessions).
Increasing the memory allocated to each VM from 4 GB to 6 GB allowed processor resources to be
fully utilized, resulting in a 48% increase in scalability, as shown in Figure 7.
Figure 7. Memory allocation comparison
You should always monitor I/O performance in a production environment to determine if there are
potential bottlenecks. Take particular care in the following use cases:
Applications that tax I/O subsystems do not scale as well in a virtualized environment as in a bare-metal configuration.
If you have virtualized the existing workload on a modern 32-bit platform, be aware that the server may be supporting significantly more users than before; potential I/O bottlenecks may now be exposed.
Avoiding disk I/O bottlenecks
To help you avoid disk I/O bottlenecks, Microsoft recommends using the Windows performance
monitoring tool, Perfmon, to check the following metrics12:
%Idle time – Idle times for logical and physical drives should average at least 50%
Average Disk Seconds/Read and Average Disk Seconds/Write – The average time taken to
complete a read or write should average less than 25 milliseconds, with peak times of less than 50
milliseconds
If the above conditions specified by Microsoft cannot be met, a disk I/O bottleneck is likely.
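The Microsoft thresholds above can be expressed as a simple predicate. This is an illustrative sketch (the function name and parameters are ours); in practice you would feed it averages and peaks collected from the Perfmon counters listed above, with the disk-second values expressed in seconds.

```python
def disk_bottleneck_suspected(idle_pct_avg, read_sec_avg, write_sec_avg,
                              read_sec_peak, write_sec_peak):
    """Apply Microsoft's Perfmon disk thresholds (hypothetical helper).

    % Idle Time should average at least 50%; Avg. Disk sec/Read and
    Avg. Disk sec/Write should average under 0.025 s (25 ms) with
    peaks under 0.050 s (50 ms). Returns True if any rule is broken.
    """
    return (idle_pct_avg < 50.0
            or read_sec_avg >= 0.025 or write_sec_avg >= 0.025
            or read_sec_peak >= 0.050 or write_sec_peak >= 0.050)
```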
Note:
In the event of an I/O bottleneck, you should tune the disk subsystem,
decrease the number of users or applications, or add memory to the server.
Using write cache
HP Smart Array controllers include a data cache, memory that can be utilized to temporarily cache
data being written to or read from disk. Since access to this memory is significantly faster than disk
access, the cache can enhance overall server performance, particularly during login operations.
Write cache is of particular interest in a XenApp environment. After buffering all the data associated
with a particular write command, the Smart Array controller indicates to the XenApp server that the
data transfer to the disk is complete – even though the data is still being written to disk. This frees up
the server’s processor to perform other tasks and accelerates data throughput.
12 For more information, visit the Microsoft website.
While HP has not yet characterized flash-backed write cache (FBWC) performance in the XenApp
environment, testing performed on battery-backed write cache (BBWC) demonstrates that
performance enhancements due to write cache may be most significant when the XenApp server is
carrying out log-intensive operations and/or when significant page file write operations are
necessary, such as during user logins. Performance gains have ranged from 50% to 250%13; actual
results will vary depending on the application(s) involved and your particular XenApp environment.
Monitor network performance
Historically, XenApp workloads have more issues with network latency than with raw bandwidth due
to efficiencies associated with Citrix Independent Computing Architecture (ICA).
Now that a single physical server is being used to host workloads that, prior to virtualization, were
run on multiple servers, be sure to monitor the host for any network bottlenecks that may have been
introduced.
There are a number of ways in which you can enhance network performance, such as deploying
additional network ports, implementing network interface card (NIC) teaming, or using HP Virtual
Connect technology.
Enhance availability
VMs are flexible, allowing you to readily implement the level of availability you need. Moreover, you
can enhance availability by utilizing a SAN created from HP StorageWorks product offerings, with
capabilities that may include:
Multiple paths for redundancy
Automatic path failover
High-availability cluster support
Example
Consider an environment in which a fully-configured HP ProLiant BL460c G6 server blade is hosting
eight VMs. In the event of a server failure, all eight VMs would fail.
An alternative would be to deploy two BL460c G6 server blades, each hosting four VMs. Now the
loss of a server would only impact four VMs; if desired, you could manually import the downed VMs
to the surviving server and restart them.
If, however, the two BL460c G6 server blades are in a pooled configuration with shared storage and
are utilizing the XenServer High Availability (HA) feature, the downed VMs could be automatically
restarted on the surviving server. Moreover, having shared storage allows running VMs to be moved
between hosts using the XenServer live migration (XenMotion) feature.
XenMotion helps to eliminate VM downtime, freeing up the server administrator to perform repairs or
upgrades to the original host.
For a business-critical environment, you might consider adding a third BL460c G6 server blade to the
pool, allowing each host to support two to three VMs. Using the live migration and workload
balancing capabilities of XenServer, VMs can automatically be moved between hosts to achieve the
best balance and optimize resource utilization on each host. In this configuration, one server can be
taken offline with little effect on overall pool performance.
13 For more information, visit the HP website.
Balance the distribution of VMs
Internal bottlenecks created by poor VM tuning can burden the host system, particularly if multiple
identical VMs are deployed on the same host server. For example, if all the VMs on a host are
memory-constrained, a tremendous burden is placed on the server’s disk I/O system. To avoid this
scenario, ensure VMs are properly configured and balanced within your environment. Avoid memory
swapping at all costs.
Optimize resource use
In addition to suitable sizing, optimal VM performance requires XenServer and guest operating
systems to be appropriately configured. Do not overlook the execution of screensavers or other
resource-intensive applications; carefully scrutinize your VMs to save precious resources for users and
applications. This requirement, which should be well-known in conventional XenApp environments,
becomes even more important after virtualization.
Optimize the XenServer kernel
HP typically makes no changes to the XenServer kernel to optimize performance in the XenApp test
environment. However, if your workload is memory-intensive and you experience scalability or
reliability issues, you may need to increase the amount of memory allocated to domain zero (Dom0).
If necessary, you can increase the RAM allocated to Dom0 in the XenServer /boot/extlinux.conf file
to accommodate additional users.
For more information on Dom0, refer to http://wiki.xensource.com/xenwiki/Dom0.
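Before changing the setting, it can be useful to read the current Dom0 memory value out of the boot configuration. The sketch below is a hypothetical, read-only helper; it assumes the file uses the conventional `dom0_mem=` token on the Xen append line. Always back up the file and consult the XenServer documentation before editing it by hand, and note that a reboot is required for changes to take effect.

```python
import re

def read_dom0_mem(conf_path="/boot/extlinux.conf"):
    """Report the Dom0 memory setting from XenServer's boot config.

    Scans for a 'dom0_mem=' token and returns the raw value
    (e.g. '752M'), or None if the parameter is not explicitly set.
    Read-only: this helper never modifies the file.
    """
    with open(conf_path) as f:
        for line in f:
            match = re.search(r"dom0_mem=([0-9]+[MG]?)", line)
            if match:
                return match.group(1)
    return None
```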
Enhance manageability
Consider the following caveats concerning VM management:
While many management tools perceive VMs to be, in effect, the same as physical machines,
remember that you will also need to manage the virtualization layer. In order to minimize the
number of tools required to manage your environment, a single, integrated platform for physical
and virtual machines is recommended (such as HP ProLiant servers running XenServer and
XenApp).
Consider enabling any onboard hardware management and notification capabilities so that you
can receive pre-failure alerts, allowing you to migrate VMs to another physical host prior to failure.
Since it may be difficult to monitor application rather than server performance in a virtualized
environment, resist the temptation to blindly propagate VMs in response to performance issues. You
may be trading your silos of physical machines for silos of VMs.
Since virtualization makes it so easy to replicate services, you may find that, without even
increasing the number of physical machines, you are now managing a large number of new
servers. These additional instances translate to more patches, more management, and more monitoring.
Since VMs often host seldom-used applications, they may be off for long periods of time.
Management tools may not be able to turn on these VMs to install patches, creating a potential
security risk.
The remainder of this paper describes how you can use virtualization to enhance the scalability of
today’s 4P servers, which may be limited by the capabilities of Windows Server 2003. In addition,
an example of the benefits of consolidating a legacy environment is provided.
Enhancing the scalability of a modern 4P server
HP has discovered that, in a 64-bit environment, the Enterprise Edition of Windows Server 2003 may
not be able to accommodate the number of processor cores featured in today’s 4P HP ProLiant
servers.
Note:
The limited scalability of 32-bit Windows Server 2003 platforms is well
known.
This section compares the limited scalability of an HP ProLiant BL685c G7 server blade when
deployed as a bare-metal 64-bit platform14 against the significant improvement when deployed as a
virtualized 32-bit platform15.
Bare-metal 64-bit platform
Figure 8 shows the number of Heavy Users supported by a 4P HP ProLiant BL685c G7 server blade in
a 64-bit test environment.
14 Due to its limited scalability, HP did not publish performance test results for the 64-bit platform.
15 For more information, refer to the HP white paper, “Performance of HP ProLiant BL685c G7 with AMD Opteron processors Model 6128 HE (2.0 GHz) in a 32-bit virtualized HP SBC environment.”
Figure 8. The bare-metal 64-bit platform can only provide optimal support for 208 Heavy Users
Processor utilization, the metric typically used by HP when determining the optimal number of users
supported by a particular server, barely reached 80% during this test run. However, the number of
stopped sessions began to increase exponentially when 208 users were active.
Although over 600 sessions were started during the test run, a large number of these had already
stopped when the test concluded.
Thus, HP concluded that the maximum number of users supported by a bare-metal HP ProLiant BL685c
G7 server blade in this 64-bit test environment was 208. By comparison, an HP ProLiant BL685c G6
server blade was able to support 500 users in the same environment.
That a 32-core server (G7) would support significantly fewer users than a 24-core server (G6) is
counter-intuitive. However, the disparity can be explained by HP’s use of the Enterprise Edition of
Windows Server 2003 to run the tested server.
This edition cannot support the execution of 32 threads concurrently, even under moderate loads. As
a result, you may wish to consider the following upgrade options:
Datacenter Edition of Windows Server 2003 – maximum of 32 cores
Windows Server 2008
– Web or Standard Edition – maximum of four processors
– Enterprise Edition – maximum of eight processors
– Datacenter Edition – maximum of 64 processors
Note:
As a result of the recent discovery that scalability may be limited by high
core density in a 64-bit environment, HP is actively upgrading the test
harness from Windows Server 2003/Office 2003 to Windows Server
2008/Office 2007.
Due to this software performance issue in the current test harness, HP has
not published a report of the bare-metal testing of the HP ProLiant BL685c
G7 server blade in a 64-bit test environment. However, the same
methodology was used as for the G6 model of this server.
Kernel instability
As shown in Figure 9, % Privilege Time values spiked after 390 sessions had been started, indicating
that the kernel had become unstable, unable to concurrently execute the requisite number of threads.
In turn, the length of the processor queue also spiked.
Figure 9. The kernel became unstable after 390 sessions had been started
Thus, the scalability of the HP ProLiant BL685c G7 server blade was limited when deployed as a 64-
bit platform. The following section highlights the improvement that can be achieved when this server is
deployed as a virtualized 32-bit platform.
32-bit platform
HP tested the HP ProLiant BL685c G7 server blade as a virtualized 32-bit platform and, for
comparison purposes, as a bare-metal 32-bit platform.
Bare-metal 32-bit platform
Figure 10 shows that, as expected, the scalability of the bare-metal platform was limited by lack of
system PTEs.
Figure 10. The bare-metal HP ProLiant BL685c G7 server blade was able to support 127 users as a 32-bit platform
Virtualized 32-bit platform
Figure 11 shows that the virtualized 32-bit platform was able to fully utilize the processor resources of
the HP ProLiant BL685c G7 server blade.
Figure 11. The HP ProLiant BL685c G7 server blade was able to support 731 users as a virtualized 32-bit platform
Thus, in a 32-bit environment, the HP ProLiant BL685c G7 server blade was able to support
significantly more users16 when virtualized.
Moreover, since each guest operating system only had to support four CPU cores, VMs were able to
perform without kernel limitations, unlike the bare-metal configurations, which featured 32 cores –
more than the operating system was able to support.
16 576%
Results indicated that, due to these kernel limitations, the virtualized 32-bit platform was able to
support 351% more users than the bare-metal 64-bit platform. Thus, the solution for a dense 4P server
that appears to under-perform as a bare-metal 64-bit platform may be to virtualize this server and
deploy it in a 32-bit environment, as shown in Figure 12.
Figure 12. Scalability comparison
Consolidation example
To highlight the benefits of consolidation, HP determined how many modern server blades it would
take to accommodate the workload supported by a number of legacy blades.
The challenge was as follows:
Support at least 1,500 Microsoft Office 2003 users from a single HP BladeSystem c7000 enclosure
Replace 16 HP ProLiant BL460c G1 server blades, each able to support 96 users17.
To replace the legacy blades, HP selected the 2P HP ProLiant BL465c G7 server blade, which was
configured as follows:
AMD Opteron processors Model 6174 (2.2 GHz)
– 12 cores
– 12 MB shared L3 cache
64 GB RAM
HP Smart Array P410i controller with RAID 0
– 2 x 146 GB 10,000 rpm SAS drive
– 1 GB FBWC
HP NC551i Dual port FlexFabric 10 Gb Converged Network Adapter
HP determined that a virtualized HP ProLiant BL465c G7 server blade can support 645 users in a
XenApp environment; thus, three18 of these blades can be used to accommodate the workload
previously supported by 16 legacy blades, as shown in Figure 13.
In fact, the three modern blades were able to support 25% more users than the legacy systems, while
leaving 13 slots available for future expansion.
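The consolidation arithmetic behind these figures can be written out directly (an illustrative calculation using the capacities quoted above):

```python
import math

# Workload to replace: 16 legacy BL460c G1 blades at 96 users each
legacy_capacity = 16 * 96              # 1,536 users
users_per_modern_blade = 645           # measured BL465c G7 capacity

# Smallest number of modern blades covering the legacy workload
modern_blades = math.ceil(legacy_capacity / users_per_modern_blade)
modern_capacity = modern_blades * users_per_modern_blade

# modern_blades comes out at 3, giving 1,935 users – roughly a quarter
# more than the legacy capacity – with 13 enclosure slots left free.
```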
17 16 x 96 = 1,536 supported users
18 3 x 645 = 1,935 supported users
Figure 13. 16 legacy systems replaced by three virtualized HP ProLiant BL465c G7 server blades, with capacity to spare
Note:
The numbers of users projected for the above HP ProLiant BL465c G7
server blades do not take into consideration factors such as third-party
agents that typically consume a modest amount of system resources (such as
virus scanning, software provisioning, remote administration, and firewalls).
Be aware that varying the user profile and workload can have a significant
impact on scalability.
As with the legacy implementation, it is assumed that management servers
are installed elsewhere.
Minimizing utility costs
The cost of powering and cooling servers is high, constituting a significant – if not the most significant
– portion of total IT infrastructure costs. Indeed, more and more studies are indicating that server
hardware is no longer the leading data center expense. For example, the purchase price of a new,
1U server has already been exceeded by the capital cost of the power and cooling infrastructure
needed to support it and will soon be exceeded by its lifetime energy costs (for more information,
refer to HP ActiveAnswers).
Consolidating your workload on a small number of high-performance HP ProLiant server blades can
significantly reduce your annual utility costs. If, for example, the cost of electricity were $0.10 per
kWh, annual utility costs for the baseline HP ProLiant BL460c G1 server blade configuration would be
$5,168 for 24 x 7 operation. Utility costs for the three HP ProLiant BL465c G7 server blades are
significantly lower19, as shown in Figure 14.
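The utility-cost arithmetic is simply power draw times hours of operation times the electricity rate. The helper below is illustrative; at the footnoted 1,935 W, the three consolidated blades would cost roughly $1,695 per year, and the quoted $5,168 for the legacy configuration implies a draw of roughly 5,900 W at the same rate.

```python
def annual_utility_cost(power_watts, dollars_per_kwh=0.10, hours_per_year=24 * 365):
    """Annual electricity cost for continuous (24x7) operation.

    Illustrative helper: cost = kW x hours x rate. Real costs also
    depend on cooling overhead and time-of-day pricing.
    """
    return power_watts / 1000 * hours_per_year * dollars_per_kwh

consolidated = annual_utility_cost(1935)     # ~ $1,695 per year
legacy_estimate = annual_utility_cost(5900)  # ~ $5,168 per year
```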
19 Based on high-performance technical computing (HPTC) requirements of 1,935 W
Note:
Power requirements for the legacy and consolidated configurations were
estimated using the HP BladeSystem Power Sizer (BPS).
Based on actual component-level power measurements for systems stressed
to their maximum capabilities, the BPS helps you plan a particular HP
BladeSystem deployment, providing power and cooling requirements, cost,
and a detailed bill of material.
Figure 14. Annual utility costs for legacy configuration are significantly higher
Appendix A – Testing
Appendix A describes the methodology used by HP to characterize the optimal number of users
supported by VMs on HP ProLiant servers.
Important:
As with any laboratory testing, the performance metrics quoted in this
paper are idealized. In a production environment, these metrics may be
impacted by a variety of factors.
HP recommends proof-of-concept testing in a non-production environment
using the actual target application as a matter of best practice for all
application deployments. Testing the actual target application in a
test/staging environment identical to, but isolated from, the production
environment is the most effective way to characterize system behavior.
This section provides more information on test tools, user profile, and test scenarios.
Test tools
To facilitate the placement and management of simulated loads on a XenApp server, HP used
Terminal Services Scalability Planning Tools (TSScaling), a suite of tools developed by Microsoft to
help organizations with Windows Server 2003 Terminal Server capacity planning.
Table A-1 describes these tools.
Table A-1. Components of TSScaling

Automation tools
Robosrv.exe – Drives the server-side of the load simulation
Robocli.exe – Helps drive the client-side of the load simulation

Test tools
Qidle.exe – Determines if any scripts have failed and require operator intervention
Tbscript.exe – A script interpreter that helps drive the client-side load simulation

Help files
TBScript.doc – Terminal Server bench scripting documentation
TSScalingSetup.doc – A scalability test environment set-up guide
TSScalingTesting.doc – A testing guide
More information
Roboserver (Robosrv.exe) and Roboclient (Robocli.exe): Terminal Server capacity planning
TSScaling: Windows Server 2003 Terminal Server Capacity and Scaling
User profile
To simulate a typical workload in a XenApp environment, HP selected the Heavy User profile. Heavy
Users (also known as Structured Task Workers) tend to open multiple applications simultaneously and
remain active for long periods. Heavy Users often leave applications open when not in use.
Table A-2 outlines the activities performed by these users.
Table A-2. Activities incorporated into the test script

Access – Open a database, apply a filter, search through records, add records, and delete records.
Excel – Open, print, and save a large spreadsheet.
InfoPath – Enter data20 into a form; save the form over an existing form.
Outlook – First pass: Email a short message. Second pass: Email a reply with an attachment.
Outlook_2 – Create a long reply.
PowerPoint – Create a new presentation, insert clipart, and apply animation. View the presentation after each slide is created.
PowerPoint2 – Open and view a large presentation with heavy animation and many colors and gradients.
Word – Create, save, print, and email a document.
Test scenarios
HP tested a bare-metal server to provide a baseline, then tested virtualized configurations to compare
their scalability.
HP used the same basic methodology, tools, and workload to characterize the performance and
scalability of the bare-metal server and of VMs running on that server.
Obtaining the baseline
To characterize the bare-metal performance of the non-virtualized server, HP used a workload based
on the activities described in User profile.
Testing was initiated by running the particular workload with a group of simulated users; start times
were staggered to eliminate authentication overhead. After these sessions finished, HP added another
group of users, then repeated the testing. Further users were added until the optimal number (see
Performance and scalability metrics) was reached.
Characterizing VM performance
HP used a similar methodology to characterize the aggregate performance of VM configurations on
the tested server. Note, however, that when characterizing VM performance, you must also consider
the demands of the hypervisor: if the number of user sessions is increased too quickly or too many
sessions are initiated concurrently, CPU utilization on the physical server can increase dramatically.
20 Data entry for Office InfoPath 2003 requires significant processor resources
So as not to over-saturate the tested server, HP ensured VMs were online before testing began and
controlled the number of users being added to VMs, thus minimizing spikes in processor utilization. To
ensure VMs approached saturation at the same rate, HP adopted a round-robin approach when
adding users.
Performance and scalability metrics
While the Office 2003-based workload was running, HP monitored a range of Windows
Performance Monitor (Perfmon) counters to characterize the performance and scalability of the bare-
metal server and VMs. HP also used canary scripts featuring Office 2003-based activities to establish
the number of users that could be supported before user response times became unacceptable.
HP typically uses the Perfmon % Processor Time counter to establish the optimal number of users
supported by a XenApp server – by definition, the number of users active when processor utilization
reaches 80%. At this time, a limited number of additional users or services can be supported;
however, user response times may become unacceptable.
In a 32-bit XenApp environment, System Page Table Entries (PTEs) on a bare-metal server may
become exhausted before processor utilization reaches 80% due to the well-known scalability
limitations of the 32-bit Windows platform.
To validate metrics obtained from Perfmon, HP uses canary scripts to characterize response times for
a range of discrete activities, such as the time taken to invoke an application or for a modal box to
appear. By monitoring response times – a very practical metric – as more and more users log on, HP
has been able to demonstrate that these times are acceptable when the optimal number of users (as
determined using Perfmon counter values) is active.
With some tested servers, response times begin to increase before processor utilization reaches 80%.
In such cases, HP prefers to be conservative, specifying as optimal the number of users supported
when response times first become unacceptable (that is, these times begin to increase markedly over a
baseline level).
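The decision rule HP describes – take the user count at which average processor utilization first reaches 80%, unless response times degrade earlier – can be sketched as follows. This is an illustrative simplification; the function name and the sample format are ours, and the canary-script response-time check would cap the result further.

```python
def optimal_users(samples, cpu_threshold=80.0):
    """Return the optimal user count from (active_users, avg_cpu_pct) samples.

    Picks the user count at the first sample where average processor
    utilization reaches the 80% criterion. If the threshold is never
    reached (e.g. a memory- or PTE-limited run), falls back to the
    last sample; such runs need separate analysis.
    """
    for users, cpu_pct in samples:
        if cpu_pct >= cpu_threshold:
            return users
    return samples[-1][0]
```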
Characterizing the optimal number of users in the virtualized environment
HP ran Perfmon on each VM to log the CPU resources consumed during a particular scenario.
Individual results were also aggregated to provide a single view of the capabilities of the tested server
when virtualized. However, while plots of Perfmon counter values tend to be relatively smooth in a
non-virtualized environment, when VMs are deployed, hypervisor activity introduces sporadic
transients that make raw data difficult to interpret. By utilizing moving averages, HP was able to
smooth out these transients, creating a view of processor consumption that helped characterize the
optimal number of users supported by VMs in this environment.
After a particular scenario was run, Perfmon logs for all VMs were saved to a single file. Office Excel
was then used to plot a moving average of 10 sequential log values.
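The same smoothing can be reproduced outside of Excel. The sketch below is a plain 10-sample moving average (our own helper, equivalent to the moving average HP plotted over the merged Perfmon logs):

```python
def moving_average(values, window=10):
    """Smooth Perfmon counter samples with a simple moving average,
    mirroring the 10-sample average plotted in Excel. Returns one
    averaged point per full window of consecutive samples."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]
```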
While this methodology is less precise than that used by HP in a bare-metal environment, it provides
significant insight into overall system performance and the performance of individual VMs. By
analyzing Perfmon results in conjunction with canary response times, HP was able to specify the
optimal aggregate number of users supported by VMs in each scenario.
Test topology
Figure A-1 illustrates the HP Server Based Computing (HP SBC) test environment.
Figure A-1. The tested environment – the HP ProLiant DL785 G5 server is shown
For more information
HP ProLiant servers: http://www.hp.com/go/proliant
HP ActiveAnswers solution area for HP SBC, including Citrix XenApp and Microsoft Terminal Services: http://www.hp.com/solutions/activeanswers/hpsbc
Citrix XenApp: http://www.citrix.com/site/PS/products/feature.asp?familyID=19&productID=186&featureID=4110
Citrix XenServer: http://h71019.www7.hp.com/ActiveAnswers/cache/457122-0-0-225-121.html
HP Sizer for Citrix XenApp and Microsoft Terminal Services: http://h71019.www7.hp.com/ActiveAnswers/cache/70245-0-0-0-121.html
HP Solution Centers: http://www.hp.com/go/solutioncenters
HP Services: http://www.hp.com/hps/
AMD Opteron processors: http://www.amd.com/us/products/server/Pages/server.aspx
Intel Xeon processors: http://www.intel.com/products/server/processors/index.htm
To help us improve our documents, please provide feedback at
http://h20219.www2.hp.com/ActiveAnswers/us/en/solutions/technical_tools_feedback.html.
© Copyright 2009 - 2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
Microsoft and Windows are U.S. registered trademarks of Microsoft Corporation. AMD Opteron, AMD Virtualization, and AMD-V are trademarks of Advanced Micro Devices, Inc. Intel and Xeon are trademarks of Intel Corporation in the U.S. and other countries.
4AA2-5115ENW, Created March 2009; Updated September 2010, Rev. 2