EMEA HEADQUARTERS

Tour Franklin 92042 Paris La Défense Cedex France +33 [0] 1 47 73 12 12 info@orsyp.com www.orsyp.com

AMERICAS HEADQUARTERS

300 TradeCenter 128 Suite 5690 Woburn, MA 01801 USA +1 [781] 569 5730 [email protected] www.orsyp.com

White Paper

The Truth Behind VMware Virtual Infrastructure Performance

Yann Guernion, VP Technology

© ORSYP 2009 ▪ All Rights Reserved PAGE 2 / 12

ABSTRACT

Virtual infrastructures offer many advantages, including the optimization of physical resource use and the facilitation of server deployment. At the same time, virtualization brings about a new level of abstraction, greatly increasing the risk of material resource saturation and necessitating the use of more precise performance management tools. Virtualization may reduce the costs of adding physical equipment, but it increases the complexity of server infrastructure and the risk of saturation.

As an example, most performance monitoring solutions today cannot supply the information needed for a clear view of malfunctions emerging within a VMware VI3 virtualized environment. With Sysload’s new VMware guest agents embedded in each OS, one can now easily see the micro-phenomena occurring well before actual service levels start to degrade. Additionally, this new technology provides the industry’s first solution to deliver absolutely accurate timekeeping metrics for the VI3 guest machines.

Performance management has become an increasingly important strategic issue as the need to maximize server utilization continues to intensify. We are now experiencing a return to the reality and importance of precise performance management, reminiscent of the days of mainframe computing.

TABLE OF CONTENTS

1. VIRTUALIZATION REALITY
2. THE REAL CHALLENGE
3. PERFORMANCE MANAGEMENT OF VMWARE INFRASTRUCTURE 3
    3.1 INFORMATION GATHERING
    3.2 BLACKBOX PERFORMANCE MANAGEMENT
4. SYSLOAD FOR VMWARE INFRASTRUCTURE 3
    4.1 A CONSOLIDATED VIEW OF THE DATA CENTER
    4.2 UNEQUALLED GRANULARITY OF ANALYSIS
    4.3 AUTOMATIC DETECTION OF MICRO-SATURATION INCIDENTS
    4.4 REAL-TIME CORRECTION OF TIMEKEEPING DISCREPANCY
    4.5 SUPPORT FOR PLANNING SERVER DEPLOYMENTS
    4.6 HOMOGENEOUS INDICATORS
    4.7 NEGLIGIBLE IMPACT ON VMWARE STRUCTURE
    4.8 INTERACTIVE AND FLEXIBLE CONSOLE
5. ABOUT ORSYP

1. Virtualization Reality

The last few months have seen a frenzy of interest regarding server virtualization technologies. Today it seems like this trend is here to stay, as a number of software developers, including VMware, Microsoft, IBM, Sun, HP, Oracle, Citrix and Parallels promote their new virtualization solutions.

Nevertheless, one might reasonably question how far an organization should consolidate its infrastructure while the cost of equipment continues to decrease.

But putting aside the cost of equipment, there are other physical constraints that affect most large companies’ data centers that make virtualization an interesting option for them. For starters, most server rooms are bursting at the seams, with no room to add additional equipment. Additionally, electrical supply and climate control capabilities may also be at their limits. Meanwhile, due to the widespread use of multi-core processors as well as the low cost of memory, the vast majority of servers are operating at less than 20% of their capacity.

We could summarize the situation as follows: the data center is full, but the servers within it are empty. Virtualization technologies offer an obvious and efficient solution to this problem: if we can’t fit any more servers into the data center, we can start to fill the servers.

But despite what many companies think, a reduction in the number of infrastructure elements does not imply a reduction in administration constraints. To properly examine the subject, we must expose some of the hidden realities of virtual infrastructure.

Virtualization does not necessarily imply a reduction in the number of servers. On the contrary, virtualization may lead to a proliferation of servers as it becomes easier to deploy them in a virtual infrastructure. A virtual server, even if it is only loosely connected to its physical host, imposes just as many administration tasks as a physical server. It must be backed up, updated and monitored.

We all know the old adage that it is dangerous to put all one’s eggs in one basket. By placing more virtual servers within a physical host server, the effect is similar to putting more eggs in a basket. A malfunction in the physical host will lead to downtime for a greater number of applications. Therefore, virtualization makes physical servers more critical. It becomes strategically advantageous to be able to quickly move virtual servers between physical hosts for host maintenance operations, as technologies like VMware’s VMotion allow.

Virtualization will negatively impact service levels unless companies understand that virtualization is not synonymous with a reduction in administration resources. In the majority of cases, administrators will have to treat virtual servers in the same way that they treat physical servers; as elements that require close management to ensure optimum efficiency.

In order to fully take advantage of their virtualized infrastructures, companies must plan properly and adopt a pragmatic approach, making sure to distinguish between the realities of their IT production and passing technological fads.

2. The Real Challenge

Despite the many advantages of virtualization, it is a fact that the sharing of a physical server by multiple virtual machines brings about a greater level of competition for server resources which greatly increases the risk of resource saturation.

At the same time, the need to maximize physical resource utilization means that IT teams must run applications close to their capacity limits. The one-application-per-physical-server comfort zone is a thing of the past.

A system’s scalability is defined as its response time relative to the demand it is subject to. These two variables are typically related in a curve as shown in the graph below.

[Figure: System scalability, showing response time (vertical axis) against demand (horizontal axis)]

The left end of the curve shows the minimum response time (i.e. when the load is zero).

As the load increases, the increase in response time is approximately linear until it hits a corner which represents the system’s capacity limit (saturation). Response times increase exponentially after this point is reached.

Administrators seek to function at or close to this capacity limit. In a system operating near its capacity limits, response times can vary greatly with small changes in demand. As a result, users can experience sudden slowdowns of application response times.
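The sensitivity near the capacity limit can be illustrated with a textbook single-server queueing model (M/M/1), in which the mean response time is R = S / (1 - rho), where S is the service time at zero load and rho is the utilization. This model is an assumption chosen for illustration; the paper does not say which model underlies its graph, and the 100 ms service time below is hypothetical.

```python
def response_time(service_time_s: float, utilization: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - rho)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_s / (1.0 - utilization)


SERVICE_TIME_S = 0.1  # hypothetical: 100 ms response time at zero load

for rho in (0.0, 0.5, 0.8, 0.9, 0.95, 0.99):
    r = response_time(SERVICE_TIME_S, rho)
    print(f"utilization {rho:4.0%}: response time {r:6.2f} s")
```

Under this model, moving from 90% to 95% utilization doubles the response time (from about 1 s to about 2 s with the values above), while the same five-point increase from 50% to 55% changes it only slightly.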

Businesses that consolidate their data centers are inevitably faced with additional, new performance management problems. It would be difficult to have to explain to end-users and management that the adoption of new technologies is done at the expense of the quality of service provided. IT administration teams are therefore obliged to take the necessary means to ensure consistent service levels while working in more complex, dynamic and sensitive environments.

3. Performance Management of VMware Infrastructure 3

In order to guarantee service levels provided to clients and end-users, it is necessary to have total control over the performance of systems that host applications.

There are two primary causes of performance problems in VMware environments:

Unexpected resource consumption by one application on one virtual machine or the cumulative effects of unexpected consumption by several applications on several virtual machines.

Intentional changes in the configuration of the physical infrastructure supporting the virtual infrastructure or malfunctions that provoke a reconfiguration of the same.

However, most commonly-used monitoring solutions are not capable of providing the detailed level of information required to have a precise vision of the performance of virtualized systems.

The primary reasons for this are the excessively coarse time grain of analysis metrics as well as architectural particularities specific to the virtualization platform.

Granularity of analysis

The time scale used to analyze any evolving system is particularly significant in virtualized environments.

Imagine if weather forecasters only checked the air temperature once a year, on the first of June. An observation of temperature reports over 10 years would lead us to the conclusion that the outside air temperature is more-or-less constant, or perhaps increasing slightly due to global warming.

However, if we observed weather measurements taken on January 31st and August 15th, our conclusion would certainly be very different. But even in this case, the time granularity of the measurements is not sufficient: even without taking day and night variation into account, the temperature in any given spot varies constantly throughout the year.

We can extrapolate this concept to the observation of IT systems: a server’s CPU load, for example.

[Figure: a server’s CPU activity in 2008 at a one-month time grain (left) and over a typical day in 2008 at a five-minute time grain (right)]

As we can see on the graphs above, the observation of a server’s CPU activity varies significantly depending on the granularity of information. On the left, the year 2008 is shown with a one-month time grain. On the right, a typical day in 2008 is shown with a five minute time grain.

By looking exclusively at the monthly data, we would significantly underestimate the server’s maximum utilization: 3.5% versus 16%.

That’s just the broad view. We can also consider the narrow view. An IT system can make more than 100 context switches per second, or 30,000 every 5 minutes. Each one of those contexts can execute thousands of instructions.

Therefore, even an analysis with measurements taken at five-minute intervals does not offer an accurate picture of the system’s real activity.

To illustrate this point, the graph below shows a server’s maximum CPU activity measured at two different frequencies: every second in green and every 5 seconds in red. Again we can see very different results: 24.5% versus 40.7%.

As a general rule, a performance measurement tool whose granularity of analysis is too coarse does not add much value because it only provides averaged data, thus glossing over the peaks and valleys whose identification is critical to the efficient management of virtualized infrastructures.

Only high-frequency monitoring is useful for evaluating the performance of highly dynamic virtualized systems.
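The averaging effect described in this section can be reproduced with a few lines of Python. The sketch below uses synthetic data, not the measurements behind the paper's graphs: five minutes of hypothetical one-second CPU readings, idle at 5% except for a two-second burst at 100%. It then reports the peak value seen at three different time grains.

```python
from statistics import mean


def downsample(samples, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]


# Hypothetical data: 300 one-second CPU readings (percent),
# idle at 5% except for a 2-second burst at 100%.
per_second = [5.0] * 300
per_second[120:122] = [100.0, 100.0]

for window in (1, 5, 300):
    peak = max(downsample(per_second, window))
    print(f"{window:>3}-second grain: peak CPU = {peak:.1f}%")
```

With this data the reported peak falls from 100.0% at a one-second grain to 43.0% at a five-second grain and 5.6% at a five-minute grain, although the underlying activity is identical in all three cases.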

3.1 Information Gathering

VMware’s VI3 technology relies on Linux-based server infrastructure, within which a hypervisor program distributes physical resources between virtual machines.

Virtual Center manages the VI3 virtual environment through a single interface that handles the allocation of resources to virtual machines, the automation of administration tasks and the monitoring of the server infrastructure.

The metrics necessary for performance management in VMware environments can be gathered from three different sources:

The Service Console

The Service Console offers direct access to the Linux operating system that hosts the ESX hypervisor.

Programs executed in this zone offer the advantage of being very close to the hardware and the hypervisor, making it easy to gather performance information on them.

However, the Service Console that is available on ESX servers is not available on ESXi servers.

This information source therefore has limited durability, as it is only useful for a homogeneous group of ESX servers.

Additionally, operating systems running on virtual machines are completely inaccessible from the Service Console. This means that it is impossible to monitor guest environments from this source, including application processes and other information that is crucial for the maintenance of service levels to end-users.

Finally, monitoring from the Service Console does not guarantee continuous measurement of virtual machines that are migrated from one ESX server to another.

The Virtual Center

The Virtual Center transparently centralizes access to the infrastructure servers, including both ESX and ESXi environments.

The very same information obtained from the Service Console is available through the Virtual Center thanks to its SOAP interface.

Most third-party monitoring solutions employ this solution because it can be used for any kind of ESX server and is not affected by virtual machine migrations between servers.

Nonetheless, this approach has a distinct disadvantage in that it does not offer a view of operating systems, which are a critical link. Nor does it offer a view over applications installed on virtual machines.

Within the Virtual Machines

Performance data gathered directly from the operating systems installed on virtual machines offer the highest added value.

This solution is completely unaffected by virtual machine migration and provides information that is directly related to applications.

Because virtualization adds a layer of abstraction between applications and physical resources, the best way to identify and correct potential application performance problems is by analyzing the interactions between systems and processes.

As noted earlier, administrators must treat virtual servers exactly the same as physical servers, i.e. as infrastructure elements requiring optimal management in order to remain operational.

However, because operating systems installed on virtual machines were not specifically designed to be virtualized in a VI3 environment (paravirtualization), certain side effects tend to occur:

The guest operating system is not programmed to restrict its utilization to a defined subset of physical resources.

CPU timekeeping for the guest operating system is not stable or accurate. (http://www.vmware.com/pdf/vmware_timekeeping.pdf)

The information provided by traditional monitoring solutions tends to be distorted by these effects, thus rendering it useless.

3.2 Blackbox Performance Management

The dilemma regarding performance management for ESX VI 3 can be summarized in the following question: Is it sufficient to manage only the host (physical) server performance, and to consider virtual machines as black-box applications?

Certainly not, unless one considers that in a “physical environment”, performance of a business application can be fully monitored by simply getting overall CPU, Memory, and IOs from the server running the programs.

If it is not so simple with a physical box, why should it be easier within a virtual machine and its added software layers?

Broadly speaking, focusing on the “host” performance will only lead to the detection of issues that occur when all the virtual machines are in an “all-you-can-eat” configuration.

In a production system however, virtual machines tend to have a limited resource configuration to avoid unexpected server over-commitments. In this case, saturation issues cannot be characterized through the host analysis alone, but only from monitoring the operating system inside the virtual machines.
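A small numerical sketch makes the point concrete. All names and figures below are hypothetical (an 8-core, 2000 MHz host and three VMs with configured CPU limits); they are not taken from the paper. The host looks lightly loaded while one guest is pinned against its own limit.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    limit_mhz: float  # CPU limit configured for this VM (hypothetical)
    used_mhz: float   # physical CPU currently consumed by this VM


def host_utilization(vms, host_capacity_mhz):
    """Fraction of the host's CPU capacity consumed by all VMs together."""
    return sum(vm.used_mhz for vm in vms) / host_capacity_mhz


def saturated_vms(vms, threshold=0.95):
    """Names of VMs consuming at least `threshold` of their own limit."""
    return [vm.name for vm in vms if vm.used_mhz / vm.limit_mhz >= threshold]


HOST_CAPACITY_MHZ = 8 * 2000.0  # hypothetical host: 8 cores at 2000 MHz

vms = [
    VM("web", limit_mhz=1000.0, used_mhz=990.0),
    VM("db", limit_mhz=4000.0, used_mhz=1200.0),
    VM("batch", limit_mhz=2000.0, used_mhz=400.0),
]

print(f"host utilization: {host_utilization(vms, HOST_CAPACITY_MHZ):.0%}")
print("VMs at their limit:", saturated_vms(vms))
```

Here the host runs at roughly 16% of its capacity, yet the "web" guest is using 99% of what it is allowed, which is the situation that host-only analysis cannot characterize.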

4. Sysload for VMware Infrastructure 3

In order to directly address performance problems related to the heterogeneity of the data center and the complexity of monitoring in virtual environments, Sysload develops an ultra high-performance data gathering technology and provides the necessary Data Analysis consoles to master virtualization projects (from initial planning stage through production) and to easily detect and diagnose any performance related issues.

Unified view of the data center (physical and virtual resources)

Granularity of analysis providing unrivalled precision

Real-time correction of timekeeping discrepancies

Automatic detection of micro-saturation incidents.

Support for planning server deployment & upgrades

Negligible Impact on the VMware Infrastructure

Interactive and Flexible Console

Sysload dashboards provide precise, homogeneous and relevant views of your servers’ performance in Real-Time, 24x7

Identifying CPU saturation due to several VMs running on a server

4.1 A Consolidated View of the Data Center

Sysload offers a full range of multiplatform solutions that span hardware types, operating systems and proprietary virtualization solutions (VMware, Sun, HP, and IBM). Sysload’s data gathering technology is based on agents installed directly on the operating systems that allow objective management of physical and virtual servers.

4.2 Unequalled Granularity of Analysis

Sysload agents can collect up to 300 metrics from the heart of the systems while leaving only the lightest of footprints (less than 1% CPU utilization with no induced network load) at very high frequency (down to 1 second intervals). They offer real-time surveillance of both physical host machines and virtual guests.

This highly granular information guarantees precise monitoring of resource utilization and allows the identification of malfunctions related to micro-saturation incidents that cannot be detected by most other monitoring tools.

4.3 Automatic Detection of Micro-saturation Incidents

By collecting data directly from the system kernel, Sysload is able to monitor and identify the specific application processes that are responsible for the quality of service experienced by end users. Thus, Sysload agents can automatically detect saturation-related malfunctions which could not otherwise be detected from the host machine.
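The paper does not describe how Sysload detects these incidents. As a generic illustration only, one simple way to flag short saturation bursts in high-frequency samples is to look for runs of consecutive readings at or above a threshold. The function name, the 95% threshold and the sample values below are all assumptions.

```python
def find_saturation_bursts(samples, threshold=95.0, min_len=2):
    """Return (start_index, length) for each run of at least `min_len`
    consecutive samples at or above `threshold`."""
    bursts = []
    start = None
    for i, value in enumerate(samples):
        if value >= threshold:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            if i - start >= min_len:
                bursts.append((start, i - start))
            start = None
    # Close a run that extends to the end of the series.
    if start is not None and len(samples) - start >= min_len:
        bursts.append((start, len(samples) - start))
    return bursts


# Hypothetical one-second CPU readings (percent).
samples = [12, 15, 98, 100, 99, 20, 18, 97, 14, 100, 100]
print(find_saturation_bursts(samples))
```

With these readings the function reports two bursts, one of three samples starting at index 2 and one of two samples starting at index 9; the isolated reading of 97 at index 7 is ignored because it is shorter than `min_len`.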

4.4 Real-time Correction of Timekeeping Discrepancy

Sysload also offers agents for Windows, Linux and Solaris that are specifically designed to be “aware” of the fact that they are running on an operating system which is in turn running on a virtual machine.

Virtualization: base lining host/guests CPU load for one

They can neutralize the CPU timekeeping discrepancy brought about by various virtual machines sharing a single CPU. This well known phenomenon renders traditional measuring tools inoperative.

Calculations based on these metrics are conducted on stable time records obtained from the hypervisor and from utilization indicators coming from physical (not virtual) resources. For example, Sysload’s calculation of CPU load for the overall system or for an individual application or process is based on the real, physical CPU consumption from the point of view of the ESX host.
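To see why the time base matters, consider CPU load computed as CPU time consumed divided by elapsed time. The sketch below is not Sysload's implementation; it only contrasts the same consumption measured against a drifting guest clock and against a stable host clock. All figures are hypothetical.

```python
def cpu_load_percent(cpu_seconds_used: float, elapsed_seconds: float) -> float:
    """CPU load over an interval, as a percentage of one CPU."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 100.0 * cpu_seconds_used / elapsed_seconds


# Hypothetical interval: the VM consumed 3.0 s of physical CPU during
# 10.0 s measured on the host clock. The guest clock, having lost timer
# ticks while the VM was descheduled, believes only 8.0 s elapsed.
cpu_used_s = 3.0
host_elapsed_s = 10.0
guest_elapsed_s = 8.0

guest_view = cpu_load_percent(cpu_used_s, guest_elapsed_s)
host_view = cpu_load_percent(cpu_used_s, host_elapsed_s)
print(f"load on guest time base: {guest_view:.1f}%")
print(f"load on host time base:  {host_view:.1f}%")
```

In this example the guest-side calculation overstates the load (37.5% instead of 30.0%) purely because its notion of elapsed time is wrong.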

4.5 Support for Planning Server Deployments

Virtualization implies a radical departure from classic server resource management. With Sysload, IT service staff can precisely determine utilization and capacity levels for all resources necessary to maintain service levels.

Offering precise historical data down to five minute intervals, Sysload allows the profiling of virtual machine activity (typical day, typical week, etc.) and the determination of their compatibility within a resource pool.
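As a rough sketch of what such profiling could look like (the paper does not detail the method), historical samples can be averaged by hour of day to obtain a "typical day" per virtual machine, and the summed profiles can then be compared with the capacity of a resource pool. The VM data and the 3000 MHz pool capacity are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def hourly_profile(samples):
    """Build a 'typical day' from (hour_of_day, cpu_mhz) samples:
    the mean consumption observed for each hour."""
    buckets = defaultdict(list)
    for hour, mhz in samples:
        buckets[hour].append(mhz)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


def combined_peak(profiles):
    """Highest total consumption across all hours when profiles are summed."""
    hours = set().union(*(p.keys() for p in profiles))
    return max(sum(p.get(h, 0.0) for p in profiles) for h in hours)


# Hypothetical history: one VM is busy during the day, the other at night.
vm_a = hourly_profile([(9, 1800.0), (9, 2200.0), (22, 300.0), (22, 500.0)])
vm_b = hourly_profile([(9, 400.0), (9, 600.0), (22, 1900.0), (22, 2100.0)])

POOL_CAPACITY_MHZ = 3000.0  # hypothetical resource pool
peak = combined_peak([vm_a, vm_b])
print(f"combined peak: {peak:.0f} MHz, fits in pool: {peak <= POOL_CAPACITY_MHZ}")
```

The two machines peak at 2000 MHz each, which would exceed the pool if the peaks coincided, but because they occur at different hours the combined peak is 2500 MHz and the pair fits.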

4.6 Homogeneous Indicators

In order to have homogeneous indicators, processor consumption is also converted into MHz, the power unit normally used by Virtual Center.

Therefore, Sysload collects data about applications’ resource utilization in virtual machines in MHz, which allows a rapid and direct determination of the applications’ impact on a resource pool.
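The paper does not give the conversion formula. A plausible sketch, assuming utilization is expressed as a percentage of all available cores and that every core runs at the same nominal clock speed, is:

```python
def cpu_percent_to_mhz(cpu_percent: float, cores: int, core_mhz: float) -> float:
    """Convert a CPU utilization percentage (0-100, across all cores)
    into an absolute consumption in MHz."""
    return cpu_percent / 100.0 * cores * core_mhz


# Hypothetical example: a process using 25% of a VM with 2 virtual CPUs
# backed by 2600 MHz cores.
print(cpu_percent_to_mhz(25.0, cores=2, core_mhz=2600.0))  # 1300.0
```

Expressed this way, the figure can be compared directly with the capacity of a resource pool stated in MHz, whatever the size of the individual machines.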

4.7 Negligible Impact on VMware Structure

The very light overhead of Sysload’s embedded agents on the ESX host and on each VM minimizes the physical resources required to achieve optimum server performance management, regardless of how many hosts and guests are being monitored.

4.8 Interactive and Flexible Console

Thanks to its high level of interactivity, the Sysload SP Analyst console provides the ability to shift effortlessly between real-time and historical views, from performance, trend analysis and reporting, to troubleshooting, with 2-3 intuitive mouse clicks. Global mosaics and/or Enterprise table views provide a high-level starting point for quick, easy and efficient drill down capabilities to pinpoint performance issues.

5. About ORSYP

ORSYP is an independent IT Operations Management solutions provider helping customers ensure that IT services are delivered on time. Headquartered in Paris, France; Boston, USA; and Hong Kong, China, ORSYP has more than 20 years of growth and over 1400 blue chip customers. ORSYP software, including Enterprise Job Scheduling, IT Automation, and Performance and Capacity Management, together with ITSM consulting services, is trusted and proven in some of the world’s most demanding physical and virtual environments. We strive to provide customers with the assurance that time to delivery of IT Operations services is properly and effectively managed today, and tomorrow.