

GREATER THAN FIVE 9s UPTIME FOR MISSION-CRITICAL NODES IN MICROSOFT® WINDOWS® HPC SERVER 2008 CLUSTERS

“Leveraging more than 25 years experience in fault-tolerant computing, Stratus has collaborated with Microsoft to provide valuable insight and validate fault tolerant-aware functionality in the Windows Server® operating system. This collaboration is designed to enable uninterrupted operation of Windows Server under the most extreme circumstances.”

Bill Laing, Microsoft corporate vice president, US-Windows Server Executive Management.

PARTNER PROFILE

Stratus Technologies, Inc., is a global provider of fault-tolerant computer servers, technologies and solutions services, with more than 25 years of experience focused on Continuous Processing® technology.

Stratus® Windows-based ftServer® systems, powered by Intel Xeon quad-core processors, provide among the highest levels of reliability in the server industry, delivering 99.999 percent or better uptime.
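To put the 99.999 percent figure in concrete terms, the short sketch below converts an uptime percentage into expected downtime per year. It is plain arithmetic rather than anything from Stratus or Microsoft; the function name `annual_downtime_minutes` and the use of a 365-day year are assumptions made for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def annual_downtime_minutes(availability_percent: float) -> float:
    """Return expected downtime per year, in minutes, for an uptime percentage."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


# Compare three, four and five 9s of availability.
for pct in (99.9, 99.99, 99.999):
    minutes = annual_downtime_minutes(pct)
    print(f"{pct}% uptime -> {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, five 9s corresponds to roughly 5.26 minutes of downtime per year, compared with about 8.8 hours at three 9s.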

Stratus is a trusted solutions provider to customers in financial services, manufacturing, health care, public safety, transportation and logistics, and other industries.

SITUATION

High-performance computing (HPC) has undergone fundamental changes. Complex computational problems are now distributed across a compute cluster made up of commodity, 64-bit x86 computers. Accordingly, HPC is being widely adopted by large and small organizations alike as a critical resource for accelerating time-to-insight.

Availability and HPC

While not all HPC environments require high levels of availability, environments where application results are time-critical or where compute clusters are large are highly sensitive to availability issues. Time-sensitive applications are abundant in the financial services industry. Risk management, derivatives trading and portfolio analysis generate results that influence decisions in real time. Any disruption of these HPC applications can result in financial losses or regulatory penalties. Customers in this industry have a long history of running critical computer applications and are attuned to availability issues in all of their computing decisions.

Manufacturing applications where calculation results immediately affect operational parameters are another example of time-critical applications that cannot tolerate downtime or loss of critical data.

The Impact of a Cluster Outage

Large clusters, which typically host complex, longer-running applications, are more vulnerable to availability lapses, and the effect of an outage can have a far-reaching impact on business operations. In these environments, even a brief outage will interrupt the operation of critical business functions and a restart of at least some jobs or tasks will be required.

For large HPC applications that may run for hours, or even days, a restart can have severe consequences.

SOLUTION

Windows HPC Server 2008 combines the power of a 64-bit Windows Server platform with rich, out-of-the-box functionality to help improve the productivity of your organization, and reduce the complexity of HPC. Working together with Stratus, Windows HPC Server 2008 delivers industry-leading uptime that exceeds 99.999 percent for Microsoft® operating environments. Stratus® ftServer® systems are the perfect hardware choice for mission-critical compute cluster components.

“Windows HPC Server 2008 brings a new level of simplicity to enterprise-class high-performance computing. Its scalability and power place it at the center of critical applications, such as investment pricing, financial risk analysis, securities trading strategies and other high-value functions that can tolerate no downtime. Stratus ftServer systems strategically placed in the HPC cluster provide continuous availability for applications like these.”

Allan Jennings, Stratus senior vice president, product & solutions development.


ftServer SYSTEMS

Stratus’ industry-standard ftServer family delivers the highest levels of uptime in the industry – 99.999 percent or better – with measured customer availability results that meet or exceed that goal.

Stratus ftServer systems are built from the ground up to avoid downtime altogether. Stratus Continuous Processing® technology incorporates lockstep technology, failsafe software and ActiveService™ architecture. These unique availability innovations work together to keep ftServer systems in continuous operation.

Lockstep technology uses replicated, fault-tolerant hardware components that process the same instructions at the same time. In the event of a component malfunction, the partner component acts as an active spare that continues normal operation and averts system downtime.
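The active-spare behavior described above can be illustrated with a toy software model. This is only a conceptual sketch: real lockstep operates in hardware, and the `Replica` and `LockstepPair` classes, along with the simple addition standing in for "an instruction", are hypothetical names invented here and do not represent Stratus' implementation.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical stand-in for one of two replicated components."""
    name: str
    healthy: bool = True

    def execute(self, a: int, b: int) -> int:
        # A failed replica cannot produce a result.
        if not self.healthy:
            raise RuntimeError(f"replica {self.name} has failed")
        return a + b  # placeholder for "the same instruction"


class LockstepPair:
    """Runs every operation on both replicas; either one can carry on alone."""

    def __init__(self) -> None:
        self.replicas = [Replica("A"), Replica("B")]

    def execute(self, a: int, b: int) -> int:
        results = []
        for replica in self.replicas:
            try:
                results.append(replica.execute(a, b))
            except RuntimeError:
                continue  # the partner's result is used instead
        if not results:
            raise RuntimeError("both replicas failed")
        return results[0]


pair = LockstepPair()
print(pair.execute(2, 3))         # both replicas healthy
pair.replicas[0].healthy = False  # simulate a component malfunction
print(pair.execute(2, 3))         # partner continues without interruption
```

The point of the model is that the caller sees the same result before and after the simulated malfunction, with no failover step in between.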

Failsafe software prevents many software issues from escalating into outages, transparently handling errors to shield the operating system, middleware and application from failure.

ActiveService architecture detects and resolves problems before they cause downtime.

In a Windows HPC Server 2008 architecture, ftServer systems are used:

• As the head node, which is the central management, scheduling and controlling node for the entire cluster.

• As Windows Communication Foundation (WCF) broker nodes, which act as intermediaries between the application and cluster services.

• As the file server, which provides access control to shared files across the HPC cluster nodes.

WINDOWS HPC SERVER 2008

The Windows HPC Server 2008 platform is simple to deploy, operate and integrate into existing IT infrastructure and tools. A typical cluster includes a single head node, and one or more compute and WCF broker nodes. The head node is the most critical component of the cluster. It controls and mediates all access to the cluster resources and is the single point of management, deployment, and job scheduling for the cluster.

One or more WCF broker nodes receive interactive requests from Service Oriented Architecture (SOA) applications and return results. In addition, a critical file server function provides access control to shared files across the HPC cluster nodes.
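Because the head node, WCF broker node and file server are each required for the cluster to do useful work, their availabilities combine in series. The sketch below applies the standard series-availability product to show why each critical role matters. It assumes independent failures, and the per-role figures are hypothetical values chosen for illustration, not measurements from any real cluster.

```python
from math import prod


def series_availability(component_availabilities):
    """Availability of a chain in which every component must be up.

    Assumes component failures are independent of one another.
    """
    return prod(component_availabilities)


# Hypothetical per-role availabilities (fractions, not percentages).
roles = ("head node", "WCF broker node", "file server")
three_nines_each = [0.999] * len(roles)
five_nines_each = [0.99999] * len(roles)

print(f"three 9s per role: {series_availability(three_nines_each):.6f}")
print(f"five 9s per role:  {series_availability(five_nines_each):.6f}")
```

Under these assumptions, three critical roles at 99.9 percent each yield about 99.7 percent combined, while the same roles at 99.999 percent each yield about 99.997 percent.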

Windows HPC Server 2008 incorporates a number of important features that promote availability. HPC management software allows dynamic addition or removal of compute nodes and provides monitoring and recovery features that deal with compute node failures.

BENEFITS

Affordable, accessible, full-featured HPC

Windows HPC Server 2008 brings simple deployment, operation and IT integration to HPC at a price point that allows smaller companies and individual departments to successfully deploy HPC applications.

Scalable, highly secure HPC

Windows HPC Server 2008 provides the scalability of Windows Server 2008, and includes support for Windows Server 2008 security features.

Industry-leading uptime of 99.999 percent or greater

Deploying Stratus ftServer systems helps to ensure continuous operation of the head node, file server and WCF broker nodes, which helps protect the entire HPC solution against failure and data loss. In addition, ftServer systems offer comprehensive protection for the WCF broker node without the performance penalties or application changes inherent to server-side and client-side recovery.

Increased productivity with built-in cluster diagnostics and reporting

Windows HPC Server 2008 includes built-in diagnostics and reporting, enabling administrators to quickly identify and diagnose hardware, software, or network problems across the cluster.

FURTHER INFORMATION

For more information about Windows HPC Server 2008 and HPC, please visit http://www.microsoft.com/hpc

For more information about the ftServer family of products, or Stratus, please visit http://www.stratus.com

This data sheet is for informational purposes only. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express written permission of Microsoft Corporation. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.

In this Windows HPC Server 2008 cluster network, Stratus fault-tolerant servers provide >99.999% uptime for the critical head node, WCF Broker node and file server functions.