
Cluster Development Group

AS OF 12/18/2003

December 2003

Dell’s High Availability Cluster Product Strategy

This article outlines generic High Availability (HA) Cluster requirements and supported configurations. The paper also explains the logic behind a number of the rules and requirements for Dell supported HA solutions.

Overview

Ensuring access to information requires that applications and data meet stringent uptime requirements. As users demand that more services be available to them, the need to maximize application uptime has become commonplace. Developing High Availability (HA) solutions that support high levels of uptime while maintaining the simplicity of the Dell business model is challenging.

Dell’s primary focus has been on composing solutions with Windows-based systems, but over the past year the need for high availability cluster solutions has emerged in the Linux market. Likewise, application clustering, as in Oracle Real Application Clusters (RAC), now enables a distributed approach to computing in which multiple servers are logically grouped into a cluster and viewed as a single entity. To facilitate the adoption of credible solutions for Linux, Dell is implementing a set of cluster rules that define a common set of configurations to be supported and sold around Microsoft Cluster Server (MSCS), Linux High Availability (Linux HA), and Oracle RAC cluster solutions.

Dell’s High Availability Cluster Configuration rules are designed to ensure no single point of failure (SPOF) in the end-to-end cluster solution. This includes the standalone server, storage system, fabrics, paths, and applications. The Dell HA Cluster solutions are tested end-to-end to ensure maximum availability and reliability.

The matrices below present a generic outline of the components that are supported in the Small Computer System Interface (SCSI) and Fibre Channel solutions for Windows, Linux HA, and Oracle RAC based HA cluster configurations. As our HA programs mature, each new solution should use similar approaches and components. The various permutations and configurations go through stringent testing, which includes fault injection, to ensure that the entire solution is exercised under highly stressful conditions.
The entire scope of the testing is conducted by the Dell HA cluster development groups. When issues are found, the teams work with the appropriate engineering teams and/or vendors to determine the root cause and develop a solution.

Certifications: Certifications are completed where required. However, not all solutions require a certification by the independent software vendor (ISV).

No Heterogeneous Storage Components: Because HA clustering is very dependent on the I/O subsystem, intermixing I/O components adds unacceptable risk to the configuration. Data integrity in cluster configurations must never be jeopardized.

Current and Previous (N and N-1) Server Configurations: Supporting both generations provides customer investment protection and migration paths for the latest OS and I/O subsystems.

High Availability SCSI Cluster Solution

SCSI-based HA cluster solutions are based on a cluster (server) failover configuration, in contrast to the combined path and cluster failover supported in the Fibre Channel configurations. A private (heartbeat) network provides a dedicated connection for communicating cluster status between the cluster nodes. In Dell’s SCSI-based HA solution there is at least one RAID controller in each cluster node (server). When a cluster node fails, its resources fail over to another cluster node.

The matrix shown later in the article outlines the standard components of each Dell supported cluster. For example, under servers, N is the currently shipping server, such as the PE1750, and N-1 would be the PE1650. Under operating systems, N is Windows Server 2003 Enterprise Edition and N-1 is Windows 2000 Advanced Server.
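The N and N-1 rule and the rule against intermixing I/O components can be pictured as a simple configuration check. The following is only a minimal sketch under stated assumptions: the `Node` type, the `check_cluster` function, and the small lookup table are hypothetical names introduced for illustration, the table holds just the two example servers named above, and the RAID controller stands in for "an I/O component". It is not Dell's qualification tooling.

```python
from dataclasses import dataclass

# Hypothetical lookup table: the article gives the PE1750 as an example of N
# and the PE1650 as an example of N-1. A real list would be much longer.
SUPPORTED_SERVERS = {"PE1750": "N", "PE1650": "N-1"}


@dataclass(frozen=True)
class Node:
    name: str
    server_model: str
    raid_controller: str  # assumption: one I/O component model per node


def check_cluster(nodes):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    # Rule: every server must be a current (N) or previous (N-1) model.
    for node in nodes:
        if node.server_model not in SUPPORTED_SERVERS:
            problems.append(
                f"{node.name}: {node.server_model} is not an N or N-1 server"
            )
    # Rule: no intermixing of I/O components within the cluster.
    controllers = {node.raid_controller for node in nodes}
    if len(controllers) > 1:
        problems.append("mixed I/O components: " + ", ".join(sorted(controllers)))
    return problems
```

Returning a list of violations, rather than stopping at the first one, lets a planner see every problem in a proposed configuration at once.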

Diagram 1 (labels: Network, Servers, External SCSI Storage)

High Availability Fibre Channel Cluster Solution

Fibre Channel-based HA cluster solutions are based on path failover and cluster failover. Requiring redundant HBAs or paths to the storage provides a higher level of availability than a SCSI-based cluster solution. Redundant Host Bus Adapters (HBAs) are required in each cluster node (server). Redundant HBAs coupled with redundant switches provide the ability to support redundant paths and fabrics connected to the external storage array. When a path fails, there will be a failover within the same cluster node. If both paths fail, then the cluster can fail over to another node in the cluster. The HBAs in a cluster must be identical. Mixed versions of HBAs are not supported in a single cluster configuration, regardless of whether you are implementing 2 or up to 8 nodes in Windows Server 2003, Enterprise Edition.

Diagram 2 (labels: Fibre Channel Switches, Servers (nodes), TBU, External Fibre Channel Storage)
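The two-level failover described for Fibre Channel clusters (path first, then node) can be sketched as a small decision function. This is an illustrative sketch only: `Action`, `failover_action`, and `hbas_identical` are hypothetical names, and the model assumes exactly two storage paths per node.

```python
from enum import Enum


class Action(Enum):
    NONE = "no action"
    PATH_FAILOVER = "fail over to the surviving path on the same node"
    NODE_FAILOVER = "fail over to another cluster node"


def failover_action(path_a_up: bool, path_b_up: bool) -> Action:
    """Choose the response for one node with two redundant storage paths."""
    if path_a_up and path_b_up:
        return Action.NONE
    if path_a_up or path_b_up:
        # One path is still alive, so I/O stays on this node.
        return Action.PATH_FAILOVER
    # Both paths are down, so the node can no longer reach the storage.
    return Action.NODE_FAILOVER


def hbas_identical(hba_models) -> bool:
    """True when every HBA in the cluster is the same model."""
    return len(set(hba_models)) <= 1
```

For example, `failover_action(False, True)` selects a path failover, while `failover_action(False, False)` escalates to a node failover.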

Oracle Real Application Clusters (RAC)

Application clustering enables additional functionalities for specific purposes. While database virtualization technologies such as Real Application Clusters are not yet as widespread as generic HA clustering technologies, they can provide a unique value proposition for a given application or deployment scenario. Oracle RAC is Oracle’s database clustering technology, whereby multiple servers can be grouped in an active-active cluster with shared data. As of today, RAC is the only technology that allows databases to scale out in a shared data model. Based on the RAC technology, any front-end application (such as OLTP applications, Oracle E-Business Suite, SAP, etc.) can connect to the database cluster. RAC is therefore a platform for Oracle clustering at the database level.

Diagram 3


Matrix of Dell Supported Cluster Components

Solution groups: Windows HA, Oracle RAC, Linux HA
Columns (left to right): Win NT EE, W2K AS, W2K3 EE, RHEL 2.1, RHEL 3, RHEL 3

Product/Feature

Configuration Rules
  Certifications: X X X
  N and/or N-1 Servers: X X X
    Server models listed: 26x0 4600 64x0 66x0 8450; 1750 26x0 4600 64x0 66x0
  Multiple Clusters on a SAN: X X X X X
  Multiple Clusters Direct Attached: CX600 CX600
  Mixed Storage (on a SAN): X X X
  Mixed Storage (on a cluster)
  Single Path Configs: X X
  Dual Path Configs: X X X X X
  Homogeneous HBA I/O (no mixing of HBA cards): X X X X X

Controllers
  Emulex Single Channel
    LP9002L: X X X
    LP982: X X
  QLogic Single Channel
    QLA2200: X X X
    QLA2340: X X X X
  Emulex Dual Channel
    LP9802
  QLogic Dual Channel
    QLA2342: X X

RAID Controllers
  PERC 3/DC: X X X X X
  PERC 4/DC: X X X
  Driver Changes: Requalification at a minimum. Certification done at next major release.

External Storage
  PowerVault™
    PV22xS: X X
    Array Manager: X X X
  SATA SCSI
    PV650: X X
    PV660: X X X
  FC4500: X X X
  FC4700-2: X X X X X
  CX Series
    CX600: X X X X
    CX400: X X X X
    CX200: X X X X
    CX200LC
    SATA

FC Switches
  Brocade
    8 Port: X X X X X
    16 Port: X X X X X
    32 Port
  McData
    8 Port
    16 Port
    32 Port
    Flex Switch: X X X

Platforms
  Blades
  SC
  1P Tower
  2P Tower: X X X X X
  4P Tower: X X X X X
  1P Rack
  2P Rack: X X X X X
  4P Rack: X X X X X
  64 Bit 2P Rack

Interconnect
  On-board LOM: X X X X X
  All add-in Ethernet NICs supported by platform: X X X X X
  Heterogeneous Interconnect: X X
  Homogeneous Interconnect: X X X X X
  NIC Teaming: Public Network Only; Public Network Only; Public Network Only; X X

Win NT EE - Windows NT, Enterprise Edition
W2K AS - Windows 2000, Advanced Server
W2K3 EE - Windows Server 2003, Enterprise Edition
RHEL 2.1 - Red Hat Enterprise Linux 2.1
RHEL 3 AS - Red Hat Enterprise Linux 3 Advanced Server

Application Availability

Because the application is critical, Dell focuses on understanding and proposing applications that are cluster aware, such as Microsoft Exchange and Microsoft SQL Server. With cluster-aware applications, the clustering software can check whether an application is responding. When it is not, the cluster software assumes the application is hung and attempts to restart it on the same system. This is referred to as a local recovery. Local recoveries are quicker to perform than a failover to a backup server, so users are usually up and running sooner. If local recovery fails, the clustering software fails resources, including any applications, over to another cluster node. Node failover takes longer for the application to come up, but service is restored once the backup node and application are up and running.
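The escalation from health check to local recovery to node failover can be sketched as follows. This is a minimal illustration, not the behavior of any particular clustering product: the `recover` function and its three callable parameters are hypothetical and would be supplied by whatever cluster software is in use.

```python
def recover(is_responding, restart_locally, fail_over_to_backup):
    """Run one health check and escalate in the order described above.

    All three arguments are hypothetical callables supplied by the caller:
    is_responding() -> bool, restart_locally() -> bool,
    fail_over_to_backup() -> None.
    """
    if is_responding():
        return "healthy"
    # Local recovery: try a restart on the same system first, because it is
    # quicker than moving to a backup server.
    if restart_locally():
        return "local recovery"
    # Local recovery failed: move the resources, including the application,
    # to another cluster node.
    fail_over_to_backup()
    return "node failover"
```

Passing the checks and actions in as callables keeps the escalation order separate from how a given application is actually probed or restarted.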

Planned downtime can also be managed more effectively. Hardware and software maintenance can be performed on one server while the other servers continue to provide the needed functionality for users. This important task no longer has to impact users or be performed outside working hours.

As Dell’s high availability portfolio continues to expand, application monitoring and fault prevention are areas that continue to be a primary focus for improving application availability.

Legend: High Availability; Disaster Recovery

Conclusion

Dell is continuing to drive simplicity and standardization within the HA cluster market. Previously, High Availability clustering was considered difficult to plan, productize, implement, test, and sell. Throughout the past several years, Dell has standardized the Windows HA Clustering market, and is now planning to do so for the Linux HA clustering market and the Oracle RAC solutions.
