EMC ClarIIon High Availability

8/12/2019 EMC ClarIIon High Availability


EMC CLARiiON High Availability (HA)

Best Practices Planning

Abstract

This white paper discusses end-to-end high availability (HA). It takes into consideration the HA aspects of mission-critical storage environments, starting at the host side and going all the way to the storage system, including the connectivity infrastructure involving switches. The paper also considers the importance of keeping HA aspects in mind in order to maintain HA in production environments. This white paper discusses the choices available to customers so that they can set appropriate expectations for data availability in their environments.

April 2007



Copyright 2005, 2007 EMC Corporation. All rights reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is

subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable

software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

All other trademarks used herein are the property of their respective owners.

Part Number H1737.1


Table of Contents

Executive summary

Introduction
    Audience

Host configuration for high availability
    Prerequisite for a highly available host environment
    CLARiiON Procedure Generator
    Number of paths from the host to the storage system
    Host bus adapter vendor information
    Automatic path failover and failback
        Path management failover settings
        Install free failover software (EMC PowerPath SE)

Connectivity configuration for high availability
    Low protection
    Medium protection
    High protection
    Ultra protection
    Switch vendor information

Storage-side high availability
    Storage-system components
    RAID configuration
    Low protection
    Medium protection
    High protection
    Ultra protection
    Hot sparing policy
        Number of hot spares
        Sizing the hot spare disk
        Disk replacement
    Rebuild-time considerations
        System load
        Rebuild priority
        Size of the RAID group
        Size and type of disk drive

Clustering and replication
    Clustering: Protecting against loss of the primary application server
    Data mirroring: Protecting against loss of the primary storage system

Validation and maintenance
    HA validation
        Initial host failover testing
        Ongoing high availability verification
    Change control process

Conclusion

Appendix A: Failover mode settings




Executive summary

EMC is focused on helping customers maintain the most highly available infrastructure possible. Having a highly available environment involves many factors. These factors include not only deploying world-class products and services but also deploying and configuring those products and services in a manner that provides maximum availability. It is also important to note that high availability (HA) comes at a cost. However, not all applications need the same level of availability. Some applications are absolutely mission-critical while others may be business-critical but can withstand a few minutes of outage. This white paper discusses what is needed to ensure end-to-end HA and to help customers make appropriate choices.

Introduction

A highly available system is one that does not have any single point of failure (SPOF). In the event of a component or element failure, the system maintains its basic functionality. In many cases, an HA system is able to withstand multiple failures as long as these failures do not occur within the same redundant component set. For example, in a RAID 5 group, a single disk failure does not affect data availability; the system can withstand multiple single-disk failures as long as they occur in different RAID groups.

Audience

This white paper is primarily intended for EMC CLARiiON customers. However, EMC field personnel can also benefit from the information included.

Host configuration for high availability

There are multiple aspects to HA, but designing such an environment starts at the host side. To ensure that the application data is highly available, the host must be configured properly to withstand certain single failures, such as failure of the host bus adapter (HBA), fibre cable, or failover software.

Prerequisite for a highly available host environment

To ensure that the production environment is supported per the configurations tested and verified for interoperability by EMC, refer to the E-Lab Interoperability Navigator (a searchable database of the EMC Support Matrix). The Navigator is available on Powerlink, EMC's password-protected extranet for customers and partners.

E-Lab Interoperability Navigator outlines, among other things, supported revisions of:

Host operating systems

HBA models

HBA software including firmware

Latest operating system patches

Switch firmware

CLARiiON FLARE operating environment

Symmetrix microcode

Another important and related utility is the High Availability Verification Tool (HAVT). HAVT helps you validate that a server's components are supported by EMC. HAVT is available in the GUI and the text-based versions of Navisphere Server Utility (starting with release 24), and is accessed by selecting the Server Utility's High Availability Verification option.

The Server Utility is included on the Navisphere Server Support CD that ships with the storage system. The Server Utility is supported on Windows, Linux, HP-UX, AIX, and Solaris operating systems and now offers additional features beyond the traditional one of registering the server initiators with the storage system. Refer to the EMC Navisphere Host Agent/CLI and Utilities Release Notes on Powerlink (http://powerlink.emc.com/) for the latest supported revisions and features. Figure 1 shows the welcome dialog box for the Server Utility. This dialog box outlines all the features that are currently available in the Server Utility.

Figure 1. Navisphere Server Utility functions

For the purposes of this white paper, we will only discuss the Verify Server High Availability option. To

validate a particular server, select this option and follow the prompts. The final result is the Navisphere

Server High Availability Report with various tabs containing important information. One of these tabs is

labeled Checklist.

The Configuration Checklist tab includes information on three main components: the storage system, server hardware, and server software. This information includes the storage system name, model, and FLARE OE version; manufacturer, model, and firmware of the host operating system; and manufacturer, model, and firmware of the host itself. It also lists the names and versions of the server software that is installed. Figure 2 shows an example of this checklist.




Figure 2. Server Utility High Availability verification checklist report

Once this checklist has been generated, the report can be printed and compared against the E-Lab Interoperability Navigator (a searchable database of the EMC Support Matrix, available on Powerlink). New E-Lab Wizards make it easier to check the relevant components depending on the task being performed. For example, if attaching a new host and configuring it for HA, the Storage Array Wizard helps to determine if the server hardware and software are supported with a particular storage system model. The checklist mirrors the order in which the wizard prompts for information and includes all required information. This makes it quick and easy to input the required information and provides each server's component revisions in one place.

This white paper discusses another important feature of the High Availability Verification report in the

Ongoing high availability verification section.

CLARiiON Procedure Generator

The CLARiiON Procedure Generator (CPG) is another useful tool built by EMC for customers and field personnel. This tool is designed to create procedures for various operations, such as installing a new storage system, adding a host in a new or existing SAN environment, performing a software upgrade, and performing certain recovery procedures. The CPG is available on the Powerlink website.

Number of paths from the host to the storage system

To ensure there are redundant paths between the host and storage system, there must be a failover path in case the primary path fails. CLARiiON storage systems have a primary/secondary LUN ownership model for the storage processors (SPs): although host-addressable logical units (LUNs) can be serviced via both SPs, a given LUN is serviced by one SP at a time. In the event that the LUN is trespassed over to the peer SP, the LUN is serviced by the other SP. This may occur if the primary path to the default SP fails, the host HBA fails, or (in some rare cases) the SP fails. In any such event, it is important that there is a standby/secondary path to the peer SP and path-failover software (such as EMC PowerPath) to initiate failover to the standby/secondary I/O path in order to ensure that the data on the LUN is accessible via the secondary SP.
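The ownership model described above can be illustrated with a minimal Python sketch. The names used here (`Lun`, `trespass`, `serve_io`) are hypothetical and are not part of any EMC software; the sketch only models the rule that one SP services a LUN at a time and that ownership moves to the peer SP when the current owner cannot be reached.

```python
from dataclasses import dataclass


@dataclass
class Lun:
    """Toy model of a LUN that is serviced by exactly one SP at a time."""
    name: str
    default_owner: str       # "SPA" or "SPB"
    current_owner: str = ""  # filled in from default_owner when left empty

    def __post_init__(self) -> None:
        if not self.current_owner:
            self.current_owner = self.default_owner

    def peer(self) -> str:
        """Return the SP that does not currently own the LUN."""
        return "SPB" if self.current_owner == "SPA" else "SPA"

    def trespass(self) -> None:
        """Move ownership of the LUN to the peer SP."""
        self.current_owner = self.peer()


def serve_io(lun: Lun, reachable_sps: set[str]) -> str:
    """Return the SP that services the I/O, trespassing the LUN if needed.

    reachable_sps holds the SPs the host can currently reach
    (an assumption standing in for real path-state detection).
    """
    if lun.current_owner in reachable_sps:
        return lun.current_owner
    if lun.peer() in reachable_sps:
        lun.trespass()
        return lun.current_owner
    raise RuntimeError(f"{lun.name}: data unavailable, no path to either SP")
```

In this model, a host with a path to only one SP has no way to reach the LUN after that path fails, which is why the secondary path to the peer SP matters.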

Table 1. Host failover options

Host Failover Options | Host Configuration | Storage System Level Protection Available | Risk of Data Unavailable
Low Protection | Single HBA, direct attach; single HBA, SAN, no PowerPath SE | Single drive failure; power failure | High
Medium Protection | Single HBA, SAN environment, with PowerPath SE; multiple HBAs, direct attach, with PowerPath Base | Single drive failure; power failure; SP failure | Medium
High Protection | Multiple HBAs, SAN environment, PowerPath | Single drive failure; power failure; SP failure; HBA failure | Low
Performance and Protection | 2 HBAs per path with PowerPath; clustered environment; mirrored data | Single drive failure; power failure; SP failure; HBA failure; server failure; storage system failure | Very Low

Host bus adapter vendor information

The HBA information for two major HBAs that EMC supports can be found at the following HBA vendor websites:

Emulex drivers and installation docs with HBA settings at:
http://www.emulex.com/ts/docoem/framemc.htm

QLogic drivers and installation docs with HBA settings at:
http://www.qlogic.com/support/oem_emc.asp

Please see the E-Lab Interoperability Navigator (available on Powerlink) for information about other supported HBAs.

Automatic path failover and failback

To automate path failover and failback, some type of path failover software must be running on the host system. EMC supports several failover software packages, including its own PowerPath software. When the failover software is correctly set up and is running on the host, it automatically fails over the application I/Os from the failed path to the secondary/standby paths. To learn more about EMC PowerPath, go to EMC.com:

http://software.emc.com/products/software_az/powerpath.htm
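As a rough illustration of what "failover" and "failback" mean at the path level, the following Python sketch picks a path for each I/O. It is a simplified model under stated assumptions, not a description of how PowerPath or any other failover package is implemented; the function name `select_path` and the path labels are hypothetical.

```python
def select_path(paths: dict[str, bool], preferred: str) -> str:
    """Pick the path for the next I/O.

    paths maps a path name to True (alive) or False (failed).
    Failover: if the preferred path is down, use a surviving path.
    Failback: once the preferred path is alive again, return to it.
    """
    if paths.get(preferred, False):
        return preferred
    # Check the remaining paths in name order so the choice is deterministic.
    for name, alive in sorted(paths.items()):
        if alive:
            return name
    raise RuntimeError("all paths failed: data unavailable")
```

For example, with the hypothetical paths `{"hba0->SPA": True, "hba1->SPB": True}` and `"hba0->SPA"` preferred, marking `"hba0->SPA"` as failed moves I/O to `"hba1->SPB"`, and marking it alive again moves I/O back.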


Path management failover settings

Several path failover software packages may be present in the environment. To ensure that the storage

system is configured appropriately for the host operating system environment, the following parameters

must be set on the storage system:

arraycommpath

failovermode

systemtype

unitserialnumber

Install free failover software (EMC PowerPath SE)

Failover software is required to maintain data availability during coordinated SP reboots (for example, when updating software on the CLARiiON storage system). Customers with single-HBA hosts (switch attached) can use PowerPath free of charge. This basic PowerPath functionality (PowerPath SE) is available on the CLARiiON Utility Kit CD, as well as on Powerlink. The CLARiiON Utility Kit CD ships with the storage system. A version for each operating system type in the environment is supplied.

Connectivity configuration for high availability

After ensuring that the host configuration complies with the HA best practices, the next thing to consider is the connectivity infrastructure. How the host is physically connected to the storage system determines the protection level. Table 2 shows how different configurations offer different levels of protection.

Table 2. Connectivity options

[Diagrams: host, HBA, switch, and array (SPA/SPB) connectivity for each protection level]

Protection | Low Protection | Medium Protection | High Protection | Ultra Protection
Connectivity | Direct Connect - No Switch Available | Single HBA with Single Switch | True High Availability | Ultra HA - All components are redundant
PowerPath | None - Insufficient Hardware for PowerPath | PowerPath SE | PowerPath - Full | PowerPath - Full
Single Points of Failure | HBA; SP; Cable | HBA; Switch; Cable | None | None

The following sections describe the protection options shown in Table 2.


Low protection

This is a very basic option. In this case, the host has a single HBA and is connected to a single SP. This configuration includes multiple single points of failure. The failure of the HBA, cable, or SP results in data being unavailable for the host applications. Customers running mission-critical applications must refrain from this configuration.

Medium protection

This option provides some protection. It consists of the use of PowerPath SE software (which EMC provides free of cost) and a single switch for host-to-storage connectivity. When configured properly, the PowerPath SE software running on the host provides basic LUN trespass functionality during operations such as nondisruptive upgrades (NDUs), as well as in the rare event of single SP failure. This configuration includes multiple single points of failure. Failure of the HBA, cable, or switch will result in data being unavailable for the host applications. Customers running mission-critical applications must refrain from this configuration.

High protection

This configuration is recommended for all business-critical applications provided that adequate measures have been taken at the host and application level. It entails the use of full-feature PowerPath software. In this configuration, there are dual HBAs connected to the host; therefore, there is a redundant path to each SP. There is no single point of failure. Data availability is ensured in the event an HBA, cable, or SP fails. Since there is a single path per SP, this configuration does not provide any additional performance enhancement.

Ultra protection

This configuration is recommended for the highest level of protection and may also help with higher performance. It entails the use of full-feature PowerPath software. In this configuration, there are dual HBAs connected to the host; therefore, there is a redundant path to each SP. There is no single point of failure. Data availability is ensured in the event an HBA, cable, or SP fails. Since there are multiple paths per SP, this configuration benefits from PowerPath's load-balancing feature and thus provides additional performance.
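The single points of failure listed in Table 2 can be reasoned about mechanically: a component is a single point of failure if every host-to-storage path passes through it. The following Python sketch shows that check. The component labels (`HBA0`, `cable0`, `switch0`, and so on) and the exact path layouts are illustrative assumptions, not taken from the diagrams in this paper, and the sketch assumes failover software is present so that a path to either SP counts.

```python
def single_points_of_failure(paths: list[set[str]]) -> set[str]:
    """Return the components that appear on every host-to-storage path.

    Each path is the set of components (HBA, cable, switch, SP) that an
    I/O traverses between the host and the storage system.
    """
    if not paths:
        return set()
    return set.intersection(*paths)


# Low protection: one HBA cabled directly to one SP.
low = [{"HBA0", "cable0", "SPA"}]

# Medium protection: one HBA and one switch, with both SPs reachable.
medium = [
    {"HBA0", "cable0", "switch0", "SPA"},
    {"HBA0", "cable0", "switch0", "SPB"},
]

# High protection: two fully independent paths.
high = [
    {"HBA0", "cable0", "switch0", "SPA"},
    {"HBA1", "cable1", "switch1", "SPB"},
]
```

Under these assumptions the function reports the HBA, cable, and SP for the low-protection layout, the HBA, cable, and switch for the medium-protection layout, and an empty set for the high-protection layout, which matches the last row of Table 2.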

Switch vendor information

Storage area network (SAN) switch information on a few switch types can be found at the following switch vendor websites; the EMC Support Matrix contains information about other switch types that EMC supports:

Cisco storage networking website:
http://www.cisco.com/en/US/products/hw/ps4159/index.html

Brocade storage networking website:
http://www.brocade.com/products/index.jsp

Storage-side high availability

Once the host- and connectivity-side HA is ensured, the next area of focus is the storage system itself. For any mission-critical application environment, it is important that the storage system on which the data resides has a highly available architecture. The CLARiiON storage system offers an N+1 redundant architecture, which provides data protection against any single component failure. These components are discussed in the next section.


Storage-system components

CLARiiON storage systems are designed for high availability. With redundant components such as dual SPs, dual back-end loops, and dual-ported disk drives, CLARiiON storage systems can ride through multiple component failure scenarios. Other features that make CLARiiON storage systems resilient in the face of various failure types include triple protection of the storage system database (also referred to as PSM, the Persistent Storage Manager) and the FLARE database, which, among other things, is designed to keep customer data available. These features are available out of the box and no additional configuration is required at the time of installation. However, customers must choose the way in which disks are bound. The next section discusses the best practices for RAID configuration.

RAID configuration

To access the data on the disk drives, the disks must be bound into a RAID group. There are different RAID configurations that are supported with a CLARiiON storage system. Note that customers have the option of using individual disks. However, this is rarely done, since by doing so they do not benefit from the redundancy offered by the RAID-protected configuration.

CLARiiON offers RAID 1, 1/0, 3, and 5 options for data protection. These configurations offer protection against single-disk failure. In the case of RAID 1/0 (mirrored stripes), multiple disk failures can be tolerated in some cases, as long as the failures do not occur within the same mirrored pair. RAID 0 offers a high-performance configuration, but does not offer any protection.
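The trade-off between protection and usable capacity for these RAID types follows standard RAID arithmetic, which can be sketched in Python as below. The function name `usable_fraction` is hypothetical, and the minimum disk counts are generic RAID constraints rather than CLARiiON-specific binding rules, which may be stricter.

```python
def usable_fraction(raid_type: str, disks: int) -> float:
    """Return the fraction of raw capacity available for data.

    RAID 0 stores no redundancy, mirrored types keep two copies of
    everything, and RAID 3 and RAID 5 spend one disk's worth of
    capacity on parity.
    """
    if raid_type == "0":
        if disks < 1:
            raise ValueError("RAID 0 needs at least one disk")
        return 1.0
    if raid_type in ("1", "1/0"):
        if disks < 2 or disks % 2 != 0:
            raise ValueError("mirrored RAID needs an even number of disks")
        return 0.5
    if raid_type in ("3", "5"):
        if disks < 3:
            raise ValueError("parity RAID needs at least three disks")
        return (disks - 1) / disks
    raise ValueError(f"unsupported RAID type: {raid_type}")
```

For instance, a five-disk RAID 5 group leaves four disks' worth of capacity for data, while a RAID 1/0 group of any size leaves half.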

Table 3 shows examples of various protection levels as they relate to resilience in case of disk failure and

the effective utilization of the raw disk space.

Table 3. Storage system RAID configuration options

Low protection

Disks configured as RAID 0 offer no protection against a single disk failure. The only reason this RAID type is selected is to get the benefit of striped writes that RAID 0 offers for higher performance. This configuration may be used for things like temp files or for holding any data that needs fast access. In the event of a disk failure, the information within the RAID 0 group will be unrecoverable. Therefore this RAID configuration must be used with great caution.

Medium protection

Disks configured as RAID 3 or RAID 5 with eight or more disks provide protection against a single-disk failure within the same RAID group. When a single disk fails within the RAID group, the hot spare (discussed later in this paper) is invoked and the data that was contained on the failed disk is rebuilt. When the failed disk is replaced, the data from the hot spare (invoked earlier) is copied to the replacement drive. Once the rebuild process completes, the RAID group is ready to withstand another disk failure.




High protection

In order to make the RAID group less vulnerable to double-disk failure, fewer disks can be used to configure the RAID group, which reduces the possibility of double-disk failure within the same RAID group. Medium protection may be upgraded to high protection by reducing the number of disks within the RAID 3 or RAID 5 group to three disks.
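To see why a smaller group is less exposed, consider a simplified model (an assumption for illustration, not an EMC reliability figure): after the first disk fails, each of the remaining disks in the group independently fails during the rebuild window with probability p. The chance of a second failure in a group of n disks is then 1 - (1 - p)^(n - 1). A minimal Python sketch:

```python
def second_failure_probability(disks: int, p_disk: float) -> float:
    """Probability that at least one surviving disk fails during rebuild.

    disks  -- total number of disks in the RAID group
    p_disk -- assumed probability that any one surviving disk fails
              during the rebuild window (illustrative value only)
    """
    if disks < 2:
        raise ValueError("a RAID group needs at least two disks")
    if not 0.0 <= p_disk <= 1.0:
        raise ValueError("p_disk must be between 0 and 1")
    survivors = disks - 1
    return 1.0 - (1.0 - p_disk) ** survivors
```

With an assumed p of 0.001, a three-disk group comes out at roughly 0.002 and a nine-disk group at roughly 0.008, so the exposure grows almost linearly with the number of surviving disks.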

Ultra protection

In certain environments, data protection and application performance are equally important requirements (especially in certain transaction-processing environments such as database applications). To meet these objectives, customers can choose a RAID 1/0 configuration that offers high performance and additional protection against disk failure. In the RAID 1/0 configuration, a double-disk fault may be tolerated as long as there is no more than one disk failure within the same mirrored pair.
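The RAID 1/0 tolerance rule above can be expressed as a short check. This is a minimal sketch in Python; the function name and the disk labels are hypothetical.

```python
def raid10_survives(mirrored_pairs: list[tuple[str, str]],
                    failed: set[str]) -> bool:
    """Return True if no mirrored pair has lost both of its disks."""
    return all(not (a in failed and b in failed) for a, b in mirrored_pairs)
```

For a group built from the pairs `("d0", "d1")` and `("d2", "d3")`, losing `d0` and `d2` leaves the data available because each pair still has one good disk, whereas losing `d0` and `d1` does not.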

Hot sparing policy

Hot spares are spare disks that are pre-allocated at the time of configuration. A hot spare is invoked in the event of a disk failure. Once invoked, the data that resided on the failed disk is rebuilt either from the mirrored pair (RAID 1 and RAID 1/0) or from the data and parity on the other drives (RAID 3 and RAID 5). When the failed disk is replaced, the data on the replacement disk is equalized with the content of the hot spare. After the completion of the equalization process, the hot spare returns to its default position, ready for any future disk failure event.

Number of hot spares

There should be at least one hot spare disk for every 30 drives on the CLARiiON storage system. Customers may choose to configure more hot spares. For ease of management, it is also recommended that the hot spare be configured in the last drive slot of a disk-array enclosure/shelf. However, the hot spare may be configured anywhere in the system with the exception of the vault drives, which are used for the cache vault and certain other internal purposes. The vault drives are the first five drives on the CX series. Their location varies for prior generations of products.

Sizing the hot spare disk

When planning global hot spares on the system, disks should be as large as or larger than the drive(s) they may be required to replace in the event of disk failure.
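The two planning rules above (at least one hot spare per 30 drives, and a spare at least as large as any drive it may replace) can be sketched in Python as follows. The function names are hypothetical, and rounding up to the next whole spare is one interpretation of "one for every 30 drives"; whether the spares themselves count toward the drive total is left to the planner.

```python
import math


def minimum_hot_spares(total_drives: int, drives_per_spare: int = 30) -> int:
    """Return the minimum number of hot spares for a drive count.

    Rounds up, so 31 drives call for two spares (an interpretation of
    the one-per-30 guideline, not an EMC-stated formula).
    """
    if total_drives < 0:
        raise ValueError("total_drives must not be negative")
    return math.ceil(total_drives / drives_per_spare)


def spare_can_cover(spare_gb: int, drive_sizes_gb: list[int]) -> bool:
    """Return True if the spare is as large as every drive it may replace."""
    return all(spare_gb >= size for size in drive_sizes_gb)
```

For example, a system with 146 GB and 300 GB drives needs a spare of at least 300 GB to cover both drive sizes; a 146 GB spare could only stand in for the smaller drives.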

Disk replacement

If a disk drive fails and needs to be replaced, you should follow EMC's recommended procedure for disk replacement and take all the necessary precautions, including proper drive handling. In case of doubt, wait for EMC support personnel. In the rare event of multiple drive failures, do not replace the drives. Instead, wait for trained EMC support personnel to arrive.

Rebuild-time considerations

When a disk drive fails and a hot spare is invoked, the rebuild process starts. During this process, data that was resident on the failed disk is rebuilt from available redundant components. The time it takes to rebuild a failed disk depends upon various factors, which are discussed next.

System load

During the rebuild process, the system has to do a significant amount of work to read the data from the redundant components of the RAID group to rebuild data. In the case of mirrored RAID groups (RAID 1 and RAID 1/0), the process involves reading data from the mirrored pair. In the case of parity RAID groups (RAID 3 and RAID 5), data is reconstructed by reading the data and parity from the available disk drives in the RAID group. Therefore, if the system is idle, the rebuild process will be faster since all the system resources are available to the rebuild process. If the system is under heavy application workload, the rebuild process can take relatively longer to complete.

Rebuild priority

Besides system load, the rebuild process is controlled by the rebuild priority set for the LUNs within the RAID group. The priority settings for the rebuild process are:

Low

Medium

Fast

ASAP

By default the priority is set to ASAP. In most cases, the default is the recommended rebuild priority.

Size of the RAID group

The bigger the size of the RAID group (that is, number of disks within the RAID group), the longer it takes

to complete the rebuild process in the event of a single disk failure within the RAID group. The rebuild

time can be reduced if the size of the RAID group is kept smaller.

Size and type of disk drive

The size and type of the disk itself can affect the rebuild time. For example, a five-disk RAID 5 group

comprised of 36 GB 15,000 rpm disk drives will rebuild relatively faster than a nine-disk RAID 5 group

comprised of 250 GB 5,400 rpm disk drives.

Table 4 shows examples of various RAID group configurations under different system loads, rebuild

priorities, RAID groups, and disk sizes in order to show the risk associated with a double-disk failure

scenario. These are only examples and do not presume to show the best configuration.




Table 4. RAID protection options and rebuilds

Clustering and replication

There is another dimension to high availability beyond local protection. Besides host, connectivity, and storage high availability, options such as protection against server hardware and storage-system failure should be considered. These events are very rare, but if they happen they may cause disruption.

Clustering: Protecting against loss of the primary application server

There are various clustering software products to protect the production environment against server failure. The most popular products include the following:

Microsoft Cluster Server

VERITAS Cluster Server

Sun Cluster Server

Please check the EMC Support Matrix for supported cluster software.

Data mirroring: Protecting against loss of the primary storage system

To protect the production site against the loss of access to the primary storage system (for example, due to power failure), customers can mirror data from the primary site to another disaster recovery site. Depending on the recovery point objective (RPO) and recovery time objective (RTO), there are different options available. In cases where customers cannot afford to lose even a single transaction, they can use a synchronous mirroring product such as MirrorView/Synchronous. In circumstances where the disaster recovery site must be hundreds or even thousands of miles from the production site, some type of asynchronous replication application (such as RecoverPoint or MirrorView/Asynchronous) may be used.

In almost all of these cases, the customer will benefit from the use of both clustering and remote mirroring.

Validation and maintenance

A significant amount of time is often invested in preparing and configuring a new environment for high
availability. Because of this investment, it is important to validate the environment after configuration to

ensure it was implemented properly for high availability. If you have installed a new storage system,

attached a new host, or are about to perform other ongoing maintenance procedures (such as updating the

software on the CLARiiON storage system), it is imperative that you test the HA configuration to ensure

that data availability is maintained in the event of, for example, a path failure. It is also imperative that




failover testing occur regularly within the environment to protect against any inadvertent changes that may

have broken the high availability of the configuration.

HA validation

There are two important pieces to testing the high availability of your environment. After installing a
storage system, or attaching a new host, you should perform a physical test to ensure that failover occurs
as expected. Then, after you are in production, you should perform periodic health checks to validate the
availability of the environment, and to make sure that nothing unexpected changed within the environment
(for example, a zone inadvertently changed, or failovermode adjusted on the wrong initiator). The next two
sections discuss how to check for failover after installation and how to perform ongoing periodic
verification.

Initial host failover testing

After the host environment has been successfully configured for failover, including the installation of
failover software (for example, EMC PowerPath), HBAs, and so forth, the next and most important step is
testing. While the environment is in the deployment stage, induce a failure. For example, pull a fibre cable
that connects the host HBA to the storage system or switch, and ensure that the application LUNs are failed
over to the alternate path by the host path management software. For failover to work there must be active
I/O at the host level.

Following a successful failover, test the failback capability when the fault condition is cleared. In this

example, reconnect the fibre cable and see if the host I/O fails back to the default path. For more

information about EMC PowerPath software failover and failback features, refer to the PowerPath-related
documentation on Powerlink.
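The expected behavior in this test (I/O moves to the alternate path when the default path fails, then returns when the fault is cleared) can be illustrated with a toy model. The class, the path names, and the selection rule below are hypothetical; they do not represent how PowerPath or any other path management software is implemented.

```python
class MultipathLun:
    """Toy model of a LUN reachable over a default and an alternate path."""

    def __init__(self, default_path: str, alternate_path: str) -> None:
        self.default_path = default_path
        self.healthy = {default_path: True, alternate_path: True}
        self.active_path = default_path

    def _reselect(self) -> None:
        # Prefer the default path when it is healthy; otherwise use any survivor.
        if self.healthy[self.default_path]:
            self.active_path = self.default_path
        else:
            survivors = [p for p, ok in self.healthy.items() if ok]
            self.active_path = survivors[0] if survivors else None

    def fail(self, path: str) -> None:
        self.healthy[path] = False
        self._reselect()

    def restore(self, path: str) -> None:
        self.healthy[path] = True
        self._reselect()

    def send_io(self) -> str:
        """Return the path that carries the I/O, or raise if none is left."""
        if self.active_path is None:
            raise RuntimeError("no healthy path: data unavailable")
        return self.active_path


def run_failover_test(lun: MultipathLun, alternate_path: str) -> bool:
    """Pull the default path, confirm failover, reconnect, confirm failback."""
    default = lun.default_path
    lun.fail(default)                                # pull the cable
    failed_over = lun.send_io() == alternate_path
    lun.restore(default)                             # reconnect the cable
    failed_back = lun.send_io() == default
    return failed_over and failed_back


lun = MultipathLun("hba0-to-SPA", "hba1-to-SPB")
print("failover and failback OK:", run_failover_test(lun, "hba1-to-SPB"))
```

Note that the model only reports a path change when `send_io` is called, which mirrors the point above that active I/O is needed for failover to be observed.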

After physically validating failover, the HAVT utility should also be run to ensure that there are no other

HA issues in the environment. This is discussed in the following section.

Ongoing high availability verification

After manually testing failover, each server's high availability should regularly be verified to ensure that

nothing has changed in the HA configuration. HA verification should also be performed before a software

update is performed on the CLARiiON storage system. Because an update of the FLARE OE software or

the installation of a new software enabler reboots each SP in turn, it is important to ensure that each host
that is to remain online during the update can ride through this reboot while maintaining access to the data

on the system. Maintaining access during an update means that, at a minimum, each server is zoned to each

SP and PowerPath SE is installed with the proper failovermode settings applied.
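The zoning part of this minimum requirement can be expressed as a simple check: a host can ride through a rolling SP reboot only if it has at least one path to each SP. The sketch below assumes a hypothetical inventory of (HBA, SP) pairs per host; it does not query a real storage system and does not verify the failover software or failovermode settings.

```python
def rides_through_sp_reboot(paths) -> bool:
    """True if the host has at least one path to SP A and one to SP B.

    paths: iterable of (hba_name, sp_name) tuples, with sp_name "A" or "B".
    """
    reachable_sps = {sp for _hba, sp in paths}
    return {"A", "B"} <= reachable_sps


def hosts_at_risk(host_paths: dict) -> list:
    """Names of hosts that would lose access while one SP reboots."""
    return sorted(host for host, paths in host_paths.items()
                  if not rides_through_sp_reboot(paths))


# Hypothetical inventory for illustration.
inventory = {
    "dbserver1": [("hba0", "A"), ("hba1", "B")],
    "appserver2": [("hba0", "A"), ("hba1", "A")],   # zoned to SP A only
}
print(hosts_at_risk(inventory))
```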

As described in the "Prerequisite for a highly available host environment" section, HAVT is a tool that is
used when upgrading arrays, and is available in the GUI and the text-based versions of Navisphere Server
Utility (starting with release 24), and with the Navisphere Service Taskbar Software Assistant. HAVT
allows you to validate CLARiiON-attached hosts for high availability. Select the Verify Server High
Availability option as shown in Figure 1, and indicate whether this check is part of a Software Update, in
which case the result of the report is sent to the storage system so that the software update process can
validate that the servers will ride through the update, or whether it is a host attach validation (regular health
check). HAVT displays results that show whether the server meets HA requirements and allows you to
view the Navisphere Server High Availability Report. In this scenario the important tab to note is the
Issues tab as shown in Figure 3.




Figure 3. Navisphere Server Utility High Availability verification issues report




Issues are generated based on a series of checks performed by the HAVT utility. These checks include

looking for redundant HBAs, ensuring path management software is installed, and validating that the

proper initiator settings (such as failovermode) are set on the storage system. As of release 24, HAVT

supports the following operating systems (refer to the EMC Navisphere Host Agent/CLI and Utilities
Release Notes on Powerlink for the latest support information):

Solaris 8, 9

HP-UX 11.0, 11.11, 11.23 IA 64, 11.23 PA RISC

Windows 2000, 2003 (Fibre Channel and iSCSI attaches)

AIX 5.2, 5.3

Red Hat Enterprise 3 and 4 (Fibre Channel and iSCSI attaches)

SuSE Enterprise Server 8 and 9 (Fibre Channel and iSCSI attaches)

AsianUX (2.0) (Fibre Channel and iSCSI attaches)

HAVT supports the following failover software:

PowerPath

VERITAS DMP

HP-UX PVLinks
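The kinds of checks described above (redundant HBAs, installed path management software, and correct initiator settings) can be sketched as follows. This is not HAVT's implementation: the host dictionary layout, the severity levels, and the corrective-action wording are assumptions made for illustration. Only the list of failover software comes from the text.

```python
# Failover software named in the list above.
SUPPORTED_FAILOVER_SOFTWARE = {"PowerPath", "VERITAS DMP", "HP-UX PVLinks"}


def check_host(host: dict) -> list:
    """Return (severity, issue, corrective action) tuples for one host."""
    issues = []

    # Check 1: at least two distinct HBAs.
    if len(set(host.get("hbas", []))) < 2:
        issues.append(("critical",
                       "fewer than two HBAs found",
                       "install a second HBA on a separate path"))

    # Check 2: supported path management software is present.
    software = host.get("failover_software")
    if software not in SUPPORTED_FAILOVER_SOFTWARE:
        issues.append(("critical",
                       f"no supported path management software (found: {software})",
                       "install supported failover software"))

    # Check 3: failovermode on the storage system matches the expected value.
    expected = host.get("expected_failovermode")
    if expected is not None and host.get("failovermode") != expected:
        issues.append(("warning",
                       f"failovermode is {host.get('failovermode')}, expected {expected}",
                       "correct the initiator settings on the storage system"))
    return issues


# Hypothetical host record for illustration.
example = {"hbas": ["hba0"], "failover_software": "PowerPath",
           "failovermode": 0, "expected_failovermode": 1}
for severity, issue, action in check_host(example):
    print(f"[{severity}] {issue} -> {action}")
```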

The Issues summary lists all the critical errors and warnings discovered in the host configuration with
regard to high availability, and provides corrective actions for each error and warning. The report also
includes a Details tab. This tab provides further information on:

Server Status: Includes the version of the failover software, and information about the FC HBAs,
iSCSI HBAs, and NICs, as well as details on the devices (similar to the information displayed by a
powermt display dev=all command issued from the PowerPath command line).

Initiators: Includes information about HBA and NIC configuration details, including driver and
registry settings, iSCSI host iqn, persistent targets, established sessions, and (CHAP/mutual
CHAP) security information for iSCSI initiators.

Data Connection Report: Gives information about the failover mode, arraycommpath, initiator
type, and registration for each HBA connected to a device.

Software, Services and System Updates: Lists server-specific installed software features, including
EMC-specific software and OS patches.

HAVT may be run in the following scenarios:

As part of the Prepare For Installation step of the Software Assistant, a process that aids customers
and service with upgrading the storage system. In this case, run the HAVT utility and analyze each

host report to preempt potential data unavailability issues that could occur during the SP reboots as

part of the software install.

Any time you need to check for host-related issues.

Periodically. Using a script, you can periodically run HAVT to generate a report for analysis.
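A periodic run could be wrapped in a small script such as the sketch below, which executes a report command and archives its output under a timestamped name. The actual HAVT command line is not given in this paper, so the command here is a stand-in that only prints a placeholder line; the function name, file naming, and directory are likewise assumptions. Scheduling would be left to cron or a similar facility.

```python
import subprocess
import sys
from datetime import datetime
from pathlib import Path

# Stand-in command: replace with the real report command for your environment.
PLACEHOLDER_COMMAND = [sys.executable, "-c", "print('HA report placeholder')"]


def run_and_archive(command, report_dir) -> Path:
    """Run a report command and save its standard output to a timestamped file."""
    report_dir = Path(report_dir)
    report_dir.mkdir(parents=True, exist_ok=True)

    # check=True raises CalledProcessError if the command exits with an error.
    result = subprocess.run(command, capture_output=True, text=True, check=True)

    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    report_path = report_dir / f"ha-report-{stamp}.txt"
    report_path.write_text(result.stdout)
    return report_path
```

Calling `run_and_archive(PLACEHOLDER_COMMAND, "ha-reports")` from a scheduled job would leave one file per run, which makes it easy to compare the current report against earlier ones.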

HAVT and its resulting report are important tools to help avoid potential data unavailability issues caused
by improperly attached hosts. Improper server configurations (meaning HA was not properly
implemented) are the No. 1 issue identified in weekly analyses of data unavailable reports. HAVT has
been designed to help customers avoid these issues by providing a utility to validate the environment
during critical maintenance procedures, such as a new server attach, new storage system installation, or
CLARiiON software update.

Change control process

It is very likely that a live production environment is going to change due to business requirements and
other factors. To ensure that the production environment maintains its resilience and remains highly
available, it is important that you refer to the E-Lab Interoperability Navigator before making any changes.
This includes changing any of the following:




Storage system software: This is usually the process of upgrading the FLARE operating
environment on the CLARiiON storage system.

HBA firmware: HBA vendors' websites contain the latest information on HBA driver firmware,
fcode, and other software. Refer to the following URLs for more details on Emulex and QLogic HBAs:

Emulex drivers and installation docs with HBA settings at:

http://www.emulex.com/ts/docoem/emc/index.html

QLogic drivers and installation docs with HBA settings at:

http://www.qlogic.com/support/oem_emc.asp

Switch firmware

Operating system patches and hot fixes

Path management software: EMC PowerPath, HP PVLinks, VERITAS DMP, and so on
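One way to support this process is to keep a baseline inventory of the components listed above and compare it with the proposed state, so that every changed item can be checked against the E-Lab Interoperability Navigator before the change is made. The sketch below uses made-up component names and version strings; the inventory format is an assumption.

```python
def changed_components(baseline: dict, proposed: dict) -> dict:
    """Return {component: (old_version, new_version)} for every difference."""
    changes = {}
    for name in sorted(set(baseline) | set(proposed)):
        old, new = baseline.get(name), proposed.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


# Hypothetical inventories; the version strings are placeholders.
baseline = {"storage system software": "v1", "HBA firmware": "v1",
            "switch firmware": "v1", "path management software": "v1"}
proposed = {"storage system software": "v2", "HBA firmware": "v1",
            "switch firmware": "v1", "path management software": "v1",
            "operating system patches": "patch-1"}

for name, (old, new) in changed_components(baseline, proposed).items():
    print(f"verify before change: {name}: {old} -> {new}")
```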

Conclusion

This white paper offers best practices for end-to-end high availability in a mission-critical and
business-critical production environment. The high availability design starts at the host, continues with
connectivity, and ends with the storage system. By adding clustering technology and combining it with
remote mirroring, customers can make their production environments highly available. However, a highly
available design comes with a higher price tag. It requires redundant hardware components to guard against
any possible failures. Building a highly available environment requires closely identifying those application
environments that need high protection when it comes to host components such as HBAs, use of failover
software such as PowerPath, use of redundant switch fabrics, and RAID protection at the storage-system
level. HAVT is an invaluable tool for ensuring the ongoing health of your highly available environment.

Appendix A: Failover mode settings

These settings are not applicable for CDL series.

The following tables include Initiator Type, arraycommpath, and Failover Mode settings for failover software on
CLARiiON-supported operating systems. They also indicate which failover software is supported on each
operating system. The following notes aid in appropriate use of these tables:

1. Initiator Type is referred to as systemtype if using NaviCLI rather than the failover wizard within
Navisphere Express.

2. The settings below are those set within the Navisphere Manager failover wizard, or, in the case of the
UnitSerialNumber variable, the Group Edit option in the Connectivity Status dialog box.

3. Parentheses identify the NaviCLI equivalent value.

Table 5. AIX failover mode settings

Parameter         PowerPath                        DMP (AIX 5.1 and 5.2 only)
Initiator Type    CLARiiON Open (3)                CLARiiON Open (3)
Arraycommpath     Enabled (1) or Disabled (0) [1]  Enabled (1)
Failovermode      3 or 1 [2]                       2
UnitSerialNumber  Array                            Array

[1] AIX settings depend on the CLARiiON software being used:

If using ODM definitions, arraycommpath should be set to Enabled (1).

If using CLArrayS3 software, arraycommpath should be set to Disabled (0).

[2] Set failovermode to 3 for PowerPath 4.5.1 or later. NDU primus case emc67186 has the complete NDU
requirements.




Table 6. HP-UX failover mode settings

Parameter                          PVLinks                          PowerPath                    DMP (HP-UX 11i only)         No path management software
Initiator Type (Access Logix)      HP Auto Trespass (2) [1]         HP No Auto Trespass (hex A)  HP No Auto Trespass (hex A)  HP No Auto Trespass (hex A)
Initiator Type (non-Access Logix)  HP Auto Trespass (2) [1]         decimal 10 (hex A)           N/A                          HP No Auto Trespass (hex A)
Arraycommpath                      Enabled (1) or Disabled (0) [2]  Enabled (1)                  Enabled (1)                  0 (Disabled) or 1 (Enabled) [2]
Failovermode                       0                                1                            2                            0
UnitSerialNumber                   LUN or Array [3]                 LUN or Array [3]             LUN or Array [3]             LUN or Array [3]

[1] HP PVLinks requires that AutoTrespass be set for all LUNs. To set AutoTrespass, edit the Navisphere
Host Agent configuration file (agent.conf) by commenting out OptionsSupported Autotrespass and
restarting the Host Agent.

[2] For HP-UX running PVLinks, arraycommpath can be Enabled (1) or Disabled (0). Either will work.

[3] For HP-UX 11i v1.0, UnitSerialNumber may have been changed to LUN if problems were experienced
with the device display in the HP-UX SAM utility.

Table 7. Linux failover mode settings

Parameter         PowerPath          DMP                MPIO
Initiator Type    CLARiiON Open (3)  CLARiiON Open (3)  CLARiiON Open (3)
Arraycommpath     Enabled (1)        Enabled (1)        Enabled (1)
Failovermode      1                  2                  1
UnitSerialNumber  Array              Array              Array

Table 8. NetWare failover mode settings [1]

Parameter         PowerPath
Initiator Type    CLARiiON Open (3)
Arraycommpath     Enabled (1)
Failovermode      1
UnitSerialNumber  Array

[1] Validate support for NetWare on the newer CLARiiON models via the E-Lab Interoperability Navigator
(available on Powerlink, EMC's password-protected extranet for customers and partners). An RPQ may
be required.




Table 9. Solaris failover mode settings

Parameter          PowerPath          DMP                StorEdge Traffic Manager
Initiator Type     CLARiiON Open (3)  CLARiiON Open (3)  CLARiiON Open (3)
Arraycommpath      Enabled (1)        Enabled (1)        Enabled (1)
Failovermode       1                  2                  1
UnitSerialNumber*  Array or LUN       Array or LUN       Array or LUN

*Sun Solaris installations with PowerPath and DMP:

Solaris 2.6, 7, and 8: UnitSerialNumber should be set to LUN.

Solaris 9: Will work with UnitSerialNumber set to either Storage System or LUN.

Table 10. Tru64 failover mode settings

Parameter Native failover

Initiator Type Compaq/Tru64 (hex 1C)


Failovermode 0


Table 11. VMware failover mode settings

Parameter Native

Initiator Type CLARiiON Open (3)


Failovermode 1


Table 12. Windows failover mode settings

Parameter         PowerPath          DMP (Windows 2000 and Windows 2003 only)
Initiator Type    CLARiiON Open (3)  CLARiiON Open (3)
Arraycommpath     Enabled (1)        Enabled (1)
Failovermode      1                  1
UnitSerialNumber  Array              Array

When changing any one of the Arraycommpath, Failovermode, and Initiator Type settings, all settings
will be reset to the storage-system default settings, so it is necessary to set not only the setting that is being
changed but also all other initiator settings. This applies to both methods of changing these parameters (Group
Edit in Connectivity Status and the failover wizard).
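Because a change to one setting resets the others, it can help to keep the recommended values together and always apply them as a complete set. The sketch below transcribes the NaviCLI-equivalent values from Tables 7, 9, and 12 into a lookup; the dictionary layout and function name are assumptions, and the code does not issue any commands to a storage system.

```python
# (operating system, failover software) -> initiator settings,
# transcribed from Tables 7, 9, and 12 (NaviCLI-equivalent values).
FAILOVER_SETTINGS = {
    ("Linux", "PowerPath"):   {"initiator_type": 3, "arraycommpath": 1, "failovermode": 1},
    ("Linux", "DMP"):         {"initiator_type": 3, "arraycommpath": 1, "failovermode": 2},
    ("Linux", "MPIO"):        {"initiator_type": 3, "arraycommpath": 1, "failovermode": 1},
    ("Solaris", "PowerPath"): {"initiator_type": 3, "arraycommpath": 1, "failovermode": 1},
    ("Solaris", "DMP"):       {"initiator_type": 3, "arraycommpath": 1, "failovermode": 2},
    ("Windows", "PowerPath"): {"initiator_type": 3, "arraycommpath": 1, "failovermode": 1},
    ("Windows", "DMP"):       {"initiator_type": 3, "arraycommpath": 1, "failovermode": 1},
}


def full_settings(os_name: str, software: str) -> dict:
    """Return all three settings so they can be applied in a single change."""
    try:
        return dict(FAILOVER_SETTINGS[(os_name, software)])
    except KeyError:
        raise ValueError(f"no settings recorded for {software} on {os_name}") from None


print(full_settings("Linux", "DMP"))
```

Returning the whole set, rather than a single value, reflects the rule above: whichever parameter prompted the change, all of the initiator settings are reapplied.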
