EMC ClarIIon High Availability


  • 8/12/2019 EMC ClarIIon High Availability


    EMC CLARiiON High Availability (HA)

    Best Practices Planning

    Abstract

    This white paper discusses end-to-end high availability (HA). It considers the HA aspects of mission-critical storage environments, starting at the host, continuing through the connectivity infrastructure (including switches), and ending at the storage system. The paper also explains why HA must be kept in mind during ongoing operations in order to maintain HA in production environments. Finally, it discusses the choices available to customers so that they can set appropriate expectations for data availability in their environments.

    April 2007


    Copyright 2005, 2007 EMC Corporation. All rights reserved.

    EMC believes the information in this publication is accurate as of its publication date. The information is

    subject to change without notice.

    THE INFORMATION IN THIS PUBLICATION IS PROVIDED AS IS. EMC CORPORATION

    MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED

    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

    Use, copying, and distribution of any EMC software described in this publication requires an applicable

    software license.

    For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

    All other trademarks used herein are the property of their respective owners.

    Part Number H1737.1

    EMC CLARiiON High Availability (HA)Best Practices Planning 2


    Table of Contents

    Executive summary
    Introduction
      Audience
    Host configuration for high availability
      Prerequisite for a highly available host environment
      CLARiiON Procedure Generator
      Number of paths from the host to the storage system
      Host bus adapter vendor information
      Automatic path failover and failback
        Path management failover settings
        Install free failover software (EMC PowerPath SE)
    Connectivity configuration for high availability
      Low protection
      Medium protection
      High protection
      Ultra protection
      Switch vendor information
    Storage-side high availability
      Storage-system components
      RAID configuration
      Low protection
      Medium protection
      High protection
      Ultra protection
      Hot sparing policy
        Number of hot spares
        Sizing the hot spare disk
        Disk replacement
      Rebuild-time considerations
        System load
        Rebuild priority
        Size of the RAID group
        Size and type of disk drive
    Clustering and replication
      Clustering: Protecting against loss of the primary application server
      Data mirroring: Protecting against loss of the primary storage system
    Validation and maintenance
      HA validation
        Initial host failover testing
        Ongoing high availability verification
      Change control process
    Conclusion
    Appendix A: Failover mode settings


    Executive summary

    EMC is focused on helping customers maintain the most highly available infrastructure possible. A highly available environment depends on many factors: not only deploying world-class products and services, but also deploying and configuring them in a manner that provides maximum availability. High availability (HA) also comes at a cost, and not all applications need the same level of availability. Some applications are absolutely mission-critical, while others may be business-critical but can withstand a few minutes of outage. This white paper discusses what is needed to ensure end-to-end HA and helps customers make appropriate choices.

    Introduction

    A highly available system is one that has no single point of failure (SPOF). If a component or element fails, the system maintains its basic functionality. In many cases, an HA system can withstand multiple failures as long as they do not occur within the same redundant component set. For example, a single disk failure in a RAID 5 group does not affect data availability, and the system can withstand multiple single-disk failures as long as they occur in different RAID groups.

    Audience

    This white paper is primarily intended for EMC CLARiiON customers. However, EMC field personnel can also benefit from the information it contains.

    Host configuration for high availability

    There are multiple aspects to HA, but designing such an environment starts at the host. To ensure that application data is highly available, the host must be configured to withstand certain single failures, such as the failure of a host bus adapter (HBA), a fibre cable, or the failover software.

    Prerequisite for a highly available host environment

    To ensure that the production environment matches configurations that EMC has tested and verified for interoperability, refer to the E-Lab Interoperability Navigator (a searchable database of the EMC Support Matrix). The Navigator is available on Powerlink, EMC's password-protected extranet for customers and partners.

    The E-Lab Interoperability Navigator lists, among other things, the supported revisions of:

    Host operating systems

    HBA models

    HBA software, including firmware

    Latest operating system patches

    Switch firmware

    CLARiiON FLARE operating environment

    Symmetrix microcode

    Another important and related utility is the High Availability Verification Tool (HAVT). HAVT helps you validate that a server's components are supported by EMC. HAVT is available in both the GUI and the text-based versions of the Navisphere Server Utility (starting with release 24) and is accessed by selecting the Server Utility's High Availability Verification option.

    The Server Utility is included on the Navisphere Server Support CD that ships with the storage system.

    The Server Utility is supported on Windows, Linux, HP-UX, AIX, and Solaris, and now offers additional features beyond its traditional function of registering the server initiators with the storage system. Refer to the EMC Navisphere Host Agent/CLI and Utilities Release Notes on Powerlink for the latest supported revisions and features. Figure 1 shows the Server Utility's welcome dialog box, which lists all the features currently available in the utility.

    Figure 1. Navisphere Server Utility functions

    For the purposes of this white paper, we discuss only the Verify Server High Availability option. To validate a particular server, select this option and follow the prompts. The result is the Navisphere Server High Availability Report, whose tabs contain important information. One of these tabs is labeled Checklist.

    The Configuration Checklist tab includes information on three main components: the storage system, server hardware, and server software. This information includes the storage system name, model, and FLARE OE version; the manufacturer, model, and firmware of the host operating system; and the manufacturer, model, and firmware of the host itself. It also lists the names and versions of the installed server software. Figure 2 shows an example of this checklist.


    Figure 2. Server Utility High Availability verification checklist report

    Once this checklist has been generated, the report can be printed and compared against the E-Lab Interoperability Navigator (a searchable database of the EMC Support Matrix, available on Powerlink).

    New E-Lab Wizards make it easier to check the relevant components for the task being performed. For example, when attaching a new host and configuring it for HA, the Storage Array Wizard helps determine whether the server hardware and software are supported with a particular storage system model. The checklist follows the order in which the wizard prompts for information and includes all the required information. This makes it quick and easy to enter that information and keeps each server's component revisions in one place.

    This white paper discusses another important feature of the High Availability Verification report in the

    Ongoing high availability verification section.

    CLARiiON Procedure Generator

    The CLARiiON Procedure Generator (CPG) is another useful tool built by EMC for customers and field personnel. It creates procedures for various operations, such as installing a new storage system, adding a host to a new or existing SAN environment, performing a software upgrade, and performing certain recovery procedures. The CPG is available on the Powerlink website.

    Number of paths from the host to the storage system

    For the connection between the host and the storage system to be redundant, there must be a failover path in case the primary path fails. CLARiiON storage systems use a primary/secondary LUN ownership model: although host-addressable logical units (LUNs) can be serviced through both storage processors (SPs), each LUN is serviced by only one SP at a time. If a LUN is trespassed to the peer SP, the peer SP services it instead. A trespass may occur if the primary path to the default SP fails, the host HBA fails, or, in some rare cases, the SP itself fails. In any such event, there must be a standby/secondary path to the peer SP, as well as path-failover software such as EMC PowerPath to initiate failover to that path, so that the data on the LUN remains accessible through the secondary SP.
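
    The ownership and trespass behavior described above can be modeled in a few lines. The following is a minimal Python sketch under simplifying assumptions: a LUN has a default SP and a current owner, and it trespasses to the peer SP only when the host can no longer reach the owning SP. The class and method names (Lun, service_io) are hypothetical; this is not how FLARE or PowerPath are implemented, and it does not model failback.

```python
from dataclasses import dataclass

# Each SP's peer in a dual-SP storage system.
PEER_SP = {"SPA": "SPB", "SPB": "SPA"}


@dataclass
class Lun:
    """Hypothetical model of a LUN with a default SP and a current owning SP."""
    name: str
    default_sp: str
    owner: str = ""

    def __post_init__(self) -> None:
        # A LUN starts out owned by (and serviced through) its default SP.
        if not self.owner:
            self.owner = self.default_sp

    def service_io(self, reachable_sps: set) -> str:
        """Return the SP that services I/O, trespassing to the peer SP if required.

        reachable_sps holds the SPs the host can still reach over a healthy path.
        """
        if self.owner in reachable_sps:
            return self.owner  # Owning SP still reachable: no trespass needed.
        peer = PEER_SP[self.owner]
        if peer in reachable_sps:
            self.owner = peer  # Trespass: the peer SP now services the LUN.
            return self.owner
        raise RuntimeError(f"{self.name}: no path to either SP, data unavailable")


lun = Lun("LUN 5", default_sp="SPA")
print(lun.service_io({"SPA", "SPB"}))  # SPA
print(lun.service_io({"SPB"}))         # SPB: the path to SPA failed, so the LUN trespassed
```

    In this model, a host with no path to the peer SP has nowhere to trespass to, which is why the secondary path matters as much as the failover software.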

    Table 1. Host failover options

    Risk of Data Unavailable | Storage System Level Protection Available | Host Configuration | Host Failover Options
    Very low | Single drive failure; power failure; SP failure; HBA failure; server failure; storage system failure | 2 HBAs per path with PowerPath; clustered environment; mirrored data | Performance and protection
    Low | Single drive failure; power failure; SP failure; HBA failure | Multiple HBAs, SAN environment, PowerPath | High protection
    Medium | Single drive failure; power failure; SP failure | Single HBA SAN environment with PowerPath SE; or multiple HBA direct attach with PowerPath Base | Medium protection
    High | Single drive failure; power failure | Single HBA, direct attach; or single HBA, SAN, no PowerPath SE | Low protection

    Host bus adapter vendor information

    Information for the two major HBA brands that EMC supports can be found at the following HBA vendor websites:

    Emulex drivers and installation documents with HBA settings:

    http://www.emulex.com/ts/docoem/framemc.htm

    QLogic drivers and installation documents with HBA settings:

    http://www.qlogic.com/support/oem_emc.asp

    See the E-Lab Interoperability Navigator (available on Powerlink) for information about other supported HBAs.

    Automatic path failover and failback

    To automate path failover and failback, path failover software must be running on the host. EMC supports several failover software packages, including its own PowerPath software. When the failover software is correctly set up and running on the host, it automatically fails over application I/O from the failed path to the secondary/standby paths. To learn more about EMC PowerPath, go to EMC.com:

    http://software.emc.com/products/software_az/powerpath.htm
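
    The difference between failover and failback can be illustrated with a small path selector. This is a hypothetical Python sketch, not PowerPath's algorithm: each path is marked primary or standby, I/O uses a healthy primary path when one exists and a standby path otherwise, and failback happens simply because the primary path is preferred again once it is repaired. The names Path, PathManager, and set_health are illustrative only, and the trespass that a real failover to the peer SP would involve is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    name: str            # hypothetical label, for example "hba0->SPA"
    primary: bool        # True for the primary path, False for a standby path
    healthy: bool = True


@dataclass
class PathManager:
    """Hypothetical host-side path manager illustrating automatic failover and failback."""
    paths: list = field(default_factory=list)

    def active_path(self) -> Path:
        # Always prefer a healthy primary path. Because this check runs on every
        # selection, I/O fails back automatically once a primary path is repaired.
        for want_primary in (True, False):
            for path in self.paths:
                if path.healthy and path.primary == want_primary:
                    return path
        raise RuntimeError("all paths have failed")

    def set_health(self, name: str, healthy: bool) -> None:
        for path in self.paths:
            if path.name == name:
                path.healthy = healthy
                return
        raise KeyError(name)


mgr = PathManager([Path("hba0->SPA", primary=True), Path("hba1->SPB", primary=False)])
print(mgr.active_path().name)       # hba0->SPA
mgr.set_health("hba0->SPA", False)  # the primary path fails
print(mgr.active_path().name)       # hba1->SPB (failover)
mgr.set_health("hba0->SPA", True)   # the primary path is repaired
print(mgr.active_path().name)       # hba0->SPA (failback)
```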


    Path management failover settings

    Several path failover software packages may be present in the environment. To ensure that the storage

    system is configured appropriately for the host operating system environment, the following parameters

    must be set on the storage system:

    arraycommpath

    failovermode

    systemtype

    unitserialnumber
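
    Before and after a change, it can help to compare the values actually configured for a host against the values required for its operating system and failover software. The Python sketch below performs only that comparison. The setting values in the example are placeholders, not recommendations (the correct values come from Appendix A and the E-Lab Interoperability Navigator), and reading the actual values from the storage system is outside its scope. The function name find_mismatches is hypothetical.

```python
REQUIRED_PARAMETERS = ("arraycommpath", "failovermode", "systemtype", "unitserialnumber")


def find_mismatches(expected: dict, actual: dict) -> dict:
    """Return {parameter: (expected, actual)} for each required parameter that differs.

    Parameters with no expected value are skipped; a parameter missing from
    `actual` is reported with the value "<unset>".
    """
    mismatches = {}
    for param in REQUIRED_PARAMETERS:
        want = expected.get(param)
        if want is None:
            continue  # No requirement recorded for this parameter.
        got = actual.get(param, "<unset>")
        if got != want:
            mismatches[param] = (want, got)
    return mismatches


# Placeholder values for illustration only; take real values from Appendix A / E-Lab.
expected = {"arraycommpath": "1", "failovermode": "1"}
actual = {"arraycommpath": "0", "failovermode": "1"}
print(find_mismatches(expected, actual))  # {'arraycommpath': ('1', '0')}
```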

    Install free failover software (EMC PowerPath SE)

    Failover software is required to maintain data availability during coordinated SP reboots (for example, when updating software on the CLARiiON storage system). Customers with single-HBA, switch-attached hosts can use PowerPath free of charge. This basic PowerPath functionality (PowerPath SE) is available on the CLARiiON Utility Kit CD, which ships with the storage system, as well as on Powerlink. A version is supplied for each operating system type in the environment.

    Connectivity configuration for high availability

    After ensuring that the host configuration complies with HA best practices, the next thing to consider is the connectivity infrastructure. How the host is physically connected to the storage system determines the protection level. Table 2 shows how different configurations offer different levels of protection.

    Table 2. Connectivity options

    [Diagrams, one per option: host HBA(s) connected to the array's SP A and SP B, either directly or through one or more switches]

    Protection | Low protection | Medium protection | High protection | Ultra protection
    Connectivity | Direct connect, no switch available | Single HBA with single switch | True high availability | Ultra HA, all components are redundant
    PowerPath | None (insufficient hardware for PowerPath) | PowerPath SE | PowerPath (full) | PowerPath (full)
    Single points of failure | HBA; SP; cable | HBA; switch; cable | None | None
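
    The "Single points of failure" row can be checked mechanically: model each configuration as a graph of components, fail one component at a time, and see whether the host can still reach the LUN. The Python sketch below does this with a breadth-first search. The topologies are assumptions loosely based on Table 2 (cables are modeled as components so they can fail, and both SPs connect to the LUN because failover software can trespass it); they are not EMC-supplied diagrams.

```python
from collections import deque

# Hypothetical topologies loosely following Table 2.
TOPOLOGIES = {
    "low": [("host", "hba0"), ("hba0", "cable0"), ("cable0", "SPA"),
            ("SPA", "LUN"), ("SPB", "LUN")],
    "medium": [("host", "hba0"), ("hba0", "cable0"), ("cable0", "switch0"),
               ("switch0", "cable1"), ("cable1", "SPA"),
               ("switch0", "cable2"), ("cable2", "SPB"),
               ("SPA", "LUN"), ("SPB", "LUN")],
    "high": [("host", "hba0"), ("hba0", "cable0"), ("cable0", "switch0"),
             ("switch0", "cable1"), ("cable1", "SPA"),
             ("host", "hba1"), ("hba1", "cable2"), ("cable2", "switch1"),
             ("switch1", "cable3"), ("cable3", "SPB"),
             ("SPA", "LUN"), ("SPB", "LUN")],
}


def reachable(edges, failed, src="host", dst="LUN"):
    """Breadth-first search from src to dst that skips the failed component."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in graph.get(node, ()):
            if nxt != failed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(edges):
    """Components whose individual failure leaves the host with no path to the LUN."""
    components = {n for edge in edges for n in edge} - {"host", "LUN"}
    return sorted(c for c in components if not reachable(edges, failed=c))


for name, edges in TOPOLOGIES.items():
    print(name, single_points_of_failure(edges))
# low ['SPA', 'cable0', 'hba0']
# medium ['cable0', 'hba0', 'switch0']
# high []
```

    The results agree with Table 2: HBA, SP, and cable for low protection; HBA, switch, and cable for medium protection; and none once every component on the path is duplicated.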

    The following sections describe the protection options shown in Table 2.


    Low protection

    This is a very basic option: the host has a single HBA connected directly to a single SP. This configuration has multiple single points of failure; the failure of the HBA, cable, or SP makes data unavailable to host applications. Customers running mission-critical applications must avoid this configuration.

    Medium protection

    This option provides some protection. It uses PowerPath SE software, which EMC provides free of charge, and a single switch for host-to-storage connectivity. When configured properly, PowerPath SE running on the host provides basic LUN trespass functionality during operations such as nondisruptive upgrades (NDUs), as well as in the rare event of a single SP failure. This configuration still has multiple single points of failure; the failure of the HBA, cable, or switch makes data unavailable to host applications. Customers running mission-critical applications must avoid this configuration.

    High protection

    This configuration is recommended for all business-critical applications, provided that adequate measures have been taken at the host and application level. It uses full-featured PowerPath software. The host has dual HBAs, so there is a redundant path to each SP and no single point of failure. Data remains available if an HBA, cable, or SP fails. Because there is only a single path per SP, this configuration does not provide any additional performance.

    Ultra protection

    This configuration is recommended for the highest level of protection, and it may also improve performance. It uses full-featured PowerPath software. The host has dual HBAs, so there is a redundant path to each SP and no single point of failure. Data remains available if an HBA, cable, or SP fails. Because there are multiple paths per SP, this configuration benefits from PowerPath's load-balancing feature and thus provides additional performance.
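
    To see why multiple paths per SP can add performance while a single path per SP cannot, consider spreading I/O across the live paths to the SP that owns a LUN. The Python sketch below uses plain round-robin as a stand-in; PowerPath's actual load-balancing policies are not described in this paper and are not reproduced here. Path names are hypothetical.

```python
from itertools import cycle


def round_robin(paths_to_owner: list, num_ios: int) -> dict:
    """Spread num_ios I/Os across the live paths to the owning SP; return per-path counts."""
    if not paths_to_owner:
        raise ValueError("no live path to the owning SP")
    counts = {p: 0 for p in paths_to_owner}
    rotation = cycle(paths_to_owner)
    for _ in range(num_ios):
        counts[next(rotation)] += 1
    return counts


# High protection: one path per SP, so all I/O to the LUN uses a single path.
print(round_robin(["hba0->SPA"], 1000))               # {'hba0->SPA': 1000}
# Ultra protection: both HBAs have a path to the owning SP and share the load.
print(round_robin(["hba0->SPA", "hba1->SPA"], 1000))  # {'hba0->SPA': 500, 'hba1->SPA': 500}
```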

    Switch vendor information

    Storage area network (SAN) switch information for a few switch types can be found at the following switch vendor websites; the EMC Support Matrix contains information about other switch types that EMC supports:

    Cisco storage networking website:

    http://www.cisco.com/en/US/products/hw/ps4159/index.html

    Brocade storage networking website:

    http://www.brocade.com/products/index.jsp

    Storage-side high availability

    Once HA is ensured on the host and connectivity side, the next area of focus is the storage system itself. For any mission-critical application environment, the storage system on which the data resides must have a highly available architecture. The CLARiiON storage system offers an N+1 redundant architecture, which protects data against any single component failure. These components are discussed in the next section.


    Storage-system components

    CLARiiON storage systems are designed for high availability. With redundant components such as dual SPs, dual back-end loops, and dual-ported disk drives, CLARiiON storage systems can ride through multiple component failure scenarios. Other features that make CLARiiON storage systems resilient in the face of various failure types include triple protection of the storage system database (also referred to as PSM, the Persistent Storage Manager) and of the FLARE database; among other things, these features are designed to keep customer data available. They are available out of the box, and no additional configuration is required at installation time. However, customers must choose how the disks are bound. The next section discusses best practices for RAID configuration.

    RAID configuration

    To access the data on the disk drives, the disks must be bound into a RAID group. A CLARiiON storage system supports several RAID configurations. Customers also have the option of using individual disks; however, this is rarely done, because individual disks do not benefit from the redundancy of a RAID-protected configuration.

    CLARiiON offers RAID 1, 1/0, 3, and 5 for data protection. All of these configurations protect against a single-disk failure. With RAID 1/0 (mirrored stripes), multiple disk failures can be tolerated in some cases, as long as no two failures occur within the same mirrored pair. RAID 0 offers a high-performance configuration but no protection.
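
    The trade-off between protection and the effective utilization of raw disk space follows from how each RAID type stores redundancy. The Python sketch below uses the textbook capacity formulas (RAID 0 uses every disk, RAID 1 and RAID 1/0 keep half, RAID 3 and RAID 5 give up one disk's worth of space to parity). It ignores vault space, formatting overhead, and any other deductions a real CLARiiON system makes, and the 146 GB disk size is only an example.

```python
def usable_capacity_gb(raid_type: str, disks: int, disk_gb: float) -> float:
    """Approximate usable capacity of a RAID group built from identical disks.

    Uses textbook formulas and ignores vault space, formatting overhead, and
    other real-world deductions.
    """
    if raid_type == "0":
        return disks * disk_gb            # striping only: all raw space, no protection
    if raid_type in ("1", "1/0"):
        if disks < 2 or disks % 2:
            raise ValueError("mirrored RAID needs an even number of disks")
        return disks // 2 * disk_gb       # half the disks hold mirror copies
    if raid_type in ("3", "5"):
        if disks < 3:
            raise ValueError("parity RAID needs at least three disks")
        return (disks - 1) * disk_gb      # one disk's worth of space holds parity
    raise ValueError(f"unsupported RAID type: {raid_type!r}")


RAW_GB = 146.0  # example disk size only
for raid, n in [("0", 8), ("1/0", 8), ("5", 8), ("5", 3)]:
    usable = usable_capacity_gb(raid, n, RAW_GB)
    print(f"RAID {raid} with {n} disks: {usable:.0f} GB usable, {usable / (n * RAW_GB):.0%} of raw")
```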

    Table 3 shows examples of various protection levels as they relate to resilience in case of disk failure and

    the effective utilization of the raw disk space.

    Table 3. Storage system RAID configuration options

    Low protection

    Disks configured as RAID 0 offer no protection against a single disk failure. The only reason to select this RAID type is the higher performance of the striped writes that RAID 0 offers. This configuration may be used for temporary files or other data that needs fast access. If a disk fails, the information in the RAID 0 group is unrecoverable, so this RAID configuration must be used with great caution.

    Medium protection

    Disks configured as RAID 3 or RAID 5 with eight or more disks provide protection against a single-disk failure within the same RAID group. When a single disk in the RAID group fails, a hot spare (discussed under "Hot sparing policy") is invoked, and the data that was on the failed disk is rebuilt onto it. When the failed disk is replaced, the data from the hot spare is copied to the replacement drive. Once the rebuild process completes, the RAID group is ready to withstand another disk failure.
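
    The rebuild from "data and parity" relies on a simple property: the parity block of a stripe is the XOR of its data blocks, so any one missing block equals the XOR of all the surviving blocks. The Python sketch below shows this for a single stripe of a five-disk group. It is the textbook scheme behind RAID 3 and RAID 5; parity rotation across disks and FLARE's actual on-disk layout are not modeled, and the block contents are arbitrary.

```python
from functools import reduce


def xor_blocks(blocks: list) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: four data blocks plus one parity block (a five-disk group).
data = [b"\x10\x20", b"\x0f\x01", b"\xa0\x0a", b"\x33\x44"]
parity = xor_blocks(data)

# The disk holding data[2] fails; rebuild its block from the survivors and the parity.
survivors = data[:2] + data[3:]
rebuilt = xor_blocks(survivors + [parity])
assert rebuilt == data[2]
```

    Every surviving disk in the group must be read to rebuild each block, which is one reason larger parity groups take longer to rebuild.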


    High protection

    To make the RAID group less vulnerable to a double-disk failure, fewer disks can be used in the RAID group, which reduces the chance that two disks in the same group fail. Medium protection can be upgraded to high protection by reducing the number of disks in the RAID 3 or RAID 5 group to three.
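
    A rough way to see why fewer disks help: once one disk has failed, the group is exposed until the rebuild finishes, and each surviving disk is another chance for a second failure. The Python sketch below assumes independent failures and a single, purely illustrative per-disk probability of failing during the rebuild window; it is not a reliability model for any real drive, and it ignores the shorter rebuild time of smaller groups, which lowers the risk further.

```python
def p_second_failure(disks_in_group: int, p_disk: float) -> float:
    """Probability that at least one surviving disk fails during the rebuild window.

    Assumes independent failures and that each surviving disk fails within the
    window with the same probability p_disk (an illustrative input, not a
    measured figure).
    """
    survivors = disks_in_group - 1
    return 1 - (1 - p_disk) ** survivors


P_DISK = 0.001  # hypothetical per-disk probability for one rebuild window
for n in (3, 5, 9):
    print(f"{n}-disk group: {p_second_failure(n, P_DISK):.4%}")
```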

    Ultra protection

    In certain environments, data protection and application performance are equally important requirements (especially in transaction-processing environments such as database applications). To meet both objectives, customers can choose a RAID 1/0 configuration, which offers high performance and additional protection against disk failure. In a RAID 1/0 configuration, a double-disk fault can be tolerated as long as no more than one disk fails within each mirrored pair.
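
    The RAID 1/0 survival rule is easy to state as a check: the group survives as long as no mirrored pair has lost both of its disks. The Python sketch below encodes only that rule; the disk IDs and pairings are hypothetical.

```python
def raid10_survives(pairs: list, failed: set) -> bool:
    """True if no mirrored pair has lost both of its disks."""
    return all(not (a in failed and b in failed) for a, b in pairs)


pairs = [("d0", "d1"), ("d2", "d3"), ("d4", "d5")]  # hypothetical disk IDs and pairing
print(raid10_survives(pairs, {"d0", "d2"}))  # True: the failures are in different pairs
print(raid10_survives(pairs, {"d2", "d3"}))  # False: both halves of one pair failed
```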

    Hot sparing policy

    Hot spares are spare disks that are pre-allocated at configuration time. A hot spare is invoked in the event of a disk failure. Once it is invoked, the data that resided on the failed disk is rebuilt onto the hot spare, either from the surviving disk of the mirrored pair (RAID 1 and RAID 1/0) or from the data and parity on the other drives (RAID 3 and RAID 5). When the failed disk is replaced, the replacement disk is equalized with the contents of the hot spare. After equalization completes, the hot spare returns to its default role, ready for any future disk failure.

    Number of hot spares

    There should be at least one hot spare disk for every 30 drives on the CLARiiON storage system; customers may configure more. For ease of management, it is also recommended that the hot spare be configured in the last drive slot of a disk-array enclosure (shelf). However, a hot spare may be configured anywhere in the system except on the vault drives, which are used for the cache vault and certain other internal purposes. On the CX series, the vault drives are the first five drives; their location varies on prior generations of products.
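
    The count and placement guidelines above can be checked with a short script. In the Python sketch below, drive locations are (bus, enclosure, slot) tuples, and the CX vault drives ("the first five drives") are assumed to be bus 0, enclosure 0, slots 0 through 4; that numbering is an assumption, as is the example location (0, 1, 14) for the last slot of a 15-slot enclosure. The sketch also assumes that the 1-per-30 guideline counts every drive in the system, since the guideline does not say whether spares are included.

```python
import math

# Assumption: the CX vault drives are bus 0, enclosure 0, slots 0-4.
# Locations are (bus, enclosure, slot) tuples.
VAULT_DRIVES = {(0, 0, slot) for slot in range(5)}


def min_hot_spares(total_drives: int) -> int:
    """At least one hot spare for every 30 drives, rounded up."""
    return math.ceil(total_drives / 30)


def hot_spare_problems(total_drives: int, spares: list) -> list:
    """List any violations of the hot spare count and placement guidelines."""
    problems = []
    needed = min_hot_spares(total_drives)
    if len(spares) < needed:
        problems.append(f"{len(spares)} hot spare(s) configured; at least {needed} recommended")
    for location in spares:
        if location in VAULT_DRIVES:
            problems.append(f"hot spare at {location} is a vault drive")
    return problems


# 75 drives call for at least 3 spares, and (0, 0, 3) is a vault drive.
print(hot_spare_problems(75, [(0, 1, 14), (0, 0, 3)]))
```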

    Sizing the hot spare disk

    When planning global hot spares on the system, choose disks that are as large as or larger than the drives they may be required to replace in the event of a disk failure.
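
    The sizing rule means that a spare smaller than the failed drive cannot be used. The Python sketch below applies that rule and, among the spares that qualify, picks the smallest one so that larger spares remain available for larger drives. The "smallest sufficient spare" policy is an assumption made for illustration, not a description of how FLARE selects a hot spare.

```python
from typing import Optional


def pick_hot_spare(failed_gb: float, spare_sizes_gb: list) -> Optional[float]:
    """Return the smallest available spare at least as large as the failed drive, or None."""
    candidates = [size for size in spare_sizes_gb if size >= failed_gb]
    return min(candidates) if candidates else None


print(pick_hot_spare(146, [73, 146, 300]))  # 146
print(pick_hot_spare(300, [73, 146]))       # None: no spare is large enough
```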

    Disk replacement

    If a disk drive fails and needs to be replaced, follow EMC's recommended procedure for disk replacement and take all the necessary precautions, including proper drive handling. If in doubt, wait for EMC support personnel. In the rare event of multiple drive failures, do not replace the drives yourself; instead, wait for trained EMC support personnel to arrive.

    Rebuild-time considerations

    When a disk drive fails and a hot spare is invoked, the rebuild process starts. During this process, the data that resided on the failed disk is rebuilt from the available redundant components. The time it takes to rebuild a failed disk depends on several factors, which are discussed next.

    System load

    During the rebuild process the system has to do a significant amount of work to read the data from theredundant components of the RAID group to rebuild data. In the case of mirrored RAID groups (RAID 1

    and RAID 1/0), the process involves reading data from the mirrored pair. In case of parity RAID groups

    (RAID 3 and RAID 5) data is reconstructed by reading the data and parity from the available disk drives in

    EMC CLARiiON High Availability (HA)Best Practices Planning 11

  • 8/12/2019 EMC ClarIIon High Availability

    12/19

    the RAID group. Therefore, if the system is idle, the rebuild process will be faster since all the system

    resources are available to the rebuild process. If the system is under heavy application workload, the

    rebuild process can take relatively longer to complete.
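The parity reconstruction described above relies on the XOR property of RAID 3 and RAID 5 parity: the parity block of a stripe is the XOR of its data blocks, so any one missing block equals the XOR of all the surviving blocks. The minimal Python sketch below is a generic illustration, not FLARE code. It shows why every surviving disk in a parity RAID group must be read for each stripe, which is why competing application I/O slows the rebuild.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one parity block (RAID 5 style)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild_member(stripe, failed_index):
    """Recreate a failed member's block by reading every surviving member."""
    survivors = [block for i, block in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)
```

For example, a four-data-plus-parity stripe built with `make_stripe` can recover any single block, data or parity, from the other four. A mirrored rebuild, by contrast, simply copies the surviving mirror.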

    Rebuild priority

    Besides system load, the rebuild process is controlled by the rebuild priority set for the LUNs within the RAID group. The available rebuild priority settings are:

    Low

    Medium

    Fast

    ASAP

    By default the priority is set to ASAP. In most cases, the default is the recommended rebuild priority.

    Size of the RAID group

    The larger the RAID group (that is, the more disks it contains), the longer the rebuild takes after a single disk failure within the group. Keeping RAID groups smaller reduces rebuild time.

    Size and type of disk drive

    The size and type of the disk itself can affect the rebuild time. For example, a five-disk RAID 5 group

    composed of 36 GB 15,000 rpm disk drives will rebuild faster than a nine-disk RAID 5 group

    composed of 250 GB 5,400 rpm disk drives.
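A back-of-the-envelope calculation helps explain these effects. Under the simplifying assumptions that the entire failed disk is rebuilt and that a parity rebuild reads every surviving member in full, the Python sketch below estimates how much data is read and written. It ignores drive speed, so the real gap between the 15,000 rpm and 5,400 rpm examples would be wider still.

```python
def rebuild_io_gb(raid_type, disks_in_group, disk_capacity_gb):
    """Approximate (GB read, GB written) to rebuild one failed disk.

    Assumptions: the whole disk is rebuilt, not just the used capacity,
    and a parity rebuild reads every surviving member in full.
    """
    if raid_type in ("RAID 1", "RAID 1/0"):
        reads = disk_capacity_gb  # copy from the surviving mirror only
    elif raid_type in ("RAID 3", "RAID 5"):
        reads = (disks_in_group - 1) * disk_capacity_gb  # all survivors
    else:
        raise ValueError(f"unsupported RAID type: {raid_type}")
    writes = disk_capacity_gb  # the hot spare receives one full disk
    return reads, writes
```

For the two RAID 5 examples in the text, `rebuild_io_gb("RAID 5", 5, 36)` returns `(144, 36)` and `rebuild_io_gb("RAID 5", 9, 250)` returns `(2000, 250)`, so roughly 14 times as much data must be read to rebuild the larger group.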

    Table 4 shows examples of various RAID group configurations under different system loads, rebuild

    priorities, RAID groups, and disk sizes in order to show the risk associated with a double-disk failure

    scenario. These are only examples and do not presume to show the best configuration.


    Table 4. RAID protection options and rebuilds

    Clustering and replication

    There is another dimension to high availability beyond local protection. Besides host, connectivity, and storage high availability, protection against server hardware failure and storage-system failure should be considered. These events are very rare, but if they happen they may cause disruption.

    Clustering: Protecting against loss of the primary application server

    Various clustering software products protect the production environment against server failure. The most popular products include the following:

    Microsoft Cluster Server

    VERITAS Cluster Server

    Sun Cluster Server

    Please check the EMC Support Matrix for supported cluster software.

    Data mirroring: Protecting against loss of the primary storage system

    To protect the production site against loss of access to the primary storage system (for example, due to power failure), customers can mirror data from the primary site to a disaster recovery site. Depending on the recovery point objective (RPO) and recovery time objective (RTO), different options are available. In cases where customers cannot afford to lose even a single transaction, they can use a

    synchronous mirroring product such as MirrorView/Synchronous. In circumstances where the disaster

    recovery site must be hundreds or even thousands of miles from the production site, some type of

    asynchronous replication application (such as RecoverPoint or MirrorView/Asynchronous) may be used.

    In almost all of these cases, the customer will benefit from the use of both clustering and remote mirroring.
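The difference between synchronous and asynchronous mirroring in terms of RPO can be illustrated with a toy model. The Python class below is a conceptual sketch only; it does not model how MirrorView/S, MirrorView/A, or RecoverPoint actually transfer data. A synchronous write is acknowledged only after the remote copy has it, so no acknowledged write is lost if the primary fails; an asynchronous write is acknowledged locally and shipped later, so writes still queued at the moment of failure are lost.

```python
from collections import deque


class MirroredVolume:
    """Toy model of a primary volume mirrored to a disaster recovery site."""

    def __init__(self, mode):
        if mode not in ("sync", "async"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.acknowledged = []  # writes the application was told succeeded
        self.remote = []        # writes already present at the DR site
        self.pending = deque()  # async only: writes not yet transferred

    def write(self, txn):
        if self.mode == "sync":
            self.remote.append(txn)   # remote commit happens before the ack
        else:
            self.pending.append(txn)  # shipped later by replicate()
        self.acknowledged.append(txn)

    def replicate(self, max_items):
        """Async only: transfer up to max_items queued writes to the DR site."""
        for _ in range(min(max_items, len(self.pending))):
            self.remote.append(self.pending.popleft())

    def lost_on_primary_failure(self):
        """Number of acknowledged writes the DR copy does not have."""
        return len(self.acknowledged) - len(self.remote)
```

With ten writes issued to each volume and only seven replicated on the asynchronous one, the synchronous volume reports 0 lost writes and the asynchronous volume reports 3. The trade-off is that synchronous acknowledgment adds the round trip to the remote site to every write, which is why distance pushes designs toward asynchronous replication.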

    Validation and maintenance

    A significant amount of time is often invested in preparing and configuring a new environment for high availability. Because of this investment, it is important to validate the environment after configuration to

    ensure it was implemented properly for high availability. If you have installed a new storage system,

    attached a new host, or are about to perform other ongoing maintenance procedures (such as updating the

    software on the CLARiiON storage system), it is imperative that you test the HA configuration to ensure

    that data availability is maintained in the event of, for example, a path failure. It is also imperative that


    failover testing occur regularly within the environment to protect against any inadvertent changes that may

    have broken the high availability of the configuration.

    HA validation

    There are two important pieces to testing the high availability of your environment. After installing a storage system or attaching a new host, perform a physical test to ensure that failover occurs as expected. Then, once you are in production, perform periodic health checks to validate the availability of the environment and to make sure that nothing unexpected has changed within it (for example, a zone inadvertently changed, or failovermode adjusted on the wrong initiator). The next two

    sections discuss how to check for failover after installation and how to perform ongoing periodic

    verification.

    Initial host failover testing

    After the host environment has been successfully configured for failover (including the installation of failover software such as EMC PowerPath, HBAs, and so forth), the next and most important step is testing. While the environment is still in the deployment stage, induce a failure. For example, pull a fibre cable that connects the host HBA to the storage system or switch, and ensure that the host path management software fails the application LUNs over to the alternate path. For failover to occur, there must be active I/O at the host level.

    Following a successful failover, test the failback capability when the fault condition is cleared. In this

    example, reconnect the fibre cable and see if the host I/O fails back to the default path. For more

    information about EMC PowerPath software failover and failback features, refer to the PowerPath-related documentation on Powerlink.
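Because failover is triggered only by I/O, it helps to keep a steady, verifiable load running while the cable is pulled and later reconnected. The following Python sketch is one way to do that; it is not an EMC tool, and the target file is an assumption: any file on a file system that resides on the LUN under test (the path `/mnt/lun01/failover_test.bin` in the comment is hypothetical). It repeatedly writes a block, forces it to disk with `os.fsync`, reads it back, and records errors and the longest stall so you can see whether I/O paused or failed during failover and failback.

```python
import os
import sys
import time


def run_io_during_failover_test(test_file, duration_s, block_size=4096, stall_threshold_s=5.0):
    """Drive continuous write/fsync/read I/O against test_file for duration_s seconds.

    Returns a summary dict with the number of completed iterations, any
    errors, the longest iteration in seconds, and how many iterations took
    longer than stall_threshold_s (a likely sign of a path switch).
    """
    payload = os.urandom(block_size)
    completed, errors, stalls, longest = 0, [], 0, 0.0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        start = time.monotonic()
        try:
            with open(test_file, "wb") as f:
                f.write(payload)
                f.flush()
                os.fsync(f.fileno())  # force the write through to the LUN
            # The read-back may be served from the host page cache; the
            # fsync'd write is what actually exercises the storage path.
            with open(test_file, "rb") as f:
                if f.read() != payload:
                    errors.append("data mismatch on read-back")
            completed += 1
        except OSError as exc:
            errors.append(f"{time.strftime('%H:%M:%S')} {exc}")
            time.sleep(0.5)  # avoid spinning while all paths are down
        elapsed = time.monotonic() - start
        longest = max(longest, elapsed)
        if elapsed > stall_threshold_s:
            stalls += 1
    return {"completed": completed, "errors": errors,
            "longest_s": round(longest, 3), "stalls": stalls}


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Example (hypothetical path): python io_load.py /mnt/lun01/failover_test.bin 300
        seconds = float(sys.argv[2]) if len(sys.argv) > 2 else 300.0
        print(run_io_during_failover_test(sys.argv[1], seconds))
    else:
        print("usage: io_load.py TEST_FILE [SECONDS]")
```

Start the script, pull the cable, wait for I/O to resume, reconnect the cable, and then check the summary. A pause while the path manager switches paths may be expected; errors reaching the application indicate a problem with the HA configuration.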

    After physically validating failover, the HAVT utility should also be run to ensure that there are no other

    HA issues in the environment. This is discussed in the following section.

    Ongoing high availability verification

    After manually testing failover, each server's high availability should be verified regularly to ensure that nothing has changed in the HA configuration. HA verification should also be done before a software update on the CLARiiON storage system. Because an update of the FLARE OE software or the installation of a new software enabler reboots each SP in turn, it is important to ensure that each host that is to remain online during the update can ride through this reboot while maintaining access to the data

    on the system. Maintaining access during an update means that, at a minimum, each server is zoned to each

    SP and PowerPath SE is installed with the proper failovermode settings applied.
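The minimum requirement above (each server zoned to each SP, backed by redundant HBAs) can be expressed as a simple check over a host's path list. The Python sketch below is a simplified, HAVT-like check of our own devising, not HAVT's actual logic; how you gather the (HBA, SP) pairs, for example from your path management software's display output, is left as an assumption.

```python
def check_path_redundancy(paths):
    """Check one host's zoned paths to a CLARiiON storage system.

    paths: iterable of (hba_name, sp_name) pairs, with sp_name "SP A" or "SP B".
    Returns a list of problems; an empty list means the host can reach both
    SPs and has more than one HBA.
    """
    paths = list(paths)
    problems = []
    reachable_sps = {sp for _, sp in paths}
    for sp in ("SP A", "SP B"):
        if sp not in reachable_sps:
            problems.append(
                f"no path to {sp}: host cannot ride through a reboot of the other SP"
            )
    hbas = {hba for hba, _ in paths}
    if len(hbas) < 2:
        problems.append("fewer than two HBAs: a single HBA or cable failure removes all access")
    return problems
```

A host with `hba0` zoned to SP A and `hba1` zoned to SP B passes; a host whose only HBA reaches both SPs, or whose two HBAs reach only SP A, is flagged.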

    As described in the Prerequisite for a highly available host environment section, HAVT is a tool that is

    used when upgrading arrays, and is available in the GUI and text-based versions of Navisphere Server Utility (starting with release 24) and with the Navisphere Service Taskbar Software Assistant. HAVT

    allows you to validate CLARiiON-attached hosts for high availability. Select the Verify Server High Availability option as shown in Figure 1, and indicate whether this check is part of a Software Update, in

    which case the result of the report is sent to the storage system so that the software update process can

    validate the servers will ride through the update; or whether it is a host attach validation (regular health

    check). HAVT displays results that show whether the server meets HA requirements and allows you to

    view the Navisphere Server High Availability Report. In this scenario, the important tab to note is the Issues tab, as shown in Figure 3.

    Figure 3. Navisphere Server Utility High Availability verification issues report


    Issues are generated based on a series of checks performed by the HAVT utility. These checks include

    looking for redundant HBAs, ensuring path management software is installed, and validating that the

    proper initiator settings (such as failovermode) are set on the storage system. As of release 24, HAVT

    supports the following operating systems (refer to the EMC Navisphere Host Agent/CLI and Utilities Release Notes on Powerlink for the latest support information):

    Solaris 8, 9

    HP-UX 11.0, 11.11, 11.23 IA 64, 11.23 PA RISC

    Windows 2000, 2003 (Fibre Channel and iSCSI attaches)

    AIX 5.2, 5.3

    Red Hat Enterprise 3 and 4 (Fibre Channel and iSCSI attaches)

    SuSE Enterprise Server 8 and 9 (Fibre Channel and iSCSI attaches)

    AsianUX (2.0) (Fibre Channel and iSCSI attaches)

    HAVT supports the following failover software:

    PowerPath

    VERITAS DMP

    HP-UX PVLinks

    The Issues summary lists all the critical errors and warnings discovered in the host configuration with regard to high availability, and provides corrective actions for each error and warning. The report also includes a Details tab, which provides further information on:

    Server Status: Includes the version of the failover software and information about the FC HBAs, iSCSI HBAs, and NICs, as well as details on the devices (similar to the information displayed by a

    powermt display dev=all command issued from the PowerPath command line).

    Initiators: Includes HBA and NIC configuration details, including driver and registry settings, iSCSI host IQN, persistent targets, established sessions, and (CHAP/mutual CHAP) security information for iSCSI initiators.

    Data Connection Report: Gives information about the failover mode, arraycommpath, initiator type, and registration for each HBA connected to a device.

    Software, Services and System Updates: Server-specific installed software features, including EMC-specific software and OS patches.

    HAVT may be run in the following scenarios:

    As part of the Prepare for Installation step of the Software Assistant, a process that helps customers and service personnel upgrade the storage system. In this case, run the HAVT utility and analyze each host report to preempt potential data unavailability issues that could occur during the SP reboots that are part of the software install.

    Any time you need to check for host-related issues.

    Periodically. Using a script, you can run HAVT periodically to generate a report for analysis.
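One way to automate the periodic run is a small wrapper, scheduled with cron or Windows Task Scheduler, that executes the check and archives each report with a timestamp. The Python sketch below deliberately takes the command as a parameter: the exact command line for the text-based Navisphere Server Utility is not given here and should be taken from its documentation. The report file naming is our own choice.

```python
import datetime
import pathlib
import subprocess
import sys


def run_and_archive(command, report_dir):
    """Run a host HA check command once and save its output with a timestamp.

    command: list of program arguments (the real Server Utility invocation
    is deliberately not assumed here). Returns (report_path, exit_code).
    """
    report_dir = pathlib.Path(report_dir)
    report_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    result = subprocess.run(command, capture_output=True, text=True)
    report_path = report_dir / f"ha-report-{stamp}.txt"
    report_path.write_text(
        f"command: {' '.join(command)}\n"
        f"exit code: {result.returncode}\n\n"
        f"{result.stdout}{result.stderr}"
    )
    return report_path, result.returncode


if __name__ == "__main__":
    if len(sys.argv) >= 3:
        path, code = run_and_archive(sys.argv[2:], sys.argv[1])
        print(f"saved {path} (exit code {code})")
    else:
        print("usage: ha_report.py REPORT_DIR COMMAND [ARGS...]")
```

Keeping every report, rather than only the latest, makes it easier to spot when a configuration drifted between two checks.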

    HAVT and its resulting report are important tools to help avoid potential data unavailability issues caused

    by improperly attached hosts. Improper server configurations (meaning HA was not properly implemented) are the No. 1 issue identified in weekly analyses of data unavailability reports. HAVT has

    been designed to help customers avoid these issues by providing a utility to validate the environment

    during critical maintenance procedures, such as a new server attach, new storage system installation, or CLARiiON software update.

    Change control process

    A live production environment is very likely to change due to business requirements and other factors. To ensure that the production environment maintains its resilience and remains highly available, refer to the E-Lab Interoperability Navigator before making any changes. This includes changing any of the following:


    Storage system software: This is usually the process of upgrading the FLARE operating environment on the CLARiiON storage system.

    HBA firmware: HBA vendors' websites contain the latest information on HBA driver firmware, fcode, and other software. Refer to the following URLs for more details on Emulex and QLogic HBAs:

    Emulex drivers and installation docs with HBA settings at:

    http://www.emulex.com/ts/docoem/emc/index.html

    QLogic drivers and installation docs with HBA settings at:

    http://www.qlogic.com/support/oem_emc.asp

    Switch firmware

    Operating system patches and hot fixes

    Path management software: EMC PowerPath, HP PVLinks, VERITAS DMP, and so on

    Conclusion

    This white paper offers best practices for end-to-end high availability in mission-critical and business-critical production environments. The high availability design starts at the host, continues with connectivity, and ends with the storage system. By adding clustering technology and combining it with remote mirroring, customers can make their production environments highly available. However, a highly available design comes with a higher price tag, because it requires redundant hardware components to guard against possible failures. Building a highly available environment therefore requires careful identification of the application environments that need high protection when it comes to host components such as HBAs, use of failover

    software such as PowerPath, use of redundant switch fabrics, and RAID protection at the storage-system

    level. HAVT is an invaluable tool for ensuring the ongoing health of your highly available environment.

    Appendix A: Failover mode settings

    These settings are not applicable to the CDL series.

    The following tables list the Initiator Type, Arraycommpath, and Failovermode settings for failover software on CLARiiON-supported operating systems. They also indicate which failover software is supported on each operating system. The following notes aid in appropriate use of these tables:

    1. Initiator Type is referred to as systemtype if using NaviCLI rather than the failover wizard within Navisphere Express.

    2. The settings below are those set within the Navisphere Manager failover wizard or, in the case of the UnitSerialNumber variable, the Group Edit option in the Connectivity Status dialog box.

    3. Parentheses identify the NaviCLI equivalent value.

    Table 5. AIX failover mode settings

    Parameter | PowerPath | DMP (AIX 5.1 and 5.2 only)
    Initiator Type | CLARiiON Open (3) | CLARiiON Open (3)
    Arraycommpath | Enabled (1) or Disabled (0) [1] | Enabled (1)
    Failovermode | 3 or 1 [2] | 2
    UnitSerialNumber | Array | Array

    [1] AIX settings depend on the CLARiiON software being used:
    If using ODM definitions, arraycommpath should be set to Enabled (1).
    If using CLArrayS3 software, arraycommpath should be set to Disabled (0).

    [2] Set failovermode to 3 for PowerPath 4.5.1 or later. NDU primus case emc67186 has the complete NDU requirements.
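The two AIX notes amount to a small decision rule. The Python sketch below encodes the PowerPath column of the AIX table under one stated assumption: that failovermode 1 applies to PowerPath releases earlier than 4.5.1, since the note only specifies the value for 4.5.1 and later. Versions are passed as three-part tuples so that Python's tuple comparison orders them correctly; the function name and return format are our own.

```python
def aix_powerpath_settings(clariion_software, powerpath_version):
    """Initiator settings for AIX with PowerPath, following the AIX table notes.

    clariion_software: "ODM" (ODM definitions) or "CLArrayS3".
    powerpath_version: three-part tuple such as (4, 5, 1).
    """
    if clariion_software == "ODM":
        arraycommpath = 1  # Enabled
    elif clariion_software == "CLArrayS3":
        arraycommpath = 0  # Disabled
    else:
        raise ValueError(f"unknown CLARiiON software: {clariion_software}")
    # Assumption: releases before 4.5.1 use failovermode 1.
    failovermode = 3 if powerpath_version >= (4, 5, 1) else 1
    return {
        "initiator_type": 3,  # CLARiiON Open
        "arraycommpath": arraycommpath,
        "failovermode": failovermode,
        "unit_serial_number": "Array",
    }
```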


    Table 6. HP-UX failover mode settings

    Parameter | PVLinks | PowerPath | DMP (HP-UX 11i only) | No path management software
    Initiator Type (Access Logix) | HP Auto Trespass (2) [1] | HP No Auto Trespass (hex A) | HP No Auto Trespass (hex A) | HP No Auto Trespass (hex A)
    Initiator Type (non-Access Logix) | HP Auto Trespass (2) [1] | decimal 10 (hex A) | N/A | HP No Auto Trespass (hex A)
    Arraycommpath | Enabled (1) or Disabled (0) [2] | Enabled (1) | Enabled (1) | 0 (Disabled) or 1 (Enabled) [2]
    Failovermode | 0 | 1 | 2 | 0
    UnitSerialNumber | LUN or Array [3] | LUN or Array [3] | LUN or Array [3] | LUN or Array [3]

    [1] HP PVLinks requires that AutoTrespass be set for all LUNs. To set AutoTrespass, edit the Navisphere Host Agent configuration file (agent.conf) by commenting out OptionsSupported Autotrespass, and then restart the Host Agent.

    [2] For HP-UX running PVLinks, arraycommpath can be Enabled (1) or Disabled (0). Either will work.

    [3] For HP-UX 11i v1.0, UnitSerialNumber may have been changed to LUN if problems were experienced with the device display in the HP-UX SAM utility.
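The agent.conf edit described in the first HP-UX note can be scripted so that it is repeatable across hosts. The Python sketch below is an illustration built on several assumptions: it takes the configuration file path as an argument (the location of the file varies by platform and is not assumed here), matches the OptionsSupported AutoTrespass line case-insensitively, and keeps a .bak copy. It does not restart the Host Agent, which must still be done afterwards.

```python
import re
import shutil
from pathlib import Path

# Matches an active (not yet commented) OptionsSupported AutoTrespass line.
AUTOTRESPASS_LINE = re.compile(r"^\s*OptionsSupported\s+AutoTrespass\b", re.IGNORECASE)


def comment_out_autotrespass(conf_path):
    """Comment out OptionsSupported AutoTrespass line(s); return how many changed.

    A backup is written next to the file with a .bak suffix before editing.
    Lines that are already commented out are left alone, so reruns are safe.
    """
    conf = Path(conf_path)
    shutil.copy2(conf, conf.with_name(conf.name + ".bak"))
    lines = conf.read_text().splitlines(keepends=True)
    changed = 0
    for i, line in enumerate(lines):
        if AUTOTRESPASS_LINE.match(line):
            lines[i] = "#" + line
            changed += 1
    conf.write_text("".join(lines))
    return changed
```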

    Table 7. Linux failover mode settings

    Parameter | PowerPath | DMP | MPIO
    Initiator Type | CLARiiON Open (3) | CLARiiON Open (3) | CLARiiON Open (3)
    Arraycommpath | Enabled (1) | Enabled (1) | Enabled (1)
    Failovermode | 1 | 2 | 1
    UnitSerialNumber | Array | Array | Array

    Table 8. NetWare failover mode settings [1]

    Parameter | PowerPath
    Initiator Type | CLARiiON Open (3)
    Arraycommpath | Enabled (1)
    Failovermode | 1
    UnitSerialNumber | Array

    [1] Validate support for NetWare on the newer CLARiiON models via the E-Lab Interoperability Navigator (available on Powerlink, EMC's password-protected extranet for customers and partners). An RPQ may be required.


    Table 9. Solaris failover mode settings

    Parameter | PowerPath | DMP | StorEdge Traffic Manager
    Initiator Type | CLARiiON Open (3) | CLARiiON Open (3) | CLARiiON Open (3)
    Arraycommpath | Enabled (1) | Enabled (1) | Enabled (1)
    Failovermode | 1 | 2 | 1
    UnitSerialNumber* | Array or LUN | Array or LUN | Array or LUN

    *Sun Solaris installations with PowerPath and DMP:

    Solaris 2.6, 7, and 8: UnitSerialNumber should be set to LUN.

    Solaris 9: Will work with UnitSerialNumber set to either Storage System or LUN.

    Table 10. Tru64 failover mode settings

    Parameter | Native failover
    Initiator Type | Compaq/Tru64 (hex 1C)
    Arraycommpath | Enabled (1)
    Failovermode | 0
    UnitSerialNumber | Array

    Table 11. VMware failover mode settings

    Parameter | Native
    Initiator Type | CLARiiON Open (3)
    Arraycommpath | Enabled (1)
    Failovermode | 1
    UnitSerialNumber | Array

    Table 12. Windows failover mode settings

    Parameter | PowerPath | DMP (Windows 2000 and Windows 2003 only)
    Initiator Type | CLARiiON Open (3) | CLARiiON Open (3)
    Arraycommpath | Enabled (1) | Enabled (1)
    Failovermode | 1 | 1
    UnitSerialNumber | Array | Array

    When you change any one of the Arraycommpath, Failovermode, and Initiator Type settings, all of these settings are reset to the storage-system defaults, so you must set not only the setting being changed but all of the initiator settings. This applies to both methods of changing these parameters (Group Edit in Connectivity Status and the failover wizard).
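Because a change to any one of these three settings resets the others, it is safest to always submit the complete set. The Python sketch below transcribes the Initiator Type, Arraycommpath, and Failovermode values for Linux, NetWare, Solaris, Tru64, VMware, and Windows from the tables above (the AIX and HP-UX cases, whose values depend on footnoted conditions, are omitted). It returns both the full set to apply and the values that differ from what is currently registered. The dictionary layout and function name are our own; UnitSerialNumber is set separately through Group Edit and is not included.

```python
# (initiator_type, arraycommpath, failovermode) per (OS, failover software).
# 3 = CLARiiON Open; 0x1C = Compaq/Tru64.
RECOMMENDED = {
    ("Linux", "PowerPath"): (3, 1, 1),
    ("Linux", "DMP"): (3, 1, 2),
    ("Linux", "MPIO"): (3, 1, 1),
    ("NetWare", "PowerPath"): (3, 1, 1),
    ("Solaris", "PowerPath"): (3, 1, 1),
    ("Solaris", "DMP"): (3, 1, 2),
    ("Solaris", "StorEdge Traffic Manager"): (3, 1, 1),
    ("Tru64", "Native"): (0x1C, 1, 0),
    ("VMware", "Native"): (3, 1, 1),
    ("Windows", "PowerPath"): (3, 1, 1),
    ("Windows", "DMP"): (3, 1, 1),
}
FIELDS = ("initiator_type", "arraycommpath", "failovermode")


def settings_to_apply(os_name, failover_software, current):
    """Return (full_settings, differences) for one initiator.

    full_settings always holds all three values, because changing any one
    of them resets the others to the storage-system defaults.
    differences maps each field that must change to (current, target).
    """
    target = dict(zip(FIELDS, RECOMMENDED[(os_name, failover_software)]))
    differences = {
        field: (current.get(field), value)
        for field, value in target.items()
        if current.get(field) != value
    }
    return target, differences
```

For a Linux host running DMP that is currently registered with failovermode 1, the sketch reports a single difference, `{"failovermode": (1, 2)}`, but still returns all three values so that the full set can be entered together.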
