MetroCluster is a disaster recovery solution that supports long distances between storage system controllers
– Can utilize FC switches to provide connectivity between nodes
– Utilizes SyncMirror® to provide resiliency in storage connectivity
– Adds the ability to declare a disaster if one site fails
Two MetroCluster configuration types
– Stretch MetroCluster
– Fabric MetroCluster
Presentation Notes
MetroCluster Overview
A MetroCluster configuration is a unique, cost-effective solution for combined high availability and disaster recovery within a campus or metropolitan area. A MetroCluster behaves in most ways just like an active-active configuration. All of the protection provided by core NetApp technology (RAID-DP™, Snapshot™ copies, automatic controller failover) also exists in a MetroCluster. However, MetroCluster adds complete synchronous mirroring (using SyncMirror®) along with the ability to perform a complete site failover from a storage perspective with a single command. MetroCluster can be implemented as a Stretch MetroCluster or as a Fabric MetroCluster: Stretch MetroCluster provides campus DR protection and can stretch up to 500 m; Fabric MetroCluster provides metropolitan DR protection and can stretch up to 100 km with FC switches.
Provides campus DR protection with direct connection of the two nodes (non-switched MetroCluster)
Can stretch up to 500 meters with OM3 cabling
[Diagram: Stretch MetroCluster – primary and secondary controllers joined by the cluster interconnect; each site houses its own primary FC storage plus the FC mirror of the partner site]
Maximum distance: 500 meters at 2 Gbps or 270 meters at 4 Gbps
Presentation Notes
Stretch MetroCluster
Stretch MetroCluster (sometimes referred to as non-switched) provides campus DR protection with direct connection of the two nodes and can stretch up to 500 meters with OM3 cabling. Stretch MetroCluster also includes synchronous mirroring (SyncMirror) and the ability to perform a site failover with a single command. Additional resiliency is added by deploying multipathing. Note: the interconnect cable supplied by NetApp is only 30 meters in length, so customers must obtain longer cables from third-party vendors.
Uses four Fibre Channel switches in a dual fabric configuration and a separate cluster interconnect card
Deployed for distances over 500 meters; can stretch up to 100 km with DWDM switches
[Diagram: Fabric MetroCluster – primary and secondary sites, each with a controller and two Brocade FC switches; the dual fabrics are linked by ISLs and carry cluster interconnect (CI) and disk traffic; each site houses its own primary FC storage plus the FC mirror of the partner site]
Maximum distance: 100 km at 2 Gbps or 55 km at 4 Gbps
Presentation Notes
Fabric MetroCluster
Fabric MetroCluster (also referred to as switched) uses four Fibre Channel switches in a dual fabric configuration and a separate cluster interconnect card to achieve an even greater distance between primary and secondary locations. The two halves of the configuration can be more than 500 meters apart and up to 100 km apart using DWDM with certified Brocade® FC switches. A switch fabric consists of a switch on the primary controller connected to a switch on the remote controller. The two switches are connected to each other through ISL (Inter-Switch Link) cables. The reason for two fabrics is redundancy: the loss of a switch in a fabric, or the loss of an entire fabric, will not affect the availability of the Fabric MetroCluster. Because of the nature of the MetroCluster architecture, Fibre Channel traffic on the switch fabric includes both disk I/O traffic between the controllers and disk shelves and cluster interconnect traffic.
SyncMirror synchronously copies data onto two plexes
Each plex of a mirror uses disks from separate pools: pool0 (local) and pool1 (mirror)
Uses Snapshot copies to guarantee consistency between the plexes in case of failure
– The unaffected plex continues to serve data
– Once the issue is fixed, the two plexes can be resynchronized
In a MetroCluster configuration, make sure each controller’s data has its mirror at the other site
[Diagram: an aggregate with two plexes – Plex0 built from Pool 0 disks and Plex1 built from Pool 1 disks]
Presentation Notes
SyncMirror Pools and Plexes
NetApp SyncMirror, an integral part of MetroCluster, combines the disk-mirroring protection of RAID 1 with NetApp industry-leading RAID 4 and RAID-DP technology. SyncMirror protects data from the following problems:
– The failure or loss of two or more disks in a RAID 4 aggregate
– The failure or loss of three or more disks in a RAID-DP (RAID double-parity) aggregate
SyncMirror creates two copies of the same WAFL file system on two plexes. The two plexes are simultaneously updated; therefore, the copies are always identical. Snapshot copies are used to guarantee consistency between the plexes in case of failure. Note that here Snapshot copies are used for resync, not for transferring the delta: every hour a Snapshot copy is taken on both sides, and resync occurs from plex to plex through the storage system. When SyncMirror is licensed and hardware ownership is used, spare disks are split into two pools: pool0 and pool1. Each plex of a mirror uses disks from these separate pools. When software ownership is used, disks are explicitly assigned to pools by the administrator. To maximize availability, pool0 and pool1 disks need to be on separate loops and use separate HBAs, cables, and shelves. In the event of an outage (for example, loss of disk connectivity), the unaffected plex continues to serve data while you fix the cause of the failure. Once fixed, the two plexes can be resynchronized and the mirror relationship reestablished. Make sure all storage is mirrored with SyncMirror: while non-mirrored storage is technically permissible, NetApp does not recommend non-mirrored storage in a MetroCluster configuration because the data on that storage will not be available after a site failover.
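As an illustrative sketch only (Data ONTAP 7-Mode console syntax is assumed; the disk names, pool numbers, and aggregate name below are hypothetical), assigning disks to pools with software ownership and then mirroring an aggregate might look like this:
  FAS1> disk assign 0a.16 0a.17 -p 0      (assign local disks to pool0)
  FAS1> disk assign 5b.32 5b.33 -p 1      (assign remote disks to pool1 for the mirror)
  FAS1> aggr mirror aggr1                 (add a second plex to aggr1 from pool1 spares)
  FAS1> aggr status -r aggr1              (verify that plex0 and plex1 are both online)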
Example of Pools and Plexes in Fabric MetroCluster
[Diagram: FAS1 and FAS2 connected through four Brocade switches (Switch1–Switch4) to shelf banks on loops A and B, each holding Pool 0 and Pool 1 disks; FAS1's local plex0 is mirrored as plex1 at the FAS2 site, and FAS2's local plex0 is mirrored as plex1 at the FAS1 site]
Presentation Notes
Example of Pools and Plexes in Fabric MetroCluster
For information on Fabric MetroCluster installation, cabling, and configuration, refer to the Active-Active Configuration Guide on the NOW site.
Software requirements
– Data ONTAP 6.4.1 and later
– SyncMirror_local, cluster, and cluster_remote licenses
Hardware requirements
– A clustered pair of FAS900, FAS3000, FAS3100 or FAS6000 series appliances
– Cluster interconnect card, copper/fiber converters, and associated cables
– Mirrors should be set between identical storage hardware
Presentation Notes
MetroCluster General Requirements
The cluster license provides automatic failover capability between sites in case of most hardware failures. The cluster_remote license provides a mechanism for the administrator to declare a site disaster and initiate a site failover with a single command for ease of use. SyncMirror maintains two copies of the data online, providing protection against all types of hardware outages, including triple disk failure. Mirrors should be set between identical storage hardware.
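As a minimal sketch (7-Mode syntax; the license codes shown are placeholders, not real keys), installing and verifying the required licenses on a node might look like:
  FAS1> license add <cluster_code>
  FAS1> license add <syncmirror_local_code>
  FAS1> license add <cluster_remote_code>
  FAS1> license                            (list installed licenses to verify)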
Four dedicated, certified Brocade FC switches supplied by NetApp with supported firmware
Firmware downloads: go to the NOW™ (NetApp on the Web) site > Download Software > Fibre Channel Switch > Brocade
Supported switches: Brocade 200E, 300E, and 5000
It is recommended to have identical switch models at each given location
Fabric MetroCluster Requirements (cont'd)
Fabric MetroCluster requirements prohibit the use of any other model or any other vendor's Fibre Channel switch in place of the Brocade 200E, 300E, or 5000 included with the Fabric MetroCluster. NetApp FC switches support matrix: http://now.netapp.com/NOW/knowledge/docs/san/fcp_iscsi_config/fcp_switch.shtml For the most up-to-date switch information, including supported switches and firmware downloads, see the Fabric-Attached MetroCluster Switch Description page on the NOW site. (To access this page, navigate to Download Software > Fibre Channel Switch > Brocade.)
Uses the FC-VI (QLogic QLE2462) 4-Gbps cluster interconnect card
Each port is connected to a separate FC switch fabric
A good connection to the Brocade switch must show the yellow LED ON (4-Gbps link speed)
P/N: X1926A
[Diagram: dual-ported FC-VI card – Port A, Port B, PCIe bus; the Beacon LED indication is not applicable to NetApp appliances]
Presentation Notes
MetroCluster FC-VI Interconnect Card
FC-VI stands for Fibre Channel-Virtual Interface; the card is also called the VI-MC (Virtual Interface MetroCluster) adapter card. MetroCluster uses the FC-VI (QLE2462) interconnect card (P/N X1926A). It is a dual-ported card: each port is connected to a separate FC switch fabric. The card functions at 4 Gbps and has a non-volatile ID distinct from the QLE2462 used as a storage initiator HBA or (SAN) target adapter. It requires Data ONTAP 7.2.2P1D1, or 7.2.3 and later. QLE LED scheme: a good connection to a Brocade 200E/300E switch port must show the yellow LED ON (indicating 4-Gbps link speed); the last activity entry (Beacon) is not applicable to NetApp storage systems.
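As an illustrative check (7-Mode syntax assumed), you could confirm that the interconnect card is seen by the system by listing its adapters; the slot shown in the output varies by platform:
  FAS1> sysconfig -a                       (look for the FC-VI cluster interconnect adapter and its firmware)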
When cluster failover (cf) is enabled, the following will cause a failover:
– Controller dual power failure
– The halt command
– Powering off a node
– Failed reboot after a panic
The next section illustrates cause-and-effect examples of failover and non-failover events
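For reference, a minimal 7-Mode sketch of enabling and checking controller failover (node name hypothetical):
  FAS1> cf enable                          (enable cluster failover between the two nodes)
  FAS1> cf status                          (confirm that failover is enabled and the partner is up)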
1. A disk shelf connected to FAS1 has failed; the data stored on plex0 is not accessible
2. FAS1 still serves client requests by accessing the same data mirrored (plex1) at the secondary data center
[Diagram: FAS1 at DC#1 and FAS2 at DC#2 connected by ISLs; DC#1 holds Vol1/Plex0 and Vol2/Plex1, DC#2 holds Vol1/Plex1 and Vol2/Plex0]
Presentation Notes
MetroCluster – Disk Shelf Failure
In this scenario, one of the disk shelves attached to FAS1 has failed. The data stored on plex0 is not accessible; however, FAS1 can still serve the data by accessing the same data mirrored (plex1) at the secondary data center.
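To see which plex is affected, an administrator might inspect the aggregate's RAID status; a sketch assuming 7-Mode syntax and a hypothetical aggregate name:
  FAS1> aggr status -r aggr1               (the plex on the failed shelf shows as failed/offline while the surviving plex stays online)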
1. FAS1 fails and its storage is still accessible at the primary data center
2. FAS2 takes over the identity of its failed partner
– FAS2 serves all client requests by accessing the data stored on disks in both data centers
[Diagram: FAS1 at DC#1 and FAS2 at DC#2 connected by ISLs; DC#1 holds Vol1/Plex0 and Vol2/Plex1, DC#2 holds Vol1/Plex1 and Vol2/Plex0]
Presentation Notes
MetroCluster – Site Disaster
In this scenario, the storage system FAS1 fails and the data is still accessible at the primary data center. FAS2 takes over the identity of its failed partner and serves all client requests because it can access the disks in both data centers.
Requires the cluster_remote license
Enables the cf forcetakeover -d command, which allows a takeover to occur without a quorum of disks available (because half of the partner mailbox disks are missing)
– Discards mailbox disks
– Splits the mirror in order to bring the failed controller's mirror online
– File System ID (FSID) may be rewritten on the partner's volumes; this depends on the option cf.takeover.change_fsid
Presentation Notes
Cluster Failover on Disaster (CFOD)
Upon determining that one of the sites has failed, the administrator must execute a specific command on the surviving node to initiate a site takeover. The command is cf forcetakeover -d. This command allows a takeover to occur in spite of the lack of a quorum of disks. The -d option is used in conjunction with RAID mirroring to recover from disasters in which one partner is not available. The forced takeover process breaks the mirrored relationships in order to bring the failed controller's volumes online. Volumes get a new file system ID (FSID) in order to avoid conflict with the original volumes. Effective in Data ONTAP 7.2.4, there is an option to preserve the original FSID, which allows LUNs to retain their original serial numbers. The option is called cf.takeover.change_fsid; if it is set to off (0), the original FSID is preserved. Note: the FSID is a unique identifier for a volume or an aggregate on a given host or cluster. WAFL uses the FSID as part of the file handle to communicate with NFS clients.
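Put together, a sketch of the disaster takeover from the surviving node (7-Mode syntax; the node name is hypothetical) might be:
  FAS2> options cf.takeover.change_fsid off   (optional, Data ONTAP 7.2.4 and later: preserve the original FSIDs)
  FAS2> cf forcetakeover -d                   (declare the site disaster and take over without a quorum of disks)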
FAS1 fails and its data is inaccessible at the primary data center; automatic takeover is disabled
Use the cf forcetakeover -d command from FAS2 to cause the takeover; the plexes for the failed partner are split
[Diagram: FAS1 at DC#1 and FAS2 at DC#2 connected by ISLs; DC#1 holds Vol1/Plex0 and Vol2/Plex1, DC#2 holds Vol1/Plex1 and Vol2/Plex0]
Presentation Notes
MetroCluster – Site Disaster
In this scenario, the storage system FAS1 fails and its storage is not accessible at the primary data center. In this case, automatic takeover is disabled, so you have to perform a MetroCluster forced takeover from the surviving node FAS2 (the cf forcetakeover -d command) to cause the takeover. This operation splits the failed node's SyncMirrored volumes by breaking the relationships between the two plexes. Once FAS2 has taken over FAS1, recover access to the failed partner's data by completing one of the following tasks:
– If you are using file-access protocols, remount the failed partner's volumes
– If you are using LUNs, bring the failed partner's LUNs online
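If LUNs are in use, a hedged sketch of bringing a failed partner's LUN online from the surviving node (7-Mode syntax; the LUN path is hypothetical):
  partner                                  (on the surviving node, switch to the failed partner's context)
  lun online /vol/vol1/lun0                (bring the partner's LUN back online)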
Once the failures are fixed, you have to reestablish the MetroCluster configuration
1. Restrict booting of the previously failed node
2. Rejoin the mirrors that were split by the forced takeover
3. Perform a giveback: cf giveback
[Diagram: FAS1 at DC#1 and FAS2 at DC#2 connected by ISLs; DC#1 holds Vol1/Plex0 and Vol2/Plex1, DC#2 holds Vol1/Plex1 and Vol2/Plex0]
Presentation Notes
MetroCluster – Site Recovery
Once the problem at the failed site is resolved, the administrator must follow certain procedures, starting with restricting booting of the previously failed node. You can restrict access to the previously failed site controller in the following ways:
a. Turn off power to the previously failed node (disk shelves should be left on).
b. Disconnect the cluster interconnect and Fibre Channel adapter cables of the node at the surviving site.
After you resolve all the problems that caused the site failure and ensure that the controller at the failed site is offline, it is time to prepare for the giveback so the sites can return to normal operation:
– Rejoin the two volumes that were split by the forced takeover
– Perform a giveback (cf giveback)
Caution: if you attempt a giveback operation prior to rejoining the aggregates, you might cause the node to boot with a previously failed plex, resulting in a data service outage.
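A sketch of these recovery steps on the surviving node (7-Mode syntax; the aggregate names are hypothetical, and the plex split by the forced takeover typically appears as a separate aggregate such as aggr1(1)):
  FAS2> aggr status                        (identify the aggregates that were split by the forced takeover)
  FAS2> aggr mirror aggr1 -v aggr1(1)      (rejoin the two halves; the data on the -v aggregate is discarded and resynchronized)
  FAS2> cf giveback                        (return control of its resources to the recovered node)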
When cluster failover is enabled, which events will provoke a failover?
– Triple disk failure? No
– Disk shelf dual power failure? No
– Cluster interconnect failure (both ports)? No
– Controller dual power failure? Yes
Which command would you execute to force a site failover?
– cf forcetakeover -d