MetroCluster is a disaster recovery solution that supports long distances between storage system controllers
– Can utilize FC switches to provide connectivity between nodes
– Utilizes SyncMirror® to provide resiliency in storage connectivity
– Adds the ability to declare a disaster if one site fails
Two MetroCluster configuration types
– Stretch MetroCluster
– Fabric MetroCluster
Presentation Notes
MetroCluster Overview
A MetroCluster configuration is a unique, cost-effective solution for combined high availability and disaster recovery within a campus or metropolitan area. A MetroCluster behaves in most ways just like an active-active configuration. All of the protection provided by core NetApp technology (RAID-DP™, Snapshot™ copies, automatic controller failover) also exists in a MetroCluster. However, MetroCluster adds complete synchronous mirroring (using SyncMirror®) along with the ability to perform a complete site failover from a storage perspective with a single command. MetroCluster can be implemented as a Stretch MetroCluster or as a Fabric MetroCluster: Stretch MetroCluster provides campus DR protection and can stretch up to 500 m; Fabric MetroCluster provides metropolitan DR protection and can stretch up to 100 km with FC switches.
Provides campus DR protection with direct connection of the two nodes (non-switched MetroCluster)
Can stretch up to 500 meters with OM3 cabling
[Diagram: Stretch MetroCluster – primary and secondary controllers joined by the cluster interconnect; each site houses its own primary FC storage plus the FC mirror of the partner site]
Maximum distance: 500 meters at 2 Gbps or 270 meters at 4 Gbps
Presentation Notes
Stretch MetroCluster
Stretch MetroCluster (sometimes referred to as non-switched) provides campus DR protection with direct connection of the two nodes and can stretch up to 500 meters with OM3 cabling. Stretch MetroCluster also includes synchronous mirroring (SyncMirror) and the ability to perform a site failover with a single command. Additional resiliency is added by deploying multipathing. Note: the interconnect cable supplied by NetApp is only 30 meters in length, so customers must obtain longer cables from third-party vendors.
Uses four Fibre Channel switches in a dual fabric configuration and a separate cluster interconnect card
Deployed for distances over 500 meters; can stretch up to 100 km with DWDM switches
[Diagram: Fabric MetroCluster – primary and secondary sites, each with a controller and two Brocade FC switches; the dual fabrics are linked by ISLs and carry cluster interconnect (CI) and disk traffic; each site houses its own primary FC storage plus the FC mirror of the partner site]
Maximum distance: 100 km at 2 Gbps or 55 km at 4 Gbps
Presentation Notes
Fabric MetroCluster
Fabric MetroCluster (also referred to as switched) uses four Fibre Channel switches in a dual fabric configuration and a separate cluster interconnect card to achieve an even greater distance between primary and secondary locations. The two halves of the configuration can be more than 500 meters apart and up to 100 km apart using DWDM with certified Brocade® FC switches. A switch fabric consists of a switch on the primary controller connected to a switch on the remote controller. The two switches are connected to each other through ISL (Inter-Switch Link) cables. The reason for two fabrics is redundancy: the loss of a switch in a fabric, or the loss of an entire fabric, will not affect the availability of the Fabric MetroCluster. Because of the nature of the MetroCluster architecture, Fibre Channel traffic on the switch fabric includes both disk I/O traffic between the controllers and disk shelves and cluster interconnect traffic.
SyncMirror synchronously copies data onto two plexes
Each plex of a mirror uses disks from separate pools: pool0 (local) and pool1 (mirror)
Uses Snapshot copies to guarantee consistency between the plexes in case of failure
– The unaffected plex continues to serve data
– Once the issue is fixed, the two plexes can be resynchronized
In a MetroCluster configuration, make sure each controller’s data has its mirror at the other site
[Diagram: an aggregate with two plexes – Plex0 built from Pool 0 disks and Plex1 built from Pool 1 disks]
Presentation Notes
SyncMirror Pools and Plexes
NetApp SyncMirror, an integral part of MetroCluster, combines the disk-mirroring protection of RAID 1 with NetApp industry-leading RAID 4 and RAID-DP technology. SyncMirror protects data from the following problems:
– The failure or loss of two or more disks in a RAID 4 aggregate
– The failure or loss of three or more disks in a RAID-DP (RAID double-parity) aggregate
SyncMirror creates two copies of the same WAFL file system on two plexes. The two plexes are simultaneously updated; therefore, the copies are always identical. Snapshot copies are used to guarantee consistency between the plexes in case of failure. Note that here Snapshot copies are used for resync, not for transferring the delta: every hour a Snapshot copy is taken on both sides, and resync occurs from plex to plex through the storage system. When SyncMirror is licensed and hardware ownership is used, spare disks are split into two pools: pool0 and pool1. Each plex of a mirror uses disks from these separate pools. When software ownership is used, disks are explicitly assigned to pools by the administrator. To maximize availability, pool0 and pool1 disks need to be on separate loops and use separate HBAs, cables, and shelves. In the event of an outage (for example, loss of disk connectivity), the unaffected plex continues to serve data while you fix the cause of the failure. Once fixed, the two plexes can be resynchronized and the mirror relationship reestablished. Make sure all storage is mirrored with SyncMirror: while non-mirrored storage is technically permissible, NetApp does not recommend non-mirrored storage in a MetroCluster configuration because the data on that storage will not be available after a site failover.
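As an illustrative sketch only (Data ONTAP 7-Mode console syntax is assumed; the disk names, pool numbers, and aggregate name below are hypothetical), assigning disks to pools with software ownership and then mirroring an aggregate might look like this:
  FAS1> disk assign 0a.16 0a.17 -p 0      (assign local disks to pool0)
  FAS1> disk assign 5b.32 5b.33 -p 1      (assign remote disks to pool1 for the mirror)
  FAS1> aggr mirror aggr1                 (add a second plex to aggr1 from pool1 spares)
  FAS1> aggr status -r aggr1              (verify that plex0 and plex1 are both online)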
Example of Pools and Plexes in Fabric MetroCluster
[Diagram: FAS1 and FAS2 connected through four Brocade switches (Switch1–Switch4) to shelf banks on loops A and B, each holding Pool 0 and Pool 1 disks; FAS1's local plex0 is mirrored as plex1 at the FAS2 site, and FAS2's local plex0 is mirrored as plex1 at the FAS1 site]
Presentation Notes
Example of Pools and Plexes in Fabric MetroCluster
For information on Fabric MetroCluster installation, cabling, and configuration, refer to the Active-Active Configuration Guide on the NOW site.
Software requirements
– Data ONTAP 6.4.1 and later
– SyncMirror_local, cluster, and cluster_remote licenses
Hardware requirements
– A clustered pair of FAS900, FAS3000, FAS3100 or FAS6000 series appliances
– Cluster interconnect card, copper/fiber converters, and associated cables
– Mirrors should be set between identical storage hardware
Presentation Notes
MetroCluster General Requirements
The cluster license provides automatic failover capability between sites in case of most hardware failures. The cluster_remote license provides a mechanism for the administrator to declare a site disaster and initiate a site failover with a single command for ease of use. SyncMirror maintains two copies of the data online, providing protection against all types of hardware outages, including triple disk failure. Mirrors should be set between identical storage hardware.
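As a minimal sketch (7-Mode syntax; the license codes shown are placeholders, not real keys), installing and verifying the required licenses on a node might look like:
  FAS1> license add <cluster_code>
  FAS1> license add <syncmirror_local_code>
  FAS1> license add <cluster_remote_code>
  FAS1> license                            (list installed licenses to verify)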
Four dedicated, certified Brocade FC switches supplied by NetApp with supported firmware
Firmware downloads: go to the NOW™ (NetApp on the Web) site > Download Software > Fibre Channel Switch > Brocade
Supported switches: Brocade 200E, 300E, and 5000
It is recommended to have identical switch models at each given location
Fabric MetroCluster Requirements (cont'd)
Fabric MetroCluster requirements prohibit the use of any other model or any other vendor's Fibre Channel switch in place of the Brocade 200E, 300E, or 5000 included with the Fabric MetroCluster. NetApp FC switches support matrix: http://now.netapp.com/NOW/knowledge/docs/san/fcp_iscsi_config/fcp_switch.shtml For the most up-to-date switch information, including supported switches and firmware downloads, see the Fabric-Attached MetroCluster Switch Description page on the NOW site. (To access this page, navigate to Download Software > Fibre Channel Switch > Brocade.)
Uses the FC-VI (QLogic QLE2462) 4-Gbps cluster interconnect card
Each port is connected to a separate FC switch fabric
A good connection to the Brocade switch must show the yellow LED ON (4-Gbps link speed)
P/N: X1926A
[Diagram: dual-ported FC-VI card – Port A, Port B, PCIe bus; the Beacon LED indication is not applicable to NetApp appliances]
Presentation Notes
MetroCluster FC-VI Interconnect Card
FC-VI stands for Fibre Channel-Virtual Interface; the card is also called the VI-MC (Virtual Interface MetroCluster) adapter card. MetroCluster uses the FC-VI (QLE2462) interconnect card (P/N X1926A). It is a dual-ported card: each port is connected to a separate FC switch fabric. The card functions at 4 Gbps and has a non-volatile ID distinct from the QLE2462 used as a storage initiator HBA or (SAN) target adapter. It requires Data ONTAP 7.2.2P1D1, or 7.2.3 and later. QLE LED scheme: a good connection to a Brocade 200E/300E switch port must show the yellow LED ON (indicating 4-Gbps link speed); the last activity entry (Beacon) is not applicable to NetApp storage systems.
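As an illustrative check (7-Mode syntax assumed), you could confirm that the interconnect card is seen by the system by listing its adapters; the slot shown in the output varies by platform:
  FAS1> sysconfig -a                       (look for the FC-VI cluster interconnect adapter and its firmware)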
When cluster failover (cf) is enabled, the following will cause a failover:
– Controller dual power failure
– The halt command
– Powering off a node
– Failed reboot after a panic
The next section illustrates cause-and-effect examples of failover and non-failover events
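For reference, a minimal 7-Mode sketch of enabling and checking controller failover (node name hypothetical):
  FAS1> cf enable                          (enable cluster failover between the two nodes)
  FAS1> cf status                          (confirm that failover is enabled and the partner is up)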
1. A disk shelf connected to FAS1 has failed; the data stored on plex0 is not accessible
2. FAS1 still serves client requests by accessing the same data mirrored (plex1) at the secondary data center
[Diagram: FAS1 at DC#1 and FAS2 at DC#2 connected by ISLs; DC#1 holds Vol1/Plex0 and Vol2/Plex1, DC#2 holds Vol1/Plex1 and Vol2/Plex0]
Presentation Notes
MetroCluster – Disk Shelf Failure
In this scenario, one of the disk shelves attached to FAS1 has failed. The data stored on plex0 is not accessible; however, FAS1 can still serve the data by accessing the same data mirrored (plex1) at the secondary data center.
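To see which plex is affected, an administrator might inspect the aggregate's RAID status; a sketch assuming 7-Mode syntax and a hypothetical aggregate name:
  FAS1> aggr status -r aggr1               (the plex on the failed shelf shows as failed/offline while the surviving plex stays online)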
1. FAS1 fails and its storage is still accessible at the primary data center
2. FAS2 takes over the identity of its failed partner
– FAS2 serves all client requests by accessing the data stored on disks in both data centers
[Diagram: FAS1 at DC#1 and FAS2 at DC#2 connected by ISLs; DC#1 holds Vol1/Plex0 and Vol2/Plex1, DC#2 holds Vol1/Plex1 and Vol2/Plex0]
Presentation Notes
MetroCluster – Site Disaster
In this scenario, the storage system FAS1 fails and the data is still accessible at the primary data center. FAS2 takes over the identity of its failed partner and serves all client requests because it can access the disks in both data centers.
Requires the cluster_remote license
Enables the cf forcetakeover -d command, which allows a takeover to occur without a quorum of disks available (because half of the partner mailbox disks are missing)
– Discards mailbox disks
– Splits the mirror in order to bring the failed controller's mirror online
– File System ID (FSID) may be rewritten on the partner's volumes; this depends on the option cf.takeover.change_fsid
Presentation Notes
Cluster Failover on Disaster (CFOD)
Upon determining that one of the sites has failed, the administrator must execute a specific command on the surviving node to initiate a site takeover. The command is cf forcetakeover -d. This command allows a takeover to occur in spite of the lack of a quorum of disks. The -d option is used in conjunction with RAID mirroring to recover from disasters in which one partner is not available. The forced takeover process breaks the mirrored relationships in order to bring the failed controller's volumes online. Volumes get a new file system ID (FSID) in order to avoid conflict with the original volumes. Effective in Data ONTAP 7.2.4, there is an option to preserve the original FSID, which allows LUNs to retain their original serial numbers. The option is called cf.takeover.change_fsid; if it is set to off (0), the original FSID is preserved. Note: the FSID is a unique identifier for a volume or an aggregate on a given host or cluster. WAFL uses the FSID as part of the file handle to communicate with NFS clients.
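Put together, a sketch of the disaster takeover from the surviving node (7-Mode syntax; the node name is hypothetical) might be:
  FAS2> options cf.takeover.change_fsid off   (optional, Data ONTAP 7.2.4 and later: preserve the original FSIDs)
  FAS2> cf forcetakeover -d                   (declare the site disaster and take over without a quorum of disks)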
FAS1 fails and its data is inaccessible at the primary data center; automatic takeover is disabled
Use the cf forcetakeover -d command from FAS2 to cause the takeover; the plexes for the failed partner are split
[Diagram: FAS1 at DC#1 and FAS2 at DC#2 connected by ISLs; DC#1 holds Vol1/Plex0 and Vol2/Plex1, DC#2 holds Vol1/Plex1 and Vol2/Plex0]
Presentation Notes
MetroCluster – Site Disaster
In this scenario, the storage system FAS1 fails and its storage is not accessible at the primary data center. In this case, automatic takeover is disabled, so you have to perform a MetroCluster forced takeover from the surviving node FAS2 (the cf forcetakeover -d command) to cause the takeover. This operation splits the failed node's SyncMirrored volumes by breaking the relationships between the two plexes. Once FAS2 has taken over FAS1, recover access to the failed partner's data by completing one of the following tasks:
– If you are using file-access protocols, remount the failed partner's volumes
– If you are using LUNs, bring the failed partner's LUNs online
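If LUNs are in use, a hedged sketch of bringing a failed partner's LUN online from the surviving node (7-Mode syntax; the LUN path is hypothetical):
  partner                                  (on the surviving node, switch to the failed partner's context)
  lun online /vol/vol1/lun0                (bring the partner's LUN back online)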
Once the failures are fixed, you have to reestablish the MetroCluster configuration
1. Restrict booting of the previously failed node
2. Rejoin the mirrors that were split by the forced takeover
3. Perform a giveback: cf giveback
[Diagram: FAS1 at DC#1 and FAS2 at DC#2 connected by ISLs; DC#1 holds Vol1/Plex0 and Vol2/Plex1, DC#2 holds Vol1/Plex1 and Vol2/Plex0]
Presentation Notes
MetroCluster – Site Recovery
Once the problem at the failed site is resolved, the administrator must follow certain procedures, starting with restricting booting of the previously failed node. You can restrict access to the previously failed site controller in the following ways:
a. Turn off power to the previously failed node (disk shelves should be left on).
b. Disconnect the cluster interconnect and Fibre Channel adapter cables of the node at the surviving site.
After you resolve all the problems that caused the site failure and ensure that the controller at the failed site is offline, it is time to prepare for the giveback so the sites can return to normal operation:
– Rejoin the two volumes that were split by the forced takeover
– Perform a giveback (cf giveback)
Caution: if you attempt a giveback operation prior to rejoining the aggregates, you might cause the node to boot with a previously failed plex, resulting in a data service outage.
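A sketch of these recovery steps on the surviving node (7-Mode syntax; the aggregate names are hypothetical, and the plex split by the forced takeover typically appears as a separate aggregate such as aggr1(1)):
  FAS2> aggr status                        (identify the aggregates that were split by the forced takeover)
  FAS2> aggr mirror aggr1 -v aggr1(1)      (rejoin the two halves; the data on the -v aggregate is discarded and resynchronized)
  FAS2> cf giveback                        (return control of its resources to the recovered node)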
When cluster failover is enabled, which events will provoke a failover?
– Triple disk failure? No
– Disk shelf dual power failure? No
– Cluster interconnect failure (both ports)? No
– Controller dual power failure? Yes
Which command would you execute to force a site failover?
– cf forcetakeover -d