© 2010 IBM Corporation IBM Power Systems Technical University October 18–22, 2010 — Las Vegas, NV Session Title: Designing a PowerHA SystemMirror for AIX Disaster Recovery Solution Session ID: HA18 (AIX) Speaker Name: Michael Herrera


Page 1: HA18 Herrera

© 2010 IBM Corporation

IBM Power Systems Technical University

October 18–22, 2010 — Las Vegas, NV

Session Title: Designing a PowerHA SystemMirror for AIX Disaster Recovery Solution

Session ID: HA18 (AIX)

Speaker Name: Michael Herrera

Page 2: HA18 Herrera

Michael Herrera ([email protected])

Advanced Technical Skills (ATS)

Certified IT Specialist

Workload-Optimizing Systems

+

Best Practices for Designing a

PowerHA Enterprise Edition Solution on AIX

Page 3: HA18 Herrera


Agenda

• Available Offerings

• Campus Disaster Recovery vs. Extended Distance

• What you get with Enterprise Edition

• Expected Fallover Behaviors

• Summary

Page 4: HA18 Herrera


Tiers of Disaster Recovery – PowerHA SM Enterprise Edition

[Chart: Value plotted against Recovery Time (15 Min, 1-4 Hr., 4-8 Hr., 8-12 Hr., 12-16 Hr., 24 Hr., Days). Tiers based on SHARE definitions.]

Tier 7 - Highly automated, business wide, integrated solution (Example: GDPS***/PPRC/VTS P2P, AIX PowerHA Enterprise Edition, OS/400 HABP...)

(PowerHA Enterprise Edition fits in here)

Tier 6 - Storage mirroring (example: XRC, Metro & Global Mirror, VTS Peer to Peer)

Tier 5 - Software two site, two phase commit (transaction integrity)

Tier 4 - Batch/Online database shadowing & journaling, Point in Time disk copy (FlashCopy), TSM-DRM

Tier 3 - Electronic Vaulting, TSM**, Tape

Tier 2 - PTAM, Hot Site, TSM**

Tier 1 - PTAM

Data recreation bands: zero or near zero; minutes to hours; up to 24 hours; 24 to 48 hours

Application bands: low tolerance to outage; somewhat tolerant to outage; very tolerant to outage

*PTAM = Pickup Truck Access Method with Tape

**TSM = Tivoli Storage Manager

***GDPS = Geographically Dispersed Parallel Sysplex

HA & DR solutions from IBM for your mission-critical AIX applications

Page 5: HA18 Herrera


HACMP is now PowerHA SystemMirror for AIX!

• Current Release: 7.1.0.X
– Available on: AIX 6.1 TL06 & 7.1

• Packaging Changes:
– Standard Edition - Local Availability
– Enterprise Edition - Local & Disaster Recovery (Version 7.1 will not be released until 2011)

• Licensing Changes:
– Small, Medium, Large Server Class

Product Lifecycle:

Version                      Release Date     End of Support Date
HACMP 5.4.1                  Nov 6, 2007      Sept, 2011
PowerHA 5.5.0                Nov 14, 2008     N/A
PowerHA SystemMirror 6.1.0   Oct 20, 2009     N/A
PowerHA SystemMirror 7.1.0   Sept 10, 2010    N/A

* These dates are subject to change per Announcement Flash

A 20 year track record in high availability for AIX

Page 6: HA18 Herrera


PowerHA SystemMirror Version 6.1 Editions for AIX

High Level Features                        Standard Edition   Enterprise Edition
Centralized Management CSPOC               Yes                Yes
Cluster resource management                Yes                Yes
Shared Storage management                  Yes                Yes
Cluster verification framework             Yes                Yes
Integrated disk heartbeat                  Yes                Yes
SMIT management interfaces                 Yes                Yes
AIX event/error management                 Yes                Yes
Integrated heartbeat                       Yes                Yes
PowerHA DLPAR HA management                Yes                Yes
Smart Assists                              Yes                Yes
Multi Site HA Management                   -                  Yes
PowerHA GLVM async mode                    -                  Yes
GLVM deployment wizard                     -                  Yes
IBM Metro Mirror support                   -                  Yes
IBM Global Mirror support *                -                  Yes
EMC SRDF sync/async                        -                  Yes
Hitachi True Copy & Global Replicator *    -                  Yes

* Hitachi & Global Mirror functionality is only available in 6.1.0.3

Highlights:

• New Editions to optimize software value capture

• Standard Edition targeted at datacenter HA

• Enterprise Edition targeted at multi-site HA/DR

• Tiered pricing structure: Small/Med/Large

(7.1 Enterprise Edition N/A till 2011)

Page 7: HA18 Herrera


High Availability & DR: Drawing the Line

Campus Style DR

• Cross Site LVM Mirroring - AIX LVM Mirrors

• SVC Split I/O VDisk Mirroring - SVC VDisk functionality

• Metro Mirror or SRDF * - Disk Based Replication

* To manage disk level replication the Enterprise Edition is required

Extended Distance Offerings

• Metro Mirror & Global Mirror - SVC, DS6K, DS8K, ESS800

• EMC SRDF - DMX3, DMX4, VMAX

• Hitachi TrueCopy & Global Replicator - USPV, USPVM

• GLVM (sync / async) - IP Based Replication

Data Center

Remote Site

Different Perspectives on the protection and replication of data

Page 8: HA18 Herrera


Local High Availability vs. Disaster Recovery

How far can I stretch a local cluster?

– How far can my storage be shared?

• Network Connectivity

– Subnetting & potential latency

• Storage Infrastructure

– Can you merge fabrics and present LUNs from either location across the campus?

• Desired Resiliency

– LVM Mirroring across storage subsystems → Both copies are accessible

– Storage Level Replication → Only active copy available

– VDisk Mirroring (SAN Volume Controller) → Single logical copy mirrored on backend

(Distance limitations ~10km or 6 miles)

[Diagram: Storage Enclosure 1 and Storage Enclosure 2, linked by LVM Mirroring, Disk Replication or VDisk Mirroring]

Page 9: HA18 Herrera


Campus Style DR: Cross Site LVM Mirroring

• Leverage AIX Logical Volume Mirroring

• Distance Limitations: Synchronous

[Diagram: two configurations of LVM mirrors between FC switches at each site]

– Direct SAN links: up to 15 km

– DWDM, CWDM or other SAN extenders (i.e. 120-300 km)

Distance limited by latency effect on performance
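The point that distance is limited by the latency effect on performance can be estimated from propagation delay alone. The sketch below is a back-of-the-envelope calculation, not a sizing tool: it assumes roughly 5 microseconds per kilometer one way in optical fiber (a general rule of thumb, not a figure from this deck) and ignores switch, DWDM and storage latency. The function name and the `round_trips` parameter are illustrative.

```python
# Assumption (not from the slides): light in optical fiber covers roughly
# 200,000 km/s, i.e. about 5 microseconds per kilometer in one direction.
FIBER_US_PER_KM_ONE_WAY = 5.0


def added_write_latency_ms(distance_km: float, round_trips: int = 1) -> float:
    """Propagation delay (in ms) that distance adds to one synchronous write.

    A synchronously mirrored write is not complete until the remote copy has
    acknowledged it, so each write pays at least one full round trip.
    """
    if distance_km < 0 or round_trips < 1:
        raise ValueError("distance_km must be >= 0 and round_trips >= 1")
    one_way_us = distance_km * FIBER_US_PER_KM_ONE_WAY
    return 2 * one_way_us * round_trips / 1000.0


if __name__ == "__main__":
    # Distances taken from the slide: direct links and extended SAN links.
    for km in (15, 120, 300):
        print(f"{km:>4} km: +{added_write_latency_ms(km):.2f} ms per write")
```

Under that assumption a 120 km link adds about 1.2 ms to every mirrored write before any equipment delay is counted, which is why write-heavy workloads feel the distance first.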

Page 10: HA18 Herrera


Campus DR: Cross Site LVM vs. Storage Replication

Considerations:

• Standard Edition vs. Enterprise Edition

• Disk Replication: Common Replication Mechanism across platforms

• Performance Differences:

– Host based LVM Mirroring vs. Disk Replication

– White Paper – Cross Site Mirroring Performance Implications

http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101269

[Diagram: two sites, each with its own LAN, SAN and SVC, connected through DWDM]

Choices:

• Cross Site LVM Mirroring

• VDisk Mirroring (Split I/O Group)

• Metro Mirroring

Page 11: HA18 Herrera


Considerations:

• What do the volume groups look like?

[Diagram: volume group Datavg holding one logical volume with two copies. LV Copy 1 (primary) sits on hdisks in the Local Storage Subsystem; LV Copy 2 (secondary) sits on hdisks in the Remote Storage Subsystem.]

PowerHA & Logical Volume Mirroring

New in AIX 6.1 - Mirror Pools

• Intended for Asynchronous GLVM

• Address Issues with Extending Logical Volumes and spanning copies

• New DR Redbook: Exploiting PowerHA SystemMirror Enterprise Edition

- Scenario for Cross Site LVM with Mirror Pools

Page 12: HA18 Herrera


AIX 6.1 & Mirror Pools (SMIT Panels & CLI)

Benefits:

• Prevent spanning copies

• Requirement for Async GLVM

Other Potential Uses:

- Cross Site LVM configurations

- Synchronous GLVM

* Reason that there is no asynchronous GLVM on AIX 5.3 and why it was not retrofitted

* CSPOC does not currently allow you to create a logical volume via menus.

* The workaround is to create the logical volume using smit mklv and then continue creating the filesystem via CSPOC

Page 13: HA18 Herrera


Infrastructure Considerations

Important: Identify & Eliminate Single Points of Failure!

[Diagram: Node A at Site A and Node B at Site B, each with its own LAN and SAN, linked through DWDM; volume group SITEAMETROVG on 50GB LUNs]

Page 14: HA18 Herrera


Infrastructure Considerations

Important: Identify Single Points of Failure & design the solution around them

[Diagram: Node A at Site A and Node B at Site B, each with its own LAN and SAN, linked through DWDM and a WAN; cluster networks XD_rs232, XD_IP and net_ether_0; volume group SITEAMETROVG on 50GB LUNs]

Disk heartbeat volume groups (1GB LUNs, ECM VGs):

diskhb_vg1 - hdisk2 / hdisk3, PVID 000fe4111f25a1d1

diskhb_vg2 - hdisk3 / hdisk4, PVID 000fe4112f998235

Page 15: HA18 Herrera


XD_rs232 networks and Serial over Ethernet

• Converted rs232 using rs422/rs485

– Using true serial requires rs422/rs485 converters

– Distance to 1.2 km at 19,200 bps or 5 km (~3.1 miles) at 9600 bps

• Converted rs232 using fiber optics

– Fiber optic modems or multiplexors

– Distances of 20 - 100 km (~12 - 62 miles) but must conform to the vendor’s specifications to avoid signal loss

– Companies like Black Box and TC Communications

• Serial over Ethernet

– This option provides the greatest distance by not defining any hard limitations, but is based on TCP/IP, which is one of the components that this type of network is designed to isolate

– Several vendors available online

Page 16: HA18 Herrera


PowerHA SystemMirror: Prominent Client Issues

• Cluster Subnet Requirements

• How do clients connect

• IPAT across Sites (Site Specific IPs)

• Context switch – External Devices (i.e. www.f5.com)

• Static IPs – Node Bound Service IP (manual reconnect)

• DNS Change (consider TTL – Time to Live)

[Diagram: Bldg A and Bldg B]

Bldg A, en2: 10.10.10.100 base, 10.10.10.120 service_IP1

Bldg B, en2: 13.10.10.100 base, 13.10.10.120 service_IP2

Networks: XD_IP_net_0, disk_hb_net_0 and disk_hb_net_1 (1GB LUNs), XD_rs232_net_0

Cluster Data LUNs: 4 x 30GB

Resource Group A

Startup: Online on Home Node Only

Fallover: Fallover to Next Node in List

Fallback: Never Fallback

Site Policy: Prefer Primary Site

Nodes: NodeA NodeB

Service IP: service_IP1 service_IP2

Volume Groups: datavg

Application Server: AppA
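The reason site specific service IPs come up at all is visible in the addresses on this slide: each building sits on its own subnet, so a single service address cannot simply follow the resource group across sites. The following sketch only makes that observation checkable. The addresses are the ones shown on the slide; the /24 prefix length is an assumption (the slide gives no netmask), and the function names are illustrative.

```python
import ipaddress

# Addresses as shown on the slide. The prefix length is an assumption,
# since the slide does not state a netmask.
PREFIX_LENGTH = 24
SITES = {
    "Bldg A": {"base": "10.10.10.100", "service_IP1": "10.10.10.120"},
    "Bldg B": {"base": "13.10.10.100", "service_IP2": "13.10.10.120"},
}


def site_subnets(sites, prefix_length=PREFIX_LENGTH):
    """Map each site name to the set of subnets its addresses fall into."""
    result = {}
    for site, addresses in sites.items():
        result[site] = {
            ipaddress.ip_interface(f"{addr}/{prefix_length}").network
            for addr in addresses.values()
        }
    return result


def sites_share_subnet(sites, prefix_length=PREFIX_LENGTH):
    """Return True if any two sites have an address in the same subnet."""
    subnets = list(site_subnets(sites, prefix_length).values())
    for index, first in enumerate(subnets):
        for second in subnets[index + 1:]:
            if first & second:
                return True
    return False


if __name__ == "__main__":
    for site, nets in site_subnets(SITES).items():
        print(site, sorted(str(net) for net in nets))
    print("shared subnet between sites:", sites_share_subnet(SITES))
```

When the sites do not share a subnet, clients need one of the reconnection options listed on the slide (site specific service IPs, an external device, node bound addresses or a DNS change).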

Page 17: HA18 Herrera


PowerHA Extended DR Solution Progression

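[Diagram: Nodes A1 and A2 at SVC_Site A and Nodes B1 and B2 at SVC_Site B. Each site has its own SAN and SVC; disk heartbeat networks net_diskhb_01 and net_diskhb_02; the sites are joined by net_ether_01, xd_ip and PPRC Links.]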

HA first then DR

Building blocks for success

Page 18: HA18 Herrera


What are customers doing (Manual vs. Automated)

• Local Clustering and Replication under the covers

– Metro / Global Mirror

– SRDF/A

– Hitachi True Copy & Global Replicator

– Oracle Data Guard

– DB2 HADR

* Replicated volumes to an Inactive cluster

• Standalone GLVM IP replication or Automated

– GLVM is available in base 5.3 & 6.1 AIX Media

– Enterprise Edition required for Automation

• Fully Automated Solution

– PowerHA SystemMirror Enterprise Edition

• Additional offerings in the works!

Longer Distances require more robust solutions

Page 19: HA18 Herrera


PowerHA SystemMirror Storage Replication Integration

[Diagram: a PowerHA SystemMirror Enterprise Cluster spans Site A and Site B over IP Communication; Storage Level Replication runs from Source LUNs to Target LUNs]

Considerations:

• DS8700 Global Mirror, EMC SRDF & Hitachi True Copy require PowerHA 6.1+

• The Enterprise Edition adds additional cluster panels to define and store the relationships for the replicated volumes

• CLI is enabled for each replication offering to communicate directly with the storage enclosures and perform a role reversal in the event of a fallover

Characteristics:

• Distance Limitations: Synchronous or Asynchronous

• Supported Replication: Metro & Global Mirror, SRDF, Hitachi TrueCopy & Global Replicator

How it works:

• The cluster will redirect the replication depending on where the resources are being hosted

Enterprise Edition Storage replication offerings

Page 20: HA18 Herrera


SVC Version 5 Interoperability Matrix

[Diagram: SAN Volume Controller between host platforms and back-end storage. Items flagged "New" on the slide are not marked here.]

Hosts (1024 Hosts): IBM AIX, IBM i 6.1, IBM z/VSE, IBM BladeCenter, Microsoft Windows and Hyper-V, VMware vSphere 4, Linux (Intel/Power/zLinux: RHEL, SUSE 11), Sun Solaris, HP-UX 11i, Tru64, OpenVMS, Novell NetWare, SGI IRIX, Apple Mac OS

Connectivity: 8Gbps SAN fabric, Native iSCSI

SAN Volume Controller functions:

• Continuous Copy: Metro/Global Mirror, Multiple Cluster Mirror

• Point-in-time Copy: Full volume, Copy on write, 256 targets, Incremental, Cascaded, Reverse, Space-Efficient, FlashCopy Mgr

• Space-Efficient Virtual Disks

• Virtual Disk Mirroring

• SSD

• Entry Edition software

Storage: IBM DS (DS3400, DS4000, DS5020, DS3950, DS6000, DS8000), IBM XIV, DCS9550, DCS9900, IBM N series, IBM N series Gateway, IBM TS7650G, HP (MA, EMA, MSA 2000, XP, EVA 6400, 8400), Hitachi (Lightning, Thunder, TagmaStore, AMS 2100, 2300, 2500, WMS, USP), EMC (CLARiiON CX4-960, Symmetrix), Sun StorageTek, NetApp FAS, NetApp V-Series, Bull StoreWay, Fujitsu Eternus (3000; 8000 Models 2000 & 1200; 4000 models 600 & 400), NEC iStorage, Pillar Axiom

For the most current, and more detailed, information please visit ibm.com/storage/svc and click on “Interoperability”.

Storage Level virtualization for your Enterprise needs

Page 21: HA18 Herrera


Enterprise Edition Disk Replication Integration

• Qualified & supported DR configurations

- IBM Development & AIX Software Support

• Teaming with EMC & Hitachi

- Cooperative Service Agreement

• Install all Filesets or only what you need

- Note Enterprise Verification takes longer

- Don’t install if you are not using it

• Filesets are in addition to base replication solution requirements

Filesets by offering:

Enterprise License: cluster.xd.license

Direct PPRC Management: cluster.es.pprc.rte, cluster.es.pprc.cmds, cluster.msg.en_US.pprc

DSCLI Management: cluster.es.spprc.cmds, cluster.es.spprc.rte, cluster.es.pprc.rte, cluster.es.pprc.cmds, cluster.es.msg.en_US.pprc

EMC SRDF: cluster.es.sr.cmds, cluster.es.sr.rte, cluster.msg.en_US.sr

Hitachi True Copy & Global Replicator: cluster.es.tc.cmds, cluster.es.tc.rte, cluster.msg.en_US.tc

So what are you paying for ?

Page 22: HA18 Herrera


Geographic Logical Volume Mirroring - GLVM

GLVM code is available in the AIX base media:

• AIX 5.3 – synchronous replication

• AIX 6.1 – synchronous & asynchronous replication

PowerHA SystemMirror Enterprise Edition provides SMIT panels to define and manage all configuration information and automates the management of the replication in the event of a fallover

Enterprise Edition integrates with this IP replication offering

[Diagram: a PowerHA SystemMirror Enterprise Cluster spans Site A and Site B over IP Communication; IP I/O Replication runs from Source LUNs to Target LUNs]

* Storage can be dissimilar subsystems at either location

How it works:

Drivers will make the remote disks appear as if they are local over the WAN, allowing for LVM mirrors between local and remote disks. Asynchronous replication requires the use of AIO cache logical volumes and Mirror Pools, available only in AIX 6.1 and above

Find more details in the new DR Redbook SG24-7841-00

Page 23: HA18 Herrera


AIX & Geographic Logical Volume Mirroring

Filesets:

Geographic Logical Volume Mirroring (Available on AIX Media)

glvm.rpv.client

glvm.rpv.server

glvm.rpv.util

glvm.rpv.man.en_US

glvm.rpv.msg.en_US

Enterprise License & Integration Filesets

cluster.xd.license

cluster.xd.glvm

cluster.doc.en_US.glvm

cluster.msg.en_US.glvm

Page 24: HA18 Herrera


PowerHA SystemMirror & AIX 6.1 - Asynchronous GLVM

Vegas Conference:

Implementing PowerHA SystemMirror Enterprise Edition for Asynchronous GLVM

Double session lab Wednesday – Bill Miller

Page 25: HA18 Herrera


GLVM Cluster Configuration Assistant

Type or select values in entry fields.

Press Enter AFTER making all desired changes.

[Entry Fields]

* Communication Path to Takeover Node [] +

* Application Server Name []

* Application Server Start Script []

* Application Server Stop Script []

HACMP can keep an IP address highly available:

Consider specifying Service IP labels and

Persistent IP labels for your nodes.

Service IP Label [] +

Persistent IP for Local Node [] +

Persistent IP for Takeover Node [] +


New in PowerHA SM 6.1 - GLVM Configuration Wizard

Generates:

HACMP configuration:

• Cluster name: <user supplied application name>_cluster

• 2 HACMP sites: "siteA" "siteB"

• 2 HACMP nodes - one per site: use hostname for node name

• Single XD_data network

– IP-Alias enabled

– Includes all “inter-connected” network interfaces

• Persistent IP address for each node (optional for single interface networks)

• One Resource Group

– Inter-site Management Policy – “Prefer Primary Site”

– Includes all the GMVGs created by the wizard

– Application Server

– One or more Service IPs

Assists in the creation of a Synchronous GLVM cluster
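The items listed under "Generates" can be pictured as one small data structure. The sketch below is only an illustration of that list, not the wizard's actual implementation or output format: the function name, the dictionary keys, the mapping of the local node to "siteA", and the reuse of the application name as the application server name are all assumptions made for the example.

```python
def sketch_glvm_wizard_config(app_name, local_host, takeover_host,
                              gmvgs, service_ips):
    """Build a dictionary mirroring the items the slide lists under 'Generates'.

    Hypothetical helper for illustration; key names are not PowerHA terms
    unless they also appear on the slide.
    """
    if not service_ips:
        raise ValueError("the slide lists one or more Service IPs")
    return {
        # Cluster name: <user supplied application name>_cluster
        "cluster_name": f"{app_name}_cluster",
        # 2 sites with one node each; node names are the hostnames.
        # Assumption: the local node lands in siteA, the takeover node in siteB.
        "sites": {"siteA": [local_host], "siteB": [takeover_host]},
        # Single XD_data network with IP aliasing enabled.
        "networks": [{"type": "XD_data", "ip_alias": True}],
        # One resource group holding everything the wizard created.
        "resource_group": {
            "inter_site_policy": "Prefer Primary Site",
            "gmvgs": list(gmvgs),
            "application_server": app_name,
            "service_ips": list(service_ips),
        },
    }


if __name__ == "__main__":
    config = sketch_glvm_wizard_config(
        "payroll", "nodea", "nodeb", ["payroll_gmvg"], ["payroll_svc"]
    )
    for key, value in config.items():
        print(f"{key}: {value}")
```

Laying the result out this way makes it easy to compare what the wizard produced against the cluster you would have built by hand through the SMIT panels.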

Page 26: HA18 Herrera


Site Management Policies & Dependencies

The Enterprise Edition appends Inter-Site Management policies beyond the resource group node list

- Prefer Primary Site

- Online on Either Site

- Online on Both Sites

Standard Edition allows Site Definitions

- Cross Site LVM Configs

RG Dependencies:

- Online on Same Site

- will group RGs into a “set”

- rg_move would move “set” not an individual resource group

- SW will prevent removal of RG without removing dependency first

Page 27: HA18 Herrera


Failure Detection Rate & Disaster Recovery

IP Based Networks:

Serial Networks:

* PowerHA SystemMirror 7.1 has self tuning FDR with IP Multicasting

** There is no Enterprise Edition available for the 7.1 2010 release

Most customers using Local HA have these by default

XD Type Networks have slower Failure Detection rates

Page 28: HA18 Herrera


PowerHA SM Enterprise: Fallover Recovery Action

2 Policies available:

• AUTO (default) or MANUAL Fallover

Expected Behaviors:

• MANUAL only prevents a fallover based on the status of the replicated volumes at the time of node failure; therefore, if the replication consistency groups reflect a consistent state, a fallover will still take place

* Example shows the SVC menu, but the same option is there for all replication options

Page 29: HA18 Herrera


Manual Recovery - Special Instructions

• In a scenario where the MANUAL recovery action was selected and a fallover did not occur because the storage relationships were inconsistent, the resource groups will go into an ERROR state and special instructions will be printed to the hacmp.out file
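The AUTO and MANUAL behaviors described on the last two slides reduce to a small decision table. The sketch below restates it as code under the assumption that AUTO always proceeds with the fallover; it is a reading of the slides, not PowerHA's internal logic, and the names `RecoveryAction` and `fallover_outcome` are invented for the example.

```python
from enum import Enum


class RecoveryAction(Enum):
    AUTO = "AUTO"      # default policy
    MANUAL = "MANUAL"


def fallover_outcome(action: RecoveryAction, replication_consistent: bool) -> str:
    """Return what the slides say happens after a node failure.

    MANUAL blocks the fallover only when the replicated volumes are not in
    a consistent state at the time of the failure.
    """
    if action is RecoveryAction.MANUAL and not replication_consistent:
        # Resource groups go into ERROR and instructions land in hacmp.out.
        return "ERROR"
    return "FALLOVER"


if __name__ == "__main__":
    for action in RecoveryAction:
        for consistent in (True, False):
            outcome = fallover_outcome(action, consistent)
            print(f"{action.value:<6} consistent={consistent!s:<5} -> {outcome}")
```

The case that tends to surprise people is MANUAL with consistent replication: the fallover still happens, so MANUAL is not a general "never move automatically" switch.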

Page 30: HA18 Herrera


DLPAR & Disaster Recovery Processing Flow

[Diagram: Cluster 1 at the Primary Site (System A, System B) and System C at the Secondary Site, each with an HMC. The LPAR hosting Oracle DB grows from 1 CPU to 2 CPU through DLPAR (+ 1 CPU) after the cluster reads the application server requirements, and gives the CPU back (- 1 CPU) when the resources move away. Standby LPARs run with 1 CPU.]

LPAR Profile: Min 1, Desired 1, Max 2

Application Server: Min 1, Desired 2, Max 2

1. Activate LPARs

2. Start PowerHA

3. Release resources (Fallover or RG_move)

4. Site Fallover or movement of resources to Secondary Site

How many licenses do you need ?
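The CPU movements in this flow follow from the two sets of Min/Desired/Max values on the slide. The sketch below works through that arithmetic in a deliberately simplified form: it ignores free-pool availability, memory and every other check a real DLPAR operation involves, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuRange:
    minimum: int
    desired: int
    maximum: int


# Values from the slide.
LPAR_PROFILE = CpuRange(minimum=1, desired=1, maximum=2)
APP_SERVER = CpuRange(minimum=1, desired=2, maximum=2)


def cpus_to_add(current: int, lpar: CpuRange, app: CpuRange) -> int:
    """CPUs to add when the application is acquired on this LPAR.

    Simplification: aim for the application's desired value, capped by the
    LPAR profile maximum.
    """
    target = min(app.desired, lpar.maximum)
    return max(0, target - current)


def cpus_to_release(current: int, lpar: CpuRange) -> int:
    """CPUs to give back when the application leaves this LPAR.

    Simplification: return to the LPAR profile's desired value.
    """
    return max(0, current - lpar.desired)


if __name__ == "__main__":
    running = LPAR_PROFILE.desired                      # LPAR activated with 1 CPU
    added = cpus_to_add(running, LPAR_PROFILE, APP_SERVER)
    running += added
    print(f"acquire: +{added} CPU, now running with {running}")
    released = cpus_to_release(running, LPAR_PROFILE)
    running -= released
    print(f"release: -{released} CPU, now running with {running}")
```

With the slide's values, the LPAR that hosts the application runs with 2 CPUs and every standby LPAR runs with 1, which is the starting point for the licensing question the slide poses.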

Page 31: HA18 Herrera


Enterprise Edition Command Line Interface

Additional Commands available in Enterprise Edition:

DS Metro & Global Mirror Relationships

/usr/es/sbin/cluster/pprc/spprc/cmds/cllscss

/usr/es/sbin/cluster/pprc/spprc/cmds/cllsspprc

/usr/es/sbin/cluster/pprc/spprc/cmds/cllsdss

SAN Volume Controller Metro & Global Mirror Relationships

/usr/es/sbin/cluster/svcpprc/cllssvc

/usr/es/sbin/cluster/svcpprc/cllssvcpprc

/usr/es/sbin/cluster/svcpprc/cllsrelationship

GLVM Resources & Statistics

/usr/sbin/rpvstat

/usr/sbin/gmvgstat

EMC SRDF relationships

/usr/es/sbin/cluster/sr/cmds/cllssr

Hitachi True Copy relationships

/usr/es/sbin/cluster/tc/cmds/cllstc

• Knowing these will help identify & manage configuration

• Various usage examples in the new Enterprise Edition Redbook

Page 32: HA18 Herrera


Cluster Test Tool & Enterprise Edition

• Available Utility in the base level code

– Automated Test Tool

– Custom Test Plans

• Enterprise Edition appends additional tests that can be included in custom test plans

Page 33: HA18 Herrera


Enterprise Edition: Component Failures & Outcomes

• Failures may not always occur in an orderly fashion – ie. rolling disaster

• In an Ideal Scenario the entire site goes down

Traditional Failures:

• Server / LPAR Failure – standard cluster behavior

• Storage Subsystem Failure – (remember AUTO vs. MANUAL)

– Selective Fallover behavior on quorum loss will result in movement of RG

Most risky:

• Communication links between the sites fail

– Tested in Redbook by bringing down XD_IP network interfaces

– Results will vary based on the storage replication type

Results:

– Standby site will acquire and redirect relationship

– Lost write access to disks and commands hung → Might result in a system crash

* Note: Environments in same network segment could experience duplicate IP ERROR messages

Intermittent Failure (even worse):

- Links back up and then log GS_DOM_MER_ERR (halt of Standby Site)

- Entire cluster is now down since access to LUNs is N/A on primary site

Page 34: HA18 Herrera


Reference Diagram for Failure Scenario

* Note that there is only one network passing heartbeats between the sites

* Did not specify replication type but can probably assume that this was an SVC Metro Mirror configuration based on the name of the states

* Arrows should really point in other direction for the replication after the failure

Avoiding a partitioned cluster:

- More XD_IP networks

- Serial over Ethernet

- Diskhb networks over the SAN

Future Considerations:

- Quorum server
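The recommendations for avoiding a partitioned cluster amount to two questions about the heartbeat paths between the sites: is there more than one, and does at least one avoid TCP/IP? The sketch below turns those two questions into a check. It is a hypothetical helper, not a PowerHA verification rule; the `HeartbeatNetwork` type and the `uses_tcpip` flag are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HeartbeatNetwork:
    name: str
    kind: str         # e.g. "XD_IP", "XD_rs232", "diskhb"
    uses_tcpip: bool  # True when the path depends on the IP network


def partition_risk_warnings(networks: List[HeartbeatNetwork]) -> List[str]:
    """Return warnings about the inter-site heartbeat design."""
    warnings = []
    if len(networks) < 2:
        warnings.append("fewer than two networks pass heartbeats between the sites")
    if networks and all(net.uses_tcpip for net in networks):
        warnings.append("every heartbeat path depends on TCP/IP")
    return warnings


if __name__ == "__main__":
    failure_scenario = [HeartbeatNetwork("XD_IP_net_0", "XD_IP", True)]
    improved = failure_scenario + [
        HeartbeatNetwork("disk_hb_net_0", "diskhb", False),
    ]
    for label, design in (("single XD_IP", failure_scenario),
                          ("XD_IP + diskhb", improved)):
        print(label, "->", partition_risk_warnings(design) or "no warnings")
```

Note that serial over Ethernet would be flagged with `uses_tcpip=True` here, since it still rides on TCP/IP; it adds a path but not an independent transport.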

Page 35: HA18 Herrera


Recovery from Partitioned Cluster: Recommendations

• Things to check:

– State of the cluster nodes (connectivity, HMC, state of interfaces, error report)

– State of heartbeat communication paths (i.e. lssrc -ls topsvcs)

– Consistency of the replicated volumes (CLI will vary by replication type)

– Status of the data

• What do you do to recover ?

– Identify cause – ASAP

• Beware of intermittent failures

– Consider bringing down all nodes on one site (avoid a cluster initiated halt)

• Hard Reset might be the best approach as graceful stop might hang up attempting to release individual resources (i.e. unmount, varyoff with no access to volumes)

– Check consistency of the data

• Every application will be different

– Reintegrate Nodes into cluster accordingly

• Consider Verify & sync before reintegration

Page 36: HA18 Herrera


When to use each Replication Option

Major Factors:

• Distance between sites

– Campus DR or Extended Distance

– Infrastructure & available bandwidth

• What type of Storage currently being used ?

• Same storage type at both locations ?

• Requirement to use CLI for management of relationships

• SLA Requirements – is HA required after a site fallover?

• What is the “True” requirement for automated fallover

– Recovery Time Objective – RTO

– Recovery Point Objective – RPO

Extended Distance Offerings

• Introduction to PowerHA SystemMirror for AIX Enterprise Edition

HA20 (AIX) Thursday & Friday – Shawn Bodily

Page 37: HA18 Herrera


Enterprise Edition: General Recommendations

• Clustering, Replication and High Availability solutions are not a replacement for backups

– Mksysbs

– Flashcopies

– Snapshots

• Testing DR Solutions is the only way to guarantee they will work

– Testing should be performed at least once or twice a year

– Will help to identify any other components required outside of the cluster

• Recovery plan should be well documented & reside at both locations

• Leverage Cluster functions to ensure success

– CSPOC User functions guarantee that users are propagated to all cluster nodes

– User Password cluster management functions will ensure that changes are also updated on all cluster nodes

Page 38: HA18 Herrera


PowerHA SM Enterprise Edition: Value Proposition

• Difference from Standard Edition

– Automates IP or Disk based Replication Mechanism

• Stretch Clusters – Campus Style DR

– Distance based on how far you can extend shared storage

– Why pay more for Campus DR? - use Cross Site LVM

• Automated Fallover

– Manual Fallover option (based on state of disks)

– Enterprise Cluster will automatically trigger a fallover

• To disable alter start up scripts at DR location

• Ease of Management

– One time configuration – Location of RG will determine direction of replication

• Installation, Planning, Maintenance & Expected Behaviors

– Documented in new DR Redbook – SG24-7841

Page 39: HA18 Herrera


Questions?

Thank you for your time!

Page 40: HA18 Herrera


Additional Resources

• New - Disaster Recovery Redbook

SG24-7841 - Exploiting PowerHA SystemMirror Enterprise Edition for AIX

http://www.redbooks.ibm.com/abstracts/sg247841.html?Open

• New - RedGuide: High Availability and Disaster Recovery Planning: Next-Generation Solutions for Multi server IBM Power Systems Environments

http://www.redbooks.ibm.com/abstracts/redp4669.html?Open

• Online Documentation

http://www-03.ibm.com/systems/p/library/hacmp_docs.html

• PowerHA SystemMirror Marketing Page

http://www-03.ibm.com/systems/p/ha/

• PowerHA SystemMirror Wiki Page

http://www-941.ibm.com/collaboration/wiki/display/WikiPtype/High+Availability

• PowerHA SystemMirror (“HACMP”) Redbooks

http://www.redbooks.ibm.com/cgi-bin/searchsite.cgi?query=hacmp