
Designing a PowerHA SystemMirror Solution for AIX

Michael Herrera
Power Systems Strategic Initiatives Team
[email protected]
IBM – Coppell, TX
@Herrera_HADR

1

Agenda

• What are my options?

• How do I set it up? Requirements? Gotchas?

• What is new or different that might affect my configurations?

• Cluster Design

• Standard | Stretched | Linked clusters

• Split | Merge Features

• Heartbeat Communication Options

• Live Partition Mobility considerations

• PowerHA & critical volume groups

• Resiliency Enhancements

• Product Offering

• Common Topologies

• Licensing

2

Minimum AIX Requirements for PowerHA SystemMirror

PowerHA SystemMirror 7.2.0  (Announce: Oct 2015, GA: Dec 2015)
  • AIX 7.1 TL4 | AIX 7.2
  • AIX 7.1 TL3 – SP5
  • AIX 6.1 TL9 – SP5
  Standard Edition: 5765-H39   Enterprise Edition: 5765-H37

PowerHA SystemMirror 7.1.3  (GA: Dec 2013, SP3: March 2015, EOL: April 2017)
  • AIX 7.1 TL3 – SP1 with RSCT 3.1.5
  • AIX 6.1 TL9 – SP1 with RSCT 3.1.5
  Standard Edition: 5765-H39   Enterprise Edition: 5765-H37

PowerHA SystemMirror 7.1.2  (GA: Nov 2012, SP6: July 2015, EOL: April 2016)
  • AIX 7.1 TL2 – SP1 with RSCT 3.1.2.0
  • AIX 6.1 TL8 – SP1 with RSCT 3.1.2.0
  Standard Edition: 5765-H39   Enterprise Edition: 5765-H37

PowerHA SystemMirror 7.1.1  (GA: Dec 2011, SP9: May 2015, EOS: April 2015)
  • AIX 7.1 TL1 – SP3 with RSCT 3.1.2.0
  • AIX 6.1 TL7 – SP3 with RSCT 3.1.2.0
  Standard Edition: 5765-H39   Enterprise Edition: N/A

PowerHA SystemMirror 7.1.0  (GA: Sept 2010, SP9: May 2014, EOS: Sept 2014)
  • AIX 7.1 with RSCT 3.1.0.1
  • AIX 6.1 TL6 – SP1 with RSCT 3.1.0.1
  Standard Edition: 5765-H39   Enterprise Edition: N/A

PowerHA SystemMirror 6.1  (GA: Oct 2009, SP15: April 2015, EOS: April 2015)
  • AIX 7.1 with RSCT 3.1.0.0
  • AIX 6.1 TL2 with RSCT 2.5.4.0
  • AIX 5.3 TL9 with RSCT 2.4.12.0
  Standard Edition: 5765-H23   Enterprise Edition: 5765-H24

3

PowerHA SystemMirror for AIX Editions

Standard Edition

• Supports up to 16 nodes
• Supports Stretched or Linked clusters
• Provides local clustering functions
• Supports manual or Smart Assist based deployments
• Traditionally shares the same common storage enclosure
• Supports 2-site configurations:
  – No Copy Services integration
  – No IP replication integration
  – Supports Site Specific IPs
  – Can be used with SVC Stretched Clusters
  – Used with Cross Site LVM configurations
  – Supports Split | Merge policies when configured as a Linked Cluster

Enterprise Edition

• Supports up to 16 nodes
• Supports Stretched or Linked clusters
• Application Smart Assists also included for the local portion of the fallover configuration
• Provides local & extended cluster remote replication functions
• Can be configured to provide local clustering capabilities at the first site and automated fallover to the remote site
  – Automates storage-level Copy Services
  – Automates IP replication (GLVM)
  – Integrates with DS8800 Hyperswap
  – Supports up to 2 sites
  – Supports Split | Merge policies
  – Higher price per core

4

PowerHA SystemMirror Standard Edition & CAA file sets

• PowerHA SystemMirror packages (cluster.*):

cluster.license             electronic license file
cluster.es.server           base cluster filesets
cluster.adt.es              Clinfo and clstat samples, include files, and a web-based monitor
cluster.doc.en_US.es        PowerHA SystemMirror PDF documentation
cluster.es.client           cluster client binaries and libraries, plus web-based SMIT for PowerHA
cluster.es.cspoc            C-SPOC and dsh
cluster.es.migcheck         migration support
cluster.es.nfs              NFS server support
cluster.msg.en_US.es        U.S. English message catalog
cluster.man.en_US.es        man pages – U.S. English
cluster.doc.en_US.assist    Smart Assist PDF documentation
cluster.hativoli            PowerHA SystemMirror Tivoli server and client
cluster.es.assist           Smart Assist filesets
cluster.msg.en_US.assist    U.S. English Smart Assist messages
cluster.es.director.agent   PowerHA SystemMirror Director CAS agent
cluster.es.cfs              GPFS support
cluster.es.worksheets       Online Planning Worksheets

The core filesets above are part of a traditional build using the Standard Edition; the Smart Assist, Tivoli, Director, GPFS and worksheets filesets can be considered optional packages in the media.

• CAA packages (from base AIX):

bos.cluster.rte
bos.ahafs
bos.clvm.enh
devices.common.IBM.storfwork

These should be part of the base AIX build in AIX 6.1 TL6 and AIX V7.
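A quick way to confirm which of these filesets are actually present on a node – a minimal sketch using standard AIX commands, with the fileset names taken from the lists above:

# lslpp -l "cluster.*"
# lslpp -l bos.cluster.rte bos.ahafs bos.clvm.enh devices.common.IBM.storfwork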

5

Product Stable Point (Recommended Levels)

Reference URL:

https://aix.software.ibm.com/aix/ifixes/PHA_Migration/ha_install_mig_fixes.htm

AIX: CAA and RSCT related fix bundles (updated June 24, 2016) – download links are provided per service pack (SP1 through SP7, where applicable) for:

• AIX 6.1 TL09
• AIX 7.1 TL03
• AIX 7.1 TL04
• AIX 7.2

PowerHA fix bundles (updated July 6, 2016) – download links are provided for the GA level and service packs of:

• PowerHA 7.1.3
• PowerHA 7.2

The site provides emgr packages, including interim fixes beyond the fixes available for download in Fix Central.

6

Review of contents in the AIX 7.1 TL4 SP1 bundle

# more README_AIX_7141

The epkgs contained in this tarball are:

MIG3_7141.160607.epkg.Z (CAA)

rsctHA7B4.160610.epkg.Z (RSCT)

# emgr -d -e MIG3_7141.160607.epkg.Z -v 3

Displaying Configuration File "APARREF"
+------------------------------------------------------------------------+

25624|:|IV78064|:|UNDER RARE CIRCUMSTANCES CLSTRMGR MIGHT SEND SIGINT TO PID -1

25656|:|IV77352|:|HA:CAA DYN HOSTNAME CHANGE OPERATION MAY BREAK POWERHA MIGRATION

25414|:|IV75594|:|PowerHA may miss the manual merge notification from CAA/RSCT.

25494|:|IV76106|:|RG ONLINE AT BOTH NODES AFTER A RESOURCE FAILS TO BE ACQUIRED

26025|:|IV79497|:|SMCAACTRL IS BLOCKING NODE TIME

26602|:|IV83330|:|REDUCE COMMUNICATION_PATH CHANGES

26206|:|IV80748|:|HA: AUTOCLVERIFY DOESN'T WORK AFTER HA UPGRADE TO 713 SP4

26103|:|IV80053|:|SMCAACTRL MAY NOT ALLOW THE REPLACE REPOSITORY OPERATION

25368|:|IV75339|:|ALLOW NEW CAA TUNABLES TO BE SET VIA CLCTRL IN A POWERHA ENV.

24616|:|IV74077|:|HA SHUTDOWN -R CAUSES TAKEOVER STOP INSTEAD OF GRACEFUL STOP

26643|:|IV83599|:|POWERHA: CLMIXVER HANDLE=0 PREVENTS CLCOMD COMMUNICATION

26448|:|IV82534|:|POWERHA: CLVERIFY DOES NOT PREVENT DOUBLE MOUNT
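The command above only displays the fix contents (-d); a minimal sketch of actually installing one of the interim fixes from the bundle (same package name as above) and verifying it would be:

# emgr -e MIG3_7141.160607.epkg.Z
# emgr -l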

7

High Availability: Local Clustering

• Supported Topology Configurations:
  • Active | Standby
  • Active | Active (independent workloads)
  • Active | Active (concurrent)

[Diagram: two frames with dual VIOS each, SANCOMM, and shared V7000 / IBM FlashSystem storage; example pairs are LPAR A1/B1 (production / standby), LPAR A2/B2 (independent production workloads #1 and #2), and LPAR A3/B3 (concurrent workload), alongside non-clustered LPARs]

• Supported Shared Storage:
  – Local clusters share the same storage support as anything supported by AIX
  – Native & OEM multipath drivers

• Supported Resource Configurations:
  – Dedicated resources
  – Virtualized (NPIV, VSCSI, SSP)
  – Live Partition Mobility awareness
  – AIX 7.2 Live Update awareness

• Supported Features:
  – Resource Dependencies (not shown)
  – Application Monitoring
  – Custom Events
  – Integrated DLPAR | PEP Integration

8

Enterprise Edition Software Packages

Replication Type – File Sets to Install

ESS Direct Management PPRC:
  cluster.es.pprc.rte
  cluster.es.pprc.cmds
  cluster.msg.en_US.pprc

ESS DS6000/DS8000 Metro Mirror (DSCLI PPRC):
  cluster.es.spprc.cmds
  cluster.es.spprc.rte
  cluster.es.cgpprc.cmds
  cluster.es.cgpprc.rte
  cluster.msg.en_US.svcpprc

San Volume Controller (SVC) & Storwize family:
  cluster.es.svcpprc.cmds
  cluster.es.svcpprc.rte
  cluster.msg.en_US.svcpprc

XIV, DS8800 in-band and Hyperswap, DS8700/DS8800 Global Mirror:
  cluster.es.genxd.cmds
  cluster.es.genxd.rte
  cluster.msg.en_US.genxd

Geographic Logical Volume Mirroring (GLVM):
  cluster.doc.en_US.glvm.pdf
  cluster.msg.en_US.glvm
  cluster.xd.glvm
  glvm.rpv* (file sets in base AIX)

EMC SRDF:
  cluster.es.sr.cmds
  cluster.es.sr.rte
  cluster.msg.en_US.sr

Hitachi TrueCopy / Universal Replicator:
  cluster.es.tc.cmds
  cluster.es.tc.rte
  cluster.msg.en_US.tc

• Install the EE packages needed for the integration in addition to the base code
• The installation adds the new SMIT menus to the PowerHA SystemMirror screens
• The Enterprise media now includes the base code, the EE packages and the Smart Assist file sets

9

Difference when Enterprise Edition is Installed

• Filesets required for SVC integration – install the license fileset and the packages applicable to the replication type in addition to the base code:

Product                      Applicable File Sets
Enterprise Edition License   cluster.xd.license
San Volume Controller        cluster.es.svcpprc.cmds
                             cluster.es.svcpprc.rte
                             cluster.msg.en_US.svcpprc

• smitty sysmirror -> Cluster Applications & Resources -> Resources
  Entry point into the EE resource configuration
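A minimal sketch of installing those filesets from mounted media and verifying them (the /mnt/powerha mount point is an assumption, not from the original slides):

# installp -agXYd /mnt/powerha cluster.xd.license cluster.es.svcpprc.cmds cluster.es.svcpprc.rte cluster.msg.en_US.svcpprc
# lslpp -l cluster.xd.license "cluster.*svcpprc*"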

10

HA & DR: Automation of Site-to-Site Replication

[Diagram: a primary site with two frames (production workloads and standby LPARs, dual VIOS each) and a secondary site frame with standby LPARs; a V9000 IBM FlashSystem at each site replicating synchronously or asynchronously between the sites]

11

HA & DR: Automation of Site-to-Site Replication

[Diagram: the same two-site topology as the previous slide, with the replicated volumes organized into consistency groups (CG1 – DataVG1, CG2 – DataVG2) between the V9000 IBM FlashSystem units]

12

Local HA & Replication to a Remote Site

Standard Edition: a local cluster (Node A, Node B) at Site A with a manual fallover to Site B; data reaches the DR site via application-level replication (option 1) or storage-level replication (option 2). If the DR location is not part of the HA cluster, its LPARs do not need to be up and running and actively monitoring heartbeats.

Enterprise Edition: the remote nodes are within the cluster (Node A and Node B at Site A, Node C at Site B); data reaches the DR site via IP-based replication (option 1) or storage-level replication (option 2).

Version 7 updates:
• Tie Breaker Disks – iSCSI or NFS backed
• Split | Merge Policies – Majority, Manual

13

Different Storage Configuration Scenarios

Within Data Center A, a production LPAR and standby LPAR pair can share storage in several ways:
• Single storage subsystem (shared data volumes)
• Logical Volume Mirroring across two storage subsystems
• Copy Services replication (synchronous / asynchronous) between storage subsystems

14

Storage Stretch Cluster Configuration

Data Center A (production LPAR) and Data Center B (standby LPAR) share virtualized volume definitions backed by Storage Copy 1 and Storage Copy 2, with storage-level replication behind the scenes.

• The cluster sees the same PVID on both sides for the shared LUNs, so to the cluster this looks like a local shared-storage subsystem configuration.

Benefits:
– The storage subsystems maintain the data copies
– Simpler configuration on the client LPAR
– Facilitates VM mobility (Live Partition Mobility)

15

Hyperswap Capabilities with Spectrum Virtualize

• PowerHA supports use of an SVC Enhanced Stretched Cluster
  – A split I/O group with a single SVC node on each side, SVC volume mirrors, and a Metro Mirror relationship

• Storwize 7.5 code supports Hyperswap or Enhanced Stretched Cluster
  – Introduced in the June 2015 release
  – No longer requires use of a San Volume Controller with a split I/O group
  – The limitation today is that the 2 I/O groups are still within the same cluster

Look out for Storwize updates on transparent Hyperswap.

* Limitations with FlashCopy Manager & Global Mirror from volumes in a Hyperswap relationship

16

PowerHA SystemMirror Licensing Software Tiers

POWER8 Models     Software Tier        POWER7 Models     Software Tier
E880              Medium               Power 795         Large
E870              Medium               Power 780         Large
E850              Small                Power 770         Medium
S824              Small                PureFlex          Small
S822              Small                Power 750         Small
S814              Small                Entry Servers     Small
                                       Blades            Small

• Physical servers can be intermixed within a cluster configuration
• Cluster software is licensed by the number of active cores
• Cheaper per-core price at POWER8 for Enterprise-class servers

Key Updates:
– Shared Processor Pool Resize
– Power Enterprise Pool Integration
– Medium price per core on E870/E880

17

Environment: DLPAR Resource Processing Flow

[Diagram: two clusters spanning System A and System B, managed through an HMC. Each LPAR profile is Min 1 / Desired 1 / Max 5 CPUs and each application server is defined with Min 1 / Desired 5 / Max 5. Oracle DB and Banner DB each run with 5 CPUs while their standby LPARs hold 1 CPU; on fallover or RG_move PowerHA acquires +4 CPUs via DLPAR on the takeover node and releases -4 CPUs when the resources are released.]

Flow: 1. Activate the LPARs  2. Start PowerHA  3. Release resources on fallover or RG_move  4. Release resources when the cluster is stopped without takeover

Take Aways:
• CPU allocations follow the application server wherever it is being hosted (this model allows you to lower the HA license count)
• DLPAR resources only get processed during the acquisition or release of cluster resources
• PowerHA 6.1+ provides micro-partitioning support and the ability to also alter virtual processor counts
• DLPAR resources can come from free CPUs in the shared processor pool or from CoD resources

18

Cluster Design with Savings in Mind

• Standard Edition (local cluster scenario)

Option 1 – dedicated full-size standbys:
  System A: Oracle DB 5 CPU, Banner DB 5 CPU, PeopleSoft 5 CPU, Financial DB 5 CPU
  System B: four standby LPARs at 5 CPU each
  PowerHA SE licenses: System A 20 CPUs + System B 20 CPUs = 40 licenses
  Cost: Small – $104K, Med – $146K, Large – $180K

Option 2 – four clusters with minimal standbys:
  System A: Oracle DB 5 CPU, Banner DB 5 CPU, PeopleSoft 5 CPU, Financial DB 5 CPU
  System B: four standby LPARs at 0.25 CPU each (4 x 0.25 = 1 CPU, so System B needs only 1 license)
  PowerHA SE licenses: System A 20 licenses + System B 1 license = 21 licenses
  Cost: Small – $54.6K, Med – $76.6K, Large – $94.5K

19

Cluster Design with Savings in Mind

• Enterprise Edition (local HA & DR integration)

Option 1 – dedicated full-size standbys at the local and DR systems:
  System A: Oracle DB 5 CPU, Banner DB 5 CPU, PeopleSoft 5 CPU, Financial DB 5 CPU
  System B: four standby LPARs at 5 CPU each
  System C (DR): four standby LPARs at 5 CPU each
  PowerHA EE licenses: System A 20 CPUs + System B 20 CPUs + System C 20 CPUs = 60 licenses
  Cost: Small – $204K, Med – $315K, Large – $390K

Option 2 – minimal standbys at the local and DR systems:
  System A: Oracle DB 5 CPU, Banner DB 5 CPU, PeopleSoft 5 CPU, Financial DB 5 CPU
  System B: four standby LPARs at 0.25 CPU each
  System C (DR): four standby LPARs at 0.25 CPU each
  PowerHA EE licenses: System A 20 licenses + System B 1 license + System C 1 license = 22 licenses
  Cost: Small – $74.8K, Med – $115.5K, Large – $143K

20

PowerHA CoD and Enterprise Pool Support Summary

CoD Offering       Type          PowerHA 6.1   PowerHA 7.2.0
Permanent          CPU, Memory   Yes
On/Off             CPU           Yes           Yes
On/Off             Memory        No            Yes
Utility CoD        CPU, Memory   Utility CoD is performed automatically at the PHYP/system level; PowerHA cannot play a role in it
Trial CoD          CPU, Memory   Yes
Enterprise Pools   CPU, Memory   No *          Yes

* Current integrated support is up to HMC code 8.8.4

You do not have to answer Yes (to the On/Off CoD use agreement) if you anticipate using Enterprise Pool mobile cores.

21

How the ROHA Calculation Is Performed

Application Controller: App1 – processor & memory values:
  Optional amount of memory: 2 GB
  Optimal # processing units: 2.5
  Optimal # virtual processors: 5

LPAR profiles for mhha72node1 and mhha72node2:
              Processors   Virtual Processors   Memory
  Min         0.5          1                    2 GB
  Desired     0.5          2                    2 GB
  Max         3            6                    4 GB

Target for the LPAR hosting the workload (Min + Optimal):
  Memory:              2 + 2     = 4 GB
  Processing units:    0.5 + 2.5 = 3
  Virtual processors:  1 + 5     = 6

So the hosting LPAR runs with 3 processing units, 6 virtual processors and 4 GB of memory while the workload is active.

Resource acquisition order:
  • Pull from Trial CoD if available
  • Pull from EPCoD if available
  • Pull from On/Off CoD if the license agreement is accepted and resources are available
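Working through the slide's numbers: if the LPAR is activated at its Desired values, the delta that ROHA has to acquire (via DLPAR from the free pool, EPCoD or On/Off CoD) before App1 starts is:

  Processing units:    3 - 0.5   = 2.5
  Virtual processors:  6 - 2     = 4
  Memory:              4 GB - 2 GB = 2 GB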

22

PowerHA Shared Processor Pool (SPP) Resize

• If necessary, the SPP size can be dynamically increased; the user agrees to this change through a tunable in the PowerHA screens.

• Example: the customer pays for 7 CPUs of middleware licenses – 6 CPUs on the active frame and 1 CPU on the backup frame – and expects the SPP size on both nodes to be adjusted at takeover time (and the CoD CPUs then assigned to the LPAR).

Normal production:            Server A HA SPP = 6 processors,  Server B HA SPP = 1 processor
DR recovery / fallover:       Server A HA SPP = 1 processor,   Server B HA SPP = 6 processors

23

PowerHA SystemMirror V7 Deployment Methods

There are a number of different ways to achieve the same result:

• smitty sysmirror
  - Initial | Discovery
  - Custom Cluster configuration

• clmgr cluster copy – cluster cloning from a snapshot:

# clmgr manage snapshot restore <snapshot_name> \
      nodes=<host>,<host#2> \
      repositories=<disk>[,<backup>][:<disk>[,<backup>]] \
      [ cluster_name=<new_cluster_label> ] \
      [ configure=yes|no ] \
      [ force=no|yes ]
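For illustration only, a sketch of the same restore with hypothetical names (mysnap, nodeA/nodeB, hdisk2 and newcluster are placeholders, not values from the slides):

# clmgr manage snapshot restore mysnap \
      nodes=nodeA,nodeB \
      repositories=hdisk2 \
      cluster_name=newcluster \
      configure=yes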

• The snapshot must be manually copied onto the new nodes
• Service labels are not preserved
• The restore performs a new discovery but will not automatically synchronize the cluster

24

Expedited Deployment & Simplified Management

• Rapid deployment with the V7 command line interface (clmgr) – see the sketch below:
  1. clmgr add cluster <name> repository=<hdisk#> nodes=<node1>,<node2>
  2. clmgr add service_ip <label> network=<name>
  3. clmgr add application_controller <app_name> startscript="<path>" stopscript="<path>"
  4. clmgr add resource_group <rg_name> nodes=<node1>,<node2> startup=ohn fallback=nfb service_label=<name> volume_group=<vg_names> application=<app_name>
  5. clmgr sync cluster
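A minimal sketch of those five steps with hypothetical names (testclu, nodeA/nodeB, hdisk2, appsvc1, app1, rg1, datavg and the script paths are placeholders, not values from the slides):

# clmgr add cluster testclu repository=hdisk2 nodes=nodeA,nodeB
# clmgr add service_ip appsvc1 network=net_ether_01
# clmgr add application_controller app1 startscript="/usr/local/ha/start_app1" stopscript="/usr/local/ha/stop_app1"
# clmgr add resource_group rg1 nodes=nodeA,nodeB startup=ohn fallback=nfb service_label=appsvc1 volume_group=datavg application=app1
# clmgr sync cluster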

• Cluster Worksheets

• Application Smart Assists (# smitty clsa)
  - Creation of the cluster shell (cluster | RGs | resources)
  - Auto-provisioning of application start / stop logic
  - Auto-provisioning of application monitoring

25

PowerHA SystemMirror Cluster – Planning

[Diagram: Server A and Server B attached to a redundant LAN and redundant SAN, sharing a storage enclosure]

Network Topology
  - Reserve IPs | DNS names
    – Boot / Persistent / Service IPs
  - Network settings
    – Unicast vs. Multicast
    – IGMP snooping

Storage
  - Zoning | mapping requirements
  - Multipath driver requirements
  - ECM VG requirements
  - HBA requirements (SANCOMM)
  - Acquire shared LUNs
    – CAA Repository Disk
    – Shared data volumes

Cluster Configuration – on the cluster LPARs:
  - Install OS pre-reqs
  - Install PowerHA filesets
  - Configure the cluster
    – Topology
    – Resources
    – Monitoring

26

A Closer Look at Cluster Configuration

Resource Group attributes:
  - Startup, Fallover and Fallback policies
  - Participating nodes
  - HA resources: Application Controller, VG / file systems, Service IP, NFS exports / mounts, application monitor(s)
  - A dependent workload can be placed in a second resource group (imported VG definitions, NFS mounts, monitors) using the available RG dependencies

Startup Policy:
  - Online on Home Node Only
  - Online on First Available Node
  - Online Using Distribution Policy
  - Online on All Available Nodes

Fallover Policy:
  - Fallover to the Next Priority Node
  - Fallover Using Dynamic Node Priority
  - Bring Offline

Fallback Policy:
  - Never Fallback
  - Fallback to Higher Priority Node
  - Bring Offline

* Default values

27

New Resource Group Dependencies

Available RG dependencies:
  - Parent / Child
  - Location dependencies
  - Start After
  - Stop After

Dynamic Node Priority (used instead of a static fallover policy across the RG node list, e.g. A, B, C):
  - Processor utilization
  - Memory utilization
  - Disk I/O utilization
  - cl_lowest_nonzero_udscript_rc (DNP adaptive fallover based on a user-defined script)
  - cl_highest_udscript_rc (DNP adaptive fallover based on a user-defined script)

28

Application Monitoring within the Cluster

• Some monitors are provided in the Smart Assists
  e.g. cluster.es.assist.oracle -> /usr/es/sbin/cluster/sa/oracle/sbin/DBInstanceMonitor

• A monitor is bound to the Application Controller (example: OracleDB)
  - Startup Monitor: confirms the startup of the application; only invoked on application startup (new application startup mode in HA 7.1.1)
  - Process Monitor: checks the process table (e.g. 60-second interval)
  - Custom Monitor: invokes the custom logic (e.g. 60-second interval)
  - Long-running monitors continue to run locally with the running application

• Application monitoring within the cluster configuration is optional
• Monitoring can be configured to perform restarts | notify | fallover
• If the source LPAR remains ONLINE and only the application goes offline, the cluster will not attempt to relocate the workload(s) without monitoring

29


Application Startup Mode – New Option

• Application Controllers are started in the background by default
• A foreground start causes event processing to wait for completion of the application start script
• Poorly designed scripts may cause hangs (config_too_long)
• Return codes are usually not checked; with SP1, an EVENT ERROR is raised if RC=1

30

PowerHA: Looking under the Hood

[Diagram: Node A and Node B on net_ether_0 sharing a storage subsystem with a repository LUN and shared data volumes; RG1 (NodeA, NodeB) holds a Service IP, a Volume Group and Application 1. Heartbeating runs as CAA unicast communication (optional IP multicast) over all interfaces, over the repository disk (required), and over SANCOMM / HBA-based heartbeating (optional); application monitoring is optional.]

Highlights:
• CAA kernel-level monitoring
• Heartbeat over all interfaces
• Handle loss of rootvg
• Exploit JFS2 Mount Guard
• Disk fencing enhancements
• Quarantine features
• CAA VIO NIC failure detection
• Resilient repository disks
• Tie Breaker disks (NFS backed)
• Split | Merge policies

Ongoing Tasks:
• Nightly verification
• Application monitoring (optional)
• Event-based alerts (optional)
• AIX error report notification
• Live Partition Mobility awareness
• AIX Live Update awareness

31

Why the Cluster "Type" Matters

Standard Cluster
  Split: Not supported            Merge: Majority *

Stretched Cluster
  Split: No action                Merge: Majority
  Split: Tie Breaker Disk | NFS   Merge: Tie Breaker Disk | NFS

Linked Cluster
  Split: None                     Merge: Majority
  Split: Tie Breaker Disk | NFS   Merge: Tie Breaker Disk | NFS
  Split: Manual                   Merge: Manual

Manual policy: the operator must select which site continues | recovers
  # clmgr manage site respond [ continue | recover ]

• The Split | Merge options are only available when you define sites and configure a Stretched or Linked cluster
• The topology you choose matters if you want to take advantage of the user confirmation on fallover feature

32

Standard vs. Stretched Cluster Configuration

Standard Cluster – traditional shared disk cluster
  Split: Not supported   Merge: Not supported

Stretched Cluster
  Split: No action       Merge: Majority
  Split: Tie Breaker     Merge: Tie Breaker

  Site definitions add:
  - Site Specific IPs
  - Site-specific RG dependencies
  - Tie Breaker disk support

  Best suited for:
  - Cross Site LVM configurations
  - Different network segments
  - Distinguishing shared nodes across a metro area

Both configurations support the use of a single repository disk (a primary repository disk plus optional backups).

33

Standard, Stretched or Linked Clusters

• Standard / Stretched: multicast communication between cluster members (e.g. 228.x.x.x)
• Linked: multicasting between the local nodes at each site (e.g. 228.x.x.1 at one site, 228.x.x.2 at the other) & unicast communication between the sites

34

PowerHA SystemMirror: Stretched vs. Linked Cluster Configurations

Stretched Cluster topology
  - Single CAA repository disk shared by both sites
  - Network topology spans the sites (common IP and SAN networks)
  - Resource group(s) containing IPs, VGs and an application controller
  - Application monitor(s) (optional)

Linked Cluster topology
  - One CAA repository disk per site (multiple repository disks)
  - Site definitions with site-specific IP addresses
  - Separate IP and SAN networks per site, with disk replication between the CAA repository / source data volumes at Site A and the target data volumes at Site B
  - Automated start / stop of replication: storage Copy Services integration, IP replication integration (GLVM)

35

Using a Stretched or a Linked Cluster

[Diagram: LPAR A and LPAR B each see a CAA repository and DataVG on their own storage subsystem (#1 and #2), each with backup repository disks]

When you have multiple storage subsystems, where does the repository disk come from? And how many backup repository disks should you define?

36

PowerHA V7.2: Backup Repository Disks

• The minimum size requirement for a PowerHA/CAA repository disk is 512 MB
• The CAA repository disk and its assigned backups can be viewed from AIX – consider renaming the hdisk numbers with the rendev command
• Different PowerHA commands can be used to view the currently "active" and "backup" repository disks
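A minimal sketch of the rename idea mentioned above, assuming the repository LUN currently shows up as hdisk3 (the disk and target names are placeholders; in practice the rename is typically done while the device is not in active use):

# lspv | grep caavg_private
# rendev -l hdisk3 -n caa_repos0
# lspv | grep caa_repos0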

37

Scenario: Small Servers with Only Internal Disks

Solution details:
• Uses cluster site definitions in a linked cluster topology (maximum of 2 LPARs in the configuration) – one CAA repository per site
• Enterprise Edition automates synchronous or asynchronous IP replication (Geographic Logical Volume Mirroring, GLVM) between the machines
• Circumvents the shared CAA repository disk requirement
• Exploits AIX mirror pools and the HA Split | Merge policies
• Each site sees its own local internal disks (Copy 1 / Copy 2) and the Remote Physical Volumes (RPVs)

Recommended:
• Multiple IP links
• Tie Breaker disk

38


Temporarily Removing CAA out of the equation

• Stopping cluster services does not close the CAA private volume group

root@mhoracle1 /> lspv | grep private

hdisk9 00f626d13aa3645a caavg_private active

root@mhoracle1 /> lsvg -l caavg_private

caavg_private:

LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT

caalv_private1 boot 1 1 1 closed/syncd N/A

caalv_private2 boot 1 1 1 closed/syncd N/A

caalv_private3 boot 4 4 1 open/syncd N/A

powerha_crlv boot 1 1 1 closed/syncd N/A

- New option in version 7.1.3 SP1 to stop CAA along with cluster services:

root@mhoracle1 /> clmgr stop cluster STOP_CAA=yes

root@mhoracle1 /> clmgr start cluster START_CAA=yes

Use of the CAA stop/start option is typically not required.

39

Transition of PowerHA Topology IP Networks

Traditional HA network with heartbeat rings (6.1 & below): each node uses en0/en1 base addresses on separate non-routable subnets (192.168.100.x and 192.168.101.x), persistent IPs 9.19.51.10/11, and service IPs 9.19.51.20/21 on the routable VLAN.

Configuration using link aggregation (EtherChannel, or virtualized environments with dual VIOs): a single en2 per node with base addresses 9.19.51.10/11 hosting the service IPs 9.19.51.20/21.

Alternate configuration (aggregation not shown): en2 as above, plus an en3 back-to-back network (192.19.51.x) over a cross-over cable, which provides additional resiliency and bypasses the network switches.

40

PowerHA SystemMirror Version 7.X

[Diagram: PowerHA Node 1 (Frame 1) and PowerHA Node 2 (Frame 2), each an AIX client LPAR with a single en0 (base addresses 9.19.51.10 and 9.19.51.11, service IP 9.19.51.20), bridged by dual Virtual I/O Servers – physical adapters aggregated (ent3 LA) into a Shared Ethernet Adapter (ent4) with a control channel (ent5) – out to the WAN]

• Only IPAT via Aliasing is supported
• Update the netmon.cf file with IPs outside the server

41

Virtual Ethernet & PowerHA SystemMirror – Independent Frames & Link Aggregation

[Diagram: PowerHA LPAR 1 on Frame 1 and PowerHA LPAR 2 on Frame 2, each with a single virtual adapter (en0) served by dual Virtual I/O Servers; each VIOS aggregates two physical ports (ent3 LA) under a Shared Ethernet Adapter (ent4) with an SEA control channel (ent5), uplinked to separate Ethernet switches]

42

Virtual Ethernet NIB & PowerHA SystemMirror

[Diagram: Frame 1 with two LPARs, each with two virtual adapters (ent0/ent1) forming a NIB EtherChannel (ent2); the virtual adapters map to two virtual switches (vswitch0 / vswitch1) on VLAN 1, each bridged by a different VIOS Shared Ethernet Adapter to its own Ethernet switch]

This is an alternative configuration using virtual switches in order to have adapters active on each of the VIO servers – it provides load balancing between the VIOs.

43

Subnet Requirements: Following the Rules

Single-interface layout (net_ether_01 and net_ether_02 on separate adapters):
  en0 base addresses 9.19.51.10 / 9.19.51.11 hosting service IP 9.19.51.20 (net_ether_01)
  en1 base addresses 10.19.51.10 / 10.19.51.11 hosting service IP 10.19.51.20 (net_ether_02)

Multi-interface layout (net_ether_01 with two interfaces per node):
  en0 base addresses 192.168.51.10 / 192.168.51.11 and en1 base addresses 192.168.52.10 / 192.168.52.11 on non-routable base subnets
  Service IPs 9.19.51.20 and 9.19.51.21 are aliased onto the interfaces; the base subnets must differ from each other and from the service subnet.

44

Simplified Topology in 7.1 Cluster

Sample Cluster Topology Output:

root@mhoracle1 /> cllsif

Adapter Type Network Net Type Attribute Node IP Address Interface Name Netmask

mhoracle1 boot net_ether_01 ether public mhoracle1 10.19.51.211 en0 255.255.255.0

sharesvc1 service net_ether_01 ether public mhoracle1 10.19.51.239 255.255.255.0

mhoracle2 boot net_ether_01 ether public mhoracle2 10.19.51.212 en0 255.255.255.0

sharesvc1 service net_ether_01 ether public mhoracle2 10.19.51.239 255.255.255.0

Status of the Interfaces

root@mhoracle1 /> lscluster -i

Network/Storage Interface Query

Cluster Name: sapdemo71_cluster

Cluster uuid: 3bd04654-3dfd-11e0-9641-46a6ba546403

Number of nodes reporting = 2

Number of nodes expected = 2

Node mhoracle1.dfw.ibm.com

Node uuid = bff1af28-3550-11e0-be44-46a6ba546403

Number of interfaces discovered = 4

Interface number 1 en0

Interface state UP

Interface number 2 en1

Interface state UP

Interface number 3 sfwcom

Interface state UP

Interface number 4 dpcom

Interface state UP

In the lscluster output above, en0/en1 provide IP heartbeating, sfwcom is the HBA-based heartbeating interface (optional – note that this feature is not supported on 16 Gb HBAs), and dpcom is the repository disk heartbeat path.

45

Let's Talk about Speeds & Tuning

• CAA uses built-in values, set to detect that the other side is unreachable within 5 seconds (these values cannot be changed).

• Tunable heartbeat setting: smitty sysmirror -> Custom Cluster Configuration -> Cluster Nodes & Networks -> Manage Cluster -> Cluster Heartbeat Settings
  - Value of 0: quick failure detection process (the behavior prior to AIX 7.1 TL4; in AIX 7.2 it will instead wait for the period of the failure detection time)
  - Value of 5-590 s: if a value is specified, CAA uses the full wait-time process; the default in AIX 7.2 should be 20
  - Applies with HA 7.2 on AIX 7.1 TL3 and HA 7.2 on AIX 7.2
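The heartbeat setting can also be read and changed from the command line; a minimal sketch, reusing the clmgr attribute name shown on the LPM slide later in this deck (the 600-second value is just an illustration):

# clmgr query cluster | grep HEARTBEAT_FREQUENCY
# clmgr -f modify cluster HEARTBEAT_FREQUENCY="600"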

46

Configure the netmon.cf File

PowerHA V7.1
• RSCT based
• Up to 30 lines per interface
• Sequence runs about every 4 seconds
• Up to 5 lines processed in parallel (if defined)
• The netmon.cf file is checked every few seconds for content changes
• To be able to define a specific latency for the network-down detection, the fix for IV74943 is required – open a PMR and request the "Tunable FDT IFIX bundle"

Example V7.1 netmon.cf (entries repeated to extend the latency, up to the 30-line limit):
!REQD en0 192.168.60.1
!REQD en0 192.168.60.10
!REQD en0 192.168.60.1
!REQD en0 192.168.60.10
# ... repeated entries for longer latency ...

PowerHA 7.2
• CAA based
• Up to 5 lines per interface
• Only used if CAA heartbeating detects an outage
• No need to repeat entries to extend network-down detection, as it would only add 0.5 s at most

Example 7.2 netmon.cf (usual configuration, but consider lines for the various interfaces in your environment):
!REQD en0 192.168.60.1
!REQD en0 192.168.60.10

47

Virtual Ethernet Devices & the "poll_uplink" Setting

[Diagram: a PowerHA client LPAR (a.b.c.2) with two virtual adapters, each bridged by a VIOS Shared Ethernet Adapter to the physical uplink and the gateway, shown once with poll_uplink=no and once with poll_uplink=yes]

• With poll_uplink=no: when the physical link is down, the virtual link in the LPAR still reports up
• With poll_uplink=yes: when the physical link is down, the virtual link is reported down as well

48

Using poll_uplink

• Requirements: VIOS 2.2.3.4 or later & AIX 7.1 TL3 (SP3 for the entstat output)

• Needs to be set on the LPAR – enable poll_uplink on the virtual entX interfaces:
  # chdev -l entX -a poll_uplink=yes -P

• Possible settings:
  poll_uplink (yes, no – the default is no)
  poll_uplink_int (100 ms – 5000 ms)

• To display the settings in use:
  # lsdev -Cc adapter | grep ent
  ent0 Available Virtual I/O Ethernet Adapter (l-lan)
  ent1 Available Virtual I/O Ethernet Adapter (l-lan)

  # lsattr -El ent0 | grep "poll_up"
  poll_uplink      no    Enable Uplink Polling            True
  poll_uplink_int  1000  Time interval for Uplink Polling True

49

Details on "poll_uplink"

The entstat -d output for the virtual adapter shows the difference:

poll_uplink=no:
# entstat -d ent0
ETHERNET STATISTICS (en0) :
Device Type: Virtual I/O Ethernet Adapter (l-lan)
...
Driver Flags: Up Broadcast Running
        Simplex 64BitSupport ChecksumOffload
        DataRateSet VIOENT
...
LAN State: Operational

poll_uplink=yes, physical link up:
# entstat -d ent0
...
Driver Flags: Up Broadcast Running
        Simplex 64BitSupport ChecksumOffload
        DataRateSet VIOENT VIRTUAL_PORT
        PHYS_LINK_UP
...
LAN State: Operational
Bridge Status: Up

poll_uplink=yes, physical link down:
# entstat -d ent0
...
Driver Flags: Up Broadcast Running
        Simplex 64BitSupport ChecksumOffload
        DataRateSet VIOENT VIRTUAL_PORT
...
LAN State: Operational
Bridge Status: Unknown

50

SANCOMM: Evaluate the Use of This Feature

[Diagram: cluster node WWPNs (vFC adapters through VIO servers 1-4) and storage controller WWPNs zoned on the SAN. A heartbeat zone containing the VIOS physical FC adapter WWPNs is required in addition to the individual node-to-storage zones; the FC adapters used for SANCOMM must be TME enabled.]

Must also enable on the adapters:
• dyntrk=yes
• fc_err_recov=fast_fail

51

Network Requirements for SANCOMM

[Diagram: Node 1 and Node 2 (9.19.50.10 / 9.19.50.20) on Frames 1 and 2, each behind dual VIOS with NPIV HBAs (tme=yes); a virtual Ethernet adapter on VLAN 3358 is defined in every VIOS and client LPAR. Frame 3's VIOS are shown without the SANCOMM settings.]

• The virtual adapter on VLAN 3358, on both the VIOS and the client LPARs, serves as a bridge that allows the SANCOMM traffic to reach the physical Fibre Channel adapter
• Whether traffic continues ultimately depends on whether the target VIO servers already have the required settings enabled and available

To temporarily disable SANCOMM traffic:
• Edit /etc/cluster/ifrestrict and add the sfwcomm interface
• Run the clusterconf command
• Enable the settings (TME, zoning, virtual adapter)
• Remove the edits & re-run clusterconf
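For reference, a minimal sketch of the VIOS-side adapter settings named above (run as padmin on each VIOS; fcs0/fscsi0 are placeholders for your actual adapter instances, and the changes typically take effect after the adapters are reconfigured or the VIOS is rebooted):

$ chdev -dev fcs0 -attr tme=yes -perm
$ chdev -dev fscsi0 -attr dyntrk=yes fc_err_recov=fast_fail -perm
$ lsdev -dev fcs0 -attr | grep tme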

Live Partition Mobility

52

LPM Recommendations: V7.1.3 & Earlier

Pre-LPM manual steps:
  • (Optional) UNMANAGE PowerHA resources
  • Disable SANCOMM if applicable
  • clmgr query cluster | grep HEARTBEAT_FREQUENCY
  • clmgr -f modify cluster HEARTBEAT_FREQUENCY="600"     (temporarily set the heartbeat frequency to its maximum value)
  • /usr/sbin/rsct/bin/hags_disable_client_kill -s cthags
  • /usr/sbin/rsct/bin/dms/stopdms -s cthags

Initiate LPM

Post-LPM manual steps:
  • /usr/sbin/rsct/bin/dms/startdms -s cthags
  • /usr/sbin/rsct/bin/hags_enable_client_kill -s cthags
  • clmgr -f modify cluster HEARTBEAT_FREQUENCY="XX"      (restore the previous value)
  • Re-enable SANCOMM if applicable
  • (Optional) Re-MANAGE PowerHA resources

PowerHA V7.2 does all of these things automatically & provides a tunable to UNMANAGE the resources automatically if desired.

IBM Knowledge Center reference:
http://www.ibm.com/support/knowledgecenter/SSPHQG_7.1.0/com.ibm.powerha.admngd/ha_admin_live_partition.htm?lang=en

53

Rootvg Failure Handling

• Loss of rootvg-related disks is problematic for the operating system:
  - AIX in most cases continues to operate from memory
  - AIX will crash if a reference is made to critical areas such as paging space (although this rarely happens on modern systems because of large memory sizes)
  - Most user-space programs cannot make progress since they need access to rootvg

What's new (AIX 6.1 TL9 SP5 / AIX 7.1 TL3 SP5, GA June 2015): mkvg and chvg provide a -r option to create or modify the critical VG attribute:
  • mkvg -r y|n
  • chvg -r y|n

* Manually enable this for HA 7.1.3; HA 7.2 will check & automatically set it.
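A minimal sketch of enabling the attribute manually on an HA 7.1.3 node, using the chvg option shown above (the lsvg verification is an assumption about the output fields, which can vary by AIX level):

# chvg -r y rootvg
# lsvg rootvg | grep -i critical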

54

Quarantine Policies – Active Node Halt Policy

Expected behavior:
  - In the event of a resource freeze, do not allow the critical RG to come online on the standby node unless the source LPAR is truly gone or fenced out
  - Heartbeating would have to cease across all heartbeat links (IP, repository & SANCOMM)

Three available options:
  1) HMC-based halt (Active Node Halt)
  2) Disk-reserve-based fence-out (SCSI-3)
  3) HMC & disk reserve combined

The resource group holding the IP, VG / file systems and application workload is marked as the Critical RG.

55

Configuring the Node Quarantine Feature

The quarantine policy can be enabled via the SMIT panels or the CLI:

# clmgr modify cluster \
      [ QUARANTINE_POLICY=<node_halt | fencing | halt_with_fencing> ] \
      [ CRITICAL_RG=<rg_value> ]
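For illustration, enabling the SCSI-3 fencing policy for a hypothetical critical resource group rg1, following the syntax shown above:

# clmgr modify cluster QUARANTINE_POLICY=fencing CRITICAL_RG=rg1
# clmgr query cluster | grep -E "QUARANTINE|CRITICAL"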

56

PowerHA & WPAR: Integration in the Global Environment

[Diagram: Node A and Node B each run cluster services in the AIX global environment; resource groups WPAR_rg1, WPAR_rg2 and WPAR_rg3 (nodes: Node A, Node B) each contain a Service IP, an App Server and a WPAR whose name must match the resource group name]

PowerHA controls:
  • Start / stop / movement of the WPAR

Monitoring:
  • Application custom monitoring – the monitor runs inside the WPAR

Supported environments:
  • AIX 5.2 & 5.3 versioned WPARs
  • SAN dedicated disks

Limitations:
  • Maximum of 64 RGs

The WPAR IP addresses and disks are managed by the LPAR's global environment WPAR Manager (WPM).

57

PowerVM: Simplified Remote Restart

What is it?
  • A method to restart LPARs elsewhere if an entire server fails
  • Available on POWER8 servers with PowerVM Enterprise Edition

Differences from LPM:
  • The VIO servers are not available
  • The HMC code level dictates the level of functionality
  • The user must "manually" invoke the remote restart commands from the HMC for each SRR-capable LPAR
  • A clean-up command must be run on the source

[Diagram: SRR-capable LPARs RR-AIX1/2/3 on Frames A-C, each with dual VIOS, managed by HMC 1]

58

SRR Availability vs. Clustering – Getting the Picture

Remote Restart configuration:
  • PowerVM | HMC management
  • Only one OS instance
  • The entire frame needs to fail
  • SRR is not automated
  • Limited number of concurrent restarts
  • FSM needs to be online (until HMC 8.8.5)

[Diagram: LPARs A1-A4 on Frame A with OS and data on external storage; the HMCs can restart LPAR A1 on Frame B]

Remote Restart is not an LPAR / VM level HA solution – a restart operation in this scenario (where only the LPAR, not the whole frame, has failed) would fail.

59

PowerVM SRR & a Critical LPAR Workload Failure

[Diagram: a manual attempt to move LPAR RR-AIX1 from Frame A to Frame B while Frame A is still running]

Syntax invoked:
hscroot@vHMC:~> rrstartlpar -o restart -m S822 -t S814 -p RR-AIX3
HSCLA9CE The managed system is not in a valid state to support partition remote restart operations

What are your recovery procedures for a single failed critical workload?
  • LPAR re-create / swing the data LUNs
  • mksysb restore
  • Clustered standby target
  • Attempt an LPAR restart
  • Troubleshoot & recover
  • Inactive Partition Mobility

60

SRR Availability vs. Clustering – Getting the Picture

Remote Restart configuration:
  • PowerVM | HMC management
  • Only one OS instance
  • The entire frame needs to fail
  • SRR is not automated
  • Limited number of concurrent restarts
  • FSM needs to be online (until HMC 8.8.5)

Cluster configuration:
  • PowerVM (optional)
  • HMC (optional)
  • Typically SAN-backed storage
  • Cluster software cost
  • Learning curve | management
  • Multiple OS instances

[Diagram: on the left, Remote Restart (LPARs A1-A4 on Frame A, restartable on Frame B); on the right, HA Node A1 on Frame A and HA Node B1 on Frame B with IP heartbeat links, a shared CAA repository and shared data disks, plus non-clustered LPARs]

Remote Restart is not an LPAR / VM level HA solution – a restart operation in this scenario would fail.

61

Summary

License the appropriate Edition for your needs
  - Standard Edition – local clustering
  - Enterprise Edition – integration & automation of IP or storage-level replication

DLPAR integration enables clustering with cost savings in mind
  - ROHA – Power Enterprise Pool integration
  - SPP resize on fallover

V7 clusters bring in a number of new design considerations
  - Unicast vs. Multicast communication protocol
  - Temporary & permanent hostname changes are now accepted by CAA
  - Evaluate the differences between Standard, Stretched & Linked clusters
  - Review the new FDT values in CAA & the tuning options
  - netmon.cf usage
  - Exploit the critical rootvg feature with HA V7.1.3
  - Evaluate the new Quarantine features in HA V7.2

62

Questions?

Thank you for your time!

63

Useful References

• New V7.2 Redbook: SG24-8278

www.redbooks.ibm.com

• New PowerHA LinkedIN Group

https://www.linkedin.com/groups/8413388

• IBM DeveloperWorks PowerHA Forum

https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000001611

• Recommended Product Stable Points

https://aix.software.ibm.com/aix/ifixes/PHA_Migration/ha_install_mig_fixes.htm

• Product V7.2 Pubs

http://www.ibm.com/support/knowledgecenter/SSPHQG_7.2.0/com.ibm.powerha.navigation/welcome_ha_72.htm

64

PowerHA SystemMirror for AIX Feature Evolution

PowerHA SystemMirror 6.1 (2009) *
  - DSCLI Metro Mirror VIOS
  - Packaging & pricing changes
  - p6/p7 CoD DLPAR support
  - EMC SRDF integration
  - GLVM config wizard
  - Full IPv6 support

PowerHA SystemMirror 7.1.0 (2010)
  - Cluster Aware AIX
  - IBM Director integration
  - Hitachi TrueCopy & HUR async integration
  - DS8700 Global Mirror integration
  - Drop Topology Services for the multicast protocol
  - Storage monitoring
  - HADR storage framework

PowerHA SystemMirror 7.1.1 (2011)
  - CAA repository resilience
  - JFS2 Mount Guard support
  - SAP Hot Standby solution
  - Federated Security
  - SAP & MQ Smart Assists
  - XIV replication integration
  - Director plug-in updates

PowerHA SystemMirror 7.1.2 (2012)
  - Enterprise Edition for V7
  - Stretched & Linked clusters
  - Tie Breaker disks
  - Hyperswap with DS8800
  - Full IPv6 support
  - Backup repository disks
  - Director DR plug-in updates

PowerHA SystemMirror 7.1.3 (2013)
  - Unicast heartbeating available
  - Active / Active Hyperswap
  - Single-node Hyperswap
  - Cluster Simulator
  - Manual fallover policy
  - Dynamic hostname change
  - Smart Assist updates

PowerHA SystemMirror 7.2.0 (2015)
  - Resource Optimized High Availability
  - Quarantine node policies
  - Live Update support
  - LPM enhancements
  - Automatic repository swap
  - NFS-backed Tie Breaker
  - Detailed verification checks

* Based on the older RSCT architecture; EOS April 2015

65

PowerHA SystemMirror V7.2.0 – New Feature Summary

• Non-Disruptive Upgrade Support (PowerHA code)
  - Ability to upgrade HA to 7.2 from 7.1.3, or to load 7.2 follow-on fixes, without requiring a rolling upgrade or an interruption of service

• AIX Live Update Support & LPM Support Enhancements
  - Handshaking with the API framework
  - New cluster tunables & cluster behavior

• Automatic Repository Disk Replacement
  - Define multiple repository disks & automatic replacement behavior with AIX 7.2

• Cluster Detailed Verification Checks
  - (Optional) validation of a number of new checks, including AIX Expert settings

• Quarantine Policies (Critical RG)
  - HMC node halt policy
  - SCSI-3 node fence policy

• NFS-Backed Tie Breaker Disk Support
  - New flexibility that avoids the need for a SAN-backed device when using the Tie Breaker disk function

• ROHA (Resource Optimized High Availability)
  - Enterprise Pool integration
  - Manipulate shared processor pool sizes
  - Deactivate low-priority partitions
  - New HMC integration tunables