Designing a PowerHA
SystemMirror Solution for AIX
Michael Herrera
Power Systems Strategic Initiatives Team
IBM – Coppell, TX
@Herrera_HADR
1
Agenda
• What are my options?
• How do I set it up? Requirements? Gotchas?
• What is new or different that might affect my configurations?
• Cluster Design
• Standard | Stretched | Linked clusters
• Split | Merge Features
• Heartbeat Communication Options
• Live Partition Mobility considerations
• PowerHA & critical volume groups
• Resiliency Enhancements
• Product Offering
• Common Topologies
• Licensing
2
Minimum AIX Requirements for PowerHA SystemMirror

PowerHA SystemMirror 7.2.0 (Announce: Oct 2015 | GA: Dec 2015)
Standard Edition 5765-H39 | Enterprise Edition 5765-H37
• AIX 7.1 TL4 | AIX 7.2
• AIX 7.1 TL3 – SP5
• AIX 6.1 TL9 – SP5

PowerHA SystemMirror 7.1.3 (GA: Dec 2013 | SP3: March 2015 | EOL: April 2017)
Standard Edition 5765-H39 | Enterprise Edition 5765-H37
• AIX 7.1 TL3 – SP1 with RSCT 3.1.5
• AIX 6.1 TL9 – SP1 with RSCT 3.1.5

PowerHA SystemMirror 7.1.2 (GA: Nov 2012 | SP6: July 2015 | EOL: April 2016)
Standard Edition 5765-H39 | Enterprise Edition 5765-H37
• AIX 7.1 TL2 – SP1 with RSCT 3.1.2.0
• AIX 6.1 TL8 – SP1 with RSCT 3.1.2.0

PowerHA SystemMirror 7.1.1 (GA: Dec 2011 | SP9: May 2015 | EOS: April 2015)
Standard Edition 5765-H39 | Enterprise Edition N/A
• AIX 7.1 TL1 – SP3 with RSCT 3.1.2.0
• AIX 6.1 TL7 – SP3 with RSCT 3.1.2.0

PowerHA SystemMirror 7.1.0 (GA: Sept 2010 | SP9: May 2014 | EOS: Sept 2014)
Standard Edition 5765-H39 | Enterprise Edition N/A
• AIX 7.1 with RSCT 3.1.0.1
• AIX 6.1 TL6 – SP1 with RSCT 3.1.0.1

PowerHA SystemMirror 6.1 (GA: Oct 2009 | SP15: April 2015 | EOS: April 2015)
Standard Edition 5765-H23 | Enterprise Edition 5765-H24
• AIX 7.1 with RSCT 3.1.0.0
• AIX 6.1 TL2 with RSCT 2.5.4.0
• AIX 5.3 TL9 with RSCT 2.4.12.0
3
PowerHA SystemMirror for AIX Editions

Standard Edition
• Supports up to 16 nodes
• Supports Stretched or Linked clusters
• Provides local clustering functions
• Supports manual or Smart Assist based deployments
• Traditionally shares the same common storage enclosure
• Supports 2-site configurations:
  – No Copy Services integration
  – No IP replication integration
  – Supports Site Specific IPs
  – Can be used with SVC Stretched Clusters
  – Used with Cross Site LVM configurations
  – Supports Split | Merge policies when configured as a Linked Cluster

Enterprise Edition
• Supports up to 16 nodes
• Supports Stretched or Linked clusters
• Application Smart Assists also included for the local portion of the fallover configuration
• Provides local & extended-cluster remote replication functions
• Can be configured to provide local clustering capabilities at the first site and automated fallover to the remote site:
  – Automates storage-level Copy Services
  – Automates IP replication (GLVM)
  – Integrates with DS8800 HyperSwap
  – Supports up to 2 sites
  – Supports Split | Merge policies
  – Higher price per core
4
PowerHA SystemMirror Standard Edition & CAA file sets

• PowerHA packages (part of a traditional build using the Standard Edition; consider the optional packages on the media as needed):
cluster.license            electronic license file
cluster.es.server          base cluster filesets
cluster.adt.es             Clinfo and Clstat samples, include files and a web-based monitor
cluster.doc.en_US.es       PowerHA SystemMirror PDF documentation
cluster.es.client          cluster client binaries and libraries, plus web-based SMIT for PowerHA
cluster.es.cspoc           C-SPOC and dsh
cluster.es.migcheck        migration support
cluster.es.nfs             NFS server support
cluster.msg.en_US.es       U.S. English message catalog
cluster.man.en_US.es       man pages – U.S. English
cluster.doc.en_US.assist   Smart Assist PDF documentation
cluster.hativoli           PowerHA SystemMirror Tivoli server and client
cluster.es.assist          Smart Assist filesets
cluster.msg.en_US.assist   U.S. English Smart Assist messages
cluster.es.director.agent  PowerHA SystemMirror Director CAS agent
cluster.es.cfs             GPFS support
cluster.es.worksheets      Online Planning Worksheets

• CAA packages (these should be part of the base AIX build in AIX 6.1 TL6 and AIX V7):
bos.cluster.rte
bos.ahafs
bos.clvm.enh
devices.common.IBM.storfwork
5
Product Stable Point (Recommended Levels)

Reference URL:
https://aix.software.ibm.com/aix/ifixes/PHA_Migration/ha_install_mig_fixes.htm

AIX: CAA and RSCT related fix bundles (updated June 24, 2016):
              SP1   SP2   SP5   SP6   SP7
AIX 6.1 TL09  Link  Link  Link  Link
AIX 7.1 TL03  Link  Link  Link
AIX 7.1 TL04  Link  Link
AIX 7.2       Link  Link

PowerHA fix bundles (updated July 6, 2016):
               GA    SP1   SP4
PowerHA 7.1.3  Link
PowerHA 7.2    Link  Link

Note: the site provides emgr packages including interim fixes beyond the fixes available for download in Fix Central.
6
Review of contents in the AIX 7.1 TL4 SP1 bundle

# more README_AIX_7141
The epkgs contained in this tarball are:
MIG3_7141.160607.epkg.Z (CAA)
rsctHA7B4.160610.epkg.Z (RSCT)

# emgr -d -e MIG3_7141.160607.epkg.Z -v 3
Displaying Configuration File "APARREF"
+------------------------------------------------------------------------+
25624|:|IV78064|:|UNDER RARE CIRCUMSTANCES CLSTRMGR MIGHT SEND SIGINT TO PID -1
25656|:|IV77352|:|HA:CAA DYN HOSTNAME CHANGE OPERATION MAY BREAK POWERHA MIGRATION
25414|:|IV75594|:|PowerHA may miss the manual merge notification from CAA/RSCT.
25494|:|IV76106|:|RG ONLINE AT BOTH NODES AFTER A RESOURCE FAILS TO BE ACQUIRED
26025|:|IV79497|:|SMCAACTRL IS BLOCKING NODE TIME
26602|:|IV83330|:|REDUCE COMMUNICATION_PATH CHANGES
26206|:|IV80748|:|HA: AUTOCLVERIFY DOESN'T WORK AFTER HA UPGRADE TO 713 SP4
26103|:|IV80053|:|SMCAACTRL MAY NOT ALLOW THE REPLACE REPOSITORY OPERATION
25368|:|IV75339|:|ALLOW NEW CAA TUNABLES TO BE SET VIA CLCTRL IN A POWERHA ENV.
24616|:|IV74077|:|HA SHUTDOWN -R CAUSES TAKEOVER STOP INSTEAD OF GRACEFUL STOP
26643|:|IV83599|:|POWERHA: CLMIXVER HANDLE=0 PREVENTS CLCOMD COMMUNICATION
26448|:|IV82534|:|POWERHA: CLVERIFY DOES NOT PREVENT DOUBLE MOUNT
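To apply an epkg from the bundle, standard emgr usage can be followed (a sketch; preview first, then install):

# emgr -p -e MIG3_7141.160607.epkg.Z   (preview the interim fix installation)
# emgr -e MIG3_7141.160607.epkg.Z      (install the interim fix)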
7
High Availability: Local Clustering
• Supported Topology Configurations:
• Active | Standby
• Active | Active (Independent Workloads)
• Active | Active (Concurrent)
[Diagram: three two-node cluster pairs on shared V7000 / IBM FlashSystem storage, each frame with dual VIOS and non-clustered LPARs alongside – Active | Standby (LPAR A1/B1, production workload + standby), Active | Active with independent workloads (LPAR A2/B2, production workloads #1 and #2), and Active | Active concurrent access (LPAR A3/B3), with SANCOMM between the frames]
• Supported Shared Storage:
– Local clusters share the same storage
support as anything supported by AIX
– Native & OEM Multipath Drivers
• Supported Resource Configurations:
– Dedicated resources
– Virtualized (NPIV, VSCSI, SSP)
– Live Partition Mobility awareness
– AIX 7.2 Live Update awareness
• Supported Features:
– Resource Dependencies (not shown)
– Application Monitoring
– Custom Events
– Integrated DLPAR | PEP Integration
8
Enterprise Edition Software Packages

Replication Type → File sets to install:

• ESS Direct Management PPRC:
  cluster.es.pprc.rte
  cluster.es.pprc.cmds
  cluster.msg.en_US.pprc

• ESS/DS6000/DS8000 Metro Mirror (DSCLI PPRC):
  cluster.es.spprc.cmds
  cluster.es.spprc.rte
  cluster.es.cgpprc.cmds
  cluster.es.cgpprc.rte
  cluster.msg.en_US.cgpprc

• San Volume Controller (SVC) & Storwize family:
  cluster.es.svcpprc.cmds
  cluster.es.svcpprc.rte
  cluster.msg.en_US.svcpprc

• XIV, DS8800 in-band and HyperSwap, DS8700/DS8800 Global Mirror:
  cluster.es.genxd.cmds
  cluster.es.genxd.rte
  cluster.msg.en_US.genxd

• Geographic Logical Volume Mirroring (GLVM):
  cluster.doc.en_US.glvm.pdf
  cluster.msg.en_US.glvm
  cluster.xd.glvm
  glvm.rpv* (file sets in base AIX)

• EMC SRDF:
  cluster.es.sr.cmds
  cluster.es.sr.rte
  cluster.msg.en_US.sr

• Hitachi TrueCopy / Universal Replicator:
  cluster.es.tc.cmds
  cluster.es.tc.rte
  cluster.msg.en_US.tc

Notes:
• Install the EE packages needed for the integration in addition to the base code
• The installation adds the new SMIT menus to the PowerHA SystemMirror screens
• The Enterprise media now includes the base code, the EE packages and the Smart Assist file sets
9
Difference when Enterprise Edition is Installed

• Filesets required for SVC integration – install the license fileset and the packages applicable to the replication type in addition to the base code (an install sketch follows):

Product                     Applicable File Sets
Enterprise Edition License  cluster.xd.license
San Volume Controller       cluster.es.svcpprc.cmds
                            cluster.es.svcpprc.rte
                            cluster.msg.en_US.svcpprc

• smitty sysmirror → Cluster Applications & Resources → Resources
  – the entry point into the EE resource configuration
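An install sketch, assuming the filesets above are staged in a hypothetical /tmp/powerha_ee directory (adjust the directory or device to your media):

# installp -acgXd /tmp/powerha_ee cluster.xd.license \
    cluster.es.svcpprc.rte cluster.es.svcpprc.cmds cluster.msg.en_US.svcpprc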
10
HA & DR: Automation of Site-to-Site Replication
[Diagram: Primary Site with two frames (production workloads on LPARs A1/A2/B1/B2 with standby partners, dual VIOS per frame, non-clustered LPARs alongside) and a Secondary Site frame with standby LPARs C1/C2; a V9000 IBM FlashSystem at each site, with synchronous or asynchronous replication between the sites]
11
HA & DR: Automation of Site-to-Site Replication

[Diagram: the same two-site layout, with the replicated data volumes organized into consistency groups – CG1 – DataVG1 and CG2 – DataVG2 – on the V9000 IBM FlashSystem at each site]
12
Local HA & Replication to a Remote Site

Standard Edition (DR outside the cluster):
• Node A and Node B form a local cluster at Site A
• Manual fallover to Site B, fed by application-level replication (opt 1) or storage-level replication (opt 2)
• If the DR location is not part of the HA cluster, the LPARs there do not need to be up and running and actively monitoring heartbeats

Enterprise Edition (remote nodes within the cluster):
• Nodes A and B at Site A plus remote Node C at Site B, all in one cluster
• IP-based replication (opt 1) or storage-level replication (opt 2)

Version 7 updates:
• Tie Breaker disks – iSCSI or NFS backed
• Split | Merge policies – Majority, Manual
13
Different Storage Configuration Scenarios

[Diagram: three Data Center A layouts, each pairing a Prod LPAR with a Standby LPAR]
• Single storage subsystem (shared data volumes)
• Logical volume mirroring across two subsystems
• Copy Services replication (sync / asynchronous)
14
Storage Stretch Cluster Configuration

[Diagram: a Prod LPAR in Data Center A and a Standby LPAR in Data Center B attached to shared virtualized volume definitions, with Storage Copy 1 and Storage Copy 2 maintained by storage-level replication behind the scenes]

• The cluster sees the same PVID on both sides for the shared LUNs – to the cluster this looks like a local shared storage subsystem configuration

Benefits:
• The storage subsystems maintain the data copies
• Simpler configuration on the client LPAR
• Facilitates VM mobility (Live Partition Mobility)
15
Hyperswap Capabilities with Spectrum Virtualize

• PowerHA supports use of an SVC Enhanced Stretched Cluster

[Diagram: a Metro Mirror relationship between single SVC nodes in a split I/O group, with SVC volume mirrors at each site]

• Storwize 7.5 code supports HyperSwap or Enhanced Stretched Cluster:
  – Introduced in the June 2015 release
  – No longer requires a San Volume Controller with a split I/O group
  – The limitation today is that the 2 I/O groups are still within the same cluster

Look out for Storwize updates on transparent HyperSwap.

* Limitations with FlashCopy Manager & Global Mirror from volumes in a HyperSwap relationship
16
PowerHA SystemMirror Licensing Software Tiers

POWER8 Models   Software Tier
E880            Medium
E870            Medium
E850            Small
S824            Small
S822            Small
S814            Small

POWER7 Models   Software Tier
Power 795       Large
Power 780       Large
Power 770       Medium
PureFlex        Small
Power 750       Small
Entry Servers   Small
Blades          Small

* Cluster software is licensed by the number of active cores; physical servers can be intermixed within a cluster configuration.

The per-core price is cheaper at POWER8 for enterprise-class servers.

Key updates:
• Shared Processor Pool resize
• Power Enterprise Pool integration
• Medium price per core on E870/E880
17
Environment: DLPAR Resource Processing Flow

[Diagram: two clusters spanning System A and System B, managed through the HMC. Each LPAR profile: Min 1, Desired 1, Max 5; each application server: Min 1, Desired 5, Max 5. The Oracle DB and Banner DB grow from 1 CPU to 5 CPUs (+4 CPU via DLPAR) wherever they are hosted, while the standby LPARs stay at 1 CPU, and shrink back (-4 CPU) on release]

Processing sequence:
1. Activate the LPARs
2. Start PowerHA – DLPAR resources are acquired along with the cluster resources
3. Fallover or RG_move – resources are released on the source and acquired on the target
4. Stop the cluster without takeover – resources are released

Take aways:
• CPU allocations follow the application server wherever it is being hosted (this model allows you to lower the HA license count)
• DLPAR resources will only get processed during the acquisition or release of cluster resources
• PowerHA 6.1+ provides micro-partitioning support and the ability to also alter virtual processor counts
• DLPAR resources can come from free CPUs in the shared processor pool or from CoD resources
18
Cluster Design with Savings in mind

• Standard Edition (local cluster scenario)

Option 1 – full-size standbys (Clusters 1-4 across System A & System B):
System A: Oracle DB 5 CPU | Banner DB 5 CPU | PeopleSoft 5 CPU | Financial DB 5 CPU
System B: four standby LPARs at 5 CPU each
PowerHA SE licenses: System A: 20 CPUs, System B: 20 CPUs – Total: 40 licenses
Cost: Small – $104K | Med – $146K | Large – $180K

Option 2 – minimized standbys (same four clusters):
System A: Oracle DB 5 CPU | Banner DB 5 CPU | PeopleSoft 5 CPU | Financial DB 5 CPU
System B: four standby LPARs at .25 CPU each
PowerHA SE licenses: System A: 20 licenses, System B: 1 license – Total: 21 licenses
Cost: Small – $54.6K | Med – $76.6K | Large – $94.5K
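The arithmetic behind the counts: licenses follow active cores, so System B needs 4 x 5 = 20 licenses when the standbys are full sized, but only 4 x .25 = 1 license when they idle at .25 CPU and are grown by DLPAR at fallover time.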
19
Cluster Design with Savings in mind

• Enterprise Edition (Local HA & DR Integration)

Option 1 – full-size standbys:
System A: Oracle DB 5 CPU | Banner DB 5 CPU | PeopleSoft 5 CPU | Financial DB 5 CPU
System B: four standby LPARs at 5 CPU each
System C (DR): four standby LPARs at 5 CPU each
PowerHA EE licenses: System A: 20 CPUs, System B: 20 CPUs, System C: 20 CPUs – Total: 60 licenses
Cost: Small – $204K | Med – $315K | Large – $390K

Option 2 – minimized standbys:
System A: Oracle DB 5 CPU | Banner DB 5 CPU | PeopleSoft 5 CPU | Financial DB 5 CPU
System B: four standby LPARs at .25 CPU each
System C (DR): four standby LPARs at .25 CPU each
PowerHA EE licenses: System A: 20 licenses, System B: 1 license, System C: 1 license – Total: 22 licenses
Cost: Small – $74.8K | Med – $115.5K | Large – $143K
20
PowerHA CoD and Enterprise Pool Support Summary

CoD Offering      Type          PowerHA 6.1   PowerHA 7.2.0
Permanent         CPU, Memory   Yes           Yes
On/Off            CPU           Yes           Yes
On/Off            Memory        No            Yes
Utility CoD       CPU, Memory   Utility CoD is performed automatically at the PHYP/system level; PowerHA cannot play a role in it
Trial CoD         CPU, Memory   Yes           Yes
Enterprise Pools  CPU, Memory   No *          Yes

* Current integrated support is up to HMC code 8.8.4

Note: you do not have to answer Yes if you anticipate using Enterprise Pool mobile cores.
21
How the ROHA calculation is performed

Application Controller: App1
Processors & memory values:
• Optimal amount of memory (GB): 2 GB
• Optimal # processing units: 2.5
• Optimal # virtual processors: 5

Min + Optimal = (2 + 2)     4 GB of memory
Min + Optimal = (.5 + 2.5)  3 processing units
Min + Optimal = (1 + 5)     6 virtual processors

Acquisition order:
• Pull from Trial CoD if available
• Pull from EPCoD if available
• Pull from On/Off CoD if the license is accepted & it is available

LPAR: mhha72node1             LPAR: mhha72node2
         Proc  VP  Memory              Proc  VP  Memory
Min      .5    1   2 GB       Min      .5    1   2 GB
Desired  .5    2   2 GB       Desired  .5    2   2 GB
Max      3     6   4 GB       Max      3     6   4 GB

LPAR: < LPAR Hosting Workload >
         Proc  VP  Memory
Active   3     6   4 GB
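In other words, on acquisition the hosting LPAR is grown from its Desired values (.5 processing units, 2 virtual processors, 2 GB) to the Min + Optimal totals (3 processing units, 6 virtual processors, 4 GB), capped by the profile Maximums, with any shortfall pulled from Trial CoD, EPCoD and On/Off CoD in that order.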
22
PowerHA Shared Processor Pool (SPP) Resize

• If necessary, the SPP size can be dynamically increased; the user agrees to this change through a tunable in the PowerHA screens.
• Example: the customer pays for 7 CPUs of middleware licenses, having 6 CPUs on the active frame and 1 CPU on the backup frame. The customer expects the SPP size to be adjusted on both nodes, active and backup, at takeover time (and the CoD CPUs to then be assigned to this LPAR).

[Diagram: Normal production – HA SPP sized at 6 processors on Server A and 1 processor on Server B; DR recovery fallover situation – 1 processor on Server A and 6 processors on Server B]
23
PowerHA SystemMirror V7 Deployment Methods

There are a number of different ways to achieve the same result:
• smitty sysmirror
  – Initial | Discovery
  – Custom cluster configuration
• clmgr cluster copy → cluster cloning from a snapshot:

# clmgr manage snapshot restore <snapshot_name> \
    nodes=<host>,<host#2> \
    repositories=<disk>[,<backup>][:<disk>[,<backup>]] \
    [ cluster_name=<new_cluster_label> ] \
    [ configure=yes|no ] \
    [ force=no|yes ]

Notes:
• The snapshot must be manually copied onto the new nodes
• Service labels are not preserved
• The restore will perform a new discovery but will not automatically synchronize the cluster
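A concrete invocation sketch using hypothetical names (snapshot nightly_snap, nodes ha_node1/ha_node2, repository hdisk2), following the syntax above:

# clmgr manage snapshot restore nightly_snap \
    nodes=ha_node1,ha_node2 \
    repositories=hdisk2 \
    configure=yes

Since the restore does not synchronize automatically, follow it with a manual verify & sync.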
24
Expedited Deployment & Simplified Management

• V7 Command Line Interface (clmgr) – a cluster in five commands (a filled-in sketch follows):
1. clmgr add cluster <name> repository=<hdisk#> nodes=<node1>,<node2>
2. clmgr add service_ip <label> network=<name>
3. clmgr add application_controller <app_name> startscript="<path>" stopscript="<path>"
4. clmgr add resource_group <rg_name> nodes=<node1>,<node2> startup=ohn fallback=nfb service_label=<name> volume_group=<vg_names> application=<app_name>
5. clmgr sync cluster
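The same five steps with hypothetical values (cluster demo_cl, nodes ha_node1/ha_node2, repository hdisk2, application app1 with scripts under /usr/local/ha):

# clmgr add cluster demo_cl repository=hdisk2 nodes=ha_node1,ha_node2
# clmgr add service_ip appsvc1 network=net_ether_01
# clmgr add application_controller app1 startscript="/usr/local/ha/start_app1" \
    stopscript="/usr/local/ha/stop_app1"
# clmgr add resource_group rg1 nodes=ha_node1,ha_node2 startup=ohn fallback=nfb \
    service_label=appsvc1 volume_group=datavg application=app1
# clmgr sync cluster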
• Rapid Deployment – Cluster Worksheets
• Application Smart Assists (# smitty clsa)
  – Creation of the cluster shell (Cluster | RGs | Resources)
  – Auto provisioning of application start / stop logic
  – Auto provisioning of application monitoring
25
PowerHA SystemMirror Cluster - Planning

[Diagram: Server A and Server B connected through a redundant LAN and redundant SAN to a shared storage enclosure]

Network Topology
• Reserve IPs | DNS names
  – Boot / Persistent / Service IPs
• Network settings
  – Unicast vs. Multicast
  – IGMP_snooping

Storage
• Zoning | mapping requirements
• Multipath driver requirements
• ECM VG requirements
• HBA requirements (SANCOMM)
• Acquire shared LUNs
  – CAA Repository Disk
  – Shared data volumes

Cluster Configuration
On the cluster LPARs:
• Install OS pre-reqs
• Install PowerHA filesets
• Configure the cluster
  – Topology
  – Resources
  – Monitoring
26
A Closer Look at Cluster Configuration

Resource Group attributes:
• Startup, Fallover, Fallback policies
• Participating nodes
• HA resources:
  – Application Controller (with monitor/s)
  – VG / file systems
  – Service IP
  – NFS exports / mounts
  – Imported VG definitions

Startup Policy:
• Online on Home Node Only *
• Online on First Available Node
• Online Using Distribution Policy
• Online on All Available Nodes

Fallover Policy:
• Fallover to the Next Priority Node *
• Fallover Using Dynamic Node Priority
• Bring Offline

Fallback Policy:
• Never Fallback
• Fallback to Higher Priority Node *
• Bring Offline

* Default values

RG dependencies are available between resource groups (e.g., a dependent workload with its own VG / file systems and monitor/s) – see the next chart.
27
New Resource Group Dependencies

Available RG Dependencies:
• Parent / Child
• Location dependencies
• Start After
• Stop After

Fallover selection options for a resource group (node list: A, B, C):
• Static fallover policy – ordered node list
• Dynamic Node Priority:
  – Processor utilization
  – Memory utilization
  – Disk I/O utilization
• DNP adaptive fallover:
  – cl_lowest_nonzero_udscript_rc
  – cl_highest_udscript_rc
28
Application Monitoring within Cluster

• Application monitoring within the cluster configuration is optional
• Some monitors are provided in the Smart Assists
  – i.e., cluster.es.assist.oracle → /usr/es/sbin/cluster/sa/oracle/sbin/DBInstanceMonitor
• A monitor is bound to the Application Controller – example for an Oracle DB:
  – Startup monitor: confirms the startup of the application; only invoked on application startup (new application startup mode in HA 7.1.1)
  – Process monitor: checks the process table (e.g., on a 60-second interval)
  – Custom monitor: invokes the custom logic (e.g., on a 60-second interval)
• Long-running monitors will continue to run locally with the running application
• Monitoring can be configured to perform restarts | notify | fallover
• If the source LPAR remains ONLINE and only the application goes offline, then without monitoring the cluster will not attempt to relocate the workload/s
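A minimal custom-monitor sketch (a hypothetical script; PowerHA only cares about the exit code – 0 means healthy, non-zero triggers the configured restart | notify | fallover action):

#!/bin/ksh
# /usr/local/ha/monitor_app1 - hypothetical custom monitor
# Exit 0 if the application process is present, non-zero otherwise.
if ps -ef | grep -v grep | grep -q "app1_server"; then
    exit 0   # process found - application considered healthy
else
    exit 1   # process missing - let PowerHA react
fi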
29
Application Startup Mode – New Option

• Application Controllers are started in the background by default
• A foreground start causes event processing to wait for completion of the application start script
• Poorly designed scripts may cause hangs (config_too_long)
• Return codes are usually not checked; with SP1, a foreground start will cause an EVENT ERROR if RC=1
30
PowerHA: Looking under the Hood

[Diagram: NODE A and NODE B on net_ether_0 with CAA unicast communication (optional IP multicast), optional HBA-based heartbeating over SANCOMM, and required repository heartbeating through the repository LUN on the shared storage subsystem; RG1 (NodeA, NodeB) carries the Service IP, Volume Group and Application 1, with optional application monitoring]

Highlights:
• CAA kernel-level monitoring
• Heartbeat over all interfaces
• Handle loss of rootvg
• Exploit JFS2 Mountguard
• Disk fencing enhancements
• Quarantine features
• CAA VIO NIC failure detection
• Resilient repository disks
• Tie Breaker disks (NFS backed)
• Split | Merge policies

Ongoing tasks:
• Nightly verification
• Application monitoring (optional)
• Event based alerts (optional)
• AIX Error Report notification
• Live Partition Mobility awareness
• AIX Live Update awareness
31
Why the Cluster "Type" matters

Standard Cluster
• Split: Not supported | Merge: Majority *

Stretched Cluster
• Split: No action | Merge: Majority
• Split: Tie Breaker Disk | NFS – Merge: Tie Breaker Disk | NFS

Linked Cluster
• Split: None | Merge: Majority
• Split: Tie Breaker Disk | NFS – Merge: Tie Breaker Disk | NFS
• Split: Manual | Merge: Manual

Manual: the operator must select which site continues | recovers:
# clmgr manage site respond [ continue | recover ]

Notes:
• The Split | Merge options are only available when you define sites and configure a Stretched or Linked cluster
• The topology you choose matters if you want to take advantage of the user confirmation on fallover feature
32
Standard vs. Stretched Cluster Configuration

Standard Cluster
• Split: Not supported | Merge: Not supported
• Traditional shared-disk cluster

Stretched Cluster
• Split: No action | Merge: Majority
• Split: Tie Breaker | Merge: Tie Breaker
• Site definitions:
  – Site Specific IPs
  – Site Specific RG dependencies
  – Tie Breaker disk support
• Best suited for:
  – Cross Site LVM configurations
  – Different network segments
  – Distinguishing shared nodes across a Metro area

Both configurations support the use of a single (primary) repository disk, with optional backup/s.
33
Standard, Stretched or Linked Clusters

• Standard or Stretched: multicast communication between cluster members (e.g., 228.x.x.x)
• Linked: multicasting between local nodes & unicast communication between the sites (e.g., 228.x.x.1 at one site and 228.x.x.2 at the other)
34
Stretched vs. Linked Cluster Configurations

Stretched Cluster topology:
• Single CAA repository disk (shared by both sites)
• Network topology spans the sites (IP & SAN networks)
• Resource group/s – IPs, VGs, Application Controller
• Application monitor/s (optional)

Linked Cluster topology:
• One CAA repository disk per site (multiple repository disks)
• Site definitions & site specific IP addresses
• Automated start / stop of replication:
  – Storage Copy Services integration
  – IP replication integration (GLVM)
• Disk replication from the source data volumes at Site A to the target data volumes at Site B

[Diagram: resource groups (IP, VGs, application) at Site A and Site B over the IP and SAN networks, with a CAA repository and data volumes per site]
35
Using a Stretched or a Linked Cluster

[Diagram: LPAR A with its CAA repository and DataVG on Storage Subsystem #1, LPAR B with its CAA repository and DataVG on Storage Subsystem #2, each with backup repository disks]

• When you have multiple storage subsystems, where does the repository disk come from?
• How many backup repository disks should you define?
36
PowerHA V7.2: Backup Repository Disks

• The minimum size requirement for a PowerHA/CAA repository disk is 512 MB
• View the CAA repository disk and its assigned backups from AIX – consider renaming the hdisk #s with the rendev command
• Different PowerHA commands show the currently "active" and "backup" repository disks (examples below)
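A few ways to inspect them (a sketch; exact output varies by release):

# clmgr query repository      (PowerHA view of the active and backup repository disks)
# lspv | grep caavg_private   (AIX view of the active repository disk)
# lscluster -d                (CAA view of the cluster disks)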
37
Scenario: Small Server with only internal disks

Solution details:
• Uses cluster site definitions in a Linked Cluster topology (maximum of 2 LPARs in the configuration)
• Enterprise Edition will automate sync or async IP replication between the machines via Geographic Logical Volume Mirroring (GLVM)
• Circumvents the shared CAA repository disk requirement – each site has its own CAA repository
• Exploits AIX Mirror Pools and the HA Split | Merge policies

[Diagram: two scale-out boxes, Site A – Primary and Site B – Secondary, each with internal disks (Copy 1 / Copy 2) and a local CAA repository, linked by GLVM IP replication; each site sees its own local disks plus the Remote Physical Volumes (RPVs)]

Recommended:
• Multiple IP links
• Tie Breaker disk
38
Temporarily Removing CAA out of the equation

• Stopping cluster services does not close the CAA private volume group:

root@mhoracle1 /> lspv | grep private
hdisk9  00f626d13aa3645a  caavg_private  active

root@mhoracle1 /> lsvg -l caavg_private
caavg_private:
LV NAME         TYPE  LPs  PPs  PVs  LV STATE      MOUNT POINT
caalv_private1  boot  1    1    1    closed/syncd  N/A
caalv_private2  boot  1    1    1    closed/syncd  N/A
caalv_private3  boot  4    4    1    open/syncd    N/A
powerha_crlv    boot  1    1    1    closed/syncd  N/A

• New option in version 7.1.3 SP1 to stop CAA along with cluster services:

root@mhoracle1 /> clmgr stop cluster STOP_CAA=yes
root@mhoracle1 /> clmgr start cluster START_CAA=yes

• Use of the CAA option is typically not required
39
Transition of PowerHA Topology IP Networks

[Diagram: three topology styles]
• Traditional HA network (heartbeat rings in 6.1 & below): two adapters per node, base addresses on separate non-routable subnets (192.168.100.x / 192.168.101.x), persistent IPs 9.19.51.10/11 and service IPs 9.19.51.20/21 on the VLAN
• Alternate configuration (aggregation not shown): a single en2 per node carrying the base address (9.19.51.10/11) and service IPs (9.19.51.20/21), plus a cross-over cable between en3 interfaces (192.19.51.x) that provides additional resiliency and bypasses the network switches
• Configuration using link aggregation: EtherChannel, or virtualized environments with dual VIOs, with one aggregated en2 per node carrying the base and service IPs
40
PowerHA SystemMirror Version 7.X

[Diagram: PowerHA Node 1 (Frame 1) and Node 2 (Frame 2), each an AIX client LPAR with en0 (base addresses 9.19.51.10/11, service IP 9.19.51.20) on a virtual adapter; each frame's dual VIO servers bridge a link aggregation (ent3 LA over two physical ports) through an SEA (ent4) with an SEA control channel (ent5), connected across a WAN]

• Only IPAT via Aliasing is supported
• Update the netmon.cf file with IPs outside the server
41
Virtual Ethernet & PowerHA SystemMirror

[Diagram: independent frames & link aggregation – PowerHA LPAR 1 (Frame 1) and PowerHA LPAR 2 (Frame 2), each with en0 on a virtual adapter (ent0); on every VIOS two physical ports (ent0/ent1) are aggregated (ent3 LA) and bridged through an SEA (ent4) with an SEA control channel (ent5) to its own Ethernet switch]
42
Virtual Ethernet NIB & PowerHA SystemMirror

[Diagram: Frame 1 with LPAR 1 and LPAR 2, each running a NIB (Network Interface Backup) EtherChannel (ent2) over two virtual adapters – ent0 on vswitch 0 / VLAN 1 and ent1 on vswitch 1 / VLAN 1; VIOS1 bridges vswitch 0 and VIOS2 bridges vswitch 1, each through its SEA (ent4) to a separate Ethernet switch]

• This is an alternative configuration that uses virtual switches so that adapters can be active on each of the VIO servers, providing load balancing between the VIOs
43
Subnet Requirements: Following the Rules

[Diagram: two compliant layouts for PowerHA Node 1 / Node 2 across two frames]
• Two networks: net_ether_01 with en0 base addresses 9.19.51.10/11 and service IP 9.19.51.20, plus net_ether_02 with en1 base addresses 10.19.51.10/11 and service IP 10.19.51.20 – each network keeps its own subnets
• One network (net_ether_01) with non-routable base addresses: en0 base 192.168.51.10/11 and en1 base 192.168.52.10/11 on each node, carrying the routable service IPs 9.19.51.20 and 9.19.51.21
44
Simplified Topology in 7.1 Cluster

Sample cluster topology output:

root@mhoracle1 /> cllsif
Adapter    Type     Network       Net Type  Attribute  Node       IP Address    Interface  Netmask
mhoracle1  boot     net_ether_01  ether     public     mhoracle1  10.19.51.211  en0        255.255.255.0
sharesvc1  service  net_ether_01  ether     public     mhoracle1  10.19.51.239             255.255.255.0
mhoracle2  boot     net_ether_01  ether     public     mhoracle2  10.19.51.212  en0        255.255.255.0
sharesvc1  service  net_ether_01  ether     public     mhoracle2  10.19.51.239             255.255.255.0

Status of the interfaces:

root@mhoracle1 /> lscluster -i
Network/Storage Interface Query

Cluster Name: sapdemo71_cluster
Cluster uuid: 3bd04654-3dfd-11e0-9641-46a6ba546403
Number of nodes reporting = 2
Number of nodes expected = 2
Node mhoracle1.dfw.ibm.com
Node uuid = bff1af28-3550-11e0-be44-46a6ba546403
Number of interfaces discovered = 4
Interface number 1 en0     Interface state UP   (IP heartbeating)
Interface number 2 en1     Interface state UP   (IP heartbeating)
Interface number 3 sfwcom  Interface state UP   (HBA heartbeating, optional)
Interface number 4 dpcom   Interface state UP   (repository disk)

Note: HBA heartbeating (sfwcom) is not supported on 16 Gb HBAs.
45
Let's talk about speeds & tuning

• CAA uses built-in values, set to detect that the other side is unreachable within 5 seconds (the values used cannot be changed)
• smitty sysmirror → Custom Cluster Configuration → Cluster Nodes & Networks → Manage Cluster → Cluster Heartbeat Settings

Network Failure Detection Time:
• Value of 0: quick failure detection process – the behavior prior to AIX 7.1 TL4 (HA 7.2 | AIX 7.1 TL3)
• Value of 5-590 s: CAA will use the full wait time process and, in AIX 7.2, will wait for the period of the failure detection time (HA 7.2 | AIX 7.2); the default in AIX 7.2 should be 20
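To inspect the CAA tunables behind these settings (a sketch; the exact tunable set varies by AIX level):

# clctrl -tune -L    (lists the CAA tunables with their current, default and valid values)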
46
Configure netmon.cf file

PowerHA V7.1
• RSCT based
• Up to 30 lines per interface
• Sequenced about every 4 sec.
• Up to 5 lines processed in parallel (if defined)
• The netmon.cf gets checked every few seconds for content changes
• Requires the fix for IV74943
• To be able to define a specific latency for the network down detection, open a PMR and request the "Tunable FDT IFIX bundle"

Example with repeated entries for longer latency:
!REQD en0 192.168.60.1
!REQD en0 192.168.60.10
# repeated entries for longer latency
!REQD en0 192.168.60.1
!REQD en0 192.168.60.10
!REQD en0 192.168.60.1
!REQD en0 192.168.60.10
(... the pair repeated as many times as needed ...)

PowerHA 7.2
• CAA based
• Up to 5 lines per interface
• Only used if CAA heartbeating detects an outage
• No need for repeated entries to extend network down detection, as they would only add .5 s max
• Usual configuration, but consider lines for the various interfaces in your environment:
!REQD en0 192.168.60.1
!REQD en0 192.168.60.10
47
Virtual Ethernet device & "poll_uplink" setting

[Diagram: System A shown twice – a client LPAR (en0/en1 on virtual adapters) behind a VIO whose SEAs bridge to the gateway client a.b.c.2 (fixed); in both cases the physical link is down]
• With poll_uplink=no: the physical link is down but the virtual link still reports up, so the LPAR cannot see the outage
• With poll_uplink=yes: the physical link state is propagated and the virtual link reports down
48
Using poll_uplink

• Requirements to use poll_uplink:
  – VIO 2.2.3.4 or later & AIX 7.1 TL3 (SP3 for the entstat output)
• Needs to be set on the LPAR – enable poll_uplink on the virtual entX interfaces:
# chdev -l entX -a poll_uplink=yes -P

• Possible settings (the default for poll_uplink is no):
  poll_uplink (yes, no)
  poll_uplink_int (100 ms – 5000 ms)

• To display the settings in use:
# lsdev -Cc adapter | grep ent
ent0 Available  Virtual I/O Ethernet Adapter (l-lan)
ent1 Available  Virtual I/O Ethernet Adapter (l-lan)
# lsattr -El ent0 | grep "poll_up"
poll_uplink      no    Enable Uplink Polling             True
poll_uplink_int  1000  Time interval for Uplink Polling  True
49
Details to "poll_uplink"

poll_uplink=no:
# entstat -d ent0
ETHERNET STATISTICS (en0):
Device Type: Virtual I/O Ethernet Adapter (l-lan)
...
Driver Flags: Up Broadcast Running
        Simplex 64BitSupport ChecksumOffload
        DataRateSet VIOENT
...
LAN State: Operational
...

poll_uplink=yes, physical link up:
# entstat -d ent0
...
Driver Flags: Up Broadcast Running
        Simplex 64BitSupport ChecksumOffload
        DataRateSet VIOENT VIRTUAL_PORT
        PHYS_LINK_UP
...
LAN State: Operational
Bridge Status: Up
...

poll_uplink=yes, physical link down:
# entstat -d ent0
...
Driver Flags: Up Broadcast Running
        Simplex 64BitSupport ChecksumOffload
        DataRateSet VIOENT VIRTUAL_PORT
...
LAN State: Operational
Bridge Status: Unknown
...

(The General Statistics section – no mbuf errors: 0, adapter reset count: 0, adapter data rate: 20000 – is identical in all three cases.)
50
SANCOMM: Evaluate the use of this feature

[Diagram: cluster nodes with NPIV vFC adapter WWPNs mapped through VIO servers 1-4 to the storage subsystem; the zoning shows a required heartbeat zone containing the TME-enabled physical HBA WWPNs of the VIO servers, separate from the individual node storage zones to the storage controllers]

• A dedicated heartbeat zone containing the VIO servers' physical FC adapter WWPNs is required, in addition to the normal per-node storage zones
• The physical HBAs on the VIO servers must be TME enabled
• Must also enable:
  – dyntrk=yes
  – fc_err_recov=fast_fail
(a setup sketch follows)
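A hedged setup sketch for each VIO server HBA (fcs0/fscsi0 are assumed adapter names; -P defers the change until the device is next reconfigured):

# chdev -l fcs0 -a tme=yes -P
# chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail -P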
51
Network Requirements for SANCOMM

[Diagram: Node 1 on Frame 1 (en0, 9.19.50.10) and Node 2 on Frame 2 (en0, 9.19.50.20), each with a second virtual adapter (ent1) on VLAN 3358; each VIOS carries a matching VLAN 3358 virtual adapter and a TME-enabled NPIV HBA; Frame 3 shows VIO servers without the VLAN 3358 adapter]

• The virtual adapter (on VLAN 3358) on both the VIO and client LPARs serves as a bridge to allow communication to reach the physical fibre channel adapter
• Ultimately, whether traffic continues should depend on whether the target VIO servers already have the required settings enabled and available

To temporarily disable SANCOMM traffic (a sketch follows):
• Edit /etc/cluster/ifrestrict with sfwcomm
• Run the clusterconf command
• Enable settings (TME, zoning, virtual adapter)
• Remove the edits & re-run clusterconf
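A minimal sketch of that disable/re-enable cycle (assuming sfwcomm is the SANCOMM pseudo-interface name reported by lscluster -i):

# echo "sfwcomm" >> /etc/cluster/ifrestrict   (restrict the SANCOMM interface)
# clusterconf                                 (re-read the cluster configuration)
  ... perform the maintenance ...
# vi /etc/cluster/ifrestrict                  (remove the sfwcomm entry)
# clusterconf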
Live Partition Mobility
52
LPM Recommendations: V7.1.3 & earlier

Pre-LPM manual steps:
• (Optional) UNMANAGE the PowerHA resources
• Disable SANCOMM if applicable
• clmgr query cluster | grep HEARTBEAT_FREQUENCY   (note the current value)
• clmgr -f modify cluster HEARTBEAT_FREQUENCY="600"   (temporarily set the heartbeat frequency to its maximum)
• /usr/sbin/rsct/bin/hags_disable_client_kill -s cthags
• /usr/sbin/rsct/bin/dms/stopdms -s cthags

Initiate the LPM operation.

Post-LPM manual steps:
• /usr/sbin/rsct/bin/dms/startdms -s cthags
• /usr/sbin/rsct/bin/hags_enable_client_kill -s cthags
• clmgr -f modify cluster HEARTBEAT_FREQUENCY="XX"   (restore the noted value)
• Re-enable SANCOMM if applicable
• (Optional) Re-MANAGE the PowerHA resources

IBM Knowledge Center reference:
http://www.ibm.com/support/knowledgecenter/SSPHQG_7.1.0/com.ibm.powerha.admngd/ha_admin_live_partition.htm?lang=en

PowerHA V7.2 does all of these things automatically & provides a tunable to UNMANAGE the resources automatically if desired.
53
Rootvg failure handling

• Rootvg-related disk loss is problematic for the operating system:
  – AIX in most cases continues to operate from memory
  – Note that AIX will crash if a reference is made to critical areas such as paging space (however, this rarely happens on modern systems due to the large memory sizes)
  – Most user-space programs cannot make progress, since they need access to rootvg

What's new (AIX 6.1 TL9 SP5 | AIX 7.1 TL3 SP5, GA: Jun 2015):
mkvg and chvg provide an option (-r) to create or modify the critical VG attribute:
• mkvg -r y|n
• chvg -r y|n

* Manually enable for HA 7.1.3; HA 7.2 will check & automatically set it
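A one-line usage sketch, marking rootvg as a critical VG on each cluster node:

# chvg -r y rootvg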
54
Quarantine Policies – Active Node Halt Policy

Expected behavior:
• In the event of a resource freeze, do not allow the critical RG to come online on the standby node unless the source LPAR is truly gone or fenced out
• Heartbeating would have to cease across all heartbeat links (IP, repository & SANCOMM)

Three available options:
1) HMC based halt (Active Node Halt)
2) Disk-reserve based fence-out (SCSI-3)
3) HMC & disk reserve

[Diagram: the resource group (IP, VG / file systems, application workload) is marked as the Critical RG; option 1 halts the active node via the Hardware Management Console, option 2 fences it out at the storage subsystem with SCSI-3 reserves]
55
Configuring Node Quarantine Feature
• clmgr modify cluster \
[QUARANTINE_POLICY=<node_halt | fencing | halt_with_fencing>]
[CRITICAL_RG=<rg_value>]
Quarantine Policy can be enabled via SMIT panels or CLI
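For example, a sketch that enables disk-reserve fencing around a hypothetical critical resource group rg1:

# clmgr modify cluster QUARANTINE_POLICY=fencing CRITICAL_RG=rg1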
56
PowerHA & WPAR – Integration in the Global Environment

[Diagram: NODE A and NODE B run cluster services in the AIX global environment; resource groups WPAR_rg1/2/3 (nodes: Node A, Node B) each carry a WPAR name, a service IP and an app server, and the RG name must match the WPAR name]

PowerHA controls:
• Start / stop / movement of the WPAR

Monitoring:
• Application custom monitoring – the monitor will run inside the WPAR

Supported environments:
• AIX 5.2 & 5.3 versioned WPARs
• SAN dedicated disks

Limitations:
• Maximum of 64 RGs
• The WPAR IP addresses and disks are managed by the LPAR's global environment WPAR Manager (WPM)
57
PowerVM: Simplified Remote Restart

What is it?
• A method to restart LPARs elsewhere if an entire server fails
• Available on P8 servers with PowerVM Enterprise Edition

Differences from LPM:
• The VIO servers are not available
• The HMC code level dictates the level of functionality
• The user must "manually" invoke the remote restart commands
• A clean-up command must be run on the source

[Diagram: SRR-capable LPARs RR-AIX1/2/3 on Frame A restarted onto Frames B and C; the RR operation is invoked manually from the HMC for each SRR-capable LPAR]
58
SRR Availability vs Clustering - Getting the picture

Remote Restart config:
• PowerVM | HMC management
• Only one OS instance
• The entire frame needs to fail
• SRR is not automated
• Limited # of concurrent restarts
• FSM needs to be online (until HMC 8.8.5)

[Diagram: LPARs A1-A4 on Frame A with OS & data on external storage, HMCs managing Frames A and B; LPAR A1 restarted on Frame B]

Remote Restart is not an LPAR / VM level HA solution – a restart operation in this scenario (a single failed LPAR while the frame stays up) would fail.
59
PowerVM SRR & Critical LPAR Workload Failure

[Diagram: RR-AIX1 and LPARs A1-A3 on Frame A; a manual attempt to move a single failed LPAR to the target server is rejected]

Syntax invoked:
hscroot@vHMC:~> rrstartlpar -o restart -m S822 -t S814 -p RR-AIX3
HSCLA9CE The managed system is not in a valid state to support partition remote restart operations

What are your recovery procedures for a single failed critical workload?
• LPAR recreate / swing the data LUNs
• mksysb restore
• Clustered standby target
• Attempt an LPAR restart
• Troubleshoot & recover
• Inactive Partition Mobility
60
SRR Availability vs Clustering - Getting the picture

Remote Restart config:
• PowerVM | HMC management
• Only one OS instance
• The entire frame needs to fail
• SRR is not automated
• Limited # of concurrent restarts
• FSM needs to be online (until HMC 8.8.5)

Cluster configuration:
• PowerVM (optional)
• HMC (optional)
• Typically SAN backed storage
• Cluster software cost
• Learning curve | management
• Multiple OS instances

[Diagram: the SRR frame pair from the previous chart next to a PowerHA pair – HA Node A1 on Frame A and HA Node B1 on Frame B with IP heartbeat links, a shared CAA repository and shared data disks; non-clustered LPARs sit alongside]

Remote Restart is not an LPAR / VM level HA solution – a restart operation in this scenario would fail.
61
Summary

License the appropriate Edition for your needs:
• Standard Edition – local clustering
• Enterprise Edition – integration & automation of IP or storage-level replication

DLPAR integration enables clustering with cost savings in mind:
• ROHA – Power Enterprise Pool integration
• SPP resize on fallover

V7 clusters bring in a number of new design considerations:
• Unicast vs. multicast communication protocol
• Temporary & permanent hostname changes are now accepted by CAA
• Evaluate the differences between Standard, Stretched & Linked clusters
• Review the new FDT values in CAA & the tuning options
• netmon.cf usage
• Exploit the critical rootvg feature with HA V7.1.3
• Evaluate the new Quarantine features in HA V7.2
63
Useful References

• New V7.2 Redbook: SG24-8278
  www.redbooks.ibm.com
• New PowerHA LinkedIn Group
  https://www.linkedin.com/groups/8413388
• IBM developerWorks PowerHA Forum
  https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000001611
• Recommended Product Stable Points
  https://aix.software.ibm.com/aix/ifixes/PHA_Migration/ha_install_mig_fixes.htm
• Product V7.2 Pubs
  http://www.ibm.com/support/knowledgecenter/SSPHQG_7.2.0/com.ibm.powerha.navigation/welcome_ha_72.htm
64
PowerHA SystemMirror for AIX Feature Evolution

PowerHA SystemMirror 6.1 (2009) *
• DSCLI Metro Mirror VIOS
• Packaging & pricing changes
• p6/p7 CoD DLPAR support
• EMC SRDF integration
• GLVM config wizard
• Full IPv6 support
(* based on the older RSCT architecture; EOS April 2015)

PowerHA SystemMirror 7.1.0 (2010)
• Cluster Aware AIX
• IBM Director integration
• Hitachi TrueCopy & HUR async integration
• DS8700 Global Mirror integration
• Drop Topology Services for the multicast protocol
• Storage monitoring
• HADR storage framework

PowerHA SystemMirror 7.1.1 (2011)
• CAA repository resilience
• JFS2 Mount Guard support
• SAP Hot Standby solution
• Federated security
• SAP & MQ Smart Assists
• XIV replication integration
• Director plug-in updates

PowerHA SystemMirror 7.1.2 (2012)
• Enterprise Edition for V7
• Stretched & Linked clusters
• Tie Breaker disks
• HyperSwap w/DS8800
• Full IPv6 support
• Backup repository disks
• Director DR plug-in updates

PowerHA SystemMirror 7.1.3 (2013)
• Unicast heartbeating available
• Active / Active HyperSwap
• Single-node HyperSwap
• Cluster simulator
• Manual fallover policy
• Dynamic hostname change
• Smart Assist updates

PowerHA SystemMirror 7.2.0 (2015)
• Resource Optimized High Availability
• Quarantine node policies
• Live Update support
• LPM enhancements
• Automatic repository swap
• NFS-backed Tie Breaker
• Detailed verification checks
65
PowerHA SystemMirror V7.2.0 - New Feature Summary

• Non-Disruptive Upgrade Support (PowerHA code): the ability to upgrade HA to 7.2 from 7.1.3, or to load 7.2 follow-on fixes, without requiring a rolling upgrade or an interruption of service
• AIX Live Update Support & LPM Support Enhancements: handshaking with the API framework; new cluster tunables & cluster behavior
• Automatic Repository Disk Replacement: define multiple repository disks & automatic replacement behavior with AIX 7.2
• Cluster Detailed Verification Checks: (optional) validation of a number of new checks, including AIX Expert settings
• Quarantine Policies (Critical RG): HMC node halt policy; SCSI-3 node fence policy
• NFS Backed Tie Breaker Disk support: new flexibility that avoids the need for a NAS-backed device when using the Tie Breaker disk function
• ROHA (Resource Optimized High Availability): Enterprise Pool integration; manipulate Shared Processor Pool sizes; deactivate low-priority partitions; new HMC integration tunables