
ibm.com/redbooks

PowerHA for AIX Cookbook

Shawn Bodily
Rosemary Killeen

Liviu Rosca

Extended case studies with practical disaster recovery examples

Explore the latest PowerHA V5.5 features

Enterprise ready

Front cover

PowerHA for AIX Cookbook

August 2009

International Technical Support Organization

SG24-7739-00

© Copyright International Business Machines Corporation 2009. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

First Edition (August 2009)

This edition applies to Version 5, Release 5, of IBM High Availability Cluster Multi-Processing (product number 5765-F62).

Note: Before using this information and the product it supports, read the information in “Notices” on page xxvii.

Contents

Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv

Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi

Examples. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii

Notices . . . . . . . . . . xxvii
Trademarks . . . . . . . . . . xxviii

Preface . . . . . . . . . . xxix
The team that wrote this book . . . . . . . . . . xxix
Become a published author . . . . . . . . . . xxxi
Comments welcome . . . . . . . . . . xxxi

Part 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

Chapter 1. Introduction to PowerHA for AIX . . . . . . . . . . 3
1.1 What is PowerHA for AIX . . . . . . . . . . 4

1.1.1 High availability . . . . . . . . . . 4
1.1.2 Cluster multi-processing . . . . . . . . . . 4

1.2 Availability solutions: An overview . . . . . . . . . . 5
1.2.1 Downtime . . . . . . . . . . 7
1.2.2 Single point of failure (SPOF) . . . . . . . . . . 8

1.3 History and evolution . . . . . . . . . . 9
1.3.1 HACMP Version 5 Release 4 . . . . . . . . . . 10
1.3.2 HACMP Version 5 Release 4.1 . . . . . . . . . . 11
1.3.3 PowerHA Version 5 Release 5 . . . . . . . . . . 11

1.4 High availability terminology and concepts . . . . . . . . . . 12
1.4.1 Terminology . . . . . . . . . . 12
1.4.2 Concepts . . . . . . . . . . 13

1.5 High availability versus fault tolerance . . . . . . . . . . 15
1.5.1 Fault-tolerant systems . . . . . . . . . . 15
1.5.2 High availability systems . . . . . . . . . . 15

1.6 Software planning . . . . . . . . . . 16
1.6.1 AIX level and related requirements . . . . . . . . . . 16
1.6.2 Licensing . . . . . . . . . . 18

1.7 PowerHA software installation . . . . . . . . . . 19
1.7.1 Checking for prerequisites . . . . . . . . . . 19
1.7.2 New installation . . . . . . . . . . 20

1.7.3 Installing PowerHA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

Chapter 2. High availability components . . . . . . . . . . 23
2.1 PowerHA configuration data . . . . . . . . . . 24
2.2 Software components . . . . . . . . . . 25
2.3 Cluster topology . . . . . . . . . . 27

2.3.1 RSCT and PowerHA heartbeating . . . . . . . . . . 32
2.3.2 Heartbeat over IP aliases . . . . . . . . . . 40
2.3.3 TCP/IP networks . . . . . . . . . . 43
2.3.4 IP address takeover mechanisms . . . . . . . . . . 44
2.3.5 Persistent IP label or address . . . . . . . . . . 48
2.3.6 Device based or serial networks . . . . . . . . . . 49
2.3.7 Network modules . . . . . . . . . . 53
2.3.8 Clients . . . . . . . . . . 54
2.3.9 Network security considerations . . . . . . . . . . 55

2.4 Resources and resource groups . . . . . . . . . . 57
2.4.1 Definitions . . . . . . . . . . 57
2.4.2 Resources . . . . . . . . . . 59
2.4.3 NFS . . . . . . . . . . 66
2.4.4 Application servers . . . . . . . . . . 67
2.4.5 Application monitors . . . . . . . . . . 68
2.4.6 Communication adapters and links . . . . . . . . . . 68
2.4.7 Tape resources . . . . . . . . . . 69
2.4.8 Fast connect resources . . . . . . . . . . 69
2.4.9 Workload Manager integration . . . . . . . . . . 69
2.4.10 Resource groups . . . . . . . . . . 70

2.5 Plug-ins . . . . . . . . . . 88
2.6 Features (HACMP 5.1, 5.2 and 5.3) . . . . . . . . . . 89

2.6.1 New features . . . . . . . . . . 89
2.6.2 Features no longer supported . . . . . . . . . . 92

2.7 Limits . . . . . . . . . . 92
2.8 Storage characteristics . . . . . . . . . . 93

2.8.1 Shared LVM . . . . . . . . . . 93
2.8.2 Non-concurrent access mode . . . . . . . . . . 94
2.8.3 Concurrent access mode . . . . . . . . . . 95
2.8.4 Enhanced concurrent mode volume groups . . . . . . . . . . 96
2.8.5 Fast disk takeover . . . . . . . . . . 96

2.9 Shared storage configuration . . . . . . . . . . 97
2.9.1 Shared LVM requirements . . . . . . . . . . 97
2.9.2 Non-concurrent, enhanced concurrent, and concurrent . . . . . . . . . . 98

Part 2. Planning, installation, and migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

Chapter 3. Planning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

3.1 High availability planning . . . . . . . . . . 105
3.2 Planning for PowerHA . . . . . . . . . . 106

3.2.1 Planning strategy . . . . . . . . . . 107
3.2.2 Planning tools . . . . . . . . . . 108

3.3 Getting started . . . . . . . . . . 108
3.3.1 Current environment . . . . . . . . . . 109
3.3.2 Addressing single points of failure . . . . . . . . . . 111
3.3.3 Initial cluster design . . . . . . . . . . 112
3.3.4 Completing the cluster overview planning worksheet . . . . . . . . . . 113

3.4 Planning cluster hardware . . . . . . . . . . 114
3.4.1 Overview of cluster hardware . . . . . . . . . . 114
3.4.2 Completing the cluster hardware planning worksheet . . . . . . . . . . 115

3.5 Planning cluster software . . . . . . . . . . 116
3.5.1 AIX and RSCT levels . . . . . . . . . . 116
3.5.2 Virtual LAN and SCSI support . . . . . . . . . . 116
3.5.3 Required AIX filesets . . . . . . . . . . 117
3.5.4 AIX security filesets . . . . . . . . . . 118
3.5.5 PowerHA filesets . . . . . . . . . . 118
3.5.6 AIX files altered by PowerHA . . . . . . . . . . 120
3.5.7 Application software . . . . . . . . . . 124
3.5.8 Licensing . . . . . . . . . . 124
3.5.9 Completing the software planning worksheet . . . . . . . . . . 125

3.6 Operating system considerations . . . . . . . . . . 125
3.7 Planning security . . . . . . . . . . 126

3.7.1 Cluster security . . . . . . . . . . 126
3.7.2 User administration . . . . . . . . . . 129
3.7.3 HACMP group . . . . . . . . . . 130
3.7.4 PowerHA ports . . . . . . . . . . 130
3.7.5 Planning for PowerHA file collections . . . . . . . . . . 130

3.8 Planning cluster networks . . . . . . . . . . 131
3.8.1 Terminology . . . . . . . . . . 133
3.8.2 General network considerations . . . . . . . . . . 133
3.8.3 IP Address Takeover planning . . . . . . . . . . 141
3.8.4 Heartbeating over aliases . . . . . . . . . . 144
3.8.5 Non-IP network planning . . . . . . . . . . 146
3.8.6 Planning RS232 serial networks . . . . . . . . . . 151
3.8.7 Planning disk heartbeating . . . . . . . . . . 152
3.8.8 Additional network planning considerations . . . . . . . . . . 155
3.8.9 Completing the network planning worksheets . . . . . . . . . . 157

3.9 Planning storage requirements . . . . . . . . . . 159
3.9.1 Internal disks . . . . . . . . . . 159
3.9.2 Shared disks . . . . . . . . . . 160
3.9.3 Enhanced Concurrent Mode (ECM) volume groups . . . . . . . . . . 161

3.9.4 Shared logical volumes . . . . . . . . . . 162
3.9.5 Fast disk takeover . . . . . . . . . . 163
3.9.6 Completing the storage planning worksheets . . . . . . . . . . 164

3.10 Application planning . . . . . . . . . . 165
3.10.1 Application servers . . . . . . . . . . 167
3.10.2 Application monitoring . . . . . . . . . . 167
3.10.3 Availability analysis tool . . . . . . . . . . 168
3.10.4 Applications integrated with PowerHA . . . . . . . . . . 168
3.10.5 Completing the application planning worksheets . . . . . . . . . . 169

3.11 Planning for resource groups . . . . . . . . . . 172
3.11.1 Resource group attributes . . . . . . . . . . 173
3.11.2 Completing the planning worksheet . . . . . . . . . . 175

3.12 Detailed cluster design . . . . . . . . . . 177
3.13 Developing a cluster test plan . . . . . . . . . . 177

3.13.1 Custom test plan . . . . . . . . . . 178
3.13.2 Cluster Test Tool . . . . . . . . . . 179

3.14 Developing a PowerHA installation plan . . . . . . . . . . 182
3.15 Backing up the cluster configuration . . . . . . . . . . 184
3.16 Documenting the cluster . . . . . . . . . . 185

3.16.1 Exporting a cluster definition file using SMIT . . . . . . . . . . 186
3.16.2 Creating a cluster definition file from a snapshot using SMIT . . . . . . . . . . 187
3.16.3 Creating a configuration report . . . . . . . . . . 188

3.17 Change and problem management . . . . . . . . . . 190
3.18 Planning tools . . . . . . . . . . 190

3.18.1 Cluster diagram . . . . . . . . . . 191
3.18.2 Online Planning Worksheets . . . . . . . . . . 192
3.18.3 Paper planning worksheets . . . . . . . . . . 197

Chapter 4. Installation and configuration . . . . . . . . . . 199
4.1 Basic steps to implement a PowerHA cluster . . . . . . . . . . 200
4.2 Configuring PowerHA . . . . . . . . . . 202

4.2.1 General considerations for the configuration method . . . . . . . . . . 203
4.2.2 Standard Configuration Path: The Two-Node Cluster Configuration Assistant . . . . . . . . . . 204
4.2.3 Using Extended Configuration Path and C-SPOC . . . . . . . . . . 213

4.3 Installing and configuring WebSMIT . . . . . . . . . . 224
4.3.1 The PowerHA related SMIT panels and their structure . . . . . . . . . . 225
4.3.2 Installing a Web server with the IBM HTTP Server code . . . . . . . . . . 226
4.3.3 Installing WebSMIT . . . . . . . . . . 228
4.3.4 Configuring WebSMIT . . . . . . . . . . 228
4.3.5 Starting WebSMIT . . . . . . . . . . 229
4.3.6 Registering clusters with the WebSMIT gateway . . . . . . . . . . 230
4.3.7 Accessing WebSMIT pages and adding clusters . . . . . . . . . . 232

4.3.8 Introduction into WebSMIT . . . . . . . . . . 236
4.3.9 WebSMIT monitoring . . . . . . . . . . 239

Chapter 5. Migrating a cluster to PowerHA V5.5 . . . . . . . . . . 241
5.1 Identifying the migration path . . . . . . . . . . 242

5.1.1 Migration methods . . . . . . . . . . 242
5.1.2 Supported migration paths . . . . . . . . . . 242

5.2 Prerequisites . . . . . . . . . . 243
5.3 Considerations . . . . . . . . . . 245
5.4 General migration steps . . . . . . . . . . 246
5.5 Scenarios tested . . . . . . . . . . 247

5.5.1 Scenario 1: Non-disruptive upgrade (NDU) from HACMP 5.4.1 to PowerHA 5.5 . . . . . . . . . . 248

5.5.2 Scenario 2: Rolling migration from HACMP 5.3 to PowerHA 5.5 . . . . . . . . . . 251
5.5.3 Scenario 3: Snapshot upgrade from HACMP 5.3 to PowerHA 5.5 . . . . . . . . . . 258
5.5.4 Scenario 4: Offline upgrade from HACMP 5.3 to PowerHA 5.5 . . . . . . . . . . 262

5.6 Post-migration steps . . . . . . . . . . 262
5.7 Troubleshooting a failed migration . . . . . . . . . . 263

5.7.1 Backing out of a failed migration . . . . . . . . . . 263
5.7.2 Reviewing the cluster version in the HACMP ODM . . . . . . . . . . 266
5.7.3 Troubleshooting a stalled snapshot application . . . . . . . . . . 267
5.7.4 DARE error during synchronization . . . . . . . . . . 267
5.7.5 Error: config_too_long during migration . . . . . . . . . . 268

Part 3. Cluster administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269

Chapter 6. Cluster maintenance . . . . . . . . . . 271
6.1 Change control and testing . . . . . . . . . . 272

6.1.1 Scope . . . . . . . . . . 272
6.1.2 Test cluster . . . . . . . . . . 272

6.2 Starting and stopping the cluster . . . . . . . . . . 273
6.2.1 Cluster Services . . . . . . . . . . 274
6.2.2 Starting cluster services . . . . . . . . . . 275
6.2.3 Stopping cluster services . . . . . . . . . . 278

6.3 Resource group and application management . . . . . . . . . . 280
6.3.1 Bringing a resource group offline using SMIT . . . . . . . . . . 280
6.3.2 Bringing a resource group online using SMIT . . . . . . . . . . 282
6.3.3 Moving a resource group using SMIT . . . . . . . . . . 283
6.3.4 Suspending and resuming application monitoring . . . . . . . . . . 285

6.4 Scenarios . . . . . . . . . . 286
6.4.1 PCI hot-plug replacement of a NIC . . . . . . . . . . 286
6.4.2 Fixes . . . . . . . . . . 290
6.4.3 Storage . . . . . . . . . . 291
6.4.4 Applications . . . . . . . . . . 293

6.5 Cluster Test Tool . . . . . . . . . . 294
6.5.1 Custom testing . . . . . . . . . . 295
6.5.2 Test duration . . . . . . . . . . 295
6.5.3 Considerations . . . . . . . . . . 296
6.5.4 Automated testing . . . . . . . . . . 296
6.5.5 Custom testing . . . . . . . . . . 301

Chapter 7. Cluster management . . . . . . . . . . 325
7.1 C-SPOC . . . . . . . . . . 326

7.1.1 C-SPOC in general . . . . . . . . . . 326
7.1.2 C-SPOC SMIT menu . . . . . . . . . . 327

7.2 File collections . . . . . . . . . . 329
7.2.1 Predefined file collections . . . . . . . . . . 330
7.2.2 Managing file collections . . . . . . . . . . 332

7.3 User administration . . . . . . . . . . 337
7.3.1 C-SPOC user and group administration . . . . . . . . . . 338
7.3.2 Password management . . . . . . . . . . 348

7.4 Shared storage management . . . . . . . . . . 353
7.4.1 Updating LVM components . . . . . . . . . . 354
7.4.2 C-SPOC Logical Volume Manager . . . . . . . . . . 357
7.4.3 C-SPOC Concurrent Logical Volume Management . . . . . . . . . . 358
7.4.4 C-SPOC Physical Volume Management . . . . . . . . . . 359
7.4.5 Examples . . . . . . . . . . 360
7.4.6 C-SPOC command line interface (CLI) . . . . . . . . . . 372

7.5 Time synchronization . . . . . . . . . . 416
7.6 Cluster verification and synchronization . . . . . . . . . . 416

7.6.1 Cluster verification and synchronization using SMIT . . . . . . . . . . 417
7.6.2 Dynamic cluster reconfiguration: DARE . . . . . . . . . . 421
7.6.3 Verification log files . . . . . . . . . . 423
7.6.4 Running corrective actions automatically during verification . . . . . . . . . . 424
7.6.5 Automatic cluster verification . . . . . . . . . . 426

7.7 Monitoring PowerHA . . . . . . . . . . 427
7.7.1 Cluster status checking utilities . . . . . . . . . . 428
7.7.2 Cluster status and services checking utilities . . . . . . . . . . 430
7.7.3 Topology information commands . . . . . . . . . . 432
7.7.4 Resource group information commands . . . . . . . . . . 434
7.7.5 Log files . . . . . . . . . . 436
7.7.6 Error notification . . . . . . . . . . 440
7.7.7 Application monitoring . . . . . . . . . . 440
7.7.8 Measuring application availability . . . . . . . . . . 453

Chapter 8. Cluster security . . . . . . . . . . 457
8.1 Cluster security and the clcomd daemon . . . . . . . . . . 458

8.1.1 The /usr/es/sbin/cluster/etc/rhosts file . . . . . . . . . . 458
8.1.2 Disabling the Cluster Communication daemon . . . . . . . . . . 459
8.1.3 Additional cluster security features . . . . . . . . . . 459
8.1.4 Cluster communication over VPN . . . . . . . . . . 460

8.2 Using encrypted inter-node communication . . . . . . . . . . 460
8.2.1 Encryption key management . . . . . . . . . . 460
8.2.2 Setting up message authentication and encryption . . . . . . . . . . 461
8.2.3 Troubleshooting message authentication and encryption . . . . . . . . . . 468
8.2.4 Checking the current message authentication settings . . . . . . . . . . 468

8.3 Secure remote command execution . . . . . . . . . . 469
8.4 WebSMIT security . . . . . . . . . . 470

8.4.1 Secure WebSMIT communication . . . . . . . . . . 470
8.4.2 User authentication . . . . . . . . . . 471
8.4.3 Access to WebSMIT . . . . . . . . . . 471
8.4.4 Access to WebSMIT panels . . . . . . . . . . 472

8.5 PowerHA and firewalls . . . . . . . . . . 473
8.6 RSCT security . . . . . . . . . . 474

8.6.1 RSCT and PowerHA . . . . . . . . . . 474
8.6.2 Cluster Security Services (CtSec) overview . . . . . . . . . . 476
8.6.3 Mechanism abstraction layer (MAL) . . . . . . . . . . 478
8.6.4 Mechanism pluggable modules (MPM) . . . . . . . . . . 479
8.6.5 Host-based authentication with ctcasd . . . . . . . . . . 480
8.6.6 Identity mapping service and RMC access control lists . . . . . . . . . . 481

Part 4. Advanced topics (with examples) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485

Chapter 9. Virtualization and PowerHA . . . . . . . . . . 487
9.1 Virtualization . . . . . . . . . . 488
9.2 Virtual I/O Server . . . . . . . . . . 489
9.3 DLPAR and application provisioning . . . . . . . . . . 492

9.3.1 Requirements . . . . . . . . . . 493
9.3.2 Application provisioning . . . . . . . . . . 494
9.3.3 Configuring DLPAR to PowerHA . . . . . . . . . . 502
9.3.4 Troubleshooting HMC verification errors . . . . . . . . . . 512
9.3.5 Test cluster configuration . . . . . . . . . . 514
9.3.6 Test results . . . . . . . . . . 515

9.4 Live Partition Mobility . . . . . . . . . . 524
9.5 Workload Partitions . . . . . . . . . . 525

9.5.1 Relationships . . . . . . . . . . 525
9.5.2 Planning for Highly Available WPARs . . . . . . . . . . 527
9.5.3 Resource Groups and WPARs . . . . . . . . . . 531

Chapter 10. Extending resource group capabilities . . . . . . . . . . 535
10.1 Settling time . . . . . . . . . . 536

10.2 Node distribution policy . . . . . . . . . . 539
10.2.1 Configuring a RG node-based distribution policy . . . . . . . . . . 540
10.2.2 Node-based distribution scenario . . . . . . . . . . 541

10.3 Dynamic node priority (DNP) . . . . . . . . . . 542
10.3.1 Configuring the dynamic node priority policy . . . . . . . . . . 542
10.3.2 Changing an existing resource group to use DNP policy . . . . . . . . . . 544
10.3.3 How dynamic node priority functions . . . . . . . . . . 544

10.4 Delayed fallback timer . . . . . . . . . . 546
10.5 Resource group dependencies . . . . . . . . . . 551

10.5.1 Resource group parent/child dependency . . . . . . . . . . 552
10.5.2 Resource group location dependency . . . . . . . . . . 553
10.5.3 Combining various dependency relationships . . . . . . . . . . 557
10.5.4 Displaying resource group dependencies . . . . . . . . . . 558

Chapter 11. Customizing events . . . . . . . . . . 561
11.1 Overview of cluster events . . . . . . . . . . 562
11.2 Writing scripts for custom events . . . . . . . . . . 563
11.3 Pre-event and post-event commands . . . . . . . . . . 563

11.3.1 Parallel processed resource groups and usage of pre-event and post-event scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 564

11.3.2 Configuring pre-event or post-event scripts . . . . . . . . . . 565
11.4 Automatic error notification . . . . . . . . . . 566

11.4.1 Disk monitoring consideration . . . . . . . . . . 567
11.4.2 Setting up automatic error notification . . . . . . . . . . 567
11.4.3 Listing automatic error notification . . . . . . . . . . 567
11.4.4 Removing automatic error notification . . . . . . . . . . 568
11.4.5 Using error notification . . . . . . . . . . 569
11.4.6 Customizing event duration . . . . . . . . . . 571
11.4.7 Defining new events . . . . . . . . . . 572

Chapter 12. Storage related considerations . . . . . . . . . . 575
12.1 Volume group types . . . . . . . . . . 576

12.1.1 Enhanced concurrent . . . . . . . . . . 576
12.1.2 Non-concurrent . . . . . . . . . . 578
12.1.3 Concurrent . . . . . . . . . . 578

12.2 Disk reservations . . . . . . . . . . 578
12.3 Forced varyon of volume groups . . . . . . . . . . 580
12.4 Fast disk takeover . . . . . . . . . . 580
12.5 Prerequisites . . . . . . . . . . 580

12.5.1 How fast disk takeover works . . . . . . . . . . 581
12.5.2 Enabling fast disk takeover . . . . . . . . . . 583

12.6 Disk heartbeat . . . . . . . . . . 586
12.6.1 Overview . . . . . . . . . . 586

12.6.2 Prerequisites . . . . . . . . . . 588
12.6.3 Performance considerations . . . . . . . . . . 588
12.6.4 Configuring traditional disk heartbeat . . . . . . . . . . 589
12.6.5 Configuring multi-node disk heartbeat . . . . . . . . . . 591
12.6.6 Testing disk heartbeat connectivity . . . . . . . . . . 594
12.6.7 Monitoring disk heartbeat . . . . . . . . . . 596

12.7 Fast failure detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597

Chapter 13. Networking considerations . . . . . . . . . . 599
13.1 EtherChannel . . . . . . . . . . 600

13.1.1 Implementing EtherChannel in a PowerHA environment . . . . . . . . . . 601
13.1.2 Configuration procedures . . . . . . . . . . 603

13.2 Distribution preference for service IP aliases . . . . . . . . . . 609
13.2.1 Configuring service IP distribution policy . . . . . . . . . . 610
13.2.2 Lab experiences with service IP distribution policy . . . . . . . . . . 611

13.3 Site specific service IP labels . . . . . . . . . . 613
13.4 Understanding the netmon.cf file . . . . . . . . . . 621

13.4.1 New netmon.cf format for VIO environments . . . . . . . . . . 622
13.4.2 Implications . . . . . . . . . . 625

13.5 Understanding the clhosts file . . . . . . . . . . 626
13.6 Understanding the clinfo.rc file . . . . . . . . . . 628

Part 5. Disaster recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631

Chapter 14. PowerHA Extended Distance concepts and planning . . . . . . . . . . 633
14.1 Disaster recovery considerations . . . . . . . . . . 634
14.2 PowerHA/XD components . . . . . . . . . . 636

14.2.1 PowerHA/XD Metro Mirror integration feature . . . . . . . . . . 636
14.2.2 Implications . . . . . . . . . . 637

14.3 PowerHA/XD SVC Global Mirror . . . . . . . . . . 638
14.4 PowerHA GLVM . . . . . . . . . . 638
14.5 Locating additional information . . . . . . . . . . 639

Chapter 15. PowerHA with cross-site LVM mirroring . . . . . . . . . . 641
15.1 Cross-site LVM mirroring introduction . . . . . . . . . . 642

15.1.1 Comparison . . . . . . . . . . 642
15.1.2 Requirements . . . . . . . . . . 643

15.2 Infrastructure considerations . . . . . . . . . . 643
15.3 Configuring cross-site LVM mirroring . . . . . . . . . . 644

15.3.1 Configuring the cross-site LVM cluster . . . . . . . . . . 644
15.3.2 Configuring cluster sites . . . . . . . . . . 645
15.3.3 Configuring cross-site LVM mirroring site dependencies . . . . . . . . . . 646
15.3.4 Configuring volume groups with cross-site LVM mirror . . . . . . . . . . 650
15.3.5 Resource groups and cross-site LVM mirroring . . . . . . . . . . 655

15.4 Testing cross-site LVM mirroring . . . . . . . . . . 658
15.4.1 Verifying the cluster . . . . . . . . . . 658
15.4.2 Tested scenarios . . . . . . . . . . 658

15.5 Maintaining cross-site LVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 661

Chapter 16. PowerHA/XD and SVC copy services . . . . . . . . . . 673
16.1 Scenario description . . . . . . . . . . 674
16.2 Implementing a PowerHA/XD SVC configuration . . . . . . . . . . 675

16.2.1 PowerHA/XD SVC prerequisites overview . . . . . . . . . . 675
16.2.2 Installing PowerHA/XD for SVC . . . . . . . . . . 678
16.2.3 Configuring PowerHA/XD for SVC . . . . . . . . . . 679

Chapter 17. GLVM concepts and configuration . . . . . . . . . . 697
17.1 PowerHA/XD GLVM . . . . . . . . . . 698

17.1.1 Definitions and concepts . . . . . . . . . . 698
17.1.2 Configuring Synchronous GLVM with PowerHA/XD . . . . . . . . . . 703

17.2 Converting from GLVM in synchronous mode to asynchronous mode . . . . . . . . . . 738
17.2.1 Migration steps . . . . . . . . . . 738
17.2.2 Test primary site failure . . . . . . . . . . 749

17.3 Migration: Logic for going from HAGEO to GLVM . . . . . . . . . . 756
17.3.1 Install GLVM filesets and configure GLVM . . . . . . . . . . 758
17.3.2 Performance considerations . . . . . . . . . . 762
17.3.3 Troubleshooting . . . . . . . . . . 763

17.4 Steps for migrating from HAGEO to GLVM . . . . . . . . . . 765

Part 6. Appendixes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 773

Appendix A. Paper planning worksheets . . . . . . . . . . 775
Two-node cluster configuration assistant . . . . . . . . . . 776
TCP/IP network planning worksheets . . . . . . . . . . 777
TCP/IP network interface worksheet . . . . . . . . . . 778
Fibre Channel Disks Worksheets . . . . . . . . . . 780
Shared volume group and file system worksheet . . . . . . . . . . 781
NFS-Exported file system or directory worksheet . . . . . . . . . . 782
Application worksheet . . . . . . . . . . 783
Application server worksheet . . . . . . . . . . 784
Application monitor worksheet (custom) . . . . . . . . . . 785
Resource group worksheet . . . . . . . . . . 786
Cluster events worksheet . . . . . . . . . . 788
Cluster file collections worksheet . . . . . . . . . . 789

Abbreviations and acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 791

Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 793

IBM Redbooks publications . . . . . . . . . . 793
Other publications . . . . . . . . . . 793
Online resources . . . . . . . . . . 794
How to get Redbooks publications . . . . . . . . . . 795
Help from IBM . . . . . . . . . . 796

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 797

Figures

1-1 PowerHA cluster . . . . . . . . . . 7
2-1 Software model of a PowerHA cluster . . . . . . . . . . 26
2-2 Example of cluster topology . . . . . . . . . . 28
2-3 RSCT and important cluster daemons . . . . . . . . . . 34
2-4 PowerHA networks superimposed on the topology . . . . . . . . . . 36
2-5 Example showing RSCT communication groups and node roles . . . . . . . . . . 37
2-6 Split cluster caused by failure of network component . . . . . . . . . . 38
2-7 Heartbeating in a PowerHA cluster . . . . . . . . . . 39
2-8 Three-node cluster with heartbeat over IP alias . . . . . . . . . . 42
2-9 IPAT via IP replacement . . . . . . . . . . 45
2-10 IPAT via IP aliases . . . . . . . . . . 47
2-11 Highly available resources superimposed on the cluster topology . . . . . . . . . . 58
2-12 Online on home node only . . . . . . . . . . 72
2-13 Online on first available node . . . . . . . . . . 73
2-14 Online on all available nodes . . . . . . . . . . 74
2-15 Online using distribution policy . . . . . . . . . . 75
2-16 Fallover to next priority node in list . . . . . . . . . . 76
2-17 Fallover using dynamic node priority . . . . . . . . . . 77
2-18 Bring offline (on error node only) . . . . . . . . . . 78
2-19 Fall back to higher priority node in list . . . . . . . . . . 79
2-20 Never fall back . . . . . . . . . . 80
2-21 Parent / child resource group relationships . . . . . . . . . . 84
2-22 Resource group dependencies . . . . . . . . . . 86
3-1 PowerHA implementation steps . . . . . . . . . . 107
3-2 Initial environment . . . . . . . . . . 110
3-3 Initial cluster design . . . . . . . . . . 112
3-4 HACMP Cluster Networks . . . . . . . . . . 132
3-5 Ethernet Switch Connections . . . . . . . . . . 135
3-6 Multiple EtherChannel configuration . . . . . . . . . . 136
3-7 Persistent Aliases . . . . . . . . . . 138
3-8 IPAT via replacement . . . . . . . . . . 142
3-9 Heartbeating over Aliases . . . . . . . . . . 146
3-10 Partitioned Cluster . . . . . . . . . . 148
3-11 Two Ethernet non-partitioned cluster . . . . . . . . . . 149
3-12 Ethernet and point-to-point loop network configuration . . . . . . . . . . 150
3-13 Ethernet and point-to-point star network configuration . . . . . . . . . . 151
3-14 NULL modem cable wiring . . . . . . . . . . 152
3-15 Disk Heartbeating Network . . . . . . . . . . 154

3-16 External Disk . . . . . . . . . . 163
3-17 Detailed cluster design . . . . . . . . . . 177
3-18 Online planning worksheets option in SMIT . . . . . . . . . . 186
3-19 SMIT panel: Export Definition File for Online Planning Worksheets . . . . . . . . . . 187
3-20 Sample Configuration Report . . . . . . . . . . 189
3-21 Sample Cluster Diagram . . . . . . . . . . 191
3-22 Online Planning Worksheets Main Menu . . . . . . . . . . 194
3-23 Import OLPW file to create cluster . . . . . . . . . . 196
4-1 WebSMIT PowerHA menu structure (extract) . . . . . . . . . . 225
4-2 IBM HTTP Server Install Menu . . . . . . . . . . 227
4-3 WebSMIT cluster node registration . . . . . . . . . . 231
4-4 WebSMIT login page . . . . . . . . . . 232
4-5 WebSMIT add a cluster . . . . . . . . . . 233
4-6 Cluster registration from within WebSMIT . . . . . . . . . . 234
4-7 WebSMIT Enterprise View . . . . . . . . . . 235
4-8 WebSMIT cluster details . . . . . . . . . . 236
4-9 WebSMIT frames . . . . . . . . . . 239
4-10 WebSMIT nodes and networks view . . . . . . . . . . 240
4-11 WebSMIT cluster status icons . . . . . . . . . . 240
6-1 Start Cluster Services Menu . . . . . . . . . . 276
6-2 Stop Cluster Services Menu . . . . . . . . . . 278
6-3 Resource Group picklist . . . . . . . . . . 281
6-4 Bring a resource group offline . . . . . . . . . . 282
6-5 Destination node picklist . . . . . . . . . . 283
6-6 Move a Resource Group SMIT panel . . . . . . . . . . 284
6-7 Resource Group Status . . . . . . . . . . 285
6-8 Custom test SMIT menu . . . . . . . . . . 323
7-1 Add a file collection . . . . . . . . . . 333
7-2 Change a file collection . . . . . . . . . . 334
7-3 Add files to a file collection . . . . . . . . . . 335
7-4 Removing files from a file collection . . . . . . . . . . 336
7-5 Manual propagation of a file collection . . . . . . . . . . 337
7-6 Select nodes by resource group . . . . . . . . . . 339
7-7 Create a user on all cluster nodes . . . . . . . . . . 340
7-8 Listing users in the cluster . . . . . . . . . . 341
7-9 Modifying user attributes . . . . . . . . . . 342
7-10 Remove a user from the cluster . . . . . . . . . . 343
7-11 Add a group to the cluster . . . . . . . . . . 345
7-12 List groups on the cluster . . . . . . . . . . 346
7-13 Change / show group attributes on the cluster . . . . . . . . . . 347
7-14 Modifying the system password utility . . . . . . . . . . 349
7-15 Managing list of users allowed to change their password cluster-wide . . . . . . . . . . 350
7-16 Selecting users allowed to change their password cluster-wide . . . . . . . . . . 351

7-17 Change a user’s password in the cluster . . . . . . . . . . 352
7-18 Change your own password . . . . . . . . . . 353
7-19 C-SPOC importvg panel . . . . . . . . . . 357
7-20 C-SPOC LVM testing cluster setup . . . . . . . . . . 361
7-21 C-SPOC CLI command listing . . . . . . . . . . 373
7-22 Verification and Synchronization panel - active cluster . . . . . . . . . . 419
7-23 Verification and Synchronization panel - inactive cluster . . . . . . . . . . 420
7-24 Verification panel using “Problem Determination Tools” path . . . . . . . . . . 421
7-25 Automatic Cluster Configuration Monitoring . . . . . . . . . . 427
7-26 clstat command syntax . . . . . . . . . . 429
7-27 Subsystem names and group names used by PowerHA . . . . . . . . . . 431
7-28 clshowsrv command syntax . . . . . . . . . . 431
7-29 cltopinfo command syntax . . . . . . . . . . 432
7-30 SMIT show cluster topology menu . . . . . . . . . . 433
7-31 clRGinfo command syntax . . . . . . . . . . 434
7-32 Adding process application monitor SMIT panel . . . . . . . . . . 443
7-33 Adding Application Availability Analysis SMIT panel . . . . . . . . . . 454
8-1 Enable automatic key distribution . . . . . . . . . . 462
8-2 Configure message authentication mode . . . . . . . . . . 464
8-3 Generate a key . . . . . . . . . . 465
8-4 Checking the message authentication settings . . . . . . . . . . 469
8-5 SMIT Extended Topology Configuration fast path . . . . . . . . . . 472
8-6 RSCT and PowerHA . . . . . . . . . . 475
8-7 Basic cluster communication overview . . . . . . . . . . 476
8-8 Cluster Security Services (CtSec) architecture . . . . . . . . . . 478
9-1 Example considerations for PowerHA with VIO . . . . . . . . . . 488
9-2 Example PowerHA configuration with VIO . . . . . . . . . . 490
9-3 Partition name and AIX host name matching . . . . . . . . . . 494
9-4 Enable remote command execution on the HMC . . . . . . . . . . 505
9-5 Defining HMC and Managed System to PowerHA . . . . . . . . . . 509
9-6 Defining application provisioning . . . . . . . . . . 512
9-7 DLPAR Resource acquisition . . . . . . . . . . 516
9-8 DLPAR resource release . . . . . . . . . . 517
9-9 First production LPAR fallover . . . . . . . . . . 518
9-10 Second production LPAR fallover . . . . . . . . . . 519
9-11 Fallover second production LPAR first . . . . . . . . . . 520
9-12 Results after second fallover . . . . . . . . . . 521
9-13 Resource group release and acquisition from rg_move . . . . . . . . . . 522
9-14 HMC redundancy test . . . . . . . . . . 523
9-15 Workload Partitions relationships . . . . . . . . . . 526
9-16 IBM Workload Partitions Manager for AIX . . . . . . . . . . 528
9-17 Relocation options . . . . . . . . . . 529
9-18 Relocation and transition of the WPAR . . . . . . . . . . 530


9-19 WPAR environment configured for high availability  533
10-1 Settling time scenario  538
10-2 Online Using Node Distribution policy scenario  541
10-3 Delayed fallback timer usage  547
12-1 Enhanced concurrent volume group example  577
12-2 Passive mode volume group status  583
12-3 Example of how PowerHA determines volume group type  584
12-4 Adding diskhb network  590
12-5 Adding individual diskhb communication devices  590
12-6 Main multi-node disk heartbeat menu  592
12-7 Free disks for multi-node disk heartbeat  592
12-8 Create a multi-node disk heartbeat final menu  594
12-9 Disk heartbeat communications test  595
12-10 Monitoring diskhb  596
12-11 Enabling fast failure detection  598
13-1 EtherChannel and PowerHA test environment  602
13-2 Ethernet adapter settings  604
13-3 Add EtherChannel menu  605
13-4 Configure IP to EtherChannel device  606
13-5 EtherChannel configuration cllsif output  607
13-6 EtherChannel resource group  607
13-7 Single adapter network warning  608
13-8 EtherChannel errors in AIX error report  608
13-9 Add Site  614
13-10 Add network  615
13-11 Add a site specific service IP label  616
13-12 Add a resource group  617
13-13 Add service IP into resource group  618
14-1 Example of a PowerHA/XD Metro Mirror configuration  637
14-2 GLVM example  639
15-1 LVM cross-mirrored cluster testing environment  645
15-2 Adding a site in the topology configuration  646
15-3 Add a volume to a shared volume group  662
15-4 Select physical volumes to add logical volumes on  663
15-5 Add a shared logical volume with superstrict allocation policy  664
15-6 Mirrored logical volume partition mapping  665
15-7 Shared logical volume pop-up list  666
15-8 Increase the size of a shared logical volume  667
15-9 Add a file system to a previously defined cross-site logical volume  669
15-10 Increase the size of Shared Enhanced Journaled File System  671
16-1 PowerHA/XD SVC scenario  674
16-2 SVC Vdisk unique ids  677
16-3 Adding sites in the topology configuration  681


16-4 Add XD_ip network  682
16-5 Adding an SVC cluster  683
16-6 Adding an SVC PPRC relationship  685
16-7 Add SVC PPRC replicated resource  686
16-8 Add a resource group  693
16-9 Add SVC replicated resources into resource group  695
17-1 RPV client viewed from Node1  701
17-2 Reverse configuration on fallover to remote site  702
17-3 RPV client and server configuration on both sites  703
17-4 RPV server disk picklist  708
17-5 Add RPV servers menu  709
17-6 Add RPV client first menu  710
17-7 Add RPV clients to local node  711
17-8 Change RPV servers into defined state  712
17-9 Add RPV clients to remote node  714
17-10 Create volume group  716
17-11 Change volume group to disable quorum  717
17-12 Add logical volumes  718
17-13 Adding a file system  719
17-14 Add remote site mirror copy to a logical volume  721
17-15 Node active at primary site with backup down  730
17-16 Backup site active and data replicating  731
17-17 Backup node active with primary site down  732
17-18 Primary node integrated into the cluster and replication started  733
17-19 Application active on primary with 2 PV and one RPV client  734
17-20 Primary site down and application using 1 local copy of VG  735
17-21 Application did not fallback on integration of primary site  736
17-22 Application falls over on primary site integration  737
17-23 Example HACMP/XD cluster for migration to GLVM  757
17-24 Example with one GMD resource  758
17-25 RPV server created  759
17-26 Mirror data LVs to RPV client  760
17-27 Application now using GLVM devices  761



Tables

1-1 Types of availability solutions  5
1-2 Single points of failure  9
1-3 AIX level requirements  16
2-1 Base IP addresses  41
2-2 Configured heartbeat over alias IP addresses (base 198.10.1.1)  42
2-3 Characteristics of RAID levels widely used  63
2-4 Resource group attributes and how they affect RG behavior  81
2-5 PowerHA limits  92
3-1 Single points of failure  111
3-2 Cluster overview  114
3-3 Cluster hardware  115
3-4 AIX and RSCT levels  116
3-5 Cluster software  125
3-6 Cluster Ethernet networks  157
3-7 Point-to-point networks  158
3-8 Cluster communication interfaces and IP addresses  158
3-9 Shared disks  164
3-10 Shared volume groups  164
3-11 Application worksheet  169
3-12 Application monitoring worksheet  171
3-13 Resource group behavior  172
3-14 Resource groups worksheets  175
3-15 Sample test plan  178
5-1 Supported release upgrades to PowerHA 5.5  242
5-2 HACMP cluster version ODM stanzas  266
7-1 Cross-reference of users, resource groups and nodes  338
7-2 Cross-reference of groups, resource groups and nodes  344
9-1 Profile characteristics  499
9-2 Application requirements  499
9-3 LPAR characteristics for node Longhorn  501
9-4 Application requirements for App1  501
9-5 Partition profile settings  515
9-6 Application server DLPAR settings  515
10-1 Resource group attribute behavior relationships  535
12-1 dhb_read usage examples  595
17-1 RPV names used in our environment  727
A-1 Two-node Cluster Configuration Assistant  776
A-2 Cluster Ethernet networks  777


A-3 Network interface worksheet  778
A-4 Cluster serial networks  779
A-5 Fibre Channel disks worksheet  780
A-6 Shared volume group and file system worksheet  781
A-7 NFS export file system or directory worksheet  782
A-8 Application worksheet  783
A-9 Application server worksheet  784
A-10 Application server worksheet  785
A-11 Resource groups  786
A-12 Cluster event worksheet  788
A-13 Cluster file collections worksheet  789


Examples

2-1 Checking DNP values as known to clstrmgrES  82
2-2 Modifying and checking RG dependencies  86
3-1 Network sensitivity for Ether type network  156
4-1 Creating an enhanced capable volume group using SMIT  207
4-2 Creating a shared logical volume using SMIT  208
4-3 Creating a shared jfs2 file system using SMIT  210
4-4 Running the Two-Node Cluster Configuration Assistant  212
4-5 Discovered and pre-defined network types  215
4-6 Adding an IP-based network  216
4-7 Adding discovered communication devices  217
4-8 Adding pre-defined communication interfaces  218
4-9 Cluster verification and synchronization  218
4-10 Selecting a disk accessible to all cluster nodes  220
4-11 Selecting the type of the shared volume group  220
4-12 Creating a resource group when creating a shared volume group  221
4-13 Successful creation of a shared volume group  222
4-14 The newly-defined resource group shows up as a cluster resource  222
4-15 IHS directory contents  226
4-16 IHS installation executable  226
4-17 Client filesets for WebSMIT  228
5-1 Normal cluster status prior to starting non-disruptive upgrade  249
5-2 Cluster status with one node in the unmanaged state  250
5-3 clstrmgrES state after upgrade of PowerHA filesets  250
5-4 Cluster resource groups before starting migration  251
5-5 Recording cluster name, version, and ID before starting the migration  252
5-6 AIX and RSCT levels on cluster node xdsvc2  253
5-7 Cluster resource groups after node xdscv2 failed over  253
5-8 AIX, RSCT, and HACMP levels on cluster node xdsvc2  254
5-9 AIX, RSCT, and HACMP levels on cluster node xdsvc1  255
5-10 Resource groups states, HACMP log messages, and ODM entry following start of the HACMP services on cluster node xdsvc1  256
5-11 Snapshot migration error#1  259
5-12 Snapshot migration error#2  259
5-13 Snapshot migration success  260
6-1 Checking the .toc prior to installing fixes  290
7-1 How to list which files are included in an existing file collection  331
7-2 Importing AIX volume groups manually  354
7-3 Adding a scalable enhanced concurrent VG  362


7-4 Create a new concurrent volume group and concurrent resource group  363
7-5 Output from concurrent VG and concurrent RG creation  364
7-6 Create shared volume group and enable cross-site LVM mirroring  364
7-7 C-SPOC creating new LV - 1  365
7-8 C-SPOC creating new LV - 2  365
7-9 C-SPOC creating jfs2 file system on a previously defined LV  366
7-10 Increase the size of a shared LV - LV selection  367
7-11 Increase the size of a shared LV - Disk selection  368
7-12 Increase the size of a shared LV - Complete and lsfs -q output  369
7-13 Increase the size of a file system  369
7-14 C-SPOC change file system selection  370
7-15 C-SPOC change file system - options  371
7-16 C-SPOC remove a file system  372
7-17 clstat -o command output  429
7-18 clshowres -v command output  432
7-19 lssrc -ls topsvcs command output  433
7-20 clRGinfo command output  436
7-21 Application monitor startup monitor  446
7-22 Output from resuming application monitor  452
7-23 Application availability analysis tool output  455
8-1 Successfully enabling key distribution  463
8-2 Successful key distribution  465
8-3 Successful key activation  466
8-4 Successfully disabling key distribution  467
8-5 Available MPMs on a cluster node  479
8-6 Location of MPMs  480
8-7 IBM.HostPublic resource and its privileges  482
9-1 Install SSH  503
9-2 SSH to HMC  505
9-3 Scp authorized_keys2 file to HMC  506
9-4 Test no-password ssh access  507
9-5 HMC unreachable during verification  513
9-6 Node not DLPAR capable verification error  513
9-7 Verify LPAR is DLPAR capable  513
9-8 Adding the WPAR to the hawp1 resource group  531
10-1 Displaying the RG settling time  537
10-2 Checking settling time in /var/hacmp/log/hacmp.out  539
10-3 Configuring resource group node-based distribution policy  540
10-4 Adding a resource group using DNP  543
10-5 Selecting the dynamic node priority policy to use  543
10-6 Displaying DNP policy for a resource group  544
10-7 Querying resource monitors  545
10-8 DNP values maintained by cluster manager  546


10-9 Assigning a fallback timer policy to a resource group  549
10-10 Displaying resource groups having fallback timers  549
10-11 Displaying fallback timers using ODM queries  550
10-12 Displaying resource group dependencies  558
10-13 Displaying resource group dependencies using ODM queries  558
11-1 Sample list of automatic error notifications  568
11-2 Defining a user-defined event  574
12-1 Diskhb warning  591
12-2 Free disks to be used for multi-node disk heartbeat  591
15-1 lsdev -Cc disk and lspv output on both nodes  647
15-2 Site selection in the Add Site/Disk Definition SMIT menu  649
15-3 Disk selection in the Add Site/Disk Definition SMIT menu  649
15-4 Add Disk/Site Definition for the second site  650
15-5 Selecting the disks for first volume group creation  651
15-6 Volume group type picklist  651
15-7 Create cross-site LVM volume group  652
15-8 Selecting the disks for second volume group creation  653
15-9 Create second cross-site LVM volume group  653
15-10 lspv output on each node after volume group creation  654
15-11 Resource group settings before change for cross-site LVM mirroring  655
15-12 Resource groups after setting forced varyon to true  657
15-13 hdisk and logical volume status after one storage subsystem failure  659
15-14 hdisk and logical volume status after the storage reintegration  660
15-15 Final Add a JFS2 SMIT menu  670
16-1 Additional SVC replicated resource groups created  695
17-1 PowerHA/XD/GLVM filesets  705
17-2 PowerHA/XD base topology  706
17-3 PowerHA/XD/GLVM base resource group  707
17-4 GLVM beginning disk configuration  707
17-5 Display RPV servers state and attributes  709
17-6 Display RPV client state and attributes  711
17-7 RPV servers available  713
17-8 RPV clients available on local node for vg creation  715
17-9 Local volume groups active before creating GLVM copies  720
17-10 RPV server, RPV client verification before importing volume group  722
17-11 GMVG verification on remote node  723
17-12 Verify GMVG active and file systems mounted  725
17-13 gmvgstat and rpvstat output  726
17-14 GMVG resource group  728
17-15 Output of lsvg -p checking for device state ‘active’  740
17-16 Assign local PVs to mirror pool  740
17-17 Assign remote PVs to mirror pool  741
17-18 Change VG attributes - preparation for async mirroring  741


17-19 lsvg output - preparation for async mirroring  742
17-20 Logical volume status after disabling bad block relocation  743
17-21 Assign LV copies to mirror pools  743
17-22 Add AIO cache logical volumes  744
17-23 Configure mirror pools for asynchronous mirroring - first pool  746
17-24 Configure mirror pools for asynchronous mirroring - second pool  746
17-25 List mirror pools for each VG to show async mirror state  747
17-26 Output of rpvstat -A and rpvstat -C  747
17-27 Update RG to handle async GMVGs  748
17-28 Check cluster and mirror pool status prior to primary site failure  749
17-29 Cluster and mirror pool status after primary site failure  750
17-30 lsvg -M output  753
17-31 Invalid AIO cache LV shown with lsmp command output  754
17-32 errpt entries for invalid AIO cache LV  754
17-33 Check RPV server characteristics  763
17-34 RPV error sample  764
17-35 Adding a RPV server  766
17-36 Adding a RPV server - using CLI  766
17-37 Listing the RPV servers  766
17-38 Adding the RPV client  767
17-39 Adding RPV client, using CLI  767
17-40 Listing of the physical volumes defined on thor  768
17-41 Changing the logical volumes  768
17-42 Using lsvg to query the status of the logical volume mirrors  769
17-43 Changing the file systems for working with the logical volumes  770
17-44 Converting the HAGEO network into an XD_data network  771
17-45 Defining the resource group in PowerHA  771


Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurement may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:

AIX 5L™, AIX®, DB2 Universal Database™, DB2®, DS4000®, DS6000™, DS8000®, Enterprise Storage Server®, GPFS™, HACMP™, IBM®, NetView®, Power Systems™, POWER4™, POWER5™, POWER6™, PowerHA™, PowerVM™, POWER®, pSeries®, Redbooks®, Redbooks (logo)®, RS/6000®, System p®, System Storage™, Tivoli®, WebSphere®, Workload Partitions Manager™

The following terms are trademarks of other companies:

InfiniBand, and the InfiniBand design marks are trademarks and/or service marks of the InfiniBand Trade Association.

Snapshot, and the NetApp logo are trademarks or registered trademarks of NetApp, Inc. in the U.S. and other countries.

Oracle, JD Edwards, PeopleSoft, Siebel, and TopLink are registered trademarks of Oracle Corporation and/or its affiliates.

Interchange, and the Shadowman logo are trademarks or registered trademarks of Red Hat, Inc. in the U.S. and other countries.

Java, JRE, Sun, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Expression, Internet Explorer, Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.


Preface

This IBM® Redbooks® publication will help you install, tailor, and configure the new PowerHA™ Version 5.5, and understand new and improved features such as WebSMIT gateway, non-disruptive migrations, C-SPOC enhancements, and Disaster Recovery (DR) configurations, such as GLVM in asynchronous mode.

This publication provides a broad understanding of the PowerHA and PowerHA Extended Distance (PowerHA/XD) architecture. If you plan to install, migrate, or administer a high availability cluster, this book is right for you. Disaster recovery requirements, and how PowerHA addresses them, are also presented in detail.

This cookbook is designed to help AIX® professionals who are seeking a comprehensive and task-oriented guide for developing the knowledge and skills required for PowerHA cluster design and implementation, as well as for daily system administration. It provides a combination of theory and practical experience.

This book will be especially useful for system administrators currently running PowerHA and/or PowerHA Extended Distance (XD) clusters who might want to consolidate their environment and move to the new PowerHA Version 5.5. There are detailed descriptions of migration tasks to PowerHA 5.5, including non-disruptive migrations, and a comprehensive discussion about how to migrate from synchronous GLVM to asynchronous GLVM.

The team that wrote this book

This book was produced by a team of specialists from around the world working at the International Technical Support Organization, Poughkeepsie Center.

Shawn Bodily is a Certified Consulting IT Specialist for Advanced Technical Support Americas located in Dallas, Texas. He has worked for IBM for eleven years, and has thirteen years of AIX experience and ten years specializing in HACMP™. He is HACMP and ATE certified in both V4 and V5. He has written and presented on high availability and storage. Also, he has co-authored three IBM Redbooks publications.

Rosemary Killeen is a Level 1 accredited IT Specialist with IBM based in the UK. She is a member of the European Virtual Front End IBM Support and Collaboration Teams. Rosie has worked for IBM for twelve years and has six years of experience working with High Availability solutions on Power Systems™. Her areas of expertise include Power, AIX, LVM and HACMP/PowerHA. Rosie is an IBM Certified Systems Expert with AIX 5L™ and HACMP. She is also a HACMP/PowerHA technical instructor for IBM Training.

Liviu Rosca is a Senior IT Specialist with IBM Global Technology Services Romania. He has been working for IBM for seven years providing support for System p®, AIX, HACMP and WVR. His areas of expertise include pSeries®, AIX, HACMP, networking, security, and telecommunications. He is an IBM Certified AIX 5L and HACMP System Administrator as well as CCNP. He teaches AIX and HACMP classes, and has co-authored five other IBM Redbooks publications.

The Project Leader who managed the production of this publication was: Scott Vetter, PMP.

Our IBM Redbooks publications team thanks the following authors of the IBM Redbooks publication, Implementing High Availability Cluster Multi-Processing (HACMP) Cookbook, SG24-6769, on which our own book was based:

Shawn Bodily, Maria-Katharina Esser, Michael Herrera, Octavian Lascu, Patrick Pothier, Dusan Prelec, Dino Quintero, Ken Raymond, Viktor Sebesteny, Andrei Socoliuc, Antony (Red) Steel

Thanks also to the following people for their contributions to this project:

Dave Bennin, Don Brennan, Patrick Buah, Mike Coffey, Rich Conway, Paul Moyer, Skip Russell, Steve Tovcimak
IBM Poughkeepsie

Brandon Boles, Casey Brotherton, Loel Graber, Astrid Jaehde, Minh Pham, Gary Lowther, Alex McLeod, Ninad Palsule, Gus Schlachter, Tom Weaver
IBM Austin

Bill Miller, Glenn E. Miller, James Nash
IBM U.S.A.

Shane Brandon
IBM Australia

Philippe Hermes
IBM France

Claudio Marcantoni
IBM Italy

SeongLul Son
IBM Korea


James Lee, Jon Tate, Jim Wood, Andrew Young
IBM U.K.

Henning Gammelmark
Bankdata

Become a published author

Join us for a two- to six-week residency program! Help write a book dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You will have the opportunity to team with IBM technical professionals, Business Partners, and Clients.

Your efforts will help increase product acceptance and customer satisfaction. As a bonus, you will develop a network of contacts in IBM development labs, and increase your productivity and marketability.

Find out more about the residency program, browse the residency index, and apply online at:

ibm.com/redbooks/residencies.html

Comments welcome

Your comments are important to us!

We want our books to be as helpful as possible. Send us your comments about this book or other IBM Redbooks publications in one of the following ways:

� Use the online Contact us review Redbooks form found at:

ibm.com/redbooks

� Send your comments in an e-mail to:

[email protected]

� Mail your comments to:

IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400


Part 1 Introduction

In Part 1, we provide an overview of PowerHA and describe the PowerHA components as part of a successful implementation.

Because PowerHA is a mature product, we consider it important to present some of the recent PowerHA history, which can help in planning future actions, such as migrating existing configurations to the latest version and exploiting the new features in PowerHA V5.4.1 and V5.5.

We also introduce the basic PowerHA management concepts, with recommendations and considerations to ease the system administrator’s job.

We cover the following topics:

• Introduction to PowerHA for AIX
• High availability components


Chapter 1. Introduction to PowerHA for AIX

In this chapter we provide an introduction to high availability in general and discuss the IBM PowerHA for AIX in particular.

We discuss the following topics:

• What is PowerHA for AIX
• Availability solutions: An overview
• History and evolution
• High availability terminology and concepts
• High availability versus fault tolerance
• Software planning
• PowerHA software installation


1.1 What is PowerHA for AIX

PowerHA for AIX is the IBM premier high availability solution for POWER® systems running AIX. It was formerly known as HACMP, and in this book you will see both terms used interchangeably. Versions prior to PowerHA 5.5 are referred to by their original name, HACMP.

1.1.1 High availability

In today’s complex environments, providing continuous service for applications is a key component of a successful IT implementation. High availability is one of the components that contributes to providing continuous service for the application clients, by masking or eliminating both planned and unplanned system and application downtime. A high availability solution ensures that the failure of any component of the solution, whether hardware, software, or system management, does not cause the application and its data to become permanently unavailable to the end user.

High availability solutions should eliminate single points of failure through appropriate design, planning, selection of hardware, configuration of software, control of applications, a carefully controlled environment, and change management discipline.

In short, we can define high availability as the process of ensuring, through the use of duplicated and/or shared hardware resources managed by a specialized software component, that an application is available for use.

1.1.2 Cluster multi-processing

In addition to high availability, PowerHA also provides the multi-processing component. The multi-processing capability comes from the fact that in a cluster there are multiple hardware and software resources managed by PowerHA to provide complex application functionality and better resource utilization.

A short definition for cluster multi-processing could be: multiple applications running over a number of nodes with shared or concurrent access to the data.

Although desirable, the cluster multi-processing component depends on the application capabilities and system implementation to efficiently use all resources available in a multi-node (cluster) environment. This must be implemented starting with the cluster planning and design phase.


PowerHA is only one of several high availability technologies. It builds on increasingly reliable operating systems, hot-swappable hardware, and increasingly resilient applications by adding monitoring and automated response.

A high availability solution based on PowerHA provides automated failure detection, diagnosis, application recovery, and node reintegration. With an appropriate application, PowerHA can also provide concurrent access to the data for parallel processing applications, thus offering excellent horizontal and vertical scalability (with the addition of the dynamic LPAR management capabilities).

IBM has also designed an extended version of PowerHA that provides integrated disaster recovery functionality: a solution known as PowerHA Extended Distance (PowerHA/XD), which extends PowerHA functionality across two geographic sites. PowerHA/XD supports a number of distinct methods for replicating the data and is discussed in detail in Chapter 14, “PowerHA Extended Distance concepts and planning” on page 633.

1.2 Availability solutions: An overview

There are many solutions that can provide a wide range of availability options. In Table 1-1, we describe different types of availability solutions and their characteristics.

Table 1-1 Types of availability solutions

Solution | Downtime measured in | Data availability | Observations
Standalone | Days | From last backup | Basic hardware and software costs ($)
Enhanced standalone | Hours | Until last transaction | Double basic hardware cost ($$)
High availability clusters | Minutes | Until last transaction | Double hardware and additional services ($$+)
Fault-tolerant computing | Never stops | No loss of data | Specialized hardware and software, very expensive ($$$$$$)
PowerHA/XD | Minutes | Until last transaction | Two-three times the hardware cost + additional communication ($$$$)


High availability solutions in general offer the following benefits:

• Standard hardware and networking components (can be used with the existing hardware)
• Work with just about any application
• Work with a wide range of disks and network types
• Excellent availability at reasonable cost

IBM’s high availability solution for POWER systems offers distinct benefits that include:

• Proven solution (more than 18 years of product development)
• Flexibility (virtually any application running on a standalone AIX system can be protected with PowerHA)
• Use of off-the-shelf hardware components
• Proven commitment to supporting our customers

When planning to implement a PowerHA solution, the following aspects must be considered:

• Thorough HA design and detailed planning from end to end
• Elimination of single points of failure
• Selection of appropriate hardware
• Correct implementation (do not take “shortcuts”)
• Disciplined system administration practices and change control
• Documented operational procedures
• Comprehensive test plan and thorough testing

A typical PowerHA environment is shown in Figure 1-1. Although a serial network is shown, and is still supported, most implementations today provide non-IP heartbeating over shared disks (disk heartbeat networks). More information can be found in 12.6, “Disk heartbeat” on page 586; a quick way to test such a disk heartbeat path is sketched after Figure 1-1.


Figure 1-1 PowerHA cluster
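As an early taste of the non-IP heartbeat concept, the RSCT dhb_read utility (covered in Chapter 12) can be used to confirm that two nodes really can exchange heartbeats over a shared disk before the cluster is made to depend on it. The following is a minimal sketch only; hdisk2 is an assumed disk name for the intended diskhb device, and the utility path can vary by RSCT level.

# On the first node, put the candidate heartbeat disk into receive mode
/usr/sbin/rsct/bin/dhb_read -p hdisk2 -r

# On the second node, transmit heartbeats over the same shared disk
/usr/sbin/rsct/bin/dhb_read -p hdisk2 -t

When the path is healthy, both commands should report that the link is operating normally.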

1.2.1 Downtime

Downtime is the period when an application is not available to serve its clients. We can classify downtime into two categories:

• Planned:
  – Hardware upgrades
  – Repairs
  – Software updates/upgrades
  – Backups (offline backups)
  – Testing (periodic testing is required for cluster validation)
  – Development

• Unplanned:
  – Administrator errors
  – Application failures
  – Hardware failures
  – Operating system errors
  – Environmental disasters

Thus the role of PowerHA is to maintain application availability through both unplanned outages and the normal day-to-day administrative requirements. PowerHA provides monitoring and automatic recovery of the resources on which your application depends.
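As a first look at that monitoring, the following sketch shows two commonly used PowerHA status commands. The paths shown are the usual install locations under /usr/es/sbin/cluster and should be verified on your systems; both commands are covered in detail in Chapter 7.

# Show the state and current location of each resource group
/usr/es/sbin/cluster/utilities/clRGinfo

# Print a one-time snapshot of cluster, node, and interface status
/usr/es/sbin/cluster/clstat -o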

1.2.2 Single point of failure (SPOF)

A single point of failure (SPOF) is any individual component integrated in a cluster which, in case of failure, renders the application unavailable for end users.

Good design can remove single points of failure in the cluster—nodes, storage, and networks. PowerHA manages these, as well as the resources required by the application (including the application start/stop scripts).

Ultimately, the goal of any IT solution in a critical environment is to provide continuous application availability and data protection. High availability is just one building block in achieving the continuous operation goal, and it is based on the availability of the hardware, the software (operating system and its components), the application, and the network components.

In order to avoid single points of failure, you need the following (a few quick checks are sketched after this list):

• Redundant servers
• Redundant network paths
• Redundant storage (data) paths
• Redundant (mirrored/RAID) storage
• Monitoring
• Failure detection and diagnosis
• Automated application fallover
• Automated resource reintegration
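A few standard AIX commands give a quick first pass at spotting some of these single points of failure. This is only a sketch; datavg is an assumed shared volume group name, and these checks complement, rather than replace, the planning worksheets discussed later in this book.

# Is more than one network interface available for the cluster networks?
netstat -in

# Are the logical volumes in the shared volume group mirrored (PPs greater than LPs)?
lsvg -l datavg

# How many physical volumes back the shared volume group?
lsvg -p datavg

# Are redundant Ethernet and Fibre Channel adapters present?
lsdev -Cc adapter | grep -E "^(ent|fcs)"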

As previously mentioned, a good design is able to avoid single points of failure, and PowerHA can manage the availability of the application through downtime. Table 1-2 lists each cluster object whose failure can result in loss of availability of the application. Each cluster object can be a physical or logical component.


Table 1-2 Single points of failure

Cluster object | Single point of failure eliminated by
Node (servers) | Multiple nodes
Power supply | Multiple circuits and/or power supplies and/or UPS
Network adapter | Redundant network adapters
Network | Multiple networks connected to each node, redundant network paths with independent hardware between each node and the clients
TCP/IP subsystem | Use of non-IP networks to connect each node to its neighbor in a ring
I/O adapter | Redundant I/O adapters
Controllers | Use of redundant controllers
Storage | Redundant hardware, enclosures, disk mirroring / RAID technology, redundant data paths
Application | Configuring application monitoring and backup node(s) to acquire the application engine and data
Sites | Use of more than one site for disaster recovery
Resource groups | Use of resource groups to control all resources required by an application

PowerHA also optimizes availability by allowing for dynamic reconfiguration of running clusters. Maintenance tasks such as adding or removing nodes can be performed without stopping and restarting the cluster.

In addition, other management tasks, such as modifying storage and managing users, can be performed on the running cluster using the Cluster Single Point of Control (C-SPOC) without interrupting user access to the application running on the cluster nodes. C-SPOC also ensures that changes made on one node are replicated across the cluster in a consistent manner.
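For orientation, the sketch below shows how C-SPOC is usually reached. The smitty cl_admin fast path opens the C-SPOC (System Management) menus; the directory listed for the cl_* command-line wrappers is their typical location, but the exact set of commands varies by release. C-SPOC is covered in detail in Chapter 7.

# Open the C-SPOC (System Management) SMIT menus
smitty cl_admin

# List the C-SPOC command-line wrappers shipped with your level
ls /usr/es/sbin/cluster/sbin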

1.3 History and evolution

IBM High Availability Cluster Multi-Processing goes back to the early 1990s. HACMP development started in 1990 to provide high availability solutions for applications running on RS/6000® servers. We do not provide information about the very early releases, which are no longer supported or were not in use at the time of writing. Instead, we provide highlights about the most recent versions.


Originally designed as a standalone product (known as HACMP classic), HACMP adopted the IBM high availability infrastructure known as Reliable Scalable Cluster Technology (RSCT) when it became available, and became HACMP Enhanced Scalability (HACMP/ES), which provides performance and functional advantages over the classic version. Starting with HACMP V5.1, there is no longer a classic version.

1.3.1 HACMP Version 5 Release 4

Released in July 2006, HACMP V5.4 continued the development of HACMP by adding further improvements in the management, configuration simplification, automation, and performance areas. Here we summarize the improvements in HACMP V5.4:

• Common smart assist framework
• Enhanced Oracle® Smart Assist
• WebSMIT improvements
• Cluster test tool enhancements through the addition of multiple events
• Resource group movement enhancements (elimination of POL)
• Unmanage resource group option
• Non-disruptive install (no reboot required)
• Non-disruptive cluster startup (using the option to manually manage resource groups)
• Non-disruptive maintenance and upgrades
• Cluster verification enhancements
  – Consolidation of error and warning messages
  – Status reporting on cluster components
  – Additional non-IP network validation
• Fast Failure Detection (utilizes disk heartbeat)
• GPFS™ 2.3 support
• Site specific IP address
• Additional HACMP/XD enhancements
  – IP address takeover on geographic networks
  – Intermix of ESS 800 and DS6000/DS8000 Metro Mirror
  – Enhanced concurrent volume group support for GLVM
  – Up to four data networks for GLVM
  – GEO_primary and GEO_secondary network types removed


1.3.2 HACMP Version 5 Release 4.1

Released in November 2007, HACMP Version 5.4.1 continues to add features and functionality, such as the following:

• Cluster verification progress indicator
• Cluster log file enhancements
• New first failure data capture
• Heartbeat metrics (using the cltopinfo -m command; see the example after this list)
• SMIT panel enhancements
  – Fast disk takeover
  – Clarify snapshot menu options
  – Import from online planning worksheets
• WebSMIT enhancements
  – More common industry icons and status
  – Color customization
  – Easier installation and configuration
  – Verbose logging option
  – Additional online help and documentation
• NFSv4 support
  – Configuration assistant
  – Application server
  – Application monitor
• WPAR support
• Multi-node disk heartbeat (mndhb)
• HACMP/XD enhancements
  – GLVM status monitoring
  – Consistency group support for ESS800/DS6000/DS8000® Metro Mirroring

1.3.3 PowerHA Version 5 Release 5

Released in November 2008, PowerHA version 5.5 continues to add robust features and functionality for high availability and disaster recovery:

• IPv6 support (requires AIX 6.1)
• DLPAR support for POWER6™ systems
• SMIT panel improvements
  – Choose or create a resource group when adding a new volume group
  – Persistent address added to show cluster topology (cllscf)
  – Multi-node disk heartbeat support for vpath/sdd devices
  – List users allowed to change passwords
• WebSMIT enhancements
  – New gateway design moves HTTP away from cluster nodes
  – Monitor and manage multiple clusters using the enterprise view
• Smart Assists updated for version currency
• First version to utilize non-disruptive upgrades
• PowerHA/XD enhancements
  – GLVM asynchronous mirroring capability (requires AIX 6.1 and 5.5 SP1)
  – SAN Volume Controller (SVC) support of Global Mirror

1.4 High availability terminology and concepts

To understand the correct functionality of PowerHA and to utilize it effectively, it is necessary to understand some important terms and concepts.

1.4.1 Terminology

Starting in HACMP V5.1, the terminology used to describe HACMP configuration and operation has changed dramatically. The reason for this change is to simplify the overall usage and maintenance of HACMP, and also to align the terminology with the IBM product line.

For example, in previous HACMP versions, the term Adapter, depending on the context, could have different meanings, which made configuration confusing and difficult.

The following terms are used throughout this book:

Cluster Loosely-coupled collection of independent systems (nodes) or Logical Partitions (LPARs) organized into a network for the purpose of sharing resources and communicating with each other.

HACMP defines relationships among cooperating systems where peer cluster nodes provide the services offered by a cluster node should that node be unable to do so. These individual nodes are together responsible for maintaining the functionality of one or more applications in case of a failure of any cluster component.


Node An IBM Power (p, i, or blade) system (or LPAR) running AIX and HACMP/PowerHA that is defined as part of a cluster. Each node has a collection of resources (disks, file systems, IP addresses, and applications) that can be transferred to another node in the cluster in case the node or a component fails.

Clients A client is a system that can access the application running on the cluster nodes over a local area network. Clients run a client application that connects to the server (node) where the application runs.

1.4.2 Concepts

The basic concepts of PowerHA can be classified as follows:

Topology Contains the basic cluster components: nodes, networks, communication interfaces, communication devices, and communication adapters.

Resources Logical components or entities that are being made highly available (for example, file systems, raw devices, service IP labels, and applications) by being moved from one node to another. All resources that together form a highly available application or service, are grouped together in resource groups (RG).

PowerHA keeps the RG highly available as a single entity that can be moved from node to node in the event of a component or node failure. Resource groups can be available from a single node or, in the case of concurrent applications, available simultaneously from multiple nodes. A cluster can host more than one resource group, thus allowing for efficient use of the cluster nodes (thus the “Multi-Processing” in HACMP).

Service IP label A label that corresponds to a service IP address and is used for communications between clients and the node. A service IP label is part of a resource group, which means that HACMP can monitor it and keep it highly available.

IP address takeover (IPAT)
The process whereby an IP address is moved from one adapter to another adapter on the same logical network. This adapter can be on the same node, or another node in the cluster. If aliasing is used as the method of assigning addresses to adapters, then more than one address can reside on a single adapter.


Resource takeover This is the operation of transferring resources between nodes inside the cluster. If one component or node fails due to a hardware or operating system problem, its resource groups are moved to another node.

Fallover This represents the movement of a resource group from one active node to another node (backup node) in response to a failure on that active node.

Fallback This represents the movement of a resource group back from the backup node to the previous node, when it becomes available. This movement is typically in response to the reintegration of the previously failed node.

Heartbeat packet A packet sent between communication interfaces in the cluster, used by the various cluster daemons to monitor the state of the cluster components - nodes, networks, adapters.

RSCT daemons These consist of two types of processes (topology and group services) that monitor the state of the cluster and each node. The cluster manager receives event information generated by these daemons and takes corresponding (response) actions in case of failure(s).

Group leader The node with the highest IP address as defined in one of the PowerHA networks (the first network available), that acts as the central repository for all topology and group data coming from the RSCT daemons concerning the state of the cluster.

Group leader backup This is the node with the next highest IP address on the same arbitrarily chosen network, that acts as a backup for the group leader. It takes over the role of group leader in the event that the group leader leaves the cluster.

Mayor A node chosen by the RSCT group leader (the node with the next highest IP address after the group leader backup), if such exists, else it is the group leader backup itself. It is the mayor’s responsibility to inform other nodes of any changes in the cluster as determined by the group leader.


1.5 High availability versus fault tolerance

Based on the response time and response action to system-detected failures, clusters and systems can be classified as:

• Fault-tolerant
• High availability

1.5.1 Fault-tolerant systems

Systems that provide fault tolerance are designed to operate virtually without interruption, regardless of the failure that might occur (except perhaps for a complete site outage due to a natural disaster). In such systems, all components, both hardware and software, are at least duplicated.

All components (CPUs, memory, and disks) have a special design and provide continuous service, even if one sub-component fails. Only specially designed software solutions can run on fault-tolerant hardware.

Such systems are very expensive and extremely specialized. Implementing a fault tolerant solution requires a lot of effort and a high degree of customization for all system components.

For environments where no downtime is acceptable (life critical systems), fault-tolerant equipment and solutions are required.

1.5.2 High availability systems

The systems configured for high availability are a combination of hardware and software components configured to work together to ensure automated recovery in case of failure with a minimal acceptable downtime.

In such systems, the software involved detects problems in the environment, and manages application survivability by restarting it on the same or on another available machine (taking over the identity of the original machine - node).

Thus, it is very important to eliminate all single points of failure (SPOF) in the environment. For example, if the machine has only one network interface (connection), a second network interface (connection) should be provided in the same node to take over in case the primary interface providing the service fails.

Another important issue is to protect the data by mirroring and placing it on shared disk areas accessible from any machine in the cluster.

The PowerHA software provides the framework and a set of tools for integrating applications in a highly available system.


Applications to be integrated in a PowerHA cluster can require a fair amount of customization, possibly both at the application level and at the PowerHA and AIX platform level. PowerHA is a flexible platform that allows integration of generic applications running on the AIX platform, providing for highly available systems at a reasonable cost.

It is important to remember that PowerHA is not a fault tolerant solution and should never be implemented as such.

1.6 Software planning

In the process of planning a PowerHA cluster, one of the most important steps is to choose the software levels to be running on the cluster nodes.

The decision factors in node software planning are:

� Operating system requirements: AIX version and recommended levels.

� Application compatibility: Ensure that all requirements for the applications are met, and supported in cluster environments.

� Resources: Types of resources that can be used (IP addresses, storage configuration, if NFS is required, and so on).

1.6.1 AIX level and related requirements

Before you install PowerHA, you must check the related software level requirements.

Table 1-3 shows the recommended PowerHA and AIX levels at the time this book was written.

Table 1-3 AIX level requirements

HACMP/PowerHA Version        AIX OS Level   Required APARs                       Minimum RSCT Level
HACMP V5.3                   5200-04        IY72082, IY72946, IY72928            2.3.6.0
HACMP V5.3                   5300-02        IY71500, IY72852, IY72916, IY72928   2.4.2.0
HACMP V5.3                   6100-00        IZ07791                              2.5.0.0
HACMP V5.4.x                 5200-08        IZ02620                              2.3.6.0
HACMP V5.4.x                 5300-04        IZ02620                              2.4.5.0
HACMP V5.4.x                 6100-00        IZ02620                              2.5.0.0
PowerHA V5.5                 5300-09        -                                    2.4.10
PowerHA V5.5                 6100-02-01     IZ31208                              2.5.2.0
PowerHA/XD V5.5              5300-09        -                                    2.4.10
PowerHA/XD V5.5              6100-02-01     IZ31208                              2.5.2.0
PowerHA/XD V5.5 GLVM Async   6100-02-03     IZ31208, IZ31205, IZ31207            2.5.2.0

For the latest list of recommended service packs for PowerHA, access the IBM Web site at:

http://www14.software.ibm.com/webapp/set2/sas/f/hacmp/home.html

The following AIX base operating system (BOS) components are prerequisites for PowerHA:

• bos.adt.lib
• bos.adt.libm
• bos.adt.syscalls
• bos.net.tcp.client
• bos.net.tcp.server
• bos.rte.SRC
• bos.rte.libc
• bos.rte.libcfg
• bos.rte.libcur
• bos.rte.libpthreads
• bos.rte.odm
• bos.data
• bos.rte

When using the (enhanced) concurrent resource manager access, the following components are also required:

• bos.clvm.enh (as required by the LVM)
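To confirm that these prerequisite filesets are already present, you can query them with the lslpp command. The following is only a quick illustrative check of a few of the filesets listed above; extend the list to cover all of them:

/usr/bin/lslpp -L bos.adt.lib bos.adt.libm bos.adt.syscalls bos.rte.odm
/usr/bin/lslpp -L bos.clvm.enh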

Requirements for NFSv4

The cluster.es.nfs fileset that comes with the PowerHA installation medium installs the NFSv4 support for PowerHA, along with an NFS Configuration Assistant. To install this fileset, the following BOS NFS components must also be installed on the system.


For AIX Version 5.3:

• bos.net.nfs.server 5.3.7.0
• bos.net.nfs.client 5.3.7.0

For AIX Version 6.1:

• bos.net.nfs.server 6.1.2.0
• bos.net.nfs.client 6.1.2.0

Requirements for RSCT

Install the RSCT filesets before installing PowerHA. Ensure that each node has the same version of RSCT.

To determine if the appropriate filesets are installed and their level, issue the following commands:

/usr/bin/lslpp -l rsct.compat.basic.hacmp
/usr/bin/lslpp -l rsct.compat.clients.hacmp
/usr/bin/lslpp -l rsct.basic.rte

If these filesets are not present, install the appropriate version of RSCT.

Security fileset requirements

If you plan to use message authentication or encryption for HACMP communication between cluster nodes, the necessary filesets must be installed on each node.

These filesets are available on the AIX Expansion Pack and include:

• rsct.crypt.des for data encryption with DES message authentication
• rsct.crypt.3des for data encryption with Triple DES message authentication
• rsct.crypt.aes256 for data encryption with Advanced Encryption Standard (AES) message authentication

1.6.2 Licensing

Most software vendors require that you have a unique license for each application for each physical machine or per processor in a multi-processor (SMP) machine. Usually, the license activation code is entered at installation time.

However, in a PowerHA environment, in a takeover situation, if the application is restarted on a different node, you must make sure that you have the necessary activation codes (licenses) for the new machine; otherwise the application might not start properly.


The application might also require a unique node-bound license (a separate license file on each node).

Some applications also have restrictions with the number of floating licenses available within the cluster for that application. To avoid this problem, be sure that you have enough licenses for each cluster node, so the application can run simultaneously on multiple nodes (especially for concurrent applications).

For more current information about PowerHA licensing, refer to the PowerHA frequently asked questions online page located at:

http://www.ibm.com/systems/power/software/availability/aix/faq/index.html

1.7 PowerHA software installation

The PowerHA software provides a series of facilities that you can use to make your applications highly available. You must keep in mind that not all system or application components are protected by PowerHA.

For example, if all the data for a critical application resides on a single disk, and that specific disk fails, then that disk is a single point of failure for the entire cluster, and is not protected by PowerHA. AIX Logical Volume Manager or storage subsystem protection must be used in this case. HACMP only provides takeover of the disk on the backup node, to make the data available for use.

This is why PowerHA planning is so important, because your major goal throughout the planning process is to eliminate single points of failure. A single point of failure exists when a critical cluster function is provided by a single component. If that component fails, the cluster has no other way of providing that function, and the application or service dependent on that component becomes unavailable.

Also keep in mind that a well-planned cluster is easy to install, provides higher application availability, performs as expected, and requires less maintenance than a poorly planned cluster.

1.7.1 Checking for prerequisites

After you have finished your planning work sheets, verify that your system meets the requirements of PowerHA. Many potential errors can be eliminated if you make this extra effort. Refer back to Table 1-3 on page 16.


1.7.2 New installation

PowerHA can be installed using the AIX Network Installation Management (NIM) program, including the Alternate Disk Migration option. You must install the PowerHA filesets on each cluster node. You can install PowerHA filesets either by using NIM or from a local software repository.

Installation using an NIM server

We recommend using NIM, simply because it allows you to load the PowerHA software onto other nodes faster from the server than from other media. Furthermore, it is a flexible way of distributing, updating, and administering your nodes. It allows you to install multiple nodes in parallel and provides an environment for maintaining software updates. This is very useful and a time saver in large environments; for smaller environments, a local repository might be sufficient.

If you choose NIM, you need to copy all the PowerHA filesets onto the NIM server and define an lpp_source resource before proceeding with the installation.
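As a rough sketch (the directory, media path, and resource name are examples only), copying the filesets to the NIM master and defining the lpp_source could look like this:

mkdir -p /export/lpp_source/powerha55
cp /cdrom/installp/ppc/cluster.* /export/lpp_source/powerha55
nim -o define -t lpp_source -a server=master \
    -a location=/export/lpp_source/powerha55 powerha55_lppsource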

Installation from CD-ROM or hard disk

If your environment has only a few nodes, or if the use of NIM is more than you need, you can use CD-ROM installation, or create a local repository by copying the HACMP filesets locally and then using the exportfs command; this allows other nodes to access the data using NFS.
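A minimal sketch of setting up such a local repository follows; the directory and media path are examples, so adjust them to your environment:

mkdir -p /export/powerha55
cp /cdrom/installp/ppc/cluster.* /export/powerha55
echo "/export/powerha55 -ro" >> /etc/exports
exportfs -a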

1.7.3 Installing PowerHA

Before installing PowerHA, make sure you read the release notes in the /usr/es/sbin/cluster/ directory for the latest information about requirements or known issues:

Base PowerHA /usr/es/sbin/cluster/release_notes

PowerHA/XD /usr/es/sbin/cluster/release_notes_xd

Smart Assists /usr/es/sbin/cluster/release_notes_assist

More details on installing and configuring can be found in Chapter 4, “Installation and configuration” on page 199.

To install the PowerHA software on a server node, do the following steps:

1. If you are installing directly from the installation media, such as a CD-ROM or from a local repository, enter the smitty install_all fast path. SMIT displays the Install and Update from ALL Available Software panel.

2. Enter the device name of the installation medium or install directory in the INPUT device/directory for software field and press Enter.


3. Enter the corresponding field values.

To select the software to install, press F4 for a software listing, or enter all to install all server and client images. Select the packages you want to install according to your cluster configuration. Some of the packages might require prerequisites that are not available in your environment (for example, Tivoli® Monitoring).

The cluster.es and cluster.cspoc filesets (which contain the PowerHA run-time executable) are required and must be installed on all servers.

Make sure that you select Yes in the Accept new license agreements field. You must choose Yes for this item to proceed with installation. If you choose No, the installation might stop with a warning that one or more filesets require the software license agreements. You accept the license agreement only once for each node.

4. Press Enter to start the installation process.
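If you prefer the command line to SMIT, an equivalent installation can be started with the installp command. This is only a sketch; the device name and fileset list are examples and should match your installation media and the packages you selected:

installp -agXY -d /dev/cd0 cluster.es cluster.cspoc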

Post-installation steps

To complete the installation, do the following steps:

1. Verify the software installation by using the AIX lppchk command, and check the installed directories to see if the expected files are present.

2. Run the lppchk -v and lppchk -c cluster* commands. Both commands run clean if the installation is good; if not, use the proper problem determination techniques to fix any problems.

3. A reboot might be required if RSCT prerequisites have been installed since the last time the system was rebooted.

For more information about upgrading PowerHA, refer to Chapter 5, “Migrating a cluster to PowerHA V5.5” on page 241.

Tip: It is always good practice to install the latest PowerHA Service Pack at the time of installation. These can be downloaded from the following site:

http://www14.software.ibm.com/webapp/set2/sas/f/hacmp/home.html


Chapter 2. High availability components

PowerHA uses the underlying cluster topology—the nodes, networks, and storage—to keep the cluster resources highly available.

In this chapter, we discuss the following topics:

• PowerHA configuration data
• Software components
• Cluster topology
• Resources and resource groups
• Plug-ins
• Features (HACMP 5.1, 5.2 and 5.3)
• Limits
• Storage characteristics
• Shared storage configuration


2.1 PowerHA configuration data

There are two main components to the cluster configuration:

Cluster topology The topology describes the underlying framework—the nodes, the networks, and the storage. PowerHA uses this framework to keep the other main component, the cluster resources, highly available.

Cluster resources The resources are those components that PowerHA can move from node to node—for example, service IP labels, file systems, and applications.

When the cluster is configured, the cluster topology and resource information is entered on one node, a verification process is run, and the data is then synchronized out to the other nodes defined in the cluster. PowerHA keeps this data in its own Object Data Manager (ODM) classes on each node in the cluster.

While PowerHA can be configured or modified from any node in the cluster, it is good practice to perform administrative operations from one node. This keeps the PowerHA definitions consistent across the cluster and prevents configuration updates from multiple nodes, which might result in inconsistent data.

We recommend using the following basic steps for configuring your cluster:

1. Define the cluster and the nodes.

2. Discover the additional information (networks, disks).

3. Define the topology.

4. Verify and synchronize to check for errors.

5. Define the resources and resource groups.

6. Verify and synchronize.

AIX configuration

You should be aware that PowerHA makes some changes to the system when it is installed or started. We describe these changes in the following sections.

Installation changes

The following AIX configuration changes are made:

• Files modified:
  – /etc/inittab
  – /etc/rc.net
  – /etc/services
  – /etc/snmpd.conf
  – /etc/snmpd.peers
  – /etc/syslog.conf
  – /etc/trcfmt
  – /var/spool/cron/crontabs/root
• The hacmp group is added.
• Also, using cluster configuration and verification, the file /etc/hosts can be changed by adding or modifying entries.
• The following network options are set to “1” by RSCT topsvcs startup:
  – nonlocsrcroute
  – ipsrcrouterecv
  – ipsrcroutesend
  – ipsrcrouteforward
  – ip6forwarding
• The verification utility ensures that the value of each network option is consistent across all cluster nodes for the following settings (see the example after this list):
  – tcp_pmtu_discover
  – udp_pmtu_discover
  – ipignoreredirects
  – routerevalidate
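A simple way to check these settings is to display them with the no command on each node and compare the output. This is only an illustrative check, not a PowerHA tool:

for opt in tcp_pmtu_discover udp_pmtu_discover ipignoreredirects routerevalidate
do
    no -o $opt
done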

Tuning operating system parameters

In the past, tuning AIX for PowerHA was encouraged. However, we adopt the philosophy that the system should be tuned for the application, not for PowerHA. For example, if the system hangs for a while and PowerHA reacts, the system should be tuned so the application is unlikely to hang. While PowerHA can be tuned to be less sensitive, there are no general AIX tuning rules for PowerHA.

2.2 Software components

The following layered model describes the software components of a PowerHA cluster:

Application layer Any application that is made highly available through the services provided by PowerHA.

PowerHA layer Software that responds to changes within the cluster to ensure that the controlled applications remain highly available.


RSCT layer The daemons that monitor node membership, communication interfaces, and device health, and advise PowerHA accordingly.

AIX layer Provides support for PowerHA through the LVM layer, which manages the storage, and the TCP/IP layer, which provides communication.

LVM layer Provides access to storage and status information back to PowerHA.

TCP/IP layer Provides reliable communication, both node to node and node to client.

Figure 2-1 Software model of a PowerHA cluster

• The application layer can consist of:
  – Application code (programs, daemons, kernel extensions, and so on)
  – Application configuration data (files or binaries)
  – Application (customer) data (files or raw devices)
• The PowerHA layer consists of:
  – PowerHA code (binaries—daemons and executable commands, libraries, scripts)
  – PowerHA configuration (ODM, ASCII files)
  – PowerHA log files


  – Services:
    • Cluster communication daemon (clcomdES)
    • Cluster manager (clstrmgrES)
    • Cluster information daemon (clinfoES)

• The RSCT layer consists of:
  – RSCT code (binaries—daemons and commands, libraries, scripts)
  – Configuration files (binary registry and ASCII files)
  – Services:
    • Topology and group services (topsvcs and grpsvcs)
    • Resource monitoring and control (RMC)
• The AIX layer consists of:
  – Kernel, daemons, and libraries
  – Device drivers
  – Networking and TCP/IP layer
  – Logical volume manager (LVM)
  – Configuration files (ODM, ASCII)

2.3 Cluster topology

The cluster topology represents the physical view of the cluster and how hardware cluster components are connected using networks (IP and non-IP). To understand the operation of PowerHA, you need to understand the underlying topology of the cluster, the role each component plays and how PowerHA interacts. In this section we describe:

• PowerHA cluster
• Nodes
• Sites
• Networks
• Communication interfaces / devices
• Persistent node IP labels / addresses
• Network modules (NIMs)
• Topology and group services
• Clients

Figure 2-2 shows a typical cluster topology with:

• Three nodes
• Two IP networks (PowerHA logical networks) with redundant interfaces on each node
• Shared storage
• Point-to-point non-IP connections (serial) between nodes, configured as independent physical networks but connecting the nodes in a ring configuration

Figure 2-2 Example of cluster topology

PowerHA cluster

A name (up to 64 characters, [a-z], [A-Z], [0-9] or “_”, starting with an alpha) is assigned to the cluster. A cluster ID (number) is also associated with the cluster. PowerHA automatically generates a unique ID for the cluster. All heartbeat packets contain this ID, so two clusters on the same network should never have the same ID.

Cluster nodes

Nodes form the core of a PowerHA cluster. A node is a system running an image of the AIX operating system (standalone or a partition), PowerHA code, and application software. The maximum number of nodes supported in a PowerHA cluster is 32.


When defining the cluster node, a unique name must be assigned and a communication path to that node must be supplied (an IP address or a resolvable IP label associated with one of the interfaces on that node). The node name can be the hostname (short), a fully qualified name (hostname.domain.name), or any name up to 64 characters ([a-z], [A-Z], [0-9] or “_”), and must start with an alpha character.

The communication path is first used by PowerHA to confirm that the node can be reached, then used to populate the ODM on each node in the cluster after secure communications have been established between the nodes. However, after the cluster topology has been configured, PowerHA can use any interface to attempt to communicate between nodes in the cluster.

PowerHA no longer requires the hostname to be a resolvable IP label (for example, an address on one of the IP interfaces). For consistency, we recommend using the hostname, which also resolves to the persistent IP address associated with each node. However, this is not mandatory.

Sites

The use of sites is optional. They are designed for use in cross-site LVM mirroring and/or PowerHA/XD configurations. A site consists of one or more nodes grouped together at a given location. PowerHA supports a cluster divided into two sites. Site relationships also can exist as part of a resource group’s definition, but should be set to ignore if sites are not used.

It is possible to use sites outside PowerHA/XD and cross-site LVM mirroring, but appropriate methods or customization must be provided to handle site operations. If sites are defined, site events are run during node_up and node_down events.

Networks

In PowerHA, the term “network” is used to define a logical entity that groups the communication interfaces and devices used for communication between the nodes in the cluster, and for client access. The networks in PowerHA can be defined as IP networks and non-IP networks.

The following terms are used to describe PowerHA networking:

IP address The dotted decimal IP address.

IP label The label that is associated with a particular IP address as defined by the name resolution method (DNS or static using /etc/hosts).

Base IP label / address The default IP label / address that is set on the interface by AIX on startup. The base address of the interface.


Service IP label / address An IP label / address over which a service is provided. It can be bound to a single node or shared by multiple nodes. Although not part of the topology, these are the addresses that PowerHA keeps highly available.

Boot interface Earlier versions of PowerHA have used the terms “boot adapter” and “standby adapter” depending on the function. These have been collapsed into one term to describe any IP network interface that can be used by PowerHA to host a service IP label / address.

IP aliases An IP alias is an IP address that is added to an interface, rather than replacing its base IP address. This is an AIX function that is supported by PowerHA. However, PowerHA assigns to the IP alias the same subnet mask of the base IP address over which it is configured.

Logical network interface The name to which AIX resolves a port (for example, en0) of a physical network adapter.

PowerHA communication interfaces

A communication interface, or just interface, refers to the physical adapter that supports the TCP/IP protocol and is represented by an IP address. The network interfaces that are connected to a common physical network are combined into logical networks that are used by PowerHA.

Each interface is capable of hosting several IP addresses. When configuring a cluster, you define the IP addresses that PowerHA monitors using RSCT (base or boot IP addresses) and the IP addresses that PowerHA itself keeps highly available (the service IP addresses and persistent aliases).

Important: It is good practice to have all the above IP addresses defined in the /etc/hosts file on all nodes in the cluster. There is certainly no requirement to use fully qualified names. While PowerHA is processing network changes, the NSORDER variable is set to local (that is, pointing to /etc/hosts); however, it is also good practice to set this in /etc/netsvc.conf, as shown in the example that follows.
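A minimal /etc/netsvc.conf entry that makes name resolution consult the local /etc/hosts file before DNS looks like this (adjust it to your site's resolution policy):

hosts = local, bind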


PowerHA communication devices

PowerHA topology also includes point-to-point non-IP networks such as serial RS232, target mode SCSI, target mode SSA, and disk heartbeat connections. Both ends of a point-to-point network are AIX devices (as defined in the /dev directory), such as /dev/tty1, /dev/tmssa1, /dev/tmscsi1, and /dev/hdisk1.

These non-IP networks are point-to-point connections between two cluster nodes, and are used by RSCT for control messages and heartbeat traffic. These networks provide an additional protection level for the PowerHA cluster, in case the IP networks or the TCP/IP subsystem on the nodes fails.

Communication adapters and links

An X.25 adapter is used to provide a highly available communication link, and the following configurations can be defined as resources in PowerHA:

• SNA configured over LAN network adapters
• SNA configured over X.25 adapters
• Native X.25 links

PowerHA manages these links as part of resource groups, thus ensuring highly available communication links. In the event of a physical network interface failure, an X.25 link failure, or a node failure, the highly available communication link is migrated over to another available adapter on the same node, or on the takeover node (together with all the resources in the same resource group).

Physical and logical networks

A physical network connects two or more physical network interfaces. There are many types of physical networks, and PowerHA broadly categorizes them as IP-based and non-IP networks:

• TCP/IP-based, such as Ethernet or Token Ring
• Device-based, such as RS-232, target mode SCSI (tmscsi), target mode SSA (tmssa), or disk heartbeat

PowerHA, like AIX, has the concept of logical networks. Two or more network interfaces on one physical network can be grouped together to form a logical network. These logical networks are known by a unique name (for example net_ether_01 if assigned by PowerHA) and can consist of one or more subnets. A logical network can be viewed as the group of interfaces used by PowerHA to host one or more service IP labels / addresses. RSCT forms its own networks connecting interfaces on the same subnet, and if needed can provide temporary routing between the subnets.


Network definitions can be added using the SMIT panels; however, we recommend using the discovery process before starting to configure your networks. Running the discovery process populates pull-down lists that can be used in the configuration process.

The discovery process harvests information from the /etc/hosts file, defined interfaces, defined adapters, target mode devices, and existing enhanced concurrent mode disks and creates the following files:

clip_config Contains details of the discovered interfaces, used in the F4 SMIT lists.

clvg_config Contains the details of each physical volume (PVID, volume group name, status, major number, and so on) as well as a list of free major numbers.

Running discovery can also reveal any inconsistency in the network at your site.

Global network

A global network is a collection of multiple PowerHA networks of the same type, for example, Ethernet. As discussed before, the PowerHA logical networks can be composed of any combination of physically different networks, or different subnets. It is important for PowerHA to know whether a single network on a node has failed, or if there is a global network failure. In a global failure, there is nothing to be gained by moving a resource group to another node.

2.3.1 RSCT and PowerHA heartbeating

The PowerHA cluster manager uses a variety of sources to get information about possible failures:

• RSCT monitors the state of the network interfaces and devices.
• AIX LVM monitors the state of the disks, logical volumes, and volume groups.
• PowerHA application monitors track the state of the applications.

PowerHA, like many other types of clusters, uses heartbeat keep alive (KA) packets to monitor the availability of network interfaces, communication devices, and IP labels (service, non-service, and persistent). PowerHA can use both IP and non-IP networks to exchange heartbeat packets or messages between the nodes. Through heartbeating, PowerHA maintains information about the status of the interfaces, devices, and adapters, and through them, the availability of the cluster nodes.


Heartbeating is exclusively based on RSCT topology services. The RSCT daemons use UDP for the heartbeat packets between nodes. When PowerHA is started on a node, PowerHA passes the network topology stored in the HACMP ODM configuration to RSCT. RSCT uses this information to construct its communication groups, heartbeat rings and in turn provides failure notifications back to PowerHA.

RSCT consists of the following components:

Resource monitoring and control (RMC) HACMP 5.1 and earlier used the event management subsystem. RMC is a distributed subsystem that provides a set of high availability services. It creates events by matching the state of a system's resources with information about the resource conditions of interest to the clients. Clients can then use event notifications to trigger recovery actions.

Resource managers These are daemons that are actually part of RMC and represent an actual administrative task or system function. PowerHA uses RMC for dynamic node priority, application monitoring, and user-defined events. For example, the resource monitor that reports the percentage of time the CPU is idle is used when a resource group is configured to fall over to the node with the highest CPU idle time.

Group services Provides a system wide and highly available facility for monitoring and coordinating changes in state of an application running on a set of nodes.

Topology services Handles the heartbeating over the multiple networks in a cluster. It has knowledge of the network configuration and provides information about the state of the network interfaces and adapters, as well as the nodes themselves.


Figure 2-3 shows some of the RSCT daemons and how they interact with other PowerHA daemons.

Figure 2-3 RSCT and important cluster daemons

This heartbeating is performed by exchanging messages (heartbeat packets) between the nodes over each communication interface and device defined to the cluster (topology). Each node sends a heartbeat packet and expects to receive a packet over each network within the interval determined by the network sensitivity. Because each host only communicates with its two neighbors on each network ring, a host only receives one packet from a particular node every two heartbeat intervals. This is important in calculating how long it takes PowerHA to determine a failure.
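As a rough illustration only (the numbers are assumptions, not tuned defaults): if a network's heartbeat interval is 1 second and its failure cycle is 10 missed heartbeats, then packets from a given neighbor arrive roughly every 2 seconds on that ring, and an unresponsive interface would be declared failed after approximately 1 x 10 x 2 = 20 seconds on that network.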

RSCT only monitors the base addresses on the interfaces (unless heartbeat over IP aliases is selected), so it does not monitor the service IP labels (if using IPAT via aliasing) or the persistent IP labels.


PowerHA is responsible for keeping track of the aliased labels (the service IP labels if IPAT via aliasing is being used, and the persistent alias labels)—both through the state of the underlying interface and the link status, and by monitoring the received packet count (similar to the netstat command output). HACMP V5.2 and later versions attempt to bring up an interface if they find it in the down or detached state (as reported by the lsattr command) but with the physical link still active.

RSCT determines that one of the interfaces or adapters on a node has failed if it is no longer receiving heartbeat packets from it, but is still receiving heartbeat information through other interfaces and adapters on that node. In this case, PowerHA preserves the communication to the node by transferring the service (and persistent) IP labels to another network interface on the same network on the same node.

If all interfaces on that PowerHA network are unavailable on that node, then PowerHA transfers all resource groups containing IP labels on that network to another node with available interfaces. If RSCT fails to receive heartbeat packets through any of the interfaces or adapters on a node, then that node is considered to have failed, and PowerHA tries to bring the affected resource groups online on another node.

RSCT communications

PowerHA is responsible for starting up RSCT (topology and group services) on nodes joining the cluster. RSCT organizes its networks and inter-node communications, depending on network type, as follows:

IP networks A ring (RSCT communication group) is formed for each logical subnet for the interfaces defined to PowerHA, in IP address order. Each node communicates with its two neighbors—the nodes with the next higher and next lower IP address. Each IP ring (also known as an RSCT communication group) is modified as each incoming node joins the cluster.

Serial networks RSCT also creates a communication group for each pair of communication devices or PowerHA serial network. RSCT then builds a logical network from these communication groups to pass information between nodes using the non-IP communications.

Note: If you use the ifconfig command to bring an adapter down for testing, PowerHA brings it up, without involving any PowerHA event processing.


Considering a cluster topology as seen in Figure 2-2 on page 28, RSCT builds up three heartbeat networks—one for each IP subnet, and one for the non-IP device ring, as shown in Figure 2-4.

Figure 2-4 PowerHA networks superimposed on the topology

Earlier versions of PowerHA allowed a maximum of two serial (non-IP) communication devices per type per node, so only the ring configuration was possible for three or more nodes. Newer PowerHA versions (5.1 and later) support a configuration with non-IP networks connecting every node to every other node, if there are sufficient devices available on each node.

RSCT uses particular nodes to manage the communications around these communication groups. The nodes that fulfill these tasks are chosen dynamically, and can change each time a node enters or leaves the cluster.

Group leader The node with the highest IP address in the first communication group created is called the group leader. This node keeps the information about the other nodes in the cluster and the network configuration.


Group leader backup The node with the second highest IP address in the first communication group is called the crown prince. This node keeps a backup copy of the group leader's topology data, and takes over as the group leader should the group leader leave the cluster.

Mayor The node with the third highest IP address in the first communication group (or the group leader backup if a third node is not available). This node is responsible for ensuring that all nodes in the cluster are informed of any changes in cluster topology.

Figure 2-5 shows an example of two communication groups formed for the two IP subnets and the RSCT roles of the nodes.

Figure 2-5 Example showing RSCT communication groups and node roles

As discussed, PowerHA relies heavily on RSCT and the data received in the heartbeat packets for information about the state of the cluster topology; however, PowerHA must be really sure that a node has actually failed before it takes any action. If there is no redundancy in the network, then PowerHA could easily make an incorrect assumption about the state of the nodes. For example, if RSCT were relying only on the TCP/IP network, then a failure of a network component (switches, routers, hubs) or the TCP/IP subsystem would be incorrectly interpreted as a failure of one or more nodes.


Figure 2-6 shows an example of a cluster relying totally on TCP/IP for heartbeating.

In this example, nodes 1 and 2 assume that nodes 3 and 4 have failed and proceed to bring their resources online. Similarly, nodes 3 and 4 assume that nodes 1 and 2 have failed. This is called a partitioned cluster and can lead to data corruption as nodes on either side of the split attempt to simultaneously access data and start applications.

Figure 2-6 Split cluster caused by failure of network component

To help PowerHA distinguish between a real node failure and a failure of the TCP/IP subsystem, another communication path between the nodes is required, a path that does not rely on TCP/IP. PowerHA uses non-IP (point-to-point or device based) serial networks for this.


As RSCT monitors both the device-based and TCP/IP networks, PowerHA can use this information to distinguish between a node failure and an IP network or subsystem failure. It is recommended that each cluster have at least one non-IP network defined for each of the nodes in the cluster to prevent cluster partitioning. If serial networks had been used in the example shown in Figure 2-6 on page 38, PowerHA would have recognized the failure correctly and there would have been no risk of data loss or corruption.

For a recommended two-node cluster configuration, see Figure 2-7.

Figure 2-7 Heartbeating in a PowerHA cluster

Another scenario where RSCT and PowerHA would have trouble accurately determining the state of an interface is the single network interface. Upon failure, if an interface finds that it is the sole interface on a network, then there is no other address with which to exchange heartbeat packets.


There are technologies today, such as EtherChannel, network interface backup, shared ethernet adapters, and so on, that can provide redundancy for individual interfaces.

Subnetting and RSCT

Starting with AIX 5L, there is support for multiple routes to the same destination in the kernel routing table. This implies that if multiple matching routes meet the same criteria, routing can be performed alternately using each of the subnet routes. This is also known as route striping.

Thus the effect of multiple interfaces on the same subnet on one node is that packets are sent out each of the interfaces alternately. This means that other nodes, and therefore RSCT, are not able to determine which interface the heartbeat packet came from. To avoid this situation where RSCT, and therefore PowerHA, cannot be certain of the state of the interfaces, there are strict rules regarding the subnet configuration. These rules depend on the network configuration and are discussed in the IP address takeover sections.

There is an AIX network option (a no tunable) called mpr_policy, which allows TCP/IP to be configured so that packets for a particular destination only go out through one adapter. To configure TCP/IP to choose an adapter based on the destination of the packet, use mpr_policy = 5. We recommend setting this option if any applications are sensitive to the adapter from which the packets come.
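For example (a sketch; verify the tunable and its supported values at your AIX level before making it permanent), the option can be set and preserved across reboots with the no command:

no -p -o mpr_policy=5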

2.3.2 Heartbeat over IP aliases

PowerHA supports the use of heartbeating over IP aliases. This configuration removes the subnet restrictions on the base interfaces that are discussed in the above section. Now it is possible to configure the base IP addresses without any subnet restrictions, and PowerHA and RSCT then configure and use a range of separate subnets for heartbeat communications.

These subnets do not need to be routable, and they allow the base IP addresses to be configured according to the requirements of the site, rather than according to PowerHA requirements. For example, your network administrator might require that the base IP addresses for each adapter be in the same subnet; without heartbeat over IP aliases, PowerHA would not support this configuration because RSCT would not be able to monitor the state of each adapter.

However, we still recommend that the service IP address be on a different subnet to the base IP addresses on the interfaces, so that PowerHA can still accurately monitor the Service IP addresses (unless you can take advantage of the mpr_policy option in AIX 5.3).


To configure heartbeat over IP aliases, a base (starting) heartbeat alias address must be specified in the PowerHA configuration. When PowerHA starts, it builds up an alias heartbeat network starting from this address, by calculating an IP address for each node based on the node number. This is defined as a separate PowerHA network with the number of subnets that matches the number of interfaces on a node. When specifying the heartbeat alias base address, the following rules apply:

• PowerHA only supports the use of the subnet mask of the underlying adapter for the heartbeat over IP alias network.
• The subnet mask on the base adapter must allow for more addresses than the number of nodes, because each node has an address on that subnet.
• There must be sufficient address space above the specified base address to allow for one subnet for each interface on a node.
• There must be no addresses at the site within the range of aliases that PowerHA creates. These addresses must also be outside any range used by DNS.
• PowerHA still requires that each interface can communicate with every other interface; that is, they are on the same physical network.

When heartbeat over IP aliases is configured, PowerHA builds the required alias addresses following the above rules and loads this information into the PowerHA ODM. When PowerHA integrates a node into the cluster and RSCT starts, the alias addresses are added to each adapter under PowerHA control. RSCT then uses these addresses to build up its communication groups. So it is these IP alias addresses that RSCT monitors, not the base IP addresses on the interface.

Next, Figure 2-8 shows an example of a three-node cluster, with each node having three interfaces on the same physical network and the same subnet (see Table 2-1). The subnet mask on the base adapters was 255.255.255.0.

Table 2-1 Base IP addresses

In this example, PowerHA creates three IP alias subnets (one for each interface) with three addresses on each (one for each node). Figure 2-2 shows the IP addresses that PowerHA would use for heartbeat over IP aliases if a base address of 198.10.1.1 was configured.

Node 1 Node 2 Node 3

en0 135.2.5.12 135.2.5.22 135.2.5.27

en1 135.2.5.13 135.2.5.23 135.2.5.28

en2 135.2.5.14 135.2.5.24 135.2.5.29

Chapter 2. High availability components 41

Page 76: PowerHA for AIX Cookbook - hmm.presalesadvisor.com · ibm.com/redbooks PowerHA for AIX Cookbook Shawn Bodily Rosemary Killeen Liviu Rosca Extended case studies with practical disaster

Table 2-2 Configured heartbeat over alias IP addresses (base 198.10.1.1)

                               Node 1        Node 2        Node 3
en0 - communication group 1    198.10.1.2    198.10.1.3    198.10.1.4
en1 - communication group 2    198.10.2.2    198.10.2.3    198.10.2.4
en2 - communication group 3    198.10.3.2    198.10.3.3    198.10.3.4

Note: If a base of x.x.x.1 is selected, PowerHA starts with x.x.x.2 as the first address on the first interface. However, if you select x.x.x.0, PowerHA uses that address and it will not be usable.

Figure 2-8 shows three heartbeat rings (communication groups) that RSCT uses.

Figure 2-8 Three-node cluster with heartbeat over IP alias

Note: None of the three subnets (198.10.1/24, 198.10.2/24, 198.10.3/24) need to be routable.
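The way these alias addresses are derived can be sketched with the following ksh loop. This is purely illustrative and is not PowerHA code; PowerHA performs the calculation internally from the configured base address, the node numbers, and the number of interfaces:

    # Illustrative only: derive the heartbeat alias addresses of Table 2-2
    # from a base address of 198.10.1.1 (three nodes, three interfaces each).
    base_prefix=198.10            # first two octets of the base address
    for subnet in 1 2 3; do       # one subnet per interface (en0, en1, en2)
        for node in 1 2 3; do     # one host address per node
            echo "en$((subnet - 1)) on node${node}: ${base_prefix}.${subnet}.$((node + 1))"
        done
    done

Running this loop prints exactly the addresses shown in Table 2-2, one subnet per interface and one host address per node, starting one above the base address.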

Heartbeat over IP aliases supports both IP address takeover mechanisms:

IPAT via replacement The service IP label replaces the boot IP address on the interface. The heartbeat IP alias address remains.

IPAT via aliasing The service IP label is added as an alias on the interface with the heartbeat IP alias.

2.3.3 TCP/IP networks

The IP network types supported in PowerHA 5.5 are:

• Ethernet (ether)
• Token-ring (token)
• Fiber Distributed Data Interface - FDDI (fddi)
• Asynchronous Transfer Mode - ATM (and ATM LAN Emulation) (atm)
• EtherChannel (ether)
• IP Version 6 (IPV6)
• InfiniBand®

The following IP network types are not supported:

• Virtual IP Address (VIPA)
• Serial Optical Channel Converter (SOCC)
• Serial Line IP (SLIP)
• Fibre Channel Switch (FCS)
• IEEE 802.3

PowerHA is designed to work with any TCP/IP network. These networks are used to:

• Allow clients to access the nodes (for example, applications)

• Enable the nodes to exchange heartbeat messages

• Serialize access to data (concurrent access environments, for example, Oracle Real Application Cluster)

Note: The SP Switch (hps) is no longer supported starting with PowerHA 5.5. However, it does still appear as a network type option. This will be corrected in a future update.

TCP/IP networks can be classified as:

Public These are logical networks designed for client communication to the nodes. Each is built up from a collection of the IP adapters, so that each network can contain multiple subnets. As these networks are designed for client access, IP Address takeover is supported.

Private These networks are designed for use by applications that do not support IP takeover, for example, Oracle RAC or HAGEO geographic networks. All interfaces are defined as service interfaces, and heartbeat packets are sent over these networks.

2.3.4 IP address takeover mechanisms

One of the key roles of PowerHA is to keep the service IP labels / addresses highly available. PowerHA does this by starting and stopping each service IP address as required on the appropriate interface. When a resource group is active on a node, PowerHA supports two methods of activating the service IP addresses:

• By replacing the base (boot-time) IP address of an interface with the service IP address. This method is known as IP address takeover (IPAT) via IP replacement. This method also allows the takeover of a locally administered hardware address (LAA)—hardware address takeover.

• By adding the service IP address as an alias on the interface, for example, in addition to the base IP address. This method is known as IP address takeover via IP aliasing. This is the default for PowerHA.

To change this behavior, the network properties can be changed using the extended configuration menus.

It is worth noting that each method imposes subnet restrictions on the boot interfaces and the service IP labels, unless the heartbeat over IP alias feature is used.

IPAT via IP replacement

The service IP label / address replaces the existing address on the interface. Thus only one service IP label / address can be configured on one interface at one time. The service IP label should be on the same subnet as one of the base IP addresses. This interface will be the first one used by a service IP label when a resource group becomes active on the node. Other interfaces on this node cannot be in the same subnet, are traditionally referred to as standby interfaces, and are used if a resource group falls over from another node, or if the boot interface fails. This method can save subnets, but requires extra hardware. See Figure 2-9.

Figure 2-9 IPAT via IP replacement

Note: When using IPAT via IP replacement, it is also possible to configure hardware address takeover (HWAT). For HWAT, a locally administered MAC (Media Access Control) address becomes part of the service IP label definition, and at time of replacement, the MAC address on the interface is also changed. This ensures that the ARP caches on the subnet do not need to be updated and that MAC address dependent applications can still point to the correct host.

[Figure 2-9 consists of two panels, "Interface configuration at boot" and "Interface configuration after IPAT via replacement", showing nodes node1 and node2, each with interfaces en0 and en1: boot IP labels on the 10.9.1 subnet, standby IP labels on the 10.9.2 subnet (netmask 255.255.255.0), and the service IP label taking the place of a boot IP label after IPAT via replacement.]

When using IPAT via IP replacement, if the interface holding the service IP address fails, PowerHA moves the service IP address to another available interface on the same node and on the same network; in this case, the associated resource group is not affected.

If there is no available interface on the same node, the resource group is moved together with the service IP labels to another node with an available interface on the same logical network. If there are no nodes or interfaces available, the resource group goes into an ERROR state. When PowerHA recovers any adapters, resource groups in error state are checked to determine if they can be brought back on line.

IPAT via aliasing

The service IP label or address is aliased onto the interface, using the ifconfig command, without removing the underlying boot IP address. See Figure 2-10. IPAT through aliasing also obsoletes the concept of standby interfaces—all network interfaces are labeled as boot interfaces.

As IP addresses are added to the interface via aliasing, more than one service IP label can coexist on one interface. By removing the need for one interface per service IP address that the node could host, IPAT through aliasing is the more flexible option and in some cases can require less hardware. IPAT through aliasing also reduces fallover time, because it is much faster to add an alias to an interface than to remove the base IP address and then apply the service IP address.

Even though IPAT through aliasing can support multiple service IP labels / addresses, we still recommend that you configure multiple interfaces per node per network. It is far less disruptive to swap interfaces compared to moving the resource group over to another node.
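At the AIX level, adding and removing a service IP alias amounts to ifconfig operations similar to the following. This is shown for illustration only; PowerHA event scripts perform the equivalent steps automatically, and the addresses are taken from the example in Figure 2-10:

    ifconfig en0 alias 10.9.9.1 netmask 255.255.255.0   # add the service IP label as an alias
    ifconfig en0                                        # en0 now reports both 10.9.1.1 and 10.9.9.1
    ifconfig en0 delete 10.9.9.1                        # remove the alias when the resource group moves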

Note: With IPAT through replacement, RSCT and PowerHA are not able to monitor a node that has more than one resource group online if both service IP labels are on the same subnet. This is because the node then has two interfaces with base addresses on the same subnet. For this reason, heartbeat over IP aliases is recommended, or in AIX the mpr_policy option can be set to 5.

Figure 2-10 IPAT via IP aliases

IPAT through aliasing is only supported on networks that support the gratuitous ARP function of AIX. Gratuitous ARP is when a host sends out an ARP packet prior to using an IP address and the ARP packet contains a request for this IP address. As well as confirming that no other host is configured with this address, it ensures that the ARP cache on each machine on the subnet is updated with this new address.

If there are multiple service IP alias labels / addresses active on one node, PowerHA by default will equally distribute them among the available interfaces on the logical network. This placement can be controlled by using distribution policies as explained in more detail in 13.2.1, “Configuring service IP distribution policy” on page 610.

[Figure 2-10 consists of two panels, "Interface configuration at boot" and "Interface configuration after IPAT via aliasing", showing nodes node1 and node2, each with interfaces en0 and en1: boot IP labels 10.9.1.1, 10.9.1.2, 10.9.2.1, and 10.9.2.2 (netmask 255.255.255.0), with service IP labels 10.9.9.1 and 10.9.9.2 added as aliases on top of the unchanged boot IP labels.]

For IPAT through aliasing, each boot interface on a node must be on a different subnet (unless, as mentioned above, heartbeat over IP aliases is used), though interfaces on different nodes can obviously be on the same subnet. The service IP labels can be on one or more subnets, but they cannot be the same as any of the boot interface subnets.

2.3.5 Persistent IP label or address

A persistent node IP label is an IP alias that can be assigned to a network for a specified node. A persistent node IP label is a label that:

• Always stays on the same node (is node-bound)
• Co-exists with other IP labels present on the same interface
• Does not require installation of an additional physical interface on that node
• Is not part of any resource group

Assigning a persistent node IP label for a network on a node allows you to have a highly available node-bound address on a cluster network. This address can be used for administrative purposes because it always points to a specific node regardless of whether PowerHA is running.

The persistent IP labels are defined in the PowerHA configuration, and they become available as soon as the cluster definition is synchronized. A persistent IP label remains available on the interface on which it was configured, even if PowerHA is stopped on the node, or the node is rebooted. If the interface on which the persistent IP label is assigned fails while PowerHA is running, the persistent IP label is moved to another interface in the same logical network on the same node.

If the node fails or all interfaces on the logical network on the node fail, then the persistent IP label will no longer be available.

Important: For IPAT via aliasing networks, PowerHA will briefly have the service IP addresses active on both the failed Interface and the takeover interface so it can preserve routing. This might cause a “DUPLICATE IP ADDRESS” error log entry, which can be ignored.

Note: It is only possible to configure one persistent node IP label per network per node. For example, if you have a node connected to two networks defined in PowerHA, that node can be identified via two persistent IP labels (addresses), one for each network.

The following subnetting restrictions apply to the persistent IP label:

• For IPAT via replacement networks

The persistent IP alias must be on a different subnet to the standby interfaces and can be on the same subnet as the boot interfaces (the same as the service IP labels).

• For IPAT via aliasing networks

The persistent IP alias must be on a different subnet from each of the boot interface subnets, and can be either in the same subnet as or in a different subnet from the service IP address.

The persistent node IP labels can be created on the following types of IP-based networks:

• Ethernet
• Token Ring
• FDDI
• ATM LAN Emulator

2.3.6 Device based or serial networks

Serial networks are designed to provide an alternative method for exchanging information using heartbeat packets between cluster nodes. In case of IP subsystem or physical network failure, PowerHA can still differentiate between a network failure and a node failure when an independent path is available and functional.

Serial networks are point-to-point networks, and therefore, if there are more than two nodes in the cluster, the serial links should be configured as a ring, connecting each node in the cluster. Even though each node is only aware of the state of its immediate neighbors, the RSCT daemons ensure that the group leader is aware of any changes in state of any of the nodes.

Even though it is possible to configure a PowerHA cluster without non-IP networks, we strongly recommend that you use at least one non-IP connection between each node in the cluster.

The following devices are supported for non-IP (device-based) networks in PowerHA:

• Serial RS232 (rs232)
• Target mode SCSI (tmscsi)
• Target mode SSA (tmssa)
• Disk heartbeat (diskhb)
• Multi-node disk heartbeat (mndhb)

In the following sections, we describe the types of serial network that can be used.

RS232

This is a serial network using the RS232 ports, either the built-in serial ports or a multi-port serial adapter.

The default baud rate set by RSCT for the RS232 network is 38400 and this should be changed if this is not supported by your modem (in case you want to use this network between remote locations).

Target mode SCSI

Another possibility for a non-IP network is a target mode SCSI connection. Whenever you use a shared SCSI device, you can also use the SCSI bus for exchanging heartbeats. Target mode SCSI (tmscsi) is only supported with SCSI-2 Differential or SCSI-2 Differential Fast / Wide devices. SCSI-1 Single-Ended and SCSI-2 Single-Ended do not support serial networks in a PowerHA cluster.

Target mode SSA

If you are using shared SSA devices, target mode SSA (tmssa) can be used for non-IP communication in PowerHA. This relies on the built-in capabilities of the SSA adapters (using the SCSI communication protocol). The SSA devices in an SSA loop (disks and adapters) use the communication between “initiator” and “target”; SSA disks are “targets”, but the SSA adapter has both capabilities (“initiator” and “target”); thus, a tmssa connection uses these capabilities for establishing a serial-like link between PowerHA nodes. This is a point-to-point communication network, which can communicate only between two nodes.

To configure a tmssa network between two cluster nodes, one SSA adapter on each node, in the same SSA loop, forms each endpoint.

Note: Take care to ensure that the ports selected support heartbeating.

Important: PowerHA 5.5 is not supported on any version of AIX where SSA devices are supported (AIX 5.3 and 6.1 do not provide SSA support). Although the network type still appears, this is not valid for use.

Disk heartbeat network

In certain situations, RS232, tmssa, and tmscsi connections are considered too costly or complex to set up. Heartbeating using disk (diskhb) provides users with an easy to configure alternative that requires no additional hardware. The only requirement is that the “disks” (physical disks or LUNs on external storage) be used in enhanced concurrent mode. Enhanced concurrent mode disks use RSCT group services to control locking, thus freeing up a sector on the disk that can now be used for communication. This sector, which was formerly used for SSA Concurrent mode disks, is now used for writing heartbeat information.

Any disk that is part of an enhanced concurrent volume group can be used for a diskhb network, including those used for data storage. Also, the volume group that contains the disk used for a diskhb network does not have to be varied on.

Any disk type can be configured as part of an enhanced concurrent volume group, making this network type extremely flexible. The endpoint “adapters” for this network are defined as the node and physical volume pair.

In case of disk heartbeat, the recommendation is to have one point-to-point network consisting of one disk per pair of nodes per physical enclosure. One physical disk cannot be used for two point-to-point networks.

Traditional disk heartbeat

The heartbeat using disk (diskhb) feature was introduced in HACMP V5.1 to provide additional protection against cluster partitioning and a simplified non-IP network configuration, especially for environments where the RS232, target mode SSA, or target mode SCSI connections are too complex or impossible to implement.

This type of network can use any type of shared disk storage (Fibre Channel, SCSI, or SSA), as long as the disk used for exchanging keep alive messages is part of an AIX enhanced concurrent volume group. The disks used for heartbeat networks are not exclusively dedicated for this purpose; they can be used to store application shared data (see Figure 2-7 on page 39 for more information).

By using the shared disks for exchanging messages, the implementation of a non-IP network is more reliable, and does not depend on the type of hardware used.

Note: An enhanced concurrent volume group is not the same as a concurrent volume group (which is part of a concurrent resource group), rather, it refers to the mode of locking by using RSCT.

Moreover, in a SAN environment, when using optic fiber to connect devices, the length of this non-IP connection has the same distance limitations as the SAN, thus allowing very long point-to-point networks.

By defining a disk as part of an enhanced concurrent volume group, a portion of the disk will not be used for any LVM operations, and this part of the disk (sector) is used to exchange messages between the two nodes.

These are the specifications for using disk heartbeating:

• One disk can be used for one network between two nodes. The disk to be used is uniquely identified on both nodes by its LVM assigned physical volume ID (PVID).

• The recommended configuration for disk heartbeat networks is one disk per pair of nodes per storage enclosure.

• This configuration requires that the disk to be used is part of an enhanced concurrent volume group, though it is not necessary for the volume group to be either active or part of a resource group (concurrent or non-concurrent). The only condition is that the volume group must be defined on both nodes.
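The following is a minimal sketch of preparing a disk for a disk heartbeat network from the command line; the disk and volume group names are examples only, and in practice the C-SPOC menus described in 12.6.4 are normally used instead:

    lspv | grep hdisk5           # confirm the disk and its PVID are visible on both nodes
    mkvg -y diskhbvg -C hdisk5   # create an enhanced concurrent capable volume group (-C)
    varyoffvg diskhbvg           # the volume group does not need to remain varied on

The volume group must then also be made known on the other node (for example, via importvg) before the diskhb devices can be defined to PowerHA.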

More detailed information about traditional disk heartbeat, including configuring, can be found in 12.6.4, “Configuring traditional disk heartbeat” on page 589.

Multi-node disk heartbeat

HACMP 5.4.1 introduced another form of disk heartbeating called multi-node disk heartbeat (mndhb). Unlike traditional disk heartbeat networks, it is not a single point-to-point network; instead, as its name implies, it allows multiple nodes to use the same disk. However, it does require configuring a logical volume on an enhanced concurrent volume group. While this could reduce the total number of disks required for non-IP heartbeating, it is, of course, recommended to have multiple disks to eliminate a single point of failure.

Multi-node disk heartbeating also offers the ability to invoke one of the following actions on its loss:

halt Halt the node (default).

fence Fence this node from the disks.

shutdown Stop PowerHA on this node (gracefully).

takeover Move the resource group(s) to a backup node.

Note: The cluster locking mechanism for enhanced concurrent volume groups does not use the reserved disk space for communication (as the “classic” clvmd does); it uses the RSCT group services instead.

More detailed information can be found in 12.6.5, “Configuring multi-node disk heartbeat” on page 591.

2.3.7 Network modules

PowerHA has a failure detection rate defined for each type of network; this can be configured using three predefined values or customized. The predefined values are slow, normal (the default), and fast; to customize, the interval between heartbeats and the failure cycle can be set.

The interval between heartbeats (heartbeat rate - hbrate) defines the rate at which cluster services send “keep alive” packets between interfaces and devices in the cluster. The failure cycle (cycle) is the number of successive heartbeats that can be missed before the interface is considered to have failed. PowerHA supports sub-second heartbeat rates.

The time (in seconds) to detect a failure is:

hbrate * cycle * 2

PowerHA uses double the failure cycle as the time to detect a failure to allow for both the node and its neighbors to reach the conclusion. Also, to keep network traffic to a minimum, PowerHA only sends out one packet and expects to receive one, per logical network per heartbeat interval.
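As a worked example, assuming (purely for illustration, rather than the shipped defaults for any particular network type) a heartbeat rate of 1 second and a failure cycle of 10:

    failure detection time = hbrate * cycle * 2
                           = 1 second * 10 * 2
                           = 20 seconds

Halving the heartbeat rate to 0.5 seconds with the same failure cycle would reduce the detection time to 10 seconds, at the cost of more frequent heartbeat traffic.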

For serial networks, PowerHA declares the neighbor down after the failure detection rate has elapsed for that network type. PowerHA waits the same period again before declaring the device down. If still no heartbeats are received from the neighbor, PowerHA will not run the network_down event until both the local and remote devices have failed.

However, if the serial network is the last remaining network connection to a particular node, the node_down event will be triggered after the interface is detected as down.

Important: Currently, multi-node disk heartbeating can only be utilized in a resource group configured with a startup policy of Online On All Available Nodes. Currently only Oracle RAC takes advantage of mndhb. For more details consult Oracle Metalink note number 404474.1 found at:

https://metalink.oracle.com

The device based networks differ from IP based networks, because device based networks cannot distinguish between an interface down and a network down. Disk heartbeating is the exception. If the node can reach the disk, the interface is considered to be up, and the network is considered up if messages are being exchanged.

There is another characteristic that is defined for each network. This is the network grace period, which is the period of time after a particular network failure is detected, that failures of the same network type are ignored. This gives the cluster time to make changes to the network configuration without detecting any false failures.

2.3.8 Clients

A client is a system that can access cluster nodes over the network. Clients run some “front end” or client application that communicates with the application running in the cluster using the service IP labels. PowerHA ensures that the application is highly available for the clients, but they are not highly available themselves.

AIX 5L clients can make use of the cluster information (clinfo) services to receive notice of cluster events. Clinfo provides an API that displays the cluster status.

During resource group takeover, the application is started on another node, so clients must be aware of the action. In certain cases, the application’s client uses the ARP cache on the client machine to reconnect to the server.

• In this case, there are two possible situations if the client is on the same subnet as the cluster nodes, as well as a third consideration:

– If IPAT via replacement is configured for the network that the application’s service IP labels use, then MAC address takeover occurs as well, so there is no need to update the client machine’s ARP cache.

– If IPAT via aliasing is configured for the network, then a gratuitous ARP packet is sent out, so again the client’s ARP cache should not need updating.

– In these cases, if the client does not support the gratuitous ARP, its cache can be updated by /usr/es/sbin/cluster/etc/clinfo.rc. Whenever there is a network change, clinfo.rc sends one ping to each address or label in the PING_CLIENT_LIST variable. So to ensure that the client’s ARP cache is updated, add each client’s address or label to the PING_CLIENT_LIST in clinfo.rc (a sample entry is shown after this list).

• However, if the client is on another subnet, then the foregoing conditions apply to the router. Clients running the clinfo daemon are able to reconnect to the cluster quickly after a cluster event.
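The PING_CLIENT_LIST variable is simply a shell variable set in clinfo.rc; a hypothetical entry (the addresses and labels shown are examples only) might look like this:

    # excerpt from /usr/es/sbin/cluster/etc/clinfo.rc
    PING_CLIENT_LIST="192.168.10.21 192.168.10.22 appclient1"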

2.3.9 Network security considerations

PowerHA security is important to limit both unauthorized access to the nodes and unauthorized interception of inter-node communication. Earlier versions of PowerHA used rsh to run commands on other nodes. This was difficult to secure, and IP addresses could be spoofed. PowerHA uses its own daemon, the cluster communication daemon (clcomdES), to control communication between the nodes.

PowerHA provides cluster security by:

• Controlling user access to PowerHA
• Providing security for inter-node communications

For details on cluster security, see Chapter 8, “Cluster security” on page 457.

Connection authentication and encryption

Authentication ensures the origin and integrity of the message, while encryption ensures that only the sender and recipient of the message are aware of its contents.

The following choices can be used for connection authentication:

Standard authentication This is the default; the communication daemon, clcomd authenticates against the IP address and limits the commands that can be run with root privilege. There is a set of PowerHA commands (those in /usr/es/sbin/cluster) that can be run as root, the remaining commands are run as nobody.

Kerberos authentication Kerberos authentication is supported in the SP environment.

Virtual private network A VPN can be configured for internode communication, and the persistent alias labels should be used to define the tunnels.

PowerHA supports the following encryption:

• Message Digest 5 (MD5) with Data Encryption Standard (DES)
• MD5 with triple DES
• MD5 with Advanced Encryption Standard (AES)

The key files are stored in /usr/es/sbin/cluster/etc.

Note: This encryption only applies to clcomdES, not the cluster manager.

The cluster communications daemon

With the introduction of clcomdES, there is no need for an /.rhosts file to be configured. However, some applications might still require the file to be present. The cluster communications daemon runs remote commands based on the principle of least privilege. This ensures that no arbitrary command can run on a remote node with root privilege. Only a small set of PowerHA commands are trusted and allowed to run as root. These are the commands in /usr/es/sbin/cluster. The remaining commands are run as nobody.

The cluster communications daemon is started by inittab, with the entry being created by the installation of PowerHA. The daemon is controlled by the system resource controller, so startsrc, stopsrc and refresh work. In particular, refresh is used to re-read /usr/es/sbin/cluster/etc/rhosts and to move the log files.

The cluster communication daemon uses port 6191 and authenticates incoming connections as follows:

• If the file /usr/es/sbin/cluster/etc/rhosts does not exist, all connections are refused.

• If the file does exist, then connections are checked against (in order):

– HACMPnode ODM class
– HACMPadapter ODM class
– /usr/es/sbin/cluster/etc/rhosts file

The /usr/es/sbin/cluster/etc/rhosts file is populated when the first synchronization is run with the interface addresses from the synchronizing node (it will still be blank on the synchronizing node). After the first synchronization, the HACMP ODM classes are populated, so the rhosts file can be emptied.

The real use of the file is before the cluster is first synchronized in an insecure environment. Populate the file on each node with only the interface addresses of nodes in the cluster, so that no other system can communicate via clcomdES.
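For example, a hypothetical /usr/es/sbin/cluster/etc/rhosts for a two-node cluster, populated before the first synchronization, simply lists the base interface addresses, one per line (the addresses are examples only); after editing it, the daemon can be told to re-read the file through the SRC, using the subsystem name that PowerHA registers for it:

    # /usr/es/sbin/cluster/etc/rhosts
    10.10.10.1
    10.10.10.2
    10.10.20.1
    10.10.20.2

    refresh -s clcomdES      # re-read the rhosts file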

The requesting host is asked to supply an IP label that matches an address found in the above locations, and if a valid response is given, the connection is allowed. If all the above entries are empty, the daemon assumes that the cluster has not been configured, so it accepts incoming connections.

Tip: When the initial cluster synchronization is done, the file /usr/es/sbin/cluster/etc/rhosts is populated with the interface addresses of the synchronizing node.

Important: An invalid entry in /usr/es/sbin/cluster/etc/rhosts causes clcomdES to deny all connections.

The cluster communications daemon provides the transport medium for PowerHA cluster verification, global ODM changes and remote command execution. The following commands use clcomdES (they cannot be run by a regular user):

clrexec    To run specific and potentially dangerous commands
cl_rcp     To copy AIX configuration files
cl_rsh     Used by the cluster to run commands in a remote shell

The cluster communication daemon also offers performance improvement over traditional rsh/rcp communications (r commands require a lot of time to process), and clcomdES also keeps the socket connections open, rather than closing after each operation. Because many PowerHA administration operations require access to the ODM, the cluster communications daemon also caches copies of each node’s ODM. When the ODM needs to be accessed, clcomdES compares the checksum of the cached ODM entries against the ODM on each node in the cluster, and only updates them as required.

The cluster communications daemon also sends its own heartbeat packets out to each node, and attempts to re-establish connection if there is a network failure.

The cluster communications daemon is also used for:

• File collections
• Auto synchronization and automated cluster verification
• User / passwords administration
• C-SPOC

2.4 Resources and resource groups

This section describes the PowerHA resource concepts:

• Definitions
• Resources
• Resource groups

2.4.1 Definitions

PowerHA uses the underlying topology to ensure that the applications under its control and the resources they require are kept highly available:

• Service IP labels / addresses
• Physical disks
• Volume groups
• Logical volumes
• File systems

• Network File Systems
• Application servers (applications)
• Communication adapters and links
• Tape resources
• Fast connect resources
• WLM integration

The applications and the resources required are configured into resource groups. The resource groups are controlled by PowerHA as single entities whose behavior can be tuned to meet the requirements of the clients / users.

Figure 2-11 shows the resources that PowerHA makes highly available, superimposed on the underlying cluster topology.

Figure 2-11 Highly available resources superimposed on the cluster topology

In Figure 2-11, the following resources are made highly available:

• Service IP labels
• Applications shared between nodes
• Storage shared between nodes

[Figure 2-11 shows three nodes (node1, node2, node3), each with network interfaces en0 through en3 and tty serial connections, attached to shared storage (share_vg), with service IP labels 1 through 3 and resource groups rg_01, rg_02, and rg_03 superimposed on the cluster topology.]

2.4.2 Resources

The following items are considered resources in a PowerHA cluster:

Service IP address / label

As previously discussed, the service IP address is an IP address used by clients to access the applications or nodes. This service IP address (and its associated label) is monitored by PowerHA and is part of a resource group. There are two types of service IP address (label):

• Shared service IP address (label):

An IP address that can be configured on multiple nodes and is part of a resource group that can be active only on one node at a time.

• Node-bound service IP address (label):

An IP address that can be configured only on one node (is not shared by multiple nodes). Typically, this kind of service IP address is associated with concurrent resource groups.

The service IP addresses become available when PowerHA brings the associated resource group into an ONLINE status.

Starting with HACMP 5.3, the placement of the service IP labels can be specified using the following distribution preferences:

• Anti-Collocation:

This is the default, and PowerHA distributes the service IP labels across all boot IP interfaces in the same PowerHA network on the node.

• Collocation:

PowerHA allocates all service IP addresses on the same boot IP interface.

• Collocation with persistent label:

PowerHA allocates all service IP addresses on the boot IP interface that is hosting the persistent IP label. This can be useful in environments with VPN and firewall configuration, where only one interface is granted external connectivity.

• Anti-Collocation with persistent label:

PowerHA distributes all service IP labels across all boot IP interfaces in the same logical network, that are not hosting the persistent IP label. If no other interfaces are available, the service IP labels share the adapter with the persistent IP label.

It should be noted that if there are insufficient interfaces to satisfy the selected distribution preference, then PowerHA distributes IP labels using the interfaces available to ensure that the service IP labels are available.

The IP label distribution preference can also be changed dynamically, but is only used in subsequent cluster events. This is to avoid any extra interruptions in service. The cltopinfo -w command displays the policy used, but only if it is different from the default value of Anti-Collocation.

For more details, see 13.3, “Site specific service IP labels” on page 613.

Storage

All of the following storage types can be configured as resources:

• Volume groups (AIX and Veritas VM)

• Logical volumes (all logical volumes in a defined volume group)

• File systems (jfs and jfs2) - either all for the defined volume groups, or can be specified individually

• Raw disks - defined by PVID

If storage is to be shared by some or all of the nodes in the cluster, then all components must be on external storage and configured in such a way that failure of one node does not affect the access by the other nodes (for example, check the loop rules carefully if using SSA).

There are two ways that the storage can be accessed:

• Non-concurrent configurations, where one node owns the disks, allowing clients to access them along with other resources required by the application. If this node fails, PowerHA determines the next node to take ownership of the disks, restarts applications, and provides access to the clients. Enhanced concurrent mode disks are often used in non-concurrent configurations. Remember that enhanced concurrent mode refers to the method of locking access to the disks, not to the access itself being concurrent or not.

• Concurrent configurations, where one or more nodes are able to access the data concurrently with locking controlled by the application. The disks must be in a concurrent volume group.

For a list of supported devices by PowerHA, go to:

http://www-01.ibm.com/common/ssi/rep_sm/2/897/ENUS5765-F62/index.html

Choosing data protection

Storage protection (data or otherwise) is independent of PowerHA. For high availability of storage, you must use storage that has proper redundancy and fault tolerance levels. PowerHA does not have any control over storage availability.

For data protection, you can use either RAID technology (at storage or adapter level) or AIX LVM mirroring (RAID 1):

• Redundant Array of Independent Disks (RAID):

Disk arrays are groups of disk drives that work together to achieve data transfer rates higher than those provided by single (independent) drives. Arrays can also provide data redundancy so that no data is lost if one drive (physical disk) in the array fails. Depending on the RAID level, data is either mirrored, striped, or both. For the characteristics of some widely used RAID levels, see Table 2-3 on page 63.

– RAID 0:

RAID 0 is also known as data striping. Conventionally, a file is written out sequentially to a single disk. With striping, the information is split into chunks (fixed amounts of data usually called blocks) and the chunks are written to (or read from) a series of disks in parallel. There are two performance advantages to this:

Data transfer rates are higher for sequential operations due to the overlapping of multiple I/O streams.

Random access throughput is higher because access pattern skew is eliminated due to the distribution of the data. This means that with data distributed evenly across a number of disks, random accesses will most likely find the required information spread across multiple disks and thus benefit from the increased throughput of more than one drive.

RAID 0 is only designed to increase performance. There is no redundancy, so each disk is a single point of failure.

– RAID 1:

RAID 1 is also known as disk mirroring. In this implementation, identical copies of each chunk of data are kept on separate disks, or more commonly, each disk has a “twin” that contains an exact replica (or mirror image) of the information. If any disk in the array fails, then the mirror disk maintains data availability. Read performance can be enhanced because the disk that has the actuator (disk head) closest to the required data is always used, thereby minimizing seek times. The response time for writes can be somewhat slower than for a single disk, depending on the write policy; the writes can either be run in parallel (for faster response) or sequentially (for safety).

– RAID 2 and RAID 3:

RAID 2 and RAID 3 are parallel process array mechanisms, where all drives in the array operate in unison. Similar to data striping, information to be written to disk is split into chunks (a fixed amount of data), and each chunk is written out to the same physical position on separate disks (in parallel). When a read occurs, simultaneous requests for the data can be sent to each disk. This architecture requires parity information to be written for each stripe of data; the difference between RAID 2 and RAID 3 is that RAID 2 can utilize multiple disk drives for parity, while RAID 3 can use only one. If a drive should fail, the system can reconstruct the missing data from the parity and remaining drives. Performance is very good for large amounts of data, but poor for small requests, because every drive is always involved, and there can be no overlapped or independent operation.

– RAID 4:

RAID 4 addresses some of the disadvantages of RAID 3 by using larger chunks of data and striping the data across all of the drives except the one reserved for parity. Using disk striping means that I/O requests need only reference the drive that the required data is actually on. This means that simultaneous, as well as independent reads, are possible. Write requests, however, require a read / modify / update cycle that creates a bottleneck at the single parity drive. Each stripe must be read, the new data inserted, and the new parity then calculated before writing the stripe back to the disk. The parity disk is then updated with the new parity, but cannot be used for other writes until this has completed. This bottleneck means that RAID 4 is not used as often as RAID 5, which implements the same process but without the bottleneck.

– RAID 5:

RAID 5 is very similar to RAID 4. The difference is that the parity information is also distributed across the same disks used for the data, thereby eliminating the bottleneck. Parity data is never stored on the same drive as the chunks that it protects. This means that concurrent read and write operations can now be performed, and there are performance increases due to the availability of an extra disk (the disk previously used for parity). There are other possible enhancements to further increase data transfer rates, such as caching simultaneous reads from the disks and transferring that information while reading the next blocks. This can generate data transfer rates that approach the adapter speed.

As with RAID 3, in the event of disk failure, the information can be rebuilt from the remaining drives. A RAID 5 array also uses parity information, though it is still important to make regular backups of the data in the array. RAID 5 arrays stripe data across all of the drives in the array, one segment at a time (a segment can contain multiple blocks). In an array with n drives, a stripe consists of data segments written to “n-1” of the drives and a parity segment written to the “n-th” drive. This mechanism also means that not all of the disk space is available for data. For example, in an array with five 72 GB disks, although the total storage is 360 GB, only 288 GB are available for data.

– RAID 0+1 (RAID 10):

RAID 0+1, also known as IBM RAID-1 Enhanced, or RAID 10, is a combination of RAID 0 (data striping) and RAID 1 (data mirroring). RAID 10 provides the performance advantages of RAID 0 while maintaining the data availability of RAID 1. In a RAID 10 configuration, both the data and its mirror are striped across all the disks in the array. The first stripe is the data stripe, and the second stripe is the mirror, with the mirror being placed on the different physical drive than the data. RAID 10 implementations provide excellent write performance, as they do not have to calculate or write parity data. RAID 10 can be implemented using software (AIX LVM), hardware (storage subsystem level), or in a combination of the hardware and software. The appropriate solution for an implementation depends on the overall requirements. RAID 10 has the same cost characteristics as RAID 1.

The most common RAID levels used in today’s IT implementations are listed in Table 2-3.

Table 2-3 Characteristics of RAID levels widely used

RAID level   Available disk capacity   Performance in read / write operations   Cost     Data protection
RAID 0       100%                      High both read / write                   Low      No
RAID 1       50%                       Medium / High read, Medium write         High     Yes
RAID 5       80%                       High read, Medium write                  Medium   Yes
RAID 0+1     50%                       High both read / write                   High     Yes

Important: While all the RAID levels (other than RAID 0) have data redundancy, data should be regularly backed up. This is the only way to recover data in the event that a file or directory is accidentally corrupted or deleted.

LVM quorum issues

Quorum must be enabled for concurrent volume groups; otherwise, each node could be accessing different disks, leading to data divergence.

Leaving quorum on (the default) will cause a resource group fallover if quorum is lost, and the volume group will be forced to varyon on the other node if forced varyon of volume groups has been enabled. When forced varyon of volume groups is enabled, PowerHA checks to determine:

• That there is at least one copy of each mirrored set in the volume group

• That each disk is readable

• That there is at least one accessible copy of each logical partition in every logical volume

If these conditions are fulfilled, then PowerHA forces the volume group varyon.

Using enhanced concurrent mode volume groups

Traditionally, access to a volume group was controlled by SCSI locks, and PowerHA had utilities to break these locks if a node did not release the volume group cleanly. With AIX 5.1 the enhanced concurrent volume group was introduced and RSCT was used for locking. A further enhancement was the ability to varyon the volume group in two modes:

Active state The volume group behaves the same way as the traditional varyon. Operations can be performed on the volume group, and logical volumes and file systems can be mounted.

Passive state The passive state allows limited read only access to the VGDA and the LVCB.

When a node is integrated into the cluster, PowerHA builds a list of all enhanced concurrent volume groups that are a resource in any resource group containing the node. These volume groups are then activated in passive mode.

When the resource group comes online on the node, the enhanced concurrent volume groups are then varied on in active mode. When the resource group goes offline on the node, the volume group is varied off to passive mode.

Shared physical volumes

For applications that access raw disks, the physical volume identifier (PVID) can be added as a resource in a resource group.

Important: It is important when using enhanced concurrent volume groups that multiple networks exist for RSCT heartbeats. Because there is no SCSI locking, a partitioned cluster can very quickly varyon a volume group on all nodes, and then potentially corrupt data.

Shared logical volumes

While not explicitly configured as part of a resource group, each logical volume in a shared volume group will be available on a node when the resource group is online. These shared logical volumes can be configured to be accessible by one node at a time, or concurrently by a number of nodes if the volume group is part of a concurrent resource group. If the ownership of the LV needs to be modified, remember to reset it each time the parent volume group is imported.

Although this is not an issue related to PowerHA, be aware that some applications using raw logical volumes can start writing from the beginning of the device, therefore overwriting the logical volume control block (LVCB).

Custom disk methods

The extended resource SMIT menus allow the creation of custom methods to handle disks, volumes and file systems. To create a custom method, you need to define to PowerHA the appropriate scripts to manage the item in a highly available environment, for example:

For custom disks: PowerHA provides scripts to identify ghost disks, determine if a reserve is held, break a reserve, and make the disk available.

For volume groups: PowerHA provides scripts to list volume group names, list the disks in the volume group, and bring the volume group online and offline.

For file systems: PowerHA provides scripts to mount, unmount, list, and verify status.

In HACMP 5.3, custom methods are provided for Veritas Volume Manager (VxVM) using the Veritas foundation suite v4.0.

File systems (jfs and jfs2): fsck and logredo

AIX native file systems use database journaling techniques to maintain their structural integrity. So after a failure, AIX uses the journal file system log (JFSlog) to restore the file system to its last consistent state. This is faster than using the fsck utility. If the process of replaying the JFSlog fails, there will be an error and the file system will not be mounted.

The fsck utility performs a verification of the consistency of the file system, checking the inodes, directory structure, and files. While this is more likely to recover damaged file systems, it does take longer.
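If the log replay does fail, the file system can be checked and mounted manually with something like the following (the logical volume and mount point names are examples only):

    fsck -y /dev/datalv    # full consistency check, answering yes to any repairs
    mount /data            # mount the file system once it is clean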

Important: Restoring the file system to a consistent state does not guarantee that the data is consistent; that is the responsibility of the application.

2.4.3 NFS

PowerHA works with the AIX network file system (NFS) to provide a highly available NFS server, which allows the backup NFS server to recover the current NFS activity should the primary NFS server fail. This feature is only available for two-node clusters when using NFSv2/NFSv3, and more than two nodes when using NFSv4, because PowerHA preserves locks for the NFS file systems and handles the duplicate request cache correctly. The attached clients experience the same hang if the NFS resource group is acquired by another node as they would if the NFS server reboots.

When configuring NFS through PowerHA, you can control:

• The network that PowerHA will use for NFS mounting.
• NFS exports and mounts at the directory level.
• Export options for NFS exported directories and file systems. This information is kept in /usr/es/sbin/cluster/etc/exports, which has the same format as the AIX exports file, /etc/exports.
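A hypothetical entry in /usr/es/sbin/cluster/etc/exports (the directory, client, and node names are examples only) follows the usual AIX exports syntax:

    /sharedfs -rw=client1:client2,root=node1:node2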

NFS and PowerHA restrictions

The following restrictions apply:

• Only two nodes are allowed in the cluster if the cluster is using NFSv2/NFSv3. More than two nodes are allowed if using NFSv4.

• Shared volume groups that contain file systems that will be exported by NFS must have the same major number on all nodes, or the client applications will not recover on a fallover (a sample check is shown after this list).

• If NFS exports are defined on the node through PowerHA, all NFS exports must be controlled by PowerHA. AIX and PowerHA NFS exports cannot be mixed.

• If a resource group has NFS exports defined, the field “file systems mounted before IP configured” must be set to true.

• By default, a resource group that contains NFS exported file systems will automatically be cross-mounted. This also implies that each node in the resource group will act as an NFS client, so each node must have an IP label on the same subnet as the service IP label for the NFS server.
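For the major number requirement, the following is a sketch of the usual manual check and import (the volume group name, disk, and major number are examples only; C-SPOC can also handle this for you):

    lvlstmajor                        # list the major numbers still free on each node
    importvg -V 55 -y nfsvg hdisk4    # import the VG using an explicitly chosen major number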

NFS cross-mounts

NFS cross-mounts work as follows:

• The node that is hosting the resource group mounts the file systems locally, NFS exports them, and NFS mounts them, thus becoming both an NFS server and an NFS client.

• All other participating nodes of the resource group simply NFS mount the file systems, thus becoming NFS clients.

• If the resource group is acquired by another node, that node mounts the file systems locally and NFS exports them, thus becoming the new NFS server.

For example:

• Node1, with service IP label svc1, will locally mount /fs1 and NFS export it.
• Node2 will NFS mount svc1:/fs1 on /mntfs1.
• Node1 will also NFS mount svc1:/fs1 on /mntfs1.
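At the AIX level, the NFS mounts in this example correspond to commands such as the following (illustrative only; PowerHA issues the equivalent operations as part of resource group processing):

    mount svc1:/fs1 /mntfs1     # performed on both node1 and node2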

2.4.4 Application servers

Virtually any application that can run on a standalone AIX server can run in a clustered environment protected by PowerHA. The application must be able to be started and stopped by scripts, as well as able to be recovered by running a script after an unexpected shutdown.

Applications are defined to PowerHA as application servers with the following attributes:

Start script This script must be able to start the application from both a clean and an unexpected shutdown. Output from the script will be logged in the hacmp.out log file if set -x is defined within the script. The exit code from the script will be monitored by PowerHA.

Stop script This script must be able to successfully stop the application. Output is also logged and the exit code monitored.

Application monitors To keep applications highly available, PowerHA is able to monitor the application itself, not just the required resources.

The full path name of the script must be the same on all nodes; however, the contents of the script itself can be different from node to node. If the scripts do differ on each node, this will inhibit your ability to use the file collections feature. This is why we generally recommend that you have an intelligent script that can determine which node it is running on and start up appropriately.

As the exit codes from the application scripts are monitored, PowerHA assumes that a non-zero return code from the script means that the script failed and therefore the start or stop of the application was not successful. If this is the case, the resource group will go into error and a config_too_long event will be recorded.
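A minimal, hypothetical start script skeleton illustrating these points (the paths, user, and application commands are examples only, not part of PowerHA):

    #!/bin/ksh
    # /usr/local/ha/start_app.ksh - sample application server start script
    set -x                                      # trace output is captured in hacmp.out
    rm -f /app/locks/app.pid                    # clean up after a possible unexpected halt
    su - appuser -c "/app/bin/appserver start"  # start the application as its own user
    exit $?                                     # non-zero is treated by PowerHA as a start failure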

When configuring the application for PowerHA, these are some considerations:

• The application is compatible with the AIX version.

• The storage environment is compatible with a highly available cluster.

• The application and platform interdependencies must be well understood. The location of the application code, data, temporary files, sockets, pipes, and other components of the system such as printers must be replicated across all nodes that will host the application.

• As already discussed, the application must be able to be started and stopped without any operator intervention—particularly after an unexpected halt of a node. The application start and stop scripts must be thoroughly tested before implementation and with every change in the environment.

• The resource group that contains the application must contain all the resources required by the application, or be the child of one that does.

• Application licensing must be taken into account. Many applications have licenses that depend on the CPU ID; careful planning must be done to ensure that the application can start on any node in the resource group node list. Care must also be taken with the numbers of CPUs, and so on, on each node, because some licensing is sensitive to this as well.

2.4.5 Application monitors

By default, PowerHA is application unaware. However, PowerHA offers the capability of using application monitors to ensure that applications are kept highly available. Upon failure, PowerHA can respond as desired to the failure. More detailed information can be found in 7.7.7, “Application monitoring” on page 440.

Application availability
PowerHA also provides the application availability analysis tool, which is useful for auditing the overall application availability, and for assessing the cluster environment. Additional detailed information can be found in 7.7.8, “Measuring application availability” on page 453.

2.4.6 Communication adapters and links

PowerHA supports three types of communication links:

� SNA configured over a LAN interface
� SNA over X.25
� X.25

Because of the way X.25 is used, these interfaces are treated as a different class of interfaces or devices and are not included in the PowerHA topology, and therefore not managed by the usual methods. In particular, heartbeats are not used to monitor the status of the X.25 interfaces. The daemon clcommlinkd, which uses x25status, is used to monitor the X.25 link status.


2.4.7 Tape resources

Some SCSI and Fibre Channel connected tape drives can be configured as a highly available resource as part of any non-concurrent resource group.

2.4.8 Fast connect resources

The fast connect application server does not need start and stop scripts to be configured, because they are already integrated into PowerHA. After fast connect is configured as a PowerHA resource, PowerHA supports the start, stop, fallover, fallback, and recovery of fast connect services. Fast connect services must not be running when the cluster is being brought up, because PowerHA needs to be controlling fast connect.

If IP address takeover and hardware address takeover have been configured, clients do not need to re-establish their connection after a fallover.

2.4.9 Workload Manager integration

Workload Manager (WLM) is the AIX resource administration tool that allows targets and limits to be set for the use of CPU time, physical memory, and disk I/O bandwidth by applications and users. PowerHA allows WLM classes to be configured as part of a resource group, thus ensuring that applications have sufficient access to critical system resources during times of peak workload. WLM classes can be configured, each with a range of system resources. Rules are then created to assign applications or groups of users to a class, and thus to the range of resources that they can use.

When configured through PowerHA, WLM will start either when a node joins the cluster or as the result of a DARE operation involving WLM, and only on nodes that are part of resource groups containing WLM classes. PowerHA works with WLM in two ways:

� If WLM is already running, PowerHA will save the running configuration, stop WLM, and then restart it with the PowerHA configuration files. When PowerHA stops on a node, the previous WLM configuration will be reactivated.

� If WLM is not running, it will be started with the PowerHA configuration, and stopped when PowerHA stops on the node.

Note: PowerHA can only perform limited verification of the WLM configuration. Proper planning must be performed in advance.


The configuration that WLM uses on a node is specific to the node and the resource groups that might be brought online on that node. Workload manager classes can be assigned to resource groups either as:

� Primary class
� Secondary class

When a node is integrated into the cluster, PowerHA checks each resource group whose node list contains that node. The WLM classes used then depend on the startup policy of each resource group, and the node’s priority in the node list.

Primary WLM class
If the resource group is either online on home node only, or online on first available node:

� If the node is the highest priority node in the node list, the primary WLM class will be used.

� If the node is not the highest priority node and no secondary WLM class is defined, the node will use the primary WLM class.

� If the node is not the highest priority node and a secondary WLM class is defined, the node will use the secondary WLM class.

If the resource group has a startup policy of either online on all available nodes (concurrent) or online using a node distribution policy, then the node will use the primary WLM class.

Secondary WLM class
This class is optional and is only used for nodes that are not the primary node for resource groups with a startup policy of either online on home node only, or online on first available node.

2.4.10 Resource groups

Each resource must be included in a resource group to be made highly available by PowerHA. Resource groups allow PowerHA to manage a related group of resources as a single entity. For example, an application can consist of start and stop scripts, a database, and an IP address. These resources would then be included in a resource group for PowerHA to control as a single entity.

PowerHA ensures that resource groups remain highly available by moving them from node to node as conditions within the cluster change. The main states of the cluster and the associated resource group actions are as follows:

Cluster startup As the nodes in the cluster come up, the resource groups are distributed according to their startup policy.


Resource failure/recovery When a particular resource that is part of a resource group becomes unavailable, the resource group can be moved to another node. Similarly, it can be moved back when the resource becomes available.

PowerHA shutdown on a node There are a number of ways of stopping PowerHA on a node. One method will cause the node’s resource groups to fall over to other nodes. Another method will take the resource groups offline. Under some circumstances, it is possible to stop the cluster services on the node while leaving the resources active.

Node failure/recovery If a node fails, the resource groups that were active on that node are distributed among the other nodes in the cluster, depending on their fallover distribution policies. When a node recovers and is re-integrated into the cluster, resource groups can be re-acquired depending on their fallback policies.

Cluster shutdown When the cluster is shut down, all resource groups are taken offline. However, there are some configurations where the resources can be left active, but the cluster services stopped.

Before understanding the types of behavior and attributes that can be configured for resource groups, you need to understand the following terms:

Node list This is the list of nodes that is able to host a particular resource group. Each node must be able to access the resources that make up the resource group.

Default node priority This is the order in which the nodes are defined in the resource group. A resource group with default attributes will move from node to node in this order as each node fails.

Home node This is the highest priority node in the default node list. By default this is the node a resource group will initially be activated on. This does not specify the node that the resource group is currently active on.

Startup This is the process of bringing a resource group into an online state.

Fallover This is the process of moving a resource group that is online on one node to another node in the cluster in response to an event.


Fallback This is the process of moving a resource group that is currently online on a node that is not its home node, to a re-integrating node.

Resource group behavior: Policies and attributes
The behavior of resource groups is defined by configuring the resource group policies and attributes.

What is important for PowerHA implementors and administrators is the resource groups’ behavior at startup, fallover, and fallback. In the following sections, we describe the custom resource group behavior options.

Startup options
These options control the behavior of the resource group on initial startup.

� Online on home node only:

The resource group is brought online when its home node joins the cluster. If the home node is not available, it will stay in an offline state. See Figure 2-12.

Figure 2-12 Online on home node only


� Online on first available node:

The resource group is brought online when the first node in its node list joins the cluster. See Figure 2-13.

Figure 2-13 Online on first available node


� Online on all available nodes:

The resource group is brought online on all nodes in its node list as they join the cluster. See Figure 2-14.

Figure 2-14 Online on all available nodes


� Online using distribution policy:

The resource group is only brought online if the node has no other resource group of this type already online. If there is more than one resource group of this type to bring online when a node joins the cluster, PowerHA will select the resource group with the fewest nodes in its node list. If the node lists are the same size, PowerHA will choose the resource group whose name comes first alphabetically. However, if one of the resource groups has a dependent resource group (that is, it is a parent in a dependency relationship), it will be given preference. See Figure 2-15.

Figure 2-15 Online using distribution policy


Fallover options
These options control the behavior of the resource group should PowerHA have to move it to another node in response to an event.

� Fall over to next priority node in list:

The resource group falls over to the next node in the resource group’s node list. See Figure 2-16.

Figure 2-16 Fallover to next priority node in list


� Fallover using dynamic node priority:

The fallover node can be selected on the basis of either its available CPU, its available memory or the lowest disk usage. PowerHA uses RSCT to gather the data for the selected variable from each of the nodes in the node list, then the resource group will fall over to the node that best meets the criteria. This policy applies only to resource groups with three or more nodes. See Figure 2-17.

Figure 2-17 Fallover using dynamic node priority


� Bring offline (on error node only)

The resource group is brought offline in the event of an error. This option is designed for resource groups that are online on all available nodes. See Figure 2-18.

Figure 2-18 Bring offline (on error node only)


Fallback options
These options control the behavior of an online resource group when a node joins the cluster.

� Fall back to higher priority node in list

The resource group falls back to a higher priority node when it joins the cluster. See Figure 2-19.

Figure 2-19 Fall back to higher priority node in list


� Never fall back

The resource group does not move if a higher priority node joins the cluster. Resource groups with a startup policy of online on all available nodes must be configured with this option. See Figure 2-20.

Figure 2-20 Never fall back

Resource group attributes
Resource group behavior can now be further tuned by setting resource group attributes:

� Settling time
� Delayed fallback timers
� Distribution policy
� Dynamic node priorities
� Resource group processing order
� Priority override location
� Resource group dependencies - parent/child
� Resource group dependencies - location

The way these attributes affect RG behavior is provided in Table 2-4.


Table 2-4 Resource group attributes and how they affect RG behavior

Attribute                                   Startup   Fallover   Fallback
Settling time                                  X
Delayed fallback timer                                               X
Distribution policy                            X
Dynamic node priority                                     X
Resource group processing order                X          X         X
Priority override location                                X         X
Resource group parent/child dependency         X          X         X
Resource group location dependency             X          X         X

Settling time
This is a cluster-wide attribute that affects the behavior of resource groups that have a startup policy of online on first available node. If it is not set, these resource groups will start on the first node in their node list that integrates into the cluster. If the settling time is set and the node that integrates into the cluster is the resource group’s highest priority node, the group will come online immediately; otherwise, it will wait for the settling time to expire to see if another, higher priority node joins.

This attribute prevents a resource group from starting on an early-integrating node that is low in its priority list and then repeatedly falling over to higher priority nodes as they integrate.

Delayed fallback timers
This attribute is used to configure the time at which a resource group will fall back. It can be set to a specific date and time, or to a recurring time: daily, weekly, monthly, or yearly.

The delayed fallback timer ensures that the resource group will fall back to its highest priority node at a specific time. This ensures that if there is to be a small disruption in services, it will occur at a time convenient to the users. The fallback only occurs if the resource group is not already on its highest priority node.

Distribution policy
This node-based distribution policy ensures that, on cluster startup, each node will acquire only one resource group with this policy set. There was previously also a network-based distribution policy that ensured only one such resource group came online on the same network and node, so nodes with multiple networks could host multiple resource groups of this type.



Dynamic node priority
If there are three or more nodes in the cluster, a dynamic node priority fallover policy can be configured. One of the following three variables can be chosen to determine which node the resource group will fall over to:

� Highest free memory
� Highest idle CPU
� Lowest disk busy

The cluster manager keeps a table of these values for each node in the cluster. So at the time of fallover, the cluster manager can quickly determine which node best meets the criteria. These values are updated every 2 minutes, unless the node cannot be reached, in which case, the previous values remain unchanged.

To display the current values, use the lssrc -ls clstrmgrES command as shown in Example 2-1.

Example 2-1 Checking DNP values as known to clstrmgrES

odin:/# lssrc -ls clstrmgrES
Current state: ST_STABLE
sccsid = "@(#)36 1.135.1.37 src/43haes/usr/sbin/cluster/hacmprd/main.C, hacmp.pe, 51haes_r530, r5300525a 6/20/05 14:13:01"
i_local_nodeid 1, i_local_siteid 1, my_handle 2
ml_idx[1]=0 ml_idx[2]=1 ml_idx[3]=2
There are 0 events on the Ibcast queue
There are 0 events on the RM Ibcast queue
CLversion: 8
cluster fix level is "0"
The following timer(s) are currently active:
Current DNP values
DNP Values for NodeId - 1 NodeName - frigg
   PgSpFree = 0 PvPctBusy = 0 PctTotalTimeIdle = 0.000000
DNP Values for NodeId - 2 NodeName - odin
   PgSpFree = 130258 PvPctBusy = 0 PctTotalTimeIdle = 99.325169
DNP Values for NodeId - 3 NodeName - thor
   PgSpFree = 0 PvPctBusy = 0 PctTotalTimeIdle = 0.000000

Important: To measure highest free memory, PowerHA samples the paging space in use rather than the actual memory in use.


Resource group processing order
If a node is attempting to bring more than one resource group online, the default behavior is to merge all the resources into one large resource group and then process them as one “resource group.” This is called parallel processing, though it is not true parallel processing because it is single threaded.

This default behavior can be altered, and serial processing can be specified for particular resource groups by specifying a serial acquisition list. This order only defines the order of processing on a particular node, not across nodes. If serial processing is specified:

� The specified resource groups are processed in order.
� Resource groups containing only NFS mounts are processed in parallel.
� The remaining resource groups are processed in order.
� The reverse order is used on release.

Resource group dependencies
A combination of two types of resource group dependency can be set:

� Parent/child dependency
� Location dependencies

Parent/child relationships between resource groups are designed for multi-tier applications, where one or more resource groups cannot successfully start until a particular resource group is already active. When a parent/child relationship is defined, the parent resource group must be online before any of its children can be brought online on any node. If the parent resource group is to be taken offline, the children must be taken offline first.

Up to three levels of dependency can be specified, that is a parent resource group can have children that are also parents to other resource groups. However, circular dependencies are not allowed.


Figure 2-21 shows an example where resource group 2 has two children, one of which also has a child. Thus resource group 2 must be online before resource groups 3 and 4 can be brought online. Similarly resource group 4 must be online before resource group 5 can be brought online. Resource group 3 has two parents (resource groups 1 and 2) that must be online before it can come online.

Figure 2-21 Parent / child resource group relationships


Because PowerHA starts applications in the background (so that a hang of the script does not stop PowerHA processing), it is important to have startup application monitors for the parents in any parent/child resource group dependency. After the startup application monitor, or monitors, have confirmed that the application has successfully started, the processing of the child resource groups can commence.

Location dependencies can also be defined for resource groups starting with HACMP 5.3. The choices are as follows:

Online on same node The specified resource groups will always start up, fall over, and fall back to the same node; that is, they move as a set.

A resource group with this dependency can only be brought online on the node where other resource groups in the same set are already online, unless it is the first resource group in the set to be brought online.

Online on different nodes The specified resource groups will start up, fall over, and fall back to different nodes. A priority is assigned to the resource groups, so that the higher priority resource groups will be handled first and kept in an online state should there be a limited number of nodes.

Low priority resource groups will be taken offline on a node if a higher priority resource group is left without a node. Intermediate priority resource groups will not be taken offline. A resource group with this dependency can only be brought online on a node where there are no other resource groups that are part of this dependency already online.

Online on same site The specified resource groups will always be in an online state at the same site.

A resource group with this dependency can only be brought online on the site where other resource groups with this dependency are currently in an online state, unless it is the first with the dependency to be brought online.


Figure 2-22 shows an example of a three node cluster, with two databases and two applications. The applications cannot start up before the databases are online, so the parent / child dependency is configured. For performance reasons, the databases should be on different nodes, and the applications should be on the same nodes as the databases, so the location dependencies are configured.

Figure 2-22 Resource group dependencies

To set or display the RG dependencies, you can use the clrgdependency command, as shown in Example 2-2:

Example 2-2 Modifying and checking RG dependencies

odin:># clrgdependency -t [PARENT_CHILD | NODECOLLOCATION | ANTICOLLOCATION | SITECOLLOCATION ] -sl
odin:># clrgdependency -t PARENT_CHILD -sl
#Parent Child
rg1 rg2
rg1 rg3

odin:># clrgdependency -t NODECOLLOCATION -sl

odin:># clrgdependency -t ANTICOLLOCATION -sl
#HIGH:INTERMEDIATE:LOW


rg01::rg03frigg

odin:># clrgdependency -t SITECOLLOCATION -sl
rg01 rg03 frigg

Another way to check is by using the odmget HACMPrg_loc_dependency command.

Resource group manipulation
Resource groups can be:

Brought online A resource group can be brought online on a node in the resource group’s node list. The resource group would be currently offline, unless it was an online on all available nodes resource group.

Brought offline A resource group can be taken offline from a particular node.

Moved to another node while online A resource group that is online on one node can be taken offline and then brought online on another node in the resource group’s node list. This can include moving the resource group to another site.

Certain changes are not allowed:

� A parent resource group cannot be taken offline or moved if there is a child resource group in an online state.

� A child resource group cannot be started until the parent resource group is online.

Resource group states
HACMP 5.x modified how resource group failures are handled, so manual intervention is not always required.

If a node fails to bring a resource group online when it joins the cluster, the resource group is left in the ERROR state. If the resource group is not configured as online on all available nodes, PowerHA will then attempt to bring the resource group online on the other active nodes in the resource group’s node list.

Since HACMP 5.2, each node that joins the cluster will automatically attempt to bring online any of the resource groups that are in the ERROR state.

If a node fails to acquire a resource group during fallover, the resource group will be marked as “recoverable” and PowerHA will attempt to bring the resource group online on the other nodes in the resource group’s node list. If this fails for all nodes, then the resource group will be left in ERROR state.


If there is a failure of a network on a particular node, PowerHA will determine which resource groups are affected (those that had service IP labels on the network) and then attempt to bring them online on another node. If there are no other nodes with the required network resources, the resource groups are left in ERROR state. Should any interfaces become available, PowerHA will determine which resource groups in ERROR state can be brought online, and then attempt to do so.
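
To check the current state of the resource groups (including any in ERROR state), the clRGinfo utility can be run on any active cluster node, for example:

   /usr/es/sbin/cluster/utilities/clRGinfo    # lists each resource group and its state per node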

Selective fallovers
The following failures are categorized as selective:

� Interface failure:

  – PowerHA swaps interfaces if possible.
  – Otherwise, the RG is moved to the highest priority node with an available interface; if that is not successful, the RG is put into ERROR state.

� Network failure:

  – If it is a local failure, the affected RGs are moved to another node.
  – If it is a global failure, this results in a node_down event for all nodes.

� Application failure:

  – If an application monitor indicates that an application has failed, depending on the configuration, PowerHA first attempts to restart the application on the same node (usually three times).
  – Then, if restart is not possible, PowerHA moves the RG to another node, and if this also fails, the RG is put into ERROR state.

� Communication link failure:

  – PowerHA will attempt to move the RG to another node.

� Volume group failure (loss of quorum):

  – If selective fallover for volume groups on the “LVM_SA_QUORCLOSE” error is configured, then PowerHA will attempt to move the affected RGs to another node.

Tip: If you want to override the automatic behavior of bringing a resource group in ERROR state back online, specify that it must remain offline on a node.

2.5 Plug-ins

The PowerHA plug-in software contains sample scripts to help you configure the following services as part of a highly available cluster:

� Name server
� Print server
� DHCP server


Each plug-in consists of application start and stop scripts, application monitor scripts, and cleanup scripts. There is also a script to confirm that the correct configuration files are available on a shared file system. Each plug-in contains a readme file with further information. The plug-ins can be found under /usr/es/sbin/cluster/plugins/<plugin_name> when the cluster.es.plugins fileset is installed.
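
For example, after installing the fileset you can confirm what was delivered and browse the plug-in directories as follows:

   lslpp -l "cluster.es.plugins*"       # confirm the plug-in filesets are installed
   ls /usr/es/sbin/cluster/plugins      # list the available plug-in directories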

2.6 Features (HACMP 5.1, 5.2 and 5.3)

This section lists some new features and enhancements, as well as those no longer supported.

2.6.1 New features

In this section we describe the following key new features.

Clinfo daemon intra-cluster communication enhancements
The clinfo daemon now contains version information and has a new log file, /tmp/clinfo.debug.

The SMUX peer daemon (clsmuxpd) functionality has been added into the cluster manager daemon so SNMP queries are possible even if the cluster is not active. Two new states not_configured and not_synced have been created.

The cluster manager now has two log files:

/tmp/clstrmgr.debug The location is configurable, and the file contains the default cluster manager logging.

/tmp/clsmuxtrmgr.debug This is a new logfile for tracing the new SNMP function of the cluster manager.

Cluster verification enhancements
Next we cover the major cluster verification enhancements.

Automatic verification and synchronization
PowerHA verifies the node’s configuration when it starts (either as the first node in the cluster, or joining an active cluster). The following items are automatically checked and corrected if required:

� RSCT instance numbers are consistent.
� IP interfaces are configured as RSCT expects.
� Shared volume groups are not set to automatically varyon.
� File systems are not set to automatically mount.

Chapter 2. High availability components 89

Page 124: PowerHA for AIX Cookbook - hmm.presalesadvisor.com · ibm.com/redbooks PowerHA for AIX Cookbook Shawn Bodily Rosemary Killeen Liviu Rosca Extended case studies with practical disaster

If the joining node’s configuration does not match that of the running cluster, it will be synchronized with one of the running nodes.

Verification also detects potential single points of failure that were previously only detected by automatic error notification.

Additional cluster verification
Additional checks are performed by PowerHA:

� Each node has the same version of RSCT.

� Each IP interface has the same setting for MTU, and the AIX and PowerHA settings for the IP interfaces are consistent.

� The AIX network options used by PowerHA and RSCT are consistent across nodes.

� Volume groups and PVIDs are consistent across nodes that are members of the owning resource group.

� If PowerHA/XD is installed, site management policy cannot be set to ignore.

� If PowerHA/XD is installed, then the PPRC and/or GLVM configuration will be verified.

clhosts file automatically populated
The clhosts file, which is used by many monitor programs, has two forms:

� HACMP server version: This file is on each node in /usr/es/sbin/cluster/etc and has the entry 127.0.0.1 added on installation of HACMP.

� HACMP client version: The file clhosts.client is found in /usr/es/sbin/cluster/etc and is populated by cluster verification with the address and label of each interface and each defined service IP address. Timestamped versions are kept.

Cluster definition file in XML format
XML is the common file format for user created cluster definition files and the online planning worksheet files. SMIT can be used to convert existing cluster snapshot files into an XML cluster definition file.

OEM and Veritas Volumes and file system integration
PowerHA can now routinely manage OEM volume groups and their corresponding file systems. This feature means that OEM disks, volumes, and file systems can be included in a PowerHA resource group. Either the supplied custom methods can be used, or individually customized methods.

90 PowerHA for AIX Cookbook

Page 125: PowerHA for AIX Cookbook - hmm.presalesadvisor.com · ibm.com/redbooks PowerHA for AIX Cookbook Shawn Bodily Rosemary Killeen Liviu Rosca Extended case studies with practical disaster

In particular, PowerHA will automatically detect volume groups created with Veritas volume manager using the Veritas foundation suite (v4.0).

SMS capability
A new custom remote notification method has been added. It is now possible to send remote notification messages as a text message to a mobile phone, or as an e-mail to an e-mail address.

Resource group location dependencies
In addition to the policies that define resource group parent/child dependencies and the startup distribution, PowerHA now offers cluster-wide location dependencies for resource groups:

� Online on same node
� Online on different nodes
� Online on same site

PowerHA/XD
Parallel processing of the primary and secondary instances of PowerHA/XD replicated resource groups is the default, though serial processing can still be specified in this release. DARE and rg_move processes support parallel processing across sites.

Site management policies can be specified for the startup, fallover, and fallback behavior of both the primary and secondary instance of a resource group.

� Concurrent like (online on all available nodes) resource group inter site behavior can be combined with a non-concurrent site policy.

� Parent / child dependency relationships can be specified.

� Node based distribution start policy can be used.

� Resource group collocation and anti-collocation supported.

� Cluster verification also verifies PowerHA/XD configurations; however, the configuration must be manually propagated to other nodes because significant amounts of customization must often be done.

WebSMIT security enhancements
There is server-side validation of parameters passed to WebSMIT prior to execution.

The WebSMIT authentication tools are more fully integrated with AIX authentication mechanisms.

Chapter 2. High availability components 91

Page 126: PowerHA for AIX Cookbook - hmm.presalesadvisor.com · ibm.com/redbooks PowerHA for AIX Cookbook Shawn Bodily Rosemary Killeen Liviu Rosca Extended case studies with practical disaster

HACMP smart assist programs
The HACMP smart assist programs have received the following major changes:

� HACMP support for Smart assist for WebSphere®, which started with version 5.2, has been updated.

� Smart assist for DB2® includes monitoring and recovery support for DB2 Universal Database™ Enterprise Server Edition.

� Smart assist for Oracle assists with the install of the Oracle application server 10g.

2.6.2 Features no longer supported

The following features are no longer supported:

� cllockd or cllockdES are no longer supported.

� clinfo no longer uses shared memory; it uses message queues.

� clsmuxpd no longer exists and its functions have been incorporated into the cluster manager.

� Event management subsystem has been replaced with RSCT Resource Monitoring and Control (RMC) subsystem.

� cldiag is no longer supported from the command line.

� clverify is no longer supported from the command line.

2.7 Limits

This section lists some of the common PowerHA limits at the time of writing. These limits are presented in Table 2-5.

Table 2-5 PowerHA limits

Component                                   Maximum number/other limits
Nodes                                       32
Resource groups                             64
Networks                                    48
Network interfaces, devices, and labels     256
Cluster resources                           While 128 is the maximum that clinfo can handle, there can be more in the cluster
Parent-Child dependencies                   Maximum of 3 levels
Sites                                       2
Interfaces                                  7 interfaces per node per network
Application monitors per site               128
Persistent IP alias                         One per node per network
XD_data networks                            4 per cluster
GLVM modes                                  Synchronous, asynchronous, non-concurrent
GLVM devices                                All PVs supported by AIX; no need to be the same local and remote


Subnet requirements
The AIX kernel routing table supports multiple routes for the same destination. If multiple matching routes have the same weight, each of the subnet routes will be used alternately. The problem that this poses for PowerHA is that if one node has multiple interfaces that share the same route, PowerHA has no means to determine the health of each individual interface.

Therefore, we recommend that each interface on a node belongs to a unique subnet, so that each interface can be monitored. Using heartbeat over IP aliases is an alternative.
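
As an illustration (the addresses below are hypothetical), each boot interface would be placed on its own subnet, which can be checked with standard AIX commands:

   # Hypothetical boot addressing, one subnet per interface:
   #   en0  192.168.100.31  netmask 255.255.255.0
   #   en1  192.168.200.31  netmask 255.255.255.0
   netstat -in        # list the interfaces and their addresses
   netstat -rn        # verify that no two interfaces share the same subnet route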

2.8 Storage characteristics

This section presents information about storage characteristics, along with PowerHA storage handling capabilities.

2.8.1 Shared LVM

For a PowerHA cluster, the key element is the data used by the highly available applications. This data is stored on AIX Logical Volume Manager (LVM) entities. PowerHA clusters use the capabilities of the LVM to make this data accessible to multiple nodes. These are the components of the shared logical volume manager:

� A shared volume group is a volume group that resides entirely on the external disks shared by cluster nodes.

� A shared physical volume is a disk that resides in a shared volume group.



� A shared logical volume is a logical volume that resides entirely in a shared volume group.

� A shared file system is a file system that resides entirely in a shared logical volume.

If you are a system administrator of a PowerHA cluster, you might be called upon to perform any of the following LVM-related tasks:

� Create a new shared volume group.
� Extend, reduce, change, or remove an existing volume group.
� Create a new shared logical volume.
� Extend, reduce, change, or remove an existing logical volume.
� Create a new shared file system.
� Extend, change, or remove an existing file system.
� Add and remove physical volumes.

When performing any of these maintenance tasks on shared LVM components, make sure that ownership and permissions are reset when a volume group is exported and then re-imported. More details on performing these tasks are available in 7.4, “Shared storage management” on page 353.

After exporting and importing, a volume group is owned by root and accessible by the system group.
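
The following sketch shows one way this sequence is commonly handled from the command line. The volume group, logical volume, disk, major number, and user names (app_vg, applv, hdisk2, 101, appuser) are examples only; in practice, C-SPOC is the preferred method.

   # On the node that currently owns the volume group
   umount /appfs
   varyoffvg app_vg

   # On the node that needs a refreshed definition
   exportvg app_vg                        # remove any stale ODM definition
   importvg -y app_vg -V 101 hdisk2       # re-import using an example major number and disk
   chown appuser:staff /dev/rapplv        # raw logical volume ownership reverts to root on import
   chmod 660 /dev/rapplv
   varyoffvg app_vg                       # leave the volume group for PowerHA to activate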

Shared logical volume access can be made available in any of the following data accessing modes:

� Non-concurrent access mode
� Concurrent access mode
� Enhanced concurrent access mode

Additional details can be found in 12.1, “Volume group types” on page 576.

2.8.2 Non-concurrent access mode

PowerHA in a non-concurrent access environment typically uses journaled file systems to manage data, though some database applications can bypass the journaled file system and access the logical volume directly.

Note: Applications, such as some database servers, that use raw logical volumes might be affected by this change if they change the ownership of the raw logical volume device. You must restore the ownership and permissions back to what is needed after this sequence.


Both mirrored and non-mirrored configurations are supported by non-concurrent access of LVM. For more information about creating mirrored and non-mirrored logical volumes, refer to the HACMP Planning and Installation Guide, SC23-4861.

To create a non-concurrent shared volume group and file systems on a node, use C-SPOC as described in 7.4.1, “Updating LVM components” on page 354.

Importing a volume group to a fallover node
Before you import the volume group, make sure the volume group is varied off from the primary node. You can then run the PowerHA discovery process, which will collect information about all volume groups available across all nodes.

Importing the volume group onto the fall-over node synchronizes the ODM definition of the volume group on each node on which it is imported.

When adding a volume group to the resource group, you can choose to manually import a volume group onto the fall-over node, or you can choose to automatically import it onto all the fall-over nodes in the resource group.

For more information about importing volume groups using C-SPOC, see 7.4.2, “C-SPOC Logical Volume Manager” on page 357.

2.8.3 Concurrent access mode

Using concurrent access with PowerHA requires installing the additional fileset named bos.clvm.enh. Concurrent access mode is not supported for file systems; instead, you must use raw logical volumes or physical disks.
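
You can verify whether the fileset is present with lslpp, and install it from your AIX installation media if it is missing (the device name below is only an example):

   lslpp -l bos.clvm.enh                     # check whether the fileset is installed
   installp -agXd /dev/cd0 bos.clvm.enh      # install it from example media if required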

Creating a concurrent access volume group
Creating a concurrent access volume group now only applies to using enhanced concurrent mode volume groups, as described in the next section. For more information about how to create one and add it to a resource group, refer to “Adding a concurrent VG and a new concurrent RG” on page 363.

Note: Always make sure to use unique logical volume, jfslog, and file system names.

Note: Concurrent access is only provided through the use of enhanced concurrent volume groups.


2.8.4 Enhanced concurrent mode volume groups

PowerHA has the ability to create and use enhanced concurrent volume groups. These can be used for both concurrent and non-concurrent access. You can also convert existing concurrent (classic) volume groups to enhanced concurrent mode using C-SPOC.

For enhanced concurrent volume groups that are used in a non-concurrent environment, rather than using the SCSI reservation mechanism, PowerHA uses the fast disk takeover mechanism to ensure fast takeover and data integrity.

The enhanced concurrent volume group is varied on across all nodes in the cluster that are part of that resource group. However, access for modifying data is only granted to the node that has the resource group active (online). More detailed information about this can be found in Chapter 12, “Storage related considerations” on page 575.
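
In practice these volume groups are usually created through C-SPOC, but as a sketch, an enhanced concurrent-capable volume group can also be created from the command line; the names and disks below are hypothetical:

   mkvg -C -y app_vg -s 64 hdisk2 hdisk3     # -C creates an enhanced concurrent capable VG
   lsvg app_vg | grep -i concurrent          # should report Enhanced-Capable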

2.8.5 Fast disk takeover

This feature was introduced with HACMP V5.1, and it has the following main purposes:

� Decreases the application downtime, with faster resource group fallover (and movement)

� Uses AIX Enhanced Concurrent volume groups (ECM)

� Uses RSCT for communications

The enhanced concurrent volume group supports active and passive mode varyon, and can be included in a non-concurrent resource group.

The fast disk takeover is set up automatically by the PowerHA software. For all shared volume groups that have been created in enhanced concurrent mode and contain file systems, PowerHA will activate the fast disk takeover feature. When HACMP starts, all nodes in a resource group that share the same enhanced volume group will varyon that volume group in passive mode. When the resource group is brought online, the node that acquires the resources will varyon the volume group in active mode.

The other nodes will maintain the volume group varied on in passive mode. In this case, all the changes to the volume group will be propagated automatically to all the nodes in that volume group. The change from active to passive mode and the reverse are coordinated by PowerHA at cluster startup, resource group activation, and fallover, and when a failing node rejoins the cluster.
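
To see which varyon mode a node is currently using for such a volume group, you can inspect the lsvg output. The volume group name is hypothetical, and the exact wording of the fields can vary slightly by AIX level:

   lsvg app_vg      # look at the Concurrent and VG Mode fields:
                    # the node owning the resource group shows an active mode,
                    # while the standby nodes show a passive-only mode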


For more information about fast disk takeover, see 12.4, “Fast disk takeover” on page 580.

2.9 Shared storage configuration

Most PowerHA configurations require shared storage, meaning disk subsystems that support access from multiple hosts.

There are also third-party (OEM) storage devices and subsystems that can be used, although most of these are not directly certified by IBM for PowerHA usage. For these devices, check the manufacturer’s respective Web sites.

PowerHA also supports shared tape drives, connected using SCSI or Fibre Channel. Concurrent mode tape access is not supported.

For an updated list of supported storage and tape drives, check the IBM Web site at:

http://www-03.ibm.com/systems/power/software/availabilty/aix/index.html

Storage configuration is one of the most important tasks you have to perform before starting the PowerHA cluster configuration. Storage configuration can be considered a part of PowerHA configuration.

Depending on the application needs and on the type of storage, you have to decide how many nodes in the cluster will have shared storage access, and which resource groups will use which disks.

Most of the IBM storage subsystems are supported with PowerHA. To find more information about storage server support, see the HACMP Planning and Installation Guide, SC23-4861.

2.9.1 Shared LVM requirements

Planning shared LVM for a PowerHA cluster depends on the method of shared disk access and the type of shared disk device. The elements that should be considered for shared LVM are:

� Data protection method
� Storage access method
� Storage hardware redundancy


In this section, we provide information about data protection methods at the storage level, and also talk about the LVM shared disk access modes:

� Non-concurrent
� Concurrent “classic”
� Enhanced concurrent mode (ECM)

2.9.2 Non-concurrent, enhanced concurrent, and concurrent

In a non-concurrent access configuration, only one cluster node can access the shared data at a time. If the resource group containing the shared disk space moves to another node, the new node will activate the disks, and check the current state of the volume groups, logical volumes, and file systems.

In non-concurrent configurations, the disks can be shared as:

� Raw physical volumes
� Raw logical volumes
� File systems

In a concurrent access configuration, data on the disks is available to all nodes concurrently. This access mode does not support file systems (either JFS or JFS2).

LVM requirements
The Logical Volume Manager (LVM) component of AIX manages the storage by coordinating data mapping between physical and logical storage. Logical storage can be expanded and replicated, and can span multiple physical disks and enclosures.

The main LVM components are as follows:

� Physical volume:

A physical volume (PV) represents a single physical disk as it is seen by AIX (hdisk). The physical volume is partitioned into physical partitions (PPs), which represent the physical allocation units used by LVM.

Note: PowerHA itself does not provide storage protection. Storage protection is provided using:

� AIX (LVM mirroring)� GLVM� Hardware RAID


� Volume group:

A volume group (VG) is a set of physical volumes that AIX treats as a contiguous, addressable disk region. In PowerHA, the volume group and all its logical volumes can be part of a shared resource group. A volume group cannot be part of multiple resource groups (RGs).

� Physical partition:

A physical partition (PP) is the allocation unit in a volume group. The PVs are divided into PPs (when the PV is added to a volume group), and the PPs are used for LVs (one, two, or three PPs per logical partition (LP)).

� Volume group descriptor area (VGDA):

The VGDA is an area on the disk that contains information about the storage allocation in that volume group.

For a single disk volume group, there are two copies of the VGDA. For a two disk volume group, there are three copies of the VGDA: two on one disk and one on the other. For a volume group consisting of three or more PVs, there is one VGDA copy on each disk in the volume group.

� Quorum:

For an active volume group to be maintained as active, a “quorum” of VGDAs must be available (50% + 1). Also, if a volume group has the quorum option set to “off”, it cannot be activated (without the “force” option) if one VGDA copy is missing. If quorum is turned off, the system administrator must know the mapping of that volume group to ensure data integrity. A short example of checking and disabling quorum is shown after this list.

� Logical volume:

A logical volume (LV) is a set of logical partitions that AIX makes available as a single storage entity. The logical volumes can be used as raw storage space or as file system’s storage. In PowerHA, a logical volume that is part of a volume group is already part of a resource group, and cannot be part of another resource group.

� Logical partition:

A logical partition (LP) is the space allocation unit for logical volumes, and is a logical view of a physical partition. With AIX LVM, the logical partitions can be mapped to one, two, or three physical partitions to implement LV mirroring.

� File systems:

A file system (FS) is in fact a simple database for storing files and directories. A file system in AIX is stored on a single logical volume. The main components of the file system (JFS or JFS2) are the logical volume that holds the data, the file system log, and the file system device driver. PowerHA supports both JFS and JFS2 as shared file systems.
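
As referenced in the quorum item above, the following commands show how quorum can be checked and, if required, disabled; the volume group name app_vg is hypothetical:

   lsvg app_vg | grep -i quorum      # display the current quorum setting
   chvg -Qn app_vg                   # disable quorum (takes effect at the next varyon)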


Forced varyon of volume groups
PowerHA provides a facility to use the forced varyon of a volume group option on a node. If, during the takeover process, the normal varyon command fails on that volume group (lack of quorum), PowerHA will ensure that at least one valid copy of each logical partition for every logical volume in that volume group is available before varying on that volume group on the takeover node.

Forcing a volume group to varyon lets you bring and keep a volume group online (as part of a resource group) as long as there is one valid copy of the data available. You should use a forced varyon option only for volume groups that have mirrored logical volumes, and use caution when using this facility to avoid creating a partitioned cluster.

This option is useful in a takeover situation in case a volume group that is part of that resource group loses one or more disks (VGDAs). If this option is not used, the resource group will not be activated on the takeover node, thus rendering the application unavailable.

When using the forced varyon of volume groups option in a takeover situation, PowerHA first tries a normal varyonvg command. If this attempt fails due to lack of quorum, PowerHA checks the integrity of the data to ensure that there is at least one available copy of all data in the volume group before trying to force the volume group online. If there is, it runs the varyonvg -f command; if not, the volume group remains offline and the resource group goes into an error state.

For more information, see “Planning Shared LVM Components” in the HACMP Planning and Installation Guide, SC23-4861.

Note: You should specify the super strict allocation policy for the logical volumes in volume groups used with the forced varyon option. In this way, the LVM makes sure that the copies of a logical volume are always on separate disks, and increases the chances that forced varyon will be successful after a failure of one or more disks.

Note: The users can still use quorum buster disks or custom scripts to force varyon a volume group, but the new forced varyon attribute in PowerHA automates this action, and customer enforced procedures can now be relaxed.
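
As a sketch of the super strict allocation policy mentioned in the note above, a mirrored logical volume can be created with the allocation policy set to super strict; all names and sizes below are hypothetical:

   mklv -y app_lv -t jfs2 -c 2 -s s app_vg 100    # two copies, super strict allocation policy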


Part 2 Planning, installation, and migration

In Part 2, we provide information about PowerHA cluster and environment planning, and explain how to install a sample cluster. We also present examples for migrating a cluster from an earlier HACMP version to the latest PowerHA 5.5. Our scenarios provide step-by-step instructions and comments, as well as some problem determination for migration.

We cover the following topics:

� Planning
� Installation and configuration
� Migrating a cluster to PowerHA V5.5



Chapter 3. Planning

In this chapter, we discuss the planning aspects for a PowerHA cluster. Adequate planning and preparation are required to successfully install and maintain a PowerHA cluster. Time spent properly planning your cluster configuration and preparing your environment will result in a cluster that is easier to install and maintain and one that provides higher application availability.

Before you begin planning your cluster, you must have a good understanding of your current environment, your application, and your expectations for PowerHA. Building on this information, you can develop an implementation plan that can allow you to easily integrate PowerHA into your environment, and more importantly, have PowerHA manage your application availability to your expectations.

In addition to Chapter 2, “High availability components” on page 23, which discusses PowerHA concepts and basic design considerations, this chapter focuses on the steps required to plan for a PowerHA implementation. For ease of explanation, we use the planning and preparation for a two-node mutual takeover cluster as an example. This configuration is the starting point for more advanced installations.

We cover the following topics in this chapter:

� High availability planning
� Planning for PowerHA
� Getting started
� Planning cluster hardware
� Planning cluster software
� Operating system considerations
� Planning security
� Planning cluster networks
� Planning storage requirements
� Application planning
� Planning for resource groups
� Detailed cluster design
� Developing a cluster test plan
� Developing a PowerHA installation plan
� Backing up the cluster configuration
� Documenting the cluster
� Change and problem management
� Planning tools

3.1 High availability planning

The primary goal of planning a high availability cluster solution is to eliminate or minimize service interruptions for a chosen application. To that end, single points of failure, in both hardware and software, must be addressed. This is usually accomplished through the use of redundant hardware, such as power supplies, network interfaces, SAN adapters, and mirrored or RAID disks. All of these components carry an additional cost, and might not protect the application in the event of a server or operating system failure.

Here is where PowerHA comes in. PowerHA can be configured to monitor server hardware, operating system, and application components. In the event of a failure, PowerHA can take corrective actions, such as moving specified resources (service IP addresses, storage, and applications) to surviving cluster components in order to restore application availability as quickly as possible.

Because PowerHA is an extremely flexible product, designing a cluster to fit your organization requires thorough planning. Knowing your application requirements and behavior provides important input to your PowerHA plan and is a primary factor in determining the cluster design. Ask yourself the following questions while developing your cluster design:

� Which application services are required to be highly available?

� What are the service level requirements for these application services (24/7, 8/5) and how quickly must service be restored in the event of a failure?

� What are the potential points of failure in the environment and how can they be addressed?

� Which points of failure can be automatically detected by PowerHA and which would require custom code to be written to trigger an event?

� What is the skill level within the group implementing and maintaining the cluster?

Note: Detailed planning information can be found in the manual HACMP for AIX Planning Guide, SC23-4861.

Although the AIX system administrators are typically responsible for the implementation of PowerHA, usually they cannot do it on their own. A team consisting of the following representatives should be assembled to assist with the PowerHA planning as each will play a role in the success of the cluster:

� Network administrator
� AIX system administrator
� Database administrator
� Application programmer
� Support personnel
� Application users

3.2 Planning for PowerHA

The major steps to a successful PowerHA implementation are identified in Figure 3-1. Notice that a cluster implementation does not end with the successful configuration of a cluster. Cluster testing, backup, documentation, and system and change management procedures are equally important to ensure the ongoing integrity of the cluster.

Using the concepts discussed in Chapter 1, “Introduction to PowerHA for AIX” on page 3, you should begin a PowerHA implementation by developing a detailed PowerHA cluster configuration and implementation plan. The Online Planning Worksheets can be used to guide you through this process and record the cluster details.

Figure 3-1 PowerHA implementation steps

3.2.1 Planning strategy

As illustrated in Figure 3-1, planning is the foundation upon which the implementation is built. Proper planning should touch on all aspects of cluster implementation. It should include:

� The cluster design and behavior
� A detailed cluster configuration
� Installation considerations and plan
� A plan to test the integrity of the cluster
� A backup strategy for the cluster
� A procedure for documenting the cluster
� A plan to manage problems and changes in the cluster

The implementation steps shown in Figure 3-1 are:

Planning: Design the cluster configuration and behavior. Use the Online Planning Worksheets, paper worksheets, and cluster diagrams to assist with planning and documenting the cluster.

Preparation: Prepare the existing environment for the PowerHA installation and configuration. Configure the operating system, network, storage, and applications in preparation for PowerHA.

Installation: Install the PowerHA software.

Configuration: Using the planning worksheets (paper or online), configure the cluster.

Testing: Use the automated test tool and custom test plans to test the cluster functionality and behavior before moving to production.

Backups: Use the snapshot tool to back up your cluster configuration.

Documentation: Document the cluster configuration for systems management purposes.

Management: Implement systems management and change management procedures to ensure that cluster availability targets are maintained.

For ease of explanation, we use the planning of a simple two-node mutual takeover cluster as an example. Sample planning worksheets are included as we work through this chapter in order for you to see how the cluster planning is developed.

3.2.2 Planning tools

Three tools are available to help with the planning of a PowerHA cluster:

� Cluster diagram
� Paper Planning Worksheets
� Online Planning Worksheets

Both the cluster diagram and the Paper Planning Worksheets provide a manual method of recording your cluster information. The Online Planning Worksheets provide an easy to use Java™-based interface that can be used to record and configure your cluster.

All three tools are discussed in more detail towards the end of this chapter.

3.3 Getting started

Begin cluster planning by assessing the current environment and your expectations for PowerHA. Here are some questions you might ask:

� Which applications need to be highly available?

� How many nodes are required to support the applications?

� Are the existing nodes adequate in size (CPU/memory) to run multiple applications or is this a new installation?

� How do the clients connect to the application, and what is the network configuration?

� What type of shared disk is to be used?

� What are the expectations for PowerHA?

Important: For a successful implementation, PowerHA cluster planning and preparation should be completed before you install or configure PowerHA.

Note: If you decide to use the Online Planning Worksheets (OLPW) or the two-node cluster configuration assistant, it is still important that PowerHA planning and preparation be completed. The OLPW and the two-node cluster configuration assistant are intended only to ease the documentation and configuration of the cluster; you must still have a good understanding of the planning and preparation.

3.3.1 Current environment

Figure 3-2 illustrates a simple starting configuration which we use as our example. It focuses on two applications to be made highly available. This could be an existing pair of servers or two new servers. Bear in mind that a server is merely a representation of an AIX image running on POWER hardware. It is common practice that these “servers” would be logical partitions (LPARs).

The starting configuration shows that:

� Each application resides on a separate node (server).

� Clients access each application over a dedicated Ethernet connection on each server.

� Each node is relatively the same size in terms of CPU and memory, each with additional spare capacity.

� Each node has redundant power supplies and mirrored internal disks.

� The applications reside on external SAN disk.

� The applications each have their own robust start and stop scripts.

� There is a monitoring tool to verify the health of each application.

� AIX 6.1 is already installed.

Important: Each application to be integrated into the cluster must run in standalone mode. You also must be able to fully control the application (start, stop, and validation test).

Figure 3-2 Initial environment

The intention is to make use of the two nodes in a mutual takeover configuration where app1 normally resides on node01, and app2 normally resides on node02. In the event of a failure, we want both applications to run on the surviving server. We can see from the diagram that we need to prepare the environment in order to allow each node to run both applications.

Analyzing PowerHA cluster requirements, we have three key focus areas, as illustrated in Figure 3-2: network, application, and storage. All planning activities support one of these three areas to some extent:

Network: How clients connect to the application (the service address). The service address floats between all designated cluster nodes.

Application: What resources are required by the application. The application must have everything it needs to run on a fallover node, including CPU and memory resources, licensing, run-time binaries, and configuration data. It should have robust start and stop scripts as well as a tool to monitor its status.

Storage: What type of shared disks will be used. The application data must reside on shared disks that are available to all cluster nodes.

Note: Each application to be integrated into the cluster must be able to run in standalone mode on any node that it might have to run on (under both normal and fallover situations).

3.3.2 Addressing single points of failure

Table 3-1 summarizes the various single points of failure found in the cluster infrastructure and how to protect against them. These items should be considered during the development of the detailed cluster design.

Table 3-1   Single points of failure

� Nodes: use multiple nodes. PowerHA/AIX supports up to 32 nodes.
� Power sources: use multiple circuits or uninterruptible power supplies (UPS); as many as needed.
� Networks: use multiple networks to connect nodes; up to 48 are supported.
� Network interfaces, devices, and IP addresses: use redundant network adapters; up to 256 are supported.
� TCP/IP subsystem: use point-to-point networks to connect adjoining nodes; as many as needed.
� Disk adapters: use redundant disk adapters; as many as needed.
� Storage controllers: use redundant disk controllers; as many as needed (hardware limited).
� Disks: use redundant hardware and disk mirroring, striping, or both; as many as needed.
� Applications: assign a node for application takeover, configure application monitors, and configure clusters with nodes at more than one site; as many as needed.
� Sites: use more than one site for disaster recovery; a maximum of two sites is supported.

3.3.3 Initial cluster design

Now that we have an understanding of the current environment, PowerHA concepts, and our expectations for the cluster, we can begin the cluster design.

This is a good time to create a diagram of the PowerHA cluster. Start simply and gradually increase the level of detail as you go through the planning process. The diagram can help you identify single points of failure and application requirements, and it can guide you through the planning process.

The paper or online planning worksheets should also be used to record the configuration and cluster details as you go.

Figure 3-3 illustrates the initial cluster diagram used in our example. At this point, the focus is on high level cluster functionality. Cluster details are developed as we move through the planning phase.

Figure 3-3 Initial cluster design

We begin to make design decisions for the cluster topology and behavior based on our requirements. The initial cluster design for our example includes the following considerations:

� The cluster is a two-node mutual takeover cluster.

� Hostnames could be used as cluster node names, but we choose to specify cluster node names instead.

� Each node contains one application but is capable of running both (consider network, storage, memory, CPU, software).

� Each node has one logical ethernet interface that is protected using Shared Ethernet Adapter (SEA) in a Virtual I/O Server (VIOS).

� We use IPAT (IP Address Takeover) using aliasing.

� Each node has a persistent IP address (an IP alias always available while the node is up) and one service IP (aliased to one of the adapters under PowerHA control). The base Ethernet adapter addresses are on separate subnets.

� Shared disks are virtual SCSI devices provided by a Virtual I/O Server; they reside on a SAN and are available to both nodes.

� All volume groups on the shared disks are created in Enhanced Concurrent Mode (ECM) in order to allow for the use of heartbeating over disk and fast disk takeover.

� Each node has enough CPU/memory resources to run both applications.

� Each node has redundant hardware and mirrored internal disks.

� AIX 6.1 TL02 is installed.

� PowerHA 5.5 is used.

This list simply captures the basic components of the cluster design. Each item will be investigated in further detail as we progress through the planning stage.

3.3.4 Completing the cluster overview planning worksheet

Complete the initial worksheet as we have done in our example. There are 11 worksheets found in this chapter, each covering different aspects of the cluster planning. Table 3-2 shows the first worksheet listing the basic cluster elements.

Note: If you plan to use the DLPAR function of PowerHA, the AIX hostname, and the HMC LPAR name (as seen in the HMC GUI) must match.

Table 3-2   Cluster overview

PowerHA CLUSTER WORKSHEET - PART 1 of 11: CLUSTER OVERVIEW (DATE: March 2009)

CLUSTER NAME            clusterdemo
ORGANIZATION            IBM ITSO
NODE 1 HOSTNAME         HA55node1
NODE 2 HOSTNAME         HA55node2
NODE 1 PowerHA NAME     node01
NODE 2 PowerHA NAME     node02
COMMENTS                This is a set of planning tables for a simple two-node PowerHA 5.5 mutual takeover cluster using IPAT via aliasing.

3.4 Planning cluster hardware

Cluster design starts by determining how many and what type of nodes are required. This depends largely on two factors:

� The amount of resources required by each application
� The fallover behavior of the cluster

Note: The number of nodes in a cluster can range from 2 to 32.

3.4.1 Overview of cluster hardware

A primary consideration when choosing nodes is that in a fallover situation, the surviving node or nodes must be capable of running the failing node’s applications. That is, if you have a two-node cluster and one node fails, the surviving node must have all the resources required to run the failing node’s applications (in addition to its own applications). If this is not possible, you might consider implementing an additional node as a standby node, or consider using the dynamic LPAR (DLPAR) feature. As you might notice, PowerHA allows for a wide range of cluster configurations depending on your requirements.

PowerHA supports virtually any AIX supported node, from desktop systems to high-end servers. When choosing a type of node, we recommend that you:

� Ensure that there are sufficient CPU and memory resources available on all nodes to allow the system to behave as desired in a fallover situation. The CPU and memory resources must be capable of sustaining the selected applications during fallover; otherwise, clients might experience performance problems. If you are using LPARs, you might want to make use of the DLPAR capabilities to increase resources during fallover. If you are using standalone servers, you do not have this option and so you might have to look at using a standby server.

� Make use of highly available hardware and redundant components where possible in each server. For example, use redundant power supplies and connect them to separate power sources.

� Protect each node’s rootvg (local operating system copy) through the use of mirroring or RAID.

� Allocate at least two Ethernet adapters per node and connect them to separate switches to protect from a single adapter or switch failure. Commonly this is done using a single or dual Virtual I/O Server.

� Allocate two SAN adapters per node to protect from a single SAN adapter failure. Commonly this is done using a single or dual Virtual I/O Server.

Although not mandatory, we suggest using cluster nodes with similar hardware configurations in order to make it easier to distribute the resources and perform administrative operations. That is, do not try to fall over from a high-end server to a desktop model and expect everything to work properly; be thoughtful in your choice of nodes.

3.4.2 Completing the cluster hardware planning worksheet

The worksheet in Table 3-3 contains the hardware specifications for our example. Where possible, we made use of redundant hardware and additional Ethernet and SAN switches, and we ensured that we had enough resources to sustain both applications simultaneously on any node.

Table 3-3   Cluster hardware

PowerHA CLUSTER WORKSHEET - PART 2 of 11: CLUSTER HARDWARE (DATE: March 2009)

� Power Systems: POWER6 technology-based 550s, 2 CPUs and 8 GB of memory each. Quantity 2, latest firmware.
� Ethernet adapters: 10/100/1000 Ethernet. Virtual Ethernet.
� Network switches: All ports are configured in the same VLAN. Switches support gratuitous ARP, and Spanning Tree Protocol is disabled. Switch port speed is set to auto, as appropriate for Gigabit adapters.
� SAN adapters: FC 5759 PCI-X 4 Gb and FC 5774 PCIe 4 Gb. Virtual SCSI.
� SAN switches: Two IBM 2005-B16, zoned for Virtual I/O Server HBAs and storage.
� SAN storage: DS4800, switch attached (not shown in the diagram).
� COMMENTS: All hardware compatibility verified.

3.5 Planning cluster software

Review all software components to be used in the cluster to ensure compatibility. Items to consider are AIX, RSCT, PowerHA, Virtual I/O Server, application, and storage software. This section discusses the various software levels and compatibilities.

3.5.1 AIX and RSCT levels

PowerHA 5.5 is supported on AIX Versions 5.3 and 6.1. Specific combinations of AIX and RSCT levels are required for installing PowerHA 5.5, as listed in Table 3-4.

Table 3-4   AIX and RSCT levels

� AIX 6.1 TL2 SP1 with RSCT 2.5.2. Minimum RSCT filesets: rsct.compat.basic.hacmp 2.5.2, rsct.compat.clients.hacmp 2.5.2, rsct.core.sec 2.5.2, rsct.core.rmc 2.5.2.
� AIX 5.3 TL9 with RSCT 2.4.10. Minimum RSCT filesets: rsct.compat.basic.hacmp 2.4.10, rsct.compat.clients.hacmp 2.4.10, rsct.core.sec 2.4.10, rsct.core.rmc 2.4.10.

3.5.2 Virtual LAN and SCSI support

The following software levels are required to support the Virtual LAN and Virtual SCSI features found in the IBM Virtual I/O Server:

� AIX 5.3 ML02 (5300-02) with APAR IY70082 and iFIX IY72974
� Virtual I/O Server V1.1 with Fixpack 6.2 and iFIX IY71303.062905.epkg.Z
� Minimum RSCT levels:
– rsct.basic.hacmp 2.4.2.1
– rsct.basic.rte 2.4.2.2
– rsct.compat.basic.hacmp 2.4.2.0

For more information about Virtual I/O Server support, refer to Chapter 9, “Virtualization and PowerHA” on page 487.

3.5.3 Required AIX filesets

The following filesets are required for PowerHA. They should be installed with the latest version of fixes for the appropriate AIX level before PowerHA is installed:

� bos.adt.lib
� bos.adt.libm
� bos.adt.syscalls
� bos.net.tcp.client
� bos.net.tcp.server
� bos.rte.SRC
� bos.rte.libc
� bos.rte.libcfg
� bos.rte.libcur
� bos.rte.libpthreads
� bos.rte.odm
� bos.rte.lvm
� bos.clvm.enh (required for Enhanced Concurrent Volume Groups)
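One way to confirm that the prerequisites are in place on every node is to query the installed software with standard AIX commands. The following is only a sketch; adjust the fileset list to match your environment:

oslevel -s                       # report the AIX level, technology level, and service pack
lslpp -L "rsct.*"                # report the installed RSCT filesets and levels
for fs in bos.adt.lib bos.adt.libm bos.adt.syscalls bos.clvm.enh
do
   lslpp -L $fs >/dev/null 2>&1 || echo "$fs is missing"
done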

Requirements for NFSv4

The cluster.es.nfs fileset that comes with the PowerHA installation medium installs the NFSv4 support for PowerHA, along with an NFS Configuration Assistant. To install this fileset, the following BOS NFS components must also be installed on the system.

For AIX Version 5.3:

� bos.net.nfs.server 5.3.7.0
� bos.net.nfs.client 5.3.7.0

For AIX Version 6.1:

� bos.net.nfs.server 6.1.2.0
� bos.net.nfs.client 6.1.2.0

3.5.4 AIX security filesets

The following filesets are required if you plan to use message authentication or encryption for PowerHA communication between cluster nodes. They can be installed from the AIX Expansion Pack CD-ROM:

rsct.crypt.des For data encryption with standard DES message authentication.

rsct.crypt.3des For data encryption with Triple DES message authentication.

rsct.crypt.aes256 For data encryption with Advanced Encryption Standard (AES) message authentication.

3.5.5 PowerHA filesets

The following PowerHA filesets can be installed from the install media (excluding additional language filesets):

� cluster.adt.es
– cluster.adt.es.client.include
– cluster.adt.es.client.samples.clinfo
– cluster.adt.es.client.samples.clstat
– cluster.adt.es.client.samples.libcl
– cluster.adt.es.java.demo.monitor
� cluster.assist.license (by separate Smart Assist LPP)
� cluster.doc.en_US.assist
– cluster.doc.en_US.assist.db2.pdf
– cluster.doc.en_US.assist.db2.html
– cluster.doc.en_US.assist.oracle.html
– cluster.doc.en_US.assist.oracle.pdf
– cluster.doc.en_US.assist.websphere.pdf
– cluster.doc.en_US.assist.websphere.html
� cluster.doc.en_US.es
– cluster.doc.en_US.es.html
– cluster.doc.en_US.es.pdf
� cluster.doc.en_US.glvm
– cluster.doc.en_US.glvm.html
– cluster.doc.en_US.glvm.pdf
� cluster.doc.en_US.pprc
– cluster.doc.en_US.pprc.html
– cluster.doc.en_US.pprc.pdf
� cluster.es.assist (by separate Smart Assist LPP)
– cluster.es.assist.common
– cluster.es.assist.db2
– cluster.es.assist.oracle
– cluster.es.assist.websphere

� cluster.es.cfs
– cluster.es.cfs.rte
– cluster.es.cgpprc (by separate PowerHA/XD LPP)
– cluster.msg.En_US.cgpprc
� cluster.es.client
– cluster.es.client.clcomd
– cluster.es.client.lib
– cluster.es.client.rte
– cluster.es.client.utils
– cluster.es.client.wsm
� cluster.es.cspoc
– cluster.es.cspoc.cmds
– cluster.es.cspoc.dsh
– cluster.es.cspoc.rte
� cluster.es.nfs
– cluster.es.nfs.rte (NFSv4 support)
� cluster.es.plugins
– cluster.es.plugins.dhcp
– cluster.es.plugins.dns
– cluster.es.plugins.printserver
� cluster.es.pprc (by separate PowerHA/XD LPP)
– cluster.es.pprc.cmds
– cluster.msg.en_US.pprc
– cluster.msg.En_US.pprc
– cluster.msg.Ja_JP.pprc
– cluster.es.pprc.rte
� cluster.es.spprc (by separate PowerHA/XD LPP)
– cluster.es.spprc.cmds
– cluster.es.spprc.rte
� cluster.es.server
– cluster.es.server.cfgast
– cluster.es.server.diag
– cluster.es.server.events
– cluster.es.server.rte
– cluster.es.server.simulator
– cluster.es.server.testtool
– cluster.es.server.utils
� cluster.es.svcpprc (by separate PowerHA/XD LPP)
– cluster.es.svcpprc.cmds
– cluster.msg.En_US.svcpprc
– cluster.msg.Ja_JP.svcpprc
– cluster.es.svcpprc.rte
� cluster.es.worksheets
� cluster.license

� cluster.man.en_US.es
– cluster.man.en_US.es.data
– cluster.man.en_US.assist.data
� cluster.msg.en_US.cspoc
– cluster.msg.En_US.cspoc
– cluster.msg.En_US.hativoli
– cluster.msg.En_US.assist (by separate Smart Assist LPP)
– cluster.msg.En_US.es.server
– cluster.msg.En_US.es.client
– cluster.msg.En_US.glvm (via separate PowerHA/XD LPP)
– ssaa.msg.En_US
– cluster.msg.Ja_JP.cspoc
– cluster.msg.Ja_JP.hativoli
– cluster.msg.Ja_JP.es.client
– cluster.msg.Ja_JP.es.server
– cluster.msg.Ja_JP.assist (via separate Smart Assist LPP)
– cluster.msg.Ja_JP.glvm (via separate PowerHA/XD LPP)
– ssaa.msg.Ja_JP
– cluster.msg.en_US.cspoc
– cluster.msg.en_US.es.server
– cluster.msg.en_US.es.client
– cluster.msg.en_US.hativoli
– cluster.msg.en_US.assist
– cluster.msg.en_US.glvm (via separate PowerHA/XD LPP)
– ssaa.msg.en_US
� cluster.xd.glvm (by separate PowerHA/XD LPP)
� cluster.xd.license (by separate PowerHA/XD LPP)
� glvm.rpv (by separate PowerHA/XD LPP)
– glvm.rpv.client
– glvm.rpv.msg.en_US
– glvm.rpv.msg.En_US
– glvm.rpv.msg.Ja_JP
– glvm.rpv.server
– glvm.rpv.util
� glvm.rpv.man.en_US (by separate PowerHA/XD LPP)
� glvm.rpv.msg.en_US (by separate PowerHA/XD LPP)
– glvm.rpv.msg.En_US
– glvm.rpv.msg.Ja_JP
– glvm.rpv.msg.en_US

3.5.6 AIX files altered by PowerHA

Be aware that the following system files can be altered by PowerHA during the installation of the cluster packages, and during the cluster verification and synchronization process.

/etc/hosts

The cluster event scripts use the /etc/hosts file for name resolution. The IP interfaces of all cluster nodes must be added to this file on each node. PowerHA can modify this file to ensure that all nodes have the necessary information in their /etc/hosts file for proper PowerHA operations.

If you delete service IP labels from the cluster configuration using SMIT, we recommend that you also remove them from /etc/hosts.

/etc/inittab

The /etc/inittab file is modified in each of the following cases:

� PowerHA is installed:

– The following line is added when you initially install PowerHA. It will start the clcomdES and clstrmgrES subsystems if they are not already running:

hacmp:2:once:/usr/es/sbin/cluster/etc/rc.init >/dev/console 2>&1

� PowerHA is configured for IP Address Takeover:

– harc:2:wait:/usr/es/sbin/cluster/etc/harc.net # HACMP network startup

� The Start at System Restart option is chosen on the SMIT System Management (C-SPOC) -> Manage HACMP Services -> Start Cluster Services panel:

– hacmp6000:2:wait:/usr/es/sbin/cluster/etc/rc.cluster -boot -i # Bring up Cluster

– When the system boots, the /etc/inittab file calls the /usr/es/sbin/cluster/etc/rc.cluster script to start HACMP.

Important: This PowerHA entry is used to start the following daemons using the startsrc command if they are not already running.

� startsrc -s syslogd
� startsrc -s snmpd
� startsrc -s clcomdES
� startsrc -s clstrmgrES

Note: Although it is possible to start cluster services from the inittab, we strongly recommend that you do not use this option. It is best to manually control the starting of PowerHA. For example, in the case of a node failure, it is best to investigate the cause of the failure before restarting PowerHA on the node.

/etc/rc.net

The /etc/rc.net file is called by cfgmgr (the AIX utility that configures devices and optionally installs device software into the system) to configure and start TCP/IP during the boot process. It sets the hostname, the default gateway, and static routes.

/etc/services

PowerHA makes use of the following network ports for communication between cluster nodes. These are all listed in the /etc/services file:

� clinfo_deadman 6176/tcp
� clm_lkm 6150/tcp
� clm_smux 6175/tcp
� godm 6177/tcp
� topsvcs 6178/udp
� grpsvcs 6179/udp
� emsvcs 6180/udp
� clcomd 6191/tcp
� clinfo_client 6174/tcp

In addition to PowerHA, RMC uses the following ports:

� #rmc 657/tcp
� #rmc 657/udp

WebSmit typically uses the following port. The WebSmit port is configurable:

� #http 42267 (WebSmit port)
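After installation, you can quickly confirm that the entries are present and that the cluster communication daemon is listening. This is only an illustrative check; output varies by system:

grep -E "clcomd|clinfo|clm_|godm|topsvcs|grpsvcs|emsvcs" /etc/services
lssrc -s clcomdES                # subsystem status
netstat -an | grep 6191          # clcomd listener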

/etc/snmpd.conf

The version of snmpd.conf depends on whether you are using AIX 5L V5.1 or later. The default version of the file for versions of AIX later than V5.1 is snmpdv3.conf.

Note: ha_star is also found as an entry in the inittab. It is delivered with the bos.rte.control fileset, not with PowerHA.

Note: If you install PowerHA/XD for GLVM, the following entry for the port number and connection protocol is automatically added to the /etc/services file on each node on the local and remote sites on which you installed the software:

� rpv 6192/tcp

The SNMP daemon reads the /etc/snmpd.conf configuration file when it starts up and when a refresh or kill -1 signal is issued. This file specifies the community names and associated access privileges and views, hosts for trap notification, logging attributes, snmpd-specific parameter configurations, and SMUX configurations for the snmpd. The PowerHA installation process adds a clsmuxpd password to this file.

The following entry is added to the end of the file, to include the PowerHA MIB supervised by the Cluster Manager:

smux 1.3.6.1.4.1.2.3.1.2.1.5 "clsmuxpd_password" # HACMP/ES for AIX clsmuxpd

/etc/snmpd.peers

The /etc/snmpd.peers file configures snmpd SMUX peers. During installation, PowerHA adds the following entry to include the clsmuxpd password to this file:

clsmuxpd 1.3.6.1.4.1.2.3.1.2.1.5 "clsmuxpd_password" # HACMP/ES for AIX clsmuxpd

/etc/syslog.conf

The /etc/syslog.conf configuration file is used to control output of the syslogd daemon, which logs system messages. During the installation process, PowerHA adds entries to this file that direct the output from PowerHA-related problems to certain files.

For example:

# HACMP Critical Messages from HACMP
local0.crit /dev/console
# HACMP Informational Messages from HACMP
local0.info /usr/es/adm/cluster.log
# HACMP Messages from Cluster Scripts
user.notice /usr/es/adm/cluster.log
# HACMP/ES for AIX Messages from Cluster Daemons
daemon.notice /usr/es/adm/cluster.log

The /etc/syslog.conf file should be identical on all cluster nodes.

/etc/trcfmt

The /etc/trcfmt file is the template file for the system trace logging and report utility, trcrpt. The installation process adds PowerHA tracing to the trace format file. PowerHA tracing is performed for the clstrmgrES and clinfo daemons.

/var/spool/cron/crontab/root

The PowerHA installation process adds PowerHA logfile rotation to the /var/spool/cron/crontab/root file:

0 0 * * * /usr/es/sbin/cluster/utilities/clcycle 1>/dev/null 2>/dev/null # HACMP for AIX Logfile rotation

3.5.7 Application software

Typically, applications are not dependent on PowerHA versions because they are not aware of the underlying PowerHA functionality. That is, PowerHA simply starts and stops them (PowerHA can also monitor applications, but generally using an application-dependent method).

Check with the application vendor to ensure there are no issues (such as licensing) with the use of PowerHA 5.5.

3.5.8 Licensing

You have to pay attention to two aspects of licensing: PowerHA (features) licensing and application licensing.

PowerHA

PowerHA licensing is based on the number of processors, where the number of processors is the sum of the number of processors on which PowerHA will be installed or run. A PowerHA license is required for each AIX instance. PowerHA/XD and Smart Assist licensing are handled in the same way as PowerHA.

Therefore, this means that:

� If you have a pSeries server with 4 CPUs running in full system partition mode, you require a license for 4 CPUs.

� If you have a pSeries server with 4 CPUs running logical partitions and you only run PowerHA in a 2 CPU partition, you require a license for 2 CPUs.

� Of course, you require a license for each server you plan to run PowerHA on.

Note: Micro-partition licensing for PowerHA is not available. You must license by full processors.

Applications

Some applications have specific licensing requirements such as a unique license for each processor that runs an application, which means that you must license-protect the application by incorporating processor-specific information into the application when it is installed. As a result, even though the PowerHA software processes a node failure correctly, it might be unable to restart the application on the fallover node because of a restriction on the number of licenses for that application available within the cluster.

To avoid this problem, be sure that you have a license for each system unit in the cluster that might potentially run an application.

Important: Check with your application vendor for any license issues when using PowerHA.

3.5.9 Completing the software planning worksheet

The worksheet in Table 3-5 lists all of the software installed in our example.

Table 3-5   Cluster software

PowerHA CLUSTER WORKSHEET - PART 3 of 11: CLUSTER SOFTWARE (DATE: March 2009)

� AIX: 6.1 TL2 SP3 (latest AIX version)
� RSCT: 2.5.2.0 (latest RSCT version)
� PowerHA: 5.5 SP1 (Service Pack 1)
� Application: Test Application Version 1 (add your application versions)
� COMMENTS: All software compatibility verified. No issues running applications with PowerHA. PowerHA licensing for 4 CPUs on each node. Application licensing verified and licenses purchased for both servers.

3.6 Operating system considerations

In addition to the AIX operating system levels and filesets, there are a few other operating system aspects to consider during the planning stage.

Disk space requirements

PowerHA requires the following available space in the rootvg volume group for installation:

� /usr requires 82 MB of free space for a full installation of PowerHA.
� / (root) requires 710 KB of free space.

It is also good practice to allow approximately 100 MB free space in /var and /tmp for PowerHA logs (the space required depends on the number of nodes in the cluster, which dictates the size of the messages stored in the various PowerHA logs).
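A quick way to confirm the available space before installing is to check the relevant file systems, for example:

df -m / /usr /var /tmp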

Time synchronization

Time synchronization between cluster nodes is important for both application and PowerHA log issues. This is standard system administration practice, and we recommend that you make use of an NTP server or another procedure to keep the cluster nodes’ clocks in sync.
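A minimal NTP setup on AIX, assuming a reachable time server (the server name below is a placeholder), can be as simple as the following. To start the daemon automatically at boot, uncomment the xntpd line in /etc/rc.tcpip.

echo "server ntp.example.com" >> /etc/ntp.conf
startsrc -s xntpd                # start the NTP daemon
lssrc -ls xntpd                  # verify that it is active and has a sys peer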

Operating system settings

There are no additional operating system settings required for PowerHA. Follow normal AIX tuning as required by application workload.

3.7 Planning security

Protecting your cluster nodes (and application) from unauthorized access is an important factor in the overall system availability. There are certain general security considerations, as well as PowerHA-related aspects, which we emphasize in this section.

3.7.1 Cluster security

PowerHA needs a way to authenticate to all nodes in the cluster for running remote commands related to cluster verification, synchronization, and certain administrative operations (C-SPOC).

Cluster security is required to prevent unauthorized access to cluster nodes. Starting in HACMP 5.1, a new security mechanism, facilitated by the cluster communication daemon (clcomdES), provides additional cluster security.

Note: Maintaining time synchronization between the nodes is especially useful for auditing and debugging cluster problems.

PowerHA inter-node communication relies on a cluster daemon (clcomdES), which eliminates the need for the AIX “classic” remote commands. See the explanations later in this chapter for detailed information about the clcomdES mechanism.

PowerHA modes for connection authentication

� Standard security mode:

– Standard security is the default security mode.
– It is implemented directly by the cluster communication daemon (clcomdES).
– It uses node and adapter information stored in the HACMP ODM classes and the /usr/es/sbin/cluster/etc/rhosts file to determine legitimate partners.

� Enhanced security mode (Kerberos):

– Kerberos security is available only for HACMP clusters implemented in an SP cluster.

– It takes advantage of the Kerberos authentication method.

In standard security mode, remote execution of PowerHA commands uses the principle of least privilege. This ensures that no command can run on a remote node with root privilege, except for the ones in /usr/es/sbin/cluster. A select set of PowerHA commands is considered trusted and allowed to run as root; all other commands run as the user nobody.

To manage inter-node communication, the cluster communication daemon requires a list of valid cluster IP labels or addresses to use. There are two ways to provide this information:

� Automatic node configuration (default method)
� Individual node configuration (manual method)

Automatic node configuration

If you are configuring PowerHA for the first time, the file /usr/es/sbin/cluster/etc/rhosts on a node is empty. Because clcomdES has to authenticate the IP address of the incoming connection to be sure that it is from a node in the cluster, the rules for validating the addresses are based on the following process:

� If the /usr/es/sbin/cluster/etc/rhosts file is empty and there is no PowerHA cluster defined on that node, then the first connection from another node will be authenticated and accepted. The content of the file /usr/es/sbin/cluster/etc/rhosts will be changed to include all “ping-able” base addresses of the network adapters from the communicating node.

Important: The Kerberos option is no longer valid starting with PowerHA 5.5.

� If a cluster is already defined on the node (the HACMPcluster ODM class is not empty), then clcomdES looks for a communication path (IP address) in the HACMPnode and subsequently in the HACMPadapter ODM class. If it finds a valid communication path, it takes the first occurrence (HACMPnode, then HACMPadapter); otherwise, the /usr/es/sbin/cluster/etc/rhosts file will be checked for a valid IP address.

� If clcomdES cannot authenticate an incoming connection, the connection fails, and you have to manually update the /usr/es/sbin/cluster/etc/rhosts file and then recycle the clcomdES daemon (stopsrc -s clcomdES, then startsrc -s clcomdES).

Typically, the user should not have to manually populate the rhosts file, but rather let clcomdES do it. Because this file is empty upon installation, the first connection from another node will populate it. The first connection is usually verification and synchronization, and afterwards the HACMPnode and HACMPadapter ODMs are complete. After the cluster is synchronized, the rhosts file can be emptied but not removed. The information in HACMPnode and HACMPadapter is then used for clcomdES authentication.

Individual node configuration

As an alternate solution, if you are especially concerned about network security (building the cluster on an unsecured network), you might want to put all the IP addresses/labels in the /usr/es/sbin/cluster/etc/rhosts file prior to configuring the cluster.

The PowerHA installation creates this empty file with read-write permissions for root only.

Notes:

� To ensure that an unauthorized host does not connect to a node between the time when you install PowerHA software and the time when you initiate a connection from one cluster node to another, you can manually populate (edit) the /usr/es/sbin/cluster/etc/rhosts file to add one or more IP labels/addresses (that will be part of your cluster).

� If at a later time you decide to redo the cluster configuration (start from scratch, or change the base IP addresses of the nodes), it is good practice to also empty the contents of the rhosts file (do NOT delete the file) on ALL the nodes that you plan to reuse for your cluster.

Note: Ensure that each IP address/label is valid for the cluster; otherwise, an error is logged in /var/hacmp/clcomd/clcomd.log.

To set up the /usr/es/sbin/cluster/etc/rhosts file:

1. As root, open the file /usr/es/sbin/cluster/etc/rhosts on a node.
2. Edit the file to add all possible network interface IP addresses for each node.
3. Put only one IP label or address on each line.
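As an illustration only (the addresses are placeholders), a manually populated /usr/es/sbin/cluster/etc/rhosts file simply lists one cluster IP label or address per line, for example:

192.168.100.31
192.168.100.32
192.168.200.31
192.168.200.32

Then recycle clcomdES so that it rereads the file:

stopsrc -s clcomdES
startsrc -s clcomdES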

For more information, refer to Chapter 8, “Cluster security” on page 457.

3.7.2 User administration

Most applications require user information to be consistent across the cluster nodes (user ID, group membership, group ID) so that users can log in to surviving nodes without experiencing problems.

This is particularly important in a fallover (takeover) situation. It is imperative that application users be able to access the shared files from any required node in the cluster. This usually means that the application related UID and GID must be the same on all nodes.

In preparation for a cluster configuration, it is important that this be considered and corrected; otherwise, you might experience service problems during a fallover.
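A simple way to spot mismatches before configuring the cluster is to compare the application user and group IDs on each node; the user and group names here are examples only:

lsuser -a id pgrp home app1user      # run on every node and compare the output
lsgroup -a id app1group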

After PowerHA is installed, you can use its facilities to manage AIX user and group accounts across the cluster. PowerHA also provides a utility to authorize specified users to change their own password across nodes in the cluster.

For more information about user administration refer to 7.3.1, “C-SPOC user and group administration” on page 338.

Note: If you disable the cluster communications daemon or completely REMOVE the /usr/es/sbin/cluster/etc/rhosts file, programs that require inter-node communication, such as C-SPOC, cluster verification and synchronization, file collections, and message authentication and encryption will no longer function.

For this reason, it is mandatory to keep clcomdES running at all times.

Note: If you manage user accounts with a utility such as Network Information Service (NIS), PSSP user management, or Distributed Computing Environment (DCE) Manager, do NOT use PowerHA user management. Using PowerHA user management in this environment might cause serious system inconsistencies in the user authentication databases.

3.7.3 HACMP group

During the installation of PowerHA, the hacmp group will be created if it does not already exist. During creation, PowerHA will simply pick the next available GID for the hacmp group.

For more information about user administration, refer to 7.3.1, “C-SPOC user and group administration” on page 338.

3.7.4 PowerHA ports

In addition to the ports identified in the /etc/services file, the following services also require ports. However, these ports are selected randomly when the processes start. At present there is no way to specify particular ports; just be aware of their presence. Typical ports are shown for illustration, but these ports can be altered if you need to do so:

� #clstrmgr 870/udp
� #clstrmgr 871/udp
� #hatsd 32789/udp
� #clinfo 32790/udp

3.7.5 Planning for PowerHA file collections

PowerHA requires that certain files be identical on all cluster nodes. These files include event scripts, application scripts, certain AIX configuration files, and PowerHA configuration files. The PowerHA File Collections facility allows you to automatically synchronize these files among cluster nodes and warns you if there are any unexpected results (for example, if a file in a collection has been deleted or has a length of zero on one or more cluster nodes).

These file collections can be managed through SMIT menus. Through SMIT, you can add, delete, and modify file collections to meet your needs.

Default PowerHA file collections

When you install PowerHA, it sets up the following default file collections:

� Configuration_Files
� HACMP_Files

Note: If you prefer to control the GID of the hacmp group, we suggest that you create the hacmp group before installing the PowerHA filesets.
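For example, to create the group yourself with a GID of your choosing (the value 215 is only an example), run the following on every node before installing the PowerHA filesets:

mkgroup id=215 hacmp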

Configuration_Files

Configuration_Files is a container for the following essential system files:

� /etc/hosts
� /etc/services
� /etc/snmpd.conf
� /etc/snmpdv3.conf
� /etc/rc.net
� /etc/inetd.conf
� /usr/es/sbin/cluster/netmon.cf
� /usr/es/sbin/cluster/etc/clhosts
� /usr/es/sbin/cluster/etc/rhosts
� /usr/es/sbin/cluster/etc/clinfo.rc

You can alter the propagation options for this file collection, and you can also add files to this file collection and delete files from it.

HACMP_Files

HACMP_Files is a container in which you typically find user-configurable files of the PowerHA configuration such as application start/stop scripts, customized events, and so on. This File Collection cannot be removed or modified, and the files in this File Collection cannot be removed, modified, or added.

For more information, refer to 7.2, “File collections” on page 329.

3.8 Planning cluster networks

Network configuration is a key component in the cluster design.

In a typical clustering environment, clients access the applications via a TCP/IP network (usually Ethernet) using a service address. This service address will be made highly available by PowerHA and will move between communication interfaces on the same network as required. PowerHA sends heartbeat packets between all communication interfaces (adapters) on the network to determine the status of the adapter(s) and node(s) and take remedial action(s) as required.

In order to eliminate the TCP/IP protocol as a single point of failure and prevent cluster partitioning, PowerHA also utilizes non-IP point-to-point networks for heartbeating. This assists PowerHA with identifying the failure boundary, such as a TCP/IP failure or a node failure.

Note: For example, when you define an application server to PowerHA (start, stop and optionally monitoring scripts), PowerHA will automatically include these files into the HACMP_Files collection.

In this section we look at each network type and decide on the appropriate network connections and addresses.

Figure 3-4 provides an overview of the networks used in a cluster.

Figure 3-4 HACMP Cluster Networks

An Ethernet network is used for public access and has multiple adapters connected from each node. This network will hold the base IP addresses, the persistent IP addresses, and the service IP addresses. You can have more than one network; however, for simplicity, we are only going to use one.

A serial RS232 network is shown. This is a point-to-point network with a direct connection using a cable between serial ports on each node.

A disk heartbeat network (also point-to-point) is shown as well. If you are using SAN disks, disk heartbeat is easy to implement because no additional hardware is required. Also, in a multi-path device configuration, using the vpath device allows us to take advantage of the capabilities of SDD software, as opposed to using a simple hdisk. That is, an hdisk has only one path to the device while a vpath typically has many paths. Multi-path devices can be configured whenever there are multiple disk adapters in a node, multiple storage adapters, or both.

All network connections are used by PowerHA to monitor the status of the network, adapters, and nodes in the cluster.

In our example, we plan for an Ethernet and disk heartbeat network, but not an RS232 network.

3.8.1 Terminology

This section presents a quick recap of the various terminology used in discussions regarding PowerHA networking:

IP labels: Names associated with IP addresses that are resolvable by the system (/etc/hosts, BIND, and so on).

Service IP label / address: An IP address or label over which a service is provided. Typically this is the address used by clients to access an application. It can be bound to a node or shared by nodes and is kept highly available by PowerHA.

Persistent IP label / address: A node-bound IP alias that is managed by PowerHA. That is, the persistent alias never moves to another node.

Communication interface: A physical interface that supports the TCP/IP protocol, for example, an Ethernet adapter. It is represented by its boot-time or base IP label.

Communication device: A physical device representing an end of a point-to-point non-IP network, for example, /dev/tty1 or /dev/vpath0.

Communication adapter: A physical X.25 adapter kept highly available by PowerHA.

Network interface card (NIC): A physical adapter used to provide access to a network; for example, an Ethernet adapter is referred to as a NIC.

3.8.2 General network considerations

In this section we offer a number of considerations to keep in mind when designing your network configuration.

Supported network types

PowerHA allows internode communication with the following TCP/IP-based networks (note that Ethernet is the most common network in use):

� Ethernet
� Token-Ring
� IBM HPS (High Performance Switch, no longer valid with PowerHA 5.5)
� InfiniBand
� Fiber Distributed Data Interface (FDDI)

� ATM and ATM LAN Emulation
� EtherChannel (or 802.3ad Link Aggregation)
� IPv6

The following TCP/IP-based networks are NOT supported:

� Virtual IP Address (VIPA) facility of AIX 5L
� Serial Optical Channel Converter (SOCC)
� SLIP
� FC Switch (FCS)
� IBM HPS (High Performance Switch)
� 802_ether

You can configure heartbeat over the following types of point-to-point networks:

� Serial RS232
� Disk heartbeat (over an enhanced concurrent mode disk)
� Multi-node disk heartbeat
� Target Mode SSA (no longer valid with PowerHA 5.5)
� Target Mode SCSI (legacy)

Network connections

PowerHA requires that each node in the cluster have at least one direct, non-routed network connection with every other node. These network connections pass heartbeat messages among the cluster nodes to determine the state of all cluster nodes, networks, and network interfaces.

PowerHA also requires that all of the communication interfaces for a given cluster network be defined on the same physical network, be able to route packets to one another, and receive responses from each other without interference by any network equipment.

Do not place between cluster nodes any intelligent switches, routers, or other network equipment that does not transparently pass UDP broadcasts and other packets.

Bridges, hubs, and other passive devices that do not modify the packet flow can be safely placed between cluster nodes, and between nodes and clients.

Figure 3-5 illustrates a physical Ethernet configuration, showing dual Ethernet adapters on each node connected across two switches but all configured in the same physical network (VLAN). This is sometimes referred to as being in the same MAC “collision domain”.

Figure 3-5 Ethernet Switch Connections

EtherChannel

PowerHA supports the use of EtherChannel (or Link Aggregation) for connection to an Ethernet network. EtherChannel can be useful if you find that you want to use a number of Ethernet adapters for both additional network bandwidth and fallover, but also want to keep the PowerHA configuration simple. With EtherChannel, you can simply specify the EtherChannel interface as the communication interface. Any Ethernet failures, with the exception of the Ethernet network itself, can be handled without PowerHA being aware or involved.
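EtherChannel is configured in AIX (typically through the smitty etherchannel panels) before PowerHA is configured on top of it. The following command-line sketch uses example adapter names only:

mkdev -c adapter -s pseudo -t ibm_ech -a adapter_names=ent0,ent1 -a backup_adapter=ent2 -a mode=standard

The resulting entX pseudo-device is then given an IP address and used as the PowerHA communication interface.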

Figure 3-6 illustrates a slightly more complex configuration showing each node with two EtherChannels, one to each switch. In this configuration, PowerHA uses the EtherChannel adapters as the base adapters.

Figure 3-6 Multiple EtherChannel configuration

Note: An EtherChannel can consist of one to eight primary adapters with only one backup adapter per EtherChannel. Given this, it is quite possible to have only one adapter as the primary and one adapter as the backup if you want to handle adapter or switch failures.

For more information about configuring PowerHA with EtherChannel, refer to 13.1, “EtherChannel” on page 600.

Hostnames and node names
Typically, the hostname is the same as the PowerHA node name. If you use the Standard configuration path, PowerHA retrieves the hostname from a node and uses it as the node name. In the Extended configuration path, you can specify the node name.


When an application requires that the AIX TCP/IP hostname attribute moves with an application to another node at fallover, you can use pre-event and post-event scripts to change the hostname to correspond to the service IP label when the resource group that contains this application moves over to another node.

We recommend that you try to avoid this situation, as it will limit your fallover options. For example, you cannot change the hostname for a fallover node if there is an application already running on it. In this case, you would need to configure a standby node.

/etc/hosts
An IP address and its associated label (name) must be present in the /etc/hosts file. We recommend that you choose one of the cluster nodes to perform all changes to this file and then use ftp or the HACMP file collections to propagate the /etc/hosts file to the other nodes.
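For example, using the labels and addresses from the planning worksheets later in this chapter (the node01b and node02b labels for the second base adapters are assumptions in this sketch, taken from the cluster figures in this section), the /etc/hosts file on every node could contain entries similar to these:

10.10.31.31      node01a      # base (non-service) address, node01, en0
10.10.32.31      node01b      # base (non-service) address, node01, en1
192.168.100.31   ha55node1    # persistent alias, node01
192.168.100.131  app1svc      # service IP label for the first application
10.10.31.32      node02a      # base (non-service) address, node02, en0
10.10.32.32      node02b      # base (non-service) address, node02, en1
192.168.100.32   ha55node2    # persistent alias, node02
192.168.100.132  app2svc      # service IP label for the second application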

IP aliases
An IP alias is an IP address configured onto a NIC in addition to the base IP address of the NIC. The use of IP aliases is an AIX function that is supported by PowerHA. AIX supports multiple IP aliases on a NIC, each on the same or different subnets.

Persistent IP addresses (aliases)
A primary reason for using a persistent alias is to provide access to the node while PowerHA services are down. This is a routable address and is available as long as the node is up. You configure this alias through PowerHA. When PowerHA starts, it checks whether the alias is available. If it is not, PowerHA configures it on an available adapter on the designated network. If the alias is already available, PowerHA leaves it alone.

Note: If you plan to use the DLPAR functionality provided by PowerHA, the AIX hostname and the HMC LPAR name must match.

Note: We strongly recommend that you test the direct and reverse name resolution on all nodes in the cluster and the associated Hardware Management Consoles (HMCs). All these must resolve names identically, otherwise you might run into security issues and other name resolution related problems.

Note: AIX allows IP aliases with different subnet masks to be configured for an interface. However, PowerHA will use the subnet mask of the base IP address for all IP aliases configured on this network interface.


A persistent alias:

- Always stays on the same node (is node-bound).
- Co-exists with other IP labels present on an interface.
- Does not require installing an additional physical interface on that node.
- Is not part of any resource group.

We recommend that you configure the persistent alias through AIX (smitty inetalias) before configuring PowerHA. Then use the persistent alias as the communication path to the node, not the base adapters. This allows you the freedom to further change the base IPs, if required (be sure you check /usr/es/sbin/cluster/etc/rhosts file if you change the base adapter address). After PowerHA is configured, add the persistent alias to the PowerHA configuration.
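As a quick, hedged illustration using the addresses from our example (an alias added with ifconfig is not persistent across a reboot, so use the SMIT path mentioned above for the permanent definition):

# ifconfig en0 alias 192.168.100.31 netmask 255.255.255.0   (temporarily add the alias for testing)
# netstat -in | grep en0                                    (the alias appears as an additional address on en0)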

Figure 3-7 illustrates the concept of the persistent address. Note that this is simply another IP address configured on one of the base interfaces. The netstat command will show it as an additional IP address on an adapter.

Figure 3-7 Persistent Aliases

Important: If the persistent IP address exists on the node, it MUST be an alias, NOT the base address of an adapter.

Note: The persistent IP address will be assigned by PowerHA to one communication interface which is part of a PowerHA defined network.


Subnetting
Subnet requirements vary depending upon the configuration chosen (IP Address Takeover, IPAT, via replacement or via aliases). However, all the communication interfaces configured in the same PowerHA Network Name must have the same subnet mask, while interfaces belonging to different PowerHA Network Names can have either the same or a different network mask.

Fundamentally, for IPAT via replacement:

- Base and service IP addresses on the primary adapter must be on the same subnet.
- All base IP addresses on the secondary adapters must be on separate subnets (different from each other and from the primary adapter).

For IPAT via aliases:

- All base IP addresses on a node must be on separate subnets (if heartbeat monitoring over IP aliases is not used).
- All service IP addresses must be on a separate subnet from any of the base subnets.
- The service IP addresses can all be in the same subnet or in different subnets.
- The persistent IP address can be in the same or a different subnet from the service IP address.
- If you choose to use heartbeat monitoring over IP aliases, then the base IP addresses can be on the same or different subnets because they are not monitored by PowerHA; only the PowerHA-supplied aliases are monitored.

Default gateway (route) considerations
Depending on your IP network configuration, during the manipulation of the interfaces by PowerHA, you might find yourself losing your default route.

If you tie your default route to one of the base address subnets and that adapter fails, your default route will be lost.

To prevent this situation, we recommend that you use a persistent address and tie the default route to this subnet. The persistent address will be active as long as the node is active and therefore so will the default route.

Note: The base adapter addresses are also known as “boot” IP addresses if you are not using heartbeat monitoring over IP aliases (further explained in this section).


If you choose not to do this, then you will have to create a post-event script to re-establish the default route if this becomes an issue.
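A minimal post-event script sketch is shown here; the gateway address 192.168.100.1 and the script name are assumptions, and a production script should also log its actions:

#!/bin/ksh
# restore_defroute.sh - sample post-event script to re-establish a lost default route
netstat -rn | grep -qw default || route add default 192.168.100.1
exit 0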

ARP cache updating
During manufacturing, every Network Interface Card (NIC) is given a unique hardware address, the Media Access Control (MAC) address. The MAC address is the address used by the network drivers to send packets between NICs on the local network. Most systems maintain a list that contains recently used IP addresses and their corresponding MAC addresses called an Address Resolution Protocol (ARP) cache. Because PowerHA can move IP addresses between NICs, some client ARP cache entries might become inaccurate.

After a cluster event, PowerHA nodes and network devices that support promiscuous mode automatically update their ARP caches. Clients and network appliances that do not support promiscuous mode continue to have incorrect entries. You can manage these updates in one of two ways:

- Use alternate hardware addresses: configure PowerHA to move both the IP address and the MAC address (works only with IPAT via replacement).
- Update the ARP cache through the use of ping_client_list entries in clinfo.rc.
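For example, the client addresses below are hypothetical; they would be added to the PING_CLIENT_LIST variable near the top of /usr/es/sbin/cluster/etc/clinfo.rc on each node so that those machines are pinged, and their ARP caches refreshed, after a cluster event (check the sample clinfo.rc shipped with your version for the exact variable name):

PING_CLIENT_LIST="192.168.100.50 192.168.100.51"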

PowerHA in a switched network
If VLANs are used, all interfaces defined to PowerHA on a given network must be on the same VLAN. That is, all adapters in the same network are connected to the same physical network and can communicate with each other (“see” each other’s MAC addresses).

Ensure that the switch provides a timely response to ARP requests. For many brands of switches, this means turning off the following functions:

- The spanning tree algorithm
- portfast
- uplinkfast
- backbonefast

If it is necessary to have spanning tree turned on, then portfast should also be turned on.

Note: NOT all adapters have to contain addresses that are routable outside the VLAN. Only the service and persistent addresses need to be routable. The base adapter addresses and any aliases used for heartbeating do not need to be routed outside the VLAN because they are not known to the client side.


Ethernet media speed settings
For Fast Ethernet adapters, because media speed negotiation might cause problems in certain adapter-switch combinations, we recommend that you do not use autonegotiation, but rather set the media to run at the desired values for speed and duplex.
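For example (the adapter name ent0 is an assumption, and the valid media_speed values depend on the adapter driver):

# lsattr -El ent0 -a media_speed                      (show the current media speed setting)
# chdev -l ent0 -a media_speed=100_Full_Duplex -P     (fix speed and duplex; -P applies the change at the next reboot)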

3.8.3 IP Address Takeover planning
IP Address Takeover (IPAT) is the mechanism used by PowerHA to move service addresses between communication interfaces.

Two methods can be used, IPAT via replacement and IPAT via aliases. Your network configuration will depend on which method you use to have PowerHA manipulate the interfaces.

For any new installation, we recommend the use of IP Address Takeover (IPAT) via aliases, as this is easy to implement and more flexible than IPAT via replacement. You can have multiple service addresses on the same adapter at any given time, and there are some time savings during fallovers because PowerHA simply has to add an alias rather than reconfigure the base IP address of an adapter.

Some configurations require the use of heartbeating over aliases, for example, if both local base adapters are on the same subnet, or if all base adapters on all nodes are on separate subnets.

Each option will be looked at in detail in the following section. Our example will use IPAT via Aliases and heartbeating over aliases.

IP Address Takeover (IPAT) via IP replacement
This is the more traditional way of configuring PowerHA networks. PowerHA will replace the boot address with the service address when it starts.

For a two-node cluster, at least one subnet per communication interface per node is required (same subnet mask for all subnets). For a cluster with multiple communication interfaces per node, ensure that these requirements are met:

- Base and service addresses for the primary communication interface must be on the same subnet.
- All secondary communication interfaces must have their base IP addresses on separate subnets (from each other and from the primary one).

IPAT via replacement has the advantage of allowing you to use Hardware Address Takeover (HWAT) in conjunction with it. This feature allows you to move the (locally administered) MAC address of the adapter holding the service IP address, along with the IP address, to a standby adapter. This avoids the need for the client-side ARP cache to be updated in case an IP address swap occurs.

Figure 3-8 illustrates the status of the network adapters before and after PowerHA starts on the nodes. Notice that PowerHA replaces the boot/base address with the service address when it starts. Fallover is to secondary (also known as standby) adapters, where again, the base (standby) address is replaced by the service address. The number of service addresses is restricted to the number of spare adapters defined on the same PowerHA network.

Figure 3-8 IPAT via replacement

IP Address Takeover (IPAT) via aliasing
This is the newer method to assign service addresses and is more flexible than IPAT via replacement. Using IPAT via aliasing, you can have multiple IP addresses assigned to a single communication interface.

PowerHA allows the use of IPAT via IP Aliases with the following network types that support gratuitous ARP (in AIX):

- Ethernet
- Token Ring
- FDDI

Note: IPAT via IP aliasing is not supported on ATM networks.


When PowerHA starts, it simply configures a service alias on top of the existing base IP address of an available adapter.

Consider the following requirements in order to use IPAT via aliases:

- Subnet requirements:
  - Each base adapter must be on a separate subnet to allow heartbeating. The base addresses do not have to be routable outside of the cluster.
  - The service addresses reside on a separate subnet from any of the base subnets when there are two or more interfaces per network configured. There can be multiple service addresses, and they can all be on the same subnet or different ones.
  - The persistent alias can be in the same or a different subnet as the service.
  - The subnet masks must all be the same.
- Multiple service labels can coexist as aliases on a given interface.
- Hardware Address Takeover (HWAT) cannot be configured.

In a multiple interface per network configuration, it is common to use a persistent alias and include it in the same subnet as your default route. This typically means that the persistent address is included in the same subnet as the service addresses. The persistent alias can be used to access the node when PowerHA is down as well as overcome the default route issue.

In a single interface per network configuration, it is now possible to configure both the base (or boot) and service IPs on the same subnet. More details on this can be found at:

http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105185

You can configure a distribution preference for the placement of service IP labels that are configured in PowerHA. The placement of the alias is configurable through SMIT, as follows:

- Anti-collocation: This is the default. HACMP distributes all service IP aliases across all available communication interfaces using a “least loaded” selection process.
- Collocation: PowerHA allocates all service IP label aliases on the same communication interface (NIC).

Note: This restriction is lifted if heartbeat monitoring over aliases is used.


- Anti-collocation with persistent: PowerHA distributes all service IP label aliases across all active communication interfaces that are NOT hosting the persistent IP label. PowerHA places the service IP label alias on the interface that is hosting the persistent label only if no other network interface is available. If you did not configure persistent IP labels, PowerHA lets you select the Anti-Collocation with Persistent distribution preference, but it issues a warning and uses the regular anti-collocation preference by default.
- Collocation with persistent: All service IP label aliases are allocated on the same NIC that is hosting the persistent IP label. This option can be useful in VPN firewall configurations where only one interface is granted external connectivity and all IP addresses (persistent and service) must be allocated on the same communication interface. If you did not configure persistent IP labels, PowerHA lets you select the Collocation with Persistent distribution preference, but it issues a warning and uses the regular collocation preference by default.

For more information and examples of using service distribution policies, refer to 13.2, “Distribution preference for service IP aliases” on page 609.

3.8.4 Heartbeating over aliases
PowerHA requires separate subnets for each base adapter to be monitored. If you have two Ethernet adapters per node, you require two subnets. If you have three adapters, you require three subnets. These subnets do not have to be routable outside the cluster network(s).

To provide the means to monitor these adapters (without changing the base adapter address) and alleviate any subnetting concerns, PowerHA provides the heartbeating over aliases feature. This method does not require any changes to existing base addresses. PowerHA simply ignores the base addresses and adds its own set of aliases to do the heartbeating.

When using heartbeating over IP aliases, the IP addresses used at boot time can reside on the same subnet or different ones; however, an IP address used at boot time must reside on a subnet that does not include service IP labels. We found that if all addresses (base and service) fall in the same subnet, you will experience routing issues due to the AIX route striping feature.


To set up heartbeat over IP aliases, configure an “IP Address Offset for Heartbeating over IP Aliases” as part of the PowerHA network configuration. The IP addresses used for heartbeat monitoring are calculated and assigned by PowerHA using this offset value. The subnet mask is the same as that used for the service and non-service addresses.

For example, you could use 1.1.1.1 as the IP Address Offset. If you had a network with two NICs on each node, and a subnet mask of 255.255.255.0, you would end up with the following heartbeat IP aliases:

- node01:
  - en0 - 1.1.1.1
  - en1 - 1.1.2.1
- node02:
  - en0 - 1.1.1.2
  - en1 - 1.1.2.2

Heartbeat alias IP addresses are added by PowerHA when it starts on the node and then removed again when it stops. These IP alias addresses are only used for heartbeat messages. They do not need to be routed and should not be used for any other traffic. The subnet mask is the same as that used for the service and non-service addresses.

Figure 3-9 illustrates the status of the network adapters before and after PowerHA starts on the nodes. Notice that the base addresses never change. In addition to the service and persistent aliases being added to the base adapters by PowerHA, the heartbeat aliases are also added. These are removed when PowerHA is stopped, along with the service alias. In our example, only the 192.168.100.0/24 network is routable outside the cluster.


Figure 3-9 Heartbeating over Aliases

A netstat -i command will show three IP addresses on each adapter while PowerHA is running.

3.8.5 Non-IP network planning
Point-to-point networks play an important role in ensuring the high availability of the cluster. It is not safe enough to depend on a single TCP/IP network to ensure cluster availability and prevent cluster partitioning. This is why it is important that you create point-to-point networks. In the case of larger clusters, make sure you create adequate paths between all nodes in the cluster.

The objective of a serial, or point-to-point, network topology is to provide enough paths between the cluster nodes for RSCT to make a proper diagnosis of the severity of a cluster failure.


Cluster partitioning
Partitioning, also called node isolation (or “split brain”), occurs when a PowerHA node stops receiving all heartbeat traffic from another node (on all available networks) and assumes that the other node has failed.

The problem with a partitioned cluster is that the node(s) on one side of the partition interpret the absence of heartbeats from the nodes on the other side of the partition to mean that those nodes have failed and then generate node failure events for those nodes. After this occurs, nodes on each side of the cluster attempt to take over resources (if so configured) from a node that is still active and therefore still legitimately owns those resources. These attempted takeovers can cause unpredictable results in the cluster—for example, data corruption due to disks being reset.

The best protection against this situation is to provide more networks, both TCP/IP and point-to-point, in order to allow PowerHA to diagnose the severity of the problem. Remember that PowerHA (RSCT specifically) sends and receives heartbeat messages across all available networks—the more networks, the better able PowerHA is to determine if the problem is with a node or a network.

The following four figures illustrate how to add networks to the cluster in order to provide better protection against partitioning.


Figure 3-10 illustrates a four node cluster with a single Ethernet connection to each server. Because there is only one network, if any link is lost, part of the cluster will be partitioned. The example shows a break between the two Ethernet switches, resulting in the two nodes on the left being partitioned from the two on the right. In this case, problems might arise due to nodes trying to acquire resources from active nodes.

Figure 3-10 Partitioned Cluster


Figure 3-11 is a bit more realistic, showing dual Ethernet connections from each node. Each Ethernet adapter is connected to a separate switch. In this case, only a failure of both switches or of both Ethernet connections will result in a partitioned cluster. However, the TCP/IP network itself remains a single point of failure.

Figure 3-11 Two Ethernet non-partitioned cluster


Figure 3-12 is our recommended configuration. We have dual Ethernet connections going to multiple Ethernet switches, and we add a point-to-point loop network. The loop network has each node connected to its immediate neighbors. One connection can be lost and RSCT will still be able to connect to all surviving nodes. It would take a dual failure of the point-to-point connections on a node, as well as a failure of the TCP/IP network, before the cluster would end up in a partitioned configuration.

Figure 3-12 Ethernet and point-to-point loop network configuration


For a more robust and reliable configuration, consider implementing a star configuration. In this configuration, in addition to the TCP/IP network, each node is connected to all other cluster nodes by point-to-point networks. In case of failure of multiple nodes, RSCT can still communicate between any surviving nodes. This configuration is illustrated in Figure 3-13.

Figure 3-13 Ethernet and point-to-point star network configuration

3.8.6 Planning RS232 serial networks
An RS232 network contains one serial port on each node connected via a serial cable. If you have multiple nodes, you will need multiple serial ports and cables.

If you decide to use an RS232 serial network, consider these recommendations:

- Some pSeries servers have restrictions on using the onboard (integrated) serial ports; some ports are unavailable, and some ports have to be assigned as a group (especially in an LPAR environment).
- If there are no serial ports available and your planned PowerHA configuration includes an RS232 network, you will require a PCI serial adapter per cluster node (LPAR).
- All RS232 networks defined to PowerHA are automatically configured to run the serial ports at 38400 baud. Depending on the length of the serial cable, RSCT supports baud rates of 38400, 19200, and 9600.


- Any serial port that meets the following requirements can be used for heartbeat:
  - The hardware supports usage of that serial port for modem attachment.
  - The serial port is free for exclusive use by PowerHA.

The cable needed to connect two serial ports has to be wired as a full NULL modem cable, and is not supplied by default with the hardware. Figure 3-14 illustrates the NULL modem wiring. The actual cable connectors will depend on your hardware, and most likely be a DB9, DB25, or RJ50 connector.

Figure 3-14 NULL modem cable wiring

Refer to the hardware documentation and PowerHA support announcements to determine if your serial ports meet the requirements.

3.8.7 Planning disk heartbeating
Heartbeating over disk provides another type of non-IP point-to-point network used for failure detection. In prior versions of PowerHA, you could configure non-IP heartbeat over SCSI or SSA disks by configuring Target Mode SCSI (TMSCSI) and Target Mode SSA (TMSSA) point-to-point networks.

Starting with PowerHA 5.1, you can also configure a point-to-point, non-IP disk heartbeat connection using any shared disk that is part of an enhanced concurrent mode (ECM) volume group.

In a disk heartbeat network, two nodes connected to the disk periodically write heartbeat messages and read heartbeat messages (written by the other node) on a small, non-data portion of the disk. Though a disk heartbeat network connects only two nodes, in clusters with more than two nodes, multiple disks can be used for heartbeating.

If you are using SAN disks for your shared disks, consider using disk heartbeating for the following reasons:

- You can use any existing shared disks (including SAN-attached disks).
- No additional hardware or cables are required.

In order to take advantage of disk heartbeating, you require SAN disks that are accessible by both nodes and are part of an enhanced concurrent volume group.

Any shared disk in an enhanced concurrent mode volume group can support a point-to-point heartbeat connection. Each disk can support one connection between two nodes. The connection uses the shared disk hardware as the communication path.

A disk heartbeat network in a cluster contains:

- Two nodes, each with a SAN adapter. A node can be a member of any number of disk heartbeat networks.
- An enhanced concurrent mode disk. A single disk can participate in only one heartbeat network.

Keep in mind the following points when selecting a disk to use for disk heartbeating:

- A disk used for disk heartbeating must be a member of an enhanced concurrent mode volume group. However, the volume groups associated with the disks used for disk heartbeating do not have to be defined as resources within a PowerHA resource group.
- The disk used for heartbeating should not be very busy, because PowerHA expects the writes to occur within certain intervals. If you choose to use a disk that has a significant I/O load, increase the value of the time-out parameter for the disk heartbeat network. We generally recommend that you use a disk that does not experience more than 60 seeks per second.
- When the Subsystem Device Driver (SDD) is installed and the enhanced concurrent volume group is associated with an active vpath device, ensure that the disk heartbeating communication device is defined to use the /dev/vpath device (rather than the associated /dev/hdisk device) in order to take advantage of the multipath software.
- If a shared volume group is mirrored, at least one disk in each mirror should be used for disk heartbeating.
- The recommendation for the disk heartbeat network is to have one LUN (disk) per pair of nodes per disk enclosure.


- Disk heartbeat is also supported on virtual disks provided by a Virtual I/O Server.
- Disk heartbeat is also supported on third-party storage, such as EMC hdiskpower# devices.

Figure 3-15 illustrates the basic components found in a disk heartbeat network.

Figure 3-15 Disk Heartbeating Network

Note that the vpath number might be different on each node due to the AIX disk ordering. Therefore check the PVID to ensure that you have selected the same disk from either node.

We generally recommend that you run PowerHA discovery and pick the appropriate disks from the picklist provided.
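For example, before selecting the disk, run lspv on each node and match the candidate disks by PVID rather than by hdisk or vpath number (the disk names shown are from our sample cluster):

node01# lspv | grep hdisk2     (note the PVID in the second column)
node02# lspv                   (find the disk with the same PVID; it might not be hdisk2 on this node)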

HACMP 5.4.1 and HACMP 5.3 SP6 also introduced multi-node disk heartbeat, which allows multiple nodes to use the same disk for heartbeating. It is currently limited to an “Online on all Available Nodes” resource group startup policy, which is synonymous with true concurrent access. For more information about both disk heartbeating options, refer to 12.6, “Disk heartbeat” on page 586.


3.8.8 Additional network planning considerations
In addition to configuring the network topology, there are two other topics to be considered during cluster design:

- PowerHA with Domain Name Service (DNS) and Network Information Services (NIS)
- PowerHA network modules

Next, we discuss both of these topics.

PowerHA with DNS and NIS
To ensure that cluster events complete successfully and quickly, PowerHA disables NIS or DNS hostname resolution during service IP label swapping by setting the NSORDER AIX environment variable to local. Therefore, the /etc/hosts file of each cluster node must contain all PowerHA-defined IP labels for all cluster nodes.

After the swap completes, DNS access is restored.

We suggest that you make the following entry in the /etc/netsvc.conf file to assure that the /etc/hosts file is read before a DNS lookup is attempted:

hosts = local, bind4

Network modules
Each supported cluster network has a corresponding RSCT network module, also known as a network interface module (NIM), that monitors the heartbeat traffic over the cluster network. The network modules maintain a connection to each other through which the cluster managers on all nodes send keep-alive messages to each other.

Currently, PowerHA passes the corresponding tuning parameters to the RSCT network modules to support communication over the following types of networks:

- Ethernet
- Serial (RS232)
- Disk heartbeat (over enhanced concurrent mode disks)
- Multi-node disk heartbeat
- Target-mode SCSI
- Target-mode SSA (no longer supported in PowerHA 5.5)
- Token-Ring
- FDDI
- SP switch (hps, no longer supported in PowerHA 5.5)
- Infiniband (ib)
- ATM


Failure detection rate
The failure detection rate determines how quickly a connection is considered to have failed. The failure detection rate consists of two components:

- Cycles to fail (cycle): the number of heartbeats missed before detecting a failure.
- Heartbeat rate (hbrate): the number of seconds between heartbeats.

The time needed to detect a failure can be calculated using the following formula:

(heartbeat rate) X (cycles to fail) X 2

The failure detection rate can be changed for a network module in two ways:

- Select one of the preset rates of slow, normal, or fast.

  For the Ether network type, the following values apply:
  - fast = 10 seconds (5 x 1 x 2)
  - normal = 20 seconds (10 x 1 x 2)
  - slow = 48 seconds (12 x 2 x 2)

- Change the actual cycle or hbrate components.

  You can use the SMIT menu “Change a Cluster Network Module using Custom Values”.

The preset values are calculated for each type of network to give reasonable results. You might want to consider changing the failure detection rate to:

- Decrease fallover time
- Prevent node CPU saturation from causing false takeovers

You can find out the network sensitivity (also known as failure detection rate) from the topology services as shown in Example 3-1.

Example 3-1 Network sensitivity for Ether type network

p630n01-# lssrc -ls topsvcs
.................. Omitted lines .......................
NIM's PID: 19978
net_ether_01_1 [1] 3 1 S 10.10.31.31 10.10.31.31
net_ether_01_1 [1] en0 0x42d56868 0x42d56872
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
.................. Omitted lines .......................


/usr/sbin/cluster/netmon.cf
In cluster configurations where there are networks that under certain conditions can become single-adapter networks, it can be difficult for PowerHA to accurately determine a particular adapter failure. For these situations, RSCT uses the netmon.cf file.

RSCT topology services scans the netmon.cf configuration during cluster startup. When netmon needs to stimulate the network to ensure adapter function, it sends ICMP ECHO requests to each IP address. After sending the request to every address, netmon checks the inbound packet count before determining whether an adapter has failed.

There is also a new format to use in a Virtual I/O Server environment. More information about both formats can be found in 13.4, “Understanding the netmon.cf file” on page 621.
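In its traditional format, netmon.cf simply lists one IP address or hostname per line that netmon can ping to generate inbound traffic. The router addresses below are hypothetical examples of such a file (the Virtual I/O Server format mentioned above is different and is covered in 13.4):

192.168.100.1
192.168.100.254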

3.8.9 Completing the network planning worksheets
The following worksheets include the necessary network information.

The first worksheet (Table 3-6) shows the specifications for the Ethernet network used in our example.

Table 3-6 Cluster Ethernet Networks

PowerHA CLUSTER WORKSHEET - PART 4 of 11: CLUSTER ETHERNET NETWORKS (DATE: Mar 2009)

  NETWORK NAME:                                         ether10
  NETWORK TYPE:                                         ethernet (public)
  NETMASK:                                              255.255.255.0
  NODE NAMES:                                           node01, node02
  IPAT VIA IP ALIASES:                                  enable
  IP Address Offset for Heartbeating over IP Aliases:   1.1.1.1

  COMMENTS: The IP Address Offset will add IP aliases to each Ethernet interface when PowerHA starts. These aliases are then used for heartbeating, and the base adapter addresses are not monitored. Select the default failure detection rate (ether = normal = 20 seconds).


Table 3-7 documents the point-to-point network(s) found in the cluster. Our example only uses a disk heartbeat network, but we have included an RS232 network as an example.

Table 3-7 Point-to-Point Networks

PowerHA CLUSTER WORKSHEET - PART 5 of 11: CLUSTER POINT-TO-POINT AND SERIAL NETWORKS (DATE: Mar 2009)

  NETWORK NAME:    diskhb10
  NETWORK TYPE:    diskhb
  NODE NAMES:      node01, node02
  DEVICE:          hdisk2 (node01), hdisk2 (node02)
  INTERFACE NAME:  NA
  ADAPTER LABEL:   node1_to_node2hb (node01), node2_to_node1hb (node02)

  COMMENTS: Integrated serial ports are not supported on the 550 for PowerHA use and would require async adapters. Hence the serial10 network will not be configured; only the diskhb10 network is used.

Now that the networks have been recorded, document the interfaces and IP addresses used by PowerHA, as shown in Table 3-8.

Table 3-8 Cluster Communication Interfaces and IP addresses

PowerHA CLUSTER WORKSHEET - PART 6 of 11: INTERFACES AND IP ADDRESSES (DATE: 2009)

  IP LABEL     NETWORK    NETWORK  INTERFACE           IP ADDRESS / MASK                IP ALIAS DIST. PREFERENCE
               INTERFACE  NAME     FUNCTION
  node01
  node01a      en0        ether10  base (non-service)  10.10.31.31 / 255.255.255.0      NA
  ha55node1    NA         ether10  persistent          192.168.100.31 / 255.255.255.0   Anti-collocation (default)
  app1svc      NA         ether10  service             192.168.100.131 / 255.255.255.0  Anti-collocation (default)

  node02
  node02a      en0        ether10  base (non-service)  10.10.31.32 / 255.255.255.0      NA
  ha55node2    NA         ether10  persistent          192.168.100.32 / 255.255.255.0   Anti-collocation (default)
  app2svc      NA         ether10  service             192.168.100.132 / 255.255.255.0  Anti-collocation (default)

  COMMENTS: Each node contains two base adapters, each in their own subnet. Each node also contains a persistent (node-bound) address and a service address. IPAT via aliases is used, as well as heartbeat over aliases (starting range = 1.1.1.1).


3.9 Planning storage requirements
When planning cluster storage, you must consider the following requirements:

- Physical disks:
  - Ensure that any disk solution is highly available. This can be accomplished through mirroring, RAID, and redundant hardware.
  - Internal disks: typically this is the location of rootvg.
  - External disks: this must be the location of the application data.
- LVM components:
  - All shared storage has unique logical volume, jfslog, and file system names.
  - Major numbers are unique.
  - Is mirroring of data required?

3.9.1 Internal disks
Internal node disks typically contain rootvg and perhaps the application binaries. We recommend that the internal disks be mirrored for higher availability, to prevent a node fallover due to a simple internal disk failure.



3.9.2 Shared disks
Application data resides on the external disks in order to be accessible by all required nodes. These are referred to as the shared disks.

In a PowerHA cluster, shared disks are connected to more than one cluster node. In a non-concurrent configuration, only one node at a time owns the disks. If the owner node fails, in order to restore service to clients, another cluster node in the resource group node list acquires ownership of the shared disks and restarts applications.

Depending on the number of disks in a resource group and the disk takeover method, a takeover can take from 30 to 300 seconds. How long the application takes to start can also influence the fallover time.

PowerHA supports the following IBM disk technologies as shared external disks in a highly available cluster:

- SCSI drives, including RAID subsystems
- IBM SSA adapters and SSA disk subsystems (no longer supported in PowerHA 5.5)
- Fibre Channel adapters and disk subsystems
- Data path devices (VPATH): SDD 1.3.1.3 or greater
- AIX MPIO disks

OEM disks can be supported, but you have to validate support for these disk subsystems with the equipment manufacturer.

When working with a shared volume group:

- Do not include an internal disk in a shared volume group, because it will not be accessible by other nodes.
- Do not activate (vary on) the shared volume groups in a PowerHA cluster at system boot. Ensure that the automatic varyon attribute in the AIX ODM is set to No for shared volume groups that are part of resource groups. You can use the cluster verification utility to change this attribute.

Important: All shared disks must be “zoned” to any cluster nodes requiring access to the specific volumes. That is, the shared disks must be able to be varied on and accessed by any node that has to run a specific application.

We recommend that you verify that shared volume groups can be manually varied on each node.
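A minimal verification sketch using the app1vg volume group from our example (run the same sequence on each node in the resource group node list, with the volume group varied off everywhere else first):

# varyonvg app1vg       (activate the shared volume group on this node)
# lsvg -l app1vg        (check that the logical volumes and file systems are visible)
# varyoffvg app1vg      (deactivate it again so that PowerHA can manage it)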


3.9.3 Enhanced Concurrent Mode (ECM) volume groups
Any disk supported by PowerHA for attachment to multiple nodes can be used to create an enhanced concurrent mode volume group, and used in either concurrent or non-concurrent environments:

Concurrent: An application runs on all active cluster nodes at the same time. To allow such applications to access their data, concurrent volume groups are varied on all active cluster nodes. The application then has the responsibility to ensure consistent data access.

Non-concurrent: An application runs on one node at a time. The volume groups are not concurrently accessed; they are accessed by only one node at any given time.

When you vary on the volume group in enhanced concurrent mode, the LVM allows access to the volume group on all nodes. However, it restricts the higher-level connections, such as JFS mounts and NFS mounts, on all nodes, and allows them only on the node that currently owns the volume group.

More details on enhanced concurrent volume groups and the features dependent on using them can be found in Chapter 12, “Storage related considerations” on page 575.

Important: If you define a volume group to PowerHA, do not manage it manually on any node outside of PowerHA while PowerHA is running. This can lead to unpredictable results. C-SPOC should always be used to maintain the shared volume groups.

Note: Although you can define enhanced concurrent mode volume groups, this does not necessarily mean that you are going to use them for concurrent access; for example, you can still define and use these volume groups as normal shared file system access. However, you must NOT define file systems on volume groups that are intended for concurrent access.


3.9.4 Shared logical volumes
Planning for shared logical volumes is all about data availability. Making your data highly available through the use of mirroring or RAID is a key requirement. Remember that PowerHA relies on LVM and storage mechanisms (RAID) to protect against disk failures; therefore, it is imperative that you make the disk infrastructure highly available.

Consider the following guidelines when planning shared LVM components:

- Logical volume copies or RAID arrays can protect against loss of data from physical disk failure.
- All operating system files should reside in the root volume group (rootvg), and all user data should reside on a separate shared volume group.
- Volume groups that contain at least three physical volumes provide the maximum availability when implementing mirroring.
- If you plan to specify the “Use Forced Varyon of Volume Groups if Necessary” attribute for the volume groups, you need to use the super strict disk allocation policy for mirrored logical volumes.
- With LVM mirroring, each physical volume containing a copy should get its power from a separate source. If one power source fails, separate power sources maintain the no-single-point-of-failure objective.
- Consider quorum issues when laying out a volume group. With quorum enabled, a two-disk volume group presents the risk of losing quorum and data access. Either build three-disk volume groups (for example, using a quorum buster disk/LUN) or disable quorum.
- Keep in mind the cluster configurations that you have designed. A node whose resources are not taken over should not own critical volume groups.
- Ensure that regular backups are scheduled.

After you have established a highly available disk infrastructure, you must consider the following items as well when designing your shared volume groups:

- All shared volume groups have unique logical volume and file system names. This includes the jfs/jfs2 log files. PowerHA also supports JFS2 with INLINE logs.
- Do not use encrypted JFS2 file systems.
- Major numbers for each volume group are unique (especially if you plan to use NFS).
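For example, you can check which major numbers are still free on a node and which major number an existing volume group is using (app1vg is the volume group from our example):

# lvlstmajor            (list the major numbers still available on this node)
# ls -l /dev/app1vg     (the major number of the volume group appears in the device file listing)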


Figure 3-16 outlines the basic components found in the external storage. Notice that all logical volumes and file system names are unique, as is the major number for each volume group. The data is made highly available through the use of SAN disk and redundant paths to the devices.

Figure 3-16 External Disk

3.9.5 Fast disk takeover
PowerHA automatically detects node failures and initiates a disk takeover as part of a resource group takeover process. The traditional disk takeover process implies breaking the SCSI (or SSA) disk reservation before varying on the volume group on the takeover node. This can be a lengthy process, especially if there are numerous disks in that volume group.

Starting with PowerHA 5.1, as AIX provides support for enhanced concurrent volume groups, it is possible to reduce the disk takeover (and thus, the resource group) time by using the fast disk takeover option. If a volume group used for shared file system access has been defined in enhanced concurrent mode, PowerHA automatically detects this and varies on the volume group on all nodes part of that RG. This eliminates the need for breaking the hardware disk reservation in case of a takeover. Fast disk takeover requires:

- AIX 5L Version 5.2 and up
- PowerHA 5.1 and up, with the bos.clvm.enh AIX fileset installed on all nodes in the cluster
- Enhanced concurrent mode volume groups in non-concurrent resource groups

For existing volume groups included in non-concurrent resource groups, you can convert these volume groups to enhanced concurrent volume groups after upgrading your PowerHA software.
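As a hedged sketch using our example volume group (the bos.clvm.enh fileset must be installed, and depending on the AIX level you might need to vary the volume group off and on again for the change to take effect):

# chvg -C app1vg                        (mark the volume group as enhanced concurrent capable)
# lsvg app1vg | grep -i concurrent      (verify that the volume group now shows the concurrent capability)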


The actual fast disk takeover time observed in any configuration depends on factors out of PowerHA control, such as the processing power of the nodes and the amount of unrelated activity at the time of the fallover.

More detailed information can be found in 12.4, “Fast disk takeover” on page 580.

3.9.6 Completing the storage planning worksheets
The following worksheets contain the required information about the shared volume groups. Combined, they will give you a good idea of the shared disk configuration.

Document the shared volume groups and physical disks as shown in Table 3-9.

Table 3-9 Shared Disks

Record the shared volume group details as shown in Table 3-10.

Table 3-10 Shared Volume Groups

PowerHA CLUSTER WORKSHEET - PART 7 of 11: SHARED DISKS (DATE: Mar 2009)

            node01                       node02
  VGNAME    VPATHS  HDISK                HDISK            VPATHS  VGNAME
  app1vg    n/a     hdisk2, hdisk3       hdisk2, hdisk3   n/a
            n/a     hdisk4, hdisk5       hdisk4, hdisk5   n/a     app2vg

  COMMENTS: All disks are seen by both nodes. app1vg normally resides on node01, app2vg normally resides on node02.

PowerHA CLUSTER WORKSHEET - PART 8 of 11: SHARED VOLUME GROUPS (NON-CONCURRENT) (DATE: March 2009)

  RESOURCE GROUP   VOLUME GROUP 1                    VOLUME GROUP 2
  C10RG1           app1vg                            NA
                   Major Number = 90
                   log = app1vglog
                   Logical Volume 1 = app1lv1
                   Filesystem 1 = /app1 (20 GB)
  C10RG2           app2vg                            NA
                   Major Number = 91
                   log = app2vglog
                   Logical Volume 1 = app2lv1
                   Filesystem 1 = /app2 (20 GB)

  COMMENTS: Create the shared volume group on the first node and then import it on the second node:
  #importvg -y app1vg -V 90 vpath0   (might have to make the pv available with chdev -l vpath0 -a pv=yes)
  #chvg -an app1vg                   (set vg to not auto vary on)
  #mount /app1                       (ensure the file system mounts)
  #umount /app1
  #varyoffvg app1vg                  (leave the VG offline in order for PowerHA to manage it)


3.10 Application planning
Virtually all applications that run on a standalone AIX server can be integrated into a PowerHA cluster, because they are not aware of the underlying PowerHA functionality. That is, PowerHA basically starts and stops them.

When planning for an application to be highly available, be sure you understand the resources required by the application and the location of these resources within the cluster. This will enable you to provide a solution that allows them to be handled correctly by PowerHA if a node fails.

You must thoroughly understand how the application behaves in a single-node and multi-node environment. We recommend that as part of preparing the application for PowerHA, you test the execution of the application manually on both nodes before turning it over to PowerHA to manage. Do not make assumptions about the application's behavior under fallover conditions.



When planning an application to be protected in a PowerHA cluster, consider the following recommendations:

- Ensure that the application is compatible with the version of AIX used.
- Ensure that the application is compatible with the shared storage solution, because this is where its data will reside.
- Have adequate system resources (CPU, memory), especially for the case where one node hosts all of the applications in the cluster.
- Ensure that the application runs successfully in a single-node environment. Debugging an application in a cluster is more difficult than debugging it on a single server.
- Lay out the application and its data so that only the data resides on shared external disks. This arrangement not only prevents software license violations, but it also simplifies failure recovery.
- If you are planning to include multi-tiered applications, such as a database and an application server, in parent/child dependent resource groups in your cluster, PowerHA provides an easy-to-use SMIT menu that allows you to specify this relationship.
- Write robust scripts to both start and stop the application on the cluster nodes. The startup script must be able to recover the application from an abnormal termination. Ensure that the scripts run properly in a single-node environment before including them in PowerHA.
- Confirm the application licensing requirements. Some vendors require a unique license for each processor that runs an application, which means that you must license-protect the application by incorporating processor-specific information into the application when it is installed.

Note: The key prerequisite to making an application highly available is that it first must run correctly in standalone mode on each node that it can reside on.

We recommend that you ensure that the application runs on all required nodes properly before configuring it to be managed by PowerHA.

You need to analyze and address the following aspects:

- Application code: binaries, scripts, links, configuration files, and so on
- Environment variables: any environment variable that needs to be passed to the application for proper execution
- Application data
- Networking setup: IP addresses, hostname
- Application licensing
- Application-defined system users


As a result, even though the PowerHA software processes a node failure correctly, it might be unable to restart the application on the fallover node because of a restriction on the number of licenses for that application available within the cluster. To avoid this problem, be sure that you have a license for each system unit in the cluster that might potentially run an application.

� Verify that the application uses a proprietary locking mechanism if you need concurrent access.

3.10.1 Application servers

In PowerHA, an application server is simply a set of scripts used to start and stop an application.

Configure your application server by creating a name to be used by PowerHA and associating a start script and a stop script.

After you have created an application server, you associate it with a resource group (RG). PowerHA then uses this information to control the application.
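To illustrate, a start and stop script pair for an application server might look like the following sketch. The app1user account and the /app1/bin/app1ctl control command are hypothetical placeholders; the essential points are that both scripts exist at the same path on every node and that the start script can cope with an unclean previous shutdown.

  #!/bin/ksh
  # /cluster/local/app1/start.sh - invoked by PowerHA when the resource group is acquired
  rm -f /app1/locks/app1.pid                   # clean up after any abnormal termination (path is an example)
  su - app1user -c "/app1/bin/app1ctl start"   # start the application (hypothetical command)
  exit 0                                       # return 0 on success

  #!/bin/ksh
  # /cluster/local/app1/stop.sh - invoked by PowerHA before the resource group is released
  su - app1user -c "/app1/bin/app1ctl stop"    # stop the application (hypothetical command)
  exit 0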

3.10.2 Application monitoring

PowerHA can monitor your application by one of two methods:

� Process monitoring: Detects the termination of a process, using RSCT Resource Monitoring and Control (RMC) capability.

� Custom monitoring: Monitors the health of an application, using a monitor method such as a script that you define.

Starting with HACMP 5.2 you can have multiple monitors for an application.

When defining your custom monitoring method, keep in mind the following points:

� You can configure multiple application monitors, each with unique names, and associate them with one or more application servers.

� The monitor method must be an executable program, such as a shell script, that tests the application and exits, returning an integer value that indicates the application’s status. The return value must be zero if the application is healthy, and must be a non-zero value if the application has failed.

� PowerHA does not pass arguments to the monitor method.

Tip: When application planning, keep in mind that if you cannot do something manually, then PowerHA will be unable to do it as well.


� In HACMP 5.4.1 and above, the monitoring method logs messages to the /var/hacmp/log/clappmond.application_name.resource_group_name.monitor.log file. Also, by default, each time the application runs, the monitor log file is overwritten.

� Do not make the method overly complicated. The monitor method is terminated if it does not return within the specified polling interval.

More detailed information can be found in 7.7.7, “Application monitoring” on page 440.
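As a simple illustration, a custom monitor method for APP1 might look like the following sketch. The process name app1 and the commented-out port check are assumptions; the firm requirements are only those listed above: return zero when healthy, return non-zero on failure, and finish well within the polling interval.

  #!/bin/ksh
  # /cluster/local/app1/monitor.sh - custom application monitor for APP1
  # Exit 0 if the application is healthy, non-zero if it has failed.

  # Check that the app1 process is running (process name is an assumption)
  ps -eo comm | grep -qx app1 || exit 1

  # Optionally confirm the application is listening on its service port (port number is an assumption)
  # netstat -an | grep "LISTEN" | grep -q "\.5500 " || exit 2

  exit 0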

3.10.3 Availability analysis tool

The application availability analysis tool can be used to measure the exact amount of time that any of your PowerHA-defined applications is available. The PowerHA software collects, time stamps, and logs the following information:

� An application monitor is defined, changed, or removed
� An application starts, stops, or fails
� A node fails or is shut down, or comes up
� A resource group is taken offline or moved
� Application monitoring via multiple monitors is suspended or resumed

More detailed information can be found in 7.7.8, “Measuring application availability” on page 453.

3.10.4 Applications integrated with PowerHA

Certain applications, including Fast Connect Services and Workload Manager, can be configured directly as highly available resources without application servers or additional scripts. In addition, PowerHA verification ensures the correctness and consistency of certain aspects of your Fast Connect Services, or Workload Manager configuration.

Important: As the monitoring process is time sensitive, ALWAYS test your monitor method under different workloads to arrive at the best polling interval value.


3.10.5 Completing the application planning worksheets

The following worksheets capture the required information for each application.

Update the application worksheet to include all required information, as shown in Table 3-11:

Table 3-11 Application Worksheet

PowerHA CLUSTER WORKSHEET - PART 9 of 11: APPLICATION WORKSHEET
DATE: Mar 2009

APP1

ITEM                 DIRECTORY                      FILE SYSTEM   LOCATION      SHARING
EXECUTABLE FILES     /app1/bin                      /app1         SAN Storage   Shared
CONFIGURATION FILES  /app1/conf                     /app1         SAN Storage   Shared
DATA FILES           /app1/data                     /app1         SAN Storage   Shared
LOG FILES            /app1/logs                     /app1         SAN Storage   Shared
START SCRIPT         /cluster/local/app1/start.sh   /             rootvg        Not Shared (must reside on both nodes)
STOP SCRIPT          /cluster/local/app1/stop.sh    /             rootvg        Not Shared (must reside on both nodes)

FALLOVER STRATEGY: Fallover to node02.
NORMAL START COMMANDS AND PROCEDURES: Ensure that the APP1 server is running.
VERIFICATION COMMANDS AND PROCEDURES: Run the following command and ensure APP1 is active. If not, send notification.
NORMAL STOP COMMANDS AND PROCEDURES: Ensure APP1 stops properly.
NODE REINTEGRATION: Must be reintegrated during scheduled maintenance window to minimize client disruption.

APP2

ITEM                 DIRECTORY                      FILE SYSTEM   LOCATION      SHARING
EXECUTABLE FILES     /app2/bin                      /app2         SAN Storage   Shared
CONFIGURATION FILES  /app2/conf                     /app2         SAN Storage   Shared
DATA FILES           /app2/data                     /app2         SAN Storage   Shared
LOG FILES            /app2/logs                     /app2         SAN Storage   Shared
START SCRIPT         /cluster/local/app2/start.sh   /             rootvg        Not Shared (must reside on both nodes)
STOP SCRIPT          /cluster/local/app2/stop.sh    /             rootvg        Not Shared (must reside on both nodes)

FALLOVER STRATEGY: Fallover to node01.
NORMAL START COMMANDS AND PROCEDURES: Ensure that the APP2 server is running.
VERIFICATION COMMANDS AND PROCEDURES: Run the following command and ensure APP2 is active. If not, send notification.
NORMAL STOP COMMANDS AND PROCEDURES: Ensure APP2 stops properly.
NODE REINTEGRATION: Must be reintegrated during scheduled maintenance window to minimize client disruption.

COMMENTS: Summary of Applications.


Update the application monitoring worksheet to include all the information required for the application monitoring tools (Table 3-12).

Table 3-12 Application Monitoring Worksheet

PowerHA CLUSTER WORKSHEET - PART 10 of 11: APPLICATION MONITORING
DATE: March 2009

APP1
  Can this Application Be Monitored with Process Monitor?  Yes
  Processes to Monitor: app1
  Process Owner: root
  Instance Count: 1
  Stabilization Interval: 30 (seconds)
  Restart Count: 3
  Restart Interval: 95 (seconds)
  Action on Application Failure: Fallover
  Notify Method: /usr/es/sbin/cluster/events/notify_app1
  Cleanup Method: /usr/es/sbin/cluster/events/stop_app1
  Restart Method: /usr/es/sbin/cluster/events/start_app1

APP2
  Can this Application Be Monitored with Process Monitor?  Yes
  Processes to Monitor: app2
  Process Owner: root
  Instance Count: 1
  Stabilization Interval: 30 (seconds)
  Restart Count: 3
  Restart Interval: 95 (seconds)
  Action on Application Failure: Fallover
  Notify Method: /usr/es/sbin/cluster/events/notify_app2
  Cleanup Method: /usr/es/sbin/cluster/events/stop_app2
  Restart Method: /usr/es/sbin/cluster/events/start_app2


3.11 Planning for resource groups

PowerHA manages resources through the use of resource groups.

Each resource group is handled as a unit that can contain different types of resources. Some examples are: IP labels, applications, file systems, and volume groups. Each resource group has preferences that define when and how it will be acquired or released. You can fine-tune the non-concurrent resource group behavior for node preferences during a node startup, resource group fallover to another node in the case of a node failure, or when the resource group falls back to a reintegrating node.

The following rules and restrictions apply to resources and resource groups:

� To be kept highly available by PowerHA, a cluster resource must be part of a resource group. If you want a resource to be kept separate, you can define a group for that resource alone. A resource group can have one or more resources defined.

� A resource cannot be included in more than one resource group.

� We recommend that you put the application server along with the resources it requires in the same resource group (unless otherwise needed).

� If you include a node in participating node lists for more than one resource group, make sure the node can sustain all resource groups simultaneously.

After you have decided what components are to be grouped into a resource group, you must plan the behavior of the resource group.

Table 3-13 summarizes the basic startup, fallover, and fallback behaviors that you can configure for resource groups in PowerHA.

Table 3-13 Resource group behavior

Startup Behavior: Online on home node only (OHNO) for the resource group.
  Fallover Behavior: Fallover to next priority node in the list, or fallover using Dynamic Node Priority.
  Fallback Behavior: Never fall back, or fall back to higher priority node in the list.

Startup Behavior: Online using node distribution policy.
  Fallover Behavior: Fallover to next priority node in the list, or fallover using Dynamic Node Priority.
  Fallback Behavior: Never fall back.

Startup Behavior: Online on first available node (OFAN).
  Fallover Behavior: Fallover to next priority node in the list, fallover using Dynamic Node Priority, or bring offline (on error node only).
  Fallback Behavior: Never fall back, or fall back to higher priority node in the list.

Startup Behavior: Online on all available nodes.
  Fallover Behavior: Bring offline (on error node only).
  Fallback Behavior: Never fall back.

3.11.1 Resource group attributes

In the following sections, we discuss resource group attribute setup.

Startup settling time

Settling time only applies to Online on First Available Node (OFAN) resource groups and lets PowerHA wait for a set amount of time before activating a resource group. After the settling time, PowerHA will then activate the resource group on the highest available priority node. Use this attribute to ensure that resource groups do not bounce between nodes, as nodes with increasing priority for the resource group are brought online.

If the node that is starting is a home node for this resource group, the settling time period is skipped and PowerHA immediately attempts to acquire the resource group on this node.

Note: This is a cluster-wide setting and will be set for all OFAN resource groups.

Dynamic Node Priority (DNP) policy

Setting a dynamic node priority policy allows you to select the takeover node based on specific performance criteria. This uses an RMC resource variable such as “lowest CPU load” to select the takeover node. With a dynamic priority policy enabled, the order of the takeover node list is determined by the state of the cluster at the time of the event, as measured by the selected RMC resource variable.

If you decide to define dynamic node priority policies using RMC resource variables to determine the fallover node for a resource group, consider the following points:

� Dynamic node priority policy is most useful in a cluster where all nodes have equal processing power and memory.


� Dynamic node priority policy is irrelevant for clusters of fewer than three nodes.

� Dynamic node priority policy is irrelevant for concurrent resource groups.

Remember that selecting a takeover node also depends on conditions such as the availability of a network interface on that node.

Delayed fallback timer

The delayed fallback timer lets a resource group fall back to a higher priority node at a time that you specify. The resource group that has a delayed fallback timer configured and that currently resides on a non-home node falls back to the higher priority node at the specified time.

Resource group dependencies

PowerHA offers a wide variety of configurations where you can specify the relationships between resource groups that you want to maintain at startup, fallover, and fallback. You can configure:

� Parent/child dependencies so that related applications in different resource groups are processed in the proper order.

� Location dependencies so that certain applications in different resource groups stay online together on a node or on a site, or stay online on different nodes.

Although all resource groups are processed in parallel by default, PowerHA processes dependent resource groups according to the order dictated by the dependency, and not necessarily in parallel. Resource group dependencies are honored cluster-wide and override any customization for serial order of processing of any resource groups included in the dependency.

Dependencies between resource groups offer a predictable and reliable way of building clusters with multi-tiered applications.

IPAT method and resource groups

You cannot mix IPAT via IP Aliases and IPAT via IP Replacement in the same resource group. This restriction is enforced during verification of cluster resources.

There is no IPAT with concurrent resource groups.

A resource group can include multiple service IP labels. When a resource group configured with IPAT via IP Aliases is moved, all service labels in the resource group are moved as aliases to an available network interface.


Planning for Workload Manager (WLM)

WLM allows users to set targets and limits on CPU, physical memory usage, and disk I/O bandwidth for different processes and applications, for better control over the use of critical system resources at peak loads. PowerHA allows you to configure WLM classes into PowerHA resource groups so that the starting and stopping of WLM and the active WLM configuration can be under cluster control.

PowerHA does not verify every aspect of your WLM configuration; therefore, it remains your responsibility to ensure the integrity of the WLM configuration files. After you add the WLM classes to a PowerHA resource group, the verification utility checks only whether the required WLM classes exist. Therefore, you must fully understand how WLM works, and configure it carefully.

3.11.2 Completing the planning worksheet

The resource group worksheet captures all the required planning information for the resource groups (see Table 3-14).

Table 3-14 Resource Groups Worksheets

PowerHA CLUSTER WORKSHEET - PART 11 of 11: RESOURCE GROUPS
DATE: March 2009

RESOURCE NAME: C10RG1
  Participating Node Names: node01 node02
  Inter-Site Management Policy: ignore
  Startup Policy: Online on Home Node Only (OHNO)
  Fallover Policy: Fallover to Next Priority Node in List (FONP)
  Fallback Policy: Fallback to Higher Priority Node (FBHP)
  Delayed Fallback Timer:
  Settling Time:
  Runtime Policies:
    Dynamic Node Priority Policy:
    Processing Order (Parallel, Serial, or Customized):
  Service IP Label: app1svc
  File Systems: /app1
  File System Consistency Check: fsck
  File Systems Recovery Method: sequential
  File Systems or Directories to Export:
  File Systems or Directories to NFS mount (NFSv2/v3):
  File Systems or Directories to NFS mount (NFSv4):
  Stable Storage Path (NFSv4):
  Network for NFS mount: 0
  Volume Groups: app1vg
  Concurrent Volume Groups:
  Raw Disk PVIDs:
  Fast Connect Services:
  Tape Resources:
  Application Servers: app1
  Highly Available Communication Links:
  Primary Workload Manager Class:
  Secondary Workload Manager Class:
  Miscellaneous Data:
  Auto Import Volume Groups: false
  Disk Fencing Activated:
  File Systems Mounted before IP Configured: false

RESOURCE NAME: C10RG2
  Participating Node Names: node02 node01
  Inter-Site Management Policy: ignore
  Startup Policy: Online on Home Node Only (OHNO)
  Fallover Policy: Fallover to Next Priority Node in List (FONP)
  Fallback Policy: Fallback to Higher Priority Node (FBHP)
  Delayed Fallback Timer:
  Settling Time:
  Runtime Policies:
    Dynamic Node Priority Policy:
    Processing Order (Parallel, Serial, or Customized):
  Service IP Label: app2svc
  File Systems: /app2
  File System Consistency Check: fsck
  File Systems Recovery Method: sequential
  File Systems or Directories to Export:
  File Systems or Directories to NFS mount (NFSv2/v3):
  File Systems or Directories to NFS mount (NFSv4):
  Stable Storage Path (NFSv4):
  Network for NFS mount:
  Volume Groups: app2vg
  Concurrent Volume Groups:
  Raw Disk PVIDs:
  Fast Connect Services:
  Tape Resources:
  Application Servers: app2
  Highly Available Communication Links:
  Primary Workload Manager Class:
  Secondary Workload Manager Class:
  Miscellaneous Data:
  Auto Import Volume Groups: false
  Disk Fencing Activated:
  File Systems Mounted before IP Configured: false

COMMENTS: Overview of the 2 Resource Groups.


3.12 Detailed cluster design

Pulling it all together, using the information collected during the preceding cluster planning and documented in the Planning Worksheets, we can now build an easy-to-read, detailed cluster diagram. Figure 3-17 contains a detailed cluster diagram for our example. This diagram is useful to use as an aid when configuring the cluster and diagnosing problems.

Figure 3-17 Detailed cluster design

3.13 Developing a cluster test plan

Just as important as planning and configuring your PowerHA cluster is developing an appropriate test plan to validate the cluster under failure situations, that is, to determine whether the cluster can handle failures as expected. You must test, or validate, the cluster recovery before the cluster becomes part of your production environment.


3.13.1 Custom test plan

As with previous releases of PowerHA, you should develop a local set of tests to verify the integrity of the cluster. This typically involves unplugging network cables, downing interfaces, and shutting down cluster nodes to verify cluster recovery. This is still a useful exercise as you have the opportunity to simulate failures and watch the cluster behavior. If something does not respond correctly, or as expected, stop the tests and investigate the problem. After all tests complete successfully, the cluster can be moved to production.
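While executing each test step, it is convenient to capture the cluster state with a small helper script, for example the sketch below. clRGinfo and netstat are standard commands; the script name and log file location are just examples.

  #!/bin/ksh
  # checkcluster.sh - record resource group and interface state after each test step
  LOG=/tmp/cluster_test_$(date +%Y%m%d).log        # example log location
  {
    print "===== $(date) ====="
    /usr/es/sbin/cluster/utilities/clRGinfo        # where is each resource group online?
    netstat -in                                    # which interfaces and IP aliases are active?
  } >> $LOG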

Table 3-15 outlines a sample test plan that can be used to test our cluster.

Table 3-15 Sample Test Plan

Cluster Test Plan (Test # / Test Description / Comments / Results)

1.  Start PowerHA on node01.
    Comments: node01 starts and acquires the C10RG1 resource group.

2.  Start PowerHA on node02.
    Comments: node02 starts and acquires the C10RG2 resource group.

3.  Perform a graceful stop without takeover on node01.
    Comments: Resource Group C10RG1 goes offline.

4.  Start PowerHA on node01.
    Comments: node01 starts and acquires the C10RG1 resource group.

5.  Perform a graceful stop with takeover on node01.
    Comments: Resource Group C10RG1 moves to node02.

6.  Start PowerHA on node01.
    Comments: node01 starts and acquires the C10RG1 resource group.

7.  Fail (unplug) the service interface on node01.
    Comments: The service IP moves to the second base adapter.

8.  Reconnect the service interface on node01.
    Comments: The service IP remains on the second base adapter.

9.  Fail (unplug) the service interface on node01 (now on the second adapter).
    Comments: The service IP (and persistent) moves to the first base adapter.

10. On node01 issue a “halt -q” to force down the operating system.
    Comments: node01 halts - resource group C10RG1 moves to node02.

11. Reboot node01 and restart PowerHA.
    Comments: node01 reboots. After PowerHA starts, node01 acquires C10RG1.

12. Perform a graceful stop without takeover on node02.
    Comments: Resource Group C10RG2 goes offline.

13. Start PowerHA on node02.
    Comments: node02 starts and acquires the C10RG2 resource group.

14. Perform a graceful stop with takeover on node02.
    Comments: Resource Group C10RG2 moves to node01.

15. Start PowerHA on node02.
    Comments: node02 starts and reacquires the C10RG2 resource group.

16. Fail (unplug) the service interface on node02.
    Comments: The service IP moves to the second base adapter.

17. Reconnect the service interface on node02.
    Comments: The service IP remains on the second base adapter.

18. Fail (unplug) the service interface on node02 (now on the second adapter).
    Comments: The service IP (and persistent) moves to the first base adapter.

19. On node02 issue a “halt -q” to force down the operating system.
    Comments: node02 halts - resource group C10RG2 moves to node01.

20. Reboot node02 and restart PowerHA.
    Comments: node02 reboots. After PowerHA starts, node02 reacquires C10RG2.

3.13.2 Cluster Test Tool

To ease the testing of the cluster, PowerHA includes a Cluster Test Tool to help you test the functionality of a cluster before it becomes part of your production environment.


The Cluster Test Tool only runs on a cluster with HACMP 5.2 or later where the configuration has been verified and synchronized. The tool can run in two ways:

� Automated testing:

Use the automated test procedure (a predefined set of tests) supplied with the tool to perform basic cluster testing on any cluster. No setup is required. You simply run the test from SMIT and view test results from the Cluster Test Tool log file.

� Custom testing:

If you are an experienced PowerHA administrator and want to tailor cluster testing to your environment, you can create custom tests that can be run from SMIT. After you set up your custom test environment, you run the test procedure from SMIT and view test results in the Cluster Test Tool log file.

The Cluster Test Tool uses the PowerHA Cluster Communications daemon to communicate between cluster nodes to protect the security of your PowerHA cluster.

Automated testing

This test tool provides an automated method to quickly test the functionality of the cluster. It typically takes 30 to 60 minutes to run, depending on the cluster complexity, and will perform the following tests. You must have root access to perform these tests.

General cluster topology tests

The Cluster Test Tool runs the general topology tests in the following order:

� Start cluster services on all available nodes.
� Stop cluster services gracefully on a node.
� Restart cluster services on the node that was stopped.
� Stop cluster services with takeover on another node.
� Restart cluster services on the node that was stopped.
� Force cluster services to stop on another node.
� Restart cluster services on the node that was stopped.

Resource group tests on non-concurrent resource groups

If the cluster includes one or more non-concurrent resource groups, the tool runs each of the following tests in the following order for each resource group:

� Bring a local network down on a node to produce a resource group fallover.
� Recover the previously failed network.
� Bring an application server down and recover from the application failure.


Resource group test on concurrent resource groups

If the cluster includes one or more resource groups that have a startup management policy of online on all available nodes (OAAN), the tool runs one test that brings an application server down and recovers from the application failure.

Catastrophic failure test

The tool runs one catastrophic failure test that stops the cluster manager on a randomly selected node that currently has at least one active resource group.

Running automated tests

As a general recommendation, it is useful to periodically validate your cluster configuration. For this purpose, two automation tools are available:

� Cluster automated test tool:

This is used to actually test the cluster.

� Automatic cluster configuration verification:

This is a tool that periodically checks and advertises any configuration changes so that the cluster administrator can take corrective actions (synchronize and re-test the cluster).

These tools can be used to implement a standard validation procedure. A manual test is not necessary after the initial test has been completed. However, because the automated cluster tool can take disruptive actions, you must schedule the usage of this tool in a periodic maintenance window.

The Cluster Test Tool runs a specified set of tests and randomly selects the nodes, networks, resource groups, and so forth for testing. The tool tests different cluster components during the course of the testing.

More information about using the cluster test tool, and details on the tests it can run, can be found in 3.13.2, “Cluster Test Tool” on page 179.

Note: If the tool terminates the cluster manager on the control node, you might need to reboot this node.

Important: Before you start running an automated test, ensure that the cluster is not in service in a production environment.


3.14 Developing a PowerHA installation plan

Now that you have planned the configuration of the cluster and documented the design, prepare for your installation.

If you are implementing PowerHA on existing servers, be sure to schedule an adequate maintenance window to allow for the installation, configuration, and testing of the cluster.

If this is a new installation, allow time to configure and test the basic cluster. After the cluster is configured and tested, you can integrate the required applications during a scheduled maintenance window.

Referring back to Figure 3-1 on page 107, you can see that there is a preparation step before installing PowerHA. This step is intended to ensure the infrastructure is ready for PowerHA. This typically involves using your planning worksheets and cluster diagram to prepare the nodes for installation. Ensure that:

� The node software and operating system prerequisites are installed.
� The network connectivity is properly configured.
� The shared disks are properly configured.
� The chosen applications are able to run on either node.

The preparation step can take some time, depending on the complexity of your environment and the number of resource groups and nodes to be used. Take your time preparing the environment as there is no purpose in trying to install PowerHA in an environment that is not ready. You will simply spend your time troubleshooting a poor installation. Remember that a well configured cluster is built upon solid infrastructure.

After the cluster planning is complete and environment is prepared, the nodes are ready for PowerHA to be installed.

The installation of PowerHA code is straightforward. If using the install CDROM, simply use SMIT to install the required filesets. If using a software repository, you can NFS mount the directory and use SMIT to install from this directory.
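For example, an installation from an NFS repository might look like the following sketch; the server name, export path, and fileset list are illustrative assumptions, and the filesets you install depend on the features you are licensed for.

  # mount nimserver:/export/powerha55 /mnt            (NFS mount the repository; names are examples)
  # installp -agXYd /mnt cluster.es cluster.cspoc     (apply with requisites, accept licenses)
  # lslpp -l "cluster.*"                              (verify the installed filesets on every node)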

Ensure you are licensed for any features you install, such as the Smart Assist and PowerHA/XD.

After you have installed the required filesets on all cluster nodes, use the planning worksheets to configure your cluster. Here you have a few tools available to use to configure the cluster:

� You can configure WebSMIT at this point and use it to configure the cluster.
� For a two-node cluster, you can use the Java-based two-node assistant.
� You can use an ASCII screen and SMIT to perform the configuration.


You have a number of choices available to help with the configuration of the cluster. In the next chapter we discuss each option in detail, but basically, you can:

� Use the two-node assistant to configure the cluster. This will configure a basic two-node cluster with a single resource group.

� Use the HACMP Standard Configuration SMIT panels to configure the cluster in a standard format.

� Use the HACMP Extended Configuration SMIT panels to manually configure the cluster.

� Use the *.haw file generated by the Online Planning Worksheets to apply to the cluster.

� Apply a cluster snapshot to configure the cluster.

After you have configured, verified, and synchronized the cluster, run the automated cluster test tool to validate cluster functionality. Review the results of the test tool and if it was successful, run any custom tests you want to perform further verification.

Verify any error notification you have included.

After successful testing, take a mksysb of each node and a cluster snapshot from one of the cluster nodes.

The cluster should be ready for production.

Standard change and problem management processes now apply to maintain application availability.

Note: We recommend that when configuring the cluster, you start by configuring the cluster (network) topology. After the cluster topology is configured, verify and synchronize the cluster before moving forward with the resources (shared volume groups, service IP addresses, and applications).

After the topology has been successfully verified and synchronized, you should start the cluster services and verify if everything is running as expected.

This will allow you to identify any networking issues before moving forward to configuring the cluster resources.


3.15 Backing up the cluster configuration

The primary tool for backing up the PowerHA cluster is the cluster snapshot. Although the Online Planning Worksheet Cluster Definition file also captures the cluster configuration, it is less comprehensive because it does not include ODM entries.

The primary information saved in a cluster snapshot is the data stored in the HACMP Configuration Database classes (such as HACMPcluster, HACMPnode, HACMPnetwork, HACMPdaemons). This is the information used to recreate the cluster configuration when a cluster snapshot is applied.

The cluster snapshot does not save any user-customized scripts, applications, or other non-PowerHA configuration parameters. For example, the names of application servers and the locations of their start and stop scripts are stored in the HACMPserver Configuration Database object class. However, the scripts themselves as well as any applications they might call are not saved.

The cluster snapshot utility stores the data it saves in two separate files:

� ODM Data File (.odm):

This file contains all the data stored in the HACMP Configuration Database object classes for the cluster. This file is given a user-defined basename with the .odm file extension. Because the Configuration Database information is largely the same on every cluster node, the cluster snapshot saves the values from only one node.

� Cluster State Information File (.info):

This file contains the output from standard AIX and PowerHA commands. This file is given the same user-defined basename with the .info file extension. By default, this file no longer contains cluster log information. Note that you can specify in SMIT that PowerHA collect cluster logs in this file when the cluster snapshot is created.

For a complete backup, take a mksysb of each cluster node according to your standard practices. Pick one node to perform a cluster snapshot and save the snapshot to a safe location for disaster recovery purposes.

If you can, take the snapshot before taking the mksysb of the node so that it is included in the system backup.

Important: You can take a snapshot from any node in the cluster, even if PowerHA is down. However, you can only apply a snapshot to a cluster if all nodes are running the same version of PowerHA and all are available.
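A simple post-configuration backup pass might therefore look like the following sketch. The snapshot itself is created through the SMIT snapshot menus; the snapshot name pre_prod and the /backup target are examples, and the snapshot files are typically written under /usr/es/sbin/cluster/snapshots (check the SNAPSHOTPATH setting in your environment).

  On each cluster node:
    # mksysb -i /backup/node01.mksysb                   (bootable system backup to a file or tape device)

  On one node, after creating a snapshot named pre_prod via SMIT:
    # cp /usr/es/sbin/cluster/snapshots/pre_prod.odm  /backup/
    # cp /usr/es/sbin/cluster/snapshots/pre_prod.info /backup/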


3.16 Documenting the cluster

It is important to document the cluster configuration in order to effectively manage the cluster. From managing cluster changes to troubleshooting problems, a well-documented cluster will result in better change control and quicker problem resolution.

We suggest that you maintain an accurate cluster diagram which can be used for change and problem management.

In addition, PowerHA provides the tools to easily gather the Cluster configuration data through the use of the Online Planning Worksheets (OLPW).

In this section we discuss how to create a cluster definition file through SMIT and then use it to create a cluster configuration report via the OLPW tool. The resulting report is in html format and can be viewed using a Web browser.

The basic steps in creating a cluster report are as follows:

1. Export a cluster definition file from one of the cluster nodes using SMIT.

– This will typically be saved as a *.haw file.
– If you are using the OLPW on your workstation, ftp the definition file to your workstation.

2. Use the OLPW to open an existing definition file.

3. Use the OLPW to create a configuration report:

– This will create an *.html file.

4. Use your Web browser to view the file:

– We recommend that you save the file to another server or workstation for disaster recovery purposes.


3.16.1 Exporting a cluster definition file using SMIT

You can create a cluster definition file from an active PowerHA cluster and then open this file using the Online Planning Worksheets application.

To create a cluster definition file from SMIT:

1. Use the smitty hacmp fast path and select Extended Configuration → Export Definition File for Online Planning Worksheets (see Figure 3-18) and press Enter.

Figure 3-18 Online planning worksheets option in SMIT

Upon pressing Enter, you are then presented with the final OLPW SMIT panel as shown in Figure 3-19.

2. Enter field values as follows and press Enter:

– File Name:

The complete pathname of the cluster definition file. The default pathname is /var/hacmp/log/cluster.haw.


– Cluster Notes:

Any additional comments that pertain to your cluster. The information that you enter here will display in the Cluster Notes panel in Online Planning Worksheets.

3. Open the cluster definition file in Online Planning Worksheets.

Figure 3-19 SMIT panel: Export Definition File for Online Planning Worksheets

3.16.2 Creating a cluster definition file from a snapshot using SMIT

You can also create a cluster definition file from a PowerHA cluster snapshot and then open this file using the Online Planning Worksheets application.

To create a cluster definition file from a snapshot using SMIT:

1. Use the smitty hacmp fast path and select Extended Configuration → Snapshot Configuration → Convert Existing Snapshot For Online Planning Worksheets.

2. Select your previously created snapshot.


3. After the Cluster definition file is created, open it in Online Planning Worksheets.

3.16.3 Creating a configuration report

A configuration report enables you to record information about the state of your cluster configuration in an HTML format. A report provides summary information that includes:

� The name of the directory that stores images used in the report

� The version of the Online Planning Worksheets application

� The author and company specified on the Cluster Configuration panel

� Cluster notes added from the Cluster Notes panel

� The latest date and time that Online Planning Worksheets saved the cluster definition file

The report also provides a section for each of the following items:

� Nodes and communication paths
� Applications
� Networks
� NFS exports
� IP labels
� Application servers
� Global network
� Application monitors
� Sites
� Pagers or cell phones
� Disks
� Remote notifications
� Resource groups
� Tape resources
� Volume groups
� Resource group runtime policies
� Logical volumes
� Node summary
� File collections
� Cluster verification
� Cross-site LVM Mirroring

To create a configuration definition report:

1. Select File → Create Report.

2. In the Save dialog box, enter a name and location for the report file.


When a report is generated, a directory named olpwimages is created in the same directory that stores the report. If you save your report file to the directory /home/shawn/reports, for example, the graphics directory is /home/shawn/reports/olpwimages. The olpwimages directory contains graphics files associated with the report. Each time you generate a report, the report and files in the images directory are replaced.

Figure 3-20 shows a screen capture of the generated report. You can scroll down the report page for further details.

Figure 3-20 Sample Configuration Report


3.17 Change and problem management

After the cluster is up and running, the job of managing change and problems begins.

Effective change and problem management processes are imperative to maintaining cluster availability. To be effective, you must have a current cluster configuration handy. You can use the OLPW tool to create an HTML version of the configuration and, as we also suggest, maintain a current cluster diagram.

Any changes to the cluster should be fully investigated as to their effect on the cluster functionality. Even changes that do not directly affect PowerHA, such as the addition of non-PowerHA workload, can affect the cluster. The changes should be planned, scheduled, documented, and the cluster tested after the change has been made.

To ease your implementation of changes to the cluster, PowerHA provides the Cluster Single Point of Control (C-SPOC) SMIT menus. Whenever possible, the C-SPOC menus should be used to make changes. Using C-SPOC, you can make changes from one node and the change will be propagated to the other cluster nodes.

Problems with the cluster should be quickly investigated and corrected. Because PowerHA’s primary job is to mask any errors from applications, it is quite possible that unless you have monitoring tools in place, you might be unaware of a fallover. Ensure that you make use of error notification to notify the appropriate staff of failures.
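AIX error notification is one way to do this: an errnotify object can run a command whenever a matching entry is written to the error log ($1 is the error log sequence number that AIX passes to the method). The following is a minimal sketch; the object name, the permanent-hardware-error match, and the notification script path are assumptions for illustration.

  # cat /tmp/en_disk.add
  errnotify:
          en_name = "cluster_hw_notify"
          en_persistenceflg = 1
          en_class = "H"
          en_type = "PERM"
          en_method = "/usr/local/bin/notify_support.sh $1"

  # odmadd /tmp/en_disk.add                               (add the notification object to the ODM)
  # odmget -q "en_name=cluster_hw_notify" errnotify       (verify the object)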

3.18 Planning tools

This section covers the three main planning tools in greater detail. A sample cluster diagram and paper planning worksheets are provided.


3.18.1 Cluster diagram

Diagramming the PowerHA cluster allows for a clear understanding of the behavior of the cluster and helps identify single points of failure. A sample two-node cluster diagram is provided in Figure 3-21.

Figure 3-21 Sample Cluster Diagram (the diagram shows a two-node cluster: each node running AIX 5.3 ML02, RSCT 2.4.2, and HACMP 5.3, with two base Ethernet interfaces carrying persistent and service addresses, heartbeat over IP aliases, dual Fibre Channel adapters (fcs0, fcs1) to shared disk, enhanced concurrent volume groups, a disk heartbeat network, rootvg local to each node, a resource group per node, and attached clients)


3.18.2 Online Planning Worksheets

The Online Planning Worksheets (OLPW) are a Java-based version of the paper worksheets. Using this application, you can either import PowerHA configuration information from an existing cluster and edit it as needed, or you can enter all configuration information manually. The application saves your information as a cluster definition file that you can apply to configure your cluster. The application also validates your data to ensure that all required information has been entered.

The following steps illustrate an effective use of the OLPW tool, from planning, to implementing and documenting your cluster:

1. Prepare for cluster planning by familiarizing yourself with the PowerHA concepts and your environment.

2. Run the Online Planning Worksheet program on your workstation from the PowerHA CD-ROM.

3. Complete the Online Planning Worksheets and create a cluster definition file.

4. Install PowerHA on the cluster nodes.

5. Copy the cluster definition file to one of the cluster nodes.

6. Apply the cluster definition file to the cluster using SMIT.

7. Take a cluster snapshot.

8. Now document the cluster by generating a report which will create an html file that you can use for systems management.

Installing OLPW

The following sections discuss methods of installing the Online Planning Worksheet tool.

Installing the application on an AIX 5L system

You install the Online Planning Worksheets application from the PowerHA software installation medium. The installable image for the application is:

cluster.es.worksheets

The Online Planning Worksheets application is installed in the /usr/es/sbin/cluster/worksheets directory.

Note: The Online Planning Worksheets offer a tool for configuring and recording a cluster. You still must have a good understanding of planning a cluster before using the tool.


Running the OLPW application from an AIX 5L GUI

Execute the following command:

/usr/es/sbin/cluster/worksheets/worksheets

The application verifies that you have an appropriate version of the JRE™ installed before it runs the application in the background.

Installing the OLPW application on a Windows system

To install the Online Planning Worksheets application on a Microsoft® Windows® system:

1. Install the Online Planning Worksheets from the PowerHA installation medium onto an AIX system.

2. Copy the worksheets.bat and worksheets.jar files to a directory of your choice on your Windows system.

Running the application from a Windows installation

Execute the worksheets.bat command from the command line, or from a file manager GUI double-click the worksheets.jar icon.

Note: If you copy the files via FTP, be sure to specify the ASCII mode for the .bat file and binary for the .jar file.
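For example, a transfer from the AIX node (where the files are installed under /usr/es/sbin/cluster/worksheets) to the Windows workstation might look like this sketch; the host name is an example.

  C:\> ftp node01
  ftp> cd /usr/es/sbin/cluster/worksheets
  ftp> ascii
  ftp> get worksheets.bat
  ftp> binary
  ftp> get worksheets.jar
  ftp> bye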

Note: The OLPW can also be downloaded from the PowerHA Web site:

http://www-03.ibm.com/systems/power/software/availability/aix/


Understanding the main window

When you open the Online Planning Worksheets application, its main window displays, as shown in Figure 3-22:

Figure 3-22 Online Planning Worksheets Main Menu

The main window consists of two panes:

� The left pane enables navigation to cluster components.

� The right pane displays panels associated with icons selected in the left pane. Your configuration information is entered on the right pane.

Creating a new cluster definition file

When you start planning your cluster, you can either fill in all information by hand, or have OLPW read in your existing cluster configuration information and then fill in the rest of the information by hand.

To create a cluster definition file:

1. Enter all data by hand using your planning information

Or,

Read in configuration information directly from your PowerHA cluster as follows:

a. Use the SMIT menus to create a definition file.


b. Within the OLPW tool, select File → Import HACMP Definition and select the cluster definition file.

c. The Import Validation dialog box appears. You can view information about validation errors or receive notification that the validation of the PowerHA definition file was successful.

d. Enter any additional data by hand.

2. Save the newly created definition file.

Adding notes about the cluster configuration

As you plan your configuration, you can add notes to the cluster definition file.

To add cluster notes:

1. In the left pane in either the Requisite view or the Hierarchical view, select Cluster Notes.

2. In the Cluster Notes panel, enter the information you want to save.

3. Click the Apply button.

Saving a cluster definition file

To save a cluster definition file:

� Select File → Save to use your cluster name as the filename.

Or,

� Select File → Save As to enter a different filename. In the Save dialog box, enter the name and location for your cluster definition file, make sure that the filename has the .haw (or .xml) extension, and click Save.

When you save a file, OLPW automatically validates the cluster definition unless automatic validation has been turned off.

Applying worksheet data to your PowerHA cluster

After you complete the configuration panels in the OLPW application, you can save the file, and then apply it to a cluster node. If you use the Online Planning Worksheets application on a Windows system, you must first copy the cluster definition file to a cluster node before applying it.

Prerequisites

Before applying your cluster definition file to a cluster, ensure that the following conditions are met:

� The PowerHA software is installed on all cluster nodes.


� All hardware devices that you specified for your cluster configuration are in place.

� If you are replacing an existing configuration, any current cluster information in the PowerHA configuration database was retained in a snapshot.

� Cluster services are stopped on all nodes.

� A valid /usr/es/sbin/cluster/etc/rhosts file resides on all cluster nodes.

Applying your cluster configuration file

To apply your cluster definition file:

1. From the Online Planning Worksheets application, validate your cluster definition file.

2. Create a report to document your cluster configuration.

3. Save the file and exit the application. If your cluster configuration file resides on a Windows system, copy the file to a PowerHA node.

4. From the cluster node, import the definition as follows:

a. smitty hacmp → Extended Configuration → Import Cluster Configuration from Online Planning Worksheets File.

b. Then choose the appropriate .haw file.

c. Then press Enter.

An example of the SMIT panel to import the OLPW is shown in Figure 3-23.

Figure 3-23 Import OLPW file to create cluster

This applies the information to your cluster, performs a synchronization, and then a verification. During verification, on-screen messages appear, indicating the events taking place and any warnings or errors.

            Import Cluster Configuration from Online Planning Worksheets File

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* File Name                                  [/var/hacmp/log/cluste>

F1=Help      F2=Refresh     F3=Cancel     F4=List
F5=Reset     F6=Command     F7=Edit       F8=Image
F9=Shell     F10=Exit       Enter=Do


3.18.3 Paper planning worksheets

Detailed Paper Planning Worksheets are found in the HACMP Planning Guide, SC23-4861.

We have found that it is useful to tailor these worksheets into a format that fits your environment.

We also included planning sheets in Appendix A, “” on page 775.


Chapter 4. Installation and configuration

In this chapter, we give you the general steps necessary for implementing a PowerHA cluster, including preparation of the cluster hardware and software.

We discuss the following topics:

� Basic steps to implement a PowerHA cluster
� Configuring PowerHA
� General considerations for the configuration method
� Standard Configuration Path: The two-Node Cluster Configuration Assistant
� Using Extended Configuration Path and C-SPOC
� Installing and configuring WebSMIT



4.1 Basic steps to implement a PowerHA cluster

In this section we present the general steps to follow while implementing a PowerHA cluster. While the target configuration might differ slightly from implementation to implementation, the basic steps are the same, with certain sequence changes.

The basic steps for implementing a high availability cluster are as follows:

1. Planning:

This step is perhaps the most important one, because it requires profound knowledge and understanding of your environment. Thorough planning is the key for a successful cluster implementation. For details and a planning methodology, see Chapter 3, “Planning” on page 103.

2. Installing and connecting the hardware:

In this step you should have your hardware environment prepared according to the configuration identified during the planning phase. The following tasks should be performed:

– Installing server hardware (racks, power, Hardware Management Console, and so on.)

– Configuring the logical partitioning (if applicable)

– Installing and configuring VIOS (if applicable)

– Connecting systems to local networking environment

– Connecting systems to storage (SAN)

3. Installing and configuring base operating system (AIX) and PowerHA prerequisites:

In this step, the following tasks should be performed:

– Installing base operating system, application(s) and PowerHA prerequisites (CDROM/DVD, Network Install Manager) according to local rules.

– Configuring the local networking environment (TCP/IP configuration—interfaces, name resolution, and so on.)

– Configuring users, groups, authentication, and so on.

Note: Beside the cluster configuration, the planning phase should also provide a cluster testing plan. This testing plan should be used in the final implementation phase, and also during periodic cluster validations.


4. Configuring shared storage:

Depending on the storage subsystem used, the storage configuration can consist of:

– Configuring storage device drivers and multi-path extensions (if applicable)

– Configuring physical-to-logical storage such as RAID arrays, LUNs, storage protection, and so on

– Configuring storage security such as LUN masking, SAN zoning (where applicable)

– Configuring the storage access method for the application (file systems, raw logical volumes, or raw disks)

5. Installing and configuring application software:

In this step, the application software must be configured and tested to run as a standalone system. You should also perform a manual movement and testing of the application on all nodes designated for application in the HA cluster, as follows:

a. Create and test the application start and stop scripts; make sure the application is able to recover from unexpected failures, and that the application start/stop scripts function as expected on all nodes designated for running this application.

b. Create and test the application monitoring scripts (if desired) on all nodes designated to run the application.

6. Installing the PowerHA software:

A reboot is no longer required by PowerHA. However, it might be required by RSCT prerequisites.

7. Defining the cluster and discovering or manually defining the cluster topology:

As PowerHA provides various configuration tools, you can choose between a Standard Configuration Path (easier), or the Extended Configuration Path (for more complex configurations). Also, you can choose between manually entering all topology data or using the PowerHA discovery feature, which eases cluster configuration by building picklists to use.

8. Synchronizing the cluster topology and starting PowerHA services (on all nodes):

In this step, we recommend that you verify and synchronize cluster topology and start cluster services. Verifying and synchronizing at this stage eases the subsequent implementation steps, as it is much easier to detect configuration mistakes and correct them in this phase, providing a sound cluster topology for further resource configuration.


9. Configuring cluster resources:

The following resources should be configured in this step:

– Service IP addresses (labels)
– Application servers (application start/stop scripts)
– Application monitors (application monitoring scripts and actions)

10. Configuring cluster resource groups and shared storage:

Cluster resource groups can be seen as containers used for grouping resources that will be managed together by PowerHA. Initially, the resource groups are defined as empty containers:

– Define the resource groups; synchronize the cluster.

– Define the shared storage (volume groups, logical volumes, file systems, OEM disk methods, and so on.).

– Populate resource groups with Service IP label(s), application server(s), volume groups, application monitors.

11. Synchronizing the cluster:

Because the PowerHA topology is already configured and PowerHA services started, after you synchronize the cluster, the resource groups will be brought online via dynamic reconfiguration.

12.Testing the cluster:

After the cluster is in a stable state, you should test it.

Testing includes these activities:

– Documenting the tests and results.
– Updating the cluster documentation.

Note: Although you can use the cluster automated test tool, we recommend that you also perform thorough manual testing of the cluster.

4.2 Configuring PowerHA

This section shows how to create a basic cluster configuration using the various tools and menus provided. There are two ways to approach a basic cluster installation:

• You can start with the Two-Node Cluster Configuration Assistant to create a basic configuration. This scenario is discussed in 4.2.2, "Standard Configuration Path: The Two-Node Cluster Configuration Assistant" on page 204. If required, the cluster configured using the Two-Node Cluster Configuration Assistant can be further enhanced using other PowerHA features.

• You can first define only the topology and then configure all resources, resource groups, and so on, using the extended menus. This scenario is presented in 4.2.3, "Using Extended Configuration Path and C-SPOC" on page 213.

Before you decide which way to go, make sure that you have performed the necessary planning, and that the documentation for your cluster is available for use. Refer to Chapter 3, “Planning” on page 103.

In this section we configure the two scenarios according to the planning drawing shown in Figure 3-3 on page 112.

4.2.1 General considerations for the configuration method

In this section we discuss several considerations to help you determine your configuration method.

When to use the Standard Configuration Path

Using the Standard Configuration Path gives you the opportunity to add the basic components to the PowerHA Configuration Database (ODM) in a few simple steps. This configuration path significantly automates the discovery and selection of configuration information, and chooses default behaviors for networks and resource groups.

The following prerequisites, assumptions, and defaults apply for the Standard Configuration Path:

• PowerHA software must be installed on all nodes of the cluster.

• All network interfaces must be completely configured at the operating system level. You must be able to communicate from one node to each of the other nodes and vice versa.

• The PowerHA discovery process runs on all cluster nodes, not just the local node.

• When information that is required for configuration resides on remote nodes, PowerHA automatically discovers the necessary cluster information for you. Cluster discovery runs automatically while you are using the Standard Configuration Path.


• PowerHA assumes that all network interfaces on a physical network belong to the same PowerHA network.

• Hostnames are used as node names.

• PowerHA uses IP aliasing as the default resource group configuration with any of the policies for startup, fallover, and fallback (without specifying fallback timer policies).

• The application start and stop scripts can be configured in the Standard Configuration Path, whereas application monitoring scripts have to be implemented by using the Extended Configuration Path.

Note: You cannot select IP address takeover via replacement when using the Standard Configuration Path. You can modify this using the Extended Configuration Path.

When to use the Extended Configuration Path

In order to configure the less common cluster elements, or if connectivity to each of the cluster nodes is not available at configuration time, you can manually enter the information by using the Extended Configuration Path.

Using the options under the Extended Configuration menu you can add the basic components of a cluster to the PowerHA configuration database, as well as additional types of behaviors and resources. Use the Extended Configuration Path to customize the cluster components such as policies and options that are not included in the Initialization and Standard Configuration menus.

Make sure that you use the Extended Configuration Path if you plan to do any of the following tasks:

• Use cluster node names that are different from hostnames.
• Use IPAT via IP Replacement.
• Add or change an IP-based network.
• Add or change non-IP-based networks and devices.
• Configure a distribution preference for service IP addresses.
• Specify fallback timer policies for resource groups.
• Add or change any cluster configuration while one node is unavailable.

4.2.2 Standard Configuration Path: The Two-Node Cluster Configuration Assistant

When using the Standard Configuration Path, you can benefit from using the Two-Node Cluster Configuration Assistant. It is very common to use this feature when you already have an application server and would like to make it highly available. This handy feature lets you quickly configure a two-node cluster with a single resource group after you specify the following items:

• Communication Path to Takeover Node
• Application Server Name
• Application Server Start Script
• Application Server Stop Script
• Service IP Label

To quickly configure a cluster using the Two-Node Cluster Configuration Assistant, go through the following steps:

• Configuring the network
• Configuring LVM components
• Writing application start and stop scripts
• Running the Two-Node Cluster Configuration Assistant

Configuring the network

Network configuration tasks include these activities:

• Connecting and configuring all IP network interfaces

• Ensuring that there is an active communication path to the takeover node

• Configuring persistent IP Labels as shown in “Persistent IP addresses (aliases)” on page 137

• Adding the application service address to /etc/hosts on all nodes

Configuring LVM components

When using the Two-Node Configuration Assistant, all volume groups, logical volumes, and file systems have to be configured in advance.

We recommend that you take the following steps:

1. Verify the existing disk configuration.

2. Create shared volume groups.

3. Create shared logical volumes.

4. Create shared file systems.

5. Mount and unmount shared file systems.

Note: If you have some already existing volume groups (containing logical volumes and file systems) configured on external disks, even though they might not be related to the application you want to integrate, the Two-node Configuration Assistant will discover them.


6. Vary off the shared volume groups.

7. Import the shared volume groups on the other node.

Verify the existing disk configuration

To configure the shared LVM components, you must ensure that all nodes have access to all shared disks. You can do this by running the lspv command on both nodes and comparing the output to be sure that the unassigned physical volumes (PVs) have the same physical volume identifier (PVID) on both nodes. If any hdisk device does not have a PVID assigned, you must assign one. To assign a PVID to hdisk1, use the following command:

chdev -l hdisk1 -a pv=yes

The command must be run on all the nodes.
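A quick way to compare the PVIDs is to run lspv on each node and match the identifiers; the disk names can differ between nodes, but the PVIDs must be identical. The output below is only illustrative (the disk names and PVID are example values):

# on node 1
lspv
hdisk1          000fe401d39e31c4          None

# on node 2
lspv
hdisk2          000fe401d39e31c4          None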

Create shared volume groups

PowerHA supports both normal and enhanced concurrent volume groups. However, it is common practice to create and use enhanced concurrent volume groups as shared volume groups. This type of volume group can be included in both shared (non-concurrent) and concurrent resource groups.

Here are some advantages of enhanced concurrent volume groups over normal volume groups:

• Reliable Scalable Cluster Technology (RSCT) is used for access control.

• RSCT is independent of disk technology.

• Disk reservation mechanism is no longer used.

• The volume group can be varied on in passive mode on all cluster nodes. Only one cluster node can have the volume group varied on in active mode.

• Lazy update is no longer needed.

• During fallback, volume groups are acquired using fast disk takeover.

• You can define a disk heartbeat network over disks that are part of an enhanced concurrent volume group.

Note: All cluster nodes must have the same PVID for each shared disk. Otherwise you will not be able to create the LVM components.

Note: We do not recommend manually varying on an enhanced concurrent volume group that is part of a resource group. This task should always be handled by PowerHA.


If the volume groups to be included in the cluster are not already created, you must create them. To configure a volume group using SMIT menus, issue the smitty mkvg command and then select the type of volume group—original, big, or scalable.

In Example 4-1, we show the SMIT menu used for defining an original, enhanced concurrent capable volume group named shared_vg, with a physical partition size of 128 MB, containing hdisk1, and using major number 38.

Example 4-1 Creating an enhanced capable volume group using SMIT

Add an Original Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  VOLUME GROUP name                                  [shared_vg]
  Physical partition SIZE in megabytes                128                    +
* PHYSICAL VOLUME names                              [hdisk1]                +
  Force the creation of a volume group?               no                     +
  Activate volume group AUTOMATICALLY                 no                     +
    at system restart?
  Volume Group MAJOR NUMBER                          [38]                    +#
  Create VG Concurrent Capable?                       enhanced concurrent    +

F1=Help       F2=Refresh      F3=Cancel       F4=List
F5=Reset      F6=Command      F7=Edit         F8=Image
F9=Shell      F10=Exit        Enter=Do

Note that you must set the field Activate volume group AUTOMATICALLY at system restart to no because PowerHA will be responsible for activating this volume group by invoking the appropriate varyonvg command.

Even if you do not plan to use NFS to export file systems residing in this volume group, you might decide to do so later, when the cluster is operational. As such, it is a good idea to ensure that Volume Group MAJOR NUMBER is set to the same value on all cluster nodes.

You can find the available volume group major numbers using the lvlstmajor command.
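If you prefer the command line over SMIT, a roughly equivalent invocation is sketched below. The flags are standard AIX LVM usage rather than something shown in Example 4-1, so verify them against your AIX level:

lvlstmajor                                    # list free volume group major numbers on this node
mkvg -n -C -V 38 -s 128 -y shared_vg hdisk1   # -n: no automatic varyon at restart, -C: concurrent capable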

After the volume group shared_vg has been created, it can be varied on using the following command:


varyonvg shared_vg

Repeat this procedure for all volume groups that will be configured in the cluster.

For additional details regarding access to shared storage and enhanced concurrent volume groups, refer to Chapter 12, “Storage related considerations” on page 575.

Create shared logical volumes

After all shared volume groups have been created, you should define the logical volumes that will be part of your volume groups.

If the logical volumes to be included in the cluster are not already created, you must create them. To configure a logical volume using SMIT menus, issue the smitty mklv command and then select the volume group that you want the logical volume to be part of. You should specify all characteristics of the logical volume, such as its type and size.

In Example 4-2 we show the SMIT menu used for defining a jfs2 logical volume named shared_lv, which is part of volume group vg1 and has 10 logical partitions.

Example 4-2 Creating a shared logical volume using SMIT

Add a Logical Volume

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Logical volume NAME                                [shared_lv]
* VOLUME GROUP name                                   vg1
* Number of LOGICAL PARTITIONS                       [10]                #
  PHYSICAL VOLUME names                              []                  +
  Logical volume TYPE                                [jfs2]              +
  POSITION on physical volume                         middle             +
  RANGE of physical volumes                           minimum            +
  MAXIMUM NUMBER of PHYSICAL VOLUMES                 []                  #
    to use for allocation
  Number of COPIES of each logical                    1                  +
    partition
  Mirror Write Consistency?                           active             +
  Allocate each logical partition copy                yes                +
    on a SEPARATE physical volume?
  RELOCATE the logical volume during                  yes                +
    reorganization?
  Logical volume LABEL                               []
  MAXIMUM NUMBER of LOGICAL PARTITIONS               [512]               #
  Enable BAD BLOCK relocation?                        yes                +
  SCHEDULING POLICY for writing/reading               parallel           +
    logical partition copies
  Enable WRITE VERIFY?                                no                 +
  File containing ALLOCATION MAP                     []
  Stripe Size?                                       [Not Striped]       +
  Serialize IO?                                       no                 +
  Mirror Pool for First Copy                                             +
  Mirror Pool for Second Copy                                            +
  Mirror Pool for Third Copy                                             +

Make sure that the name of the logical volume is unique across all cluster nodes.

Repeat this procedure for all logical volumes that will be configured in the cluster.
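If you prefer the command line, a roughly equivalent mklv invocation (standard AIX usage, not shown in the SMIT example above) would be:

mklv -y shared_lv -t jfs2 vg1 10    # 10 logical partitions, type jfs2, in volume group vg1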

Create shared log logical volumes

All logical volumes can be accessed as raw devices. However, if the volume groups to be included in the cluster contain at least one jfs logical volume, you must have at least one jfs log logical volume. If the volume groups to be included in the cluster contain at least one jfs2 logical volume, you must have at least one jfs2 log logical volume. A log logical volume is a logical volume whose type is either jfslog or jfs2log. If a log logical volume does not already exist, you must define it. We recommend that you create dedicated log logical volumes as part of the shared volume groups.

When you define a log logical volume, you should ensure that:

• The type of the log logical volume matches the type of your logical volumes.

• The name of the log logical volume is unique across cluster nodes.

• The size of the log logical volume is appropriate. Usually a single logical partition is enough.

• After the log logical volume is created, it is formatted using the logform command.
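As an illustration, you could create and format a one-partition jfs2 log logical volume as follows. The log name shared_log matches the one used later in Example 4-3; the volume group name is an assumption, and logform prompts for confirmation before formatting:

mklv -y shared_log -t jfs2log shared_vg 1
logform /dev/shared_log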

Create shared file systems

If the file systems to be included in the cluster are not already created, you must create them. The sole purpose of defining jfs/jfs2 logical volumes and jfs/jfs2 log logical volumes is to create the context required to define jfs/jfs2 file systems.

You can define jfs and/or jfs2 file systems that will reside on the corresponding jfs/jfs2 logical volumes previously defined. There is a one-to-one relationship between the jfs file systems and jfs logical volumes. There is also a one-to-one relationship between the jfs2 file systems and jfs2 logical volumes.

Note: PowerHA also supports INLINE logs for JFS2 file systems.

Defining a jfs/jfs2 file system requires the existence of at least one jfs/jfs2 log logical volume. Multiple jfs file systems can use the same jfs log logical volume. Multiple jfs2 file systems can use the same jfs2 log logical volume.

To configure a file system (enhanced) on a previously defined logical volume using SMIT menus, issue the smitty crfs command, select Add an (Enhanced) Journaled File System, select Add an (Enhanced) Journaled File System on a Previously Defined Logical Volume and fill in the required data.

In Example 4-3 we show the SMIT menu used for defining an enhanced file system with a mount point named /shared_fs and using a log logical volume named shared_log.

Example 4-3 Creating a shared jfs2 file system using SMIT

Add an Enhanced Journaled File System

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* LOGICAL VOLUME name                                 shared_lv          +
* MOUNT POINT                                        [/shared_fs]
  Mount AUTOMATICALLY at system restart?              no                 +
  PERMISSIONS                                         read/write         +
  Mount OPTIONS                                      []                  +
  Block Size (bytes)                                  4096               +
  Logical Volume for Log                              shared_log         +
  Inline Log size (MBytes)                           []                  #
  Extended Attribute Format                                              +
  ENABLE Quota Management?                            no                 +
  Enable EFS?                                         no                 +
  Allow internal snapshots?                           no                 +
  Mount GROUP                                        []

F1=Help       F2=Refresh      F3=Cancel       F4=List
F5=Reset      F6=Command      F7=Edit         F8=Image
F9=Shell      F10=Exit        Enter=Do

You should specify all characteristics of the file system, such as the mount point and the log logical volume. Make sure that the mount point is unique across all cluster nodes.


Note that you must set the field Mount AUTOMATICALLY at system restart to no because PowerHA will be responsible for mounting this file system.

Repeat this procedure for all new file systems that will be configured in the cluster.
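From the command line, a roughly equivalent crfs invocation is sketched below; it uses standard AIX options rather than the SMIT panel above, so check the attributes against your AIX documentation:

crfs -v jfs2 -d shared_lv -m /shared_fs -A no -a logname=shared_log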

Mount and unmount shared file systems

You should verify that you can mount and unmount all shared file systems using the mount and umount commands. That will confirm that the shared file systems can be mounted and unmounted by PowerHA.
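For example, using the file system created earlier:

mount /shared_fs
df /shared_fs        # confirm it is mounted
umount /shared_fs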

Vary off the shared volume groups

After the shared file systems are unmounted, you can vary off the shared volume groups. The following command will vary off the volume group named shared_vg:

varyoffvg shared_vg

Import the shared volume groups on the other node

The shared volume groups must be imported on the other cluster node. When importing the volume group, ensure that:

• You specify the same volume group major number.

• You specify a physical volume that is indeed part of the shared volume group. Sometimes the same physical volume is identified on different systems by different hdisk devices. Ensure that you import the volume group definition from the correct physical volume by verifying the PVID.

• Following the import, the volume group AUTO ON flag is still set to no. If it is not, change it using the chvg command.

Apart from importing the LVM definitions on the second node, this operation also updates /etc/filesystems with the values used for the new logical volumes and mount points. Following this operation, both nodes should have identical definitions.

The following command imports the volume group named shared_vg from the physical volume hdisk2 and assigns it the volume group major number 38.

importvg -y shared_vg -V 38 hdisk2

In our scenario, hdisk1 on the first cluster node and hdisk2 on the second cluster node must have the same PVID.
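To confirm that the imported volume group will not be activated automatically at boot, and to correct it if needed, you can check the AUTO ON attribute. The commands below are standard AIX usage, shown as a sketch:

lsvg shared_vg | grep "AUTO ON"     # should report AUTO ON: no
chvg -a n shared_vg                 # disable automatic varyon at system restart, if needed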

You should verify that you can mount and unmount all shared file systems on this node as well, using the mount and umount commands. After the shared file systems are unmounted, you can vary off the shared volume groups.


Writing application start and stop scripts

PowerHA will need a start and a stop script that will be used to automatically start and stop the application that is part of the resource group. You must ensure that the scripts produce the expected results.
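As a minimal sketch, the start and stop scripts could look like the following. The file names /usr/app_start.sh and /usr/app_stop.sh match those used in Example 4-4; the application path, commands, and log files are placeholders that you would replace with your application's own start and stop procedures:

#!/usr/bin/ksh
# /usr/app_start.sh - start the application (placeholder commands)
/usr/local/myapp/bin/myapp start >> /var/log/myapp_start.log 2>&1
exit 0

#!/usr/bin/ksh
# /usr/app_stop.sh - stop the application (placeholder commands)
/usr/local/myapp/bin/myapp stop >> /var/log/myapp_stop.log 2>&1
exit 0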

Running the Two-Node Cluster Configuration Assistant

Now that all prerequisites are met, you can run the Two-Node Cluster Configuration Assistant. To run it using SMIT menus, issue the smitty cl_configassist command and fill in the required data.

In Example 4-4 we show the SMIT menu used to configure a two-node cluster, using the service IP label named xdsvc_service1 and an application server named app_server1 with a start script named /usr/app_start.sh and a stop script named /usr/app_stop.sh. Because we did not use IPv6, we did not have to specify a prefix length.

Example 4-4 Running the Two-Node Cluster Configuration Assistant

Two-Node Cluster Configuration Assistant

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Communication Path to Takeover Node                [xdsvc2]               +
* Application Server Name                            [app_server1]
* Application Server Start Script                    [/usr/app_start.sh]
* Application Server Stop Script                     [/usr/app_stop.sh]
* Service IP Label                                   [xdsvc_service1]       +
  Prefix Length                                      []                     #

F1=Help       F2=Refresh      F3=Cancel       F4=List
F5=Reset      F6=Command      F7=Edit         F8=Image
F9=Shell      F10=Exit        Enter=Do

The Two-Node Cluster Configuration Assistant will also synchronize the cluster definition. It also logs debugging information to the file /var/hacmp/log/clconfigassist.log.

Note: In PowerHA 5.5, WebSMIT cannot be used to run the Two-Node Configuration Assistant.


Here we summarize the main features of the Two-Node Cluster Configuration Assistant:

• The Two-Node Cluster Configuration Assistant creates all cluster resources such as the cluster name, cluster ID, node names, and the shared resource group.

• For the shared resource group, it sets the startup policy to Online On Home Node Only, the fallover policy to Fallover To the Next Priority Node In The List, and the fallback policy to Never Fallback.

• It creates all topology elements (nodes, networks, communication interfaces and devices), including a non-IP disk heartbeat network when it detects a shared enhanced concurrent volume group.

• It supports only one application server and two nodes.

• It uses IP takeover by aliasing.

4.2.3 Using Extended Configuration Path and C-SPOC

In this section we show how to configure a cluster by first discovering cluster topology and then using C-SPOC to add resource groups. (C-SPOC stands for Cluster Single Point Of Control and is a powerful tool that allows you to configure and manage a cluster from a single node.)

The sequence of general steps listed in 4.1, “Basic steps to implement a PowerHA cluster” on page 200 can be changed so that storage configuration and creation of application scripts will be done after the cluster topology has been defined.

Any subsequent cluster configuration will be done via C-SPOC while the cluster services are running.

Configuring the topology using Extended Configuration Path

When using the Extended Configuration Path, you can define more granular options and specify certain configuration parameters that are not accessible via the Standard Configuration Path.

Configuring the topology using the Extended Configuration Path involves the following steps:

1. Defining the cluster
2. Adding nodes to the cluster
3. Discovering cluster-related information from all cluster nodes (optional)
4. Adding cluster networks (IP and non-IP)
5. Configuring communication interfaces and devices
6. Verifying and synchronizing the cluster


Defining the cluster

In order to define the cluster name, use the following steps:

1. Enter smitty hacmp → Extended Configuration → Extended Topology Configuration → Configure an HACMP Cluster → Add/Change/Show an HACMP Cluster.

2. Fill in the cluster name.

The cluster name must be unique and can have up to 64 alphanumeric characters.

Adding nodes to the cluster

After defining the cluster name, you should add the nodes. For each one of them, specify a cluster node name along with a valid communication path (an IP address assigned to an available network interface) using the following steps:

1. Enter smitty hacmp → Extended Configuration → Extended Topology Configuration → Configure HACMP Nodes.

2. Fill in the nodename and a valid communication path to this node.

When using the Extended Configuration Path, you can specify a node name that is different from the hostname.

Discovering cluster-related information from all cluster nodes

This step is optional. After you have defined all cluster nodes and ensured that there is a viable communication path to each node, you can run the discovery process. This process will collect from all cluster nodes valuable information that can be subsequently used for cluster definition.

Information collected from cluster nodes includes node names, physical volume names and PVIDs, volume group names and major numbers, network interfaces, and communication devices. The data is organized in picklists that will be accessible from the menus used for configuring resources and topology, thus helping you make accurate selections of existing components. You can run the discovery process any time later during the configuration process or when the cluster is already functional.

It is typical to prefer running discovery over entering all the data manually. To run the discovery process, do the following steps:

1. Enter smitty hacmp → Extended Configuration → Discover HACMP-related Information from Configured Nodes.

2. Then continue as described in the following section.


Adding cluster networks (IP and non-IP)

After defining cluster nodes, you can start adding networks using these steps:

1. Enter smitty hacmp → Extended Configuration → Extended Topology Configuration → Configure HACMP Networks → Add a Network to the HACMP Cluster.

a. If you previously have run discovery, you can select one of the network types discovered during that process. Based on information collected during the discovery process, PowerHA learns about network types currently available for your particular cluster and automatically adds them to the Discovered IP-based Network Types list and Discovered Serial Device Types list.

In Example 4-5 you can see the ether and diskhb types showing up in the Discovered IP-based Network Types list and the Discovered Serial Device Types list as a result of running the discovery process. These network types were available at the time when the discovery process was last run, which was on April 01 at 14:31.

Example 4-5 Discovered and pre-defined network types

Configure HACMP Networks

Move cursor to desired item and press Enter.

Add a Network to the HACMP Cluster Change/Show a Network in the HACMP Cluster Remove a Network from the HACMP Cluster Manage Concurrent Volume Groups for Multi-Node Disk Heartbeat

+--------------------------------------------------------------------------+| Select a Network Type || || Move cursor to desired item and press Enter. || || # Discovery last performed: (Apr 01 14:31) || # Discovered IP-based Network Types || ether || || # Discovered Serial Device Types || diskhb || || # Pre-defined IP-based Network Types || XD_data || XD_ip || atm || ether || fddi || hps |


| ib || token || || # Pre-defined Serial Device Types || XD_rs232 || diskhb || rs232 || tmscsi || tmssa || || F1=Help F2=Refresh F3=Cancel || F8=Image F10=Exit Enter=Do || /=Find n=Find Next |F9+------------------------------------------------------------------------+

b. If you have not run discovery, you can choose the network type from the lists of Pre-defined IP-based Network Types and Pre-defined Serial Network Types. These lists contain all network types that are available to PowerHA and that are shown in Example 4-5 on page 215.

2. After you have selected the network type, you can add the network. Example 4-6 shows how to add a predefined IP-based network named net_ether_02.

Example 4-6 Adding an IP-based network

Add an IP-Based Network to the HACMP Cluster

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[Entry Fields]* Network Name [net_ether_02]* Network Type ether* Netmask [255.255.255.0] +* Enable IP Address Takeover via IP Aliases [Yes] + IP Address Offset for Heartbeating over IP Aliases []

F1=Help F2=Refresh F3=Cancel F4=ListF5=Reset F6=Command F7=Edit F8=ImageF9=Shell F10=Exit Enter=Do

3. When defining an IP-based network, you can choose the IP Address Takeover (IPAT) mechanism you want to use. By default, IP aliases are used. If you want to use IPAT via replacement, you should set the Enable IP Address Takeover via IP Aliases field to no.

4. You should repeat this step for all IP and non-IP networks you want to add.


Configuring communication interfaces and devices

After the cluster networks have been defined, you can start to add communication interfaces to IP-based networks and communication devices to non IP-based networks.

If you previously have run discovery, you can select one of the communication interfaces and devices discovered during that process. Based on the information collected during the discovery process, PowerHA learns about communication interfaces and devices currently available for your particular cluster and automatically adds them to the Discovered Communication Devices list and Discovered Communication Interfaces list.

The list of discovered communication devices might be similar to the list in Example 4-7, which shows devices that can be used for a non-IP network.

Example 4-7 Adding discovered communication devices

Configure HACMP Communication Interfaces/Devices

Move cursor to desired item and press Enter.

Add Communication Interfaces/Devices Change/Show Communication Interfaces/Devices +--------------------------------------------------------------------------+ | Select Point-to-Point Pair of Discovered Communication Devices to Add | | | | Move cursor to desired item and press F7. | | ONE OR MORE items can be selected. | | Press Enter AFTER making all selections. | | | | # Node Device Pvid | | xdsvc1 hdisk2 000fe401d39e2344 | | xdsvc2 hdisk2 000fe401d39e2344 | | xdsvc1 hdisk3 000fe4014e5026c3 | | xdsvc1 hdisk4 000fe4014e504bd5 | | xdsvc2 hdisk7 000fe4114dd7f436 | | | | F1=Help F2=Refresh F3=Cancel | | F7=Select F8=Image F10=Exit |F1| Enter=Do /=Find n=Find Next |F9+--------------------------------------------------------------------------+

If you have not run discovery, you have to specify the communication interfaces and devices that you want to use and attach them to one of the networks. Both IP and non IP-based networks created in the previous step should show up and be available for selection.


In Example 4-8 we show how to add a predefined IP-based communication interface named glvm2_second to the previously configured IP network named net_ether_02 after taking the following steps:

1. Enter smitty hacmp → Extended Configuration → Extended Topology Configuration → Configure HACMP Communication Interfaces/Devices → Add Communication Interfaces/Devices → Add Pre-defined Communication Interfaces and Devices → Communication Interfaces.

Example 4-8 Adding pre-defined communication interfaces

Add a Communication Interface

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* IP Label/Address                                   [glvm2_second]        +
* Network Type                                        ether
* Network Name                                        net_ether_02
* Node Name                                          [glvm2]               +
  Network Interface                                  []

F1=Help       F2=Refresh      F3=Cancel       F4=List
F5=Reset      F6=Command      F7=Edit         F8=Image
F9=Shell      F10=Exit        Enter=Do

2. Repeat this step for all communication interfaces and devices you need to configure.

Verifying and synchronizing the cluster

After the whole topology has been completely defined, you should verify and synchronize the cluster as shown in Example 4-9.

Example 4-9 Cluster verification and synchronization

HACMP Verification and Synchronization

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Verify, Synchronize or Both                        [Both]               +
* Automatically correct errors found during          [No]                 +
    verification?
* Force synchronization if verification fails?       [No]                 +
* Verify changes only?                               [No]                 +
* Logging                                            [Standard]           +

F1=Help       F2=Refresh      F3=Cancel       F4=List
F5=Reset      F6=Command      F7=Edit         F8=Image
F9=Shell      F10=Exit        Enter=Do

Configuring cluster resources using C-SPOC

After the cluster topology has been configured, you can use C-SPOC to quickly add volume groups and resource groups to the cluster configuration.

As previously mentioned, C-SPOC stands for Cluster Single Point Of Control and is a powerful tool that allows you to configure and manage a cluster from a single node. C-SPOC allows you to modify your cluster and ensures that all changes are propagated to all cluster nodes. Basic tasks that are usually performed when installing a cluster and can be performed using C-SPOC include these:

• LVM-related tasks:

– Creating a new shared volume group
– Extending, reducing, changing, mirroring, or unmirroring an existing volume group
– Creating a new shared logical volume
– Extending, reducing, changing, or removing an existing logical volume
– Creating a new shared file system
– Extending, changing, or removing an existing file system
– Adding or removing physical volumes

• User management tasks:

– Creating, removing and changing user accounts
– Creating, removing and changing user groups
– Managing user passwords across cluster nodes
– Configuring cluster security features

Using C-SPOC to create volume groups and resource groups

In this section we show how to quickly create a volume group and a resource group when you install your cluster by taking the following steps:

1. Enter smitty hacmp → System Management (C-SPOC) → HACMP Logical Volume Management → Shared Volume Groups → Create a Shared Volume Group.

2. Select the cluster nodes that will share both the volume group and the resource group.

Note: Automatic error correction can be performed only if PowerHA is stopped on all cluster nodes.


3. Select the shared disk(s) that will be part of the volume group. Example 4-10 shows the PVID of a physical volume that is seen as hdisk1 by all cluster nodes.

Example 4-10 Selecting a disk accessible to all cluster nodes

Shared Volume Groups

Move cursor to desired item and press Enter.

List All Shared Volume Groups Create a Shared Volume Group Create a Shared Volume Group with Data Path Devices Enable a Shared Volume Group for Fast Disk Takeover Set Characteristics of a Shared Volume Group Import a Shared Volume Group Mirror a Shared Volume Group +--------------------------------------------------------------------------+ | Physical Volume Names | | | | Move cursor to desired item and press F7. | | ONE OR MORE items can be selected. | | Press Enter AFTER making all selections. | | | | 000fe401d39e31c4 ( hdisk1 on all selected nodes ) | | | | F1=Help F2=Refresh F3=Cancel | | F7=Select F8=Image F10=Exit |F1| Enter=Do /=Find n=Find Next |F9+--------------------------------------------------------------------------+

4. Select the type of the shared volume group as shown in Example 4-11.

Example 4-11 Selecting the type of the shared volume group

Shared Volume Groups

Move cursor to desired item and press Enter.

List All Shared Volume Groups Create a Shared Volume Group Create a Shared Volume Group with Data Path Devices Enable a Shared Volume Group for Fast Disk Takeover Set Characteristics of a Shared Volume Group Import a Shared Volume Group +--------------------------------------------------------------------------+ | Volume Group Type | | | | Move cursor to desired item and press Enter. |


| | | Legacy | | Original | | Big | | Scalable | | | | F1=Help F2=Refresh F3=Cancel | | F8=Image F10=Exit Enter=Do |F1| /=Find n=Find Next |F9+--------------------------------------------------------------------------+

5. Fill in the name of the shared volume group and the resource group that will contain the volume group. Example 4-12 shows a volume group named shared_vg, which is part of the resource group named rg1, whose node list includes cluster nodes ndu1 and ndu2.

Example 4-12 Creating a resource group when creating a shared volume group

Create a Shared Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                   [Entry Fields]
  Node Names                                          ndu1,ndu2
  Resource Group Name                                [rg1]                 +
  PVID                                                000fe401d39e31c4
  VOLUME GROUP name                                  [shared_vg]
  Physical partition SIZE in megabytes                4                    +
  Volume group MAJOR NUMBER                          [36]                  #
  Enable Cross-Site LVM Mirroring Verification        false                +
  Enable Volume Group for Fast Disk Takeover?         true                 +
  Volume Group Type                                   Original

  Warning: Changing the volume group major number may result
  in the command being unable to execute
[MORE...5]

F1=Help       F2=Refresh      F3=Cancel       F4=List
F5=Reset      F6=Command      F7=Edit         F8=Image
F9=Shell      F10=Exit        Enter=Do


6. Verify that the volume groups have been successfully created as shown in Example 4-13.

Example 4-13 Successful creation of a shared volume group

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

ndu1: shared_vg
ndu1: mkvg: This concurrent capable volume group must be varied on manually.
ndu1: synclvodm: No logical volumes in volume group shared_vg.
ndu1: Volume group shared_vg has been updated.
ndu2: synclvodm: No logical volumes in volume group shared_vg.
ndu2: 0516-783 importvg: This imported volume group is concurrent capable.
ndu2: Therefore, the volume group must be varied on manually.
ndu2: 0516-1804 chvg: The quorum change takes effect immediately.
ndu2: Volume group shared_vg has been imported.
cl_mkvg: The HACMP configuration has been changed - Volume Group shared_vg
has been added. The configuration must be synchronized to make this change
effective across the cluster
cl_mkvg: Discovering Volume Group Configuration...

F1=Help       F2=Refresh      F3=Cancel       F6=Command
F8=Image      F9=Shell        F10=Exit        /=Find
n=Find Next

7. Verify and synchronize the cluster. The newly defined resource group, with default attributes and containing shared_vg, is shown in Example 4-14.

Example 4-14 The newly-defined resource group shows up as a cluster resource

root@ ndu1[] clshowres -g rg1

Resource Group Name                          rg1
Participating Node Name(s)                   ndu1 ndu2
Startup Policy                               Online On Home Node Only
Fallover Policy                              Fallover To Next Priority Node In The List
Fallback Policy                              Never Fallback
Site Relationship                            ignore


Node Priority
Service IP Label
Filesystems
Filesystems Consistency Check
Filesystems Recovery Method
Filesystems/Directories to be exported (NFSv3)
Filesystems/Directories to be exported (NFSv4)
Filesystems to be NFS mounted
Network For NFS Mount
Filesystem/Directory for NFSv4 Stable Storage
Volume Groups                                shared_vg
Concurrent Volume Groups
Use forced varyon for volume groups, if necessary
Disks
GMVG Replicated Resources
GMD Replicated Resources
PPRC Replicated Resources
ERCMF Replicated Resources
SVC PPRC Replicated Resources
AIX Connections Services
AIX Fast Connect Services
Shared Tape Resources
Application Servers
Highly Available Communication Links
Primary Workload Manager Class
Secondary Workload Manager Class
Delayed Fallback Timer
Miscellaneous Data
Automatically Import Volume Groups
Inactive Takeover
SSA Disk Fencing
Filesystems mounted before IP configured
WPAR Name

Run Time Parameters:

Node Name                                    ndu1
Debug Level                                  high
Format for hacmp.out                         Standard

Node Name                                    ndu2
Debug Level                                  high
Format for hacmp.out                         Standard

8. You can repeat this procedure for all the remaining volume groups. After the shared volume groups are defined, you can use C-SPOC to define all shared logical volumes and file systems.


4.3 Installing and configuring WebSMIT

In addition to the “classic” configuration using the System Management Interface Tool (SMIT), PowerHA also provides a Web interface for configuring your cluster. Although some preparation work is needed, using the Web interface for PowerHA offers additional flexible monitoring and managing options.

Prior to 5.5, WebSMIT required running an HTTP server on at least one cluster node, and was limited to monitoring one cluster per Web session. It was limited to a one-to-one architecture, meaning that one WebSMIT server could manage one, and only one cluster.

PowerHA v5.5 has removed these restrictions by providing a new gateway configuration and an enterprise cluster view. The gateway architecture is a one-to-many architecture. This means that one WebSMIT server can now manage many different clusters at the same time. Currently, both the traditional one-to-one and the gateway options are available. However, the focus in this section is to utilize the gateway approach.

Additional requisite software package information, more detailed planning information, and instructions for upgrading from a previous version of WebSMIT can be found in the HACMP Installation Guide, SC23-5209.

In the following sections, we describe the basic steps for installing and configuring WebSMIT, which include:

1. Installing a Web server with the IBM HTTP Server code

2. Installing WebSMIT

3. Configuring WebSMIT

4. Starting WebSMIT

5. Registering clusters with the WebSMIT gateway

6. Accessing WebSMIT pages and add clusters

It is also important to understand the current limitations within PowerHA 5.5 WebSMIT:

• There is a maximum of 128 two-node clusters (removed in 5.5 SP1).
• Pre-5.5 nodes must be manually registered (removed in 5.5 SP1).
• WebSMIT's clcomdES requires a unique cluster ID for each managed cluster.
• You cannot configure a new, unconfigured cluster from within WebSMIT.
• WebSMIT cannot currently use IPv6 addresses to communicate to clusters.
• WebSMIT migration to 5.5 might fail to preserve permissions and update the configuration file.
• The Documentation tab does not reload automatically for new connections (fixed in 5.5 SP1).


In the following exercise, we assume that at least one cluster is defined and operational in your environment. For more information about installing and configuring a new cluster, refer to 4.2, “Configuring PowerHA” on page 202.

4.3.1 The PowerHA related SMIT panels and their structure

Using WebSMIT provides a useful view of the PowerHA related SMIT menu structure (tree). Every “+” marks at least one submenu that expands when you click it. This is shown in Figure 4-1.

Figure 4-1 Websmit PowerHA menu structure (extract)


4.3.2 Installing a Web server with the IBM HTTP Server code

We chose to install IBM HTTP Server (IHS) 6.1 on an AIX 6.1 client because installing Apache has been covered in the previous book, Implementing High Availability Cluster Multi-Processing (HACMP) Cookbook, SG24-6769. We recommend installing this server on more than one client so that the WebSMIT gateway is not a single point of failure. The following steps and code levels were current at the time of writing.

To download and install the IBM HTTP Server code, follow these steps:

1. Launch a Web browser and go to the IBM site for HTTP servers at the following URL:

http://www.ibm.com/software/webservers/httpservers/

2. Download the current version (ours was ihs.6100.aix.ppc32.tar) and place it in a directory of your choice with at least 250 MB of free space. Upon untarring the file, a new directory called IHS_6.1.0.0 should be created with an IHS subdirectory. Change to the IHS directory and list the contents as shown in Example 4-15.

Example 4-15 IHS directory contents

root@ camryn[/usr/sys/inst.images/55/wsm/IHS_6.1.0.0/IHS] ls -al
total 23880
drwxr-xr-x    9 504    504         4096 Jul 05 2006 .
drwxrwxr-x    5 504    504          256 Jul 05 2006 ..
drwxr-xr-x    3 504    504          256 May 16 2006 docs
drwxr-xr-x    2 504    504         4096 May 16 2006 framework
drwxr-xr-x    3 504    504          256 May 16 2006 ihs.primary.pak
-rwxr-xr-x    1 504    504       362666 May 16 2006 install
drwxrwxr-x    2 504    504         4096 Jul 05 2006 lafiles
drwxr-xr-x    2 504    504         4096 May 16 2006 lib
drwxr-xr-x    2 504    504          256 May 16 2006 panels
drwxr-xr-x    2 504    504          256 May 16 2006 readme
-rw-rw-r--    1 504    504        11271 Jul 05 2006 responsefile.txt
-rw-r--r--    1 504    504     11829105 May 16 2006 setup.jar
-rw-r--r--    1 504    504          178 May 16 2006 version.txt

3. These images should include an install executable that, when invoked, will launch the installation GUI. Be sure you run the install by specifying the exact path as shown in Example 4-16.

Example 4-16 IHS installation executable

root@ camryn[/usr/sys/inst.images/55/wsm/IHS_6.1.0.0/IHS] ./install
InstallShield Wizard

Initializing InstallShield Wizard...


Searching for Java(tm) Virtual Machine...

4. If problems occur using this executable, then invoke it directly via java -jar setup.jar. Both of these methods will invoke the graphical installation program as shown in Figure 4-2.

5. Complete the installation by accepting the license agreement and completing the additional menus. We accepted the default values other than those for the HTTP administration server and unique user, which we chose not to utilize. However, it is more common to set those values; be sure to record what you set them to because you will most likely need them again in the future.

Figure 4-2 IBM HTTP Server Install Menu

6. You can also choose to do a silent installation. To do a silent installation, use the following command:

java -jar setup.jar -silent -options responsefile.txt

Tip: If installing from a graphical workstation, be sure to export your display from AIX by running export DISPLAY=<your.workstation.IP.address>:0


However, it will be necessary to edit the responsefile.txt file first, at a minimum, to set the license acceptance to true. Any other edits you might want to make can be made at that time, as well.

4.3.3 Installing WebSMIT

New in PowerHA 5.5 is the separation of the client and server filesets from the cluster.es package. The client package (cluster.es.client) includes the base client filesets, along with the WebSMIT fileset of cluster.es.client.wsm and the clcomdES fileset of cluster.es.client.clcomd. The client package must be installed on your designated WebSMIT AIX client. Follow these steps:

1. Using the directory or device where your PowerHA images reside, install the client package using SMIT install_all. Verify that all the client filesets are installed as shown in Example 4-17.

Example 4-17 Client filesets for WebSMIT

root@ ndu3[/usr/es/sbin/cluster/wsm] lslpp -L cluster.es.client.*
----------------------------------------------------------------------
  cluster.es.client.clcomd   5.5.0.1    A    F    ES Cluster Communication
                                                  Infrastructure
  cluster.es.client.lib      5.5.0.1    A    F    ES Client Libraries
  cluster.es.client.rte      5.5.0.1    A    F    ES Client Runtime
  cluster.es.client.utils    5.5.0.1    A    F    ES Client Utilities
  cluster.es.client.wsm      5.5.0.1    A    F    Web based Smit

2. These filesets should also be installed on any additional clients on which you previously installed the HTTP server for use as a WebSMIT gateway.
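For reference, a command-line sketch of the installation and verification follows; the image directory is an assumption, so point -d at wherever your PowerHA installation images actually reside:

installp -agXY -d /usr/sys/inst.images cluster.es.client
lslpp -L "cluster.es.client.*"      # confirm the filesets are installed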

4.3.4 Configuring WebSMIT

Prior to configuring WebSMIT, a Web server must be installed and the client WebSMIT filesets installed as covered in the previous sections.

A new utility has been provided to assist in the configuration of WebSMIT and the locally installed Web server. It creates a custom Web server configuration file, sets up appropriate file permissions, and verifies the WebSMIT configuration file (wsm_smit.conf). This utility should work for most typical installations, but in some cases, manual Web server configuration might be required. For such cases, refer to the documentation provided with the Web server.

Attention: Always carefully review the readme file in the directory, /usr/es/sbin/cluster/wsm, before proceeding.


Follow these steps for the installation:

1. Change your working directory to the WebSMIT install directory:

# cd /usr/es/sbin/cluster/wsm

2. If the IBM HTTP Server is in use, run the following command to configure WebSMIT with IHS and create a secure key for SSL:

# ./websmit_config -p <passphrase>

3. If you installed Apache Web server, run the following command to configure WebSMIT:

# ./websmit_config

4. The Web server configuration will be placed in the file:

/usr/es/sbin/cluster/wsm/httpd.wsm.conf

4.3.5 Starting WebSMIT

Follow these steps to start WebSMIT:

1. If WebSMIT was configured with SSL enabled, which is normally the case as in our scenario, then start it with the following command:

# /usr/es/sbin/cluster/wsm/websmitctl startssl

2. If you need to stop WebSMIT, run the following command:

# /usr/es/sbin/cluster/wsm/websmitctl stop

Note: An SSL key is automatically included with the SSL-enabled Apache installation, so no pass phrase is needed.

Note: If the wsm_smit.conf file existed prior to installation of WebSMIT, the new version of the file will be placed in:

/usr/lpp/save.config/usr/es/sbin/cluster/wsm

Updates to this file should be merged with any existing configuration.

Note: This startup command will only work if the HTTP server has been SSL enabled. If a non-SSL server is being used, then consult the readme file in /usr/es/sbin/cluster/wsm for more information.


4.3.6 Registering clusters with the WebSMIT gateway

This section shows how to prepare your cluster for registration with WebSMIT and how to register the clusters within WebSMIT. This process is for 5.5 clusters; a similar process exists for 5.3 and 5.4.1 clusters, as mentioned in the note preceding Figure 4-3.

Follow these steps to prepare your cluster:

1. Log in to your WebSMIT server and give the root user access to all clusters that are currently registered, or ever will be registered, by invoking the following command:

# /usr/es/sbin/cluster/wsm/utils/wsm_access -a root -t user ALL

2. Record the IP address of your WebSMIT server. You can use the hostname as long as it is resolvable (preferably listed in /etc/hosts) on the cluster nodes you are going to register.

3. Log in to one of the cluster nodes that you want to register with WebSMIT. If the cluster is fully operational, it does not matter which node you select. Then run the following command:

/usr/es/sbin/cluster/utilities/wsm_gateway -v -a <WEBSMIT_SERVER>

Output from the command execution and registration process of our cluster (containing nodes xdsvc1 and xdsvc2) is shown in Figure 4-3. In the output, you will see that it automatically attempts to run the identical command on all other nodes in the cluster. This is why you only need to run this command on one node in the cluster.

Note: This step is a workaround because the “root” user should always be given “ALL” access automatically. It has been fixed in 5.5 SP1 and can be skipped if you already have SP1 applied.

Note: If adding clusters running PowerHA levels lower than 5.4.1 SP5, or 5.3 SP11, simply copy the wsm_gateway script over from a 5.5 cluster node and follow these same procedures.


xdsvc1[/] /usr/es/sbin/cluster/utilities/wsm_gateway -v -a jessica

Validating gateway specification "jessica"...
    Resolved IP:  9.12.7.6
    Network Name: jessica

Successfully registered WebSMIT server "9.12.7.6"!

>>> Invoking wsm_gateway on "jessica"...

Validating gateway specification "jessica"...
    Resolved IP:  9.12.7.6
    Network Name: jessica

Successfully registered WebSMIT server "9.12.7.6"!

/usr/bin/refresh -s clcomdES
0513-095 The request for subsystem refresh was completed successfully.

/usr/bin/refresh -s clcomdES
0513-095 The request for subsystem refresh was completed successfully.

Figure 4-3 WebSMIT cluster node registration

4. During this process, the local cluster nodes should automatically be updated to include the WebSMIT server's IP address in their local /etc/hosts, /usr/es/sbin/cluster/etc/rhosts, as well as /usr/es/sbin/cluster/etc/gateways.


4.3.7 Accessing WebSMIT pages and adding clusters

WebSMIT supports the following browsers:

• Internet Explorer® Version 6 or higher

• Mozilla 1.7.3 or higher

• Firefox 1.0.6 and up

1. To display WebSMIT in your browser, start one of the approved browsers listed above and point it to the following URL:

https://<your.websmit.server.IP>:42267

You will see a page similar to the one shown in Figure 4-4.

Figure 4-4 WebSMIT login page

By default, you use the root user to access WebSMIT (as listed in the /usr/es/sbin/cluster/wsm/wsm_smit.conf file). The user can be changed to any AIX user (this user must exist and have a valid password). More information about adding users and granting access can be found in the readme file located in /usr/es/sbin/cluster/wsm.


2. Proceed by logging in as root, then right-click over the blank background of the Enterprise View tab, and from the pop-up menu, select Add a Cluster as shown in Figure 4-5.

Figure 4-5 WebSMIT add a cluster

Tip: Before adding a cluster, make sure that clcomdES is running on the WebSMIT gateway server and the cluster nodes.


3. You are then presented with the WebSMIT Cluster Registration menu option as shown in Figure 4-6. Enter the IP address or hostname of one of the cluster nodes you want to add, choose an icon (optional), and then click Execute.

Figure 4-6 Cluster registration from within WebSMIT

4. Upon successful completion of adding the clusters, refresh the Enterprise view by right-clicking the Enterprise tab and then clicking Refresh. Figure 4-7 shows the newly added cluster.

Tip: You can add multiple clusters simultaneously by listing their hostnames or IP addresses separated by spaces.

Important: For full management capabilities of pre-5.5 clusters, you need SP11 (IZ49113) for 5.3, and/or SP5 (IZ42245) for 5.4.1.


Figure 4-7 WebSMIT Enterprise View

Tip: If you see any of the following error messages upon execution, here are some explanations and suggested actions.

ERROR: /usr/es/sbin/cluster/utilities/clrsh failed to get cluster information using the /usr/es/sbin/cluster/utilities/cllsclstr command on node 9.12.7.11.

ERROR: The clcomd agent on the WebSMIT server (websmit) cannot communicate with the clcomd agent on the specified node (9.12.7.11).

This usually indicates that the /usr/es/sbin/cluster/utilities/wsm_gateway utility has not been successfully invoked on that node. On 9.12.7.11, try entering:

/usr/es/sbin/cluster/utilities/wsm_gateway -avf websmit

ERROR: Failed to discover/add cluster information pertaining to:

node "9.12.7.11".

In addition to re-running wsm_gateway as suggested in the error message, validate that clcomdES is running on the WebSMIT gateway server and that the local node's /usr/es/sbin/cluster/etc/rhosts and /etc/hosts files contain the IP address of the WebSMIT server. If they do, stop and restart clcomdES on both the WebSMIT gateway and the client node(s) via stopsrc -s clcomdES and startsrc -s clcomdES respectively.
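As a minimal check sequence for this situation, the following can be run on the affected cluster node (and again on the WebSMIT gateway); it uses only commands already referenced in this section, and 9.12.7.6 is simply the gateway address from our example environment.

# Confirm that clcomdES is active on this node
lssrc -s clcomdES

# Verify that the WebSMIT server address is present in the clcomd and hosts files
grep 9.12.7.6 /usr/es/sbin/cluster/etc/rhosts /etc/hosts

# If entries were missing or were just added, recycle clcomdES
stopsrc -s clcomdES
startsrc -s clcomdES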


Notice that no information about the cluster is automatically displayed in the left frame. In order to access the cluster, you must connect to the cluster by right-clicking the cluster icon and choosing the Connect to Cluster option. This will then pull in the cluster configuration information and fill in the left frame with the cluster components as shown in Figure 4-8. At this point you can now use all the tabs and options available within WebSMIT for the chosen cluster.

Figure 4-8 WebSMIT cluster details

4.3.8 Introduction to WebSMIT

After completing the previous steps, you can now monitor and manage your cluster(s) through WebSMIT. In this section we provide an overview of WebSMIT features and functionality.

WebSMIT benefits

Here are just a few of the benefits of WebSMIT:

- WebSMIT provides PowerHA-level access to all the nodes in the registered clusters without requiring that the PowerHA administrators be given accounts on all those nodes with either root-level authority or pseudo access. Nor is it necessary to give the administrators the root password. Everything that must be done can be done through WebSMIT.

- WebSMIT provides simple role-based access: full versus read-only. If only cluster status monitoring capability is required, that too can be done.


- Most administrative cluster activity can be monitored at a single point, on the WebSMIT server. This can be incorporated for better change control and also allows for better accounting of who performs what actions, and when.

- Quicker navigation of SMIT panels by both expanded menu views and right-click shortcut pop-ups on most items.

- Enterprise-level monitoring of all clusters.

WebSMIT frames and tab descriptions

There are three frames within WebSMIT, as listed below with their corresponding descriptions: the Header Frame, Navigation Frame, and Activity Frame. Figure 4-9 on page 239 highlights each frame.

- Header Frame:

This frame displays current cluster connection and access mode information. If no cluster is currently connected, then No cluster connection is displayed.

- Navigation Frame:

This is the left frame; it displays three tabbed views from which you can navigate your cluster, as well as the configuration menus. These navigation tabs display items in an expandable, hierarchical view. Selecting an item updates the content displayed in the activity frame to reflect the current selection. Expand and collapse any of the trees in this frame, as needed, to show or hide the subcomponents, by clicking the “+” or “-” symbol or by selecting the up or down arrow for all components.

You can select the following tabs from the navigation frame:

– SMIT tab: Provides hierarchical navigation of the SMIT menus to configure and manage your cluster. Clicking a menu item from the SMIT tab displays the corresponding SMIT panel in the activity frame in the Configuration tab. This allows quick navigation straight to the submenu panels by bypassing the main menus.

– N&N (Nodes and Networks) tab: Provides an expandable hierarchical view based on the cluster topology (either site- or node-centric depending on the cluster definition). The icons to the left of the hierarchical menu items indicate the type and state of the corresponding topological object. Clicking the items displayed within the N&N tree results in a textual report being displayed in the Details tab, and a graphical report in the Associations tab, both within the activity frame. Right-clicking any displayed item creates a context-sensitive menu of operations for that item if applicable.


– RGs (Resource Groups) tab: Provides an expandable hierarchical view, based on the cluster resources. The icons to the left of the hierarchical menu items indicate the type and state of the corresponding cluster resource. Right-clicking any displayed item creates a context-sensitive menu of operations for that item if applicable.

Clicking the items displayed within the RGs tree results in a textual report being displayed in the Details tab, and a graphical report in the Associations tab, both within the activity frame.

- Activity Frame:

This is the right frame, with five tabbed views from which you can configure and manage your clusters.

You can select the following tabs from the activity frame:

– Configuration tab: Displays SMIT panels. The SMIT panels can be selected from within this tab. The panels can also originate from selections made in any of the tabs in the navigation frame or from the Enterprise tab. The SMIT fast path is also displayed at the bottom of each page.

– Details tab: Displays detailed reports pertaining to cluster components selected in the N&N or RGs tabs.

– Associations tab: Displays a graphical representation of a cluster component selected from the N&N or RGs tabs. Logical and physical relationships can be displayed, as well as resource contents. Right-clicking any displayed item creates a context-sensitive menu of operations for that item if applicable. Double-clicking an item launches a new Associations display based on that item.

– Documentation tab: Displays the HACMP for AIX documentation bookshelf page, providing access to the HACMP documentation installed on the WebSMIT server, or to equivalent online documentation, when available. The displayed product documentation links are always made version appropriate, to match the current cluster connection (if any). This tab also provides links to various HACMP for AIX online resources.

– Enterprise tab: Displays all the clusters that have been registered with this WebSMIT server that you are authorized to see. Current status is displayed graphically for each cluster and that cluster's nodes. A full WebSMIT connection can be established to any of the displayed clusters at any time. Help is available where indicated by a question mark, slightly smaller than the surrounding text. Right-clicking the tab, a cluster, or the background creates a context-sensitive menu of operations for that item.

Click the question mark to view the available help text in a small window.


Figure 4-9 WebSMIT frames

4.3.9 WebSMIT monitoring

The most commonly used view in the navigation frame is the nodes and networks view. If the icon and text are grayed out, then that device is not online. However, if it is a resource group, it might be online on another node. In Figure 4-10 you can see that rg1 shows up as normal but rg2 is grayed out on node xdsvc1, while the opposite is shown for node xdsvc2. This simply indicates that rg1 is online on node xdsvc1 and rg2 is online on node xdsvc2, which is considered normal.

Note: The documentation tab has also been updated in 5.5 SP1 to include readme files. However, this is only functional when WebSMIT is running on a cluster node instead of on the WebSMIT gateway server. This limitation is planned to be removed in a future update. Also, if the documentation filesets are not installed on the WebSMIT gateway, the bookshelf links will point to the Web-based publications.


Figure 4-10 WebSMIT nodes and networks view

Another common view is in the activity frame, using the Enterprise tab to show the status of the clusters, as introduced earlier when registering clusters into WebSMIT. Figure 4-11 shows the different possible cluster status icons and what they mean.

Figure 4-11 WebSMIT cluster status icons

For more detailed information about monitoring and managing using WebSMIT, refer to the HACMP Administration Guide, SC23-4862.


Chapter 5. Migrating a cluster to PowerHA V5.5

This chapter describes the various ways to migrate a cluster to PowerHA 5.5. There are important planning steps and considerations to acknowledge prior to undertaking a migration. Before starting, make sure that you are familiar with the use of the cluster snapshot utility. Also, be aware of the current configuration, how it is expected to behave, and the changes in each release that you upgrade to, because your end result might not work or behave the same way.

We discuss the following topics:

- Identifying the migration path
- Prerequisites
- Considerations
- General migration steps
- Scenarios tested
- Post-migration steps
- Troubleshooting a failed migration


Important: Always review the planning and installation manual for a list of documented steps if you are not familiar with the procedure.


5.1 Identifying the migration path

Which migration path to use is easily determined by whether you can tolerate an interruption to the entire cluster or whether you need to maintain maximum availability.

5.1.1 Migration methods

There are four migration methods to get your cluster to a higher release:

Non-disruptive This method allows the cluster to be upgraded while the resources stay active. This means that no downtime is required.

Rolling migration This method requires short outage periods while moving the resources around to upgrade one node at a time. This will leave the cluster in a mixed state during the migration period.

Snapshot method This method requires all cluster nodes to be offline for a period of time. It requires removing previous versions of HACMP, installing the new PowerHA, and restoring a snapshot.

Offline This method requires all cluster nodes to be offline to perform the migration using the smitty update_all fast path.

5.1.2 Supported migration paths

Table 5-1 describes the supported migration methods for PowerHA 5.5.

Table 5-1 Supported release upgrades to PowerHA 5.5

Note: Node-by-Node migrations no longer apply after version 5.1 because this method was specific to an HACMP “classic” to HACMP/ES upgrade.

Existing version   Non-disruptive   Rolling   Snapshot   Offline
PowerHA 5.4.1      Yes              Yes       Yes        Yes
PowerHA 5.4.0      Yes              Yes       Yes        Yes
PowerHA 5.3        No               Yes       Yes        Yes
PowerHA 5.2        No               Yes       Yes        Yes
PowerHA 5.1        No               No        Yes        Yes


5.2 Prerequisites

Before upgrading to PowerHA 5.5, ensure that you are familiar with everything from high-level concepts to low-level tasks, such as planning, maintenance, and troubleshooting, because many of the sections in this chapter build on that knowledge.

Space requirements

There must be enough disk space for the new installation:

- Approximately 82 MB in the /usr directory
- Approximately 710 KB in the / (root) directory

If you are not planning to install all optional PowerHA software, then you can plan for less space.

Cluster software and RSCT prerequisites

Ensure that the same level of cluster software and RSCT filesets (including PTFs) are on all cluster nodes before starting a migration. Also, ensure that software installation is committed (not just applied).

To ensure that the software is already committed (a quick command sketch follows these steps):

1. Run the lslpp -h cluster.* command.

2. If the word APPLY displays under the action header, enter smitty install_commit before installing the HACMP software.

SMIT displays the Commit Applied Software Updates (Remove Saved Files) panel.

3. Enter field values as follows:

SOFTWARE name: From the picklist, select all cluster filesets.

COMMIT old version if above version used it?: Set this field to yes.

EXTEND file system if space needed?: Set this field to yes.
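As a quick non-interactive check before going through SMIT, a sketch such as the following can be used; the fileset pattern simply mirrors the lslpp command from step 1.

# Show the installation history of all cluster filesets
lslpp -h "cluster.*"

# Filesets still in the applied state show APPLY in the Action column;
# if this grep returns any output, commit them via smitty install_commit
lslpp -h "cluster.*" | grep -w APPLY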

Understanding the configuration

Know your cluster configuration. Consider reviewing and understanding the overall configuration prior to the upgrade. If you are new to your configuration, consider running the cltopinfo or cldump commands to get an overview of the environment. The output of the cllsif and clshowres commands will also give you a good understanding of how the cluster is configured. This information is also useful in the event that you need to place a call to AIX Support:

/usr/es/sbin/cluster/utilities/cllsif
/usr/es/sbin/cluster/utilities/clshowres


The first output shows you the topology currently configured. It can help you to easily identify the available networks and the associated labels and IPs and their current function. The second command displays each resource group and all of the associated attributes. This output should help you determine the currently assigned resources and their designated fallover behavior.

Pre-migration recommendations and checks

Here are some good basic practices before performing any migration in order to maximize success and to have a reliable backout plan:

- Always take a snapshot before commencing a migration. Remember to save it in multiple safe locations, including one completely off of the cluster nodes.

- Always have a system backup (mksysb) with the current configuration. Make sure that you have checked its contents and that it will restore your data.

- If the resources are available, consider creating an alternate disk installation on each cluster node prior to the migration. This is useful in the event that you have to quickly revert back to the old configuration. To change back, you would alter the bootlist to the alternate disk and reboot the migrated nodes.

- Ensure that the same level of cluster software and RSCT filesets (including PTFs) are on all cluster nodes before starting a migration.

- Make sure that your cluster will fall over successfully before the migration, otherwise it will probably also not work after the upgrade.

- Make copies of any custom event, pre/post event, and application scripts that you have created.

- Check the state of the system and the filesets installed (a consolidated command sketch follows this list):

a. Run lppchk -v, -l, -c

b. Run instfix -i | grep ML

c. Review errpt -a for any recent pertinent errors.

d. Run df -k and check for full file systems.

e. Run lsps -s and make sure that paging space is not full.

f. Run emgr -l to check for any efixes loaded on the system before initiating the upgrade.
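The checks in items a through f can be strung together as in the following minimal sketch, run on every cluster node; it uses only the commands listed above, and the output still has to be reviewed manually.

# Fileset consistency checks
lppchk -v
lppchk -l
lppchk -c

# Confirm that the maintenance/technology level is complete
instfix -i | grep ML

# Review recent errors, file system and paging space usage, and any loaded efixes
errpt -a
df -k
lsps -s
emgr -l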


5.3 Considerations

Various changes have occurred between earlier releases and PowerHA 5.5. Next we list some considerations to keep in mind when going into an upgrade:

- After you begin a rolling migration, your cluster will be running in a mixed mode until the last node is upgraded and reintegrated into the cluster. During this period of time, do not attempt to make any changes to the cluster topology or resources:

– Do not verify or synchronize the cluster.

– Do not attempt DARE or C-SPOC operations, except for the Manage HACMP Services functions.

– Do not use the Problem Determination Tools > View Current State function.

– Do not attempt to create or restore a cluster snapshot.

– Do not use the Problem Determination Tools > Recover from HACMP Script Failure option or run the clruncmd command, except when running the command or SMIT option from the target node specified by the command.

- If upgrading from a release prior to HACMP 5.2, the HACMP installation creates the hacmp group on all nodes. The HACMP Configuration Database (ODM) classes have been updated so that they are owned by the root user and the hacmp group:

– The permissions of 640 are set for most HACMP object classes. The HACMPdisksubsystem is an exception with 600.

– All HACMP binaries intended for use by non-root users are installed with 2555 permissions. The setgid bit is turned on so that the program runs as the hacmp group.

Important: Do not leave the cluster in a hybrid state for an extended period of time to avoid accidentally invoking any of these operations.

Note: For security reasons, you should not expand the authority of the hacmp group.


– If you use programs that access the ODM directly, they might need to be rewritten.

– If using the PSSP File Collections facility to maintain the consistency of /etc/group, the new hacmp group might be lost when the next file synchronization occurs. To avoid this, do the following actions:

• Turn off the PSSP File collection synchronization of /etc/group.
• Include the hacmp group in the master /etc/group file and propagate the change to all cluster nodes.

- User-defined cluster events might no longer work because emsvcs are no longer used by HACMP. To correct any problems, rewrite them using rmcd.

- The Cluster Lock Manager (cllockd, or cllockdES) and Concurrent Logical Volume Manager (cluster.es.clvm) are no longer available. The installation removes these files and corresponding filesets from the nodes.

- Note that Enhanced Concurrent Mode is only supported on AIX 5L V5.1 and later. SSA concurrent mode is not supported on 64-bit kernels. If you have SSA disks in concurrent mode, you cannot run 64-bit kernels until you have converted all volume groups to enhanced concurrent mode.

5.4 General migration steps

Here we describe the general migration steps for the various migration methods.

Non-disruptive upgrade

The general steps for a non-disruptive upgrade are as follows:

1. Stop cluster services with the Unmanage Resource Groups option on one node.

2. Update to the latest version of PowerHA on that node.

3. Restart cluster services on one node with the Automatically manage resource groups option.

4. Repeat for each node.

Attention: Use information retrieved directly from the ODM for informational purposes only, because the format within the stanzas might change with updates or new versions.

Thus, hardcoding ODM queries within user-defined applications is not supported and should be avoided.

Note: PSSP is no longer supported starting in PowerHA 5.5.


Rolling migration

The general steps for a rolling migration are as follows:

1. Stop cluster services with takeover on the first node to be migrated.

2. Upgrade the AIX and RSCT software (if necessary).

3. Upgrade the HACMP software (including latest PTFs).

4. Reboot (if the AIX or RSCT updates require it).

5. Reintegrate the node into the cluster and repeat steps on next node.

Snapshot migration

The general steps for a snapshot migration are as follows:

1. Save a cluster snapshot (a sketch for safeguarding the snapshot files follows these steps).

2. Stop cluster services on all cluster nodes.

3. Upgrade the AIX and RSCT software on all nodes (if necessary).

4. Deinstall the current version of HACMP and install PowerHA on all nodes (including latest PTFs if available).

5. Reboot (only if AIX or RSCT requires it).

6. Restore the snapshot (this will push the configuration to all nodes).

7. Start cluster services on one node at a time.
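Because the snapshot is your fallback if anything goes wrong, copy it off the default directory before deinstalling anything. The following is a minimal sketch; the snapshot name my55upgrade and the target directory /migration_backup are hypothetical, and the .odm/.info pair is simply where the snapshot utility stores its data by default.

# Snapshots are written to this directory by default
ls -l /usr/es/sbin/cluster/snapshots

# Copy the snapshot pair (and any custom scripts) to a location off the cluster nodes
mkdir -p /migration_backup
cp /usr/es/sbin/cluster/snapshots/my55upgrade.odm /migration_backup/
cp /usr/es/sbin/cluster/snapshots/my55upgrade.info /migration_backup/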

Offline migration

The general steps for an offline migration are as follows:

1. Stop cluster services on all cluster nodes.

2. Upgrade the AIX and RSCT software on all nodes (if necessary).

3. Upgrade to the latest PowerHA version (a command sketch follows these steps).

4. Reboot (only if AIX or RSCT requires it).

5. Start cluster services on one node at a time.
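The update itself is normally driven through the smitty update_all fast path, but a command-line equivalent can be useful for scripted or console-only upgrades. The following is a minimal sketch under the assumption that the PowerHA 5.5 installation images have been copied to /tmp/powerha55; adjust the directory to your environment.

# Command-line equivalent of smitty update_all:
# update all installed filesets from the images in the given directory
# (-Y accepts the software license agreements)
install_all_updates -Y -d /tmp/powerha55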

5.5 Scenarios tested

During the writing of this book we tested migrations to PowerHA 5.5 from two different releases:

- HACMP 5.4.1
- HACMP 5.3


We performed a test of each of the following migration path options:

1. Non-disruptive upgrade from HACMP 5.4.1 to PowerHA 5.5 on AIX 6.1

2. Rolling migration from HACMP 5.3 on AIX 5.3 to PowerHA 5.5 and AIX 6.1

3. Snapshot upgrade from HACMP 5.3 to PowerHA 5.5 on AIX 6.1

4. Offline upgrade from HACMP 5.3 to PowerHA 5.5 on AIX 6.1

5.5.1 Scenario 1: Non-disruptive upgrade (NDU) from HACMP 5.4.1 to PowerHA 5.5

Our test scenario consisted of migrating a two-node AIX 6.1.2 / HACMP 5.4.1 cluster to AIX 6.1.2 / PowerHA 5.5.0. We used two POWER6 technology-based 550 servers utilizing virtual devices for our cluster configuration.

The topology consisted of a single IP network utilizing IPAT via aliasing and a disk heartbeat network. We configured two resource groups. Our resource groups each contained a service IP, a shared data volume group, and an application server. Our resource groups were configured with different startup, fallover, and fallback policies in order to examine any behavioral differences during the NDU process.

Our storage consisted of a switch attached DS4800 that was accessed as a virtual SCSI device via a VIO server. We used enhanced concurrent volume groups in order to use disk heartbeating.

Non-disruptive upgrade

For this test, we used the following steps in an active cluster:

1. Stop cluster services with smitty clstop with option Unmanage Resource Groups on node ndu1.

2. Check the status of clstrmgrES on both nodes (lssrc -ls clstrmgrES) and the resource group status (clRGinfo).

3. Update to PowerHA 5.5.0.0 on node ndu1.

4. Start cluster services with smitty clstart with option Manage Resource Groups set to Automatically on node ndu1.

Note: We completed a non-disruptive upgrade on a PowerHA/XD cluster using the same generic steps. Non-disruptive upgrade only applies to cluster filesets. Note that when updating code such as GLVM (which is shipped as part of base AIX), the updated code will only be used when next loaded into the AIX kernel.


5. Check the status of clstrmgrES on both nodes (lssrc -ls clstrmgrES) and the resource group status (clRGinfo); a combined check sketch follows these steps.

6. Repeat steps 1 - 5 for node ndu2.
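Between each node's upgrade, a short check such as the following (run on each node) confirms that the cluster manager is stable, shows the installed version (vrmf), and lists resource group placement; it simply combines the commands used in steps 2 and 5.

# Cluster manager state and installed version on this node
lssrc -ls clstrmgrES | grep -E "Current state|vrmf|CLversion"

# Resource group placement across the cluster
/usr/es/sbin/cluster/utilities/clRGinfo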

Non-disruptive upgrade results

When a node is active, the clstrmgrES state and clRGinfo outputs will look like the output shown in Example 5-1.

Example 5-1 Normal cluster status prior to starting non-disruptive upgrade

root@ ndu1[/] lssrc -ls clstrmgrES
Current state: ST_STABLE
sccsid = "@(#)36 1.135.4.3 src/43haes/usr/sbin/cluster/hacmprd/main.C, hacmp.pe, 52haes_r541, 0827A_hacmp541 5/22/08 02:36:17"
i_local_nodeid 0, i_local_siteid -1, my_handle 1
ml_idx[1]=0 ml_idx[2]=1
There are 0 events on the Ibcast queue
There are 0 events on the RM Ibcast queue
CLversion: 9
local node vrmf is 5410
cluster fix level is "0"
The following timer(s) are currently active:
Current DNP values
DNP Values for NodeId - 1 NodeName - ndu1
    PgSpFree = 129102 PvPctBusy = 0 PctTotalTimeIdle = 99.465268
DNP Values for NodeId - 2 NodeName - ndu2
    PgSpFree = 129108 PvPctBusy = 0 PctTotalTimeIdle = 99.377601

root@ ndu1[/] clRGinfo
-----------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------
ndu_rg1        ONLINE                       ndu1
               OFFLINE                      ndu2

ndu_rg2        ONLINE                       ndu2
               OFFLINE                      ndu1

When a node is in the unmanaged state, the output will change to reflect the Forced down node list as shown in Example 5-2.


Example 5-2 Cluster status with one node in the unmanaged state

root@ ndu1[/] lssrc -ls clstrmgrES
Current state: ST_STABLE
sccsid = "@(#)36 1.135.4.3 src/43haes/usr/sbin/cluster/hacmprd/main.C, hacmp.pe, 52haes_r541, 0827A_hacmp541 5/22/08 02:36:17"
i_local_nodeid 0, i_local_siteid -1, my_handle 1
ml_idx[1]=0 ml_idx[2]=1
Forced down node list: ndu1
There are 0 events on the Ibcast queue
There are 0 events on the RM Ibcast queue
CLversion: 9
local node vrmf is 5410
cluster fix level is "0"
The following timer(s) are currently active:
Current DNP values
DNP Values for NodeId - 1 NodeName - ndu1
    PgSpFree = 129102 PvPctBusy = 0 PctTotalTimeIdle = 99.462205
DNP Values for NodeId - 2 NodeName - ndu2
    PgSpFree = 129108 PvPctBusy = 0 PctTotalTimeIdle = 99.430

root@ ndu1[/] clRGinfo
-----------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------
ndu_rg1        UNMANAGED                    ndu1
               UNMANAGED                    ndu2

ndu_rg2        ONLINE                       ndu2
               OFFLINE                      ndu1

When the PowerHA 5.5.0.0 filesets are installed (smitty update_all) using the non-disruptive upgrade method, clstrmgrES is restarted as shown in Example 5-3. Then cluster services should be started on the upgraded node with the option Manage Resource Groups set to Automatically.

Example 5-3 clstrmgrES state after upgrade of PowerHA filesets

root@ ndu1[/] lssrc -ls clstrmgrES
Current state: ST_INIT
sccsid = "@(#)36 1.135.1.91 src/43haes/usr/sbin/cluster/hacmprd/main.C, hacmp.pe, 53haes_r550, 0845B_hacmp550 10/21/08 13:31:47"


5.5.2 Scenario 2: Rolling migration from HACMP 5.3 to PowerHA 5.5

Our second scenario consisted of migrating a two-node HACMP 5.3 cluster running AIX 5.3 to PowerHA 5.5.0 on AIX 6.1.2. We used two POWER6 technology-based 550 servers using virtual devices for our cluster configuration.

The topology consisted of a single IP network utilizing IPAT via aliasing and a disk heartbeat network. We configured two resource groups. Our resource groups each contained a service IP, a shared data volume group, and an application server.

Our storage consisted of a switch-attached DS4800 that was accessed as a virtual SCSI device via a VIO server. We used enhanced concurrent volume groups in order to use disk heartbeating.

We verified migration prerequisites and followed the steps recommended for rolling migration described in 5.4, “General migration steps” on page 246 with the following practical results:

1. We had a good look at the cluster resource groups and their corresponding policies. That allows the system administrator to accurately predict resource group fallover during the upgrade. In our scenario there are two cluster nodes, each having one resource group, as shown in Example 5-4.

Example 5-4 Cluster resource groups before starting migration

xdsvc1[/]/usr/es/sbin/cluster/utilities/clRGinfo -v

Cluster Name: xdsvc

Resource Group Name: rg1
Startup Policy: Online On Home Node Only
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Fallback To Higher Priority Node In The List
Site Policy: ignore
Priority Override Information:
Primary Instance POL:

Note: While completing the non-disruptive upgrade testing, a defect was discovered with clusters containing resource groups that have a startup policy of Online On First Available Node, which caused a core dump of clstrmgrES.

At the time of writing, an APAR number was not available; however, the fix will be included in a future service pack. For further information, contact your local IBM Support Representative.


Node                         State
---------------------------- ---------------
xdsvc1                       ONLINE
xdsvc2                       OFFLINE

Resource Group Name: rg2
Startup Policy: Online On Home Node Only
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Never Fallback
Site Policy: ignore
Priority Override Information:
Primary Instance POL:
Node                         State
---------------------------- ---------------
xdsvc2                       ONLINE
xdsvc1                       OFFLINE

2. We recorded cluster name, version, and ID before starting the migration as shown in Example 5-5. We ensured that the information was identical on all cluster nodes.

Example 5-5 Recording cluster name, version, and ID before starting the migration

xdsvc1[/]odmget HACMPcluster

HACMPcluster:
        id = 1236696699
        name = "xdsvc"
        nodename = "xdsvc1"
        sec_level = "Standard"
        sec_level_msg = ""
        sec_encryption = ""
        sec_persistent = ""
        last_node_ids = ""
        highest_node_id = 0
        last_network_ids = ""
        highest_network_id = 0
        last_site_ids = ""
        highest_site_id = 0
        handle = 1
        cluster_version = 8
        reserved1 = 0
        reserved2 = 0
        wlm_subdir = ""
        settling_time = 0


        rg_distribution_policy = "node"
        noautoverification = 0
        clvernodename = ""
        clverhour = 0
        clverstartupoptions = 0

3. We recorded AIX and RSCT levels as shown in Example 5-6. We ensured that the information was identical on all cluster nodes.

Example 5-6 AIX and RSCT levels on cluster node xdsvc2

xdsvc2[/]lslpp -L|grep -i rsct.basic
  rsct.basic.hacmp           2.4.10.0   C   F   RSCT Basic Function (HACMP/ES
  rsct.basic.rte             2.4.10.0   C   F   RSCT Basic Function
  rsct.basic.sp              2.4.10.0   C   F   RSCT Basic Function (PSSP
  rsct.msg.en_US.basic.rte   2.4.0.0    C   F   RSCT Basic Msgs - U.S. English
xdsvc2[/]oslevel -s
5300-09-01-0847

4. We stopped HACMP services on node xdsvc2 gracefully with takeover. As a result of this operation, the resource group named rg2 was moved to node xdsvc1, as shown by the /usr/es/sbin/cluster/utilities/clRGinfo -v command in Example 5-7.

Example 5-7 Cluster resource groups after node xdsvc2 failed over

xdsvc1[/]/usr/es/sbin/cluster/utilities/clRGinfo -v

Cluster Name: xdsvc

Resource Group Name: rg1
Startup Policy: Online On Home Node Only
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Fallback To Higher Priority Node In The List
Site Policy: ignore
Priority Override Information:
Primary Instance POL:
Node                         State
---------------------------- ---------------
xdsvc1                       ONLINE
xdsvc2                       OFFLINE

Resource Group Name: rg2


Startup Policy: Online On Home Node Only
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Never Fallback
Site Policy: ignore
Priority Override Information:
Primary Instance POL:
Node                         State
---------------------------- ---------------
xdsvc2                       OFFLINE
xdsvc1                       ONLINE

5. We performed the AIX, RSCT and HACMP migration on node xdsvc2 as follows:

– Upgraded to AIX 6 (6100-02-01-0847). This required a system reboot.

– Updated and verified RSCT levels (rsct.basic.rte 2.5.2.0).

– Updated the HACMP filesets to 5.5 (cluster.es.server.rte 5.5.0.0). This upgrade did not require a system reboot.

The results of all upgrades are shown in Example 5-8.

6. We started cluster services on cluster node xdsvc2.

In Example 5-8, we show two additional important points:

– Despite the fact that HACMP has been upgraded, the HACMPcluster ODM class still contains the value 8, which is the version number corresponding to HACMP 5.3.

– Resource group rg2 behaved according to its policy. Because this resource group had the fallback policy set to Never Fallback, after node xdsvc2 rejoined the cluster, the resource group rg2 remained online on node xdsvc1.

Example 5-8 AIX, RSCT, and HACMP levels on cluster node xdsvc2

xdsvc2[/]oslevel -s
6100-02-01-0847
xdsvc2[/]lslpp -L|grep -i rsct.basic
  rsct.basic.hacmp           2.5.2.0    C   F   RSCT Basic Function (HACMP/ES
  rsct.basic.rte             2.5.2.0    C   F   RSCT Basic Function
  rsct.basic.sp              2.5.2.0    C   F   RSCT Basic Function (PSSP
  rsct.msg.en_US.basic.rte   2.5.0.0    C   F   RSCT Basic Msgs - U.S. English
xdsvc2[/]lslpp -L|grep -i cluster|grep server
  cluster.es.server.cfgast   5.5.0.0    C   F   ES Two-Node Configuration
  cluster.es.server.diag     5.5.0.0    C   F   ES Server Diags
  cluster.es.server.events   5.5.0.0    C   F   ES Server Events


  cluster.es.server.rte      5.5.0.0    C   F   ES Base Server Runtime
  cluster.es.server.testtool
  cluster.es.server.utils    5.5.0.0    C   F   ES Server Utilities
  cluster.msg.en_US.es.server
xdsvc2[/]odmget HACMPcluster|grep -i version
        cluster_version = 8
xdsvc2[/]/usr/es/sbin/cluster/utilities/clRGinfo
-----------------------------------------------------------------------------
Group Name     State                        Node           Type
-----------------------------------------------------------------------------
rg1            ONLINE                       xdsvc1
               OFFLINE                      xdsvc2

rg2            OFFLINE                      xdsvc2
               ONLINE                       xdsvc1

7. We stopped HACMP on node xdsvc1 gracefully with takeover. As a result of this operation, all resource groups were acquired by xdsvc2.

8. We performed the AIX, RSCT, and HACMP migration on node xdsvc1 as follows:

– We upgraded to AIX 6 (6100-02-01-0847). This required a system reboot.

– We updated and verified RSCT levels (rsct.basic.rte 2.5.2.0).

– We updated the HACMP filesets to 5.5 (cluster.es.server.rte 5.5.0.0). This upgrade did not require a system reboot.

The results of all upgrades are shown in Example 5-9.

Example 5-9 AIX, RSCT, and HACMP levels on cluster node xdsvc1

xdsvc1[/]oslevel -s
6100-02-01-0847
xdsvc1[/]lslpp -L|grep rsct.basic
  rsct.basic.hacmp           2.5.2.0    C   F   RSCT Basic Function (HACMP/ES
  rsct.basic.rte             2.5.2.0    C   F   RSCT Basic Function
  rsct.basic.sp              2.5.2.0    C   F   RSCT Basic Function (PSSP
xdsvc1[/]lslpp -L|grep -i cluster|grep server
  cluster.es.server.cfgast   5.5.0.0    C   F   ES Two-Node Configuration
  cluster.es.server.diag     5.5.0.0    C   F   ES Server Diags
  cluster.es.server.events   5.5.0.0    C   F   ES Server Events
  cluster.es.server.rte      5.5.0.0    C   F   ES Base Server Runtime
  cluster.es.server.testtool
  cluster.es.server.utils    5.5.0.0    C   F   ES Server Utilities
  cluster.msg.en_US.es.server
xdsvc1[/]odmget HACMPcluster|grep -i version


cluster_version = 8

9. We started cluster services on cluster node xdsvc1. Following this operation, a few additional points shown in Example 5-10 are worth noticing:

– Resource group rg1 behaved according to its policy. Because this resource group had the fallback policy set to Fallback To Higher Priority Node In The List, after node xdsvc1 rejoined the cluster, the resource group rg1 migrated back to node xdsvc1.

– Previous cluster ODM definitions have been preserved and updated where appropriate. For instance, on both cluster nodes the HACMPcluster ODM class has been upgraded and contains both the original cluster ID and the new version number, which now has the value 10, corresponding to HACMP 5.5.

– clconvert.log confirms that the migration has completed successfully.

Example 5-10 Resource group states, HACMP log messages, and ODM entries following the start of HACMP services on cluster node xdsvc1

xdsvc2[/]/usr/es/sbin/cluster/utilities/clRGinfo
-----------------------------------------------------------------------------
Group Name     State                        Node           Type
-----------------------------------------------------------------------------
rg1            ONLINE                       xdsvc1
               OFFLINE                      xdsvc2

rg2            ONLINE                       xdsvc2
               OFFLINE                      xdsvc1

xdsvc2[/]odmget HACMPcluster

HACMPcluster:
        id = 1236696699
        name = "xdsvc"
        nodename = "xdsvc2"
        sec_level = "Standard"
        sec_level_msg = ""
        sec_encryption = ""
        sec_persistent = ""
        last_node_ids = "1, 2"
        highest_node_id = 2
        last_network_ids = ""
        highest_network_id = 0
        last_site_ids = ""
        highest_site_id = 0
        handle = 2
        cluster_version = 10
        reserved1 = 0


        reserved2 = 0
        wlm_subdir = ""
        settling_time = 0
        rg_distribution_policy = "node"
        noautoverification = 0
        clvernodename = ""
        clverhour = 0
        clverstartupoptions = 0
xdsvc1[/]odmget HACMPcluster

HACMPcluster:
        id = 1236696699
        name = "xdsvc"
        nodename = "xdsvc1"
        sec_level = "Standard"
        sec_level_msg = ""
        sec_encryption = ""
        sec_persistent = ""
        last_node_ids = "1, 2"
        highest_node_id = 2
        last_network_ids = ""
        highest_network_id = 0
        last_site_ids = ""
        highest_site_id = 0
        handle = 1
        cluster_version = 10
        reserved1 = 0
        reserved2 = 0
        wlm_subdir = ""
        settling_time = 0
        rg_distribution_policy = "node"
        noautoverification = 0
        clvernodename = ""
        clverhour = 0
        clverstartupoptions = 0

xdsvc1[/]tail -n 15 /tmp/clconvert.log
***************************

Done execution of ODM manipulator scripts.

Cleanup:
    Writing resulting odms to /etc/es/objrepos.
    Restoring original ODMDIR to /etc/es/objrepos.
    Removing temporary directory /tmp/tmpodmdir.

Exiting cl_convert.

Exiting with error code 0. Completed successfully.


--------- end of log file for cl_convert: Fri Mar 6 13:20:35 CST 2009

5.5.3 Scenario 3: Snapshot upgrade from HACMP 5.3 to PowerHA 5.5

Our third test scenario consisted of the migration of a two-node AIX 6.1.2.0 / HACMP 5.3 SP10 cluster to AIX 6.1.2.0 / PowerHA 5.5. We used two Power 550s utilizing logical partitions for our cluster nodes (kim and val) and virtual devices for our cluster configuration.

The topology comprised a single IP network, utilizing IP address aliasing, and a disk heartbeat network. We also had a single resource group, with a service IP, a shared data volume group, and an application server. A custom event that we called cetest, a user-defined event called cutest, and a custom heartbeat rate for the Ethernet network were created to ensure that these were preserved after the upgrade.

Our storage consisted of a switch attached DS4800 that was accessed as a virtual SCSI device via a VIO server. We used enhanced concurrent volume groups in order to use disk heartbeating.

Snapshot migration

For this test, we used the following steps while the cluster was active:

1. We created a cluster snapshot from our active cluster.

2. We stopped HACMP gracefully on each node (kim and val).

3. We made a copy of the snapshot file from the default /usr/es/sbin/cluster/snapshots directory.

4. We ran smitty remove and de-installed all cluster.* filesets from each node.

5. We installed PowerHA 5.5 using smitty install_all.

6. We restored the snapshot using smitty cm_cfg_snap_menu and selecting Restore the Cluster Configuration from a Snapshot.

Important: There is a known defect when performing a snapshot restoration in the 5.5 base code level. It is corrected in PowerHA SP1. Additional details follow in “Snapshot migration results”.

Note: Running a clconvert_snapshot against the snapshot prior to restoring it is no longer required.


7. We started cluster services on one node at a time.

Snapshot migration results

We encountered an error while restoring the snapshot on 5.5.0.0. The error is shown in Example 5-11.

Example 5-11 Snapshot migration error#1

clsnapshot: Removing any existing temporary HACMP ODM entries...
clsnapshot: Creating temporary HACMP ODM object classes...
clsnapshot: Adding HACMP ODM entries to a temporary directory..
migcheck[471]: cl_connect() error, nodename=kim, rc=-1

ERROR: Internode Communication check failed,
check clcomd.log file for more Information.

We stopped and restarted clcomdES on both systems and tried restoring the snapshot again. We then encountered a different error as shown in Example 5-12.

Example 5-12 Snapshot migration error#2

kim
clsnapshot: Unable to discover the name of the local node.
Please check the cluster configuration.

clsnapshot: Local node not properly configured for HACMP for AIX.

clsnapshot: Unable to discover the name of the local node.
Please check the cluster configuration.

clsnapshot: Unable to save current snapshot. Aborting.

Additional attempts to restore the snapshot resulted in the same error above. We removed the cluster configuration using SMIT and started restoring the snapshot again. This now resulted in success.

Upon further research, we discovered that it was a known issue (see IZ40980) that is resolved in PowerHA 5.5 SP1. We repeated the entire snapshot migration procedure, this time including SP1 in our new install of 5.5. This too resulted in successfully restoring the snapshot. An abbreviated version of the results is shown in Example 5-13.


Example 5-13 Snapshot migration success

clsnapshot: Removing any existing temporary HACMP ODM entries...
clsnapshot: Creating temporary HACMP ODM object classes...
clsnapshot: Adding HACMP ODM entries to a temporary directory..
clsnapshot: Verifying configuration using temporary HACMP ODM entries...
Verification to be performed on the following:
        Cluster Topology
        Cluster Resources

Retrieving data from available cluster nodes. This could take a few minutes.

Start data collection on node kim
Start data collection on node val
Collector on node val completed
Collector on node kim completed
Data collection complete

Verifying Cluster Topology...

Completed 10 percent of the verification checks

Completed 100 percent of the verification checks

Remember to redo automatic error notification if configuration has changed.

Verification has completed normally.

clsnapshot: Removing current HACMP ODM entries...

clsnapshot: Adding new HACMP ODM entries...

clsnapshot: Synchronizing cluster configuration to all cluster nodes...

Committing any changes, as required, to all available nodes...
Adding any necessary HACMP for AIX entries to /etc/inittab and /etc/rc.net for IP Address Takeover on node kim.
Adding any necessary HACMP for AIX entries to /etc/inittab and /etc/rc.net for IP Address Takeover on node val.

Verification has completed normally.

clsnapshot: Succeeded applying Cluster Snapshot: 5309upgrade


Upon starting the cluster successfully on each node, we verified that the version level, as shown in 5.7.2, “Reviewing the cluster version in the HACMP ODM” on page 266, had been updated. We also checked to make sure our custom defined events and custom heartbeat settings had been maintained via both SMIT and the ODM. We also performed both manual testing and automated testing via the cluster test tool.

Overall, the snapshot migration scenario was quick and easy. Although it did require a cluster wide outage, most of the time spent was simply for removing the old version of HACMP and installing the new PowerHA. This included, of course, testing the cluster, which would apply no matter what migration/upgrade path is chosen.

If taking a full cluster outage is acceptable, then the offline upgrade method is faster. It does not require removing the current version first, nor restoring the snapshot manually as covered in 5.5.4, “Scenario 4: Offline upgrade from HACMP 5.3 to PowerHA 5.5” on page 262.

Tip: If you are planning to use the snapshot method, be aware that when you try to reapply it to the nodes, the first communication path that HACMP will try to use on each node is the one specified in the HACMPnode object class:

#odmget HACMPnode

HACMPnode:
        name = "camryn"
        object = "COMMUNICATION_PATH"
        value = "10.10.10.10"
        node_id = 2
        node_handle = 2
        version = 10

This value is originally set whenever the nodes were first defined to the cluster and a COMMUNICATION PATH was specified. A best practice recommendation is to always set this path as the persistent IP address. If for any reason that IP is not available when you try to apply the snapshot manually, set that alias on one of the interfaces.

The file /usr/es/sbin/cluster/etc/rhosts is the last file checked for communication. It can be manually updated with all of the cluster IP addresses if necessary. The IPs will include base, service, and persistent IPs.
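As a sketch of that check, the commands below use only utilities referenced in this chapter; the interface en0, the 10.10.10.10 address, and the class C netmask are illustrative values that must be adjusted to your environment before setting the alias.

# Confirm which address HACMP will try first for this node
odmget HACMPnode | grep -p COMMUNICATION_PATH

# If that address is not currently configured on any interface,
# add it as an IP alias on one of them (hypothetical values shown)
ifconfig en0 inet 10.10.10.10 netmask 255.255.255.0 alias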


5.5.4 Scenario 4: Offline upgrade from HACMP 5.3 to PowerHA 5.5

For our fourth test scenario we used the same two nodes, kim and val, as we used in 5.5.3, “Scenario 3: Snapshot upgrade from HACMP 5.3 to PowerHA 5.5” on page 258. To set up, we simply removed the PowerHA 5.5.0.1 filesets, re-installed HACMP 5.3 SP10, and applied a copy of the snapshot that we previously created to recreate the cluster.

Offline migration

For this test, we used the following steps while the cluster was active:

1. We created a cluster snapshot from our active cluster.

2. We stopped HACMP gracefully on each node (kim and val).

3. We made a copy of the snapshot file from the default /usr/es/sbin/cluster/snapshots directory.

4. We upgraded to PowerHA 5.5 using smitty update_all.

5. We started cluster services on one node at a time.

Offline migration results

Upon starting the cluster successfully on each node, we verified that the version level, as shown in 5.7.2, “Reviewing the cluster version in the HACMP ODM” on page 266, had been updated. We also checked to make sure that our custom defined events and custom heartbeat settings had been maintained via both SMIT and the ODM. We also performed both manual testing and automated testing via the cluster test tool.

Overall, the offline migration scenario was very quick and easy. Although it did require a cluster wide outage, it was the most efficient in both time and process. If it is possible for you to take the cluster offline, we recommend this migration method.

5.6 Post-migration steps

We recommend that you take the following actions after a migration:

- Check for any HACMP filesets still installed from the previous release. It is easy to leave behind old documentation filesets, which, although they will not cause any problems, should be upgraded as well.

- Test the cluster fallover and recovery behavior after any migration. The cluster test tool gives you the ability to run multiple tests in one run and can be customized to include additional tests.


5.7 Troubleshooting a failed migration

Migrating a cluster is an integral part of cluster administration. In order to complete a successful upgrade, be sure to carefully plan and review your migration path. You should always test your procedures on a test cluster prior to performing them on a production cluster. This section was written with that in mind.

5.7.1 Backing out of a failed migration

Following the steps in the pre-migration checklist is the best way to avoid running into a migration problem. However, should you encounter a problem, consider retracing your migration steps. In this section we provide suggestions that you can follow in order to revert back to your old configuration.

Post-migration tip:

We used a NIM server for our AIX/HACMP migrations as well as the persistent IP for each NIM client machine definition. As a result, we discovered that the installation set the persistent IP label as the base address for one of the interfaces and removed the base address. The hostname was also set to the machine name specified in the NIM client machine definition.

We corrected these changes by doing the following steps:

1. We issued the following command to hard set the interface back to base address:

smitty chinet

2. We commented out the lines added in the /etc/rc.net file:

/bin/hostname <hostname>
/usr/sbin/ifconfig <en#> inet <IP> netmask 255.255.255.0

3. We issued the following command to set the hostname back:

hostname <name>

If using NIM, make sure that you check for these changes, otherwise you will experience mixed results within HACMP after starting cluster services.

This issue can be avoided by using NIM customization (provided that you write your own customization script).


If the HACMP cluster migration that you are performing involves the upgrade of AIX and RSCT, your cluster will be running mixed or in a hybrid state as soon as the first upgraded node is reintegrated into the cluster. Should this node or any remaining nodes in the cluster fail to integrate into the cluster, there is a chance that your cluster migration will not be easily corrected and that you will need to revert back to your old configuration.

When you are in this stage, you are limited to the following options:

Option 1 Power up the node and attempt to identify and troubleshoot the problem.

Option 2 Revert back to your old configuration.

Option 3 Deinstall the old HACMP software, install the new code, and follow the snapshot migration path (assuming that you have a snapshot available).

Option 1: Troubleshooting the migration failure

In the event that the node halted, power up the node. With the machine powered up and no HACMP active on it, you can try to identify the cause of the clexit by reviewing some of the different cluster and system logs. We recommend the following:

- Check the error report (errpt -a | more).

Record the exact time of the failure based on the errors logged. Analyze any recent pertinent errors and try to identify any abnormal daemon exits or CORE files generated.

- Check the /var/hacmp/adm/cluster.log file for any related cluster events.

Analyze the log for any cluster events taking place during the time of the failure. Remember to check the logs on the other cluster nodes for any possible differences.

- Check /tmp/clstrmgr.debug for any messages about the clstrmgrES exit.

If the cluster manager exits abnormally, a machine will typically halt. The majority of the time, some type of an exit message will be logged at the end of this file. The message can give you or your support representatives an idea as to the cause of the failure.

Note: If you encounter a problem, depending on what stage of the migration you are in, you might not have to restore the entire cluster.


- Based on the errpt messages, you can consider analyzing the group services log files in /var/hacmp/log to try to identify a problem.

An analysis of the logs during the time the problem occurred can help you identify the cause of an abnormal daemon exit or any other problems recorded.

Generally, an in-depth review of these log files should be performed by an experienced HACMP administrator or by an IBM software support representative. Keep in mind that the review of these logs during the migration process will prevent you from continuing and will delay the overall migration process.

Option 2: Reverting back to the old configuration

If your migration involves only the upgrade of the HACMP software and you experience a failure in the middle of a rolling migration, the quickest method to revert to your previous configuration is to deinstall the HACMP filesets from the nodes that have been upgraded thus far. Then reinstall the old HACMP version onto those nodes and synchronize the configuration back from the remaining active nodes.

If you are upgrading the AIX and RSCT software along with HACMP, reverting to the old configuration will be a bit more difficult. You will need to stop HACMP on all upgraded nodes and revert them to the previous version of AIX, either via a mksysb or an alternate disk installation image taken prior to the start of the migration. After restoring the nodes from your backup source, you should perform a verification and synchronization from the currently active nodes. Upon successful completion, you can reintegrate the restored nodes back into the cluster.

After successfully restoring the cluster back to the original code version, you should review your migration steps and ensure that you have reviewed and performed the steps discussed in 5.2, “Prerequisites” on page 243.

Option 3: Deinstalling HACMP and performing snapshot migration

In order to take advantage of this option, you need to have a valid snapshot from when the cluster was at the version of HACMP that you are upgrading from. The steps involved are basically the same as the general snapshot migration steps covered in 5.4, “General migration steps” on page 246.

1. Stop cluster services on all cluster nodes.

2. Upgrade the AIX and RSCT software on all nodes (if necessary).

Tip: If you are contacting IBM software support, consider collecting a snap -e from the node that experienced the failure in order to expedite the analysis and resolution of your problem.


3. Deinstall the current version of HACMP and install PowerHA 5.5 on all nodes (including latest PTFs if available).

4. Reboot the nodes (only necessary if AIX or RSCT updates require it).

5. Restore the cluster snapshot.

6. Start cluster services on one node at a time.

7. Now test the cluster.

5.7.2 Reviewing the cluster version in the HACMP ODM

To review the version of your cluster:

1. Run odmget HACMPcluster or odmget HACMPnode. After the migration to PowerHA 5.5 is completed, the version level should be equal to 10 (see Table 5-2).
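For example, a quick check on any node might look similar to the following (output abbreviated; the values shown assume a completed migration to PowerHA 5.5):

# odmget HACMPcluster | grep cluster_version
        cluster_version = 10
# odmget HACMPnode | grep version | sort -u
        version = 10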

Table 5-2 HACMP cluster version ODM stanzas

HACMP Version      In HACMPcluster           In HACMPnode
PowerHA 5.5        cluster_version = 10      version = 10
HACMP 5.4.1        cluster_version = 9       version = 9
HACMP 5.4          cluster_version = 9       version = 9
HACMP 5.3          cluster_version = 8       version = 8
HACMP 5.2          cluster_version = 7       version = 7

If the version was not updated in a rolling migration after the last node integrated into the cluster, check the clconvert.log for any migration problems reported. Only as a last resort should the ODM values be modified by an administrator or IBM software support personnel.

Note: If you are having problems beyond those mentioned in this section, consider contacting IBM software support for further assistance.

Note: Although this practice is not supported, if you are still considering manually editing the ODM values to correct a discrepancy, you should call IBM software support before proceeding.


5.7.3 Troubleshooting a stalled snapshot application

In some instances of the snapshot migration, you might encounter a situation in which the verification fails and the snapshot fails to restore. If you restore a snapshot and see an error, review the log files and check whether it can be corrected by the HACMP verification utility. Be advised that even if the apply fails, some of the configuration might be updated into the HACMP ODM classes.

If the error meets the criteria of a discrepancy that the auto-correct feature in a verification will resolve, you can continue the upgrade process by applying the snapshot with the forced option. Upon completion, run the cluster synchronization and verification process with the option Automatically Correct Errors during the Cluster Verification set to Interactively.

You might see the following warnings and errors:

WARNING: “The NFS mount/Filesystem specified for resource group rg1 is using incorrect syntax for specifying an NFS cross mount: /mnt/fs1”.

ERROR: “Disk Heartbeat Networks have been defined, but no Disk Heartbeat Devices. You must configure one device for each node in order for a Disk Heartbeat network to function”.

In these cases, apply the snapshot forcefully to continue the upgrade process. Although the apply fails, the cluster remains intact. In this instance, force applying the snapshot is safe.

5.7.4 DARE error during synchronization

If, after the migration completes, you try to synchronize and see the following message:

cldare: Migration from HACMPversion to HACMP 5.3 Detected. cldare cannot be run until migration has completed.

You should first check the clconvert.log for any failures and then proceed with the following steps:

1. Enter smitty hacmp.

2. Go to Problem Determination Tools.

3. Select Restore HACMP Configuration Database from Active Configuration.

Note: Only use the force option if you are sure that the error encountered can be automatically corrected.


If this still does not resolve the issue, you can check for zero-length /usr/es/sbin/cluster/.esmig lock files on each of the cluster nodes. These files are normally automatically removed when the last node integrates into the cluster.
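A minimal check, run on each cluster node, might look like the following; remove the lock file only after ruling out other problems (see the related note later in this section):

# ls -l /usr/es/sbin/cluster/.esmig
# rm /usr/es/sbin/cluster/.esmig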

5.7.5 Error: config_too_long during migration

If the cluster was in working order before starting your migration process, it is unlikely that your cluster will enter recovery mode and encounter a config_too_long message. In the event that this occurs, consider the following HACMP backup behavior.

Various files are saved in the /usr/lpp/save.config directory during the upgrade process, including:

/usr/lpp/save.config/usr/es/sbin/cluster/events/node_up.rp
/usr/lpp/save.config/usr/es/sbin/cluster/events/node_down.rp

If, after the last node integrates into the cluster at the end of the migration, the ODM stanzas are not automatically updated, you could potentially encounter a config_too_long message because the processing within the cluster events will be unable to find the original path to these events in /usr/es/sbin/cluster/events.

After checking the clconvert.log file for any migration failures, a potential work-around is to remove the /usr/lpp/save.config portion out of the stanzas. This operation should only be performed as a last resort under the supervision of IBM support personnel.

Note: Removing these files will remove the lock. However, if these files were not removed through the standard procedure, something else might have gone wrong and you should consider contacting IBM software support before proceeding.

Important: If these paths were not automatically corrected during the migration, you might potentially have other things that failed to convert.

In this situation, consider contacting IBM software support.


Part 3 Cluster administration

In Part 3, we present cluster administrative tasks and scenarios for modifying and maintaining a PowerHA cluster.

We discuss the following topics:

• Cluster maintenance
• Cluster management
• Cluster security


Chapter 6. Cluster maintenance

In this chapter we provide basic guidelines to follow while you are planning and performing maintenance operations in a PowerHA cluster. The goal is to keep the cluster applications as highly available as possible. We use functionality within PowerHA and AIX to perform these operations. Of course, the scenarios are not exhaustive.

In this chapter, we assume AIX best practices for troubleshooting, including monitoring the error log. However, we do not cover how to determine what the problem is, whether you are dealing with problems after they are discovered or performing preventive maintenance.

We discuss the following topics:

• Change control and testing
• Starting and stopping the cluster
• Resource group and application management
• Scenarios
• Cluster Test Tool


6.1 Change control and testing

Change control is imperative to provide high availability in any system, but it is even more crucial in a clustered environment.

6.1.1 Scope

Change control is above and beyond documented procedures. It encompasses several things and is not optional. Change control includes, but is not limited to:

• Limit root access
• Thoroughly documented and tested procedures
• Proper planning and approval of all changes

6.1.2 Test cluster

A test cluster is important both for maintaining proper change control and for the overall success of the production cluster. Test clusters allow thorough testing of administrative and/or maintenance procedures in an effort to find problems before the problem reaches the production cluster. Test clusters should not be considered a luxury, but a must-have.

Many current PowerHA customers have a test cluster, or at least started out with a test cluster. However, over time these cluster nodes have become utilized within the company in some form. To use these systems requires a scheduled maintenance window much like the production cluster. If that is the case, do not be fooled, as it truly is not a test cluster.

A test cluster, ideally, would be at least the same AIX, PowerHA, and application level as the production cluster. The hardware should also be as similar as possible. In most cases it is not practical to fully mirror the production environment, especially when there are multiple production clusters. There are several things that can be done to maximize a test cluster when there are multiple clusters that have varying levels of software.

Using logical partitioning (LPAR), virtual I/O servers (VIOS), and multiple varying rootvg images, via alt_disk_install or multibos, has become common practice. Virtualization allows a test cluster to be created easily with very few physical resources, even within the same physical machine. The multi-boot option allows customers to easily change cluster environments by simply booting the partition from another image. This also allows testing of many software procedures such as:

• Applying AIX maintenance
• Applying PowerHA fixes
• Applying application maintenance


This type of test cluster would require at least one disk, per image, per LPAR. For example, if the test cluster had two nodes and three different rootvg images, it would require a minimum of six hard drives. This is still far easier than having six separate nodes in three different test clusters.
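As a simple illustration of the multi-boot approach, an alternate rootvg image can be cloned to a spare disk with alt_disk_copy (this assumes the bos.alt_disk_install filesets are installed; hdisk1 is only an example disk name), and the boot list reviewed afterward:

# alt_disk_copy -d hdisk1
# bootlist -m normal -o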

A test cluster also allows testing of hardware maintenance procedures. These procedures include, but are not limited to:

• System firmware updates
• Adapter firmware updates
• Adapter replacement
• Disk replacement

Additional testing can be accomplished by utilizing the cluster test tool and event emulation.

6.2 Starting and stopping the cluster

Starting cluster services refers to the process of starting the RSCT subsystems required by PowerHA, and then the PowerHA daemons that enable the coordination required between nodes in a cluster. During startup, the cluster manager runs the node_up event and resource groups are acquired. Stopping cluster services refers to stopping the same daemons on a node and might or might not cause the execution of additional PowerHA scripts, depending on the type of shutdown you perform.

Starting with HACMP V5.3, the cluster manager process (clstrmgrES) is always running, regardless of whether the cluster is online or not. It can be in one of the following states, as displayed by running the lssrc -ls clstrmgrES command (a sample is shown after the list of states):

NOT_CONFIGURED The cluster is not configured or node is not synced.

ST_INIT The cluster is configured but not active on this node.

ST_STABLE The cluster services are running with resources online.

ST_JOINING The cluster node is joining the cluster.

ST_VOTING The cluster nodes are voting to decide event execution.

ST_RP_RUNNING The cluster is running a recovery program.

RP_FAILED A recovery program event script has failed.

ST_BARRIER Clstrmgr is in between events waiting at the barrier.

ST_CBARRIER Clstrmgr is exiting a recovery program.

ST_UNSTABLE The cluster is unstable, usually due to an event error.
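For example, on a node with its resources online, the state can be checked as follows (output trimmed; the exact fields vary by level):

# lssrc -ls clstrmgrES | grep -i state
Current state: ST_STABLE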


Changes in the state of the cluster are referred to as cluster events. The Cluster Manager monitors local hardware and software subsystems on each node for events such as an application failure event. In response to such events, the Cluster Manager runs one or more event scripts such as a restart application script. Cluster Managers running on all nodes exchange messages to coordinate required actions in response to an event.

During maintenance periods it might be necessary to stop and start cluster services. But before doing so, you need to understand the interactions this causes between nodes and the impact on your system’s availability. The cluster must be synchronized and verification should detect no errors. The following section briefly describes the processes themselves and then the processing involved in startup or shutdown of these services. Later in this section we describe the procedures necessary to start or stop cluster services on a node.

6.2.1 Cluster Services

The main PowerHA and RSCT daemons are as follows:

• Cluster Manager daemon (clstrmgrES)

This is the main PowerHA daemon. It maintains a global view of the cluster topology and resources and runs event scripts in response to changes in the state of nodes, interfaces, or resources (or when the user makes a request).

The Cluster Manager receives information about the state of interfaces from Topology Services. The Cluster Manager maintains updated information about the location, and status of all resource groups. The Cluster Manager is a client of Group Services, and uses the latter for reliable inter-daemon communication.

• Cluster Communication Daemon (clcomdES)

This daemon, first introduced in HACMP 5.1, provides secure communication between cluster nodes for all cluster utilities such as verification and synchronization and system management (C-SPOC). The clcomd daemon is started automatically at boot time by the init process. Starting with HACMP 5.2, clcomdES must be running before any cluster services can be started.

• Cluster Information Program (clinfoES)

This daemon provides status information about the cluster to cluster nodes and clients and calls the /usr/es/sbin/cluster/etc/clinfo.rc script in response to a cluster event. The clinfo daemon is optional on cluster nodes and clients.


• Cluster Topology Services Subsystem

The RSCT Topology Services subsystem monitors the status of network interfaces and publishes the state to clients, who access the information through Group Services membership. The main daemon is the hatsd. Topology Services also includes network interface modules hats_nim* that send and receive heartbeats. All cluster nodes must run the Topology Services subsystem.

• Cluster Group Services Subsystem

This RSCT subsystem provides reliable communication and protocols required for cluster operation. Clients are distributed daemons, such as the PowerHA Cluster Manager and the Enhanced Concurrent Logical Volume Manager. All cluster nodes must run the hagsd daemon.

• Cluster Globalized Server daemon (grpglsmd)

This RSCT daemon operates as a Group Services client; its function is to make switch adapter membership global across all cluster nodes. All cluster nodes must run the grpglsmd daemon.

• Resource Monitoring and Control Subsystem

This RSCT subsystem acts as a resource monitor for the event management subsystem and provides information about the operating system characteristics and utilization. The RMC subsystem must be running on each node in the cluster. By default the rmcd daemon is set up to start from inittab when it is installed. The rc.cluster script ensures the RMC subsystem is running.
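A quick way to confirm that these subsystems are active on a node is to query the SRC. The subsystem names below are typical for this level but can vary slightly:

# lssrc -a | egrep "clstrmgrES|clcomdES|clinfoES|topsvcs|grpsvcs|grpglsm|ctrmc"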

6.2.2 Starting cluster services

In this section we describe the startup options of cluster services on any single node, multiple nodes, or even all nodes. You should always start cluster services by using SMIT. The SMIT panel can be seen in Figure 6-1.

Executing as the root user, perform the following steps to start the cluster services on a node:

1. Execute the SMIT fast path smitty clstart and press Enter.

2. Enter field values as follows:

Start now, on system restart or both:

Indicate whether you want to start cluster services and the clinfoES when you commit the values on this panel by pressing Enter (now), when the operating system reboots (on system restart), or on both occasions.


Start Cluster Services on these nodes:

Enter the name(s) of one or more nodes on which you want to start cluster services. Alternatively, you can select nodes from a picklist. When entering multiple nodes manually separate the names with a comma as shown in Figure 6-1 on page 276.

Manage Resource Groups:

The options are Automatically or Manually:

– Automatically, which is the default, brings resource group(s) online according to the resource groups’ configuration settings and the current cluster state and starts managing the resource group(s) and applications for availability.

– Manually will not activate resource groups while the cluster services on the selected node are started. After you start cluster services, you can bring any resource groups online or offline, as needed.

BROADCAST message at startup?:

Indicate whether you want to send a broadcast message to all nodes when the cluster services start.

Figure 6-1 Start Cluster Services Menu

Start Cluster Services

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                           [Entry Fields]
* Start now, on system restart or both                      now             +
  Start Cluster Services on these nodes                     [Maddi,Patty]   +
* Manage Resource Groups                                    Automatically   +
  BROADCAST message at startup?                             true            +
  Startup Cluster Information Daemon?                       true            +
  Ignore verification errors?                               false           +
  Automatically correct errors found during                 Interactively   +
  cluster start?

F1=Help       F2=Refresh      F3=Cancel       F4=List
F5=Reset      F6=Command      F7=Edit         F8=Image
F9=Shell      F10=Exit        Enter=Do


Startup Cluster Information Daemon?

Indicate whether you want to start the clinfo daemon. If your application uses Clinfo, if you use the clstat monitor, or you want to run event emulation, set this field to true. Otherwise, set it to false.

Ignore Verification Errors?

Set this value to true only in case verification reports an ERROR and this ERROR does not put at risk the overall cluster functionality. Set this value to false to stop all selected nodes from starting cluster services if verification finds errors on any node.

Automatically correct errors found during cluster start?

The options are Yes, No, and Interactively. This is also known as auto corrective actions. Choosing Yes will fix errors automatically, without prompting. No will not fix them and will prevent cluster services from starting if errors are encountered. The Interactively option will prompt the user during startup about each error found, letting the user choose whether or not to fix it.

Upon completing the fields and pressing Enter, the system starts the cluster services on the nodes specified, activating the cluster configuration that you have defined. The time that it takes the commands and scripts to run depends on your configuration—that is, the number of disks, the number of interfaces to configure, the number of file systems to mount, and the number of applications being started.

Note: In a production environment, it is generally not considered a best practice to have PowerHA services start up automatically on system restart.

The reason for this is directly related to the aftermath of a system failure. If a resource group owning system crashes, and AIX is set to reboot after a crash, it could restart cluster services in the middle of a current takeover. Depending on the cluster configuration this could cause resource group contention, resource group processing errors, or even a fallback to occur, any of which could extend an outage.

However during test and maintenance periods, and even on dedicated standby nodes, it might be convenient to use this option.

Note: There are situations when choosing interactively will indeed correct some errors. More details can be found in 7.6.4, “Running automatically corrective actions during verification” on page 424.


During the node_up event, resource groups are acquired. The time it takes to run each node_up event is dependent on the resource processing during the event. The node_up events for the joining nodes are processed sequentially.

When the command completes running and PowerHA cluster services are started on all nodes specified, SMIT displays a command status window. Note that when the SMIT panel indicates the completion of the cluster startup, event processing in most cases has not yet completed. To verify the nodes are up you can use clstat, WebSMIT, or even tail the hacmp.out file on any node. More information about this can be found in 7.7.1, “Cluster status checking utilities” on page 428.
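For example, either of the following can be run from any node to watch the cluster settle (the hacmp.out location shown is the usual default for this level, but it can differ if logs were redirected):

# /usr/es/sbin/cluster/clstat -a
# tail -f /var/hacmp/log/hacmp.out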

6.2.3 Stopping cluster services

The following steps describe the procedure for stopping cluster services on a single node, multiple nodes, or all nodes in a cluster by using the C-SPOC utility on one of the cluster nodes. C-SPOC stops the nodes sequentially, not in parallel. If any node specified to be stopped is inactive, the shutdown operation aborts on that node. Just like starting services, SMIT should always be used to stop cluster services. The SMIT panel to stop cluster services is shown in Figure 6-2.

Figure 6-2 Stop Cluster Services Menu

Stop Cluster Services

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                           [Entry Fields]
* Stop now, on system restart or both                       now                     +
  Stop Cluster Services on these nodes                      [Maddi,Patty]           +
  BROADCAST cluster shutdown?                               true                    +
* Select an Action on Resource Groups                       Bring Resource Group>   +

F1=Help       F2=Refresh      F3=Cancel       F4=List
F5=Reset      F6=Command      F7=Edit         F8=Image
F9=Shell      F10=Exit        Enter=Do


To stop cluster services:

1. Enter the fast path smitty clstop and press Enter.

2. Enter field values in the SMIT panel as follows:

Stop now, on system restart or both

Indicate whether you want the cluster services to stop now, at restart (when the operating system reboots), or on both occasions. If you select restart or both, the entry in the /etc/inittab file that starts cluster services is removed. Cluster services will no longer come up automatically after a reboot.

BROADCAST cluster shutdown?

Indicate whether you want to send a broadcast message to users before the cluster services stop. If you specify true, a message is broadcast on all cluster nodes.

Shutdown mode

Indicate the type of shutdown:

– Bring Resource Group Offline:

Stops PowerHA and releases the Resource Groups running on the node (if any). Other cluster nodes do not take over the resources of the stopped node. In previous versions this was known as graceful.

– Move Resource Groups:

Stops PowerHA and releases the Resource Groups present on the node. Next priority node takes over the resources of the stopped node. In previous versions this was known as graceful with takeover.

– Unmanage Resource Groups:

PowerHA stops on the node immediately. The node retains control of all its resources. You can use this option to bring down a node while you perform maintenance. This is a newer option that is similar to the older forced option. However, because cluster services are stopped, the applications are no longer highly available. If a failure occurs, no recovery for them will be provided. This feature is used when performing non-disruptive updates and upgrades to PowerHA.


6.3 Resource group and application management

In this section we discuss how to:

• Bring a resource group offline
• Bring a resource group online
• Move a resource group to another node/site
• Suspend/resume application monitoring

Understanding each of these actions is important, along with stopping and starting cluster services, as they are often used during maintenance periods.

In the following topics we start off by assuming that cluster services are running, the resource groups are online, the applications are running, and the cluster is stable. If the cluster is not in the stable state, then the resource group related operations will not be possible.

All three resource group options we discuss can be done by using the clRGmove command. However, in our examples we use C-SPOC. They also all have similar SMIT panels and picklists. In an effort to streamline this documentation, we show only one SMIT panel in each of the following sections.
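As a command-line sketch only (the flags shown are typical for this level, but verify them against the clRGmove man page; the full path is /usr/es/sbin/cluster/utilities/clRGmove if it is not in your PATH), the equivalent operations for our example resource group are to move it to node Patty, bring it offline on node Maddi, and bring it online on node Maddi, respectively:

# clRGmove -g Maddi_rg -n Patty -m
# clRGmove -g Maddi_rg -n Maddi -d
# clRGmove -g Maddi_rg -n Maddi -u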

6.3.1 Bringing a resource group offline using SMIT

To bring a resource group offline:

1. Run smitty cl_admin → HACMP Resource Group and Application Management → Bring a Resource Group Offline.


2. The picklist appears, as shown in Figure 6-3. It lists only the resource groups that are online or in the ERROR state on all nodes in the cluster.

Figure 6-3 Resource Group picklist

3. Select the appropriate resource group from the list and press Enter. After the resource group has been selected, another picklist appears to Select an Online Node. The picklist will only contain the node(s) that are currently active in the cluster and are currently hosting the previously selected resource group.

4. Select an online node from the picklist and press Enter.

HACMP Resource Group and Application Management

Move cursor to desired item and press Enter.

  Bring a Resource Group Online
  Bring a Resource Group Offline
  Move a Resource Group to Another Node / Site

  Suspend/Resume Application Monitoring
  Application Availability Analysis

  +----------------------------------------------------------------------+
  |                        Select a Resource Group                        |
  |                                                                        |
  |  Move cursor to desired item and press Enter.                          |
  |                                                                        |
  |  # Resource Group          State            Node(s) / Site             |
  |    Maddi_rg                ONLINE           Maddi /                    |
  |                                                                        |
  |  F1=Help       F2=Refresh       F3=Cancel                              |
  |  F8=Image      F10=Exit         Enter=Do                               |
  |  /=Find        n=Find Next                                             |
  +----------------------------------------------------------------------+


5. The final SMIT menu appears with the information selected in the previous picklists as shown in Figure 6-4. Verify the entries previously specified and then press Enter to start the processing of the resource group to be brought offline.

Figure 6-4 Bring a resource group offline

After processing completes, the resource group is offline, but cluster services remain active on the node. The standby node will not acquire the resource group.

6.3.2 Bringing a resource group online using SMIT

To bring a resource group online:

1. Run smitty cl_admin → HACMP Resource Group and Application Management → Bring a Resource Group Online.

2. Select a destination node from the picklist as shown in Figure 6-5.

3. The final SMIT menu appears with the information selected in the previous picklists.

4. Verify the entries previously specified and then press Enter to start the moving of the resource group.

Upon successful completion, PowerHA displays a message and the status, location, and a type of location of the resource group that was successfully started on the specified node. This information is also available using the clRGinfo command.
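For example, the clRGinfo output looks similar to the following (the node and group names are from our test cluster; the exact layout varies slightly by level):

# clRGinfo
-----------------------------------------------------------------------------
Group Name                   Group State                    Node
-----------------------------------------------------------------------------
Maddi_rg                     ONLINE                         Maddi
                             OFFLINE                        Patty
-----------------------------------------------------------------------------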

Bring a Resource Group Offline

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                           [Entry Fields]
  Resource Group to Bring Offline                           Maddi_rg
  Node On Which to Bring Resource Group Offline             Maddi

Note: When either moving a resource group or bringing a resource group online, if it has previously been moved or brought offline from its home node, then in the Select a Destination Node picklist the home node, or Highest Priority Node, will have an * in front of its name.


Figure 6-5 Destination node picklist

6.3.3 Moving a resource group using SMIT

Moving a resource group consists of releasing the resources on the current owning node and then processing the normal resource group startup procedures on the destination node. This results in a short period in which the application is not available.

HACMP V5.3 added the ability to move a resource group to another site. The concept is the same as moving it between local nodes. For our example, we are using the option to move to another node as opposed to another site.

HACMP Resource Group and Application Management

Move cursor to desired item and press Enter.

  Show the Current State of Applications and Resource Groups
  Bring a Resource Group Online
  Bring a Resource Group Offline
  Move a Resource Group to Another Node / Site

  Suspend/Resume Application Monitoring
  Application Availability Analysis

  +----------------------------------------------------------------------+
  |                       Select a Destination Node                       |
  |                                                                        |
  |  Move cursor to desired item and press Enter.                          |
  |                                                                        |
  |  # *Denotes Originally Configured Highest Priority Node                |
  |    Patty                                                               |
  |                                                                        |
  |  F1=Help       F2=Refresh       F3=Cancel                              |
  |  F8=Image      F10=Exit         Enter=Do                               |
  |  /=Find        n=Find Next                                             |
  +----------------------------------------------------------------------+


To move a resource group:

1. Run smitty cl_admin → HACMP Resource Group and Application Management → Move a Resource Group to Another Node/Site → Move Resource Groups to Another Node. The picklist appears. It lists only the resource groups that are ONLINE, in the ERROR or UNMANAGED states on all nodes in the cluster.

2. Select the appropriate resource group from the list and press Enter.

3. After the resource group has been selected, another picklist appears to Select a Destination Node. The picklist will only contain those nodes that are currently active in the cluster and are participating nodes in the previously selected resource group.

4. Select a destination node from the picklist.

5. The final SMIT menu appears with the information selected in the previous picklists (Figure 6-6).

Figure 6-6 Move a Resource Group SMIT panel

6. Verify the entries previously specified and then press Enter to start the moving of the resource group.

Move a Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                           [Entry Fields]
  Resource Group to be Moved                                Maddi_rg
  Destination Node                                          Patty

F1=Help       F2=Refresh      F3=Cancel       F4=List
F5=Reset      F6=Command      F7=Edit         F8=Image
F9=Shell      F10=Exit        Enter=Do


Upon successful completion, PowerHA displays a message and the status, location, and a type of location of the resource group that was successfully stopped on the specified node as shown in Figure 6-7.

Figure 6-7 Resource Group Status

Any time that a resource group is moved to another node, application monitoring for the applications is suspended during the application stop. After the application has restarted on the destination node, application monitoring will resume. Additional information can be found in 6.3.4, “Suspending and resuming application monitoring” on page 285.

6.3.4 Suspending and resuming application monitoring

During application maintenance periods, it is often desirable to bring only the application offline, as opposed to stopping cluster services. If application monitoring is being used, you must suspend application monitoring before stopping the application. Otherwise PowerHA will take the predefined recovery actions when it detects that the application is down, which is not desired during maintenance. Defining application monitors is explained in 7.7.7, “Application monitoring” on page 440.

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

[MORE...7]

Cluster Name: Testcluster

Resource Group Name: Maddi_rg

Node                           State
------------------------------ ---------------
Maddi                          OFFLINE
Patty                          ONLINE

[BOTTOM]

F1=Help       F2=Refresh      F3=Cancel       F6=Command
F8=Image      F9=Shell        F10=Exit        /=Find
n=Find Next


To suspend application monitoring:

1. Run smitty cl_admin → HACMP System Management → Suspend/Resume Application Monitoring → Suspend Application Monitoring and press Enter.

2. You are prompted to select the application server for which this monitor is configured. If you have multiple application monitors, they are all suspended until you choose to resume them or until a cluster event occurs to resume them automatically, as explained above.

The monitoring will stay suspended until either resumed manually or the resource group is stopped/restarted.

To resume application monitoring:

1. Run smitty cl_admin → HACMP System Management → Suspend/Resume Application Monitoring → Resume Application Monitoring and press Enter.

2. Choose the appropriate application server associated with the application monitor you want to resume.

Application monitoring will continue to stay active until either suspended manually or until the resource group is brought offline.

6.4 Scenarios

In this section we cover the common scenarios of:

• PCI hot-plug replacement of a NIC
• Installing AIX and PowerHA fixes
• Replacing an LVM mirrored disk
• Application maintenance

6.4.1 PCI hot-plug replacement of a NIC

This section takes you through the process of replacing a PCI hot plug network interface card by utilizing the C-SPOC “PCI Hot Plug Replace a Network Interface Card” facility.

Special considerations

Keep the following considerations in mind before you replace a hot-pluggable PCI network interface card:

• Be aware that if a network interface you are hot-replacing is the only available keepalive path on the node where it resides, you must shut down PowerHA on this node in order to prevent a partitioned cluster while the interface is being replaced.


This situation is easily avoidable by having a working non-IP network between the cluster nodes.

• SMIT gives you the option of doing a graceful shutdown on this node. From this point, you can manually hot-replace the network interface card.

• Hot-replacement of Ethernet, Token-Ring, FDDI and ATM network interface cards is supported. This process is not supported for non-IP communication devices.

• You should manually record the IP address settings of the network interface being replaced to prepare for unplanned failures.

• You should not attempt to change any configuration settings while the hot replacement is in progress.

The SMIT interface simplifies the process of replacing a hot-pluggable PCI network interface card. PowerHA supports only one PCI hot plug network interface card replacement using SMIT at one time per node.

Scenario 1 (live NICs only)

Follow this procedure when hot-replacing the following interfaces:

• A live PCI network service interface in a resource group and with an available non-service interface

• A live PCI network service interface not in a resource group and with an available non-service interface

• A live PCI network boot interface with an available non-service interface

Go to the node on which you want to replace a hot-pluggable PCI network interface card.

1. Run smitty hacmp → System Management (C-SPOC) → HACMP Communication Interface Management → PCI Hot Plug Replace a Network Interface Card and press Enter.

SMIT displays a list of available PCI network interfaces that are hot-pluggable.

Note: If the network interface was alive before the replacement process began, then between the initiation and completion of the hot-replacement, the interface being replaced is in a maintenance mode. During this time, network connectivity monitoring is suspended on the interface for the duration of the replacement process.

Tip: You can also get to this panel with the fast path smitty cl_pcihp.


2. Select the network interface you want to hot-replace. Press Enter. The service address of the PCI interface is moved to the available non-service interface.

3. SMIT prompts you to physically replace the network interface card. After you have replaced the card, you are asked to confirm that replacement has occurred.

If you select yes, the service address will be moved back to the network interface which has been hot-replaced. On aliased networks, the service address will not move back to the original network interface, but will remain as an alias on the same network interface. The hot-replacement is complete.

If you select no, you must manually reconfigure the interface settings to their original values:

1. Run the drslot command to take the PCI slot out of the removed state.

2. Run mkdev on the physical interface.

3. Use ifconfig manually as opposed to smitty chinet, cfgmgr, or mkdev in order to avoid configuring duplicate IP addresses or an unwanted boot address.

Scenario 2 (live NICs only)

Follow this procedure when hot-replacing a live PCI network service interface in a resource group but with no available non-service interface. Steps 1-3 are the same as in the previous scenario, so in this scenario we start from the SMIT fast path of smitty cl_pcihp:

1. Select the network interface that you want to hot-replace and press Enter.

2. SMIT prompts you to choose whether to move the resource group to another node during the replacement process in order to ensure its availability.

3. If you choose to do this, SMIT gives you the option of moving the resource group back to the node on which the hot-replacement took place after completing the replacement process.

If you do not move the resource group to another node, it will be offline for the duration of the replacement process.

4. SMIT prompts you to physically replace the network interface card. After you have replaced the card, you are asked to confirm that replacement has occurred.

If you select Yes, the hot-replacement is complete.


If you select no, you must manually reconfigure the interface settings to their original values:

1. Run the drslot command to take the PCI slot out of the removed state.

2. Run mkdev on the physical interface.

3. Use ifconfig manually as opposed to smitty chinet, cfgmgr, or mkdev in order to avoid configuring duplicate IP addresses or an unwanted boot address.

4. (If applicable) Move the resource group back to the node from which you moved it in Step 2.

Scenario 3 (non-alive NICs only)

Follow the procedure below when hot-replacing the following interfaces:

• A non-alive PCI network service interface in a resource group and with an available non-service interface

• A non-alive PCI network service interface not in a resource group and with an available non-service interface

• A non-alive PCI network boot interface with an available non-service interface

We begin again from the fast path of smitty cl_pcihp as in the previous scenario:

1. Select the network interface that you want to hot-replace and press Enter.

SMIT prompts you to physically replace the network interface card. After you have replaced it, SMIT prompts you to confirm that replacement has occurred.

2. If you select Yes, the hot-replacement is complete. If you select no, you must manually reconfigure the interface settings to their original values:

a. Run the drslot command to take the PCI slot out of the removed state.

b. Run mkdev on the physical interface.

c. Use ifconfig manually as opposed to smitty chinet, cfgmgr, or mkdev commands in order to avoid configuring duplicate IP addresses or an unwanted boot address.

Hot-replacing an ATM network interface card

ATM network interface cards support multiple logical interfaces on one network interface card. An ATM network interface hot-replacement is managed the same as other network interface cards, with the following exceptions:

• All logical interfaces configured on the card being replaced that are not configured for and managed by PowerHA are lost during the replacement process. They will not be reconfigured on the newly replaced ATM interface card. All other logical interfaces configured for and managed by PowerHA on the ATM network interface card being replaced are restored when the replacement is complete.

• Because it is possible to have more than one service interface configured on an ATM network interface card—thus multiple resource groups on one ATM network interface—when you hot-replace an ATM network interface card, SMIT leads you through the process of moving each resource group on the ATM interface, one at a time.

For additional details, refer to the High Availability Cluster Multi-Processing Administration Guide, SC23-4862.

6.4.2 Fixes

This section relates to installing fixes, previously referred to as APARs or PTFs. AIX now has maintenance updates known as Technology Levels (TL) and Service Packs (SP) for the Technology Levels. PowerHA has adopted the AIX process of creating Service Packs (SP). We recommend that maintenance be loaded at least twice a year; in some cases, such as when serious problems are encountered, you might need to deviate from this standard practice.

Some AIX fixes can be loaded dynamically without a reboot. Kernel and device driver updates often require a reboot because installing updates to them runs a bosboot. One way to determine whether a reboot is required is to check the .toc created using the inutoc command prior to installing the fixes. The file contains fileset information similar to Example 6-1.

Example 6-1 Checking the .toc prior to installing fixes

bos.64bit 6.1.2.0 I b usr,root
# Base Operating System 64 bit Runtime
bos.acct 6.1.2.0 I N usr,root
# Accounting Services

In the foregoing example, the fileset bos.64bit requires a reboot, as indicated by the “b” character in the fourth column. The “N” character indicates that a reboot is not required.
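As a rough illustration only (the .toc layout can vary and continuation lines are ignored here), the filesets flagged with “b” can be listed with a quick filter; /tmp/updates is a hypothetical download directory:

# inutoc /tmp/updates
# awk '$4 == "b" {print $1, $2}' /tmp/updates/.toc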

Applying PowerHA fixes is similar to AIX fixes. However, starting with HACMP 5.4 it is no longer required to reboot after installing base filesets. Always check with the support line if unsure about the effects of loading certain fixes.
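Before and after applying fixes, it is also useful to confirm the installed AIX and PowerHA levels, for example:

# oslevel -s
# lslpp -l "cluster.es.server.*"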

When updating AIX or PowerHA software it is our recommendation to:

• Take a cluster snapshot and save it somewhere off the cluster.


• Back up the operating system and data before performing any upgrade. Prepare a backout plan in case you encounter problems with the upgrade.

• Always do an initial run through on a test cluster.

• Use disk update if possible.

• Follow this same general rule for fixes to the application; follow specific instructions for the application.

The general procedure for applying AIX fixes that require a reboot is as follows:

1. Stop cluster services on standby node.

2. Apply, do not commit, TL or SP to standby node. (reboot as needed)

3. Start cluster services on standby node.

4. Stop cluster services on production node using Move Resource Group option to standby machine.

5. Apply TL or SP to primary node. (reboot as needed).

6. Start cluster services on production node.

If you are installing either AIX or PowerHA fixes that do not require a reboot, it is now possible to simply use the Unmanage Resource Groups option when stopping cluster services as described in 6.2.3, “Stopping cluster services” on page 278. The general procedure would be:

1. Stop cluster services on standby using the Unmanage option.

2. Apply, do not commit, SP to standby node.

3. Start cluster services on standby node.

4. Stop cluster services on production node using the Unmanage option.

5. Apply SP to primary node.

6. Start cluster services on the production node.

Of course, these procedures should be tested in a test environment before ever attempting them in production.

6.4.3 Storage

Most shared storage environments today use some level of RAID for data protection and redundancy. When utilizing RAID (1, 5, or 10) devices, individual disk failures normally do not require AIX LVM maintenance to be performed. Any procedures required are often external to the cluster nodes and have no effect on the cluster itself. However, if protection is provided by utilizing LVM mirroring, then LVM maintenance procedures are required.


C-SPOC provides a facility to aid in the replacement of a failed LVM mirrored disk. This facility, Cluster Disk Replacement, performs all the necessary LVM operations of replacing an LVM mirrored disk. To utilize this facility, ensure that the following conditions are met:

• You have root privilege.

• The affected disk, and preferably the entire volume group, is mirrored.

• The desired replacement disk is available to each node, and a PVID is already assigned to it and is shown on each node via the lspv command (see the example after this list).
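If the replacement disk does not yet have a PVID, one can be assigned and then verified as follows (hdisk4 is only an example device name):

# chdev -l hdisk4 -a pv=yes
# lspv | grep hdisk4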

If you are physically replacing an existing disk, remove the old disk and put the new one in its place. This of course assumes that the drive is hot-plug replaceable, which is common.

To replace a mirrored disk via C-SPOC:

1. Locate the failed disk. Make note of the PVID of the disk and the volume group it belongs to.

2. Enter smitty cl_admin → HACMP Physical Volume Management → Cluster Disk Replacement and press Enter.

SMIT displays a list of disks that are members of volume groups contained in cluster resource groups. There must be two or more disks in the volume group where the failed disk is located. The list includes the volume group, the hdisk, the disk PVID, and the reference cluster node. (This node is usually the cluster node that has the volume group varied on.)

3. Select the disk for disk replacement (source disk) and press Enter.

SMIT displays a list of those available shared disk candidates that have a PVID assigned to them, to use for replacement. (Only a disk that is of the same capacity or larger than the failed disk is suitable to replace the failed disk.)

4. Select the desired replacement disk (destination disk) and press Enter.

SMIT displays your selections from the two previous panels.

5. Press Enter to continue or Cancel to terminate the disk replacement process.

A warning message will appear telling you that continuing will delete any information you might have stored on the destination disk.

6. Press Enter to continue or Cancel to terminate.

SMIT displays a command status panel, and informs you of the replacepv command recovery directory. If disk configuration fails and you want to proceed with disk replacement, you must manually configure the destination disk. If you terminate the procedure at this point, be aware that the destination disk can be configured on more than one node in the cluster.


The replacepv command updates the volume group in use in the disk replacement process (on the reference node only).

Configuration of the destination disk on all nodes in the resource group takes place at this time.

If a node in the resource group fails to import the updated volume group, you can use the C-SPOC Import a Shared Volume Group facility as shown in “Importing volume groups using C-SPOC” on page 356.

C-SPOC will not remove the failed disk device information from the cluster nodes. This must be done manually by running the rmdev -dl <devicename> command.

6.4.4 Applications

It is commonly understood that each application varies; however, most application maintenance requires that the application be brought offline. This can be done in a number of ways. The most appropriate method for any particular environment depends on the overall cluster configuration.

In a multi-tier environment where an application server is dependent on a database, and maintenance is to be performed on the database, then usually both the database and the application server will need to be stopped. It is most common that at least the database will be in the cluster. When using resource group dependencies, the application server can easily be part of the same cluster.

It is also common, to minimize the overall downtime of the application, that the application maintenance be performed first on the non-production nodes for that application. Traditionally this means a standby node; however, it is not very common that a backup/fallover node truly is standby only. If it is not a true standby node, then any workload or applications currently running on that node must be accounted for to minimize any adverse effects of installing the maintenance. Hopefully this has all been tested previously in a test cluster.

In most cases stopping cluster services is not needed. You can simply bring the resource group offline as described in 6.3.1, “Bringing a resource group offline using SMIT” on page 280. If the shared volume group needs to be online during the maintenance, then you can just suspend application monitoring and run the application server stop script to bring the application offline. However, this will leave the service IP address online, which might not be desirable.

Note: During the command execution, SMIT tells you the name of the recovery directory to use should replacepv fail. Make note of this information as it is required in the recovery process.


In a multiple resource group and/or multiple application environment, all running on the same node, it might not be feasible to stop cluster services on the local node. Be aware of the possible effects of not stopping cluster services on the node on which application maintenance is being performed.

If, during the maintenance period, the system encounters a catastrophic error resulting in a crash, a fallover will occur. This might be undesirable if the maintenance has not been performed on the fallover candidates first and/or if the maintenance is incomplete on the local node. Though this might be a rare occurrence, the possibility exists and must be understood.

Another possibility is that if another production node fails during this maintenance period, a fallover can occur successfully on the local node without adverse effects. If this is not desired, and there are multiple resource groups, then you might want to move the other resource groups to another node first and then stop cluster services on the local node.

If you are using persistent addresses and you stop cluster services, local adapter swap protection is no longer provided. Though again rare, it is possible that if you are using the persistent address to perform maintenance and the hosting NIC fails, your connection will be dropped.

After performing the application maintenance you should always test the cluster again. Depending on what actions you chose to stop the application, you will need to either restart cluster services, bring the resource group back online via C-SPOC, or manually run the application server start script and resume application monitoring as needed.

6.5 Cluster Test Tool

The Cluster Test Tool utility allows you to test a PowerHA cluster configuration to evaluate how a cluster operates under a set of specified circumstances, such as when cluster services on a node fail or when a node loses connectivity to a cluster network.

You can start a test, let it run unattended, and return later to evaluate the results of your testing. You should run the tool under both low load and high load conditions to observe how system load affects your PowerHA cluster.

You run the Cluster Test Tool from SMIT on one node in a PowerHA cluster. For testing purposes, this node is referred to as the control node. From the control node, the tool runs a series of specified tests, some on other cluster nodes, gathers information about the success or failure of the tests processed, and stores this information in the Cluster Test Tool log file for evaluation or future reference.


6.5.1 Custom testing

If you are an experienced PowerHA administrator and want to tailor cluster testing to your environment, you can create custom tests that can be run from SMIT.

You create a custom test plan, a file that lists a series of tests to be run, to meet requirements specific to your environment and apply that test plan to any number of clusters. You specify the order in which tests run and the specific components to be tested. After you set up your custom test environment, you run the test procedure from SMIT and view test results in SMIT and in the Cluster Test Tool log file.

6.5.2 Test duration

Running automated testing on a basic two-node cluster that has a simple cluster configuration takes approximately 30 to 60 minutes to complete.

Individual tests can take around three minutes to run. The following conditions affect the length of time to run the tests:

� Cluster complexity: Testing in complex environments takes considerably longer.

� Network latency: Cluster testing relies on network communication between the nodes. Any degradation in network performance slows the performance of the Cluster Test Tool.

� Use of verbose logging for the tool: If you customize verbose logging to run additional commands from which to capture output, testing takes longer to complete. In general, the more commands you add for verbose logging, the longer a test procedure takes to complete.

� Manual intervention on the control node: At some points in the test, you might need to intervene.

� Running custom tests: If you run a custom test plan, the number of tests run also affects the time required to run the test procedure. If you run a long list of tests, or if any of the tests require a substantial amount of time to complete, then the time to process the test plan increases.


6.5.3 Considerations

The Cluster Test Tool has some considerations. It does not support testing of the following PowerHA cluster-related components:

� ATM networks
� Single adapter networks
� Network adapters that have FQDN
� Resource groups with dependencies

Be aware of the following situations regarding cluster testing:

� Sites. You can perform general cluster testing for clusters that support sites, but not testing specific to PowerHA sites or any of the PowerHA/XD products. PowerHA/XD for Metro Mirror and PowerHA/XD for GLVM use sites in their cluster configuration.

� Replicated resources. You can perform general cluster testing for clusters that include replicated resources, but not testing specific to replicated resources or any of the PowerHA/XD products. PowerHA/XD for Metro Mirror and PowerHA/XD for GLVM all include replicated resources in their cluster configuration.

� Dynamic cluster reconfiguration. You cannot run dynamic reconfiguration while the tool is running.

� Pre-events and post-events. Pre-events and post-events run in the usual way, but the tool does not verify that the events were run or that the correct action was taken. In addition, the Cluster Test Tool might not recover from the following situations:

� A node that fails unexpectedly, that is, a failure not initiated by testing.

� The cluster does not stabilize.

6.5.4 Automated testing

Use the automated test procedure, a predefined set of tests, supplied with the tool to perform basic cluster testing on any cluster. No setup is required. You simply run the test from SMIT and view test results from SMIT and the Cluster Test Tool log file.

The automated test procedure runs a predefined set of tests on a node that the tool randomly selects. The tool ensures that the node selected for testing varies from one test to another. You can run the automated test procedure on any PowerHA cluster that is not currently in service.


Running automated tests

You can run the automated test procedure on any PowerHA cluster that is not currently in service, as the beginning of the test includes starting the cluster. The Cluster Test Tool runs a specified set of tests and randomly selects the nodes, networks, resource groups, and so forth for testing. The tool tests different cluster components during the course of the testing. For a list of the tests that are run, see “Understanding automated testing” discussed next.

Before running the automated test:

� Ensure that the cluster is not in service in a production environment.

� Stop PowerHA cluster services; this is recommended but optional. Note that if the Cluster Manager is running, some of the tests will be irrational for your configuration, but the Test Tool will continue to run.

� Ensure that cluster nodes are attached to two IP networks.

One network is used to test a network becoming unavailable then available. The second network provides network connectivity for the Cluster Test Tool. Both networks are tested, one at a time.
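As a quick pre-check, you can list the cluster topology to confirm that two IP networks are defined; the following sketch assumes the cllsif interface listing utility is in its default location:

# List all cluster interfaces and the networks they belong to
/usr/es/sbin/cluster/utilities/cllsif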

Understanding automated testing

These topics list the sequence that the Cluster Test Tool uses for automated testing and describe the syntax of the tests run during automated testing.

The automated test procedure performs sets of predefined tests in this order:

1. General topology tests
2. Resource group tests on non-concurrent resource groups
3. Resource group tests on concurrent resource groups
4. IP-type network tests for each network
5. Non-IP network tests for each network
6. Volume group tests for each resource group
7. Site-specific tests
8. Catastrophic failure test

The Cluster Test Tool discovers information about the cluster configuration, and randomly selects cluster components, such as nodes and networks, to be used in the testing.

Which nodes are used in testing varies from one test to another. The Cluster Test Tool can select some nodes for the initial battery of tests and then, for subsequent tests, it can intentionally select the same nodes or choose from nodes on which no tests were run previously. In general, the logic in the automated test sequence ensures that all components are sufficiently tested in all necessary combinations.


The testing follows these rules:

� Tests operation of a concurrent resource group on one randomly selected node, not all nodes in the resource group.

� Tests only those resource groups that include monitored application servers or volume groups.

� Requires at least two active IP networks in the cluster to test non-concurrent resource groups.

The automated test procedure runs a node_up event at the beginning of the test to make sure that all cluster nodes are up and available for testing.

Launching the Cluster Test Tool

You can use the Cluster Test Tool to run an automated test procedure.

To run the automated test procedure:

1. Enter smitty hacmp → Initialization and Standard Configuration → HACMP Cluster Test Tool, and press Enter.

The system displays:

Are you sure

2. If you press Enter again, the automated test plan runs.

3. Evaluate the test results.
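You can evaluate the results by reviewing the Cluster Test Tool log file described later in this section, for example:

# Review the most recent automated test run
more /var/hacmp/log/cl_testtool.log

# Quickly scan for failed tests (the search string is an assumption;
# adjust it to match the messages in your log)
grep -i fail /var/hacmp/log/cl_testtool.log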

General topology tests

The Cluster Test Tool runs the general topology tests in the following order:

1. Bring a node up and start cluster services on all available nodes.

2. Stop cluster services on a node and bring resource groups offline.

3. Restart cluster services on the node that was stopped

4. Stop cluster services and move resource groups to another node

5. Restart cluster services on the node that was stopped

6. Stop cluster services on another node and place resource groups in an UNMANAGED state.

7. Restart cluster services on the node that was stopped.

The Cluster Test Tool uses the terminology for stopping cluster services that was used prior to HACMP 5.4.


When the automated test procedure starts, the tool runs each of the following tests in the order shown:

1. NODE_UP, ALL, Start cluster services on all available nodes

2. NODE_DOWN_GRACEFUL, node1, Stop cluster services gracefully on a node

3. NODE_UP, node1, Restart cluster services on the node that was stopped

4. NODE_DOWN_TAKEOVER, node2, Stop cluster services with takeover on a node

5. NODE_UP, node2, Restart cluster services on the node that was stopped

6. NODE_DOWN_FORCED, node3, Stop cluster services forced on a node

7. NODE_UP, node3, Restart cluster services on the node that was stopped

Resource group tests

There are two groups of resource group tests that can be run. Which group of tests runs depends on the startup policy of the resource group: non-concurrent or concurrent. If a resource of the specified type does not exist in the resource group, the tool logs an error in the Cluster Test Tool log file.

Resource group starts on a specified node

The following tests run if the cluster includes one or more resource groups that have a startup management policy other than Online on All Available Nodes, that is, the cluster includes one or more non-concurrent resource groups.

The Cluster Test Tool runs each of the following tests in the order shown for each resource group:

1. Bring a resource group offline and online on a node.

RG_OFFLINE, RG_ONLINE

2. Bring a local network down on a node to produce a resource group fallover.

NETWORK_DOWN_LOCAL, rg_owner, svc1_net, Selective fallover on local network down

3. Recover the previously failed network.

NETWORK_UP_LOCAL, prev_rg_owner, svc1_net, Recover previously failed network

4. Move a resource group to another node.

RG_MOVE

5. Bring an application server down and recover from the application failure.

SERVER_DOWN, ANY, app1, /app/stop/script, Recover from application failure


Resource group starts on all available nodes

If the cluster includes one or more resource groups that have a startup management policy of Online on All Available Nodes, that is, the cluster has concurrent resource groups, the tool runs one test that brings an application server down and recovers from the application failure.

The tool runs the following test:

RG_OFFLINE, RG_ONLINE
SERVER_DOWN, ANY, app1, /app/stop/script, Recover from application failure

Network tests

The tool runs tests for IP networks and for non-IP networks.

For each IP network, the tool runs these tests:

� Bring a network down and up:

NETWORK_DOWN_GLOBAL, NETWORK_UP_GLOBAL

� Fail a network interface, join a network interface. This test is run for the service interface on the network. If no service interface is configured, the test uses a random interface defined on the network:

FAIL_LABEL, JOIN_LABEL

For each non-IP network, the tool runs these tests:

� Bring a non-IP network down and up:

NETWORK_DOWN_GLOBAL, NETWORK_UP_GLOBAL

Volume group tests

The tool also runs tests for volume groups. For each resource group in the cluster, the tool runs tests that fail a volume group in the resource group: VG_DOWN.

Site-specific tests

If sites are present in the cluster, the tool runs tests for them. The automated testing sequence that the Cluster Test Tool uses contains two site-specific test sequences:

� auto_site: This sequence of tests runs if you have any cluster configuration with sites. For instance, this sequence is used for clusters with cross-site LVM mirroring configured that does not use XD_data networks. The tests in this sequence include:

SITE_DOWN_GRACEFUL Stop the cluster services on all nodes in a site while taking resources offline

SITE_UP Restart the cluster services on the nodes in a site


SITE_DOWN_TAKEOVER Stop the cluster services on all nodes in a site and move the resources to nodes at another site

SITE_UP Restart the cluster services on the nodes at a site

RG_MOVE_SITE Move a resource group to a node at another site

� auto_site_isolation: This sequence of tests runs only if you configured sites and an XD-type network. The tests in this sequence include:

SITE_ISOLATION Isolate sites by failing XD_data networks

SITE_MERGE Merge sites by bringing up XD_data networks.

Catastrophic failure test

As a final test, the tool stops the Cluster Manager on a randomly selected node that currently has at least one active resource group.

CLSTRMGR_KILL, node1, Kill the cluster manager on a node

If the tool terminates the Cluster Manager on the control node, you might need to reboot this node.

6.5.5 Custom testing


Planning a test procedure

Before you create a test procedure, make sure that you are familiar with the PowerHA clusters on which you plan to run the test. List the following components in your cluster and have this list available when setting up a test:

� Nodes
� IP networks
� Non-IP networks
� XD-type networks
� Volume groups
� Resource groups


� Application servers
� Sites

Your test procedure should bring each component offline then online, or cause a resource group fallover, to ensure that the cluster recovers from each failure. Start your test by running a node_up event on each cluster node to ensure that all cluster nodes are up and available for testing.

Creating a custom test procedure

A test plan is a text file that lists cluster tests to be run in the order in which they are listed in the file. In a test plan, specify one test per line. You can set values for test parameters in the test plan or use variables to set parameter values.

The tool supports the following tests:

FAIL_LABEL Brings the interface associated with the specified label down on the specified node.

JOIN_LABEL Brings the interface associated with the specified label up on the specified node.

NETWORK_UP_GLOBAL Brings a specified network up (IP network or non-IP network) on all nodes that have interfaces on the network.

NETWORK_DOWN_GLOBAL Brings a specified network down (IP network or non-IP network) on all nodes that have interfaces on the network.

NETWORK_UP_LOCAL Brings a network on a node up.

NETWORK_DOWN_LOCAL Brings a network on a node down.

NETWORK_UP_NONIP Brings a Non-IP network up.

NETWORK_DOWN_NONIP Brings a Non-IP network down.

NODE_UP Starts cluster services on the specified node.

NODE_DOWN_GRACEFUL Stops cluster services and brings the resource groups offline on the specified node.

NODE_DOWN_TAKEOVER Stops cluster services with the resources acquired by another node.

Note: The Cluster Test Tool uses the terminology for stopping cluster services that was used pre-HACMP 5.4. They translate as follows:

Graceful = Bring Resource Groups Offline Takeover = Move Resource GroupsForced = Unmanage Resource Groups


NODE_DOWN_FORCED Stops cluster services on the specified node with the Unmanage Resource Group option.

CLSTRMGR_KILL Terminates the Cluster Manager on the specified node.

RG_MOVE Moves a resource group that is already online to a specific node.

RG_MOVE_SITE Moves a resource group that is already online to an available node at a specific site.

RG_OFFLINE Brings a resource group offline that is already online.

RG_ONLINE Brings a resource group online that is already offline.

SERVER_DOWN Brings a monitored application server down.

SITE_ISOLATION Brings down all XD_data networks in the cluster at which the tool is running, thereby causing a site isolation.

SITE_MERGE Brings up all XD_data networks in the cluster at which the tool is running, thereby simulating a site merge. Run the SITE_MERGE test after running the SITE_ISOLATION test.

SITE_UP Starts cluster services on all nodes at the specified site that are currently stopped.

SITE_DOWN_TAKEOVER Stops cluster services on all nodes at the specified site and moves the resources to node(s) at another site by launching automatic rg_move events.

SITE_DOWN_GRACEFUL Stops cluster services on all nodes at the specified site and takes the resources offline.

VG_DOWN Emulates an error condition for a specified disk that contains a volume group in a resource group.

WAIT Generates a wait period for the Cluster Test Tool.
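To illustrate how these tests combine into a plan, the following sketch shows a small custom test plan; the node, network, volume group, and application server names are hypothetical and must be replaced with names from your own cluster:

# Start cluster services everywhere before testing
NODE_UP, ALL, Start cluster services on all available nodes
# Exercise a local network failure and recovery
NETWORK_DOWN_LOCAL, node1, net_ether_01, Fail net_ether_01 on node1
NETWORK_UP_LOCAL, node1, net_ether_01, Recover net_ether_01 on node1
WAIT, 60
# Fail a disk backing the shared volume group
VG_DOWN, sharedvg, ANY, Fail the disk where sharedvg resides
# Stop the application and let the cluster recover it
SERVER_DOWN, ANY, app1, /app/stop/script, Recover from application failure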

Specifying parameters for tests

You can specify parameters for the tests in the test plan. Parameters can be specified in one of the following ways:

� By using a variables file. A variables file defines values for variables assigned to parameters in a test plan.

� By setting values for test parameters as environment variables.

� By identifying values for parameters in the test plan.


When the Cluster Test Tool starts, it uses a variables file if you specified the location of one in SMIT. If it does not locate a variables file, it uses values set in an environment variable. If a value is not specified in an environment variable, it uses the value in the test plan. If the value set in the test plan is not valid, the tool displays an error message.

Using a variables file

The variables file is a text file that defines the values for test parameters. By setting parameter values in a separate variables file, you can use your test plan to test more than one cluster.

The entries in the file have this syntax:

parameter_name = value

For example, to specify a node as node_lee:

node=node_lee

To provide more flexibility, you can:

1. Set the name for a parameter in the test plan.

2. Assign the name to another value in the variables file.

For example, you could specify the value for node as node1 in the test plan:

NODE_UP,node1,Bring up node1

In the variables file, you can then set the value of node1 to node_lee:

node1=node_lee

The following example shows a sample variables file:

node1=node_lee
node2=node_kim
node3=node_brianna
node4=node_kileigh
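For illustration, a test plan that references these variables might look like the following sketch; the resource group name rg_db is hypothetical:

NODE_UP, ALL, Start cluster services on all available nodes
NODE_DOWN_TAKEOVER, node1, Stop node1 and move its resource groups
NODE_UP, node1, Restart cluster services on node1
RG_MOVE, rg_db, node2, Move rg_db to node2

Here node1 and node2 resolve to node_lee and node_kim through the variables file, so the same plan can be reused against a different cluster simply by supplying a different variables file.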

Using a test plan

If you want to run a test plan on only one cluster, you can define test parameters directly in the test plan. The associated tests can then be run only on the cluster that includes the specified cluster attributes. More information about the syntax of the test parameters is given in the following sections.

Description of the tests

The test plan supports the tests listed in this section. The description of each test includes information about the test parameters and the success indicators for a test.


Test syntax

This topic describes the syntax for a test. The syntax is:

TEST_NAME, parameter1, parametern|PARAMETER, comments

Where:

� The test name is in uppercase letters.

� Parameters follow the test name.

� Italic text indicates parameters expressed as variables.

� Commas separate the test name from the parameters and the parameters from each other. A space around the commas is also supported starting with HACMP 5.4.

� The example syntax line shows parameters as parameter1 and parametern with n representing the next parameter. Tests typically have from two to four parameters.

� A pipe (|) indicates parameters that are mutually exclusive alternatives.

� Comments (optional) is user-defined text that appears at the end of the line. The Cluster Test Tool displays the text string when the Cluster Test Tool runs.

In the test plan, the tool ignores:

� Lines that start with a pound sign (#)

� Blank lines.

Node tests

The node tests start and stop cluster services on specified nodes:

NODE_UP, node | ALL, comments:

This command starts the cluster services on a specified node that is offline or on all nodes that are offline.

node: The name of a node on which cluster services start

ALL: Cluster services start on any nodes that are offline

comments: User-defined text to describe the configured test

Note: One of the success indicators for each test is that the cluster becomes stable.


Example

NODE_UP, node1, Bring up node1

Entrance criteria

Any node to be started is inactive.

Success indicators

The following conditions indicate success for this test:

� The cluster becomes stable.
� Cluster services successfully start on all specified nodes.
� No resource group enters the error state.
� No resource group moves from online to offline.

The following command stops cluster services on a specified node and brings resource groups offline:

NODE_DOWN_GRACEFUL, node | ALL, comments:

Where:

node: The name of a node on which cluster services stop

ALL: Cluster services stop on all nodes that are online. At least one node in the cluster must be online

comments: User-defined text to describe the configured test

Example

NODE_DOWN_GRACEFUL, node3, Bring down node3 gracefully

Entrance criteria

Any node to be stopped is active.

Success indicators

The following conditions indicate success for this test:

� The cluster becomes stable.

� Cluster services stop on the specified node(s).

� Cluster services continue to run on other nodes if ALL is not specified.

� Resource groups on the specified node go offline, and do not move to other nodes.

� Resource groups on other nodes remain in the same state.


The following command stops cluster services on a specified node, with its resource groups acquired by another node as configured, depending on resource availability.

NODE_DOWN_TAKEOVER, node, comments:

Where:

node: The name of a node on which to stop cluster services

comments: User-defined text to describe the configured test

Example

NODE_DOWN_TAKEOVER, node4, Bring down node4 by moving the resource groups.

Entrance criteria

The specified node is active.

Success indicators

The following conditions indicate success for this test:

� The cluster becomes stable.
� Cluster services stop on the specified node.
� Cluster services continue to run on other nodes.
� All resource groups remain in the same state.

The following command stops cluster services on a specified node and places the resource groups in an unmanaged state. Resources on the node remain online, that is, they are not released:

NODE_DOWN_FORCED, node, comments:

Where:

node: The name of a node on which to stop cluster services.

comments: User-defined text to describe the configured test.

Example

NODE_DOWN_FORCED, node2, Bring down node2 via unmanaged.

Entrance criteria

Cluster services on another node have not already been stopped with its resource groups placed in an UNMANAGED state. The specified node is active.


Success indicators

The following conditions indicate success for this test:

� The cluster becomes stable.
� The resource groups on the node change to UNMANAGED state.
� Cluster services stop on the specified node.
� Cluster services continue to run on other nodes.
� All resource groups remain in the same state.

Network tests for an IP network

The Cluster Test Tool requires two IP networks to run any of the tests described in this section. The second network provides network connectivity for the tool to run. The Cluster Test Tool verifies that two IP networks are configured before running the test.

The following command brings up a specified network on a specified node:

NETWORK_UP_LOCAL, node, network, comments:

Where:

node: The name of a node on which to bring up network

network: The name of the network to which the interface is connected

comments: User-defined text to describe the configured test

Example

NETWORK_UP_LOCAL, node6, hanet1, Bring up hanet1 on node 6

Entrance criteria

The specified node is active and has at least one inactive interface on the specified network.

Success indicators

The following conditions indicate success for this test:

� The cluster becomes stable.

� Cluster services continue to run on the cluster nodes where they were active before the test.

� Resource groups that are in the ERROR state on the specified node and that have a service IP label available on the network can go online, but should not enter the ERROR state.

� Resource groups on other nodes remain in the same state.


The following command brings a specified network down on a specified node:

NETWORK_DOWN_LOCAL, node, network, comments:

Where:

node: The name of a node on which to bring network down

network: The name of the network to which the interface is connected

comments: User-defined text to describe the configured test

Entrance criteria

The specified node is active and has at least one active interface on the specified network.

Success indicators

The following conditions indicate success for this test:

� The cluster becomes stable.

� Cluster services continue to run on the cluster nodes where they were active before the test.

� Resource groups on other nodes remain in the same state; however, some might be hosted on a different node.

� If the node hosts a resource group for which the recovery method is set to notify, the resource group does not move.

The following command brings a specified network up on all nodes that have interfaces on the network. The network specified can be an IP network or a serial network:

NETWORK_UP_GLOBAL, network, comments:

Where:

network: The name of the network to which the interface is connected

comments: User-defined text to describe the configured test

Example

NETWORK_UP_GLOBAL, hanet1, Bring up hanet1 on node 6

Important: If one IP network is already unavailable on a node, the cluster might become partitioned. The Cluster Test Tool does not take this into account when determining the success.


Entrance criteria

The specified network is inactive on at least one node.

Success indicators

The following conditions indicate success for this test:

� The cluster becomes stable.

� Cluster services continue to run on the cluster nodes where they were active before the test.

� Resource groups that are in the ERROR state on the specified node and that have a service IP label available on the network can go online, but should not enter the ERROR state.

� Resource groups on other nodes remain in the same state.

The following command brings the specified network down on all nodes that have interfaces on the network. The network specified can be an IP network or a serial network:

NETWORK_DOWN_GLOBAL, network, comments:

Where:

network: The name of the network to which the interface is connected

comments: User-defined text to describe the configured test

Example

NETWORK_DOWN_GLOBAL, hanet1, Bring down hanet1 on node 6

Entrance criteria

The specified network is active on at least one node.

Success indicators

The following conditions indicate success for this test:

� The cluster becomes stable.

� Cluster services continue to run on the cluster nodes where they were active before the test.

� Resource groups on other nodes remain in the same state.


Network interface tests for IP networks

This section lists tests that bring network interfaces up or down on an IP network.

The following command brings up a network interface associated with the specified IP label on a specified node:

JOIN_LABEL, iplabel, comments:

Where:

iplabel The IP label of the interface

comments User-defined text to describe the configured test

The only time you could have a resource group online and the service label hosted on an inactive interface would be when the service interface fails but there was no place to move the resource group, in which case it stays online.

Example

JOIN_LABEL, app_serv_address, Bring up app_serv_address on node 2

Entrance criteria

The specified interface is currently inactive on the specified node.

Success indicators

The following conditions indicate success for this test:

� The cluster becomes stable.

� Specified interface comes up on specified node.

� Cluster services continue to run on the cluster nodes where they were active before the test.

� Resource groups that are in the ERROR state on the specified node and that have a service IP label available on the network can go online, but should not enter the ERROR state.

� Resource groups on other nodes remain in the same state.

The following command brings down a network interface associated with a specified label on a specified node:

FAIL_LABEL, iplabel, comments:

Where:

iplabel The IP label of the interface.

comments User-defined text to describe the configured test.


Example

FAIL_LABEL, app_serv_label, Bring down app_serv_label on node 2

Entrance criteria

The specified interface is currently active on the specified node.

Success indicators

The following conditions indicate success for this test:

� The cluster becomes stable.

� Any service labels that were hosted by the interface are recovered.

� Resource groups that are in the ERROR state on the specified node and that have a service IP label available on the network can go online, but should not enter the ERROR state.

� Resource groups remain in the same state; however, the resource group can be hosted by another node.

Network tests for a non-IP network

The testing for non-IP networks is part of the NETWORK_UP_GLOBAL, NETWORK_DOWN_GLOBAL, NETWORK_UP_LOCAL, and NETWORK_DOWN_LOCAL test procedures.

Resource group tests

This section lists tests for resource groups.

The following command brings a resource group online in a running cluster:

RG_ONLINE, rg, node | ALL | ANY | RESTORE, comments:

Where:

rg The name of the resource group to bring online.

node The name of the node where the resource group will come online.

ALL Use ALL for concurrent resource groups only. When ALL is specified, the resource group will be brought online on all nodes in the resource group. If you use ALL for non-concurrent groups, the Test Tool interprets it as ANY.

ANY Use ANY for non-concurrent resource groups to pick a node where the resource group is offline. For concurrent resource groups, use ANY to pick a random node where the resource group will be brought online.


RESTORE Use RESTORE for non-concurrent resource groups to bring the resource groups online on the highest priority available node. For concurrent resource groups, the resource group will be brought online on all nodes in the nodelist.

comments User-defined text to describe the configured test.

Example

RG_ONLINE, rg_1, node2, Bring rg_1 online on node 2.

Entrance criteria

The specified resource group is offline, the required resources are available, and all dependencies can be met.

Success indicators

The following conditions indicate success for this test:

� The cluster becomes stable.
� The resource group is brought online successfully on the specified node.
� No resource groups go offline or into ERROR state.

The following command brings a resource group offline in a running cluster:

RG_OFFLINE, rg, node | ALL | ANY | RESTORE, comments:

Where:

rg The name of the resource group to bring offline.

node The name of the node where the resource group will be brought offline.

ALL Use ALL for concurrent resource groups only. When ALL is specified, the resource group will be brought offline on all nodes in the resource group. If you use ALL for non-concurrent groups, the Test Tool interprets it as ANY.

ANY Use ANY for non-concurrent resource groups to pick the node where the resource group is online. For concurrent resource groups, use ANY to pick a random node where the resource group will be brought offline.

comments User-defined text to describe the configured test.

Example

RG_OFFLINE, rg_1, node2, Bring rg_1 offline from node2


Entrance criteria

The specified resource group is online on the specified node.

Success indicators

The following conditions indicate success for this test:

� The cluster becomes stable.

� The resource group, which was online on the specified node, is brought offline successfully.

� Other resource groups remain in the same state.

The following command moves a resource group that is already online in a running cluster to a specific or any available node:

RG_MOVE, rg, node | ANY | RESTORE, comments:

Where:

rg The name of the resource group to move.

node The target node; the name of the node to which the resource group will move.

ANY Use ANY to let the Cluster Test Tool pick a random available node to which to move the resource group.

RESTORE Enable the resource group to move to the highest priority node available.

comments User-defined text to describe the configured test.

Example

RG_MOVE, rg_1, ANY, Move rg_1 to any available node.

Entrance criteria

The specified resource group must be non-concurrent and must be online on a node other than the target node.

Success indicators

The following conditions indicate success for this test:

� The cluster becomes stable.
� The resource group is moved to the target node successfully.
� Other resource groups remain in the same state.


The following command moves a resource group that is already online in a running cluster to an available node at a specific site:

RG_MOVE_SITE, rg, site | OTHER, comments:

Where:

rg The name of the resource group to move.

site The site where the resource group will move.

OTHER Use OTHER to have the Cluster Test Tool pick the other site as the resource group destination. For example, if the resource group is online on siteA, it will be moved to siteB, and conversely if the resource group is online on siteB, it will be moved to siteA.

comments User-defined text to describe the configured test.

Example

RG_MOVE_SITE, rg_1, site_2, Move rg_1 to site_2.

Entrance criteria

The specified resource group is online on a node other than a node in the target site.

Success indicators

The following conditions indicate success for this test:

� The cluster becomes stable.
� The resource group is moved to the target site successfully.
� Other resource groups remain in the same state.

Volume group test

This section lists tests for volume groups.

The following command forces an error for a disk that contains a volume group in a resource group:

VG_DOWN, vg, node | ALL | ANY, comments:

Where:

vg The volume group whose disk is to fail.

node The name of the node where the resource group that contains the specified volume group is currently online.


ALL Use ALL for concurrent resource groups. When ALL is specified, the Cluster Test Tool will fail the volume group on all nodes in the resource group where the resource group is online. If ALL is used for non-concurrent resource groups, the Tool performs this test for any resource group.

ANY Use ANY to have the Cluster Test Tool select the node as follows: For a non-concurrent resource group, the Cluster Test Tool will select the node where the resource group is currently online. For a concurrent resource group, the Cluster Test Tool will select a random node from the concurrent resource group node list, where the resource group is online.

comments User-defined text to describe the configured test.

Example

VG_DOWN, sharedvg, ANY, Fail the disk where sharedvg resides

Entrance criteria

The resource group containing the specified volume groups is online on the specified node.

Success indicators

The following conditions indicate success for this test:

� The cluster becomes stable.

� The resource group containing the specified volume group successfully moves to another node, or, if it is a concurrent resource group, it goes into an ERROR state.

� Resource groups can change state to meet dependencies.

Site tests

This section lists tests for sites.

The following command fails all the XD_data networks, causing the site_isolation event:

SITE_ISOLATION, comments:

Where:

comments User-defined text to describe the configured test.

Example

SITE_ISOLATION, Fail all the XD_data networks


Entrance criteria

At least one XD_data network is configured and is up on any node in the cluster.

Success indicators

The following conditions indicate success for this test:

� The XD_data network fails and no resource groups change state.
� The cluster becomes stable.

The following command runs when at least one XD_data network is up to restore connections between the sites, and remove site isolation. Run this test after running the SITE_ISOLATION test:

SITE_MERGE, comments:

Where:

comments User-defined text to describe the configured test.

Example

SITE_MERGE, Heal the XD_data networks

Entrance criteria

At least one node must be online.

Success indicators

The following conditions indicate success for this test:

� No resource groups change state.
� The cluster becomes stable.

The following command stops cluster services and moves the resource groups to other nodes, on all nodes at the specified site:

SITE_DOWN_TAKEOVER, site, comments:

Where:

site The site that contains the nodes on which cluster services will be stopped.

comments User-defined text to describe the configured test.

Example

SITE_DOWN_TAKEOVER, site_1, Stop cluster services on all nodes at site_1, bringing the resource groups offline and moving the resource groups.


Entrance criteria

At least one node at the site must be online.

Success indicators

The following conditions indicate success for this test:

� Cluster services are stopped on all nodes at the specified site. All primary instance resource groups move to the other site.

� All secondary instance resource groups go offline.

� The cluster becomes stable.

The following command starts cluster services on all nodes at the specified site:

SITE_UP, site, comments:

Where:

site The site that contains the nodes on which cluster services will be started.

comments User-defined text to describe the configured test.

Example

SITE_UP, site_1, Start cluster services on all nodes at site_1.

Entrance criteria

At least one node at the site must be offline.

Success indicators

The following conditions indicate success for this test:

� Cluster services are started on all nodes at the specified site.
� Resource groups remain in the same state.
� The cluster becomes stable.

General tests

The other tests available for use in PowerHA cluster testing are:

� Bring an application server down
� Terminate the Cluster Manager on a node
� Add a wait time for test processing


The following command runs the specified command to stop an application server. This test is useful when testing application availability. In the automated test, the test uses the stop script to turn off the application:

SERVER_DOWN, node | ANY, appserv, command, comments:

Where:

node The name of a node on which the specified application sever is to become unavailable.

ANY Any available node that participates in this resource group can have the application server become unavailable. The Cluster Test Tool tries to simulate server failure on any available cluster node. This test is equivalent to failure on the node that currently owns the resource group, if the server is in a resource group that has policies other than the following ones: Startup: Online on all available nodes. Fallover: Bring offline (on error node only)

appserv The name of the application server associated with the specified node.

command The command to be run to stop the application server.

comments User-defined text to describe the configured test.

Example

SERVER_DOWN,node1,db_app,/apps/stop_db.pl, Kill the db app

Entrance criteria

The resource group is online on the specified node.

Success indicators

The following conditions indicate success for this test:

� The cluster becomes stable.

� Cluster nodes remain in the same state.

� The resource group that contains the application server is online; however, the resource group can be hosted by another node, unless it is a concurrent resource group, in which case the group goes into ERROR state.


The following command runs the kill command to terminate the Cluster Manager on a specified node:

CLSTRMGR_KILL, node, comments:

Where:

node The name of the node on which to terminate the Cluster Manager.

comments User-defined text to describe the configured test.

For the Cluster Test Tool to accurately assess the success or failure of a CLSTRMGR_KILL test, do not perform other activities in the cluster while the Cluster Test Tool is running.

Example

CLSTRMGR_KILL, node5, Bring down node5 hard

Entrance criteria

The specified node is active.

Success indicators

The following conditions indicate success for this test:

� The cluster becomes stable.

� Cluster services stop on the specified node.

� Cluster services continue to run on other nodes.

� Resource groups that were online on the node where the Cluster Manager fails move to other nodes.

� All resource groups on other nodes remain in the same state.

Note: If CLSTRMGR_KILL is run on the local node, you might need to reboot the node. On startup, the Cluster Test Tool automatically starts again. You can avoid manual intervention to reboot the control node during testing by:

– Editing the /etc/cluster/hacmp.term file to change the default action after an abnormal exit. The clexit.rc script checks for the presence of this file and, if the file is executable, the script calls it instead of halting the system automatically.

– Configuring the node to auto-Initial Program Load (IPL) before running the Cluster Test Tool.


The following command generates a wait period for the Cluster Test Tool for a specified number of seconds:

WAIT, seconds, comments:

Where:

seconds The number of seconds that the Cluster Test Tool waits before proceeding with processing

comments User-defined text to describe the configured test

Example

WAIT, 300, We need to wait for five minutes before the next test

Entrance criteria

Not applicable

Success indicators

Not applicable

Example test plan

The following excerpt from a sample test plan includes the tests:

� NODE_UP
� NODE_DOWN_GRACEFUL

It also includes a WAIT interval. The comment text at the end of the line describes the action to be taken by the test.

NODE_UP,ALL,starts cluster services on all nodes
NODE_DOWN_GRACEFUL,brianna,stops cluster services gracefully on node brianna
WAIT,20
NODE_UP,brianna,starts cluster services on node brianna

Running a custom test procedure

Before you start running custom tests, ensure that:

� Your test plan is configured correctly. For information about setting up a test plan, see “Creating a custom test procedure” earlier in this section.

� You have specified values for test parameters.

� You have logging for the tool configured to capture the information that you want to examine for your cluster.

� The cluster is not in service in a production environment.


To run custom testing, follow these steps:

1. Enter smitty hacmp → Extended Configuration → HACMP Cluster Test Tool → Execute Custom Test Procedure.

2. In the Execute Custom Test Procedure panel, enter field values as follows:

Test Plan This required field contains the full path to the test plan for the Cluster Test Tool. This file specifies the tests for the tool to run.

Variable File This optional field contains the full path to the variables file for the Cluster Test Tool. This file specifies the variable definitions used in processing the test plan.

Verbose Logging When set to yes, includes additional information in the log file that might help to judge the success or failure of some tests. The default is yes. Select no to decrease the amount of information logged by the Cluster Test Tool.

Cycle Log File When set to yes, uses a new log file to store output from the Cluster Test Tool. The default is yes. Select no to append messages to the current log file.

Abort on Error When set to no, the Cluster Test Tool continues to run tests after some of the tests being run fail. This might cause subsequent tests to fail because the cluster state is different from the one expected by one of those tests. The default is no. Select yes to stop processing after the first test fails.

3. Press Enter to start running the custom tests.

4. Evaluate the test results.

Note: The tool stops running and issues an error if a test fails and Abort on Error is selected.


The SMIT panel for the custom test plan is shown in Figure 6-8.

Execute Custom Test Procedure

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                 [Entry Fields]
* Test Plan                                      [/cluster/custom]     /
  Variables File                                 [/cluster/testvars]   /
  Verbose Logging                                [Yes]                 +
  Cycle Log File                                 [Yes]                 +
  Abort On Error                                 [No]                  +

F1=Help       F2=Refresh      F3=Cancel      F4=List
F5=Reset      F6=Command      F7=Edit        F8=Image
F9=Shell      F10=Exit        Enter=Do

Figure 6-8 Custom test SMIT menu

Log files

If a test fails, the Cluster Test Tool collects information in the automatically created log files. To collect logs, the Cluster Test Tool creates the directory /var/hacmp/cl_testtool if it does not exist. PowerHA never deletes the files in this directory. You evaluate the success or failure of tests by reviewing the contents of the Cluster Test Tool log file, /var/hacmp/log/cl_testtool.log.

For each test plan that has any failures, the tool creates a new directory under /var/hacmp/log/. If the test plan has no failures, the tool does not create a log directory. The directory name is unique and consists of the name of the Cluster Test Tool plan file and the time stamp when the test plan was run.

Important: If you uninstall PowerHA, the program removes any files that you might have customized for the Cluster Test Tool. If you want to retain these files, make a copy of these files before you uninstall PowerHA.


Log file rotation

The Cluster Test Tool saves up to three log files and numbers them so that you can compare the results of different cluster tests. The tool also rotates the files, with the oldest file being overwritten. The following list shows the three files saved:

/var/hacmp/log/cl_testtool.log
/var/hacmp/log/cl_testtool.log.1
/var/hacmp/log/cl_testtool.log.2
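A quick way to compare the current run with the two saved previous runs is sketched below; the search string is an assumption, so adjust it to the messages you actually see in the logs:

# List the rotated Cluster Test Tool logs, newest first
ls -lt /var/hacmp/log/cl_testtool.log*

# Compare failure counts across the current and previous runs
grep -ci fail /var/hacmp/log/cl_testtool.log /var/hacmp/log/cl_testtool.log.1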

If you do not want the tool to rotate the log files, you can disable this feature from SMIT.

For more information about error logging and troubleshooting, consult the HACMP for AIX Administration Guide, SC23-4862.


Chapter 7. Cluster management

In this chapter we describe PowerHA cluster management and administration, including helpful tips when using these features.

We discuss the following topics:

� C-SPOC
� File collections
� User administration
� Shared storage management
� Time synchronization
� Cluster verification and synchronization
� Monitoring PowerHA


7.1 C-SPOC

Cluster Single Point of Control (C-SPOC) is a very useful tool that helps you manage the entire cluster from any single point. It provides facilities for performing common cluster-wide administration tasks from any active node within the cluster. Using C-SPOC reduces the downtime that could otherwise be caused by cluster administration.

Highly available environments require special consideration when planning changes to the environment. We strongly recommend that a strict change management discipline be followed.

Before we describe cluster management in detail, we want to emphasize general best practices for cluster administration:

� Wherever possible, use the PowerHA C-SPOC facility to make changes to the cluster.

� Document routine operational procedures (for example, shutdown, startup, and increasing the size of a file system).

� Restrict access to the root password to trained PowerHA administrators.

� Always take a snapshot of your existing configuration before making any changes.

� Monitor your cluster regularly.
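For the snapshot recommendation above, a minimal command-line sketch follows; the snapshot name is only an example, and the clsnapshot options shown (-c to create, -n for the name, -d for the description) are an assumption that should be verified against the clsnapshot usage on your PowerHA level (the SMIT snapshot menus remain the documented route):

# Create a cluster snapshot before making any changes
/usr/es/sbin/cluster/utilities/clsnapshot -c -n pre_change_snap -d "Before maintenance change"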

7.1.1 C-SPOC in general

The C-SPOC functionality is provided through its own set of cluster administration commands, accessible through SMIT menus. The location of the commands is /usr/es/sbin/cluster/cspoc. C-SPOC uses the cluster communication daemon clcomdES to run commands on remote nodes. If this daemon is not running, the command might not be run and the C-SPOC operation might fail.

C-SPOC operations fail if any target node is down at the time of execution or if the selected resource is not available. It requires a correctly configured cluster in the sense that all nodes within the cluster can communicate.

If node failure occurs during a C-SPOC operation, an error is displayed to the SMIT panel and the error output is recorded in the C-SPOC log file (cspoc.log). You should check this log if any C-SPOC problems occur.
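Two quick checks along these lines are sketched here; the cspoc.log location shown (/tmp/cspoc.log) is a common default but should be confirmed for your installation:

# Confirm that the cluster communication daemon is active on each node
lssrc -s clcomdES

# Review recent C-SPOC activity and any errors
tail -50 /tmp/cspoc.log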

Note: Since HACMP 5.3, the Cluster Manager process clstrmgrES is started from inittab, so it is always running whether cluster services are started or not.


You can find more information about PowerHA logs in 7.7.5, “Log files” on page 436.

7.1.2 C-SPOC SMIT menu

C-SPOC SMIT menus are accessible by running smitty hacmp → System Management (C-SPOC) or by using the fast path, smitty cl_admin. The main C-SPOC functions and sub-menus are presented here in the same order as they appear within the main C-SPOC menu:

� Manage HACMP Services:

This option contains utilities to start/stop cluster services on selected nodes as well as the function to show running cluster services on the local node. For more details, see 6.2, “Starting and stopping the cluster” on page 273 and 7.7.2, “Cluster status and services checking utilities” on page 430.

- HACMP Communication Interface Management:

This option contains utilities to manage configuration of communications interfaces to AIX and update PowerHA with these settings.

- HACMP Resource Group and Application Management:

This option contains utilities to manually manipulate resource groups in addition to application monitoring and application availability measurement tools. You can find more details about application monitoring in 7.7.7, “Application monitoring” on page 440, and about the application availability analysis tool in 7.7.8, “Measuring application availability” on page 453.

- HACMP Log Viewing and Management:

This option contains utilities to display the contents of some log files and to change the debug level and format of log files (standard or HTML). You can also change the location of cluster log files in this menu. For more details about these topics, see 7.7.5, “Log files” on page 436.

- HACMP File Collection Management:

This option contains utilities to assist with file synchronization throughout the cluster. A file collection is a user defined set of files. For more details about file collections, see 7.2, “File collections” on page 329.

- HACMP Security and Users Management:

This option contains menus and utilities for various security settings as well as users, groups and password management within a cluster. You can find more details about security in Chapter 8, “Cluster security” on page 457 and about user management in 7.3, “User administration” on page 337.


- HACMP Logical Volume Management:

This option contains utilities provided to assist with the cluster-wide administration of shared volume groups, logical volumes and file systems. For more details about this topic, see 7.4.2, “C-SPOC Logical Volume Manager” on page 357.

- HACMP Concurrent Logical Volume Management:

This option contains utilities provided to assist with the cluster-wide administration of concurrent shared volume groups, logical volumes and file systems. For more details about this topic, see 7.4.3, “C-SPOC Concurrent Logical Volume Management” on page 358.

- HACMP Physical Volume Management:

This option contains utilities provided to assist with cluster-wide physical volume management such as adding, removing and replacing physical volumes in the cluster. It also has support for datapath devices and cross-site LVM mirroring. For more details about these topics, see 7.4.4, “C-SPOC Physical Volume Management” on page 359.

- Open a SMIT Session on a Node:

This option allows you to open a basic SMIT window on any active node in the cluster. You can initiate any SMIT action to any node in the cluster from the local SMIT menu.

Considerations

Currently C-SPOC does not provide functionality for dynamic volume expansion, and the chvg command does not support volume groups that are varied on in enhanced concurrent mode. If you want to increase the size of a shared LUN allocated to your cluster, follow this process (a worked example follows the steps):

1. Bring the volume group offline (stop the cluster on all nodes).

2. On the first cluster node:

a. Run cfgmgr

b. Varyon the volume group in normal mode: varyonvg vgname

c. Run lsattr -El hdisk# to check the size of the disk.

d. To update the size of the disks in the VG, run chvg -g vgname (this will determine if VG factor change is required to meet the correct 1016 multiplier).

e. Run lsvg vgname to check the new size.

f. Varyoff the volume group: varyoffvg vgname


3. On subsequent cluster nodes that share the VG:

a. Run cfgmgr

b. Run lsattr -El hdisk# to check the size of the disk.

c. Import the volume group which you have changed: importvg -L vgname hdisk#

4. Back on the first cluster node:

a. Varyon the volume group: varyonvg vgname

b. Run smitty hacmp → Extended Configuration → Extended Verification and Synchronization and ensure there are no errors.
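A worked transcript of the first-node steps might look like the following sketch; the volume group and disk names (app1vg, hdisk2) are examples only and the command output is omitted:

   node1> cfgmgr
   node1> varyonvg app1vg
   node1> lsattr -El hdisk2 | grep size
   node1> chvg -g app1vg
   node1> lsvg app1vg
   node1> varyoffvg app1vg

On each of the other nodes you would then run cfgmgr, check the disk size with lsattr, and refresh the local definition with importvg -L app1vg hdisk2 before verifying and synchronizing the cluster.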

7.2 File collections

PowerHA provides cluster-wide file synchronization capabilities with the use of C-SPOC file collections. A file collection is a user-defined set of files. You can add files to a file collection or remove files from it, and you can specify the frequency at which PowerHA synchronizes these files.

PowerHA provides three ways to propagate your files:

- Manually: You can synchronize your files manually at any time. The files are copied from the local node to the remote nodes in the cluster.

- Automatically during cluster verification and synchronization: The files are copied from the local node where you initiate the verification operation.

- Automatically when changes are detected: PowerHA periodically checks the file collection on all nodes and if a file has changed it will synchronize this file across the cluster. You can set a timer to determine how frequently PowerHA checks your file collections.

PowerHA retains the permissions, ownership, and timestamp of the file on the local node and propagates these to the remote nodes. You can specify only ordinary files for a file collection; you cannot add symbolic links, directories, pipes, sockets, device files (/dev/*), files from the /proc directory, or ODM files from /etc/objrepos/* and /etc/es/objrepos/*. Always use full path names.

Each file can be added to only one file collection, except those files that are automatically added to the HACMP_Files collection. The files do not need to exist on the remote nodes; PowerHA creates them during the first synchronization. Zero-length or nonexistent files are not propagated from the local node.


PowerHA creates a backup copy of the modified files on all nodes during synchronization. These backups are stored in the /var/hacmp/filebackup directory. Only one previous version is retained, and you can restore it only manually.

The file collection logs are stored in the /var/hacmp/log/clutils.log file.
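For example, to see what the last propagation did and which backup copies are available on a node, you can simply inspect those locations (standard AIX commands; the output varies with your configuration):

   # ls -l /var/hacmp/filebackup
   # tail -20 /var/hacmp/log/clutils.log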

7.2.1 Predefined file collections

PowerHA provides two file collections by default: Configuration_Files and HACMP_Files. Neither is set up for automatic synchronization by default. You can enable them by setting either the “Propagate files during cluster synchronization” or the “Propagate files automatically when changes are detected” option to Yes in the SMIT Change/Show a File Collection menu; see “Changing a file collection” on page 333.

Configuration_Files

This collection contains the essential AIX configuration files:

- /etc/hosts
- /etc/services
- /etc/snmpd.conf
- /etc/snmpdv3.conf
- /etc/rc.net
- /etc/inetd.conf
- /usr/es/sbin/cluster/netmon.cf
- /usr/es/sbin/cluster/etc/clhosts
- /usr/es/sbin/cluster/etc/rhosts
- /usr/es/sbin/cluster/etc/clinfo.rc

You can easily add or remove files to these file collections. See “Adding files to a file collection” on page 335 for more information.

HACMP_Files

If you add any of the following user-defined files to your cluster configuration, then these are automatically included in the HACMP_Files file collection:

- Application server start script
- Application server stop script
- Event notify script
- Pre-event script
- Post-event script
- Event error recovery script
- Application monitor notify script
- Application monitor cleanup script
- Application monitor restart script
- Pager text message file
- SNA Link start and stop scripts
- X.25 Link start and stop scripts
- HA Tape support start script
- HA Tape support stop script
- User-defined event recovery program
- Custom snapshot method script

Important: It is your responsibility to ensure that files on the local node (where you start the propagation) are the most recent and are not corrupted.

Let us look at an example of how this works. Our cluster has an application server, called app_server_1. Its start script is /usr/app_scripts/app_start, its stop script is /usr/app_scripts/app_stop. We have a custom post-event script to the PowerHA node_up event, called /usr/app_scripts/post_node_up. These three files were automatically added to HACMP_Files file collection when we defined them during PowerHA configuration. You can check this as follows:

1. Start SMIT HACMP file collection management: smitty cm_filecollection_mgt → Change/Show a File Collection, select HACMP_Files from the pop-up list, and press Enter.

2. Go down to the Collection files field and press F4. As you can see in Example 7-1, the application start and stop scripts and the post event command are automatically added to this file collection.

Example 7-1 How to list which files are included in an existing file collection

Change/Show a File Collection

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[Entry Fields] File Collection Name HACMP_Files New File Collection Name [] File Collection Description [User-defined scripts > +--------------------------------------------------------------------------+ | Collection files | | | | The value for this entry field must be in the | | range shown below. | | Press Enter or Cancel to return to the entry field, | | and enter the desired value. | | | | /tmp/app_scripts/app_start | | /tmp/app_scripts/app_stop |


| | | F1=Help F2=Refresh F3=Cancel |F1| F8=Image F10=Exit Enter=Do |F5| /=Find n=Find Next |F9+--------------------------------------------------------------------------+

If you do not want to synchronize all of your user-defined scripts or if they are not the same on all nodes, then disable this file collection and create another one, which includes only the required files.

7.2.2 Managing file collections

Here we describe how you can create, modify, and remove a file collection.

Adding a file collection

Follow these steps to add a file collection:

1. Start SMIT: smitty hacmp, then from the HACMP for AIX menu select System Management (C-SPOC) → HACMP File Collection Management.

Or you can start HACMP File Collection Management by entering smitty cm_filecollection_menu

2. Select Manage File Collections → Add a File Collection.

3. Supply the requested information (see Figure 7-1):

– File Collection Name: A unique name for file collection.

– File Collection Description: A short description of this file collection.

– Propagate files during cluster synchronization?: if you set this to yes, then PowerHA propagates this file collection during cluster synchronization. This is a convenient solution for cluster related files, for example, your application start/stop scripts are automatically synchronized after you make any changes in the cluster configuration.

– Propagate files automatically when changes are detected?: If you select yes, PowerHA will check the files in this collection regularly, and if any of them are changed, then it re-propagates them.

If both of the above options are left as No, then no automatic synchronization will take place.

Note: You cannot add to or remove files from this file collection. If you start using HACMP_Files collection, be sure that your scripts work as designed on all nodes.


Figure 7-1 Add a file collection

Changing a file collection

Follow these steps to change a file collection:

1. Start HACMP File Collection Management by entering smitty cm_filecollection_menu → Manage File Collections → Change/Show a File Collection.

2. Select a file collection from the pop-up list.

3. Now you can change the following information (see Figure 7-2):

– File collection name
– File collection description
– Propagate files during cluster synchronization (yes/no)
– Propagate files automatically when changes are detected (yes/no)
– Collection files:

Press F4 here to see the list of files in this collection.

See “Adding a file collection” on page 332 for explanation of the above fields.

Add a File Collection

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[Entry Fields]* File Collection Name [application_files] File Collection Description [Application config fi> Propagate files during cluster synchronization? yes + Propagate files automatically when changes are det no + ected?

F1=Help F2=Refresh F3=Cancel F4=ListF5=Reset F6=Command F7=Edit F8=ImageF9=Shell F10=Exit Enter=Do


Figure 7-2 Change a file collection

Removing a file collection

Follow these steps to remove a file collection:

1. Start HACMP File Collection Management by entering smitty cm_filecollection_menu → Manage File Collections → Remove a File Collection.

2. Select a file collection from the pop-up list.

3. Press Enter again to confirm the deletion of the file collection.

Changing the automatic update timer

Here you can set the timer for how frequently PowerHA checks the files in the collections for changes. Only one timer can be set for all file collections in the cluster:

1. Start HACMP File Collection Management by entering the smitty cm_filecollection_menu fast path → Manage File Collections → Change/Show Automatic Update Time.

2. Supply the Automatic File Update Time in minutes. The value should be between 10 minutes and 1440 minutes (24 hours).

Change/Show a File Collection

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[Entry Fields] File Collection Name Configuration_Files New File Collection Name [] File Collection Description [AIX and HACMP configu> Propagate files during cluster synchronization? no + Propagate files automatically when changes are det no + ected? Collection files +

F1=Help F2=Refresh F3=Cancel F4=ListF5=Reset F6=Command F7=Edit F8=ImageF9=Shell F10=Exit Enter=Do


Adding files to a file collection

To add files to a file collection, follow these steps:

1. Start HACMP File Collection Management by entering smitty cm_filecollection_menu → Manage File in File Collections → Add Files to a File Collection.

2. Select a file collection from the pop-up list and press Enter.

3. On the SMIT panel you can check the current file list or you can add new files (see Figure 7-3).

– To get the list of current files in this collection go to the Collection Files field and press F4.

– To add new files, go to the New files field and type the name of the file that you want to add to the file collection. You can add only one file at a time, and the file name must start with “/”. You can specify only ordinary files here; you cannot add symbolic links, directories, pipes, sockets, device files (/dev/*), files from the /proc directory, or ODM files from /etc/objrepos/* and /etc/es/objrepos/*.

Figure 7-3 Add files to a file collection

Add Files to a File Collection

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[Entry Fields] File Collection Name app_files File Collection Description Application configura> Propagate files during cluster synchronization? no Propagate files automatically when changes are det no ected? Collection files * New files [/usr/app/config_file]/

F1=Help F2=Refresh F3=Cancel F4=ListF5=Reset F6=Command F7=Edit F8=ImageF9=Shell F10=Exit Enter=Do

Attention: You cannot add files to the HACMP_Files collection.


Removing files from a file collection

To remove files from a file collection, follow these steps:

1. Start HACMP File Collection Management by entering smitty cm_filecollection_menu → Manage File in File Collections → Remove Files from a File Collection.

2. Select a file collection from the pop-up list and press Enter.

3. Select one or more files from the list and press Enter. See Figure 7-4.

4. Press Enter again to confirm.

Figure 7-4 Removing files from a file collection

Manually propagating files in a file collection

You can manually synchronize file collections (see Figure 7-5):

1. Start HACMP File Collection Management by entering smitty cm_filecollection_menu → Propagate Files in File Collections.

2. Select a file collection from the pop-up list and press Enter.

Manage File in File Collections

Move cursor to desired item and press Enter.

Add Files to a File Collection Remove Files from a File Collection

+--------------------------------------------------------------------------+| Select one or more files to remove from this File Collection || || Move cursor to desired item and press F7. || ONE OR MORE items can be selected. || Press Enter AFTER making all selections. || || /usr/app/data.conf || /usr/app/app.conf || /usr/app/config_file || || F1=Help F2=Refresh F3=Cancel || F7=Select F8=Image F10=Exit || Enter=Do /=Find n=Find Next |+--------------------------------------------------------------------------

Attention: You cannot remove files from the HACMP_Files collection.


3. Press Enter again to confirm.

Figure 7-5 Manual propagation of a file collection

7.3 User administration

In a PowerHA cluster, user IDs and passwords should be synchronized. If user and group IDs are not the same across your cluster, your application might not work and users will be unable to access their files on the shared storage. Additionally, we suggest that you synchronize passwords so that, in case of fallover, users can log in without the delay of having their password reset because they do not know what it is on the fallover node.

There are several options to consider for user and password synchronization:

- Using C-SPOC: PowerHA provides utilities in C-SPOC for easy user administration. In “C-SPOC user and group administration” on page 338, we introduce the HACMP Security and Users Management function.

- LDAP: This is the best solution for managing a large number of users in a complex environment. LDAP can be set up to work together with PowerHA. For more information about LDAP, see Understanding LDAP - Design and Implementation, SG24-4986.

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

The following file collections will be processed:
app_files
Starting file propagation to remote node buttercup.
Successfully propagated file /usr/app/data.conf to node buttercup.
Successfully propagated file /usr/app/app.conf to node buttercup.
Successfully propagated file /usr/app/config_file to node buttercup.
Total number of files propagated to node buttercup: 3

F1=Help F2=Refresh F3=Cancel F6=CommandF8=Image F9=Shell F10=Exit /=Findn=Find Next


7.3.1 C-SPOC user and group administration

PowerHA provides C-SPOC tools for easy cluster-wide user, group, and password administration. The following functions are available with C-SPOC:

- Add users
- List users
- Change user attributes
- Remove users
- Add groups
- List groups
- Change group attributes
- Remove groups
- Change users' passwords cluster-wide
- Manage the list of users permitted to change their passwords cluster-wide

Adding a user

To add a user on all nodes in the cluster, follow these steps:

1. Start SMIT: smitty hacmp → System Management (C-SPOC) → HACMP Security and Users Management.

Or you can start C-SPOC HACMP Security and User Management by entering the smitty cl_usergroup fast path.

2. Select Users in a HACMP Cluster.

3. Select Add a User to the Cluster.

4. Select on which nodes you want to create users. If you leave the Select Nodes by Resource Group field empty, the user will be created on all nodes in the cluster. If you select a resource group here, then the user will only be created on the subset of nodes on which that resource group is configured to run. In the case of a two-node cluster, leave this field blank.

If you have more than two nodes in your cluster, you can create users related to specific resource groups. If you want to create a user for certain nodes only (for example, users who can log on to node1 and node2 but are not allowed to log on to node3 or node4), then you can select the appropriate resource group name from the picklist. See Table 7-1.

Table 7-1 Cross-reference of users, resource groups and nodes

Resource group Nodes Users

rg1 node1, node2 dbadm, dbinst, dbuser

rg2 node2, node1 btrcup, longr

rg3 node3, node4 millerb, alexa

rg4 node4, node3, node2, node1 app1adm, app1user


Table 7-1 on page 338 is a cross-reference between users, resource groups, and nodes. It shows that in our example, the app1adm user will be created on all nodes (leave the Select Nodes by Resource Group field empty), while users such as btrcup and longr will be created only on node 1 and node2 (select “rg2” at the Select Nodes by Resource Group field). See Figure 7-6.

Figure 7-6 Select nodes by resource group

5. Create the user. You should supply the user name and other relevant information just as you would when creating any normal user. You can specify the user ID here; however, if that user ID is already in use on a node, the command will fail. If you leave the user ID field blank, the user will be created with the first available ID on all nodes. See the SMIT panel in Figure 7-7.

Add a User to the Cluster

Type or select a value for the entry field.Press Enter AFTER making all desired changes.

[Entry Fields] Select nodes by Resource Group [] + *** No selection means all nodes! ***

+-----------------------------------------------------------------------+ ¦ Select nodes by Resource Group ¦ ¦ *** No selection means all nodes! *** ¦ ¦ Move cursor to desired item and press Enter. ¦ ¦ ¦ ¦ rg1 ¦ ¦ rg2 ¦ ¦ rg3 ¦ ¦ rg4 ¦ ¦ ¦ ¦ F1=Help F2=Refresh F3=Cancel ¦F1¦ F8=Image F10=Exit Enter=Do ¦F5¦ /=Find n=Find Next ¦F9+-----------------------------------------------------------------------+


Figure 7-7 Create a user on all cluster nodes

Listing cluster users

To list users in the cluster, follow these steps:

1. Start C-SPOC HACMP Security and Users Management by entering the smitty cl_usergroup fast path.

2. Select Users in an HACMP Cluster.

3. Select List Users in the Cluster.

Add a User to the Cluster

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[TOP] [Entry Fields] Select nodes by resource group *** No selection means all nodes! ***

* User NAME [millerb] User ID [] # ADMINISTRATIVE USER? false + Primary GROUP [] + Group SET [] + ADMINISTRATIVE GROUPS [] + Another user can SU TO USER? true + SU GROUPS [ALL] + HOME directory [/home/millerb] Initial PROGRAM [] User INFORMATION [Mr. Miller][MORE...33]

F1=Help F2=Refresh F3=Cancel F4=ListF5=Reset F6=Command F7=Edit F8=ImageF9=Shell F10=Exit Enter=Do

Note: When creating a user’s home directory, if it is to reside on a shared file system, C-SPOC does not check whether the file system is mounted or whether the volume group is varied on. In this case, C-SPOC creates the user home directory under the empty mount point of the shared file system. You can fix this by moving the home directory under the shared file system.

If a user’s home directory is on a shared file system, the user can only log in on the node where the file system is mounted.
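A minimal manual fix is sketched below, assuming the shared file system is mounted at /app1 and the user is millerb (both names are examples only): unmount the file system to expose the directory that was created under the bare mount point, move it aside, remount the file system (or bring the owning resource group online), move the directory back into place, and reset the ownership. Adapt the paths to your own configuration:

   # umount /app1
   # mv /app1/millerb /tmp/millerb.save
   # mount /app1
   # mv /tmp/millerb.save /app1/millerb
   # chown -R millerb /app1/millerb

Afterward, verify the home directory definition with lsuser -a home millerb on each node.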


4. Select the nodes whose user lists you want to display. If you leave the Select Nodes by Resource Group field empty, the users for all cluster nodes will be listed.

If you select a resource group here, then C-SPOC will only list users from the nodes which belong to the specified resource group.

5. Press Enter. See the SMIT panel in Figure 7-8.

Figure 7-8 Listing users in the cluster

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

node1 root 0 /
node1 daemon 1 /etc
node1 bin 2 /bin
node1 sys 3 /usr/sys
node1 adm 4 /var/adm
node1 sshd 207 /var/empty
node1 miller 302 /home/miller
node1 alexa 305 /home/alexa
node1 btrcup 307 /home/btrcup
node1 dbadm 1000 /home/dbadm
node1 dbinst 1001 /home/dbinst
node1 dbuser 1003 /home/dbuser
node1 longr 312 /home/longr
node2 root 0 /
node2 daemon 1 /etc
node2 bin 2 /bin
node2 sys 3 /usr/sys
node2 adm 4 /var/adm
node2 sshd 207 /var/empty
node2 miller 302 /home/miller
node2 alexa 305 /home/alexa
node2 btrcup 307 /home/btrcup
node2 dbadm 1000 /home/dbadm
node2 dbinst 1001 /home/dbinst
node2 dbuser 1003 /home/dbuser
node2 longr 312 /home/longr

F1=Help F2=Refresh F3=Cancel F6=CommandF8=Image F9=Shell F10=Exit /=Findn=Find Next


Modifying user attributes

To modify user attributes in the cluster, follow these steps:

1. Start C-SPOC HACMP Security and Users Management by entering smitty cl_usergroup → Users in an HACMP Cluster → Change / Show Characteristics of a User in the Cluster.

2. Select on which nodes you want to modify a user. If you leave the field Select Nodes by Resource Group empty, the user can be modified on all nodes.

If you select a resource group here, then you can modify a user that belongs to the specified resource group.

3. Enter the name of the user you want to modify or press F4 to select from the picklist.

4. Now you can modify the user attributes (see Figure 7-9).

Figure 7-9 Modifying user attributes

Change / Show Characteristics of a User

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[TOP] [Entry Fields]* User NAME killer User ID [305] # ADMINISTRATIVE USER? false + Primary GROUP [staff] + Group SET [staff] + ADMINISTRATIVE GROUPS [] + ROLES [] + Another user can SU TO USER? true + SU GROUPS [ALL] + HOME directory [/home/killer] Initial PROGRAM [/usr/bin/ksh] User INFORMATION [Ms Killeen] EXPIRATION date (MMDDhhmmyy) [0] Is this user ACCOUNT LOCKED? false +[MORE...36]

F1=Help F2=Refresh F3=Cancel F4=ListF5=Reset F6=Command F7=Edit F8=ImageF9=Shell F10=Exit Enter=Do


Removing a user

To remove a user, follow these steps:

1. Start C-SPOC HACMP Security and Users Management by entering smitty cl_usergroup → Users in an HACMP Cluster → Remove a User from the Cluster.

2. Select the nodes you want to remove a user from. If you leave the field Select Nodes by Resource Group empty, any user can be removed from all nodes.

If you select a resource group here, then C-SPOC will remove the user from only the nodes that belong to the specified resource group.

3. Enter the user name to remove or press F4 to select a user from the picklist.

4. Remove AUTHENTICATION information: Select Yes to delete the user password and other authentication information. Select No to leave the user password in the /etc/security/passwd file. The default is Yes. See Figure 7-10.

Figure 7-10 Remove a user from the cluster

Remove a User from the Cluster

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[Entry Fields] Select nodes by resource group *** No selection means all nodes! ***

* User NAME [suri] + Remove AUTHENTICATION information? Yes +

F1=Help F2=Refresh F3=Cancel F4=ListF5=Reset F6=Command F7=Edit F8=ImageF9=Shell F10=Exit Enter=Do


Adding a group to the cluster

To add a group to the cluster, follow these steps:

1. Start C-SPOC HACMP Security and Users Management by entering smitty cl_usergroup → Groups in an HACMP Cluster → Add a Group to the Cluster.

2. Select on which nodes you want to create groups. If you leave the Select Nodes by Resource Group field empty the group will be created on all nodes in the cluster.

If you have more than two nodes in your cluster, you can create groups related to specific resource groups. If you want to create a group for these nodes only then you can select the appropriate resource group name from the picklist. See Table 7-2.

Table 7-2 Cross-reference of groups, resource groups and nodes

Table 7-2 is a cross-reference between groups, resource groups, and nodes. It shows “support” present on all nodes (leave the Select Nodes by Resource Group field empty), while groups such as “dbadmin” will be created only on node1 and node2 (select “rg1” in the Select Nodes by Resource Group field).

Resource group Nodes Group

rg1 node1, node2 dbadmin

rg2 node2, node1 developers

rg3 node3, node4 appusers

rg4 node4, node3, node2, node1 support


3. Create the group. See SMIT panel in Figure 7-11. You should supply the group name, user list, and other relevant information just as when creating any normal group. Press F4 for the list of the available users to include in the group.

You can specify the group ID here, however, if it is already used on a node, the command will fail. If you leave the group ID field blank, the group will be created with the first available ID on all cluster nodes.

Figure 7-11 Add a group to the cluster

Listing groups on the cluster

To list the groups on the cluster, follow these steps:

1. Start C-SPOC HACMP Security and Users Management by entering smitty cl_usergroup → Groups in an HACMP Cluster → List All Groups in the Cluster.

2. Select the nodes whose group lists you want to display. If you leave the Select Nodes by Resource Group field empty, C-SPOC lists all groups on all cluster nodes. If you select a resource group here, then C-SPOC will only list groups from the nodes which belong to the specified resource group.

Add a Group to the Cluster

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[Entry Fields] Select nodes by resource group *** No selection means all nodes! ***

* Group NAME [dbadmin] ADMINISTRATIVE group? false + Group ID [] # USER list [longr,btrcup] + ADMINISTRATOR list [] +

F1=Help F2=Refresh F3=Cancel F4=ListF5=Reset F6=Command F7=Edit F8=ImageF9=Shell F10=Exit Enter=Do


3. C-SPOC lists the groups and their attributes from the selected nodes as seen in Figure 7-12.

Figure 7-12 List groups on the cluster

Changing a group in the cluster

To change a group in the cluster, follow these steps:

1. Start C-SPOC HACMP Security and Users Management by entering smitty cl_usergroup → Groups in an HACMP Cluster → Change / Show Characteristics of a Group in the Cluster.

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

node1 system 0 true root files
node1 staff 1 false millerb,alexa,dbadm,dbinst,dbuser files
node1 bin 2 true root,bin files
node1 sys 3 true root,bin,sys files
node1 adm 4 true bin,adm files
node1 security 7 true root files
node1 cron 8 true root files
node1 shutdown 21 true files
node1 sshd 205 false sshd files
node1 hacmp 206 false files
node1 dbgroup 208 false dbadm,dbuser root files
node2 system 0 true root files
node2 staff 1 false btrcup,longr,dbadm,dbuser files
node2 bin 2 true root,bin files
node2 sys 3 true root,bin,sys files
node2 adm 4 true bin,adm files
node2 security 7 true root files
node2 cron 8 true root files
node2 shutdown 21 true files
node2 sshd 202 false sshd files
node2 hacmp 203 false files
node2 dbgroup 208 false dbadm,dbuser root files

F1=Help F2=Refresh F3=Cancel F6=CommandF8=Image F9=Shell F10=Exit /=Findn=Find Next


2. Select on which nodes you want to change the groups. If you leave the field Select Nodes by Resource Group empty, you can modify any group from all cluster nodes. If you select a resource group here, then C-SPOC will change only those groups that are on the nodes which belong to the specified resource group.

3. Change the group attributes. See Figure 7-13.

Figure 7-13 Change / show group attributes on the cluster

Removing a group

To remove a group from a cluster, follow these steps:

1. Start C-SPOC HACMP Security and Users Management by entering smitty cl_usergroup → Groups in an HACMP Cluster → Remove a Group from the Cluster.

2. Select the nodes from which you want to remove the group. If you leave the field Select Nodes by Resource Group empty, C-SPOC will remove the selected group from all cluster nodes. If you select a resource group here, then C-SPOC will remove the group from only the nodes which belong to the specified resource group. Select the group to remove; press F4 to get a list of groups in the cluster.

Change / Show Group Attributes on the Cluster

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[Entry Fields] Select nodes by resource group

Group NAME db2group Group ID [208] # ADMINISTRATIVE group? false + USER list [db2adm,db2user] + ADMINISTRATOR list [root] +

F1=Help F2=Refresh F3=Cancel F4=ListF5=Reset F6=Command F7=Edit F8=ImageF9=Shell F10=Exit Enter=Do


Notes on using C-SPOC user management

The following are remarks regarding user and group administration with C-SPOC:

- C-SPOC user and password management requires the cluster secure communication daemon (clcomdES) to be up and running on all cluster nodes. You cannot use C-SPOC if any of your nodes are powered off. In such a case you will get an error message similar to this:

1800-106 An error occurred:
migcheck[471]: cl_connect() error, nodename=node2, rc=-1
migcheck[471]: cl_connect() error, nodename=node2, rc=-1
node2: rshexec: cannot connect to node node2
ndu2: cl_rsh had exit code = 1, see cspoc.log and/or clcomd.log for more information

However, you can use C-SPOC regardless of the state of the cluster.

- Be careful if selecting nodes by resource groups. You have to select exactly the nodes where the user or group you want to modify/remove exists. You cannot modify/remove a user or group if that user or group does not exist on any of the selected nodes.

- If you encounter an error using C-SPOC, check cspoc.log and/or clcomd.log for more information.

- C-SPOC user management cannot be used together with NIS or LDAP.

7.3.2 Password management

The PowerHA C-SPOC password management utility is a convenient way for users to change their password on all cluster nodes from a single point of control. If this utility is enabled, when a user changes their password with the passwd command from any cluster node, C-SPOC propagates the new password to all other cluster nodes.


Setting up C-SPOC password management

The C-SPOC password management utilities are disabled by default. Here are the steps to enable them:

1. Modify the system password utility to use the cluster password utility. On a standalone AIX machine the /usr/bin/passwd command is used to change a user’s password. Now this command will be replaced by the /usr/es/sbin/cluster/utilities/clpasswd command, which will change the password on all cluster nodes:

a. Start C-SPOC HACMP Security and Users Management by entering smitty cl_usergroup → Passwords in an HACMP cluster → Modify System Password Utility.

b. Press F4 and select Link to Cluster Password Utility from the picklist. See Figure 7-14.

c. Select the nodes where you want to change the password utility. Just leave this field blank for all nodes. We suggest that you set up the cluster password utility on all nodes.

Figure 7-14 Modifying the system password utility

Modify System Password Utility

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[Entry Fields]* /bin/passwd utility is [Link to Cluster Passw> +

Select nodes by Resource Group [] + *** No selection means all nodes! ***

+-----------------------------------------------------------------------+ ¦ /bin/passwd utility is ¦ ¦ ¦ ¦ Move cursor to desired item and press Enter. ¦ ¦ ¦ ¦ Original AIX System Command ¦ ¦ Link to Cluster Password Utility ¦ ¦ ¦ ¦ F1=Help F2=Refresh F3=Cancel ¦F1¦ F8=Image F10=Exit Enter=Do ¦F5¦ /=Find n=Find Next ¦F9+-----------------------------------------------------------------------+


2. Create a list of users who can change their own password from any cluster node:

a. Start C-SPOC HACMP Security and Users Management by entering smitty cl_usergroup → Passwords in an HACMP cluster → Manage List of Users Allowed to Change Password.

b. SMIT shows the users who are allowed to change their password cluster-wide (see Figure 7-15).

Figure 7-15 Managing list of users allowed to change their password cluster-wide

c. To change the list of the users who are allowed to change their password cluster-wide, press F4 and select the user names from the pop-up list. Choose ALL_USERS to enable all current and future cluster users to use C-SPOC password management. See Figure 7-16.

We suggest that you include only real named users here, and manually change the password for the technical users.

Manage List of Users Allowed to Change Password

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[Entry Fields] Users allowed to change password [barnsk longr] + cluster-wide

F4 lists all users defined to all cluster nodes

F1=Help F2=Refresh F3=Cancel F4=ListF5=Reset F6=Command F7=Edit F8=ImageF9=Shell F10=Exit Enter=Do


Figure 7-16 Selecting users allowed to change their password cluster-wide

Manage List of Users Allowed to Change Password

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[Entry Fields] Users allowed to change password [barnsk longr] + +-----------------------------------------------------------------------+ ¦ Users allowed to change password ¦ ¦ cluster-wide ¦ ¦ Move cursor to desired item and press F7. ¦ ¦ ONE OR MORE items can be selected. ¦ ¦ Press Enter AFTER making all selections. ¦ ¦ ¦ ¦ ALL_USERS ¦ ¦ sshd ¦ ¦ btrcup ¦ ¦ killeer ¦ ¦ dbadm ¦ ¦ dbuser ¦ ¦ ¦ ¦ F1=Help F2=Refresh F3=Cancel ¦F1¦ F7=Select F8=Image F10=Exit ¦F5¦ Enter=Do /=Find n=Find Next ¦F9+-----------------------------------------------------------------------+

Note: If you enable C-SPOC password utilities for all users in the cluster, but you have users who only exist on one node, then you will get an error message similar to this:

# passwd shane
Changing password for "shane"
shane’s New password:
Enter the new password again:
node2: clpasswdremote: User shane does not exist on node node2
node2: cl_rsh had exit code = 1, see cspoc.log and/or clcomd.log for more information

The password is changed regardless of the error message.
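Once the link is in place, a simple way to confirm the setup on a node is to check what the passwd command now resolves to; if the SMIT option Link to Cluster Password Utility was applied, the listing should reference the /usr/es/sbin/cluster/utilities/clpasswd utility mentioned earlier (an informal check, not a documented procedure):

   # ls -l /bin/passwd /usr/bin/passwd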


Changing a user password with C-SPOC

Use the following procedure to change a user password with C-SPOC:

1. Start C-SPOC HACMP Security and Users Management by entering smitty cl_usergroup → Passwords in an HACMP cluster → Change a User's Password in the Cluster.

2. Select on which nodes you want to change the user’s password. Just leave this field empty for all nodes. If you select a resource group here, C-SPOC will change the password only on the nodes which belong to the resource group.

3. Type the user name or press F4 to select a user from the pop-up list.

4. Set User must change password on first login to either true or false as you prefer. See Figure 7-17.

Figure 7-17 Change a user’s password in the cluster

5. Press Enter and type the new password when prompted.

Changing your own password

Use the following procedure to change your own password:

1. Start C-SPOC HACMP Security and Users Management by entering smitty cl_usergroup → Passwords in an HACMP cluster → Change Current Users Password.

Change a User's Password in the Cluster

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[Entry Fields] Selection nodes by resource group *** No selection means all nodes! ***

* User NAME [caseyb] + User must change password on first login? true +

F1=Help F2=Refresh F3=Cancel F4=ListF5=Reset F6=Command F7=Edit F8=ImageF9=Shell F10=Exit Enter=Do

Tip: You can still use the AIX passwd command to change a specific user’s password on all nodes.


2. Select on which nodes you want to change your password. Leave this field empty for all nodes. If you select a resource group here, C-SPOC will only change the password on nodes belonging to that resource group.

3. Your user name is shown on the SMIT panel. See Figure 7-18.

Figure 7-18 Change your own password

4. Press Enter and change your password when prompted.

Now your password is changed on all the selected nodes.

7.4 Shared storage management

The PowerHA C-SPOC utility simplifies maintenance of shared LVM components in a cluster. C-SPOC commands provide comparable functions in a cluster environment to the standard AIX LVM commands, which can be run on a standalone node. By automating repetitive tasks on nodes within the cluster, C-SPOC eliminates a potential source of errors and makes cluster maintenance more efficient. Although you can use the AIX command line to administer the cluster nodes, we suggest that you use C-SPOC wherever possible.

Change Current Users Password

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[Entry Fields] Selection nodes by resource group [] + *** No selection means all nodes! ***

User NAME caseyb

F1=Help F2=Refresh F3=Cancel F4=ListF5=Reset F6=Command F7=Edit F8=ImageF9=Shell F10=Exit Enter=Do

Tip: You can use the passwd command to change your password on all nodes.


C-SPOC commands operate on both shared and concurrent LVM components to be used as part of a PowerHA resource group. When you use C-SPOC, the command runs on the local node and propagates the changes to the other cluster nodes where the operation is to be run.

You can find additional information about storage in Chapter 12, “Storage related considerations” on page 575.

7.4.1 Updating LVM components

When making changes to LVM components manually on nodes within a PowerHA cluster (including volume groups, logical volumes, and file systems), the commands will update the AIX ODM on the local node and the Volume Group Descriptor Area (VGDA) on the shared disks in the volume group. However, updates to the ODM on all remote nodes require manual propagation to ensure that the cluster operates successfully.

If you use C-SPOC to make LVM changes within a PowerHA cluster, the changes are propagated automatically to all nodes selected for the LVM operation.

Importing volume groups manually

The regular AIX-based procedure to propagate volume group ODM information to other nodes for non-concurrent volume groups is shown in Example 7-2. You can use the same steps for enhanced concurrent capable volume groups as well. You can also use the equivalent AIX SMIT command instead of the command line.

Example 7-2 Importing AIX volume groups manually

Tasks performed on the local node (where the volume group is active):
node_UK> lsvg -l app1vg
app1vg:
LV NAME     TYPE     LPs  PPs  PVs  LV STATE     MOUNT POINT
app1vglog   jfs2log  1    2    2    open/syncd   N/A
app1lv      jfs2     200  400  4    open/syncd   /app1
node_UK> umount /app1
node_UK> varyoffvg app1vg
node_UK> ls -l /dev/app1vg
crw-r----- 1 root system 90, 0 Mar 24 14:50 /dev/app1vg

Tasks performed on all the other nodes:
node_USA> lspv |grep app1vg
hdisk1 000685bf8595e225 app1vg
hdisk2 000685bf8595e335 app1vg
hdisk3 000685bf8595e445 app1vg
hdisk4 000685bf8595e559 app1vg
node_USA> exportvg app1vg
node_USA> importvg -y app1vg -n -V90 hdisk1
node_USA> chvg -a n app1vg
node_USA> varyoffvg app1vg

Instead of the export/import commands, you can use the importvg -L VGNAME HDISK command on the remote nodes, but be aware that the -L option requires that the volume group has not been exported on the remote nodes. The importvg -L command preserves the ownership of the logical volume devices.
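For instance, after an LVM change was made on the node that owns app1vg, the other nodes could refresh their ODM copy with a command of this form (names are examples taken from Example 7-2):

   node_USA> importvg -L app1vg hdisk1

This works only while the volume group definition still exists on that node; if it has already been exported there, use the exportvg/importvg sequence shown in Example 7-2 instead.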

Enhanced concurrent capable volume groups (introduced with HACMP 5.2) simplify LVM administration; most LVM changes can be made dynamically, even from the command line. For these dynamic changes to work correctly, gsclvmd, topsvcs, grpsvcs, and emsvcs must be running while performing maintenance. You can find more information about enhanced concurrent volume groups in Chapter 12, “Storage related considerations” on page 575.

Lazy update

In a cluster, PowerHA controls when volume groups are activated. PowerHA implements a function called lazy update.

This function examines the volume group timestamp, which is maintained in both the volume group’s VGDA, and the local ODM. AIX updates both these timestamps whenever a change is made to the volume group. When PowerHA is going to varyon a volume group, it compares the copy of the timestamp in the local ODM with that in the VGDA. If the values differ, PowerHA will cause the local ODM information on the volume group to be refreshed from the information in the VGDA.

If a volume group under PowerHA control is updated directly (that is, without going through C-SPOC), other nodes’ information on that volume group will be updated when PowerHA has to bring the volume group online on those nodes, but not before. The actual operations performed by PowerHA will depend on the state of the volume group at the time of activation.

Note: Ownership and permissions on logical volume devices are reset when a volume group is exported and then re-imported. After exporting and importing, a volume group will be owned by root:system. Some applications that use raw logical volumes might be affected by this. Check the ownership and permissions before exporting the volume group, and restore them manually afterward if they are not the default root:system.
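A simple way to handle this is to record the device ownership before the export and restore it afterward; the logical volume name and owner used here are examples only:

   node_UK> ls -l /dev/app1vg /dev/rapp1lv
   (export and re-import the volume group on the remote node as in Example 7-2)
   node_USA> chown dbadm:dbgroup /dev/rapp1lv
   node_USA> chmod 660 /dev/rapp1lv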


PowerHA does not require lazy update processing for enhanced concurrent volume groups, because it keeps all cluster nodes updated with the LVM information.

Importing volume groups automatically

PowerHA allows for importing volume groups onto all cluster nodes automatically. This is done through the Extended Resource Configuration SMIT menu. Automatic import allows you to create a volume group and then add it to the resource group immediately, without manually importing it onto each of the destination nodes in the resource group. To utilize this feature, run:

smitty hacmp → Extended Configuration → Extended Resource Configuration → HACMP Extended Resource Group Configuration → Change/Show Resources and Attributes for a Resource Group, then select the resource group and set Automatically Import Volume Groups to true.

The following must be true for PowerHA to import available volume groups:

- You must run discovery by running smitty hacmp → Extended Configuration → Discover HACMP-related Information from Configured Nodes.

- Logical volumes and file systems must have unique names cluster-wide.

- All physical disks must be known to AIX and have appropriate PVIDs assigned.

- The physical disks on which the volume group resides must be available to all of the nodes in the resource group.

Importing volume groups using C-SPOC

C-SPOC allows you to import volume groups on all cluster nodes from a single point of control. You can do this by running:

1. smitty hacmp → System Management (C-SPOC) → HACMP Logical Volume Management → Shared Volume Groups → Import a Shared Volume Group

2. Then select the volume group you want to import and the physical disk you want to use for the import operation. The SMIT panel appears, as shown in Figure 7-19.

Note: We recommend using C-SPOC to make LVM changes where possible rather than relying on lazy update. C-SPOC will import these changes to all nodes at the time of the C-SPOC operation unless a node is powered off.


Figure 7-19 C-SPOC importvg panel

7.4.2 C-SPOC Logical Volume Manager

The C-SPOC HACMP Logical Volume Management menu offers the ability to perform LVM commands similar to those in the AIX LVM SMIT menus (smitty lvm). When you use these C-SPOC functions, picklists are generated from the resources that are available for cluster administration, and resources are selected by name. After the desired resource (for example, a volume group or physical volume) has been selected, the panels that follow closely resemble the AIX LVM SMIT menus. For AIX administrators who are familiar with the AIX LVM SMIT menus, C-SPOC is a very easy tool to navigate.

To select the LVM C-SPOC menu for logical volume management, run smitty cl_admin → HACMP Logical Volume Management. There you will find the following menu options:

- Shared Volume Groups:

– List All Shared Volume Groups
– Create a Shared Volume Group
– Create a Shared Volume Group with Data Path Devices
– Enable a Shared Volume Group for Fast Disk Takeover
– Set Characteristics of a Shared Volume Group

Import a Shared Volume Group

Type or select values in entry fields.Press Enter AFTER making all desired changes. [Entry Fields] Resource Group Name vg_rg1 VOLUME GROUP name cspocvg Reference node panther PHYSICAL VOLUME name hdisk2 Volume group MAJOR NUMBER [100] +# Make this VG Concurrent Capable? no + Make default varyon of VG Concurrent? no +

F1=Help F2=Refresh F3=Cancel F4=ListF5=Reset F6=Command F7=Edit F8=ImageF9=Shell F10=Exit Enter=Do

Note: The volume group should already belong to a resource group in order to use this C-SPOC function.


– Import a Shared Volume Group
– Mirror a Shared Volume Group
– Unmirror a Shared Volume Group

- Shared Logical Volumes:

– List All Shared Logical Volumes by Volume Group
– Add a Shared Logical Volume
– Set Characteristics of a Shared Logical Volume
– Show Characteristics of a Shared Logical Volume
– Change a Shared Logical Volume
– Remove a Shared Logical Volume

- Shared File Systems:

– Journaled File Systems
– Enhanced Journaled File Systems

For both file system types, the submenus are as follows:

• Add an (Enhanced) Journaled File System

• Add an (Enhanced) Journaled File System on a Previously Defined Logical Volume

• List All Shared File Systems

• Change / Show Characteristics of a Shared (Enhanced) Journaled File System

• Remove a Shared File System

- Synchronize Shared LVM Mirrors:

– Synchronize by Volume Group
– Synchronize by Logical Volume

- Synchronize a Shared Volume Group Definition

You can find more detailed descriptions of the specific tasks in 7.4.5, “Examples” on page 360.

7.4.3 C-SPOC Concurrent Logical Volume Management

The C-SPOC HACMP Concurrent Logical Volume Management menu offers the ability to perform LVM commands similar to those in the C-SPOC LVM menus described in the previous section.


To select the LVM C-SPOC menu for concurrent logical volume management, run smitty cl_admin → HACMP Concurrent Logical Volume Management. There you will find the following menu options:

- Concurrent Volume Groups:

– List All Concurrent Volume Groups
– Create a Concurrent Volume Group
– Create a Concurrent Volume Group with Data Path Devices
– Set Characteristics of a Concurrent Volume Group
– Import a Concurrent Volume Group
– Mirror a Concurrent Volume Group
– Unmirror a Concurrent Volume Group
– Manage Concurrent Volume Groups for Multi-Node Disk Heartbeat

- Concurrent Logical Volumes:

– List All Concurrent Logical Volumes by Volume Group
– Add a Concurrent Logical Volume
– Set Characteristics of a Concurrent Logical Volume
– Show Characteristics of a Concurrent Logical Volume
– Remove a Concurrent Logical Volume

� Synchronize Concurrent LVM Mirrors:

– Synchronize by Volume Group– Synchronize by Logical Volume

You can find more detailed descriptions of the specific tasks in 7.4.5, “Examples” on page 360.

7.4.4 C-SPOC Physical Volume Management

The C-SPOC HACMP Physical Volume Management menu offers the ability to perform physical volume management and SDD virtual path (vpath) management within the cluster. To access it, run smitty cl_admin → HACMP Physical Volume Management. There you will find the following menu options:

• Add a Disk to the Cluster

• Remove a Disk From the Cluster

• Cluster Disk Replacement

• Cluster Data Path Device Management:

  – Display Data Path Device Configuration
  – Display Data Path Device Status
  – Display Data Path Device Adapter Status
  – Define and Configure all Data Path Devices
  – Add Paths to Available Data Path Devices


  – Configure a Defined Data Path Device
  – Remove a Data Path Device
  – Convert ESS hdisk Device Volume Group to an SDD VPATH Device Volume Group
  – Convert SDD VPATH Device Volume Group to an ESS hdisk Device Volume Group

• Configure Disk/Site Locations for Cross-Site LVM Mirroring

More information about Cross-Site mirroring implementation is given in the example in Chapter 15, “PowerHA with cross-site LVM mirroring” on page 641.

You can find more detailed descriptions of the specific physical volume management tasks in 7.4.5, “Examples” on page 360.

7.4.5 Examples

In this section we present some scenarios with C-SPOC LVM options to administer your cluster. We show the following examples:

1. Adding a scalable enhanced concurrent volume group to the existing cluster

2. Adding a concurrent volume group and a new concurrent resource group to the existing cluster

3. Enabling disks in a volume group for Cross-Site LVM Mirroring

4. Creating a new logical volume

5. Creating a new jfs2log logical volume

6. Creating a new file system

7. Extending a shared file system for a VG using Cross-Site LVM

8. Increasing the size of a file system

9. Removing a file system

In our examples we used a two-node cluster based on VIO clients, with one Ethernet network using IPAT via aliasing and a disk heartbeat network for non-IP communication. The storage is DS4000 storage presented through the VIO servers. Figure 7-20 shows our test cluster setup.


Figure 7-20 C-SPOC LVM testing cluster setup

Adding a scalable enhanced concurrent volume group

The following example shows how to add a new volume group into the cluster. We also explain how you can enable fast disk takeover (enhanced concurrent capable) for this volume group and add it into an existing resource group at the same time.

Before creating a shared VG for the cluster using C-SPOC, we check that the following conditions are true (a quick verification sketch follows the list):

• All disk devices are properly configured on all cluster nodes and the devices are listed as available on all nodes.

• Disks have a PVID.
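The following quick check is only a sketch (the node name ndu1 and the disk name hdisk2 simply match the names used later in this chapter; substitute your own). Run it on every cluster node:

ndu1 /# lsdev -Cc disk        # all shared disks should be in the Available state
ndu1 /# lspv                  # every shared disk should already show a PVID, not "none"

If a disk still shows none as its PVID, you can assign one with chdev -l hdiskX -a pv=yes (run it against the same disk on each node so every node picks up the PVID), or use the cli_assign_pvids command described in 7.4.6, “C-SPOC command line interface (CLI)”.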

In PowerHA 5.5 SP1, there are significant enhancements to the way we add new volume groups to the cluster.


We added the enhanced concurrent capable volume group by running:

smitty cl_admin → HACMP Logical Volume Management → Shared Volume Groups → Create a Shared Volume Group, then use F7 to select the nodes and press Enter, use F7 to select the disk(s) and press Enter, and finally select the VG type from the picklist.

As a result of the volume group type that we chose, we create a scalable volume group as shown in Example 7-3. From here, if we also want to add this new volume group to a resource group, we have the option to either select an existing resource group from the picklist or we can create a new resource group.

Example 7-3 Adding a scalable enhanced concurrent VG

Create a Shared Scalable Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                 [Entry Fields]
  Node Names                                          ndu1,ndu2
  Resource Group Name                                 [existing_rg]     +
  PVID                                                000fe4013b8a8cbb
  VOLUME GROUP name                                   [new_vg]
  Physical partition SIZE in megabytes                4                 +
  Volume group MAJOR NUMBER                           [35]              #
  Enable Volume Group for Fast Disk Takeover?         true              +
  Volume Group Type                                   Scalable
  Maximum Physical Partitions in units of 1024        32                +
  Maximum Number of Logical Volumes                   256               +

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do

Tip: From PowerHA 5.5 SP1, we can now choose the volume group type from the C-SPOC picklist. The options we now have are Original, Big, Scalable, and Legacy (mkvg -I, an uppercase i, for VGs that can be imported to AIX 5.2 and AIX 5.1). In previous releases, some of these additional options had to be enabled manually at the AIX command line.

Important: When choosing to create a new resource group from the C-SPOC Logical Volume Management menu, the resource group will be created with the following default policies. These can be changed in the HACMP Extended Resource Group Configuration after creation if desired:

• Startup: Online on home node only
• Fallover: Fallover to next priority node in the list
• Fallback: Never Fallback
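After the C-SPOC operation completes, it is worth confirming that the new volume group definition has reached every node before relying on it. This is only a quick sketch; new_vg is the name used in Example 7-3:

# run on each cluster node
lsvg | grep new_vg        # the VG should be listed in the ODM of every node
lspv | grep new_vg        # the shared disk(s) should now show new_vg as their volume group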


Adding a concurrent VG and a new concurrent RG

The following example shows how to add a new concurrent volume group into the cluster. We also show how to add this new volume group into a new concurrent resource group in the same operation.

Before creating a shared VG for the cluster using C-SPOC, we check that the following conditions are true:

• All disk devices are properly configured on all cluster nodes and the devices are listed as available on all nodes.

• Disks have a PVID.

We added the concurrent volume group and resource group by running:

1. smitty cl_admin → HACMP Concurrent Logical Volume Management → Concurrent Volume Groups → Create a Concurrent Volume Group

2. Use F7 to select the nodes, press Enter, use F7 to select the disk(s), press Enter, then select the VG type from the picklist.

As a result of the volume group type that we chose, we created a big, concurrent volume group as displayed in Example 7-4.

Example 7-4 Create a new concurrent volume group and concurrent resource group

Create a Concurrent Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                 [Entry Fields]
  Node Names                                          ndu1,ndu2
  Resource Group Name                                 [new_OOAAN_rg]    +
  PVID                                                000fe4013b8a8cbb
  VOLUME GROUP name                                   [conc_vg]
  Physical partition SIZE in megabytes                4                 +
  Volume group MAJOR NUMBER                           [35]             +#
  Enable Cross-Site LVM Mirroring Verification        false             +
  Volume Group Type                                   Big

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do

In Example 7-5 you can see the output from the command used to create this volume group and resource group. The cluster must now be synchronized for the resource group changes to take effect; however, the volume group information has already been imported to all the cluster nodes that were selected for the operation.


Example 7-5 Output from concurrent VG and concurrent RG creation

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

ndu1: conc_vg
ndu1: mkvg: This concurrent capable volume group must be varied on manually.
ndu1: synclvodm: No logical volumes in volume group conc_vg.
ndu1: Volume group conc_vg has been updated.
ndu2: synclvodm: No logical volumes in volume group conc_vg.
ndu2: 0516-783 importvg: This imported volume group is concurrent capable.
ndu2:   Therefore, the volume group must be varied on manually.
ndu2: 0516-1804 chvg: The quorum change takes effect immediately.
ndu2: Volume group conc_vg has been imported.
cl_mkvg: The HACMP configuration has been changed - Resource Group new_OOAAN_rg
  has been added. The configuration must be synchronized to make this change
  effective across the cluster
cl_mkvg: Discovering Volume Group Configuration...
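Before synchronizing the cluster (smitty hacmp → Extended Configuration → Extended Verification and Synchronization), you can confirm that the import really happened on both nodes. This is only a sketch, using the conc_vg name from Example 7-4:

# run on ndu1 and on ndu2
lsvg | grep conc_vg       # the volume group definition should exist on both nodes
lspv | grep conc_vg       # the shared disk should now show conc_vg as its volume group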

Important: When choosing to create a new concurrent resource group from the C-SPOC Concurrent Logical Volume Management menu, the resource group will be created with the following default policies:

• Startup: Online On All Available Nodes
• Fallover: Bring Offline (On Error Node Only)
• Fallback: Never Fallback

Enabling disks in a volume group for Cross-Site LVM Mirroring

The following example shows two ways to enable disks in a volume group for Cross-Site LVM Mirroring.

Restriction: When enabling Cross-Site LVM Mirroring Verification for a volume group, sites must already be configured in your cluster.

As seen in Example 7-6, we can enable Cross-Site LVM Mirroring Verification on creation of a new volume group.

Example 7-6 Create a shared volume group and enable Cross-Site LVM Mirroring

Create a Shared Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                 [Entry Fields]
  Node Names                                          ndu1,ndu2
  Resource Group Name                                 [xsite_rg]        +
  PVID                                                000fe4013b8a8cbb
  VOLUME GROUP name                                   [xsite_vg]
  Physical partition SIZE in megabytes                4                 +
  Volume group MAJOR NUMBER                           [35]              #
  Enable Cross-Site LVM Mirroring Verification        true              +
  Enable Volume Group for Fast Disk Takeover?         true              +
  Volume Group Type                                   Original

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do

Creating a new logical volume

The following example shows how to create a new logical volume in the selected volume group, which is already active as part of a resource group.

We add the LV adminlv in the VG named adminvg by running smitty cl_admin → HACMP Logical Volume Management → Shared Logical Volumes → Add a Shared Logical Volume. We select the adminvg VG from the picklist. On the subsequent panel, we select the devices for LV allocation as shown in Example 7-7.

Example 7-7 C-SPOC creating new LV - 1

+--------------------------------------------------------------------------+
|                         Physical Volume Names                            |
|                                                                           |
| Move cursor to desired item and press F7.                                 |
| ONE OR MORE items can be selected.                                        |
| Press Enter AFTER making all selections.                                  |
|                                                                           |
|   Auto-select                                                             |
|   ndu1 hdisk2                                                             |
|                                                                           |
+--------------------------------------------------------------------------+

Then we populated the necessary fields as shown in Example 7-8.

Example 7-8 C-SPOC creating new LV - 2

Add a Shared Logical Volume

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                 [Entry Fields]
  Resource Group Name                                 cspoc_rg
  VOLUME GROUP name                                   adminvg
  Reference node                                      ndu1
* Number of LOGICAL PARTITIONS                        [10]              #
  PHYSICAL VOLUME names                               hdisk2
  Logical volume NAME                                 [adminlv]
  Logical volume TYPE                                 [jfs2]            +
  POSITION on physical volume                         middle            +
  RANGE of physical volumes                           minimum           +
  MAXIMUM NUMBER of PHYSICAL VOLUMES                  []                #
    to use for allocation
  Number of COPIES of each logical                    1                 +
    partition

The adminlv logical volume is created and its definition is propagated to the other cluster nodes.
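To confirm that the new logical volume is known everywhere, a quick check such as the following can be run. This is only a sketch, using the adminlv and adminvg names from our example:

# on the node that currently owns the resource group
lsvg -l adminvg           # adminlv should be listed in the volume group
# on the other node(s), the definition should also be present
lslv adminlv              # may show fewer details while the VG is varied off on this node
ls -l /dev/adminlv        # the device special file should exist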

Creating a new jfs2log logical volume

To add a new jfs2log logical volume, adminloglv, in the adminvg volume group, we used the same procedure as described previously in “Creating a new logical volume” on page 365. In the C-SPOC creating new LV - 2 panel, shown in Example 7-8 on page 365, we select jfs2log as the type of the LV:

Logical volume TYPE                                  [jfs2log]

Creating a new file system

The following example shows how to create a jfs2 file system on a previously defined logical volume. We run the following commands:

1. smitty cl_admin → HACMP Logical Volume Management → Shared File Systems → Enhanced Journaled File Systems → Add an Enhanced Journaled File System on a Previously Defined Logical Volume

2. Then we select the previously created logical volume from the picklist. After that, we fill in the necessary fields as shown in Example 7-9.

Example 7-9 C-SPOC creating jfs2 file system on a previously defined LV

Add an Enhanced Journaled File System on a Previously Defined Logical Volume

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                      [Entry Fields]
  Resource Group                                      cspoc_rg
  Node Names                                          ndu1,ndu2
  LOGICAL VOLUME name                                 adminlv
  Volume Group                                        adminvg
* MOUNT POINT                                         [/admincspoc]      /
  PERMISSIONS                                         read/write        +
  Mount OPTIONS                                       []                +
  Block Size (bytes)                                  4096              +
  Inline Log?                                         no                +
  Inline Log size (MBytes)                            []                #
  Logical Volume for Log                              adminloglv        +
  Extended Attribute Format                           Version 1         +
  ENABLE Quota Management?                            no                +

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do

The /admincspoc file system is now created. The contents of /etc/filesystems on both nodes are now updated with the correct jfs2log, and our file system is mounted in the running cluster.
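A quick way to confirm this on each node is shown below. This is only a sketch, using the /admincspoc and adminloglv names from our example:

lsfs /admincspoc                      # the file system definition is known on this node
grep -p admincspoc /etc/filesystems   # the stanza should reference log = /dev/adminloglv
mount | grep admincspoc               # mounted only on the node that owns the resource group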

Extending a shared file system for a VG using Cross-Site LVM

The following example shows how to extend a shared file system within an existing volume group that is already being used for Cross-Site LVM mirroring. We run the following commands:

1. smitty cl_admin → HACMP Logical Volume Management → Shared Logical Volumes → Set Characteristics of a Shared Logical Volume → Increase the Size of a Shared Logical Volume

2. Select the logical volume to be extended, as seen in Example 7-10.

Example 7-10 Increase the size of a shared LV - LV selection

Set Characteristics of a Shared Logical Volume

Move cursor to desired item and press Enter.

  Rename a Shared Logical Volume
  Increase the Size of a Shared Logical Volume
  Add a Copy to a Shared Logical Volume
  Remove a Copy from a Shared Logical Volume

+--------------------------------------------------------------------------+
¦                      Shared Logical Volume Names                         ¦
¦                                                                           ¦
¦ Move cursor to desired item and press Enter.                              ¦
¦                                                                           ¦
¦   test loglv02                                                            ¦
¦ > test xsitelv                                                            ¦
¦                                                                           ¦
¦ Esc+1=Help               Esc+2=Refresh              Esc+3=Cancel          ¦
¦ F8=Image                 F10=Exit                   Enter=Do              ¦
¦ /=Find                   n=Find Next                                      ¦
+--------------------------------------------------------------------------+

Tip: With jfs2, we also have the option to use inline logs, which can be configured from the options shown in Example 7-9.

3. The next panel displays the disks on the local site that are members of the volume group. Select the disks you want to extend the logical volume on as seen in Example 7-11. Do not use AUTO-SELECT.

Example 7-11 Increase the size of a shared LV - Disk selection

Set Characteristics of a Shared Logical Volume

Move cursor to desired item and press Enter.

  Rename a Shared Logical Volume
  Increase the Size of a Shared Logical Volume
  Add a Copy to a Shared Logical Volume
  Remove a Copy from a Shared Logical Volume

+--------------------------------------------------------------------------+
¦                         Physical Volume Names                            ¦
¦                                                                           ¦
¦ Move cursor to desired item and press F7.                                 ¦
¦ ONE OR MORE items can be selected.                                        ¦
¦ Press Enter AFTER making all selections.                                  ¦
¦                                                                           ¦
¦   Auto-select                                                             ¦
¦ > jordan hdisk9   Scotland                                                ¦
¦ > jordan hdisk10  Scotland                                                ¦
¦                                                                           ¦
¦ Esc+1=Help               Esc+2=Refresh              Esc+3=Cancel          ¦
¦ F7=Select                F8=Image                   F10=Exit              ¦
¦ Enter=Do                 /=Find                     n=Find Next           ¦
+--------------------------------------------------------------------------+


4. After the logical volume has been increased, check the size with lsfs -q as shown in Example 7-12.

Example 7-12 Increase the size of a shared LV - Complete and lsfs -q output

Increase the Size of a Shared Logical Volume

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                      [Entry Fields]
  Resource Group Name                                 testrg
  LOGICAL VOLUME name                                 xsitelv
  Reference node                                      jordan
* Number of ADDITIONAL logical partitions             [5]               #
  PHYSICAL VOLUME names                               hdisk9 hdisk10
  POSITION on physical volume                         outer_middle      +
  RANGE of physical volumes                           minimum           +
  MAXIMUM NUMBER of PHYSICAL VOLUMES                  []                #
    to use for allocation
  Allocate each logical partition copy                yes               +
    on a SEPARATE physical volume?
  File containing ALLOCATION MAP                      []

jordan /# lsfs -q
<snip>
/dev/xsitelv -- /xsitefs jfs2 163840 rw no no
  (lv size: 327680, fs size: 163840, block size: 4096, sparse files: yes,
  inline log: no, inline log size: 0, EAformat: v1, Quota: no, DMAPI: no, VIX: no)
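Because the new partitions were deliberately placed on disks at one site, it can be worth confirming where the copies actually landed before growing the file system. This is only a sketch, using the xsitelv name from this example; run it on the node that currently owns the volume group:

jordan /# lslv xsitelv          # check the COPIES value and the allocation policy
jordan /# lslv -m xsitelv       # shows which physical volume holds each copy of every LP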

Now you can extend the file system as seen in Example 7-13. Run the following command:

5. smitty cl_admin → HACMP Logical Volume Management → Shared File Systems → Enhanced Journaled File Systems → Change / Show Characteristics of a Shared Enhanced Journaled File System

Example 7-13 Increase the size of a file system

Change/Show Characteristics of a Shared Enhanced Journaled File System

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                 [Entry Fields]
  Resource Group Name                                 rg1
  File system name                                    /xsitefs
  Node Names                                          ndu1,ndu2
  NEW mount point                                     [/xsitefs]
  Volume group name                                   xsite_vg
  SIZE of file system
          Unit Size                                   Megabytes         +
          Number of units                             [+1]              #
  Mount GROUP                                         []
  Mount AUTOMATICALLY at system restart?              no
  PERMISSIONS                                         read/write        +
  Mount OPTIONS                                       []                +
  Start Disk Accounting?                              no                +
[MORE...6]

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do

Increasing the size of a file system

The following example shows how to increase the size of a shared file system with C-SPOC. We run the following commands:

1. smitty cl_admin → HACMP Logical Volume Management → Shared File Systems → Enhanced Journaled File Systems → Change / Show Characteristics of a Shared Enhanced Journaled File System

2. The SMIT file system selection list appears and we select the /admincspoc file system as shown in Example 7-14.

Example 7-14 C-SPOC change file system selection

+--------------------------------------------------------------------------+
|        Enhanced Journaled File System Name and Resource Group            |
|                                                                           |
| Move cursor to desired item and press Enter.                              |
|                                                                           |
|   #File System     Volume Group     Resource Group     Node List          |
|   /admincspoc      adminvg          cspoc_rg           ndu1,ndu2          |
|   /rosie           bcklupvg         <None>             ndu1,ndu2          |
|                                                                           |
| F1=Help                  F2=Refresh                 F3=Cancel             |
| F8=Image                 F10=Exit                   Enter=Do              |
| /=Find                   n=Find Next                                      |
+--------------------------------------------------------------------------+


3. After selecting the file system, we chose to add 1 GB in size as shown in Example 7-15. You can change the unit size as applicable from the picklist; we chose to increase the size in gigabytes.

Example 7-15 C-SPOC change file system - options

Change/Show Characteristics of a Shared Enhanced Journaled File System

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                 [Entry Fields]
  Resource Group Name                                 cspoc_rg
  File system name                                    /admincspoc
  Node Names                                          ndu1,ndu2
  NEW mount point                                     [/admincspoc]
  Volume group name                                   adminvg
  SIZE of file system
          Unit Size                                   Gigabytes         +
          Number of units                             [+1]              #
  Mount GROUP                                         []
  Mount AUTOMATICALLY at system restart?              no
  PERMISSIONS                                         read/write        +
  Mount OPTIONS                                       []                +
  Start Disk Accounting?                              no                +
  Block Size (bytes)                                  4096
  Inline Log?                                         no
  Inline Log size (MBytes)                            [0]               #
  Extended Attribute Format                           [v1]
  ENABLE Quota Management?                            no
  Allow Small Inode Extents?                          [yes]             +
[BOTTOM]

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do
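The same growth can also be scripted with the C-SPOC CLI that is described in 7.4.6, “C-SPOC command line interface (CLI)”. This is only a sketch, using the /admincspoc file system from Example 7-15 and the chfs-style size attribute (the +1G shorthand assumes an AIX level that accepts it; otherwise specify the size in 512-byte blocks):

/usr/es/sbin/cluster/cspoc/cli_chfs -a size=+1G /admincspoc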

Removing a file system

The following example shows how to remove a shared file system with C-SPOC. Before starting the action, it is necessary to unmount the file system manually.

1. Run the following command:

umount /admincspoc


2. Then we run:

smitty cl_admin → HACMP Logical Volume Management → Shared File Systems → Enhanced Journaled File Systems → Remove a Shared File System

3. We selected the file system from the picklist and confirmed the action on the next SMIT panel, which is shown in Example 7-16.

Example 7-16 C-SPOC remove a file system

Remove a Shared File System

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                      [Entry Fields]
  Resource Group Name                                 cspoc_rg
* FILE SYSTEM name                                    /admincspoc       +
  Node Names                                          ndu1,ndu2
  Volume group name                                   adminvg
  Remove Mount Point                                  no                +

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do
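After the removal, it is easy to verify on each node that the file system definition is really gone. A quick sketch, using the /admincspoc name from the example:

lsfs /admincspoc                      # should now report that the file system is not found
grep -c admincspoc /etc/filesystems   # should return 0 on both nodes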

7.4.6 C-SPOC command line interface (CLI)

PowerHA 5.5 SP1 introduced a new command line interface (CLI) for C-SPOC. The CLI is a set of C-SPOC commands that can be run from the command line to perform the same tasks as their corresponding C-SPOC SMIT menus.

The CLI is intended for root users who need to run certain tasks from shell scripts rather than through a SMIT menu. The C-SPOC CLI commands are located in the /usr/es/sbin/cluster/cspoc directory and they all have a name with the cli_ prefix.

Similar to the C-SPOC SMIT menus, the CLI commands log their operations in the cspoc.log log file on the node where the CLI command has been run.


A list of the commands is shown in Figure 7-21. While the names are quite descriptive as to what function each one provides, we have also provided their corresponding man pages.

Note: The C-SPOC CLI is also available in 5.3 SP6 and 5.4 SP2.

tarah /usr/es/sbin/cluster/cspoc# ls -al cli_*
-rwxr-xr-x   1 root     system       3276 Oct 07 2008  cli_assign_pvids
-rwxr-xr-x   1 root     system       2454 Oct 07 2008  cli_chfs
-rwxr-xr-x   1 root     system       2388 Oct 07 2008  cli_chlv
-rwxr-xr-x   1 root     system       2564 Oct 07 2008  cli_chvg
-rwxr-xr-x   1 root     system       2446 Oct 07 2008  cli_crfs
-rwxr-xr-x   1 root     system       2751 Oct 07 2008  cli_crlvfs
-rwxr-xr-x   1 root     system       3329 Oct 07 2008  cli_extendlv
-rwxr-xr-x   1 root     system       3602 Oct 07 2008  cli_extendvg
-rwxr-xr-x   1 root     system       4249 Oct 07 2008  cli_importvg
-rwxr-xr-x   1 root     system       3313 Oct 07 2008  cli_mirrorvg
-rwxr-xr-x   1 root     system       3676 Oct 07 2008  cli_mklv
-rwxr-xr-x   1 root     system       2611 Oct 07 2008  cli_mklvcopy
-rwxr-xr-x   1 root     system       5262 Oct 07 2008  cli_mkvg
-rwxr-xr-x   1 root     system       2710 Oct 07 2008  cli_on_cluster
-rwxr-xr-x   1 root     system       3264 Oct 07 2008  cli_on_node
-rwxr-xr-x   1 root     system       3648 Oct 07 2008  cli_reducevg
-rwxr-xr-x   1 root     system       3870 Oct 07 2008  cli_replacepv
-rwxr-xr-x   1 root     system       2412 Oct 07 2008  cli_rmfs
-rwxr-xr-x   1 root     system       2883 Oct 07 2008  cli_rmlv
-rwxr-xr-x   1 root     system       2612 Oct 07 2008  cli_rmlvcopy
-rwxr-xr-x   1 root     system       2411 Oct 07 2008  cli_syncvg
-rwxr-xr-x   1 root     system       3336 Oct 07 2008  cli_unmirrorvg
-rwxr-xr-x   1 root     system       2386 Oct 07 2008  cli_updatevg

Figure 7-21 C-SPOC CLI command listing
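If you intend to call these commands from administration scripts, it can be convenient to put the C-SPOC directory in the PATH of the root user. This is only a small sketch; the directory is the one shown in Figure 7-21:

export PATH=$PATH:/usr/es/sbin/cluster/cspoc
# list the available C-SPOC CLI commands
ls /usr/es/sbin/cluster/cspoc/cli_*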

cli_assign_pvids

Assign a PVID to each of the disks passed as arguments, then update all other cluster nodes with those PVIDs.

Syntax

cli_assign_pvids PhysicalVolume ...

Description

Directs LVM to assign a PVID to each of the physical volumes in the list (if one is not already present), and then makes those PVIDs known on all cluster nodes.




Examples

To assign PVIDs to a list of disks, and have those PVIDs known across the cluster, enter:

cli_assign_pvids hdisk101 hdisk102 hdisk103

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster. It should only be used on physical disks accessible from all cluster nodes.

The '-f' flag is passed to cl_assign_pvids to suppress unnecessary checking. As a consequence, the operation will proceed even if some nodes are not accessible.

Files

/usr/es/sbin/cluster/cspoc/cli_assign_pvids The cli_assign_pvids command executable file

/usr/es/sbin/cluster/cspoc/clgetpvids C-SPOC utility to update PVID in ODM

/usr/es/sbin/cluster/sbin/cl_assign_pvids The C-SPOC cl_assign_pvids executable

/usr/es/sbin/cluster/samples/cspoc/cl_assign_pvids.cel Source code for cl_assign_pvids

cli_chfs

Change the attributes of a file system on all nodes in a cluster.

Syntax

cli_chfs [ -m NewMountPoint ] [ -u MountGroup ] [ -p { ro | rw } ] [ -t { yes | no } ] [ -a Attribute=Value ] [ -d Attribute ] FileSystem

Description

Uses C-SPOC to run the chfs command with the given parameters, and make the updated file system definition known on all cluster nodes.


Flags

Only the following flags from the chfs command are supported:

-d Attribute

Deletes the specified attribute from the /etc/filesystems file for the specified file system.

-m NewMountPoint

Specifies a new mount point for the specified file system.

-p

Sets the permissions for the file system.

ro

Specifies read-only permissions.

rw

Specifies read-write permissions.

-t

Sets the accounting attribute for the specified file system.

yes

File system accounting is to be processed by the accounting subsystem.

no

File system accounting is not to be processed by the accounting subsystem; this is the default.

-u MountGroup

Specifies the mount group. Mount groups are used to group related mounts, so that they can be mounted as one instead of mounting each individually. For example, when performing certain tests, if several scratch file systems always need to be mounted together, they can each be placed in the test mount group. They can then all be mounted with a single command, such as the mount -t test command.

-a Attribute=Value

Specifies the Attribute=Value pairs dependent on virtual file system type. To specify more than one Attribute=Value pair, provide multiple -a Attribute=Value parameters.


Examples

In general, any operation valid with the chfs command that uses the supported operands above is valid with cli_chfs. For example, to change the size of the shared file system /test:

cli_chfs -a size=32768 /test

To increase the size of the shared file system called /lv1_fs1, issue:

node61# df -m /lv1_fs1
Filesystem    MB blocks      Free %Used    Iused %Iused Mounted on
/dev/lv1          64.00     61.95    4%       17     1% /lv1_fs1

node62# cli_chfs -a size=+16M /lv1_fs1

node61# df -m /lv1_fs1
Filesystem    MB blocks      Free %Used    Iused %Iused Mounted on
/dev/lv1          80.00     77.45    4%       17     1% /lv1_fs1

Note: The /lv1_fs1 file system is locally mounted on node61, and we ran cli_chfs from the other cluster node, node62. This is perfectly valid; the communication between the nodes occurs through the clcomdES daemon.

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

It should not be used on file systems in rootvg, or that otherwise might appear multiple times across the cluster. The automount attribute is not supported.

The '-f' flag is passed to cl_chfs to suppress unnecessary checking. As a consequence, the operation will proceed even if some nodes are not accessible.

Files

/usr/es/sbin/cluster/cspoc/cli_chfs                The cli_chfs command executable file

/usr/es/sbin/cluster/sbin/cl_chfs                  The C-SPOC cl_chfs executable

/usr/es/sbin/cluster/samples/cspoc/cl_chfs.cel     Source code for cl_chfs

cli_chlv

Change the attributes of a logical volume on all nodes in a cluster.


Syntax

cli_chlv [-a Position] [-b BadBlocks] [-d Schedule] [-e Range] [-L label] [-p Permission] [-r Relocate] [-s Strict] [-t Type] [-u Upperbound] [-v Verify] [-w MirrorWriteConsistency] [-x Maximum] [-U userid] [-G groupid] [-P modes] LogicalVolume

Description

Uses C-SPOC to run the chlv command with the given parameters, and make the updated logical volume definition known on all cluster nodes.

Flags

Only the following flags from the chlv command are supported:

-a Position

Sets the intraphysical volume allocation policy (the position of the logical partitions on the physical volume). The Position variable is represented by one of the following values:

m

Allocates logical partitions in the outer middle section of each physical volume. This is the default position.

c

Allocates logical partitions in the center section of each physical volume.

e

Allocates logical partitions in the outer edge section of each physical volume.

ie

Allocates logical partitions in the inner edge section of each physical volume.

im

Allocates logical partitions in the inner middle section of each physical volume.

-b BadBlocks

Sets the bad-block relocation policy. The BadBlocks variable is represented by one of the following values:

y

Causes bad-block relocation to occur.


n

Prevents bad block relocation from occurring.

-d Schedule

Sets the scheduling policy when more than one logical partition is written. Must use parallel or sequential to mirror a striped lv. The Schedule variable is represented by one of the following values:

p

Establishes a parallel scheduling policy.

ps

Parallel write with sequential read policy. All mirrors are written in parallel but always read from the first mirror if the first mirror is available.

pr

Parallel write round robin read. This policy is similar to the parallel policy except an attempt is made to spread the reads to the logical volume more evenly across all mirrors.

s

Establishes a sequential scheduling policy. (When mirroring a striped logical volume with a parallel or sequential policy, the strictness must be set to s, for super strictness.)

-e Range

Sets the interphysical volume allocation policy (the number of physical volumes to extend across, using the volumes that provide the best allocation). The value of the Range variable is limited by the Upperbound variable, set with the -u flag, and is represented by one of the following values:

x

Allocates logical partitions across the maximum number of physical volumes.

m

Allocates logical partitions across the minimum number of physical volumes.

-G Groupid

Specifies group ID for the logical volume special file.

-L Label

Sets the logical volume label. The maximum size of the Label variable is 127 characters.


-n NewLogicalVolume

Changes the name of the logical volume to that specified by the NewLogicalVolume variable. Logical volume names must be unique system wide and can range from 1 to 15 characters.

-p Permission

Sets the access permission to read-write or read-only. The Permission variable is represented by one of the following values:

w

Sets the access permission to read-write.

r

Sets the access permission to read-only.

-P Modes

Specifies permissions (file modes) for the logical volume special file.

-r Relocate

Sets the reorganization flag to allow or prevent the relocation of the logical volume during reorganization. The Relocate variable is represented by one of the following values:

y

Allows the logical volume to be relocated during reorganization. If the logical volume is striped, the chlv command will not let you change the relocation flag to y.

n

Prevents the logical volume from being relocated during reorganization.

-s Strict

Determines the strict allocation policy. Copies of a logical partition can be allocated to share or not to share the same physical volume. The Strict variable is represented by one of the following values:

y

Sets a strict allocation policy, so copies of a logical partition cannot share the same physical volume.

Note: Mounting a JFS file system on a read-only logical volume is not supported.


n

Does not set a strict allocation policy, so copies of a logical partition can share the same physical volume.

s

Sets a super strict allocation policy, so that the partitions allocated for one mirror cannot share a physical volume with the partitions from another mirror.

-t Type

Sets the logical volume type. The maximum size is 31 characters. If the logical volume is striped, you cannot change Type to boot.

-U Userid

Specifies user ID for the logical volume special file.

-u Upperbound

Sets the maximum number of physical volumes for new allocation. The value of the Upperbound variable should be between one and the total number of physical volumes. When using super strictness, the upperbound indicates the maximum number of physical volumes allowed for each mirror copy. When using striped logical volumes, the upper bound must be multiple of Stripe_width.

-v Verify

Sets the write-verify state for the logical volume. Causes all writes to the logical volume either to be verified with a follow-up read or not to be verified with a follow-up read. The Verify variable is represented by one of the following values:

y

Causes all writes to the logical volume to be verified with a follow-up read.

n

Causes all writes to the logical volume not to be verified with a follow-up read.

Note: When changing a non-superstrict logical volume to a superstrict logical volume you must use the -u flag.


-w MirrorWriteConsistency

y or a

Turns on active mirror write consistency which ensures data consistency among mirrored copies of a logical volume during normal I/O processing.

p

Turns on passive mirror write consistency which ensures data consistency among mirrored copies during volume group synchronization after a system interruption.

n

No mirror write consistency. See the -f flag of the syncvg command.

-x Maximum

Sets the maximum number of logical partitions that can be allocated to the logical volume. The maximum number of logical partitions per logical volume is 32,512.

Examples

In general, any operation valid with the chlv command that uses the supported operands above is valid with cli_chlv. For example, to change the interphysical volume allocation of logical volume lv01, enter:

cli_chlv -e m lv01

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

It should not be used on any logical volume in rootvg, or that otherwise might appear multiple times across the cluster.

The '-f' flag is passed to cl_chlv to suppress unnecessary checking. As a consequence, the operation will proceed even if some nodes are not accessible.

Files

/usr/es/sbin/cluster/cspoc/cli_chlv                The cli_chlv command executable file

/usr/es/sbin/cluster/sbin/cl_chlv                  The C-SPOC cl_chlv executable

/usr/es/sbin/cluster/samples/cspoc/cl_chlv.cel     Source code for cl_chlv

Note: This functionality is only available on big volume groups.


cli_chvg

Change the attributes of a volume group on all nodes in a cluster.

Syntax

cli_chvg [ -s Sync { y | n }] [ -L LTGSize ] [ -Q { n | y } ] [ -u ] [ -t [factor ] ] [ -B ] [ -C ] VolumeGroup

Description

Uses C-SPOC to run the chvg command with the given parameters, and make the updated volume group definition known on all cluster nodes.

Flags

Only the following flags from the chvg command are supported:

-B

Changes the volume group to Big VG format. This can accommodate up to 128 physical volumes and 512 logical volumes.

Notes:

1) The -B flag cannot be used if there are any stale physical partitions.

2) The -B flag cannot be used if the volume group is varied on in concurrent mode.

3) There must be enough free partitions available on each physical volume for the VGDA expansion for this operation to be successful.

4) Because the VGDA resides on the edge of the disk and it requires contiguous space for expansion, the free partitions are required on the edge of the disk. If those partitions are allocated for application data, they will be migrated to other free partitions on the same disk. The rest of the physical partitions will be renumbered to reflect the loss of the partitions for VGDA usage. This will change the mappings of the logical to physical partitions in all the PVs of this VG. If you have saved the mappings of the LVs for a potential recovery operation, you should generate the maps again after the completion of the conversion operation. Also, if the backup of the VG is taken with the map option and if you plan to restore using those maps, the restore operation might fail, because the partition number might no longer exist (due to reduction). We recommend that you take a backup before the conversion, and right after the conversion, if the map option is utilized (a small savevg sketch follows these notes).

5) Because the VGDA space has been increased substantially, every VGDA update operation (creating a logical volume, changing a logical volume, adding a physical volume, and so on) might take considerably longer to run.
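One way to take such a backup is with savevg. This is only a sketch, not part of the conversion itself; datavg and the backup file name are hypothetical, and the target file system needs enough free space to hold the image:

# back up the volume group structure and data before converting it to Big VG format
savevg -i -f /tmp/datavg.savevg datavg
# repeat right after the conversion if you rely on MAP files for restores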


-C

Changes the volume group into an Enhanced Concurrent Capable volume group. Changes the volume group varied on in non-concurrent mode to Enhanced Concurrent Capable. This requires that the volume group be re-imported on all other nodes prior to activation in Enhanced Concurrent mode. Changes the volume group varied on in Concurrent mode to an Enhanced Concurrent mode volume group. Only use the -C flag in a configured HACMP cluster.

Enhanced Concurrent volume groups use Group Services. Group Services ships with HACMP and must be configured prior to activating a volume group in this mode.

Use this flag to change a volume group into an Enhanced Concurrent Capable volume group. Notes:

1) Enhanced Concurrent volume groups use Group Services. Group Services ships with HACMP ES and must be configured prior to activating a volume group in this mode.

2) Only Enhanced Concurrent Capable volume groups are supported when running with a 64-bit kernel. Concurrent Capable volume groups are not supported when running with a 64-bit kernel.

-L

For volume groups created on AIX 5.3, the -L flag is ignored. When the volume group is varied on, the logical track group size will be set to the common max transfer size of the disks.

For volume groups created prior to AIX 5.3, the -L flag changes the logical track group size, in number of kilobytes, of the volume group. The value of the LTGSize parameter must be 0, 128, 256, 512, or 1024. In addition, it should be less than or equal to the maximum transfer size of all disks in the volume group. The default size is 128 kilobytes. An LTGSize of 0 will cause the next varyonvg to set the logical track group size to the common max transfer size of the disks.

-Q

Determines if the volume group is automatically varied off after losing its quorum of physical volumes. The default value is yes. The change becomes effective the next time the volume group is activated.

n

The volume group stays active until it loses all of its physical volumes.

y

The volume group is automatically varied off after losing its quorum of physical volumes.


-s Sync

Sets the synchronization characteristics for the volume group specified by the VolumeGroup parameter. Either permits (y) the automatic synchronization of stale partitions or prohibits (n) the automatic synchronization of stale partitions. This flag has no meaning for non-mirrored logical volumes. Automatic synchronization is a recovery mechanism that will only be attempted after the LVM device driver logs LVM_SA_STALEPP in the errpt. A partition that becomes stale through any other path (for example, mklvcopy) will not be automatically resynced. (A short usage sketch follows the value descriptions below.)

y

Attempts to automatically synchronize stale partitions.

n

Prohibits automatic synchronization of stale partitions. This is the default for a volume group.
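For example, to check whether the LVM device driver has logged stale-partition errors, and then to permit automatic synchronization for a shared volume group, something like the following can be used (a sketch only; vg01 is a hypothetical volume group name):

errpt -J LVM_SA_STALEPP      # list error log entries for stale physical partitions
cli_chvg -s y vg01           # permit automatic synchronization of stale partitions in vg01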

-t [factor]

Changes the limit of the number of physical partitions per physical volume, specified by factor. factor should be between 1 and 16 for 32 disk volume groups and 1 and 64 for 128 disk volume groups.

If factor is not supplied, it is set to the lowest value such that the number of physical partitions of the largest disk in volume group is less than factor x 1016.

If factor is specified, the maximum number of physical partitions per physical volume for this volume group changes to factor x 1016. Notes:

1) This option is ignored for Scalable-type volume groups.

2) If the volume group was created in AIX 4.1.2 in violation of 1016 physical partitions per physical volume limit, this flag can be used to convert the marking of partitions.

3) factor cannot be changed if there are any stale physical partitions in the volume group.

4) This flag cannot be used if the volume group is varied on in concurrent mode.

5) The maximum number of physical volumes that can be included in this volume group will be reduced to (MAXPVS/factor).

Note: This flag is not supported for the concurrent capable volume groups.


-u

Unlocks the volume group. This option is provided if the volume group is left in a locked state by abnormal termination of another LVM operation (such as the command core dumping, or the system crashing).

Examples

In general, any operation valid with the chvg command that uses the supported operands above is valid with cli_chvg. For example, to turn quorum off for the volume group vg01, enter:

cli_chvg -Q n vg01

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

It should not be used on rootvg, or any other volume group that otherwise might appear multiple times across the cluster.

The '-f' flag is passed to cl_chvg to suppress unnecessary checking. As a consequence, the operation will proceed even if some nodes are not accessible.

Files

/usr/es/sbin/cluster/cspoc/cli_chvg                The cli_chvg command executable file

/usr/es/sbin/cluster/sbin/cl_chvg                  The C-SPOC cl_chvg executable

/usr/es/sbin/cluster/samples/cspoc/cl_chvg.cel     Source code for cl_chvg

cli_crfs

Create a new file system, and make it known on all nodes in a cluster.

Syntax

cli_crfs -v VfsType { -g VolumeGroup | -d Device } [ -l LogPartitions ] -m MountPoint [ -u MountGroup ] [ -A { yes | no } ] [ -p {ro | rw } ] [ -a Attribute=Value ... ] [ -t { yes | no } ]

Description

Uses C-SPOC to run the crfs command with the given parameters, and make the updated file system definition known on all cluster nodes.

Note: Before using the -u flag, make sure that the volume group is not being used by another LVM command.


Flags

Only the following flags from the crfs command are supported:

-a Attribute=Value

Specifies a virtual file system-dependent attribute/value pair. To specify more than one attribute/value pair, provide multiple -a Attribute=Value parameters

-d Device

Specifies the device name of a device or logical volume on which to make the file system. This is used to create a file system on an already existing logical volume.

-g VolumeGroup

Specifies an existing volume group on which to make the file system. A volume group is a collection of one or more physical volumes.

-l LogPartitions

Specifies the size of the log logical volume, expressed as a number of logical partitions. This flag applies only to JFS and JFS2 file systems that do not already have a log device.

-m MountPoint

Specifies the mount point, which is the directory where the file system will be made available. Note: If you specify a relative path name, it is converted to an absolute path name before being inserted into the /etc/filesystems file.

-p

Sets the permissions for the file system.

ro

Read-only permissions

rw

Read-write permissions

-t

Specifies whether the file system is to be processed by the accounting subsystem:

yes

Accounting is enabled on the file system.

no

Accounting is not enabled on the file system (default value).


-u MountGroup

Specifies the mount group.

-v VfsType

Specifies the virtual file system type.

Note: The agblksize attribute is set at file system creation and cannot be changed after the file system is successfully created. The size attribute defines the minimum file system size, and you cannot decrease it after the file system is created.

Examples

In general, any operation valid with the crfs command that uses the supported operands above is valid with cli_crfs. For example, to create a JFS file system on an existing logical volume lv01, enter:

cli_crfs -v jfs -d lv01 -m /tstvg -a 'size=32768'

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

It should not be used to create a file system in rootvg, or that otherwise might appear multiple times across the cluster.

The '-f' flag is passed to cl_crfs to suppress unnecessary checking. As a consequence, the operation will proceed even if some nodes are not accessible.

Files

/usr/es/sbin/cluster/cspoc/cli_crfs                The cli_crfs command executable file

/usr/es/sbin/cluster/sbin/cl_crfs                  The C-SPOC cl_crfs executable

/usr/es/sbin/cluster/samples/cspoc/cl_crfs.cel     Source code for cl_crfs

cli_crlvfs

Create a new logical volume and file system on it, and make it known on all nodes in a cluster.

Syntax

cli_crlvfs -v VfsType -g VolumeGroup [ -l LogPartitions ] -m MountPoint [ -u MountGroup ] [ -A { yes | no } ] [ -p {ro | rw } ] [ -a Attribute=Value ... ] [ -t { yes | no } ]


Description

Uses C-SPOC to run the crfs command with the given parameters, and make the updated file system definition known on all cluster nodes.

Flags

Only the following flags from the crlvfs command are supported:

-a Attribute=Value

Specifies a virtual file system-dependent attribute/value pair. To specify more than one attribute/value pair, provide multiple -a Attribute=Value parameters.

-g VolumeGroup

Specifies an existing volume group on which to make the file system. A volume group is a collection of one or more physical volumes.

-l LogPartitions

Specifies the size of the log logical volume, expressed as a number of logical partitions. This flag applies only to JFS and JFS2 file systems that do not already have a log device.

-m MountPoint

Specifies the mount point, which is the directory where the file system will be made available. Note: If you specify a relative path name, it is converted to an absolute path name before being inserted into the /etc/filesystems file.

-p

Sets the permissions for the file system.

ro

Read-only permissions

rw

Read-write permissions

-t

Specifies whether the file system is to be processed by the accounting subsystem:

yes

Accounting is enabled on the file system.

no

Accounting is not enabled on the file system (default value).


-u MountGroup

Specifies the mount group.

-v VfsType

Specifies the virtual file system type. Note, the agblksize attribute is set at file system creation and cannot be changed after the file system is successfully created. The size attribute defines the minimum file system size, and you cannot decrease it after the file system is created.

Examples

In general, any operation valid with the crfs command that uses the supported operands above is valid with cli_crlvfs. For example, to create a JFS file system on volume group vg01, enter:

cli_crlvfs -v jfs -g vg01 -m /tstvg -a 'size=32768'

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

It should not be used to create a file system in rootvg, or that otherwise might appear multiple times across the cluster.

The '-f' flag is passed to cl_crlvfs to suppress unnecessary checking. As a consequence, the operation will proceed even if some nodes are not accessible.

Files

/usr/es/sbin/cluster/cspoc/cli_crlvfs The cli_crlvfs command executable file

/usr/es/sbin/cluster/sbin/cl_crlvfs The C-SPOC cl_crlvfs executable

/usr/es/sbin/cluster/samples/cspoc/cl_crlvfs.cel Source code for cl_crlvfs

cli_extendlv

Increases the size of a logical volume on all nodes in a cluster by adding unallocated physical partitions from within the volume group.

Syntax

cli_extendlv [ -a Position ] [ -e Range ] [ -u Upperbound ] [ -s Strict ] LogicalVolume Partitions [ PhysicalVolume ... ]


Description

Uses C-SPOC to run the extendlv command with the given parameters, and make the updated logical volume definition known on all cluster nodes.

Flags

Only the following flags from the extendlv command are supported:

-a Position

Sets the intra-physical volume allocation policy (the position of the logical partitions on the physical volume). The Position variable can be one of the following values:

m

Allocates logical partitions in the outer middle section of each physical volume. This is the default position.

c

Allocates logical partitions in the center section of each physical volume.

e

Allocates logical partitions in the outer edge section of each physical volume.

ie

Allocates logical partitions in the inner edge section of each physical volume.

im

Allocates logical partitions in the inner middle section of each physical volume.

-e Range

Sets the inter-physical volume allocation policy (the number of physical volumes to extend across, using the volumes that provide the best allocation). The value of the Range variable is limited by the Upperbound variable (set with the -u flag) and can be one of the following values:

x

Allocates logical partitions across the maximum number of physical volumes.

m

Allocates logical partitions across the minimum number of physical volumes.


-s Strict

Determines the strict allocation policy. Copies of a logical partition can be allocated to share or not to share the same physical volume. The Strict variable is represented by one of the following values:

y

Sets a strict allocation policy, so copies for a logical partition cannot share the same physical volume.

n

Does not set a strict allocation policy, so copies for a logical partition can share the same physical volume.

s

Sets a super strict allocation policy, so that the partitions allocated for one mirror cannot share a physical volume with the partitions from another mirror.

Note: When changing a non superstrict logical volume to a superstrict logical volume you must specify physical volumes or use the -u flag.

-u Upperbound

Sets the maximum number of physical volumes for new allocation. The value of the Upperbound variable should be between one and the total number of physical volumes. When using super strictness, the upper bound indicates the maximum number of physical volumes allowed for each mirror copy. When using striped logical volumes, the upper bound must be multiple of Stripe_width.

Examples

In general, any operation valid with the extendlv command that uses the supported operands above is valid with cli_extendlv. For example, to increase the size of the logical volume lv01 by three logical partitions, enter:

cli_extendlv lv01 3

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

It should not be used on any logical volume in rootvg, or that otherwise might be duplicated across the cluster.

The '-f' flag is passed to cl_extendlv to suppress unnecessary checking. As a consequence, the operation will proceed even if some nodes are not accessible.


Files

/usr/es/sbin/cluster/cspoc/cli_extendlv The cli_extendlv command executable file

/usr/es/sbin/cluster/sbin/cl_extendlv The C-SPOC cl_extendlv executable

/usr/es/sbin/cluster/samples/cspoc/cl_extendlv.cel Source code for cl_extendlv

cli_extendvg

Adds physical volumes to a volume group on all nodes in a cluster.

Syntax

cli_extendvg VolumeGroup PhysicalVolume ...

Description

Uses C-SPOC to run the extendvg command with the given parameters, and make the updated volume group definition known on all cluster nodes.

Examples

In general, any operation valid with the extendvg command that uses the supported operands above is valid with cli_extendvg. For example, to add disks hdisk101 and hdisk111 to volume group vg01, enter:

cli_extendvg vg01 hdisk101 hdisk111

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

It should not be used on rootvg, or any other volume group that might appear multiple times across the cluster.

The '-f' flag is passed to cl_extendvg to suppress unnecessary checking. As a consequence, the operation will proceed even if some nodes are not accessible.

Files

/usr/es/sbin/cluster/cspoc/cli_extendvg The cli_extendvg command executable file

/usr/es/sbin/cluster/sbin/cl_extendvg The C-SPOC cl_extendvg executable

/usr/es/sbin/cluster/samples/cspoc/cl_extendvg.cel Source code for cl_extendvg


cli_importvg

Imports a new volume group definition from a set of physical volumes on all nodes in a cluster.

Syntax

cli_importvg [ -y VolumeGroup ] [ -V MajorNumber ] PhysicalVolume

Description

Uses C-SPOC to run the importvg command, which causes LVM on each cluster node to read the LVM information on the disks in the volume group, and update the local volume group definition.

Flags

-V MajorNumber

Specifies the major number of the imported volume group.

-y VolumeGroup

Specifies the name to use for the new volume group. If this flag is not used, the system automatically generates a new name.

The volume group name can only contain the following characters: "A" through "Z," "a" through "z," "0" through "9," or "_" (the underscore), "-" (the minus sign), or "." (the period). All other characters are considered invalid.

Examples

In general, any operation valid with the importvg command that uses the supported operands above is valid with cli_importvg. For example, to make the volume group bkvg from physical volume hdisk07 known on all cluster nodes, enter:

cli_importvg -y bkvg hdisk07
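
It is often desirable to keep volume group major numbers consistent across nodes, particularly when the volume group contains NFS-exported file systems. As a sketch (the major number 55 and the names appvg and hdisk8 are hypothetical), to import the volume group with an explicit name and major number, enter:

cli_importvg -V 55 -y appvg hdisk8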

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

It should not be used on rootvg, or any other volume group that otherwise might appear multiple times across the cluster.

The '-f' flag is passed to cl_importvg to suppress unnecessary checking. As a consequence, the operation will proceed even if some nodes are not accessible.


Files

/usr/es/sbin/cluster/cspoc/cli_importvg The cli_importvg command executable file

/usr/es/sbin/cluster/sbin/cl_importvg The C-SPOC cl_importvg executable

/usr/es/sbin/cluster/samples/cspoc/cl_importvg.cel Source code for cl_importvg

cli_mirrorvg

Mirrors all the logical volumes that exist on a given volume group on all nodes in a cluster.

Syntax

cli_mirrorvg [-S | -s] [-Q] [-c Copies] [-m] VolumeGroup [PhysicalVolume...]

Description

Uses C-SPOC to run the mirrorvg command with the given parameters, and make the updated volume group definition known on all cluster nodes.

Flags

Only the following flags from the mirrorvg command are supported:

-c Copies

Specifies the minimum number of copies that each logical volume must have after the mirrorvg command has finished running. It might be possible, through the independent use of mklvcopy, that some logical volumes have more than the minimum number specified after the mirrorvg command has run. The minimum value is 2 and the maximum is 3. A value of 1 is ignored.

-m exact map

Allows mirroring of logical volumes in the exact physical partition order that the original copy is ordered. This option requires you to specify the physical volumes where the exact map copy should be placed. If the space is insufficient for an exact mapping, the command fails. You should add new drives or pick a different set of drives that will satisfy an exact logical volume mapping of the entire volume group. The designated disks must be equal to or exceed the size of the drives that are to be exactly mirrored, regardless of whether the entire disk is used. Also, if any logical volume to be mirrored is already mirrored, this command fails.


-Q Quorum Keep

By default in mirrorvg, when a volume group's contents become mirrored, volume group quorum is disabled. If you want to keep the volume group quorum requirement after mirroring is complete, use this option in the command. For later quorum changes, refer to the chvg command.

-S Background Sync

Returns the mirrorvg command immediately and starts a background syncvg of the volume group. With this option, it is not obvious when the mirrors have completely finished their synchronization. However, as portions of the mirrors become synchronized, they are immediately used by LVM for mirroring.

-s Disable Sync

Returns the mirrorvg command immediately without performing any type of mirror synchronization. If this option is used, the mirror might exist for a logical volume but is not used by the operating system until it has been synchronized with the syncvg command.

Examples

In general, any operation valid with the mirrorvg command that uses the supported operands above is valid with cli_mirrorvg. For example, to specify two copies for every logical volume in shared volume group vg01, enter:

cli_mirrorvg -c 2 vg01
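
If you do not want to wait for mirror synchronization to complete in the foreground, the -S flag described above starts the synchronization in the background. A sketch using the same hypothetical volume group:

cli_mirrorvg -S -c 2 vg01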

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

It should not be used on rootvg, or any other volume group that otherwise might appear multiple times across the cluster.

The '-f' flag is passed to cl_mirrorvg to suppress unnecessary checking. As a consequence, the operation will proceed even if some nodes are not accessible.

Files

/usr/es/sbin/cluster/cspoc/cli_mirrorvg The cli_mirrorvg command executable file

/usr/es/sbin/cluster/sbin/cl_mirrorvg The C-SPOC cl_mirrorvg executable

/usr/es/sbin/cluster/samples/cspoc/cl_mirrorvg.cel Source code for cl_mirrorvg


cli_mklv

Creates a new logical volume on all nodes in a cluster.

Syntax

cli_mklv [ -a Position ] [ -b BadBlocks ] [ -c Copies ] [ -d Schedule ] [ -e Range ] [ -i ] [ -L Label ] [ -o y / n ] [ -r Relocate ] [ -s Strict ] [ -t Type ] [ -u UpperBound ] [ -v Verify ] [ -w MirrorWriteConsistency ] [ -x Maximum ] [ -y NewLogicalVolume | -Y Prefix ] [ -S StripSize ] [ -U Userid ] [ -G Groupid ] [ -P Modes ] VolumeGroup NumberOfLPs [ PhysicalVolume ... ]

Description

Uses C-SPOC to run the mklv command with the given parameters, and make the new logical volume definition known on all cluster nodes.

Flags

Only the following flags from the mklv command are supported:

-a Position

Sets the intra-physical volume allocation policy (the position of the logical partitions on the physical volume). The Position variable can be one of the following values:

m

Allocates logical partitions in the outer middle section of each physical volume. This is the default position.

c

Allocates logical partitions in the center section of each physical volume.

e

Allocates logical partitions in the outer edge section of each physical volume.

ie

Allocates logical partitions in the inner edge section of each physical volume.

im

Allocates logical partitions in the inner middle section of each physical volume.


-b BadBlocks

Sets the bad-block relocation policy. The Relocation variable can be one of the following values:

y

Causes bad-block relocation to occur. This is the default.

n

Prevents bad-block relocation from occurring.

-c Copies

Sets the number of physical partitions allocated for each logical partition. The Copies variable can be set to a value from 1 to 3; the default is 1.

-d Schedule

Sets the scheduling policy when more than one logical partition is written. Must use parallel or sequential to mirror a striped lv. The Schedule variable is represented by one of the following values:

p

Establishes a parallel scheduling policy.

ps

Parallel write with sequential read policy. All mirrors are written in parallel but always read from the first mirror if the first mirror is available.

pr

Parallel write round robin read. This policy is similar to the parallel policy except an attempt is made to spread the reads to the logical volume more evenly across all mirrors.

s

Establishes a sequential scheduling policy. When specifying policy of parallel or sequential strictness, set to s for super strictness.

-e Range

Sets the inter-physical volume allocation policy (the number of physical volumes to extend across, using the volumes that provide the best allocation). The value of the Range variable is limited by the Upperbound variable (set with the -u flag) and can be one of the following values:

x

Allocates logical partitions across the maximum number of physical volumes.


m

Allocates logical partitions across the minimum number of physical volumes.

-G Groupid

Specifies group ID for the logical volume special file.

-i

Reads the PhysicalVolume parameter from standard input. Use the -i flag only when PhysicalVolume is entered through standard input.

-L Label

Sets the logical volume label. The default label is None. The maximum size of the label field is 127 characters. Note: If the logical volume is going to be used as a journaled file system (JFS), then the JFS will use this field to store the mount point of the file system on that logical volume for future reference.

-P Modes

Specifies permissions (file modes) for the logical volume special file.

-r Relocate

Sets the reorganization relocation flag. For striped logical volumes, the Relocate parameter must be set to n (the default for striped logical volumes). The Relocate parameter can be one of the following values:

y

Allows the logical volume to be relocated during reorganization. This is the default for relocation.

n

Prevents the logical volume from being relocated during reorganization.

-s Strict

Determines the strict allocation policy. Copies of a logical partition can be allocated to share or not to share the same physical volume. The Strict parameter is represented by one of the following values:

y

Sets a strict allocation policy, so copies for a logical partition cannot share the same physical volume. This is the default for allocation policy.

n

Does not set a strict allocation policy, so copies for a logical partition can share the same physical volume.


s

Sets a super strict allocation policy, so that the partitions allocated for one mirror cannot share a physical volume with the partitions from another mirror.

-S StripSize

Specifies the number of bytes per strip (the strip size multiplied by the number of disks in an array equals the stripe size). Valid values include 4K, 8K, 16K, 32K, 64K, 128K, 256K, 512K, 1M, 2M, 4M, 8M, 16M, 32M, 64M, and 128M. Note: The -d, -e, and -s flags are not valid when creating a striped logical volume using the -S flag.

-t Type

Sets the logical volume type. The standard types are jfs (journaled file systems), jfslog (journaled file system logs), jfs2 (enhanced journaled file system), jfs2log (enhanced journaled file system logs), and paging (paging spaces), but a user can define other logical volume types with this flag. You cannot create a striped logical volume of type boot. The default is jfs. If a logical volume of type jfslog or jfs2log is created, C-SPOC will automatically run the logform command so that it can be used.

-U Userid

Specifies user ID for logical volume special file.

-u UpperBound

Sets the maximum number of physical volumes for new allocation. The value of the Upperbound variable should be between one and the total number of physical volumes. When using super strictness, the upper bound indicates the maximum number of physical volumes allowed for each mirror copy. When using striped logical volumes, the upper bound must be a multiple of Stripe_width. If the upper bound is not specified, it is assumed to be Stripe_width for striped logical volumes.

-v Verify

Sets the write-verify state for the logical volume. Causes (y) all writes to the logical volume to either be verified with a follow-up read, or prevents (n) the verification of all writes to the logical volume. The Verify parameter is represented by one of the following values:

n

Prevents the verification of all write operations to the logical volume. This is the default for the -v flag.

y

Causes the verification of all write operations to the logical volume.


-w MirrorWriteConsistency

y or a

Turns on active mirror write consistency that ensures data consistency among mirrored copies of a logical volume during typical I/O processing.

p

Turns on passive mirror write consistency that ensures data consistency among mirrored copies during volume group synchronization after a system interruption. Note: This functionality is only available on Big Volume Groups.

n

No mirror write consistency. See the -f flag of the syncvg command.

-x Maximum

Sets the maximum number of logical partitions that can be allocated to the logical volume. The default value is 512. The number represented by the Number parameter must be equal to or less than the number represented by the Maximum variable. The maximum number of logical partitions per logical volume is 32,512.

-y NewLogicalVolume

Specifies the logical volume name to use instead of a system-generated name. Logical volume names must be unique system wide and can range from 1 to 15 characters. The new name should be unique across all nodes on which the volume group is defined. The name cannot begin with a prefix already defined in the PdDv class in the Device Configuration Database for other devices.

-Y Prefix

Specifies the Prefix to use instead of the prefix in a system-generated name for the new logical volume. The prefix must be less than or equal to 13 characters. The name cannot begin with a prefix already defined in the PdDv class in the Device Configuration Database for other devices, nor be a name already used by another device.

Examples

In general, any operation valid with the mklv command that uses the supported operands above is valid with cli_mklv. For example, to make a logical volume in volume group vg01 with one logical partition and a total of two copies of the data, enter:

cli_mklv -c 2 vg01 1


In general, it is preferred to specify the name when creating a new logical volume. To create a new logical volume called shared_lv1 of 10 logical partitions inside volume group called data_vg, enter:

cli_mklv -y shared_lv1 data_vg 10
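
Because C-SPOC runs the logform command automatically for logical volumes of type jfslog or jfs2log (see the -t flag above), a shared JFS2 log can be created in a single step. In this sketch the names loglv_data and data_vg are hypothetical:

cli_mklv -y loglv_data -t jfs2log data_vg 1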

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

It should not be used on rootvg, or any other volume group that might appear multiple times across the cluster.

The '-f' flag is passed to cl_mklv to suppress unnecessary checking. As a consequence, the operation will proceed even if some nodes are not accessible.

Files

/usr/es/sbin/cluster/cspoc/cli_mklv The cli_mklv command executable file

/usr/es/sbin/cluster/sbin/cl_mklv The C-SPOC cl_mklv executable

/usr/es/sbin/cluster/samples/cspoc/cl_mklv.cel Source code for cl_mklv

cli_mklvcopy

Increases the number of copies of each logical partition in a logical volume on all nodes in a cluster.

Syntax

cli_mklvcopy [ -a Position ] [ -e Range ] [ -k ] [ -s Strict ] [ -u UpperBound ] LogicalVolume Copies [ PhysicalVolume... ]

Description

Uses C-SPOC to run the mklvcopy command with the given parameters, and make the updated logical volume definition known on all cluster nodes.

Flags

Only the following flags from the mklvcopy command are supported:

-a Position

Sets the intra-physical volume allocation policy (the position of the logical partitions on the physical volume). The Position variable can be one of the following values:

m

Allocates logical partitions in the outer middle section of each physical volume. This is the default position.


c

Allocates logical partitions in the center section of each physical volume.

e

Allocates logical partitions in the outer edge section of each physical volume.

ie

Allocates logical partitions in the inner edge section of each physical volume.

im

Allocates logical partitions in the inner middle section of each physical volume.

-e Range

Sets the inter-physical volume allocation policy (the number of physical volumes to extend across, using the volumes that provide the best allocation). The value of the Range variable is limited by the Upperbound variable (set with the -u flag) and can be one of the following values:

x

Allocates logical partitions across the maximum number of physical volumes.

m

Allocates logical partitions across the minimum number of physical volumes.

-k

Synchronizes data in the new partitions.

-s Strict

Determines the strict allocation policy. Copies of a logical partition can be allocated to share or not to share the same physical volume. The Strict parameter is represented by one of the following values:

y

Sets a strict allocation policy, so copies for a logical partition cannot share the same physical volume. This is the default for allocation policy.

n

Does not set a strict allocation policy, so copies for a logical partition can share the same physical volume.


s

Sets a super strict allocation policy, so that the partitions allocated for one mirror cannot share a physical volume with the partitions from another mirror.

Note: When changing a non-super strict logical volume to a super strict logical volume, you must specify physical volumes or use the -u flag.

-u UpperBound

Sets the maximum number of physical volumes for new allocation. The value of the Upperbound variable should be between one and the total number of physical volumes. When using super strictness, the upper bound indicates the maximum number of physical volumes allowed for each mirror copy. When using striped logical volumes, the upper bound must be a multiple of Stripe_width.

Examples

In general, any operation valid with the mklvcopy command that uses the supported operands above is valid with cli_mklvcopy. For example, to add physical partitions to the logical partitions in logical volume lv01, so that a total of three copies exist for each logical partition, enter:

cli_mklvcopy lv01 3
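
The optional physical volume list restricts where the new copies are placed, and the -k flag synchronizes the new partitions immediately. As a sketch with a hypothetical target disk, to add a second copy of lv01 on hdisk5 and synchronize it, enter:

cli_mklvcopy -k lv01 2 hdisk5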

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

It should not be used on any logical volume in rootvg, or on any logical volume that otherwise might be duplicated across the cluster.

The '-f' flag is passed to cl_mklvcopy to suppress unnecessary checking. As a consequence, the operation will proceed even if some nodes are not accessible.

Files

/usr/es/sbin/cluster/cspoc/cli_mklvcopy The cli_mklvcopy command executable file

/usr/es/sbin/cluster/sbin/cl_mklvcopy The C-SPOC cl_mklvcopy executable

/usr/es/sbin/cluster/samples/cspoc/cl_mklvcopy.cel Source code for cl_mklvcopy

cli_mkvg

Creates a new volume group on all nodes in a cluster.


Syntax

cli_mkvg [ -B ] [ -t factor ] [ -C ] [ -G ] [ -x ] [ -s Size ] [ -V MajorNumber ] [ -v LogicalVolumes ] [ -y VolumeGroup ] PhysicalVolume ...

Description

Uses C-SPOC to run the mkvg command with the given parameters, and make the new volume group definition known on all cluster nodes.

Flags

Only the following flags from the mkvg command are supported:

-B

Creates a big-type volume group. This can accommodate up to 128 physical volumes and 512 logical volumes. Note that because the VGDA space has been increased substantially, every VGDA update operation (creating a logical volume, changing a logical volume, adding a physical volume, and so on) might take considerably longer to run.

-C

Creates an enhanced concurrent capable volume group. Only use the -C flag in a configured HACMP cluster.

Notes:

1) Enhanced concurrent volume groups use group services. Group services ships with HACMP and must be configured prior to activating a volume group in this mode.

2) Only enhanced concurrent capable volume groups are supported when running with a 64-bit kernel. Concurrent capable volume groups are not supported when running with a 64-bit kernel.

-G

Same as the -B flag.

-p partitions

Total number of partitions in the volume group, where the partitions variable is represented in units of 1024 partitions. Valid values are 32, 64, 128, 256, 512, 768, 1024, and 2048. The default is 32 K (32768 partitions). The chvg command can be used to increase the number of partitions up to the maximum of 2048 K (2097152 partitions). This option is only valid with the -s option.


-s Size

Sets the number of megabytes in each physical partition, where the Size variable is expressed in units of megabytes from 1 (1 MB) through 131072 (128 GB). The Size variable must be equal to a power of 2 (for example: 1, 2, 4, 8). The default value for 32 and 128 PV volume groups will be the lowest value to remain within the limitation of 1016 physical partitions per PV. The default value for scalable volume groups will be the lowest value to accommodate 2040 physical partitions per PV.

-t factor

Changes the limit of the number of physical partitions per physical volume, specified by factor. The factor should be between 1 and 16 for 32 PV volume groups, and between 1 and 64 for 128 PV volume groups. The maximum number of physical partitions per physical volume for this volume group changes to factor x 1016. The default will be the lowest value to remain within the physical partition limit of factor x 1016. The maximum number of PVs that can be included in the volume group is MaxPVs/factor. The -t option is ignored with the -s option.

-V MajorNumber

Specifies the major number of the volume group that is created.

-v LogicalVolumes

Number of logical volumes that can be created. Valid values are 256, 512, 1024, 2048, and 4096. The default is 256. The chvg command can be used to increase the number of logical volumes up to the maximum of 4096. This option is only valid with the -s option. The last logical volume is reserved for metadata.

-y VolumeGroup

Specifies the volume group name rather than having the name generated automatically. Volume group names must be unique system wide and can range from 1 to 15 characters. The name cannot begin with a prefix already defined in the PdDv class in the Device Configuration Database for other devices. The volume group name created is sent to standard output.

The volume group name can only contain the following characters: "A" through "Z," "a" through "z," "0" through "9," or "_" (the underscore), "-" (the minus sign), or "." (the period). All other characters are considered invalid.


Examples

In general, any operation valid with the mkvg command that uses the supported operands above is valid with cli_mkvg. For example, to create a volume group that contains hdisk3, hdisk5 and hdisk6 with a physical partition size set to 1 megabyte, enter:

cli_mkvg -s 1 hdisk3 hdisk5 hdisk6
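
In practice you would normally also name the volume group and, in a PowerHA cluster, typically create it as enhanced concurrent capable with a major number that is free on all nodes. In this sketch the name shared_vg and the major number 60 are hypothetical:

cli_mkvg -y shared_vg -C -V 60 hdisk3 hdisk5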

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

The '-f' flag is passed to cl_mkvg to suppress unnecessary checking. As a consequence, the operation will proceed even if some nodes are not accessible.

Files

/usr/es/sbin/cluster/cspoc/cli_mkvg The cli_mkvg command executable file

/usr/es/sbin/cluster/sbin/cl_mkvg The C-SPOC cl_mkvg executable

/usr/es/sbin/cluster/samples/cspoc/cl_mkvg.cel Source code for cl_mkvg

cli_on_cluster

Runs an arbitrary command on all nodes in the cluster.

Syntax

cli_on_cluster [ -S | -P ] 'command string'

Description

Runs a given command as root on all cluster nodes, either serially or in parallel. Any output from the command (stdout or stderr) is sent to the terminal. Each line of output is preceded by the node name followed by a colon (":").

Flags

-S

Runs the command on each node in the cluster in turn, waiting for completion before going on to the next node.

-P

Runs the command in parallel on all nodes in the cluster simultaneously.


Examples

To reboot every node in the cluster, enter:

cli_on_cluster -S 'shutdown -Fr'

To get the date from each node in the cluster, enter:

cli_on_cluster -P date

The following results are returned:

jordan: Tue May 5 18:12:30 CDT 2009
jessica: Tue May 5 18:12:29 CDT 2009
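
The command is also convenient for quick consistency checks across the cluster. As a sketch (the fileset name is only an example), to compare the installed PowerHA server fileset level on all nodes in parallel, enter:

cli_on_cluster -P 'lslpp -L cluster.es.server.rte'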

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

Files

/usr/es/sbin/cluster/cspoc/cli_on_cluster The cli_on_cluster command executable file

/usr/es/sbin/cluster/sbin/cl_on_cluster The C-SPOC cl_on_cluster executable

/usr/es/sbin/cluster/samples/cspoc/cl_on_cluster.cel Source code for cl_on_cluster

cli_on_node

Runs an arbitrary command on a specific node in the cluster.

Syntax

cli_on_node [ -V <volume group> | -R <resource group> | -N <node> ] 'command string'

Description

Runs a given command as root on either an explicitly specified node, or on the cluster node that owns a specified volume group or resource group. Any output from the command (stdout and stderr) is sent to the terminal.

Flags

One and only one of the following flags can be specified.

-V volume group

Run the command on the node on which the given volume group is varied on. If the volume group is varied on in concurrent mode on multiple nodes, the command will be run on all those nodes.


-R resource group

Run the command on the node that currently owns the given resource group.

-N node

Run the command on the given node. This is the HACMP node name.

Examples

To run the ps -efk command on the node named oyster, enter:

cli_on_node -N oyster 'ps -efk'

To get a list of adapters from node “jordan” in the cluster, enter:

cli_on_node -N jordan lsdev -Cc adapter

The following results are returned:

ent0 Available Virtual I/O Ethernet Adapter (l-lan)
ent1 Available Virtual I/O Ethernet Adapter (l-lan)
ent2 Available Virtual I/O Ethernet Adapter (l-lan)
fcs0 Available 01-08 FC Adapter
vsa0 Available LPAR Virtual Serial Adapter
vscsi0 Available Virtual SCSI Client Adapter
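
The -R flag is useful when you do not know, or do not care, which node currently hosts an application. As a sketch with a hypothetical resource group and file system, to check free space on whichever node owns resource group app1_rg, enter:

cli_on_node -R app1_rg 'df -g /appdata'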

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

Files

/usr/es/sbin/cluster/cspoc/cli_on_node The cli_on_node command executable file

/usr/es/sbin/cluster/sbin/cl_on_node The C-SPOC cl_on_node executable

/usr/es/sbin/cluster/samples/cspoc/cl_on_node.cel Source code for cl_on_node

cli_reducevg

Removes a physical volume from a volume group, and makes the change known on all cluster nodes. When all physical volumes are removed from the volume group, the volume group is deleted on all cluster nodes.


Syntax

cli_reducevg VolumeGroup PhysicalVolume ...

Description

Uses C-SPOC to run the reducevg command with the given parameters, and make the updated volume group definition known on all cluster nodes.

Examples

In general, any operation valid with the reducevg command that uses the supported operands above is valid with cli_reducevg. For example, to remove physical disk hdisk10 from volume group vg01, enter:

cli_reducevg vg01 hdisk10

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

It should not be used on any physical volume in rootvg, or in any other volume group that otherwise might appear multiple times across the cluster.

The '-f' flag is passed to cl_reducevg to suppress unnecessary checking and provide automatic confirmation of physical disk removal. As a consequence, the operation will proceed even if some nodes are not accessible.

Files

/usr/es/sbin/cluster/cspoc/cli_reducevg The cli_reducevg command executable file

/usr/es/sbin/cluster/sbin/cl_reducevg The C-SPOC cl_reducevg executable

/usr/es/sbin/cluster/samples/cspoc/cl_reducevg.cel Source code for cl_reducevg

cli_replacepv

Replaces a physical volume in a volume group with another, and makes the change known on all cluster nodes.

Syntax

cli_replacepv SourcePhysicalVolume DestinationPhysicalVolume

Description

Uses C-SPOC to run the replacepv command with the given parameters, and make the updated volume group definition known on all cluster nodes.


Examples

In general, any operation valid with the replacepv command that uses the supported operands above is valid with cli_replacepv. For example, to replace hdisk10 with hdisk20 in the volume group that owns hdisk10, enter:

cli_replacepv hdisk10 hdisk20

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

It should not be used on any physical disk in rootvg, or in any other volume group that otherwise might appear multiple times across the cluster.

The '-f' flag is passed to cl_diskreplace to suppress unnecessary checking. As a consequence, the operation will proceed even if some nodes are not accessible.

Files

/usr/es/sbin/cluster/cspoc/cli_replacepv The cli_replacepv command executable file

/usr/es/sbin/cluster/sbin/cl_diskreplace The C-SPOC cl_replacepv executable

/usr/es/sbin/cluster/samples/cspoc/cl_diskreplace.cel Source code for cl_replacepv

cli_rmfs

Removes a file system from all nodes in a cluster.

Syntax

cli_rmfs [ -r ] FileSystem

Description

Uses C-SPOC to run the rmfs command with the given parameters, and remove the file system definition from all cluster nodes.

Flags

Only the following flags from the rmfs command are supported:

-r

Removes the mount point of the file system


Examples

In general, any operation valid with the rmfs command that uses the supported operands above is valid with cli_rmfs. For example, to remove the shared file system /test, enter:

cli_rmfs -r /test

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

It should not be used on file systems in rootvg, or on file systems that otherwise might appear multiple times across the cluster.

The '-f' flag is passed to cl_rmfs to suppress unnecessary checking. As a consequence, the operation will proceed even if some nodes are not accessible.

Files

/usr/es/sbin/cluster/cspoc/cli_rmfs The cli_rmfs command executable file

/usr/es/sbin/cluster/sbin/cl_rmfs The C-SPOC cl_rmfs executable

/usr/es/sbin/cluster/samples/cspoc/cl_rmfs.cel Source code for cl_rmfs

cli_rmlv

Removes a logical volume from all nodes in a cluster.

Syntax

cli_rmlv LogicalVolume ...

Description

Uses C-SPOC to run the rmlv command with the given parameters, and make the updated logical volume definition known on all cluster nodes.

Examples

In general, any operation valid with the rmlv command that uses the supported operands above is valid with cli_rmlv. For example, to remove the shared logical volume lv01, enter:

cli_rmlv lv01


Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

It should not be used on any logical volume in rootvg, or on any logical volume that otherwise might be duplicated across the cluster.

The '-f' flag is passed to cl_rmlv to suppress unnecessary checking. As a consequence, the operation will proceed even if some nodes are not accessible.

Files

/usr/es/sbin/cluster/cspoc/cli_rmlv The cli_rmlv command executable file

/usr/es/sbin/cluster/sbin/cl_rmlv The C-SPOC cl_rmlv executable

/usr/es/sbin/cluster/samples/cspoc/cl_rmlv.cel Source code for cl_rmlv

cli_rmlvcopy

Removes copies from a logical volume on all nodes in a cluster.

Syntax

cli_rmlvcopy LogicalVolume Copies [ PhysicalVolume... ]

Description

Uses C-SPOC to run the rmlvcopy command with the given parameters, and make the updated logical volume definition known on all cluster nodes.

Examples

In general, any operation valid with the rmlvcopy command that uses the supported operands above is valid with cli_rmlvcopy. For example, to reduce the number of copies of each logical partition belonging to logical volume lv01 so that each has only a single copy, enter:

cli_rmlvcopy lv01 1

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

It should not be used on any logical volume in rootvg, or on any logical volume that otherwise might appear multiple times across the cluster.

The '-f' flag is passed to cl_rmlvcopy to suppress unnecessary checking. As a consequence, the operation will proceed even if some nodes are not accessible.


Files

/usr/es/sbin/cluster/cspoc/cli_rmlvcopy The cli_rmlvcopy command executable file

/usr/es/sbin/cluster/sbin/cl_rmlvcopy The C-SPOC cl_rmlvcopy executable

/usr/es/sbin/cluster/samples/cspoc/cl_rmlvcopy.cel Source code for cl_rmlvcopy

cli_syncvg

Runs the syncvg command with the given parameters, and makes the updated volume group definition known on all cluster nodes.

Syntax

cli_syncvg [-f] [-H] [-P NumParallelLps] {-l|-v} Name

Description

Uses C-SPOC to run the syncvg command with the given parameters, which synchronizes the logical volume copies that are not current (stale) in the specified logical volume or volume group.

Flags

Only the following flags from the syncvg command are supported:

-f

Specifies that a good physical copy is chosen and propagated to all other copies of the logical partition, whether or not they are stale.

-H

Postpones writes for this volume group on any other cluster nodes where the concurrent volume group is active, until this sync operation is complete. When using the -H flag, the -P flag does not require that all the nodes on the cluster support the -P flag. This flag is ignored if the volume group is not varied on in concurrent mode.

-l

Specifies that the Name parameter represents a logical volume device name.

-P NumParallelLps

Numbers of logical partitions to be synchronized in parallel. The valid range for NumParallelLps is 1 to 32. NumParallelLps must be tailored to the machine, disks in the volume group, system resources, and volume group mode.


-v

Specifies that the Name parameter represents a volume group device name.

Examples

In general, any operation valid with the syncvg command that uses the supported operands above is valid with cli_syncvg. For example, to synchronize the copies on volume group vg01, enter:

cli_syncvg -v vg01
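
The -l and -P flags described above can be combined in the same way as with syncvg. As a sketch, to synchronize the stale partitions of logical volume lv01, four logical partitions at a time, enter:

cli_syncvg -P 4 -l lv01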

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

It should not be used on rootvg, or any other volume group that otherwise might appear multiple times across the cluster.

The '-f' flag is passed to cl_syncvg to suppress unnecessary checking. As a consequence, the operation will proceed even if some nodes are not accessible.

Files

/usr/es/sbin/cluster/cspoc/cli_syncvg The cli_syncvg command executable file

/usr/es/sbin/cluster/sbin/cl_syncvg The C-SPOC cl_syncvg executable

/usr/es/sbin/cluster/samples/cspoc/cl_syncvg.cel Source code for cl_syncvg

cli_unmirrorvg

Unmirrors a volume group on all nodes in a cluster.

Syntax

cli_unmirrorvg [ -c Copies ] VolumeGroup [ PhysicalVolume ... ]

Description

Uses C-SPOC to run the unmirrorvg command with the given parameters, and make the updated volume group definition known on all cluster nodes.


Flags

Only the following flags from the unmirrorvg command are supported:

-c Copies

Specifies the minimum number of copies that each logical volume must have after the unmirrorvg command has finished running. If you do not want all logical volumes to have the same number of copies, then reduce the mirrors manually with the rmlvcopy command. If this option is not used, the copies will default to 1.

Examples

In general, any operation valid with the unmirrorvg command that uses the supported operands above is valid with cli_unmirrorvg. For example, to specify only a single copy for shared volume group vg01, enter:

cli_unmirrorvg -c 1 vg01
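
As with unmirrorvg, an optional list of physical volumes limits the operation to the copies that reside on those disks. As a sketch with a hypothetical disk name, to remove only the mirror copies of vg01 that are located on hdisk5, enter:

cli_unmirrorvg vg01 hdisk5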

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

It should not be used on rootvg, or any other volume group that otherwise might appear multiple times across the cluster.

The '-f' flag is passed to cl_unmirrorvg to suppress unnecessary checking. As a consequence, the operation will proceed even if some nodes are not accessible.

Files

/usr/es/sbin/cluster/cspoc/cli_unmirrorvg The cli_unmirrorvg command executable file

/usr/es/sbin/cluster/sbin/cl_unmirrorvg The C-SPOC cl_unmirrorvg executable

/usr/es/sbin/cluster/samples/cspoc/cl_unmirrorvg.cel Source code for cl_unmirrorvg

cli_updatevg

Updates the definition of a volume group on all cluster nodes to match the current actual state of the volume group.

Syntax

cli_updatevg VolumeGroup


Description

Uses C-SPOC to run the updatevg command, which causes LVM on each cluster node to read the LVM information on the disks in the volume group, and update the local volume group definition.

Examples

To update the volume group definition for volume group vg11 on all cluster nodes, enter:

cli_updatevg vg11

Implementation specifics

This command is part of High Availability Cluster Multi-Processing for AIX (HACMP for AIX). It must be run as root, on a node in an HACMP cluster.

It should not be used on rootvg, or any other volume group that otherwise might appear multiple times across the cluster.

The '-f' flag is passed to cl_updatevg to suppress unnecessary checking. As a consequence, the operation will proceed even if some nodes are not accessible.

Files

/usr/es/sbin/cluster/cspoc/cli_updatevg The cli_updatevg command executable file

/usr/es/sbin/cluster/sbin/cl_updatevg The C-SPOC cl_updatevg executable

/usr/es/sbin/cluster/samples/cspoc/cl_updatevg.cel Source code for cl_updatevg

7.5 Time synchronization

PowerHA does not perform time synchronization for you; we strongly recommend that you implement time synchronization within your clusters. Some applications require synchronized time, and from an administration perspective, having xntpd running eases administration within a clustered environment.
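
As a minimal sketch (the server address 192.0.2.10 is a placeholder for your site's time source, and your standards for maintaining /etc/ntp.conf might differ), xntpd could be pointed at a time server and started on each cluster node as follows:

echo "server 192.0.2.10" >> /etc/ntp.conf
startsrc -s xntpd

To have the daemon start again after a reboot, uncomment the xntpd entry in /etc/rc.tcpip on each node.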

7.6 Cluster verification and synchronization

Verification and synchronization of the PowerHA cluster ensures that all resources being placed under PowerHA control are configured appropriately, and that all rules regarding resource ownership and other parameters are consistent across nodes in the cluster.


The PowerHA cluster stores the information about all cluster resources and cluster topology, as well as some additional parameters, in PowerHA-specific object classes in the ODM. PowerHA ODM files must be consistent across all cluster nodes so that cluster behavior works as designed. Cluster verification checks the consistency of PowerHA ODM files across all nodes, and also verifies that the PowerHA ODM information is consistent with the required AIX ODM information. If verification is successful, then the cluster configuration can be synchronized across all the nodes. Synchronization takes effect immediately in an active cluster. Cluster synchronization copies the PowerHA ODM from the local node to all remote nodes.

7.6.1 Cluster verification and synchronization using SMIT

Using SMIT (run smitty hacmp), there are three different paths for cluster verification and synchronization:

� Initialization and Standard Configuration path
� Extended Configuration path
� Problem Determination Tools path

Initialization and Standard Configuration verification path

To use the “Initialization and Standard Configuration” verification path, run:

1. smitty hacmp → Initialization and Standard Configuration → Verify and Synchronize HACMP Configuration

When you use the SMIT Initialization and Standard Configuration path, synchronization will take place automatically following successful verification of the cluster configuration. There are no additional options in this menu. The feature to automatically correct errors found during verification is always active while using the “Initialization and Standard Configuration” path. You can find more information about automatically correcting errors found during verification in 7.6.4, “Running automatically corrective actions during verification” on page 424.

Note: If the cluster is not synchronized and a failure of a cluster topology or resource component takes place, the cluster might not be able to fallover as designed. We recommend that you regularly verify the cluster configuration and synchronize all changes when complete.


Extended Configuration verification path

When using the Extended Configuration path, you have additional options for verification, and you can choose whether or not synchronization follows verification.

To use the Extended Configuration path for verification of your cluster, run:

1. Run smitty hacmp → Extended Configuration → Extended Verification and Synchronization.

2. Change the field parameters (the default option here is ‘Both’ verification and synchronization, as shown in the examples to follow) and press Enter.

3. Figure 7-22 on page 419 shows the SMIT panel displayed when cluster services are active (DARE - Dynamic Reconfiguration).

4. Figure 7-23 on page 420 shows the SMIT panel displayed when cluster services are not active.

The Extended Verification and Synchronization path parameters depend on the cluster services state (active or inactive) on the node where verification is initiated. In an active cluster, the SMIT panel parameters are as follows:

1. Emulate or Actual: Option Emulate runs verification in emulation mode and makes no changes to the cluster configuration, while Actual applies the changes live to the cluster configuration.

2. Verify changes only: Option No runs the full check of topology and resources, while Yes only verifies the changes made to the cluster configuration (PowerHA ODM) since the last verification.

3. Logging: Option Verbose sends to the console the full output that would otherwise be directed to the clverify.log file.

In an inactive cluster (where PowerHA is not running), the SMIT panel parameters are as follows:

1. “Verify, Synchronize or Both”: Verify runs verification only; Synchronize runs synchronization only; Both runs verification and, if that completes successfully, goes ahead with synchronization (with this option, the Force synchronization if verification fails option can be used).

2. “Automatically correct errors found during verification”: For a detailed description, see 7.6.4, “Running automatically corrective actions during verification” on page 424.

Note: The Emulate option is no longer available in newer PowerHA SP levels.


3. “Force synchronization if verification fails”: Option No stops synchronization from commencing if the verification procedure returns errors, while Yes forces synchronization regardless of the result of verification. In general, we do not recommend forcing the synchronization. In some specific situations, if the synchronization needs to be forced, ensure that you fully understand the consequences of these cluster configuration changes.

4. “Verify changes only”: No runs a full check of topology and resources while Yes verifies only the changes which have taken place in the PowerHA ODM files since the time of the last verification operation.

5. Logging: Option Verbose sends to the console the full output that would otherwise be directed to the clverify.log file.

Figure 7-22 Verification and Synchronization panel - active cluster

Note: Synchronization can be initiated on either an active or inactive cluster. If some nodes in the cluster are inactive, synchronization could be initiated only from an active node, using DARE (Dynamic Reconfiguration). You can find more information about DARE in 7.6.2, “Dynamic cluster reconfiguration: DARE” on page 421.

HACMP Verification and Synchronization (Active Cluster Nodes Exist)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Emulate or Actual                                     [Actual]        +
* Verify changes only?                                  [No]            +
* Logging                                               [Standard]      +

F1=Help          F2=Refresh       F3=Cancel        F4=List
F5=Reset         F6=Command       F7=Edit          F8=Image
F9=Shell         F10=Exit         Enter=Do


Figure 7-23 Verification and Synchronization panel - inactive cluster

Problem Determination Tools verification path

To use the “Problem Determination Tools” path for verification, run:

1. smitty hacmp → Problem Determination Tools → HACMP Verification → Verify HACMP Configuration.

If you are using the Problem Determination Tools path, you have more options for verification, such as the option to define custom verification methods, but it is not possible to synchronize the cluster from here. You can see the SMIT panel of the Problem Determination Tools verification path in Figure 7-24.

HACMP Verification and Synchronization

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Verify, Synchronize or Both                           [Both]          +
* Automatically correct errors found during             [No]            +
  verification

* Force synchronization if verification fails?          [No]            +
* Verify changes only?                                  [No]            +
* Logging                                               [Standard]      +

F1=Help          F2=Refresh       F3=Cancel        F4=List
F5=Reset         F6=Command       F7=Edit          F8=Image
F9=Shell         F10=Exit         Enter=Do

Note: Verification, using the Problem Determination Tools path, can be initiated either from active or inactive nodes.


Figure 7-24 Verification panel using “Problem Determination Tools” path

If verification fails, errors should be corrected and verification repeated to ensure that these are resolved as soon as possible. The messages output from verification indicate where the error occurred (for example, on a node, a device, or a command). In 7.6.3, “Verification log files” on page 423, we describe the location and purpose of the verification logs.

7.6.2 Dynamic cluster reconfiguration: DARE

PowerHA will allow some changes to be made to both the cluster topology and the cluster resources while the cluster is running. This feature is referred to as a dynamic automatic reconfiguration event, or DARE. You can make a number of supported resource and topology changes in the cluster and then utilize the dynamic reconfiguration event to apply those changes to the active cluster without having to bring cluster nodes offline, making the whole operation faster, especially for complex configuration changes.

Verify HACMP Configuration

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  HACMP Verification Methods                            Pre-Installed   +
                                                        (Pre-Installed, none)
  Custom Defined Verification Methods                   []              +
  Error Count                                           []              #
  Log File to store output                              []
  Verify changes only?                                  [No]            +
  Logging                                               [Standard]      +

F1=Help          F2=Refresh       F3=Cancel        F4=List
F5=Reset         F6=Command       F7=Edit          F8=Image
F9=Shell         F10=Exit         Enter=Do

Notes:

� Be aware that when the cluster synchronization (DARE) takes place, action is taken immediately on any resource or topology component that is to be changed or removed.

� It is not supported to run a DARE operation on a cluster which has nodes running at different versions of the PowerHA code, for example, during a cluster migration.


The following changes can be made to resources in an active cluster using DARE:

� Add, remove, or change an application server.

� Add, remove, or change application monitoring.

� Add or remove the contents of one or more resource groups.

� Add, remove, or change a tape resource.

� Add or remove one or more resource groups.

� Add, remove, or change the order of participating nodes in a resource group.

� Change the node relationship of the resource group.

� Change resource group processing order.

� Add, remove, or change the fallback timer policy associated with a resource group. The new fallback timer will not have any effect until the resource group is brought online on another node.

� Add, remove, or change the settling time for resource groups.

� Add or remove the node distribution policy for resource groups.

� Add, change, or remove parent/child or location dependencies for resource groups (some limitations apply here).

� Add, change, or remove inter-site management policy for resource groups.

� Add, remove, or change pre-events or post-events.

The dynamic reconfiguration can be initiated only from an active cluster node, which means, from a node that has cluster services running. The change must be made from a node that is active so the cluster can be synchronized.

Before making changes to a cluster definition, ensure that:

� The same version of HACMP is installed on all nodes.

� Some nodes are up and running PowerHA and they are able to communicate with each other. No node should be in an UNMANAGED state.

� The cluster is stable and the hacmp.out log file does not contain recent event errors or config_too_long events.

Depending on your cluster configuration and on the specific changes you plan to make in your cluster environment, there are many different possibilities and possible limitations while running a dynamic reconfiguration event. You must understand all of the consequences of changing an active cluster configuration, so we recommend that you read the PowerHA Administration Guide for further details before making dynamic changes in your live PowerHA environment.

7.6.3 Verification log files

During cluster verification, PowerHA collects configuration data from all cluster nodes as it runs through the list of checks. The verbose output is saved to the clverify.log file. The log file is rotated.

The following output shows the /var/hacmp/clverify/ directory contents with verification log files:

root@ndu1[/var/hacmp/clverify] ls -l
total 12712
-rw-r--r--  1 root  system       12 Mar 24 19:13 clver_CA_daemon_invoke_client.log
-rw-------  1 root  system      405 Mar 24 19:13 clver_ca_ndu1.xml
-rw-------  1 root  system      335 Mar 22 19:49 clver_ca_ndu2.xml
-rw-r--r--  1 root  system  3064366 Mar 25 00:00 clver_debug.log
-rw-r--r--  1 root  system   821284 Mar 25 00:00 clver_parser.log.466964
-rw-r--r--  1 root  system   713984 Mar 25 00:00 clver_request.xml
-rw-------  1 root  system   113224 Mar 25 21:54 clverify.log
-rw-------  1 root  system   113271 Mar 25 21:52 clverify.log.1
-rw-------  1 root  system   112959 Mar 25 01:15 clverify.log.2
-rw-------  1 root  system   112830 Mar 25 01:02 clverify.log.3
-rw-------  1 root  system   110446 Mar 25 00:00 clverify.log.4
-rw-------  1 root  system   110691 Mar 24 22:41 clverify.log.5
-rw-------  1 root  system   112574 Mar 24 22:40 clverify.log.6
-rw-------  1 root  system   114372 Mar 24 19:13 clverify.log.7
-rw-------  1 root  system   107546 Mar 24 18:35 clverify.log.8
-rw-------  1 root  system   114731 Mar 24 17:10 clverify.log.9
-rw-------  1 root  system     1205 Mar 25 21:54 clverify_daemon.log
-rw-r--r--  1 root  system   738560 Mar 25 21:54 clverify_last_request.xml
drwx------  4 root  system      256 Mar 24 22:40 fail
drwx------  4 root  system      256 Mar 25 21:54 pass
drw-------  4 root  system      256 Mar 25 21:52 pass.prev
drwxr-xr-x  4 root  system      256 Mar 05 11:52 wpar

On the local node, where you initiate the cluster verification command, detailed information is collected in the log files, which contain a record of all data collected, the tasks performed and any errors. These log files are written to the following directories and are used by a service technician to determine the location of errors:

� /var/hacmp/clverify/pass/nodename/ - if verification succeeds

� /var/hacmp/clverify/fail/nodename/ - if verification fails

Note: Verification requires 4 MB of free space per node in the /var file system in order to run. Typically, the /var/hacmp/clverify/clverify.log files require an additional 1–2 MB of disk space. At least 42 MB of free space is recommended for a four-node cluster. The default log file location for most PowerHA log files is now /var/hacmp; however, there are some exceptions. For more details, refer to the PowerHA Administration Guide.
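As a quick way to confirm that /var has enough space and to review the most recent verification run, you can use standard AIX commands; a minimal sketch:

# Free space in /var, reported in MB
df -m /var

# Review the most recent verification messages
tail -n 100 /var/hacmp/clverify/clverify.log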

7.6.4 Running automatic corrective actions during verification

PowerHA can automatically correct some errors during cluster verification. The default behavior for this depends on the path you are using for verification and synchronization.

The automatic corrective action feature can correct only some types of errors that are detected during cluster verification. The following list presents the errors that can be addressed using this feature:

� HACMP shared volume group time stamps are not up-to-date on a node.

� The /etc/hosts file on a node does not contain all PowerHA-managed IP addresses.

� SSA concurrent volume groups need unique SSA node numbers.

� A file system is not created on a node, although disks are available.

� Disks are available, but the volume group has not been imported to a node.

� Shared volume groups configured as part of an HACMP resource group have their automatic varyon attribute set to Yes.

� Required /etc/services entries are missing on a node.

� Required PowerHA snmpd entries are missing on a node.

� Required RSCT network options settings.

� Required PowerHA network options setting.

� Required routerevalidate network option setting.

� Corrective actions when using IPv6.

� Create WPAR if added to a resource group but WPAR does not exist yet.

With no prompt:

� Correct error conditions that appear in /etc/hosts.

� Correct error conditions that appear in /usr/es/sbin/cluster/etc/clhosts.client.

� Update /etc/services with missing entries.

� Update /etc/snmpd.peers and /etc/snmp.conf files with missing entries.

� Update SSA node numbers to be unique cluster wide.

With a prompt:

� Update auto-varyon on this volume group.

� Update volume group definitions for this volume group.

� Keep HACMP volume group timestamps in sync with the VGDA.

� Auto-import volume groups.

� Re-import volume groups with missing file systems and mount points.

� File system automount flag is set in /etc/filesystems.

� Set network option.

� Set inoperative cluster nodes interfaces to the boot time interfaces.

� Bring active resources offline.

� Update automatic error notification stanzas.

� Do corrective action of starting ntpd daemons.

� Do corrective action of assigning link-local addresses to ND-capable network interfaces.

Initialization and Standard Configuration verification path
When you use the “Initialization and Standard Configuration” verification path, the automatic error correction feature is always active and cannot be disabled.

Extended Configuration verification path
When you use the “Extended Configuration” path for cluster verification, the ability to automatically correct errors depends on the cluster state. You can choose not to use this feature or you can choose one of two modes of operation:

Interactively When verification detects a correctable condition related to importing a volume group or to exporting and re-importing mount points and file systems, you are prompted to authorize a corrective action before verification continues.

Automatically (Menu selection Yes) When verification detects that any of the error conditions exists, as listed in the section, Conditions That Can Trigger a Corrective Action, it takes the corrective action automatically without a prompt.

If cluster services are inactive, you can select the mode of the automatic error correction feature directly in the “Extended Configuration” verification path menu by running

smitty hacmp → Extended Configuration → Extended Verification and Synchronization

Note: No automatic corrective actions take place during a dynamic automatic reconfiguration event (DARE).

As shown in Figure 7-23 on page 420, you can change the mode with the “Automatically correct errors found during verification” field, by setting it to one of these choices:

� Yes
� No
� Interactively

If the cluster is active, the automatic corrective action feature is enabled by default. You can change the mode of automatic error correction for an active cluster directly in the SMIT menu for starting cluster services. Run:

� smitty hacmp → System Management (C-SPOC) → Manage HACMP Services → Start Cluster Services

Setting this value to Yes, No, or Interactively sets the automatic error correction mode for:

1. The PowerHA “Extended Configuration” verification path.

2. Automatic cluster verification to run at cluster services start time.

3. Automatic cluster configuration monitoring, which runs daily if enabled.

You can find more information about the latter two topics in 7.6.5, “Automatic cluster verification” on page 426.

Problem Determination Tools verification path
Using this verification path, automatic corrective actions are not possible.

7.6.5 Automatic cluster verification

PowerHA provides automatic verification in the following cases:

� Each time you start cluster services on a node
� Every 24 hours (automatic cluster configuration monitoring, enabled by default)

During automatic verification and synchronization, PowerHA will detect and correct several common configuration issues. This automatic behavior ensures that if you have not manually verified and synchronized a node in your cluster prior to starting cluster services, PowerHA will do so.

Using the SMIT menus, you can set the parameters for the periodic automatic cluster verification checking utility, by running:

� smitty hacmp → Problem Determination Tools → HACMP Verification → Automatic Cluster Configuration Monitoring.

You can find the following fields in the SMIT panel:

� Automatic cluster configuration verification: Here you can enable or disable the utility, by selecting either Disable or Enable.

� Node name: Here you can select nodes where the utility will run. Selecting the option default means that the first node in alphabetical order will verify the configuration.

� HOUR (00 - 23): Here you can define the time when the utility will start. The default value is 00 (midnight) and it can be changed manually.

Figure 7-25 shows the SMIT panel for setting the “Automatic Cluster Configuration Monitoring” parameters:

smitty clautover.dialog

                 Automatic Cluster Configuration Monitoring

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Automatic cluster configuration verification  Enabled               +
  Node name                                     Default               +
* HOUR (00 - 23)                                [00]                  +

F1=Help      F2=Refresh     F3=Cancel     F4=List
F5=Reset     F6=Command     F7=Edit       F8=Image
F9=Shell     F10=Exit       Enter=Do

Figure 7-25 Automatic Cluster Configuration Monitoring

You can check the verification result of automatic cluster verification in the verification log files. The default location for them is the /var/hacmp/clverify/ directory. You can find more about verification log files in 7.6.3, “Verification log files” on page 423.

7.7 Monitoring PowerHA

By design, PowerHA provides a highly available application environment by masking or eliminating failures that might occur either on hardware or software components of the environment. Masking the failure means that the active resources are moved from a failed component to the next available component of that type. So all highly available applications continue to operate and clients can access and use them despite the failure.

As a result, it is possible that a component in the cluster has failed and that you are unaware of the fact. The danger here is that, while PowerHA can survive one or possibly several failures, each failure that escapes your notice threatens the cluster’s ability to provide a highly available environment, as the redundancy of cluster components is diminished.

To avoid this situation, we recommend that you regularly check and monitor the cluster. PowerHA provides various utilities that help you monitor the cluster, as follows:

� Automatic cluster verification. You can find more about this in 7.6.5, “Automatic cluster verification” on page 426.

� Cluster status checking utilities

� Resource group information commands

� Topology information commands

� Log files

� Error notification methods

� Application monitoring

� Measuring application availability

� Monitoring clusters from the enterprise system administration and monitoring tools

You can use either ASCII SMIT or WebSMIT to configure and manage cluster environments.

For more information about WebSMIT, see 4.3, “Installing and configuring WebSMIT” on page 224.

7.7.1 Cluster status checking utilities

In the following section we describe the common status checking utilities.

clstat command
The clstat command (/usr/es/sbin/cluster/clstat) is a very helpful tool that you can use for cluster status monitoring. It uses the clinfo library routines to display information about the cluster, including name and state of the nodes, networks, network interfaces, and resource groups.

This utility requires the clinfoES subsystem to be active on nodes where the clstat command is initiated.

The clstat utility is supported in two modes: ASCII mode and X Window mode. ASCII mode can run on any physical or virtual ASCII terminal, including xterm or aixterm windows. If the cluster node runs in graphical mode, clstat displays the output in a graphical window. Before running the command, ensure that the DISPLAY variable is exported to the X server and that X client access is allowed.

Figure 7-26 shows the syntax of the clstat command.

clstat [-c cluster ID | -n cluster_name] [-i] [-r seconds] [-a|-o] [-s]

   -c cluster ID  >  run in automatic (non-interactive) mode for the
                     specified cluster.
   -n name        >  run in automatic (non-interactive) mode for the
                     specified cluster name.
   -i             >  run in interactive mode
   -r seconds     >  number of seconds between each check of the
                     cluster status
   -a             >  run ascii version.
   -o             >  run once and exit.
   -s             >  display both up and down service labels.

Figure 7-26 clstat command syntax

clstat -a runs the program in ASCII mode.

clstat -o runs the program once in ASCII mode and exits (useful for capturing output from a shell script or cron job).

clstat -s displays service labels that are both up and down; without this flag, only active service labels are displayed.

Example 7-17 shows the clstat -o command output from our test cluster.

Example 7-17 clstat -o command output

# clstat -o

clstat - HACMP Cluster Status Monitor
-------------------------------------

Cluster: glvmtest   (1236788768)
Thu Mar 26 15:25:57 EDT 2009
                State: UP               Nodes: 2
                SubState: STABLE

        Node: glvm1             State: UP
           Interface: glvm1 (2)                 Address: 9.12.7.4
                                                State:   UP
           Interface: glvm1_ip1 (1)             Address: 10.10.30.4
                                                State:   UP
           Interface: glvm1_data1 (0)           Address: 10.10.20.4
                                                State:   UP
           Resource Group: liviu                State:  On line
           Resource Group: rosie                State:  On line

        Node: glvm2             State: UP
           Interface: glvm2 (2)                 Address: 9.12.7.8
                                                State:   UP
           Interface: glvm2_ip2 (1)             Address: 10.10.30.8
                                                State:   UP
           Interface: glvm2_data2 (0)           Address: 10.10.20.8
                                                State:   UP
           Resource Group: liviu                State:  Online (Secondary)
           Resource Group: rosie                State:  Online (Secondary)

cldump command
Another useful utility is cldump (/usr/es/sbin/cluster/utilities/cldump). It provides a snapshot of the key cluster status components: the cluster itself, the nodes in the cluster, the networks and network interfaces connected to the nodes, and the resource group status on each node.

The cldump command does not have any arguments, so you simply run cldump from the command line.

7.7.2 Cluster status and services checking utilities

In the following section we provide a list of common cluster status and services checking utilities.

Checking cluster subsystem status
You can check the PowerHA or RSCT subsystem status by running the lssrc command with the -s or -g flag. It displays the subsystem name, group, PID, and status (active or inoperative).

lssrc -s subsystem_name displays subsystem information for a specific subsystem.

lssrc -g subsystem_group_name displays subsystem information for all subsystems in a specific group.
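For example, using the subsystem and group names listed in Figure 7-27, you might check the PowerHA and RSCT topology services subsystems as follows (a minimal sketch):

# All subsystems in the "cluster" group (clstrmgrES and clinfoES)
lssrc -g cluster

# The cluster communication daemon and topology services
lssrc -s clcomdES
lssrc -g topsvcs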

Figure 7-27 shows subsystem names and group names for subsystems used by PowerHA.

Subsystem_name      group_name

RSCT subsystems used by HACMP:
  topsvcs           topsvcs
  grpsvcs           grpsvcs
  grpglsm           grpsvcs
  emsvcs            emsvcs
  emaixos           emsvcs
  ctrmc             rsct
HACMP subsystems:
  clcomdES          clcomdES
  clstrmgrES        cluster
Optional HACMP subsystems:
  clinfoES          cluster

Figure 7-27 Subsystem names and group names used by PowerHA

The clshowsrv command
You can display the status of PowerHA subsystems by using the clshowsrv command (/usr/es/sbin/cluster/utilities/clshowsrv). It displays the status of all subsystems used by PowerHA, or the status of a selected subsystem. The command output format is the same as the lssrc -s command output.

Figure 7-28 shows the syntax of the clshowsrv command.

clshowsrv [-a|-v] [clstrmgrES|clinfoES|clcomdES]

Figure 7-28 clshowsrv command syntax

clshowsrv -a displays the status of the HACMP subsystems: clstrmgrES, clinfoES, and clcomdES.

clshowsrv -v displays the status of all HACMP and RSCT subsystems.

Note: Since HACMP Version 5.3, the cluster manager daemon clstrmgrES is initiated from the init process, so it starts automatically at boot time. The Cluster Manager must be running before any cluster services can start on a node. Because the clstrmgr daemon is now a long-running process, you cannot use the lssrc -s clstrmgrES command to determine the state of the cluster. Use the lssrc -ls clstrmgrES command and check the clstrmgr state, or use the clcheck_server grpsvcs;print $? command (1=cluster services running, 0=cluster services not running).
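Based on the preceding note, a quick way to check whether cluster services are actually running on a node is shown in the following minimal sketch:

# The SRC status of clstrmgrES is always active, so query its internal state
lssrc -ls clstrmgrES | grep -i state

# Or use the return code described in the note
clcheck_server grpsvcs; print $?     # 1 = cluster services running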

Example 7-18 shows output of the clshowsrv -v command from our test cluster, when cluster services are running.

Example 7-18 clshowsrv -v command output

Status of the RSCT subsystems used by HACMP:
Subsystem         Group            PID     Status
 topsvcs          topsvcs          22756   active
 grpsvcs          grpsvcs          21858   active
 grpglsm          grpsvcs                  inoperative
 emsvcs           emsvcs           24932   active
 emaixos          emsvcs           28982   active
 ctrmc            rsct             13430   active

Status of the HACMP subsystems:
Subsystem         Group            PID     Status
 clcomdES         clcomdES         15738   active
 clstrmgrES       cluster          26498   active

Status of the optional HACMP subsystems:
Subsystem         Group            PID     Status
 clinfoES         cluster          26260   active

You can also run the command clshowsrv -v using SMIT menus: smitty hacmp → System Management (C-SPOC) → Manage HACMP Services → Show Cluster Services.

7.7.3 Topology information commands

This section discusses topology information commands and their use.

The cltopinfo command
The cltopinfo command (/usr/es/sbin/cluster/utilities/cltopinfo) lists the cluster topology information using an alternative format that is easier to read and understand.

Figure 7-29 shows the cltopinfo command syntax.

cltopinfo [-c] [-n] [-w] [-i] [-m]

Figure 7-29 cltopinfo command syntax

You can also use SMIT menus to display different formats of the topology information, by running:

smitty hacmp → Extended Configuration → Extended Topology Configuration → Show HACMP Topology and selecting the desired format.

Figure 7-30 shows the SMIT menu for viewing topology information with different format options. These selections are consistent with the command options shown in Figure 7-29.

smitty cm_show_menu

                              Show HACMP Topology

Move cursor to desired item and press Enter.

  Show Cluster Topology
  Show Cluster Definition
  Show Topology Information by Node
  Show Topology Information by Network
  Show Topology Information by Communication Interface

F1=Help      F2=Refresh      F3=Cancel      F8=Image
F9=Shell     F10=Exit        Enter=Do

Figure 7-30 SMIT show cluster topology menu

The topsvcs subsystem
You can run the lssrc -ls topsvcs command to monitor the heartbeat activity managed by topology services. The output shows all heartbeat-related information for all active network paths. The Missed HBs, Packets sent, Packets received, and Errors fields for a specific network path give you information about that path's heartbeat activity. Example 7-19 shows part of the lssrc -ls topsvcs command output on our test cluster. The emphasized words point you to the output information of interest.

Example 7-19 lssrc -ls topsvcs command output

Subsystem         Group            PID     Status
 topsvcs          topsvcs          372976  active
Network Name   Indx Defd Mbrs St   Adapter ID      Group ID
net_ether_01_0 [ 0]    2    2  S   10.10.10.5      10.10.10.11
net_ether_01_0 [ 0]         en0    0x51cb9603      0x51cb88cd
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent    : 25685 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 41169 ICMP 0 Dropped: 0
NIM's PID: 426106
diskhb_0       [ 1]    2    2  S   255.255.10.0    255.255.10.1
diskhb_0       [ 1]       rhdisk1  0x81cb9602      0x81cb88d3
HB Interval = 2.000 secs. Sensitivity = 4 missed beats
Missed HBs: Total: 32 Current group: 32
Packets sent    : 12103 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 12716 ICMP 0 Dropped: 0
NIM's PID: 516302
diskhbmulti_0  [ 2]    2    2  S   255.255.12.0    255.255.12.1
diskhbmulti_0  [ 2] rmndhb_lv_01.1_2 0x81cb963f    0x81cb88f3
HB Interval = 3.000 secs. Sensitivity = 6 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent    : 8386 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 8837 ICMP 0 Dropped: 0
NIM's PID: 544926
  2 locally connected Clients with PIDs:
  haemd(286924) hagsd(274658)
  Fast Failure Detection available but off.
  Dead Man Switch Enabled:
     reset interval = 1 seconds
     trip interval = 36 seconds
  Client Heartbeating Disabled.
  Configuration Instance = 16
  Daemon employs no security
  Segments pinned: Text Data.
  Text segment size: 781 KB. Static data segment size: 1546 KB.
  Dynamic data segment size: 4097. Number of outstanding malloc: 177
  User time 1 sec. System time 1 sec.
  Number of page faults: 141. Process swapped out 0 times.
  Number of nodes up: 2. Number of nodes down: 0.
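For routine checks, you might filter this output down to the heartbeat counters discussed above; a minimal sketch (the grep pattern is only illustrative):

# Show only the heartbeat statistics lines for all monitored paths
lssrc -ls topsvcs | grep -E "Network Name|Missed HBs|Errors|Dropped"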

7.7.4 Resource group information commands

This section discusses resource groups, and the commands that provide information about them.

The clRGinfo command
You can display a resource group's attributes within the cluster using the clRGinfo command (/usr/es/sbin/cluster/utilities/clRGinfo). The command output shows the location and state of one or more specified resource groups. The output displays both the global state of the resource group as well as the special state of the resource group on a local node.

Figure 7-31 shows the clRGinfo command syntax.

clRGinfo [-h] [-v] [-a] [-s|-c] [-t] [-p] [group name(s)]

Figure 7-31 clRGinfo command syntax

clRGinfo -v // Produces verbose output.

clRGinfo -a // Displays the current location of a resource group and its destination after a cluster event (when run during a cluster event).

clRGinfo -t // Displays the delayed fallback timers and settling timers that are currently active on the local node. Note: This flag can be used only if the cluster manager is active on the local node.

clRGinfo -s // Produces colon-separated output.

clRGinfo -p // Displays the node that temporarily has the highest priority for this instance.

clRGinfo -m // Displays the status of application monitors in the cluster.

The resource group status is shown as:

Online The resource group is currently operating properly on one or more nodes in the cluster.

Offline The resource group is not operating in the cluster and is currently not in an error condition. Offline state means either the user requested this state or dependencies were not met.

Acquiring A resource group is currently coming up on one of the nodes in the cluster. In normal conditions status changes to Online.

Releasing The resource group is in the process of being released from ownership by one node. Under normal conditions after being successfully released from a node the resource group’s status changes to offline.

Error The resource group has reported an error condition. User interaction is required.

Unknown The resource group’s current status cannot be determined, possibly because of a loss of communication, because not all nodes in the cluster are up, or because a resource group dependency is not met (another resource group that depends on this resource group failed to be acquired first).

If cluster services are not running on the local node, the command determines a node where the cluster services are active and obtains the resource group information from the active cluster manager.

Instead of clRGinfo, you can use the clfindres command, which is a link to clRGinfo (/usr/es/sbin/cluster/utilities/clfindres).

Example 7-20 shows the clRGinfo command output.

Example 7-20 clRGinfo command output

# clRGinfo
-----------------------------------------------------------------------------
Group Name         Group State                Node
-----------------------------------------------------------------------------
rosie              ONLINE                     glvm1@USA
                   ONLINE SECONDARY           glvm2@UK

liviu              ONLINE                     glvm1@USA
                   ONLINE SECONDARY           glvm2@UK

7.7.5 Log files

PowerHA writes the messages it generates to the system console and to several log files. Because each log file contains a different subset of the types of messages generated by PowerHA, you can get different views of cluster status by viewing different log files.

The default locations of log files are used in this section. If you redirected any logs, check the appropriate location.

C-SPOC offers the following utilities to view log files:

� View/Save/Delete HACMP Event Summaries

Using this selection you can display the contents, delete or save the cluster event summary to a file.

� View Detailed HACMP Log Files

With this selection you can display the PowerHA detailed scripts log (/var/hacmp/log/hacmp.out), the PowerHA event summary log (/var/hacmp/adm/cluster.log), and the C-SPOC system log file (/var/hacmp/log/cspoc.log).

� Change/Show HACMP Log File Parameters

Using this option you can set the debug level (high/low) and formatting option (default, standard, html-low, html-high) for the selected node.

� Change/Show Cluster Manager Log File Parameters

With this selection you can set the clstrmgrES debug level (standard/high).

� Change/Show a Cluster Log Directory

Using this menu you can change the default directory for a log file and select a new directory.

� Collect Cluster log files for Problem Reporting

You can use this feature to collect cluster log file snap data (clsnap command), which are necessary for additional problem determination and analysis. You can select the debug option here, include RSCT log files and select nodes included in this data collection. If not otherwise specified, the default location for snap collection is in the /tmp/ibmsupt/hacmp/ directory for clsnap and /tmp/phoenix.snapOut for phoenix snap.

Next we list all of the PowerHA logs and describe their purpose:

� /var/hacmp/adm/cluster.log file

The cluster.log file is the main PowerHA log file. PowerHA error messages and messages about PowerHA-related events are appended to this log with the time and date at which they occurred.

� /var/hacmp/adm/history/cluster.mmddyyyy file

The cluster.mmddyyyy file contains time-stamped, formatted messages generated by PowerHA scripts. The system creates a cluster history file whenever cluster events occur, identifying each file by the file name extension mmddyyyy, where mm indicates the month, dd indicates the day, and yyyy indicates the year.

� /var/hacmp/clcomd/clcomd.log file

The clcomd.log file contains time-stamped, formatted messages generated by the PowerHA Cluster Communication Daemon. This log file contains an entry for every connect request made to another node and the return status of the request.

� /var/hacmp/clcomd/clcomddiag.log file

The clcomddiag.log file contains time-stamped, formatted messages generated by the PowerHA Communication daemon when tracing is turned on. This log file is typically used by IBM support personnel for troubleshooting.

� /var/hacmp/clverify/clverify.log file

The clverify.log file contains verbose messages, output during verification. Cluster verification consists of a series of checks performed against various PowerHA configurations. Each check attempts to detect either a cluster consistency issue or an error. The verification messages follow a common, standardized format, where feasible, indicating such information as the node(s), devices, and command in which the error occurred.

� /var/hacmp/log/autoverify.log file

The autoverify.log file contains logging for auto verify and synchronize.

� /var/hacmp/log/clavan.log file

The clavan.log file keeps track of when each application that is managed by PowerHA is started or stopped, and when the node on which an application is running stops. By collecting the records in the clavan.log file from every node in the cluster, a utility program can determine how long each application has been up, as well as compute other statistics describing application availability time.

� /var/hacmp/log/clinfo.log clinfo.log.n, n=1,..,7 file

The clinfo.log file is typically installed on both client and server systems. Client systems do not have the infrastructure to support log file cycling or redirection. The clinfo.log file records the activity of the clinfo daemon.

� /var/hacmp/log/cl_testtool.log file

When you run the Cluster Test Tool from SMIT, it displays status messages to the screen and stores output from the tests in the /var/hacmp/log/cl_testtool.log file.

� /var/hacmp/log/clconfigassist.log file

The clconfigassist.log file is the log file for the Cluster Configuration Assistant.

� /var/hacmp/log/clstrmgr.debug clstrmgr.debug.n, n=1,..,7 file

The clstrmgr.debug log file contains time-stamped, formatted messages generated by Cluster Manager activity. This file is typically used only by IBM support personnel.

� /var/hacmp/log/clstrmgr.debug.long clstrmgr.debug.long.n, n=1,..,7 file

The clstrmgr.debug.long file contains high-level logging of cluster manager activity, in particular its interaction with other components of PowerHA and with RSCT, which event is currently being run, and information about resource groups (for example, their state and actions to be performed, such as acquiring or releasing them during an event).

� /var/hacmp/log/clutils.log file

The clutils.log file contains the results of the automatic verification that runs on one user-selectable PowerHA cluster node once every 24 hours. When cluster verification completes on the selected cluster node, this node notifies the other cluster nodes with the following information:

– The name of the node where verification had been run
– The date and time of the last verification
– Results of the verification

The clutils.log file also contains messages about any errors found and actions taken by PowerHA for the following utilities:
– The PowerHA File Collections utility
– The Two-Node Cluster Configuration Assistant
– The Cluster Test Tool
– The OLPW conversion tool

� /var/hacmp/log/cspoc.log file

The cspoc.log file contains logging of the execution of C-SPOC commands on the local node with ksh option xtrace enabled (set -x).

� /var/hacmp/log/cspoc.log.remote file

The cspoc.log.remote file contains logging of the execution of C-SPOC commands on remote nodes with the ksh xtrace option enabled (set -x). To enable this logging, set the environment variable VERBOSE_LOGGING_REMOTE=high on the local node where the C-SPOC operation is being run. This creates a log file named cspoc.log.remote on the remote node containing set -x output from the operations run there, which is useful in debugging failed LVM operations on the remote node.

� /var/hacmp/log/hacmp.out hacmp.out.n n=1,..,7 file

The hacmp.out file records the output generated by the event scripts as they run. This information supplements and expands upon the information in the /var/hacmp/adm/cluster.log file. To receive verbose output, the debug level runtime parameter should be set to high (the default).

� /var/hacmp/log/migration.log file

The migration.log file contains a high level of logging of cluster activity while the cluster manager on the local node operates in a migration state.

� /var/hacmp/log/oraclesa.log file

The oraclesa.log file contains logging of the Smart Assist for Oracle facility.

� /var/hacmp/log/sa.log file

The sa.log file contains logging of the Smart Assist facility.
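When investigating a problem, you often want to watch event processing as it happens. A minimal sketch using the logs listed above:

# Follow detailed event script processing in real time
tail -f /var/hacmp/log/hacmp.out

# Review the most recent high-level event messages
tail -n 50 /var/hacmp/adm/cluster.log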

Cluster administrators should ensure that there is enough free space for all log files in the file systems. The minimum amount of space required in /var depends on the number of the nodes in the cluster. You can use the following estimations to assist in calculating the value for each cluster node:

� 2 MB should be free for writing the clverify.log[0-9] files.
� 4 MB per node is needed for writing the verification data from the nodes.
� 20 MB is needed for writing the clcomd log information.
� 1 MB per node is needed for writing the ODM cache data.

For example, for a four-node cluster, you need 2 + (4x4) + 20 + (4x1) = 42 MB of space in the /var file system.
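The same estimate can be expressed as a simple calculation; a sketch in ksh, where N is the number of cluster nodes:

# Rough /var space estimate in MB: 2 + (4 x N) + 20 + (1 x N)
N=4
print $(( 2 + 4*N + 20 + N ))     # prints 42 for a four-node cluster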

Some additional log files that gather debug data might require further space in the /var file system. This depends on other factors, such as the number of shared volume groups and file systems. Cluster verification issues a warning if there is not sufficient space allocated to the /var file system.

To change the default directory of a specific log file using SMIT, run:

smitty hacmp → System Management (C-SPOC) → HACMP Log Viewing and Management → Change/Show a Cluster Log Directory, then select the log file to change.

The SMIT fast-path is smitty clusterlog_redir.select. The default log directory is changed for all nodes in the cluster. The cluster should be synchronized after changing the log parameters.

Note: We recommend that you use local file systems, rather than shared or NFS file systems, when changing default log locations. Having logs on shared or NFS file systems can cause problems if the file system needs to unmount during a fallover event. Redirecting logs to shared or NFS file systems can also prevent cluster services from starting during node reintegration.

7.7.6 Error notification

You can use the AIX Error Notification facility to add an additional layer of high availability to a PowerHA environment. You can add notification for failures of resources for which PowerHA does not provide recovery by default.

You can find more information about automatic error notification, with some examples of using and configuring it, in 11.4, “Automatic error notification” on page 566.

7.7.7 Application monitoring

In this section we discuss application monitoring.

Why we implement application monitors
By default, PowerHA in conjunction with RSCT monitors the network infrastructure. Application health, beyond the availability of the network used by clients to get to the application, is not monitored. This fact makes configuring application monitors in PowerHA an important consideration.

In addition, the introduction of the Unmanaged Resource Groups option while stopping cluster services (which leaves the applications running without cluster services) makes application monitors a crucial factor in maintaining application availability.

When cluster services are restarted to begin managing the resource groups again, the process of acquiring resources will check each resource to determine if it is online. If it is running, acquiring that resource will be skipped.

For an application, this check is done using an application monitor before the application server start script is run. The application monitor’s returned status determines whether or not the application server start script will be run.

But what if no application monitor is defined? In that case, the cluster manager runs the application server start script. This could cause problems for applications that cannot deal with another instance being started, for example, if the start script is run again when the application is already running.

Configuring application monitors
There are two types of application monitors that can be configured with PowerHA, process monitors and custom monitors:

� Process monitors detect the termination of one or more processes of an application using RSCT Resource Monitoring and Control (RMC). Any process that appears in the output of ps -el can be monitored using a PowerHA process monitor.

� Custom monitors check the health of an application with a user-written custom monitor method at user-specified polling intervals. This gives the administrator the freedom to check for anything that can be defined as a determining factor in an application’s health. It could be a check for the ability to login to an application, to open a database, to write a dummy record, to query an application’s internal state, and so on. A return code from the user-written monitor of zero (0) indicates that application is healthy, no further action is taken. A non-zero return code indicates that the application is not healthy and recovery actions are to take place.

For each PowerHA application server configured in the cluster, you can configure up to 128 different application monitors, but the total number of application monitors in a cluster cannot exceed 128.

Application monitors can be configured to run in different modes:

� Long-running mode
� Startup mode
� Both modes

In long-running mode, the monitor periodically checks that the application is running successfully. The checking frequency is set via the Monitor Interval. The checking begins after the stabilization interval expires, the Resource Group that owns the application server is marked online, and the cluster has stabilized.

In startup mode, PowerHA checks the process (or calls the custom monitor), at an interval equal to 1/20th of the stabilization interval of the startup monitor. The monitoring continues until:

� The process is up.
� The custom monitor returns a 0.
� The stabilization interval expires.

If successful, the resource group is put into the online state, otherwise the cleanup method is invoked. In both modes, the monitor checks for the successful startup of the application server and periodically checks that the application is running successfully.
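Because the startup polling interval is 1/20th of the stabilization interval, the polling frequency follows directly from the value you configure; a quick ksh sketch using the 120-second value from the example that follows:

# Startup monitor polling interval = stabilization interval / 20
STABILIZATION_INTERVAL=120
print $(( STABILIZATION_INTERVAL / 20 ))   # polls every 6 seconds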

To configure an application monitor using SMIT, run:

1. smitty hacmp → Extended Resource Configuration → Extended Resource Configuration → HACMP Extended Resources Configuration → Configure HACMP Applications → Configure HACMP Application Monitoring,

2. Then select between the Configure Process Application Monitors and Configure Custom Application Monitors menus.

Process application monitoring
The process application monitoring facility uses RMC and therefore does not require any custom script. It detects only the application process termination and does not detect any other malfunction of the application.

When PowerHA finds that the monitored application process (or processes) are terminated, it tries to restart the application on the current node until a specified retry count is exhausted.

To add a new process application monitor using SMIT, run:

� smitty hacmp → Extended Resource Configuration → Extended Resource Configuration → HACMP Extended Resources Configuration → Configure HACMP Applications → Configure HACMP Application Monitoring → Configure Process Application Monitors → Add a Process Application Monitor

� You can also access it by using fast-path smitty cm_cfg_custom_appmon.

Tip: The SMIT fast-path for application monitor configuration is smitty cm_cfg_appmon.

Figure 7-32 shows the SMIT panel with field entries for configuring an example process application monitor.

                     Add a Process Application Monitor

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Monitor Name                                 [APP1_monitor]
* Application Server(s) to Monitor              APP1                  +
* Monitor Mode                                 [Long-running monito>  +
* Processes to Monitor                         [app1d app1testd]
* Process Owner                                [root]
  Instance Count                               [1]                    #
* Stabilization Interval                       [120]                  #
* Restart Count                                [5]                    #
  Restart Interval                             [600]                  #
* Action on Application Failure                [notify]               +
  Notify Method                                [/usr/local/App1Mon.sh>
  Cleanup Method                               [/usr/local/App1Stop]
  Restart Method                               [/usr/local/App1Start]

F1=Help      F2=Refresh     F3=Cancel     F4=List
F5=Reset     F6=Command     F7=Edit       F8=Image
F9=Shell     F10=Exit       Enter=Do

Figure 7-32 Adding process application monitor SMIT panel

The application monitor is called APP1_monitor and is configured to monitor the APP1 application server. The default monitor mode, Long-running monitoring, was chosen. A Stabilization Interval of 120 seconds was chosen in our example.

The application processes being monitored are app1d and app1testd. These must be present in the output of a ps -el command when the application is running to be monitored via a process monitor. They are owned by root, and only one instance of each is expected to be running, as determined by the Process Owner and Instance Count values.
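Before defining such a monitor, it can be useful to confirm that the process names match what actually appears in the process table. A minimal check, using the process names from this example:

# The monitored processes must appear in ps -el output while the
# application is running
ps -el | grep -E "app1d|app1testd" | grep -v grep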

Note: The stabilization interval is one of the most critical values in the monitor configuration. It must be set to a value that is determined to be long enough that if it expires, the application has definitely failed to start. If the application is in the process of a successful start and the stabilization interval expires, cleanup will be attempted and the resource group will be placed into ERROR state. The consequences of the cleanup process will vary by application and the method might provide undesirable results.

If the application fails, the Restart Method is run to recover the application. If the application fails to recover to a running state after the number of restart attempts exceeds the Restart Count, the Action on Application Failure is taken. The action can be notify or fallover. If notify is selected, no further action is taken after running the Notify Method. If fallover is selected, the resource group containing the monitored application moves to the next available node in the resource group.

The Cleanup Method and Restart Method define the scripts for stopping and restarting the application after failure is detected. The default values are the start and stop scripts as defined in the application server configuration.

Custom application monitoring
A custom application monitor offers another way to monitor application availability by using custom scripts, which can, for example, simulate client access to the services provided by the application. Based on the exit code of this script, the monitor establishes whether the application is available. If the script exits with return code 0, the application is available. Any other return code means that the application is not available.

To add a new custom application monitor using SMIT, run:

� smitty hacmp → Extended Resource Configuration → Extended Resource Configuration → HACMP Extended Resources Configuration → Configure HACMP Applications → Configure HACMP Application Monitoring → Configure Custom Application Monitors → Add a Custom Application Monitor

� You can also access it by using fast-path smitty cm_cfg_custom_appmon.

The SMIT panel and its entries for adding this method into the cluster configuration are similar to the process application monitor add SMIT panel, as shown in Figure 7-32 on page 443.

The only fields that differ in the custom application monitor SMIT menu are as follows:

Monitor Method Defines the full path name of the script that checks the application status. If the application is a database, for example, this script could connect to the database and run an SQL SELECT statement against a specific table. If the result of the SELECT statement is correct, the database is working normally.

Monitor Interval Defines the time (in seconds) between each occurrence of Monitor Method being run.

Hung Monitor Signal Defines the signal that is sent to stop the Monitor Method if it doesn't return within Monitor Interval seconds. The default action is SIGKILL(9).
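The following is a minimal sketch of what such a Monitor Method script might look like, written in ksh. The script name, the process name (app1d, reused from the earlier process monitor example), and the port number are illustrative assumptions, not values supplied by PowerHA:

#!/usr/bin/ksh
# Hypothetical custom monitor method, for example /usr/local/App1Check.sh
# Exit 0 if the application is healthy; any non-zero exit code tells
# PowerHA that the application has failed.

PORT=5500                                 # assumed application listener port

# Check 1: the application daemon is present in the process table
ps -el | grep -w app1d | grep -v grep > /dev/null 2>&1 || exit 1

# Check 2: the application is listening on its service port
netstat -an | grep "\.$PORT " | grep -i listen > /dev/null 2>&1 || exit 2

exit 0                                    # both checks passed - healthy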

How application monitors work
There are two aspects of Application Monitors to consider:

Application Monitor Initial Status processing
     How the status is determined, and in what order Application Monitors are considered for initial application status checking.

Application Monitor General processing
     The relationship and processing of Long-running versus Startup monitors.

Application Monitor Initial Status processing
As a protection mechanism, prior to invoking the application server start script, the cluster manager uses an application monitor to determine the status of the application. This processing is done for each application server and is as follows:

� If there is no application monitor defined for the application server or if the application monitor returns a failure status (RC!=0 for custom monitors, processes not running via RMC for process monitors), the Application Server start script is invoked.

� If the application monitor returns a success status (RC=0 for custom monitors, processes running via RMC for process monitors), the application server start script is not run.

If only one application monitor is defined for an application server, the process is as simple as stated previously.

If more than one application monitor is defined, the selection priority is based on the Monitor Type (custom or process) and the Invocation (both, long-running or startup). The ranking of the combinations of these two monitor characteristics is as follows:

� Both, Process
� Long-running, Process
� Both, Custom
� Long-running, Custom
� Startup, Process
� Startup, Custom

The highest priority application monitor found is used to test the state of the application. When creating multiple application monitors for an application, be sure that your highest ranking monitor according to the foregoing list returns a status that can be used by the cluster manager to decide whether to invoke the application server start script.

When more than one application monitor meets the criteria as the highest ranking, the sort order is not defined (an internal qsort is used); in practice, however, it consistently produces the same result.

Fortunately, there is a way to test which monitor will be used. The routine that is used by the cluster manager to determine the highest ranking monitor is: /usr/es/sbin/cluster/utilities/cl_app_startup_monitor.

An example of using this utility for the Application Server called testmonApp which has three monitors configured is as follows:

/usr/es/sbin/cluster/utilities/cl_app_startup_monitor -s testmonApp -a

The output for this command, shown in Example 7-21, shows three monitors:

� Mon: Custom, Long-running
� bothuser_testmon: Both, Custom
� longproctestmon: Process, Long-running

Example 7-21 Application monitor startup monitor

application = [testmonApp]
   monitor_name           = [Mon]
   resourceGroup          = [NULL]
   MONITOR_TYPE           = [user]
   PROCESSES              = [NULL]
   PROCESS_OWNER          = [NULL]
   MONITOR_METHOD         = [/tmp/longR]
   INSTANCE_COUNT         = [NULL]
   MONITOR_INTERVAL       = [60]
   HUNG_MONITOR_SIGNAL    = [9]
   STABILIZATION_INTERVAL = [60]
   INVOCATION             = [longrunning]

application = [testmonApp]
   monitor_name           = [bothuser_testmon]
   resourceGroup          = [NULL]
   MONITOR_TYPE           = [user]
   PROCESSES              = [NULL]
   PROCESS_OWNER          = [NULL]
   MONITOR_METHOD         = [/tmp/Bothtest]
   INSTANCE_COUNT         = [NULL]
   MONITOR_INTERVAL       = [10]
   HUNG_MONITOR_SIGNAL    = [9]
   STABILIZATION_INTERVAL = [20]
   INVOCATION             = [both]

application = [testmonApp]
   monitor_name           = [longproctestmon]
   resourceGroup          = [NULL]
   MONITOR_TYPE           = [process]
   PROCESSES              = [httpd]
   PROCESS_OWNER          = [root]
   MONITOR_METHOD         = [NULL]
   INSTANCE_COUNT         = [4]
   MONITOR_INTERVAL       = [NULL]
   HUNG_MONITOR_SIGNAL    = [9]
   STABILIZATION_INTERVAL = [60]
   INVOCATION             = [longrunning]

Monitor [longproctestmon] is detecting whether the application is running
queryProcessState - Called.
Arguments: selectString=[ProgramName == "httpd" && Filter == "ruser == \"root\"" ] expression=[Processes.CurPidCount < 4]
Application monitor [longproctestmon] exited with code (17)
Application monitor [longproctestmon] exited with code (17) - returning success

In the example, there are three monitors that can be used for initial status checking. The highest ranking is the long-running process monitor, longproctestmon. Recall that the Monitor Type for custom monitors is user.

Application Monitor General processing
An initial application status check is performed using one of the monitors. See the previous description for details.

If necessary, the application server start script is invoked. Simultaneously, all startup monitors are invoked. Only when all the startup monitors indicate that the application has started, by returning successful status, is the application considered online (and can lead to the resource group going to the ONLINE state).

The stabilization interval is the timeout period for the startup monitor. If the startup monitor fails to return a successful status, the application's resource group will go to the ERROR state.

After the startup monitor returns a successful status, there is a short time where the resource group state transitions to ONLINE, usually from ACQUIRING.

Note: A startup monitor will be used for initial application status checking only if no long-running (or both) monitor is found.

For each long-running monitor, the stabilization interval is allowed to elapse and then the long-running monitor is invoked. The long-running monitor continues to run until a problem is encountered with the application.

If the long-running monitor returns a failure status, the retry count is examined. If it is non-zero, it is decremented, the Cleanup Method is invoked and then the Restart Method is invoked. If the retry count is zero, the cluster manager will process either a fallover event or a notify event. This is determined by the Action on Application Failure setting for the monitor.

After the Restart Interval expires, the retry count is reset to the configured value.

Application Monitor examples
The initial settings for application monitors used in the example are as follows:

App mon name:      Mon
Invocation:        longrunning
Monitor method:    /tmp/longR
Monitor interval:  60
Stab Interval:     60
Restart count:     3
Restart method:    /tmp/Start (the application server start script)
Restart Interval:  396 (default)

App mon name:      startup
Invocation:        startup
Monitor method:    /tmp/start-up
Monitor interval:  20
Stab Interval:     20
Restart count:     3
Restart method:    /tmp/Start (the application server start script)
Restart Interval:  132 (default)

Application server start, stop, and monitor script functions
The restart method /tmp/Start sleeps for 10 seconds, then starts the “real” application (/tmp/App simulates a real application and is called in the background). /tmp/App writes the status of the monitored RG to the log file every second for 20 seconds, then sleeps for the balance of 10 minutes.

/tmp/longR and /tmp/start-up do the same thing. They both check for /tmp/App in the process table. They exit RC=0 if found and RC=1 if not found.

/tmp/Stop finds and kills /tmp/App.

Scenario 1: Normal (no error) application start
Logging was done to a common log and is shown here:

/tmp/longR     12:24:47 App RC is 1
/tmp/Start     12:24:48 App start script invoked
/tmp/start-up  12:24:48 App RC is 1
/tmp/start-up  12:24:49 App RC is 1
/tmp/start-up  12:24:50 App RC is 1
/tmp/start-up  12:24:51 App RC is 1
/tmp/start-up  12:24:52 App RC is 1
/tmp/start-up  12:24:53 App RC is 1
/tmp/start-up  12:24:54 App RC is 1
/tmp/start-up  12:24:55 App RC is 1
/tmp/start-up  12:24:56 App RC is 1
/tmp/start-up  12:24:57 App RC is 1
/tmp/App       12:24:58 testmon ACQUIRING
/tmp/start-up  12:24:58 App RC is 0
/tmp/App       12:24:59 testmon ACQUIRING
/tmp/App       12:25:00 testmon ONLINE
/tmp/longR     12:26:00 App RC is 0
/tmp/longR     12:27:00 App RC is 0

What happened in Scenario 1
The following results were part of Scenario 1:

� The long-running monitor (/tmp/longR) was invoked, returning RC=1.

� The application server start script (/tmp/Start) and startup monitor (/tmp/start-up) were invoked one second later.

� The startup monitor returns RC=1, iterating every second (1/20 of the 20 second stabilization interval).

� After the programmed 10 second sleep, the start script launches the “real” application, /tmp/App.

� The startup monitor finds /tmp/App running and returns 0.

� Five seconds later, PowerHA marks the resource group online (other tests showed less than 5 seconds, but not consistently). At 60-second intervals after the RG is marked ONLINE, the longR monitor is invoked, returning RC=0.

Scenario 1 conclusions
We drew the following major conclusions from scenario 1:

� A long-running monitor is used to perform initial application status check over the Startup monitor.

� The startup monitor is invoked at the same time as the Start script and polls at 1/20 of the Stabilization Interval of the startup monitor (not that of the long-running monitor).


� A delay of 2-5 seconds exists between the time the startup monitor returns RC=0 and the RG is marked ONLINE.

� The long-running monitor is invoked after a delay of exactly the Stabilization Interval.

Scenario 2

In this scenario, the application fails to launch within the Stabilization Interval (that is, the startup monitor condition is never met).

Common logging is shown in the following listing:

/tmp/longR    11:32:52 App RC is 1
/tmp/Start    11:32:53 App start script invoked
/tmp/start-up 11:32:53 App RC is 1
/tmp/start-up 11:32:54 App RC is 1
/tmp/start-up 11:32:55 App RC is 1
/tmp/start-up 11:32:56 App RC is 1
/tmp/start-up 11:32:57 App RC is 1
/tmp/start-up 11:32:58 App RC is 1
/tmp/start-up 11:32:59 App RC is 1
/tmp/start-up 11:33:00 App RC is 1
/tmp/start-up 11:33:01 App RC is 1
/tmp/start-up 11:33:02 App RC is 1
/tmp/start-up 11:33:03 App RC is 1
/tmp/start-up 11:33:04 App RC is 1
/tmp/start-up 11:33:05 App RC is 1
/tmp/start-up 11:33:06 App RC is 1
/tmp/start-up 11:33:07 App RC is 1
/tmp/start-up 11:33:08 App RC is 1
/tmp/start-up 11:33:09 App RC is 1
/tmp/start-up 11:33:10 App RC is 1
/tmp/start-up 11:33:11 App RC is 1
/tmp/start-up 11:33:12 App RC is 1
/tmp/Stop     11:33:16 App stopped

What happened in Scenario 2

The following events took place in scenario 2:

� The long-running monitor (/tmp/longR) is invoked and returns RC=1.

� The application server start script (/tmp/Start) and the startup monitor (/tmp/start-up) are invoked one second later.

� The startup monitor returns RC=1, iterating every second.

� After the 20 second startup monitor stabilization interval, the Cleanup Method (/tmp/Stop, which is also the application server stop script) is invoked.

� The RG goes into ERROR state.


To see what happened in more detail, the failure as logged in /var/hacmp/log/hacmp.out (on the final RC=1 from the start-up monitor) is shown here:

…
+testmon:start_server[start_and_monitor_server+102] RETURN_STATUS=1
+testmon:start_server[start_and_monitor_server+103] : exit status of cl_app_startup_monitor is: 1
+testmon:start_server[start_and_monitor_server+103] [[ 1 != 0 ]]
+testmon:start_server[start_and_monitor_server+103] [[ false = true ]]
+testmon:start_server[start_and_monitor_server+109] cl_RMupdate resource_error testmonApp start_server
2009-03-11T11:33:13.358195
2009-03-11T11:33:13.410297
Reference string: Wed.Mar.11.11:33:13.EDT.2009.start_server.testmonApp.testmon.ref
+testmon:start_server[start_and_monitor_server+110] echo ERROR: Application Startup did not succeed.
ERROR: Application Startup did not succeed.
+testmon:start_server[start_and_monitor_server+114] echo testmonApp 1
+testmon:start_server[start_and_monitor_server+114] 1>> /var/hacmp/log/.start_server.700610
+testmon:start_server[start_and_monitor_server+116] return 1
+testmon:start_server[+258] awk { if ($2 == 0) { exit 1 } }
+testmon:start_server[+258] cat /var/hacmp/log/.start_server.700610
+testmon:start_server[+264] SUCCESS=0
+testmon:start_server[+266] [[ REAL = EMUL ]]
+testmon:start_server[+266] [[ 0 = 1 ]]
+testmon:start_server[+284] awk { if ($2 == 1) { exit 1 } if ($2 == 11) { exit 11 }
}
+testmon:start_server[+284] cat /var/hacmp/log/.start_server.700610
+testmon:start_server[+293] SUCCESS=1
+testmon:start_server[+295] [[ 1 = 0 ]]
+testmon:start_server[+299] exit 1
Mar 11 11:33:13 EVENT FAILED: 1: start_server testmonApp 1

+testmon:node_up_local_complete[+148] RC=1
+testmon:node_up_local_complete[+149] : exit status of start_server testmonApp is: 1


Scenario 2 conclusions

We drew the following major conclusions from this scenario:

� Startup monitor failing to find the application active within the Stabilization Interval results in two important things:

– The Resource Group goes to ERROR state (requiring manual intervention).

– The Cleanup Method is run to stop any processes that might have started.

� The Cleanup Method should be coded so that it verifies that the cleanup succeeded; otherwise, remnants of the failed start will remain and can hinder a restart.

� If the Stabilization Interval is too short, the application start process is cut short. This stopped-while-starting situation can confuse the application start and stop scripts, and can confuse you during any debugging effort.

Suspending/resuming application monitoring

After you configure an application monitor, it is started automatically as part of the acquisition of the application server. If needed, it can be suspended while the application server is still online in PowerHA. This can be done through the SMIT C-SPOC menus, by running:

� smitty cl_admin → HACMP Resource Group and Application Management → Suspend/Resume Application Monitoring → Suspend Application Monitoring, and then select the application server associated with the monitor that you want to suspend.

Use the same SMIT path to resume the application monitor. The output of resuming the application monitor associated with the application server APP1 is shown in Example 7-22.

Example 7-22 Output from resuming application monitor

Jul 6 2005 18:00:17 cl_RMupdate: Completed request to resume monitor(s) for application APP1.
Jul 6 2005 18:00:17 cl_RMupdate: The following monitor(s) are in use for application APP1: test


7.7.8 Measuring application availability

You can use the application availability analysis tool to measure the amount of time that your highly available applications are available. The PowerHA software collects and logs the following information in time-stamped format:

� An application starts, stops, or fails.

� A node fails, shuts down, or comes online, as well as if cluster services are started or shut down.

� A resource group is taken offline or moved.

� Application monitoring is suspended or resumed.

Based on the information collected by the application availability analysis tool, you can select a measurement period, and the tool displays uptime and downtime statistics for a specific application during that period. Using SMIT, you can display:

� The percentage of uptime.
� The amount of uptime.
� The longest period of uptime.
� The percentage of downtime.
� The amount of downtime.
� The longest period of downtime.
� The percentage of time application monitoring was suspended.

The application availability analysis tool reports application availability from the PowerHA cluster perspective. It can analyze only those applications which have been properly configured in the cluster configuration.

This tool shows only the statistics that reflect the availability of the PowerHA application server, resource group, and application monitor (if configured). It cannot measure an internal failure in the application that is noticeable to the end user if that failure is not detected by the application monitor.

Using the Application Availability Analysis tool

You can use the application availability analysis tool immediately after you define the application servers. The tool does not need any additional customization, and it automatically collects statistics for all application servers configured in PowerHA.

You can display the statistics for a specific application, as generated by the Application Availability Analysis tool, through the SMIT menus by running:

smitty hacmp → System Management (C-SPOC) → Resource Group and Application Management → Application Availability Analysis


Figure 7-33 shows the SMIT panel displayed for the application availability analysis tool in our test cluster.

smitty cl_app_AAA.dialog

Figure 7-33 Adding Application Availability Analysis SMIT panel

In the SMIT menu of the application availability analysis tool, you just need to select an application server, enter the start and stop times for the statistics, and run the tool. Example 7-23 shows the application availability analysis tool output from our test cluster.

Application Availability Analysis

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Select an Application                              [App1]            +
* Begin analysis on YEAR (1970-2038)                 [2009]            #
* MONTH (01-12)                                      [03]              #
* DAY (1-31)                                         [21]              #
* Begin analysis at HOUR (00-23)                     [16]              #
* MINUTES (00-59)                                    [22]              #
* SECONDS (00-59)                                    [00]              #
* End analysis on YEAR (1970-2038)                   [2009]            #
* MONTH (01-12)                                      [03]              #
* DAY (1-31)                                         [31]              #
* End analysis at HOUR (00-23)                       [18]              #
* MINUTES (00-59)                                    [52]              #
* SECONDS (00-59)                                    [22]              #

F1=Help            F2=Refresh          F3=Cancel           F4=List
F5=Reset           F6=Command          F7=Edit             F8=Image
F9=Shell           F10=Exit            Enter=Do


Example 7-23 Application availability analysis tool output

Analysis begins:                 Saturday, 21-March-2009, 16:20
Analysis ends:                   Saturday, 21-March-2009, 17:42
Application analyzed:            APP1
Total time:                      0 days, 1 hours, 22 minutes, 0 seconds
Uptime:
    Amount:                      0 days, 1 hours, 16 minutes, 51 seconds
    Percentage:                  93.72%
    Longest period:              0 days, 1 hours, 10 minutes, 35 seconds
Downtime:
    Amount:                      0 days, 0 hours, 5 minutes, 9 seconds
    Percentage:                  6.28%
    Longest period:              0 days, 0 hours, 5 minutes, 9 seconds

Log records terminated before the specified ending time was reached.
Application monitoring was suspended for 75.87% of the time period analyzed.
Application monitoring state was manually changed during the time period analyzed.
Cluster services were manually restarted during the time period analyzed.


Chapter 8. Cluster security

In this chapter, we describe the PowerHA security features and show you how cluster security can be enhanced.

We cover the following topics:

� Cluster security and the clcomd daemon
� Using encrypted inter-node communication
� Secure remote command execution
� WebSMIT security
� PowerHA and firewalls
� RSCT security



8.1 Cluster security and the clcomd daemon

The Cluster Communication Daemon (clcomd) is a robust, secure transport layer for PowerHA. clcomd manages almost all inter-node cluster communication, such as CSPOC, cluster verification, synchronization, and file collections. clcomd also enhances cluster performance: it has a cache for the HACMP ODM files for faster delivery, and it can reuse its existing socket connections.

The cluster manager daemon and the heartbeating mechanism are based on RSCT (see 8.6, “RSCT security” on page 474), while clinfo uses SNMP protocol.

Since HACMP Version 5.1, the .rhosts file and the use of rsh are no longer required. PowerHA security is based on connection authentication using the “least privilege” principle. The Cluster Communication daemon uses an internal access list to authorize other nodes to run commands remotely. When a remote execution request arrives at a node, clcomd checks the incoming connection's IP address against the IP addresses found in the following locations:

1. HACMPnode ODM class
2. HACMPadapter ODM class
3. /usr/es/sbin/cluster/etc/rhosts file

The Cluster Communication daemon uses the principle of “least privilege.” When a command is run remotely, only the minimum required privileges are granted. This ensures that only trusted PowerHA commands can be run remotely, and that they run as user nobody whenever possible. The cluster utilities are divided into two groups:

� Trusted commands that are allowed to run as root
� Other commands that run as user nobody

8.1.1 The /usr/es/sbin/cluster/etc/rhosts file

The /usr/es/sbin/cluster/etc/rhosts file should contain a list of all IP addresses of all cluster nodes for enhanced security. The node connection information (the nodes’ base addresses) is added to this file during PowerHA discovery and during cluster verification and synchronization. You can also manually edit the /usr/es/sbin/cluster/etc/rhosts file to include the nodes’ IP addresses.

Note: You cannot use clcomd-based authentication for your own scripts (application start and stop, custom events, and so on); for those, you still have to rely on rsh or SSH.


Initial cluster setup

During initial cluster setup, the /usr/es/sbin/cluster/etc/rhosts file is empty. Normally, PowerHA puts the peer nodes’ base addresses there during the first discovery and cluster synchronization. For a secure initial configuration, we suggest that you add your cluster’s IP addresses to the /usr/es/sbin/cluster/etc/rhosts file on all nodes before you start the PowerHA configuration.
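For example, run the following on each node; the IP addresses shown are placeholders for your own nodes' base addresses, and the ownership and permissions (root:system, 0600) match the requirements for this file:

cat >> /usr/es/sbin/cluster/etc/rhosts <<EOF
10.10.30.4
10.10.30.8
EOF
chown root:system /usr/es/sbin/cluster/etc/rhosts
chmod 600 /usr/es/sbin/cluster/etc/rhosts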

8.1.2 Disabling the Cluster Communication daemon

You can disable clcomd for higher security. Notice that without clcomd, the following functions cannot work:

� PowerHA verification and synchronization
� CSPOC, including LVM, user and password management
� File collections
� Message authentication and encryption

You can disable the Cluster Communication daemon by stopping clcomd with the stopsrc -s clcomdES command. To restart clcomd, enter the startsrc -s clcomdES command.
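For example, because clcomd runs under SRC control, you can check and change its state as follows:

lssrc -s clcomdES        # display the current state of the daemon
stopsrc -s clcomdES      # stop cluster communication (disables the functions listed above)
startsrc -s clcomdES     # start cluster communication again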

8.1.3 Additional cluster security features

The HACMP ODM files are stored in the /etc/es/objrepos directory. To improve security, their owner is root and their group is hacmp. Their permissions are 0640, except for HACMPdisksubsystem, which is 0600. All cluster utilities intended for public use have the hacmp setgid bit turned on so that they can read the HACMP ODM files. The hacmp group is created during PowerHA installation, if it is not already there.
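A quick way to check these settings on a node is shown in the following sketch (the exact list of files depends on your PowerHA level):

ls -l /etc/es/objrepos/HACMP*    # expect owner root, group hacmp, mode 640
                                 # (HACMPdisksubsystem should be mode 600)
lsgroup hacmp                    # confirm that the hacmp group exists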

Important: The /usr/es/sbin/cluster/etc/rhosts file should have the following permissions:

� owner: root
� group: system
� permissions: 0600

Note: During the initial setup, while the /usr/es/sbin/cluster/etc/rhosts file is still empty on a PowerHA node, it is possible that an unwelcome host connects first and puts its IP address into the rhosts file. In this case, the other peer nodes cannot connect until the /usr/es/sbin/cluster/etc/rhosts file is manually corrected.


8.1.4 Cluster communication over VPN

You can set up PowerHA to use VPN connections for inter-node communication. VPN support is provided by the IP security features available at the AIX operating system level. To use VPN tunnels, persistent IP addresses must be configured on each cluster node. RSCT can also be configured to use VPN tunnels.

8.2 Using encrypted inter-node communication

PowerHA supports authentication and encryption of messages exchanged between cluster nodes. All Cluster Manager, Cluster Communication daemon, and RSCT traffic can be encrypted. By default, authentication and encryption of messages are disabled.

The authentication service is based on Message Digest version 5 (MD5). The encryption is based on a symmetric key scheme provided by Cluster Security Services (CtSec). CtSec is part of RSCT; see also 8.6, “RSCT security” on page 474. The following RSCT encryption modules can be used:

� Data Encryption Standard (md5_des)
� Triple DES (md5_3des)
� Advanced Encryption Standard (md5_aes)

Encryption puts an additional load on the CPUs, but fallover time is not affected.

8.2.1 Encryption key management

The RSCT encryption modules use symmetric keys. All cluster nodes use the same key for encryption and decryption. On each node, the security keys are located in the /usr/es/sbin/cluster/etc directory. The file name reflects the selected encryption method:

� key_md5_des
� key_md5_3des
� key_md5_aes

You should generate a key when you first set up message encryption, when you modify the cluster security configuration, or when your security policy requires it. A key can be used for as long as you like.

PowerHA provides the means to distribute the keys among the nodes. The keys can be distributed among the cluster hosts automatically after they are generated. This method is very convenient, but it is only as secure as the network over which the keys are distributed. Alternatively, you can manually distribute the keys to all other nodes.


8.2.2 Setting up message authentication and encryption

In this section we describe how to configure message authentication and encryption as follows:

1. Install the RSCT encryption filesets.

2. Enable automatic distribution of the keys (automatic key distribution only).

3. Enable message authentication and encryption.

4. Generate and distribute the key.

5. Activate the key.

6. Synchronize the cluster.

7. Disable automatic distribution of the keys (automatic key distribution only).

For security reasons, we suggest that you use manual key distribution.

All steps should be performed from the same cluster node. Additionally, step 2 and step 7 should be run on all nodes.

1. Installing the RSCT encryption filesets.

You should install the RSCT encryption filesets from the AIX Expansion Pack CD.

– rsct.crypt.des for using DES
– rsct.crypt.3des for using Triple DES
– rsct.crypt.aes256 for using AES
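For example, to install and then verify the AES module (the media device name /dev/cd0 is a placeholder for your Expansion Pack media or install directory):

installp -agXd /dev/cd0 rsct.crypt.aes256    # install the AES encryption module
lslpp -l "rsct.crypt.*"                      # list the encryption filesets that are installed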

Note: Check that the key file’s owner is root, the group is system, and the permissions are 0400. If somebody can copy your key or intercept it over the network, cluster security could be compromised.

Important:

� Ensure that the authentication and encryption settings are consistent across the cluster. All nodes should use identical key files. Otherwise the PowerHA nodes cannot communicate.

� Do not perform any other PowerHA configuration task while setting up the encrypted cluster messaging.

� Be sure that your cluster is functioning properly, that it was recently synchronized and verified, and that the nodes can communicate with each other.


If your cluster is already running, you have to restart clcomd to enable PowerHA to use the filesets.

2. Enabling automatic distribution of the keys on all nodes.

Perform this step only if you want to use automatic key distribution:

a. Go to SMIT HACMP Cluster Security: start smitty.

– Select Communications Applications and Services.
– Select HACMP for AIX.
– Select System Management (C-SPOC).
– Select HACMP Security and Users Management.
– Select HACMP Cluster Security.

Or, use the smit cm_config_security fast-path.

b. Select Configure Message Authentication Mode and Key Management.

c. Select Enable/Disable Automatic Key Distribution.

d. Change Enable/Disable Key Distribution to Enabled. See Figure 8-1.

e. Press Enter to confirm.

Figure 8-1 Enable automatic key distribution

Enable/Disable Automatic Key Distribution

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Enable/Disable Key Distribution                    Enabled           +

  +-----------------------------------------------------------------------+
  ¦                    Enable/Disable Key Distribution                     ¦
  ¦                                                                         ¦
  ¦ Move cursor to desired item and press Enter.                            ¦
  ¦                                                                         ¦
  ¦   Enabled                                                               ¦
  ¦   Disabled                                                              ¦
  ¦                                                                         ¦
  ¦ F1=Help                 F2=Refresh                F3=Cancel             ¦
F1¦ F8=Image                F10=Exit                  Enter=Do              ¦
F5¦ /=Find                  n=Find Next                                     ¦
F9+-----------------------------------------------------------------------+


When key distribution is enabled, you should see a message similar to the one shown in Example 8-1. Repeat this step on all nodes.

Example 8-1 Successfully enabling key distribution

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

0513-077 Subsystem has been changed.
0513-044 The clcomdES Subsystem was requested to stop.
0513-059 The clcomdES Subsystem has been started. Subsystem PID is 303162.
The key distribution was Enabled

F1=Help            F2=Refresh          F3=Cancel           F6=Command
F8=Image           F9=Shell            F10=Exit            /=Find
n=Find Next

When key distribution is enabled, you should see in clcomd.log a message similar to:

2009-03-21T20:50:08.405562: The key distribution is Enabled

Repeat this step on all nodes.

3. Enabling message authentication and encryption:

a. Go to SMIT HACMP Cluster Security by using the smitty cm_config_security fast path.

b. Select Configure Message Authentication Mode and Key Management.

c. Select Configure Message Authentication Mode.

d. Press F4 and select the Message Authentication Mode that you want to use (see Figure 8-2 on page 464):

• md5_des
• md5_3des
• md5_aes
• None: Neither message authentication nor encryption is used.

e. Set Enable Encryption to Yes.

f. Press Enter.


Figure 8-2 Configure message authentication mode

When message authentication and encryption are enabled, you should see in clcomd.log a message similar to this:

2009-03-22T00:00:01.028738: Message Secmode = md5_aes

4. Generating and distributing the key:

a. Go to SMIT HACMP Cluster Security using the smitty cm_config_security fast path.

b. Select Configure Message Authentication Mode and Key Management.

c. Select Generate/Distribute a Key.

d. Press F4 and select the Type of Key to Generate. This should be the same as what you selected in Step 3., “Enable message authentication and encryption.” on page 461. See the SMIT panel shown in Figure 8-3.

e. Select Distribute a Key:

• Yes: if you use automatic key distribution.
• No: if you prefer manual key distribution.

f. Press Enter.

Configure Message Authentication Mode

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Message Authentication Mode                        md5_aes           +
* Enable Encryption                                  Yes               +

  +-----------------------------------------------------------------------+
  ¦                      Message Authentication Mode                       ¦
  ¦                                                                         ¦
  ¦ Move cursor to desired item and press Enter.                            ¦
  ¦                                                                         ¦
  ¦   md5_des                                                               ¦
  ¦   md5_3des                                                              ¦
  ¦   md5_aes                                                               ¦
  ¦   none                                                                  ¦
  ¦                                                                         ¦
  ¦ F1=Help                 F2=Refresh                F3=Cancel             ¦
F1¦ F8=Image                F10=Exit                  Enter=Do              ¦
F5¦ /=Find                  n=Find Next                                     ¦
F9+-----------------------------------------------------------------------+


Figure 8-3 Generate a key

When key generation and distribution have completed successfully, you should see a message similar to the one shown in Example 8-2.

Example 8-2 Successful key distribution

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

The key was distributed to node glvm1
The key was distributed to node glvm2

F1=Help            F2=Refresh          F3=Cancel           F6=Command
F8=Image           F9=Shell            F10=Exit            /=Find
n=Find Next

When the key was distributed on a cluster node, you should see in clcomd.log a message similar to:

2009-03-21T20:51:20.333236: KEYDISTR: ACCEPTED: glvm1: 10.10.30.4->10.10.30.8

If you selected automatic key distribution, then proceed to Step 5., “Activate the key.” on page 461.

Generate/Distribute a Key

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Type of Key to Generate                            md5_aes           +
* Distribute a Key                                   No                +

F1=Help            F2=Refresh          F3=Cancel           F4=List
F5=Reset           F6=Command          F7=Edit             F8=Image
F9=Shell           F10=Exit            Enter=Do


If you did not select automatic key distribution, manually copy the key to all other cluster nodes, and then proceed to Step 5., “Activate the key.” on page 461.
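A manual copy might look like the following sketch, assuming the md5_aes key and working SSH between the nodes (the node name glvm2 is from our test cluster; substitute your own):

scp -p /usr/es/sbin/cluster/etc/key_md5_aes glvm2:/usr/es/sbin/cluster/etc/key_md5_aes
ssh glvm2 'chown root:system /usr/es/sbin/cluster/etc/key_md5_aes; chmod 400 /usr/es/sbin/cluster/etc/key_md5_aes'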

5. Activating the key:

a. Go to SMIT HACMP Cluster Security by using the smitty cm_config_security fast path.

b. Select Configure Message Authentication Mode and Key Management.

c. Select Activate the new key on all HACMP cluster nodes.

d. Press Enter again to confirm.

When key has been successfully activated on all cluster nodes, you should see a message similar to the one shown in Example 8-3.

Example 8-3 Successful key activation

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

The key was activated on node glvm1
The key was activated on node glvm2

F1=Help            F2=Refresh          F3=Cancel           F6=Command
F8=Image           F9=Shell            F10=Exit            /=Find
n=Find Next

When the key is successfully activated, you should see in clcomd.log a message similar to:

2009-03-22T00:00:48.399989: Encryption is ON

6. Synchronizing the cluster: If you encounter any error related to cluster communication, then disable both message authentication and encryption and start over with this procedure.

Important: Do not activate the new key until all the cluster nodes have the same key file installed.


7. Disabling automatic distribution of the keys on all cluster nodes:

Perform this step only if you used automatic key distribution:

a. Go to SMIT HACMP Cluster Security using the smitty cm_config_security fast path.

b. Select Configure Message Authentication Mode and Key Management.

c. Select Enable/Disable Automatic Key Distribution.

d. Change Enable/Disable Key Distribution to Disabled.

e. Press Enter to confirm.

When key distribution is disabled, you should see a message similar to the one shown in Example 8-4. Repeat this step on all nodes.

Example 8-4 Successfully disabling key distribution

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

0513-077 Subsystem has been changed.
0513-044 The clcomdES Subsystem was requested to stop.
0513-059 The clcomdES Subsystem has been started. Subsystem PID is 303160.
The key distribution was Disabled

F1=Help            F2=Refresh          F3=Cancel           F6=Command
F8=Image           F9=Shell            F10=Exit            /=Find
n=Find Next

Important: Always disable automatic distribution of keys after you have successfully generated and distributed a new key. Otherwise, the cluster security could be compromised. It is a good security practice to periodically change encryption keys.


8.2.3 Troubleshooting message authentication and encryption

If you encounter any cluster communication errors (for example, cluster verification fails or CSPOC cannot communicate with other nodes), then turn off both message authentication and message encryption:

1. Go to SMIT HACMP Cluster Security using the smitty cm_config_security fast path.

2. Select Configure Message Authentication Mode and Key Management.

3. Select Configure Message Authentication Mode.

4. Set Message Authentication Mode to None.

5. Set Enable Encryption to No.

6. Press Enter.

7. Synchronize the cluster.

8.2.4 Checking the current message authentication settings

You can check the current message authentication and encryption settings as follows:

1. Enter the smitty hacmp fast path command.

2. Select Extended Configuration.

3. Select Extended Topology Configuration.

4. Select Show HACMP Topology.

5. Select Show Cluster Definition. Press Enter to see the cluster definition settings. See the SMIT panel shown in Figure 8-4.


Figure 8-4 Checking the message authentication settings

8.3 Secure remote command execution

The ability to run remote commands on cluster nodes is no longer a requirement for a PowerHA cluster. However, application start and stop scripts, customized cluster events, and other scripts might still require the capability of running commands on remote nodes.

The rsh command can still be used for this purpose, but it is not secure because of its .rhosts file-based authentication. Secure Shell is a very popular method for securing remote command execution in today’s networking environments.

We recommend using Secure Shell (SSH); DLPAR operations require SSH as well. SSH and Secure Sockets Layer (SSL) together provide authentication, confidentiality, and data integrity. The SSH authentication scheme is based on a public/private key infrastructure, while SSL encrypts the network traffic.

The following utilities can be used instead of the r-commands:

ssh Secure remote shell, similar to rsh or rlogin

scp Secure remote copy, similar to rcp

sftp Encrypted file transfer utility, similar to ftp
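For example, to set up password-less SSH for root between two nodes, a minimal sketch looks like the following; nodeB is a placeholder host name, and you should weigh the use of an empty passphrase against your security policy:

ssh-keygen -t rsa                                              # generate a key pair on the first node
cat ~/.ssh/id_rsa.pub | ssh nodeB 'cat >> ~/.ssh/authorized_keys'
ssh nodeB hostname                                             # should now work without a password prompt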

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

Cluster Name: two2four
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: md5_aes
Cluster Message Encryption: Enabled
Use Persistent Labels for Communication: No

F1=Help            F2=Refresh          F3=Cancel           F6=Command
F8=Image           F9=Shell            F10=Exit            /=Find
n=Find Next


8.4 WebSMIT security

WebSMIT is a very convenient tool that allows you to configure and manage multiple clusters. Using single sign-on capabilities, each registered WebSMIT user can log in and get secure access to multiple clusters.

The security of WebSMIT relies on the security of the underlying Web server being used. It is possible to use various combinations of security settings for IBM HTTP Server, Apache, the cluster nodes, and WebSMIT to enhance the security of your environment. The default settings provide the highest level of security in the default IHS or SSL-enabled Apache environments.

When customizing the security settings of your environment, we recommend that you take into account the following:

� Secure WebSMIT communication
� Require user authentication
� Restrict access to WebSMIT
� Restrict access to WebSMIT functions

8.4.1 Secure WebSMIT communication

You can secure WebSMIT communication by:

� Changing the default port
� Allowing only secure HTTP traffic
� Enforcing a session time-out

Changing the default port

WebSMIT can be configured to allow access only over a specified port by using the AUTHORIZED_PORT setting in the wsm_smit.conf file. The port used by default is 42267. The HTTP configuration file supplied with the WebSMIT installation is configured by default to allow secure HTTP access on this port. We suggest changing this setting to a different number to prevent any potential attack targeting this well-known port number.

Allowing only secure HTTP traffic

WebSMIT can use SSL and certificates. At installation time, regardless of the Web server being used, a self-signed certificate is generated. This allows all WebSMIT connections to be authenticated and the communication to be encrypted.

You can set REDIRECT_TO_HTTPS in the wsm_smit.conf file so that all users trying to connect to WebSMIT via an insecure connection will be automatically redirected to a secure HTTP connection.


Session timeout

The SESSION_TIMEOUT variable in the wsm_smit.conf file can be used to request user authentication after a specific period of inactivity. The default value is 20 minutes.

8.4.2 User authentication

You can configure user authentication by:

� Enforcing AIX operating system authentication
� Assigning each WebSMIT user active or read-only roles

Using AIX authentication

WebSMIT uses AIX operating system authentication by default. WebSMIT prompts for a valid AIX user ID and password. Because the AIX authentication mechanisms are in use, login failures can cause an account to be locked.

Accepted users

The ACCEPTED_USERS variable contains a list of AIX user IDs that can get access to WebSMIT. All users listed here have root-level authority in WebSMIT. The default is ACCEPTED_USERS=”root”.

Read-only users

The READ_ONLY_USERS variable contains a list of AIX user IDs who have read-only access to WebSMIT. Users listed here are able to display configuration information and view cluster status, but are not allowed to make any changes to the cluster.
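Pulling the settings from 8.4.1 and 8.4.2 together, a wsm_smit.conf excerpt might look similar to the following. The variable names come from the sections above; the values and the exact syntax are illustrative only, so check the comments in the wsm_smit.conf file shipped with WebSMIT:

AUTHORIZED_PORT=42999              # moved away from the well-known default of 42267
REDIRECT_TO_HTTPS=1                # redirect insecure connections to HTTPS
SESSION_TIMEOUT=10                 # minutes of inactivity before re-authentication
ACCEPTED_USERS="root websmitadm"   # users with full (root-level) WebSMIT authority
READ_ONLY_USERS="operator"         # users restricted to display and status functions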

8.4.3 Access to WebSMIT

You can grant only selected AIX users access to WebSMIT.

In order to perform its management function, WebSMIT requires access to PowerHA functions, which in turn require root permissions. A setuid program named wsm_cmd_exec is provided with WebSMIT to allow non-root users to run commands with root privileges on cluster nodes.

Tip: We strongly recommend that you create a separate AIX operating system user to be used only for WebSMIT access.


This could become a potential security exposure if access to this program is not controlled carefully. WebSMIT therefore allows an administrator to explicitly specify the list of users who are allowed to use the wsm_cmd_exec program.

8.4.4 Access to WebSMIT panels

You can configure which SMIT panels can be accessed through WebSMIT. The SMIT panel name is the SMIT fast path name for a given panel; you can get this name by pressing F8 in SMIT. For example, the panel name for the SMIT Extended Topology Configuration menu is “cm_extended_topology_config_menu_dmn”. See Figure 8-5.

Figure 8-5 SMIT Extended Topology Configuration fast path

The /usr/es/sbin/cluster/wsm/wsm_smit.allow file contains a list of SMIT panels to which access from WebSMIT is allowed. All other panels will be rejected.

In the /usr/es/sbin/cluster/wsm/wsm_smit.deny file, you can list the SMIT panels to which access from WebSMIT is not allowed.

Extended Topology Configuration

Move cursor to desired item and press Enter.

  Configure an HACMP Cluster
  Configure HACMP Nodes
  Configure HACMP Sites
  Configure HACMP Networks
  Configure HACMP Communication Interfaces/Devices
  Configure HACMP Persistent Node IP Label/Addresses
  Configure HACMP Global Networks
  Configure HACMP Network Modules
  Configure ...      +------------------------------------------------------+
  Show HACMP ...     ¦                     PRINT SCREEN                      ¦
                     ¦                                                       ¦
                     ¦ Press Enter to save the screen image                  ¦
                     ¦ in the log file.                                      ¦
                     ¦ Press Cancel to return to the application.            ¦
                     ¦                                                       ¦
                     ¦ Current fast path:                                    ¦
                     ¦ "cm_extended_topology_config_menu_dmn"                ¦
                     ¦                                                       ¦
                     ¦ F1=Help          F2=Refresh        F3=Cancel          ¦
  F1=Help            ¦ F8=Image         F10=Exit          Enter=Do           ¦
  F9=Shell           +------------------------------------------------------+


If a panel is listed in both the allow and deny files, then the deny setting takes precedence and access will not be allowed.
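As an illustration only, and assuming one fast path name per line (verify the expected format against the comments in the files themselves), denying the Extended Topology Configuration panel shown in Figure 8-5 could look like this:

# /usr/es/sbin/cluster/wsm/wsm_smit.deny
cm_extended_topology_config_menu_dmn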

8.5 PowerHA and firewalls

There are some considerations to take into account when placing a PowerHA cluster behind a firewall:

� PowerHA itself does not require any open port on the firewall; there is no outgoing traffic originating from clcomd, RSCT, or the Cluster Manager. You only need to open the ports required by your application and by system management tools (for example, SSH).

� Ensure that all service IP addresses can communicate with the outside network regardless of the interface to which they are bound. Take into consideration that during a network failure, a service IP address will move from one adapter to another. If a resource group is moved, the service address will move to another node.

� Do not place a firewall between the nodes. In a PowerHA/XD cluster, your nodes might connect through a public network. In this case, use a Virtual Private Network or another solution that is transparent to the Cluster Communication daemon.

� If you use the netmon.cf file for enhanced network failure detection, be sure that the IP addresses listed in this file can be reached (by ping) through your firewall.

� If you have clinfo clients coming through a firewall, then you should open the clinfo_client port: 6174/tcp.

� Be sure that your firewall solution is redundant, otherwise the firewall is a single point of failure.

� If there is a firewall between the WebSMIT gateway and the cluster, the clcomdES port must be open (see the clcomd entry in /etc/services for the port number).

Tip: After you finish your cluster configuration, deny all SMIT panels except the HACMP cluster status page.


8.6 RSCT security

In this section we describe the terminology and concepts used in RSCT security. We also describe the RSCT security architecture and mechanisms used by PowerHA. Knowledge of RSCT and its base components (such as RMC, resource managers, and so forth) is assumed.

It is important to understand that most of the components and mechanisms explained in this chapter do not need to be configured; the RSCT security layer works out of the box. Also, some functions and files are configured only when PowerHA message authentication and encryption are set up (see 8.2, “Using encrypted inter-node communication” on page 460). Some of the functions described are not used by PowerHA, but they are an integral part of RSCT security.

8.6.1 RSCT and PowerHA

PowerHA uses RSCT as an underlying infrastructure layer. PowerHA nodes contact their peer nodes by using RSCT components to ensure that they are still alive or to request access to their resources. Topology Services provide the heartbeat services, and Group Services provide coordination and monitoring of the PowerHA cluster manager. The RMC subsystem is used for:

� Custom events
� Application monitoring
� Dynamic node priority
� Exporting network status for use by Oracle RAC

Figure 8-6 shows how HACMP and RSCT components are related to each other.


Figure 8-6 RSCT and PowerHA

Every time a PowerHA component (for example, the cluster manager of one node) sends a functional request to an RMC resource, either local or remote, the RSCT subsystem is called and the request is sent to the node where the resource is located. See Figure 8-7.

Apart from the topology, group, and resource monitoring services, RSCT also provides PowerHA with authentication, authorization, and confidentiality (encryption and decryption) services.

Communication between all cluster members is a client/server communication. In the following sections, the term client is used for an application, for example, a local resource manager or a PowerHA function that is using the RMC client API.

Because remote connections are simply TCP/IP socket connections, they must be secured in order to ensure that both the requestor and the server are indeed the entities they claim to be. The privacy of messages exchanged must also be ensured.

(Figure 8-6 depicts the clstrmgrES daemon on top of the RSCT layers: the RMC subsystem with its APIs, Security Services, Topology Services, Group Services, Event Management (EM), and the resource monitors, which in turn sit on the software (OS) and hardware layers.)


After the communication is secured, authorization takes place. This is a process by which cluster software components are granted or denied access to resources based on specific criteria.

CtSec is responsible for authentication only. RMC itself is responsible for authorization by using an access control list (ACL) to grant or deny access to resources within the cluster.

Figure 8-7 Basic cluster communication overview

Client applications are linked against the RMC and CtSec shared libraries. If those clients request access to resources on a remote node, the RMC daemon on that remote node will be the server for those requests.

8.6.2 Cluster Security Services (CtSec) overview

Cluster Security Services (CtSec) are integrated into the RSCT subsystem and are used by RMC to determine the identity of a client from a node. This authentication process results in a security context that is subsequently used by RMC for the communication between the participating nodes to fulfill the client’s request.

(Figure 8-7 shows two nodes, Node1 and Node2, each running HACMP, resource managers, and RSCT; the RMC API and RMC daemon on each node communicate with their peers, with CtSec performing authentication and the RMC ACLs controlling authorization.)

Note: The security context is at a client/server level, not at a node level.


To create this security context, CtSec uses credentials for the authentication. The credentials are used to ensure the authenticity of both the server and the client application. To access resources on a remote node, both nodes send and compare the credentials during the authentication process.

A unique pair of public and private keys is assigned to each cluster node. Data encrypted with either of these keys can be decrypted only using its corresponding counterpart.

These keys are used by cluster security services to establish the node identity. The private key of each node is known only to the local node, whereas the public key is made available to all other nodes.

Using these keys, the authentication process:

� Enables a client to present identity information about itself in a unique manner that cannot be imitated by any other client.

� Enables a server to clearly recognize the identity of the client.

� Enables a server to ensure that client identity is genuine.

� Enables a server to ensure that messages coming from a client are genuine in the sense that they were originated by the client.

� Enables a server to present identity information about itself in a unique manner that cannot be imitated by any other server.

� Enables a client to clearly recognize the identity of the server.

� Enables a client to ensure that server identity is genuine.

� Enables a client to ensure that messages coming from a server are genuine in the sense that they were originated by the server.

The credential-based authentication involves other components to create and identify credentials. This component-based architecture, shown in Figure 8-8, allows future extensions to the security layer of RSCT.


Figure 8-8 Cluster Security Services (CtSec) architecture

The CtSec library consists of several components required to provide the current and future security functions.

The security context created by CtSec contains credentials for both the client and the server, the state of authentication (authenticated or unauthenticated), and the session information (for example, session key or expiration time). The security context is created by the components of CtSec and is sent to RMC as a result of the CtSec authentication.

8.6.3 Mechanism abstraction layer (MAL)

CtSec provides an interface to applications that need to be secured. This interface is provided through an abstraction layer named the mechanism abstraction layer (MAL).

MAL provides a generic, mechanism-independent interface for applications and translates application requests into general tasks that can be performed by the underlying security layer. Tasks are actually performed by security mechanism pluggable modules (MPM). An MPM converts generic security routines into specific security functions.

(Figure 8-8 shows an RMC authentication request entering Cluster Security Services (CtSec), passing through the Mechanism Abstraction Layer (MAL) to the UNIX MPM, with room for future extension MPMs, and using the Identity Mapping Service (IDM) and Host Based Authentication (HBA, provided by ctcasd).)


Currently the available MPMs are:

� Host Based Authentication mechanism
� Enhanced Host Based Authentication mechanism

Additional MPMs might be added in the future.

Information regarding the available MPMs, such as their priority and full path, is kept on each cluster node in an ASCII file named /usr/sbin/rsct/cfg/ctsec.cfg. An example of the available MPMs on a cluster node is shown in Example 8-5.

Example 8-5 Available MPMs on a cluster node

root@ glvm1[/usr/sbin/rsct/cfg] cat ctsec.cfg |grep -v "#"
1       unix    0x00001     /usr/lib/unix.mpm       i
2       hba2    0x00002     /usr/lib/hba2.mpm       iz[unix]

When using either the Enhanced Host Based Authentication mechanism or the Host Based Authentication mechanism, cluster security services use the ctcasd daemon to provide and authenticate operating system based identity credentials.

The ctcasd daemon is called only if TCP/IP sockets for remote connections are used. If the request is for the local host, MPM uses UNIX® domain sockets within the kernel. The kernel security is trusted, and no further security features are required.

8.6.4 Mechanism pluggable modules (MPM)

Each mechanism pluggable module (MPM) converts the general tasks, received from the MAL layer, into specific tasks the security mechanism uses to satisfy the MAL request.

The MPM gathers all credentials that are necessary to fulfill the authentication process for this specific security mechanism. The MPM also maps network identities to local identities (see 8.6.6, “Identity mapping service and RMC access control lists” on page 481).

Due to the modular architecture of MAL and MPM, other security mechanisms might be added in future releases. MPMs are object modules that are loaded by the MAL during run time.


MPMs are located in the /usr/sbin/rsct/lib directory. Each MPM must have a link in /usr/lib/ that points to the corresponding file in /usr/sbin/rsct/lib/ as shown in Example 8-6.

Example 8-6 Location of MPMs

root@ glvm1[/usr/sbin/rsct/lib] ls -l *mpm*
-r--r--r--   1 bin    bin    741283 Sep 23 11:25 hba2.mpm
-r--r--r--   1 bin    bin    823514 Sep 23 11:25 hba2.mpm64
-r--r--r--   1 bin    bin    150518 Sep 23 11:23 unix.mpm
-r--r--r--   1 bin    bin    155612 Sep 23 11:23 unix.mpm64
root@ glvm1[/usr/sbin/rsct/lib] ls -al /usr/lib/*.mpm*
lrwxrwxrwx   1 root   system     27 Mar 12 09:06 /usr/lib/hba2.mpm -> /usr/sbin/rsct/lib/hba2.mpm
lrwxrwxrwx   1 root   system     29 Mar 12 09:06 /usr/lib/hba2.mpm64 -> /usr/sbin/rsct/lib/hba2.mpm64
lrwxrwxrwx   1 root   system     27 Mar 12 09:06 /usr/lib/unix.mpm -> /usr/sbin/rsct/lib/unix.mpm
lrwxrwxrwx   1 root   system     29 Mar 12 09:06 /usr/lib/unix.mpm64 -> /usr/sbin/rsct/lib/unix.mpm64

8.6.5 Host-based authentication with ctcasd

The Cluster Technology Cluster Authentication Service daemon (ctcasd) creates security credentials used for authentication based on the node’s host name and client identity. Remember that a client is an application that uses the RMC client API.

The private key resides in a file that can be accessed only by local root. The public key is made available to other nodes. Associations between the host identities—host names and IP addresses—and their corresponding public keys are kept in a trusted host list file:

� The private key is kept in the file /var/ct/cfg/ct_has.qkf.
� The public key is kept in the file /var/ct/cfg/ct_has.pkf.
� The list of associations is kept in the file /var/ct/cfg/ct_has.thl.
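To inspect the trusted host list on a node, you can use the ctsthl utility that ships with RSCT, for example (default file locations assumed):

/usr/sbin/rsct/bin/ctsthl -l      # list the host identities and public keys in ct_has.thl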

Information regarding the configuration of the ctcasd daemon, such as the key generation method, is kept on each cluster node in an ASCII file named /usr/sbin/rsct/cfg/ctcasd.cfg. Any changes to this configuration file require the ctcasd daemon to be restarted. Bear in mind that when the ctcasd daemon is unavailable, authentication is not possible.

To ensure that ctcasd uses the appropriate public key, it is necessary that all nodes across the PowerHA cluster can resolve names identically.


The distribution of public keys to all nodes is performed by RSCT. The public key exchange is done over the network. During this exchange, the network must be secured against tracing and spoofing, because the keys are bound to a node within the cluster.

Another task of ctcasd is to create and exchange a secret key that will be subsequently used as a session key to encrypt communication between nodes. This session key is created to allow RSCT to encrypt and decrypt transmitted data and uses a symmetric key algorithm, which is faster than public key algorithms.

To exchange the session key, the ctcasd daemon on the initiator node first creates a message by encrypting the session key with the target host public key obtained from the trusted hosts file. This ensures that only the target node which has the corresponding private key can decrypt this message. In this manner the privacy of the session key is ensured.

The second step is to encrypt the whole message using the initiator’s private key so the target node can be sure about the authenticity of the originator. This action also ensures the integrity of data because no other node can have access to the private key of the initiator.

The target node uses the initiator’s public key obtained from the trusted hosts file, and then its own private key, to decrypt the received message and so obtain the session key. The traffic flowing between the originator and the target node is then encrypted with this symmetric key.

8.6.6 Identity mapping service and RMC access control lists

We have seen so far that the request for a specific resource can originate from a different node than the node where the resource is located. The identity of the requestor as reported by the authentication services constitutes its network identity.

Authorization is the process by which cluster software components are granted or denied access to resources based on specific criteria. The network identity of the requestor is used to establish the privileges of the requestor to have access to a specific local resource.

Important: To ensure identical host name resolution, all participating cluster members should use a method for name resolution that gives identical results on all nodes in a PowerHA cluster. The name resolution method and order can be changed in /etc/netsvc.conf (for AIX systems). All hosts should also use either short or fully qualified host names. If the cluster consists of nodes in different domains, fully qualified host names must be used.


RMC uses access control lists (ACL) to control access to resources. ACLs allow for accurate and granular control over resources. ACLs are updated by PowerHA during cluster administration procedures (for example, adding and removing nodes).

On each cluster node there is a file /usr/sbin/rsct/cfg/ctrmc.acls which contains the default Access Control List. The ACL file consists of stanzas corresponding to resource classes and instances that specify access permissions for user identifiers across the cluster nodes. If any changes to this file are to be made, then the file must be copied to /var/ct/cfg/ctrmc.acls and the ctrmc daemon must be restarted.

Example 8-7 displays the resource class instance named IBM.HostPublic on the host named glvm1 and privileges associated with it.

Example 8-7 IBM.HostPublic resource and its privileges

root@ glvm1[/var/ct/cfg] lsrsrc IBM.HostPublic
Resource Persistent Attributes for IBM.HostPublic
resource 1:
        PublicKey        = ["rsa512","120200948eb975ec93891b690b62659754a4d8de092cc3d9d4532e99fd56c70d0ba7f624f94b5d570c947a90228380cf3b8d9ef68a3cdcb51c4d638c7354e16c2812f30103"]
        PublicKeyBinary  = "0xc5ec0001 0x00002001 0x12020094 0x8eb975ec 0x93891b69 0x0b626597 0x54a4d8de 0x092cc3d9 0xd4532e99 0xfd56c70d 0x0ba7f624 0xf94b5d57 0x0c947a90 0x228380cf 0x3b8d9ef6 0x8a3cdcb5 0x1c4d638c 0x7354e16c 0x2812f301 0x03"
        Hostname         = "glvm1"
        ActivePeerDomain = ""
        NodeNameList     = {"glvm1"}
root@ glvm1[/var/ct/cfg] cat ctrmc.acls|grep -p HostPublic
# The following stanza will enable anyone to read the information in the
# IBM.HostPublic class which provides public information about the node,
# mainly its public key.

IBM.HostPublic
        *               *       r
        UNAUTHENT       *       r


It is very easy to imagine the same user names on different nodes not having the same permissions to access a specific resource. So RMC needs to match the operating system user identifiers specified in the ACL file against the network security identifiers that are verified by the authentication services to determine whether the requestor has the appropriate permissions.

This correspondence is achieved by the identity mapping (IDM) service. The IDM service maps network identities to local identities using mapping rules specified in configuration files called maps. This is the main reason for which IDM was designed: IDM simply maps network identities in a cluster to a local identity in order to avoid hundreds of lines in the RMC ACL file.

IDM uses information stored in the following identity mapping files:

� /var/ct/cfg/ctsec_map.global which contains the cluster-wide identity mappings

� /var/ct/cfg/ctsec_map.local which contains identity mappings specific to the local node only. The existence of this file is optional.

The files used to map identities are used by IDM as follows:

� If the /var/ct/cfg/ctsec_map.local exists, IDM uses this file to determine associations.

� If the /var/ct/cfg/ctsec_map.global exists, IDM uses this file to determine associations.

� If the /var/ct/cfg/ctsec_map.global does not exist, IDM uses the /usr/sbin/rsct/cfg/ctsec_map.global file to determine associations.

The identity mapping within an individual file is performed on a first-match basis, which means the first entry that applies for a network identity is used.
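
To illustrate the format only (these entries are made-up examples, not the shipped defaults), mapping lines take the general form mechanism:network_identity=local_identity, for example:

unix:root@<cluster>=root
unix:hauser@node1=hauser
unix:*@<cluster>=*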

Part 4 Advanced topics (with examples)

In Part 4, we cover the following advanced topics:

� Virtualization and PowerHA
� Extending resource group capabilities
� Customizing events
� Storage related considerations
� Networking considerations

Chapter 9. Virtualization and PowerHA

In this chapter we introduce the various virtualization features and options available to a PowerHA cluster administrator. We discuss the benefits of these features and describe how to configure them for use with PowerHA.

We cover the following topics:

� Virtualization
� Virtual I/O Server
� DLPAR and application provisioning
� Live Partition Mobility
� Workload Partitions

9.1 Virtualization

Virtualization is now a normal part of the day-to-day configuration of a POWER systems environment. Any environment, whether virtual or not, requires detailed planning and documentation, enabling administrators to effectively maintain and manage these environments.

When planning a virtual environment in which to run a PowerHA cluster, we must focus on improving hardware concurrency at the Virtual I/O Server level as well as in the PowerHA cluster nodes. Typically, the Virtual I/O Server hosts the physical hardware being presented to the cluster nodes, so it is crucial to address the question of what would happen to your cluster if any of those devices were to fail.

The Virtual I/O Server itself is considered a single point of failure, so you should consider presenting shared disk and virtual ethernet to your cluster nodes from additional Virtual I/O Server partitions. We can see an example of considerations for PowerHA clusters in a virtualized environment in Figure 9-1.

Figure 9-1 Example considerations for PowerHA with VIO

488 PowerHA for AIX Cookbook

Page 523: PowerHA for AIX Cookbook - hmm.presalesadvisor.com · ibm.com/redbooks PowerHA for AIX Cookbook Shawn Bodily Rosemary Killeen Liviu Rosca Extended case studies with practical disaster

For further information about configuring Virtual I/O Servers, refer to PowerVM Virtualization on IBM System p: Introduction and Configuration Fourth Edition, SG24-7940, which can be found at:

http://www.redbooks.ibm.com/redpieces/abstracts/sg247940.html

9.2 Virtual I/O Server

PowerHA supports the use of VIO client partitions, with some important considerations:

� Management of active cluster shared storage is done at the cluster node level. The Virtual I/O Server only presents this storage to the cluster nodes.

� Care should be taken to ensure that the reservation policy of all shared disks presented through the Virtual I/O Server is set to no_reserve.

� All volume groups created on VIO clients used for PowerHA clusters must be enhanced concurrent capable, whether they are to be used in concurrent mode or not.

� If any cluster node accesses the shared disks through virtual SCSI, then all cluster nodes must use virtual SCSI; you cannot mix direct-attached and virtual SCSI in a cluster.

� Use of an HMC is only required if you want to utilize DLPAR with PowerHA.

� IVM contains a restriction on the number of VLANs a Virtual I/O Server can have. The maximum is 4 VLANs.

There are several ways to configure AIX client partitions and Virtual I/O Server resources for additional high availability with PowerHA; we recommend that you use at least two VIO servers so that maintenance tasks can be performed at that level. An example of a PowerHA configuration based on VIO clients is shown in Figure 9-2.

Figure 9-2 Example PowerHA configuration with VIO

PowerHA and virtual SCSI

PowerHA requires the use of enhanced concurrent volume groups when using virtual SCSI. No hardware reserves are placed on the disks, and fast disk takeover is utilized in the event that a volume group must be taken over by another node.

If file systems are used on the standby nodes, they are not mounted until the point of fallover, because the volume groups are in full active read/write mode only on the home node; the standby nodes have the volume groups in passive mode, which does not allow access to the logical volumes or file systems. If shared volumes (raw logical volumes) are accessed directly in enhanced concurrent mode, these volumes are accessible from multiple nodes, so access and locking must be controlled at a higher layer, such as databases.

All volume group creation and maintenance is done using the C-SPOC function of PowerHA and the bos.clvm.enh fileset must be installed.
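
A quick check that these prerequisites are in place on a node might look like the following (a sketch; the volume group name is an example):

# lslpp -L bos.clvm.enh
# lsvg app1vg | grep -i concurrent        (reports Enhanced-Capable for an enhanced concurrent VG)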

PowerHA and virtual Ethernet

IP Address Takeover (IPAT) aliasing must be used. IPAT using replacement and MAC Address Takeover are not supported with cluster nodes using virtual ethernet.

Virtual ethernet interfaces defined to PowerHA should be treated as single-adapter networks. In particular, configure the file netmon.cf to include a list of clients to ping. It must be used to monitor and detect failure of the network interfaces. Due to the nature of virtual ethernet, other mechanisms to detect the failure of network interfaces are not effective.
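
A netmon.cf along the following lines is typical (a sketch; the addresses are placeholders and should be replaced with stable targets outside the frame, such as the default gateway):

# cat /usr/es/sbin/cluster/netmon.cf
10.1.1.1
10.1.2.1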

If the Virtual I/O Server has only a single physical interface on a network (instead of, for example, two interfaces with ethernet aggregation), then a failure of that physical interface will be detected by PowerHA on the AIX client partition. However, that failure will isolate the node from the network. In this case, we recommend that you use a second virtual ethernet on the AIX client partition.

Example configuration

The following steps show an example of how to set up concurrent disk access for a SAN disk that is assigned to two client partitions. Each client partition sees the disk through two VIO servers. On the disk, an enhanced concurrent volume group is created. This kind of configuration can be used to build a two-node PowerHA test cluster on a single POWER machine:

1. Create the disk on the storage device.

2. Assign the disk to the VIO servers.

3. On the first VIO server, scan for the newly assigned disk:

$ cfgdev

4. On the first VIO server, change the SCSI reservation of that disk to no_reserve so that the SCSI reservation bit on that disk is not set if the disk is accessed:

$ chdev -dev hdiskN -attr reserve_policy=no_reserve

Where N is the number of the disk in question; reservation commands are specific to the multipathing disk driver in use. This parameter is used with DS4000® disks; it can be different with other disk subsystems.

5. On the first VIO server, assign the disk to the first partition:

$ mkvdev -vdev hdiskN -vadapter vhostN [ -dev Name ]

Where N is the number of the disk and of the vhost adapter in question; the device name can be chosen to your liking, but can also be left out entirely, in which case the system creates a name automatically.

6. On the first VIO server, assign the disk to the second partition:

$ mkvdev -f -vdev hdiskN -vadapter vhostN [ -dev Name ]

7. On the second VIO server, scan for the disk:

$ cfgdev

8. On the second VIO server, change the SCSI reservation of that disk:

$ chdev -dev hdiskN -attr reserve_policy=no_reserve

9. On the second VIO server, assign the disk to the first cluster node:

$ mkvdev -vdev hdiskN -vadapter vhostN [ -dev Name ]

10. On the second VIO server, assign the disk to the second cluster node:

$ mkvdev -f -vdev hdiskN -vadapter vhostN [ -dev Name ]

11. On the first cluster node, scan for that disk:

# cfgmgr

12. On the first cluster node, create an enhanced concurrent capable volume group and a file system using C-SPOC. You should now see the volume groups and file systems on the second cluster node.
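
Before running cluster verification, the key settings from the steps above can be double-checked; the following is a sketch (disk and volume group names are examples only):

On each VIO server:
$ lsdev -dev hdisk4 -attr reserve_policy       (should report no_reserve)

On each cluster node:
# lspv                                         (the new shared disk should be visible)
# lsvg sharedvg | grep -i concurrent           (should report Enhanced-Capable)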

9.3 DLPAR and application provisioning

In this section we cover the following topics in regard to DLPAR:

� Requirements
� Application provisioning
� Configuring DLPAR to PowerHA
� Troubleshooting HMC verification errors
� Test cluster configuration
� Test results

We expect that proper LPAR and DLPAR planning is part of your overall process before implementing any similar configuration. It is important to understand, not only the requirements and how to implement them, but also to understand the overall effects that each decision has on the overall implementation.

9.3.1 Requirements

To use the integrated DLPAR and/or CoD functions of PowerHA on POWER5™ and POWER6, all LPAR nodes in the cluster should have at least the following levels installed:

� PowerHA 5.5:

– PowerHA 5.4.1 w/APAR #IZ18217
– PowerHA 5.3 w/APAR #IY88675

� Appropriate AIX level for specific PowerHA version

� Appropriate RSCT level for specific AIX level

� OpenSSH

The OpenSSH software can be obtained from any of the following sources:

� AIX 5.3 Expansion pack

� Linux® Toolbox CD

� Downloaded from:

http://sourceforge.net/projects/openssh-aix

HMC attachment to the LPARs is required for proper management and DLPAR capabilities. The HMC must be network attached on a common network with the LPARs to allow remote DLPAR operations.

Other considerationsWhen planning a cluster to include DLPAR operations, the following considerations apply:

� Encountering possible config_too_long during DLPAR events
� Mix of LPARs and non-LPAR systems
� Mix of POWER4™, POWER5, and POWER6
� CoD provisioning

While a given HMC version might not support all POWER platforms, PowerHA in general does, meaning that a POWER6 production system could fall over to a POWER5 system.

As with any cluster, the configuration must be tested thoroughly. This includes anything that can be done to simulate or produce a real work load for the most realistic test scenarios as possible.

Attention: A key configuration requirement is that the LPAR partition name and the AIX host name must match. We show this in Figure 9-3.

Limits

The integrated dynamic LPAR support within PowerHA today works the same on POWER5 and POWER6 as it did on POWER4: only whole CPUs can be allocated, with no micro-partition or fractional CPU support. While shared CPUs can be used if configured correctly, we recommend that you use dedicated CPUs, as shown in the following scenario (Figure 9-3).

Figure 9-3 Partition name and AIX host name matching

9.3.2 Application provisioning

In this section we describe the flow of actions in the PowerHA cluster, if the application provisioning function through DLPAR and CoD is configured. Also included are several examples that illustrate how resources are allocated, depending on different resource requirements.

494 PowerHA for AIX Cookbook

Page 529: PowerHA for AIX Cookbook - hmm.presalesadvisor.com · ibm.com/redbooks PowerHA for AIX Cookbook Shawn Bodily Rosemary Killeen Liviu Rosca Extended case studies with practical disaster

Overview

When you configure an LPAR on the HMC (outside of PowerHA), you provide LPAR minimum, desired, and maximum values for the number of CPUs and amount of memory. These values can be obtained by running the lshwres command on the HMC. The stated minimum values of the resources must be available when an LPAR node starts. If more resources are available in the free pool on the frame, an LPAR can allocate up to the stated desired values.
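
For example, these values can be listed on the HMC with lshwres; the following is a sketch (the managed system name is a placeholder, and the field names can vary slightly between HMC levels):

hscroot@hmc3:~> lshwres -r proc -m <managed-system> --level lpar -F lpar_name,curr_min_procs,curr_procs,curr_max_procs
hscroot@hmc3:~> lshwres -r mem -m <managed-system> --level lpar -F lpar_name,curr_min_mem,curr_mem,curr_max_mem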

During dynamic allocation operations, the system does not allow the CPU and memory values to go below the minimum or above the maximum amounts specified for the LPAR.

PowerHA obtains the LPAR minimum and maximum amounts and uses them to allocate and release CPU and memory when application servers are started and stopped on the LPAR node.

PowerHA requests the DLPAR resource allocation on the HMC before the application servers are started, and releases the resources after the application servers are stopped. The Cluster Manager waits for the completion of these events before continuing the event processing in the cluster.

PowerHA handles the resource allocation and release for application servers serially, regardless of whether the resource groups are processed in parallel. This minimizes conflicts between application servers trying to allocate or release the same CPU or memory resources. Therefore, you must carefully configure the cluster to properly handle all CPU and memory requests on an LPAR.

These considerations are important:

� After PowerHA has acquired additional resources for the application server, when the application server moves again to another node, PowerHA releases only those resources that are no longer necessary to support this application on the node.

� PowerHA does not start and stop LPAR nodes.

It is possible to create a custom event or customize application start/stop scripts to stop LPAR nodes if desired.

Additional details on this topic can be found in the HACMP Administration Guide, SC23-4862.

Acquiring DLPAR and CUoD resourcesIf you configure an application server that requires a minimum and a desired amount of resources (CPU or memory), PowerHA determines if additional resources need to be allocated for the node and allocates them if possible.

In general, PowerHA tries to allocate as many resources as possible to meet the desired amount for the application, and uses CUoD, if allowed, to do this.

The LPAR node has the LPAR minimum

If the node owns only the minimum amount of resources, PowerHA requests additional resources through DLPAR and CUoD (if applicable).

In general, PowerHA starts counting the extra resources required for the application from the minimum amount. That is, the minimum resources are retained for the node’s overhead operations, and are not utilized to host an application.

The LPAR node has enough resources to host an application

The LPAR node that is about to host an application might already contain enough resources (in addition to the LPAR minimum) to meet the desired amount of resources for this application.

In this case, PowerHA does not allocate any additional resources and the application can be successfully started on the LPAR node. PowerHA also calculates that the node has enough resources for this application in addition to hosting all other application servers that might be currently running on the node.

Resources requested from the free pool and from the CoD pool

If the amount of resources in the free pool is insufficient to satisfy the total amount requested for allocation (minimum requirements for one or more applications), PowerHA requests resources from CoD (if enabled).

If PowerHA meets the requirement for a minimum amount of resources for the application server, application server processing continues. Application server processing continues even if the total desired resources (for one or more applications) have not been met or are only partially met. In general, PowerHA attempts to acquire up to the desired amount of resources requested for an application.

If the amount of resources is insufficient to host an application, PowerHA starts resource group recovery actions to move the resource group to another node.

Minimum amount requested for an application cannot be satisfied

In some cases, even after PowerHA requests to use resources from the CUoD pool, the amount of resources it can allocate is less than the minimum amount specified for an application.

If the amount of resources is still insufficient to host an application, PowerHA starts resource group recovery actions to move the resource group to another node.

The LPAR node is hosting application servers

In all cases, PowerHA checks whether the node is already hosting application servers that require application provisioning, and that the LPAR maximum for the node is not exceeded:

� Upon subsequent fallovers, PowerHA checks if the minimum amount of requested resources for yet another application server plus the amount of resources already allocated to applications residing on the node exceeds the LPAR maximum.

� In this case, PowerHA attempts resource group recovery actions to move the resource group to another LPAR. Note that when you configure the DLPAR and CUoD requirements for this application server, then during cluster verification, PowerHA warns you if the total number of resources requested for all applications exceeds the LPAR maximum.

Allocation of resources in a cluster with multiple applications

If you have multiple applications in different resource groups in the cluster with LPAR nodes, and more than one application is configured to potentially request additional resources through the DLPAR and CUoD function, the resource allocation in the cluster becomes more complex.

Based on the resource group processing order, some resource groups (hence the applications) might not be started. We explain this further in “Examples of using DLPAR and CUoD resources” on page 498.

Releasing DLPAR and CUoD resources

When the application server is stopped on the LPAR node (the resource group moves to another node), PowerHA releases only those resources that are no longer necessary to support this application server on the node. The resources are released to the free pool on the frame.

PowerHA first releases the DLPAR or CUoD resources it acquired last. This implies that the CUoD resources might not always be released before the dynamic LPAR resources are released.

The free pool is limited to the single frame only. That is, for clusters configured on two frames, PowerHA does not request resources from the second frame for an LPAR node residing on the first frame.

Also, if LPAR 1 releases an application that puts some DLPAR resources into the free pool, LPAR 2, which is using the CUoD resources, does not make any attempt to release its CUoD resources and acquire the free DLPAR resources.

Stopping LPAR nodes

When the Cluster Manager is forced down on an LPAR node, and that LPAR is subsequently shut down (outside of PowerHA), the CPU and memory resources are released (not by PowerHA) and become available for other resource groups running on other LPARs. PowerHA does not track CPU and memory resources that were allocated to the LPAR and does not retain them for use when the LPAR node rejoins the cluster.

If the LPAR is not stopped after the Cluster Manager is forced down on the node, the CPU and memory resources remain allocated to the LPAR for use when the LPAR rejoins the cluster.

Changing the DLPAR and CUoD resources dynamically

You can change the DLPAR and CUoD resource requirements for application servers without stopping the cluster services. Synchronize the cluster after making the changes.

The new configuration is not reflected until the next event that causes the application (hence the resource group) to be released and reacquired on another node. In other words, a change in the resource requirements for CPUs, memory or both does not cause the recalculation of the DLPAR resources. PowerHA does not stop and restart application servers solely for the purpose of making the application provisioning changes.

If another dynamic reconfiguration change (for example, an rg_move) causes the resource groups to be released and reacquired, the new resource requirements for DLPAR and CUoD are used at the end of this dynamic reconfiguration event.

Examples of using DLPAR and CUoD resources

The following examples explain CPU allocation and release. The process for memory is very similar. While these are descriptions of how it works, we also provide real results from our test configuration in 9.3.6, “Test results” on page 515.

Note: If you are using the On/Off license for CUoD resources, and the LPAR node is shut down (outside of PowerHA), the CUoD resources are released (not by PowerHA) to the free pool, but the On/Off license continues to be turned on. You might need to manually turn off the license for the CUoD resources that are now in the free pool. (This ensures that you do not pay for resources that are not being currently used).

The configuration is an 8-CPU frame, with a two-node (each an LPAR) cluster. There are two CPUs available in the CUoD pool, that is, through the CUoD activations. The nodes have the partition profile characteristics shown in Table 9-1 and Table 9-2.

Table 9-1 Profile characteristics

Node name      LPAR minimum    LPAR maximum
Longhorn       1               9
Hurricane      1               5

There are three application servers defined, each belonging to separate resource groups.

Table 9-2 Application requirements

Application server name    CPU desired    CPU minimum    Allow CUoD
App1                       1              1              Yes
App2                       2              2              No
App3                       4              4              No

Example 1: No CPUs added at start, some are released upon stop

The starting configuration settings are as follows:

� Longhorn has 3 CPUs allocated.
� Hurricane has 1 CPU allocated.
� The free pool has 4 CPUs allocated.

The application servers are started in the following order:

� Longhorn starts App2, and no CPUs are allocated because the requirement of 3 CPUs is already met. (Note that 3 CPUs is equal to the sum of Longhorn's LPAR minimum of 1 plus the App2 desired amount of 2.)

� Longhorn stops App2. Now 2 CPUs are released, leaving 1 CPU, the minimum requirement. (Because no other application servers are running, the only requirement is the Longhorn LPAR minimum of 1).

Note: Be aware that after PowerHA acquires additional resources for an application server, when the server moves again to another node, it takes the resources with it. That is, the LPAR node releases all the additional resources it acquired.


Example 2: No CPUs added due to RG processing order

In this example we start off with the same configuration settings as shown in the previous example.

The application servers are started as follows:

� Longhorn starts App1, no CPUs are allocated because the requirement of 2 is met.
� Longhorn starts App3, and 3 CPUs are allocated to meet the requirement of 6. There is now 1 CPU in the free pool.

� Longhorn attempts to start App2. After Longhorn has acquired App1 and App3, the total amount of CPUs that Longhorn must now own to satisfy these requirements is 6, which is the sum of the Longhorn LPAR minimum of 1 plus the App1 desired amount of 1 plus the App3 desired amount of 4.

Because the App2 minimum amount is 2, in order to acquire App2, Longhorn needs to allocate 2 more CPUs, but there is only 1 CPU left in the free pool and it does not meet the minimum requirement of 2 CPUs for App2. The resource group with App2 is not acquired locally because there is only 1 CPU in the free pool and CUoD use is not allowed. If no other member nodes are present, then the resource group goes into the error state.

If node Hurricane would have been a member node of the App 2 resource group, and active in the cluster, then an rg_move would have been invoked in an attempt to bring up the resource group on node Hurricane.

Example 3: Successful CUoD resource allocation and release

The starting configuration settings are as follows:

� Longhorn has 3 CPUs allocated.
� Hurricane has 1 CPU allocated.
� The free pool has 4 CPUs.

The application servers are started in the following order:

� Longhorn starts App3, and 2 CPUs are allocated to meet the requirement of 5.

� Longhorn starts App2, and 2 CPUs are allocated to meet the requirement of 7. There are now no CPUs in the free pool.

� Longhorn starts App1, and 1 CPU is taken from CUoD and allocated to meet the requirement of 8.

� Longhorn stops App3, and 4 CPUs are released, while 1 of those CPUs is put back into the CUoD pool.

Example 4: Resource group failure, minimum resources not met

In this example, the resource group acquisition fails because the minimum resources needed are not currently available: the LPAR has reached its maximum.

The configuration is as follows:

� Longhorn has 1 CPU.
� Hurricane has 1 CPU.
� The free pool has 6 CPUs.

The application servers are started in the following order:

� Hurricane starts App3, and 4 CPUs are allocated to meet the requirement of 5. There are now 2 CPUs in the free pool.

� Hurricane attempts to start App2, but App2 goes into error state because the LPAR maximum for Hurricane is 5 and Hurricane cannot acquire more CPUs.

Example 5: Resource group failure, LPAR min and max are the same

In this example, we demonstrate a real case that we encountered during our early testing. It is a direct result of improper planning with regard to how application provisioning works.

We are still using an 8-CPU frame; however, the additional application servers and nodes are not relevant to this example. The LPAR configuration for node Longhorn is shown in Table 9-3.

Table 9-3 LPAR characteristics for node Longhorn

LPAR minimum    LPAR desired    LPAR maximum
4               4               4

The App1 application server has the settings shown in Table 9-4.

Table 9-4 Application requirements for App1

Minimum number of CPUs    Desired number of CPUs
1                         4

Note: If the minimum resources for App2 had been set to zero instead of one, the acquisition would have succeeded, because no additional resources would have been required.


The starting configuration is as follows:

� Longhorn has 4 CPUs allocated.
� The free pool has 4 CPUs.

The App1 application server is started locally on node Longhorn. During acquisition, the LPAR minimum is checked and added to the application server minimum, which returns a total of 5. This total exceeds the LPAR maximum setting and results in the resource group going into the error state.

Though technically the LPAR might already have enough resources to host the application, because of the combination of settings, it results in a failure. Generally, you would not have the minimum and maximum settings equal.

This scenario could have been avoided in any one of these three ways:

� Change LPAR minimum to 3 or less.
� Change LPAR maximum to more than 4.
� Change App1 minimum CPUs to 0.

9.3.3 Configuring DLPAR to PowerHA

Here, we cover the following steps that are needed to configure dynamic LPAR to a PowerHA cluster:

� Name resolution
� Installing SSH on PowerHA nodes
� Configuring HMC SSH access
� Defining HMC and managed system names to PowerHA

Name resolution

One common issue is name resolution not being consistent across all systems. If name resolution is not configured correctly, the DLPAR feature cannot be used. The underlying Reliable Scalable Cluster Technology (RSCT) infrastructure expects identical host name resolution on all participating nodes. If this is not the case, RSCT will be unable to communicate properly.

Ensure that all nodes and the HMC are configured identically by checking the following list. The phrase “All systems” includes all PowerHA nodes and the HMC:

� All systems must resolve the participating host names and IP addresses identically. This includes reverse name resolution.

� All systems must use the same type of name resolution, either short or long name resolution. All systems should use the same name resolution order, either local or remote.

To ensure these requirements, check the following files:

– /etc/hosts on all systems
– /etc/netsvc.conf on all AIX nodes
– /etc/host.conf, if applicable, on the HMC

We expect that it is common knowledge how to check these files on the AIX systems. However, it is not as well known how to do so on the HMC, which is covered in the following sections.

Installing and configuring SSH on PowerHA nodes

In order to use remote command operations on the HMC, SSH must be installed on the PowerHA nodes. The HMC must be configured to allow access from these partitions. In this section we cover installing OpenSSH on the AIX cluster nodes.

With each version of SSH and HMC code, these steps might differ slightly. We have documented our processes, which we used to successfully implement our environment.

Installing SSH on PowerHA nodes

In 9.3.1, “Requirements” on page 493, we covered which packages are needed and where to obtain them. The following steps assume these packages have been downloaded or copied onto the PowerHA nodes. We chose to get openssh and openssl from the AIX expansion pack and put them in the common install directory of /usr/sys/inst.images.

You can now install using smitty install_all. The core filesets required to install and the results of our installation are shown in Example 9-1.

Example 9-1 Install SSH

smitty install_all

+--------------------------------------------------------------------------+
| openssh.base                                                      ALL    |
|   @ 5.0.0.5300  Open Secure Shell Commands                               |
|   @ 5.0.0.5300  Open Secure Shell Server                                 |
|                                                                          |
| openssh.license                                                   ALL    |
|   @ 5.0.0.5300  Open Secure Shell License                                |
|                                                                          |
| openssh.man.en_US                                                 ALL    |
|   @ 5.0.0.5300  Open Secure Shell Documentation - U.S. English           |
|                                                                          |
| openssh.msg.EN_US                                                 ALL    |
|   @ 5.0.0.5300  Open Secure Shell Messages - U.S. English (UTF)          |
|                                                                          |
| openssl.base                                                      ALL    |
|   @ 0.9.8.801   Open Secure Socket Layer                                 |
|                                                                          |
| openssl.license                                                   ALL    |
|   @ 0.9.8.801   Open Secure Socket License                               |
+--------------------------------------------------------------------------+

root@ jordan[/] lslpp -L open*
  Fileset                     Level  State  Type  Description (Uninstaller)
  ----------------------------------------------------------------------------
  openssh.base.client    5.0.0.5300    C     F    Open Secure Shell Commands
  openssh.base.server    5.0.0.5300    C     F    Open Secure Shell Server
  openssh.license        5.0.0.5300    C     F    Open Secure Shell License
  openssh.man.en_US      5.0.0.5300    C     F    Open Secure Shell
                                                  Documentation - U.S. English
  openssh.msg.EN_US      5.0.0.5300    C     F    Open Secure Shell Messages
                                                  - U.S. English (UTF)
  openssl.base            0.9.8.801    C     F    Open Secure Socket Layer
  openssl.license         0.9.8.801    C     F    Open Secure Socket License

Now that SSH is installed, we need to configure the PowerHA nodes to access the HMC without passwords for remote DLPAR operations.

Configuring HMC SSH access

These are the steps we used in our setup to enable SSH access from our PowerHA nodes on HMC V7R3.4.0:

� Enable HMC SSH access.
� Generate SSH keys on PowerHA nodes.
� Enable non-password HMC access via the authorized_keys2 file.

First, make sure the HMC is set up to allow remote operations by doing the following steps:

1. In the Navigation area, select HMC Management.

2. Then in the right frame under Administration, choose Remote Command Execution.

3. Click the box to enable ssh that says, Enable remote command execution using the ssh facility, as shown in Figure 9-4.

Tip: Be sure to choose yes on the field to accept the license agreement.

Figure 9-4 Enable remote command execution on the HMC

It is necessary to create the SSH directory $HOME/.ssh for user root to store the authentication keys. PowerHA will run the ssh remote DLPAR operations as the root user. By default this is /.ssh, and is what we used.

To generate public and private keys, run the following command on each HACMP node:

/usr/bin/ssh-keygen -t rsa

This will create the following files in /.ssh:

private key: id_rsa
public key:  id_rsa.pub

The write bits for both group and other are turned off. Ensure that the private key has permissions of 600.
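
A quick check (a sketch; the size and date shown are from our test node):

# ls -l /.ssh/id_rsa
-rw-------    1 root     system        883 Jun 16 14:12 /.ssh/id_rsa
# chmod 600 /.ssh/id_rsa        (only needed if the mode is not already 600)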

The HMC’s public key needs to be in the known_hosts file on each PowerHA node, and vice versa. This is easily accomplished by running ssh to the HMC from each PowerHA node. The first time it is run, a prompt is displayed asking whether to insert the key into the file. Answer yes to continue, and you will then be prompted to enter a password. It is not necessary to do so, because we have not yet completed the setup to allow non-password ssh access (see Example 9-2).

Example 9-2 SSH to HMC

Jordan /tmp > ssh hscroot@9.12.5.23
The authenticity of host '9.12.5.23 (9.12.5.23)' can't be established.

Note: Normally we recommend that you create a separate HMC user for remote command execution, however PowerHA uses hscroot.

RSA key fingerprint is 2d:50:3f:03:d3:51:96:27:5a:5e:94:f4:e3:9b:e7:78
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '9.12.5.23' (RSA) to the list of known hosts.

When utilizing two HMCs, as in our test configuration, it is necessary to repeat this process for each HMC. You can also do this between all member nodes to allow ssh types of operations between them (for example, scp, sftp, and ssh).

To allow non-password ssh access, we must put each PowerHA node’s public key into the authorized_keys2 file on the HMC. This can be done in more than one way; however, here is an overview of the steps we used:

� Copy (scp) the authorized_keys2 file from the HMC to the local node.
� Concatenate (cat) the public key for each node into the authorized_keys2 file.
� Repeat on each node.
� Copy (scp) the concatenated file over to the HMC /home/hscroot/.ssh.

For the first step, we copied the authorized_keys2 file from the HMC, assuming that it already exists there. To verify on the HMC that the authorized_keys2 file exists in the .ssh directory, we ran the following from the client:

ssh hscroot@hmc
hscroot@hmc3:~> ls -al .ssh/
total 16
drwxr-xr-x 3 root    hmc 4096 2008-10-28 17:48 .
drwxr-xr-x 6 hscroot hmc 4096 2008-11-11 08:14 ..
-rw-r--r-- 1 hscroot hmc 4070 2009-04-01 17:23 authorized_keys2
drwxrwxr-x 2 ccfw    hmc 4096 2008-10-28 17:48 ccfw

We then copied it to the local node while in the /.ssh directory by running:

scp hscroot@hmc3:~/.ssh/authorized_keys2 ./authorized_keys2.hmc

Next, from /.ssh on the AIX LPARs, we made a copy of the public key and renamed it to include the local node name as part of the file name. We then copied, via scp, the public key of each machine (Jessica and Alexis) to one node (Jordan). We then ran the cat command to create an authorized_keys2 file that contains the public key information for all PowerHA nodes. The commands run on each node are shown in Example 9-3.

Example 9-3 Scp authorized_keys2 file to HMC

Alexis /.ssh > cp id_rsa.pub id_rsa.pub.alexis
Alexis /.ssh > scp id_rsa.pub.alexis jordan:/.ssh/id_rsa.pub.alexis

Jessica /.ssh > cp id_rsa.pub id_rsa.pub.jessica
Jessica /.ssh > scp id_rsa.pub.jessica jordan:/.ssh/id_rsa.pub.jessica

Jordan /.ssh > cp id_rsa.pub id_rsa.pub.jordan
Jordan /.ssh > cat id_rsa.pub.alexis id_rsa.pub.jessica id_rsa.pub.jordan > authorized_keys2.hmc

Jordan /.ssh > ls -al
total 64
drwx------    2 root   system    256 Jul 18 22:27 .
drwxr-xr-x   21 root   system   4096 Jul 14 02:11 ..
-rw-r--r--    1 root   system    664 Jun 16 16:31 authorized_keys2.hmc
-rw-------    1 root   system    883 Jun 16 14:12 id_rsa
-rw-r--r--    1 root   system    221 Jun 16 14:12 id_rsa.pub
-rw-r--r--    1 root   system    221 Jun 16 16:30 id_rsa.pub.alexis
-rw-r--r--    1 root   system    222 Jun 16 15:20 id_rsa.pub.jessica
-rw-r--r--    1 root   system    221 Jun 16 16:27 id_rsa.pub.jordan
-rw-r--r--    1 root   system   1795 Jul 14 04:08 known_hosts

Jordan /.ssh > scp authorized_keys2.hmc hscroot@hmc3:~/.ssh/authorized_keys2
hscroot@hmc3's password:
authorized_keys2                              100%  664     0.7KB/s   00:00

When running the scp command to the HMC, you should be prompted to enter the password for the hscroot user. After it is entered, the authorized_keys2 file is copied. You can then test whether the non-password access is working from each node by running the ssh command as shown in Example 9-2 on page 505. However, this time you should end up at the HMC shell prompt, as shown in Example 9-4.

Example 9-4 Test no-password ssh access

Alexis /.ssh > ssh hscroot@hmc3
Last login: Thu Jun 16 22:46:51 2005 from 9.12.7.11
hscroot@hmcitso:~>

After each node can ssh to the HMC without a password, this step is complete and PowerHA verification of the HMC communications should succeed.

Defining HMC and managed system names to PowerHA

The HMCs’ IP addresses must be specified for each PowerHA node that will be utilizing DLPAR. In our example, each PowerHA node corresponds to an LPAR. Each LPAR is assigned to a managed system. Managed systems are those systems that are physically attached to, and managed by, the HMC. These managed systems must also be defined to PowerHA.

You can obtain the managed system names through the HMC console in the navigation area. The managed system name can be a user-created name; the default name is the machine type and serial number.

To define the HMC communication for each HACMP node:

1. In smitty hacmp, select Extended Configuration → Extended Resource Configuration → HACMP Extended Resources Configuration → Configure HACMP Applications → Configure HACMP for Dynamic LPAR and CUoD Resources → Configure Communication Path to HMC → Add HMC IP Address for a Node and press Enter.

The Add HMC IP Address panel appears.

2. Fill out the following fields as appropriate:

Node Name Select a node name to associate with one or more Hardware Management Console (HMC) IP addresses and a Managed System.

HMC IP Addresses Enter one or more space-separated IP addresses for the HMC. If addresses are added for more than one HMC, PowerHA tries to communicate with each HMC until a working communication path is found. After the communication path is established, PowerHA uses this path to run the dynamic logical partition commands on that HMC.

Managed System Name Enter the name of the Managed System that runs the LPAR that represents the node. The maximum length is 32 characters.

3. Press Enter.

Figure 9-5 shows the SMIT panel to add the HMC information that we listed previously. We also show the HMC managed system name information as used in our test configuration.

During cluster verification, PowerHA verifies that the HMC is reachable by first issuing a ping to the specified IP address. If the HMC responds, PowerHA then verifies that each specified PowerHA node is in fact DLPAR capable by issuing an lssyscfg command, via ssh, on the HMC.
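
The same checks can be made by hand from a cluster node; the following is a sketch (9.12.5.23 is the HMC address used in our examples, and the lssyscfg output format depends on the HMC level):

# ping -c 2 9.12.5.23
# ssh hscroot@9.12.5.23 lssyscfg -r sys -F name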

Tip: You can use the SMIT fast path of smitty cladd_apphmc.dialog

Note: Having 2 HMCs provides improved availability.

Figure 9-5 Defining HMC and Managed System to PowerHA

Configuring application provisioning

To configure dynamic LPAR and CoD resources, complete the following steps for each application server that could use DLPAR-allocated or CoD resources:

1. Using the smitty hacmp fast path, select Extended Configuration → Extended Resource Configuration → HACMP Extended Resources Configuration → Configure HACMP Applications → Configure HACMP for Dynamic LPAR and CUoD Resources → Configure Dynamic LPAR and CUoD Resources for Applications → Add Dynamic LPAR and CUoD Resources for Applications and press Enter.

A picklist of configured application servers appears.

2. Select an application server from the list and press Enter.

The panel to specify the requirements for an application server appears. Detailed information can be found in the help panels and in 9.3.2, “Application provisioning” on page 494.

3. Fill out the following fields as appropriate:

Application Server Name The application server, chosen from the previous picklist, for which you will configure dynamic LPAR and CoD resource provisioning.

Minimum Number of CPUs Enter the minimum number of CPUs to acquire when the application server starts. The default value is 0. To perform the application provisioning, PowerHA checks how many CPUs the LPAR node currently has above its LPAR minimum value, compares this number with the minimum requested in this field and based on this, requests more CPUs, if needed.

Number of CPUs Enter the maximum number of CPUs that HACMP will attempt to allocate to the node before starting this application on this node. The default value is 0.

Minimum Amount of Memory Enter the amount of memory to acquire when the application server starts. Must be a multiple of 256.

Tip: You can use the SMIT fast path of smitty cladd_appdlpar.dialog

510 PowerHA for AIX Cookbook

Page 545: PowerHA for AIX Cookbook - hmm.presalesadvisor.com · ibm.com/redbooks PowerHA for AIX Cookbook Shawn Bodily Rosemary Killeen Liviu Rosca Extended case studies with practical disaster

Use CUoD if resources are insufficient? The default is No. Select Yes to have PowerHA use Capacity on Demand (CoD) to obtain enough resources to fulfill the minimum amount requested. Using CoD requires a license key (activation code) to be entered on the Hardware Management Console (HMC) and might result in extra costs due to usage of the CoD license.

I agree to use CUoD resources The default is No. Select Yes to acknowledge that you understand that there might be extra costs involved when using CUoD. HACMP logs the answer to the syslog and smit.log files.

4. Press Enter.

When the application requires additional resources to be allocated on this node, PowerHA calculates whether DLPAR resources from the free pool on the frame are enough to satisfy the requirement, or whether CUoD resources are also needed for the application server. After that, PowerHA proceeds with requesting the desired amounts of memory and CPU, if you selected to use them.

During verification, PowerHA ensures that the entered values are below LPAR maximum values for memory and CPU. Otherwise PowerHA issues an error, stating these requirements.

PowerHA also verifies that the total of required resources for ALL application servers that can run concurrently on the LPAR is less than the LPAR maximum. If this requirement is not met, PowerHA issues a warning. Note that this scenario can happen upon subsequent fallovers. That is, if the LPAR node is already hosting application servers that require DLPAR and CUoD resources, then upon acquiring yet another application server, it is possible that the LPAR cannot acquire any additional resources beyond its LPAR maximum. PowerHA verifies this case and issues a warning.

An application provisioning example for our test configuration can be seen in Figure 9-6.

Figure 9-6 Defining application provisioning

After adding both the HMC communications and application provisioning, it is necessary to synchronize the cluster.

9.3.4 Troubleshooting HMC verification errors

In this section we show some errors that could be encountered during verification, along with possible reasons why they are generated. Though some of the error messages seem self-explanatory, we believe any troubleshooting tips are welcome.

In Example 9-5 the error message itself gives good probable causes for the problem. Here are some things you can do to discover the source of the problem:

� Ping the HMC IP address.
� Manually ssh to the HMC using the ssh hscroot@hmcip command.

Example 9-5 HMC unreachable during verification

ERROR: The HMC with IP label 9.12.5.28 configured on node jordan is not reachable. Make sure the HMC IP address is correct, the HMC is turned on and connected to the network, and the HMC has OpenSSH installed and setup with the public key of node jordan.

If the ssh is unsuccessful or prompts for a password, that is an indication that ssh has not been properly configured (Example 9-6).

Example 9-6 Node not DLPAR capable verification error

ERROR: An HMC has been configured for node jordan, but the node does not appear to be DLPAR capable.

If the message shown in Example 9-6 appears by itself, this is normally an indication that access to the HMC is working, but that the particular node’s matching LPAR definition is not reporting that it is DLPAR capable. This might be caused by something as simple as the node not running at least AIX 5.2, which is required for DLPAR operations. It can also result from RMC not updating properly. This generally is rare, and usually only applies to POWER4 systems. You can verify this manually from the HMC command line as shown in Example 9-7.

Example 9-7 Verify LPAR is DLPAR capable

hscroot@hmc3:~> lspartition -dlpar
<#0> Partition:<5*8204-E8A*10FE401, , 9.12.7.5>
       Active:<0>, OS:<, , >, DCaps:<0x0>, CmdCaps:<0x0, 0x0>, PinnedMem:<0>
<#14> Partition:<4*9111-520*10EE73E, jessica.dfw.ibm.com, 9.19.51.193>
       Active:<1>, OS:<AIX, 5.3, 5300-09-01-0847>, DCaps:<0x3f>, CmdCaps:<0xb, 0xb>, PinnedMem:<285>

In the case of partition #0 shown above, it is not DLPAR capable, as indicated by DCaps:<0x0>, whereas partition #14 is DLPAR capable. Because partition #0 is not active, that might be the real reason why it is not currently DLPAR capable.

Also, be sure that RMC communication to the HMC (port 657) is working, and restart the RSCT daemons on the partitions by running:

� /usr/sbin/rsct/install/bin/recfgct
� /usr/sbin/rsct/bin/rmcctrl -z
� /usr/sbin/rsct/bin/rmcctrl -A
� /usr/sbin/rsct/bin/rmcctrl -p

Note: The HMC command syntax can vary by HMC code levels and type.

Also, restart the RSCT daemons on the HMC in the same way as just described, but you must become root from pesh (hscpe) to do so.

During our testing, we ran several events within very short periods of time. At some point we would see that our LPAR would report that it was no longer DLPAR capable. Then after a short period, it would report back normally again. We believe that this was due to RMC information getting out of sync between the LPARs and the HMC.

9.3.5 Test cluster configuration

Our test configuration consists of three LPARs distributed over two POWER6 technology-based 550 servers. There are two production LPARs (Jordan, Jessica) in one POWER6 technology-based 550 and one standby LPAR (Alexis) in the other POWER6 technology-based 550. Each 550 contains eight CPUs and 16 GB of memory.

Each partition has the following software levels installed:

� AIX 6.1 TL2 SP1
� PowerHA 5.5
� RSCT 2.5.2.0
� OpenSSH 5.0.0.5300
� openssl 0.9.8.801
� VIOS 1.5
� Dual HMCs, both with HMC 7 version 3.4
� HMC build level/firmware 20081112.1
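
These levels can be confirmed on each node with something like the following (a sketch; the fileset names shown are the usual ones and can vary slightly between releases):

# oslevel -s
# lslpp -L cluster.es.server.rte rsct.core.rmc openssh.base.client openssl.base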

For shared storage we used a DS4800 with 10 GB LUNs, two of which were assigned to each production resource group. For purposes of our testing, the shared storage was not important other than trying to set up a more complete cluster configuration by utilizing disk heartbeat.

These LPARs are configured with the partition profile settings shown in Table 9-5.

Table 9-5 Partition profile settings

LPAR name    Minimum        Desired        Maximum
Jordan       1 CPU - 1 GB   1 CPU - 1 GB   4 CPU - 4 GB
Jessica      1 CPU - 1 GB   1 CPU - 1 GB   2 CPU - 2 GB
Alexis       1 CPU - 1 GB   1 CPU - 1 GB   6 CPU - 6 GB

We have two resource groups configured, app1_rg and app2_rg, each containing its own corresponding application server, app1 and app2 respectively. Each resource group is configured as online on home node. App1_rg has participating nodes Jordan and Alexis. App2_rg has participating nodes Jessica and Alexis. This makes our cluster a 2+1 setup, with node Alexis as the standby node.

The application server DLPAR configuration settings are shown in Table 9-6:

Table 9-6 Application server DLPAR settings

Application server    Minimum    Desired
app1                  0          3
app2                  0          2

We specifically chose the minimum settings of zero to always allow our resource group to be acquired.

9.3.6 Test results


Important: In POWER5 and POWER6, even if partitions are inactive, if they have resources assigned to them (meaning that their minimum, desired, and maximum settings are not set to zero), those resources do not show up in the free pool.

Scenario 1: Resource group acquisition

In this scenario we start off with the following:

� Jordan has 1 CPU/1 GB allocated.
� The free pool has 7 CPU/7 GB.

Upon starting cluster services on Jordan, app1 is started locally and attempts to acquire the desired amount of resources assigned. Because there are enough resources in the free pool, another 3 CPUs and 3 GB are acquired as shown in Figure 9-7.

Figure 9-7 DLPAR Resource acquisition
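
One way to watch the allocation happen from the AIX side is to compare the partition resources before and after the resource group comes online; the following is a sketch (the exact device and attribute listings depend on the AIX level):

# lsdev -Cc processor | grep -c Available        (number of processors currently assigned)
# lsattr -El sys0 -a realmem                      (usable memory, in KB)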

Scenario 2: Resource group release

In this scenario, node Jessica is online in the cluster with app2 running on its partition maximum settings of 2 CPUs and 2 GB.

Upon stopping cluster services on Jessica, app2 is stopped locally and releases resources back to the free pool. When releasing resources, HACMP will not release more resources than it originally acquired.

In Figure 9-8, we show the releasing of resources and their re-allocation back to the free pool.

Figure 9-8 DLPAR resource release

Scenario 3: Fallover each LPAR sequentially

This is a two-part scenario in which we fall over each production partition, node Jordan and then node Jessica. This demonstrates how resources are acquired on fallover, similarly to local resource group acquisition.

Also, between this scenario and Scenario 4: Fallover production LPARs in reverse order, we show how the order of the individual fallovers affects the total amount of resources acquired by the standby node.

For the first part, we fail node Jordan by running the reboot -q command. This causes a fallover to node Alexis. Alexis acquires the app1 resource group and allocates the desired amount of resources, as shown in Figure 9-9.

Figure 9-9 First production LPAR fallover

(Diagram: after node Jordan on P690_1 fails, node Alexis on P690_2 acquires App1_group and its desired resources, growing to 4x4; node Jessica remains at 2x2 with App2_group.)


Node Alexis now has the same amount of resources as the original failing node.

For the second part of this scenario, we continue from the following conditions:

• Jordan is offline.
• Jessica has 2 CPUs and 2 GB memory.
• Alexis has 4 CPUs and 4 GB memory.
• The free pool (frame 2) has 4 CPUs and 4 GB memory.

We now fail node Jessica using the reboot -q command. Node Alexis takes over the app2 resource group and acquires the desired resources as shown in Figure 9-10.

Figure 9-10 Second production LPAR fallover

Alexis ends up with its maximum partition setting of 6 CPUs and 6 GB memory.

(Diagram: after node Jessica also fails, node Alexis acquires App2_group as well and grows to its partition maximum of 6x6; the free resource pool on its frame shrinks accordingly.)


Scenario 4: Fallover production LPARs in reverse order
This is also a two-part scenario. We start off exactly the same way we did with scenario 3, as follows:

We start off with cluster services running on all nodes and they currently have the following resources assigned to each:

• Jordan (frame 1) has 4 CPUs and 4 GB memory.
• Jessica (frame 1) has 2 CPUs and 2 GB memory.
• Alexis (frame 2) has 1 CPU and 1 GB memory.
• The free pool (frame 2) has 7 CPUs and 7 GB memory.

This time we fail node Jessica first, using the reboot -q command. This results in node Alexis acquiring the app2 resource group and the desired amount of resources, as shown in Figure 9-11.

Figure 9-11 Fallover second production LPAR first

The point to note here is that Alexis now has 3 CPUs and 3 GB memory, whereas the app2 resource group normally has only 2 CPUs and 2 GB memory on node Jessica. Technically this might be more resources than necessary. It is a direct result of how application provisioning can end up with a different amount of resources depending on which LPAR and partition profile the resource group lands on.
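If you want to see exactly what each partition holds at a point like this, the current allocations can be listed from the HMC command line. This is a sketch only; the managed system name (P690_1) comes from our test environment, and the attributes reported vary with the HMC level and with dedicated versus shared processors:

# On the HMC: current memory and processor allocations per partition
lshwres -r mem -m P690_1 --level lpar
lshwres -r proc -m P690_1 --level lpar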

(Diagram: node Alexis grows from 1x1 to 3x3 after acquiring App2_group; nodes Jordan and Jessica are shown with their production profiles of 4x4 and 2x2.)


For the second part of this scenario, we continue from here with:

• Jordan (frame 1) has 4 CPUs and 4 GB memory.
• Jessica is offline.
• Alexis (frame 2) has 3 CPUs and 3 GB memory.
• The free pool (frame 2) has 5 CPUs and 5 GB memory.

We now fail node Jordan using the reboot -q command. Node Alexis takes over the app1 resource group and acquires the desired resources as shown in Figure 9-12. The end result is exactly the same as the previous scenario.

Figure 9-12 Results after second fallover

(Diagram: after node Jordan also fails, node Alexis acquires App1_group as well and again ends up at its partition maximum of 6x6.)


Scenario 5: Production re-acquisition via rg_move
In this scenario we continue from where we left off in scenario 4, after restarting nodes Jordan and Jessica back into the cluster. We start with the following conditions:

• Jordan (frame 1) has 1 CPU and 1 GB memory.
• Jessica (frame 1) has 1 CPU and 1 GB memory.
• Alexis (frame 2) has 6 CPUs and 6 GB memory.
• The free pool (frame 1) has 6 CPUs and 6 GB memory.

Node Alexis is currently hosting both resource groups. We run an rg_move of app2_rg from node Alexis back to its home node of Jessica. Alexis releases 2 CPUs and 2 GB memory, while Jessica acquires only 1 CPU and 1 GB of memory, as shown in Figure 9-13. This again is a direct result of the combination of application provisioning and the LPAR profile settings.

Figure 9-13 Resource group release and acquisition from rg_move

(Diagram: the rg_move returns App2_group to node Jessica; Alexis shrinks from 6x6 to 4x4, releasing 2 CPUs and 2 GB, while Jessica grows from 1x1 to 2x2.)
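Such a move can be requested through SMIT (C-SPOC Resource Group and Application Management) or from the command line with clRGmove. The following is a sketch only; flags vary slightly between HACMP/PowerHA levels, so verify the usage on your release first:

# Move app2_rg from node Alexis back to its home node Jessica
/usr/es/sbin/cluster/utilities/clRGmove -g app2_rg -n jessica -m

# Verify where the resource groups are now online
/usr/es/sbin/cluster/utilities/clRGinfo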


Scenario 6: Testing HMC redundancy
In this scenario we test HMC redundancy by physically unplugging the network connection of one of the HMCs. We start off with cluster services running on all nodes, which currently have the following resources assigned:

• Jordan (frame 1) has 4 CPUs and 4 GB memory.
• Jessica (frame 1) has 2 CPUs and 2 GB memory.
• Alexis (frame 2) has 1 CPU and 1 GB memory.
• The free pool (frame 2) has 7 CPUs and 7 GB memory.

We physically pulled the Ethernet cable from the HMC listed first, 192.168.100.69. We then failed node Jordan to cause a fallover to occur, as shown in Figure 9-14.

Figure 9-14 HMC redundancy test

During the fallover, while trying to access the HMCs, the following actions occur:

1. PowerHA issues a ping to the first HMC.

2. The first HMC is offline and does not respond.

3. PowerHA issues a ping to the second HMC, which succeeds, and continues processing the dynamic LPAR command-line operations.

The HMC test actions can be seen in /var/hacmp/log/hacmp.out via the event utility clhmcexec.
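To confirm which HMC was actually used, you can search the log directly for the clhmcexec activity; a simple sketch:

# Review the HMC checks and DLPAR operations recorded during the event
grep -i clhmcexec /var/hacmp/log/hacmp.out | more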

(Diagram: the cluster with nodes Jordan and Jessica on one frame, node Alexis on the other, and two HMCs; the first HMC is disconnected, and the fallover of App1_group to Alexis proceeds through the second HMC.)


9.4 Live Partition Mobility

Live Partition Mobility allows you to migrate partitions that are running AIX and Linux operating systems and their hosted applications from one physical server to another without disrupting the infrastructure services. The migration operation, which takes just a few seconds, maintains complete system transactional integrity. The migration transfers the entire system environment, including processor state, memory, attached virtual devices, and connected users.

It eliminates the need for downtime during planned hardware maintenance. However, it does not offer the same for software maintenance or unplanned downtime, which is why PowerHA is still relevant today.

PowerHA can be used within a partition that is capable of being moved with Live Partition Mobility. This does not mean that PowerHA utilizes Live Partition Mobility in any way; PowerHA is just treated as another application within the partition. The following requirements must be met for PowerHA to be supported with Live Partition Mobility:

• PowerHA 5.5
  – AIX 5.3 TL9 or AIX 6.1 TL2 SP1
  – RSCT 2.4.10.0 or 2.5.2.0

• HACMP 5.4.1
  – AIX 5.3.7.1, or AIX 6.1.0.1
  – RSCT 2.4.5.4, or 2.5.0.0
  – HACMP APAR #IZ02620

• HACMP 5.3
  – AIX 5.3.7.1, or AIX 6.1.0.1
  – RSCT 2.4.5.4, or 2.5.0.0
  – HACMP APAR #IZ07791

• Firmware Level: 01EM320_40 (or higher)

• VIOS Level: 1.5.1.1-FP-10.1 (w/Interim Fix 071116)

• HMC Level: 7.3.2.0

The support flash listing these requirements can be found at:

http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10640
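Before relying on Live Partition Mobility for a PowerHA node, it is worth confirming the installed levels against this list. A minimal sketch, assuming standard commands on each platform (run on the AIX nodes, the VIOS, and the HMC respectively):

# On each AIX cluster node
oslevel -s                          # AIX technology level and service pack
lslpp -L cluster.es.server.rte      # HACMP/PowerHA base fileset level
lslpp -L "rsct.*"                   # RSCT fileset levels

# On each Virtual I/O Server (padmin shell)
ioslevel

# On the HMC
lshmc -V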

With the requisite levels of software and firmware installed on the source and destination POWER6 servers, the Live Partition Mobility feature can be used to migrate an LPAR running as a PowerHA node without affecting the state or operation of the PowerHA cluster, provided that the PowerHA cluster is configured to use standard (that is, default) heartbeat parameters.


In that case, the effect on the application servers running under PowerHA control is a brief suspension of operations during the migration. Neither PowerHA nor the application servers will have to be restarted.

For those PowerHA clusters configured to use fast heartbeating, or short custom heartbeat intervals, IBM recommends customer testing to validate that the period of suspension during Live Partition Mobility will not cause unwanted cluster events.

Another option, when using HACMP 5.4 and above, is to stop cluster services with the Unmanage option, perform the partition move, and then restart cluster services after the move completes successfully. This prevents the processing of any events related to the node being moved.
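A hedged outline of that sequence using the SMIT fast paths (menu wording differs slightly between HACMP 5.4.x and PowerHA 5.5):

# 1. On the node to be migrated, stop cluster services and choose the
#    "Unmanage Resource Groups" option so the applications keep running
smitty clstop

# 2. Perform the Live Partition Mobility migration from the HMC

# 3. When the migration has completed, restart cluster services
smitty clstart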

9.5 Workload Partitions

In this section we cover the use of Workload Partitions (WPARs) in a PowerHA cluster. We discuss the relationship between the WPARs and the host instance of AIX or Global Environment (GE) and the PowerHA cluster itself.

9.5.1 Relationships

AIX Workload Partitions (WPARs) are lightweight virtualized operating system environments within a single controlling instance of the AIX operating system, known as the Global Environment, which can host a number of WPARs at any given point in time. For all users and applications running inside a WPAR, the workload partition appears and functions as if it were a separate instance of AIX in its own right, because the applications and workload partitions have a private execution environment.

The applications are isolated at a software level in terms of process, signal, and file system space, and so on. Workload partitions have dedicated network addresses, and inter-process communication is restricted to processes executing within the same workload partition. WPARs have their own unique users and groups that have no privileges outside of that particular WPAR, including access or permissions in the Global Environment.

A clustered WPAR relies on resources from the global environment, such as the actual physical devices, because the WPAR has none of its own. In addition, PowerHA operates at the Global Environment level, making it available to all WPARs that run on that particular host and are configured for high availability.


The file systems required by the WPAR are NFS mounted in the global environment after the volume group is varied on and made available to the WPAR. By default, a new WPAR's file systems will be based at the /wpars/wpar_name/ directory.

The IP address made available to the WPAR is an IP alias on a base network interface that belongs to the global environment. All process IDs in the WPAR can be monitored and managed from the global environment. PowerHA runs and operates in the global environment; therefore, it is able to monitor and manage all WPARs, as well as the devices and base components that the WPAR depends upon.
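For example, the WPARs and their states can be listed from the global environment with the lswpar command (the output below is illustrative only):

# lswpar
Name   State  Type  Hostname  Directory
rosie  A      S     rosie     /wpars/rosie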

Figure 9-15 shows the relationship between the global environment, its resources, and the WPARs. Note that the WPAR is associated with a PowerHA WPAR-enabled resource group (RG01). Particular attention should be paid to the association between the components in Figure 9-15.

Figure 9-15 Workload Partitions relationships


9.5.2 Planning for Highly Available WPARs

As with any PowerHA solution, success is largely dependent on planning and testing the configuration. When planning for AIX Workload Partitions in your PowerHA cluster, there are further details that need to be incorporated into the planning of the solution:

• The name of the WPAR
• The name of the WPAR volume group
• The WPAR service IP label(s)
• The resource group name associated with the WPAR
• The application name, which will run within the resource group

The introduction of WPARs has the potential to introduce several new single points of failure (SPOF) in a PowerHA clustered environment, including these:

• The NFS server
• The network between the NFS server and the global environments hosting the WPARs
• The WPARs themselves
• The global environments hosting the WPARs
• The applications running within the WPARs

One of the key considerations is the NFS server, and the network between it and the global environments hosting the WPARs. Are the NFS servers and the network configured for high availability? If so, the use of NFS cross mounts will greatly reduce fallover times by having the file systems already mounted on each of the nodes that can host the WPAR and its associated resource group. In this configuration you should also set the resource group attribute Filesystems Mounted Before IP Configured to true.

In a clustered environment with many IP networks the cluster administrator should specify which network PowerHA will use for the NFS cross mounts.

When planning for WPARs in a PowerHA cluster, the WPARs should be created independently of PowerHA and created for Live Application Mobility. After the WPAR creation is complete, the cluster administrator should test the mobility of the new WPAR using the Workload Partitions Manager™ for AIX. Further information about WPARs and IBM Workload Partitions Manager for AIX can be found in Workload Partition Management in IBM AIX Version 6.1, SG24-7656.

Note: These exported file systems need to be defined in /etc/exports, as is normally the case in AIX for global environments, as well as in /usr/es/sbin/cluster/etc/exports for use by PowerHA. Further information about NFS cross mounts can be seen in Chapter 2, Paragraph 4.3.
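As a sketch only (the file system and node names are illustrative), the matching entries might look like this:

# /etc/exports on the NFS server
/wparfs/rosie -rw,root=xdsvc1:xdsvc2

# /usr/es/sbin/cluster/etc/exports for use by PowerHA
/wparfs/rosie -rw,root=xdsvc1:xdsvc2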


Creating the WPAR/s
A cluster administrator can create a WPAR using one of the following methods:

• Using IBM Workload Partitions Manager for AIX
• From the command line (see the sketch that follows this list)
• Using a script
• Using SMIT fast path smitty mkwpar
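For example, a basic system WPAR could be created from the command line as shown below. This is only a minimal sketch: the WPAR name (rosie) and address (10.0.20.5) match the examples in this section, but real clusters usually need additional mkwpar options (such as -M mount directives for the WPAR file systems):

# Create a system WPAR with its own hostname and service address
mkwpar -n rosie -h rosie \
       -N interface=en0 address=10.0.20.5 netmask=255.255.255.0

# Confirm that the WPAR was created
lswpar rosie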

After the WPAR has been created, the administrator should ensure mobility of the WPAR from its originating or home global environment to each and every other cluster node that the WPAR could be deployed to as seen in Figure 9-16.

Figure 9-16 IBM Workload partitions Manager for AIX


Using the Workload Partition Manager, select the WPAR and check the relocation options seen in Figure 9-17. Then click OK to relocate the WPAR to the new global environment or host.

Figure 9-17 Relocation options


Figure 9-18 shows the WPAR transition from one global environment to the next after we click the Task Details button on the WPAR Manager.

Figure 9-18 Relocation and transition of the WPAR


9.5.3 Resource Groups and WPARs

To create the WPAR-enabled resource group we do the following steps:

1. Enter the smitty hacmp fast path.

2. Select Extended Configuration → Extended Resource Configuration → HACMP Extended Resource Group Configuration → Add a Resource Group, and add the hawp1 resource group as seen in Example 9-8 on page 531.

3. After we have our resource group, we add the resource group to the WPAR using the smitty hacmp fast path and select Extended Configuration → Extended Resource Configuration → HACMP Extended Resource Group Configuration → Change/Show Resources and Attributes for a Resource Group.

4. From here we select the hawp1 resource group. This will give us the menu option Change/Show All Resources and Attributes for a Resource Group.

5. Toward the bottom of the options list is WPAR Name. Using the F4 key to generate the picklist, we select our WPAR (Example 9-8).

Example 9-8 Adding the WPAR to the hawp1 resource group

Change/Show All Resources and Attributes for a Custom Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Resource Group Name                                 hawp1
  Inter-site Management Policy                        ignore
  Participating Nodes from Primary Site               xdsvc1
  Participating Nodes from Secondary Site             xdsvc2
  Startup Policy                                      Online On Home Node >
  Fallover Policy                                     Fallover To Next Pri>
  Fallback Policy                                     Fallback to Higher P>
  Service IP Labels/Addresses                         [10.0.20.5]          +
  Application Servers                                 [app1]               +
  Volume Groups                                       [appvg datavg]       +
  Use forced varyon of volume groups, if necessary    true                 +
  Automatically Import Volume Groups                  false                +
  Filesystems (empty is ALL for VGs specified)        [ ]                  +
  Filesystems Consistency Check                       fsck                 +
  Filesystems Recovery Method                         sequential           +

Note: You need a service IP label defined for use with the WPAR-enabled resource group before the resource group can be added to the WPAR.


  Filesystems mounted before IP configured            true                 +
  Filesystems/Directories to Export (NFSv2/3)         []                   +
  Filesystems/Directories to Export (NFSv4)           []                   +
  Stable Storage Path (NFSv4)                         []                   +
  Filesystems/Directories to NFS Mount                []
  Network For NFS Mount                               []                   +
  Tape Resources                                      []                   +
  Raw Disk PVIDs                                      []                   +
  Fast Connect Services                               []                   +
  Communication Links                                 []                   +
  +---------------------------------------------------------------------+
  |                              WPAR Name                              |
  |                                                                     |
  |  Move cursor to desired item and press Enter.                       |
  |                                                                     |
  |    [rosie]                                                          |
  |                                                                     |
  |  F1=Help     F2=Refresh     F3=Cancel                               |
  |  F8=Image    F10=Exit       Enter=Do                                |
  |  /=Find      n=Find Next                                            |
  +---------------------------------------------------------------------+

6. Our resource group is now WPAR-enabled. When the WPAR-enabled resource group is brought online, all its associated resources are activated within the corresponding WPAR.

7. We can now use the smitty hacmp fast path and select System Management (C-SPOC) → HACMP Resource Group and Application Management to bring the WPAR-enabled resource group online and test its behavior.

When a resource group is WPAR-enabled, all the user defined scripts (such as application start, stop, and monitoring scripts) should be accessible within the WPAR at the given paths specified in the PowerHA configuration.

Note: The relationship and association between the WPAR-enabled resource group and the WPAR is based on them having a common name. You cannot associate the two if their names are not identical.


Figure 9-19 shows an environment configured to make WPARs highly available as well as the NFS exported file systems required by these WPARs.

Figure 9-19 WPAR environment configured for high availability

Mixed nodes
It is possible for a PowerHA cluster to consist of some nodes that are not WPAR-capable. If a WPAR-enabled resource group comes online on a node in the cluster that is not WPAR-capable, it will behave as if the WPAR property for the RG was not set.

Therefore, if your cluster is configured as such, you need to ensure that all the user-defined scripts are accessible at the same path as previously specified in the PowerHA configuration. PowerHA verification will not check for access permissions for application scripts which are part of a WPAR-enabled resource group.
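One simple way to check this manually is to verify the script paths on every node yourself, because cluster verification does not do it for you. A minimal sketch (the node names and script paths are illustrative):

# Confirm the application scripts exist and are executable on all nodes
for node in xdsvc1 xdsvc2; do
    echo "--- $node ---"
    rsh $node "ls -l /usr/local/ha/app1_start.sh /usr/local/ha/app1_stop.sh"
done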

Enabling or disabling the WPAR property of a resource group
To enable or disable the WPAR property of a resource group, we select the suitable option in Change/Show Resources and Attributes for a Resource Group using the extended path.

Note: If the WPAR property of a resource group is changed using DARE (while the resource group is online), the change takes effect only the next time the resource group is brought online.


Resource assignment
After a WPAR-enabled resource group is brought online or taken offline, PowerHA will activate and deactivate resources depending on the operation of the resource group.

Only the following resource types are supported for use in a WPAR:

• Service IP label
• Application Servers
• File systems

With any highly available solution, including those which involve the use of WPARs, the keys to success are in meticulous planning, implementation, and most importantly, thorough and ongoing testing.

Note: You should not manually assign or remove any PowerHA resources directly from the WPAR.

Important: When a WPAR-enabled resource group comes online on a WPAR-capable node, PowerHA (which runs in the global environment) automatically sets up access to the corresponding WPAR, using rsh to manage the various resources associated with the resource group.


Chapter 10. Extending resource group capabilities

In this chapter we describe how PowerHA advanced resource group capabilities can be used to meet specific requirements of particular environments. The attributes shown in Table 10-1 can influence the behavior of resource groups during startup, fallover, and fallback.

Table 10-1 Resource group attribute behavior relationships

Attribute                                   Startup   Fallover   Fallback
Settling time                               Yes
Node distribution policy                    Yes
Dynamic node priority                                 Yes
Delayed fallback timer                                           Yes
Resource group parent / child dependency    Yes       Yes        Yes
Resource group location dependency          Yes       Yes        Yes

In the following sections, we discuss each of these attributes and their effects on resource groups.


10.1 Settling time
The settling time feature gives you the ability to delay the acquisition of a resource group, so that in the event of a higher priority node joining the cluster during the settling period, the resource group will be brought online on the higher priority node instead of being activated on the first available node.

Settling time behavior
The following characteristics apply to the settling time:

• If configured, settling time affects the startup behavior of all offline resource groups in the cluster for which you selected the Online on First Available Node startup policy.

• The only time that this attribute is ignored is when the node joining the cluster is the first node in the node list for the resource group. In this case the resource group is acquired immediately.

• If a resource group is currently in the ERROR state, PowerHA will wait for the settling time period before attempting to bring the resource group online.

• The current settling time continues to be active until the resource group moves to another node or goes offline. A DARE operation might result in the release and re-acquisition of a resource group, in which case the new settling time values take effect immediately.

Configuring settling time for resource groups
To configure a settling time for resource groups, do the following steps:

1. Enter the smitty hacmp fast path and select Extended Configuration → Extended Resource Configuration → Configure a Resource Group Run-Time Policies → Configure Settling Time for Resource Group and press Enter.

2. Enter field values as follows:

– Settling Time (sec.):

Enter any positive integer number in this field. The default is zero.

If this value is set and the node that joins the cluster is not the highest priority node, the resource group will wait the duration of the settling time interval. When this time expires, the resource group is acquired on the node which has the highest priority among the list of nodes that joined the cluster during the settling time interval.

Remember that this is only valid for resource groups using the startup policy Online on First Available Node.


Displaying the current settling time
To display the current settling time in a cluster already configured, you can run the clsettlingtime list command.

#/usr/es/sbin/cluster/utilities/clsettlingtime list
#SETTLING_TIME
120

During the acquisition of the resource groups on cluster startup, you can also see the settling time value by running the clRGinfo -t command as shown in Example 10-1.

Example 10-1 Displaying the RG settling time

#/usr/es/sbin/cluster/utilities/clRGinfo -t
-----------------------------------------------------------------------------
Group Name       Group State        Node           Delayed Timers
-----------------------------------------------------------------------------
settling_rg1     OFFLINE            cobra          120 Seconds
                 OFFLINE            python         120 Seconds

settling_rg2     OFFLINE            viper          120 Seconds
                 OFFLINE            python         120 Seconds

Note that a settling time with a non-zero value will be displayed only during the acquisition of the resource group. The value will be set to 0 after the settling time expires and the resource group is acquired by the appropriate node.

Settling time scenario
In order to demonstrate how this feature functions, we imagined a settling time scenario and configured a 3-node cluster using two resource groups. In our scenario, we showed the following characteristics:

1. The settling time period is enforced and the resource group is not acquired on the node startup (as long as the node is not the highest priority node) until the settling time expires.

2. If the highest priority node joins the cluster during the settling period, then it does not wait for settling time to expire and acquires the resource group immediately.

We specified a settling time of 10 minutes and configured two resource groups named Settling1_RG and Settling2_RG to use the startup policy, Online on First Available Node. We set the nodelist for each resource group so that they would fallover from nodes thrish and kaitlyn respectively to node mike. Figure 10-1 shows a diagram of our configuration and the sequence in which cluster services were started on each node.


Figure 10-1 Settling time scenario

We took the following steps:

1. With cluster services inactive on all nodes, we defined a settling time value of 600 seconds.

2. We synchronized the cluster.

During cluster synchronization, the following messages were added to the cluster log:

The Resource Group Settling time value is: 120 secs.
The Resource Group(s) affected by the settling time are: settling_rg1 settling_rg2

3. We started cluster services on node mike.

We started cluster services on this node because it was the last on the node list for both resource groups. After starting cluster services, neither resource group was acquired by node mike. Running the clRGinfo -t command displayed the 600 seconds settling time.

The messages shown in Example 10-2 were logged against the hacmp.out log file.

(Diagram: nodes thrish, kaitlyn, and mike; Settling1_RG [nodelist: thrish mike] and Settling2_RG [nodelist: kaitlyn mike], both Online on First Available Node, with a settling time of 600 seconds. Cluster services are started on mike first, on kaitlyn before the settling time completes, and on thrish after the settling time.)


Example 10-2 Checking settling time in /var/hacmp/log/hacmp.out

#tail -f /var/hacmp/log/hacmp.out
No action taken on resource group 'settling_rg1'
The Resource Group 'settling_rg1' has been configured
to use 600 seconds Settling Time. This group will be
processed when the timer expires.

No action taken on resource group 'settling_rg2'
The Resource Group 'settling_rg2' has been configured
to use 600 seconds Settling Time. This group will be
processed when the timer expires.

4. We started cluster services on the node kaitlyn.

We started cluster services about 2 minutes later within the settling time period, and Settling2_RG was acquired immediately. Because the highest priority node joined the cluster during the settling period, it was not necessary to wait for the remaining 8 minutes to expire, and the resource group was acquired immediately. Settling1_RG remained offline as expected.

5. We waited for the settling time to expire.

Upon the expiration of the settling time, Settling1_RG was acquired by node mike. Because the first node in the nodelist (thrish) did not become available within the settling time period, the resource group was acquired on the next node in the nodelist (mike).

10.2 Node distribution policy
One of the startup policies that can be configured for resource groups is Online Using Node Distribution policy.

This policy spreads the resource groups that use it across the cluster nodes in such a way that only one such resource group is acquired by any node during startup. It can be used, for instance, to distribute CPU-intensive applications across different nodes.

If two or more resource groups are offline when a particular node joins the cluster, this policy determines which resource group is brought online based on the following criteria:

1. The resource group with the least number of participating nodes will be acquired.

2. In case of tie, the resource group to be acquired is chosen using alphabetical order.


3. A parent resource group is preferred over a resource group that does not have any child resource group.

10.2.1 Configuring a RG node-based distribution policy
To configure this type of startup policy, follow these steps:

1. Enter the smitty hacmp fast path and select Extended Configuration → Extended Resource Configuration → HACMP Extended Resource Group Configuration → Add a Resource Group and press Enter.

2. Type in a resource group name.

3. Select a startup policy of Online Using Node Distribution Policy and press Enter as shown in Example 10-3.

Example 10-3 Configuring resource group node-based distribution policy

Add a Resource Group (extended)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Resource Group Name                                 []
* Participating Nodes (Default Node Priority)         []

  Startup Policy                                      Online On Home Node O>
  Fallover Policy                                     Fallover To Next Prio>
  Fallback Policy                                     Fallback To Higher Pr>
  +----------------------------------------------------------------------+
  |                            Startup Policy                            |
  |                                                                      |
  |  Move cursor to desired item and press Enter.                        |
  |                                                                      |
  |    Online On Home Node Only                                          |
  |    Online On First Available Node                                    |
  |    Online Using Node Distribution Policy                             |
  |    Online On All Available Nodes                                     |
  |                                                                      |
  |  F1=Help     F2=Refresh     F3=Cancel                                |
  |  F8=Image    F10=Exit       Enter=Do                                 |
  |  /=Find      n=Find Next                                             |
  +----------------------------------------------------------------------+


10.2.2 Node-based distribution scenario
In order to show how this feature functions and understand the difference between this policy and the Online On Home Node Only policy, we imagined a node-based distribution scenario and configured a two-node cluster having 5 resource groups, with 3 of them using the Online Using Node Distribution policy and 2 of them using the Online On Home Node Only policy. Cluster nodes and resource groups are shown in Figure 10-2. Note that the number of resource groups having the Online Using Node Distribution policy is greater than the number of cluster nodes.

Figure 10-2 Online Using Node Distribution policy scenario

For our scenario, we took the following steps:

1. We started cluster services on node cobra. Resource group APP1_Rg was acquired as expected. Distributed_Rg1, Distributed_Rg2 and Distributed_Rg3 have no child resource groups and the same number of participating nodes. So Distributed_Rg1 was acquired due to alphabetical order.

2. We started cluster services on node viper. Resource group APP2_Rg was acquired as expected. Distributed_Rg2 and Distributed_Rg3 have no child resource groups and the same number of participating nodes, so Distributed_Rg2 was acquired due to alphabetical order.

3. The third resource group, Distributed_Rg3, remained offline.
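The resulting placement can be confirmed with the clRGinfo command. The output below is illustrative of what we observed (column formatting differs by version):

# /usr/es/sbin/cluster/utilities/clRGinfo
------------------------------------------------------------------
Group Name          Group State       Node
------------------------------------------------------------------
APP1_Rg             ONLINE            cobra
APP2_Rg             ONLINE            viper
Distributed_Rg1     ONLINE            cobra
Distributed_Rg2     ONLINE            viper
Distributed_Rg3     OFFLINE           cobra
                    OFFLINE           viper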

(Diagram: nodes cobra and viper. Distributed_Rg1 [participating: cobra viper], Distributed_Rg2 [participating: viper cobra], and Distributed_Rg3 [participating: cobra viper] all use Online Using Node Distribution; APP1_Rg [participating: cobra viper] and APP2_Rg [participating: viper cobra] use Online on Home Node Only. Distributed_Rg3 is left as a homeless resource group.)


10.3 Dynamic node priority (DNP)
The default node priority order for a resource group is the order in the participating node list. Implementing a dynamic node priority for a resource group allows you to go beyond the default fallover policy behavior and influence the destination of a resource group upon fallover based on the following three RMC pre-configured attributes:

cl_highest_free_mem - node with the highest percentage of free memory
cl_highest_idle_cpu - node with the most available processor time
cl_lowest_disk_busy - node with the least busy disks

The cluster manager queries the RMC subsystem every three minutes to obtain the current value of these attributes on each node and distributes them cluster wide. The interval at which the queries of the RMC subsystem are performed is not user-configurable. During a fallover event of a resource group with Dynamic Node Priority configured, the most recently collected values are used in the determination of the best node to acquire the resource group.

In order for DNP to be effective, note these considerations:

• DNP is irrelevant for clusters comprising fewer than three nodes.
• DNP is irrelevant for concurrent resource groups.
• DNP is most useful in a cluster where all nodes have equal processing power and memory.

10.3.1 Configuring the dynamic node priority policy
When DNP is set up for a resource group, there can be no resources already assigned to the resource group. You need to assign the fallover policy of Dynamic Node Priority at the time when the resource group is created. In order for your resource group to use one of the three DNP policies, you must set the fallover policy as shown in Example 10-4.

1. Enter the smitty hacmp fast path and select Extended Configuration → Extended Resource Configuration → Extended Resource Group Configuration → Add a Resource Group and press Enter.

Attention: The highest free memory calculation is performed based on the amount of paging activity taking place. It does not take into account whether one cluster node has less real physical memory than another.


Example 10-4 Adding a resource group using DNP

Add a Resource Group (extended)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Resource Group Name                                 []
* Participating Nodes (Default Node Priority)         []                    +

  Startup Policy                                      Online On Home Node O> +
  Fallover Policy                                     Fallover To Next Prio> +
  Fallback Policy                                      Fallback To Higher Pr> +

Set the Fallover Policy field to Fallover Using Dynamic Node Priority.

2. Assign the resources to the resource group by selecting Extended Configuration → Extended Resource Configuration → Extended Resource Group Configuration → Change/Show Resources and Attributes for a Resource Group and press Enter, as shown in Example 10-5.

Example 10-5 Selecting the dynamic node priority policy to use

Change/Show All Resources and Attributes for a Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                   [Entry Fields]
  Resource Group Name                                 DNP_test1
  Participating Nodes (Default Node Priority)         alexis jessica jordan
* Dynamic Node Priority Policy                        []                    +

  Startup Policy                                      Online On Home Node O>
  Fallover Policy                                     Fallover Using Dynami>
  Fallback Policy                                     Fallback To Higher Pr>

3. Select one of the three available policies from the pull-down list:

– cl_highest_free_mem
– cl_highest_idle_cpu
– cl_lowest_disk_busy

Continue selecting the resources that will be part of the resource group.

4. Verify and synchronize the cluster.


You can display the current DNP policy for an existing resource group as shown in Example 10-6.

Example 10-6 Displaying DNP policy for a resource group

root@ xdsvc1[] odmget -q group=test_rg HACMPresource|more

HACMPresource:
        group = "test_rg"
        name = "NODE_PRIORITY_POLICY"
        value = "cl_highest_free_mem"
        id = 21
        monitor_method = ""

10.3.2 Changing an existing resource group to use DNP policy
You cannot change the fallover policy to DNP if there are any resources currently part of the resource group. The SMIT fast path for changing a resource group will return an error if you attempt to do so without first removing the resources.

In order to change the policy, you can:

• Remove the resource group and recreate it, selecting Fallover Using Dynamic Node Priority as the fallover policy.

or

• Enter the Change/Show Resources and Attributes for a Resource Group panel, remove all resources that are part of the resource group, and press Enter. Then, go to the SMIT path Extended Configuration → Extended Resource Configuration → Extended Resource Group Configuration → Change a Resource Group and set the Fallover policy to DNP. You can then read the resources back into the resource group and synchronize the cluster for the changes to take effect.

10.3.3 How dynamic node priority functions
ClstrmgrES polls the Resource Monitoring and Control (ctrmc) daemon every three minutes and maintains a table that stores the current memory, CPU, and disk I/O state of each node.

Note: Using the information retrieved directly from the ODM is for informational purposes only, because the format within the stanzas might change with updates and/or new versions.

Hardcoding ODM queries within user defined applications is not supported and should be avoided.


The resource monitors that contain the information for each policy are:

• IBM.PhysicalVolume
• IBM.Host

Each of these monitors can be queried during normal operation by running the commands shown in Example 10-7.

Example 10-7 Querying resource monitors

root@ xdsvc1[] lsrsrc -Ad IBM.Host | grep TotalPgSpFree
        TotalPgSpFree    = 128829
        PctTotalPgSpFree = 98.2887
root@ xdsvc1[] lsrsrc -Ad IBM.Host | grep PctTotalTimeIdle
        PctTotalTimeIdle = 99.0069
root@ xdsvc1[] lsrsrc -Ap IBM.PhysicalVolume
Resource Persistent Attributes for IBM.PhysicalVolume
resource 1:
        Name             = "hdisk2"
        PVId             = "0x000fe401 0xd39e2344 0x00000000 0x00000000"
        ActivePeerDomain = ""
        NodeNameList     = {"xdsvc1"}
resource 2:
        Name             = "hdisk1"
        PVId             = "0x000fe401 0xd39e0575 0x00000000 0x00000000"
        ActivePeerDomain = ""
        NodeNameList     = {"xdsvc1"}
resource 3:
        Name             = "hdisk0"
        PVId             = "0x000fe401 0xafb3c530 0x00000000 0x00000000"
        ActivePeerDomain = ""
        NodeNameList     = {"xdsvc1"}
root@ xdsvc1[] lsrsrc -Ad IBM.PhysicalVolume
Resource Dynamic Attributes for IBM.PhysicalVolume
resource 1:
        PctBusy   = 0
        RdBlkRate = 0
        WrBlkRate = 39
        XferRate  = 4
resource 2:
        PctBusy   = 0
        RdBlkRate = 0
        WrBlkRate = 0
        XferRate  = 0
resource 3:
        PctBusy   = 0
        RdBlkRate = 0
        WrBlkRate = 0
        XferRate  = 0


You can display the current table maintained by clstrmgrES by running the command shown in Example 10-8.

Example 10-8 DNP values maintained by cluster manager

root@ xdsvc1[] lssrc -ls clstrmgrES
Current state: ST_RP_RUNNING
sccsid = "@(#)36 1.135.1.91 src/43haes/usr/sbin/cluster/hacmprd/main.C, hacmp.pe, 53haes_r550, 0845B_hacmp550 10/21/08 13:31:47"
i_local_nodeid 0, i_local_siteid -1, my_handle 1
ml_idx[1]=0
tp is 20477d48
Events on event queue:
te_type 1, te_nodeid 1, te_network -1
There are 0 events on the Ibcast queue
There are 0 events on the RM Ibcast queue
CLversion: 10
local node vrmf is 5500
cluster fix level is "0"
The following timer(s) are currently active:
Current DNP values
DNP Values for NodeId - 0  NodeName - xdsvc1
        PgSpFree = 0  PvPctBusy = 0  PctTotalTimeIdle = 0.000000
DNP Values for NodeId - 0  NodeName - xdsvc2
        PgSpFree = 0  PvPctBusy = 0  PctTotalTimeIdle = 0.000000
DNP Values for NodeId - 0  NodeName - xdsvc3
        PgSpFree = 0  PvPctBusy = 0  PctTotalTimeIdle = 0.000000

The values in the table are used for the DNP calculation in the event of a fallover. If clstrmgrES is in the middle of polling the current state when a fallover occurs, then the value last taken when the cluster was in a stable state is used to determine the DNP.

A detailed scenario of using DNP can be found in Implementing High Availability Cluster Multi-Processing (HACMP) Cookbook, SG24-6769.

10.4 Delayed fallback timer
This feature allows you to configure the fallback behavior of a resource group to occur at one of the predefined recurring times: daily, weekly, monthly, yearly. Alternatively, you can specify a particular date and time. This feature can be useful for scheduling fallbacks to occur during off-peak business hours. The diagram in Figure 10-3 shows how the delayed fallback timers can be used.


Figure 10-3 Delayed fallback timer usage

Consider a simple scenario with a cluster having 2 nodes and a resource group. In the event of a node failure, the resource group will fall over to the standby node. The resource group will remain on that node until the fallback timer expires. If cluster services are active on the primary node at that time, the resource group will fall back to the primary node. If the primary node is not available at that moment, then the fallback timer is reset and the fallback is postponed until the timer expires again.

Delayed fallback timer behavior
When using delayed fallback timers, observe these considerations:

• The delayed fallback timer applies only to resource groups having fallback policy set to Fallback To Higher Priority Node In The List.

• If there is no higher priority node available when the timer expires, then the resource group remains online on the current node. The timer is reset and the fallback will be retried when the timer expires again.

• If a specific date is used for a fallback timer and at that moment there is not any higher priority node, then the fallback will not be rescheduled.

• If a resource group being part of an Online on the Same Node dependency relationship has a fallback timer, then the timer will apply to all resource groups that are part of the Online on the Same Node dependency relationship.

• When using an Online on the Same Site dependency relationship, if a fallback timer is used for a resource group, then it must be identical for all resource groups being part of the same dependency relationship.

(Diagram: Phase 1, a failure causes the resource group to fall over from the primary node to the standby node. Phase 2, the resource group operates on the standby node until the specified delayed fallback timer [Sunday 12:00 PM] expires. Phase 3, the resource group falls back to the highest priority node in the nodelist if cluster services are running on it at that time.)


• You cannot configure the delayed fallback timer feature using the Initialization and Standard Configuration menu. This feature can only be configured using the Extended Configuration path.

Configuring delayed fallback timers
To configure the delayed fallback timer, do the following steps:

1. Define a fallback timer policy.

2. Assign the fallback timer policy to a resource group.

To configure a delayed fallback policy, do the following steps:

1. Use the smitty hacmp fast path, then select Extended Configuration → Extended Resource Configuration → Configure Resource Group Run-Time Policies → Configure Delayed Fallback Timer Policies → Add a Delayed Fallback Timer Policy and press Enter.

2. Select one of the following choices:

– Daily
– Weekly
– Monthly
– Yearly
– Specific Date

3. Specify the following data:

– Name of Fallback Policy:

Specify the name of the policy using no more than 32 characters. Use alphanumeric characters and underscores only. Do not use a leading numeric value or any reserved words.

– Policy specific values:

Based on the previous selection enter values suitable for the policy selected.

To assign a fallback timer policy to a resource group, do the following steps:

1. Use the smitty hacmp fast path and select Extended Configuration → Extended Resource Configuration → HACMP Extended Resource Group Configuration. Select the desired resource group from the list and press Enter.


2. Press the F4 key to select one of the policies configured in the previous steps. You should see something similar to the display shown in Example 10-9.

Example 10-9 Assigning a fallback timer policy to a resource group

Change/Show All Resources and Attributes for a Resource Group

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[TOP] [Entry Fields] Resource Group Name test_rg Participating Nodes (Default Node Priority) xdsvc1 xdsvc2* Dynamic Node Priority Policy [cl_highest_free_mem] +

Startup Policy Online On Home Node Only Fallover Policy Fallover Using Dynamic Node Priori> Fallback Policy Fallback To Higher Priority Node I> Fallback Timer Policy (empty is immediate) [] +

Service IP Labels/Addresses [] + Application Servers [] +

Volume Groups [] + Use forced varyon of volume groups, if necessary false + Automatically Import Volume Groups false + Filesystems (empty is ALL for VGs specified) [] + Filesy+--------------------------------------------------------------------------+ + Filesy| Fallback Timer Policy (empty is immediate) | + Filesy| | + Filesy| Move cursor to desired item and press Enter. | + | | + Filesy| timer_policy1 | + Stable| timer_policy2 | +[MORE...| | | F1=Help F2=Refresh F3=Cancel |F1=Help | F8=Image F10=Exit Enter=Do |F5=Reset| /=Find n=Find Next |F9=Shell+--------------------------------------------------------------------------+

3. Select the desired fallback timer policy from the picklist and press Enter.

4. Add any additional resources to the resource group and press Enter.

5. Run a verification and synchronization on the cluster to propagate the changes to all cluster nodes.

Displaying delayed fallback timers in a resource group
You can display existing fallback timer policies for resource groups using the clshowres command, as shown in Example 10-10:

Example 10-10 Displaying resource groups having fallback timers

root@ xdsvc1[] clshowres|grep -ip test_rg|egrep -i "resource group|timer"
Resource Group Name                          test_rg
Delayed Fallback Timer                       timer_policy1


An alternative is to query the HACMPtimer object class as shown in Example 10-11.

Example 10-11 Displaying fallback timers using ODM queries

root@ xdsvc1[] odmget HACMPtimer

HACMPtimer:
        policy_name = "timer_policy2"
        recurrence = "daily"
        year = 0
        month = 0
        day_of_month = 1
        week_day = 0
        hour = 11
        minutes = 11

HACMPtimer:
        policy_name = "timer_policy1"
        recurrence = "daily"
        year = 0
        month = 0
        day_of_month = 1
        week_day = 0
        hour = 22
        minutes = 22

HACMPtimer:
        policy_name = "timer_policy3"
        recurrence = "once"
        year = 110
        month = 0
        day_of_month = 13
        week_day = 0
        hour = 12
        minutes = 11

A detailed scenario of using delayed fallback timers can be found in Implementing High Availability Cluster Multi-Processing (HACMP) Cookbook, SG24-6769.

Note: Using the information retrieved directly from ODM is for informational purposes only, because the format of the stanzas might change with updates, or new versions.

Hardcoding ODM queries within user defined applications is not supported and should be avoided.


10.5 Resource group dependencies
Large business environments commonly run increasingly sophisticated business solutions. Complex applications often contain multiple modules that rely on various resources being available. Highly available applications with a multi-tiered architecture can use PowerHA capabilities to ensure that all required resources are kept available and are started in the proper order. PowerHA can include the components used by an application in resource groups and establish resource group dependencies that accurately reflect the logical relationships between application components.

For instance, a database must be online before the application server is started. If the database goes down and falls over to a different node, then the resource group containing the application server would also be brought down and back up on any of the available cluster nodes. If the fallover of the database resource group is not successful, then both resource groups (database and application) will be put offline.

To understand how PowerHA can be used to ensure high availability of multi-tiered applications, it is necessary to understand the following concepts:

� Parent resource group:

The parent resource group is the first resource group to be acquired during resource group acquisition. It does not have any other resource group as a prerequisite. Include here the application components or modules that do not rely on the presence of other components or modules.

� Child resource group:

A child resource group is dependent on a parent resource group. This type of resource group assumes the existence of another resource group. Include here the application components or modules that rely on the availability of other components or modules.

A child resource group cannot be brought online unless its parent resource group is online. If the parent resource group is put offline, then the child resource group is also put offline.

� Parent/child dependency:

A parent/child dependency binds resource groups in a hierarchical manner. A dependency hierarchy can span at most three levels, and a resource group can act as both a parent and a child. You cannot specify circular dependencies between resource groups. Additionally, you can configure a location dependency between resource groups to control the collocation of your resource groups.

� Location dependency:

Resource group location dependency gives you the means to ensure that certain resource groups will always be online on the same node or site, or that certain resource groups will always be online on different nodes or sites.

With PowerHA, you can configure the following types of resource group dependencies:

• Parent/child dependency
• Resource group location dependency

10.5.1 Resource group parent/child dependency
You can configure parent/child dependencies between resource groups to ensure that resource groups are processed properly during cluster events.

Planning for parent/child resource group dependencies
When planning to use parent/child resource group dependencies, take into account the following considerations:

� Plan carefully which resource groups will contain which application component. Ensure that application components that rely on the availability of other components are placed in different resource groups. Resource group parent/child relationship should reflect the logical dependency between application components.

� Parent/child relationship can span up to three levels.

� There should be no circular dependencies between resource groups.

� A resource group can act as a parent for a resource group and as a child for another resource group.

� Plan for application monitors for each application that you are planning to include in a child or parent resource group.

� For an application in a parent resource group, configure a monitor in startup monitoring mode, so that the child resource groups are acquired only after the application in the parent resource group is fully started. After the parent resource group is online, the child resource groups are also brought online.

Configuring a resource group parent/child dependency
To configure a parent/child resource group dependency, do the following steps:

1. Use the smitty hacmp fast path and select Extended Configuration → HACMP Extended Resource Configuration → Configure Resource Group Run-Time Policies → Configure Dependencies between Resource Groups → Configure Parent/Child Dependency → Add Parent/Child Dependency between Resource Groups, and press Enter.

2. Fill in the fields as follows:

– Parent Resource Group:

Select the parent resource group from the list. During resource group acquisition the parent resource group will be brought online before the child resource group.

– Child Resource Group:

Select the child resource group from the list and press Enter. During resource group release, HACMP will bring the child resource group offline before the parent resource group.

PowerHA will prevent you from specifying a circular dependency.

3. Use the Verify and Synchronize option to validate the dependencies and propagate them on all cluster nodes.
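
After synchronization completes, you can confirm the new relationship from any cluster node with the same clrgdependency command that is shown in 10.5.4. The resource group names in this brief sketch are examples only:

root@ xdsvc1[] clrgdependency -t PARENT_CHILD -sl
# Parent     Child
db_rg        app_rg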

10.5.2 Resource group location dependency
You can configure location dependencies between resource groups to control the location of resource groups during cluster events.

With PowerHA you can configure the following types of resource group location dependencies:

• Online on the Same Node
• Online on the Same Site
• Online on Different Nodes

You can combine resource group parent/child and location dependencies.

Planning for Online on the Same Node Dependency
When planning to use Online on the Same Node resource group dependencies, take into account the following considerations:

� All resource groups having an Online on the Same Node dependency relationship must have the same node list and the participating nodes must be listed in the same order.

� Both concurrent and non-concurrent resource groups are allowed.

� You can have more than one Online on the Same Node dependency relationship in the cluster.

� All non-concurrent resource groups in the same Online on the Same Node dependency relationship must have identical startup/fallover/fallback policies.

– Online Using Node Distribution Policy is not allowed for startup policy.

– If Dynamic Node Priority policy is being used as the fallover policy, then all resource groups in the dependency must use the same DNP policy.

– If one resource group has a fallback timer configured, then the timer also applies to the other resource groups that take part in the dependency relationship. All resource groups must have identical fallback timer settings.

– If one or more resource groups in the Online on the Same Node dependency relationship fail, cluster services try to place all resource groups on a node that can host all the resource groups that are currently online plus one or more of the failed resource groups.

Configuring Online on the Same Node location dependency
To configure an Online on the Same Node resource group dependency, do the following steps:

1. Use the smitty hacmp fast path and select Extended Configuration → HACMP Extended Resource Configuration → Configure Resource Group Run-Time Policies → Configure Dependencies between Resource Groups → Configure Online on the Same Node Dependency → Add Online on the Same Node Dependency Between Resource Groups, and select the resource groups that will be part of that dependency relationship.

In order to have resource groups activated on the same node, they must have identical participating node lists.

2. To propagate the change across all cluster nodes, remember to verify and synchronize your cluster.
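
After synchronization, one informal way to confirm which resource groups belong to a same-node set is to query the location dependency object class, as also shown in 10.5.4. This is for information only; as noted later in this chapter, do not hardcode such ODM queries in applications:

root@ xdsvc1[] odmget -q "loc_dep_type=NODECOLLOCATION" HACMPrg_loc_dependency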

Planning for Online On Different Nodes Dependency
When you configure resource groups in the Online On Different Nodes Dependency relationship, you assign priorities to each resource group in case there is contention for a given node at any point in time. You can assign High, Intermediate, and Low priorities. Higher priority resource groups take precedence over lower priority resource groups upon startup, fallover, and fallback.

When planning to use Online on Different Nodes resource group dependencies, take into account the following considerations:

� Only one Online On Different Nodes dependency is allowed per cluster.

� Each resource group should have a different home node for startup.

� When using this policy, a higher priority resource group takes precedence over a lower priority resource group during startup, fallover, and fallback:

– If a resource group with High priority is online on a node, then no other resource group being part of the Online On Different Nodes dependency can be put online on that node.

– If a resource group that is part of the Online On Different Nodes dependency is online on a cluster node, and another resource group in the same dependency with a higher priority falls over or falls back to that node, then the higher priority resource group is brought online, whereas the lower priority resource group is taken offline or migrated to another cluster node, if one is available.

– Resource groups that are part of the Online On Different Nodes dependency and have the same priority cannot be brought online on the same cluster node. Among such resource groups of the same priority, precedence is determined by the alphabetical order of their names.

– Resource groups that are part of the Online On Different Nodes dependency and have the same priority do not cause each other to be moved from a cluster node after a fallover or fallback.

– If a parent/child dependency is being used, then the child resource group cannot have a priority higher than its parent.

Configuring Online on Different Nodes location dependency
To configure an Online on Different Nodes resource group dependency, do the following steps:

1. Use the smitty hacmp fast path and select Extended Configuration → HACMP Extended Resource Configuration → Configure Resource Group Run-Time Policies → Configure Dependencies between Resource Groups → Configure Online on Different Nodes Dependency → Add Online on Different Nodes Dependency between Resource Groups, and press Enter.

2. Fill in the following fields as explained here and press Enter:

– High Priority Resource Group(s):

Select the resource groups that will be part of the Online On Different Nodes dependency and should be acquired and brought online before all other resource groups.

On fallback and fallover, these resource groups are processed simultaneously and brought online on different cluster nodes before any other resource groups. If different cluster nodes are unavailable for fallover or fallback, then these resource groups, having the same priority level, can remain on the same node.

The highest relative priority within this set is the resource group listed first.

– Intermediate Priority Resource Group(s):

Select the resource groups that will be part of the Online On Different Nodes dependency and should be acquired and brought online after high priority resource groups and before the low priority resource groups.

On fallback and fallover, these resource groups are processed simultaneously and brought online on different target nodes before low priority resource groups. If different target nodes are unavailable for fallover or fallback, these resource groups, having same priority level, can remain on the same node.

The highest relative priority within this set is the resource group listed first.

– Low Priority Resource Group(s):

Select the resource groups that will be part of the Online On Different Nodes dependency and that should be acquired and brought online after all other resource groups.

On fallback and fallover, these resource groups are brought online on different target nodes after all the higher priority resource groups are processed.

Higher priority resource groups moving to a cluster node can cause these resource groups to be moved to another cluster node or be taken offline.

3. Continue configuring run-time policies for other resource groups or verify and synchronize the cluster.

Planning for Online on the Same Site Dependency
When planning to use Online on the Same Site resource group dependencies, take into account the following considerations:

� All resource groups in an Online on the Same Site dependency relationship must have the same Inter-Site Management policy. However, they might have different startup/fallover/fallback policies. If fallback timers are used, these must be identical for all resource groups part of the Online on the Same Site dependency.

� The fallback timer does not apply to moving a resource group across site boundaries.

� All resource groups in an Online on the Same Site dependency relationship must be configured so that the nodes that can own the resource groups are assigned to the same primary and secondary sites.

� Online Using Node Distribution policy is supported.

� Both concurrent and non-concurrent resource groups are allowed.

� You can have more than one Online on the Same Site dependency relationship in the cluster.

� All resource groups having an Online on the Same Site dependency relationship are required to be on the same site, even though some of them might be in OFFLINE or ERROR state.

� If you add a resource group being part of an Online on the Same Node dependency to an Online on the Same Site dependency, then you must add all other resource groups that are part of the Online on the Same Node dependency to the Online on the Same Site dependency.

Configuring Online on the Same Site location dependency
To configure an Online on the Same Site resource group dependency, do the following steps:

1. Use the smitty hacmp fast path and select Extended Configuration → HACMP Extended Resource Configuration → Configure Resource Group Run-Time Policies → Configure Dependencies between Resource Groups → Configure Online on the Same Site Dependency → Add Online on the Same Site Dependency Between Resource Groups, and press Enter.

2. Select from the list the resource groups to be put online on the same site. During acquisition these resource groups will be brought online on the same site according to the site and node startup policy specified for the resource groups. On fallback or fallover the resource groups are processed simultaneously and brought online on the same site.

3. Verify and synchronize the cluster.

10.5.3 Combining various dependency relationships
When combining multiple dependency relationships, you have to take into account the following:

� Only one resource group can belong to both an Online on the Same Node dependency relationship and an Online on Different Nodes dependency relationship.

� If a resource group belongs to both an Online on the Same Node dependency relationship and an Online on Different Nodes dependency relationship, then all other resource groups that are part of the Online on the Same Node dependency have the same priority as the common resource group.

� Only resource groups having the same priority and being part of an Online on Different Nodes dependency relationship can be part of an Online on the Same Site dependency relationship.

10.5.4 Displaying resource group dependencies
You can display resource group dependencies by using the clrgdependency command, as shown in Example 10-12.

Example 10-12 Displaying resource group dependencies

root@ xdsvc1[] clrgdependency -t PARENT_CHILD -sl
# Parent     Child
rg_parent    rg_child

An alternative is to query the HACMPrg_loc_dependency and HACMPrgdependency object classes as shown in Example 10-13.

Example 10-13 Displaying resource group dependencies using ODM queries

root@ xdsvc1[] odmget HACMPrgdependency

HACMPrgdependency:
        id = 0
        group_parent = "rg_parent"
        group_child = "rg_child"
        dependency_type = "PARENT_CHILD"
        dep_type = 0
        group_name = ""

root@ xdsvc1[] odmget HACMPrg_loc_dependency

HACMPrg_loc_dependency:
        id = 1
        set_id = 1
        group_name = "rg_same_node2"
        priority = 0
        loc_dep_type = "NODECOLLOCATION"
        loc_dep_sub_type = "STRICT"

HACMPrg_loc_dependency:
        id = 2
        set_id = 1
        group_name = "rg_same_node_1"
        priority = 0
        loc_dep_type = "NODECOLLOCATION"
        loc_dep_sub_type = "STRICT"

HACMPrg_loc_dependency:
        id = 4
        set_id = 2
        group_name = "rg_different_node1"
        priority = 1
        loc_dep_type = "ANTICOLLOCATION"
        loc_dep_sub_type = "STRICT"

HACMPrg_loc_dependency:
        id = 5
        set_id = 2
        group_name = "rg_different_node2"
        priority = 2
        loc_dep_type = "ANTICOLLOCATION"
        loc_dep_sub_type = "STRICT"

Note: Use information retrieved directly from the ODM for informational purposes only, because the format of the stanzas might change with updates or new versions.

Hardcoding ODM queries within user-defined applications is not supported and should be avoided.

Chapter 11. Customizing events

In this chapter we show how you can use PowerHA to recognize and react to cluster events. PowerHA has features that give you the ability to modify and adjust cluster behavior in response to specific events according to the requirements of your particular environment.

We discuss the following topics:

• Overview of cluster events
• Writing scripts for custom events
• Pre-event and post-event commands
• Automatic error notification

11.1 Overview of cluster events
When a cluster event occurs, the Cluster Manager runs the event script corresponding to that event. As the event script is being processed, a series of sub-event scripts can be run. PowerHA provides a script for each event and sub-event. The default scripts are located in the /usr/es/sbin/cluster/events directory. By default, the Cluster Manager calls the corresponding event script for a specific event. You can specify additional actions to be performed when a particular event occurs. You can customize the handling of a particular event for your cluster using the following features:

� Pre-event and post-event processing:

You can customize event processing according to the requirements of your particular environment by specifying commands or user-defined scripts that are run before or after a specific event is run by the Cluster Manager.

� Event notification:

You can specify a command or a user-defined script that provides notification that an event is about to happen or has just occurred. This command is run once before processing the event itself and once again as the last step of event processing.

� Event recovery and retry:

You can specify a command that attempts to recover from an event failure. This command is run only if the event script fails. After the recovery command is run, the event script is run once again. You can also specify a counter that represents the maximum number of times the recovery command is run. If the event script still fails after the last attempt, the Cluster Manager declares the failure of that event.

� Cluster automatic error notification:

You can use the AIX error logging facility to detect hardware and software errors that are not monitored by cluster services by default, and trigger an appropriate response action.

� Customizing event duration:

PowerHA issues a warning each time a cluster event takes more time to complete than a specified time-out period. You can customize the time period allowed for a cluster event to complete before a warning is issued.

� Defining new events:

With PowerHA you can define new cluster events.

11.2 Writing scripts for custom events

Customizing cluster events requires writing scripts. Consider these suggestions (a minimal script skeleton illustrating them follows this list):

� Test all possible input parameters.

� Test all conditional branches, for example, all “if”, “case” branches.

� Handle all error (non-zero) exit codes.

� Provide correct return value: 0 for success, any other number for failure.

� Terminate within a reasonable amount of time.

� Test the scripts thoroughly as they can impact the behavior of your cluster.

� Consider that if your script fails, cluster event processing fails too.

� Make sure that a recovery program can recover from an event failure, otherwise the cluster will fail.

� Store your scripts in a convenient location.

� Thoroughly document your scripts.

� Remember to set the execute bit for all scripts.

� Keep in mind that synchronization does not copy pre-event and post-event script content from one node to another.

� You need to copy pre-event and post-event scripts to all cluster nodes.

� The name and location of scripts must be identical on all cluster nodes. However, the content of the scripts might be different.
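
The following minimal skeleton illustrates these suggestions. It is a sketch only; the script name, log file, and helper command are illustrative assumptions, not part of PowerHA:

#!/bin/ksh
# /usr/local/ha/post_node_down.sh - hypothetical post-event script skeleton
# The Cluster Manager passes event-specific arguments; log them for reference.
LOG=/tmp/post_node_down.log
print "$(date) post-event started with arguments: $*" >> $LOG

# Perform the site-specific action and catch any failure.
/usr/local/ha/notify_operator.sh "$@" >> $LOG 2>&1
if [ $? -ne 0 ]; then
    print "$(date) notification action failed" >> $LOG
    exit 1     # non-zero exit code: the event is treated as failed
fi

print "$(date) post-event completed" >> $LOG
exit 0         # zero exit code: success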

11.3 Pre-event and post-event commands
For all predefined events, you can define a pre-event, a post-event, a notification method, and a recovery command:

Pre-event script This script runs before the cluster event is run.

Post-event script This script runs after the cluster event is run.

Notify method The notification method runs before and after the cluster event. It usually sends a message to the system administrator about an event starting or completing.

Important: Your cluster will not continue processing events until your custom pre-event or post-event script has run.

Recovery command This runs only if the cluster event has failed. After the recovery command has completed, the event script is run again. You can also specify the maximum number of times the recovery command can be run in an attempt to recover from a cluster event script failure.

11.3.1 Parallel processed resource groups and usage of pre-event and post-event scripts

Resource groups, by default, are processed in parallel unless you specify a customized serial processing order for all or some of the resource groups in the cluster. When resource groups are processed in parallel, fewer cluster events occur in the cluster, and thus the number of particular cluster events for which you can create customized pre-event or post-event scripts is reduced.

Only the following events take place during parallel processing of resource groups:

• node_up
• node_down
• acquire_svc_addr
• acquire_takeover_addr
• release_svc_addr
• release_takeover_addr
• start_server
• stop_server

The following events do not occur during parallel processing of resource groups:

• get_disk_vg_fs
• release_vg_fs
• node_up_local
• node_up_remote
• node_down_local
• node_down_remote
• node_up_local_complete
• node_up_remote_complete
• node_down_local_complete
• node_down_remote_complete

Always pay particular attention to the list of events when you upgrade from an older version and choose parallel processing for some of the pre-existing resource groups in your configuration.

11.3.2 Configuring pre-event or post-event scripts
In order to define a pre-event or post-event script, you must create a custom event and then associate the custom event with a cluster event as follows:

1. Write and test your event script carefully. Ensure that you copy the file to all cluster nodes under the same path and name.

2. Define the custom event:

a. Use the smitty hacmp fast path and select Extended Configuration → Extended Event Configuration → Configure Pre/Post-Event Commands → Add a Custom Cluster Event.

b. Fill in the following information:

• Cluster Event Name: The name of the event.

• Cluster Event Description: A short description of the event.

• Cluster Event Script Filename: The full path of the event script.

3. Connect the custom event with pre/post-event cluster event:

a. Use the smitty hacmp fast path and select Extended Configuration → Extended Event Configuration → Change/Show Pre-Defined HACMP Events.

b. Select the event that you want to adjust.

c. Enter the following values:

• Notify Command (optional): The full path name of the notification command, if any.

• Pre-event Command (optional): The name of the custom cluster event that you want to run as a pre-event. You can choose from the list of custom cluster events that have been previously defined.

• Post-event Command (optional): The name of the custom cluster event that you want to run as a post-event. You can choose from the list of custom cluster events that have been previously defined.

• Recovery Command (optional): The full path name of the recovery script to be run in an attempt to recover from a cluster event failure.

• Recovery Counter: The maximum number of times to run the recovery command. The default value is 0.

Note: When trying to adjust the default behavior of an event script, always use pre-event or post-event scripts. Do not modify the built-in event script files. This option is neither supported, nor safe, because these files can be modified without notice when applying fixes or performing upgrades.

4. Verify and synchronize the cluster.

You can define multiple customized pre-event and post-event scripts for a particular cluster event.

11.4 Automatic error notification
By default, PowerHA monitors only cluster nodes, networks, and network adapters. However, in your particular environment there might be other events that should be monitored, and to which the cluster behavior must be adapted.

PowerHA provides a SMIT interface to the AIX Error Notification facility. This facility allows you to detect an event that is not specifically monitored by PowerHA (for example, a disk adapter failure) and to trigger a response to it.

Before you configure Automatic Error Notification, a valid cluster configuration must be in place.

Automatic error notification applies to selected hard, non-recoverable error types such as those related to disks or disk adapters. This utility does not support media errors, recovered errors, or temporary errors.

Enabling automatic error notification assigns one of two error notification methods for all error types as follows:

� The non-recoverable errors pertaining to resources that have been determined to represent a single point of failure are assigned the cl_failover method and will trigger a failover.

� All other non-critical errors are assigned the cl_logerror method and an error entry will be logged against the hacmp.out file.

Tip: You can use cluster file collection feature to ensure that custom event files will be propagated automatically to all cluster nodes.

Tip: If you use pre-event and post-event scripts to ensure proper sequencing and correlation of resources used by applications running on the cluster, you can consider simplifying or even eliminating them by specifying parent/child dependencies between resource groups.

PowerHA automatically configures error notifications and recovery actions for several resources and error types, including:

� All disks in the rootvg volume group

� All disks in cluster volume groups, concurrent volume groups, and file systems

� All disks defined as cluster resources

11.4.1 Disk monitoring consideration

Additionally, PowerHA can monitor both mirrored and non-mirrored volume groups, regardless of the disk type. When the loss of quorum is detected, an LVM_SA_QUORCLOSE entry is logged in the AIX error log, and PowerHA can initiate a takeover of the resource group that contains the volume group.

11.4.2 Setting up automatic error notification

PowerHA can add automatic error notifications on all nodes. Starting with HACMP 5.3, the Automatic Error Notification methods are added automatically during cluster verification and synchronization.

In order to set up automatic error notifications, do the following steps:

� Use the smitty hacmp fast path and select Problem Determination Tools → HACMP Error Notification → Configure Automatic Error Notification → Add Error Notify Methods for Cluster Resources.

11.4.3 Listing automatic error notification

In order to list automatic error notifications that are currently configured in your cluster, do the following steps:

1. Use the smitty hacmp fast path and select Problem Determination Tools → HACMP Error Notification → Configure Automatic Error Notification.

Note: Handling LVM_SA_QUORCLOSE for non-mirrored volume groups has been added in APAR IZ21648 for AIX 5.3 and APAR IZ21631 for AIX 6.1, and PowerHA 5.5 APAR IZ41771.

Note: You cannot configure automatic error notification while the cluster is running.

2. Select List Error Notify Methods for Cluster Resources.

The result should be similar to the output shown in Example 11-1.

Example 11-1 Sample list of automatic error notifications

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

xdsvc1:
xdsvc1: HACMP Resource        Error Notify Method
xdsvc1:
xdsvc1: hdisk0                /usr/es/sbin/cluster/diag/cl_failover
xdsvc1: hdisk1                /usr/es/sbin/cluster/diag/cl_logerror
xdsvc1: hdisk2                /usr/es/sbin/cluster/diag/cl_logerror
xdsvc2:
xdsvc2: HACMP Resource        Error Notify Method
xdsvc2:
xdsvc2: hdisk0                /usr/es/sbin/cluster/diag/cl_failover
xdsvc2: hdisk1                /usr/es/sbin/cluster/diag/cl_logerror
xdsvc2: hdisk2                /usr/es/sbin/cluster/diag/cl_logerror

F1=Help      F2=Refresh      F3=Cancel      F6=Command
F8=Image     F9=Shell        F10=Exit       /=Find
n=Find Next

11.4.4 Removing automatic error notification

In order to remove automatic error notifications, do the following steps:

� Use the smitty hacmp fast path and select Problem Determination Tools → HACMP Error Notification → Configure Automatic Error Notification → Remove Error Notify Methods for Cluster Resources, and press Enter to confirm.

11.4.5 Using error notification
The High Availability Cluster Multi-Processing for AIX Administration Guide, SC23-4862-11, contains a list of the hardware errors that are handled by the cluster automatic error notification utility, as well as a list of those that are not handled.

With PowerHA you can customize the error notification method for other devices and error types and define a specific notification method, rather than using one of the two automatic error notification methods.

In order to add a notify method, do the following steps:

1. Use the smitty hacmp fast path and select Problem Determination Tools → HACMP Error Notification → Add a Notify Method.

2. Define the notification object:

– Notification Object Name:

User supplied name that uniquely identifies the error notification object.

– Persist across system restart?

Yes: The error notification will persist after system reboot.

No: The error notification will be used until the system is next restarted.

– Process ID for use by Notify Method:

The error notification will be sent on behalf of the selected process ID. You should set Persist across system restart to No if you specify any non-zero process ID here.

– Select Error Class:

• None: Choose this value to ignore this entry.
• All: Match all error classes.
• Hardware: Match all hardware errors.
• Software: Match all software errors.
• Errlogger: Operator notifications and messages from the errlogger program.

– Select Error Type:

• None: Choose this value to ignore this entry.
• All: Match all error types.
• PEND: Impending loss of availability.
• PERF: Performance degradation.
• PERM: Permanent errors.
• TEMP: Temporary errors.
• UNKN: Unknown error type.

– Match Alertable errors?

This field is intended for use by the alert agents of system management applications. If you do not use such applications, leave this field set to None.

• None: Choose this value to ignore this entry.
• All: Alert all errors.
• Yes: Match alertable errors.
• No: Match non-alertable errors.

– Select Error Label:

Select the error label from the list. See the /usr/include/sys/errids.h file for a short description of error labels.

– Resource Name:

The name of the failing resource. For the hardware error class, this is the device name. For the software error class, this is the name of the failing executable. Select All to match all resource names.

– Resource Class:

For the hardware resource class, this is the device class. It is not applicable for software errors. Specify All to match all resource classes.

– Resource Type:

The type of the failing resource. For the hardware error class, this is the device type by which a resource is known in the devices object class. Specify All to match all resource types.

– Notify Method:

The full path name of the program to be run whenever an error that matches the criteria defined above is logged. You can pass the following variables to the executable (a sample notify method is shown after these steps):

$1: Error log sequence number
$2: Error identifier
$3: Error class
$4: Error type
$5: Alert flag
$6: Resource name of the failing device
$7: Resource type of the failing device
$8: Resource class of the failing device
$9: Error log entry label

3. Press Enter to create the error notification object.
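
For example, a simple notify method might log the error details and alert the administrator. The following sketch is illustrative only; the script path and the use of mail are assumptions, not part of PowerHA:

#!/bin/ksh
# /usr/local/ha/notify_disk_error.sh - hypothetical notify method
# The error daemon passes the fields listed above as $1 through $9.
MSG="Error label $9 (class $3, type $4) logged for resource $6 on $(hostname)"
print "$(date) $MSG" >> /tmp/error_notify.log
echo "$MSG" | mail -s "AIX error notification on $(hostname)" root
exit 0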

After an error notification has been defined, PowerHA offers the means to emulate it. You can emulate an error log entry with a selected error label. The error label will appear in the error log and the notification method will be run by errdemon.

In order to emulate a notify method, do the following steps:

� Use the smitty hacmp fast path and select Problem Determination Tools → HACMP Error Notification → Emulate Error Log Entry.

� Select the error label or notify method name from the pop-up list. Only notify methods that have an error label defined will be shown.

� SMIT shows the error label, notification object name, and the notify method. Press Enter to confirm error log entry emulation.

A detailed scenario of defining and using error notification can be found in Implementing High Availability Cluster Multi-Processing (HACMP) Cookbook, SG24-6769.

11.4.6 Customizing event duration

Cluster events run asynchronously and can take different times to complete. Because PowerHA has no means to detect whether an event script has hung, it runs a config_too_long event each time the processing of an event exceeds a certain amount of time. For such events, you can customize the time period that cluster services will wait for an event to complete before issuing the config_too_long warning message.

Cluster events can be divided into two classes as follows:

� Fast events:

These events do not include acquiring or releasing resources and normally take a shorter time to complete. For fast events, the time which PowerHA waits before issuing a warning is equal to Event Duration Time.

� Slow events:

These events involve acquiring and releasing resources or use application server start and stop scripts. Slow events can take a longer time to complete. Customizing event duration time for slow events allows you to avoid getting unnecessary system warnings during normal cluster operation. For slow events, the total time before receiving a config_too_long warning message is set to the sum of Event-only Duration Time and Resource Group Processing Time.
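
For example, with the default values used in the procedure that follows (180 seconds for each field), a fast event produces a config_too_long warning after 180 seconds, whereas a slow event produces it only after 180 + 180 = 360 seconds, that is, 6 minutes.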

In order to change the total event duration time before receiving a config_too_long warning message, do the following steps:

1. Use the smitty hacmp fast path and select HACMP Extended Configuration → Extended Event Configuration → Change/Show Time Until Warning.

2. Enter data in the fields as follows:

– Max. Event-only Duration (in seconds):

The maximum time (in seconds) to run a cluster event. The default is 180 seconds.

– Max. Resource Group Processing Time (in seconds):

The maximum time (in seconds) to acquire or release a resource group. The default is 180 seconds.

– Total time to process a Resource Group event before a warning is displayed:

The total time for the Cluster Manager to wait before running the config_too_long script. The default is 6 minutes. This field is the sum of the two other fields and is not editable.

3. Press Enter to apply the changes.

4. Verify and synchronize the cluster to propagate the changes.

11.4.7 Defining new events

With PowerHA you can define your own cluster events that run specified recovery programs. The events you define can be related to RMC resources. An RMC resource refers to an instance of a physical or logical entity that provides services to some other component of the system. Resources can refer to both hardware and software entities. For example, a resource could be a physical disk (IBM.PhysicalVolume) or a running program (IBM.Program). Resources of the same type are organized in classes. You can obtain information regarding resources and resource classes by using the lsrsrcdef command. The AIX resource monitor generates events for operating system-related resource conditions.

For more details regarding resources and RMC refer to the IBM RSCT Documentation.

Recovery programs
A recovery program consists of a sequence of recovery command specifications having the following format:

:node_set recovery_command expected_status NULL

Where:

� node_set:

The set of nodes on which the recovery program will run and can take one of the following values:

– all: The recovery command runs on all nodes.

– event: The node on which the event occurred.

– other: All nodes except the one on which the event occurred.

� recovery_command:

Quote-delimited string specifying a full path to the executable program. The command cannot include any arguments. Any executable program that requires arguments must be a separate script. The recovery program must have the same path on all cluster nodes. The program must specify an exit status.

� expected_status:

Integer status to be returned when the recovery command completes successfully. The Cluster Manager compares the actual status returned against the expected status. A mismatch indicates unsuccessful recovery. If you specify the character X in the expected status field, the Cluster Manager skips the comparison.

� NULL:

Not used, included for future functions.

Multiple recovery command specifications can be separated by the barrier command. All recovery command specifications before a barrier start in parallel. When a node encounters a barrier command, all nodes must reach the same barrier before the recovery program resumes.
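
For illustration, a complete recovery program file might look like the following sketch. The script paths are hypothetical; the node sets, the expected status values, and the barrier separator are used as described above:

# node_set  "recovery_command"                expected_status  NULL
event  "/usr/local/ha/cleanup_var.sh"         0  NULL
barrier
all    "/usr/local/ha/notify_admins.sh"       X  NULL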

In order to define your new event, do the following steps:

1. Use the smitty hacmp fast path and select HACMP Extended Configuration → Extended Event Configuration → Configure User-Defined Events → Add Custom User-Defined Events.

2. Enter data in the fields as follows:

– Event name:

The name of the event.

– Recovery program path:

Full path of the recovery program.

– Resource name:

RMC resource name.

– Selection String:

An SQL expression that includes attributes of the resource instance.

– Expression:

Relational expression between dynamic resource attributes. When the expression evaluates true it generates an event.

– Rearm expression:

Relational expression between dynamic resource attributes. Usually the logical inverse or complement of the event expression.

3. Press Enter to create the user-defined event.

An example of defining a user-defined event is shown in Example 11-2.

Example 11-2 Defining a user-defined event

Add a Custom User-Defined Event

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Event name                                          [user_defined_event]
* Recovery program path                               [/user_defined.rp]
* Resource name                                       [IBM.FileSystem]
* Selection string                                    [name = "/var"]
* Expression                                          [PercentTotUsed > 70]
  Rearm expression                                    [PercentTotUsed < 50]

F1=Help             F2=Refresh           F3=Cancel            F4=List
F5=Reset            F6=Command           F7=Edit              F8=Image
F9=Shell            F10=Exit             Enter=Do

The recovery program used was:

#Recovery Program for user-defined event
event "/usr/ha/trigger_user_defined_event.sh" 0 NULL

The /usr/ha/trigger_user_defined_event.sh script could perform any form of notification, such as writing to a log file, or sending an e-mail, an SMS message, or an SNMP trap.
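
A minimal sketch of such a script (its contents are purely illustrative) could be:

#!/bin/ksh
# /usr/ha/trigger_user_defined_event.sh - hypothetical notification action
# Called by the recovery program when the /var usage threshold is crossed.
print "$(date): user-defined event fired on $(hostname): /var above threshold" >> /tmp/user_defined_event.log
# Optionally alert the administrator as well:
# echo "/var above threshold on $(hostname)" | mail -s "PowerHA user event" root
exit 0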

For additional details regarding user-defined events, refer to the High Availability Cluster Multi-Processing for AIX Planning Guide, SC23-4861-11.

Chapter 12. Storage related considerations

In this chapter, we describe the different types of volume groups and explain how PowerHA utilizes each type.

We cover the following topics:

• Volume group types:
  – Enhanced concurrent
  – Non-concurrent
  – Concurrent
• Disk reservations
• Forced varyon of volume groups
• Fast disk takeover
• Disk heartbeat
• Fast failure detection

12.1 Volume group types

It is important to understand the different types of volume groups and how PowerHA utilizes each type. We cover the following volume group types:

• Enhanced concurrent
• Non-concurrent
• Concurrent

There are other volume group attributes that can coexist with the types listed above, such as “big” and “scalable”. Generally speaking, you can combine these attributes with the types just listed; for example, you can have a big enhanced concurrent volume group. However, these additional attributes do not affect how PowerHA activates the volume groups: PowerHA activates a big enhanced concurrent volume group using the same method as though it were only an enhanced concurrent volume group.

12.1.1 Enhanced concurrent

Enhanced concurrent volume groups were first introduced in AIX 5.1. Unlike concurrent volume groups (which are SSA-only), they are supported on any disk subsystem that is supported in a shared AIX PowerHA configuration on POWER systems. In AIX 5.2 and above, the enhanced concurrent volume group type is the only concurrent type available.

Enhanced concurrent volume groups use the Group Services Concurrent Logical Volume Manager (gsclvmd) daemon, which communicates over IP to other cluster member nodes.

Utilizing gsclvmd, most LVM changes can be made dynamically, even from the command line. For these dynamic changes to work correctly, it is required to have gsclvmd, topsvcs, grpsvcs, and emsvcs running while performing maintenance. This is easily done by having the PowerHA cluster up and running with your volume groups online in concurrent mode.

Enhanced concurrent volume groups can be used in both concurrent and non-concurrent environments. Features such as fast disk takeover and disk heartbeat depend on them.

Note: C-SPOC is the recommended best practice for all cluster LVM administration. Unlike the command line, it is not dependent on cluster services (other than clcomdES) to be running on the member nodes.

Existing non-concurrent volume groups can be converted to enhanced concurrent without losing any storage space. The volume group must be online when you change it, which is done by running the chvg -c vgname command. For this change to take effect on the other nodes, the volume group must be offline there and either exported and re-imported, or you can use the lazy update option of importvg by running the importvg -L vgname pvname command.

To create a new enhanced concurrent volume group on a local node from the command line, simply run mkvg -C vgname pvname.
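
As a minimal sketch (the volume group and disk names are examples; adapt them to your cluster), the sequence looks like this:

# On the node that currently has the volume group varied on:
chvg -c app1vg                # convert app1vg to enhanced concurrent mode

# On every other cluster node (the volume group must be offline there):
importvg -L app1vg hdisk2     # lazy update picks up the new VG characteristics

# Creating a new enhanced concurrent volume group from scratch:
mkvg -C -y app1vg hdisk2 hdisk3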

You can determine whether a volume group is enhanced concurrent by running lsvg vgname and checking the Concurrent: field; it should show Enhanced-Capable, as shown in Figure 12-1.

Attention: When configuring enhanced concurrent volume groups in the cluster, ensure that multiple networks (IP and non-IP) exist for communication between the nodes in the cluster, to avoid cluster partitioning. When fast disk takeover is used, the normal SCSI reserve that prevents multiple nodes from accessing the volume group is not set.

Maddi / > lsvg app1vg
VOLUME GROUP:       app1vg                   VG IDENTIFIER:   0022be2a00004c48
VG STATE:           active                   PP SIZE:         16 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:       1190
MAX LVs:            256                      FREE PPs:        1180
LVs:                0                        USED PPs:        10
OPEN LVs:           0                        QUORUM:          2
TOTAL PVs:          2                        VG DESCRIPTORS:  3
STALE PVs:          0                        STALE PPs:       0
ACTIVE PVs:         2                        AUTO ON:         no
Concurrent:         Enhanced-Capable         Auto-Concurrent: Disabled
VG Mode:            Non-Concurrent
MAX PPs per PV:     1016                     MAX PVs:         32
LTG size:           128 kilobyte(s)          AUTO SYNC:       no
HOT SPARE:          no                       BB POLICY:       relocatable

Figure 12-1 Enhanced concurrent volume group example

Important: It is important when using enhanced concurrent volume groups that multiple networks exist for RSCT heartbeats. Because there is no SCSI locking, a partitioned cluster can very quickly activate a volume group on more than one node, and then potentially corrupt data.

12.1.2 Non-concurrent

A non-concurrent volume group is the default when creating a new volume group. It is also referred to as a standard volume group. The inherent nature of non-concurrent volume groups is that the volume group will not be accessed by more than one system at any time. Full read/write access is possible only by the system that activated the volume group with varyonvg vgname.

Non-concurrent is not an LVM designated type of volume group. It is a designation of the mode of operation in which the volume group is to be used. When running the lsvg command against a volume group, you can tell it is a non-concurrent volume group by the omission of the “Concurrent” field that is shown in Figure 12-1 on page 577.

12.1.3 Concurrent

This type of volume group, also referred to as Concurrent Capable, is specific to SSA disks in a concurrent access configuration. This combination provided the first true concurrent mode volume group.

The unique serial connectivity of SSA disks allowed communication access over something called the covert channel. This covert channel is utilized by the Concurrent Logical Volume Manager (CLVM). CLVM is capable of keeping LVM related ODM information in sync automatically using the CLVM daemon (clvmd). This allowed for online LVM maintenance of volume groups.

Clvmd would start automatically when the volume group was varied on in concurrent mode using the varyonvg -c command.

12.2 Disk reservations

When a volume group is varied on in AIX, a disk reserve is placed against each member disk. This ensures that no other system can access these drives, in order to maintain data integrity.

These reserves are often called SCSI reserves and they are based on SCSI standards. Most of today’s newest FC disks are using FSCSI protocol and still utilize a SCSI reserve.

Note: Concurrent volume groups are no longer available starting in AIX 5.2 and are now considered obsolete.

The SCSI standards define two different types of reservations:

• SCSI-2 “traditional” reservation
• SCSI-3 persistent reservation (PR)

A SCSI-2 reservation allows access along a single path only, so this reservation could not be used for general multipathing access to the storage. SCSI-2 reservations are not persistent and they do not survive node reboots.

SCSI-3 PR (persistent reservation) supports device access through multiple nodes, while at the same time blocking access to other nodes. SCSI-3 PR reservations are persistent across SCSI bus resets or node reboots, and they also support multiple paths from host to disk.

SCSI-3 PR uses a concept of registration and reservation. Systems that participate, register a “key” with the SCSI-3 device. Each system registers its own key. Registered systems can then establish a reservation. With this method, blocking write access is as simple as removing the registration from a device. When a system wants to eject another system, it issues a “pre-empt and abort” command, which ejects another node. After a node is ejected, it has no key registered so it cannot eject others. This method effectively avoids the split-brain condition.

Another benefit of the SCSI-3 PR method is that because a node registers the same key on each path, ejecting a single key blocks all I/O paths from that node. For example, SCSI-3 PR is implemented by EMC Symmetrix, Sun™ T3, and Hitachi Storage systems. ESS SDD uses persistent reservations while LVM commands use “traditional” reservation.

This reserve normally stays in place until it is removed by varying off the volume group. Even if the AIX system is halted, if the disks maintain power, the reserve normally stays set. For this reason, PowerHA must break disk reserves during fallover to bring the volume group online on the standby node.

Disk reserves are not used for concurrent, or enhanced concurrent mode volume groups used when the volume groups are online in concurrent access mode. This is also true when using enhanced concurrent volume groups in a fast disk takeover configuration.

More information about fast disk takeover can be found in 12.4, “Fast disk takeover” on page 580.

12.3 Forced varyon of volume groups
This capability is very important when using mirrored volume groups. It is an attribute of the varyonvg command and is represented by the -f flag. Utilizing this flag enables a volume group to be brought online even when a quorum of disks is not available. During the varyon process, each logical volume is checked and at least one complete copy of each must be found for the varyon to succeed.

It is standard practice to disable the quorum setting when configuring a mirrored volume group. When quorum is disabled, the volume group can stay online even after disk failures; as long as one disk is available, the volume group stays online. However, to initially vary on the volume group, all disks must be available. To vary on a volume group that does not have all member disks available, the force attribute must be used, as shown in the sketch that follows.
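
A minimal sketch of the related commands follows; the volume group name is an example, and in normal operation PowerHA itself issues the forced varyon when the resource group attribute is set:

chvg -Qn app1vg       # disable quorum on the mirrored volume group
varyonvg -f app1vg    # force varyon when not all member disks are available
                      # (PowerHA does this when "Use forced varyon of volume
                      #  groups, if necessary" is set to true for the resource group)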

This setting is most commonly used when mirroring across storage subsystems, or when mirroring between locations with cross-site LVM mirroring to create what we often refer to as campus-style clusters. This allows for site redundancy: in the event of a site outage (a site consists of a server and one copy of the storage), a server at the remote site can activate the volume group from its local LVM copy. In recent years, campus-style clusters using cross-site LVM mirroring have become more common, and PowerHA has added features to be more aware of, and to better handle, this type of configuration. More information about cross-site LVM mirroring can be found in Chapter 16.

12.4 Fast disk takeover

This section explains the following concepts in regard to fast disk takeover:

• Prerequisites
• How fast disk takeover works
• Enabling fast disk takeover
• Advantages
• Special considerations

12.5 Prerequisites

The following levels are required to implement fast disk takeover:

• AIX 5.2 or higher
• bos.clvm.enh 5.2.0.11 or higher
• Enhanced concurrent shared data volume groups


12.5.1 How fast disk takeover works

Historically, disk takeover has involved breaking a SCSI reserve on each disk device in a serial fashion. The amount of time it takes to break the reserve varies by disk type. In a large environment with hundreds of disks, this can add a significant amount of time to the fallover.

Fast disk takeover reduces total fallover time by providing faster acquisition of the disks without having to break SCSI reserves. It utilizes enhanced concurrent volume groups, and additional LVM enhancements provided by AIX 5.2.

AIX 5.2 introduced the ability to varyon an enhanced concurrent volume group in two different modes:

� Active Mode:

– Operations on file systems, such as file system mounts
– Operations on applications
– Operations on logical volumes, such as creating logical volumes
– Synchronizing volume groups

� Passive Mode:

– LVM read-only access to the volume group’s special file
– LVM read-only access to the first 4 KB of all logical volumes that are owned by the volume group

The following operations are not allowed when a volume group is varied on in the passive state:

� Operations on file systems, such as mounts
� Any open or write operation on logical volumes
� Synchronizing volume groups

Active mode is similar to a non-concurrent volume group being varied online with the varyonvg command. It provides full read/write access to all logical volumes and file systems, and it supports all LVM operations.

Passive mode is the LVM equivalent of disk fencing. Passive mode only allows readability of the VGDA and the first 4 KB of each logical volume. It does not allow read/write access to file systems or logical volumes. It also does not support LVM operations.

When a resource group, containing an enhanced concurrent volume group, is brought online, the volume group is first varied on in passive mode and then it is varied on in active mode. The active mode state only applies to the current resource group owning node. As any other resource group member node joins the cluster, the volume group is varied on in passive mode.


When the owning/home node fails, the fallover node simply changes the volume group state from passive mode to active mode through the LVM. This change takes ~10 seconds and is at the volume group level. It can take longer with multiple volume groups with multiple disks per volume group. However, the time impact is minimal compared to the previous method of breaking SCSI reserves.

The active and passive mode flags to the varyonvg command are not documented because they should not be used outside a PowerHA environment. They can, however, easily be found in the hacmp.out log.

Active mode varyon command:

varyonvg -n -c -A app2vg

Passive mode varyon command:

varyonvg -n -c -P app2vg

To determine if the volume group is online in active or passive mode, verify the “VG PERMISSION” field from the lsvg output as shown in Figure 12-2 on page 583.
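As a quick check, assuming a shared volume group named app2vg, the relevant fields can be pulled directly out of the lsvg output; on the node that owns the resource group, VG PERMISSION shows read/write, while on the other cluster members it shows passive-only:

   lsvg app2vg | grep -E "VG PERMISSION|VG Mode"
   lsvg -o                 # lists only the volume groups varied on in active mode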

There are other distinguishing LVM status features that you will notice for volume groups that are being utilized in a fast disk takeover configuration. For example, the volume group will show online in concurrent mode on each active cluster member node using the lspv command. However, the lsvg -o command will only report the volume group online to the node that has it varied on in active mode. An example of how passive mode status is reported is shown in Figure 12-2.

Important: Do not run these commands without cluster services running. It is also not recommended to ever run these commands unless directed to do so from IBM support.


Figure 12-2 Passive mode volume group status

12.5.2 Enabling fast disk takeover

There is no actual option or flag within the PowerHA cluster configuration specifically related to fast disk takeover. Whether it is used is determined by how the cluster is configured.

The shared volume groups must be enhanced concurrent volume groups. These volume groups are then added as resources to a non-concurrent mode style resource group. The combination of these two things is how PowerHA determines to use the fast disk takeover method of volume group acquisition.
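As a hedged illustration of the first half of that combination, an enhanced concurrent-capable volume group can be created from the command line as sketched below (the volume group and disk names are only examples, and in an existing cluster C-SPOC is the preferred way to create shared volume groups). With bos.clvm.enh installed on AIX 5.2 or later, the -C flag creates an enhanced concurrent-capable volume group:

   mkvg -C -y app2vg hdisk5 hdisk7     # create the enhanced concurrent-capable volume group
   varyoffvg app2vg                    # leave it offline; PowerHA activates it when needed

The volume group is then added as a resource to a non-concurrent resource group through the usual SMIT panels.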

When a non-concurrent style resource group is brought online, PowerHA checks one of the volume group member disks to see if it is an enhanced concurrent volume group or not. PowerHA determines this by running the lqueryvg -p devicename -X command. If the return output is 0, then it is a regular non-concurrent volume group. If the return output is 32, then it is an enhanced concurrent volume group.

The output shown in Figure 12-2 was captured on a node holding the volume group in passive mode:

Justen / > lspv
hdisk5          0022be2a8607249f    app2vg
hdisk6          0022be2a86607918    app1vg      concurrent
hdisk7          0022be2a8662ce0e    app2vg
hdisk8          0022be2a86630978    app1vg      concurrent

Justen / > lsvg -o
rootvg

Justen / > lsvg app1vg
VOLUME GROUP:   app1vg                   VG IDENTIFIER:   0022be2a00004c48
VG STATE:       active                   PP SIZE:         16 megabyte(s)
VG PERMISSION:  passive-only             TOTAL PPs:       1190
MAX LVs:        256                      FREE PPs:        1180
LVs:            0                        USED PPs:        10
OPEN LVs:       0                        QUORUM:          2
TOTAL PVs:      2                        VG DESCRIPTORS:  3
STALE PVs:      0                        STALE PPs:       0
ACTIVE PVs:     2                        AUTO ON:         no
Concurrent:     Enhanced-Capable         Auto-Concurrent: Disabled
VG Mode:        Concurrent
Node ID:        6                        Active Nodes:
MAX PPs per PV: 1016                     MAX PVs:         32
LTG size:       128 kilobyte(s)          AUTO SYNC:       no
HOT SPARE:      no                       BB POLICY:       relocatable


In Figure 12-3, hdisk0 is a rootvg member disk that is non-concurrent, while hdisk6 is an enhanced concurrent volume group member disk.

Figure 12-3 Example of how PowerHA determines volume group type

Advantages
There are at least two advantages of using fast disk takeover:

� Faster disk acquisition time
� LVM ODM synchronization

We have already explained the first benefit of faster disk acquisition time in 12.5.1, “How fast disk takeover works” on page 581. The other advantage, LVM ODM synchronization, is directly related to using enhanced concurrent volume groups and having the gsclvmd daemon running with cluster services.

When all member nodes of an enhanced concurrent volume group are online in an active cluster, LVM related changes executed on the home node (using command line or using SMIT) are automatically synchronized across the other members. This action greatly reduces the possibility of mismatched volume group information on each node.

However, this advantage by itself is not considered a best practice in a PowerHA environment. C-SPOC is the recommended method of keeping LVM-related ODM information in sync across cluster nodes. C-SPOC has the following advantages over the stand-alone LVM ODM synchronization:

1. It is independent of cluster node status (active/inactive).

2. It can be used for JFS related changes.

3. It updates VGDA time stamp files on cluster nodes.

The lqueryvg checks shown in Figure 12-3:

Maddi / > lqueryvg -p hdisk0 -X
0
Maddi / > lqueryvg -p hdisk6 -X
32

Note: Newer versions of AIX store the volume group timestamp information in the CuAt ODM file. HACMP 5.4 and above verifies the AIX level to determine if creating a copy of the time stamp file is necessary or not.


Special considerations
The following special considerations apply when using fast disk takeover:

No SCSI reserve
One consideration is the lack of SCSI reserves. This is why you should never allow non-cluster nodes to have the shared data volume groups visible to them. Also, when all of the PowerHA nodes in a cluster are active, each cluster member keeps the volume group open at the LVM layer, preventing any subsequent LVM open requests from setting their own locks on the disks. Typically the active side hosts the volume group in the ACTIVE state and the remaining nodes in the cluster hold it in PASSIVE mode.

The potential exposure comes in when either a non-cluster node accesses the disks or external monitoring or management software queries the disk. In those scenarios, there is no mechanism in place to prevent access to the disks. This of course can lead to unpredictable and often undesirable results.

We have noted this behavior with applications that have disk polling functions and can implicitly try to open the disks such as storage copy services, backup software, and SAN management software.

Note that on some of these applications, the polling function can be disabled, but the responsibility of noting the existence of these applications on the machine and the integrity of the disks ultimately falls on the end-user.

LVM capabilities
Another limitation is the inability to perform certain LVM commands and to use certain attributes while the volume groups are online in concurrent mode. These are:

� Commands:

– reorgvg
– splitvg

� Volume group attributes:

– Big vg
– Hot spare
– Sync (auto)
– Factor size
– Dynamic volume expansion
– Bad block relocation

This obviously means that these cannot be changed via C-SPOC either.


Forced down
In versions prior to HACMP 5.4, another limitation of using fast disk takeover was that stopping cluster services using the forced option was no longer valid. When stopping a cluster using smitty clstop, HACMP would check to see whether any shared volume groups were currently active in concurrent mode. If so, the forced option would not appear in the menu.

The reason is that the volume groups would be left online while gsclvmd was stopped. If you stopped cluster services, you also stopped the services needed to maintain volume group consistency. This leaves the volume groups with a possible exposure similar to the lack of SCSI reserve mentioned previously in this section.

The inability to perform a forced down has since been addressed, both by the expectation that cluster services run at all times and by the Unmanage resource group option introduced in HACMP 5.4.

12.6 Disk heartbeat

In this section we discuss the following topics concerning disk heartbeat:

� Overview
� Prerequisites
� Performance considerations
� Configuring traditional disk heartbeat
� Configuring multi-node disk heartbeat
� Testing disk heartbeat connectivity
� Monitoring disk heartbeat

12.6.1 Overview

In the following section we describe the types of disk heartbeat.

Traditional disk heartbeat
Disk heartbeat is another form of non-IP heartbeat that utilizes the existing shared disks of any disk type. This feature, which was introduced in HACMP 5.1, has become the most common and preferred method of non-IP heartbeat. It eliminates the need for serial cables or 8-port asynchronous adapters. It can also easily accommodate greater distances between nodes when using a SAN environment.


This feature requires using enhanced concurrent volume groups to allow access to the disk by each node. It utilizes a special reserved area on the disks to read and write the heartbeat data. Because it uses a reserved area, it allows the use of existing data volume groups without losing any additional storage space. However, be aware of the concerns discussed in 12.6.3, “Performance considerations” on page 588.

It is possible to use a dedicated disk/LUN for the purpose of disk heartbeat. However, because disk heartbeat uses the reserved space, the remaining data storage area is unused. The bigger the disk/LUN you use solely for this purpose, the more space that is wasted. However, you could use it later for additional storage space if needed.

A traditional disk heartbeat network is a point-to-point network. If more than two nodes exist in your cluster, you need a minimum of N non-IP heartbeat networks, where N is the number of nodes in the cluster. For example, a three node cluster requires at least three non-IP heartbeat networks.

Multi-node disk heartbeat
HACMP 5.4.1 introduced another form of disk heartbeating called multi-node disk heartbeat (mndhb). Unlike a traditional disk heartbeat network, it is not a single point-to-point network; as its name implies, it allows multiple nodes to use the same disk. It does, however, require configuring a logical volume on an enhanced concurrent volume group. While this can reduce the total number of disks required for non-IP heartbeating, it is of course recommended to have more than one to eliminate a single point of failure.

Multi-node disk heartbeating also offers the ability to invoke one of the following actions on its loss:

halt Halt the node (default)

fence Fence this node from the disks

shutdown Stop PowerHA on this node (gracefully)

takeover Move the resource group(s) to a backup node

Important: Currently, multi-node disk heartbeating can only be utilized in a resource group configured with a startup policy of “Online On All Available Nodes”. Currently only Oracle RAC takes advantage of mndhb. For more details consult Oracle Metalink note number 404474.1 found at:

https://metalink.oracle.com


12.6.2 Prerequisites

The following minimum software and hardware is required for traditional disk heartbeat:

� HACMP 5.x

� bos.clvm.enh (required for enhanced concurrent vg support)

� Shared disks (configured as enhanced concurrent volume groups)

In addition to those listed above, the following is required for multi-node disk heartbeat:

� HACMP

– 5.3 PTF6 or

– 5.4.1 or higher

� RSCT

– 2.4.7.3 + IZ01838 or

– 2.5.0.0

� AIX

– bos.rte.lvm 5.3.0.60 or

– bos.rte.lvm 6.1.0.0

– A 32MB logical volume

• The logical volume must be on a single physical disk

• Mirror write consistency (MWC) must be off
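A hedged way to confirm whether the software levels listed above are met is to query the installed fileset levels; the filesets below are only an illustrative subset:

   lslpp -l bos.rte.lvm bos.clvm.enh      # AIX LVM levels
   lslpp -l rsct.basic.rte                # RSCT level
   lslpp -l cluster.es.server.rte         # HACMP/PowerHA level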

12.6.3 Performance considerations

Most modern non-RAID disks can perform ~100 seeks per second. The sectors used for disk heartbeating are part of the VGDA. The VGDA is located at the outer edge of the disk, and might not be near the application data. This means that every time there is a disk heartbeat, a seek is performed. Disk heartbeating typically (with the default parameters) requires four (4) seeks per second; that is, each of the two nodes writes to the disk and reads from the disk once per second, for a total of 4 IOPS. When choosing a disk to be used for disk heartbeat, we recommend that you use a disk that has fewer than 60 seeks per second of existing activity. The filemon tool can be used to monitor the seek activity on a disk.
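The following is a minimal filemon sketch for sampling seek activity on a candidate heartbeat disk; the disk name, sample length, and output file are arbitrary examples:

   filemon -o /tmp/fmon.out -O pv     # start tracing physical volume activity
   sleep 60                           # sample for a minute under normal load
   trcstop                            # stop the trace and generate the report
   grep -p hdisk3 /tmp/fmon.out       # review the seek statistics for the candidate disk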

If you choose to use a disk that has an I/O load above the recommended value, we recommend that you change the failure detection rate of the disk heartbeat network to slow. More information can be found in section 14.2.


The stated recommendation is based on non-RAID (or JBOD) storage. The technology of the disk subsystem affects the overall recommendation. For example:

� If the disk is part of an enterprise class storage subsystem with large amounts of write cache, such as ESS, then the seeks can be much higher.

� If the disk used for heart beating is part of a RAID set, or RAID subset, with little or no caching, the disk would support fewer seeks, due to the extra activity required by RAID operations. Check with the manufacturer to determine how many seeks the specific unit can support.

12.6.4 Configuring traditional disk heartbeat

This example consists of a two-node cluster (nodes camryn and kelli) with a shared disk (seen as hdisk3 and hdisk5, respectively) to be used as the disk heartbeat device. Both hdisks (which are actually the same physical disk) are already configured as member disks of an enhanced concurrent volume group.

There are two different methods to configure a disk heartbeat device:

� Use the discovery method
� Use the predefined devices method

For this example, we use the predefined devices method. When using this method, it is necessary to create a diskhb network first, then assign the disk-node pair devices to the network.

The key information needed before continuing with the predefined method is knowing exactly what the device names are on each node. It is not necessary that the names match, as demonstrated in our example. The devices can be matched by their PVID on each node by running lspv.
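For example, a hedged sketch of matching the device pair by PVID (the placeholder stands in for whatever PVID value lspv reports):

   lspv | grep hdisk3         # on camryn: note the PVID in the second column
   lspv | grep <that_pvid>    # on kelli: shows the device with the matching PVID (hdisk5 here)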

Create the diskhb network as follows:

1. Using the smitty hacmp fast path, select Extended Configuration → Extended Topology Configuration → Configure HACMP Networks → Add a Network to the HACMP cluster, then choose diskhb.

2. Enter the desired network name (defaults to net_diskhb_01) as shown in Figure 12-4.

Note: When using the discovery method, PowerHA matches the disk devices by PVID automatically and provides a picklist to choose from.


Figure 12-4 Adding diskhb network

Now add two communication devices, one for each node, to the disk heartbeat network created in the previous step.

Run the following procedure:

1. smitty hacmp → Extended Configuration → Extended Topology Configuration → Configure HACMP Communication Interfaces/Devices → Add Communication Interfaces/Devices → Add Pre-Defined Communication Interfaces and Devices → Communication Devices

2. Choose the diskhb created in the previous step (net_diskhb_01). Press Enter.

For Device Name, this is a unique name you can choose to describe the device. It will show up in your topology under this name, much like serial heartbeat and ttys have in the past.

For the Device Path, type in /dev/hdisk#. Then choose the corresponding node for this device (Figure 12-5).

Figure 12-5 Adding individual diskhb communication devices

The screen shown in Figure 12-4:

Add a Serial Network to the HACMP Cluster

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                      [Entry Fields]
* Network Name                                       [net_diskhb_01]
* Network Type                                        diskhb

The screen shown in Figure 12-5:

Add a Communication Device

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                      [Entry Fields]
* Device Name                                        [ck_disk_hb]
* Network Type                                        diskhb
* Network Name                                        net_diskhb_01
* Device Path                                        [/dev/vpath0]
* Node Name                                          [camryn]             +


After creating the first device of any non-IP network, it is normal to get the warning message shown in Example 12-1.

Example 12-1 Diskhb warning

WARNING: Serial network [net_name] has 1 communication device(s) configured.
Two devices are required for a serial network.

After you repeat this process for the other node (kelli) and the other device (hdisk5), the warning will no longer exist because the two-device requirement is fulfilled.

12.6.5 Configuring multi-node disk heartbeat

This example consists of a two-node cluster (nodes jordan and travis) with a shared device (hdisk2) to be designated as the multi-node disk heartbeat device. This disk is currently a free disk not part of any volume group as shown in Example 12-2.

Example 12-2 Free disks to be used for multi-node disk heartbeat

jordan[/] lspv
hdisk0          000fe401afb3c530    rootvg      active
hdisk1          000fe401d39e0575    vg1
hdisk2          000fe401d39e2344    None

travis[/] lspv
hdisk0          000fe411b923cfe9    rootvg      active
hdisk1          000fe401d39e0575    vg1
hdisk2          000fe401d39e2344    None

It is not necessary that the disk names match, though they do in our example. The key is to match up the disk pvid on each node by running lspv.

There are two different SMIT paths available to configure a multi-node disk heartbeat device:

� Using the smitty hacmp fast path, select Extended Configuration → Extended Topology Configuration → Configure HACMP Networks → Manage concurrent Volume Groups for Multi-Node Disk Heartbeat.

� Using the smitty hacmp fast path, select System Management (C-SPOC) → HACMP Concurrent Logical Volume Management → Manage concurrent Volume Groups for Multi-Node Disk Heartbeat.


When using PowerHA 5.5, you have the options shown in the menu, “Manage Concurrent Volume Groups for Multi-Node Disk Heartbeat” (Figure 12-6).

Figure 12-6 Main multi-node disk heartbeat menu

In our scenario, we choose the second option, Create a new Volume Group and Logical Volume for Multi-Node Disk Heartbeat. This will automatically generate a list of free disks between the nodes that are already defined to the cluster. In our case, only hdisk2 appears, as shown in Figure 12-7.

Figure 12-7 Free disks for multi-node disk heartbeat

Tip: The SMIT fast path option to the multi-node disk heartbeat is smitty cl_manage_mndhb.

The menu shown in Figure 12-6:

Manage Concurrent Volume Groups for Multi-Node Disk Heartbeat

Move cursor to desired item and press Enter.

  Use an existing Logical Volume and Volume Group for Multi-Node Disk Heartb
  Create a new Volume Group and Logical Volume for Multi-Node Disk Heartbeat
  Add a Concurrent Logical Volume for Multi-Node Disk Heartbeat
  Show Volume Groups in use for Multi-Node Disk Heartbeat
  Stop using a Volume Group for Multi-Node Disk Heartbeat
  Configure failure action for Multi-Node Disk Heartbeat Volume Groups

The pick list shown in Figure 12-7:

  Physical Volume to reserve for Heartbeat

  Move cursor to desired item and press Enter.

    000fe401d39e2344 ( hdisk2 on all cluster nodes )


After choosing our free disk, we are now ready to finish creating our multi-node disk heartbeat. While each field is required, most of them are editable so that you can choose the names that you want. Keep in mind the following limits, as well as those from the help menu:

� Volume Group Name: This specifies the volume group name. The name must be cluster-wide unique and can range from 1 to 15 characters.

� Volume Group Major Number: This is the major number for the volume group to be used.

� PVID for Logical Volume: This field is not editable as it shows the PVID for the disk chosen in the previous pop-up menu.

� Logical Volume Name: This is the name of the logical volume you want to be created for the multi-node disk heartbeat to use. The name must be cluster-wide unique and can range from 1 to 15 characters.

� Physical partition SIZE in megabytes: Just like any other volume group that you create, you must choose the physical partition size.

� Node Names: This field is not editable and will already contain the node names from your cluster.

� Resource Group Name: The volume group will be controlled by this resource group when cluster services are started. If other resource groups are available, then you would be able to choose one from the F4 list.

� Network Name: This is the name that will be associated with this multi-node disk heartbeat. This will also show up in your cluster topology.


Figure 12-8 Create a multi-node disk heartbeat final menu

Upon completing and validating the required fields, simply press Enter twice and the new volume group, logical volume, and multi-node disk heartbeat network will be created automatically.

12.6.6 Testing disk heartbeat connectivity

After the device and network definitions have been created, we recommend that you test it to make sure communications are working properly. If the volume group is varied on in normal mode on any one of the nodes, the test will probably not succeed.

The /usr/sbin/rsct/bin/dhb_read command is used to test the validity of a diskhb connection. The usage of dhb_read is shown in Table 12-1.

The screen shown in Figure 12-8:

Create a new Volume Group and Logical Volume for Multi-Node Disk Heartbeat

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                 [Entry Fields]
  Volume Group Name                                  [mndhb_vg_01]         +
  Volume group MAJOR NUMBER                          [35]                  +#
  PVID for Logical Volume                             000fe401d39e2344
  Logical Volume Name                                [mndhb_lv_01]         +
  Physical partition SIZE in megabytes                4                    +
  Node Names                                          jordan,travis
  Resource Group Name                                [rg_diskhbmulti_01]   +
  Network Name                                       [net_diskhbmulti_01]  +

  Warning: Changing the volume group major number may result
  in the command being unable to execute
[MORE...5]


Table 12-1 dhb_read usage examples

To test the diskhb network connectivity, we set one node, camryn, to receive while the other node, kelli, transmits.

On camryn, we run:

dhb_read -p hdisk3 -r

On kelli, we run:

dhb_read -p hdisk5 -t

If the link between the nodes is operational, both nodes will display “Link operating normally” as shown in Figure 12-9.

Figure 12-9 Disk heartbeat communications test

The dhb_read usage summarized in Table 12-1:

Command                          Action
dhb_read -p devicename           dumps diskhb sector contents
dhb_read -p devicename -r        receives data over diskhb network
dhb_read -p devicename -t        transmits data over diskhb network

Note: Disk heartbeat testing can only be run when PowerHA is not running on the nodes.

Note: The devicename is the raw device, as designated by the “r” preceding the device name. For hdisks, the dhb_read utility automatically converts the name to the proper raw device name. For all other devices (vpath and hdiskpower, to name two), it is required to specify it explicitly.

Camryn /usr/sbin/rsct/bin > ./dhb_read -p rvpath0 -r
Receive Mode:
Waiting for response . . .
Link operating normally
Camryn /usr/sbin/rsct/bin >

Kelli /usr/sbin/rsct/bin > ./dhb_read -p rvpath0 -t
Transmit Mode:
Detected remote utility in receive mode. Waiting for response . . .
Link operating normally
Kelli /usr/sbin/rsct/bin >


The volume groups associated with the disks used for disk heartbeat networks do not have to be defined as resources within a resource group. However, it is not uncommon to use a shared data volume group for disk heartbeating, in which case it would be a resource in the resource group.

12.6.7 Monitoring disk heartbeat

After cluster services are running, you can monitor the activity of the disk (actually all) heartbeats via cltopinfo -m. The main field to monitor is the Current Missed Heartbeats. If the total continues to grow, it is a good indication that there is a problem and possibly that the disk is not optimal for a diskhb network. Either move the diskhb to another disk, or change the failure detection rate of the diskhb network to slow.

Output from cltopinfo is shown in Figure 12-10.

Figure 12-10 Monitoring diskhb

The default grace period before heartbeats start processing is 60 seconds. If running this command quickly after starting the cluster, you will not see any disk heartbeat information until the grace period time has elapsed.

Camryn /# cltopinfo -m

Interface Name    Adapter         Total Missed    Current Missed
                  Address         Heartbeats      Heartbeats
--------------------------------------------------------------
en1               10.10.10.20     0               0
en2               10.10.11.2      0               0
rhdisk3           255.255.10.1    0               0
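A simple way to watch this over time, while cluster services are running, is a small loop such as the following sketch (the interval and the grep filter are arbitrary choices):

   while true
   do
       cltopinfo -m | grep -i hdisk    # watch the Current Missed Heartbeats column for the diskhb device
       sleep 60
   done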

Note: Multi-node disk heartbeat was introduced in HACMP 5.4.1 and the cltopinfo command is not fully multi-node disk heartbeat aware. You can monitor multi-node disk heartbeat, and all heartbeats, using the lssrc -ls topsvcs command.


12.7 Fast failure detection

This feature was added in HACMP 5.4. It provides faster response to a system failure caused specifically by an AIX kernel panic. AIX provides a kernel callout in the panic path that sends a departing message over a traditional disk heartbeat device within one heartbeat period. This message is picked up by topology services, which notifies the other cluster nodes of the node_down event. In turn, normal node fallover processing is initiated immediately instead of waiting for missed heartbeats.

The following list gives the bare minimum requirements in order to utilize the fast failure detection feature:

� HACMP 5.4 or above is required.

� AIX 5.3 TL5 (5300-05) or above is required.

� RSCT 2.4.5.2 or above is required.

� Traditional disk heartbeat must be configured.

� Disk heartbeat network must be enabled for fast failure detection.

To enable a traditional disk heartbeat network for fast failure detection, you must change the custom parameters to the corresponding diskhb network module. This is done by running:

1. smitty hacmp → Extended Configuration → Extended Topology Configuration → Configure HACMP Network Modules → Change a Network Module using Custom Values

Note: Currently fast failure detection is not available for multi-node disk heartbeat devices, only traditional disk heartbeat.


2. Choose the diskhb network type and press Enter. You will then be presented with the menu shown in Figure 12-11. In the Parameters field, you must manually enter “FFD_ON” because that field will not generate a picklist to choose from.

Figure 12-11 Enabling fast failure detection

Change a Cluster Network Module using Custom Values

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                 [Entry Fields]
* Network Module Name                                 diskhb
  Description                                        [Disk Heartbeating Pro]
  Address Type                                        Device               +
  Path                                               [/usr/sbin/rsct/bin/hats_diskhb_nim]
  Parameters                                         [FFD_ON]
  Grace Period                                       [30]                  #
  Supports gratuitous arp                            [false]               +
  Entry type                                         [adapter_type]
  Next generic type                                  [transport]
  Next generic name                                  []
  Supports source routing                            [false]               +
  Failure Cycle                                      [4]                   #

Important: For fast failure detection to become activated, you must both synchronize the cluster and restart cluster services.


Chapter 13. Networking considerations

In this chapter we describe several new network options available within PowerHA. Some of these new features include the service IP distribution policy and the automatic creation of the clhosts file.

We discuss the following topics:

� EtherChannel
� Distribution preference for service IP aliases
� Site specific service IP labels
� Understanding the netmon.cf file
� Understanding the clhosts file
� Understanding the clinfo.rc file


13.1 EtherChannel

EtherChannel is a port aggregation method whereby up to eight Ethernet adapters are defined as one EtherChannel. Remote systems view the EtherChannel as one IP and MAC address, so up to eight times the network bandwidth is available in one network presence.

Traffic is distributed across the adapters in the standard way (address algorithm) or on a round robin basis. If an adapter fails, traffic is automatically sent to the next available adapter in the EtherChannel without disrupting user connections. When only one link in the main EtherChannel is active, a failure test triggers a rapid detection / fallover (in 2-4 seconds) to an optional backup adapter with no disruption to user connections. Two failure tests are offered—the physical adapter link to network and the optional TCP/IP path to the user-specified node. When failure is detected, the MAC and IP addresses are activated on the backup adapter. When at least one adapter in the main channel is restored, the addresses are reactivated on the main channel.

The AIX V5.1 Network Interface Backup (NIB) configuration mode was replaced and enhanced in AIX V5.2. The new method is a single adapter EtherChannel with backup adapter, providing a priority (fallback upon link repair) between the primary and backup links which the previous implementation lacked. The dynamic adapter membership (DAM) enhancement in AIX V5.2 allows the dynamic reconfiguration of adapters within the EtherChannel without disruption to the running connection.

All multi-adapter channels require special EtherChannel or IEEE 802.3ad port configuration in the network switch. In most cases, the switch will be configured for EtherChannel mode. However, if the switch does not support EtherChannel or if the corporation has standardized on IEEE 802.3ad, then configure 802.3ad at both the switch and in AIX. Single-adapter links, on the other hand, require no special configuration at the network switch. This includes a single-adapter EtherChannel and the backup adapter connection.

EtherChannel has the following benefits:

� Higher bandwidth and load balancing options:

– Multi-adapter channels utilize aggregate bandwidth.

– Several user configurable alternatives are given for directing traffic across the channel adapters.

Note: PowerHA itself does not state support for DAM as this is below the level that PowerHA monitors. So no support statement should be required.


� Built in availability features:

– It automatically handles adapter, link and network failures

– An optional backup adapter is offered to avoid single point of failure (SPOF) at the network switch.

– Design techniques are possible to avoid SPOFs.

� A simple, flexible solution and growth path:

– One Ethernet MAC and IP address is used for entire aggregation (including backup adapter)

– It accommodates future bandwidth requirements easily.

– The user can add, delete, and reconfigure adapters dynamically (no service disruption).

� Various options for interoperability with network switch:

– Multi-adapter channels for both EtherChannel and 802.3ad capable switches are provided.

– Single adapter channels and backup adapter links are transparent to the network switch.

– Channel backup adapter option is available (it can connect to a different network switch to avoid SPOF).

– Channel operates without switch when two systems cabled directly (back-to-back, though not applicable in PowerHA environments).

� It is free (assuming that EtherChannel capable switches are already in place). EtherChannel is included in AIX and has been regularly enhanced since AIX V4.3.3.

13.1.1 Implementing EtherChannel in a PowerHA environment

PowerHA officially supports the use of EtherChannel. Integrating an EtherChannel solution into a cluster is relatively easy and can actually simplify network addressing and subnet requirements. In many cases, all addresses are configured on the same logical interface. In these cases of a single interface per network configuration, it is possible to have both the boot and service IPs all on the same subnet. Though not shown in the following example, more details and another example can be found at:

http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105185

In our example, we show only the relevant parts as they relate to the combination of PowerHA and EtherChannel specifically, without using step-by-step PowerHA menus. To avoid repetition in this book, basic PowerHA configuration knowledge is assumed. We recommend other best practices, such as having non-IP heartbeat networks configured and using an EtherChannel capable switch as opposed to crossover cables.


The following details are based on a previous write-up (May 2004) that we did on this combination, but is still valid today. The original document can be found at:

http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101785

Test environment configuration
Our test environment was constructed using the following combination of components:

� Two pSeries p630 systems (named neo and trinity)

� AIX V5.3 ML8

� PowerHA V5.4.1

� Ethernet network connections ent0 through ent6:

– ent0 and ent5 (unused) are integrated 10/100 adapters
– ent1, ent2, ent3, ent4 (unused) are all on a single 4-port 10/100 adapter
– ent6 - EtherChannel (comprised of ent2, ent3 and ent0)
– Three UTP Ethernet crossover cables

Figure 13-1 is a diagram of the cluster configuration we used:

Figure 13-1 EtherChannel and PowerHA test environment

In this test, we successfully implemented a “single adapter network” using PowerHA IP Address Takeover (IPAT) with the EtherChannel. The EtherChannel is responsible for providing local adapter swapping which is outside of PowerHA. PowerHA has no knowledge of EtherChannel and is completely independent. While a single adapter network is normally considered not ideal, EtherChannel makes this okay because there are multiple physical adapters within the single EtherChannel pseudo device. Thus, we can safely ignore the insufficient adapter warning messages posted during cluster synchronization.

(Figure 13-1 depicts systems “neo” and “trinity”, each with adapters ent0 through ent3 and an EtherChannel pseudo device ent6 built from ent2 and ent3, with ent0 as the backup adapter in standby. The ent6 interfaces are addressed 2.2.2.1 on neo and 2.2.2.2 on trinity and connect through an Ethernet switch; the systems also attach to the admin network as 9.19.176.107 and 9.19.176.108.)


Our configuration consisted of an Online on First Available Node resource group with a single adapter network using IP aliasing. Our testing proved to be beneficial in simplifying the PowerHA setup. We implemented the EtherChannel connection without a network switch, by cabling the two test systems directly with crossover cables. This was only done for testing purposes. A typical PowerHA environment would have these adapters cabled to an EtherChannel capable switch to fully exploit it.

Most switch manufacturers expect attachment of the individual links in the EtherChannel to be to the same network switch. For additional switch redundancy you can connect the backup adapter to a separate switch.

Choose the adapters for the EtherChannel carefully. The goal is to avoid a single point of failure. In the test environment, we had an integrated Ethernet adapter and a single 4-port Ethernet adapter on each system so we chose to configure the integrated adapter as the backup to eliminate our 4-port adapter as a single point of failure.

13.1.2 Configuration procedures

We set up our cluster via the following basic steps. Details on each step, as completed for system neo, follow:

1. Check the Ethernet adapter configurations and adapter cabling.
2. Create the EtherChannel interface.
3. Configure IPs on new interface (en6) via TCP/IP.
4. Add boot and service IPs to PowerHA topology.
5. Create a resource group and assign it the service IP.
6. Synchronize the cluster.
7. Start cluster services.
8. Test redundancy of NICs and make sure that PowerHA does not detect it.

We started with unconfigured adapters, preferably cabled into an EtherChannel capable switch and the switch already set for an EtherChannel configuration. Our adapters had been configured previously, so we removed the ODM interface definitions using the smitty inet fast path. We completed these basic steps on both systems, using the IP interfaces and IP addresses as shown in Figure 13-1.

Step 1. Check the Ethernet adapter configuration
The adapters that will become a part of the EtherChannel should be configured for the same speed and duplex mode. We configured ent0, ent2 and ent3 for 100 Mbps, full duplex using the fast path smitty eadap → Change / Show Characteristics of an Ethernet Adapter as shown in Figure 13-2.


Figure 13-2 Ethernet adapter settings

Note: If you are using Gigabit adapters, the media speed on most is set to auto-detect and is a non-tunable setting.

Change / Show Characteristics of an Ethernet Adapter

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                      [Entry Fields]
  Ethernet Adapter                                    ent2
  Description                                         IBM 4-Port 10/100 Bas>
  Status                                              Available
  Location                                            12-08
  TRANSMIT queue size                                [8192]
  HARDWARE RECEIVE queue size                        [256]
  RECEIVE buffer pool size                           [384]
  Media Speed                                         100_Full_Duplex
  Inter-Packet Gap                                   [96]
  Enable ALTERNATE ETHERNET address                   no
  ALTERNATE ETHERNET address                         [0x000000000000]
  Enable Link Polling                                 no
  Time interval for Link Polling                     [500]
  Apply change to DATABASE only                       no

Tip: At this point, it’s a good idea to test these links by configuring IP addresses on each side. Just remember to remove the configuration prior to the next step.


Step 2. Configure the EtherChannel
Configure the EtherChannel through the fast path smitty etherchannel → Add an EtherChannel / Link Aggregation and select the appropriate adapters via F7. In our configuration, ent2 and ent3 comprise the main channel and ent0 is the backup adapter. Processing the following menu, as pictured in Figure 13-3, creates the new EtherChannel interface (ent6).

Figure 13-3 Add EtherChannel Menu

We selected round robin mode so both links will be utilized in this configuration. Refer to the EtherChannel documentation to learn about the different modes and select the one that will best suit your configuration.

Add An EtherChannel / Link Aggregation

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                      [Entry Fields]
  EtherChannel / Link Aggregation Adapters            ent2,ent3            +
  Enable Alternate Address                            no                   +
  Alternate Address                                  []                    +
  Enable Gigabit Ethernet Jumbo Frames                no                   +
  Mode                                                round_robin          +
  Hash Mode                                           default              +
  Backup Adapter                                      ent0                 +
  Automatically Recover to Main Channel               yes                  +
  Internet Address to Ping                           []
  Number of Retries                                  []                    +
  Retry Timeout (sec)                                []                    +
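For reference, roughly the same EtherChannel can also be created from the command line; the following is only a hedged sketch of the equivalent mkdev call with the attributes we set in SMIT, and entstat can then be used to confirm the channel and its backup adapter:

   mkdev -c adapter -s pseudo -t ibm_ech \
         -a adapter_names=ent2,ent3 \
         -a backup_adapter=ent0 \
         -a mode=round_robin          # creates the next available entX device (ent6 in our case)
   entstat -d ent6                    # shows the EtherChannel summary plus per-adapter statistics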

Important: When implementing a single interface with a single backup adapter (previously network interface backup or NIB) and specifying the values of “Number of Retries” and “Retry Timeout,” make sure that they do not exceed PowerHA’s NIM settings for the failure detection rate. We recommend that you have these settings be at least half of the PowerHA settings.


Step 3. Configure IP on EtherChannel device
Configure the IP interface (en6) on the EtherChannel using the fast path smitty chinet → choose en6, as shown in Figure 13-4.

Figure 13-4 Configure IP to EtherChannel device

We repeated this procedure on node trinity using an IP address of 2.2.2.2.
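The same interface configuration can also be expressed as a single chdev call per node; this is simply a sketch of the command-line form of what smitty chinet does, using our addresses:

   chdev -l en6 -a netaddr=2.2.2.1 -a netmask=255.255.255.0 -a state=up   # on neo
   chdev -l en6 -a netaddr=2.2.2.2 -a netmask=255.255.255.0 -a state=up   # on trinity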

Step 4. Configure the PowerHA topology
In our testing we chose to use IP aliasing when defining our PowerHA network (channet). We configured our boot IP addresses on each EtherChannel device (neo_boot 2.2.2.1, trinity_boot 2.2.2.2). We then defined our service IP address (bound to multiple nodes) 192.168.43.4 and our persistent IP addresses, 192.168.43.10 on neo and 192.168.43.20 on trinity. Our topology configuration can be seen in Figure 13-5 in the output of the cllsif command.

The screen shown in Figure 13-4:

Change / Show a Standard Ethernet Interface

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                      [Entry Fields]
  Network Interface Name                              en6
  INTERNET ADDRESS (dotted decimal)                  [2.2.2.1]
  Network MASK (hexadecimal or dotted decimal)       [255.255.255.0]
  Current STATE                                       up                   +
  Use Address Resolution Protocol (ARP)?              yes                  +
  BROADCAST ADDRESS (dotted decimal)                 []
  Interface Specific Network Options
     ('NULL' will unset the option)
     rfc1323                                         []
     tcp_mssdflt                                     []
     tcp_nodelay                                     []
     tcp_recvspace                                   []
     tcp_sendspace                                   []
  Apply change to DATABASE only                       no                   +

Note: When running familiar TCP/IP commands, remember to run them against the new pseudo interface (en6) and not the individual interfaces.


Figure 13-5 EtherChannel configuration cllsif output

Step 5. Configure the PowerHA resource group
We configured one cascading resource group with the single service IP label defined to it. Because our focus was on the NIC redundancy testing, we simplified the configuration by omitting additional resources from the resource group. Our resource group definition can be seen in Figure 13-6.

Figure 13-6 EtherChannel Resource Group

Step 6. Synchronize the cluster
Though PowerHA familiarity is assumed, we wanted to show the warning message displayed when the PowerHA topology is configured as a single adapter network. When synchronizing the cluster, we got the following warning (see Figure 13-7).

Note: Although omitted from our example, at least one non-IP serial network should always be used in a production environment.

The screen shown in Figure 13-6:

Change/Show Resources for a Rotating Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                      [Entry Fields]
  Resource Group Name                                 testec_rg
  Participating Node Names (Default Node Priority)    neo trinity
* Service IP Labels/Addresses                        [neoec_svc]           +
  Volume Groups                                      []                    +
  Filesystems (empty is ALL for VGs specified)       []                    +
  Application Servers                                []                    +


Figure 13-7 Single adapter network warning

Because we truly have configured only one interface to PowerHA topology, this warning message is expected. EtherChannel provides the local adapter redundancy and swapping as needed.

Step 7. Start the cluster services
Execute smitty clstart on each node and wait for node_up_complete.

Step 8. Perform testing
Our testing focused on physically pulling cables to see how the system responded and to make sure that PowerHA was unaware. While performing each test, we ran a ping from an outside client node to both boot IPs and the service IP (a short monitoring sketch follows the test list):

1. We pulled the cable from ent3. This resulted in continued service surviving on ent2. This was verified with netstat and entstat commands, along with the surviving ping running from the client. AIX makes note of this in the error report. PowerHA, however, is unaware that a failure occurred. The error report errors are shown in Figure 13-8.

Figure 13-8 EtherChannel errors in AIX error report

WARNING: There may be an insufficient number of communication interfaces defined on node: neo, network: channet. Multiple communication interfaces are recommended for networks that will use IP aliasing.

WARNING: There may be an insufficient number of communication interfaces defined on node: trinity, network: channet. Multiple communication interfaces are recommended for networks that will use IP aliasing.

Important: When implementing a similar single adapter network configuration, for proper PowerHA network detection, it is necessary to configure a netmon.cf file. More information can be found in 13.4, “Understanding the netmon.cf file” on page 621

F77ECAC2   0624145904 T H ent3           ETHERNET NETWORK RECOVERY MODE
8650BE3F   0624145904 I H ent6           ETHERCHANNEL RECOVERY
F77ECAC2   0624145904 T H ent2           ETHERNET NETWORK RECOVERY MODE


2. We pulled the cable from ent2. This caused the standby adapter, ent0, to take over the services. Much like the previous test, AIX noted the failure in the error report, but PowerHA did not. Because we used crossover cables, this had the dual effect of causing similar errors and swaps on both nodes.

3. We then pulled the lone surviving adapter of ent0. This, in turn, resulted in a full EtherChannel failure, which was noticed as a failed network by PowerHA.
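During tests like these, a hedged way to check the channel state from the command line (complementing the error report entries shown above) is:

   errpt | head        # recent error report entries, such as the EtherChannel recovery events
   entstat -d ent6     # EtherChannel summary plus per-adapter link statistics
   netstat -i          # confirms en6 and its addresses are still configured and up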

EtherChannel conclusions
Our overall thoughts about the implementation of EtherChannel in a PowerHA environment were very positive. Although the configuration required some additional initial planning, it was very quick and easy to set up. We were especially pleased with the recovery times in our testing; they were almost instantaneous and had no impact on our running cluster. We were also pleased at how this model eliminated the removal of routes in PowerHA events associated with local adapter swaps, making the failure time shorter and easier to troubleshoot.

In summary, the simplicity and overall benefits of the EtherChannel model make it a very promising choice when planning a new environment that needs PowerHA’s availability with scalable network bandwidth and redundancy. The dynamic scalability and possibilities for even greater redundancy are an even bigger incentive to consider migration to this type of configuration.

13.2 Distribution preference for service IP aliases

When using IP aliasing with multiple service IP addresses configured, PowerHA analyzes the total number of aliases, whether defined to PowerHA or not, and assigns each service address to the least loaded interface. PowerHA gives you added control over their placement and allows you to define a distribution preference for your service IP label aliases.

This network-wide attribute can be used to customize the load balancing of HA service IP labels, taking into consideration any persistent IP labels already configured. The distribution selected is maintained during cluster startup and subsequent cluster events. The distribution preference will be maintained as long as acceptable network interfaces are available in the cluster. However, PowerHA will always keep service IP labels active, even if the preference cannot be satisfied.


There are four different distribution policies available:

� Anti-Collocation: This is the default. PowerHA distributes all service IP aliases across all base IP addresses using a “least loaded” selection process.

� Collocation: PowerHA allocates all service IP aliases on the same network interface card (NIC).

� Anti-Collocation with Persistent: PowerHA distributes all service IP aliases across all active physical interfaces that are NOT hosting the persistent IP label. PowerHA will place the service IP alias on the interface that is hosting the persistent label only if no other network interface is available. If you did not configure persistent IP labels, PowerHA lets you select the Anti-Collocation with Persistent distribution preference, but it issues a warning and uses the regular anti-collocation preference by default.

� Collocation with Persistent: All service IP aliases are allocated on the same NIC that is hosting the persistent IP label. This option can be useful in VPN firewall configurations where only one interface is granted external connectivity and all IP labels (persistent and service) must be allocated on the same interface card. If you did not configure persistent IP addresses, PowerHA lets you select the Collocation with Persistent distribution preference, but it issues a warning and uses the regular Collocation preference by default.

13.2.1 Configuring service IP distribution policy

The distribution preference can be set or changed dynamically. The steps to configure this type of distribution policy are as follows:

1. Enter smitty hacmp.

2. In SMIT, select Extended Configuration → Extended Resource Configuration → HACMP Extended Resources Configuration → Configure Resource Distribution Preferences → Configure Service IP labels/addresses Distribution Preferences and press Enter.

PowerHA will display only networks using IPAT via Aliasing.

3. Select the network for which you want to specify the policy and press Enter.

4. From the Configure Service IP Labels/Address Distribution Preference panel, choose the Distribution Preference desired.

5. Press Enter to accept your selection and update the PowerHA ODM on the local node.

6. In order for the change to take effect and to get propagated out to all nodes you will need to synchronize your cluster. Using the smitty hacmp fast path, select Extended Configuration → Verification and Synchronization and press Enter. This will trigger a dynamic reconfiguration event.


Viewing the distribution preference for service IP label aliases
You should be able to display the current distribution preference for each network using the cltopinfo or the cllsnw commands.

The output of cltopinfo -w will display the following lines:

root@lee[/] cltopinfo -w
Network net_diskhb_01
    NODE lee: lee_dhb /dev/hdisk1
    NODE ann: ann_dhb /dev/hdisk1

Network net_diskhbmulti_01
    NODE lee: lee_1 /dev/mndhb_lv_01
    NODE ann: ann_2 /dev/mndhb_lv_01

Network net_ether_01
    NODE lee:
        service_address_2 192.168.1.2
        service_address_1 192.168.1.1
        lee               9.12.7.5
    NODE ann:
        service_address_2 192.168.1.2
        service_address_1 192.168.1.1
        ann               9.12.7.11

The following sample output of cllsnw -c displays the service label distribution preference (sldp) for a particular network:

# /usr/es/sbin/cluster/utilities/cllsnw -c
#netname:attr:alias:monitor_method:sldp
net_ether_01:public:true:default:lee::sldp_collocation_with_persistent
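If you only want the preference itself, you can parse the colon-delimited cllsnw -c output. The following one-liner is a simple sketch that assumes the sldp value is the last colon-separated field of each data line, as in the sample output above:

# Print each network name and its service label distribution preference (sldp),
# skipping the header line; field positions are assumed from the sample output
/usr/es/sbin/cluster/utilities/cllsnw -c | awk -F: '!/^#/ {print $1, $NF}'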

13.2.2 Lab experiences with service IP distribution policy

In our testing, we changed the service IP distribution policy with cluster services down on all of the nodes and, on cluster startup, saw the IP labels and persistent IPs get distributed according to the specified policy.

Note: Configuring the service IP distribution policy resulted in the messages:

cldare: Detected changes to service IP label app1svc. Please note that changing parameters of service IP label via a DARE may result in releasing resource group <name>.

This was visible in the output of netstat -i:

python-# more /etc/hosts
10.10.31.31     pythona    # base address 1
10.10.32.31     pythonb    # base address 2
192.168.100.31  p630n01 n1 # python persistent address
192.168.100.82  app1svc    # cobra service address
192.168.100.83  app2svc    # viper service address

python-# netstat -i
Name Mtu   Network     Address          Ipkts   Ierrs Opkts   Oerrs Coll
en0  1500  link#2      0.2.55.4f.c4.ab  5044669 0     1828909 0     0
en0  1500  10.10.31    pythona          5044669 0     1828909 0     0
en0  1500  192.168.100 p630n01          5044669 0     1828909 0     0
en0  1500  192.168.100 app1svc          5044669 0     1828909 0     0
en0  1500  192.168.100 app2svc          5044669 0     1828909 0     0
en3  1500  link#3      0.20.35.e2.7f.8d 3191047 0     1410806 0     0
en3  1500  10.10.32    pythonb          3191047 0     1410806 0     0
lo0  16896 link#1                       1952676 0     1957548 0     0
lo0  16896 127         localhost        1952676 0     1957548 0     0
lo0  16896             localhost        1952676 0     1957548 0     0

Our testing of a dynamic change of this policy resulted in no movement of any of the labels after a synchronization. The following messages were logged during the synchronization of the cluster after making the service IP distribution policy change:

Verifying additional pre-requisites for Dynamic Reconfiguration...

cldare: Detected changes to service IP label app1svc. Please note that changing parameters of service IP label via a DARE may result in releasing resource group APP1_RG.

cldare: Detected changes to service IP label app2svc. Please note that changing parameters of service IP label via a DARE may result in releasing resource group APP2_RG.

Note: In this output, the node python had the resource groups for nodes cobra and viper and their corresponding service IPs. The distribution policy was set to Collocation with persistent.

Note: For this instance, the message logged is generic and only gets reported because a change was detected. As long as that was the only change made, no actual resources will be brought offline.

A change to the service IP distribution policy is only enforced whenever we manually invoke a swap event or stop and restart PowerHA on a node. Note that this is the intended behavior of the feature in order to avoid any potential disruption of connectivity to those IP addresses. The remaining cluster nodes will not enforce the policy unless cluster services are also stopped and restarted on them.

13.3 Site specific service IP labels

The site specific service IP label feature was added in HACMP 5.4. It provides the ability to have unique service addresses at each site. This helps solve the problem of having different subnets at each site. It can be used for the IP network types of:

- ether
- XD_data
- XD_ip

It also can be used in combination with regular service IP labels and persistent IP labels. It is generally recommended to use persistent IP labels as well, especially a node-bound one with XD_data networks, because no communication occurs through the service IP label that is configurable on multiple nodes.

In order to configure and utilize site specific service IP labels, sites must be defined to the cluster. After adding the cluster and adding nodes to the cluster, perform the following steps:

1. Add sites.

2. Add network(s) (ether, XD_data, or XD_ip):

– Enable IPAT via IP Alias

3. Add Service IP label(s):

– Configurable on multiple nodes
– Specify associated site

4. Add resource group(s)

5. Add service IP label(s) into the resource group(s)

6. Synchronize cluster

7. Test site specific IP fallover

Important: In PowerHA/XD, configurations utilizing site specific IP labels with XD network types require configuring a persistent IP label on each node.

In our test scenario, we have a two-node cluster with nodes jordan and jessica, each of which currently has a single ether network with two interfaces defined, plus a resource group defined. We also have a volume group named xsite_vg available on each node, as shown. Here is our starting topology configuration:

jordan /# cltopinfo
Cluster Name: dfwxsitelvm
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
There are 2 node(s) and 1 network(s) defined
NODE jessica:
    Network net_ether_01
        jessica_base1   10.10.10.100
        jessica_base2   10.10.11.1
NODE jordan:
    Network net_ether_01
        jordan_base1    10.10.10.200
        jordan_base2    10.10.11.2

Adding sites

To define the sites:

1. Using the smitty hacmp fast path, select Extended Configuration → Extended Topology Configuration → Configure HACMP Sites → Add a Site, and press Enter.

2. We add the two sites Dallas and FtWorth. Node jordan is a part of the Dallas site, while node jessica is a part of the FtWorth site. The Add Site menu is shown in Figure 13-9.

Figure 13-9 Add Site

Add Site

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                     [Entry Fields]
* Site Name                                          [FtWorth]         +
* Site Nodes                                         jessica           +

F1=Help F2=Refresh F3=Cancel F4=ListF5=Reset F6=Command F7=Edit F8=ImageF9=Shell F10=Exit Enter=Do

Adding a network

To define the additional network:

1. Using the smitty hacmp fast path, select Extended Configuration → Extended Topology Configuration → Configure HACMP networks → Add a Network to the Cluster, and press Enter. Choose the network type, in our case we chose XD_data, and press Enter.

2. You can keep the default network name, as we did, or specify one. Ensure that the option Enable IP Address Takeover via IP Aliases is set to yes as shown in Figure 13-10.

Figure 13-10 Add network

Add an IP-Based Network to the HACMP Cluster

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                     [Entry Fields]
* Network Name                                       [net_XD_data_01]
* Network Type                                       XD_data
* Netmask                                            [255.255.255.0]   +
* Enable IP Address Takeover via IP Aliases          [Yes]             +
  IP Address Offset for Heartbeating over IP Aliases []

Adding service IP labels

To define the site specific service IP labels:

1. Using the smitty hacmp fast path, select Extended Configuration → Extended Resource Configuration → HACMP Extended Resources Configuration → Configure HACMP Service IP Labels/Addresses → Add a Service IP Label/Address.

2. Choose Configurable on Multiple Nodes, choose network (net_XD_data_01 in our case), choose an IP label from the picklist provided, and press Enter. Ensure that you specify the Associated Site that matches the node in which the service label belongs, as shown in Figure 13-11.

Figure 13-11 Add a site specific service IP label

Add a Service IP Label/Address configurable on Multiple Nodes (extended)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                     [Entry Fields]
* IP Label/Address                                   jordan_svc1       +
  Prefix Length                                      []                #
* Network Name                                       net_XD_data_01
  Alternate Hardware Address to accompany IP Label/Address []
  Associated Site                                    Dallas            +

3. Repeat the previous step as needed for each service IP label. We repeated and added service IP label jessica_svc2 to the FtWorth site.

The topology as configured at this point is shown here:

jordan /# cltopinfo
Cluster Name: dfwxsitelvm
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
There are 2 node(s) and 2 network(s) defined
NODE jessica:
    Network net_XD_data_01
        jessica_svc2    192.20.10.3
        jordan_svc1     192.10.10.2
        jessica_en0     192.168.1.1
    Network net_ether_02
        jessica_base1   10.10.10.100
        jessica_base2   10.10.11.1
NODE jordan:
    Network net_XD_data_01
        jessica_svc2    192.20.10.3
        jordan_svc1     192.10.10.2
        jordan_en0      192.168.1.2
    Network net_ether_02
        jordan_base1    10.10.10.200
        jordan_base2    10.10.11.2

jordan /# cllssite
---------------------------------------------------
Sitename     Site Nodes     Dominance     Protection Type
---------------------------------------------------
Dallas       jordan         yes           NONE
FtWorth      jessica        no            NONE


jordan /# cllsif

jessica_en0   boot    net_XD_data_01 XD_data public jessica 192.168.1.1  en0
jessica_svc2  service net_XD_data_01 XD_data public jessica 192.20.10.3
jordan_svc1   service net_XD_data_01 XD_data public jessica 192.10.10.2
jessica_base1 boot    net_ether_02   ether   public jessica 10.10.10.100 en1
jessica_base2 boot    net_ether_02   ether   public jessica 10.10.11.1   en2
jordan_en0    boot    net_XD_data_01 XD_data public jordan  192.168.1.2  en0
jessica_svc2  service net_XD_data_01 XD_data public jordan  192.20.10.3
jordan_svc1   service net_XD_data_01 XD_data public jordan  192.10.10.2
jordan_base1  boot    net_ether_02   ether   public jordan  10.10.10.200 en1
jordan_base2  boot    net_ether_02   ether   public jordan  10.10.11.2   en2

Notice that the service addresses of jordan_svc1 and jessica_svc2 are indeed on separate subnets based on our netmask (not shown) of 255.255.255.0.
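As a quick check of that statement, applying the 255.255.255.0 mask to both service addresses yields different network numbers. This is simple arithmetic, not output from any cluster command:

192.10.10.2 AND 255.255.255.0 = 192.10.10.0   (subnet of jordan_svc1)
192.20.10.3 AND 255.255.255.0 = 192.20.10.0   (subnet of jessica_svc2)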

Adding a resource group

To add a resource group:

1. Using the smitty hacmp fast path, select Extended Configuration → Extended Resource Configuration → HACMP Extended Resource Group Configuration → Add a Resource Group, and press Enter.

2. Complete the fields as desired. Our configuration is shown below in Figure 13-12.

Figure 13-12 Add a resource group

Add a Resource Group (extended)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                     [Entry Fields]
* Resource Group Name                                [testsiteip]

  Inter-Site Management Policy                       [Prefer Primary Site] +
* Participating Nodes from Primary Site              [jordan]              +
  Participating Nodes from Secondary Site            [jessica]             +

  Startup Policy                                     Online On Home Node O>
  Fallover Policy                                    Fallover To Next Prio>
  Fallback Policy                                    Never Fallback        +

Note: The additional site-related options shown in this figure are available only if sites are defined.

Adding service IP labels into the resource group

To add the service IP labels into the resource group:

1. Using the smitty hacmp fast path, select Extended Configuration → Extended Resource Configuration → HACMP Extended Resource Group Configuration → Change/Show Resources and Attributes for a Resource Group, choose the resource group, testsiteip in our case, and press Enter.

2. Then specify the policies as desired. Choose Service IP Labels/Addresses and press F4 to get a picklist of the service IP labels previously created. Choose both with F7 and press Enter. After completing the fields, press Enter to add the resources to the resource group. Our example follows in Figure 13-13.

Figure 13-13 Add service IP into resource group

Change/Show All Resources and Attributes for a Custom Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                [Entry Fields]
  Resource Group Name                                testsiteip
  Inter-site Management Policy                       ignore
  Participating Nodes from Primary Site              jordan
  Participating Nodes from Secondary Site            jessica

  Startup Policy                                     Online On Home Node O>
  Fallover Policy                                    Fallover To Next Prio>
  Fallback Policy                                    Never Fallback

  Service IP Labels/Addresses                        [jessica_svc2 jordan_s>
  Application Servers                                []                    +

  Volume Groups                                      [xsite_vg ]

Synchronizing the cluster

To synchronize the cluster:

1. Using the smitty hacmp fast path, select Extended Configuration → Extended Verification and Synchronization.

2. Upon successful synchronization and verification, we are ready to test.

Testing site specific IP fallover

To display the results of site specific IP fallover tests, we show the netstat output before cluster start, after start, and after fallover. We start with the following output:

jordan /# netstat -i
Name Mtu   Network   Address          Ipkts   Ierrs Opkts  Oerrs Coll
en0  1500  link#2    b2.2f.e0.0.30.2  144943  0     144703 0     0
en0  1500  192.168.1 jordan_en0       144943  0     144703 0     0
en1  1500  link#3    b2.2f.e0.0.30.3  1147516 0     135586 0     0
en1  1500  10.10.10  jordan_base1     1147516 0     135586 0     0
en1  1500  9.19.51   jordan_pers      1147516 0     135586 0     0
en2  1500  link#4    b2.2f.e0.0.30.4  1089660 0     79221  0     0
en2  1500  10.10.11  jordan_base2     1089660 0     79221  0     0
lo0  16896 link#1                     177591  0     178039 0     0
lo0  16896 127       loopback         177591  0     178039 0     0
lo0  16896 ::1                        177591  0     178039 0     0

jessica /# netstat -i
Name Mtu   Network   Address          Ipkts   Ierrs Opkts  Oerrs Coll
en0  1500  link#2    b2.2f.e0.0.40.2  144779  0     145015 0     0
en0  1500  192.168.1 jessica_en0      144779  0     145015 0     0
en1  1500  link#3    b2.2f.e0.0.40.3  1116094 0     135662 0     0
en1  1500  10.10.10  jessica_base1    1116094 0     135662 0     0
en1  1500  9.19.51   jessica_pers     1116094 0     135662 0     0
en2  1500  link#4    b2.2f.e0.0.40.4  1057348 0     76408  0     0
en2  1500  10.10.11  jessica_base2    1057348 0     76408  0     0
lo0  16896 link#1                     182955  0     183219 0     0
lo0  16896 127       loopback         182955  0     183219 0     0
lo0  16896 ::1                        182955  0     183219 0     0

This output shows that en0 is currently configured with only the boot IP address on both nodes, whereas en1 is hosting the persistent IP address. Upon starting cluster services, because jordan is the primary, it acquires the service address specific to the Dallas site. The secondary node, jessica, remains unchanged. This is shown here:

jordan /# netstat -i
Name Mtu   Network   Address          Ipkts   Ierrs Opkts  Oerrs Coll
en0  1500  link#2    b2.2f.e0.0.30.2  146903  0     144703 0     0
en0  1500  192.168.1 jordan_en0       146903  0     144703 0     0
en0  1500  192.10.10 jordan_svc1      146903  0     144703 0     0
en1  1500  link#3    b2.2f.e0.0.30.3  1159516 0     135586 0     0
en1  1500  10.10.10  jordan_base1     1159516 0     135586 0     0
en1  1500  9.19.51   jordan_pers      1159516 0     135586 0     0
en2  1500  link#4    b2.2f.e0.0.30.4  1089660 0     79221  0     0
en2  1500  10.10.11  jordan_base2     1089660 0     79221  0     0
lo0  16896 link#1                     177591  0     178039 0     0
lo0  16896 127       loopback         177591  0     178039 0     0
lo0  16896 ::1                        177591  0     178039 0     0

jessica /# netstat -i
Name Mtu   Network   Address          Ipkts   Ierrs Opkts  Oerrs Coll
en0  1500  link#2    b2.2f.e0.0.40.2  144779  0     145015 0     0
en0  1500  192.168.1 jessica_en0      144779  0     145015 0     0
en1  1500  link#3    b2.2f.e0.0.40.3  1116094 0     135662 0     0
en1  1500  10.10.10  jessica_base1    1116094 0     135662 0     0
en1  1500  9.19.51   jessica_pers     1116094 0     135662 0     0
en2  1500  link#4    b2.2f.e0.0.40.4  1057348 0     76408  0     0
en2  1500  10.10.11  jessica_base2    1057348 0     76408  0     0
lo0  16896 link#1                     182955  0     183219 0     0
lo0  16896 127       loopback         182955  0     183219 0     0
lo0  16896 ::1                        182955  0     183219 0     0

We then move the resource group to the other site via the SMIT fast path smitty cl_resgrp_move.node_site. Upon success, the primary site service IP, jordan_svc1, is removed and the secondary site service IP, jessica_svc2, is brought online as shown below:

jordan /# netstat -i
Name Mtu   Network   Address          Ipkts   Ierrs Opkts  Oerrs Coll
en0  1500  link#2    b2.2f.e0.0.30.2  151103  0     144703 0     0
en0  1500  192.168.1 jordan_en0       151103  0     144703 0     0
en1  1500  link#3    b2.2f.e0.0.30.3  1147516 0     135586 0     0
en1  1500  10.10.10  jordan_base1     1147516 0     135586 0     0
en1  1500  9.19.51   jordan_pers      1147516 0     135586 0     0
en2  1500  link#4    b2.2f.e0.0.30.4  1089660 0     79221  0     0
en2  1500  10.10.11  jordan_base2     1089660 0     79221  0     0
lo0  16896 link#1                     177591  0     178039 0     0
lo0  16896 127       loopback         177591  0     178039 0     0
lo0  16896 ::1                        177591  0     178039 0     0

jessica /# netstat -i
Name Mtu   Network   Address          Ipkts   Ierrs Opkts  Oerrs Coll
en0  1500  link#2    b2.2f.e0.0.40.2  149900  0     152034 0     0
en0  1500  192.168.1 jessica_en0      149900  0     152034 0     0
en0  1500  192.20.10 jessica_svc2     149900  0     152034 0     0
en1  1500  link#3    b2.2f.e0.0.40.3  1638442 0     137317 0     0
en1  1500  10.10.10  jessica_base1    1638442 0     137317 0     0
en1  1500  9.19.51   jessica          1638442 0     137317 0     0
en2  1500  link#4    b2.2f.e0.0.40.4  1070543 0     77209  0     0
en2  1500  10.10.11  jessica_base2    1070543 0     77209  0     0
lo0  16896 link#1                     191780  0     192051 0     0
lo0  16896 127       loopback         191780  0     192051 0     0
lo0  16896 ::1                        191780  0     192051 0     0

13.4 Understanding the netmon.cf file

This section discusses the effects of the netmon.cf file.

Traditional netmon.cf

In PowerHA you can create a netmon.cf configuration file with a list of additional network addresses. These addresses will only be used by topology services to send ICMP ECHO requests in order to help determine an adapter’s status under certain circumstances.

The implementation of this file is therefore not required, but recommended in cluster configurations with only a single network card on each node or in a cluster where failures have left a single adapter network remaining. In these scenarios it can be difficult for PowerHA to accurately determine an adapter failure because topology services cannot force traffic over the single adapter to confirm its proper operation.

An enhancement to netmon, the network portion of RSCT topology services, allows for a more accurate determination of a service adapter failure. This function can be used in a configuration that requires the use of a single service adapter per network.

The file must exist at cluster startup because RSCT topology services scans the netmon.cf file during initialization. When netmon needs to stimulate the network to ensure adapter function, it sends ICMP ECHO requests to each IP address in the file. After sending the request to every address, netmon checks the inbound packet count before determining whether an adapter has failed.

Creating a netmon.cf file

The netmon.cf file must be placed in the /usr/es/sbin/cluster directory on all cluster nodes.

Here are the requirements for creating the netmon.cf file:

- The file must consist of one IP address or IP label per line.

- The file should contain remote IP labels/addresses that are not in the cluster configuration and that can be accessed from PowerHA interfaces. We recommend the use of the router’s IP address.

- A maximum of 30 IP addresses/labels can be defined in netmon.cf.

- Include each IP address and its corresponding label in the /etc/hosts file.

The contents of your netmon.cf file might resemble the following example:

/usr/es/sbin/cluster/netmon.cf
192.168.100.76
p690_1_lpar3
192.168.100.35
router_lan1

/etc/hosts (corresponding entries)
192.168.100.76  node365      #client node running oracle
192.168.100.21  p690_1_lpar3 #client node hosting application 4
192.168.100.35  node367      #client node running db2
192.168.100.200 router_lan1  #router hosting production lan
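Because every entry must resolve and be reachable for it to be useful to topology services, a quick sanity check of the file can save debugging time later. The following is a minimal sketch, not a PowerHA utility; it assumes the file is in the default location and contains only traditional one-entry-per-line targets:

#!/bin/ksh
# Sketch: ping each non-comment entry in netmon.cf once and report the result
CF=/usr/es/sbin/cluster/netmon.cf
grep -v '^#' $CF | while read entry; do
    [ -z "$entry" ] && continue
    if ping -c 1 "$entry" > /dev/null 2>&1; then
        echo "$entry: reachable"
    else
        echo "$entry: NOT reachable (check /etc/hosts and the network path)"
    fi
done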

Recommendations and additional notes

As a general rule of thumb, you should implement a netmon.cf file in a two-node cluster configuration using a single IP network, regardless of the implementation of a non-IP, serial heartbeat network. This helps topology services identify an adapter failure if the node ever goes into a singleton state, where basically only one node is left in the cluster.

Other scenarios can include environments using a single logical EtherChannel interface made up of multiple ethernet links. In that environment link failures are handled seamlessly by the EtherChannel logic, but a complete channel failure would result in problems for topology services without the netmon.cf file. Implementing EtherChannel in a PowerHA environment is discussed in detail in section 13.1, “EtherChannel” on page 600.

13.4.1 New netmon.cf format for VIO environments

In the following section, we discuss some aspects of the netmon.cf configuration file with its use in a Virtual I/O environment.

The problem that netmon addresses

This new netmon functionality was added to support PowerHA on VIO. PowerHA customers using VIO within their clusters have experienced problems with specific scenarios where an entire CEC is unplugged from the network, but the PowerHA node within it does not detect a local adapter down event.

Note: When the NIM process (from RSCT Topology Services) attempts to determine the state of local adapters, it might try to use hostname resolution. If the IP and corresponding label are not in /etc/hosts and a problem or a delay is encountered while trying to resolve the address, the overall failure detection time might be prolonged and result in slow fallover operations.

This is because traffic being passed between the VIO clients looks like normal external traffic from the perspective of the LPAR's OS.

There is already a limitation against two PowerHA nodes in the same cluster using the same VIO server, because this would mean heartbeats can be passed between the nodes via the server even when no real network connectivity exists. The problem addressed by this new netmon.cf format is not the same as that issue, although there are similarities.

In PowerHA, heartbeating is used as a reliable means of monitoring an adapter's state over a long period of time. When heartbeating is broken, a decision has to be made as to whether the local adapter has gone bad: does the neighbor have a problem, or is it something between them? The local node only needs to take action if the local adapter is the problem. If its own adapter is good, then we assume it is still reachable by other clients regardless of the neighbor's state, because the neighbor is responsible for acting on its own local adapter failures.

This decision of which adapter is bad, local or remote, is made based on whether any network traffic can be seen on the local adapter, using the inbound byte count of the interface. Where VIO is involved, this test becomes unreliable because there is no way to distinguish whether inbound traffic came in from the VIO server's connection to the outside world, or just from a neighboring VIO client. This is a design point of VIO that its virtual adapters be indistinguishable to the LPAR from a real adapter.

Problem resolution (interim)

A long term solution will require cooperative design work between both VIO and PowerHA/RSCT so that customers can take advantage of the Virtual I/O Server’s benefits while PowerHA remains aware enough of what is happening below the surface to react appropriately when needed.

In the meantime, an intermediate solution for customers who are already using VIO is being provided. This new format allows customers to declare that a given adapter should only be considered up if it can ping a set of specified targets.

Configuring new netmon.cf

The netmon.cf file must be placed in the /usr/es/sbin/cluster directory on all cluster nodes. Up to 32 different targets can be provided for each interface. If any given target is pingable, the adapter will be considered up.

Important: For this fix to be effective, the customer must select targets that are outside the VIO environment, and not reachable simply by hopping from one Virtual I/O Server to another. Neither PowerHA nor RSCT will be able to detect if this restriction is violated.

Targets are specified using the existing netmon.cf configuration file as shown in 13.4, “Understanding the netmon.cf file” on page 621, using this new format:

!REQD <owner> <target>

Parameters: ----------

!REQD : An explicit string; it *must* be at the beginning of the line (no leading spaces).

<owner> : The interface this line is intended to be used by; that is, the code monitoring the adapter specified here will determine its own up/down status by whether it can ping any of the targets (below) specified in these lines. The owner can be specified as a hostname, IP address, or interface name. In the case of hostname or IP address, it *must* refer to the boot name/IP (no service aliases). In the case of a hostname, it must be resolvable to an IP address or the line will be ignored. The string "!ALL" will specify all adapters.

<target> : The IP address or hostname you want the owner to try to ping. As with normal netmon.cf entries, a hostname target must be resolvable to an IP address in order to be usable.

The traditional format of the netmon.cf file, which has one hostname or IP address per line, has not changed. Any adapters not matching one of the !REQD lines will still use the traditional lines as they always have; as extra targets for pinging in addition to known local or defined adapters, with the intent of increasing the inbound byte count of the interface.

Any adapter matching one or more !REQD lines (as the owner) will ignore any traditional lines.

Order from one line to the other is unimportant; you can mix !REQD lines with traditional ones in any way. However, if you are using a full 32 traditional lines, do not put them all at the very beginning of the file; otherwise, each adapter will read in all of the traditional lines (because those lines apply to any adapter by default), stop at 32, and quit reading the file there. The same problem does not occur in reverse, because !REQD lines that do not apply to a given adapter are skipped over and do not count toward the 32 maximum.

Comment lines, which begin with "#", are allowed on or between lines and are ignored.

If more than 32 !REQD lines are specified for the same owner, any extra lines are simply ignored, just as with traditional lines.
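Because both limits are easy to exceed silently, a quick count of each category of line can be useful. The following one-liners are only a sketch against the default file location; they are not PowerHA utilities:

# Count traditional lines (anything that is not a comment or a !REQD line);
# more than 32 of these per adapter are ignored
grep -v '^#' /usr/es/sbin/cluster/netmon.cf | grep -cv '^!REQD'

# Count !REQD lines per owner (second field); more than 32 per owner are ignored
grep '^!REQD' /usr/es/sbin/cluster/netmon.cf | awk '{print $2}' | sort | uniq -c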

Examples

Here are some brief examples just to explain the syntax:

9.12.4.11
!REQD en2 100.12.7.9
9.12.4.13
!REQD en2 100.12.7.10

Most adapters will use netmon in the traditional manner, pinging 9.12.4.11 and 9.12.4.13 along with other local adapters or known remote adapters, and will only care about the interface's inbound byte count for results. Interface en2 will only be considered up if it can ping either 100.12.7.9 or 100.12.7.10:

!REQD host1.ibm 100.12.7.9
!REQD host1.ibm host4.ibm
!REQD 100.12.7.20 100.12.7.10
!REQD 100.12.7.20 host5.ibm

The adapter owning host1.ibm will only be considered up if it can ping 100.12.7.9 or whatever host4.ibm resolves to. The adapter owning 100.12.7.20 will only be considered up if it can ping 100.12.7.10 or whatever host5.ibm resolves to. It is possible that 100.12.7.20 is the IP address for host1.ibm (we cannot tell from this example); if it is, then all four targets belong to that adapter:

!REQD !ALL 100.12.7.9
!REQD !ALL 110.12.7.9
!REQD !ALL 111.100.1.10
!REQD en1 9.12.11.10

All adapters will be considered up only if they can ping 100.12.7.9, 110.12.7.9, or 111.100.1.10. Interface en1 has one additional target: 9.12.11.10. In this example, having any traditional lines would be pointless, because all of the adapters have been defined to use the new method.

13.4.2 Implications

Any interfaces which are not included as an owner of one of the !REQD netmon.cf lines will continue to behave in the old manner, even if you are using this new function for other interfaces.

Important: We do not recommend that customers use this new function unless they absolutely have to because of their VIO environment.

This format does not change heartbeating behavior itself in any way. It only changes how the decision is made as to whether a local adapter is up or down. So this new logic will be used upon:

- Startup, before heartbeating rings are formed
- Heartbeat failure, when contact with a neighbor is initially lost
- During periods when heartbeating is not possible, such as when a node is the only one up in the cluster

Invoking the new format changes the definition of a good adapter from “Am I able to receive any network traffic?” to “Can I successfully ping certain addresses?” regardless of how much traffic is seen.

This fact alone makes it inherently more likely for an adapter to be falsely considered down, because the second definition is more restrictive.

For this same reason, customers who find they must take advantage of this new functionality are encouraged to be as generous as possible with the number of targets they provide for each interface.

13.5 Understanding the clhosts file

The clhosts file contains IP address information that helps to enable communication among monitoring daemons on clients and within the PowerHA cluster nodes. The tools that utilize this file include clinfoES, HAView, and clstat. The file resides on all PowerHA cluster servers and clients in the /usr/es/sbin/cluster/etc/ directory.

When a monitor daemon starts up, it reads the following file, /usr/es/sbin/cluster/etc/clhosts, to determine which nodes are available for communication. Therefore, it is important for these files to be in place whenever trying to use the monitoring tools from a client outside of the cluster. Whenever the server portion of PowerHA is installed, the clhosts file is updated on the cluster nodes with the loopback address (127.0.0.1). The contents of the file within each cluster node typically will only contain the following line:

127.0.0.1 # HACMP/ES for AIX

Creating the clhosts file

In prior releases, you were required to manually create and maintain the client-based clhosts file. In HACMP 5.3, you can automatically generate the clhosts file needed by clients when you perform a verification with the automatic corrective action feature enabled. The verification will create a /usr/es/sbin/cluster/etc/clhosts.client file on all cluster nodes.

The file will look similar to the following example:

# /usr/es/sbin/cluster/etc/clhosts.client
# Created by HACMP Verification / Synchronization Corrective Actions
# Date Created: 07/01/2005 at 12:45:29
192.168.100.102 #dlpar_app2_svc
192.168.100.101 #dlpar_app1_svc
192.168.202.204 #alexis_base1
192.168.202.205 #alexis_base2
192.168.100.61  #alexis
192.168.201.203 #jessica_base2
192.168.201.202 #jessica_base1
192.168.100.72  #jessica
192.168.200.200 #jordan_base1
192.168.200.201 #jordan_base2
192.168.100.71  #jordan

Notice that all of the addresses are pulled in, including the boot, service, and persistent IP labels. Before utilizing any of the monitor utilities from a client node, the clhosts.client file must be copied over to all clients as /usr/es/sbin/cluster/etc/clhosts. Remember to remove the .client extension when you copy the file over to the client nodes.
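A distribution step like the following can do that copy in one pass. This is only a sketch; the client host names (client1, client2) are placeholders for your own systems, and it assumes scp access from the cluster node to each client:

# Copy the generated file to each client, dropping the .client extension
for client in client1 client2; do
    scp /usr/es/sbin/cluster/etc/clhosts.client \
        $client:/usr/es/sbin/cluster/etc/clhosts
done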

Use of clstat on a client requires a clhosts file

When running the clstat utility from a client, the clinfoES daemon obtains its cluster status information from the server-side SNMP process and populates the PowerHA MIB on the client side. If no clhosts file is available, clinfoES will be unable to communicate with that process and will report that it is unable to find any clusters.

In this type of environment it is critical to implement a clhosts file on the client. This file will give the clinfoES daemon the addresses to attempt communication with the SNMP process running on the PowerHA cluster nodes.

Important: The clhosts file on a client should never contain 127.0.0.1, loopback, or localhost.

Note: When using IPAT via Replacement do not include standby addresses in the clhosts file.

HAView and the clhosts file

HAView monitors a cluster’s state within a network topology based on cluster specific information in the /usr/es/sbin/cluster/etc/clhosts file. This file must be present on the Tivoli NetView® management node. Make sure that the hostname and service label of your Tivoli NetView nodes are exactly the same. (If they are not the same, add an alias in the /etc/hosts file to resolve the name difference.)

If an invalid IP address exists in the clhosts file, HAView will fail to monitor the cluster. Make sure that the IP addresses are valid, and there are no extraneous characters in the file.

13.6 Understanding the clinfo.rc file

PowerHA can be configured to change the MAC address of a network interface by the implementation of hardware address takeover (HWAT). In a switched ethernet network environment, the switch might not always get promptly informed of the new MAC. In turn, the switch will not route the appropriate packets to the network interface.

The clinfo.rc script is used by PowerHA to flush the system’s ARP cache in order to reflect changes to network IP addresses. It does not update the cache until another address responds to a ping request. Flushing the ARP cache typically is not necessary if the PowerHA hardware address swapping facility is enabled. This is because hardware address swapping maintains the relationship between an IP address and a hardware address.

On clients not running clinfoES, you might have to update the local ARP cache indirectly by pinging the client from the cluster node. To avoid this, add the name or address of each client host you want to notify to the PING_CLIENT_LIST variable in the clinfo.rc script. The clinfoES program calls the /usr/es/sbin/cluster/etc/clinfo.rc script whenever a network or node event occurs. Through the PING_CLIENT_LIST entries, clinfo.rc can then update the ARP caches for clients and network devices such as routers.

When a cluster event occurs, clinfo.rc runs the following command for each host specified in PING_CLIENT_LIST:

ping -c1 $host

Note: HWAT is only supported in PowerHA when using IPAT via replacement.

Note: This assumes the client is connected directly to one of the cluster networks.

Configuring the clinfo.rc file

Do the following steps to ensure that the new MAC address is communicated to the switch:

1. Modify the line in /usr/es/sbin/cluster/etc/clinfo.rc that currently reads:

PING_CLIENT_LIST=" "

2. Include on this line the names or IP addresses of at least one client on each subnet on the switched Ethernet (a sketch of such an entry follows this list).

3. Run clinfoES on all nodes in the PowerHA cluster that are attached to the switched Ethernet.

Remember to meet the following requirements:

- If you normally start PowerHA cluster services using the /usr/es/sbin/cluster/etc/rc.cluster shell script, specify the -i option.

- If you normally start PowerHA cluster services through SMIT, specify yes in the Start Cluster Information Daemon? field.

- Ensure that a copy of the /usr/es/sbin/cluster/etc/clinfo.rc script exists on each server node and client in the cluster in order for all ARP caches to be updated.
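For illustration, the edited line in /usr/es/sbin/cluster/etc/clinfo.rc might look like the following sketch; the client names and the router address are placeholders, not values from this cluster:

# One entry per subnet of the switched Ethernet; names must resolve on this node
PING_CLIENT_LIST="client1 client2 192.168.100.254"

Starting cluster services with clinfoES enabled (rc.cluster -i, or the SMIT field mentioned above) then causes these hosts to be pinged on every cluster event.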

How clinfo and clinfo.rc work

This is the format of the clinfo call to clinfo.rc:

clinfo.rc {join,fail,swap} interface_name

Clinfo obtains information about the interfaces and their current state, and checks for changed states of interfaces (a sketch of a script that reacts to these calls follows the list):

- If a new state is UP, Clinfo calls clinfo.rc join interface_name.

- If a new state is DOWN, Clinfo calls clinfo.rc fail interface_name.

- If Clinfo receives a node_down_complete event, it calls clinfo.rc with the fail parameter for each interface currently UP.

- If Clinfo receives a fail_network_complete event, it calls clinfo.rc with the fail parameter for all associated interfaces.

- If Clinfo receives a swap_complete event, it calls clinfo.rc swap interface_name.
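If you add your own notification logic around those calls, it typically dispatches on the action and interface parameters described above. The following skeleton is only a sketch that mirrors that behavior; it is not the shipped clinfo.rc, and the notify_ops function, log file, and client names are hypothetical placeholders:

#!/bin/ksh
# Sketch: dispatch on the action passed by clinfoES ($1) and the interface ($2)
ACTION=$1
INTERFACE=$2
LOG=/tmp/clinfo_events.log                    # hypothetical log location
PING_CLIENT_LIST="client1 client2"            # placeholder client names

notify_ops() {
    # placeholder for site-specific alerting (mail, SNMP trap, and so on)
    print "$(date) clinfo.rc: $ACTION on $INTERFACE" >> $LOG
}

case "$ACTION" in
    join) notify_ops ;;        # interface came up
    fail) notify_ops ;;        # interface went down (or node/network failure)
    swap) notify_ops ;;        # adapter swap completed
    *)    print "$(date) clinfo.rc: unknown action $ACTION" >> $LOG ;;
esac

# Refresh the ARP caches of the listed clients, as described above
for host in $PING_CLIENT_LIST; do
    ping -c1 $host > /dev/null 2>&1 &
done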

Part 5 Disaster recovery

In Part 5, we discuss PowerHA Extended Distance (PowerHA/XD).

We cover the following topics:

- PowerHA Extended Distance concepts and planning
- PowerHA with cross-site LVM mirroring
- PowerHA/XD and SVC copy services
- GLVM concepts and configuration

Chapter 14. PowerHA Extended Distance concepts and planning

In this chapter we present an overview of the PowerHA Extended Distance (PowerHA/XD) features and capabilities. In the following chapters we describe how to install and configure some of the disaster recovery features of PowerHA/XD.

In this chapter we discuss the following topics:

- Disaster recovery considerations
- PowerHA/XD components
- PowerHA/XD SVC Global Mirror
- PowerHA GLVM
- Locating additional information

14.1 Disaster recovery considerations

Disaster recovery strategies cover a wide range from no recovery readiness to automatic recovery with high data integrity. Data recovery strategies must address the following issues:

- Data readiness levels:

– Level 0: None. No provision for disaster recovery.

– Level 1: Periodic backup. Data required for recovery up to a given date is backed up and sent to another location.

– Level 2: Ready to roll forward. In addition to periodic backups, data update logs are also sent to another location. Transport can be manual or electronic. Recovery is to the last log data set stored at the recovery site.

– Level 3: Roll forward or forward recover. A shadow copy of the data is maintained on disks at the recovery site. Data update logs are received and periodically applied to the shadow copy using recovery utilities.

– Level 4: Real time roll forward. Like roll forward, except updates are transmitted and applied at the same time as they are being logged in the original site. This real-time transmission and application of log data does not impact transaction response time at the original site.

– Level 5: Real time remote update. Both the original and the recovery copies of data are updated before sending the transaction response or completing a task.

- Site interconnection options:

– Level 0: None. There is no interconnection or transport of data between sites.

– Level 1: Manual transport. There is no interconnection. For transport of data between sites, dispatch, tracking, and receipt of data is managed manually.

– Level 2: Remote tape. Data is transported electronically to a remote tape. Dispatch and receipt are automatic. Tracking can be either automatic or manual.

– Level 3: Remote disk. Data is transported electronically to a remote disk. Dispatch, receipt, and tracking are all automatic.

- Recovery site readiness:

– Cold: A cold site typically is an environment with the proper infrastructure, but little or no data processing equipment. This equipment must be installed as the first step in the data recovery process.

Both periodic backup and ready to roll forward data can be shipped from a storage location to this site when a disaster occurs.

– Warm: A warm site has data processing equipment installed and operational. This equipment is used for other data processing tasks until a disaster occurs. Data processing resources can be used to store data, such as logs. Recovery begins after the regular work of the site is shut down and backed up.

Both periodic backup and ready to roll forward data can be stored at this site to expedite disaster recovery.

– Hot: A hot site has data processing equipment installed and operational and data can be restored either continually or regularly to reduce recovery time.

Generally disaster recovery focuses on three key areas. They are:

– Recovery Point Objective (RPO): Upon recovery, how much data is acceptable to recreate:

• If none, then synchronous replication is required.
• If some, then asynchronous replication might be suitable.

– Recovery Time Objective (RTO): What is an acceptable amount of time to be without system access:

• Minutes to a couple of hours generally require automated recovery.
• Hours to days can allow manual recovery steps.

– Network Recovery Objective (NRO): How long it takes to switch over network access.

Other considerations for planning disaster recovery are often unique to your environment. The connectivity options and the distance between sites will also dictate what type of data replication options are available to use. There is a careful balance required between the bandwidth required and latency encountered when traversing greater distance. Though technologies might support “unlimited” distance, this does not always mean it is possible or even feasible to implement it.

PowerHA/XD software provides the highest level of disaster recovery:

- Data readiness level 5: PowerHA/XD provides real-time remote update data readiness by updating both the original and the recovery copies of data prior to sending a transaction response or completing a task.

- Site interconnection level 3: PowerHA/XD also provides remote disk site interconnectivity by transmitting data electronically to a geographically distant site where the disks are updated and all bookkeeping is automatic.

- Hot site readiness: Because the recovery site contains operational data processing equipment along with current data, recovery time is kept to a minimum.

When using the PowerHA Metro Mirror or Global Mirror (SVC only) features, data mirroring is managed by the copy services at the storage level.

With PowerHA/XD, the recovery site can be actively processing data and performing useful work. In fact, each site can be a backup for the other, thereby minimizing the cost of setting up a recovery site for each original production site.

14.2 PowerHA/XD components

The PowerHA base software product addresses part of the continuous operation problem. It addresses recovery from the failure of a node, an adapter, or a local area network within a computing complex at a single site.

PowerHA/XD extends the base capability of PowerHA by providing automated fallover/fallback support for applications over geographically dispersed systems. Systems running in different locations are defined as PowerHA nodes assigned to sites, and they are managed by PowerHA like usual nodes.

The key function of PowerHA/XD is to provide and automate the data replication across sites. To accomplish this function PowerHA/XD can use several components:

- ESS/DS6000/DS8000 Metro Mirror
- SAN Volume Controller (SVC) Metro Mirror
- SAN Volume Controller (SVC) Global Mirror
- Geographic Logical Volume Manager (GLVM)

14.2.1 PowerHA/XD Metro Mirror integration feature

This feature was introduced simultaneously in HACMP/XD 4.5 PTF5 and HACMP/XD 5.1, and provides automated site fallover and activation of remote copies of application data in an environment where the IBM Enterprise Storage Server® (ESS) is used in both sites and the Metro Mirror facility provides storage volumes mirroring.

The support was expanded in HACMP/XD 5.2 to include the SAN Volume Controller Metro Mirror ability. HACMP/XD 5.2 and HACMP 5.3 were also later enhanced to include this capability on the DS8000 and DS6000™. Further updates provided support for intermixing ESS/DS6000/DS8000 units when utilizing the DS Command-line interface (DSCLI).

A typical configuration for PowerHA/XD ESS Metro Mirror is shown in Figure 14-1.

Figure 14-1 Example of a PowerHA/XD Metro Mirror configuration

14.2.2 Implications

In case of primary site failure, data should be available for use at the secondary site (replicated via Metro Mirror). The data copy in the secondary site must be activated in order to be used for processing.

Note: Currently only the combination of SVC and Metro Mirror is supported within a VIOS environment.

The PowerHA/XD Metro Mirror integration feature provides automated copy split in case of primary site failure and automated reintegration when the primary site becomes available.

For detailed information, see HACMP/XD Metro Mirror: Planning and Administration Guide, SC23-4863.

14.3 PowerHA/XD SVC Global Mirror

PowerHA/XD 5.5 introduced integrated support for SVC Global Mirror. It provides the same automated support features as PowerHA/XD does for Metro Mirror.

The SVC maintains separate copies of the application data on two separate back-end storage subsystems. Several virtual disks are replicated by the master SVC cluster from the primary site to the standby site. When a node fails and another node is available within the local site, traditional PowerHA fallover applies.

However, if a site failure occurs, all highly available applications are restarted at the standby site using data copy on secondary volumes. Under normal operation, the application is active on a server at the production site, and all updates to the application data are automatically replicated to the backup disk subsystem by the SVC copy services framework. Copy services protects the backup copy from inadvertent modification. When a total site failure occurs, the application is restarted on a backup server at the remote site.

For more details and example of PowerHA/XD and SVC refer to Chapter 16, “PowerHA/XD and SVC copy services” on page 673.

14.4 PowerHA GLVM

PowerHA/XD Geographic Logical Volume Manager (GLVM) is a high availability function that can mirror data across a standard IP network and provide automated fallover/fallback support for the applications using the geographically mirrored data.

GLVM performs the remote mirroring of AIX logical volumes using AIX's native Logical Volume Manager (LVM) functions for optimal performance and ease of configuration and maintenance. It offers both synchronous and, starting in PowerHA/XD 5.5 SP1, asynchronous replication. Figure 14-2 shows a GLVM example.

For more information about GLVM, consult Chapter 17, “GLVM concepts and configuration” on page 697.

Figure 14-2 GLVM example

14.5 Locating additional information

Refer to these references for more information:

- ILM Library: Information Lifecycle Management Best Practices Guide, SG24-7251.

- SAN Volume Controller V4.3.0 Advanced Copy Services, SG24-7574.

- IBM System Storage DS8000: Copy Services in Open Environments, SG24-6788.

- HACMP/XD Metro Mirror: Planning and Administration Guide, SC23-4863.

- HACMP/XD Geographic LVM: Planning and administration guide, SC23-1338.

The following Web sites also contain useful information:

- GLVM white papers

http://www-03.ibm.com/systems/power/software/aix/whitepapers/aix_glvm.html
http://www-03.ibm.com/systems/p/software/whitepapers/hacmp_xd_glvm.html

- HA PPRC white paper

http://www-03.ibm.com/systems/p/software/whitepapers/hacmp_pprc.html

Chapter 15. PowerHA with cross-site LVM mirroring

This chapter describes a disaster recovery solution, based on AIX LVM mirroring and a basic PowerHA cluster. It is built from the same components generally used for local cluster solutions with SAN-attached storage. Cross-site LVM mirroring replicates data between the disk subsystems at separate sites.

We cover the following topics:

- Cross-site LVM mirroring introduction
- Infrastructure considerations
- Configuring cross-site LVM mirroring
- Testing cross-site LVM mirroring
- Maintaining cross-site LVM

15.1 Cross-site LVM mirroring introduction

A storage area network (SAN) is a high-speed network that allows the establishment of direct connections between storage devices and servers. The maximum distance for the connections is defined by Fibre Channel limitations. This allows for two or more servers, located in different sites to access the same physical disks.

These remote disks can be combined into a volume group via the AIX Logical Volume Manager (LVM), and this volume group can be imported to the nodes located at different sites. You can create logical volumes and set up an LVM mirror with a copy at each site. The number of active sites supported for cross-site LVM mirroring in PowerHA is limited to two.

15.1.1 Comparison

The main difference between general local clusters and cluster solutions with cross-site mirroring is as follows:

- Within local clusters, all nodes and storage subsystems are located in the same location.

- With cross-site mirroring, cluster nodes and storage subsystems reside in different locations. Each site has at least one cluster node and one storage subsystem with all necessary IP network and SAN infrastructure.

This solution offers automation of AIX LVM mirroring within SAN disk subsystems between different sites. It also provides automatic LVM mirroring synchronization and disk device activation when, after a disk or site failure, a node or disk becomes available.

Each node in a cross-site LVM cluster accesses all storage subsystems. The data availability is ensured through the LVM mirroring between the volumes residing on different storage subsystems on different sites.

Figure 15-1 on page 645 explains the two-site LVM cross-mirroring environment that we used for our cross-site LVM mirroring test.

In case of site failure, PowerHA performs a takeover of the resources to the secondary site according to the cluster policy configuration. It activates all defined volume groups from the surviving mirrored copy. In case one storage subsystem fails, data access is not interrupted and applications can access data from the active mirror copy on the surviving disk subsystem.

PowerHA drives automatic LVM mirroring synchronization, and after the failed site joins the cluster, it automatically fixes removed and missing volumes (PV states removed and missing) and synchronizes data. Automatic synchronization is not possible for all cases, but you can use C-SPOC to synchronize the data from the surviving mirrors to stale mirrors after a disk or site failure.

15.1.2 Requirements

The following requirements are necessary to assure data integrity and an appropriate PowerHA reaction in case of a site or disk subsystem failure (a command-line sketch follows the list):

- The force varyon attribute for the resource group must be set to true.

- The logical volume allocation policy must be set to superstrict (ensuring that LV copies are allocated on different volumes, and that the primary and secondary copy of each LP are allocated on disks located in different sites).

- The LV mirror copies must be allocated on separate volumes that reside on different disk subsystems (at different sites).
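As a sketch of what those allocation rules look like on the command line, the following mklv example creates a two-copy, superstrict logical volume with each copy confined to one disk. The volume group, logical volume, and disk names are illustrative placeholders; each hdisk is assumed to come from a different site's storage subsystem:

# Two copies (-c 2), superstrict allocation (-s s), at most one PV per copy (-u 1);
# hdisk2 is assumed to be on the site A subsystem and hdisk6 on the site B subsystem
mklv -y xsitelv -t jfs2 -c 2 -s s -u 1 xsite_vg 100 hdisk2 hdisk6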

When increasing the size of a mirrored file system, you must ensure that the new logical partitions are allocated on different volumes and different disk subsystems according to the requirements above. To do this, first increase the logical volume with the appropriate volume selections, and then increase the file system, preferably using C-SPOC (in which case PowerHA enforces this).
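Outside of C-SPOC, the equivalent manual sequence might resemble the following sketch; the logical volume, file system, disk names, and sizes are hypothetical:

# Grow the LV by 10 LPs, naming the disks explicitly so each new copy
# lands on the intended subsystem (one disk per site, as above)
extendlv xsitelv 10 hdisk2 hdisk6

# Then grow the file system into the new space
chfs -a size=+1G /xsitefs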

Before configuring a cross-site LVM mirroring environment, check the following prerequisites:

� Configure the sites and resource groups and run the PowerHA cluster discovery process.

� Ensure that both sites have copies of the logical volumes and that the forced varyon attribute for a volume group is set to true if an existing resource group already contains a volume group.

15.2 Infrastructure considerations

Here we describe some considerations regarding the SAN setup, Fibre Channel connections, and LAN environment. The considerations and limitations are based on the technologies and protocols used for a cross-site mirroring cluster implementation.

The SAN network can be expanded beyond the original site by way of advanced technologies.

Here are examples of the kinds of technology that could be used for expansion. This list is not exhaustive:

� FCIP router

� Wavelength division multiplexing (WDM) devices. This technology includes:

– CWDM (Coarse Wavelength Division Multiplexing), the less expensive of the WDM technologies.

– DWDM (Dense Wavelength Division Multiplexing).

Of course, the infrastructure should be resilient to failures by providing redundant components. The SAN interconnect must also provide sufficient bandwidth to allow adequate performance for the synchronous mirroring that LVM provides. Also consult the white paper, Understanding the Performance Implications of Cross-Site Mirroring with AIX’s Logical Volume Manager, which is available online at:

http://www.ibm.com/support/docview.wss?uid=tss1wp101269

15.3 Configuring cross-site LVM mirroring

In this section we show an example of how to set up and configure a cross-site LVM mirroring environment. In general, you can build the cross-site LVM mirroring configuration as a new cluster implementation. You can also change an existing local cluster by adding site dependencies and cross-site LVM features to the cluster configuration and integrate it into the cross-site environment.

15.3.1 Configuring the cross-site LVM cluster

For our cross-site LVM mirroring test, we configure a new cluster environment by using the Extended Configuration SMIT menu. We first define the cluster topology, including the cluster definition, nodes, networks, network interfaces, and non-IP heartbeat-over-disk paths, as for a normal cluster environment. Our nodes are named “jordan” and “jessica”: jordan resides on the site named Dallas, while jessica resides on the site named FtWorth (see Figure 15-1).

An important part of a cross-site environment is the communication paths between the nodes at both sites. The communication between the nodes consists of both IP and non-IP connections. A non-IP connection is very important for cross-site cluster solutions to prevent node or site isolation (“split brain”). We configured the non-IP heartbeat using the heartbeat over disk feature.

Following the general recommendation for non-IP disk heartbeat networks (see 3.8, “Planning cluster networks” on page 131), we define two non-IP heartbeat networks. The first network uses disk devices on a DS4400, which resides at the FtWorth site. The second network uses disk devices on a DS4300, which resides at the Dallas site. This redundancy keeps the cluster non-IP heartbeats alive if one disk subsystem fails.
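
Before relying on the disk heartbeat networks, it is worth confirming that the nodes can actually exchange heartbeats over the chosen disks. One way to do this (the disk names here are illustrative and differ per node, and the exact output wording can vary by RSCT level) is the dhb_read test utility:

   # On one node, put the heartbeat disk in receive mode
   /usr/sbin/rsct/bin/dhb_read -p hdisk13 -r

   # On the other node, transmit over the same shared disk
   /usr/sbin/rsct/bin/dhb_read -p hdisk5 -t

A message such as “Link operating normally” on both nodes indicates that the disk heartbeat path works.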

Figure 15-1 shows our test environment for cross-site LVM mirroring cluster testing. The following sections describe the detailed steps for configuring the cross-site LVM specific parts of the cluster configuration.

Figure 15-1 LVM cross-mirrored cluster testing environment

15.3.2 Configuring cluster sites

You should use the PowerHA site definitions when you configure cross-site LVM mirroring or any of the PowerHA/XD components.

We configure the sites and add them to the cluster configuration using the SMIT menus. We run:

� smitty hacmp → Extended Configuration → Extended Topology Configuration → Configure HACMP Sites → Add a Site.

We add the two sites FtWorth and Dallas. Node jordan is part of the Dallas site, while node jessica is part of the FtWorth site. Figure 15-2 shows the SMIT menu for site creation.

Figure 15-2 Adding a site in the topology configuration

The required fields to fill in for the definition of a site are as follows:

� Site Name: Define a name of a site, using no more than 64 alphanumeric and underscore characters.

� Site Nodes: For each site we define at least one node residing within the site. You can add multiple nodes by leaving a blank space between the names. Each node can only belong to one site.

15.3.3 Configuring cross-site LVM mirroring site dependencies

After we define the cluster topology with sites, we assign the specific disk devices to the appropriate site. The storage at the FtWorth site is a DS4400, while the storage at the Dallas site is a DS4300. The storage is accessed through dual Virtual I/O Servers, each with redundant HBAs. The SAN zoning configuration allows one HBA from each Virtual I/O Server access to each storage unit. This provides redundancy but also logically separates the HBAs, so that both storage units are never visible from the same HBA. This allows each server to access each storage unit through separate HBAs for better performance.

On node jessica we plan on using hdisk10-hdisk13 for the site Dallas disks, and hdisk15-hdisk18 for site FtWorth disks. On node jordan the corresponding disks are hdisk2-hdisk5 for Dallas and hdisk7-hdisk10 for FtWorth. Example 15-1 shows the disk list on both nodes jordan and jessica.

Add Site

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Site Name                                          [FtWorth]            +
* Site Nodes                                         jessica              +

F1=Help       F2=Refresh      F3=Cancel       F4=List
F5=Reset      F6=Command      F7=Edit         F8=Image
F9=Shell      F10=Exit        Enter=Do

Note: The options, Dominance and Backup Communications, have been removed in PowerHA 5.5.

Example 15-1 lsdev -Cc disk and lspv output on both nodes

jessica /# lspv
hdisk0          00cee73e15c90def    rootvg          active
hdisk2          00cee73eab68f10e    aprilvg
hdisk1          00cee73e15f95516    zoevg
hdisk3          00cee73eab68f1dd    princessvg
hdisk4          00cee73eab68f2ad    shadowvg
hdisk5          00cee73eab68f375    UTvg            active
hdisk6          00cee73ea7ceece4    UTvg            active
hdisk7          00cee73eab68f515    None
hdisk8          00cee73eab68f5e6    None
hdisk9          00cee73eab68f6bb    None
hdisk10         00cee73e491cebda    None
hdisk11         00cee73e491d04c5    None
hdisk12         00cee73e491d3e3f    None
hdisk13         00cee73e38b5823e    None
hdisk14         00cee73e4a3c5773    rootvg          active
hdisk15         00cee73e307c8752    None
hdisk16         00cee73e307ca67e    None
hdisk17         00cee73e307cbff9    None
hdisk18         00cee73e307e6eb8    None

jessica /# lsdev -Cc disk
hdisk0  Available           Virtual SCSI Disk Drive
hdisk1  Available           Virtual SCSI Disk Drive
hdisk2  Available           Virtual SCSI Disk Drive
hdisk3  Available           Virtual SCSI Disk Drive
hdisk4  Available           Virtual SCSI Disk Drive
hdisk5  Available           Virtual SCSI Disk Drive
hdisk6  Available           Virtual SCSI Disk Drive
hdisk7  Available           Virtual SCSI Disk Drive
hdisk8  Available           Virtual SCSI Disk Drive
hdisk9  Available           Virtual SCSI Disk Drive
hdisk10 Available 01-08-02  1722-600 (600) Disk Array Device
hdisk11 Available 01-08-02  1722-600 (600) Disk Array Device
hdisk12 Available 01-08-02  1722-600 (600) Disk Array Device
hdisk13 Available 01-08-02  1722-600 (600) Disk Array Device
hdisk14 Available 01-08-02  1742 (700) Disk Array Device
hdisk15 Available 01-08-02  1742 (700) Disk Array Device
hdisk16 Available 01-08-02  1742 (700) Disk Array Device
hdisk17 Available 01-08-02  1742 (700) Disk Array Device
hdisk18 Available 01-08-02  1742 (700) Disk Array Device

jordan /# lspv
hdisk0          00cee73e1a0a09a6    rootvg          active
hdisk1          00cee73e1a459de1    rootvg          active
hdisk2          00cee73e491cebda    None
hdisk3          00cee73e491d04c5    None
hdisk4          00cee73e491d3e3f    None
hdisk5          00cee73e38b5823e    None
hdisk6          00cee73e4a3b9ad8    None
hdisk7          00cee73e307c8752    None
hdisk8          00cee73e307ca67e    None
hdisk9          00cee73e307cbff9    None
hdisk10         00cee73e307e6eb8    None

jordan /# lsdev -Cc disk
hdisk0  Available           Virtual SCSI Disk Drive
hdisk1  Available           Virtual SCSI Disk Drive
hdisk2  Available 02-08-02  1722-600 (600) Disk Array Device
hdisk3  Available 02-08-02  1722-600 (600) Disk Array Device
hdisk4  Available 02-08-02  1722-600 (600) Disk Array Device
hdisk5  Available 02-08-02  1722-600 (600) Disk Array Device
hdisk6  Available 02-08-02  1742 (700) Disk Array Device
hdisk7  Available 02-08-02  1742 (700) Disk Array Device
hdisk8  Available 02-08-02  1742 (700) Disk Array Device
hdisk9  Available 02-08-02  1742 (700) Disk Array Device
hdisk10 Available 02-08-02  1742 (700) Disk Array Device

Note: All disks should have PVIDs assigned before running the menu selection, Discover HACMP-related Information from Configured Nodes, in order to have complete disk information stored into the Disk Discovery File. You can do this with the chdev -l hdiskX -a pv=yes command on one node, and then remove and reconfigure the disk devices on the other nodes.
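
As an illustration of the note above (the hdisk numbers are examples from our environment and differ per node), the following sequence assigns a PVID on one node and makes another node pick it up:

   # On node jordan: write a PVID to the shared disk
   chdev -l hdisk2 -a pv=yes

   # On node jessica: remove and rediscover the corresponding device
   # so that it rereads the PVID from the disk
   rmdev -dl hdisk10
   cfgmgr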

We then run cluster discovery by running:

1. smitty hacmp → Extended Configuration → Discover HACMP-related Information from Configured Nodes and press Enter.

2. After completing cluster discovery, use SMIT to assign the site/disk dependencies by running:

smitty hacmp → System Management (C-SPOC) → HACMP Physical Volume Management → Configure Disk/Site Locations for Cross-Site LVM Mirroring → Add Disk/Site Definition for Cross-Site LVM Mirroring

You can also use the SMIT fast path smitty cl_xslvmm to access the Configure Disk/Site Locations for Cross-Site LVM Mirroring menu directly.

3. After we use the Add Disk/Site Definition for Cross-Site LVM Mirroring menu selection, we first select the site for our definition, as shown in Example 15-2. In our case, we chose the Dallas site first. After that, we select the disks that reside at the Dallas site, as shown in Example 15-3.

Example 15-2 Site selection in the Add Site/Disk Definition SMIT menu

¦ Move cursor to desired item and press Enter. ¦

¦ Dallas ¦ FtWorth

Example 15-3 Disk selection in the Add Site/Disk Definition SMIT menu

Add Disk/Site Definition for Cross-Site LVM Mirroring

Ty+--------------------------------------------------------------------------+Pr¦ Disks PVID ¦ ¦ ¦ ¦ Move cursor to desired item and press F7. ¦* ¦ ONE OR MORE items can be selected. ¦* ¦ Press Enter AFTER making all selections. ¦ ¦ ¦ ¦ [TOP] ¦ ¦ > 00cee73e491cebda hdisk2 jordan ¦ ¦ > 00cee73e491d04c5 hdisk3 jordan ¦ ¦ > 00cee73e491d3e3f hdisk4 jordan ¦ ¦ > 00cee73e38b5823e hdisk5 jordan ¦ ¦ > 00cee73e491cebda hdisk10 jessica ¦ ¦ > 00cee73e491d04c5 hdisk11 jessica ¦ ¦ > 00cee73e491d3e3f hdisk12 jessica ¦ ¦ > 00cee73e38b5823e hdisk13 jessica ¦ ¦ ¦ ¦ [MORE...8] ¦ ¦ ¦ ¦ Esc+1=Help Esc+2=Refresh Esc+3=Cancel ¦Es¦ F7=Select F8=Image F10=Exit ¦Es¦ Enter=Do /=Find n=Find Next ¦F9+--------------------------------------------------------------------------+¦

Note: The menu selection, Configure Disk/Site Locations for Cross-Site LVM Mirroring, functions correctly only if the Disk Discovery File reflects the current disk configuration. We recommend running discovery before configuring the disk/site dependencies.

We then repeat this step for the FtWorth site and the corresponding disks as shown in Example 15-4.

Example 15-4 Add Disk/Site Definition for the second site

Add Disk/Site Definition for Cross-Site LVM Mirroring

Type or select values in entry fields.Press Enter AFTER making all desired changes. +--------------------------------------------------------------------------+ ¦ Disks PVID ¦* ¦ ¦* ¦ Move cursor to desired item and press F7. ¦ ¦ ONE OR MORE items can be selected. ¦ ¦ Press Enter AFTER making all selections. ¦ ¦ ¦ ¦ > 00cee73e307c8752 hdisk7 jordan ¦ ¦ > 00cee73e307ca67e hdisk8 jordan ¦ ¦ > 00cee73e307cbff9 hdisk9 jordan ¦ ¦ > 00cee73e307e6eb8 hdisk10 jordan ¦ ¦ > 00cee73e307c8752 hdisk15 jessica ¦ ¦ > 00cee73e307ca67e hdisk16 jessica ¦ ¦ > 00cee73e307cbff9 hdisk17 jessica ¦ ¦ > 00cee73e307e6eb8 hdisk18 jessica ¦ ¦ ¦ ¦ Esc+1=Help Esc+2=Refresh Esc+3=Cancel ¦Es¦ F7=Select F8=Image F10=Exit ¦Es¦ Enter=Do /=Find n=Find Next ¦F9+--------------------------------------------------------------------------+

You can change the disk/site dependency later by running smitty cl_xslvmm and selecting Change/Show Disk/Site Definition for Cross-Site LVM Mirroring. You can also remove the site and disk dependency later by running smitty cl_xslvmm and selecting Remove Disk/Site Definition for Cross-Site LVM Mirroring.

15.3.4 Configuring volume groups with cross-site LVM mirror

For our cross-site LVM mirroring test, we created two volume groups, dallasvg and ftworthvg, consisting of four disks each. Two disks per volume group are designated to each site to provide the desired site redundancy.

We created our volume groups using SMIT C-SPOC menus by running:

1. smitty cl_admin → HACMP Logical Volume Management → Shared Volume Groups → Create a Shared Volume Group: use F7 and select both nodes.

2. After selecting the participating nodes, we selected the appropriate disks by pressing F7 on each one, as shown in Example 15-5.

Example 15-5 Selecting the disks for first volume group creation

Shared Volume Groups

Mo+--------------------------------------------------------------------------+ ¦ Physical Volume Names ¦ ¦ ¦ ¦ Move cursor to desired item and press F7. ¦ ¦ ONE OR MORE items can be selected. ¦ ¦ Press Enter AFTER making all selections. ¦ ¦ ¦ ¦ [TOP] ¦ ¦ > 00cee73e307c8752 ( hdisk7 on node jordan at site FtWorth ) ¦ ¦ > 00cee73e307c8752 ( hdisk15 on node jessica at site FtWorth ) ¦ ¦ > 00cee73e307ca67e ( hdisk8 on node jordan at site FtWorth ) ¦ ¦ > 00cee73e307ca67e ( hdisk16 on node jessica at site FtWorth ) ¦ ¦ > 00cee73e491cebda ( hdisk2 on node jordan at site Dallas ) ¦ ¦ > 00cee73e491cebda ( hdisk10 on node jessica at site Dallas ) ¦ ¦ > 00cee73e491d04c5 ( hdisk3 on node jordan at site Dallas ) ¦ ¦ > 00cee73e491d04c5 ( hdisk11 on node jessica at site Dallas ) ¦ ¦ ¦ ¦ [MORE...8] ¦ ¦ ¦ ¦ Esc+1=Help Esc+2=Refresh Esc+3=Cancel ¦ ¦ F7=Select F8=Image F10=Exit ¦Es¦ Enter=Do /=Find n=Find Next ¦F9+--------------------------------------------------------------------------+

3. We then choose the volume group type of Big from the pop-up picklist menu, as shown in Example 15-6, and press Enter.

Example 15-6 Volume group type picklist

Shared Volume Groups

Move cursor to desired item and press Enter.

List All Shared Volume Groups Create a Shared Volume Group Create a Shared Volume Group with Data Path Devices Enable a Shared Volume Group for Fast Disk Takeover Set Characteristics of a Shared Volume Group Import a Shared Volume Group +--------------------------------------------------------------------------+ ¦ Volume Group Type ¦ ¦ ¦

¦ Move cursor to desired item and press Enter. ¦ ¦ ¦ ¦ Legacy ¦ ¦ Original ¦ ¦ Big ¦ ¦ Scalable ¦ ¦ ¦ ¦ Esc+1=Help Esc+2=Refresh Esc+3=Cancel ¦ ¦ F8=Image F10=Exit Enter=Do ¦Es¦ /=Find n=Find Next ¦F9+--------------------------------------------------------------------------+

4. Then we fill in all necessary fields and ensure that we set the Enable Cross-Site LVM Mirroring Verification option to true. Example 15-7 shows the SMIT C-SPOC volume group creation panel.

Example 15-7 Create cross-site LVM volume group

Create a Shared Big Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                   [Entry Fields]
  Node Names                                          jessica,jordan
  Resource Group Name                                 [testxlvm]              +
  PVID                                                00cee73e307c8752 00ce>
  VOLUME GROUP name                                   [dallasvg]
  Physical partition SIZE in megabytes                32                      +
  Volume group MAJOR NUMBER                           [43]                    #
  Enable Cross-Site LVM Mirroring Verification        true                    +
  Enable Volume Group for Fast Disk Takeover?         true                    +
  Volume Group Type                                   Big

  Warning: Changing the volume group major number may result
           in the command being unable to execute
[MORE...5]

Esc+1=Help    Esc+2=Refresh    Esc+3=Cancel    Esc+4=List
Esc+5=Reset   F6=Command       F7=Edit         F8=Image
F9=Shell      F10=Exit         Enter=Do

Note: The volume group type picklist was added in PowerHA 5.5 SP1.

Note: We discovered when creating a type Scalable volume group that the needed option of Enable Cross-Site LVM Mirroring Verification does not exist. To get around this, simply complete creating the volume group. Then go to smitty cl_vgsc → Enable/Disable a Shared Volume Group for Cross-Site LVM Mirroring Verification, then choose the appropriate resource group and volume group and set the option to true. This situation should be corrected in a future update.

5. We then repeated the previous step using the remaining four disks, as shown in Example 15-8. We created ftworthvg and also chose the option to create a second resource group, testxlvm2, as shown in Example 15-9.

Example 15-8 Selecting the disks for second volume group creation

+--------------------------------------------------------------------------+ ¦ Physical Volume Names ¦ ¦ ¦ ¦ Move cursor to desired item and press F7. ¦ ¦ ONE OR MORE items can be selected. ¦ ¦ Press Enter AFTER making all selections. ¦ ¦ ¦ ¦ [MORE...8] ¦ ¦ 00cee73e307cbff9 ( hdisk9 on node jordan at site FtWorth ) ¦ ¦ 00cee73e307cbff9 ( hdisk17 on node jessica at site FtWorth ) ¦ ¦ 00cee73e307e6eb8 ( hdisk10 on node jordan at site FtWorth ) ¦ ¦ 00cee73e307e6eb8 ( hdisk18 on node jessica at site FtWorth ) ¦ ¦ 00cee73e38b5823e ( hdisk5 on node jordan at site Dallas ) ¦ ¦ 00cee73e38b5823e ( hdisk13 on node jessica at site Dallas ) ¦ ¦ 00cee73e491d3e3f ( hdisk4 on node jordan at site Dallas ) ¦ ¦ 00cee73e491d3e3f ( hdisk12 on node jessica at site Dallas ) ¦ ¦ [BOTTOM] ¦ ¦ ¦ ¦ Esc+1=Help Esc+2=Refresh Esc+3=Cancel ¦ ¦ F7=Select F8=Image F10=Exit ¦Es¦ Enter=Do /=Find n=Find Next ¦F9+--------------------------------------------------------------------------+

Example 15-9 Create second cross-site lvm volume group

Create a Shared Big Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                   [Entry Fields]
  Node Names                                          jessica,jordan
  Resource Group Name                                 [testxlvm2]             +
  PVID                                                00cee73e307c8752 00ce>
  VOLUME GROUP name                                   [ftworthvg]
  Physical partition SIZE in megabytes                32                      +
  Volume group MAJOR NUMBER                           [43]                    #
  Enable Cross-Site LVM Mirroring Verification        true                    +
  Enable Volume Group for Fast Disk Takeover?         true                    +
  Volume Group Type                                   Big

  Warning: Changing the volume group major number may result
           in the command being unable to execute
[MORE...5]

Esc+1=Help    Esc+2=Refresh    Esc+3=Cancel    Esc+4=List
Esc+5=Reset   F6=Command       F7=Edit         F8=Image
F9=Shell      F10=Exit         Enter=Do

Note: The option, Resource Group Name, which was added in PowerHA 5.5, allows you to either select an existing resource group in which to put the volume group, or if you type in a new name, it will create a new resource group automatically.

Our volume groups are now present on both nodes as shown in Example 15-10.

Example 15-10 Lspv output on each node after volume group creation

jessica /# lspv
hdisk0          00cee73e15c90def    rootvg          active
hdisk2          00cee73eab68f10e    aprilvg
hdisk1          00cee73e15f95516    zoevg
hdisk3          00cee73eab68f1dd    princessvg
hdisk4          00cee73eab68f2ad    shadowvg
hdisk5          00cee73eab68f375    UTvg            active
hdisk6          00cee73ea7ceece4    UTvg            active
hdisk7          00cee73eab68f515    None
hdisk8          00cee73eab68f5e6    None
hdisk9          00cee73eab68f6bb    None
hdisk10         00cee73e491cebda    dallasvg
hdisk11         00cee73e491d04c5    dallasvg
hdisk12         00cee73e491d3e3f    ftworthvg
hdisk13         00cee73e38b5823e    ftworthvg
hdisk14         00cee73e4a3c5773    rootvg          active
hdisk15         00cee73e307c8752    dallasvg
hdisk16         00cee73e307ca67e    dallasvg
hdisk17         00cee73e307cbff9    ftworthvg
hdisk18         00cee73e307e6eb8    ftworthvg

jordan /# lspv
hdisk0          00cee73e1a0a09a6    rootvg          active
hdisk1          00cee73e1a459de1    rootvg          active
hdisk2          00cee73e491cebda    dallasvg
hdisk3          00cee73e491d04c5    dallasvg
hdisk4          00cee73e491d3e3f    ftworthvg
hdisk5          00cee73e38b5823e    ftworthvg
hdisk6          00cee73e4a3b9ad8    None
hdisk7          00cee73e307c8752    dallasvg
hdisk8          00cee73e307ca67e    dallasvg
hdisk9          00cee73e307cbff9    ftworthvg
hdisk10         00cee73e307e6eb8    ftworthvg

At this point, C-SPOC should continue to be used to create all additional logical volumes and file systems needed for the volume groups just created. You can find more information about this in 15.5, “Maintaining cross-site LVM” on page 661.

15.3.5 Resource groups and cross-site LVM mirroring

Currently our cluster consists of the topology, all LVM components, and two resource groups. Because we allowed C-SPOC to create the resource groups for us at the time of volume group creation, both currently have the lowest alphanumeric node, jessica, listed as the primary node. We changed the second resource group, testxlvm2, to have node jordan as primary and jessica as secondary. This gives us a mutual takeover style cluster environment. Our current resource group configuration is shown in Example 15-11.

Example 15-11 Resource group settings before change for cross-site LVM mirroring

Resource Group Name                                   testxlvm
Participating Node Name(s)                            jessica jordan
Startup Policy                                        Online On Home Node Only
Fallover Policy                                       Fallover To Next Priority Node In The List
Fallback Policy                                       Never Fallback
Site Relationship                                     ignore
Node Priority
Service IP Label
Filesystems                                           ALL
Filesystems Consistency Check                         fsck
Filesystems Recovery Method                           sequential
Filesystems/Directories to be exported (NFSv3)
Filesystems/Directories to be exported (NFSv4)
Filesystems to be NFS mounted
Network For NFS Mount
Filesystem/Directory for NFSv4 Stable Storage
Volume Groups                                         ftworthvg
Concurrent Volume Groups
Use forced varyon for volume groups, if necessary     false

Resource Group Name                                   testxlvm2
Participating Node Name(s)                            jordan jessica
Startup Policy                                        Online On Home Node Only
Fallover Policy                                       Fallover To Next Priority Node In The List
Fallback Policy                                       Never Fallback
Site Relationship                                     ignore
Node Priority
Service IP Label
Filesystems                                           ALL
Filesystems Consistency Check                         fsck
Filesystems Recovery Method                           sequential
Filesystems/Directories to be exported (NFSv3)
Filesystems/Directories to be exported (NFSv4)
Filesystems to be NFS mounted
Network For NFS Mount
Filesystem/Directory for NFSv4 Stable Storage
Volume Groups                                         dallasvg
Concurrent Volume Groups
Use forced varyon for volume groups, if necessary     false

There is an additional setting for resource groups where sites are defined. This setting is Inter-Site Management Policy. The possible selections for this feature are:

� Ignore. This is the default selection and ignores the site dependency settings for the resource group. This is the standard recommendation when using cross-site LVM mirroring.

� Prefer Primary Site. The resource group can be assigned to be taken over by multiple sites in a prioritized manner. When a site fails, the active site with the highest priority acquires the resource. When the failed site rejoins, the site with the highest priority acquires the resource.

� Online On Either Site. The resource group can be acquired by any site in its resource chain. When a site fails, the resource group is acquired by the highest priority standby site. When the failed site rejoins, the resource group remains with its new owner.

� Online On Both Sites. The resource group is acquired by both sites. This selection defines the concurrent capable resource group.

After checking the resource group policies we can now configure the resource group attributes like service IP label and the application server by running:

� smitty hacmp → Extended Configuration → HACMP Extended Resource Group Configuration → Change/Show Resources and Attributes for a Resource Group, choose the appropriate resource group, and press Enter.

The important parameter when adding a cross-site mirroring enabled volume group to the resource group is the field “Use forced varyon of volume groups, if necessary.” You must set this field to true for any cross-site LVM mirroring configuration. This ensures that, in case of a storage or site failure, the volume group can be brought online on the other node with only one (surviving) LV copy.
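
For illustration only, this is essentially what happens during takeover when only one mirror copy is reachable (you would not normally run this by hand on a cluster-managed volume group; the volume group name is from our test environment):

   # A forced varyon succeeds even though half of the disks, and
   # therefore half of the VGDA copies, are unavailable
   varyonvg -f ftworthvg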

For each resource group, we define an application server. The application server runs a test application that performs intensive writes on the file systems that are part of these resource groups. With this load we achieve a disk utilization between 70 and 100 percent.

Our resource groups are now configured as shown in Example 15-12. The output has been manipulated to show only the fields that are relevant.

Example 15-12 Resource groups after setting forced varyon to true

Resource Group Name                                   testxlvm
Participating Node Name(s)                            jessica jordan
Startup Policy                                        Online On Home Node Only
Fallover Policy                                       Fallover To Next Priority Node In The List
Fallback Policy                                       Never Fallback
Site Relationship                                     ignore
Node Priority
Service IP Label                                      jessica_svc
Filesystems                                           ALL
Filesystems Consistency Check                         fsck
Filesystems Recovery Method                           sequential
Volume Groups                                         ftworthvg
Use forced varyon for volume groups, if necessary     true
Application Servers                                   testfwio

Resource Group Name                                   testxlvm2
Participating Node Name(s)                            jordan jessica
Startup Policy                                        Online On Home Node Only
Fallover Policy                                       Fallover To Next Priority Node In The List
Fallback Policy                                       Never Fallback
Site Relationship                                     ignore
Node Priority
Service IP Label                                      jordan_svc
Filesystems                                           ALL
Filesystems Consistency Check                         fsck
Filesystems Recovery Method                           sequential
Volume Groups                                         dallasvg
Use forced varyon for volume groups, if necessary     true
Application Servers                                   testdallasio

Note: In case of a site failure, we would lose one storage subsystem, hence 50% of the VGDA copies; therefore, the varyonvg command can be successful only if issued with the force (-f) flag.

After we set up the cluster environment, we activate the automatic error notification. We run:

� smitty hacmp → Problem Determination Tools → HACMP Error Notification → Configure Automatic Error Notification → Add Error Notify Methods for Cluster Resources.

You can find more information about error notification in 11.4, “Automatic error notification” on page 566.

15.4 Testing cross-site LVM mirroring

After we configure the cluster topology and resources, we synchronize and verify the cluster.

15.4.1 Verifying the cluster

We start the cluster and check that all resources and communication paths are active. The cldump utility shows information about the cluster nodes and network interfaces, as well as the resource group status (including the resource group policies for each resource group).
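
For example, we typically check the cluster and resource group state with commands such as the following (paths assume a default PowerHA installation):

   # Cluster, node, network, and interface state
   /usr/es/sbin/cluster/utilities/cldump

   # Resource group status and current location
   /usr/es/sbin/cluster/utilities/clRGinfo -v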

15.4.2 Tested scenarios

Next we describe various scenarios for this feature.

Stop cluster with move resource groups

First we test by stopping the cluster with the Move Resource Groups option on node jessica. The cluster services on the jessica node stop, and the resource group testxlvm activates on the Dallas site node jordan, as expected.

Move one resource group to the other site

After this test, we restart cluster services on node jessica and move the resource group back via C-SPOC. All resources for testxlvm became available on the jessica node and the applications became active.

One storage subsystem failure

For the following tests, we simulate a storage failure. We test the storage failure while the test applications are active and an extensive load exists on the disks. The utilization of the disks is near 100% for all test file systems. We simulated two different types of storage failure.

For the first test, we disabled the SAN ports to the DS4400 storage unit hosting a copy of our cross-site mirroring enabled volume groups. This simulates a total storage loss by isolation. For the second test, after getting the data back in sync, we physically disconnect the fibre cables from the Virtual I/O Server that was connected to the DS4300.

In both cases, the applications continue to work without interruption, and the volume groups and file systems remain available. After the failure, we check the availability of the disks and the status of the logical volume copy synchronization. The disks from the failed storage unit are marked as missing and the logical volume status as stale. Example 15-13 shows the output for the ftworthvg VG when one of the disk subsystems is not available.

Note in the example that 50% of the volume group disks are missing.

Example 15-13 hdisk and logical volume status after one storage subsystem failure

jessica > lsvg -p ftworthvg
ftworthvg:
PV_NAME     PV STATE    TOTAL PPs    FREE PPs    FREE DISTRIBUTION
hdisk12     missing     639          58          00..00..00..00..58
hdisk13     missing     639          58          23..08..00..00..27
hdisk17     active      639          58          15..14..14..15..15
hdisk18     active      639          58          15..00..09..15..15

jessica > lsvg -l ftworthvg
ftworthvg:
LV NAME      TYPE      LPs   PPs   PVs   LV STATE      MOUNT POINT
fwlog2lv     jfs2log   1     2     2     open/stale    N/A
fwtest1lv    jfs2      30    60    2     open/stale    /fw1fs

The applications continue to work with the remaining logical volume copy. After this test, we rezone to make the DS4400 storage available again. We used the C-SPOC option Synchronize Shared LVM Mirrors and it automatically makes all hdisk devices available and synchronizes all logical volumes. We run:

� smitty cl_admin → HACMP Logical Volume Management → Synchronize Shared LVM Mirrors → Synchronize by Volume Group, then select the appropriate VG.
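
Conceptually, this C-SPOC option is comparable to running the following AIX commands against the shared volume group (shown only as a sketch; C-SPOC is the recommended path because it performs the operation on the node that owns the volume group):

   # Re-run varyonvg so that previously missing disks return to the active state
   varyonvg ftworthvg

   # Synchronize the stale logical partitions from the surviving copy
   syncvg -v ftworthvg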

We verified the disk availability and the status of the logical volume copy synchronization. All disks in all volume groups are available and all logical volumes are in syncd status. Example 15-14 shows the status of the ftworthvg volume group after the storage reintegration.

Example 15-14 hdisk and logical volume status after the storage reintegration

jessica > lsvg -p ftworthvg
ftworthvg:
PV_NAME     PV STATE    TOTAL PPs    FREE PPs    FREE DISTRIBUTION
hdisk12     active      639          58          00..00..00..00..58
hdisk13     active      639          58          23..08..00..00..27
hdisk17     active      639          58          15..14..14..15..15
hdisk18     active      639          58          15..00..09..15..15

jessica > lsvg -l ftworthvg
ftworthvg:
LV NAME      TYPE      LPs   PPs   PVs   LV STATE      MOUNT POINT
fwlog2lv     jfs2log   1     2     2     open/syncd    N/A
fwtest1lv    jfs2      30    60    2     open/syncd    /fw1fs

Failure of all disk connections on one site

For the following test, we disconnect both Fibre Channel connections on the jessica node at the FtWorth site. After some delay (a couple of minutes), the cluster detects the storage failure for all file systems on the shared volume group and moves the resource group to the other site, Dallas. The cluster activates the testxlvm resource group on the jordan node. All resources are available on node jordan: the ftworthvg volume group is activated, the file systems are mounted, and the application is running. In this case, both storage units are still available to the jordan node, so no forced varyon of the volume group is required and the data does not go stale.

Site failure

For the following test, we simulate a FtWorth site failure by simultaneously crashing the jessica node and the DS4400 disk subsystem Fibre Channel connections. The cluster detects the site failure and activates the resources on the jordan node. Unlike the previous test scenario, because the DS4400 is unreachable, the takeover has to force varyon the volume group. All resources are available on node jordan, and the volume groups belonging to testxlvm and testxlvm2 are activated with the surviving disk copy (on the DS4300 storage).

After this test, we reconnected the DS4400 disk subsystem so that the DS4400 disk resources are available again. We used the C-SPOC option Synchronize Shared LVM Mirrors, which automatically makes all hdisk devices available and synchronizes all logical volumes. We run:

� smitty cl_admin → HACMP Logical Volume Management → Synchronize Shared LVM Mirrors → Synchronize by Volume Group, then select the appropriate VG.

After verifying that the data is back in sync, we perform a resource group move back to node jessica at the FtWorth site. The resources became available on node jessica and the application is started.

15.5 Maintaining cross-site LVM

It is just as important to properly maintain the configuration as it is to set it up properly from the start. In general, you should use C-SPOC to maintain the storage space allocation in your environment. However, it is very important to know exactly how to do it correctly because parts of C-SPOC are not completely cross-site aware.

The most common tasks to perform are:

� Creating a new volume group
� Adding volumes into an existing volume group
� Adding new logical volumes
� Adding additional space to an existing logical volume
� Adding a file system
� Increasing the size of a file system

Creating a new volume group

When creating a new volume group, refer to the initial configuration steps in 15.3.4, “Configuring volume groups with cross-site LVM mirror” on page 650.

Adding volumes into an existing volume group

When adding disks into an existing volume group, it is important to add them in pairs, one for each site. Make sure that the PVIDs of the disks are known to each system, run the discovery process, and define the disks to their appropriate sites as described in 15.3.3, “Configuring cross-site LVM mirroring site dependencies” on page 646.

Tip: The cluster test tool can also be utilized for site testing.

To add volumes/disks into an existing volume group, run:

� smitty cl_vgsc → Add a Volume to a Shared Volume Group, press Enter, select the appropriate resource group and volume group, and press Enter.

You will then be presented with a list of volumes to choose from. Make sure that you choose one from each site as shown in Figure 15-3.

Figure 15-3 Add a volume to a shared volume group

Upon choosing the appropriate disks, verify the fields in the final menu and press Enter to complete the addition.
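
Conceptually, the addition drives a standard LVM extend operation similar to the following sketch (the volume group and disk names are placeholders; use C-SPOC so that all nodes and the disk/site definitions stay in step):

   # Add one disk from each site to keep the volume group balanced,
   # where hdiskX resides at one site and hdiskY at the other
   extendvg sharedvg hdiskX hdiskY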

Adding new logical volumes

When adding a new logical volume, it is important to create the logical volume copies on separate disks, one specific to each site. It is also crucial to set the allocation policy to superstrict, as shown in Figure 15-5.

Set Characteristics of a Shared Volume Group

Move cursor to desired item and press Enter.

Add a Volume to a Shared Volume Group Change/Show characteristics of a Shared Volume Group Remove a Volume from a Shared Volume Group Enable/Disable a Shared Volume Group for Cross-Site LVM Mirroring Verification +--------------------------------------------------------------------------+ ¦ Physical Volume Names ¦ ¦ ¦ ¦ Move cursor to desired item and press F7 ¦ ¦ ONE OR MORE items can be selected. ¦ ¦ Press Enter AFTER making all selections. ¦ ¦ ¦ ¦ > jordan hdisk9 Dallas ¦ ¦ > jordan hdisk10 FtWorth ¦ ¦ jordan hdisk8 FtWorth ¦ ¦ jordan hdisk3 Dallas ¦ ¦ ¦ ¦ Esc+1=Help Esc+2=Refresh Esc+3=Cancel ¦ ¦ F7=Select F8=Image F10=Exit ¦Es¦ Enter=Do /=Find n=Find Next ¦F9+------------------------------------------------------------------------+

To add a new logical volume, run:

� smitty cl_lv → Add a Shared Logical Volume, press Enter, select the appropriate resource group and volume group, and press Enter.

Then choose the appropriate disks, one from each site, using F7 as shown in Figure 15-4.

Figure 15-4 Select physical volumes to add logical volumes on

Shared Logical Volumes

Move cursor to desired item and press Enter.

List All Shared Logical Volumes by Volume Group Add a Shared Logical Volume Set Characteristics of a Shared Logical Volume Show Characteristics of a Shared Logical Volume Change a Shared Logical Volume +--------------------------------------------------------------------------+ ¦ Physical Volume Names ¦ ¦ ¦ ¦ Move cursor to desired item and press F7. ¦ ¦ ONE OR MORE items can be selected. ¦ ¦ Press Enter AFTER making all selections. ¦ ¦ ¦ ¦ Auto-select ¦ ¦ > jordan hdisk9 FtWorth ¦ ¦ > jordan hdisk10 Dallas ¦ ¦ ¦ ¦ Esc+1=Help Esc+2=Refresh Esc+3=Cancel ¦ ¦ F7=Select F8=Image F10=Exit ¦Es¦ Enter=Do /=Find n=Find Next ¦F9+------------------------------------------------------------------------+

Important: Do not use the Auto-select option.

Upon pressing Enter, you will be presented with the final menu to create the new shared logical volume (Figure 15-5). Choose the appropriate unique name, type, and size of the logical volume. Keep the RANGE of physical volumes set to minimum. Also specify two copies and set the Allocate each logical partition copy on a SEPARATE physical volume? to superstrict. Repeat as needed for each logical volume that needs to be created.

Figure 15-5 Add a shared logical volume with superstrict allocation policy

Add a Shared Logical VolumeType or select values in entry fields.Press Enter AFTER making all desired changes.

[TOP] [Entry Fields] Resource Group Name testsiteip VOLUME GROUP name xsite_vg Reference node jordan* Number of LOGICAL PARTITIONS [20] # PHYSICAL VOLUME names hdisk9 hdisk10 Logical volume NAME [xsitelv] Logical volume TYPE [jfs2] + POSITION on physical volume middle + RANGE of physical volumes minimum + MAXIMUM NUMBER of PHYSICAL VOLUMES [] # to use for allocation Number of COPIES of each logical 2 + partition Mirror Write Consistency? active + Allocate each logical partition copy superstrict + on a SEPARATE physical volume? RELOCATE the logical volume during reorganization? yes + Logical volume LABEL [] MAXIMUM NUMBER of LOGICAL PARTITIONS [512] # Enable BAD BLOCK relocation? yes + SCHEDULING POLICY for reading/writing parallel + logical partition copies Enable WRITE VERIFY? no + File containing ALLOCATION MAP [] / Stripe Size? [Not Striped] + Serialize I/O? no + Make first block available for applications? no +
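
For reference, the equivalent raw mklv command would look roughly like the following (a sketch assuming a two-copy, superstrict layout; in a cluster, let C-SPOC issue the command so the definition reaches every node):

   # Two copies (-c 2), superstrict allocation (-s s), each copy confined
   # to one physical volume (-u 1), spread across the two named disks
   mklv -y xsitelv -t jfs2 -c 2 -s s -u 1 xsite_vg 20 hdisk9 hdisk10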

You can verify that the copies have been created correctly by viewing the mapping of the partitions for each logical volume. Use the lslv -m lvname command as shown in Figure 15-6.

Figure 15-6 Mirrored logical volume partition mapping

jordan /# lslv -m xsitelv
xsitelv:/xsitefs
LP    PP1  PV1        PP2  PV2        PP3  PV3
0001  0104 hdisk9     0104 hdisk10
0002  0105 hdisk9     0105 hdisk10
0003  0106 hdisk9     0106 hdisk10
0004  0107 hdisk9     0107 hdisk10
0005  0108 hdisk9     0108 hdisk10
0006  0109 hdisk9     0109 hdisk10
0007  0110 hdisk9     0110 hdisk10
0008  0111 hdisk9     0111 hdisk10
0009  0112 hdisk9     0112 hdisk10
0010  0113 hdisk9     0113 hdisk10
0011  0115 hdisk9     0115 hdisk10
0012  0116 hdisk9     0116 hdisk10
0013  0117 hdisk9     0117 hdisk10
0014  0118 hdisk9     0118 hdisk10
0015  0119 hdisk9     0119 hdisk10
0016  0120 hdisk9     0120 hdisk10
0017  0121 hdisk9     0121 hdisk10
0018  0122 hdisk9     0122 hdisk10
0019  0123 hdisk9     0123 hdisk10
0020  0124 hdisk9     0124 hdisk10
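
An optional helper script can make this check less error prone. The hdisk-to-site mapping below is hypothetical and must be adjusted to your own disk/site definitions:

   # Flag any logical partition whose two copies landed on disks of the same site
   lslv -m xsitelv | awk '
     BEGIN {
       site["hdisk9"]  = "FtWorth"    # assumed mapping, adjust as needed
       site["hdisk10"] = "Dallas"
     }
     NR > 2 && ($3 in site) && ($5 in site) {
       if (site[$3] == site[$5])
         print "LP " $1 ": both copies are on site " site[$3]
     }'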

Adding additional space to an existing logical volume

Similar to creating a logical volume, it is important to allocate additional space properly to maintain the mirrored copies at each site. To add more space, run:

1. smitty cl_lvsc → Increase the Size of a Shared Logical Volume and press Enter.

2. Choose the appropriate logical volume from the pop-up list displayed, as shown in Figure 15-7. You will then be presented with a list of disks that belong to the same volume group as the logical volume previously chosen. The list is very similar to the one seen when creating a new logical volume, as shown in Figure 15-4. Choose the appropriate disks with F7 and press Enter.

Figure 15-7 Shared logical volume pop-up list

Important: Do not use the Auto-select option.

Set Characteristics of a Shared Logical Volume

Move cursor to desired item and press Enter.

Rename a Shared Logical Volume Increase the Size of a Shared Logical Volume Add a Copy to a Shared Logical Volume Remove a Copy from a Shared Logical Volume

+--------------------------------------------------------------------------+ ¦ Shared Logical Volume Names ¦ ¦ ¦ ¦ Move cursor to desired item and press Enter. ¦ ¦ ¦ ¦ testsiteip cspocmklvtest ¦ ¦ testsiteip loglv02 ¦ ¦ testsiteip xsitelv ¦ ¦ ¦ ¦ Esc+1=Help Esc+2=Refresh Esc+3=Cancel ¦ ¦ F8=Image F10=Exit Enter=Do ¦Es¦ /=Find n=Find Next ¦F9+------------------------------------------------------------------------+

3. After selecting the target disks, you will be presented with the final Increase the Size of a Shared Logical Volume menu, as shown in Figure 15-8. Be sure to keep the RANGE of physical volumes set to minimum and the Allocate each logical partition copy on a SEPARATE physical volume? field set to superstrict; these should already be set correctly if the logical volume was originally created properly.

Figure 15-8 Increase the size of a shared logical volume

Increase the Size of a Shared Logical Volume

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Resource Group Name                                 testsiteip
  LOGICAL VOLUME name                                 xsitelv
  Reference node                                      jordan
* Number of ADDITIONAL logical partitions             [10]                 #
  PHYSICAL VOLUME names                               hdisk9 hdisk10
  POSITION on physical volume                         outer_middle         +
  RANGE of physical volumes                           minimum              +
  MAXIMUM NUMBER of PHYSICAL VOLUMES                  []                   #
    to use for allocation
  Allocate each logical partition copy                superstrict          +
    on a SEPARATE physical volume?
  File containing ALLOCATION MAP                      []

4. After adding the additional space, verify that the partition mapping is correct by running the lslv -m lvname command again. The last ten partitions (LPs 0021 through 0030) are the new partitions just added to the logical volume:

jordan /# lslv -m xsitelv
xsitelv:/xsitefs
LP    PP1  PV1        PP2  PV2        PP3  PV3
0001  0104 hdisk9     0104 hdisk10
0002  0105 hdisk9     0105 hdisk10
0003  0106 hdisk9     0106 hdisk10
0004  0107 hdisk9     0107 hdisk10
0005  0108 hdisk9     0108 hdisk10
0006  0109 hdisk9     0109 hdisk10
0007  0110 hdisk9     0110 hdisk10
0008  0111 hdisk9     0111 hdisk10
0009  0112 hdisk9     0112 hdisk10
0010  0113 hdisk9     0113 hdisk10
0011  0115 hdisk9     0115 hdisk10
0012  0116 hdisk9     0116 hdisk10
0013  0117 hdisk9     0117 hdisk10
0014  0118 hdisk9     0118 hdisk10
0015  0119 hdisk9     0119 hdisk10
0016  0120 hdisk9     0120 hdisk10
0017  0121 hdisk9     0121 hdisk10
0018  0122 hdisk9     0122 hdisk10
0019  0123 hdisk9     0123 hdisk10
0020  0124 hdisk9     0124 hdisk10
0021  0125 hdisk9     0125 hdisk10
0022  0126 hdisk9     0126 hdisk10
0023  0127 hdisk9     0127 hdisk10
0024  0128 hdisk9     0128 hdisk10
0025  0129 hdisk9     0129 hdisk10
0026  0130 hdisk9     0130 hdisk10
0027  0131 hdisk9     0131 hdisk10
0028  0132 hdisk9     0132 hdisk10
0029  0133 hdisk9     0133 hdisk10
0030  0134 hdisk9     0134 hdisk10

Adding a file system

C-SPOC can be used to add a file system in a cross-site LVM mirrored configuration. The key here is to always create the logical volume first to ensure proper mirroring. Then add the file system on the previously defined logical volume.

Important: Always add a file system by creating the logical volume first, then create the file system on the previously defined logical volume. The reason for this is that if you allow the logical volume to be created at the time of file system creation, the logical volume mirroring will not be created.

To add a new JFS2 file system, run:

1. smitty cl_whichfs → Enhanced Journaled File Systems → Add an Enhanced Journaled File System on a Previously Defined Logical Volume and press Enter.

2. Then choose the appropriate logical volume from the picklist as shown in Figure 15-9.

Figure 15-9 Add a file system to a previously defined cross-site logical volume

3. Then proceed to fill out the final menu fields appropriately as needed, as shown in Example 15-15. Press Enter twice to complete the creation.

Enhanced Journaled File Systems

Move cursor to desired item and press Enter.

Add an Enhanced Journaled File System
Add an Enhanced Journaled File System on a Previously Defined Logical Volume
List All Shared File Systems
Change / Show Characteristics of a Shared Enhanced Journaled File System
Remove a Shared File System

+------------------------------------------------------------------------+ ¦ Logical Volume Names ¦ ¦ ¦ ¦ Move cursor to desired item and press Enter. ¦ ¦ ¦ ¦ #Logical Volume Volume Group Resource Group Node List ¦ ¦ xsitelv xsite_vg testsiteip jordan,jessica¦ ¦ ¦ ¦ Esc+1=Help Esc+2=Refresh Esc+3=Cancel ¦ ¦ F8=Image F10=Exit Enter=Do ¦Es¦ /=Find n=Find Next ¦F9+------------------------------------------------------------------------+

Example 15-15 Final Add a JFS2 SMIT menu

Add an Enhanced Journaled File System on a Previously Defined Logical Volume

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[Entry Fields] Resource Group testsiteip Node Names jordan,jessica LOGICAL VOLUME name xsitelv Volume Group xsite_vg* MOUNT POINT [/xsitefs] / PERMISSIONS read/write + Mount OPTIONS [] + Block Size (bytes) 4096 + Inline Log? no + Inline Log size (MBytes) [] # Logical Volume for Log + Extended Attribute Format Version 1 +

ENABLE Quota Management? no +

Esc+1=Help Esc+2=Refresh Esc+3=Cancel Esc+4=ListEsc+5=Reset F6=Command F7=Edit F8=ImageF9=Shell F10=Exit Enter=Do

4. Repeat this step as needed for each logical volume previously created that requires a file system.

Increasing the size of a file system

C-SPOC can be used to increase the size of a file system in a cross-site LVM mirrored configuration. The key here is to always increase the size of the logical volume first to ensure proper mirroring. Then increase the size of the file system on the previously defined logical volume.

This is because the cl_chfs command is not cross-site LVM aware. When a shared file system is extended using C-SPOC, cl_chfs calls the chfs command, which automatically extends the LV, taking free space from anywhere in the volume group. This does not ensure that the mirroring is properly maintained, as documented in APAR IZ44896.

Important: Always add more space to a file system by adding more space to the logical volume first. Never allow cl_chfs to add the additional space when using cross-site LVM mirroring, because it does not maintain the mirroring properly.

To add additional space to a JFS2 file system, run:

1. smitty cl_whichfs → Enhanced Journaled File Systems → Change / Show Characteristics of a Shared Enhanced Journaled File System.

2. Then choose the appropriate logical volume and fill in the rest of the fields appropriately as shown in Figure 15-10.

Figure 15-10 Increase the size of Shared Enhanced Journaled File System

3. Ensure that the size of the file system matches the size of the logical volume. If you are unsure, you can use the lsfs -q mountpoint command as shown in the following example:

jordan /# lsfs -q /xsitefs
Name          Nodename   Mount Pt    VFS    Size     Options  Auto  Accounting
/dev/xsitelv  --         /xsitefs    jfs2   327680   rw       no    no
  (lv size: 491520, fs size: 327680, block size: 4096, sparse files: yes, inline log: no, inline log size: 0, EAformat: v1, Quota: no, DMAPI: no, VIX: no)

Change/Show Characteristics of a Shared Enhanced Journaled File System

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[Entry Fields] Resource Group Name connor_rg File system name /xsitefs NEW mount point [/xsitefs] SIZE of file system (in 512-byte blocks) [491520] Mount GROUP [] Mount AUTOMATICALLY at system restart? PERMISSIONS read/write + Mount OPTIONS [] + Start Disk Accounting? no + Block Size (bytes) 4096 Inline Log? no Inline Log size (MBytes) 0

Esc+1=Help Esc+2=Refresh Esc+3=Cancel Esc+4=ListEsc+5=Reset F6=Command F7=Edit F8=ImageF9=Shell F10=Exit Enter=Do

Chapter 16. PowerHA/XD and SVC copy services

In this chapter we describe a disaster recovery scenario based on PowerHA/XD utilizing SVC copy services. We provide a description of the steps needed to configure a PowerHA/XD SVC environment.

We cover the following topics:

� Scenario description
� Implementing a PowerHA/XD SVC configuration
� PowerHA/XD SVC prerequisites overview
� Installing PowerHA/XD for SVC
� Configuring PowerHA/XD for SVC

Note: PowerHA/XD 5.5 introduced support for SVC global copy. At the time of writing, the base publications still had it listed as a restriction.

16.1 Scenario description

This conceptual scenario uses two nodes, one in each of two sites: NewYork and Texas. Node xdsvc1, and its corresponding SVC, itsopoksvc, are at site NewYork. Node xdsvc2, and its corresponding SVC, itsoaustinsvc, are at site Texas. Each node/site provides automatic recovery for the other. This is often referred to as a mutual takeover configuration.

Each site consists of three IP networks. One is for the SVC (which is not required to be defined to the cluster topology), one is for the regular public ethernet, and one is for the XD_ip network. Ideally you would have multiple networks between sites to minimize the chance of site isolation. Figure 16-1 details the configuration between the two sites.

Figure 16-1 PowerHA/XD SVC scenario

In this scenario, each node has three unique disks defined through an SVC at each site. Node xdsvc1 at site NewYork has both a metro mirror and a global mirror configuration to site Texas. Node xdsvc2 has a single metro mirror configuration replicating to site NewYork. This shows that mixing both copy service types is indeed allowed. However, mixing replicated volume groups and non-replicated volume groups in the same resource group is not permitted. Note that in this scenario there are no non-replicated volume groups.

There are three volume groups, one for each pair of replicated Vdisks. Node xdsvc1 has two volume groups, nymetrovg and nyglobalvg, for use by metro and global mirroring respectively. Node xdsvc2 has one volume group, txmetrovg, which uses metro mirroring.

16.2 Implementing a PowerHA/XD SVC configuration

In this section we discuss prerequisites, installation, and configuration.

16.2.1 PowerHA/XD SVC prerequisites overview

This section introduces the base prerequisites for implementing a PowerHA/XD SVC configuration. As with any new implementation, planning is the key to minimizing risk and maximizing success. There are several prerequisites required in order to complete the solution. More complete details and corresponding planning worksheets can be found in the HACMP/XD Metro Mirror: Planning and Administration guide, SC23-4863.

Software

The following list describes the additional software required for PowerHA/XD utilizing SVC copy services:

� AIX 5.3 TL 7 or higher
� openssh version 4.6.1 or higher (+ license), for access to the SVC interface
� Base PowerHA filesets (5.5 or higher is required for global copy support)
� Virtual I/O Server 1.5 (if applicable)
� 2145 SDD or SDDPCM 2.2.0.0 (for AIX client or VIOS)
� SVC copy services (metro and/or global)

PowerHA/XD configuration requirements

In addition to the base cluster networks, each node that can host SVC replicated resources must have access to its own local SVC over an IP network. While this network does not have to be defined to the cluster, it is required in order for PowerHA/XD to run remote commands to the SVC via ssh.
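For example, a quick way to confirm that key-based ssh access from a node to its local SVC is working (the address shown is the itsopoksvc cluster IP used later in this scenario) is to run a harmless query and check that no password prompt appears:

ssh admin@192.168.200.20 svcinfo lsvdisk | head -n 5

If a password prompt appears, the ssh key setup described in the following reference still needs to be completed.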

Tip: Refer to Implementing the IBM System Storage SAN Volume Controller V4.3, SG24-6423, for additional details on how to configure ssh client access to the SVC.


It is also required to have two SVC (clusters), one at each site, and each cluster is comprised of at least two SVC nodes. Doing this eliminates a single point of failure, which is a best practice. Each cluster can be both a master and an auxiliary for copy services as desired.

Limitations and restrictions

The current release of PowerHA/XD with SVC has some limitations and restrictions:

� Volume Groups: Resource Groups to be managed by HACMP cannot contain volume groups with both SVC PPRC-protected and non SVC-PPRC protected disks.

� Replicated Resources: You cannot mix both SVC Global Mirror and Metro Mirror in the same resource group.

� C-SPOC: You cannot use C-SPOC for the following LVM operations to configure nodes at the remote site (that contain the target volumes):

– Creating a volume group.

– Operations that require nodes at the target site to write to the target volumes (such as changing file system size, changing mount points, or adding LVM mirrors) cause an error message in C-SPOC. However, nodes on the same site as the source volumes can successfully perform these tasks; the changes are subsequently propagated to the other site via lazy update.

Matching Vdisk to client hdisk

It is important to know which Vdisk on the SVC corresponds to which hdisk device on the client. Most new environments today utilize Virtual I/O Servers (VIOS). The following procedure shows how to determine which devices correspond to each other. It assumes that the disks (hdisks) are already defined on the AIX client node.

An SVC Vdisk has a unique_id (UID) on the SVC, and it is also part of the disk device definition in AIX. This can be gathered from the SVC master console under Work with Virtual Disks → Virtual Disks, as shown in Figure 16-2.

If using vpath devices instead of hdisks, refer to The HACMP/XD Metro Mirror: Planning and Administration guide, SC23-4863.

Note: The HACMP/XD Metro Mirror: Planning and Administration guide, SC23-4863, states that the cluster node name and the SVC host alias names must match. This restriction does not apply when the cluster node is virtual I/O client.


Figure 16-2 SVC Vdisk unique ids

It also can be displayed via the SVC command line interface. Assuming ssh access from the client to the SVC is already in place, simply run:

ssh admin@svc_cluster_ip svcinfo lshostvdiskmap |more

You can also grep on the host alias name to narrow the list. Record the UIDs for each Vdisk. Repeat this on each SVC. An example is shown below:

vdisk_name    wwpn              vdisk_UID
HARED-B0002   10000000C9738E85  60050768018F026C400000000000000C
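For example, assuming the node's host alias on the SVC is xdsvc2 (a hypothetical alias used only for illustration), the list can be narrowed like this:

ssh admin@itsoaustinsvc svcinfo lshostvdiskmap | grep xdsvc2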

On the AIX clients, the UID is stored in the ODM. It can be obtained by running odmget -q "attribute=unique_id" CuAt. The Vdisk UID is contained in this attribute starting at the tenth character position of the value.

Here is an example of the ODM stanza; the Vdisk UID is embedded in the unique_id value:

name = "hdisk7"
attribute = "unique_id"
value = "48333321360050768018F026C400000000000000C04214503IBMfcp05VDASD03AIXvscsi"
type = "R"
generic = ""
rep = "n"
nls_index = 0


In this scenario, hdisk7 matches Vdisk HARED-B0002. Repeat the matching for each disk and record the results so that the correct replication relationships can be created later.
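If many disks are involved, the matching can be scripted. The following is a minimal sketch that prints each hdisk together with the UID portion of its unique_id attribute, assuming the UID occupies characters 10 through 41 of the value as in the example above:

for d in $(lsdev -Cc disk -F name); do
  # pull the unique_id value for this disk out of the ODM
  uid=$(odmget -q "name=$d and attribute=unique_id" CuAt | sed -n 's/.*value = "\(.*\)".*/\1/p')
  # print the disk name and the embedded Vdisk UID (characters 10-41)
  [ -n "$uid" ] && echo "$d $(echo $uid | cut -c10-41)"
done

Compare the printed UIDs with the Vdisk UIDs recorded from the SVC.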

16.2.2 Installing PowerHA/XD for SVC

In addition to the base PowerHA filesets, the following additional filesets need to be installed when using SVC copy services:

� cluster.es.svcpprc.cmds
� cluster.es.svcpprc.rte
� cluster.msg.en_US.svcpprc
� cluster.xd.license

Generally these filesets should be installed at the same time as the base PowerHA filesets. However, they can also be installed onto an existing PowerHA cluster at any time, as in this scenario, when expanding a cluster's current configuration.

Note: A known problem exists when installing the PowerHA/XD SVC related filesets after PowerHA has already been installed. The following errors are displayed during the install:

cluster.es.svcpprc.rte.pre_i: ERROR
Cluster services are active on this node. Please stop all
cluster services prior to installing this software.

instal: Failed while executing the cluster.es.svcpprc.rte.pre_i script.

cluster.es.svcpprc.cmds.pre_i: ERROR
Cluster services are active on this node. Please stop all
cluster services prior to installing this software.

instal: Failed while executing the cluster.es.svcpprc.cmds.pre_i script.

To get around this problem, if the cluster is down, make a snapshot, uninstall all PowerHA filesets, and reinstall all PowerHA and XD filesets together; then restore the snapshot. Another option is to stop cluster services with takeover, then stop clstrmgrES locally via stopsrc, install the PowerHA/XD SVC filesets, and restart clstrmgrES. Repeat as needed. Otherwise, contact the support line for further assistance.

After installing the required filesets, you will have something similar to this:

cluster.doc.en_US.es.html      5.5.0.1  A  F  HAES Web-based HTML
cluster.doc.en_US.es.pdf       5.5.0.0  C  F  HAES PDF Documentation - U.S.
cluster.es.cfs.rte             5.5.0.1  A  F  ES Cluster File System Support
cluster.es.client.clcomd       5.5.0.1  A  F  ES Cluster Communication
cluster.es.client.lib          5.5.0.1  A  F  ES Client Libraries
cluster.es.client.rte          5.5.0.1  A  F  ES Client Runtime
cluster.es.client.utils        5.5.0.1  A  F  ES Client Utilities
cluster.es.client.wsm          5.5.0.1  A  F  Web based Smit
cluster.es.cspoc.cmds          5.5.0.1  A  F  ES CSPOC Commands
cluster.es.cspoc.dsh           5.5.0.0  C  F  ES CSPOC dsh
cluster.es.cspoc.rte           5.5.0.1  A  F  ES CSPOC Runtime Commands
cluster.es.nfs.rte             5.5.0.0  C  F  ES NFS Support
cluster.es.plugins.dhcp        5.5.0.1  A  F  ES Plugins - dhcp
cluster.es.plugins.dns         5.5.0.1  A  F  ES Plugins - Name Server
cluster.es.plugins.printserver
cluster.es.server.cfgast       5.5.0.0  C  F  ES Two-Node Configuration
cluster.es.server.diag         5.5.0.1  A  F  ES Server Diags
cluster.es.server.events       5.5.0.1  A  F  ES Server Events
cluster.es.server.rte          5.5.0.1  A  F  ES Base Server Runtime
cluster.es.server.testtool
cluster.es.server.utils        5.5.0.1  A  F  ES Server Utilities
cluster.es.svcpprc.cmds        5.5.0.0  C  F  ES HACMP - SVC PPRC Commands
cluster.es.svcpprc.rte         5.5.0.1  A  F  ES HACMP - SVC PPRC Runtime
cluster.es.worksheets          5.5.0.1  A  F  Online Planning Worksheets
cluster.license                5.5.0.0  C  F  HACMP Electronic License
cluster.man.en_US.es.data      5.5.0.0  C  F  ES Man Pages - U.S. English
cluster.msg.En_US.svcpprc      5.5.0.0  C  F  HACMP SVC PPRC Messages - U.S.
cluster.msg.en_US.cspoc        5.5.0.0  C  F  HACMP CSPOC Messages - U.S.
cluster.msg.en_US.es.client
cluster.msg.en_US.es.server
cluster.xd.license             5.5.0.0  C  F  HACMP XD Feature License
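You can confirm on each node that the SVC PPRC filesets are installed with a quick lslpp query, for example:

lslpp -L | egrep "svcpprc|cluster.xd"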

16.2.3 Configuring PowerHA/XD for SVC

This section covers the steps required to configure a PowerHA/XD for SVC cluster. However, these steps are in addition to configuring a base PowerHA cluster. For more information about configuring a PowerHA cluster, refer to Chapter 4, “Installation and configuration” on page 199.

The initial PowerHA topology configuration already contains a single IP network and two service addresses as shown here:

root@ xdsvc1[/usr/es/sbin/cluster/utilities] cltopinfo
Cluster Name: xdsvc
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No


There are 2 node(s) and 1 network(s) defined

NODE xdsvc1:
    Network net_ether_01
        tx_service   192.168.1.2
        ny_service   192.168.1.1
        xdsvc1       9.12.7.5

NODE xdsvc2:
    Network net_ether_01
        tx_service   192.168.1.2
        ny_service   192.168.1.1
        xdsvc2       9.12.7.11

It might also be desirable to have a service address that is unique to each site. If that is the case, refer to 13.3, “Site specific service IP labels” on page 613.

These additional steps will be performed to completely configure the cluster to utilize SVC copy services between sites:

1. Add two sites.

2. Add XD_ip network(s).

3. Add SVC cluster definitions.

4. Add SVC PPRC relationships.

5. Add SVC PPRC-Replicated resources.

6. Create volume group(s):

– Create logical volumes.
– Create file systems.

7. Create temporary SVC PPRC relationship.

– Import volume group(s) to remote site.

8. Create resource group(s).

9. Add volume groups and replicated resources into resource group(s).


Add the sites

To add the sites, run the following:

1. smitty hacmp → Extended Configuration → Extended Topology Configuration → Configure HACMP Sites → Add a Site

2. In this scenario, the two sites, NewYork and Texas, are added. Node xdsvc1 is a part of the NewYork site, while node xdsvc2 is a part of the Texas site.

Figure 16-3 shows the SMIT menu for site creation.

Figure 16-3 Adding sites in the topology configuration

The required fields to complete for the definition of a site are as follows:

� Site Name: The name of the site, using no more than 64 alphanumeric and underscore characters.

� Site Nodes: For each site, define at least one node residing within the site. You can add multiple nodes by leaving a blank space between the names. Each node can belong to only one site.

Upon completion, the sites are defined as follows:

root@ xdsvc2[] cllssite
---------------------------------------------------
Sitename  Site Nodes  Dominance  Protection Type
---------------------------------------------------
NewYork   xdsvc1      yes        NONE
Texas     xdsvc2      no         NONE

                                  Add Site

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Site Name                                        [NewYork]               +
* Site Nodes                                       xdsvc1                  +

F1=Help     F2=Refresh     F3=Cancel     F4=List
F5=Reset    F6=Command     F7=Edit       F8=Image
F9=Shell    F10=Exit       Enter=Do

Note: The options, Dominance and Backup Communications, have been removed in PowerHA 5.5. However, they are still displayed via the command cllssite.


Add the XD_ip network

To define an additional network, run the following:

1. smitty hacmp → Extended Configuration → Extended Topology Configuration → Configure HACMP networks → Add a Network to the Cluster, and press Enter.

2. Choose the network type, in this case XD_ip, and press Enter. You can keep the default network name or specify one as shown in Figure 16-4.

Figure 16-4 Add XD_ip network

After adding the network, define the appropriate communication interfaces to the network. This is done by running the following command:

1. smitty hacmp → Extended Configuration → Extended Topology Configuration → Configure HACMP Communication Interfaces/Devices → Add Communication Interfaces/Devices → Add Pre-Defined Communication Interfaces and Devices → Communication Interfaces. Choose the previously configured network, net_XD_ip_01 in this scenario, and press Enter.

2. Add the appropriate interface and repeat for each node. If you have run configuration discovery previously, then you can use the Add Discovered Communication Interface and Devices option instead of the pre-defined option.

              Add an IP-Based Network to the HACMP Cluster

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Network Name                                     [net_XD_ip_01]
* Network Type                                     XD_ip
* Netmask                                          [255.255.255.0]         +
* Enable IP Address Takeover via IP Aliases        [No]                    +
  IP Address Offset for Heartbeating over IP Aliases []

Note: Though an XD_ip network is not required, most configurations would use them between sites.


Add SVC clusters

To add the SVC clusters into the cluster configuration, run:

1. smitty svcpprc_def → SVC Clusters Definition to HACMP

2. The menu appears as shown in Figure 16-5.

The required fields to complete are as follows:

SVC Cluster Name       Enter the same name as the SVC cluster. This name cannot be more than 20 alphanumeric characters and underscores.

SVC Cluster Role Select Master or Auxiliary. The Master SVC Cluster is usually defined at the Primary HACMP site, the Auxiliary SVC Cluster at the Backup HACMP site.

SVC Cluster IP Address   The IP address of this SVC cluster, used by HACMP to submit PPRC management commands.

Remote SVC Partner The name of the SVC Cluster that will be hosting vDisks from the other side of the SVC PPRC link.

Figure 16-5 Adding an SVC cluster

Note: A diskhb network is not configured because there are typically no shared disks between sites. Use additional XD_ip networks instead; or, if a serial (or serial-to-fiber) connection exists between the sites, an XD_rs232 network can also be defined.

                              Add an SVC Cluster

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* SVC Cluster Name                                 [itsopoksvc]
* SVC Cluster Role                                 [Master]                +
* HACMP site                                       NewYork                 +
* SVC Cluster IP Address                           [192.168.200.20]
* Remote SVC Partner                               [itsoaustinsvc]

F1=Help     F2=Refresh     F3=Cancel     F4=List
F5=Reset    F6=Command     F7=Edit       F8=Image
F9=Shell    F10=Exit       Enter=Do


Complete the menu as needed and repeat as required. In this scenario, both SVC clusters have a master role. This step is repeated for the SVC itsoaustinsvc at site Texas, listing itsopoksvc at site NewYork as its remote partner.

At this point the SVC cluster definition is as follows:

root@ xdsvc1 [/usr/es/sbin/cluster/svcpprc/cmds] ./cllssvc -a
svccluster_name  svccluster_role  sitename  cluster_ip      r_partner
itsoaustinsvc    Master           Texas     192.168.200.10  itsopoksvc
itsopoksvc       Master           NewYork   192.168.200.20  itsoaustinsvc

Add SVC PPRC relationships

These relationships define a pair of Vdisks, one from each SVC cluster/site, to be used for replication.

To add the relationships, run:

1. smitty svcpprc_def → SVC PPRC Relationships Definition → Add an SVC PPRC Relationship

2. The menu appears as shown in Figure 16-6.

The required fields to complete are as follows:

Relationship Name This name is used by both SVC and PowerHA for configuring the SVC replication relationships. This is limited to 20 alphanumeric characters and underscores.

Master Vdisk Info The Master VDisk is the disk that resides at the primary site for the resource group that the SVC PPRC Relationship will be a part of. The Master and Auxiliary VDisk names use the format: vdisk_name@svc_cluster_name.

Auxiliary Vdisk Info The Auxiliary VDisk is the disk at the backup site for the resource group that the SVC PPRC Relationship will be a part of.

Important: It is essential to know which disks (hdisks or vpaths) and their corresponding Vdisks will be used at each site to define proper relationships. Consult “Matching Vdisk to client hdisk” on page 676 for more information.

Note: While PowerHA allows 20 alphanumeric characters in the Relationship Name field, the SVC only allows 15. Do not use names longer than 15 characters.


Figure 16-6 Adding an SVC PPRC relationship

Repeat this step for every disk as needed. In this scenario it is repeated twice; once to create another pair, nyhdsk4txhdsk6, which will later be used for the Global mirror from NewYork to Texas; and once more, to create another pair, txhdsk7nyhdsk5, which will be used for a Metro mirror from Texas to NewYork. Upon completion, the relationships are as follows:

root@xdsvc1 [/usr/es/sbin/cluster/svcpprc/cmds] ./cllsrelationship -a
relationship_name  MasterVdisk_info           AuxiliaryVdisk_info
nyhdsk3txhdsk5     HARED-A0003@itsopoksvc     HARED-B0003@itsoaustinsvc
nyhdsk4txhdsk6     HARED-A0007@itsopoksvc     HARED-B0004@itsoaustinsvc
txhdsk7nyhdsk5     HARED-B0002@itsoaustinsvc  HARED-A0006@itsopoksvc

Add SVC PPRC-Replicated resources

This step creates the replicated resources to be added later to the resource groups. To add the replicated resources, run:

1. smitty svcpprc_def → SVC PPRC Relationships Definition → Add an SVC PPRC Resource

2. The menu appears as shown in Figure 16-7.

The required fields to complete are as follows:

SVC PPRC Consistency Group Name   Name to be used by the SVC and also in the resource group configuration. Use no more than 20 alphanumeric characters and underscores.

Master SVC Cluster Name    The name of the Master cluster, which is the SVC cluster connected to the PowerHA primary site.

                        Add an SVC PPRC Relationship

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Relationship Name                                [nyhdsk3txhdsk5]
* Master VDisk Info                                [HARED-A0003@itsoausti]
* Auxiliary VDisk Info                             [HARED-B0003@itsopoksv]

F1=Help     F2=Refresh     F3=Cancel     F4=List
F5=Reset    F6=Command     F7=Edit       F8=Image
F10=Exit    Enter=Do


Auxiliary SVC Cluster Name Name of the SVC Cluster connected to the PowerHA Backup/Recovery Site.

List of Relationships List of names of the SVC PPRC relationships previously created.

Copy Type Specify either Metro or Global

HACMP Recovery Action This indicates the action to be taken by PowerHA in case of a site fallover for the replicated pairs. MANUAL: Manual intervention required or AUTOMATED: No manual intervention required.

Figure 16-7 Add SVC PPRC replicated resource

Every field except for the SVC PPRC Consistency Group Name will provide a picklist of what has been previously defined. Repeat this step for each replicated relationship previously created. In this scenario it is repeated twice more. Upon completion the replicated resources are as follows:

root@ xdsvc1[/usr/es/sbin/cluster/svcpprc/cmds] ./cllssvcpprc -a
svcpprc_consistencygrp  MasterCluster  AuxiliaryCluster  relationships   CopyType  RecoveryAction
nytexmetro              itsopoksvc     itsoaustinsvc     nyhdsk3txhdsk5  METRO     AUTO
texnymetro              itsoaustinsvc  itsopoksvc        txhdsk7nyhdsk5  METRO     AUTO
nytexglobal             itsopoksvc     itsoaustinsvc     nyhdsk4txhdsk6  GLOBAL    AUTO

                          Add an SVC PPRC Resource

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* SVC PPRC Consistency Group Name                  [nytexmetro]
* Master SVC Cluster Name                          [itsopoksvc]            +
* Auxiliary SVC Cluster Name                       [itsoaustinsvc]         +
* List of Relationships                            [nyhdsk3txhdsk5]        +
* Copy Type                                        [METRO]                 +
* HACMP Recovery Action                            [AUTO]                  +

F1=Help     F2=Refresh     F3=Cancel     F4=List
F5=Reset    F6=Command     F7=Edit       F8=Image
F9=Shell    F10=Exit       Enter=Do


Create Volume Group(s)

Similar to regular local clustering, you can create the volume group locally and import it to the other system. The difference is that these are not actual shared disks; temporary replication is required before importing the volume group to the remote system. This step is shown later in "Create temporary SVC PPRC relationship".

Use the smitty mkvg fast path to create a volume group as desired. Make sure that the option Activate volume group AUTOMATICALLY at system restart is set to no. Repeat this step as needed for each volume group. In this scenario there are two volume groups created on xdsvc1, nymetrovg and nyglobalvg, and one volume group created on xdsvc2, txmetrovg.
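For example, the command-line equivalent on node xdsvc1 might look like the following sketch (the volume group names and disks are the ones used in this scenario; the chvg -a n step matches the SMIT auto-activation setting):

mkvg -y nymetrovg hdisk3
chvg -a n nymetrovg        # do not activate the volume group automatically at restart
mkvg -y nyglobalvg hdisk4
chvg -a n nyglobalvg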

Upon completion, the disks and volume groups appear as follows:

root@ xdsvc1[/] lspv
hdisk0          000fe401afb3c530    rootvg        active
hdisk1          000fe401d39e0575    rootvg
hdisk2          000fe401d39e2344    None
hdisk3          000fe4014e5026c3    nymetrovg
hdisk4          000fe4014e504bd5    nyglobalvg
hdisk5          None                None

root@ xdsvc2[] lspv
hdisk0          000fe411b923cfe9    rootvg        active
hdisk1          000fe411b39e0575    rootvg
hdisk2          000fe4114dde2344    None
hdisk3          000fe4114dd84f9e    None
hdisk4          000fe4114dd60ba7    None
hdisk5          None                None
hdisk6          None                None
hdisk7          000fe4114dd7f436    txmetrovg

At this point all logical volumes and file systems would be created on their respective systems. Follow standard unique naming conventions as is normal for clustering.
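As an illustration only, the first file system in this scenario could be created from the command line roughly as follows (crfs creates a JFS2 log logical volume in the volume group if one does not already exist, so the log name may differ from the listings that follow):

mklv -y nymetrolv -t jfs2 nymetrovg 50            # 50 logical partitions, as in this scenario
crfs -v jfs2 -d nymetrolv -m /nymetrofs -A no     # do not mount automatically at restart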

In this scenario, the following logical volumes and file systems were created in their respective volume groups:

root@ xdsvc1[/usr/sys/inst.images/55] lsvg -l nymetrovg
nymetrovg:
LV NAME        TYPE     LPs  PPs  PVs  LV STATE       MOUNT POINT
nymetrolv      jfs2     50   50   1    closed/syncd   /nymetrofs
nymetrologlv   jfs2log  1    1    1    closed/syncd   N/A

Note: C-SPOC cannot be used to create a volume group for a replicated resource.


root@ xdsvc1[/usr/sys/inst.images/55] lsvg -l nyglobalvg
nyglobalvg:
LV NAME        TYPE     LPs  PPs  PVs  LV STATE       MOUNT POINT
nygloballv     jfs2     50   50   1    closed/syncd   /nyglobalfs
loglv01        jfs2log  1    1    1    closed/syncd   N/A

root@ xdsvc2[] lsvg -l txmetrovg
txmetrovg:
LV NAME        TYPE     LPs  PPs  PVs  LV STATE       MOUNT POINT
txmetrolv      jfs2     40   40   1    closed/syncd   /txmetrofs
txmetrologlv   jfs2log  1    1    1    closed/syncd   N/A

Create temporary SVC PPRC relationship

This step allows the LVM-related information previously created to be replicated to the corresponding remote auxiliary Vdisks. This step is crucial in order to import the volume group information later.

Creating these temporary relationships in this scenario requires running the following command:

ssh admin@<master_SVC_cluster IP> svctask mkrcrelationship -master <master_Vdisk_name> -aux <aux_Vdisk_name> -cluster <Aux_SVC_cluster> -name <relationship_name>

Which in this scenario translates to:

ssh admin@itsopoksvc svctask mkrcrelationship -master HARED-A0003 -aux HARED-B0003 -cluster itsoaustinsvc -name nyhdsk3txhdsk5
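The other two relationships in this scenario would be created in the same way; the Vdisk names below are taken from the cllsrelationship output shown earlier. For the pair that will later use Global Mirror, this temporary relationship is only used for the initial copy and is removed afterward:

ssh admin@itsopoksvc svctask mkrcrelationship -master HARED-A0007 -aux HARED-B0004 -cluster itsoaustinsvc -name nyhdsk4txhdsk6
ssh admin@itsoaustinsvc svctask mkrcrelationship -master HARED-B0002 -aux HARED-A0006 -cluster itsopoksvc -name txhdsk7nyhdsk5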

Tip: Before continuing to the next step, ensure that the file systems are unmounted and the volume group is varied off.

Note: These steps are performed from the SVC command line interface (CLI). Always consult the specific SVC version CLI guide, SC26-7903, for proper syntax.

Note: The <master_SVC_ClusterIP> can also be a name as long as that name is resolvable (in /etc/hosts). As shown in these examples, itsopoksvc is the SVC cluster name and it resolved to 192.168.200.20.


Then start the copy/relationship by running:

ssh admin@<master_SVC_Cluster IP> svctask startrcrelationship <relationship_name>

Which in this scenario translates to:

ssh admin@itsopoksvc svctask startrcrelationship nyhdsk3txhdsk5

Repeat this step as needed for each relationship that was previously created. Upon completion, wait until the relationship moves from the inconsistent_copying to the consistent_synchronized state. Check the state by running:

ssh admin@<master_SVC_Cluster IP> svcinfo lsrcrelationship <relationship name>

Which in this scenario translates to:

ssh admin@itsopoksvc svcinfo lsrcrelationship nyhdsk3txhdsk5
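If you prefer to wait for the synchronization from a script instead of checking manually, a simple polling loop such as this sketch can be used:

until ssh admin@itsopoksvc svcinfo lsrcrelationship nyhdsk3txhdsk5 | grep -q consistent_synchronized
do
  sleep 60    # check again every minute
done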

The following example output was taken before and after the relationship is in sync, respectively:

root@ xdsvc1[] ssh admin@itsopoksvc svcinfo lsrcrelationship nyhdsk3txhdsk5
id 3
name nyhdsk3txhdsk5
master_cluster_id 0000020063C09B10
master_cluster_name itsoaustinsvc
master_vdisk_id 3
master_vdisk_name HARED-A0003
aux_cluster_id 0000020063C09B10
aux_cluster_name itsoaustinsvc
aux_vdisk_id 7
aux_vdisk_name HARED-B0003
primary master
consistency_group_id
consistency_group_name
state inconsistent_copying
bg_copy_priority 50
progress 0
freeze_time
status online
sync
copy_type metro

root@ xdsvc1[] ssh admin@itsopoksvc svcinfo lsrcrelationship nyhdsk3txhdsk5
id 3
name nyhdsk3txhdsk5
master_cluster_id 0000020063C09B10
master_cluster_name itsoaustinsvc
master_vdisk_id 3
master_vdisk_name HARED-A0003
aux_cluster_id 0000020063C09B10
aux_cluster_name itsoaustinsvc
aux_vdisk_id 7
aux_vdisk_name HARED-B0003
primary master
consistency_group_id
consistency_group_name
state consistent_synchronized
bg_copy_priority 50
progress
freeze_time
status online
sync
copy_type metro

Upon successful completion of the copy, the relationship needs to be removed by running:

ssh admin@<master SVC Cluster IP> svctask rmrcrelationship <relationship name>

Which in this scenario translates to:

ssh admin@itsopoksvc svctask rmrcrelationship nyhdsk3txhdsk5

Repeat this step as needed for each relationship created.

Import volume groups

After removing the relationships, using either SMIT or the command line on the respective backup site's node(s), import the volume group(s) created previously. In this scenario, the volume groups nymetrovg and nyglobalvg would be imported on node xdsvc2 in Texas. The volume group txmetrovg would be imported on node xdsvc1 in NewYork.

Import the volume groups by running:

importvg -V <majornumber> -y <vgname> hdisk#

Important: Before importing the volume groups on the remote disks/systems, make sure the correct PVID is present on the disks by running the chdev -l hdisk# -a pv=yes command. This PVID will actually match the hdisk at the opposite site as it is a true complete copy of the disk. It is not a shared disk.


In this scenario, the following commands would be performed:

� On node xdsvc2:

importvg -V 98 -y nymetrovg hdisk5
importvg -V 99 -y nyglobalvg hdisk6

� On node xdsvc1:

importvg -V 100 -y txmetrovg hdisk5

Ensure that AUTO VARYON of the volume groups is disabled. If using enhanced concurrent volume groups, this will be the default. If not, run the chvg -a n <vgname> command for each volume group. The volume group must be varied on to run the chvg command.

If the volume group is not currently varied on, do so now by running the varyonvg <vgname> command. Verify that all logical volumes and file systems exist and can be mounted, and verify any data residing in the file systems. Next we list the output from each node after importing the volume groups. This should look the same as when originally created, except that it now also matches on the respective remote node(s).

root@ xdsvc2[] lsvg -l nymetrovg
nymetrovg:
LV NAME        TYPE     LPs  PPs  PVs  LV STATE       MOUNT POINT
nymetrolv      jfs2     50   50   1    closed/syncd   /nymetrofs
nymetrologlv   jfs2log  1    1    1    closed/syncd   N/A

root@ xdsvc2[/usr/sys/inst.images/55] lsvg -l nyglobalvg
nyglobalvg:
LV NAME        TYPE     LPs  PPs  PVs  LV STATE       MOUNT POINT
nygloballv     jfs2     50   50   1    closed/syncd   /nyglobalfs
loglv01        jfs2log  1    1    1    closed/syncd   N/A

root@ xdsvc1[] lsvg -l txmetrovg
txmetrovg:
LV NAME        TYPE     LPs  PPs  PVs  LV STATE       MOUNT POINT
txmetrolv      jfs2     40   40   1    closed/syncd   /txmetrofs
txmetrologlv   jfs2log  1    1    1    closed/syncd   N/A

Upon completion of verifying the data, unmount the file systems and varyoff the volume groups on each node.
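On the importing node, the verification and cleanup for one of the volume groups might look like the following sketch (repeat for each imported volume group):

varyonvg nymetrovg
lsvg -l nymetrovg           # confirm the logical volumes and file systems
mount /nymetrofs
ls -l /nymetrofs            # spot-check the data
umount /nymetrofs
varyoffvg nymetrovg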

Note: Specifying a major number is only required when using NFS. However, a good common cluster practice is to keep the major numbers the same.


Create resource group(s)

If resource groups already exist that are to be used, this step can be skipped. However, remember that mixing non-replicated volume groups and replicated volume groups in the same resource group is not allowed.

Add resource group

To add a resource group, run:

1. smitty hacmp → Extended Configuration → Extended Resource Configuration → HACMP Extended Resource Group Configuration → Add a Resource Group, and press Enter.

2. You will then be presented with the menu as shown in Figure 16-8. Specify the fields as desired.

There are additional resource group fields presented when sites are defined to the cluster. These fields and their corresponding descriptions are as follows.

Inter-Site Management Policy You can select from the following options:

Ignore If you select this option, the resource group will not have ONLINE SECONDARY instances. Use this option if you use cross-site LVM mirroring. You can also use it with HACMP/XD for Metro Mirror.

Prefer Primary Site The primary instance of the resource group is brought ONLINE on the primary site at startup, the secondary instance is started on the other site. The primary instance falls back when the primary site rejoins.

Online on Either Site During startup, the primary instance of the resource group is brought ONLINE on the first node that meets the node policy criteria (either site). The secondary instance is started on the other site. The primary instance does not fall back when the original site rejoins.

Online on Both Sites: During startup, the resource group (node policy must be defined as Online on All Available Nodes) is brought ONLINE on both sites. There is no fallover or fallback

Participating Nodes from Primary Site     Specify the nodes, in order, for the primary site.


Participating Nodes from Secondary Site   Specify the nodes, in order, for the secondary site.

Figure 16-8 Add a resource group

Complete the fields as desired and repeat for any additional resource groups that need to be added. In this scenario, three resource groups would be created. Two would have xdsvc1 as the primary node, one for the global mirror and one for the metro mirror. The third would have xdsvc2 as primary and contain a metro mirror. These three resource groups are shown here:

Resource Group Name                        newyorkmetrorg
Participating Node Name(s)                 xdsvc1 xdsvc2
Startup Policy                             Online On Home Node Only
Fallover Policy                            Fallover To Next Priority Node In The List
Fallback Policy                            Never Fallback
Site Relationship                          Prefer Primary Site

Resource Group Name                        newyorkglobalrg
Participating Node Name(s)                 xdsvc1 xdsvc2
Startup Policy                             Online On Home Node Only
Fallover Policy                            Fallover To Next Priority Node In The List
Fallback Policy                            Never Fallback
Site Relationship                          Prefer Primary Site

Resource Group Name                        texasmetrorg
Participating Node Name(s)                 xdsvc2 xdsvc1
Startup Policy                             Online On Home Node Only
Fallover Policy                            Fallover To Next Priority Node In The List
Fallback Policy                            Never Fallback
Site Relationship                          Prefer Primary Site

Figure 16-8 shows the SMIT menu used to add the first of these resource groups:

                       Add a Resource Group (extended)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Resource Group Name                              [newyorkmetrorg]
  Inter-Site Management Policy                     [Prefer Primary Site]   +
* Participating Nodes from Primary Site            [xdsvc1]                +
  Participating Nodes from Secondary Site          [xdsvc2]                +

  Startup Policy                                   Online On Home Node O>
  Fallover Policy                                  Fallover To Next Prio>
  Fallback Policy                                  Never Fallback          +

Notice: The additional site-related options (Inter-Site Management Policy and the per-site participating node fields) are available only when sites are defined.

Add VGs and replicated resources into resource group

To add the volume groups and replicated resources (together with the service IP labels and application servers) into the resource group, run:

1. smitty hacmp → Extended Configuration → Extended Resource Configuration → HACMP Extended Resource Group Configuration → Change/Show Resources and Attributes for a Resource Group, choose one of the resource groups previously created, newyorkmetrorg in this scenario, and press Enter.

2. Enter the desired resources. In addition to the normal resources of application server, service IP, and volume groups, the SVC PPRC Replicated Resources field is also used to include the replicated resources previously created. For site NewYork and newyorkmetrorg, the replicated resource nytexmetro would be added as shown in Figure 16-9. A picklist is available on the field by pressing F4; then press F7 to choose each entry.

In this scenario, this step would be repeated twice; once for newyorkglobalrg to include the nyglobalvg and the replicated resource nytexglobal; and once more for texasmetrorg to include the volume group txmetrovg and the replicated resource texnymetro.

Upon completion, the two additional resource groups would be as shown in Example 16-1. Note that only the relevant fields are being shown.


Figure 16-9 Add SVC replicated resources into resource group

       Change/Show All Resources and Attributes for a Custom Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                   [Entry Fields]
  Resource Group Name                              newyorkmetrorg
  Inter-site Management Policy                     Prefer Primary Site
  Participating Nodes from Primary Site            xdsvc1
  Participating Nodes from Secondary Site          xdsvc2

  Startup Policy                                   Online On Home Node O>
  Fallover Policy                                  Fallover To Next Prio>
  Fallback Policy                                  Never Fallback

  Service IP Labels/Addresses                      [ny_service>
  Application Servers                              [writenydata]           +

  Volume Groups                                    [nymetrovg]
  SVC PPRC Replicated Resources                    [nytexmetro>

Example 16-1 Additional SVC replicated resource groups created

Resource Group Name                                newyorkglobalrg
Inter-site Management Policy                       Prefer Primary Site
Participating Nodes from Primary Site              xdsvc1
Participating Nodes from Secondary Site            xdsvc2

Startup Policy                                     Online On Home Node O>
Fallover Policy                                    Fallover To Next Prio>
Fallback Policy                                    Never Fallback

Service IP Labels/Addresses                        []>
Application Servers                                []                      +

Volume Groups                                      [nyglobalvg]
SVC PPRC Replicated Resources                      [nytexglobal>

Resource Group Name                                texasmetrorg
Inter-site Management Policy                       Prefer Primary Site
Participating Nodes from Primary Site              xdsvc2
Participating Nodes from Secondary Site            xdsvc1

Startup Policy                                     Online On Home Node O>
Fallover Policy                                    Fallover To Next Prio>
Fallback Policy                                    Never Fallback

Service IP Labels/Addresses                        [tx_service1]
Application Servers                                [writetxdata]           +

Volume Groups                                      [txmetrovg]
SVC PPRC Replicated Resources                      [texnymetro>

It is now required to synchronize the cluster, as described next.

Synchronize cluster

To synchronize the cluster, run:

1. smitty hacmp → Extended Configuration → Extended Verification and Synchronization

2. Upon successful synchronization and verification, the cluster is ready to be started and tested accordingly.

Tip: The /usr/es/sbin/cluster/svcpprc/utils/cl_verify_svcpprc_config script can be run prior to a full cluster synchronization to verify the SVC copy services configuration specifically. This aids in troubleshooting if errors occur during cluster synchronization.


Chapter 17. GLVM concepts and configuration

PowerHA/XD Geographic Logical Volume Manager (GLVM) is a high availability function that can mirror data across a standard IP network and provide automated fallover/fallback support for the applications utilizing the geographically mirrored data.

GLVM performs the remote mirroring of AIX logical volumes using the AIX native Logical Volume Manager (LVM) functions for optimal performance and ease of configuration and maintenance.

In this chapter we examine GLVM—the concepts, installation, configuration, and migration.


17.1 PowerHA/XD GLVM

GLVM provides functions similar to HAGEO, but uses a simplified method to define and maintain the data replication between the sites.

PowerHA/XD GLVM provides two essential functions:

� Remote data mirroring
� Automated fallover and fallback

Together these functions provide high availability support for applications and data across a standard TCP/IP network to a remote site.

PowerHA/XD for GLVM provides the following features for disaster recovery:

� It allows automatic detection and response to site and network failures in the geographic cluster without user intervention.

� It performs automatic site takeover and recovery and keeps mission-critical applications highly available through application fallover and monitoring.

� It allows for simplified configuration of volume groups, logical volumes and resource groups.

� It uses the TCP/IP network for remote mirroring over an unlimited distance.

� It supports maximum sized logical volumes.

PowerHA/XD GLVM is PowerHA extended distance using geographic logical volumes to mirror data to the remote site. PowerHA/XD GLVM:

� It supports clusters with multiple nodes at two sites.

� It mirrors data by providing a local representation of the remote physical volume to the LVM.

� The local and remote storage systems do not have to be the same type of equipment.

17.1.1 Definitions and concepts

In this section, we define the basic concepts of PowerHA/XD for GLVM:

� Remote physical volume (RPV):

A pseudo device driver that provides access to the remote disks as though they were locally attached. The remote system must be connected via TCP/IP network. The distance between the sites is limited by the latency and bandwidth of the connecting network(s).

Important: HAGEO is no longer available starting with PowerHA/XD 5.5.


The RPV consists of two parts:

– RPV Client:

This is a pseudo device driver that runs on the local machine and allows the AIX LVM to access remote physical volumes as though they were local. The RPV clients are seen as hdisk devices, which are logical representations of the remote physical volume.

The RPV client device driver appears as an ordinary disk device (for example, hdisk8) and has all of its I/O directed to the remote RPV server. It has no knowledge of the nodes, networks, and so on.

In PowerHA/XD, concurrent access is not supported for GLVM, so when accessing the RPV clients, the local equivalent RPV servers and remote RPV clients must be in a defined state.

When configuring the RPV client, the following details are defined:

• The IP address of the RPV server

• The local IP address (defines the network to use)

• The time-out. This field is primarily for the standalone GLVM option, as PowerHA will overwrite this field with the cluster’s config_too_long time. In a PowerHA cluster, this will be the worst case scenario, as PowerHA will detect problems with the remote node well before then.

The SMIT fast path to configure the RPV clients is smitty rpvclient.

– RPV Server:

The RPV Server runs on the remote machine, one for each physical volume that is being replicated. The RPV Server can listen to a number of remote RPV Clients on different hosts to handle fallover.

The RPV Server is an instance of the kernel extension of the RPV device driver with names such as rpvserver0 and is not an actual physical device.

When configuring the RPV server, the following are defined:

• The PVID of the local physical volume
• The IP addresses of the RPV clients (comma separated)

� Geographically mirrored volume group (GMVG):

A volume group that consists of local and remote physical volumes. Strict rules apply to GMVGs to reduce the likelihood of being left without a complete copy of the mirror at each site. For this reason the superstrict allocation policy is required for each logical volume in a GMVG. PowerHA/XD will also expect each logical volume in a GMVG to be mirrored.


GMVGs are managed by PowerHA/XD and recognized as a separate class of replicated resources, so they have their own events. PowerHA/XD verification will issue a warning if there are resource groups that contain GMVG resources that do not have the forced varyon flag set and if quorum is not disabled.

The SMIT fast path to configure the RPV servers is smitty rpvserver.

� GLVM Utilities:

There are SMIT menus provided with GLVM to create the GMVGs and the logical volumes. While they are not required (under the covers they perform the same functions as the equivalent standard SMIT menus), they do control the location of the logical volumes to ensure proper placement of the mirror copies.

If you use the standard commands to configure your GMVGs, we recommend that you use the GLVM verification utility.

� Network definitions are added to PowerHA/XD for GLVM:

XD_data Network that can be used for data replication only. This is the equivalent of the Geo_Primary network. This network will support adapter swap, but not fallover to another node. RSCT heartbeat packets will be sent on this network.

XD_ip An IP-based network used for participation in RSCT protocols, heartbeating, and client communication.

Important: PowerHA/XD will enforce the requirement that each physical volume that is part of a volume group with RPV Clients has the reverse relationship defined. This, at a minimum, means that every GMVG will consist of two physical volumes on each site—one local disk and the logical representation of the remote physical volume.

Important: The LVM commands and SMIT menus are not completely aware of RPV design, so it is possible to create geographically mirrored volume groups that do not have a complete copy of the data on either site.

Note: PowerHA 5.4.0 and later versions support up to four XD_data networks, while previous releases support only one. EtherChannel is also supported.


XD_rs232    Network that can be used for serial communications. Same as the RS232 network type, except that the heartbeat parameters have been modified for the greater distance. For example, this could be a leased line or a serial line with a line driver.

RSCT heartbeat packets will be sent over all the networks.

Figure 17-1 shows an example of two sites with one node at each. Viewing the replication from Node1, we see that the destination physical volume hdisk3 on Node2 is presented on Node1 as hdisk8.

Figure 17-1 RPV client viewed from Node1

Note: PowerHA/XD GLVM requires one XD_data network for data replication and one of XD_rs232 or XD_ip to differentiate between a remote node failure or XD_data network failure.

Note: hdisk4 is a physical disk local to Node1 only. It must not be a shared disk physically connected to both nodes, as is typical in standard PowerHA clusters.



On Node2 there is an RPV Server for each physical volume. On Node1 there is a corresponding RPV Client for each RPV server, which presents to the LVM as a local physical volume. We can now construct a volume group glvm_vg on Node1, mirroring a local physical volume (hdisk4) to the local RPV Client (hdisk8). The RPV client and server will ensure that all I/O is transferred over the XD_data network to the physical volumes on Node2.
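If the GMVG were built with standard LVM commands rather than the GLVM SMIT utilities (the utilities are the recommended method, as noted earlier), the Node1 side might look roughly like this sketch. Here glvm_vg is assumed to exist on the local disk hdisk4, hdisk8 is the RPV client, and the logical volume name and size are arbitrary:

extendvg glvm_vg hdisk8                               # add the RPV client to the volume group
mklv -y app_lv -t jfs2 -s s -u 1 glvm_vg 100 hdisk4   # superstrict policy, one PV per copy
mklvcopy app_lv 2 hdisk8                              # place the second mirror copy on the RPV client
syncvg -l app_lv                                      # synchronize the mirror over the XD_data network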

When Node2 becomes the active node, the operation is reversed, as can be seen in Figure 17-2. The remote physical volumes on Node1 are presented through the RPV client as local physical volumes on Node2.

Figure 17-2 Reverse configuration on fallover to remote site

Note: hdisk3 is a physical disk local to node2 only. It must not be a disk that is shared or accessible by all cluster nodes.

Note: As with any physical volumes shared between multiple systems, the hdisk numbering might not be consistent; however, the PVID will be.



Figure 17-3 shows the configuration of both nodes with the RPV servers and clients defined. A volume group glvm_vg is defined on both nodes, consisting of two local physical volumes, and the two local representations of the remote node’s physical volumes.

Figure 17-3 RPV client and server configuration on both sites

17.1.2 Configuring Synchronous GLVM with PowerHA/XD

Synchronous mirroring works in such a way that LVM returns control to the application only after writing both the local and remote copies of the data, keeping your disaster recovery site up to date. While having both sites up to date is definitely advantageous, the time that it takes to write to the remote physical volumes can have a large impact on application response time.

Factors to consider with synchronous mirroring

There are two limiting factors when using synchronous mirroring:

� Network bandwidth
� Network latency

To find more information about planning for synchronous GLVM, refer to HACMP/XD for Geographic LVM (GLVM): Planning and Administration Guide.

http://publib.boulder.ibm.com/epubs/pdf/a2313387.pdf
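As a rough first check of the latency component, you can time ICMP round trips over the XD_data network; the address below is the one used in the test configuration later in this chapter, and bandwidth still needs a proper throughput measurement:

ping -c 10 10.10.20.8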


Important: GLVM 5.5 requires at least AIX 6.1.2.0.


In this section we cover how to configure a new PowerHA/XD Synchronous GLVM cluster, including PowerHA/XD sites, networks, and resource groups:

� Setting up the cluster:

– Installing the filesets
– Creating the base cluster
– Creating a resource group

� Configuring and testing Synchronous GLVM:

– Creating RPV servers
– Creating RPV clients
– Creating volume groups
– Creating logical volumes
– Creating file systems (if applicable)
– Creating GLVM copies
– Importing GMVGs to each node
– Testing GMVGs

� Adding GMVGs into resource group(s)

� Starting the cluster

Our two node cluster consists of the following hardware and software:

� POWER6 550s

� AIX 6100-02-01

� PowerHA/XD/GLVM 5.5 SP1

� VIOS 1.5.2.0

� Three IP networks

� Two virtual SCSI disks on each node:

– Each disk is part of its own enhanced concurrent VG.
– The disks are not shared between the cluster nodes across sites.
– There are two GMVGs (glvm1vg, glvm2vg).

Installing the filesets

We begin with the following PowerHA/XD GLVM filesets installed, as shown in Example 17-1.

Note: Basic knowledge of how to install and configure a PowerHA cluster is assumed because it is required in order to configure PowerHA/XD. You can find more information about installing and configuring PowerHA in Chapter 4, “Installation and configuration” on page 199.


Example 17-1 PowerHA/XD/GLVM filesets

root@ glvm1[] lslpp -L|egrep "cluster|glvm"
  cluster.adt.es.client.include
  cluster.adt.es.client.samples.clinfo
  cluster.adt.es.client.samples.clstat
  cluster.adt.es.client.samples.libcl
  cluster.adt.es.java.demo.monitor
  cluster.doc.en_US.glvm.pdf
  cluster.es.client.clcomd   5.5.0.1  A  F  ES Cluster Communication
  cluster.es.client.lib      5.5.0.1  A  F  ES Client Libraries
  cluster.es.client.rte      5.5.0.1  A  F  ES Client Runtime
  cluster.es.client.utils    5.5.0.1  A  F  ES Client Utilities
  cluster.es.client.wsm      5.5.0.1  A  F  Web based Smit
  cluster.es.cspoc.cmds      5.5.0.1  A  F  ES CSPOC Commands
  cluster.es.cspoc.dsh       5.5.0.0  C  F  ES CSPOC dsh
  cluster.es.cspoc.rte       5.5.0.1  A  F  ES CSPOC Runtime Commands
  cluster.es.nfs.rte         5.5.0.0  C  F  ES NFS Support
  cluster.es.server.cfgast   5.5.0.0  C  F  ES Two-Node Configuration
  cluster.es.server.diag     5.5.0.1  A  F  ES Server Diags
  cluster.es.server.events   5.5.0.1  A  F  ES Server Events
  cluster.es.server.rte      5.5.0.1  A  F  ES Base Server Runtime
  cluster.es.server.testtool
  cluster.es.server.utils    5.5.0.1  A  F  ES Server Utilities
  cluster.es.worksheets      5.5.0.1  A  F  Online Planning Worksheets
  cluster.license            5.5.0.0  C  F  HACMP Electronic License
  cluster.man.en_US.es.data  5.5.0.0  C  F  ES Man Pages - U.S. English
  cluster.msg.en_US.glvm     5.5.0.0  C  F  HACMP GLVM Messages - U.S.
  cluster.xd.glvm            5.5.0.1  A  F  HACMP/XD for GLVM RPV Support
  cluster.xd.license         5.5.0.0  C  F  HACMP XD Feature License
  glvm.rpv.client            5.5.0.1  A  F  Remote Physical Volume Client
  glvm.rpv.man.en_US         5.5.0.0  C  F  Geographic LVM Man Pages -
  glvm.rpv.msg.en_US         5.5.0.0  C  F  RPV Messages - U.S. English
  glvm.rpv.server            5.5.0.1  A  F  Remote Physical Volume Server
  glvm.rpv.util              5.5.0.1  A  F  Geographic LVM Utilities

Creating the base cluster

We chose to create our base cluster first, then create the GMVGs and add them into the cluster. We started in Extended Topology Configuration and performed these steps:

1. Add a new cluster (glvmtest).

2. Add two nodes (glvm1,glvm2).

Chapter 17. GLVM concepts and configuration 705

Page 740: PowerHA for AIX Cookbook - hmm.presalesadvisor.com · ibm.com/redbooks PowerHA for AIX Cookbook Shawn Bodily Rosemary Killeen Liviu Rosca Extended case studies with practical disaster

3. Add two sites (USA,UK):

– Assigned node glvm1 to site USA
– Assigned node glvm2 to site UK

4. Add three networks (ether, XD_ip, and XD_data).

5. Add communication interfaces to our networks.

Our cluster topology configuration is shown in Example 17-2. Of course all IP addresses and names should exist in /etc/hosts.

Example 17-2 PowerHA/XD base topology

root@ glvm1[] cltopinfo
Cluster Name: glvmtest
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: md5_aes
Cluster Message Encryption: Enabled
Use Persistent Labels for Communication: No
There are 2 node(s) and 3 network(s) defined
NODE glvm1:
    Network net_XD_data_01
        glvm1_data1   10.10.20.4
    Network net_XD_ip_01
        glvm1_ip1     10.10.30.4
    Network net_ether_01
        glvm1         9.12.7.4
NODE glvm2:
    Network net_XD_data_01
        glvm2_data2   10.10.20.8
    Network net_XD_ip_01
        glvm2_ip2     10.10.30.8
    Network net_ether_01
        glvm2         9.12.7.8

Note: If you are not using IPAT on the XD_data network, as we chose to do so in this test configuration, then you must disable it in the network setting, and configure a node-bound service IP for it.

Note: There is not a diskhb network configured, as there typically are no shared disks between sites.


Creating a resource group

We then created a single resource group (rosie) with policies as shown in Example 17-3.

Example 17-3 PowerHA/XD/GLVM base resource group

Resource Group Name                        rosie
Participating Node Name(s)                 glvm1 glvm2
Startup Policy                             Online On Home Node Only
Fallover Policy                            Fallover To Next Priority Node In The List
Fallback Policy                            Never Fallback
Site Relationship                          Prefer Primary Site

At this point we are ready to begin configuring GLVM. As mentioned previously, we have two separate unique disks on each node to be used for GLVM. Example 17-4 shows our beginning disk configuration from each node.

Example 17-4 GLVM beginning disk configuration

root@ glvm1[] lspv
hdisk0          000fe4012b5361f2    rootvg    active
hdisk1          000fe401d39e5286    none
hdisk2          000fe401d39e663c    none

root@ glvm1[] lsdev -Cc disk
hdisk0 Available Virtual SCSI Disk Drive
hdisk1 Available Virtual SCSI Disk Drive
hdisk2 Available Virtual SCSI Disk Drive

root@ glvm2[] lspv
hdisk0          000fe411fe0f8404    rootvg    active
hdisk1          000fe401d39e7776    none
hdisk2          000fe401d39e856f    none

root@ glvm2[] lsdev -Cc disk
hdisk0 Available Virtual SCSI Disk Drive
hdisk1 Available Virtual SCSI Disk Drive
hdisk2 Available Virtual SCSI Disk Drive

Note: hdisk1 and hdisk2 are local disks on each node; they are not shared disks.


Creating RPV servers
You can create the RPV servers on either side first, because this process will be repeated for each site. We chose to create the RPV servers on the remote node, glvm2. To create the RPV servers, run:

1. smitty glvm_utils → Remote Physical Volume Servers → Add Remote Physical Volume Servers. You will be presented with a pop-up picklist of available disks to use for replication. Highlight each disk and press F7 to select it. You can choose just one and repeat the process, or select multiple disks at the same time. In our case we are choosing both hdisk1 and hdisk2, as shown in Example 17-4; press Enter to proceed to the final menu.


Figure 17-4   RPV server disk picklist

                         Remote Physical Volume Servers

Move cursor to desired item and press Enter.

  Remote Physical Volume Server Site Name Configuration
  List All Remote Physical Volume Servers
  Add Remote Physical Volume Servers
  Change / Show a Remote Physical Volume Server
  Change Multiple Remote Physical Volume Servers
  +--------------------------------------------------------------------------+
  |                        Physical Volume Identifiers                        |
  |                                                                            |
  | Move cursor to desired item and press F7.                                  |
  |     ONE OR MORE items can be selected.                                     |
  | Press Enter AFTER making all selections.                                   |
  |                                                                            |
  |   # Physical Volume        Physical Volume Identifier                      |
  |   # ----------------------------------------------------                   |
  |   > hdisk1                 000fe401d39e7776                                |
  |   > hdisk2                 000fe401d39e856f                                |
  |                                                                            |
  | F1=Help                 F2=Refresh              F3=Cancel                  |
  | F7=Select               F8=Image                F10=Exit                   |
F1| Enter=Do                /=Find                  n=Find Next                |
F9+----------------------------------------------------------------------------+

2. For the Remote Physical Volume Client Internet Address, enter the IP address(es) of the XD_data network interfaces on the remote node; in our case this is node glvm1. We have only one XD_data network, but if you have multiple (up to four), you can enter them all, separated by commas. The easier and preferred method, if the XD_data networks are already defined, is to choose them from a picklist generated by pressing the F4 key. Both options, Configure Automatically at System Restart and Start New Devices Immediately, should be set to no, as shown in Figure 17-5.

Figure 17-5 Add RPV servers menu

3. Upon pressing Enter both RPV servers, rpvserver0 and rpvserver1, will be created and shown in the defined state. Not only will you see this in the SMIT output, but you can also verify it at the command line by using the lsdev -t rpvstype command. You can also verify the attributes of the RPV server you chose by running the lsattr -El rpvserver# command as shown in Example 17-5.

Example 17-5 Display RPV servers state and attributes

root@ glvm2[] lsdev -t rpvstype
rpvserver0 Defined Remote Physical Volume Server
rpvserver1 Defined Remote Physical Volume Server

root@ glvm2[] lsattr -El rpvserver0
auto_online n                                Configure at System Boot   True
client_addr 10.10.20.4                       Client IP Address          True
rpvs_pvid   000fe401d39e77760000000000000000 Physical Volume Identifier True

Note: A picklist using the F4 key will only be generated if the cluster is running. Otherwise you will get the generic error message of “There are no items of this type”.

                      Add Remote Physical Volume Servers

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Physical Volume Identifiers                           000fe401d39e7776
* Remote Physical Volume Client Internet Address       [10.10.20.4 >         +
  Configure Automatically at System Restart?           [no]                  +
  Start New Devices Immediately?                       [no]                  +

F1=Help     F2=Refresh     F3=Cancel     F4=List
F5=Reset    F6=Command     F7=Edit       F8=Image
F9=Shell    F10=Exit       Enter=Do


These steps should now be repeated on the local node, glvm1. Upon completion of creating the RPV servers on both nodes, you are now ready to create the RPV clients.
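For reference, the same RPV server definitions can also be created without SMIT. The following is only a sketch, assuming that the SMIT panel drives a mkdev call using the rpvserver class and rpvstype device type reported by lsdev, and the attribute names shown by lsattr -El rpvserver0 (rpvs_pvid, client_addr, auto_online); confirm the class and subclass values on your system, for example with lsdev -t rpvstype -F "class subclass type", before relying on it.

## On glvm2: one RPV server per local disk to be replicated to glvm1
## (PVIDs as shown by lspv on glvm2; client_addr is glvm1's XD_data address)
# mkdev -c rpvserver -s rpvserver -t rpvstype \
      -a rpvs_pvid=000fe401d39e7776 -a client_addr=10.10.20.4 -a auto_online=n
# mkdev -c rpvserver -s rpvserver -t rpvstype \
      -a rpvs_pvid=000fe401d39e856f -a client_addr=10.10.20.4 -a auto_online=n
## Verify the result
# lsdev -t rpvstype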

Creating RPV clients
To create RPV clients on the local node, glvm1, the RPV servers must be in the available state on the remote node, glvm2.

On glvm2, run:

• smitty rpvserver → Configure Defined Remote Physical Volume Servers and press Enter. Using F7, choose all RPV servers and press Enter.

On the local node, glvm1 in our scenario, run:

• smitty rpvclient → Add Remote Physical Volume Clients; you will see the menu shown in Figure 17-6.

Figure 17-6   Add RPV client first menu

                      Add Remote Physical Volume Clients

Type or select a value for the entry field.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Remote Physical Volume Server Internet Address       [10.10.20.8]          +

F1=Help     F2=Refresh     F3=Cancel     F4=List
F5=Reset    F6=Command     F7=Edit       F8=Image
F9=Shell    F10=Exit       Enter=Do

4. Again, because the XD_data network has already been defined and the cluster is running, you can press F4 to choose the address from the picklist. Upon pressing Enter, you will be presented with another picklist of available IP addresses for the local node. While you can choose any of them, we generally recommend choosing the XD_data address and again pressing Enter. A list of Remote Physical Volume Server Disks mapping to the RPV servers will be displayed. Using F7, select all disks, which in our case are both hdisk1 and hdisk2.


You will then be presented with the final RPV client menu as shown in Figure 17-7.

Figure 17-7   Add RPV clients to local node

                      Add Remote Physical Volume Clients

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Remote Physical Volume Server Internet Address        10.10.20.8
  Remote Physical Volume Local Internet Address         10.10.20.4
  Physical Volume Identifiers                           000fe401d39e777600000>
  I/O Timeout Interval (Seconds)                       [180]                 #
  Start New Devices Immediately?                       [no]                  +

F1=Help     F2=Refresh     F3=Cancel     F4=List
F5=Reset    F6=Command     F7=Edit       F8=Image
F9=Shell    F10=Exit       Enter=Do

5. For the option Start New Devices Immediately, choose no and press Enter. This creates both of the RPV clients (hdisk3 and hdisk4) and puts them in the defined state. The state is displayed in the SMIT output, but you can also verify it at the command line by using the lsdev -t rpvclient command. You can also verify the attributes of the RPV clients by using the lsattr -El hdisk# command as shown in Example 17-6.

Example 17-6   Display RPV client state and attributes

root@ glvm1[] lsdev -t rpvclient
hdisk3 Defined Remote Physical Volume Client
hdisk4 Defined Remote Physical Volume Client

root@ glvm1[] lsattr -El hdisk3
io_timeout   360                              I/O Timeout Interval          True
local_addr   10.10.20.4                       Local IP Address (Network 1)  True
local_addr2  none                             Local IP Address (Network 2)  True
local_addr3  none                             Local IP Address (Network 3)  True
local_addr4  none                             Local IP Address (Network 4)  True
pvid         000fe401d39e77760000000000000000 Physical Volume Identifier    True
server_addr  10.10.20.8                       Server IP Address (Network 1) True
server_addr2 none                             Server IP Address (Network 2) True
server_addr3 none                             Server IP Address (Network 3) True
server_addr4 none                             Server IP Address (Network 4) True

root@ glvm1[] lsattr -El hdisk4
io_timeout   360                              I/O Timeout Interval          True
local_addr   10.10.20.4                       Local IP Address (Network 1)  True
local_addr2  none                             Local IP Address (Network 2)  True
local_addr3  none                             Local IP Address (Network 3)  True
local_addr4  none                             Local IP Address (Network 4)  True
pvid         000fe401d39e856f0000000000000000 Physical Volume Identifier    True
server_addr  10.10.20.8                       Server IP Address (Network 1) True
server_addr2 none                             Server IP Address (Network 2) True
server_addr3 none                             Server IP Address (Network 3) True
server_addr4 none                             Server IP Address (Network 4) True
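When several RPV clients are created at once, it is worth confirming that each new hdisk really maps to the intended remote disk and server address. The following is a quick sketch based on the lsattr output above, pulling out only the attributes that matter:

## Show just the PVID and server address of each RPV client on glvm1
# lsattr -El hdisk3 -a pvid -a server_addr
# lsattr -El hdisk4 -a pvid -a server_addr
## Compare the PVIDs with the lspv output on glvm2 (hdisk1 and hdisk2)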

In order to create RPV clients on the remote node, glvm2, the RPV servers must be placed in the defined state on glvm2 and made available on the local node, glvm1.

On the remote node, glvm2, run:

• smitty rpvserver → Remove Remote Physical Volume Servers and, using F7, choose all RPV servers from the picklist and press Enter. You will then be presented with the final SMIT menu as shown in Figure 17-8. Press Enter twice to complete the execution.

Figure 17-8 Change RPV servers into defined state

                      Remove Remote Physical Volume Servers

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Remote Physical Volume Servers                        rpvserver0 rpvserver1
  Keep definitions in database?                        [yes]                 +

F1=Help     F2=Refresh     F3=Cancel     F4=List
F5=Reset    F6=Command     F7=Edit       F8=Image
F9=Shell    F10=Exit       Enter=Do

Important: Be sure to leave the option Keep definitions in database set to the default of yes. Otherwise the RPV servers will be deleted completely.


On the local node, glvm1, the RPV servers need to be changed to available. Execute:

• smitty rpvserver → Configure Defined Remote Physical Volume Servers and press Enter. Using F7, choose all RPV servers and press Enter.

Again, this can be verified at the command line by running the lsdev -t rpvstype command as shown in Example 17-7.

Example 17-7 RPV servers available

root@ glvm1[] lsdev -t rpvstype
rpvserver0 Available Remote Physical Volume Server
rpvserver1 Available Remote Physical Volume Server
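These Defined/Available transitions are ordinary AIX device state changes, so they can also be driven from the command line. This sketch uses the standard rmdev -l command (which keeps the ODM definition, matching the Keep definitions in database? = yes option) and mkdev -l with our device names:

## On glvm2: put the RPV servers into the Defined state (definitions are kept)
# rmdev -l rpvserver0
# rmdev -l rpvserver1

## On glvm1: activate the defined RPV servers
# mkdev -l rpvserver0
# mkdev -l rpvserver1
# lsdev -t rpvstype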

On the remote node, glvm2, to add the RPV clients run:

1. smitty rpvclient → Add Remote Physical Volume Clients and select the RPV server address by pressing F4. Upon pressing Enter, you will be presented with another picklist of available IP addresses for the local node. While you can choose any of them, we generally recommend choosing the XD_data address and again pressing Enter.

2. A list of Remote Physical Volume Server Disks mapping to the RPV servers will be displayed. Using F7, select all disks, which in our case are both hdisk1 and hdisk2. You will then be presented with the final RPV client menu as shown in Figure 17-9.

Note: A picklist using the F4 key will only be generated if the cluster is running. Otherwise you will get the generic error message of “There are no items of this type”.


Figure 17-9   Add RPV clients to remote node

                      Add Remote Physical Volume Clients

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Remote Physical Volume Server Internet Address        10.10.20.4
  Remote Physical Volume Local Internet Address         10.10.20.8
  Physical Volume Identifiers                           000fe401d39e538600000>
  I/O Timeout Interval (Seconds)                       [180]                 #
  Start New Devices Immediately?                       [no]                  +

F1=Help     F2=Refresh     F3=Cancel     F4=List
F5=Reset    F6=Command     F7=Edit       F8=Image
F9=Shell    F10=Exit       Enter=Do

As you can see, the RPV server and RPV local addresses are the opposite of what they were when adding them to the local node, glvm1. For the option Start New Devices Immediately, choose no and press Enter. This creates all of the RPV clients, hdisk3 and hdisk4 in our case, and puts them in the defined state. The state is displayed in the SMIT output, but you can also verify it at the command line by running the lsdev -t rpvclient command. You can also verify the attributes of the RPV clients by running the lsattr -El hdisk# command.

The next step is to create volume groups on our disks on the local node, glvm1. To do so, the RPV clients must be available on the local node. Currently the RPV clients are defined on both systems, and the RPV servers are available on the local node. The RPV servers must be changed to defined on glvm1 and then made available on the remote node, glvm2. Then make the RPV clients on glvm1 available, making the remote disks available on the local node in order to create the volume group. Perform the following steps:

1. On the local node, glvm1, run smitty rpvserver → Remove Remote Physical Volume Servers and, using F7, choose all RPV servers from the picklist and press Enter. You will then be presented with the final SMIT menu as shown in Figure 17-8 on page 712. Press Enter twice to complete the execution.

2. On the remote node, glvm2, run smitty rpvserver → Configure Defined Remote Physical Volume Servers and press Enter. Using F7, choose all RPV servers and press Enter to make the RPV servers available.


3. On the local node, glvm1, run smitty rpvclient → Configure Defined Remote Physical Volume Clients and use F7 to choose all RPV clients, hdisk3 and hdisk4 in our case, and press Enter to make them available. They are now ready to be used for volume group creation and are visible to both the lspv and lsdev -t rpvclient commands as shown in Example 17-8.

Example 17-8 RPV clients available on local node for vg creation

root@ glvm1[] lspv
hdisk0  000fe4012b5361f2  rootvg  active
hdisk1  000fe401d39e5286  none
hdisk2  000fe401d39e663c  none
hdisk3  000fe401d39e7776  none
hdisk4  000fe401d39e856f  none

root@ glvm1[] lsdev -t rpvclient
hdisk3 Available Remote Physical Volume Client
hdisk4 Available Remote Physical Volume Client

For the following LVM steps, the RPV servers will stay available on glvm2 and the RPV clients will stay available on glvm1 until the volume group information is imported onto the remote node, as discussed later in this section. At that time, the roles will be swapped again.

Creating volume groups
To create the volume group, we use the regular SMIT LVM menus. Begin by running smitty mkvg and choose your preferred volume group type. We chose to create a scalable volume group, mainly for currency; it is also required later when we convert to asynchronous GLVM. We also decided to create two separate volume groups, glvm1vg and glvm2vg.

For each volume group, be sure to select both a local disk (hdisk1 or hdisk2) and an RPV client disk (hdisk3 or hdisk4). As shown in Figure 17-10 on page 716, we created glvm1vg with both hdisk1 and hdisk3. Ensure that Activate volume group AUTOMATICALLY at system restart? is set to no. We also chose to use a concurrent capable volume group because we are using virtual SCSI devices. After pressing Enter, the volume group is created. We repeated this step to create glvm2vg using hdisk2 and hdisk4.

Important: Enhanced concurrent volume groups are currently supported for synchronous GLVM but not for asynchronous. However, normal benefits of enhanced concurrent, such as fast disk takeover, currently do not apply.


Figure 17-10 Create volume group

Though not a GLVM requirement, it is a good common practice to have the same volume group major numbers on each node. You can run lvlstmajor once on each node to find out the next free major numbers. Find common free major numbers and use one for each volume group. In our scenario we used 34 for glvm1vg and 35 for glvm2vg. Make note of the major numbers because they will be used again later when importing the volume groups to the remote node, glvm2.
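As a sketch of that check (output omitted here, because the free ranges will differ on your systems), run lvlstmajor on each node and pick values that are free on both; the actual major number assigned to a volume group can be confirmed later from its device file:

root@ glvm1[] lvlstmajor
root@ glvm2[] lvlstmajor
## choose major numbers free on both nodes; we used 34 (glvm1vg) and 35 (glvm2vg)
# ls -l /dev/glvm1vg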

Because we chose to use enhanced concurrent volume groups, they must be varied on manually before creating logical volumes and file systems. Vary them on now by running:

# varyonvg glvm1vg
# varyonvg glvm2vg

The volume groups also need to have quorum disabled. This can be set at the command line by running the chvg -Q n vgname command as we have shown here:

#chvg -Q n glvm1vg
#chvg -Q n glvm2vg

                         Add a Scalable Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                   [Entry Fields]
  VOLUME GROUP name                               [glvm1vg]
  Physical partition SIZE in megabytes             32                   +
* PHYSICAL VOLUME names                           [hdisk1 hdisk3]       +
  Force the creation of a volume group?            yes                  +
  Activate volume group AUTOMATICALLY              no                   +
    at system restart?
  Volume Group MAJOR NUMBER                       [34]                  +#
  Create VG Concurrent Capable?                    enhanced concurrent  +
  Max PPs per VG in units of 1024                  32                   +
  Max Logical Volumes                              256                  +
  Enable Strict Mirror Pools                       No                   +

F1=Help     F2=Refresh     F3=Cancel     F4=List
F5=Reset    F6=Command     F7=Edit       F8=Image
F9=Shell    F10=Exit       Enter=Do


This can also be changed in SMIT by running smitty chvg, pressing F4, choosing the appropriate volume group, and pressing Enter. Make sure that A QUORUM of disks required to keep the volume group on-line? is set to no as shown in Figure 17-11. Press Enter to complete the change, and repeat on each volume group as needed.

Figure 17-11 Change volume group to disable quorum

Creating logical volumes
At this point the logical volumes can be created:

1. Execute smitty mklv and choose a volume group created in the previous steps and press Enter.

2. When creating the logical volume, specify a unique name that can be used across the cluster, use only the local disk(s), and create only one copy. For the option Allocate each logical partition copy on a SEPARATE physical volume?, choose superstrict, as shown in Figure 17-12.

                            Change a Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                   [Entry Fields]
* VOLUME GROUP name                                glvm1vg
* Activate volume group AUTOMATICALLY              no                   +
    at system restart?
* A QUORUM of disks required to keep the volume    no                   +
    group on-line ?
  Convert this VG to Concurrent Capable?           no                   +
  Change to big VG format?                         no                   +
  Change to scalable VG format?                    no                   +
  LTG Size in kbytes                               128                  +
  Set hotspare characteristics                     n                    +
  Set synchronization characteristics of stale     n                    +
    partitions
  Max PPs per VG in units of 1024                  32                   +
  Max Logical Volumes                              256                  +
  Mirror Pool Strictness                                                +


Figure 17-12 Add logical volumes

3. Repeat this step for any logical volumes and each volume group as needed. We created one logical volume for each volume group: glvm1lv and glvm2lv for volume groups glvm1vg and glvm2vg respectively.

4. If using file systems, it is also recommended to manually create the jfslog LV for each volume group in order to specify a unique name to each. We created testlog and testlog2. Upon creation, initialize the jfslog LV by running:

#logform /dev/<loglvname>
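For completeness, here is a command-line sketch of the same logical volume creation using our names (mklv flags: -y name, -t type, -c copies, -u maximum number of physical volumes for allocation, -s s for superstrict allocation); the sizes are from our example and should be adapted to your environment:

## Data LV: one copy, local disk only, superstrict allocation, 256 LPs
# mklv -y glvm1lv -t jfs2 -c 1 -u 1 -s s glvm1vg 256 hdisk1
## JFS2 log LV with a unique name, then initialize it
# mklv -y testlog -t jfs2log -c 1 -u 1 -s s glvm1vg 1 hdisk1
# logform /dev/testlog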

Creating file systems (if applicable)
Now the file systems need to be created on the logical volumes that were just created in the previous step. This can be done by running smitty crfs and choosing the appropriate JFS type. We are going to use JFS2 in our example.

                            Add a Logical Volume

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                              [Entry Fields]
  Logical volume NAME                             [glvm1lv]
* VOLUME GROUP name                                glvm1vg
* Number of LOGICAL PARTITIONS                    [256]                 #
  PHYSICAL VOLUME names                           [hdisk1]              +
  Logical volume TYPE                             [jfs2]                +
  POSITION on physical volume                      middle               +
  RANGE of physical volumes                        minimum              +
  MAXIMUM NUMBER of PHYSICAL VOLUMES              [1]                   #
    to use for allocation
  Number of COPIES of each logical                 1                    +
    partition
  Mirror Write Consistency?                        active               +
  Allocate each logical partition copy             superstrict          +
    on a SEPARATE physical volume?
  RELOCATE the logical volume during               yes                  +
    reorganization?
  Logical volume LABEL                            []
  MAXIMUM NUMBER of LOGICAL PARTITIONS            [512]                 #

Note: If using JFS2 and inline logs, manually creating the jfslog LV and initializing it does not apply.


We ran the following commands:

1. smitty crfs → Add an Enhanced Journaled File System → Add an Enhanced Journaled File System on a Previously Defined Logical Volume.

2. Then, on the Logical Volume name, press F4 and choose the appropriate logical volume. In our case we are using glvm1lv and are creating a file system with a mount point of /glvm1jfs2. Also ensure that the option Mount AUTOMATICALLY at system restart? is set to no. Our example is shown in Figure 17-13.

Figure 17-13 Adding a file system

Repeat this step for each file system needed. We repeated this step for glvm2lv and created another file system with a mount point of /glvm2jfs2.
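The same file systems can also be created from the command line; the following is a sketch for our logical volumes (crfs flags: -v file system type, -d existing logical volume, -m mount point, -A no so the file system is not mounted automatically at restart). Check with lsfs afterward that the intended jfslog is being used.

## Create JFS2 file systems on the previously defined logical volumes
# crfs -v jfs2 -d glvm1lv -m /glvm1jfs2 -A no
# crfs -v jfs2 -d glvm2lv -m /glvm2jfs2 -A no
# lsfs /glvm1jfs2 /glvm2jfs2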

Creating GLVM copies
To create the GLVM copies, the volume groups must be varied on. We begin with both volume groups, glvm1vg and glvm2vg, online. All four disks, two local and two RPV clients, are available and active. Each logical volume previously created only has one copy, as shown in Example 17-9.

                    Add an Enhanced Journaled File System

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                   [Entry Fields]
* LOGICAL VOLUME name                              glvm1lv              +
* MOUNT POINT                                     [/glvm1jfs2]
  Mount AUTOMATICALLY at system restart?           no                   +
  PERMISSIONS                                      read/write           +
  Mount OPTIONS                                   []                    +
  Block Size (bytes)                               4096                 +
  Logical Volume for Log                                                +
  Inline Log size (MBytes)                        []                    #
  Extended Attribute Format                                             +
  ENABLE Quota Management?                         no                   +
  Enable EFS?                                      no                   +
  Allow internal snapshots?                        no                   +
  Mount GROUP                                     []

F1=Help     F2=Refresh     F3=Cancel     F4=List
F5=Reset    F6=Command     F7=Edit       F8=Image
F9=Shell    F10=Exit       Enter=Do


Example 17-9 Local volume groups active before creating GLVM copies

root@ glvm1[] lspv
hdisk0  000fe4012b5361f2  rootvg   active
hdisk1  000fe401d39e5286  glvm1vg  active
hdisk2  000fe401d39e663c  glvm2vg  active
hdisk3  000fe401d39e7776  glvm1vg  active
hdisk4  000fe401d39e856f  glvm2vg  active

root@ glvm1[] lsvg -l glvm1vg
glvm1vg:
LV NAME   TYPE     LPs  PPs  PVs  LV STATE       MOUNT POINT
glvm1lv   jfs2     256  256  1    closed/syncd   /glvm1jfs2
testlog   jfs2log  1    1    1    closed/syncd   N/A

root@ glvm1[] lsvg -l glvm2vg
glvm2vg:
LV NAME   TYPE     LPs  PPs  PVs  LV STATE       MOUNT POINT
glvm2lv   jfs2     256  256  1    closed/syncd   /glvm2jfs2
testlog2  jfs2log  1    1    1    closed/syncd   N/A

Creating the GLVM copy involves using smitty glvm_utils. Execute the following steps:

1. smitty glvm_utils → Geographically Mirrored Logical Volumes → Add a Remote Site Mirror Copy to a Logical Volume

2. Choose the appropriate logical volume, glvm1lv in our case, and press Enter. Choose the remote site, UK in our case, and press Enter. Then choose one (or more) remote physical volumes, in our case hdisk3 and press Enter.

3. You will then see the final menu shown in Figure 17-14. The NEW TOTAL number of logical partition copies should automatically increment by one. Make sure the option Allocate each logical partition copy on a SEPARATE physical volume? is set to superstrict, as it should be from the previous logical volume creation.


Figure 17-14 Add remote site mirror copy to a logical volume

Repeat this step for each logical volume created previously. In our case we repeated it for glvm2lv and for our jfslog logical volumes, testlog and testlog2.
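Because an RPV client looks to the LVM like an ordinary hdisk, the remote mirror copy can also be added with standard LVM commands. The SMIT GLVM path above is the documented method, so treat the following only as a sketch under that assumption, using our LV and disk names:

## Add a second copy of each LV on the RPV client disk (remote mirror),
## then synchronize the new copies
# mklvcopy glvm1lv 2 hdisk3
# mklvcopy testlog 2 hdisk3
# syncvg -v glvm1vg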

Importing GMVGs to each node
To this point the RPV servers have been available on the remote node, glvm2, and the RPV clients available on the local node, glvm1. To import the GMVGs to the remote node, glvm2, the local disks need to be available to the remote node. This is done as follows:

1. Varyoff the volume groups on the local node, glvm1.

2. Make the RPV clients defined on the local node, glvm1.

3. Make the RPV servers defined on the remote node, glvm2.

4. Make the RPV servers available on the local node, glvm1.

5. Make the RPV clients available on the remote note, glvm2.

On the local node, glvm1, run:

1. varyoffvg glvm1vg

              Add a Remote Site Mirror Copy to a Logical Volume

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                       [Entry Fields]
* LOGICAL VOLUME name                                  glvm1lv
* NEW TOTAL number of logical partition copies         2                +
* REMOTE PHYSICAL VOLUME Name                          hdisk3
  POSITION on physical volume                          outer_middle     +
  RANGE of physical volumes                            minimum          +
  Allocate each logical partition copy on a SEPARATE   superstrict
    physical volume?
  SYNCHRONIZE the data in the new logical partition    no               +
    copies?

F1=Help     F2=Refresh     F3=Cancel     F4=List
F5=Reset    F6=Command     F7=Edit       F8=Image
F9=Shell    F10=Exit       Enter=Do

Note: GMVGs cannot be configured through C-SPOC. Create them on one node, then vary off and import them on the other node, as shown in the next section.


2. varyoffvg glvm2vg

3. smitty rpvclient → Remove Remote Physical Volume Clients and, using F7, choose all RPV clients, hdisk3 and hdisk4 in our case, from the picklist and press Enter. Ensure that the option Keep definitions in database? is set to yes, then press Enter twice to complete the execution. This will put the RPV clients in the defined state.

On the remote node, glvm2, run:

1. smitty rpvserver → Remove Remote Physical Volume Servers and press Enter.

2. Using F7 choose all RPV servers and press Enter. Ensure that the option Keep definitions in database? is set to yes then press Enter twice to complete the execution. This will put the RPV servers in the defined state.

On the local node, glvm1, run:

1. smitty rpvserver → Configure Defined Remote Physical Volume Servers and press Enter.

2. Using F7, choose all RPV servers and press Enter to make the RPV servers available.

On the remote node, glvm2, run:

1. smitty rpvclient → Configure Defined Remote Physical Volume Clients

2. Use F7 to choose all RPV clients, hdisk3 and hdisk4 in our case, and press Enter to make them available.

We verified the state of our RPV clients, RPV servers, and disks as shown in Example 17-10.

Example 17-10 RPV server, RPV client verification before importing volume group

##Node GLVM1
root@ glvm1[] lsdev -t rpvstype
rpvserver0 Available Remote Physical Volume Server
rpvserver1 Available Remote Physical Volume Server

root@ glvm1[] lsdev -t rpvclient
hdisk3 Defined Remote Physical Volume Client
hdisk4 Defined Remote Physical Volume Client

root@ glvm1[] lspv
hdisk0  000fe4012b5361f2  rootvg   active
hdisk1  000fe401d39e5286  glvm1vg
hdisk2  000fe401d39e663c  glvm2vg

##Node GLVM2
root@ glvm2[] lsdev -t rpvstype
rpvserver0 Defined Remote Physical Volume Server
rpvserver1 Defined Remote Physical Volume Server

root@ glvm2[] lsdev -t rpvclient
hdisk3 Available Remote Physical Volume Client
hdisk4 Available Remote Physical Volume Client

root@ glvm2[] lspv
hdisk0  000fe411fe0f8404  rootvg   active
hdisk1  000fe401d39e7776  None
hdisk2  000fe401d39e856f  None
hdisk3  000fe401d39e5286  None
hdisk4  000fe401d39e663c  None

With all the disks now available on the remote node, glvm2, the volume groups can be imported. The volume groups will be imported with the same major numbers used when they were originally created.

1. On the remote node, glvm2, run:

#importvg -y glvm1vg -V 34 hdisk1
#importvg -y glvm2vg -V 35 hdisk2

2. In order to verify that the logical volumes and file systems have been imported, we varyon the volume groups by running:

#varyonvg glvm1vg
#varyonvg glvm2vg

3. To verify, we performed the following actions as shown in Example 17-11.

Example 17-11 GMVG verification on remote node

root@ glvm2[] lspv
hdisk0  000fe411fe0f8404  rootvg   active
hdisk1  000fe401d39e7776  glvm1vg  active
hdisk2  000fe401d39e856f  glvm2vg  active
hdisk3  000fe401d39e5286  glvm1vg  active
hdisk4  000fe401d39e663c  glvm2vg  active

root@ glvm2[] lsvg -l glvm1vg
glvm1vg:
LV NAME   TYPE     LPs  PPs  PVs  LV STATE       MOUNT POINT
glvm1lv   jfs2     256  512  2    closed/syncd   /glvm1jfs2
testlog   jfs2log  1    2    2    closed/syncd   N/A

root@ glvm2[] lsvg -l glvm2vg
glvm2vg:
LV NAME   TYPE     LPs  PPs  PVs  LV STATE       MOUNT POINT
glvm2lv   jfs2     256  512  2    closed/syncd   /glvm2jfs2
testlog2  jfs2log  1    2    2    closed/syncd   N/A

4. Upon satisfactory verification (you might also want to check /etc/filesystems and/or run lsfs), the volume groups should be varied off, and the RPV clients on the local node, glvm1, and the RPV servers on the remote node, glvm2, returned to the defined state. This is done by running:

#varyoffvg glvm1vg
#varyoffvg glvm2vg

On the remote node, glvm2, run:

1. smitty rpvclient → Remove Remote Physical Volume Clients and press Enter.

2. Using F7, choose all RPV clients and press Enter.

3. Ensure that the option Keep definitions in database? is set to yes, then press Enter twice to complete the execution. This will put the RPV clients in the defined state.

On the local node, glvm1, run:

1. smitty rpvserver → Remove Remote Physical Volume Servers and press Enter.

2. Using F7 choose all RPV servers and press Enter.

3. Ensure that the option Keep definitions in database? is set to yes, then press Enter twice to complete the execution. This will put the RPV servers in the defined state.

Testing GMVGs
To test the GMVGs, the RPV servers must be available on the remote node, glvm2, and the RPV clients available on the local node, glvm1.

On glvm2, run:

1. smitty rpvserver → Configure Defined Remote Physical Volume Servers and press Enter.

2. Using F7 choose all RPV servers and press Enter.

On glvm1 run:

1. smitty rpvclient → Configure Defined Remote Physical Volume Clients

2. Use F7 to choose all RPV clients, hdisk3 and hdisk4 in our case, and press Enter to make them available.


Verify both the disks and RPV clients are available as shown in Example 17-8 on page 715. The one obvious difference is that the volume groups should now be seen in the lspv output.

1. Varyon the volume groups by running:

#varyonvg glvm1vg
#varyonvg glvm2vg

2. Mount the file systems by running:

#mount /glvm1jfs2
#mount /glvm2jfs2

3. Verify the volume groups are active and the file systems are mounted on the local node, glvm1, as shown in Example 17-12.

Example 17-12 Verify GMVG active and file systems mounted

root@ glvm1[] lspv
hdisk0  000fe4012b5361f2  rootvg   active
hdisk1  000fe401d39e5286  glvm1vg  active
hdisk2  000fe401d39e663c  glvm2vg  active
hdisk3  000fe401d39e7776  glvm1vg  active
hdisk4  000fe401d39e856f  glvm2vg  active

root@ glvm1[] lsvg -o
glvm2vg
glvm1vg
rootvg

root@ glvm1[] df
Filesystem       512-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4             524288    162896   69%     5649    23% /
/dev/hd2            6291456   1633264   75%    37322    17% /usr
/dev/hd9var         1048576    315680   70%     4821    12% /var
/dev/hd3           21495808  19215760   11%     1082     1% /tmp
/dev/hd1             524288    523520    1%        5     1% /home
/proc                     -         -    -         -     -  /proc
/dev/hd10opt         524288    353648   33%     1724     5% /opt
/dev/hd11admin       524288    523520    1%        5     1% /admin
/dev/livedump        524288    523552    1%        4     1% /var/adm/ras/livedump
/dev/glvm1lv        2097152   1811216   14%     4469     3% /glvm1jfs2
/dev/glvm2lv        2097152   1907784   10%     4617     3% /glvm2jfs2

To test, you can write or copy data to the GMVG file systems. We simply copied data from /var and touched files. We also created a script to copy this data, to use later as an application server with PowerHA. While writing the data, verify with the iostat command that all disks are indeed being written to.
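As a simple sketch of such a test (the file name and sizes are arbitrary), you can generate write activity in one of the mirrored file systems and watch both the local disk and its RPV client with iostat:

## Write 100 MB of test data into the geographically mirrored file system
# dd if=/dev/zero of=/glvm1jfs2/iotest.dat bs=1024k count=100
## Both the local disk (hdisk1) and the RPV client (hdisk3) should show writes
# iostat -d hdisk1 hdisk3 2 10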


This is a good time to become familiar with the GLVM status monitors of rpvstat and gmvgstat. As their names imply, these commands give status related information about RPVs and GMVGs respectively as shown in Example 17-13.

Example 17-13 Gmvgstat and rpvstat output

root@ glvm1[] gmvgstat
GMVG Name        PVs RPVs Tot Vols  St Vols  Total PPs  Stale PPs Sync
--------------- ---- ---- -------- -------- ---------- ---------- ----
glvm1vg            1    1        2        0       2523          0 100%
glvm2vg            1    1        2        0       2524          0 100%

root@ glvm1[] rpvstat

Remote Physical Volume Statistics:

                      Comp Reads  Comp Writes  Comp KBRead  Comp KBWrite  Errors
RPV Client         cx Pend Reads  Pend Writes  Pend KBRead  Pend KBWrite
------------------ -- ----------- ------------ ------------ ------------ ------
hdisk4              1          43           74          773         1012      0
                                0            0            0            0
hdisk3              1          49           64          776          256      0
                                0            0            0            0

For more details on these commands, refer to their respective man pages. For more information about testing and troubleshooting GLVM consult the following:

• Using the Geographic LVM in AIX 5L white paper:

http://www-03.ibm.com/systems/power/software/aix/whitepapers/aix_glvm.html

• HACMP/XD Geographic LVM: Planning and Administration Guide:

http://publib.boulder.ibm.com/epubs/pdf/a2313387.pdf

After verifying that the GMVGs are indeed in sync, we can now proceed to add our GMVGs into our PowerHA/XD cluster configuration.

Adding GMVGs into resource group(s)
The RPV clients, RPV servers, and GMVGs must be configured on all nodes before they can be defined as part of a PowerHA/XD resource group. There must be an RPV server for every local disk and a local RPV client for every remote disk that belongs to a GMVG.

Note: The gmvgstat and rpvstat commands were introduced with PowerHA/XD GLVM 5.4.1


Table 17-1 shows the labels for the configuration in Figure 17-3 on page 703.

Table 17-1   RPV names used in our environment

Sites/nodes    Local disks    RPV servers    RPV clients
USA / GLVM1    hdisk1         rpvserver0     hdisk3 on UK
               hdisk2         rpvserver1     hdisk4 on UK
UK / GLVM2     hdisk1         rpvserver0     hdisk3 on USA
               hdisk2         rpvserver1     hdisk4 on USA

It is possible to access the GMVG from either side as long as the appropriate clients and servers are available. That is, for the GMVG to be configured on node glvm1, then the RPV clients must be available on glvm1 and the RPV servers available on node glvm2.

After the GMVGs have been configured, they just need to be added to the resource group, as PowerHA/XD will recognize them correctly and call the correct events for their processing.

To add the GMVGs into PowerHA/XD control we need to perform/verify the following steps:

1. Unmount the file systems on the local node, glvm1.

2. Varyoff the volume groups on glvm1.

3. Make the RPV clients defined on glvm1.

4. Make the RPV servers defined on the remote node, glvm2.

5. Perform discovery process within PowerHA/XD.

6. Add GMVGs into resource group(s).

7. Synchronize cluster definition.

8. Unmount the file systems by running:

#umount /glvm1jfs2
#umount /glvm2jfs2

9. Vary off the volume groups by running:

#varyoffvg glvm1vg
#varyoffvg glvm2vg

On the local node, glvm1, run:

1. smitty rpvclient → Remove Remote Physical Volume Clients and press Enter.

2. Using F7, choose all RPV clients and press Enter.



3. Ensure that the option Keep definitions in database? is set to yes then press Enter twice to complete the execution. This will put the RPV clients in the defined state.

On the remote node, glvm2, run:

1. smitty rpvserver → Remove Remote Physical Volume Servers and press Enter.

2. Using F7 choose all RPV servers and press Enter.

3. Ensure that the option Keep definitions in database? is set to yes then press Enter twice to complete the execution. This will put the RPV servers in the defined state.

Initiate the discovery process by running:

• smitty hacmp → Extended Configuration → Discover HACMP-related Information from Configured Nodes, then press Enter.

This will automatically discover the GMVGs and provide picklists when adding the GMVGs into the resource group in the next step as follows:

• smitty hacmp → Extended Configuration → Extended Resource Configuration → HACMP Extended Resource Group Configuration → Change/Show Resources and Attributes for a Resource Group, then press Enter.

Choose the appropriate resource group, rosie in our case, and on Volume Groups, press F4, select the GMVGs previously created, glvm1vg and glvm2vg in our case. Also for the Use forced varyon for volume groups, if necessary option, use Tab to change it to true and press Enter.

Synchronize the cluster by running:

• smitty hacmp → Extended Configuration → Extended Verification and Synchronization

Our resource group details are shown in Example 17-14.

Example 17-14 GMVG resource group

Resource Group Name                                rosie
Participating Node Name(s)                         glvm1 glvm2
Startup Policy                                     Online On Home Node Only
Fallover Policy                                    Fallover To Next Priority Node In The List
Fallback Policy                                    Never Fallback
Site Relationship                                  Prefer Primary Site
Node Priority
Service IP Label
Filesystems                                        ALL
Filesystems Consistency Check                      fsck
Filesystems Recovery Method                        sequential
Filesystems/Directories to be exported (NFSv3)
Filesystems/Directories to be exported (NFSv4)
Filesystems to be NFS mounted
Network For NFS Mount
Filesystem/Directory for NFSv4 Stable Storage
Volume Groups                                      glvm1vg glvm2vg
Concurrent Volume Groups
Use forced varyon for volume groups, if necessary  true
Disks
GMVG Replicated Resources                          glvm1vg glvm2vg
GMD Replicated Resources
PPRC Replicated Resources
ERCMF Replicated Resources
SVC PPRC Replicated Resources
AIX Connections Services
AIX Fast Connect Services
Shared Tape Resources
Application Servers                                glvmiotest
Highly Available Communication Links
Primary Workload Manager Class
Secondary Workload Manager Class
Delayed Fallback Timer
Miscellaneous Data
Automatically Import Volume Groups                 false
Inactive Takeover
SSA Disk Fencing                                   false
Filesystems mounted before IP configured           false
WPAR Name

Note: We created an application server, glvmiotest, that copied data from one of the rootvg file systems to our GMVG file systems and updated a date file.

Starting the cluster
After the cluster has been configured and synchronized, start PowerHA cluster services on the primary node. Assuming that the remote server is not available, the local RPV clients will not be accessible, so PowerHA will have to force the varyon of the volume group, and all I/O will go to the local physical volumes, resulting in stale partitions on the RPV client volumes as the local physical partitions are modified. See Figure 17-15.

Note: The GMVG Replicated Resources field is automatically populated during synchronization.


Now test the cluster via your preferred method. We recommend performing at least the following steps:

1. Start the cluster on each node and verify that resources are brought online successfully.

2. Stop the cluster on each node and verify that resources are brought offline successfully.

3. Restart the cluster and move resource groups between nodes.

4. Fail network interfaces and verify that they recover.

5. Hard fail the primary node using the reboot -q command.

You can use the cluster test tool to test your site cluster configuration.
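While running these tests, it is useful to watch where the resource group is online and how events are processed; this sketch uses standard PowerHA utilities and logs, with paths as typically installed:

## Show the state and location of the resource group
# /usr/es/sbin/cluster/utilities/clRGinfo rosie
## Follow cluster event processing during fallover and fallback
# tail -f /tmp/hacmp.out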

Figure 17-15 Node active at primary site with backup down

Note: During our testing using enhanced volume groups, we discovered a problem with the volume group not being varied offline when the resource group was brought offline or when moving the resource group. This problem has been resolved in PowerHA 5.5 APAR IZ44984.


When the node at the remote site becomes active, PowerHA will activate the RPV Servers on that node and start the RPV clients on the primary site. PowerHA/XD will run an event which will inform the LVM that the RPV clients are available, so that the stale partitions on the RPV clients are resynchronized by copying data from the local physical volumes to the remote physical volumes. See Figure 17-16.

Figure 17-16 Backup site active and data replicating

If the application falls over to the backup site, or the primary site is taken off line, the reverse process will occur. Those physical volumes on the remote node, which were controlled by the RPV servers, are now the local physical volumes for the volume group, while the RPV clients (which point to the physical volumes on the primary site) will be offline until the primary node becomes available.

Tip: When utilizing the Inter-site Management Policy of Prefer Primary Site, as in our case, the above scenario of missing disks and stale partitions can be avoided by starting the remote node with the RPV servers first.


The volume group will again have to be activated in forced mode. See Figure 17-17.

Figure 17-17 Backup node active with primary site down


When the primary site becomes active, PowerHA/XD can leave the resource group on the backup site; in that case the RPV servers on the primary site will start, bringing the backup server's RPV clients online. The LVM will again process the replication of the data, copying the partitions that are marked stale from the backup's physical volumes through the RPVs to the primary node's physical volumes. See Figure 17-18.

Figure 17-18 Primary node integrated into the cluster and replication started

However, if the resource group is configured to prefer the primary site, processing will stop on the backup site and the resources will be brought online on the primary (production) site. This means that the application will be running on the primary site while the stale data is being synchronized back from the backup site. Any attempt by the application to read a stale partition will result in a read from the RPV server. See Figure 17-16 on page 731.

This question of site preference for the resource group can have important performance repercussions, as the I/O is handled by the LVM and it is largely unaware that the underlying device is a remote physical volume.


For another example, if we configure the local GMVG to be mirrored across two physical volumes and one RPV client (see Figure 17-19), we increase the chances of the data being available locally.

Figure 17-19 Application active on primary with 2 PV and one RPV client


However, if the primary site goes offline, it means that the backup site will be running on a volume group consisting of one local physical volume, and two RPV clients. See Figure 17-20.

Figure 17-20 Primary site down and application using 1 local copy of VG

In this scenario, the site preference of the resource group has a major influence on the performance of the re-synchronization of the data.


If the resource group has no site preference, it will stay online on the secondary site and two copies of the stale data will be sent over the XD_data network; one for each of the RPV servers. See Figure 17-21.

Figure 17-21 Application did not fallback on integration of primary site

However, if the site preference for the resource group is to fall back to the primary site, then the volume group will be activated on the primary site, with one physical volume (the RPV client) up to date, and the two local physical volumes with stale partitions. Thus there will only be one copy of each partition sent over the network to bring the physical volumes into a synchronized state.


Reads from the stale partitions will take longer because they must be made from the remote physical volumes and therefore include the network latency. See Figure 17-22.

Figure 17-22 Application falls over on primary site integration

The I/O path
For normal I/O, the application will pass an I/O request to the LVM, which will pass the request through the disk device driver, through the adapter device driver, and then to the physical volume.

With GLVM, the application will pass the I/O request to the LVM, which will pass the request to the RPV client device driver, which will send the request over the TCP/IP network to the remote RPV server. The remote RPV server will pass the request through the disk device driver on the remote node, through the adapter device driver to the remote physical volume. The response will then return the same way.


Any delay in the network will lead to slower I/O performance, while any error will be returned to the LVM as a physical volume device driver would.

The LVM will see the RPV client as a slower and less reliable physical volume: slower because of the longer I/O path (particularly network latency), and less reliable because long-distance networks that pass through multiple devices have a greater failure rate than local writes.

17.2 Converting from GLVM in synchronous mode to asynchronous mode

This section covers the migration of our existing PowerHA/XD GLVM cluster using synchronous mirroring to use asynchronous mirroring.

Asynchronous mirroring allows the local site to be updated immediately and the remote site to be updated as bandwidth allows. The information is cached and sent later, as network resources become available. While this can greatly improve application response time, there is some risk of data loss.

Factors to consider with asynchronous mirroring
Consider the following factors with asynchronous mirroring:

• Network bandwidth
• Network latency
• Preventing data loss
• Data divergence

To find more information about planning for asynchronous GLVM, refer to HACMP/XD for Geographic LVM (GLVM): Planning and Administration Guide.

http://publib.boulder.ibm.com/epubs/pdf/a2313387.pdf

17.2.1 Migration steps

Our existing cluster is configured as a 2-node cluster named glvmtest. It has a resource group, rosie, which hosts volume group glvm1vg mirrored using GLVM in synchronous mode. We created scalable volume groups (SVGs) because they are the only type of VG that can be used for asynchronous mirroring. The reason for this requirement is that asynchronous GLVM depends on LVM mirror pool functionality, and AIX LVM currently allows mirror pools only on scalable volume groups.

Attention: At the time of writing this book, enhanced concurrent volume groups are not supported with asynchronous GLVM.

The configuration has two sites, USA and UK, with one node per site (glvm1/USA and glvm2/UK).

For the purposes of this example, site USA with node glvm1 is the primary site/node, and site UK with node glvm2 is the secondary site/node.

Cluster services are up and running at this time. However, some tasks in the migration process were disruptive to our volume groups and logical volumes (we had to vary off volume groups and unmount file systems), so you might want to take your cluster offline prior to migration if these tasks are required in your environment. You will need to manipulate the rpvserver and RPV client devices manually if you choose to take the cluster offline.

Convert synchronous mirrored VG to asynchronous mirroring
To convert a synchronous mirrored volume group to asynchronous mirroring, follow these steps.

1. Log in to both nodes in the cluster. Run lsvg -o to find out the node where the volume groups are online. We refer to this node as the primary node.

root@ glvm1[] lsvg -o
glvm1vg
rootvg

2. Run lsvg -p vgname on the primary node to find the RPV client volumes and make sure all the volumes are active. If you do not know which volumes are remote, then use the rpvstat command to find remote volumes in the volume group as seen in Example 17-15. Our local disk is hdisk1 (glvm1vg) and our remote disk is hdisk3 (glvm1vg).

Important: Before implementing asynchronous GLVM, we recommend becoming familiar with LVM Mirror Pools. Refer to the AIX 6.1 TL2 Infocenter documentation for details.

http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/com.ibm.aix.baseadmn/doc/baseadmndita/mirrorpools.htm

It is important to become familiar with LVM Mirror Pools as the Asynchronous functionality in GLVM is dependent upon mirror pools. Later pages in this book often refer to AIX mirror pools.


Example 17-15 Output of lsvg -p checking for device state ‘active’

root@ glvm1[] lsvg -p glvm1vg
glvm1vg:
PV_NAME  PV STATE  TOTAL PPs  FREE PPs  FREE DISTRIBUTION
hdisk1   active    1261       1002      251..00..247..252..252
hdisk3   active    1262       1003      251..00..247..252..253

root@ glvm1[] rpvstat
Remote Physical Volume Statistics:
                      Comp Reads  Comp Writes  Comp KBRead  Comp KBWrite  Errors
RPV Client         cx Pend Reads  Pend Writes  Pend KBRead  Pend KBWrite
------------------ -- ----------- ------------ ------------ ------------ ---
hdisk3              1          77          505          797        62973    0
                                0            0            0            0

3. Assign the local physical volumes in the volume group to a local mirror pool (there is no separate step to create the mirror pool; it is created as a result of the command being run here). Use the smitty chpv fast path. Select the appropriate PV name and assign the siteA mirror pool name. In our example we use two mirror pools per volume group. Example 17-16 shows the mirror pool configuration steps for the local PV in glvm1vg (hdisk1).

Example 17-16 Assign local PVs to mirror pool

                 Change Characteristics of a Physical Volume

Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                                   [Entry Fields]
* Physical volume NAME                             hdisk1
  Allow physical partition ALLOCATION?             yes              +
  Physical volume STATE                            active           +
  Set hotspare characteristics                     n                +
  Set Mirror Pool                                 [USA_glvm1_A]     +
  Change Mirror Pool Name                         []
  Remove From Mirror Pool                                           +

4. Now list the PVs and check the mirror pool information for the local PV using the lsvg -P command. Make sure the mirror pool contains the local physical volumes. Run the command lsvg -P vgname.

root@ glvm1[] lsvg -P glvm1vg
Physical Volume   Mirror Pool
hdisk1            USA_glvm1_A


5. Run smitty chpv fast path to add remote PV to a mirror pool. Perform this action for each remote PV. Select the correct remote PV and assign the siteB mirror pool name. Example 17-17 shows the config for the remote PV of glvm1vg (hdisk3).

Example 17-17 Assign remote PVs to mirror pool

                 Change Characteristics of a Physical Volume

Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                                   [Entry Fields]
* Physical volume NAME                             hdisk3
  Allow physical partition ALLOCATION?             yes              +
  Physical volume STATE                            active           +
  Set hotspare characteristics                     n                +
  Set Mirror Pool                                 [UK_glvm1_B]      +
  Change Mirror Pool Name                         []
  Remove From Mirror Pool                                           +

6. Now list the PVs and check the mirror pool information for the remote PV using the lsvg -P command. Make sure the remote physical volumes are assigned to the correct mirror pool. Run the command lsvg -P vgname.

root@ glvm1[] lsvg -P glvm1vg
Physical Volume   Mirror Pool
hdisk1            USA_glvm1_A
hdisk3            UK_glvm1_B

7. Change volume group attributes to disable auto varyon and turn on super strict mirror pools. Use smitty chvg fast path. Select the volume group. See Example 17-18.

Example 17-18 Change VG attributes - preparation for async mirroring

                            Change a Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                              [Entry Fields]
* VOLUME GROUP name                                glvm1vg
* Activate volume group AUTOMATICALLY              no               +
    at system restart?
* A QUORUM of disks required to keep the volume    no               +
    group on-line ?
  Convert this VG to Concurrent Capable?           no               +
  Change to big VG format?                         no               +
  Change to scalable VG format?                    no               +
  LTG Size in kbytes                               128              +
  Set hotspare characteristics                     n                +
  Set synchronization characteristics of stale     n                +
    partitions
  Max PPs per VG in units of 1024                  32               +
  Max Logical Volumes                              256              +
  Mirror Pool Strictness                           Super            +

Tip: Each volume group should contain two mirror pools. All local disks in the volume group should be added to one mirror pool (local mirror pool) and all remote disks in the volume group added to the other mirror pool (remote mirror pool).

8. Change volume group characteristics to disable bad block relocation using the chvg -b n vgname command.
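Applied to our example volume group, this step and a quick spot check of the result look similar to the following sketch (the grep simply picks the relevant fields out of the lsvg output that Example 17-19 shows in full):

# chvg -b n glvm1vg
# lsvg glvm1vg | egrep "AUTO ON|BB POLICY|MIRROR POOL STRICT"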

9. Now list the volume group properties with the lsvg vgname command. Make sure the values for auto varyon, mirror pool strictness and bad block relocation reflect the changes made in the previous two steps, as seen in Example 17-19.

Example 17-19 lsvg output - preparation for async mirroring

root@ glvm1[] lsvg glvm1vg
VOLUME GROUP:       glvm1vg                  VG IDENTIFIER:  000fe4010000d9000000011ff6a4095e
VG STATE:           active                   PP SIZE:        4 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      2523 (10092 megabytes)
MAX LVs:            256                      FREE PPs:       2005 (8020 megabytes)
LVs:                4                        USED PPs:       518 (2072 megabytes)
OPEN LVs:           2                        QUORUM:         1 (Disabled)
TOTAL PVs:          2                        VG DESCRIPTORS: 4
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         2                        AUTO ON:        no
MAX PPs per VG:     32768                    MAX PVs:        1024
LTG size (Dynamic): 128 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      non-relocatable
MIRROR POOL STRICT: super

10. Turn off bad block relocation for all the logical volumes in the volume group. Use the smitty chlv fast path to change the logical volume attributes. Run this step for each logical volume in each volume group. This operation requires the LV to be closed; unmount the file systems before completing this step. See Example 17-20 for the result of these steps.


Example 17-20 Logical volume status after disabling bad block relocation

root@ glvm1[] lslv glvm1lv
LOGICAL VOLUME:     glvm1lv                VOLUME GROUP:   glvm1vg
LV IDENTIFIER:      000fe4010000d9000000011ff6a4095e.1 PERMISSION:     read/write
VG STATE:           active/complete        LV STATE:       opened/syncd
TYPE:               jfs2                   WRITE VERIFY:   off
MAX LPs:            512                    PP SIZE:        4 megabyte(s)
COPIES:             2                      SCHED POLICY:   parallel
LPs:                256                    PPs:            512
STALE PPs:          0                      BB POLICY:      non-relocatable
INTER-POLICY:       minimum                RELOCATABLE:    yes
INTRA-POLICY:       middle                 UPPER BOUND:    1
MOUNT POINT:        /glvm1jfs2             LABEL:          /glvm1jfs2
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes (superstrict)
Serialize IO ?:     NO
DEVICESUBTYPE :     DS_LVZ
COPY 1 MIRROR POOL: none
COPY 2 MIRROR POOL: None
COPY 3 MIRROR POOL: None

11. Assign the mirror pool to be used for each copy of each logical volume. Perform this step for each logical volume in the volume group. After the chlv command completes successfully, use lslv lvname to make sure each copy has a mirror pool assigned.

The command to assign LV copies to mirror pools is shown here:

# chlv -m copy1=siteA_pool -m copy2=siteB_pool <lv name>
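Using the logical volume and mirror pool names from our example (and remembering the jfs2log logical volume as well), the commands look similar to this sketch:

# chlv -m copy1=USA_glvm1_A -m copy2=UK_glvm1_B glvm1lv
# chlv -m copy1=USA_glvm1_A -m copy2=UK_glvm1_B testlog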

See Example 17-21 which shows the result of assigning our LV copies to mirror pools.

Example 17-21 Assign LV copies to mirror pools

root@ glvm1[] lslv glvm1lv
LOGICAL VOLUME:     glvm1lv                VOLUME GROUP:   glvm1vg
LV IDENTIFIER:      000fe4010000d9000000011ff6a4095e.1 PERMISSION:     read/write
VG STATE:           active/complete        LV STATE:       opened/syncd
TYPE:               jfs2                   WRITE VERIFY:   off
MAX LPs:            512                    PP SIZE:        4 megabyte(s)
COPIES:             2                      SCHED POLICY:   parallel
LPs:                256                    PPs:            512
STALE PPs:          0                      BB POLICY:      non-relocatable
INTER-POLICY:       minimum                RELOCATABLE:    yes
INTRA-POLICY:       middle                 UPPER BOUND:    1
MOUNT POINT:        /glvm1jfs2             LABEL:          /glvm1jfs2
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes (superstrict)
Serialize IO ?:     NO
DEVICESUBTYPE :     DS_LVZ
COPY 1 MIRROR POOL: USA_glvm1_A
COPY 2 MIRROR POOL: UK_glvm1_B
COPY 3 MIRROR POOL: None

Note: Make sure you turn off bad block relocation and assign each copy of each LV to a mirror pool. Do not forget your log LVs.

Asynchronous GLVM mirroring requires a new type of logical volume for caching of asynchronous write requests. This logical volume should not be mirrored across sites. Super strict mirror pools handle this new aio_cache logical volume type as a special case.

Logical volumes of type aio_cache are required in each mirror pool. Use the smitty mklv fast path to create the LVs. Use the picklist to add the site A aio cache LV to the site A mirror pool, and the site B aio cache LV to the site B mirror pool, as seen in Example 17-22 (the lsvg -l vgname output shows the result). Make sure you turn off bad block relocation for these LVs and set superstrict allocation.

Example 17-22 Add AIO cache logical volumes

Add a Logical Volume

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                   [Entry Fields]
  Logical volume NAME                                 [USA_glvm1_Aclv]
* VOLUME GROUP name                                   glvm1vg
* Number of LOGICAL PARTITIONS                        [2]                    #
  PHYSICAL VOLUME names                               []                     +
  Logical volume TYPE                                 [aio_cache]            +
  POSITION on physical volume                         middle                 +
  RANGE of physical volumes                           minimum                +
  MAXIMUM NUMBER of PHYSICAL VOLUMES                  [2]                    #
    to use for allocation
  Number of COPIES of each logical                    1                      +
    partition
  Mirror Write Consistency?                           active                 +
  Allocate each logical partition copy                superstrict            +
    on a SEPARATE physical volume?
  RELOCATE the logical volume during                  yes                    +
    reorganization?
  Logical volume LABEL                                []
  MAXIMUM NUMBER of LOGICAL PARTITIONS                [512]                  #
  Enable BAD BLOCK relocation?                        no                     +
  SCHEDULING POLICY for writing/reading               parallel               +
    logical partition copies
  Enable WRITE VERIFY?                                no                     +
  File containing ALLOCATION MAP                      []
  Stripe Size?                                        [Not Striped]          +
  Serialize IO?                                       no                     +
  Mirror Pool for First Copy                          USA_glvm1_A            +
  Mirror Pool for Second Copy                                                +
  Mirror Pool for Third Copy                                                 +

Note: AIO cache logical volumes cannot be spread across sites.

root@ glvm1[] lsvg -l glvm1vg
glvm1vg:
LV NAME          TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
glvm1lv          jfs2       256   512   2    open/syncd    /glvm1jfs2
testlog          jfs2log    1     2     2    open/syncd    N/A
USA_glvm1_Aclv   aio_cache  2     2     1    closed/syncd  N/A
UK_glvm1_Bclv    aio_cache  2     2     1    closed/syncd  N/A
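If you prefer the command line over SMIT, a roughly equivalent mklv invocation is sketched below. We assume here that -p copyn=<mirror pool> is the mirror pool flag on your AIX level; verify the exact flags against the mklv man page before using it. The LV name, pool name, size, and disk are from our example.

# Sketch only: -y name, -t type, -b n (no bad block relocation), -s s (superstrict),
#              -p copy1= (assumed mirror pool flag for the first copy)
mklv -y USA_glvm1_Aclv -t aio_cache -b n -s s -p copy1=USA_glvm1_A glvm1vg 2 hdisk1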

12. Configure the mirror pool at site A for asynchronous mirroring. Go to smitty glvm_utils → Geographically Mirrored Volume Groups → Manage Geographically Mirrored Volume Groups with Mirror Pools → Configure Mirroring Properties of a Mirror Pool → Convert to Asynchronous Mirroring for a Mirror Pool.

– To select the Logical Volume for I/O Cache press F4 for the picklist and select the appropriate aio cache lv.

– Use the site B aio cache LV as the cache logical volume.

– Similarly, configure the site B mirror pool for asynchronous mirroring and use the site A aio cache LV as the cache logical volume.


13. For the first mirror pool (USA_glvm1_A), associate the aio cache LV (UK_glvm1_Bclv), as seen in Example 17-23.

Example 17-23 Configure mirror pools for asynchronous mirroring - first pool

Convert to Asynchronous Mirroring for a Mirror Pool

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* VOLUME GROUP name                                   glvm1vg
* MIRROR POOL Name                                    USA_glvm1_A
  Logical Volume for I/O Cache                        []                     +
  I/O Cache High Water Mark Value                     [100]                  #

  +------------------------------------------------------------------------+
  |                     Logical Volume for I/O Cache                        |
  |                                                                         |
  |  Move cursor to desired item and press Enter.                           |
  |                                                                         |
  |    UK_glvm1_Bclv                                                        |
  |                                                                         |
  |  F1=Help        F2=Refresh      F3=Cancel                               |
  |  F8=Image       F10=Exit        Enter=Do                                |
  |  /=Find         n=Find Next                                             |
  +------------------------------------------------------------------------+

14. Then do the same for the second mirror pool (UK_glvm1_B), associating the aio cache LV (USA_glvm1_Aclv), as seen in Example 17-24.

Example 17-24 Configure mirror pools for asynchronous mirroring - second pool

Convert to Asynchronous Mirroring for a Mirror Pool

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* VOLUME GROUP name                                   glvm1vg
* MIRROR POOL Name                                    UK_glvm1_B
  Logical Volume for I/O Cache                        []                     +
  I/O Cache High Water Mark Value                     [100]                  #

  +------------------------------------------------------------------------+
  |                     Logical Volume for I/O Cache                        |
  |                                                                         |
  |  Move cursor to desired item and press Enter.                           |
  |                                                                         |
  |    USA_glvm1_Aclv                                                       |
  |                                                                         |
  |  F1=Help        F2=Refresh      F3=Cancel                               |
  |  F8=Image       F10=Exit        Enter=Do                                |
  |  /=Find         n=Find Next                                             |
  +------------------------------------------------------------------------+

15. List the mirror pool information using the lsmp -A vgname command, as seen in Example 17-25. The -A flag prints the asynchronous mirroring information. This information is the status as known by the LVM. Observe that the site B mirror pool shows that it is actively mirroring in asynchronous mode.

Example 17-25 List mirror pools for each VG to show async mirror state

root@ glvm1[] lsmp -A glvm1vg
VOLUME GROUP:       glvm1vg             Mirror Pool Super Strict: yes

MIRROR POOL:        USA_glvm1_A         Mirroring Mode:           ASYNC
ASYNC MIRROR STATE: inactive            ASYNC CACHE LV:           UK_glvm1_Bclv
ASYNC CACHE VALID:  yes                 ASYNC CACHE EMPTY:        yes
ASYNC CACHE HWM:    100                 ASYNC DATA DIVERGED:      no

MIRROR POOL:        UK_glvm1_B          Mirroring Mode:           ASYNC
ASYNC MIRROR STATE: active              ASYNC CACHE LV:           USA_glvm1_Aclv
ASYNC CACHE VALID:  yes                 ASYNC CACHE EMPTY:        no
ASYNC CACHE HWM:    100                 ASYNC DATA DIVERGED:      no

16. Now run the rpvstat -A and rpvstat -C commands to get the asynchronous statistics from the RPV driver, as seen in Example 17-26.

Example 17-26 Output of rpvstat -A and rpvstat -C

root@ glvm1[/] rpvstat -A

Remote Physical Volume Statistics:

                     Completd   Completed    Cached     Cached       Pending   Pending
                     Async      Async        Async      Async        Async     Async
RPV Client       ax  Writes     KB Writes    Writes     KB Writes    Writes    KB Writes
------------     --  --------   -----------  --------   -----------  --------  ---------
hdisk3           A          0             0         1             4         0          0

root@ glvm1[/] rpvstat -C

Remote Physical Volume Statistics:

                     Max Pending             Total      Max        Total      Cache
                     Async        Cache      Cache      Cache      Cache      Free
GMVG Name        ax  Writes       Util %     Writes     Wait %     Wait       Space KB
---------------- --  -----------  ------     ---------  ------     -------    ---------
glvm1vg          A   1            6.31       0          0.00       0          15350

17. Now make changes to the resource group rosie to handle the asynchronously mirrored volume groups. Run smitty hacmp → Extended Configuration → Extended Resource Configuration → HACMP Extended Resource Group Configuration → Change/Show Resources and Attributes for a Resource Group.

18.Select rosie.

19. Select site B (UK) as the default choice for data divergence recovery, so that if data divergence happens, the site B (UK) data is preserved. Also change Allow varyon with missing data updates to true, so that after a primary site failure the volume groups can be activated even if data divergence might occur. You can see this in Example 17-27.

Example 17-27 Update RG to handle async GMVGs

Change/Show All Resources and Attributes for a Custom Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                   [Entry Fields]
  Resource Group Name                                 rosie
  Inter-site Management Policy                        Prefer Primary Site
  Participating Nodes from Primary Site               glvm1
  Participating Nodes from Secondary Site             glvm2

  Startup Policy                                      Online On Home Node>
  Fallover Policy                                     Fallover To Next Pri>
  Fallback Policy                                     Never Fallback
  Service IP Labels/Addresses                         []                     +
  Application Servers                                 [glvm_app]             +
  Volume Groups                                       [glvm1vg]              +
  Use forced varyon of volume groups, if necessary    true                   +
  Automatically Import Volume Groups                  false                  +
  Default choice for data divergence recovery         UK                     +
    (Asynchronous GLVM Mirroring Only)
  Allow varyon with missing data updates?             true                   +
    (Asynchronous GLVM Mirroring Only)
  Filesystems (empty is ALL for VGs specified)        []                     +
  Filesystems Consistency Check                       fsck                   +
  Filesystems Recovery Method                         sequential             +
  Filesystems mounted before IP configured            false                  +
[MORE...18]

F1=Help     F2=Refresh     F3=Cancel     F4=List
F5=Reset    F6=Command     F7=Edit       F8=Image
F9=Shell    F10=Exit       Enter=Do

20. Now run a verification and synchronization of the cluster: smitty hacmp → Extended Configuration → Extended Verification and Synchronization, and press Enter. Make sure verification and synchronization passes.

17.2.2 Test primary site failure

In any cluster, changes must be tested to ensure the desired operation. The primary role of GLVM is data availability, and PowerHA/XD provides the high availability aspects within the cluster environment itself. Therefore, we suggest that before making changes of this nature to a production system, you fully test the functionality in a test environment before working with real data. In any environment it is good practice to take a backup of your operating system, cluster configuration, and production data and applications before any changes are made to the environment.

Document a test plan covering what you want to test, what you expect the result to be, and what the actual result is. This is critical for other administrators in your environment so that they are aware of any differences in cluster operation since the last test procedure was completed.

Next we run through a failure of our primary site (USA).

1. Ensure that cluster services are running on all nodes and the cluster is stable. There are various ways to check this; we are interested in making sure that clstrmgrES is stable on both nodes and that our GMVG is online on the primary site in preparation for the test. Now that we are mirroring asynchronously, we also need to check the status of our mirror pools, as seen in Example 17-28; you might have additional status checks to make in your environment.

Example 17-28 Check cluster and mirror pool status prior to primary site failure

# clfindres
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
rosie          ONLINE                       glvm1@USA
               ONLINE SECONDARY             glvm2@UK

# lsmp -A glvm1vg
VOLUME GROUP:       glvm1vg             Mirror Pool Super Strict: yes

MIRROR POOL:        USA_glvm1_A         Mirroring Mode:           ASYNC
ASYNC MIRROR STATE: inactive            ASYNC CACHE LV:           UK_glvm1_Bclv
ASYNC CACHE VALID:  yes                 ASYNC CACHE EMPTY:        yes
ASYNC CACHE HWM:    100                 ASYNC DATA DIVERGED:      no

MIRROR POOL:        UK_glvm1_B          Mirroring Mode:           ASYNC
ASYNC MIRROR STATE: active              ASYNC CACHE LV:           USA_glvm1_Aclv
ASYNC CACHE VALID:  yes                 ASYNC CACHE EMPTY:        no
ASYNC CACHE HWM:    100                 ASYNC DATA DIVERGED:      no

# gmvgstat
GMVG Name        PVs  RPVs  Tot Vols  St Vols  Total PPs  Stale PPs  Sync
---------------  ---  ----  --------  -------  ---------  ---------  ----
glvm1vg          1    1     2         0        2523       0          100%

Note: During this testing phase we used PowerHA 5.5 service pack 1 and GLVM 5.5 service pack 2.
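In addition to the checks shown in Example 17-28, a quick way to confirm that the cluster manager is stable on a node is to query its subsystem status. The exact wording of the state line can vary by level, so treat this as a sketch:

# lssrc -ls clstrmgrES | grep -i state
Current state: ST_STABLE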

2. Fail the primary site; in our case this was done by running halt -q on node glvm1@siteUSA. Monitor your cluster during fallover as normal.

3. Check the status of the cluster again, paying attention to the mirror pool status as seen in Example 17-29.

Example 17-29 Cluster and mirror pool status after primary site failure

root@ glvm2[/] clfindres
-----------------------------------------------------------------------------
Group Name     Group State                  Node
-----------------------------------------------------------------------------
rosie          OFFLINE                      glvm1@USA
               ONLINE                       glvm2@UK

root@ glvm2[/] lsmp -A glvm1vg
VOLUME GROUP:       glvm1vg             Mirror Pool Super Strict: yes

MIRROR POOL:        USA_glvm1_A         Mirroring Mode:           ASYNC
ASYNC MIRROR STATE: inactive            ASYNC CACHE LV:           UK_glvm1_Bclv
ASYNC CACHE VALID:  yes                 ASYNC CACHE EMPTY:        yes
ASYNC CACHE HWM:    100                 ASYNC DATA DIVERGED:      no

MIRROR POOL:        UK_glvm1_B          Mirroring Mode:           ASYNC
ASYNC MIRROR STATE: inactive            ASYNC CACHE LV:           USA_glvm1_Aclv
ASYNC CACHE VALID:  yes                 ASYNC CACHE EMPTY:        no
ASYNC CACHE HWM:    100                 ASYNC DATA DIVERGED:      yes


4. The production site fallover is complete. In your environment this would be tested with a copy of your production data, meaning that there are further minimum checks which should be made to ensure that the data is mirrored. Refer to “Asynchronous GLVM: Commands to check status” on page 751 for some further tools which can assist with these checks.

Asynchronous GLVM: Commands to check status
Throughout testing we monitored our GMVG and mirror pool status. This section includes a list of commands which were useful to ensure that we had a clear view of our configuration and mirror states.

List mirror pools - lsvg -P vgname
root@ glvm1[] lsvg -P glvm1vg
Physical Volume   Mirror Pool
hdisk1            USA_glvm1_A
hdisk3            UK_glvm1_B

List mirror pool status - lsmp -A vgname
root@ glvm1[] lsmp -A glvm1vg
VOLUME GROUP:       glvm1vg             Mirror Pool Super Strict: yes

MIRROR POOL:        USA_glvm1_A         Mirroring Mode:           ASYNC
ASYNC MIRROR STATE: inactive            ASYNC CACHE LV:           UK_glvm1_Bclv
ASYNC CACHE VALID:  yes                 ASYNC CACHE EMPTY:        yes
ASYNC CACHE HWM:    100                 ASYNC DATA DIVERGED:      no

MIRROR POOL:        UK_glvm1_B          Mirroring Mode:           ASYNC
ASYNC MIRROR STATE: active              ASYNC CACHE LV:           USA_glvm1_Aclv
ASYNC CACHE VALID:  yes                 ASYNC CACHE EMPTY:        no
ASYNC CACHE HWM:    100                 ASYNC DATA DIVERGED:      no

This command shows your mirror pool status, as known by the LVM. Monitor all output from this command, and pay particular attention to ASYNC CACHE VALID and ASYNC MIRROR STATE. For further information about AIO cache validity, refer to “AIO cache validity” on page 754.
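If you want to automate this check, a small ksh wrapper along the following lines can be run regularly (for example, from cron). This is a sketch only: it relies on the lsmp -A output format shown above and uses the volume group name from our example.

#!/bin/ksh
# Sketch: report async mirror health for one GMVG
VG=glvm1vg

# Show only the fields we care about
lsmp -A $VG | egrep "MIRROR POOL:|ASYNC MIRROR STATE|ASYNC CACHE VALID|ASYNC DATA DIVERGED"

# Warn if any mirror pool reports an invalid AIO cache
if lsmp -A $VG | grep "ASYNC CACHE VALID: *no" > /dev/null; then
    print "WARNING: $VG has an invalid AIO cache logical volume"
fi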

GMVG status - gmvgstat (see the following syntax)
Usage: gmvgstat -h | [-r] [-t] [-i <Interval> [-c <Count>] [-w]]
                [gmvgname [gmvgname] ... ]
   -h   Display command help.
   -r   Display RPV client details.
   -t   Display header with date and time.
   -i   Redisplay every <Interval> seconds (from 1 to 3600).
   -c   Redisplay for <Count> repeats (from 1 to 999999).
   -w   Clear screen before redisplay.


root@ glvm1[] gmvgstat
GMVG Name        PVs  RPVs  Tot Vols  St Vols  Total PPs  Stale PPs  Sync
---------------  ---  ----  --------  -------  ---------  ---------  ----
glvm1vg          1    1     2         0        2523       0          100%
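The interval and count flags shown in the usage above are handy while a synchronization is running. For example, the following redisplays the status of our GMVG every 30 seconds, 20 times, with a timestamp header:

# gmvgstat -t -i 30 -c 20 glvm1vg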

RPV status - rpvstat
There are many flags for the rpvstat command; shown below are the additional options for asynchronous mirroring. The rpvstat command shows the status as known by the RPV device driver.

# rpvstat

Remote Physical Volume Statistics:

                        Comp Reads  Comp Writes  Comp KBRead   Comp KBWrite  Errors
RPV Client          cx  Pend Reads  Pend Writes  Pend KBRead   Pend KBWrite
------------------  --  ----------  -----------  ------------  ------------  ------
hdisk3               1          79          549           798         68550       0
                                 0            0             0             0

-A   Display the statistics for Asynchronous Mirroring.

# rpvstat -A

Remote Physical Volume Statistics:

                     Completd   Completed    Cached     Cached       Pending   Pending
                     Async      Async        Async      Async        Async     Async
RPV Client       ax  Writes     KB Writes    Writes     KB Writes    Writes    KB Writes
------------     --  --------   -----------  --------   -----------  --------  ---------
hdisk3           A        165         3088          0            0          0          0

-C   Display the statistics for Asynchronous I/O Cache.

# rpvstat -C

Remote Physical Volume Statistics:

                     Max Pending             Total      Max        Total      Cache
                     Async        Cache      Cache      Cache      Cache      Free
GMVG Name        ax  Writes       Util %     Writes     Wait %     Wait       Space KB
---------------- --  -----------  ------     ---------  ------     -------    ---------
glvm1vg          A   165          10.22      0          0.00       0          15359

-r Reset counters for the Asynchronous I/O cache information.

Check location of stale partitions - lsvg -M vgname
If you have stale partitions in your GMVG and want to check where those stale partitions are, you can use this command as seen in Example 17-30.


Example 17-30 lsvg -M output

root@ glvm1[] lsvg -M glvm1vg | more

glvm1vg
hdisk1:1-251
hdisk1:252      USA_glvm1_Aclv:1
hdisk1:253      USA_glvm1_Aclv:2
hdisk1:254      glvm1lv:1:1
hdisk1:255      glvm1lv:2:1
hdisk1:256      glvm1lv:3:1
hdisk1:257      glvm1lv:4:1
hdisk1:258      glvm1lv:5:1
hdisk1:259      glvm1lv:6:1
hdisk1:260      glvm1lv:7:1
hdisk1:261      glvm1lv:8:1
hdisk1:262      glvm1lv:9:1
hdisk1:263      glvm1lv:10:1
hdisk1:264      testlog:1:1
hdisk1:265      glvm1lv:11:1
hdisk1:266      glvm1lv:12:1
hdisk1:267      glvm1lv:13:1
hdisk1:268      glvm1lv:14:1
hdisk1:269      glvm1lv:15:1
hdisk1:270      glvm1lv:16:1
hdisk1:271      glvm1lv:17:1
hdisk1:272      glvm1lv:18:1
<snip>
hdisk3:1-251
hdisk3:252      UK_glvm1_Bclv:1
hdisk3:253      UK_glvm1_Bclv:2
hdisk3:254      glvm1lv:1:2 stale
hdisk3:255      glvm1lv:2:2 stale
hdisk3:256      glvm1lv:3:2 stale
hdisk3:257      glvm1lv:4:2 stale
hdisk3:258      glvm1lv:5:2 stale
hdisk3:259      glvm1lv:6:2 stale
hdisk3:260      glvm1lv:7:2 stale
hdisk3:261      glvm1lv:8:2 stale
hdisk3:262      glvm1lv:9:2
hdisk3:263      glvm1lv:10:2 stale
hdisk3:264      testlog:1:2 stale
hdisk3:265      glvm1lv:11:2
hdisk3:266      glvm1lv:12:2
hdisk3:267      glvm1lv:13:2
hdisk3:268      glvm1lv:14:2
hdisk3:269      glvm1lv:15:2
hdisk3:270      glvm1lv:16:2
hdisk3:271      glvm1lv:17:2
hdisk3:272      glvm1lv:18:2


AIO cache validity
The purpose of the AIO cache logical volume is to cache asynchronous write requests. This is critical to the availability of data within a GLVM cluster that is being asynchronously mirrored.

If any I/O operation on an AIO cache LV fails, the cache can become invalid. For this reason, we recommend that you monitor the status of the AIO cache regularly. The lsmp -A command provides details of the mirror pool status, including the status of the AIO cache logical volumes. If the cache LV becomes invalid, as seen in Example 17-31, you will see errpt entries such as those seen in Example 17-32.

Example 17-31 Invalid AIO cache LV shown with lsmp command output

root@ glvm2[/] lsmp -A glvm1vg
VOLUME GROUP:       glvm1vg             Mirror Pool Super Strict: yes

MIRROR POOL:        USA_glvm1_A         Mirroring Mode:           ASYNC
ASYNC MIRROR STATE: inactive            ASYNC CACHE LV:           UK_glvm1_Bclv
ASYNC CACHE VALID:  yes                 ASYNC CACHE EMPTY:        yes
ASYNC CACHE HWM:    100                 ASYNC DATA DIVERGED:      no

MIRROR POOL:        UK_glvm1_B          Mirroring Mode:           ASYNC
ASYNC MIRROR STATE: inactive            ASYNC CACHE LV:           USA_glvm1_Aclv
ASYNC CACHE VALID:  no                  ASYNC CACHE EMPTY:        yes
ASYNC CACHE HWM:    100                 ASYNC DATA DIVERGED:      no

Example 17-32 errpt entries for invalid AIO cache LV

LABEL:          LVM_CLV_FAIL_NOTIFY
IDENTIFIER:     71D2CAA4

Date/Time:       Thu Apr 16 14:12:17 EDT 2009
Sequence Number: 1270
Machine Id:      000FE401D900
Node Id:         glvm1
Class:           S
Type:            PERM
WPAR:            Global
Resource Name:   LVDD

Description
AIO CACHE FAIL NOTIFY RECEIVED

Probable Causes
Remote Physical Volume Driver failed to perform Input/Output
operation on IO cache logical volume device.


Recommended Actions
        Logical Volume Manager will mark all the asynchronous Logical
        Partition copies as stale for each Logical volume.

Detail Data
AIO CACHE DEVICE MAJOR/MINOR
8000 0022 0000 0003
MIRROR POOL ID
0000 0000 0000 0002
VOLUME GROUP ID
000F E401 0000 D900 0000 011F F6A4 095E

LABEL:          LVM_CLV_FAIL_DONE
IDENTIFIER:     30097641

Date/Time:       Thu Apr 16 14:12:17 EDT 2009
Sequence Number: 1274
Machine Id:      000FE401D900
Node Id:         glvm1
Class:           S
Type:            INFO
WPAR:            Global
Resource Name:   LVDD

Description
AIO CACHE FAIL RECOVERY DONE

Recommended Actions
        If asynchronous IO cache is marked as invalid then using chmp
        command disable the asynchronous mirroring for a volume group.
        Synchronize all the logical volume copies.
        Delete a old aio_cache type logical volume.
        Create a new aio_cache type logical volume.
        Setup an asynchronous mirroring using new aio_cache type logical volume.

Detail Data
AIO CACHE DEVICE MAJOR/MINOR
8000 0022 0000 0003
MIRROR POOL ID
0000 0000 0000 0002
VOLUME GROUP ID
000F E401 0000 D900 0000 011F F6A4 095E


In the event of an invalid cache, the mirror pool partitions will be marked stale. The recovery action is to convert the mirror pool to synchronous (because the cache cannot be administered while in asynchronous mode), delete the old cache LV, and create a new cache LV. These steps are required so that we replace the “bad” logical volume with a good one to ensure that mirroring takes place correctly. Then the mirror pool can be converted to async again and synchronized.

It is recommended that you use some form of RAID level mirroring (for example, in your disk subsystem) to reduce the chance of AIO cache failure taking place.

17.3 Migration: Logic for going from HAGEO to GLVM
There is no automatic migration from HACMP/XD HAGEO to PowerHA/XD GLVM, but the HAGEO and GLVM (synchronous copy only) versions can coexist on the same cluster. So a step-by-step migration of the Geomirrored resources can be performed with some downtime.

As HACMP/XD HAGEO does not support dynamic reconfiguration, the whole cluster will have to be stopped for the topology and resource change. However, the migration of the data from a GMD to a GMVG can be done with the application live. This migration does require the re-mirroring of all the data to the remote site.

For our example migration, we look at a cluster with two nodes and two applications at the primary site, and a single node at the backup site. The steps are:

1. Install GLVM filesets.

2. Stop remote site and create RPV servers on the remote site.

3. Create local rpv client and mirror.

Important: Asynchronous mode GLVM is only supported on AIX 6.1, and HAGEO is not supported on AIX 6.1. Hence you cannot migrate an asynchronous HAGEO configuration to asynchronous GLVM by using the method discussed here.

Important: Careful planning is required because the full replication of each data logical volume affects the performance of both the primary and secondary nodes, as well as using a large proportion of the network bandwidth. The size of the logical volumes and the network bandwidth determine the time required, and in some cases it might take days. Ideally this would be performed for new configurations, with no user traffic impeding the process.


4. Mirror the local data logical volumes.

5. Create local rpvservers and remote rpvclients.

6. Modify /etc/filesystems to point to LVs not GMDs (if using file systems).

7. Stop the cluster and modify topology and resource group definitions.

8. Verify and synchronize the cluster.

9. Start the cluster.

Figure 17-23 shows the cluster running HACMP/XD HAGEO. Nodes thor and odin are at site Boston, running one application each, with each application using one Geomirrored file system (both the underlying logical volume and jfslog are mirrored), which is replicated to frigg at site Munchen.

Figure 17-23 Example HACMP/XD cluster for migration to GLVM

Note: The steps described assume limited hardware, so the backup copy of the data will not be available for the entire migration period.


For this exercise we only look in detail at the migration of the resource group on node thor as the logic is the same for all nodes. Figure 17-24 shows the details of the single replicated resource.

17.3.1 Install GLVM filesets and configure GLVM
Install the GLVM filesets on each node in the cluster. Take a PowerHA and HAGEO snapshot.

Figure 17-24 Example with one GMD resource

1. Stop the remote site and create RPV servers on the remote site.

The aim of this migration is to move the data replication to use GLVM, without requiring extra hardware. The limitation of this process is that the remote copy of the GMDs will have to be turned off and then be overwritten as a GLVM device—thus leaving the customer without a backup copy of the data. If this is an issue, then extra hardware will be required.

In our example, the replicated copy of the resource group on the backup site is stopped (the primary GMD no longer synchronizing data) and an RPV server created pointing to the physical volume. See Figure 17-25.


Figure 17-25 RPV Server created

2. Create local rpv client and mirror.

The next step is to create the local RPV client and add the physical volume to the GMD volume group. Each of the data and jfslog logical volumes needs to be changed to superstrict allocation policy and then extended by adding a copy on the RPV client.

We recommend that the statemap logical volumes not be mirrored, because they will no longer be required after the GMDs are turned off.

After the mirror copies have been defined for the logical volumes, the data can be synchronized. This replicates all the data from the primary copy to the backup site, affecting the performance of both nodes and placing a large load on the network. See Figure 17-26 and the command sketch that follows it.


Figure 17-26 Mirror data LVs to RPV client
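As an illustration of this step, the following sketch shows the sequence for one data logical volume and its jfslog, using the LV, disk, and volume group names from our example (the same chlv and mklvcopy invocations appear in 17.4, “Steps for migrating from HAGEO to GLVM”). Adjust the upper bound and disk names for your environment.

# chlv -s s -u 2 ulv11_log            # superstrict allocation, upper bound of 2 PVs
# chlv -s s -u 2 ulv11
# mklvcopy -s s ulv11_log 2 hdisk6    # add the second copy on the RPV client
# mklvcopy -s s ulv11 2 hdisk6
# syncvg -v vg01                      # synchronize the new copies (can take a long time)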

3. Create local rpvservers and remote rpvclients.

Although not required until the application falls over, the RPV servers and clients must be created on all nodes or PowerHA will fail verification.

4. Modify /etc/filesystems to point to LVs not GMDs.

File systems will now point to the logical volume, not the GMD, so changes need to be made to /etc/filesystems on each node. Any statemap or other unreplicated logical volume should be removed from the GMVG, because PowerHA/XD returns an error if there are unreplicated logical volumes on any physical volume in a GMVG.
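For example, once the GMDs are stopped and /etc/filesystems has been updated, removing the (now closed) statemap logical volumes from our example volume group would look like the following sketch; the LV names are the ones shown in Example 17-42:

# rmlv -f ulv11_sm
# rmlv -f ulv11_log_sm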

5. Stop the cluster and modify topology and resource group definitions.

As HACMP/XD HAGEO does not support dynamic reconfiguration, the cluster must be stopped on all nodes, so that:

– The XD_data network can be configured
– The GMD definitions can be removed from the resource group
– The GLVM volume groups’ forced varyon can be set to true


6. Verify and synchronize the cluster.

The GLVM changes need to be verified and synchronized to each node in the cluster.

7. Start the cluster.

The cluster can now be started, with the modified resource group using the GLVM devices (See Figure 17-27), and the remaining resource groups using GMDs. The unused GMD definitions can now be removed.

Figure 17-27 Application now using GLVM devices.

One option to consider is to stop only the resource group that contains the GMDs to be replaced, force down the cluster on all nodes, make the changes to the topology and resource group configurations, and then restart the cluster. This means that the only outage experienced is for the resource group that is actually being modified.


17.3.2 Performance considerations
While not in the scope of this book, the following considerations can be useful in planning your GMVG configuration:

• Mirror write consistency:

Mirror write consistency can be turned off to improve write performance, but on rebooting after a crash, a syncvg -f must be run before the logical volume can be accessed. Alternatively, the LV can be set to one of the following:

– Active:

This is the default for a mirrored logical volume.

Ensures a fast recovery after a system crash (no need to do the syncvg -f on reboot). This could lead to a performance problem on writes.

– Passive:

No performance penalty on writes, and a syncvg -f is not required after reboot. A background resynchronization of all partitions is performed if it is detected that the system was not shut down gracefully.

• LVM scheduling policies:

There are four read / write scheduling policies defined for mirrored logical volumes:

– Parallel:

Reads will be balanced across the physical volumes (sent to the device with the shortest queue), writes will be sent to each physical volume in parallel (for example, at the same time).

– Sequential:

Reads will be from the primary copy and writes will be done in sequence (for example, one copy after another).

– Parallel write, sequential read:

Reads will be done from the primary copy and writes will be sent to all physical volumes in parallel.

– Parallel write, round-robin read:

Reads will be from each copy in turn and writes will be sent to all physical volumes in parallel.

• Write verify:

There are two options:

– Yes: All writes to the logical volume will be followed by a read.
– No: Writes not verified.


For GMVGs:

• Mirror write consistency:

We recommend that mirror write consistency be left active, because otherwise a crash of the node results in the synchronization of the whole logical volume. However, if the network bandwidth and logical volume sizes can handle this, then the passive mode can be considered.

• LVM scheduling policies:

The default parallel policy is recommended because the LVM developers have made a small change for GMVGs: the LVM will attempt to read from a local copy if the physical volumes are available, in preference to reading from the RPV.

• Write verify:

We strongly recommend leaving this OFF, which is the default.

17.3.3 Troubleshooting
Here we offer some troubleshooting suggestions:

• Unlike HAGEO, there is very little data in syslog; there is one trace hook (4A6).

• The PowerHA snapshot contains the lsrpvserver -H and lsrpvclient -H output in the .info file.

• The snap -g command output contains the RPV server and client configurations.

• The general.snap file contains the installed filesets and the attributes for the rpvserver and rpvclient devices.

• The CuAt ODM object class contains information about the remote site name.

Example 17-33 shows RPV server properties.

Example 17-33 Check RPV server characteristics

frigg:/# lsattr -El rpvserver0
auto_online n                                 Configure at System Boot    True
client_addr 192.168.101.74                    Client IP Address           True
client_addr 192.168.101.73                    Client IP Address           True
rpvs_pvid   0022be2aa13f292e0000000000000000  Physical Volume Identifier  True
frigg:/# lsattr -El hdisk7
io_timeout  180                               I/O Timeout Interval        True
local_addr  10.1.101.192                      Local IP Address            True
pvid        0022be2aa13dc0720000000000000000  Physical Volume Identifier  True
server_addr none                              Server IP Address           True

Also, to check the RPV error information, see Example 17-34.


Example 17-34 RPV error sample

odin:/# lsrpvserver -H
# RPV Server       Physical Volume Identifier      Physical Volume
# -----------------------------------------------------------
  rpvserver0       0022be2aa13dc072                hdisk2
odin:/# lsrpvclient -H
# RPV Client       Physical Volume Identifier      Remote Site
# -----------------------------------------------------------
  hdisk6           0022be2aa13f292e                Munchen

LABEL:          RPVC_IO_TIMEOUT
IDENTIFIER:     D034B795

Date/Time:       Thu Jul 14 15:48:03 2005
Sequence Number: 16314
Machine Id:      002574004C00
Node Id:         frigg
Class:           U
Type:            PERM
Resource Name:   hdisk7
Resource Class:  disk
Resource Type:   rpvclient
Location:
VPD:

Description
No response from RPV server within I/O timeout interval.

Probable Causes
RPV server is down or not reachable.

Failure Causes
There is a problem with the data mirroring network.
The node or site which hosts the RPV server is down.
RPV server is not configured in the Available state.

Recommended Actions
        Correct the problem which has caused the RPV server to be down
        or not reachable. Then, tell the RPV client to resume
        communication with the RPV server by running the command:
            chdev -l <device> -a resume=yes
        where <device> is the name of this RPV client device.

---------------------------------------------------------------------------


17.4 Steps for migrating from HAGEO to GLVM
In this section we provide migration steps.

Installing the packages for GLVM
Select the following packages from the installation media:

• cluster.doc.en_US.glvm.html
• cluster.doc.en_US.glvm.pdf
• cluster.xd.license
• cluster.xd.glvm
• glvm.rpv.client
• glvm.rpv.server
• glvm.rpv.man.en_US
• glvm.rpv.util

1. We begin with a graceful stop of the cluster services on frigg. This will stop the geo mirror devices at the remote site Munchen:

smitty clstop

Wait for the cluster services to be stopped on the remote node. The geo devices will be in the “Defined” state.

Export the GMD volume group definition on node frigg:

exportvg vg01

This operation removes the volume group definition from the ODM and deletes the file systems’ stanzas from /etc/filesystems.

Configure the RPV Server environment. Perform the following steps from the RPV server:

2. Set up the remote mirroring site name. On node frigg, run smitty rpvserver → Remote Physical Volume Server Site Name Configuration → Define / Change / Show Remote Physical Volume Server Site Name. Define the name of the site to match the site name in the PowerHA definition.

You can use the rpvsitename command to define the site:

/usr/sbin/rpvsitename -a 'Munchen'

3. From the “Remote Physical Volume Servers” menu, choose Add Remote Physical Volume Servers to define the RPV servers which are associated with the target disks for mirroring. After selecting the target disks, specify the IP address of the RPV client, as in Example 17-35.


Example 17-35 Adding a RPV server

Add Remote Physical Volume Servers

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Physical Volume Identifiers                         0022be2aa13f292e
* Remote PV Client Internet Address                   [192.168.101.73,192.168.101.74] +
  Configure Automatically at System Restart?          [no]                   +
  Start New Devices Immediately?                      [yes]                  +

F1=Help     F2=Refresh     F3=Cancel     F4=List
F5=Reset    F6=Command     F7=Edit       F8=Image
F9=Shell    F10=Exit       Enter=Do

If using the command line, use the mkdev command as in Example 17-36.

Example 17-36 Adding a RPV server - using CLI

frigg:/# /usr/sbin/mkdev -c rpvserver -s rpvserver -t rpvstype \
> -a rpvs_pvid='0022be2aa13f292e' -a client_addr='192.168.101.73,\
> 192.168.101.74' -a auto_online='n'
rpvserver0 Available

4. Repeat the previous two steps for the second RPV.

Use lsrpvserver to list the RPV servers defined, as shown in Example 17-37.

Example 17-37 Listing the RPV servers

frigg:/# lsrpvserver -H
# RPV Server       Physical Volume Identifier      Physical Volume
# -----------------------------------------------------------
  rpvserver0       0022be2aa13f292e                hdisk1
frigg:/# lsattr -El rpvserver0
auto_online n                                 Configure at System Boot    True
client_addr 192.168.101.73                    Client IP Address           True
client_addr 192.168.101.74                    Client IP Address           True
rpvs_pvid   0022be2aa13f292e0000000000000000  Physical Volume Identifier  True


Configure the RPV clients. Perform these steps on each client.

5. Run smitty rpvclient → Add Remote Physical Volume Clients.

Provide the IP address of the RPV server, and the local IP address used for data replication. Then select the remote disk from the list, as in Example 17-38.

Example 17-38 Adding the RPV client

Add Remote Physical Volume Clients

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Remote Physical Volume Server Internet Address      10.1.101.192
  Remote Physical Volume Local Internet Address       192.168.101.74
  PV Identifiers                                      0022be2aa13f292e0000000000000000
  I/O Timeout Interval (Seconds)                      [180]                  #
  Start New Devices Immediately?                      [yes]                  +

F1=Help     F2=Refresh     F3=Cancel     F4=List
F5=Reset    F6=Command     F7=Edit       F8=Image
F9=Shell    F10=Exit       Enter=Do

From the command line, see the sample coding in Example 17-39.

Example 17-39 Adding RPV client, using CLI

thor:/# /usr/sbin/mkdev -c disk -s remote_disk -t rpvclient \
> -a pvid='0022be2aa13f292e' -a server_addr='10.1.101.192' \
> -a local_addr='192.168.101.73' -a io_timeout='180'
hdisk6 Available

thor:/# lsattr -El hdisk6
io_timeout  180                               I/O Timeout Interval        True
local_addr  192.168.101.73                    Local IP Address            True
pvid        0022be2aa13f292e0000000000000000  Physical Volume Identifier  True
server_addr 10.1.101.192                      Server IP Address           True

At this time the disk devices are created on the client and can be used for integration into a volume group and for defining the logical volume mirrors. Use lsrpvclient to list the defined client RPVs.

At the operating system level, they are defined as normal hdisks. The LVM commands used for local volumes apply to the RPVs, too. Example 17-40 shows the output of the lspv command.


Example 17-40 Listing of the physical volumes defined on thor

thor:/# lspv
hdisk0          0022be2a80b97feb          rootvg          active
hdisk1          none                      None
hdisk2          0022be2aa13dc072          vg01            concurrent
hdisk3          0022be2aa13ea83e          vg02            concurrent
hdisk4          none                      None
hdisk5          none                      None
hdisk6          0022be2aa13f292e          None

6. Repeat the previous steps to create the reverse RPV pair, associating an RPV server with the local disk on node thor and an RPV client on node frigg.

7. Repeat the same steps for node odin, using odin_geo1 as the local communication address.

Define the LVM mirroring
Follow these steps:

1. Extend the volume group containing the primary data with the defined RPVs. Use the GLVM menus in SMIT to extend the volume group: run smitty glvm_vg → Add Remote Physical Volumes to a Volume Group, or use the extendvg command:

extendvg vg01 hdisk6

2. Mirror the volume group containing the RPVs:

In Example 17-41 we present how we changed the logical volumes ulv11_log and ulv11.

Example 17-41 Changing the logical volumes

thor:/# chlv -s s -u 2 ulv11_log
thor:/# lslv ulv11_log
LOGICAL VOLUME:     ulv11_log              VOLUME GROUP:   vg01
LV IDENTIFIER:      0022be2a00004c0000000104d52d0c6d.1 PERMISSION:     read/write
VG STATE:           active/complete        LV STATE:       opened/syncd
TYPE:               jfs2log                WRITE VERIFY:   off
MAX LPs:            512                    PP SIZE:        16 megabyte(s)
COPIES:             1                      SCHED POLICY:   parallel
LPs:                1                      PPs:            1
STALE PPs:          0                      BB POLICY:      relocatable
INTER-POLICY:       minimum                RELOCATABLE:    yes
INTRA-POLICY:       middle                 UPPER BOUND:    2
MOUNT POINT:        N/A                    LABEL:          None
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes (superstrict)
Serialize IO ?:     NO
thor:/# chlv -s s -u 2 ulv11
thor:/# lslv ulv11
LOGICAL VOLUME:     ulv11                  VOLUME GROUP:   vg01
LV IDENTIFIER:      0022be2a00004c0000000104d52d0c6d.2 PERMISSION:     read/write
VG STATE:           active/complete        LV STATE:       opened/syncd
TYPE:               jfs2                   WRITE VERIFY:   off
MAX LPs:            512                    PP SIZE:        16 megabyte(s)
COPIES:             1                      SCHED POLICY:   parallel
LPs:                10                     PPs:            10
STALE PPs:          0                      BB POLICY:      relocatable
INTER-POLICY:       minimum                RELOCATABLE:    yes
INTRA-POLICY:       middle                 UPPER BOUND:    2
MOUNT POINT:        N/A                    LABEL:          /app01
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes (superstrict)
Serialize IO ?:     NO
thor:/#

Note: The PVID of the RPV client is the same as the PVID of the remote disk.

Note: Before mirroring a logical volume you must change the allocation policy to superstrict. Use chlv -s s -u <upper_bound> <lv_name> to change the allocation policy to superstrict. Refer to the chlv man page for further details.

Mirror the volume group by running smitty glvm_vg → Add a Remote Site Mirror Copy to a Logical Volume. You can use the mirrorvg command to mirror the volume group, or mklvcopy to mirror the logical volumes, as follows:

/usr/sbin/mklvcopy -s's' ulv11_log 2 hdisk6

Check the status of the volume group and logical volumes using lsvg, as in Example 17-42.

Example 17-42 Using lsvg to query the status of the logical volume mirrors

thor:/# lsvg -p vg01
vg01:
PV_NAME      PV STATE    TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk2       active      639         476         128..00..92..128..128
hdisk6       active      639         478         128..02..92..128..128
thor:/# lsvg -l vg01
vg01:
LV NAME        TYPE       LPs   PPs   PVs  LV STATE     MOUNT POINT
ulv11_log      jfs2log    1     2     2    open/syncd   N/A
ulv11          jfs2       160   320   2    open/stale   N/A
ulv11_sm       statemap   1     1     1    open/syncd   N/A
ulv11_log_sm   statemap   1     1     1    open/syncd   N/A
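The open/stale state of ulv11 simply indicates that the newly added copies have not been synchronized yet. One way to synchronize them, sketched here with our example volume group, is:

# syncvg -v vg01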

3. Stop the cluster services gracefully on the local node, using the smitty clstop menu. Check for proper termination of the cluster resource processing. Use lsgmd to verify that the GMDs are in the “Defined” state.

4. On each node in the cluster, change the file system definitions in the /etc/filesystems file to use the regular logical volumes instead of the GMDs. See Example 17-43.

Example 17-43 Changing the file systems for working with the logical volumes

/app01:
        dev             = /dev/ulv11
        vfs             = jfs2
        log             = /dev/ulv11_log
        mount           = false
        check           = false
        account         = false

5. Change the PowerHA topology and resource definitions to use GLVM.

To integrate the GLVM volume groups in PowerHA, you need to ensure that each logical volume is replicated. PowerHA issues an error message if a geographically mirrored volume group contains unreplicated logical volumes.
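A quick, informal way to spot unreplicated logical volumes is to compare the LPs and PPs columns of the lsvg -l output; with two copies, PPs should be exactly twice LPs for every logical volume in the GMVG. This is a sketch using our example volume group:

# lsvg -l vg01 | awk 'NR > 2 && $4 != 2 * $3 { print $1, "is not fully replicated" }'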

6. Reconfigure the cluster topology.

Important: If you initially created the file systems using the crfs command, the LVCB (logical volume control block) is updated with the file system information, so each importvg command will update /etc/filesystems. You can check the LVCB data using getlvcb -AT <lv_name>. If you created the file system over a GMD using crfs, the importvg command will not update the file system information in the /etc/filesystems file.

Note: HACMP/XD HAGEO does not support dynamic reconfiguration. You must stop the cluster services to change the cluster configuration. PowerHA/XD GLVM supports dynamic reconfiguration as long as you do not have HAGEO installed.


7. Change the network type from Geo_Primary to XD_data. You can have GMDs and RPVs configured at the same time in the cluster. However, the GMD and RPV resources cannot be part of the same resource group. If you have two Geo_Primary networks, you can leave the second network for the unconverted GMDs.

Example 17-44 shows how we changed the first Geo_Primary network into an XD_data network.

Example 17-44 Converting the HAGEO network into an XD_data network

Change/Show an IP-Based Network in the HACMP Cluster

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Network Name                                        net_Geo_Primary_01
  New Network Name                                    [XD_data_net_01]
* Network Type                                        [XD_data]              +
* Netmask                                             [255.255.255.0]        +
* Enable IP Address Takeover via IP Aliases           No                     +
  IP Address Offset for Heartbeating over IP Aliases  []
* Network attribute                                   public                 +

F1=Help     F2=Refresh     F3=Cancel     F4=List
F5=Reset    F6=Command     F7=Edit       F8=Image
F9=Shell    F10=Exit       Enter=Do

8. Synchronize the cluster topology.

9. Change the resource groups to integrate the RPVs. You do not have to configure special resources for using the RPVs in the cluster. At this time you should remove the GMD definitions from the resource groups (see Example 17-45).

Example 17-45 Defining the resource group in PowerHA

Change/Show All Resources and Attributes for a Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Resource Group Name                                 app01_rg
  Inter-site Management Policy                        Prefer Primary Site
  Participating Nodes from Primary Site               thor odin
  Participating Nodes from Secondary Site             frigg

  Startup Policy                                      Online On Home Node Only
  Fallover Policy                                     Fallover To Next Priority Node In The List
  Fallback Policy                                     Fallback To Higher Priority Node In The Li>
  Fallback Timer Policy (empty is immediate)          []                     +
  Service IP Labels/Addresses                         []                     +
  Application Servers                                 [app01_srv]            +
  Volume Groups                                       [vg01]                 +

  Use forced varyon of volume groups, if necessary    true                   +
  Automatically Import Volume Groups                  false                  +
  Filesystems (empty is ALL for VGs specified)        [/app01]               +
  Filesystems Consistency Check                       fsck                   +
  Filesystems Recovery Method                         sequential             +
  Filesystems mounted before IP configured            false                  +
  Filesystems/Directories to Export                   []                     +
  Filesystems/Directories to NFS Mount                []
  Network For NFS Mount                               []                     +
  Tape Resources                                      []                     +
  Raw Disk PVIDs                                      []                     +
  Fast Connect Services                               []                     +
  Communication Links                                 []                     +
  Primary Workload Manager Class                      []                     +
  Secondary Workload Manager Class                    []                     +
  Miscellaneous Data                                  []
  GeoMirror Devices                                   []                     +

F1=Help     F2=Refresh     F3=Cancel     F4=List
F5=Reset    F6=Command     F7=Edit       F8=Image
F9=Shell    F10=Exit       Enter=Do

Note: If you are changing your Geo_Primary network attribute from private to public, you have to remove the network and recreate it.

10.Synchronize the cluster definition across the nodes.

11.Start the cluster on the nodes.


Part 6 Appendixes



Appendix A. Paper planning worksheets

The following Planning Worksheets can be used to guide you through the planning and implementation of a PowerHA cluster. You will notice that the worksheets cover all the important aspects of the cluster configuration and follow a logical planning flow.

Additional detailed Paper Planning Worksheets are also found in the HACMP Planning and Installation Guide, SC23-4861.



Two-node cluster configuration assistant
Use this table if you plan to use the two-node cluster configuration assistant to configure your initial cluster. The two-node cluster configuration assistant simply requires the following information in order to set up a simple two-node cluster with a single resource group.

Table A-1 Two-node Cluster Configuration Assistant

PowerHA CLUSTER WORKSHEET Two NODE CONFIGURATION ASSISTANT

DATE:

Local Node

Takeover (Remote) Node

Communication Path to Takeover Node

Application Server

Application Start Script

Application Stop Script

Service IP Label


TCP/IP network planning worksheets
Use this worksheet to record your network information.

Table A-2 Cluster Ethernet Networks

Network Name | Network Type | Netmask | Node Names | IPAT via IP Aliases | IP Address Offset for HB over Aliasing


TCP/IP network interface worksheet

You can record your interface information with these worksheets.

Table A-3 Network interface worksheet

IP Label | IP Alias distribution preference | Network Interface | Network Name | Interface Function | IP Address | Netmask | Hardware Address


Table A-4 Cluster Serial Networks

Network Name

Network Type

Node Names

Device Interface Name

Adapter Label

Comments (RS232, diskhb, mndhb)
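To verify a disk heartbeat path before recording it here, the dhb_read utility can be used; the sequence below is a sketch (substitute the shared hdisk chosen for the diskhb network, and start the receive side first):

   # On the first node: put the device in receive mode
   /usr/sbin/rsct/bin/dhb_read -p hdisk2 -r

   # On the second node: transmit across the same shared disk
   /usr/sbin/rsct/bin/dhb_read -p hdisk2 -t

   # A "Link operating normally" message on both nodes indicates the path works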


Fibre Channel disks worksheet

Use this worksheet to record information about Fibre Channel disks to be included in the cluster. Complete a separate worksheet for each cluster node.

Table A-5 Fibre Channel Disks Worksheet

Fibre Channel Adapter

Disks Associated with Adapter
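To help map adapters to disks for the worksheet above, the following AIX commands are one approach (a sketch; adapter and device names are examples):

   lsdev -Cc adapter | grep fcs    # Fibre Channel adapters (fcs0, fcs1, ...)
   lscfg -vpl fcs0                 # WWPN and location details of an adapter
   lsdev -p fscsi0                 # devices attached to a given FC protocol device
   lsdev -Cc disk                  # all disk devices
   lspv                            # hdisks with their PVIDs and volume groups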


Shared volume group and file system worksheet

Use this worksheet to record the shared volume groups and file systems in a non-concurrent access configuration. You need a separate worksheet for each shared volume group: print one worksheet per volume group and fill in the names of the nodes that share it. The commands after the worksheet can help you collect this information.

Table A-6 Shared volume group and file system worksheet

Node A Node B

Node Names

Shared Volume Group Name

Major Number

Log Logical Volume Name

Physical Volumes

Cross-site LVM Mirror

Logical Volume Name

Number of Copies of Logical Partitions

On Separate Physical Volumes

File System Mount Point

Size

Cross-site LVM Mirroring enabled
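A few AIX commands that help complete this worksheet are shown below (a sketch; the volume group, disk, and major numbers are illustrative):

   lvlstmajor                           # free major numbers on a node (choose one free on all nodes)
   lspv                                 # PVIDs of the shared physical volumes
   lsvg -l sharedvg                     # logical volumes, file systems, and the jfs/jfs2 log LV
   ls -l /dev/sharedvg                  # the volume group major number appears in the device listing
   importvg -V 55 -y sharedvg hdisk2    # import on the takeover node with the same major number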


NFS-Exported file system or directory worksheet

Use this worksheet to record the file systems and directories NFS-exported by a node in a non-concurrent access configuration. You need a separate worksheet for each node defined in the cluster: print one worksheet per node and fill in the node name on each. A sample export entry follows the worksheet.

Table A-7 NFS export file system or directory worksheet

Resource Group Name

Network for NFS Mount

File System Mounted before IP Configured?

For export options see the exports man page.

File System or Directory to Export (NFSv2/3)

Export Options

File System or Directory to Export (NFSv4)

Export Options

File System or Directory to Export (NFSv2/3)

Export Options

File System or Directory to Export (NFSv4)

Export Options

Stable Storage Path (NFSv4)
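As an illustration only, PowerHA export options are commonly kept in /usr/es/sbin/cluster/etc/exports, which uses the same syntax as /etc/exports; the entry below is a hypothetical example:

   # /usr/es/sbin/cluster/etc/exports (illustrative entry)
   /sharedfs/data -vers=3,root=nodeA:nodeB,access=client1:client2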


Application worksheet

Use these worksheets to record information about applications in the cluster.

Table A-8 Application worksheet

Application Name

Directory

Executable Files

Configuration Files

Data files or devices

Log files or devices

Cluster Name

Fallover strategy (P=Primary T=Takeover)

Node

Strategy

Normal Start Commands

Normal stop Commands

Verification Commands

Node Reintegration Caveats

Node A

Node B


Application server worksheet

Use these worksheets to record information about application servers in the cluster.

Table A-9 Application server worksheet

Cluster Name

Note: Use full pathnames for all user-defined scripts.

Server Name

Start Script

Stop Script

Server Name

Start Script

Stop Script

Server Name

Start Script

Stop Script
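Because the worksheet asks for full pathnames of the start and stop scripts, a minimal start script skeleton is sketched below (the application name, user, paths, and commands are assumptions for illustration; a stop script typically follows the same pattern):

   #!/bin/ksh
   # /usr/local/ha/start_app1.sh - illustrative application server start script
   LOGFILE=/var/log/app1_start.log
   print "$(date): starting app1" >> $LOGFILE
   su - app1user -c "/opt/app1/bin/app1ctl start" >> $LOGFILE 2>&1
   exit 0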


Application monitor worksheet (custom)

Use these worksheets to record information about custom application monitors in the cluster.

Table A-10 Application monitor worksheet (custom)

Cluster Name

Application Server Name

Monitor Method

Monitor Interval

Hung Monitor Signal

Stabilization Interval

Restart Count

Restart Interval

Action on Application Failure

Notify Method

Cleanup Method

Restart Method
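For the Monitor Method field, a custom monitor is a script or command that exits with 0 while the application is healthy and non-zero when it has failed. The sketch below (process name and path are assumptions) illustrates the idea:

   #!/bin/ksh
   # /usr/local/ha/monitor_app1.sh - illustrative custom application monitor method
   # Exit 0 = application healthy, non-zero = application failed
   if ps -ef | grep "[a]pp1d" > /dev/null 2>&1
   then
       exit 0
   else
       exit 1
   fi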


Resource group worksheet

Use these worksheets to record information about the resource groups in the cluster.

Table A-11 Resource groups

Cluster Name

Resource Group Name

Participating Node Names

Inter-Site Management Policy

Startup Policy

Fallover Policy

Fallback Policy

Delayed Fallback Timer

Settling Time

Runtime Policies

Dynamic Node Priority Policy

Processing Order (Parallel, Serial, or Customized)

Service IP Label

File Systems

File System Consistency Check

File Systems Recovery Method

File Systems/Directories to Export

File Systems/Directories to NFS Mount (NFSv2/3)

File Systems/Directories to NFS Mount (NFSv4)

Stable Storage Path (NFSv4)

Network for NFS Mount

Volume Groups

Concurrent Volume Groups


Raw Disk PVIDS

Fast Connect Services

Tape Resources

Application Servers

Highly Available Communication Links

Primary WLM Class

Secondary WLM Class

Miscellaneous Data

Auto Import Volume Groups

Disk Fencing Activated

File Systems Mounted before IP Configured

Comments

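Once a resource group has been configured from this worksheet, its definition and state can be reviewed with the standard utilities; the commands below are a sketch (the resource group name is illustrative):

   /usr/es/sbin/cluster/utilities/clshowres            # resources defined in each resource group
   /usr/es/sbin/cluster/utilities/clRGinfo rg_app1     # where a resource group is currently online
   /usr/es/sbin/cluster/utilities/cltopinfo            # the cluster topology the group runs on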


Cluster events worksheet

Use these worksheets to record information about cluster events in the cluster.

Table A-12 Cluster event worksheet

Cluster Name

Cluster Event Description

Cluster Event Method

Cluster Event Name

Event Command

Notify Command

Remote Notification Message Text

Remote Notification Message Location

Pre-Event Command

Post-Event Command

Event Recovery Command

Recovery Counter

Time Until Warning
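For the Notify Command and the pre-event and post-event command fields, PowerHA simply runs the script you name; a minimal notification sketch (paths, recipient, and logging choices are assumptions) might look like this:

   #!/bin/ksh
   # /usr/local/ha/notify_event.sh - illustrative event notification script
   EVENT_INFO="$*"                      # arguments passed by cluster event processing
   LOGFILE=/var/log/hacmp_notify.log
   print "$(date): cluster event notification: $EVENT_INFO" >> $LOGFILE
   print "Cluster event on $(hostname): $EVENT_INFO" | mail -s "PowerHA event" root
   exit 0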


Cluster file collections worksheet

Use these worksheets to record information about file collections in the cluster.

Table A-13 Cluster file collections worksheet

Cluster Name

File Collection name

File Collection description

Propagate files before verification

Propagate files automatically

Files to include in this collection

Automatic check time limit


Abbreviations and acronyms

ACL Access Control List

AIX Advanced Interactive Executive

API Application Programming Interface

ARP Address Resolution Protocol

ATM Asynchronous Transfer Mode

BOS Base Operating System

CA Certificate Authority

C-SPOC Cluster Single Point Of Control

CEC Central Electronic Complex

CGI Common Gateway Interface

CLI Command Line Interface

CLVM Concurrent Logical Volume Manager

CPU Central Processing Unit

CSM Cluster Systems Management

CWDM Coarse Wave Division Multiplexing

CWOF Cascading Without Fallback

DAC Disk Array Controller

DARE Dynamic Reconfiguration

DBFS Dial Back Fail Safe

DES Data Encryption Standard

DLPAR Dynamic LPAR

DNP Dynamic Node Priority

DNS Domain Name Service

DWDM Dense Wave Division Multiplexing

ECM Enhanced Concurrent Mode

ESS Enterprise Storage Server

FC Fibre Channel


FCIP Fibre Channel IP

FDDI Fiber Distributed Data Interface

GLVM Geographic LVM

GMD Geographic Mirror Device

GMVG Geographically Mirrored Volume Group

GPFS General Parallel File System

HACMP High Availability Cluster Multi-Processing

HACMP/ES HACMP Enhanced Scalability

HACMP/XD HACMP Extended Distance

HA-NFS High Availability NFS

HBA Host Bus Adapter

HPS High Performance Switch

HSC Hardware Service Console

HWAT Hardware Address Takeover

IBM International Business Machines Corporation

IHS IBM HTTP Server

IPAT IP Address Takeover

ITSO International Technical Support Organization

JBOD Just a Bunch Of Disks

JFS Journaled File System

JRE Java Runtime Environment

LAA Locally Administered Address

LAN Local Area Network

LDAP Lightweight Directory Access Protocol

LPAR Logical Partition

LUN Logical Unit Number

LV Logical Volume


LVCB Logical Volume Control Block

LVDD Logical Volume Device Driver

LVM Logical Volume Manager

MAC Media Access Control

MAL Mechanism Abstraction Layer

MIB Management Information Base

MPIO Multi-Path I/O

MPM Mechanism Pluggable Module

MTU Maximum Transmission Unit

NFS Network File System

NIM Network Interface Module

POL Priority Override Location

PP Physical Partition

PPK Private-Public Key

PPRC Peer-to-Peer Remote Copy

PV Physical Volume

PVID Physical Volume ID

RAC Real Application Cluster

RAID Redundant Array of Independent Disks

RDAC Redundant Disk Array Controller

RG Resource Group

RM Resource Monitor

RMC Resource Monitoring and Control

RPV Remote Physical Volume

RSCT Reliable Scalable Clustering Technology

SAN Storage Area Network

SCSI Small Computer System Interface

SDD Subsystem Device Driver

SPOF Single Point Of Failure

THL Trusted Host List

VG Volume Group

VGDA Volume Group Descriptor Area

VIO Virtual I/O

VIOS Virtual I/O Server

VLAN Virtual LAN

WLM Workload Manager


Related publications

The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this book.

IBM Redbooks publications

For information about ordering these publications, see “How to get Redbooks publications” on page 795.

- Implementing High Availability Cluster Multi-Processing (HACMP) Cookbook, SG24-6769

- HACMP 5.3, Dynamic LPAR, and Virtualization, REDP-4027

- IBM eServer pSeries HACMP V5.x Certification Study Guide Update, SG24-6375

- PowerVM Virtualization on IBM System p: Introduction and Configuration Fourth Edition, SG24-7940

- ILM Library: Information Lifecycle Management Best Practices Guide, SG24-7251

- SAN Volume Controller V4.3.0 Advanced Copy Services, SG24-7574

- IBM System Storage DS8000: Copy Services in Open Environments, SG24-6788

- Implementing the IBM System Storage SAN Volume Controller V4.3, SG24-6423

Other publications

These publications are also relevant as further information sources:

- HACMP for AIX Administration Guide, SC23-4862

- HACMP for AIX Planning Guide, SC23-4861

- HACMP for AIX Installation Guide, SC23-5209


- HACMP for AIX Troubleshooting Guide, SC23-5177

- HACMP/XD for Metro Mirror Planning and Administration Guide, SC23-4863

- HACMP/XD for Geographic LVM Planning and Administration Guide, SA23-1338

Online resources

These Web sites are also relevant as further information sources:

- PowerHA Technical Forum: use this forum to post questions and discuss how PowerHA (HACMP) can provide a high availability environment on the Power platform:

ibm.com/developerworks/forums/forum.jspa?forumID=1611

- For the latest list of recommended service packs for PowerHA, access the IBM Web site at:

http://www14.software.ibm.com/webapp/set2/sas/f/hacmp/home.html

- For current information about PowerHA licensing, refer to the PowerHA frequently asked questions page at:

http://www.ibm.com/systems/power/software/availability/aix/faq/index.html

- For more details, consult Oracle Metalink note number 404474.1, found at:

https://metalink.oracle.com

- For an updated list of supported storage and tape drives, check the IBM Web site at:

http://www-03.ibm.com/systems/power/software/availabilty/aix/index.html

- More details about IP configuration can be found at:

http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105185

- The IBM site for HTTP servers is at the following URL:

http://www.ibm.com/software/webservers/httpservers/

- Live Partition Mobility support flash:

http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10640

- OpenSSL project Web site:

http://www.openssl.org

- OpenSSH project Web site:

http://www.openssh.org


- OpenSSH on AIX Web site:

http://www.sourceforge.net/projects/openssh-aix

- PowerHA recommended maintenance levels Web site:

http://www-933.ibm.com/support/fixcentral/

- The Online Planning Worksheets (OLPW) can be downloaded from the PowerHA Web site:

http://www.ibm.com/systems/power/software/availability/aix/

- GLVM white papers:

http://www.ibm.com/systems/power/software/aix/whitepapers/aix_glvm.html
http://www.ibm.com/systems/p/software/whitepapers/hacmp_xd_glvm.html

- Automated recovery management with IBM HACMP/XD and PPRC white paper:

http://www.ibm.com/systems/p/software/whitepapers/hacmp_pprc.html

- Supported devices by PowerHA Web site:

http://www.ibm.com/common/ssi/rep_sm/2/897/ENUS5765-F62/index.html

- EtherChannel support and PowerHA Web site:

http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105185

- Understanding the Performance Implications of Cross-Site Mirroring with AIX’s Logical Volume Manager, which is available online at:

http://www.ibm.com/support/docview.wss?uid=tss1wp101269

- Using the Geographic LVM in AIX5L white paper:

http://www.ibm.com/systems/power/software/aix/whitepapers/aix_glvm.html

- HACMP/XD Geographic LVM: Planning and Administration Guide:

http://publib.boulder.ibm.com/epubs/pdf/a2313387.pdf

- The AIX 6.1 TL2 Information Center documentation, for details about mirror pools:

http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/com.ibm.aix.baseadmn/doc/baseadmndita/mirrorpools.htm

How to get Redbooks publications

You can search for, view, or download Redbooks publications, Redpapers publications, Technotes, draft publications, and Additional materials, as well as order hardcopy Redbooks publications, at this Web site:

ibm.com/redbooks


Help from IBM

IBM Support and downloads:

ibm.com/support

IBM Global Services:

ibm.com/services





SG24-7739-00 ISBN 0738433187

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.

For more information: ibm.com/redbooks


PowerHA for AIX Cookbook

Extended case studies with practical disaster recovery examples

Explore the latest PowerHA V5.5 features

Enterprise ready

This IBM Redbooks publication will help you install, tailor, and configure the new PowerHA Version 5.5, and understand new and improved features such as the WebSMIT gateway, non-disruptive migrations, C-SPOC enhancements, and disaster recovery (DR) configurations such as GLVM in asynchronous mode.

This publication provides a broad understanding of the PowerHA and PowerHA Extended Distance (PowerHA/XD) architecture. If you plan to install, migrate, or administer a high availability cluster, this book is right for you. Disaster recovery elements and how PowerHA fulfills these necessities are also presented in detail.

This cookbook is designed to help AIX professionals who are seeking a comprehensive, task-oriented guide for developing the knowledge and skills required for PowerHA cluster design and implementation, as well as for daily system administration. It provides a combination of theory and practical experience.

This book will be especially useful for system administrators currently running PowerHA or PowerHA Extended Distance (XD) clusters who might want to consolidate their environment and move to the new PowerHA Version 5.5.

Back cover