Day-To-Day activities on HACMP.
Overview
This document contains the operational procedures for day-to-day
activities on HACMP.
Contents:
1. Basics
2. HACMP Installation
3. HACMP Configuration
4. Disk Heartbeat
5. HACMP Startup/Stop
6. Resource Group Management
7. Application startup/stop scripts
8. HACMP Logical Volume Management
9. Cluster verification
10. User and Group Administration
Basics:
Cluster topology: the nodes, networks, storage, clients, and
persistent node IP labels/devices.
Cluster resources: components that HACMP can move from one
node to another, e.g. service labels, file systems, and applications.
HACMP Services:
Cluster communication daemon (clcomdES)
Cluster Manager (clstrmgrES)
Cluster information daemon (clinfoES)
Cluster locks manager (cllockd)
Cluster SMUX peer daemon (clsmuxpd)
HACMP daemons: clstrmgr, clinfo, clsmuxpd, cllockd.
HACMP installation:
smitty install_all → fast path for installation
Start the cluster communication daemon → startsrc -s clcomdES
Upgrading the cluster options: node-by-node migration and snapshot
conversion
Steps for migration:
Stop cluster services on all nodes
Upgrade the HACMP software on each node
Start cluster services on one node at a time
Convert from a supported version of HAS to HACMP
The current software should be committed
Save a snapshot
Remove the old version
Install HA 5.1 and verify
Check the previous version of the cluster: lslpp -h "cluster*"
To save your HACMP configuration, create a snapshot in HACMP.
Remove HACMP: smitty install_remove (select software name
cluster*)
lppchk -v and lppchk -c "cluster*" both run clean if the
installation is OK.
After you have installed HA on the cluster nodes, you need to convert
and apply the snapshot. Converting the snapshot must be performed
before rebooting the cluster nodes.
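The checks and removal step above amount to a short command sequence. This is a sketch for an AIX cluster node only; the smitty invocation opens an interactive SMIT panel:

```
# Check which cluster filesets are currently installed and their state
lslpp -h "cluster*"

# Both commands should run clean if the installation is OK
lppchk -v
lppchk -c "cluster*"

# Remove an old HACMP version (select the cluster* filesets in SMIT)
smitty install_remove
```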
Cluster Configuration:
All HACMP configuration is done through the SMIT menus. The
previous section described what the configuration is; this section
tells you how to do it. Unless noted, this need only be done on one
server in the HACMP cluster; HACMP copies everything to the other
server.
smitty hacmp
→ cluster configuration
→ cluster topology
→ configure cluster
→ add a cluster definition
→ cluster name: cl_mgmt
→ configure nodes
→ add cluster nodes
→ enter the hostnames of the two nodes, separated by spaces
→ configure adapters
→ There are 8 adapters to configure in a standard implementation, so
this screen must be completed 8 times, once for each adapter. The
8 are: service, standby, boot, and serial adapters for each of the
servers.
Note that
· All labels must be in /etc/hosts before you do this step
· Adapter IP label must match the entry in /etc/hosts
· Network type is "ether" except for the serial adapters, which are
"rs232"
· Network attribute is "public" for all adapters except the serial
adapters, which are "serial"
· Network name is "ether1" for all adapters except the serial
adapters, which are "serial1"
· Node name is required for all adapters
· Other fields can be left blank
→ Show cluster topology
→ Show cluster topology
Check that this output looks like the cluster topology shown below.
→ Synchronize cluster topology
→ Run this with defaults. If it fails, check the output and correct any
errors (these may be errors in the network or AIX configuration as
well as in HACMP).
→ Cluster Resources
→ Define Resource Groups
→ Add a resource group
→ See resource group definitions and names below. The first three
lines of the definition are defined in this panel.
→ Define Application Servers
→ Add an application server
→ See below for application server configuration details.
→ Change/Show Resources/Attributes for a resource group
→ For the resource group, fill in the attributes as shown below.
→ Synchronize cluster resources
→ Synchronize with the defaults. If it fails, check the output and fix
any problems.
Once your cluster has completed a resource synchronization with no
errors (and you are happy with any warnings), you have completed the
HACMP configuration. You may now start HACMP.
smit hacmp
→ Cluster services
→ Start cluster services
Disk Heartbeat:
Disk heartbeating typically requires 4 seeks/second: each of the
two nodes writes to the disk and reads from the disk once per
second.
Configuring disk heartbeat:
Vpaths are configured as member disks of an enhanced
concurrent volume group: smitty lvm → select volume groups →
Add a volume group → give the VG name, PV names, and VG major
number, and set "Create VG concurrent capable" to enhanced
concurrent.
Import the new VG on all nodes using smitty importvg, or:
importvg -V 53 -y c23vg vpath5
Create the diskhb network: smitty hacmp → extended
configuration → extended topology configuration → configure
HACMP networks → Add a network to the HACMP cluster → choose
diskhb
Add 2 communication devices: smitty hacmp → extended
configuration → extended topology configuration → Configure
HACMP communication Interfaces/Devices → Add
communication interfaces/devices → Add pre-defined
communication interfaces and devices → communication
devices → choose the diskhb
Create one communication device for the other node as well.
Testing disk heartbeat connectivity: /usr/sbin/rsct/dhb_read is
used to test the validity of a diskhb connection.
dhb_read -p vpath0 -r receives data over the diskhb network
dhb_read -p vpath3 -t transmits data over the diskhb network
Monitoring disk heartbeat: monitor the activity of the disk
heartbeats via lssrc -ls topsvcs.
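Put together, testing a diskhb link is a two-node exercise (vpath names are this document's examples; start the receiver first):

```
# On node A: listen for heartbeat packets on the shared disk
/usr/sbin/rsct/dhb_read -p vpath0 -r

# On node B: transmit heartbeat packets over the same diskhb network
/usr/sbin/rsct/dhb_read -p vpath3 -t

# On either node: confirm topology services sees the diskhb network
lssrc -ls topsvcs
```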
Cluster Startup/Stop:
Cluster startup: smit cl_admin → Manage HACMP Services → Start
Cluster Services
Note: monitor with /tmp/hacmp.out and check for
node_up_complete.
Cluster stop: smitty cl_admin → Manage HACMP Services → Stop
Cluster Services
Note: monitor with /tmp/hacmp.out and check for
node_down_complete.
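A minimal way to drive and watch either transition from the command line (the clstart/clstop SMIT fast paths are an assumption; the log path is from this document):

```
# Start (or stop) cluster services via the SMIT fast paths
smitty clstart      # assumed fast path for Start Cluster Services
smitty clstop       # assumed fast path for Stop Cluster Services

# Watch the event log until the transition completes
tail -f /tmp/hacmp.out | grep -E "node_(up|down)_complete"
```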
Resource Group Management:
Resource group takeover relationship:
1. Cascading
2. Rotating
3. Concurrent
4. Custom
Cascading:
A cascading resource group is activated on its home node by
default.
The resource group can be activated on a lower-priority node if
the highest-priority node is not available at cluster startup.
On node failure, the resource group falls over to the available
node with the next priority.
Upon node reintegration into the cluster, a cascading resource
group falls back to its home node by default.
Attributes:
1. Inactive takeover (IT): initial acquisition of a resource group in
case the home node is not available.
2. Fallover priority can be configured in the default node priority list.
3. Cascading without fallback (CWOF) is an attribute that modifies
the fallback behavior: if the CWOF flag is set to true, the resource
group will not fall back to any joining node; when the flag is false,
the resource group falls back to the higher-priority node.
Rotating:
At cluster startup, the first available node in the node priority list
activates the resource group.
If the resource group is on a takeover node, it will never fall back
to a higher-priority node if one becomes available.
Rotating resource groups require the use of IP address takeover.
The nodes in the resource chain must all share the same network
connection to the resource group.
Concurrent:
A concurrent RG can be active on multiple nodes at the same
time.
Custom:
Users have to explicitly specify the desired startup, fallover, and
fallback procedures.
Custom resource groups support only IPAT via aliasing service IP
addresses.
Startup Options:
Online on home node only
Online on first available node
Online on all available nodes
Online using distribution policy → the resource group will only be
brought online if the node has no other resource group online.
You can check this with lssrc -ls clstrmgrES.
Fallover Options:
Fallover to next priority node in list
Fallover using dynamic node priority → the fallover node can be
selected on the basis of its available CPU, its available memory,
or the lowest disk usage. HACMP uses RSCT to gather this
information, and the resource group falls over to the node that
best meets the chosen criterion.
Bring offline → the resource group will be brought offline in the
event of an error. This option is designed for resource groups
that are online on all available nodes.
Fallback Options:
Fallback to higher priority node in the list
Never fallback
Resource group operations:
Bring a resource group offline: smitty cl_admin → select
HACMP resource group and application management → Bring a
resource group offline.
Bring a resource group online: smitty hacmp → select HACMP
resource group and application management → Bring a resource
group online.
Move a resource group: smitty hacmp → select HACMP
resource group and application management → Move a resource
group to another node.
To find the resource group information: clRGinfo -P
Resource group states: online, offline, acquiring, releasing, error,
temporary error, or unknown.
Application Startup/Stop Scripts:
smitty hacmp
→ cluster configuration
→ Cluster Resources
→ Define Application Servers
→ Add an application server
Configure HACMP application monitoring: smitty
cm_cfg_appmon → Add a process application monitor → give the
process names and the application startup/stop scripts.
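An application server's start and stop scripts are plain shell scripts that HACMP runs when it acquires or releases the resource group. The following is a minimal sketch only; the application command, PID-file path, and function names are hypothetical stand-ins, not anything HACMP prescribes:

```shell
#!/bin/sh
# Sketch of an HACMP application server start/stop script.
# APP_CMD and PIDFILE are hypothetical; substitute the real application.
APP_CMD="sleep 300"          # stand-in for the real application binary
PIDFILE=/tmp/myapp.pid       # hypothetical PID file location

app_start() {
    # If a live PID is recorded, the application is already running.
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        echo "already running"
        return 0
    fi
    $APP_CMD &               # launch the application in the background
    echo $! > "$PIDFILE"     # record its PID for the stop script
    echo "started"
}

app_stop() {
    # Kill the recorded PID and clean up so a fallover can restart cleanly.
    [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")" 2>/dev/null
    rm -f "$PIDFILE"
    echo "stopped"
}

case "${1:-}" in
    start) app_start ;;
    stop)  app_stop ;;
esac
```

Both scripts must exit 0 on success; HACMP treats a non-zero exit from a start or stop script as an event failure.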
HACMP Logical Volume Management:
C-SPOC LVM: smitty cl_admin → HACMP Logical Volume
Management
Shared volume groups
Shared logical volumes
Shared file systems
Synchronize shared LVM mirrors (synchronize by
VG/synchronize by LV)
Synchronize a shared VG definition
C-SPOC concurrent LVM: smitty cl_admin → HACMP
Concurrent LVM
Concurrent volume groups
Concurrent logical volumes
Synchronize concurrent LVM mirrors
C-SPOC physical volume management: smitty
cl_admin → HACMP Physical Volume Management
Add a disk to the cluster
Remove a disk from the cluster
Cluster disk replacement
Cluster datapath device management
Cluster Verification:
smitty hacmp → Extended verification → Extended verification and
synchronization. Verification log files are stored in /var/hacmp/clverify.
/var/hacmp/clverify/clverify.log → verification log
/var/hacmp/clverify/pass/nodename → if verification succeeds
/var/hacmp/clverify/fail/nodename → if verification fails
Automatic cluster verification runs each time you start cluster services
and every 24 hours.
Configure automatic cluster verification: smitty hacmp → problem
determination tools → HACMP verification → Automatic cluster
configuration monitoring.
User and Group Administration:
smitty cl_usergroup → Users in an HACMP cluster
Add a user to the cluster
List users in the cluster
Change/show characteristics of a user in the cluster
Remove a user from the cluster
smitty cl_usergroup → Groups in an HACMP cluster
Add a group to the cluster
List groups in the cluster
Change a group in the cluster
Remove a group
smitty cl_usergroup → Passwords in an HACMP cluster
FAQ’S
Does HACMP work on different operating systems?
Yes. HACMP is tightly integrated with the AIX 5L operating system and System p servers
allowing for a rich set of features which are not available with any other combination of
operating system and hardware. HACMP V5 introduces support for the Linux operating system
on POWER servers. HACMP for Linux supports a subset of the features
available on AIX 5L; however, this multi-platform support provides a
common availability infrastructure for your entire enterprise.
What applications work with HACMP?
All popular applications work with HACMP including DB2, Oracle, SAP, WebSphere, etc.
HACMP provides Smart Assist agents to let you quickly and easily configure HACMP with
specific applications. HACMP includes flexible configuration parameters that let you easily set it
up for just about any application there is.
Does HACMP support dynamic LPAR, CUoD, On/Off CoD,
or CBU?
HACMP supports Dynamic Logical Partitioning, Capacity Upgrade on Demand, On/Off Capacity
on Demand and Capacity Backup Upgrade.
If a server has LPAR capability, can two or more
LPARs be configured with unique instances of HACMP
running on them without incurring additional
license charges?
Yes. HACMP is a server product that has one charge unit: number of processors on
which HACMP will be installed or run. Regardless of how many LPARs or instances of AIX
5L that run in the server, you are charged based on the number of active processors in the
server that is running HACMP. Note that HACMP configurations containing
multiple LPARs within a single server may represent a potential single
point of failure. To avoid this, it is recommended that the backup for an
LPAR be an LPAR on a different server or a standalone server.
Does HACMP support non-IBM hardware or operating
systems?
Yes. HACMP for AIX 5L supports the hardware and operating systems
specified in the manual; HACMP V5.4 includes support for Red Hat and
SUSE Linux.
HACMP interview questions

a. What characters should a hostname contain for HACMP configuration?
The hostname cannot contain the following characters: -, _, * or other
special characters.

b. Can the Service IP and Boot IP be in the same subnet?
No. The Service IP address and Boot IP address cannot be in the same
subnet. This is a basic requirement for HACMP cluster configuration.
The verification process does not allow the IP addresses to be in the
same subnet, and the cluster will not start.

c. Can multiple Service IP addresses be configured on a single
Ethernet card?
Yes. Using the SMIT menu, multiple Service IP addresses can be
configured to run on a single Ethernet card. It only requires selecting
the same network name for the specific Service IP addresses in the
SMIT menu.

d. What happens when a NIC holding the Service IP goes down?
When the NIC card running the Service IP address goes down, HACMP
detects the failure and fails over the Service IP address to an available
standby NIC on the same node or to another node in the cluster.

e. Can multiple Oracle Database instances be configured on a single
node of an HACMP cluster?
Yes. Multiple database instances can be configured on a single node of
an HACMP cluster. For this, one needs separate Service IP addresses
over which the listeners for each Oracle Database will run. Hence one
can have separate resource groups, each owning one Oracle instance.
This configuration is useful if a single Oracle Database instance failing
on one node should be failed over to another node without disturbing
the other running Oracle instances.

f. Can HACMP be configured in an Active-Passive configuration?
Yes. For an Active-Passive cluster configuration, do not configure any
Service IP on the passive node. Also, for all the resource groups on the
active node, specify the passive node as the next node in the priority
list to take over in the event of failure of the active node.

g. Can a file system mounted over the NFS protocol be used for disk
heartbeat?
No. A volume mounted over the NFS protocol is a file system to AIX,
and since a disk device is required for the enhanced concurrent capable
volume group used for disk heartbeat, an NFS file system cannot be
used for configuring the disk heartbeat. One needs to provide a disk
device to the AIX hosts over the FCP or iSCSI protocol.

h. Which HACMP log files are available for troubleshooting?
The following log files can be used for troubleshooting:
1. /var/hacmp/clverify/current/nodename/* contains logs from the
current execution of cluster verification.
2. /var/hacmp/clverify/pass/nodename/* contains logs from the last
time verification passed.
3. /var/hacmp/clverify/fail/nodename/* contains logs from the last
time verification failed.
4. /tmp/hacmp.out records the output generated by the event scripts
of HACMP as they execute.
5. /tmp/clstrmgr.debug contains time-stamped messages generated by
HACMP clstrmgrES activity.
6. /tmp/cspoc.log contains messages generated by HACMP C-SPOC
commands.
7. /usr/es/adm/cluster.log is the main HACMP log file. HACMP error
messages and messages about HACMP-related events are appended to
this log.
8. /var/adm/clavan.log keeps track of when each application managed
by HACMP is started or stopped, and when the node on which an
application is running stops.
9. /var/hacmp/clcomd/clcomd.log contains messages generated by the
HACMP cluster communication daemon.
10. /var/ha/log/grpsvcs. tracks the execution of internal activities of
the grpsvcs daemon.
11. /var/ha/log/topsvcs. tracks the execution of internal activities of
the topsvcs daemon.
12. /var/ha/log/grpglsm tracks the execution of internal activities of
the grpglsm daemon.
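A quick triage pass over those logs might look like this (paths taken from the list above; AIX-only):

```
# Follow event processing as HACMP runs its event scripts
tail -f /tmp/hacmp.out

# Scan the main HACMP log for recent errors
grep -i error /usr/es/adm/cluster.log | tail

# See which nodes last passed or failed verification
ls /var/hacmp/clverify/pass /var/hacmp/clverify/fail
```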
Key PowerHA terms
The following terms are used throughout this article and are helpful to
know when discussing PowerHA:
Cluster: A logical grouping of servers running PowerHA.
Node: An individual server within a cluster.
Network: Although normally this term would refer to a larger
area of computer-to-computer communication (such as a WAN),
in PowerHA network refers to a logical definition of an area for
communication between two servers. Within PowerHA, even
SAN resources can be defined as a network.
Boot IP: This is a default IP address a node uses when it is first
activated and becomes available. Typically—and as used in this
article—the boot IP is a non-routable IP address set up on an
isolated VLAN accessible to all nodes in the cluster.
Persistent IP: This is an IP address a node uses as its regular
means of communication. Typically, this is the IP through which
systems administrators access a node.
Service IP: This is an IP address that can "float" between the
nodes. Typically, this is the IP address through which users
access resources in the cluster.
Application server: This is a logical configuration to tell
PowerHA how to manage applications, including starting and
stopping applications, application monitoring, and application
tunables. This article focuses only on starting and stopping an
application.
Shared volume group: This is a PowerHA-managed volume
group. Instead of configuring LVM structures like volume
groups, logical volumes, and file systems through the operating
system, you must use PowerHA for disk resources that will be
shared between the servers.
Resource group: This is a logical grouping of service IP
addresses, application servers, and shared volume groups that
the nodes in the cluster can manage.
Failover: This is a condition in which resource groups are
moved from one node to another. Failover can occur when a
systems administrator instructs the nodes in the cluster to do so
or when circumstances like a catastrophic application or server
failure forces the resource groups to move.
Failback/fallback: This is the action of moving resource groups
back to the nodes on which they were originally running after a
failover has occurred.
Heartbeat: This is a signal transmitted over PowerHA networks
to check and confirm resource availability. If the heartbeat is
interrupted, the cluster may initiate a failover depending on the
configuration.