Welcome to HACMP Introduction Demo Class
Email: [email protected] us: +91 8099776681
www.kerneltraining.com
Unit objectives
After completing this unit, you should be able to:
Define High Availability and explain why it is needed
List the key considerations when designing and implementing a high availability cluster
Outline the features and benefits of HACMP for AIX
Describe the components of an HACMP for AIX cluster
Explain how HACMP for AIX operates in typical cases
High Availability and HACMP concepts
After completing this topic, you should be able to:
Define High Availability
Recognize that eliminating single points of failure (SPOFs) is part of the HACMP implementation process
Outline the features and benefits of HACMP for AIX
Describe the HACMP concepts of topology and resources
Give examples of topology components and resources
Provide a brief description of the software and hardware components of a typical HACMP cluster
So, what is High Availability?
High Availability characteristics:
  The masking or elimination of both planned and unplanned downtime
  The elimination of single points of failure (SPOFs)
  Fault resilience and system hardening
  No specialized hardware requirement
[Figure: clients connect over a WAN to a production node/LPAR; the workload falls over to a standby node/LPAR.]
Eliminating single points of failure
Cluster object     Eliminated as a single point of failure by:
Node               Using multiple nodes
Power source       Using multiple circuits or uninterruptible power supplies
Network adapter    Using redundant network adapters
Network            Using multiple networks to connect nodes
TCP/IP subsystem   Using non-IP networks to connect adjoining nodes and clients
Disk adapter       Using redundant disk adapters or multipath hardware
Disk               Using multiple disks with mirroring or RAID
Application        Adding a node for takeover; configuring an application monitor
VIO Server         Implementing dual VIO Servers
Site               Adding an additional site
The fundamental goal of (successful) cluster design is the elimination of single points of failure (SPOFs).
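One way to make the SPOF table concrete is to count how many independent instances of each cluster object a design contains and flag anything with fewer than two. The sketch below is only an illustration: the inventory contents and the `find_spofs` helper are hypothetical and are not part of HACMP.

```python
# Hypothetical inventory: cluster object -> number of independent instances.
inventory = {
    "node": 2,
    "power source": 2,
    "network adapter": 2,
    "network": 1,
    "disk adapter": 2,
    "disk": 2,
    "VIO server": 1,
    "site": 1,
}


def find_spofs(inventory):
    """Return the cluster objects that have fewer than two instances."""
    return sorted(name for name, count in inventory.items() if count < 2)


for name in find_spofs(inventory):
    print(f"single point of failure: {name}")
```

Whether a flagged item matters depends on the availability goal; a single site, for example, is acceptable for many clusters.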
High availability clusters (HACMP base)
System p and AIX RAS features include:
  Application and Partition Mobility
  First Failure Data Capture (FFDC)
  Dynamic CPU Deallocation
  Flexible Service Processor
  Redundant Power and Cooling
  Error Correction Checking Memory
  Hot Swap Adapters
  Dynamic Kernel
  Journaled Filesystem
  Redundant Data Paths
  Dual Disk Adapters (MPIO)
  Data Mirroring and/or Striping
  Hot Swap / Hot Spare Storage
  Redundant Power/Cooling for Storage Arrays
With High Availability Clustering (HACMP):
  Protection against node and OS failure with redundant nodes
  Protection against NIC failure with redundant network adapters
  Protection against network failure with redundant networks
  Self-healing clusters with application monitoring
  Protection against site failure (typically limited by SAN infrastructure), or no distance limitations with HACMP/XD
What about site failure?
Limited distance (LVM mirroring and SAN): HACMP for AIX
Extended distance: Geographic Clustering Solution (that is, HACMP/XD)
  Distance unlimited
  Application, disk, and network independent
  Automated site failover and reintegration
  A single cluster across two sites
  Get more details in HACMP System Administration III (AU620)
[Figure: data replication between sites in Toronto and Brussels, labeled Metro Mirror/PPRC, GLVM, and GeoRM.]
IBM's HA solution for AIX
HACMP for AIX characteristics:
  Stands for High Availability Cluster Multi-processing
  Is based on cluster technology (RSCT)
  Provides two environments (which can co-exist simultaneously):
    Serial (High Availability): the process of ensuring that an application is available for use through the use of serially accessible shared data and duplicated resources
    Parallel (Cluster Multiprocessing): concurrent access to shared data
Fundamental HACMP concepts
Topology: Physical “networking centric” components
Resources: Entities that are being made highly available
Resource group: A collection of resources, which HACMP controls as a single unit
  A given resource can appear in at most one resource group
  Resource group policies:
    Startup policy: determines which node the resource group is activated on
    Fallover policy: determines the target when there is a failure
    Fallback policy: determines fallback behavior
Customization: The process of augmenting HACMP, typically via implementing scripts
  Minimum: application start and stop scripts
  Optional:
    Application monitoring scripts (highly recommended!)
    Event customization
      Notification, pre- and post-event scripts, recovery scripts, user-defined events, time until warning (config_too_long timeout)
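The resource group concept can be sketched as a small data structure. This is a minimal model, not HACMP's implementation: the `ResourceGroup` class, the policy label strings, and both helper functions are hypothetical, and the fallover rule shown (take the next surviving node in priority order) is only one possible policy. The group, node, and volume group names are borrowed from the planning example later in this deck.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceGroup:
    name: str
    nodes: list                      # participating nodes, highest priority first
    resources: set = field(default_factory=set)
    # Free-text policy labels; illustrative values, not HACMP option names.
    startup: str = "highest-priority node"
    fallover: str = "next node in priority order"
    fallback: str = "return to highest-priority node"


def check_unique_resources(groups):
    """Raise ValueError if a resource appears in more than one resource group."""
    owner = {}
    for group in groups:
        for resource in sorted(group.resources):
            if resource in owner:
                raise ValueError(
                    f"{resource} is in both {owner[resource]} and {group.name}"
                )
            owner[resource] = group.name
    return owner


def fallover_target(group, failed_node, active_nodes):
    """Return the highest-priority surviving node, or None if none is left."""
    for node in group.nodes:
        if node != failed_node and node in active_nodes:
            return node
    return None


dbrg = ResourceGroup("dbrg", ["nodea", "nodeb"], {"dbvg", "database"})
httprg = ResourceGroup("httprg", ["nodeb", "nodea"], {"httpvg", "webserv"})
check_unique_resources([dbrg, httprg])
print(fallover_target(dbrg, "nodea", {"nodea", "nodeb"}))  # prints nodeb
```

Keeping the node list in priority order means one list can drive both startup placement and fallover target selection.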
A highly available cluster
[Figure: two nodes, each running clstrmgr, attached to shared storage; a resource group falls over from one node to the other.]
Fundamental concepts: a cluster comprises physical components (topology) and logical components (resource groups and resources).
HACMP's topology components (1 of 2)
[Figure: nodes connected by an IP network through communication interfaces and by a non-IP network through communication devices.]
The topology components consist of a cluster, nodes, and the technology that connects them together.
HACMP’s topology components (2 of 2)
[Figure labels: Ethernet / Etherchannel, servers and a PC, non-IP links (heartbeat on disk, RS232/422), SAN, RS/6000 nodes, DS8000 and DS4000 storage, Fibre Channel.]
Node
  Any-to-any, including LPARs
  Minimum number of physical adapters for redundancy must be considered
Networking
  Ethernet
    Physical and virtual
    Etherchannel
  Non-IP
    Heartbeat on disk, RS-232, Target-mode SCSI
Shared storage
  Physical
    SCSI or Fibre Channel
  Virtual SCSI
What is HACMP?
An application which:
  Controls where resource groups run
  Monitors and reacts to events
  Provides tools for cluster-wide configuration and synchronization
  Relies on other AIX subsystems (ODM, LVM, RSCT, TCP/IP, SRC, and so on)
[Figure: the Cluster Manager subsystem (clstrmgrES), containing the topology manager, resource manager, event manager, and SNMP manager, shown with RSCT (topsvcs, grpsvcs, and RMC subsystems), snmpd, clinfoES, clcomdES, and clstat.]
Additional features of HACMP
HACMP is shipped with utilities to simplify configuration, monitoring, customization, and cluster administration.
[Figure labels: OLPW, smit via web, Configuration Assistant, CSPOC, DARE, clstrmgrES, SNMP, Verification, Auto tests, Tivoli Integration, Application Monitoring.]
Some assembly required
HACMP can be used out of the box; however, some assembly is required.
Minimum:
  Application Start/Stop/Monitor scripts
Optional:
  Customized pre/post event scripts
  Reaction to events
    Error notification methods
    User-defined events (UDEs)
    Cluster state change
HACMP's flexibility allows for complex customization in order to meet availability goals.
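As an illustration of what an application monitor script might check, the sketch below tests whether the process recorded in a PID file still exists. It assumes the common convention that a monitor reports health through its exit status (0 for healthy, non-zero for failed); the PID file approach and the function names are hypothetical, and production scripts on AIX are often written in a shell language rather than Python.

```python
import os


def is_running(pid_file):
    """Return True if pid_file names a process that currently exists."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False              # missing or malformed PID file
    if pid <= 0:
        return False              # not a valid single-process ID
    try:
        os.kill(pid, 0)           # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True               # the process exists but belongs to another user
    return True


def monitor_exit_code(pid_file):
    """Map the health check to an exit status: 0 healthy, 1 failed."""
    return 0 if is_running(pid_file) else 1
```

A wrapper script would pass the result of `monitor_exit_code` to `sys.exit`. A real monitor would usually go further than process existence, for example by issuing a test request to the application.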
Let’s review
1. Which of the following items are examples of topology components in HACMP? (Select all that apply.)
  a. Node
  b. Network
  c. Service IP label
  d. Hard disk drive
2. True or False? All nodes in an HACMP cluster must have roughly equivalent performance characteristics.
3. Which of the following is a characteristic of high availability?
  a. High availability always requires specially designed hardware components.
  b. High availability solutions always require manual intervention to ensure recovery following fallover.
  c. High availability solutions never require customization.
  d. High availability solutions use redundant standard equipment (no specialized hardware).
4. True or False? A thorough design and detailed planning is required for all high availability solutions.
Let’s review solutions
1. Which of the following items are examples of topology components in HACMP? (Select all that apply.)
  a. Node
  b. Network
  c. Service IP label
  d. Hard disk drive
  Answer: a and b
2. True or False? All nodes in an HACMP cluster must have roughly equivalent performance characteristics.
  Answer: False
3. Which of the following is a characteristic of high availability?
  a. High availability always requires specially designed hardware components.
  b. High availability solutions always require manual intervention to ensure recovery following fallover.
  c. High availability solutions never require customization.
  d. High availability solutions use redundant standard equipment (no specialized hardware).
  Answer: d
4. True or False? A thorough design and detailed planning is required for all high availability solutions.
  Answer: True
What does HACMP do?
After completing this topic, you should be able to:
Describe the failures that HACMP detects directly
Provide an overview of the standby and takeover cluster configuration options in HACMP
Describe some of the considerations and limits of an HACMP cluster
Just what does HACMP do?
HACMP functions:
  Monitors the states of nodes, networks, network adapters and devices
  Strives to keep resource groups highly available
  Optionally, monitors the state of the applications, and can be customized to react to every possible failure
What happens when something fails?
How the cluster responds to a failure depends on what has failed, what the resource group's fallover policy is, and whether there are any resource group dependencies.
Typically, another equivalent component takes over the duties of the failed component (for example, another node takes over from a failed node).
What happens when a problem is fixed?
How the cluster responds to the recovery of a failed component depends on what has recovered, what the resource group's fallback policy is, and the resource group dependencies.
Typically, administrators need to indicate or confirm that the fixed component is approved for use.
Some components are integrated automatically; for instance, when a communication interface recovers.
Standby (active/passive) with fallback
One node is primary.
RG can be configured to come online on the primary or any node.
[Figure: placement of resource group A on nodes USA and UK through the events "Node USA fails", "USA returns", "Node UK fails", and "UK returns"; one step is marked "(no change)".]
Standby (active/passive) without fallback
Eliminates another outage
Reduces downtime
[Figure: placement of resource group A on nodes USA and UK through the events "USA fails", "USA returns", "UK fails", and "UK returns".]
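The difference between the two standby configurations can be shown with a small simulation. This is a simplified sketch under stated assumptions, not HACMP's placement logic: the `place` and `simulate` functions and the event format are hypothetical, and each resource group move is counted as one interruption for the application.

```python
def place(priority, current, active, fallback):
    """Choose the hosting node.

    With fallback, the highest-priority active node always wins.
    Without fallback, the current host keeps the group while it stays up.
    """
    candidates = list(priority) if fallback else [current] + list(priority)
    for node in candidates:
        if node in active:
            return node
    return None


def simulate(events, fallback, priority=("USA", "UK")):
    """Replay fail/return events; return the host after each event and the move count."""
    active = set(priority)
    current = priority[0]
    history = []
    moves = 0
    for action, node in events:
        if action == "fail":
            active.discard(node)
        else:  # "return"
            active.add(node)
        target = place(priority, current, active, fallback)
        if target != current:
            moves += 1
        current = target
        history.append(current)
    return history, moves


events = [("fail", "USA"), ("return", "USA")]
print(simulate(events, fallback=True))   # (['UK', 'USA'], 2)
print(simulate(events, fallback=False))  # (['UK', 'UK'], 1)
```

With fallback, the return of USA triggers a second move of the resource group; without fallback, the group stays on UK, which is the extra outage this configuration avoids.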
Mutual takeover: Active/Active
Very common
No one node/LPAR is left idle
[Figure: placement of resource groups A and B across two nodes before and after "UK fails"; both panels are marked "(with Fallback)".]
Concurrent: Multiple active nodes
USA, Germany, and UK are all running Application A, each using a separate IP address.
If nodes fail, the application remains continuously available as long as there are surviving nodes to run on.
Fixed nodes resume running their copy of the application.
Application must be designed to run simultaneously on multiple nodes.
This has the potential for essentially zero downtime.
Points to ponder
Resource groups:
  Must be serviced by at least two nodes
  Can have different policies
  Can be migrated (manually or automatically) to rebalance loads
Clusters:
  Must have at least one IP network and one non-IP network
  Need not have any shared storage
  Can have any combination of supported nodes *
  Can be split across two sites
    Might or might not require replicating data (HACMP/XD).
Applications:
  Can be restarted via monitoring
  Must be manageable via scripts (start/restart and stop)
* Application performance requirements and other operational issues almost certainly impose practical constraints on the size and complexity of a given cluster.
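The "must" rules above lend themselves to a simple design check. The sketch below encodes two of them; the data layout and the `validate_cluster` function are hypothetical and unrelated to HACMP's own verification utility.

```python
def validate_cluster(networks, resource_groups):
    """Return a list of problems; an empty list means both rules are met.

    networks        -- mapping of network name to "ip" or "non-ip"
    resource_groups -- mapping of resource group name to its list of nodes
    """
    problems = []
    kinds = set(networks.values())
    if "ip" not in kinds:
        problems.append("no IP network defined")
    if "non-ip" not in kinds:
        problems.append("no non-IP network defined")
    for name, nodes in sorted(resource_groups.items()):
        if len(set(nodes)) < 2:
            problems.append(
                f"resource group {name} is serviced by fewer than two nodes"
            )
    return problems


print(validate_cluster({"public": "ip"}, {"dbrg": ["nodea"]}))
```

For the single-network, single-node design passed in here, the check reports both a missing non-IP network and an under-served resource group.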
Other considerations for HACMP
Design, planning, testing
Focus on service and availability
Apply appropriate risk analysis
Disciplined system administration practices
Documented operational procedures
[Figure labels: High availability, Continuous operation, Continuous availability, Systems management, People, Data, Hardware, Software, Environment, Networking.]
Things HACMP does not do
Back-up and restoration
Time synchronization
Application specific configuration
System administration tasks unique to each node
When is HACMP not the correct solution?
Zero downtime required
  Maybe a fault tolerant system is the correct choice.
  Availability 7x24x365; HACMP occasionally needs to be shut down for maintenance.
  Life-critical environments.
Security issues
  Too little security
    Many people can change the environment.
  Too much security
    C2 and B1 environments might not allow HACMP to function as designed.
Unstable environments
  HACMP cannot make an unstable and poorly managed environment stable.
  HACMP tends to reduce the availability of poorly managed systems.
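To judge whether a zero-downtime requirement is realistic, it can help to convert between an availability percentage and hours of downtime per year. The arithmetic below is generic and not specific to HACMP; the 99.9% target and the 4 hours of maintenance are hypothetical figures chosen only for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Hours of downtime per year allowed by a given availability target."""
    return HOURS_PER_YEAR * (100 - availability_percent) / 100


def availability_from_downtime(downtime_hours):
    """Availability percentage achieved with a given yearly downtime."""
    return 100 * (1 - downtime_hours / HOURS_PER_YEAR)


print(f"{downtime_hours_per_year(99.9):.2f} hours")   # 8.76 hours
print(f"{availability_from_downtime(4):.3f} percent") # 99.954 percent
```

Even a few hours of planned maintenance per year keeps availability below 100%, which is why a strict zero-downtime requirement points toward a different class of solution.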
What do we plan to achieve this week?
Your mission this week is to build a two-node mutual takeover highly available cluster using two previously separate AIX systems, each of which has an application that needs to be made highly available.
[Figure: two nodes, each showing applications A and B.]
Overview of the implementation process
Plan and configure AIX
  Elimination of single points of failure
  Storage (adapters, LVM volume group, filesystem)
  Networks (IP interfaces, /etc/hosts, non-IP networks, and devices)
  Application start and stop scripts
Install the HACMP filesets (Note: versions 5.3 and earlier require a reboot!)
Configure the HACMP environment
  Topology
    Cluster, node names, HACMP IP and non-IP networks
  Resources and resource groups:
    Identify name, nodes, policies
    Resources: Application Server, service label, VG, filesystem
Synchronize, then start HACMP
Note: If using two nodes and one application, “Configure the HACMP environment” can be done in one step.
Hints to get started
• Draw a diagram.
• Use (online) planning sheets.
• Focus on eliminating SPOFs.
• Always factor in a non-IP network.
• Ensure that you have multipath access to shared storage devices.
• Document a test plan.
• Test the cluster carefully.
• Be methodical.
HACMP Cluster for the ABC company
[Planning diagram: the user community reaches nodea and nodeb over the public network; the two nodes are also connected by a tmssa network and a serial network.]
Node Name = nodea
  Resource group = dbrg
  Applications = database
  Resources = cascading A-B
  Priority = 1,2
  CWOF = yes
  Label = a_tmssa  Device = /dev/tmssa1
  Label = a_tty    Device = /dev/tty1
  rootvg: RAID 1, 9.1 GB
Node Name = nodeb
  Resource group = httprg
  Applications = http
  Resources = cascading B-A
  Priority = 2,1
  CWOF = yes
  Label = b_tmssa  Device = /dev/tmssa2
  Label = a_tty    Device = /dev/tty1
  rootvg: RAID 1, 9.1 GB
Resource Group databaserg contains:
  Volume Group = dbvg (RAID 5, 100 GB)
  hdisk3, hdisk4, hdisk5, hdisk6, hdisk7
  Major # = 51
  JFS Log = dblvlog
  Logical Volume = dblv1, dblv2
  FS Mount Point = /db, /dbdata
Resource Group httprg contains:
  Volume Group = httpvg (RAID 1, 9 GB)
  hdisk2, hdisk8
  Major # = 50
  JFS Log = httplvlog
  Logical Volume = httplv
  FS Mount Point = /http
Node A    IP Label     IP Address      Netmask
Service   database     192.168.9.3     255.255.255.0
Boot      nodeaboot    192.168.9.4     255.255.255.0
Standby   nodeastand   192.168.254.3   255.255.255.0
Node B    IP Label     IP Address      Netmask
Service   webserv      192.168.9.5     255.255.255.0
Boot      nodebboot    192.168.9.6     255.255.255.0
Standby   nodebstand   192.168.254.3   255.255.255.0
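A planning sheet like this one is easy to check mechanically once it is captured as data. The sketch below copies the IP labels, addresses, and netmasks from the worksheet; the assignment of rows to nodea and nodeb is inferred from the label names, and the two helper functions are hypothetical. It uses only Python's standard `ipaddress` module.

```python
import ipaddress
from collections import defaultdict

# (node, role, IP label, address, netmask) rows copied from the worksheet.
rows = [
    ("nodea", "Service", "database",   "192.168.9.3",   "255.255.255.0"),
    ("nodea", "Boot",    "nodeaboot",  "192.168.9.4",   "255.255.255.0"),
    ("nodea", "Standby", "nodeastand", "192.168.254.3", "255.255.255.0"),
    ("nodeb", "Service", "webserv",    "192.168.9.5",   "255.255.255.0"),
    ("nodeb", "Boot",    "nodebboot",  "192.168.9.6",   "255.255.255.0"),
    ("nodeb", "Standby", "nodebstand", "192.168.254.3", "255.255.255.0"),
]


def duplicate_addresses(rows):
    """Return {address: [labels]} for every address used by more than one label."""
    labels_by_address = defaultdict(list)
    for _node, _role, label, address, _mask in rows:
        labels_by_address[ipaddress.ip_address(address)].append(label)
    return {
        str(address): labels
        for address, labels in labels_by_address.items()
        if len(labels) > 1
    }


def subnet_of(address, netmask):
    """Return the subnet in CIDR form, for example 192.168.9.0/24."""
    return str(ipaddress.ip_network(f"{address}/{netmask}", strict=False))


print(duplicate_addresses(rows))
print(subnet_of("192.168.9.3", "255.255.255.0"))
```

With the values as printed in the worksheet, the check reports that nodeastand and nodebstand share 192.168.254.3, which is worth confirming against the real network plan before configuring the cluster.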
Sources of HACMP information
HACMP manuals come with the product
  cluster.doc.en_US.es.html
  cluster.doc.en_US.es.pdf
HACMP documentation also available online
  http://www.ibm.com/servers/eserver/pseries/library/hacmp_docs.html
Release Notes contain important information about the version release
  /usr/es/sbin/cluster/release_notes
Sales manual: http://www.ibm.com/common/ssi
IBM courses:
  HACMP Admin. I: Planning and Implementation (AU540/AU54)
  HACMP Admin II: Admin. and Problem Determination (AU610/AU61)
  HACMP Administration III: Virtualization and Disaster Recovery (AU620/AU62)
  HACMP V5 Internals (AU60)
IBM Web site: http://www-03.ibm.com/systems/p/ha/
Non-IBM sources (not endorsed by IBM but probably worth a look):
  http://lpar.co.uk
  http://portal.explico.de/
  http://www.matilda.com/hacmp/
  http://groups.yahoo.com/group/hacmp/
Checkpoint
1. True or False? Resource Groups can be moved from node to node.
2. True or False? HACMP/XD is a complete solution for building geographically distributed clusters.
3. Which of the following capabilities does HACMP not provide? (Select all that apply.)
  a. Time synchronization
  b. Automatic recovery from node and network adapter failure
  c. System Administration tasks unique to each node; back-up and restoration
  d. Fallover of just a single resource group
4. True or False? All nodes in a resource group must have equivalent performance characteristics.
Checkpoint solutions
1. True or False? Resource Groups can be moved from node to node.
  Answer: True
2. True or False? HACMP/XD is a complete solution for building geographically distributed clusters.
  Answer: True
3. Which of the following capabilities does HACMP not provide? (Select all that apply.)
  a. Time synchronization
  b. Automatic recovery from node and network adapter failure
  c. System Administration tasks unique to each node; back-up and restoration
  d. Fallover of just a single resource group
  Answer: a and c
4. True or False? All nodes in a resource group must have equivalent performance characteristics.
  Answer: False
Unit summary
Having completed this unit, you should be able to:
Define high availability and explain why it is needed
Outline the various options for implementing high availability
List the key considerations when designing and implementing a high availability cluster
Outline the features and benefits of HACMP for AIX
Describe the components of an HACMP for AIX cluster
Explain how HACMP for AIX operates in typical cases
THANK YOU for attending
Demo of HACMP