IBM Integration Bus High Availability Overview

Peter Broadhurst, IBM Messaging & Integration

© 2013, 2014 International Business Machines Corporation

Introduction

These charts provide a high-level overview of IIB HA topologies:

• Comparison of active/active and active/passive HA

• Solutions for active/passive HA failover with IBM Integration Bus

• Solutions for active/active processing with IBM Integration Bus

• Adding Global Cache to active/active processing

• Combining all of the above

Only HTTP and JMS (MQ) workloads are shown


Active/active vs. active/passive

Active/Active

• Scale: All N instances contribute to the processing capacity of the system.

• High availability: New requests can be serviced immediately after the planned or unplanned termination of an active instance. Often referred to as continuous availability.

• Design considerations:
– Each instance must be able to operate independently, without relying on the availability of any other instance in the environment.
– The order of processing for two items of work cannot be guaranteed, as there are multiple instances that might perform each item of work.

Active/Passive

• Scale: Only one of N instances contributes to the processing capacity of the system.

• High availability: There is a failover period after the planned or unplanned termination of an active instance. This failover period commonly lasts for a small number of minutes, depending on the technology used.

• Design considerations:
– An infrastructure for HA failover must exist that ensures only one instance is ever active, and that detects when an instance fails so that failover can be initiated.
– The active and passive systems must have identical copies of persistent data, such as persisted transaction state and persisted messages. In IBM Integration Bus this is achieved by sharing a filesystem between the two machines.
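The ordering consideration can be illustrated with a small simulation. This is only a sketch under simple assumptions: the function name, the work items and their timings are invented for illustration, and each item is handed to whichever instance becomes idle first.

```python
def completion_order(work, instances):
    """Return item names in the order they finish.

    work: list of (name, arrival_time, processing_time), sorted by arrival.
    instances: number of instances processing work concurrently
               (1 models active/passive, N > 1 models active/active).
    """
    free_at = [0] * instances          # time at which each instance becomes idle
    finished = []
    for name, arrival, duration in work:
        # Hand the item to whichever instance becomes idle first.
        idx = min(range(instances), key=lambda i: free_at[i])
        start = max(arrival, free_at[idx])
        free_at[idx] = start + duration
        finished.append((free_at[idx], name))
    return [name for _, name in sorted(finished)]


# Hypothetical workload: A arrives first but takes longer to process than B.
work = [("A", 0, 5), ("B", 1, 1)]
print(completion_order(work, instances=1))  # active/passive: ['A', 'B']
print(completion_order(work, instances=2))  # active/active:  ['B', 'A']
```

With a single active instance the arrival order is preserved. With two active instances the shorter item overtakes the longer one, which is why work that depends on ordering needs special treatment in an active/active design.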


Active/passive high availability

[Diagram: Machine 1 and Machine 2 can each host the IIB Integration Node (with its Integration Servers) and MQ Messaging. Both machines attach to a shared file system, with HA failover between them. JMS workloads and HTTP workloads are shown as inputs.]

• Single instance of the IIB runtime

• Single instance of all state
– Configuration, MQ messages and Transaction coordinator

• Highly available due to automatic fail-over. Two options available for HA failover:
– Out-of-the-box with Multi-Instance capability
– Or using external HA cluster software


Option 1: Out of the box Multi-Instance failover

• IP address of each machine is different
– IP address redirect is required for HTTP workloads
– MQ client libraries automatically handle IP redirect for JMS workloads

• Requires highly available network-attached storage (NAS). Examples:
– IBM GPFS
– Veritas Cluster File System
– Highly available NFSv4

• More information on choosing a suitable NAS:
– https://ibm.biz/BdFxfz
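The client-side redirect for JMS workloads depends on the client knowing more than one address to try. The sketch below shows that idea only; it is not the MQ client API. It assumes a comma-separated `host(port)` list in the style of an MQ connection name list, and the `connect` function, host names and port are hypothetical stand-ins supplied by the caller.

```python
import re


def parse_connection_names(conn_names):
    """Split 'host1(1414),host2(1414)' into [('host1', 1414), ('host2', 1414)]."""
    endpoints = []
    for entry in conn_names.split(","):
        match = re.fullmatch(r"\s*([^()\s]+)\((\d+)\)\s*", entry)
        if match is None:
            raise ValueError(f"bad connection name: {entry!r}")
        endpoints.append((match.group(1), int(match.group(2))))
    return endpoints


def connect_to_active(conn_names, connect):
    """Try each endpoint in order and return the first connection that succeeds.

    `connect` is a caller-supplied function (host, port) -> connection that
    raises ConnectionError when no active instance answers at that address.
    """
    last_error = None
    for host, port in parse_connection_names(conn_names):
        try:
            return connect(host, port)
        except ConnectionError as err:
            last_error = err          # standby or failed instance: try the next one
    raise ConnectionError("no active instance reachable") from last_error


def fake_connect(host, port):
    # Hypothetical stand-in: only machine2 currently hosts the active instance.
    if host != "machine2.example.com":
        raise ConnectionError(f"{host}:{port} refused")
    return f"connected to {host}:{port}"


print(connect_to_active(
    "machine1.example.com(1414),machine2.example.com(1414)", fake_connect))
```

HTTP clients have no equivalent built-in behaviour, which is why the slide calls for an IP redirect in front of the two machines for HTTP workloads.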

[Diagram: Machine 1 (IP addr 1) and Machine 2 (IP addr 2), each with MQ Messaging, connect over TCP/IP to a network-attached file system (NAS). Failover is managed via NAS file locks. Other labels: IP redirect (Gateway / Load Balancer), JMS workloads.]
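Failover managed via file locks follows a simple pattern: whichever instance holds an exclusive lock on a file in the shared file system is active, and the standby keeps retrying until the lock becomes free. The sketch below shows that pattern only. It uses `fcntl.flock` (Unix only) on a local temporary file, and the lock file name and function name are invented; the products manage their own lock files on the NAS.

```python
import fcntl
import os
import tempfile


def try_become_active(lock_path):
    """Return an open file holding an exclusive lock, or None if the lock is taken."""
    handle = open(lock_path, "a+")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()               # lock is held elsewhere: stay in standby
        return None
    return handle                    # keep the handle open while active


# Hypothetical lock file standing in for one on the shared file system.
lock_path = os.path.join(tempfile.mkdtemp(), "master.lock")

primary = try_become_active(lock_path)      # first instance wins the lock
standby = try_become_active(lock_path)      # second instance is refused
print(primary is not None, standby is None)

primary.close()                             # simulate the active instance terminating
standby = try_become_active(lock_path)      # standby retries and takes over
print(standby is not None)
```

Because the lock is released when the holder's file handle goes away, the standby can take over without any separate health-checking software. This is also why the storage itself must be highly available and must implement locking reliably.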


Option 2: HA cluster failover

• Allows use of direct-attached storage

• IP address failed over by HA cluster software

• Requires HA cluster software. Examples:
– IBM PowerHA (HACMP)
– Veritas Cluster Server (VCS)
– Microsoft Cluster Service (MSCS)
– Red Hat Cluster

[Diagram: Machine 1 and Machine 2 both use IP addr 1, which is failed over between them. Each machine runs MQ Messaging and HA cluster software that performs a healthcheck and exchanges a heartbeat with its peer. Both machines attach over Fibre/SCSI to a direct-attached file system switched by HA. Failover is managed via health checking. JMS workloads and HTTP workloads are shown as inputs.]
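The heartbeat and health-check arrangement can be sketched as a timeout on the last heartbeat seen from the peer. This is an illustration under stated assumptions, not how any particular HA cluster product works: the class names and the 10-second timeout are invented, and time is passed in explicitly so the behaviour is easy to follow.

```python
class HeartbeatMonitor:
    """Declare the peer failed after `timeout` seconds without a heartbeat."""

    def __init__(self, timeout, now=0.0):
        self.timeout = timeout
        self.last_seen = now

    def heartbeat(self, now):
        self.last_seen = now

    def peer_failed(self, now):
        return now - self.last_seen > self.timeout


class StandbyNode:
    """Passive node that takes over the service when its peer is declared failed."""

    def __init__(self, monitor):
        self.monitor = monitor
        self.active = False

    def tick(self, now):
        if not self.active and self.monitor.peer_failed(now):
            # A real cluster would switch the storage and the IP address here.
            self.active = True
        return self.active


node = StandbyNode(HeartbeatMonitor(timeout=10))
node.monitor.heartbeat(now=5)
print(node.tick(now=12))   # 7 s since the last heartbeat: False, still passive
print(node.tick(now=20))   # 15 s since the last heartbeat: True, takes over
```

The timeout is the trade-off the next slide alludes to: a short value gives faster failover but risks unnecessary failovers, while a long value does the opposite.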


Multi-instance or HA cluster?

Multi-instance queue manager
• Integrated into the IIB and MQ products
• Faster failover than HA cluster*
• Delay before queue manager restart is much shorter*
• Runtime performance of networked storage must be considered
• IP address of standby instance is different to primary
• No automatic fail-back to primary hardware when restored
• More susceptible to MQ and OS defects

HA cluster
• Capable of handling a wider range of failures
• Failover historically rather slow, but some HA clusters are improving
• Some customers frustrated by unnecessary failovers
• Requires MC91 SupportPac or equivalent configuration
• Extra product purchase and skills required

Storage distinction
• Multi-instance queue manager typically uses NAS
• HA clustered queue manager typically uses SAN

* Depends on NAS file-system tuning and specific customer environment


Active/active topologies

[Diagram: Machine 1 and Machine 2 each run Integration Servers and MQ Messaging. A Gateway / Load Balancer for HTTP workloads sits in front of both machines. HTTP workloads and JMS workloads are shown as inputs.]

• Each Integration Node operates independently

• Continuous availability of the service during a failure
– Individual in-flight requests on the failed node receive errors, but the service stays available

• HTTP workloads require external workload balancing
– Hardware load balancer is the most common solution

• JMS workloads require load balancing
– Via an MQ cluster when remote applications have their own MQ queue managers
– Options for direct MQ client attachment (over TCP/IP) described here: https://ibm.biz/BdFxfS
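The workload balancing described above can be sketched as a round-robin selector that skips nodes marked unhealthy. This is a minimal illustration with invented class and node names; a hardware load balancer or MQ cluster applies its own algorithms and health checks.

```python
import itertools


class RoundRobinBalancer:
    """Rotate requests across nodes, skipping any that are marked down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._cycle = itertools.cycle(self.nodes)

    def mark_down(self, node):
        self.healthy.discard(node)

    def mark_up(self, node):
        if node in self.nodes:
            self.healthy.add(node)

    def next_node(self):
        # Look at each node at most once per call so the loop always ends.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy integration node available")


lb = RoundRobinBalancer(["node1", "node2"])   # hypothetical node names
print([lb.next_node() for _ in range(4)])     # alternates between the two nodes

lb.mark_down("node1")                         # node1 fails
print([lb.next_node() for _ in range(3)])     # every request now goes to node2
```

New requests keep flowing as soon as the failed node is marked down, which matches the continuous-availability property of active/active. Requests that were already in flight on the failed node are not rescued by the balancer.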


Adding Global Cache to active/active topologies

[Diagram: Machine 1 and Machine 2 each run Integration Servers and MQ Messaging. A Global Cache spans the Integration Servers on both machines.]

• Highly available in-memory state store
– High performance compared to disk persistence (DB/MQ Messaging)
– High availability through redundancy: cannot be recovered if all nodes are stopped concurrently
– Built with WebSphere eXtreme Scale technology

• Alternative to using MQ or a Database for state storage/retrieval
– Correlation state between asynchronous request/reply flows (only needed if replies might be routed via a different runtime, otherwise simply use memory, such as LocalEnvironment)
– Regularly accessed reference data, such as routing tables
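The availability property of an in-memory store replicated across machines can be shown with a toy model. This sketch is not the WebSphere eXtreme Scale replication model or the IIB global cache API; the class, the container names and the correlation entry are all invented to show why redundancy survives one failure but not a concurrent stop of every node.

```python
class ReplicatedMap:
    """Toy in-memory map that copies every entry to all running containers."""

    def __init__(self, container_names):
        # One dict per container; None means the container is stopped.
        self.containers = {name: {} for name in container_names}

    def _running(self):
        return [store for store in self.containers.values() if store is not None]

    def put(self, key, value):
        for store in self._running():
            store[key] = value

    def get(self, key):
        for store in self._running():
            if key in store:
                return store[key]
        return None

    def stop(self, name):
        self.containers[name] = None      # in-memory contents are lost

    def start(self, name):
        # A restarted container is repopulated from any surviving replica.
        running = self._running()
        self.containers[name] = dict(running[0]) if running else {}


cache = ReplicatedMap(["machine1", "machine2"])
cache.put("correlation-42", "reply-to: QUEUE.A")   # hypothetical correlation entry

cache.stop("machine1")
print(cache.get("correlation-42"))    # still served by machine2

cache.stop("machine2")                # all containers stopped at the same time
cache.start("machine1")
cache.start("machine2")
print(cache.get("correlation-42"))    # None: nothing left to recover from
```

State that must survive a full outage therefore still belongs in MQ or a database; the cache suits correlation state and reference data that can be lost or rebuilt.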

Putting it all together

• Active/active processing, with HA fail-over for state and in-flight message recovery

• Individual queues/flows can be configured active/passive if required
– Single flow instance for message ordering or file-based processing
– Single correlation state queue shared between active flow instances

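[Diagram: Machine 1 and Machine 2 each show two MQ Messaging instances. The machines are joined by a shared filesystem, with failover in both directions. A Gateway / Load Balancer for HTTP workloads sits in front of both machines. JMS workloads and HTTP workloads are shown as inputs.]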


More information

• Knowledge Center starting points:
– IIB Active/passive HA: https://ibm.biz/BdFxfe
– IIB Active/active HA for HTTP: https://ibm.biz/BdFxfb
– IIB Global cache: https://ibm.biz/BdFxfp
– MQ HA Cluster configurations: https://ibm.biz/BdFxf8

• Wiki article with detailed description of IIB topology choices
– Tailored to customers migrating from WebSphere WESB, but useful to all
– https://ibm.biz/BdFxfa

• MQDev blog on attaching MQ clients to active/active qmgrs: https://ibm.biz/BdFxfS

• Testing and support statement for multi-instance: https://ibm.biz/BdFxfz
