
Enable Active-Active Enterprise Messaging Technology to extend workload balancing and high availability

Session HHM-3474

Pete Siddall – IBM Hursley – [email protected]

Agenda

• Concepts of Business Continuity

– Business Continuity

– High Availability

– Continuous Serviceability

– Continuous Availability Cross Sites

• Messaging Technologies for Business Continuity

• Example Use Cases


What does business continuity mean to you?

• Why do we need a business continuity plan (BCP)?

– So that we don't panic in the event of a disaster or crisis

• What do we need to consider when preparing a BCP?

– "Backups" and their locations

– A central command center, which IBM calls the "Crisis Management Team" (CMT)

– Maintaining a "contact list"

– Thinking through all possible "scenarios" and their corresponding action plans

– Considering "critical" information and applications first

Different levels of business continuity

• Enterprise Business Requires Business Continuity

The levels range from standby to active-active:

• 0. Disaster Recovery – restore the business after a disaster

• 1. High Availability – meet service availability objectives, e.g. 99.9% availability, or no more than 8 hours of downtime a year for maintenance and failures

• 2. Continuous Serviceability – no downtime within one data center (planned or not)

• 3. Continuous Availability Cross Sites – no downtime ever (planned or not)

BC Level 1 – High Availability


• HA at different levels (AIX example)

– Apps follow HA principles

– Middleware HA technologies

• Clustering, DB2 pureScale, MQ multi-instance

– OS HA technologies

• PowerHA (HACMP)

– Hardware HA technologies

• Disk redundancy (RAID, SDD, etc)

• FlashCopy, Metro/Global mirror

• Server redundancy (CPU, power, etc)

• Network redundancy

• The key point is eliminating single points of failure (SPOFs)

– Redundancy

• RPO = 0!

BC Level 2 – Continuous Serviceability

• Usually based on workload takeover

– Automatic takeover

– A challenge for application affinity and sequence

– Old data may be lost – could be combined with HA

• Maintenance

– Planned and unplanned downtime

– Rolling updates

– Coexistence

• Short RTO!

BC Level 3 – Continuous Availability Cross Sites

• Two or more sites, separated by unlimited distances, running the same applications and holding the same data, to provide cross-site workload balancing and Continuous Availability / Disaster Recovery

• Customer data at geographically dispersed sites is kept in sync via data synchronization

• Synchronous data mirroring – failover model; recovery time = 2 minutes; distance < 20 km

• Async data mirroring – failover model; recovery time < 1 hour; unlimited distance

• Active/Active – near-CA model; recovery time < 1 minute; unlimited distance

• Care about both RPO and RTO!

Workload Balancing Through Data Replication

• Both sides run the workload simultaneously, possibly with the same or different volumes, but both have the full picture of the data

• Replicate data from one platform to another

– Both sides may work equally, or have a different focus, for example:

– The main server still does the existing critical work

– Meanwhile, the offloaded server can run data analysis, query data, etc.

– New business requirements arrive, but you don't want to touch the existing server

– Acquiring a new organization may bring a different database on a different platform – how do you centralize the data?

[Diagram: Site A and Site B kept in sync via replication]

• Site A (OLTP) – powerful; critical production work (DB updates/inserts); strict maintenance process; nobody wants it down

• Site B (queries) – less powerful; less critical work (DB queries) and new workloads; work can be delayed but may cost high CPU (data analysis, credit card anti-fraud, etc.)

Agenda

• Concepts of Business Continuity

• Messaging Technologies for Business Continuity

– HA Technologies

– Continuous Serviceability Technologies

– Continuous Availability Cross Sites

• Example Use Cases


MQ Technologies

• HA Technologies

– QSG for MQ on z/OS

– Failover Technologies

– Application HA

• Continuous Serviceability Technologies

– MQ Clustering

– Rolling Upgrade

• Continuous Availability Cross Sites

– Data Synchronization

– Synchronization Application Design

– How To Replicate Data

– Performance Consideration


HA - QSG for MQ on z/OS

[Diagram: a queue-sharing group – several queue managers, each with private queues, sharing queues held in the coupling facility]

• Coupling facility failure

– Nonpersistent messages on shared queues lost (deleted)

– Persistent messages on shared queues restored from the log (kept)

– Messages on private queues OK (kept)

• Queue manager failure

– Messages on shared queues OK (kept)

– Nonpersistent messages on private queues lost (deleted)

HA - Failover Technologies

• Failover

– The automatic switching of availability of a service

– Data accessible on all servers

• Multi-instance queue manager (client reconnect sketch below)

– Integrated into the IBM MQ product

– Faster failover than HA cluster

– Runtime performance of networked storage

– More susceptible to MQ and OS defects

• MQ Appliance

– Built-in replication of queue manager data

– Eliminates reliance on network file system

• HA cluster

– Capable of handling a wider range of failures

– Failover historically slower, but some HA clusters are improving

– Some customers frustrated by unnecessary failovers

– Extra product purchase and skills required

• Workload balancing

• Service availability

• Location transparency (of a kind)

[Diagram: a client application putting to Service 1 through a gateway queue manager, with the service hosted on queue managers at Site 1 and Site 2]
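Before moving on to clustering: for applications to ride through a multi-instance (or HA cluster) failover, MQ clients can request automatic reconnection. A minimal sketch, assuming the connection details come from MQSERVER or a client channel definition table, and using a hypothetical queue manager name QM1:

```c
/* Minimal sketch of automatic client reconnection; "QM1" is a hypothetical
 * queue manager name and the connection details are assumed to come from
 * MQSERVER or a client channel definition table.                          */
#include <cmqc.h>

MQHCONN connect_with_reconnect(void)
{
    MQCNO   cno   = {MQCNO_DEFAULT};        /* connect options             */
    MQHCONN hConn = MQHC_UNUSABLE_HCONN;
    MQLONG  compCode, reason;

    /* Ask MQ to reconnect the client transparently if its queue manager
     * instance fails over (for example a multi-instance standby taking
     * over).  MQCNO_RECONNECT allows reconnection to any queue manager in
     * the connection name list; MQCNO_RECONNECT_Q_MGR insists on the same
     * queue manager, which suits a multi-instance queue manager.          */
    cno.Options |= MQCNO_RECONNECT_Q_MGR;

    MQCONNX("QM1", &cno, &hConn, &compCode, &reason);

    /* A real application checks compCode/reason (MQRC_NONE on success).   */
    return hConn;
}
```

Automatic reconnection applies to client connections; locally bound applications rely instead on the HA cluster or multi-instance restart and their own restart logic.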

Continuous Serviceability – MQ Cluster

[Diagram: two sets of clustered queue managers hosting services, with client applications connecting through local queue managers in New York and in London]

Global applications

• New York and London each run the same services, but are separated by an ocean and 3500 miles

• Prefer traffic to stay geographically local

• Except when you have to look further afield

• How do you do this with clusters that span geographies? (cluster-open sketch below)
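The application side stays simple: a cluster queue is opened by name and the cluster workload algorithm chooses an instance. A minimal sketch, assuming a hypothetical cluster queue REQUEST.QUEUE hosted in both locations:

```c
/* Minimal sketch: open a clustered queue with no fixed binding so the
 * cluster workload algorithm can spread puts across the instances hosted
 * in New York and London.  "REQUEST.QUEUE" is a hypothetical queue name.  */
#include <string.h>
#include <cmqc.h>

MQHOBJ open_clustered_queue(MQHCONN hConn)
{
    MQOD   od = {MQOD_DEFAULT};             /* object descriptor           */
    MQHOBJ hObj;
    MQLONG compCode, reason;

    strncpy(od.ObjectName, "REQUEST.QUEUE", MQ_Q_NAME_LENGTH);

    /* MQOO_BIND_NOT_FIXED lets every MQPUT be routed to whichever queue
     * instance the cluster workload algorithm currently favours, rather
     * than fixing the destination when the queue is opened.               */
    MQOPEN(hConn, &od,
           MQOO_OUTPUT | MQOO_BIND_NOT_FIXED | MQOO_FAIL_IF_QUIESCING,
           &hObj, &compCode, &reason);

    return hObj;
}
```

Keeping traffic geographically local versus reaching across the ocean is then largely a matter of cluster workload attributes (for example CLWLUSEQ, rank and priority) rather than application code.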

Multi-Data Center using MQ Cluster

Continuous Availability Cross Sites: Active-Active

[Diagram: a workload distributor routing work to two sites at a distance; each site runs the business application against its own business data, with a sync application exchanging updates over messaging]

• Cross-site workload distribution

• Data synchronization

• Relies on high-performance, reliable messaging transmission

• Flexible application design

• Automation & management

Continuous Availability Cross Sites

• Data synchronization is the key component in Active-Active

– Capture transaction changes in real time

– Publish the changes at high throughput with low latency

• A messaging-based implementation has proven to be the simplest of the various methods of data transmission

• A high-performance, reliable messaging product is needed to meet the following requirements:

– Simplifies application development

– Ease of use

– Assured message delivery

– High performance and scalability

– Ease of management

How to replicate data?

• Capture transaction activity from the DB2 logs – an independent tool

• Modify the existing applications – send out transactional data with the MQ API

– At the end of the existing logic, add an MQPUT call to send the data, and program an apply application at the target end

– Flexible; can cross different platforms, even different database products, but needs a robust application

– Option to put within or outside syncpoint – should the existing transaction fail (roll back) if the send fails? (MQPUT sketch below)
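As a minimal sketch of the "modify the existing applications" option (the queue name is hypothetical, not from the deck), the source application adds one persistent MQPUT under syncpoint so the send commits or backs out together with the database update:

```c
/* Minimal sketch: send one change record as a persistent message under
 * syncpoint.  "REPL.DATA.QUEUE" is a hypothetical queue name; the calling
 * transaction (CICS syncpoint or MQCMIT) commits or backs out the put
 * together with the database update.                                      */
#include <string.h>
#include <cmqc.h>

void publish_change(MQHCONN hConn, const char *changeRecord)
{
    MQOD   od  = {MQOD_DEFAULT};            /* object descriptor           */
    MQMD   md  = {MQMD_DEFAULT};            /* message descriptor          */
    MQPMO  pmo = {MQPMO_DEFAULT};           /* put-message options         */
    MQHOBJ hObj;
    MQLONG compCode, reason;

    strncpy(od.ObjectName, "REPL.DATA.QUEUE", MQ_Q_NAME_LENGTH);
    MQOPEN(hConn, &od, MQOO_OUTPUT | MQOO_FAIL_IF_QUIESCING,
           &hObj, &compCode, &reason);

    md.Persistence = MQPER_PERSISTENT;      /* survive restarts            */
    pmo.Options    = MQPMO_SYNCPOINT        /* same unit of work as the DB */
                   | MQPMO_NEW_MSG_ID
                   | MQPMO_FAIL_IF_QUIESCING;

    MQPUT(hConn, hObj, &md, &pmo,
          (MQLONG)strlen(changeRecord), (PMQVOID)changeRecord,
          &compCode, &reason);

    MQCLOSE(hConn, &hObj, MQCO_NONE, &compCode, &reason);
}
```

Putting outside syncpoint (omitting MQPMO_SYNCPOINT) avoids any impact on the existing transaction, at the cost of possibly sending a change that is later rolled back.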

[Diagram: Q Replication – a log-based Q Capture at the source publishes changes over IBM MQ to a highly parallel Q Apply at the target]

Performance Tuning Considerations

• Synchronize only the changed data, thus reducing the data volume

• Introduce more parallelism

– Multiple synchronization channels for different types of workload

– More threads in the sync application for parallel processing

– Multiple MQ channels to relieve the single-channel-busy problem

• Invest in new MQ features

– Bigger buffer pools above the bar

– Sequential pre-fetch

– Page set read/write performance enhancement

– Channel performance improvement


Agenda

• Concepts of Business Continuity

• Messaging Technologies for Business Continuity

• Example Use Cases:

– Case 1 (Active/Active with the QREP tool)

– Case 2 (Data replication for new workload)

– Case 3 (Data replication to multiple systems)

[Map: the Beijing data center (used for disaster recovery) is about 1200 km from Shanghai; Shanghai data center 1 (the production center) and Shanghai data center 2 are 70 km apart]

Requirements of a bank – Active/Active

• A commercial bank with data centers in Shanghai and Beijing

– Beijing: one existing data center, used for disaster recovery

– Shanghai: one existing data center for production and one new data center for Active-Active, 70 km apart

• The bank plans to achieve Active-Active between the two Shanghai data centers for its core banking business.

Workload                  rows/s     MB/s
OLTP                      45K-50K    45
Batch                     140K       50
Month-End Batch           130K       70-80
Interest Accrual Batch    440K       172.5

MQ in Q Replication

• Part of the InfoSphere Data Replication product

• A software-based asynchronous replication solution

– For Relational Databases

– Changes are captured from the database recovery log; transmitted as (compact) binary data; and then applied to the

remote database(s) using SQL statements.

• Leverages IBM MQ for Staging/Transport

– Each captured database transaction is published as an MQ message (messages are sent at each commit interval)

– Staging makes it possible to achieve continuous operation even when the target database is down for some time or the network encounters a problem.

[Diagram: Q Replication between Site A and Site B over an unlimited distance – Q Capture (log reader and publish threads) performs asynchronous log change data capture from the DB2 recovery log and stages transactions persistently on IBM MQ; Q Apply agents replay the transactions in parallel against the target DB2 user tables using SQL statements; DB2 control tables at each site hold configuration and monitoring data]

MQ v8.0 features for Q Rep scenarios

• Sequential pre-fetch on z/OS

– The TUNE READAHEAD(ON) and TUNE RAHGET(ON) settings were delivered to the bank as a PTF on V7.1 and are still applicable to V8

• Page set read/write performance enhancements for QREP on z/OS

– Changes to the queue manager deferred write processor; this is now the default behaviour in V8

• 64-bit enablement of buffer pools on z/OS

– More real storage can be used for buffers

• SMF enhancements on z/OS

– CHINIT SMF data helps with tuning channel performance

• 64-bit log RBA

– We probably want QREP users to get to this

• Other improvements

– z/OS miscellaneous improvements (performance and serviceability)

– Channel performance on z/OS

Case 2 (Data replication for new workload)

• Purpose

– A new business that queries (SELECTs) frequently

– The existing database is DB2 on z/OS, but the customer wants to buy an off-the-shelf solution on Linux

– So this is active-active data replication within the same data center, across platforms

• Implementation

– Modify the existing core banking applications – add MQ send logic at the end

– On the distributed side, develop another application for the DB updates/inserts (apply-loop sketch below)

– Minimize the impact on the existing applications – put outside syncpoint

Data replication for new workload

• Easier and faster expansion of the business

• The existing business is only slightly touched (nearly untouched)

• Flexible, with no dependencies on the type of target database

[Diagram: a workload distributor routes the core workload to the core banking system on z/OS and the query workload to the QUERY system on Linux (the diagram marks the Linux query path Active and a z/OS path Standby); replicated data flows from z/OS MQ to Linux MQ over an MQ channel]

• z/OS application logic: existing logic; MQPUT (data to update in the DB); EXEC CICS SYNCPOINT

• Linux apply application logic: according to the data received, update the target with an SQL statement or stored procedure
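A minimal sketch of the apply application on the Linux side (the queue name and the apply_to_database() helper are hypothetical): it gets each replicated record under syncpoint, applies it to the target database, and then commits:

```c
/* Minimal sketch of the apply loop, assuming a local queue named
 * "REPL.DATA.QUEUE" fed by the z/OS side; apply_to_database() stands in
 * for the SQL statement or stored procedure that updates the target.     */
#include <string.h>
#include <cmqc.h>

extern void apply_to_database(const char *record, MQLONG length);  /* hypothetical */

void apply_loop(MQHCONN hConn)
{
    MQOD   od  = {MQOD_DEFAULT};
    MQMD   md  = {MQMD_DEFAULT};
    MQGMO  gmo = {MQGMO_DEFAULT};
    MQHOBJ hObj;
    MQLONG compCode, reason, dataLength;
    char   buffer[4096];

    strncpy(od.ObjectName, "REPL.DATA.QUEUE", MQ_Q_NAME_LENGTH);
    MQOPEN(hConn, &od, MQOO_INPUT_AS_Q_DEF | MQOO_FAIL_IF_QUIESCING,
           &hObj, &compCode, &reason);

    for (;;)
    {
        memcpy(md.MsgId, MQMI_NONE, sizeof(md.MsgId));      /* get any message */
        memcpy(md.CorrelId, MQCI_NONE, sizeof(md.CorrelId));
        gmo.Options      = MQGMO_WAIT | MQGMO_SYNCPOINT | MQGMO_FAIL_IF_QUIESCING;
        gmo.WaitInterval = 30000;                           /* 30 seconds      */

        MQGET(hConn, hObj, &md, &gmo, sizeof(buffer), buffer,
              &dataLength, &compCode, &reason);
        if (compCode != MQCC_OK)
            break;                    /* e.g. MQRC_NO_MSG_AVAILABLE on timeout */

        apply_to_database(buffer, dataLength);  /* SQL UPDATE/INSERT or SP call */

        MQCMIT(hConn, &compCode, &reason);      /* commit the destructive get   */
    }

    MQCLOSE(hConn, &hObj, MQCO_NONE, &compCode, &reason);
}
```

This sketch only shows the MQ side; for true atomicity between the get and the database change, a real apply application would use a globally coordinated (XA) unit of work, mirroring the CICS syncpoint on the z/OS side, and would call MQBACK when the database apply fails.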

Case 3 (Data replication to multiple systems)

• Purpose

– Replicate the z/OS database of the core credit card system to a Linux database in a near-real-time window. Multiple consumers on different Linux boxes want the same data.

• Implementation

– z/OS MQ does a normal put (the same as the data replication discussed on the previous pages), so only one copy of the data is transferred to the Linux MQ. That queue manager then does the 1-to-n publication with the MQ pub/sub engine (subscription sketch below).

[Diagram: CICS/Batch applications on z/OS issue MQPUT to topic /credit/deposit/ on QM1; one copy of the data crosses a cluster XMITQ (or a hierarchy XMITQ) to the distributed queue manager QM2, where subscriptions APP1.SUB, APP2.SUB, ... (10 subscriptions in total) deliver copies to SUB1.Q, SUB2.Q, ..., which are consumed via MQGET or forwarded to remote queue managers]
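The diagram above uses administratively defined subscriptions feeding SUB1.Q, SUB2.Q and so on. As a hedged sketch of the equivalent application-created subscription on QM2, reusing the topic string and subscription name shown in the diagram:

```c
/* Minimal sketch: create (or resume) a durable managed subscription on QM2.
 * The topic string and subscription name come from the diagram; the deck's
 * actual setup uses administratively defined subscriptions feeding
 * SUB1.Q / SUB2.Q, but both approaches use the same pub/sub engine.        */
#include <cmqc.h>

MQHOBJ subscribe_to_changes(MQHCONN hConn)
{
    MQSD   sd   = {MQSD_DEFAULT};      /* subscription descriptor           */
    MQHOBJ hObj = MQHO_NONE;           /* managed destination queue handle  */
    MQHOBJ hSub = MQHO_NONE;           /* subscription handle               */
    MQLONG compCode, reason;

    sd.Options = MQSO_CREATE | MQSO_RESUME       /* create or resume         */
               | MQSO_DURABLE | MQSO_MANAGED     /* durable, MQ-managed queue */
               | MQSO_FAIL_IF_QUIESCING;

    sd.ObjectString.VSPtr    = "/credit/deposit/";   /* topic from the diagram */
    sd.ObjectString.VSLength = MQVS_NULL_TERMINATED;
    sd.SubName.VSPtr         = "APP1.SUB";           /* subscription name      */
    sd.SubName.VSLength      = MQVS_NULL_TERMINATED;

    MQSUB(hConn, &sd, &hObj, &hSub, &compCode, &reason);

    /* Publications now arrive on the managed queue behind hObj; the
     * application drains it with MQGET as in the earlier apply-loop sketch. */
    return hObj;
}
```

Each of the 10 subscriptions receives its own copy of every publication, so the data still crosses the channel from QM1 only once.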

Notices and Disclaimers


Copyright © 2016 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission

from IBM.

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.

Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of

initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS

DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE

USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY.

IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.

Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.

Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.

References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.

Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials

and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or

their specific situation.

It is the customer’s responsibility to ensure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.

Notices and Disclaimers (continued)

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not

tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products.

Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the

ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT

NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.

IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®,

FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG,

Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®,

PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®,

StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force®, System z® and z/OS are trademarks of International Business

Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM

trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.

Thank You – Your Feedback is Important!

Access the InterConnect 2016 Conference Attendee Portal to complete your session surveys from your smartphone, laptop or conference kiosk.