High Availability - Brett Thurber - ManageIQ Design Summit 2016

High Availability

ManageIQ/CloudForms

Brett Thurber - Red Hat

June 2016

Agenda

Introduction & Acknowledgements

What is HA?

Traditional HA

What’s on the horizon?

pglogical

BDR

Containers & Kubernetes

Summary

Q & A

Introduction & Acknowledgements

Brett Thurber - RHCT, RHCE, RHCDS, RHCA, RHCVA

20+ years of IT experience

Been with Red Hat since 2011

Team lead in Systems Engineering focused on management and integrated solutions

Worked with MIQ/CloudForms since 2013

Authored 11 Reference Architectures

Presented at RH Summit 2015 - Application portability & interoperability with Red Hat Cloud Infrastructure

Contact: bthurber@redhat.com

Special thanks to:

Gregg Tanzillo, Nick Carboni, Joe Rafaniello

What is HA?

“A system or component that is continuously operational for a desirably long length of time. Availability can be measured relative to "100% operational" or "never failing."” - Source: SearchDataCenter

“A characteristic of a system, which aims to ensure an agreed level of operational performance for a higher than normal period.” - Source: Wikipedia
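
Measuring availability relative to "100% operational" comes down to simple arithmetic. The sketch below is illustrative only: the function names and the 8,760-hour year (leap years ignored) are assumptions, not something taken from the talk.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years are ignored (assumption)


def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of the observed time during which the system was operational."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


def allowed_downtime_hours(target: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime budget implied by a target availability over a period."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be between 0 and 1")
    return (1.0 - target) * period_hours


if __name__ == "__main__":
    # Print the yearly downtime budget for a few common targets.
    for target in (0.99, 0.999, 0.9999):
        hours = allowed_downtime_hours(target)
        print(f"{target:.2%} available -> {hours:.2f} hours of downtime per year")
```

An "agreed level of operational performance" can then be stated as a target fraction and checked against measured uptime and downtime.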

Traditional HA

Heavy Lift

Highly complex and resource intensive

Shared storage

iSCSI, NFS, fibre channel

Multiple bare metal or VM hosts

Minimum of 2 cluster hosts for pgsql database

2+ MIQ/CFME instances

Haproxy to load balance

Complex and time intensive deployment

Typical deployment time measured in days

Stretch cluster risks

Expensive, dedicated high speed connection

Supportability

Data consistency

Active/Passive Deployment Pattern: intra-site

[Diagram. Components shown: MIQ/CFME (x2), haproxy, VIP (x2), pgsql (x2), pacemaker]

Active/Passive Deployment Pattern: inter-site

[Diagram. Components shown: MIQ/CFME (x2), haproxy, VIP, pgsql (x2), Streaming Replication, Site 1, Site 2]

What’s on the horizon?

Interesting possibilities...

Emerging technologies present the possibility of reducing the complexity of HA and PostgreSQL.

pglogical

BDR

Containers & Kubernetes

pglogical

pglogical

What is pglogical?

pglogical offers Logical Replication as a PostgreSQL extension and is a replacement for streaming replication

Builds on the logical decoding introduced in PostgreSQL 9.4 (MIQ Capablanca, CloudForms 4.1)

Less complex solution for database replication

pglogical works on a per-database level, not whole server level like physical streaming replication

One Provider may feed multiple Subscribers without incurring additional disk write overhead

One Subscriber can merge changes from several origins and detect conflicts between changes, with automatic and configurable conflict resolution

Replication across major releases is supported (9.4 and later)
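
The idea of one subscriber merging changes from several origins and resolving conflicts can be illustrated with a toy model. This is not pglogical's API: the `Change` class and `apply_last_update_wins` function are hypothetical names, and the policy shown (the most recent commit wins) is just one possible resolution strategy, assumed here for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Change:
    origin: str          # name of the node that produced the change (hypothetical)
    key: int             # primary key of the affected row
    value: str           # new row contents
    committed_at: float  # commit timestamp on the origin node


def apply_last_update_wins(changes: Iterable[Change]) -> Dict[int, Change]:
    """Merge changes from several origins into one subscriber-side state.

    When two changes touch the same key, the one with the later commit
    timestamp is kept. On a tie, the change applied first is kept.
    """
    state: Dict[int, Change] = {}
    for change in changes:
        current = state.get(change.key)
        if current is None or change.committed_at > current.committed_at:
            state[change.key] = change
    return state


if __name__ == "__main__":
    incoming = [
        Change("site1", 1, "vm-a: on", 10.0),
        Change("site2", 1, "vm-a: off", 12.0),  # conflicts with the row above
        Change("site1", 2, "vm-b: on", 11.0),
    ]
    for key, change in sorted(apply_last_update_wins(incoming).items()):
        print(key, change.origin, change.value)
```

A real deployment would rely on the extension's own conflict handling; the sketch only shows why a deterministic rule is needed when several origins write the same row.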

How would it work?

[Diagram. Components shown: pgsql (x4) hosting the VMDB database, labeled Publisher and Subscribers; MIQ/CFME (x2); haproxy; VIP]

What about failover?

[Diagram. Components shown: pgsql (x4) hosting the VMDB database, labeled Publisher and Subscribers; MIQ/CFME (x2); haproxy; VIP; three "???" markers]

pglogical limitations...

Not suitable for failover

Automatic DDL (data definition language) replication is not supported

Logical decoding doesn't decode catalog changes directly. So the plugin can't just send a CREATE TABLE statement when a new table is added.

If the data being decoded is being applied to another PostgreSQL database then its table definitions must be kept in sync via some means external to the logical decoding plugin itself, such as:

Event triggers using DDL deparse to capture DDL changes as they happen and write them to a table to be replicated and applied on the other end

Doing DDL management via tools that synchronise DDL on all nodes
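
Because table definitions must be kept in sync by some external means, a simple drift check between nodes can be useful. The sketch below assumes each node's schema has already been read into a plain dictionary (table name to column name to type); the `schema_drift` function and that dictionary shape are hypothetical, not part of pglogical.

```python
from typing import Dict, List

# table name -> {column name: column type}; this shape is an assumption
Schema = Dict[str, Dict[str, str]]


def schema_drift(provider: Schema, subscriber: Schema) -> List[str]:
    """List definitions present on the provider but missing or different
    on the subscriber. An empty list means no drift was found."""
    problems: List[str] = []
    for table, columns in sorted(provider.items()):
        if table not in subscriber:
            problems.append(f"missing table: {table}")
            continue
        for column, col_type in sorted(columns.items()):
            other = subscriber[table].get(column)
            if other is None:
                problems.append(f"missing column: {table}.{column}")
            elif other != col_type:
                problems.append(
                    f"type mismatch: {table}.{column} ({col_type} vs {other})"
                )
    return problems


if __name__ == "__main__":
    provider = {"vms": {"id": "bigint", "name": "text"}, "hosts": {"id": "bigint"}}
    subscriber = {"vms": {"id": "bigint"}}
    for problem in schema_drift(provider, subscriber):
        print(problem)
```

A check like this only detects drift; applying the missing DDL would still be done by one of the approaches listed above.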

Bi-Directional Replication

BDR

What is BDR?

Bi-Directional Replication (BDR) is an asynchronous multi-master replication system for PostgreSQL, specifically designed to allow geographically distributed clusters. It supports up to 48 nodes (and possibly more in future releases) and is a low overhead, low maintenance technology for distributed databases.

BDR excels in environments where users are distributed across high-latency and/or unreliable network links where conventional tightly-coupled clustering software does not work well

Support for DDL replication and Global DDL locking

Active/Active BDR Deployment Pattern: intra-site

[Diagram. Components shown: MIQ/CFME (x2), haproxy, VIP, pgsql (x2), BDR]

Active/Active BDR Deployment Pattern: inter-site

[Diagram. Components shown: MIQ/CFME (x2), haproxy, VIP, pgsql (x2), BDR, Site 1, Site 2]

BDR limitations...

Still under development; not production ready (requires a modified version of PostgreSQL 9.4)

Asynchronous replication

Changes made on one BDR node are not replicated to other nodes before they are committed locally. As a result, the data is not exactly the same on all nodes at any given time.

Non-shared storage architecture means additional storage space considerations
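
The asynchronous behavior described above can be shown with a small in-memory simulation. This is a toy model rather than BDR itself: the `Node` class, its `write` and `apply_pending` methods, and the in-process queue standing in for the network are all assumptions, and no conflict handling is modeled.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class Node:
    """A toy multi-master node that commits locally, then ships changes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.inbox: Deque[Tuple[str, str]] = deque()  # changes not yet applied
        self.peers: List["Node"] = []

    def write(self, key: str, value: str) -> None:
        # Commit locally first; peers only receive the change afterwards.
        self.data[key] = value
        for peer in self.peers:
            peer.inbox.append((key, value))

    def apply_pending(self) -> int:
        """Apply every queued change and return how many were applied."""
        applied = 0
        while self.inbox:
            key, value = self.inbox.popleft()
            self.data[key] = value
            applied += 1
        return applied


if __name__ == "__main__":
    site1, site2 = Node("site1"), Node("site2")
    site1.peers, site2.peers = [site2], [site1]

    site1.write("vm-1", "running")
    print("before apply:", site1.data == site2.data)  # nodes differ for a while
    site2.apply_pending()
    print("after apply:", site1.data == site2.data)   # nodes have converged
```

The window between the local commit and `apply_pending` is the period in which the nodes disagree, which is the trade-off that asynchronous replication accepts.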

Containers & Kubernetes

Containers?

Docker image for ManageIQ under development

Currently monolithic

Allows for an MIQ container image to be deployed to Atomic Host and other container providers

Service decoupling on the horizon

Utilizing Kubernetes pods allows for:

Service distribution across multiple hosts

Persistent storage to be used for database

Highly available and scalable architecture

Easily upgradeable with quick roll-back capabilities

Self-healing
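
Self-healing in an orchestrator generally means comparing desired state with observed state and acting on the difference. The sketch below illustrates that idea only; the `reconcile` function, the pod names, and the action strings are hypothetical and do not use the Kubernetes API.

```python
from typing import Dict, List


def reconcile(desired_replicas: int, pods: Dict[str, bool]) -> List[str]:
    """Return the actions needed to reach the desired number of healthy pods.

    `pods` maps a pod name to a health flag (True means healthy).
    """
    actions: List[str] = []
    healthy = [name for name, ok in sorted(pods.items()) if ok]

    # Unhealthy pods are removed so that replacements can be started.
    for name, ok in sorted(pods.items()):
        if not ok:
            actions.append(f"delete {name}")

    # Start replacements until the desired replica count is met.
    missing = desired_replicas - len(healthy)
    for _ in range(max(missing, 0)):
        actions.append("create replacement pod")

    # Scale down if there are more healthy pods than desired.
    for name in healthy[desired_replicas:]:
        actions.append(f"delete {name}")

    return actions


if __name__ == "__main__":
    observed = {"miq-a": True, "miq-b": False}
    for action in reconcile(2, observed):
        print(action)
```

Running a loop like this continuously is what lets a failed instance be replaced without operator intervention.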

Possible Container Architecture

[Diagram. Components shown: two pods, each with http/rails and pgsql containers and persistent storage; BDR; Node; Proxy]

Possible Container Architecture (cont'd)

[Diagram. Components shown: two pods, each with http/rails and pgsql containers and persistent storage; BDR; two nodes, each with a proxy; Overlay Network]

What about networking?

Kubernetes imposes the following network rules:

All containers can communicate with all other containers without NAT

All nodes can communicate with all containers (and vice-versa) without NAT

The IP that a container sees itself as is the same IP that others see it as

Supported overlay networks

L2 networks and Linux bridging

Flannel

Open vSwitch

Romana

OpenShift SDN

etc...

Summary

In closing...

Traditional HA clustering is complex, expensive, and time consuming to implement, and poses some support limitations

pglogical is a good replacement for streaming replication; however, it lacks some needed features to make it a viable HA solution

BDR bridges the necessary gaps with pglogical to offer a viable HA solution; however, it is still growing in maturity (> PostgreSQL 9.4)

Containers, coupled with Kubernetes, offer compelling use cases, including self-healing, upgrades, scaling and high availability

Q & A

Thank You!