HA Cluster

7/31/2019 HA Cluster


HA CLUSTER

Prepared By: Dhairya Giri

Rahul Mehta

Smit Gohel

Shruj Dabhi

Zishan Murji



INTRODUCTION

High-availability (HA) clustering is a solution that uses clustering software and special-purpose hardware to minimize system downtime.

HA clusters are groups of computing resources that are implemented to provide high availability of software and hardware computing services.



Basic Work Done

Putting together a group of computers which trust each other to provide a service even when system components fail.

When one machine goes down, others take over its work. This involves IP address takeover, service takeover, etc.

If one node shuts down or fails, another node takes over the application load. The same mechanism facilitates planned maintenance.

The cluster performs its function continuously for a significantly longer period of time.



WHY HA CLUSTER???

HA clusters usually use a private heartbeat network connection to monitor the health and status of each node in the cluster.
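The heartbeat idea can be illustrated with a minimal sketch. It assumes that each node periodically reports in and that a node is considered failed once no heartbeat has arrived within a timeout. The class name HeartbeatMonitor, its methods, and the timeout value are hypothetical and do not come from any particular clustering product.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each node (hypothetical helper)."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        # A node is treated as failed after this many seconds of silence.
        self.timeout_seconds = timeout_seconds
        # The clock is injectable so the logic can be exercised without waiting.
        self.clock = clock
        self.last_seen = {}

    def record_heartbeat(self, node):
        """Called whenever a heartbeat message arrives from a node."""
        self.last_seen[node] = self.clock()

    def is_alive(self, node):
        """A node is alive if its last heartbeat is within the timeout."""
        last = self.last_seen.get(node)
        if last is None:
            return False
        return self.clock() - last <= self.timeout_seconds

    def failed_nodes(self):
        """Return the known nodes whose heartbeat has timed out."""
        return sorted(n for n in self.last_seen if not self.is_alive(n))
```

A real cluster sends these messages over the private network link; here the arrival of a message is simply modelled as a call to record_heartbeat.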

An HA cluster provides R.A.S.:

Reliability: A high degree of protection for corporate data, as information is a crucial business asset

Availability: Continuous data access

Serviceability: Procedures to correct problems with minimal business impact



HA Cluster Categories

There are two main categories of HA cluster:

Shared Disk: There is only one shared disk, and all nodes have access to that same storage. A locking mechanism protects against race conditions.

Shared Nothing: At any given time, only one node owns a disk. When that node fails, another node takes ownership of the disk.
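The shared-nothing rule (one owner per disk, with ownership moving on failure) can be sketched as a small ownership table. This is only an illustration; the class DiskOwnership and the node and disk names are hypothetical.

```python
class DiskOwnership:
    """Maps each disk to exactly one owning node (hypothetical sketch)."""

    def __init__(self):
        self.owner = {}

    def assign(self, disk, node):
        """Give a disk to a node; any previous owner loses it."""
        self.owner[disk] = node

    def fail_over(self, failed_node, successor):
        """Move every disk owned by the failed node to the successor."""
        moved = [d for d, n in self.owner.items() if n == failed_node]
        for disk in moved:
            self.owner[disk] = successor
        return sorted(moved)
```

Because the table stores a single owner per disk, two nodes can never own the same disk at the same time in this model.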



CONCEPTS & COMPLICATIONS

HA Clusters introduce concepts and complications around:

Split-Brain

Quorum

Fencing

One subtle but serious condition that all clustering software must be able to handle is split-brain.



Split Brain

Split-brain occurs when all of the private links go down simultaneously, but the cluster nodes are still running.

If that happens, each node in the cluster may mistakenly decide that every other node has gone down and attempt to start services that other nodes are still running.

Having duplicate instances of services may cause data corruption on the shared storage.

This condition is called the SPLIT BRAIN condition.



Quorum

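Quorum is an attempt to avoid split-brain for most kinds of failures.

Typically, one tries to make sure that only one partition can be active; quorum is the term for the methods that ensure this.

One disadvantage is that this doesn't work very well for two-node clusters: with majority voting, neither node holds a majority once the two nodes lose contact with each other.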
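A common way to implement quorum is majority voting, where a partition may stay active only if it holds more than half of the cluster's votes. The sketch below assumes one vote per node; the function name has_quorum is hypothetical.

```python
def has_quorum(partition_size, cluster_size):
    """Return True if a partition holds a strict majority of the votes.

    Assumes one vote per node, which is a simplification.
    """
    return partition_size > cluster_size // 2


# A five-node cluster split 3/2: only the larger side stays active.
print(has_quorum(3, 5))  # True
print(has_quorum(2, 5))  # False

# A two-node cluster split 1/1: neither side has a majority.
print(has_quorum(1, 2))  # False
```

The last case shows the two-node problem: after a split, both halves fail the majority test, so the whole cluster stops unless some additional tie-breaker is used.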



Fencing

Fencing tries to put a fence around an errant node or nodes to keep them from accessing cluster resources.

This way, one doesn't have to rely on correct behaviour or timing of the errant node.

We use STONITH to do this.

STONITH: Shoot The Other Node In The Head
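The ordering that fencing enforces can be sketched as follows: the surviving node takes over resources only after the errant node has been confirmed fenced. The names take_over, FencingError, fence, and start are hypothetical; in a real cluster the fence callback would drive a power switch or a similar device rather than a Python function.

```python
class FencingError(Exception):
    """Raised when the errant node could not be fenced."""


def take_over(failed_node, resources, fence, start):
    """Fence the failed node first, then start its resources locally.

    fence(node) must return True only once the node is known to be
    cut off; start(resource) starts one resource on the survivor.
    Both callbacks are assumptions made for this sketch.
    """
    if not fence(failed_node):
        # Without a confirmed fence, starting services risks split-brain.
        raise FencingError(f"could not fence {failed_node}")
    return [start(resource) for resource in resources]
```

If fencing fails, nothing is started, so the sketch never runs duplicate instances of a service against the shared storage.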



NODE CONFIGURATION

The most common size for an HA cluster is two nodes. Such configurations can sometimes be categorized as:

Active/Active: Traffic intended for the failed node is either passed on to an existing node or load balanced across the remaining nodes

Active/Passive: Provides a fully redundant instance of each node, which is only brought online when its associated primary node fails

N-to-1: Allows the failover standby node to become the active one temporarily, until the original node can be restored or brought back online



Virtualization

The usual goal of virtualization is to centralize administrative tasks while improving scalability and workloads.

It allows multiple virtual servers to run on a single physical machine.

By combining virtualization and HA clustering, it is possible to benefit from increased manageability and savings from server consolidation through virtualization without decreasing uptime of critical services.



FAILOVER STRATEGIES

Systems that handle failures use different strategies to recover from a failure. These are three ways to configure a failover:

FAIL_FAST: The attempt fails if the first node cannot be reached

ON_FAIL_TRY_ONE_NEXT_AVAILABLE: Tries one more host before giving up

ON_FAIL_TRY_ALL_AVAILABLE: Tries all existing nodes before giving up
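The three strategies differ only in how many nodes are tried before giving up, which the following sketch makes explicit. The strategy names come from the list above; the function call_with_failover, the request callback, and the use of ConnectionError to signal an unreachable node are assumptions made for illustration.

```python
from enum import Enum


class FailoverStrategy(Enum):
    FAIL_FAST = "fail_fast"
    ON_FAIL_TRY_ONE_NEXT_AVAILABLE = "try_one_next"
    ON_FAIL_TRY_ALL_AVAILABLE = "try_all"


def max_attempts(strategy, node_count):
    """How many nodes a strategy is allowed to try."""
    if strategy is FailoverStrategy.FAIL_FAST:
        return min(1, node_count)
    if strategy is FailoverStrategy.ON_FAIL_TRY_ONE_NEXT_AVAILABLE:
        return min(2, node_count)
    return node_count


def call_with_failover(nodes, request, strategy):
    """Try nodes in order until one answers or the strategy gives up.

    request(node) is a hypothetical callback that raises
    ConnectionError when the node cannot be reached.
    """
    limit = max_attempts(strategy, len(nodes))
    last_error = None
    for node in nodes[:limit]:
        try:
            return request(node)
        except ConnectionError as exc:
            last_error = exc
    raise ConnectionError(f"no node reachable after {limit} attempt(s)") from last_error
```

With nodes a, b, and c where only c is reachable, FAIL_FAST gives up after a, ON_FAIL_TRY_ONE_NEXT_AVAILABLE gives up after b, and ON_FAIL_TRY_ALL_AVAILABLE reaches c.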



Benefits

Supports many operating systems, such as Windows, Linux, and Sun Solaris

Simple to install, configure and maintain

Often used for critical databases, file sharing on a network, business applications, etc.

Handles the split-brain condition through quorum and fencing

Provides facilities such as a private heartbeat network to monitor the health of cluster nodes


