High Availability Clustering with Pacemaker and DRBD
Dan Frîncu


Page 1: Pacemaker+DRBD

High Availability Clustering with Pacemaker and DRBD

Dan Frîncu

Page 2: Pacemaker+DRBD

2

Dan Frîncu @ previous experience

• Have been working with clustering technologies for 3 years

• The first 2 years were spent

– Migrating cluster stack to Pacemaker, OpenAIS/Corosync, DRBD

– Delivering training to Product Managers, Sales, Delivery Engineers and Support teams

– Integrating the company's software products with this cluster stack and with other cluster technologies

– Performance testing, hardware benchmarks, designing cluster solutions for the company's clients (RFI, RFP), writing documentation, packaging, and deploying solutions remotely

Page 3: Pacemaker+DRBD

3

Dan Frîncu @ 1&1 Internet Development

• Co-developer of the LinuxDesktop project

• Responsible for IT Operations on the LinuxDesktop project

• The backend for LinuxDesktop is a cluster running on Pacemaker, Corosync & DRBD

• LinuxDesktop is a custom-built GNU/Linux operating system developed for 1&1 employees

Page 4: Pacemaker+DRBD

4

• Clustering – Introduction

• High Availability Clustering – A historical background and future endeavors

• Cluster components – Tools of the trade

• Clustering scenarios – Fitting the needs

• Resource agents – Controlling cluster services

• Demo

• Q&A

Page 5: Pacemaker+DRBD

5

• Generic types of clusters

– HA – High Availability (a.k.a. failover clusters)

• Failover/Failback

• Load Balancing

– HPC – High Performance Computing

• Parallel Programming

• Distributed Computing

Clustering – Introduction

Page 6: Pacemaker+DRBD

6

• Why do I need a cluster?

– “There's no Upside to Downtime” (SAForum.org)

– Hardware redundancy does not account for software bugs, human error or gremlins chewing on the cables

– I'm a developer working on an application, do I need a cluster?

– It depends on what your requirements are!

– Most developers don't have full access to the backend they work on

– When there is an issue, detection of the fault may be automatic, but recovery is done through human intervention, which can take time

Clustering – Introduction

Page 7: Pacemaker+DRBD

7

• HA Clusters – How low can you go?

– The minimum number of nodes for an HA cluster is 2

– There is no theoretical upper limit to the number of nodes, but HA clusters usually span 2-32 nodes

– If you need more than 32 nodes in one cluster, you probably need to rethink its design; HPC may be a better fit

– Default setups can go up to 8-10 nodes without any specific tweaks

– Going above 10 nodes requires taking into consideration delays, mostly network related (STP convergence, multicast groups join/part, etc.)

Clustering – Introduction

Page 8: Pacemaker+DRBD

8

• HA Clusters – What can 2 nodes do?

– The most common size for HA clusters is 2 nodes

– Active/Passive – Applications run on one node, if it fails the other node takes over and starts all apps on it

– Active/Active – Applications run on all/a subset of all nodes (usually resources must be either stateless or depend on a shared storage to work)

• Again, minimum number of nodes for both Active/Passive and Active/Active is 2

• Shared storage can be a dedicated SAN or be easily achieved through use of DRBD

Clustering – Introduction

Page 9: Pacemaker+DRBD

9

• HA Clusters – Beyond 2 nodes?

– N+1 – N nodes with 1 backup node; applications run on any of the N nodes, and if one of them fails, the backup node takes over its services

– N+M – N nodes with M backups; the number of backups is usually derived from the expected hardware failure ratio for a specific service (e.g. 4:1, 7:2, etc.)

– N-to-1 – A variation of N+1 that does the same thing, but only for a limited timeframe; once the failed node is restored, the service fails back to it

– N-to-N – N+M meets Active/Active

Clustering – Introduction

Page 10: Pacemaker+DRBD

10

• Clustering - Introduction

• High Availability Clustering – A historical background and future endeavors

• Cluster components – Tools of the trade

• Clustering scenarios – Fitting the needs

• Resource agents – Controlling cluster services

• Demo

• Q&A

Page 11: Pacemaker+DRBD

11

• Once upon a time, there was Heartbeat

• The main software that came out of the Linux-HA project

• Heartbeat v1

– Was limited to two nodes

– Supported only very simple failover

– Had no resource monitoring (external monitoring was required)

High Availability Clustering – A historical background and future endeavors

Page 12: Pacemaker+DRBD

12

• Heartbeat v2

– Added support for n-node clusters

– Resource monitoring

– Dependencies

– Policies

High Availability Clustering – A historical background and future endeavors

Page 13: Pacemaker+DRBD

13

• Heartbeat v2.1.4 – A fork in the road

– The Cluster Resource Manager was split off into an independent project – Pacemaker

– Resource Agents and Cluster-Glue moved into separate packages

– From this point forward, the Heartbeat name refers only to the cluster messaging and membership layer

High Availability Clustering – A historical background and future endeavors

Page 14: Pacemaker+DRBD

14

• Heartbeat v3 – Under new leadership

– Since January 2010, Heartbeat code base development has been done by LINBIT (which also develops DRBD)

• LINBIT announced it has

– “no intention to add significant features to the Heartbeat code base, or extend its functionality significantly”

– “no intention to establish the Heartbeat code base as a long-term alternative or competition to the OpenAIS/Corosync cluster messaging layer”

High Availability Clustering – A historical background and future endeavors

Page 15: Pacemaker+DRBD

15

• Heartbeat v3 – A glance into the future

– Why continue to use Heartbeat?

• It works!

• Simple configuration! (a minimal ha.cf sketch follows below)

• People don't like change™

– There are two sides to every story

• No hard upper limit on node count, but the cluster cannot grow beyond a maximum message size of <64 kB (roughly 16 hosts)

• No support for cluster filesystems (GFS2, OCFS2, CLVM2, etc.)

• No new features to be developed

High Availability Clustering – A historical background and future endeavors
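To illustrate the “simple configuration” point above, here is a minimal sketch of a two-node Heartbeat setup handing resource management to Pacemaker; the node names and interface are assumptions, and /etc/ha.d/authkeys must also be populated on both nodes.

    # /etc/ha.d/ha.cf -- minimal two-node sketch (names and interface are assumptions)
    autojoin none          # membership limited to the nodes listed below
    bcast eth0             # heartbeat messages over broadcast on eth0
    keepalive 2            # heartbeat interval in seconds
    deadtime 15            # declare a peer dead after 15 seconds of silence
    node node1 node2       # cluster members (must match uname -n)
    crm respawn            # hand resource management over to Pacemaker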

Page 16: Pacemaker+DRBD

16

• OpenAIS/Corosync – The story begins

– The Service Availability Forum (HP, Oracle, Ericsson, a.o.) defined the Application Interface Specification (AIS), an API designed to provide interoperable HA services, from which the OpenAIS project began its life

– In 2008, OpenAIS (an OSI-certified implementation of the AIS spec) was split into 2 projects: Corosync and OpenAIS

– Corosync provides cluster messaging & membership

– OpenAIS provides the rest of the AIS spec (as a plugin to Corosync)

High Availability Clustering – A historical background and future endeavors

Page 17: Pacemaker+DRBD

17

• Corosync – The basics

– It's a cluster messaging and membership layer providing reliable communications between nodes

– Supports multiple transports, such as unicast, multicast, broadcast, as well as InfiniBand

– Supports clustered filesystems (GFS2, OCFS2, CLVM2, etc.)

– Configurable maximum message size (1MB by default) which means it can scale to more nodes and resources per node than Heartbeat

– Redundant self-recovering communication rings (starting with version 1.4.0); a corosync.conf sketch with two rings follows below

High Availability Clustering – A historical background and future endeavors
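As a rough sketch of the points above, a Corosync 1.x configuration along these lines defines two redundant rings and loads Pacemaker as a plugin; the network addresses and ports are assumptions.

    # /etc/corosync/corosync.conf (excerpt) -- illustrative values only
    totem {
        version: 2
        rrp_mode: passive              # redundant ring protocol across two rings
        interface {
            ringnumber: 0
            bindnetaddr: 192.168.1.0   # first (primary) network
            mcastaddr: 226.94.1.1
            mcastport: 5405
        }
        interface {
            ringnumber: 1
            bindnetaddr: 10.0.0.0      # second (backup) network
            mcastaddr: 226.94.1.2
            mcastport: 5407
        }
    }
    service {
        name: pacemaker                # start Pacemaker on top of Corosync
        ver: 0
    }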

Page 18: Pacemaker+DRBD

18

• Corosync – The basics

– Used by Red Hat as the only cluster stack for Pacemaker starting with RHEL 6

– Used as High Availability framework by Pacemaker and Apache Qpid

– Used as communications layer by Sheepdog, Proxmox VE (v2.0) and Openfiler (v2.99)

– Runs on all major GNU/Linux distros: SLES, RHEL, Ubuntu, Debian, Fedora, Gentoo

High Availability Clustering – A historical background and future endeavors

Page 19: Pacemaker+DRBD

19

• Pacemaker – The road ahead

– It is a Cluster Resource Manager

– Detects and recovers from node and resource-level failures

– Supports both Corosync and Heartbeat stacks

– Resource agnostic

– Supports STONITH for ensuring data integrity

– Automatically replicated configuration

– Python-based unified, scriptable, cluster shell

• Validation of input prior to commit

• Syntax highlighting

High Availability Clustering – A historical background and future endeavors

Page 20: Pacemaker+DRBD

20

• Pacemaker – The road ahead

– Tool for making offline configuration changes

– Trigger recurring actions at known times (cron-like or based on date comparisons – gt, lt, in-range)

– RelaxNG-based configuration schema

– Connecting to the CIB from non-cluster machines

– Supports cluster-wide service ordering, colocation and anti-colocation

– Supports advanced services

• Clones: services that need to run on N nodes

• Multi-state: Master/Slave, Primary/Secondary (see the crm shell sketch below)

High Availability Clustering – A historical background and future endeavors
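A sketch of what the scriptable crm shell configuration looks like for an Active/Passive DRBD-backed service, tying together the multi-state, ordering and colocation features listed above; all resource names, devices and addresses are assumptions.

    # crm configure (excerpt) -- illustrative only
    primitive p_drbd ocf:linbit:drbd params drbd_resource="r0" \
        op monitor interval="29s" role="Master" \
        op monitor interval="31s" role="Slave"
    ms ms_drbd p_drbd meta master-max="1" clone-max="2" notify="true"
    primitive p_fs ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/srv/data" fstype="ext4"
    primitive p_ip ocf:heartbeat:IPaddr2 params ip="192.168.1.100" cidr_netmask="24"
    group g_service p_fs p_ip
    colocation col_service_on_master inf: g_service ms_drbd:Master
    order ord_promote_before_start inf: ms_drbd:promote g_service:start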

Page 21: Pacemaker+DRBD

21

• Pacemaker – Future developments

– Given the possible limitations on the number of nodes/resources within a single Pacemaker cluster (imposed not by Pacemaker itself but by the underlying messaging stack), scalability to thousands of nodes is in question

– To address it, development of Pacemaker Cloud has already begun

– Pacemaker Cloud provides high levels of service availability for high scale cloud deployments

– Reuses PEngine library from Pacemaker

High Availability Clustering – A historical background and future endeavors

Page 22: Pacemaker+DRBD

22

• Pacemaker – Future developments

– Integrates with several technologies

• Matahari – A stripped-down version of Pacemaker suited to running inside VMs

• DeltaCloud – An API that abstracts the differences between cloud providers, preventing vendor lock-in

– Project under development, not yet ready for mainstream use

High Availability Clustering – A historical background and future endeavors

Page 23: Pacemaker+DRBD

23

• Pacemaker – Future developments

– Stretch clusters (multi-site clusters/clusters of clusters) were discussed as being under development

• On December 4th, the Booth cluster ticket manager was launched

• Multi-site clusters can be considered as “overlay” clusters where each cluster site corresponds to a cluster node in a traditional cluster

– Scalability is thus addressed both in the short term and in the long term

High Availability Clustering – A historical background and future endeavors

Page 24: Pacemaker+DRBD

24

• DRBD – Shared storage made easy

– Spans 2 cluster nodes (Master/Slave or Master/Master)

– All write I/O synchronously replicated to other node

– Also considered to be a Network-based RAID-1

– Widely used in the industry, including at 1&1, so most of its features are well known

• Stacked resources, which allow 3-way and even 4-way replication (as of version 8.3); a basic resource configuration sketch follows below

• Adaptive dynamic resync rate controller (starting with 8.3.11)

High Availability Clustering – A historical background and future endeavors
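A minimal sketch of a DRBD resource definition matching the description above (protocol C gives synchronous replication); host names, backing devices and addresses are assumptions.

    # /etc/drbd.d/r0.res -- illustrative values only
    resource r0 {
        protocol C;                       # synchronous replication: writes confirmed by both nodes
        on node1 {
            device    /dev/drbd0;
            disk      /dev/sdb1;          # backing device on node1
            address   192.168.1.1:7789;
            meta-disk internal;
        }
        on node2 {
            device    /dev/drbd0;
            disk      /dev/sdb1;          # backing device on node2
            address   192.168.1.2:7789;
            meta-disk internal;
        }
    }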

Page 25: Pacemaker+DRBD

25

• DRBD – Shared storage made easy

• The multi-volume feature allows several minor devices to be used “within the same resource”

• Planned features include development of a “full data log” (it may have another name when released), which would allow the Secondary to remain consistent even after replication link hiccups or a fallback to bitmap

High Availability Clustering – A historical background and future endeavors

Page 26: Pacemaker+DRBD

26

• Clustering – Introduction

• High Availability Clustering – A historical background and future endeavors

• Cluster components – Tools of the trade

• Clustering scenarios – Fitting the needs

• Resource agents – Controlling cluster services

• Demo

• Q&A

Page 27: Pacemaker+DRBD

27

Cluster components – Tools of the trade

Page 28: Pacemaker+DRBD

28

• Pacemaker's internal components

– CRMd – Cluster Resource Manager daemon (a message broker between the PEngine and the LRMd)

– LRMd – Local Resource Manager daemon (non-cluster aware daemon that interacts with resource agents – scripts – directly)

– PEngine – Policy Engine (the “brain”, computes the next state of the cluster based on current state + conf)

– CIB – Cluster Information Base (contains all cluster information, synchronizes updates to all nodes; see the inspection commands below)

– STONITHd – Shoot-The-Other-Node-In-The-Head Daemon (a subsystem for node fencing)

Cluster components – Tools of the trade
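A few commands, as a sketch, for looking at these components on a running cluster:

    crm_mon -1          # one-shot view of node membership and resource status
    cibadmin --query    # dump the current CIB as XML
    crm_verify -L -V    # have the Policy Engine check the live configuration for errors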

Page 29: Pacemaker+DRBD

29

Cluster components – Tools of the trade

Page 30: Pacemaker+DRBD

30

• Clustering – Introduction

• High Availability Clustering – A historical background and future endeavors

• Cluster components – Tools of the trade

• Clustering scenarios – Fitting the needs

• Resource agents – Controlling cluster services

• Demo

• Q&A

Page 31: Pacemaker+DRBD

31

• Pacemaker can support practically any redundancy configuration, including

– Active/Active

– Active/Passive

– N+1

– N+M

– N-to-1

– N-to-N

Clustering scenarios – Fitting the needs

Page 32: Pacemaker+DRBD

32

Clustering scenarios – Fitting the needs

Page 33: Pacemaker+DRBD

33

Clustering scenarios – Fitting the needs

Page 34: Pacemaker+DRBD

34

Clustering scenarios – Fitting the needs

Page 35: Pacemaker+DRBD

35

Clustering scenarios – Fitting the needs

Page 36: Pacemaker+DRBD

36

Clustering scenarios – Fitting the needs

Page 37: Pacemaker+DRBD

37

• Clustering – Introduction

• High Availability Clustering – A historical background and future endeavors

• Cluster components – Tools of the trade

• Clustering scenarios – Fitting the needs

• Resource agents – Controlling cluster services

• Demo – Don't try this at home

• Q&A

Page 38: Pacemaker+DRBD

38

• Definition: a Resource Agent is a standardized interface for a cluster resource

• Pacemaker supports four types of RAs:

– Heartbeat v1 (legacy, deprecated)

– LSB (Linux Standard Base) “init scripts”

– OCF (Open Cluster Framework)

– STONITH Resource Agents

• Most Resource Agents are coded as shell scripts

Resource agents – Controlling cluster services

Page 39: Pacemaker+DRBD

39

• LSB Resource Agents

– Are the scripts found in /etc/init.d/

– Require LSB compliance in terms of exit codes and arguments for usage within a Pacemaker cluster

– Although many distributions boast LSB-compliant init scripts, they often ship with broken ones

– Broken LSB compliance leads to situations where controlling the service via the init script works, but controlling it via Pacemaker doesn't

• Always check that a script is LSB-compliant before adding it to a Pacemaker cluster (a quick spot-check follows below)

Resource agents – Controlling cluster services
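A quick way to spot-check LSB compliance before handing an init script to Pacemaker, as mentioned above; "myservice" is a placeholder name.

    /etc/init.d/myservice start  ; echo $?   # must return 0
    /etc/init.d/myservice status ; echo $?   # must return 0 while the service is running
    /etc/init.d/myservice start  ; echo $?   # starting an already-started service must still return 0
    /etc/init.d/myservice stop   ; echo $?   # must return 0
    /etc/init.d/myservice status ; echo $?   # must return 3 when the service is stopped
    /etc/init.d/myservice stop   ; echo $?   # stopping an already-stopped service must still return 0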

Page 40: Pacemaker+DRBD

40

• OCF Resource Agents

– Are the scripts found in /usr/lib/ocf/resource.d/provider/

– The OCF spec is an extension of the definitions for LSB Resource Agents

– Require the same LSB compliance in terms of exit codes and arguments for usage within a Pacemaker cluster

– Support additional parameters to be passed to the script

– Support additional actions compared to the LSB Resource Agents

– Can be tested for compliance with ocf-tester (see the example below)

Resource agents – Controlling cluster services
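For example, testing the stock IPaddr2 agent with ocf-tester might look like this; the resource name and IP address are placeholders.

    ocf-tester -n test_ip -o ip=192.168.122.100 \
        /usr/lib/ocf/resource.d/heartbeat/IPaddr2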

Page 41: Pacemaker+DRBD

41

• Supported operations of Resource Agents (a minimal agent skeleton follows after this list)

– start: enable or start the given resource

– stop: disable or stop the given resource

– monitor: check whether the resource is running or not

– validate-all: validate the resource's configuration

– meta-data: return information about the RA itself (used by GUIs and other tools)

Resource agents – Controlling cluster services
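A minimal OCF resource agent skeleton covering the operations listed above; "mydaemon", its binary and pid file are placeholders, and a real agent would emit full XML metadata and validate its parameters.

    #!/bin/sh
    # Minimal OCF resource agent sketch -- not a production agent
    # OCF_ROOT is exported by the cluster (or by ocf-tester)
    : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
    . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs    # defines the OCF_* return codes

    PIDFILE=/var/run/mydaemon.pid

    mydaemon_monitor() {
        # running if the pid file exists and the process answers signal 0
        [ -f "$PIDFILE" ] && kill -0 "$(cat $PIDFILE)" 2>/dev/null && return $OCF_SUCCESS
        return $OCF_NOT_RUNNING
    }
    mydaemon_start() {
        mydaemon_monitor && return $OCF_SUCCESS      # already running
        /usr/sbin/mydaemon && return $OCF_SUCCESS
        return $OCF_ERR_GENERIC
    }
    mydaemon_stop() {
        mydaemon_monitor || return $OCF_SUCCESS      # already stopped
        kill "$(cat $PIDFILE)" && return $OCF_SUCCESS
        return $OCF_ERR_GENERIC
    }

    case "$1" in
        start)        mydaemon_start ;;
        stop)         mydaemon_stop ;;
        monitor)      mydaemon_monitor ;;
        validate-all) exit $OCF_SUCCESS ;;           # a real agent checks its parameters here
        meta-data)    echo '<resource-agent name="mydaemon"/>' ; exit $OCF_SUCCESS ;;
        *)            exit $OCF_ERR_UNIMPLEMENTED ;;
    esac
    exit $?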

Page 42: Pacemaker+DRBD

42

• Additional operations provided by OCF Resource Agents

– promote: promote the local instance of a resource to the master/primary state

– demote: demote the local instance of a resource to the slave/secondary state

– notify: used by the cluster to send the agent pre- and post-event notifications about actions performed on the resource

– reload: reload the configuration of the resource

– migrate_from/migrate_to: perform live migration of a resource

Resource agents – Controlling cluster services

Page 43: Pacemaker+DRBD

43

• Resource scores

– Every resource has a score, even if not explicitly defined

– The CRM (through the PEngine) uses scores to calculate resource placement across the available cluster nodes

– Every decision about where a resource is placed comes down to assigning and manipulating scores

– Highest score INF (1,000,000), lowest score -INF (-1,000,000); resources can get any score within that range, including INF/-INF

– Positive values mean “can run”, negative values mean “cannot run”; +/- INF change “can” to “must” (see the constraint sketch below)

Resource agents – Controlling cluster services
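Scores are most visible in location constraints; a sketch in crm shell syntax (resource and node names are assumptions), plus a command that prints the allocation scores Pacemaker actually computed.

    # crm configure (excerpt) -- illustrative only
    location loc_prefer_node1 g_service 100: node1    # positive score: "can run", prefer node1
    location loc_ban_node3    g_service -inf: node3   # -INF: "must not run" on node3

    # show the allocation scores computed for the live cluster
    crm_simulate -sL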

Page 44: Pacemaker+DRBD

44

• Clustering – Introduction

• High Availability Clustering – A historical background and future endeavors

• Cluster components – Tools of the trade

• Clustering scenarios – Fitting the needs

• Resource agents – Controlling cluster services

• Demo

• Q&A

Page 45: Pacemaker+DRBD

45

• Clustering – Introduction

• High Availability Clustering – A historical background and future endeavors

• Cluster components – Tools of the trade

• Clustering scenarios – Fitting the needs

• Resource agents – Controlling cluster services

• Demo

• Q&A

Page 46: Pacemaker+DRBD

46

Q&A

Resource agents – Controlling cluster services

Page 47: Pacemaker+DRBD

47

• Useful resources and links

– http://fghaas.wordpress.com/2009/11/16/linbit-announces-stewardship-for-heartbeat-code-base/

– http://www.saforum.org/Application-Interface-Specification~217404~16627.htm

– http://linux-ha.org/wiki/LSB_Resource_Agents

– http://linux-ha.org/wiki/OCF_Resource_Agents

– http://www.openais.org/doku.php

– http://www.corosync.org/doku.php

– http://www.clusterlabs.org

– http://www.drbd.org

Contact