76
Galera Cluster Best Practices Seppo Jaakola Codership

Plny12 galera-cluster-best-practices

Embed Size (px)

Citation preview

Page 1: Plny12 galera-cluster-best-practices

Galera Cluster Best Practices

Seppo JaakolaCodership

Page 2: Plny12 galera-cluster-best-practices

2www.codership.com

Agenda

● Galera Cluster Short Introduction● Multi-master Conflicts● State Transfers (SST IST) ● Backups● Schema Upgrades● Galera Project

Page 3: Plny12 galera-cluster-best-practices

Galera Cluster

Page 4: Plny12 galera-cluster-best-practices

4www.codership.com

MySQL

Multi-Master Replication

a Galera Replication

Page 5: Plny12 galera-cluster-best-practices

5www.codership.com

MySQL MySQL

Multi-Master Replication

a Galera Replication

There can be several nodes

Page 6: Plny12 galera-cluster-best-practices

6www.codership.com

MySQL MySQL MySQL

Multi-Master Replication

a Galera Replication

There can be several nodes

Page 7: Plny12 galera-cluster-best-practices

7www.codership.com

MySQL MySQL MySQL

Multi-Master Replication

a Galera Replication

Client can connect to any node

There can be several nodes

Page 8: Plny12 galera-cluster-best-practices

8www.codership.com

MySQL MySQL MySQL

Multi-Master Replication

a Galera Replication

Client can connect to any node

There can be several nodes

read & write read & write read & write Read & write access to any node

Page 9: Plny12 galera-cluster-best-practices

9www.codership.com

MySQL MySQL MySQL

Multi-Master Replication

a Galera ReplicationReplication is synchronous

read & write read & write read & write

Client can connect to any node

There can be several nodes

Read & write access to any node

Page 10: Plny12 galera-cluster-best-practices

10www.codership.com

MySQL

Multi-Master Replication

a

Multi-master cluster looks like one big database with multiple entry points

read & write read & write read & write

Page 11: Plny12 galera-cluster-best-practices

11www.codership.com

Galera Cluster

➢ Synchronous multi-master cluster➢ For MySQL/InnoDB➢ 3 or more nodes needed for HA➢ Automatic node provisioning➢ Works in LAN / WAN / Cloud

Page 12: Plny12 galera-cluster-best-practices

14www.codership.com

MySQL MySQL MySQL

Synchronous Replication

aGalera Replication

Read & write

Transaction is processed locally up to commit time

Page 13: Plny12 galera-cluster-best-practices

15www.codership.com

MySQL MySQL MySQL

Synchronous Replication

aGalera Replication

commit

Transaction is replicated to whole cluster

Page 14: Plny12 galera-cluster-best-practices

16www.codership.com

Galera Replication

MySQL MySQL MySQL

Synchronous Replication

a

OK

Client gets OK status

Page 15: Plny12 galera-cluster-best-practices

17www.codership.com

Galera Replication

MySQL MySQL MySQL

Synchronous Replication

a

Transaction is applied in slaves

Page 16: Plny12 galera-cluster-best-practices

Dealing with Multi-Master Conflicts

Page 17: Plny12 galera-cluster-best-practices

20www.codership.com

MySQL MySQL MySQL

Multi-master Conflicts

aGalera Replication

write write

Page 18: Plny12 galera-cluster-best-practices

21www.codership.com

MySQL MySQL MySQL

Multi-master Conflicts

aGalera Replication

write write

Conflict detected

Page 19: Plny12 galera-cluster-best-practices

22www.codership.com

MySQL MySQL MySQL

Multi-master Conflicts

aGalera Replication

OK writeDeadlockerror

Page 20: Plny12 galera-cluster-best-practices

23www.codership.com

Multi-Master Conflicts

● Galera uses optimistic concurrency control● If two transactions modify same row on

different nodes at the same time, one of the transactions must abort➔ Victim transaction will get deadlock error

● Application should retry deadlocked transactions, however not all applications have retrying logic inbuilt

Page 21: Plny12 galera-cluster-best-practices

24www.codership.com

Database Hot-Spots

● Some rows where many transactions want to write to simultaneously

● Patterns like queue or ID allocation can be hot-spots

Page 22: Plny12 galera-cluster-best-practices

25www.codership.com

Hot-Spots

a

write

Hot row

writewrite

Page 23: Plny12 galera-cluster-best-practices

26www.codership.com

Diagnosing Multi-Master Conflicts

● In the past Galera did not log much information from cluster wide conflicts

● But, by using wsrep_debug configuration, all conflicts (...and plenty of other information) will be logged

● Next release will add new variable: wsrep_log_conflicts which will cause each cluster conflict to be logged in mysql error log

● Monitor:● wsrep_local_bf_aborts● wsrep_local_cert_failures

Page 24: Plny12 galera-cluster-best-practices

27www.codership.com

wsrep_retry_autocommit

● Galera can retry autocommit transaction on behalf of the client application, inside of the MySQL server

● MySQL will not return deadlock error, but will silently retry the transaction

● wsrep_retry_autocommit=n will retry the transaction n times before giving up and returning deadlock error

● Retrying applies only to autocommit transactions, as retrying is not safe for multi-statement transactions

Page 25: Plny12 galera-cluster-best-practices

28www.codership.com

MySQL MySQL MySQL

Retry Autocommit

aGalera Replication

Writewrite

1. conflict detected

2. retrying

Page 26: Plny12 galera-cluster-best-practices

29www.codership.com

Multi-Master Conflicts

1)Analyze the hot-spot

2)Check if application logic can be changed to catch deadlock exception and apply retrying logic in application

3)Try if wsrep_retry_autocommit configuration helps

4)Limit the number of master nodes or change completely to master-slave model

if you can filter out the access to the hot-spot table, it is enough to treat writes only to hot-spot table as master-slave

Page 27: Plny12 galera-cluster-best-practices

State Transfers

Page 28: Plny12 galera-cluster-best-practices

31www.codership.com

State Transfer

➢ Joining node needs to get the current database state

➢ Two choices:➢ IST: incremental state transfer➢ SST: full state transfer

➢ If joining node had some previous state and gcache spans to that, then IST can be used

Page 29: Plny12 galera-cluster-best-practices

32www.codership.com

State Snapshot Transfer

● To send full database state● wsrep_sst_method to choose the method:

➢ mysqldump➢ rsync➢ xtrabackup

Page 30: Plny12 galera-cluster-best-practices

33www.codership.com

MySQL MySQL

SST Request

Galera Replication

joinerSST Request● wsrep_sst_method

Page 31: Plny12 galera-cluster-best-practices

34www.codership.com

MySQL donor

SST Method

Galera Replication

joiner

wsrep_sst_mysqldump

wsrep_sst_rsync

wsrep_sst_xtrabackup

Page 32: Plny12 galera-cluster-best-practices

35www.codership.com

SST API

● SST is open API for shell scripts● Anyone can write custom SST● SST API can be used e.g. for:

● Backups● Filtering out part of database

Page 33: Plny12 galera-cluster-best-practices

36www.codership.com

wsrep_sst_mysqldump

● Logical backup● Slowest method● Configure authentication

➢ wsrep_sst_auth=”root:rootpass”➢ Super privilege needed

● Make sure SST user in donor node can take mysqldump from donor and load it over the network to joiner node

● You can try this manually beforehand

Page 34: Plny12 galera-cluster-best-practices

37www.codership.com

wsrep_sst_rsync

● Physical backup● Fast method● Can only be used when node is starting

➢ Rsyncing datadirectory under running InnoDB is not possible

Page 35: Plny12 galera-cluster-best-practices

38www.codership.com

wsrep_sst_xtrabackup

● Contributed by Percona● Probably the fastest method● Uses xtrabackup● Least blocking on Donor side (short readlock is still used when backup starts)

Page 36: Plny12 galera-cluster-best-practices

39www.codership.com

SST Donor

● All SST methods cause some disturbance for donor node

● By default donor accepts client connections, although committing will be prohibited for a while

● If wsrep_sst_donor_rejects_queries is set, donor gives unknown command error to clients

➔Best practice is to dedicate a reference node for donor and backup activities

Page 37: Plny12 galera-cluster-best-practices

40www.codership.com

Donor

Incremental State Transfer

Request to join

Joiner

gcache

GTID: seqno-n

gcache

seqno-n

Page 38: Plny12 galera-cluster-best-practices

41www.codership.com

Donor

Incremental State Transfer

Joiner

seqno-ngcachegcache

Send IST eventsapply

Page 39: Plny12 galera-cluster-best-practices

42www.codership.com

Incremental State Transfer

● Very effective● gcache.size parameter defines how big cache will be maintained

● Gcache is mmap, available disk space is upper limit for size allocation

Page 40: Plny12 galera-cluster-best-practices

43www.codership.com

Incremental State Transfer

● Use database size and write rate to optimize gcache:

➢ gcache < database➢ Write rate tells how long tail will be stored in cache

Page 41: Plny12 galera-cluster-best-practices

44www.codership.com

Incremental State Transfer

● You can think that IST Is ● A short asynchronous replication session ● If communication is bad quality, node

can drop and join back fast with IST

Page 42: Plny12 galera-cluster-best-practices

Backups Backups Backups

Page 43: Plny12 galera-cluster-best-practices

46www.codership.com

Backups

➢ All Galera nodes are constantly up to date➢ Best practices:

➢ Dedicate a reference node for backups➢ Assign global trx ID with the backup

➢ Possible methods:

1.Disconnecting a node for backup2.Using SST script interface3.xtrabackup

Page 44: Plny12 galera-cluster-best-practices

47www.codership.com

Backups with global Trx ID

➢ Global transaction ID (GTID) marks a position in the cluster transaction stream

➢ Backup with known GTID make it possible to utilize IST when joining new nodes, eg, when: ➢ Recovering the node➢ Provisioning new nodes

Page 45: Plny12 galera-cluster-best-practices

48www.codership.com

Backup by Disconnecting a Node

MySQL MySQL MySQL

Galera Replication

Load Balancing Isolate the backup node

Page 46: Plny12 galera-cluster-best-practices

49www.codership.com

Backup by Disconnecting a Node

MySQL MySQL MySQL

Galera Replication

Load Balancing

Disconnect from groupe.g. clear wsrep_provider

Page 47: Plny12 galera-cluster-best-practices

50www.codership.com

Backup by Disconnecting a Node

MySQL MySQL MySQL

Galera Replication

Load Balancing

Disconnect from groupe.g. clear wsrep_provider

Page 48: Plny12 galera-cluster-best-practices

51www.codership.com

Backup by Disconnecting a Node

MySQL MySQL MySQL

Galera Replication

Load Balancing

Work your backup magic

backups

Page 49: Plny12 galera-cluster-best-practices

52www.codership.com

Backup by Disconnecting a Node

MySQL MySQL MySQL

Galera Replication

Load Balancing

Read global transaction ID from status and assign to backupwsrep_cluster_uuidwsrep_last_committed

backups

Page 50: Plny12 galera-cluster-best-practices

53www.codership.com

Backup by SST

● Donor mode provides isolated processing environment

● A special SST script can be written just to prepare backup in donor node: wsrep_sst_backup

● Garbd can be used to trigger donor node to run the wsrep_sst_backup

Page 51: Plny12 galera-cluster-best-practices

54www.codership.com

Backup by SST API

node1 node3

Galera Replication

Load Balancing Launch garbd

Garbd

SST request

wsrep_sst_donor=node3wsrep_sst_method=backup

node2

Page 52: Plny12 galera-cluster-best-practices

55www.codership.com

Backup by SST API

node1 node2 node3

Galera Replication

Load Balancing Donor launcheswsrep_sst_backup

wsrep_sst_backup

.

.

.

Page 53: Plny12 galera-cluster-best-practices

56www.codership.com

Backup by SST API

node1 node2 node3

Galera Replication

Load Balancing wsrep_sst_backup prepares the backup

wsrep_sst_backup

.

.

.

backups

GTID

Page 54: Plny12 galera-cluster-best-practices

57www.codership.com

Backup by SST API

node1 node2 node3

Galera Replication

Load Balancing Backup node returns to cluster

Page 55: Plny12 galera-cluster-best-practices

58www.codership.com

Backup by xtrabackup

● Xtrabackup is hot backup method and can be used anytime

● Simple, efficient● Use –galera-info option to get global transaction

ID logged into separate galera info file

Page 56: Plny12 galera-cluster-best-practices

Schema Upgrades

Page 57: Plny12 galera-cluster-best-practices

60www.codership.com

Schema Upgrades

● DDL is non-transactional, and therefore bad

● Galera has two methods for DDL● TOI, Total Order Isolation● RSU, Rolling Schema Upgrade

● Use wsrep_osu_method to choose either option

Page 58: Plny12 galera-cluster-best-practices

61www.codership.com

Total Order Isolation

● DDL is replicated up-front● Each node will get the DDL statement

and must process the DDL at same slot in transaction stream

● Galera will isolate the affected table/database for the duration of DDL processing

Page 59: Plny12 galera-cluster-best-practices

62www.codership.com

Rolling Schema Upgrade

● DDL is not replicated● Galera will take the node out of replication

for the duration of DDL processing● When DDL is done with, node will catch up

with missed transactions (like IST)● DBA should roll RSU operation over all

nodes● Requires backwards compatible schema

changes

Page 60: Plny12 galera-cluster-best-practices

63www.codership.com

wsrep_on=OFF

● wsrep_on is a session variable telling if this session will be replicated or not

● I tried to hide this information to the best I can, but somebody has leaked this out

● And so, yes, it is possible to run “poor man's RSU” with wsrep_on set to OFF

● such session may be aborted by replication● Use only, if you are really sure that:

● planned SQL is not conflicting● SQL will not generate inconsistency

Page 61: Plny12 galera-cluster-best-practices

64www.codership.com

Schema Upgrades

● Best practices:➔ Plan your upgrades

➔ Try to be backwards compatible➔ Rehearse your upgrades

➔ Find out DDL execution time➔ Go for RSU if possible

Page 62: Plny12 galera-cluster-best-practices

Consistent Reads

Page 63: Plny12 galera-cluster-best-practices

66www.codership.com

Consistent reads

MySQL MySQL MySQL

Galera Replication

commit

Transaction is replicated to whole cluster

Replication is virtually synchronous...

Page 64: Plny12 galera-cluster-best-practices

67www.codership.com

Consistent reads

MySQL MySQL

Galera Replication

1. Insert into t1 values (1,....)

Will the select see the inserted row?

2. Select from t1 where i=1

Page 65: Plny12 galera-cluster-best-practices

68www.codership.com

Consistent Reads

● Aka read causality● There is causal dependency between

operations on two database connections● Application is expecting to see the values of

earlier write

Page 66: Plny12 galera-cluster-best-practices

69www.codership.com

Consistent Reads

● Use: wsrep_causal_reads=ON➔ Every read (select, show) will wait until slave

queue has been fully applied● There is timeout for max causal read wait:

● replicator.causal_read_keepalive

Page 67: Plny12 galera-cluster-best-practices

Other Tidbits...

Page 68: Plny12 galera-cluster-best-practices

71www.codership.com

Parallel Applying

● Aka parallel replication● “true parallel applying”

● Every application will benefit of it● Works not on database, not on table,

but on row level● wsrep_slave_threads=n● How many slaves makes sense:

● Monitor wsrep_cert_deps_distance● Max 2 * cores

Page 69: Plny12 galera-cluster-best-practices

72www.codership.com

MyISAM Replication

● On experimental level● MyISAM is phasing out not much demand to

complete● Replicates SQL up-front, like TOI● Should be used in master-slave model● No checks for non-deterministic SQL

● Insert into t (r, time) values (rand(), now());

Page 70: Plny12 galera-cluster-best-practices

73www.codership.com

SSL / TLS

● Replication over SSL is supported● No authentication (yet), only encryption● Whole cluster must use SSL

Page 71: Plny12 galera-cluster-best-practices

74www.codership.com

SSL or VPN

● Bundling several nodes through VPN tunnel may cause a vulnerability

● When VPN gateway breaks, a big part of cluster will be blacked out

● Best practice is to go for SSL if VPN does not have alternative routes

Page 72: Plny12 galera-cluster-best-practices

75www.codership.com

UDP Multicast

● Configure with gmcast.mcast_addr● Full cluster must be configured for

multicast or tcp sockets● Multicast is good for scalability● Best practice is to go for multicast if

planning for large clusters

Page 73: Plny12 galera-cluster-best-practices

Galera Project

Page 74: Plny12 galera-cluster-best-practices

77www.codership.com

Galera Project

● Galera Cluster for MySQL● 5 years development● based on MySQL server community edition● Fully open source● Active community

● ~3 releases per year● Release 2.2 RC out yesterday● Major release 3.0 in the works

● Galera Replication also used in:● Percona XtraDB Cluster● MariaDB Galera Cluster

Page 75: Plny12 galera-cluster-best-practices

78www.codership.com

Galera Project

API

MySQL

Galera Replication plugin

API

PerconaServer

API

MariaDB

mergemerge

Percona XtraDB Cluster

Galera Cluster for MySQL

MariaDB Galera Cluster

Page 76: Plny12 galera-cluster-best-practices

Questions?

Thank you for listening!Happy Clustering :-)