Upload
dimas-prasetyo
View
1.473
Download
15
Embed Size (px)
Citation preview
Galera Cluster Best Practices
Seppo JaakolaCodership
2www.codership.com
Agenda
● Galera Cluster Short Introduction● Multi-master Conflicts● State Transfers (SST IST) ● Backups● Schema Upgrades● Galera Project
Galera Cluster
4www.codership.com
MySQL
Multi-Master Replication
a Galera Replication
5www.codership.com
MySQL MySQL
Multi-Master Replication
a Galera Replication
There can be several nodes
6www.codership.com
MySQL MySQL MySQL
Multi-Master Replication
a Galera Replication
There can be several nodes
7www.codership.com
MySQL MySQL MySQL
Multi-Master Replication
a Galera Replication
Client can connect to any node
There can be several nodes
8www.codership.com
MySQL MySQL MySQL
Multi-Master Replication
a Galera Replication
Client can connect to any node
There can be several nodes
read & write read & write read & write Read & write access to any node
9www.codership.com
MySQL MySQL MySQL
Multi-Master Replication
a Galera ReplicationReplication is synchronous
read & write read & write read & write
Client can connect to any node
There can be several nodes
Read & write access to any node
10www.codership.com
MySQL
Multi-Master Replication
a
Multi-master cluster looks like one big database with multiple entry points
read & write read & write read & write
11www.codership.com
Galera Cluster
➢ Synchronous multi-master cluster➢ For MySQL/InnoDB➢ 3 or more nodes needed for HA➢ Automatic node provisioning➢ Works in LAN / WAN / Cloud
14www.codership.com
MySQL MySQL MySQL
Synchronous Replication
aGalera Replication
Read & write
Transaction is processed locally up to commit time
15www.codership.com
MySQL MySQL MySQL
Synchronous Replication
aGalera Replication
commit
Transaction is replicated to whole cluster
16www.codership.com
Galera Replication
MySQL MySQL MySQL
Synchronous Replication
a
OK
Client gets OK status
17www.codership.com
Galera Replication
MySQL MySQL MySQL
Synchronous Replication
a
Transaction is applied in slaves
Dealing with Multi-Master Conflicts
20www.codership.com
MySQL MySQL MySQL
Multi-master Conflicts
aGalera Replication
write write
21www.codership.com
MySQL MySQL MySQL
Multi-master Conflicts
aGalera Replication
write write
Conflict detected
22www.codership.com
MySQL MySQL MySQL
Multi-master Conflicts
aGalera Replication
OK writeDeadlockerror
23www.codership.com
Multi-Master Conflicts
● Galera uses optimistic concurrency control● If two transactions modify same row on
different nodes at the same time, one of the transactions must abort➔ Victim transaction will get deadlock error
● Application should retry deadlocked transactions, however not all applications have retrying logic inbuilt
24www.codership.com
Database Hot-Spots
● Some rows where many transactions want to write to simultaneously
● Patterns like queue or ID allocation can be hot-spots
25www.codership.com
Hot-Spots
a
write
Hot row
writewrite
26www.codership.com
Diagnosing Multi-Master Conflicts
● In the past Galera did not log much information from cluster wide conflicts
● But, by using wsrep_debug configuration, all conflicts (...and plenty of other information) will be logged
● Next release will add new variable: wsrep_log_conflicts which will cause each cluster conflict to be logged in mysql error log
● Monitor:● wsrep_local_bf_aborts● wsrep_local_cert_failures
27www.codership.com
wsrep_retry_autocommit
● Galera can retry autocommit transaction on behalf of the client application, inside of the MySQL server
● MySQL will not return deadlock error, but will silently retry the transaction
● wsrep_retry_autocommit=n will retry the transaction n times before giving up and returning deadlock error
● Retrying applies only to autocommit transactions, as retrying is not safe for multi-statement transactions
28www.codership.com
MySQL MySQL MySQL
Retry Autocommit
aGalera Replication
Writewrite
1. conflict detected
2. retrying
29www.codership.com
Multi-Master Conflicts
1)Analyze the hot-spot
2)Check if application logic can be changed to catch deadlock exception and apply retrying logic in application
3)Try if wsrep_retry_autocommit configuration helps
4)Limit the number of master nodes or change completely to master-slave model
if you can filter out the access to the hot-spot table, it is enough to treat writes only to hot-spot table as master-slave
State Transfers
31www.codership.com
State Transfer
➢ Joining node needs to get the current database state
➢ Two choices:➢ IST: incremental state transfer➢ SST: full state transfer
➢ If joining node had some previous state and gcache spans to that, then IST can be used
32www.codership.com
State Snapshot Transfer
● To send full database state● wsrep_sst_method to choose the method:
➢ mysqldump➢ rsync➢ xtrabackup
33www.codership.com
MySQL MySQL
SST Request
Galera Replication
joinerSST Request● wsrep_sst_method
34www.codership.com
MySQL donor
SST Method
Galera Replication
joiner
wsrep_sst_mysqldump
wsrep_sst_rsync
wsrep_sst_xtrabackup
35www.codership.com
SST API
● SST is open API for shell scripts● Anyone can write custom SST● SST API can be used e.g. for:
● Backups● Filtering out part of database
36www.codership.com
wsrep_sst_mysqldump
● Logical backup● Slowest method● Configure authentication
➢ wsrep_sst_auth=”root:rootpass”➢ Super privilege needed
● Make sure SST user in donor node can take mysqldump from donor and load it over the network to joiner node
● You can try this manually beforehand
37www.codership.com
wsrep_sst_rsync
● Physical backup● Fast method● Can only be used when node is starting
➢ Rsyncing datadirectory under running InnoDB is not possible
38www.codership.com
wsrep_sst_xtrabackup
● Contributed by Percona● Probably the fastest method● Uses xtrabackup● Least blocking on Donor side (short readlock is still used when backup starts)
39www.codership.com
SST Donor
● All SST methods cause some disturbance for donor node
● By default donor accepts client connections, although committing will be prohibited for a while
● If wsrep_sst_donor_rejects_queries is set, donor gives unknown command error to clients
➔Best practice is to dedicate a reference node for donor and backup activities
40www.codership.com
Donor
Incremental State Transfer
Request to join
Joiner
gcache
GTID: seqno-n
gcache
seqno-n
41www.codership.com
Donor
Incremental State Transfer
Joiner
seqno-ngcachegcache
Send IST eventsapply
42www.codership.com
Incremental State Transfer
● Very effective● gcache.size parameter defines how big cache will be maintained
● Gcache is mmap, available disk space is upper limit for size allocation
43www.codership.com
Incremental State Transfer
● Use database size and write rate to optimize gcache:
➢ gcache < database➢ Write rate tells how long tail will be stored in cache
44www.codership.com
Incremental State Transfer
● You can think that IST Is ● A short asynchronous replication session ● If communication is bad quality, node
can drop and join back fast with IST
Backups Backups Backups
46www.codership.com
Backups
➢ All Galera nodes are constantly up to date➢ Best practices:
➢ Dedicate a reference node for backups➢ Assign global trx ID with the backup
➢ Possible methods:
1.Disconnecting a node for backup2.Using SST script interface3.xtrabackup
47www.codership.com
Backups with global Trx ID
➢ Global transaction ID (GTID) marks a position in the cluster transaction stream
➢ Backup with known GTID make it possible to utilize IST when joining new nodes, eg, when: ➢ Recovering the node➢ Provisioning new nodes
48www.codership.com
Backup by Disconnecting a Node
MySQL MySQL MySQL
Galera Replication
Load Balancing Isolate the backup node
49www.codership.com
Backup by Disconnecting a Node
MySQL MySQL MySQL
Galera Replication
Load Balancing
Disconnect from groupe.g. clear wsrep_provider
50www.codership.com
Backup by Disconnecting a Node
MySQL MySQL MySQL
Galera Replication
Load Balancing
Disconnect from groupe.g. clear wsrep_provider
51www.codership.com
Backup by Disconnecting a Node
MySQL MySQL MySQL
Galera Replication
Load Balancing
Work your backup magic
backups
52www.codership.com
Backup by Disconnecting a Node
MySQL MySQL MySQL
Galera Replication
Load Balancing
Read global transaction ID from status and assign to backupwsrep_cluster_uuidwsrep_last_committed
backups
53www.codership.com
Backup by SST
● Donor mode provides isolated processing environment
● A special SST script can be written just to prepare backup in donor node: wsrep_sst_backup
● Garbd can be used to trigger donor node to run the wsrep_sst_backup
54www.codership.com
Backup by SST API
node1 node3
Galera Replication
Load Balancing Launch garbd
Garbd
SST request
wsrep_sst_donor=node3wsrep_sst_method=backup
node2
55www.codership.com
Backup by SST API
node1 node2 node3
Galera Replication
Load Balancing Donor launcheswsrep_sst_backup
wsrep_sst_backup
.
.
.
56www.codership.com
Backup by SST API
node1 node2 node3
Galera Replication
Load Balancing wsrep_sst_backup prepares the backup
wsrep_sst_backup
.
.
.
backups
GTID
57www.codership.com
Backup by SST API
node1 node2 node3
Galera Replication
Load Balancing Backup node returns to cluster
58www.codership.com
Backup by xtrabackup
● Xtrabackup is hot backup method and can be used anytime
● Simple, efficient● Use –galera-info option to get global transaction
ID logged into separate galera info file
Schema Upgrades
60www.codership.com
Schema Upgrades
● DDL is non-transactional, and therefore bad
● Galera has two methods for DDL● TOI, Total Order Isolation● RSU, Rolling Schema Upgrade
● Use wsrep_osu_method to choose either option
61www.codership.com
Total Order Isolation
● DDL is replicated up-front● Each node will get the DDL statement
and must process the DDL at same slot in transaction stream
● Galera will isolate the affected table/database for the duration of DDL processing
62www.codership.com
Rolling Schema Upgrade
● DDL is not replicated● Galera will take the node out of replication
for the duration of DDL processing● When DDL is done with, node will catch up
with missed transactions (like IST)● DBA should roll RSU operation over all
nodes● Requires backwards compatible schema
changes
63www.codership.com
wsrep_on=OFF
● wsrep_on is a session variable telling if this session will be replicated or not
● I tried to hide this information to the best I can, but somebody has leaked this out
● And so, yes, it is possible to run “poor man's RSU” with wsrep_on set to OFF
● such session may be aborted by replication● Use only, if you are really sure that:
● planned SQL is not conflicting● SQL will not generate inconsistency
64www.codership.com
Schema Upgrades
● Best practices:➔ Plan your upgrades
➔ Try to be backwards compatible➔ Rehearse your upgrades
➔ Find out DDL execution time➔ Go for RSU if possible
Consistent Reads
66www.codership.com
Consistent reads
MySQL MySQL MySQL
Galera Replication
commit
Transaction is replicated to whole cluster
Replication is virtually synchronous...
67www.codership.com
Consistent reads
MySQL MySQL
Galera Replication
1. Insert into t1 values (1,....)
Will the select see the inserted row?
2. Select from t1 where i=1
68www.codership.com
Consistent Reads
● Aka read causality● There is causal dependency between
operations on two database connections● Application is expecting to see the values of
earlier write
69www.codership.com
Consistent Reads
● Use: wsrep_causal_reads=ON➔ Every read (select, show) will wait until slave
queue has been fully applied● There is timeout for max causal read wait:
● replicator.causal_read_keepalive
Other Tidbits...
71www.codership.com
Parallel Applying
● Aka parallel replication● “true parallel applying”
● Every application will benefit of it● Works not on database, not on table,
but on row level● wsrep_slave_threads=n● How many slaves makes sense:
● Monitor wsrep_cert_deps_distance● Max 2 * cores
72www.codership.com
MyISAM Replication
● On experimental level● MyISAM is phasing out not much demand to
complete● Replicates SQL up-front, like TOI● Should be used in master-slave model● No checks for non-deterministic SQL
● Insert into t (r, time) values (rand(), now());
73www.codership.com
SSL / TLS
● Replication over SSL is supported● No authentication (yet), only encryption● Whole cluster must use SSL
74www.codership.com
SSL or VPN
● Bundling several nodes through VPN tunnel may cause a vulnerability
● When VPN gateway breaks, a big part of cluster will be blacked out
● Best practice is to go for SSL if VPN does not have alternative routes
75www.codership.com
UDP Multicast
● Configure with gmcast.mcast_addr● Full cluster must be configured for
multicast or tcp sockets● Multicast is good for scalability● Best practice is to go for multicast if
planning for large clusters
Galera Project
77www.codership.com
Galera Project
● Galera Cluster for MySQL● 5 years development● based on MySQL server community edition● Fully open source● Active community
● ~3 releases per year● Release 2.2 RC out yesterday● Major release 3.0 in the works
● Galera Replication also used in:● Percona XtraDB Cluster● MariaDB Galera Cluster
78www.codership.com
Galera Project
API
MySQL
Galera Replication plugin
API
PerconaServer
API
MariaDB
mergemerge
Percona XtraDB Cluster
Galera Cluster for MySQL
MariaDB Galera Cluster
Questions?
Thank you for listening!Happy Clustering :-)