Taking Full Advantage of Galera Multi Master Cluster

Taking Full Advantage ofGalera Multi-Master

Philip StoevCodership Oy

Agenda• A very quick overview of Galera Cluster• General principles of multi-master (MM)• Workloads that are well-suited for MM• Application considerations for MM• Configuring, monitoring and troubleshooting multi-

master

Galera Cluster OverviewSynchronous

– each transaction is immediately replicated on all nodes at commit– no stale slaves

Multi-Master– read from and write to any node– automatic transaction conflict detection

Replication– a copy of the entire dataset is available on all nodes– new nodes can join automatically

For MySQL– based on a modified version of MySQL (5.5, 5.6 with 5.7 coming up)– InnoDB storage engine

And more …• Recovers from node failures within seconds• Data consistency protections

– avoids reading stale data– prevents unsafe data modifications

• Cloud and WAN support

Introduction to Multi-Master

What is Multi-Master• The ability to issue any transaction on any Galera node• A core feature of the product, not a clever trick that

happens to work• Available out of the box

Benefits to Multi-Master• Operational flexibility

– no need to designate a single node to use exclusively for writes– simplified configuration for load balancing– easier handling of scheduled downtime and node failures

• Wide Area Networks– applications can write to the node that is closest to them

General Principles• Galera places consistency on top:

– conflicting transactions issued on different nodes will be detected– the transaction that committed first succeeds, those that attempt to

commit after it are rejected– a transaction can be aborted halfway through if Galera detects that it

can not be completed without a conflict

• Callaghan’s Law “a given row can’t be modified more than once per RTT”

Write ScalingDoes multi-master provide write scaling?• The updates made by every write transaction need to

be applied on every Galera node• But none of the following operations are duplicated:

– parser and optimizer overhead– the effort needed to find and read many records in order to update a few– execution of triggers

Applications and Workloads

The Multimaster-ready Application• Check if application uses transactions or individual queries• Suggested application behavior:

– ensure that the application can handle “deadlock” errors during transaction and at COMMIT

– application should be able to retry failed transactions– transactions requiring absolutely fresh data are known – reads and writes can be directed to different servers if needed

• Better logging:– make sure all database errors are logged to enable analysis

• For legacy applications:– autocommit statements can be retried by Galera in case of failure

Suitable Workloads• Low percentage of effective database updates• Queries or transactions that perform a lot of work or

contain a lot of business logic but eventually update a smaller set of rows

Typical Example

START TRANSACTIONSELECT * FROM table1;SELECT * FROM table2;SELECT * FROM table3;

...UPDATE total_amount = 42 WHERE pk = 1

COMMIT

Other Examples

INSERT INTO t1SELECT COUNT(*) FROM very_large_table;

UPDATE shipmentsSET flag = 1

WHERE sender_country = ‘Vatican’AND receiving_state = ‘WY’;# Assuming no suitable indexes

Workload Considerations• High-percentage of single-row, NoSQL-style updates

that act on single rows• The SELECT FOR UPDATE statement• Frequent operations on “hot” rows:

– job queues or locking schemes implemented in the database– counters– generation of sequence numbers– repeated updates to “last accessed” timestamp records

• Long-running and housekeeping transactions

Autoincrement Handling• Galera handles AUTO_INCREMENT columns in a safe

way– works even as nodes join or leave the cluster– gaps in the sequence are possible, so use bigint columns

• There is no need for the application to manage sequence values, reserve ranges, etc.

Read-Write Splitting• If there are conflicts due to heavy contention on rows,

the application can direct writes to those rows alone to a single node

• With a TCP load-balancer, provide a TCP port that can be load-balanced to any node and a TCP port that is directed to a single node only

• Consider a query-aware proxy such as MaxScale

Configuring Galera for MM

Galera Variables• Galera is multi-master by default

– any node can accept any query out of the box

Useful options:• wsrep_retry_autocommit

– retries queries that failed

• wsrep_sync_wait– ensures data freshness

• wsrep_log_conficts– prints information in the server error log

Retrying Autocommit Transactions• Autocommit transactions are those that contain only a

single SQL statement, even if it updates multiple rows• A higher value of wsrep_retry_autocommit will help a

most such transactions complete successfully• default is 1, so one retry will happen by default• SQL statements that update many rows may not be

successful even if retried multiple times

Sync Waiting• With Galera, some small slave lag (a few transactions) is

allowed for performance reasons• If a transaction absolutely positively needs the most up-to-

date data there is, set wsrep_sync_wait• Can be set on a session basis, as needed (do not forget to

reset the variable at the end of the critical block)• Makes sure the data is up to date as of the start of the

transaction• Sync waiting is a properly of the transaction that requires

fresh data, not of the transaction that wrote the data

Dealing with Conflicts

Monitoring Conflicts• wsrep_local_bf_aborts

– number of transactions that were aborted because a conflicting transaction has already been committed locally

– this type of abort can happen even prior to COMMIT, to avoid performing unnecessary work that is doomed to fail

• wsrep_local_cert_failures– number of transactions failed at COMMIT time because they conflict

with another transaction still in “in flight”

Use the sum of the two counters.

Debugging Conflicts• Enable logging on the application side:

– to provide context information on the failing query (e.g. function or line numbers; schema name)

• Ensure that system time is synchronized across the cluster and with the application servers

• Enable the wsrep_log_conflicts variable• Enable binary logging to obtain information on the winning

transaction (see http://goo.gl/Tw5JLn)

http://goo.gl/Tw5JLn

Log Output*** Victim TRANSACTION:TRANSACTION 1374, ACTIVE 23 sec starting index readmysql tables in use 1, locked 14833 lock struct(s), heap size 554536, 1004832 row lock(s), undo log entries 934296MySQL thread id 5, OS thread handle 0x7fbbb4601700, query id 50localhost ::1 root updatingupdate t1 set f2 = 'problematic_key_value21'*** WAITING FOR THIS LOCK TO BE GRANTED:RECORD LOCKS space id 8 page no 4 n bits 280 index `PRIMARY`of table `test`.`t1` trx id 1374 lock_mode XRecord lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 4; hex 80000001; asc ;; # Unsigned integer value of PK 1: len 6; hex 00000000055e; asc ^;; 2: len 7; hex 39000021fd0110; asc 9 ! ;; 3: len 30; hex 70726f626c656d617469635f6b65795f76616c7565323120202020202020; asc

problematic_key_value21 ; (total 50 bytes);

Avoiding ConflictsFor hot records:• break down a “hot record” into multiple rows• replace repeated updates with inserting new records into a log

table

For long-running transactions:• split housekeeping work into smaller units

Or:• Send conflicting writes to a single node

– non-conflicting transactions can still be directed to any node

Questions

• Please use the Question/Chat box in the GoToWebinar panel

Thank You

http://www.galeracluster.com

Discussion group:

[email protected]

http://www.galeracluster.com/

mailto:[email protected]

Software

Taking Full Advantage of Galera Multi Master Cluster