Galera Cluster Best Practices Part 3: Schema Changes and DDL Philip Stoev Codership Oy

Galera Cluster DDL and Schema Upgrades 220217


Page 1: Galera Cluster DDL and Schema Upgrades 220217

Galera Cluster Best Practices Part 3: Schema Changes and DDL

Philip Stoev, Codership Oy

Page 2: Galera Cluster DDL and Schema Upgrades 220217

Agenda
• A very quick overview of Galera Cluster
• DDL handling in Galera Cluster
• Preparing for a schema upgrade
• Execution strategies for DDL
• Recent Developments and Future Improvements
• Q/A

Page 3: Galera Cluster DDL and Schema Upgrades 220217

Galera Cluster Overview

Synchronous
– each transaction is immediately replicated on all nodes at commit
– no stale slaves

Multi-Master
– read from and write to any node
– automatic transaction conflict detection

Replication
– a copy of the entire dataset is available on all nodes
– new nodes can join automatically

For MySQL
– based on a modified version of MySQL (5.5, 5.6, 5.7)
– InnoDB storage engine

Page 4: Galera Cluster DDL and Schema Upgrades 220217

And more …
• Recovers from node failures within seconds
• Data consistency protections
  – avoids reading stale data
  – prevents unsafe data modifications
• Cloud and WAN support

Page 5: Galera Cluster DDL and Schema Upgrades 220217

DDL in Galera
• DDL statements are handled differently in Galera
  – this is to ensure maximum data consistency in a distributed environment
  – the “online”, “in-place” and “non-blocking” terms from the MySQL documentation do not apply directly
• DDL execution must be thought out in advance

Page 6: Galera Cluster DDL and Schema Upgrades 220217

DDL Execution Methods
• Total Order Isolation (TOI), the default
  – the DDL is run on all nodes at the same time
  – the cluster cannot commit other transactions while the DDL is running
• Rolling Schema Upgrade (RSU)
  – the DDL is run on one node at a time

Page 7: Galera Cluster DDL and Schema Upgrades 220217

The Application and DDLs
• Check for DDLs executed by the application/framework:
  – some applications run a lot of CREATE TABLE [IF NOT EXISTS] at connection time
  – some run ALTER TABLE when started, if they feel they need to upgrade the schema
  – TEMPORARY tables are OK
• Take control over DDLs:
  – Revoke ALTER, INDEX privileges
  – A SQL-aware proxy / load balancer can also intercept such queries
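
To make the interception idea concrete, here is a minimal sketch of the kind of check a SQL-aware proxy or an application-side wrapper might apply before letting a statement through. The function name `is_replicated_ddl` and the keyword list are assumptions made for illustration; they are not part of Galera or of any particular proxy, and the pattern does not handle statements that start with a SQL comment.

```python
import re

# Hypothetical keyword list; extend it to match the DDL your application issues.
_DDL_RE = re.compile(r"^\s*(CREATE|ALTER|DROP|TRUNCATE|RENAME)\b", re.IGNORECASE)
# TEMPORARY tables are local to the connection, so they are let through.
_TEMP_RE = re.compile(r"^\s*(CREATE|DROP)\s+TEMPORARY\s+TABLE\b", re.IGNORECASE)


def is_replicated_ddl(sql: str) -> bool:
    """Return True if the statement looks like DDL that would affect the cluster."""
    if _TEMP_RE.match(sql):
        return False
    return bool(_DDL_RE.match(sql))
```

A wrapper could log or reject any statement for which this returns True, leaving schema changes to a controlled procedure.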

Page 8: Galera Cluster DDL and Schema Upgrades 220217

The DDL Statement
• CREATE, DROP [PARTITION]
  – usually fast enough, no need for special planning
  – unless executed repeatedly by multiple connections
• ALTER TABLE or CREATE INDEX
  – some operations have different execution speed depending on MySQL version
  – some statements operate on metadata only, so are fast
  – some DDL support ALGORITHM=INPLACE for faster execution
    • Will still cause locking in Galera under TOI
  – some DDLs require creating a complete copy of the entire table

Page 9: Galera Cluster DDL and Schema Upgrades 220217

OPTIMIZE TABLE, etc.
• If a statement can be given the NO_WRITE_TO_BINLOG modifier, it will not be replicated by Galera
• Such a statement may fail if concurrent updates against the same table are going on elsewhere in the cluster.
• If you experience deadlock errors:
  – do not perform concurrent updates against the table, or
  – make the updates only on the node running the statement

Page 10: Galera Cluster DDL and Schema Upgrades 220217

Total Order Isolation (TOI)

Running DDL on all nodes at once

Page 11: Galera Cluster DDL and Schema Upgrades 220217

General Principles for TOI
• No other write transactions can commit anywhere on the cluster while a TOI DDL is in progress
• Even if the DDL is “online”, “inplace” or allows concurrent table access in stand-alone MySQL, it is still fully blocking for writes
• DML transactions operating on the same table may get a deadlock error
• wsrep_sync_wait queries may time out

Page 12: Galera Cluster DDL and Schema Upgrades 220217

General Principles for TOI (#2)
• DDL statements cannot be killed once started
• If a node dies during DDL, it may need to rejoin via SST
• In Galera 3.x, DDL execution errors are ignored
  – so check the server error log
  – a GRA*.log file will also be created for each failure
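
Since DDL errors are not reported back in Galera 3.x, it can help to look for GRA*.log files after each schema change. The sketch below simply lists such files in a directory. It assumes the files are written to the MySQL data directory, whose path you must supply yourself; `find_gra_logs` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List


def find_gra_logs(datadir: str) -> List[str]:
    """List the names of GRA*.log files in the given directory, sorted by name."""
    return sorted(p.name for p in Path(datadir).glob("GRA*.log"))
```

Running this on every node before and after the DDL, and comparing the two listings, shows whether new failure files appeared.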

Page 13: Galera Cluster DDL and Schema Upgrades 220217

How Galera runs TOI DDL
1. The DDL statement is sent to all nodes
2. All transactions in the cluster that committed prior to the DDL are replicated and applied first; new commits are blocked
3. The DDL is run on all nodes at exactly the same place in the logical sequence of events

Page 14: Galera Cluster DDL and Schema Upgrades 220217

Procedure for TOI
1. Practice the DDL on a test cluster, if possible
2. Ensure enough free disk space is available on all nodes
3. Schedule a maintenance window / put application in read-only mode
   – otherwise, hangs or deadlock errors will occur for all DML transactions
4. Run the DDL on one node only; it will be replicated to the rest
5. Examine SHOW PROCESSLIST, SHOW CREATE TABLE on all nodes to confirm successful execution
6. Check error logs on all nodes for errors
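
Step 5 can be partly automated by comparing the SHOW CREATE TABLE output collected from each node. The sketch below assumes you have already fetched that output into a dictionary keyed by node name; the helper names are hypothetical. It ignores the AUTO_INCREMENT counter in the table options, on the assumption that this value may legitimately differ between nodes.

```python
import re
from typing import Dict, List


def normalize_create(stmt: str) -> str:
    """Drop the AUTO_INCREMENT=<n> table option and collapse whitespace."""
    stmt = re.sub(r"\sAUTO_INCREMENT=\d+", "", stmt)
    return " ".join(stmt.split())


def nodes_with_different_schema(create_by_node: Dict[str, str], reference: str) -> List[str]:
    """Return the nodes whose table definition differs from the reference node's."""
    expected = normalize_create(create_by_node[reference])
    return sorted(
        node
        for node, stmt in create_by_node.items()
        if normalize_create(stmt) != expected
    )
```

An empty result means every node reports the same definition as the reference node; any node listed needs a closer look at its error log.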

Page 15: Galera Cluster DDL and Schema Upgrades 220217

Potential Failure Scenarios
• MySQL returns a SQL error locally
  – statement still ran on all nodes even if it failed locally;
  – it may have succeeded elsewhere
• Statement fails to complete successfully on other nodes
  – disk space issues
  – constraint violation due to data inconsistency
    • ALTER TABLE ADD UNIQUE KEY may expose inconsistencies that have not been noticed previously
• Statement takes longer than expected
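
One way to reduce the ADD UNIQUE KEY risk is to look for duplicate values on each node before running the DDL. The sketch below only builds the query text; `duplicate_check_sql` is a hypothetical helper, and the table and column names are expected to come from trusted input, since they are interpolated directly into the statement.

```python
from typing import Sequence


def duplicate_check_sql(table: str, columns: Sequence[str]) -> str:
    """Build a query listing value combinations that occur more than once."""
    cols = ", ".join(f"`{c}`" for c in columns)
    return (
        f"SELECT {cols}, COUNT(*) AS n FROM `{table}` "
        f"GROUP BY {cols} HAVING n > 1"
    )
```

Running the resulting query on every node, not just one, is the point: a node whose data has drifted is the one where the DDL would fail. Note that rows with NULL in the key columns are grouped together here even though a MySQL unique key permits multiple NULLs, so such rows can show up as false positives.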

Page 16: Galera Cluster DDL and Schema Upgrades 220217

RSU

Rolling Schema Upgrade

Page 17: Galera Cluster DDL and Schema Upgrades 220217

Basic Principles for RSU
• Statement is manually run on one node at a time
• Node will temporarily fall behind the cluster for the duration of the DDL
• Standard MySQL locking rules apply on local node
• Nothing is locked on remote nodes
• Other transactions can continue unaffected
• The binary log on each node will contain events in a different order; this is important when using async replication

Page 18: Galera Cluster DDL and Schema Upgrades 220217

Your Application and RSU
• During a Rolling Schema Upgrade:
  – a cluster contains some nodes with the old schema and some nodes with the new one
  – the node that is currently running the DDL may temporarily fall behind
    • Remove it from load balancer if data freshness is important
  – RSU is a global setting, so application should not attempt to run other DDLs while you execute the RSU procedure

Page 19: Galera Cluster DDL and Schema Upgrades 220217

Coexistence of Two Schemas
• INSERT queries should not attempt to insert into a column that does not yet exist everywhere.
  – INSERT INTO table (old_col1, new_col2) VALUES (123, 'abc');
• Column count and position may be different in old and new schema:
  – SELECT * may return differently-shaped result sets
    • SELECT old_col1, old_col2 is better
  – INSERT INTO table VALUES (123, 'abc') may fail or put data in the wrong column
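
The rules above amount to: always name your columns, and only name columns that exist on every node. A minimal sketch of an application-side guard follows. `safe_insert_sql` is a hypothetical helper, the column lists for the old and new schema are assumed to be known to the caller, and the `%s` placeholders assume a driver that uses that parameter style.

```python
from typing import Iterable, Sequence


def safe_insert_sql(table: str, columns: Sequence[str],
                    old_schema: Iterable[str], new_schema: Iterable[str]) -> str:
    """Build a parameterized INSERT naming only columns present in both schemas."""
    common = set(old_schema) & set(new_schema)
    missing = [c for c in columns if c not in common]
    if missing:
        raise ValueError(f"columns not present on every node yet: {missing}")
    col_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO `{table}` ({col_list}) VALUES ({placeholders})"
```

With an old schema of (old_col1, old_col2) and a new schema that adds new_col2, an insert naming only the old columns is accepted, while one naming new_col2 raises until the rolling upgrade has reached every node.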

Page 20: Galera Cluster DDL and Schema Upgrades 220217

Preparation
1. Practice the DDL on a test machine, if possible
2. Practice taking nodes out of the load balancer, as this needs to be done repeatedly
3. Do one DDL at a time, to avoid confusion
   – multiple operations can be combined in a single DDL statement

Page 21: Galera Cluster DDL and Schema Upgrades 220217

Step-By-Step Procedure
1. On every node, one at a time:
2. Remove node from load balancer if data freshness is important
3. Run DDL:

   SET GLOBAL wsrep_osu_method=RSU;

   ALTER TABLE …

   SET GLOBAL wsrep_osu_method=TOI;

4. Wait for node to catch up (watch the wsrep_local_recv_queue status variable)
5. Restore node to load balancer
6. Check for application errors
7. Repeat procedure on the other nodes
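
Steps 3 and 4 are easy to get wrong by hand, in particular forgetting to switch back to TOI after a failed ALTER. The sketch below wraps them in two small functions. It is not tied to any MySQL driver: `execute` and `read_recv_queue` are hypothetical callables that the caller supplies, for example a function that runs a statement on a connection to the node being upgraded, and one that returns the value from `SHOW STATUS LIKE 'wsrep_local_recv_queue'`. Treating a queue length of zero as "caught up" is an assumption of this sketch.

```python
import time
from typing import Callable


def run_rsu_ddl(execute: Callable[[str], None], ddl: str) -> None:
    """Run one DDL under RSU on the current node, always restoring TOI afterwards."""
    execute("SET GLOBAL wsrep_osu_method=RSU")
    try:
        execute(ddl)
    finally:
        # Restore the default even if the DDL raised.
        execute("SET GLOBAL wsrep_osu_method=TOI")


def wait_until_caught_up(read_recv_queue: Callable[[], int],
                         sleep: Callable[[float], None] = time.sleep,
                         poll_seconds: float = 1.0,
                         max_polls: int = 600) -> bool:
    """Poll the receive queue length until it reaches zero; False on timeout."""
    for _ in range(max_polls):
        if read_recv_queue() == 0:
            return True
        sleep(poll_seconds)
    return False
```

Only after `wait_until_caught_up` returns True would the node go back into the load balancer; a False result means the node is still behind and needs attention before moving on to the next one.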

Page 22: Galera Cluster DDL and Schema Upgrades 220217

Current and Future Improvements
• In recently-released Galera Cluster 5.7, certain DDL statements are now much faster or instantaneous:
  – ALTER TABLE ADD KEY
  – ALTER TABLE CHANGE COLUMN for some VARCHAR types
  – OPTIMIZE TABLE
• In upcoming Galera Replication Library 4.x:
  – a new schema upgrade method, NBO, will allow ALTER statements to run without blocking the entire cluster
  – a new consistency mechanism will check if a DDL succeeded or failed equally on all nodes

Page 23: Galera Cluster DDL and Schema Upgrades 220217

Questions

• Please use the Question/Chat box in the GoToWebinar panel

• Ideas welcome for future webinars

Page 24: Galera Cluster DDL and Schema Upgrades 220217

Thank You

http://www.galeracluster.com

Discussion group:

[email protected]