Best practices for MySQL High Availability in 2017 Colin Charles, Chief Evangelist, Percona Inc. [email protected] / [email protected] http://www.bytebot.net/blog/ | @bytebot on Twitter Percona Live Santa Clara, California, USA 24 April 2017


Page 1: Best practices for MySQL High Availability percona live 2017

Best practices for MySQL High Availability in 2017
Colin Charles, Chief Evangelist, Percona Inc.
[email protected] / [email protected]
http://www.bytebot.net/blog/ | @bytebot on Twitter
Percona Live Santa Clara, California, USA
24 April 2017

Page 2: Best practices for MySQL High Availability percona live 2017

whoami

• Chief Evangelist (in the CTO office), Percona Inc

• Founding team of MariaDB Server (2009-2016), previously at Monty Program Ab, merged with SkySQL Ab, now MariaDB Corporation

• Formerly MySQL AB (exit: Sun Microsystems)

• Past lives include Fedora Project (FESCO), OpenOffice.org

• MySQL Community Contributor of the Year Award winner 2014

2

Page 3: Best practices for MySQL High Availability percona live 2017

License

• Creative Commons BY-NC-SA 4.0

• https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode

3

Page 4: Best practices for MySQL High Availability percona live 2017

Agenda

• Choosing the right High Availability (HA) solution

• Discuss replication

• Handling failure

• Discuss proxies

• HA in the cloud, geographical redundancy

• Sharding solutions

• MySQL 5.6/5.7 features + utilities + Fabric + Router

• What’s next?

4

Page 5: Best practices for MySQL High Availability percona live 2017

5

Page 6: Best practices for MySQL High Availability percona live 2017

6

Page 7: Best practices for MySQL High Availability percona live 2017

7

Page 8: Best practices for MySQL High Availability percona live 2017

8

Page 9: Best practices for MySQL High Availability percona live 2017

9

Page 10: Best practices for MySQL High Availability percona live 2017

10

Page 11: Best practices for MySQL High Availability percona live 2017

Uptime

Percentile target   Max downtime per year
90%                 36 days
99%                 3.65 days
99.5%               1.83 days
99.9%               8.76 hours
99.99%              52.56 minutes
99.999%             5.25 minutes
99.9999%            31.5 seconds

11

Page 12: Best practices for MySQL High Availability percona live 2017

Estimates of levels of availability

12

Method                          Level of Availability

Simple replication              98-99.9%

Master-Master/MMM               99%

SAN                             99.5-99.9%

DRBD, MHA, Tungsten Replicator  99.9%

NDBCluster, Galera Cluster      99.999%

Page 13: Best practices for MySQL High Availability percona live 2017

HA is Redundancy

• RAID: disk crashes? Another works

• Clustering: server crashes? Another works

• Power: fuse blows? Redundant power supplies

• Network: Switch/NIC crashes? 2nd network route

• Geographical: Datacenter offline/destroyed? Computation to another DC

13

Page 14: Best practices for MySQL High Availability percona live 2017

Durability

• Data stored on disks

• Is it really written to the disk?

• being durable means calling fsync() on each commit

• Is it written in a transactional way to guarantee atomicity, crash safety, integrity?
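To make the bullets above concrete, a minimal my.cnf sketch of the usual fully durable settings (shown only as an illustration of "fsync on each commit"):

[mysqld]
# flush and fsync the InnoDB redo log on every commit = full durability
innodb_flush_log_at_trx_commit = 1
# fsync the binary log on every commit so replication events also survive a crash
sync_binlog = 1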

14

Page 15: Best practices for MySQL High Availability percona live 2017

High Availability for databases

• HA is harder for databases

• Hardware resources and data need to be redundant

• Remember, this isn’t just data - constantly changing data

• HA means the operation can continue uninterrupted, not by restoring a new/backup server

• uninterrupted: measured in percentiles

15

Page 16: Best practices for MySQL High Availability percona live 2017

Redundancy through client-side XA transactions

• Client writes to 2 independent but identical databases

• HA-JDBC (http://ha-jdbc.github.io/)

• No replication anywhere

16

Page 17: Best practices for MySQL High Availability percona live 2017

InnoDB “recovery” time

•innodb_log_file_size

• larger = longer recovery times

• Percona Server 5.5 (XtraDB) - innodb_recovery_stats

17

Page 18: Best practices for MySQL High Availability percona live 2017

Redundancy through shared storage

• Requires specialist hardware, like a SAN

• Complex to operate

• One set of data is your single point of failure

• Cold standby

• failover 1-30 minutes

• this isn’t scale-out

• Active/Active solutions: Oracle RAC, ScaleDB

18

Page 19: Best practices for MySQL High Availability percona live 2017

Redundancy through disk replication

• DRBD

• Linux administration vs. DBA skills

• Synchronous

• Second set of data inaccessible for use

• Passive server acting as hot standby

• Failover: 1-30 minutes

• Performance hit: DRBD worst case is ~60% single node performance, with higher average latencies

19

Page 20: Best practices for MySQL High Availability percona live 2017

20

Page 21: Best practices for MySQL High Availability percona live 2017

MySQL Sandbox

• Great for testing various versions of MySQL/Percona Server/MariaDB

• Great for creating replication environments

• make_sandbox mysql.tar.gz

•make_replication_sandbox mysql.tar.gz

• http://mysqlsandbox.net/
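A typical session might look like this (the tarball name, version and sandbox paths are illustrative assumptions, not from the slides):

# build a single sandbox from a server tarball
make_sandbox mysql-5.7.17-linux-glibc2.5-x86_64.tar.gz
# build a master + 2 slaves replication sandbox
make_replication_sandbox mysql-5.7.17-linux-glibc2.5-x86_64.tar.gz
# connect to the master and the first slave
~/sandboxes/rsandbox_5_7_17/m
~/sandboxes/rsandbox_5_7_17/s1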

21

Page 22: Best practices for MySQL High Availability percona live 2017

Redundancy through MySQL replication

• MySQL replication

• Tungsten Replicator

• Galera Cluster

• MySQL Group Replication

• MySQL Cluster (NDBCLUSTER)

• Storage requirements are multiplied

• Huge potential for scaling out

22

Page 23: Best practices for MySQL High Availability percona live 2017

MySQL Replication

• Statement based generally

• Row based became available in 5.1, and is the default in 5.7

• mixed-mode: results in STATEMENT, except when the statement involves:

• UUID function, UDF, CURRENT_USER/USER function, LOAD_FILE function

• 2 or more AUTO_INCREMENT columns updated with same statement

• server variable used in statement

• storage engine doesn’t allow statement based replication, like NDBCLUSTER

• default in MariaDB Server 10.2 onwards

23

Page 24: Best practices for MySQL High Availability percona live 2017

MySQL Replication II

• Asynchronous by default

• Semi-synchronous plugin in 5.5+

• However the holy grail of fully synchronous replication is not part of standard MySQL replication (yet?)

• MariaDB Galera Cluster is built-in to MariaDB Server 10.1

24

Page 25: Best practices for MySQL High Availability percona live 2017

The logs

• Binary log (binlog) - events that describe database changes

• Relay log - events read from binlog on master, written by slave i/o thread

• master_info_log - status/config info for slave’s connection to master

• relay_log_info_log - status info about execution point in slave’s relay log

25

Page 26: Best practices for MySQL High Availability percona live 2017

Semi-synchronous replication

• semi-sync capable slave acknowledges transaction event only after written to relay log & flushed to disk

• timeout occurs? master reverts to async replication; resumes when slaves catch up

• at scale, Facebook runs semi-sync: http://yoshinorimatsunobu.blogspot.com/2014/04/semi-synchronous-replication-at-facebook.html
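A minimal setup sketch using the stock semi-sync plugins shipped with MySQL 5.5+ (the timeout value is just an example):

-- on the master
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000;  -- ms to wait before reverting to async

-- on each semi-sync capable slave
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;  -- reconnect so semi-sync takes effect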

26

Page 27: Best practices for MySQL High Availability percona live 2017

Semi-sync II

• nowadays, it's enhanced (COMMIT method):

1. prepare transaction in storage engine

2. write transaction to binlog, flush to disk

3. wait for at least one slave to ack binlog event

4. commit transaction to storage engine

27

Page 28: Best practices for MySQL High Availability percona live 2017

MySQL Replication in 5.6

• Global Transaction ID (GTID)

• Server UUID

• Ignore (master) server IDs (filtering)

• Per-schema multi-threaded slave

• Group commit in the binary log

• Binary log (binlog) checksums

• Crash safe binlog and relay logs

• Time delayed replication

• Parallel replication (per database)

28

Page 29: Best practices for MySQL High Availability percona live 2017

MySQL Replication in 5.7

• Multi-source replication

• Online GTID implementation

• Loss-less semi-sync

• Intra-schema parallel replication

• Group commit tuning

• Online CHANGE MASTER TO w/o stopping replication thread

• GTIDs in the OK packet

29

Page 30: Best practices for MySQL High Availability percona live 2017

Group commit in MariaDB 5.3 onwards

• Do slow part of prepare() in parallel in InnoDB (first fsync(), InnoDB group commit)

• Put transaction in queue, decide commit order

30

Page 31: Best practices for MySQL High Availability percona live 2017

• First in queue runs serial part for all, rest wait

• Wait for access to the binlog

• Write transactions into binlog, in order, then sync (second fsync())

• Run the fast part of commit() for all transactions in order

31

Page 32: Best practices for MySQL High Availability percona live 2017

• Finally, run the slow part of commit() in parallel (third fsync(), InnoDB group commit)

• Only 2 context switches per thread (one sleep, one wakeup)

• Note: MySQL 5.6 and MariaDB 10 only do 2 fsyncs per group commit

32

Page 33: Best practices for MySQL High Availability percona live 2017

Group commit in MariaDB 10

• Remove commit in slow part of InnoDB commit (stage 4)

• Reduce cost of crash-safe binlog

• A binlog checkpoint is a point in the binlog before which no crash recovery is needed. In InnoDB, this means waiting for it to flush + fsync its redo log for the commit

33

Page 34: Best practices for MySQL High Availability percona live 2017

crash-safe binlog

• MariaDB 5.5 checkpoints after every commit -> expensive!

• 5.5/5.6 stalls commits around binlog rotate, waiting for all prepared transactions to commit (since crash recovery can only scan latest binlog file)

34

Page 35: Best practices for MySQL High Availability percona live 2017

crash-safe binlog 10.0

• 10.0 makes binlog checkpoints asynchronous

• A binlog can have no checkpoints at all

• Ability to scan multiple binlogs during crash recovery

• Remove stalls around binlog rotates

35

Page 36: Best practices for MySQL High Availability percona live 2017

group commit in 10.1

• Tricky locking issues that are hard to change without sometimes getting deadlocks

• mysql#68251, mysql#68569

• New code? Binlog rotate in background thread (further reducing stalls). Split transactions across binlogs, so big transactions do not lead to big binlog files

• Works with enhanced semi-sync replication (wait for slave before commit on the master rather than after commit)

36

Page 37: Best practices for MySQL High Availability percona live 2017

Replication: START TRANSACTION WITH CONSISTENT SNAPSHOT

• Works with the binlog, possible to obtain the binlog position corresponding to a transactional snapshot of the database without blocking any other queries.

• by-product of group commit in the binlog to view commit ordering (MariaDB Server 5.3+, Percona Server for MySQL 5.6+)

• Used by the command mysqldump --single-transaction --master-data to do a fully non-blocking backup

• Works consistently between transactions involving more than one storage engine

• https://kb.askmonty.org/en/enhancements-for-start-transaction-with-consistent/

• Percona Server made it better (snapshot cloning by session ID) and also introduced backup locks
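A quick sketch of how this is typically used on MariaDB 5.3+/Percona Server for MySQL 5.6+, which expose the snapshot's binlog position as status variables:

START TRANSACTION WITH CONSISTENT SNAPSHOT;
-- binlog file/position corresponding to this snapshot (no FLUSH TABLES WITH READ LOCK needed)
SHOW STATUS LIKE 'binlog_snapshot%';

-- or simply, for a non-blocking backup:
mysqldump --single-transaction --master-data=2 --all-databases > backup.sql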

37

Page 38: Best practices for MySQL High Availability percona live 2017

Multi-source replication

• Multi-source replication - (real-time) analytics, shard provisioning, backups, etc.

• @@default_master_connection contains current connection name (used if connection name is not given)

• All master/slave commands take a connection name now (like CHANGE MASTER “connection_name”, SHOW SLAVE “connection_name” STATUS, etc.)
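For illustration, a MariaDB 10 multi-source sketch with two named master connections (hostnames and credentials are placeholders):

CHANGE MASTER 'master1' TO MASTER_HOST='10.0.0.1', MASTER_USER='repl', MASTER_PASSWORD='secret';
CHANGE MASTER 'master2' TO MASTER_HOST='10.0.0.2', MASTER_USER='repl', MASTER_PASSWORD='secret';
START ALL SLAVES;
SHOW SLAVE 'master1' STATUS\G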

38

Page 39: Best practices for MySQL High Availability percona live 2017

Global Transaction ID (GTID)

• Supports multi-source replication

• GTID can be enabled or disabled independently and online for masters or slaves

• Slaves using GTID do not have to have binary logging enabled.

• (MariaDB Server) Supports multiple replication domains (independent binlog streams)

• Queries in different domains can be run in parallel on the slave.

39

Page 40: Best practices for MySQL High Availability percona live 2017

Why is MariaDB Server GTID different compared to MySQL 5.6?

• MySQL 5.6 GTID does not support multi-source replication (only 5.7 supports this)

• Supports --log-slave-updates=0 for efficiency (like 5.7)

• Enabled by default

• Turn it on without having to restart the topology (just like 5.7)

40

Page 41: Best practices for MySQL High Availability percona live 2017

Crash-safe slave (w/InnoDB DML)

• Replace non-transactional file relay_log.info with transactional mysql.rpl_slave_state

• Changes to rpl_slave_state are transactionally recovered after crash along with user data.

41

Page 42: Best practices for MySQL High Availability percona live 2017

Crash-safe slaves in 5.6?

• Not using GTID

• you can put relay-log.info into an InnoDB table that gets updated along with the transaction

• Using GTID

• relay-log.info not used. Slave position stored in the binlog on the slave (--log-slave-updates required)

• Using parallel replication

• Uses a different InnoDB table for this use case
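For the non-GTID case, a minimal my.cnf sketch of the MySQL 5.6 options involved:

[mysqld]
# keep replication position info in (InnoDB) tables instead of flat files
master_info_repository    = TABLE
relay_log_info_repository = TABLE
# discard possibly half-written relay logs and re-fetch them after a crash
relay_log_recovery        = ON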

42

Page 43: Best practices for MySQL High Availability percona live 2017

Replication domains

• Keep the central concept that replication is just applying events in order from a serial binlog stream.

• Allow multi-source replication with multiple active masters

• Lets the DBA configure multiple independent binlog streams (one per active master: mysqld --gtid-domain-id=#)

• Events within one stream are ordered the same across entire replication topology

• Events between different streams can be in different order on different servers

• Binlog position is one ID per replication domain
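A sketch of the MariaDB option involved: each active master gets its own domain so its binlog stream stays independent (the IDs here are arbitrary examples):

# my.cnf on the first active master
[mysqld]
gtid_domain_id = 1

# my.cnf on the second active master
[mysqld]
gtid_domain_id = 2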

43

Page 44: Best practices for MySQL High Availability percona live 2017

Parallel replication

• Multi-source replication from different masters executed in parallel

• Queries from different domains are executed in parallel

• Queries that are run in parallel on the master are run in parallel on the slave (based on group commit).

• Transactions modifying the same table can be updated in parallel on the slave!

• Supports both statement based and row based replication.

44

Page 45: Best practices for MySQL High Availability percona live 2017

All in… sometimes it can get out of sync

• Changed information on slave directly

• Statement based replication

• non-deterministic SQL (UPDATE/DELETE with LIMIT and without ORDER BY)

• triggers & stored procedures

• Master in MyISAM, slave in InnoDB (deadlocks)

• --replicate-ignore-db with fully qualified queries

• Binlog corruption on master

• PURGE BINARY LOGS issued and not enough files to update slave

• read_buffer_size larger than max_allowed_packet

• Bugs?

45

Page 46: Best practices for MySQL High Availability percona live 2017

Replication Monitoring

• Percona Toolkit is important

• pt-slave-find: find slave information from master

• pt-table-checksum: online replication consistency check

• executes checksum queries on master

• pt-table-sync: synchronise table data efficiently

• changes data, so backups important
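Typical invocations look roughly like this (host names and credentials are placeholders):

# online consistency check, run against the master; results go into percona.checksums
pt-table-checksum --replicate=percona.checksums h=master1,u=percona,p=secret
# resync a drifted slave from its master, printing the SQL first instead of executing it
pt-table-sync --print --sync-to-master h=slave1,u=percona,p=secret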

46

Page 47: Best practices for MySQL High Availability percona live 2017

Replication Monitoring with PMM

47

•http://pmmdemo.percona.com/

Page 48: Best practices for MySQL High Availability percona live 2017

Statement Based Replication Binlog

$ mysqlbinlog mysql-bin.000001

# at 3134

#140721 13:59:57 server id 1 end_log_pos 3217 CRC32 0x974e3831 Query thread_id=9 exec_time=0 error_code=0

SET TIMESTAMP=1405943997/*!*/;

BEGIN/*!*/;

# at 3217

#140721 13:59:57 server id 1 end_log_pos 3249 CRC32 0x8de28161 Intvar

SET INSERT_ID=2/*!*/;

# at 3249

#140721 13:59:57 server id 1 end_log_pos 3370 CRC32 0x121ef29f Query thread_id=9 exec_time=0 error_code=0

SET TIMESTAMP=1405943997/*!*/;

insert into auto (data) values ('a test 2')/*!*/;

# at 3370

#140721 13:59:57 server id 1 end_log_pos 3401 CRC32 0x34354945 Xid = 414
COMMIT/*!*/;

48

Page 49: Best practices for MySQL High Availability percona live 2017

Dynamic replication variable control

• SET GLOBAL binlog_format='STATEMENT' | 'ROW' | 'MIXED'

• Can also be set at the session level

• Dynamic replication filtering variables on MariaDB 5.3+, MySQL 5.7+
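For example, the same variable at both scopes:

SET GLOBAL binlog_format = 'ROW';         -- new sessions pick this up
SET SESSION binlog_format = 'STATEMENT';  -- only affects this connection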

49

Page 50: Best practices for MySQL High Availability percona live 2017

Row based replication event

> mysqlbinlog mysql-bin.*

# at 3401

#140721 14:03:59 server id 1 end_log_pos 3477 CRC32 0xa37f424a Query thread_id=9 exec_time=0 error_code=0

SET TIMESTAMP=1405944239.559237/*!*/;

BEGIN

/*!*/;

# at 3477

#140721 14:03:59 server id 1 end_log_pos 3529 CRC32 0xf4587de5 Table_map: `demo`.`auto` mapped to number 80

# at 3529

#140721 14:03:59 server id 1 end_log_pos 3585 CRC32 0xbfd73d98 Write_rows: table id 80 flags: STMT_END_F

BINLOG '

rwHNUxMBAAAANAAAAMkNAAAAAFAAAAAAAAEABGRlbW8ABGF1dG8AAwMRDwMGZAAE5X1Y9A==

rwHNUx4BAAAAOAAAAAEOAAAAAFAAAAAAAAEAAgAD//gDAAAAU80BrwiIhQhhIHRlc3QgM5g9178=

'/*!*/;

# at 3585

#140721 14:03:59 server id 1 end_log_pos 3616 CRC32 0x5f422fed Xid = 416

COMMIT/*!*/;

50

Page 51: Best practices for MySQL High Availability percona live 2017

mysqlbinlog versions

• ERROR: Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 56, event_type: 30

• MySQL 5.6 ships mysqlbinlog v3.4, which includes a “streaming binlog backup server”; MariaDB 10 ships v3.3 and doesn’t (fixed in 10.2 - MDEV-8713)

• GTID variances!

51

Page 52: Best practices for MySQL High Availability percona live 2017

GTID

52

# at 471

#140721 14:20:01 server id 1 end_log_pos 519 CRC32 0x209d8843 GTID [commit=yes]
SET @@SESSION.GTID_NEXT= 'ff89bf58-105e-11e4-b2f1-448a5b5dd481:2'/*!*/;
# at 519

#140721 14:20:01 server id 1 end_log_pos 602 CRC32 0x5c798741 Query thread_id=3 exec_time=0 error_code=0

SET TIMESTAMP=1405945201.329607/*!*/;

BEGIN

/*!*/;

# at 602

#140721 14:20:01 server id 1 end_log_pos 634 CRC32 0xa5005598 Intvar

SET INSERT_ID=5/*!*/;

# at 634

#140721 14:20:01 server id 1 end_log_pos 760 CRC32 0x0b701850 Query thread_id=3 exec_time=0 error_code=0

SET TIMESTAMP=1405945201.329607/*!*/;

insert into auto (data) values ('a test 5 gtid')

/*!*/;

# at 760

#140721 14:20:01 server id 1 end_log_pos 791 CRC32 0x497a23e0 Xid = 31

COMMIT/*!*/;

Page 53: Best practices for MySQL High Availability percona live 2017

SHOW SLAVE STATUS

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: server1
Master_User: repluser
Master_Port: 3306
...
Master_Log_File: server1-binlog.000008       <- io_thread (read)
Read_Master_Log_Pos: 436614719               <- io_thread (read)
Relay_Log_File: server2-relaylog.000007      <- io_thread (write)
Relay_Log_Pos: 236                           <- io_thread (write)
Relay_Master_Log_File: server1-binlog.000008 <- sql_thread
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
...
Exec_Master_Log_Pos: 436614719               <- sql_thread
...
Seconds_Behind_Master: 0

53

Page 54: Best practices for MySQL High Availability percona live 2017

Slave prefetching

• Replication Booster

• https://github.com/yoshinorim/replication-booster-for-mysql

• Prefetch MySQL relay logs to make the SQL thread faster

• Tungsten has slave prefetch

• Percona Server till 5.6 + MariaDB till 10.1 have InnoDB fake changes

54

Page 55: Best practices for MySQL High Availability percona live 2017

What replaces slave prefetching?

• In Percona Server 5.7, slave prefetching has been replaced by doing intra-schema parallel replication

• Feature removed from XtraDB

• MariaDB Server 10.2 will also have this feature removed

55

Page 56: Best practices for MySQL High Availability percona live 2017

Tungsten Replicator

• Replaces MySQL Replication layer

• MySQL writes binlog, Tungsten reads it and uses its own replication protocol

• Global Transaction ID

• Per-schema multi-threaded slave

• Heterogeneous replication: MySQL <-> MongoDB <-> PostgreSQL <-> Oracle

• Multi-master replication

• Multiple masters to single slave (multi-source replication)

• Many complex topologies

• Continuent Tungsten (Enterprise) vs Tungsten Replicator (Open Source)

56

Page 57: Best practices for MySQL High Availability percona live 2017

In today’s world, what does it offer?

• opensource MySQL <-> Oracle replication to aid in your migration

• automatic failover without MHA

• multi-master with cloud topologies too

• Oracle <-> Oracle replication (this is Golden Gate for FREE)

• Replication from MySQL to MongoDB

• Data loading into Hadoop

57

Page 58: Best practices for MySQL High Availability percona live 2017

Galera Cluster

• Inside MySQL, a replication plugin (wsrep)

• Replaces MySQL replication (but can work alongside it too)

• True multi-master, active-active solution

• Virtually Synchronous

• WAN performance: 100-300ms/commit, works in parallel

• No slave lag or integrity issues

• Automatic node provisioning

58

Page 59: Best practices for MySQL High Availability percona live 2017

59

Page 60: Best practices for MySQL High Availability percona live 2017

Percona XtraDB Cluster 5.7

• Engineering within Percona

• Load balancing with ProxySQL (bundled)

• PMM integration

• Benefits of all the MySQL 5.7 feature-set

60

Page 61: Best practices for MySQL High Availability percona live 2017

Group replication

• Fully synchronous replication (update everywhere), self-healing, with elasticity, redundancy

• Single primary mode supported

• MySQL InnoDB Cluster - a combination of group replication, Router, to make magic!

• Recent blogs:

• https://www.percona.com/blog/2017/02/24/battle-for-synchronous-replication-in-mysql-galera-vs-group-replication/

• https://www.percona.com/blog/2017/02/15/group-replication-shipped-early/

61

Page 62: Best practices for MySQL High Availability percona live 2017

MySQL NDBCLUSTER

• 3 types of nodes: SQL, data and management

• MySQL node provides the interface to data. Alternate APIs available: LDAP, memcached, native NDBAPI, node.js

• Data nodes (NDB storage)

• different to InnoDB

• transactions synchronously written to 2 nodes (or more) - replicas

• transparent sharding: partitions = data nodes/replicas

• automatic node provisioning, online re-partitioning

• High performance: 1 billion updates / minute

62

Page 63: Best practices for MySQL High Availability percona live 2017

Summary of Replication Performance

• SAN has "some" latency overhead compared to local disk. Can be great for throughput.

• DRBD = 50% performance penalty

• Replication, when implemented correctly, has no performance penalty

• But MySQL replication with disk bound data set has single-threaded issues!

• Semi-sync is poorer on WAN compared to async

• Galera & NDB provide read/write scale-out, thus more performance

63

Page 64: Best practices for MySQL High Availability percona live 2017

Handling failure

• How do we find out about failure?

• Polling, monitoring, alerts...

• Error returned to and handled in client side

• What should we do about it?

• Direct requests to the spare nodes (or DCs)

• How to protect data integrity?

• Master-slave is unidirectional: Must ensure there is only one master at all times.

• DRBD and SAN have cold-standby: Must mount disks and start mysqld.

• In all cases must ensure that 2 disconnected replicas cannot both commit independently. (split brain)

64

Page 65: Best practices for MySQL High Availability percona live 2017

Frameworks to handle failure

• MySQL-MMM

• Severalnines ClusterControl

• Orchestrator

• MySQL MHA

• Percona Replication Manager

• Tungsten Replicator

• 5.6: mysqlfailover, mysqlrpladmin

• Replication Manager

65

Page 66: Best practices for MySQL High Availability percona live 2017

MySQL-MMM

• You have to set up all nodes and replication manually

• MMM gives Monitoring + Automated and manual failover on top

• Architecture consists of Monitor and Agents

• Typical topology:

• 2 master nodes

• Read slaves replicate from each master

• If a master dies, all slaves connected to it are stale

• http://mysql-mmm.org/

66

Page 67: Best practices for MySQL High Availability percona live 2017

Severalnines ClusterControl

• Started as automated deployment of MySQL NDB Cluster

• now: 4 node cluster up and running in 5 min!

• Now supports

• MySQL replication and Galera

• Semi-sync replication

• Automated failover

• Manual failovers, status check, start & stop of node, replication, full cluster... from single command line.

• Monitoring

• Topology: Pair of semi-sync masters, additional read-only slaves

• Can move slaves to new master

• http://severalnines.com/

67

Page 68: Best practices for MySQL High Availability percona live 2017

ClusterControl II

• Handles deployment: on-premise, EC2, or hybrid (Rackspace, etc.)

• Adding HAProxy as a Galera load balancer

• Hot backups, online software upgrades

• Workload simulation

• Monitoring (real-time), health reports

68

Page 69: Best practices for MySQL High Availability percona live 2017

Orchestrator

• Reads replication topologies, keeps state, continuous polling

• Modify your topology — move slaves around

• Nice GUI, JSON API, CLI

69

Page 70: Best practices for MySQL High Availability percona live 2017

MySQL MHA

• Like MMM, specialized solution for MySQL replication

• Developed by Yoshinori Matsunobu at DeNA

• Automated and manual failover options

• Topology: 1 master, many slaves

• Choose new master by comparing slave binlog positions

• Can be used in conjunction with other solutions

• http://code.google.com/p/mysql-master-ha/

70

Page 71: Best practices for MySQL High Availability percona live 2017

Cluster suites

• Heartbeat, Pacemaker, Red Hat Cluster Suite

• Generic, can be used to cluster any server daemon

• Usually used in conjunction with Shared Disk or Replicated Disk solutions (preferred)

• Can be used with replication.

• Robust, Node Fencing / STONITH

71

Page 72: Best practices for MySQL High Availability percona live 2017

Pacemaker

• Heartbeat, Corosync, Pacemaker

• Resource Agents, Percona-PRM

• Percona Replication Manager - cluster, geographical disaster recovery options

• Pacemaker agent specialised on MySQL replication

• https://github.com/percona/percona-pacemaker-agents/

• Pacemaker Resource Agents 3.9.3+ include Percona Replication Manager (PRM)

72

Page 73: Best practices for MySQL High Availability percona live 2017

VM based failover

• VMWare, Oracle VM, etc can migrate / failover the entire VM guest

• This isn’t the focus of the talk

73

Page 74: Best practices for MySQL High Availability percona live 2017

Load Balancers for multi-master clusters

• Synchronous multi-master clusters like Galera require load balancers

• HAProxy

• Galera Load Balancer (GLB)

• MaxScale

• ProxySQL

74

Page 75: Best practices for MySQL High Availability percona live 2017

What is a proxy?

• Lightweight application between the MySQL clients and the server

• Man-in-the-middle between client/server

• Communicates with one or more clients/servers

Page 76: Best practices for MySQL High Availability percona live 2017

Image via Giuseppe Maxia

Page 77: Best practices for MySQL High Availability percona live 2017

MySQL Proxy - ten years ago!

• The first proxy, which had an embedded Lua interpreter

• It is used in MySQL Enterprise Monitor

• Lua was flexible to allow you to rewrite queries, add statements, filter, etc.

• 2007-2014

Page 78: Best practices for MySQL High Availability percona live 2017

MariaDB MaxScale 1.0…1.4.x

• GA January 2015

• The “Swiss Army Knife” - pluggable router with an extensible architecture

• Logging, writing to other backends (besides MySQL), firewall filter, routing via hints, query rewriting

• Binlog Server - popularised by booking.com to not have intermediate masters

• Popular use case: sitting in front of a 3-node Galera Cluster

Page 79: Best practices for MySQL High Availability percona live 2017

MariaDB MaxScale ecosystem

• First known plugin: Kafka backend written by Yves Trudeau

• https://www.percona.com/blog/2015/06/08/maxscale-a-new-tool-to-solve-your-mysql-scalability-problems/

• First known credible fork: AirBnB MaxScale 1.3

• connection pooling (not 1:1, multiplexed N:M, N>M connections), request throttling, denylist query rejection, monitoring

Page 80: Best practices for MySQL High Availability percona live 2017

MariaDB MaxScale 2.0

• Same Github repository, no longer linked against the MySQL client libraries (replaced with SQLite), CDC to Kafka, binlog events to Avro/JSON

• License change from GPLv2 to Business Source License (BSL)

Page 81: Best practices for MySQL High Availability percona live 2017
Page 82: Best practices for MySQL High Availability percona live 2017
Page 83: Best practices for MySQL High Availability percona live 2017

MariaDB MaxScale 2.1 beta

• Dynamic (re)configuration

• Performance

Page 84: Best practices for MySQL High Availability percona live 2017

MySQL Router - GPLv2

• GA October 2015

• Transparent routing between applications and any backend MySQL servers

• Pluggable architecture via the MySQL Harness

• Failover, load balancing

• This is how you manage MySQL InnoDB Cluster with mysqlsh - https://www.youtube.com/watch?v=JWy7ZLXxtZ4

Page 85: Best practices for MySQL High Availability percona live 2017

ProxySQL - GPLv3

• Stable December 2015

• ProxySQL - included with Percona XtraDB Cluster 5.7, proxysql-admin tool available for PXC configurations

• Improve database operations, understand and solve performance issues, HA to DB topology

• Connection Pooling & Multiplexing

• Read/Write Split and Sharding

• Seamless failover (including query rerouting), load balancing

• Query caching

• Query rewriting

• Query blocking (database aware firewall)

• Query mirroring (cache warming)

• Query throttling and timeouts

• Runtime reconfigurable

• Monitoring built-in

Page 86: Best practices for MySQL High Availability percona live 2017

Comparison

• http://www.proxysql.com/compare

Page 87: Best practices for MySQL High Availability percona live 2017

ProxySQL missing features from MariaDB MaxScale

• Front-end SSL encryption (client -> SSL -> proxy -> application) - issue#891

• Binlog router

• Streaming binlogs to Kafka

• use Maxwell’s Daemon: http://maxwells-daemon.io/

• Binlogs to Avro

Page 89: Best practices for MySQL High Availability percona live 2017

Health of these projects

• MariaDB MaxScale: 142 watchers, 670 stars, 199 forks, 19 contributors

• MySQL Router: 25 watchers, 47 stars, 30 forks, 8 contributors

• ProxySQL: 119 watchers, 951 stars, 145 forks, 25 contributors

Page 90: Best practices for MySQL High Availability percona live 2017

Punch cards

Page 91: Best practices for MySQL High Availability percona live 2017
Page 92: Best practices for MySQL High Availability percona live 2017

What do you use?

• MySQL Router is clearly very interesting going forward, especially with the advent of the MySQL InnoDB Cluster

• ProxySQL is a great choice today, has wide use, also has Percona Monitoring & Management (PMM) integration

• MariaDB MaxScale pre-2.0 if you really need a binlog router

• Server you’re using?

Page 94: Best practices for MySQL High Availability percona live 2017

JDBC/PHP drivers

• JDBC - multi-host failover feature (just specify master/slave hosts in the properties)

• true for MariaDB Java Connector too

• PHP handles this too - mysqlnd_ms

• Can handle read-write splitting, round robin or random host selection, and more

94

Page 95: Best practices for MySQL High Availability percona live 2017

Clustering: solution or part of problem?

• "Causes of Downtime in Production MySQL Servers" whitepaper, Baron Schwartz, VividCortex

• Human error

• SAN

• Clustering framework + SAN = more problems

• Galera is replication based, has no false positives as there’s no “failover” moment, you don’t need a clustering framework (JDBC or PHP can load balance), and is relatively elegant overall

95

Page 96: Best practices for MySQL High Availability percona live 2017

InnoDB based?

• Use InnoDB, continue using InnoDB, know workarounds to InnoDB

• All solutions but NDB are InnoDB. NDB is great for telco/session management for high bandwidth sites, but setup, maintenance, etc. is complex

96

Page 97: Best practices for MySQL High Availability percona live 2017

Replication type

• Competence choices

• Replication: MySQL DBA manages

• DRBD: Linux admin manages

• SAN: requires domain controller

• Operations

• DRBD (disk level) = cold standby = longer failover

• Replication = hot standby = shorter failover

• GTID helps tremendously

• Performance

• SAN has higher latency than local disk

• DRBD has higher latency than local disk

• Replication has little overhead

• Redundancy

• Shared disk = SPoF

• Shared nothing = redundant

97

Page 98: Best practices for MySQL High Availability percona live 2017

SBR vs RBR? Async vs sync?

• row based: deterministic

• statement based: dangerous

• GTID: easier setup & failover of complex topologies

• async: data loss in failover

• sync: best

• multi-threaded slaves: scalability (hello 5.6+, Tungsten)

98

Page 99: Best practices for MySQL High Availability percona live 2017

Conclusions for choice

• Simpler is better

• MySQL replication > DRBD > SAN

• Sync replication = no data loss

• Async replication = no latency (WAN)

• Sync multi-master = no failover required

• Multi-threaded slaves help in disk-bound workloads

• GTID increases operational usability

• Galera provides all this with good performance & stability

99

Page 100: Best practices for MySQL High Availability percona live 2017

Deep-dive: MHA

100

Page 101: Best practices for MySQL High Availability percona live 2017

Why MHA needs coverage

• High Performance MySQL, 3rd Edition

• Published: March 16 2012

101

Page 102: Best practices for MySQL High Availability percona live 2017

Where did MHA come from?

• DeNA won 2011 MySQL Community Contributor of the Year (April 2011)

• MHA came in about 3Q/2011

• Written by Yoshinori Matsunobu, Oracle ACE Director

102

Page 103: Best practices for MySQL High Availability percona live 2017

What is MHA?

• MHA for MySQL: Master High Availability Manager tools for MySQL

• Goal: automating master failover & slave promotion with minimal downtime

• Set of Perl scripts

• http://code.google.com/p/mysql-master-ha/

103

Page 104: Best practices for MySQL High Availability percona live 2017

Why MHA?

• Automating monitoring of your replication topology for master failover

• Scheduled online master switching to a different host for online maintenance

• Switch back after OPTIMIZE/ALTER table, software or hardware upgrade

• Schema changes without stopping services

• pt-online-schema-change, oak-online-alter-table, Facebook OSC, Github gh-ost

• Interactive/non-interactive master failover (just for failover, with detection of master failure + VIP takeover to Pacemaker)

104

Page 105: Best practices for MySQL High Availability percona live 2017

Why is master failover hard?

• When master fails, no more writes till failover complete

• MySQL replication is asynchronous (MHA works with async + semi-sync replication)

• slave2 is latest, slave1+3 have missing events, MHA does:

• copy id=10 from master if possible

• apply all missing events

105

Page 106: Best practices for MySQL High Availability percona live 2017

MHA: Typical scenario

• Monitor replication topology

• If failure detected on master, immediately switch to a candidate master or the most current slave to become new master

• MHA must fail to connect to master server three times

• CHANGE MASTER for all slaves to new master

• Print (stderr)/email report, stop monitoring

106

Page 107: Best practices for MySQL High Availability percona live 2017

So really, what does MHA do?

107

Page 108: Best practices for MySQL High Availability percona live 2017

Typical timeline

• Usually no more than 10-30 seconds

• 0-10s: Master failure detected in around 10 seconds

• (optional) check connectivity via secondary network

• (optional) 10-20s: 10 seconds to power off master

• 10-20s: apply differential relay logs to new master

• Practice: 4s @ DeNA, usually less than 10s

108

Page 109: Best practices for MySQL High Availability percona live 2017

How does MHA work?

• Save binlog events from crashed master

• Identify latest slave

• Apply differential relay log to other slaves

• Apply saved binlog events from master

• Promote a slave to new master

• Make other slaves replicate from new master

109

Page 110: Best practices for MySQL High Availability percona live 2017

Getting Started

• MHA requires no changes to your application

• You are of course expected to write to a virtual IP (VIP) for your master

• MHA does not build replication environments for you - that’s DIY

110

Page 111: Best practices for MySQL High Availability percona live 2017

MHA Node

• Download mha4mysql-node & install this on all machines: master, slaves, monitor

• Packages (DEB, RPM) available

• Manually, make sure you have DBD::mysql & ensure it knows the path of your MySQL

111

Page 112: Best practices for MySQL High Availability percona live 2017

MHA Manager server

• Monitor server doesn’t have to be powerful at all, just remain up

• This is a single-point-of-failure so monitor the manager server where MHA Manager gets installed

• If MHA Manager isn’t running, your app still runs, but automated failover is now disabled

112

Page 113: Best practices for MySQL High Availability percona live 2017

MHA Manager

• You must install mha4mysql-node then mha4mysql-manager

• Manager server has many Perl dependencies: DBD::mysql, Config::Tiny, Log::Dispatch, Parallel::ForkManager, Time::HiRes

• Package management fixes dependencies, else use CPAN

113

Page 114: Best practices for MySQL High Availability percona live 2017

Configuring MHA

• Application configuration file: see samples/conf/app1.cnf

• Place this in /etc/MHA/app1.cnf

• Global configuration file: see /etc/MHA/masterha_default.cnf (see samples/conf/masterha_default.cnf)

114

Page 115: Best practices for MySQL High Availability percona live 2017

app1.cnf

[server default]

manager_workdir=/var/log/masterha/app1

manager_log=/var/log/masterha/app1/manager.log

[server1]

hostname=host1

[server2]

hostname=host2

candidate_master=1

[server3]

hostname=host3

[server4]

hostname=host4

no_master=1

115

no need to specify the master, as MHA auto-detects this

candidate_master=1 sets priority, but doesn’t necessarily mean it gets promoted as a default (say it’s too far behind in replication). But maybe this is a more powerful box, or has a better setup

no_master=1: will never be the master. RAID0 instead of RAID1+0? Slave is in another data centre?

Page 116: Best practices for MySQL High Availability percona live 2017

masterha_default.cnf

[server default]

user=root

password=rootpass

ssh_user=root

master_binlog_dir= /var/lib/mysql,/var/log/mysql

remote_workdir=/data/log/masterha

ping_interval=3

# secondary_check_script=masterha_secondary_check -s remote_host1 -s remote_host2

# master_ip_failover_script= /script/masterha/master_ip_failover

# shutdown_script= /script/masterha/power_manager

# report_script= /script/masterha/send_report

# master_ip_online_change_script= /script/masterha/master_ip_online_change

116

secondary_check_script: check master activity from manager -> remote_hostN -> master (multiple hosts to ensure it’s not a network issue)

shutdown_script: STONITH

Page 117: Best practices for MySQL High Availability percona live 2017

MHA uses SSH

• MHA uses SSH actively; passphraseless login

• In theory, only require Manager SSH to all nodes

• However, remember masterha_secondary_check

•masterha_check_ssh --conf=/etc/MHA/app1.cnf

117

Page 118: Best practices for MySQL High Availability percona live 2017

Check replication

•masterha_check_repl --conf=/etc/MHA/app1.cnf

• If you don’t see MySQL Replication Health is OK, MHA will fail

• Common errors? Master binlog in different position, read privileges on binary/relay log not granted, using multi-master replication w/o read-only=1 set (only 1 writable master allowed)

118

Page 119: Best practices for MySQL High Availability percona live 2017

MHA Manager

•masterha_manager --conf=/etc/MHA/app1.cnf

• Logs are printed to stderr by default, set manager_log

• Recommended running with nohup, or daemontools (preferred in production)

• http://code.google.com/p/mysql-master-ha/wiki/Runnning_Background

119

Page 120: Best practices for MySQL High Availability percona live 2017

So, the MHA Playbook

• Install MHA node, MHA manager

•masterha_check_ssh --conf=/etc/app1.cnf

•masterha_check_repl --conf=/etc/app1.cnf

•masterha_manager --conf=/etc/app1.cnf

• That’s it!

120

Page 121: Best practices for MySQL High Availability percona live 2017

master_ip_failover_script

• Pacemaker can monitor & takeover VIP if required

• Can use a catalog database

• map between application name + writer/reader IPs

• Shared VIP is easy to implement with minimal changes to master_ip_failover itself (however, use shutdown_script to power off machine)

121

Page 122: Best practices for MySQL High Availability percona live 2017

master_ip_online_change

• Similar to master_ip_failover script, but used for online maintenance

•masterha_master_switch --master_state=alive

• MHA executes FLUSH TABLES WITH READ LOCK after the writing freeze

122

Page 123: Best practices for MySQL High Availability percona live 2017

Test the failover

•masterha_check_status --conf=/etc/MHA/app1.cnf

• Kill MySQL (kill -9, shutdown server, kernel panic)

• MHA should go thru failover (stderr)

• parse the log as well

• Upon completion, it stops running

123

Page 124: Best practices for MySQL High Availability percona live 2017

masterha_master_switch

• Manual failover

•--master_state=dead

• Scheduled online master switchover

• Great for upgrades to server, etc.

•masterha_master_switch --master_state=alive --conf=/etc/MHA/app1.cnf --new_master_host=host2

124

Page 125: Best practices for MySQL High Availability percona live 2017

Handling VIPs

my $vip = '192.168.0.1/24';
my $interface = '0';
my $ssh_start_vip = "sudo /sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "sudo /sbin/ifconfig eth0:$key down";

...

sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

125

Page 126: Best practices for MySQL High Availability percona live 2017

Integration with other HA solutions

• Pacemaker

• on RHEL6, you need some HA add-on, just use the CentOS packages

• /etc/ha.d/haresources to configure VIP

•`masterha_master_switch --master_state=dead --interactive=0 --wait_on_failover_error=0 --dead_master_host=host1 --new_master_host=host2`

• Corosync + Pacemaker works well

126

Page 127: Best practices for MySQL High Availability percona live 2017

What about replication delay?

• By default, MHA checks whether a slave is behind the master; if it is more than 100MB behind, it is never a candidate master

• If you have candidate_master=1 set, consider setting check_repl_delay=0

• You can integrate it with pt-heartbeat from Percona Toolkit
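In app1.cnf terms, that combination looks like this (the host name is a placeholder):

[server2]
hostname=host2
candidate_master=1
# do not disqualify this host just because it is lagging
check_repl_delay=0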

127

Page 128: Best practices for MySQL High Availability percona live 2017

MHA deployment tips

• You really should install this as root

• SSH needs to work across all hosts

• If you don’t want plaintext passwords in config files, use init_conf_load_script

• Each monitor can monitor multiple MHA pairs (hence app1, app2, etc.)

• You can have a standby master, make sure it’s read-only

• By default, master1->master2->slave3 doesn’t work

• MHA manages master1->master2 without issue

• Use multi_tier_slave=1 option

• Make sure replication user exists on candidate master too!

128

Page 129: Best practices for MySQL High Availability percona live 2017

Consul

• Service discovery & configuration. Distributed, highly available, data centre aware

• Comes with its own built-in DNS server, KV storage with HTTP API

• Raft Consensus Algorithm

129

Page 130: Best practices for MySQL High Availability percona live 2017

MHA + Consul

130

Page 131: Best practices for MySQL High Availability percona live 2017

VIPs vs Consul

• Previously, you handled VIPs and had to write to master_ip_online_change/master_ip_failover

• system("curl -X PUT -d '{\"Node\":\"master\"}' localhost:8500/v1/catalog/deregister");

• system("curl -X PUT -d '{\"Node\":\"master\", \"Address\":\"$new_master_host\"}' localhost:8500/v1/catalog/register");

131

Page 132: Best practices for MySQL High Availability percona live 2017

mysqlfailover

• mysqlfailover from mysql-utilities using GTIDs in 5.6

• target topology: 1 master, n-slaves

• enable: log-slave-updates, report-host, report-port, master-info-table=TABLE

• modes: elect (choose candidate from list), auto (default), fail

• --discover-slaves-login for topology discovery

• monitoring node: SPoF

• Errant transactions prevent failover!

• Restart node? Rejoins replication topology, as a slave.
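A sketch of how it is commonly invoked (hosts and credentials are placeholders; check your mysql-utilities version for exact option names):

mysqlfailover --master=root:secret@master1:3306 \
  --discover-slaves-login=root:secret \
  --failover-mode=auto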

132

Page 133: Best practices for MySQL High Availability percona live 2017

MariaDB 10

• New slave: SET GLOBAL GTID_SLAVE_POS = BINLOG_GTID_POS("master-bin.00024", 1600); CHANGE MASTER TO master_host="10.2.3.4", master_use_gtid=slave_pos; START SLAVE;

• Use GTID: STOP SLAVE; CHANGE MASTER TO master_use_gtid=current_pos; START SLAVE;

• Change master: STOP SLAVE; CHANGE MASTER TO master_host="10.2.3.5"; START SLAVE;

133

Page 134: Best practices for MySQL High Availability percona live 2017

Where is MHA used

• DeNA

• Premaccess (Swiss HA hosting company)

• Ireland’s national TV & radio service

• Jetair Belgium (MHA + MariaDB!)

• Samsung

• SK Group

• DAPA

134

Page 135: Best practices for MySQL High Availability percona live 2017

MHA 0.56 is current

• Major release: MHA 0.56 April 1 2014 (0.55: December 12 2012)

• http://code.google.com/p/mysql-master-ha/wiki/ReleaseNotes

135

Page 136: Best practices for MySQL High Availability percona live 2017

MHA 0.56

• 5.6 GTID: GTID + auto position enabled? Failover with GTID SQL syntax not relay log failover

• MariaDB 10+ doesn’t work

• MySQL 5.6 support for checksum in binlog events + multi-threaded slaves

• mysqlbinlog and mysql in custom locations (configurable clients)

• binlog streaming server supported

136

Page 137: Best practices for MySQL High Availability percona live 2017

MHA 0.56

• ping_type = INSERT (for master connectivity checks - assuming master isn’t accepting writes)

137

Page 138: Best practices for MySQL High Availability percona live 2017

Replication Manager

• Support for MariaDB Server GTIDs, MySQL and Percona Server

• Single, portable 12MB binary

• Interactive GTID monitoring

• Supports failover or switchover based on requests

• Topology detection

• Health checks

• GUI! - https://github.com/tanji/replication-manager

138

Page 139: Best practices for MySQL High Availability percona live 2017

139

Page 140: Best practices for MySQL High Availability percona live 2017

Is fully automated failover a good idea?

• False alarms

• Can cause short downtime, restarting all write connections

• Repeated failover

• Problem not fixed? Master overloaded?

• MHA ensures a failover doesn’t happen within 8h, unless --last_failover_minute=n is set

• Data loss

• id=103 is latest, relay logs are at id=101, loss

• group commit means sync_binlog=1, innodb_flush_log_at_trx_commit=1 can be enabled! (just wait for master to recover)

• Split brain

• sometimes poweroff takes a long time

140

Page 141: Best practices for MySQL High Availability percona live 2017

Video resources

• Yoshinori Matsunobu talking about High Availability & MHA at Oracle MySQL day

• http://www.youtube.com/watch?v=CNCALAw3VpU

• Alex Alexander (AccelerationDB) talks about MHA, with an example of failover, and how it compares to Tungsten

• http://www.youtube.com/watch?v=M9vVZ7jWTgw

• Consul & MHA failover in action

• https://www.youtube.com/watch?v=rA4hyJ-pccU

141

Page 142: Best practices for MySQL High Availability percona live 2017

References

• Design document

• http://www.slideshare.net/matsunobu/automated-master-failover

• Configuration parameters

• http://code.google.com/p/mysql-master-ha/wiki/Parameters

• JetAir MHA use case

• http://www.percona.com/live/mysql-conference-2012/sessions/case-study-jetair-dramatically-increasing-uptime-mha

• MySQL binary log

• http://dev.mysql.com/doc/internals/en/binary-log.html

142

Page 143: Best practices for MySQL High Availability percona live 2017

143

Page 144: Best practices for MySQL High Availability percona live 2017

Service Level Agreements (SLA)

• AWS - 99.95% in a calendar month

• Rackspace - 99.9% in a calendar month

• Google - 99.95% in a calendar month

• SLAs exclude “scheduled maintenance”

• AWS is 30 minutes/week, so really 99.65%

144

Page 145: Best practices for MySQL High Availability percona live 2017

RDS: Multi-AZ

• Provides enhanced durability (synchronous data replication)

• Increased availability (automatic failover)

• Warning: can be slow (1-10 mins+)

• Easy GUI administration

• Doesn’t give you another usable “read-replica” though

145

Page 146: Best practices for MySQL High Availability percona live 2017

External replication

• MySQL 5.6 you can do RDS -> Non-RDS

• enable backup retention, you now have binlog access

• You can nowadays replicate INTO RDS

• (you may also) use Tungsten Replicator

• also supports going from RDS to Rackspace/etc. (hybrid clouds)

146

Page 147: Best practices for MySQL High Availability percona live 2017

High Availability

• Plan for node failures

• Don’t assume node provisioning is quick

• Backup, backup, backup!

• “Bad” nodes exist

• HA is not equal across options - RDS wins so far

147

Page 148: Best practices for MySQL High Availability percona live 2017

Unsupported features

• AWS: GTIDs, InnoDB Cache Warming, InnoDB transportable tablespaces, authentication plugins, semi-sync replication

• Google: UDFs, LOAD DATA INFILE, INSTALL PLUGIN, SELECT ... INTO OUTFILE

148

Page 149: Best practices for MySQL High Availability percona live 2017

Can you configure MySQL?

• You don’t access my.cnf naturally

• In AWS you have parameter groups which allow configuration of MySQL

149

source: http://www.mysqlperformanceblog.com/2013/08/21/amazon-rds-with-mysql-5-6-configuration-variables/

Page 150: Best practices for MySQL High Availability percona live 2017

150

Page 151: Best practices for MySQL High Availability percona live 2017

Sharding solutions

• Not all data lives in one place

• hash records to partitions

• partition alphabetically? put n-users/shard? organise by postal codes?
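As a trivial illustration of hashing records to partitions (4 shards, CRC32 of the user key; purely an example, not from the slides):

-- which shard does this user live on?
SELECT CRC32('someuser@example.com') % 4 AS shard_id;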

151

Page 152: Best practices for MySQL High Availability percona live 2017

Horizontal vs vertical

152

Horizontal (diagram): the same User table (id int(10), username char(15), password char(15), email char(50)) is split across 192.168.0.1, 192.168.0.2 and 192.168.0.3.

Vertical (diagram): User (id, username, password, email) lives on 192.168.0.1, while UserInfo (login datetime, md5 varchar(32), guid varchar(32)) lives on 192.168.0.2; better if INSERT heavy and there's less frequently changed data.

Page 153: Best practices for MySQL High Availability percona live 2017

How do you shard?

• Use your own sharding framework

• write it in the language of your choice

• simple hashing algorithm that you can devise yourself

• SPIDER

• Tungsten Replicator

• Tumblr JetPants

• Google Vitess

153

Page 154: Best practices for MySQL High Availability percona live 2017

SPIDER

• storage engine to vertically partition tables

154

Page 155: Best practices for MySQL High Availability percona live 2017

Tungsten Replicator (OSS)

• Each transaction tagged with a Shard ID

• controlled in a file: shard.list, exposed via JMX MBean API

• primary use? geographic replication

• in application, requires changes to use the API to specify shards used

155

Page 156: Best practices for MySQL High Availability percona live 2017

Tumblr JetPants

• clone replicas, rebalance shards, master promotions (can also use MHA for master promotions)

• Ruby library, range-based sharding scheme

• https://github.com/tumblr/jetpants

• Uses MariaDB as an aggregator node (multi-source replication)

156

Page 157: Best practices for MySQL High Availability percona live 2017

Google (YouTube) vitess

• Servers & tools to scale MySQL for web written in Go

• Has MariaDB Server & MySQL support

• DML annotation, connection pooling, shard management, workflow management, zero downtime restarts

• Has become super easy to use: http://vitess.io/ (with the help of Kubernetes)

157

Page 158: Best practices for MySQL High Availability percona live 2017

158

Page 159: Best practices for MySQL High Availability percona live 2017

Conclusion

• MySQL replication is amazing if you know it (and monitor it) well enough

• Large sites run just fine with semi-sync + tooling for automated failover

• Galera Cluster is great for fully synchronous replication

• Don’t forget the need for a load balancer: ProxySQL is nifty

159

Page 160: Best practices for MySQL High Availability percona live 2017

At Percona, we care about your High Availability

• Percona XtraDB Cluster 5.7 with support for ProxySQL and Percona Monitoring & Management (PMM)

• Percona Monitoring & Management (PMM) with Orchestrator

• Percona Toolkit

• Percona Server for MySQL 5.7

• Percona XtraBackup

160

Page 161: Best practices for MySQL High Availability percona live 2017

Resources

161

Page 162: Best practices for MySQL High Availability percona live 2017

Resources II

162

Page 163: Best practices for MySQL High Availability percona live 2017

Q&A

[email protected] / [email protected]
@bytebot on Twitter | http://bytebot.net/blog/

slides: slideshare.net/bytebot

163