Upload
ludovico-caldara
View
5.305
Download
6
Embed Size (px)
DESCRIPTION
Some slides I've presented at TechEvent 04.2013 and LinuxDay 2013
Citation preview
2012 © Trivadis
BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN
TechEventApril 2013
Galera ClusterSynchronous Multi-Master Replication for MySQL HA
Ludovico CALDARA
LS-IMS
27.04.2013
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
1
2012 © Trivadis
MySQL forks: which one is better?
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
2
� New forks
� Many new features
� Improved instrumentation
� New solutions for DEVs and DBAs
� Fast-paced competition between forks’ developers
� Recent evolutions in HA and scalability have made MySQL enterprise ready
MySQL Oracle MySQL
Percona Server
MariaDB
Drizzle
2012 © Trivadis
There is no recipe that can satisfy all tastes
Percona Server MariaDB MySQL
Multi source replication NO YES (rel. 10) NO
NoSQL integration YES (cassandra) YES (cassandra) YES (memcached)
Virtual Columns NO YES NO
Improved diagnostics YES NO NO
Online DDL NO YES YES
Galera Cluster YES YES YES (codership patch)
Many many others YES/NO YES/NO YES/NO
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
3
2012 © Trivadis
Your real requirements will let you choose… Need HA?
• How will react your customer if there is an important loss of service?
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
4
2012 © Trivadis
MySQL Multi-Master Replication• Complex and not reliable• Concurrent writes lead to logical corruption
Native MySQL Replication• Doesn’t scale writes• Complex to promote slaves
Old-school solutions have weaknesses
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
5
DRBD Replication• Standby is offline, doesn’t scale at all• Poor performance
NDB
NDB
NDB
MySQL Cluster• Very complex• It’s not InnoDB!
2012 © Trivadis
Continuent Tungsten Replicator• Similar to Golden Gate• Heterogeneous databases• Provides complex topologies• Asynchronous• Conflicts are complex to resolve• Complex to maintain• Not free
New school solutions: 3rd parties are playing a decisive role
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
6
ORACLE
MYSQL
MYSQL
ORACLEMYSQL
Galera Cluster Replication• Transparent Multi-Master easy to mantain• (Virtually) Synchronous• It’s InnoDB (only InnoDB)• Great and easy scalability• Optimistic locking (side effects)• At least 3 nodes for good HA
2012 © Trivadis
Multi-Master and virtually synchronous: it’s transparent
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
7
R/W R/W R/W R/W R/W
2012 © Trivadis
Cluster implementation - Ingredients
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
8
• One or more standalone servers (either physical or virtual)
• Linux (other operating systems are not yet available)
• “Permissive” Firewall between nodes
• Codership’s Galera Library package
• A package of your choice:
• Percona XtraDB Cluster
• MariaDB Galera Cluster
• MySQL with wsrep patch (patched by Codership)
2012 © Trivadis
Cluster implementation - Variables
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
9
• Each server’s my.cnf must contain:
• wsrep_cluster_address=gcomm://192.168.1.100,…,192.168.1.10x
• wsrep_provider=/usr/lib64/libgalera_smm.so
• binlog_format=ROW
• default_storage_engine=InnoDB
• innodb_autoinc_lock_mode=2
• innodb_locks_unsafe_for_binlog=1 #disables gap locking
2012 © Trivadis
Cluster implementation – Start the cluster
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
10
mysqld_safe --wsrep_cluster_address=gcomm:// &
[…]130220 17:56:46 [Note] WSREP: Starting new group from scratch : […]
The empty gcomm:// address starts the node as the first of the cluster
NEVER USE IT TO JOIN AN EXISTING CLUSTER
2012 © Trivadis
Cluster implementation – Adding nodes to the cluster
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
11
mysqld_safe \
--wsrep_cluster_address=gcomm://host1,host2… &
[…]130220 18:01:56 [Note] WSREP: Shifting OPEN -> PRIMARY (TO:…)130220 18:01:56 [Note] WSREP: State transfer required: […]
The address should be already present in the my.cnf!
2012 © Trivadis
Server State Transfer
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
12
• The joiner asks for a SST
R/W R/W R/W
2012 © Trivadis
Server State Transfer
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
13
• The joiner asks for a SST
R/W R/W
• The cluster chooses a donor, the donor is taken offline
DONOR
2012 © Trivadis
Server State Transfer
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
14
• The joiner asks for a SST
R/W R/W R/W
• The cluster chooses a donor, the donor is taken offline
DONOR
• The donor is backed up
• The donor comes online again and the joiner is loaded
2012 © Trivadis
Server State Transfer
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
15
• The joiner asks for a SST
R/W R/W R/W
• The cluster chooses a donor, the donor is taken offline
DONOR
• The donor is backed up
• The donor comes online again and the joiner is loaded
• The joiner replays the missing transactions and joins the cluster
R/W
2012 © Trivadis
Server State Transfer
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
16
• The joiner asks for a SST
R/W R/W R/W
• The cluster chooses a donor, the donor is taken offline
• The donor is backed up
• The donor comes online again and the joiner is loaded
• The joiner replays the missing transactions and joins the cluster
R/W• The cluster can also do
Incremental State Transfers (IST)
2012 © Trivadis
Split-Brain
• The majority of nodes wins
• Complete loss of network: all nodes go offline
• The offline nodes will respond:mysql> select * from emp;
ERROR 1047 (08S01): Unknown command
• Galera arbitrator (garbd) can join the cluster and count as a member in split brain resolution.
• NEW: Galera 2.4 intruduces weighted quorum
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
17
gararbitrator
2012 © Trivadis
WAN
Example 1: Arbitrator in Trivadis Swiss
… sorry for German/Austrian attenders ☺
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
18
arbitrator
ZURICH
BASEL
BERN
LAUSANNE
• If the WAN connection is lost, Zurich survives
• If the Zurich site is lost, the cluster will be off lined
2012 © Trivadis
WAN
Example 2: Arbitrator in Trivadis Swiss
… sorry for German/Austrian attenders ☺
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
19
arbitrator
ZURICH
BASEL
BERN
LAUSANNE
• If the Zurich site is lost, the other sites survive
• If the WAN connection is lost, the cluster will be off lined
2012 © Trivadis
What does “Virtually synchronous” mean? In brief:
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
20
Write
2012 © Trivadis
What does “Virtually synchronous” mean? In brief:
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
21
Write
Commit
WS
2012 © Trivadis
What does “Virtually synchronous” mean? In brief:
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
22
Write
Commit
WS WS WS
2012 © Trivadis
What does “Virtually synchronous” mean? In brief:
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
23
Write
Commit
WS WS
Commit OK
2012 © Trivadis
What does “Virtually synchronous” mean? In brief:
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
24
Write
Commit
WS WS
Commit OK
• Writes are as fast as if they were local
• Commits take just the time of a network roundtrip: if acceptable then the cluster can be spread geographically
2012 © Trivadis
Optimistic locking leads to side effects
mysql> update emp set salary=‘peanuts’ where name=‘Caldara' ;Query OK , 1 row affected (0.03 sec)Rows matched: 1 Changed: 1 Warnings: 0
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
25
2012 © Trivadis
Optimistic locking leads to side effects
mysql> update emp set salary=‘peanuts’ where name=‘Caldara' ;Query OK , 1 row affected (0.03 sec)Rows matched: 1 Changed: 1 Warnings: 0
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
26
mysql> update emp set salary=‘one billion' where name=‘Cald ara';Query OK , 1 row affected (0.03 sec)Rows matched: 1 Changed: 1 Warnings: 0
2012 © Trivadis
Optimistic locking leads to side effects
mysql> update emp set salary=‘peanuts’ where name=‘Caldara' ;Query OK , 1 row affected (0.03 sec)Rows matched: 1 Changed: 1 Warnings: 0
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
27
mysql> update emp set salary=‘one billion' where name=‘Cald ara';Query OK , 1 row affected (0.03 sec)Rows matched: 1 Changed: 1 Warnings: 0mysql> commit ;
WS
Query OK , 0 rows affected (0.01 sec
WS
2012 © Trivadis
Optimistic locking leads to side effects
mysql> update emp set salary=‘peanuts’ where name=‘Caldara' ;Query OK , 1 row affected (0.03 sec)Rows matched: 1 Changed: 1 Warnings: 0
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
28
mysql> update emp set salary=‘one billion' where name=‘Cald ara';Query OK , 1 row affected (0.03 sec)Rows matched: 1 Changed: 1 Warnings: 0mysql> commit ;
mysql> commit ;ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
Query OK , 0 rows affected (0.01 sec
WS
mysql> select salary from emp where name=‘Caldara’;+-------------+| salary |+-------------+| one billion |+-------------+
2012 © Trivadis
Conclusions on optimistic locking…
• Cluster-wise, the first that broadcasts its commit wins (it’s Galera…)
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
29
• The application should not have hotspots...
• … or it should retry the transaction after the deadlock occurs…
• … or, for each database, you can elegy one node as the master
• Locally, the first that acquires the lock wins (it’s InnoDB…)
2012 © Trivadis
About performance
• Commit performance loss is between 5% and 10% plus the network RTT
• Write workloads scale to up to 8 nodes
• >8 nodes: it scales reads, not writes
• Many benchmarks show that Galera overcomes NDB with few nodes
• NDB scales out more with many nodes thanks to data sharding
• Benchmarks on internet are not always reliable… test the performance of YOUR application
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
30
2012 © Trivadis
How to migrate
• Converts all your tables to InnoDB
• Double-check that all tables have primary keys
• Think about potential problems caused by triggers (if you have any)
• Create a new empty Galera Cluster
• Setup MySQL native replication between the old database and the Galera cluster
• Once all is aligned, direct your clients on the new cluster
• Setup the old node to join the cluster
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
31
NATIVEREPLICATION
JOIN
2012 © Trivadis
Load balancing
• HAProxy is the most used solution so far
• Codership is actively developing his own load balancer: Galera Load Balancer (glbd)
• Several balancing modes: round robin, custom, least connected, …
• Automatically drains disconnected nodes
• New nodes can be added with a single tcpcall
• Release 1.0 (now rc1) will support watchdog and automatic discover of nodes composing the cluster
• Other methods possible (e.g. java connector properties, HW load balancer)
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
32
2012 © Trivadis
Conclusions on Galera Cluster
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
33
•
•
•
•
•
•
•
• At least 3 nodes to have good HA
• Optimistic locking (side effects)
• Explicit locking doesn’t work
• Only InnoDB is replicated
• Primary keys are mandatory
• Not yet available for MySQL 5.6
• Linux only
• Multi-master
•
•
•
•
•
•
• Easy to add/remove nodes
• Multi-master
• Shared-nothing
• Great performances and scalability
• «Virtually» synchronous
• It uses InnoDB!!
• Conflict prevention
• Split-brain (no inconsistencies)
• Easy to add/remove nodes
2012 © Trivadis
Links
http://www.slideshare.net/skysql/galera-cluster-by-seppo-jaakola-codership-at-skysql-roadshow-in-stuttgart-2013
http://www.codership.com/files/presentations/Galera_Replication_PLL_2011.pdf
http://www.mysqlperformanceblog.com/2013/01/31/feature-in-details-incremental-state-transfer-after-a-node-crash-in-percona-xtradb-cluster/
http://www.percona.tv/percona-webinars/migrating-to-percona-xtradb-cluster
http://www.codership.com/content/5-tips-migrating-your-mysql-server-galera-cluster
http://www.mysqlperformanceblog.com/2012/08/17/percona-xtradb-cluster-multi-node-writing-and-unexpected-deadlocks/
http://www.mysqlperformanceblog.com/2012/11/20/understanding-multi-node-writing-conflict-metrics-in-percona-xtradb-cluster-and-galera/
http://www.mysqlperformanceblog.com/2011/10/13/benchmarking-galera-replication-overhead/
http://karlssonondatabases.blogspot.ch/2012/12/galera-features-beyond-just-ha.html
http://infoscience.epfl.ch/record/52305/files/IC_TECH_REPORT_199908.pdf
http://www.inf.usi.ch/faculty/pedone/Paper/2005/2005WDIDDR.pdf
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
34
2012 © Trivadis
Little demo?
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
35
2012 © Trivadis
?
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
36
2012 © Trivadis
BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN
THANK YOU.Trivadis SA
Ludovico CaldaraSenior Consultant
27.04.2013Galera Cluster Synchronous Multi-Master Replication for MySQL HA
37