Full tutorial of Tungsten Replicator installation and management
©Continuent 2013
Using Tungsten Replicator to solve replication problems
Neil Armitage, Cluster Implementation Engineer, Continuent
Giuseppe Maxia, QA Director, Continuent
Monday, April 22, 13
ABOUT US
• Neil Armitage
• Continuent Tungsten Deployment and Support Engineer, Continuent, Inc
• 20 years development and DB experience
• Giuseppe Maxia, a.k.a. "The Data Charmer"
• QA Director, Continuent, Inc
• 25 years development and DB experience
• Long-time MySQL community member; Oracle ACE Director
Tungsten replicator
• Global transaction ID
• Multiple masters
• Multiple sources
• Flexible topologies
• Parallel replication
• Heterogeneous replication
• ... and more
What Tungsten Replicator is NOT
• Automated management
• Automatic failover
• Transparent connections
• All the above (and more) are available with a commercial solution named Continuent Tungsten (a.k.a. Tungsten Enterprise)
What are we talking about?
• Requirements
• Components
• Installation
• Topologies
• Administration
• Troubleshooting
Tungsten Replicator Concepts
• Role: master, slave, direct slave
• Service: a.k.a. "pipeline"
• Replicator: the replication engine
• Stage: extract, queue, apply
Tungsten Replicator Components
• THL: Transaction History Log
• Service schema: makes the node crash-proof
• Properties file: service definition
• Tools: ruling from a centralized location
Tungsten Replicator in a nutshell
[Diagram: the master on host1 reads its binlog into a THL; the slave on host2 keeps its own THL. On both hosts, trep_commit_seqno records origin, seqno, and eventid. The seqno is the global transaction ID.]
Planning
• Hosts
• Topology
• Stand-alone or taking over
[Diagram: supported topologies: master-slave, fan-in slave, all-masters, star, and heterogeneous (combinations of MySQL and Oracle)]
Installation
Installation
• System Requirements
• Validate first
• Deploying from a single location
Installation - tools
• tools/tungsten-installer
• tools/configure-service
• tools/update
• (Using the cookbook recipes, you hardly see them)
Tungsten in practice: Installation
Installation
• Check the requirements
• Get the binaries
• Expand the tarball
• Run cookbook
REQUIREMENTS
• Java JRE or JDK (Sun/Oracle or OpenJDK)
• Ruby 1.8 (only during installation)
• ssh access to the same user in all nodes
• MySQL user with all privileges
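The requirements above can be partly checked before running the installer. What follows is a minimal sketch and not part of the Tungsten cookbook: missing_tools is a hypothetical helper that prints every command it cannot find in PATH. It does not verify the Java or Ruby version, ssh access between nodes, or MySQL privileges.

```shell
#!/bin/sh
# Hypothetical helper (not a Tungsten tool): print each command name
# from the argument list that cannot be found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Commands implied by the requirements list.
missing=$(missing_tools java ruby ssh mysql)
if [ -n "$missing" ]; then
    echo "Missing prerequisites:" $missing
else
    echo "java, ruby, ssh and mysql are all in PATH"
fi
```

Running the same check on every node (for instance through ssh) would cover the whole cluster.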
Installation - Choices
• --master-slave
• --direct
[Diagram: master-slave. The master on host1 reads its binlog into a THL; the slaves on host2 and host3 each keep their own THL.]
[Diagram: direct. The master on host1 has its binlog; the slaves on host2 and host3 each have a relay log and a THL.]
Overview of Virtual Machines
• Copy zip files from USB Key
• Expand on local disk
• Start all 4 Machines in VirtualBox
Virtual Machines
• 4 nodes: host1 to host4
• Running CentOS 6.3 and Percona 5.5
• Root and tungsten password = 'password'
• localhost port 2222 redirects to 22 on hosts
ssh -p 2222 tungsten@localhost
VERY important definitions
• Staging directory:
• Where you unpack the software and run the installer.
• There is generally only one, in one host;
• Can be discarded after installation
• Installation directory:
• Where your installed software will go;
• There is one for every host;
Example
Staging directory (on one host): $HOME/tungsten-replicator-2.0.8-167
Installation directory (on host1, host2, and host3): /opt/replication
Requirements: how to
• step by step: how it happened
installing VMs
• Step-by-step demo
Overview of Tungsten cookbook
tungsten cookbook
tungsten-replicator-2.0.8-167
 |
 +--/cluster-home
 +--/cookbook
 +--/tools
 +--/tungsten-replicator
tungsten cookbook
tungsten-replicator-2.0.8-167
 |
 +--/cookbook
     |
     +--COMMON_NODES.sh
     +--USER_VALUES.sh
     +--NODES_MASTER_SLAVE.sh
     +--install_master_slave
     +--show_cluster
     +--test_cluster
     ...
tungsten cookbook
tungsten-replicator-2.0.8-167
 |
 +--/cookbook
     |
     +--COMMON_NODES.sh
     +--USER_VALUES.sh
     +--NODES_ALL_MASTERS.sh
     +--install_all_masters
     +--show_cluster
     +--test_cluster
     ...
tungsten cookbook
tungsten-replicator-2.0.8-167
 |
 +--/cookbook
     |
     +--COMMON_NODES.sh
     +--USER_VALUES.sh
     +--NODES_STAR.sh
     +--install_star
     +--show_cluster
     +--test_cluster
     ...
tungsten cookbook
tungsten-replicator-2.0.8-167
 |
 +--/cookbook
     |
     +--COMMON_NODES.sh
     +--USER_VALUES.sh
     +--NODES_FAN_IN.sh
     +--install_fan_in
     +--show_cluster
     +--test_cluster
     ...
tungsten cookbook
$ cat COMMON_NODES.sh
export NODE1=host1
export NODE2=host2
export NODE3=host3
export NODE4=host4
tungsten cookbook
$ cat USER_VALUES.sh
# User defined values for the cluster to be installed.
export TUNGSTEN_BASE=$HOME/installs/cookbook
export DATABASE_USER=tungsten
export BINLOG_DIRECTORY=/var/lib/mysql
export MY_CNF=/etc/my.cnf
export DATABASE_PASSWORD=secret
export DATABASE_PORT=3306
export TUNGSTEN_SERVICE=cookbook
export RMI_PORT=10000
export THL_PORT=2112
export START_OPTION=start
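Before installing, it can help to confirm that the values in USER_VALUES.sh are set and plausible. The sketch below is not a cookbook script: unset_vars and is_port are hypothetical helpers, and only a subset of the variables shown above is repeated here so that the example is self-contained.

```shell
#!/bin/sh
# A subset of the values from the USER_VALUES.sh listing.
export TUNGSTEN_BASE=$HOME/installs/cookbook
export DATABASE_USER=tungsten
export DATABASE_PORT=3306
export TUNGSTEN_SERVICE=cookbook
export RMI_PORT=10000
export THL_PORT=2112

# Hypothetical helper: print the names of variables that are unset or empty.
unset_vars() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        [ -n "$value" ] || printf '%s\n' "$name"
    done
}

# Hypothetical helper: succeed only for an integer between 1 and 65535.
is_port() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

empty=$(unset_vars TUNGSTEN_BASE DATABASE_USER TUNGSTEN_SERVICE)
if [ -n "$empty" ]; then
    echo "Unset values:" $empty
fi
for port in "$DATABASE_PORT" "$RMI_PORT" "$THL_PORT"; do
    is_port "$port" || echo "invalid port: $port"
done
```

With the values above the script prints nothing, which means every check passed.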
Getting started: VALIDATE FIRST
export VERBOSE=1
./cookbook/check_cookbook
./cookbook/validate_cluster
sample master-slave installation
• edit cookbook/COMMON_NODES.sh
• edit cookbook/USER_VALUES.sh
• run cookbook/install_master_slave
• and then:
• run cookbook/show_cluster
• run cookbook/test_cluster
What does the installation do
1: Validate all servers (host1, host2, host3, host4)
Report all errors
What does the installation do
1: (again) Validate all servers (host1, host2, host3, host4)
What does the installation do
2: install Tungsten in all servers (host1, host2, host3, host4)
$HOME/
  tinstall/
    config/
    releases/
    relay/
    thl/
    tungsten/
    backups/
example (from manual installation)
ssh r2 chmod 444 $HOME/tinstall
./tools/tungsten-installer \
  --master-slave --master-host=r1 \
  --datasource-user=tungsten \
  --datasource-password=secret \
  --service-name=dragon \
  --home-directory=$HOME/tinstall \
  --thl-directory=$HOME/tinstall/logs \
  --relay-directory=$HOME/tinstall/relay \
  --cluster-hosts=r1,r2,r3,r4 --start
ERROR >> qa.r2.continuent.com >> /home/tungsten/tinstall is not writeable
example
ssh r2 chmod 755 $HOME/tinstall
./tools/tungsten-installer \
  --master-slave --master-host=r1 \
  --datasource-user=tungsten \
  --datasource-password=secret \
  --service-name=dragon \
  --home-directory=$HOME/tinstall \
  --thl-directory=$HOME/tinstall/logs \
  --relay-directory=$HOME/tinstall/relay \
  --cluster-hosts=r1,r2,r3,r4 --start
# no errors
After installation. A tour of the cookbook utilities
General principles (1)
• Scripts without extension are designed to be launched by users
• e.g. ./cookbook/help
• ./cookbook/install_master_slave
• Scripts with extension ".sh" are either for internal use only or deprecated.
• ./cookbook/install_* scripts can be used before installing. Almost everything else requires an installed topology.
General principles (2)
• After installation there is a file CURRENT_TOPOLOGY in the staging directory
• cookbook scripts can be used either from the staging directory or from the installation directory.
Cookbook tour: help and checks
./cookbook/check_cookbook
./cookbook/help
./cookbook/readme
Cookbook tour: Getting information
./cookbook/show_cluster
./cookbook/paths
./cookbook/backups
./cookbook/services
./cookbook/query_node {node} {query}
./cookbook/query_all_nodes {query}
Cookbook tour: Inspecting replication
./cookbook/replicator
./cookbook/trepctl
./cookbook/thl
./cookbook/show_conf
./cookbook/edit_conf
./cookbook/show_log
./cookbook/vimlog
./cookbook/emacslog
Cookbook tour: testing tools
./cookbook/test_cluster
./cookbook/start_load [start|stop]
./cookbook/test_all_topologies
Cookbook tour: powerful admin tools
./cookbook/heartbeat
./cookbook/switch
./cookbook/add_node_master_slave
./cookbook/add_node_star
./cookbook/copy_backup
./cookbook/clear_cluster # <--- CAUTION!
More installation
DRY-RUN
• Method to simulate installation;
• Does NOT perform installation;
• Does NOT even do validation;
• It only shows the commands used to install;
• Allows you to get the commands and do an installation manually (e.g. when you can't ssh between nodes)
DRY-RUN
export DRYRUN=1
./cookbook/install_master_slave
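The general idea behind a dry-run switch can be sketched in a few lines. This is an illustration only, not the cookbook's own code: run is a hypothetical wrapper that prints a command instead of executing it when DRYRUN is set to 1.

```shell
#!/bin/sh
# Hypothetical wrapper (not taken from the cookbook): when DRYRUN is 1,
# print the command line; otherwise execute it.
run() {
    if [ "${DRYRUN:-0}" = "1" ]; then
        printf 'DRY-RUN: %s\n' "$*"
    else
        "$@"
    fi
}

DRYRUN=1
run echo "installing on host1"   # the command is only printed
DRYRUN=0
run echo "installing on host1"   # the command is executed
```

The printed commands can then be copied and run by hand on each node, which is the use case the DRY-RUN slide describes.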
Intro to multi-master installation
How tungsten-installer Works for Basic Master/Slave Deployment
[Diagram: from the staging copy of the files, the installer checks prerequisites, copies code, and configures db1, db2, and db3]
From Master/Slave Replication ...
[Diagram: db1, db2, and db3 each run a replicator with service alpha, each installed by tungsten-installer]
Install master and slaves on the whole cluster
To Multi-Master
[Diagram: db1 and db2 each run a replicator with service alpha and service bravo]
• Install master on db1: tungsten-installer
• Install master on db2: tungsten-installer
• Install slave service on db1: configure-service
• Install slave service on db2: configure-service
tungsten-installer master 1
TUNGSTEN_HOME=/home/tungsten/installs/cookbook
./tools/tungsten-installer \
  --master-slave \
  --master-host=$MASTER1 \
  --datasource-port=3306 \
  --datasource-user=tungsten \
  --datasource-password=secret \
  --datasource-log-directory=/var/lib/mysql \
  --service-name=alpha \
  --home-directory=$TUNGSTEN_HOME \
  --cluster-hosts=$MASTER1 \
  --start
creating service 'alpha'
Notice: --cluster-hosts has only one host
tungsten-installer master 2
TUNGSTEN_HOME=/home/tungsten/installs/cookbook
./tools/tungsten-installer \
  --master-slave \
  --master-host=$MASTER2 \
  --datasource-port=3306 \
  --datasource-user=tungsten \
  --datasource-password=secret \
  --datasource-log-directory=/var/lib/mysql \
  --service-name=bravo \
  --home-directory=$TUNGSTEN_HOME \
  --cluster-hosts=$MASTER2 \
  --start
creating service 'bravo'
Notice: --cluster-hosts has only one host
Configure Service master 1
TUNGSTEN_HOME=/home/tungsten/installs/cookbook
$TUNGSTEN_HOME/tungsten/tools/configure-service -C --quiet \
  --host=$MASTER1 \
  --datasource=$MASTER1 \
  --local-service-name=alpha \
  --role=slave \
  --service-type=remote \
  --release-directory=$TUNGSTEN_HOME/tungsten \
  --skip-validation-check=THLStorageCheck \
  --master-thl-host=$MASTER2 \
  --master-thl-port=2112 \
  --svc-start bravo
Notice: bravo is the master service in host 2
Configure Service master 2
TUNGSTEN_HOME=/home/tungsten/installs/cookbook
$TUNGSTEN_HOME/tungsten/tools/configure-service -C --quiet \
  --host=$MASTER2 \
  --datasource=$MASTER2 \
  --local-service-name=bravo \
  --role=slave \
  --service-type=remote \
  --release-directory=$TUNGSTEN_HOME/tungsten \
  --skip-validation-check=THLStorageCheck \
  --master-thl-host=$MASTER1 \
  --master-thl-port=2112 \
  --svc-start alpha
Notice: alpha is the master service in host 1
From Master/Slave Replication ...
[Diagram: db1, db2, and db3 each run a replicator with service db1]
./cookbook/install_master_slave
How Do I Install Fan-In Replication?
[Diagram: db1 runs service db1 and db2 runs service db2; db3 runs both service db1 and service db2]
./cookbook/install_fan_in
How Do I Install Multi-Master?
[Diagram: db1 and db2 each run a replicator with service db1 and service db2]
./cookbook/install_all_masters
How Do I Extend Multi-Master?
[Diagram: db1, db2, and db3 each run a replicator with services db1, db2, and db3]
How Do I Extend Multi-Master?
[Diagram: db1, db2, db3, and db4 each run a replicator with services db1, db2, db3, and db4]
How Do I Install a Star Topology?
[Diagram: db3 is the HUB and runs services db1, db2, and db3; db1 runs services db1 and db3; db2 runs services db2 and db3]
./cookbook/install_star
How Do I Extend a Star Topology?
[Diagram: the star from the previous slide plus db4, which runs services db3 and db4; the HUB (db3) gains service db4]
How Do I Extend a Star Topology?
[Diagram: the star from the previous slide plus db5, which runs services db3 and db5; the HUB (db3) gains service db5]
BI-DIR: the painless way
• edit cookbook/COMMON_NODES.sh
• edit cookbook/USER_VALUES.sh
• remove two nodes
• edit the variables in cookbook/NODES_ALL_MASTERS.sh
• cookbook/install_all_masters
Multiple masters
• fan-in
• Steps:
• install a master service in each node
• install a slave service for each master in the fan-in node
• or :
• cookbook/install_fan_in
Multiple masters
• star topology
• Steps:
• install a master service in each server
• in the hub, install a slave service for each spoke
• in each spoke, install a slave service for the hub, using bypass option
• cookbook/install_star
Taking Over from Standard Replication
• cookbook/install_standard_replicaton
• cookbook/takeover
Replication Management
Common Commands
• replicator
• trepctl
• thl
• the Tungsten service schema
replicator
• It’s the service provider
• You launch it once when you start
• You may restart it when you change config
trepctl
• Tungsten Replicator ConTroLler
• It’s the driving seat for your replication
• You can start, update, and stop services
• You can get specific info
trepctl
• Tungsten Replicator Controller
• put services online or offline
• check status
• skip events
• inspect internals
• change roles
• heartbeat
• backup/restore
• ... and a lot more
thl
• Transaction History Log
• Gives you access to the Tungsten transaction history logs
thl
• Transaction History Log
• info
• index
• list (total or a specific event, or by range)
• purge
Tungsten service schema
• one for each service
• named "tungsten_SERVICE_NAME"
• e.g. tungsten_alpha, tungsten_dragon
• Most important table: trep_commit_seqno
Looking at the tungsten service db
select * from tungsten_dragon.trep_commit_seqno\G
******************* 1. row *******************
          task_id: 0
            seqno: 102
           fragno: 0
        last_frag: 1
        source_id: qa.r1.continuent.com
     epoch_number: 0
          eventid: mysql-bin.000002:0000000000018903;0
  applied_latency: 0
 update_timestamp: 2012-02-06 05:56:12
         shard_id: tungsten_dragon
extract_timestamp: 2012-02-06 05:56:09
Where are the tools
in the tungsten directory:
$TUNGSTEN_BASE/tungsten/tungsten-replicator/bin
replicator # the daemon
trepctl # replicator controller
thl # transaction history log tool
Starting and stopping the replicator
cd $TUNGSTEN_BASE/tungsten/tungsten-replicator/bin
./replicator status
Tungsten Replicator Service is running (PID:32400).
./replicator stop
Stopping Tungsten Replicator Service...
Stopped Tungsten Replicator Service.
./replicator start
Starting Tungsten Replicator Service...
.... or ./cookbook/replicator ...
checking replicator vitals
trepctl services
Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: -1   # bad sign?
appliedLatency  : -1.0
role            : slave
serviceName     : dragon
serviceType     : local
started         : true
state           : ONLINE
Finished services command...
sending a heartbeat
trepctl -host $MASTER_HOST heartbeat
trepctl services
Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 102
appliedLatency  : 3.139
role            : slave
serviceName     : dragon
serviceType     : local
started         : true
state           : ONLINE
Finished services command...
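For scripted checks, a single field can be pulled out of this "name : value" layout. The sketch below is an assumption-laden illustration: status_field is a hypothetical helper, and the sample text is copied from the output above instead of being read from a live trepctl.

```shell
#!/bin/sh
# Hypothetical helper: read "name : value" lines on stdin and print the
# value of the field named in $1. Only the first colon separates name
# and value, so values such as thl://host:2112/ stay intact.
status_field() {
    awk -v key="$1" '
        index($0, ":") > 0 {
            name = substr($0, 1, index($0, ":") - 1)
            gsub(/[ \t]/, "", name)
            if (name == key) {
                value = substr($0, index($0, ":") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", value)
                print value
                exit
            }
        }'
}

# Sample text copied from the slide; in practice it would be piped
# from "trepctl services".
sample='appliedLastSeqno: 102
appliedLatency  : 3.139
role            : slave
serviceName     : dragon
state           : ONLINE'

printf '%s\n' "$sample" | status_field state
```

A monitoring script could compare the printed state with ONLINE and raise an alert otherwise.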
replicator status (1)
trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000002:0000000000018903;0
appliedLastSeqno       : 102
appliedLatency         : 3.139
clusterName            : default
currentEventId         : NONE
currentTimeMillis      : 1328504342058
dataServerHost         : qa.r4.continuent.com
extensions             :
latestEpochNumber      : 0
masterConnectUri       : thl://qa.r1.continuent.com:2112/
masterListenUri        : thl://qa.r4.continuent.com:2112/
maximumStoredSeqNo     : 102
minimumStoredSeqNo     : 0
[...]
replicator status (2)
[...]
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
resourcePrecedence     : 99
rmiPort                : 10000
role                   : slave
seqnoType              : java.lang.Long
serviceName            : dragon
serviceType            : local
simpleServiceName      : dragon
siteName               : default
sourceId               : qa.r4.continuent.com
state                  : ONLINE
timeInStateSeconds     : 245.215
uptimeSeconds          : 245.539
Finished status command...
A failover scenario I: MySQL native replication
1. one Master, two slaves
• Loading the “employees” test database
2. Master goes away
* Stop replication
* Slaves are updated at different levels
# 2
select count(*) from titles
333,145
# 3
select count(*) from titles
443,308
3. Look into Slave #2 binary logs
• find the last transaction
4. Look into Slave #3 binary logs
1. find the transaction that was last in slave #2
2. Recognize that last transaction in the log of slave #3 (This can actually take you a LOOOONG TIME)
3. Get the position immediately after this transaction (e.g. 134000 in file mysql-bin.000018)
5. promote Slave #3 to master
* in slave #2
CHANGE MASTER TO
  master_host='slave_3_IP',
  master_user='slavename',
  master_password='slavepassword',
  master_log_file='mysql-bin.000018',
  master_log_pos=134000;
A failover scenario II: Tungsten Replicator
1. one master, two slaves
• loading the ‘employees’ test database
2. Master goes away
* Stop replication
* Slaves are updated at different levels
# 2
select count(*) from titles
333,145
# 3
select count(*) from titles
443,308
3. no need to find the last transaction
# simply change roles
trepctl -host slave3 setrole -role master
trepctl -host slave2 setrole \
  -role slave -uri thl://slave3
trepctl -host slave3 online
State: ONLINE
trepctl -host slave2 online
State: GOING-ONLINE:SYNCHRONIZING
4. Check that the slave has synchronized
# new master
select seqno from tungsten.trep_commit_seqno;
78
# new slave
select seqno from tungsten.trep_commit_seqno;
64
4. Tell the replicator to hurry up
# new master
trepctl -host slave3 flush
Master log is synchronized with database at log sequence number: 78
# new slave
trepctl -host slave2 wait -applied 78
ONLINE
select seqno from tungsten.trep_commit_seqno;
78
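As a worked example of the numbers above: the new master is at seqno 78 and the new slave at 64, so the slave still has 14 events to apply before the wait returns. The helper below, seqno_lag, is a hypothetical name used only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: number of events the slave still has to apply,
# given the seqno values read from trep_commit_seqno on the master ($1)
# and on the slave ($2).
seqno_lag() {
    echo $(( $1 - $2 ))
}

# Values from the example: master at 78, slave at 64.
lag=$(seqno_lag 78 64)
echo "slave is $lag events behind"
```

Once the slave reports seqno 78 as well, the lag is 0 and both nodes hold the same data.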
4. ... and we’re done
# new master
select count(*) from employees.titles
count(*)
443308
# new slave:
count(*)
443308
planned role switch
cookbook/install_master_slave
cookbook/switch
Switching roles in master/slave replication (1)
[Diagram: db1, db2, and db3 each run a replicator with service db1; all three nodes are online]
Switching roles in master/slave replication (2)
[Diagram: the same three nodes; one is offline, the other two are online]
Switching roles in master/slave replication (3)
[Diagram: the same three nodes; one is offline, the other two are online]
Wait for transactions to be applied
Switching roles in master/slave replication (4)
[Diagram: the same three nodes; all are offline]
Slaves go offline
Switching roles in master/slave replication (5)
[Diagram: the same three nodes; all are offline]
Slave is promoted. Notice: 2 masters, but offline
Switching roles in master/slave replication (6)
[Diagram: the same three nodes; all are offline]
old master becomes slave
Switching roles in master/slave replication (7)
[Diagram: the same three nodes; all are offline]
slaves are directed to new master
Switching roles in master/slave replication (8)
[Diagram: the same three nodes; all are online]
all nodes go online, using new master
Tungsten GTID vs MySQL 5.6 GTID
• What is GTID
• How it works in Tungsten
• How it works (or not) in MySQL 5.6
without global transaction ID
[Diagram: master A and slaves B and C; each server records the same commits at its own binlog position]
with global transaction ID
[Diagram: master A and slaves B and C; the same commit is identified as id #200 on every server]
Tungsten and global transaction ID: activation
(none) active by default
Tungsten and global transaction ID: status
trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000002:0000000000001442;0
appliedLastSeqno       : 6
appliedLatency         : 0.862
clusterName            : default
currentEventId         : NONE
currentTimeMillis      : 1354304680923
dataServerHost         : qa.r4.continuent.com
Tungsten and global transaction ID: seeing transactions
thl list -seqno 6
SEQ# = 6 / FRAG# = 0 (last frag)
- TIME = 2012-11-30 20:44:35.0
- EPOCH# = 0
- EVENTID = mysql-bin.000002:0000000000001442;0
- SOURCEID = qa.r1.continuent.com
- SQL(0) = insert into test.v1 values (1, 'inserted by node #1') /* ___SERVICE___ = [cookbook] */
Tungsten and global transaction ID: changing master connection
trepctl offline
trepctl online -seqno 105
Tungsten and Global transaction ID: crash-safe slave tables
mysql -e 'select * from tungsten_cookbook.trep_commit_seqno\G'
*************************** 1. row ***************************
          task_id: 0
            seqno: 6
           fragno: 0
        last_frag: 1
        source_id: qa.r1.continuent.com
     epoch_number: 0
          eventid: mysql-bin.000002:0000000000001442;0
  applied_latency: 0
 update_timestamp: 2012-11-30 20:44:35
         shard_id: test
extract_timestamp: 2012-11-30 20:44:35
Tungsten and Global transaction ID: crash-safe tables and parallel replication
mysql -e 'select seqno, source_id, shard_id,update_timestamp from tungsten_cookbook.trep_commit_seqno'
+-------+----------------------+----------+---------------------+
| seqno | source_id            | shard_id | update_timestamp    |
+-------+----------------------+----------+---------------------+
|     7 | qa.r1.continuent.com | db1      | 2012-11-30 20:54:14 |
|     8 | qa.r1.continuent.com | db2      | 2012-11-30 20:54:14 |
|     9 | qa.r1.continuent.com | db3      | 2012-11-30 20:54:14 |
|    10 | qa.r1.continuent.com | db4      | 2012-11-30 20:54:14 |
|    11 | qa.r1.continuent.com | db5      | 2012-11-30 20:54:14 |
|    12 | qa.r1.continuent.com | db6      | 2012-11-30 20:54:14 |
|    13 | qa.r1.continuent.com | db7      | 2012-11-30 20:54:14 |
|    14 | qa.r1.continuent.com | db8      | 2012-11-30 20:54:14 |
|    15 | qa.r1.continuent.com | db9      | 2012-11-30 20:54:14 |
|    16 | qa.r1.continuent.com | db10     | 2012-11-30 20:54:14 |
+-------+----------------------+----------+---------------------+
MySQL 5.6 and global transaction ID: activation
mysqld --log-slave-updates \
  --gtid-mode=on \
  --enforce-gtid-consistency
WARNING: before MySQL 5.6.10, it was --disable-gtid-unsafe-statements
MySQL 5.6 and global transaction ID: seeing transactions
#121203 11:15:49 server id 1 end_log_pos 344 CRC32 0x45b25c8f GTID [commit=yes]
SET @@SESSION.GTID_NEXT= '7A77A490-3D3A-11E2-8CC9-7DCF9991097B:2'/*!*/;
# at 344
#121203 11:15:49 server id 1 end_log_pos 423 CRC32 0x873c8fac Query thread_id=3 exec_time=0 error_code=0
SET TIMESTAMP=1354533349/*!*/;
BEGIN
/*!*/;
# at 423
#121203 11:15:49 server id 1 end_log_pos 522 CRC32 0xb4bf4372 Query thread_id=3 exec_time=0 error_code=0
SET TIMESTAMP=1354533349/*!*/;
insert into t1 values (1)
MySQL 5.6 and global transaction ID: status
show slave status\G
*************************** 1. row ***************************
       Slave_IO_State: Waiting for master to send event
          Master_Host: 127.0.0.1
          Master_User: rsandbox
          Master_Port: 13233
        Connect_Retry: 60
      Master_Log_File: mysql-bin.000002
  Read_Master_Log_Pos: 1837
       Relay_Log_File: mysql_sandbox13234-relay-bin.000005
        Relay_Log_Pos: 2047
Relay_Master_Log_File: mysql-bin.000002
...
   Retrieved_Gtid_Set: 46E13434-3B28-11E2-BF47-6C626DA07446:1-7
    Executed_Gtid_Set: 46E13434-3B28-11E2-BF47-6C626DA07446:1-7
MySQL 5.6 and global transaction ID: changing master connection
CHANGE MASTER TO master_log_file='mysql-bin-000003', master_log_pos='1234'
# No global transaction ID is used
MySQL 5.6 and global transaction ID: crash-safe slave table
select * from slave_relay_log_info\G
********************* 1. row ********************
  Number_of_lines: 7
   Relay_log_name: ./mysql_sandbox13234-relay-bin.000005
    Relay_log_pos: 2047
  Master_log_name: mysql-bin.000002
   Master_log_pos: 1837
        Sql_delay: 0
Number_of_workers: 5
               Id: 1
# NO Global transaction ID is used!
MySQL 5.6 and global transaction ID: crash-safe slave table + parallel
select * from mysql.slave_worker_info\G
                        Id: 12
            Relay_log_name: ./mysql_sandbox13234-relay-bin.000007
             Relay_log_pos: 4299
           Master_log_name: mysql-bin.000002
            Master_log_pos: 7155
 Checkpoint_relay_log_name: ./mysql_sandbox13234-relay-bin.000007
  Checkpoint_relay_log_pos: 1786
Checkpoint_master_log_name: mysql-bin.000002
 Checkpoint_master_log_pos: 4642
          Checkpoint_seqno: 9
     Checkpoint_group_size: 64
   Checkpoint_group_bitmap: ?
# NO Global transaction ID is used!
Filters
Tungsten Replication Service
[Diagram: a pipeline of three stages, each with Extract, Filter, and Apply steps, leading from the master DBMS through the transaction history log and an in-memory queue to the slave DBMS]
Restrict replication to some schemas and tables
./tools/tungsten-installer \
  --master-slave -a \
  ...
  --svc-extractor-filters=replicate \
  "--property=replicator.filter.replicate.do=test,*.foo" \
  ...
  --start-and-report
# test="test.*" -> same drawback as binlog-do-db in MySQL
# *.foo = table 'foo' in any database
# employees.dept_codes,employees.salaries => safest way
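The pattern syntax described in those comments can be illustrated with a small matcher. This is a sketch of the semantics as stated on the slide, not Tungsten's implementation: filter_matches is a hypothetical function, and it ignores the binlog-do-db style drawback mentioned above.

```shell
#!/bin/sh
# Hypothetical matcher: $1 is a comma-separated pattern list, $2 a
# schema name, $3 a table name. A bare "schema" matches every table in
# that schema, "*.table" matches that table in any schema, and
# "schema.table" matches exactly one table.
filter_matches() (
    IFS=,
    set -f                      # keep "*" in the patterns from globbing
    for pattern in $1; do
        case $pattern in
            "$2"|"$2.$3"|"*.$3") exit 0 ;;
        esac
    done
    exit 1
)

for name in test.t1 employees.foo employees.salaries; do
    if filter_matches 'test,*.foo' "${name%%.*}" "${name#*.}"; then
        echo "$name: replicated"
    else
        echo "$name: not replicated"
    fi
done
```

With the pattern list test,*.foo, the tables test.t1 and employees.foo match, while employees.salaries does not.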
Exclude some schemas and tables from replication
./tools/tungsten-installer \
  --master-slave -a \
  ...
  --svc-extractor-filters=replicate \
  "--property=replicator.filter.replicate.ignore=test,*.foo" \
  ...
  --start-and-report
# test="test.*" -> same drawback as binlog-ignore-db in MySQL
# *.foo = table 'foo' in any database
# employees.dept_codes,employees.salaries => safest way
# DO NOT MIX .do and .ignore!
# (you can do it, but it may not do what you mean)
Change name of replicated schema
-a --svc-applier-filters=dbtransform \
  --property=replicator.filter.dbtransform.from_regex1=stores \
  --property=replicator.filter.dbtransform.to_regex1=playground
# from_regex1=stores -> name of the schema in the master
# to_regex1=playground -> name of the schema in the slave
# WARNING: requires "USE schema_name" to work properly.
Multi-master: Conflict prevention
CONFLICTS
What's a conflict
• Data modified by several sources (masters)
• Creates one or more of:
• data loss (unwanted delete)
• data inconsistency (unwanted update)
• duplicated data (unwanted insert)
• replication break
Data duplication
id  name   amount
1   Joe    100
2   Frank  110
3   Sue    100
[Diagram: masters alpha, bravo, and charlie share this table. Two masters each add a row with id 4: "4 Matt 130" and "4 Matt 140".]
BREAKS REPLICATION
auto_increment offsets are not a remedy
• A popular recipe
• auto_increment_increment + auto_increment_offset
• They don't prevent conflicts
• They hide duplicates
Hidden data duplication
id  name   amount
1   Joe    100
2   Frank  110
3   Sue    100
alpha: offset 1
bravo: offset 2
charlie: offset 3
[Diagram: two masters INSERT the same record; it is stored twice, as "13 Matt 130" and "11 Matt 140".]
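The ids 11 and 13 in this example follow from the auto-increment arithmetic. The sketch below assumes auto_increment_increment=10, which is not stated on the slide but matches the ids shown, and uses the usual MySQL rule that the next value is the smallest number of the form offset + N * increment above the current maximum. next_id is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: next auto-increment value for a server, given the
# current maximum id ($1), auto_increment_increment ($2) and
# auto_increment_offset ($3).
next_id() (
    max=$1 increment=$2 offset=$3
    if [ "$max" -lt "$offset" ]; then
        echo "$offset"
    else
        echo $(( offset + ((max - offset) / increment + 1) * increment ))
    fi
)

# The table currently ends at id 3; increment 10 is an assumption.
echo "server with offset 1 inserts id $(next_id 3 10 1)"
echo "server with offset 3 inserts id $(next_id 3 10 3)"
```

Both inserts succeed because the ids differ, so the same record (Matt) ends up in the table twice and the duplicate goes unnoticed.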
Data inconsistency
135
id name amount1 Joe 1002 Frank 1103 Sue 100
alphabravo
charlie3 Sue 105
3 Sue 108
UPDATE
UPDATE

Data loss

[Diagram: masters alpha, bravo, and charlie share this table]

id  name   amount
1   Joe    100
2   Frank  110
3   Sue    100

One master deletes (DELETE) record #3 while another updates (UPDATE) it:
3   Sue    108

MAY BREAK REPLICATION

Conflict handling strategies
• resolving
  • after the fact
  • Needs information that is missing in async replication
• avoiding
  • requires synchronous replication with 2pc
• preventing
  • setting and enforcing a split sources policy
• Transforming and resolving
  • all records are converted to INSERTs
  • conflicts are resolved within a given time window

[Slide callouts: "used by Tungsten", "planned for future use", "planned for future use"]

Multi-master: Conflict prevention

Tungsten conflict prevention in a nutshell
1. define the rules
   (which master can update which database)
2. tell Tungsten the rules
3. define the policy
   (error, drop, warn, or accept)
4. Let Tungsten enforce your rules

Tungsten conflict prevention facts
• Sharded by database
• Defined dynamically
• Applied on the slave services
• methods:
  • error: make replication fail
  • drop: drop silently
  • warn: drop with warning

Tungsten conflict prevention applicability
• unknown shards
  • The schema being updated is not planned
  • actions: accept, drop, warn, error
• unwanted shards
  • the schema is updated from the wrong master
  • actions: accept, drop, warn, error
• whitelisted shards
  • can be updated by any master

Conflict prevention directives

--svc-extractor-filters=shardfilter

replicator.filter.shardfilter.unknownShardPolicy=error
replicator.filter.shardfilter.unwantedShardPolicy=error
replicator.filter.shardfilter.enforceHomes=false
replicator.filter.shardfilter.allowWhitelisted=false
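
The decision that the two policy properties control can be sketched as follows. This is an illustration of the rules stated on the slides (unknown, unwanted, and whitelisted shards), not the replicator's actual code: the function name and the dictionary layout are hypothetical, and the enforceHomes and allowWhitelisted switches are deliberately not modeled.

```python
def shard_action(shard_map, shard_id, service,
                 unknown_policy="error", unwanted_policy="error"):
    """Decide what a slave service does with an event for `shard_id`.

    shard_map maps a shard (database) name to its home master, or to the
    literal string "whitelisted". `service` is the slave service name,
    which in these slides matches the name of the remote master.
    """
    home = shard_map.get(shard_id)
    if home is None:
        return unknown_policy       # unknown shard: schema was not planned
    if home == "whitelisted":
        return "accept"             # any master may update it
    if home == service:
        return "accept"             # update comes from the shard's own master
    return unwanted_policy          # unwanted shard: wrong master


if __name__ == "__main__":
    rules = {"employees": "alpha", "buildings": "bravo",
             "vehicles": "charlie", "test": "whitelisted"}
    print(shard_action(rules, "employees", "alpha"))
    print(shard_action(rules, "employees", "bravo"))
    print(shard_action(rules, "test", "bravo"))
    print(shard_action(rules, "payroll", "alpha", unknown_policy="warn"))
```

With both policies set to error, as in the directives above, any update that reaches a slave service from the wrong master stops that service instead of being applied.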

Conflict prevention in a star topology

[Diagram: Host1 (master: alpha, database: employees), Host2 (master: bravo, database: buildings), Host3 (master: charlie, the hub, database: vehicles)]

• alpha updates employees: ✔ ✔
• alpha updates vehicles: ✗

Conflict prevention in an all-masters topology

[Diagram: Host1 (master: alpha, database: employees), Host2 (master: bravo, database: buildings), Host3 (master: charlie, database: vehicles); every host also runs slave services for the other two masters]

• alpha updates employees: ✔ ✔
• charlie updates vehicles: ✔ ✔
• bravo updates employees: ✗ ✗
• charlie updates employees: ✗ ✗

Setting conflict prevention rules

The same shards.map is loaded into every slave service:

cat shards.map
shard_id   master       critical
employees  alpha        false
buildings  bravo        false
vehicles   charlie      false
test       whitelisted  false

trepctl -host host1 -service charlie \
    shard -insert < shards.map
# charlie is slave service in host 1

trepctl -host host2 -service charlie \
    shard -insert < shards.map
# charlie is slave service in host 2

trepctl -host host3 -service alpha \
    shard -insert < shards.map
trepctl -host host3 -service bravo \
    shard -insert < shards.map
# alpha and bravo are slave services in host 3
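
Before loading a map it can be useful to check that it parses into the expected three columns. The helper below is a hypothetical sketch: it assumes whitespace-separated columns with a header row, as shown by cat shards.map, and says nothing about how trepctl itself reads the file.

```python
def parse_shard_map(text):
    """Parse a shards.map-style listing into {shard_id: (master, critical)}."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    if header != ["shard_id", "master", "critical"]:
        raise ValueError("unexpected header: %r" % (header,))
    shards = {}
    for row in body:
        if len(row) != 3:
            raise ValueError("expected 3 columns, got: %r" % (row,))
        shard_id, master, critical = row
        if shard_id in shards:
            raise ValueError("shard listed twice: " + shard_id)
        shards[shard_id] = (master, critical.lower() == "true")
    return shards


SAMPLE = """\
shard_id   master       critical
employees  alpha        false
buildings  bravo        false
vehicles   charlie      false
test       whitelisted  false
"""

if __name__ == "__main__":
    for shard, (master, critical) in parse_shard_map(SAMPLE).items():
        print(shard, master, critical)
```

A duplicate shard row is rejected here because one database with two home masters would defeat the split sources policy.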

Conflict prevention demo
• reminder
  • Server #1 can update "employees"
  • Server #2 can update "buildings"
  • Server #3 can update "vehicles"

Sample correct operation (1)

mysql #1> create table employees.names( ... )

# all servers receive the table
# all servers keep working well

Sample correct operation (2)

mysql #2> create table buildings.homes( ... )

# all servers receive the table
# all servers keep working well

Sample incorrect operation (1)

mysql #2> create table employees.nicknames( ... )

# Only server #2 receives the table
# slave service in hub gets an error
# slave service in #1 does not receive anything

Sample incorrect operation (2)

#3 $ trepctl services | simple_services
alpha    [slave]   seqno:  7 - latency:  0.136 - ONLINE
bravo    [slave]   seqno: -1 - latency: -1.000 - OFFLINE:ERROR
charlie  [master]  seqno: 66 - latency:  0.440 - ONLINE

Sample incorrect operation (3)

#3 $ trepctl -service bravo status
NAME                     VALUE
----                     -----
appliedLastEventId     : NONE
appliedLastSeqno       : -1
appliedLatency         : -1.0
(...)
offlineRequests        : NONE
pendingError           : Stage task failed: q-to-dbms
pendingErrorCode       : NONE
pendingErrorEventId    : mysql-bin.000002:0000000000001241;0
pendingErrorSeqno      : 7
pendingExceptionMessage: Rejected event from wrong shard: seqno=7 shard ID=employees shard master=alpha service=bravo
(...)

Fixing the issue

mysql #1> drop table if exists employees.nicknames;
mysql #1> create table if not exists employees.nicknames ( ... ) ;

#3 $ trepctl -service bravo online -skip-seqno 7

# all servers receive the new table

Sample whitelisted operation

mysql #2> create table test.hope4best( ... )

mysql #1> insert into test.hope4best values ( ... )

# REMEMBER: 'test' was explicitly whitelisted
# All servers get the new table and records
# But there is no protection against conflicts

Administration

Viewing THL events

thl info
log directory = /home/tungsten/installs/master_slave/thl/dragon/
min seq# = 0
max seq# = 101
events = 101

thl index
LogIndexEntry thl.data.0000000001(0:102)

thl index
[...]
LogIndexEntry thl.data.0000000001(0:18)
LogIndexEntry thl.data.0000000002(19:33)
LogIndexEntry thl.data.0000000003(34:35)
LogIndexEntry thl.data.0000000004(36:3641)
LogIndexEntry thl.data.0000000005(3642:3712)
LogIndexEntry thl.data.0000000006(3713:3838)
LogIndexEntry thl.data.0000000007(3839:3949)
LogIndexEntry thl.data.0000000008(3950:4011)
LogIndexEntry thl.data.0000000009(4012:4039)
LogIndexEntry thl.data.0000000010(4040:4057)
LogIndexEntry thl.data.0000000011(4058:4067)
LogIndexEntry thl.data.0000000012(4068:4073)
LogIndexEntry thl.data.0000000013(4074:4085)
LogIndexEntry thl.data.0000000014(4086:4095)
LogIndexEntry thl.data.0000000015(4096:4101)
LogIndexEntry thl.data.0000000016(4102:4111)
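
When the index is long, a few lines of scripting can tell which THL file holds a given sequence number. The sketch below parses text in the LogIndexEntry format shown above. It assumes that each (first:last) range is inclusive, which the contiguous ranges in the listing suggest but the slides do not state, and the function names are made up for illustration.

```python
import re

# One entry looks like: LogIndexEntry thl.data.0000000002(19:33)
ENTRY = re.compile(r"LogIndexEntry\s+([\w.]+)\((\d+):(\d+)\)")


def parse_thl_index(text):
    """Return a list of (file_name, first_seqno, last_seqno) tuples."""
    return [(name, int(first), int(last))
            for name, first, last in ENTRY.findall(text)]


def file_for_seqno(entries, seqno):
    """Return the THL file whose range contains seqno, or None."""
    for name, first, last in entries:
        if first <= seqno <= last:   # assumption: both ends inclusive
            return name
    return None


SAMPLE = """\
LogIndexEntry thl.data.0000000001(0:18)
LogIndexEntry thl.data.0000000002(19:33)
LogIndexEntry thl.data.0000000003(34:35)
"""

if __name__ == "__main__":
    entries = parse_thl_index(SAMPLE)
    print(file_for_seqno(entries, 20))
```

In practice the text would come from saving the output of thl index to a file. Because the regular expression does not depend on line breaks, it also copes with output that has been flattened onto a single line.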

thl list -seqno 102
[...]
SEQ# = 102 / FRAG# = 0 (last frag)
- TIME = 2012-02-06 05:56:09.0
- EPOCH# = 0
- EVENTID = mysql-bin.000002:0000000000018903;0
- SOURCEID = qa.r1.continuent.com
- METADATA = [mysql_server_id=10;is_metadata=true;service=dragon;shard=tungsten_dragon;heartbeat=NONE]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [##charset = ISO8859_1, autocommit = 1, sql_auto_is_null = 1, foreign_key_checks = 1, unique_checks = 1, sql_mode = 'IGNORE_SPACE', character_set_client = 8, collation_connection = 8, collation_server = 8]
- SCHEMA = tungsten_dragon
- SQL(0) = UPDATE tungsten_dragon.heartbeat SET source_tstamp= "2012-02-06 05:56:09", salt= 2, name= "NONE" WHERE id= 1 /* ___SERVICE___ = [dragon] */

Skipping a THL Event

trepctl online -skip-seqno 1092

trepctl online -skip-seqno 1092,1093,1094

# see example

Adding a Member
• Let's see the cookbook, and use it

Parallel replication

Replicator Pipeline Architecture

[Diagram: inside the Tungsten Replicator process, a pipeline made of stages, each with an Extract and an Apply step. Events flow from the MySQL binlog into the THL (Transaction History Log), are assigned a shard ID (shard.list file), enter a parallel queue, and are applied to the slave DBMS through several parallel "channels".]

Parallel replication facts
✓ Sharded by database
✓ Good choice for slave lag problems
❖ Bad choice for single database projects
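
Why a single database gains nothing from parallel apply can be seen in a toy model of shard-to-channel assignment. This is a sketch only: it assumes a plain hash of the database name, whereas the real replicator has its own partitioning rules (the pipeline diagram mentions a shard.list file), and `channel_for_shard` is a hypothetical name.

```python
import zlib


def channel_for_shard(shard_id, channels):
    """Map a shard (database name) to one of `channels` apply channels.

    zlib.crc32 is used instead of hash() so that the result is stable
    across Python processes. The hashing scheme itself is an assumption.
    """
    return zlib.crc32(shard_id.encode("utf-8")) % channels


def channels_in_use(databases, channels):
    """Set of channels that receive work for the given databases."""
    return {channel_for_shard(db, channels) for db in databases}


if __name__ == "__main__":
    many = ["db%d" % i for i in range(1, 31)]   # e.g. 30 sysbench databases
    print(len(channels_in_use(many, 10)), "channels busy with 30 databases")
    print(len(channels_in_use(["onlydb"], 10)), "channel busy with 1 database")
```

All transactions of one database must stay on one channel to preserve their order, so a workload that lives in a single schema is applied serially no matter how many channels are configured.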

Parallel Replication test

[Diagram: the master's binary logs feed a MySQL slave (STOPPED) and a Tungsten slave (OFFLINE); replicator alpha, direct: alpha (slave)]

Concurrent sysbench on 30 databases, running for 1 hour
TOTAL DATA: 130 GB
RAM per server: 20 GB
Slaves will have 1 hour lag

Measuring results

[Diagram: same setup, with the MySQL slave started (START) and the Tungsten slave ONLINE]

Recording catch-up time

MySQL native replication
slave catch up in 04:29:30

Tungsten parallel replication
slave catch up in 00:55:40
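
The two catch-up times are easier to compare as a ratio. The short calculation below uses only the figures from the slides; the helper name is arbitrary.

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


native = to_seconds("04:29:30")     # MySQL native replication: 16170 s
parallel = to_seconds("00:55:40")   # Tungsten parallel replication: 3340 s
speedup = native / parallel

print("native: %d s, parallel: %d s, speedup: %.2fx" % (native, parallel, speedup))
```

In this test the parallel slave caught up roughly 4.8 times faster than the native one. The figure applies to this specific workload (30 databases under concurrent sysbench load) and should not be read as a general guarantee.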

Parallel replication made simpler

[Diagrams: "FROM HERE ..." and "TO HERE"]

Parallel replication direct slave facts
✓ No need to install Tungsten on the master
✓ Tungsten runs only on the slave
✓ Replication can revert to native slave with two commands (trepctl offline; start slave)
✓ Native replication can continue on other slaves
❖ Failover (either native or Tungsten) becomes a manual task

Installing parallel replication
• MORE_OPTIONS='--channels=10'
• ./cookbook/install_master_slave

Checking parallel replication

trepctl status

trepctl status -name tasks

trepctl status -name shards

trepctl status -name stores

Parallel replication demo

Troubleshooting

Identify the Failed Component
• Steps
  1. trepctl services
  2. trepctl -service SVC_NAME status
  3. look at the logs
  4. Take action

Reading the logs

ls $TUNGSTEN_BASE/tungsten/tungsten-replicator/logs/
trepsvc.log  user.log

...or ./cookbook/show_log

# let's see it in practice
©Continuent 2013
Parting thoughts
184
184Monday, April 22, 13
©Continuent 2013
Open source Tungsten Replicator now includes Oracle-to-MySQL and Oracle-to-Oracle extractors and appliers!
185
185Monday, April 22, 13
©Continuent 2012 186
Continuent Website:http://www.continuent.com
Tungsten Replicator 2.0:http://code.google.com/p/tungsten-replicator
Our Blogs:http://scale-out-blog.blogspot.comhttp://datacharmer.blogspot.comhttp://flyingclusters.blogspot.com
560 S. Winchester Blvd., Suite 500 San Jose, CA 95128 Tel +1 (866) 998-3642 Fax +1 (408) 668-1009e-mail: [email protected]
186Monday, April 22, 13