Fine-tuning Group Replication for Performance

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.


FOSDEM, 4th of February 2017

Vítor Oliveira ([email protected]), Senior Performance Engineer

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


Summary

In this session, I will present Group Replication from the perspective of a performance optimizer.

I will show the main moving parts and how the available options can be used to tune its behaviour, and present a few significant benchmark results.

For a performance evaluation of Group Replication please visit: http://mysqlhighavailability.com/performance-evaluation-mysql-5-7-group-replication/

Contents

1. Anatomy of Group Replication

2. Performance enhancement options

3. Flow-control considerations

4. A bit more benchmarking

5. Conclusion

1. Anatomy of Group Replication (from a performance perspective)

Transaction Hook (writer side)

[Flowchart: Begin, execute transaction (until prepare). Needs to throttle? If yes, delay until the next flow-control period. Collect writeset information. Send message for ordering by GCS (Group Communication). Wait for the result from the certification thread (Certification returns the certification result). Certification positive? If yes, commit; if no, rollback.]
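The writer-side steps in the Transaction Hook diagram can be sketched in a few lines of Python. This is only an illustration of the control flow: every callable passed in (`collect_writeset`, `needs_throttle`, `delay_until_next_period`, `send_for_ordering`, `wait_certification`) is a hypothetical stand-in for plugin internals, not a real API.

```python
from typing import Callable, FrozenSet


def transaction_hook(
    collect_writeset: Callable[[], FrozenSet[str]],
    needs_throttle: Callable[[], bool],
    delay_until_next_period: Callable[[], None],
    send_for_ordering: Callable[[FrozenSet[str]], int],
    wait_certification: Callable[[int], bool],
) -> str:
    """Steps taken once the transaction has executed up to prepare.

    All callables are hypothetical stand-ins. Returns "COMMIT" or "ROLLBACK".
    """
    # Flow control: a throttled writer waits for the next flow-control period.
    if needs_throttle():
        delay_until_next_period()

    # Gather the identifiers of the rows changed by this transaction.
    writeset = collect_writeset()

    # Hand the message to group communication so it gets a total order.
    ticket = send_for_ordering(writeset)

    # Block until the certifier thread reports the outcome for this ticket.
    certified = wait_certification(ticket)

    # Positive certification commits; a conflict forces a rollback.
    return "COMMIT" if certified else "ROLLBACK"
```

The user thread is blocked from the send until the certification result comes back, which is where the network round trip and the certifier queue show up in client latency.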

Certifier Loop (certifier side)

[Flowchart: Start. Quit? If yes, end. Otherwise get the next transaction from the ordered transaction queue (filled by the GCS receive thread) and certify it. Is the transaction local? If yes, send the certification result to the user thread (signal release); remote transactions have their events added to the relay log for the applier.]
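A minimal sketch of the Certifier Loop, to make the single-threaded structure concrete. The loop shape follows the diagram; the certification rule itself is an assumption (a simplified "first committer wins" check on row identifiers), and the `Transaction` fields, `results` and `relay_log` are hypothetical names chosen for the example.

```python
import queue
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass
class Transaction:
    origin: str                # member that executed the transaction
    writeset: FrozenSet[str]   # identifiers of the rows it changed
    snapshot: int              # last certified sequence number it had seen


def certifier_loop(ordered_queue: "queue.Queue", local_member: str,
                   results: List[bool], relay_log: List[Transaction]) -> None:
    """Single-threaded loop; a None item in the queue is the quit signal."""
    last_writer: Dict[str, int] = {}   # row id -> sequence of last certified write
    sequence = 0
    while True:
        trx = ordered_queue.get()      # transactions arrive in the agreed order
        if trx is None:                # Quit? -> End
            return
        # Assumed rule: conflict if a row was changed by a transaction
        # that this one had not yet seen when it executed.
        ok = all(last_writer.get(row, 0) <= trx.snapshot for row in trx.writeset)
        if ok:
            sequence += 1
            for row in trx.writeset:
                last_writer[row] = sequence
        if trx.origin == local_member:
            results.append(ok)         # signal the waiting user thread
        elif ok:
            relay_log.append(trx)      # remote transaction goes to the applier
```

Because one thread both certifies and writes to the relay log, every transaction in the group passes through this loop on every member.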

Main factors affecting performance:

• Network bandwidth and latency
To agree on a specific order of transactions, Group Replication needs to send them to all members of the group and wait for the majority to respond, which consumes network bandwidth and adds at least one network RTT to the transaction latency.

• Certification throughput
Transactions are certified in the agreed message delivery order and sent to the relay log by a single thread. That can become a contention point if the certification rate is high or the storage system that holds the relay log is slow.

• Remote transaction applier throughput
Remote transactions can be applied by the single-threaded or the multi-threaded applier, but parallelism should be properly exploited so that the non-writer members can keep up with the writers.
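To see why the round trip matters, a back-of-the-envelope sketch helps. It assumes that each client thread runs one transaction at a time (a closed loop) and that a transaction's latency is simply its execution time plus one network round trip, so throughput is bounded by clients divided by latency. The function names and the figures in the usage note are illustrative assumptions, not measurements from this talk.

```python
import math


def max_tps(clients: int, exec_ms: float, rtt_ms: float) -> float:
    """Upper bound on throughput when each client waits for its own commit."""
    latency_ms = exec_ms + rtt_ms
    return clients * 1000.0 / latency_ms


def clients_needed(target_tps: float, exec_ms: float, rtt_ms: float) -> int:
    """Concurrent clients required to reach target_tps under the same model."""
    return math.ceil(target_tps * (exec_ms + rtt_ms) / 1000.0)
```

Under these assumptions, reaching 10 000 TPS with 3 ms of execution time takes about 100 clients at a 7 ms round trip and about 530 clients at 50 ms.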


2. Performance enhancement options

Network bandwidth and latency:

1. Use high bandwidth and low-latency networks

– If needed, hide latencies by using many concurrent connections

2. Options to reduce the bandwidth required

– group_replication_compression_threshold=<size: 100>

– binlog_row_image=MINIMAL

3. Reduce latency with busy waiting before sleeping

– group_replication_poll_spin_loops=<num_spins>
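The third option trades CPU for latency: a waiting thread polls a few times before it goes to sleep, so a result that arrives quickly is picked up without paying for a sleep and wake-up. The sketch below illustrates that general spin-then-block pattern in Python. `wait_with_spin` and its parameters are hypothetical, and how the plugin applies group_replication_poll_spin_loops internally is not described in these slides.

```python
import threading


def wait_with_spin(event: threading.Event, spin_loops: int,
                   timeout: float = 1.0) -> str:
    """Poll `event` up to `spin_loops` times before a blocking wait.

    Returns "spin" if the event was seen while polling, "sleep" if the
    blocking wait was needed, and "timeout" if the event never fired.
    """
    for _ in range(spin_loops):
        if event.is_set():          # cheap check, no sleep involved
            return "spin"
    # Fall back to a regular blocking wait once the spin budget is used up.
    return "sleep" if event.wait(timeout) else "timeout"
```

A larger spin count only pays off when results usually arrive within the spinning window; otherwise it just burns CPU.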


Certifier throughput:


1. Use high-performance storage for the relay log

– Improve the disk bandwidth as each event is sent by the certifier thread to the relay log while also being read by the applier.

2. Use a fast temporary directory

– Transaction writesets are extracted using the IO_CACHE, which may need to spill over to disk on large transactions.

– tmpdir=/dev/shm/...

3. Reduce certification complexity in multi-master

– group_replication_gtid_assignment_block_size=<size: 10M>

Applier throughput:


1. Apply remote transactions with the LOGICAL_CLOCK scheduler and enough worker thread parallelism to keep up with the writers:

– slave-parallel-type=LOGICAL_CLOCK

– slave-parallel-workers=8/16/+

2. Take advantage of writeset-based transaction dependency tracking


Applier throughput:

1. When using high-performance storage (RAMDISK) and a reduced number of clients, the throughput of the slave applier based on binary log group commit is reduced.

2. Writeset-based transaction dependency tracking allows Group Replication to reach higher slave throughput with fewer client threads.
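To illustrate why writeset-based tracking helps with few clients: when dependencies come from binary log group commit, transactions committed one after another by a small number of clients form small groups and leave the applier little to run in parallel. With writesets, a transaction only has to wait for the last earlier transaction that changed one of the same rows. The sketch below is a simplified model, assuming each writeset is a set of row identifiers and an unbounded history. The function name and the row labels are made up for the example.

```python
from typing import Dict, FrozenSet, List, Tuple


def writeset_dependencies(
    writesets: List[FrozenSet[str]],
) -> List[Tuple[int, int]]:
    """Return (last_committed, sequence_number) per transaction, 1-based.

    last_committed is the sequence number of the most recent earlier
    transaction that wrote any of the same rows, or 0 if there is none.
    """
    last_writer: Dict[str, int] = {}   # row id -> sequence of last writer
    result = []
    for seq, ws in enumerate(writesets, start=1):
        last_committed = max((last_writer.get(row, 0) for row in ws), default=0)
        result.append((last_committed, seq))
        for row in ws:
            last_writer[row] = seq
    return result
```

For four transactions writing rows {a}, {b}, {a, c} and {d}, only the third depends on an earlier one, so the other three could be applied in parallel even if a single client committed them serially.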

3. Flow-control considerations

Flow-control goals:

• Allow new members to join the group when writing intensively
Nodes entering the group need to catch up on previous work while also storing current work to apply later. This is more demanding than just applying remote transactions, so the cluster may need to run at a lower speed for new members to catch up.

• Reduce the number of transaction aborts in multi-master
Rows changed by transactions still waiting in the relay log cannot be used by new transactions; otherwise the new transaction will be forced to fail at certification time.

• Keep members closer for faster failover
Failing over to an up-to-date member is faster as the backlog to apply is smaller.

• Keep members closer for reduced replication lag
Applications using one writer and multiple readers will see more up-to-date information when reading from members other than the writer.

Flow-control approach:

• Writer decision
The writers throttle themselves when they see a large queue on the other members, so the delayed members do not have to spend extra time dealing with it.

• Coarse grain
Flow-control does not micro-manage the synchrony between nodes; it only expects to correct course over the long run.

• Can be disabled as needed
Flow-control can be disabled to address situations less fit for it, in which case it works just like asynchronous replication.

• Options to use
--flow-control-mode=QUOTA* | DISABLED
--flow-control-certifier/applier-threshold=25000*

Flow-control Loop (flow-control main loop)

[Flowchart: Start. Quit? If yes, end. Otherwise find members with excessive queueing, using the member execution stats gathered by the stats messages receiver, and send a stats message to the group. Are all members ok? If no, set quota = min. capacity / number of writers (minus 10%, up to 5% of thresholds). If yes and throttling is active, release throttling gradually (50% increase per step). Wait one second and release transactions, then repeat.]
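The quota rule in the flow-control loop can be written out as a small function. This is a sketch under stated assumptions rather than the plugin's code: "up to 5% of thresholds" is read here as a lower bound on the capacity used, `release_limit` is a hypothetical cut-off above which throttling is switched off, and all names are invented for the example. Integer arithmetic is used to keep the results exact.

```python
from typing import List, Optional


def next_quota(current: Optional[int], capacities: List[int],
               queues: List[int], threshold: int, writers: int,
               release_limit: int) -> Optional[int]:
    """Quota for the next one-second period; None means no throttling."""
    if any(q > threshold for q in queues):
        # Some member is queueing too much: share the slowest member's
        # capacity among the writers, minus 10% so that queues can drain.
        floor = threshold // 20                    # 5% of the threshold (assumed floor)
        capacity = max(min(capacities), floor)
        return max(1, capacity * 9 // (10 * writers))
    if current is None:
        return None                                # nothing to release
    released = current * 3 // 2                    # 50% increase per step
    return None if released >= release_limit else released
```

With capacities of 4000, 2000 and 3000 transactions per period, two writers and one member above the 25000 threshold, each writer gets a quota of 900. Once all members are ok again the quota grows to 1350, then 2025, and so on until it passes the release limit.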

Flow-control throughput effects

[Chart: Throughput varying flow-control, Sysbench OLTP RW (9 members, group size = 9). X axis: number of client threads (8, 16, 32, 64, 128, 256). Y axis: total transactions per second (TPS), 0 to 15 000. Series: flow-control disabled; default threshold (25000); small threshold (1000).]

4. A bit more benchmarking (using Sysbench OLTP RW)

Single-master throughput and latency:

[Chart: Sysbench OLTP RW on a 9 member group (group size = 9). X axis: number of clients/threads (8, 16, 32, 64, 128, 256). Left Y axis: maximum sustained throughput in transactions per second, 0 to 15 000 (higher is better). Right Y axis: client latency in ms, 0 to 45 (lower is better). Series: Asynchronous (sustained throughput and latency); Group Replication (sustained throughput and latency).]

Multi-master maximum throughput:

[Chart: Single- and multi-master scalability, Sysbench RW (non-durable settings). X axis: number of group members (3, 5, 7, 9). Y axis: maximum sustained throughput (TPS), 0 to 15 000. Series: Asynchronous, one writer; Group Replication with one writer, two writers and three writers.]

Effects of network round-trip latency

[Chart: Effects of network round-trip latency, Sysbench RW (three members, high-latency networks). X axis: number of client threads (8, 16, 32, 64, 128, 256, 512, 1024). Y axis: throughput (TPS), 0 to 15 000. Series: 10Gbps LAN; 1 node @ 7ms, 200Mbps; 1 node @ 50ms, 200Mbps.]

Stability over time

[Chart: Peak single-master throughput over time, Sysbench OLTP RW (durable settings, 64 clients, 9 members). X axis: time (sec), 0 to 240. Y axis: average throughput in transactions per second, 0 to 10 000 (higher is better). Series: Asynchronous; Group Replication.]

5. Conclusion

• For performance, one needs to focus on three areas: group communication, certifier and applier throughput.

• Group Replication has high performance out of the box:

– High throughput and low latency compared with asynchronous replication;

– Scalable to a significant number of members and client threads;

– Optimized for low-latency networks, but can already withstand high-latency networks well.

• Group Replication reached GA status very recently, so the fun is just beginning...

Thank you.

Any questions?

• Documentation
– http://dev.mysql.com/doc/refman/5.7/en/group-replication.html

• Performance blog posts related to Group Replication:
– http://mysqlhighavailability.com/category/performance/
