MySQL, Containers, & Ceph
Red Hat Summit, 2016
WHOIS
Kyle Bader, Senior Solutions Architect, Red Hat
Yves Trudeau, Principal Architect, Percona
OPENSTACK CINDER DRIVER TRENDS
[Chart: percentage of OpenStack deployments using Ceph RBD vs. LVM for Cinder, November 2014 to April 2016]
OPENSTACK APP FRAMEWORK TRENDS
[Chart: application framework usage (LAMP, Java, MEAN, WISA, Rails, Other), October 2015 vs. April 2016]
• Shared, elastic storage pool
• Dynamic DB placement
• Flexible volume resizing
• Live instance migration
• Backup to object pool
• Read replicas via copy-on-write snapshots (see the sketch below)
MYSQL ON CEPH STORAGE: CLOUDOPS EFFICIENCY
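One way to picture the read-replica bullet above is with the python-rbd bindings: snapshot the primary's data volume, protect the snapshot, then create a copy-on-write clone for the replica to attach. This is only a minimal sketch; the pool, image, and snapshot names are made up for illustration, and the parent must be a format 2 image with layering enabled.

```python
import rados
import rbd

# Connect to the cluster (conffile path and pool name are illustrative).
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

# Snapshot and protect the primary's volume (clones require a protected snapshot).
primary = rbd.Image(ioctx, 'mysql-primary')
primary.create_snap('replica-base')
primary.protect_snap('replica-base')
primary.close()

# Copy-on-write clone that a read-replica instance can be attached to.
rbd.RBD().clone(ioctx, 'mysql-primary', 'replica-base', ioctx, 'mysql-replica-1')

ioctx.close()
cluster.shutdown()
```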
MYSQL-ON-CEPH PRIVATE CLOUD: FIDELITY TO A MYSQL-ON-AWS EXPERIENCE
• Hybrid cloud requires public/private cloud commonalities
• Developers want DevOps consistency
• Elastic block storage, Ceph RBD vs. AWS EBS
• Elastic object storage, Ceph RGW vs. AWS S3
• Users want deterministic performance
HEAD-TO-HEAD PERFORMANCE
30 IOPS/GB: AWS EBS P-IOPS TARGET
HEAD-TO-HEAD LAB TEST ENVIRONMENTS
• EC2 r3.2xlarge and m4.4xlarge
• EBS Provisioned IOPS and GP-SSD
• Percona Server
• Supermicro servers
• Red Hat Ceph Storage RBD
• Percona Server
OSD Storage Server Systems: 5x SuperStorage SSG-6028R-OSDXXX
• Dual Intel Xeon E5-2650 v3 (10 cores each)
• 32GB DDR3 SDRAM
• 2x 80GB boot drives
• 4x 800GB Intel DC P3700 (hot-swap U.2 NVMe)
• 1x dual-port 10GbE network adaptor (AOC-STGN-i2S)
• 8x Seagate 6TB 7200 RPM SAS (unused in this lab)
• Mellanox 40GbE network adaptor (unused in this lab)
MySQL Client Systems: 12x SuperServer 2UTwin2 nodes
• Dual Intel Xeon E5-2670 v2 (cpuset limited to 8 or 16 vCPUs)
• 64GB DDR3 SDRAM
Storage Server Software: Red Hat Ceph Storage 1.3.2, Red Hat Enterprise Linux 7.2, Percona Server
5x OSD Nodes 12x Client Nodes
Shared 10G SFP+ Networking
Monitor Nodes
SUPERMICRO CEPH LAB ENVIRONMENT
SYSBENCH BASELINE ON AWS EC2 + EBS
              P-IOPS m4.4xl   P-IOPS r3.2xl   GP-SSD r3.2xl
100% Read     7996            7956            950
100% Write    1680            1687            267
SYSBENCH REQUESTS PER MYSQL INSTANCE
                                           100% Read   100% Write   70/30 RW
P-IOPS m4.4xl                              7996        1680         -
Ceph cluster, 1x "m4.4xl" (14% capacity)   67144       5677         20053
Ceph cluster, 6x "m4.4xl" (87% capacity)   40031       1258         4752
CONVERTING SYSBENCH REQUESTS TO IOPS: READ PATH
SYSBENCH READ
• X% served from the InnoDB buffer pool
• IOPS = (read requests - X%)
CONVERTING SYSBENCH REQUESTS TO IOPS: WRITE PATH
SYSBENCH WRITE
• 1x read: X% served from the InnoDB buffer pool, IOPS = (read requests - X%)
• 1x write: log + doublewrite buffer, IOPS = (write requests * 2.3)
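The conversion can be written down as a couple of one-line functions. The sketch below just restates the slide's formulas in Python; the request counts and the buffer-pool hit rate in the example are illustrative assumptions, not measured values.

```python
# Sysbench requests -> backend IOPS, per the formulas above.

def read_iops(read_requests, buffer_pool_hit_rate):
    """Only reads that miss the InnoDB buffer pool reach storage."""
    return read_requests * (1.0 - buffer_pool_hit_rate)

def write_iops(write_requests, amplification=2.3):
    """Each sysbench write costs ~2.3 storage writes (log + doublewrite buffer)."""
    return write_requests * amplification

# Illustrative numbers only (assumed 30% buffer-pool hit rate).
print(read_iops(read_requests=40000, buffer_pool_hit_rate=0.30))  # 28000.0
print(write_iops(write_requests=1680))                            # 3864.0
```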
AWS IOPS/GB BASELINE: ~ AS ADVERTISED!
              P-IOPS m4.4xl   P-IOPS r3.2xl   GP-SSD r3.2xl
100% Read     30.0            29.8            3.6
100% Write    25.6            25.7            4.1
IOPS/GB PER MYSQL INSTANCE
                                           MySQL IOPS/GB Reads   MySQL IOPS/GB Writes
P-IOPS m4.4xl                              30                    26
Ceph cluster, 1x "m4.4xl" (14% capacity)   252                   78
Ceph cluster, 6x "m4.4xl" (87% capacity)   150                   19
FOCUSING ON WRITE IOPS/GB: AWS THROTTLE WATERMARK FOR DETERMINISTIC PERFORMANCE
Write IOPS/GB per MySQL instance:
P-IOPS m4.4xl: 26
Ceph cluster, 1x "m4.4xl" (14% capacity): 78
Ceph cluster, 6x "m4.4xl" (87% capacity): 19
A NOTE ON WRITE AMPLIFICATION: MYSQL ON CEPH WRITE PATH
MYSQL INSERT
→ InnoDB doublewrite buffer (x2)
→ Ceph replication (x2)
→ OSD journaling (x2)
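Multiplying out the three 2x stages above gives the total amplification per logical insert. A tiny sketch of the arithmetic; the single-insert baseline is just for illustration, and it assumes the 2x Ceph replication shown on the slide:

```python
# One MySQL insert, amplified by each layer in the write path.
doublewrite = 2   # InnoDB doublewrite buffer
replication = 2   # Ceph replica count assumed here, per the slide
journaling  = 2   # OSD journal write + data write

physical_writes = 1 * doublewrite * replication * journaling
print(physical_writes)  # -> 8 physical writes per logical insert
```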
EFFECT OF CEPH CLUSTER LOADING ON IOPS/GB
                              100% Write   70/30 RW
Ceph cluster (14% capacity)   78           134
Ceph cluster (36% capacity)   37           72
Ceph cluster (72% capacity)   25           37
Ceph cluster (87% capacity)   19           36
(values in IOPS/GB)
CONSIDERING CORE-TO-FLASH RATIO
IOPS/GB                                          100% Write   70/30 RW
Ceph cluster, 80 cores, 8 NVMe (87% capacity)    18           34
Ceph cluster, 40 cores, 4 NVMe (87% capacity)    18           34
Ceph cluster, 80 cores, 4 NVMe (87% capacity)    19           36
Ceph cluster, 80 cores, 12 NVMe (84% capacity)   6            8
HEAD-TO-HEAD PERFORMANCE
30 IOPS/GB: AWS EBS P-IOPS TARGET
25 IOPS/GB: CEPH 72% CLUSTER CAPACITY (WRITES)
78 IOPS/GB: CEPH 14% CLUSTER CAPACITY (WRITES)
HEAD-TO-HEAD PRICE/PERFORMANCE
$2.50: TARGET AWS EBS P-IOPS STORAGE PER IOP
IOPS/GB ON VARIOUS CONFIGS
IOPS/GB (Sysbench Write):
AWS EBS Provisioned-IOPS: 31
Ceph on Supermicro FatTwin, 72% capacity: 18
Ceph on Supermicro MicroCloud, 87% capacity: 18
Ceph on Supermicro MicroCloud, 14% capacity: 78
$/STORAGE-IOP ON THE SAME CONFIGS
Storage $/IOP (Sysbench Write):
AWS EBS Provisioned-IOPS: $2.40
Ceph on Supermicro FatTwin, 72% capacity: $0.80
Ceph on Supermicro MicroCloud, 87% capacity: $0.78
Ceph on Supermicro MicroCloud, 14% capacity: $1.06
HEAD-TO-HEAD PRICE/PERFORMANCE
$2.50: TARGET AWS P-IOPS $/IOP (EBS ONLY)
$0.78: CEPH ON SUPERMICRO MICRO CLOUD CLUSTER
TUNING CEPH BLOCK
TUNING CEPH BLOCK
• Format
• Order
• Fancy Striping
• TCP_NODELAY
RBD FORMAT
• Format 1
  • Deprecated
  • Supported by all versions of Ceph
  • No reason to use it in a greenfield environment
• Format 2
  • New, default format
  • Supports snapshots and clones
RBD ORDER
• The chunk / striping boundary for the block device
• Default is 4MB, i.e. order 22 (4MB = 2^22 bytes)
• Used the default during our testing
RBD: FANCY STRIPING
• Only available to QEMU / librbd
• Finer striping to parallelize small writes across order-sized objects
• Helps with some HDD workloads
• Used default during our testing
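For reference, a minimal python-rbd sketch of creating an image with these defaults: format 2, order 22 (2^22-byte = 4MB objects). The pool name, image name, and size are assumptions, and the commented-out stripe parameters are where fancy striping would be set if it were used.

```python
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')  # illustrative conffile path
cluster.connect()
ioctx = cluster.open_ioctx('rbd')                      # illustrative pool name

try:
    # Format 2 image with the default order of 22 (4MB objects), as used in testing.
    rbd.RBD().create(ioctx, 'mysql-data', 200 * 1024**3, order=22, old_format=False)
    # Fancy striping (librbd/QEMU only) would add e.g.:
    #   stripe_unit=65536, stripe_count=16
finally:
    ioctx.close()
    cluster.shutdown()
```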
TCP_NODELAY
• Disables Nagle's algorithm (coalescing of small packets)
• Important for latency-sensitive workloads
• Good for maximizing IOPS for MySQL
• Default in QEMU
• Default in KRBD
  • Added in mainline kernel 4.2
  • Backported to RHEL 7.2 (kernel 3.10-236+)
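At the OS level this is the standard TCP_NODELAY socket option (Ceph exposes it through its messenger configuration, typically ms_tcp_nodelay, which defaults to on). A minimal Python illustration of what the flag itself does:

```python
import socket

# With TCP_NODELAY set, small writes go onto the wire immediately instead of
# being held back by Nagle's algorithm -- the behaviour low-latency, IOPS-heavy
# workloads like MySQL depend on.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))  # -> 1 (enabled)
sock.close()
```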
TUNING QEMU: BLOCK VIRTUALIZATION
TUNING QEMU BLOCK
• Paravirtual Devices
• AIO Mode
• Caching
• x-data-plane
• num_queues
QEMU: PARAVIRTUAL DEVICES
• Virtio-blk
• Virtio-scsi
QEMU: AIO MODE
• Threads
  • Software implementation of AIO using a thread pool
• Native
  • Uses kernel AIO
  • The way to go in the future
QEMU: CACHING
                       Writeback   None      Writethrough   Directsync
Uses host page cache   Yes         No        Yes            No
Guest disk WCE         Enabled     Enabled   Disabled       Disabled
rbd_cache              True        False     True           False
rbd_max_dirty          25165824    0         0              0
BENCHMARKS
BENCHMARKS
• Sysbench OLTP, 32 tables of 28M rows each, ~200GB
• MySQL config: 50GB buffer pool, 8MB log file size, ACID
• Filesystem: XFS with noatime, nodiratime, nobarrier
• Data reloaded before each test
• 100% reads: --oltp-point-select=100
• 100% writes: --oltp-index-updates=100
• 70%/30% reads/writes: --oltp-index-updates=28 --oltp-point-select=70 --rand-type=uniform
• 20 minute run time per test, iterations averaged
• 64 threads, 8 cores
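Putting the bullets above together, a hedged sketch of how the three workload mixes could be driven. The flags for the mixes are the ones listed on the slide; the table-count/size, runtime, thread, and MySQL connection options are assumptions in the style of sysbench 0.5 and may need adjusting for other versions.

```python
import subprocess

# Common options: 32 tables x 28M rows, 64 threads, 20-minute runs (slide values);
# the script path, host, and credentials below are placeholders.
COMMON = [
    "sysbench", "--test=/usr/share/doc/sysbench/tests/db/oltp.lua",
    "--oltp-tables-count=32", "--oltp-table-size=28000000",
    "--num-threads=64", "--max-time=1200", "--max-requests=0",
    "--mysql-host=mysql-under-test", "--mysql-user=sbtest", "--mysql-password=secret",
]

# The three mixes exactly as listed above.
WORKLOADS = {
    "100% reads":         ["--oltp-point-select=100"],
    "100% writes":        ["--oltp-index-updates=100"],
    "70/30 reads/writes": ["--oltp-index-updates=28", "--oltp-point-select=70",
                           "--rand-type=uniform"],
}

for name, flags in WORKLOADS.items():
    print("running:", name)
    subprocess.run(COMMON + flags + ["run"], check=True)
```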
BASIC QEMU PERFORMANCE
[Chart: IOPS for Reads, Writes, and 70/30 R/W across qemu tcg, qemu-kvm default, io=threads cache=none, io=native cache=none]
THREAD CACHING MODES
[Chart: IOPS for Reads, Writes, and 70/30 R/W across io=threads with cache=none, cache=writethrough, cache=writeback]
DEDICATED DISPATCH THREADS
[Chart: IOPS for Reads, Writes, and 70/30 R/W across io=native with cache=none, cache=directsync, cache=directsync iothread=1, cache=directsync iothread=2]
DATA PLANE AND VIRTIO-SCSI QUEUES
[Chart: IOPS for Reads, Writes, and 70/30 R/W across x-data-plane, virtio-scsi num-queues=4, virtio-scsi num-queues=2 vectors=3, virtio-scsi num-queues=4 vectors=5]
CONTAINERS AND METAL
[Chart: IOPS for Reads, Writes, and 70/30 R/W across bare metal (taskset -c 10-17), lxc (cgroup cpu 10-17), io=threads cache=none, io=native cache=none, virtio-scsi num-queues=2 vectors=3]
8x nodes in a 3U chassis. Model: SYS-5038MR-OSDXXXP
Per-node configuration:
• CPU: single Intel Xeon E5-2630 v4
• Memory: 32GB
• NVMe storage: single 800GB Intel P3700
• Networking: 1x dual-port 10G SFP+
1x CPU + 1x NVMe + 1x SFP+
SUPERMICRO MICRO CLOUD: CEPH MYSQL PERFORMANCE SKU
THANK YOU!