MySQL and Ceph
2 August 2016

Page 1

MySQL and Ceph
2 August 2016

Page 2

WHOIS

Brent Compton and Kyle Bader, Storage Solution Architectures, Red Hat

Yves Trudeau, Principal Architect, Percona

Page 3

AGENDA

MySQL on Ceph

•  Ceph Architecture
•  MySQL on Ceph RBD
•  Sample Benchmark Results
•  Hardware Selection Considerations

Page 4

Why MySQL on Ceph

Page 5

WHY MYSQL ON CEPH? MARKET DRIVERS

•  Ceph #1 block storage for OpenStack clouds

•  70% of apps on OpenStack use the LAMP stack

•  MySQL leading open-source RDBMS

•  Ceph leading open-source software-defined storage

Page 6

WHY MYSQL ON CEPH? EFFICIENCY DRIVERS

•  Shared, elastic storage pool on commodity servers

•  Dynamic DB placement

•  Flexible volume resizing

•  Live instance migration

•  Backup block pool to object pool

•  Read replicas via copy-on-write snapshots (see the sketch after this list)

•  … commonality with public cloud deployment models
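Several of these drivers map directly onto RBD operations. Below is a minimal sketch, assuming the python-rados/python-rbd bindings and hypothetical pool/image names ('mysql', 'mysql-data', 'mysql-replica-1'), of two of them: online volume resizing and a read replica built from a copy-on-write snapshot.

import rados
import rbd

# Connect to the cluster and open the (hypothetical) pool holding MySQL volumes.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('mysql')
    with rbd.Image(ioctx, 'mysql-data') as image:
        image.resize(20 * 1024 ** 3)        # grow the volume to 20 GiB, online
        image.create_snap('replica-base')   # point-in-time snapshot
        image.protect_snap('replica-base')  # cloning requires a protected snap
    # Copy-on-write clone: the replica shares all unmodified blocks with its
    # parent, so it appears in seconds regardless of the data set size.
    rbd.RBD().clone(ioctx, 'mysql-data', 'replica-base',
                    ioctx, 'mysql-replica-1',
                    features=rbd.RBD_FEATURE_LAYERING)
    ioctx.close()
finally:
    cluster.shutdown()

A MySQL instance pointed at the clone then runs catch-up replication from the primary, which is how snapshot-based read replicas are typically provisioned.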

Page 7

CEPH ARCHITECTURE

Page 8

ARCHITECTURAL COMPONENTS

RGW: A web services gateway for object storage, compatible with S3 and Swift

LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)

RADOS: A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors

RBD: A reliable, fully-distributed block device with cloud platform integration

CEPHFS: A distributed file system with POSIX semantics and scale-out metadata

Page 9

RADOS COMPONENTS

OSDs
•  10s to 10,000s in a cluster
•  Typically one per disk
•  Serve stored objects to clients
•  Intelligently peer for replication & recovery

Monitors
•  Maintain cluster membership and state
•  Provide consensus for distributed decision-making
•  Small, odd number
•  Do not serve stored objects to clients

Page 10

CEPH OSD

Page 11

RADOS CLUSTER

Page 12

WHERE DO OBJECTS LIVE?

??

Page 13

A METADATA SERVER?


Page 14

CALCULATED PLACEMENT

Page 15

EVEN BETTER: CRUSH

CLUSTER PLACEMENT GROUPS (PGs)

Page 16

CRUSH IS A QUICK CALCULATION

CLUSTER

Page 17

DYNAMIC DATA PLACEMENT

CRUSH:
•  Pseudo-random placement algorithm (toy sketch after this list)
   •  Fast calculation, no lookup
   •  Repeatable, deterministic
•  Statistically uniform distribution
•  Stable mapping
   •  Limited data migration on change
•  Rule-based configuration
   •  Infrastructure topology aware
   •  Adjustable replication
   •  Weighting
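To make "fast calculation, no lookup" concrete, here is a toy Python sketch of calculated placement. It is NOT the real CRUSH algorithm (CRUSH walks a weighted infrastructure tree defined by rules); it only illustrates the principle that object location is a pure function of the object name and the cluster map, so any client computes the same answer with no metadata server in the path.

import hashlib

def toy_placement(obj_name, pg_count, osds, replicas=3):
    """Hypothetical stand-in for CRUSH: name -> PG -> replica OSDs."""
    # Step 1: hash the object name onto a placement group (PG).
    pg = int(hashlib.md5(obj_name.encode()).hexdigest(), 16) % pg_count
    # Step 2: deterministically derive distinct OSDs for that PG.
    chosen, i = [], 0
    while len(chosen) < replicas:
        h = int(hashlib.md5(f'{pg}:{i}'.encode()).hexdigest(), 16)
        osd = osds[h % len(osds)]
        if osd not in chosen:
            chosen.append(osd)
        i += 1
    return pg, chosen

# Every client, given the same inputs, computes the same mapping:
print(toy_placement('rbd_data.1234.000000000000abc1', 128, list(range(12))))

The real CRUSH additionally achieves the "limited data migration on change" property above, which simple modulo hashing like this toy does not.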

Page 18

DATA IS ORGANIZED INTO POOLS

CLUSTER POOLS (CONTAINING PGs)

POOL A

POOL B

POOL C

POOL D

Page 19

ACCESS METHODS

Page 20

ARCHITECTURAL COMPONENTS


Page 21

ARCHITECTURAL COMPONENTS


Page 22

STORING VIRTUAL DISKS

RADOS CLUSTER

Page 23

STORING VIRTUAL DISKS

Page 24

STORING VIRTUAL DISKS

Page 25

PERCONA SERVER ON KRBD

RADOS CLUSTER

Page 26

TUNING MYSQL ON CEPH

Page 27

HEAD-TO-HEAD: MYSQL ON CEPH VS. AWS

[Bar chart: IOPS/GB, Sysbench write; scale 0-90]

•  AWS EBS Provisioned-IOPS: 31
•  Ceph on Supermicro FatTwin, 72% capacity: 18
•  Ceph on Supermicro MicroCloud, 87% capacity: 18
•  Ceph on Supermicro MicroCloud, 14% capacity: 78

Page 28

TUNING FOR HARMONY: OVERVIEW

Tuning MySQL
•  Buffer pool > 20%
•  Flush each Tx or batch? (see the sketch after this list)
•  Parallel doublewrite-buffer flush

Tuning Ceph
•  RHCS 1.3.2, tcmalloc 2.4
•  128M thread cache
•  Co-resident journals
•  2-4 OSDs per SSD
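The "flush each Tx or batch?" question refers to InnoDB's innodb_flush_log_at_trx_commit setting, whose effect is shown two slides ahead. A minimal sketch of switching policies at runtime via mysql-connector-python (connection credentials are placeholders; the same statements work from any MySQL client or in my.cnf):

import mysql.connector

conn = mysql.connector.connect(host='127.0.0.1', user='root', password='secret')
cur = conn.cursor()

# Per-Tx flush (value 1): fsync the redo log on every commit -- fully
# durable, but each commit costs a small synchronous write to Ceph.
# Batch flush (value 2): write per commit, fsync ~once per second -- a
# crash can lose up to ~1 s of commits, in exchange for far fewer syncs.
cur.execute("SET GLOBAL innodb_flush_log_at_trx_commit = 2")

# "Buffer pool > 20%" from the list above; online resize needs MySQL 5.7+.
cur.execute("SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024")

cur.close()
conn.close()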

Page 29

TUNING FOR HARMONY: SAMPLE EFFECT OF MYSQL BUFFER POOL ON TpmC

[Line chart: tpmC vs. time in seconds (1 data point per minute, ~8,000 s); y-axis 0 to 1,200,000]

64x MySQL Instances on Ceph cluster: each with 25x TPC-C Warehouses

Series: 1% Buffer Pool, 5% Buffer Pool, 25% Buffer Pool, 50% Buffer Pool, 75% Buffer Pool

Page 30

TUNING FOR HARMONY: SAMPLE EFFECT OF MYSQL Tx FLUSH ON TpmC

[Line chart: tpmC vs. time in seconds (1 data point per minute, ~8,000 s); y-axis 0 to 2,500,000]

64x MySQL Instances on Ceph cluster: each with 25x TPC-C Warehouses

Series: Batch Tx flush (1 sec) vs. Per Tx flush

Page 31

TUNING FOR HARMONY: CREATING A SEPARATE POOL TO SERVE IOPS WORKLOADS

Creating multiple pools in the CRUSH map

•  Distinct branch in OSD tree

•  Edit CRUSH map, add SSD rules

•  Create pool, set crush_ruleset to SSD rule (see the sketch after this list)

•  Add Volume Type to Cinder
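A sketch of the pool-creation steps using the python-rados monitor-command interface (the equivalent ceph CLI commands work just as well). It assumes the CRUSH map already contains an SSD rule with ruleset id 1; the pool name and ruleset id are hypothetical. The last bullet, adding a Cinder volume type that targets the new pool, is done on the OpenStack side.

import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

def mon_cmd(**kwargs):
    # Ceph monitor commands are JSON: {"prefix": "...", <args>}.
    ret, out, err = cluster.mon_command(json.dumps(kwargs), b'')
    if ret != 0:
        raise RuntimeError(err)
    return out

# Create the IOPS pool and point it at the (assumed) SSD CRUSH rule.
mon_cmd(prefix='osd pool create', pool='mysql-ssd', pg_num=128)
# Releases of this era (e.g. RHCS 1.3) call the setting crush_ruleset;
# post-Luminous Ceph renames it to crush_rule.
mon_cmd(prefix='osd pool set', pool='mysql-ssd', var='crush_ruleset', val='1')

cluster.shutdown()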

Page 32

TUNING FOR HARMONY: IF YOU MUST USE MAGNETIC MEDIA

Reducing seeks on magnetic pools

•  RBD cache is safe (see the sketch after this list)

•  RAID Controllers with write-back cache

•  SSD Journals

•  Software caches
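Of these, the RBD cache is the client-side one: librbd's write-back cache honors flush requests coming from the guest, so InnoDB's fsync-based crash consistency is preserved, which is why the slide can call it safe. A sketch of enabling it per client connection via python-rados (sizes are illustrative):

import rados

# rbd cache settings passed per-connection; equivalently set under
# [client] in ceph.conf. Values are strings in the conf dict.
cluster = rados.Rados(
    conffile='/etc/ceph/ceph.conf',
    conf={
        'rbd_cache': 'true',
        'rbd_cache_size': str(64 * 1024 * 1024),       # 64 MiB write-back cache
        'rbd_cache_max_dirty': str(32 * 1024 * 1024),  # writeback threshold
    },
)
cluster.connect()
# ... open images via rbd as usual; librbd in this process now caches.
cluster.shutdown()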

Page 33

HARDWARE SELECTION CONSIDERATIONS

Page 34

ARCHITECTURAL CONSIDERATIONS: UNDERSTANDING THE WORKLOAD

Traditional Ceph Workload
•  $/GB
•  PBs
•  Unstructured data
•  MB/sec

MySQL Ceph Workload
•  $/IOP
•  TBs
•  Structured data
•  IOPS

Page 35

ARCHITECTURAL CONSIDERATIONS: FUNDAMENTALLY DIFFERENT DESIGN

Traditional Ceph Workload
•  50-300+ TB per server
•  Magnetic Media (HDD)
•  Low CPU-core:OSD ratio
•  10GbE -> 40GbE

MySQL Ceph Workload
•  < 10 TB per server
•  Flash (SSD -> NVMe)
•  High CPU-core:OSD ratio
•  10GbE

Page 36

Ceph Test Drive: bit.ly/cephtestdrive

Percona Blog: https://www.percona.com/blog/2016/07/13/using-ceph-mysql/

Author: Yves Trudeau