
MySQL and Ceph, 2 August 2016

WHOIS

Brent Compton and Kyle Bader, Storage Solution Architectures, Red Hat

Yves Trudeau, Principal Architect, Percona

AGENDA

MySQL on Ceph

•  Ceph Architecture
•  MySQL on Ceph RBD
•  Sample Benchmark Results
•  Hardware Selection Considerations

WHY MYSQL ON CEPH?

•  Ceph is the #1 block storage for OpenStack clouds

•  70% of apps on OpenStack use the LAMP stack

•  MySQL is the leading open-source RDBMS

•  Ceph is the leading open-source software-defined storage

WHY MYSQL ON CEPH? MARKET DRIVERS

•  Shared, elastic storage pool on commodity servers

•  Dynamic DB placement

•  Flexible volume resizing

•  Live instance migration

•  Backup block pool to object pool

•  Read replicas via copy-on-write snapshots (see the sketch after this list)

•  … commonality with public cloud deployment models
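The snapshot-based read replicas above map to standard RBD operations. A hedged sketch, with illustrative pool ("mysql-vols") and image ("db01") names:

```python
# Hedged sketch: seeding a read replica from a copy-on-write RBD snapshot.
# Pool and image names are illustrative; cloning requires format-2 RBD images.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Freeze a point-in-time view of the primary's volume and protect it for cloning.
run(["rbd", "snap", "create", "mysql-vols/db01@replica-base"])
run(["rbd", "snap", "protect", "mysql-vols/db01@replica-base"])

# Each clone is a thin, copy-on-write child of the snapshot; attach it to a
# new MySQL instance to bring up a read replica without copying the data set.
run(["rbd", "clone", "mysql-vols/db01@replica-base", "mysql-vols/db01-replica1"])
```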

WHY MYSQL ON CEPH? EFFICIENCY DRIVERS

CEPH ARCHITECTURE

ARCHITECTURAL COMPONENTS

RGW: A web services gateway for object storage, compatible with S3 and Swift

LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP); example below

RADOS: A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors

RBD: A reliable, fully-distributed block device with cloud platform integration

CEPHFS: A distributed file system with POSIX semantics and scale-out metadata
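As a minimal sketch of the LIBRADOS path, assuming the python-rados bindings, a reachable cluster, and an existing pool named "mypool" (the pool and object names are illustrative):

```python
# Minimal librados sketch using the python-rados bindings.
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("mypool")      # I/O context bound to one pool
    try:
        ioctx.write_full("greeting", b"hello from librados")
        print(ioctx.read("greeting"))         # b'hello from librados'
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```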

RADOS COMPONENTS

OSDs
•  10s to 10,000s in a cluster
•  Typically one per disk
•  Serve stored objects to clients
•  Intelligently peer for replication & recovery

Monitors
•  Maintain cluster membership and state
•  Provide consensus for distributed decision-making
•  Small, odd number
•  Do not serve stored objects to clients

CEPH OSD

RADOS CLUSTER

WHERE DO OBJECTS LIVE?

A METADATA SERVER?

CALCULATED PLACEMENT

EVEN BETTER: CRUSH

PLACEMENT GROUPS (PGs)

CRUSH IS A QUICK CALCULATION

DYNAMIC DATA PLACEMENT

CRUSH:
•  Pseudo-random placement algorithm (toy sketch below)
   •  Fast calculation, no lookup
   •  Repeatable, deterministic
•  Statistically uniform distribution
•  Stable mapping
   •  Limited data migration on change
•  Rule-based configuration
   •  Infrastructure topology aware
   •  Adjustable replication
   •  Weighting
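To make "fast calculation, no lookup" concrete, here is a toy sketch of hash-based placement. It is not Ceph's CRUSH implementation (real CRUSH walks the weighted failure-domain hierarchy described by the CRUSH map), but it shows how any client can compute an object's location deterministically, without a metadata server:

```python
# Toy illustration of calculated placement: deterministic, table-free lookup.
# NOT the real CRUSH algorithm; PG count, OSD list, and replica count are assumptions.
import hashlib

PG_COUNT = 128                               # PGs in the pool (assumed)
OSDS = ["osd.%d" % i for i in range(12)]     # flat OSD list (assumed)
REPLICAS = 3

def object_to_pg(name):
    """Hash an object name to a placement group id."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % PG_COUNT

def pg_to_osds(pg_id, replicas=REPLICAS):
    """Deterministically choose 'replicas' distinct OSDs for a PG."""
    chosen, attempt = [], 0
    while len(chosen) < replicas:
        h = int(hashlib.md5(("%d:%d" % (pg_id, attempt)).encode()).hexdigest(), 16)
        osd = OSDS[h % len(OSDS)]
        if osd not in chosen:                # re-draw on collision, as CRUSH retries
            chosen.append(osd)
        attempt += 1
    return chosen

pg = object_to_pg("rbd_data.1234.000000000005")
print(pg, pg_to_osds(pg))                    # same inputs always give the same OSDs
```

Real CRUSH additionally keeps the mapping stable as the topology changes, which this modulo-based toy does not attempt.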

DATA IS ORGANIZED INTO POOLS

[Diagram: the cluster's placement groups organized into pools: POOL A, POOL B, POOL C, and POOL D. A pool-creation CLI sketch follows.]
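Pools are created administratively; a hedged sketch, with an illustrative pool name, PG count, and replica count:

```python
# Hedged sketch: creating a replicated pool for MySQL volumes via the ceph CLI.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run(["ceph", "osd", "pool", "create", "mysql-vols", "128"])      # 128 PGs
run(["ceph", "osd", "pool", "set", "mysql-vols", "size", "3"])   # 3x replication
run(["ceph", "osd", "lspools"])                                  # confirm the pool exists
```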

ACCESS METHODS


STORING VIRTUAL DISKS

RADOS CLUSTER


PERCONA SERVER ON KRBD

RADOS CLUSTER
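The Percona Server instances in this setup consume RBD images through the kernel RBD (krbd) driver. A hedged sketch of that path, with illustrative pool, image, and mount-point names (run as root; the MySQL service name varies by platform):

```python
# Hedged sketch: presenting an RBD image to Percona Server through krbd.
# Names and sizes are illustrative assumptions.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run(["rbd", "create", "mysql-vols/db01", "--size", "102400"])     # 100 GiB image
run(["rbd", "map", "mysql-vols/db01"])                            # exposes e.g. /dev/rbd0
run(["mkfs.xfs", "/dev/rbd0"])
run(["mount", "-o", "noatime", "/dev/rbd0", "/var/lib/mysql"])
run(["chown", "-R", "mysql:mysql", "/var/lib/mysql"])
run(["systemctl", "start", "mysql"])                              # service name varies
```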

TUNING MYSQL ON CEPH

HEAD-TO-HEAD: MYSQL ON CEPH VS. AWS

[Chart: IOPS/GB (Sysbench write). Series: AWS EBS Provisioned-IOPS; Ceph on Supermicro FatTwin at 72% capacity; Ceph on Supermicro MicroCloud at 87% capacity; Ceph on Supermicro MicroCloud at 14% capacity. Values shown on the chart: 31, 18, 18, and 78 IOPS/GB.]

TUNING FOR HARMONY: OVERVIEW

Tuning MySQL (my.cnf sketch below)
•  Buffer pool > 20%
•  Flush each Tx or batch?
•  Parallel doublewrite-buffer flush

Tuning Ceph
•  RHCS 1.3.2, tcmalloc 2.4
•  128M thread cache
•  Co-resident journals
•  2-4 OSDs per SSD
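A minimal sketch of the MySQL side of these knobs, assuming an illustrative 200 GB working set; the buffer-pool ratio and redo-flush mode are exactly what the two charts below sweep:

```python
# Hedged sketch: turning the tuning notes above into a my.cnf fragment.
# The data-set size and 25% ratio are illustrative assumptions.
dataset_gb = 200
buffer_pool_gb = max(1, int(dataset_gb * 0.25))

my_cnf = """[mysqld]
innodb_buffer_pool_size = {bp}G
# 1 = flush the redo log at every commit (fully durable, more write IOPS)
# 2 = flush roughly once per second (the "batch Tx flush" case charted below)
innodb_flush_log_at_trx_commit = 1
""".format(bp=buffer_pool_gb)
print(my_cnf)

# On the Ceph side, the 128M tcmalloc thread cache from the slide is commonly
# set through the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES environment variable
# in the OSD sysconfig/defaults file.
```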

TUNING FOR HARMONY: SAMPLE EFFECT OF MYSQL BUFFER POOL ON TpmC

[Chart: TpmC over time (0 to 8,000 seconds, one data point per minute; y-axis 0 to 1,200,000 tpmC). 64 MySQL instances on a Ceph cluster, each with 25 TPC-C warehouses. Series: 1%, 5%, 25%, 50%, and 75% buffer pool.]

TUNING FOR HARMONY: SAMPLE EFFECT OF MYSQL Tx FLUSH ON TpmC

[Chart: TpmC over time (0 to 8,000 seconds, one data point per minute; y-axis 0 to 2,500,000 tpmC). 64 MySQL instances on a Ceph cluster, each with 25 TPC-C warehouses. Series: batch Tx flush (1 sec) vs. per-Tx flush.]

TUNING FOR HARMONY: CREATING A SEPARATE POOL TO SERVE IOPS WORKLOADS

Creating multiple pools in the CRUSH map (CLI sketch after this list)

•  Distinct branch in OSD tree

•  Edit CRUSH map, add SSD rules

•  Create pool, set crush_ruleset to SSD rule

•  Add Volume Type to Cinder
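A hedged sketch of the CLI steps behind this list, using the pre-Luminous syntax contemporary with this deck. Rule, pool, and backend names are illustrative, and an SSD-only branch (here a root named "ssd") must already exist in the CRUSH map:

```python
# Hedged sketch: SSD CRUSH rule, pool bound to it, and a Cinder volume type.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Rule that places replicas only under the "ssd" root, separated by host
run(["ceph", "osd", "crush", "rule", "create-simple", "ssd-rule", "ssd", "host"])
run(["ceph", "osd", "pool", "create", "mysql-ssd", "128"])
# Use the ruleset id reported by `ceph osd crush rule dump ssd-rule`
run(["ceph", "osd", "pool", "set", "mysql-ssd", "crush_ruleset", "1"])
# Expose the pool to OpenStack as its own Cinder volume type
run(["cinder", "type-create", "ceph-ssd"])
run(["cinder", "type-key", "ceph-ssd", "set", "volume_backend_name=ceph-ssd"])
```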

TUNING FOR HARMONY: IF YOU MUST USE MAGNETIC MEDIA

Reducing seeks on magnetic pools

•  RBD cache is safe (it honors guest flush requests)

•  RAID Controllers with write-back cache

•  SSD Journals

•  Software caches

HARDWARE SELECTION CONSIDERATIONS

ARCHITECTURAL CONSIDERATIONS: UNDERSTANDING THE WORKLOAD

Traditional Ceph Workload
•  $/GB
•  PBs
•  Unstructured data
•  MB/sec

MySQL Ceph Workload
•  $/IOP
•  TBs
•  Structured data
•  IOPS

ARCHITECTURAL CONSIDERATIONS: FUNDAMENTALLY DIFFERENT DESIGN

Traditional Ceph Workload
•  50-300+ TB per server
•  Magnetic media (HDD)
•  Low CPU-core:OSD ratio
•  10GbE -> 40GbE

MySQL Ceph Workload
•  < 10 TB per server
•  Flash (SSD -> NVMe)
•  High CPU-core:OSD ratio
•  10GbE
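Sizing a flash-based MySQL pool follows directly from this comparison. A back-of-the-envelope sketch, with every device figure and overhead factor assumed for illustration rather than taken from the benchmark above:

```python
# Back-of-the-envelope sizing sketch; all numbers are assumptions.
servers = 8
ssds_per_server = 4
ssd_capacity_tb = 1.6
ssd_write_iops = 40_000       # assumed steady-state 4K random-write IOPS per SSD
replication = 3               # each client write becomes 3 back-end writes
journal_overhead = 2          # rough allowance for co-resident journal writes

raw_tb = servers * ssds_per_server * ssd_capacity_tb
usable_tb = raw_tb / replication
cluster_write_iops = (servers * ssds_per_server * ssd_write_iops
                      / (replication * journal_overhead))
iops_per_usable_gb = cluster_write_iops / (usable_tb * 1024)

print(f"usable capacity: {usable_tb:.1f} TB")
print(f"cluster write IOPS: {cluster_write_iops:,.0f}")
print(f"IOPS per usable GB: {iops_per_usable_gb:.1f}")
```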

Ceph Test Drive: bit.ly/cephtestdrive

Percona Blog: https://www.percona.com/blog/2016/07/13/using-ceph-mysql/

Author: Yves Trudeau
