Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flash Technology

Page 1: Title

RED HAT CEPH STORAGE ACCELERATION UTILIZING FLASH TECHNOLOGY
Applications and Ecosystem Solutions Development
Rick Stehno
Red Hat Storage Day - Dallas 2017

Page 2: Flash Acceleration for Applications

Three ways to accelerate application performance with flash:

• Utilize flash caching features to accelerate critical data. Caching methods can be write-back for writes, write-through for disk/cache transparency, read cache, etc. (see the caching sketch after this list)
• Utilize storage tiering capabilities. Performance-critical data resides on flash storage, colder data resides on HDD
• Utilize all-flash storage to accelerate performance when all application data is performance critical, or when the application does not provide the features or capabilities to cache or to migrate the data
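The deck does not prescribe a specific caching stack; as one concrete example of the write-back/write-through caching approach on Linux, a minimal lvmcache sketch is below (device names /dev/sdb for the HDD and /dev/nvme0n1 for the flash device are placeholders):

    # Assumed devices: /dev/sdb = HDD holding the data LV, /dev/nvme0n1 = flash cache device
    pvcreate /dev/sdb /dev/nvme0n1
    vgcreate vg_data /dev/sdb /dev/nvme0n1

    # Data LV on the HDD, cache pool on the flash device
    lvcreate -n lv_data -l 100%PVS vg_data /dev/sdb
    lvcreate --type cache-pool -n lv_cache -L 200G vg_data /dev/nvme0n1

    # Attach the cache pool in write-back mode; --cachemode writethrough keeps disk and cache transparent
    lvconvert --type cache --cachepool vg_data/lv_cache --cachemode writeback vg_data/lv_data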

Page 3: Ceph Software Defined Storage (SDS) Acceleration

Configurations:

• All-flash storage - Performance
  • Highest performance per node
  • Less maximum capacity per node
• Hybrid HDD and flash storage - Balanced
  • Balances performance, capacity and cost
  • Suitable when the application and workload allow:
    • Performance-critical data on flash
    • Host software caching or tiering on flash (see the cache-tier sketch after this list)
• All-HDD storage - Capacity
  • Maximum capacity per node, lowest cost
  • Lower performance per node
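For the hybrid configuration, Ceph itself can also keep hot data on flash with a cache tier in front of an HDD-backed pool; a minimal sketch is below (pool names cold-pool and hot-pool are placeholders, and the hit-set/target values are illustrative, not recommendations from the deck):

    # Assumed pools: cold-pool backed by HDD OSDs, hot-pool backed by flash OSDs
    ceph osd tier add cold-pool hot-pool
    ceph osd tier cache-mode hot-pool writeback
    ceph osd tier set-overlay cold-pool hot-pool

    # The cache tier needs hit-set and target-size settings before it will flush and evict
    ceph osd pool set hot-pool hit_set_type bloom
    ceph osd pool set hot-pool target_max_bytes 1099511627776   # 1 TB, illustrative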

Page 4: Storage - NVMe vs SATA SSD

Why a 1U server with 10 NVMe SSDs may be a better choice than a 2U server with 24 SATA SSDs:

– Higher performance in half the rack space
– 28% less power and cooling
– Higher MTBF inherent with reduced component count
– Reduced OSD recovery time per Ceph node
– Lower TCO

Page 5: All Flash Storage - NVMe vs SATA SSD cont’d

Why a 1U server with 10 NVMe SSDs may be a better choice than a 2U server with 24 SATA SSDs.

FIO benchmarks (1x represents the 24 SATA SSD baseline):

• 4.5x increase for 128k sequential reads
• 3.5x increase for 128k sequential writes
• 3.7x increase for 4k random reads
• 1.4x increase for 4k random 70/30 RR/RW
• Equal performance for 4k random writes
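The deck does not include the FIO job definitions; a raw-device job of the kind typically used for such device-level comparisons might look like the sketch below (device name, job count, queue depth, and runtime are assumptions):

    # 128k sequential read against a raw NVMe device; write tests are destructive to the device contents
    fio --name=seqread-128k \
        --filename=/dev/nvme0n1 \
        --ioengine=libaio --direct=1 \
        --rw=read --bs=128k \
        --numjobs=8 --iodepth=32 \
        --time_based --runtime=300 \
        --group_reporting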

Page 6: All Flash Storage - NVMe vs SATA SSD cont’d

Why a 1U server with 10 NVMe SSDs may be a better choice than a 2U server with 24 SATA SSDs.

Increasing the load to extend the NVMe advantage over and above the 128-thread SATA SSD test:

• 5.8x increase for random writes at 512 threads
• 3.1x increase for 70/30 RR/RW at 512 threads
• 4.2x increase for random reads at 790 threads
• 8.2x increase for sequential reads at 1264 threads

10 NVMe SSDs support higher workloads and more users.

Ceph RBD NVMe performance gains over SATA SSD (128k FIO RBD IOEngine benchmark):

    Workload            Gain at 128 threads   Gain at higher load
    Random Writes       3x                    5.8x (512 threads)
    70/30 RR/RW         1.4x                  3.1x (512 threads)
    Random Reads        1.0x                  4.2x (790 threads)
    Sequential Reads    1.3x                  8.2x (1264 threads)
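The RBD job file itself is not shown in the deck; a 128k random-write job using FIO's rbd ioengine might look like the sketch below (pool name, image name, client name, and queue depths are assumptions; the image must already exist):

    # 128k random write through librbd against an existing image rbd/fio-test
    fio --name=rbd-randwrite-128k \
        --ioengine=rbd --clientname=admin \
        --pool=rbd --rbdname=fio-test \
        --rw=randwrite --bs=128k \
        --numjobs=4 --iodepth=32 \
        --time_based --runtime=300 \
        --group_reporting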

Page 7: Ceph Storage Costs - Seagate SATA SSD vs. Seagate NVMe SSD

FIO RBD maximum-threads random-write performance for NVMe.

Price per MB/s = (retail cost of the SSDs) / (MB/s achieved in each test), for 128k random writes:

    SSD configuration      Total SSD price   $/MB/s @ 128 threads   $/MB/s @ 512 threads
    24 x SATA SSD 960GB    $7,896            $15.00                 -
    10 x NVMe 2TB          $10,990           $7.00                  $3.00

These prices do not include the savings in electrical/cooling costs and datacenter floor space that come from replacing the 24 SATA SSDs with 10 NVMe SSDs.

Note: in the 128k random-write FIO RBD benchmark, the SATA SSDs averaged 85% busy and the NVMe SSDs averaged 80% busy at 512 threads.
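As a rough sanity check on the formula (arithmetic implied by the table above, not throughput figures quoted in the deck): $7,896 / $15.00 per MB/s works out to roughly 525 MB/s aggregate for the SATA configuration at 128 threads, while $10,990 / $7.00 and $10,990 / $3.00 work out to roughly 1,570 MB/s and 3,660 MB/s for the NVMe configuration at 128 and 512 threads respectively.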

Page 8:

MySQL
• MySQL is the most popular and most widely used open-source database in the world
• MySQL is feature rich in the areas of performance, scalability and reliability
• Database users demand high OLTP performance - small random reads/writes

Ceph
• Most popular Software Defined Storage system
• Scalable
• Reliable

Does it make sense to implement Ceph in a MySQL database environment? Ceph was not designed to provide high performance for OLTP environments, and OLTP entails small random reads/writes.

Page 9: MySQL - Comparing Local HDD to Ceph Cluster

MySQL setup:
• Release 5.7
• 45,000,000 rows
• 6GB buffer
• 4G logfiles
• RAID 0 over 18 HDDs

Ceph setup:
• 3 nodes, each containing 4 NVMe SSDs
• Jewel, using Filestore
• 1 pool over 12 NVMe SSDs
• Replica 2
• 40G private and public network

For all tests, all MySQL files were on the local server except the database file, which was moved to the Ceph cluster. A my.cnf sketch matching this setup follows below.

[Results chart omitted; x-axis: threads]
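The exact MySQL configuration is not in the deck; a minimal my.cnf sketch consistent with the stated setup (6GB buffer pool, 4G log files) would be:

    [mysqld]
    # 6GB InnoDB buffer pool and 4G redo log files, per the stated test setup
    innodb_buffer_pool_size = 6G
    innodb_log_file_size    = 4G
    # datadir is a placeholder; in these tests only the database file was placed on the Ceph cluster
    datadir = /var/lib/mysql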

Page 10: MySQL - Comparing Local NVMe SSD to Ceph Cluster

MySQL setup:
• Release 5.7
• 45,000,000 rows
• 6GB buffer
• 4G logfiles
• RAID 0 over 4 NVMe SSDs

Ceph setup:
• 3 nodes, each containing 4 NVMe SSDs
• Jewel, using Filestore
• 1 pool over 12 NVMe SSDs
• Replica 1
• 40G private and public network

For all tests, all MySQL files were on the local server except the database file, which was moved to the Ceph cluster (see the RBD mapping sketch below).
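The deck does not spell out how the database file was placed on the cluster; one common way from the MySQL host is to map an RBD image and mount it where the file lives, sketched below (pool name, image name, size, and mount point are assumptions):

    # Create and map an RBD image (on older kernels the image may need --image-feature layering)
    rbd create mysql/ibdata --size 500G
    rbd map mysql/ibdata               # returns a block device such as /dev/rbd0

    # Filesystem and mount point for the MySQL data that should live on Ceph
    mkfs.xfs /dev/rbd0
    mkdir -p /var/lib/mysql-ceph
    mount -o noatime /dev/rbd0 /var/lib/mysql-ceph
    chown -R mysql:mysql /var/lib/mysql-ceph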

Page 11: Ceph All Flash Storage Acceleration - Seagate SSD and Seagate PCIe Storage

All-SSD configurations tested with FIO random write, 200 threads, 128k data:

• Case 1: 2 SSDs, 1 OSD per SSD
• Case 2: 2 SSDs, 4 OSDs per SSD
• Case 3: 2 SSDs, 4 OSDs per SSD, with the 8 OSD journals on 1 PCIe flash card

[Results chart omitted; bars show KB/s and IOPS for "2 ssd, 2 osd", "2 ssd, 8 osd", and "2 ssd, 8 osd + journal"]
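The deck does not show how the journals were moved to the PCIe card; with Jewel-era ceph-deploy the journal device can be given alongside the data device, roughly as in the sketch below (host name, device names, and partition layout are assumptions):

    # One OSD: data on an SSD partition, journal on a partition of the PCIe flash card
    # (repeat for each of the 8 OSDs, pointing each at its own journal partition)
    ceph-deploy osd create cephnode1:/dev/sdb1:/dev/nvme0n1p1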

Page 12: Ceph All Flash Storage Acceleration - 4K FIO RBD Benchmarks

• 3-node Ceph cluster
• 100G public and private networks
• 4 Seagate NVMe SSDs per node, 12 Seagate NVMe SSDs per cluster
• Benchmark 1: 1 OSD per NVMe SSD
• Benchmark 2: 4 OSDs per NVMe SSD (see the partitioning sketch after this list)
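The deck does not show how the four OSDs per device were laid out; one straightforward approach is four equal data partitions per NVMe SSD, each turned into its own Filestore OSD, sketched below (device name, host name, and the ceph-deploy invocation are assumptions and depend on the ceph-deploy/ceph-disk version in use):

    # Four equal partitions on one NVMe SSD (repeat per device)
    parted -s /dev/nvme0n1 mklabel gpt \
        mkpart osd1 1MiB 25% \
        mkpart osd2 25% 50% \
        mkpart osd3 50% 75% \
        mkpart osd4 75% 100%

    # One Filestore OSD per partition (Jewel-era ceph-deploy HOST:DATA syntax)
    for p in 1 2 3 4; do
        ceph-deploy osd create cephnode1:/dev/nvme0n1p${p}
    done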

Page 13: Linux Flash Storage Tuning

Linux tuning is still a requirement to get optimum performance out of an SSD:

• Use the RAW device or create the 1st partition on a 1M boundary (sector 2048 for 512B sectors, sector 256 for 4k sectors)
• ceph-deploy uses the optimal alignment when creating an OSD
• Use blk-mq/scsi-mq if the kernel supports it
• rq_affinity = 1 for NVMe, rq_affinity = 2 for non-NVMe
• rotational = 0
• blockdev --setra 256 (for 4k sectors; 4096 for 512B sectors)

A shell sketch of these per-device settings follows below.
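A minimal sketch applying the settings above (device names are placeholders; the sysfs paths are the standard block-layer queue attributes):

    # Align the first partition on a 1MiB boundary
    parted -s -a optimal /dev/nvme0n1 mklabel gpt mkpart data 1MiB 100%

    # Per-device queue settings (NVMe shown; use rq_affinity=2 for non-NVMe devices)
    echo 1 > /sys/block/nvme0n1/queue/rq_affinity
    echo 0 > /sys/block/nvme0n1/queue/rotational

    # Read-ahead per the slide: 256 sectors for 4k-sector devices, 4096 for 512B-sector devices
    blockdev --setra 256 /dev/nvme0n1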

Page 14: Linux Flash Storage Tuning cont’d

Linux tuning is still a requirement to get optimum performance out of an SSD:

• If using an older kernel that doesn't support blk-mq, use the "deadline" IO scheduler and its supporting tunables:
  • fifo_batch
  • front_merges
  • writes_starved
• XFS mount options: nobarrier,discard,noatime,attr2,inode64,noquota
• MySQL - when using flash, configure both innodb_io_capacity and innodb_lru_scan_depth
• Modify the Linux read-ahead on the mapped RBD image on the client:
  • echo 1024 > /sys/class/block/rbd0/queue/read_ahead_kb

A sketch combining these settings follows below.
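A sketch of how these might be applied (the deadline tunable values and the InnoDB numbers are illustrative assumptions, not figures from the deck):

    # Older kernels without blk-mq: deadline scheduler and its tunables
    echo deadline > /sys/block/sdb/queue/scheduler
    echo 16 > /sys/block/sdb/queue/iosched/fifo_batch
    echo 1  > /sys/block/sdb/queue/iosched/front_merges
    echo 2  > /sys/block/sdb/queue/iosched/writes_starved

    # XFS mount options from the slide, as an example fstab entry
    # /dev/sdb1  /var/lib/ceph/osd/ceph-0  xfs  nobarrier,discard,noatime,attr2,inode64,noquota  0 0

    # MySQL on flash: raise both settings together in my.cnf (placeholder values, tune to the device)
    # innodb_io_capacity    = 10000
    # innodb_lru_scan_depth = 4096

    # Larger read-ahead on the mapped RBD image on the client
    echo 1024 > /sys/class/block/rbd0/queue/read_ahead_kb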

Page 15: Flash Storage Device Configuration

Ceph tuning is still a requirement to get optimum performance out of an SSD. Ceph tuning options that can make a difference:

• RBD cache (see the sketch below)
• If using a smaller number of SSDs/NVMe SSDs, test with creating multiple OSDs per SSD/NVMe SSD. Good performance increases have been seen using 4 OSDs per SSD/NVMe SSD.
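The slide lists RBD cache without settings; the relevant client-side ceph.conf options look like the sketch below (whether write-back caching is appropriate, and the sizes, depend on the workload; the values are illustrative, not recommendations from the deck):

    [client]
    rbd cache = true
    rbd cache writethrough until flush = true
    rbd cache size = 67108864        # 64 MB, illustrative
    rbd cache max dirty = 50331648   # 48 MB, illustrative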

Page 16: Flash Storage Device Configuration

If the NVMe SSD or SAS/SATA SSD device can be configured to use a 4k sector size, this can increase performance for certain applications like databases. For all of my FIO tests with the RBD engine and for all of my MySQL tests, I saw up to a 3x improvement (depending on the test) when using 4k sector sizes compared to using 512-byte sectors.
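Switching a drive to a 4k sector size is drive-dependent and destroys the data on it; on drives that support it, it is typically done with nvme-cli or sg_format, roughly as below (the LBA-format index and device names are assumptions; check the drive first):

    # NVMe: list the supported LBA formats, then reformat the namespace to the 4k one (data-destructive)
    nvme id-ns /dev/nvme0n1            # look for the lbaf entry with lbads:12 (4096-byte sectors)
    nvme format /dev/nvme0n1 --lbaf=1  # index 1 is an assumption; use the 4k entry reported above

    # SAS: reformat to 4k logical blocks with sg_format (also data-destructive)
    sg_format --format --size=4096 /dev/sdb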

Precondition all SSDs before running benchmarks; over a 3x gain in performance has been seen after preconditioning.
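Preconditioning generally means filling the whole device at least once or twice before measuring, so the benchmark sees steady-state rather than fresh-out-of-box behavior; a minimal FIO sketch (device name and pass count are assumptions, and this destroys data on the device):

    fio --name=precondition \
        --filename=/dev/nvme0n1 \
        --ioengine=libaio --direct=1 \
        --rw=write --bs=128k --iodepth=32 \
        --loops=2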

Storage devices used for all of the above benchmarks/tests:

• Seagate Nytro XF1440 NVMe SSD
• Seagate Nytro XF1230 SATA SSD
• Seagate 1200.2 SAS SSD
• Seagate XP6500 PCIe Flash Accelerator Card

Page 17: Seagate Broadest PCIe, SAS and SATA Portfolio

Page 18: Thank You!

Questions?

Learn how Seagate accelerates storage with one of the broadest SSD and flash portfolios in the market.