
Page 1

©2018 Micron Technology, Inc. All rights reserved. Information, products, and/or specifications are subject to change without notice. All information is provided on an “AS IS” basis without warranties of any kind. Statements regarding products, including regarding their features, availability, functionality, or compatibility, are provided for informational purposes only and do not modify the warranty, if any, applicable to any product. Drawings may not be to scale. Micron, the Micron logo, and all other Micron trademarks are the property of Micron Technology, Inc. All other trademarks are the property of their respective owners.

Ryan Meredith, Micron Principal Solutions Engineer

Unleashing the Power of Flash in Ceph DataStores: An All-NVMe™ Ceph Performance Deep Dive

Page 2

▪ Austin, TX

▪ Big Fancy Lab

▪ Real-world application performance testing using Micron Storage & DRAM

− Ceph, VSAN, Storage Spaces

− Hadoop, Spark

− MySQL, MSSQL, Oracle

− Cassandra, MongoDB

Micron Storage Solutions Engineering

2

Page 3

How Micron Accelerated Solutions Are Born

3

Micron SSDs

Partner ISV SW

Partner OEM HW

Micron Memory

Micron Reference Architecture

DIY Solution

Pre-Validated Solution

Micron Storage Solutions Center

Page 4

Micron Accelerated Solutions

4

Micron + Red Hat + Supermicro All-NVMe Ceph RA

                     2017 Micron Ceph RA          2018 Micron Ceph RA
CPU                  Intel Broadwell 2699v4       Intel Purley 8168
DRAM                 256GB                        384GB
Networking           50 GbE                       100 GbE
Software             RHCS 2.1 (Jewel 10.2.3)      RHCS 3.0 (Luminous 12.2.1)

4KB Read             1,148K IOPS                  2,013K IOPS
4KB 70/30 R/W        448K IOPS                    837K IOPS
4KB Write            246K IOPS                    375K IOPS

Micron + Red Hat Ceph Storage Reference Architectures

Page 5

Micron + Red Hat + Supermicro All-NVMe Ceph Reference Architecture, 2018 Edition

5

Page 6

Hardware Configuration

Storage Nodes (x4)

▪ Supermicro SYS-1029U-TN10RT+

▪ 2x Intel 8168 24-core Xeon, 2.7GHz Base / 3.7GHz Turbo

▪ 384GB Micron High Quality Excellently Awesome DDR4-2666 DRAM (12x 32GB)

▪ 2x Mellanox ConnectX-5 100GbE 2-port NICs

− 1 NIC for client network / 1 NIC for storage network

▪ 10x Micron 6.4TB 9200MAX NVMe SSD

− 3 Drive Writes per Day Endurance

− 770k 4KB Random Read IOPs / 270k 4KB Random Write IOPs

− 3.15 GB/s Sequential Read / 2.3 GB/s Sequential Write

− 64TB per Storage Node / 256TB in 4 node RA as tested

6

Micron + Red Hat + Supermicro ALL-NVMe Ceph RA

Page 7

Hardware Configuration

Monitor Nodes (x3)

▪ Supermicro SYS-1028U-TNRT+ (1U)

− 128 GB DRAM

− 50 GbE Mellanox ConnectX-4

Network

▪ 2x Supermicro SSE-C3632SR, 100GbE 32-Port Switches

− 1 switch for client network / 1 switch for storage network

Load Generation Servers (Clients)

▪ 10x Supermicro SYS-2028U (2U)

▪ 2x Intel 2690v4

▪ 256GB RAM

▪ 50 GbE Mellanox ConnectX-4

7

Micron + Red Hat + Supermicro ALL-NVMe Ceph RA

Page 8

Software Configuration

Storage + Monitor Nodes + Clients

▪ Red Hat Ceph Storage 3.0 (Luminous 12.2.1)

▪ Red Hat Enterprise Linux 7.4

▪ Mellanox OFED Driver 4.1

Switch OS

▪ Cumulus Linux 3.4.1

Deployment Tool

▪ Ceph-Ansible

8

Micron + Red Hat + Supermicro ALL-NVMe Ceph RA

Page 9

Performance Testing Methodology

▪ 2 OSDs per NVMe Drive / 80 OSDs total

▪ Ceph Storage Pool Config

− 2x Replication: 8192 PGs, 100x 75GB RBD Images = 7.5TB data x 2

− 3x Replication: 8192 PGs, 100x 50GB RBD Images = 5TB x 3

▪ FIO RBD for Block Tests (an example job file sketch follows this list)

− Writes: FIO at queue depth 32 while scaling up # of client FIO processes

− Reads: FIO against all 100 RBD Images, scaling up QD

▪ RADOS Bench for Object Tests

− Writes: RADOS Bench @ 16 threads, scaling up # of clients

− Reads: RADOS Bench on 10 clients, scaling up # of threads

▪ 10-minute test runs x 3 for recorded average performance results (5 min ramp up on FIO)
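For reference, a minimal FIO job file sketch along these lines could drive the 4KB random read RBD test described above. It assumes fio's RBD ioengine; the client, pool, and image names are placeholders, and the actual job files used for the RA are not shown in this deck.

[global]
ioengine=rbd
# Ceph client keyring and pool holding the test images (placeholder names)
clientname=admin
pool=rbd
invalidate=0
rw=randread
bs=4k
# queue depth; the read tests sweep QD 1 through 32
iodepth=32
time_based=1
# 10-minute measurement window with a 5-minute ramp-up, per the methodology above
runtime=600
ramp_time=300

# one job section per RBD image under test (placeholder image name)
[image01]
rbdname=image01

In the actual read tests, FIO runs against all 100 RBD images at once, so a real job file would carry one such section per image or be generated per client by the test harness.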

9

Micron + Red Hat + Supermicro ALL-NVMe Ceph RA

Page 10

[Chart: Red Hat Ceph 3.0 RBD 4KB Random Read IOPs + Average Latency; 100 FIO RBD clients at varying queue depths (QD 1-32). Left axis: 4KB random read IOPs; right axis: average latency (ms)]

FIO RBD 4KB Random Read Performance

10

Micron + Red Hat + Supermicro ALL-NVMe Ceph RA

4KB Random Read Performance:

▪ 2 Million IOPs @ 1.6ms Avg. Latency

▪ 1.96 Million IOPs @ 0.8ms Avg. Latency

CPU Limited

Page 11

FIO RBD 4KB Random Read Performance

11

Micron + Red Hat + Supermicro ALL-NVMe Ceph RA

4KB Random Read Performance:

▪ 1.96 Million 4KB Random Reads @ 33ms 99.99% Latency

▪ Tail latency spikes above Queue Depth 16

CPU Limited

[Chart: Red Hat Ceph 3.0 RBD 4KB Random Read IOPs + Tail Latency; 100 FIO RBD clients at varying queue depths (QD 1-32). Left axis: 4KB random read IOPs; right axis: 99.99% latency (ms)]

Page 12

FIO RBD 4KB Random Write Performance

12

Micron + Red Hat + Supermicro ALL-NVMe Ceph RA

4KB Random Write Performance:

▪ 375k IOPs @ 8.5ms Avg. Latency (100 clients)

▪ 363k IOPs @ 5.3ms Avg. Latency (60 Clients)

CPU Limited

[Chart: Red Hat Ceph 3.0 RBD 4KB Random Write IOPs + Average Latency; FIO RBD at queue depth 32, 20-100 clients. Left axis: 4KB random write IOPs; right axis: average latency (ms)]

Page 13

FIO RBD 4KB Random Write Performance

13

Micron + Red Hat + Supermicro ALL-NVMe Ceph RA

4KB Random Write Performance:

▪ 362k 4KB Random Writes @ 50ms 99.99% Latency

Tail latency spikes above 70 clients

CPU Limited

[Chart: Red Hat Ceph 3.0 RBD 4KB Random Write IOPs + Tail Latency; FIO RBD at queue depth 32, 20-100 clients. Left axis: 4KB random write IOPs; right axis: 99.99% latency (ms)]

Page 14

Rados Bench 4MB Object Read Performance

14

Micron + Red Hat + Supermicro ALL-NVMe Ceph RA

4MB Object Read Performance:

▪ 30 GB/s @ 85ms (32 Threads)

▪ 29 GB/s @ 44ms (16 Threads)

Ceph RA is tuned for small block performance

Object Read is Software Limited

Should be possible to tune ceph.conf for higher object read performance

[Chart: Red Hat Ceph 3.0 RADOS Bench 4MB Object Read Throughput + Average Latency; 10 clients at 4-32 threads, 2x pool. Left axis: read throughput (GB/s); right axis: average latency (ms)]

Page 15

Rados Bench 4MB Object Write Performance

15

Micron + Red Hat + Supermicro ALL-NVMe Ceph RA

4MB Object Write Performance:

▪ 10.3 GB/s @ 50ms (8 Clients)

▪ 9.8 GB/s @ 39ms (6 Clients)

Ceph RA is tuned for small block performance

Object Write is Software Limited

May be possible to tune ceph.conf for higher object write performance

[Chart: Red Hat Ceph 3.0 RADOS Bench 4MB Object Write Throughput + Average Latency; 2-10 clients at 16 threads, 2x pool. Left axis: write throughput (GB/s); right axis: average latency (ms)]

Page 16

Bluestore vs. Filestore

16

Page 17

Bluestore & NVMe

▪ Red Hat Ceph 3.0 currently supports Filestore

− Will support Bluestore in an upcoming release

▪ Ceph Luminous Community 12.2.4

− Tested Bluestore on the same RA hardware

▪ Default RocksDB tuning for Bluestore in Ceph

− Great for large objects

− Bad for 4KB random on NVMe

− Worked w/ Mark Nelson & Red Hat team to tune RocksDB for good 4KB random performance

17

The Tune-Pocalypse

Page 18

Bluestore & NVMe

18

The Tune-Pocalypse

Bluestore OSD Tuning for 4KB Random Writes:

▪ Set high max_write_buffer_number & min_write_buffer_number_to_merge

▪ Set low write_buffer_size

[osd]

bluestore_cache_kv_max = 200G

bluestore_cache_kv_ratio = 0.2

bluestore_cache_meta_ratio = 0.8

bluestore_cache_size_ssd = 18G

osd_min_pg_log_entries = 10

osd_max_pg_log_entries = 10

osd_pg_log_dups_tracked = 10

osd_pg_log_trim_min = 10

bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=64,min_write_buffer_number_to_merge=32,recycle_log_file_num=64,compaction_style=kCompactionStyleLevel,write_buffer_size=4MB,target_file_size_base=4MB,max_background_compactions=64,level0_file_num_compaction_trigger=64,level0_slowdown_writes_trigger=128,level0_stop_writes_trigger=256,max_bytes_for_level_base=6GB,compaction_threads=32,flusher_threads=8,compaction_readahead_size=2MB

Page 19

[Chart: 4KB Random Read, Filestore vs. Bluestore, IOPs + Average Latency; 100 FIO RBD clients at QD 1-32. Series: Filestore and Bluestore 4KB random read IOPs and average latency (ms)]

Bluestore vs. Filestore: 4KB Random Read

19

Micron + Red Hat + Supermicro ALL-NVMe Ceph RA

4KB Random Reads:

▪ Queue Depth 32

− Bluestore: 2.15 Million @ 1.5ms Avg. Latency

− Filestore: 2.0 Million @ 1.6ms Avg. Latency

Bluestore is slightly more performant than Filestore on 4KB Rand Reads

Page 20

[Chart: 4KB Random Read, Filestore vs. Bluestore, IOPs + Tail Latency; 100 FIO RBD clients at QD 1-32. Series: Filestore and Bluestore 4KB random read IOPs and 99.99% latency (ms)]

Bluestore vs. Filestore: 4KB Random Read

20

Micron + Red Hat + Supermicro ALL-NVMe Ceph RA

4KB Random Reads:

▪ Queue Depth 32

− Bluestore Tail Latency: 251ms

− Filestore Tail Latency: 330ms

Bluestore has lower tail latency and higher IOPs at high queue depths

Page 21

[Chart: 4KB Random Write, Filestore vs. Bluestore, IOPs + Average Latency; 20-100 FIO clients at QD 32. Series: Filestore and Bluestore 4KB random write IOPs and average latency (ms)]

Bluestore vs. Filestore: 4KB Random Write

21

Micron + Red Hat + Supermicro ALL-NVMe Ceph RA

4KB Random Writes:

▪ 18% Higher IOPs

▪ 15% Lower Avg. Latency

▪ 100 Clients

− Bluestore: 443k IOPs @ 7ms

− Filestore: 375k @ 8.5ms

Page 22

[Chart: 4KB Random Write, Filestore vs. Bluestore, IOPs + Tail Latency; 20-100 FIO clients at QD 32. Series: Filestore and Bluestore 4KB random write IOPs and 99.99% latency (ms)]

Bluestore vs. Filestore: 4KB Random Write

22

Micron + Red Hat + Supermicro ALL-NVMe Ceph RA

4KB Random Writes:

▪ 18% Higher IOPs

▪ Up to 70% reduced tail latency

▪ 100 Clients

− Bluestore Tail Latency: 89ms

− Filestore Tail Latency: 382ms

Page 23

[Chart: 4MB Object Read, Filestore vs. Bluestore, Throughput + Average Latency; 4-32 threads. Series: Filestore and Bluestore throughput (GB/s) and average latency (ms)]

Bluestore vs. Filestore: 4MB Object Read

23

Micron + Red Hat + Supermicro ALL-NVMe Ceph RA

4MB Object Reads:

▪ 16 Threads:

− Bluestore: 42.9 GB/s @ 29ms

− Filestore: 29 GB/s @ 44ms

Bluestore improves object read

▪ 48% higher throughput

▪ 33% lower average latency

Page 24

[Chart: 4MB Object Write, Filestore vs. Bluestore, Throughput + Average Latency; 2-10 clients at 16 threads. Series: Filestore and Bluestore throughput (GB/s) and average latency (ms)]

Bluestore vs. Filestore: 4MB Object Write

24

Micron + Red Hat + Supermicro ALL-NVMe Ceph RA

4MB Object Writes:

▪ 6 Clients:

− Bluestore: 18.4 GB/s @ 21ms

− Filestore: 9.8 GB/s @ 39ms

Bluestore improves object write

▪ 2x higher throughput

▪ Half average latency

Page 25

4KB Random IO Performance / Node

                                                                    4KB Random Read   4KB Random Write
2017 Micron Ceph RA, Filestore (RHCS 2.1 / 9100MAX)                       287k               62k
2017 Competitor RA, Bluestore (Luminous 12.0 / P4800X+P4500 NVMe)         360k               76k
2018 Micron Ceph RA, Filestore (RHCS 3.0 / 9200MAX)                       503k               94k
2018 Micron Ceph RA, Bluestore (Luminous 12.2 / 9200MAX)                  539k              111k
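The per-node figures for the Micron RAs above appear to be the cluster-level results from the earlier slides divided across the four storage nodes; this is an inference from the numbers rather than a statement on the slide. For example:

1,148k cluster 4KB random read IOPs / 4 nodes ≈ 287k per node (2017 RA, Filestore)
2,013k cluster 4KB random read IOPs / 4 nodes ≈ 503k per node (2018 RA, Filestore)
375k cluster 4KB random write IOPs / 4 nodes ≈ 94k per node (2018 RA, Filestore)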

Performance Comparison: 4KB Random Block

25

2017 Micron RA vs. 2017 Competitor vs. 2018 Micron RA + Bluestore

4KB Random IOPs

▪ 4KB Random Reads

− 1.75X IOPs Micron 2017 RA to 2018 RA

− 1.9X IOPs Micron 2017 RA to 2018 RA + Bluestore

▪ 4KB Random Writes

− 1.5X IOPs Micron 2017 RA to 2018 RA

− 1.8X IOPs Micron 2017 RA to 2018 RA + Bluestore

Page 26

Performance Comparison: 4KB Random Block

26

2017 Micron RA vs. 2018 Micron RA + Bluestore

4KB Random Block Performance Comparison: Ceph RBD FIO

                                                           4KB Random Write   4KB Random 70/30 R/W   4KB Random Read
2017 Micron Ceph RA, Filestore (RHCS 2.1 / 9100MAX)               242k                 448k               1.15M
2018 Micron Ceph RA, Filestore (RHCS 3.0 / 9200MAX)               375k                 837k               2.0M
2018 Micron Ceph RA, Bluestore (Luminous 12.2 / 9200MAX)          443k                 957k               2.15M

Page 27

Performance Comparison: 4MB Object

27

2017 Micron RA vs. 2018 Micron RA + Bluestore

4MB Object Throughput

▪ Read Throughput

− 1.4X Throughput 2017 to 2018

− 1.95X Throughput 2017 to 2018 + Bluestore

▪ Write Throughput

− 2.2x Throughput 2017 to 2018

− 3.9x Throughput 2017 to 2018 + Bluestore

4MB Object Performance: RADOS Bench

                                                           4MB Object Read   4MB Object Write
2017 Micron Ceph RA, Filestore (RHCS 2.1 / 9100MAX)            22 GB/s            4.6 GB/s
2018 Micron Ceph RA, Filestore (RHCS 3.0 / 9200MAX)            30 GB/s            10 GB/s
2018 Micron Ceph RA, Bluestore (Luminous 12.2 / 9200MAX)       43 GB/s            18 GB/s

Page 28

Thanks All

28

These slides will be available on the OpenStack conference website

Reference architecture available now!

Page 29

Micron Ceph Collateral

Micron NVMe Reference Architecture:

https://www.micron.com/~/media/documents/products/technical-marketing-brief/micron_9200_ceph_3,-d-,0_reference_architecture.pdf

Protip: Just Google “Micron 9200 Ceph”

29

Micron + Red Hat + Supermicro ALL-NVMe Ceph RA

Page 30

Page 31

Page 32

Micron Ceph Appendix

32

ALL-NVMe Ceph RA: Filestore Ceph.conf

[global]

auth client required = none

auth cluster required = none

auth service required = none

auth supported = none

mon host = xxx

ms_type = async

rbd readahead disable after bytes = 0

rbd readahead max bytes = 4194304

mon compact on trim = False

mon_allow_pool_delete = true

osd pg bits = 8

osd pgp bits = 8

osd_pool_default_size = 2

mon pg warn max object skew = 100000

perf = True

mutex_perf_counter = True

throttler_perf_counter = False

rbd cache = false

mon_max_pg_per_osd = 800

[mon]

mon_osd_max_split_count = 10000

[osd]

osd journal size = 20480

osd mount options xfs =

noatime,largeio,inode64,swalloc

osd mkfs options xfs = -f -i size=2048

osd_op_threads = 32

filestore_queue_max_ops = 5000

filestore_queue_committing_max_ops = 5000

journal_max_write_entries = 1000

journal_queue_max_ops = 3000

objecter_inflight_ops = 102400

filestore_wbthrottle_enable = False

osd mkfs type = xfs

filestore_max_sync_interval = 10

osd_client_message_size_cap = 0

osd_client_message_cap = 0

osd_enable_op_tracker = False

filestore_fd_cache_size = 64

filestore_fd_cache_shards = 32

filestore_op_threads = 6

filestore_queue_max_bytes=1048576000

filestore_queue_committing_max_bytes=1048576000

journal_max_write_bytes=1048576000

journal_queue_max_bytes=1048576000

ms_dispatch_throttle_bytes=1048576000

objecter_inflight_op_bytes=1048576000

Page 33

[global]

auth client required = none

auth cluster required = none

auth service required = none

auth supported = none

mon host = xxxxxxx

osd objectstore = bluestore

cephx require signatures = False

cephx sign messages = False

mon_allow_pool_delete = true

mon_max_pg_per_osd = 800

mon pg warn max per osd = 800

ms_crc_header = False

ms_crc_data = False

ms type = async

perf = True

rocksdb_perf = True

osd_pool_default_size = 2

[mon]

mon_max_pool_pg_num = 166496

mon_osd_max_split_count = 10000

[client]

rbd_cache = false

rbd_cache_writethrough_until_flush = false

[osd]

bluestore_csum_type = none

bluestore_cache_kv_max = 200G

bluestore_cache_kv_ratio = 0.2

bluestore_cache_meta_ratio = 0.8

bluestore_cache_size_ssd = 18G

bluestore_extent_map_shard_min_size = 50

bluestore_extent_map_shard_max_size = 200

bluestore_extent_map_shard_target_size = 100

osd_min_pg_log_entries = 10

osd_max_pg_log_entries = 10

osd_pg_log_dups_tracked = 10

osd_pg_log_trim_min = 10

bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=64,min_write_buffer_number_to_merge=32,recycle_log_file_num=64,compaction_style=kCompactionStyleLevel,write_buffer_size=4MB,target_file_size_base=4MB,max_background_compactions=64,level0_file_num_compaction_trigger=64,level0_slowdown_writes_trigger=128,level0_stop_writes_trigger=256,max_bytes_for_level_base=6GB,compaction_threads=32,flusher_threads=8,compaction_readahead_size=2MB

Micron Ceph Appendix

33

ALL-NVMe Ceph RA: Bluestore Ceph.conf

Page 34

But What About 3x Replicated Pools?

34

2x vs. 3x Replicated pools: 4KB Random Read

4KB Random Reads are the same on 2x and 3x replicated pools

Queue Depth 32:

▪ 2x: 2.01M IOPs @ 1.59ms

▪ 3x: 2.05M IOPs @ 1.57ms

[Chart: 4KB Random Read, 2x vs. 3x Replicated Pool, IOPs + Average Latency; 100 FIO RBD clients at QD 1-32. Series: 2x and 3x pool 4KB random read IOPs and average latency (ms)]

Page 35

But What About 3x Replicated Pools?

35

2x vs. 3x Replicated pools: 4KB Random Write

4KB Random Writes are exactly what you’d expect: 33% reduced IOPs

100 Clients:

▪ 2x: 375k IOPs @ 8.5ms

▪ 3x: 247k IOPs @ 12.9ms

[Chart: 4KB Random Write, 2x vs. 3x Replicated Pool, IOPs + Average Latency; FIO RBD at QD 32 with 20-100 clients. Series: 2x and 3x pool 4KB random write IOPs and average latency (ms)]

Page 36

But What About 3x Replicated Pools?

36

2x vs. 3x Replicated pools: 4MB Object Read

4MB Object reads are nearly the same

32 Threads:

▪ 2x: 30.0 GB/s @ 85.4ms

▪ 3x: 29.9 GB/s @ 85.0ms

[Chart: Red Hat Ceph 3.0 RADOS Bench 4MB Object Read Throughput + Average Latency, 2x vs. 3x pool; 10 clients at 4-32 threads. Series: 2x and 3x pool read throughput (GB/s) and average latency (ms)]

Page 37

But What About 3x Replicated Pools?

37

2x vs. 3x Replicated pools: 4MB Object Writes

4MB Object Writes are ~33% lower on a 3x pool

8 Clients:

▪ 2x: 10.3 GB/s @ 49.8ms

▪ 3x: 6.7 GB/s @ 76.7ms

[Chart: Red Hat Ceph 3.0 RADOS Bench 4MB Object Write Throughput + Average Latency, 2x vs. 3x pool; 2-10 clients at 16 threads. Series: 2x and 3x pool write throughput (GB/s) and average latency (ms)]