
Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration


Page 1: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Daniel Ferber, Open Source Software Defined Storage Technologist, Intel Storage Group

May 25, 2016, Ceph Days Portland, Oregon

Page 2: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration


Page 3: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Legal notices

Copyright © 2016 Intel Corporation.

All rights reserved. Intel, the Intel Logo, Xeon, Intel Inside, and Intel Atom are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

FTC Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804

The cost reduction scenarios described in this document are intended to enable you to get a better understanding of how the purchase of a given Intel product, combined with a number of situation-specific variables, might affect your future cost and savings. Nothing in this document should be interpreted as either a promise of or contract for a given level of costs.

Page 4: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Welcome to Intel Hillsboro, and Agenda for the First Half of This Talk
• Inventory of published reference architectures from Red Hat and SUSE
• Walk through highlights of a soon-to-be-published Intel and Red Hat Ceph reference architecture paper
• Introduce an Intel all-NVMe Ceph configuration benchmark for MySQL
• Show examples of Ceph solutions

Then Jack Zhang, Intel SSD Architect, presents the second half of this presentation.

Page 5: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

What Are the Key Components of a Reference Architecture?
• Starts with a workload (use case) and points to one or more resulting recommended configurations
• Configurations should be recipes that one can purchase and build
• Key related elements should be recommended: replication versus erasure coding (EC), media types for storage, failure domains
• Ideally, performance data and tunings are supplied for the configurations

Page 6: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Tour of Existing Reference Architectures

Page 7: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Available Reference Architectures (recipes)

*Other names and brands may be claimed as the property of others.

Page 8: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Available Reference Architectures (recipes)
• http://www.redhat.com/en/files/resources/en-rhst-cephstorage-supermicro-INC0270868_v2_0715.pdf
• http://www.qct.io/account/download/download?order_download_id=1065&dtype=Reference%20Architecture
• https://www.redhat.com/en/resources/red-hat-ceph-storage-hardware-configuration-guide
• https://www.percona.com/resources/videos/accelerating-ceph-database-workloads-all-pcie-ssd-cluster
• https://www.percona.com/resources/videos/mysql-cloud-head-head-performance-lab
• http://h20195.www2.hpe.com/v2/GetDocument.aspx?docname=4aa6-3911enw
• https://intelassetlibrary.tagcmd.com/#assets/gallery/11492083

Page 9: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

A Brief Look at 3 of the Reference Architecture Documents

Page 10: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

QCT and Red Hat Ceph Solution Guide
QCT Ceph performance and sizing guide
• Target audience: mid-size to large cloud and enterprise customers
• Showcases Intel-based QCT solutions for multiple customer workloads
• Introduces a three-tier configuration and solution model: IOPS optimized, throughput optimized, capacity optimized
• Specifies specific, orderable QCT solutions based on the above classifications
• Shows actual Ceph performance observed for the configurations
• Purchase fully configured solutions per the above model from QCT
• Red Hat Ceph Storage pre-installed, with Red Hat Ceph Storage support included
• Datasheets and white papers at www.qct.io

*Other names and brands may be claimed as the property of others.

Page 11: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Supermicro and Red Hat Ceph Solution Guide
Supermicro performance and sizing guide
• Target audience: mid-size to large cloud and enterprise customers
• Showcases Intel-based Supermicro solutions for multiple customer workloads
• Introduces a three-tier configuration and solution model: IOPS optimized, throughput optimized, capacity optimized
• Specifies specific, orderable Supermicro solutions based on the above classifications
• Shows actual Ceph performance observed for the configurations
• Purchase fully configured solutions per the above model from Supermicro
• Red Hat Ceph Storage pre-installed, with Red Hat Ceph Storage support included
• Datasheets and white papers at supermicro.com

*Other names and brands may be claimed as the property of others.

Page 12: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Intel Ceph Solution Guide
Intel solutions for Ceph deployments
• Target audience: mid-size to large cloud and enterprise customers
• Showcases Intel-based solutions for multiple customer workloads
• Uses the three-tier configuration and solution model: IOPS optimized, throughput optimized, capacity optimized
• Contains Intel configurations and performance data
• Contains a Yahoo case study
• Contains specific use case examples
• Adds a Good, Better, Best model for all-SSD Ceph configurations
• Adds configuration and performance data for Intel Cache Acceleration Software (Intel CAS)
• Overviews the CeTune and VSM tools
• Datasheets and white papers at intelassetlibrary.tagcmd.com/#assets/gallery/11492083

*Other names and brands may be claimed as the property of others.

Page 13: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Quick Look at 3 Tables Inside the Intel and Red Hat Reference Architecture Document (to be published soon)

Page 14: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Generic Red Hat Ceph Reference Architecture Preview
https://www.redhat.com/en/resources/red-hat-ceph-storage-hardware-configuration-guide

*Other names and brands may be claimed as the property of others.

• IOPS-optimized config is all-NVMe SSD: typically block with replication, which allows database work; journals are NVMe; BlueStore, when supported, will increase performance
• Throughput-optimized is a balanced config: HDD storage with SSD journals; block or object, with replication
• Capacity-optimized is typically all-HDD storage: object and EC (pool creation for both models is sketched below)
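For readers who want to see what that replication-versus-EC choice looks like in practice, here is a minimal sketch using standard Ceph CLI commands; the pool names, PG counts, and EC profile are hypothetical, not taken from the guide:

```
# Replicated pool, as typically used for the IOPS- and throughput-optimized block configs
ceph osd pool create rbd-pool 128 128 replicated
ceph osd pool set rbd-pool size 2    # replication factor 2, as in the all-NVMe tests later in this deck

# Erasure-coded pool, as typically used for the capacity-optimized object config
ceph osd erasure-code-profile set ec-profile-4-2 k=4 m=2
ceph osd pool create object-pool 128 128 erasure ec-profile-4-2
```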

Page 15: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Intel and Red Hat Ceph Reference Architecture Preview

https://www.redhat.com/en/resources/red-hat-ceph-storage-hardware-configuration-guide
*Other names and brands may be claimed as the property of others.

• IOPS-optimized Ceph clusters are typically in the TB ranges

• Throughput-optimized clusters will likely move to 2.5-inch enclosures and all-SSD over time

• Capacity-optimized clusters are likely to favor 3.5-inch HDD storage

Page 16: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Intel and Red Hat Ceph Reference Architecture Preview

*Other names and brands may be claimed as the property of others.

• Specific recommended Intel processor and SSD models are now specified

• Intel processor recommendations depend on how many OSDs are used

Page 17: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Intel and Red Hat Ceph Reference Architecture

*Other names and brands may be claimed as the property of others.

• Recommendations for specific Intel SSDs and journals, with two options

• Specific Intel processor recommendations, depending on how many OSDs

Page 18: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Intel and Red Hat Ceph Reference Architecture

*Other names and brands may be claimed as the property of others.

• No SSDs for capacity model

• Specific Intel processor recommendations are the same as in the previous throughput config recommendations, and are based on the number of OSDs

Page 19: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Intel All-NVMe SSD Ceph Reference Architecture

Presented by Intel at Percona Live 2016

Page 20: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

*Other names and brands may be claimed as the property of others.

An “All-NVMe” High-Density Ceph Cluster Configuration

5-node all-NVMe Ceph cluster: Supermicro 1028U-TN10RT+ servers, dual-Xeon E5 [email protected], 44C HT, 128GB DDR4; CentOS 7.2, kernel 3.10-327, Ceph v10.1.2, BlueStore async. Each node carries four NVMe drives (NVMe1–NVMe4) hosting Ceph OSD1 through OSD16. Cluster network: 2x 10GbE.

10x client systems + 1x Ceph MON: dual-socket Xeon E5 [email protected] cores HT, 128GB DDR4. Public network: 2x 10GbE.

Two client test sets: FIO, and Docker containers running MySQL DB servers over krbd (Docker1, Docker2) with Sysbench clients (Docker3, Docker4). An illustrative FIO job follows below.

DB containers: 16 vCPUs, 32GB mem, 200GB RBD volume, 100GB MySQL dataset, InnoDB buffer cache 25GB (25%).
Client containers: 16 vCPUs, 32GB RAM; FIO 2.8, Sysbench 0.5.

*Other names and brands may be claimed as the property of others.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.
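For reference, a minimal FIO job file of the kind the FIO test set above might use against an RBD image via FIO's rbd ioengine; the pool and image names are hypothetical and the parameters are illustrative, not the benchmark's actual settings:

```
[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=test-image
bs=4k
iodepth=16
direct=1
time_based=1
runtime=300

[4k-randread]
rw=randread
```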

Page 21: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Tunings for the All-NVMe Ceph Cluster

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.

• TCMalloc with 128MB thread cache, or use JEMalloc

• Disable debug

• Disable auth (a sketch of these settings follows below)
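A sketch of where those tunings usually land, assuming the standard sysconfig and ceph.conf mechanisms; the deck names the knobs but not the exact syntax, so the option spellings below follow Ceph documentation of that era and are illustrative:

```
# /etc/sysconfig/ceph - 128MB TCMalloc thread cache
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728

# ceph.conf [global] - disable auth and debug logging (benchmark clusters only)
auth_cluster_required = none
auth_service_required = none
auth_client_required = none
debug_ms = 0/0
debug_osd = 0/0
debug_filestore = 0/0
```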

Page 22: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

*Other names and brands may be claimed as the property of others.

All-NVMe Flash Ceph Storage – Summary
• Intel NVMe flash storage works for low-latency workloads
• Ceph makes a compelling case for database workloads
• 1.4 million random read IOPS is achievable in 5U with ~1ms latency today
• Sysbench MySQL OLTP performance numbers were good: 400K 70/30% OLTP QPS @ ~50ms avg
• Using Xeon E5 v4 standard high-volume servers and Intel NVMe SSDs, one can now deploy a high-performance Ceph cluster for database workloads
• Recipe and tunings for this solution are here: www.percona.com/live/data-performance-conference-2016/content/accelerating-ceph-database-workloads-all-pcie-ssd-cluster

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters. *Other names and brands may be claimed as the property of others.

Page 23: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Ceph Solutions Available, in Addition to the QCT, Supermicro, and HP Solutions Already Mentioned

Page 24: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Thomas Krenn SUSE Enterprise Storage

https://www.thomas-krenn.com/en/products/storage-systems/suse-enterprise-storage.html
*Other names and brands may be claimed as the property of others.

Page 25: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Fujitsu Intel-Based Ceph Appliance

http://www.fujitsu.com/global/products/computing/storage/eternus-cd/s2/
*Other names and brands may be claimed as the property of others.

Page 26: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Ceph Reference Architectures Summary

Page 27: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Ceph Reference Architectures Summary
• The community has a growing number of good reference architectures
• Some point to specific hardware, others are generic
• Different workloads are catered for
• Some of the documents contain performance and tuning information
• Commercial support is available for professional services and software support
• Intel will continue to work with its ISV and hardware systems partners on reference architectures
• And will continue Intel's Ceph development focused on Ceph performance

Page 28: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Next – a Focus on NVM Technologies for Today's and Tomorrow's Ceph

Jack Zhang, Enterprise Architect, Intel Corporation
May 2016

Page 29: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Solid State Drives (SSDs) for Ceph Today: Three Ceph* Configurations and Data

Page 30: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

The Performance – Motivations for SSD on Ceph
• Storage providers are struggling to achieve the required high performance
• There is a growing trend for cloud providers to adopt SSDs
  – CSPs who want to build an EBS-like service for their OpenStack-based public/private cloud
• Strong demand to run enterprise applications, such as OLTP workloads, on Ceph; a high-performance, multi-purpose Ceph cluster is a key advantage
• Performance is still an important factor, and SSD prices continue to decrease

Page 31: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Three Configurations for a Ceph Storage Node

Standard/Good: NVMe/PCIe SSD for journal + caching, HDDs as OSD data drives. Example: 1x Intel P3700 1.6TB as journal + Intel iCAS caching software + 12 HDDs.

Better (best TCO as of today): NVMe/PCIe SSD as journal + high-capacity SATA SSDs as data drives. Example: 1x Intel P3700 800GB + 4x Intel S3510 1.6TB (journal placement is sketched after the node tables below).

Best performance: all NVMe/PCIe SSDs. Example: 4x Intel P3700 2TB SSDs.

Ceph storage node – Good
CPU: Intel(R) Xeon(R) CPU E5-2650 v3
Memory: 64 GB
NIC: 10GbE
Disks: 1x 1.6TB P3700 + 12x 4TB HDDs (1:12 ratio); P3700 as journal and caching
Caching software: Intel iCAS 3.0 (option: RSTe/MD4.3)

Ceph storage node – Better
CPU: Intel(R) Xeon(R) CPU E5-2690
Memory: 128 GB
NIC: Dual 10GbE
Disks: 1x 800GB P3700 + 4x S3510 1.6TB

Ceph storage node – Best
CPU: Intel(R) Xeon(R) CPU E5-2699 v3
Memory: >= 128 GB
NIC: 2x 40GbE, 4x dual 10GbE
Disks: 4x P3700 2TB
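To make the journal placement concrete, a sketch of preparing one OSD in the "Better" configuration with ceph-disk, the provisioning tool of the Hammer/Jewel era this deck targets; the device names are hypothetical:

```
# S3510 SATA SSD as the OSD data device, journal placed on the NVMe P3700
ceph-disk prepare --cluster ceph /dev/sdb /dev/nvme0n1
ceph-disk activate /dev/sdb1
```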

Page 32: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Intel® NVMe SSD + CAS Accelerates Ceph for Yahoo!

CHALLENGE:
• How can Ceph be used as a scalable storage solution? (High latency and low throughput due to erasure coding, double writes, and a huge number of small files)
• Using over-provisioning to address performance is costly.

SOLUTION:
• Intel® NVMe SSD – consistently amazing
• Intel® CAS 3.0 feature – hinting
• Intel® CAS 3.0 fine-tuned for Yahoo! – cache metadata

YAHOO! PERFORMANCE GOAL: 2x throughput, 1/2 latency

COST REDUCTION:
• CapEx savings (over-provisioning), OpEx savings (power, space, cooling, …)
• Improved scalability planning (performance and predictability)

[Diagram: user → web server client → Ceph cluster]

Page 33: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration


Yahoo! (Ceph obj) - Results

Page 34: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

All-SSD “Better” Configuration – NVMe/PCIe + SATA SSDs

Test environment

5x client nodes:
• Intel® Xeon™ processor E5-2699 v3 @ 2.3GHz, 64GB memory
• 10Gb NIC

5x storage nodes:
• Intel® Xeon™ processor E5-2699 v3 @ 2.3GHz
• 128GB memory
• 1x 1TB HDD for OS
• 1x Intel® SSD DC P3700 800GB for journal (U.2)
• 4x 1.6TB Intel® SSD DC S3510 as data drives
• 2 OSD instances on each S3510 SSD

[Topology: five FIO client nodes (CLIENT 1–5, 1x 10Gb NIC each) drive five Ceph storage nodes (CEPH1–CEPH5, 2x 10Gb NIC each), each running OSD1–OSD8, with CEPH1 also hosting the MON]

Note: Refer to backup for detailed test configuration for hardware, Ceph, and testing scripts.

Page 35: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

All-SSD (NVMe + SATA SSD) Configuration – Breaking a Million IOPS

1.08M IOPS for 4K random read and 144K IOPS for 4K random write, with tunings and optimizations.

[Chart: random read performance, RBD # scale test – IOPS vs latency (ms)]
• 1.08M 4K random read IOPS @ 3.4ms
• 500K 8K random read IOPS @ 8.8ms
• 300K 16K random read IOPS @ 10ms
• 63K 64K random read IOPS @ 40ms

[Chart: random write performance, RBD # scale test – IOPS vs latency (ms)]
• 144K 4K random write IOPS @ 4.3ms
• 132K 8K random write IOPS @ 4.1ms
• 88K 16K random write IOPS @ 2.7ms
• 23K 64K random write IOPS @ 2.6ms

Excellent random read performance and acceptable random write performance.

Page 36: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Comparison: SSD Cluster vs. HDD Cluster (both with journals on PCIe/NVMe SSD)

• 4K random write: a ~58x larger HDD cluster (~2,320 HDDs) is needed to get the same performance
• 4K random read: a ~175x larger HDD cluster (~7,024 HDDs) is needed to get the same performance

All-SSD Ceph provides the best TCO (both CapEx and OpEx): not only performance, but also space, power, failure rate, etc.

Client nodes: 5 nodes with Intel® Xeon™ processor E5-2699 v3 @ 2.30GHz, 64GB memory; OS: Ubuntu Trusty.
Storage nodes: 5 nodes with Intel® Xeon™ processor E5-2699 v3 @ 2.30GHz, 128GB memory; Ceph version 9.2.0; OS: Ubuntu Trusty; 1x P3700 SSD for journal per node.
Cluster difference: SSD cluster has 4x S3510 1.6TB for OSDs per node; HDD cluster has 14x SATA 7200RPM HDDs as OSDs per node.

Page 37: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

The Performance – Observations and Tunings

• LevelDB became the bottleneck: single-threaded LevelDB pushed one core to 100% utilization
• Omap overhead: among the 3800+ threads, on average ~47 threads are running; ~10 pipe threads and ~9 OSD op threads are running, and most OSD op threads are asleep (top -H)
• OSD op threads are waiting for the FileStore throttle to be released; disabling omap operations speeds up release of the FileStore throttle, which puts more OSD op threads in the running state (average ~105 threads running) and improves throughput by 63%
• High CPU consumption: 70% CPU utilization of two high-end Xeon E5 v3 processors (36 cores) with 4 S3510s; perf showed that the most CPU-intensive functions are malloc, free, and other system calls

*Bypass omap: ignore object_map->set_keys in FileStore::_omap_setkeys, for tests only

Page 38: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

The Performance – Tunings

• Up to 16x performance improvement for 4K random read, peak throughput 1.08M IOPS
• Up to 7.6x performance improvement for 4K random write, 140K IOPS

4K random read tunings:
• Default: single OSD
• Tuning-1: 2 OSD instances per SSD
• Tuning-2: Tuning-1 + debug = 0
• Tuning-3: Tuning-2 + jemalloc
• Tuning-4: Tuning-3 + read_ahead_size = 16
• Tuning-5: Tuning-4 + osd_op_thread = 32
• Tuning-6: Tuning-5 + rbd_op_thread = 4

4K random write tunings:
• Default: single OSD
• Tuning-1: 2 OSD instances per SSD
• Tuning-2: Tuning-1 + debug = 0
• Tuning-3: Tuning-2 + op_tracker off, tuned fd cache
• Tuning-4: Tuning-3 + jemalloc
• Tuning-5: Tuning-4 + RocksDB to store omap

(Several of these knobs are sketched as ceph.conf options below.)

[Chart: normalized 4K random read and write performance across Default and Tuning-1 through Tuning-6]
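Several of the read/write tunings above map onto ceph.conf options. A sketch, with option spellings taken from Hammer/Jewel-era documentation and illustrative values; the deck itself gives only the knob names:

```
[osd]
debug_osd = 0/0
debug_filestore = 0/0
osd_enable_op_tracker = false      # "op_tracker off"
filestore_fd_cache_size = 2048     # tuned fd cache (illustrative value)
osd_op_threads = 32

[client]
rbd_op_threads = 4
rbd_readahead_max_bytes = 4194304  # readahead tuning (illustrative; the deck's read_ahead_size may refer to a different layer)
```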

Page 39: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Ceph Storage Cluster – Best Configuration: All NVMe SSDs

• OSD system config: Intel Xeon E5-2699 v3 2x @ 2.30GHz, 72 cores w/ HT, 96GB, cache 46080KB, 128GB DDR4
• Each system with 4x P3700 800GB NVMe, each partitioned into 4 OSDs, 16 OSDs total per node (partitioning sketched below)
• FIO client systems: Intel Xeon E5-2699 v3 2x @ 2.30GHz, 72 cores w/ HT, 96GB, cache 46080KB, 128GB DDR4
• Ceph v0.94.3 Hammer release, CentOS 7.1, 3.10-229 kernel, linked with JEMalloc 3.6
• CBT used for testing and data acquisition; Zabbix for monitoring
• Single 10GbE network (Ceph network 192.168.142.0/24) for client and replication data transfer, replication factor 2

[Topology: FIO RBD clients on two FatTwin chassis (4x dual-socket Xeon E5 v3 each) plus a CBT/Zabbix/monitoring node, driving five SuperMicro 1028U OSD nodes (Intel Xeon E5 v3 18-core CPUs, Intel P3700 NVMe PCIe flash, easily serviceable NVMe drives), each node running Ceph OSD1–OSD16 across NVMe1–NVMe4]
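A sketch of how each P3700 might be split into four OSD data partitions as described above, using the standard sgdisk GPT tool; the device name and partition size are hypothetical:

```
# Split one 800GB P3700 into 4 roughly equal partitions, one per OSD
sgdisk --zap-all /dev/nvme0n1
for i in 1 2 3 4; do
  sgdisk --new=${i}:0:+185G --change-name=${i}:"ceph-osd-${i}" /dev/nvme0n1
done
```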

Page 40: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

All-NVMe Configuration – Million IOPS, Low Latency

First Ceph cluster to break 1 million 4K random IOPS, with ~1ms response time.

[Chart: IO-depth scaling – latency vs IOPS for 100% 4K random read, 100% 4K random write, and 70/30% 4K random OLTP]
• 1M 100% 4K random read IOPS @ ~1.1ms
• 400K 70/30% (OLTP) 4K random IOPS @ ~3ms
• 171K 100% 4K random write IOPS @ 6ms

• OSD system config: Intel Xeon E5-2699 v3 2x @ 2.30GHz, 72 cores w/ HT, 96GB, cache 46080KB, 128GB DDR4
• Each system with 4x P3700 800GB NVMe, each partitioned into 4 OSDs, 16 OSDs total per node
• FIO client systems: Intel Xeon E5-2699 v3 2x @ 2.30GHz, 72 cores w/ HT, 96GB, cache 46080KB, 128GB DDR4
• Ceph v0.94.3 Hammer release, CentOS 7.1, 3.10-229 kernel, linked with JEMalloc 3.6
• CBT used for testing and data acquisition
• Single 10GbE network for client and replication data transfer, replication factor 2

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.

Page 41: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

3D NAND and 3D XPoint™ for Ceph Tomorrow

Page 42: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

NAND Flash and 3D XPoint™ Technology for Ceph Tomorrow

• 3D MLC and TLC NAND: building block enabling expansion of SSDs into HDD segments
• 3D XPoint™: building blocks for ultra-high-performance storage and memory

Page 43: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

3D XPoint™ Technology: In Pursuit of Large Memory Capacity … Word Access … Immediately Available …

• Word (cache line) access – crosspoint structure: selectors allow dense packing and individual access to bits
• Large memory capacity – crosspoint and scalable: memory layers can be stacked in a 3D manner
• NVM breakthrough material advances: compatible switch and memory cell materials
• Immediately available, high performance: cell and array architecture that can switch states 1000x faster than NAND

Page 44: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

3D XPoint™ Technology Breaks the Memory/Storage Barrier

[Memory/storage hierarchy, latency and size of data relative to SRAM:]
• SRAM (memory): latency 1X, size of data 1X
• DRAM (memory): latency ~10X, size of data ~100X
• 3D XPoint™ memory media: latency ~100X, size of data ~1,000X
• NAND SSD (storage): latency ~100,000X, size of data ~1,000X
• HDD (storage): latency ~10 millionX, size of data ~10,000X

Technology claims are based on comparisons of latency, density and write cycling metrics amongst memory technologies recorded on published specifications of in-market memory products against internal Intel specifications.

Page 45: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Intel® Optane™ (prototype) vs Intel® SSD DC P3700 Series at QD=1

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase.  For more complete information about performance and benchmark results, visit http://www.intel.com/performance. Server Configuration: 2x Intel® Xeon® E5 2690 v3 NVM Express* (NVMe) NAND based SSD: Intel P3700 800 GB, 3D Xpoint based SSD: Optane NVMe OS: Red Hat* 7.1

Page 46: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

3D XPoint™ Opportunities – BlueStore Backend

Three usages for a PMEM device:
• Backend of BlueStore: raw PMEM block device, or a file on a DAX-enabled FS
• Backend of RocksDB: raw PMEM block device, or a file on a DAX-enabled FS
• Backend of RocksDB's WAL: raw PMEM block device, or a file on a DAX-enabled FS

Two methods for accessing PMEM devices:
• libpmemblk
• mmap + libpmemlib

[Diagram: BlueStore data and metadata, and RocksDB via BlueFS, sit on PMEMDevices; access is either through the libpmemblk API or through files on a DAX-enabled file system via mmap load/store with libpmemlib]

Page 47: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

3D NAND – Ceph Cost-Effective Solution

Enterprise-class, highly reliable, feature-rich, and cost-effective AFA solution. NVMe SSD is today's SSD, and 3D NAND or TLC SSD is today's HDD:
– NVMe as journal, high-capacity SATA SSD or 3D NAND SSD as data store
– Provides high performance and high capacity in a more cost-effective solution
– 1M 4K random read IOPS delivered by 5 Ceph nodes
– Cost effective: it would take 1,000 HDD Ceph nodes (10K HDDs) to deliver the same throughput
– High capacity: 100TB in 5 nodes
– With special software optimization on the FileStore and BlueStore backends

[Diagram: today's Ceph node – 1x P3700 M.2 800GB journal + 4x S3510 1.6TB data drives; tomorrow's Ceph node – P3700 and 3D XPoint™ SSDs for journal + P3520 4TB 3D NAND data drives]

Page 48: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration


Legal Notices and Disclaimers

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer.

No computer system can be absolutely secure.

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.  

Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings.  Circumstances will vary.  Intel does not guarantee any costs or cost reduction.

This document contains information on products, services and/or processes in development.  All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings, including the annual report on Form 10-K.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.

Intel, Xeon and the Intel logo are trademarks of Intel Corporation in the United States and other countries.

*Other names and brands may be claimed as the property of others.

© 2015 Intel Corporation.

Page 49: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Legal Information: Benchmark and Performance Claims Disclaimers

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. Test and System Configurations: See Back up for details. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.

Page 50: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Risk FactorsThe above statements and any others in this document that refer to plans and expectations for the first quarter, the year and the future are forward-looking statements that involve a number of risks and uncertainties. Words such as "anticipates," "expects," "intends," "plans," "believes," "seeks," "estimates," "may," "will," "should" and their variations identify forward-looking statements. Statements that refer to or are based on projections, uncertain events or assumptions also identify forward-looking statements. Many factors could affect Intel's actual results, and variances from Intel's current expectations regarding such factors could cause actual results to differ materially from those expressed in these forward-looking statements. Intel presently considers the following to be important factors that could cause actual results to differ materially from the company's expectations. Demand for Intel’s products is highly variable and could differ from expectations due to factors including changes in the business and economic conditions; consumer confidence or income levels; customer acceptance of Intel’s and competitors’ products; competitive and pricing pressures, including actions taken by competitors; supply constraints and other disruptions affecting customers; changes in customer order patterns including order cancellations; and changes in the level of inventory at customers. Intel’s gross margin percentage could vary significantly from expectations based on capacity utilization; variations in inventory valuation, including variations related to the timing of qualifying products for sale; changes in revenue levels; segment product mix; the timing and execution of the manufacturing ramp and associated costs; excess or obsolete inventory; changes in unit costs; defects or disruptions in the supply of materials or resources; and product manufacturing quality/yields. Variations in gross margin may also be caused by the timing of Intel product introductions and related expenses, including marketing expenses, and Intel’s ability to respond quickly to technological developments and to introduce new features into existing products, which may result in restructuring and asset impairment charges. Intel's results could be affected by adverse economic, social, political and physical/infrastructure conditions in countries where Intel, its customers or its suppliers operate, including military conflict and other security risks, natural disasters, infrastructure disruptions, health concerns and fluctuations in currency exchange rates. Results may also be affected by the formal or informal imposition by countries of new or revised export and/or import and doing-business regulations, which could be changed without prior notice. Intel operates in highly competitive industries and its operations have high costs that are either fixed or difficult to reduce in the short term. The amount, timing and execution of Intel’s stock repurchase program and dividend program could be affected by changes in Intel’s priorities for the use of cash, such as operational spending, capital spending, acquisitions, and as a result of changes to Intel’s cash flows and changes in tax laws. Product defects or errata (deviations from published specifications) may adversely impact our expenses, revenues and reputation. Intel’s results could be affected by litigation or regulatory matters involving intellectual property, stockholder, consumer, antitrust, disclosure and other issues. 
An unfavorable ruling could include monetary damages or an injunction prohibiting Intel from manufacturing or selling one or more products, precluding particular business practices, impacting Intel’s ability to design its products, or requiring other remedies such as compulsory licensing of intellectual property. Intel’s results may be affected by the timing of closing of acquisitions, divestitures and other significant transactions. A detailed discussion of these and other factors that could affect Intel’s results is included in Intel’s SEC filings, including the company’s most recent reports on Form 10-Q, Form 10-K and earnings release.

Rev. 1/15/15

Page 51: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration
Page 52: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Ceph Benchmark/Deployment/Tuning Tool: CeTune

• CeTune controller – reads configuration files and controls the process to deploy, benchmark, and analyze the collected data
• CeTune workers – controlled by the CeTune controller; act as workload generators and system metrics collectors

Page 53: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration

Intel® Virtual Storage Manager

• Management framework = consistent configuration
• Operator-friendly interface for management & monitoring

[Diagram: Operator → Internet → data center firewall → VSM controller (controller node server); VSM agent on each Ceph node (cluster servers running OSDs/Monitor/MDS); client server / OpenStack Nova controller server(s); connections labeled API, SSH, and HTTP/socket]

Page 54: Using Recently Published Ceph Reference Architectures to Select Your Ceph Configuration