Architected for Performance

NVMe-oF™ JBOFs

Sponsored by NVM Express® organization, the owner of NVMe™, NVMe-oF™ and NVMe-MI™ standards


JBOF Track Speakers

Bryan Cowger

Sujoy Sen

Nishant Lodha

Peter Onufryk

Fazil Osman

JBOF Session Agenda

• Market Overview

• Composable Infrastructure

• PCIe (direct-attached) JBOF

• Fabric-attached FBOF

• Management Options

• Remaining Challenges

• Q & A

Market Overview

Nishant Lodha, Marvell Semiconductor

Flash Memory Summit 2018, Santa Clara, CA

Storage Trends from all around!

WW Enterprise Storage spend growing (~$42B in 2016 -> ~$47B in 2020)

• Scale-up -> Scale-out (Hyperscale – public cloud driven by 3rd platform – mobile, social, cloud, analytics)

• ECB revenues stay flat ($25B) – Flash driving enterprise storage @ 26.2% CAGR; HDD declining @ 14.5% CAGR

Traditional storage deployment models being disrupted!

• Proprietary/siloed architectures -> Software Defined Storage (SDS)/Hyper Converged (HCI) on commodity HW

• Direct Attach Storage (DAS) -> Disaggregated storage (JBOD -> JBOF, FBOF)

Faster media necessitates new protocol, drives faster interconnects & enables new use cases

• NVMe™ will displace SCSI as the dominant block storage protocol by 2020 for AFA/CI/Scale-out

• Shared NVMe storage over a variety of Fabrics with NVMe-oF (RDMA (Eth, IB), FC, TCP)

• Emerging 3D XPoint enables storage class memory (SCM)/persistent memory (PMEM)

Cloud storage for Enterprise customers iffy!

• Cost savings questionable; Data security concerns

• Hard to migrate legacy storage; Public cloud SaaS for email/collaboration
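The overall spend figures quoted on this slide imply a much slower growth rate than the flash CAGR listed next to them. A minimal sketch of that arithmetic, assuming the two approximate endpoints (~$42B in 2016, ~$47B in 2020) and annual compounding; the function name `cagr` is illustrative and not from the deck:

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two endpoint values."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Approximate endpoints quoted on the slide, in billions of USD.
overall = cagr(42.0, 47.0, 2020 - 2016)
print(f"Overall enterprise storage spend CAGR: {overall:.1%}")
```

Under these assumptions the total market grows at a little under 3% per year, which is why the slide frames flash at 26.2% CAGR as displacing HDD spend rather than expanding the market.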

Fabrics play a key role for JBOFs -> FBOFs

[Figure: four topologies. JBOD: Hosts #1 to #3 attach over Mini-SAS HD through a SAS switch and SAS expanders to SAS disks. JBOF: Hosts #1 to #3 attach over Mini-SAS HD through a PCIe switch to SSDs. Storage head node with JBOF: HBA, NIC, CPU and DIMM, reached over FC or Ethernet, with a PCIe switch in front of the SSDs. Storage head node with FBOF: HBA, NIC, CPU and DIMM, reached over FC or Ethernet, with NVMe-oF links to a PCIe switch in front of the SSDs.]

Scaling Out NVMe™ Requires a (Real) Network

Many options, plenty of confusion, conversation beyond PCIe®

• Fibre Channel is the transport for the vast majority of today’s all flash arrays; FC-NVMe standardized in mid-2017

• RoCEv2, iWARP and InfiniBand are RDMA-based but not compatible with each other; NVMe-oF RDMA standardized in 2016

• FCoE as a fabric is an option

• NVMe over TCP is making its way through the standards

[Figure: NVMe server software sits on a server transport abstraction over Fibre Channel, InfiniBand, FCoE, RoCEv2, iWARP and TCP; a storage transport abstraction connects those transports to the NVMe SSDs.]

RDMA is Most “Considered”, Challenges Remain

Infrastructure and Skillset change required!

• RNIC Upgrade Required

• RDMA Camps

• Backward Compatibility

• Not Automatic

• Not Precise

• Congestion

• Keeping the network ‘lossless’

• RDMA/OFED expertise

• Skillset Requirements


New This Year! NVMe-oF™/TCP

Defines a TCP Transport Binding layer for NVMe-oF

Promoted by Facebook, Google, DELL EMC, Intel, Others. Sweet spots for JBOF/FBOFs

Not RDMA-based

Not yet part of the NVMe-oF standard, Likely in 2018/19

Enables adoption of NVMe-oF into existing datacenter IP network environments that are not RDMA-enabled

TCP offload required to leverage Flash potential
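Because the TCP binding runs over an ordinary IP network, attaching a host looks much like attaching over RDMA with a different transport type. A minimal sketch that only builds (and does not run) an `nvme connect` command line for the Linux nvme-cli tool. The helper name, the address and the NQN are placeholders invented for illustration, and TCP only works where the installed kernel and nvme-cli include that transport:

```python
from typing import List

# Transports this sketch handles; FC is left out because its address
# format differs from the IP-based transports.
IP_TRANSPORTS = {"rdma", "tcp"}


def build_nvme_connect(transport: str, traddr: str, nqn: str,
                       trsvcid: str = "4420") -> List[str]:
    """Return the argv for an `nvme connect` call without executing it."""
    if transport not in IP_TRANSPORTS:
        raise ValueError(f"unsupported transport for this sketch: {transport}")
    return [
        "nvme", "connect",
        "-t", transport,   # transport type
        "-a", traddr,      # target IP address
        "-s", trsvcid,     # transport service id (4420 is the registered NVMe-oF port)
        "-n", nqn,         # subsystem NVMe Qualified Name
    ]


# Hypothetical target: a documentation-range IP and a made-up NQN.
argv = build_nvme_connect("tcp", "192.0.2.10", "nqn.2018-08.example:fbof1")
print(" ".join(argv))
```

Switching the first argument to `"rdma"` is the only change needed on the host side in this sketch, which is the adoption argument the slide makes for networks that are not RDMA-enabled.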

Composable Infrastructure

Bryan Cowger, Kazan Networks

Flash Memory Summit 2018, Santa Clara, CA

Today’s “Shared Nothing” Model, a.k.a. DAS

[Figure: a server with CPU, memory, etc. and its own dedicated storage (HDDs -> SSDs).]

Challenges:

- Forces the up-front decision of how much storage to devote to each server.

- Locks in the compute:storage ratio.

Shared Nothing Model, Option A: One Model Serves All Apps

[Figure: three identical servers, each with one CPU and four SSDs. App A needs 1 SSD, App B needs 2 SSDs, App C needs 3 SSDs. Each SSD is marked as utilized or not utilized (“Dark Flash”).]

Net utilization: 6 SSDs out of 12 = 50%

Shared Nothing Model, Option B: Specialized Server Configurations

[Figure: three servers sized to their apps, with one, two and three SSDs. App A needs 1 SSD, App B needs 2 SSDs, App C needs 3 SSDs. Every installed SSD is utilized.]

Dark Flash eliminated, but limits agility and future app deployments
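The utilization figures for the two shared-nothing options follow directly from the SSD counts in the figures. A small sketch of that arithmetic, using the per-app needs from the slides (1, 2 and 3 SSDs); the function and variable names are illustrative:

```python
from typing import Sequence


def ssd_utilization(installed: Sequence[int], needed: Sequence[int]) -> float:
    """Fraction of installed SSDs that are actually used across all servers."""
    used = sum(min(have, need) for have, need in zip(installed, needed))
    return used / sum(installed)


needs = [1, 2, 3]                                  # Apps A, B and C
option_a = ssd_utilization([4, 4, 4], needs)       # one server model for all apps
option_b = ssd_utilization([1, 2, 3], needs)       # servers specialized per app
dark_flash_a = sum([4, 4, 4]) - sum(needs)         # idle SSDs under Option A

print(f"Option A: {option_a:.0%} utilized, {dark_flash_a} dark SSDs")
print(f"Option B: {option_b:.0%} utilized")
```

Option B reaches full utilization only because each server is built for exactly one app, which is the loss of agility the slide points out.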

Disaggregated Datacenter

[Figure: a pool of compute (CPUs) separated from a pool of storage (SSDs in a JBOF/FBOF).]

The Composable Datacenter

[Figure: CPUs from the pool of CPUs are paired with SSDs from the pool of storage. App A needs 1 SSD, App B needs 2 SSDs, App C needs 3 SSDs. The pool of storage holds the utilized SSDs plus spare SSDs.]

The Composable Datacenter

[Figure: CPUs from the pool of CPUs are paired with SSDs from the pool of storage. App A needs 1 SSD, App B needs 2 SSDs, App C needs 3 SSDs. The pool of storage holds the utilized SSDs plus spare SSDs.]

Spares / Expansion Pool

• Minimize Dark Flash!

• Buy them only as needed

• Power them only as needed

Other benefits

• Dynamically allocate more or less storage

• Return SSDs to Pool as apps are retired

• Upgrade SSDs independently
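The pool behaviour described on this slide (allocate on demand, return SSDs when an app retires, grow the pool only as needed) can be sketched as simple bookkeeping. This is a toy model under stated assumptions, not a real composability API: the class `SsdPool` and its methods are hypothetical, and the 12-SSD pool size is taken from the figure's six utilized plus six spare SSDs.

```python
from typing import Dict


class SsdPool:
    """Toy bookkeeping for a shared pool of SSDs in a JBOF/FBOF."""

    def __init__(self, total_ssds: int) -> None:
        self.free = total_ssds
        self.allocated: Dict[str, int] = {}

    def allocate(self, app: str, count: int) -> None:
        """Give `count` spare SSDs to `app`, failing if the pool runs dry."""
        if count > self.free:
            raise RuntimeError(f"only {self.free} spare SSDs left")
        self.free -= count
        self.allocated[app] = self.allocated.get(app, 0) + count

    def release(self, app: str) -> int:
        """Return all of a retired app's SSDs to the spare pool."""
        count = self.allocated.pop(app, 0)
        self.free += count
        return count

    def expand(self, count: int) -> None:
        """Add newly purchased SSDs to the spare pool."""
        self.free += count


pool = SsdPool(total_ssds=12)
for app, need in {"A": 1, "B": 2, "C": 3}.items():
    pool.allocate(app, need)
spares_after_allocation = pool.free     # SSDs still in the spare pool

pool.release("B")                       # App B is retired
spares_after_retirement = pool.free
print(spares_after_allocation, spares_after_retirement)
```

The spare count is the quantity an operator would watch: SSDs are bought or powered on only when it approaches zero.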


PCIe® NVMe™ JBOF

[Figure: Facebook Lightning PCIe NVMe JBOF. An NVMe host connects through a PCIe switch to multiple NVMe SSDs.]

PCIe® JBOF Enclosure Management

• Native PCIe Enclosure Management (NPEM)

  • Submitted to the PCI-SIG® Protocol Workgroup (PWG) on behalf of the NVMe™ Management Interface (NVMe-MI™) Workgroup

  • Approved by PCI-SIG on August 10th, 2017

  • Transport specific basic enclosure management

• SCSI Enclosure Services (SES) Based Enclosure Management

  • Technical proposal developed in the NVMe-MI workgroup

  • While the NVMe and SCSI architectures differ, the elements of an enclosure and capabilities to manage them are the same

  • Example enclosure elements: power supplies, fans, display or indicators, locks, temperature sensors, current sensors, voltage sensors, and ports

  • Comprehensive enclosure management for NVMe that leverages SES, a standard developed by T10 for management of enclosures using the SCSI architecture

[Figure: block diagram of an NVMe Enclosure showing power supplies, cooling objects, temperature sensors and other objects, an Enclosure Services Process, an NVM Subsystem (NVMe controller, controller management interface, management endpoint), and four slots each holding an NVMe Storage Device.]

The PCIe® Latency Advantage

[Figure: latency bars comparing media-plus-controller latency with fabric overhead for three device classes.]

• Flash SSD: Media + Controller ~80 µs, plus fabric overhead

• Next Generation NVM SSD: Media + Controller ~10 µs, plus fabric overhead

• Next Generation NVM SSD: Media + Controller ~2 µs, plus fabric overhead

Latency data from Z. Guz et al., “NVMe-over-Fabrics Performance Characterization and the Path to Low-Overhead Flash Disaggregation” in SYSTOR ’17
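The point of this slide is that a roughly constant fabric overhead becomes a larger share of total latency as the media gets faster. A small sketch of that relationship, assuming the slide's media-plus-controller figures are in microseconds. The 10 µs fabric overhead is a purely hypothetical value chosen for illustration; the slide does not state one.

```python
def fabric_share(media_us: float, fabric_us: float) -> float:
    """Fraction of end-to-end latency that is spent in the fabric."""
    return fabric_us / (media_us + fabric_us)


ASSUMED_FABRIC_OVERHEAD_US = 10.0   # hypothetical, not taken from the slide

# Media + controller latencies from the slide, in microseconds.
devices = {
    "Flash SSD": 80.0,
    "Next-gen NVM SSD (~10 us)": 10.0,
    "Next-gen NVM SSD (~2 us)": 2.0,
}

shares = {name: fabric_share(media, ASSUMED_FABRIC_OVERHEAD_US)
          for name, media in devices.items()}
for name, share in shares.items():
    print(f"{name}: fabric is {share:.0%} of total latency")
```

With any fixed overhead the same trend appears: negligible next to flash media, dominant next to the fastest next-generation media, which is the case for keeping the interconnect as short as possible.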

The PCIe® Advantage

[Figure: two data paths. Other flash storage networks: Host -> PCIe -> HBA/HCA/NIC -> Network Switch -> Network Interface -> PCIe -> NVMe SSD. PCIe fabric: Host -> PCIe -> PCIe Switch (< 200 ns) -> PCIe -> NVMe SSD.]


NVMe™ SR-IOV

Multi-Host I/O Sharing

[Figure: four NVMe hosts (1 to 4) connect through PCIe switches to a shared set of NVMe SSDs.]

Storage is Not Just About CPU I/O Anymore

• NVMe™ together with a PCIe® fabric allows direct network-to-storage and accelerator-to-storage communications

Example:

1. Data transferred from network to NVMe CMB

2. NVMe block write operation initiated from CMB to NVM

… sometime later …

3. NVMe block read operation initiated from NVM to CMB

4. GPU/Accelerator transfers data from NVMe CMB for processing
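The four steps in this example can be traced with a toy in-memory model of an SSD that exposes a Controller Memory Buffer (CMB). This is only a sketch of the data movement, assuming fixed-size blocks and slot-addressed CMB regions; the class and method names are hypothetical and do not correspond to any driver or NVMe command interface.

```python
from typing import Dict


class ToyNvmeDevice:
    """Toy model: a CMB that peers can access directly, plus NVM blocks."""

    def __init__(self, block_size: int = 4096, cmb_slots: int = 4) -> None:
        self.block_size = block_size
        self.cmb = bytearray(block_size * cmb_slots)   # peer-visible buffer
        self.nvm: Dict[int, bytes] = {}                # LBA -> stored block

    def _slot(self, slot: int) -> slice:
        start = slot * self.block_size
        return slice(start, start + self.block_size)

    def peer_write_cmb(self, slot: int, data: bytes) -> None:
        """Step 1: a peer (e.g. the NIC) places data directly in the CMB."""
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.cmb[self._slot(slot)] = data

    def block_write(self, lba: int, slot: int) -> None:
        """Step 2: block write whose source buffer is the CMB."""
        self.nvm[lba] = bytes(self.cmb[self._slot(slot)])

    def block_read(self, lba: int, slot: int) -> None:
        """Step 3: block read whose destination buffer is the CMB."""
        self.cmb[self._slot(slot)] = self.nvm[lba]

    def peer_read_cmb(self, slot: int) -> bytes:
        """Step 4: a peer (e.g. a GPU) pulls data directly from the CMB."""
        return bytes(self.cmb[self._slot(slot)])


dev = ToyNvmeDevice()
payload = b"\xab" * dev.block_size
dev.peer_write_cmb(slot=0, data=payload)   # 1. network -> CMB
dev.block_write(lba=7, slot=0)             # 2. CMB -> NVM
dev.block_read(lba=7, slot=1)              # 3. NVM -> CMB (later)
result = dev.peer_read_cmb(slot=1)         # 4. CMB -> accelerator
print(result == payload)
```

Host memory never appears in this model, which mirrors the slide's claim that the CPU is no longer on the data path.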

FBOF Architecture

Fazil Osman, Broadcom

Flash Memory Summit 2018, Santa Clara, CA

NVMe-oF™ Market

[Table: the labels SAS Replacement, Composable and Cloud Scale Out appear as headings, with Form Factor, Future and Today as axis labels. Cell alignment was lost in extraction; the entries are listed in their original order.]

• High performance

• Low latency

• Better scalability than PCIe®

• Solution for traditional Enterprise iSCSI, cluster architectures etc.

• TCP

• 24 x U.2; 30/60 x M.2

• Ruler (16/32 x 1U); Modular w/Ethernet; EDSFF, NF1; EDSFF Derivative

• IO Determinism

• Data Integrity

• Application Offload

FBOF architecture examples

[Figure: High Availability option 1. Two SoC/FPGA/ASIC controllers, each with a PCIe switch.]

HA FBOF architecture

[Figure: High Availability option 2. Two SoC/FPGA/ASIC controllers, each with a PCIe switch.]

HA FBOF architecture with redundant switches

[Figure: High Availability option 3. Storage clients with RNICs connect over 25G/50G/100G Ethernet through two Ethernet switches to Controller 1 and Controller 2 (Stingray™, SoC/FPGA/ASIC). The controllers attach over PCIe Gen3 to PCIe switches that fan out to the NVMe drives.]

FBOF high fanout architecture

[Figure: High fanout. Storage clients with RNICs connect through an Ethernet switch over 25G/50G/100G Ethernet to NVMe-oF storage arrays. In each array a Stingray (SoC/FPGA/ASIC) attaches over PCIe Gen3 to PEX9765 PCIe® switches that fan out to the NVMe drives.]

Scale Out Cloud Architecture

• 1U ruler-based designs on PCIe® attach are being introduced into the market, e.g. White River Glacier and various ODM offerings

• These designs provide high-density NVMe™ but lack scalability

• Goal is to extend the concept for cloud scale using NVMe-oF™

• Gain the scalability of fabric attach

• Simplify the design by removing the PCIe switch

[Figure: storage clients with RNICs connect through an Ethernet switch over 100G/50G/25G Ethernet to FRU modules, each with a connector and a SoC/FPGA/ASIC.]

FBOFs in the Cloud

Sujoy Sen, Intel

Flash Memory Summit 2018, Santa Clara, CA


Making FBOFs Successful in the Cloud

FBOFs in the cloud enable the composable and disaggregated use case

Success will require the following

• Network QoS (especially RDMA@scale)

• Easy to deploy and manage@scale

• Enable Scale-out Distributed Storage architectures

Ease of Use

• E2E management

  • Not just FBOFs but the hosts and the network in-between

• Cloud OS Enablement

  • Develop drivers/plug-ins for NVMe-oF™

• Bare Metal

  • Server platform and OS native support for NVMe-oF provisioning

[Figure: a Fabric Manager provisions the host, manages the FBOF and configures the fabric; an NVMe-oF layer connects Host and FBOF.]

Drive standards-based management eco-system

Scale-Out Distributed Storage

• Blast Radius and Failure Domains

  • Soft vs hard error handling

  • Single Point-of-Failure avoidance

• Partitioning of Data Services between storage node and FBOF, e.g.

  • Data Layout and Media Management

  • Replication/HA

  • Data Compression and Security

• Distributed storage-aware NVMe-oF™

  • Cluster-aware protocol enhancements

[Figure: distributed storage nodes connected to FBOFs, with “NVMe-oF + Data Services” shown at two points in the stack.]

Key Takeaways

• JBOF / FBOF represents a key building block for NVMe™ based datacenters

• Two options:

  • PCIe® Direct Connect JBOFs

    • Lowest Latency

    • Limited Scale / Distance

  • Fabric Attached FBOFs

    • Scale at the levels of FC or Ethernet

    • Additional latency, networking / fabric bandwidth

• Manageability represents new opportunities and challenges


Contact Information

For more information please contact the following:

Fazil Osman [email protected]

Nishant Lodha [email protected]

Sujoy Sen [email protected]

Bryan Cowger [email protected]

Peter Onufryk [email protected]
