
Intel Builder’s Conference - NetApp

John Meneghini – Data ONTAP NVMe-oF Target Architect

Madhu Pai – Data ONTAP NVMe-oF Transport Architect

April 21, 2017

V1.2

© 2017 NetApp, Inc. All rights reserved.

Introduction

1) Data ONTAP SAN Engineering
   • FC & iSCSI Transport engineering teams
   • SCSI Target protocol engineering team
   • QA and Host Interop teams

2) Active at T10, T11 and NVMexpress.org
   • Many TPARs, TPs, and ECNs
   • Many T10 and T11 proposals

3) Adding support for NVMe-oF to Data ONTAP
   • For more information see: http://www.netapp.com/us/media/wp-7248.pdf

4) SPDK provides the following that NetApp wants to leverage
   • NVMe-oF virtual Target & Initiator protocol engine
   • RDMA (and FC-NVMe) transports
   • Libraries, unit tests, scripts and tools



What NetApp Wants To Use

[Architecture diagram: the SPDK component stack, with a legend distinguishing current from future components.
 • Storage Protocols: iSCSI Target, NVMe-oF* Target, vhost-scsi Target, vhost-blk Target, SCSI
 • Storage Services: Block Device Abstraction (BDEV) — Ceph RBD, Linux Async IO, Blob bdev, NetApp bdev, Virtual NVMe; Object Integration — RocksDB, Ceph
 • Drivers: NVMe (NVMe Devices, NVMe* PCIe Driver), NVMe-oF* Initiator, Intel® QuickData Technology Driver
 • Core: Application Framework
 • Libraries: bdev, json, event, copy, conf, nvmf, trace, log, util]


What Got My Attention

• 2013: SPDK starts as an Intel® internal project
• Sept 2015: nvme driver on GitHub
• Jan 2016: first external contributor
• Jun 2016: NVMe-oF* Target
• Apr 2017: first SPDK Summit

NetApp Likes/Dislikes

1) SPDK Libraries, Modules and APIs (Likes)
   • The BDAL API is what made SPDK work for NetApp
   • NetApp likes the modularity and APIs in SPDK
   • Improved modularity and APIs make it even better

2) DPDK and /usr/lib dependencies (Dislikes)
   • The DPDK environment doesn't work for our application (Data ONTAP)
   • Expand the SPDK EAL (env_dpdk) to abstract DPDK dependencies
   • Abstract dependencies on POSIX

3) Threading model (Dislikes)
   • Would like a flexible, dynamic threading model supported by SPDK

4) Management plane (Dislikes)
   • Would like to improve the NVMe-oF management APIs

5) Tired of chasing the tip of master, not knowing what will show up next


NetApp’s Vision for SPDK

1) Open source library that supports enterprise applications
   • Well defined APIs that support a variety of use cases
   • Support for multiple platforms and execution environments
     – Don’t even assume user space
   • Enterprise-class Reliability, Availability and Supportability

2) Community collaboration
   • Better distribution lists (e.g. separate lists for code reviews)
   • Public bug reporting and code reviews
   • Shared test environment and automation
   • Feature branch development

3) Governance
   • True open source project governed by community members
   • Feature roadmaps and schedules


NetApp’s Vision for SPDK - Simplified

[Diagram: development effort split along two axes — Core vs. Value-Add functionality, and Shared vs. Proprietary. Core functionality is shared (SPDK); value-add functionality remains proprietary.]

Agenda

• Platform Abstraction

• NVMe-oF Transport Improvements

• Support for NVM Protocol Features

• NVMe-oF Management APIs

• Enterprise Readiness with RAS

• NVMe-oF Target Threading Model


Platform Abstraction Improvements

1) Abstract dependencies on DPDK
   • Improvements to env.h and env_dpdk/env.c
   • Not all modules affected (e.g. vhost)

2) Abstract dependencies on POSIX APIs and user libs
   • Pthreads abstracted
   • /usr/include, /usr/lib abstracted

3) Makefile improvements
   • Better compile tool chain support
     – Support for: armv8a, dpaa2, thunderx, xgene1, power8
     – Support for different compilers
   • Optional build targets
     – Only build the libraries and applications I want
   • Cf. dpdk/config
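The abstractions in 1) and 2) amount to routing every platform service through an indirection layer the application supplies at init time. A minimal sketch under that assumption — the names (`env_ops`, `env_zmalloc`, `posix_env_ops`) are illustrative, not the real env.h interface:

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <string.h>

/* The ops table the application provides; a DPDK build would point
 * these at rte_zmalloc()/rte_free(), Data ONTAP at its own allocator. */
struct env_ops {
    void *(*zmalloc)(size_t size, size_t align);
    void  (*free)(void *buf);
};

static const struct env_ops *g_env;

void env_init(const struct env_ops *ops) { g_env = ops; }
void *env_zmalloc(size_t size, size_t align) { return g_env->zmalloc(size, align); }
void env_free(void *buf) { g_env->free(buf); }

/* A plain-POSIX backend: aligned, zeroed allocations with no DPDK. */
static void *posix_zmalloc(size_t size, size_t align)
{
    void *buf = NULL;
    if (align < sizeof(void *))
        align = sizeof(void *);
    if (posix_memalign(&buf, align, size) != 0)
        return NULL;
    memset(buf, 0, size);
    return buf;
}

const struct env_ops posix_env_ops = { posix_zmalloc, free };
```

Library code then calls only `env_*` functions, so the same object code links against either backend.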



FC-NVMe and RDMA Transports

1) Interested in SPDK FC-NVMe and RDMA Transports

2) Changes to struct spdk_nvmf_transport expected
   • FC-NVMe and RDMA differences
   • Feature branch desired

3) Changes to SGL infrastructure expected
   • Some applications require more than 2 SGL entries

4) RDMA Transport improvements
   • Verbs API in rdma.c to abstract user OFED libraries
   • Cf. Linux NVMe-oF RDMA layering
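A sketch of what relaxing the two-entry limit in 3) might look like: a variable-length scatter-gather list with a length helper. The types (`sg_entry`, `sg_list`) are illustrative, not the actual SPDK request layout:

```c
#include <stddef.h>
#include <stdint.h>

/* One contiguous data segment. */
struct sg_entry {
    void   *base;
    size_t  len;
};

/* A request's scatter-gather list, grown from the former fixed pair
 * to an array; nents says how many entries are valid. */
struct sg_list {
    uint16_t        nents;
    struct sg_entry ent[16];
};

/* Total payload length across all valid entries. */
size_t sg_total_len(const struct sg_list *sgl)
{
    size_t total = 0;
    for (uint16_t i = 0; i < sgl->nents; i++)
        total += sgl->ent[i].len;
    return total;
}
```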



NVM Protocol Improvements (Virtual Target)

1) Additional NVM Commands
   • Abort and Identify command improvements
   • Persistent Reservations
   • Fused operations: Compare and Write

2) Additional Controller features
   • Scaling Controllers, QPs and Namespaces
   • Support for static Controllers
   • Namespace sharing
   • Namespace mapping
   • Submission Queue flow
     – WRR with Urgent Priority Arbitration

3) In-band Namespace Management
   • NVMe Specification improvements
   • Namespace Management and Attach command support


NVMe-oF Improvements

Objectives:
• Complete SPDK EAL
• Add POSIX abstractions
• Add FC-NVMe Transport
• Add FCT and Verbs API
• Develop NVM Protocol

[Architecture diagram: NVMe-oF Target and Initiator (Storage Protocols) over RDMA and FC-NVMe transports, with Direct and Virtual NVM Protocol engines; a User Verbs/Verbs API layer abstracting OFED and an FCT API abstracting the FC driver; storage through the Block Device Abstraction Layer (BDAL) plus a new BDAL Extension Module (Storage Services); the current SPDK EAL and POSIX abstractions (DPDK EAL + POSIX libraries); UIO, RNIC and FC hardware drivers over HCA/HBA hardware on a Linux kernel (RHEL 7, Debian 8); new components marked.]


Management Plane Improvements (OOB)

1) Support for administratively configuring Subsystems and Controllers
   • Add a Subsystem with no Subsystem Ports or Namespaces
   • Start a Subsystem or stop a Subsystem (forces disconnect)
   • Configure number of Controllers per Subsystem
   • Add and remove Hosts from a Subsystem (dynamic discovery service)

2) Support for administratively configuring Namespaces
   • Out-of-band Namespace Management
   • Mapping NVMe-oF Hosts to NVMe Namespaces (ACL)
   • Support both private and shared Namespaces (ACL)
   • Adding and removing Namespaces (AEN support)

3) Support for administratively configuring Subsystem Ports
   • Subsystem port online/offline (disconnect)
   • Subsystem port add/remove
   • Dynamic updates to Discovery Service
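The host-to-namespace mapping in 2) is essentially an access-control list consulted when a host tries to attach: a private namespace lists one host NQN, a shared one lists several. A hedged sketch with hypothetical types (`ns_acl`), not the SPDK management surface:

```c
#include <string.h>

#define MAX_HOSTS_PER_NS 8

/* Per-namespace ACL: the host NQNs allowed to attach.
 * nhosts == 1 makes the namespace private; > 1 makes it shared. */
struct ns_acl {
    const char *hostnqn[MAX_HOSTS_PER_NS];
    int         nhosts;
};

/* Returns 1 if the given host NQN may attach this namespace. */
int ns_acl_allows(const struct ns_acl *acl, const char *hostnqn)
{
    for (int i = 0; i < acl->nhosts; i++)
        if (strcmp(acl->hostnqn[i], hostnqn) == 0)
            return 1;
    return 0;
}

/* Out-of-band management op: grant a host access to the namespace.
 * In a full target this would also raise an AEN toward affected hosts. */
int ns_acl_add_host(struct ns_acl *acl, const char *hostnqn)
{
    if (acl->nhosts >= MAX_HOSTS_PER_NS)
        return -1;
    acl->hostnqn[acl->nhosts++] = hostnqn;
    return 0;
}
```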



SPDK and DPDK Reliability, Availability, Supportability

1) Improve logging (log.c)
   • Well defined APIs
   • Implementation needs to be abstracted (e.g. with a constructor module)

2) Improve tracing (trace.c)
   • Well defined APIs
   • Implementation needs to be abstracted

3) First Failure Data Capture (FFDC)
   • Trigger and dump traces and logs on error or exception

4) Add performance counters and histograms
   • Counters in the perf path
   • Programmable, not always on

5) Better error handling
   • Don’t panic or abort() on error
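Items 1), 2) and 5) share a theme: a library should hand diagnostics to the application rather than write to stderr or abort() on its behalf. A sketch of a pluggable log sink under that assumption — names (`log_set_sink`, `log_msg`) are illustrative, not the current spdk_log interface:

```c
#include <stdarg.h>
#include <stdio.h>

/* The application installs a sink; the library only formats and forwards. */
typedef void (*log_sink_fn)(int level, const char *msg);

static void default_sink(int level, const char *msg)
{
    fprintf(stderr, "[%d] %s\n", level, msg);
}

static log_sink_fn g_sink = default_sink;

void log_set_sink(log_sink_fn sink)
{
    g_sink = sink ? sink : default_sink;
}

/* printf-style entry point used throughout the library. */
void log_msg(int level, const char *fmt, ...)
{
    char buf[256];
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    g_sink(level, buf);
}
```

An enterprise application like Data ONTAP would point the sink at its own logging and FFDC machinery, so a library error can trigger a trace dump instead of a process exit.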



SPDK Threading Model

1) Hybrid Polling
   • Implemented in the Linux 4.10 kernel

2) SPDK libraries should be thread-model independent
   • Different applications will use different threading models

3) Changes to the NVMe-oF Target threading model
   • Scaling threads in an NVMe-oF Target Subsystem
     – More than 1 thread per Subsystem
   • Dynamic thread association with Controllers
     – Based upon dynamic controller creation
   • Dynamic thread association with Queue Pairs
     – As QPs scale, threads scale
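The hybrid-polling idea cited in 1) is to sleep for roughly half the observed mean completion time before starting to busy-poll, saving cycles while giving up little latency. A sketch of the bookkeeping, under that description:

```c
#include <stdint.h>

/* Running statistics over observed IO completion times. */
struct poll_stats {
    uint64_t total_ns;   /* sum of observed completion times */
    uint64_t samples;
};

void poll_stats_update(struct poll_stats *st, uint64_t completion_ns)
{
    st->total_ns += completion_ns;
    st->samples++;
}

/* How long to sleep before busy-polling for the next completion:
 * half the mean observed completion time, or 0 with no history
 * (i.e. fall back to pure polling until stats accumulate). */
uint64_t hybrid_sleep_ns(const struct poll_stats *st)
{
    if (st->samples == 0)
        return 0;
    return (st->total_ns / st->samples) / 2;
}
```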


SPDK Threading Model (Limitations)

1) The number of threads is static. This wastes cores and cycles if we pre-provision the NVMe-oF threads.

2) There is no flexible thread model with a dynamic association between threads and their work.

3) The model of binding a subsystem to a thread does not scale in our environment.

4) The queuing architecture of the FC hardware is not well suited to dynamic creation and deletion of queue pairs as done in RDMA.


SPDK Threading Model (Requirements)

1) Dynamic threading model that activates and quiesces threads.

2) Dynamically create an NVMe-oF Subsystem and associate it with a thread.

3) Break the affinity between a subsystem’s IO traffic and a single thread.

4) Support the same lockless semantics.

5) Enhancements should be compatible with future releases of SPDK.


SPDK Threading Model (Terminology/Extensions)

1) A Hardware Queue Pair (HWQP) is the basic “unit” for polling.
   • The HWQP specifies a set of FC-NVMe queues that work together to provide the Send/Completion QP.
   • In the new model, a thread would poll one or more HWQPs.
   • An HWQP has affinity to a thread.

2) Subsystems would still have a thread affinity for event handling.
   • The thread “owning” the subsystem is the Master thread (for that subsystem).

3) IO and Admin queue pairs (IOQP/AQP) are spread across many HWQPs.

4) Threads polling the HWQPs are “Poller” threads.
   • A “Master” thread could also be a “Poller” thread (depending on the subsystem).
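The terminology above can be sketched as data structures: an HWQP is the polled unit, and a Poller thread round-robins over the set of HWQPs affined to it. Types and limits here are illustrative, not the proposed SPDK structures:

```c
/* One hardware queue pair: the basic unit for polling. */
struct hwqp {
    int id;
    int (*poll)(struct hwqp *qp);   /* reap completions; returns count */
};

#define MAX_HWQP_PER_POLLER 4

/* A Poller thread's state: the HWQPs it has affinity for. */
struct poller {
    struct hwqp *qp[MAX_HWQP_PER_POLLER];
    int          nqp;
};

/* Affine an HWQP to this Poller; -1 means it is full and the HWQP
 * should go to another Poller thread (this is how threads scale with QPs). */
int poller_add_hwqp(struct poller *p, struct hwqp *qp)
{
    if (p->nqp >= MAX_HWQP_PER_POLLER)
        return -1;
    p->qp[p->nqp++] = qp;
    return 0;
}

/* One polling pass over every HWQP this thread owns. */
int poller_run_once(struct poller *p)
{
    int reaped = 0;
    for (int i = 0; i < p->nqp; i++)
        reaped += p->qp[i]->poll(p->qp[i]);
    return reaped;
}
```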


HWQP Layout

[Diagram: HWQP layout.]

SPDK Threading Model (Operation)

1) All subsystem data is “owned” by the Master thread.

2) Poller threads process IO by polling the HWQPs (and hence the IOQPs).

3) The Master thread propagates to the Poller threads a cache of the data they need during the lifecycle of an IO.

4) Poller threads route NVMe admin (and fabric) commands to the Master thread for processing.

5) The Master thread coordinates out-of-band management commands and verifies that the required caches in the Poller threads are set up correctly.

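Step 4) implies a lockless hand-off from each Poller to the Master. One way to sketch it is a single-producer single-consumer ring per Poller/Master pair (memory barriers are noted but omitted; real code needs them):

```c
#include <stdint.h>
#include <stddef.h>

#define RING_SZ 16u   /* power of two */

/* SPSC ring: the Poller enqueues admin/fabric commands,
 * the Master dequeues them; no locks on the data path. */
struct msg_ring {
    void    *slot[RING_SZ];
    uint32_t head;   /* advanced only by the producer (Poller) */
    uint32_t tail;   /* advanced only by the consumer (Master) */
};

int ring_enqueue(struct msg_ring *r, void *msg)
{
    if (r->head - r->tail == RING_SZ)
        return -1;                        /* full: back-pressure the Poller */
    r->slot[r->head % RING_SZ] = msg;
    r->head++;                            /* real code needs a release barrier */
    return 0;
}

void *ring_dequeue(struct msg_ring *r)
{
    void *msg;

    if (r->head == r->tail)
        return NULL;                      /* empty */
    msg = r->slot[r->tail % RING_SZ];
    r->tail++;                            /* real code needs an acquire barrier */
    return msg;
}
```

Because only the Master mutates subsystem state, commands arriving on any HWQP serialize naturally at its ring without locking the data structures the Pollers read.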

SPDK Threading Model (Operation)

[Diagram: the threading model in operation.]

Thank You

Questions?
