Copyright © 2019. All Rights Reserved. | www.fungible.com 1

The Case for Disaggregated Storage

About the Author

Benny Siman-Tov, VP of Product & Business Development

Benny Siman-Tov is the Vice President of Product & Business Development at Fungible, responsible for managing Fungible's product portfolio and developing strategic GTM partnerships. Benny was previously Director of Product Management at Intel's Programmable Solutions Group, where he developed multiple successful products, including the first FPGA virtualization framework for the cloud. Before that, he was Director of Strategic Planning at Altera, focusing on cloud and communications. Benny's career includes co-founding MobileCore Networks - the first multi-generational, on-ramp-integrated, multi-service wireless platform - and ProQuent Systems, a GPRS/3G wireless data company.

Benny holds six patents, a Bachelor of Science in Electrical Engineering from the Israel Institute of Technology, and a Master of Science from Tel Aviv University.

Whitepaper

Data storage has always been an important IT consideration. Data center operators have consistently sought to increase storage performance and capacity and to meet increasingly stringent reliability, availability and security requirements - all while keeping costs in check. These needs are further accelerated by the growth of big-and-fast data workloads such as AI and analytics.

Over the years, innovations in storage media, protocols and software have allowed operators to keep pace with the growing demand for faster, more reliable and more efficient storage. Some innovations provide such significant value that a new paradigm is set in motion. The advent of modern high-performance Flash SSDs is one such example: an invention that offers a true step-function improvement in the performance and capacity of a key data center building block. This was a technology that touted read/write speeds thousands of times faster than traditional HDDs and had the potential to change the future of data center storage architecture.

In fact, Flash was flippantly referred to as problematically fast - so fast that one could contemplate running applications with stringent latency requirements - previously served only by local Direct-Attached Storage - on a disaggregated storage architecture. (Note: the concept of disaggregating storage is not new. See Figure 1 on page 3.)

This white paper has three main goals: to examine the limitations of existing storage architectures, to discuss the benefits of a disaggregated storage architecture, and to provide a view into how Fungible's Data Processing Unit (DPU) can hasten the deployment of high-performance, reliable, secure and efficient disaggregated storage solutions for organizations of all sizes.

Flash at the Speed of Flash?

Very quickly though, reality bites. In the real world, Flash storage rarely performs close to its theoretical limits. Why?

An initial factor limiting Flash-based SSD performance was the protocols used to access the storage media. Early SSDs leveraged SATA/SAS protocols that were designed around spinning HDDs and could not take advantage of the fast media. When NVMe and, eventually, NVMe over Fabrics were introduced, these new standards enabled faster, low-latency, parallel paths to the media, offering significantly faster access to storage.

These newer storage protocols, in turn, exposed new bottlenecks elsewhere in the architecture - specifically in the storage stack. It is not unusual to see
hyperconverged platforms maxing out at 100K input-output operations per second (IOPS) on NVMe Flash drives rated for 750K IOPS or higher. Thus, while the costs of Flash are on the decline, the overall storage solution cannot achieve optimal IOPS/$ due to the inefficiencies in the storage stack.
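To make the cost argument concrete, here is a back-of-the-envelope sketch; the drive price is an illustrative assumption, and the IOPS figures mirror the example above:

```python
# Hypothetical numbers: a $300 SSD rated at 750K IOPS that a CPU-bound
# storage stack can only drive at 100K IOPS.
DRIVE_COST_USD = 300.0
RATED_IOPS = 750_000
DELIVERED_IOPS = 100_000

def dollars_per_million_iops(cost_usd: float, iops: float) -> float:
    """Cost of each million IOPS actually obtained from the drive."""
    return cost_usd * 1_000_000 / iops

ideal = dollars_per_million_iops(DRIVE_COST_USD, RATED_IOPS)       # 400.0
actual = dollars_per_million_iops(DRIVE_COST_USD, DELIVERED_IOPS)  # 3000.0
print(f"IOPS/$ penalty from the stack: {actual / ideal:.1f}x")     # 7.5x
```

The ratio is independent of the drive price: whatever Flash costs, a stack that delivers one-seventh of the rated IOPS makes each IOPS several times more expensive.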

The focus on improving end-to-end storage performance therefore shifts to the storage stack. Today, storage stacks are commonly implemented in software running on CPUs. In such an implementation, the CPUs in a hyperconverged server are expected to run applications while also running the storage stack and data-centric services such as compression, encryption, replication etc. The greater the number of storage services running on the CPU, the lower the achievable storage IOPS and the higher the latency. This would have been tolerable when Moore's Law was tracking a healthy curve: allocating CPUs to run the storage software stack and services is not unreasonable when you can expect to double your efficiency on a regular cadence. But with Moore's Law plateauing and demands rising exponentially, a previously accepted but compromised status quo is no longer defensible.
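The CPU ceiling can be sketched with simple arithmetic; the per-IO CPU cost below is an assumed figure for illustration, and real stacks vary widely:

```python
def max_iops_per_server(cores: int, cpu_microseconds_per_io: float) -> float:
    """Upper bound on IOPS when every IO consumes CPU time in the storage stack."""
    return cores * 1_000_000 / cpu_microseconds_per_io

# Assume 4 cores are dedicated to storage and each IO costs ~20 us of CPU time:
ceiling = max_iops_per_server(cores=4, cpu_microseconds_per_io=20)
print(f"{ceiling:,.0f} IOPS ceiling")  # 200,000 - well short of a 750K-rated SSD
```

However the assumptions are varied, the shape of the result is the same: once every IO passes through a software stack on general-purpose cores, the server's IOPS ceiling is set by CPU budget, not by the media.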

The takeaways are:

1. NVMe and NVMe-oF will be the dominant protocols that will unleash the potential of modern SSDs in the foreseeable future.

2. A legacy storage stack running on CPUs will not be able to maximize the performance (high throughput, low latencies) and cost benefits of NVMe SSDs.

With these two factors addressed, can we finally achieve Flash at the speed of Flash? Not so fast.
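As a concrete illustration of the first takeaway, a Linux host can attach a remote NVMe-oF namespace today with the standard nvme-cli tool; the transport, address and NQN below are placeholders for an environment-specific target:

```shell
# Discover subsystems exported by an NVMe-oF target (TCP transport assumed)
nvme discover -t tcp -a 192.0.2.10 -s 4420

# Connect to one of the discovered subsystems by its NQN (placeholder shown)
nvme connect -t tcp -a 192.0.2.10 -s 4420 -n nqn.2019-01.example:subsystem1

# The remote namespace now appears as a local block device (e.g. /dev/nvme1n1)
nvme list
```

Once connected, the remote SSD is addressed with the same NVMe command set as a local drive, which is what makes disaggregation over a fabric practical in the first place.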

In this paper, as we make the case for disaggregating storage, you may rightfully ask: doesn't disaggregating storage add another variable (the network!) that could further degrade storage performance? The answer is yes, absolutely. So why disaggregate storage?

Benefits of Disaggregating Storage

Highly Efficient, Scalable Storage Pooling

Many modern workloads such as ML, MapReduce etc. are storage-capacity intensive. Data sets no longer fit within the storage capacity of a single server and must be spread across a large number of servers. Modern workloads also need flexible access to data from different compute elements. For example, a microservices-based software architecture implements services that run on different compute nodes but interact with a common storage space, and high-demand content may need to be accessed simultaneously.

With Direct-Attached Storage (DAS), storage capacity is limited by the available PCIe slots in a server. While storage pooling across servers can be achieved via Software-Defined Storage (SDS), access to the SSDs is constrained by CPU bottlenecks. High, unpredictable latencies through the CPU result in complex software, data placement restrictions and, ultimately, limited scale.

By decoupling storage from compute, CPU bottlenecks and overall software complexity are reduced on both client and target storage nodes. Latencies become more uniform, resulting in simpler data placement considerations and enabling storage pooling at a larger scale.

Maximum Storage Performance with Lower Capacity Requirements

The DAS model imposes a tight compute-storage relationship, requiring data placement to be carefully considered - adding complexity in an N:1 compute-storage scenario. This model also artificially constrains the IOPS of the SSDs, limiting them to the processing power of the CPUs in the server. High-demand content must be duplicated across multiple servers, resulting in unnecessary capacity bloat.

The disaggregated model supports a more flexible compute-storage relationship, allowing multiple CPU client nodes to share the aggregated IOPS of the storage node. The reduction in software complexity allows the CPUs in the storage target to sustain higher IOPS, thereby serving more clients and reducing the need to duplicate high-demand content.
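The pooling argument can be quantified with a toy model; all capacities and the headroom factor are illustrative assumptions:

```python
import math

# 100 servers, each averaging 2 TB of data but peaking at 6 TB.
SERVERS, AVG_TB, PEAK_TB = 100, 2, 6

# DAS: every server must be provisioned for its own peak.
das_tb = SERVERS * PEAK_TB                    # 600 TB

# Disaggregated pool: size for aggregate average demand plus 25% headroom,
# since per-server peaks rarely coincide.
pool_tb = math.ceil(SERVERS * AVG_TB * 1.25)  # 250 TB

print(f"capacity saved by pooling: {1 - pool_tb / das_tb:.0%}")  # 58%
```

The exact savings depend on how bursty and how correlated the per-server demands are, but the direction is robust: the larger and more independent the workloads, the more a shared pool saves over per-server peak provisioning.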


Recognizing these benefits, hyperscalers were the first to adopt disaggregated storage architectures in response to rapidly expanding demands for scalability, especially for varied and dynamic workloads. Storage disaggregation is also now being explored and evaluated by other cloud, enterprise and telco organizations looking to scale their infrastructure and manage spend.

The benefits of disaggregating storage are clear. Let's take a look at the requirements to make it happen.

With traditional infrastructure, compute servers and storage arrays were separate appliances, connected via network and storage network switches. All appliances were separately managed. The initial convergence integrated the provisioning and management software layer.

Hyperconverged infrastructure brought compute and storage into a server. Storage is commonly locally attached to the compute elements in what is known as the Direct-Attached Storage (DAS) model. Resources are software-defined and virtualized, enabling flexible building blocks.

Composable Disaggregated Infrastructure brings together the best of both architectures, applying software-defined concepts to make a disaggregated architecture composable.

Figure 1. Evolution of Storage Architecture in the Evolution of Data Center Architecture

Increased Capacity Utilization

Shared storage pooling improves overall capacity utilization, which is fundamentally a CAPEX benefit. With DAS, the limited scale of resource pooling via SDS means CAPEX savings through pooling are smaller, and rapidly expanding demands may drive over-provisioning. In the disaggregated model, larger shared storage pools that are dynamically allocated are well suited to handle expanding demands, resulting in higher utilization and bigger CAPEX savings.

Improved IT Infrastructure Management

Organizations are continuously looking to simplify IT infrastructure management and reduce OPEX, e.g. elastic provisioning, deployments, upgrades, replacements, inventory management etc. To accommodate the varied needs of different workloads, a common solution in an HCI environment is to over-dimension the hardware to support peak requirements; this is expensive and wasteful, as resources are underutilized. An alternative approach is to deploy a range of different server variants, but server management becomes correspondingly more complex with the increased permutations.

Composability of resources in the disaggregated model simplifies server design and reduces the number of server variants. Storage provisioning can be automated and streamlined to deliver the right type and amount of storage at the right time to the right server. IT administrators can upgrade, add or replace specific resources instead of entire systems. This pay-as-you-grow approach allows for easier infrastructure management and better budget control.

Live Workload and VM Migration for Improved Availability

Organizations expect greater than 99.999% availability, and in data centers, virtual machines (VMs) often need to undergo live migration from one compute node to another for reasons including maintenance, load balancing etc. When storage is coupled with compute and a workload or VM needs to be migrated, data must be moved from one storage location to another, impacting availability. When storage is decoupled from compute, migrating the data becomes a moot point: new VMs need only be pointed to the original data location. This allows applications to be migrated rapidly, enabling higher availability.
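To see why decoupling helps live migration, compare the bulk-copy time DAS incurs with the near-instant re-attach of a disaggregated volume; the data size and link speed below are assumed for illustration:

```python
def das_copy_seconds(data_gb: float, link_gbps: float) -> float:
    """Time to copy a VM's local data across the network during migration."""
    return data_gb * 8 / link_gbps

# Moving a VM with 2 TB of direct-attached data over a 25 Gb/s link:
copy_time = das_copy_seconds(2000, 25)  # 640 seconds of bulk data movement
# With disaggregated storage the volume is simply re-attached to the new
# host: no bulk copy, so migration time is dominated by the VM's memory state.
print(f"DAS bulk copy: ~{copy_time:.0f} s; disaggregated: re-attach only")
```

In practice the DAS figure is usually worse, since the copy competes with production traffic for the same links and CPUs.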


Requirements for High-Performance, Disaggregated Storage

Fast Network Fabric

Today's multi-layer CLOS switching architectures are prone to congestion. RDMA networking has limited scalability, and while TCP scales well, its performance is relatively low. There is therefore a need for a high-throughput, low-latency, low-tail-latency, anywhere-to-anywhere fabric that enables many compute nodes to access storage very quickly while meeting stringent quality of service (QoS) requirements.

Fast Storage and Storage Networking Protocols

To extract the maximum IOPS from high-performance SSDs, fast and efficient storage protocols such as NVMe and NVMe over Fabrics are imperative.

Fast and Secure Storage Client and Target Controllers

Performance is clearly a crucial metric when storage is disaggregated from compute, but network security is equally important. An ideal storage controller should:

• Process data and access the SSDs at the lowest latencies and the maximum bandwidth the aggregated SSDs in the storage array are capable of.

• Support a multi-tenant environment where end-to-end security between client and target nodes is guaranteed and data within a storage array is protected via unique customer keys.

The Data Processing Unit (DPU) Enables Large Scale, High Performance Disaggregated Storage

Until now, disaggregating high-performance storage at scale has been a vision few organizations have been able to realize. Challenges with CPU bottlenecks, fabric performance and legacy software limitations have stood in the way of optimal performance.

At Fungible, we have designed a microprocessor known as the Data Processing Unit or DPU to enable high-performance disaggregated storage at large scale. The DPU:

• Supports NVMe and NVMe over Fabrics (NVMe-oF) storage protocols.

• Enables a widely scalable, anywhere-to-anywhere fabric with low latency and low tail latency.

• Serves as a high-performance controller for vending SSDs to the network. The DPU implements the storage stack, offloading this function from the CPU, and delivers industry-leading bandwidth to enable massively parallel accesses to the media.

• Runs storage services such as compression, de-duplication and erasure coding in-line and at line rate.

• Supports secure multi-tenant environments through encryption, authentication, and overlay and underlay networks.

• Implements full separation of the control plane from the data plane to enable higher end-to-end performance and better scalability.

Fungible's goal is to deliver a solution that is purpose-built from the ground up to make high-performance disaggregated storage a reality and, in doing so, unlock unprecedented value in performance, reliability and economics for data centers of all scales. We believe that while some islands that bind compute and storage together may remain, most data centers will progressively move to disaggregated storage.

Interested in a sneak peek of our products under development? Contact our Sales team today!

Fungible, Inc.
3201 Scott Blvd.
Santa Clara, CA 95054
669-292-5522
