Accelerating Machine Learning with NVMe and NVMe-over-Fabrics
About Me
Zivan Ori
CEO & Co-Founder, E8 Storage
Mr. Zivan Ori is the co-founder and CEO of E8 Storage. Before founding E8 Storage, he was IBM XIV R&D Manager, responsible for developing the IBM XIV high-end, grid-scale storage system, and served as Chief Architect at Stratoscale, a provider of hyper-converged infrastructure. Prior to IBM XIV, he headed Software Development at Envara (acquired by Intel) and served as VP R&D at Onigma (acquired by McAfee).
About E8 Storage
• Founded in November 2014 by storage industry veterans from IBM-XIV
• Leading certified NVMe-over-Fabrics solution on the market
• Backed by Tier-1 VCs Accel Partners, Magma Ventures & Vertex Ventures
• Worldwide team:
  • R&D in Tel Aviv
  • Sales & marketing in Santa Clara, NY, and London
• In production with customers in U.S. and Europe
• 10 patents granted + 4 pending on the E8 architecture
• Flash Memory Summit 2016 & 2017 Most Innovative Product Award
The Problem (Part 1): Why not use local SSDs in servers?
• Local SSDs today achieve latency 10x lower than all-flash arrays
• "The DevOps problem"
  • Things that work on laptops become 10x slower on the production infrastructure
• "The islands-of-storage problem"
  • Local SSDs in servers mean inefficient capacity utilization and no sharing of SSD data
• Local SSDs couple storage and compute
  • Server purchasing requires upfront investment in SSDs
[Chart: read latency scale from 0.1 ms (local SSD) to 1 ms (AFA)]
The Problem (Part 2): Why not use SSDs in SAN/NAS?
• Traditional all-flash arrays (SAN/NAS) deliver only 10%-20% of the potential performance of NVMe SSDs
  • The classic "scale-up" bottleneck
• Dual-controller bottleneck
  • All I/O is gated by the controller CPU
  • Switching the SSDs from SAS to NVMe cannot alleviate the controller bottleneck
First-generation architectures cannot unlock the full performance of NVMe
E8 Storage Unlocks the Performance of NVMe
Performance comparison (@4K):

                          AFA with 24 SSDs   Single NVMe SSD   E8 24 NVMe SSDs
Read latency (us)         1000               100               120
IOPS (4K read)            300K               750K              10M
Read/write bandwidth      2.4 GB/s           3.1 GB/s          40 GB/s
See the Demo!
Fastest Shared Block Storage in the World
• E8 holds the record in 2 audited storage benchmarks
  • 17x faster in the STAC-M3 benchmark
  • 8x lower latency on average in the SPECsfs benchmark
• The power of NVMe SSDs + RDMA networks
  • Previous submissions relied on very large amounts of RAM
• More performance, less hardware
  • Shared NVMe allows hardware to be consolidated into a small footprint
  • E8's 2U appliance beat 10U and 18U appliances
• As of SPEC SFS®2014_swbuild results published August 2018. See all published results at https://www.spec.org/sfs2014/results/
• Of the published, audited results on https://stacresearch.com/ as of May 2018. Graphs show the 2 closest competitors for overall results.
[Chart: Best STAC-M3 response times (ms), 0-20,000, for E8 Storage vs. Competitor A and Competitor B on the 100T.VWAB-12D-NO, 10T.VOLCURV, 1T.NBBO, and 1T.WRITE.LAT2 tests: 17x faster!]
[Chart: SPECsfs record holder (IOPS + latency): 8x lower latency!]
Designed for Availability and Reliability
• Host agents operate independently
  • Failure of one agent (or more) does not affect other agents
  • Access to shared storage is not impacted
• RAID data protection with virtual spare capacity
• Network multi-pathing with fast fail-over
• Enclosure high availability
  • Option 1: HA enclosure + dual-ported SSDs
  • Option 2: Cross-enclosure HA + single-ported SSDs
No single point of failure anywhere in the architecture
[Diagram: host servers with E8 host agents]
Cost Comparison (based on a typical rack)
Save >40% of the cost of SSDs, 20% of the cost of the rack (the arithmetic is sketched below)

Before (local NVMe):
• 64 servers with 16TB NVMe each = 1PB of SSDs
• $0.2/GB = $200K
• Local SSD utilization: 20%

After (disaggregated NVMe-oF):
• RAID-10, over-provisioned 4:1
• 16 x 16TB SSDs in a dual-controller enclosure
• 0.5PB of SSDs = $100K + $8K enclosure
• Central SSD utilization: 80%

[Chart: SSD cost and rack cost, $0-$600K, local NVMe vs. disaggregated NVMe-oF]
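The slide's numbers can be sanity-checked in a few lines of Python. This is a minimal sketch; prices and capacities come from the slide, while reading the 20%/80% utilization figures as the share of raw capacity holding live data is an assumption made for this illustration.

```python
# Minimal sketch of the slide's cost arithmetic. Prices and capacities are
# from the slide; interpreting "utilization" as the share of raw capacity
# that holds live data is an assumption made for this illustration.
SSD_COST_PER_GB = 0.20   # $/GB, per the slide
GB_PER_TB = 1000

# Before: local NVMe SSDs in every server
local_raw_tb = 64 * 16                                        # ~1PB of SSDs
local_cost = local_raw_tb * GB_PER_TB * SSD_COST_PER_GB       # ~$200K
local_data_tb = local_raw_tb * 0.20                           # 20% utilization

# After: shared NVMe-oF pool, RAID-10, run at higher utilization
shared_raw_tb = 500                                           # 0.5PB of SSDs
shared_cost = shared_raw_tb * GB_PER_TB * SSD_COST_PER_GB + 8_000  # + enclosure
shared_data_tb = (shared_raw_tb / 2) * 0.80   # RAID-10 mirror, 80% utilization

print(f"before: ${local_cost:,.0f} for {local_data_tb:.0f}TB of live data")
print(f"after:  ${shared_cost:,.0f} for {shared_data_tb:.0f}TB of live data")
# Both configurations hold roughly 200TB of live data; the shared pool
# does it for about half the SSD cost, matching the >40% savings claim.
```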
E8 Storage Customers and Use-Cases
E8 Storage Customers: When Performance Matters
Web-scale/IaaS | Financials | BioIT/HPC
Including 2 of the world’s Top-10 largest hedge funds
Customer Use-Case: Market Data for Financials
Before:
• 1152 local SSDs in 72 servers
• Market data copied nightly to all servers
• Restricted to 10TB-20TB

After:
• 48 SSDs in 2 E8-D24 appliances
• Market data shared from E8 to all 72 servers
• Easily scalable to 300TB
In production with 2 of the world’s Top-10 largest hedge funds
SVP at a Tier-1 hedge fund: “We have been using E8 for a year and have more than 10 boxes. A single box achieves 40GB/s reads and large block writes. For an all-flash tier, it is just a beast.”

Shared NVMe reduced the number of replicas needed by 72x, a 70% cost reduction!
Genomic Acceleration with E8 Storage
"We were keen to test E8 by trying to integrate it with our Univa Grid Engine cluster as
a consumable resource of ultra-performance scratch space. Following some simple
tuning and using a single EDR link we were able to achieve about 5GB/s from one
node and 1.5M 4k IOPS from one node. Using the E8 API we were quickly able to write
a simple Grid Engine prolog/epilog that allowed for a user-requestable scratch volume
to be automatically created and destroyed by a job. The E8 box behaved flawlessly and
the integration with InfiniBand was simpler than we could have possibly expected for
such a new product."
- Dr. Robert Esnouf, Director of Research Computing
Oxford Big Data Institute +
Wellcome Center for Human Genetics
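The prolog/epilog integration the quote describes could look roughly like the sketch below. The E8 REST endpoint, payload fields, and volume-naming scheme are illustrative assumptions (the actual E8 API is not shown in this deck); Grid Engine does expose the job ID to prolog/epilog scripts via the JOB_ID environment variable.

```python
#!/usr/bin/env python3
# Hypothetical Grid Engine prolog/epilog sketch: create a per-job scratch
# volume before the job runs and destroy it afterwards. The endpoint URL
# and JSON fields below are assumptions, not E8's documented API.
import os
import sys

import requests  # third-party HTTP client: pip install requests

E8_API = "https://e8-controller.example/api/v1/volumes"  # assumed endpoint

def prolog(job_id: str) -> None:
    # Called by Grid Engine before the job starts: provision scratch space.
    r = requests.post(E8_API,
                      json={"name": f"scratch-{job_id}", "size_gb": 512},
                      timeout=30)
    r.raise_for_status()

def epilog(job_id: str) -> None:
    # Called after the job ends: free the volume back to the shared pool.
    r = requests.delete(f"{E8_API}/scratch-{job_id}", timeout=30)
    r.raise_for_status()

if __name__ == "__main__":
    job_id = os.environ["JOB_ID"]  # set by Grid Engine for prolog/epilog
    prolog(job_id) if sys.argv[1] == "prolog" else epilog(job_id)
```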
Shared NVMe as a fast tier for parallelizing genomic processing
From 10 hours per genome to 1 hour for 10 genomes!
E8 for AI/ML with IBM GPFS and Nvidia
• A GPU cluster requires 0.5PB-1PB of shared fast storage
• But GPU servers have no real estate for local SSDs…
• E8 provides concurrent access for 1000 (!) GPUs in a cluster
• 10x Performance of Pure Storage FlashBlade
• 4x Performance of IBM ESS SSD Appliances, for half the cost
Shared NVMe Accelerates Training for Image Recognition
[Chart: cost ($/GBu) for Pure Storage FlashBlade vs. IBM GPFS + ESS vs. E8 + GPFS]

GPU farm: Nvidia DGX-1
• Up to 8 GPUs per node
• GPFS client + E8 agent run on x86 within the GPU server
• Up to 126 GPU nodes in a cluster
• Mellanox 100G IB fabric

[Chart: images per second, per GPU node (ResNet-50 image-recognition training), 0-3000, at 1, 10, and 100 GPU nodes, for Pure Storage vs. IBM GPFS + ESS vs. E8 + GPFS]
Shared NVMe Storage: E8 D24 2U24-HA
• Dual-port 2.5" NVMe drives
• Up to 307TB NAND per 2U
• Up to 36TB Optane per 2U
• E8 patented distributed RAID-6
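As a rough illustration of the benchmarked workload, the sketch below trains ResNet-50 in PyTorch with the dataset read straight from a shared GPFS mount, so every GPU node sees the same files and no per-node dataset copies are needed. The /gpfs path, batch size, and loader settings are illustrative assumptions, not the published benchmark configuration.

```python
# Minimal PyTorch sketch of the benchmarked workload: ResNet-50 training
# reading ImageNet-style data from a shared GPFS mount over NVMe-oF.
# The /gpfs path and loader settings are illustrative assumptions.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Every GPU node sees the same files through the shared mount,
# so there is no nightly copy step and no per-node replica.
dataset = datasets.ImageFolder(
    "/gpfs/imagenet/train",  # assumed GPFS mount point
    transform=transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.ToTensor(),
    ]),
)
# Many parallel loader workers keep the GPUs fed; shared NVMe
# sustains the small-block random-read IOPS this generates.
loader = DataLoader(dataset, batch_size=256, shuffle=True,
                    num_workers=16, pin_memory=True)

model = models.resnet50().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = torch.nn.CrossEntropyLoss()

for images, labels in loader:
    images = images.cuda(non_blocking=True)
    labels = labels.cuda(non_blocking=True)
    optimizer.zero_grad()
    loss_fn(model(images), labels).backward()
    optimizer.step()
```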
Centralized Storage Reliability
Hyper-scalability
Affordable: 100% COTS
PCIe SSD Performance