SPDK FTL & Marvell OCSSD for Noisy Neighbor Problem
David Recker, Marketing VP
Circuit Blvd., Inc.
April 2019
4/17/2019 © 2019 Circuit Blvd., Inc. 1
Who We Are
Sunnyvale, CA, U.S.A.
Industry: Enterprise/Cloud Database and Storage
Year Founded: 2017
Mission: We develop next-gen database/storage systems leveraging expertise in memory semiconductors, solid-state storage systems, and operating systems
Open Source Contributions
• Linux LightNVM, OCSSD 2.0 specification, OpenSSD FPGA platform
• SPDK (since SPDK v17.10)
• RocksDB
OCSSD with SPDK FTL
• SPDK FTL on Marvell’s OCSSD Platform
  – We have been evaluating SPDK FTL on Marvell's SSD SoC platform since Jan ’19
  – SPDK Flash Translation Layer (FTL): The Flash Translation Layer library provides block device access on top of non-block SSDs implementing Open Channel interface. It handles the logical to physical address mapping, responds to the asynchronous media management events, and manages the defragmentation process*
  – We measured various performance metrics of the initial prototype and demonstrate how SPDK-driven OCSSDs can solve the noisy neighbor problem in multi-tenant environments
  – We share experimental data based on our current implementation (both SPDK FTL and Marvell’s controller are being continuously improved)
• (Demo) SPDK Driven OCSSD Comparison (Isolation vs Non-Isolation)
  – Demo table outside (please feel free to drop by for further questions)
* SPDK FTL definition: https://spdk.io/doc/ftl.html
Hardware Setup
• SuperMicro X11DPG
  – 2 * Xeon Scalable Gold 6126 2.6 GHz (12 cores)
  – hyperthreading disabled
  – 8 * 32 GB DIMM 2666 MT/s
• 2 * OCSSD 2.0
  – Marvell 88SS1098 controller
  – PCIe Gen3x4 slot to each CPU package
  – nvme id-ns
    » LBADS=12 (4KiB), MS=0
  – ocssd geometry
    » 8 grp (3), 8 pu (3), 1478 chk (11), 6144 lbk (13)
    » () means bit length in LBAF
    » ws_opt=24 (96KiB)
  – 3D TLC NAND
    » write unit: 96KiB (one shot program)
    » read unit: 32KiB
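The sizes implied by this geometry can be checked with a little arithmetic. The sketch below is only an illustration: the constant and function names are made up for this example, and the total is the addressable size implied by the reported geometry, not a statement about usable capacity.

```python
# Geometry values as listed on the slide; all names here are illustrative.
NUM_GRP = 8             # groups
NUM_PU_PER_GRP = 8      # parallel units per group
NUM_CHK_PER_PU = 1478   # chunks per parallel unit
NUM_LBK_PER_CHK = 6144  # logical blocks per chunk
LBK_BYTES = 1 << 12     # LBADS=12 means 2^12 = 4096 bytes per logical block
WS_OPT_LBKS = 24        # optimal write size, in logical blocks

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def total_pus() -> int:
    """Number of parallel units across all groups."""
    return NUM_GRP * NUM_PU_PER_GRP


def chunk_bytes() -> int:
    """Size of one chunk in bytes."""
    return NUM_LBK_PER_CHK * LBK_BYTES


def ws_opt_bytes() -> int:
    """Optimal write size in bytes."""
    return WS_OPT_LBKS * LBK_BYTES


def addressable_bytes() -> int:
    """Bytes addressable through the reported geometry."""
    return total_pus() * NUM_CHK_PER_PU * chunk_bytes()


print("PUs:", total_pus())
print("chunk size (MiB):", chunk_bytes() // MIB)
print("ws_opt (KiB):", ws_opt_bytes() // KIB)
print("addressable (GiB):", addressable_bytes() // GIB)
```

The 96KiB result for ws_opt matches the one-shot program write unit listed above.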
[Diagram: two OCSSDs (OCSSD1, OCSSD2) and two CPU packages (CPU1, CPU2)]
OCSSD Geometry
[Figure: OCSSD geometry. An OCSSD attached over PCIe with 8 groups (grp 0-7), 64 PUs in total (pu 0-63), 1478 chunks per PU (chk 0-1477), and 6144 logical blocks per chunk (lbk 0-6143, 4KiB each).]
• 3D TLC NAND
  – 2 plane blocks
  – 64 (layer) * 4 (wordline) * 3 (page) * 8 (lbk) = 6144
  – NAND op: tR < tPROG < tBERS
• OCSSD LBA (64 bits)
        grp  pu  chk  lbk
  bits  3    3   11   13
  ex)   7    1   1    5
• ws_opt=24
• rs_opt=8 (*)
• pu range for pblk & spdk ftl
  – (1) pu range = 0-15, 16-31, 32-47, 48-63
  – (2) pu range = 0-63
* rs_opt (optimal read size) is not defined in OCSSD spec 2.0
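The bit widths above (grp 3, pu 3, chk 11, lbk 13) describe how an OCSSD LBA is packed. As a minimal sketch, the code below packs and unpacks the example address from the slide (grp 7, pu 1, chk 1, lbk 5). It assumes the fields are laid out with grp in the most significant position and lbk in the least significant bits; the function names are hypothetical and this is not taken from SPDK or the kernel.

```python
# Field widths in bits, as listed on the slide.
GRP_BITS, PU_BITS, CHK_BITS, LBK_BITS = 3, 3, 11, 13

# Assumed layout (most significant to least): grp | pu | chk | lbk.
LBK_SHIFT = 0
CHK_SHIFT = LBK_SHIFT + LBK_BITS
PU_SHIFT = CHK_SHIFT + CHK_BITS
GRP_SHIFT = PU_SHIFT + PU_BITS


def encode_lba(grp: int, pu: int, chk: int, lbk: int) -> int:
    """Pack (grp, pu, chk, lbk) into one integer LBA."""
    for value, bits in ((grp, GRP_BITS), (pu, PU_BITS),
                        (chk, CHK_BITS), (lbk, LBK_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (grp << GRP_SHIFT) | (pu << PU_SHIFT) | (chk << CHK_SHIFT) | lbk


def decode_lba(lba: int) -> tuple:
    """Unpack an LBA back into (grp, pu, chk, lbk)."""
    grp = (lba >> GRP_SHIFT) & ((1 << GRP_BITS) - 1)
    pu = (lba >> PU_SHIFT) & ((1 << PU_BITS) - 1)
    chk = (lba >> CHK_SHIFT) & ((1 << CHK_BITS) - 1)
    lbk = (lba >> LBK_SHIFT) & ((1 << LBK_BITS) - 1)
    return grp, pu, chk, lbk


lba = encode_lba(7, 1, 1, 5)
print(hex(lba))          # 0x39002005
print(decode_lba(lba))   # (7, 1, 1, 5)
```

Only 30 of the 64 LBA bits are used with this geometry. The bit widths leave headroom over the actual counts: 11 bits can address 2048 chunks while the device reports 1478, and 13 bits can address 8192 logical blocks while a chunk holds 6144.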
Software Setup
• linux 4.17 for pblk
  – with Marvell’s patches applied
• linux 5.0 for SPDK
• fio 3.13
  – isolcpus=6-11,18-23 for fio threads
  – cpus_allowed=6-11 or 18-23
• SPDK
  – master: 7b0579d (4/9/2019)
• Additional changes to SPDK master
  – Marvell specific patch (CircuitBlvd’s)
    » num_chk=1478 as posted on SPDK github issue
    » OCSSD identification quirk & edlp=0
    » vector reset (0x90) to DSM deallocate (0x09)
    » erase should be done in synchronous mode
    » vendor specific cmd to build chunk info
  – Optimal read size (rsopt) patch
    » will be posted to SPDK gerrithub
  – Cherry picks to avoid chk wptr error (Intel’s)
    » https://review.gerrithub.io/c/spdk/spdk/+/449068
    » https://review.gerrithub.io/c/spdk/spdk/+/450174
    » https://review.gerrithub.io/c/spdk/spdk/+/449239
Throughput Comparison
• single bdev on 64 PUs
• 2 * W (128k write T1Q64)
• 1 * R (128k read T1Q64)
• spdk ftl isn’t aware of TLC read unit as posted on SPDK Trello
[Figure: throughput in GiB/s (0 to 3.5) versus time in seconds (0 to 4000) for pblk, spdk ftl, and spdk ftl-rsopt, across three phases: 1st write, 2nd write, read.]
* pblk target created with op=20
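One way to picture what being aware of the read unit means: with rs_opt=8 logical blocks (32KiB), a read can be cut at read-unit boundaries so that no piece straddles two units. The sketch below is a hypothetical illustration of that splitting only. It is not the rsopt patch, and `split_read` is a name invented for this example.

```python
def split_read(start_lbk: int, num_lbks: int, rs_opt: int = 8) -> list:
    """Split a read into (start, count) pieces that never cross an
    rs_opt-aligned boundary. Units are logical blocks (4KiB each)."""
    if start_lbk < 0 or num_lbks < 0 or rs_opt <= 0:
        raise ValueError("arguments must be non-negative and rs_opt positive")
    pieces = []
    lbk = start_lbk
    end = start_lbk + num_lbks
    while lbk < end:
        # Next read-unit boundary strictly after the current position.
        boundary = (lbk // rs_opt + 1) * rs_opt
        stop = min(boundary, end)
        pieces.append((lbk, stop - lbk))
        lbk = stop
    return pieces


# An aligned 128k read (32 logical blocks) maps onto four full read units.
print(split_read(0, 32))  # [(0, 8), (8, 8), (16, 8), (24, 8)]
# An unaligned 32k read touches two read units.
print(split_read(6, 8))   # [(6, 2), (8, 6)]
```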
Noisy Neighbor Problem Solved by OCSSD
• 4k randread T3Q64 & randwrite T1Q64 on four partitions
• each partition is pre-conditioned with 128k write
• not isolated: single bdev on 64 PUs
• isolated by 2 channels: four bdevs of 16 PUs each
[Figure: four panels labeled pblk and spdk ftl, covering the not-isolated and isolated configurations. X: seconds (7100 to 7200), Y: K IOPS (0 to 250). Each panel plots 3 reads and 1 write.]
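The isolated configuration divides the 64 PUs into four equal, non-overlapping ranges, one per bdev, while the non-isolated one uses a single range. A minimal sketch of that split follows; `partition_pus` is an illustrative helper, not an SPDK or pblk function.

```python
def partition_pus(total_pus: int = 64, num_bdevs: int = 4) -> list:
    """Return inclusive (first_pu, last_pu) ranges, one per bdev."""
    if num_bdevs <= 0 or total_pus % num_bdevs != 0:
        raise ValueError("total_pus must divide evenly among the bdevs")
    size = total_pus // num_bdevs
    return [(i * size, (i + 1) * size - 1) for i in range(num_bdevs)]


def format_ranges(ranges: list) -> str:
    """Render ranges in the 'first-last' notation used on the slides."""
    return ",".join(f"{first}-{last}" for first, last in ranges)


print(format_ranges(partition_pus(64, 4)))  # 0-15,16-31,32-47,48-63
print(format_ranges(partition_pus(64, 1)))  # 0-63
```

Because the ranges do not overlap, a write-heavy tenant on one bdev never issues program or erase operations to the PUs serving another tenant's reads.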
Contributions & Future Works
• OCSSD 2.0 API & FTL
  – https://github.com/spdk/spdk/commits?author=youngtack
  – https://github.com/spdk/spdk/commits?author=iClaire
• FTL issues
  – https://trello.com/c/Osol93ZU
  – https://github.com/spdk/spdk/issues/created_by/youngtack
  – https://github.com/spdk/spdk/issues/created_by/iClaire
• Future works
  – random IOPS bottleneck analysis
  – ANM analysis once Marvell firmware supports it
  – CPU affinity per FTL bdev analysis
  – PMDK and ZNS support of FTL bdev
Acknowledgement
• Wojciech Malikowski (Intel) – SPDK FTL
• Matias Bjørling (Western Digital) – QEMU NVMe, LightNVM PBLK
• Luan Ton-That (Marvell) - OCSSD firmware
• John Schadegg (Marvell) - OCSSD EVB
Open-Channel SSD Roadmap
• 2011: Jasmine OpenSSD (Indilinx SoC, SATA)
• 2014: Cosmos OpenSSD (FPGA w/ PCIe Gen 2)
• 2015: OCSSD Spec (LightNVM Architecture)
• 2018: OCSSD Projects (Alibaba OCSSD, Microsoft Denali)
• 2019: OCSSD w/ SPDK (Marvell SoC w/ SPDK FTL)
• 2020 ~: Cinabro™ Storage Appliance (SPDK FTL + PMDK, OCSSD / ZNS, Optane DIMM)
Cinabro™ Architecture
[Diagram: SW stack and storage appliance with 20 ~ 30 SSDs. Layers shown: App, OS, SPDK FTL / PMDK, OCSSD / ZNS, Optane DIMM.]
Summary
• SPDK with OCSSD shows promise in alleviating the noisy neighbor problem.
• SPDK OCSSD Reference Platform Availability: 2H ’19
• For inquiries or more information: www.circuitblvd.com
Marvell Confidential
Marvell Data Center & Enterprise Open Channel SSD Controller
SPDK 2019
Agenda
• Marvell 88SS1098 Datacenter NVMe SSD Controller
• Marvell OC Drive (Prototype)
88SS1098 - Marvell Datacenter NVMe SSD Controller
Feature                          88SS1098
Capacity                         8TB/8CH or 16TB/16CH (via 2x4GB/s MCI)
PCIe                             Gen 3x4, single and dual port
NVMe                             1.3, 64 VF, 64 IO queues, 256 commands
Virtualization                   64 VF
Metadata                         T10 / DIF / DIX
Program/Erase Suspend & Resume   Natively supported, including out-of-order transfers
CPU                              Quad Cortex-R5 ARM
NAND I/F speed                   800 MT/s
Reliability                      Gen4 LDPC
SGL                              Yes
IO Determinism                   Yes
T10 E2E DIX                      Yes
Encryption                       AES-XTS
88SS1098 - Marvell Datacenter NVMe SSD Controller
Workload          88SS1098
128K Seq Write    2.73 GB/s
128K Seq Read     3.31 GB/s
4K Random Write   500 KIOPS
4K Random Read    650 KIOPS
NAND: Toshiba BiCS3 TLC, NFIF: 533 MT/s
8 channels, 64 dies
Marvell OC Drive (Prototype)
• Host: Linux PC with PCIe 3.0
• Drive: M.2 SSD, PCIe 3.0 x4
• Approach:
– Align with Linux open-source community and SPDK
– Evaluate open-channel SSD solution with prototype
• Targets:
– Support open-channel SSD interface v2.0
» In-house modification to support v2.0 read/write/erase operations
» Aligned with Linux upstream kernel 4.17, 4.18, 5.0
– Integrate with Marvell SSD controller and expose as a block device using the pblk path in LightNVM
» Support for multiple pblk instances
[Diagram: Linux host (Ubuntu Linux PC) connected over a PCIe 3.0 x4 interface to the Marvell SSD device/drive, which comprises the NVMe controller, OC SSD media FW, back end controller, and four NAND blocks.]
NVMe Command Support
Operation               NVMe Command
Read                    Read Chunk
Write                   Write Chunk
Erase                   Reset Chunk (Free or Vacant)
Get Geometry            Geometry
Get Chunk Information   Get Log Page (Chunk Information)
Media Feedback          Get/Set Features (Media Feedback)
OC Prototype Performance
Workload          88SS1098 OC Drive Prototype
128K Seq Write    2.7 GB/s
128K Seq Read     2.3 GB/s
4K Random Write   594 KIOPS
4K Random Read    448 KIOPS
We can achieve maximum possible chip performance with future product code
NAND: Toshiba BiCS3 TLC, NFIF: 533 MT/s
8 channels, 64 dies
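To put the prototype figures next to the 88SS1098 controller figures quoted earlier in this deck (same NAND, NFIF speed, and channel/die count), the ratios can be computed directly. This is a rough sketch using only the numbers on the two slides; the dictionary and function names are illustrative, and the two sets of measurements may not have been taken under identical conditions.

```python
# (controller figure, OC prototype figure) as quoted on the slides.
# Sequential rows are in GB/s, random rows in KIOPS.
FIGURES = {
    "128K Seq Write": (2.73, 2.7),
    "128K Seq Read": (3.31, 2.3),
    "4K Random Write": (500, 594),
    "4K Random Read": (650, 448),
}


def prototype_ratio(workload: str) -> float:
    """Prototype figure divided by the controller figure."""
    controller, prototype = FIGURES[workload]
    return prototype / controller


for name in FIGURES:
    print(f"{name}: {prototype_ratio(name):.0%} of the controller figure")
```

On these numbers the prototype is close to the controller figure for sequential writes, above it for random writes, and at roughly 69% for both read workloads.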
Planned Features for OCSSD
• Vector I/O and asynchronous erase
– High performance
• NAND error recovery
– Highly efficient error recovery algorithms for best QoS and drive life
– Reusable, compatible and tested with all major NAND vendors
• Metadata support
– To store host LBA in NAND
• Performance tuning
Summary
• Marvell 88SS1098 controller is a perfect fit for both
conventional enterprise and open channel SSD products
• Marvell has highly efficient FW components
– Unified HAL : Provides access and exercises all HW features
– Full featured media management and NAND error recovery
– FW for NVMe block and other IPs
The information contained in this presentation is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided “AS IS”, without warranty of any kind, express or implied. This information is based on Marvell’s current product roadmap, which is subject to change by Marvell without notice. Marvell assumes no obligation to update or otherwise correct or revise this information. Marvell shall not be responsible for any direct, indirect, special, consequential or other damages arising out of the use of, or otherwise related to, this presentation or any other documentation even if Marvell is expressly advised of the possibility of such damages. Marvell makes no representations or warranties with respect to the contents of the presentation and assumes no responsibility for any inaccuracies, errors or omissions that may appear in this presentation.