
Jin-Soo Kim

Computer Systems Laboratory

Sungkyunkwan University

http://csl.skku.edu

Solid State Storage Technologies

ICE3028: Embedded Systems Design | Spring 2016 | Jin-Soo Kim

NVMe (1)

▪ NVM Express (NVMe)

• For accessing PCIe-based SSDs

• Bypass block I/O layer

• Low latency

• Read, Write, Flush, Format

• Deallocate, Atomic Write

• Data Set Management


NVMe (2)

▪ Deep queue: 64K commands/queue, up to 64K queues

▪ Streamlined command set: only 13 required commands

▪ One register write to issue a command (“doorbell”)

▪ Support for MSI-X and interrupt aggregation
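The queue-plus-doorbell idea can be pictured as a ring buffer of commands and a single tail register that the host writes. The following is a toy Python model, not driver code: the class names, fields, and the tiny queue depth are all invented for illustration, and a real controller uses memory-mapped registers, DMA, and a separate completion queue.

```python
from collections import namedtuple

Command = namedtuple("Command", "opcode lba nblocks")

class ToyController:
    """Hypothetical device model: consumes entries up to the tail doorbell."""
    def __init__(self, depth):
        self.depth = depth
        self.sq = [None] * depth    # submission queue ring
        self.sq_head = 0            # advanced by the device
        self.doorbell_writes = 0    # how many register writes the host made
        self.completed = []         # stands in for the completion queue

    def write_doorbell(self, new_tail):
        # The only "register write" in this model.
        self.doorbell_writes += 1
        while self.sq_head != new_tail:
            self.completed.append(self.sq[self.sq_head])
            self.sq_head = (self.sq_head + 1) % self.depth

class ToyHostQueue:
    """Hypothetical host side: fills ring slots, then rings the doorbell once."""
    def __init__(self, ctrl):
        self.ctrl = ctrl
        self.tail = 0

    def submit(self, cmds):
        for cmd in cmds:
            nxt = (self.tail + 1) % self.ctrl.depth
            if nxt == self.ctrl.sq_head:
                raise RuntimeError("submission queue full")
            self.ctrl.sq[self.tail] = cmd
            self.tail = nxt
        self.ctrl.write_doorbell(self.tail)

ctrl = ToyController(depth=8)
queue = ToyHostQueue(ctrl)
queue.submit([Command("read", 0, 8), Command("write", 8, 8)])
```

In this model the host touches the device exactly once per submission, no matter how many commands were placed in the ring beforehand.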


All-Flash Array

▪ Interfaces

• 10Gb/40Gb Ethernet (iSCSI) or 16Gb Fibre Channel or PCIe

• SAS or NVMe SSDs

▪ Functionalities

• Volume management

• Virtualization support

• RAID

• Snapshot

• Deduplication

• Compression, …


Traditional Block Interface

▪ SATA/SCSI/SAS

• Read (sector #, length), Write (sector #, length, data)

• No block-level liveness information

• No high-level semantics on data

• Several “unwritten contracts” do not hold for SSDs

– Sequential accesses are several tens of times better than random accesses

– Distant LBNs lead to longer seek times

– Data written is equal to data issued

– …

[Figure: storage stack. Host (file system, block device driver) connects over the block I/F to the SSD (FTL), which drives NAND flash over the flash I/F.]


Extending Block I/F

▪ TRIM command

• “The data in the specified sectors is no longer needed”

• ATA interface standard (T13 technical committee)

• Non-queued command

• SATA 3.1 introduces the Queued TRIM command
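Why TRIM helps can be seen in a toy model of the liveness information an FTL keeps. This is a minimal sketch, assuming one sector per flash page and a device that only tracks which LBAs it believes are live; `ToyFTL` and its methods are hypothetical names, not any real device interface.

```python
class ToyFTL:
    """Hypothetical FTL that only tracks which LBAs it believes are live."""
    def __init__(self):
        self.live = set()   # LBAs the device must preserve

    def write(self, lba, count=1):
        self.live.update(range(lba, lba + count))

    def trim(self, lba, count=1):
        # The host declares these sectors no longer needed.
        self.live.difference_update(range(lba, lba + count))

    def pages_to_copy_on_gc(self, lbas_in_victim_block):
        # Garbage collection must relocate every page still believed live.
        return sum(1 for lba in lbas_in_victim_block if lba in self.live)

ftl = ToyFTL()
ftl.write(0, 64)                      # file system writes 64 sectors
victim = range(0, 64)                 # all of them sit in the GC victim
copies_without_trim = ftl.pages_to_copy_on_gc(victim)
ftl.trim(0, 48)                       # a file covering sectors 0..47 is deleted
copies_with_trim = ftl.pages_to_copy_on_gc(victim)
```

Without the TRIM, the device has no way to learn that the deleted file's sectors are dead, so it keeps copying them during garbage collection.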

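[Figure: storage stack with an extended interface. Host (file system, block device driver) connects to the SSD (FTL) over “Block I/F + SSD-Specific I/F”; the FTL drives NAND flash over the flash I/F.]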


Atomic Write

▪ Transaction support for multi-block writes

• Simplifies file systems and DBMSes
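One way such a primitive can be provided is by writing the new blocks out of place and switching the logical-to-physical map only once every block is safely written. The sketch below is a toy model under that assumption; `ToyAtomicStore` and the `fail_after` crash knob are hypothetical, and it is not the mechanism of any specific device.

```python
class ToyAtomicStore:
    """Hypothetical out-of-place store: the map flips only at commit."""
    def __init__(self):
        self.pages = []      # append-only physical pages
        self.mapping = {}    # logical block -> index into self.pages

    def read(self, lba):
        return self.pages[self.mapping[lba]]

    def atomic_write(self, blocks, fail_after=None):
        """blocks: dict of lba -> bytes.
        fail_after simulates a crash after that many page writes."""
        staged = {}
        for n, (lba, data) in enumerate(blocks.items()):
            if fail_after is not None and n >= fail_after:
                return False              # crash: staged pages are never mapped
            self.pages.append(data)
            staged[lba] = len(self.pages) - 1
        self.mapping.update(staged)       # single commit point in this model
        return True

store = ToyAtomicStore()
store.atomic_write({0: b"a", 1: b"b"})
crashed = store.atomic_write({0: b"x", 1: b"y"}, fail_after=1)
after_crash = (store.read(0), store.read(1))
```

After the simulated crash both blocks still show their old contents, which is the all-or-nothing behavior a file system or DBMS would otherwise need journaling or double writes to obtain.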

X. Ouyang, et al., “Beyond Block I/O: Rethinking Traditional Storage Primitives,” HPCA, 2011.


FusionIO’s VFSL

▪ Virtualized Flash Storage Layer

• Provides 64-bit virtual block-addressed space

• Virtual-to-physical block mapping: A variation of B-trees in memory

• FTL functionalities

• read, write, trim, and atomic update supported

W. Josephson, et al., “DFS: A File System for Virtualized Flash Storage,” FAST, 2010.


DFS over VFSL

▪ Direct File System

• Simple metadata and data layout (2TB chunk)
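One reading of the “2TB chunk” layout: if every file owns a fixed 2 TB slice of the 64-bit virtual block address space, locating a file block is plain arithmetic and needs no per-file block map. The sketch assumes 512-byte blocks and an inode number that selects the chunk; both are illustrative assumptions, not confirmed details of DFS, and `virtual_block` is a hypothetical helper.

```python
BLOCK_SIZE = 512                               # assumed block size in bytes
CHUNK_BYTES = 2 * 2**40                        # 2 TB chunk per file (from the slide)
BLOCKS_PER_CHUNK = CHUNK_BYTES // BLOCK_SIZE   # 2**32 blocks per chunk

def virtual_block(inode_no, byte_offset):
    """Map (file, offset) to a virtual block address by arithmetic alone."""
    if not 0 <= byte_offset < CHUNK_BYTES:
        raise ValueError("offset outside the file's 2 TB chunk")
    return inode_no * BLOCKS_PER_CHUNK + byte_offset // BLOCK_SIZE

addr = virtual_block(3, 1024)   # third block (index 2) of the file in chunk 3
```

Under these assumptions, 2**32 chunks of 2**32 blocks exactly fill a 64-bit address space, and the sparse virtual-to-physical mapping underneath absorbs the unused holes.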


Annotating Block Semantics

▪ For differentiated storage service

• “Class” mapped to Group Number field (5 bits) of the SCSI CDB

• Selective caching on SSDs

▪ Similar mechanism in eMMC 4.5

• ContextID

M. Mesnier, et al., “Differentiated Storage Services,” SOSP, 2011.

Ext3 class       Group number   Cache priority
Unclassified     0              12
Superblock       1              0
Group desc.      2              0
Bitmap           3              0
Inode            4              0
Indirect block   5              0
Directories      6              0
Journal          7              0
File <= 4KB      8              1
File <= 16KB     9              2
…                …              …
File > 1GB       18             11
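Tagging a request with its class amounts to placing a small integer into the command descriptor block. The sketch below builds a WRITE(10) CDB with the class in the 5-bit group number field. The byte layout (opcode 0x2A, big-endian LBA in bytes 2 to 5, group number in the low 5 bits of byte 6, transfer length in bytes 7 to 8) is stated from memory of the SCSI block command set and should be checked against the standard; the dictionary holds only the rows visible in the table and its key names are invented.

```python
import struct

# Rows visible in the table; the elided rows are deliberately left out.
EXT3_CLASS_TO_GROUP = {
    "unclassified": 0, "superblock": 1, "group_desc": 2, "bitmap": 3,
    "inode": 4, "indirect_block": 5, "directories": 6, "journal": 7,
}

def write10_cdb(lba, nblocks, group_number):
    """Build a 10-byte WRITE(10) CDB carrying the class in GROUP NUMBER."""
    if not 0 <= group_number < 32:
        raise ValueError("group number must fit in 5 bits")
    # opcode, flags, LBA (4 bytes), group number, transfer length (2 bytes), control
    return struct.pack(">BBIBHB", 0x2A, 0, lba, group_number, nblocks, 0)

cdb = write10_cdb(0x1000, 8, EXT3_CLASS_TO_GROUP["journal"])
```

Because the field is only 5 bits wide, at most 32 classes can be expressed, which fits the class numbers 0 to 18 listed in the table.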


Multi-streamed SSD (1)

▪ Previous write patterns (= current state) matter


Multi-streamed SSD (2)

▪ Mapping data with different lifetime to different streams

▪ Standardized in T10 SCSI (SAS SSDs)
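The effect of separating lifetimes can be seen in a toy placement model: pages written with the same stream ID fill the same erase block, so short-lived data dies together. This is a minimal sketch with an unrealistically small erase block; `place` and `gc_copies` are hypothetical helpers, not part of any stream interface.

```python
PAGES_PER_BLOCK = 4   # tiny erase block, for illustration only

def place(writes, use_streams):
    """writes: list of (name, stream_id). Returns erase blocks as lists of names."""
    open_blocks = {}   # stream id -> block currently being filled
    blocks = []
    for name, stream in writes:
        key = stream if use_streams else 0   # no streams: one shared open block
        blk = open_blocks.get(key)
        if blk is None or len(blk) == PAGES_PER_BLOCK:
            blk = []
            blocks.append(blk)
            open_blocks[key] = blk
        blk.append(name)
    return blocks

def gc_copies(blocks, dead):
    """Live pages that must be copied to reclaim every block holding dead data."""
    return sum(sum(1 for p in blk if p not in dead)
               for blk in blocks if any(p in dead for p in blk))

# Short-lived (h*) and long-lived (c*) pages arrive interleaved.
writes = [(f"{kind}{i}", sid) for i in range(4) for kind, sid in (("h", 1), ("c", 2))]
dead = {"h0", "h1", "h2", "h3"}
mixed = gc_copies(place(writes, use_streams=False), dead)
separated = gc_copies(place(writes, use_streams=True), dead)
```

With a single write stream every erase block ends up half dead and half live, so reclaiming space means copying the long-lived pages; with two streams the block of short-lived pages can be erased without copying anything.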


Multi-streamed SSD (3)

▪ Cassandra with Multi-streamed SSD


Multi-streamed SSD (4)

▪ Cassandra’s normalized update throughput with 5 streams


OSSD: Object-based SSD (1)

▪ OSD (Object-based Storage Device)

• Virtualizes physical storage as a pool of objects

• Offloads space management to storage devices

• Standardized as a subset of SCSI command set
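The difference from a block interface is that the host names objects and the device decides where the bytes go. The following is a toy model of that division of labor; `ToyObjectDevice` and its methods are hypothetical and do not follow the actual OSD command set.

```python
class ToyObjectDevice:
    """Hypothetical object-based device: the device, not the host, allocates space."""
    def __init__(self, nblocks, block_size=4096):
        self.block_size = block_size
        self.free = list(range(nblocks))   # device-internal free block list
        self.objects = {}                  # object id -> list of device blocks
        self.media = {}                    # device block -> bytes

    def create(self, oid):
        self.objects[oid] = []

    def write(self, oid, data):
        """Append data to an object; the host never sees block numbers."""
        for i in range(0, len(data), self.block_size):
            blk = self.free.pop(0)
            self.media[blk] = data[i:i + self.block_size]
            self.objects[oid].append(blk)

    def read(self, oid):
        return b"".join(self.media[b] for b in self.objects[oid])

    def remove(self, oid):
        # Deleting an object tells the device exactly which blocks are dead.
        for blk in self.objects.pop(oid):
            del self.media[blk]
            self.free.append(blk)

dev = ToyObjectDevice(nblocks=4, block_size=4)
dev.create(7)
dev.write(7, b"abcdefgh")
payload = dev.read(7)
free_while_stored = len(dev.free)
dev.remove(7)
```

Because allocation and deletion both happen inside the device, it always knows which blocks are live, without needing a separate hint such as TRIM.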

[Figure: block interface vs. object interface]


OSSD: Object-based SSD (2)

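▪ OSD storage model

[Figure: traditional vs. OSD storage model. Traditional: the host runs Application, System Call Interface, File System User Component, and File System Storage Management; the Sector/LBA Interface separates it from the storage device (Block I/O Manager, Physical Media). OSD: the host keeps Application, System Call Interface, and File System User Component; the OSD Interface separates it from the storage device, which holds OSD Storage Management, Block I/O Manager, and Physical Media.]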


OSSD: Object-based SSD (3)

[Figure: OSSD prototype. Host: VFS, OFS, OAQ, iSCSI Initiator. Target: iSCSI Target Daemon, OSSD Framework (OML, FML, FAL), RawSSD. Links: OSD Interface (iSCSI) over TCP at 1 Gbps between host and target; READ/WRITE/ERASE over SATA-2 to the RawSSD. Detail panels: OAQ (OID and I/O context hashed into priority queues Q 0 … Q n, with example object I/O instances oid = 7 and oid = 46); OML (descriptor, object attributes, object data, μ-Tree of extents, object data buffer); FML (allocation bitmap).]


OSSD: Object-based SSD (4)

▪ Simplified host file system

• No need for SSD-specific parameter tuning

▪ More efficient management of flash storage

• Block-level liveness

• Metadata separation

• Object-aware storage management (allocation, dedup, ...)

▪ Application-aware storage management

• Application hints, QoS

▪ Storage virtualization

• Pooling, tiering, caching, backup, replication, etc.


In-Storage Computing (1)

▪ Samsung ISC SSD Prototype

• Commodity SSD: Samsung PM1725 NVMe with the ISC feature

• PCIe 3.0 x4

• 800 GB

▪ Software

• C++11

• C++ STL

• G++

• Software emulator


In-Storage Computing (2)

▪ ISC Application Development Process


In-Storage Computing (3)

▪ ISC Dataflow Programming Model


In-Storage Computing (4)

▪ Example: Simple Key-Value Store