Upload
others
View
19
Download
0
Embed Size (px)
Citation preview
Systems | Fueling future disruptions
ResearchFaculty Summit 2018
2
What Should You Do With Persistent Memory?
Steven Swanson
Director, Non-Volatile Systems LabComputer Science and Engineering
UC San Diego
3
Non-volatile main memory (NVMM)
• Byte-addressable• Denser than DRAM• DRAM-comparable latency• Higher bandwidth than SSD• Ready for DMA / RDMA
Intel 3D XPoint NVDIMM
MemoryDRAM DIMM
DRAM DIMM
NVMMNVDIMM
MemoryController
MemoryController
CPU
Last-level cacheL1/L2Core
L1/L2Core
NVDIMM
4
What Should You Do With NVMM?
1. Use files and a conventional (distributed) file system
2. Use files and better file (distributed) system
3. Build persistent data structures
4. Use it as slow DRAM
EXT4
EXT4EXT4
EXT4EXT4
EXT4CEPH etc.Orion
NOVA NOVA
NOVA
NOVANOVA
NOVA
Log
5
What Should You Do With NVMM?
1. Use files and a conventional (distributed) file system
2. Use files and better file (distributed) system
3. Build persistent data structures
4. Use it as slow DRAM
Orion
NOVA NOVA
NOVA
NOVANOVA
NOVA
6
File IO Atomicity Fault Tolerance SpeedDirect
Access
DAX
7
NOVA: A File System for NVMM
• A NOVA FS is a tree of logs• One log per inode
– Inode points to head and tail– Logs are not contiguous
• Many Logs -> high concurrency• Strong consistency guarantees• Log-structured + journals + copy-
on-write
Head TailInode
Inode log
Committed entry
Uncommitted entry
Per-inode logging
8
Atomicity: Logging for Simple Metadata Operations
• Combines log-structuring, journaling and copy-on-write
• Log-structuring for single log update– Write, msync, chmod, etc– Lower overhead than journaling
and shadow paging
File log
Tail Tail
9
Atomicity: Lightweight Journaling for Complex Metadata Operations
• Lightweight journaling for update across logs– Unlink, rename, etc– Journal log tails instead of
metadata or data File log
Directory log
Tail
TailTail
Tail
Dir tail
File tailJournal
10
Atomicity: Copy-on-write for file data
• Copy-on-write for file data– Log only contains metadata– Log is short– Instant data GC
File log
Tail
Data 1 Data 2
Tail
Data 0 Data 1
11
Filebench throughput
050
100150200250300350400
Fileserver Varmail Webproxy Webserver
KOps
per s
econ
d
Ext4-datajournal Ext4-DAX xfs-DAX NOVA
12
What Should You Do With NVMM?
1. Use files and a conventional (distributed) file system
2. Use files and better file (distributed) system
3. Build persistent data structures
4. Use it as slow DRAM
???
NOVA NOVA
NOVA
NOVANOVA
NOVA
13
Existing Distributed File Systems are Slow
Local File System
Distributed File System Server
NetworkStack
Distributed File System Client
NetworkStack
Application
Client FS
14
Orion: A Distributed Persistent Memory File System
OrionFS
Application
OrionFS
RDMA
15
Orion: Key Features
• Based on NOVA• Mirrored metadata on client
– Client keep local, NVMM copy of inode’s log– Leases + simple arbitration for concurrent updates
• Mostly-local operation– Local read cache– CoW creates new, local copy
• Pervasive RDMA– All addresses/pointers are RDMA-friendly– Zero-copy IO for most transfers (NOVA data structures are RDMA targets)– Single-ended remote data access
16
Application performance on Orion
17
What Should You Do With NVMM?
1. Use files and a conventional (distributed) file system
2. Use files and better file (distributed) system
3. Build persistent data structures
4. Use it as slow DRAMLog
18
Build NVMM Data Structures Is Hard
• All existing programming errors are still possible– Memory leaks– Multiple frees– Locking errors
• There are new kinds of errors– Pointers between NV memory pools– Pointers from NVMM to DRAM
• Programmers get this stuff wrong• Rebooting/restarting won’t help!• Language + Compiler support will come, but slowly
19
Optimizing RocksDB
20
What Should You Do With NVMM?
1. Use files and a conventional (distributed) file system
2. Use files and better file (distributed) system
3. Build persistent data structures
4. Use it as slow DRAM
Easy; ~5x gains
Pretty easy; ~10x gains
Really hard; ~30x gainsThe NVMM Programmability Gap
21
Optimizing RocksDB
File Emulation
22
File Emulation
• Normal write-ahead logging– open();– write(); sync();
• Emulate read/write in user space– open(); mmap();– memcpy() + clwb + fence
• Almost POSIX semantics– Minimal changes to app logic– No complex logging, allocation, or
locking• Almost persistent data structure
performance– Just 10% slower.
23
File Emulation Speedups
0
10
20
30
40
50
60
SQLite KyotoCabinet RocksDB
File
Em
ulat
ion
Spee
dup
XFS-DAX Ext4-DAX NOVA
24
What Should You Do With NVMM?
• You should study it!– Many interesting, open problems remain– Lots of PhDs to come
• You should use it!– Use a file system!– Want more performance? Use file emulation!– Want more performance? Build persistent data structures.
25
NOVA is open source.We are preparing it for “upstreaming” in to Linux.
To help or try it out: https://github.com/NVSL/linux-nova
26
Thanks!
Thank you!