HOOP: Efficient Hardware-Assisted Out-of-Place Update for Non-Volatile Memory
Miao Cai † Chance Coats Jian Huang
Systems Platform Research Group
Non-Volatile Memory is a Revolutionary Technology
New and emerging NVMs offer promising properties and are becoming popular:
Close-to-DRAM Performance
Data Durability
Byte Addressability
Memory Persistency Challenge: A Well-Known Problem
Ensuring memory persistency with commodity architecture is challenging!
Performance vs. Persistency
Out-of-Order Execution
Volatile Processor Cache
State-of-the-Art Approach: Redo/Undo Logging
Undo Logging
Redo Logging
Undo/Redo logging causes DOUBLE WRITES on the critical path.
Page Copy
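To make the double-write cost concrete, here is a minimal Python sketch. The names `CountingNVM`, `undo_logged_tx`, and `redo_logged_tx` are hypothetical and not from the talk. The log layout is an assumption and is simplified to one word per entry; a real log would also record addresses and commit markers.

```python
class CountingNVM:
    """Toy NVM model that counts every word written to it."""

    def __init__(self):
        self.cells = {}
        self.writes = 0

    def write(self, addr, value):
        self.cells[addr] = value
        self.writes += 1

    def read(self, addr):
        return self.cells.get(addr, 0)


def undo_logged_tx(nvm, updates, log_base=1_000_000):
    """Undo logging: persist the old value, then update in place."""
    for i, (addr, new_value) in enumerate(updates):
        nvm.write(log_base + i, nvm.read(addr))  # first write: log entry
        nvm.write(addr, new_value)               # second write: the data


def redo_logged_tx(nvm, updates, log_base=1_000_000):
    """Redo logging: persist the new values in a log, then apply them."""
    for i, (addr, new_value) in enumerate(updates):
        nvm.write(log_base + i, new_value)       # first write: log entry
    for addr, new_value in updates:
        nvm.write(addr, new_value)               # second write: the data
```

In this toy model, a transaction that updates three words issues six NVM writes under either scheme.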
State-of-the-Art Approach: Shadow Paging
Optimized shadow paging still suffers from FREQUENT DATA FLUSHES.
State-of-the-Art Approach: Log-structured NVM
Software-based LSNVM suffers from LONG ACCESS LATENCY.
Log Index
A Summary of State-of-the-Art Approaches
Memory persistency overheads:
Logging: double writes
Shadow Paging: frequent flushes
Log-structured NVM: long critical-path latency
Our Approach: Hardware-assisted Out-Of-Place (HOOP) Update
Reduced write traffic with data coalescing and packing
No requirement on persistence ordering
Transparent support of atomic data durability
Challenges of Supporting Out-Of-Place Update
Lightweight Indirection Layer
Limited Resource in Memory Controller
Efficient Garbage Collection
Address Remapping for Supporting Out-of-Place Update
[Diagram: Processor Cache; Memory Controller; Mapping Table; NVM with Home Region and OOP Region; load and store paths; GC path]
Mapping Table: physical-to-physical address mapping
Insert mapping entry: upon a write to OOP region
Delete mapping entry: upon data migration from OOP to home; upon a read from OOP region
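The remapping rules on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the hardware design: `RemappingController` and its methods are hypothetical names, the OOP region is modeled as an append-only list, and only insertion on write and deletion on migration are modeled.

```python
class RemappingController:
    """Toy memory controller with a physical-to-physical mapping table."""

    def __init__(self):
        self.home = {}     # home-region address -> value
        self.oop = []      # OOP region, modeled as an append-only list
        self.mapping = {}  # home address -> slot in the OOP region

    def store(self, home_addr, value):
        # Out-of-place update: the home copy is left untouched.
        self.oop.append(value)
        self.mapping[home_addr] = len(self.oop) - 1  # insert mapping entry

    def load(self, home_addr):
        slot = self.mapping.get(home_addr)
        if slot is not None:
            return self.oop[slot]          # newest version is in OOP region
        return self.home.get(home_addr, 0)

    def migrate(self, home_addr):
        # Data migration from OOP to home: copy back, delete mapping entry.
        slot = self.mapping.pop(home_addr)
        self.home[home_addr] = self.oop[slot]
```

After a `store`, a `load` of the same address is redirected to the OOP region, while the home copy keeps the old value until `migrate` runs.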
Data Packing in the Memory Controller for Improved Performance
Many applications update data at a fine granularity
[Diagram: Processor Cache; Memory Controller; Mapping Table; OOP Data Buffer; NVM with Home Region and OOP Region; load and store paths; OOP region layout: OOP Block, Head, OOP Block, Head, …; Home address]
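As a rough illustration of coalescing and packing, the sketch below collapses repeated updates to the same word and groups the survivors into slices that carry their home addresses. The function name `pack_updates`, the dictionary layout, and the default of 8 words per slice are assumptions for this example; the slide does not give the actual slice format.

```python
def pack_updates(updates, words_per_slice=8):
    """Coalesce word-granularity updates and pack them into slices.

    updates: iterable of (home_addr, value) pairs in program order.
    Returns a list of slices; each slice stores data words together
    with the home addresses they belong to.
    """
    latest = {}
    for home_addr, value in updates:
        latest[home_addr] = value  # coalescing: a later update wins

    items = list(latest.items())
    slices = []
    for start in range(0, len(items), words_per_slice):
        chunk = items[start:start + words_per_slice]
        slices.append({
            "home_addrs": [addr for addr, _ in chunk],
            "words": [value for _, value in chunk],
        })
    return slices
```

For example, `pack_updates([(0x100, 1), (0x2000, 2), (0x100, 3)], words_per_slice=2)` yields a single slice holding two words, although the two addresses are far apart and three updates were issued.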
Ensuring Persistence Ordering in the Memory Controller
[Diagram: Processor Cache; Memory Controller; Mapping Table; OOP Data Buffer; NVM with Home Region and OOP Region; load and store paths]
Events marked on the diagram:
When the data packing for a memory slice is done
Upon the end of a transaction (e.g., Tx_end)
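The sketch below assumes that these two events are the points at which the OOP data buffer is written to the OOP region; the slide text does not spell this out, so treat it as one possible reading. `OOPDataBuffer`, `tx_end`, and the slice size are hypothetical names and parameters.

```python
class OOPDataBuffer:
    """Toy buffer that persists a slice when it fills up or at Tx_end."""

    def __init__(self, words_per_slice=8):
        self.words_per_slice = words_per_slice
        self.pending = []     # (home_addr, value) pairs still in the buffer
        self.oop_region = []  # slices already written to NVM

    def store(self, home_addr, value):
        self.pending.append((home_addr, value))
        if len(self.pending) == self.words_per_slice:
            self._flush()     # packing for this slice is done

    def tx_end(self):
        if self.pending:
            self._flush()     # end of transaction: persist the remainder

    def _flush(self):
        self.oop_region.append(list(self.pending))
        self.pending.clear()
```

In this model the program never issues a flush or fence itself; the buffer decides when data reaches NVM.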
Efficient Garbage Collection for Improved Memory Utilization
[Diagram: Processor Cache; Memory Controller; Mapping Table; OOP Data Buffer; Eviction Buffer; NVM with Home Region and OOP Region; load and store paths; GC path; OOP region layout: OOP Block, Head, OOP Block, Head, …; Linked Memory Slices; callout: Load stale data during GC]
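One way to picture garbage collection is the sketch below, which migrates only the newest version of each word back to the home region. It assumes slices are stored oldest first and scanned newest first, and it reuses a hypothetical slice layout of `home_addrs` and `words` lists; none of these details are stated on the slide, and the eviction buffer is not modeled.

```python
def garbage_collect(oop_slices, home, mapping):
    """Migrate the newest version of each word from OOP slices to home.

    oop_slices: list of {"home_addrs": [...], "words": [...]}, oldest first.
    home: dict modeling the home region (address -> value).
    mapping: dict modeling the mapping table (address -> OOP location).
    Returns the number of words written to the home region.
    """
    migrated = set()
    for slice_ in reversed(oop_slices):                  # newest slice first
        pairs = list(zip(slice_["home_addrs"], slice_["words"]))
        for home_addr, value in reversed(pairs):         # newest word first
            if home_addr in migrated:
                continue                                 # older version: skip
            home[home_addr] = value
            migrated.add(home_addr)
            mapping.pop(home_addr, None)                 # delete mapping entry
    oop_slices.clear()                                   # space is reclaimed
    return len(migrated)
```

Because older versions are skipped, a word that was updated many times costs one home-region write during GC in this model.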
Handling Crash Consistency Upon Failures
[Diagram: Processor Cache; Memory Controller; Mapping Table; OOP Data Buffer; Eviction Buffer; NVM with Home Region and OOP Region; load and store paths; OOP region layout: OOP Block, Head, OOP Block, Head, …]
Put It All Together
[Diagram: two cores, each with an L1 Cache; Last-Level Cache; miss paths; Memory Controller; Mapping Table; OOP Data Buffer; Eviction Buffer; NVM with Home Region and OOP Region; load and store paths]
HOOP Implementation
Evaluation
Processor Simulator: McSimA+ (OoO cores, 2.5 GHz, 32 KB L1, 256 KB L2, 2 MB LLC)
NVM Simulator: Read/Write = 50/150 ns, 512 GB
Benchmarks
Synthetic Workloads: Vector, HashMap, Queue, RB-Tree, B-Tree
Real-world Workloads: YCSB, TPCC
Improving Transaction Throughput with HOOP
[Figure: normalized speedup (y-axis from 0 to 2.5) on Vector, Queue, RBTree, Btree, HashMap, YCSB, and TPCC for Optimized Redo, Optimized Undo, Optimized Shadow Paging, Log-Structured NVM, Logless Atomic Durability, HOOP, and Ideal]
HOOP is close to the performance of a system without any persistence enforcement.
Reducing Critical-Path Latency with HOOP
[Figure: normalized latency (y-axis from 0 to 2.5) on Vector, Queue, RBTree, Btree, HashMap, YCSB, and TPCC for Ideal, Optimized Redo, Optimized Undo, Optimized Shadow Paging, Log-Structured NVM, Logless Atomic Durability, and HOOP]
HOOP achieves the lowest latency, compared to state-of-the-art approaches.
Reducing Write Traffic with HOOP
[Figure: normalized write traffic (y-axis from 0 to 3) on Vector, Queue, RBTree, Btree, HashMap, YCSB, and TPCC for Ideal, Optimized Redo, Optimized Undo, Optimized Shadow Paging, Log-Structured NVM, Logless Atomic Durability, and HOOP]
HOOP reduces write traffic by up to 2.1x, compared to logging approaches.
HOOP Summary
1.7x Performance Speedup for Data-Intensive Apps
2.1x Reduction of Write Amplification
Thanks!
University of Illinois at Urbana-Champaign
Miao Cai Chance Coats Jian Huang
Systems Platform Research Group