30
Minimally Ordered Durable Datastructures for PM Swapnil Haria, Mark D. Hill, Michael M. Swift University of Wisconsin-Madison PIRL ‘19 1

DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Minimally Ordered Durable Datastructures for PM

Swapnil Haria, Mark D. Hill, Michael M. Swift

University of Wisconsin-Madison

PIRL ‘19 1

Page 2: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Promise of Persistent Memory

2Care about Storage

Car

e ab

out

Mem

ory

Faster Filesystems

Bigger DRAM

RecoverableApplications

NOT OUR

FOCUS

NOT OUR

FOCUS

Page 3: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Vision

Let programmers build recoverable applications with durable datastructures.

Hide the details of persistence within libraries of efficient datastructures (i.e., C++ STL).

3

Map Set Stack Vector Queue

Page 4: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Outline

4

Introduction

Analyzing durability overheads on Optane

MOD Datastructures

Evaluation

Page 5: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Programming Challenges

5

Shared Last-Level Cache

DRAM Controller

PM Controller

Persistent

Private L1 Private L1

Volatile

CPU 0 CPU 1

Persistence Domain

1.Durability2.Consistency3. (Failure) Atomicity

Page 6: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Background: Software Transactional Memory

6

BEGIN-TXvalue1 = array[index1]value2 = array[index2]LOG (index1, value1); FLUSH LOG; FENCEarray[index1] = value2FLUSH (&array[index1])LOG (index2, value2); FLUSH LOG; FENCEarray[index2] = value1FLUSH (&array[index2])FENCEEND-TX

CACHE

PM

X Y

X Y

Y

Index1=X Index2=Y

array

array

LOG

LOG

Index1=X Index2=Y

Y

Page 7: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Background: Software Transactional Memory

7

CACHE

PM

X Y

X Y

Y X

Index1=X Index2=Y

array

array

LOG

LOG Index1=X Index2=Y

Y

System CrashUse LOG to clean up the mess

Page 8: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Outline

8

Introduction

Analyzing durability overheads on Optane

MOD Datastructures

Evaluation

Page 9: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

STM performance on Optane

9

Exec

utio

n Ti

me

Self-

Nor

mal

ized

Page 10: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Measuring Flush Overheads on Optane

10

FLUSHFENCEFLUSHFENCEFLUSHFENCEFLUSHFENCEFLUSHFENCEFLUSHFENCEFLUSHFENCEFLUSHFENCEFLUSHFENCEFLUSHFENCEFLUSHFENCEFLUSHFENCEFLUSHFENCEFLUSHFENCEFLUSHFENCEFLUSHFENCE

FLUSHFENCEFLUSHFENCE

0

1

2

3

4

5

6

7

1:1 1:2 1:4 1:8 1:16

Exec

utio

n Ti

me

(in u

s)

Fence to Flush Ratio

Page 11: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Reduce FENCEs (ordering), even if extra computation required

Key Idea

11

Page 12: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Introduction

Analyzing durability overheads on Optane

MOD Datastructures

Evaluation

Outline

12

Page 13: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Goal: Minimize Ordering!

How to provide atomicity and consistency with minimal ordering?

Databases, Filesystems: Shadow Paging

Key Idea: Avoid overwriting data!

13

Page 14: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Ordering in STM

14

BEGIN-TXvalue1 = array[index1]value2 = array[index2]LOG (index1, value1); FLUSH LOG; FENCEarray[index1] = value2FLUSH (&array[index1])LOG (index2, Y); FLUSH LOG; FENCEarray[index2] = value1FLUSH (&array[index2])FENCEEND-TX

Page 15: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Intention: Minimize Ordering!

How to provide atomicity and consistency with minimal ordering?

Databases, Filesystems: Shadow Paging

Key Idea: Avoid overwriting data!

Mechanism: Out-of-place updates

15

Page 16: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Background: Shadow Paging

16

value1 = array[index1]value2 = array[index2]shadow = array // Create shadow copyshadow[index1] = value2shadow[index2] = value1FLUSH (shadow)FENCE// Application uses shadow subsequently

CACHE

PM

X Y

X Y

array

array

X YX Y

shadow

shadow

Y X

Page 17: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Déjà vu? Functional Datastructures!

Purely Functional datastructures are immutable

Implemented as efficient trees: Hash Array Mapped trie, RRBTree

Copying overheads reduced by structural sharing

17

array[8] updatedArray[8]

YX Y X

swap

Page 18: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

18

Functional Shadowing

Shadow paging to minimizeordering constraints

Functional Datastructures to reduceshadow paging overheads

Page 19: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Recipe for building MOD datastructures

1. Find efficient functional datastructures in language of choice

2. Pick any persistent memory allocator

3. Allocate internal state on the persistent heap

4. Flush modified PM cachelines in all update operations

5. Profit?

19

Page 20: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

MOD usecases

• One Update to One Datastructure (Atomic, Consistent, Durable)

• Multiple Updates to One Datastructure (Atomic, Consistent, Durable)

• Updating Multiple Datastructures (Atomic, Consistent, Durable)

20

ds ds’Update1

ds ds’Update1

ds’’Update2

ds1 ds1’Update1

ds2 ds2’Update2

Page 21: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

dsPtrShadow = dsPtr->Update(updateParams) FENCEdsPtr = dsPtrShadow

Update(arrayPtr, index, value)

// Atomic, Durable, Consistent with 1 FENCE

1: One Update to One Datastructure

21

arrayPtr

array[8] shadowArray[8]update

All Flushes Overlapped

Page 22: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

2: Multiple Updates to One Datastructure

22

dsPtrShadow1 = dsPtr->Update1(updateParams) dsPtrShadow2 = dsPtrShadow1->Update2(updateParams)Commit (dsPtr, dsPtrShadow1, dsPtrShadow2)

All Flushes Overlapped

ds dsShadow1Update1

dsShadow2Update2

FENCEdsPtr = dsPtrShadow2

dsPtr dsPtrShadow1 dsPtrShadow2

Page 23: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

3: Updating Multiple Datastructures

23

ds1PtrShadow = ds1Ptr->Update1(updateParams1) ds2PtrShadow = ds2Ptr->Update2(updateParams2)Commit (ds1Ptr, ds1PtrShadow,

ds2Ptr, ds2PtrShadow)

All Flushes Overlapped

More ordering points but short transaction

FENCEBegin-TX {

ds1Ptr = ds1PtrShadowds2Ptr = ds2PtrShadow

} End-TX

Page 24: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Open Questions

Concurrency

Read-Copy-Update style concurrency could be interesting

Preventing Memory Leaks

Allocator can lose newly allocated data in case of crash

Further Optimizations

Batching updates, in-place updates in shadows, etc.

24

Page 25: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Outline

25

Introduction

Analyzing durability overheads on Optane

MOD Datastructures

Evaluation

Page 26: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Evaluation Methodology

Used C++ library of functional datastructures:

https://github.com/arximboldi/immer

Used off-the-shelf persistent memory allocator:

https://github.com/hyrise/nvm_malloc.git

Compared against Intel PMDK v1.5 (Oct 2018)

https://github.com/pmem/pmdk

26

Page 27: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Performance Comparison on Optane

27

Exec

utio

n Ti

me

Nor

mal

ized

to P

MDK

Page 28: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Performance Comparison on Optane

28

Exec

utio

n Ti

me

Nor

mal

ized

to P

MDK

Page 29: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

MOD: Summary

Library of recoverable datastructures: map, set, vector, stack, queue

Minimizes number of ordering points in software

Uses functional shadowing instead of STM

Offers 60% performance improvements over current STM

29

Page 30: DWDVWUXFWXUHV IRU 30 - PIRL 2019...& µ v ] } v o µ µ } µ Z } Á P ] v P } À Z 5HFLSH IRU EXLOGLQJ 02' GDWDVWUXFWXUHV )LQG HIILFLHQW IXQFWLRQDO GDWDVWUXFWXUHV LQ ODQJXDJH RI FKRLFH

Thanks!

Questions?

30