IBM T. J. Watson Research Center Memory Management Issues in Non-Blocking Synchronization Maged Michael IBM T J Watson Research Center ISMM 2009

Page 1: Memory Management Issues in Non-Blocking Synchronization

IBM T. J. Watson Research Center

Memory Management Issues in Non-Blocking Synchronization

Maged Michael IBM T J Watson Research Center

ISMM 2009

Page 2: Memory Management Issues in Non-Blocking Synchronization

2 Maged Michael Memory Management Issues in Non-Blocking Synchronization

Outline

Non-blocking synchronization

Dynamic memory solves problems in non-blocking algorithms

Dynamic memory raises problems in non-blocking algorithms

Memory management solutions and tradeoffs

Page 3: Memory Management Issues in Non-Blocking Synchronization

System Model

Shared memory

Scheduler

Memory access primitives: Read, Write, Compare-and-swap, ...

Threads

Page 4: Memory Management Issues in Non-Blocking Synchronization

The Scheduler

The scheduler decides when and if to let a ready thread take a step

Bad decisions by the scheduler can lead to the indefinite prevention of active threads from making progress

The scheduler does not know all dependencies among threads

In some cases (e.g., real-time applications, signal handlers, OS kernels) this is unacceptable as it may lead to deadlock, livelock, or delay of high priority operations

The scheduler can make very bad decisions

Page 5: Memory Management Issues in Non-Blocking Synchronization

Example: Deadlock in Signal Handling

1. A thread acquires a lock to operate on some shared data
2. The scheduler decides to interrupt the thread to deliver a signal
3. The signal handler runs
4. The signal handler needs to acquire the lock

The interrupted thread will not run until the signal handler completes

The signal handler will not complete until the interrupted thread releases the lock

DEADLOCK

NO LOCKS IN SIGNAL HANDLERS

Page 6: Memory Management Issues in Non-Blocking Synchronization

Non-Blocking Progress Guarantees

Three levels of non-blocking guarantees

An operation is wait-free if, whenever a thread executing the operation takes a finite number of steps, the thread must have completed the operation, regardless of the actions/inaction of other threads.

An operation is lock-free if, whenever a thread executing the operation takes a finite number of steps, some thread must have completed the operation, regardless of the actions/inaction of other threads.

An operation is obstruction-free if, whenever a thread executing the operation takes a finite number of steps alone, the thread must have completed the operation, regardless of where the other threads stopped.

– obstruction-free: no blocking
– lock-free: no livelock
– wait-free: no starvation

Page 7: Memory Management Issues in Non-Blocking Synchronization

Non-blocking is a property of operations

Non-blocking progress is a property of an operation in an implementation of an abstract shared data type

If all operations in an implementation of an abstract shared data type are non-blocking, then the whole implementation is non-blocking

E.g., A lock-free hash table implementation of a shared set

E.g., The lookup operation in a hash table implementation of a shared set is wait-free, while the insert and remove operations are blocking.

Page 8: Memory Management Issues in Non-Blocking Synchronization

Non-blocking synchronization is not about ...

Non-blocking synchronization is not just about not using locks

No locks ≠ Non-blocking

Non-blocking progress is not about fairness

Fair ≠ Non-blocking

Non-blocking synchronization is all about ...

Delay of any number of threads does not prevent active threads from making progress

Page 9: Memory Management Issues in Non-Blocking Synchronization

Simple Non-Blocking Example: Lock-Free Counter

Structures
  X : integer

Operations
  Read() : integer
  FetchAndIncrement() : integer

CAS(X,expval,newval)
  atomically
    r := (X == expval)
    if r
      X := newval
    return r

Read()
  return X

FetchAndIncrement()
  do
    oldval := X
  until CAS(X,oldval,oldval+1)
  return oldval

Read is wait-free. Completes in one step.

FetchAndIncrement is lock-free. Whenever one loop iteration (two steps) is executed, some operation must have completed.
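The counter pseudocode can be tried out with a small sketch. Python has no hardware CAS, so the `AtomicInt` class below is a hypothetical stand-in that emulates an atomic word with a lock; that makes the Python code itself blocking, and only the retry-loop structure of FetchAndIncrement is being illustrated. The `run` driver and its parameters are assumptions added for the demonstration.

```python
import threading


class AtomicInt:
    """Emulated shared integer; the lock only stands in for hardware atomicity."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        # Read(): a single atomic step
        with self._lock:
            return self._value

    def cas(self, expval, newval):
        # CAS(X,expval,newval): compare and conditionally store as one atomic step
        with self._lock:
            if self._value == expval:
                self._value = newval
                return True
            return False


def fetch_and_increment(x):
    # Retry until our CAS wins; a failed CAS means another increment completed
    while True:
        oldval = x.read()
        if x.cas(oldval, oldval + 1):
            return oldval


def run(num_threads=4, per_thread=1000):
    # Hypothetical driver: every increment must be counted exactly once
    x = AtomicInt(0)

    def worker():
        for _ in range(per_thread):
            fetch_and_increment(x)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x.read()
```

Calling `run(4, 1000)` should leave the counter at 4000, since each successful CAS accounts for exactly one increment.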

Page 10: Memory Management Issues in Non-Blocking Synchronization

Dynamic Memory and Non-Blocking Algorithms

Dynamic memory solves problems

Atomic access to large blocks

ABA problem

Dynamic memory causes problems

Persistent pointers

Memory reclamation problem

Non-blocking allocation and deallocation

ABA problem

Page 11: Memory Management Issues in Non-Blocking Synchronization

Atomic Access to Multiple Words

Some algorithms need to operate atomically on multiple or large locations that exceed the size of HW atomic primitives

E.g., Wide CAS on X (holding u)
  atomically
    ret := X == u
    if ret
      X := v

A common solution in non-blocking algorithms

Place multi-word data in a dynamic block

Updates replace the block (P points to a block holding u; an update makes P point to a new block holding v)

  ptr := P
  ret := (*ptr == u)              <- unsafe access
  if ret
    newb := new Block(v)          <- allocation
    ret := CAS(P,ptr,newb)        <- ABA problem
    delete ret ? ptr : newb       <- unsafe reclamation, deallocation

Solved one problem

Created more problems
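A minimal sketch of the block-replacement idea, assuming a garbage-collected setting so that the explicit delete (and the unsafe reclamation that comes with it) disappears. `Block`, `Pointer`, and `wide_cas` are hypothetical names, and `Pointer.cas` is a single-threaded stand-in for a hardware CAS on one pointer-sized word.

```python
class Block:
    """Immutable dynamic block holding a multi-word value."""

    def __init__(self, words):
        self.words = tuple(words)


class Pointer:
    """Stand-in for the shared pointer P; cas compares block identity."""

    def __init__(self, block):
        self.block = block

    def cas(self, expected, new):
        if self.block is expected:
            self.block = new
            return True
        return False


def wide_cas(p, u, v):
    # ptr := P
    ptr = p.block
    # ret := (*ptr == u); safe here only because the runtime keeps *ptr alive
    if ptr.words != tuple(u):
        return False
    # newb := new Block(v); ret := CAS(P,ptr,newb)
    # No delete: the replaced (or unused) block is left to the garbage collector
    return p.cas(ptr, Block(v))
```

The multi-word value is never modified in place; an update only swings one pointer, which is what lets a single-word CAS act on a wide value.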

Page 12: Memory Management Issues in Non-Blocking Synchronization

The ABA Problem

Example (P initially points to block A; blocks: A holds u and later z, B holds w, C holds v)

1. Thread i reads A from P
2. Thread i reads u from *A
3. Thread j sets P to B
4. Thread j reuses block A to hold value z
5. Thread j sets P to A again
6. Thread i allocates block C to hold value v
7. Thread i checks that P is equal to A. CAS succeeds although *P == z != u

  ptr := P                        (step 1)
  ret := (*ptr == u)              (step 2)
  if ret
    newb := new Block(v)          (step 6)
    ret := CAS(P,ptr,newb)        (step 7)
    delete ret ? ptr : newb

INCORRECT OUTCOME

Problem: CAS cannot tell if P changed or not

Page 13: Memory Management Issues in Non-Blocking Synchronization

The ABA Problem

1. A thread i reads a value A from a shared variable X
2. Other threads change X to a different value B and then back to A again
3. Thread i checks X using a primitive that cannot tell if X changed, finds X equal to A, and acts as if X never changed

Primitives susceptible to the ABA problem include read and variants of CAS

This interleaving of events is a necessary but not sufficient condition for the ABA problem. In some cases, the effect is benign.

Page 14: Memory Management Issues in Non-Blocking Synchronization

LIFO Linked List: Classic ABA Example

Introduced in IBM System 370 documentation in the 1970s

Initial list: Anchor -> A -> B -> C

Pop
  do
    first := Anchor               (step 1)
    next := *first                (step 2)
  until CAS(Anchor,first,next)    (step 5)
  return first

1. Thread i reads A from Anchor
2. Thread i reads B from *A
3. Thread j pops A and B
4. Thread j pushes A back
5. Thread i checks that Anchor is equal to A, sets Anchor to B

The List is corrupted
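The five-step interleaving can be replayed deterministically in a single thread. The sketch below is an illustration under assumptions: `Node`, `Anchor`, and `replay_aba` are hypothetical names, the initial list is taken to be A, B, C, and `Anchor.cas` is a plain stand-in for an atomic CAS that compares pointer identity.

```python
class Node:
    def __init__(self, name, next=None):
        self.name = name
        self.next = next


class Anchor:
    """Stand-in for the shared list head; cas compares node identity."""

    def __init__(self, ptr):
        self.ptr = ptr

    def cas(self, expected, new):
        if self.ptr is expected:
            self.ptr = new
            return True
        return False


def pop(anchor):
    # Uncontended pop, used for thread j's steps
    first = anchor.ptr
    anchor.ptr = first.next
    return first


def push(anchor, node):
    # Uncontended push, used for thread j's steps
    node.next = anchor.ptr
    anchor.ptr = node


def replay_aba():
    c = Node("C")
    b = Node("B", c)
    a = Node("A", b)
    anchor = Anchor(a)

    first = anchor.ptr        # step 1: thread i reads A from Anchor
    nxt = first.next          # step 2: thread i reads B from *A

    pop(anchor)               # step 3: thread j pops A ...
    pop(anchor)               #         ... and B
    push(anchor, a)           # step 4: thread j pushes A back (A now points to C)

    ok = anchor.cas(first, nxt)   # step 5: CAS succeeds, Anchor now points to B
    return ok, anchor.ptr.name, a.next.name
```

`replay_aba()` returns `(True, "B", "C")`: the CAS succeeded, the head is the already-removed node B, and C, which should be second in the list, is no longer reachable from Anchor through A.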

Page 15: Memory Management Issues in Non-Blocking Synchronization

Classic Solution: ABA Tags

Introduced in IBM System 370 documentation in 1983

Pack a tag with the shared variable. Increment tag upon every pop. Use double-width primitives

Pop
  do
    [first,tag] := Anchor                        (step 1)
    next := *first                               (step 2)
  until CASD(Anchor,[first,tag],[next,tag+1])    (step 5)
  return first

1. Thread i reads [A,tag] from Anchor
2. Thread i reads B from *A
3. Thread j pops A and B
4. Thread j pushes A back, sets Anchor to [A,tag+2]
5. Thread i finds Anchor != [A,tag] and CAS fails as it should

In the figure, the Anchor tag goes from 100 to 102

ABA problem prevented
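The same interleaving can be replayed with a tagged anchor to see the CAS fail. This is a sketch under assumptions: `TaggedAnchor` and `casd` are hypothetical stand-ins for a double-width word and a double-width CAS, the starting tag of 100 follows the figure, and push is assumed to leave the tag unchanged because the slide increments it only on pop.

```python
class Node:
    def __init__(self, name, next=None):
        self.name = name
        self.next = next


class TaggedAnchor:
    """Stand-in for a double-width word [ptr, tag] with a double-width CAS."""

    def __init__(self, ptr, tag=0):
        self.ptr = ptr
        self.tag = tag

    def read(self):
        return self.ptr, self.tag

    def casd(self, expected, new):
        exp_ptr, exp_tag = expected
        if self.ptr is exp_ptr and self.tag == exp_tag:
            self.ptr, self.tag = new
            return True
        return False


def pop(anchor):
    # Assumes a non-empty list, as the slide's pseudocode does
    while True:
        first, tag = anchor.read()
        nxt = first.next
        if anchor.casd((first, tag), (nxt, tag + 1)):
            return first


def push(anchor, node):
    # Push keeps the tag; only pops increment it
    while True:
        first, tag = anchor.read()
        node.next = first
        if anchor.casd((first, tag), (node, tag)):
            return


def replay_with_tags():
    c = Node("C")
    b = Node("B", c)
    a = Node("A", b)
    anchor = TaggedAnchor(a, 100)

    first, tag = anchor.read()    # step 1: thread i reads [A,100]
    nxt = first.next              # step 2: thread i reads B from *A

    pop(anchor)                   # step 3: thread j pops A (tag 101)
    pop(anchor)                   #         and B (tag 102)
    push(anchor, a)               # step 4: Anchor becomes [A,102]

    ok = anchor.casd((first, tag), (nxt, tag + 1))   # step 5: tag mismatch
    return ok, anchor.ptr.name, anchor.tag
```

`replay_with_tags()` returns `(False, "A", 102)`: the pointer matches but the tag does not, so thread i retries instead of corrupting the list.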

Page 16: Memory Management Issues in Non-Blocking Synchronization

Pros and Cons of ABA Tags

Pros
– Wait-free
– Low time and space overheads

Cons
– Not portable: Requires wide primitives when packed with a full word.
– Complicates/prevents reclamation of dynamic memory
– A theoretical chance of exact wraparound if tag size is exceeded

Page 17: Memory Management Issues in Non-Blocking Synchronization

ABA-Immune Primitives

LoadLinked (LL), Validate (VL), StoreConditional (SC)

Inherently immune to the ABA problem

LL(X) : value
  atomically return X

VL(X) : boolean
  atomically return X not written by others since last LL

SC(X,v) : boolean
  atomically
    r := VL(X)
    if (r)
      X := v
    return r

Pop
  do
    first := LL(Anchor)
    next := *first
  until SC(Anchor,next)
  return first

Only partially supported on real architectures

ABA solutions are often represented as LL/SC/VL implementations using practical primitives
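The LL/SC/VL semantics can be emulated to check that the Pop above rejects the ABA interleaving. This is only a sketch: `LLSCCell` is a hypothetical emulation that counts writes with a version number and records, per thread id, the version seen by that thread's last LL. It is a single-threaded model of the definitions, not how any architecture implements them.

```python
class Node:
    def __init__(self, name, next=None):
        self.name = name
        self.next = next


class LLSCCell:
    """Emulated LL/SC/VL word; every successful SC bumps a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self.links = {}   # thread id -> version observed by that thread's last LL

    def ll(self, tid):
        self.links[tid] = self.version
        return self.value

    def vl(self, tid):
        # True only if nobody has written since this thread's last LL
        return self.links.get(tid) == self.version

    def sc(self, tid, new):
        if not self.vl(tid):
            return False
        self.value = new
        self.version += 1
        return True


def pop(anchor, tid):
    while True:
        first = anchor.ll(tid)
        nxt = first.next
        if anchor.sc(tid, nxt):
            return first


def push(anchor, tid, node):
    while True:
        node.next = anchor.ll(tid)
        if anchor.sc(tid, node):
            return


def replay_llsc():
    c = Node("C")
    b = Node("B", c)
    a = Node("A", b)
    anchor = LLSCCell(a)

    first = anchor.ll("i")        # thread i: LL(Anchor) returns A
    nxt = first.next              # thread i reads B from *A

    pop(anchor, "j")              # thread j pops A
    pop(anchor, "j")              # thread j pops B
    push(anchor, "j", a)          # thread j pushes A back

    ok = anchor.sc("i", nxt)      # fails: Anchor was written since thread i's LL
    return ok, anchor.value.name, a.next.name
```

`replay_llsc()` returns `(False, "A", "C")`: the value is A again, yet SC fails because it tracks writes rather than values, and the list A, C stays intact.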

Page 18: Memory Management Issues in Non-Blocking Synchronization

Benign ABA Cases

Example: Between the read of X and a successful CAS, the value of X might have changed and returned back to its old value, but the outcome is still correct

AtomicAdd(X,v)
  do
    old := X
  until CAS(X,old,old+v)

Another example is Push in a LIFO list

Push(block)
  do
    first := Anchor.ptr
    *block := first
  until CAS(Anchor.ptr,first,block)

LL/SC/VL are unnecessarily strong as they prevent benign cases
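A short replay makes the benign case concrete for AtomicAdd. The `Word` class and the values 5, 9, and 3 are assumptions chosen for the illustration; `cas` is a single-threaded stand-in for an atomic CAS.

```python
class Word:
    """Stand-in for a shared integer with CAS."""

    def __init__(self, value):
        self.value = value

    def cas(self, expected, new):
        if self.value == expected:
            self.value = new
            return True
        return False


def replay_benign_add(v=3):
    x = Word(5)

    old = x.value                 # thread i reads 5

    x.cas(5, 9)                   # thread j changes X to 9 ...
    x.cas(9, 5)                   # ... and back to 5 (the A-B-A pattern)

    ok = x.cas(old, old + v)      # thread i's CAS succeeds
    return ok, x.value
```

With the default `v`, the replay returns `(True, 8)`. The CAS cannot tell that X changed in between, but the result is the same as if thread i's add had run after thread j's two updates, so nothing is lost.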

Page 19: Memory Management Issues in Non-Blocking Synchronization

The Memory Reclamation Problem

Example (P initially points to block A, which holds u; block B holds w)

1. Thread i reads pointer value A from P
2. Thread j sets P to B and frees A to OS
3. Thread i accesses free memory

  ptr := P                        (step 1)
  ret := (*ptr == u)              (step 3)
  if ret
    newb := new Block(v)
    ret := CAS(P,ptr,newb)
    delete ret ? ptr : newb

ACCESS VIOLATION

Page 20: Memory Management Issues in Non-Blocking Synchronization

The Memory Reclamation Problem

A thread i reads a pointer to a dynamic memory location

Another thread j removes the block and frees it

Thread i dereferences the pointer to access the freed block
– Thread i might read/write unmapped memory: access violation
– Thread i might read unrelated data from the recycled block: return incorrect result
– Thread i might write into the recycled node: corrupt some shared structure

How can we reclaim dynamic memory blocks removed from non-blocking structures while guaranteeing that no thread will access the contents of free blocks?

Page 21: Memory Management Issues in Non-Blocking Synchronization

Memory Reclamation and the ABA Problem

Two different but related problems

Memory reclamation is all about dynamic memory

No dynamic memory use implies no memory reclamation problem

The ABA problem can occur even when no dynamic memory is used at all
– E.g., array-based structures

No dynamic memory use does not imply no ABA problem

Solving the memory reclamation problem often prevents some but not all cases of the ABA problem

Complete ABA solutions can be constructed by using memory reclamation solutions

Page 22: Memory Management Issues in Non-Blocking Synchronization

How does GC help?

Completely solves the memory reclamation problem

Prevents the ABA problem if
– The ABA problem only involves pointers to dynamic blocks
– Once a dynamic block is removed, it is not reinserted (in the same structure) before going through GC
– The contents of a dynamic block are never changed while it is globally reachable

Other ABA cases can use an extra level of indirection to be preventable by GC

[Figure: block lifecycle with states allocated, inserted, removed, reclaimed; labels: may be reinserted; always reclaimed before reuse]

Pop (correct under GC)
  do
    first := Anchor
    next := *first
  until CAS(Anchor,first,next)
  return first

Page 23: Memory Management Issues in Non-Blocking Synchronization

Memory Reclamation Approaches

– Epoch-based

– Reference counting

– Hazard pointers

Page 24: Memory Management Issues in Non-Blocking Synchronization

Epoch-Based Solutions

– E.g., RCU (read-copy-update), heavily used in the Linux kernel
– Depend on the notion of quiescence points, where a thread is guaranteed not to hold references to removable memory blocks
– Typically use per-thread timestamps
– A removed block is reclaimed only after each thread (that could have had access to it) has gone through at least one quiescence point after the block was removed

Pros:
– Fast reading (no time overhead per dereference)
– No reader interference. No writer starvation by readers.

Cons:
– In user level, either blocking or can result in an unbounded number of not-yet-reclaimed removed blocks

Page 25: Memory Management Issues in Non-Blocking Synchronization

Per-Block Reference Counting

Threads increment or decrement a per-block reference counter whenever they create or destroy references to the block

Pros:
– Lock-free
– O(n) bound on not-yet-reclaimed removed blocks

Cons:
– Reader-reader contention
– Writer starvation by readers possible
– To reclaim blocks for arbitrary reuse, requires either
  • DCAS (CAS on two locations), or
  • Extra level of indirection and extra space per pointer

Page 26: Memory Management Issues in Non-Blocking Synchronization

Hazard Pointers

A hazard pointer is a single-writer multi-reader pointer

Each hazard pointer has one owner (that can write to it)

By setting a hazard pointer to the address of a dynamic block, the owner thread is telling other threads: “if any of you remove this block after the last time I set this hazard pointer to this block, don’t reclaim this block until I change my hazard pointer”

Pop
  do
    do
      first := Anchor
      *myHP := first
    until Anchor == first
    next := *first
  until CAS(Anchor,first,next)
  *myHP := null
  return first

As long as *myHP remains equal to first
– safe access: first will not be freed
– no ABA: first will not be reinserted

Page 27: Memory Management Issues in Non-Blocking Synchronization

Reclaiming Blocks under Hazard Pointers

After accumulating a number of removed nodes

1. Read active hazard pointers. Keep private copy of non-null values
   • Private copy can be arranged in an efficient search structure, e.g., hash table with constant expected lookup time
2. For each removed block, do a lookup in the private structure
   • Found? Keep block for next scan of hazard pointers
   • Not found? It is safe to reclaim the block
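The two scan steps can be sketched as follows. This is a single-threaded model under assumptions: `HazardDomain`, `protect`, and `replay_scan` are hypothetical names, one shared list of removed blocks replaces the per-thread lists a real implementation would likely keep, and "reclaiming" just moves a block to a `freed` list.

```python
class Node:
    def __init__(self, name, next=None):
        self.name = name
        self.next = next


class Anchor:
    def __init__(self, ptr):
        self.ptr = ptr


class HazardDomain:
    def __init__(self, num_threads):
        self.hp = [None] * num_threads   # one single-writer slot per thread
        self.removed = []                # removed, not yet reclaimed
        self.freed = []                  # reclaimed blocks

    def retire(self, block):
        self.removed.append(block)

    def scan(self):
        # Step 1: private copy of the non-null hazard pointers in a hash set
        protected = {id(p) for p in self.hp if p is not None}
        # Step 2: look up each removed block in the private copy
        keep = []
        for block in self.removed:
            if id(block) in protected:
                keep.append(block)       # found: keep for the next scan
            else:
                self.freed.append(block) # not found: safe to reclaim
        self.removed = keep


def protect(anchor, domain, tid):
    # Inner loop of Pop: publish the pointer, then recheck Anchor
    while True:
        first = anchor.ptr
        domain.hp[tid] = first
        if anchor.ptr is first:
            return first


def replay_scan():
    b = Node("B")
    a = Node("A", b)
    anchor = Anchor(a)
    domain = HazardDomain(2)

    first = protect(anchor, domain, 0)   # thread 0 protects A
    anchor.ptr = first.next              # thread 1 removes A ...
    domain.retire(a)                     # ... and retires it
    domain.scan()
    after_first = [n.name for n in domain.freed]

    domain.hp[0] = None                  # thread 0 clears its hazard pointer
    domain.scan()
    after_second = [n.name for n in domain.freed]
    return after_first, after_second
```

`replay_scan()` returns `([], ["A"])`: the first scan finds A in the private copy and keeps it, and the second scan reclaims it once thread 0 has cleared its hazard pointer.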

Page 28: Memory Management Issues in Non-Blocking Synchronization

Hazard Pointers

Pros:
– Wait-free
– No atomic instructions needed
  • Even reads and writes to hazard pointers can be nonatomic
– No reader interference, and no writer starvation
– Constant expected time per reclaimed block

Cons:
– Worst case O(m.n) not-yet-reclaimed removed blocks
  • m is number of active removing threads (readers of hazard pointers)
  • n is max. num. of active traversing threads (writers of hazard pointers)
  • O(m) bound possible, but at the cost of O(n) time per reclaimed block

Page 29: Memory Management Issues in Non-Blocking Synchronization

The Persistent Pointers Problem

Some non-blocking algorithms require some pointers in removed blocks to retain their values (as long as there are direct or indirect references to the blocks)

This is done for simplicity

But it can lead to unbounded memory use

Example:
– Simple linked list traversal
– But pointers in removed blocks cannot be nullified
– Unbounded memory

Page 30: Memory Management Issues in Non-Blocking Synchronization

Avoiding Persistent Pointers

Just don’t use persistent pointers in algorithms

Algorithms should be designed such that pointers in removed blocks are immediately nullifiable

But traversal becomes a bit more complicated
– Double-check that previous node still points to the current one before moving on to the next

Page 31: Memory Management Issues in Non-Blocking Synchronization

Persistent Pointers and Memory Reclamation

Algorithms with persistent pointers are limited in the memory reclamation solutions/approaches that they can use

– The restricted reuse (no reclamation) approach? NO
  • No, because the approach implies the possibility of immediate reuse of removed blocks.
– GC/reference counting/epoch-based solutions? YES
  • Yes, because these methods do not reclaim blocks that are indirectly reachable from a private reference.
  • But this same feature can lead to unbounded memory use with persistent pointers
– Hazard pointers? NO in general
  • No, because hazard pointers allow the reclamation of blocks that are indirectly reachable from private references

Page 32: Memory Management Issues in Non-Blocking Synchronization

Dynamic Memory Allocation and Deallocation

Non-blocking algorithms that use dynamic memory need a non-blocking allocator to manage the reuse of reclaimed blocks

The key challenge in building a non-blocking allocator is the capability to coalesce free blocks for arbitrary reuse or to be returned to the OS

Page 33: Memory Management Issues in Non-Blocking Synchronization

High-Level Design of A Non-Blocking Allocator

Use coalescing units (superblocks) rather than arbitrary coalescing

Keep track of each superblock’s state to detect when its blocks become fully free.
– Use a separate descriptor to avoid memory reclamation problems.

Manage the free blocks in a superblock as a linked list

Manage both the free blocks list and the superblock state together atomically

[Figure: a superblock and its separate descriptor, which holds the state]

Page 34: Memory Management Issues in Non-Blocking Synchronization

Allocation

First try a fast path of allocation from the active superblock (of the appropriate heap)

If there is no active superblock, then try to find a partially allocated superblock to make it active

If not, then allocate a new superblock of an appropriate size, divide it, and make it the active superblock after taking a block

Page 35: Memory Management Issues in Non-Blocking Synchronization

Malloc (common case)

Identify heap based on requested block size and thread id

1. Read header
2. Read descriptor packed state
3. Recheck header
4. Read next pointer of first block
5. CAS changes to packed state

Done

[Figure: heap header, descriptor with packed state (head, count, state, ABA tag), and the active superblock with blocks 0 to 7; labels include "new block", "5 ACTIVE", and "6 ACTIVE"]

Page 36: Memory Management Issues in Non-Blocking Synchronization

Deallocation

Push the freed block back into its superblock

If the superblock was fully allocated, now it becomes partially allocated and needs to be added to the set of partially allocated superblocks

If the superblock was partially allocated and now fully free, then remove it from the set of partially allocated superblocks and coalesce it

Page 37: Memory Management Issues in Non-Blocking Synchronization

Free (common case)

The block header points to the descriptor of the original superblock

1. Read descriptor packed state
2. Set next pointer of freed block
3. CAS changes to packed state

Done

[Figure: heap header, descriptor with packed state (head, count, state), and the active superblock with blocks 0 to 7; labels include "unreserved", "5 to be freed", "5 free", "5 ACTIVE", and "6 ACTIVE"]

Page 38: Memory Management Issues in Non-Blocking Synchronization

Superblock Lifecycle

States
– ACTIVE: Active, any count
– BUSY: not Active, count = 0
– PARTIAL: not Active, 0 < count < total
– FREE: not Active, count = total

Transitions
– New superblock: becomes ACTIVE
– Taking the last block: ACTIVE to BUSY
– Freeing the first block: BUSY to PARTIAL
– No Active superblock: PARTIAL to ACTIVE
– Freeing the last block: PARTIAL to FREE
– FREE: Unmap or reuse arbitrarily
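The four state conditions can be written as a small classifier. This is a sketch under one assumption read off the conditions themselves: `count` is the number of free blocks in the superblock, so `count = total` means fully free. The function name and signature are hypothetical.

```python
def superblock_state(is_active, count, total):
    """Classify a superblock from its active flag and free-block count."""
    if not 0 <= count <= total:
        raise ValueError("count must be between 0 and total")
    if is_active:
        return "ACTIVE"      # Active, any count
    if count == 0:
        return "BUSY"        # not Active, no free blocks
    if count == total:
        return "FREE"        # not Active, every block free
    return "PARTIAL"         # not Active, some but not all blocks free
```

For a superblock of 8 blocks, freeing one block of a BUSY superblock moves it from `superblock_state(False, 0, 8)` to `superblock_state(False, 1, 8)`, that is, from BUSY to PARTIAL.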

Page 39: Memory Management Issues in Non-Blocking Synchronization

Dealing with Memory Management in Non-Blocking Algorithms

First, abstract away memory management problems to focus on the core algorithm

Memory reclamation: Assume perfect GC but with explicit deallocation

ABA: Think in terms of ABA just not happening rather than LL/SC/VL

But, avoid abstractions that limit the memory management solutions or hide problems

After designing the core algorithm under these assumptions, the options for dealing with memory management remain open and it is easier to weigh the trade-offs among the solutions

Consider ABA and memory reclamation solutions together

Page 40: Memory Management Issues in Non-Blocking Synchronization

Non-Blocking GC and Its Challenges

Can we build a pure user-level non-blocking GC without special scheduler support?

Yes. One can use memory reclamation methods as a foundation. But it will be slow

The biggest challenge for a non-blocking GC is for the collector to find out the private references of the mutators at any arbitrary point, and to do so efficiently

Non-blocking memory reclamation methods add per-reference overheads

Adding these overheads to basically every load and store that may create or destroy a private reference may be prohibitively expensive

Page 41: Memory Management Issues in Non-Blocking Synchronization

Concluding Remarks

Memory management solves problems and creates problems in the design of non-blocking algorithms

There were many advances in non-blocking memory management in this decade but there is space for more

Non-blocking synchronization is intertwined with memory management

The memory reclamation and ABA problems occur under blocking optimistic concurrency

Page 42: Memory Management Issues in Non-Blocking Synchronization

THANK YOU