Scalable Reader-Writer Synchronization for Shared-Memory Multiprocessors Mellor-Crummey and Scott Presented by Robert T. Bauer


Page 1: Scalable Reader-Writer Synchronization for Shared-Memory Multiprocessors

Scalable Reader-Writer Synchronization for Shared-Memory Multiprocessors

Mellor-Crummey and Scott

Presented by Robert T. Bauer

Page 2:

Problem

• Efficient SMMP Reader/Writer Synchronization

Page 3:

Basics

• Readers can “share” a data structure

• Writers need exclusive access
– A write appears to be atomic

• Issues:
– Fairness: every “process” eventually runs
– Preference:
• Reader preference: a writer can starve
• Writer preference: a reader can starve

Page 4:

Organization

• Algorithm 1 – simple mutual exclusion
• Algorithm 2 – RW lock with reader preference
• Algorithm 3 – a fair lock

• Algorithm 4 – local-only spinning (fair)
• Algorithm 5 – local-only spinning, reader preference
• Algorithm 6 – local-only spinning, writer preference

• Conclusions

(Algorithms 4–6 are the paper’s contribution.)

Page 5:

Algorithm I – just a spin lock

• Idea is that processors spin on their own lock record

• Lock records form a linked list

• When a lock is released, the “next” processor waiting on the lock is signaled by passing the lock

• By using “compare-and-swap” when releasing, the algorithm guarantees FIFO order

• Spinning is “local” by design

Page 6:

Algorithm 1

• Acquire lock:
pred := fetch_and_store(L, I)
if pred /= null
I->locked := true
pred->next := I
repeat while I->locked

• Release lock:
if I->next = null
if compare_and_swap(L, I, null)
return
repeat while I->next = null
I->next->locked := false
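The pseudocode above can be rendered as a runnable C11 sketch. The names (mcs_node, mcs_acquire, mcs_release) are mine, not the paper's, and memory orderings are left at the sequentially consistent default for clarity:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct mcs_node {
    struct mcs_node *_Atomic next;  // successor waiting behind us
    atomic_bool locked;             // our own local spin flag
} mcs_node;

typedef _Atomic(mcs_node *) mcs_lock;   // tail pointer; NULL means free

void mcs_acquire(mcs_lock *L, mcs_node *I) {
    atomic_store(&I->next, NULL);
    // Swap ourselves in as the new tail of the queue.
    mcs_node *pred = atomic_exchange(L, I);
    if (pred != NULL) {                  // queue was non-empty: wait our turn
        atomic_store(&I->locked, true);
        atomic_store(&pred->next, I);    // link in behind our predecessor
        while (atomic_load(&I->locked))  // spin on OUR OWN record only
            ;
    }
}

void mcs_release(mcs_lock *L, mcs_node *I) {
    if (atomic_load(&I->next) == NULL) {
        // No known successor: try to swing the tail back to empty.
        mcs_node *expected = I;
        if (atomic_compare_exchange_strong(L, &expected, NULL))
            return;                      // queue really was empty
        // A successor is mid-enqueue; wait for it to link itself in.
        while (atomic_load(&I->next) == NULL)
            ;
    }
    atomic_store(&atomic_load(&I->next)->locked, false);  // pass the lock
}
```

Each processor passes its own node, so all spinning is on fields of that node; this is what makes the spinning "local" by design.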

Page 7:

Algorithm 2 – Simple RW lock with reader preference

Bit 0 – writer active?
Bits 31:1 – count of interested readers

start_write – repeat until compare_and_swap(L, 0, 0x1)

start_read – atomic_add(L, 2); repeat until ((L & 0x1) = 0)

end_write – atomic_add(L, -1)

end_read – atomic_add(L, -2)
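A minimal C11 rendering of these four operations, assuming the bit layout above (the names rw_lock2, start_write, etc. are mine):

```c
#include <stdatomic.h>

#define WAFLAG  0x1u   // bit 0: writer active
#define RC_INCR 0x2u   // one reader; count lives in bits 31:1

typedef atomic_uint rw_lock2;

void start_write(rw_lock2 *L) {
    unsigned expected = 0;
    // A writer may enter only when no reader or writer is present at all.
    while (!atomic_compare_exchange_weak(L, &expected, WAFLAG))
        expected = 0;
}

void end_write(rw_lock2 *L) { atomic_fetch_sub(L, WAFLAG); }

void start_read(rw_lock2 *L) {
    atomic_fetch_add(L, RC_INCR);      // announce interest first...
    while (atomic_load(L) & WAFLAG)    // ...then wait out any active writer
        ;
}

void end_read(rw_lock2 *L) { atomic_fetch_sub(L, RC_INCR); }
```

Because readers bump the count before testing the flag, a waiting writer's compare-and-swap keeps failing while any reader is interested: this is exactly the reader preference (and writer starvation) described on the Basics slide.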

Page 8:

Algorithm 3 – Fair Lock

Lock layout – two packed words, requests and completions, each split into a writer count and a reader count.

start_write – prev := fetch_clear_then_add(L->requests, MASK, 1)  // ++ write requests
repeat until completions = prev  // wait for ALL previous readers and writers to go first

end_write – clear_then_add(L->completions, MASK, 1)  // ++ write completions

start_read – prev_writers := fetch_clear_then_add(L->requests, MASK, 1) & MASK  // ++ read requests; snapshot count of previous writers
repeat until (completions & MASK) = prev_writers  // wait for previous writers only

end_read – clear_then_add(L->completions, MASK, 1)  // ++ read completions
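A hedged C11 sketch of this fair counter lock, assuming 16-bit reader and writer fields packed into each 32-bit word. The paper's fetch_clear_then_add primitive exists to handle field overflow; this sketch substitutes plain fetch_add and ignores wrap-around, and all names are mine:

```c
#include <stdatomic.h>

#define W_INCR 0x10000u    // writer count occupies the high half
#define W_MASK 0xFFFF0000u
#define R_INCR 0x1u        // reader count occupies the low half

typedef struct {
    atomic_uint requests;     // arrivals:   writer count | reader count
    atomic_uint completions;  // departures: same layout
} fair_rw_lock;

void start_write(fair_rw_lock *L) {
    unsigned prev = atomic_fetch_add(&L->requests, W_INCR);
    // Fair: wait until every earlier reader AND writer has completed.
    while (atomic_load(&L->completions) != prev)
        ;
}

void end_write(fair_rw_lock *L) { atomic_fetch_add(&L->completions, W_INCR); }

void start_read(fair_rw_lock *L) {
    unsigned prev_writers = atomic_fetch_add(&L->requests, R_INCR) & W_MASK;
    // Readers wait only for earlier writers; earlier readers may overlap.
    while ((atomic_load(&L->completions) & W_MASK) != prev_writers)
        ;
}

void end_read(fair_rw_lock *L) { atomic_fetch_add(&L->completions, R_INCR); }
```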

Page 9:

So far so good, but …

• Algorithm 2 and 3 spin on a shared memory location.

• What we want is for the algorithms to spin on processor local variables.

• Note – results weren’t presented for Algorithms 2 and 3. We can guess at their performance, though, since we know the general characteristics of contention.

Page 10:

Algorithm 4 – Fair R/W Lock: Local-Only Spinning

• Fairness algorithm:
– a read request is granted when all previous write requests have completed
– a write request is granted when all previous read and write requests have completed

Page 11:

Lock and Local Data Layout
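The layout figure for this slide is not reproduced in the transcription. As a hedged reconstruction, Algorithm 4's lock record and per-processor queue node might look like this in C; field names follow what the later slides quote (tail, reader_count, next_writer, class, next, blocked), and note the paper packs blocked and successor_class into a single state word so they can be updated atomically, whereas they are separate fields here for readability:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { NONE, READER, WRITER } succ_class;   // kind of successor
typedef enum { READING, WRITING } req_class;        // kind of request

typedef struct qnode {
    req_class class;                      // is this request a read or a write?
    struct qnode *_Atomic next;           // successor in the queue
    atomic_bool blocked;                  // each processor spins locally here
    _Atomic succ_class successor_class;   // what kind of node follows us
} qnode;

typedef struct {
    qnode *_Atomic tail;          // last requester in the queue (nil if free)
    atomic_uint reader_count;     // readers currently holding the lock
    qnode *_Atomic next_writer;   // writer waiting for readers to drain
} rw_qlock;
```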

Page 12:

Case 1: Just a Read

pred == nil; Lock.tail → I

Upon exit: Lock.tail → I, Lock.reader_count == 1

Page 13:

Case 1: Exit Read

next == nil; Lock.tail → I, so the CAS returns true

Lock.reader_count == 1, Lock.next_writer == nil

Upon exit: Lock.tail == nil, Lock.reader_count == 0

Page 14:

Case 2: Overlapping Read

After the first read: Lock.tail → I1, Lock.reader_count == 1

The 2nd reader’s pred is not nil: pred->class == reading, pred->state == [false, none]

Lock.reader_count == 2

Page 15:

Case 2: Overlapping Read

After the 2nd read enters: Lock.tail → I2, I1->next == I2

Page 16:

Case 2: Overlapping Reads

I1 finishes: next != nil

I2 finishes: Lock.tail = nil

The count goes to zero after I1 and I2 finish

Page 17:

Case 3: Read Overlaps Write

• The previous cases weren’t interesting, but they did help us get familiar with the data structures and (some of) the code.

• Now we need to consider the case where a “write” has started, but a read is requested. The read should block (spin) until the write completes.

• We need to “prove” that the spinning occurs on a locally cached memory location.

Page 18:

Case 3: Read Overlaps Write – The Write

Upon exit:

Lock.tail → I, Lock.next_writer = nil

I.class = writing, I.next = nil, I.blocked = false, I.successor_class = none

pred == nil, so blocked is reset to false

Page 19:

Case 3: Read Overlaps Write – The Read

pred->class == writing

wait here for the write to complete

Page 20:

Case 3: Read Overlaps Write – The Write Completes

I.next → the reader

Yes! This works, but it is “uncomfortable” because concerns aren’t separated

unlock the reader

Page 21:

Case 3: What if there were more than 1 reader?

change the predecessor reader

wait here

Yes! Changed by the successor

unblock the successor

Page 22:

Case 4: Write Overlaps Read

• Overlapping reads form a chain

• The overlapping write “spins”, waiting for the read chain to complete

• Reads that “enter” after the write, but before the write completes (even while the write is “spinning”), form a chain following the write (as in Case 3)

Page 23:

Case 4: Write Overlaps Read

wait here

Page 24:

Algorithm 5 – Reader Preference R/W Lock: Local-Only Spinning

• We’ll look at the Reader-Writer-Reader case and demonstrate that the second Reader completes before the Writer is signaled to start.

Page 25:

1st Reader

++reader_count; WAFLAG == 0, so the blocking test is false and the 1st reader just runs!

Page 26:

Overlapping Write

queue the write

Register writer interest; the result is not zero, since there is a reader

We have a reader, so the CAS fails.

The writer blocks here, waiting for a reader to set blocked = false

Page 27:

2nd Reader

Still no active writer: ++reader_count and run

Page 28:

Reader Completes

Only the last reader will satisfy the equality test

The last reader to complete will set WAFLAG and unblock the writer
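The last-reader handoff on this slide can be sketched in C11 as follows. The flag layout and the stand-in first_writer_blocked variable are my simplifications of the paper's writer queue, not its exact code:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define WAFLAG  0x1u   // writer active
#define WIFLAG  0x2u   // writer interested (queued)
#define RC_INCR 0x4u   // one interested reader

atomic_uint rdr_cnt_and_flags;
atomic_bool first_writer_blocked;   // stands in for the queued writer's flag

void end_read(void) {
    unsigned prev = atomic_fetch_sub(&rdr_cnt_and_flags, RC_INCR);
    // Only the LAST departing reader sees exactly one reader plus an
    // interested writer; it promotes that writer to active and unblocks it.
    if (prev == (RC_INCR + WIFLAG)) {
        unsigned expected = WIFLAG;
        if (atomic_compare_exchange_strong(&rdr_cnt_and_flags,
                                           &expected, WAFLAG))
            atomic_store(&first_writer_blocked, false);
    }
}
```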

Page 29:

Algorithm 6 – Writer Preference R/W Lock: Local-Only Spinning

• We’ll look at the Writer-Reader-Writer case and demonstrate that the second Writer completes before the Reader is signaled to start.

Page 30:

1st Writer

1st writer

Page 31:

“set_next_writer”

1st writer: writer interested or active

no readers, just the writer

the writer should run

Page 32:

1st Writer

1st writer

blocked = false, so the writer starts

Page 33:

Reader

put the reader on the queue

“register” the reader; see if there are writers

wait here for the writer to complete

Page 34:

2nd Writer

queue this write behind the other write

and wait

Page 35:

Writer Completes

start the queued write

Page 36:

Last Writer Completes

clear the write flags; signal the readers

Page 37:

Unblock Readers

++reader count; clear “readers interested”

no writers waiting or active

empty the “waiting” reader list

when this reader continues, it will unblock the “next” reader, which will unblock the “next” reader, etc.; the reader count gets bumped each time

Page 38:

Results & Conclusion

• The authors reported measured results for a different set of algorithms than the ones walked through here.

• The algorithms they measured against were more costly in a multiprocessor environment, so they claim the algorithms presented here would perform better.

Page 39:

Timing Results

Latency is costly because of the number of atomic operations.