“Shared Memory Consistency Models: A Tutorial” – Adve & Gharachorloo
Robert T. Bauer
Shared Memory
• Shared memory: a single-address-space abstraction in a multiprocessor environment.
Memory Model
• Specifies how reads and writes appear to be executed
• May vary (and usually does) by level
– Programming language: can provide a memory model; for example, Java has its own (JMM, JSR 133)
– Processor
– Memory subsystem
Definitions
• Sequential (Processor)
– Result of an execution is the same as if the operations had been executed in the order specified by the program.
• Sequentially Consistent (Multiprocessor)
– Result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by the program.
Uniprocessor
[Diagram: a single processor connected to memory; memory operations issue in program order – sequential]
Multiprocessor
[Diagram: multiple processors sharing a single memory – Sequential Consistency]
Relaxing Sequential Consistency
• Program Order
– A write followed by a read to a different location can be reordered
– A write followed by a write to a different location can be reordered
– A read followed by a write to (or read from) a different location can be reordered
• Write Atomicity
– Another processor’s write can be read even though the write is not yet visible to all other processors
– A processor’s own writes can be read even though the writes are not visible to other processors
Uniprocessor with Write Buffer
[Diagram: a processor whose writes pass through a write buffer to memory]
P1:
flag1 = 1
if (flag2 == 0) { critical section }
P2:
flag2 = 1
if (flag1 == 0) { critical section }
Multiprocessor with Write Buffer
[Diagram: two processors, each with its own write buffer, sharing memory]
P1:
flag1 = 1
if (flag2 == 0) { critical section }
P2:
flag2 = 1
if (flag1 == 0) { critical section }
Memory Barrier
P1:
flag1 = 1
mb()
if(flag2 == 0){
critical section
}
P2:
flag2 = 1
mb()
if(flag1 == 0){
critical section
}
Effect of Memory Barrier
[Diagram: two processors with write buffers; mb() drains the buffer before the following read issues]
P1:
flag1 = 1
mb()
if (flag2 == 0) { critical section }
Write Through & Memory Bus
[Diagram: P1 and P2, each with a write-through cache, connected by an interconnect to a memory holding head and data]
P1:              P2:
data = 2000      while (head == 0) ;
head = 1         … = data
P2 sees the write to “head” before seeing the write to “data”:
Program Order has been relaxed
Late Cache Invalidate Signal
• P1’s writes arrive in order at memory
• The read from data occurs before the cache-invalidate signal arrives at P2
• P2 reads the “new” value of head
• P2 reads the “old” value of data from its cache
• ISSUE
– Memory operations need to “complete”: the cache-invalidate signal needs to propagate
• Write Atomicity has been relaxed
[Diagram: (1) P1’s writes to data and head reach memory; (2) the invalidate for data is still in flight in the interconnect; (3) P2 reads the new head from memory but the old data from its write-through cache]
Fences
Relaxing Write to Read
• Reorder a read past previous writes
– IBM prohibits a read from returning the value of a write before the write is visible to all processors
– TSO: a processor can read its own write early
– A processor cannot read another processor’s write early (it must be visible to all processors)
– Our write-buffer example is similar in effect
• IBM has a serialization instruction (so that the writes propagate and the reads won’t be reordered)
• TSO: the write and read won’t be reordered if the instruction is a RMW, so you can “enforce” order using a read-modify-write instruction.
Relaxing Write to Read/Write
• SPARC PSO
– Writes to different locations can be pipelined or overlapped; they may reach memory or the caches out of order
– Like TSO, PSO allows a processor to read its own writes early
– Processors cannot read another processor’s writes before they are globally visible
– STBAR (store barrier) instruction prevents writes from being reordered across it
Weak Ordering
• Distinguishes data operations (reads/writes) from synchronization operations (fences/barriers)
• Model allows reordering of operations between synchronization operations
• Each processor ensures that a synchronization instruction is not issued until all previous operations (data and sync) are complete
• Ensures that writes always appear atomic, so no fence is required to ensure write atomicity
Release Consistency
• Acquire: a read memory operation that gains access to a set of shared locations
• Release: a write operation that grants permission for accessing a set of shared locations
• Two flavors
– Maintain sequential consistency among “special” operations
– Maintain processor consistency among “special” operations
Release Consistency
• RC – SC
– acquire → all, all → release, special → special
– If an acquire appears before any operation, program order is enforced so that the acquire completes before the following operations.
• RC – PC
– acquire → all, all → release, special → special, except for a special write followed by a special read
RC - PC
• Enforcing program order for a read following a write requires using RMW operations; if the write being ordered is “ordinary”, then the write in the RMW needs to be a release
Just to make it more complicated
• Alpha
– mb: enforces program order between any statements
– wmb: enforces program order only among write statements
• RMO
– MEMBAR flavors of the form (LD | ST) # (LD | ST)
– LD|ST # LD means that load and store operations before the barrier must complete before any load operation after the barrier; store operations after the barrier may be reordered before it
• Power
– SYNC: like Alpha’s mb, except that when placed between two reads to the same location, the second read may go first
– Power allows writes to be seen early
– RMW sequences are used to make writes appear atomic
Discussion/Conclusion
• System-centric: directly exposes the ordering and write-atomicity relaxations. Complicated and difficult to port.
• Programmer-centric: the programmer provides information to determine which optimizations can be performed (when reading/writing particular variables). Compiler complexity increases; debugging is more difficult.
• Relaxed memory models have proven effective at increasing performance; the cost of this higher performance is greater complexity.