“Shared Memory Consistency Models: A Tutorial” – Adve & Gharachorloo
Robert T. Bauer
Shared Memory
• Shared memory: a single-address-space abstraction in a multiprocessor environment.
Memory Model
• Specifies how reads and writes appear to be executed
• May vary (and usually does) by level
– Programming language: can provide a memory model; for example, Java has its own (JMM, JSR 133)
– Processor
– Memory subsystem
Definitions
• Sequential (Processor)
– Result of an execution is the same as if the operations had been executed in the order specified by the program.
• Sequentially Consistent (Multiprocessor)
– Result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by the program.
Uniprocessor
[Diagram: a single processor connected to memory; memory operations issue in program order – sequential]
Multiprocessor
[Diagram: multiple processors sharing a single memory – Sequential Consistency]
Relaxing Sequential Consistency
• Program Order
– A write followed by a read to a different location can be reordered
– A write followed by a write to a different location can be reordered
– A read followed by a write to (or read from) a different location can be reordered
• Write Atomicity
– Another processor’s write can be read even though the write is not yet visible to all other processors
– A processor’s own writes can be read even though the writes are not visible to other processors
Uniprocessor with Write Buffer
[Diagram: a processor whose writes pass through a write buffer to memory]
P1:
flag1 = 1
if (flag2 == 0) { critical section }
P2:
flag2 = 1
if (flag1 == 0) { critical section }
Multiprocessor with Write Buffer
[Diagram: two processors, each with its own write buffer, sharing memory]
P1:
flag1 = 1
if (flag2 == 0) { critical section }
P2:
flag2 = 1
if (flag1 == 0) { critical section }
Memory Barrier
P1:
flag1 = 1
mb()
if(flag2 == 0){
critical section
}
P2:
flag2 = 1
mb()
if(flag1 == 0){
critical section
}
Effect of Memory Barrier
[Diagram: two processors with write buffers; mb() drains the buffer before the following read issues]
P1:
flag1 = 1
mb()
if (flag2 == 0) { critical section }
Write Through & Memory Bus
[Diagram: P1 and P2, each with a write-through cache, connected by an interconnect to a memory holding head and data]
P1:              P2:
data = 2000      while (head == 0) ;
head = 1         … = data
P2 sees the write to “head” before seeing the write to “data”:
Program Order has been relaxed
Late Cache Invalidate Signal
• P1’s writes arrive in order at memory
• The read from data occurs before the cache-invalidate signal arrives at P2
• P2 reads the “new” value of head
• P2 reads the “old” value of data from its cache
• ISSUE
– Memory operations need to “complete”: the cache-invalidate signal needs to propagate
• Write Atomicity has been relaxed
[Diagram: (1) P1’s writes to data and head reach memory; (2) the invalidate for data is still in flight in the interconnect; (3) P2 reads the new head from memory but the old data from its write-through cache]
Fences
Relaxing Write to Read
• Reorder a read past previous writes
– IBM prohibits a read from returning the value of a write before the write is visible to all processors
– TSO: a processor can read its own write early
– A processor cannot read another processor’s write early (it must be visible to all processors)
– Our write-buffer example is similar in effect
• IBM has a serialization instruction (so that the writes propagate and the reads won’t be reordered)
• TSO: the write and read won’t be reordered if the instruction is a RMW, so you can “enforce” order using a read-modify-write instruction.
Relaxing Write to Read/Write
• SPARC PSO
– Writes to different locations can be pipelined or overlapped; they may reach memory or the caches out of order
– Like TSO, PSO allows a processor to read its own writes early
– Processors cannot read another processor’s writes before they are globally visible
– STBAR (store barrier) instruction prevents writes from being reordered across it
Weak Ordering
• Distinguishes data operations (reads/writes) from synchronization operations (fences/barriers)
• Model allows reordering of operations between synchronization operations
• Each processor ensures that a synchronization instruction is not issued until all previous operations (data and sync) are complete
• Ensures that writes always appear atomic, so no fence is required to ensure write atomicity
Release Consistency
• Acquire: a read memory operation that gains access to a set of shared locations
• Release: a write operation that grants permission for accessing a set of shared locations
• Two flavors
– Maintain sequential consistency among “special” operations
– Maintain processor consistency among “special” operations
Release Consistency
• RC – SC
– acquire → all, all → release, special → special
– If an acquire appears before any operation, program order is enforced so that the acquire completes before the following operations.
• RC – PC
– acquire → all, all → release, special → special, except for a special write followed by a special read
RC - PC
• Enforcing program order for a read following a write requires using RMW operations; if the write being ordered is “ordinary”, then the write in the RMW needs to be a release
Just to make it more complicated
• Alpha
– mb: enforces program order between any statements
– wmb: enforces program order only among write statements
• RMO
– MEMBAR flavors of the form (LD | ST) # (LD | ST)
– LD|ST # LD means that load and store operations before the barrier must complete before any load operation after the barrier; store operations after the barrier may be reordered before it
• Power
– SYNC: like Alpha’s mb, except that when placed between two reads to the same location, the second read may go first
– Power allows writes to be seen early
– RMW sequences are used to make writes appear atomic
Discussion/Conclusion
• System-centric: directly exposes the ordering and write-atomicity relaxations. Complicated and difficult to port.
• Programmer-centric: the programmer provides information to determine which optimizations can be performed (when reading/writing particular variables). Compiler complexity increases; debugging is more difficult.
• Relaxed memory models have proven effective at increasing performance; the cost of this higher performance is greater complexity.