
“Shared Memory Consistency Models: A Tutorial” – Adve & Gharachorloo


Page 1: “Shared Memory Consistency Models:  A Tutorial” – Adve & Gharachorloo

“Shared Memory Consistency Models: A Tutorial” – Adve & Gharachorloo

Robert T. Bauer

Page 2

Shared Memory

• Shared memory – single address space abstraction in a multiprocessor environment.

Page 3

Memory Model

• Specifies how reads and writes appear to execute

• May (usually) vary by level
  – Programming language can provide a memory model; for example, Java has its own (JMM, JSR 133)
  – Processor
  – Memory subsystem

Page 4

Definitions

• Sequential (Processor)
  – Result of an execution is the same as if the operations had been executed in the order specified by the program.

• Sequentially Consistent (Multiprocessor)
  – Result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by the program.

Page 5

Uniprocessor

[Figure: a single processor issuing memory operations to memory in program order — sequential]

Page 6

Multiprocessor

[Figure: two processors issuing operations to a single shared memory — sequential consistency]

Page 7

Relaxing Sequential Consistency

• Program Order
  – A write followed by a read to a different location can be reordered
  – A write followed by a write to a different location can be reordered
  – A read followed by a write to (or read from) a different location can be reordered

• Write Atomicity
  – Another processor’s write can be read even though the write is not yet visible to all processors
  – A processor’s own writes can be read even though the writes are not visible to other processors

Page 8

Uniprocessor with Write Buffer

[Figure: a processor with a write buffer in front of memory]

P1:
flag1 = 1
if (flag2 == 0) {
  critical section
}

P2:
flag2 = 1
if (flag1 == 0) {
  critical section
}

Page 9

Multiprocessor with Write Buffer

[Figure: two processors, each with its own write buffer, sharing one memory]

P1:
flag1 = 1
if (flag2 == 0) {
  critical section
}

P2:
flag2 = 1
if (flag1 == 0) {
  critical section
}

Page 10

Memory Barrier

P1:
flag1 = 1
mb()
if (flag2 == 0) {
  critical section
}

P2:
flag2 = 1
mb()
if (flag1 == 0) {
  critical section
}

Page 11

Effect of Memory Barrier

[Figure: two processors, each with a write buffer; the mb() drains the buffer before the following read]

P1:
flag1 = 1
mb()
if (flag2 == 0) {
  critical section
}

P2:
flag2 = 1
mb()
if (flag1 == 0) {
  critical section
}

Page 12

Write Through & Memory Bus

[Figure: P1 and P2, each with a write-through cache, connected through an interconnect to a memory holding head and data]

P1:
data = 2000
head = 1

P2:
while (head == 0) ;
... = data

P2 sees the write to head before seeing the write to data.

Program Order has been relaxed.

Page 13

Late Cache Invalidate Signal

• P1’s writes arrive in order at memory

• The read from data occurs before the cache-invalidate signal arrives at P2

• P2 reads the “new” value of head
• P2 reads the “old” value of data from its cache

• ISSUE
  – Memory operations need to “complete”: the cache-invalidate signal needs to propagate

• Write Atomicity has been relaxed

[Figure: the two write-through caches again; the invalidate signal for data arrives at P2 only after P2 has already read head and data]

Page 14

Fences

Page 15

Relaxing Write to Read

• A read may be reordered with respect to previous writes
  – IBM prohibits a read from returning the value of a write before the write is visible to all processors
  – TSO lets a processor read its own write early
  – Neither lets a processor read another processor’s write early (it must first be visible to all processors)
  – Our write-buffer example is similar in effect

• IBM provides a serialization instruction (so that the writes propagate and the reads won’t be reordered)

• TSO – the read won’t be reordered if the instruction is a RMW, so you can “enforce” order using a read-modify-write instruction

Page 16

Relaxing Write to Read/Write

• SPARC PSO
  – Writes to different locations can be pipelined or overlapped – they reach memory or the caches out of order
  – Otherwise like TSO: a processor may read its own writes early
  – Processors cannot read another processor’s writes before they are globally visible
  – STBAR (store barrier) prevents writes from being reordered across it

Page 17

Weak Ordering

• Data operations (reads/writes)
• Synchronization operations (fences/barriers)

• Model allows
  – Reordering of operations between synchronization operations
  – Each processor ensures that synchronization instructions are not issued until all previous operations (data and sync) are complete.

• Ensures that writes always appear atomic, so no fence is required to ensure write atomicity

Page 18

Release Consistency

• Acquire: a read memory operation that gains access to a set of shared locations

• Release: a write operation that grants permission for accessing a set of shared locations

• Two flavors
  – Maintain sequential consistency among “special” operations
  – Maintain processor consistency among “special” operations

Page 19

Release Consistency

• RC – SC
  – acquire → all, all → release, special → special
  – If an acquire appears before any operation, program order is enforced so that the acquire completes before the following operations.

• RC – PC
  – acquire → all, all → release, special → special, except for a special write followed by a special read

Page 20

RC - PC

• Program order for a read following a write requires using RMW operations; if the write being ordered is “ordinary,” then the write in the RMW needs to be a release

Page 21

Just to make it more complicated

• Alpha
  – mb: enforces program order between any two memory operations
  – wmb: only enforces program order among writes

• RMO
  – Barriers of the form (LD | ST) # (LD | ST)
  – LDST#LD means that load and store operations before the barrier must complete before any load operation after the barrier; store operations after the barrier may be reordered before it.

• Power
  – SYNC: like Alpha’s mb, except that when placed between two reads to the same location, the second read may go first
  – Power allows writes to be seen early
  – RMW sequences are used to make writes appear atomic

Page 22

Discussion/Conclusion

• System-centric: directly expose the ordering and write-atomicity relaxations. Complicated and difficult to port.

• Programmer-centric: the programmer provides information to determine which optimizations can be performed (when reading/writing particular variables). Compiler complexity increases; debugging is more difficult.

• Relaxed memory models have proven effective at increasing performance; the cost of this higher performance is greater complexity.