M4 – Parallelism Directory based Cache Coherence Protocol

Directory based Cache Coherence Protocol · Directory Based Cache Coherence Broadcast based snooping protocols do not scale well to large multiprocessors Distributed Memory Machines



M4 – Parallelism

Directory based Cache Coherence Protocol


Outline

● Parallelism

● Flynn’s classification

● Vector Processing

– Subword Parallelism

● Symmetric Multiprocessors, Distributed Memory Machines

– Shared Memory Multiprocessing, Message Passing

● Synchronization Primitives

– Locks, LL-SC

● Cache coherence


Shared Memory vs. Distributed Memory

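(Figure: left, a shared-memory machine in which four processors (P), each with a cache (C), share one main memory; right, a distributed-memory machine in which each processor (P) has its own local memory (M) and the nodes are connected by an interconnect.)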

Directory Based Cache Coherence

(Figure: three nodes connected by an interconnection network. Each node has a CPU (A, B or C) with a private cache, plus a local memory (M) and a directory (D).)


(Figure, A: Read X: CPU A sends a read request for block X over the interconnection network to the directory at X's home node. The directory marks X as Shared and records A in its sharer list (S: A).)

(Figure, B: Read X: CPU B's request (Read X) travels to the same home directory. The directory adds B to the sharer list (S: A, B), the data for X is sent to B, and B's copy is also in the Shared state.)
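The two read steps can be sketched as a directory entry that holds a state and a sharer list. This is a minimal Python sketch, not the slides' implementation: the `DirectoryEntry` class, its `Uncached` state, and the message strings are hypothetical, and "sending" a message just appends it to a log. The Modified branch (fetching data from an owner) is an assumption the frames above do not show.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Hypothetical home-directory entry for one memory block (block X)."""
    state: str = "Uncached"                       # "Uncached", "Shared" or "Modified"
    sharers: set = field(default_factory=set)     # sharer list (S: ...) or owner (M: ...)
    messages: list = field(default_factory=list)  # log of point-to-point messages sent

    def read(self, requester: str) -> None:
        """Handle a Read X request that has arrived at the home directory."""
        if self.state == "Modified":
            # Assumption: the owner is asked for the dirty data and keeps a Shared copy.
            (owner,) = self.sharers
            self.messages.append(f"Fetch X -> {owner}")
        self.state = "Shared"
        self.sharers.add(requester)                 # e.g. S: A, then S: A, B
        self.messages.append(f"Data X -> {requester}")


x = DirectoryEntry()
x.read("A")                          # A: Read X
x.read("B")                          # B: Read X
print(x.state, sorted(x.sharers))    # Shared ['A', 'B']
print(x.messages)                    # ['Data X -> A', 'Data X -> B']
```

Only the home directory is contacted for each read; no other cache sees the request, which is the point-to-point behaviour the directory scheme relies on.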

(Figure, A: Write X: with X shared by A and B, CPU A's write request goes to the home directory, which sends an invalidation (Inv X) point-to-point to B. B marks its copy Invalid and replies with an ACK. The directory then records X as Modified with A as its only owner (M: A).)
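The write step (invalidate the other sharers, collect their ACKs, then record the writer as owner) can be sketched as follows. This is a hypothetical, self-contained `DirectoryEntry` and is not taken from the slides. The combined `Fetch/Inv X` message sent to a Modified owner, the `Grant M` reply, and the immediate ACKs are all simplifying assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Hypothetical home-directory entry for block X."""
    state: str = "Uncached"                       # "Uncached", "Shared" or "Modified"
    sharers: set = field(default_factory=set)     # sharer list (S: ...) or owner (M: ...)
    messages: list = field(default_factory=list)  # log of point-to-point messages

    def write(self, requester: str) -> int:
        """Handle a Write X request; returns how many ACKs the directory waited for."""
        # Assumption: a Modified owner gets one combined fetch-and-invalidate message.
        kind = "Fetch/Inv X" if self.state == "Modified" else "Inv X"
        targets = sorted(self.sharers - {requester})
        for node in targets:
            self.messages.append(f"{kind} -> {node}")   # point-to-point, no broadcast
        for node in targets:
            self.messages.append(f"ACK <- {node}")      # assumed: every target acknowledges
        # Only after every ACK is the requester made the sole owner (M: requester).
        self.state = "Modified"
        self.sharers = {requester}
        self.messages.append(f"Grant M -> {requester}")
        return len(targets)


x = DirectoryEntry(state="Shared", sharers={"A", "B"})
print(x.write("A"), x.state, x.sharers)   # 1 Modified {'A'}
print(x.messages)          # ['Inv X -> B', 'ACK <- B', 'Grant M -> A']
x.write("C")               # C: Write X while A owns the block
print(x.messages[-3:])     # ['Fetch/Inv X -> A', 'ACK <- A', 'Grant M -> C']
```

Waiting for every ACK before granting ownership is what keeps two writers from both believing they hold X in Modified.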

(Figure, summary: the directory entry for X moves from Shared with sharer list S: A, B (after B: Read X) to Modified with owner M: A when a write request (labelled C, A: Write X) invalidates the other copies. The frame also shows a new request, C: Write X, arriving at the home directory.)

Directory Based Cache Coherence

● Broadcast based snooping protocols do not scale well to large multiprocessors

● Distributed Memory Machines

– Physical memory is distributed among all processors

● A directory tracks the sharing status of each block of memory

– Each node has a directory for the blocks in its local memory

● The physical address determines the data location (the block's home node)

● Coherence messages are sent between nodes over the interconnection network (ICN)

– Point-to-point messages (no broadcast)
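Two of these points can be made concrete: the physical address selects a block's home node, and invalidations go only to recorded sharers rather than to every node. This sketch assumes simple block interleaving (home = block number mod node count), a 64-byte block, and 16 nodes. Real machines may map addresses differently, and `home_node` and `invalidation_targets` are hypothetical helpers.

```python
BLOCK_SIZE = 64   # bytes per cache block (assumed)
NUM_NODES = 16    # nodes in the machine (assumed)


def home_node(paddr: int, num_nodes: int = NUM_NODES, block_size: int = BLOCK_SIZE) -> int:
    """Home node of the block containing physical address paddr (block-interleaved)."""
    return (paddr // block_size) % num_nodes


def invalidation_targets(sharers: set, writer: int, num_nodes: int = NUM_NODES):
    """Nodes that receive an invalidation when `writer` writes a shared block."""
    directory = sorted(sharers - {writer})                    # only the recorded sharers
    broadcast = [n for n in range(num_nodes) if n != writer]  # every other node
    return directory, broadcast


print(home_node(0x1000), home_node(0x103F), home_node(0x1040))  # 0 0 1
d, b = invalidation_targets({1, 5, 9}, writer=1)
print(d, len(b))   # [5, 9] 15
```

With three sharers, the directory sends two point-to-point invalidations, while a broadcast scheme would reach all 15 other nodes. That gap widens as the machine grows.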


Slide Contents

● Rajeev Balasubramonian, CS6810, University of Utah.


Extra


Shared Memory vs. Message Passing

● Shared Memory Machine: processors share the same physical address space

– Implicit communication, hardware-controlled cache coherence

● Message Passing Machine

– Explicit, programmed communication

– No cache coherence (simpler hardware)

– Message passing libraries: MPI

(Figure: left, processors (P) with caches (C) sharing one main memory; right, processors (P) with local memories (M) connected by an interconnect.)
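The difference between implicit and explicit communication can be sketched with Python threads. In the shared-memory style, workers communicate by writing a shared variable under a lock. In the message-passing style, they communicate only by sending and receiving messages. Both functions run in one Python process, so this mimics only the programming models and not the hardware. `queue.Queue` stands in for a network channel, and real MPI code would use calls such as `MPI_Send`/`MPI_Recv`.

```python
import queue
import threading


def shared_memory_sum(values, num_workers=4):
    """Implicit communication: workers add into one variable they all share."""
    total = [0]                   # a location in the common address space
    lock = threading.Lock()       # synchronization primitive guarding the update

    def worker(chunk):
        partial = sum(chunk)
        with lock:
            total[0] += partial   # communicating is just writing shared memory

    chunks = [values[i::num_workers] for i in range(num_workers)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


def message_passing_sum(values, num_workers=4):
    """Explicit communication: workers send partial sums as messages."""
    mailbox = queue.Queue()       # stands in for the network (cf. MPI_Send / MPI_Recv)

    def worker(chunk):
        mailbox.put(sum(chunk))   # explicit send; no shared result variable

    chunks = [values[i::num_workers] for i in range(num_workers)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in range(num_workers))  # explicit receives


data = list(range(100))
print(shared_memory_sum(data), message_passing_sum(data))  # 4950 4950
```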


Cache Coherence

● Consistency

– When a written value must become visible to reads

– Memory Consistency Models

● Coherence

– Which value a read returns

● A memory system is coherent if:

– Write Propagation

● A write becomes visible to other processors after a sufficient time lapse

– Write Serialization

● All writes to a location are seen by every processor in the same order


Multiprocessor Cache Coherence

● A read by a processor P to a location X that follows a write by P to X, with no writes of X by another processor occurring between the write and the read by P, always returns the value written by P.

● A read by a processor to location X that follows a write by another processor to X returns the written value if the read and write are sufficiently separated in time and no other writes to X occur between the two accesses.

● Writes to the same location are serialized; that is, two writes to the same location by any two processors are seen in the same order by all processors.
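The third condition (write serialization) can be checked on a trace. This minimal sketch assumes each processor's view is the order in which it observed the writes to one location, using hypothetical labels such as "w1". The writes are serialized if some single order agrees with every view, which is a topological sort over the combined "seen before" edges. The sketch uses `graphlib`, which is in the Python standard library from 3.9. `serialized_order` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter


def serialized_order(views):
    """Return one global write order consistent with every processor's view, or None.

    views maps a processor name to the list of writes (to one location) in the
    order that processor observed them; a view may skip writes it never saw.
    """
    graph = {}
    for seen in views.values():
        for w in seen:
            graph.setdefault(w, set())
        for earlier, later in zip(seen, seen[1:]):
            graph[later].add(earlier)     # `later` must follow `earlier` globally
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None                       # processors disagree on the write order


print(serialized_order({"P1": ["w1", "w2"], "P2": ["w1", "w2"], "P3": ["w2"]}))  # ['w1', 'w2']
print(serialized_order({"P1": ["w1", "w2"], "P2": ["w2", "w1"]}))                # None
```

A pairwise comparison of views is not enough. P1 seeing a before b, P2 seeing b before c, and P3 seeing c before a has no conflicting pair, yet no single order exists, which is why the sketch looks for a cycle.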


Write Invalidate Coherence Protocol

● Writeback / Writethrough

● Enforcing write serialization

– Bus arbitration

● Tag contention, duplication


SMP Cache Coherence

● MSI Protocol

● MESI Protocol

– Exclusive state: a write to a block in the Exclusive state needs no invalidate messages.

– Intel i7 uses MESIF

● MOESI Protocol

– Owned state: this cache holds the up-to-date copy (other caches may hold Shared copies); the main memory copy is stale.

– Owner supplies data on a miss.
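The MESI and MOESI points above can be sketched as a per-line transition function for a snooping cache. This is an illustrative MOESI sketch, not a specific processor's protocol. The event names (`PrRd`, `PrWr`, `BusRd`, `BusRdX`) and bus actions (`BusUpgr`, `Flush`) follow common textbook conventions and are assumptions here. Plain MESI differs mainly in that a Modified line hit by a remote read writes back and goes to Shared instead of Owned.

```python
def next_state(state: str, event: str, others_have_copy: bool = False):
    """Return (next_state, bus_action) for one cache line under a MOESI sketch.

    PrRd/PrWr: this processor reads/writes the line.
    BusRd/BusRdX: another cache reads / requests the line for writing.
    """
    if event == "PrRd":
        if state == "I":
            return ("S" if others_have_copy else "E"), "BusRd"   # read miss
        return state, None                                        # hit in M, O, E or S
    if event == "PrWr":
        if state in ("M", "E"):
            return "M", None          # E -> M silently: no invalidate messages
        if state in ("S", "O"):
            return "M", "BusUpgr"     # invalidate the other copies
        return "M", "BusRdX"          # I: fetch the line with write intent
    if event == "BusRd":
        if state in ("M", "O"):
            return "O", "Flush"       # owner supplies data; memory copy stays stale
        if state == "E":
            return "S", None
        return state, None            # S stays S, I stays I
    if event == "BusRdX":
        if state in ("M", "O"):
            return "I", "Flush"       # hand the up-to-date data to the writer
        return "I", None
    raise ValueError(f"unknown event {event!r}")


s = "I"
for event in ["PrRd", "PrWr", "BusRd", "PrWr"]:
    s, action = next_state(s, event)
    print(event, "->", s, action)
# PrRd -> E BusRd
# PrWr -> M None
# BusRd -> O Flush
# PrWr -> M BusUpgr
```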


SMP Example

(Figure: Processors A, B, C and D, each with its own caches, connected to main memory and the I/O system.)

Access sequence: A: Rd X, B: Rd X, C: Rd X, A: Wr X, A: Wr X, C: Wr X, B: Rd X, A: Rd X, A: Rd Y, B: Wr X, B: Rd Y, B: Wr X, B: Wr Y
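The slide lists the access sequence without naming a protocol. The hypothetical `run_msi` helper below replays it under a plain MSI snooping protocol, with the assumptions that there is no Exclusive state and that a write to a Shared block also issues `BusRdX`. It reports the final cache states and the bus transactions. This is a sketch for tracing the example by hand, not a model of any real machine.

```python
TRACE = ["A: Rd X", "B: Rd X", "C: Rd X", "A: Wr X", "A: Wr X", "C: Wr X",
         "B: Rd X", "A: Rd X", "A: Rd Y", "B: Wr X", "B: Rd Y", "B: Wr X", "B: Wr Y"]


def run_msi(trace, procs="ABCD"):
    """Replay accesses like "A: Rd X" under an MSI snooping sketch.

    Returns (state, bus): state maps (processor, block) -> "M"/"S"/"I"
    (a missing key means "I"); bus lists the bus transactions in order.
    """
    state = {}
    bus = []
    for access in trace:
        proc, op_block = access.split(":")
        op, block = op_block.split()
        proc = proc.strip()
        mine = state.get((proc, block), "I")
        others = [p for p in procs if p != proc]
        if op == "Rd":
            if mine == "I":                              # read miss
                bus.append(f"BusRd {block} ({proc})")
                for p in others:
                    if state.get((p, block)) == "M":
                        state[(p, block)] = "S"          # owner flushes and downgrades
                state[(proc, block)] = "S"
        elif op == "Wr":
            if mine != "M":                              # write miss or upgrade
                bus.append(f"BusRdX {block} ({proc})")
                for p in others:
                    if state.get((p, block), "I") != "I":
                        state[(p, block)] = "I"          # invalidate every other copy
                state[(proc, block)] = "M"
        else:
            raise ValueError(f"unknown op in {access!r}")
    return state, bus


final, bus = run_msi(TRACE)
for block in "XY":
    print(block, {p: final.get((p, block), "I") for p in "ABCD"})
print(len(bus), "bus transactions")
# X {'A': 'I', 'B': 'M', 'C': 'I', 'D': 'I'}
# Y {'A': 'I', 'B': 'M', 'C': 'I', 'D': 'I'}
# 11 bus transactions
```

Only the hits (the second A: Wr X and the second B: Wr X) avoid the bus. Every other access in the sequence is a miss or an upgrade.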