45
CSC/ECE 506: Architecture of Parallel Computers The Cache-Coherence Problem Chapter 7 1

The Cache-Coherence Problem

  • Upload
    candra

  • View
    34

  • Download
    0

Embed Size (px)

DESCRIPTION

The Cache-Coherence Problem. Chapter 7. Cache. Cache. Cache. Example 1: The Cache-Coherence Problem. sum = 0; begin parallel for (i=0; i

Citation preview

Page 1: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

The Cache-Coherence Problem

The Cache-Coherence Problem

Chapter 7Chapter 7

1

Page 2: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Example 1: The Cache-Coherence Problem

sum = 0;begin parallelfor (i=0; i<2; i++) { lock(id, myLock); sum = sum + a[i]; unlock(id, myLock);end parallelprint sum;

Suppose a[0] = 3 and a[1] = 7

P1

CacheCache

P2

CacheCache

Pn

CacheCache

. . .

• Will it print sum = 10?

2

Page 3: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Cache-Coherence Problem Illustration

Start state. All caches empty and main memory has Sum = 0.

Start state. All caches empty and main memory has Sum = 0.

P1

CacheCache

P2

CacheCache

P3

CacheCache

Main memoryMain memory

Sum = 0Sum = 0

ControllerControllerTraceTrace

P1P1 Read SumRead Sum

P2P2 Read SumRead Sum

P1P1 Write Sum = 3Write Sum = 3

P2P2 Write Sum = 7Write Sum = 7

P1P1 Read SumRead Sum

Bus

Bus

3

Page 4: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Cache-Coherence Problem Illustration

P1 reads Sum from memory.P1 reads Sum from memory. P1

CacheCache

P2

CacheCache

P3

CacheCache

Main memoryMain memory

Sum = 0Sum = 0

ControllerControllerTraceTrace

P1P1 Read SumRead Sum

P2P2 Read SumRead Sum

P1P1 Write Sum = 3Write Sum = 3

P2P2 Write Sum = 7Write Sum = 7

P1P1 Read SumRead Sum

Bus

Bus

Sum=0Sum=0 VV

4

Page 5: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Cache-Coherence Problem Illustration

P2 reads. Let’s assume this

comes from memory too.

P2 reads. Let’s assume this

comes from memory too.P1

CacheCache

P2

CacheCache

P3

CacheCache

Main memoryMain memory

Sum = 0Sum = 0

ControllerControllerTraceTrace

P1P1 Read SumRead Sum

P2P2 Read SumRead Sum

P1P1 Write Sum = 3Write Sum = 3

P2P2 Write Sum = 7Write Sum = 7

P1P1 Read SumRead Sum

Bus

Bus

Sum=0Sum=0 VV Sum=0Sum=0 VV

5

Page 6: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Cache-Coherence Problem Illustration

P1 writes. This write goes

to the cache.

P1 writes. This write goes

to the cache.P1

CacheCache

P2

CacheCache

P3

CacheCache

Main memoryMain memory

Sum = 0Sum = 0

ControllerControllerTraceTrace

P1P1 Read SumRead Sum

P2P2 Read SumRead Sum

P1P1 Write Sum = 3Write Sum = 3

P2P2 Write Sum = 7Write Sum = 7

P1P1 Read SumRead Sum

Bus

Bus

Sum=3Sum=3 DD Sum=0Sum=0 VV

6

Sum=0Sum=0 VV

Page 7: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Cache-Coherence Problem Illustration

P2 writes.P2 writes. P1

CacheCache

P2

CacheCache

P3

CacheCache

Main memoryMain memory

Sum = 0Sum = 0

ControllerControllerTraceTrace

P1P1 Read SumRead Sum

P2P2 Read SumRead Sum

P1P1 Write Sum = 3Write Sum = 3

P2P2 Write Sum = 7Write Sum = 7

P1P1 Read SumRead Sum

Bus

Bus

Sum=3Sum=3 DD Sum=7Sum=7 DD

7

Sum=0Sum=0 VV

Page 8: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Cache-Coherence Problem Illustration

P1 reads.P1 reads. P1

CacheCache

P2

CacheCache

P3

CacheCache

Main memoryMain memory

Sum = 0Sum = 0

ControllerControllerTraceTrace

P1P1 Read SumRead Sum

P2P2 Read SumRead Sum

P1P1 Write Sum = 3Write Sum = 3

P2P2 Write Sum = 7Write Sum = 7

P1P1 Read SumRead Sum

Bus

Bus

Sum=3Sum=3 DD Sum=7Sum=7 DD

8

Page 9: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Cache-Coherence Problem

• Do P1 and P2 see the same sum?

• Does it matter if we use a WT cache?

• What if we do not have caches, or sum is uncacheable.

Will it work?

9

Page 10: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Write-Through Cache Does Not Work

P1 reads.P1 reads. P1

CacheCache

P2

CacheCache

P3

CacheCache

Main memoryMain memory

Sum = 7Sum = 7

ControllerControllerTraceTrace

P1P1 Read SumRead Sum

P2P2 Read SumRead Sum

P1P1 Write Sum = 3Write Sum = 3

P2P2 Write Sum = 7Write Sum = 7

P1P1 Read SumRead Sum

Bus

Bus

Sum=3Sum=3 DD Sum=7Sum=7 DD

10

Page 11: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Coherence with Write-Through Caches

11

sum = 0;begin parallelfor (i=0; i<2; i++) { lock(id, myLock); sum = sum + a[i]; unlock(id, myLock);end parallelPrint sum;

Suppose a[0] = 3 and a[1] = 7

P1

CacheCache

P2

CacheCache

Pn

CacheCache

. . .

= Snooper

– Bus-based SMP implementation choice in the mid-’80s– What happens when we snoop a write? invalidating/update the block

• No new states or bus transactions in this case• Write-update protocol: write is immediately propagated• Write-invalidation protocol: causes miss on later access, and memory up-

to-date via write-through

Page 12: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Bus-Based Coherent Multiprocessors

Bus-Based Coherent Multiprocessors

12

Chapter 8Chapter 8

Page 13: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

MSI: Processor-Initiated Transactions

13

M

I

S

PrRd/ –PrWr/ –

PrWr/BusRdX

PrRd/BusRd

PrRd/ / – -

PrWr/BusRdX

Why does a PrWr in state S induce a BusRdX?

Page 14: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

MSI: Bus-Initiated Transactions

14

M

I

S

BusRdX/Flush

BusRdX/ –

BusRd/-

BusRd/Flush

BusRd/ –BusRdX/ –

Thus, validdata must besuppliedby memory

Page 15: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

MSI Visualization – Start State

15

Start state. All caches empty and main memory has A = 1.

Start state. All caches empty and main memory has A = 1.

P1

CacheCacheSnooperSnooper

P2

CacheCacheSnooperSnooper

P3

CacheCacheSnooperSnooper

Main memoryMain memoryA = 1A = 1ControllerControllerTraceTraceP1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

Bus

Bus

Page 16: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P1 Reads A

Processor P1 attempts to read A from its cache.

Processor P1 attempts to read A from its cache.

P1

CacheCache

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

SnooperSnooper

Main memoryMain memory

A = 1A = 1

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P1P1 PrRd APrRd A

P1P1 BusRd ABusRd A

MemMem returns datareturns data

16

Page 17: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P1 Reads A

Processor P1 issues a BusRd.

Processor P1 issues a BusRd.

P1

CacheCache

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

SnooperSnooper

Main memoryMain memory

A = 1A = 1

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P1P1 PrRd APrRd A

P1P1 BusRd ABusRd A

MemMem returns datareturns data

Page 18: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P1 Reads A

Main memory returns data to processor P1 which updates its cache.

Main memory returns data to processor P1 which updates its cache.

P1

CacheCache

A = 1A = 1 SS

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

SnooperSnooper

Main memoryMain memory

A = 1A = 1

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P1P1 PrRd APrRd A

P1P1 BusRd ABusRd A

MemMem returns datareturns data

Page 19: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P1 Reads A

Read operation completes.Read operation completes. P1

CacheCache

A = 1A = 1 SS

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

SnooperSnooper

Main memoryMain memory

A = 1A = 1

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

Page 20: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P1 Writes A = 2

Processor P1 writes to its cache.

Processor P1 writes to its cache.

P1

CacheCache

A = 2A = 2 SS

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

SnooperSnooper

Main memoryMain memory

A = 1A = 1

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P1P1 PrWr APrWr A

P1P1 BusRdXBusRdX

Page 21: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P1 Writes A = 2

Processor P1 issues a BusRd request.

Processor P1 issues a BusRd request.

P1

CacheCache

A = 2A = 2 MM

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

SnooperSnooper

Main memoryMain memory

A = 1A = 1

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P1P1 PrWr APrWr A

P1P1 BusRdXBusRdX

Page 22: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P1 Writes A = 2

Write operation completes.Write operation completes. P1

CacheCache

A = 2A = 2 MM

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

SnooperSnooper

Main memoryMain memory

A = 1A = 1

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

Page 23: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P3 Reads A

Processor P3 attempts to read A from its cache.

Processor P3 attempts to read A from its cache.

P1

CacheCache

A = 2A = 2 MM

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

SnooperSnooper

Main memoryMain memory

A = 1A = 1

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P3P3 PrRd APrRd A

P3P3 BusRd ABusRd A

P1P1 snoops BusRdsnoops BusRd

P1P1 FlushFlush

Page 24: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P3 Reads A

Processor P3 issues a BusRd.

Processor P3 issues a BusRd.

P1

CacheCache

A = 2A = 2 MM

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

SnooperSnooper

Main memoryMain memory

A = 1A = 1

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P3P3 PrRd APrRd A

P3P3 BusRd ABusRd A

P1P1 snoops BusRdsnoops BusRd

P1P1 FlushFlush

Page 25: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P3 Reads A

Processor P1 snoops the BusRd from processor P3.

Processor P1 snoops the BusRd from processor P3.

P1

CacheCache

A = 2A = 2 SS

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

SnooperSnooper

Main memoryMain memory

A = 1A = 1

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P3P3 PrRd APrRd A

P3P3 BusRd ABusRd A

P1P1 snoops BusRdsnoops BusRd

P1P1 FlushFlush

Page 26: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P3 Reads A

Processor P1 flushes, sending updated data to P3

and main memory.

Processor P1 flushes, sending updated data to P3

and main memory.

P1

CacheCache

A = 2A = 2 SS

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

A = 2A = 2 SS

SnooperSnooper

Main memoryMain memory

A = 2A = 2

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P3P3 PrRd APrRd A

P3P3 BusRd ABusRd A

P1P1 snoops BusRdsnoops BusRd

P1P1 FlushFlush

Page 27: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P3 Reads A

Read operation completes.Read operation completes. P1

CacheCache

A = 2A = 2 SS

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

A = 2A = 2 SS

SnooperSnooper

Main memoryMain memory

A = 2A = 2

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

Page 28: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P3 Writes A = 3

Processor P3 writes to its cache.

Processor P3 writes to its cache.

P1

CacheCache

A = 2A = 2 SS

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

A = 2A = 2 SS

SnooperSnooper

Main memoryMain memory

A = 2A = 2

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P3P3 PrWr APrWr A

P3P3 BusRdXBusRdX

P1P1 snoops BusRdXsnoops BusRdX

Page 29: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P3 Writes A = 3

Processor P3 issues a BusRd request.

Processor P3 issues a BusRd request.

P1

CacheCache

A = 2A = 2 SS

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

A = 2A = 2 SS

SnooperSnooper

Main memoryMain memory

A = 2A = 2

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P3P3 PrWr APrWr A

P3P3 BusRdXBusRdX

P1P1 snoops BusRdXsnoops BusRdX

Page 30: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P3 Writes A = 3

Processor P1 snoops the BusRd and invalidates its cache.

Processor P1 snoops the BusRd and invalidates its cache.

P1

CacheCache

A = 2A = 2 II

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

A = 3A = 3 MM

SnooperSnooper

Main memoryMain memory

A = 2A = 2

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P3P3 PrWr APrWr A

P3P3 BusRdXBusRdX

P1P1 snoops BusRdXsnoops BusRdX

Page 31: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P3 Writes A = 3

Write operation completes. Write operation completes. P1

CacheCache

A = 2A = 2 II

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

A = 3A = 3 MM

SnooperSnooper

Main memoryMain memory

A = 2A = 2

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

Page 32: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P1 Reads A

Processor P1 reads from its cache.

Processor P1 reads from its cache.

P1

CacheCache

A = 2A = 2 II

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

A = 3A = 3 MM

SnooperSnooper

Main memoryMain memory

A = 2A = 2

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P1P1 PrRd APrRd A

P1P1 BusRd ABusRd A

P3P3 snoops BusRdsnoops BusRd

P3P3 FlushFlush

Page 33: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P1 Reads A

Processor P1 issues a BusRd request.

Processor P1 issues a BusRd request.

P1

CacheCache

A = 2A = 2 II

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

A = 3A = 3 MM

SnooperSnooper

Main memoryMain memory

A = 2A = 2

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P1P1 PrRd APrRd A

P1P1 BusRd ABusRd A

P3P3 snoops BusRdsnoops BusRd

P3P3 FlushFlush

Page 34: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P1 Reads A

Processor P3 snoops the BusRd.

Processor P3 snoops the BusRd.

P1

CacheCache

A = 2A = 2 II

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

A = 3A = 3 MM

SnooperSnooper

Main memoryMain memory

A = 2A = 2

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P1P1 PrRd APrRd A

P1P1 BusRd ABusRd A

P3P3 snoops BusRdsnoops BusRd

P3P3 FlushFlush

Page 35: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P1 Reads A

Processor P3 flushes, updating processor P1, main memory and its own cache state.

Processor P3 flushes, updating processor P1, main memory and its own cache state.

P1

CacheCache

A = 3A = 3 SS

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

A = 3A = 3 SS

SnooperSnooper

Main memoryMain memory

A = 3A = 3

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P1P1 PrRd APrRd A

P1P1 BusRd ABusRd A

P3P3 snoops BusRdsnoops BusRd

P3P3 FlushFlush

Page 36: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P1 Reads A

Read operation completes.Read operation completes. P1

CacheCache

A = 3A = 3 SS

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

A = 3A = 3 SS

SnooperSnooper

Main memoryMain memory

A = 3A = 3

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

Page 37: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P3 Reads A

Processor P3 reads from its cache.

Processor P3 reads from its cache.

P1

CacheCache

A = 3A = 3 SS

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

A = 3A = 3 SS

SnooperSnooper

Main memoryMain memory

A = 3A = 3

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P3P3 PrRd APrRd A

P3P3 returns datareturns data

Page 38: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P3 Reads A

Processor P3 returns valid data from its cache.

Processor P3 returns valid data from its cache.

P1

CacheCache

A = 3A = 3 SS

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

A = 3A = 3 SS

SnooperSnooper

Main memoryMain memory

A = 3A = 3

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P3P3 PrRd APrRd A

P3P3 returns datareturns data

Page 39: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P3 Reads A

Read operation completes.Read operation completes. P1

CacheCache

A = 3A = 3 SS

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

A = 3A = 3 SS

SnooperSnooper

Main memoryMain memory

A = 3A = 3

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

Page 40: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P2 Reads A

Processor P2 reads from its cache.

Processor P2 reads from its cache.

P1

CacheCache

A = 3A = 3 SS

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

A = 3A = 3 SS

SnooperSnooper

Main memoryMain memory

A = 3A = 3

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P2P2 PrRd APrRd A

P2P2 BusRd ABusRd A

MemCntrMemCntr observes BusRdobserves BusRd

MemMem returns datareturns data

Page 41: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P2 Reads A

Processor P2 issues a BusRd request.

Processor P2 issues a BusRd request.

P1

CacheCache

A = 3A = 3 SS

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

A = 3A = 3 SS

SnooperSnooper

Main memoryMain memory

A = 3A = 3

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P2P2 PrRd APrRd A

P2P2 BusRd ABusRd A

MemCntrMemCntr observes BusRdobserves BusRd

MemMem returns datareturns data

Page 42: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P2 Reads A

Main memory controller observes the BusRd.Main memory controller observes the BusRd.

P1

CacheCache

A = 3A = 3 SS

SnooperSnooper

P2

CacheCache

SnooperSnooper

P3

CacheCache

A = 3A = 3 SS

SnooperSnooper

Main memoryMain memory

A = 3A = 3

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P2P2 PrRd APrRd A

P2P2 BusRd ABusRd A

MemCntrMemCntr observes BusRdobserves BusRd

MemMem returns datareturns data

Page 43: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P2 Reads A

Main memory returns valid data.Main memory returns valid data.

P1

CacheCache

A = 3A = 3 SS

SnooperSnooper

P2

CacheCache

A = 3A = 3 SS

SnooperSnooper

P3

CacheCache

A = 3A = 3 SS

SnooperSnooper

Main memoryMain memory

A = 3A = 3

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

P2P2 PrRd APrRd A

P2P2 BusRd ABusRd A

MemCntrMemCntr observes BusRdobserves BusRd

MemMem returns datareturns data

Page 44: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Processor P2 Reads A

Operation completes.Operation completes. P1

CacheCache

A = 3A = 3 SS

SnooperSnooper

P2

CacheCache

A = 3A = 3 SS

SnooperSnooper

P3

CacheCache

A = 3A = 3 SS

SnooperSnooper

Main memoryMain memory

A = 3A = 3

ControllerControllerTraceTrace

P1P1 Read ARead A

P1P1 Write A = 2Write A = 2

P3P3 Read ARead A

P3P3 Write A = 3Write A = 3

P1P1 Read ARead A

P3P3 Read ARead A

P2P2 Read ARead A

BusBus

Page 45: The Cache-Coherence Problem

CSC/ECE 506: Architecture of Parallel Computers

Example: Rd/Wr to a single line

45

Proc Action

State P1 State P2 State P3 Bus Action Data From

R1 S – – BusRd Mem

W1 M – – BusRdX* Mem

R3 S – S BusRd P1 cache

W3 I – M BusRdX* Mem

R1 S – S BusRd P3 cache

R3 S – S – Local Cache

R2 S S S BusRd Mem

*or, BusUpgr (data from own cache)