Multiprocessors—Cache Coherency, Snooping Protocol

Page 1: Multiprocessors—Cache Coherency, Snooping Protocol

Multiprocessors—Cache Coherency, Snooping Protocol

Page 2: Multiprocessors—Cache Coherency, Snooping Protocol

Small-Scale—Shared Memory

• Caches serve to:

– Increase bandwidth versus bus/memory

– Reduce latency of access

– Valuable for both private data and shared data

• What about cache consistency?

Page 3: Multiprocessors—Cache Coherency, Snooping Protocol

The Problem of Cache Coherency

• Value of X in memory is 1

• CPU A reads X – its cache now contains 1

• CPU B reads X – its cache now contains 1

• CPU A stores 0 into X:

– CPU A’s cache contains a 0

– CPU B’s cache contains a 1, which is now stale
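
The sequence above can be reproduced with a toy model. This is only a sketch: the `PrivateCache` class is hypothetical, and it assumes write-through caches with no coherence mechanism at all (the slide does not say which write policy is in use).

```python
class PrivateCache:
    """Hypothetical per-CPU cache with NO coherence mechanism."""

    def __init__(self, memory):
        self.memory = memory  # shared main memory (a dict)
        self.lines = {}       # this CPU's private copies

    def read(self, addr):
        # On a miss, fetch from memory; on a hit, use the local copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Assumed write-through: update the local copy and memory,
        # but nobody tells the other caches.
        self.lines[addr] = value
        self.memory[addr] = value


memory = {"X": 1}
cache_a = PrivateCache(memory)
cache_b = PrivateCache(memory)

cache_a.read("X")          # CPU A reads X: its cache holds 1
cache_b.read("X")          # CPU B reads X: its cache holds 1
cache_a.write("X", 0)      # CPU A stores 0 into X
stale = cache_b.read("X")  # CPU B still hits on its old copy

print(cache_a.read("X"), stale, memory["X"])
```

Even though memory is updated in this model, CPU B keeps reading its old copy, which is the inconsistency a coherence protocol has to remove.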

Page 4: Multiprocessors—Cache Coherency, Snooping Protocol

What Does Coherency Mean?

• Informally:

– Any read must return the most recent write

– Too strict and very difficult to implement

• Better:

– Any write must eventually be seen by a read

– All writes are seen in order (“serialization”)

• Two rules to ensure this:

– If P writes x and P1 reads it, P’s write will be seen if the read and write are sufficiently far apart

– Writes to a single location are serialized: seen in one order

» Latest write will be seen

» Otherwise could see writes in illogical order (could see older value after a newer value)

Page 5: Multiprocessors—Cache Coherency, Snooping Protocol

Potential Solutions

• Snooping Solution (Snoopy Bus):

– Send all requests for data to all processors

– Processors snoop to see if they have a copy and respond accordingly

– Requires broadcast, since caching information is at processors

– Works well with bus (natural broadcast medium)

• Directory-Based Schemes:

– Keep track of what is being shared in one centralized place

– Distributed memory => distributed directory

– Send point-to-point requests to processors

– Scales better than snooping
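
To make the scaling difference concrete, here is a minimal sketch that compares who has to be contacted on a write under each scheme. The `Directory` class and `snoop_targets` function are hypothetical names for illustration, and the model ignores memory, data transfer, and distributed directories.

```python
class Directory:
    """Hypothetical centralized directory: block -> set of sharing CPUs."""

    def __init__(self):
        self.sharers = {}

    def read(self, block, cpu):
        # Record that this CPU now holds a copy of the block.
        self.sharers.setdefault(block, set()).add(cpu)

    def write(self, block, cpu):
        # Only the CPUs that actually hold a copy need a point-to-point
        # invalidate; afterwards the writer is the sole holder.
        targets = self.sharers.get(block, set()) - {cpu}
        self.sharers[block] = {cpu}
        return targets


def snoop_targets(num_cpus, cpu):
    # Snooping broadcasts: every other CPU sees the request.
    return set(range(num_cpus)) - {cpu}


directory = Directory()
directory.read("X", 0)
directory.read("X", 3)

dir_msgs = directory.write("X", 0)  # CPU 0 writes X
snoop_msgs = snoop_targets(8, 0)    # same write on an 8-CPU snoopy bus

print(sorted(dir_msgs), len(snoop_msgs))
```

With eight processors and two sharers, the directory contacts one processor while the broadcast reaches seven; the gap widens as the processor count grows.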

Page 6: Multiprocessors—Cache Coherency, Snooping Protocol

Basic Snoopy Protocols

• Write Invalidate Protocol:

– Multiple readers, single writer

– Write to shared data: an invalidate is sent to all caches, which snoop and invalidate any copies

– Read miss:

» Write-through: memory is always up-to-date

» Write-back: snoop in caches to find most recent copy

• Write Broadcast Protocol:

– Write to shared data: broadcast on bus, processors snoop, and update copies

– Read miss: memory is always up-to-date

• Write serialization: bus serializes requests

– Bus is single point of arbitration

Page 7: Multiprocessors—Cache Coherency, Snooping Protocol

Write Invalidate

• Contents of memory location X = 0.

• CPU A reads X – its cache contains 0.

• CPU B reads X – its cache contains 0.

• CPU A writes a 1 to X – CPU B’s cache contents are invalidated, memory contains a stale value.

• CPU B reads X – CPU A responds with the value 1 and the memory request is aborted; B’s cache and memory are both updated to 1.
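
The trace above can be replayed with a small simulation. This is a sketch under stated assumptions: `SnoopyBus` and `InvalidateCache` are hypothetical names, caches are write-back, and each block holds a single word, so a writer never needs to fetch the rest of a block.

```python
class SnoopyBus:
    """Hypothetical shared bus connecting all caches to one memory."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []


class InvalidateCache:
    """Write-back cache with a write-invalidate policy (one word per block)."""

    def __init__(self, bus):
        self.bus = bus
        self.data = {}      # addr -> cached value
        self.dirty = set()  # addrs whose cached value is newer than memory
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.data:  # read miss goes on the bus
            for other in self.bus.caches:
                if other is not self and addr in other.dirty:
                    # The owner supplies the value instead of memory,
                    # and memory is updated at the same time.
                    self.bus.memory[addr] = other.data[addr]
                    other.dirty.discard(addr)
            self.data[addr] = self.bus.memory[addr]
        return self.data[addr]

    def write(self, addr, value):
        # Invalidate every other copy, then write locally.
        for other in self.bus.caches:
            if other is not self:
                other.data.pop(addr, None)
                other.dirty.discard(addr)
        self.data[addr] = value
        self.dirty.add(addr)  # memory is now stale


bus = SnoopyBus({"X": 0})
cpu_a, cpu_b = InvalidateCache(bus), InvalidateCache(bus)

cpu_a.read("X")                       # A's cache contains 0
cpu_b.read("X")                       # B's cache contains 0
cpu_a.write("X", 1)                   # B's copy is invalidated
b_invalidated = "X" not in cpu_b.data
memory_after_write = bus.memory["X"]  # still the stale value
b_value = cpu_b.read("X")             # A supplies 1; memory is updated

print(b_invalidated, memory_after_write, b_value, bus.memory["X"])
```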

Page 8: Multiprocessors—Cache Coherency, Snooping Protocol

Write Broadcast

• Contents of memory location X = 0.

• CPU A reads X – its cache contains 0.

• CPU B reads X – its cache contains 0.

• CPU A writes a 1 to X – Write broadcast of X updates CPU B’s cache contents and memory contents to 1.

• CPU B reads X – value is available in cache.
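
For comparison, the same trace under write broadcast might be sketched as follows. The `UpdateCache` class is hypothetical, and the `misses` counter is added only to show that CPU B's final read is a hit.

```python
class UpdateCache:
    """Hypothetical cache using a write-broadcast (update) policy."""

    def __init__(self, memory, caches):
        self.memory = memory
        self.caches = caches  # shared list standing in for the bus
        self.data = {}
        self.misses = 0
        caches.append(self)

    def read(self, addr):
        if addr not in self.data:
            self.misses += 1
            # Memory is always up to date under this policy.
            self.data[addr] = self.memory[addr]
        return self.data[addr]

    def write(self, addr, value):
        self.data[addr] = value
        self.memory[addr] = value
        # Broadcast: every cache holding a copy updates it in place.
        for other in self.caches:
            if other is not self and addr in other.data:
                other.data[addr] = value


memory = {"X": 0}
caches = []
cpu_a = UpdateCache(memory, caches)
cpu_b = UpdateCache(memory, caches)

cpu_a.read("X")            # A's cache contains 0
cpu_b.read("X")            # B's cache contains 0
cpu_a.write("X", 1)        # broadcast updates B's copy and memory
b_value = cpu_b.read("X")  # hit: value is already in B's cache

print(b_value, memory["X"], cpu_b.misses)
```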

Page 9: Multiprocessors—Cache Coherency, Snooping Protocol

Basic Snoopy Protocols

• Write Invalidate versus Broadcast:

– Invalidate requires one transaction per write-run

– Invalidate uses spatial locality: one transaction per block

– Broadcast has lower latency between write and read

– Broadcast: tradeoff of increased bandwidth (BW) use for decreased latency
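
The first two points can be illustrated by counting bus transactions for a write-run. The sketch below is a deliberately simplified model: both function names are hypothetical, every written block is assumed to start out shared by another cache, and no other processor touches the data during the run.

```python
def invalidate_transactions(write_addrs, words_per_block):
    # One invalidate per distinct block: after the first write the block
    # is exclusive, so later writes to it stay off the bus.
    return len({addr // words_per_block for addr in write_addrs})


def broadcast_transactions(write_addrs):
    # Every write to shared data is broadcast on the bus.
    return len(write_addrs)


# One CPU writes word 0 three times, then words 1, 2 and 3,
# all of which fall in the same 4-word block.
run = [0, 0, 0, 1, 2, 3]
inv = invalidate_transactions(run, words_per_block=4)
bcast = broadcast_transactions(run)

print(inv, bcast)
```

Under these assumptions the write-run costs one transaction with invalidation and six with broadcast. The price invalidation pays is the read miss another processor takes when it next needs the data.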

Page 10: Multiprocessors—Cache Coherency, Snooping Protocol

An Example Snoopy Protocol

• Invalidation protocol, write-back cache.

• Each cache block is in one state:

– Shared: block can be read

– OR Exclusive: cache has the only copy; it is writable and dirty

– OR Invalid: block contains no data

• A read miss to a block in the exclusive state changes its state in the owning cache to shared.

• A transition to the exclusive state requires a write miss to be placed on the bus.

• A write hit to a shared block is treated as a write miss, for simplicity.

• A block in the shared state is always up to date in memory.
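
The rules above can be written down as a transition table for a single cache block. This is a sketch, assuming the common three-state write-back invalidate protocol; the event names and action strings are hypothetical labels, and "miss" events on a Shared or Exclusive block mean the CPU is replacing it with a different block that maps to the same cache location.

```python
INVALID, SHARED, EXCLUSIVE = "Invalid", "Shared", "Exclusive"

# (current state, event) -> (next state, actions taken by this cache)
TRANSITIONS = {
    # Requests from this cache's own CPU
    (INVALID, "cpu_read_miss"): (SHARED, ["place read miss on bus"]),
    (INVALID, "cpu_write_miss"): (EXCLUSIVE, ["place write miss on bus"]),
    (SHARED, "cpu_read_hit"): (SHARED, []),
    (SHARED, "cpu_read_miss"): (SHARED, ["place read miss on bus"]),
    # A write hit to a shared block is treated as a write miss.
    (SHARED, "cpu_write_hit"): (EXCLUSIVE, ["place write miss on bus"]),
    (SHARED, "cpu_write_miss"): (EXCLUSIVE, ["place write miss on bus"]),
    (EXCLUSIVE, "cpu_read_hit"): (EXCLUSIVE, []),
    (EXCLUSIVE, "cpu_write_hit"): (EXCLUSIVE, []),
    (EXCLUSIVE, "cpu_read_miss"): (
        SHARED, ["write back block", "place read miss on bus"]),
    (EXCLUSIVE, "cpu_write_miss"): (
        EXCLUSIVE, ["write back block", "place write miss on bus"]),
    # Requests snooped from the bus for this block
    (INVALID, "bus_read_miss"): (INVALID, []),
    (INVALID, "bus_write_miss"): (INVALID, []),
    (SHARED, "bus_read_miss"): (SHARED, []),
    (SHARED, "bus_write_miss"): (INVALID, []),
    (EXCLUSIVE, "bus_read_miss"): (SHARED, ["write back block"]),
    (EXCLUSIVE, "bus_write_miss"): (INVALID, ["write back block"]),
}


def step(state, event):
    """Return (next_state, actions); raises KeyError for impossible pairs."""
    return TRANSITIONS[(state, event)]


# One block in one cache: the CPU reads it, writes it, then another
# CPU's read miss for the same block appears on the bus.
state = INVALID
trace = []
for event in ["cpu_read_miss", "cpu_write_hit", "bus_read_miss"]:
    state, actions = step(state, event)
    trace.append(state)

print(trace)
```

Keeping the CPU-side and bus-side events in one table makes it easy to check that every path into Exclusive places a write miss on the bus, as the bullet list requires.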

Page 11: Multiprocessors—Cache Coherency, Snooping Protocol

Snoopy-Cache State Machine-I

• State machine for CPU requests

• Cache Block State

Page 12: Multiprocessors—Cache Coherency, Snooping Protocol

Snoopy-Cache State Machine-II

• State machine for bus requests

Page 13: Multiprocessors—Cache Coherency, Snooping Protocol
Page 14: Multiprocessors—Cache Coherency, Snooping Protocol

Example

Assumes A1 and A2 map to same cache block
