Computer Architecture
Chapter 8
Multiprocessors
Shared Memory Architectures
Prof. Jerry Breecher
CSCI 240
Fall 2003
Chap. 8 - Multiprocessors 2
Chapter Overview
We’re going to do only one section from this chapter: the part related to how caches from multiple processors interact with each other.
8.1 Introduction – the big picture
8.3 Centralized Shared Memory Architectures
Introduction
The Big Picture: Where are We Now?
The major issue is this:
We’ve taken copies of the contents of main memory and put them in caches closer to the processors. But what happens to those copies if someone else wants to use the main memory data?
How do we keep all copies of the data in synch with each other?
The Multiprocessor Picture
Example: Pentium System Organization
[Figure: Processor/Memory Bus, PCI Bus, I/O Buses]
Shared Memory Multiprocessor
[Figure: four processors, each with its own registers and caches, connected through a chipset to memory and to disk and other I/O]
• Memory: centralized with Uniform Memory Access time (“UMA”) and bus interconnect, I/O
• Examples: Sun Enterprise 6000, SGI Challenge, Intel SystemPro
Shared Memory Multiprocessor
• Several processors share one address space
  – conceptually a shared memory
  – often implemented just like a multicomputer
    • address space distributed over private memories
• Communication is implicit
  – read and write accesses to shared memory locations
• Synchronization
  – via shared memory locations
    • spin waiting for non-zero
  – barriers
[Figure: Conceptual Model, processors (P) connected to a memory (M) through a network/bus]
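As a rough illustration of implicit communication and spin waiting, the sketch below uses two Python threads in place of two processors. The names `shared`, `producer`, and `consumer` are made up for this example, and Python hides the memory-ordering details that real hardware would need to handle.

```python
import threading

# Shared memory locations visible to both threads (illustrative names).
shared = {"data": 0, "flag": 0}

def producer():
    shared["data"] = 42   # communication: an ordinary write to shared memory
    shared["flag"] = 1    # synchronization: signal that the data is ready

def consumer(result):
    while shared["flag"] == 0:     # spin waiting for non-zero
        pass
    result.append(shared["data"])  # ordinary read of the shared location

result = []
t_consumer = threading.Thread(target=consumer, args=(result,))
t_producer = threading.Thread(target=producer)
t_consumer.start()
t_producer.start()
t_consumer.join()
t_producer.join()
print(result[0])
```

No send or receive call appears anywhere: the only "messages" are loads and stores to the shared locations.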
Message Passing Multicomputers
• Computers (nodes) connected by a network
– Fast network interface
• Send, receive, barrier
– Nodes are no different from a regular PC or workstation
• Cluster conventional workstations or PCs with fast network
– cluster computing
– Berkeley NOW
– IBM SP2
[Figure: nodes, each a processor (P) with its own memory (M), connected by a network]
Large-Scale MP Designs
Memory: distributed with nonuniform memory access time (“NUMA”) and scalable interconnect (distributed memory)
[Figure: access latencies of 1 cycle, 40 cycles, and 100 cycles; labels: Low Latency, High Reliability]
Shared Memory Architectures
In this section we will understand the issues around:
• Sharing one memory space among several processors.
• Maintaining coherence among several copies of a data item.
The Problem of Cache Coherency
a) Cache and memory coherent: A’ = A, B’ = B.
   Cache: A’ = 100, B’ = 200. Memory: A = 100, B = 200.
b) Cache and memory incoherent: A’ ≠ A.
   Cache: A’ = 550, B’ = 200. Memory: A = 100, B = 200. I/O: output of A gives 100.
c) Cache and memory incoherent: B’ ≠ B.
   Cache: A’ = 100, B’ = 200. Memory: A = 100, B = 440. I/O: input 440 to B.
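The three cases can be replayed with a toy model. This is only a sketch: `WriteBackCache` is a hypothetical class written for this example, memory is a plain dictionary, and the I/O device is modeled as code that touches memory directly, bypassing the cache.

```python
class WriteBackCache:
    """Toy write-back cache: CPU writes stay in the cache until flushed."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # address -> cached value

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value             # memory is NOT updated here

# Case (a): cache and memory agree after the CPU reads A and B.
memory = {"A": 100, "B": 200}
cache = WriteBackCache(memory)
cache.read("A")
cache.read("B")
coherent = cache.lines == memory

# Case (b): the CPU writes 550 to A; an output device reading memory sees 100.
cache.write("A", 550)
io_output = memory["A"]

# Case (c): an input device stores 440 to B in memory; the CPU still reads 200.
memory_c = {"A": 100, "B": 200}
cache_c = WriteBackCache(memory_c)
cache_c.read("B")
memory_c["B"] = 440
cpu_sees = cache_c.read("B")

print(coherent, io_output, cpu_sees)
```

In both incoherent cases the stale value is returned without any error, which is why the hardware has to enforce coherence.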
Some Simple Definitions
Mechanism | How It Works | Performance | Coherency Issues
Write Back | Write modified data from cache to memory only when necessary. | Good, because it doesn’t tie up memory bandwidth. | Can have problems with various copies containing different values.
Write Through | Write modified data from cache to memory immediately. | Not so good - uses a lot of memory bandwidth. | Modified values always written to memory; data always matches.
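The bandwidth difference can be made concrete by counting memory writes. The sketch below assumes a single cached block that the CPU writes repeatedly and that is evicted once at the end; the function name `memory_writes` and the policy strings are invented for this example.

```python
def memory_writes(policy, cpu_writes):
    """Count writes that reach memory for repeated CPU writes to one cached block.

    policy is "write_through" or "write_back" (labels chosen for this sketch).
    The block is assumed to be evicted once after the last CPU write.
    """
    count = 0
    dirty = False
    for _ in range(cpu_writes):
        if policy == "write_through":
            count += 1        # every CPU write goes to memory immediately
        else:
            dirty = True      # write back: only mark the block as modified
    if policy == "write_back" and dirty:
        count += 1            # one write when the dirty block is evicted
    return count

through = memory_writes("write_through", 10)
back = memory_writes("write_back", 10)
print(through, back)
```

Under these assumptions write through sends ten writes to memory and write back sends one, at the cost of memory being stale until the eviction.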
What Does Coherency Mean?
• Informally:
– “Any read must return the most recent write”
– Too strict and too difficult to implement
• Better:
– “Any write must eventually be seen by a read”
– All writes are seen in proper order (“serialization”)
• Two rules to ensure this:
– “If P writes x and P1 reads it, P’s write will be seen by P1 if the read and write are sufficiently far apart”
– Writes to a single location are serialized: seen in one order
• Latest write will be seen
• Otherwise could see writes in illogical order (could see older value after a newer value)
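One way to picture the serialization rule is a small check over what a single processor observes. This is a sketch under a simplifying assumption: every write to the location stores a distinct value, so a value identifies its write. The function name `respects_write_order` is hypothetical.

```python
def respects_write_order(write_order, observed_reads):
    """Return True if the reads never show an older value after a newer one.

    write_order: values written to one location, in the serialized order.
    observed_reads: values one processor saw, in the order it read them.
    Assumes all written values are distinct.
    """
    position = {value: i for i, value in enumerate(write_order)}
    latest = -1
    for value in observed_reads:
        if position[value] < latest:
            return False      # saw an older write after a newer one
        latest = position[value]
    return True

writes_to_x = [1, 2, 3]
ok = respects_write_order(writes_to_x, [1, 1, 3])   # skipping a write is allowed
bad = respects_write_order(writes_to_x, [2, 1])     # going backwards is not
print(ok, bad)
```

Note that a reader may miss an intermediate write entirely; the rule only forbids seeing writes out of order.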
There are Different Types of Memory In The Cache
What kinds of memory are there in the cache?
Test_and_set(lock);   // acquire the interlock
shared_data = xyz;    // update the shared data
Clear(lock);          // release the interlock
Type | Shared? | Writable | How Kept Coherent
Code | Shared | No | No need.
Private Data | Exclusive | Yes | Write Back
Shared Data | Shared | Yes | Write Back *
Interlock Data | Shared | Yes | Write Through **
* Write Back gives good performance, but if you use write through here, there will be performance degradation.
** Write through here means the lock state is seen immediately. You want a write through here to flush the cache.
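The interlock pattern on this slide can be sketched as a spin lock. Real hardware provides test-and-set as one atomic instruction; in the sketch below a `threading.Lock` stands in for that atomicity, and the class name `TestAndSetLock` and the worker counts are assumptions made for illustration.

```python
import threading

class TestAndSetLock:
    """Spin lock built on an emulated atomic test-and-set."""

    def __init__(self):
        self._value = 0
        self._atomic = threading.Lock()  # stand-in for the hardware's atomic instruction

    def test_and_set(self):
        """Atomically set the lock word to 1 and return its previous value."""
        with self._atomic:
            old = self._value
            self._value = 1
            return old

    def clear(self):
        self._value = 0                  # release: the next test_and_set returns 0

lock = TestAndSetLock()
shared_data = 0

def worker(iterations):
    global shared_data
    for _ in range(iterations):
        while lock.test_and_set() == 1:  # spin until the old value was 0
            pass
        shared_data = shared_data + 1    # critical section
        lock.clear()

threads = [threading.Thread(target=worker, args=(200,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared_data)
```

Every waiter has to observe the cleared lock word promptly, which is the reason the table lists interlock data as write through.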
Potential HW Coherency Solutions
• Snooping Solution (Snoopy Bus):
– Send all requests for data to all processors
– Processors snoop to see if they have a copy and respond accordingly
– Requires broadcast, since caching information is at processors
– Works well with bus (natural broadcast medium)
– Dominates for small scale machines (most of the market)
• Directory-Based Schemes
– Keep track of what is being shared in one centralized place
– Distributed memory => distributed directory for scalability (avoids bottlenecks)
– Send point-to-point requests to processors via network
– Scales better than Snooping
– Actually existed BEFORE Snooping-based schemes
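The difference in who gets contacted on a write can be sketched as follows. This is a deliberately simplified model: the `Directory` class is a single centralized table invented for this example (a scalable design would distribute it), and `snoop_targets` simply lists every other processor on the bus.

```python
class Directory:
    """Toy directory: records which processors hold a copy of each block."""

    def __init__(self):
        self.sharers = {}  # block -> set of processor ids

    def record_read(self, block, proc):
        self.sharers.setdefault(block, set()).add(proc)

    def invalidate_for_write(self, block, writer):
        """Return the processors that need a point-to-point invalidate."""
        targets = self.sharers.get(block, set()) - {writer}
        self.sharers[block] = {writer}   # the writer is now the only holder
        return sorted(targets)

def snoop_targets(num_procs, writer):
    """With snooping, every other processor sees the broadcast request."""
    return [p for p in range(num_procs) if p != writer]

directory = Directory()
directory.record_read("X", 0)
directory.record_read("X", 3)
directory_msgs = directory.invalidate_for_write("X", writer=0)
snoop_msgs = snoop_targets(num_procs=8, writer=0)
print(directory_msgs, snoop_msgs)
```

With eight processors and two sharers, the directory contacts one processor while the broadcast reaches seven, which hints at why directories scale better.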
An Example Snoopy Protocol
Maintained by Hardware
Invalidation protocol, write-back cache
Each block of memory is in one state:
Clean in all caches and up-to-date in memory (Shared)
OR Dirty in exactly one cache (Exclusive)
OR Not in any caches
Each cache block is in one state (track these):
Shared : block can be read
OR Exclusive : cache has the only copy; it’s writable and dirty
OR Invalid : block contains no data
Read misses: cause all caches to snoop bus
Writes to clean line are treated as misses
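Snoopy-Cache State Machine-I
• State machine for CPU requests for each cache block
• Applies to write-back data
Cache block states: Invalid, Shared (read only), Exclusive (read/write)
Transitions (from -> to: CPU request; bus action):
Invalid -> Shared: CPU read; place read miss on bus
Invalid -> Exclusive: CPU write; place write miss on bus
Shared -> Shared: CPU read hit
Shared -> Shared: CPU read miss; place read miss on bus
Shared -> Exclusive: CPU write; place write miss on bus
Exclusive -> Exclusive: CPU read hit, CPU write hit
Exclusive -> Shared: CPU read miss; write back block, place read miss on bus
Exclusive -> Exclusive: CPU write miss; write back cache block, place write miss on bus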
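The CPU-side state machine can be written down as a lookup table. This is a sketch of the transitions named on the slide, not a hardware description; the table name `CPU_TRANSITIONS`, the event strings, and the function `cpu_request` are labels chosen for this example.

```python
# (current state, CPU event) -> (next state, actions placed on the bus)
CPU_TRANSITIONS = {
    ("Invalid", "read"): ("Shared", ["read miss"]),
    ("Invalid", "write"): ("Exclusive", ["write miss"]),
    ("Shared", "read hit"): ("Shared", []),
    ("Shared", "read miss"): ("Shared", ["read miss"]),
    ("Shared", "write"): ("Exclusive", ["write miss"]),
    ("Exclusive", "read hit"): ("Exclusive", []),
    ("Exclusive", "write hit"): ("Exclusive", []),
    ("Exclusive", "read miss"): ("Shared", ["write back", "read miss"]),
    ("Exclusive", "write miss"): ("Exclusive", ["write back", "write miss"]),
}

def cpu_request(state, event):
    """Return (next_state, bus_actions) for a CPU request on one cache block."""
    return CPU_TRANSITIONS[(state, event)]

state = "Invalid"
state, actions_write = cpu_request(state, "write")     # first write to the block
state, actions_hit = cpu_request(state, "read hit")    # later read of the same block
print(state, actions_write, actions_hit)
```

A write to a Shared block is treated like a miss: it still places a write miss on the bus so that other copies get invalidated.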
Snoopy-Cache State Machine-II
• State machine for bus requests for each cache block
• Appendix E gives details of bus requests
Cache block states: Invalid, Shared (read only), Exclusive (read/write)
Transitions (from -> to: bus request; action):
Shared -> Invalid: write miss for this block
Exclusive -> Invalid: write miss for this block; write back block (abort memory access)
Exclusive -> Shared: read miss for this block; write back block (abort memory access)
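The snooping side is small enough to write as a function. This is a sketch of the three transitions listed for bus requests; the name `bus_request` and the event strings are assumptions for this example, and any combination not listed is taken to leave the block unchanged.

```python
def bus_request(state, event):
    """React to a miss for this block seen on the bus.

    event is "read miss" or "write miss", placed on the bus by another processor.
    Returns (next_state, actions taken by this cache).
    """
    if state == "Shared" and event == "write miss":
        return "Invalid", []                 # another processor wants to write
    if state == "Exclusive" and event == "write miss":
        return "Invalid", ["write back"]     # supply the dirty block, then drop it
    if state == "Exclusive" and event == "read miss":
        return "Shared", ["write back"]      # supply the dirty block, keep a clean copy
    return state, []                         # e.g. Shared seeing a read miss

after_remote_read = bus_request("Exclusive", "read miss")
after_remote_write = bus_request("Shared", "write miss")
print(after_remote_read, after_remote_write)
```

Only an Exclusive block ever triggers a write back here, because it is the only state in which the cache holds data newer than memory.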
Example
Assumes initial cache state is invalid and A1 and A2 map to same cache block, but A1 ≠ A2.
step | P1: State Addr Value | P2: State Addr Value | Bus: Action Proc. Addr Value | Memory: Addr Value
P1: Write 10 to A1 | | | |
P1: Read A1 | | | |
P2: Read A1 | | | |
P2: Write 20 to A1 | | | |
P2: Write 40 to A2 | | | |
Bus actions: WrMs = write miss, RdMs = read miss, WrBk = write back, RdDa = read data.
[Figure: combined state diagram. This is the Cache for P1.]
Invalid -> Shared: read miss on bus
Invalid -> Exclusive: write miss on bus
Shared -> Shared: CPU read hit
Shared -> Exclusive: CPU write; place write miss on bus
Shared -> Invalid: remote write or miss
Exclusive -> Exclusive: CPU read hit, CPU write hit
Exclusive -> Shared: remote read; write back
Exclusive -> Invalid: remote write or miss; write back
Example: Step 1
step | P1: State Addr Value | P2: State Addr Value | Bus: Action Proc. Addr Value | Memory: Addr Value
P1: Write 10 to A1 | Excl. A1 10 | | WrMs P1 A1 |
P1: Read A1 | | | |
P2: Read A1 | | | |
P2: Write 20 to A1 | | | |
P2: Write 40 to A2 | | | |
[Figure: P1's block goes from Invalid to Exclusive, placing a write miss on the bus]
Example: Step 2
step | P1: State Addr Value | P2: State Addr Value | Bus: Action Proc. Addr Value | Memory: Addr Value
P1: Write 10 to A1 | Excl. A1 10 | | WrMs P1 A1 |
P1: Read A1 | Excl. A1 10 | | |
P2: Read A1 | | | |
P2: Write 20 to A1 | | | |
P2: Write 40 to A2 | | | |
[Figure: P1's block stays in Exclusive on a CPU read hit]
Example: Step 3
step | P1: State Addr Value | P2: State Addr Value | Bus: Action Proc. Addr Value | Memory: Addr Value
P1: Write 10 to A1 | Excl. A1 10 | | WrMs P1 A1 |
P1: Read A1 | Excl. A1 10 | | |
P2: Read A1 | | Shar. A1 | RdMs P2 A1 |
(cont.) | Shar. A1 10 | | WrBk P1 A1 10 | A1 10
(cont.) | | Shar. A1 10 | RdDa P2 A1 10 | A1 10
P2: Write 20 to A1 | | | | A1 10
P2: Write 40 to A2 | | | | A1 10
[Figure: P2 places a read miss on the bus; P1's block goes from Exclusive to Shared on the remote read and writes back A1]
Example: Step 4
step | P1: State Addr Value | P2: State Addr Value | Bus: Action Proc. Addr Value | Memory: Addr Value
P1: Write 10 to A1 | Excl. A1 10 | | WrMs P1 A1 |
P1: Read A1 | Excl. A1 10 | | |
P2: Read A1 | | Shar. A1 | RdMs P2 A1 |
(cont.) | Shar. A1 10 | | WrBk P1 A1 10 | A1 10
(cont.) | | Shar. A1 10 | RdDa P2 A1 10 | A1 10
P2: Write 20 to A1 | Inv. | Excl. A1 20 | WrMs P2 A1 | A1 10
P2: Write 40 to A2 | | | | A1 10
[Figure: P1's block goes from Shared to Invalid on the remote write]
Example: Step 5
step | P1: State Addr Value | P2: State Addr Value | Bus: Action Proc. Addr Value | Memory: Addr Value
P1: Write 10 to A1 | Excl. A1 10 | | WrMs P1 A1 |
P1: Read A1 | Excl. A1 10 | | |
P2: Read A1 | | Shar. A1 | RdMs P2 A1 |
(cont.) | Shar. A1 10 | | WrBk P1 A1 10 | A1 10
(cont.) | | Shar. A1 10 | RdDa P2 A1 10 | A1 10
P2: Write 20 to A1 | Inv. | Excl. A1 20 | WrMs P2 A1 | A1 10
P2: Write 40 to A2 | | | WrMs P2 A2 | A1 10
(cont.) | | Excl. A2 40 | WrBk P2 A1 20 | A1 20
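The five steps can be replayed with a small simulation. This is a sketch under simplifying assumptions: each cache holds exactly one block (so A1 and A2 conflict), a block is a single value, and the classes `Cache` and `System` and the log format are invented for this example rather than taken from the slides.

```python
class Cache:
    def __init__(self, name):
        self.name = name
        self.state, self.addr, self.value = "Invalid", None, None

class System:
    """Two or more one-block write-back caches snooping a shared bus."""

    def __init__(self, names):
        self.caches = {n: Cache(n) for n in names}
        self.memory = {}
        self.bus_log = []  # entries: (action, processor, address, value)

    def _write_back(self, cache):
        self.memory[cache.addr] = cache.value
        self.bus_log.append(("WrBk", cache.name, cache.addr, cache.value))

    def _miss(self, cache, action, addr):
        """Place a miss on the bus, evict our own dirty block, let others snoop."""
        self.bus_log.append((action, cache.name, addr, None))
        if cache.state == "Exclusive" and cache.addr != addr:
            self._write_back(cache)              # replaced block was dirty
        for other in self.caches.values():
            if other is cache or other.addr != addr:
                continue
            if other.state == "Exclusive":
                self._write_back(other)          # remote copy was dirty
                other.state = "Shared" if action == "RdMs" else "Invalid"
            elif other.state == "Shared" and action == "WrMs":
                other.state = "Invalid"

    def read(self, name, addr):
        cache = self.caches[name]
        if cache.state != "Invalid" and cache.addr == addr:
            return cache.value                   # read hit
        self._miss(cache, "RdMs", addr)
        cache.state, cache.addr, cache.value = "Shared", addr, self.memory.get(addr)
        self.bus_log.append(("RdDa", name, addr, cache.value))
        return cache.value

    def write(self, name, addr, value):
        cache = self.caches[name]
        if not (cache.state == "Exclusive" and cache.addr == addr):
            self._miss(cache, "WrMs", addr)      # writes to clean lines count as misses
        cache.state, cache.addr, cache.value = "Exclusive", addr, value

system = System(["P1", "P2"])
system.write("P1", "A1", 10)
system.read("P1", "A1")
system.read("P2", "A1")
system.write("P2", "A1", 20)
system.write("P2", "A2", 40)
for entry in system.bus_log:
    print(entry)
```

The printed bus log lists the same actions, in the same order, as the Bus column of the Step 5 table, and memory ends with A1 = 20.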