CSCI 232 © 2005 JW Ryder 1
Cache Memory Systems
• Introduced by M.V. Wilkes (“Slave Store”)
• First appeared commercially in the IBM S360/85
Motivations
• Main memory access time is 5 to 25 times slower than accessing a register
– On-chip vs. off-chip issues, among others
• Can’t have too many registers in the CPU
• Program locality should allow small fast buffer between the CPU and MM
• Should be managed by hardware to be effective
Motivations Continued
• Most of the time, the requested MM data must be found in the cache for the cache to be worthwhile
• That can only happen if dynamic locality is tracked well
• Automatic management, transparent to Instruction Set Architecture (ISA)
Access and Cost
• Tcache < TMM
• Treg < Tcache
• Creg > Ccache > CMM (cost per bit; reflects chip real estate)
Cache vs. Registers
• Cache
– Locality: tracked dynamically
– Management: hardware
– Expandability: easy
– ISA Visibility: invisible (mostly)
• Registers
– Locality: static, by compiler
– Management: software/programmer
– Expandability: not possible
– ISA Visibility: visible
[Figure: Simple Cache Based System — CPU and Registers connected to Cache and MM; data paths numbered 1–5 correspond to the read-operation steps]
Read Operation
• See if the desired MM word is in the cache (1)
• If it is (cache hit), get it from the cache (2)
• If it isn't (cache miss), get it from MM, supplying it simultaneously to the CPU and the cache (3)
– Make room in the cache by selecting a victim, which may have to be written back to MM (4); the copy is then installed (5)
• The CPU stalls until the missing word is supplied
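The numbered steps can be sketched as a toy simulation. This is a hypothetical direct-mapped cache (all names are illustrative, not from the slides); write-back of a dirty victim (step 4) is omitted since the sketch is read-only.

```python
# Toy direct-mapped cache modeling the read steps: probe (1), hit (2),
# miss fetch from MM (3), victim eviction (4; dirty write-back omitted
# in this read-only sketch), install (5).

class ToyCache:
    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = {}                        # index -> (tag, data)

    def read(self, addr, mm):
        index, tag = addr % self.num_lines, addr // self.num_lines
        entry = self.lines.get(index)
        if entry and entry[0] == tag:          # (1) probe, (2) hit
            return entry[1], "hit"
        data = mm[addr]                        # (3) fetch from MM on a miss
        self.lines[index] = (tag, data)        # (4) evict old entry, (5) install
        return data, "miss"

mm = {a: a * 10 for a in range(32)}            # toy main memory
cache = ToyCache(4)
print(cache.read(5, mm))   # (50, 'miss')
print(cache.read(5, mm))   # (50, 'hit')
```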
Locality of Reference
• Temporal
– If this word is needed now, there is a good chance it will be needed again
• Spatial
– A fetch from MM actually brings in a chunk of words
– Some word near the requested word will probably also be needed
• Registers exploit temporal locality (TLOR)
• Caches exploit both temporal (TLOR) and spatial (SLOR) locality
Selecting a Victim
• The victim should be a block that will not be accessed in the near future
• Maintain a history of usage to guide the choice
• The basic unit of transfer between cache and MM is a block (line) of 2^b words
– b is small (2 to 4)
• On a miss, the block containing the missing word is loaded into the cache (by the cache controller)
• This ensures neighboring words are also cached (SLOR)
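One common way to maintain the usage history is least-recently-used (LRU) order. A minimal sketch for a fully associative cache (the slides don't name a specific policy; LRU and the names below are illustrative):

```python
from collections import OrderedDict

# LRU victim selection: the OrderedDict's ordering *is* the usage
# history; the front entry is the least recently used block.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()            # block_addr -> data

    def access(self, block_addr, fetch):
        if block_addr in self.blocks:
            self.blocks.move_to_end(block_addr)  # mark most recently used
            return self.blocks[block_addr], None
        victim = None
        if len(self.blocks) >= self.capacity:
            victim, _ = self.blocks.popitem(last=False)  # evict LRU block
        self.blocks[block_addr] = fetch(block_addr)
        return self.blocks[block_addr], victim

cache = LRUCache(2)
fetch = lambda b: b * 100                      # stand-in for an MM block read
cache.access(1, fetch)
cache.access(2, fetch)
cache.access(1, fetch)                         # touch 1, so 2 becomes LRU
_, victim = cache.access(3, fetch)
print(victim)   # 2
```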
Addressing Cache
• Addressed the same as memory
• The cache stores entries in the form
– <block address, contents of words in block, usage info>
• The cache controller compares the address issued by the CPU with the address field of the cache entries to determine a hit or miss
• Transfers between cache and CPU are only a word or two
• Transfers between cache and MM are in blocks
• Hit: data comes back from the cache in 1 clock cycle
• Miss: 15 to 20 cycles
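The comparison against the stored block-address field relies on splitting each CPU address into a block address and a word offset. A minimal sketch (choosing b = 2, i.e. 4-word blocks, as an illustrative value from the "b is small" range):

```python
B = 2                                   # b: log2(words per block); 2**B = 4 words

def split_address(addr, b=B):
    block_addr = addr >> b              # compared against stored address fields
    offset = addr & ((1 << b) - 1)      # selects the word within the block
    return block_addr, offset

print(split_address(13))   # (3, 1): word 13 is word 1 of block 3
```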
Functions of Cache Controller
• Given an address issued by the CPU, the CC should be able to determine whether the block containing that word is in the cache
– Requires associative logic / comparators
• The CC needs to keep track of the usage of blocks in the cache
• Hardware logic for victim selection
• May need to write back a line (victim) from the cache to MM
• Must implement a placement policy that determines how blocks from MM are placed in the cache
• A replacement policy is needed only if there is a choice of victim
Cache Loading Strategies
• Load a block into the cache from MM only on a miss
• Prefetch (anticipating a miss) a block into the cache
– Prefetch on Miss: on a miss to block i, prefetch block i + 1 too
– Always Prefetch: prefetch block i + 1 on the first reference to block i
– Tagged Prefetch: prefetch on a miss, and also prefetch block i + 1 when a previously prefetched block is referenced for the first time
– Keep prefetching as long as the last prefetch was useful
– Tags distinguish not-yet-accessed blocks from the others
More Strategies
• The previous prefetches fetch 1 block; prefetches can be > 1 block
• Selective Fetch
– Don't fetch shared writeable blocks
– Used in many systems to avoid cache incoherence (multiprocessors)
Load-Thru / Read-Thru
• Missing word forwarded to CPU and cache concurrently
• Remaining words of block are then fetched in wraparound fashion
[Figure: word positions 0, 1, 2, …, 2^k within a block, with a pointer at the missed word w]
• This is the order of loading for the remaining words in the block
• Wrapping around saves resetting the pointer
• The write pointer is already positioned
• Not needed if the block can be loaded in one shot
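The wraparound fill order can be stated in one line. A small sketch (the function name is illustrative):

```python
def wraparound_order(block_size, missed_offset):
    """Fill order: the missed word first, then wrap around the block."""
    return [(missed_offset + i) % block_size for i in range(block_size)]

# Word 5 missed in an 8-word block: forward it first, then wrap.
print(wraparound_order(8, 5))   # [5, 6, 7, 0, 1, 2, 3, 4]
```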
Cache with Writeback Buffers
[Figure: CPU connected to Cache (reads R and writes W); Cache connected to MM through writeback buffers (writes W), with paths labeled for Write-Thru caches, Write-Back caches, and a Special case]
• Writeback buffer = fast registers
• Special: used with both types of caches; arises when a word has been written to the writeback buffer and a cache miss then occurs
• Three speeds are involved: cache speed, buffer speed, memory speed
Write-Thru Caches
• A write generated by the CPU writes into the cache and also deposits the write into the writeback buffer
– Eventually written back to MM
• Delay perceived by CPU: max(Tcache, TWB)
• Tcache: cache access time
• TWB: time to write into the writeback buffer
• Tcache, TWB < TMM
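The max(Tcache, TWB) delay falls out of the two updates proceeding in parallel. A tiny sketch (the cycle counts are illustrative assumptions, not from the slides):

```python
T_CACHE, T_WB, T_MM = 1, 2, 20     # illustrative cycle counts (assumed)

def write_thru_delay(t_cache=T_CACHE, t_wb=T_WB):
    # The cache update and the writeback-buffer deposit happen in
    # parallel; the CPU waits for the slower of the two, never for MM.
    return max(t_cache, t_wb)

print(write_thru_delay())   # 2, far less than T_MM
```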
Writeback Cache
• Write to the cache
• Write modified victims to MM via the writeback buffer
• Delay perceived by CPU = Tcache
• The Special case happens on a miss, whether read or write
Cache Update Policies
• Keeps the MM copy and the cache copy of a word (and hence its block) consistent
• Write-Thru (Store-Thru)
– On a write hit, the copies in MM and cache are both updated simultaneously
– No need to write back blocks selected as victims
– Useful for multiprocessing systems (MM always has the latest copy)
– If the cache fails, the MM copy can serve as a hot backup
– Can slow the CPU on writes (since MM updates take place at slower rates)
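A write-thru hit in miniature, showing why victims never need writing back (a hypothetical sketch; the dict-based cache line is illustrative):

```python
# Write-thru sketch: a write hit updates the cache copy and the MM copy
# together, so the MM copy is never stale and victims are always clean.
def write_thru_hit(cache_line, mm, addr, value):
    cache_line[addr] = value
    mm[addr] = value                   # MM always has the latest copy

cache_line, mm = {3: 0}, {3: 0}
write_thru_hit(cache_line, mm, 3, 9)
print(cache_line[3], mm[3])   # 9 9
```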
Write-Back (No Write-Thru)
• On a write hit, only the cache copy is updated
• Faster writes on a cache hit
• Need to write back dirty blocks selected as victims
– Dirty block: a block modified after being brought into the cache
• Requires a clean/dirty bit for every block
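The dirty bit's role can be shown in a few lines (a hypothetical sketch; the class and function names are illustrative):

```python
# Write-back sketch: a write hit updates only the cache copy and sets
# the dirty bit; a dirty victim is written to MM before eviction.
class WriteBackLine:
    def __init__(self, tag, data):
        self.tag, self.data, self.dirty = tag, data, False

def write_hit(line, value):
    line.data = value
    line.dirty = True                  # MM copy is now stale

def evict(line, mm):
    if line.dirty:                     # clean victims need no write-back
        mm[line.tag] = line.data
        line.dirty = False

mm = {7: 0}
line = WriteBackLine(7, 0)
write_hit(line, 42)                    # fast: no MM traffic yet
evict(line, mm)                        # deferred write-back happens here
print(mm[7])   # 42
```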
Allocation Policies
• WTWA (Write-Thru Write Allocate): allocate the missing block in the cache on both a read miss and a write miss
• WTNWA (Write-Thru No Write Allocate): don't allocate on a write miss; allocate only on a read miss
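The two policies differ only in whether a write miss allocates a block, which a small decision function captures (a hypothetical sketch; names are illustrative):

```python
def allocates_on_miss(kind, policy):
    """Whether a miss of the given kind allocates a cache block."""
    if policy == "WTWA":
        return True                    # allocate on read and write misses
    if policy == "WTNWA":
        return kind == "read"          # allocate only on a read miss
    raise ValueError(f"unknown policy: {policy}")

print(allocates_on_miss("write", "WTWA"))    # True
print(allocates_on_miss("write", "WTNWA"))   # False
```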