SOFTENG 363
Computer Architecture
Cache
John Morris
ECE/CS, The University of Auckland
Iolanthe at 13 knots on Cockburn Sound, WA
Cache
• Small, fast memory
  • Typically ~50 kbytes (1998)
  • 2-cycle access time
  • Same die as processor
  • “Off-chip” cache possible
    • Custom cache chip closely coupled to processor
  • Use fast static RAM (SRAM) rather than slower dynamic RAM
  • Several levels possible
• 2nd level of the memory hierarchy
• “Caches” most recently used memory locations “closer” to the processor
  • closer = closer in time
Cache
• Etymology
  • cacher (French) = “to hide”
• Transparent to a program
  • Programs simply run slower without it
• Modern processors rely on it
  • Reduces the cost of main memory access
  • Enables instruction/cycle throughput
  • Typical program
    • ~25% memory accesses
Cache
• Relies upon locality of reference
  • Programs continually use - and re-use - the same locations
  • Instructions
    • loops
    • common subroutines
  • Data
    • look-up tables
    • “working” data sets
Cache - operation
• Memory requests checked in cache first
  • If the word sought is in the cache, it’s read from cache (or updated in cache): cache hit
  • If not, the request is passed to main memory and data is read (written) there: cache miss
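[Figure: CPU, MMU, Cache and Main Mem in sequence. The CPU issues a virtual address (VA); the MMU passes a physical address (PA) to the cache and on to main memory; data or instructions (D or I) are returned.]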
Cache - operation
• Hit rates of 95% are usual
  • Cache: 16 kbytes
• Effective Memory Access Time
  • Cache: 2 cycles
  • Main memory: 10 cycles
  • Average access: 0.95*2 + 0.05*10 = 2.4 cycles
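The average above is a weighted sum of the hit and miss costs. A minimal sketch follows, assuming (as the slide's arithmetic does) that a miss costs the main-memory time in total rather than cache time plus memory time. The function name `effective_access_time` is illustrative only.

```python
def effective_access_time(hit_rate, cache_cycles, memory_cycles):
    """Average cycles per access, weighting hits and misses by their rates.

    Follows the slide's model: a miss costs `memory_cycles` in total
    (the cache lookup time is not added on top).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_cycles + (1.0 - hit_rate) * memory_cycles


# Slide figures: 95% hits, 2-cycle cache, 10-cycle main memory.
print(round(effective_access_time(0.95, 2, 10), 2))
```

Dropping the hit rate to 90% in the same model gives 0.90*2 + 0.10*10 = 2.8 cycles, which shows how sensitive the average is to the miss rate.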
Cache - organisation
• Direct-mapped cache
  • Each word in the cache has a tag
  • Assume
    • cache size - 2^k words
    • machine words - p bits
    • byte-addressed memory
      • m = log2 ( p/8 ) bits not used to address words
      • m = 2 for 32-bit machines
Address format (p bits): tag (p-k-m bits) | cache address (k bits) | byte address (m bits)
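The address format can be illustrated with shifts and masks. This is a sketch only; the function name `split_address` and the sample address are illustrative, and k = 14, m = 2 are the values used for the 64 kbyte example later in these slides.

```python
def split_address(addr, k, m):
    """Split a byte address into (tag, cache_address, byte_address).

    k: log2 of the number of words in the cache
    m: log2 of the bytes per word (m = 2 for 32-bit words)
    """
    byte_address = addr & ((1 << m) - 1)          # lowest m bits
    cache_address = (addr >> m) & ((1 << k) - 1)  # next k bits
    tag = addr >> (k + m)                         # remaining high bits
    return tag, cache_address, byte_address


tag, index, byte = split_address(0x12345678, k=14, m=2)
print(hex(tag), hex(index), byte)   # prints: 0x1234 0x159e 0
```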
Cache - organisation
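• Direct-mapped cache
[Figure: the memory address from the CPU is split into tag (p-k-m bits), cache address (k bits) and byte address (m bits). The cache address selects one of 2^k lines. A cache line holds a tag (p-k-m bits) and data (p bits). The tag comparison produces the Hit? signal; memory sits behind the cache.]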
Cache - Direct Mapped
• Conflicts
  • Two addresses separated by 2^(k+m) bytes will hit the same cache location
  • 32-bit machine, 64 kbyte (16 kword) cache
    • m = 2, k = 14
    • Any program or data set larger than 64 kbytes will generate conflicts
  • On a conflict, the ‘old’ word is flushed
    • Unmodified word (program, constant data): overwritten by the new data from memory
    • Modified data needs to be written back to memory before being overwritten
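A toy model makes the conflict visible. The sketch below tracks tags only (no data, no dirty bits) and assumes one word per line; the class name `DirectMappedCache` is illustrative, not taken from the lecture.

```python
class DirectMappedCache:
    """Toy direct-mapped cache of 2**k one-word lines (tags only, no data)."""

    def __init__(self, k, m):
        self.k, self.m = k, m
        self.tags = [None] * (1 << k)   # None models an invalid (empty) line

    def access(self, addr):
        """Return True on a hit; on a miss, replace the line's tag."""
        index = (addr >> self.m) & ((1 << self.k) - 1)
        tag = addr >> (self.k + self.m)
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag          # the 'old' word is flushed
        return False


cache = DirectMappedCache(k=14, m=2)    # 16 kwords of 4 bytes = 64 kbytes
a = 0x0000
b = a + (1 << (14 + 2))                 # 2**(k+m) bytes away: same cache line
results = [cache.access(x) for x in (a, b, a, b)]
print(results)                          # every access misses
```

Alternating between `a` and `a + 4` instead misses twice and then hits, since neighbouring words map to different lines.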
Cache - Conflicts
• Modified or dirty words
  • When a word is modified in cache
• Write-back cache
  • Only writes data back when needed
  • Misses: two memory accesses
    • Write modified word back
    • Read new word
• Write-through cache
  • Low priority write to main memory is queued
  • Processor is delayed by read only
    • Memory write occurs in parallel with other work
    • Instruction and necessary data fetches take priority
Cache - Write-through or write-back?
• Write-through
  • Allows an intelligent bus interface unit to make efficient use of a serious bottle-neck
    • Processor - memory interface (main memory bus)
  • Reads (instruction and data) need priority!
    • They stall the processor
  • Writes can be delayed
    • At least until the location is needed!
• More on intelligent system interface units later
but ...
Cache - Write-through or write-back?
• Write-through
  • Seems a good idea!
but ...
  • Multiple writes to the same location waste memory bus bandwidth
  • Typical programs run better with write-back caches
however
  • Often you can easily predict which will be best
  • Some processors (e.g. PowerPC) allow you to classify memory regions as write-back or write-through
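The bandwidth argument can be checked by counting main-memory writes under each policy. This is a rough sketch under stated assumptions: a direct-mapped cache with one word per line, a trace containing stores only, and a final flush of dirty lines so both policies leave memory up to date. The function name `memory_writes` and the trace are hypothetical, and reads caused by misses are not counted.

```python
def memory_writes(trace, num_lines, policy):
    """Count main-memory writes for a trace of word addresses being written.

    Toy direct-mapped, one-word-per-line cache; `policy` is
    "write-through" or "write-back". Dirty lines left in the cache at the
    end are flushed and counted.
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError(policy)
    lines = {}            # index -> (tag, dirty)
    writes = 0
    for word in trace:
        index, tag = word % num_lines, word // num_lines
        if policy == "write-through":
            lines[index] = (tag, False)   # lines are never dirty
            writes += 1                   # every store is queued to memory
        else:
            held = lines.get(index)
            if held is not None and held[0] != tag and held[1]:
                writes += 1               # write the modified word back first
            lines[index] = (tag, True)
    writes += sum(1 for _, dirty in lines.values() if dirty)  # final flush
    return writes


trace = [5] * 100 + [6] * 100             # two words, each stored to 100 times
print(memory_writes(trace, 1024, "write-through"))   # 200
print(memory_writes(trace, 1024, "write-back"))      # 2
```

When stores alternate between two conflicting words, the two policies produce the same number of writes in this model, so the advantage depends on the access pattern.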
Cache - more bits
• Cache lines need some status bits
  • Tag bits + ..
  • Valid
    • All set to false on power up
    • Set to true as words are loaded into cache
  • Dirty
    • Needed by write-back cache
    • Write-through cache always queues the write, so lines are never ‘dirty’
Cache - Improving Performance
• Conflicts (addresses 2^(k+m) bytes apart)
  • Degrade cache performance
    • Lower hit rate
  • Murphy’s Law operates
    • Addresses are never random!
    • Some locations ‘thrash’ in cache
      • Continually replaced and restored
Cache - Fully Associative
• Associative
  • Each tag is compared at the same time
  • Any match is a hit
  • Avoids ‘unnecessary’ flushing
• Replacement
  • Least Recently Used - LRU
  • Needs extra status bits
    • Cycles since last accessed
• Hardware cost high
  • Extra comparators
  • Wider tags
    • p-m bits vs p-k-m bits
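A software model of a fully associative cache with LRU replacement might look like the sketch below. It is an illustration only: hardware compares all tags in parallel, whereas this uses a dictionary lookup, and the ordering of an `OrderedDict` stands in for the "cycles since last accessed" status bits. The class name `FullyAssociativeCache` is illustrative.

```python
from collections import OrderedDict


class FullyAssociativeCache:
    """Toy fully associative cache with LRU replacement (tags only)."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()   # word address -> None, oldest first

    def access(self, word):
        """Return True on a hit; on a miss, evict the least recently used word."""
        if word in self.lines:
            self.lines.move_to_end(word)     # mark as most recently used
            return True
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)   # drop the LRU entry
        self.lines[word] = None
        return False


cache = FullyAssociativeCache(num_lines=4)
# Word addresses 0 and 16384 would collide in a 16-kword direct-mapped cache.
print([cache.access(w) for w in (0, 16384, 0, 16384)])   # [False, False, True, True]
```

Here the whole word address acts as the tag, which corresponds to the wider p-m bit tags noted above.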
Cache - Set Associative
• n-way set associative caches
  • n can be small: 2, 4, 8
  • Best performance
  • Reasonable hardware cost
  • Most high performance processors
• Replacement policy
  • LRU choice from n
  • Reasonable LRU approximation
    • 1 or 2 bits
    • Set on access
    • Cleared / decremented by timer
    • Choose cleared word for replacement
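The 1-bit LRU approximation can be sketched as follows. Several details are assumptions made for illustration and are not from the slides: the "timer" is modelled as clearing every used bit once per `clear_interval` accesses, way 0 is the fallback victim when no bit is clear, and lines hold tags only. The class name `SetAssociativeCache` is illustrative.

```python
class SetAssociativeCache:
    """Toy n-way set-associative cache with a 1-bit LRU approximation."""

    def __init__(self, num_sets, ways, clear_interval):
        self.num_sets, self.ways = num_sets, ways
        self.clear_interval = clear_interval   # accesses between "timer" ticks
        # Each set is a list of [tag, used_bit] entries, at most `ways` long.
        self.sets = [[] for _ in range(num_sets)]
        self.accesses = 0

    def access(self, word):
        """Return True on a hit, False on a miss (replacing a line if full)."""
        self.accesses += 1
        if self.accesses % self.clear_interval == 0:
            for lines in self.sets:            # timer tick: clear every used bit
                for line in lines:
                    line[1] = False
        lines = self.sets[word % self.num_sets]
        tag = word // self.num_sets
        for line in lines:
            if line[0] == tag:
                line[1] = True                 # set on access
                return True
        if len(lines) < self.ways:
            lines.append([tag, True])          # fill an empty way
            return False
        # Prefer a line whose bit is clear; fall back to way 0 if none is.
        victim = next((line for line in lines if not line[1]), lines[0])
        victim[0], victim[1] = tag, True
        return False


cache = SetAssociativeCache(num_sets=256, ways=2, clear_interval=1000)
# Words 0 and 256 map to the same set but can occupy different ways.
print([cache.access(w) for w in (0, 256, 0, 256)])   # [False, False, True, True]
```

A third word mapping to the same set (for example 512) would force a replacement, and the used bits decide which of the two resident lines is given up.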
Cache - Locality of Reference
Temporal Locality
• Same location will be referenced again soon
  • Access same data again
  • Program loops - access same instruction again
• Caches described so far exploit temporal locality
Spatial Locality
• Nearby locations will be referenced soon
  • Next element of an array
  • Next instruction of a program
Cache - Line Length
• Spatial Locality
  • Use very long cache lines
  • Fetch one datum
    • Neighbours fetched also
• PowerPC 601 (Motorola/Apple/IBM): first of the single chip Power processors
  • 64 sets
  • 8-way set associative
  • 32 bytes per line
  • 32 bytes (8 instructions) fetched into instruction buffer in one cycle
  • 64 x 8 x 32 = 16k byte total
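The benefit of longer lines for sequential access can be estimated with a very small model. The sketch assumes a single pass over consecutive words and a cache large enough that nothing is evicted during the pass. The function name `sequential_hit_rate` is illustrative.

```python
def sequential_hit_rate(num_words, words_per_line):
    """Hit rate for one pass over `num_words` consecutive words.

    Toy model: a miss on any word fetches its whole line, and the cache is
    assumed large enough that nothing is evicted during the pass.
    """
    loaded_lines = set()
    hits = 0
    for word in range(num_words):
        line = word // words_per_line
        if line in loaded_lines:
            hits += 1                # neighbour was fetched by an earlier miss
        else:
            loaded_lines.add(line)   # miss: fetch the whole line
    return hits / num_words


for words_per_line in (1, 2, 8):
    print(words_per_line, sequential_hit_rate(1024, words_per_line))
```

With 8 words per line (32 bytes of 4-byte words, as in the PowerPC 601 figures above), only the first word of each line misses, so 7 of every 8 accesses hit in this model.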
Cache - Separate I- and D-caches
• Unified cache
  • Instructions and Data in same cache
• Two caches
  • Instructions
  • Data
  • Increases total bandwidth
• MIPS R10000
  • 32Kbyte Instruction; 32Kbyte Data
  • Instruction cache is pre-decoded! (32 to 36 bits)
  • Data
    • 8-word (64byte) line, 2-way set associative
    • 256 sets
• Replacement policy?
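The set count quoted for the R10000 data cache follows from total size / (line size x ways). The helper below is a sketch that assumes all sizes are powers of two. The name `cache_geometry` is illustrative, and the offset and index bit counts are derived values, not figures stated on the slides.

```python
def cache_geometry(total_bytes, line_bytes, ways):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes all three sizes are powers of two.
    """
    sets = total_bytes // (line_bytes * ways)
    offset_bits = line_bytes.bit_length() - 1   # log2(line_bytes)
    index_bits = sets.bit_length() - 1          # log2(sets)
    return sets, offset_bits, index_bits


# MIPS R10000 data cache figures from the slide: 32 Kbytes, 64-byte lines, 2-way.
print(cache_geometry(32 * 1024, 64, 2))   # (256, 6, 8)
# PowerPC 601 figures: 16 kbytes, 32-byte lines, 8-way.
print(cache_geometry(16 * 1024, 32, 8))   # (64, 5, 6)
```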