Caches
J. Nelson Amaral, University of Alberta
Processor-Memory Performance Gap
Bauer p. 47
Memory Hierarchy
Bauer p. 48
Principle of Locality
• Temporal Locality: what was used in the past is likely to be reused in the near future
• Spatial Locality: what is close to the thing being used now is likely to also be used in the near future
Bauer p. 48
Hits and Misses
• Cache hit: the requested location is in the cache
• Cache miss: the requested location is not in the cache
Bauer p. 48
Cache Organizations
• When to bring the content of a memory location into the cache? On demand.
• Where to put it? Depends on the cache organization.
• How do we know it is there? Tag entries.
• What happens if the cache is full and we need to bring the content of a location into the cache? Use a replacement algorithm.
Bauer p. 49
Cache Organization
Bauer p. 50
Mapping
Bauer p. 51
Content-Addressable Memories (CAMs)
• Indexed by matching (part of) the content of entries
• All entries are searched in parallel
• Drawbacks:
– expensive hardware
– consume more power
– difficult to modify
Bauer p. 50
Cache Geometry
• C: number of cache lines
• m: number of banks in the cache (associativity)
• L: line size
• S: cache size (or capacity)
• S = C × L
• (S, L, m) gives the geometry of a cache
• d: number of bits needed for displacement
Bauer p. 52
Hit and Miss Detection
Cache Geometry: (S, L, m) = (32KB, 16B, 1)
Memory Reference: (t, i, d) = (?, ?, ?)
C = S/L = 32KB/16B = 2048
d = log2 L = log2 16 = 4
i = log2 (C/m) = log2 2048 = 11
t = 32 – i – d = 32 – 11 – 4 = 17
Bauer p. 52
(t, i, d) = (tag, index, displacement)
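The bit-field split above can be sketched in Python; `cache_fields` is a hypothetical helper written for this note, not code from the text:

```python
from math import log2

def cache_fields(S, L, m, addr_bits=32):
    """Split an address into (tag, index, displacement) bit widths
    for a cache of capacity S bytes, line size L bytes, m-way."""
    C = S // L                      # number of cache lines
    d = int(log2(L))                # displacement bits within a line
    i = int(log2(C // m))           # index bits (selects a set)
    t = addr_bits - i - d           # remaining bits form the tag
    return t, i, d

# (S, L, m) = (32 KB, 16 B, direct-mapped), 32-bit addresses
print(cache_fields(32 * 1024, 16, 1))  # → (17, 11, 4)
```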
What happens to t if we double the line size?
(S, L, m) = (32KB, 32B, 1)
C = S/L = 32KB/32B = 1024
d = log2 32 = 5
i = log2 1024 = 10
t = 32 – 10 – 5 = 17
The tag width is unchanged: doubling L moves one bit from the index to the displacement.
Bauer p. 52
What happens to t if we change to a 2-way associativity?
(S, L, m) = (32KB, 16B, 2)
i = log2 (C/m) = log2 1024 = 10
t = 32 – 10 – 4 = 18
The tag grows by one bit. Need one more comparator and a multiplexor.
Bauer p. 52
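Both variations can be checked numerically with a small sketch (the `cache_fields` helper is illustrative, not from the text):

```python
from math import log2

def cache_fields(S, L, m, addr_bits=32):
    """Return (tag, index, displacement) bit widths for a
    (S, L, m) cache geometry and a given address width."""
    C = S // L                      # number of cache lines
    d = int(log2(L))                # displacement bits
    i = int(log2(C // m))           # index bits (C/m sets)
    return addr_bits - i - d, i, d

print(cache_fields(32 * 1024, 16, 1))  # base case      → (17, 11, 4)
print(cache_fields(32 * 1024, 32, 1))  # double L       → (17, 10, 5)
print(cache_fields(32 * 1024, 16, 2))  # 2-way          → (18, 10, 4)
```

Doubling the line size trades an index bit for a displacement bit, so the tag stays at 17 bits; doubling the associativity halves the number of sets, so the tag grows by one bit.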
Replacement Algorithm
• Direct mapped
– There is only one location for a block
– If the location is occupied, the block that is there is evicted
• m-way set associative
– If all m entries are valid, must select a victim
– Low associativity: the Least-Recently Used (LRU) entry should be evicted
– High associativity: the (two) Most-Recently Used (MRU) entries should not be evicted
Bauer p. 53
Write Strategies (on a hit)
• Write back
– Write only to the cache (memory becomes stale)
– Add a dirty bit to each cache line
– Must write back to memory when the entry is evicted
• Write through
– Write to both cache and memory
– No need for a dirty bit
– Memory is consistent at all times
Bauer p. 54
Write Strategies (on a miss)
• Write allocate
– read the line from memory
– write to the line to modify it
• Write around
– write to the next level only
• Combinations that make sense:
– write back with write allocate
– write through with write around
Bauer p. 54
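The write-back + write-allocate combination can be sketched for a single cache line; this toy `WriteBackLine` class (with a dict standing in for memory) is an assumption of this note, not the text's code:

```python
class WriteBackLine:
    """One cache line with write-back + write-allocate.
    The backing 'memory' is a dict keyed by line tag."""
    def __init__(self):
        self.tag, self.data, self.dirty = None, None, False

    def write(self, tag, data, memory):
        if self.tag != tag:                 # write miss
            if self.dirty:                  # write the dirty victim back first
                memory[self.tag] = self.data
            self.tag = tag
            self.data = memory.get(tag)     # write allocate: fetch the line
            self.dirty = False
        self.data = data                    # write only to the cache...
        self.dirty = True                   # ...so memory is now stale

memory = {"X": 0, "Y": 0}
line = WriteBackLine()
line.write("X", 1, memory)   # miss: X allocated, memory untouched
line.write("X", 2, memory)   # hit: only the cache is updated
line.write("Y", 3, memory)   # miss: dirty X written back, Y allocated
print(memory["X"])           # → 2 (written back on eviction, not before)
```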
Write Buffer
[Figure: Processor, Cache, Write Buffer, Memory. Reads flow from memory through the cache to the processor; writes from the processor are staged in the write buffer before draining to memory.]
Bauer p. 54
The three C’s
• Compulsory (cold) misses
– first time a memory block is referenced
• Conflict misses
– more than m blocks compete for the same cache entries in an m-way cache
• Capacity misses
– more than C blocks compete for space in a cache with C lines
• Coherence misses
– needed blocks are invalidated because of I/O or multiprocessor operations
Bauer p. 54
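Conflict misses can be demonstrated with a toy direct-mapped simulation (illustrative code, not from the text): two blocks that map to the same index thrash even though the cache has plenty of room.

```python
def simulate_direct_mapped(addresses, num_lines, line_size):
    """Count misses in a direct-mapped cache; each line stores
    only the number of the block currently resident."""
    lines = [None] * num_lines
    misses = 0
    for a in addresses:
        block = a // line_size
        idx = block % num_lines
        if lines[idx] != block:
            misses += 1          # compulsory or conflict miss
            lines[idx] = block
    return misses

# Blocks 0 and 1024 map to the same index in a 1024-line cache with
# 16-byte lines, so alternating between them misses every time.
trace = [0, 1024 * 16] * 4
print(simulate_direct_mapped(trace, num_lines=1024, line_size=16))  # → 8
```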
Caches and I/O (read)
What happens to the cache when data need to move from disk to memory?
1. Invalidate cache data using the valid bit.
2. Update the cache with new data.
Bauer p. 55
Caches and I/O (Write)
Bauer p. 55
What happens to the cache when data need to move from memory to disk?
Purge dirty lines.
Alternative: hardware snoopy protocol.
Cache Performance
Hit Ratio:
h = (number of memory references that hit the cache) / (total number of memory references to the cache)
miss ratio = 1 – h
Average Memory Access Time = h × Tcache + (1 – h) × Tmem
For two levels of cache:
AMAT = h1 × TL1 + (1 – h1) × h2 × TL2 + (1 – h1) × (1 – h2) × Tmem
Bauer p. 56
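The two-level formula can be evaluated directly. The numbers below are made up for illustration; they are not from the text:

```python
def amat_two_level(h1, t_l1, h2, t_l2, t_mem):
    """AMAT = h1*TL1 + (1-h1)*h2*TL2 + (1-h1)*(1-h2)*Tmem.
    h2 is the hit ratio of L2 among references that miss in L1."""
    return (h1 * t_l1
            + (1 - h1) * h2 * t_l2
            + (1 - h1) * (1 - h2) * t_mem)

# Hypothetical machine: 90% L1 hits at 1 cycle, 80% of L1 misses
# hit a 10-cycle L2, and the rest go to a 100-cycle memory.
print(amat_two_level(0.9, 1, 0.8, 10, 100))  # ≈ 3.7 cycles
```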
Cache Performance
AMAT = h × Tcache + (1 – h) × Tmem
Goal: Reduce AMAT
Strategies:
1. Increase hit ratio (h)
2. Reduce Tcache
Parameters:
1. Cache capacity
2. Cache associativity
3. Cache line size
Bauer p. 56
Influence of Capacity on Miss Rate
Bauer p. 57
Cache is (S, 2, 64). Application: 176.gcc
Associativity vs. Miss Rate
Cache is (32KB, m, 64). Application: 176.gcc
Line Size vs. Miss Rate
Cache is (16KB, 1, L)
Memory Access Time
AMAT = h × Tcache + (1 – h) × Tmem
Tmem = Tacc + (L/w) × Tbus
Tacc: time to send the address + time to read
L: L2 cache line size
w: bus width
Tbus: bus cycle time
AMAT = h × Tcache + (1 – h) × (Tacc + (L/w) × Tbus)
AMAT Example
Tacc: 5 cycles
w: 64 bits (8 bytes)
Tbus: 2 cycles
Cache CA: hA = 0.88, LA = 16 bytes
Cache CB: hB = 0.92, LB = 32 bytes
The access time of both CA and CB is 1 cycle.
We will study two alternative configurations, CA and CB, for a single level of cache. What is the AMAT in each case?
AMAT = h × Tcache + (1 – h) × (Tacc + (L/w) × Tbus)
AMATA = 0.88 × 1 + (1 – 0.88) × (5 + (16/8) × 2) = 1.96
AMATB = 0.92 × 1 + (1 – 0.92) × (5 + (32/8) × 2) = 1.96
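The two configurations can be checked with a short script (the `amat` helper is a sketch written for this note; L and w are both in bytes, so the 64-bit bus is w = 8):

```python
def amat(h, t_cache, t_acc, L, w, t_bus):
    """AMAT = h*Tcache + (1-h)*(Tacc + (L/w)*Tbus).
    L and w are in bytes; times are in cycles."""
    return h * t_cache + (1 - h) * (t_acc + (L / w) * t_bus)

w = 8  # 64-bit bus = 8 bytes per cycle
print(amat(0.88, 1, 5, 16, w, 2))  # CA: ≈ 1.96 cycles
print(amat(0.92, 1, 5, 32, w, 2))  # CB: ≈ 1.96 cycles
```

CB's higher hit ratio is exactly offset by its longer line's extra bus cycles, so both configurations reach the same average access time.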