Computer Architecture CSE 3322
Lecture 20

Web Site: crystal.uta.edu/~jpatters/cse3322

Phase II Project due Monday Dec 1

Problems: 7.20, 7.22, 7.27, 7.28 Due Nov 17

DECStation 3100

Program   Block Size   Instruction   Data        Effective
          (words)      Miss Rate     Miss Rate   Miss Rate
gcc       1            6.1%          2.1%        5.4%
gcc       4            2.0%          1.7%        1.9%
spice     1            1.2%          1.3%        1.2%
spice     4            0.3%          0.6%        0.4%

Write misses are included for the 4-word block, but not for the 1-word block.
Remember: Miss Penalty goes UP!

Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty

[Figure: two plots against Block Size, one of Miss Penalty (labels: Access Time, Transfer Time) and one of Miss Rate (labels: Constant Size Cache, Fewer Blocks)]
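The Average Memory Access Time formula can be tried out with a few lines of Python. This is only a sketch: the function name and the sample numbers (1-cycle hit time, 2% miss rate, 16-cycle miss penalty) are assumptions chosen for illustration, not values from the slides.

```python
def average_memory_access_time(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty.

    hit_time and miss_penalty are in clock cycles; miss_rate is a fraction (0.02 = 2%).
    """
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    amat = average_memory_access_time(hit_time=1, miss_rate=0.02, miss_penalty=16)
    print(f"AMAT = {amat:.2f} clock cycles")
```

A larger block usually lowers the miss rate but raises the miss penalty, so both arguments change together when the block size changes.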

Reducing the Miss Penalty

Reduce the time to read the multiple words from Main Memory to the cache block.

Don’t wait for the complete block to be transferred: “Early Restart”
• Access and transfer each word sequentially.
• As soon as the requested word is in the cache, restart the processor to access the cache, and finish the block transfer while the cache is available.

Variation: “Requested Word First”

Disadvantage: Complex Control. The processor is likely to access the cache block before the transfer is complete.

Reducing the Miss Penalty

Reduce the time to read the multiple words from Main Memory to the cache block.

Assume Memory Access times:
• 1 clock cycle to send address
• 10 clock cycles to access DRAM
• 1 clock cycle to send a word of data

For sequential transfer of 4 data words:

Miss Penalty = 1 + 4 * (10 + 1) = 45 clock cycles
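Under the assumed access times, the effect of Early Restart and Requested Word First on the processor stall can be sketched as follows. The function names are hypothetical, and the model assumes the processor resumes the moment the requested word arrives, ignoring any later conflict with the ongoing block transfer.

```python
SEND_ADDRESS = 1   # clock cycles to send the address
DRAM_ACCESS = 10   # clock cycles to access DRAM
SEND_WORD = 1      # clock cycles to send one word of data


def sequential_miss_penalty(words_per_block):
    """Cycles to fetch the whole block one word at a time."""
    return SEND_ADDRESS + words_per_block * (DRAM_ACCESS + SEND_WORD)


def early_restart_stall(requested_word):
    """Cycles until the requested word arrives when words are fetched in order.

    requested_word is the 0-based position of the word inside the block.
    """
    return SEND_ADDRESS + (requested_word + 1) * (DRAM_ACCESS + SEND_WORD)


def requested_word_first_stall():
    """Cycles until the requested word arrives when it is fetched first."""
    return SEND_ADDRESS + DRAM_ACCESS + SEND_WORD


if __name__ == "__main__":
    print("Full block:", sequential_miss_penalty(4))
    for position in range(4):
        print("Early restart, word", position, ":", early_restart_stall(position))
    print("Requested word first:", requested_word_first_stall())
```

In this simplified model Early Restart helps most when the requested word is near the start of the block, while Requested Word First gives the same short stall for every position.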

What if we could read a block of words simultaneously from the Main Memory?

[Figure: cache entry (Valid, Tag, Word3, Word2, Word1, Word0) connected to Main Memory by four 32-bit paths]

Miss Penalty = 1 + 10 + 1 = 12 clock cycles

Miss Penalty for Sequential = 45 clock cycles

What about 4 banks of Memory? “Interleaved Memory”

[Figure: Cache connected to Bank 3, Bank 2, Bank 1 and Bank 0, which all receive the Address]

• Banks are accessed in parallel
• Words are transferred serially

Miss Penalty = 1 + 10 + 4 * 1 = 16 clock cycles

Miss Penalty for Parallel = 12 clock cycles
Miss Penalty for Sequential = 45 clock cycles

Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty

[Figure: plot of Average Access Time against Block Size; labels: Increase Cache size, Increase Block size, Main Memory Organization]

CPU Performance with Cache Memory

For a program:

CPU time = CPU execution time + CPU Hold time

CPU Hold time = Memory Stall Clock Cycles * Clock Cycle time

Memory Stall Clock Cycles = Read Stall Cycles + Write Stall Cycles

Read Stall Cycles = (Reads / Program) * Read Miss Rate * Read Miss Penalty

Assuming no penalty for Hit

CPU Performance with Cache Memory

Write Stall Cycles = (Writes / Program) * Write Miss Rate * Write Miss Penalty + Write Buffer Stalls

Write Buffer Stalls should be << Write Miss Stalls

So, approximately,

Write Stall Cycles = (Writes / Program) * Write Miss Rate * Write Miss Penalty

CPU Performance with Cache Memory

Memory Stall Clock Cycles = Read Stall Cycles + Write Stall Cycles
  = (Reads / Program) * Read Miss Rate * Read Miss Penalty
  + (Writes / Program) * Write Miss Rate * Write Miss Penalty

The Miss Penalties are approximately the same (fetch the block). So, combining the Reads and Writes together into a weighted Miss Rate:

Memory Stall Cycles = (Memory Accesses / Program) * Miss Rate * Miss Penalty

CPU Performance with Cache Memory

For a program:

CPU time = CPU execution time + CPU Hold time

CPU Hold time = Memory Stall Clock Cycles * Clock Cycle time

CPU time = CPU execution time + (Memory Accesses / Program) * Miss Rate * Miss Penalty * Clock Cycle time

Dividing both sides by Instructions / Program and Clock Cycle time:

Effective CPI = Execution CPI + (Memory Accesses / Instruction) * Miss Rate * Miss Penalty

Assuming no penalty for Hit

CPU Performance with Cache Memory

Effective CPI = Execution CPI + (Memory Accesses / Instruction) * Miss Rate * Miss Penalty

Consider the DECStation 3100 with 4 word blocks running spice:
• CPI = 1.2 without misses
• Instruction Miss Rate = 0.3%
• Data Miss Rate = 0.6%
• For spice, frequency of loads and stores = 9%

1.) Sequential Memory: Miss penalty = 65 clock cycles
2.) 4 Bank Interleaved: Miss penalty = 20 clock cycles

Eff CPI = 1.2 + (1 * .003 + .09 * .006) * Miss Penalty
        = 1.2 + .00354 * Miss Penalty

1.) Eff CPI = 1.2 + .00354 * 65 = 1.2 + 0.2301 = 1.43

2.) Eff CPI = 1.2 + .00354 * 20 = 1.2 + 0.071 = 1.271
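The Effective CPI arithmetic for the spice example can be checked with a short script. The function and parameter names are assumptions for illustration; the numbers are the ones given in the example (every instruction makes one instruction fetch, and 9% of instructions also make a data access).

```python
def effective_cpi(execution_cpi, instruction_miss_rate, data_miss_rate,
                  data_accesses_per_instruction, miss_penalty):
    """Effective CPI = Execution CPI + misses per instruction * Miss Penalty."""
    misses_per_instruction = (1 * instruction_miss_rate
                              + data_accesses_per_instruction * data_miss_rate)
    return execution_cpi + misses_per_instruction * miss_penalty


if __name__ == "__main__":
    for label, penalty in (("Sequential Memory", 65), ("4 Bank Interleaved", 20)):
        cpi = effective_cpi(execution_cpi=1.2,
                            instruction_miss_rate=0.003,
                            data_miss_rate=0.006,
                            data_accesses_per_instruction=0.09,
                            miss_penalty=penalty)
        print(f"{label}: Eff CPI = {cpi:.3f}")
```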

CPU Performance with Cache Memory

Consider the DECStation 3100 with 4 word blocks running spice:
• CPI = 1.2 without misses
• Instruction Miss Rate = 0.3%
• Data Miss Rate = 0.6%
• For spice, frequency of loads and stores = 9%
• 4 Bank Interleaved: Miss penalty = 20 clock cycles
• Eff CPI = 1.271 clock cycles

What if we get a new processor and cache that runs at twice the clock frequency, but keep the same main memory speed?

Miss penalty = 40 clock cycles

Eff CPI = 1.2 + .00354 * 40 = 1.2 + 0.1416 = 1.342

Performance Fast clock / Performance Slow clock = (1.271 * 2 * clock cycle time) / (1.342 * clock cycle time) = 1.89
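The comparison between the two clocks can be sketched in Python. The names are hypothetical, and time is measured in units of the fast clock's cycle time, so the slow clock has a cycle time of 2 and a miss penalty of 20 cycles, while the fast clock has a cycle time of 1 and a miss penalty of 40 cycles.

```python
EXECUTION_CPI = 1.2
MISSES_PER_INSTRUCTION = 1 * 0.003 + 0.09 * 0.006  # instruction + data misses


def time_per_instruction(miss_penalty_cycles, clock_cycle_time):
    """Average time per instruction = Effective CPI * clock cycle time."""
    eff_cpi = EXECUTION_CPI + MISSES_PER_INSTRUCTION * miss_penalty_cycles
    return eff_cpi * clock_cycle_time


def fast_over_slow_performance():
    """Performance ratio is the inverse of the execution time ratio."""
    slow = time_per_instruction(miss_penalty_cycles=20, clock_cycle_time=2.0)
    fast = time_per_instruction(miss_penalty_cycles=40, clock_cycle_time=1.0)
    return slow / fast


if __name__ == "__main__":
    print(f"Fast clock is {fast_over_slow_performance():.2f} times faster")
```

Doubling the clock frequency gives less than a 2x speedup because the same main memory now costs twice as many cycles per miss.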

Address (32 bits): Tag = bits 31 to 16 (16 bits), Index = bits 15 to 4 (12 bits), Block Offset = bits 3 to 2, Byte Offset = bits 1 to 0

[Figure: direct mapped cache with 4K entries, each holding v (valid), a 16-bit Tag, and Word3, Word2, Word1, Word0 (32 bits each). The stored Tag is compared (=) with the address Tag to produce Hit, and a Mux controlled by the 2-bit Block Offset selects the 32-bit Data word.]
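The address fields in the diagram can be extracted with shifts and masks. This is a sketch assuming the layout shown there (16-bit Tag, 12-bit Index, 2-bit Block Offset, 2-bit Byte Offset); the function name and the sample address are made up for illustration.

```python
def split_address(address):
    """Split a 32-bit byte address into (tag, index, block_offset, byte_offset)."""
    byte_offset = address & 0x3            # bits 1..0
    block_offset = (address >> 2) & 0x3    # bits 3..2, selects the word in the block
    index = (address >> 4) & 0xFFF         # bits 15..4, selects one of 4K entries
    tag = (address >> 16) & 0xFFFF         # bits 31..16, compared with the stored tag
    return tag, index, block_offset, byte_offset


if __name__ == "__main__":
    tag, index, block_offset, byte_offset = split_address(0x12345678)
    print(hex(tag), hex(index), block_offset, byte_offset)
```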

Consider a Direct Mapped Cache with 4 word blocks with size of 8 blocks or 32 words.

Reference Sequence (Word Address): 6, 7, 8, 9, 80, 6, 7, 8, 9, 81

To be found for each reference: Block Address, Cache Address, Hit or Miss

Block Address   Word Addresses            Cache Address
0               3     2     1     0       0
1               7     6     5     4       1
2               11    10    9     8       2
3               15    14    13    12      3
...
7               31    30    29    28      7
8               35    34    33    32      0
...
15              63    62    61    60      7
X               4X+3  4X+2  4X+1  4X      X modulo 8

Block Address = Word Addr / 4 (integer part)

Consider a Direct Mapped Cache with 4 word blocks with size of 8 blocks or 32 words.

Reference Sequence:

Word Address   Block Address   Cache Address   Hit or Miss
6              1               1               Miss
7              1               1               Hit
8              2               2               Miss
9              2               2               Hit
80             20              4               Miss
6              1               1               Hit
7              1               1               Hit
8              2               2               Hit
9              2               2               Hit
81             20              4               Hit

Cache Address = (Word Addr / 4) modulo 8

Consider a Direct Mapped Cache with 4 word blocks with size of 8 blocks or 32 words.

Reference Sequence:

Word Address   Block Address   Cache Address   Hit or Miss
6              1               1               Miss
7              1               1               Hit
8              2               2               Miss
9              2               2               Hit
68             17              1               Miss
6              1               1               Miss
7              1               1               Hit
8              2               2               Hit
9              2               2               Hit
69             17              1               Miss

Cache Address = (Word Addr / 4) modulo 8
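Both direct mapped reference sequences can be replayed with a small simulator. This is a minimal sketch: the function name is hypothetical, and each cache line simply remembers the full block address it holds rather than a separate tag, which gives the same hit or miss decisions.

```python
def simulate_direct_mapped(word_addresses, words_per_block=4, num_blocks=8):
    """Return (word addr, block addr, cache addr, 'Hit'/'Miss') for each reference."""
    cache = [None] * num_blocks  # block address held by each line; None = invalid
    results = []
    for word_addr in word_addresses:
        block_addr = word_addr // words_per_block   # Word Addr / 4
        cache_addr = block_addr % num_blocks        # modulo 8
        hit = cache[cache_addr] == block_addr
        if not hit:
            cache[cache_addr] = block_addr          # fetch the block, replacing the old one
        results.append((word_addr, block_addr, cache_addr, "Hit" if hit else "Miss"))
    return results


if __name__ == "__main__":
    for sequence in ([6, 7, 8, 9, 80, 6, 7, 8, 9, 81],
                     [6, 7, 8, 9, 68, 6, 7, 8, 9, 69]):
        for row in simulate_direct_mapped(sequence):
            print(*row)
        print()
```

In the second sequence, word addresses 6 and 68 map to the same cache address, so each one evicts the other even though most of the cache is empty.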

How about putting a block in any unused block of the eight blocks?

[Figure: cache entry holding Tag, Word3, Word2, Word1, Word0]

How can you find it?
Expand the Tag to the block address and compare.

Fully Associative Memory: addressed by its contents

Address: Block Address (28 bits), Block Offset, Byte Offset

• For practical Hit time, must have parallel comparisons of the Tag and the Block Address
• Only feasible for small number of blocks

[Figure: the Block Address is compared (=) in parallel with the Tag of each (Tag, Data) entry; the comparison results combine to give Hit, and a Mux selects the Data. Block Offset selects Word. Valid bit not shown.]

Hardware not feasible for large cache.
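A fully associative lookup can be sketched in software as a search over every entry. The function name and the entry layout (valid flag, tag, list of four words) are assumptions for illustration; the loop stands in for the comparisons that the hardware performs in parallel.

```python
def fully_associative_lookup(entries, byte_address):
    """Look up a 32-bit byte address; return (hit, word or None).

    entries is a list of (valid, tag, words) tuples, where tag is the
    28-bit block address and words holds the four words of the block.
    """
    block_address = byte_address >> 4         # drop Block Offset and Byte Offset
    block_offset = (byte_address >> 2) & 0x3  # selects the word within the block
    for valid, tag, words in entries:         # done in parallel in hardware
        if valid and tag == block_address:
            return True, words[block_offset]
    return False, None


if __name__ == "__main__":
    cache = [(True, 0x1234567, [10, 11, 12, 13]),
             (False, 0, [0, 0, 0, 0])]
    print(fully_associative_lookup(cache, 0x12345678))
    print(fully_associative_lookup(cache, 0x00000000))
```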

Make sets of Blocks Associative

Two-way set associative

[Figure: sets selected by Index 0, 1, ... 2^k - 1; each set holds two entries, (Tag0, Data0) and (Tag1, Data1). Valid bit not shown.]

Address: Tag, Index, Block Offset, Byte Offset

• Addr by Index
• Compare Two Tags in parallel for Hit

Block replacement strategies

For each Index there are 2, 4, ... n options for replacement

Strategies

1. LRU – Least Recently Used

• Replace the block that has been unused for the longest time

2. Random

• Select the block to be replaced randomly

• Implementation
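The two strategies can be sketched for a single set as follows. This is an illustrative software model, not the hardware implementation: the class and function names are hypothetical, and LRU order is tracked with an `OrderedDict` rather than with per-entry usage bits.

```python
import random
from collections import OrderedDict


class LRUSet:
    """One cache set that replaces the least recently used block."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # tags, ordered least to most recently used

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)       # mark as most recently used
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)    # evict the least recently used
        self.blocks[tag] = None
        return False


def random_victim(ways, rng=random):
    """Random replacement: pick the entry to replace uniformly at random."""
    return rng.randrange(ways)


if __name__ == "__main__":
    lru = LRUSet(ways=2)
    print([lru.access(tag) for tag in (1, 17, 1, 33, 17)])
    print("random victim:", random_victim(2))
```

LRU needs bookkeeping on every access, while Random needs none, which is why Random becomes attractive as the number of entries per set grows.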

Consider a Two Way Associative Cache with 4 word blocks with size of 8 blocks or 32 words. Each set holds two blocks: Entry 0 and Entry 1.

Reference Sequence:

Word Address   Block Address   Cache Address (Set)   Hit or Miss
6              1               1                     Miss
7              1               1                     Hit
8              2               2                     Miss
9              2               2                     Hit
68             17              1                     Miss
6              1               1                     Hit
7              1               1                     Hit
8              2               2                     Hit
9              2               2                     Hit
69             17              1                     Hit

Cache Address = (Word Addr / 4) modulo 4
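The two way set associative trace can be replayed with a short simulator. This is a sketch: the function name is hypothetical, and LRU replacement is an assumption, since this particular sequence never fills a set beyond its two entries and so never needs a replacement.

```python
def simulate_set_associative(word_addresses, words_per_block=4,
                             num_blocks=8, ways=2):
    """Return (word addr, block addr, set, 'Hit'/'Miss') for each reference."""
    num_sets = num_blocks // ways            # 8 blocks / 2 ways = 4 sets
    sets = [[] for _ in range(num_sets)]     # block addresses, least recently used first
    results = []
    for word_addr in word_addresses:
        block_addr = word_addr // words_per_block   # Word Addr / 4
        set_index = block_addr % num_sets           # modulo 4
        entries = sets[set_index]
        hit = block_addr in entries
        if hit:
            entries.remove(block_addr)       # will be re-appended as most recent
        elif len(entries) == ways:
            entries.pop(0)                   # assumed LRU replacement
        entries.append(block_addr)
        results.append((word_addr, block_addr, set_index, "Hit" if hit else "Miss"))
    return results


if __name__ == "__main__":
    for row in simulate_set_associative([6, 7, 8, 9, 68, 6, 7, 8, 9, 69]):
        print(*row)
```

Blocks 1 and 17 both map to set 1 but occupy its two entries side by side, so the conflict misses seen in the direct mapped cache disappear.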