
Page 1: Lecture 14: Caching, cont

Lecture 14: Caching, cont.

EEN 312: Processors: Hardware, Software, and Interfacing

Department of Electrical and Computer Engineering

Spring 2014, Dr. Rozier (UM)

Page 2: Lecture 14: Caching, cont

EXAM

Page 3: Lecture 14: Caching, cont

Exam is one week from Thursday

• Thursday, April 3rd

• Covers:
  – Chapter 4
  – Chapter 5
  – Buffer Overflows
  – Exam 1 material fair game

Page 4: Lecture 14: Caching, cont

LAB QUESTIONS?

Page 5: Lecture 14: Caching, cont

CACHING

Page 6: Lecture 14: Caching, cont

Caching Review

• What is temporal locality?

• What is spatial locality?
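
For a concrete (hypothetical) illustration in C: in the loop below, sum is reused on every iteration, which is temporal locality, while a[i] walks through consecutive addresses, which is spatial locality.

    #include <stdio.h>

    int main(void) {
        int a[1024];
        for (int i = 0; i < 1024; i++)
            a[i] = i;

        int sum = 0;                  /* reused every iteration: temporal locality */
        for (int i = 0; i < 1024; i++)
            sum += a[i];              /* consecutive addresses: spatial locality */

        printf("%d\n", sum);
        return 0;
    }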

Page 7: Lecture 14: Caching, cont

Caching Review

• What is a cache set?

• What is a cache line?

• What is a cache block?

Page 8: Lecture 14: Caching, cont

Caching Review

• What is a direct mapped cache?

• What is a set associative cache?
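
For reference, a direct-mapped cache finds a block purely from address bits. A minimal sketch, assuming a hypothetical cache with 16-byte blocks and 256 sets (neither parameter is from the lecture):

    #include <stdint.h>
    #include <stdio.h>

    #define BLOCK_BITS 4   /* 16-byte blocks -> 4 offset bits (assumed) */
    #define INDEX_BITS 8   /* 256 sets       -> 8 index bits  (assumed) */

    int main(void) {
        uint32_t addr   = 0x12345678;                       /* example address    */
        uint32_t offset = addr & ((1u << BLOCK_BITS) - 1);  /* byte within block  */
        uint32_t index  = (addr >> BLOCK_BITS) & ((1u << INDEX_BITS) - 1); /* set */
        uint32_t tag    = addr >> (BLOCK_BITS + INDEX_BITS); /* compared on lookup */
        printf("tag=0x%x index=%u offset=%u\n", tag, index, offset);
        return 0;
    }

A set-associative cache uses the same split, but the index selects a set of several lines and the tag is compared against each of them.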

Page 9: Lecture 14: Caching, cont

Caching Review

• What is a cache miss?

• What happens on a miss?

Page 10: Lecture 14: Caching, cont

ASSOCIATIVITY AND MISS RATES

Page 11: Lecture 14: Caching, cont

How Much Associativity

• Increased associativity decreases miss rate
  – But with diminishing returns

• Simulation of a system with 64KB D-cache, 16-word blocks, SPEC2000
  – 1-way: 10.3%
  – 2-way: 8.6%
  – 4-way: 8.3%
  – 8-way: 8.1%

Page 12: Lecture 14: Caching, cont

Set Associative Cache Organization

Page 13: Lecture 14: Caching, cont

Multilevel Caches

• Primary cache attached to CPU
  – Small, but fast

• Level-2 cache services misses from primary cache
  – Larger, slower, but still faster than main memory

• Main memory services L-2 cache misses

• Some high-end systems include L-3 cache

Page 14: Lecture 14: Caching, cont

Multilevel Cache Example

• Given
  – CPU base CPI = 1, clock rate = 4GHz
  – Miss rate/instruction = 2%
  – Main memory access time = 100ns

• With just primary cache
  – Miss penalty = 100ns/0.25ns = 400 cycles
  – Effective CPI = 1 + 0.02 × 400 = 9

Page 15: Lecture 14: Caching, cont

Example (cont.)

• Now add L-2 cache
  – Access time = 5ns
  – Global miss rate to main memory = 0.5%

• Primary miss with L-2 hit
  – Penalty = 5ns/0.25ns = 20 cycles

• Primary miss with L-2 miss
  – Extra penalty = 400 cycles (the main memory access)

• CPI = 1 + 0.02 × 20 + 0.005 × 400 = 3.4

• Performance ratio = 9/3.4 = 2.6
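
The slide's arithmetic can be checked directly. A minimal sketch in C using only the numbers given above (4 GHz clock, 2% primary miss rate, 0.5% global miss rate, 100 ns main memory, 5 ns L-2):

    #include <stdio.h>

    int main(void) {
        double cycle_ns = 1.0 / 4.0;          /* 4 GHz -> 0.25 ns per cycle */
        double mem_pen  = 100.0 / cycle_ns;   /* 400 cycles to main memory  */
        double l2_pen   = 5.0 / cycle_ns;     /* 20 cycles on an L-2 hit    */

        double cpi_l1 = 1.0 + 0.02 * mem_pen;                   /* = 9.0 */
        double cpi_l2 = 1.0 + 0.02 * l2_pen + 0.005 * mem_pen;  /* = 3.4 */

        printf("L1 only:  CPI = %.1f\n", cpi_l1);
        printf("With L2:  CPI = %.1f (ratio %.1f)\n", cpi_l2, cpi_l1 / cpi_l2);
        return 0;
    }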

Page 16: Lecture 14: Caching, cont

Multilevel Cache Considerations

• Primary cache
  – Focus on minimal hit time

• L-2 cache
  – Focus on low miss rate to avoid main memory access
  – Hit time has less overall impact

• Results
  – L-1 cache usually smaller than a single cache
  – L-1 block size smaller than L-2 block size

Page 17: Lecture 14: Caching, cont

Interactions with Advanced CPUs

• Out-of-order CPUs can execute instructions during cache miss– Pending store stays in load/store unit– Dependent instructions wait in reservation

stations• Independent instructions continue

• Effect of miss depends on program data flow– Much harder to analyse– Use system simulation

Page 18: Lecture 14: Caching, cont

Interactions with Software

• Misses depend on memory access patterns
  – Algorithm behavior
  – Compiler optimization for memory access
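
A classic example of this effect (array size is illustrative, not from the lecture): C lays matrices out row-major, so the first traversal below touches consecutive addresses while the second strides a full row between accesses and misses far more often.

    #include <stdio.h>

    #define N 1024
    static double m[N][N];

    /* Row-major traversal: consecutive j means consecutive addresses. */
    double sum_row_major(void) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += m[i][j];
        return s;
    }

    /* Column-major traversal: each access strides N * sizeof(double) bytes. */
    double sum_col_major(void) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += m[i][j];
        return s;
    }

    int main(void) {
        printf("%f %f\n", sum_row_major(), sum_col_major());
        return 0;
    }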

Page 19: Lecture 14: Caching, cont

Replacement Policy

• Direct mapped: no choice

• Set associative
  – Prefer non-valid entry, if there is one
  – Otherwise, choose among entries in the set

• Least-recently used (LRU)
  – Choose the one unused for the longest time
    • Simple for 2-way, manageable for 4-way, too hard beyond that

• Random
  – Gives approximately the same performance as LRU for high associativity

Page 20: Lecture 14: Caching, cont

Replacement Policy

• Optimal algorithm:
  – The clairvoyant algorithm
  – Not a real possibility
  – Why is it useful to consider?

Page 21: Lecture 14: Caching, cont

Replacement Policy

• Optimal algorithm:
  – The clairvoyant algorithm
  – Not a real possibility
  – Why is it useful to consider?
  – We can never beat the Oracle
  – Good baseline to compare to
    • Get as close as possible

Page 22: Lecture 14: Caching, cont

LRU

• Least Recently Used
  – Discard the data which was used least recently
  – Need to maintain information on the age of each cache line. How do we do so efficiently?

Break into groups: propose an algorithm.
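
One candidate answer, offered only as a sketch: keep an age counter per line in the set, reset it on every access, and evict the line with the largest count. All names and the counter width are made up for illustration; the storage and per-access update cost this implies is exactly why the following slides look for something cheaper.

    #define WAYS 4

    typedef struct {
        unsigned age[WAYS];           /* accesses since each line was last used */
    } lru_set_t;

    /* Called on every access to a line in the set. */
    void lru_touch(lru_set_t *s, int way) {
        for (int w = 0; w < WAYS; w++)
            s->age[w]++;              /* every other line gets older */
        s->age[way] = 0;              /* the accessed line is now youngest */
    }

    /* Called on a miss: evict the oldest line. */
    int lru_victim(const lru_set_t *s) {
        int victim = 0;
        for (int w = 1; w < WAYS; w++)
            if (s->age[w] > s->age[victim])
                victim = w;
        return victim;
    }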

Page 23: Lecture 14: Caching, cont

Using a BDD

Page 24: Lecture 14: Caching, cont

LRU is Expensive

• Age bits take a lot of storage

• What about pseudo-LRU?

Page 25: Lecture 14: Caching, cont

Using a BDD

• BDDs have decisions at their leaf nodes

• BDDs have variables at their non-leaf nodes

• Each branch out of a non-leaf node corresponds to an evaluation of that node's variable

• Standard convention:
  – Left branch on 0
  – Right branch on 1

Page 26: Lecture 14: Caching, cont

BDD for LRU 4-way associative cache

[Diagram: victim-selection tree for a 4-way set. The root asks whether all lines are valid; if not, an invalid line is chosen. Otherwise the tree tests b0 (left on 0), then b1, and the leaves select line0, line1, line2, or line3.]


Page 28: Lecture 14: Caching, cont

BDD for LRU 4-way associative cache

• Pseudo-LRU
  – Uses only 3 bits
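
A sketch of how the 3-bit tree pseudo-LRU described by the BDD might look in C. Assumptions: each bit points toward the half (or line) to victimize next, and the invalid-line check at the root of the diagram is omitted for brevity; all names are illustrative.

    #include <stdbool.h>

    typedef struct {
        bool b0;   /* 0: victim in lines 0-1, 1: victim in lines 2-3 */
        bool b1;   /* within lines 0-1: 0 -> line0, 1 -> line1 */
        bool b2;   /* within lines 2-3: 0 -> line2, 1 -> line3 */
    } plru_t;

    /* Follow the bits down the tree to pick a victim. */
    int plru_victim(const plru_t *p) {
        if (!p->b0)
            return p->b1 ? 1 : 0;
        else
            return p->b2 ? 3 : 2;
    }

    /* On an access, point the bits on the path away from the used line. */
    void plru_touch(plru_t *p, int way) {
        if (way < 2) {
            p->b0 = true;             /* next victim from the other half */
            p->b1 = (way == 0);       /* and the other line in this half */
        } else {
            p->b0 = false;
            p->b2 = (way == 2);
        }
    }

Unlike true LRU, this only approximates the access order, but three bits per set (versus full age information per line) is why hardware favors it.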

Page 29: Lecture 14: Caching, cont

WRAP UP

Page 30: Lecture 14: Caching, cont

For next time

• Read Chapter 5, section 5.4