Multilevel Memory Caches
Prof. Sirer, CS 316, Cornell University


Page 1: Multilevel Memory Caches

Multilevel Memory Caches

Prof. Sirer

CS 316

Cornell University

Page 2: Multilevel Memory Caches

Storage Hierarchy

Technology       Capacity  Cost/GB    Latency
Tape             1 TB      $0.17      100 s
Disk             300 GB    $0.34      4 ms
DRAM             4 GB      $520       20 ns
SRAM (off chip)  512 KB    $123,000   5 ns
SRAM (on chip)   16 KB     ???        2 ns

Capacity and latency are closely coupled; cost per GB is inversely proportional to both

How do we create the illusion of large and fast memory?

[Figure: the storage hierarchy drawn as a pyramid, from tape and disk at the base up through DRAM and off-chip SRAM to on-chip SRAM at the top.]
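
For intuition, a back-of-the-envelope average (the hit rate here is an assumption, not a number from the slides): if 95% of accesses hit a 2 ns on-chip SRAM and the remaining 5% must go on to 20 ns DRAM, the average access time is roughly 0.95 × 2 ns + 0.05 × (2 ns + 20 ns) = 3 ns. The memory system then appears nearly as fast as SRAM while offering the capacity of DRAM.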

Page 3: Multilevel Memory Caches

Memory Hierarchy

Principle: Hide latency using small, fast memories called caches

Caches exploit locality (illustrated in the C sketch below):
  Temporal locality: if a memory location is referenced, it is likely to be referenced again in the near future
  Spatial locality: if a memory location is referenced, locations near it will be referenced in the near future

Page 7: Multilevel Memory Caches

Cache Lookups (Read)

Look at the address issued by the processor and search the cache tags to see if that block is in the cache (sketched in code below)
  Hit: the block is in the cache; return the requested data
  Miss: the block is not in the cache; read the line from memory, evict an existing line from the cache, place the new line in the cache, and return the requested data

Page 8: Multilevel Memory Caches

Cache Organization

The cache has to be fast and small
  Gain speed by performing lookups in parallel, which requires die real estate
  Reduce the hardware required by limiting where in the cache a block might be placed

Three common designs:
  Fully associative: a block can be anywhere in the cache
  Direct mapped: a block can be in only one line in the cache
  Set-associative: a block can be in a few (2 to 8) places in the cache

Page 9: Multilevel Memory Caches

Tags and Offsets

Cache block size determines the split of the address: the low bits select a byte within the block (the offset), and the high bits identify the block (the tag)

  Virtual address (32 bits):
    bits 31-5: Tag
    bits 4-0:  Offset within the 32-byte block
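
In C, the split for the 32-byte blocks shown above would look like this (a sketch; the constant comes from the 5 offset bits in the diagram):

#define OFFSET_BITS 5   /* 2^5 = 32-byte blocks, per the diagram */

unsigned tag(unsigned addr)    { return addr >> OFFSET_BITS; }
unsigned offset(unsigned addr) { return addr & ((1u << OFFSET_BITS) - 1); }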

Page 10: Multilevel Memory Caches

Fully Associative Cache

[Figure: the address's Tag field is compared against every line's stored tag in parallel, one comparator per line, gated by the valid bit V; the comparator outputs are encoded into a hit signal and a line select, and the Offset then performs the word/byte select within the chosen block.]

Page 11: Multilevel Memory Caches

Direct Mapped Cache

[Figure: the address is split into Tag, Index, and Offset; the Index selects a single line, and one comparator checks that line's valid bit V and stored tag against the address's Tag.]
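
A minimal C sketch of a direct-mapped lookup (the cache sizes are illustrative assumptions, not from the slides):

#define OFFSET_BITS 5                  /* 32-byte blocks           */
#define INDEX_BITS  7                  /* 128 lines (illustrative) */
#define NLINES      (1 << INDEX_BITS)

struct line {
    int           valid;
    unsigned      tag;
    unsigned char block[1 << OFFSET_BITS];
};
static struct line cache[NLINES];

/* The index selects exactly one line; a hit requires that line to be
   valid and its stored tag to match the address's tag bits. */
static int is_hit(unsigned addr) {
    unsigned index = (addr >> OFFSET_BITS) & (NLINES - 1);
    unsigned tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    return cache[index].valid && cache[index].tag == tag;
}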

Page 12: Multilevel Memory Caches

2-Way Set-Associative Cache

[Figure: the Index selects one set containing two lines; two comparators check both lines' valid bits and stored tags against the address's Tag in parallel, and the Offset selects the word/byte within the matching block.]
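
The same lookup for a 2-way set-associative cache (again with illustrative sizes); hardware checks both ways at once, while the loop below checks them in turn:

#define OFFSET_BITS 5                  /* 32-byte blocks                  */
#define SET_BITS    6                  /* 64 sets x 2 ways (illustrative) */
#define NSETS       (1 << SET_BITS)
#define WAYS        2

struct way {
    int           valid;
    unsigned      tag;
    unsigned char block[1 << OFFSET_BITS];
};
static struct way cache[NSETS][WAYS];

static int is_hit(unsigned addr) {
    unsigned set = (addr >> OFFSET_BITS) & (NSETS - 1);
    unsigned tag = addr >> (OFFSET_BITS + SET_BITS);
    for (int w = 0; w < WAYS; w++)
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return 1;                  /* hit in way w                   */
    return 0;                          /* miss: block is in neither way  */
}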

Page 13: Multilevel Memory Caches

Valid Bits

Valid bits indicate whether a cache line contains an up-to-date copy of the values in memory
  Must be 1 for a hit
  Reset to 0 on power-up

An item can be removed from the cache by setting its valid bit to 0

Page 14: Multilevel Memory Caches

Eviction

Which cache line should be evicted from the cache to make room for a new line?
  Direct-mapped: no choice; the line selected by the index must be evicted
  Associative caches:
    Random: select one of the lines at random
    Round-robin: cycle through the lines in a fixed order
    FIFO: replace the oldest line
    LRU: replace the line that has not been used for the longest time (sketched below)
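
For a 2-way set-associative cache, LRU needs only one bit per set, recording which way was not used most recently. A sketch under that assumption (the names and set count are illustrative):

#define NSETS 64                       /* illustrative, matching the 2-way sketch */

static int lru_way[NSETS];             /* per set: the way to evict next */

/* On every hit or refill of a way, the other way becomes least recently used. */
static void touch(int set, int way)  { lru_way[set] = 1 - way; }

/* On a miss, evict the way that has gone unused the longest. */
static int victim(int set)           { return lru_way[set]; }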

Page 15: Multilevel Memory Caches

Cache Writes

No-Write: writes invalidate the cache line and go directly to main memory
Write-Through: writes go to both main memory and the cache
Write-Back: writes go to the cache; main memory is written only when the block is evicted
(All three are sketched in code after the figure.)

[Figure: the CPU talks to the cache (SRAM), which talks to memory (DRAM), over addr and data buses.]
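
The three policies differ only in where a store's data goes. A hedged C sketch (all helpers below are hypothetical placeholders, not a real API):

/* Hypothetical helpers used by the three write policies. */
void invalidate(unsigned addr);            /* drop the cached copy           */
void cache_write(unsigned addr, int v);    /* update the cached block        */
void memory_write(unsigned addr, int v);   /* update DRAM                    */
void mark_dirty(unsigned addr);            /* remember the line was written  */

void store_no_write(unsigned addr, int v) {
    invalidate(addr);                      /* writes bypass the cache ...    */
    memory_write(addr, v);                 /* ... and go straight to memory  */
}

void store_write_through(unsigned addr, int v) {
    cache_write(addr, v);                  /* every store updates both the   */
    memory_write(addr, v);                 /* cache and main memory          */
}

void store_write_back(unsigned addr, int v) {
    cache_write(addr, v);                  /* only the cache is updated now; */
    mark_dirty(addr);                      /* memory is written at eviction  */
}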

Page 16: Multilevel Memory Caches

Dirty Bits and Write-Back Buffers

Dirty bits indicate which lines have been written
Dirty bits enable the cache to handle multiple writes to the same cache line without having to go to memory

Write-back buffer (a minimal sketch follows the figure):
  A queue where dirty lines are placed
  Items are added to the end as dirty lines are evicted from the cache
  Items are removed from the front as memory writes are completed

[Figure: the layout of one cache line, with a valid bit V, a dirty bit D, the Tag, and the data block (Byte 0, Byte 1, ..., Byte N).]
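
A minimal ring-buffer sketch of such a queue (the size and names are illustrative assumptions):

#define WB_SLOTS   8
#define BLOCK_SIZE 32

struct wb_entry { unsigned tag; unsigned char data[BLOCK_SIZE]; };

static struct wb_entry wb[WB_SLOTS];
static int wb_head, wb_tail, wb_count;

/* Called when a dirty line is evicted; returns 0 (stall) if the buffer is full. */
static int wb_push(struct wb_entry e) {
    if (wb_count == WB_SLOTS) return 0;
    wb[wb_tail] = e;
    wb_tail = (wb_tail + 1) % WB_SLOTS;
    wb_count++;
    return 1;
}

/* Called when the memory write at the front of the queue completes. */
static struct wb_entry wb_pop(void) {
    struct wb_entry e = wb[wb_head];
    wb_head = (wb_head + 1) % WB_SLOTS;
    wb_count--;
    return e;
}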

Page 17: Multilevel Memory Caches

Misses

Three types of misses:
  Cold: the line is being referenced for the first time
  Capacity: the line was evicted because the cache was not large enough
  Conflict: the line was evicted because of another access whose index conflicted

Page 18: Multilevel Memory Caches

Cache Design

Need to determine the parameters:
  Block size
  Number of ways
  Eviction policy
  Write policy
  Whether to separate the I-cache from the D-cache

Page 19: Multilevel Memory Caches

Virtual vs. Physical Caches

L1 (on-chip) caches are typically virtual: the cache works on virtual addresses, sitting between the CPU and the MMU

L2 (off-chip) caches are typically physical: the cache works on physical addresses, sitting between the MMU and memory

[Figure: the two arrangements. Virtual cache: CPU -> cache (SRAM) -> MMU -> memory (DRAM). Physical cache: CPU -> MMU -> cache (SRAM) -> memory (DRAM). Addr and data buses connect the stages.]

Page 20: Multilevel Memory Caches

Cache Conscious Programming

Speed up this program

int a[NCOL][NROW];
int sum = 0;

for (int i = 0; i < NROW; ++i)
    for (int j = 0; j < NCOL; ++j)
        sum += a[j][i];

Page 21: Multilevel Memory Caches

Cache Conscious Programming

Every access is a cache miss!

int a[NCOL][NROW];
int sum = 0;

for (int i = 0; i < NROW; ++i)
    for (int j = 0; j < NCOL; ++j)
        sum += a[j][i];   /* the inner loop varies the row index, jumping
                             NROW ints per access                          */

[Figure: the array laid out in memory, numbered in access order; each successive access lands in a different cache line, so every access misses.]

Page 22: Multilevel Memory Caches

Cache Conscious Programming

Same program after a trivial transformation (the loops are swapped): 3 out of 4 accesses hit in the cache, since each miss brings in a line holding the next several values

int a[NCOL][NROW];
int sum = 0;

for (int j = 0; j < NCOL; ++j)
    for (int i = 0; i < NROW; ++i)
        sum += a[j][i];   /* the inner loop walks memory sequentially */

[Figure: the array laid out in memory, numbered in access order; consecutive accesses fall within the same cache line, so only the first access to each line misses.]
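
To observe the difference directly, a small timing harness (the sizes and the use of clock() are illustrative, not from the lecture) can compare the two loop orders:

#include <stdio.h>
#include <time.h>

#define NCOL 1000
#define NROW 1000

static int a[NCOL][NROW];

int main(void) {
    int sum = 0;

    clock_t t0 = clock();
    for (int j = 0; j < NCOL; ++j)      /* transformed order: the inner loop */
        for (int i = 0; i < NROW; ++i)  /* walks memory sequentially         */
            sum += a[j][i];
    clock_t t1 = clock();
    printf("sequential: %ld ticks\n", (long)(t1 - t0));

    t0 = clock();
    for (int i = 0; i < NROW; ++i)      /* original order: the inner loop    */
        for (int j = 0; j < NCOL; ++j)  /* jumps NROW ints per access        */
            sum += a[j][i];
    t1 = clock();
    printf("strided:    %ld ticks\n", (long)(t1 - t0));

    printf("sum = %d\n", sum);          /* keep sum live so the loops are
                                           not optimized away                */
    return 0;
}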