Multilevel Memory Caches
Prof. Sirer, CS 316, Cornell University


Page 1: Multilevel Memory Caches

Multilevel Memory Caches

Prof. Sirer

CS 316

Cornell University

Page 2: Multilevel Memory Caches

Storage Hierarchy

Technology       Capacity  Cost/GB    Latency
Tape             1 TB      $0.17      100 s
Disk             300 GB    $0.34      4 ms
DRAM             4 GB      $520       20 ns
SRAM (off chip)  512 KB    $123,000   5 ns
SRAM (on chip)   16 KB     ???        2 ns

Capacity and latency are closely coupled; cost per GB is inversely proportional to both

How do we create the illusion of large and fast memory?

[Figure: the storage hierarchy drawn as a pyramid, from tape and disk at the base up through DRAM and off-chip SRAM to on-chip SRAM at the top.]
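
For intuition, a back-of-the-envelope average (the hit rate here is an assumption, not a number from the slides): if 95% of accesses hit a 2 ns on-chip SRAM and the remaining 5% must go on to 20 ns DRAM, the average access time is roughly 0.95 × 2 ns + 0.05 × (2 ns + 20 ns) = 3 ns. The memory system then appears nearly as fast as SRAM while offering the capacity of DRAM.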

Page 3: Multilevel Memory Caches

Memory Hierarchy

Principle: Hide latency using small, fast memories called caches

Caches exploit locality (illustrated in the C sketch below):
  Temporal locality: if a memory location is referenced, it is likely to be referenced again in the near future
  Spatial locality: if a memory location is referenced, locations near it will be referenced in the near future

Page 7: Multilevel Memory Caches

Cache Lookups (Read)

Look at the address issued by the processor and search the cache tags to see if that block is in the cache (sketched in code below)
  Hit: the block is in the cache; return the requested data
  Miss: the block is not in the cache; read the line from memory, evict an existing line from the cache, place the new line in the cache, and return the requested data

Page 8: Multilevel Memory Caches

Cache Organization

The cache has to be fast and small
  Gain speed by performing lookups in parallel, which requires die real estate
  Reduce the hardware required by limiting where in the cache a block might be placed

Three common designs:
  Fully associative: a block can be anywhere in the cache
  Direct mapped: a block can be in only one line in the cache
  Set-associative: a block can be in a few (2 to 8) places in the cache

Page 9: Multilevel Memory Caches

Tags and Offsets

Cache block size determines the split of the address: the low bits select a byte within the block (the offset), and the high bits identify the block (the tag)

  Virtual address (32 bits):
    bits 31-5: Tag
    bits 4-0:  Offset within the 32-byte block
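
In C, the split for the 32-byte blocks shown above would look like this (a sketch; the constant comes from the 5 offset bits in the diagram):

#define OFFSET_BITS 5   /* 2^5 = 32-byte blocks, per the diagram */

unsigned tag(unsigned addr)    { return addr >> OFFSET_BITS; }
unsigned offset(unsigned addr) { return addr & ((1u << OFFSET_BITS) - 1); }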

Page 10: Multilevel Memory Caches

Fully Associative Cache

[Figure: the address's Tag field is compared against every line's stored tag in parallel, one comparator per line, gated by the valid bit V; the comparator outputs are encoded into a hit signal and a line select, and the Offset then performs the word/byte select within the chosen block.]

Page 11: Multilevel Memory Caches

Direct Mapped Cache

[Figure: the address is split into Tag, Index, and Offset; the Index selects a single line, and one comparator checks that line's valid bit V and stored tag against the address's Tag.]
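
A minimal C sketch of a direct-mapped lookup (the cache sizes are illustrative assumptions, not from the slides):

#define OFFSET_BITS 5                  /* 32-byte blocks           */
#define INDEX_BITS  7                  /* 128 lines (illustrative) */
#define NLINES      (1 << INDEX_BITS)

struct line {
    int           valid;
    unsigned      tag;
    unsigned char block[1 << OFFSET_BITS];
};
static struct line cache[NLINES];

/* The index selects exactly one line; a hit requires that line to be
   valid and its stored tag to match the address's tag bits. */
static int is_hit(unsigned addr) {
    unsigned index = (addr >> OFFSET_BITS) & (NLINES - 1);
    unsigned tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    return cache[index].valid && cache[index].tag == tag;
}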

Page 12: Multilevel Memory Caches

2-Way Set-Associative Cache

[Figure: the Index selects one set containing two lines; two comparators check both lines' valid bits and stored tags against the address's Tag in parallel, and the Offset selects the word/byte within the matching block.]
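
The same lookup for a 2-way set-associative cache (again with illustrative sizes); hardware checks both ways at once, while the loop below checks them in turn:

#define OFFSET_BITS 5                  /* 32-byte blocks                  */
#define SET_BITS    6                  /* 64 sets x 2 ways (illustrative) */
#define NSETS       (1 << SET_BITS)
#define WAYS        2

struct way {
    int           valid;
    unsigned      tag;
    unsigned char block[1 << OFFSET_BITS];
};
static struct way cache[NSETS][WAYS];

static int is_hit(unsigned addr) {
    unsigned set = (addr >> OFFSET_BITS) & (NSETS - 1);
    unsigned tag = addr >> (OFFSET_BITS + SET_BITS);
    for (int w = 0; w < WAYS; w++)
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return 1;                  /* hit in way w                   */
    return 0;                          /* miss: block is in neither way  */
}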

Page 13: Multilevel Memory Caches

Valid Bits

Valid bits indicate whether a cache line contains an up-to-date copy of the values in memory
  Must be 1 for a hit
  Reset to 0 on power-up

An item can be removed from the cache by setting its valid bit to 0

Page 14: Multilevel Memory Caches

Eviction

Which cache line should be evicted from the cache to make room for a new line?
  Direct-mapped: no choice; the line selected by the index must be evicted
  Associative caches:
    Random: select one of the lines at random
    Round-robin: cycle through the lines in a fixed order
    FIFO: replace the oldest line
    LRU: replace the line that has not been used for the longest time (sketched below)
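
For a 2-way set-associative cache, LRU needs only one bit per set, recording which way was not used most recently. A sketch under that assumption (the names and set count are illustrative):

#define NSETS 64                       /* illustrative, matching the 2-way sketch */

static int lru_way[NSETS];             /* per set: the way to evict next */

/* On every hit or refill of a way, the other way becomes least recently used. */
static void touch(int set, int way)  { lru_way[set] = 1 - way; }

/* On a miss, evict the way that has gone unused the longest. */
static int victim(int set)           { return lru_way[set]; }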

Page 15: Multilevel Memory Caches

Cache Writes

No-Write: writes invalidate the cache line and go directly to main memory
Write-Through: writes go to both main memory and the cache
Write-Back: writes go to the cache; main memory is written only when the block is evicted
(All three are sketched in code after the figure.)

[Figure: the CPU talks to the cache (SRAM), which talks to memory (DRAM), over addr and data buses.]
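
The three policies differ only in where a store's data goes. A hedged C sketch (all helpers below are hypothetical placeholders, not a real API):

/* Hypothetical helpers used by the three write policies. */
void invalidate(unsigned addr);            /* drop the cached copy           */
void cache_write(unsigned addr, int v);    /* update the cached block        */
void memory_write(unsigned addr, int v);   /* update DRAM                    */
void mark_dirty(unsigned addr);            /* remember the line was written  */

void store_no_write(unsigned addr, int v) {
    invalidate(addr);                      /* writes bypass the cache ...    */
    memory_write(addr, v);                 /* ... and go straight to memory  */
}

void store_write_through(unsigned addr, int v) {
    cache_write(addr, v);                  /* every store updates both the   */
    memory_write(addr, v);                 /* cache and main memory          */
}

void store_write_back(unsigned addr, int v) {
    cache_write(addr, v);                  /* only the cache is updated now; */
    mark_dirty(addr);                      /* memory is written at eviction  */
}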

Page 16: Multilevel Memory Caches

Dirty Bits and Write-Back Buffers

Dirty bits indicate which lines have been written
Dirty bits enable the cache to handle multiple writes to the same cache line without having to go to memory

Write-back buffer (a minimal sketch follows the figure):
  A queue where dirty lines are placed
  Items are added to the end as dirty lines are evicted from the cache
  Items are removed from the front as memory writes are completed

[Figure: the layout of one cache line, with a valid bit V, a dirty bit D, the Tag, and the data block (Byte 0, Byte 1, ..., Byte N).]
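
A minimal ring-buffer sketch of such a queue (the size and names are illustrative assumptions):

#define WB_SLOTS   8
#define BLOCK_SIZE 32

struct wb_entry { unsigned tag; unsigned char data[BLOCK_SIZE]; };

static struct wb_entry wb[WB_SLOTS];
static int wb_head, wb_tail, wb_count;

/* Called when a dirty line is evicted; returns 0 (stall) if the buffer is full. */
static int wb_push(struct wb_entry e) {
    if (wb_count == WB_SLOTS) return 0;
    wb[wb_tail] = e;
    wb_tail = (wb_tail + 1) % WB_SLOTS;
    wb_count++;
    return 1;
}

/* Called when the memory write at the front of the queue completes. */
static struct wb_entry wb_pop(void) {
    struct wb_entry e = wb[wb_head];
    wb_head = (wb_head + 1) % WB_SLOTS;
    wb_count--;
    return e;
}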

Page 17: Multilevel Memory Caches

Misses

Three types of misses:
  Cold: the line is being referenced for the first time
  Capacity: the line was evicted because the cache was not large enough
  Conflict: the line was evicted because of another access whose index conflicted

Page 18: Multilevel Memory Caches

Cache Design

Need to determine the parameters:
  Block size
  Number of ways
  Eviction policy
  Write policy
  Whether to separate the I-cache from the D-cache

Page 19: Multilevel Memory Caches

Virtual vs. Physical Caches

L1 (on-chip) caches are typically virtual: the cache works on virtual addresses, sitting between the CPU and the MMU

L2 (off-chip) caches are typically physical: the cache works on physical addresses, sitting between the MMU and memory

[Figure: the two arrangements. Virtual cache: CPU -> cache (SRAM) -> MMU -> memory (DRAM). Physical cache: CPU -> MMU -> cache (SRAM) -> memory (DRAM). Addr and data buses connect the stages.]

Page 20: Multilevel Memory Caches

Cache Conscious Programming

Speed up this program

int a[NCOL][NROW];
int sum = 0;

for (int i = 0; i < NROW; ++i)
    for (int j = 0; j < NCOL; ++j)
        sum += a[j][i];

Page 21: Multilevel Memory Caches

Cache Conscious Programming

Every access is a cache miss!

int a[NCOL][NROW];
int sum = 0;

for (int i = 0; i < NROW; ++i)
    for (int j = 0; j < NCOL; ++j)
        sum += a[j][i];   /* the inner loop varies the row index, jumping
                             NROW ints per access                          */

[Figure: the array laid out in memory, numbered in access order; each successive access lands in a different cache line, so every access misses.]

Page 22: Multilevel Memory Caches

Cache Conscious Programming

Same program after a trivial transformation (the loops are swapped): 3 out of 4 accesses hit in the cache, since each miss brings in a line holding the next several values

int a[NCOL][NROW];
int sum = 0;

for (int j = 0; j < NCOL; ++j)
    for (int i = 0; i < NROW; ++i)
        sum += a[j][i];   /* the inner loop walks memory sequentially */

[Figure: the array laid out in memory, numbered in access order; consecutive accesses fall within the same cache line, so only the first access to each line misses.]
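
To observe the difference directly, a small timing harness (the sizes and the use of clock() are illustrative, not from the lecture) can compare the two loop orders:

#include <stdio.h>
#include <time.h>

#define NCOL 1000
#define NROW 1000

static int a[NCOL][NROW];

int main(void) {
    int sum = 0;

    clock_t t0 = clock();
    for (int j = 0; j < NCOL; ++j)      /* transformed order: the inner loop */
        for (int i = 0; i < NROW; ++i)  /* walks memory sequentially         */
            sum += a[j][i];
    clock_t t1 = clock();
    printf("sequential: %ld ticks\n", (long)(t1 - t0));

    t0 = clock();
    for (int i = 0; i < NROW; ++i)      /* original order: the inner loop    */
        for (int j = 0; j < NCOL; ++j)  /* jumps NROW ints per access        */
            sum += a[j][i];
    t1 = clock();
    printf("strided:    %ld ticks\n", (long)(t1 - t0));

    printf("sum = %d\n", sum);          /* keep sum live so the loops are
                                           not optimized away                */
    return 0;
}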