Page 1

Caches 2

CS 3410, Spring 2014
Computer Science
Cornell University

See P&H Chapter: 5.1-5.4, 5.8, 5.15

Page 2

Memory Hierarchy

Memory closer to processor
• small & fast
• stores active data

Memory farther from processor
• big & slow
• stores inactive data

L1 Cache: SRAM on-chip
L2/L3 Cache: SRAM
Memory: DRAM

Page 3

Memory Hierarchy

Memory closer to the processor is fast but small
• usually stores a subset of the memory farther away
  – "strictly inclusive"
• Transfer whole blocks (cache lines):
  4 KB: disk ↔ RAM
  256 B: RAM ↔ L2
  64 B: L2 ↔ L1

Page 4

Cache Questions
• What structure to use?

• Where to place a block (book)?
• How to find a block (book)?

• On a miss, which block to replace?

• What happens on write?

Page 5

Today

Cache organization
• Direct Mapped
• Fully Associative
• N-way set associative

Cache Tradeoffs

Next time: cache writing

Page 6

Cache Lookups (Read)

Processor tries to access Mem[x]
Check: is the block containing Mem[x] in the cache?
• Yes: cache hit
  – return requested data from cache line
• No: cache miss
  – read block from memory (or lower-level cache)
  – (evict an existing cache line to make room)
  – place new block in cache
  – return requested data and stall the pipeline while all of this happens

Page 7

Questions

How to organize cache

What are tradeoffs in performance and cost?

Page 8

Three common designs

A given data block can be placed…
• … in exactly one cache line: Direct Mapped
• … in any cache line: Fully Associative
  – This is most like my desk with books
• … in a small set of cache lines: Set Associative

Page 9

Direct Mapped Cache
• Each block number maps to a single cache line index
• Where? address mod #blocks in cache

[Diagram: a column of memory addresses 0x000000 through 0x000048 mapping onto the cache lines]

Page 10

Direct Mapped Cache

[Diagram: memory bytes 0x00–0x04 and a 2-line cache]
2 cache lines, 1 byte per cache line
index = address mod 2 (shown: index = 0)
Cache size = 2 bytes

Page 11

Direct Mapped Cache

[Diagram: same 2-line cache]
2 cache lines, 1 byte per cache line
index = address mod 2 (shown: index = 1)
Cache size = 2 bytes

Page 12

Direct Mapped Cache

[Diagram: memory bytes 0x00–0x05 and a 4-line cache]
4 cache lines, 1 byte per cache line
index = address mod 4
Cache size = 4 bytes

Page 13

Direct Mapped Cache

[Diagram: word-sized memory entries at 0x00 (ABCD) through 0x14, and a 4-line cache holding ABCD in line 0]
4 cache lines, 1 word per cache line
index = address mod 4
offset = which byte in each line
Cache size = 16 bytes
32-bit address fields: 28 bits | 2-bit index | 2-bit offset
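As a side note (not on the slides), here is a small C sketch of how the index and offset fields above would be pulled out of a 32-bit address for this 4-line, 1-word-per-line cache; the constants and names are illustrative only.

```c
#include <stdint.h>
#include <stdio.h>

/* Field widths for the cache on this slide: 4 lines, 4-byte (1-word) lines. */
#define OFFSET_BITS 2   /* which byte within the line */
#define INDEX_BITS  2   /* which of the 4 cache lines */

int main(void) {
    uint32_t addr   = 0x00000014;                          /* example address   */
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);    /* low 2 bits        */
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t upper  = addr >> (OFFSET_BITS + INDEX_BITS);  /* remaining 28 bits */
    printf("offset=%u index=%u upper=%u\n",
           (unsigned)offset, (unsigned)index, (unsigned)upper);
    return 0;
}
```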

Page 14

Direct Mapped Cache: 2

[Diagram: memory words 0x000000 (ABCD) through 0x000024 (efgh); the cache holds line 0 = ABCD EFGH, line 1 = IJKL MNOP, line 2 = QRST UVWX, line 3 = YZ12 3456]
4 cache lines, 2 words (8 bytes) per cache line
index = address mod 4
offset = which byte in each line (3 bits: bytes A–H)
32-bit address fields: 27 bits | 2-bit index | 3-bit offset

Page 15

Direct Mapped Cache: 2

[Diagram: same memory and cache as the previous slide, but each cache line now also stores tag & valid bits]
4 cache lines, 2 words (8 bytes) per cache line
index = address mod 4
offset = which byte in each line
32-bit address fields: 27-bit tag | 2-bit index | 3-bit offset
tag = which memory element is it? 0x00, 0x20, or 0x40? (all three map to line 0)

Page 16

Direct Mapped Cache

Every address maps to one location

Pros: Very simple hardware

Cons: many different addresses land on same location and may compete with each other

Page 17

Direct Mapped Cache (Hardware)

[Diagram: the address splits into Tag | Index | Offset (offset = 3 bits in the example address 0…001000); the index selects one V / Tag / Block entry, the stored tag is compared (=) with the address tag, valid AND match produce hit?, and word/byte select (32/8 bits) produces data]
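To connect the diagram to software, here is a minimal C sketch of the same hit check (index selects a line, then valid bit and tag compare); the struct, sizes, and names are illustrative, not the course's actual hardware or lab code.

```c
#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 3                 /* 8-byte lines, matching the 3-bit offset above */
#define INDEX_BITS  2                 /* 4 cache lines                                 */
#define NUM_LINES   (1 << INDEX_BITS)

typedef struct {
    bool     valid;
    uint32_t tag;
    uint8_t  block[1 << OFFSET_BITS];
} CacheLine;

static CacheLine cache[NUM_LINES];

/* Returns true on a hit and copies the requested byte into *data. */
bool dm_lookup(uint32_t addr, uint8_t *data) {
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    CacheLine *line = &cache[index];        /* index selects exactly one line    */
    if (line->valid && line->tag == tag) {  /* valid AND tag match  =>  hit?     */
        *data = line->block[offset];        /* offset does the word/byte select  */
        return true;
    }
    return false;                           /* miss: the caller must fill the line */
}
```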

Page 18

Example: Direct Mapped Cache

[Diagram: Processor, a 4-line × 2-byte direct-mapped cache (all valid bits 0), and a 16-byte memory holding the values 100, 110, 120, …, 250 at addresses 0–15]

Instruction trace:
LB $1  M[ 1 ]
LB $2  M[ 5 ]
LB $3  M[ 1 ]
LB $3  M[ 4 ]
LB $2  M[ 0 ]

4 cache lines, 2-byte blocks
Using byte addresses in this example. Addr Bus = 5 bits

Page 19

Example: Direct Mapped Cache

[Diagram: same setup as the previous slide, now showing the address fields]
4 cache lines, 2-byte blocks
2-bit tag field, 2-bit index field, 1-bit block offset
Using byte addresses in this example. Addr Bus = 5 bits

Page 20

Direct Mapped Example: 6th Access

The trace continues:
LB $2  M[ 12 ]
LB $2  M[ 5 ]  (repeated)

[Diagram: after the first five accesses (M, M, H, H, H) lines 0 and 2 hold tag 00 with data 100 110 and 140 150; Misses: 2, Hits: 3. The 6th access, LB $2  M[ 12 ], is about to be looked up]

Pathological example

Page 21

6th Access

[Diagram: LB $2  M[ 12 ]  (Addr: 01100) misses. It maps to line 2 (index 10) with tag 01, so 140/150 are evicted and 220/230 loaded. Misses: 3, Hits: 3. Sequence so far: M M H H H M]

Page 22

7th Access

[Diagram: unchanged state; Misses: 3, Hits: 3. The 7th access, LB $2  M[ 5 ], is about to be looked up]

Page 23

7th Access

[Diagram: LB $2  M[ 5 ]  (Addr: 00101) misses. It also maps to line 2, but with tag 00, so 220/230 are evicted and 140/150 reloaded. Misses: 4, Hits: 3. Sequence: M M H H H M M]

Page 24

8th and 9th Access

[Diagram: LB $2  M[ 12 ] and LB $2  M[ 5 ] conflict in line 2 again; both miss. Misses: 4+2, Hits: 3. Sequence: M M H H H M M M M]

Page 25

10th and 11th, 12th and 13th Access

[Diagram: each further repetition of the M[ 12 ] / M[ 5 ] pair misses. Misses: 4+2+2+2, Hits: 3. Sequence: M M H H H M M M M M M M M]
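Why the repeated pair keeps missing (worked out from the field split above, 2-bit tag / 2-bit index / 1-bit offset): M[ 5 ] = 00101 has index 10 and tag 00, while M[ 12 ] = 01100 also has index 10 but tag 01. Both map to line 2, so in a direct-mapped cache each one evicts the other and every access in the repeated M[ 12 ] / M[ 5 ] pair misses.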

Page 26

Problem

The working set is not too big for the cache.

Yet, we are not getting any hits?!

Page 27

Misses

Three types of misses
• Cold (aka Compulsory)
  – The line is being referenced for the first time
• Capacity
  – The line was evicted because the cache was not large enough
• Conflict
  – The line was evicted because of another access whose index conflicted

Page 28

Misses

Q: How to avoid…

Cold Misses
• Unavoidable? The data was never in the cache…
• Prefetching!

Capacity Misses
• Buy more cache

Conflict Misses
• Use a more flexible cache design

Page 29

Cache Organization

How to avoid Conflict Misses

Three common designs
• Direct mapped: Block can only be in one line in the cache
• Fully associative: Block can be anywhere in the cache
• Set-associative: Block can be in a few (2 to 8) places in the cache

Page 30

Fully Associative Cache
• Block can be anywhere in the cache
• Most like our desk with library books
• Have to search all entries to check for a match
• More expensive to implement in hardware
• But as long as there is capacity, can store in the cache
• So fewest misses

Page 31

Fully Associative Cache (Reading)

[Diagram: the address splits into Tag | Offset with no index; the address tag is compared (=) against every line's V / Tag / Block entry in parallel, line select picks the matching line, and word/byte select (32 or 8 bits) produces hit? and data]

Page 32

Fully Associative Cache (Reading)

[Diagram: V / Tag / Block array addressed by a Tag | Offset split, with 2^n blocks (cache lines) and an m-bit offset]

Q: How big is the cache (data only)?
Cache of 2^n blocks, block size of 2^m bytes
Cache Size: number-of-blocks × block size = 2^n × 2^m bytes = 2^(n+m) bytes
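As a quick worked example (the sizes are borrowed from the L1 configuration on the Cache Performance slide later, not stated here): with 512 lines of 64 bytes, n = 9 and m = 6, so cache size = 2^9 × 2^6 bytes = 2^15 bytes = 32 KB.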

Page 33

Fully Associative Cache (Reading)

[Diagram: same V / Tag / Block array, Tag | Offset split, 2^n blocks (cache lines), m-bit offset]

Q: How much SRAM is needed (data + overhead)?
Cache of 2^n blocks, block size of 2^m bytes
Tag field: 32 − m bits, Valid bit: 1 bit
SRAM size: 2^n × (block size + tag size + valid bit size)
         = 2^n × (2^m bytes × 8 bits-per-byte + (32 − m) + 1) bits
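Continuing the same assumed configuration (n = 9, m = 6): SRAM size = 2^9 × (64 × 8 + (32 − 6) + 1) = 512 × 539 = 275,968 bits ≈ 33.7 KB, versus 32 KB of data alone, i.e. roughly 5% overhead for tags and valid bits.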

Page 34

Example: Simple Fully Associative Cache

[Diagram: Processor, a 4-line × 2-byte fully associative cache (all valid bits 0), and the same 16-byte memory (values 100–250 at addresses 0–15)]

LB $1  M[ 1 ]
LB $2  M[ 5 ]
LB $3  M[ 1 ]
LB $3  M[ 4 ]
LB $2  M[ 0 ]

4 cache lines, 2-byte blocks
4-bit tag field, 1-bit block offset
Using byte addresses in this example! Addr Bus = 5 bits

Page 35

1st Access

[Diagram: all valid bits are 0; the 1st access, LB $1  M[ 1 ], is about to be looked up]

Page 36

Eviction

Which cache line should be evicted from the cache to make room for a new line?
• Direct-mapped
  – no choice, must evict the line selected by the index
• Associative caches
  – random: select one of the lines at random
  – round-robin: similar to random
  – FIFO: replace the oldest line
  – LRU: replace the line that has not been used in the longest time
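One simple way to implement LRU in software is to timestamp every access; the C sketch below does this for a small fully associative cache. It is illustrative only (real hardware typically tracks LRU with a few bits per set rather than full timestamps).

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES 4

typedef struct {
    bool     valid;
    uint32_t tag;
    uint64_t last_used;   /* "time" of the most recent access to this line */
} Line;

static Line     lines[NUM_LINES];
static uint64_t now;      /* incremented on every cache access */

/* Record an access so the LRU ordering stays up to date. */
void touch(int i) {
    lines[i].last_used = ++now;
}

/* Pick a victim: any invalid line first, otherwise the least recently used. */
int choose_victim(void) {
    int victim = 0;
    for (int i = 0; i < NUM_LINES; i++) {
        if (!lines[i].valid)
            return i;                                     /* free line, no eviction      */
        if (lines[i].last_used < lines[victim].last_used)
            victim = i;                                   /* older access, better victim */
    }
    return victim;
}
```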

Page 37

1st Access

[Diagram: LB $1  M[ 1 ]  (Addr: 00001, block offset 1) misses; line 0 is filled with tag 0000 and data 100 110, $1 gets 110, and LRU is updated. Misses: 1, Hits: 0. Sequence: M]

Page 38

2nd Access

[Diagram: unchanged state; Misses: 1, Hits: 0. The 2nd access, LB $2  M[ 5 ], is about to be looked up]

Page 39

2nd Access

[Diagram: LB $2  M[ 5 ]  (Addr: 00101) misses; line 1 is filled with tag 0010 and data 140 150, and $2 gets 150. Misses: 2, Hits: 0. Sequence: M M]

Page 40

3rd Access

[Diagram: unchanged state; Misses: 2, Hits: 0. The 3rd access, LB $3  M[ 1 ], is about to be looked up]

Page 41

3rd Access

[Diagram: LB $3  M[ 1 ]  (Addr: 00001) hits line 0 (tag 0000); $3 gets 110. Misses: 2, Hits: 1. Sequence: M M H]

Page 42

4th Access

[Diagram: unchanged state; Misses: 2, Hits: 1. The 4th access, LB $3  M[ 4 ], is about to be looked up]

Page 43

4th Access

[Diagram: LB $3  M[ 4 ]  (Addr: 00100) hits line 1 (tag 0010); $3 gets 140. Misses: 2, Hits: 2. Sequence: M M H H]

Page 44

5th Access

[Diagram: unchanged state; Misses: 2, Hits: 2. The 5th access, LB $2  M[ 0 ], is about to be looked up]

Page 45

5th Access

[Diagram: LB $2  M[ 0 ]  (Addr: 00000) hits line 0 (tag 0000); $2 gets 100. Misses: 2, Hits: 3. Sequence: M M H H H]

Page 46

6th Access

[Diagram: the trace continues with LB $2  M[ 12 ] and LB $2  M[ 5 ]; Misses: 2, Hits: 3 so far]

Page 47

6th Access

[Diagram: LB $2  M[ 12 ]  (Addr: 01100) misses; a free line (line 2) is filled with tag 0110 and data 220 230. Misses: 3, Hits: 3. Sequence: M M H H H M]

Page 48

7th Access

[Diagram: LB $2  M[ 5 ] hits line 1 (tag 0010), which was never evicted; $2 gets 150. Misses: 3, Hits: 3+1. Sequence: M M H H H M H]

Page 49

8th and 9th Access

[Diagram: LB $2  M[ 12 ] and LB $2  M[ 5 ] both hit. Misses: 3, Hits: 3+1+2. Sequence: M M H H H M H H H]

Page 50

10th and 11th Access

[Diagram: the repeated M[ 12 ] / M[ 5 ] pair keeps hitting. Misses: 3, Hits: 3+1+2+2. Sequence: M M H H H M H H H H H]

Page 51

Cache Tradeoffs

                        Direct Mapped    Fully Associative
Tag Size                + Smaller        – Larger
SRAM Overhead           + Less           – More
Controller Logic        + Less           – More
Speed                   + Faster         – Slower
Price                   + Less           – More
Scalability             + Very           – Not Very
# of conflict misses    – Lots           + Zero
Hit rate                – Low            + High
Pathological Cases?     – Common         ?

Page 52

Compromise

Set-associative cache

Like a direct-mapped cache
• Index into a location
• Fast

Like a fully-associative cache
• Can store multiple entries
  – decreases conflicts
• Search each entry within the set

n-way set associative means n possible locations
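A minimal C sketch of that lookup (index into one set, then search its n ways); the sizes mirror the 2-way comparison example later in the deck, and the data structures are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_WAYS    2   /* n-way: n possible locations per set         */
#define OFFSET_BITS 1   /* 2-byte blocks, as in the comparison example */
#define INDEX_BITS  1   /* 2 sets                                      */
#define NUM_SETS    (1 << INDEX_BITS)

typedef struct {
    bool     valid;
    uint32_t tag;
    uint8_t  block[1 << OFFSET_BITS];
} Way;

static Way sets[NUM_SETS][NUM_WAYS];

/* Index like a direct-mapped cache, then search every way in that one set. */
bool sa_lookup(uint32_t addr, uint8_t *data) {
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    for (int w = 0; w < NUM_WAYS; w++) {
        Way *way = &sets[index][w];
        if (way->valid && way->tag == tag) {  /* compare all ways' tags in the set */
            *data = way->block[offset];
            return true;                      /* hit in one of the n locations     */
        }
    }
    return false;                             /* miss: fill one way (e.g. by LRU)  */
}
```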

Page 53

2-Way Set Associative Cache (Reading)

[Diagram: the address splits into Tag | Index | Offset; the index selects a set of two lines, both tags are compared (=) in parallel, line select picks the matching way, and word select plus the offset produce hit? and data]

Page 54

3-Way Set Associative Cache (Reading)

[Diagram: same as above, but each set has three lines and three tag comparators]

Page 55

Comparison: Direct Mapped

[Diagram: the 4-line × 2-byte direct-mapped cache on the full pathological trace (M[ 1 ], M[ 5 ], M[ 1 ], M[ 4 ], M[ 0 ], then M[ 12 ] / M[ 5 ] repeated three times). Misses: 4+2+2, Hits: 3. Sequence: M M H H H M M M M M M]

Page 56

Comparison: Fully Associative

[Diagram: the 4-line × 2-byte fully associative cache (4-bit tag field, 1-bit block offset field) on the same trace. Misses: 3, Hits: 4+2+2. Sequence: M M H H H M H H H H H]

Page 57

Comparison: 2-Way Set Associative

2 sets, 2-byte blocks
3-bit tag field, 1-bit set index field, 1-bit block offset field

[Diagram: Processor, the empty 2-way set associative cache (all valid bits 0), the same 16-byte memory, and the same trace]

Page 58

Comparison: 2-Way Set Associative

[Diagram: the 2-way set associative cache on the same trace. Misses: 4, Hits: 7. Sequence: M M H H H M M H H H H]
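The field split shows why 2-way does so much better here: M[ 5 ] = 00101 is tag 001, set 0, offset 1, and M[ 12 ] = 01100 is tag 011, set 0, offset 0. Both land in set 0, but a 2-way set can hold both blocks at once, so once both are resident the repeated M[ 12 ] / M[ 5 ] pair keeps hitting: 4 misses and 7 hits, versus 8 misses and 3 hits for the direct-mapped cache.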

Page 59

Summary on Cache Organization

Direct Mapped: simpler, low hit rate
Fully Associative: higher hit cost, higher hit rate
N-way Set Associative: middle ground

Page 60

Misses

Cache misses: classification

Cold (aka Compulsory)
• The line is being referenced for the first time
  – Block size can help

Capacity
• The line was evicted because the cache was too small
• i.e. the working set of the program is larger than the cache

Conflict
• The line was evicted because of another access whose index conflicted
  – Not an issue with fully associative caches

Page 61

Cache Performance

Average Memory Access Time (AMAT)

Cache Performance (very simplified):
L1 (SRAM): 512 × 64-byte cache lines, direct mapped
  Data cost: 3 cycles per word access
  Lookup cost: 2 cycles
Mem (DRAM): 4 GB
  Data cost: 50 cycles plus 3 cycles per word

Performance depends on: access time for a hit, hit rate, miss penalty
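A rough worked example (the hit rate is assumed here; the slide does not give one): take the L1 hit time as 2 + 3 = 5 cycles and the miss penalty as roughly 50 + 3 = 53 cycles for one word from DRAM. With AMAT = hit time + miss rate × miss penalty, a 90% hit rate gives AMAT ≈ 5 + 0.10 × 53 ≈ 10.3 cycles, while a 99% hit rate gives about 5.5 cycles.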

Page 62

Basic Cache Organization

Q: How to decide block size?

Page 63

Experimental Results

Page 64

Tradeoffs

For a given total cache size, larger block sizes mean…
• fewer lines
• so fewer tags, less overhead
• and fewer cold misses (within-block "prefetching")

But also…
• fewer blocks available (for scattered accesses!)
• so more conflicts
• and larger miss penalty (time to fetch block)

Page 65

Summary

Caching assumptions
• small working set: 90/10 rule
• can predict future: spatial & temporal locality

Benefits
• big & fast memory built from (big & slow) + (small & fast)

Tradeoffs: associativity, line size, hit cost, miss penalty, hit rate
• Fully Associative: higher hit cost, higher hit rate
• Larger block size: lower hit cost, higher miss penalty