ECE7995 Caching and Prefetching Techniques in Computer Systems Lecture 8: Buffer Cache in Main Memory (IV)


Page 1: ECE7995 Caching and Prefetching Techniques in Computer Systems

ECE7995 Caching and Prefetching Techniques in Computer Systems

Lecture 8: Buffer Cache in Main Memory (IV)

Page 2: ECE7995 Caching and Prefetching Techniques in Computer Systems

Quantifying Locality with LRU Stack

• Blocks are ordered by their recencies;

• Blocks enter from the stack top, and leave from its bottom;

[Figure: an LRU stack of blocks ordered by recency; individual blocks are annotated with their recencies (e.g. Recency = 1, Recency = 2).]

Page 3: ECE7995 Caching and Prefetching Techniques in Computer Systems

LRU Stack

• Blocks are ordered by recency in the LRU stack;

• Blocks enter from the stack top, and leave from its bottom;

[Figure: an LRU stack with example blocks annotated Recency = 0, Recency = 2, and IRR = 2.]

Inter-Reference Recency (IRR): the number of other distinct blocks accessed between two consecutive references to the block.
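The two definitions can be made concrete with a short sketch. It assumes a reference stream given as a Python list, and it takes recency to be the number of other distinct blocks referenced since a block's last reference (so the most recently used block has recency 0). The function name `recency_and_irr` and the sample stream are illustrative, not from the lecture.

```python
INF = float("inf")

def recency_and_irr(stream):
    """Return {block: (recency, irr)} after the whole stream has been referenced."""
    last_pos = {}   # block -> index of its most recent reference
    irr = {}        # block -> IRR between its last two references
    for t, block in enumerate(stream):
        if block in last_pos:
            # distinct blocks seen strictly between the two consecutive references
            irr[block] = len(set(stream[last_pos[block] + 1:t]))
        last_pos[block] = t
    result = {}
    for block, pos in last_pos.items():
        recency = len(set(stream[pos + 1:]))   # distinct blocks referenced since
        result[block] = (recency, irr.get(block, INF))
    return result

# Hypothetical stream: block 1 is re-referenced after blocks 2 and 3.
stats = recency_and_irr([1, 2, 3, 2, 1])
```

A block referenced only once has no second reference yet, so the sketch reports its IRR as infinite.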

Page 4: ECE7995 Caching and Prefetching Techniques in Computer Systems

Locality Strength

[Figure: IRR map of the MULTI2 trace. x-axis: Virtual Time (Reference Stream); y-axis: IRR (Re-use Distance in Blocks); annotations: Locality Strength, Cache Size.]

LRU:

• Good for “absolutely” strong locality

• Bad for relatively weak locality

Page 5: ECE7995 Caching and Prefetching Techniques in Computer Systems

LRU’s Inability with Weak Locality

• Memory scanning (one-time access)
  - Infinite IRR, weak locality;
  - Should not be cached at all;
  - Not replaced in a timely manner by LRU (the blocks stay cached until their recency becomes larger than the cache size).

Page 6: ECE7995 Caching and Prefetching Techniques in Computer Systems

LRU’s Inability with Weak Locality

• Loop-like accesses (repeated accesses with a fixed interval)
  - The IRR is the same as the interval;
  - If the interval is larger than the cache size, there are no hits;
  - The blocks to be accessed soonest can unfortunately be replaced.
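The loop case is easy to reproduce. The following is a minimal LRU simulation, assuming a cache measured in blocks and a hypothetical loop over four blocks; `lru_hits` is an illustrative name.

```python
from collections import OrderedDict

def lru_hits(stream, cache_size):
    """Count hits for an LRU cache of cache_size blocks over a reference stream."""
    cache = OrderedDict()   # first item = least recently used block
    hits = 0
    for block in stream:
        if block in cache:
            hits += 1
            cache.move_to_end(block)          # block becomes most recently used
        else:
            if len(cache) >= cache_size:
                cache.popitem(last=False)     # evict the least recently used block
            cache[block] = None
    return hits

loop = [0, 1, 2, 3] * 5            # loop interval of 4 blocks, 5 iterations
too_small = lru_hits(loop, 3)      # cache smaller than the loop
big_enough = lru_hits(loop, 4)     # cache covers the whole loop
```

With a cache of 3 blocks LRU always evicts exactly the block that the loop needs next, so `too_small` is 0; one more block of cache turns every reference after the first iteration into a hit.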

Page 7: ECE7995 Caching and Prefetching Techniques in Computer Systems

LRU’s Inability with Weak Locality

• Accesses with distinct frequencies:
  - The recencies of frequently accessed blocks become large because of references to infrequently accessed blocks;
  - Frequently accessed blocks can unfortunately be replaced.

Page 8: ECE7995 Caching and Prefetching Techniques in Computer Systems

Looking for Blocks with Strong Locality

[Figure: IRR map of the MULTI2 trace. x-axis: Virtual Time (Reference Stream); y-axis: IRR (Re-use Distance in Blocks); annotations: Locality Strength, Cache Size, and "Cover 1000 Blocks with Strongest Locality".]

Page 9: ECE7995 Caching and Prefetching Techniques in Computer Systems

Challenges

Address the limitations of LRU fundamentally.

Retain the low overhead and adaptability merits of LRU.

• Simplicity: affordable implementation

• Adaptability: responsive to access pattern changes

Page 10: ECE7995 Caching and Prefetching Techniques in Computer Systems

Principle of the LIRS Replacement

We select the blocks with high IRRs for replacement.

LIRS: Low IRR Set replacement algorithm. We keep the set of blocks with low IRRs in the cache.

If a block’s IRR is high, its next IRR is likely to be high again.

Page 11: ECE7995 Caching and Prefetching Techniques in Computer Systems

Requirements on Low IRR Block Set (LIRS)

• The set size should be the cache size.

• The set consists of the blocks with the strongest locality (the lowest IRRs).

• The set is dynamically kept up to date.

Page 12: ECE7995 Caching and Prefetching Techniques in Computer Systems

Low IRR Block Set

Low IRR (LIR) blocks and High IRR (HIR) blocks

[Figure: the block sets mapped onto the physical cache. The LIR block set has size Llirs; the HIR block set takes the remaining Lhirs; cache size L = Llirs + Lhirs.]

Page 13: ECE7995 Caching and Prefetching Techniques in Computer Systems

An Example for LIRS

Llirs = 2, Lhirs = 1

Blocks   References (V time 1-10)   R   IRR
A        X X X                      1   1
B        X X                        3   1
C        X                          4   inf
D        X X                        2   3
E        X                          0   inf

LIR block set = {A, B}, HIR block set = {C, D, E}
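The split in this example can be recomputed from the R and IRR columns alone. The sketch below copies those values from the table; the helper name `split_lir_hir` and the tie-break on recency are assumptions made for illustration.

```python
INF = float("inf")

# (recency, IRR) per block, copied from the example table.
table = {"A": (1, 1), "B": (3, 1), "C": (4, INF), "D": (2, 3), "E": (0, INF)}

def split_lir_hir(table, llirs):
    """Put the llirs blocks with the lowest IRRs in the LIR set; the rest are HIR.

    Ties on IRR are broken by lower recency (an assumption for this sketch).
    """
    ranked = sorted(table, key=lambda b: (table[b][1], table[b][0]))
    return set(ranked[:llirs]), set(ranked[llirs:])

lir, hir = split_lir_hir(table, llirs=2)
rmax = max(table[b][0] for b in lir)   # maximum recency among the LIR blocks
```

Here `rmax` is 3 (the recency of B), the threshold that a HIR block's new IRR is later compared against.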

Page 14: ECE7995 Caching and Prefetching Techniques in Computer Systems

Mapping to Cache

[Figure: block sets mapped onto the physical cache. LIR block set: A, B (Llirs = 2); HIR block set: C, D, E (Lhirs = 1); resident blocks in the physical cache: A, B, E.]

Page 15: ECE7995 Caching and Prefetching Techniques in Computer Systems

Which Block is Replaced? Replace HIR Blocks

D is referenced at time 10

Blocks   References (V time 1-10)    R   IRR
A        X X X                       1   1
B        X X                         3   1
C        X                           4   inf
D        X X X (third at time 10)    0   3
E        X                           1   inf

The resident HIR block (E) is replaced!

Page 16: ECE7995 Caching and Prefetching Techniques in Computer Systems

How is the LIR Set Updated? The Recency of an LIR Block is Used

Blocks   References (V time 1-10)    R   IRR
A        X X X                       2   1
B        X X                         3   1
C        X                           4   inf
D        X X X (third at time 10)    0   2
E        X                           1   inf

Page 17: ECE7995 Caching and Prefetching Techniques in Computer Systems

After D is Referenced at Time 10 … …

Blocks   References (V time 1-10)    R   IRR
A        X X X                       2   1
B        X X                         3   1
C        X                           4   inf
D        X X X (third at time 10)    0   2
E        X                           1   inf

E is replaced, D enters the LIR set

[Figure labels: B, D]

Page 18: ECE7995 Caching and Prefetching Techniques in Computer Systems

If Reference is to C at Time 10 … …

Blocks   References (V time 1-10)    R   IRR
A        X X X                       2   1
B        X X                         4   1
C        X X (second at time 10)     0   4
D        X X                         3   3
E        X                           1   inf

E is replaced, C cannot enter the LIR set
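The two outcomes follow from one comparison: when a HIR block is referenced, its new IRR equals its recency just before the reference, and the block enters the LIR set only if that value is smaller than Rmax, the largest recency among the LIR blocks. The sketch below assumes the recencies from the example table before time 10. The function name `reference_hir` is illustrative, and demoting the LIR block that holds Rmax is how the LIR set keeps its size of Llirs.

```python
INF = float("inf")

def reference_hir(recency, lir, block):
    """Return the LIR set after HIR `block` is referenced.

    recency: {block: recency just before the reference}
    lir:     set of current LIR blocks
    """
    new_irr = recency.get(block, INF)        # never-seen blocks have infinite IRR
    victim = max(lir, key=recency.get)       # LIR block whose recency is Rmax
    if new_irr < recency[victim]:
        return (lir - {victim}) | {block}    # block becomes LIR, victim becomes HIR
    return set(lir)                          # block stays HIR

# Recencies just before time 10, taken from the example table.
before = {"A": 1, "B": 3, "C": 4, "D": 2, "E": 0}
after_d = reference_hir(before, {"A", "B"}, "D")   # new IRR 2 < Rmax 3
after_c = reference_hir(before, {"A", "B"}, "C")   # new IRR 4 > Rmax 3
```

Referencing D yields the LIR set {A, D}; referencing C leaves it at {A, B}.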

Page 19: ECE7995 Caching and Prefetching Techniques in Computer Systems

LIRS on References with Weak Locality

• Memory scanning (one-time access)
  - Infinite IRR;
  - Not included in the LIR block set;
  - Replaced in a timely manner.

Page 20: ECE7995 Caching and Prefetching Techniques in Computer Systems

LIRS on References with Weak Locality

• Loop-like accesses
  - The IRRs of all blocks are the same;
  - Once a block becomes an LIR block, it can keep its status;
  - Any cached block can contribute a hit in one loop of accesses.

Page 21: ECE7995 Caching and Prefetching Techniques in Computer Systems

LIRS on References with Weak Locality

• Accesses with distinct frequencies:
  - Frequently accessed blocks have smaller IRRs than infrequently accessed blocks;
  - Frequently accessed blocks are LIR blocks;
  - They are always cached and get hits.

Page 22: ECE7995 Caching and Prefetching Techniques in Computer Systems

Making LIRS O(1) Efficient

[Figure: IRR_HIR (the new IRR of the HIR block) is compared with Rmax (the maximum recency of the LIR blocks).]

This efficiency is achieved by our LIRS stack.

LRU stack + the LIR block with recency Rmax at its bottom ==> LIRS stack.

Page 23: ECE7995 Caching and Prefetching Techniques in Computer Systems

Differences between LRU and LIRS Stacks

[Figure: an LRU stack (blocks 3, 2, 1, 6, 5) next to a LIRS stack (blocks 5, 3, 2, 1, 6, 9, 4, 8). Legend: resident block, LIR block, HIR block. Cache size L = 5, Llir = 3, Lhir = 2.]

• The size of the LRU stack is decided by the cache size and is fixed; the size of the LIRS stack is decided by Rmax and varies.

• The LRU stack holds only resident blocks; the LIRS stack holds any block whose recency is no more than Rmax.

• The LRU stack does not distinguish “hot” and “cold” blocks; the LIRS stack distinguishes LIR and HIR blocks and dynamically maintains their statuses.

Page 24: ECE7995 Caching and Prefetching Techniques in Computer Systems

How Does the LIRS Stack Help?

[Figure: the LIRS stack, annotated with Rmax (the maximum recency of the LIR blocks) and IRR_HIR (the new IRR of the HIR block).]

• Blocks in the LIRS stack ==> IRR < Rmax

• Other blocks ==> IRR > Rmax

Page 25: ECE7995 Caching and Prefetching Techniques in Computer Systems

LIRS Operations

[Figure: LIRS stack S (blocks 5, 3, 2, 1, 6, 9, 4, 8) and resident HIR stack Q (blocks 5, 3). Legend: resident in cache, LIR block, HIR block. Cache size L = 5, Llir = 3, Lhir = 2.]

• Initialization: all referenced blocks are given LIR status until the LIR block set is full.

• We place resident HIR blocks in stack Q.

Page 26: ECE7995 Caching and Prefetching Techniques in Computer Systems

Access an LIR Block (a Hit)

[Figure: LIRS stack S (blocks 5, 3, 2, 1, 6, 9, 4, 8) and resident HIR stack Q (blocks 5, 3). Legend: resident in cache, LIR block, HIR block. Cache size L = 5, Llir = 3, Lhir = 2. Reference stream: . . . 4 8 3 5 7 9 5]

Page 27: ECE7995 Caching and Prefetching Techniques in Computer Systems

Access an LIR Block (a Hit)

[Figure: LIRS stack S and resident HIR stack Q after block 4 is accessed. Legend: resident in cache, LIR block, HIR block. Cache size L = 5, Llir = 3, Lhir = 2. Reference stream: . . . 8 3 5 7 9 5]

Page 28: ECE7995 Caching and Prefetching Techniques in Computer Systems

Access an LIR block (a Hit)

[Figure: LIRS stack S and resident HIR stack Q while block 8 is accessed. Legend: resident in cache, LIR block, HIR block. Cache size L = 5, Llir = 3, Lhir = 2. Reference stream: . . . 3 5 7 9 5]

Page 29: ECE7995 Caching and Prefetching Techniques in Computer Systems

Access a Resident HIR Block (a Hit)

[Figure: LIRS stack S and resident HIR stack Q. Legend: resident in cache, LIR block, HIR block. Cache size L = 5, Llir = 3, Lhir = 2. Reference stream: . . . 3 5 7 9 5]

Page 30: ECE7995 Caching and Prefetching Techniques in Computer Systems

Access a Resident HIR Block (a Hit)

[Figure: LIRS stack S and resident HIR stack Q. Legend: resident in cache, LIR block, HIR block. Cache size L = 5, Llir = 3, Lhir = 2. Reference stream: . . . 3 5 7 9 5]

Page 31: ECE7995 Caching and Prefetching Techniques in Computer Systems

Access a Resident HIR Block (a Hit)

[Figure: LIRS stack S and resident HIR stack Q. Legend: resident in cache, LIR block, HIR block. Cache size L = 5, Llir = 3, Lhir = 2. Reference stream: . . . 3 5 7 9 5]

Page 32: ECE7995 Caching and Prefetching Techniques in Computer Systems

Access a Resident HIR Block (a Hit)

[Figure: LIRS stack S and resident HIR stack Q. Legend: resident in cache, LIR block, HIR block. Cache size L = 5, Llir = 3, Lhir = 2. Reference stream: . . . 5 7 9 5]

Page 33: ECE7995 Caching and Prefetching Techniques in Computer Systems

Access a Non-Resident HIR block (a Miss)

[Figure: LIRS stack S and resident HIR stack Q. Legend: resident in cache, LIR block, HIR block. Cache size L = 5, Llir = 3, Lhir = 2. Reference stream: . . . 7 9 5]

Page 34: ECE7995 Caching and Prefetching Techniques in Computer Systems

Access a Non-Resident HIR Block (a Miss)

[Figure: LIRS stack S and resident HIR stack Q. Legend: resident in cache, LIR block, HIR block. Cache size L = 5, Llir = 3, Lhir = 2. Reference stream: . . . 9 5]

Page 35: ECE7995 Caching and Prefetching Techniques in Computer Systems

Access a Non-Resident HIR Block (a Miss)

[Figure: LIRS stack S and resident HIR stack Q. Legend: resident in cache, LIR block, HIR block. Cache size L = 5, Llir = 3, Lhir = 2. Reference stream: . . . 5]
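The cases walked through on these slides (a hit on an LIR block, a hit on a resident HIR block, a miss on a non-resident HIR block) can be collected into one compact sketch. It is a simplified reading of the operations, not the authors' implementation: S and Q are modeled with `OrderedDict`, the class and method names are illustrative, `lhir` is assumed to be at least 1, and S is allowed to grow without bound.

```python
from collections import OrderedDict

class LIRS:
    """Sketch of LIRS with stack S and resident HIR stack Q (names from the slides)."""

    def __init__(self, llir, lhir):
        self.llir, self.lhir = llir, lhir
        self.S = OrderedDict()   # block -> "LIR" or "HIR"; first item = stack bottom
        self.Q = OrderedDict()   # resident HIR blocks; first item = next victim
        self.n_lir = 0           # only used while the LIR set is being filled

    def _prune(self):
        # Remove HIR blocks from the bottom of S until an LIR block is at the bottom.
        while self.S and next(iter(self.S.values())) == "HIR":
            self.S.popitem(last=False)

    def access(self, block):
        """Reference `block`; return True on a hit and False on a miss."""
        status = self.S.get(block)
        if status == "LIR":                    # hit on an LIR block
            self.S.move_to_end(block)          # move it to the top of S
            self._prune()
            return True
        hit = block in self.Q                  # resident HIR blocks are exactly those in Q
        if not hit:
            if self.n_lir < self.llir:         # initialization: fill the LIR set first
                self.S[block] = "LIR"
                self.n_lir += 1
                return False
            if len(self.Q) >= self.lhir:       # make room: evict the front of Q
                self.Q.popitem(last=False)     # (it may remain in S as non-resident)
        if status == "HIR":                    # block is in S, so its new IRR < Rmax
            self.Q.pop(block, None)
            self.S[block] = "LIR"
            self.S.move_to_end(block)
            bottom, _ = self.S.popitem(last=False)   # LIR block with recency Rmax
            self.Q[bottom] = None              # it becomes a resident HIR block
            self._prune()
        else:                                  # block is not in S: it stays HIR
            self.S[block] = "HIR"
            self.Q[block] = None
            self.Q.move_to_end(block)
        return hit

cache = LIRS(llir=2, lhir=1)
hits = sum(cache.access(b) for b in [1, 2, 3, 4] * 3)   # loop over 4 blocks, 3 cached
```

On this hypothetical loop of four blocks with three cache blocks, the sketch gets 4 hits in 12 references, because blocks 1 and 2 keep their LIR status; LRU with the same cache size gets none. Every operation touches only the ends of S and Q or one keyed entry, apart from pruning, which removes each stacked entry at most once.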

Page 36: ECE7995 Caching and Prefetching Techniques in Computer Systems

Workload Traces

• postgres is a trace of join queries among four relations in a relational database system;

• sprite is from the Sprite network file system;

• multi2 is obtained by executing three workloads, cs, cpp, and postgres, together.

Page 37: ECE7995 Caching and Prefetching Techniques in Computer Systems

Cache Partition

• 1% of the cache size is for HIR blocks

• 99% of the cache size is for LIR blocks

• Performance is not sensitive to the partition.
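As a small arithmetic sketch of this split: the function name `partition`, the rounding, and the floor of one HIR block for very small caches are assumptions, not part of the lecture.

```python
def partition(cache_size, hir_fraction=0.01):
    """Split a cache of cache_size blocks into (Llirs, Lhirs)."""
    lhirs = max(1, round(cache_size * hir_fraction))   # keep at least one HIR block
    return cache_size - lhirs, lhirs

large = partition(1000)   # 990 LIR blocks, 10 HIR blocks
small = partition(50)     # 1% rounds to zero, so one block is reserved for HIR
```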

Page 38: ECE7995 Caching and Prefetching Techniques in Computer Systems

Looping Pattern: postgres (Access Map)

[Figure: access map of postgres. x-axis: Virtual Time (Reference Stream); y-axis: Logical Block Number.]

Page 39: ECE7995 Caching and Prefetching Techniques in Computer Systems

Looping Pattern: postgres (IRR Map)

[Figure: IRR map of postgres. x-axis: Virtual Time (Reference Stream); y-axis: IRR (Re-use Distance in Blocks); annotations: LRU, LIRS.]

Page 40: ECE7995 Caching and Prefetching Techniques in Computer Systems

Looping Pattern: postgres (Hit Rates)

[Figure: hit ratios for postgres. x-axis: Cache Size (# of Blocks), 0 to 3000; y-axis: Hit Ratio (%), 0 to 80; curves: OPT, LIRS, LRU-2, 2Q, LRFU, EELRU, ARC, LRU.]

Page 41: ECE7995 Caching and Prefetching Techniques in Computer Systems

Temporally-Clustered Pattern: sprite (Access Map)

[Figure: access map of sprite. x-axis: Virtual Time (Reference Stream); y-axis: Logical Block Number.]

Page 42: ECE7995 Caching and Prefetching Techniques in Computer Systems

Temporally-Clustered Pattern: sprite (IRR Map)

[Figure: IRR map of sprite. x-axis: Virtual Time (Reference Stream); y-axis: IRR (Re-use Distance in Blocks); annotations: LRU, LIRS.]

Page 43: ECE7995 Caching and Prefetching Techniques in Computer Systems

Temporally-Clustered Pattern: sprite (Hit Ratio)

[Figure: hit ratios for sprite. x-axis: Cache Size (# of Blocks), 0 to 1200; y-axis: Hit Ratio (%), 0 to 100; curves: OPT, LIRS, LRU-2, 2Q, LRFU, EELRU, ARC, LRU.]

Page 44: ECE7995 Caching and Prefetching Techniques in Computer Systems

Mixed Pattern: multi2 (Access Map)

[Figure: access map of multi2. x-axis: Virtual Time (Reference Stream); y-axis: Logical Block Number.]

Page 45: ECE7995 Caching and Prefetching Techniques in Computer Systems

Mixed Pattern: multi2 (IRR Map)

[Figure: IRR map of multi2. x-axis: Virtual Time (Reference Stream); y-axis: IRR (Re-use Distance in Blocks); annotations: LIRS, LRU.]

Page 46: ECE7995 Caching and Prefetching Techniques in Computer Systems

Mixed Pattern: multi2 (Hit Ratio)

[Figure: hit ratios for multi2. x-axis: Cache Size (# of Blocks), 0 to 4000; y-axis: Hit Ratio (%), 0 to 90; curves: OPT, LIRS, LRU-2, 2Q, LRFU, EELRU, ARC, LRU.]

Page 47: ECE7995 Caching and Prefetching Techniques in Computer Systems

Summary

• LIRS uses both IRR (or reuse distance) and recency for its replacement decision. 2Q uses only reuse distance.

• LIRS adapts to the locality changes when deciding which blocks have small IRRs. 2Q uses a fixed threshold in looking for blocks of small reuse distances.

• Both LIRS and 2Q have low time overhead (as low as LRU). Their space overheads are larger than LRU's, but acceptably so.