Avoiding Initialization Misses to the Heap
Jarrod Lewis, Bryan Black, and Mikko H. Lipasti
Department of Electrical and Computer Engineering, University of Wisconsin—Madison
Intel Labs
http://www.ece.wisc.edu/~pharm

April 19, 2023

Avoiding Initialization Misses to the Heap – Mikko Lipasti 2

Motivation
- Memory bandwidth is expensive
  - Shouldn’t waste it on useless traffic
  - It can be put to better use: multithreading, prefetching, MLP, etc.
- Search and destroy useless traffic
- Focus of this talk: heap initialization
  - Detect and optimize initialization of newly allocated memory
  - 23% of misses in a 2MB cache are to invalid memory

Dynamically Allocated Memory

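Heap space states and transitions:
- Unallocated-Invalid → Allocated-Invalid on malloc()
- Allocated-Invalid → Allocated-Valid on an initializing store
- Allocated-Valid → Allocated-Valid on a load or store
- Allocated-Invalid or Allocated-Valid → Unallocated-Invalid on free()

Invalid memory need not be transferred. Could we provide an interface that expresses this directly?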
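These three heap states and their transitions fit in a small transition table. The following is a minimal Python sketch that assumes the transitions read off this slide's diagram. `HeapState`, `step`, `needs_fetch`, and the event strings are hypothetical names for illustration, not an interface proposed in the talk.

```python
from enum import Enum


class HeapState(Enum):
    UNALLOCATED_INVALID = "Unallocated-Invalid"
    ALLOCATED_INVALID = "Allocated-Invalid"
    ALLOCATED_VALID = "Allocated-Valid"


# (current state, event) -> next state, as read off the slide's diagram.
# The event names are hypothetical labels for the diagram's arrows.
TRANSITIONS = {
    (HeapState.UNALLOCATED_INVALID, "malloc"): HeapState.ALLOCATED_INVALID,
    (HeapState.ALLOCATED_INVALID, "store"): HeapState.ALLOCATED_VALID,  # initializing store
    (HeapState.ALLOCATED_VALID, "load"): HeapState.ALLOCATED_VALID,
    (HeapState.ALLOCATED_VALID, "store"): HeapState.ALLOCATED_VALID,
    (HeapState.ALLOCATED_INVALID, "free"): HeapState.UNALLOCATED_INVALID,
    (HeapState.ALLOCATED_VALID, "free"): HeapState.UNALLOCATED_INVALID,
}


def step(state, event):
    """Return the next state; raises KeyError for transitions the diagram omits."""
    return TRANSITIONS[(state, event)]


def needs_fetch(state):
    """Only Allocated-Valid memory holds data worth transferring on a miss."""
    return state is HeapState.ALLOCATED_VALID
```

For example, `step(HeapState.UNALLOCATED_INVALID, "malloc")` yields `ALLOCATED_INVALID`, for which `needs_fetch` is false. A store that arrives in that state is an initializing store, so in principle the block could be installed without being read from memory.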

Talk Outline
- Motivation
- Analysis of Heap Behavior
- Detecting Initializing Writes
- Performance Analysis
- Conclusions

Allocation Analysis
- Two main modes
  - Single dominant allocation (up to 100MB), or
  - Numerous moderate allocations
- Initialization of allocations
  - 88% are initialized with a store miss
  - Little temporal reuse of freed memory
- Phase behavior
  - Start of program often dominates
  - Even SPEC has counterexamples (gcc, vortex)

Cache Miss Behavior
- Init stores cause up to 60% of misses (23% on average)
- These are 35% of all compulsory misses

[Figure: breakdown of misses (0% to 100%) into Load, Non-heap Store, Modify Store, and Init Store for benchmarks including bzip, gap, gcc, and vortex]

Detecting Initializing Writes
- Annotate malloc()
  - Record base and size in an allocation range cache (ARC)
- Key questions
  - What is the working set?
  - How are ranges represented?
    - Valid bits? Not scalable for a 100MB allocation
    - Base + bound
  - How are ranges updated on writes?
    - Split vs. truncate
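The base+bound representation and the split-versus-truncate question can be illustrated with a toy allocation range cache. This is a minimal Python sketch that assumes block-aligned allocations, 64-byte blocks, FIFO replacement, and one [base, bound) range per entry. `AllocationRangeCache`, `on_malloc`, `on_store`, and the `policy` strings are hypothetical names, not the hardware interface from the talk. Under `"split"`, a write keeps both pieces of the range around the written block. Under `"truncate"`, only the piece above the block is kept, and the lower piece becomes Unknown.

```python
from collections import deque

BLOCK = 64  # bytes per cache block (assumed, as in the machine model)


class AllocationRangeCache:
    """Hypothetical FIFO cache of [base, bound) ranges known to be Allocated-Invalid."""

    def __init__(self, entries, policy="truncate"):
        if policy not in ("split", "truncate"):
            raise ValueError("policy must be 'split' or 'truncate'")
        self.entries = entries
        self.policy = policy
        self.ranges = deque()  # oldest entry on the left

    def _insert(self, base, bound):
        if bound <= base:
            return  # empty range: nothing left to track
        if len(self.ranges) == self.entries:
            self.ranges.popleft()  # FIFO replacement evicts the oldest range
        self.ranges.append((base, bound))

    def on_malloc(self, base, size):
        """Record a new allocation (assumed block-aligned) as Allocated-Invalid."""
        self._insert(base, base + size)

    def on_store(self, addr):
        """Return True if the store hits a tracked range, i.e. is an initializing write."""
        for i, (base, bound) in enumerate(self.ranges):
            if base <= addr < bound:
                block = addr - addr % BLOCK
                del self.ranges[i]  # safe: we return right after mutating
                if self.policy == "split":
                    self._insert(base, block)  # split also keeps the lower piece
                self._insert(block + BLOCK, bound)  # both policies keep the upper piece
                return True
        return False
```

With `"split"`, a single write in the middle of an allocation consumes an extra entry and can force a FIFO eviction. `"truncate"` never grows the entry count but forgets the lower piece, so it loses nothing only when writes move upward through the allocation.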

Allocation Working Set

[Figure: percentage of all initializing store misses identified (0% to 100%) vs. number of allocation ranges tracked with FIFO replacement (1, 2, 4, 8, 16, 32, 64, 128, 256, >256), for bzip, crafty, eon, gap, gcc, gzip, mcf, parser, perl, twolf, vortex, and vpr]

4-8 entries are sufficient, except for parser, which needs 64.

Sequential Initialization

[Figure: initialization pattern 1 (Sequential) and tracking scheme 1 (Forward Sweep), shown step by step over allocation blocks A through F; legend: Allocated-Invalid, Initialized, Unknown]

Forward sweep captures 90%+ of initializing store misses in all benchmarks except bzip, gzip, and perl.
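Modeling the forward-sweep scheme at cache-block granularity makes it concrete. This Python sketch assumes an allocation of `num_blocks` blocks that all start Allocated-Invalid, a single base+bound entry, and exactly one initializing store per block. `forward_sweep` is a hypothetical helper that counts the stores the scheme identifies.

```python
def forward_sweep(num_blocks, stores):
    """Count initializing stores identified by a forward-sweep tracker.

    One base+bound entry [base, bound) covers the blocks still known to be
    Allocated-Invalid. A store inside the range is identified, and the range
    is truncated to start just past the written block; any skipped blocks
    below it become Unknown.
    """
    base, bound = 0, num_blocks
    identified = 0
    for block in stores:
        if base <= block < bound:
            identified += 1
            base = block + 1  # truncate: everything below is no longer tracked
        # stores outside the range are handled as ordinary misses
    return identified
```

If blocks A through F are numbered 0 to 5, the sequential order `[0, 1, 2, 3, 4, 5]` is fully identified (6 of 6). The alternating order `[0, 5, 1, 4, 2, 3]` yields only 2 of 6, because the store to F truncates the range past B through E, which become Unknown.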

Alternating Initialization

[Figure: initialization pattern 2 (Alternating) and tracking scheme 2 (Bidirectional Sweep), shown step by step over allocation blocks A through F; legend: Allocated-Invalid, Initialized, Unknown]

Bidirectional sweep captures 90%+ of perl’s initializing store misses, but doesn’t help bzip or gzip.
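A bidirectional sweep extends the same block-level model by letting the single range shrink from either end. This Python sketch assumes that a store to the top block of the range shrinks it from above. It also assumes that any other store inside the range truncates it from below, since the slide does not say how interior stores are handled. `bidirectional_sweep` is a hypothetical name.

```python
def bidirectional_sweep(num_blocks, stores):
    """Count initializing stores identified by a bidirectional-sweep tracker.

    One base+bound entry [base, bound) covers blocks still known to be
    Allocated-Invalid. A store at the top end shrinks the range from above;
    any other store inside it shrinks the range from below (an assumption
    for interior stores), and the skipped blocks become Unknown.
    """
    base, bound = 0, num_blocks
    identified = 0
    for block in stores:
        if not (base <= block < bound):
            continue  # outside the tracked range: treated as an ordinary miss
        identified += 1
        if block == bound - 1:
            bound = block  # sweep downward from the top
        else:
            base = block + 1  # sweep upward from the bottom (or interior)
    return identified
```

With blocks A through F numbered 0 to 5, the alternating order `[0, 5, 1, 4, 2, 3]` is fully identified (6 of 6). The striding order `[0, 2, 4, 1, 3, 5]` still yields only 4 of 6.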

Striding Initialization

[Figure: initialization pattern 3 (Striding) and tracking scheme 3 (Interleaving), shown step by step over allocation blocks A through F; legend: Allocated-Invalid, Initialized, Unknown]

Interleaving captures 90%+ of gzip’s initializing store misses, but still only 60% of bzip’s: bzip has a large allocation that is initialized in random order.
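One plausible reading of the interleaving scheme is several forward sweeps, one per lane of blocks that share the same index modulo a stride. This Python sketch assumes that reading and a default stride of 2. `interleaved_sweep` and its `ways` parameter are hypothetical names.

```python
def interleaved_sweep(num_blocks, stores, ways=2):
    """Count initializing stores identified by `ways` interleaved forward sweeps.

    Block i belongs to lane i % ways. next_block[lane] is the lowest block in
    that lane still known to be Allocated-Invalid; that block and every later
    block in the same lane (up to num_blocks) are tracked.
    """
    next_block = list(range(ways))  # each lane starts at its first block
    identified = 0
    for block in stores:
        lane = block % ways
        if next_block[lane] <= block < num_blocks:
            identified += 1
            next_block[lane] = block + ways  # truncate this lane past the store
    return identified
```

The striding order `[0, 2, 4, 1, 3, 5]` is fully identified (6 of 6). A shuffled order such as `[3, 0, 5, 1, 4, 2]` yields only 4 of 6, which shows why random initialization like bzip's is hard for any sweep to capture.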

PharmSim Overview

- Device simulation, etc. from SimOS-PPC [IBM ARL]
- PharmSim replaces the functional simulators
  - Full OOO core model, with values in rename registers
  - Supports privileged mode, MMU, TLB, exceptions, interrupts, barriers, flushes, etc.
  - Lead developer: Trey Cain (thanks Trey!)

[Figure: simulator components. SimOS-PPC: AIX 4.3.1, disk driver, Ethernet driver. PharmSim: OOO core, Gigaplane.]

Operating System Effects
- Widely accepted for SPECINT: safe to ignore O/S paths
- Most popular tool (SimpleScalar):
  - Intercepts system calls
  - Emulates them on the host, updating “flat” memory
  - Returns “magically” with cache contents intact
- We have found that [CAECW2002]:
  - Omitting system references leads to dramatic error (5.8x L2 miss rate, 100% IPC error in the worst case)
  - Specifically, the AIX page fault handler eliminates many initializing write misses
- Had we not used PHARMsim, we would have dramatically overstated the performance benefit

AIX Page Installation

[Figure: the data segment (Allocated-Valid) grows into Unallocated space, built up in steps:]
1. Heap manager calls sbrk
2. malloc returns a block < 4KB
3. Program writes to the block
4. First reference causes a page fault
5. AIX installs the entire page using dcbz
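The page-mode install can be approximated as one dcbz per cache block of the faulting page. This minimal Python sketch assumes 64-byte blocks and a 4KB page, and uses a plain dict to stand in for the data cache. `install_page` is a hypothetical name. The sketch models only the effect of dcbz (a zeroed block allocated in the cache without a memory read), not the actual AIX fault handler.

```python
BLOCK_BYTES = 64  # cache block size (assumed, matches the machine model)


def install_page(cache, page_base, page_bytes=4096):
    """Zero every block of a page in `cache` (dict: block address -> contents).

    Each iteration stands in for one dcbz: the block is allocated and zeroed in
    the cache with no fetch from memory. Returns the number of blocks installed.
    """
    installed = 0
    for addr in range(page_base, page_base + page_bytes, BLOCK_BYTES):
        cache[addr] = bytes(BLOCK_BYTES)
        installed += 1
    return installed
```

A single small malloc therefore installs 64 blocks (the full 4KB), whether or not the program touches them. A hypothetical 16MB page would need 262,144 blocks installed the same way.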

Block vs. Page Installation
- Page installation
  - Practically free as part of the page fault
- Shortcomings of page installation
  - Pollutes the cache
  - Not scalable to superpages (AIX v5.1)
  - Does not work for heap reuse
- Our short simulations don’t show this benefit, i.e., there is high overlap between initializing writes and first references to the extended data segment

Integrating ARC

Speedup
- Very aggressive core model
  - Still can’t tolerate all store miss latency
- Block mode slightly better than page mode
  - Page mode suffers from cache pollution and less coverage

[Figure: speedup (0% to 45%) for gap, mcf, and parser with block-mode and page-mode installation]

Program Phase Behavior
- Only benefits the initialization phase of a program
- Some programs initialize throughout execution

Conclusions
- Initializing writes
  - Cause 23% of all misses in a 2MB L2
  - Avoid the miss with block- or page-mode install
  - Up to 41% performance improvement, subject to the initialization:computation ratio
- Tracking allocation ranges
  - Working set is very small (4-8 entries; 64 for parser)
  - Forward/bidirectional/interleaved sweep enables range truncation

Acknowledgments
- Originated as a course project: Gordie Bell, Trey Cain, Kevin Lepak
- PHARMsim infrastructure: lead developer Trey Cain
- Financial and equipment support: IBM and Intel Corp., National Science Foundation, University of Wisconsin

Questions?

Backup Slides

Invalid Memory Traffic
- Invalid memory traffic: real data traffic that transfers invalid data
- Initializing store: the initial write to a storage location that contains invalid data

[Figure: cache and main memory. A store MISS to X triggers a FETCH of X and a WRITEBACK of A, where A is deallocated and X is allocated but uninitialized.]

Allocation Analysis
- Single dominant allocation vs. numerous moderate allocations

[Figure: distribution of allocations by count and by size for gap and gcc (gap-count, gap-size, gcc-count, gcc-size), in size classes <64B, <2KB, <256KB, <16MB, and >=16MB]

Initialization of Heap
- 88% is initialized by a store miss
- Relatively little temporal reuse of freed memory

[Figure: breakdown (0% to 100%) into Uninit, Hit-Init, and Miss-Init for bzip, eon, gcc, and mcf]

PharmSim Pipeline

[Figure: pipeline diagram; stage labels: Decode, Execute, Commit, Mem, Fetch, Translate]

- Substantially similar to IBM Power4
- Some instructions are “cracked” (1:2 expansion); others (e.g., lmw) expand into a microcode stream
- Mem stage
  - Interface to a 2-level cache model
  - Sun Gigaplane XB snoopy MP coherence
  - Caches contain values and must remain coherent
- No cheating! No “flat” memory model for reference/redirect

Machine Model
An unrealistically aggressive model, chosen to devalue the impact of store misses:
- 8-wide, 6-stage pipeline
- 8K-entry combining predictor
- 128 RUU entries, 64 LSQ entries, 64 write buffers
- 256KB 4-way associative L1D cache
- 64KB 2-way associative L1I cache
- 2MB 4-way associative unified L2 cache
- All cache blocks are 64 bytes
- L2 latency is 10 cycles
- Memory latency is 70 cycles
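As a quick check on the cache geometry of this machine model, the number of sets is the capacity divided by the product of associativity and block size. A small Python helper (the name `num_sets` is hypothetical) applied to the three listed caches:

```python
def num_sets(capacity_bytes, ways, block_bytes=64):
    """Sets in a set-associative cache: capacity / (ways * block size)."""
    sets, rem = divmod(capacity_bytes, ways * block_bytes)
    if rem:
        raise ValueError("capacity must be a multiple of ways * block size")
    return sets


KB, MB = 1024, 1024 * 1024
CACHES = {
    "L1D": (256 * KB, 4),
    "L1I": (64 * KB, 2),
    "L2": (2 * MB, 4),
}

for name, (capacity, ways) in CACHES.items():
    print(name, num_sets(capacity, ways), "sets")
# L1D 1024 sets
# L1I 512 sets
# L2 8192 sets
```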