Avoiding Initialization Misses to the Heap

Jarrod Lewis, Bryan Black, and Mikko H. Lipasti

Department of Electrical and Computer Engineering, University of Wisconsin-Madison

Intel Labs

http://www.ece.wisc.edu/~pharm

April 19, 2023

Avoiding Initialization Misses to the Heap – Mikko Lipasti 2

Motivation

- Memory bandwidth is expensive
  - Shouldn’t waste on useless traffic
  - Can be put to better use: multithreading, prefetching, MLP, etc.
- Search and destroy useless traffic
- Focus of this talk: heap initialization
  - Detect and optimize initialization of newly allocated memory
  - 23% of misses in 2MB cache are invalid

Dynamically Allocated Memory

[Figure: state diagram of heap space with three states (Unallocated-Invalid, Allocated-Invalid, Allocated-Valid); transitions are labeled malloc(), free(), initializing store, and load or store.]

- Invalid memory need not be transferred
- Provide interface that expresses this directly?
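The three heap states can be sketched as a small state machine. This is only an illustration under stated assumptions, not the authors' mechanism: the `BlockState` enum, the `next_state` and `needs_fetch_on_store_miss` helpers, and the event names are hypothetical, and state is assumed to be tracked per whole cache block.

```python
from enum import Enum


class BlockState(Enum):
    UNALLOCATED_INVALID = "unallocated-invalid"
    ALLOCATED_INVALID = "allocated-invalid"
    ALLOCATED_VALID = "allocated-valid"


def next_state(state, event):
    """Return the state of a heap block after `event`.

    `event` is one of "malloc", "free", "load", "store" (hypothetical names).
    """
    if event == "malloc" and state is BlockState.UNALLOCATED_INVALID:
        return BlockState.ALLOCATED_INVALID
    if event == "free":
        return BlockState.UNALLOCATED_INVALID
    if event == "store" and state is BlockState.ALLOCATED_INVALID:
        # The initializing store: first write to allocated, invalid memory.
        return BlockState.ALLOCATED_VALID
    # All other events leave the state unchanged in this sketch.
    return state


def needs_fetch_on_store_miss(state):
    """Old contents only matter if the block currently holds valid data."""
    return state is BlockState.ALLOCATED_VALID
```

In this model, a store miss to a block in either invalid state has no old data worth fetching, which is the traffic the talk aims to eliminate.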

Talk Outline

- Motivation
- Analysis of Heap Behavior
- Detecting Initializing Writes
- Performance Analysis
- Conclusions

Allocation Analysis

- Two main modes
  - Single dominant allocation (up to 100MB), or
  - Numerous moderate allocations
- Initialization of allocations
  - 88% initialized with store miss
  - Little temporal reuse of free’d memory
- Phase behavior
  - Start of program often dominates
  - Even SPEC has counterexamples (gcc, vortex)

Cache Miss Behavior

- Init stores cause up to 60% of misses (avg 23%)
- These are 35% of all compulsory misses

[Figure: stacked bar chart of cache misses by type (0% to 100%) for bzip, gap, gcc, and vortex; categories: Load, Non-heap Store, Modify Store, Init Store.]

Detecting Initializing Writes

- Annotate malloc()
  - Record base, size in allocation range cache
- Key questions
  - What is working set?
  - How are ranges represented?
    - Valid bits? Not scalable for 100MB allocation
    - Base + bound
  - How are ranges updated on writes?
    - Split vs. truncate
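A base + bound allocation range cache might look like the sketch below. It is a minimal illustration, not the authors' design: the class and method names are hypothetical, replacement is FIFO, and only whole cache blocks inside an allocation are tracked (an assumption, since edge blocks may share space with valid data). With 64-byte blocks, a 100MB allocation would need 100 * 2^20 / 64 = 1,638,400 valid bits, whereas base + bound needs two addresses per range.

```python
from collections import deque


class AllocationRangeCache:
    """Hypothetical FIFO cache of [base, bound) byte ranges known to be invalid."""

    def __init__(self, capacity=8, block_size=64):
        self.block_size = block_size
        # deque(maxlen=...) drops the oldest entry when full (FIFO replacement).
        self.ranges = deque(maxlen=capacity)

    def on_malloc(self, addr, size):
        """Record the whole cache blocks that lie inside a new allocation."""
        bs = self.block_size
        base = -(-addr // bs) * bs          # round start up to a block boundary
        bound = (addr + size) // bs * bs    # round end down to a block boundary
        if base < bound:
            self.ranges.append([base, bound])

    def lookup(self, addr):
        """Return the tracked range containing `addr`, or None."""
        for rng in self.ranges:
            if rng[0] <= addr < rng[1]:
                return rng
        return None
```

A store miss whose address falls inside a tracked range can be treated as an initializing write; how the range is then updated (split or truncate) is a separate policy choice.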

Allocation Working Set

[Figure: percentage of all initializing store misses identified (0% to 100%) versus number of allocation ranges tracked with FIFO replacement (1, 2, 4, 8, 16, 32, 64, 128, 256, >256); one curve each for bzip, craf, eon, gap, gcc, gzip, mcf, pars, perl, twol, vort, vpr.]

- 4-8 entries sufficient, except parser needs 64
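The working-set question can be explored with a simple trace replay. The sketch below is an assumption-laden illustration: `fraction_identified` and the trace format are hypothetical, each allocation is reduced to an identifier, and the trace in the usage note is synthetic rather than benchmark data.

```python
from collections import deque


def fraction_identified(trace, capacity):
    """Fraction of initializing writes whose allocation is still tracked.

    `trace` is a list of (event, alloc_id) pairs, where event is "malloc"
    or "init". Tracking uses FIFO replacement with `capacity` entries.
    """
    tracked = deque(maxlen=capacity)  # oldest allocation is dropped when full
    hits = total = 0
    for event, alloc_id in trace:
        if event == "malloc":
            tracked.append(alloc_id)
        elif event == "init":
            total += 1
            if alloc_id in tracked:
                hits += 1
    return hits / total if total else 0.0
```

For a synthetic trace that allocates `a`, `b`, `c` and then initializes them in the same order, one entry identifies a third of the writes, two entries identify two thirds, and three entries identify all of them.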

Sequential Initialization

[Figure: Initialization Pattern 1 (Sequential) shown beside Tracking Scheme 1 (Forward Sweep) over blocks A to F; legend: Allocated-Invalid, Initialized, Unknown.]

- Forward sweep captures 90%+, except bzip, gzip, perl
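A forward-sweep tracker can be sketched in a few lines. The update rule here is an assumption made for illustration, not taken from the talk: a write to any block still inside the tracked range counts as an identified initializing write, and the range is then truncated to the blocks above the written one. The function name and block indexing (0 = A, 1 = B, ...) are hypothetical.

```python
def forward_sweep_hits(num_blocks, write_order):
    """Count initializing writes identified by a forward-sweep range.

    The range is [base, bound) in block indices. Blocks below `base` are
    treated as unknown once the sweep has passed them.
    """
    base, bound = 0, num_blocks
    hits = 0
    for blk in write_order:
        if base <= blk < bound:
            hits += 1
            base = blk + 1  # truncate: keep only blocks above the write
    return hits
```

Under these assumptions, `forward_sweep_hits(6, [0, 1, 2, 3, 4, 5])` identifies all six writes, while the alternating order `[0, 5, 1, 4, 2, 3]` identifies only the first two, because the write to the last block empties the range.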

Alternating Initialization

[Figure: Initialization Pattern 2 (Alternating; write order shown: A, F, B, E) beside Tracking Scheme 2 (Bidirectional Sweep) over blocks A to F; legend: Allocated-Invalid, Initialized, Unknown.]

- Bidirectional captures 90%+ of perl
- Doesn’t help bzip or gzip
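A bidirectional sweep lets the range shrink from either end. As before, this is a sketch under assumptions rather than the talk's exact policy: a write to the topmost tracked block lowers the bound, and any other write inside the range truncates it to the blocks above the write. The function name is hypothetical.

```python
def bidirectional_sweep_hits(num_blocks, write_order):
    """Count initializing writes identified when the range shrinks from both ends."""
    base, bound = 0, num_blocks  # tracked range is [base, bound)
    hits = 0
    for blk in write_order:
        if not (base <= blk < bound):
            continue  # block is outside the range, so it is not identified
        hits += 1
        if blk == bound - 1:
            bound -= 1      # write at the top: sweep downward
        else:
            base = blk + 1  # write at the bottom or middle: sweep upward
    return hits
```

With six blocks, the alternating order `[0, 5, 1, 4, 2, 3]` is fully identified, while the strided order `[0, 2, 4, 1, 3, 5]` still loses the two writes that land below the advancing base.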

Striding Initialization

[Figure: Initialization Pattern 3 (Striding; write order shown: A, C, E) beside Tracking Scheme 3 (Interleaving) over blocks A to F; legend: Allocated-Invalid, Initialized, Unknown.]

- Interleaving captures 90%+ of gzip
- Still only 60% of bzip
  - Bzip has a large allocation with random initialization
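One way to model interleaved tracking is to run an independent forward sweep per interleaved lane. This is a guess at the idea for illustration only: the lane count `ways` is assumed to be known in advance, and `interleaved_sweep_hits` is a hypothetical name, not something defined in the talk.

```python
def interleaved_sweep_hits(num_blocks, write_order, ways=2):
    """Count initializing writes identified with one forward sweep per lane.

    Block `b` belongs to lane `b % ways`. next_expected[lane] is the lowest
    block of that lane still believed to be allocated and invalid.
    """
    next_expected = list(range(ways))
    hits = 0
    for blk in write_order:
        lane = blk % ways
        if next_expected[lane] <= blk < num_blocks:
            hits += 1
            next_expected[lane] = blk + ways  # advance this lane's sweep
    return hits
```

With two lanes, the strided order `[0, 2, 4, 1, 3, 5]` is fully identified. An irregular order such as `[3, 0, 5, 1, 4, 2]` identifies four of six writes, which is consistent in spirit with random initialization remaining hard to track.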

PharmSim Overview

- Device simulation, etc. from SimOS-PPC [IBM ARL]
- PharmSim replaces functional simulators
  - Full OOO core model, values in rename registers
  - Supports priv. mode, MMU, TLB, exceptions, interrupts, barriers, flushes, etc.
- Lead developer: Trey Cain (thanks Trey!)

[Figure: block diagram. SimOS-PPC (AIX 4.3.1, disk driver, Ethernet driver) with Block, Simple, and Ethernet devices, connected to PharmSim (OOO core, Gigaplane).]

Operating System Effects

- Widely accepted for SPECINT: safe to ignore O/S paths
- Most popular tool (SimpleScalar)
  - Intercepts system calls
  - Emulates on host, updates “flat” memory
  - Returns “magically” with cache contents intact
- We have found that [CAECW2002]:
  - Omitting system references leads to dramatic error (5.8x L2 miss rate, 100% IPC in worst case)
  - Specifically, AIX page fault handler eliminates many initializing write misses
- Had we not used PHARMsim?
  - Dramatically overstated performance benefit

AIX Page Installation

- Heap manager calls sbrk
- Malloc returns block < 4KB
- Program writes to block
- First reference causes page fault
- AIX installs entire page using dcbz

[Figure: data segment (Allocated-Valid) growing into unallocated space.]
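The effect of page installation on later stores can be modeled roughly as follows. The sketch assumes a 4KB page and 64-byte cache blocks, and represents the cache as a plain dictionary; `install_page` and `store_hits` are hypothetical helpers, not AIX or PowerPC interfaces. It relies on the fact that dcbz establishes a zeroed block in the cache without fetching it from memory.

```python
PAGE_SIZE = 4096   # bytes; assumed page size, matching the "< 4KB" block
BLOCK_SIZE = 64    # bytes; assumed cache block size


def install_page(cache, page_base):
    """Zero every block of a page directly in the cache, as a dcbz loop would.

    `cache` is a dict mapping block address -> contents (a stand-in for a
    real cache). No block is fetched from memory.
    Returns the number of blocks installed.
    """
    for addr in range(page_base, page_base + PAGE_SIZE, BLOCK_SIZE):
        cache[addr] = bytes(BLOCK_SIZE)
    return PAGE_SIZE // BLOCK_SIZE


def store_hits(cache, addr):
    """True if a store to `addr` finds its block already in the cache."""
    return (addr // BLOCK_SIZE) * BLOCK_SIZE in cache
```

In this model a single page fault installs 64 blocks, so the program's first writes hit in the cache, but blocks it never touches occupy cache space as well.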

Block vs. Page Installation

- Page installation
  - Practically free as part of page fault
- Shortcomings of page installation
  - Pollutes cache
  - Not scalable to superpages (AIX v5.1)
  - Does not work for heap reuse
- Our short simulations don’t show this benefit
  - I.e. high overlap between initializing writes and first reference to extended data segment

Integrating ARC

Speedup

- Very aggressive core model
  - Still can’t tolerate all store miss latency
- Block mode slightly better than page mode
  - Cache pollution, less coverage

[Figure: bar chart of speedup (0% to 45%) for gap, mcf, and parser; series: block, page.]

Program Phase Behavior

- Only benefits initialization program phase
- Some programs initialize throughout execution
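Why the benefit depends on how much of a run is spent initializing can be seen with an Amdahl-style estimate. This is a back-of-the-envelope model, not the talk's methodology: it assumes that some fraction of execution time is stall on initializing store misses and that this time disappears entirely. Both function names are hypothetical.

```python
def speedup_if_stalls_removed(stall_fraction):
    """Speedup when `stall_fraction` of execution time is eliminated."""
    if not 0.0 <= stall_fraction < 1.0:
        raise ValueError("stall_fraction must be in [0, 1)")
    return 1.0 / (1.0 - stall_fraction)


def stall_fraction_for(speedup):
    """Inverse: fraction of time that must vanish to reach `speedup`."""
    if speedup < 1.0:
        raise ValueError("speedup must be at least 1.0")
    return 1.0 - 1.0 / speedup
```

If a 41% improvement is read as a speedup of 1.41, this model puts the removed stall time at roughly 29% of the original run. A program that initializes only briefly at startup has a much smaller fraction to recover.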

Conclusions

- Initializing writes
  - Cause 23% of all misses in 2MB L2
  - Avoid miss with block or page mode install
  - Up to 41% performance improvement
    - Subject to initialization:computation ratio
- Tracking allocation ranges
  - Working set very small (4-8, 64)
  - Forward/bidirectional/interleaved sweep enables range truncation

Acknowledgments

- Originated as course project: Gordie Bell, Trey Cain, Kevin Lepak
- PHARMsim infrastructure
  - Lead developer: Trey Cain
- Financial and equipment support
  - IBM and Intel Corp
  - National Science Foundation
  - University of Wisconsin

Questions?

Backup Slides

Invalid Memory Traffic

- Real data traffic that transfers invalid data

Initializing Store

- Initial write to a storage location that contains invalid data

[Figure: cache and main memory. A miss on X triggers FETCH X from main memory and WRITEBACK A from the cache. A: deallocated; X: allocated/uninitialized.]

Allocation Analysis

- Single dominant allocation vs. numerous moderate allocations

[Figure: stacked bars (0% to 100%) for gap-count, gap-size, gcc-count, and gcc-size; allocation size classes: >=16MB, <16MB, <256KB, <2KB, <64B.]

Initialization of Heap

- 88% initialized by store miss
- Relatively little temporal reuse of freed memory

[Figure: stacked bars (0% to 100%) for bzip, eon, gcc, and mcf; categories: Uninit, Hit-Init, Miss-Init.]

PharmSim Pipeline

[Figure: pipeline diagram with stages labeled Fetch, Translate, Decode, Execute, Mem, Commit.]

- Substantially similar to IBM Power4
  - Some instructions “cracked” (1:2 expansion)
  - Others (e.g. lmw) microcode stream
- Mem stage
  - Interface to 2-level cache model
  - Sun Gigaplane XB snoopy MP coherence
  - Caches contain values, must remain coherent
- No cheating!
  - No “flat” memory model for reference/redirect

Machine Model

Unrealistically aggressive model to devalue the impact of store misses.

- 8-wide, 6-stage pipeline
- 8K entry combining predictor
- 128 RUU, 64 LSQ entries, 64 write buffers
- 256KB 4-way associative L1D cache
- 64KB 2-way associative L1I
- 2MB 4-way associative L2 unified cache
- All cache blocks are 64 bytes
- L2 latency is 10 cycles
- Memory latency is 70 cycles
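The cache geometry implied by these parameters follows from simple arithmetic. The helper below is a hypothetical convenience, not part of PharmSim; it assumes sizes are powers of two with 1KB = 1024 bytes.

```python
def num_sets(size_bytes, ways, block_bytes=64):
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * block_bytes)


KB = 1024
MB = 1024 * KB

l1d_sets = num_sets(256 * KB, ways=4)   # 1024 sets
l1i_sets = num_sets(64 * KB, ways=2)    # 512 sets
l2_sets = num_sets(2 * MB, ways=4)      # 8192 sets
l2_blocks = 2 * MB // 64                # 32768 blocks in the L2
```

Under these assumptions the 2MB L2 holds 32,768 blocks, so a page-mode install of one 4KB page (64 blocks) touches about 0.2% of it.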
