Generational Stack Collection and Profile-Driven Pretenuring

Perry Cheng, Robert Harper, Peter Lee

Presented By Moti Alperovitch

(moti@nmt.co.il)

The problem

• Some data die young, and some data die old.

• In deeply recursive programs, most of the deep stack unwinds very infrequently.

• Scanning unchanged roots can dominate collection time.

We compare the following collectors

• Semispace copying collection (Cheney).

• Generational collection.

• Generational collection with stack markers.

• Pretenuring with stack markers.

Semispace copy collection

• Scan the stack for roots and copy the data that is reachable from the roots to unused areas (Nursery, Survive).

• Disadvantage:

– All live data is copied, even though some data dies young and some dies old.
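The copying step can be sketched as follows. This is a minimal illustration of Cheney-style copying, not the collector used in the talk; the heap encoding (objects as lists, pointers as `("ptr", index)` tuples) and the name `cheney_collect` are assumptions made for the example.

```python
def cheney_collect(from_space, roots):
    """Copy everything reachable from `roots` into a fresh to-space.

    from_space: list of objects; each object is a list of fields.
    A field is either a plain value or ("ptr", index) into from_space.
    roots: list of from-space indices.
    Returns (to_space, new_roots).
    """
    to_space = []
    forward = {}  # from-space index -> to-space index (forwarding pointers)

    def copy(idx):
        # Copy an object once; later visits reuse the forwarding entry.
        if idx not in forward:
            forward[idx] = len(to_space)
            to_space.append(list(from_space[idx]))
        return forward[idx]

    new_roots = [copy(r) for r in roots]

    scan = 0  # objects before `scan` already have their fields fixed up
    while scan < len(to_space):
        obj = to_space[scan]
        for i, field in enumerate(obj):
            if isinstance(field, tuple) and field[0] == "ptr":
                obj[i] = ("ptr", copy(field[1]))
        scan += 1
    return to_space, new_roots
```

Note that every reachable object is copied on every collection, whatever its age, which is the disadvantage named above.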

Generational collection

• Based on semispace copying collection.

• Arranges the heap into areas according to object lifetime.

• Disadvantages:

– For programs with deep call chains, stack scanning can take a lot of time.

– Long-lived objects are typically copied several times before they are tenured.

Generational stack collection

• Uses stack markers to cache the result of the root scan.

• Disadvantage:

– Long-lived objects are typically copied several times before they are tenured.

Pretenuring

• Make a profiling run to build a lifetime profile of objects according to their allocation site.

TIL Compiler

• Optimizing compiler for Standard ML (SML).

• Intensional polymorphism.

• Nearly tag-free garbage collection.

• Conventional functional-language optimizations.

• Loop optimizations.

Stack Scanning

• At any execution point, data is live if it is accessed as the program continues to execute.

• The collector needs to retain all data that is accessible by following pointers from the roots.

• The roots are registers and stack slots.

Difficulties

• Accurately determining the root set.

• With callee-save registers, the content of a register or stack slot can come from a caller's frame, so stack frames cannot be decoded in isolation.

• With polymorphism, the compiler cannot statically determine whether a value is a pointer or not.

Finding the root

• When the GC is called from the mutator, the return address (RA) indicates the current execution point.

• Using the RA as a key into a table, we can determine the layout of the frame of the GC's caller.

• Continuing this way frame by frame, we can find the roots.

Finding the roots

• Determine the root set starting from the initial frame, scanning downwards.

• The two-way scan is needed because the type of some stack slots depends on a previous stack slot.
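The first pass, walking from the GC's caller through the older frames by return address, might look like the sketch below. The layout is an assumption made for illustration: each frame keeps the return address into its caller in slot 0, index 0 of `stack` is the newest word, and `FRAME_SIZE` is a made-up table with made-up addresses.

```python
# Hypothetical table: return address -> size of the frame active at that point.
FRAME_SIZE = {0x100: 3, 0x200: 2, 0x300: 4}

def walk_frames(stack, sp, ra, bottom_ra=0):
    """Return [(ra, frame_start, frame_size), ...], newest frame first.

    stack: list of words, index 0 is the newest word.
    sp: index where the newest frame starts.
    ra: return address at which the GC was entered.
    """
    frames = []
    while ra != bottom_ra:
        size = FRAME_SIZE[ra]     # the table lookup gives the frame layout
        frames.append((ra, sp, size))
        ra = stack[sp]            # saved return address into the caller (assumed slot 0)
        sp += size                # the caller's frame starts right after this one
    return frames

stack = [0x200, 11, 12,           # newest frame (entered at RA 0x100)
         0x300, 21,               # its caller
         0, 31, 32, 33]           # oldest frame, returns to bottom_ra
frames = walk_frames(stack, 0, 0x100)
```

Reversing `frames` gives an oldest-first order, which is one way to run the second pass, in which slots whose meaning depends on an earlier frame can be resolved.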

Trace table information

• The Return address (RA).

• Stack frame size.

• For each stack slot we record its trace:

– Pointer: the compiler statically determined that it is a pointer.

– Non-pointer: the value is not a root.

– Callee-save + (register): callee-save information.

Trace table information - 2

– Compute: the compiler could not statically determine the pointer status of the value. Additional information records where the type of such a value resides.
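To show how the four trace kinds could be decoded, here is a simplified sketch. It assumes frames are processed oldest first, that a Compute trace names a slot of the same frame holding a type word, and that the type word `PTR_TYPE` means "pointer". The encoding and all names are hypothetical, not the TIL table format.

```python
PTR_TYPE = 1  # assumed encoding of "this value is a pointer" in a type word

def is_pointer(trace, frame_slots, reg_is_ptr):
    """Decide whether a slot (or register) with this trace holds a pointer."""
    if trace == "ptr":
        return True
    if trace == "nonptr":
        return False
    kind, arg = trace
    if kind == "callee":     # holds the caller's value of register `arg`
        return reg_is_ptr.get(arg, False)
    if kind == "compute":    # slot `arg` of the same frame holds the type word
        return frame_slots[arg] == PTR_TYPE
    raise ValueError(trace)

def find_roots(frames):
    """frames: oldest first; each is (slots, entry) where
    entry = {"slots": [trace, ...], "regs": {register: trace}}."""
    reg_is_ptr = {}          # pointer status of registers as seen by the next frame
    roots = []
    for slots, entry in frames:
        for i, trace in enumerate(entry["slots"]):
            if is_pointer(trace, slots, reg_is_ptr):
                roots.append(slots[i])
        updated = dict(reg_is_ptr)
        for reg, trace in entry["regs"].items():
            updated[reg] = is_pointer(trace, slots, reg_is_ptr)
        reg_is_ptr = updated
    return roots
```

The callee-save case is why a frame cannot be decoded in isolation: whether the saved value is a root depends on what the caller kept in that register.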

Stack frames and the corresponding table entry.

[Figure: a stack frame with RA = 0x2001c718 and six slots (Slot 1 to Slot 6), shown next to its trace table entry: RA = 0x2001c718, frame size = 6, then per-slot traces (Non Pointer, Pointer, Pointer, Compute: Stack 4, ..., Compute: Callee $10) and trace information for registers.]

Semispace against generational collection

[Chart: time (ms) for K = 1.5 per program (Checksum, Color, FFT, Grobner, Knuth-Bendix, Lexgen, Life, Peg, PIA, Simple). Series: Semispace, Generational.]

Semispace against generational collection

[Chart: time (ms) for K = 4 per program (Checksum, Color, FFT, Grobner, Knuth-Bendix, Lexgen, Life, Peg, PIA, Simple). Series: Semispace, Generational.]

Semispace against generational collection

[Chart: number of GCs for K = 1.5 per program (Checksum, Color, FFT, Grobner, Knuth-Bendix, Lexgen, Life, Peg, PIA, Simple). Series: Semispace, Generational.]

Semispace against generational collection

[Chart: number of GCs for K = 4 per program (Checksum, Color, FFT, Grobner, Knuth-Bendix, Lexgen, Life, Peg, PIA, Simple). Series: Semispace, Generational.]

Stack marking

• When the stack is deep, scanning the roots can dominate GC time.

• Most of the stack usually does not change between the previous GC and the current GC.

• Marking the stack frames that did not change can significantly speed up root scanning.

Marking the stack - 1st method

• In each stack frame, add a flag that records whether the frame has changed. The collector resets this flag when it passes the frame, and the mutator sets it.

• Disadvantages:

– The mutator is involved in the GC process.

– The compiled code must do extra work for the GC on each return, although most of the time the GC is not running.

Marking the stack - 2nd method

• When scanning the roots, set the RA of every n-th stack frame to a special stub function.

• The stub function holds a table of the original RAs.

• The stub function notes that this frame was deactivated, and continues to the original RA.

Marking the stack - Method 2

• Problems with this method:

– Functions do not always return normally.

– When an exception is raised, the stack is unwound in stack order until there is a matching handler.

– Fortunately, we can maintain a value M, updated on exceptions, that holds the shallowest stack pointer reached as a result of a raised exception.
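The marker scheme, including the watermark M for exceptions, can be simulated in a few lines. This is a toy model under stated assumptions: frames are dictionaries, index 0 is the oldest frame, `STUB` stands in for the stub function's address, and the class and method names are invented for the example.

```python
STUB = "stub"  # stands in for the address of the special stub function

class Stack:
    def __init__(self):
        self.frames = []       # index 0 is the oldest frame
        self.markers = {}      # frame index -> original return address
        self.watermark = None  # M: shallowest frame index resumed by an exception

    def call(self, ra):
        self.frames.append({"ra": ra})

    def ret(self):
        """Normal return: pop the top frame and report where control continues."""
        idx = len(self.frames) - 1
        frame = self.frames.pop()
        if frame["ra"] == STUB:
            # The stub notes that this marker is gone, then uses the real RA.
            return self.markers.pop(idx)
        return frame["ra"]

    def raise_to(self, handler_idx):
        """Exception: unwind to the handler's frame without running any stub."""
        del self.frames[handler_idx + 1:]
        if self.watermark is None or handler_idx < self.watermark:
            self.watermark = handler_idx

    def collect(self, n):
        """Return how many of the oldest frames are unchanged, then re-mark."""
        m = self.watermark
        # Markers above the watermark belonged to frames an exception discarded.
        self.markers = {i: ra for i, ra in self.markers.items()
                        if m is None or i <= m}
        reusable = max(self.markers, default=0)  # frames [0, reusable) are untouched
        for i in range(n, len(self.frames), n):  # mark every n-th frame
            if i not in self.markers:
                self.markers[i] = self.frames[i]["ra"]
                self.frames[i]["ra"] = STUB
        self.watermark = None
        return reusable
```

In this model the collector would rescan only the frames from index `reusable` upward and reuse the cached roots for the older ones.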

Stack marker improvement

[Chart: improvement (%) from stack markers per program (Checksum, Color, FFT, Grobner, KB, Lexgen, Life, Nqueen, Peg, PIA, Simple); vertical axis from -10 to 80.]

Pretenuring

• Use profile data to predict the survival rate of an object.

• We speculate that objects allocated at the same place in the program have similar lifetimes.

• To check this hypothesis, we partition the program's heap allocations by allocation site.

Pretenuring - 2

• The compiler is modified to update a table of allocation sites whenever an object is created.

• During garbage collection the entries are updated.

• We scan the allocation area after each collection to locate dead objects and update the entries of their allocation sites.

Pretenuring - 3

• Using this information we can compute statistics on the number, size and average age of the objects created at each allocation site.

• We include only allocation sites that account for at least 1% of the allocations or 1% of the copied data.
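The 1% filter can be sketched as below. The profile layout and the function name are assumptions for illustration, and "allocations" is read here as the number of allocated objects; the slides do not say whether the share is measured in objects or in bytes.

```python
def significant_sites(profile, threshold=0.01):
    """Keep sites with at least `threshold` of all allocations or of all copied data.

    profile: {site: {"allocs": objects allocated, "copied": bytes copied}}
    Returns the sorted list of site names that pass the filter.
    """
    total_allocs = sum(p["allocs"] for p in profile.values())
    total_copied = sum(p["copied"] for p in profile.values())
    keep = []
    for site, p in profile.items():
        alloc_share = p["allocs"] / total_allocs if total_allocs else 0.0
        copy_share = p["copied"] / total_copied if total_copied else 0.0
        if alloc_share >= threshold or copy_share >= threshold:
            keep.append(site)
    return sorted(keep)
```

A site that allocates rarely but produces most of the copied data still passes the filter, and such sites are the natural candidates for pretenuring.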

The profile results

The results

• The results show that 90% of the allocations have a very short lifetime, while 96-99% of the copied data is generated by 4 sites.

Using the profile data

• Objects created at allocation sites with long lifetimes are allocated directly in the older generation.

• Problem: an object allocated directly in the older generation may hold a reference to an object in the younger generation.

Solutions?

• Allocate that type of object in the young generation.

– May lead to a lot more copying.

• Remember the area of the older generation that holds references to the young generation, and scan it on each minor collection.

– Scanning without copying does not take a lot of time.
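The second solution can be illustrated with a toy minor collection that scans the remembered pretenured area in place. The object model, and the use of a generation flag in place of real copying, are simplifications assumed for this sketch.

```python
class Obj:
    def __init__(self, gen, *fields):
        self.gen = gen              # "young" or "old"
        self.fields = list(fields)  # other Obj instances or plain values

def minor_collect(stack_roots, pretenured_region):
    """Promote young objects reachable from the stack or from pretenured objects.

    pretenured_region: objects allocated directly in the old generation
    since the last collection. They are scanned but never copied.
    Returns the list of objects that were copied (promoted).
    """
    copied = []
    worklist = list(stack_roots)
    for obj in pretenured_region:      # scan in place, no copying
        worklist.extend(obj.fields)
    while worklist:
        item = worklist.pop()
        if isinstance(item, Obj) and item.gen == "young":
            item.gen = "old"           # stands in for copying to the old generation
            copied.append(item)
            worklist.extend(item.fields)
    pretenured_region.clear()          # its young references are now all promoted
    return copied
```

The pretenured object itself never appears in `copied`; only the young objects it refers to are moved, which is why the scan is cheap compared with copying.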

Improvement of pretenuring (ms)

                Generational collection      Generational collection with pretenuring
Program         K=1.5    K=2.0    K=4.0      K=1.5    K=2.0    K=4.0      % Improve
Knuth-Bendix    7.66     8.00     8.07       1.44     1.76     1.88       33
Lexgen          3.20     2.58     2.43       2.63     2.00     1.55       27
Nqueen          1.83     1.86     1.95       13.88    14.03    13.53      50
Simple          5.05     4.81     4.33       3.58     3.74     3.71       12

Improvement of pretenuring (bytes copied)

                Generational collection                  Generational collection with pretenuring
Program         K=1.5        K=2.0        K=4.0          K=1.5        K=2.0        K=4.0          % Improve
Knuth-Bendix    14,569,800   17,869,436   17,695,560     2,050,212    5,376,156    5,151,708      70
Lexgen          27,427,544   18,647,632   16,435,292     24,278,388   15,452,696   13,397,340     18
Nqueen          5,312,548    5,312,548    5,312,548      194,256      194,256      194,256        96
Simple          25,771,348   25,431,144   25,430,248     14,241,500   14,734,176   14,133,376     44

Comparing all the methods

[Chart: comparison per program (Color, Grobner, KB, Lexgen, Life, Nqueen, PIA, Simple); vertical axis from 0 to 120. Series: Generational, Stack Markers, Pretenuring with Stack Marker.]

Conclusion for pretenuring

• The reduction in GC time is smaller than expected from the reduction in data copied.

• Since we still have to scan the younger generations, the cost of a GC is still proportional to the live data (with a smaller constant).

Suggestion to improve the speed

• Apply control-flow and data-flow analysis to objects.

Conclusions

• The generational collector is about twice as fast in GC time. It also improves cache locality, which reduces time further.

• For programs that use a deep stack, caching the root data can improve GC time by up to 74%.

• Profiling the heap can improve speed by 50% in some cases.

The End