Taking Off The Gloves With Reference Counting Immix

Rifat Shahriyar, Xi Yang, Stephen M. Blackburn
Australian National University

Kathryn S. McKinley
Microsoft Research


2

53 Years Ago…

3

The Birth of GC

4

Today…

5

Why Reference Counting?

Advantages
✔ Reclaim as-you-go
✔ Object-local
✔ Basic RC is easy

Disadvantages
✘ Cycles
✘ Performance

[Chart: Total Time v Production for reference counting. Values shown: 40% and 9%. Labels: Backup tracing, <2013, 2013, Our Goal]

6

Why So Slow?

[Chart: Time v Production for RC. Total Time: 9%, Mutator Time: 9%, GC Time: -3%]

7

Looking a Little Deeper…

[Chart: Mutator v Production for RC, MS, SS and Immix. Panels: Time, Instructions Retired, L1 D-Cache Misses. Values as extracted: 9%, 9%, 32%, 6%, 4%, 28%, -2%, -3%, -2%, -3%, -3%, 1%]

8

Free List vs. Bump Pointer

Bump Pointer

Free List

9

Looking a Little Deeper…

[Chart repeated from the earlier "Looking a Little Deeper…" slide, now annotated by allocator: Free List (RC, MS) and Bump Pointer (SS, Immix)]

10

Reference Counting

11

Basic Reference Counting
[Collins 1960]

[Diagram: objects A to F, each annotated with its reference count]
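The idea on this slide can be sketched in a few lines. This is a minimal illustration of naive reference counting, not code from the talk; the names `Obj`, `inc`, `dec`, `write` and `freed` are assumptions made for the example.

```python
class Obj:
    """A heap object with a reference count and outgoing reference fields."""
    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.fields = {}

freed = []  # names of reclaimed objects, in reclamation order

def inc(obj):
    if obj is not None:
        obj.rc += 1

def dec(obj):
    """Decrement; at zero, reclaim the object and decrement everything it references."""
    if obj is None:
        return
    obj.rc -= 1
    if obj.rc == 0:
        freed.append(obj.name)
        for child in obj.fields.values():
            dec(child)
        obj.fields.clear()

def write(src, field, new):
    """Naive write barrier: every pointer store costs one increment and one decrement."""
    old = src.fields.get(field)
    inc(new)                 # increment first so that storing the same object is safe
    src.fields[field] = new
    dec(old)

a, b, c = Obj("A"), Obj("B"), Obj("C")
inc(a)                       # a root reference keeps A alive
write(a, "x", b)
write(b, "y", c)
write(a, "x", None)          # B becomes unreachable, and so does C
```

Reclamation here is immediate and object-local, but a cycle of objects would never reach a count of zero, which is why the deck lists cycles as a disadvantage.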

12

How RC works
Fundamental optimizations

• Backup tracing [Weizenbaum 1969]

– Reclaim cyclic garbage

• Deferral [Deutsch and Bobrow 1976]

– Note changes to stacks & registers occasionally

• Coalescing [Levanoni and Petrank 2001]

– Note only initial and final state of references

13

Deferral
[Deutsch and Bobrow 1976, Bacon et al. 2001]

[Diagram: Stacks & Registers referencing objects A to F, each annotated with its reference count. Animation steps:]
• mutator activity
• GC: scan roots
• GC: apply increments
• GC: apply decrements
• GC: collect
• GC: move deferred decs
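The collection steps listed on this slide can be sketched as follows. This is a simplified model under stated assumptions, not the implementation from the talk: `DeferredRC`, its buffers and `Obj` are hypothetical names, and objects that are never referenced from the heap or roots are not tracked (a real collector needs extra bookkeeping for those).

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.fields = {}

class DeferredRC:
    def __init__(self):
        self.incs = []           # buffered increments from heap writes
        self.decs = []           # buffered decrements from heap writes
        self.deferred_decs = []  # decrements that undo the previous GC's root increments
        self.freed = []

    def write(self, src, field, new):
        """Heap write barrier: log the change instead of touching counts."""
        old = src.fields.get(field)
        if new is not None:
            self.incs.append(new)
        if old is not None:
            self.decs.append(old)
        src.fields[field] = new

    def collect(self, roots):
        root_incs = list(roots)                 # GC: scan roots
        for o in self.incs + root_incs:         # GC: apply increments
            o.rc += 1
        pending = self.decs + self.deferred_decs
        while pending:                          # GC: apply decrements
            o = pending.pop()
            o.rc -= 1
            if o.rc == 0:                       # GC: collect
                self.freed.append(o.name)
                pending.extend(c for c in o.fields.values() if c is not None)
                o.fields.clear()
        self.deferred_decs = root_incs          # GC: move deferred decs
        self.incs, self.decs = [], []

gc = DeferredRC()
a, b = Obj("A"), Obj("B")
stack = [a, b]             # stack and register writes have no barrier
gc.write(a, "x", b)
gc.collect(stack)          # A.rc == 1 (root), B.rc == 2 (root + field)
stack = [a]                # B dropped from the stack, still no barrier
gc.write(a, "x", None)
gc.collect(stack)          # B's count reaches zero and it is reclaimed
```

Root references are counted only temporarily: each collection increments the objects the roots point to and schedules matching decrements for the next collection.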

14

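Coalescing
[Levanoni and Petrank 2001]

[Diagram: object A's reference changes from B to C, D, E and finally F]
• Without coalescing: B--, C++, C--, D++, D--, E++, E--, F++
• Remember A
• Ignore intermediate mutations
• Compare A, Aold
• Apply only B--, F++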
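A minimal sketch of the coalescing barrier described above, assuming a per-epoch log that snapshots an object's fields on its first mutation. The class and method names (`CoalescingBarrier`, `write`, `process`) are hypothetical and chosen for illustration.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.fields = {}

class CoalescingBarrier:
    def __init__(self):
        self.log = {}   # object -> snapshot of its fields before the first mutation

    def write(self, src, field, new):
        # Remember src once per epoch; later writes to it do no extra barrier work
        if src not in self.log:
            self.log[src] = dict(src.fields)
        src.fields[field] = new

    def process(self):
        """Compare each logged object with its snapshot and apply the net changes."""
        incs, decs = [], []
        for obj, old_fields in self.log.items():
            for field in set(old_fields) | set(obj.fields):
                old, new = old_fields.get(field), obj.fields.get(field)
                if old is new:
                    continue
                if new is not None:
                    incs.append(new)
                if old is not None:
                    decs.append(old)
        self.log.clear()
        for o in incs:
            o.rc += 1
        for o in decs:
            o.rc -= 1
        return [o.name for o in incs], [o.name for o in decs]

a, b, c, d, e, f = (Obj(n) for n in "ABCDEF")
a.fields["x"] = b
b.rc = 1                         # A -> B exists before the epoch starts
barrier = CoalescingBarrier()
for target in (c, d, e, f):      # A's field is overwritten four times
    barrier.write(a, "x", target)
incs, decs = barrier.process()   # only the initial and final referents matter
```

Eight count updates in the naive scheme collapse to two, and C, D and E are never touched.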

15

How RC works
Recent Optimizations

• Limited bit count [Shahriyar et al. 2012]

– Use just a few bits; fix overflow with backup tracing

• Elision of new object counts [Shahriyar et al. 2012]

– Only do RC work if object survives to first GC

• Allocate as dead [Shahriyar et al. 2012]

– Avoid free-list work for short lived objects
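The limited bit count idea can be illustrated with a saturating ("sticky") counter that a backup trace later corrects. This is a sketch under assumptions: the 3-bit width, the function names, and the graph encoding as a dict of edge lists are all chosen for the example, not taken from the slides.

```python
BITS = 3                      # assumed width for illustration
STUCK = (1 << BITS) - 1       # 7: the count saturates here and becomes sticky

def inc(rc):
    return rc if rc == STUCK else rc + 1

def dec(rc):
    # A stuck count can no longer be trusted, so it is never decremented
    return rc if rc == STUCK else rc - 1

def backup_trace(roots, edges):
    """Recompute counts by tracing from the roots; unreached objects are garbage."""
    counts, work = {}, []
    for r in roots:
        if r not in counts:
            counts[r] = 0     # root references themselves are not counted here
            work.append(r)
    while work:
        node = work.pop()
        for target in edges.get(node, []):
            if target not in counts:
                counts[target] = 0
                work.append(target)
            counts[target] = inc(counts[target])
    return counts

rc = 0
for _ in range(10):           # a popular object receives ten references
    rc = inc(rc)
stuck_after_incs = rc
for _ in range(10):           # all ten references are later removed
    rc = dec(rc)
stuck_after_decs = rc         # still stuck: RC alone cannot reclaim this object

# The backup trace restores an accurate count; "G" is unreachable and is dropped
counts = backup_trace(["root"], {"root": ["P"], "G": ["P"], "P": []})
```

Overflowed objects stay alive until the next backup trace, which either restores a correct count or finds them unreachable.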

16

How Immix works

• Contiguous allocation into regions
  – 256B lines and 32KB blocks
  – Objects span lines but not blocks
• Simple mark phase
  – Mark objects and containing regions
• Free unmarked regions
• Recycled allocation and defragmentation

[Diagram labels: block, line, recyclable lines, object mark, line mark]
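The line and block geometry above can be made concrete with a little arithmetic. The following is a simplified sketch, assuming objects are described by (offset, size) pairs within one block; the function names are hypothetical, and details of the real collector (such as conservative line marking) are left out.

```python
LINE = 256                        # bytes per line
BLOCK = 32 * 1024                 # bytes per block
LINES_PER_BLOCK = BLOCK // LINE   # 128 lines in each block

def lines_spanned(offset, size):
    """Indices of the lines that an object at [offset, offset + size) touches."""
    assert size > 0 and offset + size <= BLOCK, "objects may span lines but not blocks"
    return range(offset // LINE, (offset + size - 1) // LINE + 1)

def mark_lines(live_objects):
    """Per-line mark bitmap for one block, given its live (offset, size) pairs."""
    marks = [False] * LINES_PER_BLOCK
    for offset, size in live_objects:
        for i in lines_spanned(offset, size):
            marks[i] = True
    return marks

def free_runs(marks):
    """Contiguous runs of unmarked lines as (first_line, count) pairs."""
    runs, start = [], None
    for i, marked in enumerate(marks + [True]):   # sentinel closes a trailing run
        if not marked and start is None:
            start = i
        elif marked and start is not None:
            runs.append((start, i - start))
            start = None
    return runs

marks = mark_lines([(0, 100), (200, 100), (1024, 16)])
holes = free_runs(marks)   # runs of recyclable lines a bump pointer can allocate into
```

Each hole is a contiguous range of free lines, so allocation into a recycled block can still use a bump pointer rather than a free list.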

17

Goal, Challenges, Contributions

18

Goal & Challenges

• Goal
  – Object-local pay-as-you-go collection
  – Excellent mutator locality
  – Copying to eliminate fragmentation
• Immix provides opportunistic copying
  ✔ Same mutator locality as contiguous allocator
• However, RC is inherently local
  – References to an object generally unknown…
  – …but copying must redirect all references

19

Contributions

✔ Identify heap layout as bottleneck for RC
✔ Introduce copying RC (RC Immix)

✔ Exploit Immix’s opportunistic copy
✔ Observe new objects can be copied by first GC
✔ Observe old objects can be copied by backup GC
✔ Line/block reclamation, header bits

✔ Deliver great performance

20

Design of RC Immix

21

Reference Counting in RC Immix

• Reference count for object
• Live object count for line
  – Lines ‘born dead’ (zero live object count)
  – Inc when any object gets first RC increment
  – Dec when any object is dead
• Collect lines with zero live object count

[Diagram: a block whose objects and lines are annotated with counts]
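The per-line live object count can be sketched as a small table indexed by line. This is an illustration only: `LineCounts` and its methods are hypothetical names, and counting a multi-line object against every line it touches is an assumption made for the example.

```python
LINE = 256  # bytes per line

class LineCounts:
    def __init__(self, n_lines):
        self.live = [0] * n_lines        # every line is 'born dead'

    def _lines(self, addr, size):
        return range(addr // LINE, (addr + size - 1) // LINE + 1)

    def first_increment(self, addr, size):
        """Called when an object receives its first RC increment."""
        for i in self._lines(addr, size):
            self.live[i] += 1

    def object_died(self, addr, size):
        """Called when an object's reference count drops to zero."""
        for i in self._lines(addr, size):
            self.live[i] -= 1

    def reclaimable(self):
        """Lines with a zero live object count can be collected."""
        return [i for i, n in enumerate(self.live) if n == 0]

lc = LineCounts(4)
lc.first_increment(0, 64)        # small object in line 0
lc.first_increment(100, 300)     # object spanning lines 0 and 1
before = lc.reclaimable()
lc.object_died(100, 300)
after = lc.reclaimable()
```

Objects that never receive an increment never touch the table, so lines holding only short-lived objects stay reclaimable at no extra cost.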

22

Cycle Collection in RC Immix

• Live object counts zeroed
• Trace marks live objects and lines
  – Corrects incorrect counts (due to cycles)
• Sweep
  – Collects unmarked lines
  – Sweeps dead lines, not dead objects

[Diagram: a block whose objects and lines are annotated with counts]

23

Defragmentation in RC Immix

• RC is object-local, inhibiting copying
• But, RC Immix seizes two opportunities
  1. All references to new objects known at first GC
  2. Backup tracing performs a global trace
• Use opportunistic copying in both cases
  – Mix copying with in-place RC and marking
  – Stop copying when available space exhausted

24

Proactive Defragmentation

• Copy surviving new objects (with bounded reserve)
• Optimization, not for correctness
  – Reserve sized for performance unlike semi-space
• Use past survival rate to predict the future

[Diagram: objects annotated with counts]
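One way to read "use past survival rate to predict the future" is as a sizing rule for the copy reserve. The heuristic below is purely hypothetical (the slides do not give the actual policy): it averages recent survival rates and caps the result at a fixed bound, and the name `copy_reserve` and its parameters are assumptions.

```python
def copy_reserve(allocated_bytes, past_survival_rates, max_reserve_bytes):
    """Hypothetical heuristic: predicted survivors, capped by a bounded reserve."""
    if not past_survival_rates:
        return max_reserve_bytes          # no history yet: fall back to the bound
    predicted = sum(past_survival_rates) / len(past_survival_rates)
    return min(int(allocated_bytes * predicted), max_reserve_bytes)

small = copy_reserve(1000, [0.25, 0.25], 400)   # prediction fits under the bound
capped = copy_reserve(1000, [0.25, 0.75], 400)  # prediction exceeds the bound
```

Because copying is an optimization rather than a correctness requirement, an undersized reserve only means some survivors are left in place.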

25

Reactive Defragmentation

• Backup tracing performs a global trace
• Piggyback on this, copy live objects
• Use available memory threshold
  – If below threshold, do defrag at next cycle GC

26

Methodology

27

Hardware, Software & Benchmarks

• 21 benchmarks
  – DaCapo, SPECjvm98 and pjbb2005
• 20 invocations for each benchmark
• Jikes RVM and MMTk
  – All garbage collectors are parallel
• Intel Core i7 2600K, 4GB
• Ubuntu 10.04.1 LTS

28

Results

29

Bottom Line
Geomean of all benchmarks, versus production

heap size = 2x the minimum heap size

3% improvement over production on geomean

[Chart: Total Time, Mutator Time and GC Time for RC and RC Immix; vertical axis from -35% to 15%]

30

Total Time
By Benchmark

heap size = 2x the minimum heap size

+5% worst case, -25% best case

[Chart: Time for RC and RC Immix by benchmark (faster ← Time → slower); vertical axis from -30% to 40%. Benchmarks: jess, db, javac, mtrt, jack, avrora, bloat, chart, eclipse, fop, hsqldb, jython, luindex, lusearchfix, pmd, sunflow, xalan, pjbb2005, compress]

31

Mutator Time
By Benchmark

heap size = 2x the minimum heap size

+4% worst case, -10% best case

[Chart: Mutator time for RC and RC Immix by benchmark (faster ← Mutator → slower); vertical axis from -15% to 30%. Benchmarks: jess, db, javac, mtrt, jack, avrora, bloat, chart, eclipse, fop, hsqldb, jython, luindex, lusearchfix, pmd, sunflow, xalan, pjbb2005, compress]

32

GC Time
By Benchmark

heap size = 2x the minimum heap size

+5% worst case, -25% best case

[Chart: GC time for RC and RC Immix by benchmark (faster ← GC → slower); vertical axis from -30% to 40%. Benchmarks: jess, db, javac, mtrt, jack, avrora, bloat, chart, eclipse, fop, hsqldb, jython, luindex, lusearchfix, pmd, sunflow, xalan, pjbb2005, compress]

33

Total Time v Heap Size

RC Immix matches GenImmix at 1.3x and outperforms from 1.4x

[Chart: Time / Best (1 to 1.5) against Heap Size / Minimum Heap (1 to 6) for GenImmix, RC, RC Immix and RC Immix (No PC)]

34

Summary and Conclusion

• RC Immix
  – Combines RC and Immix
• Great performance
  – Outperforms fastest production
• Transforms RC

Questions?

[Chart: Total Time v Production. RC 2013: 9%, RC Immix: -3%]

Available at: http://jira.codehaus.org/browse/RVM-1061