Heap Shape Scalability: Scalable Garbage Collection on Highly Parallel Platforms. Kathy Barabash, Erez Petrank, Computer Science Department, Technion, Israel


Page 1: Ayse

Heap Shape Scalability: Scalable Garbage Collection on Highly Parallel Platforms

Kathy Barabash, Erez Petrank
Computer Science Department, Technion, Israel

Page 2: Ayse

ISMM 2010 2

Outline

Is tracing GC ready for many-core?
How is the heap shape related?
Evaluating the heap shape scalability
  Idealized Trace Utilization
Improving the heap shape scalability
  Solution 1: Reshaping with Shortcut References
  Solution 2: Tracing with Speculative Roots
Related work and conclusion

Page 3: Ayse

Is Tracing GC Ready for Many-core?

[Figure: heap with objects a to m and the roots]

GC tracing
  Traverses lots of objects
Sequential trace
  Each live object is touched (BFS, DFS)
Parallel trace
  Load balancing
  1K cores really soon
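A minimal sketch of the sequential trace, assuming the heap is modeled as a dictionary from an object name to the list of objects it references. The names `heap`, `roots` and `trace`, and the toy heap itself, are illustrative and do not come from the slides.

```python
from collections import deque

def trace(heap, roots):
    """Sequential BFS trace: return the set of objects reachable from the roots."""
    marked = set(roots)
    worklist = deque(roots)
    while worklist:
        obj = worklist.popleft()
        # Each live object is touched once: scan its references.
        for ref in heap.get(obj, []):
            if ref not in marked:
                marked.add(ref)
                worklist.append(ref)
    return marked

# Hypothetical toy heap; "x" is not reachable from the root "a".
heap = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "x": ["a"]}
live = trace(heap, ["a"])
print(sorted(live))
```

Swapping the queue for a stack would give a DFS order; the set of marked objects is the same either way.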

Page 4: Ayse

Can Heaps Spoil the Scalability?

[Figure: heap with a single linked list of objects 1, 2, 3, ..., 4K, ..., 4M reachable from the roots]

4M live objects
  Single linked list
Sequential trace
  4M steps
Parallel trace
  Not any faster

Page 5: Ayse

Deep Object Graphs Can be Evil

Definition:
Object Depth
  Length of the minimal path from some root object
Object-Graph Depth
  Maximal live object depth

Example:
[Figure: heap whose objects are labeled with their object depths 0, 1, 2, 3]

How deep are object graphs of Java programs?
  SpecJVM, DaCapo, SpecJBB
  Instrumented BFS trace
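The two definitions can be sketched with a BFS that records the tick at which each object is first reached. This is an illustration only: the dictionary heap model and the names `object_depths` and `graph_depth` are assumptions, not the instrumentation used by the authors.

```python
from collections import deque

def object_depths(heap, roots):
    """BFS from the roots; depth is the length of the shortest path from some root."""
    depth = {r: 0 for r in roots}
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        for ref in heap.get(obj, []):
            if ref not in depth:  # first visit in BFS order gives the minimal path
                depth[ref] = depth[obj] + 1
                queue.append(ref)
    return depth

def graph_depth(heap, roots):
    """Object-graph depth: the maximal depth over all live objects."""
    return max(object_depths(heap, roots).values(), default=0)

# Hypothetical heap: "c" is reachable through both "a" and "b".
heap = {"r": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"], "d": []}
depths = object_depths(heap, ["r"])
print(depths, graph_depth(heap, ["r"]))
```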

Page 6: Ayse

Object-Graph Depths of Java Benchmarks

Name     Description                 Heap Size (MB)  GC Cycles  Max Depth
SpecJVM
  javac  Java compiler run 3 times   32              15         1,234
  mtrt   3D raytracer                32              8          1,416
DaCapo
  bloat  Java byte code analyzer     48              344        1,195
  pmd    Java code analyzer          48              59         18,482
  xalan  Transforms XML into HTML    128             129        8,476
Other 15 benchmarks 128


Page 9: Ayse

Not all Deep Object Graphs are Evil

[Figure: heap with parallel linked lists of objects 1, 2, 3, ..., 4K reachable from the roots]

Object graph
  1K same-sized linked lists of 4K objects
Sequential trace
  4M steps
Parallel trace
  Scales well for up to 1K processors
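The contrast between the single long list and the many parallel lists can be sketched with a small tick counter. The model is an assumption made for illustration: every tracer scans at most one object per tick, and only the current head of each list is available, since its successor is discovered by scanning it. The function name `trace_ticks` is hypothetical, and the sizes are scaled down from the slides' 4M objects so the example runs quickly.

```python
def trace_ticks(num_lists, list_len, tracers):
    """Ticks needed to trace num_lists disjoint linked lists of list_len objects each."""
    remaining = [list_len] * num_lists
    ticks = 0
    while any(remaining):
        # Serve the lists with the most remaining objects first.
        order = sorted(range(num_lists), key=lambda i: -remaining[i])
        for i in order[:tracers]:
            if remaining[i]:
                remaining[i] -= 1  # one tracer scans the head of list i
        ticks += 1
    return ticks

# One list of 4096 objects: extra tracers do not help.
print(trace_ticks(1, 4096, 1), trace_ticks(1, 4096, 64))
# 64 lists of 64 objects (same total): the trace scales with the tracers.
print(trace_ticks(64, 64, 1), trace_ticks(64, 64, 8), trace_ticks(64, 64, 64))
```

Both heaps hold the same number of objects and have deep lists, but only the second offers more than one object to scan per tick.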

Page 10: Ayse

Deep and Narrow Object Graphs are Evil

Definition:
Object Depths Distribution
  Number of objects at different depths

Example:
[Figure: heap annotated with the number of objects per depth (#objects: 2, 4, 3, 1, 1)]

Graphical Representation (Object-graph shape):
[Plot: # objects (0 to 5) vs. depth (1 to 5)]
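A sketch of how an object-depths distribution could be computed and drawn as a text histogram, assuming the same dictionary heap model as a stand-in for a real heap. The function name `depth_distribution` and the example heap are hypothetical and do not reproduce the slide's figure.

```python
from collections import Counter, deque

def depth_distribution(heap, roots):
    """Return {depth: number of live objects at that depth}, ordered by depth."""
    depth = {r: 0 for r in roots}
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        for ref in heap.get(obj, []):
            if ref not in depth:
                depth[ref] = depth[obj] + 1
                queue.append(ref)
    return dict(sorted(Counter(depth.values()).items()))

# Hypothetical heap with two roots.
heap = {"r1": ["a", "b"], "r2": ["b", "c"], "a": ["d"], "b": ["d"],
        "c": [], "d": ["e"], "e": []}
shape = depth_distribution(heap, ["r1", "r2"])
for d, count in shape.items():
    print(f"depth {d}: {'#' * count}")
```

A deep and narrow graph shows up as a long run of depths with very few objects each.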

Page 11: Ayse

Object-Graph Shapes of Java Benchmarks

[Plots: # objects vs. depth; panels: jython, xalan]

Page 12: Ayse

Object-Graph Shapes of Java Benchmarks

[Plots: # objects (log 10) vs. depth (log 10); panels: bloat, javac, mtrt, xalan, pmd, db, hsqldb, antlr, jython, jess, jack, lusearch]

Page 13: Ayse

The Idealized Trace Utilization

Simulate the idealized traversal by N threads
  Perfect load balancing
  Perfect cache behavior
  BFS traversal
  Single time tick object scan
During the traversal, count
  Objects available to be scanned at every time tick
  Processor slots: some are busy and some are wasted
At the end, report the utilization (ITU)
  ITU = (Total Scanned Objects / Total Processor Slots) * 100%
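The simulation described above can be sketched as follows. This is an illustrative model, not the authors' prototype: the heap is a dictionary from object to referents, objects discovered during a tick become available on the next tick, and the function name `idealized_trace_utilization` is an assumption.

```python
from collections import deque

def idealized_trace_utilization(heap, roots, n_tracers):
    """Simulate an idealized BFS trace by n_tracers; return the ITU in percent."""
    available = deque(dict.fromkeys(roots))  # objects ready to be scanned
    marked = set(available)
    scanned = 0
    ticks = 0
    while available:
        ticks += 1
        # Perfect load balancing: every tracer takes one available object, if any.
        batch = [available.popleft() for _ in range(min(n_tracers, len(available)))]
        scanned += len(batch)
        for obj in batch:
            for ref in heap.get(obj, []):
                if ref not in marked:
                    marked.add(ref)
                    available.append(ref)
    if ticks == 0:
        return 0.0  # nothing to trace
    return 100.0 * scanned / (ticks * n_tracers)

# Hypothetical heaps: a chain of 8 objects and a root with 8 children.
chain = {i: [i + 1] for i in range(7)}
chain[7] = []
wide = {"r": list(range(8))}
print(idealized_trace_utilization(chain, [0], 4))
print(idealized_trace_utilization(wide, ["r"], 4))
```

The chain keeps only one of four tracers busy at every tick, while the wide graph wastes slots only on the first tick.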

Page 14: Ayse

Idealized Trace Utilization Example

4 Tracers (Core 1, Core 2, Core 3, Core 4)
Heap objects: 15

Time ticks                    1  2  3  4   5   6   7   8
Scanned objects (cumulative)  2  5  9  11  12  13  14  15

ITU = (Total Scanned Objects / Total Processor Slots) * 100% = 15 / (8 * 4) * 100% = 47%
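The arithmetic of this example can be checked in a few lines. The sketch reads the slide's counters as cumulative scanned-object counts per time tick, which is an interpretation of the extracted numbers; the variable names are illustrative.

```python
# Cumulative scanned objects after each of the 8 time ticks (read from the slide).
cumulative_scanned = [2, 5, 9, 11, 12, 13, 14, 15]
tracers = 4

# Objects scanned in each individual tick.
per_tick = [b - a for a, b in zip([0] + cumulative_scanned, cumulative_scanned)]

total_slots = len(cumulative_scanned) * tracers  # 8 ticks * 4 tracers
busy = sum(per_tick)                             # slots that scanned an object
wasted = total_slots - busy                      # slots with nothing to scan
itu = 100 * busy / total_slots

print(per_tick, busy, wasted, round(itu))
```

Under this reading, more than half of the processor slots are wasted because the last ticks offer a single object each.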

Page 15: Ayse

Graphical Representation

1. Simulate and compute
2. Draw the graph

[Figure: an object-graph shape (# objects vs. depth) next to its utilization plot (utilization, 0 to 100, vs. processors: 1, 2, 4, 8)]

Page 16: Ayse

Worst Case ITU for Java Benchmarks

[Plot: utilization (0 to 100) vs. processors (1, 2, 4, ..., 1024); one curve per benchmark: check, compress, db, jack, javac, jess, mpegaudio, mtrt, antlr, bloat, hsqldb, jython, lusearch, pmd, xalan]

Page 17: Ayse

Average ITU for Java Benchmarks

[Plot: utilization (0 to 100) vs. processors (1, 2, 4, ..., 1024); one curve per benchmark: check, compress, db, jack, javac, jess, mpegaudio, mtrt, antlr, bloat, hsqldb, jython, lusearch, pmd, xalan]

Page 18: Ayse

What’s Next?

Problematic heaps exist
  javac, mtrt, pmd, bloat, xalan
Can we improve the trace scalability without modifying the benchmarks?
  Reshape with Shortcut References
  Trace with Speculative Roots

Page 19: Ayse

Reshape with Shortcut References

[Figure: heap with a single linked list of objects 1, 2, 3, 4, ..., 4K, ..., 16K reachable from the roots]

Sequential trace
  16K steps
New references are added
  Invisible to the program
  Useful for the tracers
Parallel trace
  Scales for 4 processors

Page 20: Ayse

Evaluation Prototype

Devise a shortcut strategy
  Where shortcuts are needed
When the program is stopped for GC
  Compute the Idealized Trace Utilization (ITU)
  Run the shortcut-adding algorithm
  Compute the ITU for the modified heap
Report
  ITU improvement
  Number of shortcuts added

Page 21: Ayse

Shortcut Strategy and Parameters

Identify candidate subgraphs
  With at least size objects
  With depth-to-size ratio no less than ratio
Add shortcut to the root of the subgraph
  Leading to the objects length pointers away
  Next shortcut introduced not closer than distance pointers away

[Figure: chain of objects 1 to 9 illustrating Distance (2) and Length (4); example subgraph with Size=5, Depth=4, Ratio=0.8]
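A rough sketch of the strategy on a single chain, under one possible reading of the parameters: a subgraph is a candidate when it has at least `min_size` objects and a depth-to-size ratio of at least `min_ratio`, each shortcut points `length` pointers ahead, and shortcut sources are spaced `distance` pointers apart. The functions `is_candidate`, `add_shortcuts` and `bfs_depth` are hypothetical names, and the real algorithm works on general subgraphs rather than a Python list.

```python
from collections import deque

def is_candidate(size, depth, min_size, min_ratio):
    """Candidate test on a subgraph's size and depth-to-size ratio."""
    return size >= min_size and depth / size >= min_ratio

def add_shortcuts(chain, length, distance):
    """Return {source: target} shortcuts for a chain given in list order."""
    return {chain[i]: chain[i + length]
            for i in range(0, len(chain) - length, distance)}

def bfs_depth(edges, root):
    """Maximal BFS depth reachable from root."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        obj = queue.popleft()
        for ref in edges.get(obj, []):
            if ref not in depth:
                depth[ref] = depth[obj] + 1
                queue.append(ref)
    return max(depth.values())

chain = list(range(1, 10))  # objects 1..9, as in the slide's figure
program_refs = {a: [b] for a, b in zip(chain, chain[1:])}
shortcuts = add_shortcuts(chain, length=4, distance=2)
# Tracers follow program references plus shortcuts; the program sees only its own.
tracer_refs = {o: program_refs.get(o, []) + ([shortcuts[o]] if o in shortcuts else [])
               for o in chain}
print(shortcuts, bfs_depth(program_refs, 1), bfs_depth(tracer_refs, 1))
```

Keeping the shortcuts in a separate table is one way to model references that are invisible to the program but useful for the tracers.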

Page 22: Ayse

Results for SpecJVM mtrt

~500K live objects
Max shortcuts: 110
Avg shortcuts: 94
Parameters: Size=50, Ratio=0.2, Length=50, Distance=25

[Plot: utilization (0 to 100) vs. processors (1, 2, 4, ..., 1024); series: worst before, worst after, avg before, avg after]

Page 23: Ayse

Results for DaCapo xalan

~400K live objects
Max shortcuts: 888
Avg shortcuts: 536
Parameters: Size=50, Ratio=0.2, Length=50, Distance=25

[Plot: utilization (0 to 100) vs. processors (1, 2, 4, ..., 1024); series: worst before, worst after, avg before, avg after]

Page 24: Ayse

Results for DaCapo bloat

~400K live objects
Max shortcuts: 940
Avg shortcuts: 378
Parameters: Size=50, Ratio=0.2, Length=50, Distance=25

[Plot: utilization (0 to 100) vs. processors (1, 2, 4, ..., 1024); series: worst before, worst after, avg before, avg after]

Page 25: Ayse

Results for DaCapo pmd

~434K live objects
Max shortcuts: 5,874
Avg shortcuts: 432
Parameters: Size=600, Ratio=0.1, Length=120, Distance=40

[Plot: utilization (0 to 100) vs. processors (1, 2, 4, ..., 1024); series: worst before, worst after, avg before, avg after]

Page 26: Ayse

Results for SpecJVM javac

~383K live objects
Max shortcuts: 292
Avg shortcuts: 16
Parameters: Size=500, Ratio=0.1, Length=100, Distance=50

[Plot: utilization (0 to 100) vs. processors (1, 2, 4, ..., 1024); series: worst before, worst after, avg before, avg after]

Page 27: Ayse

Trace with Speculative Roots

[Figure: heap and roots, with labels 4K and 4M]

Sequential trace
  16M steps
Helper tracers
  Pick random roots
  Trace using custom colors
Parallel trace
  Scales for 4 processors

Page 28: Ayse

Speculative Trace

Helper tracer
  Pick a root
  Pick a color, e.g. red
  Trace; if a blue object is discovered, mark blue as reachable from red
Regular trace
  Trace from a root; if a blue object is discovered, mark blue as live
Complete trace
  All colors reachable from live colors are marked live
  All objects marked by live colors survive the collection
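The three phases can be sketched sequentially as below. This is a simplified illustration, not the authors' prototype: helpers run one after another before the regular trace instead of concurrently, speculative roots are passed in explicitly rather than chosen at random, and the names `speculative_trace`, `LIVE` and the color labels are assumptions.

```python
LIVE = "live"  # the color used by the regular tracers

def speculative_trace(heap, roots, speculative_roots):
    color_of = {}  # object -> color that marked it
    reaches = {}   # color -> set of other colors discovered while tracing with it

    def paint(start, color):
        color_of[start] = color
        stack = [start]
        while stack:
            obj = stack.pop()
            for ref in heap.get(obj, []):
                other = color_of.get(ref)
                if other is None:
                    color_of[ref] = color
                    stack.append(ref)
                elif other != color:
                    # Found an object of another color: record it, do not retrace.
                    reaches.setdefault(color, set()).add(other)

    # Helper tracers: one custom color per speculative root.
    for i, start in enumerate(speculative_roots):
        if start not in color_of:
            paint(start, f"c{i}")

    # Regular trace: colors discovered from the real roots are reachable from LIVE.
    for root in roots:
        if root not in color_of:
            paint(root, LIVE)
        elif color_of[root] != LIVE:
            reaches.setdefault(LIVE, set()).add(color_of[root])

    # Completion: every color reachable from a live color is live.
    live_colors, pending = {LIVE}, [LIVE]
    while pending:
        for other in reaches.get(pending.pop(), ()):
            if other not in live_colors:
                live_colors.add(other)
                pending.append(other)

    survivors = {o for o, c in color_of.items() if c in live_colors}
    return survivors, color_of

# Hypothetical heap: live chain r -> a -> b -> c -> d -> e, dead objects x -> y -> d.
heap = {"r": ["a"], "a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": [],
        "x": ["y"], "y": ["d"]}
survivors, colors = speculative_trace(heap, ["r"], ["c", "x"])
floating, _ = speculative_trace(heap, ["r"], ["x"])
print(sorted(survivors), sorted(floating))
```

In the second call the dead object x is the only speculative root, so its color also covers the live objects d and e and the dead objects x and y survive as floating garbage.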

Page 29: Ayse

Evaluation Prototype

Useful helpers' work
  Live objects colored by live colors
Wasted helpers' work
  Dead objects colored by dead colors
Floating garbage
  Dead objects colored by live colors

[Figure: heap with objects a to m]

4 regular tracers, 4 helper tracers
Speculative roots: random unmarked objects
ITU before and after the colored trace
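The three categories can be expressed as a small classification over the helper-colored objects. The sketch assumes the set of truly live objects is known (for example from a separate precise trace); `classify_helper_work` and the example data are hypothetical.

```python
def classify_helper_work(helper_color_of, live_colors, truly_live):
    """Split helper-colored objects into useful work, wasted work and floating garbage."""
    useful, wasted, floating = set(), set(), set()
    for obj, color in helper_color_of.items():
        if color in live_colors:
            # A live color keeps every object it marked, dead or alive.
            (useful if obj in truly_live else floating).add(obj)
        else:
            # Only dead objects can end up under a dead color.
            wasted.add(obj)
    return useful, wasted, floating

# Hypothetical outcome of a colored trace.
helper_color_of = {"d": "red", "e": "red", "x": "red", "p": "blue", "q": "blue"}
useful, wasted, floating = classify_helper_work(
    helper_color_of, live_colors={"red"}, truly_live={"d", "e"})
print(sorted(useful), sorted(wasted), sorted(floating))
```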

Page 30: Ayse

Limit the floating garbage
  Maximal number of objects colored by a single color
  Helpers must save discovered but not traced objects
  Trace completion phase takes care of the saved fronts
Make the random root choices smarter
  To avoid choosing dead objects
  To reach deeper parts of the live object graph
  Filter for the recursive objects
    Objects with referents of their own type

Page 31: Ayse

Results

Lots of floating garbage
  Even with the filter
Hard to find good roots
  Progressively harder as the live objects are getting marked
Trace completion phase is complex
  Can defeat the purpose
Modest improvement in the Idealized Trace Utilization scores

Page 32: Ayse

Results for DaCapo xalan

Worst case ITU improvement, with the random choices filter

[Plot: utilization (0 to 100) vs. processors (1, 2, 4, ..., 1024); series: before, after]

Page 33: Ayse

Results for DaCapo bloat

Worst case ITU improvement, with the random choices filter

[Plot: utilization (0 to 100) vs. processors (1, 2, 4, ..., 1024); series: before, after]

Page 34: Ayse

Related Work

Parallel Garbage Collection Folklore
  There are heap structures that can foil any clever load balancing scheme
Siebert (ISMM’08)
  Reported object graph depths for SpecJVM benchmarks
  Proposed upper bound on the worst case scalability as a way to compute RT guarantees for the GC tracing
Random tracing originally proposed by Click

Page 35: Ayse

Summary

Studied the heap shape properties of Java benchmarks
  Out of twenty considered benchmarks, five had non-scalable heap shapes during the run
Devised a measure to quantify the heap shape scalability
  Idealized Trace Utilization
Proposed, prototyped and evaluated two approaches to improve the tracing scalability
  Reshaping with Shortcuts appears to be more promising than Tracing from Speculative Roots

Page 36: Ayse


Thank You!