A Mostly Non-Copying Real-Time Collector with Low Overhead and Consistent Utilization David Bacon Perry Cheng (presenting) V.T. Rajan IBM T.J. Watson Research

Page 1: A Mostly Non-Copying Real-Time Collector with Low Overhead and Consistent Utilization

David Bacon, Perry Cheng (presenting), V.T. Rajan

IBM T.J. Watson Research

Page 2: Roadmap

– What is Real-time Garbage Collection? Pause Time, CPU Utilization (MMU), and Space Usage
– Heap Architecture: Types of Fragmentation, Incremental Compaction, Read Barriers, Barrier Performance
– Scheduling: Time-Based vs. Work-Based
– Empirical Results: Pause Time Distribution, Minimum Mutator Utilization (MMU), Pause Times
– Summary and Conclusion

Page 3: Problem Domain

– Real-time embedded systems
– Memory usage is important
– Uniprocessor

Page 4: 3 Styles of Uniprocessor Garbage Collection: Stop-the-World vs. Incremental vs. Real-Time

[Figure: execution timelines for the stop-the-world (STW), incremental (Inc), and real-time (RT) collectors; x-axis: time.]

Page 5: Pause Times (Average and Maximum)

[Figure: the timelines of the three collectors annotated with pause lengths.]

– STW: pauses of 1.5 s and 1.7 s; average 1.6 s
– Inc: pauses of 0.5 s, 0.7 s, 0.3 s, 0.5 s, 0.9 s, and 0.3 s; average 0.5 s
– RT: pauses of 0.15 - 0.19 s; average 0.18 s

Page 6: Coarse-Grained Utilization vs. Time

[Figure: utilization (0 to 1) vs. time (0 to 8 s) for STW, Inc, and RT, measured over a 2.0 s window.]

Page 7: Fine-Grained Utilization vs. Time

[Figure: utilization (0 to 1) vs. time (0 to 8 s) for STW, Inc, and RT, measured over a 0.4 s window.]

Page 8: Minimum Mutator Utilization (MMU)

[Figure: MMU (0 to 100) vs. window size (s, logarithmic scale) for STW, Inc, and RT.]
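An MMU curve of this kind can be computed from a trace of collector pauses: for a given window size, MMU is the smallest fraction of any window of that length in which the program (mutator) gets to run. The following is a minimal sketch under stated assumptions, not the authors' tooling: the trace format (a list of non-overlapping `(start, end)` pause intervals on a uniprocessor) and the names `mutator_time` and `mmu` are illustrative.

```python
def mutator_time(pauses, lo, hi):
    """Time within [lo, hi] during which the mutator (not the collector) runs."""
    paused = sum(max(0.0, min(end, hi) - max(start, lo)) for start, end in pauses)
    return (hi - lo) - paused


def mmu(pauses, total, window):
    """Minimum mutator utilization over all windows of length `window`.

    `pauses` is a list of non-overlapping (start, end) collector intervals
    inside an execution of length `total` (hypothetical trace format).
    """
    if window >= total:
        return mutator_time(pauses, 0.0, total) / total
    # Utilization is piecewise linear in the window position, so the minimum
    # is reached when a window edge sits on a pause boundary or a trace end.
    candidates = {0.0, total - window}
    for start, end in pauses:
        for edge in (start, end):
            candidates.add(edge)           # window starts at this edge
            candidates.add(edge - window)  # window ends at this edge
    best = 1.0
    for lo in candidates:
        if 0.0 <= lo <= total - window:
            best = min(best, mutator_time(pauses, lo, lo + window) / window)
    return best
```

Evaluating `mmu` over a range of window sizes gives one curve per collector. A window no longer than the longest pause yields an MMU of zero, which is why pause time alone does not describe real-time behavior.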

Page 9: Space Usage over Time

[Figure: used space (Mb, 0 to 80) vs. time (0 to 8 s) for STW, Inc, and RT, with reference lines at max live, trigger, and 2 X max live.]

Page 10: Problems with Existing RT Collectors

– Not fully incremental
– Tight coupling
– Work-based scheduling

[Figure: space (Mb, 0 to 100) vs. time (0 to 8 s) for a non-moving collector and for a replicating collector, each with reference lines at max live, 2 X max live, 3 X max live, and 4 X max live; plus an MMU panel (0 to 100).]

Page 11: Our Collector

Goals and Results
– Real-time: ~10 ms
– Low space overhead: ~2X
– Good utilization during GC: ~40%

Solution
– Incremental mark-sweep collector
– Write barrier: snapshot-at-the-beginning [Yuasa]
– Segregated free list heap architecture
– Read barrier: to support defragmentation [Brooks]
– Incremental defragmentation
– Segmented arrays: to bound fragmentation
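The snapshot-at-the-beginning write barrier named above can be illustrated with a toy heap. While marking is in progress, every pointer store first marks the object whose reference is about to be overwritten, so everything reachable when the collection started is still found by the collector. This is a minimal sketch assuming a single-threaded toy model; the classes `Obj` and `Heap` and their methods are hypothetical and not taken from the talk.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.fields = {}     # field name -> Obj or None
        self.marked = False


class Heap:
    def __init__(self):
        self.marking = False
        self.mark_stack = []

    def shade(self, obj):
        """Mark an object and queue it for scanning, if not already marked."""
        if obj is not None and not obj.marked:
            obj.marked = True
            self.mark_stack.append(obj)

    def write(self, obj, field, new_value):
        """Pointer store with a snapshot-at-the-beginning barrier."""
        if self.marking:
            # Preserve the target that is about to be overwritten.
            self.shade(obj.fields.get(field))
        obj.fields[field] = new_value

    def mark_step(self, budget):
        """Scan at most `budget` objects; return True when marking is done."""
        while budget > 0 and self.mark_stack:
            obj = self.mark_stack.pop()
            for child in obj.fields.values():
                self.shade(child)
            budget -= 1
        return not self.mark_stack
```

The case the barrier protects against is a program that copies a reference into an already-scanned object and then clears the only other path to it. Without the `shade` call in `write`, the incremental marker would never see the target.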

Page 13: Fragmentation and Compaction

Intuitively: available but unusable memory

– avoidance and coalescing: no guarantees
– compaction

[Figure: a heap with blocks labeled used, needed, and free.]

Page 14: Heap Architecture

Segregated Free Lists
– heap divided into pages
– each page has equally sized blocks (1 object per block)
– large arrays are segmented

[Figure: pages of size classes 24 and 32 with used and free blocks, illustrating external, internal, and page-internal fragmentation.]
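The segregated free list layout can be sketched as follows: each page is dedicated to one block size, and each size class keeps its own free list. This is a simplified model under stated assumptions, not the collector's allocator: the class `SegregatedHeap`, its methods, the 16K `PAGE_SIZE`, and the representation of a block as a `(page, index)` pair are illustrative choices, and segmented large arrays are not modeled.

```python
PAGE_SIZE = 16 * 1024  # bytes; an assumed page size for this sketch


class SegregatedHeap:
    def __init__(self, num_pages, size_classes):
        self.size_classes = sorted(size_classes)
        self.free_pages = list(range(num_pages))
        # One free list of (page, block_index) pairs per size class.
        self.free_lists = {s: [] for s in self.size_classes}

    def size_class_for(self, nbytes):
        """Smallest block size that fits the request."""
        for s in self.size_classes:
            if nbytes <= s:
                return s
        raise ValueError("request exceeds the largest block size")

    def allocate(self, nbytes):
        s = self.size_class_for(nbytes)
        if not self.free_lists[s]:
            if not self.free_pages:
                raise MemoryError("no free page for size class %d" % s)
            page = self.free_pages.pop()
            # Carve the page into equally sized blocks; the unused tail of
            # PAGE_SIZE % s bytes is page-internal fragmentation.
            self.free_lists[s] = [(page, i) for i in range(PAGE_SIZE // s)]
        # The gap s - nbytes inside the block is internal fragmentation.
        return s, self.free_lists[s].pop()

    def free(self, s, block):
        self.free_lists[s].append(block)
```

In this model, external fragmentation shows up as pages that hold only a few live blocks of one size class: their free blocks cannot serve requests for another size class until the page is emptied.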

Page 15: Controlling Internal and Page-Internal Fragmentation

Choose page size ($page$) and block sizes ($s_k$)

– If $s_k = s_{k-1}(1 + \rho)$, internal fragmentation $\le \rho$
– page-internal fragmentation $\le s_{max} / page$

E.g. if $page$ = 16K, $\rho$ = 1/8, and $s_{max}$ = 2K, non-external fragmentation is limited to 12.5%.
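The bounds on this slide can be checked numerically. The sketch below generates block sizes that grow by a factor of (1 + ρ) and measures the worst-case internal and page-internal waste for the example parameters. It is illustrative only: the function names, the smallest block size of 16 bytes, and rounding each size up to a whole byte are assumptions not stated in the talk.

```python
import math


def size_classes(s_min, s_max, rho):
    """Block sizes with s_k = ceil(s_{k-1} * (1 + rho)), capped at s_max."""
    sizes = [s_min]
    while sizes[-1] < s_max:
        sizes.append(min(s_max, math.ceil(sizes[-1] * (1 + rho))))
    return sizes


def worst_internal_fragmentation(sizes):
    """Largest wasted fraction of a block: a request one byte above the
    previous size class is placed in the next larger block."""
    return max((big - (small + 1)) / big for small, big in zip(sizes, sizes[1:]))


def worst_page_internal_fragmentation(sizes, page):
    """Largest fraction of a page left over after carving equal blocks."""
    return max((page % s) / page for s in sizes)


sizes = size_classes(16, 2 * 1024, 1 / 8)
internal = worst_internal_fragmentation(sizes)
page_internal = worst_page_internal_fragmentation(sizes, 16 * 1024)
print(len(sizes), internal, page_internal)
```

Both measured values stay below 1/8 for these parameters. A smaller ρ tightens the internal bound but creates more size classes, and with them more partially filled pages.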

Page 16: Fragmentation - small heap ($\rho$ = 1/8 vs. $\rho$ = 1/2)

[Figure: stacked bars (0 to 1) for the benchmarks db, jack, javac, jess, mtrt, mpegaudio, and compress, one bar for $\rho$ = 1/8 and one for $\rho$ = 1/2, each split into Internal, Page-Internal, External, Recently Dead, and Live.]

Page 17: Incremental Compaction

– Compact only a part of the heap
– Requires knowing what to compact ahead of time

Key Problems
– Popular objects
– Determining references to moved objects

Page 18: Incremental Compaction: Redirection

– Access all objects via per-object redirection pointers
– Redirection is initially self-referential
– Move an object by updating ONE redirection pointer

[Figure: an original object and its replica.]
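The redirection scheme can be sketched in a few lines. Every object carries a redirection pointer that initially refers to itself, every field access goes through it, and moving an object means copying it and updating that one pointer. This is a toy model, not the collector's object layout: the class `Obj` and the helpers `read_field`, `write_field`, and `move` are hypothetical names.

```python
class Obj:
    def __init__(self, **fields):
        self.fields = dict(fields)
        self.redirect = self  # redirection is initially self-referential


def read_field(x, name):
    """Every access goes through the redirection pointer."""
    return x.redirect.fields[name]


def write_field(x, name, value):
    x.redirect.fields[name] = value


def move(original):
    """Evacuate an object: copy it, then update ONE redirection pointer."""
    replica = Obj(**original.fields)
    original.redirect = replica
    return replica
```

After `move`, references to the original that have not been updated yet still behave correctly, because reads and writes through them land on the replica. The stale references can then be fixed up incrementally.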

Page 19: Consistency via Read Barrier [Brooks]

– Correctness requires always using the replica
– E.g. field selection must be modified

    normal access:        x[offset]
    read barrier access:  x[redirect][offset]

Page 20: Some Important Details

– Our read barrier is decoupled from collection
– Complication: in Java, any reference might be null
– The actual read barrier for GetField(x, offset) must be augmented:

    tmp = x[offset];
    return (tmp == null) ? null : tmp[redirect]

– Optimizations: CSE, code motion (LICM and sinking), null-check combining

Barrier Variants - when to redirect
– lazy: easier for collector
– eager: better for optimization
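The two barrier variants can be contrasted in a small sketch. It assumes the usual reading of the terms: the eager barrier forwards a reference as soon as it is loaded (and so needs the null check shown on this slide), while the lazy barrier forwards only when the reference is dereferenced. The class `Obj` and the function names are hypothetical, and Python's `None` stands in for Java's `null`.

```python
class Obj:
    def __init__(self, **fields):
        self.fields = dict(fields)
        self.redirect = self  # self-referential until the object is moved


def get_field_eager(x, name):
    """Eager barrier: x is assumed to be forwarded already, so read it
    directly and forward the loaded reference before returning it."""
    tmp = x.fields[name]
    return None if tmp is None else tmp.redirect


def get_field_lazy(x, name):
    """Lazy barrier: forward x at the point of use; the loaded reference
    may still point at an old copy and is forwarded when it is used."""
    return x.redirect.fields[name]
```

With the eager variant, references held in locals always point at the current copy, so repeated uses of the same reference need no further redirection. That is the property that lets the compiler optimizations listed on the slide remove barrier work.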

Page 21: Barrier Overhead to Mutator

– Conventional wisdom says read barriers are too expensive
– Studies found overhead of 20-40% (Zorn, Nielsen)
– Our barrier has 4-6% overhead with optimizations

[Figure: barrier overhead (axis 0 to 12) of the lazy and eager barriers for compress, jess, db, javac, mpegaudio, mtrt, jack, and their geometric mean.]

Page 22: Program Start

[Figure: the stack and the heap (one size only).]

Page 23: Program is allocating

[Figure: stack and heap; heap blocks are free or allocated.]

Page 24: GC starts

[Figure: stack and heap; heap blocks are free or unmarked.]

Page 25: Program allocating and GC marking

[Figure: stack and heap; heap blocks are free, unmarked, or marked or allocated.]

Page 26: Sweeping away blocks

[Figure: stack and heap; heap blocks are free, unmarked, or marked or allocated.]

Page 27: GC moving objects and installing redirection

[Figure: stack and heap; heap blocks are free, allocated, or evacuated.]

Page 28: 2nd GC starts tracing and redirection fixup

[Figure: stack and heap; heap blocks are free, unmarked, evacuated, or marked or allocated.]

Page 29: 2nd GC complete

[Figure: stack and heap; heap blocks are free or allocated.]

Page 31: Scheduling the Collector

Scheduling Issues
– bad CPU utilization and space usage
– loose program and collector coupling

Time-Based
– Trigger the collector to run for $C_T$ seconds whenever the program runs for $Q_T$ seconds

Work-Based
– Trigger the collector to collect $C_W$ work whenever the program allocates $Q_W$ bytes
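The difference between the two policies shows up in a back-of-the-envelope model of mutator utilization while a collection is in progress. Under time-based scheduling the ratio is fixed by the two quanta; under work-based scheduling it also depends on how fast the program allocates. This sketch assumes steady-state behavior; the function names and the `alloc_rate` and `collect_rate` parameters are illustrative and do not come from the talk.

```python
def time_based_utilization(q_t, c_t):
    """Mutator share of the CPU when the collector runs for c_t seconds
    after every q_t seconds of program execution."""
    return q_t / (q_t + c_t)


def work_based_utilization(alloc_rate, q_w, c_w, collect_rate):
    """Mutator share when the collector processes c_w bytes after every
    q_w bytes allocated.

    alloc_rate:   bytes allocated per second of mutator time (assumed)
    collect_rate: bytes processed per second of collector time (assumed)
    """
    mutator_time = q_w / alloc_rate
    collector_time = c_w / collect_rate
    return mutator_time / (mutator_time + collector_time)


# Illustrative numbers only: a tenfold jump in allocation rate leaves the
# time-based figure unchanged but cuts the work-based one sharply.
print(time_based_utilization(0.010, 0.010))
print(work_based_utilization(1e6, 1e5, 2e5, 4e6))
print(work_based_utilization(1e7, 1e5, 2e5, 4e6))
```

The trade is that time-based scheduling gives predictable utilization but lets space usage grow with the allocation rate, while work-based scheduling bounds space at the cost of utilization during allocation bursts.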

Page 32: Time-Based Scheduling

Trigger the collector to run for $C_T$ seconds whenever the program runs for $Q_T$ seconds

[Figure, two panels: space (Mb, 0 to 100) vs. time (s) for Smooth Alloc, Uneven Alloc, and High Alloc; MMU (CPU utilization, 0 to 1) vs. window size (s), one curve labeled Any.]

Page 33: Work-Based Scheduling

Trigger the collector to collect $C_W$ bytes whenever the program allocates $Q_W$ bytes

[Figure, two panels: MMU (CPU utilization, 0 to 1) vs. window size (s) for Smooth Alloc, Uneven Alloc, and High Alloc; space (Mb, 0 to 100) vs. time (s), one curve labeled Any.]

Page 35: Pause Time Distribution for javac (Time-Based vs. Work-Based)

[Figure: pause time distributions for the two schedulers; both panels are annotated with 12 ms.]

Page 36: Utilization vs. Time for javac (Time-Based vs. Work-Based)

[Figure: utilization (0 to 1.0) vs. time (s) for the two schedulers, with an annotation at 0.45.]

Page 37: Minimum Mutator Utilization for javac (Time-Based vs. Work-Based)

Page 38: Space Usage for javac (Time-Based vs. Work-Based)

Page 39: Intrinsic Tradeoff

3 inter-related factors:
– Space bound (tradeoff)
– Utilization (tradeoff)
– Allocation rate (lower is better)

Other factors:
– Collection rate (higher is better)
– Pointer density (lower is better)

Page 40: Summary: Mostly Non-moving RT GC

Read Barriers
– Permit incremental defragmentation
– Overhead is 4-6% with compiler optimizations

Low Space Overhead
– Space usage is only about 2 X max live data
– Fragmentation still bounded

Consistent Utilization
– Always at least 45% at 12 ms resolution

Page 41: Conclusions

– Real-time GC is real
– There are tradeoffs just like in traditional GC
– Scheduling should be primarily time-based
– Fallback to work-based due to user's incorrect parameter estimations
– Incremental defragmentation is possible
– Compiler support is important!

Page 42: Future Work

Lowering the real-time resolution
– Sub-millisecond worst-case pause
– Main issue: breaking up stack scan

Segmented array optimizations
– Reduce segmented array cost below ~2%
– Opportunistic contiguous layout
– Type-based specialization with invalidation
– Strip-mining