A REAL-TIME GARBAGE COLLECTOR WITH LOW OVERHEAD AND CONSISTENT UTILIZATION

David F. Bacon, Perry Cheng, and V.T. Rajan
IBM T.J. Watson Research Center

Presented by Srilakshmi Swati Pendyala

Outline

- Motivation
- Introduction & Previous Works
- Overview of the Proposed Garbage Collector
- Example of the Collection Process
- Scheduling: Time-Based vs. Work-Based
- Experimental Results
- Conclusion

Motivation

- Real-time systems are growing in importance
  - ATMs, PDAs, web servers, points of sale, etc.
- Constraints for real-time systems:
  - Hard constraints for continuous performance (low pause times)
  - Memory constraints (less memory in embedded systems)
  - Other constraints?
- Need for a real-time garbage collector with low memory usage

Garbage Collection in Real-time Systems

- Maximum pause time < required response time
- CPU utilization sufficient to accomplish the task
  - Measured with Minimum Mutator Utilization (MMU)
- Memory requirement < resource limit
  - Important constraint in embedded systems

Problems with Previous Works

- Fragmentation
  - Early works (Baker's Treadmill) handle a single object size
    - Not suitable for modern languages
  - Fragmentation is not a major problem for a family of C and C++ benchmarks (Johnstone's paper)
    - Not valid for long-running programs (web servers, embedded systems, etc.)
  - Use of a single (large) block size
    - Increase in memory requirements
    - Leads to internal fragmentation

Problems with Previous Works

- High space overhead
  - Copying algorithms avoid fragmentation
  - But copying leads to high space overhead
- Uneven mutator utilization
  - Mutator utilization: the fraction of the processor devoted to mutator execution
  - Several copying algorithms suffer from poor or uneven mutator utilization
  - Long low-utilization periods render the mutator unsuitable for real-time applications
- Inability to handle large data structures
  - When collecting a subset of the heap at a time, large structures generated by adversarial mutators force unbounded work


Components and Concepts in Proposed GC

- Segregated free list allocator
  - Geometric size progression limits internal fragmentation
- Mostly non-copying
  - Objects are usually not moved
- Defragmentation
  - Moves objects to a new page when a page is fragmented due to GC
- Read barrier: to-space invariant [Brooks]
  - New techniques with only 4% overhead
- Incremental mark-sweep collector
  - Mark phase fixes stale pointers
- Arraylets: bound fragmentation, large object ops
- Time-based scheduling

(The slide marks each item as either New or Old.)

Segregated Free List Allocator

- Heap divided into fixed-size pages
- Each page divided into fixed-size blocks
- Objects allocated in smallest block that fits

[Figure: pages divided into blocks, labeled 24, 16, and 12]
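The allocation rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the collector's actual allocator: the class name `SegregatedAllocator`, the 96-byte page size, and the `(page, offset)` block representation are assumptions made for the example. The numbers 12, 16, and 24 from the slide's figure are used as block sizes.

```python
class SegregatedAllocator:
    """Toy segregated free-list allocator: one free list per block size."""

    def __init__(self, block_sizes, page_size=96):  # page_size is hypothetical
        self.block_sizes = sorted(block_sizes)
        self.page_size = page_size
        self.free_lists = {size: [] for size in self.block_sizes}
        self.pages = []  # pages[i] is the block size that page i is divided into

    def _add_page(self, size):
        # Dedicate a fresh page to one block size and put all its blocks on the free list.
        page = len(self.pages)
        self.pages.append(size)
        for offset in range(0, self.page_size - size + 1, size):
            self.free_lists[size].append((page, offset))

    def allocate(self, request):
        # Use the smallest block size that fits the request.
        for size in self.block_sizes:
            if size >= request:
                if not self.free_lists[size]:
                    self._add_page(size)
                return size, self.free_lists[size].pop()
        raise ValueError("request larger than the largest block size")

    def free(self, size, block):
        # Return a block to the free list of its size class.
        self.free_lists[size].append(block)


allocator = SegregatedAllocator([12, 16, 24])
size, block = allocator.allocate(13)  # a 13-byte request lands in a 16-byte block
```

In this sketch a page is only created when a size class first needs one, so unused size classes cost no memory.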

Limiting Internal Fragmentation

- Choose page size $P$ and block sizes $s_k$ such that $s_k = s_{k-1}(1 + \rho)$
- How do we choose $s_0$ and $\rho$?
  - $s_0$ ~ minimum block size
  - $\rho$ ~ sufficiently small to avoid internal fragmentation
- Too small a $\rho$ leads to too many size classes, hence too many partly used pages and wasted space, but this should be okay for long-running processes
- Too large a $\rho$ leads to internal fragmentation
- Memory for a page should be allocated only when there is at least one object in that page
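The effect of the geometric progression can be checked numerically. The sketch below builds the size classes and measures the worst wasted fraction of a block over all integer request sizes. The values 16 for the smallest block and 1/8 for ρ are illustrative assumptions, the function names are hypothetical, and alignment rounding is ignored. For a request x that falls between two consecutive sizes, the wasted fraction of the block is below ρ/(1+ρ), which is itself below ρ.

```python
import bisect


def size_classes(s0, rho, max_size):
    """Block sizes s_k = s_{k-1} * (1 + rho), starting at s0, until max_size is covered."""
    sizes = [float(s0)]
    while sizes[-1] < max_size:
        sizes.append(sizes[-1] * (1 + rho))
    return sizes


def block_for(request, sizes):
    """Smallest block size that fits the request."""
    return sizes[bisect.bisect_left(sizes, request)]


def worst_waste_fraction(sizes):
    """Largest wasted fraction of a block, taken over all integer request sizes."""
    worst = 0.0
    for request in range(int(sizes[0]), int(sizes[-1]) + 1):
        block = block_for(request, sizes)
        worst = max(worst, (block - request) / block)
    return worst


sizes = size_classes(16, 0.125, 4096)
print(block_for(17, sizes))                 # a 17-byte request uses the 18-byte class
print(worst_waste_fraction(sizes) < 0.125)  # waste stays below rho
```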

Defragmentation

- When do we move objects?
  - At the end of the sweep phase, when there are not enough free pages for the mutator to execute, that is, when there is fragmentation
- Usually, programs exhibit locality of size
  - Dead objects are re-used quickly
- Defragment either when
  - Dead objects are not re-used for a GC cycle
  - Free pages fall below the limit for performing a GC
- In practice: we move 2-3% of data traced
  - Major improvement over a copying collector
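One way to picture why so little data has to move is to compact within a single size class, emptying the sparsest pages into the free blocks of the densest ones. The sketch below is a hypothetical policy written for illustration only; the slides do not spell out how pages are chosen, and the function name and the per-page live counts are assumptions.

```python
def defragment(live_counts, blocks_per_page):
    """Plan a compaction for one size class.

    live_counts[i] is the number of live objects on page i.
    Returns (objects_moved, empty_pages_afterwards).
    """
    counts = sorted(live_counts, reverse=True)  # densest pages first
    dst, src = 0, len(counts) - 1
    moved = 0
    while dst < src:
        room = blocks_per_page - counts[dst]
        if room == 0:
            dst += 1  # destination page is full, try the next densest page
            continue
        take = min(room, counts[src])
        counts[dst] += take
        counts[src] -= take
        moved += take
        if counts[src] == 0:
            src -= 1  # source page is now empty and can be released
    empty = sum(1 for count in counts if count == 0)
    return moved, empty


# Four pages of 8 blocks each: moving one object frees a whole page.
print(defragment([1, 7, 2, 8], 8))
```

In this example only 1 of the 18 live objects is moved, which is the kind of saving a mostly non-copying design aims for.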

Read Barrier: To-space Invariant

- Problem: the collector moves objects (defragmentation) and is finely interleaved with the mutator
- Solution: read barrier ensures consistency
  - Each object contains a forwarding pointer [Brooks]
  - Read barrier unconditionally forwards all pointers
  - Mutator never sees old versions of objects
- Will the read barrier have any effect on mutator utilization?

[Figure: from-space and to-space with objects A, X, Y, and Z, shown BEFORE and AFTER A is copied to A′]
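A forwarding-pointer read barrier in the style attributed to Brooks on this slide can be sketched as follows. This is a simplified model, not the collector's compiled barrier: the `Obj` class, the `fields` dictionary, and the helper names are assumptions made for the example. Every object carries a forwarding pointer that initially refers to the object itself, and every access goes through it without a conditional test.

```python
class Obj:
    def __init__(self, **fields):
        self.forward = self        # forwarding pointer, initially the object itself
        self.fields = dict(fields)


def read_barrier(ref):
    """Unconditionally follow the forwarding pointer."""
    return ref.forward


def load_field(ref, name):
    return read_barrier(ref).fields[name]


def store_field(ref, name, value):
    read_barrier(ref).fields[name] = value


def move(obj):
    """Copy an object and leave a forwarding pointer in the old copy."""
    copy = Obj(**obj.fields)
    obj.forward = copy
    return copy


a = Obj(x=1)
a_prime = move(a)          # the collector relocates A to A'
store_field(a, "x", 2)     # a stale reference still reaches the new copy
print(load_field(a, "x"))  # reads the current value through the barrier
```

Because the barrier has no branch, its cost is one extra load per access, which is what the later optimizations try to reduce.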

Read Barrier Optimization

- Previous studies: 20-40% overhead [Zorn, Nielsen]
- Several optimizations applied to the read barrier reduce its overhead to <10% using eager read barriers
- "Eager" read barrier preferred over "lazy" read barrier

Incremental Mark-Sweep

- Mark/sweep finely interleaved with mutator
- Write barrier: snapshot-at-the-beginning [Yuasa]
  - Ensures no lost objects
  - Must treat objects in write buffer as roots
- Read barrier ensures consistency
  - Marker always traces correct object
- With barriers, interleaving is simple
- Do the problems inherent to mark-sweep also apply here?
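The role of the snapshot-at-the-beginning write barrier can be shown with a small incremental marker. This is a sketch under simplifying assumptions: the `Obj` and `IncrementalMarker` classes, the `budget` parameter, and the dictionary of fields are hypothetical. While marking is active, the barrier logs every pointer that is about to be overwritten, and the marker treats the logged objects as additional roots.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.fields = {}      # field name -> Obj or None
        self.marked = False


class IncrementalMarker:
    def __init__(self):
        self.marking = False
        self.work = []          # objects still to be scanned
        self.write_buffer = []  # overwritten pointers logged by the write barrier

    def start(self, roots):
        self.marking = True
        self.work = list(roots)

    def write(self, obj, field, value):
        """Write barrier: log the old pointer before overwriting it."""
        old = obj.fields.get(field)
        if self.marking and old is not None:
            self.write_buffer.append(old)
        obj.fields[field] = value

    def step(self, budget):
        """Process at most `budget` work items; return True once marking has finished."""
        for _ in range(budget):
            self.work.extend(self.write_buffer)  # write buffer entries act as roots
            self.write_buffer.clear()
            if not self.work:
                self.marking = False
                return True
            obj = self.work.pop()
            if not obj.marked:
                obj.marked = True
                self.work.extend(v for v in obj.fields.values() if v is not None)
        return False


root, a, b = Obj("root"), Obj("a"), Obj("b")
root.fields["a"] = a
a.fields["b"] = b
marker = IncrementalMarker()
marker.start([root])
marker.step(1)                 # root is scanned, a is still pending
marker.write(root, "b", b)     # mutator hides b behind the already scanned root
marker.write(a, "b", None)     # the barrier logs b before the pointer is erased
while not marker.step(1):
    pass
print(b.marked)                # b survives thanks to the write buffer
```

Without the logging in `write`, `b` would be reachable only from an object that was already scanned and would be lost.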

Pointer Fix-up During Mark

- When can a moved object be freed?
  - When there are no more pointers to it
- Mark phase updates pointers
  - Redirects forwarded pointers as it marks them
- Object moved in collection n can be freed:
  - At the end of the mark phase of collection n+1

[Figure: from-space and to-space with objects A, X, Y, Z and the moved copy A′]
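The fix-up can be folded into an ordinary mark loop: whenever the marker follows a pointer, it first replaces it with the forwarded target. The sketch below uses hypothetical names (`Obj`, `refs`, `mark`) and a non-incremental loop for brevity. After the pass, nothing reachable refers to the old copy, so it is left unmarked and can be reclaimed.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.forward = self   # forwarding pointer, initially the object itself
        self.refs = []        # outgoing pointers
        self.marked = False


def mark(roots):
    """Mark reachable objects, redirecting every pointer to its forwarded target."""
    roots[:] = [r.forward for r in roots]
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj.marked:
            continue
        obj.marked = True
        obj.refs = [r.forward for r in obj.refs]  # fix stale pointers while marking
        stack.extend(obj.refs)


# X, Y, and Z point to A, which was moved to A' during the previous collection.
a, a_prime = Obj("A"), Obj("A'")
a.forward = a_prime
x, y, z = Obj("X"), Obj("Y"), Obj("Z")
for holder in (x, y, z):
    holder.refs.append(a)

roots = [x, y, z]
mark(roots)
print(all(h.refs[0] is a_prime for h in (x, y, z)))  # every pointer now refers to A'
print(a.marked)                                      # the old copy was never marked
```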

Arraylets

- Large arrays create problems
  - Fragment memory space
  - Cannot be moved in a short, bounded time
- Solution: break large arrays into arraylets
  - Access via indirection; move one arraylet at a time

[Figure: an array split into arraylets A1, A2, A3]
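The indirection can be sketched as an array whose elements live in fixed-size chunks reached through a table of chunk pointers. The class name `ArrayletArray`, the name `spine` for that table, and the chunk size are assumptions made for this example; for simplicity the last arraylet is allocated at full size even when the array does not fill it.

```python
class ArrayletArray:
    """Array stored as fixed-size arraylets reached through a pointer table."""

    def __init__(self, length, arraylet_size):
        self.length = length
        self.arraylet_size = arraylet_size
        count = (length + arraylet_size - 1) // arraylet_size  # ceiling division
        self.spine = [[0] * arraylet_size for _ in range(count)]

    def _locate(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        return divmod(index, self.arraylet_size)  # (arraylet number, offset inside it)

    def get(self, index):
        chunk, offset = self._locate(index)
        return self.spine[chunk][offset]

    def set(self, index, value):
        chunk, offset = self._locate(index)
        self.spine[chunk][offset] = value

    def move_arraylet(self, chunk):
        """Relocate one arraylet: copy it and update only its entry in the table."""
        self.spine[chunk] = list(self.spine[chunk])


array = ArrayletArray(10, 4)  # 10 elements in 3 arraylets of 4 slots each
array.set(9, 7)
array.move_arraylet(2)        # bounded work: copies at most 4 slots
print(array.get(9))           # the element is still reachable after the move
```

Each move copies at most one arraylet, so the work per step is bounded by the arraylet size rather than by the array length.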


Example of the Collection Process

[Figure sequence: each slide shows the stack and a heap with one block size only; the legend entries visible on each slide are listed in parentheses]

1. Program start
2. Program is allocating (free, allocated)
3. GC starts (free, unmarked)
4. Program allocating and GC marking (free, unmarked, marked or allocated)
5. Sweeping away blocks (free, unmarked, marked or allocated)
6. GC moving objects and installing redirection (free, allocated, evacuated)
7. 2nd GC starts tracing and redirection fixup (free, unmarked, evacuated, marked or allocated)
8. 2nd GC complete (free, allocated)


Scheduling the Collector

- Scheduling issues
  - Bad CPU utilization and space usage
  - Loose program and collector coupling
- Time-based
  - Trigger the collector to run for $C_T$ seconds whenever the mutator runs for $Q_T$ seconds
- Work-based
  - Trigger the collector to perform $C_W$ work whenever the mutator allocates $Q_W$ bytes
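For the time-based rule, the worst-case mutator utilization over a window can be written down directly. The sketch below assumes an idealized schedule that strictly alternates a mutator quantum `q_t` with a collector quantum `c_t` (the two time quanta named on the slide); the function name is hypothetical. The worst window begins exactly when a collector quantum starts, so it contains some whole periods plus a remainder whose first `c_t` time units belong to the collector.

```python
def time_based_mmu(q_t, c_t, window):
    """Worst-case mutator utilization over any interval of length `window`,
    for a schedule that strictly alternates q_t of mutator and c_t of collector."""
    period = q_t + c_t
    full_periods = window // period
    remainder = window - full_periods * period
    # The worst window starts with a collector quantum, so only the part of the
    # remainder that extends beyond c_t is available to the mutator.
    mutator_time = full_periods * q_t + max(0, remainder - c_t)
    return mutator_time / window


for window in (10, 20, 25, 1000):
    print(window, time_based_mmu(10, 10, window))
```

For long windows the value approaches q_t / (q_t + c_t), and it does not depend on how the program allocates, which is why the utilization is predictable.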

Scheduling

- Time-based
  - Very predictable mutator utilization
  - Memory allocation does not need to be monitored
- Work-based
  - Uneven mutator utilization due to bursty allocation
  - Memory allocation rates need to be monitored to make sure real-time performance is obtained

Why is time-based scheduling better in terms of mutator utilization? (Shown analytically and experimentally in the paper.)
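The sensitivity of work-based scheduling to bursty allocation can be illustrated with a tick-level simulation. Everything here is a toy model: the function names, the tick granularity, and the allocation traces are assumptions, and the collector increment is modeled as a fixed number of ticks per `q_w` bytes allocated.

```python
def work_based_timeline(alloc_per_tick, q_w, c_ticks):
    """Return a list of 'M' (mutator) and 'G' (collector) ticks.

    Each mutator tick allocates some bytes; every q_w bytes allocated
    triggers c_ticks of collector work immediately afterwards.
    """
    timeline = []
    debt = 0
    for alloc in alloc_per_tick:
        timeline.append("M")
        debt += alloc
        while debt >= q_w:
            debt -= q_w
            timeline.extend("G" * c_ticks)
    return timeline


def min_utilization(timeline, window):
    """Smallest fraction of mutator ticks over all windows of the given length."""
    lowest = 1.0
    for start in range(len(timeline) - window + 1):
        mutator_ticks = timeline[start:start + window].count("M")
        lowest = min(lowest, mutator_ticks / window)
    return lowest


steady = [100] * 8                    # 800 bytes allocated evenly
bursty = [0, 0, 0, 800, 0, 0, 0, 0]   # the same 800 bytes in one burst
print(min_utilization(work_based_timeline(steady, 100, 1), 4))
print(min_utilization(work_based_timeline(bursty, 100, 1), 4))
```

Both traces allocate the same amount and trigger the same total collector work, but the burst forces the collector increments to run back to back, so the mutator gets no time at all in some windows.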


Experimental Results

[Figure: Pause Time Distribution for javac (Time-Based vs. Work-Based); annotation: 12 ms]

[Figure: Utilization vs. Time for javac (Time-Based vs. Work-Based); annotation: 0.45]

[Figure: Minimum Mutator Utilization for javac (Time-Based vs. Work-Based)]

[Figure: Space Usage for javac (Time-Based vs. Work-Based)]

Conclusions

- The Metronome provides true real-time GC
  - First collector to do so without major sacrifice
    - Short pauses (4 ms)
    - High MMU during collection (50%)
    - Low memory consumption (2x max live)
- Critical features
  - Time-based scheduling
  - Hybrid, mostly non-copying approach
  - Integration with the compiler