Hans-J. Boehm, Alan J. Demers, Scott Shenker
Presented by Kit Cischke

Outline

- Introduction
  - Basics of garbage collection revisited
  - How do you make a GC for non-GC languages? Oh, and making it parallel would be nice. Or at least mostly parallel.
- The Basic Idea
  - Virtual dirty bits to find the reachable set
- Sweeping Doesn’t Matter
  - Try telling that to your mother. For performance, the sweep step is practically ignorable.
- Formalisms
  - Let’s introduce some notation and concepts for how this should work.

Outline, Part Deux

- Implementation Choices
  - Based on our formalisms, which is the best combination to actually use?
- Brief Results
  - Really, the test hardware was a SPARCstation configured with as little as 10MB of RAM!
- Mostly Parallel Copying Collectors
  - This is a mark-sweep paper, mostly. Could you build a copy collector?

Onwards!

GC Taxonomy and Our Choices

Garbage collectors may be reference-counting or tracing based. The authors focus on tracing out from the root set.

The basic style of many early collectors was “stop-the-world” collection.

Generational and parallel collectors attempt to mitigate the potentially long delays while the world is stopped. Generational collectors collect only a small part of the heap. Parallel collectors might also be generational, but they mainly try to collect the whole heap in parallel with the mutator(s).

So why “mostly” parallel?

Think back to the VM migration papers. Migration started while the VM was running, i.e., in parallel with it.

But at some point, the VM had to be stopped to complete the transfer. Hopefully, by that time, there was very little left to transfer.

Same idea here: as much of the collection as possible is done while the mutator is running. At some point, we need to stop the world to finish the collection, for much the same reason: after the collector has made decisions about certain pointers, the mutator keeps changing them, which can make objects reachable or unreachable.

This matters because we don’t want the collector running all the time.

Authors’ Two Stated Goals

“Present a method for transforming a stop-the-world tracing collector into a mostly parallel collector.” The solution should also be general across copying/non-copying and generational/non-generational collectors. Furthermore, no OS changes are needed.

“Describe a particular implementation of a garbage collector that illustrates this idea.” What’s really cool is that it provides GC to languages like C with relatively short pause times.

Basic Idea

Every program has a root set. The root set forms the foundation of the immune set: the objects that will not be collected.

Tracing the paths of pointers from the root set finds the live, reachable objects, which are marked. Unmarked (and therefore unreachable) objects can be collected.
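To make the marking step concrete, here is a minimal sketch, assuming a toy heap modeled as a dict from each object to the objects it points to (a stand-in for scanning memory for pointers). The names `mark_from_roots`, `heap`, and `roots` are hypothetical.

```python
from collections import deque

def mark_from_roots(pointers, roots):
    """Return the set of objects reachable from `roots` (the marked set).

    `pointers` maps each object to the objects it points to; it is a toy
    stand-in for scanning an object's memory for pointers.
    """
    marked = set()
    worklist = deque(roots)
    while worklist:
        obj = worklist.popleft()
        if obj in marked:
            continue
        marked.add(obj)                         # set the mark bit
        worklist.extend(pointers.get(obj, ()))  # trace its outgoing pointers
    return marked

# Hypothetical heap: "c" points into live data, but nothing points to "c".
heap = {"a": ["b"], "b": ["a"], "c": ["a"], "d": []}
roots = ["a"]
live = mark_from_roots(heap, roots)
garbage = set(heap) - live
print(sorted(live), sorted(garbage))  # ['a', 'b'] ['c', 'd']
```

Note that "c" is garbage even though it points at a live object: only pointers *from* reachable objects matter.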

More Basic Idea

Key idea: whenever a virtual memory page is written to, set a virtual dirty bit for that page.

At the beginning of a collection, clear all the dirty bits. Start tracing.

The trace marks reachable objects while the mutator keeps doing its thing. The mutator’s writes dirty pages.

When the original trace is done, stop the world and trace again from the marked objects on dirty pages.

Now everything reachable is marked. But is it safe to say everything unreachable is unmarked?
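The dirty-bit bookkeeping can be sketched in software. This assumes a toy model where objects are integers, `PAGE_SIZE` (hypothetical) objects share a page, and every mutator pointer store goes through a `store` method. The authors instead got these bits from write-protecting the heap, so treat this only as a model of the bits' behavior.

```python
PAGE_SIZE = 4  # hypothetical: number of objects per page in this toy model

class DirtyBitHeap:
    """Toy heap that keeps one virtual dirty bit per page.

    Every pointer store goes through `store`, which sets the bit in software
    (a stand-in for catching writes to write-protected pages).
    """

    def __init__(self, n_objects):
        self.pointers = {obj: [] for obj in range(n_objects)}
        self.dirty = set()  # numbers of pages written since the last clear

    def page_of(self, obj):
        return obj // PAGE_SIZE

    def store(self, src, dst):
        """Mutator write: make `src` point to `dst`, dirtying src's page."""
        self.pointers[src].append(dst)
        self.dirty.add(self.page_of(src))

    def retrieve_and_clear_dirty(self):
        """Return the set of dirty pages and reset every dirty bit."""
        pages, self.dirty = self.dirty, set()
        return pages

heap = DirtyBitHeap(12)          # objects 0-11 on pages 0-2
heap.retrieve_and_clear_dirty()  # start of a collection: clear all dirty bits
heap.store(1, 9)                 # writes to page 0
heap.store(6, 2)                 # writes to page 1
print(sorted(heap.retrieve_and_clear_dirty()))  # [0, 1]
```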

A Compromise

No, the collector is neither purely parallel nor precise.

The length of the stop-the-world pause depends directly on the number of dirtied pages. In the worst case, it is no worse than a whole-heap stop-the-world collection; the authors claim this worst case doesn’t occur in practice.

Not all unreachable objects are collected, since an object may have been marked before the mutator discarded its last pointer to it. The collector is still complete, in that this memory will eventually be reclaimed by a later collection. (Just not right now!)

Sweeping Doesn’t Matter

Phase 2 of a mark-sweep collector, called sweeping, frees the unused memory in whatever form that takes.

Sweeping doesn’t need to occur during the world stoppage. Once we know what’s garbage, we can sweep incrementally, interleaved with object allocation.

Sweeping Implemented Here

The heap is split into blocks. Each block contains objects of a single size. For small objects, the block size is the same as a physical page of memory.

After marking, pages are queued for sweeping in one of multiple queues (one per object size). Each object size also has a free list. When that free list is empty, the allocator sweeps the block at the front of the queue for that object size and restores its free memory to the free list.

Blocks for larger objects are swept in large increments immediately following a collection. This limits the CPU time consumed by the collection.

The net effect is that GC times are dominated by marking.
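Here is a minimal sketch of the per-size queues and lazy sweeping for small objects, assuming each queued block is a hypothetical `(size, slots, marked_slots)` tuple produced by marking. Large-object sweeping is not modeled, and returning `None` on exhaustion is this sketch's simplification.

```python
from collections import defaultdict, deque

class LazySweeper:
    """Toy allocator with one sweep queue and one free list per object size."""

    def __init__(self):
        self.sweep_queue = defaultdict(deque)  # size -> blocks awaiting a sweep
        self.free_list = defaultdict(list)     # size -> free object slots

    def after_marking(self, blocks):
        """Queue blocks for lazy sweeping; each block is (size, slots, marked)."""
        for size, slots, marked in blocks:
            self.sweep_queue[size].append((slots, marked))

    def allocate(self, size):
        """Pop a free slot, sweeping queued blocks only when the free list is empty."""
        while not self.free_list[size]:
            if not self.sweep_queue[size]:
                return None  # a real allocator would collect or grow the heap here
            slots, marked = self.sweep_queue[size].popleft()
            # Sweep one block: every unmarked slot goes back on the free list.
            self.free_list[size].extend(s for s in slots if s not in marked)
        return self.free_list[size].pop()

sweeper = LazySweeper()
sweeper.after_marking([
    (16, ["x0", "x1", "x2"], {"x1"}),  # x1 survived marking
    (16, ["y0", "y1"], {"y0", "y1"}),  # fully live block
    (32, ["z0"], set()),
])
first = sweeper.allocate(16)  # sweeps only the first 16-byte block
print(first, len(sweeper.sweep_queue[16]))  # x2 1
```

Sweeping work is paid for a block at a time, on demand, which is why it stays out of the world stoppage.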

Let’s Get Formal

Definition: A partial collection only reclaims some subset of the unreachable objects.

Let the set T contain all threatened objects (that is, objects that might be collected).

Let the set I contain all immune objects (that is, objects that will not be collected). T and I are disjoint, and every object falls into either T or I.

For a full collection, I contains only the roots. In a partial collection, I also contains additional objects.

A collection is correct iff no reachable objects are collected.

Guaranteeing Correctness

Reclaim unmarked objects only when the following condition is true:

C: Every object in I is marked and every object pointed to by a marked object is also marked.
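The correctness definition and condition C translate directly into checks on a toy heap. This sketch assumes the roots are modeled as ordinary objects in I (a full collection), and all names are hypothetical.

```python
def reachable(pointers, roots):
    """All objects reachable from `roots` by following pointers."""
    seen, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(pointers.get(obj, ()))
    return seen

def is_correct(pointers, roots, collected):
    """A collection is correct iff it collects no reachable object."""
    return reachable(pointers, roots).isdisjoint(collected)

def satisfies_C(pointers, immune, marked):
    """Condition C: every object in I is marked, and every object
    pointed to by a marked object is also marked."""
    return set(immune) <= marked and all(
        child in marked for obj in marked for child in pointers.get(obj, ()))

# Hypothetical heap; the root "r" is the only immune object.
pointers = {"r": ["a"], "a": ["b"], "b": [], "c": ["a"]}
immune = {"r"}
marked = {"r", "a", "b"}
if satisfies_C(pointers, immune, marked):
    collected = set(pointers) - marked  # safe to reclaim
    print(sorted(collected), is_correct(pointers, immune, collected))  # ['c'] True
```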

Stop-the-World Collection

Formalizing stop-the-world collection:
Step 1: Stop the world.
Step 2: Clear all mark bits.
Step 3: Perform the tracing operation TR.
Step 4: Restart the world.

The operation TR:

TR: Mark all objects in I and trace from them.

At the end of this 4-step operation, condition C holds, and all unmarked objects can be collected.

Parallel Collection

Formally, mostly parallel collection requires:

Step 1: Clear all mark bits.
Step 2: Clear all virtual dirty bits.
Step 3: Perform the tracing operation TR.
Step 4: Stop the world.
Step 5: Perform a finishing operation, F.
Step 6: Restart the world.

The Finishing Operation

F: Trace from all marked objects on dirty pages.
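Below is a deterministic, single-threaded simulation of steps 1 to 6. It assumes a toy heap (a dict of pointers plus an object-to-page map) and a hypothetical `mutator_writes` schedule that injects writes between tracing steps. It is a sketch of the idea, not the authors’ implementation. The scenario shows why F is needed: the mutator moves the only pointer to "c" into an object that TR has already scanned.

```python
def mostly_parallel_collect(pointers, page_of, immune, mutator_writes):
    """Single-threaded simulation of mostly parallel collection, steps 1-6.

    `mutator_writes` maps a tracing step number to the writes the mutator
    performs just before that step, as ("store" | "erase", src, dst) tuples.
    Returns (marked_after_TR, marked_after_F).
    """
    marked = set()  # Step 1: clear all mark bits
    dirty = set()   # Step 2: clear all virtual dirty bits

    def write(op, src, dst):
        if op == "store":
            pointers[src].append(dst)
        else:
            pointers[src].remove(dst)
        dirty.add(page_of[src])  # any write to a page sets its dirty bit

    # Step 3: TR, while the "mutator" keeps running between tracing steps.
    worklist, step = list(immune), 0
    while worklist:
        for op, src, dst in mutator_writes.get(step, []):
            write(op, src, dst)
        step += 1
        obj = worklist.pop()
        if obj not in marked:
            marked.add(obj)
            worklist.extend(pointers[obj])
    after_tr = set(marked)

    # Step 4: stop the world (no mutator writes happen past this point).
    # Step 5: F, i.e., trace from all marked objects on dirty pages.
    worklist = [obj for obj in marked if page_of[obj] in dirty]
    while worklist:
        for child in pointers[worklist.pop()]:
            if child not in marked:
                marked.add(child)
                worklist.append(child)
    # Step 6: restart the world; unmarked objects may now be reclaimed.
    return after_tr, marked

pointers = {"r": ["b", "a"], "a": [], "b": ["c"], "c": []}
page_of = {"r": 0, "a": 0, "b": 1, "c": 2}
writes = {2: [("store", "a", "c"), ("erase", "b", "c")]}
after_tr, after_f = mostly_parallel_collect(pointers, page_of, {"r"}, writes)
print(sorted(after_tr), sorted(after_f))  # ['a', 'b', 'r'] ['a', 'b', 'c', 'r']
```

After TR alone, "c" is unmarked even though it is reachable through "a". F rescans "a" because its page is dirty and recovers "c".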

Notes on that Collection

TR runs entirely in parallel with the mutator, which keeps dirtying pages that will need to be traced again.

The closure condition C need not hold after step 4 (stop the world), which is why the finishing step F is required.

We will define a weaker closure C’:

C’: Every object in I is marked and every object pointed to by a marked object on a clean page is also marked.

Applying F to any state satisfying C’ will produce C.
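The claim that F turns C’ into C can be checked on a small example. This sketch assumes the same kind of toy heap as before, with hypothetical names. `finish` is F: it only starts from marked objects on dirty pages, but it also traces everything it newly marks.

```python
def satisfies_C(pointers, immune, marked):
    """C: I is marked, and everything a marked object points to is marked."""
    return set(immune) <= marked and all(
        child in marked for obj in marked for child in pointers[obj])

def satisfies_C_prime(pointers, immune, marked, page_of, dirty):
    """C': as C, but only marked objects on *clean* pages are constrained."""
    return set(immune) <= marked and all(
        child in marked
        for obj in marked if page_of[obj] not in dirty
        for child in pointers[obj])

def finish(pointers, marked, page_of, dirty):
    """F: trace from all marked objects on dirty pages; returns the new mark set."""
    marked = set(marked)
    worklist = [obj for obj in marked if page_of[obj] in dirty]
    while worklist:
        for child in pointers[worklist.pop()]:
            if child not in marked:
                marked.add(child)
                worklist.append(child)
    return marked

# Hypothetical state after a concurrent trace: "a" (on dirty page 0) was
# given a pointer to "c" after "a" had already been scanned.
pointers = {"r": ["a", "d"], "a": ["c"], "c": [], "d": [], "e": []}
page_of = {"r": 1, "a": 0, "c": 2, "d": 1, "e": 2}
immune, marked, dirty = {"r"}, {"r", "a", "d"}, {0}
print(satisfies_C_prime(pointers, immune, marked, page_of, dirty))  # True
print(satisfies_C(pointers, immune, marked))  # False
print(satisfies_C(pointers, immune, finish(pointers, marked, page_of, dirty)))  # True
```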

Considerations

Thus we have a correct, mostly-parallel collection.

But if we have a busy mutator, we might have lots of dirty pages, which in turn means long pauses during the world stoppage.

To shorten this delay, we can clean the pages in parallel.

Let P be a set of pages. Then the process M is:

M:
1.) Atomically retrieve and clear the virtual dirty bits from P.
2.) Trace from the marked objects on the dirty pages of P.
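A minimal sketch of M follows, assuming a toy heap object where a `threading.Lock` stands in for atomically retrieving and clearing the dirty bits. The class and method names are hypothetical. After M, the world stoppage only has to deal with pages dirtied since M ran.

```python
import threading

class Heap:
    """Toy heap with per-page virtual dirty bits guarded by a lock."""

    def __init__(self, pointers, page_of):
        self.pointers, self.page_of = pointers, page_of
        self.dirty = set()
        self.lock = threading.Lock()  # stands in for atomic dirty-bit access

    def store(self, src, dst):
        """Mutator write: add a pointer src -> dst and dirty src's page."""
        with self.lock:
            self.pointers[src].append(dst)
            self.dirty.add(self.page_of[src])

    def process_M(self, pages, marked):
        """M on page set P: returns the pages that were dirty; extends `marked`."""
        # 1. Atomically retrieve and clear the dirty bits of the pages in P.
        with self.lock:
            was_dirty = self.dirty & set(pages)
            self.dirty -= was_dirty
        # 2. Trace from the marked objects on those pages. In a real collector
        #    the mutator keeps running (and dirtying pages) during this step.
        worklist = [obj for obj in marked if self.page_of[obj] in was_dirty]
        while worklist:
            for child in list(self.pointers[worklist.pop()]):
                if child not in marked:
                    marked.add(child)
                    worklist.append(child)
        return was_dirty

heap = Heap({"r": ["a"], "a": [], "b": []}, {"r": 0, "a": 0, "b": 1})
marked = {"r", "a"}  # result of an earlier concurrent TR
heap.store("a", "b")  # mutator dirties page 0
cleaned = heap.process_M([0, 1], marked)
print(sorted(cleaned), sorted(marked), heap.dirty)  # [0] ['a', 'b', 'r'] set()
```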

Generational Partial Collection

All of that formally describes a general partial collection. Now let’s consider a generational collector that uses the mark bits to record object age.

Consider a partial collector where I is chosen to be the set of currently marked objects. Then C’ holds: a marked object on a clean page hasn’t been written since the last collection left C true, so everything it points to is still marked.

We could finish by simply performing F, but to reduce the delay, we perform M on the entire heap just before the world stoppage.

Formal Parallel Generational Collection

1. Perform M on the heap.
2. Stop the world.
3. Perform F.
4. Restart the world.

Because an object that has been marked will never be collected by the generational collector, we occasionally need to run a full collection.
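Here is a toy run of the four steps with I equal to the currently marked objects. Two assumptions belong to this sketch, not the slides: roots live on a hypothetical `root_page`, and that page is rescanned every time because stacks and registers carry no dirty bits. The example also shows why full collections are still needed, since "old2" is unreachable but stays marked.

```python
def trace_from_pages(pointers, page_of, marked, pages):
    """Mark everything reachable from marked objects on `pages`."""
    worklist = [obj for obj in marked if page_of[obj] in pages]
    while worklist:
        for child in pointers[worklist.pop()]:
            if child not in marked:
                marked.add(child)
                worklist.append(child)

def generational_collect(pointers, page_of, marked, dirty, root_page):
    """Toy parallel generational collection with I = the currently marked objects.

    `marked` survives from earlier collections, and `dirty` holds pages written
    since then. `root_page` holds the roots and is always rescanned (assumption).
    Returns the set of reclaimed objects.
    """
    # 1. Perform M on the whole heap: take and clear all dirty bits, then trace.
    #    In the real collector the mutator keeps running during this step.
    pages = dirty | {root_page}
    dirty.clear()
    trace_from_pages(pointers, page_of, marked, pages)
    # 2. Stop the world.
    # 3. Perform F: trace from marked objects on pages dirtied since step 1.
    trace_from_pages(pointers, page_of, marked, dirty | {root_page})
    dirty.clear()
    # 4. Restart the world: whatever is still unmarked can be reclaimed.
    return set(pointers) - marked

# "old2" lost its last pointer after it was marked; "new2" is young garbage.
pointers = {"R": ["old1"], "old1": ["new1"], "old2": [], "new1": [], "new2": []}
page_of = {"R": 0, "old1": 1, "old2": 1, "new1": 2, "new2": 2}
marked = {"R", "old1", "old2"}  # marked by an earlier collection
dirty = {1}                     # old1 was written to point at new1
reclaimed = generational_collect(pointers, page_of, marked, dirty, root_page=0)
print(sorted(reclaimed), "old2" in marked)  # ['new2'] True
```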

An Alternate Version of M

M’ could be:

M’:
1.) Atomically retrieve and clear the dirty bits from the pages P.
2.) For all unmarked objects pointed to by marked objects on dirty pages of P, mark them and dirty the pages on which they reside.

Iteratively performing M’ can substitute for M, though M is generally preferable.
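Iterating M’ on a toy heap shows how it substitutes for M. This sketch assumes a single-threaded model, so the "atomic" retrieval is just a set operation. The function name and heap layout are hypothetical.

```python
def process_M_prime(pointers, page_of, marked, dirty, pages):
    """One application of M' to page set P; returns the pages it cleaned."""
    # 1. Retrieve and clear P's dirty bits (atomic in a real collector).
    taken = dirty & set(pages)
    dirty.difference_update(taken)
    # 2. Mark unmarked objects pointed to by marked objects on those pages,
    #    and dirty the pages they live on so a later pass scans them.
    #    (Iterate over a snapshot, since `marked` grows inside the loop.)
    for obj in [o for o in marked if page_of[o] in taken]:
        for child in pointers[obj]:
            if child not in marked:
                marked.add(child)
                dirty.add(page_of[child])
    return taken

# A hypothetical chain a -> b -> c -> d, one object per page.
pointers = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
page_of = {"a": 0, "b": 1, "c": 2, "d": 3}
marked, dirty = {"a"}, {0}
rounds = 0
while process_M_prime(pointers, page_of, marked, dirty, pages=range(4)):
    rounds += 1
print(rounds, sorted(marked))  # 4 ['a', 'b', 'c', 'd']
```

In this toy, each application of M’ follows only one pointer hop, so the chain takes one round per hop, whereas a single M would trace the whole chain at once. That is one plausible reason M is preferred.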

Implementation Choices

When and how to use M and M’? No M’. For allocation-intensive mutators, run M more than once (twice seems to be the sweet spot).

What is a “full collection” going to be, and when do we run it? Collections were initially triggered on heap exhaustion, but then the allocating thread would be stalled, even with the parallel collector. The authors settled on a daemon thread that kicks off the collector when the amount of memory in use exceeds what was in use at the end of the last collection by some threshold.

A full collection then runs TR concurrently with the mutator, followed by up to two iterations of M, and finally stops the world to perform F.

If we run out of memory, we try to expand the heap.
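The trigger rule can be sketched as the check a daemon thread might run periodically. The class, its `poll` method, the `collect` callback, and the threshold value are all hypothetical. The sleeping loop around `poll` is omitted.

```python
class CollectionTrigger:
    """Polling logic a collector daemon thread might run (hypothetical API)."""

    def __init__(self, threshold_bytes):
        self.threshold = threshold_bytes
        self.in_use_after_last_gc = 0

    def poll(self, in_use_bytes, collect):
        """Start a collection if usage grew by more than the threshold.

        `collect` runs a full collection and returns the bytes still in use.
        """
        if in_use_bytes - self.in_use_after_last_gc > self.threshold:
            self.in_use_after_last_gc = collect()
            return True
        return False

trigger = CollectionTrigger(threshold_bytes=1000)
print(trigger.poll(500, collect=lambda: 400))   # False: grew by only 500
print(trigger.poll(1200, collect=lambda: 300))  # True: grew by 1200; 300 remain in use
print(trigger.poll(1200, collect=lambda: 300))  # False: only 900 above the last baseline
```

Measuring growth relative to the post-collection baseline, rather than waiting for exhaustion, lets the collection start while allocation can still proceed.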

Brief Results

This collector was used at Xerox PARC for quite a while and was heavily optimized.

They didn’t modify the SunOS running on their machines; they just write-protected the heap to obtain the virtual dirty bits.

They were mainly interested in measuring interactive response, which was subjectively better. (But they are aware this is pretty fuzzy.)

They ran 5 iterations of a “Boyer benchmark” and an allocator loop at various memory configurations, trying to level the playing field for full, generational, and parallel generational collectors.

Results

Mostly Parallel Copying Collectors

We can do all the same things and make a copying collector, if we want. It just requires space to maintain explicit forwarding links.

A forward pointer is associated with each object and is used only by the GC. Reachable objects are copied from from-space to to-space, and the new address is written into the forward pointer in from-space.

The mutator only sees the from-space pointers.
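To show the forwarding links, here is a stop-the-world, Cheney-style copy (swapped in for simplicity; the mostly parallel machinery is not modeled). It assumes a hypothetical `Obj` class whose `forward` slot is touched only by the collector. The from-space objects' fields are left intact, which is what lets a mutator keep using from-space pointers.

```python
class Obj:
    def __init__(self, fields):
        self.fields = fields  # pointer fields: other Obj instances or None
        self.forward = None   # forwarding link, used only by the collector

def copy(obj, to_space):
    """Return obj's to-space copy, creating it and recording the forward link."""
    if obj.forward is None:
        obj.forward = Obj(list(obj.fields))  # fields still hold from-space pointers
        to_space.append(obj.forward)
    return obj.forward

def copy_reachable(roots):
    """Cheney-style copy of everything reachable from `roots` into to-space."""
    to_space = []
    new_roots = [copy(r, to_space) for r in roots]
    scan = 0
    while scan < len(to_space):  # scan copies in order, forwarding their fields
        o = to_space[scan]
        o.fields = [copy(f, to_space) if f is not None else None for f in o.fields]
        scan += 1
    return new_roots, to_space

a = Obj([None])
b = Obj([a])
a.fields[0] = b  # a two-object cycle
c = Obj([a])     # unreachable
new_roots, to_space = copy_reachable([b])
print(len(to_space), c.forward is None, b.fields[0] is a)  # 2 True True
```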

More on Copy Collectors

Concurrent collection ensures the following: If an object residing on a clean page has been copied, then everything it points to has also been copied.

If a copied object resides on a clean page, its to-space copy is up to date.

With the world stopped, we can then execute the finishing operation Fc so that all reachable objects are found, with correct contents, in to-space.

Copying Finishing Op

Fc: For every object a whose from-space copy resides on a dirty page:
1. Copy everything it points to that hasn’t already been copied.
2. Update pointers to point to to-space.
3. Recopy a to reflect changes to both pointer and non-pointer fields made since the collection started.

One could create a concurrent version of Fc, but the authors found a copying collector impractical for their environment and didn’t bother implementing one.

Just like with the mark-sweep collector, the world-stoppage time is proportional to the number of dirtied pages.
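Here is a toy version of Fc under stated assumptions. The `Obj` layout (a page number, pointer fields, one non-pointer `data` field, and a `forward` slot) is hypothetical. Only objects already copied during the concurrent phase are considered, and roots are assumed to be handled elsewhere. Newly copied referents are queued too, so their own fields get forwarded.

```python
class Obj:
    def __init__(self, page, fields, data=None):
        self.page = page      # from-space page number (None for to-space copies)
        self.fields = fields  # pointer fields: Obj instances or None
        self.data = data      # a non-pointer field
        self.forward = None   # to-space copy, once the object has been copied

def finish_copying(from_space, dirty):
    """Fc with the world stopped; returns how many objects were (re)copied."""
    work = [a for a in from_space if a.forward is not None and a.page in dirty]
    done = 0
    while work:
        a = work.pop()
        # 1. Copy everything a points to that hasn't been copied yet.
        for f in a.fields:
            if f is not None and f.forward is None:
                f.forward = Obj(None, [])
                work.append(f)  # the new copy still needs its contents filled in
        # 2 and 3. Recopy a's current fields, pointing them at to-space copies.
        a.forward.data = a.data
        a.forward.fields = [f.forward if f is not None else None for f in a.fields]
        done += 1
    return done

# "x" was copied during the concurrent phase; afterwards the mutator changed
# its data and stored a pointer to the never-copied "y", dirtying page 0.
x = Obj(page=0, fields=[], data=1)
y = Obj(page=1, fields=[], data=2)
x.forward = Obj(None, [], data=1)
x.fields.append(y)
x.data = 10
print(finish_copying([x, y], dirty={0}))  # 2
print(x.forward.data, x.forward.fields[0] is y.forward)  # 10 True
```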

Questions?

I really liked the tone of this paper. It had less of that stuffy, self-important academic tone.