01 Garbage Collection

Embed Size (px)

Citation preview

  • 8/22/2019 01 Garbage Collection

    1/5

    GC FAQ -- draft

    Common questions

    What is garbage collection?

    Garbage collection is a part of a language's runtime system, or an add-on library,

    perhaps assisted by the compiler, the hardware, the OS, or any combination of thethree, that automatically determines what memory a program is no longer using,

    and recycles it for other use. It is also known as ``automatic storage (or memory)

    reclamation''.Why is it good?

    Manual memory management is (programmer-)time consuming, and error prone.

    Most programs still contain leaks. This is all doubly true with programs using

    exception-handling and/or threads.

    A second benefit of garbage collection, less obvious to people who haven't used

    it, is that relying on garbage collection to manage memory simplifies the

    interfaces between components (subroutines, libraries, modules, classes) that no

    longer need expose memory management details ("who is responsible forrecycling this memory").

    Is garbage collection slow?

    Not necessarily. Modern garbage collectors appear to run as quickly as manualstorage allocators (malloc/free ornew/delete). Garbage collection probably

    will not run as quickly as customized memory allocator designed for use in a

    specific program. On the other hand, the extra code required to make manualmemory management work properly (for example, explicit reference counting) is

    often more expensive than a garbage collector would be.

    Can I use garbage collection with C or C++?Probably. Modern (well-tested, efficient, non-pausing) garbage collectors are

    available that work with all but the most pathological C and C++ programs,

    including legacy code. SeeGC, C, and C++ for more details.

    Does garbage collection cause my program's execution to pause?

    Not necessarily. A variety of algorithms allow garbage collection to proceedconcurrently, incrementally, and (for some definitions of the term) in "real time".

    There are incremental garbage collectors that work with C and C++, for instance.Where can I get a C or C++ garbage collector?

    Boehm-Weiser collector

    http://reality.sgi.com/employees/boehm_mti/gc.htmlorftp://parcftp.xerox.com/pub/gc/gc.html

    http://www.iecc.com/gclist/GC-faq.html#GC,%20C,%20and%20C++http://www.iecc.com/gclist/GC-faq.html#GC,%20C,%20and%20C++http://reality.sgi.com/employees/boehm_mti/gc.htmlhttp://reality.sgi.com/employees/boehm_mti/gc.htmlftp://parcftp.xerox.com/pub/gc/gc.htmlhttp://www.iecc.com/gclist/GC-faq.html#GC,%20C,%20and%20C++http://reality.sgi.com/employees/boehm_mti/gc.htmlftp://parcftp.xerox.com/pub/gc/gc.html
  • 8/22/2019 01 Garbage Collection

    2/5

    Great Circle from Geodesic Systems or 800-360-8388 or

    http://www.geodesic.com/

    Kevin Warne or 800-707-7171

    Folk myths

    GC is necessarily slower than manual memory management.

    GC will necessarily make my program pause.

    Manual memory management won't cause pauses.

    GC is incompatible with C and C++.

    Folk truths

    Most allocated objects are dynamically referenced by a very small number of

    pointers. The most important small number is ONE.

    Most allocated objects have short lifetimes. Allocation patterns (size distributions, lifetime distributions) are bursty, not

    uniform.

    VM behavior matters.

    Cache behavior matters.

    "Optimal" strategies can fail miserably.

    Tradeoffs

    precise vs. conservative

    moving/compacting vs. non-moving

    explicit vs. implicit reclamation phase

    stopping vs. incremental vs. concurrent

    generational vs. non-generational

    GC, C, and C++

    What do you mean, garbage collection and C?

    Rather than using malloc and free to obtain and reclaim memory, it is possible to link in

    a garbage collector and allow it to reclaim unused memory automatically. This usually

    even works ifmalloc is replaced with the garbage collector's allocator and free isreplaced with a do-nothing subroutine. This approach has worked with the X11 library,

    for instance.

    It is also possible to program in a style where free still reclaims storage, but the garbage

    collector acts as a backstop, preventing leaks that might otherwise occur. This style has

    also been tested with many applications, and it works well. The advantage here is that

    where it is easy for the programmer to manage memory, the programmer manages the

    http://www.geodesic.com/http://www.geodesic.com/
  • 8/22/2019 01 Garbage Collection

    3/5

    memory, but where it is not, the garbage collector does the job. This doesn't necessarily

    run any faster than free-does-nothing, but it may help keep the heap smaller.

    How is this possible?

    C-compatible garbage collectors know where pointers may generally be found (e.g.,"bss", "data", and stack), and maintain heap data structures that allow them to quicklydetermine what bit patterns might be pointers. Pointers, of course, look like pointers, so

    this heuristic traces out all memory reachable through pointers. What isn't reached, is

    reclaimed.

    This doesn't sound very portable. What if I need to port my code and

    there's no garbage collector on the target platform?

    Some of this code is necessarily system-dependent, but the features of most operating

    systems have been enumerated, so garbage collection for C is available almost

    everywhere. That is, portability isn't a problem if the code has already been ported, and ithas. Speaking personally (this is David Chase) it's also not hard to port these garbage

    collectors to new platforms; I've ported the Boehm-Weiser collector twice myself, whenthe code had not yet been ported to terribly many platforms, and when I had much less

    experience with the low-level interfaces to various operating systems.

    Won't this leave bugs in my program?

    This depends on your point of view. Using a garbage collector solves a lot of problems

    for a programmer, which gives a programmer time to solve other problems, or lets the job

    be finished faster. It's similar in flavor to floating point arithmetic or virtual memory.

    Both of these solve a tedious problem (scaling arithmetic, or paging unused data to disk)that a programmer could, in principle, solve. Some specialized code is written without FP

    or VM support, but in practice, if these features are available, people use them. They'regenerally judged to be well worth the cost.

    Now, if a program is developed using garbage collection, and the collector is taken away,

    then yes, the result may contain bugs in the form of memory leaks. Similarly, if a

    program is developed using FP (or VM) and that is taken away, that program, too, maycontain bugs.

    Also in practice, many programs that use malloc and free already leak memory, so use

    of a garbage collector can actually reduce the number of bugs in a program, and do somuch more quickly than if they had to be tracked down and fixed by hand. This isespecially true if the memory leak is inherent in a library that cannot be repaired.

    Can't a devious C programmer break the collector?

    Certainly, but most people have better ways to spend their time than dreaming up ways to

    break their tools. The collector does rely on being able to locate copies of pointers

  • 8/22/2019 01 Garbage Collection

    4/5

    somewhere in an address space, so certain things won't work. For instance, the XOR'd

    pointers trick for compactly encoding a bidirectional list cannot be used -- the pointers

    don't look like pointers. If a process writes pointers to a file, and reads them back again,the memory referenced by those pointers may have been recycled. Most programs don't

    do these things, so most programs work with a garbage collector. Ordinary (legal) pointer

    arithmetic is tolerated by garbage collectors for C.

    Insert more questions here -- send them to

    What does a garbage collector do about destructors?

    A destructor is some code that runs when an object is about to be freed. One of the mainuses of destructors is to do manual memory management. For example, the destructor for

    an object may recursively free the objects it references. A garbage collector obviates the

    need for such uses: If an object is garbage, all the objects it references will also begarbage if they are not referenced elsewhere, and so they, too, will be freed

    automatically.

    There remains the question of what to do with destructors that do something other than

    assist in memory management. There are a couple of typical uses.

    One use is for objects that have state outside the program itself. The canonical example is

    an object that refers to a file. When a file object becomes eligible for reclamation, the

    garbage collector needs to ensure that buffers are flushed, the file is closed, and resources

    associated with the file are returned to the operating system.

    Another use is where a program wants to keep a list of objects that are referenced

    elsewhere. The program may want know what objects are in existence for, say,accounting purposes but does not want the mechanism of accounting to prevent objectsfrom otherwise being freed.

    There are several ways of handling such situations:

    1. In systems where the garbage collector is "built in," it typically has special

    knowledge of all the cases where outside resources can be referenced and can deal

    with them appropriately.2. Many GC systems have a notion of a "weak pointer." A weak pointer is one that

    is not considered as a reference by the garbage collector. So if an object is

    referenced only by weak pointers, it is eligible for reclamation. Weak pointers canbe used to implement the object list example.

    3. Many GC systems have a notion of"finalization." An object may be registered

    with the GC system so that when it is about to reclaim the object, it runs afunction on the object that can perform necessary cleanups. Finalization is

    fundamentally tricky. Some of the issues are:

    1. When does a finalization function run, particularly with respect to when

    other finalizers run?;

    http://www.iecc.com/gclist/GC-lang.html#Finalizationhttp://www.iecc.com/gclist/GC-lang.html#Finalization
  • 8/22/2019 01 Garbage Collection

    5/5

    2. What happens when registered objects reference each other?;

    3. What happens if a finalization function makes an object not be garbage

    any more? There are no pat answers to these questions.

    http://counter.fateback.com/