SEG4110 - Advanced Software Design and Reengineering
TOPIC L
Garbage Collection Algorithms
SEG4110 - Topic L - Garbage Collection Algorithms
What is Garbage Collection?
• The automatic management of dynamically allocated storage
• Automatically freeing objects that are no longer used by the program
• Refer to Richard Jones's garbage collection website for a rich set of online resources: http://www.cs.kent.ac.uk/people/staff/rej/gc.html
Why Automatic Garbage Collection?
• It is generally not possible to determine immediately when a shared object is no longer ‘in use’
  - So explicit memory management is often difficult to perform
• Software engineering perspective:
  - Garbage collection raises the abstraction level of software development
  - Decreases coupling among the system modules
  - Frees software developers from spending a lot of time managing memory
  - Eliminates many memory-management bugs
Terminology
• Stack:
  - A memory area where data is pushed when a procedure is called and popped when it returns
  - Contains local variables (which must be cleaned up)
• Heap:
  - A memory area where data can be allocated and deallocated in any order
  - Functions like ‘malloc’ and ‘free’, or the ‘new’ and ‘delete’ operators, allocate and free data in the heap
  - In most good OO programs, all objects are allocated on the heap
Terminology (cont.)
• Root Set:
  - A set of objects that a program always has direct access to
  - E.g. global variables, or variables in the main program (stored on the program stack)
• A Heap Object (also called a cell or simply an object):
  - An individually allocated piece of data in the heap
• Reachable Objects:
  - Objects that can be reached transitively from the root set objects
Terminology (cont.)
• Garbage:
  - Objects that are unreachable from the root set objects but have not been freed either
• Dangling references:
  - A reference to an object that was deleted
  - May cause the system to crash (if we are lucky!)
  - May cause more subtle bugs
• Mutator:
  - The user’s program
  - (often contrasted with the ‘collector’)
Two Main Types of Algorithms
• Reference counting
  - Each object has an additional field recording the number of objects that point to it
  - An object is considered garbage when zero objects point to it
• Tracing
  - Walk through the set of objects looking for garbage
Reference Counting
• Each object has an additional field called the reference count
• When an object is first created, its reference count is set to one
• When any other object (or root object) is assigned a reference to that object
  - then its reference count is incremented
• When a reference to an object is deleted or is assigned a new value
  - the object's reference count is decremented
Reference Counting (cont.)
• Any object with a reference count equal to zero can be garbage collected
• When an object is garbage collected,
  - Any object it refers to has its reference count decremented
  - The garbage collection of one object may therefore lead to the immediate garbage collection of other objects
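The counting rules above can be sketched on a toy heap. This is an illustrative sketch only: the names `Obj`, `assign` and `release` are hypothetical, and a real collector would update counts inside the language runtime rather than through explicit calls.

```python
class Obj:
    """A toy heap object carrying an explicit reference count."""

    def __init__(self, name):
        self.name = name
        self.rc = 1         # set to one on creation (the creator's reference)
        self.fields = {}    # outgoing references: field name -> Obj
        self.freed = False


def release(obj):
    """Drop one reference; reclaim the object when its count reaches zero."""
    obj.rc -= 1
    if obj.rc == 0:
        obj.freed = True
        # Reclaiming an object decrements everything it refers to,
        # which may immediately reclaim further objects.
        for child in list(obj.fields.values()):
            release(child)
        obj.fields.clear()


def assign(holder, field, target):
    """Store target in holder.fields[field], adjusting both counts."""
    old = holder.fields.get(field)
    target.rc += 1          # increment first so re-assigning the same object is safe
    holder.fields[field] = target
    if old is not None:
        release(old)        # the overwritten reference is gone


a = Obj("a")                # held by a root variable, rc == 1
b = Obj("b")                # held by a root variable, rc == 1
assign(a, "next", b)        # b.rc == 2
release(b)                  # root drops b: b.rc == 1, still held by a
release(a)                  # a.rc == 0, so a is freed and b follows
```

Releasing `a` is enough to reclaim `b` as well, which is the cascading effect described in the last bullet.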
Reference Counting Example
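[Figure: objects reachable from the root set, each labelled with its reference count; after one reference is deleted, the objects whose counts drop to 0 are garbage]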
Pros and Cons of Reference Counting
Pros:
• The garbage collector is executed along with the mutator
• Free memory is returned to the free list quickly
Cons:
• The reference counts must be updated every time a pointer is changed
• An additional field must be stored for each object
• Unable to deal with cyclic data structures (see next slide)
Cyclic Data Structure
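[Figure: a cyclic structure whose only reference from outside the cycle has been deleted; every object in the cycle still has a nonzero reference count, so it is garbage but cannot be reclaimed]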
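The cycle problem can be reproduced with a few lines of toy code. This is a sketch under the same counting rules as the slides; `Obj`, `link` and `release` are hypothetical names chosen for the illustration.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.rc = 1         # the creator (a root variable) holds one reference
        self.refs = []      # outgoing references
        self.freed = False


def link(src, dst):
    """Make src point to dst."""
    src.refs.append(dst)
    dst.rc += 1


def release(obj):
    """Drop one reference; free the object only if the count reaches zero."""
    obj.rc -= 1
    if obj.rc == 0:
        obj.freed = True
        for child in list(obj.refs):
            release(child)
        obj.refs = []


x = Obj("x")
y = Obj("y")
link(x, y)      # x -> y, y.rc == 2
link(y, x)      # y -> x, x.rc == 2 (a cycle)

release(x)      # the root drops x: x.rc == 1
release(y)      # the root drops y: y.rc == 1

# Neither object is reachable from the root set any more, yet each still
# holds a count of 1 because of the other, so neither is ever freed.
leaked = [o.name for o in (x, y) if not o.freed]
```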
Tracing Algorithms
• Used more widely than reference counting
• Visit the heap objects and determine which ones are no longer used
• Tracing algorithms differ according to:
  - Whether all objects are visited or not
  - Whether they use the heap in an optimal way or not
  - Whether the collector is executed in parallel with the mutator or not
  - The duration of the pauses that the mutator undergoes when the algorithm is executed
Mark-Sweep Algorithm
• The first tracing algorithm
• Invoked when the mutator requests memory but there is insufficient free space
• The mutator is stopped while the mark-sweep is executed
  - This is impractical for real-time systems
• Performed in two phases:
  - Mark phase: identifies all reachable objects by setting a mark
  - Sweep phase: reclaims garbage objects
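The two phases can be sketched as follows. This is a minimal sketch on a toy heap, assuming the heap is simply a Python list of objects; `Obj`, `mark`, `sweep` and `collect` are hypothetical names.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing references
        self.marked = False  # the mark set during the mark phase


def mark(roots):
    """Mark phase: set the mark on every object reachable from the roots."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)


def sweep(heap):
    """Sweep phase: visit every heap object and separate live from garbage."""
    live, garbage = [], []
    for obj in heap:
        if obj.marked:
            obj.marked = False   # clear the mark for the next collection
            live.append(obj)
        else:
            garbage.append(obj)
    return live, garbage


def collect(heap, roots):
    mark(roots)
    return sweep(heap)


a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.refs.append(b)     # a -> b, reachable from the root
c.refs.append(d)     # c <-> d form an unreachable cycle
d.refs.append(c)

live, garbage = collect([a, b, c, d], roots=[a])
live_names = [o.name for o in live]
garbage_names = [o.name for o in garbage]
```

The unreachable cycle `c`, `d` ends up in `garbage`, which reference counting alone would not achieve.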
Mark-Sweep Example
[Figure: the heap after the mark phase, with reachable objects marked (√) and unreachable objects unmarked (X), and the heap after the sweep phase]
Pros and Cons of Mark-Sweep
Pros:
• Cyclic data structures can be recovered
• Tends to be faster than reference counting
Cons:
• Mutator must stop while the algorithm is being performed
• Every reachable object must be visited in the mark phase and every object in the heap must be visited in the sweep phase
• Causes memory fragmentation
Mark-Compact algorithm
• Similar to the mark-sweep algorithm except that it does not cause memory fragmentation
• Two phases:
  - Mark phase: identical to mark-sweep
  - Compaction phase:
    - Marked objects are compacted
    - Reachable objects are moved forward until they are contiguous
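One way to picture the compaction phase is a sliding compactor over an array of slots. This is a sketch under simplifying assumptions that are not in the slides: every object occupies exactly one slot, references are slot indices, and the names `mark` and `compact` are hypothetical.

```python
def mark(heap, roots):
    """Return the set of slot indices reachable from the roots."""
    marked = set()
    stack = list(roots)
    while stack:
        addr = stack.pop()
        if addr not in marked:
            marked.add(addr)
            stack.extend(heap[addr]["refs"])
    return marked


def compact(heap, roots):
    """Slide marked objects to the front of the heap, in place."""
    marked = mark(heap, roots)

    # Pass 1: compute the new address of every marked object.
    forward = {}
    free = 0
    for addr in range(len(heap)):
        if addr in marked:
            forward[addr] = free
            free += 1

    # Pass 2: rewrite references (and roots) to use the new addresses.
    for addr in marked:
        heap[addr]["refs"] = [forward[r] for r in heap[addr]["refs"]]
    new_roots = [forward[r] for r in roots]

    # Pass 3: move the objects; ascending order never overwrites a live one.
    for addr in range(len(heap)):
        if addr in marked:
            heap[forward[addr]] = heap[addr]
    for addr in range(free, len(heap)):
        heap[addr] = None    # everything past `free` is now contiguous free space

    return new_roots, free


heap = [
    {"name": "a", "refs": [2]},
    {"name": "junk1", "refs": []},
    {"name": "b", "refs": [4]},
    {"name": "junk2", "refs": [1]},
    {"name": "c", "refs": []},
]
new_roots, free = compact(heap, roots=[0])
names = [slot["name"] for slot in heap[:free]]
```

The three passes over the heap illustrate why compaction is more expensive than a plain sweep.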
Example
[Figure: the heap after the mark phase, with reachable objects marked (√) and unreachable objects unmarked (X), and the heap after the compaction phase]
Pros and Cons of Mark-Compact
Pros:
• Eliminates the fragmentation problem
Cons:
• Mutator must stop while the algorithm is being performed
• Several passes over the objects are required to implement the compaction
The Copying Algorithm
• The Copying algorithm splits the heap into two equal areas called from-space and to-space
• The mutator works in the from-space area
• The algorithm visits the reachable objects and copies them contiguously to the to-space area
  - Objects need to be traversed only once
• Once the copying is completed, the to-space and from-space switch roles
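A copying collection can be sketched as below. The slides do not name a specific traversal; this sketch uses a scan index over to-space in the style of Cheney's algorithm, and it assumes each space is a Python list whose indices serve as addresses. `copy_collect` is a hypothetical name.

```python
def copy_collect(from_space, roots):
    """Copy the reachable objects into a fresh to-space, contiguously."""
    to_space = []
    forward = {}    # from-space address -> to-space address

    def copy(addr):
        # Copy an object the first time it is seen; afterwards reuse its new address.
        if addr not in forward:
            forward[addr] = len(to_space)
            obj = from_space[addr]
            to_space.append({"name": obj["name"], "refs": list(obj["refs"])})
        return forward[addr]

    new_roots = [copy(r) for r in roots]

    # Scan the copied objects once, copying whatever they refer to and
    # rewriting their references to to-space addresses.
    scan = 0
    while scan < len(to_space):
        obj = to_space[scan]
        obj["refs"] = [copy(r) for r in obj["refs"]]
        scan += 1

    return to_space, new_roots


from_space = [
    {"name": "a", "refs": [2]},
    {"name": "junk", "refs": [0]},
    {"name": "b", "refs": [0]},
]
to_space, new_roots = copy_collect(from_space, roots=[0])
copied = [obj["name"] for obj in to_space]
```

After the call, the caller would treat `to_space` as the new from-space, which corresponds to the two areas switching roles. The unreachable object is never visited at all.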
Example
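[Figure: before collection, from-space holds reachable (√) and unreachable (X) objects and to-space is unused; after collection, the reachable objects have been copied contiguously into the other area, the two areas have switched roles, and the old area is unused]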
Pros and Cons of The Copying Algorithm
Pros:
• Eliminates fragmentation
• Copying is very fast
  - provided the percentage of reachable objects is low
  - It only visits reachable objects
Cons:
• Mutator must stop while the algorithm is being performed
• The use of two areas doubles the memory space required
• Impractical for very large heaps
Incremental Tracing Algorithms
• The previous tracing algorithms are also known as Stop-The-World algorithms
  - They stop the execution of the mutator in order to perform the collection
• Incremental algorithms (also called parallel algorithms) run concurrently with the mutator
  - Can be used in systems with real-time requirements
Incremental Tracing Algorithms (cont.)
• Garbage collection can be executed as a thread
• Reference counting can be considered an incremental algorithm
  - However, most languages do not use it due to its numerous disadvantages
• Incremental versions exist for some of the non-incremental algorithms seen before
  - E.g. Baker’s incremental copying algorithm
Tricoloring Algorithm
• It is an incremental algorithm based on coloring objects
• An object can have one of three colors
• White:
  - Initial state of all objects
  - Not visited
  - If it remains white at the end, then it can be collected
• Black:
  - Visited by the collector, so confirmed reachable
  - And has no direct references to White objects
• Grey:
  - Visited, but not all the objects it refers to have been visited
  - When the set of grey objects becomes empty, all remaining white objects can be destroyed
Tricoloring Algorithms (cont.)
• The steps of the algorithm are:
  - Start with all objects white (not visited)
  - Mark root objects grey
  - While there are grey objects
    - take a grey object
    - mark its white children grey, then mark it black
  - At the end, all white objects are garbage
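The steps above can be sketched directly. This is an illustrative sketch that runs the marking to completion in one go; the object graph is assumed to be a dict from object name to the names it refers to, and `tricolor_mark` is a hypothetical name.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


def tricolor_mark(graph, roots):
    """Color every object; whatever is still white at the end is garbage."""
    color = {obj: WHITE for obj in graph}   # start with all objects white
    grey = []

    for r in roots:                          # mark root objects grey
        if color[r] == WHITE:
            color[r] = GREY
            grey.append(r)

    while grey:                              # while there are grey objects
        obj = grey.pop()                     # take a grey object
        for child in graph[obj]:
            if color[child] == WHITE:        # only white children turn grey
                color[child] = GREY
                grey.append(child)
        color[obj] = BLACK                   # all children seen: mark it black

    return color


graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": [],
    "d": ["e"],    # d and e form an unreachable cycle
    "e": ["d"],
}
color = tricolor_mark(graph, roots=["a"])
garbage = sorted(obj for obj, c in color.items() if c == WHITE)
```

An incremental collector would perform the body of the `while` loop a little at a time, interleaved with the mutator, instead of looping until the grey set is empty.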
Tricoloring Marking Example
[Figure: four snapshots of the root set and the heap as tricolor marking progresses]
Tricoloring Invariant
• There must not be a pointer from a black object to a white object
• The GC algorithm on its own can guarantee this
  - But the mutator may violate it unless we are careful
[Figure: violation of the invariant]
Preventing Tricoloring Invariant Violation
• There are two ways to prevent violation of the tricoloring invariant
  - Both slow down the mutator slightly
• Using a write barrier:
  - Prevent the mutator from storing a pointer to a white object in a black object
  - If this is attempted, mark the black object grey
• Using a read barrier:
  - Any attempt to access a white object proves it is reachable, so mark it grey
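The write barrier can be sketched with a collector that marks one object per step while the mutator runs in between. This is a sketch only: the `Collector` class and its `step` and `write` methods are hypothetical, and in a real system the barrier would be code emitted around every pointer store.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Collector:
    def __init__(self, graph, roots):
        self.graph = graph                      # object name -> list of names
        self.color = {obj: WHITE for obj in graph}
        self.grey = []
        for r in roots:
            self._shade(r)

    def _shade(self, obj):
        if self.color[obj] == WHITE:
            self.color[obj] = GREY
            self.grey.append(obj)

    def step(self):
        """Process one grey object; return False once marking is finished."""
        if not self.grey:
            return False
        obj = self.grey.pop()
        for child in self.graph[obj]:
            self._shade(child)
        self.color[obj] = BLACK
        return True

    def write(self, src, dst):
        """Mutator store src -> dst, guarded by the write barrier."""
        self.graph[src].append(dst)
        if self.color[src] == BLACK and self.color[dst] == WHITE:
            # The store would create a black-to-white pointer, so the
            # black object goes back to grey and will be rescanned.
            self.color[src] = GREY
            self.grey.append(src)


graph = {"a": ["b"], "b": ["c"], "c": []}
gc = Collector(graph, roots=["a"])

gc.step()                  # a becomes black, b is grey, c is still white
gc.write("a", "c")         # mutator stores a -> c: the barrier re-greys a
graph["b"].remove("c")     # mutator then deletes the only other path to c

while gc.step():
    pass
```

Without the barrier, `a` would stay black, `c` would never be visited, and a reachable object would be collected.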
Pros and Cons of Incremental Algorithms
Pros:
• The mutator is not stopped (just paused for short periods)
Cons:
• Hard to synchronize the mutator with the garbage collector
• Take more execution time because of the barriers unless specialized hardware is used
• Hard to debug
Generational Garbage Collection
• The previous tracing algorithms execute on all objects
• Generational algorithms improve this by dividing objects into generations
• Based on the empirical observation that most objects die young
• Garbage collection is then executed more frequently on objects that are likely to be garbage: new objects
Generational Garbage Collection (cont.)
• Object lifetime is measured based on the amount of heap allocation that occurs between the object’s creation and deletion
• The heap is divided into several areas, called generations, that hold objects according to their age
• Areas containing newer objects are garbage collected more frequently
  - Resulting in shorter pause times for the mutator
  - This process is called minor collection
Generational Garbage Collection (cont.)
• Areas containing older objects are garbage collected less frequently
  - This process is called major collection
• After an object has survived a given number of collections
  - It is promoted to a less frequently collected area
• Choosing the right number of generations and the promotion policies can be a problem
• Some objects can be ‘tenured’, meaning the garbage collector never looks at them
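A promotion policy based on survival count can be sketched as below. This is a sketch with made-up parameters: the threshold `PROMOTION_AGE` and the function `age_and_promote` are hypothetical, and real collectors tune such policies in ways the slides do not cover.

```python
PROMOTION_AGE = 2   # hypothetical: promote after surviving two minor collections


def age_and_promote(young, old, survivors):
    """Update the generations after one minor collection.

    young:     dict mapping object name -> number of collections survived so far
    old:       set of object names in the old generation (updated in place)
    survivors: names of young objects found reachable by this minor collection
    """
    next_young = {}
    for name in survivors:
        age = young[name] + 1
        if age >= PROMOTION_AGE:
            old.add(name)            # promoted to the less frequently collected area
        else:
            next_young[name] = age   # stays young, one collection older
    return next_young                # objects that did not survive are simply dropped


young = {"a": 0, "b": 1, "c": 0}
old = set()
young = age_and_promote(young, old, survivors=["a", "b"])
```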
Intergenerational Pointers
• An intergenerational pointer is a pointer from an old generation object to a newer generation object
• Intergenerational pointers need to be tracked
• If these pointers are not tracked, then
  - a young object that is referenced only by older objects may be wrongly garbage collected
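The effect of tracking can be sketched with a minor collection that treats recorded old objects as extra starting points. This is a sketch under stated assumptions: the record of old objects is called a remembered set here, a common term that the slides do not use, and `minor_collect` is a hypothetical name.

```python
def minor_collect(young, refs, roots, remembered):
    """Return the set of young objects that survive a minor collection.

    young:      set of object names in the young generation
    refs:       dict mapping every object name to the names it points to
    roots:      names referenced directly from the root set
    remembered: old objects known to hold pointers into the young generation
    """
    # Start from young objects referenced by the roots...
    stack = [r for r in roots if r in young]
    # ...and from young objects referenced by tracked old objects.
    for old_obj in remembered:
        stack.extend(t for t in refs[old_obj] if t in young)

    live = set()
    while stack:
        obj = stack.pop()
        if obj not in live:
            live.add(obj)
            # Only the young generation is traced during a minor collection.
            stack.extend(t for t in refs[obj] if t in young)
    return live


refs = {"old1": ["y1"], "y1": ["y2"], "y2": [], "y3": [], "y4": []}
young = {"y1", "y2", "y3", "y4"}
roots = ["old1", "y4"]

with_tracking = minor_collect(young, refs, roots, remembered={"old1"})
without_tracking = minor_collect(young, refs, roots, remembered=set())
```

Without the remembered set, `y1` and `y2` would be reclaimed even though `old1` still points to `y1`.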
Garbage Collection and Java
• The 1.0 and 1.1 JDKs used a mark-sweep collector,
  - which causes memory fragmentation
• Allocation and deallocation costs were high
  - since the mark-sweep collector had to sweep the entire heap at every collection
Garbage Collection and Java (cont.)
• The HotSpot JVMs (Sun JDK 1.2 and later) used a generational collector
• The copying algorithm is used for the young generation
  - The free space in the heap is always contiguous
  - Both allocation and deallocation costs have improved compared to previous versions
• Mark-Compact is used for old objects
References
• Richard Jones and Rafael D. Lins, “Garbage Collection: Algorithms for Automatic Dynamic Memory Management”