SEG4110 - Advanced Software Design and Reengineering TOPIC L Garbage Collection Algorithms


SEG4110 - Advanced Software Design and Reengineering

TOPIC L

Garbage Collection Algorithms


SEG4110 - Topic L - Garbage Collection Algorithms


What is Garbage Collection?

• The automatic management of dynamically allocated storage

• Automatically freeing objects that are no longer used by the program

• Refer to Richard Jones's garbage collection website for a rich set of online resources: http://www.cs.kent.ac.uk/people/staff/rej/gc.html


Why Automatic Garbage Collection?

• It is not generally possible to determine immediately when a shared object is no longer ‘in use’
- So explicit memory management is often difficult to perform

• Software engineering perspective:
- Garbage collection raises the abstraction level of software development
- Decreases coupling among the system modules
- Frees software developers from spending a lot of time managing memory
- Eliminates many memory-management bugs


Terminology

• Stack:
- A memory area where data is pushed when a procedure is called and popped when it returns
- Contains local variables (which must be cleaned up)

• Heap:
- A memory area where data can be allocated and deallocated in any order
- Functions like ‘malloc’ and ‘free’, or the ‘new’ and ‘delete’ operators, allocate and free data in the heap
- In most good OO programs, all objects are allocated on the heap


Terminology (cont.)

• Root Set:
- A set of objects that a program always has direct access to
- E.g. global variables, or variables in the main program (stored on the program stack)

• A Heap Object (also called a cell or simply object):
- An individually allocated piece of data in the heap

• Reachable Objects:
- Objects that can be reached transitively from the root set objects
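The definition of reachable objects can be illustrated with a short Python sketch. The `Obj` class and the `reachable` function are hypothetical names chosen for this illustration; they model references as a plain list and are not taken from any real collector.

```python
class Obj:
    """A hypothetical heap object that holds references to other objects."""
    def __init__(self, name):
        self.name = name
        self.refs = []

def reachable(root_set):
    """Return the set of objects reachable transitively from the root set."""
    seen = set()
    stack = list(root_set)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(obj.refs)   # follow every outgoing reference
    return seen

a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.refs.append(b)   # a -> b
b.refs.append(c)   # b -> c
d.refs.append(c)   # d -> c, but nothing reaches d
names = sorted(o.name for o in reachable([a]))
print(names)
```

With `a` as the only root, `d` is not reachable even though it points to a reachable object: reachability follows references forward from the roots only.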


Terminology (cont.)

• Garbage:
- Objects that are unreachable from root set objects but are not free either

• Dangling references:
- A reference to an object that was deleted
- May cause the system to crash (if we are lucky!)
- May cause more subtle bugs

• Mutator:
- The user’s program
- (often contrasted with the ‘collector’)


Two Main Types of Algorithms

• Reference counting
- Each object has an additional field recording the number of objects that point to it
- An object is considered garbage when zero objects point to it

• Tracing
- Walk through the set of objects looking for garbage


Reference Counting

• Each object has an additional field called the reference count

• When an object is first created, its reference count is set to one

• When any other object (or root object) is assigned a reference to that object, its reference count is incremented

• When a reference to an object is deleted or is assigned a new value, the object's reference count is decremented


Reference Counting (cont.)

• Any object with a reference count equal to zero can be garbage collected

• When an object is garbage collected:
- Any object it refers to has its reference count decremented
- The garbage collection of one object may therefore lead to the immediate garbage collection of other objects
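The rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `RcObject`, `assign` and `release` are hypothetical names, the heap is modelled as a Python set, and the initial count of one stands for the reference held by whoever created the object.

```python
class RcObject:
    """A hypothetical heap object carrying a reference count field."""
    def __init__(self, name, heap):
        self.name = name
        self.count = 1          # set to one on creation (the creator's reference)
        self.fields = {}        # named references to other RcObjects
        self.heap = heap
        heap.add(self)

def release(obj):
    """Drop one reference to obj; reclaim it when the count reaches zero."""
    obj.count -= 1
    if obj.count == 0:
        obj.heap.discard(obj)               # the object is reclaimed
        for child in obj.fields.values():   # reclaiming may cascade to children
            release(child)

def assign(holder, field, target):
    """Store target in holder.fields[field], updating both reference counts."""
    target.count += 1                       # a new reference is gained
    old = holder.fields.get(field)
    holder.fields[field] = target
    if old is not None:
        release(old)                        # the overwritten reference is lost

heap = set()
a = RcObject("a", heap)   # count 1: held by a root variable
b = RcObject("b", heap)   # count 1: held by a root variable
assign(a, "next", b)      # b.count is now 2
release(b)                # the root drops b: count 1, still referenced by a
alive_before = sorted(o.name for o in heap)
release(a)                # the root drops a: a is reclaimed, which reclaims b
alive_after = sorted(o.name for o in heap)
print(alive_before, alive_after)
```

Incrementing the new target before releasing the old one keeps the sketch safe when a field is reassigned to the object it already holds.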


Reference Counting Example

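[Figure: heap objects annotated with their reference counts and referenced from the root set. After one reference is deleted, the objects whose counts fall to 0 are labelled garbage.]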


Pros and Cons of Reference Counting

Pros:

• The garbage collector is executed along with the mutator

• Free memory is returned to the free list quickly

Cons:

• The reference counts must be updated every time a pointer is changed

• We need to store an additional field for each object

• Unable to deal with cyclic data structures (see next slide)


Cyclic Data Structure

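[Figure: a cyclic data structure annotated with reference counts. After the reference into the cycle is deleted, every object in the cycle still has a nonzero count, so the cycle is garbage but cannot be reclaimed.]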
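The cycle problem can be reproduced in a few lines of Python. This is an illustrative sketch only: `Node`, `point` and `release` are hypothetical helpers, each node has a single `next` reference, and counts start at zero and are incremented explicitly to keep the arithmetic easy to follow.

```python
class Node:
    """A hypothetical heap object with a single outgoing reference."""
    def __init__(self, name):
        self.name = name
        self.count = 0
        self.next = None

def point(src, dst):
    """Make src.next refer to dst and update dst's reference count."""
    dst.count += 1
    src.next = dst

def release(obj, freed):
    """Drop one reference to obj; reclaim it (and cascade) at count zero."""
    obj.count -= 1
    if obj.count == 0:
        freed.append(obj.name)
        if obj.next is not None:
            release(obj.next, freed)

a, b, c = Node("a"), Node("b"), Node("c")
a.count = 1        # a is referenced from the root set
point(a, b)        # a -> b
point(b, c)        # b -> c
point(c, b)        # c -> b closes the cycle, so b.count == 2

freed = []
release(a, freed)  # the root reference to a is deleted
print(freed, b.count, c.count)
```

Only `a` is reclaimed. `b` and `c` are unreachable, yet each keeps the other's count at one, which is why a pure reference counting collector never frees them.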


Tracing Algorithms

• Used more widely than reference counting

• Visit the heap objects and determine which ones are no longer used

• Tracing algorithms differ according to:
- Whether all objects are visited or not
- Whether they use the heap in an optimal way or not
- Whether the collector is executed in parallel with the mutator or not
- The duration of the pauses that the mutator undergoes when the algorithm is executed


Mark-Sweep Algorithm

• The first tracing algorithm

• Invoked when the mutator requests memory but there is insufficient free space

• The mutator is stopped while the mark-sweep is executed
- This is impractical for real-time systems

• Performed in two phases:
- Mark phase: identifies all reachable objects by setting a mark
- Sweep phase: reclaims garbage objects
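The two phases can be sketched in Python as below. The `Cell` class and the list-based heap are assumptions made for this illustration; a real collector works on raw memory and a free list rather than Python lists.

```python
class Cell:
    """A hypothetical heap cell with a mark bit."""
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False

def mark(roots):
    """Mark phase: set the mark on every cell reachable from the roots."""
    stack = list(roots)
    while stack:
        cell = stack.pop()
        if not cell.marked:
            cell.marked = True
            stack.extend(cell.refs)

def sweep(heap):
    """Sweep phase: visit every cell in the heap and reclaim unmarked ones."""
    live, freed = [], []
    for cell in heap:
        if cell.marked:
            cell.marked = False      # clear the mark for the next collection
            live.append(cell)
        else:
            freed.append(cell)
    return live, freed

a, b, c, d = Cell("a"), Cell("b"), Cell("c"), Cell("d")
a.refs = [b]
c.refs = [d]
d.refs = [c]                         # c and d form an unreachable cycle
heap = [a, b, c, d]

mark([a])
heap, freed = sweep(heap)
live_names = [cell.name for cell in heap]
freed_names = [cell.name for cell in freed]
print(live_names, freed_names)
```

Unlike reference counting, the unreachable cycle formed by `c` and `d` is reclaimed, because the mark phase never visits it.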


Mark-Sweep Example

[Figure: the heap during the mark phase and after the sweep phase; three objects are flagged with an X.]


Pros and Cons of Mark-Sweep

Pros:

• Cyclic data structures can be recovered

• Tends to be faster than reference counting

Cons:

• Mutator must stop while the algorithm is being performed

• Every reachable object must be visited in the mark phase and every object in the heap must be visited in the sweep phase

• Causes memory fragmentation


Mark-Compact Algorithm

• Similar to the mark-sweep algorithm except that it does not cause memory fragmentation

• Two phases:
- Mark phase: identical to mark-sweep
- Compaction phase: marked objects are compacted, i.e. reachable objects are moved forward until they are contiguous
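A compaction phase can be sketched as follows. This is one possible arrangement, not the only one: the heap is modelled as a Python list of slots, each holding a `(name, refs)` tuple or `None`, and references are slot indices standing in for addresses. The function name `mark_compact` and the forwarding table are assumptions of this sketch.

```python
def mark_compact(heap, roots):
    """Compact live objects to the front of the heap.

    heap  : list of slots, each None (free) or (name, [addresses referred to])
    roots : list of addresses held by the root set
    Returns (new_heap, new_roots).
    """
    # Mark phase: identical to mark-sweep.
    marked = set()
    stack = list(roots)
    while stack:
        addr = stack.pop()
        if addr not in marked:
            marked.add(addr)
            stack.extend(heap[addr][1])

    # Pass 1: compute a forwarding address for every marked object,
    # preserving the original order.
    forward = {}
    for addr in range(len(heap)):
        if addr in marked:
            forward[addr] = len(forward)

    # Pass 2: move each object and rewrite its references.
    new_heap = [None] * len(heap)
    for old, new in forward.items():
        name, refs = heap[old]
        new_heap[new] = (name, [forward[r] for r in refs])
    new_roots = [forward[r] for r in roots]
    return new_heap, new_roots

heap = [("a", [2]), ("x", [3]), ("b", [4]), ("y", [1]), ("c", [])]
new_heap, new_roots = mark_compact(heap, [0])
print(new_heap, new_roots)
```

After compaction the live objects `a`, `b` and `c` occupy slots 0 to 2 and all free space is contiguous at the end. The extra passes for forwarding addresses and pointer updates are the cost of removing fragmentation.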


Mark-Compact Example

[Figure: the heap during the mark phase and after the compaction phase; three objects are flagged with an X.]


Pros and Cons of Mark-Compact

Pros:

• Eliminates the fragmentation problem

Cons:

• Mutator must stop while the algorithm is being performed

• Several passes over the objects are required to implement the compaction


The Copying Algorithm

• The copying algorithm splits the heap into two equal areas called from-space and to-space

• The mutator works in the from-space area

• The algorithm visits the reachable objects and copies them contiguously to the to-space area
- Objects need to be traversed only once

• Once the copying is completed, the to-space and from-space switch roles
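The copying step can be sketched in Python as below. The slides do not specify a traversal order; this sketch uses a breadth-first scan in the style of Cheney's algorithm, which is an assumption of the example. Spaces are modelled as lists of `(name, refs)` tuples with slot indices as addresses, and `copy_collect` is a hypothetical name.

```python
def copy_collect(from_space, roots):
    """Copy reachable objects from from_space into a fresh to-space.

    from_space : list of slots, each None or (name, [from-space addresses])
    roots      : list of from-space addresses held by the root set
    Returns (to_space, new_roots) with all addresses rewritten.
    """
    to_space = []
    forward = {}   # from-space address -> to-space address

    def copy(addr):
        # Copy the object on first visit; afterwards reuse its new address.
        if addr not in forward:
            forward[addr] = len(to_space)
            name, refs = from_space[addr]
            to_space.append((name, list(refs)))  # refs are fixed up when scanned
        return forward[addr]

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(to_space):                  # each copied object is scanned once
        name, refs = to_space[scan]
        to_space[scan] = (name, [copy(r) for r in refs])
        scan += 1
    return to_space, new_roots

from_space = [("a", [2]), ("x", [0]), ("b", [0, 4]), None, ("c", [])]
to_space, new_roots = copy_collect(from_space, [0])
print(to_space, new_roots)
```

The unreachable object `x` is never touched, so the work done is proportional to the number of live objects rather than to the size of the heap.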


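Copying Example

[Figure: the from-space and to-space halves of the heap before and after copying; the half not in use is labelled Unused and three objects are flagged with an X.]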


Pros and Cons of the Copying Algorithm

Pros:

• Eliminates fragmentation

• Copying is very fast
- Provided the percentage of reachable objects is low
- It only visits reachable objects

Cons:

• Mutator must stop while the algorithm is being performed

• The use of two areas doubles the memory space required

• Impractical for very large heaps


Incremental Tracing Algorithms

• The previous tracing algorithms are also known as Stop-The-World algorithms
- They stop the execution of the mutator while the collection is performed

• Incremental algorithms (also called parallel algorithms) run concurrently with the mutator
- Can be used in systems with real-time requirements


Incremental Tracing Algorithms (cont.)

• Garbage collection can be executed as a thread

• Reference counting can be considered an incremental algorithm
- However, most languages do not use it due to its numerous disadvantages

• Incremental versions exist for some of the non-incremental algorithms seen before
- E.g. Baker’s incremental copying algorithm


Tricoloring Algorithm

• It is an incremental algorithm based on coloring objects

• An object can have one of three colors

• White:
- Initial state of all objects
- Not visited
- If it remains white at the end, then it can be collected

• Black:
- Visited by the collector, so confirmed reachable
- Has no direct references to White objects

• Grey:
- Visited, but not all the objects it refers to have been visited
- When the set of grey objects becomes empty, all remaining white objects can be destroyed


Tricoloring Algorithm (cont.)

• The steps of the algorithm are:
- Start with all objects white (not visited)
- Mark root objects grey
- While there are grey objects:
  - take a grey object
  - mark its children grey, then mark it black
- At the end, all white objects are garbage
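These steps translate almost directly into Python. The sketch below runs the loop to completion in one go, so it shows the marking logic only and not the interleaving with the mutator. `Obj` and `tricolor_mark` are hypothetical names for this illustration.

```python
WHITE, GREY, BLACK = "white", "grey", "black"

class Obj:
    """A hypothetical heap object with a color field."""
    def __init__(self, name):
        self.name = name
        self.children = []
        self.color = WHITE

def tricolor_mark(heap, roots):
    """Run tricolor marking and return the objects left white (garbage)."""
    for obj in heap:
        obj.color = WHITE                # start with all objects white
    grey = []
    for root in roots:                   # mark root objects grey
        if root.color == WHITE:
            root.color = GREY
            grey.append(root)
    while grey:                          # while there are grey objects
        obj = grey.pop()                 # take a grey object
        for child in obj.children:
            if child.color == WHITE:     # only white children turn grey
                child.color = GREY
                grey.append(child)
        obj.color = BLACK                # all children seen: mark it black
    return [obj for obj in heap if obj.color == WHITE]

a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.children = [b]
b.children = [c]
d.children = [a]                         # d points at a, but nothing reaches d
garbage = tricolor_mark([a, b, c, d], [a])
garbage_names = [obj.name for obj in garbage]
print(garbage_names)
```

Only white children are turned grey; a child that is already grey or black has been seen and must not be demoted.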


Tricoloring Marking Example

[Figure: four snapshots of the root set and the heap as tricolor marking progresses.]


Tricoloring Invariant

• There must not be a pointer from a black object to a white object

• The GC algorithm on its own can guarantee this
- But the mutator may violate it unless we are careful

[Figure: violation of the invariant.]


Preventing Tricolor Invariant Violation

• Two ways to prevent violation of the tricoloring invariant

• Both slow down the mutator slightly

• Using a write barrier:
- Prevent the mutator from storing a pointer to a white object in a black object
- If this is attempted, mark the black object grey

• Using a read barrier:
- Any attempt to access a white object proves it is reachable, so mark it grey
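The write barrier variant can be sketched as below. This is a simplified, single-threaded model: the collector advances one grey object at a time through `step`, and the mutator performs pointer stores through `write`. The `IncrementalMarker` class and its method names are assumptions made for the illustration.

```python
WHITE, GREY, BLACK = "white", "grey", "black"

class Obj:
    """A hypothetical heap object with a color field."""
    def __init__(self, name):
        self.name = name
        self.children = []
        self.color = WHITE

class IncrementalMarker:
    """Tricolor marking that can be interleaved with mutator writes."""
    def __init__(self, heap, roots):
        self.heap = heap
        self.grey = []
        for obj in heap:
            obj.color = WHITE
        for root in roots:
            self._shade(root)

    def _shade(self, obj):
        if obj.color == WHITE:
            obj.color = GREY
            self.grey.append(obj)

    def step(self):
        """Process one grey object; return False once none are left."""
        if not self.grey:
            return False
        obj = self.grey.pop()
        for child in obj.children:
            self._shade(child)
        obj.color = BLACK
        return True

    def write(self, src, dst):
        """Write barrier: the mutator stores a pointer from src to dst."""
        src.children.append(dst)
        if src.color == BLACK and dst.color == WHITE:
            src.color = GREY          # black -> white pointer: rescan src later
            self.grey.append(src)

a, b, c = Obj("a"), Obj("b"), Obj("c")
a.children = [b]
b.children = [c]
marker = IncrementalMarker([a, b, c], [a])

marker.step()            # a becomes black, b grey, c is still white
marker.write(a, c)       # mutator: a now points to c (black -> white)
b.children.remove(c)     # mutator: the only other path to c disappears
while marker.step():
    pass
garbage = [obj.name for obj in marker.heap if obj.color == WHITE]
print(garbage, c.color)
```

Without the check in `write`, `a` would stay black and never be rescanned, `b` would no longer lead to `c`, and the reachable object `c` would be collected.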


Pros and Cons of Incremental Algorithms

Pros:

• The mutator is not stopped (just paused for short periods)

Cons:

• Hard to synchronize the mutator with the garbage collector

• Take more execution time because of the barriers unless specialized hardware is used

• Hard to debug


Generational Garbage Collection

• The previous tracing algorithms execute on all objects

• Generational algorithms improve this by dividing objects into generations

• Based on the empirical observation that most objects die young

• Garbage collection is then executed more frequently on objects that are likely to be garbage: new objects


Generational Garbage Collection (cont.)

• Object lifetime is measured based on the amount of heap allocation that occurs between the object’s creation and deletion

• The heap is divided into several areas, called generations, that hold objects according to their age

• Areas containing newer objects are garbage collected more frequently
- Resulting in shorter pause times for the mutator
- This process is called minor collection


Generational Garbage Collection (cont.)

• Areas containing older objects are garbage collected less frequently
- This process is called major collection

• After an object has survived a given number of collections
- It is promoted to a less frequently collected area

• Choosing the right number of generations and the promotion policies can be a problem

• Some objects can be ‘tenured’, meaning the garbage collector never looks at them
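A promotion policy based on the number of collections survived can be sketched as follows. Everything here is an assumption for illustration: the threshold `PROMOTION_AGE`, the dictionary representation of objects, and the `is_live` predicate, which stands in for the tracing a real minor collection would perform.

```python
PROMOTION_AGE = 2   # hypothetical threshold: collections survived before promotion

def minor_collection(young, old, is_live):
    """Collect the young generation; return the objects that stay young."""
    still_young = []
    for obj in young:
        if not is_live(obj):
            continue                 # garbage: dropped from the young generation
        obj["age"] += 1              # survived one more collection
        if obj["age"] >= PROMOTION_AGE:
            old.append(obj)          # promoted to the less frequently collected area
        else:
            still_young.append(obj)
    return still_young

young = [
    {"name": "tmp", "age": 0},
    {"name": "buf", "age": 0},
    {"name": "session", "age": 1},
]
old = []
live_names = {"buf", "session"}
young = minor_collection(young, old, lambda obj: obj["name"] in live_names)
young_names = [obj["name"] for obj in young]
old_names = [obj["name"] for obj in old]
print(young_names, old_names)
```

A lower threshold promotes objects sooner and shrinks the young generation, at the risk of filling the old generation with objects that die shortly afterwards.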


Intergenerational Pointers

• An intergenerational pointer is a pointer from an old generation object to a newer generation object

• Intergenerational pointers need to be tracked

• If these pointers are not tracked, then a young object may be garbage collected if it is only referenced by older objects
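One common way to track intergenerational pointers is a remembered set filled in by a barrier on pointer stores. The slides do not name a specific mechanism, so the sketch below is an assumption: `store`, `minor_collect` and the `remembered` set are hypothetical names, and generations are plain string labels.

```python
class Obj:
    """A hypothetical heap object tagged with its generation."""
    def __init__(self, name, generation="young"):
        self.name = name
        self.generation = generation
        self.refs = []

def store(src, dst, remembered):
    """Pointer store with a barrier that records old -> young pointers."""
    src.refs.append(dst)
    if src.generation == "old" and dst.generation == "young":
        remembered.add(src)

def minor_collect(young, roots, remembered):
    """Trace only young objects; remembered old objects act as extra roots."""
    live = set()
    stack = [r for r in roots if r.generation == "young"]
    for old_obj in remembered:
        stack.extend(r for r in old_obj.refs if r.generation == "young")
    while stack:
        obj = stack.pop()
        if obj in live:
            continue
        live.add(obj)
        stack.extend(r for r in obj.refs if r.generation == "young")
    return [obj for obj in young if obj in live]

cache = Obj("cache", "old")
y1, y2, y3 = Obj("y1"), Obj("y2"), Obj("y3")
remembered = set()
store(cache, y1, remembered)   # old -> young: recorded in the remembered set
store(y1, y2, remembered)      # young -> young: not recorded
young = [y1, y2, y3]

tracked = [o.name for o in minor_collect(young, [cache], remembered)]
untracked = [o.name for o in minor_collect(young, [cache], set())]
print(tracked, untracked)
```

With the remembered set, `y1` and `y2` survive the minor collection. With an empty set, the collector sees no path to them and would wrongly reclaim both, which is the failure described above.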


Garbage Collection and Java

• The 1.0 and 1.1 JDKs used a mark-sweep collector
- which causes memory fragmentation

• Allocation and deallocation costs were high
- since the mark-sweep collector had to sweep the entire heap at every collection


Garbage Collection and Java (cont.)

• The HotSpot JVMs (Sun JDK 1.2 and later) use a generational collector

• The copying algorithm is used for the young generation
- The free space in the heap is always contiguous
- Both allocation and deallocation costs have improved compared to previous versions

• Mark-Compact is used for old objects


References

• Richard Jones and Rafael D. Lins, “Garbage Collection: Algorithms for Automatic Dynamic Memory Management”