ParMarkSplit: A Parallel Mark-Split Garbage Collector Based on a Lock-Free Skip-List

Nhan Nguyen

Philippas Tsigas

Håkan Sundell

Distributed Computing and Systems

Chalmers University of Technology


Garbage Collection (GC)

Reclaim memory no longer used by programs.

Algorithms: mark-sweep (McCarthy 1959), copying (Fenichel & Yochelson 1969, Cheney 1970), reference counting (Collins 1960), mark-compact (Cohen & Nicolau 1983), mark-split (Sagonas & Wilhelmsson 2006), mark-region (Blackburn & McKinley 2008), etc.

Multicore era: garbage collectors need to be parallelized!


How do we parallelize GCs?

The answer is not limited to one algorithm!

Several algorithms have been parallelized: mark-sweep, mark-compact, copying, mark-region, etc.

Mark-Split [Sagonas et al. ISMM 2006]: Can Mark-Split be parallelized? Is it a challenging problem to solve?


Contributions

ParMarkSplit: a parallel and concurrent mark-split garbage collector
• Lock-free skip-list with extended functionality
• Lazy splitting mechanism

Implementation in OpenJDK HotSpot
• A collector for the old generation

Evaluation using the DaCapo benchmarks (Blackburn et al. 2006)


Contributions

Introduction
Background
• Mark-split algorithm
Contribution
• Skip-list with extended functionality
• Parallelization of mark-split
Evaluation
Conclusion


Mark-split algorithm (Sagonas & Wilhelmsson, ISMM 2006)

Mark-Sweep:
• MARK reachable objects as LIVE
• Scan the heap to SWEEP over unmarked objects
• Create the free-list: a list of free spaces

Mark-Split: mark without sweep
• MARK live objects
• Create the free-list during the mark phase, using an operation called SPLIT!

In some cases, this replacement of the sweep phase pays off!
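The difference between the two ways of building the free-list can be sketched on a toy heap. This is a minimal illustration, not the collectors' actual code: the heap is modeled as `(start, size)` objects that tile the address space, free space as half-open `[start, end)` intervals in a sorted Python list, and the names `sweep_free_list`, `split`, and `mark_split_free_list` are made up for this example.

```python
from bisect import bisect_right

def sweep_free_list(objects, marked):
    """Mark-sweep style: walk every object in address order and
    coalesce the unmarked ones into free intervals [start, end)."""
    free = []
    for start, size in objects:            # visits live AND dead objects
        if (start, size) in marked:
            continue
        if free and free[-1][1] == start:  # adjacent to the previous gap
            free[-1] = (free[-1][0], start + size)
        else:
            free.append((start, start + size))
    return free

def split(free, start, size):
    """Remove [start, start + size) from the free interval containing it."""
    end = start + size
    # find_interval: the last interval whose start is <= the object's start
    i = bisect_right(free, (start, float("inf"))) - 1
    lo, hi = free[i]
    assert lo <= start and end <= hi, "object must lie inside a free interval"
    pieces = []
    if lo < start:
        pieces.append((lo, start))
    if end < hi:
        pieces.append((end, hi))
    free[i:i + 1] = pieces                 # one delete, up to two inserts

def mark_split_free_list(heap_size, live_in_mark_order):
    """Mark-split style: start with the whole heap free and split the
    free-list each time a live object is marked. Dead objects are never visited."""
    free = [(0, heap_size)]
    for start, size in live_in_mark_order:
        split(free, start, size)
    return free

objects = [(0, 4), (4, 4), (8, 8), (16, 4), (20, 12)]   # tiles a 32-unit heap
live = [(16, 4), (4, 4)]                                # in marking order
print(sweep_free_list(objects, set(live)))
print(mark_split_free_list(32, live))
```

Both calls produce the same free intervals. The sweep loop runs once per object in the heap, while the split loop runs once per live object, which is why mark-split can pay off when most of the heap is garbage.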

Mark-Split (1) and (2)

[Figure sequence: the heap to be collected, the root set, and the free-list at successive steps of mark-split.]

Sequential Mark-Split outperforms Mark-Sweep in certain cases (ISMM 2006)


Mark-Split Algorithm for Multi-core Architectures

Most frequent operation in Mark-Split: “search and split”

Free-list implementation in a sequential context:
• Fast search is enough!
• Balanced search tree, skip-list, hash table

Free-list implementation in a multi-threaded context:
• Atomic “search and split” is needed
• split = find_interval + delete [+ insert_intervals]

Suitable search data structure to adapt? Skip-list!


Skip-list

[Figure: a skip-list over the keys 1 to 7 between Head and Tail, with layer densities labeled 50% and 25%.]

Layers of ordered lists with different densities achieve tree-like behavior.

Search: O(log₂ N), probabilistic!
Lock-free skip-list [Sundell & Tsigas 2003]
Extension needed to perform split
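To make the layered structure concrete, here is a small sequential skip-list sketch. It is not the lock-free algorithm of Sundell & Tsigas and has no concurrency control at all. The class and method names are invented for this example, and `floor` is an assumed helper showing the kind of lookup a free-list needs: find the interval with the greatest start address not exceeding a given address.

```python
import random

class Node:
    def __init__(self, key, value, height):
        self.key, self.value = key, value
        self.next = [None] * height        # one forward pointer per layer

class SkipList:
    MAX_HEIGHT = 16

    def __init__(self, seed=None):
        self.head = Node(float("-inf"), None, self.MAX_HEIGHT)
        self.rng = random.Random(seed)

    def _random_height(self):
        # Each extra layer is kept with probability 1/2, so about half of
        # the nodes appear in the second layer, a quarter in the third, etc.
        h = 1
        while h < self.MAX_HEIGHT and self.rng.random() < 0.5:
            h += 1
        return h

    def _predecessors(self, key):
        """For each layer, the last node whose key is < key."""
        preds = [self.head] * self.MAX_HEIGHT
        node = self.head
        for level in reversed(range(self.MAX_HEIGHT)):
            while node.next[level] is not None and node.next[level].key < key:
                node = node.next[level]
            preds[level] = node
        return preds

    def insert(self, key, value):
        # Sketch only: duplicate keys are not detected.
        preds = self._predecessors(key)
        node = Node(key, value, self._random_height())
        for level in range(len(node.next)):
            node.next[level] = preds[level].next[level]
            preds[level].next[level] = node

    def floor(self, key):
        """Return (k, v) with the greatest k <= key, or None if there is none."""
        preds = self._predecessors(key)
        nxt = preds[0].next[0]
        if nxt is not None and nxt.key == key:
            return nxt.key, nxt.value
        pred = preds[0]
        return None if pred is self.head else (pred.key, pred.value)

# Free intervals keyed by start address, with the end address as the value.
free = SkipList(seed=1)
for start, end in [(0, 4), (8, 16), (20, 32)]:
    free.insert(start, end)
print(free.floor(10))   # the interval starting at 8 is the candidate for address 10
```

The caller still has to check that the address is below the returned end value, since `floor` only compares start addresses.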


Extended functionality

Search data structure for parallelization of mark-split:
• Stores and efficiently searches for free memory chunks.
• Can perform a special, composite split operation.

split = 1 delete + 2 inserts, performed atomically using CAS


Lazy Splitting

Reduce the number of interval-splitting operations
• Lowers the contention on the shared skip-list

Observation: several adjacent objects are marked consecutively
• 10-60% of total objects
• Degree of adjacency

Mark-split becomes Mark-mark-…-mark-split
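One way to read the mark-mark-…-mark-split idea is sketched below. This is an assumption about the mechanism, not a description of the ParMarkSplit code: a marker keeps a pending run of adjacent objects and issues a single split request for the whole run once a non-adjacent object is marked. The function name `split_requests` and the `(start, size)` object model are invented for the example.

```python
def split_requests(mark_order, lazy):
    """Return the [start, end) ranges that would be passed to split,
    given objects (start, size) in the order they are marked."""
    requests = []
    run = None                          # pending run of adjacent marked objects
    for start, size in mark_order:
        end = start + size
        if not lazy:
            requests.append((start, end))   # eager: one split per object
        elif run is not None and start == run[1]:
            run = (run[0], end)             # object directly follows the run
        elif run is not None and end == run[0]:
            run = (start, run[1])           # object directly precedes the run
        else:
            if run is not None:
                requests.append(run)        # flush the finished run
            run = (start, end)
    if run is not None:
        requests.append(run)
    return requests

marks = [(0, 4), (4, 4), (8, 8), (20, 4), (24, 4), (40, 8)]
print(len(split_requests(marks, lazy=False)))   # one request per marked object
print(split_requests(marks, lazy=True))         # one request per adjacent run
```

With this marking order, six objects collapse into three runs, so the shared structure sees half as many split operations. The saving depends entirely on how often adjacent objects are marked back to back.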


Our parallelization of mark-split

Our Parallel Mark-Split (PMS) is naturally achieved by:
• Using the extended skip-list to store free memory intervals for splitting.
• Integrating concurrent splitting into parallel marking.

Implementation:
• New GC available for HotSpot, an industry-level Java Virtual Machine.

Evaluation using DaCapo benchmarks on Intel and AMD systems.
• Compared with lock-based PMS and Concurrent Mark-Sweep (CMS).


A comparison to Concurrent Mark-Sweep

Concurrent Mark-Sweep: Initial Mark, Concurrent Mark, Final Mark, Sweep

Parallel Mark-Split: Initial Mark & Split, Concurrent Mark & Split, Final Mark & Split, then translate the skip-list to a free-list for allocation


Evaluation setup

DaCapo benchmarks: avrora, tomcat, sunflow, lusearch, xalan
2 Intel Nehalem x 6 HT cores
4 AMD Bulldozer x 12 cores
Linux Ubuntu 11.10
OpenJDK 7


Lazy Splitting

[Figure: bar chart over avrora, lusearch, sunflow, tomcat and xalan, y-axis from 0% to 100%.]

Number of split_interval operations when using the lazy splitting mechanism

Evaluation – Garbage collection time

Stop-the-world scenario

[Figure: GC time (s) versus number of GC threads (6, 9, 12, 15) on the AMD and Intel systems for PMS, PMS_Lock and CMS. Panels: tomcat and sunflow, annotated “Good case” and “Bad case”.]

Evaluation – Benchmark time

Concurrent GC

[Figure: benchmark time (s) versus number of GC threads (6, 9, 12, 15) on the AMD and Intel systems for PMS, PMS_Lock and CMS. Panels: sunflow and tomcat, annotated “Good case” and “Bad case”.]

Characteristics of applications that benefit from ParMarkSplit:

• High garbage-to-live ratio

• Live objects reside adjacent to each other and are marked consecutively


Conclusion

Parallel and Concurrent Mark-Split GC
• An extended lock-free skip-list to handle free intervals
• Lazy splitting mechanism

Implemented as a garbage collector in the industry-standard OpenJDK 7 HotSpot

Evaluation
• ParMarkSplit performs well for applications with certain characteristics