Message Analysis-Guided Allocation and Low-Pause
Incremental Garbage Collection in a
Concurrent Language
Konstantinos Sagonas
Jesper Wilhelmsson
Uppsala University, Sweden
Goals of this work
Efficiently implement concurrency through asynchronous message-passing
Memory management with real-time characteristics
o Short stop-times
o High mutator utilization
Design for multithreading
Our context: Erlang
Designed for highly concurrent applications
Soft Real-Time
Light-weight processes
No destructive updates
Data types: atoms, numbers, PIDs, tuples, cons cells (lists), binaries
[Figure label: heap data]
Our context: the Erlang/OTP system
Industrial-strength implementation
Used in embedded applications
Three memory architectures: [ISMM’02]
o Private
o Shared
o Hybrid
Private heaps
[Figure: two processes (P), each with a private stack and heap; sending a message copies it, at cost O(|message|)]
Garbage collection is a private business
Fast memory reclamation of terminated processes
O(1)
Shared heap
P P
Global synchronization
Longer stop-times
No fast reclamation of process-local data
Hybrid architecture
P P
Message area
Process-local heaps
Big objects area
Allocating messages in the message area
Several possible methods
o User annotations
o Dynamic monitoring [Petrank et al ISMM’02]
o Static analysis guided allocation
Static message analysis [SAS’03]
Similar to escape analysis
Allocation is process-local by default
o Possible messages are allocated in the message area
o Copy on demand
Analysis is quite precise
o Typically finds 99% of all messages
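The allocation policy above can be sketched roughly as follows. This is only an illustration, not the Erlang/OTP implementation: the `Term` and `Process` classes, the `in_message_area` flag, and the `allocate` and `send` functions are all hypothetical names. Data that the analysis flags as a possible message is allocated in the message area and sent by reference; a message the analysis missed is copied when it is sent.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """Hypothetical heap term: a payload plus the area it was allocated in."""
    payload: object
    in_message_area: bool = False


@dataclass
class Process:
    """Hypothetical process with a mailbox of message references."""
    name: str
    mailbox: list = field(default_factory=list)


copies = 0  # counts the sends that had to fall back to copying


def allocate(payload, possible_message):
    # The static analysis result decides which area the allocation site uses.
    return Term(payload, in_message_area=possible_message)


def send(receiver, term):
    """Send by reference when possible; copy on demand otherwise."""
    global copies
    if not term.in_message_area:
        # The analysis missed this message: create a message-area copy now.
        term = Term(term.payload, in_message_area=True)
        copies += 1
    receiver.mailbox.append(term)
    return term


p = Process("P2")
shared = allocate(("hello", 1), possible_message=True)
local = allocate(("oops", 2), possible_message=False)
sent_shared = send(p, shared)  # passed by reference, no copy
sent_local = send(p, local)    # copied on demand
```

With a precise analysis the copying branch is taken rarely, so most sends are a constant-time reference pass.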
Garbage Collection in Hybrid Arch.
Process-local heaps
o Private business: No synchronization required
Message area
o Two generations
o Copying collector in young generation: Fast allocation
o Mark-and-sweep in old generation: Prevents repeated copying of old objects
GC of the message area is a bottleneck
1. Generational process scanning
2. Remembered set in local heaps
The root-set for the message area consists of all stacks and process-local heaps
This is not enough... We need an incremental collector in the Message Area!
Properties of incremental collector
No overhead on mutator
No space overhead on heap objects
Short stop-times
High mutator utilization
Organization of the Message Area
Young generation: Nursery and From-space
o Nursery and from-space always have a constant size (= 100k words)
Fwd: Storage area for forwarding pointers
o Size bound by (currently = )
Old generation: List of arbitrary sized areas
o Free-list, first-fit allocation
Black-map: Bit-array used to mark objects in mark-and-sweep
[Figure: the nursery with Ntop, the allocation limit, and Nlimit]
Incremental collector
Two approaches to choose from:
Work-based
Reclaim n live words each step
Time-based
A step takes no more than t ms
n and t are user-specified
Work-based collection
The mutator wants to allocate need words
reclaim = max( n , need )
Allocation limit = Ntop + reclaim
[Figure: the nursery with Ntop, the allocation limit, and Nlimit]
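The work-based rule can be written down in a few lines. This is a sketch under stated assumptions: the function name `work_based_limit` is made up, addresses are plain word indices, and clamping the result to Nlimit is an addition of this sketch that the slide does not state.

```python
def work_based_limit(n_top, n_limit, n, need):
    """Return (reclaim, allocation_limit) for one work-based step.

    n_top:   current allocation pointer in the nursery (word index)
    n_limit: end of the nursery (word index)
    n:       user-specified number of live words to reclaim per step
    need:    number of words the mutator wants to allocate
    """
    reclaim = max(n, need)
    # Clamping to the end of the nursery is an assumption of this sketch.
    allocation_limit = min(n_top + reclaim, n_limit)
    return reclaim, allocation_limit


step_small_need = work_based_limit(1000, 100000, n=100, need=30)
step_large_need = work_based_limit(1000, 100000, n=2, need=30)
```

Taking the maximum of n and need means a single large allocation request forces at least as much collector work as the mutator is about to consume.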
Time-based collection
1. User annotations (as in Metronome)
2. Dynamic worst-case calculation
How much can the mutator allocate?
How much live data is there?
Time-based collection
ΔGC = reclaimed after GC – reclaimed before GC
GCsteps = (  – reclaimed after GC) / ΔGC
wM = Nfree / GCsteps
Allocation limit = Ntop + wM
[Figure: the nursery with Ntop, the allocation limit, and Nlimit]
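The calculation above might be coded along the following lines. Several things here are assumptions rather than facts from the slides: the first term of the GCsteps numerator did not survive in the source, so the parameter `live_estimate` (an estimate of the live data still to be rescued) merely stands in for it; rounding the step count up, using at least one step, and returning 0 when the last step made no progress are also choices of this sketch.

```python
import math


def mutator_allowance(reclaimed_before, reclaimed_after, live_estimate, n_free):
    """Words the mutator may allocate before the next collector step (wM)."""
    # Progress made by the last collector step.
    delta_gc = reclaimed_after - reclaimed_before
    if delta_gc <= 0:
        return 0  # assumption: no measurable progress, so allow no allocation
    # `live_estimate` is a hypothetical stand-in for the missing term.
    remaining = max(live_estimate - reclaimed_after, 0)
    gc_steps = max(math.ceil(remaining / delta_gc), 1)  # steps still needed
    return n_free // gc_steps


def time_based_limit(n_top, w_m):
    """Allocation limit = Ntop + wM."""
    return n_top + w_m


w_m = mutator_allowance(reclaimed_before=0, reclaimed_after=500,
                        live_estimate=2500, n_free=8000)
limit = time_based_limit(1000, w_m)
```

The idea is that the free nursery space is divided evenly over the collector steps that are still expected, so the mutator cannot exhaust the nursery before the collection cycle finishes.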
Collecting the Message Area
[Figure sequence: processes P1, P2, P3 and the process queue; the message area with Fwd, From-space, and Nursery; the allocation limit in the nursery]
Cheap write barrier
Link receiver to a list in the send operation
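One way to picture the cheap write barrier in the send operation is the sketch below. All names (`Process`, `Collector`, `rescan_list`, `in_rescan_list`) are hypothetical, and linking every receiver during a collection cycle, rather than only receivers that have already been scanned, is a simplifying assumption of this sketch.

```python
from collections import deque


class Process:
    def __init__(self, name):
        self.name = name
        self.mailbox = []
        self.in_rescan_list = False  # keeps the barrier O(1) and duplicate-free


class Collector:
    """Hypothetical incremental collector state for the message area."""

    def __init__(self, processes):
        self.process_queue = deque(processes)  # processes still to be scanned
        self.rescan_list = []                  # receivers linked by send
        self.collecting = True

    def scan_next(self):
        """Take the next process off the queue; its roots count as scanned."""
        return self.process_queue.popleft()


def send(gc, receiver, message):
    receiver.mailbox.append(message)
    # Write barrier: while a collection cycle is running, link the receiver
    # into a list so the collector revisits it before the cycle ends.
    if gc.collecting and not receiver.in_rescan_list:
        receiver.in_rescan_list = True
        gc.rescan_list.append(receiver)


p1, p2, p3 = Process("P1"), Process("P2"), Process("P3")
gc = Collector([p1, p2, p3])
gc.scan_next()        # P1 has been scanned
send(gc, p1, "m1")    # P1 receives a message afterwards and is linked once
send(gc, p1, "m2")    # the second send finds the flag already set
```

The barrier costs one flag test per send and nothing on ordinary heap writes, which is possible because message sends are the only way a reference into the message area reaches another process.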
Performance evaluation: Settings
Intel Xeon 2.4 GHz, 1GB RAM, Linux
Start with small process-local heaps (233 words, grows when needed)
Measure active CPU time
o using hardware performance monitors
Performance evaluation: Benchmarks
Mnesia – Distributed database system
o 1,109 processes, 2,892,855 messages
Yaws – HTTP Web server
o 420 processes, 2,275,467 messages
Adhoc – Data mining application
o 137 processes, 246,021 messages
Stop-times – Time-based
[Figure: stop-time plots for Mnesia and Yaws, t = 1ms]
Stop-times – Work-based
[Figure: stop-time plots for Adhoc and Yaws, n = 2 words]
Mean: 3, Geo. Mean: 2
Mean: 9, Geo. Mean: 1
Stop-times – Work-based
[Figure: stop-time plots for Adhoc and Yaws, n = 100 words; x-axis: Time (s)]
Mean: 53, Geo. Mean: 46
Mean: 268, Geo. Mean: 36
Message area total GC times, incremental vs. non-incremental
Times in ms
Benchmark   MA GC (n = 2)   MA GC (n = 100)   MA GC (n = 1000)   MA GC (Non-Inc.)
Mnesia      182             164               156                88
Yaws        373             374               242                153
Adhoc       244             203               78                 27
Runtimes – Incremental
Times in ms
Benchmark   Mutator   Local GC   MA (n = 2)   MA (n = 100)   MA (n = 1000)
Mnesia      52,906    4,439      182          164            156
Yaws        237,629   11,728     373          374            242
Adhoc       61,045    8,194      244          203            78
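To put the message area (MA) times in proportion, the short script below computes the share of the run spent collecting the message area from the figures in the table above. Treating the total as mutator time plus local GC time plus MA GC time is an assumption of this sketch, and the Yaws row is read as 237,629 and 11,728.

```python
# Rows copied from the table: mutator, local GC, and MA GC for n = 2, 100, 1000 (ms).
runtimes = {
    "Mnesia": (52906, 4439, (182, 164, 156)),
    "Yaws": (237629, 11728, (373, 374, 242)),
    "Adhoc": (61045, 8194, (244, 203, 78)),
}


def ma_gc_share(mutator, local_gc, ma_gc):
    """Fraction of the summed runtime spent collecting the message area."""
    return ma_gc / (mutator + local_gc + ma_gc)


shares = {
    name: [ma_gc_share(mut, loc, ma) for ma in ma_times]
    for name, (mut, loc, ma_times) in runtimes.items()
}

for name, values in shares.items():
    print(name, ["%.2f%%" % (100 * v) for v in values])
```

Under that assumption, message area collection stays below half a percent of the summed time for every benchmark and every value of n.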
Minimum Mutator Utilization
The smallest fraction of time that the mutator executes in any time window of a given length [Cheng & Blelloch PLDI 2001]
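For a recorded run, minimum mutator utilization can be computed from the list of collector pauses. The sketch below is one straightforward way to do it, not the measurement code used for these slides; the function name `mmu` and the `(start, end)` pause representation are assumptions.

```python
def mmu(pauses, total_time, window):
    """Minimum mutator utilization for one window length.

    pauses:     list of (start, end) collector pauses, non-overlapping
    total_time: length of the whole run
    window:     window length, in the same unit, at most total_time
    """
    assert 0 < window <= total_time

    def gc_time_in(lo, hi):
        # Total pause time that overlaps the window [lo, hi].
        return sum(max(0.0, min(e, hi) - max(s, lo)) for s, e in pauses)

    # The worst window either starts at a pause start or ends at a pause end,
    # so only those positions (clamped to the run) need to be checked.
    candidates = {0.0, total_time - window}
    for s, e in pauses:
        candidates.add(s)
        candidates.add(e - window)

    worst_gc = 0.0
    for lo in candidates:
        lo = min(max(lo, 0.0), total_time - window)
        worst_gc = max(worst_gc, gc_time_in(lo, lo + window))
    return 1.0 - worst_gc / window


pauses = [(10, 11), (12, 13)]          # two 1 ms pauses, 1 ms apart
utilization_5ms = mmu(pauses, 100, 5)  # worst 5 ms window contains both pauses
utilization_1ms = mmu(pauses, 100, 1)  # a 1 ms window can be all pause
```

Two short pauses close together hurt utilization in a medium-sized window as much as one longer pause, which is why stop-times alone do not capture mutator utilization.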
Mutator Utilization – Work-based
[Figure: mutator utilization plots for Adhoc and Yaws, n = 100 words]
Concluding Remarks
Memory allocator is guided by the intended use of data
Incremental Garbage Collector
o High mutator utilization
o Small overhead on total runtime
o No mutator overhead
o Small space overhead
Really short stop-times!
Runtimes, incremental vs. non-incremental
Times in ms
Benchmark   Inc. Mutator   Non-Inc. Mutator
Mnesia      52,906         53,276
Yaws        237,629        240,985
Adhoc       61,045         61,578
Total GC times, incremental vs. non-incremental
Times in ms
Benchmark   Inc. Local GC   Non-Inc. Local GC
Mnesia      4,439           4,487
Yaws        11,728          11,359
Adhoc       8,194           7,848
Mutator Utilization – Time-based
[Figure: mutator utilization plots for Mnesia and Yaws, t = 1ms]