Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection Auerbach, Bacon, Cheng, Grove IBM Research Biron, Gracie, Micic, Sciampacone IBM SWG


Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection Auerbach, Bacon, Cheng, Grove IBM Research Biron, Gracie, Micic, Sciampacone IBM SWG McCloskey U.C. Berkeley - PowerPoint PPT Presentation


Page 1: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

Auerbach, Bacon, Cheng, Grove (IBM Research); Biron, Gracie, Micic, Sciampacone (IBM SWG); McCloskey (U.C. Berkeley)

Special thanks to the authors for sharing the slides they used at EMSOFT '08.

Page 2: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

Bring the productivity, reliability, security, and portability advantages of modern object-oriented languages to the construction of complex real-time systems.

Page 3: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• Real-time Garbage Collection
  – Metronome (IBM WebSphere Real-Time)
  – Metronome-TS
• Performance understanding tools
  – TuningFork (sourceforge.net)
• Programming Models
  – Eventrons, Exotasks, Flexotasks (Salzburg, Purdue, EPFL)
• Testbed applications
  – Harmonicon; Javiator (Salzburg)

Page 4: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• Garbage Collection
  – Automatic memory management
    • Programmer only allocates memory
    • GC automatically recovers unreachable memory
  – Productivity, Reliability, Security
  – Rich variety of GC algorithms and approaches
• Real-Time Garbage Collection
  – Provides time and space bounds
  – Not just “short” GC pauses

Page 5: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• Domains
  – Defense systems (USN Zumwalt-class destroyer)
  – Telecommunications (SIP)
  – Finance (stock trading)
• Vendors
  – IBM: WebSphere Real-Time (Metronome)
  – Sun: Java RTS
  – Azul Systems
  – BEA: WebLogic Real-Time

Page 6: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• Expanding scope of real-time applications
  – Varying application characteristics
    • Classic periodic systems
    • Queue-based systems
    • Adaptive, interactive, ... (What are these systems?)
  – Varying operating environments
    • OS functionality (RTOS? RT Linux? Stock Unix?)
    • Uni-processor vs. multi-processor
    • Dedicated vs. multi-programmed workloads
• No existing system robustly handles the entire space of combinations

Page 7: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• Scheduling problem
  – When to do GC work?
  – How much GC work to do at a time?
• Challenges
  – Complex global invariants & data structures
  – Complete entire GC cycle before space reclaimed
  – Work required for GC cycle can be unpredictable
  – Scheduling just enough work to ensure completion

Page 8: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• Tax-and-Spend Scheduling
  – Slack-based
  – Tax-based
  – Tax-and-Spend
• From Metronome to Metronome-TS
• Empirical results
• Conclusions

Page 9: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• GC runs only during “slack” periods
  – No runnable “critical” application threads
• Requires
  – Concurrent GC algorithm
  – Programmer identification of critical threads
• Assessment
  ✓ Familiar real-time systems paradigm
  ✓ Can exploit excess capacity and SMP systems
  ✓ Critical threads run with minimal GC interference
  x Identification of critical threads
  x Catastrophic failure when insufficient slack or overload

Page 10: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• Interrupt application to perform GC work
• Two taxation schemes
  – Work-based [Baker]
  – Time-based [Bacon et al] (Metronome)
• Both schemes require highly incremental GC
  – GC work broken into small slices
  – 100s or 1000s of slices in a single GC cycle

Page 11: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• For each N units of allocation work done by the application, perform c*N units of GC work
• Assessment
  ✓ Provable space bounds: GC will complete in time
  x Highly variable effective pause times
  x Unable to exploit excess capacity to reduce tax
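The work-based rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `WorkBasedTax`, the abstract "units" of work, and the rate `c=0.5` are hypothetical and do not come from the slides. It shows why pause lengths vary: the tax charged at each allocation is proportional to the size of that allocation.

```python
class WorkBasedTax:
    """Hypothetical work-based (Baker-style) taxation: c units of GC work per unit allocated."""

    def __init__(self, c):
        self.c = c                # tax rate: GC work per unit of allocation
        self.gc_work_done = 0.0   # total GC work charged so far

    def allocate(self, n_units):
        """Charge the allocating thread c * n_units of GC work before it proceeds."""
        tax = self.c * n_units
        self.gc_work_done += tax  # stand-in for actually running the collector
        return tax


tax = WorkBasedTax(c=0.5)
# Each entry is the GC work done at one allocation site; it tracks allocation size.
pauses = [tax.allocate(n) for n in (8, 64, 1024)]
print(pauses)
```

A bursty allocator therefore sees bursty pauses, which is the drawback the assessment lists.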

Page 12: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• For every N time units the application runs, do N/k time units of GC work
• Requires accurate low-overhead OS timers
• Assessment
  ✓ Predictable scheduling and pause times
  ✓ Provable worst-case time/space bounds
  x Unable to exploit excess capacity to reduce tax
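The arithmetic behind the time-based rule is short enough to write out. If the application runs for N time units and the collector then runs for N/k, the application's share of time is N / (N + N/k) = k / (k + 1). The function names below are hypothetical, and the sketch ignores timer granularity and scheduling overhead.

```python
def gc_time_owed(app_time, k):
    """Time-based tax: for every N time units of application work, owe N / k of GC."""
    return app_time / k


def mutator_utilization(k):
    """Steady-state share of time left to the application: N / (N + N/k) = k / (k + 1)."""
    return k / (k + 1)


for k in (1, 2, 4):
    print(k, gc_time_owed(10.0, k), round(mutator_utilization(k), 3))
```

Because the tax depends only on elapsed time, not on allocation behavior, the resulting pauses are predictable, which matches the assessment above.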

Page 13: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• Per-thread time-based taxation
  – Each application thread has tax rate (MMU target)
  – Time is per-thread CPU time
• Tax credits
  – Created by low-priority GC background threads
  – Reduce the effective application tax rate
• Simple tax laws work well in practice
  – Same tax rate for all application threads
  – Tax credits shared equally among threads

Page 14: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• Where to collect taxes
  – Allocation slow paths (covers most applications)
  – Time-triggered yield points (when not allocating)
• When an application thread owes taxes
  – Attempt to pay tax by withdrawing credit from bank
    • Success
      – GC is “ahead” due to background threads (deposit credits)
      – Immediately resume application work
    • Failure
      – If partial credit, do reduced GC work quantum
      – If no credit, do full GC work quantum
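The withdraw-or-work decision described on this slide can be sketched as a small shared "bank". This is a hedged illustration, not the Metronome-TS implementation: `CreditBank`, `pay_tax`, and the use of a lock are assumptions made for the example, and credits are measured in the same abstract time units as the tax.

```python
import threading


class CreditBank:
    """Hypothetical shared pool of GC work credits deposited by background GC threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._credits = 0.0

    def deposit(self, amount):
        """Called by a background GC thread after it completes `amount` of GC work."""
        with self._lock:
            self._credits += amount

    def withdraw(self, owed):
        """Take up to `owed` credits from the bank and return the amount obtained."""
        with self._lock:
            paid = min(owed, self._credits)
            self._credits -= paid
            return paid


def pay_tax(bank, owed):
    """Return the GC work the application thread must still perform itself."""
    return owed - bank.withdraw(owed)


bank = CreditBank()
bank.deposit(1.5)
print(pay_tax(bank, 1.0))  # full credit available: resume application work at once
print(pay_tax(bank, 1.0))  # partial credit: do a reduced GC quantum
print(pay_tax(bank, 1.0))  # no credit: do the full GC quantum
```

The three calls correspond to the success, partial-credit, and no-credit cases listed above.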

Page 15: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• Metronome’s tax-based scheduler is global
  – Monolithic policy for entire JVM
• Metronome-TS schedules GC per-thread
  – Taxation concurrent and asynchronous
  – Different threads can have different tax rates
  – “Critical” threads can run with minimal GC interference
  – Background GC threads exploit excess CPU capacity

Page 16: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• Operating system
  – Accurate per-thread CPU timer
    • Standard on recent Linux kernels (e.g. RHEL-5)
• GC Algorithm
  – Fully concurrent
  – Highly incremental
  – Parallel
  – GC work done on application and GC threads

Page 17: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• Tax-and-Spend Scheduling
• From Metronome to Metronome-TS
  – Distributed Agreement
  – Ensuring Progress
• Empirical results
• Conclusions

Page 18: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• GC algorithms require global agreement
  – GC cycle started/completed
  – Enable/disable write barriers
  – Trace completed (all live objects found)
  – Other instances induced by Java semantics
• Metronome
  – Not concurrent; uses synchronous agreement
• Metronome-TS
  – Fully concurrent; needs asynchronous agreement

Page 19: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• A single monotonic global epoch number
• Per-thread local epochs
  – Always less than or equal to global epoch
• “Each time” a thread reaches a safe-point:
  – It reads from the global epoch
  – Uses global epoch number as its new local epoch
• Agreement protocol
  – A thread modifies shared global state, atomically increments global epoch and remembers the new value
  – All local epochs ≥ remembered value implies agreement
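The epoch protocol above can be sketched as follows. The class and method names (`RaggedEpoch`, `safe_point`, `publish`, `agreed`) are hypothetical, and a lock stands in for the atomic increment that a real VM would perform in hardware. The sketch only models the bookkeeping, not real thread scheduling.

```python
import threading


class RaggedEpoch:
    """Hypothetical model of the global/local epoch agreement protocol."""

    def __init__(self, thread_ids):
        self._lock = threading.Lock()  # stand-in for an atomic increment
        self.global_epoch = 0
        self.local_epoch = {tid: 0 for tid in thread_ids}

    def safe_point(self, tid):
        """At a safe point, a thread adopts the current global epoch as its local epoch."""
        self.local_epoch[tid] = self.global_epoch

    def publish(self, mutate_shared_state):
        """Change shared state, then bump the global epoch and return the new value."""
        mutate_shared_state()
        with self._lock:
            self.global_epoch += 1
            return self.global_epoch

    def agreed(self, remembered):
        """All threads have passed a safe point since the change was published."""
        return all(e >= remembered for e in self.local_epoch.values())


epochs = RaggedEpoch(["t1", "t2"])
flags = {}
target = epochs.publish(lambda: flags.update(write_barrier=True))
epochs.safe_point("t1")
print(epochs.agreed(target))  # t2 has not yet reached a safe point
epochs.safe_point("t2")
print(epochs.agreed(target))
```

Because each thread updates its local epoch on its own schedule, no thread ever waits at a barrier; the initiator simply polls `agreed` until it holds.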

Page 20: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• Agreement only between threads doing GC “right now”
  – What type of work should they be doing?
  – Can they transition from one phase of GC to next?
  – Ragged Epoch is overkill (involves all threads in system)
• GC phase & worker count kept in one machine word
  – Worker enters: atomically increment count
    • Phase encodes what work to do
  – Worker exits: atomically decrement count
  – Phase change:
    • Last worker out: atomically changes phase & count
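A minimal sketch of keeping the phase and worker count in one word, assuming a 16-bit count field and a lock-based stand-in for hardware compare-and-swap. The names `PhaseWord`, `AtomicWord`, `enter`, `leave`, and the `next_phase` argument are hypothetical; how a real worker decides that a phase is finished is not specified on the slide, so here the caller passes it in.

```python
import threading

COUNT_BITS = 16                      # assumed split; low bits hold the worker count
COUNT_MASK = (1 << COUNT_BITS) - 1
MARK, SWEEP = 1, 2                   # hypothetical phase identifiers


def pack(phase, count):
    return (phase << COUNT_BITS) | count


def unpack(word):
    return word >> COUNT_BITS, word & COUNT_MASK


class AtomicWord:
    """Lock-based stand-in for a hardware compare-and-swap on one machine word."""

    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


class PhaseWord:
    def __init__(self, phase):
        self._word = AtomicWord(pack(phase, 0))

    def snapshot(self):
        return unpack(self._word.load())

    def enter(self):
        """Join the current phase; the returned phase says what work to do."""
        while True:
            old = self._word.load()
            phase, count = unpack(old)
            if self._word.compare_and_swap(old, pack(phase, count + 1)):
                return phase

    def leave(self, next_phase=None):
        """Leave the phase; the last worker out may change phase and count in one step."""
        while True:
            old = self._word.load()
            phase, count = unpack(old)
            if count == 1 and next_phase is not None:
                new = pack(next_phase, 0)        # last worker out switches phase
            else:
                new = pack(phase, count - 1)     # others just decrement the count
            if self._word.compare_and_swap(old, new):
                return unpack(new)


word = PhaseWord(MARK)
word.enter()
word.enter()
word.leave(next_phase=SWEEP)   # not the last worker: phase stays MARK
print(word.snapshot())
word.leave(next_phase=SWEEP)   # last worker out: phase becomes SWEEP
print(word.snapshot())
```

Packing both fields into one word is what lets a single compare-and-swap update them together, so no worker can enter between the count reaching zero and the phase changing. The sketch does not guard against count overflow.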

Page 21: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• Special case for marking phase
• Write barriers of different worker threads can have objects in their stacks
• A single thread is responsible for declaring the end of the marking phase
• Use ragged epoch mechanism to detect whether all write buffers are empty
• What other approaches could we use?
• What are soft and weak references, string interning, and finalization in Java?
• How do they affect garbage collection?

Page 22: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• Symptoms
  – Threads may not execute safe points in a timely fashion, stalling Ragged Epoch
  – Threads may get stuck while doing GC work
• Cause
  – OS scheduling: multi-programming, priorities
• Solution
  – Detect and priority boost laggard threads
  – In effect, priority inheritance on logical resource
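Laggard detection can be sketched on top of per-thread local epochs: any thread whose local epoch is still below the value being waited on is holding up agreement. The function names and the `boost` callback are hypothetical; the actual mechanism used to raise a thread's OS priority is not described on the slide and is left abstract here.

```python
def find_laggards(local_epochs, remembered):
    """Threads that have not reached a safe point since the epoch `remembered`."""
    return sorted(tid for tid, epoch in local_epochs.items() if epoch < remembered)


def boost_laggards(local_epochs, remembered, boost):
    """Apply a caller-supplied priority boost to each laggard and report which were boosted."""
    laggards = find_laggards(local_epochs, remembered)
    for tid in laggards:
        boost(tid)  # hypothetical hook; a real VM would adjust OS scheduling priority
    return laggards


boosted = []
print(boost_laggards({"t1": 3, "t2": 2, "t3": 3}, 3, boosted.append))
```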

Page 23: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• Tax-and-Spend Scheduling
• From Metronome to Metronome-TS
• Empirical results
  – Methodology
  – SPECjbb2000
  – SPECjvm98, DaCapo
  – Critical section deferral
• Conclusions

Page 24: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• Implemented in IBM's J9 VM
  – Identical baseline for Metronome & Metronome-TS
• LS-41 (8-way AMD-based blade)
  – Running RHEL5-MRG (Real-Time Linux)
  – TuningFork: instrumented App, JVM, Linux kernel
    • Enable detailed analysis of MMU, pauses, etc.
  – Taskset to segregate instrumentation to 1 CPU
• SPECjbb2000, SPECjvm98, DaCapo

Page 25: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

Metronome-TS uniformly better than Metronome
• 2.5x lower max transaction time
• 1.6x lower 99.999% transaction time
• 20% higher throughput

Page 26: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• Metronome-TS smoothly and robustly exploits excess CPU capacity

• 15% throughput improvement with background threads

• Experiments with normal and real-time “hamsters” in paper: under load no degradation from background threads

Page 27: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• 18 programs: with/without background threads
  – MMU@4ms: 60% or better
  – Max GC pause: <400 microseconds
    • hsqldb, lusearch, xalan have longer pauses due to OS context switch during GC quantum
  – Background threads effectively offload GC work when system has excess CPU capacity
    • Greatly reduce median & std dev of GC pauses
    • Increase application throughput
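For readers unfamiliar with the metric, MMU (minimum mutator utilization) at a window size w is the smallest fraction of any w-long window that the application gets to run. The sketch below computes it from a list of GC pause intervals; the function name, the input format, and the sample numbers are hypothetical and are not measurements from the slides. It relies on the fact that GC time inside a sliding window is piecewise linear in the window's start, so the worst window has an edge on a pause boundary.

```python
def mmu(pauses, window, trace_end):
    """Minimum mutator utilization for `window` over a trace spanning [0, trace_end].

    `pauses` is a list of non-overlapping (start, end) GC intervals in the same
    time unit as `window`; `window` must not exceed `trace_end`.
    """
    def gc_time_in(t0):
        t1 = t0 + window
        return sum(max(0.0, min(end, t1) - max(start, t0)) for start, end in pauses)

    # The worst window starts or ends on a pause boundary, so check those positions.
    candidates = {0.0, trace_end - window}
    for start, end in pauses:
        candidates.update((start, end, start - window, end - window))

    last_start = trace_end - window
    worst = max(gc_time_in(min(max(t, 0.0), last_start)) for t in candidates)
    return 1.0 - worst / window


sample = [(1.0, 1.5), (2.0, 2.5)]   # two 0.5 ms pauses, made up for illustration
print(mmu(sample, window=4.0, trace_end=10.0))
print(mmu(sample, window=1.0, trace_end=10.0))
```

Smaller windows are a stricter test: the same pauses that leave 75% utilization over 4 ms leave only 50% over 1 ms in this made-up trace.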

Page 28: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• Better determinism than Metronome
  – Higher MMU with smaller window size
  – Even with stricter MMU definition (includes barrier and allocation slow paths as GC work)
  – Significant improvements in SPECjbb2000
    • 2.5x lower max transaction time
    • 1.6x lower 99.999% transaction time
• Better throughput than Metronome

Page 29: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• Tax-and-Spend Scheduling
  – Combines desirable properties of Tax-based and Slack-based approaches
  – Unified paradigm that supports wide range of application types and operating environments
• Metronome-TS implementation
  – Highly incremental, fully concurrent, parallel GC that supports all Java language features
  – Applied rigorous per-thread MMU metric
  – Protocols for distributed agreement

Page 30: Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection

• What is the performance of Metronome-TS with a small number of cores, e.g. 2 cores?
• The experiments used a large memory (12 GB). What is the performance on memory-constrained systems?
• The overload condition is defined only as number of processors <= number of worker threads
  – What is the utilization of each processor under overload?
  – Do we overload memory?
• What is the performance of each phase of GC under these conditions?