
JSSPP-11, Boston, MA June 19, 2005

Pitfalls in Parallel Job Scheduling Evaluation

Eitan Frachtenberg and Dror Feitelson

Computer and Computational Sciences Division, Los Alamos National Laboratory

Ideas that change the world


Scope

Numerous methodological issues occur with the evaluation of parallel job schedulers:

Experiment theory and design

Workloads and applications

Implementation issues and assumptions

Metrics and statistics

Paper covers 32 recurring pitfalls, organized into topics and sorted by severity

Talk will describe a real case study, and the heroic attempts to avoid most such pitfalls

…as well as the less-heroic oversight of several others


Evaluation Paths

Theoretical analysis (queuing theory):
Reproducible, rigorous, and resource-friendly
Hard for time slicing due to unknown parameters, application structure, and feedbacks

Simulation:
Relatively simple and flexible
Many assumptions, not all known/reported; hard to reproduce; rarely factors in application characteristics

Experiments with real sites and workloads:
Most representative (at least locally)
Largely impractical and irreproducible

Emulation


Emulation Environment

Experimental platform consisting of three clusters with a high-end network

Software: several job scheduling algorithms implemented on top of STORM:
Batch / space sharing, with optional EASY backfilling
Gang Scheduling, Implicit Coscheduling (SB), Flexible Coscheduling

Results described in [JSSPP'03] and [TPDS'05]
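The EASY backfilling rule mentioned above can be sketched in a few lines: the queue head gets a reservation at the earliest time enough nodes free up, and a later job may jump ahead only if it cannot delay that reservation. This is an assumed textbook formulation, not the STORM implementation; the function name and the job tuple layout are illustrative.

```python
# Minimal sketch of EASY backfilling (textbook form, not the STORM code).
# running: list of (end_time, nodes) for executing jobs;
# queue: FCFS list of (req_nodes, est_runtime), run times are user estimates.
# Assumes every req_nodes <= total_nodes.

def easy_backfill_candidates(total_nodes, running, queue, now=0.0):
    """Return indices of queued jobs that could start right now
    without delaying the reservation of the queue head."""
    free = total_nodes - sum(n for _, n in running)
    if not queue:
        return []
    head_nodes, _ = queue[0]
    if head_nodes <= free:
        return [0]  # the head starts immediately; nothing to backfill
    # Reservation for the head: walk terminations in time order until
    # enough nodes accumulate.  'shadow' is the reservation start time,
    # 'extra' the nodes the head would leave unused at that time.
    avail, shadow, extra = free, now, 0
    for end, n in sorted(running):
        avail += n
        if avail >= head_nodes:
            shadow, extra = end, avail - head_nodes
            break
    candidates = []
    for i, (req, est) in enumerate(queue[1:], start=1):
        fits_now = req <= free
        ends_before_shadow = now + est <= shadow
        uses_spare_nodes = req <= min(free, extra)
        if fits_now and (ends_before_shadow or uses_spare_nodes):
            candidates.append(i)
    return candidates
```

A real scheduler would start one candidate at a time and recompute the reservation after each start; this sketch only lists which jobs are eligible in the current state.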


Step One: Choosing Workload

Static vs. dynamic

Size of workload

How many different workloads are needed?

Use trace data?
Different sites have different workload characteristics
Inconvenient sizes may require imprecise scaling
"Polluted" data, flurries

Use model-generated data?
Several models exist, with different strengths
By trying to capture everything, they may capture nothing


Static Workloads

We start with a synthetic application and static workloads:
Simple enough to model, debug, and calibrate
Bulk-synchronous application
Can control: granularity, variability, and communication pattern
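The kind of bulk-synchronous synthetic application described above can be sketched as alternating compute phases and barriers, with knobs for granularity and variability. This is an illustrative stand-in (threads instead of MPI ranks; all names and defaults are assumptions, not the paper's code).

```python
# Illustrative sketch of a bulk-synchronous synthetic application:
# each "process" (a thread here, standing in for an MPI rank) loops
# over a compute phase of controllable granularity and variability,
# then synchronizes at a barrier (standing in for a collective).

import random
import threading
import time

def synthetic_worker(barrier, iterations, granularity, variability, rng):
    # granularity: mean compute time per phase (seconds)
    # variability: fractional jitter around that mean, in [0, 1)
    for _ in range(iterations):
        compute = granularity * (1 + rng.uniform(-variability, variability))
        time.sleep(compute)   # stand-in for real computation
        barrier.wait()        # bulk-synchronous step boundary

def run_synthetic_app(nprocs=4, iterations=5, granularity=0.01,
                      variability=0.5, seed=0):
    """Run the synthetic app and return its wall-clock run time."""
    barrier = threading.Barrier(nprocs)
    rng = random.Random(seed)
    threads = [threading.Thread(target=synthetic_worker,
                                args=(barrier, iterations, granularity,
                                      variability, rng))
               for _ in range(nprocs)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start
```

Varying granularity and variability per job is what produces mixes like the balanced and imbalanced scenarios on the next slide.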


Synthetic Scenarios

Balanced, Complementing, Imbalanced, Mixed


Example: Turnaround Time

[Bar chart: turnaround times (0-400) for the Balanced, Imbalanced, Complementing, and Mixed scenarios, comparing FCFS, GS, SB, FCS, and Optimal]


Dynamic Workloads

We chose Lublin's model [JPDC'03]

1000 jobs per workload

Multiplying run times AND arrival times by a constant to "shrink" run time (2-4 hours)
Shrinking too much is problematic (system constants)

Multiplying arrival times by a range of factors to modify load
Unrepresentative, since this deviates from "real" correlations with run times and job sizes
A better solution is to use different workloads
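The two trace transformations above can be sketched as follows. The (arrival, run_time, nodes) tuple layout and the rough open-model load estimate are assumptions for illustration, not the Lublin model's actual output format; the estimate also assumes a nonzero last-arrival time.

```python
# Sketch of the two trace transformations discussed above.
# A job is (arrival, run_time, nodes).

def shrink_workload(jobs, factor):
    """Multiply BOTH arrival and run times by `factor` (< 1) so the
    whole experiment fits in a few hours; offered load is unchanged."""
    return [(a * factor, r * factor, n) for a, r, n in jobs]

def scale_load(jobs, factor):
    """Multiply only arrival times (hence all inter-arrival gaps) by
    `factor`; factor < 1 raises offered load, but distorts the "real"
    correlations between arrival rate, run time, and job size."""
    return [(a * factor, r, n) for a, r, n in jobs]

def offered_load(jobs, total_nodes):
    """Node-seconds demanded divided by node-seconds available up to
    the last arrival (a rough open-model estimate)."""
    horizon = max(a for a, _, _ in jobs)
    demand = sum(r * n for _, r, n in jobs)
    return demand / (horizon * total_nodes)
```

Shrinking both time axes preserves offered load, while scaling arrivals alone changes it, which is exactly why the latter deviates from real correlations.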


Step Two: Choosing Applications

Synthetic applications are easy to control, but:

Some characteristics are ignored (e.g., I/O, memory)

Others may not be representative, in particular communication, which is the salient feature of parallel applications:
Granularity, pattern, network performance
If not sure, conduct a sensitivity analysis

Applications might be assumed malleable, moldable, or to have linear speedup, which many MPI applications are not

Real applications have no hidden assumptions

But they may also have limited generality


Example: Sensitivity Analysis


Application Choices

Synthetic applications on the first set
Allows control over more parameters
Allows testing unrealistic but interesting conditions (e.g., a high multiprogramming level)

LANL applications on the second set (Sweep3D, Sage)
Real memory and communication use (MPL=2)
Important applications for LANL's evaluations
But probably only for LANL…

Runtime estimate: f-model on batch, MPL on others


Step Three: Choosing Parameters

What are reasonable input parameters to use in the evaluation?

Maximum multiprogramming level (MPL)
Timeslice quantum
Input load
Backfilling method and its effect on multiprogramming
Run-time estimate factor (not tested)
Algorithm constants, tuning, etc.


Example 1: MPL

Verified with different offered loads


Example 2: Timeslice

Dividing into quantiles allows analysis of the effect on different job types


Considerations for Parameters

Realistic MPLs

Scaling traces to different machine sizes

Scaling offered load

Artificial user estimates and multiprogramming estimates


Step Four: Choosing Metrics

Not all metrics are easily comparable:
Absolute times, slowdown with time slicing, etc.
Metrics may need to be limited to a relevant context
Use multiple metrics to understand characteristics

Measuring utilization for an open model:
A direct measure of offered load until saturation
The same goes for throughput and makespan
Better metrics: slowdown, response time, wait time

Using the mean with asymmetric distributions

Inferring scalability from O(1) nodes
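For concreteness, here is a sketch of the commonly used definitions of the metrics named above. The bounded-slowdown threshold tau (often taken as 10 seconds) keeps very short jobs from dominating the mean; treat the exact default as an illustrative assumption.

```python
# Common per-job scheduling metrics (assumed standard formulas:
# slowdown = (wait + run) / run; bounded slowdown clamps the
# denominator at tau so near-zero-length jobs do not blow it up).

def response_time(wait, run):
    """Time from submission to completion."""
    return wait + run

def slowdown(wait, run):
    """Response time normalized by the job's run time."""
    return (wait + run) / run

def bounded_slowdown(wait, run, tau=10.0):
    """Slowdown with the run time bounded below by tau seconds."""
    return max(1.0, (wait + run) / max(run, tau))

# A 1-second job that waited 100 s has slowdown 101,
# but bounded slowdown (tau = 10) of only 10.1.
```

This is why the mean of plain slowdown over an asymmetric distribution can be dominated by a handful of very short jobs.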


Example: Bounded Slowdown


Example (continued)


Response Time


Bounded Slowdown


Step Five: Measurement

Never measure saturated workloads
When the arrival rate is higher than the service rate, queues grow to infinity; all metrics become meaningless
…but finding the saturation point can be tricky

Discard warm-up and cool-down results

May need to measure subgroups separately (long/short, day/night, weekday/weekend, …)

Measurements should still have enough data points for statistical meaning, especially workload length
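A minimal sketch of the measurement hygiene above: trim warm-up and cool-down before computing statistics, and flag saturated runs whose metrics would be meaningless. The 10% cutoff fractions and the job tuple layout are illustrative assumptions, not prescriptions from the talk.

```python
# Measurement-hygiene helpers (illustrative assumptions throughout).

def steady_state_slice(jobs, warmup_frac=0.1, cooldown_frac=0.1):
    """jobs: any list sorted by arrival time.  Drop the first and last
    fractions (warm-up and cool-down) and keep the middle for stats."""
    n = len(jobs)
    return jobs[int(n * warmup_frac): n - int(n * cooldown_frac)]

def is_saturated(jobs, total_nodes):
    """jobs: list of (arrival, run_time, nodes).  True if node-seconds
    arrive faster than the machine can serve them, i.e. the queue
    grows without bound in an open model."""
    horizon = max(a for a, _, _ in jobs)
    demand = sum(r * n for _, r, n in jobs)
    return demand > horizon * total_nodes
```

Checking subgroups (long/short, day/night) separately amounts to applying the same slicing to each subgroup's job list.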


Example: Saturation Point


Example: Shortest jobs CDF


Example: Longest jobs CDF


Conclusion

Parallel job scheduling evaluation is complex

…but we can avoid past mistakes

The paper can be used as a checklist when designing and executing evaluations

Additional information in the paper:
Pitfalls, examples, and scenarios
Suggestions on how to avoid pitfalls
Open research questions (for the next JSSPP?)
Many references to positive examples

Be cognizant when choosing your compromises


References

Workload archive:
http://www.cs.huji.ac.il/~feit/worklad
Contains several workload traces and models

Dror's publication page:
http://www.cs.huji.ac.il/~feit/pub.html

Eitan's publication page:
http://www.cs.huji.ac.il/~etcs/pubs

Email: [email protected]