Stochastic DAG Scheduling using Monte Carlo Approach
Heterogeneous Computing Workshop (at IPDPS) 2012. Extended version: Elsevier JPDC (accepted July 2013, in press).
Wei Zheng, Department of Computer Science, Xiamen University, Xiamen, China
Rizos Sakellariou, School of Computer Science, The University of Manchester, UK


Page 1: Stochastic DAG Scheduling using Monte Carlo Approach

Stochastic DAG Scheduling using Monte Carlo Approach

Heterogeneous Computing Workshop (at IPDPS) 2012

Extended version: Elsevier JPDC (accepted July 2013, in Press)

Wei Zheng, Department of Computer Science, Xiamen University, Xiamen, China

Rizos Sakellariou, School of Computer Science, The University of Manchester, UK

Page 2: Stochastic DAG Scheduling using Monte Carlo Approach

Previous Presentation (9/06/13)

• Research Area: Scheduling workflows in heterogeneous environments with variable performance.

DAG Scheduling

Static (full-ahead) Just In time Dynamic Rescheduling (runtime)

Page 3: Stochastic DAG Scheduling using Monte Carlo Approach

This Presentation

DAG Scheduling

Static (full-ahead) Just In time Dynamic Rescheduling (runtime)

Page 4: Stochastic DAG Scheduling using Monte Carlo Approach

Introduction

• General DAG scheduling assumption:
    • The estimated execution time of each task is known in advance.
    • Several estimation techniques exist, e.g. averaging over several runs.
    • Similarly, the estimated data transfer time is known in advance.
• A study* has shown that observed performance in Grids may deviate significantly from such estimates.
• Two approaches to address these deviations are prevalent:
    • Just-In-Time (high overhead)
    • RunTime (static schedule + runtime changes) (hypothesis**: might waste resources and increase makespan if the static schedule is not very good)

* A. Lastovetsky, J. Twamley, Towards a realistic performance model for networks of heterogeneous computers, in: M. Ng, A. Doncescu, L. Yang, T. Leng (Eds.), High Performance Computational Science and Engineering, in: IFIP International Federation for Information Processing, vol. 172, Springer, Boston, 2005, pp. 39–57.
** R. Sakellariou, H. Zhao, A low-cost rescheduling policy for efficient mapping of workflows on grid systems, Sci. Program. 12 (4) (2004) 253–262.

Page 5: Stochastic DAG Scheduling using Monte Carlo Approach

Problem Addressed

• Generate a better “static” schedule (one with a smaller makespan) based on a stochastic model of the variations in the performance (execution time) of the individual tasks in the graph.

Page 6: Stochastic DAG Scheduling using Monte Carlo Approach

Background and Related Work

• Heterogeneous Earliest Finish Time (HEFT) heuristic (discussed in the previous presentation)
    • List-based scheduling.
    • Prioritize tasks based on their “bLevel” (essentially, tasks on the critical path get higher priority).
    • Once a task is chosen, map it to the “best” available resource.

$$bLevel(i) = w_i + \max_{j \in Succ(i)} \left( w_{i \to j} + bLevel(j) \right)$$
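The bLevel recursion above can be sketched in a few lines of Python. This is only an illustration: it reads w_i as the cost of task i and w_{i→j} as the cost of the edge from i to j, and the four-task diamond DAG with its weights is hypothetical. HEFT is commonly described as using costs averaged over the resources for this ranking step; the sketch simply takes the weights as given.

```python
from functools import lru_cache

def b_levels(weights, comm, succ):
    """Return {task: bLevel} for a DAG.

    weights: task -> w_i (task cost)
    comm:    (i, j) -> w_{i->j} (edge cost)
    succ:    task -> list of successor tasks (missing key = exit task)
    """
    @lru_cache(maxsize=None)
    def b(i):
        # Longest remaining path below i; 0 for a task with no successors.
        tail = max((comm[(i, j)] + b(j) for j in succ.get(i, [])), default=0.0)
        return weights[i] + tail

    return {i: b(i) for i in weights}

# Hypothetical diamond-shaped DAG: A -> {B, C} -> D
weights = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 2.0}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
comm = {("A", "B"): 1.0, ("A", "C"): 4.0, ("B", "D"): 1.0, ("C", "D"): 1.0}

levels = b_levels(weights, comm, succ)
# List scheduling considers tasks in decreasing bLevel order.
priority = sorted(levels, key=levels.get, reverse=True)
print(levels)    # {'A': 10.0, 'B': 6.0, 'C': 4.0, 'D': 2.0}
print(priority)  # ['A', 'B', 'C', 'D']
```

Sorting by decreasing bLevel always places a task before its successors when task costs are positive, so the priority list is also a valid execution order.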

Page 7: Stochastic DAG Scheduling using Monte Carlo Approach

Problem Description

• $G = (N, E)$: DAG with one entry node and one exit node.
• $R$: set of heterogeneous resources.
• $ET_{i,p}$: random variable for the execution time of task $i \in N$ on resource $p \in R$.
• Assumption: network bandwidth is constant.
• $M$: makespan = finish time of the exit node.

Goal: Find a schedule Ω that minimizes the makespan (assign N to R, no overlap, no preemption, no migration).
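To make the makespan definition concrete, the following sketch computes the finish time of the exit node for one fixed schedule and one set of realized execution times. The function name `makespan`, the data layout, and the toy numbers are hypothetical. It also assumes that data transfer between two tasks on the same resource costs nothing, which is a common modelling choice but is not stated on the slide.

```python
def makespan(order, assign, exec_time, comm, preds):
    """Finish time of the last task for a fixed schedule.

    order:     tasks in execution order (must be a topological order)
    assign:    task -> resource
    exec_time: (task, resource) -> realized execution time
    comm:      (pred, task) -> transfer time (constant bandwidth)
    preds:     task -> list of predecessor tasks
    """
    finish = {}    # task -> finish time
    free_at = {}   # resource -> time at which it becomes idle
    for task in order:
        p = assign[task]
        # A task is ready once all inputs have arrived on its resource.
        # Assumption: no transfer cost when both tasks share a resource.
        ready = max(
            (finish[q] + (0.0 if assign[q] == p else comm[(q, task)])
             for q in preds.get(task, [])),
            default=0.0,
        )
        start = max(ready, free_at.get(p, 0.0))  # no overlap on a resource
        finish[task] = start + exec_time[(task, p)]  # no preemption
        free_at[p] = finish[task]
    # With a single exit node this equals the exit node's finish time.
    return max(finish.values())

# Hypothetical diamond DAG A -> {B, C} -> D on two resources
preds = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
comm = {("A", "B"): 1.0, ("A", "C"): 4.0, ("B", "D"): 1.0, ("C", "D"): 1.0}
assign = {"A": "r1", "B": "r1", "C": "r2", "D": "r1"}
exec_time = {("A", "r1"): 2.0, ("B", "r1"): 3.0, ("C", "r2"): 1.0, ("D", "r1"): 2.0}

m = makespan(["A", "B", "C", "D"], assign, exec_time, comm, preds)
print(m)  # 10.0
```

In this example C finishes at 7.0 on r2 (it waits for the 4.0 transfer from A), so D cannot start on r1 before 8.0 and the makespan is 10.0.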

Page 8: Stochastic DAG Scheduling using Monte Carlo Approach

Methodology

• Assumption: Analytical methods that solve the probabilistic optimization problem are too expensive.
• Use the Monte Carlo Sampling (MCS) method:
    1. Define a space comprising the possible input values: $I_G = \{ET_{i,p} : i \in N, p \in R\}$
    2. Take an independent random sample from the space: $P_G = f_{smp}(I_G) = \{t_{i,p} : i \in N, p \in R\}$
    3. Perform a deterministic computation using the sampled input (store the result): $\Omega_G = \mathrm{Static\_Scheduling}_{HEFT}(G, P_G)$
    4. Repeat steps 2 and 3 until some exit condition is met (number of repetitions).
    5. Aggregate the stored results of the individual computations into the final result.
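The steps above can be sketched as two loops: one that produces candidate schedules from sampled inputs, and one that picks among them. Everything in this sketch is an assumption made for illustration: the toy DAG and mean execution times, the Gaussian noise level `SIGMA`, the simplified greedy scheduler standing in for HEFT, and the aggregation rule (pick the candidate with the smallest average makespan over fresh samples). It is not the authors' implementation.

```python
import random
from statistics import mean

# Hypothetical toy instance: diamond DAG on two resources.
TASKS = ["A", "B", "C", "D"]  # listed in topological order
PREDS = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
COMM = {("A", "B"): 1.0, ("A", "C"): 1.0, ("B", "D"): 1.0, ("C", "D"): 1.0}
RESOURCES = ["r1", "r2"]
MEAN_ET = {("A", "r1"): 2.0, ("A", "r2"): 3.0, ("B", "r1"): 3.0, ("B", "r2"): 2.0,
           ("C", "r1"): 4.0, ("C", "r2"): 2.0, ("D", "r1"): 2.0, ("D", "r2"): 2.5}
SIGMA = 0.3  # assumed relative standard deviation of each ET_{i,p}

def sample(rng):
    """Step 2: draw one realization t_{i,p} for every ET_{i,p}."""
    return {k: max(0.01, rng.gauss(m, SIGMA * m)) for k, m in MEAN_ET.items()}

def finish_on(task, p, et, assign, finish, free_at):
    """Finish time of `task` if placed on resource `p` after earlier placements."""
    ready = max((finish[q] + (0.0 if assign[q] == p else COMM[(q, task)])
                 for q in PREDS.get(task, [])), default=0.0)
    return max(ready, free_at.get(p, 0.0)) + et[(task, p)]

def greedy_schedule(et):
    """Step 3 stand-in for HEFT: fixed task order, earliest-finish resource."""
    assign, finish, free_at = {}, {}, {}
    for t in TASKS:
        f, p = min((finish_on(t, r, et, assign, finish, free_at), r) for r in RESOURCES)
        assign[t], finish[t], free_at[p] = p, f, f
    return assign

def simulate(assign, et):
    """Makespan of a fixed mapping under one realization of execution times."""
    finish, free_at = {}, {}
    for t in TASKS:
        p = assign[t]
        finish[t] = finish_on(t, p, et, assign, finish, free_at)
        free_at[p] = finish[t]
    return max(finish.values())

def mcs_schedule(production=200, selection=50, seed=0):
    rng = random.Random(seed)
    # Steps 2-4: schedule many sampled inputs, keep the distinct schedules.
    candidates = {}
    for _ in range(production):
        assign = greedy_schedule(sample(rng))
        candidates[tuple(sorted(assign.items()))] = assign
    # Step 5 (assumed rule): score every candidate on the same fresh samples.
    trials = [sample(rng) for _ in range(selection)]
    scores = {key: mean(simulate(a, et) for et in trials)
              for key, a in candidates.items()}
    best = min(scores, key=scores.get)
    return candidates[best], scores[best], len(candidates)

best_assign, avg_makespan, n_candidates = mcs_schedule()
print(best_assign, round(avg_makespan, 2), n_candidates)
```

Scoring all candidates on the same set of trial samples keeps the comparison fair: differences in average makespan then come from the schedules, not from sampling luck.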

Page 9: Stochastic DAG Scheduling using Monte Carlo Approach

MCS Based Scheduling

Complexity:
• Depends on the deterministic scheduling algorithm
• For HEFT it is O(v + e * r) = O(e * r)
• First loop: O(e * r * m)
• Second loop: O(e * n * k)
• Total = O(e * r * m + e * n * k)

Page 10: Stochastic DAG Scheduling using Monte Carlo Approach

Example

Page 11: Stochastic DAG Scheduling using Monte Carlo Approach

Example

10,000 iterations - production phase (Gaussian Distribution)

200 iterations - selection phase

20% reduction in makespan

Absolute increase in algorithm time: 1.2s

Page 12: Stochastic DAG Scheduling using Monte Carlo Approach

Evaluation

• Graphs

Page 13: Stochastic DAG Scheduling using Monte Carlo Approach

Threshold Calculation

Page 14: Stochastic DAG Scheduling using Monte Carlo Approach

Convergence (no. of repetitions)

Page 15: Stochastic DAG Scheduling using Monte Carlo Approach

Convergence

Page 16: Stochastic DAG Scheduling using Monte Carlo Approach

Makespan performance evaluation

• Static HEFT (baseline) with mean ET values
• Autopsy: static HEFT with known ET values
• MCS - Static
• ReStatic
• ReMCS

• Graph generation (random generator of a given type)
• Task execution time for different runs:
    • Select a “mean” for each task.
    • Use a probability distribution to select the actual execution time. The variation is bounded by the Quality of Estimation (QoE) (0 < QoE < 1).
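One possible reading of the QoE bound is sketched below. The slide does not say how QoE limits the variation or which distribution is used, so both choices here are assumptions: the actual time is taken to lie within mean × (1 ± QoE), and it is drawn uniformly from that interval. The function name `actual_time` is hypothetical.

```python
import random

def actual_time(mean_et, qoe, rng):
    """Draw an actual execution time around `mean_et`.

    Assumption: QoE is a relative error bound, so the result lies in
    [mean_et * (1 - qoe), mean_et * (1 + qoe)]. A uniform draw is used
    only for simplicity; any bounded distribution could replace it.
    """
    if not 0.0 < qoe < 1.0:
        raise ValueError("QoE must satisfy 0 < QoE < 1")
    return rng.uniform(mean_et * (1.0 - qoe), mean_et * (1.0 + qoe))

rng = random.Random(0)
# Five hypothetical runs of a task with mean 10.0 and QoE = 0.3:
runs = [actual_time(10.0, 0.3, rng) for _ in range(5)]
print(runs)  # every value lies between 7.0 and 13.0
```

Under this reading a smaller QoE means more accurate estimates, since the actual time stays closer to the mean.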

Page 17: Stochastic DAG Scheduling using Monte Carlo Approach

Makespan performance evaluation

Page 18: Stochastic DAG Scheduling using Monte Carlo Approach

Summary

• It is possible to obtain a good full-ahead static schedule that performs well under prediction inaccuracy, without too much overhead.
• MCS, which has a more robust procedure for selecting an initial schedule, generally results in better performance when rescheduling is applied.