
Job Scheduling for Grid Computing on Metacomputers


Page 1: Job Scheduling for Grid Computing on Metacomputers


Job Scheduling for Grid Computing on Metacomputers

Keqin Li

Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS’05)

Page 2: Job Scheduling for Grid Computing on Metacomputers


Outline

Introduction
The Scheduling Model
A Communication Cost Model
Scheduling Algorithms
Worst-Case Performance Analysis
Experimental Data

Page 3: Job Scheduling for Grid Computing on Metacomputers


Introduction 1

A metacomputer is a network of computational resources linked by software in such a way that they can be used as easily as a single computer.

A metacomputer is able to support distributed supercomputing applications by combining multiple high-speed high-capacity resources on a computational grid into a single, virtual distributed supercomputer.

Page 4: Job Scheduling for Grid Computing on Metacomputers


Introduction 2

The most significant result of the paper is that, with any initial order of jobs and any processor allocation algorithm, the list scheduling algorithm can achieve the worst-case performance bound stated in the theorem on Page 26.

Notation:

p is the maximum size of an individual machine;

P is the total size of a metacomputer;

s is the minimum job size, with s ≥ p;

α is the ratio of the communication bandwidth within a parallel machine to the communication bandwidth of the network;

β is the fraction of the communication time in the jobs.

Page 5: Job Scheduling for Grid Computing on Metacomputers


The Scheduling Model

Page 6: Job Scheduling for Grid Computing on Metacomputers


A metacomputer is specified as M = (P1, P2, ..., Pm), where Pj , 1 ≤ j ≤ m, is the name as well as the size (i.e., the number of processors) of a parallel machine.

Let P = P1 + P2 + … + Pm denote the total number of processors. The m machines are connected by a LAN, MAN, WAN, or the Internet.

A job J is specified as (s, t), where s is the size of J (i.e., the number of processors required to execute J) and t is J’s execution time. The cost of J is the product st.

Given a metacomputer M and a list of jobs L = (J1, J2, ..., Jn), where Ji = (si, ti), 1 ≤ i ≤ n, we are interested in scheduling the n jobs on M.
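The model translates directly into code. The machine sizes and jobs below are hypothetical values chosen only for illustration, not data from the paper:

```python
from dataclasses import dataclass

# Hypothetical metacomputer M = (P1, P2, P3): machine sizes in processors.
M = [16, 8, 4]
P = sum(M)  # total number of processors

@dataclass(frozen=True)
class Job:
    s: int    # size: number of processors required
    t: float  # execution time

    @property
    def cost(self) -> float:
        # The cost of a job J = (s, t) is the product s*t.
        return self.s * self.t

# A job list L = (J1, ..., Jn), also hypothetical.
L = [Job(8, 3.0), Job(12, 2.0), Job(5, 4.0)]
```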

Page 7: Job Scheduling for Grid Computing on Metacomputers


A schedule of a job Ji = (si, ti) is ψi = (τi, (Pj1, si,1), (Pj2, si,2), ..., (Pjri, si,ri)), where

τi is the starting time of Ji;

Ji is divided into ri subjobs Ji,1, Ji,2, ..., Ji,ri, of sizes si,1, si,2, ..., si,ri, respectively, with si = si,1 + si,2 + … + si,ri;

the subjob Ji,k is executed on Pjk by using si,k processors, for all 1 ≤ k ≤ ri.

Page 8: Job Scheduling for Grid Computing on Metacomputers


A Communication Cost Model

Page 9: Job Scheduling for Grid Computing on Metacomputers


si processors allocated to Ji communicate with each other during the execution of Ji.

Communication time between two processors residing on different machines connected by a LAN, MAN, WAN, or the Internet is significantly longer than that on the same machine.

The communication cost model takes both inter-machine and intra-machine communications into consideration.

The execution time ti is divided into two components: ti = ti,comp + ti,comm.

Each processor on Pjk needs to communicate with the si,k processors on Pjk and the si − si,k processors on Pjk′ with k′ ≠ k. The effective execution time t*i,k of the subjob Ji,k on Pjk is defined accordingly.

Page 10: Job Scheduling for Grid Computing on Metacomputers


Page 11: Job Scheduling for Grid Computing on Metacomputers


The execution time of job Ji is t*i = max(t*i,1, t*i,2, …, t*i,ri); we call t*i the effective execution time of job Ji.

The above measure of extra communication time among processors on different machines discourages division of a job into small subjobs.
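The defining equation for t*i,k survives only as an image in the slides, so the sketch below uses one plausible instantiation, assumed here, that is consistent with the stated ingredients: a processor's communication directed at the si − si,k off-machine processors is slowed by the bandwidth ratio α, while its locally directed share is unaffected.

```python
def effective_subjob_time(t, s, s_k, alpha, beta):
    """t*_{i,k} for a subjob of size s_k of a job (s, t), under an
    ASSUMED instantiation of the model: the communication component
    beta*t is split pro rata between the s_k - 1 local partners and
    the s - s_k remote partners, and the remote share is inflated by
    the bandwidth ratio alpha >= 1."""
    if s == 1:
        return t                       # no communication partners
    t_comm = beta * t                  # communication component t_{i,comm}
    t_comp = t - t_comm                # computation component t_{i,comp}
    local, remote = s_k - 1, s - s_k
    return t_comp + t_comm * (local + alpha * remote) / (s - 1)

def effective_time(t, s, split, alpha, beta):
    """t*_i = max(t*_{i,1}, ..., t*_{i,r_i}) over the subjob sizes."""
    return max(effective_subjob_time(t, s, s_k, alpha, beta) for s_k in split)
```

Under this instantiation, α = 1 makes any split free (t*i = ti), matching the later remark that α = 1 reduces the problem to ordinary multiprocessor scheduling, while any split with α > 1 and β > 0 strictly increases t*i, which is exactly the stated disincentive to divide a job.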

Page 12: Job Scheduling for Grid Computing on Metacomputers


Our job scheduling problem for grid computing on metacomputers can be formally defined as follows:

given a metacomputer M = (P1, P2, ..., Pm) and a list of jobs

L = (J1, J2, ..., Jn), where Ji = (si, ti), 1 ≤ i ≤ n, find a schedule ψ of L, ψ = (ψ1, ψ2, ..., ψn), with ψi = (τi, (Pj1, si,1), (Pj2, si,2), ..., (Pjri, si,ri )),

where Ji is executed during the time interval [τi, τi + t*i] by using si,k processors on Pjk for all 1 ≤ k ≤ ri, such that the total execution time of L on M, namely max1≤i≤n (τi + t*i), is minimized.
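The objective (the makespan of the schedule) is a one-liner, assuming each scheduled job is represented by its (τi, t*i) pair:

```python
def makespan(schedule):
    """Total execution time of L on M: the latest completion time
    max_i (tau_i + t*_i), where `schedule` is a list of
    (tau_i, t_star_i) pairs (starting time, effective execution time)."""
    return max(tau + t_star for tau, t_star in schedule)
```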

Page 13: Job Scheduling for Grid Computing on Metacomputers


When α = 1, that is, extra communication time over a LAN, MAN, WAN, or the Internet is not a concern, the above scheduling problem is equivalent to the problem of scheduling independent parallel tasks in multiprocessors, which is NP-hard even when all tasks are sequential.

Page 14: Job Scheduling for Grid Computing on Metacomputers


Scheduling Algorithms

Page 15: Job Scheduling for Grid Computing on Metacomputers


A complete description of the list scheduling (LS) algorithm is given in the next slide.

There is a choice of the initial order of the jobs in L. Four ordering strategies:

Largest Job First (LJF) – Jobs are arranged such that s1 ≥ s2 ≥ … ≥ sn.

Longest Time First (LTF) – Jobs are arranged such that t1 ≥ t2 ≥ … ≥ tn.

Largest Cost First (LCF) – Jobs are arranged such that s1t1 ≥ s2t2 ≥ … ≥ sntn.

Unordered (U) – Jobs are arranged in any order.
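The four strategies are simple sort keys over the (s, t) pairs:

```python
def order_jobs(jobs, strategy):
    """jobs: list of (s, t) pairs.  Returns a new list ordered per strategy."""
    if strategy == "LJF":   # Largest Job First: s1 >= s2 >= ...
        return sorted(jobs, key=lambda j: j[0], reverse=True)
    if strategy == "LTF":   # Longest Time First: t1 >= t2 >= ...
        return sorted(jobs, key=lambda j: j[1], reverse=True)
    if strategy == "LCF":   # Largest Cost First: s1*t1 >= s2*t2 >= ...
        return sorted(jobs, key=lambda j: j[0] * j[1], reverse=True)
    return list(jobs)       # "U": unordered, keep the given order
```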

Page 16: Job Scheduling for Grid Computing on Metacomputers


The number of available processors P’j on machine Pj is dynamically maintained. The total number of available processors is P’ = P’1 + P’2 + · · · + P’m
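The full LS pseudocode referenced above does not survive in this transcript, so the following is a minimal Graham-style sketch under assumed details: jobs are taken in list order, each job starts as soon as enough processors are free, each job's effective execution time is taken as given, and `naive_allocate` is a stand-in for the Naive allocation algorithm mentioned later.

```python
import heapq

def naive_allocate(avail, s):
    """Naive allocation: take available processors from machines in
    index order until the job's size s is covered."""
    alloc, need = {}, s
    for j, free in enumerate(avail):
        take = min(free, need)
        if take > 0:
            alloc[j] = take
            need -= take
        if need == 0:
            break
    return alloc

def list_schedule(machines, jobs, allocate=naive_allocate):
    """List scheduling sketch.  machines: sizes Pj; jobs: (s, t_eff)
    pairs in list order.  The per-machine available counts P'_j are
    maintained dynamically as jobs start and finish."""
    avail = list(machines)            # P'_j
    events = []                       # running jobs: (finish_time, i, alloc)
    now = 0.0
    finish = 0.0
    for i, (s, t) in enumerate(jobs):
        while sum(avail) < s:         # wait for running jobs to finish
            now, _, done = heapq.heappop(events)
            for j, k in done.items():
                avail[j] += k
        alloc = allocate(avail, s)    # split the job across machines
        for j, k in alloc.items():
            avail[j] -= k
        heapq.heappush(events, (now + t, i, alloc))
        finish = max(finish, now + t)
    return finish
```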

Page 17: Job Scheduling for Grid Computing on Metacomputers


Page 18: Job Scheduling for Grid Computing on Metacomputers


Page 19: Job Scheduling for Grid Computing on Metacomputers


Each job scheduling algorithm needs to use a processor allocation algorithm to find resources in a metacomputer.

Several processor allocation algorithms have been proposed, including Naive, LMF (largest machine first), SMF (smallest machine first), and MEET (minimum effective execution time).
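The slides name these algorithms without reproducing their definitions, so the sketch below assumes LMF and SMF rank machines by their currently available processors; MEET, which would instead search for the split minimizing the effective execution time t*i, is not sketched.

```python
def allocate_ordered(avail, s, order):
    """Take processors machine-by-machine following a preference order
    (a list of machine indices) until size s is covered."""
    alloc, need = {}, s
    for j in order:
        take = min(avail[j], need)
        if take > 0:
            alloc[j] = take
            need -= take
        if need == 0:
            break
    return alloc

def smf(avail, s):
    """Smallest machine first: prefer machines with fewer free processors."""
    return allocate_ordered(avail, s, sorted(range(len(avail)), key=lambda j: avail[j]))

def lmf(avail, s):
    """Largest machine first: prefer machines with more free processors."""
    return allocate_ordered(avail, s, sorted(range(len(avail)), key=lambda j: -avail[j]))
```

LMF tends to keep a job on few machines (less inter-machine communication), while SMF preserves large free blocks for later jobs; the paper's experiments compare these trade-offs.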

Page 20: Job Scheduling for Grid Computing on Metacomputers


Worst-Case Performance Analysis

Page 21: Job Scheduling for Grid Computing on Metacomputers


Let A(L) be the length of a schedule produced by algorithm A for a list L of jobs, and OPT(L) the length of an optimal schedule of L. We say that algorithm A achieves worst-case performance bound B if A(L)/OPT(L) ≤ B for all L.

Page 22: Job Scheduling for Grid Computing on Metacomputers


Let t*i,LS be the effective execution time of a job Ji in an LS schedule. Assume that all the n jobs are executed during the time interval [0, LS(L)]. Let Ji be a job which finishes at time LS(L). Clearly, before Ji is scheduled at time LS(L) − t*i,LS, there are no si processors available; otherwise, Ji would have been scheduled earlier. That is, during the time interval [0, LS(L) − t*i,LS], the number of busy processors is at least P − si + 1, and during the time interval [LS(L) − t*i,LS, LS(L)], the number of busy processors is at least si. Define the effective cost of L in an LS schedule as C*LS = s1t*1,LS + s2t*2,LS + … + snt*n,LS. Then, we have
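Reconstructing the missing equations from the surrounding prose (a standard Graham-style step, not the slides' own rendering): the two busy-period counts give a lower bound on the effective cost, which can then be solved for LS(L).

```latex
C^{*}_{LS} \;=\; \sum_{i=1}^{n} s_i\, t^{*}_{i,LS}
\;\ge\; (P - s_i + 1)\bigl(LS(L) - t^{*}_{i,LS}\bigr) \;+\; s_i\, t^{*}_{i,LS}
\quad\Longrightarrow\quad
LS(L) \;\le\; \frac{C^{*}_{LS} - s_i\, t^{*}_{i,LS}}{P - s_i + 1} \;+\; t^{*}_{i,LS}.
```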

Page 23: Job Scheduling for Grid Computing on Metacomputers


No matter which processor allocation algorithm is used, we always have

The effective execution time of Ji in an optimal schedule is

Thus, we get

where

It is clear that φi is an increasing function of si, which is minimized when si = s. Hence, we have

where

Page 24: Job Scheduling for Grid Computing on Metacomputers


Combining the preceding relations yields a chain of inequalities. The right-hand side of the final inequality is minimized at a particular choice of the parameters.

Page 25: Job Scheduling for Grid Computing on Metacomputers


The right-hand side of the above inequality is a decreasing function of si, and is therefore maximized when si = s.

Page 26: Job Scheduling for Grid Computing on Metacomputers


Theorem. If Pj ≤ p for all 1 ≤ j ≤ m, and si ≥ s for all 1 ≤ i ≤ n, where p ≤ s, then algorithm LS can achieve worst-case performance bound

where

The above performance bound is independent of the initial order of L and the processor allocation algorithm.

Page 27: Job Scheduling for Grid Computing on Metacomputers


Corollary. If a metacomputer only contains sequential machines, i.e., p = 1, communication heterogeneity vanishes and the worst-case performance bound in the theorem becomes

Page 28: Job Scheduling for Grid Computing on Metacomputers
