44
1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

Embed Size (px)

Citation preview

Page 1: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

1

Scheduling

CEG 4131 Computer Architecture IIIMiodrag Bolic

Slides developed by Dr. Hesham El-Rewini

Copyright Hesham El-Rewini

Page 2: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

2

Outline

• Scheduling models• Scheduling without considering communication• Including communication in scheduling• Heuristic algorithms

Page 3: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

3

Partitioner

Grains of Sequential Code

Parallel/Distributed System

Parallel Program Tasks

Scheduler

Schedule

Processors

Time

Program Tasks

Sequential Program

Explicit ApproachImplicit Approach

Dependence Analyzer

Ideal Parallelism

Scheduling Parallel Tasks

Page 4: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

4

Program Tasks

Task Notation: (T, <, D, A)

• T set of tasks

• < partial order on T

• D Communication Data

• A amount of computation

Page 5: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

F

20

A

5

Task Graph

10

D

15

E

10

B

15

C

10

G

15

H

15

I

30

558 7

5

5 5

10

5

4 5 4

20

Task

Amount of Computation

Communication Data

Dependency

Page 6: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

6

Machine

• m heterogeneous processors

• Connected via an arbitrary interconnection network (network graph)

• Associated with each processor Pi is its speed Si

• Associated with each edge (i,j) is the transfer rate Rij

Page 7: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

7

Task Schedule

• Gantt Chart

• Mapping (f) of tasks to a processing element and a starting time

• Formally: f(v) = (i,t) task v is scheduled to be processed by processor i starting at time t

Page 8: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

8

Gantt Chart

Processor

Time

1 2 3 4 5 0

1

2

3

4

Stop

Start

W-1 W-2 W-3 W-4 W-5

Page 9: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

10

Execution and Communication Times

• If task ti is executed on pj

Execution time = Ai/Sj

• The communication delay between ti and tj, when executed on adjacent processing elements pk and pl is

Dij/Rkl

Page 10: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

11

Complexity

• Computationally intractable in general

• Small number of polynomial optimal algorithms in restricted cases

• A large number of heuristics in more general cases

• Quality of the scheduleschedule vs. Quality of the schedulerscheduler

Page 11: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

12

Scheduling Task Graphs without considering communication

• Polynomial-Time Optimal Algorithms in the following cases:

1. Task graph is in-forest: each node has at most one immediate successor, or out-forest: each node has at most one immediate predecessor

2. Task graph is an interval order

Page 12: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

In-Forest vs. Out-Forest Structure

In-Forest Out-Forest

13

Page 13: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

14

Assumptions

• A task graph consisting of n tasks• A distributed system made up of m processors• The execution time of each task is one unit of time• Communication between any pair of tasks is zero• The goal is to find an optimal schedule, which

minimizes the completion time

Page 14: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

15

List Scheduling

• All considered algorithms belong to the list scheduling class.

• Each task is assigned a priority, and a list of tasks is constructed in a decreasing priority order.

• A task becomes ready for execution when its immediate predecessors in the task graph have already been executed or if it does not have any predecessors.

Page 15: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

16

Scheduling Inforest/Outforest task graphs

1. The level of each node in the task graph is calculated as given above and used as each node’s priority

2. Whenever a processor becomes available, assign it the unexecuted ready task with the highest priority

Page 16: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

17

Example 1: Simple List Scheduling

10

14 13

12

11

9

8 7

5

6

4

2

3

1

P1 P2 P3

1

2 3 4

5 6 7

8 9 10 11 12

13 14 Level 5

Level 4

Level 3

Level 2

Level 11

1 1 1

1 1 1

1 1 1 1 1

1 1

Scheduling

Page 17: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

Example 2: Simple List Scheduling

Task Priority

A 5

B 5

C 5

D 4

E 4

F 4

G 4

H 3

I 3

J 3

K 2

L 2

M 118

A B C

D E F

H I J

K L

M

G

t Processors

0 P1 P2 P3 P4

1 A B C E

2 D F G H

3 I J L

4 K

5 M

Priority Assignment

Scheduling

Page 18: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

C D E

F G H

I J

K L

M

Priority Assignment

SchedulingA B

Example 3: Simple List Scheduling

19

Page 19: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

20

Interval Orders

• A task graph is an interval order when its nodes can be mapped into intervals on the real line, and two elements are related iff the corresponding intervals do not overlap.

• For any interval ordered pair of nodes u and v, either the successors of u are also successors of v or the successors of v are also successors of u.

Page 20: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

21

Scheduling interval ordered tasks

1. The number of successors of each node is used as each node’s priority

2. Whenever a processor becomes available, assign it the unexecuted ready task with the highest priority

Page 21: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

22

Example 1: Scheduling Interval Ordered tasks

0 0 0

3 2 1

4 5 61

9

1 1

1 1 1

1 1 1

1 2 3

654

7 8

Time P1 P2

123

4 5 6

87 9

P30

1

2

3

Page 22: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

Example 2: Scheduling Interval Ordered tasks

23

Task Priority

A 8

B 6

C 5

D 5

E 4

F 1

G 3

H 0

I 0

J 0

23

A B

C D E

F G

I JH

t Processors

0 P1 P2 P3

1 A B

2 C D E

3 G F

4 H I J

Priority Assignment

Scheduling

Page 23: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

Example 3: Scheduling Interval Ordered tasks

2424

A B

C D E

G

K LH

Priority Assignment

Scheduling

F

I JH

Page 24: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

25

Communication Models

• Completion Time– Execution time– Communication time

• Completion Time as 2 Components• Completion Time from the Gantt Chart

Page 25: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

26

Completion Time as 2 Components

• Completion Time = Execution Time + Total Communication Delay

• Total Communication Delay = Number of communication messages * delay per message

• Execution time maximum finishing time of any task

• Number of communication messages – Model A– Model B

Page 26: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

27

Completion Time from the Gantt Chart (Model C)• Completion Time = Schedule Length

• This model assumes the existence of an I/O processor with every processor in the system

• Communication delay between two tasks allocated to the same processor is negligible.

• Communication delay is counted only between two tasks assigned to different processors

Page 27: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

28

Example

A

1

D

1

E

1

B

1

C

1

Assume a system with 2 processors

Page 28: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

29

Models A and B

• Assume tasks A, B, and D are assigned to P1 and tasks C and E are assigned to p2

A

B

D

P1

C

E

P2

Model A

Number of messages = 2

Completion time = 3 + 2

Model B

Number of messages = 1

Completion time = 3 + 1

A

1

D

1

E

1

B

1

C

1

Page 29: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

30

Model C

A

B

C D

E

Communication Delay

P1 P20

1

2

3

4

A

1

D

1

E

1

B

1

C

1

Page 30: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

31

A

4

D

5

E

3

B

9

C

7

L

1

M

1

F

1

G

1

I

1

H

1

K

1

J

1

Processors

P1 P2 P3

A

B C D

E H J

F L K

G M

H I

Model A B Task

Assignment

Processors

P1 P2 P3

A

B

B C D

E H J

F K

G L

H M

I

ModelC Task

Assignment

Model ANumber of Messages = 2 + 2Completion time = 3 + (2*4 + 2*3) = 17

Model BNumber of Messages = 2 + 1 = 3 Completion time = 3 + (2*4 + 1*3) = 14

Model CCompletion time = 8

Communication delay is displayed in the graph for A & B.Assume execution time of a task is 1.

(assume all communication delay is 1 for simplicity)

Models A,B,C Example

Page 31: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

32

Heuristics

A heuristic produces an answer in less than exponential time, but does not guarantee an optimal solution.

•Communication delay versus parallelism•Clustering•Duplication

Page 32: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

33

a

b c

a

b

c

a

b

c

c

P1 P2 P1 P2

10

25

0

10

25

0

40

15

35c

40

50

x

x

x

Task Graph

Gantt Chart-2Gantt Chart-1

Task Exceution time

abc

101515

y

Arc Communication

(a,b) y

(a,c) x < y

x = 5 x = 25

30

Time Time

Communication Delay versus Parallelism

Page 33: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

34

Clustering

a

bc

g

d e

f

a

bc

g

d e

f

a

bc

g

d e

f

(a)(b)

(c)

a

bc

g

d e

f

(d)

Page 34: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

Clustering Example 1 Part 1

35

A

B

C

ED

F

G

4 3

2

1.5

2

1.5

5

1

1

1

1

2

1

1

Time P1 P2

1 A

2

B

3 C

4 D

5

6 E

7

8 F

9

10 G

Task Assignment

1

Communication DelayNOP

Page 35: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

Clustering Example 1 Part 2

36

A

B

C

ED

F

G

4 3

2

1.5

2

1.5

5

1

1

1

1

2

1

1

Time P1 P2

1 A

2

B3 C

4 D

5

6E7

8 F

9 G

Task Assignment

1

Communication DelayNOP

Page 36: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

37

Clustering Example 2

37

A

B D

FE

G

H

4 3

2

22

5

3

2

1

1

2

3

1

Time P1 P2

1 A

2

B3

4

5 D

6 D

7

C

E

8 E

9 F

10 F

11 G

12

13 H

Task Assignment

C

2

1

5

2

1

Communication DelayNOP

Page 37: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

38

a

b c

a

b c

a

b

c

P1 P2 P1 P2

xx

Task Graph

x

a

No Duplication Task a is Duplicated

Duplications

Page 38: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

Duplication Example (Using Clustering Example 1 Part 2)

39

A

B

C

ED

F

G

4 3

2

1.5

2

1.5

5

1

1

1

1

2

1

1

Time P1 P2

1 A A

2

B

C

3 D

4

5E6

7 F

8 G

Task Assignment

1

Communication DelayNOP

Page 39: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

40

Scheduling and grain packing• Four major steps are involved in the grain determination and the process of

scheduling optimization:

– Step 1. Construct a fine-grain program graph.

– Step 2. Schedule the fine-grain computation.

– Step 3. Grain packing to produce the coarse grains.

– Step 4. Generate a parallel schedule based on the packed graph.

Page 40: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

41

Program decomposition for static multiprocessor scheduling

• two 2 x 2 matrices A and B are multiplied to compute the sum of the four elements in the resulting product matrix C = A x B. There are eight multiplications and seven additions to be performed in this program, as written below:

2221

1211

2221

1211

2221

1211

C C

C C

B B

B B

A A

A A

Page 41: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

42

Example 2.5 Ctd’– C11 = A11 B11 + A12 B21

– C12 = A11 B12 + A12 B22

– C21 = A21 B11 + A22 B21

– C22 = A21 B11 + A22 B22

– Sum = C11 + C12 + C21 + C22

2221

1211

2221

1211

2221

1211

C C

C C

B B

B B

A A

A A

Page 42: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

43

Page 43: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

44

Page 44: 1 Scheduling CEG 4131 Computer Architecture III Miodrag Bolic Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini

45