Robust Resource Allocation in Parallel and Distributed Computing Systems (tentative) Ph.D. candidate V. Shestak Colorado State University Electrical and Computer Engineering Department Fort Collins, Colorado, USA [email protected]


Robust Resource Allocation in Parallel and Distributed Computing Systems (tentative)

Ph.D. candidate V. Shestak

Colorado State University
Electrical and Computer Engineering Department
Fort Collins, Colorado, USA
[email protected]


V. Shestak: Progress Toward Ph.D.

• start: August 2003
• research completed: 75% (3 parts out of 4)
• publications:
  - 10 accepted (9 conferences, one journal)
  - one under review (journal)
  - one draft in preparation (journal)
• patents: one filed, two in process
• graduation: December 2007


Outline

• part 1: two-stage approach to resource allocation for periodic strings of applications
• part 2: resource allocation in IBM cluster-based printing system
• part 3: stochastic robustness metric and its use for static resource allocations
• part 4: robust resource allocation under random node failures and recoveries – in progress


PART 1: Shipboard Computing Environment

• computation resources
  - heterogeneous set of machines
  - multitasking enabled
• communication network
  - independent virtual point-to-point communication routes
  - fixed available bandwidth on each route
• resource mapper
  - centralized approach
  - initial static resource allocation
  - robust against increases in workload


PART 1: Workload

• periodic, continuously running applications organized in strings
• string QoS constraints
  - throughput = 1/P (where P is the time interval between input arrivals)
  - end-to-end latency L

[Figure: timeline of one string of n applications, labeled with the quantities tc[1], tt[1], ..., tt[n−1], tc[n] and the bounds ≤ P and ≤ L]

• strings have priority factors


PART 1: Performance Goal for Initial Allocation

• primary objective: maximize the sum of priority factors of strings allocated in the system
• secondary objective: maximize system slackness
  - system slackness is the minimum unused utilization across all machines and communication routes in the system
  - system slackness quantitatively reflects the system’s potential to absorb unpredictable increases in workload
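The slackness definition above can be illustrated with a minimal Python sketch. It assumes utilizations are given as fractions in [0, 1]; the function name `system_slackness` and its two arguments are hypothetical and do not come from the slides.

```python
def system_slackness(machine_util, route_util):
    """Return the minimum unused utilization over all machines and routes.

    machine_util, route_util: iterables of utilizations in [0, 1]
    (assumed representation; the slides do not specify one).
    """
    all_util = list(machine_util) + list(route_util)
    if not all_util:
        raise ValueError("at least one machine or route is required")
    # Unused utilization of a resource is 1 - utilization; the system
    # slackness is set by the most heavily loaded resource.
    return min(1.0 - u for u in all_util)


if __name__ == "__main__":
    # Two machines and one communication route.
    print(system_slackness([0.5, 0.8], [0.3]))
```

Under this definition a single heavily loaded machine or route determines the slackness of the whole system, which is why it serves as a measure of how much additional workload can be absorbed.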


PART 1: Resource Utilization

[Figure: resource utilization illustration; only the labels "a" and "b" survive]


PART 1: Two-Stage Solution Approach

• first stage: Genitor-based global search algorithm coupled with low-level greedy heuristic
  - global search algorithm operates in the permutation space
  - greedy heuristic maps chromosomes into the solution space
• second stage: Branch-and-Bound depth-first search algorithm
  - Integer Linear Programming (ILP) formulation
  - continuous lower bound tightening over time
• solution passed
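To make the split between permutation space and solution space concrete, here is a deliberately simplified sketch of a greedy decoder. It is not the heuristic from the cited papers: it assumes each string is a single unit with one utilization demand, ignores communication routes and latency, and all names (`decode`, `demand`, `priority`) are hypothetical.

```python
def decode(permutation, demand, priority, n_machines):
    """Greedily map a permutation of string indices to machines.

    Strings are considered in permutation order; each one is placed on the
    currently least-loaded machine if it fits (load <= 1.0), else skipped.
    Returns (mapping, sum of priorities of placed strings, slackness).
    """
    load = [0.0] * n_machines
    mapping = {}
    for s in permutation:
        m = min(range(n_machines), key=lambda k: load[k])
        if load[m] + demand[s] <= 1.0:
            load[m] += demand[s]
            mapping[s] = m
    total_priority = sum(priority[s] for s in mapping)
    slackness = 1.0 - max(load)
    return mapping, total_priority, slackness


if __name__ == "__main__":
    demand = [0.6, 0.6, 0.5]
    priority = [1, 4, 2]
    # Different orderings of the same strings decode to different allocations.
    print(decode([0, 1, 2], demand, priority, n_machines=2))
    print(decode([2, 1, 0], demand, priority, n_machines=2))
```

A global search such as Genitor would then evolve the permutation and use the decoded priority sum (and slackness as a tie-breaker) as fitness, so the search never has to handle infeasible allocations directly.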


PART 1: Results – 1 Trial


PART 1: Results – 50 Trials


PART 1: References

• V. Shestak, E. K. P. Chong, A. A. Maciejewski, H. J. Siegel, L. Benmohamed, I-J. Wang, and R. Daley, “Resource allocation for periodic applications in a shipboard environment,” 14th Heterogeneous Computing Workshop (HCW 2005), in Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), Apr. 2005, pp. 122–127.
• V. Shestak, E. K. P. Chong, A. A. Maciejewski, H. J. Siegel, L. Benmohamed, I-J. Wang, and R. Daley, “A two-stage approach to resource allocation for periodic strings of applications in a shipboard environment,” submitted to Journal of Parallel and Distributed Computing (JPDC). Under review.


Outline

• part 1: two-stage approach to resource allocation for periodic strings of applications
• part 2: resource allocation in IBM cluster-based printing system
• part 3: stochastic robustness metric and its use for static resource allocations
• part 4: robust resource allocation under random node failures and recoveries – in progress


PART 2: IBM Printer System Layout

• processing must be done in distributed fashion
• printheads consume bitmaps in page order


PART 2: Goals for Cluster Controller Project

• algorithm for assigning sheetsides to blades
  - mathematical model of the environment
  - optimized sheetside workload distribution algorithm
• system performance simulation
  - evaluate algorithm’s efficiency
  - determine cost-effective system configuration
    · minimize number of blades
    · minimize memory sizes


PART 2: IBM Cluster Controller Project: Results

[Figure: bitmap lifetime (sec.), i.e., how long a bitmap exists in the system, compared for three policies: min RIP completion time, round robin, random]


PART 2: References

• J. Smith, V. Shestak, H. J. Siegel, S. Price, L. Teklits, and P. Sugavanam, “Resource allocation in cluster-based imaging systems,” 2007 International Conference on Parallel & Distributed Techniques and Applications (PDPTA’07). Accepted, to appear.
• patent: V. Shestak, S. Price, J. Smith, L. Teklits, H. J. Siegel, and P. Sugavanam, “Methods and Systems for Improved Printing System Sheet Side Dispatch in a Clustered Printer Controller,” filed as IBM Docket BLD 920060015US1, Sep. 1, 2006.


Outline

• part 1: two-stage approach to resource allocation for periodic strings of applications
• part 2: resource allocation in IBM cluster-based printing system
• part 3: stochastic robustness metric and its use for static resource allocations
• part 4: robust resource allocation under random node failures and recoveries – in progress


PART 3: QoS-Constrained Resource Allocation

• establish a system performance metric
• develop a mathematical model that gives the functional dependence between the performance metric, the input parameters, and the uncertainties in the system
• integrate this model into an adapted or newly developed optimization technique
• evaluate the quality of the resulting suboptimal solution(s)


PART 3: QoS-Constrained Example System

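[Figure: example system of M compute nodes with their assigned applications $a_{ij}$, annotated with $\Lambda$]

• periodic data sets
• processing of each data set to be completed within $\Lambda$ time units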


PART 3: Stochastic Robustness Metric

for a given resource allocation
  - $S_j = \{a_{1j}, a_{2j}, \ldots, a_{n_j j}\}$: set of applications on compute node j
  - $T_{ij}$: (random variable) execution time of $a_{ij}$ on compute node j
  - $\psi$: (random variable) makespan, $\psi = \max\left\{\sum_{i=1}^{n_1} T_{i1}, \ldots, \sum_{i=1}^{n_M} T_{iM}\right\}$
  - $\beta_{\min}$ and $\beta_{\max}$ specify the acceptable range for $\psi$

stochastic robustness metric is the probability that the performance characteristic $\psi$ is confined to the interval $[\beta_{\min}, \beta_{\max}]$:

$P[\beta_{\min} \le \psi \le \beta_{\max}]$


PART 3: Stochastic Resource Allocation

[Figure: two panels (applications assigned to node 1, applications assigned to node 2) plotting probability density function against time, with markers for the makespan constraint, the estimated makespan (mean), and the probability of exceeding the makespan]


PART 3: Independence

• among local performance characteristics $\psi_j = \sum_{i=1}^{n_j} T_{ij}$: allows the stochastic robustness metric to be computed as

  $P[0 \le \psi \le \Lambda] = \prod_{j=1}^{M} P[0 \le \psi_j \le \Lambda]$

• among random variables $T_{ij}$: allows convolution to be applied to find the pdf of $\sum_{i=1}^{n_j} T_{ij}$
  - Fast Fourier Transform (FFT) method can be used
• if dependencies, apply bootstrap approximation method
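The two independence assumptions translate directly into a short computation. The following Python sketch assumes discrete execution-time distributions on an integer time grid, where `pmf[t]` is the probability that an application takes exactly `t` time units. The function names and this representation are illustrative assumptions, and plain direct convolution is used in place of the FFT method named on the slide.

```python
def convolve(p, q):
    """Distribution of the sum of two independent discrete random variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for k, qk in enumerate(q):
            out[i + k] += pi * qk
    return out


def node_completion_pmf(exec_pmfs):
    """pmf of psi_j, the sum of execution times of applications on one node."""
    pmf = [1.0]  # an empty node finishes at time 0 with probability 1
    for p in exec_pmfs:
        pmf = convolve(pmf, p)
    return pmf


def stochastic_robustness(allocation, deadline):
    """P[psi <= deadline] as the product over nodes of P[psi_j <= deadline].

    allocation: one list per node, each holding the pmfs of its applications.
    deadline: integer number of time units (Lambda on the slides).
    """
    prob = 1.0
    for exec_pmfs in allocation:
        pmf = node_completion_pmf(exec_pmfs)
        prob *= sum(pmf[: deadline + 1])
    return prob


if __name__ == "__main__":
    # Node 1 runs two applications taking 1 or 2 units with equal probability;
    # node 2 runs one application taking 2 or 3 units with equal probability.
    allocation = [[[0.0, 0.5, 0.5], [0.0, 0.5, 0.5]], [[0.0, 0.0, 0.5, 0.5]]]
    print(stochastic_robustness(allocation, deadline=3))
```

For long distributions the quadratic `convolve` could be swapped for an FFT-based convolution without changing the rest of the sketch.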


PART 3: Comparison Analysis

[Figure: stochastic robustness (%) plotted against makespan (sec.) based on mean values]

• 1,000 randomly generated resource allocations
• $T_{ij}$ discrete distributions constructed randomly in the same range


PART 3: Heuristics

heuristics
• two-phase greedy
  - basic, conflict resolution
• one-phase greedy
  - sorting, mean load balancing
• global search
  - steady-state genetic algorithm
  - ant colony optimization
  - simulated annealing

• allocate N independent applications across M nodes
• minimize the period between data sets while maintaining the value of $P[\psi \le \Lambda]$


PART 3: Greedy Heuristics: Results

• $P[\psi \le \Lambda]$ value was set to 0.9
• results are based on 50 experimental trials


PART 3: Global Search Heuristics: Results

• $P[\psi \le \Lambda]$ value was set to 0.9
• results are based on 50 experimental trials


PART 3: References

• V. Shestak, J. Smith, A. A. Maciejewski, and H. J. Siegel, “A stochastic approach to measuring the robustness of a resource allocation in distributed systems,” 2006 International Conference on Parallel Processing (ICPP’06), Aug. 2006, pp. 459–470.
• V. Shestak, J. Smith, R. Umland, J. Hale, P. Moranville, A. A. Maciejewski, and H. J. Siegel, “Greedy approaches to stochastic robust resource allocation in sensor driven distributed systems,” 2006 International Conference on Parallel & Distributed Techniques and Applications (PDPTA’06), June 2006, pp. 4–13.
• V. Shestak, J. Smith, A. A. Maciejewski, and H. J. Siegel, “Iterative algorithms for stochastically robust static resource allocation in periodic sensor driven clusters,” 8th IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2006), Nov. 2006.


Outline

• part 1: two-stage approach to resource allocation for periodic strings of applications
• part 2: resource allocation in IBM cluster-based printing system
• part 3: stochastic robustness metric and its use for static resource allocations
• part 4: robust resource allocation under random node failures and recoveries – in progress


PART 4: System Prototype

[Figure: system prototype with the components task pool, cluster controller, heterogeneous cluster, workload, and system log]


PART 4: System Prototype

[Figure: heterogeneous cluster and cluster controller, with a log shown for each stage i along a time axis]


PART 4: Known Parameters & Assumptions

• each task has an importance factor
• estimated time to compute each task is known
• node failure and recovery statistics are known
• total time to execute the task batch is T
• no new arrivals during T
• stage length: λ time units (fixed)
• system log is received at the end of each stage
• mapping decision is generated per stage
• no credit is given for partial task execution
• if a node recovers in stage i, it will be used in stage i + 1


PART 4: Goal for Cluster Controller

• maximize revenue, i.e., the expected sum of importance factors of the tasks completed over T
• maximize the sum of importance factors of the tasks completed in each stage of length λ


PART 4: Off-Line Policy Generation (Hypothetical Solution)

• off-line generated policy (used by the cluster controller):
  - result – lookup table
  - optimal control selection at each stage
  - finite horizon DP
  - intractable even for medium-scale problems

[Figure: stage timeline from 0 to λ: produce mapping, then execute tasks]


PART 4: On-line Policy Generation

• on-line policy generation (by the cluster controller):
  - Monte Carlo simulation
  - limited horizon DP
• time to select control varies

[Figure: stage timeline from 0 to λ: produce mapping, then execute tasks]


PART 4: Estimating Expected Revenue from Future States

$\text{revenue} = \underbrace{E\big[\mathrm{imp}[x_1, u(x_1)]\big] + E\big[\mathrm{imp}[x_2, u(x_2)]\big] + \dots}_{N \text{ stages}}$

  - $N$: total number of stages
  - $x_i$: MDP state
  - $u(x_i)$: control applied to $x_i$
  - $\mathrm{imp}[x_i, u(x_i)]$: accumulated importance in stage i

[Figure annotation: the leading terms of the sum are marked "compute" and the trailing term is marked "estimate"]
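A stage-by-stage expectation of this kind can be approximated by Monte Carlo averaging, the technique the slides list for on-line policy generation. The sketch below is a toy model under loudly stated assumptions that do not come from the slides: every task takes exactly one stage, each node is independently available in each stage with probability `p_up`, and the control is a fixed greedy rule (highest importance first). All function and parameter names are hypothetical.

```python
import random


def simulate_revenue(importances, n_nodes, n_stages, p_up, rng):
    """One simulated run: sum of importance of tasks completed over all stages."""
    remaining = sorted(importances, reverse=True)  # greedy control: best first
    revenue = 0.0
    for _ in range(n_stages):
        # Number of nodes that are up for this whole stage (toy failure model).
        up = sum(1 for _ in range(n_nodes) if rng.random() < p_up)
        done, remaining = remaining[:up], remaining[up:]
        revenue += sum(done)  # importance accumulated in this stage
    return revenue


def expected_revenue(importances, n_nodes, n_stages, p_up, n_runs=10000, seed=0):
    """Monte Carlo estimate of the expected revenue under the toy model."""
    rng = random.Random(seed)
    total = sum(
        simulate_revenue(importances, n_nodes, n_stages, p_up, rng)
        for _ in range(n_runs)
    )
    return total / n_runs


if __name__ == "__main__":
    print(expected_revenue([5, 3, 2, 1], n_nodes=2, n_stages=2, p_up=0.9))
```

In an on-line setting the number of runs trades estimate accuracy against the time available within a stage to select a control.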


PART 4: Estimating Expected Revenue from Future States

• input: current state, control, probabilities
• method: machine learning (regression, neural networks, …)
• output: expected accumulated importance from future states, over a certain number of stages

Can we achieve the desired accuracy? For how many stages?


Outline

• part 1: two-stage approach to resource allocation for periodic strings of applications
• part 2: resource allocation in IBM cluster-based printing system (done jointly with J. Smith, and will appear in his thesis)
• part 3: stochastic robustness metric and its use for static resource allocations (done jointly with J. Smith, and will appear in his thesis)
• part 4: robust resource allocation under random node failures and recoveries


Summary

• part 1: designed a two-stage approach to static resource allocation for periodic strings of applications in a QoS-constrained system
• part 2: designed a workload distribution algorithm for the IBM printer cluster controller
• part 3: presented a methodology for deriving a stochastic robustness metric for resource allocation
  - illustrated the methodology for an example distributed system
• part 4: proposed an idea for resource allocation in distributed systems with random node failures and recoveries