
A Framework for Elastic Execution of Existing MPI Programs

Aarthi Raveendran, Graduate Student
Department of CSE


Motivation

Emergence of Cloud Computing
• Including for HPC applications

Key Advantages of Cloud Computing
• Elasticity (dynamically acquire resources)
• Pay-as-you-go model
• Can be exploited to meet cost and/or time constraints

Existing HPC Applications
• MPI-based, use a fixed number of nodes

Need to make Existing MPI Applications Elastic


Outline

• Research Objective
• Framework Design
• Runtime Support Modules
• Experimental Platform: Amazon Cloud Services
• Applications and Experimental Evaluation
• Decision Layer Design
• Feedback Model
• Decision Layer Implementation
• Experimental Results for Time and Cost Criteria
• Conclusion


Detailed Research Objective

To make MPI applications elastic
• Exploit a key advantage of cloud computing
• Meet user-defined time and/or cost constraints
• Avoid a new programming model or significant recoding

Design a framework for
• Decision making
  – When to expand or contract
• Actual support for elasticity
  – Allocation, data redistribution, restart


Framework components


Framework design – Approach and Assumptions

Target – iterative HPC applications
Assumption: uniform work done at every iteration
Monitoring at the start of every few iterations of the time-step loop
Checkpointing
Resource allocation and redistribution


Framework design - A Simple Illustration of the Idea

Progress is checked based on the current average iteration time
A decision is made to stop and restart if necessary
Reallocation should not be done too frequently
If restarting is not necessary, the application continues running
(a sketch of such a monitoring loop is shown below)
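A minimal sketch, in C with MPI, of how a time-step loop could be instrumented for this kind of monitoring. This is not the authors' code: CHECK_INTERVAL, needs_restart, and write_checkpoint are illustrative assumptions standing in for the framework's decision and checkpointing layers.

```c
/* Sketch: a time-step loop instrumented for elastic execution. Every
 * CHECK_INTERVAL iterations the root rank compares the running average
 * iteration time against the remaining budget; if a different node count is
 * needed, all ranks checkpoint and exit so the framework can restart the job. */
#include <mpi.h>
#include <stdlib.h>

#define CHECK_INTERVAL 10           /* monitor every 10 iterations (assumed) */

/* Illustrative stand-ins for the framework's checkpoint and decision calls. */
static void write_checkpoint(int iter) { (void)iter; /* gather + store live arrays */ }
static int  needs_restart(double avg_iter, double time_left, int iters_left)
{
    return avg_iter * iters_left > time_left;   /* projected finish is too late */
}

static void elastic_loop(int max_iters, double time_budget)
{
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double start = MPI_Wtime();
    for (int iter = 0; iter < max_iters; iter++) {
        /* ... one iteration of the original computation and communication ... */

        if ((iter + 1) % CHECK_INTERVAL == 0) {
            int stop = 0;
            if (rank == 0) {
                double elapsed  = MPI_Wtime() - start;
                double avg_iter = elapsed / (iter + 1);
                stop = needs_restart(avg_iter, time_budget - elapsed,
                                     max_iters - iter - 1);
            }
            /* every rank must take the same decision */
            MPI_Bcast(&stop, 1, MPI_INT, 0, MPI_COMM_WORLD);
            if (stop) {
                write_checkpoint(iter + 1);
                MPI_Finalize();
                exit(0);            /* framework restarts on the new node count */
            }
        }
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    elastic_loop(500, 1200.0);      /* 500 iterations, 1200 s target (example values) */
    MPI_Finalize();
    return 0;
}
```

Broadcasting the decision from rank 0 keeps all ranks consistent, so every process checkpoints and exits together rather than reallocating too frequently or inconsistently.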



Framework design – Execution flow


Other Runtime Steps

Steps taken to perform scaling to a different number of nodes:
• Live variables and arrays need to be collected at the master node and redistributed
• Read-only data need not be restored – just retrieved
• The application is restarted, with each node reading the local portion of the redistributed data
(a sketch of the central collect/redistribute step follows)
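A minimal sketch, in C with MPI, of what the central collection and redistribution of one live array could look like under a block distribution. It illustrates the steps above rather than the framework's actual code; block_layout, the function names, and the use of MPI_Gatherv/MPI_Scatterv are assumptions.

```c
/* Sketch: central collection and redistribution of one live array,
 * assuming a simple block distribution of n elements over the ranks. */
#include <mpi.h>
#include <stdlib.h>

/* Compute counts/displacements for a block distribution of n elements. */
static void block_layout(int n, int nprocs, int *counts, int *displs)
{
    int base = n / nprocs, rem = n % nprocs, off = 0;
    for (int p = 0; p < nprocs; p++) {
        counts[p] = base + (p < rem ? 1 : 0);
        displs[p] = off;
        off += counts[p];
    }
}

/* Before shutdown: collect the live array on rank 0 so it can be written out. */
void collect_live_array(const double *local, int local_n, double *global, int n)
{
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    int *counts = malloc(nprocs * sizeof(int));
    int *displs = malloc(nprocs * sizeof(int));
    block_layout(n, nprocs, counts, displs);
    MPI_Gatherv(local, local_n, MPI_DOUBLE,
                global, counts, displs, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    free(counts); free(displs);
}

/* After restart on a different node count: each rank receives its new portion. */
void redistribute_live_array(const double *global, double *local, int *local_n, int n)
{
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    int *counts = malloc(nprocs * sizeof(int));
    int *displs = malloc(nprocs * sizeof(int));
    block_layout(n, nprocs, counts, displs);
    *local_n = counts[rank];
    MPI_Scatterv(global, counts, displs, MPI_DOUBLE,
                 local, *local_n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    free(counts); free(displs);
}
```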


Background – Amazon cloud

Amazon Elastic Compute Cloud (EC2)

Small instances : 1.7 GB of memory, 1 EC2 Compute Unit, 160 GB of local instance storage, 32-bit platform

Large instances : 7.5 GB of memory, 4 EC2 Compute Units, 850 GB of local instance storage, 64-bit platform

On-demand, reserved, and spot instances

Amazon Simple Storage Service (S3)

Provides a key-value store
Data stored in files
Each file restricted to 5 GB
Unlimited number of files


Runtime support modules – Resource allocator

Elastic execution
• Input taken from the decision layer on the number of resources
• Allocating/de-allocating resources in the AWS environment
• MPI configuration for these instances
  – Setting up the MPI cluster
  – Configuring password-less login among the nodes


Runtime support modules – Checkpointing and redistribution

Multiple design options are feasible with the support available on AWS
• Amazon S3
  – Unmodified arrays
  – Quick access from EC2 instances
  – Arrays stored in small-sized chunks
• Remote file copy
  – Modified arrays (live arrays)
  – File writes and reads
(a sketch of the file-based option follows)
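The slides do not show the framework's file format; as a hedged illustration of the "remote file copy" option, the following C sketch writes and reads one live array as a simple length-prefixed binary file. The function names and layout are assumptions, not the actual implementation.

```c
/* Sketch: writing one live array to a local checkpoint file that the
 * framework can copy to the restarting nodes (illustrative only). */
#include <stdio.h>
#include <stdlib.h>

int write_live_array(const char *path, const double *data, long n)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    /* store the element count first so the reader can size its buffer */
    fwrite(&n, sizeof n, 1, f);
    fwrite(data, sizeof *data, (size_t)n, f);
    return fclose(f);
}

double *read_live_array(const char *path, long *n)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    if (fread(n, sizeof *n, 1, f) != 1) { fclose(f); return NULL; }
    double *data = malloc((size_t)*n * sizeof *data);
    if (data && fread(data, sizeof *data, (size_t)*n, f) != (size_t)*n) {
        free(data);
        data = NULL;
    }
    fclose(f);
    return data;
}
```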


Runtime support modules – Checkpointing and redistribution

Current design
• Knowledge of the division of the original dataset is necessary
• Aggregation and redistribution are done centrally on a single node

Future work
• Source-to-source transformation tool
• Decentralized array distribution schemes


Experiments

Framework and approach evaluated using
• Jacobi
• Conjugate Gradient (CG)

MPICH2 used
4, 8, and 16 small instances used for processing the data
Observations made with and without scaling the resources
Overheads are 5–10%, which is negligible


Experiments – Jacobi

No. Nodes   W/O Redist. (sec)   W/ Redist. (sec)   Data Movement (sec)   Overhead (%)
 4          2810                2850               71                    1
 8          1649                1720               89                    4
16          1001                1087               87                    9

JACOBI APPLICATION WITHOUT SCALING THE RESOURCES

Starting Nodes   Final Nodes   MPI Config. (sec)   Data Movement (sec)   Total (sec)   Overhead (%)
 4                8            81                  3                     2301          3
 4               16            84                  3                     1998          5
 8                4            80                  3                     2267          2
 8               16            95                  3.8                   1386          4
16                4            99                  3.5                   2004          5
16                8            97                  3                     1390          5

JACOBI APPLICATION WITH SCALING THE RESOURCES


Experiments – Jacobi

The matrix is updated at every iteration
The updated matrix is collected and redistributed on a node change
Worst-case total redistribution overhead – less than 2%
Scalable application – performance increases with the number of nodes


Experiments - CG

No. Nodes   W/O Redist. (sec)   W/ Redist. (sec)   Data Movement (sec)   Overhead (%)
 4           834                 879               2.5                   5
 8           997                 980               3                     0
16          1030                1105               2.7                   7

CG APPLICATION WITHOUT SCALING THE RESOURCES

Starting Nodes   Final Nodes   MPI Config. (sec)   Data Movement (sec)   Total (sec)   Overhead (%)
 4                8            43                  3                      930          2
 4               16            60                  3                      999          7
 8                4            40                  4                      942          3
 8               16            81                  3                     1060          5
16                4            58                  3                     1003          8
16                8            82                  3                     1080          7

CG APPLICATION WITH SCALING THE RESOURCES


Experiments - CG

A single vector needs to be redistributed
Communication-intensive application
Not scalable
Overheads are still low


Decision Layer - Design

Main goal – to meet user demands
Constraints – time and cost – "soft" and not "hard"
Measuring iteration time to determine progress
Measuring communication overhead to estimate scalability
Moving to large-type instances if necessary


Feedback Model (I)

Dynamic estimation of node count based on inputs:
• Input time / cost
• Per-iteration time
• Current node count
• Communication time per iteration
• Overhead costs – restart, redistribution, data read


Feedback Model (II)

Move to large instances if communication time is greater than 30% of total time

Time criterion:
• A new node count is found based on the current progress (see the sketch below)
• If the time criterion cannot be met even with the maximum number of nodes, shift to the maximum to get the best possible result

Cost criterion:
• Change at the end of the billing cycle
• If the cost criterion cannot be met even with the minimum number of nodes, shift to the minimum
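The slides do not give the exact formula, so the following C sketch is only one plausible form of the time-criterion estimate. It assumes the compute portion of an iteration scales roughly linearly with node count while the per-iteration communication time stays constant; the function name, parameters, and the model itself are illustrative assumptions.

```c
/* Sketch of a time-criterion node-count estimate (illustrative model only). */
#include <math.h>

/* Estimate how many nodes are needed to finish iters_left iterations within
 * time_left seconds, assuming the compute part of each iteration scales
 * ~linearly with node count and communication time per iteration is constant. */
int estimate_node_count(double iter_time,      /* current avg time per iteration        */
                        double comm_time,      /* communication time per iteration      */
                        int    current_nodes,
                        int    iters_left,
                        double time_left,      /* budget minus restart/redist overheads */
                        int    min_nodes, int max_nodes)
{
    double compute_time = iter_time - comm_time;   /* per iteration, on current nodes */
    double target_iter  = time_left / iters_left;  /* allowed time per iteration      */
    double target_comp  = target_iter - comm_time; /* allowed compute part            */

    if (target_comp <= 0.0)        /* even zero compute time would be too slow:  */
        return max_nodes;          /* fall back to the maximum allowed nodes     */

    int needed = (int)ceil(compute_time * current_nodes / target_comp);
    if (needed < min_nodes) needed = min_nodes;
    if (needed > max_nodes) needed = max_nodes;
    return needed;
}
```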


Decision layer - Implementation

Input: monitoring interval, criteria, initial node count, input time / cost
Output: total process time, total cost
(a possible representation of this interface is sketched below)
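A hypothetical C representation of this interface, with field names taken from the input/output list above; the actual implementation's types are not shown on the slides.

```c
/* Illustrative decision-layer interface; all names are assumptions. */
enum criteria_type { TIME_CRITERIA, COST_CRITERIA };

struct decision_input {
    int    monitoring_interval;    /* iterations between progress checks       */
    enum criteria_type criteria;   /* time or cost constraint                   */
    int    initial_node_count;
    double input_time;             /* target time in seconds (time criterion)   */
    double input_cost;             /* budget in $ (cost criterion)              */
};

struct decision_output {
    double total_process_time;     /* seconds */
    double total_cost;             /* $       */
};
```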


Experiments – Time Criteria: Jacobi

[Figure: Input Time vs Output Time – output time (top, in secs) plotted against input time (tip, in secs) for input times from 800 to 1400 s]


Experiments – Time Criteria: Jacobi

[Figure: Node change for different input times – node count plotted against iteration number for input times of 800, 900, 1000, 1100, 1200, and 1400 secs]


Experiments – Time Criteria: CG

              Start node: 4 Small                          Start node: 4 Large
I/p Time      Node change          Iter. No.   O/p Time    Node change          Iter. No.   O/p Time
300           4 small to 4 large   5           448         remains at 4 large   -           357
600           4 small to 4 large   35          592         remains at 4 large   -           371
800           remains at 4 small   -           764         4 large to 4 small   5           690

NODE CHANGES FOR DIFFERENT INPUT TIME FOR DIFFERENT TYPES OF START NODE (4 SMALL, 4 LARGE)


Experiments – Cost Criteria: Jacobi

[Figure: Input Cost vs Output Cost – output cost (Cop, in $) plotted against input cost (Cip, in $) for input costs of 3, 4, 5, 6, 7, and 10 $]


Experiments – Cost Criteria: Jacobi

[Figure: Node changes for different input costs – node count plotted against iteration number for input costs of 3, 4, 5, 6, 7, and 10 $]


Experiments – Cost Criteria: Jacobi

[Figure: Input Cost vs Output Time – output time (top, in sec) plotted against input cost (Cin, in $) for input costs of 3 to 10 $]


Experiments – Cost Criteria: CG

[Figure: Input Cost vs Output Cost – output cost (Cop, in $) plotted against input cost (Cip, in $) for input costs of 4 to 9 $]


Experiments – Cost Criteria: CG

I/P Cost (in $)   Iteration No.   Node Change
4                 Never           4 Small
5                 45              4 Small to 4 Large
6                 40              5 Small to 4 Large
7                 35              6 Small to 4 Large
8                 25              7 Small to 4 Large
9                 5               8 Small to 4 Large

NODE CHANGES FOR DIFFERENT INPUT COST


Experiments – Cost Criteria: CG

[Figure: Input Cost vs Output Time – output time (top, in sec) plotted against input cost (Cin, in $) for input costs of 4 to 9 $]


Conclusion

An approach to make MPI applications elastic and adaptable
An automated framework for deciding the number of instances for execution based on user demands – time / cost
Framework tested using two MPI applications, showing low overheads during elastic execution and a best effort to meet user constraints