
Dynamic Reconfiguration of Grid-Aware Applications in ASSIST

Euro-Par 2005, Lisboa, Portugal

M. Aldinucci, A. Petrocelli, E. Pistoletti, M. Torquati,
M. Vanneschi, L. Veraldi, C. Zoccolo

ISTI - CNR, Pisa, Italy
University of Pisa, Italy


Outline

• Motivating ...
  • high-level programming for the grid
  • application adaptivity for the grid

• ASSIST basics

• Adaptivity in ASSIST
  • mechanisms
  • autonomic QoS managers

• Demo & experiments

• Concluding remarks

2


The grid

“... coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.”
Foster, Anatomy

“1) coordinates resources that are not subject to centralized control ...”
“2) ... using standard, open, general-purpose protocols and interfaces”
“3) ... to deliver nontrivial qualities of service.”
Foster, What is the Grid?

Did you see J. Fortes’ invited talk?


Moreover, since this is not Euro-Seq, I assume the applications we are focusing on should be parallel (and hopefully high-performance).


// progr. & the grid

• concurrency exploitation, concurrent activities set-up, mapping/scheduling, communication/synchronization handling and data allocation, ...

• managing resource heterogeneity and unreliability, network latency and bandwidth unsteadiness, changes in resource topology and availability, firewalls, private networks, reservation and job schedulers, ...

... and a non-trivial QoS for applications

not easy when leveraging only on middleware

Outline

• Motivating ...
  • high-level programming for the grid
  • application adaptivity for the grid

• ASSIST basics

• Adaptivity in ASSIST
  • mechanisms
  • autonomic QoS managers

• Demo & experiments

• Concluding remarks

5


ASSIST idea

“moving most of the Grid-specific efforts needed while developing high-performance Grid applications from programmers to grid tools and run-time systems”

ASSIST is a high-level programming environment for grid-aware parallel applications, developed at the University of Pisa within several national/EU projects. First version in 2001. Open source under GPL.

[Layered figure: ASSIST applications, each with an Application Manager (AM: non-functional aspects & QoS control), run on a Grid Abstract Machine, an application-dependent, compiler-generated abstraction of the basic services (resource management & scheduling, monitoring, ...) built over standard middleware tools (Globus, WS, CCM, ...).]


app = graph of modules

[Figure: an application as a graph of modules P1, P2, P3, P4 connected by streams, from input to output (a small conceptual sketch follows).]

• Sequential or parallel modules
• Typed streams of data items
• Programmable, possibly nondeterministic input behaviour

7
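To make the “graph of modules” picture concrete, here is a minimal conceptual sketch in C++ (hypothetical types and stream names, not ASSIST syntax); the wiring of P1..P4 is just one plausible reading of the figure.

```cpp
// Conceptual sketch only (hypothetical C++ types, not ASSIST syntax): an
// application described as a graph of modules connected by typed streams.
#include <string>
#include <vector>

struct Module {
    std::string name;                      // sequential or parallel module
    std::vector<std::string> in_streams;   // typed streams it reads
    std::vector<std::string> out_streams;  // typed streams it writes
};

int main() {
    // One possible wiring of the slide's P1..P4 graph, from input to output.
    std::vector<Module> app = {
        {"P1", {"input"}, {"a", "b"}},
        {"P2", {"a"}, {"c"}},
        {"P3", {"b"}, {"d"}},
        {"P4", {"c", "d"}, {"output"}},
    };
    (void)app;   // a real run-time would deploy this graph on grid resources
}
```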

native + standard

[Figure: the same module graph; modules are ASSIST-native or wrap existing code (MPI, CORBA, CCM, WS); streams are carried over TCP/IP, Globus, IIOP CORBA, HTTP/SOAP.]

8


ASSIST native parmod

[Figure: a parmod with its set of VPs, an input section and an output section.]

• An “input section” can be programmed in a CSP-like way.

• Data items can be distributed (scattered, broadcast, multicast) to a set of Virtual Processes (VPs), which are named according to a topology.

• Data item partitions are elaborated by VPs, possibly in an iterative way (sketched below):
  while(...) forall VP(in, out) barrier
  data is logically shared by VPs (owner-computes).

• Data is eventually gathered in a user-defined way.

• Easy to express standard paradigms (skeletons), such as farm, deal, haloswap, map, apply-to-all, forall, ...

9
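The while(...) forall VP(in, out) barrier pattern can be pictured with a minimal sketch, assuming a trivially partitioned array and a purely sequential stand-in for the VPs (not ASSIST syntax; no real distribution or barrier):

```cpp
// Conceptual sketch (not ASSIST syntax): the parmod execution pattern
//   while(...) forall VP(in, out) barrier
// over a logically shared array with owner-computes partitioning.
#include <vector>

int main() {
    const int n_vps = 4, N = 16, iters = 10;
    std::vector<double> state(N, 1.0);           // logically shared state

    for (int it = 0; it < iters; ++it) {         // while(...)
        for (int vp = 0; vp < n_vps; ++vp) {     // forall VP (here: sequentially)
            // owner-computes: VP 'vp' updates only its own partition
            for (int i = vp * (N / n_vps); i < (vp + 1) * (N / n_vps); ++i)
                state[i] = 0.5 * state[i] + 1.0; // user function on the partition
        }
        // barrier: in a real run-time all VPMs synchronize here before the
        // next iteration; in this sequential sketch the loop end plays that role
    }
}
```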

parmod implementation

[Figure: a parmod is implemented as a network of processes: an input manager and a set of VP manager (VPM) processes, each hosting several Virtual Processes (VPs).]

10


Compiling & Running

[Figure: the ASSIST compiler takes the ASSIST program and a QoS contract and produces an XML resource description plus executable code (Linux, Mac, Windows). At run time, the Grid Execution Agent (GEA) launches the network of processes (ISM, VPMs, OSM, seq modules) and the managers (AM + MAMs); the managers query the GEA for new resources and send reconfiguration commands to the running processes.]

11

Outline

• Motivating ...
  • high-level programming for the grid
  • application adaptivity for the grid

• ASSIST basics

• Adaptivity in ASSIST
  • mechanisms (+ demo)
  • autonomic QoS managers

• Demo & experiments

• Concluding remarks

12


Application adaptivity

• Adaptivity aims to dynamically control program configuration (e.g. parallelism degree) and mapping
  • for performance (high performance is a natural sub-target)
  • for fault tolerance (to cope with unsteadiness of resources and some kinds of faults)

13


Ingredients for the adaptivity recipe

1. Mechanisms for adaptivity
  • reconf-safe points
    • at which points can a parallel code be safely reconfigured?
  • reconf-safe point consensus
    • different parallel activities may not proceed in a lock-step fashion
  • add/remove/migrate computation & data

2. Managing adaptivity
  • QoS contracts
    • describing high-level QoS requirements for modules/applications
  • “self-optimizing” modules
    • under the control of an autonomic manager

14

reconf-safe points

• At which points of the code can the execution be reconfigured?

• low-level approach
  • the programmer places calls to a suitable API in the code, e.g. safe_point();
  • error-prone, time-consuming (sketched below)

• ASSIST
  • reconf-safe points are automatically generated by the compiler, driven by program semantics
  • no artifactual synchronization is added; already existing synchronizations are instrumented instead
  • overhead w.r.t. non-adaptive code < 0.04%

15
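For contrast with ASSIST's compiler-generated points, the low-level approach mentioned above can be sketched as follows; safe_point() is a hypothetical API standing in for whatever such a run-time would provide:

```cpp
// Hypothetical sketch of the low-level approach (not the ASSIST way):
// the programmer manually marks points where reconfiguration is safe,
// i.e. where no message is in flight and local state is consistent.
#include <cstdio>

void safe_point() {
    // hypothetical API: check with the local manager whether a
    // reconfiguration has been requested and, if so, take part in it
    std::puts("reconf-safe point reached");
}

int main() {
    for (int iter = 0; iter < 3; ++iter) {
        // ... compute on the local partition ...
        safe_point();   // manual placement: error-prone and time-consuming
    }
}
```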

Mechanisms

Distributed agreement

• The program reconfiguration actually starts only when all interested entities are ready to react, i.e.
  • all processes have reached a suitable reconf-safe point
  • they have agreed on which one
  • fresh resources are up and running

• distributed protocol

16

Mechanisms

Basic operations

• Change parallelism degree
  • add n VPMs to a parmod
  • remove n VPMs from a parmod

• Change mapping
  • move k VPs from one VPM to another
  • move a VPM from one PE to another
  • dynamic load balancing as a sequence of migrate operations

17


Mechanisms

Example: Add VPM

[Figure: a parmod (ISM, OSM, VPMs hosting VPs and data) controlled by its MAM; a new VPM is started on a fresh PE and some VPs with their data are moved onto it.]

1. Gexec(newPE, VPM)
2. acquire consensus
3. move VPs and data

Only 3. is in the critical path (sketched below).

18
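A compact sketch of the three-step add-VPM protocol, with hypothetical helpers standing in for Gexec/GEA and the agreement protocol; only step 3, moving VPs and data, sits on the critical path:

```cpp
// Conceptual sketch (hypothetical API) of the "add VPM" reconfiguration
// protocol driven by the parmod's MAM:
//   1. start a new VPM on a fresh PE (off the critical path)
//   2. acquire consensus: all VPMs agree on a reconf-safe point
//   3. move some VPs and their data partitions to the new VPM
#include <string>
#include <vector>

struct VPM { std::string pe; std::vector<int> vps; };   // VPs hosted by a VPM

// hypothetical helpers standing in for GEA / run-time services
VPM start_vpm(const std::string& pe) { return VPM{pe, {}}; }
void agree_on_safe_point(std::vector<VPM>&) { /* distributed agreement */ }

void add_vpm(std::vector<VPM>& parmod, const std::string& fresh_pe) {
    VPM fresh = start_vpm(fresh_pe);      // 1. Gexec(newPE, VPM)
    agree_on_safe_point(parmod);          // 2. acquire consensus
    for (auto& vpm : parmod) {            // 3. move VPs (and data): only this
        if (!vpm.vps.empty()) {           //    step is on the critical path
            fresh.vps.push_back(vpm.vps.back());
            vpm.vps.pop_back();
        }
    }
    parmod.push_back(fresh);
}

int main() {
    std::vector<VPM> parmod = {{"pe0", {0, 1, 2}}, {"pe1", {3, 4, 5}}};
    add_vpm(parmod, "pe2");               // parmod now has 3 VPMs
}
```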

Mechanisms

[Fig. 2. Reconfiguration dynamics and metrics: timeline of a reconfiguration in which the MAM decides it needs one more PE, a VPM is launched on the new PE, the parmod reaches a reconf-safe point, VPs are redistributed and the new process contacts the MAM, after which the final configuration runs with one more VPM; the timeline defines the reconfiguration latency and reconfiguration time used as metrics.]

TCP/IP or Globus provided communication channels. The two applications are composed of one parmod and two sequential modules. The first is a data-parallel application receiving a stream of integer arrays and computing a forall of a simple function for each stream item; the matrix is stored in the parmod shared state. The second is a farm application computing a simple function on different stream items. Since Rt also depends on the sequential function cost, in both cases we chose sequential functions with a close-to-zero computational cost, in order to evaluate the mechanisms at the finest possible grain.

The reconfiguration overhead (Ro) measured during our experiments, without any reconfiguration actually being performed, is practically negligible, remaining under the limit of 0.004%; the measurements of the other two metrics are reported in Table 1.

Notice that in the case of a data-parallel parmod, Rl grows linearly with (x + y) for the reconfiguration x → y for both kinds of reconf-safe points, and depends on shared-state size and mapping. A farm parmod cannot be reconfigured on-barrier since it has no barrier, and achieves a negligible Rl (below 10^-3 ms). This is due to the fact that no processes are stopped in the transition from one configuration to the next. Rt, which includes both the protocol cost and the time to reach the next reconf-safe point, grows linearly with (x + y) for the former cost and heavily depends on the user-function cost for the latter.

parmod kind        |   Data-parallel (with shared state)   |    Farm (without shared state)
reconf. kind       |    add PEs      |    remove PEs       |    add PEs      |   remove PEs
# of PEs involved  | 1→2  2→4  4→8   | 2→1  4→2  8→4       | 1→2  2→4  4→8   | 2→1  4→2  8→4
Rl on-barrier      | 1.2  1.6  2.3   | 0.8  1.4  3.7       |  –    –    –    |  –    –    –
Rl on-stream-item  | 4.7  12.0 33.9  | 3.9  6.5  19.1      | ≈0   ≈0   ≈0    | ≈0   ≈0   ≈0
Rt                 | 24.4 30.5 36.6  | 21.2 35.3 43.5      | 24.0 32.7 48.6  | 17.1 21.6 31.9

Table 1. Evaluation of reconfiguration overheads (ms). On this cluster, 50 ms are needed to ping 200 KB between two PEs, or to compute 1M integer additions.

Overheads (milliseconds)

GrADS papers report overheads in the order of hundreds of seconds (K. Kennedy et al., 2004); this is mainly due to the stop/restart behavior, not to the different running environment.

19

Mechanisms


Demo Here!

[Demo, driven from a reconf. console:]

• the run begins with 1 VPM: lines arrive slowly, one after the other (par. degree = 1)
• fresh VPMs are added to the app.: lines arrive much faster (par. degree = 5)
• reconf: release 3 VPMs: lines arrive a bit slower (par. degree = 2)
• reconf: add 4 VPMs: 4 fresh VPMs are started

Managing adaptivity

21

Management

[Figure: the managers (AM + MAMs) run a Monitor/Analyze/Plan/Execute loop: they monitor QoS data from the parmod (a network of processes: ISM, VPMs, OSM and sequential modules), analyze it to detect broken contracts, plan the next configuration, and execute it by sending reconfiguration commands and querying the Grid Execution Agent (Globus, ACE, ...) for new resources; the GEA launches the processes.]

parmod autonomic manager

1. monitor
  • collect execution stats: machine load, VPM service time, input/output queue lengths, ...

2. analyze
  • instantiate performance models with the monitored data, detect broken contracts and, in that case, try to pinpoint the problem

3. plan
  • select a (predefined or user-defined) strategy to bring the contract back to a valid status; the strategy is actually a list of mechanisms to apply

4. execute
  • leverage the mechanisms to apply the plan (a sketch of this loop follows)
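A minimal sketch of this monitor/analyze/plan/execute loop, with hypothetical types and stub functions rather than the real MAM interfaces:

```cpp
// Minimal sketch (hypothetical types and names, not the ASSIST API) of the
// autonomic monitor / analyze / plan / execute loop run by a parmod MAM.
#include <vector>

struct Stats { double vpm_service_time; double in_queue_fill; };
enum class Action { AddVPM, RemoveVPM, MoveVPs };

Stats monitor() { return {0.12, 0.9}; }        // collect execution stats (stub)

bool contract_broken(const Stats& s, double goal) {
    return s.vpm_service_time > goal;          // perf. model vs. contract goal
}

std::vector<Action> plan(const Stats& s) {     // pick a predefined strategy:
    if (s.in_queue_fill > 0.5) return {Action::AddVPM};   // parmod too slow
    return {Action::RemoveVPM};                            // parmod too fast
}

void execute(Action) { /* invoke one reconfiguration mechanism */ }

int main() {
    const double goal = 0.10;                  // QoS contract goal
    for (int round = 0; round < 3; ++round) {
        Stats s = monitor();                   // 1. monitor
        if (contract_broken(s, goal))          // 2. analyze
            for (Action a : plan(s))           // 3. plan
                execute(a);                    // 4. execute
    }
}
```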

Management

QoS contract (of the experiment I’ll show you in a minute)

Perf. features   QLi (input queue level), QLo (output queue level), TISM (ISM service time), TOSM (OSM service time), Nw (number of VPMs), Tw[i] (VPM_i avg. service time), Tp (parmod avg. service time)

Perf. model      Tp = max{ TISM, (Σ_{i=1..n} Tw[i]) / n, TOSM },  Tp < K (goal)

Deployment       arch = (i686-pc-linux-gnu ∨ powerpc-apple-darwin*)

Adapt. policy    goal based
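Reading the performance model above literally (the sum term as the average VPM service time), a small sketch of how a MAM could evaluate it against the goal Tp < K; all names and numbers are illustrative only:

```cpp
// Sketch of the contract's performance model (names follow the slide):
//   Tp = max{ TISM, (sum_i Tw[i]) / n, TOSM },  goal: Tp < K
#include <algorithm>
#include <numeric>
#include <vector>

double parmod_service_time(double t_ism, double t_osm,
                           const std::vector<double>& t_w) {
    double avg_vpm = std::accumulate(t_w.begin(), t_w.end(), 0.0) / t_w.size();
    return std::max({t_ism, avg_vpm, t_osm});
}

int main() {
    std::vector<double> t_w = {0.30, 0.28, 0.35};   // per-VPM avg. service times
    double tp = parmod_service_time(0.05, 0.04, t_w);
    double K = 0.20;                                // contract goal
    bool contract_broken = !(tp < K);               // true: MAM should react, e.g. add VPMs
    (void)contract_broken;
}
```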


Management

Performance models: an example (DP load balancing)

[Figure: two timelines of four VPMs between barriers. Before rebalancing, with T1=T2=3, T3=2, T4=1 and n1=n2=n3=n4=7, the faster VPMs sit idle at each barrier. After rebalancing the VP-to-VPM mapping to n1=5, n2=5, n3=7, n4=12, the idle time is greatly reduced.]

24
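One simple way to compute such a rebalancing is to assign VPs in proportion to the measured VPM speed 1/Ti. This is only a sketch of a plausible policy: for the 28 VPs of the example it yields roughly 4, 4, 6, 14, close to but not identical to the slide's 5, 5, 7, 12, which presumably come from ASSIST's own model.

```cpp
// Sketch (one possible policy, not necessarily the ASSIST one): redistribute
// N virtual processes over the VPMs in proportion to their speed 1/T[i].
#include <cstdio>
#include <vector>

std::vector<int> rebalance(const std::vector<double>& T, int N) {
    double total_speed = 0.0;
    for (double t : T) total_speed += 1.0 / t;
    std::vector<int> n(T.size());
    int assigned = 0;
    for (size_t i = 0; i + 1 < T.size(); ++i) {
        n[i] = static_cast<int>(N * (1.0 / T[i]) / total_speed + 0.5);
        assigned += n[i];
    }
    n.back() = N - assigned;               // give the remainder to the last VPM
    return n;
}

int main() {
    // per-VP times from the slide: T1=T2=3, T3=2, T4=1; 28 VPs in total
    std::vector<int> n = rebalance({3, 3, 2, 1}, 28);
    for (size_t i = 0; i < n.size(); ++i)
        std::printf("VPM%zu: %d VPs\n", i + 1, n[i]);   // ~ 4, 4, 6, 14
}
```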

Outline

• Motivating ...
  • high-level programming for the grid
  • application adaptivity for the grid

• ASSIST basics

• Adaptivity in ASSIST
  • mechanisms (+ demo)
  • autonomic QoS managers

• Experiments

• Concluding remarks

25


[Figure: farm experiment. The contract is to keep a given service time; the contract changes along the run. Plotted against wall clock time (s): items/s (input stream pressure, VPMs aggregated power, QoS contract), the number of VPMs in the parmod, and the input stream queue fill level (%).]

26

Running Env

[Figure: the running environment consists of four platform kinds (A, B, C, D): Pentium 4 and Pentium 3 machines at various clock speeds, compared by application profiling (iterations/s) and Linux BogoMIPS.]

27

[Figure: data-parallel experiment (shortest path), with machine B externally overloaded after a while. Plotted against the iteration count: the max unbalance time, the iteration time (seconds) and the VP-to-VPM mapping across platforms A, B, C, D; when the additional load starts on platform B, VPs are remapped away from it. The platform-profiling chart (iterations/s and BogoMIPS) is repeated for reference.]

28

Conclusions 1/2

• Application adaptivity in ASSIST
  • is complex, but transparent (no burden for the programmers)
    • they should just define their QoS requirements
    • perf. models are automatically generated from the program structure (and don’t depend on the seq. functions)
  • is dynamically controlled and efficiently managed
    • catches both platform unsteadiness and irregular behavior of the code at run time
    • performance models are not critical; reconfiguration does not stop the application
  • is a key feature for the grid

29


Conclusions 2/2

• ASSIST copes with
  • grid platform unsteadiness
  • interoperability with standards
    • and relies on them for many features
  • high performance
  • app deployment problems on the grid
    • private networks, job schedulers, firewalls, ...

• We are currently working on
  • QoS of the whole application through a hierarchy of managers
  • components, fault tolerance, efficient launch-time mapping
  • in cooperation with many CoreGRID partners

30

Thank you

• ASSIST is open source under GPL, available on the web
  • http://www.di.unipi.it/Assist.html
  • or search with Google: “ASSIST programming environment”

31