Hybrid Distributed Computing Infrastructure Experiments in Grid5000: Supporting QoS in Desktop Grids with Cloud Resources

Simon Delamare, Gilles Fedak, Oleg Lodygensky
[email protected]
INRIA Rhône-Alpes, GRAAL team

Grid’5000 Spring School 2011

S. Delamare (INRIA - GRAAL Team), Hybrid DCI Experiments in G5K, Grid’5000 Spring School, 19/04/2011


Plan

1 Introduction

2 The SpeQuloS framework

3 Grid5000 as a Best-Effort DCI

4 Grid5000 as a Cloud

5 Conclusion


Context

• Growing demand for computing power from scientific communities (large applications, large datasets).
• Distributed Computing Infrastructures continue to diversify:
  - Supercomputers, Grids, Desktop Grids, and now Cloud Computing...
  - Different characteristics in terms of performance, size, cost, reliability, quality of service, etc.
• Question: how do we mix them? According to which criteria or scenarios?
  - Example: extend a Grid infrastructure with Cloud resources to meet a peak demand.
  - Example: use a local Desktop Grid to cut the cost of Cloud resources.
  - Example: use on-demand Cloud resources to improve the QoS of a Desktop Grid.


Background: The EU FP7 EDGI

• EDGeS (Enabling Desktop Grids for e-Science): bridges from EGEE to Desktop Grids (BOINC and XtremWeb).
• FP7 EDGI/DEGISCO are new projects to maintain and extend the EDGeS infrastructures:
  - new Grids: ARC, Unicore
  - Clouds: Eucalyptus, OpenNebula
• International Desktop Grid Federation: 40 partners worldwide.

EDGeS: Bridging EGEE to BOINC and XtremWeb

The job is accepted if and only if all these statements are true,

• the helper script doesn’t have to be started, as the proxy doesn’t leave the CE, so there is no need to update proxies on Worker Nodes,
• there is no need to run the wrapper script, it only has to be parsed, so our GRAM jobmanager can send the job to the 3G Bridge and can interact with the L&B server instead of the wrapper script,
• moreover, the GRAM jobmanager periodically has to check the status of the job in the 3G Bridge database, and update the status of the job for EGEE.

In order to implement the EGEE → BOINC bridge, we have extended the 3G bridge with a Web Service interface. So, there is no need to place the 3G bridge and the BOINC server on the same machine as the EDGeS CE: the 3G bridge and the BOINC server are completely separated from the EDGeS CE, which uses the WS interface to communicate with the 3G bridge.

On the BOINC side, the DC-API [6] plug-in is used to create BOINC work units out of entries in the 3G Bridge Job Database, query their status and get the results of processed work units. Once work units are created in the BOINC database, they are processed sooner or later by attached BOINC clients. If desired, the BOINC server can perform any job redundancy and checking as usual.

The EGEE → BOINC bridge publishes information to the BDII according to GLUE 1.3, and contains an EGEE producer and a BOINC GIP plug-in. The BOINC plug-in is responsible for reporting performance information about the BOINC project. For this, BOINC statistics provided by the BOINC project are used.

5.3.3 Bridge EGEE → XtremWeb

The general principle of creating the EGEE → XtremWeb bridge is the same as in the case of the EGEE → BOINC bridge, since both solutions use the 3G bridge as the heart of the EGEE → DG bridges. The architecture of the EGEE → XtremWeb bridge, depicted in Fig. 9, clearly shows that the EGEE → 3G bridge part is the same as the one shown in Fig. 8 above for the EGEE → BOINC case. The only difference is the replacement, inside the 3G bridge, of the BOINC plug-in by the XtremWeb plug-in.

6 Current Operational Status of EDGeS

The EDGeS 3G bridge is not a pure research prototype: implementations are in real operation between EGEE and Desktop Grids, as shown in Fig. 10 below:

6.1 Operational DG → EGEE Infrastructure

The DG → EGEE bridges of the EDGeS system were prototyped in June 2008 and put into operation in September 2008.

The BOINC → EGEE bridge is currently in operation at SZTAKI in Budapest, Hungary.

Fig. 10: EDGeS operational EGEE ↔ DG infrastructure

EDGeS VO of EGEE (services: BDII, VOMS, MyProxy, WMS, LB):
• CNRS / IN2P3 CE: 1,600 CPUs
• SZTAKI CE: 16 CPUs
• CIEMAT CE: 20 CPUs

EDGeS components: BOINC → EGEE bridge, EGEE XtremWeb bridge, EGEE BOINC bridge, Application Repository

BOINC-based Desktop Grids:
• SZDG (public): 72,000 PCs
• IberCivis (public): 14,000 PCs
• AlmereGrid (public): 1,000 PCs
• UoW (local): 1,881 PCs
• Correlation Systems (local): 10 PCs

XtremWeb-based Desktop Grids:
• AlmereGrid (public): 1,000 PCs
• IN2P3 (public): 600 PCs
• AlmereGrid (local): 10 PCs

Users: EGEE Users, BOINC Project Owners, XtremWeb Users


Motivation for QoS in Best-Effort DCI

• Best-Effort DCI: Desktop Grids, Grid best-effort queues, Amazon EC2 spot instances, ...
• Characteristics: variable amount of resources, volatility, unpredictability, unannounced departure.
• Low QoS compared to classical DCI → Tail Effect

[Figure: jobs completed over time. x-axis: Time (s), 0 to 5000; y-axis: Number of jobs, 0 to 4000.]

We define Quality of Service as a level of confidence in Bag of Tasks (BoT) execution:
  - BoT makespan is the time between the first submission of the BoT and the moment all the results have been received and validated.
  - What can be estimated, predicted, guaranteed?

Question: how do we provide QoS to users given the dynamism and volatility of the computing resources?

• Intrinsic approach: improve the scheduler's QoS ability
• Extrinsic approach: provide additional dedicated computing resources
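The makespan definition above can be turned into a small computation. The following is a minimal sketch, not part of SpeQuloS: the `TaskResult` record, the `tail_fraction` metric and its `head_ratio` parameter are assumptions made for illustration, and timestamps are assumed to be seconds measured from the first submission of the BoT.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """One validated result of a BoT task (hypothetical record layout)."""
    task_id: str
    validated_at: float  # seconds since the BoT was first submitted


def makespan(results):
    """Makespan: time from first submission until the last result is validated."""
    if not results:
        raise ValueError("empty BoT")
    return max(r.validated_at for r in results)


def tail_fraction(results, head_ratio=0.9):
    """Share of the makespan spent on the last (1 - head_ratio) of the tasks.

    With a steady completion rate this stays near 1 - head_ratio;
    a much larger value hints at a tail effect.
    """
    times = sorted(r.validated_at for r in results)
    total = times[-1]
    if total == 0:
        return 0.0
    # Index of the task that completes the first `head_ratio` of the BoT.
    head_index = max(0, int(len(times) * head_ratio) - 1)
    return (total - times[head_index]) / total
```

For a BoT whose first nine results arrive at a steady pace and whose last result arrives much later, `tail_fraction` grows towards 1, which is one simple way to quantify the tail visible in the plot.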


The SpeQuloS Framework

Objectives:
  - Allow users to express QoS needs for their BoT
  - Provision resources from the Cloud to satisfy these needs

Needs:
  - Monitor infrastructure activity and BoT execution
  - Decide when and how many Cloud resources to provision
  - Instantiate Cloud resources as Cloud Workers and manage them
  - Account for and arbitrate Cloud usage

[Figure: diagram with the nodes User, EGEE / Desktop Grid, SpeQuloS and Cloud, connected by arrows labelled BoT, monitor, provision and support.]


Overview of the SpeQuloS framework


Implementation details

DG middleware
  - Middleware supported: BOINC, XtremWeb-HEP (XWHEP)
  - Modifications ensure that workers deployed in the Cloud only process the specified BoT.
  → XWHEP version ≥ 7.3.0 and a patch for the BOINC server

Starting Workers on the Cloud
  - For each Cloud, VM images are built with the DG worker software
  - Cloud Workers are started using libcloud, then configured over ssh
  - Clouds supported: OpenNebula, EC2, Eucalyptus, and we added G5K
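Starting Cloud Workers through libcloud could look roughly like the sketch below. It is an illustration under assumptions, not the SpeQuloS code: the function names, the `name_prefix` default and the identifiers passed in are hypothetical. The code only relies on the driver offering `list_images()`, `list_sizes()`, `create_node(name=..., image=..., size=...)` and `destroy_node(node)`, which libcloud compute drivers provide; with libcloud installed, such a driver is typically obtained with `get_driver` from `libcloud.compute.providers`. The ssh configuration step mentioned on the slide is left out.

```python
def start_cloud_workers(driver, image_id, size_id, count, name_prefix="dg-worker"):
    """Start `count` nodes from a worker VM image via a libcloud-style driver.

    `driver` is expected to offer list_images(), list_sizes() and
    create_node(name=..., image=..., size=...).
    """
    # Look up the image and size objects by identifier
    # (raises StopIteration if either identifier is unknown).
    image = next(i for i in driver.list_images() if i.id == image_id)
    size = next(s for s in driver.list_sizes() if s.id == size_id)
    nodes = []
    for index in range(count):
        node = driver.create_node(name=f"{name_prefix}-{index}", image=image, size=size)
        nodes.append(node)
    return nodes


def stop_cloud_workers(driver, nodes):
    """Destroy the given nodes and return those that could not be destroyed."""
    return [node for node in nodes if not driver.destroy_node(node)]
```

Keeping the driver as a parameter means the same two functions work unchanged for any Cloud that has a libcloud driver, which is the property the slide relies on when adding G5K.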


Experimental Testbed Using Grid’5000

• XWHEP Desktop Grid server at LRI, connected to EGEE.
• A gateway to allow G5K nodes to connect to the XWHEP server.
• A set of XWHEP worker nodes, executed on G5K nodes:
  - A G5K job is a pilot job running several XWHEP workers (1 per core)
  - No specific environment needed
  - G5K jobs are submitted to the best-effort queue → variable amount of resources, unpredictable and unannounced node departure.
• SpeQuloS monitors the XWHEP server and starts Cloud resources from Amazon EC2.


Grid’5000 resources utilization

Deployment on seven G5K sites, running:

    max #pjobs ← 30
    max #pjobs waiting ← 7
    while true do
        if current #pjobs < max #pjobs then
            if current #pjobs waiting < max #pjobs waiting then
                "Submit start XWHEP workers to one Grid5000 node in besteffort queue"
            else
                "Too many pilot jobs waiting"
            end if
        else
            "Maximum number of pilot jobs reached"
        end if
        sleep(15 minutes)
    end while

[Figure: number of XWHEP workers, G5K pilot jobs running and G5K pilot jobs waiting over time. x-axis: Time (hours), 0 to 96; y-axis: Number of workers or jobs, 0 to 1200.]
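The submission loop above can be sketched in Python as follows. The two limits and the 15-minute period come from the slide; everything else is an assumption for illustration: the callables `count_pjobs`, `count_waiting` and `submit_pilot_job` are hypothetical stand-ins for whatever queries the batch scheduler and submits a best-effort job, and the `iterations` and `sleep` parameters exist only so the loop can be exercised without waiting.

```python
import time

MAX_PJOBS = 30          # maximum number of pilot jobs (from the slide)
MAX_PJOBS_WAITING = 7   # maximum number of pilot jobs waiting (from the slide)
PERIOD_SECONDS = 15 * 60


def decide(current_pjobs, current_pjobs_waiting):
    """Return the action the pseudocode takes for one iteration."""
    if current_pjobs >= MAX_PJOBS:
        return "max_reached"
    if current_pjobs_waiting >= MAX_PJOBS_WAITING:
        return "too_many_waiting"
    return "submit"


def run(count_pjobs, count_waiting, submit_pilot_job, iterations=None, sleep=time.sleep):
    """Run the loop; `iterations=None` loops forever like the pseudocode."""
    done = 0
    while iterations is None or done < iterations:
        if decide(count_pjobs(), count_waiting()) == "submit":
            # One pilot job per iteration, started on one best-effort node.
            submit_pilot_job()
        sleep(PERIOD_SECONDS)
        done += 1
```

Separating `decide` from `run` keeps the throttling rule testable on its own: at most one new pilot job is submitted per period, and none while too many are already waiting.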


First results

[Figure, panel "SpeQuloS not used": jobs completed and Best-Effort workers over time. x-axis: Time (s), 0 to 5000; left y-axis: Number of jobs, 0 to 4000; right y-axis: Number of workers, 600 to 1000.]

[Figure, panel "SpeQuloS used": jobs completed, additional Cloud workers and Best-Effort workers over time. x-axis: Time (s), 0 to 10000; left y-axis: Number of jobs, 0 to 5000; right y-axis: Number of workers, 700 to 1250.]

→ The tail has disappeared, but there are not yet enough experiments to conclude.


Using Grid’5000 as a Cloud

For experiments, Clouds are difficult to use:
  - Using public Clouds like EC2 is costly
  - Using private Clouds:
      - is complex to deploy and maintain
      - needs a lot of hardware to be useful

Grid’5000 has most IaaS Cloud features:
  - On-demand resource availability
  - User-driven execution environment using deployment
  - Remote access through an API

Idea: conduct SpeQuloS experiments with Grid5000 as a Cloud.


The Grid’5000 libcloud driver

libcloud:
  - Python API used in SpeQuloS
  - Common interface to various Cloud technologies
  - Supports: Amazon EC2, Eucalyptus, OpenNebula, RackSpace, Nimbus, ...

We propose a new "driver" for libcloud: the Grid’5000 driver
  - Uses the Grid’5000 API
  - Implements standard libcloud features:

    libcloud feature       Grid5000 implementation
    Start/Stop instance    Interactive job submission
    List Node Sizes        List available nodes
    List Disk Images       List available environments to deploy

Some work remains to be completed.
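The mapping in the table can be pictured as a thin adapter. The sketch below is purely illustrative and is not the driver described on the slide: `Grid5000LikeDriver` and the `api` client with its methods `available_nodes`, `environments`, `submit_job` and `delete_job` are hypothetical names invented here, not the Grid’5000 API. Only the method names on the adapter itself (`list_sizes`, `list_images`, `create_node`, `destroy_node`) follow libcloud's compute driver interface.

```python
class Grid5000LikeDriver:
    """Maps libcloud-style calls onto a job-oriented testbed client.

    `api` is a hypothetical client object; its methods are placeholders
    for this sketch, not the real Grid'5000 API.
    """

    def __init__(self, api):
        self.api = api

    def list_sizes(self):
        # "List Node Sizes" -> list available nodes
        return self.api.available_nodes()

    def list_images(self):
        # "List Disk Images" -> list environments that can be deployed
        return self.api.environments()

    def create_node(self, name, image, size):
        # "Start instance" -> submit a job that deploys `image` on `size`
        return self.api.submit_job(name=name, environment=image, node=size)

    def destroy_node(self, node):
        # "Stop instance" -> delete the job backing the node
        return self.api.delete_job(node)
```

Because the adapter exposes the same four calls as other libcloud drivers, code written against EC2 or OpenNebula drivers would not need to change to target it.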


Experimentation results

• Grid5000 is used as Best-Effort DCI
• Both EC2 and Grid5000 are used as Cloud resources
• Grid5000 as a Cloud is hosted on the Rennes site

[Figure: number of jobs and number of workers over time. x-axis: Time (s), 0 to 3500; left y-axis: Number of jobs, 0 to 4000; right y-axis: Number of workers, 700 to 950. Legend: Jobs completed; Additional Cloud workers from G5K; Additional Cloud workers from Amazon EC2; Best-Effort workers only.]


Conclusion

Ongoing work on SpeQuloS

Improving SpeQuloS basics:
  - improve real-time estimation of completion time (moving average)
  - improve the detection of tails by fitting a statistical distribution to the BoT execution archive
  - improve scheduling of Cloud resources, i.e. when and how many Cloud Workers to start?

→ More G5K experiments to validate SpeQuloS
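As an illustration of the moving-average idea mentioned above, the sketch below estimates the remaining time of a BoT from the average completion rate over a sliding window of recent observations. It is one possible reading of "moving average", not the estimator used in SpeQuloS; the class name, the `window` parameter and the sampling scheme are assumptions.

```python
from collections import deque


class CompletionEstimator:
    """Estimate remaining BoT time from a windowed average completion rate."""

    def __init__(self, total_tasks, window=5):
        self.total_tasks = total_tasks
        # Keep one more sample than `window` so that `window` intervals are covered.
        self.samples = deque(maxlen=window + 1)

    def observe(self, timestamp, completed):
        """Record the number of completed tasks seen at `timestamp` (seconds)."""
        self.samples.append((timestamp, completed))

    def remaining_seconds(self):
        """Return the estimated time left, or None if no rate can be computed."""
        if len(self.samples) < 2:
            return None
        (t0, c0), (t1, c1) = self.samples[0], self.samples[-1]
        if t1 <= t0 or c1 <= c0:
            return None  # no progress within the window
        rate = (c1 - c0) / (t1 - t0)  # average tasks per second over the window
        return (self.total_tasks - c1) / rate
```

A short window reacts quickly when workers leave, at the price of a noisier estimate; when the rate collapses in the tail, the estimate grows sharply, which is the kind of signal that could trigger Cloud provisioning.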

Remarks on Grid5000 utilization

• Grid best-effort queues can be used as a Best-Effort DCI
• Grid5000 can be used as a Cloud:
  - without additional virtualization
  - not as flexible as a real Cloud (deployment, node size, isolated network)
  - the libcloud driver release will be announced on the mailing list
