www.eu-eela.eu E-science grid facility for Europe and Latin America OurGrid and the co-existence with gLite Alexandre Duarte Universidade Federal de Campina Grande (Brazil) Joint EELA-2/EGEE-III Tutorial for Trainers 30/06 to 04/07/2008 Part of these slides were created by Francisco Brasileiro (UFCG-Brazil) and Diego Scardaci (INFN-Italy)


www.eu-eela.eu

E-science grid facility for Europe and Latin America

OurGrid and the co-existence with gLite

Alexandre Duarte

Universidade Federal de Campina Grande (Brazil)

Joint EELA-2/EGEE-III Tutorial for Trainers

30/06 to 04/07/2008

Part of these slides were created by Francisco Brasileiro (UFCG-Brazil) and Diego Scardaci (INFN-Italy)


Agenda

Catania, Joint EELA-2/EGEE-III Tutorial for Trainers, 30/06 to 04/07/2008 2

• Introduction
• The OurGrid Approach
  – Architecture
  – Scheduling
  – Avoiding Free-Riders
  – Security Concerns
  – Application Models
• OurGrid and gLite Co-Existence
• Conclusions


Introduction


EELA-2 Objectives

• Build a powerful, functional and well supported Grid Facility

• Address a large community of users

• Assert the financial and management schemes to operate and support the e-Infrastructure in the long term

• Anticipate the handover of the e-Infrastructure operation and support


Powerful Grid Facility

• Dream
  – An active community of potential grid users
  – Highly skilled support team to deploy and manage resource centres
  – A lot of resource centres with large amounts of computational resources to put in the grid
• Reality
  – An active community of potential grid users
  – Lack of skilled personnel
  – A few resource centres with a good amount of computational resources
  – A lot of resource centres with small amounts of computational resources


Making the Dream a Reality

• User Community
  – Continue with the good work
  – Keep finding more interesting applications to support
• Skilled personnel
  – That’s why we (you) are here (there)
  – Training, training and training
• Computational Resources
  – Buy more computers? $$$
  – Buy clusters? $$$$$
  – Buy supercomputers? $$$$$$$$$$$$
  – Share idle resources? FREE
    • Example: UFCG with ≈ 3000 PCs, each idle 16 hours / day
    • 3000 × 16/24 ≈ 2000 idle PCs!!! Totally free!
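The UFCG estimate above is plain arithmetic: machines that sit idle for a fraction of the day are counted as that fraction of a full-time machine. A minimal sketch, where the function name is purely illustrative:

```python
def idle_pc_equivalents(total_pcs: int, idle_hours_per_day: float) -> float:
    """Full-time machine equivalents contributed by idle desktop hours."""
    return total_pcs * idle_hours_per_day / 24.0


# Figures from the slide: about 3000 PCs, each idle 16 hours a day.
print(idle_pc_equivalents(3000, 16))  # 2000.0
```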


Sharing Idle Resources

• Volunteer Computing (e.g. LCG@home)
  – Organisations donate their resources to a given project
  – Donors normally do not use the available resources for their own purposes
  – Entrance barrier is high, because one must
    • invest a good deal of effort in “advertising”
    • have a very high visibility project
    • be in a prestigious institution
  – May be useful when the organisation has access to a large number of desktops
• Peer-to-Peer Grid (e.g. OurGrid)
  – Peers donate their resources to other peers in the grid
  – Donors normally use the available resources for their own purposes
  – Entrance barrier is low: just deploy a new peer in the grid


An Important Note

• These are not competing technologies!

• Each one is more appropriate to a particular subset of the user base

• Each one has its virtues and drawbacks

• It is very likely that they will be able not only to co-exist, but also to interoperate


The OurGrid Approach


The OurGrid Approach

• Labs can freely join the system without any human intervention
  – No need for negotiation; no paperwork
• Clear incentive to join the system
  – One can’t be worse off by joining the system
  – Noticeably improved response time
  – Free-riding resistant
• Basic dependability properties
  – Configurable level of security
  – Resilience to faults
  – Scalability
• Easy to install, configure, manage and program
  – No need for a specialized support team


But there is no free lunch

• To simplify the problem, OurGrid is focused on Bag-of-Tasks (BoT) applications
  – No need for communication among tasks
    • Facilitates scheduling and security enforcement
  – Simple fail-over/retry mechanisms to tolerate faults
  – No need for QoS guarantees
  – Script-based programming is natural
    • Facilitates use
• Fortunately, many important applications are BoT!
  – Data mining, massive search, bio computing, parameter sweep, Monte Carlo simulations, fractal calculations, image processing and many others


OurGrid Architecture


Finding Resources

• The OurGrid GIS (NodeWiz) allows the execution of rich queries that encompass not only multiple attributes, but also range operators
• A scheduler might want to locate suitable resources
  – OS=linux && RAM ≥ 1G && clock > 4GHz && load < 0.5
• A user may want to locate a dataset that contains particular data items
  – rain_fall && -37º52’ < lat < -37º46’ && 144º54’ < long < 145º03’ && date ≥ 01/01/2007

• …
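To make the idea of a multi-attribute range query concrete, the scheduler query above can be read as a predicate over machine descriptions. The sketch below is a plain in-memory filter and is not the NodeWiz API; the `Machine` class, the `find_suitable` function and the sample machines are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Machine:
    """Hypothetical resource description; field names are assumptions."""
    name: str
    os: str
    ram_gb: float
    clock_ghz: float
    load: float


def find_suitable(machines: List[Machine]) -> List[Machine]:
    # Mirrors the slide's query:
    #   OS=linux && RAM >= 1G && clock > 4GHz && load < 0.5
    return [
        m for m in machines
        if m.os == "linux" and m.ram_gb >= 1 and m.clock_ghz > 4 and m.load < 0.5
    ]


machines = [
    Machine("lab-a", "linux", 2.0, 4.2, 0.1),    # satisfies every condition
    Machine("lab-b", "linux", 0.5, 4.2, 0.1),    # too little RAM
    Machine("lab-c", "windows", 4.0, 4.5, 0.0),  # wrong OS
    Machine("lab-d", "linux", 2.0, 4.2, 0.9),    # too loaded
]
print([m.name for m in find_suitable(machines)])  # ['lab-a']
```

A real grid information service would evaluate such predicates over a distributed index rather than a local list; only the query semantics are shown here.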


Scheduling with no Information

• Grid scheduling typically depends on information about the grid (e.g. machine speed and load) and the application (e.g. task size)

• However, getting accurate information about all applications and resources is hard in a large-scale peer-to-peer grid

• Can we efficiently schedule tasks without requiring access to information?

• This would make the system much easier to deploy and simpler to use


Workqueue with Replication

• Tasks are sent to idle processors

• When there are no more tasks, running tasks are replicated on idle processors

• The first replica to finish is the official execution

• Other replicas are cancelled
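The four rules above can be sketched as a small event-driven simulation. This is an illustrative model, not OurGrid's scheduler code: the function name, the choice to replicate the running task with the fewest replicas, and the assumption that a replica restarts the task from scratch are all assumptions made for the example. Note that the scheduler itself never looks at task sizes or processor speeds; they are only used by the simulation to decide when executions finish.

```python
import heapq


def wqr_makespan(task_sizes, speeds, replicate=True):
    """Simulate Workqueue (optionally with Replication); return the makespan.

    Assumes at least one processor. Sizes and speeds are in arbitrary units.
    """
    pending = list(range(len(task_sizes)))  # tasks not yet started
    done = set()                            # finished tasks
    running = {}                            # processor -> task it is executing
    events = []                             # heap of (finish_time, processor, task)
    now = 0.0

    def assign(proc):
        """Give an idle processor a pending task or, failing that, a replica."""
        if pending:
            task = pending.pop(0)
        elif replicate and running:
            counts = {}
            for t in running.values():
                counts[t] = counts.get(t, 0) + 1
            # Assumption: replicate the running task with the fewest replicas.
            task = min(counts, key=lambda t: (counts[t], t))
        else:
            return
        running[proc] = task
        finish = now + task_sizes[task] / speeds[proc]
        heapq.heappush(events, (finish, proc, task))

    for proc in range(len(speeds)):
        assign(proc)

    while len(done) < len(task_sizes):
        now, proc, task = heapq.heappop(events)
        if task in done or running.get(proc) != task:
            continue  # stale event left behind by a cancelled replica
        done.add(task)  # the first replica to finish is the official execution
        freed = [p for p, t in running.items() if t == task]
        for p in freed:
            del running[p]  # cancel the other replicas
        for p in freed:
            assign(p)
    return now


# Two equal tasks, one fast and one slow processor.
print(wqr_makespan([10, 10], [10.0, 1.0], replicate=False))  # 10.0
print(wqr_makespan([10, 10], [10.0, 1.0], replicate=True))   # 2.0
```

In this toy run, replication lets the fast processor rescue the task stuck on the slow one, which is the effect the heuristic relies on when no speed or load information is available.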


Preventing Free-riders

• It is important to encourage collaboration
  – In file-sharing, most users free-ride
• OurGrid uses a reciprocation-based incentive mechanism
  – Tit-for-tat
• The Network of Favors
  – All peers maintain a local balance for all known peers
  – Peers with greater balances have priority when there is contention for local resources
  – Under contention, the more one donates, the more one gets back
  – No additional infrastructure is needed
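The Network of Favors described above needs only local bookkeeping, which the following sketch tries to illustrate. The class and method names are hypothetical, and clamping the balance at zero is an assumption of this example rather than something stated on the slide.

```python
from collections import defaultdict


class Peer:
    """Hypothetical peer keeping a purely local balance per known peer."""

    def __init__(self, name):
        self.name = name
        self.balance = defaultdict(float)  # other peer's name -> balance

    def received_favor(self, donor, amount):
        # The donor worked for us, so its standing with us grows.
        self.balance[donor] += amount

    def donated_favor(self, consumer, amount):
        # We worked for the consumer, so what we owe it shrinks.
        # Assumption: balances never go below zero.
        self.balance[consumer] = max(0.0, self.balance[consumer] - amount)

    def rank_requesters(self, requesters):
        # Under contention, peers with greater balances are served first;
        # ties keep their request order because sorted() is stable.
        return sorted(requesters, key=lambda r: self.balance[r], reverse=True)


a = Peer("A")
a.received_favor("B", 5.0)  # B donated 5 units of work to A
a.received_favor("C", 2.0)  # C donated 2 units of work to A
print(a.rank_requesters(["C", "D", "B"]))  # ['B', 'C', 'D']
```

Because each peer consults only its own records, no shared ledger or trusted third party is involved, in line with the "no additional infrastructure" point.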


OurGrid Security

• How to protect resources from applications?
  – Leverages the fact that BoT applications only communicate to receive input and return output
  – Input/output is done by the OurGrid Worker Manager, which runs within a Java virtual machine
  – The remote task runs inside a virtual machine, with no network access, and disk access only to a designated partition
    • Other configurations are possible
  – A new virtual machine is instantiated before a new task is run
• How to protect applications from resources?
  – The script language was extended to accommodate an optional check phase
    • Applications may introduce task-dependent watermarks


Application Models

• Script-based
  – Stage-in/out files
• Embedded
  – Direct access to MyGrid’s API
• Portal-based
  – Web interface to MyGrid’s API
• Framework-based
  – MyGrid inside of frameworks


OurGrid-enabling an Application

• Write a script using a very simple language
  – Simple abstractions
    • File transfer (put, store, get)
    • Hide heterogeneity ($PLAYPEN, $STORAGE)
  – Define constraints (job requirements and grid machine attributes)

• Write a program that embeds the business logic and may make use of more complex features available through a Java API

• Deploy a Portal that embeds the application


An Example: Factoring a Number

job:
  label: my_factorial_useless_example
  requirements: (OS=linux && RAM ≥ 1G && clock > 4GHz && load < 0.5)

  task:
    init: store factoring $PLAYPEN
    remote: factoring 3 18655 34789789799 output-$JOB-$TASK
    final: get $PLAYPEN/output-$JOB-$TASK results

  task:
    init: store factoring $PLAYPEN
    remote: factoring 18656 37307 34789789799 output-$JOB-$TASK
    final: get $PLAYPEN/output-$JOB-$TASK results

  task:
    init: store factoring $PLAYPEN
    remote: factoring 37308 55968 34789789799 output-$JOB-$TASK
    final: get $PLAYPEN/output-$JOB-$TASK results
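Since each task in the example above differs only in the range of candidate divisors it tests, the task entries lend themselves to being generated. The sketch below is one possible way to do that; `split_range` and `task_entry` are hypothetical helpers, the chunks are made as equal as possible (so they will not reproduce the exact boundaries shown on the slide), and the emitted text simply mimics the slide's layout rather than a verified job description syntax.

```python
def split_range(lo, hi, n_tasks):
    """Split the inclusive range [lo, hi] into n_tasks contiguous chunks."""
    total = hi - lo + 1
    base, extra = divmod(total, n_tasks)
    start = lo
    for i in range(n_tasks):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more
        yield start, start + size - 1
        start += size


def task_entry(lo, hi, number):
    """Render one task in the layout used on the slide (illustrative only)."""
    return "\n".join([
        "task:",
        "  init: store factoring $PLAYPEN",
        f"  remote: factoring {lo} {hi} {number} output-$JOB-$TASK",
        "  final: get $PLAYPEN/output-$JOB-$TASK results",
    ])


# Three tasks covering candidate divisors 3..12 of the slide's number.
for lo, hi in split_range(3, 12, 3):
    print(task_entry(lo, hi, 34789789799))
```

Each generated task is independent of the others, which is what makes the job a bag of tasks.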


Some Applications

• Script-based
  – Risk assessment for agriculture loans (EMBRAPA)
  – Our own research on computer science
    • Simulations
• API-based
  – SmartPumping (PETROBRAS)
    • Parallel execution of genetic algorithms for optimizing oil pipeline operation
  – EPANET-Grid (R&D project)
    • Grid-enabled version of the EPANET system for simulation of water supply systems
  – GridVida (R&D project)
    • Image processing to support diagnosis by identifying similar cases in the archival database


Some Applications

• Portal-based: SegHidro (R&D project)
  – Several uses related to the management of water resources in a Brazilian semi-arid area
  – Academic and industrial users
  – Allows the configuration of different workflows of simulation models and their execution in ensembles
  – Sharing of computing resources, data and complementary expertise
• Framework-based: GridUnit (R&D project)
  – An extension of JUnit
  – Features
    • Transparent and automatic distribution
    • Test case contamination avoidance
    • Environmental coverage
    • Graphical user interface


OurGrid and gLite Co-Existence


EELA-2 Joint Research Activity

• Help in fostering the sustainability of the e-Infrastructure
  – Making the e-Infrastructure more interesting and widespread by increasing its reach and its usability
• Promote a continued and increased interaction between research groups in Europe and Latin America


Increase the Reach by

• Allowing the scavenging of idle resources
  – Create the necessary mechanisms to allow resource centres that run the OurGrid middleware to co-exist with resource centres running gLite within the EELA platform
  – Provide some level of interoperation between these different kinds of resource centres and their associated applications
• Allowing the execution of the grid middleware on top of Microsoft Windows platforms
  – Port the gLite middleware to the Windows platform
  – Leverage the multi-platform characteristics of OurGrid


Increase the Usability by

• Developing new application-oriented grid services
  – Ease the creation of digital archives and data grid frameworks
  – Secure storage to solve the insider abuse problem
  – Support for cooperative workflows
  – Other selected services required by NA3 applications
• Leveraging the grid services provided by the OurGrid middleware to execute bag-of-tasks jobs
• Facilitating the management of resource centres


OurGrid-gLite Co-Existence

• The first step is to allow EELA-2 OurGrid Resource Centres to be created
  – Provide support for the use of the gLite PKI by OurGrid resource centres


OurGrid-gLite Co-Existence

• The second step is to allow idle resources in an EELA-2 gLite resource centre to be exposed as OurGrid resources


• The final step is to allow resources of an OurGrid resource centre to be exposed as gLite resources
  – This will be achieved in two sub-steps
    • Firstly, allow clusters to be exposed as a single resource in an OurGrid resource centre
    • Secondly, make these resources available in the gLite grid


JRA1 Milestones

• MJRA1.1: Sep/2008
  – PKI-enabled OurGrid middleware
• MJRA1.2: Jan/2009
  – Prototypes of the proposed services
• MJRA1.3: Jan/2009
  – Common information system between gLite and OurGrid up and running
• MJRA1.4: Jul/2009
  – Prototype of the gateway to transfer jobs from gLite to OurGrid and vice versa
• MJRA1.5: Jul/2009
  – Stable version of the proposed services
• MJRA1.6: Jan/2010
  – Stable version of the gateway to transfer jobs from gLite to OurGrid and vice versa


References

• About OurGrid
  – Download the middleware and documentation from http://www.ourgrid.org
  – For more details, read W. Cirne, F. Brasileiro, N. Andrade, L. Costa, A. Andrade, R. Novaes, M. Mowbray. “Labs of the world, unite!!!” Journal of Grid Computing 4 (3) (2006) 225-246.
• About JRA1
  – Requests/Comments/Suggestions/Criticisms
    • Send email to either Francisco Brasileiro ([email protected]) or Diego Scardaci ([email protected])
  – Contact the developers at [email protected]
  – Download new software distributions from http://eela-forge.eu
