EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme
www.euindiagrid.eu
How to Port Scientific Applications to the GRID?
Stefano Cozzini, CNR/IOM Democritos
Outline
• A look at gLite infrastructures
• Users & Applications & Problems
• Our methodology to work on the GRID
• Client/server architecture
• A real example of porting an application
• Conclusions
gLite infrastructure for applications (1)
• What it can offer:
– CPU time on a best-effort basis, unless otherwise specified or agreed through specific SLAs
– A job submission mechanism (batch system)
– Storage and data management tools for specialized data
– Several VOs you can take part in...
gLite infrastructure for applications (2)
• CPUs are generally loosely coupled
• Quad/six/eight-core machines are widely available
• Limited MPI support
– No inter-site MPI computation (not really a problem)
– No real way to distinguish between clusters and farms for MPI parallel jobs
A few definitions
• Grid infrastructure: A distributed infrastructure of computation and storage resources, which users can access transparently (i.e. without needing to know where the resources are located).
• Grid-enabling procedure: The procedure that allows a scientific (or generic) user to solve her scientific computational problem by means of a Grid infrastructure
• Application: A collection of work items to solve a certain problem or to achieve desired results using a Grid infrastructure. (...) In other words, a Grid application may consist of a number of jobs that together fulfill the whole task.
Observations
• An application is not only a software application but rather a complex computational problem to be solved in a Grid environment.
• A success story of “Grid-enabled applications” refers not just to the porting of some scientific software to the Grid infrastructure but to real exploitation of that software by a scientific community.
• Each application is unique in that it tries to solve a specific computational problem clearly stated by a scientific user or a scientific community. In this context we can use the term computational experiment as a synonym for application.
Which kind of software?
• Home-made codes
– easy: users are supposed to know everything about them
• Some well-known packages used with slight modifications/adaptations
– less easy: users do not know too much about them
• “Black box” applications (a.k.a. legacy applications)
– difficult: nobody knows about them
Which kind of “computational experiment”?
• just a single application to run
– parallel (embarrassingly parallel / tightly coupled)
– serial
• a bunch of applications linked together (workflow)
Our method to port applications
• Identify them through discussions and meetings with users:
– Understand the computational requirements of the group
– Understand the level of IT skills in the group
– Understand their concept of GRID and applications
– Understand the application requirements
– Propose a solution (if one exists)
Steps to follow..
• Step 0: awareness of grid computing opportunities / analysis of the computational requirements
• Step 1: technical deployment on the infrastructure
• Step 2: benchmarking procedures and assessment of the efficiency
• Step 3: production runs and final evaluation
• Step 4: dissemination of the results among peers.
[Flowchart: Analysis → Does it fit on the GRID? → Grid-enabling procedure → Successful? → Benchmarking analysis → OK? → Production runs → Documentation and reports. A “No” answer leads to one of: review the analysis, review the procedure, or document why and exit.]
Options for step 1: technical deployment
• Simple serial porting of an application
– set up a few simple tools (scripts etc.) to run the computational experiment (generally a parametric study)
• Porting of parallel applications with MPI
– again quite simple in principle, but the present MPI limitations of the infrastructure make this strategy not very efficient
Options for step 1: technical deployment (2)
• Porting of complex codes/packages
– requires major changes to the original way of running the application and the associated experiment
• Client/server approach
– to run embarrassingly parallel or loosely coupled applications/experiments
What do we need to port them to the GRID?
• TOOLS
– Scripting languages for the CLI
– Ganga: a tool for computational-task management and easy access to Grid resources
– Portals
• HUMAN FACTOR
– Manpower (from both sides: providers + users)
– Goodwill (from users)
Command line interface
• UI commands are generally complex and are usually repeated very frequently, becoming:
– boring/error prone/inefficient
• However, this approach is really flexible
• Hackers will love it
• Windows guys will hate it (grid is not “just one click away”)
CLI flexibility
• You can combine different commands into simple scripts
• Scripts can easily be re-used, changed, or combined together
• This is not grid computing, this is just plain Linux/Unix hacking
• To do that, please use your preferred scripting language
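Combining UI commands into a reusable script can be sketched in Python. This is a minimal sketch, not the author's tooling: the executable name `run_experiment.sh`, the file names and the parameter values are hypothetical, and the JDL attributes and the `glite-wms-job-submit -a -o` invocation follow common gLite usage that should be checked against the local UI installation. By default the script only writes the JDL files and returns the command lines; nothing is submitted unless `dry_run=False`.

```python
import subprocess
from pathlib import Path


def make_jdl(executable, argument):
    """Return the text of a minimal JDL description for one serial job."""
    return "\n".join([
        f'Executable = "{executable}";',
        f'Arguments = "{argument}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        f'InputSandbox = {{"{executable}"}};',
        'OutputSandbox = {"std.out", "std.err"};',
    ]) + "\n"


def submit_command(jdl_path, jobid_file):
    """Build the submission command line (assumed gLite WMS syntax)."""
    return ["glite-wms-job-submit", "-a", "-o", str(jobid_file), str(jdl_path)]


def prepare_sweep(executable, values, workdir, dry_run=True):
    """Write one JDL file per parameter value and return the commands."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    commands = []
    for value in values:
        jdl_path = workdir / f"job_{value}.jdl"
        jdl_path.write_text(make_jdl(executable, value))
        cmd = submit_command(jdl_path, workdir / "jobids.txt")
        if not dry_run:
            # Only meaningful on a machine with the gLite UI installed.
            subprocess.run(cmd, check=True)
        commands.append(cmd)
    return commands
```

Calling `prepare_sweep("run_experiment.sh", [0.1, 0.2], "sweep")` would write two JDL files under `sweep/` and return the two command lines, which can be inspected before any real submission.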
Which kind of applications/computational experiments fit/unfit?
• FIT:
– Parameter sweep computational experiments
– Embarrassingly parallel computations
– SMP/OpenMP programs
• UNFIT:
– Tightly coupled parallel programs
– I/O-bound and memory-bound applications
Client/server architecture
• Fits perfectly for loosely coupled applications
• Idea:
– Submit N independent jobs and coordinate them through a specific client/server architecture
– Each node performs a task
– Exchanges among clients are coordinated by the client/server mechanism
• Requirement:
– outbound connectivity from the WNs to the server
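The coordination idea can be illustrated with a toy, in-process sketch. It is not the implementation described in these slides: the class and method names are hypothetical, and a real deployment would replace the direct method calls with network requests from the worker nodes to the server. The sketch shows the three things the server has to do: hand out tasks, collect results, and put back in the queue any task whose client disappeared.

```python
from collections import deque


class TaskServer:
    """Toy coordinator: hands out tasks, collects results, re-queues lost ones."""

    def __init__(self, tasks):
        self.pending = deque(tasks)
        self.running = {}   # client_id -> task currently assigned
        self.results = {}   # task -> result

    def request_task(self, client_id):
        """Called by a client when it starts or has finished a task."""
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.running[client_id] = task
        return task

    def report_result(self, client_id, result):
        """Store the result of the task assigned to this client."""
        task = self.running.pop(client_id)
        self.results[task] = result

    def report_lost(self, client_id):
        """Re-queue the task of a client that failed or never answered."""
        task = self.running.pop(client_id, None)
        if task is not None:
            self.pending.append(task)

    def done(self):
        return not self.pending and not self.running


def run_client(server, client_id, work):
    """Client loop: keep asking for tasks until the server has none left."""
    while True:
        task = server.request_task(client_id)
        if task is None:
            break
        server.report_result(client_id, work(task))
```

Because each client pulls work only when it is ready, fast and slow worker nodes balance themselves, and a client that never connects simply receives no task.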
Client/Server mechanisms
[Diagram: the UI submits 6 jobs to the WMS, which sends 2 jobs to the CE of each of Site A, Site B and Site C. Each job runs as a client on a worker node (WN), and every client connects to the server, which must have an FQDN.]
Client/server examples
• BEMUSE:
– Biased Exchange Metadynamics Submission Environment
• Quantum simulation:
– Phonon calculations with Quantum ESPRESSO
• Both are described in the proceedings of the 2008 COST activity.
• Joint COST D37/EU-IndiaGrid/CompChem schools already organized in Trieste (Italy):
– September 2008
– 29 March - 1 April 2010
• Proceedings of the 2008 workshop published in the ICTP Lecture Series (freely available online):
– http://publications.ictp.it/lns/vol24/vol24toc.html
Q/E and the GRID
Large-scale computations with Quantum ESPRESSO require HPC resources (tightly coupled clusters).
BUT: often many smaller-size, loosely coupled or independent computations are required. A few examples:
• the search for transition pathways (Nudged Elastic Band method);
• calculations under different conditions (pressure, temperature), for different compositions, or for different values of some parameters;
• full phonon dispersions in crystals.
Phonons in crystals
Force constant matrices
Calculations of phonons
• Force constants are computed by the ph.x code on a grid of n q vectors.
• For each q one has to perform a set of linear-response calculations, one for each irrep (irreducible representation); their number is generally proportional to the number of atoms.
• BUT:
– Force constant calculations are independent for each q
– Irreps are almost independent for a given q: only some data need to be collected at the end
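The independence of q vectors and irreps is what makes the problem splittable. A minimal sketch of the decomposition follows, assuming for simplicity that every q vector has the same number of irreps (in practice the count can differ from one q to another); the function name and the tuple layout are hypothetical and are not taken from ph.x.

```python
def split_into_work_units(n_q, irreps_per_q, group_size):
    """Enumerate independent work units as (q, first_irrep, last_irrep).

    Indices are 1-based and inclusive. Each unit can be computed by a
    separate client; partial results are merged per q at the end.
    """
    units = []
    for q in range(1, n_q + 1):
        for start in range(1, irreps_per_q + 1, group_size):
            end = min(start + group_size - 1, irreps_per_q)
            units.append((q, start, end))
    return units
```

For instance, 11 q vectors with 120 irreps each, grouped 6 irreps per client, give 220 independent work units.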
Phonon workflow
Practical implementation (I)
On the Q/E package: minor changes needed in the phonon code, namely
• possibility to run one q-vector at a time (already there)
• possibility to run one irrep (or one group of irreps) at a time and to save partial results.
On the GRID infrastructure: a Python client/server application was implemented which takes care of dispatching ph.x jobs and collecting results.
NOTE: the client/server application is independent of the job submission mechanism: it could run anywhere, provided that plugins for job submission are available.
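One way to picture the plugin idea is a small interface that hides how a client gets started. The sketch below is purely illustrative and does not reproduce the actual application: all class names, `client.py` and `client.jdl` are hypothetical, and the gLite command line is an assumed typical invocation. The plugins only build command lines, so the example runs without any grid middleware.

```python
from abc import ABC, abstractmethod


class SubmissionPlugin(ABC):
    """Hypothetical interface: how to start one client on some resource."""

    @abstractmethod
    def command(self, client_script):
        """Return the command line that starts one client."""


class LocalPlugin(SubmissionPlugin):
    """Start the client as a local process (e.g. on a workstation)."""

    def command(self, client_script):
        return ["python", client_script]


class GridPlugin(SubmissionPlugin):
    """Start the client through a grid job described by a JDL file."""

    def __init__(self, jdl_path):
        self.jdl_path = jdl_path

    def command(self, client_script):
        # The client script is assumed to be listed in the JDL sandbox.
        return ["glite-wms-job-submit", "-a", self.jdl_path]


def launch_commands(plugin, client_script, n_clients):
    """Build the commands needed to start n_clients clients."""
    return [plugin.command(client_script) for _ in range(n_clients)]
```

With this separation the server logic stays the same whether clients are started locally, through a batch system, or through the grid; only the plugin changes.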
Case study: gamma-Al2O3
• HPC runs:
– 11 4/8/16 CPUs
• 11 q points
• 120 irreps per q
• A few weeks on a modern workstation
Comparing HPC vs GRID
• HPC client/server approach
– For each q, an independent parallel job
– Parallel jobs can use 4/8/16 CPUs
• GRID client/server approach
– each client computes serially one or more (1, 4, 6) irreps
HPC Results
• Time to results: best cases
16 CPUs: ~ 40 hours
08 CPUs: ~ 50 hours
04 CPUs: ~110 hours
NOTES: HPC scheduling policies taken into account:
– maximum 128 CPUs at a time
– maximum 12 hours per job
– maximum 10 running jobs per user
Grid experiment
3000 jobs submitted in chunks of 500: clients contact back the server, receive input data and starting data files (hundreds of Mb).
Jobs lost in cyberspace (∼ 60% of all contacted servers! of which 30-40% due to failure in downloading starting data files) are resubmitted.
VOs involved: COMPCHEM and EU-IndiaGRID
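The submit-in-chunks-and-resubmit pattern can be sketched as follows. This is an illustration under stated assumptions, not the code used in the experiment: `submit_chunk` is a hypothetical callable that submits a list of tasks and returns the ones that actually completed, and `max_rounds` is an arbitrary safety limit.

```python
def run_campaign(tasks, chunk_size, submit_chunk, max_rounds=50):
    """Submit tasks in chunks and resubmit lost ones until all complete.

    submit_chunk(chunk) must return an iterable of the tasks that finished.
    Returns (completed tasks, tasks still missing, total submissions).
    """
    remaining = list(tasks)
    completed = set()
    submitted = 0
    for _ in range(max_rounds):
        if not remaining:
            break
        lost = []
        for i in range(0, len(remaining), chunk_size):
            chunk = remaining[i:i + chunk_size]
            submitted += len(chunk)
            finished = set(submit_chunk(chunk))
            completed |= finished
            # Anything that did not come back is queued for the next round.
            lost.extend(t for t in chunk if t not in finished)
        remaining = lost
    return completed, remaining, submitted
```

The total number of submissions returned by the function makes the cost of lost jobs visible: with a high failure rate it can be considerably larger than the number of tasks.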
GRID Results (6 irreps per client)
• Up to 145 independent jobs running simultaneously
• Less than 60 hours to complete the experiment
• Comparable with the 8-CPU parallel runs!
Conclusions
• A realistic application of Quantum ESPRESSO to first-principles calculations at the nanoscale was demonstrated on the GRID
• Results were produced in a relatively short time in spite of a rather high job failure rate: GRID can be competitive with conventional high-performance computers!
• Full exploitation of the GRID infrastructure requires, however, the possibility to select HPC resources (with MPI) or large multicore machines (with OpenMP), in order to enable high-throughput calculations on medium-size HPC problems
Some tips to users (1)
• State clearly the computational requirements:
– CPU intensive
– I/O requirements
– memory requirements
– scientific libraries needed (if any)
Some tips to users (2)
• Try to estimate the actual resources needed for your computational problem:
– how many times do I need to run my program?
– how long does it take to run?
– is there any room for improvement in terms of performance?
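These questions translate into a back-of-the-envelope estimate. The sketch below is a rough model under explicit assumptions, not a prediction tool: every run takes the same time, jobs fail independently with probability `failure_rate` and are resubmitted, a fixed number of jobs run at once, and queueing time is ignored. The function name and all parameters are hypothetical.

```python
import math


def estimate_campaign(n_runs, hours_per_run, concurrent_jobs, failure_rate=0.0):
    """Rough estimate of the resources needed by a computational experiment."""
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    # With independent failures, each run needs 1 / (1 - p) submissions on average.
    expected_submissions = n_runs / (1.0 - failure_rate)
    # CPU time of the successful runs only.
    cpu_hours = n_runs * hours_per_run
    # Jobs execute in "waves" of at most concurrent_jobs at a time.
    waves = math.ceil(expected_submissions / concurrent_jobs)
    wall_hours = waves * hours_per_run
    return {
        "expected_submissions": expected_submissions,
        "cpu_hours": cpu_hours,
        "wall_hours": wall_hours,
    }
```

As an illustration with made-up numbers, 100 runs of 2 hours each on 25 concurrent slots need about 8 hours with no failures and about 16 hours if half of the submissions are lost.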
Lessons learned
● The role of human interaction is fundamental and sometimes underestimated
● GRID users speak a different language from GRID providers
● GRID providers need to adopt the users' language, NOT vice versa
● Technical tools are important BUT not sufficient to overcome user inertia
Conclusions
• The successful porting of an application to a Grid environment depends heavily on the smooth interaction of three key elements:
– the infrastructure
– the problem
– the people around them