EGEE-II INFSO-RI-031688
Enabling Grids for E-sciencE
www.eu-egee.org
Grid application development with gLite and P-GRADE Portal
Miklos Kozlovszky, MTA [email protected]
Contents
• P-GRADE Portal in a nutshell
• Workflow development with the Portal
• Workflow execution with the Portal
• Scaling up to a parametric workflow
Short History of P-GRADE portal
• Parallel Grid Application and Development Environment
• Initial development started in the Hungarian SuperComputing Grid project in 2003
• It has been continuously developed since 2003
• Detailed information: http://www.portal.p-grade.hu/
• Open Source community development since January 2008: https://sourceforge.net/projects/pgportal/
Download of OSS P-GRADE portal
110 downloads within the first month
~697 total downloads until now
Main P-GRADE related projects
• EU SEE-GRID-1 (2004-2006)
– Integration with LCG-2 and gLite
• EU SEE-GRID-2, SEE-GRID-SCI (2006-2008 / 2008-2010)
– Parameter sweep extension
• EU CoreGrid (2005-2008)
– To solve grid interoperation for job submission
– To solve grid interoperation for data handling: SRB, OGSA-DAI
• GGF GIN (2006)
– Providing the GIN Resource Testing portal
• EGEE 2,3 (2006-2010)
– Respect program tool used for training and application development
• ICEAGE (2006-2008)
– P-GRADE portal is used for training as official portal of the GILDA training infrastructure
• EU EDGeS (2008-2009)
– Transparent access to any EGEE and Desktop Grid systems
Motivations for developing P-GRADE portal
• P-GRADE portal should
– Hide the complexity of the underlying grid middlewares
– Provide a high-level graphical user interface that is easy to use for e-scientists
– Support many different grid programming approaches:
  • Simple Scripts & Control (sequential and MPI job execution)
  • Scientific Application Plug-ins
  • Complex Workflows
  • Parameter sweep applications: both on job and workflow level
  • Interoperability: transparent access to grids based on different middleware technology (both computing and data resources)
– Support several levels of parallelism
Layers in a Grid system
Basic Grid services: AA, job submission, info, …
Higher-level grid services (brokering, …)
Application toolkits, standards
Application
Grid middleware: command line tools
P-GRADE Portal services: graphical interface
What is a P-GRADE Portal workflow?
• A directed acyclic graph where
– Nodes represent jobs (batch programs to be executed on a computing element)
– Ports represent input/output files the jobs expect/produce
– Arcs represent file transfer operations
• Semantics of the workflow:
– A job can be executed if all of its input files are available
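The execution rule above can be illustrated with a short Python sketch. It is not the portal's workflow engine; the names `Job`, `ready_jobs` and `run_order`, as well as the file names, are assumptions made only for this example. Ports are modelled as file names, and a job becomes runnable once every one of its input files is available.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """A workflow node; input and output ports are modelled as file names."""
    name: str
    inputs: set = field(default_factory=set)
    outputs: set = field(default_factory=set)


def ready_jobs(jobs, available_files, finished):
    """Return names of unfinished jobs whose input files are all available."""
    return [
        job.name
        for job in jobs
        if job.name not in finished and job.inputs <= available_files
    ]


def run_order(jobs, initial_files):
    """Simulate execution: each wave holds jobs that could run in parallel."""
    available = set(initial_files)
    finished = set()
    waves = []
    while len(finished) < len(jobs):
        wave = ready_jobs(jobs, available, finished)
        if not wave:
            raise ValueError("some jobs can never start: missing input files")
        for job in jobs:
            if job.name in wave:
                available |= job.outputs  # arcs deliver outputs to consumers
                finished.add(job.name)
        waves.append(wave)
    return waves


# Hypothetical four-job "diamond" workflow.
workflow = [
    Job("prepare", {"raw.dat"}, {"a.in", "b.in"}),
    Job("solve_a", {"a.in"}, {"a.out"}),
    Job("solve_b", {"b.in"}, {"b.out"}),
    Job("merge", {"a.out", "b.out"}, {"result.dat"}),
]
print(run_order(workflow, {"raw.dat"}))
```

In this sketch the two solver jobs land in the same wave, since neither depends on the other.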
Three Levels of parallelism
– PS workflow level: Parameter study execution of the workflow
  Multiple instances of the same workflow can process different data files
– Workflow level: Parallel execution among workflow nodes (WF branch parallelism)
  Multiple jobs can run in parallel
– Job level: Parallel execution inside a workflow node (MPI job as workflow component)
  Each job can be a parallel program
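As a rough illustration of the parameter study (PS) level, the sketch below expands one workflow into one instance per parameter combination. The function name `expand_parameter_study`, the workflow name and the parameter names are hypothetical; the portal's own parameter sweep mechanism is not shown in these slides.

```python
from itertools import product


def expand_parameter_study(workflow_name, parameter_space):
    """Return one workflow instance description per parameter combination."""
    names = sorted(parameter_space)
    value_lists = [parameter_space[name] for name in names]
    instances = []
    for index, values in enumerate(product(*value_lists)):
        instances.append({
            "workflow": workflow_name,
            "instance": index,
            "parameters": dict(zip(names, values)),
        })
    return instances


# Hypothetical parameter space: 3 energies x 4 angles = 12 instances.
space = {"energy": [1, 2, 3], "angle": [0, 30, 60, 90]}
instances = expand_parameter_study("my_workflow", space)
print(len(instances))
print(instances[0])
```

Each instance is independent of the others, which is what allows the instances to be submitted side by side.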
Example 1: Computational Chemistry
Department of Chemistry, University of Perugia
Solution of the Schrödinger equation for triatomic systems using a time-dependent (RWAVEPR) or time-independent (ABC) method
• Sequential Fortran 90
• A single execution can take between 5 and 10 hours
• Many simulations at the same time (figure label: 25 times)
Example 2: Ultra-short range weather forecast
Hungarian Meteorology Service
Forecasting dangerous weather situations (storms, fog, etc.), a crucial task in the protection of life and property
Processed information: surface level measurements, high-altitude measurements, radar, satellite, lightning, results of previously computed models
Requirements:
• Execution time < 10 min
• High resolution (1 km)
Figure labels: 25 x, 10 x, 25 x, 5 x
Grid interoperation by P-GRADE
Accessing Globus, gLite and ARC based grids simultaneously
Figure labels: P-GRADE GEMLCA Portal, GEMLCA Repository, P-GRADE portal
Typical user scenario: Compilation phase
Figure labels: Portal server, Grid services, Certificate servers
• UPLOAD SOURCE(S)
• COMPILE – EDIT
• DOWNLOAD BINARY(IES)
Typical user scenario: Application development phase
Figure labels: Portal server, Grid services, Certificate servers
• START EDITOR
• OPEN & EDIT or DEVELOP WORKFLOW
• SAVE WORKFLOW
Typical user scenario: Workflow execution phase
Figure labels: Certificate servers, Portal server, Grid services
• DOWNLOAD PROXY CERTIFICATES
• TRANSFER FILES, SUBMIT JOBS
• MONITOR JOBS
• VISUALIZE JOBS and WORKFLOW PROGRESS
• DOWNLOAD (SMALL) RESULTS (shown on two arrows in the figure)
P-GRADE Portal structural overview
Client: Web browser, Java Webstart workflow editor
P-GRADE Portal server:
– User interface layer: presents the user interface
– Internal layer (Java classes): represents the internal concepts
– Grid layer (gLite and Globus command line tools): interfaces with grid services
Grid: EGEE and Globus Grid services (gLite WMS, LFC, …; Globus GRAM, GridFTP, …)
Interface layer
Client: Web browser, Java Webstart workflow editor
P-GRADE Portal server, user interface layer:
– Web server
– Gridsphere Web portal framework
  • Gridsphere portlets
  • P-GRADE portlets
– Workflow monitor: Java applet generator
– Workflow editor: Java Webstart application
Interface layer functionalities
• Workflow portlet: Workflow manager, Storage, Upload
• Certificate portlet: Upload, download and other operations
• Settings portlet: Grid settings, Quota settings
• File management: Manage files in the grid
• Compiler portlet: Compile jobs on portal server
• Login
• Welcome
• ...
P-GRADE vs. Non-P-GRADE portlets
Figure labels: P-GRADE Portal portlets; GridSphere 2.x Grid Portal framework
Portlets/functionalities of P-GRADE portal
• Settings (portlet)
• Certificate and proxy management (portlet)
• Information system visualization (portlet)
• Graphical workflow editing
• Workflow manager (portlet)
• LFC (EGEE) file management (portlet)
• Compilation support (portlet)
• Fault-tolerance support
Settings Portlet
• Portal administrator can
– connect the portal to several grids
– register the basic resources of the connected grids
Settings Portlet
User can customize the connected grids by adding and removing resources
Certificate and proxy management Portlet
• User can upload certificates for various grids to the MyProxy server
• User can download proxies and allocate them to grids
• User can simultaneously use as many proxies as there are grids connected to the portal
• As a result, parallel branches of a workflow can be executed simultaneously in several grids
Figure labels: SEE-GRID access, HUNGRID access
MyProxy interaction in P-GRADE: Certificate Manager
• To start your session on the Grid you must create a proxy certificate on the portal server
• “Certificates” portlet:
• to upload a proxy into MyProxy servers
• to download a proxy from MyProxy into the portal server
Certificates portlet
Certificate Manager: Downloading a proxy
1. MyProxy server access details:
• Hostname
• Port number
• User name (from upload)
• Password (from upload)
2. Proxy parameters:
• Lifetime
• Comment
3. Grid association
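For readers who know the MyProxy command line client, the form fields above map roughly onto a `myproxy-logon` call. The sketch below only builds such a command line and does not run it. Whether the portal invokes this client internally is an assumption, and the host name, user name and output path are placeholders.

```python
from dataclasses import dataclass


@dataclass
class ProxyDownloadRequest:
    """The fields of the proxy download form (names chosen for this sketch)."""
    hostname: str
    port: int
    username: str
    lifetime_hours: int
    output_path: str


def myproxy_logon_command(req):
    """Build (but do not run) a myproxy-logon command line."""
    if req.lifetime_hours <= 0:
        raise ValueError("proxy lifetime must be positive")
    return [
        "myproxy-logon",
        "-s", req.hostname,             # MyProxy server host
        "-p", str(req.port),            # MyProxy server port
        "-l", req.username,             # user name chosen at upload time
        "-t", str(req.lifetime_hours),  # requested proxy lifetime in hours
        "-o", req.output_path,          # where to store the downloaded proxy
        "-S",                           # read the password from standard input
    ]


# Placeholder values; 7512 is the usual default MyProxy port.
request = ProxyDownloadRequest(
    hostname="myproxy.example.org",
    port=7512,
    username="jdoe",
    lifetime_hours=12,
    output_path="/tmp/x509up_portal",
)
print(" ".join(myproxy_logon_command(request)))
```

Passing the password on standard input rather than as an argument keeps it out of the process list.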
Certificate Manager: Associating the proxy with a grid
This operation displays the details of the certificate and the list of available Grids (defined by portal administrator)
Solving Grid interoperation by P-GRADE Portal
Different jobs can be executed in parallel in different grids
Figure labels: P-GRADE Portal, EGEE Grid, UK NGS, London, Paris, Athens
Interoperation vs. Interoperability
Interoperation:
– short-term solution that defines what needs to be done to achieve interoperation between current production grids using existing technologies
Interoperability:
– native ability of Grids and Grid middleware to interact directly via common open standards
As defined by the GIN (Grid Interoperation Now) CG (Community Group) of the OGF (Open Grid Forum)
Figure labels: Interoperation (P-GRADE Portal, Grid 1, Grid 2, Grid 3); Interoperability (Grid 1, Grid 2, Grid 3)
Graphical workflow editing
• The aim is to define a DAG of batch jobs:
1. Drag & drop components: jobs and ports
2. Define their properties
3. Connect ports by channels (no cycles, no loops, no conditions)
4. The JDL file is generated automatically
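The "no cycles" constraint in step 3 can be checked mechanically. The sketch below is an assumption about how such a check could look, not the editor's own code. It uses Python's standard `graphlib` module (Python 3.9 or later) and treats each channel as a (producer job, consumer job) pair; the job names are made up.

```python
from graphlib import CycleError, TopologicalSorter


def check_workflow(channels):
    """Validate a workflow given as (producer_job, consumer_job) channels.

    Returns one valid execution order, or raises ValueError on a cycle.
    """
    sorter = TopologicalSorter()
    for producer, consumer in channels:
        sorter.add(consumer, producer)  # the consumer depends on the producer
    try:
        return list(sorter.static_order())
    except CycleError as err:
        raise ValueError(f"workflow contains a cycle: {err.args[1]}") from err


# Hypothetical diamond-shaped workflow: A feeds B and C, both feed D.
diamond = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
print(check_workflow(diamond))
```

A graph that passes this check always has at least one job with no pending inputs, so execution can start and proceed to the end.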
Workflow Editor: Properties of a job
• Binary executable
• Type of executable
• Number of required processors
• Command line parameters
• The resource to be used for the execution:
– Grid/VO
– (Computing element)
Direct resource selection: Which computing element to use?
The information system portlet queries BDII and GIIS servers
I still don’t know which resource to use!
Automatic resource selection
1. Select a broker Grid/VO for the job (e.g. GILDA_LCG2_broker/GILDA_gLite_broker)
2. (Describe the ranks & requirements of the job in JDL)
3. The portal will use the broker to find the best resource for the job!
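To make step 2 more concrete, the sketch below renders a minimal JDL description with `Requirements` and `Rank` expressions. It is an illustration only: the helper `make_jdl` and the file names are hypothetical, the two GLUE expressions are common textbook examples, and the JDL that the portal generates by default may differ.

```python
def make_jdl(executable, arguments, input_files, output_files,
             requirements=None, rank=None):
    """Render a minimal gLite-style JDL description as text."""
    def quoted_list(items):
        return "{" + ", ".join(f'"{item}"' for item in items) + "}"

    sandbox_out = ["std.out", "std.err"] + list(output_files)
    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        f"InputSandbox = {quoted_list(input_files)};",
        f"OutputSandbox = {quoted_list(sandbox_out)};",
    ]
    if requirements:
        lines.append(f"Requirements = {requirements};")  # which CEs are acceptable
    if rank:
        lines.append(f"Rank = {rank};")  # how to order the acceptable CEs
    return "[\n" + "\n".join("  " + line for line in lines) + "\n]"


jdl = make_jdl(
    executable="solver.sh",
    arguments="input.dat",
    input_files=["solver.sh", "input.dat"],
    output_files=["result.dat"],
    requirements='other.GlueCEStateStatus == "Production"',
    rank="-other.GlueCEStateEstimatedResponseTime",
)
print(jdl)
```

Here `Requirements` filters the candidate computing elements and `Rank` orders the survivors; the broker picks the highest-ranked match.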
Workflow Editor: Defining broker jobs
Select a Grid with a broker (*_BROKER)!
Ignore the resource field!
If the default JDL is not sufficient, use the built-in JDL editor!
Workflow Editor: Built-in JDL editor
For JDL details, see the gLite Users’ manual!
Workflow Editor: Defining input-output files
File properties
• Type:
– input: the job reads
– output: the job generates
• File type:
– local: comes from my desktop
– remote: comes from an SE
• File: location of the file
• Internal file name: the executable reads the file under this name – fopen(“file.in”, …)
• File storage type (output files only):
– Permanent: final result
– Volatile: only data channel
How to refer to an I/O file?
Input file
• Local file, client side location: c:\experiments\11-04.dat
• Remote file, LFC logical file name (LFC file catalog is required – EGEE VOs): lfn:/grid/gilda/kozlovszky/11-04.dat
• Remote file, GridFTP address (in Globus Grids): gsiftp://somengshost.ac.uk/mydir/11-04.dat
Output file
• Local file, client side location: result.dat
• Remote file, LFC logical file name (LFC file catalog is required – EGEE VOs): lfn:/grid/gilda/kozlovszky/11-04_-_result.dat
• Remote file, GridFTP address (in Globus Grids): gsiftp://somengshost.ac.uk/mydir/result.dat
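The three reference styles can be told apart by their prefix alone. The helper below is a hypothetical illustration of that rule, using the example references from this slide; it is not taken from the portal's code.

```python
def classify_file_reference(reference):
    """Classify an I/O file reference by its prefix."""
    if reference.startswith("lfn:"):
        return "remote (LFC logical file name)"
    if reference.startswith("gsiftp://"):
        return "remote (GridFTP address)"
    return "local (client side location)"


examples = [
    r"c:\experiments\11-04.dat",
    "lfn:/grid/gilda/kozlovszky/11-04.dat",
    "gsiftp://somengshost.ac.uk/mydir/11-04.dat",
    "result.dat",
]
for reference in examples:
    print(reference, "->", classify_file_reference(reference))
```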
Local vs. remote files
Figure labels: Portal server, Grid services, Computing elements, Storage elements
• LOCAL INPUT FILES & EXECUTABLES
• LOCAL OUTPUT FILES
• REMOTE INPUT FILES
• REMOTE OUTPUT FILES
• Only the permanent files!
Your binary can access data services directly too:
• GridFTP API
• GFAL API
• lfc-*, lcg-* commands
Workflow manager
• Lists available workflows
• Enables submitting, aborting and deleting existing workflows
• Shows status, logs and results of workflow executions
• Orchestrates job executions inside a workflow
Workflow Management (workflow portlet)
• The portlet presents the status, size and output of the available workflows in the “Workflow” list
• It has a Quota manager to control the users’ storage space on the server
• The portlet also contains the “Abort”, “Attach”, “Details”, “Delete” and “Delete all” buttons to handle execution of workflows
• The “Attach” button opens the workflow in the Workflow Editor
• The “Details” button gives an overview of the jobs of the workflow
Workflow Execution (observation by the workflow portlet)
White/Red/Green color means the job is in the initial/running/finished state
Analysis of the log
• 2008.01.09 09:32:19 - Proxy with VOMS extensions created for VO "voce" with accounting group "".
• 2008.01.09 09:32:19 - Job submission in progress...
• 2008.01.09 09:32:23 - Job has been submitted successfully!
• 2008.01.09 09:32:23 - Job identifier is: "https://skurut1.cesnet.cz:9000/mD_8VzPhm8AmIToTJKtigg"
• 2008.01.09 09:32:26 - EGEE job's status has changed to "Waiting" (host is ).
• 2008.01.09 09:33:00 - EGEE job's status has changed to "Ready" (host is ce1-egee.srce.hr).
• 2008.01.09 09:35:46 - EGEE job's status has changed to "Waiting" (host is egee-ce.grid.niif.hu).
• 2008.01.09 09:36:19 - EGEE job's status has changed to "Ready" (host is ce.cyf-kr.edu.pl).
• 2008.01.09 09:36:53 - EGEE job's status has changed to "Waiting" (host is ce.cyf-kr.edu.pl).
• 2008.01.09 09:37:26 - EGEE job's status has changed to "Done" (host is egee-ce.grid.niif.hu).
• 2008.01.09 09:37:26 - Job found to be finished. Checking again if this is really the case.
• 2008.01.09 09:38:03 - EGEE job's status has changed to "Ready" (host is egee-ce1.gup.uni-linz.ac.at).
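A log in this format is easy to analyse with a script. The sketch below extracts the status changes (time stamp, status, host) from such log text. The regular expression is written against the lines shown on this slide only; that the portal always emits exactly this format is an assumption.

```python
import re
from datetime import datetime

# "." stands in for the apostrophe and the quotation marks, so that both
# straight and typographic variants of these characters are accepted.
STATUS_LINE = re.compile(
    r"(?P<stamp>\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}) - "
    r"EGEE job.s status has changed to .(?P<status>\w+). "
    r"\(host is ?(?P<host>[^)]*)\)"
)

SAMPLE = """\
2008.01.09 09:32:23 - Job has been submitted successfully!
2008.01.09 09:32:26 - EGEE job's status has changed to "Waiting" (host is ).
2008.01.09 09:33:00 - EGEE job's status has changed to "Ready" (host is ce1-egee.srce.hr).
"""


def status_changes(log_text):
    """Return (timestamp, status, host or None) for every status change."""
    changes = []
    for match in STATUS_LINE.finditer(log_text):
        stamp = datetime.strptime(match["stamp"], "%Y.%m.%d %H:%M:%S")
        host = match["host"].strip() or None  # empty while no site is assigned
        changes.append((stamp, match["status"], host))
    return changes


for stamp, status, host in status_changes(SAMPLE):
    print(stamp.time(), status, host)
```

Listing the hosts in order shows at a glance how often the broker moved a job between sites.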
Fault-tolerant Grid applications
• Utilizing
– Condor DAGMan’s rescue mechanism
– EGEE job resubmission mechanism of WMS
• If the EGEE broker leaves a job stuck in a CE’s queue, the portal automatically
– kills the job on this site and
– resubmits the job to the broker, excluding this site.
• As a result
– the portal guarantees the correct submission of a job as long as there exists at least one matching resource
– job submission is reliable even in an unreliable grid
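The kill-and-resubmit policy can be sketched as a small retry loop. This is a simplified model and not the portal's implementation: the two callables stand in for the broker and for running the job on a site, and the default limit of three tries and the site names are assumptions chosen for the example.

```python
def submit_with_retries(match_site, run_on_site, max_tries=3):
    """Resubmit a job through a broker, excluding sites where it got stuck.

    match_site(excluded) -> site name or None   (stands in for the broker)
    run_on_site(site)    -> True if the job finished, False if it got stuck
    """
    excluded = set()
    for attempt in range(1, max_tries + 1):
        site = match_site(excluded)
        if site is None:
            raise RuntimeError("no matching resource left")
        if run_on_site(site):
            return site, attempt
        excluded.add(site)  # job killed on this site; prohibit it next time
    raise RuntimeError(f"job failed after {max_tries} tries")


# Hypothetical grid with three sites, only one of which works.
sites = ["ce-a.example.org", "ce-b.example.org", "ce-c.example.org"]
healthy = {"ce-c.example.org"}


def broker(excluded):
    return next((s for s in sites if s not in excluded), None)


def run(site):
    return site in healthy


print(submit_with_retries(broker, run))
```

Because every failed site is added to the exclusion set, each retry is matched against a strictly smaller set of candidates and the loop cannot revisit a site that has already failed.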
Fault-tolerance by P-GRADE portal
• 09:33: The broker assigned the job to a site: ce1-egee.srce.hr
• 09:35: The broker moved the job to another site: egee-ce.grid.niif.hu
• 09:36: Again the broker moved the job to another site: ce.cyf-kr.edu.pl
• 09:37: The broker indicated that the job is Done, but ...
• 09:38: ... it turned out that the job was not finished (Done - Failed status); it was only moved to another site: egee-ce1.gup.uni-linz.ac.at
• 09:39: Again the broker moved the job to another site: ares02.cyf-kr.edu.pl
• 09:39: Again the broker moved the job to another site: ce.cyf-kr.edu.pl
• 09:40: After trying 10 different sites the VOCE broker gave up and aborted the job (ShallowRetryCount was set to 10):
• 2008.01.09 09:40:16 - The job has been aborted!
Fault-tolerance by P-GRADE portal
• Our fault-tolerant portal did not give up:
• 2008.01.09 09:40:16 - The job can be submitted again (try 1 out of 3, excluding host(s): ce.cyf-kr.edu.pl)
• 2008.01.09 09:40:17 - Proxy with VOMS extensions created for VO "voce" with accounting group "".
• 2008.01.09 09:40:17 - Job submission in progress...
• 2008.01.09 09:40:27 - Job has been submitted successfully!
• 2008.01.09 09:40:27 - Job identifier is: "https://skurut1.cesnet.cz:9000/o22BTVqQsvwzj2wn5KP8_A"
• 2008.01.09 09:40:30 - EGEE job's status has changed to "Waiting" (host is ).
• 2008.01.09 09:41:04 - EGEE job's status has changed to "Ready" (host is eszakigrid66.inf.elte.hu).
Fault-tolerance by P-GRADE portal
• 2008.01.09 09:41:37 - EGEE job's status has changed to "Scheduled" (host is eszakigrid66.inf.elte.hu).
• 2008.01.09 09:44:57 - EGEE job's status has changed to "Done" (host is eszakigrid66.inf.elte.hu).
• 2008.01.09 09:44:57 - Job found to be finished. Checking again if this is really the case.
• 2008.01.09 09:45:34 - EGEE job's status has changed to "Waiting" (host is eszakigrid66.inf.elte.hu).
• 2008.01.09 10:06:06 - The job's status hasn't changed for 20 minutes, resubmitting...
(A quite frequent problem in EGEE-like grids is that the broker leaves jobs stuck in CEs’ queues. In such a case the portal automatically kills the job on this site and resubmits it to the broker.)
• 2008.01.09 10:06:06 - Proxy with VOMS extensions created for VO "voce" with accounting group "".
• 2008.01.09 10:06:06 - Job submission in progress...
• 2008.01.09 10:06:12 - Job has been submitted successfully!
• 10:10: The job successfully finished with exit code 0 on site: ce.ui.savba.sk
Lessons learnt
• P-GRADE portal provides
– Easy-to-use but powerful workflow system (graphical editor, wf manager, etc.)
– Three levels of parallelism
  • MPI job level
  • Workflow branch level
  • Parameter sweep at workflow level
– Multi-grid/multi-VO access mechanism for various grids (LCG-2, gLite and GT2)
  • Simultaneous access
  • Transparent access
  • Migrating a workflow from one grid to another requires no modification in the workflow
Thank you!
Learn once, use everywhere
Develop once, execute anywhere