Application development on EGEE with P-GRADE Portal
Gergely Sipos
MTA SZTAKI
sipos@sztaki.hu
www.portal.p-grade.hu
pgportal@lpds.sztaki.hu

Contents
• P-GRADE Portal in a nutshell
• Workflow development with the Portal
• Workflow execution with the Portal
• Scaling up to a parametric workflow

Current situation and trends in Grid computing
• Fast evolution of Grid systems and middleware:
  – GT2, OGSA, GT3 (OGSI), GT4 (WSRF), LCG-2, gLite, …
• Many production Grid systems are built with them
  – EGEE (LCG-2 gLite), UK NGS (GT2), Open Science Grid (GT2 GT4), NorduGrid (~GT2)
• Although the same set of core services are available everywhere, they are implemented in different ways
  – Data services
  – Computation services
  – Security services (single sign-on)
  – (Brokers)

P-GRADE Portal in a nutshell
• General purpose, workflow-oriented computational Grid portal. Supports the development and execution of workflow-based Grid applications – a tool for Grid orchestration
• Open source with GPL license: http://sourceforge.net/projects/pgportal/
• Extends GridSphere
  – Easy to expand with new portlets (e.g. application-specific portlets)
  – Easy to tailor to end-user needs
• Grid services supported by the portal:

  Service                        EGEE grids                    Globus grids
  Job execution                  Computing Element             GRAM
  File storage                   Storage Element               GridFTP server
  Certificate management         MyProxy (both)
  Information system             BDII                          MDS-2, MDS-4
  Brokering                      Workload Management System    GTbroker
  Job monitoring                 Mercury (both)
  Workflow & job visualization   PROVE (both)

• Solves Grid interoperability problem at the workflow level

What is a P-GRADE Portal workflow?
• A directed acyclic graph where
  – Nodes represent jobs (batch programs to be executed on a computing element)
  – Ports represent input/output files the jobs expect/produce
  – Arcs represent file transfer operations
• Semantics of the workflow:
  – A job can be executed if all of its input files are available

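The workflow semantics above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not portal code: the `Job` class, the function names and the file names are all hypothetical, and "running" a job is modelled simply as making its output files available.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """A workflow node: a batch job with input and output ports (files)."""
    name: str
    inputs: set = field(default_factory=set)   # file names the job expects
    outputs: set = field(default_factory=set)  # file names the job produces


def runnable_jobs(jobs, available_files, finished):
    """Return names of unfinished jobs whose input files are all available."""
    return [
        job.name
        for job in jobs
        if job.name not in finished and job.inputs <= available_files
    ]


def run_workflow(jobs, initial_files):
    """Simulate execution: a job 'runs' by making its output files available."""
    available = set(initial_files)
    finished = []
    while len(finished) < len(jobs):
        ready = runnable_jobs(jobs, available, set(finished))
        if not ready:
            raise RuntimeError("no runnable job: missing input or cyclic graph")
        for name in ready:
            job = next(j for j in jobs if j.name == name)
            available |= job.outputs  # arcs: outputs become inputs downstream
            finished.append(name)
    return finished


# Hypothetical three-job workflow: 'pre' feeds 'sim', which feeds 'post'.
workflow = [
    Job("post", inputs={"sim.out"}, outputs={"report.txt"}),
    Job("pre", inputs={"raw.dat"}, outputs={"clean.dat"}),
    Job("sim", inputs={"clean.dat"}, outputs={"sim.out"}),
]
order = run_workflow(workflow, {"raw.dat"})
```

The jobs are listed out of order on purpose: the execution order follows from file availability, not from the order in which the nodes were defined.
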
Two levels of parallelism by a workflow
• The workflow concept of the P-GRADE Portal enables the efficient parallelization of complex problems
• Semantics of the workflow enables two levels of parallelism:
  – Parallel execution inside a workflow node: the job can be a parallel program
  – Parallel execution among workflow nodes: multiple jobs can run in parallel

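The second level (parallelism among nodes) can be sketched by grouping the jobs of a DAG into "waves" of mutually independent jobs. The job names and the `parallel_waves` helper are hypothetical; the first level (a parallel program inside one node) is not modelled here.

```python
def parallel_waves(dependencies):
    """Group jobs into waves; jobs in one wave have no unmet dependencies
    and could therefore run at the same time."""
    remaining = {job: set(deps) for job, deps in dependencies.items()}
    waves = []
    while remaining:
        wave = sorted(job for job, deps in remaining.items() if not deps)
        if not wave:
            raise ValueError("cycle detected: not a DAG")
        waves.append(wave)
        for job in wave:
            del remaining[job]
        for deps in remaining.values():
            deps.difference_update(wave)
    return waves


# Hypothetical workflow: two independent simulations between a
# preprocessing job and a final merge job.
deps = {
    "prepare": [],
    "sim_a": ["prepare"],
    "sim_b": ["prepare"],
    "merge": ["sim_a", "sim_b"],
}
waves = parallel_waves(deps)
```

In this made-up graph `sim_a` and `sim_b` land in the same wave, which is where multiple jobs can run in parallel.
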
Ultra-short range weather forecast (Hungarian Meteorology Service)
• Forecasting dangerous weather situations (storms, fog, etc.), crucial task in the protection of life and property
• Processed information: surface level measurements, high-altitude measurements, radar, satellite, lightning, results of previous computed models
• Requirements:
  – Execution time < 10 min
  – High resolution (1 km)
[Figure: forecast workflow graph; nodes annotated with multiplicities such as 25 x, 10 x and 5 x]

The typical user scenario
Part 1 - development phase
[Diagram: Certificate servers, Portal server, Grid services]
• START EDITOR
• OPEN & EDIT or DEVELOP WORKFLOW
• SAVE WORKFLOW

The typical user scenario
Part 2 - execution phase
[Diagram: Certificate servers, Portal server, Grid services]
• DOWNLOAD PROXY CERTIFICATES
• TRANSFER FILES, SUBMIT JOBS
• MONITOR JOBS
• VISUALIZE JOBS and WORKFLOW PROGRESS
• DOWNLOAD (SMALL) RESULTS

The typical user scenario
Development phase:
[Diagram: Certificate servers, Portal server, Grid services]
• START EDITOR
• OPEN & EDIT or DEVELOP or IMPORT WORKFLOW
• SAVE WORKFLOW

Workflow development
Opening the workflow editor
• The editor is a Java Webstart application
• Download and installation is only one click!

Workflow Editor
Defining the graph
• The aim is to define a DAG of batch jobs:
  1. Drag & drop components: jobs and ports
  2. Define their properties
  3. Connect ports by channels (no cycles, no loops, no conditions)

Workflow Editor
Properties of a job
• Binary executable
• Type of executable
• Number of required processors
• Command line parameters
• The resource to be used for the execution:
  – Grid/VO
  – (Computing element)

Direct resource selection: Which computing element to use?
• The information system portlet queries BDII and GIIS servers
• I still don’t know which resource to use!

Automatic resource selection
1. Select a broker Grid/VO for the job (e.g. GILDA_LCG2_broker)
2. (Describe the ranks & requirements of the job in JDL)
3. The portal will use the broker to find the best resource for the job!

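The idea of "requirements" and "ranks" can be sketched as filter-then-sort matchmaking. This is an illustration only and not the algorithm of the EGEE broker: the `select_resource` function, the computing element names and their attributes are all made up.

```python
def select_resource(resources, requirements, rank):
    """Keep resources that satisfy `requirements`, return the best by `rank`."""
    candidates = [r for r in resources if requirements(r)]
    if not candidates:
        return None
    return max(candidates, key=rank)


# Hypothetical computing elements with made-up attributes.
computing_elements = [
    {"name": "ce-a.example.org", "free_cpus": 4, "max_wall_minutes": 60},
    {"name": "ce-b.example.org", "free_cpus": 32, "max_wall_minutes": 30},
    {"name": "ce-c.example.org", "free_cpus": 16, "max_wall_minutes": 120},
]

best = select_resource(
    computing_elements,
    requirements=lambda ce: ce["max_wall_minutes"] >= 60,  # hard constraint
    rank=lambda ce: ce["free_cpus"],                       # preference
)
```

Here `ce-b` has the most free CPUs but fails the requirement, so the ranking is applied only to the remaining candidates.
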
Workflow Editor
Defining broker jobs
• Select a Grid with broker! (*_BROKER)
• Ignore the resource field!
• If default JDL is not sufficient use the built-in JDL editor!

Workflow Editor
Built-in JDL editor
• JDL: look at the gLite Users’ manual!

Workflow Editor
Defining input-output files
File properties
• Type:
  – input: the job reads
  – output: the job generates
• File type:
  – local: comes from my desktop
  – remote: comes from an SE
• File: location of the file
• Internal file name: the executable reads the file under this name – fopen(“file.in”, …)
• File storage type (output files only):
  – Permanent: final result
  – Volatile: only data channel

How to refer to an I/O file?

               Input file                           Output file
Local file     Client side location:                Client side location:
               c:\experiments\11-04.dat             result.dat
Remote file    LFC logical file name:               LFC logical file name:
               lfn:/grid/gilda/sipos/11-04.dat      lfn:/grid/gilda/sipos/11-04_-_result.dat

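The four reference styles above differ only in the lfn: prefix, which a few lines of code can check. This is a minimal sketch; the function names are hypothetical and the portal's own validation may differ.

```python
def classify_file_reference(reference):
    """Return 'remote' for LFC logical file names, 'local' otherwise."""
    return "remote" if reference.startswith("lfn:") else "local"


def lfc_path(reference):
    """Strip the 'lfn:' prefix from a logical file name."""
    if not reference.startswith("lfn:"):
        raise ValueError("not a logical file name: " + reference)
    return reference[len("lfn:"):]


# The four references shown on the slide.
examples = [
    r"c:\experiments\11-04.dat",
    "result.dat",
    "lfn:/grid/gilda/sipos/11-04.dat",
    "lfn:/grid/gilda/sipos/11-04_-_result.dat",
]
kinds = [classify_file_reference(ref) for ref in examples]
```
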
Local vs. remote files
[Diagram: Portal server, Grid services (Storage elements, Computing elements)]
• LOCAL INPUT FILES & EXECUTABLES
• LOCAL OUTPUT FILES
• REMOTE INPUT FILES
• REMOTE OUTPUT FILES
• Only the permanent files!
• Your jobs can access storage files directly too!

Job level file transfer in EGEE VOs
[Diagram: MyJob with ports 0, 1, 2, 3; Portal server, EGEE WMS, Computing Element, Storage Elements, gLite File Catalog Service]
• LOCAL FILES (input sandbox): local input, user binary, pre script, post script
• Shell scripts generated by the portal server: pre script, binary of MyJob, post script
• REMOTE INPUT FILE and REMOTE OUTPUT FILE: custom file transfer (Storage Elements)
• REGISTER FILE: gLite File Catalog Service
• LOCAL OUTPUT FILE (output sandbox)
• Your code CAN but DOES NOT HAVE TO speak grid protocols

Reminder: Grid files and JDL
Example JDL file

  Executable = "gridTest";
  StdError = "stderr.log";
  StdOutput = "stdout.log";
  InputSandbox = {"/home/joda/test/gridTest"};
  OutputSandbox = {"stderr.log", "stdout.log"};
  InputData = "lfn:/grid/gilda/mydir/testbed0-00019";
  OutputData = "lfn:/grid/gilda/mydir/result0-00019";

• lfn: logical file name
• The file itself is NOT transferred by the gLite (or Globus) middleware! Your binary must transfer input/output grid files!
• P-GRADE Portal transfers the file for you. Your executable does not have to know any grid protocol if it is used in P-GRADE

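For readers who generate job descriptions from scripts, the example JDL above can be rendered from a plain dictionary. This is a sketch under stated assumptions: `jdl_value` and `render_jdl` are hypothetical helpers that handle only quoted strings and lists of strings; JDL expressions such as ranks and requirements are not covered.

```python
def jdl_value(value):
    """Render a Python value as a JDL literal (strings and lists of strings)."""
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    return '"' + str(value) + '"'


def render_jdl(attributes):
    """Render attribute/value pairs as 'Name = value;' lines."""
    return "\n".join(
        f"{name} = {jdl_value(value)};" for name, value in attributes.items()
    )


# Attribute values copied from the example JDL file on the slide.
job = {
    "Executable": "gridTest",
    "StdError": "stderr.log",
    "StdOutput": "stdout.log",
    "InputSandbox": ["/home/joda/test/gridTest"],
    "OutputSandbox": ["stderr.log", "stdout.log"],
    "InputData": "lfn:/grid/gilda/mydir/testbed0-00019",
    "OutputData": "lfn:/grid/gilda/mydir/result0-00019",
}
jdl_text = render_jdl(job)
```
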
Workflow Editor
Saving the workflow
• Workflow is defined!
• Let’s execute it!

Executing workflows with the P-GRADE Portal
Main steps
1. Download proxies
2. Submit workflow
3. Observe workflow progress
4. If some error occurs, correct the graph
5. Download result

The typical user scenario
Execution phase – step 1:
[Diagram: Certificate servers, Portal server, Grid services]
• DOWNLOAD PROXY CERTIFICATES

MyProxy interaction in P-GRADE: Certificate Manager
Certificates portlet
• To start your session on the Grid you must create a proxy certificate on the portal server
• “Certificates” portlet:
  – to upload a proxy into MyProxy servers
  – to download a proxy from MyProxy into the portal server

Certificate Manager
Downloading a proxy
1. MyProxy server access details:
   • Hostname
   • Port number
   • User name (from upload)
   • Password (from upload)
2. Proxy parameters:
   • Lifetime
   • Comment
3. Grid association

Certificates, proxies with gLite VOs: Download
[Diagram labels: MyProxy server, VOMS server, Portal server, Grid services, Proxy1, Proxy2, Proxy + VOMS ext.]
• I have to do this before workflow submission

Certificate Manager
Associating the proxy with a grid
• This operation displays the details of the certificate and the list of available Grids (defined by portal administrator)

Certificate Manager
Browsing proxies
• Multiple proxies can be available on the portal server at the same time!
• HUNGRID CEs and SEs
• SEE-GRID CEs and SEs

The typical user scenario
Execution phase - step 2:
[Diagram: Certificate servers, Portal server, Grid services]
• TRANSFER FILES, SUBMIT JOBS

Workflow Management (workflow portlet)
• The portlet presents the status, size and output of the available workflows in the “Workflow” list
• It has a Quota manager to control the users’ storage space on the server
• The portlet also contains the “Abort”, “Attach”, “Details”, “Delete” and “Delete all” buttons to handle execution of workflows
• The “Attach” button opens the workflow in the Workflow Editor
• The “Details” button gives an overview of the jobs of the workflow

Workflow Execution (observation by the workflow portlet)
• White/red/green color means the job is in the initial/running/finished state

The typical user scenario
Execution phase – step 3:
[Diagram: Certificate servers, Portal server, Grid services]
• MONITOR JOBS
• VISUALIZE JOBS and WORKFLOW PROGRESS

On-Line Monitoring both at the workflow and job levels (workflow portlet)
• The portal monitors and visualizes workflow progress

Rescuing a failed workflow 1.
• A job failed during workflow execution
• Read the error log to know why

Rescuing a failed workflow 2.
• Map the failed job onto a different CE or download a new proxy for it.
• Don’t touch the finished jobs!
• The execution can continue from the point of failure!

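Continuing "from the point of failure" amounts to re-running only the jobs that have not finished, in dependency order. A minimal sketch, assuming a hypothetical status table and dependency map (the portal's internal bookkeeping is not shown in the slides):

```python
def jobs_to_resubmit(statuses, dependencies):
    """Return the jobs to run on rescue, in dependency order.
    Finished jobs are skipped, so their results are reused."""
    pending = {job for job, status in statuses.items() if status != "finished"}
    ordered = []
    while pending:
        ready = sorted(
            job for job in pending
            if all(dep not in pending for dep in dependencies.get(job, []))
        )
        if not ready:
            raise ValueError("cycle detected: not a DAG")
        ordered.extend(ready)
        pending.difference_update(ready)
    return ordered


# Hypothetical workflow state after a failure in 'sim'.
statuses = {"pre": "finished", "sim": "failed", "post": "initial"}
dependencies = {"sim": ["pre"], "post": ["sim"]}
plan = jobs_to_resubmit(statuses, dependencies)
```
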
Analysis of the log
• 2008.01.09 09:32:19 - Proxy with VOMS extensions created for VO "voce" with accounting group "".
• 2008.01.09 09:32:19 - Job submission in progress...
• 2008.01.09 09:32:23 - Job has been submitted successfully!
• 2008.01.09 09:32:23 - Job identifier is: "https://skurut1.cesnet.cz:9000/mD_8VzPhm8AmIToTJKtigg"
• 2008.01.09 09:32:26 - EGEE job's status has changed to "Waiting" (host is ).
• 2008.01.09 09:33:00 - EGEE job's status has changed to "Ready" (host is ce1-egee.srce.hr).
• 2008.01.09 09:35:46 - EGEE job's status has changed to "Waiting" (host is egee-ce.grid.niif.hu).
• 2008.01.09 09:36:19 - EGEE job's status has changed to "Ready" (host is ce.cyf-kr.edu.pl).
• 2008.01.09 09:36:53 - EGEE job's status has changed to "Waiting" (host is ce.cyf-kr.edu.pl).
• 2008.01.09 09:37:26 - EGEE job's status has changed to "Done" (host is egee-ce.grid.niif.hu).
• 2008.01.09 09:37:26 - Job found to be finished. Checking again if this is really the case.
• 2008.01.09 09:38:03 - EGEE job's status has changed to "Ready" (host is egee-ce1.gup.uni-linz.ac.at).

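Log lines of this shape are regular enough to be analysed with a short script. The sketch below assumes the line format is exactly as printed in the log excerpt; the regular expression and the `status_changes` helper are illustrative, not part of the portal.

```python
import re

# Matches lines such as:
# 2008.01.09 09:33:00 - EGEE job's status has changed to "Ready" (host is x).
STATUS_LINE = re.compile(
    r'(?P<time>\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}) - '
    r'EGEE job\'s status has changed to "(?P<status>\w+)" '
    r'\(host is (?P<host>[^)]*)\)'
)


def status_changes(log_lines):
    """Extract (time, status, host) tuples from portal log lines."""
    changes = []
    for line in log_lines:
        match = STATUS_LINE.search(line)
        if match:
            changes.append(
                (match["time"], match["status"], match["host"].strip())
            )
    return changes


log = [
    '2008.01.09 09:32:23 - Job has been submitted successfully!',
    '2008.01.09 09:32:26 - EGEE job\'s status has changed to "Waiting" (host is ).',
    '2008.01.09 09:33:00 - EGEE job\'s status has changed to "Ready" (host is ce1-egee.srce.hr).',
    '2008.01.09 09:35:46 - EGEE job\'s status has changed to "Waiting" (host is egee-ce.grid.niif.hu).',
]
changes = status_changes(log)
hosts = [host for _, _, host in changes if host]
```

Listing the distinct hosts in order makes it easy to see how often the broker moved the job between sites.
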
Fault-tolerance by P-GRADE portal
• 09:33: The broker assigned the job to a site: ce1-egee.srce.hr
• 09:35: The broker moved the job to another site: egee-ce.grid.niif.hu
• 09:36: Again the broker moved the job to another site: ce.cyf-kr.edu.pl
• 09:37: The broker indicated that the job is Done, but ...
• 09:38: ... it turned out that the job was not finished (Done - Failed status); it was only moved to another site: egee-ce1.gup.uni-linz.ac.at
• 09:39: Again the broker moved the job to another site: ares02.cyf-kr.edu.pl
• 09:39: Again the broker moved the job to another site: ce.cyf-kr.edu.pl
• 09:40: After trying 10 different sites the VOCE broker gave up and aborted the job (the Shallow RetryCount was set to 10):
• 2008.01.09 09:40:16 - The job has been aborted!

Fault-tolerance by P-GRADE portal
• Our fault-tolerant portal did not give up:
• 2008.01.09 09:40:16 - The job can be submitted again (try 1 out of 3, excluding host(s): ce.cyf-kr.edu.pl)
• 2008.01.09 09:40:17 - Proxy with VOMS extensions created for VO "voce" with accounting group "".
• 2008.01.09 09:40:17 - Job submission in progress...
• 2008.01.09 09:40:27 - Job has been submitted successfully!
• 2008.01.09 09:40:27 - Job identifier is: "https://skurut1.cesnet.cz:9000/o22BTVqQsvwzj2wn5KP8_A"
• 2008.01.09 09:40:30 - EGEE job's status has changed to "Waiting" (host is ).
• 2008.01.09 09:41:04 - EGEE job's status has changed to "Ready" (host is eszakigrid66.inf.elte.hu).

Fault-tolerance by P-GRADE portal
• 2008.01.09 09:41:37 - EGEE job's status has changed to "Scheduled" (host is eszakigrid66.inf.elte.hu).
• 2008.01.09 09:44:57 - EGEE job's status has changed to "Done" (host is eszakigrid66.inf.elte.hu).
• 2008.01.09 09:44:57 - Job found to be finished. Checking again if this is really the case.
• 2008.01.09 09:45:34 - EGEE job's status has changed to "Waiting" (host is eszakigrid66.inf.elte.hu).
• 2008.01.09 10:06:06 - The job's status hasn't changed for 20 minutes, resubmitting...
• It is a quite frequently occurring problem in EGEE-like grids that the broker leaves jobs stuck in CEs' queues. In such a case the portal automatically kills the job on this site and resubmits it to the broker.
• 2008.01.09 10:06:06 - Proxy with VOMS extensions created for VO "voce" with accounting group "".
• 2008.01.09 10:06:06 - Job submission in progress...
• 2008.01.09 10:06:12 - Job has been submitted successfully!
• 10:10: The job successfully finished with exit code 0 on site: ce.ui.savba.sk

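The behaviour visible in the log (resubmission with excluded hosts, "try 1 out of 3", and a 20-minute stuck-job timeout) can be sketched as follows. This is a simplified illustration, not the portal's implementation: the function names, the `submit` callback signature and the scripted broker are assumptions, and the real run went through more sites than this toy example.

```python
def is_stuck(minutes_since_last_change, timeout_minutes=20):
    """A job counts as stuck if its status has not changed for the timeout."""
    return minutes_since_last_change >= timeout_minutes


def run_with_resubmission(submit, max_retries=3):
    """Submit a job; on failure resubmit up to `max_retries` times,
    excluding the hosts where the job failed before."""
    excluded = []
    attempts = 0
    while True:
        outcome, host = submit(excluded)
        attempts += 1
        if outcome == "done":
            return host, attempts
        if attempts > max_retries:
            raise RuntimeError("job failed after %d resubmissions" % max_retries)
        if host and host not in excluded:
            excluded.append(host)


# Hypothetical scripted broker: aborts on the first host, succeeds on the second.
def fake_submit(excluded_hosts):
    for host, outcome in [("ce.cyf-kr.edu.pl", "aborted"),
                          ("ce.ui.savba.sk", "done")]:
        if host not in excluded_hosts:
            return outcome, host
    return "aborted", None


final_host, attempts = run_with_resubmission(fake_submit)
```

Keeping the retry loop on the portal side means a broker that gives up does not end the workflow.
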
The typical user scenario
Execution phase – step 5
[Diagram: Certificate servers, Portal server, Grid services]
• DOWNLOAD (SMALL) RESULTS

Scaling up a workflow to a parameter study with P-GRADE
[Diagram: Files in an LFC directory (e.g. /grid/gilda/sipos/myinputs) → Complete workflow → Files in an LFC directory (e.g. /grid/gilda/sipos/myoutputs)]

Turning a WF into a parameter study
• By turning at least one of the open input ports into a “PS Input port” the WF is turned into a Parameter Study

Input-output files are stored in SEs
• /grid/gilda/sipos/InputImages: Image.0, Image.1
• /grid/gilda/sipos/XCoordinates: XCoordinate.0, XCoordinate.1
• /grid/gilda/sipos/YCoordinates: YCoordinate.0, YCoordinate.1
• 2 x 2 x 2 = 8 executions of the whole workflow
• /grid/gilda/sipos/Output: workflow1.zip, workflow2.zip, . . .

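The count of 2 x 2 x 2 = 8 executions is the Cartesian product of the files behind the three PS input ports. A minimal sketch, using the directory and file names from the slide; the `workflow_instances` helper is hypothetical:

```python
from itertools import product

# One entry per PS input port: the files found in its directory.
ps_inputs = {
    "InputImages": ["Image.0", "Image.1"],
    "XCoordinates": ["XCoordinate.0", "XCoordinate.1"],
    "YCoordinates": ["YCoordinate.0", "YCoordinate.1"],
}


def workflow_instances(inputs):
    """One workflow execution per combination, taking one file per port."""
    ports = list(inputs)
    return [dict(zip(ports, combo)) for combo in product(*inputs.values())]


instances = workflow_instances(ps_inputs)
```

Adding a third file to any one directory would raise the number of executions from 8 to 12, which is why parameter studies scale up quickly.
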
PS Input Port of Simple PS
• Remote file: Directory instead of FILE reference
• Do not use the prefix lfn: if the directory is in the EGEE Grid file catalogue

Simple PS Activity 2: placement of result
• Menu item PS Properties can be called within the Workflow menu
• Properties of the VO File Catalog
• The Output directory will contain the set of individual compressed files. Each compressed file contains the outputs of an element Workflow that has been elaborated over an item of the PS Input Set
• One SE of the chosen VO

Executing PS workflows
• PS Details for parameter sweep workflow applications

Workflow Manager List
PS Details view showing eWF-s
• New, middle level list to render the details of a PS Workflow
• Statistics shows the progress of the elaboration of the whole PS
• The eWorkflow buffer list shows the state of the Workflows being processed.

Details view of the eWF Ax_EQU_B_voce_PS.6
• Job level details of an eWorkflow
• Note that the Attach button is missing, as there is little point in accessing the WE until the eWorkflow list is exhausted

Advanced PS WFs in P-GRADE Portal

Advanced parameter studies
[Diagram: Initial input data → Generator component(s): generate or cut input into smaller pieces → Files in an LFC directory (e.g. /grid/gilda/sipos/myinputs) → Complete workflow → Files in an LFC directory (e.g. /grid/gilda/sipos/myoutputs) → Collector component(s): aggregate result]

Concept of advanced parameter study workflows
• GEN: Generator part generates the input parameter space
• SEQ: Parameter study part
• COLL: Collector part evaluates and integrates the results

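The generator / parameter study / collector structure follows a split, process, aggregate pattern. The sketch below only mirrors that shape with made-up functions: summing numbers stands in for the real workflow, and the pieces are processed sequentially here, whereas on the Grid each piece would be a separate workflow instance.

```python
def generator(initial_input, piece_size):
    """GEN: cut the initial input into smaller pieces."""
    return [initial_input[i:i + piece_size]
            for i in range(0, len(initial_input), piece_size)]


def workflow(piece):
    """SEQ: stand-in for one execution of the complete workflow on one piece."""
    return sum(piece)


def collector(partial_results):
    """COLL: aggregate the per-piece results into one final result."""
    return sum(partial_results)


pieces = generator(list(range(1, 11)), piece_size=4)
partials = [workflow(piece) for piece in pieces]
total = collector(partials)
```
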
Parameter generator
• Generator can be attached to any parameter input port
• Generator can be
  – Auto generator: to generate text files
  – Custom generator: to generate any content
• Generated files are moved into SE by the portal

Definition Window of Auto Generator Job
• User defines the template of the text file
• User puts keys into the template
• User defines values for the keys
  – Integer number
  – Real number
  – Custom set
  – …
• By clicking on a key the definition window for this key is opened.

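The template-and-keys idea can be sketched with Python's standard `string.Template`. Everything specific here is an assumption made for illustration: the `$key` placeholder syntax, the key names, and the choice to generate one file per combination of key values are not taken from the portal.

```python
from itertools import product
from string import Template


def generate_files(template_text, key_values):
    """Return one rendered text per combination of key values."""
    template = Template(template_text)
    keys = list(key_values)
    return [
        template.substitute(dict(zip(keys, combo)))
        for combo in product(*(key_values[k] for k in keys))
    ]


# Hypothetical template with two keys.
template_text = "steps=$steps\ntolerance=$tolerance\n"
key_values = {
    "steps": [10, 20],          # integer values
    "tolerance": [0.1, 0.01],   # real values
}
files = generate_files(template_text, key_values)
```
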
(Auto) Generator Attribute Editor for SE definition
• Attribute Editor defines the properties of remote files created by the Generator:
  1. Storage Element must be defined if an LCG-like (EGEE) file access has been defined in the PS Output Port belonging to the Generator

Detailed view of a PS workflow
• Overall statistics of workflow instances
• Generator job(s)
• Workflow instances
• Collector job(s)

Additional features
• Workflows and traces can be exported from the portal server onto your client machine
• Workflows and traces can be imported into the Portal
  – Share your workflows or results with other researchers!
  – Migrate your application from one portal into another!

Workflow/trace export/import
• To export a workflow from the portal onto your machine
• To delete every unnecessary file of the workflow
• To delete trace/output of the workflow (if any)

References
• P-GRADE Portal service is available for
  – SEE-GRID infrastructure
  – Central European VO of EGEE
  – GILDA: Training VO of EGEE
  – Many national Grids (UK National Grid Service, HunGrid, etc.)
  – US Open Science Grid, TeraGrid
  – Economy-Grid, Swiss BioGrid, Bio and Biomed EGEE VOs, BioInfoGrid, BalticGrid
  – GIN VO

The OGF-GIN Resource Testing portal
• To test GIN resources and to demonstrate WF level interoperability
[Diagram labels: P-GRADE GEMLCA Portal, P-GRADE portal, GEMLCA Repository]