int.eu.grid: Experiences with Condor to Run Interactive and Parallel Applications
on the Grid
Elisa Heymann
Department of Computer Architecture and Operating Systems
Condor Week 2008, May 2008
Outline
Introduction
CrossBroker
Parallel Job Support
Interactive Job Support
Conclusions
Introduction
int.eu.grid environment:
    gLite (EGEE Grid middleware)
    Extensions:
        CrossBroker
        Migrating Desktop
Jobs not handled by gLite:
    Parallel jobs (MPI)
        Run on more than one resource
    Interactive jobs
        The user interacts with the application during its execution
Batch execution on Grids
[Figure: a job with files F1, F2 and O1, O2 is submitted over the Internet through middleware services to remote sites, each running middleware]
Parallel & Interactive Job Execution
Use of resources from different sites
Resource-sets search
Co-allocation & synchronization
Fast start-up
Execution in high-occupancy situations
[Figure: a job with files F1, F2 spans two remote sites over the Internet; labels show MPI between the job parts and I/O forwarding through the middleware services]
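Architecture
[Figure: components shown are the Migrating Desktop, the CrossBroker (Scheduling Agent, Resource Searcher, Application Launcher, Condor-G, DAGMan), the Information Index, the Replica Manager, and two EGEE/Globus sites, each with a computing element (CE) and worker nodes (WN)]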
Architecture - CrossBroker
Scheduling Agent
    Receives each job and keeps it in a persistent queue
    Contacts the Resource Searcher and gets a list of available resources
    Selects resources and passes them to the Application Launcher
Resource Searcher
    Given a job description (JobAd), performs the matchmaking between job needs and available resources
    Uses the Condor ClassAd library, originally designed for matching a single job with a single resource
    Set matching has been developed to support matching a single job to a group of resources
Application Launcher
    Responsible for providing a reliable submission service for parallel applications on the Grid
    Responsible for file staging at the remote site (executable and input/output files)
    Uses the services of Condor-G
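The division of work among the three components can be sketched as follows. This is only an illustration of the flow described above, not CrossBroker code: the class and function names are hypothetical, the queue is an in-memory deque rather than a persistent one, the matchmaking is reduced to a free-CPU check, and the launcher returns a string instead of submitting through Condor-G.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    free_cpus: int


@dataclass
class Job:
    name: str
    cpus_needed: int


def resource_searcher(job, resources):
    """Stand-in for matchmaking: keep resources with enough free CPUs."""
    return [r for r in resources if r.free_cpus >= job.cpus_needed]


def application_launcher(job, resource):
    """Stand-in for submission via Condor-G: only reports the decision."""
    return f"{job.name} -> {resource.name}"


def scheduling_agent(queue, resources):
    """Take jobs from the queue, ask the searcher, hand results to the launcher."""
    submitted, pending = [], deque()
    while queue:
        job = queue.popleft()
        candidates = resource_searcher(job, resources)
        if not candidates:
            pending.append(job)  # job stays queued until resources appear
            continue
        # Assumed selection rule: prefer the resource with most free CPUs.
        chosen = max(candidates, key=lambda r: r.free_cpus)
        chosen.free_cpus -= job.cpus_needed
        submitted.append(application_launcher(job, chosen))
    return submitted, pending


resources = [Resource("site-a", 4), Resource("site-b", 8)]
queue = deque([Job("job1", 6), Job("job2", 4), Job("job3", 16)])
submitted, pending = scheduling_agent(queue, resources)
print(submitted)
print([job.name for job in pending])
```

In this toy run, job3 asks for more CPUs than any single resource offers, so it remains pending; matching one job to a group of resources is what the set matching mentioned above addresses.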
Parallel Job Support
Support for parallel jobs:
    Open MPI
    PACX-MPI
    MPICH-P4
    MPICH-G2
Takes into account site capabilities
Ability to define starter scripts/processes to start the parallel job
    mpi-start is configured automatically and used by default
Parallel Job Support
Job Description Language file:
    JOBTYPE:
        Normal: sequential jobs, just one CPU
        Parallel: more than one CPU
    SUBJOBTYPE:
        openmpi
        pacx-mpi
        mpich
        mpich-g2
        plain
    JOBSTARTER (if not defined, mpi-start)
    JOBSTARTERARGUMENTS
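A minimal sketch of how these attributes could be interpreted, assuming the job description has already been parsed into a Python dictionary. The function name, the dictionary representation, the treatment of a missing JobType as Normal, and the use of NodeNumber for the CPU count are assumptions made for illustration; only the allowed values and the mpi-start default come from the slide.

```python
VALID_SUBJOBTYPES = {"openmpi", "pacx-mpi", "mpich", "mpich-g2", "plain"}


def resolve_parallel_settings(jdl):
    """Fill in defaults and check the parallel-job attributes of a parsed JDL."""
    job_type = jdl.get("JobType", "Normal")  # assumed default when absent
    if job_type == "Normal":
        # Sequential job: just one CPU.
        return {"JobType": "Normal", "cpus": 1}
    if job_type != "Parallel":
        raise ValueError(f"unknown JobType: {job_type!r}")

    sub_type = jdl.get("SubJobType")
    if sub_type not in VALID_SUBJOBTYPES:
        raise ValueError(f"unknown SubJobType: {sub_type!r}")

    return {
        "JobType": "Parallel",
        "SubJobType": sub_type,
        "cpus": jdl.get("NodeNumber"),
        # If no starter is defined, mpi-start is used.
        "JobStarter": jdl.get("JobStarter", "mpi-start"),
        "JobStarterArguments": jdl.get("JobStarterArguments", ""),
    }


settings = resolve_parallel_settings(
    {"JobType": "Parallel", "SubJobType": "pacx-mpi", "NodeNumber": 5}
)
print(settings["JobStarter"])
```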
Parallel Job Support
Type = "Job";
VirtualOrganisation = "imain";
JobType = "Parallel";
SubJobType = "pacx-mpi";
NodeNumber = 5;
Executable = "test-app";
Arguments = "-v";
InputSandbox = {"test-app", "inputfile"};
OutputSandbox = {"std.out", "std.err"};
StdErr = "std.err";
StdOutput = "std.out";
Rank = other.GlueHostBenchmarkSI00;
Requirements = other.GlueCEStateStatus == "Production";
MPI Across Sites
CrossBroker searches for and selects sets of resources for the jobs
There is no guarantee that all tasks of the same job will start at the same time
    1st choice: select only sites with free resources. The job will run immediately. Unfortunately, free resources are not always available
    2nd choice: allocate a resource temporarily and wait until all other tasks show up. Time-share the resource with a backfilling policy to avoid resource idleness
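The two choices can be sketched as a small planning function. This is a simplified illustration under stated assumptions, not the broker's algorithm: the function name, the dictionary of free CPUs per site, and the rule of filling the emptiest sites first are all hypothetical.

```python
def plan_start(tasks_needed, free_cpus_by_site):
    """Decide whether a parallel job can start at once or must time-share."""
    # Assumed rule: visit the sites with the most free CPUs first.
    ordered = sorted(free_cpus_by_site.items(), key=lambda kv: kv[1], reverse=True)
    start_now = {}
    remaining = tasks_needed
    for site, free in ordered:
        take = min(free, remaining)
        if take > 0:
            start_now[site] = take
            remaining -= take
    # 1st choice: every task fits on free resources, so the job runs immediately.
    # 2nd choice: some tasks must queue; the tasks already placed hold their
    # resources and wait, which is where time sharing with backfilling comes in.
    mode = "immediate" if remaining == 0 else "time-shared"
    return {"mode": mode, "start_now": start_now, "still_queued": remaining}


print(plan_start(5, {"a": 10, "b": 2}))
print(plan_start(5, {"a": 2, "b": 1}))
```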
MPI Across Sites
Available CEs seen by the Resource Searcher (RS); the figure marks each CE as MPI-enabled or non-MPI-enabled:
    CE1 = zeus.cyf-kr.edu.pl    FreeCPUs = 2     Disk = 100    AverageSI = 2000
    CE2 = aocegrid.uab.es       FreeCPUs = 10    Disk = 100    AverageSI = 4000
    CE3 = bee001.ific.uv.es     FreeCPUs = 3     Disk = 100    AverageSI = 1000
    CE4 = xgrid.icm.edu.pl      FreeCPUs = 6     Disk = 100    AverageSI = 1000
    CE5 = lngrid02.lip.pt       FreeCPUs = 2     Disk = 100    AverageSI = 1000
Resulting groups:
[Groups with 1 CEs]
    [Rank=2000] aocegrid.uab.es:2119/jobmanager-pbs-workq freeCPUs = 10
[Groups with 2 CEs]
    [Rank=1500] zeus.cyf-kr.edu.pl:2119/jobmanager-pbs-workq freeCPUs = 2
                bee001.ific.uv.es:2119/jobmanager-pbs-workq freeCPUs = 3
    [Rank=1000] bee001.ific.uv.es:2119/jobmanager-pbs-workq freeCPUs = 3
                lngrid02.lip.pt:2129/jobmanager-pbs-workq freeCPUs = 2
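A rough sketch of set matching over the CEs listed above, for a job that needs 5 CPUs. Several things here are assumptions rather than facts from the slides: which CEs are MPI-enabled (xgrid.icm.edu.pl is assumed not to be, since it has enough free CPUs but is absent from the listing), the rule that a group is skipped when a smaller subset already suffices, and the use of the mean AverageSI as the group rank. The mean agrees with the two-CE groups in the listing but not with the Rank=2000 shown for the single-CE group, so the real ranking evidently differs in some way.

```python
from itertools import combinations
from statistics import mean

# CE data from the slide; the "mpi" flags are assumed for illustration.
ces = [
    {"name": "zeus.cyf-kr.edu.pl", "free_cpus": 2, "average_si": 2000, "mpi": True},
    {"name": "aocegrid.uab.es", "free_cpus": 10, "average_si": 4000, "mpi": True},
    {"name": "bee001.ific.uv.es", "free_cpus": 3, "average_si": 1000, "mpi": True},
    {"name": "xgrid.icm.edu.pl", "free_cpus": 6, "average_si": 1000, "mpi": False},
    {"name": "lngrid02.lip.pt", "free_cpus": 2, "average_si": 1000, "mpi": True},
]


def set_match(ces, cpus_needed, max_group_size=2):
    """Return (group size, rank, CE names) for each minimal group that fits."""
    eligible = [ce for ce in ces if ce["mpi"]]
    matches = []
    satisfied = []  # name sets of groups already accepted
    for size in range(1, max_group_size + 1):
        for group in combinations(eligible, size):
            names = frozenset(ce["name"] for ce in group)
            # Skip groups that contain a smaller group which already suffices.
            if any(smaller < names for smaller in satisfied):
                continue
            if sum(ce["free_cpus"] for ce in group) >= cpus_needed:
                rank = mean(ce["average_si"] for ce in group)  # assumed rule
                matches.append((size, rank, sorted(names)))
                satisfied.append(names)
    # Smaller groups first, then higher rank first.
    matches.sort(key=lambda m: (m[0], -m[1]))
    return matches


matches = set_match(ces, cpus_needed=5)
for size, rank, names in matches:
    print(size, rank, names)
```

Under these assumptions the sketch selects the same three groups as the listing: aocegrid.uab.es alone, zeus.cyf-kr.edu.pl with bee001.ific.uv.es, and bee001.ific.uv.es with lngrid02.lip.pt.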
Time Sharing
[Figure sequence. Components: CrossBroker (Scheduling Agent, Application Launcher, Condor-G) and a Grid resource (LRMS, Condor GlideIn with two virtual machines, VM1 and VM2). Frames, in order:]
    An MPI job is at the CrossBroker
    A Condor GlideIn with VM1 and VM2 appears on the Grid resource
    The MPI task is on the Grid resource: wait for the rest of the MPI tasks
    Another job is placed next to the MPI task: backfilling while the MPI task waits
    All tasks ready!
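The frame sequence can be mimicked with a toy timeline of what the two glide-in slots at one site do while the job's tasks at other sites are still starting. This is only a sketch: the function name, the time steps, and the start times are hypothetical, and pausing the backfill job once the MPI job runs is an assumption, since the slides do not say what happens to it.

```python
def simulate_glidein(start_times, site, horizon):
    """Return (step, VM1 activity, VM2 activity) for the glide-in at `site`.

    start_times maps each site to the step at which its glide-in starts
    (hypothetical input; a real broker would only learn this at run time).
    """
    all_ready = max(start_times.values())  # step at which the last task shows up
    timeline = []
    for t in range(horizon):
        if t < start_times[site]:
            vm1, vm2 = "not started", "not started"
        elif t < all_ready:
            # The MPI task holds its slot; a backfill job uses the other one.
            vm1, vm2 = "MPI task waiting", "backfill job running"
        else:
            # Assumption: the backfill job yields once the MPI job can run.
            vm1, vm2 = "MPI task running", "backfill job paused"
        timeline.append((t, vm1, vm2))
    return timeline


timeline = simulate_glidein({"site-a": 0, "site-b": 3}, "site-a", horizon=5)
for step in timeline:
    print(step)
```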
Interactive Job Support
Scheduling priority
    Interactive jobs are sent to sites with available machines
    If no machines are available, time sharing is used
Support for interactivity in all kinds of jobs
    Sequential and all the MPI flavors
CrossBroker injects interactive agents that enable communication between user and job
    Transparent to the user
    Full integration with glogin & gVid
    Condor Bypass supported
Interactive Job Support
Job Description Language file:
    INTERACTIVE: true/false. Indicates that the job is interactive and that the broker should treat it with higher priority
    INTERACTIVEAGENT
    INTERACTIVEAGENTARGUMENTS
        These attributes specify the command (and its arguments) used to communicate with the user
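One simple way to give interactive jobs higher priority is a two-level priority queue, sketched below with Python's heapq module. The class name and the FIFO ordering within each level are assumptions for illustration; the slides do not describe how the broker's queue is implemented.

```python
import heapq
from itertools import count


class BrokerQueue:
    """Toy job queue in which interactive jobs are served before batch jobs."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker that keeps FIFO order per level

    def submit(self, name, interactive=False):
        priority = 0 if interactive else 1  # lower value is served first
        heapq.heappush(self._heap, (priority, next(self._order), name))

    def next_job(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


jobs = BrokerQueue()
jobs.submit("batch-1")
jobs.submit("interactive-1", interactive=True)
jobs.submit("batch-2")
jobs.submit("interactive-2", interactive=True)

served = [jobs.next_job() for _ in range(len(jobs))]
print(served)
```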
Interactive Job Support
Type = "Job";
VirtualOrganisation = "imain";
JobType = "Parallel";
SubJobType = "openmpi";
NodeNumber = 11;
Interactive = TRUE;
InteractiveAgent = "glogin";
InteractiveAgentArguments = "-r -p 195.168.105.65:23433";
Executable = "test-app";
InputSandbox = {"test-app", "inputfile"};
OutputSandbox = {"std.out", "std.err"};
StdErr = "std.err";
StdOutput = "std.out";
Rank = other.GlueHostBenchmarkSI00;
Requirements = other.GlueCEStateStatus == "Production";
Interactive Job Support
Particle trajectories in fusion devices
By increasing the temperature of a gas, we obtain a plasma state
At this temperature, light atomic nuclei can fuse through an exothermic process
    The mass after the fusion process is less than the mass before it
    The excess mass is converted into energy
Time Sharing
[Figure sequence. Components: CrossBroker (Scheduling Agent, Application Launcher, Condor-G) and a Grid resource (LRMS with two virtual machines, VM1 and VM2). Frames, in order:]
    A Condor GlideIn on the Grid resource hosts a batch job (BATCH) and an interactive job (INT. JOB) on VM1 and VM2
    An agent is shown in place of the Condor GlideIn, hosting the batch job and the interactive job
        Start-up time reduction
        Only one layer involved
Conclusions
CrossBroker supports both parallel and interactive jobs
    Automatically
    Interoperable with EGEE
Glide In
    Fast startup of jobs
    Co-allocation without reservation or wasting resources
Real applications
    Visualization of plasma in fusion devices
    Evolution of pollution clouds in the atmosphere
    Ultrasound Computing Tomography: reconstruction of a 3D volume
    FLUIDYNAMICS application
Questions?
Elisa Heymann
Department of Computer Architecture and Operating Systems