int.eu.grid: Experiences with Condor to Run Interactive and Parallel Applications on the Grid

Elisa Heymann
Department of Computer Architecture and Operating Systems


Condor Week 2008, May 2008

Outline

  • Introduction
  • CrossBroker
  • Parallel Job Support
  • Interactive Job Support
  • Conclusions


Introduction

int.eu.grid environment:
  • gLite (EGEE Grid middleware)
  • Extensions:
      • CrossBroker
      • Migrating Desktop

Jobs not handled by gLite:
  • Parallel jobs (MPI), which may run on more than one resource
  • Interactive jobs, in which the user interacts with the application during its execution


Batch execution on Grids

[Diagram: a job with files F1, F2 and O1, O2, connected over the Internet through middleware services to two remote sites, each running middleware.]


Parallel & Interactive Job Execution

  • Use of resources from different sites
  • Resource-sets search
  • Co-allocation & synchronization
  • Fast start-up
  • Execution in high-occupancy situations

[Diagram: jobs with files F1, F2 at two remote sites, connected over the Internet through middleware services; the links are labelled MPI and I/O forwarding.]


Architecture

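[Diagram: the CrossBroker contains the Scheduling Agent, the Resource Searcher, and the Application Launcher, together with Condor-G and DAGMan. Also shown are the Migrating Desktop, the Information Index, the Replica Manager, and two EGEE/Globus sites, each with a CE and two WNs.]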


Architecture - CrossBroker

Scheduling Agent
  • Receives each job and keeps it in a persistent queue.
  • Contacts the Resource Searcher and gets a list of available resources.
  • Selects resources and passes them to the Application Launcher.

Resource Searcher
  • Given a job description (JobAd), performs the matchmaking between job needs and available resources.
  • Uses the Condor ClassAd library, originally designed for matching a single job with a single resource.
  • Set matching has been developed to support matching a single job to a group of resources.

Application Launcher
  • Responsible for providing a reliable submission service for parallel applications on the Grid.
  • Responsible for file staging at the remote site (executable and input/output files).
  • Uses the services of Condor-G.


Parallel Job Support

Support for parallel jobs:
  • Open MPI
  • PACX-MPI
  • MPICH-P4
  • MPICH-G2

Takes into account sites' capabilities.

Ability to define starter scripts/processes to start the parallel job; mpi-start is configured automatically and used by default.


Parallel Job Support

Job Description Language file:
  • JOBTYPE:
      • Normal: sequential jobs, just one CPU
      • Parallel: more than one CPU
  • SUBJOBTYPE:
      • openmpi
      • pacx-mpi
      • mpich
      • mpich-g2
      • plain
  • JOBSTARTER (if not defined, mpi-start)
  • JOBSTARTERARGUMENTS
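The attribute rules on this slide can be illustrated with a minimal Python sketch. Only the attribute names, the allowed values, and the mpi-start default come from the slide. The function name `normalize_job`, the dictionary representation of a job, the fallback to a Normal job when JobType is absent, and the choice to apply the starter default only to parallel jobs are assumptions made for illustration, not CrossBroker's actual code.

```python
# Hypothetical helper, not part of CrossBroker: it only illustrates the
# attribute rules listed on the slide.
JOB_TYPES = {"normal", "parallel"}
SUB_JOB_TYPES = {"openmpi", "pacx-mpi", "mpich", "mpich-g2", "plain"}
DEFAULT_JOB_STARTER = "mpi-start"


def normalize_job(attrs):
    """Return a copy of attrs with checked JobType/SubJobType and a default JobStarter."""
    job = dict(attrs)
    # Assumption: a job without JobType is treated as a sequential (Normal) job.
    job_type = str(job.get("JobType", "Normal")).lower()
    if job_type not in JOB_TYPES:
        raise ValueError(f"unknown JobType: {job.get('JobType')!r}")
    if job_type == "parallel":
        sub_type = str(job.get("SubJobType", "")).lower()
        if sub_type not in SUB_JOB_TYPES:
            raise ValueError(f"unknown SubJobType: {job.get('SubJobType')!r}")
        # From the slide: if JobStarter is not defined, mpi-start is used.
        # Assumption: the default is only relevant for parallel jobs.
        job.setdefault("JobStarter", DEFAULT_JOB_STARTER)
    return job
```

Calling `normalize_job({"JobType": "Parallel", "SubJobType": "pacx-mpi"})` would add `JobStarter = "mpi-start"`, while a job that already names its own starter keeps it.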


Parallel Job Support

Type = "Job";
VirtualOrganisation = "imain";
JobType = "Parallel";
SubJobType = "pacx-mpi";
NodeNumber = 5;
Executable = "test-app";
Arguments = "-v";
InputSandbox = {"test-app", "inputfile"};
OutputSandbox = {"std.out", "std.err"};
StdErr = "std.err";
StdOutput = "std.out";
Rank = other.GlueHostBenchmarkSI00;
Requirements = other.GlueCEStateStatus == "Production";


MPI Across Sites

CrossBroker searches for and selects sets of resources for the jobs.

There is no guarantee that all tasks of the same job will start at the same time.

  • 1st choice: select only sites with free resources. The job will run immediately. Unfortunately, free resources are not always available.
  • 2nd choice: allocate a resource temporarily and wait until all other tasks show up. Timeshare the resource with a backfilling policy to avoid resource idleness.
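The two choices above can be read as a simple decision rule, sketched here in Python. This is an illustration under stated assumptions, not the broker's implementation: the function name `choose_allocation`, the greedy "largest site first" order, and the returned tuple are all hypothetical. The sketch only shows when a job can start immediately and when already-placed tasks would have to wait, time-sharing their resources, for the remaining ones.

```python
def choose_allocation(node_number, free_cpus):
    """Place node_number tasks on sites with free CPUs.

    free_cpus maps a site name to its number of free CPUs.
    Returns (strategy, allocation, pending):
      strategy   "immediate" if every task found a free CPU (1st choice),
                 "time-share" otherwise (2nd choice)
      allocation site -> number of tasks placed on free CPUs
      pending    tasks that still have to show up; the placed tasks wait for them
    """
    allocation = {}
    remaining = node_number
    # Assumption: fill the sites with the most free CPUs first.
    for site, free in sorted(free_cpus.items(), key=lambda item: item[1], reverse=True):
        if remaining == 0 or free <= 0:
            break
        take = min(free, remaining)
        allocation[site] = take
        remaining -= take
    strategy = "immediate" if remaining == 0 else "time-share"
    return strategy, allocation, remaining
```

With the hypothetical input `choose_allocation(5, {"a": 2, "b": 3})` all five tasks fit on free CPUs, so the job can start at once; with `{"a": 2, "b": 0}` two tasks are placed and three remain pending, which is the situation the backfilling policy is meant to cover.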


MPI Across Sites

[Diagram: the Resource Searcher (RS) and five CEs; a legend distinguishes MPI-enabled from non-MPI-enabled CEs.]

CE    Host                 FreeCPUs   Disk   AverageSI
CE1   zeus.cyf-kr.edu.pl   2          100    2000
CE2   aocegrid.uab.es      10         100    4000
CE3   bee001.ific.uv.es    3          100    1000
CE4   xgrid.icm.edu.pl     6          100    1000
CE5   lngrid02.lip.pt      2          100    1000

Resource Searcher output:

[Groups with 1 CEs]
  [Rank=2000]
    aocegrid.uab.es:2119/jobmanager-pbs-workq      freeCPUs = 10

[Groups with 2 CEs]
  [Rank=1500]
    zeus.cyf-kr.edu.pl:2119/jobmanager-pbs-workq   freeCPUs = 2
    bee001.ific.uv.es:2119/jobmanager-pbs-workq    freeCPUs = 3
  [Rank=1000]
    bee001.ific.uv.es:2119/jobmanager-pbs-workq    freeCPUs = 3
    lngrid02.lip.pt:2129/jobmanager-pbs-workq      freeCPUs = 2
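The grouping shown on this slide can be reproduced with a small set-matching sketch in Python. It is an illustration, not the Resource Searcher's code, and it rests on several assumptions: the job asks for 5 CPUs (the NodeNumber of the earlier JDL example), xgrid.icm.edu.pl is the non-MPI-enabled CE (it has 6 free CPUs yet does not appear in the output), and a group is only reported if no member could be dropped. The function name `match_sets` and the `mpi` flag are hypothetical. Ranking is left out because the slide does not say how a group's rank is derived.

```python
from itertools import combinations

# Free CPU counts are taken from the slide; the "mpi" flags are an assumption.
CES = {
    "zeus.cyf-kr.edu.pl": {"free_cpus": 2, "mpi": True},
    "aocegrid.uab.es": {"free_cpus": 10, "mpi": True},
    "bee001.ific.uv.es": {"free_cpus": 3, "mpi": True},
    "xgrid.icm.edu.pl": {"free_cpus": 6, "mpi": False},
    "lngrid02.lip.pt": {"free_cpus": 2, "mpi": True},
}


def match_sets(ces, node_number, max_group_size=2):
    """Return {group size: [tuples of CE names]} for minimal MPI-capable groups."""
    usable = {name: ce for name, ce in ces.items() if ce["mpi"]}
    result = {}
    for size in range(1, max_group_size + 1):
        matches = []
        for names in combinations(sorted(usable), size):
            cpus = [usable[name]["free_cpus"] for name in names]
            total = sum(cpus)
            # Keep the group if it has enough CPUs and every member is needed.
            if total >= node_number and all(total - c < node_number for c in cpus):
                matches.append(names)
        if matches:
            result[size] = matches
    return result


groups = match_sets(CES, node_number=5)
```

Under these assumptions `groups` holds one single-CE group (aocegrid.uab.es) and two two-CE groups (bee001 with lngrid02, and bee001 with zeus), the same sets the slide lists.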

Time Sharing

[Diagram sequence over eight slides. Components: the CrossBroker (Scheduling Agent, Application Launcher, Condor-G) and a Grid resource with its LRMS.]
  • An MPI job goes from the CrossBroker to the LRMS of the Grid resource.
  • A Condor GlideIn with two VMs (VM1 and VM2) appears on the Grid resource.
  • The MPI task sits in the glide-in and waits for the rest of the MPI tasks.
  • A second job runs in the glide-in: backfilling while the MPI task waits.
  • All tasks ready!
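The time-sharing sequence can be summarised as a per-tick timeline, sketched here in Python. The model is deliberately simple and partly assumed: time is discrete, the glide-in is the one hosting the first MPI task to arrive, a backfill job is always available, and the backfill job gives way as soon as all MPI tasks are ready (the slides do not say what happens to it at that point). The function name `glidein_timeline` is hypothetical.

```python
def glidein_timeline(task_ready_ticks, horizon):
    """Report, tick by tick, what the glide-in hosting the first MPI task runs.

    task_ready_ticks: tick at which each MPI task of the job reaches its resource.
    horizon: number of ticks to simulate.
    """
    first_arrival = min(task_ready_ticks)  # the local MPI task shows up
    all_ready = max(task_ready_ticks)      # the last MPI task shows up
    timeline = []
    for tick in range(horizon):
        if tick < first_arrival:
            timeline.append("empty")     # no MPI task on the glide-in yet
        elif tick < all_ready:
            timeline.append("backfill")  # MPI task waits, backfill job runs
        else:
            timeline.append("mpi")       # all tasks ready, MPI job runs
    return timeline
```

For three tasks arriving at ticks 1, 3 and 4, the glide-in would spend ticks 1 to 3 on backfilling and switch to the MPI job at tick 4.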


Interactive Job Support

Scheduling priority
  • Interactive jobs are sent to sites with available machines.
  • If there are no available machines, time sharing is used.

Support for interactivity in all kinds of jobs
  • Sequential and all the MPI flavors

CrossBroker injects interactive agents that enable communication between user and job
  • Transparent to the user
  • Full integration with glogin & gVid
  • Condor Bypass supported


Interactive Job Support

Job Description Language file:
  • INTERACTIVE: true/false. Indicates that the job is interactive and the broker should treat it with higher priority.
  • INTERACTIVEAGENT
  • INTERACTIVEAGENTARGUMENTS

The INTERACTIVEAGENT and INTERACTIVEAGENTARGUMENTS attributes specify the command (and its arguments) used to communicate with the user.


Interactive Job Support

Type = "Job";
VirtualOrganisation = "imain";
JobType = "Parallel";
SubJobType = "openmpi";
NodeNumber = 11;
Interactive = TRUE;
InteractiveAgent = "glogin";
InteractiveAgentArguments = "-r -p 195.168.105.65:23433";
Executable = "test-app";
InputSandbox = {"test-app", "inputfile"};
OutputSandbox = {"std.out", "std.err"};
StdErr = "std.err";
StdOutput = "std.out";
Rank = other.GlueHostBenchmarkSI00;
Requirements = other.GlueCEStateStatus == "Production";


Interactive Job Support

Particle trajectories in fusion devices

  • By increasing the temperature of a gas, we obtain a plasma state.
  • At this temperature, the fusion of light atomic nuclei is possible through an exothermic process:
      • The mass after the fusion process is less than the mass before it.
      • The excess mass is converted into energy.


Time Sharing

[Diagram sequence over two slides. Components: the CrossBroker (Scheduling Agent, Application Launcher, Condor-G) and a Grid resource with its LRMS.]
  • A Condor GlideIn on the Grid resource provides VM1 and VM2, labelled BATCH and INT. JOB.
  • The glide-in is shown as an agent hosting the batch job and the interactive job. Caption: start-up time reduction, only one layer involved.


Conclusions

CrossBroker supports both parallel and interactive jobs
  • Automatically
  • Interoperable with EGEE

Glide-in
  • Fast startup of jobs
  • Co-allocation without reservation or wasting resources

Real applications
  • Visualization of plasma in fusion devices
  • Evolution of pollution clouds in the atmosphere
  • Ultrasound Computing Tomography: reconstruction of a 3D volume
  • FLUIDYNAMICS application


Questions?

Elisa Heymann

Department of Computer Architecture and Operating Systems