Jaime Frey
Computer Sciences Department
University of Wisconsin-Madison
[email protected]
http://www.cs.wisc.edu/condor
Condor-G: Making Condor Grid Enabled
www.cs.wisc.edu/condor
Outline
› Why use Condor-G
› Globus Universe
› GlideIn
› Status & Future Work
What is Condor-G?
› Extensions to Condor to allow access to the Grid through Globus
› Two Parts
    Globus Universe
    GlideIn
Why Use Condor-G
› Condor
    Designed to run jobs within a single administrative domain
› Globus
    Designed to run jobs across many administrative domains
› Condor-G
    Combine the strengths of both
Condor-G Helps Condor Users
› Machines available to Condor users are limited
    Local Condor Pool
    Friendly Condor Pools (via Flocking)
› Through Globus, many more machines become available to run your jobs
Condor-G Helps Globus Users
› Globus is primarily an infrastructure upon which to develop distributed applications
› Command-line tools are limited
› Some users don’t want to rewrite their applications to use Globus
› Condor-G provides them a powerful interface to the Grid to run their existing applications
Globus Universe
› Advantages of using Condor as a front-end to Globus
    Full-featured queuing service
    Fault-tolerance
    Credential Management
Full-Featured Queue
› Persistent queue
› Many queue-manipulation tools
› Set up job dependencies (DAGMan)
› E-mail notification of events
› Log files
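The job-dependency bullet can be illustrated with a small sketch. This is not DAGMan's input format or implementation; it only shows the idea of running jobs in an order that respects "parent before child" constraints, using Python's standard `graphlib` module. The job names and the diamond-shaped graph are hypothetical.

```python
from graphlib import TopologicalSorter


def run_order(dependencies):
    """Return job names in an order where every job follows its parents.

    `dependencies` maps a job name to the set of jobs it must wait for.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical four-job diamond: B and C need A, D needs both B and C.
diamond = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
order = run_order(diamond)
```

In this sketch `order` always starts with A and ends with D; B and C may come out in either order because neither depends on the other.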
Fault-Tolerance
› Local Crash
    Queue state kept on disk
    Condor Master restarts other daemons
› Remote Crash
    Condor will resubmit jobs
    Globus jobmanager enhanced to improve recoverability
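As a rough illustration of "queue state kept on disk", the sketch below rewrites a state file after every change so that a restarted process can reload the queue. It is a toy model, not how the Condor Schedd stores its queue; the class name, JSON layout, and job ids are assumptions made for the example.

```python
import json
import os
import tempfile


class PersistentQueue:
    """Toy job queue that rewrites its state file after every change."""

    def __init__(self, path):
        self.path = path
        self.jobs = {}
        # Reload whatever an earlier instance saved before it stopped.
        if os.path.exists(path):
            with open(path) as f:
                self.jobs = json.load(f)

    def _save(self):
        # Write to a temporary file, then rename it over the old one, so a
        # crash in the middle of a write does not corrupt the existing file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.jobs, f)
        os.replace(tmp, self.path)

    def submit(self, job_id, command):
        self.jobs[job_id] = {"command": command, "state": "idle"}
        self._save()

    def set_state(self, job_id, state):
        self.jobs[job_id]["state"] = state
        self._save()


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "queue.json")
    q = PersistentQueue(path)
    q.submit("1.0", "sim.exe")
    q.set_state("1.0", "running")
    # A second instance stands in for the daemon restarting after a crash.
    recovered = PersistentQueue(path)
    recovered_state = recovered.jobs["1.0"]["state"]
```

The second `PersistentQueue` sees the job as still running, which is the property that lets a restarted daemon pick up where the old one stopped.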
Credential Management
› Authentication in Globus is done with limited-lifetime X.509 proxies
› Proxy may expire before jobs finish executing
› Condor can put jobs on hold and e-mail user to refresh proxy
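The hold-on-expiry idea can be sketched as a simple check. This is only an illustration under stated assumptions: the function name, the job-state strings, and the 5-minute safety margin are hypothetical, and the e-mail step is left out.

```python
from datetime import datetime, timedelta


def jobs_to_hold(jobs, proxy_expires_at, now, safety_margin=timedelta(minutes=5)):
    """Return ids of unfinished jobs to hold because the proxy is about to expire.

    `jobs` maps a job id to its state string. The safety margin is an
    assumed value, not one taken from Condor-G.
    """
    if proxy_expires_at - now > safety_margin:
        return []
    return [job_id for job_id, state in jobs.items() if state != "completed"]


jobs = {"1.0": "running", "1.1": "completed", "1.2": "idle"}
expiry = datetime(2001, 1, 1, 12, 0)
held_early = jobs_to_hold(jobs, expiry, now=datetime(2001, 1, 1, 9, 0))
held_late = jobs_to_hold(jobs, expiry, now=datetime(2001, 1, 1, 11, 58))
```

Three hours before expiry nothing is held; two minutes before expiry the running and idle jobs are selected, and the user would then be asked to refresh the proxy.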
How It Works
[Diagram, built up over five slides. Labels: Personal Condor (Schedd, GridManager, 600 Globus jobs) and Globus Resource (JobManager, LSF, User Job). Components appear in this order: Schedd and LSF; 600 Globus jobs; GridManager; JobManager; User Job.]
Globus Universe
› Disadvantages
    No matchmaking or dynamic scheduling of jobs
    No job checkpoint or migration
    No remote system calls
Solution: GlideIn
› Use the Globus Universe to run the Condor daemons on Globus resources
› When the resources run these GlideIn jobs, they will join your Condor Pool
› Submit your jobs as Standard or Vanilla Universe jobs and they will be matched and run on the Globus resources
How It Works
[Diagram, built up over seven slides. Labels: Personal Condor (Schedd, Collector, GridManager, 600 Condor jobs, glide-ins) and Globus Resource (JobManager, LSF, Startd, User Job). Components appear in this order: Schedd, LSF, Collector, and 600 Condor jobs; glide-ins; GridManager; JobManager; Startd; User Job.]
GlideIn Concerns
› What if a Globus resource kills my GlideIn?
    That resource will disappear from your pool and your jobs will be rescheduled on other machines
› What if all my jobs are done before a GlideIn runs?
    If the glided-in Condor daemons are not matched with a job in 10 minutes, they terminate
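The 10-minute rule can be written as a small decision function. This is a sketch, not the glide-in daemons' actual logic: the function and parameter names are hypothetical, and restarting the idle timer after each match is an assumption the slide does not confirm.

```python
IDLE_LIMIT_SECONDS = 10 * 60  # the 10-minute limit stated on the slide


def should_terminate(started_at, last_matched_at, now, idle_limit=IDLE_LIMIT_SECONDS):
    """Decide whether glided-in daemons should exit for lack of work.

    All times are in seconds. `last_matched_at` is None if no job has been
    matched yet. Assumption: the idle timer restarts after each match.
    """
    idle_since = started_at if last_matched_at is None else last_matched_at
    return now - idle_since >= idle_limit


never_matched_early = should_terminate(started_at=0, last_matched_at=None, now=599)
never_matched_late = should_terminate(started_at=0, last_matched_at=None, now=600)
recently_matched = should_terminate(started_at=0, last_matched_at=500, now=900)
```

A glide-in that has waited 600 seconds without a match exits and frees the Globus resource, while one matched 400 seconds ago keeps running.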
[Diagram, built up over eight slides. Labels: your workstation, personal Condor, 600 Condor jobs, glide-ins, Group Condor, and a Globus Grid with PBS, LSF, and Condor.]
Current Status
› First version of GridManager ready
    Runs jobs using Globus GRAM
    Stages executable and standard I/O using Globus GASS
› Jobmanager changes will be folded into a future release of Globus
› Credential management in progress
Future Work
› GridManager
    Stage user jobs’ data files
› Automatic GlideIn
    Condor creates GlideIn jobs when more resources are needed
› Matchmaking in Globus Universe
    Use Globus GRIS to create ClassAds for Globus resources and match them to job ClassAds
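The matchmaking item can be pictured with a deliberately simplified model. Real ClassAds are expressions evaluated by Condor's matchmaker, with requirements and ranking on both sides; here a resource ad is a plain dictionary and a job's requirements are a Python predicate. The site names and attribute values are made up for illustration.

```python
def match(job_ad, resource_ads):
    """Return the names of resources whose attributes satisfy the job's requirements."""
    return [ad["Name"] for ad in resource_ads if job_ad["Requirements"](ad)]


# Hypothetical resource ads, standing in for ads built from GRIS data.
resources = [
    {"Name": "site-a", "Arch": "INTEL", "Memory": 256},
    {"Name": "site-b", "Arch": "SUN4u", "Memory": 512},
    {"Name": "site-c", "Arch": "INTEL", "Memory": 64},
]

# Hypothetical job ad: needs an INTEL machine with at least 128 MB of memory.
job = {
    "Cmd": "sim.exe",
    "Requirements": lambda ad: ad["Arch"] == "INTEL" and ad["Memory"] >= 128,
}

matched = match(job, resources)
```

Only `site-a` satisfies both conditions in this example. The sketch is one-sided: it ignores any requirements a resource might place on the jobs it accepts.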
Questions and Thank You!