Page 1: Condor-G Making Condor Grid Enabled

Jaime Frey
Computer Sciences Department
University of Wisconsin-Madison

www.cs.wisc.edu/condor

Condor-G: Making Condor Grid Enabled

Outline

› Why use Condor-G

› Globus Universe

› GlideIn

› Status & Future Work

What is Condor-G?

› Extensions to Condor to allow access to the Grid through Globus

› Two parts:
  › Globus Universe
  › GlideIn

Why Use Condor-G

› Condor
  › Designed to run jobs within a single administrative domain

› Globus
  › Designed to run jobs across many administrative domains

› Condor-G
  › Combines the strengths of both

Condor-G Helps Condor Users

› Machines available to Condor users are limited to:
  › The local Condor pool
  › Friendly Condor pools (via flocking)

› Through Globus, many more machines become available to run your jobs

Condor-G Helps Globus Users

› Globus is primarily an infrastructure upon which to develop distributed applications

› Command-line tools are limited

› Some users don’t want to rewrite their applications to use Globus

› Condor-G provides them a powerful interface to the Grid to run their existing applications

Globus Universe

› Advantages of using Condor as a front-end to Globus:
  › Full-featured queuing service
  › Fault tolerance
  › Credential management

Full-Featured Queue

› Persistent queue

› Many queue-manipulation tools

› Set up job dependencies (DAGMan)

› E-mail notification of events

› Log files
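The DAGMan item can be illustrated with a small Python sketch. It assumes a DAG file that uses only the `JOB name submit-file` and `PARENT ... CHILD ...` lines of DAGMan's input syntax, and it groups jobs into waves in which every job's parents appear in earlier waves. The function names `parse_dag` and `submission_waves` are hypothetical; real DAGMan submits each node as soon as its own parents finish and supports many more features.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def parse_dag(text):
    """Parse a minimal subset of DAGMan syntax: JOB and PARENT ... CHILD ... lines."""
    jobs = {}   # job name -> submit file
    deps = {}   # job name -> set of parent job names
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword = words[0].upper()
        if keyword == "JOB":
            name, submit_file = words[1], words[2]
            jobs[name] = submit_file
            deps.setdefault(name, set())
        elif keyword == "PARENT":
            child_at = [w.upper() for w in words].index("CHILD")
            parents, children = words[1:child_at], words[child_at + 1:]
            for child in children:
                deps.setdefault(child, set()).update(parents)
    return jobs, deps


def submission_waves(deps):
    """Group jobs so that every job's parents are in an earlier wave."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # also raises CycleError if the DAG has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave completed
    return waves
```

For a "diamond" DAG in which A must finish before B and C, and both must finish before D, the waves are `[["A"], ["B", "C"], ["D"]]`.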

Fault-Tolerance

› Local crash
  › Queue state kept on disk
  › Condor Master restarts other daemons

› Remote crash
  › Condor will resubmit jobs
  › Globus jobmanager enhanced to improve recoverability
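A minimal sketch of the two recovery paths, assuming a toy JSON file as the on-disk queue (the real schedd keeps its own job queue log format). A local crash is survived because every state change is written to disk immediately, using a write-then-rename so the file is never half-written; a remote failure sends the job back to the idle state so it is resubmitted. The class name, the JSON layout, and the `max_attempts` limit that moves a job to `HELD` are illustrative assumptions, not behavior described in the talk.

```python
import json
import os
import tempfile


class PersistentQueue:
    """Toy job queue whose state is kept on disk so it survives a local crash."""

    def __init__(self, path):
        self.path = path
        self.jobs = {}
        if os.path.exists(path):  # restarting after a crash: reload saved state
            with open(path) as f:
                self.jobs = json.load(f)

    def _save(self):
        # Write a temporary file in the same directory, then atomically
        # replace the old queue file so a crash never leaves it half-written.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.jobs, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)

    def submit(self, job_id, executable):
        self.jobs[job_id] = {"executable": executable, "status": "IDLE", "attempts": 0}
        self._save()

    def mark_running(self, job_id):
        job = self.jobs[job_id]
        job["status"] = "RUNNING"
        job["attempts"] += 1
        self._save()

    def remote_failed(self, job_id, max_attempts=3):
        """After a remote crash, return the job to IDLE so it is resubmitted,
        or hold it once the (hypothetical) retry limit is reached."""
        job = self.jobs[job_id]
        job["status"] = "IDLE" if job["attempts"] < max_attempts else "HELD"
        self._save()
        return job["status"]
```

Creating a second `PersistentQueue` on the same path after the first one "crashes" recovers every job exactly as it was last recorded.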

Credential Management

› Authentication in Globus is done with limited-lifetime X.509 proxies

› A proxy may expire before the jobs using it finish executing

› Condor can put jobs on hold and e-mail the user to refresh the proxy
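The hold-and-notify policy can be sketched as follows. This is a hypothetical model: the 30-minute `margin`, the `Job` record, and the `notify` callback (standing in for the e-mail) are assumptions for illustration, not Condor's actual thresholds or interfaces.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    job_id: str
    status: str = "RUNNING"


def check_proxy(proxy_expires, now, jobs, notify, margin=timedelta(minutes=30)):
    """Hold unfinished jobs and notify the user if the proxy is close to expiring.

    Returns the ids of the jobs that were put on hold.
    """
    if proxy_expires - now > margin:
        return []  # enough lifetime left; nothing to do
    held = []
    for job in jobs:
        if job.status in ("IDLE", "RUNNING"):
            job.status = "HELD"
            held.append(job.job_id)
    if held:
        notify(f"Your proxy expires at {proxy_expires:%Y-%m-%d %H:%M}; "
               f"jobs {', '.join(held)} are on hold until you refresh it.")
    return held
```

Holding jobs before the proxy expires, rather than letting them fail, keeps them in the queue so they can continue once the user supplies a fresh proxy.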

How It Works

[Diagram: Personal Condor (Schedd) and a Globus Resource (LSF)]

How It Works

[Diagram: Personal Condor (Schedd) and a Globus Resource (LSF); 600 Globus jobs]

How It Works

[Diagram: Personal Condor (Schedd, GridManager) and a Globus Resource (LSF); 600 Globus jobs]

How It Works

[Diagram: Personal Condor (Schedd, GridManager) and a Globus Resource (JobManager, LSF); 600 Globus jobs]

How It Works

[Diagram: Personal Condor (Schedd, GridManager) and a Globus Resource (JobManager, LSF, User Job); 600 Globus jobs]
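The flow in these diagrams starts from an ordinary submit description file. The Python helper below only builds that text; it assumes the Condor-G syntax of this era (`universe = globus` plus a `globusscheduler` contact string naming the gatekeeper and jobmanager), and the host `gatekeeper.example.edu` and the function name are placeholders. Later Condor releases replaced this with `universe = grid` and `grid_resource`, so check the manual for your version.

```python
def globus_submit_description(executable, contact, count, log="globus_jobs.log"):
    """Return the text of a Condor-G submit description for `count` Globus universe jobs.

    `contact` is the gatekeeper/jobmanager string, e.g. "host/jobmanager-lsf".
    """
    lines = [
        "universe        = globus",
        f"globusscheduler = {contact}",
        f"executable      = {executable}",
        "output          = out.$(Process)",  # one output file per job
        "error           = err.$(Process)",
        f"log             = {log}",          # Condor's user log of job events
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


print(globus_submit_description("my_program", "gatekeeper.example.edu/jobmanager-lsf", 600))
```

The resulting text would be saved to a file and passed to `condor_submit`; `$(Process)` expands to each job's process number, so the 600 jobs write distinct output files.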

Globus Universe

› Disadvantages:
  › No matchmaking or dynamic scheduling of jobs
  › No job checkpoint or migration
  › No remote system calls

Solution: GlideIn

› Use the Globus Universe to run the Condor daemons on Globus resources

› When the resources run these GlideIn jobs, they will join your Condor Pool

› Submit your jobs as Standard or Vanilla Universe jobs and they will be matched and run on the Globus resources
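A toy model of the GlideIn sequence, assuming one job per glide-in slot and first-come, first-served matching (real Condor matchmaking evaluates ClassAd requirements and rank). The `Pool` class and its method names are hypothetical; `glidein_started` stands for a glided-in startd advertising itself to your collector.

```python
from collections import deque


class Pool:
    """Toy pool: glide-in startds join, and queued jobs are matched to free slots."""

    def __init__(self):
        self.slots = {}           # startd name -> running job id (None if unclaimed)
        self.idle_jobs = deque()  # standard/vanilla jobs waiting to run

    def submit(self, job_id):
        self.idle_jobs.append(job_id)
        self.negotiate()

    def glidein_started(self, startd_name):
        # A glide-in job began running on a Globus resource; its startd
        # advertises itself to the collector and becomes a new, unclaimed slot.
        self.slots[startd_name] = None
        self.negotiate()

    def job_finished(self, startd_name):
        self.slots[startd_name] = None
        self.negotiate()

    def negotiate(self):
        # First-come, first-served: hand the oldest idle job to each free slot.
        for name, job in self.slots.items():
            if job is None and self.idle_jobs:
                self.slots[name] = self.idle_jobs.popleft()
```

Jobs submitted before any glide-in runs simply wait; each glide-in that starts picks up the next queued job, and a slot that finishes a job immediately takes another.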

How It Works

[Diagram: Personal Condor (Schedd, Collector) and a Globus Resource (LSF); 600 Condor jobs]

How It Works

[Diagram: Personal Condor (Schedd, Collector) and a Globus Resource (LSF); 600 Condor jobs; glide-ins]

How It Works

[Diagram: Personal Condor (Schedd, Collector, GridManager) and a Globus Resource (LSF); 600 Condor jobs; glide-ins]

How It Works

[Diagram: Personal Condor (Schedd, Collector, GridManager) and a Globus Resource (JobManager, LSF); 600 Condor jobs; glide-ins]

How It Works

[Diagram: Personal Condor (Schedd, Collector, GridManager) and a Globus Resource (JobManager, LSF, Startd); 600 Condor jobs; glide-ins]

How It Works

[Diagram: Personal Condor (Schedd, Collector, GridManager) and a Globus Resource (JobManager, LSF, Startd, User Job); 600 Condor jobs; glide-ins]

GlideIn Concerns

› What if a Globus resource kills my GlideIn?
  › That resource will disappear from your pool and your jobs will be rescheduled on other machines

› What if all my jobs are done before a GlideIn runs?
  › If the glided-in Condor daemons are not matched with a job in 10 minutes, they terminate
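Both concerns can be sketched in a few lines. The 10-minute limit comes from the slide; measuring it from an `idle_since` timestamp, and the `GlideIn` record and function names, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

IDLE_TIMEOUT = 10 * 60  # seconds: unmatched glide-ins exit after 10 minutes


@dataclass
class GlideIn:
    name: str
    idle_since: float          # time (seconds) the daemons last became idle
    job: Optional[str] = None  # job currently running, if any


def expire_idle(glideins: Dict[str, GlideIn], now: float) -> List[str]:
    """Remove glide-ins that have gone IDLE_TIMEOUT seconds without a match."""
    expired = [g.name for g in glideins.values()
               if g.job is None and now - g.idle_since >= IDLE_TIMEOUT]
    for name in expired:
        del glideins[name]  # the daemons exit, freeing the remote allocation
    return expired


def glidein_killed(glideins: Dict[str, GlideIn], name: str, idle_jobs: List[str]) -> Optional[str]:
    """The remote site killed a glide-in: drop it from the pool and requeue its job."""
    lost = glideins.pop(name)
    if lost.job is not None:
        idle_jobs.append(lost.job)  # the job will be matched to another machine
    return lost.job
```

In Condor generally, a requeued vanilla universe job starts over from the beginning, while a standard universe job can resume from its last checkpoint.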

[Diagram: your workstation running a personal Condor; the Globus Grid, with PBS, LSF, and Condor resources; a group Condor pool]

[Diagram: your workstation running a personal Condor; the Globus Grid, with PBS, LSF, and Condor resources; a group Condor pool; 600 Condor jobs]

[Diagram: your workstation running a personal Condor; the Globus Grid, with PBS, LSF, and Condor resources; a group Condor pool; 600 Condor jobs; glide-ins]

Current Status

› First version of the GridManager is ready
  › Runs jobs using Globus GRAM
  › Stages the executable and standard I/O using Globus GASS

› Jobmanager changes will be folded into a future release of Globus

› Credential management in progress

Future Work

› GridManager
  › Stage user jobs’ data files

› Automatic GlideIn
  › Condor creates GlideIn jobs when more resources are needed

› Matchmaking in Globus Universe
  › Use Globus GRIS to create ClassAds for Globus resources and match them to job ClassAds
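The proposed matchmaking can be illustrated with plain dictionaries standing in for ClassAds. The resource ads below are hypothetical stand-ins for what might be built from GRIS data, the attribute names `OpSys` and `FreeCPUs` are assumptions, and Python lambdas replace the ClassAd expression language; the sketch only shows the two-way `Requirements` check and `Rank`-based selection.

```python
from typing import Callable, Dict, List, Optional

Ad = Dict[str, object]


def matches(job: Ad, resource: Ad) -> bool:
    """Two-way match: each ad's Requirements must accept the other ad."""
    job_req: Callable[[Ad], bool] = job.get("Requirements", lambda other: True)
    res_req: Callable[[Ad], bool] = resource.get("Requirements", lambda other: True)
    return job_req(resource) and res_req(job)


def best_match(job: Ad, resources: List[Ad]) -> Optional[Ad]:
    """Among matching resources, pick the one the job's Rank scores highest."""
    rank = job.get("Rank", lambda other: 0)
    candidates = [r for r in resources if matches(job, r)]
    return max(candidates, key=rank, default=None)


# Hypothetical resource ads, as they might be generated from GRIS data
resources = [
    {"Name": "siteA/jobmanager-lsf", "OpSys": "LINUX", "FreeCPUs": 12},
    {"Name": "siteB/jobmanager-pbs", "OpSys": "LINUX", "FreeCPUs": 40},
    {"Name": "siteC/jobmanager-lsf", "OpSys": "SOLARIS", "FreeCPUs": 100,
     "Requirements": lambda job: job.get("Owner") == "alice"},
]

job = {
    "Owner": "bob",
    "Requirements": lambda r: r["OpSys"] == "LINUX" and r["FreeCPUs"] >= 10,
    "Rank": lambda r: r["FreeCPUs"],  # prefer the site with the most free CPUs
}

print(best_match(job, resources)["Name"])  # siteB/jobmanager-pbs
```

Because the check runs in both directions, a resource can refuse a job (siteC only accepts jobs owned by `alice`) even when the job's own requirements would accept that resource.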

Questions and Thank You!

