Page 1: Grid Computing I CONDOR. 2 Agenda What is condor? What is Condor good for? How condor works? How to submit a job?

Grid Computing I

CONDOR

Agenda

• What is Condor?
• What is Condor good for?
• How does Condor work?
• How do you submit a job?

What is Condor?

• Condor converts collections of distributively owned workstations and dedicated clusters into a distributed high-throughput computing (HTC) facility.

• Condor manages both resources (machines) and resource requests (jobs)

• Condor has several unique mechanisms, such as:
  – ClassAd matchmaking
  – Process checkpoint/restart/migration
  – Remote system calls
  – Grid awareness

How Condor works

Condor provides:
• a job queueing mechanism
• a scheduling policy
• a priority scheme
• resource monitoring, and
• resource management.

Users submit their serial or parallel jobs to Condor. Condor places them in a queue, chooses when and where to run the jobs based on a policy, carefully monitors their progress, and ultimately informs the user upon completion.

Condor Architecture

Condor Daemons in action

condor_master

• Starts up all other Condor daemons
• If there are any problems and a daemon exits, it restarts the daemon and sends email to the administrator
• Checks the time stamps on the binaries of the other Condor daemons; if new binaries appear, the master gracefully shuts down the currently running version and starts the new version
• Also supports various administrative commands, such as starting, stopping, or reconfiguring daemons remotely
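The restart-on-exit behavior amounts to a supervision loop. A minimal Java sketch, assuming a hypothetical `Daemon` interface whose `run()` returns when the daemon exits (true for a clean shutdown) and a callback standing in for the email to the administrator; the restart cap is an assumption added so the sketch always terminates, and the real condor_master's restart policy and binary-timestamp checks are not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a Condor daemon: run() blocks until the daemon exits
// and returns true on a clean shutdown, false if it died.
interface Daemon {
    String name();
    boolean run();
}

// Master-style supervisor: restarts a daemon after each abnormal exit and notifies the admin.
class Supervisor {
    private final Consumer<String> notifyAdmin; // simulates "sends email to the administrator"
    private final int maxRestarts;              // assumption: cap restarts in this sketch

    Supervisor(Consumer<String> notifyAdmin, int maxRestarts) {
        this.notifyAdmin = notifyAdmin;
        this.maxRestarts = maxRestarts;
    }

    /** Runs the daemon, restarting it after each abnormal exit; returns the restart count. */
    int supervise(Daemon d) {
        int restarts = 0;
        while (!d.run()) {
            if (restarts == maxRestarts) {
                notifyAdmin.accept(d.name() + " exited; restart limit reached");
                break;
            }
            restarts++;
            notifyAdmin.accept(d.name() + " exited unexpectedly; restart #" + restarts);
        }
        return restarts;
    }
}

class SupervisorDemo {
    public static void main(String[] args) {
        List<String> mails = new ArrayList<>();
        int[] exits = {0};
        Daemon flaky = new Daemon() { // hypothetical daemon: crashes twice, then shuts down cleanly
            public String name() { return "condor_schedd"; }
            public boolean run() { return ++exits[0] > 2; }
        };
        int restarts = new Supervisor(mails::add, 5).supervise(flaky);
        System.out.println(restarts + " restarts, " + mails.size() + " notifications");
    }
}
```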

condor_startd

• Represents a machine to the Condor system
• Advertises information about the node's resources to the Central Manager (condor_collector)

• Responsible for starting, suspending, and stopping jobs

• Enforces the wishes of the machine owner (the owner’s “policy”)
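The owner's policy is written as expressions (such as `START`) that the startd evaluates against the machine's current state. A minimal Java sketch of a "desktop owner" policy: `KeyboardIdle` and `LoadAvg` are machine ClassAd attributes, but the thresholds, the suspend rule, and the class and method names are assumptions chosen for illustration, not a Condor default.

```java
import java.util.Map;

// Hypothetical owner policy modelled on the idea of START / SUSPEND expressions.
class OwnerPolicy {
    static final long MIN_IDLE_SECONDS = 15 * 60; // assumption: owner away for 15 minutes
    static final double MAX_LOAD = 0.3;           // assumption: machine otherwise quiet

    /** START: may a new job begin on this machine right now? */
    static boolean start(Map<String, Number> machine) {
        return machine.get("KeyboardIdle").longValue() > MIN_IDLE_SECONDS
            && machine.get("LoadAvg").doubleValue() < MAX_LOAD;
    }

    /** SUSPEND: should a running job pause because the owner is back? */
    static boolean suspend(Map<String, Number> machine) {
        return machine.get("KeyboardIdle").longValue() < 60; // assumption: keyboard used in the last minute
    }

    public static void main(String[] args) {
        Map<String, Number> idle = Map.of("KeyboardIdle", 3600, "LoadAvg", 0.05);
        Map<String, Number> busy = Map.of("KeyboardIdle", 5, "LoadAvg", 1.2);
        System.out.println(start(idle) + " " + start(busy) + " " + suspend(busy)); // true false true
    }
}
```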

condor_starter

• Runs only on the execution host
• Sets up the execution environment and monitors the job

condor_schedd

• Represents users to the Condor system
• Maintains the persistent queue of jobs
• Responsible for contacting available machines and sending them jobs
• Services user commands that manipulate the job queue:
  – condor_submit, condor_rm, condor_q, condor_hold, condor_release, condor_prio
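Each of these commands changes the state of a job in the queue. A minimal in-memory Java sketch, assuming a simplified set of states named after Condor's job states (Idle, Running, Removed, Completed, Held); the class and method names are hypothetical, and the real condor_schedd keeps its queue persistent on disk, which is not modelled here. As with condor_prio, a higher priority value is considered first.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

enum JobStatus { IDLE, RUNNING, REMOVED, COMPLETED, HELD }

// Hypothetical in-memory model of a schedd job queue.
class JobQueue {
    static final class Job {
        final int id;
        JobStatus status = JobStatus.IDLE;
        int priority = 0; // higher value = considered first
        Job(int id) { this.id = id; }
    }

    private final Map<Integer, Job> jobs = new LinkedHashMap<>();
    private int nextId = 1;

    int submit() { Job j = new Job(nextId++); jobs.put(j.id, j); return j.id; } // condor_submit
    void remove(int id) { jobs.get(id).status = JobStatus.REMOVED; }             // condor_rm
    void hold(int id) { jobs.get(id).status = JobStatus.HELD; }                  // condor_hold
    void release(int id) {                                                       // condor_release
        Job j = jobs.get(id);
        if (j.status == JobStatus.HELD) j.status = JobStatus.IDLE;
    }
    void prio(int id, int p) { jobs.get(id).priority = p; }                      // condor_prio

    /** condor_q-style listing of jobs still in the queue. */
    List<String> q() {
        return jobs.values().stream()
                .filter(j -> j.status != JobStatus.REMOVED && j.status != JobStatus.COMPLETED)
                .map(j -> j.id + " " + j.status + " prio=" + j.priority)
                .collect(Collectors.toList());
    }

    /** Idle job to place next: highest priority first, then the oldest (lowest id). */
    Optional<Integer> nextIdle() {
        return jobs.values().stream()
                .filter(j -> j.status == JobStatus.IDLE)
                .max(Comparator.comparingInt((Job j) -> j.priority).thenComparingInt(j -> -j.id))
                .map(j -> j.id);
    }

    public static void main(String[] args) {
        JobQueue queue = new JobQueue();
        int first = queue.submit();
        int second = queue.submit();
        queue.prio(second, 10);
        queue.hold(first);
        System.out.println(queue.q()); // [1 HELD prio=0, 2 IDLE prio=10]
    }
}
```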

condor_collector

• Collects information from all other Condor daemons in the pool
  – “Directory Service” / database for a Condor pool
• Each daemon sends a periodic update called a “ClassAd” to the collector
• Services queries for information:
  – Queries from other Condor daemons
  – Queries from users (condor_status)
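The collector behaves like a directory keyed by daemon name and refreshed by the periodic ClassAd updates. A minimal Java sketch, assuming ads are plain attribute maps and that an ad not refreshed within a hypothetical `lifetimeSeconds` window is treated as stale; `MyType` and `State` are real ClassAd attribute names, but the class and its methods are illustrative, not Condor's API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical collector: keeps the latest ClassAd from each daemon and answers queries.
class Collector {
    private static final class Entry {
        final Map<String, Object> ad;
        final long receivedAt;
        Entry(Map<String, Object> ad, long receivedAt) { this.ad = ad; this.receivedAt = receivedAt; }
    }

    private final Map<String, Entry> ads = new HashMap<>();
    private final long lifetimeSeconds; // assumption: older ads are ignored by queries

    Collector(long lifetimeSeconds) { this.lifetimeSeconds = lifetimeSeconds; }

    /** Periodic update: a newer ad replaces the previous one from the same daemon. */
    void update(String name, Map<String, Object> ad, long nowSeconds) {
        ads.put(name, new Entry(ad, nowSeconds));
    }

    /** condor_status-style query: sorted names of fresh ads that satisfy the constraint. */
    List<String> query(Predicate<Map<String, Object>> constraint, long nowSeconds) {
        return ads.entrySet().stream()
                .filter(e -> nowSeconds - e.getValue().receivedAt <= lifetimeSeconds)
                .filter(e -> constraint.test(e.getValue().ad))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Collector collector = new Collector(900);
        collector.update("slot1@node1", Map.of("MyType", "Machine", "State", "Unclaimed"), 0);
        collector.update("schedd@submit", Map.of("MyType", "Scheduler"), 0);
        System.out.println(collector.query(ad -> "Machine".equals(ad.get("MyType")), 60)); // [slot1@node1]
    }
}
```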

condor_negotiator

• Performs “matchmaking” in Condor
• Gets information from the collector about all available machines and all idle jobs
• Tries to match jobs with machines that will serve them
• Both the job and the machine must satisfy each other’s requirements
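The two-way requirement check is the core of ClassAd matchmaking. A minimal Java sketch, assuming each ad is an attribute map plus a `requirements` predicate over the other party's attributes, and a job-side `rank` function to prefer one acceptable machine over another; the `Ad` and `Negotiator` classes are hypothetical, and real ClassAds are a full expression language that this does not reproduce.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

// Hypothetical, simplified ClassAd: attributes plus a Requirements test on the other party.
class Ad {
    final String name;
    final Map<String, Object> attrs;
    final Predicate<Map<String, Object>> requirements;

    Ad(String name, Map<String, Object> attrs, Predicate<Map<String, Object>> requirements) {
        this.name = name;
        this.attrs = attrs;
        this.requirements = requirements;
    }
}

class Negotiator {
    /** A match needs BOTH sides: the job accepts the machine AND the machine accepts the job. */
    static boolean matches(Ad job, Ad machine) {
        return job.requirements.test(machine.attrs) && machine.requirements.test(job.attrs);
    }

    /** Among mutually acceptable machines, picks the one the job ranks highest. */
    static Optional<Ad> bestMatch(Ad job, List<Ad> machines, ToDoubleFunction<Map<String, Object>> rank) {
        return machines.stream()
                .filter(m -> matches(job, m))
                .max(Comparator.comparingDouble(m -> rank.applyAsDouble(m.attrs)));
    }

    public static void main(String[] args) {
        // Job wants Memory >= 32 on LINUX and prefers more memory.
        Ad job = new Ad("job.0", Map.of("Owner", "alice"),
                m -> (Integer) m.get("Memory") >= 32 && "LINUX".equals(m.get("OpSys")));
        Ad small = new Ad("node1", Map.of("Memory", 16, "OpSys", "LINUX"), j -> true);
        Ad picky = new Ad("node2", Map.of("Memory", 512, "OpSys", "LINUX"), j -> !"alice".equals(j.get("Owner")));
        Ad big = new Ad("node3", Map.of("Memory", 256, "OpSys", "LINUX"), j -> true);
        bestMatch(job, List.of(small, picky, big), m -> (Integer) m.get("Memory"))
                .ifPresent(m -> System.out.println("matched " + m.name)); // matched node3
    }
}
```

Note that node2 has the most memory but is rejected because its own requirements exclude this job's owner, which is exactly the "each other's requirements" rule.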

Job Life Cycle in Condor

• Job submission: A job is submitted from a host using the condor_submit command.

• Job request advertising: On receiving a job request, the condor_schedd daemon on the submission host advertises the request to the condor_collector.

• Resource advertising: Each condor_startd daemon running on an execution host advertises that host's available resources to the condor_collector.

Job life cycle (Cont…)

• Resource matching: The condor_negotiator daemon queries the condor_collector daemon to find a resource matching a user's job request. It then informs the condor_schedd on the submission host of the matched host.

• Job execution: The condor_schedd on the submission host contacts the condor_startd daemon running on the matched host, which spawns a condor_starter daemon. The condor_schedd on the submission host also spawns a condor_shadow daemon.

• Return output: When the job completes, the results are sent back to the submission host.
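The steps can be traced end to end in order. A minimal single-process Java sketch, assuming hypothetical log messages in place of real daemon communication; in a real pool each daemon is a separate process talking over the network, and the two advertising steps run independently and repeatedly rather than once per job.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical trace of the job life cycle; each entry records which daemon acts.
class LifeCycleTrace {
    static List<String> run(String job, String submitHost, String execHost) {
        List<String> log = new ArrayList<>();
        log.add("condor_submit: " + job + " queued on " + submitHost);                       // job submission
        log.add("condor_schedd@" + submitHost + " -> condor_collector: request for " + job); // job request advertising
        log.add("condor_startd@" + execHost + " -> condor_collector: resources available");  // resource advertising
        log.add("condor_negotiator: matched " + job + " to " + execHost
                + ", notifying condor_schedd@" + submitHost);                                 // resource matching
        log.add("condor_startd@" + execHost + " spawns condor_starter");                     // execution, execute side
        log.add("condor_schedd@" + submitHost + " spawns condor_shadow");                    // execution, submit side
        log.add("results of " + job + " returned to " + submitHost);                         // return output
        return log;
    }

    public static void main(String[] args) {
        run("job 1.0", "submit01", "node07").forEach(System.out::println);
    }
}
```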

Condor Universes

• A universe in Condor defines an execution environment

• Condor can support various combinations of features/environments in different “Universes”

• Different Universes provide different functionality for your job

Condor Universes

• Serial jobs
  – Vanilla universe
  – Standard universe
• Scheduler universe
• Parallel jobs
  – PVM universe
  – MPI universe
• Java universe
• Globus universe

Vanilla universe

• Intended for programs that cannot be re-linked

• The existing executable can be used without re-compiling or re-linking

• Cannot use remote system calls
• No checkpointing, no migration
• The job can be suspended or restarted

Standard universe

• Checkpointing and automatic migration for sequential jobs

• The existing program must be re-linked with the Condor instrumentation library

• The application cannot use some system calls (fork, socket, alarm)

• Intercepts file operations and passes them back to the shadow process

Scheduler Universe

• The job does not wait to be matched to a machine; instead, it executes right away on the machine where it was submitted

• Machine requirements are not considered
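The properties listed for the vanilla, standard and scheduler universes can be summarized as data, together with where a newly submitted job goes. A minimal Java sketch; the enum and its field and method names are hypothetical, the flags restate only what the slides above say, and the parallel, Java and Globus universes are left out.

```java
// Serial/local universes as described above; names and wording are illustrative.
enum Universe {
    //         relink  checkpoint+migration  remoteSyscalls  matchmaking
    VANILLA   (false,  false,                false,          true),
    STANDARD  (true,   true,                 true,           true),
    SCHEDULER (false,  false,                false,          false);

    final boolean needsRelink;
    final boolean checkpointing;
    final boolean remoteSyscalls;
    final boolean matchmaking;

    Universe(boolean needsRelink, boolean checkpointing, boolean remoteSyscalls, boolean matchmaking) {
        this.needsRelink = needsRelink;
        this.checkpointing = checkpointing;
        this.remoteSyscalls = remoteSyscalls;
        this.matchmaking = matchmaking;
    }

    /** Where a newly submitted job of this universe goes. */
    String placement(String submitHost) {
        return matchmaking
                ? "idle in the queue until the negotiator finds a matching machine"
                : "starts immediately on " + submitHost + ", machine requirements ignored";
    }

    public static void main(String[] args) {
        for (Universe u : values()) {
            System.out.println(u + ": " + u.placement("submit01"));
        }
    }
}
```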

PVM universe

• Used to run parallel jobs written for PVM 3.4

MPI universe

• MPICH applications can be used without any changes
• Dynamic changes are not supported
• The application cannot be suspended

Java Universe

• A submitted program can run on any machine with a JVM, regardless of its location, owner, or JVM version

• Condor takes care of all the details, such as finding the JVM binary and setting the classpath
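Conceptually, the Java universe turns the job's `Arguments` into a JVM invocation on the execute machine, with the first argument naming the main class (as in this deck's submit file example). A minimal Java sketch, assuming a hypothetical JVM path and a hypothetical scratch directory holding the transferred `.class` file used as the classpath; the exact command line Condor builds is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reconstruction of a JVM command line for a Java-universe job.
class JavaUniverseLauncher {
    /**
     * @param javaBinary JVM located on the execute machine (assumed path)
     * @param scratchDir directory holding the transferred .class files, used as the classpath
     * @param arguments  the job's Arguments string; its first word is the main class
     */
    static List<String> command(String javaBinary, String scratchDir, String arguments) {
        List<String> args = Arrays.asList(arguments.trim().split("\\s+"));
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBinary);
        cmd.add("-classpath");
        cmd.add(scratchDir);
        cmd.addAll(args); // main class first, then the program's own arguments
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = command("/usr/bin/java", "/scratch/job1", "Simple 4 10");
        System.out.println(String.join(" ", cmd)); // /usr/bin/java -classpath /scratch/job1 Simple 4 10
        // On a real machine, new ProcessBuilder(cmd).inheritIO().start() would launch it.
    }
}
```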

Globus Universe

• Provides standard Condor interface to Globus users

• Each job submission file is translated into Globus RSL (Resource Specification Language)

• Jobs are submitted to Globus via the GRAM protocol
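The translation can be pictured as rewriting submit-file commands into RSL attribute relations of the form `&(attribute=value)...`. A minimal Java sketch, assuming a small, hypothetical mapping from five submit commands (keys already lower-cased) to the RSL attributes `executable`, `arguments`, `stdin`, `stdout` and `stderr`; Condor's real translation covers far more, and quote escaping is not handled here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical, simplified submit-file-to-RSL translation (not Condor's actual code).
class RslTranslator {
    // Assumed mapping from submit commands to RSL attribute names, in output order.
    private static final Map<String, String> SUBMIT_TO_RSL = new LinkedHashMap<>();
    static {
        SUBMIT_TO_RSL.put("executable", "executable");
        SUBMIT_TO_RSL.put("arguments", "arguments");
        SUBMIT_TO_RSL.put("input", "stdin");
        SUBMIT_TO_RSL.put("output", "stdout");
        SUBMIT_TO_RSL.put("error", "stderr");
    }

    private static String quote(String value) {
        return "\"" + value + "\""; // embedded quotes are not escaped in this sketch
    }

    static String toRsl(Map<String, String> submit) {
        StringBuilder rsl = new StringBuilder("&");
        for (Map.Entry<String, String> e : SUBMIT_TO_RSL.entrySet()) {
            String value = submit.get(e.getKey());
            if (value == null) continue; // command absent from the submit file
            String rendered = e.getKey().equals("arguments")
                    ? Stream.of(value.trim().split("\\s+")).map(RslTranslator::quote).collect(Collectors.joining(" "))
                    : quote(value); // each argument becomes its own quoted value
            rsl.append('(').append(e.getValue()).append('=').append(rendered).append(')');
        }
        return rsl.toString();
    }

    public static void main(String[] args) {
        Map<String, String> submit = new LinkedHashMap<>();
        submit.put("executable", "/bin/foo");
        submit.put("arguments", "4 10");
        submit.put("output", "foo.out");
        System.out.println(toRsl(submit)); // &(executable="/bin/foo")(arguments="4" "10")(stdout="foo.out")
    }
}
```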

Submitting a job

• Write a Java class and compile it:

public class Simple {
    public static void main(String[] args) {
        // ....
    }
}

Submitting a job (Cont…)

• Create a submit file. Name this file submit.java:

Universe   = java
Executable = Simple.class
Arguments  = Simple 4 10
Log        = simple.log
Output     = simple.out
Error      = simple.error
Queue
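condor_submit reads the description as `command = value` lines, and `Queue` places the job in the queue. A minimal Java sketch of parsing such a file, assuming only simple one-line commands, case-insensitive command names, `#` comment lines, and a bare or numeric `Queue` line; multiple `Queue` statements are simply summed here, and macros and the rest of the real syntax are ignored. The `SubmitFile` class is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical minimal parser for simple submit description files.
class SubmitFile {
    final Map<String, String> commands = new LinkedHashMap<>(); // command names lower-cased
    int queueCount = 0;

    static SubmitFile parse(List<String> lines) {
        SubmitFile sf = new SubmitFile();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;      // blank lines and comments
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.equals("queue") || lower.startsWith("queue ")) { // "Queue" or "Queue N"
                String n = line.substring(5).trim();
                sf.queueCount += n.isEmpty() ? 1 : Integer.parseInt(n);
                continue;
            }
            int eq = line.indexOf('='); // first '=' separates command from value
            if (eq < 0) throw new IllegalArgumentException("not a command: " + raw);
            sf.commands.put(line.substring(0, eq).trim().toLowerCase(Locale.ROOT),
                            line.substring(eq + 1).trim());
        }
        return sf;
    }

    public static void main(String[] args) {
        SubmitFile sf = parse(List.of("Universe = java", "Executable = Simple.class",
                "Arguments = Simple 4 10", "Log = simple.log", "Queue"));
        System.out.println(sf.commands + " x" + sf.queueCount);
    }
}
```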

Submitting a job (Cont…)

Example job description file

Universe     = vanilla
Executable   = foo
Requirements = Memory >= 32 && OpSys == "LINUX" && Arch == "INTEL"
Image_Size   = 28 Meg
Error        = err.$(Process)
Input        = in.$(Process)
Output       = out.$(Process)
Log          = foo.log
Queue 150
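`Queue 150` creates 150 jobs whose `$(Process)` values run from 0 to 149, so each job reads its own `in.N` and writes its own `out.N` and `err.N`, while all of them share `foo.log`. A minimal Java sketch of that expansion, assuming the commands are already parsed into a map; only the `$(Process)` macro is substituted, and the `ProcessExpansion` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of how "Queue N" fans one description out into N jobs via $(Process).
class ProcessExpansion {
    static List<Map<String, String>> expand(Map<String, String> commands, int queueCount) {
        List<Map<String, String>> jobs = new ArrayList<>();
        for (int proc = 0; proc < queueCount; proc++) { // process numbers start at 0
            Map<String, String> job = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : commands.entrySet()) {
                job.put(e.getKey(), e.getValue().replace("$(Process)", Integer.toString(proc)));
            }
            jobs.add(job);
        }
        return jobs;
    }

    public static void main(String[] args) {
        Map<String, String> cmds = new LinkedHashMap<>();
        cmds.put("Executable", "foo");
        cmds.put("Input", "in.$(Process)");
        cmds.put("Output", "out.$(Process)");
        cmds.put("Error", "err.$(Process)");
        cmds.put("Log", "foo.log");
        List<Map<String, String>> jobs = expand(cmds, 150);
        System.out.println(jobs.size() + " jobs; last reads " + jobs.get(149).get("Input")); // 150 jobs; last reads in.149
    }
}
```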

Current Limitations

• Limitations on jobs that can be checkpointed
• Jobs need to be re-linked to get checkpointing and remote system calls

Summary

• A specialized resource management (batch) system
  – Distributed, heterogeneous system
  – Goal: exploitation of spare computing cycles
  – It can migrate jobs from one machine to another
  – The ClassAds mechanism is used to match resource requirements and resources

References

• This presentation was prepared from the material provided by the Condor Project Team

http://www.cs.wisc.edu/condor/

