15 Condor – A Distributed Job Scheduler
Todd Tannenbaum, Derek Wright, Karen Miller, and Miron Livny
Beowulf Cluster Computing with Linux, Thomas Sterling, editor, Oct. 2001.
Summarized by Simon Kim
Contents
• Introduction to Condor
• Using Condor
• Condor Architecture
• Installing Condor under Linux
• Configuring Condor
• Administration Tools
• Cluster Setup Scenarios
Introduction to Condor
• Distributed Job Scheduler
• Condor Research Project at the University of Wisconsin-Madison Department of Computer Sciences
• Changed Name to HTCondor in 2012
– http://research.cs.wisc.edu/htcondor
Introduction to Condor
[Diagram: the user submits a job and a policy to the Condor queue; Condor runs the job on idle nodes, monitors its progress, and reports back when the job completes.]
Introduction to Condor
• Workload Management System
• Job Queuing Mechanism
• Scheduling Policy
• Priority Scheme
• Resource Monitoring and Management
Condor Features
• Distributed Submission
• User/Job Priorities
• Job Dependency – DAG
• Multiple Job Models – Serial/Parallel Jobs
• ClassAds – Job : Machine Matchmaking
• Job Checkpoint and Migration
• Remote System Calls – Seamless I/O Redirection
• Grid Computing – Interaction with Globus
ClassAds and Matchmaking
• Job ClassAd
– Looking for a Machine
– Requirements: Intel, Linux, Disk Space, …
– Rank: Memory, Kflops, …
• Machine ClassAd
– Looking for a Job
– Requirements
– Rank
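A job's requirements and rank can be written as expressions in its submit description file, from which Condor builds the job ClassAd. A minimal sketch (Arch, OpSys, Disk, Memory, and KFlops are standard Condor machine-ad attributes; the disk threshold is a hypothetical value):

```
# Hypothetical job ClassAd expressions in a submit description file
requirements = (Arch == "INTEL") && (OpSys == "LINUX") && (Disk >= 10000)
rank         = Memory + KFlops
```

The matchmaker pairs this job only with machines whose own Requirements accept it, preferring the machine for which the job's Rank expression evaluates highest.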
Using Condor
• Roadmap to Using Condor
• Submitting a Job
• User Commands
• Universes
• Standard Universe
– Process Checkpointing
– Remote System Calls
– Relinking
– Limitations
• Data File Access
• DAGMan Scheduler
Using Condor
[Diagram: a batch job with STDIN, STDOUT, and STDERR is described by a submit description file and runs in one of Condor's universes (runtime environments): Standard, Vanilla, PVM, MPI, Grid, or Scheduler.]

Submit Description:

```
universe   = vanilla
executable = foo
log        = foo.log
input      = input.data
output     = output.data
queue
```
[Diagram: prepare a job (Serial Job, Parallel Job, or Meta-Scheduler Job), write a submit description, then submit it:]

$ condor_submit
Status of Submitted Jobs
• Summary by Submitter
– $ condor_status -submitters
• All Jobs in the Queue
– $ condor_q
• Removing a Job
– $ condor_rm 350.0
• Changing Job Priority: -20 (low) ~ +20 (high), default: 0
– $ condor_prio -p -15 350.1
Universes
• Execution Environment – Universe
• Vanilla Universe
– Serial Jobs
– Binary Executables and Scripts
• MPI Universe
– MPI Programs
– Parallel Jobs
– Only on Dedicated Resources

```
# Submit Description
universe      = mpi
…
machine_count = 8
queue
```
Universes
• PVM Universe
– Master-Worker Style Parallel Programs Written for the Parallel Virtual Machine Interface
– Both Dedicated and Non-Dedicated Resources (Workstations)
– Condor Acts as Resource Manager for the PVM Daemon (via pvm_addhosts())
– Dynamic Node Allocation

```
# Submit Description
universe      = pvm
…
machine_count = 1..75
queue
```
Universes
• Scheduler Universe
– Meta-Scheduler
– DAGMan Scheduler
• Complex Interdependencies Between Jobs

[Diagram: Job Sequence A -> B and C -> D, where B and C are executed in parallel.]
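This diamond-shaped dependency can be expressed as a DAGMan input file; the submit-file names (a.submit, etc.) are hypothetical, and the DAG is handed to Condor with condor_submit_dag:

```
# diamond.dag — one Job line per node, Parent/Child lines for the edges
Job A a.submit
Job B b.submit
Job C c.submit
Job D d.submit
Parent A Child B C
Parent B C Child D
```

DAGMan itself runs as a scheduler-universe job and releases B and C only after A completes successfully, then D after both B and C finish.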
Universes
• Standard Universe
– Serial Jobs
– Process Checkpoint, Restart, and Migration
– Remote System Calls
Process Checkpointing
• Checkpoint
– Snapshot of the Program's Current State
– Preemptive-Resume Scheduling
– Periodic Checkpoints – Fault Tolerance
– No Program Source Code Change
• Relinking with the Condor System Call Library
– Signal Handler
• Process State Written to a Local/Network File
• Stack/Data Segments, CPU State, Open Files, Signal Handlers and Pending Signals
– Optional Checkpoint Server
• Checkpoint Repository
Remote System Calls
• Redirects File I/O
– open(), read(), write() -> Network Socket I/O
– Sent to the 'condor_shadow' Process on the Submit Machine
• Handles the Actual File I/O
• Note that the Job Runs on a Remote Machine
• Relinking with the Condor Remote System Call Library
– $ condor_compile cc myprog.o -o myprog
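Once relinked with condor_compile, the executable is submitted to the standard universe; a minimal sketch of the submit description (the file names are hypothetical):

```
universe   = standard
executable = myprog
log        = myprog.log
queue
```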
Standard Universe Limitations
• No Multi-Process Jobs
– fork(), exec(), system()
• No IPC
– Pipes, Semaphores, and Shared Memory
• Brief Network Communication Only
– A Long Connection Delays Checkpoints and Migration
• No Kernel-Level Threads
– User-Level Threads Are Allowed
• File Access: Read-Only or Write-Only
– Read-Write: Hard to Roll Back to an Old Checkpoint
• On Linux, Must be Statically Linked
Data Access from a Job
• Remote System Calls – Standard Universe
• Shared Network File System
• What About Non-Dedicated Machines (Desktops)?
– Condor File Transfer
– Before the Run, Input Files Are Transferred to the Remote Machine
– On Completion, Output Files Are Transferred Back to the Submit Machine
– Requested in the Submit Description File
• transfer_input_files = <…>, transfer_output_files = <…>
• transfer_files = <ONEXIT | ALWAYS | NEVER>
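A sketch of a vanilla-universe submit description using Condor file transfer, with the transfer_files syntax as given in the book (file names are hypothetical; later HTCondor versions express this with should_transfer_files and when_to_transfer_output instead):

```
universe             = vanilla
executable           = foo
input                = input.data
output               = output.data
transfer_input_files = dataset.txt
transfer_files       = ONEXIT
queue
```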
Condor Architecture

[Diagram: the Central Manager Machine runs the Negotiator and Collector daemons, plus its own Startd and Sched; every other machine, Machine 1 through Machine N, runs a Startd (execution) and a Sched (submission) daemon reporting to the collector.]
Condor Architecture

[Diagram: when a job runs, the Sched on the submit machine (Machine 1) spawns a Shadow process, and the Startd on the execute machine (Machine N) spawns a Starter, which runs the Job; the job's Condor remote system calls are serviced by the Shadow on the submit machine.]
Cluster Setup Scenarios
• Uniformly Owned Dedicated Cluster
– MPI Jobs on Dedicated Nodes
• Cluster of Multi-Processor Nodes
– 1 VM (Virtual Machine) per Processor
• Cluster of Distributively Owned Nodes
– Jobs from the Owner Preferred
• Desktop Submission to the Cluster
– Submit-Only Node Setup
• Non-Dedicated Computing Resources
– Opportunistic Scheduling and Matchmaking with Process Checkpointing, Migration, Suspend, and Resume
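For the "jobs from the owner preferred" scenario, a machine's matchmaking policy can be set in its local Condor configuration; a minimal sketch (the owner name smith is hypothetical; START and RANK are standard condor_config policy expressions):

```
# Accept any job, but prefer jobs submitted by this node's owner
START = TRUE
RANK  = (Owner == "smith")
```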
Conclusion
• Distinct Features
– Matchmaking with Job and Machine ClassAds
– Preemptive Scheduling and Migration with Checkpointing
– Condor Remote System Calls
• A Powerful Tool for Distributed Job Scheduling
– Within and Beyond Beowulf Clusters
• A Unique Combination of Dedicated and Opportunistic Scheduling