15 Condor – A Distributed Job Scheduler
Todd Tannenbaum, Derek Wright, Karen Miller, and Miron Livny
Beowulf Cluster Computing with Linux, Thomas Sterling, editor, Oct. 2001.
Summarized by Simon Kim
Contents
• Introduction to Condor
• Using Condor
• Condor Architecture
• Installing Condor under Linux
• Configuring Condor
• Administration Tools
• Cluster Setup Scenarios
Introduction to Condor
• Distributed Job Scheduler
• Condor Research Project at the University of Wisconsin-Madison Department of Computer Sciences
• Changed Name to HTCondor in 2012
– http://research.cs.wisc.edu/htcondor
Introduction to Condor
[Diagram: the user submits a job and a policy to the Condor queue; Condor runs the job on idle nodes, monitors its progress, and reports back when the job completes.]
Introduction to Condor
• Workload Management System
• Job Queuing Mechanism
• Scheduling Policy
• Priority Scheme
• Resource Monitoring and Management
Condor Features
• Distributed Submission
• User/Job Priorities
• Job Dependency – DAG
• Multiple Job Models – Serial/Parallel Jobs
• ClassAds – Job : Machine Matchmaking
• Job Checkpoint and Migration
• Remote System Calls – Seamless I/O Redirection
• Grid Computing – Interaction with Globus
ClassAds and Matchmaking
• Job ClassAd
– Looking for a Machine
– Requirements: Intel, Linux, Disk Space, …
– Rank: Memory, Kflops, …
• Machine ClassAd
– Looking for a Job
– Requirements
– Rank
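A job's requirements and rank can be written as expressions in its submit description file, from which Condor builds the job ClassAd. A minimal sketch (Arch, OpSys, Disk, Memory, and KFlops are standard Condor machine-ad attributes; the disk threshold is a hypothetical value):

```
# Hypothetical job ClassAd expressions in a submit description file
requirements = (Arch == "INTEL") && (OpSys == "LINUX") && (Disk >= 10000)
rank         = Memory + KFlops
```

The matchmaker pairs this job only with machines whose own Requirements accept it, preferring the machine for which the job's Rank expression evaluates highest.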
Using Condor
• Roadmap to Using Condor
• Submitting a Job
• User Commands
• Universes
• Standard Universe
– Process Checkpointing
– Remote System Calls
– Relinking
– Limitations
• Data File Access
• DAGMan Scheduler
Using Condor
[Diagram: a batch job with STDIN, STDOUT, and STDERR is described by a submit description file and runs in one of Condor's universes (runtime environments): Standard, Vanilla, PVM, MPI, Grid, or Scheduler.]

Submit Description:

```
universe   = vanilla
executable = foo
log        = foo.log
input      = input.data
output     = output.data
queue
```
[Diagram: prepare a job (Serial Job, Parallel Job, or Meta-Scheduler Job), write a submit description, then submit it:]

$ condor_submit
Status of Submitted Jobs
• Summary by Submitter
– $ condor_status -submitters
• All Jobs in the Queue
– $ condor_q
• Removing a Job
– $ condor_rm 350.0
• Changing Job Priority: -20 (low) ~ +20 (high), default: 0
– $ condor_prio -p -15 350.1
Universes
• Execution Environment – Universe
• Vanilla Universe
– Serial Jobs
– Binary Executables and Scripts
• MPI Universe
– MPI Programs
– Parallel Jobs
– Only on Dedicated Resources

```
# Submit Description
universe      = mpi
…
machine_count = 8
queue
```
Universes
• PVM Universe
– Master-Worker Style Parallel Programs Written for the Parallel Virtual Machine Interface
– Both Dedicated and Non-Dedicated Resources (Workstations)
– Condor Acts as Resource Manager for the PVM Daemon (via pvm_addhosts())
– Dynamic Node Allocation

```
# Submit Description
universe      = pvm
…
machine_count = 1..75
queue
```
Universes
• Scheduler Universe
– Meta-Scheduler
– DAGMan Scheduler
• Complex Interdependencies Between Jobs

[Diagram: Job Sequence A -> B and C -> D, where B and C are executed in parallel.]
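This diamond-shaped dependency can be expressed as a DAGMan input file; the submit-file names (a.submit, etc.) are hypothetical, and the DAG is handed to Condor with condor_submit_dag:

```
# diamond.dag — one Job line per node, Parent/Child lines for the edges
Job A a.submit
Job B b.submit
Job C c.submit
Job D d.submit
Parent A Child B C
Parent B C Child D
```

DAGMan itself runs as a scheduler-universe job and releases B and C only after A completes successfully, then D after both B and C finish.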
Universes
• Standard Universe
– Serial Jobs
– Process Checkpoint, Restart, and Migration
– Remote System Calls
Process Checkpointing
• Checkpoint
– Snapshot of the Program's Current State
– Preemptive-Resume Scheduling
– Periodic Checkpoints – Fault Tolerance
– No Program Source Code Change
• Relinking with the Condor System Call Library
– Signal Handler
• Process State Written to a Local/Network File
• Stack/Data Segments, CPU State, Open Files, Signal Handlers and Pending Signals
– Optional Checkpoint Server
• Checkpoint Repository
Remote System Calls
• Redirects File I/O
– open(), read(), write() -> Network Socket I/O
– Sent to the 'condor_shadow' Process on the Submit Machine
• Handles the Actual File I/O
• Note that the Job Runs on a Remote Machine
• Relinking with the Condor Remote System Call Library
– $ condor_compile cc myprog.o -o myprog
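Once relinked with condor_compile, the executable is submitted to the standard universe; a minimal sketch of the submit description (the file names are hypothetical):

```
universe   = standard
executable = myprog
log        = myprog.log
queue
```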
Standard Universe Limitations
• No Multi-Process Jobs
– fork(), exec(), system()
• No IPC
– Pipes, Semaphores, and Shared Memory
• Brief Network Communication Only
– A Long Connection Delays Checkpoints and Migration
• No Kernel-Level Threads
– User-Level Threads Are Allowed
• File Access: Read-Only or Write-Only
– Read-Write: Hard to Roll Back to an Old Checkpoint
• On Linux, Must be Statically Linked
Data Access from a Job
• Remote System Calls – Standard Universe
• Shared Network File System
• What About Non-Dedicated Machines (Desktops)?
– Condor File Transfer
– Before the Run, Input Files Are Transferred to the Remote Machine
– On Completion, Output Files Are Transferred Back to the Submit Machine
– Requested in the Submit Description File
• transfer_input_files = <…>, transfer_output_files = <…>
• transfer_files = <ONEXIT | ALWAYS | NEVER>
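A sketch of a vanilla-universe submit description using Condor file transfer, with the transfer_files syntax as given in the book (file names are hypothetical; later HTCondor versions express this with should_transfer_files and when_to_transfer_output instead):

```
universe             = vanilla
executable           = foo
input                = input.data
output               = output.data
transfer_input_files = dataset.txt
transfer_files       = ONEXIT
queue
```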
Condor Architecture

[Diagram: the Central Manager Machine runs the Negotiator and Collector daemons, plus its own Startd and Sched; every other machine, Machine 1 through Machine N, runs a Startd (execution) and a Sched (submission) daemon reporting to the collector.]
Condor Architecture

[Diagram: when a job runs, the Sched on the submit machine (Machine 1) spawns a Shadow process, and the Startd on the execute machine (Machine N) spawns a Starter, which runs the Job; the job's Condor remote system calls are serviced by the Shadow on the submit machine.]
Cluster Setup Scenarios
• Uniformly Owned Dedicated Cluster
– MPI Jobs on Dedicated Nodes
• Cluster of Multi-Processor Nodes
– 1 VM (Virtual Machine) per Processor
• Cluster of Distributively Owned Nodes
– Jobs from the Owner Preferred
• Desktop Submission to the Cluster
– Submit-Only Node Setup
• Non-Dedicated Computing Resources
– Opportunistic Scheduling and Matchmaking with Process Checkpointing, Migration, Suspend, and Resume
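For the "jobs from the owner preferred" scenario, a machine's matchmaking policy can be set in its local Condor configuration; a minimal sketch (the owner name smith is hypothetical; START and RANK are standard condor_config policy expressions):

```
# Accept any job, but prefer jobs submitted by this node's owner
START = TRUE
RANK  = (Owner == "smith")
```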
Conclusion
• Distinct Features
– Matchmaking with Job and Machine ClassAds
– Preemptive Scheduling and Migration with Checkpointing
– Condor Remote System Calls
• A Powerful Tool for Distributed Job Scheduling
– Within and Beyond Beowulf Clusters
• A Unique Combination of Dedicated and Opportunistic Scheduling