The Roadmap to New Releases
Todd Tannenbaum
Department of Computer Sciences
University of Wisconsin-Madison
http://www.cs.wisc.edu/condor

Stable vs. Development Series
Much like the Linux kernel, Condor provides two different releases at any time:
Stable series
Development series
Allows Condor to be both a research project and a production-ready system
Releases are heavily tested
Only bug fixes and ports to new platforms are added to a stable series
Stable series (cont.)
A given stable release is always compatible with other releases from the same series
Recommended for production pools
Development series
Series number in the version is odd (e.g. 6.1.17, 6.3.1)
New features and new technology are added frequently
Versions from the same development series are not always compatible with each other
… unless new features are required
… unless we recommend otherwise :^)
Where is Condor Today?
Version 6.3.2 is being released as soon as possible; it is the v6.4.0 release candidate.
We expect version 6.4.0 to be released by the end of March.
RedHat 7.x
New Ports in 6.4.0 (cont.)
“Clipped” support (no checkpointing, PVM, or remote system calls, but all other functionality is available)
Windows 2000
Encryption
Integrity
Globus Universe
Java Universe
queue
condor_submit
Why not use Vanilla Universe for Java jobs?
Java Universe provides more than just inserting “java” at the start of the execute line
Knows which machines have a JVM installed
Knows the location, version, and performance of JVM on each machine
Provides more information about Java job completion than just JVM exit code
Program runs in a Java wrapper, allowing Condor to report Java exceptions, etc.
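The wrapper idea can be illustrated with a small sketch. This is a Python analogy, not Condor's Java wrapper: the function name `run_wrapped` and the shape of the report are hypothetical, chosen only to show how a wrapper can turn an uncaught exception into structured information rather than a bare exit code.

```python
def run_wrapped(job, *args):
    """Run `job` and return a report dict instead of letting an exception escape.

    Hypothetical illustration only; Condor's actual wrapper is written in Java
    and its report format is not described in these slides.
    """
    try:
        value = job(*args)
    except Exception as exc:
        # Record what went wrong, so the caller learns more than "it failed".
        return {"ok": False, "exception": type(exc).__name__, "message": str(exc)}
    return {"ok": True, "result": value}
```

A caller can then distinguish "the program finished" from "the program threw an exception" by inspecting the report, which is the extra information a plain exit code cannot carry.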
aish.cs.wisc. Sun Microsy 1.2.2 Owner Idle 0.000 249
anfrom.cs.wis Sun Microsy 1.2.2 Owner Idle 0.030 249
babe.cs.wisc. Sun Microsy 1.2.2 Claimed Busy 1.120 123
...
Condor File Transfer
Condor will transfer job files from the submit machine to the execute machine
Files to send and/or receive specified at submit time
Transfer is atomic
In v6.2, available only in Condor for Windows
Default: Send back any new/changed files
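One way to picture the "new/changed files" default is to compare a directory snapshot taken before the job runs with one taken afterwards. The sketch below is only an illustration under that assumption; the helper names `snapshot` and `new_or_changed` are hypothetical, and the slides do not say how Condor itself detects changes.

```python
import os


def snapshot(directory):
    """Map each regular file name in `directory` to its (size, mtime_ns)."""
    result = {}
    for entry in os.scandir(directory):
        if entry.is_file():
            info = entry.stat()
            result[entry.name] = (info.st_size, info.st_mtime_ns)
    return result


def new_or_changed(before, after):
    """Return the sorted names that are absent from `before` or differ from it."""
    return sorted(name for name, sig in after.items() if before.get(name) != sig)
```

Taking `before = snapshot(scratch_dir)` at job start and `after = snapshot(scratch_dir)` at job exit, `new_or_changed(before, after)` lists the candidates to send back to the submit machine.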
Remote I/O Socket
Job can request that the condor_starter process on the execute machine create a Remote I/O Socket
Used for online access to files on the submit machine, without the Standard Universe.
Use in Vanilla, Java, …
Java: FileInputStream -> ChirpInputStream
C: open() -> chirp_open()
User can supply job policy expressions in the submit file.
Can be used to describe a successful run.
on_exit_remove = <expression>
on_exit_hold = <expression>
periodic_remove = <expression>
periodic_hold = <expression>
on_exit_remove = ExitBySignal == False
Place on hold if exits with nonzero status or ran for less than an hour:
on_exit_hold = ((ExitBySignal == False) && (ExitCode != 0)) || ((ServerStartTime - JobStartDate) < 3600)
Place on hold if job has spent more than 50% of its time suspended:
periodic_hold = CumulativeSuspensionTime > (RemoteWallClockTime / 2.0)
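The logic of the example policies above can be spelled out as plain boolean functions. This is a sketch for reading the expressions, not ClassAd evaluation: the parameter names are hypothetical stand-ins for the job attributes, "nonzero status" is taken to mean a nonzero exit code (an assumption), and ClassAd behavior for undefined attributes is not modeled.

```python
def should_remove_on_exit(exited_by_signal):
    """Leave the queue only if the job was not killed by a signal."""
    return not exited_by_signal


def should_hold_on_exit(exited_by_signal, exit_code, run_seconds):
    """Hold if the job exited normally with a nonzero code, or ran under an hour."""
    nonzero_exit = (not exited_by_signal) and exit_code != 0
    return nonzero_exit or run_seconds < 3600


def should_hold_periodically(suspended_seconds, wall_clock_seconds):
    """Hold if more than half of the wall-clock time was spent suspended."""
    return suspended_seconds > wall_clock_seconds / 2.0
```

For instance, a job that exits with code 1 after two hours is held by the exit policy, while a job suspended for 500 of 1000 seconds is not held by the periodic policy because the comparison is strict.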
LOWPORT = x
HIGHPORT = y
All dynamic ports will be between x and y inclusive
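To make "between x and y inclusive" concrete, here is a minimal sketch of picking a dynamic port from a fixed range. It only illustrates the range semantics; the function name `bind_in_range` is hypothetical, and the slides do not describe how Condor actually chooses a port within the range.

```python
import socket


def bind_in_range(low, high, host="127.0.0.1"):
    """Return a TCP socket bound to the first free port in [low, high]."""
    for port in range(low, high + 1):  # high + 1 so the upper bound is included
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock
        except OSError:
            # Port already in use (or not permitted); try the next one.
            sock.close()
    raise OSError(f"no free port between {low} and {high}")
```

A firewall administrator then only needs to open ports `low` through `high`, since every socket obtained this way falls inside that range.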
Condor + Firewalls/Private Networks:
Who: Se-Chang Son
Time: 9am-12pm Weds
Where: rm 3387
On both NT and Win2k
New universes added: MPI, Java, Scheduler (and Globus in the works!)
DAGMan ported
CondorView ported
Allows submission from directories on shared filesystems
DAGMan
NeST
PFS
HawkEye
Condor-G
Big Item:
More focus on being a service provider than just an end-user tool:
Developer APIs / libraries
Remote I/O
Conditionals !!
if/then/else
Clean implementations in C++ and Java
ClassAd collections
Re-write of the checkpoint server
Add secure communication
NeST technology infusion
Store meta-data along with checkpoint files