29
HTCondor and Python

HTCondor and Python - research.cs.wisc.edu

  • Upload
    others

  • View
    19

  • Download
    0

Embed Size (px)

Citation preview

Page 1: HTCondor and Python - research.cs.wisc.edu

HTCondor and Python

Page 2: HTCondor and Python - research.cs.wisc.edu

2

Overview

› Python binding basics

› What's new since last year

› Some real life examples

Page 3: HTCondor and Python - research.cs.wisc.edu

3

Hello, World!

>>> import htcondor>>> import classad>>> a = classad.ClassAd({"Hello": "World"})>>> a[ Hello = "World" ]>>>

Page 4: HTCondor and Python - research.cs.wisc.edu

4

Python Basics

› import htcondor; import classad

› Use dir() to list object names in a module; use help() to get the per-method or class help.

› print classad.version(), htcondor.version()

› htcondor.param[‘COLLECTOR_HOST’] to access parameter value of COLLECTOR_HOST.

Page 5: HTCondor and Python - research.cs.wisc.edu

5

In the beginning, therewere ClassAds.

› ClassAds are the lingua franca of HTCondor-land.

› HTCondor CLI often converts ClassAds into human readable form or XML.

› The python bindings use the internal ClassAd objects throughout.

› We try to make it pleasant to convert between ClassAds and native Python objects.

Page 6: HTCondor and Python - research.cs.wisc.edu

6

ClassAds

› >>> import htcondor

› >>> import classad

› >>> a = classad.ClassAd({"Hello": "World"})

› >>> a

› [ Hello = "World" ]

› >>> a["foo"] = "bar"

› >>> a

› [ foo = "bar"; Hello = "World" ]

› >>> a["bar"] = [1,2,3]

› >>> a

› [ bar = { 1,2,3 }; foo = "bar"; Hello = "World" ]

› >>> a["baz"] = classad.ExprTree("bar")

› >>> a["baz"]

› bar

› >>> a.eval("baz")

› [1, 2, 3]

Sub-ClassAds

>>> classad.ClassAd({“foo”; {“bar”; True}})

[ foo = [ bar = 1 ] ]

Page 7: HTCondor and Python - research.cs.wisc.edu

7

HTCondor Module

› The “htcondor” Python module allows you to interact with most HTCondor daemons.

› There are three very important objects: Collector: read and write Schedd: submit and manipulate Negotiator: manage user and their priorities

› And a few other helpers - enums, security manager, interaction with the config system, and sending daemons direct commands.

Page 8: HTCondor and Python - research.cs.wisc.edu

8

Collector Basics

› The Collector object allows one to locate daemons, query slot status, and advertise new ClassAds.

› The object takes the network location of the collector daemon for the constructor: coll = htcondor.Collector(“red-condor.unl.edu”)

› New: Can take a list of collectors for high-availability installations

Page 9: HTCondor and Python - research.cs.wisc.edu

9

Collector Basics>>> import htcondor>>> coll = htcondor.Collector()>>> ad = coll.locate(htcondor.DaemonTypes.Schedd, "submit-1.chtc.wisc.edu")>>> ad["MyAddress"]'<128.104.100.43:9618?sock=2309_6dc1_2>'>>> ads = coll.locateAll(htcondor.DaemonTypes.Schedd)>>> for ad in ads: print ad["Name"]... submit-1.chtc.wisc.edusubmit-2.chtc.wisc.edusubmit-3.chtc.wisc.edu>>>

Page 10: HTCondor and Python - research.cs.wisc.edu

10

Collector Advanced

› For many queries, pulling all attributes from the collector is expensive.

› You can specify a projection list of attributes. HTCondor will return the minimum number of attributes containing the ones you specify.

› It will always pad in a few extra.

Page 11: HTCondor and Python - research.cs.wisc.edu

11

Collector Advanced

>>> import classad>>> import htcondor>>> coll = htcondor.Collector()>>> ads = coll.query(htcondor.AdTypes.Startd, 'true', ['MyAddress', 'Name', 'Cpus', 'Memory'])>>> len(ads)4994>>> ads[0][ Name = "vmID.chtc.wisc.edu"; Cpus = 1; MyType = "Machine"; Memory = 3737; MyAddress = "<192.168.78.147:36909?CCBID=128.105.244.14:9618#306906&PrivNet=vm.wisc.edu>"; TargetType = "Job"; CurrentTime = time() ]>>>

Page 12: HTCondor and Python - research.cs.wisc.edu

12

Schedd

› Schedd methods submit: submit jobs query: get job ClassAds act: perform some action (hold, release, remove,

suspend, continue) on one or more jobs edit: edit one or more job ClassAds reschedule: have schedd request a new

negotiation cycle.

Page 13: HTCondor and Python - research.cs.wisc.edu

13

Submit ClassAds

› Normally we use submit files for jobs

› Python bindings use ClassAds Translation from submit file to ClassAds

• condor_submit -dump classads

• Translations:– error / Err – should_transfer_files / ShouldTransferFiles

– output / Out – transfer_input_files / TransferIn

– executable / Cmd – transfer_output_files / TransferOut

Translation from macro to ClassAd language• This: error = “test.err.$(Process)”

• Becomes: Err = strcat(“test.err.”,ProcID)

Page 14: HTCondor and Python - research.cs.wisc.edu

14

Schedd Example

>>> ad=classad.parse(open("test.submit.ad"))# Submits two jobs in the cluster; edit test.submit.ad to preference.>>> print schedd.submit(ad, 2)110>>> print schedd.act(htcondor.JobAction.Remove, ["111.0", "110.0"])' [ TotalNotFound = 0; TotalPermissionDenied = 0; TotalAlreadyDone = 0; TotalJobAds = 2; TotalSuccess = 2; TotalChangedAds = 1; TotalBadStatus = 0; TotalError = 0 ]

Page 15: HTCondor and Python - research.cs.wisc.edu

15

Daemon Commands

› Administrator can send arbitrary commands to HTCondor daemon via python Uses same protocol as condor_off and condor_on Similar to unix signals, the command is sent, no indication that

command did anything

› A new developer should keep to Reconfig, Restart, and DaemonsOff Some commands will take an extra argument – such as

subsystem for “DaemonOff”.

› htcondor.send_command(ad, htcondor.DaemonCommands.Restart)

Page 16: HTCondor and Python - research.cs.wisc.edu

16

Daemon Commands

› We hope this will improve the “scriptability” of a HTCondor pool For example, one could implement a rolling

restart cron job that ensures no more than 10% of nodes are draining at once.

Page 17: HTCondor and Python - research.cs.wisc.edu

17

etc

› To invalidate an existing in-process security session: htcondor.SecMan().invalidateAllSessions()

› To access the param subsystem: htcondor.param Treat like a python dictionary.

› To reload the client configuration from disk: htcondor.reload_config()

Page 18: HTCondor and Python - research.cs.wisc.edu

18

What's new since last year

› Negotiator class to manage users and their priorities

› Python 3 support

› Refresh GSI proxy

› Collector class can be initialized with a list of collectors

› Can ping HTCondor daemons

Page 19: HTCondor and Python - research.cs.wisc.edu

19

What's new (cont.)

› HTCondor bindings can accept python dictionaries as well as ClassAds

› Improved ClassAd expression building

› Stream parser of ClassAds

› Support to read event logs

› Support for schedd transactions

Page 20: HTCondor and Python - research.cs.wisc.edu

20

Ping Daemons

› htcondor.SecMan().ping(schedd_ad)

Page 21: HTCondor and Python - research.cs.wisc.edu

21

Accept python dictionariesExample before:

import htcondorimport classadhtcondor.Schedd().submit(classad.ClassAd({"Cmd": "/bin/echo"}))

Example after:

import htcondorhtcondor.Schedd().submit({"Cmd": "/bin/echo"})

Page 22: HTCondor and Python - research.cs.wisc.edu

22

ClassAd expressions

› Added quote() and unquote() to convert between python and ClassAd string literals classad.ExprTree("foo =?= %s" %

classad.quote("bar"))

› ClassAd expression building greatly improved foo = "bar" expr = classad.Attribute("MyType") ==

classad.Literal(foo)

Page 23: HTCondor and Python - research.cs.wisc.edu

23

ClassAd Stream Parser

>>> import classad

>>> import os

>>> fd = os.popen("condor_history -l -match 4")

>>> ads = classad.parseOldAds(fd)

>>> print [ad["ClusterId"] for ad in ads]

[23389L, 23388L, 23386L, 23387L]

>>>

Page 24: HTCondor and Python - research.cs.wisc.edu

24

Read Event Logs

› htcondor. read_events(open("LogFile") )

› Returns a python iterator of ClassAds

Page 25: HTCondor and Python - research.cs.wisc.edu

25

Schedd transactions

with schedd.transaction() as txn:

schedd.submit(...)

schedd.edit(...)

› Variable txn is destroyed at end of block

› If exception is thrown transaction is aborted

› Otherwise, it is committed upon destruction

Page 26: HTCondor and Python - research.cs.wisc.edu

26

Real World Examples

› Counting Job ads in Running State

› Invalidating stale startd ads

Page 27: HTCondor and Python - research.cs.wisc.edu

27

Number of Running Jobs

#!/usr/bin/pythonimport htcondor

collector = htcondor.Collector()

jn = 0for schedd_ad in collector.locateAll(htcondor.DaemonTypes.Schedd): schedd = htcondor.Schedd(schedd_ad) ads = schedd.query('JobStatus == 2', ['ClusterId']) jn += len(ads)print jn

Page 28: HTCondor and Python - research.cs.wisc.edu

28

Invalidate Startds#!/usr/bin/python# Flush stale machine classads

import optparseimport htcondor

parser = optparse.OptionParser()parser.add_option("-p", "--pool", dest="pool", default="")parser.add_option("-c", "--const", dest="constraint", default="true")parser.add_option("-d", "--doit", dest="doit", default=False)opts, args = parser.parse_args()

coll = htcondor.Collector(opts.pool)ads = coll.query(htcondor.AdTypes.Startd, opts.constraint, ['MyAddress', 'Name'])for a in ads: print a['Name']if (opts.doit): coll.advertise(ads, "INVALIDATE_STARTD_ADS")

Page 29: HTCondor and Python - research.cs.wisc.edu

29

Questions

http://research.cs.wisc.edu/htcondor/manual/v8.1/6_7Python_Bindings.html