An overview of batch processing
2-June-2016
One-on-one
[Diagram labels: your computer, your program]
Multiple programs on a single computer
[Diagram labels: your computer (multiple cores), several copies of your program]
A batch system managing multiple programs on a single computer
[Diagram labels: your computer (multiple cores), several copies of your program running, further copies on hold]
A batch system managing multiple programs on multiple computers
[Diagram labels: your computer, batch manager, several batch nodes running copies of your program, further copies on hold]
The standard software for managing batch systems in scientific computing is HTCondor (or just Condor).
Main web page: http://research.cs.wisc.edu/htcondor/
Quick start: http://research.cs.wisc.edu/htcondor/quick-start.html
Full manual: http://research.cs.wisc.edu/htcondor/manual/v7.6/2_Users_Manual.html
• We use an older version of Condor in the Nevis particle-physics systems.
• Stick to the “vanilla” universe; the “standard” universe won’t work for ROOT or any other particle-physics software (so you don’t need condor_compile).
Condor managing multiple programs on multiple computers with multiple queues
[Diagram labels: submit machine, Condor master, Condor pool of batch nodes running copies of your program, further copies on hold]
Condor will halt a queue in favor of an interactive program
[Diagram labels: submit machine, Condor master, Condor pool of batch nodes running copies of your program, further copies on hold, and the note “Someone logged in!”]
Condor managing multiple programs on multiple computers with multiple configurations
[Diagram labels: submit machine, Condor master, Condor pool of batch nodes running copies of your program, further copies on hold]
Condor uses “ClassAds” to match your requirements with what each node offers
[Diagram labels: submit machine, Condor master, Condor pool of batch nodes running copies of your program, further copies on hold]
• Your requirements (job ClassAd)
• What a node offers (machine ClassAd)
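The matching idea can be illustrated with a toy model. This is a minimal sketch in Python, not the ClassAd language or Condor's actual matchmaker. The attribute names (Machine, Arch, OpSys, Memory) resemble common machine ClassAd attributes, but the node names, the values, the script name, and the matches helper are all hypothetical.

```python
# Toy illustration of ClassAd-style matchmaking (NOT Condor's real implementation).

# Hypothetical machine ads: what each node offers.
machine_ads = [
    {"Machine": "node01", "Arch": "X86_64", "OpSys": "LINUX", "Memory": 2048},
    {"Machine": "node02", "Arch": "X86_64", "OpSys": "LINUX", "Memory": 8192},
    {"Machine": "node03", "Arch": "INTEL", "OpSys": "LINUX", "Memory": 16384},
]

# Hypothetical job ad: the requirement is a predicate evaluated on a machine ad.
job_ad = {
    "Cmd": "my_analysis.sh",  # assumed script name, for illustration only
    "Requirements": lambda m: m["Arch"] == "X86_64" and m["Memory"] >= 4096,
}


def matches(job, machines):
    """Return the names of machines whose ad satisfies the job's requirements."""
    return [m["Machine"] for m in machines if job["Requirements"](m)]


matched = matches(job_ad, machine_ads)
print(matched)
```

In real Condor the match is two-way: a machine ad can also carry requirements on the jobs it is willing to run.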
Resource Planning
• Condor can’t do everything for you.
• Think about input files (including programs) and output files and how they’ll be accessed.
• Think about disk space. “df -h” and “du -shx *” can help.
• Fun fact: The particle-physics Condor pools can’t see your home directory!
• Moral: Let condor transfer your files… when possible.

When you can’t let condor transfer your files, here are disk-sharing methods outside of condor:
• NFS: used at Nevis
• CVMFS: Fermilab and CERN
• Grid, BlueArc: only used at Fermilab
• AFS: obsolete, still used at CERN
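Letting condor transfer your files comes down to a few lines in the submit description file. The sketch below, in Python, only assembles the text of such a file for the vanilla universe. The submit commands are standard HTCondor ones, but the function name, the script and file names, and the output naming scheme are assumptions chosen for illustration, and the Nevis guide may recommend different settings.

```python
def make_submit_description(executable, input_files, n_jobs=1):
    """Build the text of a vanilla-universe submit file that asks Condor to transfer files.

    All file names passed in are hypothetical; nothing is written to disk here.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "arguments = $(Process)",            # each job receives its process number
        "should_transfer_files = YES",       # do not rely on a shared home directory
        "when_to_transfer_output = ON_EXIT",
        f"transfer_input_files = {', '.join(input_files)}",
        "output = job-$(Process).out",       # assumed naming scheme
        "error = job-$(Process).err",
        "log = job.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


text = make_submit_description("run_analysis.sh", ["analysis.C", "settings.txt"])
print(text)
```

The resulting text would be saved to a file and handed to condor_submit.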
What we don’t do
[Diagram labels: your server, /home, and a large number of nodes]
What we do
[Diagram labels: your server, /home, a large number of nodes, and a file server with /share and /data]
Computer Systems at Nevis: Linux Cluster
<http://www.nevis.columbia.edu/linux/>
<http://www.nevis.columbia.edu/linux/cluster-names.html>
[Diagram groups: administrative servers; workgroup/login servers; file servers; virtual machines; clients and x-terms; workstations, batch nodes, student boxes]
Machines labelled with a role:
• hypatia: administration, NIS
• kolya: ATLAS
• karthur: ATLAS
• hogwarts: staff
• shang: DOE
• annex: off-site backup & mail
• ada: web server
• sullivan: mailing-list server
• tehanu: VERITAS
• shelley: backup server
• tango: SMB
• hermes: DNS, batch
• houston: Neutrino
Machines labelled by name only: franklin, xenia, xenia2, vetch, ged, serret, amsterdam, bleeker, westside, riverside
Bringing the job to the data
[Diagram labels: submit machine, Condor master, node01 to node06, bigfile1.root to bigfile6.root, some wrapper script]
requirements = (machine == "node04.nevis.columbia.edu")
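A wrapper script could pick the requirements line from a table of where each file lives. The following is a minimal sketch in Python under stated assumptions: the FILE_LOCATION table and the requirements_for helper are hypothetical, and the slide does not say which node holds which file. The generated expression uses == and a quoted string, as ClassAd expressions require.

```python
# Hypothetical map of which node's local disk holds each big file.
# The pairing of files with nodes is an assumption for illustration.
FILE_LOCATION = {
    "bigfile1.root": "node01.nevis.columbia.edu",
    "bigfile4.root": "node04.nevis.columbia.edu",
}


def requirements_for(filename, locations=FILE_LOCATION):
    """Return a submit-file requirements line pinning the job to the node holding filename."""
    node = locations[filename]  # raises KeyError if the file's location is unknown
    return f'requirements = (Machine == "{node}")'


line = requirements_for("bigfile4.root")
print(line)
```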
Final tips
• Split up your task so each condor job takes 20-60 minutes.
• If your job must be preempted, it will have to run from the beginning on the same machine that cancelled the job.
• Test your job with one process before submitting it for 10,000 processes!
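The first and third tips combine naturally: time a single test process, then use that timing to size the jobs. Below is a minimal sketch in Python, assuming the work divides into independent events and that the per-event time measured in the test run is representative. The plan_jobs helper, the 40-minute target, and the example numbers are illustrative, not taken from the slides.

```python
import math


def plan_jobs(n_events, seconds_per_event, target_minutes=40):
    """Return (n_jobs, events_per_job) so that each job runs for about target_minutes."""
    # How many events fit into the target run time (at least one per job).
    events_per_job = max(1, int(target_minutes * 60 / seconds_per_event))
    # Enough jobs to cover every event; the last job may be shorter.
    n_jobs = math.ceil(n_events / events_per_job)
    return n_jobs, events_per_job


# Example: one million events at an assumed 0.5 s per event.
n_jobs, events_per_job = plan_jobs(n_events=1_000_000, seconds_per_event=0.5)
print(n_jobs, events_per_job)
```

A target in the middle of the 20-60 minute range leaves some margin if some events run slower than the test run suggested.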
Resources
Main web page: http://research.cs.wisc.edu/htcondor/
Quick start: http://research.cs.wisc.edu/htcondor/quick-start.html
Full manual: http://research.cs.wisc.edu/htcondor/manual/v7.6/2_Users_Manual.html
Nevis particle-physics condor guide: https://twiki.nevis.columbia.edu/twiki/bin/view/Nevis/Condor
Basic Condor@Nevis tutorial: http://www.nevis.columbia.edu/~seligman/root-class/