An overview of batch processing

Nevis Laboratories, 2-June-2016


One-on-one

[Diagram: “Your program” running on “Your computer”.]


Multiple programs on a single computer

[Diagram: five copies of “Your program” running on “Your computer (multiple cores)”.]


A batch system managing multiple programs on a single computer

[Diagram: five copies of “Your program” running on “Your computer (multiple cores)”; five more copies “on hold”.]


A batch system managing multiple programs on multiple computers

[Diagram: “Your computer”, a “Batch manager”, and six “Batch node” machines; five copies of “Your program” running and five more “on hold”.]


The standard software for managing batch systems in scientific computing is HTCondor (or just Condor).

Main web page: http://research.cs.wisc.edu/htcondor/
Quick start: http://research.cs.wisc.edu/htcondor/quick-start.html
Full manual: http://research.cs.wisc.edu/htcondor/manual/v7.6/2_Users_Manual.html

•  We use an older version of Condor in the Nevis particle-physics systems.
•  Stick to the “vanilla” universe; the “standard” universe won’t work for ROOT or any other particle-physics software (so you don’t need condor_compile).
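
As a rough illustration of the “vanilla” universe advice, here is a minimal sketch of a Python helper that builds the text of a submit description. The executable name myscript.sh and the output file names are assumptions made for the example; check the Nevis condor guide for the settings your pool actually expects.

```python
def vanilla_submit_description(executable, n_jobs=1):
    """Return the text of a minimal vanilla-universe submit description.

    The executable and the output file names are hypothetical examples.
    """
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        # $(Process) expands to 0, 1, 2, ... for each queued job.
        "arguments  = $(Process)",
        "output     = job-$(Process).out",
        "error      = job-$(Process).err",
        "log        = job.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


# Print the description; saved to a file, it could be passed to condor_submit.
print(vanilla_submit_description("myscript.sh", n_jobs=1))
```

The printed text is what would go into a submit file; nothing here talks to Condor itself.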


Condor managing multiple programs on multiple computers with multiple queues

[Diagram: a “Submit machine”, a “Condor master”, and a “Condor pool” of six “Batch node” machines; five copies of “Your program” running and five more “on hold”.]


Condor will halt a queue in favor of an interactive program

[Diagram: a “Submit machine”, a “Condor master”, and a “Condor pool” of six “Batch node” machines, one of them marked “Someone logged in!”; five copies of “Your program” running and five more “on hold”.]


Condor managing multiple programs on multiple computers with multiple configurations

[Diagram: a “Submit machine”, a “Condor master”, and a “Condor pool” of six “Batch node” machines; five copies of “Your program” running and five more “on hold”.]


Condor uses “ClassAds” to match your requirements with what each node offers

[Diagram: a “Submit machine”, a “Condor master”, and a “Condor pool” of six “Batch node” machines; the submitted programs are labeled “Your requirements (job ClassAd)” and a batch node is labeled “What a node offers (machine ClassAd)”.]
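
The matching idea can be sketched as a toy in Python. This is not how HTCondor implements ClassAds; it only illustrates comparing a job's requirements with what each node advertises. Machine, OpSys, and Memory are real machine ClassAd attribute names, but the node names and values below are made up for the example.

```python
def matches(job_requirements, machine_ad):
    """Return True if the machine ad satisfies every job requirement.

    job_requirements maps an attribute name to a predicate on its value.
    A requirement on an attribute the machine does not advertise fails.
    """
    for attribute, predicate in job_requirements.items():
        if attribute not in machine_ad:
            return False
        if not predicate(machine_ad[attribute]):
            return False
    return True


# Hypothetical machine ads: what each node offers.
machine_ads = [
    {"Machine": "node01", "OpSys": "LINUX", "Memory": 2048},
    {"Machine": "node02", "OpSys": "LINUX", "Memory": 8192},
    {"Machine": "node03", "OpSys": "WINDOWS", "Memory": 16384},
]

# Hypothetical job requirements: Linux with at least 4096 MB of memory.
job_requirements = {
    "OpSys": lambda value: value == "LINUX",
    "Memory": lambda value: value >= 4096,
}

eligible = [ad["Machine"] for ad in machine_ads if matches(job_requirements, ad)]
print(eligible)  # ['node02']
```

In a real submit file the same intent would be written as a requirements expression, and Condor does the matching for you.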


Resource Planning

•  Condor can’t do everything for you.
•  Think about input files (including programs) and output files and how they’ll be accessed.
•  Think about disk space. “df -h” and “du -shx *” can help.
•  Fun fact: The particle-physics Condor pools can’t see your home directory!
•  Moral: Let condor transfer your files… when possible.

When you can’t let condor transfer your files, here are disk-sharing methods outside of condor:
•  NFS – used at Nevis
•  CVMFS – Fermilab and CERN
•  Grid, BlueArc – only used at Fermilab
•  AFS – obsolete, still used at CERN
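
For the “let condor transfer your files” case, a minimal sketch of the submit-description lines involved, again built from Python. The three commands are standard HTCondor submit commands; the file names setup.sh and analysis.C are hypothetical.

```python
def transfer_commands(input_files):
    """Return submit-description lines that ask Condor to transfer files."""
    return [
        # Have Condor copy files to the batch node instead of relying on a shared disk.
        "should_transfer_files   = YES",
        # Copy output files back to the submit directory when the job exits.
        "when_to_transfer_output = ON_EXIT",
        # Comma-separated list of input files to send along with the job.
        "transfer_input_files    = " + ",".join(input_files),
    ]


print("\n".join(transfer_commands(["setup.sh", "analysis.C"])))
```

These lines would sit alongside the executable and queue lines of a submit file.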


Resource Planning: what we don’t do

[Diagram: “Your server”, which holds /home, surrounded by many “node” boxes.]


Resource Planning: what we do

[Diagram: “Your server”, which holds /home; a separate “File server”, which holds /share and /data; and many “node” boxes.]


Computer Systems at Nevis: Linux Cluster

<http://www.nevis.columbia.edu/linux/>
<http://www.nevis.columbia.edu/linux/cluster-names.html>

[Diagram of the cluster. Group labels: Administrative servers; Workgroup/Login servers; File servers; Clients and x-terms; Workstations, batch nodes, student boxes; virtual machines. Machine labels as they appear in the diagram: hypatia (administration, NIS); kolya (ATLAS); franklin (Mail); karthur (ATLAS); hogwarts (staff); shang (DOE); annex (off-site backup & mail); ada (web server); sullivan (mailing-list server); tehanu (VERITAS); shelley (backup server); tango (SMB); hermes (DNS, batch); houston (Neutrino); xenia; xenia2; vetch; ged; serret; amsterdam; bleeker; westside; riverside.]


Bringing the job to the data

[Diagram: a “Submit machine”, a “Condor master”, and batch nodes node01 to node06; data files bigfile1.root to bigfile6.root; “Some wrapper script” submitted with the requirement below.]

requirements = (Machine == "node04.nevis.columbia.edu")
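
A wrapper script could choose the node automatically. The sketch below assumes a hand-maintained table saying which node holds which file; that table is hypothetical, since the slide does not say where each file lives. The requirements line uses the ClassAd equality operator == and a quoted host name.

```python
# Hypothetical map from each data file to the node whose disk holds it.
FILE_LOCATIONS = {
    "bigfile1.root": "node01.nevis.columbia.edu",
    "bigfile2.root": "node02.nevis.columbia.edu",
    "bigfile3.root": "node03.nevis.columbia.edu",
    "bigfile4.root": "node04.nevis.columbia.edu",
    "bigfile5.root": "node05.nevis.columbia.edu",
    "bigfile6.root": "node06.nevis.columbia.edu",
}


def requirements_for(filename, locations):
    """Build a requirements line that pins a job to the node holding filename."""
    node = locations[filename]
    return f'requirements = (Machine == "{node}")'


print(requirements_for("bigfile4.root", FILE_LOCATIONS))
# requirements = (Machine == "node04.nevis.columbia.edu")
```

Pinning a job to one machine means it waits until that machine is free, so this only pays off when moving the data would cost more than waiting.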


Final tips

•  Split up your task so each condor job takes 20-60 minutes.
•  If your job must be preempted, it will have to run from the beginning on the same machine that cancelled the job.
•  Test your job with one process before submitting it for 10,000 processes!
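
One way to apply the 20-60 minute tip is to estimate the job count from a timing measured in a single test process. A minimal sketch; the event count, the time per event, and the 40-minute target are made-up numbers for illustration.

```python
import math


def plan_jobs(total_events, seconds_per_event, target_minutes=40):
    """Return (n_jobs, events_per_job) so each job runs about target_minutes."""
    events_per_job = max(1, int(target_minutes * 60 / seconds_per_event))
    n_jobs = math.ceil(total_events / events_per_job)
    return n_jobs, events_per_job


# Hypothetical workload: one million events at half a second each.
n_jobs, events_per_job = plan_jobs(total_events=1_000_000, seconds_per_event=0.5)
print(n_jobs, events_per_job)  # 209 4800
```

Here each job processes 4800 events, which takes about 40 minutes, and the last job is shorter than the rest.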


Resources

Main web page: http://research.cs.wisc.edu/htcondor/
Quick start: http://research.cs.wisc.edu/htcondor/quick-start.html
Full manual: http://research.cs.wisc.edu/htcondor/manual/v7.6/2_Users_Manual.html
Nevis particle-physics condor guide: https://twiki.nevis.columbia.edu/twiki/bin/view/Nevis/Condor
Basic Condor @ Nevis tutorial: http://www.nevis.columbia.edu/~seligman/root-class/