Daniel Udwary, NERSC Data Science Engagement Group, February 3, 2016
Running Jobs at Wang Hall
Outline
• Genepool move logistics
• Differences between Crays and Genepool
• Cori and Edison architecture and configurations
• Intro to SLURM
Why am I running this training session?
• During the Mendel move (next week!), we will have a period of reduced Genepool compute availability.
• We want to encourage more JGI compute work on NERSC's flagship supercomputers, when it makes sense.
  – Last year, we used less than half of our CPU-hour allocation.
• NERSC wants to know what it can do to better enable bioinformatics work on those machines, and identify where future problems might lie.
• Genepool may move to SLURM in the future.
NERSC has moved to a new building
• All systems must move from Oakland to Berkeley
Resources at Wang Hall (aka CRT)
• New Mendel+ nodes
• New login nodes (genepool13 and genepool14)
• All filesystems (almost…)
• Cori
• Edison

Still at OSF:
– Old Mendel nodes: moving starting Feb 8
– Legacy Genepool nodes: to be shut down ~Feb 22
– Tape archive: no plan to move (yet)
Move Schedule – Current Plan
[Timeline chart covering Feb 8 through ~Feb 22: planned moves and outages at OSF and CRT for Mendel+, the legacy compute nodes, Mendel, the filesystems, the scheduler, and SeqFS, including downtime for power work and networking maintenance.]
Key Differences Between Cori/Edison and Genepool
Cori and Edison
• Generally large, multi-node jobs
• Jobs are charged
• Wait time until job start measured in days
• Users generally compile and install their own software; few modules
• SLURM

Genepool
• Many small, single-node (or even single-CPU) jobs
• No job charging
• Wait time measured in hours, if not minutes
• Awesome JGI consultants manage bioinformatics software as modules
• UGE
Basics of NERSC Cray architecture
• Cori Phase I
  – Cray XC
  – 1630 nodes
  – 128 GB memory per node
  – 32 cores per node (2 x 16-core 2.3 GHz Haswell)
• Cori Phase II
  – >9300 nodes
  – Knights Landing CPUs
• Edison
  – Cray XC30
  – 5576 nodes
  – 64 GB memory per node
  – 24 cores per node (2 x 12-core 2.4 GHz Ivy Bridge)
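As a rough sketch of how these node sizes map onto job launch, the lines below request one MPI task per physical core on a single node of each machine (the executable name is a placeholder):

# Cori Phase I (Haswell) node: 32 physical cores
srun -n 32 ./my_app     # hypothetical executable, one task per core
# Edison (Ivy Bridge) node: 24 physical cores
srun -n 24 ./my_app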
Edison Queue Structure
https://www.nersc.gov/users/computational-systems/edison/running-jobs/queues-and-policies/
So, use Edison for large parallel jobs using >682 nodes
Cori Queue Structure
• https://www.nersc.gov/users/computational-systems/cori/running-jobs/queues-and-policies/
What is SLURM?
• In simple words, SLURM is a workload manager, or batch scheduler.
• SLURM stands for Simple Linux Utility for Resource Management.
• SLURM unites cluster resource management (such as Torque) and job scheduling (such as Moab) into one system, avoiding inter-tool complexity.
• As of June 2015, SLURM is used on 6 of the top 10 computers, including the #1 system, Tianhe-2, with over 3M cores.
• Cori was installed with SLURM, and Edison switched last November, after its move.
Advantages of Using SLURM
• Fully open source.
• SLURM is extensible (plugin architecture).
• Low-latency scheduling. Highly scalable.
• Integrated "serial" or "shared" queue.
• Integrated Burst Buffer support.
• Good memory management.
• Built-in accounting and database support.
• "Native" SLURM runs without Cray ALPS (Application Level Placement Scheduler):
  – Batch script runs on the head compute node directly.
  – Easier to use. Less chance for contention compared to a shared MOM node.
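As a minimal sketch of that last point (partition, node count, and executable are placeholders), serial setup commands in the script itself already run on the first compute node of the allocation:

#!/bin/bash -l
#SBATCH --partition=regular
#SBATCH --nodes=2
#SBATCH --time=00:10:00
# These lines run on the head compute node of the allocation, not a shared MOM node:
echo "setting up on $(hostname)"
# Parallel work is still launched across the allocation with srun:
srun -n 64 ./my_app    # hypothetical executable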
SLURM User Commands
• sbatch (UGE: qsub): submit a batch script
• salloc (UGE: qlogin): request an interactive session
• scancel (UGE: qdel): delete a batch job
• scontrol hold (UGE: qhold): hold a job
• scontrol release (UGE: qrls): release a job
• sacct (UGE: qacct): display job accounting data
• sqs (UGE: qs): NERSC custom queue display
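For example, a typical command sequence might look like this (the script name and job ID are illustrative):

% sbatch myjob.sl            # submit (UGE: qsub)
Submitted batch job 15400
% sqs                        # NERSC queue display (UGE: qs)
% scontrol hold 15400        # hold the job (UGE: qhold)
% scontrol release 15400     # release it (UGE: qrls)
% scancel 15400              # delete it (UGE: qdel)
% sacct -j 15400             # accounting data (UGE: qacct)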
Running with SLURM
• Use "sbatch" (like "qsub" in UGE) to submit a batch script, or "salloc" (like "qlogin" in UGE) to request an interactive batch session (see the sketch after this list).
• Need to specify which shell to use for the batch script.
• The environment is automatically imported (as with "qsub -V" in UGE).
• Jobs land in the submit directory.
• The batch script runs on the head compute node.
• No need to repeat flags in the srun command if they are already defined in SBATCH keywords.
• Hyperthreading is enabled by default. Jobs requesting more than 32 cores (MPI tasks * OpenMP threads) per node will use hyperthreads automatically.
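For instance, an interactive session might be requested roughly as follows (partition, node count, and time limit are example values; the executable is a placeholder):

% salloc -N 1 -p debug -t 00:30:00
# once the allocation is granted, launch work on the compute node:
% srun -n 32 ./my_app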
Running with SLURM, continued
• Use "srun" to launch parallel jobs (as with "aprun" under Torque/Moab).
• srun flags override SBATCH keywords.
• srun does most of the optimal process and thread binding automatically. Only flags such as "-n" and "-c", along with OMP_NUM_THREADS, are needed for most applications (example below). Advanced users can experiment with more options such as --ntasks-per-socket, --cpu_bind, --mem, etc.
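A minimal hybrid MPI/OpenMP launch along those lines might look like this (mirroring the sample script later in these slides; the executable is a placeholder):

export OMP_NUM_THREADS=8
# 8 MPI tasks, 8 CPUs reserved per task; srun handles the binding automatically
srun -n 8 -c 8 ./hybrid_app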
Scheduler command "Rosetta Stone" comparing SLURM with other batch systems: http://slurm.schedmd.com/rosetta.pdf
SLURM Task arrays
Task arrays work similarly to UGE.
• sbatch --array=1-100
  – Would start a 100-task job array.
• Job arrays will have two additional environment variables set:
  – $SLURM_ARRAY_JOB_ID will be set to the first job ID of the array.
  – $SLURM_ARRAY_TASK_ID will be set to the job array index value.
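A minimal array job sketch, assuming one input file per index (partition, file names, and executable are placeholders):

#!/bin/bash -l
#SBATCH --partition=regular
#SBATCH --nodes=1
#SBATCH --time=00:30:00
#SBATCH --array=1-100
# Each array task sees its own index and can pick its own input:
echo "array ${SLURM_ARRAY_JOB_ID}, task ${SLURM_ARRAY_TASK_ID}"
srun -n 1 ./my_task input.${SLURM_ARRAY_TASK_ID}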
Sample SLURM Batch Script
Long command options:

#!/bin/bash -l
#SBATCH --partition=regular
#SBATCH --job-name=test
#SBATCH --account=mpccc
#SBATCH --nodes=2
#SBATCH --time=00:30:00
srun -n 16 ./mpi-hello
export OMP_NUM_THREADS=8
srun -n 8 -c 8 ./xthi

Short command options:

#!/bin/bash -l
#SBATCH -p regular
#SBATCH -J test
#SBATCH -A mpccc
#SBATCH -N 2
#SBATCH -t 00:30:00
srun -n 16 ./mpi-hello
export OMP_NUM_THREADS=8
srun -n 8 -c 8 ./xthi

To submit a batch job:
% sbatch mytest.sl
Submitted batch job 15400
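Once submitted, the job can be monitored with, for example:

% sqs                    # NERSC custom queue display
% squeue -u $USER        # standard SLURM view of your own jobs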
Summary
• SLURM provides equivalent or similar functionality to Torque/Moab and UGE.
• srun provides equivalent or similar process and thread affinity to aprun.
• Please let us know if you have an advanced or complicated workflow and anticipate potential porting issues. We can work with you to migrate your scripts.
• Batch configurations are still subject to tuning and modification before the system is in full production.
Documentation
• SchedMD web page:
  – http://www.schedmd.com/
• Running Jobs on Cori:
  – https://www.nersc.gov/users/computational-systems/cori/running-jobs/
• Man pages for slurm, sbatch, salloc, squeue, sinfo, sacct, scontrol, scancel, etc.
• Torque/Moab vs. SLURM comparisons:
  – https://www.nersc.gov/users/computational-systems/cori/running-jobs/for-edison-users/torque-moab-to-slurm-transition-guide/
• Running jobs on Babbage using SLURM:
  – https://www.nersc.gov/users/computational-systems/testbeds/babbage/running-jobs-under-slurm-on-babbage/
• Running jobs on Edison's test and development system (Alva) with native SLURM:
  – https://www.nersc.gov/users/computational-systems/edison/alva-test-and-development-system-for-edison/#toc-anchor-7