
Page 1: Advanced UPPMAX usage More SLURM

More SLURM: Advanced UPPMAX usage

Using Bash to manage jobs

Job efficiency

Page 2: Advanced UPPMAX usage More SLURM

More SLURM and other advanced UPPMAX techniques

● A closer look at SLURM
● GPUs on Snowy
● Jobstats — our mutual friend in the fight for efficiency
● Advanced job submission

Page 3: Advanced UPPMAX usage More SLURM

SLURM

● Free, popular, lightweight
● Open source: https://github.com/SchedMD/slurm
● UPPMAX Slurm user guide: https://www.uppmax.uu.se/support/user-guides/slurm-user-guide/

Page 4: Advanced UPPMAX usage More SLURM

More on sbatch

● A recap:
● sbatch -A snic2021-1-123 -t 10:00 -p core -n 10 myjob.sh

− sbatch: Slurm batch command
− -A snic2021-1-123: project name
− -t 10:00: maximum runtime
− -p core: "partition" ("job type")
− -n 10: number of cores
− myjob.sh: job script

Page 5: Advanced UPPMAX usage More SLURM

More on time limits
● -t dd-hh:mm:ss
● 0-00:10:00 = 00:10:00 = 10:00 = 10
● 0-12:00:00 = 12:00:00
● 3-00:00:00 = 3-0
● 3-12:10:15

Page 6: Advanced UPPMAX usage More SLURM

Recall from “Intro to UPPMAX”
● Q: When you have no idea how long a program will run, what should you book?
  − A: A very long time, e.g. 10-00:00:00
● Q: When you do have an idea of how long a program will run, what should you book?
  − A: Overbook by 150%


Page 8: Advanced UPPMAX usage More SLURM

More on partitions
● -p core
● The default
● < 20 cores on Rackham
● < 16 cores on Snowy or Bianca
● A script or program written without any thought to parallelism will use 1 core

Page 9: Advanced UPPMAX usage More SLURM

Quick testing
● The “devel” partition
  − 2 nodes
  − Up to 1 hour in length
  − Only 1 at a time
  − -p devcore, -p devel
● High-priority short jobs
  − 4 nodes
  − Up to 15 minutes
  − --qos=short
● Interactive jobs
  − Up to 12 hours
  − Handy for debugging a script by executing it manually line by line

Page 10: Advanced UPPMAX usage More SLURM

When a job goes wrong
● scancel
  − <jobid>
  − -u username — to cancel all your jobs
  − -t <state> — cancel pending or running jobs
  − -n <name> — cancel jobs with name
  − -i — asks for confirmation

Page 11: Advanced UPPMAX usage More SLURM

Parameters in job script or on command line?

● Command line parameters override script parameters
● A typical script might be:

#!/bin/bash -l
#SBATCH -A snic2021-22-606
#SBATCH -p core
#SBATCH -n 1
#SBATCH -t 24:00:00

● Just a quick test:
● $ sbatch -p devcore -t 00:15:00 job.sh

Page 12: Advanced UPPMAX usage More SLURM

Memory in core or devcore jobs
● -n X
● On Rackham: you get 6.4 GB per core
● On Snowy/Bianca: you get 8 GB per core
● Slurm reports the available memory when an interactive job starts
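Since memory scales with cores, a job that needs more memory than one core provides simply requests more cores. A small worked sketch (the 32 GB target is an invented example, not from the course material):

```shell
# Cores needed on Rackham for ~32 GB, at 6.4 GB (6400 MB) per core
NEEDED_MB=32000
PER_CORE_MB=6400
CORES=$(( (NEEDED_MB + PER_CORE_MB - 1) / PER_CORE_MB ))   # round up
echo "$CORES cores"   # then: sbatch -p core -n $CORES ...
```

Here 5 cores give 5 × 6.4 GB = 32 GB, so the job would be booked with -n 5.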

Page 13: Advanced UPPMAX usage More SLURM

More flags
● -J jobname
● Email:
  − --mail-type=BEGIN,END,FAIL,TIME_LIMIT_80
  − --mail-user: don’t use; set your email correctly in SUPR instead
● Output redirection:
  − --output=my.output.file
  − --error=my.error.file

Page 14: Advanced UPPMAX usage More SLURM

More flags
● Memory
  − -C thin / -C 128GB
  − -C fat / -C 256GB / -C 1TB
● Dependencies: --dependency
● Job array: --array
● More at https://slurm.schedmd.com/sbatch.html
  − Or just man sbatch
  − (though not all flags work on all systems!)

Page 15: Advanced UPPMAX usage More SLURM

GPU Nodes on Snowy
● Nodes with 1 Nvidia T4
● Available to everyone, with priority for the groups that paid for them

#SBATCH -M snowy
#SBATCH --gres=gpu:1 (or --gpus=1 or --gpus-per-node=1)

● There is a system installation of CUDA v11; other versions are available via modules
● https://www.uppmax.uu.se/support/user-guides/using-the-gpu-nodes-on-snowy/

Page 16: Advanced UPPMAX usage More SLURM

Time for a break?
● That’s enough about sbatch
● Next up: monitoring jobs and job efficiency

Page 17: Advanced UPPMAX usage More SLURM

Monitoring
● jobinfo — a wrapper around squeue
  − jobinfo -u username
  − jobinfo -A snic2021-22-606
● Can also use squeue directly

Page 18: Advanced UPPMAX usage More SLURM

Priority
● Roughly:
  − First job of the day gets elevated priority
  − Other normal jobs run in order of submission (subject to scheduling)
  − Projects exceeding their allocation get a successively lower priority category
  − Bonus jobs run after higher priority categories

Page 19: Advanced UPPMAX usage More SLURM

Priority
● In practice:
  − Submit early — run early
  − Bonus jobs always run eventually, but sometimes wait until night or the weekend
● In detail:
  − https://www.uppmax.uu.se/support/faq/running-jobs-faq/your-priority-in-the-waiting-job-queue/

Page 20: Advanced UPPMAX usage More SLURM

Job efficiency
● jobstats — our mutual friend in the fight for productivity
  − Only works for jobs > 5-15 minutes in length
  − -r — check running jobs
  − -A <project> — check all recent jobs for a project
  − -p — produce CPU & memory usage plot
  − -M <cluster> — check jobs on another cluster

Page 21: Advanced UPPMAX usage More SLURM

Jobstats exercise
● Generate jobstats plots for your jobs
  − First, find some job IDs from this month:
  − $ finishedjobinfo -m <yourusername>
  − Note the job IDs of some interesting jobs
  − Generate the images:
  − $ jobstats -p id1 id2 id3
● Look at the images. I have put some interesting ones in /proj/introtouppmax/labs/moreslurm/jobstatsplots
  − $ eog *.png &

Page 22: Advanced UPPMAX usage More SLURM

Jobstats plots
● Which of the plots in labs/moreslurm/jobstatsplots:
  − Show good CPU/memory usage?
  − Show a job that needs a fat node?

Page 23: Advanced UPPMAX usage More SLURM

Time for a break?
● Next up: advanced job submission

Page 24: Advanced UPPMAX usage More SLURM

Multicore jobs
● (Efficient) multicore jobs need either high memory utilisation or multiple execution threads. Either:
  − The job script launches multiple programs
  − One program runs multithreaded

Page 25: Advanced UPPMAX usage More SLURM

Multithreaded script

#!/bin/bash -l
#SBATCH -A snic2021-22-606
#SBATCH -p core
#SBATCH -n 4
#SBATCH -t 00:15:00
#SBATCH -J 4commands

cd /proj/introtouppmax/labs/moreslurm
work.sh 1 10000000 &
work.sh 2 15000000 &
work.sh 3 20000000 &
work.sh 4 10000000 &
wait

Page 26: Advanced UPPMAX usage More SLURM

Multithreaded program
● The program may mention “OpenMP”, “MPI”, “pthreads”, or other parallel programming technologies

#!/bin/bash -l
#SBATCH -A snic2021-22-606
#SBATCH -p core
#SBATCH -n 4
#SBATCH -t 00:15:00
#SBATCH -J multithreaded

cd /proj/introtouppmax/labs/moreslurm
./work_threaded.sh

Page 27: Advanced UPPMAX usage More SLURM

Dependencies
● --dependency=afterok:<jobid>: the job is added to the queue but starts only after job <jobid> ends successfully
● Very handy for “fire and forget” workloads
● Potentially lots of time spent in the queue
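As a sketch of how a two-step pipeline can be chained from the shell (step1.sh and step2.sh are hypothetical job scripts, not part of the course material): sbatch’s --parsable flag makes it print just the job id, which is convenient in scripts. The stub function below stands in for sbatch so the snippet can be dry-run off-cluster; remove it on a real Slurm system.

```shell
# Stub for dry-running without Slurm (remove on a real cluster);
# real `sbatch --parsable` prints the job id of the submitted job.
sbatch() { echo 12345; }

# Submit step 1 and capture its job id
jid=$(sbatch --parsable step1.sh)
# Step 2 is queued now but starts only if step 1 ends successfully
sbatch --dependency=afterok:"$jid" step2.sh
```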

Page 28: Advanced UPPMAX usage More SLURM

Exercise
● Look at /proj/introtouppmax/labs/moreslurm/dependency/
● Run dependency_submit.sh and see how it works
● Read man sbatch for more information
  − When might you use afterany instead of afterok?
  − When might singleton be a good idea?
● Discuss in HackMD

Page 29: Advanced UPPMAX usage More SLURM

Dividing up a big chunk of work
● A common question is how to divide up a really big job and manage the chunks
● The best approach depends on specifics, but is usually one of:
  − Job arrays
  − Some Bash scripting that submits lots of reasonable-sized jobs
  − A workflow manager such as Snakemake or Nextflow

Page 30: Advanced UPPMAX usage More SLURM

Snakemake and Nextflow
● Conceptually similar, but with different flavours
● First define steps, each with an input, an output, and a command that transforms the input into the output
● Then just ask for the desired output, and the system will handle the rest

Page 31: Advanced UPPMAX usage More SLURM

Job arrays
● Submit many jobs at once with the same parameters
● Use $SLURM_ARRAY_TASK_ID in the script to find the correct part of the workload
● You can find a simple example in moreslurm/jobarrays
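A minimal sketch of the idea (the script name and chunk size are invented; this is not the moreslurm/jobarrays example itself): submitted with sbatch --array=1-4 task.sh, each array task reads its own $SLURM_ARRAY_TASK_ID and works on its slice of the items.

```shell
#!/bin/bash -l
# Hypothetical array task script: submit with  sbatch --array=1-4 task.sh
# Each array task handles RUNS_PER_TASK consecutive work items.
RUNS_PER_TASK=5
# Fall back to task 1 so the script can be dry-run outside Slurm
TASK_ID=${SLURM_ARRAY_TASK_ID:-1}
START=$(( (TASK_ID - 1) * RUNS_PER_TASK + 1 ))
END=$(( TASK_ID * RUNS_PER_TASK ))
for i in $(seq "$START" "$END"); do
    echo "working on item $i"   # replace echo with the real program
done
```

Task 1 handles items 1-5, task 2 items 6-10, and so on; the same pattern scales to any chunk size.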

Page 32: Advanced UPPMAX usage More SLURM

Exercise
● Suppose you have to do 1000 runs of a program and want to do 50 runs per job
● Modify the jobarrays example to submit 20 1-core jobs in an array, each of which will run “echo” 50 times

Page 33: Advanced UPPMAX usage More SLURM

DIY Workflows
● For middle-of-the-road situations, some simple Bash (or Python) will suffice
● labs/moreslurm/manyjobs/ contains an example
  − job.sh does a “chunk” of work
  − jobsubmit.sh submits the jobs to Slurm
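The submit side of such a setup can be as small as one loop. This is a sketch, not the actual jobsubmit.sh; the totals, chunk size, and job.sh arguments are invented, and echo stands in for sbatch so the loop can be dry-run:

```shell
#!/bin/bash
# Split 1000 work items into chunks of 100 and submit one job per chunk.
# job.sh is assumed to take the first and last item number as arguments.
TOTAL=1000
CHUNK=100
for START in $(seq 1 "$CHUNK" "$TOTAL"); do
    END=$(( START + CHUNK - 1 ))
    echo sbatch job.sh "$START" "$END"   # drop the echo to really submit
done
```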

Page 34: Advanced UPPMAX usage More SLURM

THE END

Now you know everything there is to know about using Slurm at UPPMAX