Exercises: Abel/Colossus and SLURM · Abel/Colossus and SLURM Sabry Razick The Research Computing...

Exercises: Abel/Colossus and SLURM

Sabry Razick

The Research Computing Services Group, USIT

April 05, 2017

Topics• Connect to Abel/Colossus

• Running a simple job

• Running a simple job -- qlogin

• Job script

• Customize batch script

• Parallel jobs

Log into Abel

• If on Windows, download

– Gitk/Putty for connecting

– WinSCP for copying files

• On Unix systems, open a terminal and type:

ssh <username>@abel.uio.no

• Use your UiO/Nortur username and password.

Gitbash

Examples - file location

/cluster/teaching/abel_tutorial/APRIL2017

Interactive login (Qlogin)

● Login to Abel from you laptop● Request to occupy some resources from SLURM● Wait until SLURM grant you the resources● Execute the job as it was in your laptop

Running a job on the cluster -1

SLURMABEL

qlogin> qlogin --account=xxx --time=00:10:00 salloc: Pending job allocation 12989428 salloc: job 12989428 queued and waiting for resources salloc: job 12989428 has been allocated resources salloc: Granted job allocation 12989428 srun: Job step created>source /cluster/bin/jobsetup Starting job 12990135 ("") on c15-6 at Wed Oct 28 18:14:31 CET 2015>hostname

Job script

Running a simple job

• Create a directory as follows• mkdir $HOME/RCS_tutorial

• Find the project name• projects

• Create job script• Sample script from following location• /cluster/teaching/abel_tutorial/APRIL2017/hello.slurm

• Set permission, so you can edit it• chmod +w hell.slurm

● Login to Abel from you laptop● Create a job script, with parameters and include the program

to run● Hand it over to the workload manager ● The workload manager will handle the job queue, monitor the

progress and let you know the outcome.● Inspect results

Running a job on the cluster - 2

Running a simple job

• Correct the project name in the hello.slurm

• Submit the job to Abel• sbatch hello.slurm

#!/bin/bash

#SBATCH --job-name=RCS0416_hello#SBATCH --account=xxx#SBATCH --time=00:02:00#SBATCH --mem-per-cpu=256M#SBATCH --ntasks=1

source /cluster/bin/jobsetupset -o errexit

echo "Hello from Abel"sleep 90

Job script

Resources

The job

Join the queue> sbatch hello.slurm

● Investigate while the job is running

> squeue -u $USER > scontrol show job ####

> scancel ####

● If you want to cancel the job (don't do this now)

Investigate

>scontrol show job 12989353 JobId=12989353 Name=RCS1115_hello UserId=sabryr(243460) GroupId=users(100) Priority=22501 Nice=0 Account=staff QOS=staff JobState=COMPLETED Reason=None ... RunTime=00:00:02 TimeLimit=00:01:00 TiCommand=../RCS_tutorial/hello.slurm WorkDir=../RCS_tutorial StdErr=../RCS_tutorial/slurm-12989552.out StdOut=../RCS_tutorial/slurm-12989552.out

Introduce an error

#!/bin/bash

#SBATCH --job-name=RCS1115_hello#SBATCH --account=xxx#SBATCH --time=00:02:00#SBATCH --mem-per-cpu=512M

xxecho "Hello from Abel"echo "Hello2 from Abel"

• Investigate error file

I/O - Files

• User scratch directory if you are accessing a

file multiple time during a job

• Choose where you start the job from • On Abel jobs accessing files and/or writing out

large outputs will run faster if /work is used

compared to running from $HOME

• On colossus start from /cluster/projects/pXXX

I/O - Files

• When handling very large number of files try

to use • Archives - just in time extract or write directly to

Exercises: Abel/Colossus and SLURM · Abel/Colossus and SLURM Sabry Razick The Research Computing...

Documents

Modules and Slurm - uliege.be

ABEL-HDL Primer - eelinux.ee.usm.maine.edueelinux.ee.usm.maine.edu/courses/ele172/abel/HDL-ABEL Primer.pdf · ABEL-HDL Primer ABEL Primer Contents ... ABEL is an industry-standard

SLURM command summary - Slurm Workload Manager Date 10/3/2017 10:17:03 AM

Slurm Migration

Tutorial 3: Cgroups Support On SLURM · Slurm User Group 2012 (c) Bull 2012 Tutorial 3: Cgroups Support On SLURM SLURM User Group 2012, ... SLURM Cgroups Documentation • Cgroups

Colossus Charms

3. Slurm at BSC

Slurm License Management

Evaluating SLURM simulator with real-machine SLURM and

Colossus - Valcontrol

Slurm Advanced Scheduling - Sciencesconf.org

Slurm and Abel job scripts - Universitetet i oslo · Slurm and Abel job scripts Katerina Michalickova The Research Computing Services Group SUF/USIT November 13, 2013

How to Install Software on Abel (and Colossus)€¦ · HowtoInstallSoftwareonAbel(andColossus) Bjørn-HelgeMevik Research Infrastructure Services Group, USIT, UiO RISCourseWeek Bjørn-Helge

Slurm Guide

Introduction to Abel and SLURM - Universitetet i oslo · Introduction to Abel and SLURM ... . ... we use the Simple Linux Utility for Resource Management

Introduction to Abel and SLURM - Universitetet i oslo · Introduction to Abel and SLURM Katerina Michalickova ... . …

BitFénix Colossus Window

Introduction to Abel/Colossus and the queuing system

Colossus Worksheet

Introduction to Slurm - NASA