Practical introduction to Anselm: environment, jobs, software and libs


Branislav Jansík


Accessing HPC resources

Grant competitions
● Open Access: 2x per year
● Internal Access (via IT4I): 4x per year
● Director's discretion

Obtaining login credentials

Authorization chain: the Allocation Committee grants the resource allocation to the PI; the PI in turn authorizes Collaborator 1 ... Collaborator n to utilize the resources.

Obtaining login credentials
Contact support to get the login credentials

To: support@it4i.cz
Subject: Access to Anselm

Dear support,

Please open a user account for me and attach the account to OPEN-0-0.
Name and affiliation: John Smith, john.smith@myemail.com, Department of Chemistry, MIT, US
I have read and accept the Acceptable use policy document (attached).

Preferred username: johnsm

Thank you,
John Smith
(Digitally signed)

Obtaining login credentials
Authorization by PI

To: support@it4i.cz
Subject: Authorization to Anselm

Dear support,

Please include my collaborators to project OPEN-0-0.

John Smith, john.smith@myemail.com, Department of Chemistry, MIT, US
Jonas Johansson, jjohansson@otheremail.se, Department of Physics, Royal Institute of Technology, Sweden
Luisa Rossi, lr@emailitalia.it, Department of Mathematics, National Research Council, Italy

Thank you,
PI
(Digitally signed)

Anselm cluster HPC infrastructure

Storage
● HOME: 300 TB (shared)
● SCRATCH: 130 TB (shared)

Interconnect
● Infiniband, non-blocking
● 40 Gb/s

Compute
● 209 nodes
● SandyBridge 2.4 GHz x86-64
● 16 cores, 256-bit AVX instructions
● 64 GB RAM
● 300 GB local disk
● 27x accelerator

Anselm cluster HPC infrastructure

Utilization
You need to carefully consider how to utilize all the 16 cores available on the node and how to use multiple nodes at the same time.
● Run the right way
● Parallelize your code


Explore the login node

● Shell configuration
● Software modules

Logging in
Diagram: user0 logs in to the cluster through the switch to the login node.
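A minimal login sketch (the hostname anselm.it4i.cz and the username johnsm are illustrative; use the address and credentials provided by support):

$ ssh johnsm@anselm.it4i.cz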

Environment and modules

Environment
● Linux operating system
● Bash shell
● Gnome GUI
● The .bashrc file: store your aliases and other settings here

Modules
● Set up the application paths, library paths and environment variables for a particular application

Environment customizations

● Define aliases
● Define useful functions
● Run commands
● Load modules
● Save your settings to the .bashrc file

$ alias qs='qstat -a'

$ swd ()
> {
>   WDIR=$(pwd)
> }

$ wd ()
> {
>   cd $WDIR
> }

$ date
$ hostname

$ alias ch='rspbs --get-node-ncpu-chart'
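For illustration, the swd and wd functions defined above bookmark a directory and return to it later (the directory path is illustrative):

$ cd /scratch/$USER/project1
$ swd          # remember the current directory in WDIR
$ cd ~
$ wd           # jump back to /scratch/$USER/project1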

Modules

● Set up the application paths, library paths and environment variables for a particular application

● Convenient way to set up the whole environment in one command

$ module avail

$ module load matlab

$ module unload

$ module list
$ module load impi
$ module swap impi openmpi

$ module whatis matlab

Module versions

$ module avail
----- /opt/modules/modulefiles/mpi -------
bullxmpi/bullxmpi-1.2.4.1    mvapich2/1.9-gcc46
impi/4.1.0.024               mvapich2/1.9-icc
impi/4.1.0.030               openmpi/1.6.5-gcc(default)
impi/4.1.1.036(default)      openmpi/1.6.5-gcc46
mvapich2/1.9-gcc(default)    openmpi/1.6.5-icc

● Modules come in many variants: version variant, compiler variant, library variant, etc.

● Pick a variant
$ module load openmpi/1.6.5-icc

Inside of a module

$ less /opt/modules/modulefiles/mpi/openmpi/.common

$ less /opt/modules/modulefiles/mpi/openmpi/1.6.5-icc
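To see what a module sets without opening the modulefile itself, the standard module show subcommand can also be used:

$ module show openmpi/1.6.5-icc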

Save settings in .bashrc

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
  . /etc/bashrc
fi

# User specific aliases and functions
alias qs='qstat -a'
module load PrgEnv-gnu

# Display information to standard output - only in interactive ssh session
if [ -n "$SSH_TTY" ]
then
  module list # Display loaded modules
fi

Allocation and execution
Resource allocation and execution via the PBS queue system

The queue system

Queue                                  Active project  Resources               Priority  Permit  Walltime
Express queue qexp                     no              8 nodes                 1         no      1 h
Production queue qprod                 yes             209 nodes               3         no      48 h
Long queue qlong                       yes             60 nodes                3         no      3x48 h
Dedicated queues qnvidia, qmic, qfat   yes             Nvidia, MIC, Fat nodes  2         yes     48 h
Free resource queue qfree              yes             180 nodes               4         no      12 h

Queue status
● Check the queue status on the command line

$ qstat -q
$ qstat -a
$ rspbs --summary
$ rspbs --get-node-ncpu-chart

● Check the queue status on the web (coming soon!)

Resource accounting policy
● Core hours accounting
  - accounted on a wall-clock basis
  - runs whenever the cores are allocated or blocked

Example 1: Running for 10 hours on 160 cores (10 nodes) costs 10*160 = 1600 core hours

Example 2: Running for 10 hours on 16 cores using scatter:excl (10 nodes) still costs 10*160 = 1600 core hours, since whole nodes are blocked

● Check the consumed core hours
$ it4ifree

Job submission
Allocation and execution via qsub

Job submission
● Use the qsub command to submit your job to a queue
  - it will allocate the nodes
  - it will execute the jobscript

$ qsub -A Project ID -q queue -l select=x:ncpus=y jobscript

What are chunks?
● A chunk is a set of resources (here: cores) allocated together on a single node; select=x requests x chunks

● $ qsub -A Project ID -q queue -l select=2:ncpus=4 jobscript
  requests 2 chunks of 4 cores each

● $ qsub -A Project ID -q queue -l select=2:ncpus=16 jobscript
  requests 2 chunks of 16 cores each, i.e. two full nodes

Job submission
$ qsub -A Project ID -q queue -l select=x:ncpus=y jobscript

$ qsub -A OPEN-0-0 -q qprod -l select=64:ncpus=16,walltime=03:00:00 ./myjob

$ qsub -A OPEN-0-0 -q qprod -l select=64:ncpus=16:cpu_freq=24 -I

$ qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=16:host=cn204+1:ncpus=16:host=cn205 -I

$ qsub ./myjob

Job management
● Use the qstat and check-pbs-jobs commands to check job status

$ qstat -a
$ qstat -an
$ qstat -an -u username
$ qstat -f jobid

$ check-pbs-jobs --check-all
$ check-pbs-jobs --print-job-out
$ check-pbs-jobs --ls-lscratch

Job management
● Use the qhold, qrls, qdel, qsig or qalter commands to manage jobs

$ qsub -A OPEN-0-0 -l select=100 ./jobscript
$ qhold jobid
$ qrls jobid
$ qalter -l select=101,walltime=00:15:00 jobid

$ qsig jobid
$ qdel jobid

Job execution
$ qsub -A Project ID -q queue -l select=x:ncpus=y jobscript

$ qsub -A Project ID -q queue -l select=x:ncpus=y -I

● Jobscript is executed on the first node of the allocation
● Jobscript is executed in the HOME directory
● The file $PBS_NODEFILE contains the list of allocated nodes (see the sketch below)
● Allocated nodes are accessible to the user
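A small sketch of working with $PBS_NODEFILE inside a jobscript (purely illustrative):

# list the allocated nodes and count them
cat $PBS_NODEFILE
wc -l < $PBS_NODEFILE

# run a command on every allocated node
for node in $(sort -u $PBS_NODEFILE); do
  ssh $node hostname
done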

Job execution
$ qsub -A Project ID -q queue -l select=x:ncpus=y jobscript

#!/bin/bash

# change to local scratch directory
cd /lscratch/$PBS_JOBID || exit

# copy input file to scratch
cp $PBS_O_WORKDIR/input .
cp $PBS_O_WORKDIR/myprog.x .

# execute the calculation
./myprog.x

# copy output file to home
cp output $PBS_O_WORKDIR/.

# exit
exit
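The jobscript above uses the node-local /lscratch, so it would typically be submitted to a single node, for example (project, walltime and jobscript name are illustrative):

$ qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=16,walltime=01:00:00 ./myjob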

Job execution
$ qsub jobscript

#!/bin/bash
#PBS -q qprod
#PBS -N MYJOB
#PBS -l select=100:ncpus=16:mpiprocs=1:ompthreads=16
#PBS -A OPEN-0-0

# change to scratch directory, exit on failure
cd /scratch/$USER/myjob || exit

# load the mpi module
module load openmpi

# execute the calculation
mpiexec ./mympiprog.x

# exit
exit

Job execution
● Anselm nodes are NUMA nodes

$ numactl --membind=0 --cpunodebind=0 command

Diagram: node memory is split into two NUMA domains; 60% efficiency when NUMA placement is not respected.
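As a sketch, two independent instances of a program can each be pinned to one NUMA domain (the program name is illustrative):

$ numactl --cpunodebind=0 --membind=0 ./myprog.x &
$ numactl --cpunodebind=1 --membind=1 ./myprog.x &
$ wait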

Software environment
• Programming environment
  ‒ GNU compilers: gfortran, gcc, g++, gdb
  ‒ Intel compilers: ifort, icc, idb
  ‒ PGAS compilers: upc
  ‒ Interpreters: Perl, Python, Java, Ruby, bash
• HPC libraries: Intel MKL suite, ATLAS, GOTO, PETSc, ScaLAPACK, PLASMA and MAGMA
• Communication libraries: bullx MPI, OpenMPI, OpenSHMEM
• Performance analysis
  ‒ gprof
  ‒ PAPI, Scalasca
  ‒ HPCToolkit, Open|SpeedShop

System environment
• Commercial products
  ‒ Comsol
  ‒ Matlab
  ‒ Ansys
  ‒ ….
• Check the available modules for the list of software
$ module avail
$ module load bullxde
$ module load bullxde papi

Octave, R and Matlab
• Octave and R are linked to the HPC libraries FFTW3 and MKL (runs parallel on 16 cores)
• GUI
  $ module load octave/hg-20130730
  $ module load Rstudio
• Matlab licenses
  $ module load matlab
  $ module load matlab/R2013a-COM
• Batch execution
  $ matlab -nosplash -nodisplay -r mscript > moutput.out
  $ octave -q --eval oscript > ooutput.out
  $ R CMD BATCH rscript.R routput.out
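A minimal jobscript sketch for batch Matlab execution (queue, project, walltime and the script name mscript are illustrative):

#!/bin/bash
#PBS -q qprod
#PBS -A OPEN-0-0
#PBS -l select=1:ncpus=16,walltime=02:00:00

# change to the submission directory
cd $PBS_O_WORKDIR || exit

# load the matlab module
module load matlab

# run mscript.m in batch mode
matlab -nosplash -nodisplay -r mscript > moutput.out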

ISV Licenses
● Check available licenses in the license state file

Ansys        /apps/user/licenses/ansys_features_state.txt
Comsol       /apps/user/licenses/comsol_features_state.txt
comsol-edu   /apps/user/licenses/comsol-edu_features_state.txt
Matlab       /apps/user/licenses/matlab_features_state.txt
matlab-edu   /apps/user/licenses/matlab-edu_features_state.txt

● Tell PBS about the license you need

● Licenses are not monolithic, they are split into features

$ qsub … -l feature__matlab__Image_Toolbox

● Grab the license ASAP in your job

Programming

● GNU C, C++, Fortran 77/90/95
● Intel C, C++, Fortran 77/90/95
● GNU UPC
● Berkeley UPC
● Nvidia nvcc

Programming environments:
● module load PrgEnv-gnu
● module load PrgEnv-intel

Intel Parallel Studio
● Intel Compilers

C, C++, Fortran 77/90/95

● Intel Debugger
$ idb

● Intel MKL
$ icc myprog.c -mkl

● Intel IPP (whatever function you can think of)
● Intel TBB (task-based threaded parallelism programming API)
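Putting the above together, a compile sketch (the source file name is illustrative):

$ module load PrgEnv-intel
$ icc -O2 myprog.c -mkl -o myprog.x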

MPI
MPI families: OpenMPI and MPICH2
● OpenMPI 1.6.5
● BullxMPI 1.2.4
● Intel MPI 4.1
● MPICH2 1.9

● Implementations differ by thread support level
● Freely combine MPI library and compiler
● Compile MPI programs using the MPI wrappers (mpicc, mpif90, etc.), as in the sketch below
● Do not mix MPI implementations
● Choose the right way to run an MPI program
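A minimal sketch of building an MPI program with the wrappers, assuming the openmpi module (source file names are illustrative):

$ module load openmpi
$ mpicc  -o mympiprog.x mympiprog.c    # C source
$ mpif90 -o mympiprog.x mympiprog.f90  # or Fortran source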

Ways to run MPI programs
1 process per node, 16 threads per process

Best for memory demanding apps with good cache data use

$ qsub -l select=xx:ncpus=16:mpiprocs=1:ompthreads=16

Ways to run MPI programs
2 processes per node, 8 threads per process

Best for memory-bound apps with scalable memory demand

$ qsub -l select=xx:ncpus=16:mpiprocs=2:ompthreads=8


Ways to run MPI programs
16 processes per node, 1 thread per process

Best for highly scalable applications with low communication demand.

$ qsub -l select=xx:ncpus=16:mpiprocs=16:ompthreads=1


MPI jobscript

#!/bin/bash
#PBS -l select=100:ncpus=16:mpiprocs=1:ompthreads=16

# change to scratch directory, exit on failure
cd /scratch/$USER/myjob || exit

# load the mpi module
module load openmpi

# execute the calculation
mpiexec ./mympiprog.x

# exit
exit

1 process per node, 16 threads per process

MPI jobscript

#!/bin/bash
#PBS -l select=100:ncpus=16:mpiprocs=2:ompthreads=8

# change to scratch directory, exit on failure
cd /scratch/$USER/myjob || exit

# load the mpi module
module load openmpi

# execute the calculation
mpiexec -bysocket -bind-to-socket ./mympiprog.x

# exit
exit

2 processes per node, 8 threads per process

MPI jobscript

#!/bin/bash
#PBS -l select=100:ncpus=16:mpiprocs=16:ompthreads=1

# change to scratch directory, exit on failure
cd /scratch/$USER/myjob || exit

# load the mpi module
module load openmpi

# execute the calculation
mpiexec -bycore -bind-to-core ./mympiprog.x

# exit
exit

16 processes per node, 1 thread per process

Tips and tricks
Data transfers to and out of Anselm
● Use ssh with the fast block cipher aes128-ctr (160 MB/s); see the example below
● Use multiple ssh connections to bypass the 160 MB/s boundary
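For example, the cipher can be selected directly on the scp command line (file name, username and target path are illustrative):

$ scp -c aes128-ctr largefile.tar johnsm@anselm.it4i.cz:/scratch/johnsm/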

File system access
● Set up the stripe size and stripe count; see the sketch below
● Use local scratch or ramdisk for small files
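A sketch of setting and checking the stripe count on a scratch directory with the standard Lustre lfs tool (directory and count are illustrative):

$ lfs setstripe -c 10 /scratch/$USER/bigfiles
$ lfs getstripe /scratch/$USER/bigfiles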

Respect the NUMA
● Consider using numactl
● Consider using MPI binding

Conclusions
Computational resources available to the general academic community are allocated in competition, based on scientific and technical quality.

Read the documentation, contact support, contact me!

IT4Innovations SuperComputer Center is here to run the computer and assist you in using it

branislav.jansik@vsb.cz
