The Campus Cluster


What is the Campus Cluster?

• Batch job system
• High throughput
• High latency
• Available resources:
– ~450 nodes
– 12 cores/node
– 24-96 GB memory
– Shared high-performance filesystem
– High-speed multi-node message passing


What isn’t the Campus Cluster?

• Not: an instantly available computation resource
– Can wait up to 4 hours for a node

• Not: high-I/O friendly
– Network disk access can hurt performance

• Not: ….


Getting Set Up


Getting started

• Request an account: https://campuscluster.illinois.edu/invest/user_form.html

• Connecting: ssh to taub.campuscluster.illinois.edu
– Use your NetID and AD password
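For example, from a terminal on your own machine (netid is a placeholder for your own NetID):

ssh netid@taub.campuscluster.illinois.edu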


Where to put data

• Home directory ~/
– Backed up, currently no quota (in the future, tens of GB)

• Use /scratch for temporary data (~10 TB)
– Scratch data is currently deleted after ~3 months
– Available on all nodes
– No backup

• /scratch.local (~100 GB)
– Local to each node, not shared across the network
– Beware that other users may fill the disk

• /projects/VisionLanguage/ (~15 TB)
– Keep things tidy by creating a directory for your netid
– Backed up

• Current filesystem best practices (should improve for Cluster v. 2):
– Try to do batched writes to one large file
– Avoid many little writes to many little files
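As a minimal sketch of the one-time per-user setup described above (netid is a placeholder for your own NetID; the /scratch subdirectory name is just a suggested convention):

[iendres2 ~]$ mkdir -p /projects/VisionLanguage/netid
[iendres2 ~]$ mkdir -p /scratch/netid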


Backup = Snapshots (just learned this yesterday)

• Snapshots taken daily

• Not intended for disaster recovery
– Stored on the same disk as the data

• Intended for accidental deletes/overwrites, etc.
– Backed-up data can be accessed at: /gpfs/ddn_snapshot/.snapshots/<date>/<path>

e.g. to recover an accidentally deleted file from a home directory:
/gpfs/ddn_snapshot/.snapshots/2012-12-24/home/iendres2/christmas_list
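Restoring is then just a copy out of the snapshot tree, e.g. for the file above:

[iendres2 ~]$ cp /gpfs/ddn_snapshot/.snapshots/2012-12-24/home/iendres2/christmas_list ~/christmas_list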


Moving data to/from the cluster

• Only option right now is sftp/scp

• SSHFS lets you mount a directory from a remote machine
– Haven't tried this, but it might be useful
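For example, from your local machine (netid and the file names are placeholders; the sshfs line is untested here, as noted above, and requires sshfs installed locally):

scp results.tar.gz netid@taub.campuscluster.illinois.edu:~/        # copy a local file to your cluster home
scp -r netid@taub.campuscluster.illinois.edu:~/workdir/output ./   # copy a directory back from the cluster
mkdir -p ~/taub_home && sshfs netid@taub.campuscluster.illinois.edu: ~/taub_home   # mount your cluster home locally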


Modules

[iendres2 ~]$ module load <modulename>

Manages the environment; typically used to add software to the path:
– To get the latest version of matlab:
[iendres2 ~]$ module load matlab/7.14
– To find modules such as vim, svn:
[iendres2 ~]$ module avail


Useful Startup Options

Appended to the end of my .bashrc:
– Make default permissions the same for user and group, useful when working on a joint project
• umask u=rwx,g=rwx
– Safer alternative: don't allow group writes
• umask u=rwx,g=rx
– Load common modules
• module load vim
• module load svn
• module load matlab
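Put together, the additions to the end of ~/.bashrc might look like this sketch (keep only one of the two umask variants):

# Group-writable defaults for joint projects (use g=rx instead for the safer, read-only variant)
umask u=rwx,g=rwx
# Commonly used software
module load vim
module load svn
module load matlab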


Submitting Jobs


Queues

– Primary (VisionLanguage)
• Nodes we own (currently 8)
• Jobs can last 72 hours
• We have priority access

– Secondary (secondary)
• Anyone else's idle nodes (~500)
• Jobs can only last 4 hours; they are automatically killed after that
• Not unusual to wait 12 hours for a job to begin running


Scheduler

• Typically behaves as first come, first served

• There are claims of priority scheduling, but we don't know how it works…


Types of job

– Batch jobs
• No graphics; runs and completes without user interaction

– Interactive jobs
• Brings a remote shell to your terminal
• X-forwarding available for graphics

• Both wait in the queue the same way


Scheduling jobs

– Batch jobs
• [iendres2 ~]$ qsub <job_script>
• job_script defines the parameters of the job and the actual command to run
• Details on job scripts to follow

– Interactive jobs
• [iendres2 ~]$ qsub -q <queuename> -I -l walltime=00:30:00,nodes=1:ppn=12
• Include -X for X-forwarding
• Details on -l parameters to follow


Configuring Jobs

Basics

• The parameters of a job are defined by a bash script which contains "#PBS" directives followed by the script to execute:

#PBS -q VisionLanguage
#PBS -l nodes=1:ppn=12
#PBS -l walltime=04:00:00
…
cd ~/workdir/
echo "This is job number ${PBS_JOBID}"

• Queue to use (-q): VisionLanguage or secondary

• Number of nodes: 1, unless using MPI or other distributed programming

• Processors per node (ppn): always 12. The smallest computation unit is a physical node, which has 12 cores (with current hardware)*
*Some queues are configured to allow multiple concurrent jobs per node, but this is uncommon

• walltime: the maximum time the job will run for; it is killed if it exceeds this
– Up to 72:00:00 (72 hours) for the primary queue
– Up to 04:00:00 (4 hours) for the secondary queue

• Bash commands are allowed anywhere in the script and will be executed on the scheduled worker node after all PBS directives are handled
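Putting the pieces together, a complete sketch of a hypothetical job script my_job.pbs and its submission (the queue, directory, and program name are placeholders):

#PBS -q secondary
#PBS -l nodes=1:ppn=12
#PBS -l walltime=04:00:00

# Everything below runs on the assigned worker node
cd ~/workdir/
echo "This is job number ${PBS_JOBID}"
./my_program

[iendres2 ~]$ qsub my_job.pbs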


Scheduler variables (from the man page)

There are some reserved variables that the scheduler fills in once the job is scheduled (see `man qsub` for more):

PBS_O_HOST - the name of the host upon which the qsub command is running.

PBS_SERVER - the hostname of the pbs_server to which qsub submits the job.

PBS_O_QUEUE - the name of the original queue to which the job was submitted.

PBS_O_WORKDIR - the absolute path of the current working directory of the qsub command.

PBS_ARRAYID - each member of a job array is assigned a unique identifier (see -t).

PBS_ENVIRONMENT - set to PBS_BATCH to indicate the job is a batch job, or to PBS_INTERACTIVE to indicate the job is a PBS interactive job (see the -I option).

PBS_JOBID - the job identifier assigned to the job by the batch system.

PBS_JOBNAME - the job name supplied by the user.

PBS_NODEFILE - the name of the file containing the list of nodes assigned to the job (for parallel and cluster systems).

PBS_QUEUE - the name of the queue from which the job is executed.

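A small illustrative job script that just logs a few of these variables can help when debugging where and how a job ran; for example:

#PBS -q secondary
#PBS -l nodes=1:ppn=12
#PBS -l walltime=00:10:00

echo "Job ${PBS_JOBID} (${PBS_JOBNAME}) was submitted from ${PBS_O_HOST} to queue ${PBS_O_QUEUE}"
echo "Submission directory: ${PBS_O_WORKDIR}"
cat ${PBS_NODEFILE}    # the node(s) assigned to this job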


Monitoring Jobs

[iendres2 ~]$ qstat

Sample output:

JOBID             JOBNAME           USER       WALLTIME   STATE   QUEUE
333885[].taubm1   r-afm-average     hzheng8    0          Q       secondary
333899.taubm1     test6             lee263     03:33:33   R       secondary
333900.taubm1     cgfb-a            dcyang2    09:22:44   R       secondary
333901.taubm1     cgfb-b            dcyang2    09:31:14   R       secondary
333902.taubm1     cgfb-c            dcyang2    09:28:28   R       secondary
333903.taubm1     cgfb-d            dcyang2    09:12:44   R       secondary
333904.taubm1     cgfb-e            dcyang2    09:27:45   R       secondary
333905.taubm1     cgfb-f            dcyang2    09:30:55   R       secondary
333906.taubm1     cgfb-g            dcyang2    09:06:51   R       secondary
333907.taubm1     cgfb-h            dcyang2    09:01:07   R       secondary
333908.taubm1     ...conp5_38.namd  harpole2   0          H       cse
333914.taubm1     ktao3.kpt.12      chandini   03:05:36   C       secondary
333915.taubm1     ktao3.kpt.14      chandini   03:32:26   R       secondary
333916.taubm1     joblammps         daoud2     03:57:06   R       cse

States:
Q - Queued, waiting to run
R - Running
H - Held by user or admin; won't run until released (see qhold, qrls)
C - Closed: finished running
E - Error: this usually doesn't happen; it indicates a problem with the cluster

grep is your friend for finding specific jobs
(e.g. qstat -u iendres2 | grep " R " gives all of my running jobs)


Managing Jobs

qalter, qdel, qhold, qmove, qmsg, qrerun, qrls, qselect, qsig, qstat

Each takes a jobid + some arguments
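For example (the job ID is a placeholder; exact arguments vary per command, see the man pages):

[iendres2 ~]$ qdel <jobid>      # kill a queued or running job
[iendres2 ~]$ qhold <jobid>     # hold a queued job so it won't start
[iendres2 ~]$ qrls <jobid>      # release a held job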


Problem: I want to run the same job with multiple parameters

#PBS -q VisionLanguage
#PBS -l nodes=1:ppn=12
#PBS -l walltime=04:00:00

cd ~/workdir/
./script <param1> <param2>

Where: param1 = {a, b, c} and param2 = {1, 2, 3}

Solution: Create a wrapper script to iterate over the parameters (shown after Problem 2 below)


Problem 2: I can't pass parameters into my job script

#PBS -q VisionLanguage
#PBS -l nodes=1:ppn=12
#PBS -l walltime=04:00:00

cd ~/workdir/
./script <param1> <param2>

Where: param1 = {a, b, c} and param2 = {1, 2, 3}

Solution 2: Hack it! We can pass the parameters via the job name and delimit them using the '-' character (or whatever you want):

#PBS -q VisionLanguage
#PBS -l nodes=1:ppn=12
#PBS -l walltime=04:00:00

# Pass parameters via the job name:
export IFS="-"
i=1
for word in ${PBS_JOBNAME}; do
  echo $word
  arr[i]=$word
  ((i++))
done

# Stuff to execute
echo Jobname: ${arr[1]}
cd ~/workdir/
echo ${arr[2]} ${arr[3]}

qsub's -N parameter sets the job name, so submit with:

[iendres2 ~]$ qsub -N job-param1-param2 job_script

Output would be:

Jobname: job
param1 param2

Problem (revisited): I want to run the same job with multiple parameters

Where: param1 = {a, b, c} and param2 = {1, 2, 3}

Now loop! A wrapper script submits one job per parameter combination, reusing the job script from Problem 2:

#!/bin/bash

param1=({a,b,c})
param2=({1,2,3})   # or {1..3}
for p1 in ${param1[@]}; do
  for p2 in ${param2[@]}; do
    qsub -N job-${p1}-${p2} job_script
  done
done
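Saved under a name of your choosing (say submit_all.sh, a hypothetical name), this submits all nine combinations in one go:

[iendres2 ~]$ bash submit_all.sh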


Problem 3: My job isn’t multithreaded, but needs to run many times

Solution: Run 12 independent processes on the same node so 11 CPUs don't sit idle:

#PBS -q VisionLanguage
#PBS -l nodes=1:ppn=12
#PBS -l walltime=04:00:00

cd ~/workdir/
# Run 12 jobs in the background
for idx in {1..12}; do
  ./script ${idx} &   # Your job goes here (keep the ampersand)
  pid[idx]=$!         # Record the PID
done

# Wait for all the processes to finish
for idx in {1..12}; do
  echo waiting on ${pid[idx]}
  wait ${pid[idx]}
done


Matlab and The Cluster


Simple Matlab Sample

#PBS -q VisionLanguage
#PBS -l nodes=1:ppn=12
#PBS -l walltime=04:00:00

cd ~/workdir/
matlab -nodisplay -r "matlab_func(); exit;"


Matlab Sample: Passing Parameters

#PBS -q VisionLanguage
#PBS -l nodes=1:ppn=12
#PBS -l walltime=04:00:00

cd ~/workdir/
param=1
param2=\'string\'   # Escape string parameters
matlab -nodisplay -r "matlab_func(${param}, ${param2}); exit;"


A caveat on the simple sample above: running more than a few Matlab jobs (e.g. thinking about using the secondary queue)? You may use too many licenses - especially Distributed Computing Toolbox licenses (e.g. parfor). Compiling your code avoids this (see Compiling Matlab Code below).


Compiling Matlab Code

• Compiles matlab code into a standalone executable
• Doesn't use any matlab licenses once compiled
• Constraints:
– Code can't call addpath
– Functions called by eval, str2func, or other implicit methods must be explicitly identified
• e.g. for eval('do_this') to work, you must also include %#function do_this

To compile (within matlab):
>> addpath('everything that should be included')
>> mcc -m function_to_compile.m

isdeployed() is useful for modifying behavior for compiled applications
(returns true if the code is running the compiled version)


Running Compiled Matlab Code

• Requires the Matlab Compiler Runtime (MCR):
>> mcrinstaller   % This will point you to the installer and help install it
% Make a note of the installed path MCRPATH (e.g. …/mcr/v716/)

• Compiled code generates two files:
– function_to_compile and run_function_to_compile.sh

• To run:
– [iendres2 ~]$ ./run_function_to_compile.sh MCRPATH param1 param2 … paramk
– Params will be passed into the matlab function as usual, except they will always be strings
– Useful trick:

function function_to_compile(param1, param2, …, paramk)
if(isdeployed)
  param1 = str2num(param1);
  % param2 expects a string, so it is left as-is
  paramk = str2num(paramk);
end
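A job script that runs the compiled executable could then look like this sketch (the MCR path and the parameter values are placeholders for your own install and arguments):

#PBS -q VisionLanguage
#PBS -l nodes=1:ppn=12
#PBS -l walltime=04:00:00

cd ~/workdir/
# First argument is the MCR install path noted during mcrinstaller
./run_function_to_compile.sh /path/to/mcr/v716 1 some_string 3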


Parallel For Loops on the Cluster

• Not designed for multiple nodes on a shared filesystem:
– Race condition from concurrent writes to ~/.matlab/local_scheduler_data/

• Easy fix: redirect the directory to /scratch.local



1. Setup (done once, before submitting jobs):
[iendres2 ~]$ ln -sv /scratch.local/tmp/USER/matlab/local_scheduler_data ~/.matlab/local_scheduler_data
(Replace USER with your netid)


2. Wrap the matlabpool function to make sure the tmp data exists:

function matlabpool_robust(varargin)

if(matlabpool('size') > 0)
  matlabpool close
end

% make sure the directories exist and are empty for good measure
system('rm -rf /scratch.local/tmp/USER/matlab/local_scheduler_data');
system(sprintf('mkdir -p /scratch.local/tmp/USER/matlab/local_scheduler_data/R%s', version('-release')));

% Run it:
matlabpool(varargin{:});

Warning: /scratch.local may get filled up by other users, in which case this will fail.


Best Practices

• Interactive sessions
– Don't leave idle sessions open; it ties up the nodes

• Job arrays
– The scheduler still has kinks being worked out; I managed to kill the whole cluster with one

• Disk I/O
– Minimize I/O for best performance
– Avoid small reads and writes due to metadata overhead


Maintenance

• “Preventive maintenance (PM) on the cluster is generally scheduled on a monthly basis on the third Wednesday of each month from 8 a.m. to 8 p.m. Central Time. The cluster will be returned to service earlier if maintenance is completed before schedule.”


Resources

• Beginner's guide: https://campuscluster.illinois.edu/user_info/doc/beginner.html

• More comprehensive user's guide: http://campuscluster.illinois.edu/user_info/doc/index.html

• Cluster monitor: http://clustat.ncsa.illinois.edu/taub/

• Simple sample job scripts: /projects/consult/pbs/

• Forum: https://campuscluster.illinois.edu/forum/