Using the COMSOL version 4.3 Cluster Computing Feature at the UB CCR
About This Document
This document provides a step-by-step tutorial for using the “Cluster Computing” feature of COMSOL
Multi-physics on the UB CCR “rush” cluster.
Why would I want to run the cluster computing feature?
The cluster computing feature of COMSOL allows users to run a model in parallel from within the user-
friendly COMSOL GUI. This allows for users to monitor the progress of COMSOL as it goes about solving
a computationally intensive or memory intensive model. This might be desirable if one were in the
process of building a complex model and wanted to make sure the model development was proceeding
successfully. A user can run an intermediate version of their model (e.g. after adding a new multi-physics
option) and check to see if the new addition looks to be working ok. If problems are observed, the user
can halt the model via the GUI and make changes as needed.
Why wouldn’t I want to run the cluster computing feature?
The cluster computing feature isn’t appropriate for “production”-style model runs. These are model runs
or parameter sweeps for which the user is already very confident in the model that they have
developed. Basically, if you are finished with model development and just need to run a complex model
or parameter sweep, then the “Cluster Computing” option is not appropriate. Instead, you should run
COMSOL in batch mode via submitting a slurm job. Example slurm scripts are available off the front-end.
The choice of script depends on which COMSOL license you are using, as given below in Table 1:
Table 1: COMSOL Group Information

COMSOL License    Example SLURM Script                        Module Name
natashal group    /util/slurm-scripts/slurmCOMSOL-natashal    comsol/4.3
comsol group      /util/slurm-scripts/slurmCOMSOL-ub          comsol-ub/4.3a
jmjornet group    /util/slurm-scripts/slurmCOMSOL-jmjornet    comsol-jmjornet/4.3b
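If you are unsure which module name applies to your account, the installed COMSOL modules can be listed from the front-end before copying and submitting one of the example scripts. The sketch below uses standard Environment Modules and SLURM commands; the copied filename is a placeholder:

```shell
# List the COMSOL modules installed on the cluster:
module avail comsol

# Inspect one of the example SLURM scripts before using it:
cat /util/slurm-scripts/slurmCOMSOL-ub

# Copy it to your own directory, edit it for your model, then submit it
# for a "production"-style batch run:
cp /util/slurm-scripts/slurmCOMSOL-ub ~/my-comsol-job.sh
sbatch ~/my-comsol-job.sh
```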
What are some limitations of using the Cluster Computing option at CCR?
(1) UB CCR uses a front-end firewall to prevent its compute nodes from being directly exposed to the
outside world. As a result the COMSOL “Cluster Computing” option can only be used via running the
COMSOL GUI from the front-end, from one of the CCR compute nodes, or from the remote visualization
node. If the GUI is run from an external machine (e.g. a user’s personal laptop or desktop PC) it will not
be able to communicate with the compute nodes and the “Cluster Computing” feature will not work.
(2) An interactive SLURM job for launching the COMSOL server must be up and running prior to
launching the COMSOL GUI. This might mean that users will have to wait for some time before being
able to get to work, since the interactive job might be queued by the CCR resource manager depending
on the system load at the time the job is submitted. One way to avoid long waits is to submit the
interactive job to the debug partition. However, this will limit the number of nodes that can be
requested and will also limit the walltime for the “Cluster Computing” job to 1 hour.
How do I use the COMSOL “Cluster Computing” feature at UB CCR?
There are several steps involved. First, users should become familiar with logging into the UB CCR front-
end machine (rush.ccr.buffalo.edu). Training is available at: http://ccr.buffalo.edu/support/UserGuide.html.
Alternatively, users may wish to launch COMSOL from the remote visualization (viz) node. Instructions
for accessing the node are given here: http://ccr.buffalo.edu/support/research_facilities/remote-visualization.html.
The rest of this guide assumes users are familiar with connecting to the CCR front-end or viz node and
are able to navigate a Linux command line interface (e.g. via commands like cd, ls, pwd, cat, etc.)
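Because the COMSOL GUI is an X11 application, the connection to the front-end must forward X traffic for the GUI to display on your screen. A minimal sketch, assuming an X server is running on your local machine; "myusername" is a placeholder for your UB CCR username:

```shell
# Connect to the CCR front-end with X11 forwarding enabled:
ssh -X myusername@rush.ccr.buffalo.edu

# Once logged in, a few basic commands for getting oriented:
pwd    # print the current working directory
ls     # list files in that directory
```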
Step 1 – Launch a COMSOL server on each compute node
Compute nodes are the machines that the COMSOL GUI will use to solve the model via
parallel processing. We need to request the desired number and type of compute nodes from the UB
CCR resource manager. Then we need to launch a comsol server on these nodes so that they can
interface with the COMSOL GUI.
Open a new ssh connection to the CCR front-end or launch a terminal from within the desktop of the
remote visualization node. From the command prompt, submit a request for an interactive job using the
“fisbatch” command. For example:
$ fisbatch --partition=debug --time=01:00:00 --nodes=2 --ntasks-per-node=12 --mem=48000
This will request two compute nodes from the debug partition with 12 processors per node and 48GB of
RAM per node. A total of 24 processors will be used in solving the model. Information on the type of
nodes available (including partitions, processor counts, amount of memory, and SLURM constraints) is
available via the “snodes” command. Type “snodes --help” at the command prompt for usage
information.
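Alongside snodes, the standard SLURM query commands can help in choosing a partition and sizing the fisbatch request; a sketch:

```shell
# CCR-specific summary of available nodes and their properties:
snodes

# Standard SLURM views of the same information:
sinfo --partition=debug    # node states in the debug partition
squeue -u "$USER"          # your own queued and running jobs
```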
After entering the fisbatch command you will have to wait a bit (or possibly longer) for the scheduler to
process your request. Once the desired nodes become available the scheduler will automatically log you
into one of the compute nodes (this is known as the “head” node) and you’ll be provided with a
command line prompt. From this prompt, you’ll launch the COMSOL server software on each of the
requested nodes. Do this by entering the following sequence of commands (replace
your_comsol_module with the appropriate module for your group, see Table 1):
$ cd $SLURM_SUBMIT_DIR
$ srun hostname | sort | uniq > nodes.comsol
$ module load your_comsol_module
$ comsol -nn 2 -np 12 -f nodes.comsol server
Be sure to match the value of the -nn (number of nodes) argument with the actual number of nodes
requested by the previous fisbatch command. Also be sure to match the -np (number of processors per
node) argument with the actual number of tasks per node (--ntasks-per-node) requested by the previous
fisbatch command. In this example, these values are “2” and “12”, respectively. When you first run the
server you may be prompted for a username and password. If this happens, enter your UB CCR
username and password. Now the comsol server should launch and you should see output similar to the
following appear in the terminal:
Node 0 is running on host: k16n13a.ccr.buffalo.edu
Node 0 has address: k16n13a.ccr.buffalo.edu
Node 1 is running on host: k16n12b.ccr.buffalo.edu
Node 1 has address: k16n12b.ccr.buffalo.edu
COMSOL 4.3 (Build: 151) started listening on port 2036
Use the console command 'close' to exit the application
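One way to avoid a mismatch between the fisbatch request and the -nn/-np arguments is to let SLURM's environment variables supply the values. A sketch, assuming the commands are entered at the interactive job prompt, where SLURM sets these variables automatically:

```shell
cd "$SLURM_SUBMIT_DIR"
srun hostname | sort | uniq > nodes.comsol
module load your_comsol_module

# SLURM_NNODES and SLURM_NTASKS_PER_NODE mirror the --nodes and
# --ntasks-per-node values given to fisbatch, so -nn and -np always agree
# with what was actually granted:
comsol -nn "$SLURM_NNODES" -np "$SLURM_NTASKS_PER_NODE" -f nodes.comsol server
```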
You may now minimize (but do not close) the comsol server terminal window. In the next step we will
run the COMSOL client GUI and connect it to these compute nodes.
Step 2 – Launch the COMSOL GUI via comsol client
Open a second connection to the UB CCR front-end machine, or open a second terminal in the remote
visualization desktop. In the resulting terminal window, enter the following commands to launch the
COMSOL client GUI from the front-end. Replace your_comsol_module with the version that is
appropriate for your group (see Table 1, above):
$ module load your_comsol_module
$ comsol client
The COMSOL GUI splash screen will appear, followed by a dialog box that prompts for information about
the server node. An example is given below:
In the server text box, enter the name of the “head” compute node. This corresponds to the name of
“Node 0” in the comsol server output (see above). For this example, the head node is
“k16n13a.ccr.buffalo.edu”. For the port number text box, enter the port number that the server is
listening on. This is provided in the output from comsol server command (see above) and in this example
the value is “2036”. Enter your UB CCR username and password and click the ok button. Now the
COMSOL GUI will load and will connect to the compute nodes “behind-the-scenes”.
Step 3 – Open a model and add “Cluster Computing” study option
The next step is to open your COMSOL model and add a “Cluster Computing” option to your desired
study. If you have initially developed your COMSOL model on a laptop, desktop PC or workstation PC
you will need to transfer the corresponding .mph file over to the CCR storage area. For example, this can
be done using FileZilla. Training for this is available at: http://ccr.buffalo.edu/support/UserGuide.html
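As a command-line alternative to FileZilla, scp can copy the model file from your local machine to CCR storage. A sketch; the filename, username, and destination directory are placeholders:

```shell
# Run this on your laptop or desktop PC, not on the CCR front-end:
scp my_model.mph myusername@rush.ccr.buffalo.edu:/projects/mygroup/myusername/comsol/
```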
This example uses the buoyancy_free model located at: /projects/ccrstaff/lsmatott/comsol/buoy. In the
COMSOL GUI click on the “Show” icon of the “Model Builder”. It is circled in red in the figure below:
After clicking the “Show” icon in the “Model Builder”, a drop-down list will appear. In this list, make sure the
“Advanced Study Options” box is checked. It is circled in red in the figure below:
In the “Model Builder” area, highlight the name of the study that you’d like to run in parallel. Then,
right-click and select “Cluster Computing” from the resulting drop-down list. This will add a “Cluster
Computing” node to the selected study, as shown on the following page.
Click on the newly added “Cluster Computing” node. This will open the “Cluster Computing” tab to the right of the “Model Builder” area, as shown below:
In the “Batch Settings” area of the “Cluster Computing” tab, do the following:
(1) Select “General” from the drop-down list of Cluster types
(2) Uncheck the “MPD is running” box
(3) In the “Host file:” text box type the full path to the location of the nodes.comsol file that was
created in Step 1 (see above)
(4) Leave the “Bootstrap server” textbox blank
(5) In the “Rsh” textbox, type “/usr/bin/ssh”
(6) In the “Number of nodes” textbox, enter the number of compute nodes requested in Step 1 (see
above). For this example, the value is “2”.
(7) In the “Filename” box, enter the full path to the .mph model file that you have opened. In this
example, the value is: /projects/ccrstaff/lsmatott/comsol/buoy/buoyancy_free.mph
(8) In the “Directory” box, enter the full path to the directory where the .mph model file is located.
In this example, the value is: /projects/ccrstaff/lsmatott/comsol/buoy
(9) Uncheck the “Specify external COMSOL batch directory path” box
(10) Uncheck the “Specify external COMSOL installation directory path” box
(11) Uncheck the “Use batch license” box
When all fields are filled out correctly, click the “Save” button. It is circled in red in the figure below. For
this example, the completed “Cluster Computing” configuration tab is given on the following page.
Step 4 – Run the model and monitor progress
Now that the model has been configured to use the cluster compute nodes, you can run the model by
clicking on the usual “Compute” icon for the selected study. Alternatively, you can press the F8 key.
However, now the COMSOL solver will run on the compute nodes instead of on the client node that is
displaying the GUI! You can monitor the progress of the cluster computation in the same ways that you
would monitor a non-cluster computation. For example, you can click on the “Progress” tab in the lower
area of the GUI below the “Graphics” area. This is shown below for the buoyancy_free example.
Step 4a (optional) – Monitor compute nodes using ccrusrviz
UB CCR provides a job visualization tool that can be used to monitor the parallel performance of the
COMSOL “Cluster Computing” solver. To launch this tool, open a new connection to the CCR front-end
and type the following command in the terminal:
$ /util/ccrjobvis/ccrusrviz
If you are running other jobs besides a COMSOL cluster computing job, you should use the following
command instead:
$ /util/ccrjobvis/slurmjobvis <job_id>
where <job_id> is the job number returned by the fisbatch command issued in Step 1.
These commands launch a GUI that monitors CPU, memory, and network utilization on the compute
nodes assigned to the COMSOL “Cluster Computing” feature. This tool should be launched prior to
running COMSOL “compute” for a given study. For the buoyancy_free example, the job visualization
graphic should look something like the figure given below once COMSOL “compute” is launched for the
study:
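If the job number from Step 1 was not noted down, it can be recovered with squeue before launching slurmjobvis. A sketch; the job id shown is a placeholder:

```shell
# Find the job id of the interactive COMSOL job:
squeue -u "$USER"

# Launch the per-job monitor using that id (e.g. 123456):
/util/ccrjobvis/slurmjobvis 123456
```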