
Using COMSOL Cluster Computing at CCR R2


Using the COMSOL version 4.3 Cluster Computing Feature at the UB CCR

About This Document

This document provides a step-by-step tutorial for using the “Cluster Computing” feature of COMSOL Multiphysics on the UB CCR “rush” cluster.

Why would I want to use the Cluster Computing feature?

The Cluster Computing feature of COMSOL allows users to run a model in parallel from within the user-friendly COMSOL GUI. Users can therefore monitor the progress of COMSOL as it solves a computationally intensive or memory-intensive model. This is useful while building a complex model, because it lets you confirm that model development is proceeding successfully. For example, a user can run an intermediate version of the model (e.g. after adding a new multiphysics option) and check whether the new addition appears to be working. If problems are observed, the user can halt the model via the GUI and make changes as needed.

Why wouldn’t I want to use the Cluster Computing feature?

The Cluster Computing feature isn’t appropriate for “production”-style model runs, i.e. model runs or parameter sweeps for which the user is already confident in the model they have developed. If you have finished model development and just need to run a complex model or parameter sweep, do not use the “Cluster Computing” option. Instead, run COMSOL in batch mode by submitting a SLURM job. Example SLURM scripts are available on the front-end. The choice of script depends on which COMSOL license you are using, as given below in Table 1:

Table 1: COMSOL Group Information

| COMSOL License | Example SLURM Script | Module Name |
|---|---|---|
| natashal group | /util/slurm-scripts/slurmCOMSOL-natashal | comsol/4.3 |
| comsol group | /util/slurm-scripts/slurmCOMSOL-ub | comsol-ub/4.3a |
| jmjornet group | /util/slurm-scripts/slurmCOMSOL-jmjornet | comsol-jmjornet/4.3b |
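If you script your runs, the Table 1 lookup can be captured in a small bash function. This is a minimal sketch: the function names comsol_module_for_group and comsol_script_for_group are hypothetical, and the values are copied from Table 1, so they will go stale if CCR changes its modules or scripts.

```shell
# Hypothetical helpers: map a COMSOL license group from Table 1 to its
# module name and example SLURM script. Values copied from Table 1.
comsol_module_for_group() {
  case $1 in
    natashal) echo "comsol/4.3" ;;
    comsol)   echo "comsol-ub/4.3a" ;;
    jmjornet) echo "comsol-jmjornet/4.3b" ;;
    *) echo "unknown COMSOL group: $1" >&2; return 1 ;;
  esac
}

comsol_script_for_group() {
  case $1 in
    natashal) echo "/util/slurm-scripts/slurmCOMSOL-natashal" ;;
    comsol)   echo "/util/slurm-scripts/slurmCOMSOL-ub" ;;
    jmjornet) echo "/util/slurm-scripts/slurmCOMSOL-jmjornet" ;;
    *) echo "unknown COMSOL group: $1" >&2; return 1 ;;
  esac
}
```

For example, “module load "$(comsol_module_for_group comsol)"” would load comsol-ub/4.3a, and an unknown group name returns an error instead of silently loading the wrong module.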

What are some limitations of using the Cluster Computing option at CCR?

(1) UB CCR uses a front-end firewall to prevent its compute nodes from being directly exposed to the outside world. As a result, the COMSOL “Cluster Computing” option can only be used when the COMSOL GUI is run from the front-end, from one of the CCR compute nodes, or from the remote visualization node. If the GUI is run from an external machine (e.g. a user’s personal laptop or desktop PC), it will not be able to communicate with the compute nodes and the “Cluster Computing” feature will not work.

(2) An interactive SLURM job for launching the COMSOL server must be up and running before the COMSOL GUI is launched. Users may therefore have to wait before they can get to work, since the CCR resource manager may queue the interactive job depending on the system load at the time it is submitted. One way to avoid long waits is to submit the interactive job to the debug partition. However, this limits the number of nodes that can be requested and also limits the walltime of the “Cluster Computing” job to 1 hour.


How do I use the COMSOL “Cluster Computing” feature at UB CCR?

There are several steps involved. First, users should become familiar with logging into the UB CCR front-end machine (rush.ccr.buffalo.edu). Training is available at: http://ccr.buffalo.edu/support/UserGuide.html.

Alternatively, users may wish to launch COMSOL from the remote visualization (viz) node. Instructions for accessing the node are given here: http://ccr.buffalo.edu/support/research_facilities/remote-visualization.html.

The rest of this guide assumes users are familiar with connecting to the CCR front-end or viz node and are able to navigate a Linux command line interface (e.g. via commands like cd, ls, pwd, cat, etc.).

Step 1 – Launch a COMSOL server on each compute node

Compute nodes are the machines whose processors the COMSOL GUI will use to solve the model in parallel. We need to request the desired number and type of compute nodes from the UB CCR resource manager, and then launch a COMSOL server on these nodes so that they can interface with the COMSOL GUI.

Open a new ssh connection to the CCR front-end or launch a terminal from within the desktop of the remote visualization node. From the command prompt, submit a request for an interactive job using the “fisbatch” command. For example:

$ fisbatch --partition=debug --time=01:00:00 --nodes=2 --ntasks-per-node=12 --mem=48000

This command requests two compute nodes from the debug partition, with 12 processors (tasks) per node and 48 GB of RAM per node, so a total of 24 processors will be used in solving the model. Information on the types of nodes available (including partitions, processor counts, amount of memory, and SLURM constraints) is available via the “snodes” command. Type “snodes --help” at the command prompt for usage information.
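The request above can also be generated by a small bash function that refuses debug-partition walltimes over the 1-hour limit mentioned earlier. This is a hedged sketch: fisbatch_cmd is a hypothetical helper that only prints the fisbatch line (it does not submit anything), and it checks only the walltime, not node or memory limits.

```shell
# Hypothetical helper: print a fisbatch request, rejecting walltimes over
# 01:00:00 on the debug partition (the limit described in this guide).
# Arguments: partition walltime(HH:MM:SS) nodes tasks_per_node mem_per_node_MB
fisbatch_cmd() {
  local partition=$1 walltime=$2 nodes=$3 tasks=$4 mem_mb=$5
  local h m s secs
  if [[ ! $walltime =~ ^[0-9]+:[0-9]{2}:[0-9]{2}$ ]]; then
    echo "walltime must look like HH:MM:SS" >&2
    return 1
  fi
  IFS=: read -r h m s <<< "$walltime"
  # 10# forces base 10 so that values such as "08" are not read as octal.
  secs=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
  if [[ $partition == debug && $secs -gt 3600 ]]; then
    echo "the debug partition is limited to 01:00:00 of walltime" >&2
    return 1
  fi
  echo "fisbatch --partition=$partition --time=$walltime --nodes=$nodes --ntasks-per-node=$tasks --mem=$mem_mb"
}
```

Running “fisbatch_cmd debug 01:00:00 2 12 48000” prints the same request shown above, which you can review and then paste at the prompt; asking for 02:00:00 on debug prints an error instead.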

After entering the fisbatch command, you may have to wait a while for the scheduler to process your request. Once the desired nodes become available, the scheduler will automatically log you into one of the compute nodes (this is known as the “head” node) and provide you with a command line prompt. From this prompt, launch the COMSOL server software on each of the requested nodes by entering the following sequence of commands (replace your_comsol_module with the appropriate module for your group, see Table 1):

$ cd $SLURM_SUBMIT_DIR

$ srun hostname | sort | uniq > nodes.comsol

$ module load your_comsol_module

$ comsol -nn 2 -np 12 -f nodes.comsol server

Be sure to match the value of the -nn (number of nodes) argument with the actual number of nodes requested by the previous fisbatch command. Also be sure to match the -np (number of processors per node) argument with the actual number of tasks per node (ntasks-per-node) requested by the previous fisbatch command. In this example, these values are “2” and “12”, respectively. When you first run the server, you may be prompted for a username and password. If this happens, enter your UB CCR username and password. The COMSOL server should then launch, and you should see output similar to the following appear in the terminal:

Node 0 is running on host: k16n13a.ccr.buffalo.edu

Node 0 has address: k16n13a.ccr.buffalo.edu

Node 1 is running on host: k16n12b.ccr.buffalo.edu

Node 1 has address: k16n12b.ccr.buffalo.edu

COMSOL 4.3 (Build: 151) started listening on port 2036

Use the console command 'close' to exit the application

You may now minimize (but do not close) the COMSOL server terminal window. In the next step we will run the COMSOL client GUI and connect it to these compute nodes.
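To keep -nn and -np consistent with the fisbatch request automatically, the server command can be built from SLURM’s environment variables inside the interactive session. This is a sketch under assumptions: it assumes fisbatch exports SLURM_JOB_NUM_NODES (or the older SLURM_NNODES) and SLURM_NTASKS_PER_NODE into the head-node shell, as SLURM normally does for allocations requested with --nodes and --ntasks-per-node; comsol_server_cmd is a hypothetical name.

```shell
# Hypothetical helper: print the comsol server command with -nn and -np taken
# from the SLURM allocation. Optional argument: host file (default nodes.comsol).
comsol_server_cmd() {
  local hostfile=${1:-nodes.comsol}
  local nn=${SLURM_JOB_NUM_NODES:-${SLURM_NNODES:-}}
  local np=${SLURM_NTASKS_PER_NODE:-}
  if [[ -z $nn || -z $np ]]; then
    echo "SLURM node/task variables not set; run this inside the fisbatch session" >&2
    return 1
  fi
  echo "comsol -nn $nn -np $np -f $hostfile server"
}
```

On the head node, after the cd, srun and module load commands from Step 1, “eval "$(comsol_server_cmd)"” would start the server with values that match the allocation; outside an allocation it reports an error rather than guessing.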

Step 2 – Launch the COMSOL GUI via comsol client

Open a second connection to the UB CCR front-end machine, or open a second terminal in the remote visualization desktop. In the resulting terminal window, enter the following commands to launch the COMSOL client GUI. Replace your_comsol_module with the module that is appropriate for your group (see Table 1, above):

$ module load your_comsol_module

$ comsol client

The COMSOL GUI splash screen will appear, followed by a dialog box that prompts for information about the server node. An example is given below:

In the server text box, enter the name of the “head” compute node. This corresponds to the name of “Node 0” in the comsol server output (see above). For this example, the head node is “k16n13a.ccr.buffalo.edu”. In the port number text box, enter the port number that the server is listening on. This is provided in the output from the comsol server command (see above); in this example the value is “2036”. Enter your UB CCR username and password and click the OK button. The COMSOL GUI will then load and connect to the compute nodes “behind the scenes”.
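If you save the comsol server output to a text file (for example by copying it from the terminal), the head node and port needed for this dialog can be extracted with sed. A minimal sketch, assuming the output lines look like the Step 1 example above; comsol_connection_info and the log file name comsol-server.log are hypothetical.

```shell
# Hypothetical helper: read saved comsol server output and print the values
# for the "server" and "port number" boxes of the client dialog.
comsol_connection_info() {
  local log=$1 host port
  # The head node is the host reported for "Node 0".
  host=$(sed -n 's/^Node 0 is running on host: *//p' "$log" | head -n 1)
  # The port comes from the "started listening on port NNNN" line.
  port=$(sed -n 's/.*started listening on port \([0-9][0-9]*\).*/\1/p' "$log" | head -n 1)
  if [[ -z $host || -z $port ]]; then
    echo "no head node/port found in $log" >&2
    return 1
  fi
  echo "server: $host"
  echo "port: $port"
}
```

With the Step 1 output saved as comsol-server.log, “comsol_connection_info comsol-server.log” should print k16n13a.ccr.buffalo.edu as the server and 2036 as the port.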


Step 3 – Open a model and add “Cluster Computing” study option

The next step is to open your COMSOL model and add a “Cluster Computing” option to your desired study. If you initially developed your COMSOL model on a laptop, desktop PC or workstation PC, you will need to transfer the corresponding .mph file to the CCR storage area, for example using FileZilla. Training for this is available at: http://ccr.buffalo.edu/support/UserGuide.html
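From a Linux or macOS machine, scp is a command line alternative to FileZilla. The sketch below only prints the command; ccr_upload_cmd is a hypothetical helper, it is meant to be run on your own machine, and it assumes paths without spaces. The destination directory is whatever CCR storage area you intend to use.

```shell
# Hypothetical helper: print an scp command that copies a local .mph file to a
# directory on the CCR front-end. Arguments: file dest_dir [ccr_username].
ccr_upload_cmd() {
  local file=$1 dest_dir=$2 user=${3:-$USER}
  if [[ $file != *.mph ]]; then
    echo "expected a .mph model file, got: $file" >&2
    return 1
  fi
  # Strip a trailing slash so the destination always ends in exactly one "/".
  echo "scp $file $user@rush.ccr.buffalo.edu:${dest_dir%/}/"
}
```

For example, “ccr_upload_cmd buoyancy_free.mph /projects/your_group/comsol your_username” prints an scp line that copies the model into that (placeholder) directory; running the printed line prompts for your CCR password.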

This example uses the buoyancy_free model located at /projects/ccrstaff/lsmatott/comsol/buoy. In the COMSOL GUI, click on the “Show” icon of the “Model Builder”. It is circled in red in the figure below:

After you click the “Show” icon of the “Model Builder”, a drop-down list will appear. In this list, make sure the “Advanced Study Options” box is checked. It is circled in red in the figure below:

In the “Model Builder” area, highlight the name of the study that you’d like to run in parallel. Then right-click and select “Cluster Computing” from the resulting drop-down list. This will add a “Cluster Computing” node to the selected study, as shown below.

Figure: the study after the “Cluster Computing” node is added.

Click on the newly added “Cluster Computing” node. This will open the “Cluster Computing” tab to the right of the “Model Builder” area, as shown below:

In the “Batch Settings” area of the “Cluster Computing” tab, do the following:

(1) Select “General” from the drop-down list of cluster types

(2) Uncheck the “MPD is running” box

(3) In the “Host file” text box, type the full path to the nodes.comsol file that was created in Step 1 (see above)

(4) Leave the “Bootstrap server” text box blank

(5) In the “Rsh” text box, type “/usr/bin/ssh”

(6) In the “Number of nodes” text box, enter the number of compute nodes requested in Step 1 (see above). For this example, the value is “2”.

(7) In the “Filename” box, enter the full path to the .mph model file that you have opened. In this example, the value is: /projects/ccrstaff/lsmatott/comsol/buoy/buoyancy_free.mph

(8) In the “Directory” box, enter the full path to the directory where the .mph model file is located. In this example, the value is: /projects/ccrstaff/lsmatott/comsol/buoy

(9) Uncheck the “Specify external COMSOL batch directory path” box

(10) Uncheck the “Specify external COMSOL installation directory path” box

(11) Uncheck the “Use batch license” box

When all fields are filled out correctly, click the “Save” button. It is circled in red in the figure below. For this example, the completed “Cluster Computing” configuration tab is shown in the figure that follows it.
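Typing the three paths by hand is error-prone, so a small bash function can print absolute values for the “Host file”, “Filename” and “Directory” boxes. This is a sketch: cluster_computing_fields and abs_path are hypothetical helpers, they should be run on the front-end, and they only check that the files exist, not that nodes.comsol belongs to your current job.

```shell
# Hypothetical helpers: print absolute paths for the Batch Settings fields.
abs_path() {
  # The cd runs inside a command substitution (a subshell), so the
  # caller's working directory is unchanged.
  echo "$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
}

# Arguments: host_file model_file (relative or absolute).
cluster_computing_fields() {
  local hostfile=$1 model=$2 model_abs
  if [[ ! -f $hostfile || ! -f $model ]]; then
    echo "host file or model file not found" >&2
    return 1
  fi
  model_abs=$(abs_path "$model")
  echo "Host file: $(abs_path "$hostfile")"
  echo "Filename:  $model_abs"
  echo "Directory: $(dirname "$model_abs")"
}
```

In the buoyancy_free example, running “cluster_computing_fields nodes.comsol /projects/ccrstaff/lsmatott/comsol/buoy/buoyancy_free.mph” from the job directory should print the Filename and Directory values listed in items (7) and (8), plus the full path of nodes.comsol for item (3).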

Step 4 – Run the model and monitor progress

Now that the model has been configured to use the cluster compute nodes, you can run the model by clicking on the usual “Compute” icon for the selected study, or by pressing the F8 key. This time, however, the COMSOL solver will run on the compute nodes instead of on the client node that is displaying the GUI. You can monitor the progress of the cluster computation in the same ways that you would monitor a non-cluster computation. For example, you can click on the “Progress” tab in the lower area of the GUI, below the “Graphics” area. This is shown below for the buoyancy_free example.


Step 4a (optional) – Monitor compute nodes using ccrusrviz

UB CCR provides a job visualization tool that can be used to monitor the parallel performance of the COMSOL “Cluster Computing” solver. To launch this tool, open a new connection to the CCR front-end and type the following command in the terminal:

$ /util/ccrjobvis/ccrusrviz

If you are running other jobs besides a COMSOL cluster computing job, use the following command instead:

$ /util/ccrjobvis/slurmjobvis <job_id>

where <job_id> is the job number returned by the fisbatch command issued in Step 1.

These commands launch a GUI that monitors CPU, memory and network utilization on the compute nodes assigned to the COMSOL “Cluster Computing” feature. Launch this tool before running COMSOL “compute” for a given study. For the buoyancy_free example, the job visualization graphic should look something like the figure below once COMSOL “compute” is launched for the study.
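The slurmjobvis command needs the job number. Inside the fisbatch session SLURM sets SLURM_JOB_ID, so “echo $SLURM_JOB_ID” on the head node shows it; from the front-end, “squeue -u $USER” lists your jobs and their ids. The helper below is a hypothetical sketch that prints the slurmjobvis command for an explicit job id, or for $SLURM_JOB_ID when no argument is given.

```shell
# Hypothetical helper: print the slurmjobvis command for a numeric job id.
# Falls back to $SLURM_JOB_ID (set by SLURM inside the interactive job).
jobvis_cmd() {
  local job_id=${1:-${SLURM_JOB_ID:-}}
  case $job_id in
    '' | *[!0-9]*)
      echo "usage: jobvis_cmd <numeric job_id>" >&2
      return 1
      ;;
  esac
  echo "/util/ccrjobvis/slurmjobvis $job_id"
}
```

Rejecting empty or non-numeric ids means a missing job number produces a usage message instead of launching the tool with a bad argument.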