If you have problems during the execution of MRCC, please attach the output with an adequate description of your case, as well as the following:
• the way mrcc was invoked
• the way build.mrcc was invoked
• the output of build.mrcc
• compiler version (for example: ifort -V, gfortran -v)
• blas/lapack versions
• gcc and glibc versions

This information really helps us during troubleshooting :)

# Running MRCC with MPI tasks on multiple compute nodes

1 year 9 months ago #1089 by ddatta
Hello,

I am trying to run CC calculations using a Slurm job script. The goal is to run the job on multiple nodes, e.g., using mpitasks=4 such that two MPI tasks run on one compute node and the other two run on another. In addition, each MPI task spawns a number of OpenMP threads. I guess the latter is simpler. I find that either the job hangs indefinitely or is unable to copy the input file.

Any help/suggestion will be much appreciated.

Thanks in advance and best regards,
Dipayan Datta

Here are some specifications used in the Slurm job script:

    #SBATCH --nodes=2

    export WRKDIR=$PWD
    export MRCCPATH=path-to-MRCC
    export INPUTDIR=$WRKDIR
    # Set scratch
    if [ -n "$TMPDIR" ]; then
      export SCR=$TMPDIR
    fi

    JOB=...
    cd $SCR
    cp $INPUTDIR/$JOB.inp MINP
    srun --output $INPUTDIR/$JOB.out $MRCCPATH/dmrcc

----

This script works when mpitasks is set to 1, i.e., the job is run on a single node with one mpitask and any number of OpenMP threads.

1 year 9 months ago #1090
Dear Dipayan,

You should run MRCC with
> $MRCCPATH/dmrcc > $INPUTDIR/$JOB.out

instead of srun, or
> mpirun -np 1 [additional options] $MRCCPATH/dmrcc_mpi > $INPUTDIR/$JOB.out

if you need [additional options] for mpirun.

Additionally, for SLURM you must use these settings:
#SBATCH --overcommit
because there will be mpitasks+2 processes running altogether, with mpitasks of them doing the work and the other 2 mostly waiting in the background.
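A minimal sketch of the end of a job script following this advice, assuming bash: dmrcc is started once as an ordinary process instead of through srun, and --overcommit leaves room for the extra processes. The default paths, the JOB name and the function name run_mrcc are hypothetical placeholders; mpitasks itself is an MRCC keyword set in MINP, not in the job script.

```shell
#!/bin/bash
#SBATCH --nodes=2
# --overcommit: MRCC runs mpitasks+2 processes in total (see above)
#SBATCH --overcommit

# Placeholder defaults (hypothetical); override them as needed.
MRCCPATH=${MRCCPATH:-/path/to/mrcc}
INPUTDIR=${INPUTDIR:-$PWD}
JOB=${JOB:-myjob}
SCR=${TMPDIR:-$PWD}

# The ( ) body runs in a subshell, so the caller's directory is unchanged.
run_mrcc() (
  cd "$SCR" || exit 1
  cp "$INPUTDIR/$JOB.inp" MINP || exit 1
  # Start the driver directly instead of through srun, as recommended above.
  "$MRCCPATH/dmrcc" > "$INPUTDIR/$JOB.out"
)

# In the real job script, finish with:
# run_mrcc
```

For the second variant, the last line of run_mrcc would become `mpirun -np 1 [additional options] "$MRCCPATH/dmrcc_mpi" > "$INPUTDIR/$JOB.out"`.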

www.mrcc.hu/index.php/forum/running-mrcc...penmp-and-slurm#1031
www.mrcc.hu/index.php/forum/running-mrcc...penmp-and-slurm#1032
Please also send us the following, according to the questions in #1032:
- MRCC version, compiler version, MPI version, SLURM version
- complete input, output and error messages for "I find that either the job hangs indefinitely or is unable to copy the input file."
- whether the MPI parallel version works in other scenarios (there are MPI test jobs; you should try mpitasks=2 on a single node with your SLURM script and from the command line)

For OpenMP threading you also have to set:
and we also recommend:
export OMP_PLACES=cores
(Here I assume each node has 16 physical cores and you want to use 8 for each task.
If the node has 32 physical cores, at least for some versions of SLURM we found that --cpus-per-task has to be the total number of cores occupied by all (here 2) tasks of a node, not just by one task.)
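With the numbers assumed in the parenthesis above (16 physical cores per node, 2 tasks per node), each task gets 16 / 2 = 8 OpenMP threads. A small helper for that arithmetic, as a sketch that assumes the physical cores of a node are split evenly between its MPI tasks; the function name is hypothetical:

```shell
# Hypothetical helper: OpenMP threads per MPI task, assuming the physical
# cores of a node are split evenly between the MPI tasks running on it.
threads_per_task() {
  local cores=$1 tasks=$2
  if [ "$tasks" -le 0 ] || [ "$cores" -lt "$tasks" ]; then
    echo "cannot split $cores cores between $tasks tasks" >&2
    return 1
  fi
  echo $(( cores / tasks ))
}

# Example for 16 physical cores and 2 tasks per node (values are assumptions):
# export OMP_NUM_THREADS=$(threads_per_task 16 2)   # 8
# export MKL_NUM_THREADS=$OMP_NUM_THREADS
# export OMP_PLACES=cores
```

If your Slurm version needs --cpus-per-task to be the per-node total (16 in the example), SLURM_CPUS_PER_TASK is no longer the per-task thread count, so computing OMP_NUM_THREADS explicitly like this avoids oversubscribing the cores.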

I hope this helps; let us know.
Best wishes,
Peter

1 year 9 months ago #1091 by ddatta
Thanks, Péter.

I managed to get MRCC running on a single compute node with mpitasks=2 using mpirun -np 1 ... etc., both from a Slurm script and from the command line.

I think the problem is with the connection between nodes. Following the discussions under

www.mrcc.hu/index.php/forum/running-mrcc...penmp-and-slurm#1031
www.mrcc.hu/index.php/forum/running-mrcc...penmp-and-slurm#1032

I tried changing the bootstrap server, but it did not solve the problem.

This problem arises only with MRCC, and only when using multiple compute nodes. Working on a single node with a number of OpenMP and MKL threads was fine. I have been using the specifications that you mentioned about the OpenMP thread pinning.

The MRCC version is the latest one, February 2020. I am using Intel compiler version 18.3 and the associated Intel MPI library. The SLURM version is 20.11.3.

I have also been using #SBATCH --overcommit.

Thanks,
Dipayan

1 year 9 months ago #1092
Dear Dipayan,

Sorry to hear that the issue remains.
To my understanding you cannot run a job on more than one node, right?
So 2 nodes with 2 MPI processes (1 process per node) does not work either?

Could you expand on what you meant by "the job is unable to copy the input file"?

Next round of ideas for you to try:

1) Please upload the full input and output files. Or is there absolutely no output or error message? That would be very strange.
You can also try to increase the verbosity level using:
> mpirun -np 1 -genv I_MPI_DEBUG 5 ...
Do you see any processes (dmrcc, minp, integ, scf...) start on any of the nodes allocated to your job by SLURM?

2) It should be possible to ssh between the nodes of your cluster, at least between the nodes allocated to the SLURM job. Is this correct?
Can you try the following setting?
I_MPI_HYDRA_BOOTSTRAP=ssh

3) Could you share the cluster documentation webpage? Do all the nodes share a common network file system, or does your job use different local directories on each node?

4) Did you try the binary version of MRCC? And you can try to update the Intel MPI library to Intel MPI 2019 Update 3.

5) What is the Libfabric version in your cluster? See page 26 of the manual on what we recommend.

6) You can also try to recompile MRCC with the OpenMPI library, but read the manual for a number of "Open MPI" comments first.
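Parts of checks 1) to 3) can be scripted. A sketch, assuming a bash job script on Linux and Slurm 20.11 or later (needed for `srun --overlap`, which lets an extra job step share the running job's resources). The function names, the SSH_CMD override (there only so the ssh check can be exercised without a cluster) and the lists of process and file-system names are hypothetical and not exhaustive; `scontrol show hostnames` and GNU `stat -f -c %T` are standard tools.

```shell
# 1) Keep only "host command" lines whose command is one of the MRCC
#    executables named above (plus the MPI driver dmrcc_mpi).
filter_mrcc_procs() {
  awk '$2 ~ /^(dmrcc|dmrcc_mpi|minp|integ|scf)$/'
}

# 2) Try a non-interactive ssh login to each host; BatchMode makes ssh
#    fail instead of prompting. SSH_CMD (hypothetical) defaults to ssh.
check_ssh() {
  local ssh_cmd=${SSH_CMD:-ssh} failed=0 host
  for host in "$@"; do
    if $ssh_cmd -o BatchMode=yes -o ConnectTimeout=5 "$host" true >/dev/null 2>&1; then
      echo "ok   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  return "$failed"
}

# 3) Classify a file-system type name as printed by GNU `stat -f -c %T`.
classify_fs_type() {
  case "$1" in
    nfs|nfs4|lustre|gpfs|beegfs|cifs|smb2) echo shared ;;
    ext2/ext3|xfs|btrfs|tmpfs|ramfs) echo local ;;
    *) echo unknown ;;
  esac
}

# File-system class of a directory on the current node.
fs_class() {
  classify_fs_type "$(stat -f -c %T "$1" 2>/dev/null)"
}

# Usage from inside the running job:
# srun --overlap --ntasks-per-node=1 \
#   sh -c 'ps -u "$USER" -o comm= | sed "s/^/$(hostname) /"' | filter_mrcc_procs
# check_ssh $(scontrol show hostnames "$SLURM_JOB_NODELIST")
# srun --overlap --ntasks-per-node=1 --label stat -f -c %T "$SCR"
```

If the last check reports node-local storage for TMPDIR (used as SCR in the job script above), the MINP copied on the first node is not visible on the second one, which may be related to the input-copy problem.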

I hope there will be some progress along any of the above lines.
Bests,
Peter

1 year 9 months ago #1093 by ddatta

#### File Attachment:

File Name: coronene.tar.gz
File Size: 9 KB

Hi Péter,

Thanks for your detailed and very helpful comments and suggestions. I used -genv I_MPI_DEBUG 5 as you suggested; this was very useful.

I am attaching an input file and two output files enclosed in a tar.gz attachment. Both calculations used mpitasks=2, OMP_NUM_THREADS=MKL_NUM_THREADS=16 per MPI rank (our cluster has 36 physical cores per node and one hardware thread associated with each core).

One of the calculations was run on a single node with both MPI tasks on the same node. This calculation completed successfully. The other calculation intended to use 2 nodes with 1 MPI rank per node. I guess the output file names are suggestive of how these calculations were performed. The second calculation did not proceed beyond RI-MP2. I used

I_MPI_HYDRA_BOOTSTRAP=slurm

This is the only value that allows at least the SCF calculation to complete. Using I_MPI_HYDRA_BOOTSTRAP=ssh makes the program hang indefinitely before the beginning of SCF iterations.

With I_MPI_HYDRA_BOOTSTRAP=slurm, the program is able to map the two MPI ranks on two different compute nodes. I allowed 10 hours of run time. The calculation did not proceed beyond RI-MP2 and then eventually the allowed CPU time was over, and the job was terminated.

Thank you very much for your help.

Best regards,
Dipayan

1 year 9 months ago - 1 year 9 months ago #1094
Dear Dipayan,

Does this job work with 2 nodes and 2 MPI tasks without using SLURM?
For that you should specify the -hosts option with
> mpirun -np 1 $MRCCPATH/dmrcc_mpi

There might be incompatible settings for the process manager and the PMI library. You can:

7) try mpirun after unsetting SLURM's PMI library, if it is set by default in your case, via:
unset I_MPI_PMI_LIBRARY
In this option you might also need to set:
I_MPI_HYDRA_BOOTSTRAP=slurm
I_MPI_HYDRA_BOOTSTRAP_EXEC=srun

8) try to make it work with srun:
> srun -n 1 --mpi=pmi? $MRCCPATH/dmrcc_mpi

with ?=2 or x or whatever is available to you.
In this option I_MPI_PMI_LIBRARY should point to SLURM's PMI library.
If only this works, you might need to tweak I_MPI_FABRICS and other settings for optimal performance.
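Options 7) and 8) differ in which program acts as process manager for dmrcc_mpi and which PMI library Intel MPI uses. A sketch of the two environment setups as bash functions; the function names, the SLURM_PMI_LIB variable and its default libpmi2.so path are hypothetical placeholders (your administrators can tell you where Slurm's PMI library is installed, and `srun --mpi=list` prints the PMI types your Slurm supports):

```shell
# Option 7: mpirun (Intel MPI Hydra) as process manager, bootstrapped via Slurm.
use_mpirun_hydra() {
  unset I_MPI_PMI_LIBRARY              # do not use Slurm's PMI library
  export I_MPI_HYDRA_BOOTSTRAP=slurm
  export I_MPI_HYDRA_BOOTSTRAP_EXEC=srun
}

# Option 8: srun as process manager; Intel MPI must load Slurm's PMI library.
# SLURM_PMI_LIB and its default path are hypothetical placeholders.
use_srun_pmi() {
  export I_MPI_PMI_LIBRARY=${SLURM_PMI_LIB:-/usr/lib64/libpmi2.so}
}

# Usage (pick one):
# use_mpirun_hydra; mpirun -np 1 "$MRCCPATH/dmrcc_mpi" > "$INPUTDIR/$JOB.out"
# use_srun_pmi;     srun -n 1 --mpi=pmi2 "$MRCCPATH/dmrcc_mpi" > "$INPUTDIR/$JOB.out"
```

Keeping the two setups in separate functions makes it harder to mix settings from both options in one job, which is the kind of process manager/PMI mismatch suspected above.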

mpirun -np 1 -genv I_MPI_DEBUG 5 -genv I_MPI_HYDRA_ENV all -genv I_MPI_HYDRA_DEBUG 1 -verbose dmrcc_mpi

Note:
If this is a production job, I would recommend using the full resources of your nodes, i.e.