# 9 Running Mrcc

Please be sure that the directory where the Mrcc executables are located is included in your PATH environment variable. Note that the package includes several executables, and all of them must be copied to the aforementioned directory, not only the driver program dmrcc. Please also check your LD_LIBRARY_PATH environment variable, which must include the directories containing the libraries linked with the program. This variable is usually set before the installation, and you should not change it by removing the names of the corresponding directories. Please do not forget to copy the input file MINP (see Sect. 11) to the directory where the program is invoked.
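The checks above can be sketched as a small shell function. This is only an illustration: the function name check_mrcc_env is hypothetical and not part of the Mrcc package, and it tests only for the driver dmrcc and the MINP file, not for every executable or library.

```shell
# Hypothetical helper (not part of Mrcc): returns 0 only if the dmrcc driver
# is found in PATH and a MINP file exists in the current directory.
check_mrcc_env() {
    mrcc_env_status=0
    if ! command -v dmrcc >/dev/null 2>&1; then
        echo "dmrcc not found in PATH" >&2
        mrcc_env_status=1
    fi
    if [ ! -f MINP ]; then
        echo "MINP not found in $PWD" >&2
        mrcc_env_status=1
    fi
    return "$mrcc_env_status"
}
```

Calling check_mrcc_env before starting a job reports what is missing on standard error and returns a nonzero status if either check fails.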

## 9.1 Running Mrcc in serial mode

To run Mrcc in serial mode, invoke the driver of the package by simply typing
dmrcc
on a Unix console. To redirect the output, execute dmrcc as
dmrcc > out
where out is the output file.

## 9.2 Running Mrcc in parallel using OpenMP

Several executables of the package can be run in OpenMP parallel mode, hence it is recommended to use this option on multiprocessor machines.

The pre-built binaries available at the Mrcc homepage support OpenMP-parallel execution. If you prefer source-code installation, to compile the program for OpenMP-parallel execution you need to invoke the build.mrcc script with the -pOMP option at compilation (see Sect. 7). The OpenMP parallelization has been tested with the Intel compiler. Please be careful with other compilers: run, e.g., our test suite (see Sect. 8) with the OpenMP-compiled executables before production calculations.

To run the code with OpenMP you need to set the environment variable OMP_NUM_THREADS to the number of cores you want to use. E.g., under the Bourne-again shell (bash), for four cores:
export OMP_NUM_THREADS=4
Then the program should be executed as described above.

The provided binaries are linked with threaded Intel MKL routines; thus, when those are executed, the environment variable MKL_NUM_THREADS should also be set, e.g., for four threads:
export MKL_NUM_THREADS=4
If source-code installation is preferred, it is recommended to link the Mrcc objects with threaded BLAS and LAPACK libraries and to apply the runtime settings required by those libraries (e.g., define MKL_NUM_THREADS for Intel MKL).

The binding of threads to hardware elements might affect the performance on certain systems. The thread affinity can be specified with the OpenMP environment variables OMP_PLACES and OMP_PROC_BIND or with the Intel-specific variable KMP_AFFINITY when the precompiled binaries are executed or Intel MKL is used. For the computations involving the ccsd program, such as DF-, FNO-, and LNO-CCSD computations, nested parallelism is utilized. It is suggested to try setting OMP_PLACES=cores and OMP_PROC_BIND=spread,close for improved performance. However, in some cases, such thread binding may negatively affect the performance (e.g., when the number of processes exceeds the number of physical CPUs).
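The OpenMP settings described in this section can be collected in a short launch script. The following is a minimal sketch, assuming bash, four cores, and that using the same thread count for MKL as for OpenMP is appropriate on your machine; the thread count and the guard around the dmrcc call are illustrative choices, not requirements of the package.

```shell
#!/bin/bash
# Illustrative value: set this to the number of cores you want to use.
export OMP_NUM_THREADS=4
# Needed when the executables are linked with threaded Intel MKL.
# Assumption: the same thread count as for OpenMP is used.
export MKL_NUM_THREADS=$OMP_NUM_THREADS
# Thread binding suggested for ccsd-based computations; remove these two
# lines if the binding turns out to reduce the performance.
export OMP_PLACES=cores
export OMP_PROC_BIND=spread,close

# Start the driver only if it is in PATH and MINP is in this directory.
if command -v dmrcc >/dev/null 2>&1 && [ -f MINP ]; then
    dmrcc > out
else
    echo "dmrcc or MINP not found; nothing was run" >&2
fi
```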

## 9.3 Running Mrcc in parallel using MPI

Currently, the executables scf, mrcc, and ccsd can be run in parallel using MPI. To compile the program for MPI-parallel execution, you need to invoke the build.mrcc script with the -pMPI option at compilation (see Sect. 7). It has been tested with the Intel compiler and the Open MPI (version 4) and Intel MPI (2017 or later) environments. If the precompiled binaries are used or Intel MPI 2021 or newer is linked to Mrcc, it is strongly recommended to install and use the newest stable version of Libfabric (1.9.0 or later), as some of the previous versions provided via the Intel MPI package could cause irregular runtime behavior. The Libfabric library can be downloaded from https://github.com/ofiwg/libfabric. If a Libfabric library other than the one provided by Intel is used, the environment for Intel MPI should be set using the -ofi_internal=0 option of mpivars.sh, e.g., if Intel MPI 2019 is installed:
source <Intel MPI install dir>/parallel_studio_xe_2019/compilers_and_libraries_2019/linux/mpi/intel64/bin/mpivars.sh release_mt -ofi_internal=0

For the MPI-parallel execution, the mpitasks keyword has to be set. Then, it is sufficient to execute the dmrcc binary as usual. The program will spawn the number of scf, mrcc, or ccsd processes specified with the mpitasks keyword and copy the necessary input files to the compute nodes, therefore the input files need to be present only in the directory where dmrcc is executed. Note that the working directory can be the same for all MPI processes, e.g., a directory in the network file system of a computer cluster. Alternatively, process-specific working directories are also supported to exploit local hard drives within compute nodes. In both cases the spawned MPI process will start the execution in the directory with the same path, which might be on a separate file system.

If you wish to run Mrcc with other mpirun options, the MPI-parallel dmrcc_mpi executable should be launched as mpirun -np 1 <options> dmrcc_mpi. You should not run dmrcc using mpirun since it will result in launching mpirun recursively, and your job might fail to start. Please note that the total number of processes will be higher than mpitasks, so you might need to oversubscribe nodes using the appropriate mpirun or scheduler option (e.g., sbatch --overcommit … with SLURM or mpirun --oversubscribe … with Open MPI). For optimal performance, please set mpitasks to the total number of available CPUs, non-uniform memory access (NUMA) nodes, nodes, cores, etc., since the additional processes beyond mpitasks are driver processes that run mostly in the background and do not require dedicated resources.
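The launch line described above can be assembled as follows. This is a sketch only: the function name mrcc_mpi_launch_cmd is hypothetical, and it merely prints the command so that it can be inspected before use; --oversubscribe is the Open MPI option mentioned above and may not apply to other MPI implementations.

```shell
# Hypothetical helper (not part of Mrcc): prints the mpirun command line
# for dmrcc_mpi with any extra mpirun options passed as arguments.
mrcc_mpi_launch_cmd() {
    echo "mpirun -np 1 $* dmrcc_mpi"
}

# Example with the Open MPI oversubscription option:
mrcc_mpi_launch_cmd --oversubscribe
```

Once the printed line looks right, it can be executed in the directory containing MINP, with the output redirected to a file as in the serial case.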

On systems consisting of more than one NUMA node (e.g., containing more than one CPU), the performance may be increased by running one process on each NUMA node of the compute nodes. This strategy is beneficial, for instance, when the number of OpenMP threads would otherwise surpass a few tens. Instead, the number of MPI tasks can be increased for better parallel efficiency. Note, however, that in this case the total memory requirement is increased because each process allocates the amount of memory specified in the input file, as all MPI algorithms currently available in Mrcc rely on replicated memory strategies.
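The trade-off between the number of MPI tasks, the number of OpenMP threads, and the memory requirement can be illustrated with simple arithmetic. The cluster layout below is entirely hypothetical (two compute nodes, two NUMA nodes each, 16 cores per NUMA node, 8 GB requested per process); substitute the values of your own system.

```shell
# Hypothetical cluster layout; replace with the values of your system.
nodes=2               # compute nodes in the job
numa_per_node=2       # NUMA nodes (e.g., CPU sockets) per compute node
cores_per_numa=16     # physical cores per NUMA node
mem_per_process_gb=8  # memory per process as specified in the input file

# One MPI task per NUMA node, one OpenMP thread per core of that NUMA node.
mpitasks=$((nodes * numa_per_node))
omp_threads=$cores_per_numa
# Replicated memory: every process on a node allocates the full amount.
mem_per_node_gb=$((numa_per_node * mem_per_process_gb))

echo "mpitasks=$mpitasks OMP_NUM_THREADS=$omp_threads memory per node=${mem_per_node_gb} GB"
```

With these numbers, each node runs two processes with 16 threads each and needs 16 GB in total, whereas a single process with 32 threads per node would need only 8 GB.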

Pinning processes to CPU cores in MPI-parallel runs might affect the performance. When Open MPI is linked to Mrcc, binding can be set by the -bind-to option of mpirun, via modular component architecture (MCA) parameters (e.g., --mca hwloc_base_binding_policy core), or by setting the environment variable OMPI_MCA_hwloc_base_binding_policy. It is also suggested to set the Open MPI MCA parameter rmaps_base_inherit to 1. If Mrcc is linked with Intel MPI or the precompiled binaries are used, pinning can be controlled by the I_MPI_PIN and I_MPI_PIN_PROCESSOR_LIST environment variables. If the internode connection is established via an InfiniBand network, other MCA parameters might need to be set as well (e.g., btl_openib_allow_ib to true).
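The pinning settings above can also be given as environment variables before dmrcc is started. The following sketch assumes bash and Open MPI version 4 or Intel MPI; only the block matching your MPI implementation is needed, and the chosen values (binding to cores, pinning enabled) are illustrative rather than recommended for every system.

```shell
# Open MPI: bind processes to cores and let spawned processes inherit
# the mapping settings, both given as MCA environment variables.
export OMPI_MCA_hwloc_base_binding_policy=core
export OMPI_MCA_rmaps_base_inherit=1

# Intel MPI (or the precompiled binaries): enable process pinning.
# I_MPI_PIN_PROCESSOR_LIST can additionally be set to refine the placement.
export I_MPI_PIN=1
```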

## 9.4 Troubleshooting MPI

To run Mrcc in parallel using MPI, the environment needs to be set up correctly. You need a working MPI installation with the Mrcc executables available on all nodes used for the job. MPI can be set up by setting the appropriate environment variables (e.g., PATH, LD_LIBRARY_PATH), by sourcing the setup script that comes with MPI (e.g., if Intel MPI is used), or by other means (e.g., environment modules).

MPI might need to be compiled with the integration of your chosen resource manager if you plan to use the resource manager's own tools to launch MPI processes. If you do not use a resource manager, you should be able to ssh/rsh between the compute nodes. In this case, you might need to set the proper startup method with the appropriate option (such as ssh, slurm, pbs, ...) of your MPI implementation (e.g., the I_MPI_HYDRA_BOOTSTRAP environment variable or the -bootstrap option of mpirun in the case of Intel MPI). If you use Intel MPI with the SLURM resource manager, you should set the I_MPI_PMI_LIBRARY environment variable to point to SLURM's own PMI library. Alternatively, it is suggested that you do not apply a third-party PMI library (unset I_MPI_PMI_LIBRARY) and use the mpirun command with ssh bootstrap.

Depending on the hardware and software setup of the network on your cluster, you may need to adjust the fabric interface provider. For instance, for Intel MPI, set the FI_PROVIDER variable to, e.g., verbs or tcp.

For further details, please refer to the documentation of your chosen MPI implementation. You may also find useful tips to solve MPI-related issues on the Mrcc user forum (threads no. 1031, 1032, 1092, 1094). In the case of runtime issues (e.g., the program hangs in an MPI-parallel job step or terminates with an error), it may be helpful to increase the verbosity level, e.g., with Intel MPI by setting the environment variables I_MPI_DEBUG=5, I_MPI_HYDRA_DEBUG=1, and I_MPI_OFI_PROVIDER_DUMP=1.
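For Intel MPI, the verbosity settings just mentioned can be applied in bash as sketched below. Capturing both standard output and standard error in the file out is an illustrative choice, made here on the assumption that the debug messages may appear on either stream.

```shell
# Increase the verbosity of Intel MPI for debugging.
export I_MPI_DEBUG=5
export I_MPI_HYDRA_DEBUG=1
export I_MPI_OFI_PROVIDER_DUMP=1

# Rerun the failing job, keeping standard output and standard error together.
if command -v dmrcc >/dev/null 2>&1 && [ -f MINP ]; then
    dmrcc > out 2>&1
fi
```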

To test whether the problem is related to the MPI settings, suboptimal but usually available options can be tried. For instance, for Intel MPI you may try to set one or more of the following:
unset I_MPI_PMI_LIBRARY
export I_MPI_HYDRA_BOOTSTRAP=ssh
export I_MPI_OFI_PROVIDER=tcp
and, depending on the Intel MPI version,
source <Intel MPI install dir>/parallel_studio_xe_2019/compilers_and_libraries_2019/linux/mpi/intel64/bin/mpivars.sh release_mt -ofi_internal=1
or
source <Intel MPI install dir>/oneapi/mpi/latest/env/vars.sh -i_mpi_library_kind=release_mt -i_mpi_ofi_internal=1