If you have problems during the execution of MRCC, please attach the output with an adequate description of your case, as well as the following:
- the way mrcc was invoked
- the way build.mrcc was invoked
- the output of build.mrcc
- compiler version (for example: ifort -V, gfortran -v)
- BLAS/LAPACK versions
- gcc and glibc versions

This information really helps us during troubleshooting.
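A minimal Python sketch to collect most of this into one file to attach (the command names are typical-toolchain assumptions, not MRCC requirements; adjust to your system):

#!/usr/bin/env python3
# Minimal sketch: gather toolchain/library version info for a bug report.
# Which commands exist depends on your system; edit the list as needed.
import subprocess

COMMANDS = [
    ["ifort", "-V"],        # Intel Fortran version (if used to build MRCC)
    ["gfortran", "-v"],     # GNU Fortran version
    ["gcc", "--version"],   # gcc version
    ["ldd", "--version"],   # first line reports the glibc version
]

with open("mrcc_env_report.txt", "w") as report:
    for cmd in COMMANDS:
        report.write(f"$ {' '.join(cmd)}\n")
        try:
            out = subprocess.run(cmd, capture_output=True, text=True)
            report.write(out.stdout + out.stderr + "\n")
        except FileNotFoundError:
            report.write("(command not found)\n\n")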
Best practices for parallel performance/scaling
MXvo5e35 (Topic Author, New Member)
4 years 3 months ago  #1132

Replied by MXvo5e35 on topic Best practices for parallel performance/scaling
        OK, thanks for the clarification. I do appreciate the pointers!
Your idea re: the use of DF for extrapolation is indeed interesting. In fact, the basic setting of my problem revolves around calibration of a composite scheme, so I'm already considering these approaches. As I mentioned, DF adds some more caveats to the extrapolation process, and I have to admit that I'm not entirely up to speed on the theory. Do you possibly have a reference investigating the accuracy of e.g. CBS extrapolation schemes using DF vs. those without?
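For concreteness, the extrapolation I have in mind is the standard two-point inverse-cubic form for the correlation energy,

$$E_\mathrm{corr}^{(X)} = E_\mathrm{corr}^{\mathrm{CBS}} + A\,X^{-3} \quad\Longrightarrow\quad E_\mathrm{corr}^{\mathrm{CBS}} \approx \frac{X^{3}\,E_\mathrm{corr}^{(X)} - (X-1)^{3}\,E_\mathrm{corr}^{(X-1)}}{X^{3} - (X-1)^{3}},$$

so my question is essentially whether the DF error is smooth enough in the cardinal number $X$ to cancel in this expression.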
(Also, a very beginner question... Does the use of DF adjust the overall scaling of the various schemes? For example, CCSD(T) goes as O(nocc^3 nvirt^4) -- does DF change this somehow? I would guess not...?)
Re: CCSD and CCSD(T) scaling. I've been running some more test jobs. The optimised (conventional, disk-based) CCSD code is certainly competitive for performance in a shared-memory setting, but disk I/O is indeed the bottleneck, even using a fast local SSD for storage. As you suggest, the on-disk load of the various integrals becomes prohibitively high at around 650 basis functions, and at that point performance suffers relative to less disk-intensive CCSD(T) implementations in other codes such as NWChem. (Not a complaint, just an observation.)
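A rough back-of-envelope estimate of why this happens (my own numbers, not MRCC's actual file layout):

# Rough estimate of conventional two-electron integral storage
# (back-of-envelope only; real integral files are organised differently).
nbf = 650                                   # basis functions
n_unique = nbf**4 / 8                       # ~8-fold permutational symmetry
bytes_total = n_unique * 8                  # double precision
print(f"{bytes_total / 1024**3:.0f} GiB")   # ~166 GiB for the (ab|cd) block alone

Amplitude and intermediate files come on top of that, and the CC iterations have to sweep over these files repeatedly, which is where even a fast SSD runs out of steam.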
For the higher-order calculations, I've been able to run CCSDT(Q) with 200+ basis functions (water with cc-pV5Z, so relatively few occupied orbitals) without a problem. There seems to be more room to scale up here too.
Again, thanks for the info!
nagypeter (MRCC developer, Premium Member)
4 years 3 months ago  #1133

Replied by nagypeter on topic Best practices for parallel performance/scaling
    
There are a number of studies assessing the accuracy of DF-CCSD(T) as well; you can start, e.g., with these:
pubs.acs.org/doi/10.1021/ct400250u
aip.scitation.org/doi/10.1063/1.4820484
aip.scitation.org/doi/10.1063/1.4905005
DF-CCSD(T) still scales the same as conventional CCSD(T).
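In terms of the usual textbook operation counts (spelling it out, independent of any particular implementation):

$$\mathrm{cost}(\mathrm{CCSD}) \sim \mathcal{O}\!\left(n_{\mathrm{occ}}^{2}\, n_{\mathrm{virt}}^{4}\right) \ \text{per iteration}, \qquad \mathrm{cost}\big((\mathrm{T})\big) \sim \mathcal{O}\!\left(n_{\mathrm{occ}}^{3}\, n_{\mathrm{virt}}^{4}\right),$$

and DF changes how the integrals entering these contractions are assembled, not these exponents.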
The prefactor of some DF-CCSD steps is a bit lower, but the main benefit comes from the much smaller storage requirement of the integrals. In our implementation the I/O is basically eliminated (meaning you will hit an operation count or memory bottleneck much sooner).
Consequently, the parallel scaling is also great and is not limited by I/O or network speed, so 1000-1500 orbitals become reachable with your hardware. Many more details about our code are given here:
pubs.acs.org/doi/abs/10.1021/acs.jctc.9b00957
pubs.acs.org/doi/abs/10.1021/acs.jctc.0c01077
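As a rough illustration of why the integral storage stops being the bottleneck (a generic back-of-envelope estimate, not the exact data layout used by MRCC):

# Generic storage estimate: conventional 4-index vs DF 3-index integrals
# (illustrative only; not the actual file layout used by MRCC).
nbf, naux = 1500, 4500           # assume an auxiliary basis ~3x the AO basis
conv = (nbf**4 / 8) * 8          # unique (ab|cd) integrals, 8 bytes each
df   = (nbf**2 * naux) * 8       # three-index DF factors B^P_pq
print(f"conventional: {conv / 1024**4:.1f} TiB")   # ~4.6 TiB on disk
print(f"DF 3-index  : {df / 1024**3:.0f} GiB")     # ~75 GiB, can stay in memory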