OpenMP performance for CCSDTQ, etc.

1 year 5 months ago #1253 by TiborGY
Dear all,

For what it's worth, on systems with plenty of RAM and a fast NVMe SSD, parallel scaling does not appear to be limited by disk access speed, but by the amount of time spent outside parallel regions (or inside critical sections; serial either way).

Running a fairly large CCSDT calculation on 12 OMP threads, I observed the CPU usage alternating between 1 core and 12 cores.
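
To put rough numbers on this, here is a minimal sketch of Amdahl's law in Python. It is not taken from MRCC; the function names are hypothetical and the serial fractions in the loop are illustrative values, not measurements from the run described above. It shows how quickly even a modest serial share caps the speedup on 12 threads, and how a measured speedup can be turned back into an estimated serial fraction (the Karp-Flatt metric).

```python
def amdahl_speedup(serial_fraction: float, threads: int) -> float:
    """Upper bound on speedup when `serial_fraction` of the runtime is serial."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)


def serial_fraction_from_speedup(speedup: float, threads: int) -> float:
    """Estimate the serial fraction from a measured speedup (Karp-Flatt metric)."""
    if threads < 2:
        raise ValueError("need at least 2 threads to estimate a serial fraction")
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return (1.0 / speedup - 1.0 / threads) / (1.0 - 1.0 / threads)


if __name__ == "__main__":
    threads = 12  # thread count mentioned in the post
    # Illustrative serial fractions only; none of these were measured.
    for serial in (0.01, 0.05, 0.10, 0.25):
        bound = amdahl_speedup(serial, threads)
        print(f"serial fraction {serial:.2f}: speedup bound {bound:.2f}x on {threads} threads")
```

To use the second function, time the same job with 1 thread and with 12 threads, divide the two wall times to get the speedup, and pass that in. The result is only an estimate, since it lumps IO waits and synchronization overhead into the "serial" share.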

I have not looked at the code yet, but as a first guess, parallelizing the remaining serial sections should improve parallel scaling, even if heavy IO is involved. Modern SSDs not only tolerate parallel IO; many can only reach their advertised peak throughput when servicing multiple IO requests in parallel (queue depth > 1).
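
As a rough illustration of what "multiple IO requests in parallel" means, here is a small Python sketch that reads one file in fixed-size chunks from several threads at once. It only illustrates the idea and says nothing about how MRCC actually does its IO; `read_chunk`, `parallel_read`, the chunk size and the worker count are all hypothetical choices. It is not a benchmark either, since small or recently written files are usually served from the OS page cache rather than the SSD.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_chunk(path: str, offset: int, size: int) -> bytes:
    """Read up to `size` bytes starting at `offset`, using a private file handle."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(size)


def parallel_read(path: str, chunk_size: int = 1 << 20, workers: int = 4) -> bytes:
    """Read a whole file with up to `workers` chunk reads in flight at a time."""
    total = os.path.getsize(path)
    offsets = range(0, total, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in offset order, so the chunks join back correctly.
        chunks = pool.map(lambda offset: read_chunk(path, offset, chunk_size), offsets)
        return b"".join(chunks)
```

Each worker opens its own handle so that the threads never share a file position. Python releases its global interpreter lock during blocking file reads, so several requests can be outstanding at the device at the same time.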

Of course, as always, this is easier said than done :)
Last edit: 1 year 5 months ago by TiborGY.


1 year 4 months ago #1256 by kipeters
Just an update to this thread: it seems the issues I've been seeing with OpenMP performance are due to the interface with Molpro. Invoking mrcc by itself shows the expected thread activity. No fix yet, but now I know where to look.
