If you have problems during the execution of MRCC, please attach the output with an adequate description of your case, as well as the following:
- the way mrcc was invoked
- the way build.mrcc was invoked
- the output of build.mrcc
- compiler version (for example: ifort -V, gfortran -v)
- BLAS/LAPACK versions
- gcc and glibc versions

This information really helps us during troubleshooting.
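A minimal Python sketch to collect most of this into one file to attach (the command names are typical-toolchain assumptions, not MRCC requirements; adjust to your system):

#!/usr/bin/env python3
# Minimal sketch: gather toolchain/library version info for a bug report.
# Which commands exist depends on your system; edit the list as needed.
import subprocess

COMMANDS = [
    ["ifort", "-V"],        # Intel Fortran version (if used to build MRCC)
    ["gfortran", "-v"],     # GNU Fortran version
    ["gcc", "--version"],   # gcc version
    ["ldd", "--version"],   # first line reports the glibc version
]

with open("mrcc_env_report.txt", "w") as report:
    for cmd in COMMANDS:
        report.write(f"$ {' '.join(cmd)}\n")
        try:
            out = subprocess.run(cmd, capture_output=True, text=True)
            report.write(out.stdout + out.stderr + "\n")
        except FileNotFoundError:
            report.write("(command not found)\n\n")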
Best practices for parallel performance/scaling
MXvo5e35 (Topic Author, New Member)
4 years 3 months ago  #1132

Replied by MXvo5e35 on topic Best practices for parallel performance/scaling
        OK, thanks for the clarification. I do appreciate the pointers!
Your idea re: the use of DF for extrapolation is indeed interesting. In fact, the basic setting of my problem revolves around calibration of a composite scheme, so I'm already considering these approaches. As I mentioned, DF adds some more caveats to the extrapolation process, and I have to admit that I'm not entirely up to speed on the theory. Do you possibly have a reference investigating the accuracy of e.g. CBS extrapolation schemes using DF vs. those without?
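For concreteness, the extrapolation I have in mind is the standard two-point inverse-cubic form for the correlation energy,

$$E_\mathrm{corr}^{(X)} = E_\mathrm{corr}^{\mathrm{CBS}} + A\,X^{-3} \quad\Longrightarrow\quad E_\mathrm{corr}^{\mathrm{CBS}} \approx \frac{X^{3}\,E_\mathrm{corr}^{(X)} - (X-1)^{3}\,E_\mathrm{corr}^{(X-1)}}{X^{3} - (X-1)^{3}},$$

so my question is essentially whether the DF error is smooth enough in the cardinal number $X$ to cancel in this expression.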
(Also, a very beginner question... Does the use of DF adjust the overall scaling of the various schemes? For example, CCSD(T) goes as O(nocc^3 nvirt^4) -- does DF change this somehow? I would guess not...?)
Re: CCSD and CCSD(T) scaling. I've been running some more test jobs. The optimised (conventional, disk-based) CCSD code is certainly competitive for performance in a shared-memory setting, but disk I/O is indeed the bottleneck, even using a fast local SSD for storage. As you suggest, the on-disk load of the various integrals becomes prohibitively high at around 650 basis functions, and at that point performance suffers relative to less disk-intensive CCSD(T) implementations in other codes such as NWChem. (Not a complaint, just an observation.)
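A rough back-of-envelope estimate of why this happens (my own numbers, not MRCC's actual file layout):

# Rough estimate of conventional two-electron integral storage
# (back-of-envelope only; real integral files are organised differently).
nbf = 650                                   # basis functions
n_unique = nbf**4 / 8                       # ~8-fold permutational symmetry
bytes_total = n_unique * 8                  # double precision
print(f"{bytes_total / 1024**3:.0f} GiB")   # ~166 GiB for the (ab|cd) block alone

Amplitude and intermediate files come on top of that, and the CC iterations have to sweep over these files repeatedly, which is where even a fast SSD runs out of steam.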
For the higher-order calculations, I've been able to run CCSDT(Q) with 200+ basis functions (water with cc-pV5Z, so relatively few occupied orbitals) without a problem. There seems to be more room to scale up here too.
Again, thanks for the info!
nagypeter (MRCC developer, Premium Member)
4 years 3 months ago  #1133

Replied by nagypeter on topic Best practices for parallel performance/scaling
    
There are a number of studies assessing the accuracy of DF-CCSD(T) as well; you can start, e.g., with these:
pubs.acs.org/doi/10.1021/ct400250u
aip.scitation.org/doi/10.1063/1.4820484
aip.scitation.org/doi/10.1063/1.4905005
DF-CCSD(T) still scales the same as conventional CCSD(T).
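In terms of the usual textbook operation counts (spelling it out, independent of any particular implementation):

$$\mathrm{cost}(\mathrm{CCSD}) \sim \mathcal{O}\!\left(n_{\mathrm{occ}}^{2}\, n_{\mathrm{virt}}^{4}\right) \ \text{per iteration}, \qquad \mathrm{cost}\big((\mathrm{T})\big) \sim \mathcal{O}\!\left(n_{\mathrm{occ}}^{3}\, n_{\mathrm{virt}}^{4}\right),$$

and DF changes how the integrals entering these contractions are assembled, not these exponents.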
The prefactor of some DF-CCSD steps is a bit lower, but the main benefit comes from the much smaller storage requirement of the integrals. In our implementation the I/O is basically eliminated (meaning you will hit an operation count or memory bottleneck much sooner).
Consequently, the parallel scaling is also great and is not limited by I/O or network speed, so 1000-1500 orbitals become reachable with your hardware. Many more details about our code are given here:
pubs.acs.org/doi/abs/10.1021/acs.jctc.9b00957
pubs.acs.org/doi/abs/10.1021/acs.jctc.0c01077
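As a rough illustration of why the integral storage stops being the bottleneck (a generic back-of-envelope estimate, not the exact data layout used by MRCC):

# Generic storage estimate: conventional 4-index vs DF 3-index integrals
# (illustrative only; not the actual file layout used by MRCC).
nbf, naux = 1500, 4500           # assume an auxiliary basis ~3x the AO basis
conv = (nbf**4 / 8) * 8          # unique (ab|cd) integrals, 8 bytes each
df   = (nbf**2 * naux) * 8       # three-index DF factors B^P_pq
print(f"conventional: {conv / 1024**4:.1f} TiB")   # ~4.6 TiB on disk
print(f"DF 3-index  : {df / 1024**3:.0f} GiB")     # ~75 GiB, can stay in memory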