# Curse of the nano-Hartree

5 years 3 weeks ago #513 by Nike
Greetings!
I am still doing calculations on the lowest triplet state of Li2. With aug-cc-pCV5Z, my ROHF energies in MOLPRO and stand-alone MRCC agree with each other to within 25 pico-Hartree when the integral tolerance is 10^-18 in both programs and the scf tolerance is 10^-16 in MOLPRO / 10^-14 in MRCC.

MRCC is consistently 0.153 micro-Hartree lower than MOLPRO and CFOUR at CCSD, CCSD(T), CCSDT, CCSDTQ level for a case where the ROHF energies were in agreement to within 25 pico-Hartree.

In stand-alone MRCC I used itol=18, scftol=14, and cctol=9 (though at iteration 15 of the cc, the energy decrease was 3.6978E-10, and MRCC kept spending time doing 5 more iterations before reporting convergence with energy decrease = 1.8545E-12).

The ROHF is converged and I trust it to within +/- 25 pico-Hartree because of its agreement with MOLPRO:

ROHF/MRCC -14.864,012,797,965 (default itol, 1e-13 and default scftol, 1e-9)
ROHF/MRCC -14.864,012,797,964 (itol=1d-18, scftol=1e-13)
ROHF/MRCC -14.864,012,797,964 (itol=1d-18, scftol=1e-14)
ROHF/MOLPRO -14.864,012,797,989 (itol=1d-18, scftol=1e-16)

Everything after ROHF I cannot trust in the 0.1 micro-Hartree digit because of discrepancy with MOLPRO and CFOUR, but it is interesting that this discrepancy does not change no matter how much correlation we include! I have some results below:

CCSD/MRCC -14.956,239,235,626 (default itol, 1e-13, default scftol, 1e-9, default cctol)
CCSD/MRCC -14.956,239,235,634 (itol=1d-18, scftol=1e-13, cctol=9 but converged much more)
UCCSD/MOLPRO -14.956,239,081,374 (itol=1d-18, scftol=1e-16, cctol=1e-16)
CCSD/CFOUR -14.956,239,082,231 (XFORM_TOL=18, SCF_CONV=16, CC_CONV=16)

So MOLRPO and CFOUR agree to ~1 nano-Hartree, but MRCC is 153 nano-Harree lower! (no matter the itol, scftol, cctol).

CCSD(T)/MRCC -14.956,364,644,050 (default, but the ROHF and CCSD energies shows that it makes no difference)
UCCSD(T)/MOLPRO -14.956,364,489,765 (itol=1d-18, scftol=1e-16, cctol=1e-16)
CCSD(T)/CFOUR -14.956,364,490,584 with default settings

So MOLPRO and CFOUR agree to ~1 nano-Hartree, but MRCC is 153 nano-Hartree lower!

CCSDT/MRCC -14.956,384,374,501 (default, but the ROHF and CCSD energies shows that it makes no difference)
CCSDT/CFOUR -14.956,384,221,664 (XFORM_TOL=18, scfitol=1e-10, last digit still changing)
CCSDT/MRCC-CFOUR -14.956,384,212,115 (fort.55 from CFOUR with default settings)

MOLPRO cannot do CCSDT without the help of MRCC, but once again we find that stand-alone MRCC is 153 nano-Hartree lower than CFOUR!
Interestingly, MRCC on a fort.55 from CFOUR is 9 nano-Hartree *higher* than CFOUR.
This is a 9x larger discrepancy than what we had between CFOUR and MOLPRO for CCSD and CCSD(T), but it is better than the 153 nano-Hartree we keep getting with stand-alone MRCC.

CCSDTQ/MRCC -14.956,385,25
FCIQMC/MOLPRO/NECI -14.956,385,095

Here FCIQMC was done with NECI on MOLPRO integrals, and hasn't fully converged, but MRCC is 155 nano-Hartree lower.
CCSDTQ should agree with all-electron FCIQMC to within a pico-Hartree even though there's 6e-, based on experience with smaller basis sets on the same system:

aug-cc-pCV2Z:
Total CCSDTQ energy [au]: ­14.933575 804573
Total CCSDTQ(P)/A energy [au]: ­14.933575 804686
Total CCSDTQ(P)/B energy [au]: ­14.933575 804687
Total CCSDTQP energy [au]: ­14.933575 804716
Total CCSDTQP(H)/A energy [au]: ­14.933575 804716
Total CCSDTQP(H)/B energy [au]: ­14.933575 804716
Total FCI energy [au]: ­14.933575 804743

aug-cc-pCV3Z:
Total CCSDTQ energy [au]: ­14.950085 761173
Total CCSDTQ(P)/A energy [au]: ­14.950085 761634
Total CCSDTQ(P)/B energy [au]: ­14.950085 761637

In conclusion, for this system changing itol, scftol, and cctol doesn't make much of a difference in stand-alone MRCC:
CCSD/MRCC -14.956,239,235,626 (defaults)
CCSD/MRCC -14.956,239,235,634 (itol=1d-18, scftol=1e-13, cctol=9 but converged much more)

Yet MRCC's CCSD, CCSDT(T), CCSDT, and CCSDTQ are all about 153 nano-Hartree lower than what we get with CFOUR, MOLPRO and NECI.
Clealry all of the 153 nano-Hartree error is happening at the post-HF level, because ROHF agreed with MOLPRO to within 25 pico-Hartree no matter the itol and scftol:

ROHF/MRCC -14.864,012,797,965 (default itol, 1e-13 and default scftol, 1e-9)
ROHF/MRCC -14.864,012,797,964 (itol=1d-18, scftol=1e-13)
ROHF/MRCC -14.864,012,797,964 (itol=1d-18, scftol=1e-14)
ROHF/MOLPRO -14.864,012,797,989 (itol=1d-18, scftol=1e-16)

However, if MRCC uses the CFOUR integrals, the discrepancy is only 9 nano-Hartree, which means:
(i) stand-alone MRCC has the correct ROHF energy, but some error in the AO->MO transformation that leads to a ~161 nano-Hartree lower energy for CCSD, CCSD(T), CCSDT, and CCSDTQ;
(ii) the MRCC program in MRCC gives a CCSDT energy 9 nano-Hartree higher than CFOUR, even when using the fort.55 from CFOUR. MOLPRO and CFOUR agree to within ~1 nano-Hartree for CCSD and CCSD(T).

The error in the MRCC program leading to 9 nano-Hartree discrepancy at CCSDT level is interesting, but not as important as the error in the AO->MO transformation which leads to MRCC giving a ~161 nano-Hartree lower energy at all levels of correlation. The +9 and the -161 cancel a bit, leading to -153. I feel like I have to put a +/-0.033cm-1 error bar on all spectroscopic predictions on this system coming from MRCC.

This was all for aCV5Z. I have written in other threads on this forum, that for aCV6Z and aCV7Z the ROHF is converged and correct with stand-alone MRCC (as in the case here for 5Z), but all post-HF (CISD, CCSD, etc.) has terribly huge residual norms. For aCV6Z and aCV7Z, I have confirmed that the residual norms are small (and I can get convergence) when I use MRCC on a fort.55 from CFOUR. This indicates again that the ROHF energy is fine, but there's a problem at the AO->MO step, leading to a 0.033cm-1 lowering of all post-HF energies with 5Z, and leading to huge residual norms (and no convergence) with 6Z and 7Z. The 5Z basis set for this molecule, is also where we had problems with FCIQMC before switching some numbers from int32 to int64 (4Z worked with int32, but 5Z needed int64), I'm not sure if this helps with debugging the AO->MO transformation.

1) I wonder whether there is anything else I can change other than itol, scftol, cctol to lose less accuracy in MRCC?
2) I also wonder about the possibility to include an implementation of the AO-based transformation which is the default in MOLPRO, and available in CFOUR using ABCDTYPE=AOBASIS, because these AO->MO transformations often take a few days or in some cases more than a month in MRCC but only hours in MOLRPO or CFOUR. The faster integrals would be very valuable also for basis set optimization (where we are doing it hundreds of times), which is something that MRCC is the best at because it supports k-orbitals and MOLPRO doesn't. Perhaps coding the lower-scaling AO->MO transformation from scratch might be easier than finding what's going wrong for big basis sets in the current implementation. If there are no resources to implement this I would be interested in helping with it.

With best wishes!!
Nike Dattani

P.S. there is a slight difference in Bohr to Angstrom conversion in CFOUR, but not between MRCC and MOLPRO:

2014 CODATA: 1 A = 1/0.529,177,210,67(12) Bohr (4.17 A = 7.880,157,943,159,143 Bohr)
MOLPRO: 1 A = 1/0.529,177,209 Bohr (4.17 A = 7.880,157,968,027,6825 Bohr)
MRCC: 1 A = 1/0.529,177,20859 Bohr (4.17 A = 7.880,157,974,133,132 Bohr)
CFOUR: 1 A = 1/0.529,177,249 Bohr (4.17 A = 7.880,157,372,374,11 Bohr)

The MINP and GENBAS are below in case there is any interest in checking if the discrepancy is consistent across other versions/compilers or something:

LI:aug-cc-pCV5Z
EMSL BASIS SET LIBRARY

6
0 1 2 3 4 5
11 10 8 6 4 2
19 13 8 6 4 2

29493.000000 4417.101000 1005.223000 284.700900 92.865430
33.511790 13.041800 5.357536 2.279338 0.993990
0.433471 0.095566 0.044657 0.020633 27.855800
12.508200 5.616600 2.522100 0.006100

0.000018 -0.000003 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.000141 -0.000022 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.000739 -0.000115 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.003107 -0.000487 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.011135 -0.001746 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.034670 -0.005520 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.092171 -0.014928 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.199576 -0.034206 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.328836 -0.062155 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.345975 -0.095902 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.142761 -0.103972 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.005319 0.307151 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
-0.002101 0.579028 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.000815 0.223231 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.0 0.0 0.0 0.0 0.0 0.0 1.000000 0.000000 0.000000 0.000000 0.000000
0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.000000 0.000000 0.000000 0.000000
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.000000 0.000000 0.000000
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.000000 0.000000
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.000000

19.663500 4.623110 1.413780 0.473721 0.176151
0.072675 0.032141 0.014556 29.209500 11.176300
4.276300 1.636200 0.006500

0.000540 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.003865 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.015171 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.049204 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.154661 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.360095 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.441852 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.156000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.0 0.0 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000
0.0 0.0 0.000000 0.000000 0.000000 0.0 1.000000 0.000000 0.000000 0.000000
0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 1.000000 0.000000 0.000000
0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 1.000000 0.000000
0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 1.000000

0.531000 0.248100 0.115900 0.054200 19.845500
7.149200 2.575400 0.023500

1.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
0.0000000 1.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
0.0000000 0.0000000 1.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000000 0.0000000 0.0000000
0.0000000 0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000000 0.0000000
0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000000
0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 1.0000000 0.0000000
0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 1.0000000

0.292000 0.164100 0.092200 13.250200 4.199100
0.044800

1.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
0.0000000 1.0000000 0.0000000 0.0000000 0.0000000 0.0000000
0.0000000 0.0000000 1.0000000 0.0000000 0.0000000 0.0000000
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000000
0.0000000 0.0000000 0.0000000 0.0000000 1.0000000 0.0000000
0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 1.0000000

0.297100 0.150000 8.409300 0.072400

1.0000000 0.0000000 0.0000000 0.0000000
0.0000000 1.0000000 0.0000000 0.0000000
0.0000000 0.0000000 1.0000000 0.0000000
0.0000000 0.0000000 0.0000000 1.0000000

0.294900 0.127400

1.0000000 0.0000000
0.0000000 1.0000000

# TITLE
basis=aug-cc-pCV5Z
uncontract=off
calc=CCSDTQ
mem=490GB # MUCH less needed if only checking AO->MO error
core=corr
cctol=9
itol=18
scftol=14
ccmaxit=999
scfmaxit=9999
scftype=ROHF
rohftype=semicanonical
scfiguess=ao
mult=3
geom
Li
Li 1 R

R=4.1700

unit=angstroms

5 years 2 weeks ago #523 by kallay
Replied by kallay on topic Curse of the nano-Hartree
Dear Nike:
1) You will find a new ovirt.f version in the download area. Please recompile the code with this file. This will hopefully resolve the issue of the correlated energies.
2) Probably cfour/molpro and mrcc use a different implementation of the Boys function, and this may cause the nano-Hartree discrepancy in the HF energies. This cannot be eliminated by tightening the thresholds but is irrelevant for practical applications.
3) The AO->MO integral transformation can, in principle, be accelerated if the construction of the entire MO integral list is avoided, but this requires partly AO-integral-based CC algorithms, which we do not have for the moment.

Best regards,
Mihaly Kallay

5 years 1 week ago - 5 years 1 week ago #527 by Nike
Replied by Nike on topic Curse of the nano-Hartree
Thank you for the new ovirt.f !!!!!
Sorry it took me a long time to reply because I was away for a couple weeks.

1) I see that in ovirt.f, there was hardcoded tolerances such as:
if(dabs(V2(l,i,jk)).gt.1.d-12)then
but in the new version we are using the true itol given by the user (which in my case is 18).

I still need time to test if it works because the jobs are in the queue, and the cluster has a scheduled power outage this weekend.

Hopefully this will resolve the 161 nano-Hartree difference between CFOUR/MOLPRO and standalone-MRCC, but there was still a 9 nano-Hartree difference between standalone-CFOUR and "MRCC with a fort.55 from CFOUR", which is 9x bigger than the difference between standalone-CFOUR and standalone-MOLPRO.

Standalone-CFOUR and standalone-MOLPRO agreed to about ~1 nano-Hartree at CCSD and CCSD(T) level, so I would hope that MRCC would also agree to 1 nano-Hartree rather than only 9 nano-Hartree.

2) When you say different implementations of the Boys function could be blamed for the nano-Hartree discrepancy in HF energies, perhaps you missed it because my message was so long, but MRCC and MOLPRO agree to about 25 pico-Hatrree at ROHF level:
ROHF/MRCC -14.864,012,797,965 (default itol, 1e-13 and default scftol, 1e-9)
ROHF/MRCC -14.864,012,797,964 (itol=1d-18, scftol=1e-13)
ROHF/MRCC -14.864,012,797,964 (itol=1d-18, scftol=1e-14)
ROHF/MOLPRO -14.864,012,797,989 (itol=1d-18, scftol=1e-16)

Do you still think the 9 nano-Hartree discrepancy at CCSD and CCSD(T) level is due to different implementation of the Boys function, considering that the ROHFs agreed to 0.25 pico-Hartree and MOLPRO/CFOUR matched to within ~1 nano-Hartree ?

3) "The AO->MO integral transformation can, in principle, be accelerated if the construction of the entire MO integral list is avoided, but this requires partly AO-integral-based CC algorithms, which we do not have for the moment." Is it just that no one has yet written the code-generator for doing CCSDT (and higher) with the MO integral list avoided? Certainly the partly AO-integral-based CC algorithms do exist and were implemented in MOLPRO and CFOUR up to CCSD(T), so for CCSDT and higher I wonder if there's some mathematical or technical reason why it's hard, or if it's just a matter of no one (yet) spending the time to write the code generator for the AO-based case?

As always, thank you very much for your insights!
With very best wishes,
Nike Dattani
Last edit: 5 years 1 week ago by Nike.

5 years 1 week ago #528 by kallay
Replied by kallay on topic Curse of the nano-Hartree
Dear Nike,
1) Probably you should try to use even tighter SCF convergence criteria in Cfour. That may decrease the 9 nHartree discrepancy.
2) You are right, only the 25 pHartree difference is caused by the different Boys function.
3) In principle, one could do it at the CCSDT level, but it is more complicated since there are more terms including the (ab|cd) and (ab|ci) type integral lists. On the other hand, for most cases the processing of the triples amplitudes is the bottleneck in a CCSDT calculation, and we would not gain too much with an AO integral based algorithm.

Best regards,
Mihaly Kallay

5 years 1 week ago - 5 years 1 week ago #530 by Nike
Replied by Nike on topic Curse of the nano-Hartree
Dear Mihaly,
Thank you very much for the reply.
1) The 9 nHartree discrepancy is between CFOUR on CFOUR integrals, and MRCC on CFOUR integrals. Therefore the SCF convergence criteria was the same in both cases. Is it still possible that with the same result after SCF and AO-MO transformation, two different correlation programs (without any coding errors) give a 9 nHartree difference? Instead, maybe there was a truncation done when writing the fort.55 by the CFOUR interface (just like there was in the old ovirt.f file)?

Standalone-MOLPRO and Standalone-CFOUR had completely different integrals (because there's no interface between them) but agreed within ~1 nHartree for CCSD and CCSD(T).

2) Thank you. I think we can conclude that the different Boys function implementations are of negligible effect (until we have basis sets whose FCI energy is correct to about 25 pHartree, which is probably going to require a major paradigm shift in which we wouldn't even be using one-particle Gaussians anymore).

3) I think what we're seeing here is the difference between I/O cost vs CPU cost. A routine heavy in I/O might take much longer than a routine dominated by CPU even when the I/O routine has a much better formal scaling. In the case below, the AO-MO transformation took more time than the entire 30 iterations of CCSDT.

Here you can see that:

30 iterations of CCSDT were done after 6352.998 minutes,
then the 3 spin cases for (Q) were done after 26496.881 minutes.

Here you can see that 6 days (8480 minutes) were spent in ovirt.f:

ovirt.f begins at: 2017-10-15 22:21:06
ovirt.f ends at: 2017-10-21 20:23:26

The first of these files suggests that CFOUR finished xvtran in 3346.44 seconds and xintprc in 2649.99 seconds (with ABCDTYPE=STANDARD, not ABCDTYPE=AOBASIS), so maybe you are right that the CCSDT is the bottleneck and not the AO-MO transformation, but the 6 days spent in ovirt.f compared to the 1.3 days it took for mrcc to converge CCSDT to micro-Hartree precision, suggests that ovirt.f is the major bottleneck here. I thought this could be improved by using an AO-based algorithm, but the fact that CFOUR finishes xvtran and xintprc in just a couple hours (instead of 6 days) even with ABCDTYPE=STANDARD, makes me think there's something in ovirt.f that can be majorly sped-up without even switching to an AO-based algorithm. Do you have any idea what might be going on here?

With best wishes!
Nike Dattani
Last edit: 5 years 1 week ago by Nike.