40
doi.org/10.26434/chemrxiv.12552998.v1 Basis Set Extrapolations for Density Functional Theory Peter Kraus Submitted date: 24/06/2020 Posted date: 06/07/2020 Licence: CC BY-NC-ND 4.0 Citation information: Kraus, Peter (2020): Basis Set Extrapolations for Density Functional Theory. ChemRxiv. Preprint. https://doi.org/10.26434/chemrxiv.12552998.v1 Improving results of correlated wavefunction theory calculations by extrapolating from successive basis sets is nowadays a common practice. However, such approaches are uncommon in density functional theory, especially due its faster convergence towards the basis set limit. In this work I present approaches for basis set extrapolation in density functional theory that enable users to obtain results of 4-zeta quality from 3- and 2-zeta calculations, i.e. at a fraction of the computational cost. The extrapolation techniques work well with modern density functionals and common basis sets. File list (2) download file view on ChemRxiv main_v1.0.pdf (408.61 KiB) download file view on ChemRxiv si_v1.0.pdf (201.81 KiB)

Basis Set Extrapolations for Density Functional Theory

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

doi.org/10.26434/chemrxiv.12552998.v1

Basis Set Extrapolations for Density Functional TheoryPeter Kraus

Submitted date: 24/06/2020 • Posted date: 06/07/2020Licence: CC BY-NC-ND 4.0Citation information: Kraus, Peter (2020): Basis Set Extrapolations for Density Functional Theory. ChemRxiv.Preprint. https://doi.org/10.26434/chemrxiv.12552998.v1

Improving results of correlated wavefunction theory calculations by extrapolating from successive basis sets isnowadays a common practice. However, such approaches are uncommon in density functional theory,especially due its faster convergence towards the basis set limit. In this work I present approaches for basisset extrapolation in density functional theory that enable users to obtain results of 4-zeta quality from 3- and2-zeta calculations, i.e. at a fraction of the computational cost. The extrapolation techniques work well withmodern density functionals and common basis sets.

File list (2)

download fileview on ChemRxivmain_v1.0.pdf (408.61 KiB)

download fileview on ChemRxivsi_v1.0.pdf (201.81 KiB)

Basis set extrapolations for density functional theory

Peter Kraus∗

School of Molecular and Life Sciences,Curtin University,

GPO Box U1987, Perth 6845, WA, Australia

Abstract1

Basis set extrapolation is a common technique in wavefunction theory, used to squeeze bet-2

ter performance out of the highest affordable level of theory by combining it with lower quality3

calculations. In this work, I present analogous techniques for basis set extrapolation in density4

functional theory, focusing on [2,3]–ζ calculations, and including double hybrid and dispersion5

corrected functionals. My recommendations are based on basis set limit data from finite element6

calculations, estimates of basis set limits for double hybrid corrections, and they are validated using7

the GMTKN55 and NCDT datasets. A short review of extrapolation methods for Hartree-Fock8

calculations based on modern finite element data is carried out to inform this work. Extrapolation9

of [2,3]–ζ calculations in cc-pvXz-pp and def2-Xvpd basis sets with the proposed recipes routinely10

matches and sometimes outperforms 4–ζ calculations at a fraction of the cost. The methods are11

implemented in Psi4, allowing for an automated and intuitive application.12

1 Introduction13

The concept of estimating complete basis set (CBS) energies using basis set extrapolation methods14

is nothing new for practitioners of wavefunction theory (WFT). Usually, such extrapolation methods15

involve calculating absolute energies using correlation-consistent basis sets of consecutive ζ-qualities,16

with Helgaker’s power formula and cubic scaling (Eq. (1)) being one common example. [1]17

∗E-Mail: [email protected]

1

Ecorr∞ = Ecorr

X − AX−α, α = 3 (1)

Here, Ecorr∞ is the correlation (i.e. post-Hartree-Fock) energy at an infinite basis set size, while X denotes18

the cardinal number of the finite basis set, with this number usually equivalent to the ζ-ness of the basis19

set. The parameter A is system-dependent and is always fitted to EcorrX data. Many different variants of20

Eq. (1) have been proposed since, sometimes replacing the power scaling AX−α term with exponential21

(Ae−αX), [2] exponential–square-root (Ae−α√X , expsqrt in the following) or other functions. Results of22

such [X − 1, X]–ζ extrapolations are usually comparable to the results from a X + 1–ζ calculation,23

provided sufficiently large basis sets are used (X >= 4). [3] Smaller basis sets might perform better with24

a specifically tailored α. [4]25

The Hartree-Fock (H-F) component of WFT calculations converges to the basis set limit rather26

quickly when compared to the correlation energy. In Truhlar’s H-F extrapolation scheme, derived from27

estimates of Ne, HF, and H2O basis set limits, Eq. (1) is applied with α = 3.4 which is optimal for a28

[2,3]–ζ extrapolation. [4] Halkier et al. have proposed the use of Eq. (2) for extrapolating Hartree-Fock29

energies:30

EH−F∞ = EH−F

X − Ae−αX , α = 1.63 (2)

This “global” α parameter has been determined from fits of [2-6]–, [3-6]–, and [2-5]–ζ energies of diatomic31

molecules using correlation-consistent basis sets, with EH−F∞ values sourced from numerical H-F results32

of 9 species available at the time. [5] This extrapolation method performs well, and it is the default33

H-F extrapolation method in Psi4. [6] A three-point variant of Eq. (2), where both α and A are fitted34

to results from [X − 2, X − 1, X]–ζ calculations, is used in cfour, [7] despite the lower generality of35

this variant. [5] The exponential–square-root (expsqrt) function, shown in Eq. (3), is used by default in36

ORCA,37

2

EH−F∞ = EH−F

X − Ae−α√X (3)

with values of α depending on basis set family as well as size. [8] Many other extrapolation schemes are38

available, for a more detailed overview see the recent review of Varandas. [9]39

In density functional theory (DFT), extrapolation methods have gained little traction. Raymond and40

Wheeler reported that while a [3,4]–ζ extrapolation applied to a hybrid density functional approximation41

(DFA) produces reasonable values, the improvements over 4–ζ results are negligible. [10] Jensen showed42

that three-point extrapolation schemes (i.e. where both A and α are fitted) applied to a semi-local43

DFA do not perform well when contracted polarisation-consistent basis sets are used, and two-point44

schemes perform similarly to the non-extrapolated results. [11] This is attributed to the fast convergence45

of DFT with respect to basis set size, which is comparable to H-F, and is therefore much quicker than for46

correlated WFT methods. [11,12] Furthermore, the fitting of functional parameters during the development47

of DFAs may correct for basis set incompleteness, [11] which is why it’s recommended to use basis sets48

comparable to the ones used during the development of such methods. [13]49

With the advent of double hybrid DFAs, which include a correlation component based on second-50

order Møller-Plesset perturbation theory (MP2), the outlook for basis set extrapolation might have51

changed. Similarly, various treatments of dispersion forces have different dependencies on basis set52

sizes. To the best of my knowledge, basis set extrapolation results have been published to-date for only53

one double hybrid DFA, with a focus on chemical kinetics. [14,15] The changes in the density functional zoo54

therefore warrant another visit to the extrapolation pavillion. The goals of this work are the following:55

i) revisit extrapolation schemes for H-F, obtaining values of α derived from modern numerical H-F data;56

ii) investigate basis set convergence for double-hybrid and dispersion-corrected density functionals as57

such, i.e. would an extrapolation work and does it need to be different for single and double hybrids;58

iii) propose, implement, and validate a reasonable and transferrable extrapolation scheme for DFAs;59

and iv) gauge whether such methods are worth it, i.e. whether a [2,3]–ζ extrapolation can approach60

4–ζ results. I have decided to focus on [2,3]–ζ extrapolations, as performance of DFAs does not seem to61

improve significantly past 4–ζ, and there are many large systems for which DFT applied with 3–ζ basis62

3

sets is the currently highest affordable level of theory.63

2 Computational methods64

In this study I investigated the convergence behaviour of three families of basis sets: i) the correlation-65

consistent “Dunning” series supplemented by effective core potentials (ECP), cc-pvXz-pp (X ∈ [d, t, q, 5]);66

ii) the polarisation-consistent contracted and segmented variant of the “Jensen” sets, pcseg-N (N ∈67

[0, 1, 2, 3, 4], with X ≈ N + 1); and iii) the “Karlsruhe” family of basis sets, including their ECPs,68

in (singly-)polarised and augmented variants, def2-Xzvp and def2-Xzvpd, respectively (X ∈ [s, t, q]).69

I have included the Dunning basis sets for comparison with previous works, as well as due to their70

widespread use in DFT calculations. However, some recent works showed that more modern basis sets71

constructed with DFT in mind can be superior to the original Dunning sets for DFT applications. [11,16,17]72

The Jensen basis sets were included as they are modern, and 5 levels of quality are available. Unfortu-73

nately, Jensen basis sets for heavy atoms are not available. Finally, the two variants of the Karlsruhe74

basis sets were included as several functionals have been parametrised using these basis sets, and aug-75

mented variants are available across the whole periodic table. Their main downside compared to the76

Dunning and Jensen basis sets is the lack of a 5-ζ variant.77

To obtain estimates of basis set limits, the H-F energies of two databases of diatomic molecules, and78

MP2 energies of the 15 reference species were calculated using Psi4 1.4a2.dev532. [6] The direct algorithm79

for the evaluation of the electron repulsion integrals was used throughout. For all DFT calculations a80

(150, 974) quadrature grid was used. The MP2 and DFT basis set limits were estimated from calculations81

in [3,4,5]ZaPa-NR-CV basis sets. [18] All MP2 and double-hybrid DFA calculations were carried out with82

core electrons frozen, including the 4f shell for post-lanthanides. Tightened convergence criteria for the83

density and energy (< 10−10 Eh) were applied in all of the above calculations.84

Sampling the large collection of available DFAs and basis sets is always going to introduce biases: an85

exhaustive study is not feasible. For a list of the DFAs that were selected, see Table 1. I have included86

the functionals BLYP-D3(BJ), PBE-D3(BJ), B3LYP-D3(BJ), and B2PLYP-D3(BJ) for their popularity87

in their respective classes. The remaining two generalised gradient approximation (GGA) functionals,88

revPBE-D3(BJ) and B97-D3(BJ), are good performers in the GMTKN55 benchmark, [19] which war-89

4

ranted inclusion. The three meta-GGAs were selected to investigate different dispersion correction90

forms, as well as the differences in approaches to their construction: a “non-empirical” representative91

in SCAN-D3(BJ), a combinatorially optimized B97M-V, as well as an “empirical” representative in92

M06L-D3. My selection of the four single hybrids followed similar patterns, with dlDF+D10 included93

as a counterpoint to M052X-D3: both DFAs are empirically fitted to data, but while in M052X short-94

and medium-range dispersion contributions are included, in dlDF they are excluded by design. Fi-95

nally, the four double hybrids that I selected also range from DFAs with few optimized parameters96

(PBE0DH-D3(BJ)) to methods obtained from extensive fitting to data (DSD-BLYP-D3(BJ)).97

Table 1: Overview of studied DFAs. References correspond to functionals and parametrizations ofdispersion corrections.

Method name DFA type Disp. type Reference

BLYP-D3(BJ) GGA D3(BJ) [20–22]

PBE-D3(BJ) GGA D3(BJ) [22,23]

revPBE-D3(BJ) GGA D3(BJ) [22–24]

B97-D3(BJ) GGA D3(BJ) [22,25]

SCAN-D3(BJ) mGGA D3(BJ) [26,27]

M06L-D3 mGGA D3(0) [28,29]

B97M-V mGGA VV10 [30]

B3LYP-D3(BJ) single hyb. D3(BJ) [22,31,32]

ωB97X-V single hyb. VV10 [33]

M052X-D3 single hyb. D3(0) [29,34]

dlDF+D10 single hyb. DAS2010 [35,36]

DSD-BLYP-D3(BJ) double hyb. D3(BJ) [37]

B2PLYP-D3(BJ) double hyb. D3(BJ) [29,38]

PWPB95-NL double hyb. VV10 [39–41]

PBE0DH-D3(BJ) double hyb. D3(BJ) [42,43]

To obtain HF and DFT energies of species at the basis set limit, I have carried out numerical98

finite element calculations in HelFEM. [44] As neither range-separated hybrids nor double hybrids are99

implemented in HelFEM, the selection of DFAs is limited to BLYP, PBE, revPBE, B97-D, SCAN,100

M06L, B97M-V, B3LYP, M052X, and dlDF. Note that the non-local “VV10” component of B97M-101

V is also not yet implemented in HelFEM. The parameters that I used to obtain total energies with102

the diatomic code are determined for each molecule by the diatomic cbasis tool, requesting 10−10103

accuracy. A 150-point quadrature rule is used with each DFA.104

To investigate the performance of the proposed extrapolation schemes, I chose the GMTKN55 bench-105

5

mark containing 1505 relative energies or 2462 single point calculations. [19] Due to the large size of106

GMTKN55, I used its “diet” subset containing 100 relative energies or 240 single point calculations107

(denoted diet100). [45] The latter has been shown to reproduce the results of the former both qualita-108

tively and quantitatively, but its use is nonetheless re-validated below. Unlike in the original GMTKN55109

study, [19] here I used the same basis sets for all calculations, without augmentation in selected database110

subsets. All calculations are performed with Psi4 using development versions of 1.4, [6] with the default111

convergence criteria for energy and a (75, 302) point integration grid. Double hybrid functionals are112

applied without the frozen core approximation. The figure of merit for the GMTKN55 and diet100113

benchmarks is the weighted mean absolute deviation (WTMAD), which corresponds to the average of114

the weighted absolute devations of computed relative energies from the reference energies in the dataset,115

with weights taken from the original references. [19,45]116

To check performance of the extrapolation methods for energies of van der Waals complexes, I have117

turned to the NCDT dataset [46] in it’s revised form. [47] A modified version of Psi4’s cbs() driver function118

is used to obtain the interaction energies in an automated fashion.119

All raw input and output files, collated results in json format, and Jupyter notebooks used to analyse120

the results and prepare the figures and tables in this manuscript are included in the Supplemental121

information archive.122

3 Results123

As modern DFAs contain various components with presumably different scalings, this is a good point124

to outline the nomenclature I use below. The total energy of a density functional (EDFA), such as125

B2PLYP-D3(BJ), contains a single-reference energy component obtained from the self-consistent cycle126

including the exchange-correlation functional (Efctl), double-hybrid correlation contribution (∆Edh),127

and dispersion correction (Edisp):128

EDFA = Efctl + ∆Edh + Edisp (4)

6

For the B2PLYP-D3(BJ) example above, Efctl is the self-consistent density functional energy, which129

contains the nuclear repulsion, one- and two-electron energies, and the exchange-correlation component130

of the DFA. This single-reference energy contribution contains all components that affect the Kohn–131

Sham orbitals, including the non-local correlation (Enl) term, if present. The ∆Edh is non-zero only132

for double-hybrid functionals and contains the scaled MP2-like correlation. The dispersion term for the133

functionals in this study, including B2PLYP-D3(BJ), depends only parametrically on the functional,134

i.e. it is not affected by basis set size, and therefore shouldn’t be further extrapolated. For MP2, the135

components in the above equation are simply equivalent to the H-F energy, the MP2 correlation energy,136

and zero, respectively.137

3.1 Hartree-Fock extrapolation revisited138

First I focus on the global α = 1.63 value determined for the exponential extrapolation (Eq. (2)) by139

Halkier et al., which was fitted using a set of 9 closed-shell diatomics. [5] While the H-F energy converges140

exponentially with basis set size, [48,49] this is a good point to check whether a power (Eq. (1)) or an141

expsqrt function (Eq. (3)) performs better than the exponential function when used with smaller basis142

sets. For this purpose I used an updated database of H-F energies of 70 diatomic species, obtained143

from numerical basis set calculations, which has been recently published. [44] This dataset includes many144

charged species, open shell diatomics, as well as species with elements from the third row of the periodic145

table. Due to the difficulty in convergence of 3VO− and 1CrMn+, these two molecules are excluded in146

the current study, forming a dataset of 68 species.147

Fig. 1 presents the αs obtained from fits of the logarithms of Eqs. (1–3) to cc-pvXz-pp data (X ∈148

[d, t, q, 5]). Species, for which the total energy doesn’t converge smoothly, or for which the basis set149

contains an ECP, are excluded from this analysis. The average values are α = 1.35±0.24 for exponential,150

α = 4.36± 0.75 for power, and α = 4.91± 0.86 for expsqrt functions, with n = 57. The median values151

(α) are close to the means. As the scatter of the individual αs for the three extrapolation functions is152

very similar, with relative σα of around 18%, the following discussion is focused only on the exponential153

extrapolation function.154

While the α (as well as α) obtained here is significantly lower than the α = 1.63 proposed by Halkier155

7

0.8 1.0 1.2 1.4 1.6

AeX

3 4 5

AX

3 4 5 6Ae

X

Figure 1: Values of α for various extrapolation functions obtained for the set of 68 diatomics, for thecc-pvXz-pp basis set family. The means (α, N) and medians (α, |) also shown. Colours correspond tocationic (red), neutral (green), or anionic (blue) species.

et al., [5] the distribution of charged species indicated by the colour-coding in Fig. 1 shows that anionic156

species converge to the basis set limit slower than neutral or cationic species. This has also been shown157

by Varandas, who highlighted that diffusely augmented basis sets increase the speed of convergence of158

anions, but that it is hard to predict which basis set will yield the best relative energies. [48] The effect159

of different combinations of charges and multiplicities in this dataset is shown as a matrix in Fig. 2.160

There is no correlation of α with multiplicity, but there is a significant correlation with charge. For161

all three charge classes, the relative σα is significantly smaller than the 18% for the overall dataset.162

Therefore, at least for the cc-pvXz-pp basis sets, it may be appropriate to use different α values for163

cationic (1.59), neutral (1.45), or anionic (1.08) species. As this charge-dependent effect is likely to164

be most pronounced in electron affinities and ionisation potentials, the G21EA and G21IP subsets of165

GMTKN55 can be used to validate this charge-dependent approach. In Table 2 I compare the [2,3]–ζ166

and [3,4]–ζ exponential extrapolations using a charge-independent α = 1.35 to an extrapolation using167

the charge-dependent values above. For the electron affinities, the WTMADs of the charge dependent168

extrapolation are lower than for the global approach; for the ionisation potentials the charge dependent169

approach is only beneficial at the [3,4]–ζ level.170

The α values for exponential extrapolation with the four basis set families investigated in this study171

are presented in Table 3. The pcseg-N basis set family (where X ≈ N+1 with N ∈ [0−4]) shows trends172

similar to the cc-pvXz-pp basis sets, but the magnitude of the variations in α results between the charge173

8

Table 2: Comparison of WTMAD values for databases of elecron affinities (G21EA) and ionisationpotentials (G21IP) with exponential global and charge-dependent extrapolation of Hartee-Fock resultsusing cc-pvXz-pp basis sets, with cc-pv5z-pp as reference.

GMTKN55 subsetWTMAD [kJ/mol]

Global α Charge dep. α[2,3]–ζ [3,4]–ζ [2,3]–ζ [3,4]–ζ

G21EA 25.331 9.188 23.192 8.919G21IP 53.815 0.755 53.861 0.616

¹AB

¹AB

²AB

²AB

³AB

³AB

all

all

Multiplicity

AB AB

AB AB

AB + AB +

all all

Char

ge

= 1.13 = 0.13

n = 9

= 1.25 = 0.05

n = 2

= 1.02 = 0.12

n = 8

= 1.08 = 0.15

n = 19

= 1.44 = 0.14

n = 19

= 1.44 = 0.12

n = 6

= 1.46 = 0.11

n = 7

= 1.45 = 0.13

n = 32

= 1.60 = 0.06

n = 5

= 1.57

n = 1 n = 0

= 1.59 = 0.06

n = 6

= 1.37 = 0.22

n = 33

= 1.40 = 0.14

n = 9

= 1.17 = 0.26

n = 15

= 1.35 = 0.24

n = 57

Figure 2: Matrix of values of α, σα, and n for all combinations of charges and multiplicities in thedataset of diatomic molecules, for the cc-pvXz-pp basis set family.

classes is significantly smaller. The overall α = 1.26 is comparable to the results from cc-pvXz-pp basis174

sets. However, the distribution of the α values is almost bimodal as shown in Fig. 3, so the use of a single175

α may still be problematic. The Karlsruhe families converge much faster, with α being almost double176

that of pcseg-N and cc-pvXz-pp. This could also be a consequence of fitting to [2,3,4]–ζ data. The177

inclusion of diffuse functions (def2-Xzvp→ def2-Xzvpd) does not eliminate the scatter in the results for178

anions, but merely shifts the distribution to a higher mean. The extrapolations with power and expsqrt179

functions behave similarly.180

Another approach to H-F extrapolation was proposed recently by Varandas. [48] In summary, the181

method amounts to anchoring an exponential function (Eq. (2)), applicable for all basis set families and182

chemical systems with a universal α = 2.284, by adjusting A to ensure that the fitting function passes183

9

Table 3: Comparison of α± σα values for charge classes and basis set families using exponential extrap-olation.

Basis set familyCharge group

AB+ AB AB− Allcc-pvXz-pp 1.59+/-0.06 1.45+/-0.13 1.08+/-0.15 1.35+/-0.24pcseg-N 1.57+/-0.08 1.26+/-0.19 1.21+/-0.19 1.26+/-0.19def2-Xvp 2.58+/-0.11 2.65+/-0.32 2.11+/-0.38 2.58+/-0.41def2-Xvpd 2.58+/-0.11 2.71+/-0.12 2.59+/-0.61 2.70+/-0.37

0.5 1.0 1.5 2.0 2.5 3.0 3.5cc-pvXz-pp n = 57

pcseg-N n = 43

def2-Xzvp n = 66

0.5 1.0 1.5 2.0 2.5 3.0 3.5def2-Xzvpd n = 65

Figure 3: Values of α for the set of 68 diatomics shown for all studied basis set families. Colourscorrespond to cationic (red), neutral (green), or anionic (blue) species.

through the Hartree–Fock complete basis limit as well as the energy obtained with a STO-2G basis. The184

basis set cardinalities (X) are then renormalised to hierarchical numbers (x), with x = 1 for a STO-2G185

basis set. For the cc-pvXz basis sets, the 〈x〉 values obtained from averaging over a set of 18 neutral,186

closed shell species are 3.02, 3.64, 4.28, and 5.14 for X ∈ [d, t, q, 5], respectively. [48] Unfortunately,187

neither Jensen nor Karlsruhe basis sets were investigated, and the regression was performed with [6,7]–ζ188

extrapolated data as reference values instead of numerical basis set limits. Here, I have re-fitted the189

hierarchical numbers x using the dataset of the 68 diatomic species, by first calculating A for each190

diatomic using the STO-2G result, and then calculating x for each larger basis set, and finally averaging191

and obtaining 〈x〉 over the whole dataset. The 〈x〉s are 3.59±0.54, 4.08±0.53, 4.58±0.58, and 5.26±0.64192

for the cc-pvXz-pp basis sets, which is significantly higher and with a significantly wider σx than the193

results of Varandas. This is likely due to the inclusion of anionic species in the fitting process.194

The performance of the discussed H-F extrapolation methods is presented in Table 4, using cc-195

pv[d,t]z-pp data. In the diatomics dataset, the extrapolated values are compared to and normalised by196

the reference EH−F∞ values obtained from Lehtola, [44] with each species in the dataset weighted equally,197

10

Table 4: Performance of various H-F extrapolation methods in the diatomics dataset and the diet100subset of GMTKN55, using cc-pv[d,t]z-pp data.

Extrapolation method Diatomics MRE diet100 WTMADFunction α [Xd, Xt] Ref. % kJ/mol

exponential 1.630 [2.00, 3.00] [5] 0.0061 12.21exponential 1.450 [2.00, 3.00] [5] 0.0060 12.07power 3.400 [2.00, 3.00] [4] 0.0016 5.76exponential 2.284 [3.02, 3.64] [48] 0.0062 12.41exponential 1.349 [2.00, 3.00] c.w. 0.0060 11.95power 4.361 [2.00, 3.00] c.w. 0.0034 7.29expsquare 4.912 [2.00, 3.00] c.w. 0.0026 6.18exponential 2.284 [3.59, 4.08] c.w. 0.0059 12.07

obtaining a mean relative error (MRE). For the diet100 subset of the GMTKN55 database, the respective198

cc-pv5z-pp values are taken as the reference for each of the 100 reactions, as the goal of this benchmark is199

to estimate the performance of the extrapolation method, not H-F theory as such. The best performing200

scheme overall is Truhlar’s power extrapolation with α = 3.4 (see Eq. (1)), which was developed to match201

estimates of the H-F limit for Ne, HF, and H2O from [2,3]–ζ basis data. [4] The exponential function202

performs significantly worse, whether a global α = 1.63 or [2,3]–ζ specific α = 1.45 is applied. [5] Fitting203

extrapolation constants using a larger dataset, or using the approach of Varandas, [48] fails to improve204

the performance in [2,3]–ζ extrapolation. From the four global fits to the 68 diatomics performed in this205

study, the best function for extrapolating cc-pv[d,t]z-pp data is the expsqrt function (see Eq. (3)).206

3.2 Extrapolation of functional energy in DFAs207

The H-F results show it is not necessary to use the large dataset of 68 diatomics, and that a well-208

performing extrapolation formula can be obtained from fitting to basis set limit estimates of as little as209

3 species, provided it is tailored for a [2,3]–ζ extrapolation. To determine the universality of α values in210

DFT, I have calculated the numerical basis set limit energies for the smaller set of 9 diatomics used by211

Halkier et al. [5] The basis set limits are then used to obtain α for each functional using the cc-pvXz-pp212

basis sets, analogously to the above work for H-F. In addition to a “global” extrapolation using all213

available basis sets within the family, I have also included a “[d,t]” extrapolation using only results from214

[2,3]–ζ basis sets. Only the functional (Efctl) component of the energy is extrapolated. For the B97M-V215

functional, where Efctl contains the non-local correlation (Enl) term, this term is subtracted from the216

11

Efctl value to be directly comparable to the HelFEM results.217

0.5

0.5

1.0

1.0

1.5

1.5

2.0

2.0

M06L-D3M052X-D3

B97M-VdlDF+D10

SCAN-D3(BJ)B97-D3(BJ)PBE-D3(BJ)

BLYP-D3(BJ)revPBE-D3(BJ)B3LYP-D3(BJ)

H-F

AeX

+/-

0.66+/-0.220.78+/-0.081.10+/-0.271.05+/-0.141.23+/-0.251.31+/-0.211.32+/-0.231.33+/-0.231.31+/-0.341.41+/-0.191.66+/-0.09

Figure 4: Box plots of α of the studied DFAs with cc-pvXz-pp basis sets, for the reference dataset of9 closed-shell diatomics of Halkier et al. The symbols are means (N) and medians (|). The medians ofeach DFA are also listed on the right y-axis.

Box plots of the αs for each studied DFA are shown in Fig. 4, using an exponential extrapolation218

with the cc-pvXz-pp family of basis sets. The H-F result of α = 1.61 (α = 1.66) agrees well with the219

reported data of Halkier et al. obtained with the same dataset, [5] validating the current methodology.220

The two Minnesota DFAs are clear outliers, with α below 1.0. This can be partially attributed to the221

non-negligible amount of dispersion trained into both M06L and M052X, especially as the dispersionless222

dlDF, which uses a functional form similar to M052X, is much closer to the overall mean. The overall223

averages (〈α〉) over all but Minnesota DFAs are 1.26± 0.11 for exponential, 4.06± 0.34 for power, and224

4.58 ± 0.40 for expsqrt extrapolation functions in a “global” extrapolation; for a “[d,t]” extrapolation225

these values decrease to 1.26 ± 0.07, 3.12 ± 0.17, and 3.97 ± 0.22, respectively. Results for other basis226

set families are in the Supplemental material.227

The effect of increasing amount of (Hatree-)Fock exchange (Fx) in the DFA on α is shown in Fig. 5,228

for a series of PBE-related functionals. With the exception of H-F which contains no correlation, only229

the percentages of Fock and PBE exchanges are modified, and the PBE correlation is always fully230

included. There is a clear trend of increasing α and α with increasing proportion of Fock exchange.231

The σα decreases significantly with a higher percentage of Fx. A qualitatively similar trend can be seen232

between the BLYP and B3LYP data, and M06L and M052X data in Figure 4. The four GGA functionals233

investigated in this study (PBE, revPBE, BLYP, and B97-D) have a very similar α ≈ 1.28, and the234

increase in α between B3LYP (20% Fx) and BLYP, ∆α = 0.08, is comparable to the difference between235

12

PBE0 (25% Fx) and PBE for which ∆α = 0.09. From this data, I propose an exchange-dependent236

formula to obtain α for extrapolating the functional energy with Eq. (2): α = α0 + ∆α× f(Fx), where237

f(Fx) is a linear function. Two sigmoidal functions (error and logistic) were also tested, but they do238

not improve the performance. Similar formulas can be derived for the power and expsqrt functions, and239

for the “[d,t]” extrapolation. For the Dunning and Karlsruhe basis set families, ∆α is positive, while240

for the pcseg-N family, inclusion of Fx leads to a reduction in α. See the Supplemental material for a241

complete overview.242

1.0

1.0

1.2

1.2

1.4

1.4

1.6

1.6

1.8

1.8

2.0

2.0

0.0

25.033.0

50.066.075.0

100.0

PBE

with

% F

ock

exch

ange

PBE

PBE0PBE0-1/3

PBE50

H-F

Figure 5: The relationship between α and % of Fock exchange as part of total exchange in PBE. Notethat 100% Fock exchange here corresponds to pure H-F. The means (N) and medians (|) are also shown.Calculated with the cc-pvXz-pp basis set family.

3.3 Extrapolation of the correlation energy in double hybrid DFAs243

As previously mentioned in the introduction, the convergence behaviour of the correlation energy in244

WFT is a much more studied phenomenon than the convergence of the functional energy, with Eq. (1)245

being the most widely used formula. The available literature on MP2 extrapolation is too large to review246

here; for a recent thorough investigation of MP2 convergence with various basis sets, I refer the reader247

to the recent work of Kirschner et al. [50]248

As double hybrid DFAs contain an MP2 component, one would expect their convergence behaviour249

to be similar to that of MP2. However, in DFAs the correlation energy is evaluated using the Kohn–250

Sham orbitals as opposed to the Hartree–Fock orbitals in MP2, so directly applying the same α may not251

be ideal. This raises the issue of obtaining reasonable estimates of basis set limits of the double-hybrid252

13

correlation (∆Edh∞ ). Numerical estimates of the MP2 limit (∆EMP2

∞ ) are not as common as for the H-F253

limit. [44] Fortunately, Ranasinghe et al. [18] have collated accurate literature values of the MP2 core-core254

and core-valence contributions from finite element calculations of atoms, [51–54] and explicitly correlated255

calculations of molecules, [55] and proposed a simple linear extrapolation method to obtain near-reference256

results from [3,4,5]–ζ calculations (Eq. (5)).257

∆EMP2∞ ≈ ∆EMP2

5ZaPa−CV + λ× (∆EMP24ZaPa−CV −∆EMP2

3ZaPa−CV), λ = 0.4307 (5)

To prepare a set of ∆Edh∞ data for each of the four functionals studied, I have collated the total MP2258

correlation energies from the same dataset (C, N, O, F, Ne, Ar, F−, Mg, Be, CH2, HF, NH3, F2, N2259

and H2O). The valence-only MP2 correlation is then compared to the extrapolated values using frozen260

core calculations and the above extrapolation approach. [18] The root mean square deviation (RMSD)261

of the two sets is 17.7 mEh, which is higher than the RMSD for core-core + core-valence correlation262

(0.5 mEh), but sufficiently accurate for extrapolation purposes. For comparison, a two point cubic263

power extrapolation from [4,5]ZaPa-CV data has an RMSD of 18.1 mEh. Therefore, the estimated264

∆Edh∞ values were calculated using Eq. (5) (see Supplemental information for the raw data).265

2.2

2.2

2.4

2.4

2.6

2.6

2.8

2.8

B2PLYP-D3(BJ)

DSD-BLYP-D3(BJ)

PBE0DH-D3(BJ)

PWPB95-NL

MP2

AX

+/-

2.59+/-0.16

2.55+/-0.14

2.58+/-0.16

2.48+/-0.15

2.53+/-0.12

Figure 6: Box plots of α for the studied double-hybrid DFAs and MP2 with cc-pvXz-pp basis sets, forthe reference dataset of atoms and small molecules of Ranasinghe et al. The symbols are means (N)and medians (|). The medians of each DFA are also listed on the right y-axis.

The four double-hybrid DFAs and MP2 have a very similar α, as shown in Fig. 6. The convergence266

of MP2, with a global αMP2 = 2.53 is significantly slower than the ideal cubic scaling. For the Dunning267

and Karlsruhe basis set families, the “[d,t]” fits of αdh are ∼ 0.3 lower than the global fits; the Jensen268

14

basis sets are again an outlier for which the global fit is smaller. The “[d,t]” αMP2 values for Ne, HF,269

H2O, and N2 are 2.18, 2.21, 2.37, and 2.28, which is in a good agreement with Truhlar’s αMP2 = 2.2 for270

the same species and basis sets, [4] validating the current method.271

3.4 Performance of [2,3]-ζ extrapolation in GMTKN55272

To significantly reduce the number of calculations required for the benchmarking below, I resort to the273

use of the diet100 subset [45] of the GMTKN55 dataset. [19] The diet100 subset has been validated for274

all of the functionals studied by Goerigk et al., [19] yielding good quantitative agreement between the275

diet100 and full GMTKN55 WTMADs. Here, I re-validate the use of diet100 as a proxy for GMTKN55276

by considering the PBE-D3(BJ) functional and the def2-Xzvpd basis set family. The results presented277

in Table 5 show the discrepancy between the two benchmarks is small, with the largest deviation of278

3 kJ/mol occuring in the def2-qzvpd data. An agreement of the two sets to within 0.2 kJ/mol in the279

extrapolated data validates the use of diet100 subset for this purpose.280

Table 5: Comparison of WTMAD values of the diet100 subset and the full GMTKN55 database for thePBE-D3(BJ) functional, def2-Xzvpd basis sets, and selected extrapolation methods.

Basisdiet100 GMTKN55

WTMAD [kJ/mol]def2-svpd 64.3 62.8def2-tzvpd 43.5 43.6def2-qzvpd 42.6 45.6[s, t]zvpd ORCA [8] 43.1 43.2[s, t]zvpd expsqrt 42.5 42.7

As comparing the extrapolation results of all methods with all studied functionals for all four basis281

set families would be tedious, an overview of the performance of 13 extrapolation methods with the282

cc-pvXz-pp basis sets and a smaller set of functionals is shown in Table 6. Results for other basis283

set families are included in Supplemental material. The extrapolated results are listed along non-284

extrapolated values calculated with 3– and 4–ζ basis sets. A well performing [2,3]–ζ extrapolation285

should approach 4–ζ results. The exponential function performs rather poorly in DFT extrapolations,286

with the extrapolated BLYP-D3(BJ) and B3LYP-D3(BJ) results barely beating 3–ζ data, while for the287

double-hybrid B2PLYP-D3(BJ) it performs even worse. Scaling α based on Fx also shows no benefit,288

15

whether f(Fx) is a linear or sigmoidal function, and will not be considered further. Unsurprisingly, the289

[d,t] fits perform better than global fits. Truhlar’s extrapolation formula for MP2 performs surprisingly290

well for DFT, outperforming the [d,t] defaults used in ORCA, [8] obtaining values within 1.5 kJ/mol of291

4–ζ results, despite the small training set (Ne, HF, H2O) used in its development. The [d,t] variants292

of the power and expsqrt functions are the best perfomers, with results within 0.3 kJ/mol of the 4–ζ293

results in BLYP and B3LYP. The power formula is more convenient, as it uses the same extrapolation294

function for both Efctl and ∆Edh with different αs. Similar trends are present for the other three basis set295

families. I should note the improvement between cc-pvtz-pp and cc-pvqz-pp is on the order of 10 kJ/mol296

in WTMAD, while for the Karlsruhe and Jensen basis sets it is well below 5 kJ/mol. For these families,297

the global expsqrt fit performs slightly better than all other methods, with the extrapolated WTMAD298

generally halfway between 3– and 4–ζ results.299

Table 6: Comparison of WTMAD values in the diet100 subset of the GMTKN55 database for theproposed extrapolation methods for representatives of three classes of DFAs: a GGA (BLYP-D3(BJ)),single-hybrid (B3LYP-D3(BJ)), and double-hybrid (B2PLYP-D3(BJ)).

Basis Function α αdh diet100 WTMAD [kJ/mol]BLYP-D3(BJ) B3LYP-D3(BJ) B2PLYP-D3(BJ)

cc-pvtz-pp – – – 56.0 41.3 28.7cc-pv[d,t]z-pp power [d,t] 3.400 [4] 2.200 [4] 47.0 33.4 23.7cc-pv[d,t]z-pp expsqrt [d,t] 4.420 [8] 2.400 [8] 47.3 33.7 23.5cc-pv[d,t]z-pp exponential global 1.258 2.550 55.3 40.7 30.0cc-pv[d,t]z-pp exponential [d,t] 1.263 2.257 55.3 40.7 30.4cc-pv[d,t]z-pp power global 4.062 2.550 49.4 35.6 25.1cc-pv[d,t]z-pp power [d,t] 3.115 2.257 45.7 32.3 22.5cc-pv[d,t]z-pp expsqrt global 4.577 2.550 47.8 34.1 23.7cc-pv[d,t]z-pp expsqrt [d,t] 3.974 2.257 45.7 32.3 22.5cc-pv[d,t]z-pp exponential global 1.322 + 0.373× f(Fx) 2.550 55.5 40.9 30.3cc-pv[d,t]z-pp exponential [d,t] 1.296 + 0.095× f(Fx) 2.257 55.4 40.8 30.5cc-pv[d,t]z-pp power global 4.254 + 1.160× f(Fx) 2.550 50.0 36.6 26.8cc-pv[d,t]z-pp power [d,t] 3.195 + 0.233× f(Fx) 2.257 46.1 32.8 23.2cc-pv[d,t]z-pp expsqrt global 4.802 + 1.334× f(Fx) 2.550 48.4 35.3 25.7cc-pv[d,t]z-pp expsqrt [d,t] 4.076 + 0.298× f(Fx) 2.257 46.1 32.8 23.2cc-pvqz-pp – – – 45.5 32.0 21.0

The performance of selected functionals combined with the cc-pvXz-pp basis sets in the diet100 sub-300

set of GMTKN55 is shown in Fig. 7. The figure contains results obtained using the power [d,t] extrap-301

olation (•) and global expsqrt extrapolation () in addition to all cc-pvXz-pp results. The dlDF+D10302

functional is not included as its performance in the diet100 benchmark is very poor: the dispersion cor-303

rection in this functional is only defined for multi-fragment species. In most cases, the [d,t] extrapolation304

16

0

0

10

10

20

20

30

30

40

40

50

50

60

60

WTMAD / kJ/mol

PBE-D3(BJ)BLYP-D3(BJ)

M06L-D3B97-D3(BJ)

revPBE-D3(BJ)SCAN-D3(BJ)B3LYP-D3(BJ)

PBE0DH-D3(BJ)B97M-V

M052X-D3B2PLYP-D3(BJ)

PWPB95-NLB97X-V

DSD-BLYP-D3(BJ) 5- 2-

5- 2- 49.345.745.341.341.539.332.329.029.826.922.522.922.121.7

WTM

AD [d

,t] /

kJ/m

ol

Figure 7: The WTMAD values for the diet100 dataset obtained with the listed DFAs using cc-pvXz-ppbasis sets (X ∈ [d, t, q, 5], green bars). The extrapolated cc-pv[d,t]z-pp results (• for [d,t] power, forglobal expqsrt) are also included, with the WTMAD shown on the right y-axis.

brings performance in line with cc-pvqz-pp results, the notable exceptions are the Minnesota functionals305

(M06L-D3 and M052X-D3) due to their different scaling of the functional energy (see Fig. 4), and some306

of the double hybrids. In all cases, the improvement achieved by cc-pv[dt]z-pp extrapolation over pure307

cc-pvtz-pp results is significant.308

0

0

10

10

20

20

30

30

40

40

50

50

60

60

WTMAD / kJ/mol

PBE-D3(BJ)BLYP-D3(BJ)

M06L-D3B97-D3(BJ)

revPBE-D3(BJ)SCAN-D3(BJ)B3LYP-D3(BJ)

PBE0DH-D3(BJ)B97M-V

M052X-D3B2PLYP-D3(BJ)

PWPB95-NLB97X-V

DSD-BLYP-D3(BJ) 4- 2-

4- 2- 48.243.841.140.440.238.531.329.228.122.922.321.820.822.1

WTM

AD [d

,t] /

kJ/m

ol

0

0

10

10

20

20

30

30

40

40

50

50

60

60

WTMAD / kJ/mol

PBE-D3(BJ)BLYP-D3(BJ)

M06L-D3B97-D3(BJ)

revPBE-D3(BJ)SCAN-D3(BJ)B3LYP-D3(BJ)

PBE0DH-D3(BJ)B97M-V

M052X-D3B2PLYP-D3(BJ)

PWPB95-NLB97X-V

DSD-BLYP-D3(BJ) 4- 2-

4- 2- 42.539.239.136.935.334.826.923.724.319.417.318.916.515.2

WTM

AD [d

,t] /

kJ/m

ol

Figure 8: The WTMAD values for the diet100 dataset with def2-Xzvp (left) and def2-Xzvpd (right).Legend as in Fig. 7.

Analogous figures for the def2-Xzvp and def2-Xzvpd basis sets are shown in Fig. 8, the pcseg-N re-309

sults are included in the Supplemental information archive. In the def2-Xzvp and pcseg-N series, neither310

of the extrapolation methods is particularly effective. Especially the DSD-BLYP-D3(BJ) and B2PLYP-311

D3(BJ) results are troubling: the double hybrid performance can be slightly improved (1.5 kJ/mol312

17

reduction in WTMAD) by increasing αdh from 2.382 to 3.0, but even then the extrapolation barely313

outperforms def2-tzvp results. Extrapolation of def2-[s,t]zvp or pcseg-[1,2] results therefore cannot be314

recommended. Upon diffuse augmentation, the difference between 3–ζ and 4–ζ results shrinks further.315

However, this time the global expsqrt extrapolation consistently improves upon def2-tzvpd results, with316

the exception of PWPB95-NL which remains near def2-tzvpd performance.317

3.5 Automated [2,3]–ζ extrapolation of non-covalent interaction energies318

While the GMTKN55 database contains several subsets of non-covalent interaction energies, I have319

decided to use an independent dataset to check the performance for non-covalent energies and also320

gather timing data in an automated fashion. For this purpose, I have modified the cbs() routine of321

Psi4 to accept DFAs, decompose them into fctl, dh, and disp stages consistent with Eq. (4), and allow322

independent extrapolation of these components.323

Table 7: Comparison of RMSD values in the NCDT dataset of non-covalent energies, with cc-pvXz-ppbasis sets. Extrapolation with [d,t] power function, α = 3.115, αdh = 2.257. Timing is the ratio of totalexecution times of cc-pv[dt]z-pp versus cc-pvqz-pp.

FunctionalNCDT RMSD [kJ/mol]

Timingcc-pvtz-pp cc-pvqz-pp cc-pv[dt]z-pp

BLYP-D3(BJ) 7.01 4.84 4.81 83.8%PBE-D3(BJ) 8.86 6.73 6.78 81.3%revPBE-D3(BJ) 5.92 4.44 4.56 81.5%B97-D3(BJ) 5.61 4.14 4.17 83.2%SCAN-D3(BJ) 9.06 7.53 7.62 67.0%M06L-D3 4.29 3.18 3.54 66.1%B97M-V 5.36 3.48 4.07 138.7%B3LYP-D3(BJ) 6.96 4.87 4.81 81.5%M052X-D3 6.02 4.30 4.33 46.6%dlDF+D10 8.28 9.42 9.28 66.1%ωB97X-V 5.03 2.84 3.21 152.4%B2PLYP-D3(BJ) 5.13 2.96 3.60 81.9%DSD-BLYP-D3(BJ) 5.31 3.31 4.31 88.5%PBE0DH-D3(BJ) 7.20 5.60 5.84 94.5%PWPB95-NL 4.05 2.00 2.56 151.1%

Table 7 presents the results for cc-pvXz-pp basis sets, with the cc-pv[d,t]z-pp extrapolation performed324

using the [d,t] power formula from Table 6. For most GGAs, meta-GGAs, and single hybrids, the325

RMSD of the extrapolated results is almost identical to the RMSD of cc-pvqz-pp data at ∼ 80% of the326

18

computational cost. The exceptions are: i) extrapolated M06L-D3, which would benefit from a lower α327

(see Fig. 4); ii) dlDF+D10, where cc-pvtz-pp performs better than both cc-pvqz-pp and cc-pv[dt]z-pp;328

and iii) extrapolated B97M-V and ωB97X-V, which are 50% slower than cc-pvqz-pp due to the cost of329

evaluation of the VV10 contribution as part of the self-consistent cycle. This issue also affects PWPB95-330

NL. However, the Enl contribution can be evaluated as an additive correction at no loss to accuracy. [56]331

Further speedups may be possible, if for example the converged wavefunction at the smaller basis set was332

used as an initial guess for the larger basis set. The improvement of the [d,t] extrapolation over cc-pvtz-333

pp results in double hybrids is generally worse than for the other DFAs, and can be only recommended for334

larger systems, where the cost of cc-pvqz-pp calculations would be prohibitively expensive: the largest335

system in the NCDT dataset is DMS–SO2, for which the computational time of B2PLYP-D3(BJ)/cc-336

pv[dt]z-pp is 56% of B2PLYP-D3(BJ)/cc-pvqz-pp.337

The performance of def2-[s,t]zvp extrapolation (global expsqrt fit) for the NCDT dataset is in line338

with the GMTKN55 results above: the RMSDs of def2-[s,t]zvp are not worse than for def2-tzvp, but def2-339

qzvp performance is not achieved. In any case, the use of large basis sets without diffuse augmentation340

is questionable when accurate non-covalent interaction energies are required. [57] For the same reason,341

I will omit the analysis of pcseg-N with the NCDT dataset; the results for both basis set families are342

tabulated in the Supplemental material. Upon augmentation with diffuse functions, the performance343

of def2-[s,t]zvpd extrapolation (also global expsqrt fit) for non-covalent energies is a positive surprise:344

the RMSDs of def2-qzvpd are matched, if not outperformed, by the extrapolated results (see Table 8).345

dlDF+D10 is again an outlier, as its performance decreases when going from 3–ζ to a 4–ζ basis set.346

3.6 Note on non-local correlation correction347

As the -NL (or -V) non-local corrections are included in the Efctl component, they are extrapolated348

using the same formula and α as the rest of the functional energy. Results for B97M-V presented in349

Fig. 4 show a comparably large spread and a lower α than all other non-Minnesota functionals. The350

Enl term seems to scale almost linearly in the cc-pvXz-pp basis set family, regardless of the integration351

grid used. The DFAs including the VV10 term are some of the best performing functionals, but they do352

not seem to benefit from basis set extrapolation nearly as much as other functionals. The most obvious353

19

Table 8: Comparison of RMSD values in the NCDT dataset of non-covalent energies, with def2-Xzvpdbasis sets. Extrapolation with a global expsqrt function, α = 7.886, αdh = 2.267. Timing is the ratio oftotal execution times of def2-[s,t]zvpd versus def2-qzvpd.

FunctionalNCDT RMSD [kJ/mol]

Timingdef2-tzvpd def2-qzvpd def2-[st]zvpd

BLYP-D3(BJ) 4.09 4.04 3.98 75.8%PBE-D3(BJ) 5.15 5.08 4.89 78.0%revPBE-D3(BJ) 4.30 4.29 4.22 80.6%B97-D3(BJ) 3.91 3.90 3.82 77.8%SCAN-D3(BJ) 6.57 6.45 6.32 62.4%M06L-D3 2.86 2.95 2.75 61.4%B97M-V 2.71 2.68 2.52 132.5%B3LYP-D3(BJ) 3.74 3.66 3.52 76.5%M052X-D3 3.61 3.46 3.39 66.2%dlDF+D10 9.88 10.16 10.08 65.6%ωB97X-V 1.37 1.16 1.20 150.7%B2PLYP-D3(BJ) 1.82 1.85 1.53 77.1%DSD-BLYP-D3(BJ) 2.41 2.48 1.90 77.6%PBE0DH-D3(BJ) 4.65 4.66 4.30 77.5%PWPB95-NL 1.40 1.28 1.38 136.7%

case is in the NCDT results of PWPB95-NL and ωB97X-V with def2-Xzvpd basis sets. It may well be354

better to scale the Enl term separately from Efctl, however, further research in this area is necessary.355

4 Conclusion356

There is no universal recipe for basis set extrapolation in density functional theory. In selected cases, such357

as for cc-pvXz-pp or def2-Xzvpd basis set families, a [2,3]–ζ extrapolation is a cost-efficient approach of358

obtaining results of nearly 4–ζ quality, and I can recommend its use for cases where a 4–ζ calculation is359

prohibitively expensive. However, the recipes are specific to each basis set family, with the cc-pvXz-pp360

sets performing better with a power function fitted to [d,t] data with α = 3.115, and the def2-Xzvpd sets361

with an exponential–square-root function with α = 7.886. I cannot recommend extrapolating pcseg-N362

or def2-Xzvp data.363

The MP2-like correction present in double hybrid density functionals can be extrapolated using a364

power function with αdh = 2.257 for cc-pvXz-pp, and 2.267 for def2-Xzvpd. Despite the use of different365

orbitals, the ∆Edh scales comparably to ∆EMP2. A keen reader will notice the similarity of these results366

20

with Truhlar’s proposed MP2 scaling of 2.2. [4]367

The optimal extrapolation method for Hartree-Fock in [2,3]–ζ basis sets is not exponential, despite368

the exponential convergence of Hartree-Fock towards the complete basis set limit. The power function369

of Truhlar [4] tailored for cc-pv[d,t]z extrapolations with an α = 3.4 remains the best performer given this370

set of requirements. The ideal global α for extrapolation of Hartree-Fock energies of diatomic molecules371

correlates with system charge, but is independent of system multiplicity.372

The development of optimal extrapolation parameters for other basis set families can be considerably373

accelerated by fitting to the basis set limit data, calculated for several functionals in this work. I have374

shown that the dataset of 9 diatomics used by Halkier et al. [5] is more than sufficient for extrapolation375

purposes. While there is a correlation between the amount of Fock exchange in the density functional376

approximation and the optimal extrapolation α, scaling formulas that take the amount of Fock exchange377

into account do not perform as well as functional-independent αs. The modified cbs() routine in Psi4378

allows for an intuitive and automated use of extrapolated density functional theory in calculations of379

energies, gradients, and Hessians.380

5 Acknowledgements381

I would like to thank Susi Lehtola for his help with HelFEM, Julian Gale for comments and feedback,382

and the Forrest Research Foundation for funding. This work was supported by resources provided by the383

Pawsey Supercomputing Centre (project f97) and the National Computational Infrastructure (project384

f97), with funding from the Australian Government and the Government of Western Australia.385

References386

[1] T. Helgaker, W. Klopper, H. Koch, and J. Noga, “Basis-set convergence of correlated calculations387

on water,” The Journal of Chemical Physics, vol. 106, pp. 9639–9646, June 1997.388

[2] D. Feller, “Application of systematic sequences of wave functions to the water dimer,” The Journal389

of Chemical Physics, vol. 96, pp. 6104–6114, Apr. 1992.390

21

[3] C. E. Warden, D. G. A. Smith, L. A. Burns, U. Bozkaya, and C. D. Sherrill, “Efficient and auto-391

mated computation of accurate molecular geometries using focal-point approximations to large-basis392

coupled-cluster theory,” J. Chem. Phys., vol. 152, p. 124109, Mar. 2020.393

[4] D. G. Truhlar, “Basis-set extrapolation,” Chemical Physics Letters, vol. 294, pp. 45–48, Sept. 1998.394

[5] A. Halkier, T. Helgaker, P. Jørgensen, W. Klopper, and J. Olsen, “Basis-set convergence of the395

energy in molecular Hartree–Fock calculations,” Chemical Physics Letters, vol. 302, pp. 437–446,396

Mar. 1999.397

[6] D. G. A. Smith, L. A. Burns, A. C. Simmonett, R. M. Parrish, M. C. Schieber, R. Galvelis, P. Kraus,398

H. Kruse, R. Di Remigio, A. Alenaizan, A. M. James, S. Lehtola, J. P. Misiewicz, M. Scheurer, R. A.399

Shaw, J. B. Schriber, Y. Xie, Z. L. Glick, D. A. Sirianni, J. S. O’Brien, J. M. Waldrop, A. Kumar,400

E. G. Hohenstein, B. P. Pritchard, B. R. Brooks, H. F. Schaefer, A. Y. Sokolov, K. Patkowski,401

A. E. DePrince, U. Bozkaya, R. A. King, F. A. Evangelista, J. M. Turney, T. D. Crawford, and402

C. D. Sherrill, “Psi4 1.4: Open-source software for high-throughput quantum chemistry,” J. Chem.403

Phys., vol. 152, p. 184108, May 2020.404

[7] J. F. Stanton, J. Gauss, L. Cheng, M. E. Harding, D. A. Matthews, and P. G. Szalay, “CFOUR,405

coupled-cluster techniques for computational chemistry, a quantum-chemical program package. Ver-406

sion 2.1,” July 2019.407

[8] F. Neese, “Software update: The ORCA program system, version 4.0,” WIREs Computational408

Molecular Science, vol. 8, no. 1, p. e1327, 2018.409

[9] A. J. Varandas, “Straightening the hierarchical staircase for basis set extrapolations: A low-cost410

approach to high-accuracy computational chemistry,” Annu. Rev. Phys. Chem., vol. 69, pp. 177–411

203, Apr. 2018.412

[10] K. S. Raymond and R. A. Wheeler, “Compatibility of correlation-consistent basis sets with a hybrid413

Hartree-Fock / density functional method,” Journal of Computational Chemistry, vol. 20, no. 2,414

pp. 207–216, 1999.415

22

[11] F. Jensen, “Polarization consistent basis sets. II. Estimating the Kohn–Sham basis set limit,” The416

Journal of Chemical Physics, vol. 116, pp. 7372–7379, May 2002.417

[12] B. C. Cabral and S. Canuto, “The enthalpy of the O–H bond homolytic dissociation: Basis-set418

extrapolated density functional theory and coupled cluster calculations,” Chemical Physics Letters,419

vol. 406, pp. 300–305, May 2005.420

[13] N. Mardirossian and M. Head-Gordon, “Characterizing and understanding the remarkably slow ba-421

sis set convergence of several Minnesota density functionals for intermolecular interaction energies,”422

J. Chem. Theory Comput., vol. 9, pp. 4453–4461, Oct. 2013.423

[14] Y.-Y. Chuang and S.-M. Chen, “Infinite basis set extrapolation for double hybrid density functional424

theory 1: Effect of applying various extrapolation functions,” J. Comput. Chem., vol. 32, pp. 1671–425

1679, June 2011.426

[15] Y.-Y. Chuang and S.-M. Chen, “Infinite basis set extrapolation for double hybrid density functional427

theory 2: Effect of adding diffuse basis functions,” Jnl Chinese Chemical Soc, vol. 59, pp. 1094–1103,428

Sept. 2012.429

[16] F. Jensen, “How large is the elephant in the density functional theory room?,” J. Phys. Chem. A,430

vol. 121, pp. 6104–6107, Aug. 2017.431

[17] D. Feller and D. A. Dixon, “Density functional theory and the basis set truncation problem with432

correlation consistent basis sets: Elephant in the room or mouse in the closet?,” J. Phys. Chem. A,433

vol. 122, pp. 2598–2603, Mar. 2018.434

[18] D. S. Ranasinghe, M. J. Frisch, and G. A. Petersson, “Core-core and core-valence correlation energy435

atomic and molecular benchmarks for Li through Ar,” The Journal of Chemical Physics, vol. 143,436

p. 214110, Dec. 2015.437

[19] L. Goerigk, A. Hansen, C. Bauer, S. Ehrlich, A. Najibi, and S. Grimme, “A look at the density func-438

tional theory zoo with the advanced GMTKN55 database for general main group thermochemistry,439

kinetics and noncovalent interactions,” Phys. Chem. Chem. Phys., vol. 19, no. 48, pp. 32184–32215,440

2017.441

23

[20] A. D. Becke, “Density-functional exchange-energy approximation with correct asymptotic behav-442

ior,” Phys. Rev. A, vol. 38, pp. 3098–3100, Sept. 1988.443

[21] C. Lee, W. Yang, and R. G. Parr, “Development of the Colle-Salvetti correlation-energy formula444

into a functional of the electron density,” Phys. Rev. B, vol. 37, pp. 785–789, Jan. 1988.445

[22] S. Grimme, S. Ehrlich, and L. Goerigk, “Effect of the damping function in dispersion corrected446

density functional theory,” J. Comput. Chem., vol. 32, pp. 1456–1465, May 2011.447

[23] J. P. Perdew, K. Burke, and M. Ernzerhof, “Generalized gradient approximation made simple,”448

Phys. Rev. Lett., vol. 77, pp. 3865–3868, Oct. 1996.449

[24] Y. Zhang and W. Yang, “Comment on “Generalized gradient approximation made simple”,” Phys.450

Rev. Lett., vol. 80, pp. 890–890, Jan. 1998.451

[25] S. Grimme, “Semiempirical GGA-type density functional constructed with a long-range dispersion452

correction,” Journal of computational chemistry, vol. 27, no. 15, pp. 1787–1799, 2006.453

[26] J. Sun, A. Ruzsinszky, and J. P. Perdew, “Strongly constrained and appropriately normed semilocal454

density functional,” Phys. Rev. Lett., vol. 115, p. 036402, July 2015.455

[27] J. G. Brandenburg, J. E. Bates, J. Sun, and J. P. Perdew, “Benchmark tests of a strongly constrained456

semilocal functional with a long-range dispersion correction,” Phys. Rev. B, vol. 94, p. 115144, Sept.457

2016.458

[28] Y. Zhao and D. G. Truhlar, “A new local density functional for main-group thermochemistry,459

transition metal bonding, thermochemical kinetics, and noncovalent interactions,” The Journal of460

Chemical Physics, vol. 125, p. 194101, Nov. 2006.461

[29] L. Goerigk and S. Grimme, “A thorough benchmark of density functional methods for general main462

group thermochemistry, kinetics, and noncovalent interactions,” Phys. Chem. Chem. Phys., vol. 13,463

no. 14, p. 6670, 2011.464

24

[30] N. Mardirossian and M. Head-Gordon, “ΩB97M-V: A combinatorially optimized, range-separated465

hybrid, meta-GGA density functional with VV10 nonlocal correlation,” The Journal of Chemical466

Physics, vol. 144, p. 214110, June 2016.467

[31] A. D. Becke, “A new mixing of Hartree–Fock and local density-functional theories,” The Journal468

of Chemical Physics, vol. 98, pp. 1372–1377, Jan. 1993.469

[32] P. J. Stephens, F. J. Devlin, C. F. Chabalowski, and M. J. Frisch, “Ab initio calculation of vi-470

brational absorption and circular dichroism spectra using density functional force fields,” J. Phys.471

Chem., vol. 98, pp. 11623–11627, Nov. 1994.472

[33] N. Mardirossian and M. Head-Gordon, “ωB97X-V: A 10-parameter, range-separated hybrid, gener-473

alized gradient approximation density functional with nonlocal correlation, designed by a survival-474

of-the-fittest strategy,” Phys. Chem. Chem. Phys., vol. 16, no. 21, p. 9904, 2014.475

[34] Y. Zhao, N. E. Schultz, and D. G. Truhlar, “Design of density functionals by combining the method476

of constraint satisfaction with parametrization for thermochemistry, thermochemical kinetics, and477

noncovalent interactions,” J. Chem. Theory Comput., vol. 2, pp. 364–382, Mar. 2006.478

[35] K. Pernal, R. Podeszwa, K. Patkowski, and K. Szalewicz, “Dispersionless density functional theory,”479

Phys. Rev. Lett., vol. 103, p. 263201, Dec. 2009.480

[36] R. Podeszwa, K. Pernal, K. Patkowski, and K. Szalewicz, “Extension of the Hartree-Fock plus481

dispersion method by first-order correlation effects,” J. Phys. Chem. Lett., vol. 1, pp. 550–555, Jan.482

2010.483

[37] S. Kozuch and J. M. L. Martin, “Spin-component-scaled double hybrids: An extensive search484

for the best fifth-rung functionals blending DFT and perturbation theory,” J. Comput. Chem.,485

pp. 2327–2344, July 2013.486

[38] S. Grimme, “Semiempirical hybrid density functional with perturbative second-order correlation,”487

Journal of Chemical Physics, vol. 124, no. 3, pp. 034108–034108, 2006.488

25

[39] L. Goerigk and S. Grimme, “Efficient and accurate double-hybrid-meta-GGA density function-489

als—Evaluation with the extended GMTKN30 database for general main group thermochemistry,490

kinetics, and noncovalent interactions,” J. Chem. Theory Comput., vol. 7, pp. 291–309, Feb. 2011.491

[40] W. Hujo and S. Grimme, “Performance of the van der waals density functional VV10 and (hy-492

brid)GGA variants for thermochemistry and noncovalent interactions,” Journal of Chemical Theory493

and Computation, vol. 7, no. 12, pp. 3866–3871, 2011.494

[41] F. Yu, “Spin-component-scaled double-hybrid density functionals with nonlocal van der Waals495

correlations for noncovalent interactions,” J. Chem. Theory Comput., vol. 10, pp. 4400–4407, Oct.496

2014.497

[42] E. Bremond and C. Adamo, “Seeking for parameter-free double-hybrid functionals: The PBE0-DH498

model,” The Journal of Chemical Physics, vol. 135, p. 024106, July 2011.499

[43] D. Bousquet, E. Bremond, J. C. Sancho-Garcıa, I. Ciofini, and C. Adamo, “Non-parametrized500

functionals with empirical dispersion corrections: A happy match?,” Theor Chem Acc, vol. 134,501

p. 1602, Jan. 2015.502

[44] S. Lehtola, “Fully numerical Hartree-Fock and density functional calculations. II. Diatomic503

molecules,” Int J Quantum Chem, vol. 119, Oct. 2019.504

[45] T. Gould, “‘Diet GMTKN55’ offers accelerated benchmarking through a representative subset ap-505

proach,” Phys. Chem. Chem. Phys., vol. 20, no. 44, pp. 27735–27739, 2018.506

[46] P. Kraus, D. A. Obenchain, and I. Frank, “Benchmark-quality semiexperimental structural param-507

eters of van der Waals complexes,” J. Phys. Chem. A, vol. 122, pp. 1077–1087, Feb. 2018.508

[47] P. Kraus, “Non-covalent dimers and trimers (NCDT) database version 2.0,” Feb. 2020.509

[48] A. J. C. Varandas, “CBS extrapolation of Hartree–Fock energy: Pople and Dunning basis sets510

hand-to-hand on the endeavour,” Phys. Chem. Chem. Phys., vol. 21, no. 15, pp. 8022–8034, 2019.511

[49] S. Lehtola, “A review on non-relativistic, fully numerical electronic structure calculations on atoms512

and diatomic molecules,” Int J Quantum Chem, vol. 119, p. e25968, Oct. 2019.513

26

[50] K. N. Kirschner, D. Reith, and W. Heiden, “The performance of Dunning, Jensen, and Karlsruhe514

basis sets on computing relative energies and geometries,” Soft Materials, pp. 1–15, Feb. 2020.515

[51] K. Jankowski, P. Malinowski, and M. Polasik, “Second-order correlation energies for F1-, Na1+,516

Mg2+, and Ar8+: Z dependence of irreducible-pair energies,” Phys. Rev. A, vol. 22, pp. 51–60, July517

1980.518

[52] J. R. Flores, “Hierarchic computation of atomic correlation energies using a p-version finite element519

method,” Chemical Physics Letters, vol. 195, pp. 377–382, July 1992.520

[53] J. R. Flores and P. Redondo, “Accurate second-order correlation energies for Mg and Ar,” Int. J.521

Quantum Chem., vol. 45, no. 6, pp. 563–572, 1993.522

[54] J. R. Flores, “New benchmarks for the second-order correlation energies of Ne and Ar through the523

finite element MP2 method,” Int. J. Quantum Chem., vol. 108, pp. 2172–2177, May 2008.524

[55] J. Noga, P. Valiron, and W. Klopper, “The accuracy of atomization energies from explicitly cor-525

related coupled-cluster calculations,” The Journal of Chemical Physics, vol. 115, pp. 2022–2032,526

Aug. 2001.527

[56] A. Najibi and L. Goerigk, “The nonlocal kernel in van der Waals density functionals as an additive528

correction: An extensive analysis with special emphasis on the B97M-V and ωB97M-V approaches,”529

J. Chem. Theory Comput., vol. 14, pp. 5725–5738, Nov. 2018.530

[57] P. Kraus, D. A. Obenchain, S. Herbers, D. Wachsmuth, I. Frank, and J.-U. Grabow, “Xe· · ·OCS:531

Relatively straightforward?,” Phys. Chem. Chem. Phys., vol. 22, no. 10, pp. 5615–5624, 2020.532

27

Basis set extrapolations for density functional theory

P. Kraus∗

Supplementary Material

Additional supplemental files and archives in computer-readable form are available onZenodo, see DOI: 10.5281/zenodo.3904120. An executable version of this archive is availableon Binder.

∗E-Mail: [email protected]

S1

Table S1: Table of extrapolation coefficients α averaged for all but Minnesota functionals,listed by basis set family, extrapolation function, fit, and component of total energy.

Basis Function Fit Component 〈α〉+ /− σαcc-pvXz-pp exponential global fctl 1.258+/-0.114cc-pvXz-pp power global fctl 4.062+/-0.345cc-pvXz-pp expsqrt global fctl 4.577+/-0.401cc-pvXz-pp exponential [d,t] fctl 1.263+/-0.070cc-pvXz-pp power [d,t] fctl 3.115+/-0.172cc-pvXz-pp expsqrt [d,t] fctl 3.974+/-0.220cc-pvXz-pp power global dh 2.550+/-0.045cc-pvXz-pp power [d,t] dh 2.257+/-0.049def2-Xvp exponential global fctl 2.262+/-0.096def2-Xvp power global fctl 6.542+/-0.254def2-Xvp expsqrt global fctl 7.757+/-0.313def2-Xvp exponential [d,t] fctl 2.874+/-0.053def2-Xvp power [d,t] fctl 7.088+/-0.130def2-Xvp expsqrt [d,t] fctl 9.042+/-0.166def2-Xvp power global dh 2.382+/-0.040def2-Xvp power [d,t] dh 2.096+/-0.058def2-Xvpd exponential global fctl 2.302+/-0.117def2-Xvpd power global fctl 6.666+/-0.314def2-Xvpd expsqrt global fctl 7.886+/-0.385def2-Xvpd exponential [d,t] fctl 2.881+/-0.053def2-Xvpd power [d,t] fctl 7.106+/-0.131def2-Xvpd expsqrt [d,t] fctl 9.065+/-0.168def2-Xvpd power global dh 2.267+/-0.033def2-Xvpd power [d,t] dh 1.996+/-0.046pcseg-N exponential global fctl 1.771+/-0.273pcseg-N power global fctl 4.323+/-0.606pcseg-N expsqrt global fctl 5.743+/-0.845pcseg-N exponential [d,t] fctl 2.023+/-0.129pcseg-N power [d,t] fctl 4.990+/-0.319pcseg-N expsqrt [d,t] fctl 6.366+/-0.407pcseg-N power global dh 1.651+/-0.023pcseg-N power [d,t] dh 1.935+/-0.045

S2

Table S2: Table of Fock exchange dependent extrapolation coefficients α + ∆α, listed bybasis set family, extrapolation function, and fit. f(Fx) is a linear function.

Basis Function Fit α+ ∆α× f(Fx)cc-pvXz-pp exponential global 1.322 + 0.373 f(Fx)cc-pvXz-pp exponential [d,t] 1.296 + 0.095 f(Fx)cc-pvXz-pp power global 4.254 + 1.160 f(Fx)cc-pvXz-pp power [d,t] 3.195 + 0.233 f(Fx)cc-pvXz-pp expsqrt global 4.802 + 1.334 f(Fx)cc-pvXz-pp expsqrt [d,t] 4.076 + 0.298 f(Fx)def2-Xvp exponential global 2.286 + 0.481 f(Fx)def2-Xvp exponential [d,t] 2.847 + 0.564 f(Fx)def2-Xvp power global 6.582 + 1.451 f(Fx)def2-Xvp power [d,t] 7.022 + 1.390 f(Fx)def2-Xvp expsqrt global 7.830 + 1.639 f(Fx)def2-Xvp expsqrt [d,t] 8.958 + 1.773 f(Fx)def2-Xvpd exponential global 2.347 + 0.423 f(Fx)def2-Xvpd exponential [d,t] 2.857 + 0.535 f(Fx)def2-Xvpd power global 6.780 + 1.256 f(Fx)def2-Xvpd power [d,t] 7.045 + 1.320 f(Fx)def2-Xvpd expsqrt global 8.033 + 1.473 f(Fx)def2-Xvpd expsqrt [d,t] 8.988 + 1.683 f(Fx)pcseg-N exponential global 1.961 - 0.232 f(Fx)pcseg-N exponential [d,t] 2.102 - 0.096 f(Fx)pcseg-N power global 4.757 - 0.561 f(Fx)pcseg-N power [d,t] 5.185 - 0.238 f(Fx)pcseg-N expsqrt global 6.339 - 0.750 f(Fx)pcseg-N expsqrt [d,t] 6.615 - 0.303 f(Fx)

S3

Tab

leS3:

Tab

leof

lite

ratu

rean

des

tim

ated

vale

nce

-only

∆E

MP2

∞an

d∆E

dh ∞.

Lin

ear

corr

esp

onds

toth

elinea

rex

trap

olat

ion

usi

ng

Eq.

(5),

cubic

corr

esp

onds

toa

cubic

extr

apol

atio

nusi

ng

Eq.

(1).

Molecule

MP2

DFA

Refere

nce

Linear

Cubic

B2P

LY

P-D

3(B

J)

DS

D-B

LY

P-D

3(B

J)

PB

E0D

H-D

3(B

J)

PW

PB

95-N

LC

O-0

.403

74-0

.357

46-0

.35673

-0.1

1004

-0.1

7529

-0.0

5126

-0.0

8491

F2

-0.6

1140

-0.5

8799

-0.5

8691

-0.1

7934

-0.2

8531

-0.0

8327

-0.1

3293

N-0

.113

77-0

.135

52-0

.13554

-0.0

4214

-0.0

6653

-0.0

1970

-0.0

3630

HF

-0.3

1958

-0.3

0152

-0.3

0091

-0.0

9192

-0.1

4696

-0.0

4266

-0.0

7044

CH

2-0

.155

81-0

.168

84-0

.16845

-0.0

5420

-0.0

8528

-0.0

2538

-0.0

4614

Ar

-0.2

5462

-0.2

5331

-0.2

5274

-0.0

8013

-0.1

2627

-0.0

3744

-0.0

6251

Ne

-0.3

2022

-0.3

1960

-0.3

1939

-0.0

9714

-0.1

5530

-0.0

4505

-0.0

7372

C-0

.081

66-0

.098

28-0

.09836

-0.0

3293

-0.0

5160

-0.0

1549

-0.0

3073

N2

-0.4

2078

-0.3

5799

-0.3

5728

-0.1

0977

-0.1

7513

-0.0

5114

-0.0

8462

H2O

-0.3

0035

-0.2

8028

-0.2

7957

-0.0

8553

-0.1

3688

-0.0

3973

-0.0

6631

F-0

.246

20-0

.246

04-0

.24550

-0.0

7401

-0.1

1806

-0.0

3435

-0.0

5758

NH

3-0

.264

37-0

.258

90-0

.25814

-0.0

7905

-0.1

2669

-0.0

3679

-0.0

6217

F−

-0.3

3576

-0.3

3636

-0.3

3611

-0.1

0750

-0.1

6784

-0.0

4980

-0.0

8095

O-0

.177

29-0

.195

87-0

.19561

-0.0

6223

-0.0

9857

-0.0

2906

-0.0

5232

S4

Tab

leS4:

Com

par

ison

ofW

TM

AD

valu

esin

the

die

t100

subse

tof

the

GM

TK

N55

dat

abas

efo

rth

epro

pos

edex

trap

olat

ion

met

hods

wit

hdef

2-X

zvp

bas

isse

tfa

mily.

Basis

Function

ααdh

diet1

00

WT

MA

D[k

J/m

ol]

BLY

P-D

3(B

J)

B3LY

P-D

3(B

J)

B2P

LY

P-D

3(B

J)

def

2-tz

vp

––

–47.1

34.0

22.3

def

2-[s

,t]z

vp

exp

sqrt

[d,t

]10.3

90[1]

2.400[1]

45.7

32.9

23.4

def

2-[s

,t]z

vp

exp

onen

tial

glob

al2.2

62

2.382

47.0

34.0

24.6

def

2-[s

,t]z

vp

exp

onen

tial

[d,t

]2.8

74

2.096

47.0

34.0

25.2

def

2-[s

,t]z

vp

pow

ergl

obal

6.5

42

2.382

44.4

31.8

22.6

def

2-[s

,t]z

vp

pow

er[d

,t]

7.0

88

2.096

44.9

32.3

23.5

def

2-[s

,t]z

vp

exp

sqrt

glob

al7.7

57

2.382

43.8

31.3

22.3

def

2-[s

,t]z

vp

exp

sqrt

[d,t

]9.0

42

2.096

44.9

32.3

23.5

def

2-[s

,t]z

vp

exp

onen

tial

glob

al2.2

86

+0.4

81×f

(Fx)

2.382

47.0

34.0

24.6

def

2-[s

,t]z

vp

exp

onen

tial

[d,t

]2.8

47

+0.5

64×f

(Fx)

2.096

47.0

34.0

25.2

def

2-[s

,t]z

vp

pow

ergl

obal

6.5

82

+1.4

51×f

(Fx)

2.382

44.4

32.1

23.1

def

2-[s

,t]z

vp

pow

er[d

,t]

7.0

22

+1.3

90×f

(Fx)

2.096

44.9

32.4

23.8

def

2-[s

,t]z

vp

exp

sqrt

glob

al7.8

30

+1.6

39×f

(Fx)

2.382

43.9

31.7

22.8

def

2-[s

,t]z

vp

exp

sqrt

[d,t

]8.9

58

+1.7

73×f

(Fx)

2.096

44.9

32.4

23.8

def

2-qzv

p–

––

39.8

27.7

17.3

S5

Tab

leS5:

Com

par

ison

ofW

TM

AD

valu

esin

the

die

t100

subse

tof

the

GM

TK

N55

dat

abas

efo

rth

epro

pos

edex

trap

olat

ion

met

hods

wit

hdef

2-X

zvp

dbas

isse

tfa

mily.

Basis

Function

ααdh

diet1

00

WT

MA

D[k

J/m

ol]

BLY

P-D

3(B

J)

B3LY

P-D

3(B

J)

B2P

LY

P-D

3(B

J)

def

2-tz

vp

d–

––

40.0

27.8

18.6

def

2-[s

,t]z

vp

dex

psq

rt[d

,t]

10.3

90[1]

2.400[1]

39.6

27.3

17.5

def

2-[s

,t]z

vp

dex

pon

enti

algl

obal

2.302

2.267

40.0

27.8

17.9

def

2-[s

,t]z

vp

dex

pon

enti

al[d

,t]

2.881

1.996

40.0

27.8

17.9

def

2-[s

,t]z

vp

dp

ower

glob

al6.

666

2.267

39.3

27.0

17.4

def

2-[s

,t]z

vp

dp

ower

[d,t

]7.

106

1.996

39.4

27.1

17.6

def

2-[s

,t]z

vp

dex

psq

rtgl

obal

7.886

2.267

39.2

26.9

17.3

def

2-[s

,t]z

vp

dex

psq

rt[d

,t]

9.065

1.996

39.4

27.1

17.6

def

2-[s

,t]z

vp

dex

pon

enti

algl

obal

2.347

+0.4

23×f

(Fx)

2.267

40.0

27.8

17.9

def

2-[s

,t]z

vp

dex

pon

enti

al[d

,t]

2.857

+0.5

35×f

(Fx)

1.996

40.0

27.8

17.9

def

2-[s

,t]z

vp

dp

ower

glob

al6.

780

+1.2

56×f

(Fx)

2.267

39.3

27.1

17.5

def

2-[s

,t]z

vp

dp

ower

[d,t

]7.

045

+1.3

20×f

(Fx)

1.996

39.3

27.2

17.6

def

2-[s

,t]z

vp

dex

psq

rtgl

obal

8.033

+1.4

73×f

(Fx)

2.267

39.2

27.0

17.4

def

2-[s

,t]z

vp

dex

psq

rt[d

,t]

8.988

+1.6

83×f

(Fx)

1.996

39.3

27.2

17.6

def

2-qzv

pd

––

–38.3

26.6

16.5

S6

Tab

leS6:

Com

par

ison

ofW

TM

AD

valu

esin

the

die

t100

subse

tof

the

GM

TK

N55

dat

abas

efo

rth

epro

pos

edex

trap

olat

ion

met

hods

wit

hp

cseg

-Nbas

isse

tfa

mily.

Basis

Function

ααdh

diet1

00

WT

MA

D[k

J/m

ol]

BLY

P-D

3(B

J)

B3LY

P-D

3(B

J)

B2P

LY

P-D

3(B

J)

pcs

eg-2

––

–40.5

28.6

18.6

pcs

eg-[

1,2]

exp

sqrt

[d,t

]7.0

20[1]

2.010[1]

38.7

26.8

19.4

pcs

eg-[

1,2]

exp

onen

tial

glob

al1.7

71

1.651

40.4

28.5

22.3

pcs

eg-[

1,2]

exp

onen

tial

[d,t

]2.0

23

1.935

40.4

28.5

21.2

pcs

eg-[

1,2]

pow

ergl

obal

4.3

23

1.651

38.0

26.3

20.2

pcs

eg-[

1,2]

pow

er[d

,t]

4.9

90

1.935

38.3

26.6

19.4

pcs

eg-[

1,2]

exp

sqrt

glob

al5.7

43

1.651

38.1

26.3

20.3

pcs

eg-[

1,2]

exp

sqrt

[d,t

]6.3

66

1.935

38.3

26.6

19.4

pcs

eg-[

1,2]

exp

onen

tial

glob

al1.9

61−

0.2

32×f

(Fx)

1.651

40.4

28.5

22.3

pcs

eg-[

1,2]

exp

onen

tial

[d,t

]2.1

02−

0.0

96×f

(Fx)

1.935

40.4

28.5

21.2

pcs

eg-[

1,2]

pow

ergl

obal

4.7

57−

0.5

61×f

(Fx)

1.651

38.2

26.4

20.3

pcs

eg-[

1,2]

pow

er[d

,t]

5.1

85−

0.2

38×f

(Fx)

1.935

38.5

26.7

19.4

pcs

eg-[

1,2]

exp

sqrt

glob

al6.3

39−

0.7

50×f

(Fx)

1.651

38.3

26.5

20.4

pcs

eg-[

1,2]

exp

sqrt

[d,t

]6.6

15−

0.3

03×f

(Fx)

1.935

38.5

26.7

19.4

pcs

eg-3

––

–36.7

25.0

15.7

S7

0

0

10

10

20

20

30

30

40

40

50

50

60

60

WTMAD / kJ/mol

PBE-D3(BJ)BLYP-D3(BJ)

M06L-D3B97-D3(BJ)

revPBE-D3(BJ)SCAN-D3(BJ)B3LYP-D3(BJ)

PBE0DH-D3(BJ)B97M-V

M052X-D3B2PLYP-D3(BJ)

PWPB95-NLB97X-V

DSD-BLYP-D3(BJ) 5- 2-

5- 2- 43.538.136.136.835.734.326.325.325.019.020.317.816.723.0

WTM

AD [d

,t] /

kJ/m

ol

Figure S1: The WTMAD values for the diet100 dataset with pcseg-N (N ≈ X + 1 ∈[2, 3, 4, 5], green bars). The extrapolated pcseg-[1,2] results (• for [d,t] power, for globalexpqsrt) are also included, with the WTMAD shown on the right y-axis.

S8

Table S7: Comparison of RMSD values in the NCDT dataset of non-covalent energies, withdef2-Xzvp basis sets. Extrapolation with a global expsqrt function, α = 7.757, αdh = 2.382.Timing is the ratio of total execution times of def2-[s,t]zvp versus def2-qzvp.

FunctionalNCDT RMSD [kJ/mol]

Timingdef2-tzvp def2-qzvp def2-[st]zvp

BLYP-D3(BJ) 5.68 4.23 5.08 77.7%PBE-D3(BJ) 7.41 5.59 6.70 77.7%revPBE-D3(BJ) 5.09 4.25 4.71 80.9%B97-D3(BJ) 4.80 3.94 4.40 78.9%SCAN-D3(BJ) 8.00 6.81 7.46 58.9%M06L-D3 3.32 3.01 3.04 58.9%B97M-V 4.19 2.87 3.72 133.6%B3LYP-D3(BJ) 5.61 4.03 4.93 78.0%M052X-D3 4.69 3.73 4.07 62.6%dlDF+D10 8.61 9.93 9.05 63.7%ωB97X-V 3.26 1.56 2.57 154.1%B2PLYP-D3(BJ) 3.25 2.10 3.08 78.0%DSD-BLYP-D3(BJ) 3.32 2.57 3.43 76.2%PBE0DH-D3(BJ) 5.97 4.93 5.60 78.3%PWPB95-NL 2.35 1.33 2.05 139.4%

Table S8: Comparison of RMSD values in the NCDT dataset of non-covalent energies, withpcseg-N basis sets. Extrapolation with a global expsqrt function, α = 7.743, αdh = 1.651.Timing is the ratio of total execution times of def2-[s,t]zvp versus def2-qzvp.

FunctionalNCDT RMSD [kJ/mol]

Timingpcseg-2 pcseg-3 pcseg-[12]

BLYP-D3(BJ) 5.14 4.43 5.22 77.5%PBE-D3(BJ) 6.47 5.46 5.95 77.0%revPBE-D3(BJ) 4.71 4.61 5.09 73.7%B97-D3(BJ) 4.43 4.22 4.70 79.2%SCAN-D3(BJ) 7.40 6.86 6.93 58.8%M06L-D3 3.04 3.15 3.20 59.2%B97M-V 3.62 3.04 3.58 141.0%B3LYP-D3(BJ) 4.90 4.00 4.58 78.5%M052X-D3 4.02 3.63 3.49 61.0%dlDF+D10 8.96 9.99 9.22 62.6%ωB97X-V 2.27 1.33 1.67 153.4%B2PLYP-D3(BJ) 2.42 2.15 2.85 86.0%DSD-BLYP-D3(BJ) 2.58 2.77 3.32 73.9%PBE0DH-D3(BJ) 5.32 5.00 5.14 81.4%PWPB95-NL 1.50 1.39 1.85 145.5%

S9

References

[1] F. Neese, “Software update: The ORCA program system, version 4.0,” WIREs Compu-tational Molecular Science, vol. 8, no. 1, p. e1327, 2018.

S10