NENCI-2021 Part I: A Large Benchmark Database of Non-Equilibrium Non-Covalent Interactions Emphasizing Close Intermolecular Contacts

Abstract

In this work, we present NENCI-2021, a benchmark database of approximately 8,000 non-equilibrium non-covalent interaction energies for a large and diverse selection of intermolecular complexes of biological and chemical relevance. To meet the growing demand for large and high-quality quantum mechanical data in the chemical sciences, NENCI-2021 starts with the 101 molecular dimers in the widely used S66 and S101 databases, and extends the scope of these works by: (i) including 40 cation- and anion-π complexes, a fundamentally important class of non-covalent interactions (NCIs) that are found throughout nature and pose a substantial challenge to theory, and (ii) systematically sampling all 141 intermolecular potential energy surfaces (PES) by simultaneously varying the intermolecular distance and intermolecular angle in each dimer. Designed with an emphasis on close contacts, the complexes in NENCI-2021 were generated by sampling seven intermolecular distances along each PES (ranging from 0.7× to 1.1× the equilibrium separation) as well as nine intermolecular angles per distance (five for each ion-π complex), yielding an extensive database of 7,763 benchmark intermolecular interaction energies (E_int) obtained at the CCSD(T)/CBS level of theory. In addition, a wide range of intermolecular atom-pair distances is also present in NENCI-2021, where close intermolecular contacts involving atoms that are located within the so-called van der Waals envelope are prevalent; these interactions in particular pose an enormous challenge for molecular modeling and are observed in many important chemical and biological systems. A detailed SAPT-based energy decomposition analysis also confirms the diverse and comprehensive nature of the intermolecular binding motifs present in NENCI-2021.


NENCI-2021 Part I: A Large Benchmark Database of Non-Equilibrium Non-Covalent Interactions Emphasizing Close Intermolecular Contacts

Zachary M. Sparrow,a) Brian G. Ernst,a) Paul T. Joo, Ka Un Lao, and Robert A. DiStasio Jr.b)
Department of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853 USA
(Dated: 5 February 2021)

In this work, we present NENCI-2021, a benchmark database of approximately 8,000 non-equilibrium non-covalent interaction energies for a large and diverse selection of intermolecular complexes of biological and chemical relevance. To meet the growing demand for large and high-quality quantum mechanical data in the chemical sciences, NENCI-2021 starts with the 101 molecular dimers in the widely used S66 and S101 databases, and extends the scope of these works by: (i) including 40 cation- and anion-π complexes, a fundamentally important class of non-covalent interactions (NCIs) that are found throughout nature and pose a substantial challenge to theory, and (ii) systematically sampling all 141 intermolecular potential energy surfaces (PES) by simultaneously varying the intermolecular distance and intermolecular angle in each dimer. Designed with an emphasis on close contacts, the complexes in NENCI-2021 were generated by sampling seven intermolecular distances along each PES (ranging from 0.7× to 1.1× the equilibrium separation) as well as nine intermolecular angles per distance (five for each ion-π complex), yielding an extensive database of 7,763 benchmark intermolecular interaction energies (E_int) obtained at the CCSD(T)/CBS level of theory. The E_int values in NENCI-2021 span a total of . kcal/mol, ranging from − . kcal/mol to +186 . kcal/mol, with a mean (median) E_int value of − . kcal/mol (− . kcal/mol). In addition, a wide range of intermolecular atom-pair distances is also present in NENCI-2021, where close intermolecular contacts involving atoms that are located within the so-called van der Waals envelope are prevalent; these interactions in particular pose an enormous challenge for molecular modeling and are observed in many important chemical and biological systems.
A detailed SAPT-based energy decomposition analysis also confirms the diverse and comprehensive nature of the intermolecular binding motifs present in NENCI-2021, which now includes a significant number of primarily induction-bound dimers (e.g., cation-π complexes). NENCI-2021 thus spans all regions of the SAPT ternary diagram, thereby warranting a new four-category classification scheme that includes complexes primarily bound by electrostatics ( , ), induction ( ), dispersion ( , ), or mixtures thereof ( , ). A critical error analysis performed on a representative set of intermolecular complexes in NENCI-2021 demonstrates that the E_int values provided herein have an average error of ± . kcal/mol, even for complexes with strongly repulsive E_int values, and maximum errors of ± . − . kcal/mol (i.e., approximately ± . kJ/mol) for the most challenging cases. For these reasons, we expect that NENCI-2021 will play an important role in the testing, training, and development of next-generation classical and polarizable force fields, density functional theory (DFT) approximations, wavefunction theory (WFT) methods, as well as machine learning (ML) based intra- and inter-molecular potentials.

I. INTRODUCTION

With tunable strengths situated between thermal fluctuations and covalent bonds, non-covalent interactions (NCIs) are ubiquitous in nature and play a critical role in determining the structure, stability, and function of a number of systems throughout chemistry, biology, physics, and materials science.

One particularly illustrative example is the famous DNA double helix, whose structure is stabilized by a complex network of hydrogen bonds and π-π stacking interactions between constituent nucleobases. In organic synthesis and biochemistry, many catalysts and enzymes function by leveraging NCIs to position/orient substrates for the ensuing reaction and/or stabilize critical points along the reaction pathway, e.g., ion-π interactions can stabilize intermediates and transition states with excess charge.

a) These authors contributed equally to this work.
b) Electronic mail: [email protected]

Over the past two decades, NCIs have garnered critical recognition throughout the chemical sciences, and have now become an integral part of “chemical intuition” when rationalizing complex chemical structures and/or processes as well as designing molecular systems (e.g., catalysts) for optimal performance and/or novel applications. In this regard, there are quite a number of NCI-based applications actively under investigation, ranging from crystal engineering (where hydrogen and halogen bonding are used to direct molecular assembly) and artificial molecular machines (where π-π stacking, hydrogen bonding, and dispersion/van der Waals (vdW) forces are leveraged to control complex motion at the nanoscale) to drug discovery (where candidate molecules are selected and screened based on specific NCIs present in the corresponding active site).

Given their importance and prevalence, it is imperative that there exists a suite of computational methods that can provide an accurate, reliable, and computationally efficient description of NCIs for systems ranging from small gas-phase molecular dimers to the complex tertiary and quaternary structure of proteins in solvent. To meet these goals, a number of computational techniques have been developed over the past century, including (but not limited to): model intermolecular potentials (e.g., Lennard-Jones), classical and polarizable force fields, density functional theory (DFT) approximations with corrections for dispersion/vdW interactions, efficient (i.e., linear scaling) algorithms for highly accurate wavefunction theory (WFT) methods, and more recently, the large and rapidly growing suite of machine learning (ML) based approaches. During this time, such computational methods have enjoyed tremendous success and made critical contributions to a number of different fields, e.g., identifying promising pharmaceutical molecules, predicting (meta-)stable molecular crystal polymorphs, and elucidating supercritical behavior in high-pressure liquid hydrogen, to name a few. However, we would argue that the next generation of theoretical approaches for describing NCIs would tremendously benefit by addressing the following challenges in an accurate, reliable, and computationally efficient manner: (i) the need to describe NCIs in large molecular and condensed-phase systems (i.e., collective many-body effects, solvation/solvent effects, simultaneous treatment of short-, intermediate-, and long-range NCIs on the same footing); (ii) the need to describe the diverse types of NCIs on the same footing (i.e., similar performance for hydrogen bonding, π-π stacking, dispersion, ion-π interactions, etc.); and (iii) the need to describe NCIs in equilibrium and non-equilibrium systems on the same footing (i.e., similar performance across entire potential energy surfaces (PES)).

An essential part of developing next-generation theoretical methods for describing NCIs involves testing and/or training new approximations against highly accurate benchmark data. For the increasingly popular suite of ML-based models (which require large amounts of high-quality data to learn the quantum mechanics underlying NCIs), such reference data is of critical importance. However, such benchmark non-covalent/non-bonded interaction (or binding) energies (E_int) are seldom experimentally available, especially for large/complex systems and non-equilibrium configurations. Instead, one usually relies on quantum chemical and/or quantum Monte Carlo methods (i.e., WFT methods) to obtain highly accurate and systematically improvable E_int values for benchmarking and training purposes.
On the WFT side, coupled cluster theory including single, double, and perturbative triple excitations in conjunction with an extrapolation to the complete basis set limit (CCSD(T)/CBS) has long been considered the de facto “gold standard” for generating accurate E_int data for small- and medium-sized organic molecules, and has therefore been used to generate a number of seminal benchmark databases for NCIs. One of the first of these databases, the so-called S22 database, includes CCSD(T)-quality E_int values for a set of small-/medium-sized biologically-relevant intermolecular complexes (comprised of {C, H, O, N}) in their respective optimized (equilibrium) geometries, and was designed to cover a number of different intermolecular binding motifs (i.e., single and double hydrogen bonds, dipole-dipole interactions, π-π stacking, dispersion, C–H···π, etc.). Following the success and widespread use of S22 in the testing and parameterization of many theoretical methods for describing NCIs, the amount of benchmark-quality E_int data was substantially increased with the introduction of the S66 database (which includes equilibrium intermolecular complexes of similar size and composition to that found in S22) as well as extensions thereof to include complexes with non-equilibrium intermolecular distances (along a series of dissociation curves in S22x5 and S66x8) and non-equilibrium intermolecular angles (at the equilibrium distance in S66a8). During the same time, other benchmark NCI databases were constructed to reflect the diverse number of NCI types (or binding motifs) found in: halogen-containing systems (X40x10), nucleobase dimers (ACHC), charge transfer complexes (CT), alkane dimers (ADIM6), large molecular dimers (L7), host-guest complexes (S12L), halogen-bonded systems (XB18), sulfur-containing systems (SULFURx8), and many more.
Along similar lines, there are also NCI databases based on a symmetry-adapted perturbation theory (SAPT) decomposition of E_int into components (i.e., electrostatics, exchange, induction, and dispersion), which have been used to train force fields for molecular dynamics simulations. Of particular interest here is the S101x7 database, which starts with the molecular dimers in S66 and expands this set to include additional biologically-relevant complexes containing halogens (i.e., F, Cl, Br) and second-row elements (i.e., S and P), as well as additional intermolecular complexes involving charged systems and/or water. Like the S66x8 database, S101x7 also includes complexes with non-equilibrium intermolecular distances by computing SAPT-based E_int values for select points along each intermolecular PES; in the S101x7 case, these seven points ranged from . × − . × the equilibrium intermolecular separation in an effort to better capture short-range charge penetration effects.

While such existing databases are growing in size, most are still relatively small (containing ≲ interaction energies), making them insufficient for the rapidly growing field of ML-based intra-/inter-molecular potentials. While composite databases can be considerably larger, the accuracy and reliability of such compiled data is inconsistent and can potentially be a source of both random and systematic error. Due to the high computational cost of generating benchmark E_int values for large systems, most existing databases (with the exception of L7 and S12L) have been limited to small-to-medium organic/biological molecules (usually containing < atoms); as a consequence, many of these databases do not capture the collective nature of NCIs (i.e., many-body effects, solvation/solvent effects, NCIs across multiple length scales) present in large/complex molecules and condensed-phase systems.
In addition, most existing databases have focused on common intermolecular binding motifs such as hydrogen and halogen bonding, π-π stacking, dipole-dipole interactions, dispersion, and C–H···π interactions, while other important binding motifs (like cation- and anion-π interactions) have been largely underrepresented. As such, these databases tend to include intermolecular complexes that are primarily bound by electrostatics, dispersion, or a mixture thereof, but have not included intermolecular complexes that are primarily induction-bound. Furthermore, prior databases (e.g., S22x5, S66x8, S66a8, S101x7) primarily focused on the equilibrium geometry and a single displacement from the equilibrium geometry (i.e., scaling the intermolecular distance or rotating one monomer), but very few have explored wider swaths of the intermolecular PES. In this regard, most databases have also only slightly touched upon close intermolecular contacts (i.e., the short-range and often repulsive sector of the intermolecular PES), although there are some examples where such short-range considerations have been incorporated (e.g., S101x7 and R160x6). As a result, the performance of many theoretical methods for accurately and reliably describing NCIs in large and complex systems, for a diverse array of binding motifs, and across significant portions of the intermolecular PES is simply not well known.

Accurate and reliable descriptions of non-equilibrium NCIs at reduced intermolecular separations (where several strong and competing short-range intermolecular forces are at play) are important for a number of reasons and pose a substantial challenge to theory. For instance, there are numerous examples throughout chemistry and chemical biology where close intermolecular contacts are either present at equilibrium or force the system to adopt a different configuration.
A striking example of this was recently observed when studying the enantioselectivity of sBOX catalysts, where a combination of attractive and repulsive NCIs is responsible for the enantio-determining C–CN bond formation in chiral nitriles. Intermolecular close contacts also play a crucial role in the study of systems operating under high-pressure conditions, ranging from the microscopic structure of supercritical water to the high-pressure synthesis of compounds with atypical compositions and novel properties as well as the search for high-T_c superconducting materials. Theoretically speaking, SAPT decomposition studies have shown that the intermolecular distance can have a profound influence on the absolute and relative magnitudes of the underlying E_int components (i.e., electrostatics, exchange, induction, and dispersion), implying that the forces present in short-range non-equilibrium intermolecular complexes of small/simpler molecules can mimic those found in larger/more complicated systems at equilibrium separations. Interestingly, this also suggests that training and/or testing theoretical methods on non-equilibrium configurations (particularly in the short-range) of small-to-medium molecular dimers can be used as a surrogate for describing the NCIs in a more diverse range of large (and possibly intractable) systems. Since finite-temperature molecular dynamics and Monte Carlo simulations require a consistent treatment of the structures and energetics across the entire PES, an accurate and reliable treatment of non-equilibrium NCIs (including short-range as well as intermediate- and long-range interactions) is of enormous importance for these applications as well. However, the difficulties in obtaining such an accurate and reliable theoretical description of non-equilibrium NCIs across multiple length scales should also be emphasized.
For instance, the long-range sector of the intermolecular PES requires a balanced description of both electrostatics and dispersion, and this can be particularly challenging when dealing with NCIs that also include charged species and/or molecules with substantial multipole moments. For larger intermolecular separations, intermolecular energies (and forces) tend to be small, which provides additional challenges when trying to describe points along the PES on the same footing. At reduced intermolecular separations, the increased amount of orbital (or density) overlap between monomers gives rise to a complex interplay between strongly attractive and strongly repulsive intermolecular forces (e.g., charge transfer and penetration, Pauli repulsion, many-body exchange-correlation effects, etc.), and an error when describing any one of these components can lead to disastrous results. For such short-range non-equilibrium NCIs, the performance of the current suite of theoretical methods is still an open question, and a number of studies have reported higher errors for repulsive intermolecular contacts.

In this regime, even the suitability of high-level WFT-based approaches for generating benchmark E_int data is still largely unresolved as such approaches suffer from issues related to the use of incomplete basis sets (i.e., basis set incompleteness and superposition errors) in conjunction with an approximate treatment of electron correlation effects (including questions regarding the reliability of perturbative expansions).

In this work, we directly address the aforementioned challenges needed for training, testing, and developing next-generation theoretical approaches for describing NCIs by introducing NENCI-2021, a benchmark database of approximately 8,000 Non-Equilibrium Non-Covalent Interaction energies for a diverse selection of molecular dimers of biological and chemical relevance. Starting with the dimers in the S101 (and hence S66) databases, which contain a diverse set of intermolecular binding motifs (i.e., single and double hydrogen bonds, halogen bonds, ion-dipole and dipole-dipole interactions, π-π stacking, dispersion, X–H···π) as well as a large number of molecular dimers involving water (which represents a crucial first step towards generating benchmark E_int values in aqueous environments), NENCI-2021 extends the scope of these seminal works in two directions. For one, NENCI-2021 includes cation- and anion-π complexes, a fundamentally important and particularly strong class of NCIs that are primarily induction-bound and characterized by equilibrium E_int values which are typically larger in magnitude than hydrogen bonds and salt bridges. As such, an accurate and reliable description of ion-π interactions poses substantial difficulties for theory, and their inclusion in NENCI-2021 directly addresses the challenge of simultaneously describing diverse NCI types on the same footing (i.e., point (ii) above).
Secondly, NENCI-2021 also includes an extensive and systematic sampling of equilibrium and non-equilibrium configurations on each of the 141 intermolecular PES by simultaneously varying the intermolecular distance and intermolecular angle in each dimer. Designed with an emphasis on close intermolecular contacts, the complexes in NENCI-2021 were generated by sampling seven intermolecular distances (ranging from 0.7× to 1.1× the equilibrium separation) as well as nine intermolecular angles per distance (five for each ion-π complex), yielding an extensive database of 7,763 benchmark E_int values obtained at the CCSD(T)/CBS level of theory. In doing so, NENCI-2021 directly addresses the challenges of describing the collective nature of NCIs in large/complex systems (i.e., point (i)) and simultaneously describing NCIs in equilibrium and non-equilibrium systems on the same footing (i.e., point (iii)). The E_int values in NENCI-2021 span a total of . kcal/mol, ranging from − . kcal/mol (corresponding to the strongly attractive Li⁺···Benzene ion-π complex) to +186 . kcal/mol (corresponding to a strongly repulsive DMSO···DMSO complex that has been scaled to . × the equilibrium intermolecular separation and rotated to a non-equilibrium angle), with a mean (median) E_int value of − . kcal/mol (− . kcal/mol). A detailed SAPT-based energy decomposition analysis demonstrates the diverse and comprehensive nature of NENCI-2021, which spans all regions of the corresponding ternary diagram and includes intermolecular binding motifs primarily bound by electrostatics ( , ), induction ( ), dispersion ( , ), or mixtures thereof ( , ). A critical error analysis performed on a representative set of intermolecular complexes in NENCI-2021 demonstrates that the E_int values provided herein at the CCSD(T)/CBS level have an average error of ± . kcal/mol, even for complexes with strongly repulsive E_int values, and maximum errors of ± . − . kcal/mol (i.e., approximately ± .
kJ/mol) for the most challenging cases. Designed to meet the growing demand for large and high-quality quantum mechanical data in the chemical sciences, we expect that NENCI-2021 will be an important resource for testing, training, and developing next-generation force fields, DFT approximations, WFT methods, and ML-based intra-/inter-molecular potentials.

The remainder of this manuscript is organized as follows. Section II describes the construction of NENCI-2021, including the selection of molecular dimers, generation of equilibrium and non-equilibrium intermolecular complexes, a detailed description of the employed computational protocol, and a guide to obtaining the database. Section III discusses the properties of NENCI-2021, including a statistical analysis of the intermolecular interaction energies and closest intermolecular contacts, a SAPT-based energy decomposition analysis of the intermolecular binding motifs, as well as a critical assessment of the error in the benchmark E_int values provided herein. The manuscript ends with some brief conclusions and future directions in Section IV. In a follow-up to this work, many popular WFT and DFT methods are explicitly tested on the NENCI-2021 database, where it is shown that there is a nearly universal increase in error when describing the repulsive wall of the intermolecular PES and that ion-π complexes can be quite challenging to model in an accurate and reliable fashion.

II. CONSTRUCTION OF THE NENCI-2021 DATABASE

A. Selection of Molecular Dimers

NENCI-2021 is a large database of ≈ 8,000 benchmark intermolecular interaction energies (E_int, see Sec. II C) that includes a diverse selection of molecular dimers and binding motifs of biological and chemical relevance, with an emphasis on non-equilibrium (attractive and repulsive) configurations and close intermolecular contacts. As depicted in the left panel of Fig.
1, the construction of NENCI-2021 starts with the 101 molecular dimers in the S101 database (a superset containing the earlier constructed S66 database), which were carefully chosen to contain small molecules with the NCIs found in biological and chemical systems. As such, NENCI-2021 inherits the extensive sampling of molecule types in S66 and S101, which are comprised of the {H, C, N, O, F, P, S, Cl, Br} atom types, range in size from small (e.g., H₂O, ethene, ethyne) to medium (e.g., uracil, indole, pentane), and include second- and third-row elements (e.g., DMSO, MeCl, BenBr) as well as positively- (e.g., MeNH₃⁺, Imidazole⁺, Guanidine⁺) and negatively- (e.g., AcO⁻, H₂PO₄⁻, HPO₄²⁻) charged species. In addition, NENCI-2021 also inherits a wide variety of intermolecular binding motifs, including dimers with single and double hydrogen bonds, halogen bonds, and X–H···π interactions, as well as intermolecular complexes primarily bound by dispersion, electrostatics (e.g., ion-dipole, dipole-dipole, etc.), and mixtures thereof. Another salient benefit of using S66 and S101 as the foundation for NENCI-2021 is the large number of dimers involving water, which provides a crucial first step towards the generation of benchmark intermolecular interaction energies in an aqueous environment.

NENCI-2021 extends these databases in the following two ways: (i) it includes 40 new cation- and anion-π complexes for a total of 141 molecular dimers, and (ii) it systematically samples both equilibrium and non-equilibrium intermolecular distances as well as intermolecular angles for each dimer (with a particular emphasis on close intermolecular contacts) for a total of 7,763 benchmark interaction energies.

[Figure 1: left panel, graph of the monomers and molecular dimers in NENCI-2021; right panel, schematic of the distance and angle sampling, with the distance axis marked at 0.70, 0.80, 0.90, 0.95, 1.00, 1.05, and 1.10 times the equilibrium separation.]

FIG. 1. (Left) Graphical depiction of the molecular dimers in the NENCI-2021 database. NENCI-2021 contains all of the molecular dimers in the original S66 database (purple lines), the additional dimers present in the S101 (superset) database (green lines, S66 ⊂ S101), as well as a new set of cation- and anion-π complexes (red lines, S66 ⊂ S101 ⊂ NENCI-2021). In this graph, each monomer is represented by a vertex, the size of which is proportional to the number of molecular dimers involving that monomer; graph edges connecting two vertices indicate a molecular dimer formed from the connected monomers. Bold edges between vertices denote two different molecular dimer orientations involving the connected monomers (e.g., for water–phenol, water is the hydrogen-bond donor in one dimer and the hydrogen-bond acceptor in the other). Chords passing through the center of a vertex indicate a molecular dimer formed from a single monomer (e.g., there is one water dimer, two uracil dimers, and three pyridine dimers in NENCI-2021). (Right) Overall description of the NENCI-2021 database. For each of the dimers described above, NENCI-2021 generates a series of equilibrium and non-equilibrium configurations by simultaneously sampling seven intermolecular distances and nine intermolecular angles (five for the ion-π complexes due to symmetry considerations). As such, NENCI-2021 includes 7,763 benchmark CCSD(T)/CBS intermolecular interaction energies, which correspond to a wide range of equilibrium and non-equilibrium (both repulsive and attractive) geometries and emphasize close intermolecular contacts. See Secs. II A–II C for the details regarding the construction of NENCI-2021.
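As a quick consistency check on the sampling scheme summarized above, the total number of configurations can be reproduced in a few lines of Python. This is only an illustrative sketch: the variable and function names are hypothetical, and the seven scale factors are assumed to be the values marked on the distance axis of Fig. 1 (0.70 through 1.10 times the equilibrium separation).

```python
# Sketch: count the configurations implied by the distance/angle sampling grid.
# Dimer counts and grid sizes come from the text; all names are illustrative.

# Assumed scale factors (multiples of the equilibrium separation), read from Fig. 1.
DISTANCE_SCALES = (0.70, 0.80, 0.90, 0.95, 1.00, 1.05, 1.10)

N_S101_DIMERS = 101        # dimers inherited from S66/S101
N_ION_PI_DIMERS = 40       # cation- and anion-pi complexes
ANGLES_GENERAL = 9         # angles per distance for the S101 dimers
ANGLES_ION_PI = 5          # angles per distance for ion-pi complexes (symmetry)


def n_configurations(n_dimers: int, n_distances: int, n_angles: int) -> int:
    """Number of geometries on a full distance-by-angle grid for n_dimers dimers."""
    return n_dimers * n_distances * n_angles


n_dist = len(DISTANCE_SCALES)
total = (n_configurations(N_S101_DIMERS, n_dist, ANGLES_GENERAL)
         + n_configurations(N_ION_PI_DIMERS, n_dist, ANGLES_ION_PI))
print(total)  # 63 * 101 + 35 * 40 = 7763
```

The two terms correspond to the 7 × 9 grid used for the S101 dimers and the reduced 7 × 5 grid used for the ion-π complexes.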
In particular, NENCI-2021 includes 40 ion-π complexes comprised of the simplest biologically relevant monovalent cations (Li⁺, Na⁺) and anions (F⁻, Cl⁻) interacting with a representative set of π-systems, which includes the five DNA/RNA nucleobases (adenine, cytosine, guanine, thymine, uracil) as well as benzene, pyridine, and trifluorotriazine. The inclusion of ion-π complexes in NENCI-2021 was primarily driven by the fact that ion-π interactions are among the strongest NCIs known (with intermolecular interaction energies often rivaling those of hydrogen bonds and salt bridges) and have been observed throughout chemistry and biology. This extension was also motivated by some of our recent work, which used SAPT to demonstrate that cation-π complexes are primarily bound by induction, while anion-π complexes are bound by a complex interplay between induction, dispersion, and electrostatics; as such, their inclusion substantially expands the scope/range of intermolecular binding motifs in NENCI-2021 (see Sec. III B). As shown in Paper II of this series, this complex interplay between intermolecular forces (in addition to the presence of charged atomic species) in ion-π complexes poses a unique challenge when trying to obtain accurate and reliable intermolecular interaction energies using both WFT and DFT methods. In addition, the inclusion of promiscuous ion-π binders (i.e., π-systems such as the DNA/RNA nucleobases, which can form favorable ion-π complexes with both cations and anions) as well as π-systems that can only form energetically favorable ion-π complexes with cations (e.g., benzene) or anions (e.g., trifluorotriazine) is also well-aligned with one of the fundamental goals of NENCI-2021: to provide a more comprehensive sampling of both attractive and repulsive non-equilibrium configurations containing a diverse array of NCI types.

Motivated by the S22x5, S66x8, and S101x7 databases, in which intermolecular interaction energy curves were constructed for each molecular dimer, as well as the S66a8 database, in which the intermolecular angles were sampled, NENCI-2021 systematically samples both equilibrium and non-equilibrium intermolecular distances as well as intermolecular angles for each of the 141 molecular dimers described above. As depicted in the right panel of Fig. 1, NENCI-2021 samples seven intermolecular distances (i.e., 0.70×, 0.80×, 0.90×, 0.95×, 1.00×, 1.05×, 1.10× the equilibrium intermolecular separation) and nine intermolecular angles (only five intermolecular angles for the ion-π complexes, vide infra); for more details, see Sec. II B. NENCI-2021 therefore contains benchmark intermolecular interaction energies (see Secs. II C and III C) for 7 × 9 geometries (configurations) for each of the 101 molecular dimers in the S101 database, and 7 × 5 geometries for each of the 40 ion-π complexes, yielding a total of 63 × 101 + 35 × 40 = 7,763 equilibrium and non-equilibrium intermolecular complexes. By including such a systematic sampling of equilibrium and non-equilibrium structures, NENCI-2021 is a relatively large database that contains a wide range of attractive and repulsive intermolecular interaction energies (see Sec. III A); as such, we believe that NENCI-2021 will be well-suited for in-depth studies of the NCIs found throughout biology and chemistry, as well as training and testing next-generation density functional approximations, dispersion corrections, polarizable force fields, and ML-based potentials. By including an extensive set of angularly sampled geometries at . × and . × the equilibrium intermolecular separation, NENCI-2021 also includes a wide range of close intermolecular contacts, which are found throughout chemistry and chemical biology, as well as high-pressure systems; here, we stress again that benchmark intermolecular interaction energies in this regime not only serve as surrogates for larger/more complex systems at equilibrium, but are also important to ensure similar performance across the entire intermolecular PES when training, testing, and developing novel theoretical methods.

B. Generation of Equilibrium and Non-Equilibrium Intermolecular Complexes

Unless otherwise specified, all monomer geometries were taken from the S66 and S101 databases. For the eight π-systems used to construct the ion-π complexes in NENCI-2021, the monomer geometries for benzene, pyridine, and uracil were taken from the S66 database, while the monomer geometries for the DNA/RNA nucleobases were taken from our recent work on promiscuous ion-π binding. During the construction of the equilibrium and non-equilibrium molecular dimer geometries, we employed the frozen monomer convention in which all monomers were kept fixed at their optimized geometries. The molecular dimer geometries in the S66 database were also taken as is and without any changes; for the remaining molecular dimers, equilibrium geometries were optimized (see Sec. II C) along a pre-defined characteristic intermolecular interaction vector. This characteristic intermolecular interaction vector was based on the interaction type (e.g., hydrogen-bonded, halogen-bonded, dispersion-bound, ion-π, etc.) assigned to the molecular dimer via chemical intuition. Dimers that appear in the S66 database were assigned the same interaction type as in the original work, and the remaining dimers were assigned an interaction type that was as consistent as possible with the S66 convention. Given one of the following interaction types, the characteristic intermolecular interaction vector was defined as:

• For hydrogen- (halogen-) bonded systems, the interaction vector points between the hydrogen (halogen) bond donor and the hydrogen (halogen) bond acceptor. For double-hydrogen-bonded systems, the interaction vector is defined as the mean of the two hydrogen-bond vectors (with both taken to originate from the same monomer).

• For dispersion-bound systems, the interaction vector points from the center of mass of monomer A to the center of mass of monomer B.

• For ion-π complexes, the interaction vector points from the ion to the nuclear center of charge of the π-system (computed using only the atoms in each ring, i.e., the five carbons and nitrogen in pyridine). Here, we note in passing that this on-axis placement of the ion does not necessarily correspond to the lowest energy geometry of each ion-π complex.

• Finally, there remain a few special cases (e.g., the T-shaped benzene dimer), which do not fit well into any of these categories. Such systems are treated analogously with the dispersion-bound complexes, but only a subset of atoms is used in calculating an effective "molecular center" to ensure that the interaction vector accurately characterizes the interaction. For reference, the atoms used to calculate the interaction vector for each such complex are provided in Table S1.

All remaining equilibrium molecular dimer geometries were obtained by minimizing the intermolecular interaction energy by rigidly translating monomer A along the characteristic intermolecular interaction vector (see Sec. II C), and were then used as starting points to generate all non-equilibrium structures.

To systematically sample both intermolecular distances and intermolecular angles for the molecular dimers in NENCI-2021, we started with the procedure devised by Řezáč, Riley, and Hobza when constructing the S66x8 and S66a8 databases, and extended this protocol to accommodate a broader range of intermolecular interaction types and orientations. As such, the molecular dimer geometries in the S66a8 database were also taken as is and without any changes.
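The dispersion-bound definition above, together with the rigid translation of monomer A along the interaction vector, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names (`center_of_mass`, `interaction_vector`, `translate_to_scaled_distance`) are hypothetical and are not taken from the NENCI-2021 tooling, monomer B is assumed to stay fixed, and only the center-of-mass convention is covered (not the hydrogen-bonded, ion-π, or special-case definitions).

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def center_of_mass(masses: Sequence[float], coords: Sequence[Vec]) -> Vec:
    """Mass-weighted average position of one monomer."""
    total = sum(masses)
    return tuple(
        sum(m * r[k] for m, r in zip(masses, coords)) / total for k in range(3)
    )


def interaction_vector(masses_a, coords_a, masses_b, coords_b) -> Vec:
    """Vector from the center of mass of A to the center of mass of B
    (the dispersion-bound convention described in the text)."""
    ca = center_of_mass(masses_a, coords_a)
    cb = center_of_mass(masses_b, coords_b)
    return tuple(cb[k] - ca[k] for k in range(3))


def translate_to_scaled_distance(
    coords_a: Sequence[Vec], vec: Vec, factor: float
) -> List[Vec]:
    """Rigidly translate monomer A (B stays fixed) so that the A->B
    interaction vector becomes factor * vec.

    Shifting A by s changes the vector to vec - s, so s = (1 - factor) * vec.
    """
    shift = tuple((1.0 - factor) * v for v in vec)
    return [tuple(r[k] + shift[k] for k in range(3)) for r in coords_a]
```

Because the monomers are frozen, a scaled geometry differs from its parent only by a single translation of A, which is why one vector per dimer (and per angle) suffices to generate a full dissociation curve.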
The procedure for generating the remaining non-equilibrium intermolecular complexes in NENCI-2021 is outlined below, with STEPS 1–5 graphically illustrated for the water dimer in Fig. 2:

STEP 1. Starting with an optimized equilibrium intermolecular complex, arbitrarily label each monomer as either A or B (except for the ion-π complexes, in which the ion should be labelled as monomer A). Draw the characteristic intermolecular interaction vector from B to A (dashed black line) according to the interaction type assigned to the molecular dimer (vide supra). Define the z-axis (solid red arrow) along the interaction vector.

STEP 2. Without loss of generality, assume that A will be rotated around B (the alternative will be dealt with in STEP 8 below). To determine the axes of rotation, first find the principal axis ($C_n$) corresponding to monomer A (i.e., the molecular axis with the highest degree ($n$) of rotational symmetry); for the water monomer depicted in Fig. 2, the principal axis is the solid black line labelled $C_2$. If no principal axis with $n \geq 2$ exists, we follow the convention used during the construction of the S66a8 database, i.e., an approximate principal axis is defined by removing all hydrogen atoms from the molecule and reducing the identity of each heavy atom and functional group to identical spheres.

STEP 3. Define the y-axis (solid yellow arrow) to be perpendicular to the z-axis and the principal axis of A.

STEP 4. Define the x-axis (solid blue arrow) to be perpendicular to the z- and y-axes, thereby completely specifying the local reference frame used in this work.

STEP 5. To generate preliminary geometries for the first four non-equilibrium intermolecular angles, rotate A about the x- and y-axes passing through the tail of the interaction vector (i.e., located on monomer B) by ±θ.

STEP 6. For each non-equilibrium intermolecular angle, minimize the intermolecular interaction energy by rigidly translating A along the characteristic intermolecular interaction vector (see Sec. II C). For the ion-π complexes that are repulsive along the entire dissociation curve (e.g., Na⁺···trifluorotriazine), the minimum of the SAPT exchange + induction + dispersion (EID) energy was used in lieu of the intermolecular interaction energy (see Sec. II C). Define the intermolecular distance (i.e., the length of the characteristic intermolecular interaction vector) in each optimized geometry as the equilibrium (1.0×) intermolecular distance for the given non-equilibrium intermolecular angle.

STEP 7. For each non-equilibrium intermolecular angle, scale the corresponding (optimized) interaction vector by each of the six remaining scale factors (spanning 0.7× to 1.1×), and rigidly translate A consistent with each scaled vector. This will provide molecular dimer geometries along four separate intermolecular dissociation curves corresponding to each of the four non-equilibrium intermolecular angles.

STEP 8. Switch the A and B labels, and repeat STEPS 1–7. This will provide molecular dimer geometries along the intermolecular dissociation curves corresponding to each of the remaining four non-equilibrium intermolecular angles (for a total of eight non-equilibrium intermolecular angles). N.B.: The ion-π complexes in NENCI-2021 only have four unique non-equilibrium intermolecular angles due to the spherical symmetry of the ion; as such, STEP 8 is unnecessary and can be skipped for these molecular dimers.

STEP 9. For the equilibrium intermolecular angle, also scale the corresponding (optimized) interaction vector by each of the six remaining scale factors (spanning 0.7× to 1.1×), and rigidly translate A consistent with each scaled vector. This will provide molecular dimer geometries along the intermolecular dissociation curve corresponding to the equilibrium intermolecular angle.

FIG. 2. Graphical depiction of STEPS 1–5 in the protocol used for generating four (of eight) non-equilibrium intermolecular angles for the water dimer. As described in the text, a local reference frame (x-axis: solid blue arrow; y-axis: solid yellow arrow; z-axis: solid red arrow) is defined with respect to the characteristic intermolecular interaction vector (dashed black line) between monomers A (red) and B (gray), as well as the principal axis on monomer A (solid black line). Preliminary geometries for the first four non-equilibrium intermolecular angles are then obtained by rotating A around the x- and y-axes on B by ±θ. For clarity, the inset to STEP 5 also provides a view down the z-axis of the corresponding non-equilibrium geometries. To obtain preliminary geometries for the remaining four non-equilibrium intermolecular angles, this procedure is repeated after swapping the monomer labels. See Sec. II B for more details.

C. Computational Details

Intermolecular interaction energies ($E_{\rm int}$) for each of the 7,763 intermolecular complexes in NENCI-2021 were computed via

$$E_{\rm int} = E_{AB} - E_{A} - E_{B}, \qquad (1)$$

in which $E_{AB}$ is the total energy of the dimer and $E_{A}$ ($E_{B}$) is the total energy of monomer A (B). As mentioned above, all monomers were kept fixed at their optimized geometries, and the counterpoise correction of Boys and Bernardi was applied to minimize basis set superposition error (BSSE).

Unless otherwise specified, Dunning's correlation consistent basis sets (with and without diffuse functions), namely cc-pVXZ and aug-cc-pVXZ (with X = D, T, Q), along with the frozen core (FC) approximation were used for all atoms except Li and Na. To provide a more accurate description of the core/valence electrons in the cation-π complexes, the cc-pwCVXZ and aug-cc-pwCVXZ basis sets were used for Li and Na in conjunction with the following modified FC approximation: Li⁺ = 1s² (no core) and Na⁺ = [He]2s²2p⁶ ([He] core). All calculations employed the resolution-of-the-identity (RI) or density-fitting (DF) approximation during self-consistent field (SCF) calculations at the mean-field Hartree-Fock (HF) level as well as during post-HF calculations to account for electron correlation effects; the RI/DF approximation has been shown to introduce negligible errors when computing intermolecular interaction energies. Whenever available, the corresponding JKFIT and RI auxiliary basis sets were used in conjunction with each primary (atomic orbital) basis set, i.e., cc-pVXZ-JKFIT/cc-pVXZ-RI were used with cc-pVXZ, and aug-cc-pVXZ-JKFIT/aug-cc-pVXZ-RI were used with aug-cc-pVXZ.
For the cation-π complexes, the def2-aQZVPP-JKFIT/def2-aQZVPP-RI auxiliary basis sets (which are some of the largest available auxiliary basis sets) were taken from the MOLPRO basis set library and used in conjunction with cc-pwCVXZ and aug-cc-pwCVXZ for Li and Na. Throughout this work, we use the abbreviation aXZ to denote the following basis set usage: aug-cc-pVXZ (with aug-cc-pVXZ-JKFIT/aug-cc-pVXZ-RI) for {H, C, N, O, F, S, P, Cl, Br}; aug-cc-pwCVXZ (with def2-aQZVPP-JKFIT/def2-aQZVPP-RI) for {Li, Na}. We also use the abbreviation haXZ (i.e., heavy-aug-cc-pVXZ, also known as jul-cc-pVXZ) to mean: cc-pVXZ (with cc-pVXZ-JKFIT/cc-pVXZ-RI) for {H}, aug-cc-pVXZ (with aug-cc-pVXZ-JKFIT/aug-cc-pVXZ-RI) for {C, N, O, F, S, P, Cl, Br}, and aug-cc-pwCVXZ (with def2-aQZVPP-JKFIT/def2-aQZVPP-RI) for {Li, Na}.

For each molecular dimer and non-equilibrium intermolecular angle, the corresponding optimal intermolecular distance (1.0×) was obtained via a constrained minimization of $E_{\rm int}$ at the BSSE-corrected MP2/cc-pVTZ level (see Eq. (1)); for the molecular dimers not included in the original S66 database, the same procedure was also used to obtain the optimal intermolecular distance for the equilibrium intermolecular angle. In practice, this was accomplished by computing $E_{\rm int}$ for a series of dimer geometries in which monomer A (and/or B) was rigidly translated along the characteristic intermolecular interaction vector (see Sec. II B), and then locating the minimum value along the corresponding cubic spline interpolant.

Benchmark $E_{\rm int}$ values in NENCI-2021 were obtained using Eq. (1) with all dimer ($E_{AB}$) and monomer ($E_{A}$ and $E_{B}$) contributions computed using the "gold standard" CCSD(T) method extrapolated to the complete basis set (CBS) limit, i.e.,

$$E^{\rm CCSD(T)/CBS} \equiv E^{\rm MP2/CBS} + \delta E^{\rm CCSD(T)/haTZ}. \qquad (2)$$

In this expression, the CBS-extrapolated MP2 total energy,

$$E^{\rm MP2/CBS} \equiv E^{\rm MP2/a(TQ)Z} = E^{\rm HF/aQZ} + E^{\rm MP2/a(TQ)Z}_{\rm corr}, \qquad (3)$$

was obtained using the two-point extrapolation procedure of Halkier et al. on the MP2 correlation energy, namely,

$$E^{\rm MP2/a(XY)Z}_{\rm corr} = \frac{X^{3} E^{\rm MP2/aXZ}_{\rm corr} - Y^{3} E^{\rm MP2/aYZ}_{\rm corr}}{X^{3} - Y^{3}} \qquad (4)$$

with X = 3 (aTZ) and Y = 4 (aQZ). The so-called "delta" CCSD(T) correction,

$$\delta E^{\rm CCSD(T)/haTZ} = E^{\rm CCSD(T)/haTZ} - E^{\rm MP2/haTZ}, \qquad (5)$$

was computed using the haTZ basis set. The accuracy of this scheme for computing $E_{\rm int}$, in particular for intermolecular complexes with particularly close contacts (i.e., 0.7× the equilibrium intermolecular separation), is critically assessed below in Sec. III C.

The energy decomposition analysis scheme (and classification of intermolecular binding motifs) provided in Sec. III B was based on calculations at the SAPT2+/aDZ level of theory, the so-called "silver standard" of SAPT.

FIG. 3. Normalized probability density functions (PDFs) of the benchmark $E_{\rm int}$ values in the NENCI-2021 database as a function of the intermolecular distance (with two of the scaled intermolecular distances omitted for clarity). The $E_{\rm int}$ values in NENCI-2021 range from strongly attractive to strongly repulsive (about +186 kcal/mol for the most repulsive complex). Insets display the peaks of three of the PDFs as well as the (positive $E_{\rm int}$) tails of two PDFs.
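The composite scheme in Eqs. (1)–(5) amounts to a few lines of arithmetic, sketched below in Python. This is a minimal illustration and not the authors' workflow: the function and argument names are hypothetical, the cubic exponents are those of the standard two-point formula of Halkier et al., and the caller is assumed to supply energies in consistent units for the same geometry.

```python
def extrapolate_corr(e_corr_x: float, e_corr_y: float, x: int = 3, y: int = 4) -> float:
    """Two-point CBS extrapolation of a correlation energy, Eq. (4).

    x and y are the cardinal numbers of the two basis sets
    (3 for aTZ and 4 for aQZ in the scheme described in the text).
    """
    return (x**3 * e_corr_x - y**3 * e_corr_y) / (x**3 - y**3)


def ccsdt_cbs(
    e_hf_aqz: float,
    e_mp2_corr_atz: float,
    e_mp2_corr_aqz: float,
    e_ccsdt_hatz: float,
    e_mp2_hatz: float,
) -> float:
    """Composite CCSD(T)/CBS energy, Eqs. (2), (3), and (5)."""
    # Eq. (3): HF/aQZ plus the extrapolated MP2 correlation energy.
    e_mp2_cbs = e_hf_aqz + extrapolate_corr(e_mp2_corr_atz, e_mp2_corr_aqz)
    # Eq. (5): "delta" CCSD(T) correction in the smaller haTZ basis.
    delta = e_ccsdt_hatz - e_mp2_hatz
    # Eq. (2): sum of the two pieces.
    return e_mp2_cbs + delta


def interaction_energy(e_ab: float, e_a: float, e_b: float) -> float:
    """Supermolecular interaction energy, Eq. (1)."""
    return e_ab - e_a - e_b
```

Since every step is linear in the energies, the scheme can be applied either to the dimer and monomer total energies separately (followed by Eq. (1)) or directly to the interaction-energy components; both routes give the same $E_{\rm int}$.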

All calculations in this work were performed using the Psi4 (v1.2) software program. During all HF calculations, the SCF convergence parameters were set to $1.0 \times 10^{-8}$ in the total energy (e_convergence = 1E-8) and $1.0 \times 10^{-8}$ in the root-mean-square DIIS error (d_convergence = 1E-8). For all CCSD(T) calculations, the CCSD convergence parameters were set to $1.0 \times 10^{-6}$ in the total energy (e_convergence = 1E-6) and $1.0 \times 10^{-5}$ in the residual of the t-amplitudes (r_convergence = 1E-5).

D. Obtaining the NENCI-2021 Database

A single zip file containing the Cartesian coordinates of the 7,763 intermolecular complexes in NENCI-2021 (in xyz format) is provided in the Supplementary Material. The properties of each monomer (i.e., charge, multiplicity, number of atoms), the corresponding benchmark $E_{\rm int}$ value, as well as the CCSD(T)/CBS and SAPT energetic components can be found in the comment line of each xyz file (see the README file for additional details).
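For readers who want to process these files programmatically, a minimal reader for the standard xyz layout (atom count, comment line, then one atom per line) might look as follows. This is a sketch under stated assumptions: the function name `read_xyz` is hypothetical, and because the field layout of the comment line is documented only in the README, the comment is returned as a raw string rather than parsed.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]


def read_xyz(text: str) -> Tuple[str, List[Atom]]:
    """Parse the contents of one xyz file.

    Returns the raw comment line (second line of the file) and a list of
    (element symbol, x, y, z) tuples. The comment line is NOT interpreted
    here; its layout should be taken from the database README.
    """
    lines = text.strip("\n").splitlines()
    n_atoms = int(lines[0].split()[0])
    comment = lines[1] if len(lines) > 1 else ""
    atoms: List[Atom] = []
    for line in lines[2 : 2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
    return comment, atoms
```

A typical use would be `comment, atoms = read_xyz(open(path).read())`, after which the comment string can be split according to the README.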

III. PROPERTIES OF THE NENCI-2021 DATABASE

A. Statistical Analysis of Intermolecular Interaction Energies and Closest Intermolecular Contacts

A well-balanced database of intermolecular interac-tions should have a wide range of E int values, and this isindeed the case for nenci- , as evidenced by the nor-malized E int distributions provided in Fig. 3. With E int values ranging from − . kcal/mol to +186 . kcal/mol,the benchmark intermolecular interaction energies in nenci- span . kcal/mol. In general, the mostattractive (most negative) E int values in nenci- cor-respond to charged intermolecular complexes that tendto be at (or close) to their equilibrium geometries. Forinstance, the single most attractive intermolecular com-plex in nenci- (with E int = − . kcal/mol) corre-sponds to the Li + · · · benzene ion- π system at its equi-librium geometry ( i.e. , with Li + located above the cen-ter of the benzene ring, see Sec. II B). In fact, the topten most attractive intermolecular interactions in nenci- correspond to the various Li + · · · π complexes atslightly diﬀerent (but close to equilibrium) intermolec-ular distances and angles; these are followed by theionic H O · · · HPO hydrogen-bonded complexes (witha minimum E int = − . kcal/mol) and the Na + · · · π complexes (with a minimum E int = − . kcal/mol). Ingeneral, the most repulsive (most positive) E int valuesin nenci- correspond to intermolecular complexes inwhich the monomers are separated by the shortest dis-tance ( . × ) and rotated away from their equilibriumintermolecular angle, as both of these geometric pertur-bations lead to a rapid increase in the exponentially re-pulsive steric contribution to the interaction energy. Forinstance, the single most repulsive intermolecular com-plex in nenci- (with E int = +186 . kcal/mol) cor-responds to the dimethyl sulfoxide (DMSO) dimer sep-arated by . × the equilibrium intermolecular distanceand rotated to a non-equilibrium angle; in fact, thisdimer has the closest intermolecular contact in the entiredatabase (with d H ··· H = 0 . Å, vide infra ). 
Other sub-stantially repulsive intermolecular complexes in nenci- include the Na + · · · triﬂuorotriazine ion- π system(with a maximum E int = +118 . kcal/mol) and anotherDMSO dimer (with E int = +112 . kcal/mol), both ofwhich were characterized by a . × intermolecular sepa-ration and a non-equilibrium intermolecular angle.The mean and median E int values in nenci- are − . kcal/mol and − . kcal/mol, respectively, whichcorrespond to typical interaction energies found in weaklybound molecular dimers. These statistical measuresare primarily governed by the (relatively) large num-ber of intermolecular complexes in nenci- that con-tain monomers in non-equilibrium (angular) orientations.Such geometric perturbations tend to nullify the ener-getic stabilization provided by directional intermolecularbinding motifs ( e.g. , single- and double-hydrogen bonds,dipole-dipole interactions, etc), and often result in com-plexes with weakly attractive E int values. Broken downby the scaled intermolecular distance, the mean (median) E int values are: +21 . ( +11 . ) kcal/mol for . × , +0 . ( +0 . ) kcal/mol for . × , − . ( − . ) kcal/mol for . × , − . ( − . ) kcal/mol for . × , − . ( − . ) kcal/molfor . × , − . ( − . ) kcal/mol for . × , and − . ( − . ) kcal/mol for . × . In total, nenci- con-tains , attractive ( E int < ) and , repulsive( E int > ) intermolecular complexes, and the crossover0from attractive to repulsive E int values typically occursaround . × the equilibrium intermolecular distance. Asone might expect, the proportion of attractive inter-molecular interactions in nenci- quickly diminishesas the distance between monomers decreases; brokendown again by the scaled intermolecular distance, weﬁnd the percentage of attractive (repulsive) E int valuesare: . ( . ) for . × , . ( . ) for . × , . ( . ) for . × , . ( . ) for . × , . ( . ) for . × , . ( . ) for . × , and . ( . ) for . × . 
Quite interestingly, there are still anumber ( N = 38 ) of attractive intermolecular complexesat the . × scaled intermolecular distance, which gener-ally correspond to strongly favorable dimers such as theLi + · · · π complexes discussed above. In the same breath,there are also quite a few ( N = 19 ) repulsive complexesat the equilibrium ( . × ) distance—some of which evenoccur at the corresponding equilibrium angle, e.g. , thecation- and anion- π complexes involving triﬂuorotriazineand benzene, respectively.A well-balanced database of intermolecular interac-tions should also sample a wide range of intermolecularatom-pair distances ( i.e. , interatomic distances betweenthe atoms on molecule A and the atoms on molecule B ). Again, this is indeed the case for nenci- , andis demonstrated by the series of normalized PDFs inFig. 4, which quantify a representative set of atom-pairdistances ( i.e. , O · · · H , N · · · H , H · · · H , and C · · · H ) asa function of intermolecular separation. In this ﬁgure,we chose to focus on the O · · · H , N · · · H , H · · · H , and C · · · H intermolecular atom-pair distances, as the ﬁrsttwo are representative of hydrogen-bonded systems andthe last two are the relevant interatomic distances fornon-bonded complexes in general. Since nenci- wasdesigned with a particular emphasis on close intermolecu-lar contacts, we focus our discussion on the short-distancesectors in these PDFs. As discussed above in the Intro-duction, such close intermolecular contacts are impor-tant in a number of applications, and pose signif-icant diﬃculty for both WFT and DFT methods alike(see paper-ii in this series), as both strongly attrac-tive and strongly repulsive intermolecular forces must beaccurately described to obtain a quantitatively correct E int value. As the intermolecular distance is reducedfrom . × to . 
× , the complexes in NENCI-2021 sample increasingly closer interatomic distances and begin to more appreciably populate the region inside the corresponding vdW envelope. In other words, a number of intermolecular atom-pair distances ($R_{AB}$) are less than the sum of the corresponding vdW radii.
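The criterion for a contact inside the vdW envelope (an atom-pair distance below the sum of the two vdW radii) can be checked with a short Python sketch. The radii below are Bondi values and are an assumption made for illustration; the radii used in this work may differ, and the function name `close_contacts` is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Atom = Tuple[str, float, float, float]

# Bondi vdW radii in Angstrom (assumed here for illustration only).
BONDI_RADII: Dict[str, float] = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def close_contacts(
    atoms_a: Sequence[Atom],
    atoms_b: Sequence[Atom],
    radii: Dict[str, float] = BONDI_RADII,
) -> List[Tuple[int, int, float]]:
    """Return (index in A, index in B, distance) for every intermolecular
    atom pair whose distance is below the sum of the two vdW radii."""
    hits = []
    for i, (sym_a, *r_a) in enumerate(atoms_a):
        for j, (sym_b, *r_b) in enumerate(atoms_b):
            d = math.dist(r_a, r_b)
            if d < radii[sym_a] + radii[sym_b]:
                hits.append((i, j, d))
    return hits
```

Only pairs with one atom on each monomer are tested, consistent with the definition of intermolecular atom-pair distances given above.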

A well-balanced database of intermolecular interactions should also sample a wide variety of different binding motifs. Here, we would again argue that this is the case for NENCI-2021, and demonstrate this point by the extensively populated ternary diagrams depicted in Fig. 5. Introduced by Kim et al. in the late 2000s, these ternary diagrams were constructed using a SAPT decomposition of $E_{\rm int}$ into the following four components for each intermolecular complex in NENCI-2021: $\varepsilon_{\rm Elst}$ (electrostatics, Elst), $\varepsilon_{\rm Exch}$ (exchange, Exch), $\varepsilon_{\rm Ind}$ (induction, Ind), and $\varepsilon_{\rm Disp}$ (dispersion, Disp), i.e., $E_{\rm int} \approx \varepsilon_{\rm SAPT} = \varepsilon_{\rm Elst} + \varepsilon_{\rm Exch} + \varepsilon_{\rm Ind} + \varepsilon_{\rm Disp}$. In particular, we performed this decomposition at the SAPT2+/aDZ level of theory, the so-called "silver standard" of SAPT, whose overall mean absolute error (MAE) has been benchmarked across the S22, HBC6, NBC10, and HSG databases. Unlike the "bronze standard" sSAPT0/jun-cc-pVDZ, which can underestimate the dispersion component in anion-π complexes by a sizable margin, the more sophisticated SAPT2+/aDZ method employed herein is expected to more accurately describe $\varepsilon_{\rm Disp}$ in the anion-π complexes present in NENCI-2021. As such, this SAPT level should be well-suited to provide a physically sound and semi-quantitative characterization of the binding motifs included in NENCI-2021.

In previously constructed databases of non-covalent interactions (e.g., S66 and S101), each intermolecular complex was typically classified into one of three categories, based on whether $E_{\rm int} \approx \varepsilon_{\rm SAPT}$ was dominated by the $\varepsilon_{\rm Elst}$ component (Elst-bound), the $\varepsilon_{\rm Disp}$ component (Disp-bound), or a mixture (Mix) thereof (Mix-bound). Since the $\varepsilon_{\rm Ind}$ component tended to be small in these complexes, the analogous and fourth Ind-bound category was deemed to be largely unnecessary. With the addition of 1,400 ion-π complexes (in particular, the cation-π systems), the scope of the SAPT decomposition analysis is substantially wider in NENCI-2021, and now encompasses the Ind-bound regime. As such, we propose a natural extension of the traditional three-category classification scheme made popular by Hobza et al. and Sherrill et al. to include the Ind-bound category. To do so, we construct a three-dimensional feature space defined by the $\varepsilon_{\rm Disp}/\varepsilon_{\rm Elst}$, $\varepsilon_{\rm Ind}/\varepsilon_{\rm Disp}$, and $\varepsilon_{\rm Elst}/\varepsilon_{\rm Ind}$ ratios as follows:

STEP 1. To start, a single dimension of the feature space is chosen as the basis for constructing an initial sub-classification scheme. Although this choice is arbitrary, we will start with the $\varepsilon_{\rm Disp}/\varepsilon_{\rm Elst}$ ratio, as this selection is tantamount to constructing the aforementioned three-category classification scheme (i.e., Elst-bound, Disp-bound, or Mix-bound). For illustrative purposes, a ternary diagram ($T_{\rm ED}$) depicting this initial sub-classification scheme is plotted in the left panel of Fig. 5.

STEP 2. Intermolecular complexes with $|\varepsilon_{\rm Disp}/\varepsilon_{\rm Elst}| > \eta$ are sub-classified as Disp-bound (shaded blue regions in $T_{\rm ED}$), while intermolecular complexes with $|\varepsilon_{\rm Elst}/\varepsilon_{\rm Disp}| > \eta$ are sub-classified as Elst-bound (shaded green regions in $T_{\rm ED}$). If one stopped at this point, set $\eta = 2$, and classified all other cases as Mix-bound, this initial sub-classification scheme (based on the single $\varepsilon_{\rm Disp}/\varepsilon_{\rm Elst}$ feature) would be equivalent to the three-category classification scheme described above. Since the value of $\eta$ is somewhat arbitrary, we have chosen to employ a slightly smaller value ($\eta = 3/2$) in the classification scheme introduced in this work; with this choice for $\eta$, fewer intermolecular complexes will be classified as Mix-bound (vide infra).

STEP 3. To go beyond this three-category classification scheme, STEP 2 is repeated for the two remaining dimensions of the feature space. Selection of the $\varepsilon_{\rm Ind}/\varepsilon_{\rm Disp}$ feature generates the $T_{\rm ID}$ ternary diagram in Fig. 5 and the analogous sub-classification of intermolecular complexes as: Disp-bound (if $\varepsilon_{\rm Disp}/\varepsilon_{\rm Ind} > \eta$; shaded blue regions in $T_{\rm ID}$) or Ind-bound (if $\varepsilon_{\rm Ind}/\varepsilon_{\rm Disp} > \eta$; shaded red regions). Similarly, the $\varepsilon_{\rm Elst}/\varepsilon_{\rm Ind}$ feature yields the final required sub-classification scheme: Elst-bound (if $|\varepsilon_{\rm Elst}/\varepsilon_{\rm Ind}| > \eta$; shaded green regions in $T_{\rm EI}$) or Ind-bound (if $|\varepsilon_{\rm Ind}/\varepsilon_{\rm Elst}| > \eta$; shaded red regions). Here, we note in passing that the absolute value (magnitude) must be used for all sub-classifications based on $\varepsilon_{\rm Elst}$, as the sign of the Elst component can be positive or negative.
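The pairwise tests of STEPS 2 and 3, combined with the "same label twice" rule stated in STEP 4 below, can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, $\eta$ is left as a required argument, and magnitudes are used for all three ratios, which matches the text whenever the induction and dispersion components are both negative (an assumption of this sketch).

```python
from collections import Counter
from typing import Tuple


def sub_classify(eps_a: float, eps_b: float, label_a: str, label_b: str, eta: float) -> str:
    """One pairwise test: label_a if |eps_a / eps_b| > eta,
    label_b if |eps_b / eps_a| > eta, otherwise 'Mix'.
    Written without division so that zero components are handled."""
    if abs(eps_a) > eta * abs(eps_b):
        return label_a
    if abs(eps_b) > eta * abs(eps_a):
        return label_b
    return "Mix"


def sub_classifications(elst: float, ind: float, disp: float, eta: float) -> Tuple[str, str, str]:
    """Labels from the three sub-classification schemes (T_ED, T_ID, T_EI)."""
    return (
        sub_classify(elst, disp, "Elst", "Disp", eta),  # T_ED
        sub_classify(ind, disp, "Ind", "Disp", eta),    # T_ID
        sub_classify(elst, ind, "Elst", "Ind", eta),    # T_EI
    )


def classify(elst: float, ind: float, disp: float, eta: float) -> str:
    """Four-category label: a label assigned twice is retained, else 'Mix'."""
    label, count = Counter(sub_classifications(elst, ind, disp, eta)).most_common(1)[0]
    return label if count >= 2 and label != "Mix" else "Mix"
```

Each category name appears in exactly two of the three pairwise tests, so at most one non-Mix label can be assigned twice and the final label is unambiguous.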

STEP 4.

To arrive at our extended (i.e., four-category) classification scheme, each intermolecular complex that

FIG. 5. (Left) Geometric depiction of the extended four-category classification scheme (based on a SAPT decomposition of $E_{\rm int}$) used to classify each intermolecular complex in NENCI-2021 as: Elst-bound (E, green), Ind-bound (I, red), Disp-bound (D, blue), or Mix-bound. As described in the main text, this classification scheme can be represented by a fused ternary diagram ($T_{\rm EID}$) which has been colored according to the following rule: each intermolecular complex that has been assigned the same category (color) in any two of the three sub-classification schemes ($T_{\rm ED}$, $T_{\rm ID}$, $T_{\rm EI}$) retains that color in $T_{\rm EID}$; otherwise, the complex is classified as Mix-bound (white). (Middle/Right) Ternary diagrams depicting the breakdown of the SAPT2+/aDZ intermolecular interaction energies of each complex in NENCI-2021 according to the contributions from electrostatics ($\varepsilon_{\rm Elst}$), induction ($\varepsilon_{\rm Ind}$), and dispersion ($\varepsilon_{\rm Disp}$). Since $\varepsilon_{\rm Elst}$ can be positive (Elst(+)) or negative (Elst(-)), these plots comprise two ternary diagrams (one for Elst(+) and one for Elst(-)) that have been fused together. In these ternary diagrams, the shaded polygons are used to reflect the four-category classification scheme described above, i.e., Elst-bound (green), Ind-bound (red), and Disp-bound (blue); complexes that are not located in any one of these regions are Mix-bound. In the

Middle panel, each point has been colored using an RGB scheme with values givenby: {| ε Ind | / ( | ε Elst | + | ε Ind | + | ε Disp | ) , | ε Elst | / ( | ε Elst | + | ε Ind | + | ε Disp | ) , | ε Disp | / ( | ε Elst | + | ε Ind | + | ε Disp | ) } . In the Right panel,each point is colored according to the scaled intermolecular distance. has been sub-classiﬁed (in STEP 2 and STEP 3) withthe same label twice retains that label; otherwise, theintermolecular complex is classiﬁed as Mix-bound. Thisﬁnal classiﬁcation scheme is graphically depicted in thecolored T EID ternary diagram in Fig. 5, which is assem-bled as an “outer sum” over the colored ternary diagramscorresponding to the sub-classiﬁcation schemes, i.e. , T EID = T ED ⊕ T ID ⊕ T EI , in which the colors of T EID aredetermined according to the rules described above.Based on this extended four-category classiﬁcationscheme, the equilibrium intermolecular complexesin nenci- are comprised of ( . ) Elst-bound, ( . ) Ind-bound, ( . ) Disp-bound, and ( . ) Mix-bound dimers. When including all non-equilibrium intermolecular distances and angles, the en-tire nenci- database contains , ( . ) Elst-bound, ( . ) Ind-bound, , ( . ) Disp-bound, and , ( . ) Mix-bound intermolecularcomplexes. Here, we note in passing that this observeddecrease in the percentage of Ind-bound complexes is par-tially due to the inclusion of ﬁve (instead of nine) inter-molecular angles for each ion- π complex due to symmetryconsiderations (see Sec. II B). As such, the intermolecularcomplexes in nenci- largely span the entire ternarydiagram in Fig. 5 and therefore contain a diverse array ofbinding motifs; as such, we hope that nenci- will be used to critically examine (and potentially improve) theperformance of theoretical models when faced with thechallenge of simultaneously describing diverse NCI typeson the same footing ( i.e. 
, point ( ii ) in the Introduction).Here, we note that the apparent bias towards Elst-bound complexes in nenci- is an unavoidable con-sequence of sampling short-range intermolecular sepa-rations; at such distances, there is often a substantialamount of orbital/density overlap between monomers,and charge penetration eﬀects (in ε Elst )tend to be the dominant contribution (over ε Ind and ε Disp ) to ε SAPT . For instance, a signiﬁcant majority( . ) of the intermolecular complexes at . × areclassiﬁed as Elst-bound while approximately half that( . ) of the equilibrium dimers share this label.This increase in the relative number of Elst-bound com-plexes at shorter intermolecular separations is clearlyreﬂected in the ternary diagram in the right panel ofFig. 5 as well as the percentage of Elst-bound com-plexes when broken down by the scaled intermoleculardistance, i.e. , . ( . × ), . ( . × ), . ( . × ), . ( . × ), . ( . × ), . ( . × ), and . ( . × ). In general, many complexes that are Disp-boundat larger intermolecular distances become Elst-bound orMix-bound at reduced separations where short-range ef-fects ( e.g. , charge penetration) become more signiﬁcant.On the other hand, the Ind-bound complexes (which are3primarily comprised of cation- π interactions) tend to re-main Ind-bound even at reduced intermolecular separa-tions since charge penetration eﬀects are substantially re-duced when one of the monomers is a monovalent cation( e.g. , Li + or Na + ). For reference, the respective per-centages of Ind-bound, Disp-bound, or Mix-bound com-plexes as a function of the scaled intermolecular distanceare: . , . , . for . × , . , . , . for . × , . , . , . for . × , . , . , . for . × , . , . , . for . × , . , . , . for . × , and . , . , . for . × .Before moving on to consider the error/uncertainty inthe E int values in nenci- , we note in passing that thepositive electrostatics (Elst(+)) region of the ternary di-agram in Fig. 5 is not sampled as well as the (Elst(-)) re-gion. 
However, NENCI-2021 does contain a non-negligible number of intermolecular complexes with $\varepsilon_{\rm Elst} > 0$. As mentioned above, such complexes are primarily found among the cation-π complexes, where the degree of orbital overlap in the dimer (and hence the energetic stabilization due to charge penetration effects) is largely suppressed; hence, intermolecular complexes with repulsive $\varepsilon_{\rm Elst}$ values are quite rare and may be adequately accounted for in NENCI-2021.

C. Error Analysis and Critical Assessment of the Benchmark Intermolecular Interaction Energies

In addition to being extensive in size and scope, we would also argue that a well-balanced database of intermolecular interactions should contain a reliable estimate of the error/uncertainty present in the computed $E_{\rm int}$ values. For ab initio WFT methods, the two primary sources of error when computing $E_{\rm int}$ are: (i) incompleteness in the one-particle basis set (i.e., basis set incompleteness error (BSIE)) and (ii) the approximate treatment of the electron correlation energy ($E_{\rm corr}$). Since the mean-field HF contribution to $E_{\rm int}$ converges quickly with respect to the underlying basis set, we expect that the BSIE at the $E^{\rm HF/aQZ}_{\rm int}$ level will be negligible when compared to the BSIE in the post-HF correlation energy contributions in Eq. (2). As depicted in Eq. (3), the BSIE in the MP2 correlation energy is largely mitigated using the two-point extrapolation scheme for approximating the MP2/CBS limit provided in Eq. (4). Although the $\delta E^{\rm CCSD(T)}$ correction converges with respect to the basis set significantly faster than $E^{\rm MP2}_{\rm corr}$ or $E^{\rm CCSD(T)}_{\rm corr}$ alone, the BSIE in this term is generally the largest remaining source of error for extrapolation schemes such as that outlined in Eqs. (2)–(5).

To mitigate this error (and still remain computationally feasible when generating such a large number of intermolecular interaction energies), this contribution was computed using an augmented Dunning-style triple-ζ (haTZ) basis set in NENCI-2021 (cf. Eq. (5)).

As such, we will primarily focus on the remaining BSIE in the $\delta E^{\rm CCSD(T)/haTZ}$ contribution to $E_{\rm int}$ when critically assessing the accuracy of the intermolecular interaction energies in NENCI-2021. To do so, we will compare our $E_{\rm int}$ values against two different references. As a first reference value, we computed the $\delta E^{\rm CCSD(T)}$ correction in Eq. (5) using a larger (and substantially more expensive) augmented quadruple-ζ (aQZ) basis set, i.e.,

$$E^{\rm REF1} = E^{\rm MP2/CBS} + \delta E^{\rm CCSD(T)/aQZ} = E^{\rm HF/aQZ} + E^{\rm MP2/a(TQ)Z}_{\rm corr} + \delta E^{\rm CCSD(T)/aQZ}, \qquad (6)$$

in which $E^{\rm MP2/CBS}$ was computed using Eqs. (3)–(4). As a second and alternative reference, we simply replaced the $\delta E^{\rm CCSD(T)/haTZ}$ correction with a direct two-point extrapolation of $E^{\rm CCSD(T)}$ using the aTZ and aQZ basis sets, i.e.,

$$E^{\rm REF2} = E^{\rm CCSD(T)/a(TQ)Z} = E^{\rm HF/aQZ} + E^{\rm CCSD(T)/a(TQ)Z}_{\rm corr}. \qquad (7)$$

By including CCSD(T) calculations in the much larger aQZ basis set, both of these reference values directly probe the BSIE in the CCSD(T) contribution, and are expected to be more reliable than the $E_{\rm int}$ values in the NENCI-2021 database.

The error of the CCSD(T)/CBS scheme outlined in Eqs. (2)–(5) with respect to both $E^{\rm REF1}$ and $E^{\rm REF2}$ is shown in Fig. 6 for a select subset of intermolecular complexes in NENCI-2021. Plotted as a function of the scaled intermolecular distance (at the equilibrium angle, unless otherwise noted), this subset of intermolecular complexes was chosen to cover the wide array of binding motifs in NENCI-2021, and includes examples of Elst-, Ind-, Disp-, and Mix-bound systems, i.e., single (H₂O···H₂O, MeNH₂···MeNH₂) and double (AcOH···AcOH) hydrogen bonds, dipole-dipole (MeF···MeF), π-π stacking (BZ···BZ PD), CH-π (BZ···BZ TS), as well as cation-π (Na⁺···BZ) and anion-π (F⁻···BZ) interactions. As seen in Fig. 6, the $E_{\rm int}$ values in NENCI-2021 are generally within ± . kcal/mol of both $E^{\rm REF1}$ and $E^{\rm REF2}$, and the errors with respect to these references tend to increase in magnitude at reduced intermolecular distances. The worst-case scenarios among this subset include the acetic acid dimer (AcOH···

AcOH, double hydrogen-bonded) and the C parallel-displaced (PD) benzenedimer (BZ · · · BZ PD, π - π stacking), with errors in bothsteadily increasing in magnitude as the intermolecularseparation is decreased; at . × , we report errors of +0 . kcal/mol (AcOH · · · AcOH) and − . kcal/mol(BZ · · · BZ PD) with respect to E REF1 ( +0 . kcal/moland − . kcal/mol when compared to E REF2 , vide in-fra ). In these cases, the increased error is most likely dueto the relatively larger amount of orbital overlap betweenthese monomers at reduced intermolecular separations,where the interplay between short-range intermolecularinteractions ( i.e. , charge penetration, Pauli repulsion,many-body exchange-correlation eﬀects, etc) becomes in-creasingly more challenging to describe in an accurate4 . . . . . Scaled Intermolecular Distance − . − . − . . . . . E rr o r( i n k c a l / m o l ) REF1 . . . . . Scaled Intermolecular Distance

REF2

AcOH · · ·

AcOHH O · · · H O MeF · · ·

MeFMeNH · · · MeNH BZ · · · BZ PDBZ · · ·

BZ TS Na + · · · BZF − · · · BZ ± / molDMSO · · · DMSO

FIG. 6. Errors (in kcal/mol) in the nenci- E int values (computed using the E CCSD(T) / CBS extrapolation scheme inEqs. (2)–(5)) with respect to E REF1 (Eq. (6); left ) and E REF2 (Eq. (7); right ) for a representative set of intermolecularcomplexes. Plotted as a function of the scaled intermolecular distance (with the . × and . × distances omitted for clarity),all intermolecular complexes (with the exception of DMSO · · · DMSO, black stars) were kept at their equilibrium angle. Errorswith respect to E REF1 and E REF2 were computed as E CCSD(T) / CBS − E REF1 and E CCSD(T) / CBS − E REF2 , respectively. Basedon this error proﬁle (see main text), the errors in the nenci- E int values are ± . kcal/mol on average, but can be as largeas ± . − . kcal/mol ( i.e. , ± kJ/mol, dashed green lines) for some complexes at reduced ( i.e. , . × − . × ) intermolecularseparations. and reliable fashion. This trend is also reﬂected in theerror proﬁles corresponding to the two diﬀerent BZ · · · BZdimers in Fig. 6, where one can see that the error in thePD dimer (more orbital overlap) is noticeably larger inmagnitude than the error in the C T-shaped (TS) dimer(less orbital overlap) at all intermolecular separations.In the same breath, we also note that the error withrespect to E REF1 (or E REF2 ) is non-trivial in general,and does not necessarily follow a direct/straightforwardcorrelation with closest intermolecular contacts and/orthe sign/magnitude of E int . For example, the errors forAcOH · · · AcOH ( E int = +12 . kcal/mol) and BZ · · · BZPD ( E int = +51 . kcal/mol) at . × are both largerthan that found in the intermolecular complex with thelargest (most repulsive) E int value and closest O · · · H dis-tance in nenci- —a non-equilibrium conﬁguration ofDMSO · · ·

DMSO with E int = +186 . kcal/mol (whoseerrors with respect to E REF1 and E REF2 are depicted bystars in Fig. 6).From this analysis, we believe that the errors in the nenci- E int values are mostly within ± . kcal/mol,but can be as large as . − . kcal/mol ( i.e. , ≈ kJ/mol)for certain systems at reduced intermolecular separa-tions. Here, we note in passing that the δE CCSD(T) / haTZ correction used in nenci- provides a signiﬁcant im-provement over δE CCSD(T) / aDZ , and yields nearly iden-tical E int values when compared to the more expensive δE CCSD(T) / aTZ approach; this is shown in Fig. S1 andagain emphasizes the need for triple- ζ basis sets when em-ploying the δE CCSD(T) correction scheme. When con-sidering the largest errors in Fig. 6, i.e. , AcOH · · ·

AcOHand BZ · · ·

BZ PD, one can see that the errors with re-spect to E REF1 and E REF2 diﬀer by ≈ . kcal/mol;as such, the estimated average error in nenci- ( ± . kcal/mol) is comparable to the diﬀerence betweenusing E REF1 or E REF2 as the reference for E int . Gener-ally speaking, it is not clear which of these two quantitiessupplies the more accurate reference for E int ; however, ithas been pointed out by Sherrill and co-workers thatthe δE CCSD(T) correction does not converge monotoni-cally towards the CBS limit, which implies that E REF1 might in fact be a slightly better reference value than E REF2 .As mentioned above, the other primary source of errorwhen computing E int using approximate ab initio WFTmethods is the necessarily incomplete treatment of theelectron correlation energy; while post-CCSD(T) correc-tions tend to be small for equilibrium intermolecular in-teraction energies ( i.e. , < . kcal/mol), whether or notsuch corrections become more substantial at reduced in-termolecular separations still remains unanswered. Withincreasingly unfavorable scaling with both system andbasis set size, such post-CCSD(T) calculations ( i.e. ,CCSDT, CCSDT(Q), CCSDTQ, etc) are computation-5ally prohibitive and could have only been performedon: ( i ) the smaller/smallest systems in nenci- , butwith suﬃciently large basis sets (of at least triple- ζ orquadruple- ζ quality) or ( ii ) the larger/largest systems in nenci- , but with reduced and insuﬃciently large ba-sis sets ( i.e. , double- ζ at best). Since neither of these ap-proaches would have provided an accurate and reliable es-timate of the post-CCSD(T) contributions to E int for thewide range of intermolecular complexes in nenci- , we chose to focus our eﬀorts above on critically assessingthe CCSD(T)/CBS scheme outlined in Eqs. (2)–(5) basedon a quantitative estimate of the remaining BSIE at theCCSD(T) level. Since an accurate and reliable predictionof E int for intermolecular complexes in the repulsive wall( i.e. 
, inside the vdW envelope) poses a substantive chal-lenge to state-of-the-art DFT and WFT methods (see paper-ii in this series), further benchmarking of thestandard CCSD(T)/CBS approach (possibly via stochas-tic CC or FCI methods) in this regime is anopen challenge for the community and will be of criticalimportance for the development of next-generation DFTfunctionals and ML-based intra-/inter-molecular interac-tion potentials. IV. CONCLUSIONS AND FUTURE DIRECTIONS

In this work, we present NENCI-2021: a large and comprehensive database of approximately 8,000 benchmark non-equilibrium non-covalent interaction energies for a diverse selection of intermolecular complexes of biological and chemical relevance with a particular emphasis on close intermolecular contacts. Designed to address the growing need for extensive high-quality quantum mechanical data in the chemical sciences, NENCI-2021 starts with the 101 molecular dimers in the widely used S66, S66x8, S66a8, and S101x7 databases, and extends the scope of these popular works in two directions. For one, NENCI-2021 includes 40 cation- and anion-π complexes, a fundamentally important class of NCIs that are found throughout nature and are among the strongest NCIs known. Secondly, NENCI-2021 systematically samples both equilibrium and non-equilibrium configurations on all 141 intermolecular PES by simultaneously varying the intermolecular distance (from 0.7×−1.1× the equilibrium separation) and intermolecular angle (including either five or nine angles for each distance, depending on symmetry considerations). As such, a wide range of intermolecular atom-pair distances are present in NENCI-2021, including a large number of close intermolecular contacts with atom pairs located inside their respective vdW envelope; these intermolecular complexes probe a number of different short-ranged NCIs (e.g., charge transfer and penetration, Pauli repulsion, many-body exchange-correlation effects, etc.), which are observed in many important chemical and biological systems, and pose an enormous challenge for molecular modeling. Computed at the CCSD(T)/CBS level of theory, the 7,763 benchmark E_{\rm int} values in NENCI-2021 range from − . kcal/mol (most attractive) to +186 . kcal/mol (most repulsive), with a total span of . kcal/mol and a mean (median) E_{\rm int} value of − . kcal/mol (− . kcal/mol).
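As a quick consistency check on the database size, the sampling grid stated in the abstract (seven distances per PES, nine angles per distance for the 101 dimers taken from S66 and S101, and five angles per distance for the 40 ion-π complexes) reproduces the quoted total of 7,763 interaction energies. The counting symbols N are introduced here for illustration only:

```latex
\begin{align*}
N_{\rm S66/S101}   &= 101 \times 7 \times 9 = 6363, \\
N_{\text{ion-}\pi} &= 40 \times 7 \times 5 = 1400, \\
N_{\rm total}      &= 6363 + 1400 = 7763.
\end{align*}
```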
A detailed SAPT-based analysis was used to confirm the diverse and comprehensive nature of the intermolecular binding motifs present in NENCI-2021, which includes a significant number of primarily induction-bound dimers and now spans all regions of the SAPT ternary diagram; this warranted a new four-category classification scheme that includes complexes primarily bound by electrostatics ( , ), induction ( ), dispersion ( , ), or mixtures thereof ( , ). Finally, a critical error analysis was performed on a representative set of intermolecular complexes, from which we estimate that the E_{\rm int} values in NENCI-2021 have a mean error of ± . kcal/mol and a maximum error of ± . − . kcal/mol for the most challenging cases.

For all of these reasons, we believe that the NENCI-2021 database is timely and well-suited for testing, training, and developing next-generation force fields, DFT and WFT methods, as well as ML-based potentials. An order-of-magnitude larger than any database of non-covalent interactions currently available, NENCI-2021 can be used for a variety of different purposes. For one, NENCI-2021 could be employed as a single database and used in its entirety. Alternatively, NENCI-2021 can be split into multiple different training and testing data sets (each containing a diverse sample of intermolecular binding motifs) and used for cross-validation studies and statistical error assessment. When used for such purposes, we note in passing that strong correlations will likely exist between different points on a given intermolecular PES; as such, we caution against separating such points between training and testing data sets to avoid issues associated with overfitting.

We end this manuscript with a brief discussion of several future research directions that could build off this work and potentially have an immediate impact in the field.
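The caution above about keeping all points from a given intermolecular PES on the same side of a training/testing split can be sketched as a group-wise split. This is a minimal illustration under stated assumptions and not a procedure prescribed by the authors: the record layout (a dictionary with a `dimer` key identifying the PES) and the function name `split_by_dimer` are hypothetical.

```python
import random
from collections import defaultdict


def split_by_dimer(records, test_fraction=0.2, seed=0):
    """Split records so that every point from one dimer PES lands in one set.

    `records` is a list of dicts with a hypothetical "dimer" key that
    identifies the intermolecular PES each data point belongs to.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["dimer"]].append(rec)

    dimer_ids = sorted(groups)           # deterministic order before shuffling
    random.Random(seed).shuffle(dimer_ids)

    # Hold out at least one dimer, but never all of them when more than one exists.
    n_test = max(1, round(test_fraction * len(dimer_ids)))
    if len(dimer_ids) > 1:
        n_test = min(n_test, len(dimer_ids) - 1)
    test_ids = set(dimer_ids[:n_test])

    train = [r for d in dimer_ids if d not in test_ids for r in groups[d]]
    test = [r for d in dimer_ids if d in test_ids for r in groups[d]]
    return train, test


if __name__ == "__main__":
    # Toy data: 5 hypothetical dimers, each with 7 distance indices.
    data = [
        {"dimer": f"dimer_{i}", "distance_index": k}
        for i in range(5)
        for k in range(7)
    ]
    train, test = split_by_dimer(data, test_fraction=0.2, seed=1)
    print(len(train), "training points;", len(test), "testing points")
```

Splitting on the dimer identifier rather than on individual points means that strongly correlated geometries from one PES never appear in both sets.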
For one, Paper II in this series (in preparation) will critically assess the accuracy and reliability of a large number of popular DFT and WFT methods when describing the diverse array of non-equilibrium non-covalent interactions in NENCI-2021, thereby identifying the strengths and weaknesses of established first-principles methods. A simple and straightforward extension of NENCI-2021 would target dimers with increased intermolecular distances (e.g., beyond . × the equilibrium separation), as benchmark E_{\rm int} values for such complexes could play an important role in testing and training ML methods for predicting molecular multipoles and polarizabilities, as well as addressing important unresolved questions regarding the treatment of long-range electrostatics in ML-based potentials. Other important research thrusts would focus on expanding NENCI-2021 to further address the three challenges introduced above: (i) the need to describe NCIs in large molecular and condensed-phase systems can be addressed with extensions that focus on large/complex systems and potentially include explicit solvent molecules; (ii) the need to describe the diverse types of NCIs on the same footing can be addressed by including NCI binding motifs that are either not found or underrepresented in NENCI-2021 (e.g., triple hydrogen bonds, quadrupole-quadrupole interactions, ionic bonds, etc.); (iii) the need to describe NCIs in equilibrium and non-equilibrium systems on the same footing can be addressed by including complexes at more extreme (reduced and increased) intermolecular separations and angles as well as complexes between monomers in non-equilibrium configurations.

ACKNOWLEDGMENTS

All authors thank David Sherrill for helpful scientific discussions and Destiny Malloy for creating an early prototype of Fig. 1. All authors acknowledge partial financial support from Cornell University through start-up funding. This material is based upon work supported by the National Science Foundation under Grant No. CHE-1945676. RAD also gratefully acknowledges financial support from an Alfred P. Sloan Research Fellowship. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available within the article and its supplementary material.


N. J. Singh, S. K. Min, D. Y. Kim, and K. S. Kim, “Com-prehensive energy analysis for various types of π -interaction,”J. Chem. Theory Comput. , 515–529 (2009). K. S. Thanthiriwatte, E. G. Hohenstein, L. A. Burns, and C. D.Sherrill, “Assessment of the performance of DFT and DFT-Dmethods for describing distance dependence of hydrogen-bondedinteractions,” J. Chem. Theory Comput. , 88–96 (2011). M. O. Sinnokrot and C. D. Sherrill, “High-accuracy quan-tum mechanical studies of π - π interactions in benzene dimers,”J. Phys. Chem. A , 10656–10668 (2006). A. L. Ringer, M. S. Figgs, M. O. Sinnokrot, and C. D. Sher-rill, “Aliphatic C-H/ π interactions: Methane-benzene, methane-phenol, and methane-indole complexes,” J. Phys. Chem. A ,10822–10828 (2006). C. D. Sherrill, T. Takatani, and E. G. Hohenstein, “An assess-ment of theoretical methods for nonbonded interactions: Com-parison to complete basis set limit coupled-cluster potential en-ergy curves for the benzene dimer, the methane dimer, benzene-methane, and benzene-H S,” J. Phys. Chem. A , 10146–10159 (2009).

Y. Geng, T. Takatani, E. G. Hohenstein, and C. D. Sherrill,“Accurately characterizing the π - π interaction energies of indole-benzene complexes,” J. Phys. Chem. A , 3576–3582 (2010). J. C. Faver, M. L. Benson, X. He, B. P. Roberts, B. Wang, M. S.Marshall, M. R. Kennedy, C. D. Sherrill, and K. M. Merz Jr,“Formal estimation of errors in computed absolute interactionenergies of protein-ligand complexes,” J. Chem. Theory Com-put. , 790–797 (2011). K. U. Lao, R. Schäﬀer, G. Jansen, and J. M. Herbert, “Ac-curate description of intermolecular interactions involving ionsusing symmetry-adapted perturbation theory,” J. Chem. TheoryComput. , 2473–2486 (2015). L. A. Burns, J. C. Faver, Z. Zheng, M. S. Marshall, D. G. A.Smith, K. Vanommeslaeghe, A. D. MacKerell, K. M. Merz, andC. D. Sherrill, “The biofragment database (BFDb): An open-data platform for computational chemistry analysis of noncova-lent interactions,” J. Chem. Phys. , 161727 (2017).

T. M. Parker and C. D. Sherrill, “Assessment of empirical mod-els versus high-accuracy ab initio methods for nucleobase stack-ing: Evaluating the importance of charge penetration,” J. Chem.Theory Comput. , 4197–4204 (2015). J. A. Rackers, Q. Wang, C. Liu, J.-P. Piquemal, P. Ren, andJ. W. Ponder, “An optimized charge penetration model for usewith the AMOEBA force ﬁeld,” Phys. Chem. Chem. Phys. ,276–291 (2017). A. Halkier, T. Helgaker, P. Jørgensen, W. Klopper, andJ. Olsen, “Basis-set convergence of the energy in molecularHartree–Fock calculations,” Chem. Phys. Lett. , 437–446(1999).

A. L. L. East and W. D. Allen, “The heat of formation of NCO,”J. Chem. Phys. , 4638–4650 (1993). M. O. Sinnokrot, E. F. Valeev, and C. D. Sherrill, “Estimatesof the ab initio limit for π − π interactions: The benzene dimer,”J. Am. Chem. Soc. , 10887–10893 (2002). M. Pitoňák, T. Janowski, P. Neogrády, P. Pulay, and P. Hobza,“Convergence of the CCSD(T) correction term for the stackedcomplex methyl adenine-methyl thymine: Comparison withlower-cost alternatives,” J. Chem. Theory Comput. , 1761–1766 (2009). A. D. Boese, J. M. L. Martin, and W. Klopper, “Basis set limitcoupled cluster study of H-bonded systems and assessment ofmore approximate methods,” J. Phys. Chem. A , 11122–11133 (2007).

L. Demovičová, P. Hobza, and J. Řezáč, “Evaluation of com-posite schemes for CCSDT(Q) calculations of interaction ener-gies of noncovalent complexes,” Phys. Chem. Chem. Phys. ,19115–19121 (2014). J. E. Deustua, J. Shen, and P. Piecuch, “Converging high-levelcoupled-cluster energetics by Monte Carlo sampling and mo-ment expansions,” Phys. Rev. Lett. , 223003 (2017).

C. J. Scott, R. Di Remigio, T. D. Crawford, and A. J. Thom,“Diagrammatic coupled cluster Monte Carlo,” J. Phys. Chem.Lett. , 925–935 (2019). C. J. Scott, R. Di Remigio, T. D. Crawford, and A. J. Thom,“Theory and implementation of a novel stochastic approach tocoupled cluster,” J. Chem. Phys. , 144117 (2020).

N. Blunt, S. D. Smart, J. Kersten, J. Spencer, G. H. Booth, andA. Alavi, “Semi-stochastic full conﬁguration interaction quan-tum Monte Carlo: Developments and application,” J. Chem.Phys. , 184107 (2015).

S. Sharma, A. A. Holmes, G. Jeanmairet, A. Alavi, and C. J.Umrigar, “Semistochastic heat-bath conﬁguration interactionmethod: Selected conﬁguration interaction with semistochasticperturbation theory,” J. Chem. Theory Comput. , 1595–1604(2017). J. Li, M. Otten, A. A. Holmes, S. Sharma, and C. J. Umri-gar, “Fast semistochastic heat-bath conﬁguration interaction,”J. Chem. Phys. , 214110 (2018).

M. Veit, D. M. Wilkins, Y. Yang, R. A. DiStasio Jr., andM. Ceriotti, “Predicting molecular dipole moments by combin-ing atomic partial charges and atomic dipoles,” J. Chem. Phys. , 024113 (2020).

Y. Yang, K. U. Lao, D. M. Wilkins, A. Grisaﬁ, M. Ceriotti, andR. A. DiStasio Jr., “Quantum mechanical static dipole polariz-abilities in the QM7b and AlphaML showcase databases,” Sci.Data , 1–10 (2019). D. M. Wilkins, A. Grisaﬁ, Y. Yang, K. U. Lao, R. A. DiStasioJr., and M. Ceriotti, “Accurate molecular polarizabilities withcoupled cluster theory and machine learning,” Proc. Natl. Acad.Sci. USA , 3401–3406 (2019).

S. Yue, M. C. Muniz, M. F. Calegari Andrade, L. Zhang,R. Car, and A. Z. Panagiotopoulos, “When do short-rangeatomistic machine-learning models fall short?” J. Chem. Phys.154