Multi-scale approach for the prediction of atomic scale properties
MMulti-scale approach for the prediction of atomic scale properties
Andrea Grisafi, ∗ Jigyasa Nigam,
1, 2, 3, ∗ and Michele Ceriotti
1, 3, † Laboratory of Computational Science and Modeling, IMX,´Ecole Polytechnique F´ed´erale de Lausanne, 1015 Lausanne, Switzerland Indian Institute of Space Science and Technology, Thiruvananthapuram 695547, India National Centre for Computational Design and Discovery of Novel Materials (MARVEL),´Ecole Polytechnique F´ed´erale de Lausanne, 1015 Lausanne, Switzerland
Electronic nearsightedness is one of the fundamental principles governing the behavior of condensedmatter and supporting its description in terms of local entities such as chemical bonds. Localityalso underlies the tremendous success of machine-learning schemes that predict quantum mechanicalobservables – such as the cohesive energy, the electron density, or a variety of response properties – asa sum of atom-centred contributions, based on a short-range representation of atomic environments.One of the main shortcomings of these approaches is their inability to capture physical effects,ranging from electrostatic interactions to quantum delocalization, which have a long-range nature.Here we show how to build a multi-scale scheme that combines in the same framework local andnon-local information, overcoming such limitations. We show that the simplest version of suchfeatures can be put in formal correspondence with a multipole expansion of permanent electrostatics.The data-driven nature of the model construction, however, makes this simple form suitable to tacklealso different types of delocalized and collective effects. We present several examples that range frommolecular physics, to surface science and biophysics, demonstrating the ability of this multi-scaleapproach to model interactions driven by electrostatics, polarization and dispersion, as well as thecooperative behavior of dielectric response functions.
I. INTRODUCTION
The broad success of machine-learning approaches, usedto predict atomic-scale properties bypassing the compu-tational cost of first-principles calculations [1–14], can belargely traced to the use of structural descriptors that aredefined through localized atomic environments [15–17].The assumption of locality is supported by the principleof nearsightedness of electronic matter first introducedby Walter Kohn [18], which implies that far-field per-turbations to the local properties of the system are usu-ally screened, and exponentially-decaying. Locality hasbeen long exploited to develop linear-scaling electronic-structure schemes [19–23], and in the context of machine-learning methods it allows constructing models that arehighly transferable, and applicable to diverse problems aswell as to complex, heterogeneous datasets [24].Structural descriptors that are built using only localinformation cannot, however, describe long-range interac-tions and non-local phenomena. In many contexts, par-ticularly when describing homogeneous, bulk systems [6],long-range tails can be incorporated in an effective way,or approximated by increasing the range of the local en-vironments [25]. On a fundamental level, however, theuse of nearsighted representations undermines the reliabil-ity of machine-learning approaches whenever strong elec-trostatic and polarization effects guide the macroscopicbehavior of the system. This is for instance the casewhen considering the electrostatic screening properties of ∗ These authors contributed equally to this work † michele.ceriotti@epfl.ch water and electrolyte solutions [26–31], the collective dis-persion interactions that stabilize molecular crystals andbiomolecules [32, 33], or the surface charge polarizationof a metal electrode in response to an external electricfield [34–38]. Several examples have been presented thatdemonstrate the shortcomings of local ML models in thepresence of long-range physical effects [39–41].Global representations exist that incorporate informa-tion on the entire system [42–44], but usually they reducethe transferability of the resulting model. In the con-text of modelling electronic potential energy surfaces,several strategies have been proposed to incorporate ex-plicitly the physical effects that underlie long-range inter-actions. Baselining the model with a cheaper electronic-structure method that incorporates electrostatic contri-butions [2, 10, 45–47], or using free-energy perturbationto promote a short-range ML potential to full quantumchemical accuracy [14] are very effective, pragmatic ap-proaches to circumvent the problem. Alternatively, onecan directly machine-learn the atomic partial charges andmultipoles that enter the definition of the electrostaticenergy [48–55], the atomic polarizability that underliesdispersion interactions [56], or atomic electronegativitiesthat are then used to determine the partial charges ofthe system by minimizing its electrostatic energy [57, 58].The major shortcoming of these methods is that, on oneside, they are highly system dependent and, on the other,they are limited to the prediction of energy-related prop-erties, and to the specific physical interaction that theyare designed to model. Some of the present Authorshave recently proposed an alternative approach to incor-porate non-local interactions into an atom-centred MLframework. Non-local information of the system is foldedwithin local atomic environments thanks to the definition a r X i v : . [ phy s i c s . c o m p - ph ] A ug FIG. 1. Schematic representation of the construction of atomic-field representations. Left: Cartesian coordinate of the atoms fora hypothetical “doped graphene” system; middle: atom-density field, divided in three elemental channels, color-coded; right:atom-density potential, color-coded. of smooth Coulomb-like potentials that are subsequentlysymmetrized according to the nature of the target prop-erty [59]. The resulting long-distance equivariant (LODE)representation is endowed with a long-range characterwhile still being defined from the information sampled ina finite local neighbourhood of the atoms.In this work, density and potential based descriptorsare combined within a unified multi-scale representa-tion. The resulting model can be formally related toan environment-dependent multipolar expansion of theelectrostatic energy, but has sufficient flexibility to yieldaccurate predictions for a number of different kinds ofinteractions, and regression targets. We first consider, asan example, a dataset of organic dimers, partitioned intopairs that are representative of the possible interactionsbetween charged, polar and apolar groups, demonstrat-ing that the multi-scale LODE features can be used todescribe permanent electrostatics, polarization and dis-persion interactions with an accuracy that is only limitedby the number of training points. We then show how ourmodel is able to capture the mutual polarization betweena water molecule and a metal slab of lithium. Finally,we reproduce the dipole polarizability of a dataset ofpoly-aminoacids, extrapolating the electric response ofthe system at increasing chain lengths.
II. MULTI-SCALE EQUIVARIANTREPRESENTATIONS
Following the notation introduced in Ref. 17, let us startby defining a density field associated with a structure A as the superposition of decorated atom-centred Gaussians (cid:104) a x | A ; ρ (cid:105) = (cid:88) i ∈ A δ aa i g ( x − r i ) . (1) In this expression, r i indicates the position of atoms of A ,and a i labels their elemental nature. From these smoothatomic densities, a Coulomb-like potential can be formallydefined as a result of the following integral operation: (cid:104) a x | A ; V (cid:105) = (cid:90) d x (cid:48) (cid:104) a x (cid:48) | A ; ρ (cid:105)| x − x (cid:48) | . (2)One could build a general family of fields using a differentintegral transformation of the density, but here we focuson this 1 / | x − x (cid:48) | form, which is well-suited to describelong-range interactions. The two primitive representations | ρ (cid:105) and | V (cid:105) can be individually symmetrized over thecontinuous translation group [17]. Imposing translationalinvariance on Eq. (1) has the ultimate effect of centring therepresentation on the atoms i of the system, so that we canconveniently refer to the set of atom-centred densities[60] (cid:104) a x | A ; ρ i (cid:105) = (cid:88) j ∈ A δ aa j g ( x − r ji ) , (3)where r ji = r j − r i is the vector separating atoms i and j .As already shown in Ref. 59, given that the Coulomboperator is translationally invariant, one can obtain ananalogous result symmetrizing the tensor product | ρ (cid:105) ⊗| V (cid:105) , yielding a set of atom-centred potentials (cid:104) a x | A ; V i (cid:105) = (cid:90) d x (cid:48) (cid:80) j ∈ A δ aa j g ( x (cid:48) − r ji ) | x − x (cid:48) | . (4)Either of Eqs. 3 or 4 contains information on the entirestructure. Usually, however, the atom-centred density | ρ i (cid:105) is evaluated including only atoms within sphericalenvironments of a given cutoff radius r c . This truncationis not only a matter of practical convenience: fundamen-tal physical principles [18] indicate that molecular andmaterials properties are largely determined by local corre-lations, and increasing indefinitely r c has been shown to reduce the accuracy of the model [24, 61], because, in theabsence of enormous amounts of uncorrelated trainingstructures, the increase in model flexibility leads to over-fitting. The fundamental intuition in the constructionof the atom-density potential | V i (cid:105) is that, even if oneevaluates it in a spherical neighbourhood of the centralatom i , thereby avoiding an uncontrollable increase inthe complexity of the model, it incorporates contributionsfrom atoms that are very far away. The nature of | V i (cid:105) canbe better understood by separating the near-field fromthe far-field potential in the definition of Eq. (4), that is, (cid:104) a x | V i (cid:105) = (cid:104) a x | V i (cid:105) == (cid:90) d x (cid:48) (cid:104) a x (cid:48) | ρ i (cid:105)| x − x (cid:48) | (5)where ρ i are the atomic densities located insideand outside the i -th spherical environment. We omit thestructure label A for convenience, as we will do often inwhat follows. The near-field term contributes informationthat is analogous to that included in | ρ i (cid:105) . The far-fieldcontribution instead determines the effect of the densitybeyond r c , and the choice of the integral operator affectsthe asymptotic form of this effect, with 1 / | x − x (cid:48) | implyinga Coulomb-like behavior.Tensor products of the atom-centred density Eqs. (3)and potential (4) could be separately symmetrized overrotations and inversion, yielding respectively structuraldescriptors of short-range interatomic correlations, equiv-alent to SOAP-like representations [16], or long-distanceequivariants (LODE) features [59]. Here we introduce amore explicitly multi-scale family of representations, thatcouples | ρ i (cid:105) and | V i (cid:105) terms. Formally, one can obtain asymmetry-adapted ket that transforms like the irreduciblerepresentations of the O (3) group by computing the Haarintegral over the rotation and inversion operators ˆ R , ˆ i : | (cid:104) ρ ⊗ νi ⊗ V ⊗ ν (cid:48) i ⊗ σ ⊗ λµ (cid:105) O (3) (cid:105) = (cid:88) k =0 , (cid:90) SO (3) d ˆ R ˆ i k | σ (cid:105) ⊗ ˆ i k ˆ R | λµ (cid:105)⊗ ˆ i k ˆ R | ρ i (cid:105) ⊗ . . . ˆ i k ˆ R | ρ i (cid:105) (cid:124) (cid:123)(cid:122) (cid:125) ν times ⊗ ˆ i k ˆ R | V i (cid:105) ⊗ . . . ˆ i k ˆ R | V i (cid:105) (cid:124) (cid:123)(cid:122) (cid:125) ν (cid:48) times , (6)which we indicate in what follows using the shorthandnotation | ρ ⊗ νi ⊗ V ⊗ ν (cid:48) i ; σ ; λµ (cid:105) , omitting the σ ; λµ indiceswhen considering invariant features ( σ = 1, λ = 0).Within this construction, the ket | λµ (cid:105) has the role ofmaking the resulting features transform as a Y µλ sphericalharmonic [39, 62], while | σ (cid:105) indicates the parity of thefeatures under inversion [63].In practical implementations, the abstract ket (6) canbe computed by first expanding the atom-centred fea-tures (3)-(4) onto a discrete basis, and then evaluatingthe symmetrized ν -point correlation of the fields. A partic-ularly clean, efficient, recursive formulation can be derivedexploiting the fact that the equivariant features behaveas angular momenta, and can then be combined using Clebsch-Gordan coefficients to build higher-order corre-lations [64]. In analytical derivations we use a partially-discretized basis, in which the radial contribution is keptas a continuous index, corresponding to (cid:104) arlm | = (cid:90) d x δ ( r − x ) Y ml (ˆ x ) (cid:104) a x | , (7)while in the actual implementation we use a basis of Gaus-sian type orbitals to also discretize the radial component.The nature of the representation, however, does notdepend on such details. The basis-set independence ismost clearly seen by considering the use of the equivariantsin the context of a linear regression model. The valueof a tensorial property T for a structure, decomposedin its irreducible spherical components (ISC) [65] and inatom-centred contributions, can be formulated as T σλµ ( A, i ) ≈ (cid:90) d X (cid:104) T ; σλ | X (cid:105) (cid:104) X | | A ; ρ ⊗ νi ⊗ V ⊗ ν (cid:48) i ; σ ; λµ (cid:105) , (8)where X indicates any complete basis that provides aconcrete representation of the ket, and (cid:104) X | T ; σλ (cid:105) is theset of regression weights. One sees that (1) the regres-sion model is invariant to a unitary transformation ofthe basis; (2) the equivariant transformation of the tar-get property is associated with the λµ indices of theket, while the weights are invariant under symmetry op-erations. Linear models are especially useful to revealthe physical meaning of a representation: they allowto demonstrate the relation between short-range densitycorrelations ( ν (cid:48) = 0) and the body-order expansion ofinteratomic potentials [17, 66–68], and the relation be-tween the first-order LODE ( ν = 0 , ν (cid:48) = 1) and fixedpoint-charge electrostatics. In the next section, we usethis idea to show how the simplest multi-scale LODE( ν = 1 , ν (cid:48) = 1) can be put in formal correspondence withthe physics of multipole electrostatics [69]. III. ANALYTICAL RESULTS FORLONG-RANGE INTERACTIONS
Consider a linear model to predict the ground-stateelectronic energy U of a system A . This corresponds totaking the scalar ( λ = 0) and polar ( σ = 1) limits withinthe prediction formula of Eq. (8): U ( A ) = N (cid:88) i =1 U i ( A ) = N (cid:88) i =1 (cid:90) d X (cid:104) U | X (cid:105) (cid:104) X | A ; ρ i ⊗ V i (cid:105) . (9)We aim to prove that in the LODE(1,1) case, where thedensity and potential representations are both introducedto first order, this functional form can be used to modelrigorously a multipolar expansion of the long-range con-tributions to U .To see this, let us start by representing the energyprediction in terms of the partially-discretized basis ofEq. (7). Upon symmetrization of the tensor productbetween ρ and V , and going to the coupled angular mo-mentum basis [64], one obtains a set of invariants thatcan be expressed using the basis (cid:104) X | ≡ (cid:104) a r ; a r ; l |(cid:104) a r ; a r ; l | ρ i ⊗ V i (cid:105) = 1 √ l + 1 (cid:88) | m |≤ l (cid:104) a r lm | ρ i (cid:105) (cf.Eq. (5)), we can write explicitly Eq. (9) as follows: U >i = l max (cid:88) l =0 (cid:88) a a (cid:90) r c d r (cid:90) r c d r (cid:104) U | a r ; a r ; l (cid:105)× (cid:88) | m |≤ l (cid:104) a r lm | ρ i (cid:105) (cid:63) . (11)where (cid:104) U | a r ; a r ; l (cid:105) indicates the regression weightsfor the total potential energy, that also incorporate the1 / √ l + 1 factor in Eq. (10). We are now interested inrepresenting the spherical harmonic components of thepotential in terms of the far-field contribution V >i ofEq. (5). Using the Laplace expansion of the Coulomboperator, we can rewrite | V >i (cid:105) as : (cid:104) a r lm | V >i (cid:105) (cid:63) = 4 π l + 1 (cid:90) d x (cid:104) ρ >i | a x (cid:105) r l x l +1 (cid:104) ˆ x | lm (cid:105) = 4 π l + 1 (cid:90) d r (cid:104) ρ >i | a rlm (cid:105) r l r l +1 . (12)Plugging this into Eq. (11), one sees that the contributionto the energy coming from the far-field can be written as U >i = (cid:88) a a (cid:88) lm (cid:90) ∞ r c d r r l +1 (cid:104) ρ >i | a rlm (cid:105) (cid:104) a a ; lm | M
0, Eq. (12) takes the form of a sim-ple Coulomb interaction between point-changes [42] (thederivation is reported in the Supplementary Material). In-cluding multipoles for l > | ρ i ⊗ V i (cid:105) by observing its performance in representing thefar-field interactions between an H O and a CO molecule– since the leading term of the interactions between thetwo molecules is driven by permanent electrostatics. Webuild a dataset considering 33 non-degenerate reciprocalorientations between the two molecules, and learn theinteraction over a range of distances between the centresof mass from 6.5 to 9˚A. We then extrapolate the predictedinteraction profile in the asymptotic regime of R > l max .We also compare different choices for the possible atomiccentres that contribute to the energy prediction: in panel(a) we express the energy in terms of a single environ-ment centred on the oxygen atom of the H O molecule;in panel (b) we use a single environment centred on thecarbon atom of CO ; in (c) we use multiple environmentscentred on each atom. This exercise probes the possibilityof choosing between a model for the electrostatic energythat is based on the definition of molecular rather than atomic multipoles [50, 51].As one would expect from a classical interpretation of R [ Å ]01234 U [ m e V ] R [ Å ]01234 U [ m e V ] R [ Å ]01234 U [ m e V ] QM FIG. 2. Extrapolated interaction profiles for a given config-uration of H O and CO at different angular cutoff values l max . Top, middle and bottom panels show the results of theasymptotic extrapolation when centring the representation onthe oxygen atom of H O, the carbon atom of CO and all theatoms of the system respectively. the long-range energy, the binding profile for the selectedtest configuration is ultimately driven by the interactionbetween the dipole moment of the water molecule andthe quadrupole moment of CO . This is reflected in thesharp transition of the prediction accuracy when crossinga critical angular cutoff l max . When centring the localenvironment on the water molecule (Fig. 2-a)), for in-stance, truncating the expansion at l max = 1 is enough toreproduce the interaction between the dipolar potentialof water and the CO molecule. Conversely, when cen-tring the representation on carbon dioxide (Fig. 2-b)), theH O density in the far-field has to interact with a CO potential that is quadrupolar in nature, which requiresan angular cutoff of at least l max = 2. When centring therepresentation on all the atoms of the system (Fig. 2-c)),using an angular cutoff of l max = 0 suffices to obtainqualitatively accurate interaction profiles. This is con-sistent with the success of the many parametrized forcefields and machine-learning potentials that simply rely onrepresenting the electrostatic energy of the system via aset of point-charges [49, 57]. For this simple toy problem,increasing the expansion at l max = 1 with an atomic mul-tipole model achieves almost perfect predictions. Resultsof all of the 33 orientations we considered are reported inthe SM. IV. RESULTS
The toy system we have discussed in the previous sec-tion reflects the behavior of the multi-scale LODE rep-resentation. However, if | ρ i ⊗ V i (cid:105) was only capable ofdescribing permanent electrostatics there would be lit-tle advantage over traditional physics-based models. Inthis Section we present three applications to substantiallymore complicated systems, to demonstrate that even intheir simplest form, this family of features is suitableto address the complexity of challenging, real-life atom-istic modelling problems. We report errors in terms ofthe root mean square error (RMSE), or the percentageRMSE (RMSE%), which is expressed as a percentage ofthe standard deviation of the target properties. A. Binding energies of organic dimers
We start by testing the ability of multi-scale LODEto describe different kinds of molecular interactions. Tothis end, we consider the interaction energy between 2291pairs of organic molecules belonging to the BioFragmentDatabase (BFDb) [71]. For each dimer configuration,binding curves are generated by considering 12 rigid dis-placements in steps of 0.25˚A along the direction that joinsthe geometric centres of the two molecules. Then, unre-laxed binding energies are computed at the DFT/PBE0level using the Tkatchenko-Scheffler self-consistent vander Waals method [72] as implemented in the FHI-aimspackage [73]. For each binding trajectory, we also includein the training set the dissociated limit of vanishing in-teraction energy, where the two monomers are infinitelyfar apart. The dataset so generated includes all the pos-sible spectrum of interactions, spanning pure dispersion,induced polarization and permanent electrostatics. Inorder to better rationalize the learning capability of sucha large variety of molecular interactions, we choose topartition the molecules in the dataset in three indepen-dent classes, namely, 1) molecules carrying a net charge,2) neutral molecules that contain heteroatoms (N, O),and can therefore exhibit a substantial polarity 3) neutralmolecules containing only C and H, that are considered QM Å ]0.30.20.10.0 U [ e V ] Å ]0.0500.0250.0000.0250.0500.075 U [ e V ] Å ]0.080.060.040.020.000.02 U [ e V ] FIG. 3. Median-error binding curves for six different classes of intermolecular interactions. ( black lines ) quantum-mechanicalcalculations. ( green lines ) predictions of a | ρ i ⊗ ρ i (cid:105) model. ( blue lines ) predictions of a | ρ i ⊗ V i (cid:105) model. apolar and interacting mostly through dispersive inter-actions. Considering all the possible combinations ofthese kinds of molecules partitions the dimers into sixclasses, i.e., 184 charged-charged (CC), 267 charged-polar(CP), 210 charged-apolar (CA), 161 polar-polar (PP),418 polar-apolar (PA) and 1051 apolar-apolar (AA) in-teractions. For each of the six classes, several, randomlyselected binding curves are held out of the training set,to test the accuracy of our predictions. The remainingcurves are used to fit one separate linear model for eachclass, using either local features (the SOAP power spec-trum [16], | ρ ⊗ i (cid:105) ) or multi-scale LODE( ν = 1 , ν (cid:48) = 1)features. In order to also assess the reliability of ourpredictions, we use a calibrated committee estimator [74]for the model uncertainty, which allows us to determineerror bars for the binding curves. 8 random subselectionsof 80% of the total number of training configurations wereare considered to construct the committee model. Theinternal validation set is then defined by selecting thetraining structures that are absent from at least 25% ofthe committee members.Figure 3 shows characteristic interaction profiles forthe six different classes of molecular pairs. The modelsuse r c = 3 ˚A environments centred on each atom. Theconfigurations we report are those that exhibit medianintegrated errors within the test set of each class. The rootmean square errors associated with the predictions overthe entire test sets of each class are listed in Table I. Theresults clearly show that while SOAP(2) is limited by thenearsightedness of the local environments, the LODE(1,1)multi-scale model is able to predict both the short and thelong-range behaviour of the binding profiles on an equalfooting. What is particularly remarkable is the fact that asimple, linear model can capture accurately different kinds RMSE/eVclass n train STD/eV ρ ⊗ ρ ρ ⊗ V V ⊗ V CC 100 1.86 0.72 0.049 0.058CP 200 0.379 0.25 0.074 0.092CA 150 0.083 0.056 0.041 0.034PP 100 0.131 0.10 0.062 0.125PA 350 0.046 0.032 0.013 0.021AA 950 0.063 0.026 0.004 0.006TABLE I. Prediction performance expressed in terms of theRMSE over all the points of the binding curves, for the sixclasses of interactions and ρ ⊗ ρ , ρ ⊗ V and V ⊗ V models.For each class we also indicate the number of training samples,and the characteristic energy scale, expressed in terms of thestandard deviation of the energies in the test set. of interactions, that occur on wildly different energy scalesand asymptotic behavior: the typical binding energy ofcharged dimers is of the order of several eV, and has a1 /r tail, while the typical interaction energy of two apolarmolecules is of the order of a few 10s of meV, and decaysroughly as 1 /r . A LODE( ν (cid:48) = 2) model (i.e. based on | V ⊗ i (cid:105) features) also allows to predict the binding curvesbeyond the 3˚A cutoff, but usually yields 50-100% largererrors than those observed with | ρ i ⊗ V i (cid:105) – not only forcharged molecules, but also for dimers that are dominatedby dispersion interactions. The multi-scale nature ofLODE( ν = 1, ν (cid:48) = 1) yields a better balance of shortand long-range descriptions, and is sufficiently flexibleto be adapted to the description of systems that arenot dominated by permanent electrostatics, even thoughinteractions between charged fragments are considerably FIG. 4. Predicted binding curve of a test water-lithium config-uration. ( black dots ) reference DFT calculations; ( green line )predictions of a | ρ i ⊗ ρ i (cid:105) model; ( blue line ) | ρ i ⊗ V i (cid:105) model. easier to learn, in comparison to the others. We alsoobserve that the uncertainty model works reliably, as thepredicted curves always fall within the estimated errorbar. Larger uncertainties are found for interaction classesthat have few representative samples in the training set,such as those associated with polar-polar molecular pairs(Fig 3-d)). This observation reflects the fact that themodel is limited by the training set size rather than bythe nature of interaction involved, as confirmed by thelack of saturation [75] observed in the learning curves(shown in the SM). B. Induced polarization on a metal surface
The previous example proves that linear | ρ i ⊗ V i (cid:105) mod-els capture a wide class of molecular interactions, rangingfrom pure dispersion to permanent electrostatics. Beyondmolecular systems, however, a large number of phenomenaoccur in solid state physics that are driven by long-rangeeffects, and involve more subtle, self-consistent interac-tions between far-away atoms. A particularly relevantexample is represented by the induced macroscopic polar-ization that a metallic material undergoes in response toan external electric field, which underlies fundamentallyand technologically important phenomena for surface sci-ence and nanostructures [76–78]. Physics-based modellingof this kind of systems usually exploits the fact that, for aperfectly-conductive surface, the interaction is equivalentto that between the polar molecule and the mirror image,relative to the surface plane, of its charge distribution,with an additional inversion of polarity [79]. It wouldnot appear at all obvious that our atom-centred frame-work, which does not include an explicit response of thefar-field atom density to the local data-driven multipole,can capture the physics of a phenomenon associated withthe polarization of electrons that are delocalized over theentire extension of the metallic solid.To benchmark the performance of multi-scale LODE in this challenging scenario we consider the interaction of aslab of bcc lithium with a water molecule that is located atvarious distances from the (100)-surface. We start by se-lecting 81 water molecule configurations, differing in theirinternal geometry or in their spatial orientation relativeto the surface. For each of these configurations, 31 rigiddisplacements are performed along the (100)-direction,spanning a range of distances between 0.5 ˚A and 8 ˚A fromthe lithium surface. Using this dataset we compute unre-laxed binding energies at the DFT/PBE level using theFHI-aims package [73]. We converge the slab size alongthe periodic xy -plane, minimizing the self-interaction be-tween the periodic images of the water molecule, resultingin a 5 × k -points samplingof 4 × × − . We set the slab extension along thenon-periodic z -direction so that the Fermi energy is con-verged within 10 meV, resulting in a total of 13 layers. Toremove the spurious interactions along the z -axis, we seta large vacuum space of roughly 80 ˚A in conjunction witha correction suitable to screen the dipolar potential [80].Following these prescriptions, we obtain attractive poten-tial profiles for all molecular geometries and orientation,consistently with the interaction between the dipolar fieldof the water molecule and the induced metal polarization.For this example, we construct | ρ i (cid:105) and | V i (cid:105) represen-tations within spherical environments of r c = 4 ˚A witha Gaussian-density width of σ = 0 . | ρ ⊗ i (cid:105) model and a multi-scale LODE | ρ i ⊗ V i (cid:105) model inlearning the interaction energy of the metal slab and thewater molecule for one representative test trajectory (alltest trajectories are reported in the SM). We observe thatthe local SOAP description is able to capture the short-range interactions but becomes increasingly ineffective asthe water molecule moves outside the atomic environment,leading to an overall error of about 19 RMSE%. Thisis in sharp contrast to the performance of the | ρ i ⊗ V i (cid:105) representation, which can capture both the effects of elec-trostatic induction at a large distance and the Pauli-likerepulsion at short range with the same level of accuracy,halving the prediction error to about 9%. Learning curvesare shown in the SM.To further investigate what aspects of the physics ofthe molecule-surface interaction can be captured by themodel, we perform a Mulliken population analysis on thereference DFT calculations, to extract the polarizationvector of the water molecule in response to the interac-tion with the metal, i.e., P W = µ W − µ W , where µ W and µ W are the dipole moment of the water moleculein the lithium-slab system and in vacuum respectively.Physically, the polarization P W involves the response ofwater’s electrons to the rearrangement of the electroniccharge in the surface triggered by the dipolar field, andso it involves explicitly a back-reaction. Furthermore,the polarization shows both a (usually larger) componentalong the z -axis, and a tangential component in xy -plane. Number of training configurations2 × 10 % R M S E | V ; 1; 1| ; 1; 1 FIG. 5. Learning curves for the induced polarization of thewater molecule due to interaction with image charges in themetal slab, computed only for separations greater than 4.5˚A.The error is computed as a fraction of the intrinsic variabilityof the test set of 215 configurations. Contrary to the localmodel (green), a linear | ρ i ⊗ V i (cid:105) model (blue) can learn thisself-consistent polarization, with no significant reduction ofthe learning rate up to 1000 training configurations. To account for the vectorial nature of P W , we take advan-tage of the tensorial extension of Eq. (6). To single outthe long-range nature of the polarization interaction, werestrict the regression of P W to water configurations thatare more than 4.5 ˚A far from the surface. Our datasetcontains 1215 such configurations, out of which we ran-domly select 1000 for training, while the remaining 215are retained for testing. Given that the training set con-tains no structures within the local descriptor cutoff, itcomes as no surprise that a pure density-based tensormodel | ρ ⊗ i ; 1; 1 µ (cid:105) entirely fails to learn the long-rangepolarization induced on the water molecule. Making useof the potential-based tensor model of Eq. (8), in contrast,allows us to effectively learn the polarization vector P W ,showing an error that decreases to ∼
20 %RMSE at themaximum training set size available (Figure 5). This ex-ample provides a compelling demonstration of the abilityof | ρ i ⊗ V i (cid:105) to build models of effects that go well-beyondpermanent electrostatics. C. Response functions of oligopeptides
As a final example, we consider the challenging taskof predicting the polarizabilty of a dataset of poly-aminoacids. Dielectric response functions are stronglyaffected by long-range correlations, because of the co-operative nature of the underlying physical mechanism.Poor transferability of local models between structuresof different sizes has been observed for molecular dipolemoments [41], polarizability [40], and the electronic di- Number of training structures10 R M S E ( = ) ( a . u . ) | V |[ ] |[ ] + w | V FIG. 6. Learning curves for the λ = 0 component of thepolarizability tensor of a database of polypeptide conformers.The green curve corresponds to the non-linear kernel which isequivalent to | [ ρ ⊗ i ] ⊗ (cid:105) , the blue curve to a linear kernel basedon | ρ i ⊗ V i (cid:105) , and the red one to an optimal linear combinationof the two. electric constant of bulk water [39]. For this purpose, weuse a training set composed of 27428 conformers of singleaminoacids and 370 dipeptides, testing the predictionsof the model on a smaller test set containing 30 dipep-tides, 20 tripeptides, 16 tetrapeptides and 10 pentapeptideconfigurations. Reference polarizability calculations arecarried out with the Gaussian16 quantum-chemistry codeusing the double-hybrid DFT functional PWPB95-D3and the aug-cc-pVDZ basis set [81]. We compute themulti-scale | ρ i ⊗ V i (cid:105) features and their local counterpartsusing a Gaussian width of σ =0.3 ˚A and a sphericalenvironment cutoff of r c =4 ˚A. This data set is interest-ing, because it combines large structural variability withtens of thousands of distorted aminoacid configurationswith longer-range interactions described by a few hundreddipeptide conformers. We achieve greater flexibility inthe description of the local part using a square kernelmodel, that is equivalent to using a quadratic functionalof the SOAP features, | [ ρ ⊗ i ] ⊗ (cid:105) ≡ | ρ ⊗ i (cid:105) ⊗ | ρ ⊗ i (cid:105) . Forthe multi-scale LODE model, instead, we use a simplelinear model, that is less susceptible to overfitting.The learning curves for the trace ( λ = 0) of the po-larizability tensor, shown in Fig. 6, are very revealing.The | [ ρ ⊗ i ] ⊗ (cid:105) model, which disregards any non-local be-havior beyond the atomic environment, is initially veryefficient, but saturates to an error of 0.06a.u.. In con-trast, equipped with non-local information, the | ρ i ⊗ V i (cid:105) representation reduces the error of prediction to 0.05a.u.,but is initially much less effective. This is not due to thelack of higher-order local density correlations: a linear | ρ ⊗ i (cid:105) model performs well, despite showing saturationdue to its local nature (see discussion in the SM). Rather, FIG. 7. Absolute RMSE in learning the λ = 0 spherical tensorof polarizability of polypeptides as a function of the peptidelength. The model was trained on 27428 single-amino acidsand 370 dipeptides. The error was computed on 30 dipep-tides, 20 tripeptides, 16 tetrapeptides and 10 pentapeptidesrespectively. it is indicative of the importance of short-range effectsin this diverse dataset. Inspired by Refs. [24, 61, 82] webuild a tunable kernel model based on a weighted sum ofthe local and the LODE kernels. We optimize the weightby cross-validation at the largest train size, obtaining areduction of 50% of the test error, down to 0.028a.u. Ananalysis of the test error which separates the contributionsfrom oligopeptides of different length, shown in Fig. 7, isconsistent with this interpretation of the learning curves.All models show an error that increases with the size ofthe molecule, because there are interactions that are justnot described at the smaller train set size. However, thepurely local model shows by far the worst extrapolativeperformance, while multi-scale models – in particular theone combining a non-linear local kernel and LODE fea-tures – show both a smaller overall error, and a saturationof the error for tetra and penta-peptides. This example il-lustrates the different approaches to achieve a multi-scaledescription of atomic-scale systems: the | ρ i ⊗ V i (cid:105) fea-tures offer simplicity and physical interpretability, while amulti-kernel model makes it possible to optimize in a data-driven manner the balance between local and long-rangedcorrelations. V. CONCLUSIONS
The lack of a description of long-range physical effectsis one of the main limitations of otherwise greatly suc- cessful machine-learning schemes which are more andmore often applied to model atomic-scale phenomena.We show how it is possible to construct a family of multi-scale equivariant features, that combine the properties ofwell-established local ML schemes with the long-distanceequivariant features that have been recently proposedby some of the Authors. This multi-scale frameworkshows enticing formal correspondences with physically-meaningful interaction terms, such as multipole electro-statics. Still, the data driven nature of the constructionallows the description of long-range interactions that donot fit this specific physical model. We show examplesof how a multi-scale LODE model can accurately predictinteractions between different kinds of molecular dimers,that include charged, polar and apolar compounds. Re-sults are also very promising when it comes to modellingsystems that clearly go beyond permanent electrostatics,such as a water molecule interacting with a metallic slab,and the dielectric response of oligopeptides. The combi-nation of a physics-inspired formulation and data-drivenflexibility that underlies this multi-scale LODE frame-work addresses one of the outstanding issues in atomisticmachine-learning, and paves the way towards an evenmore pervasive use of statistical methods to support thecomputational investigation of molecules and condensed-phase systems.
ACKNOWLEDGMENTS
The Authors would like to thank David Wilkins andLinnea Folkmann for sharing training data for the polar-izability of polyaminoacids, and Mariana Rossi for helpcomputing DFT references for the water-surface bindingcurves. M.C and A.G. were supported by the EuropeanResearch Council under the European Union’s Horizon2020 research and innovation programme (grant agree-ment no. 677013-HBMAP), and by the NCCR MARVEL,funded by the Swiss National Science Foundation. A.G.acknowledges funding by the MPG-EPFL centre for Molec-ular Nanoscience and Technology. JN was supported by aMARVEL INSPIRE Potentials Master’s Fellowship. Wethank CSCS for providing CPU time under project ids843. [1] J. Behler and M. Parrinello, Phys. Rev. Lett. , 146401(2007). [2] A. P. Bart´ok, M. C. Payne, R. Kondor, and G. Cs´anyi,Phys. Rev. Lett. , 136403 (2010). [3] R. Z. Khaliullin, H. Eshet, T. D. K¨uhne, J. Behler, andM. Parrinello, Phys. Rev. B , 100103 (2010).[4] H. Eshet, R. Z. Khaliullin, T. D. K¨uhne, J. Behler, andM. Parrinello, Phys. Rev. Lett. , 115701 (2012).[5] G. C. Sosso, G. Miceli, S. Caravati, J. Behler, andM. Bernasconi, Phys. Rev. B , 174103 (2012).[6] T. Morawietz, A. Singraber, C. Dellago, and J. Behler,Proc. Natl. Acad. Sci. U. S. A. , 8368 (2016).[7] V. L. Deringer and G. Cs´anyi, Phys. Rev. B , 094203(2017).[8] S. Chmiela, H. E. Sauceda, K.-R. M¨uller, andA. Tkatchenko, Nat Commun , 3887 (2018).[9] K. T. Sch¨utt, H. E. Sauceda, P.-J. Kindermans,A. Tkatchenko, and K.-R. M¨uller, J. Chem. Phys. ,241722 (2018).[10] M. Welborn, L. Cheng, and T. F. Miller, J. Chem. TheoryComput. , 4772 (2018).[11] F. A. Faber, A. S. Christensen, B. Huang, and O. A.Von Lilienfeld, J. Chem. Phys. , 241717 (2018).[12] L. Zhang, J. Han, H. Wang, R. Car, and W. E, Phys.Rev. Lett. , 143001 (2018).[13] D. Dragoni, T. D. Daff, G. Cs´anyi, and N. Marzari, Phys.Rev. Materials , 013808 (2018).[14] B. Cheng, E. A. Engel, J. Behler, C. Dellago, and M. Ce-riotti, Proc. Natl. Acad. Sci. U. S. A. , 1110 (2019).[15] J. Behler, Phys. Chem. Chem. Phys. PCCP , 17930(2011).[16] A. P. Bart´ok, R. Kondor, and G. Cs´anyi, Phys. Rev. B , 184115 (2013).[17] M. J. Willatt, F. Musil, and M. Ceriotti, J. Chem. Phys. , 154110 (2019).[18] E. Prodan and W. Kohn, Proc. Natl. Acad. Sci. ,11635 (2005).[19] W. Yang, Phys. Rev. Lett. , 1438 (1991).[20] G. Galli and M. Parrinello, Phys. Rev. Lett. , 3547(1992).[21] W. Kohn, Phys. Rev. Lett. , 3168 (1996).[22] A. H. R. Palser and D. E. Manolopoulos, Phys. Rev. B , 12704 (1998).[23] S. Goedecker, Rev. Mod. Phys. , 1085 (1999).[24] A. P. Bart´ok, S. De, C. Poelking, N. Bernstein, J. R. Ker-mode, G. Cs´anyi, and M. Ceriotti, Sci. Adv. , e1701816(2017).[25] S. Kondati Natarajan, T. Morawietz, and J. Behler, Phys.Chem. Chem. Phys. , 8356 (2015).[26] C. Zhang and G. Galli, J. Chem. Phys. , 084504(2014).[27] A. M. Smith, A. A. Lee, and S. Perkin, The Journal ofPhysical Chemistry Letters , 2157 (2016).[28] Y. Chen, H. I. Okur, N. Gomopoulos, C. Macias-Romero,P. S. Cremer, P. B. Petersen, G. Tocci, D. M. Wilkins,C. Liang, M. Ceriotti, and S. Roke, Sci. Adv. , e1501891(2016).[29] A. P. Gaiduk and G. Galli, J. Phys. Chem. Lett. , 1496(2017).[30] L. Belloni, D. Borgis, and M. Levesque, J. Phys. Chem.Lett. , 1985 (2018).[31] F. Coupette, A. A. Lee, and A. H¨artel, Phys. Rev. Lett. , 075501 (2018).[32] A. M. Reilly and A. Tkatchenko, Phys. Rev. Lett. ,055701 (2014).[33] A. Ambrosetti, N. Ferri, R. A. DiStasio, andA. Tkatchenko, Science , 1171 (2016). [34] J. I. Siepmann and M. Sprik, The Journal of ChemicalPhysics , 511 (1995).[35] C. Merlet, C. Pan, B. Rotenberg, P. A. Madden, P. Si-mon, and M. Salanne, The Journal of Physical ChemistryLetters , 264 (2013).[36] T. Dufils, G. Jeanmairet, B. Rotenberg, M. Sprik, andM. Salanne, Phys. Rev. Lett. , 195501 (2019).[37] L. Scalfi, D. T. Limmer, A. Coretti, S. Bonella, P. A.Madden, M. Salanne, and B. Rotenberg, Phys. Chem.Chem. Phys. , 10480 (2020).[38] J. D. Elliott, A. Troisi, and P. Carbone, Journal ofChemical Theory and Computation , 5253 (2020).[39] A. Grisafi, D. M. Wilkins, G. Cs´anyi, and M. Ceriotti,Phys. Rev. Lett. , 036002 (2018).[40] D. M. Wilkins, A. Grisafi, Y. Yang, K. U. Lao, R. A.DiStasio, and M. Ceriotti, Proc. Natl. Acad. Sci. U. S.A. , 3401 (2019).[41] M. Veit, D. M. Wilkins, Y. Yang, R. A. DiStasio, andM. Ceriotti, J. Chem. Phys. , 024113 (2020).[42] M. Rupp, A. Tkatchenko, K.-R. M¨uller, and O. A. vonLilienfeld, Phys. Rev. Lett. , 058301 (2012).[43] H. Huo and M. Rupp, arXiv:1704.06439 (2017).[44] M. Hirn, S. Mallat, and N. Poilvert, Multiscale Modeling& Simulation , 827 (2017).[45] R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A.Von Lilienfeld, J. Chem. Theory Comput. , 2087 (2015).[46] Z. Deng, C. Chen, X.-G. Li, and S. P. Ong, npj Compu-tational Materials , 75 (2019).[47] K. Rossi, V. Jur´askov´a, R. Wischert, L. Garel, C. Cormin-bœuf, and M. Ceriotti, J. Chem. Theory Comput. ,acs.jctc.0c00362 (2020).[48] C. M. Handley and P. L. A. Popelier, J. Chem. TheoryComput. , 1474 (2009).[49] N. Artrith, T. Morawietz, and J. Behler, Phys. Rev. B , 153101 (2011).[50] T. Bereau, D. Andrienko, and O. A. Von Lilienfeld, J.Chem. Theory Comput. , 3225 (2015).[51] T. Bereau, R. A. DiStasio, A. Tkatchenko, and O. A.von Lilienfeld, J. Chem. Phys. , 241706 (2018).[52] P. Bleiziffer, K. Schaller, and S. Riniker, Journal ofChemical Information and Modeling , 579 (2018).[53] B. Nebgen, N. Lubbers, J. S. Smith, A. E. Sifain,A. Lokhov, O. Isayev, A. E. Roitberg, K. Barros, andS. Tretiak, J. Chem. Theory Comput. , 4687 (2018).[54] K. Yao, J. E. Herr, D. Toth, R. Mckintyre, and J. Parkhill,Chem. Sci. , 2261 (2018).[55] L. Zhang, M. Chen, X. Wu, H. Wang, W. E, and R. Car,arXiv:1906.11434 (2019).[56] T. Bereau, R. A. DiStasio, A. Tkatchenko, and O. A. vonLilienfeld, The Journal of Chemical Physics , 241706(2018).[57] S. A. Ghasemi, A. Hofstetter, S. Saha, and S. Goedecker,Phys. Rev. B , 045131 (2015).[58] S. Faraji, S. A. Ghasemi, S. Rostami, R. Rasoulkhani,B. Schaefer, S. Goedecker, and M. Amsler, Phys. Rev. B , 104105 (2017).[59] A. Grisafi and M. Ceriotti, J. Chem. Phys. , 204105(2019).[60] Strictly speaking, g in Eq. (3) has twice the variance asthat in (1), but we re-define the density function accord-ingly.[61] M. J. Willatt, F. Musil, and M. Ceriotti, Phys. Chem.Chem. Phys. , 29661 (2018). [62] A. Grisafi, D. M. Wilkins, M. J. Willatt, and M. Ceriotti,in Machine Learning in Chemistry , Vol. 1326, edited byE. O. Pyzer-Knapp and T. Laino (American ChemicalSociety, Washington, DC, 2019) pp. 1–21.[63] In particular, we consider σ = 1 if the learning targetbehaves as a polar tensor and σ = − , 1461 (1975).[66] A. Glielmo, C. Zeni, and A. De Vita, Phys. Rev. B ,184307 (2018).[67] R. Drautz, Phys. Rev. B , 014104 (2019).[68] R. Jinnouchi, F. Karsai, C. Verdi, R. Asahi, andG. Kresse, J. Chem. Phys. , 234102 (2020).[69] A. Stone, The Theory of Intermolecular Forces , Interna-tional Series of Monographs on Chemistry (ClarendonPress, 1997).[70] D. J. Griffiths,
Introduction to electrodynamics; 4th ed. (Pearson, Boston, MA, 2013) re-published by CambridgeUniversity Press in 2017.[71] L. A. Burns, J. C. Faver, Z. Zheng, M. S. Marshall, D. G.Smith, K. Vanommeslaeghe, A. D. MacKerell, K. M. Merz,and C. D. Sherrill, J. Chem. Phys. , 161727 (2017).[72] A. Tkatchenko and M. Scheffler, Phys. Rev. Lett. ,073005 (2009). [73] V. Blum, R. Gehrke, F. Hanke, P. Havu, V. Havu, X. Ren,K. Reuter, and M. Scheffler, Computer Physics Commu-nications , 2175 (2009).[74] F. Musil, M. J. Willatt, M. A. Langovoy, and M. Ceriotti,J. Chem. Theory Comput. , 906 (2019).[75] B. Huang and O. A. Von Lilienfeld, J. Chem. Phys. (2016), 10.1063/1.4964627.[76] R. A. Bell, M. C. Payne, and A. A. Mostofi, The Journalof Chemical Physics , 164703 (2014).[77] Y. Litman, D. Donadio, M. Ceriotti, and M. Rossi, J.Chem. Phys. , 102320 (2018).[78] D. Maksimov, C. Baldauf, and M. Rossi, Int J QuantumChem (2020), 10.1002/qua.26369.[79] M. W. Finnis, R. Kaschner, C. Kruse, J. Furthmuller, andM. Scheffler, J. Phys.: Condens. Matter , 2001 (1995).[80] J. Neugebauer and M. Scheffler, Phys. Rev. B , 16067(1992).[81] L. Mørch Folkmann Garner, Machine-Learning the Po-larizabilities and Permanent Dipole Moments of AminoAcids , Master’s thesis, Department of Chemistry, Univer-sity of Copenhagen (2020).[82] F. M. Paruzzo, A. Hofstetter, F. Musil, S. De, M. Ceriotti,and L. Emsley, Nat. Commun.9