A Universal Framework for Featurization of Atomistic Systems
Xiangyun Lei
School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, GA, 30318 USA
[email protected]

Andrew J. Medford
School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, GA, 30318 USA
[email protected]
February 8, 2021

Abstract
Molecular dynamics simulations are an invaluable tool in numerous scientific fields. However, the ubiquitous classical force fields cannot describe reactive systems, and quantum molecular dynamics are too computationally demanding to treat large systems or long timescales. Reactive force fields based on physics or machine learning can be used to bridge the gap in time and length scales, but these force fields require substantial effort to construct and are highly specific to a given chemical composition and application. The extreme flexibility of machine learning models promises to yield reactive force fields that provide a more general description of chemical bonding. However, a significant limitation of machine learning models is the use of element-specific features, leading to models that scale poorly with the number of elements. This work introduces the Gaussian multi-pole (GMP) featurization scheme that utilizes physically-relevant multi-pole expansions of the electron density around atoms to yield feature vectors that interpolate between element types and have a fixed dimension regardless of the number of elements present. We combine GMP with neural networks to directly compare it to the widely-used Behler-Parrinello symmetry functions for the MD17 dataset, revealing that it exhibits improved accuracy and computational efficiency. Further, we demonstrate that GMP-based models can achieve chemical accuracy for the QM9 dataset, and their accuracy remains reasonable even when extrapolating to new elements. Finally, we test GMP-based models for the Open Catalyst Project (OCP) dataset, revealing comparable performance and improved learning rates when compared to graph convolutional deep learning models. The results indicate that this featurization scheme fills a critical gap in the construction of efficient and transferable reactive force fields.

Atomistic simulations are a crucial tool in many scientific fields, ranging from protein engineering to materials design [1, 2, 3, 4, 5, 6, 7, 8, 9]. Full quantum-mechanical treatment of atoms provides highly accurate energies and forces, but the computational cost is prohibitive for the length and time scales relevant to most applications [10, 11, 12]. Classical and reactive force fields can act as surrogates for the quantum-mechanical simulations, enabling simulations at longer length and time scales [13, 14, 15, 16, 17, 18, 7, 19]. However, these models are specialized to specific systems and have a limited ability to simulate inherently quantum-mechanical phenomena such as covalent bond formation [20]. Machine-learning models have recently emerged as a promising strategy to fill the gap between quantum mechanical simulations and classical force field models [21, 22, 23]. The field of machine-learned force fields has exploded in the last decade, leading to a plethora of different machine-learning force field models capable of predicting energies and forces with accuracy comparable to the underlying method [24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36]. However, most models are customized for specific application domains, and a framework for a general-purpose model has not been established.

One fundamental problem in machine-learning models for atomistic systems is the issue of feature generation. Cartesian coordinates and elemental identities are the most common way to define atomistic systems. However, this description does not capture the system's rotational, translational, or permutation invariances. Converting the Cartesian coordinates
to "feature vectors" that describe each atom is commonly used to solve this problem. Researchers have devised a wide range of featurization strategies that encode these fundamental physical symmetries, including the Coulomb matrix [37], the radial distribution function based fingerprint [38], FCHL [35], SOAP [36], COMB [17], Chebyshev polynomials [39], and the atom-centered symmetry functions [24, 17]. These strategies have yielded remarkably accurate models within various sub-fields but are not sufficiently general to treat all atomistic systems.

The most common limitation is related to the scaling of the feature vector size as the number of elements in the system increases. Most existing descriptors scale polynomially or combinatorially with the number of elements in the system [24, 36]. This poor scaling means that these featurization strategies can only be applied to training sets with a limited number of elements, making the prospect of a universal machine-learning model that works for all elements infeasible. Moreover, these strategies require a high-dimensional neural network (HDNN) [24, 40] structure to connect the features to atomic properties. The HDNN scheme uses element-specific neural networks, resulting in a large number of model parameters. The alternative to featurization is deep learning, typically with graph-convolutional neural network (GCN) models [25, 26, 27, 28, 32, 29, 30]. These deep learning models show an excellent ability to learn appropriate representations for molecular, solid-state, and surface systems with many elements [41]. However, this comes at the cost of less transparent models that typically require more time for training and prediction than their feature-based counterparts. Moreover, many of the deep learning approaches utilize existing featurization strategies [32, 25, 27, 28], suggesting that improved featurization schemes will translate to improved deep learning models.

Here we introduce a new featurization approach called Gaussian multi-pole (GMP) features. The GMP features utilize an implicit description of the electron density so that the feature vector's dimension is independent of the number of elements. Using the electron density as the fundamental input makes them suitable for universal machine-learning models that work for all elements and provides a straightforward route for extending them to systems that involve charged atoms or magnetic moments. It also naturally allows the use of the more efficient single neural network (SNN) [33] structure, where all elements share the same neural network. Moreover, the GMP features are related to a multi-pole expansion of the electron density [42, 43], making them physically relevant and systematically improvable. These properties of the GMP features will facilitate a new family of fast and interpretable feature-based machine learning models that have the general applicability of more complex and opaque GCN models.

The GMP approach encodes elemental identity through an approximate description of the electron density based on Gaussian basis functions. In this work, we use the valence density extracted from the SG15 pseudopotentials [44], approximated by 2-7 atom-centered Gaussians per element, normalized by the total number of valence electrons. The number and widths of the Gaussians for each element are determined using non-linear regression, and these "static valence densities" are fixed for all models presented in this work.
Details are provided in the supporting information. Conceptually, this is equivalent to using the valence density of non-interacting atoms as the fundamental description of the system (Fig. 1a). This approach also allows for interpolation and extrapolation between different elemental species, and concepts like charged atoms can be naturally accommodated using the isolated charged atom to construct a suitable valence potential. In principle, it is also possible to use other quantities like the all-electron density, spin density, or self-consistent electron density, as long as the quantity can be represented by a linear combination of atom-centered Gaussians.

The approximated electron density is then vectorized via the inner product between the electron density and atom-centered "probe" functions to generate the features for the atoms. The probe function consists of a Gaussian function and a Maxwell-Cartesian spherical harmonic (MCSH) function. Gaussian functions of varying width account for the radial variation (Fig. 1b) of the electron density surrounding an atom, and the MCSH functions capture the angular variation (Fig. 1c). The Gaussian product rule provides analytical solutions to the integral in Eqn. 1, enabling highly efficient computation of the features. Rotational invariance is enforced by taking norms of each MCSH group [45, 46].
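To make the construction of the static valence densities concrete, the following is a minimal sketch (not the authors' implementation; the reference density and initial guesses are placeholders) of fitting a small set of atom-centered Gaussians to a tabulated radial valence density by non-linear regression:

```python
# Fit a placeholder radial valence density rho_ref(r) with a sum of Gaussians.
# In the paper the fitted density is normalized by the number of valence electrons.
import numpy as np
from scipy.optimize import curve_fit

def gaussian_sum(r, *params):
    # params are (c_1, sigma_1, c_2, sigma_2, ...) for n Gaussians
    c = np.asarray(params[0::2])[:, None]
    sigma = np.asarray(params[1::2])[:, None]
    return np.sum(c * np.exp(-r[None, :] ** 2 / (2.0 * sigma ** 2)), axis=0)

r = np.linspace(0.0, 5.0, 200)                  # radial grid (Angstrom)
rho_ref = np.exp(-r / 0.5) / (1.0 + r)          # placeholder "reference" valence density
p0 = [1.0, 0.2, 0.5, 0.5, 0.25, 1.0, 0.1, 2.0]  # initial (c, sigma) guesses, 4 Gaussians
popt, _ = curve_fit(gaussian_sum, r, rho_ref, p0=p0, maxfev=20000)
print("max abs fit error:", np.abs(gaussian_sum(r, *popt) - rho_ref).max())
```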
Figure 1: Illustration of the GMP featurization scheme. a) Construct the electron density distribution using a linear combination of Gaussians. b) Apply the Gaussian radial probe to focus on the electronic environments at different radial length scales around each atom core by masking the electron density distribution with Gaussian functions of different widths. c) Apply the angular (MCSH) probe that acts as a multi-pole expansion of the radially-masked electron density via inner products with different groups of MCSH functions. d) Take the norm of the results for each group to ensure rotational invariance, yielding an entry to the feature vector for each individual atom.

Mathematically this is described as

\mu_{i,abc} = \langle \mathrm{probe}, \hat{\rho} \rangle
            = \langle \mathrm{angular\ probe} \times \mathrm{radial\ probe}, \hat{\rho} \rangle
            = \left\langle S_{abc} \times G_i,\ \sum_j^{\mathrm{atoms}} \sum_k^{\mathrm{Gaussians}} G_{\mathrm{dens},j,k} \right\rangle
            = \iiint_V S_{abc}\, G_{\mathrm{probe},i} \sum_j^{\mathrm{atoms}} \sum_k^{\mathrm{Gaussians}} G_{\mathrm{dens},j,k}\, \mathrm{d}V
            = \sum_j^{\mathrm{atoms}} \sum_k^{\mathrm{Gaussians}} \iiint_V S_{abc}\, G_{\mathrm{probe},i}\, G_{\mathrm{dens},j,k}\, \mathrm{d}V    (1)

where \langle a, b \rangle denotes the inner product of two functions, V is the volume, and \mu_{i,abc} is the feature resulting from the radial probe G_i and angular probe S_{abc}. \hat{\rho} is the distribution of electron density of a molecule, approximated by linear combinations of primitive Gaussians G_{\mathrm{dens},j,k} centered at each atom. The formalism does not include a strict cutoff radius, though for computational efficiency the sum can be restricted to surrounding atoms with a non-negligible overlap.

Figure 2: Graphical illustrations of the first 4 orders of Maxwell-Cartesian spherical harmonics (MCSH), denoted by S^{(n)} P(abc), where n is the order and P(abc) denotes the permutation group of the index abc. Panels: (a) S^{(0)} P(000), (b) S^{(1)} P(100), (c) S^{(2)} P(200), (d) S^{(2)} P(110), (e) S^{(3)} P(300), (f) S^{(3)} P(210), (g) S^{(3)} P(111). The Euclidean norm of the MCSH-based probe in each group provides the 3D rotation-invariant descriptors that are used to form the GMP feature.

The MCSH functions S_{abc} are linear combinations of polynomials:

S^n_{abc} = \sum_t^{\mathrm{terms}} C_t\, x^{m_{x,t}}\, y^{m_{y,t}}\, z^{m_{z,t}}    (2)

where abc is the specific index of the spherical harmonic function, n = a + b + c is its order, and m_x, m_y, m_z are the exponents of the specific polynomial. The first 4 orders of MCSH functions are listed in Table 1 and illustrated in Fig. 2.

Table 1: The analytical expressions of the first four orders of MCSH, denoted by S^{(n)}_{abc}. [Table entries garbled in extraction and omitted here.]

To ensure that the resulting features are rotationally invariant, we use the norm of each group of \mu_{i,abc} [45]. Therefore, the GMP feature vector is defined as

\vec{\Phi} = \left\{ \phi_{i,abc} = \sqrt{\sum_{P(a,b,c)} \mu_{i,abc}^2} \ \middle|\ a, b, c \in \mathbb{N} \right\}    (3)

where \vec{\Phi} denotes the GMP features, i is an index over the radial probes, abc is an index combination corresponding to a rotational group, and P(a, b, c) denotes the permutation group of a, b, c (e.g. P(1, 0, 0) = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}).
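As a concrete illustration of Eqns. 1 and 3, the following self-contained numerical sketch uses a toy density of two "atoms" (one Gaussian each; the exponents and positions are arbitrary choices, and MCSH normalization constants are omitted). The order-0 feature is evaluated analytically with the Gaussian product rule and checked against brute-force grid integration, and the order-1 group norm is formed numerically:

```python
import numpy as np

def overlap(alpha, A, beta, B):
    # Analytic 3D integral of exp(-alpha|r-A|^2) * exp(-beta|r-B|^2)
    # via the Gaussian product rule.
    p = alpha + beta
    return (np.pi / p) ** 1.5 * np.exp(-alpha * beta / p * np.sum((A - B) ** 2))

centers = [np.zeros(3), np.array([1.0, 0.5, 0.0])]  # two atom positions
betas = [1.2, 0.8]                                   # density Gaussian exponents
alpha, probe_center = 0.5, centers[0]                # radial probe on atom 0

# order-0 MCSH is a constant, so mu_{i,000} is a plain sum of overlaps
mu_000 = sum(overlap(alpha, probe_center, b, c) for b, c in zip(betas, centers))

# brute-force check on a grid
x = np.linspace(-5.0, 6.0, 111)
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
dV = (x[1] - x[0]) ** 3
rho = sum(np.exp(-b * ((X - c[0])**2 + (Y - c[1])**2 + (Z - c[2])**2))
          for b, c in zip(betas, centers))
probe = np.exp(-alpha * ((X - probe_center[0])**2 + (Y - probe_center[1])**2
                         + (Z - probe_center[2])**2))
print(mu_000, np.sum(probe * rho) * dV)  # should agree to grid accuracy

# order-1 moments (S_100 ~ x, etc., relative to the probe center) and the
# rotation-invariant group norm of Eqn. 3
mu_1 = [np.sum((G - g0) * probe * rho) * dV
        for G, g0 in zip((X, Y, Z), probe_center)]
print(np.sqrt(sum(m**2 for m in mu_1)))  # phi_{i,100}
```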
The set of features can thus be written as

\vec{\Phi} = \left\{ \sqrt{\mu_{1,000}^2},\ \sqrt{\mu_{1,100}^2 + \mu_{1,010}^2 + \mu_{1,001}^2},\ \sqrt{\mu_{1,200}^2 + \mu_{1,020}^2 + \mu_{1,002}^2},\ \sqrt{\mu_{1,110}^2 + \mu_{1,101}^2 + \mu_{1,011}^2},\ \ldots, \right.
\left. \sqrt{\mu_{2,000}^2},\ \sqrt{\mu_{2,100}^2 + \mu_{2,010}^2 + \mu_{2,001}^2},\ \sqrt{\mu_{2,200}^2 + \mu_{2,020}^2 + \mu_{2,002}^2},\ \sqrt{\mu_{2,110}^2 + \mu_{2,101}^2 + \mu_{2,011}^2},\ \ldots \right\}    (4)

Conceptually, the resulting features are equivalent to a multi-pole expansion of the electron density around each atom. The multi-pole expansion provides rotational features that are complete and orthogonal to each other. The Gaussian probe's width controls the radial length scale of the multi-pole expansions, and the radial features are over-complete. These properties reduce linear dependencies within the features and lead to a systematic improvement in the system's description as the number of features increases.

In this work, we combine the GMP features with a neural network regression model to demonstrate the efficiency, accuracy, and transferability of models based on these features. As mentioned, the single neural network (SNN) architecture [33] is well-suited to the GMP features since all elements utilize the same features (Fig. 1). We also utilize a per-element bias term, equivalent to fitting to formation energies instead of total energies, to reduce the magnitude of the energies. We have implemented the GMP+SNN framework in the AMPTorch code [47, 48] and used this implementation for all models in this work. The number of parameters used by the GMP+SNN model varies depending on the number of features, hidden layers, and nodes per layer, but is generally lower than the number of parameters in a comparable GCN by 1-2 orders of magnitude. Details of model fitting, implementation, and parameters are provided in the supplementary information. The notation GMP(N_MCSH, N_Gaussian) is used to denote the feature sets in the examples below, where N_Gaussian is the number of radial Gaussians and N_MCSH is the maximum order of MCSH used to construct the feature set. Note that although there are multiple groups in each order of MCSH, with different importance in describing the environment, we always include all the groups in each order in this study for simplicity. The notation SNN(N_nodes, N_layers) is used to denote a SNN model with N_layers hidden layers and N_nodes nodes per layer, where the activation function is always Tanh and batch normalization is applied for each layer. Therefore, GMP(3,9) is a feature set constructed using 9 radial Gaussians and all 7 groups of MCSH from order 0 to 3 (resulting in 63 features), and GMP(3,9)+SNN(50,3) is a SNN model based on the GMP(3,9) feature set with 3 hidden layers and 50 nodes per layer (an illustrative sketch of this architecture is given below). We note that models with the same number of radial Gaussian probes are not necessarily equivalent, since Gaussian widths are determined manually at this point. Additional details including the widths used for the radial Gaussians are provided in the supplementary information.

First, we compare the GMP+SNN model to the Behler-Parrinello neural network (BPNN) approach that is ubiquitous in materials science and chemistry due to its simplicity, generality, and efficiency [24, 49, 50, 51, 52, 53, 48]. We utilize an established molecular dynamics trajectory of the 3-element (C, H, O) aspirin molecule at the DFT/PBE+vdW-TS level of theory for this comparison [54]. For BPNN featurization, we use 12 variants inspired by examples in the literature [55]. All neural networks consist of 3 hidden layers with 50 nodes each, and the BPNN approach uses separate neural networks for each element. We use a training set size of 40K images for all models, and the test and validation sets each contain 10K images. We ensure robustness by using ten randomly selected train/test/validation sets. The performance is measured by the mean absolute error (MAE) for predicted energies of the test set images.

Fig 3a shows the GMP+SNN model error as a function of the multi-pole expansion order and the number of Gaussian radial probes. The results reveal that the GMP+SNN accuracy increases systematically with the multi-pole expansion order and the number of radial probes, providing a clear strategy for identifying an appropriate feature set for a given problem. In Fig 3b, we compare the accuracy of the GMP+SNN model to the BPNN model as a function of the number of features. All Pareto-optimal models utilize the GMP+SNN approach, despite the fact that the BPNN models have three times more fitted parameters due to element-specific neural networks. It is also clear that the accuracy of BPNN models does not systematically improve with the number of features. Finally, we compare the wall time needed to compute a single image for each model. To ensure a fair comparison, we use the same CPU for each test, both approaches use the same C source code and loop structure, and the time is an average over the 10K predicted validation images (see supplementary information). Fig. 3c shows that the GMP+SNN framework is always faster than the BPNN approach at a fixed accuracy level, or is always more accurate for a fixed computation time. This example demonstrates that the GMP+SNN approach is capable of achieving a lower error than the BPNN approach with fewer parameters and less time.
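The following is a minimal PyTorch sketch of the SNN architecture described above (one network shared by all elements, Tanh activations, batch normalization on each hidden layer, and a per-element bias); it is an illustration rather than the AMPTorch implementation, and the tensor layout is an assumption:

```python
import torch
import torch.nn as nn

class SingleNN(nn.Module):
    def __init__(self, n_features=63, n_nodes=50, n_layers=3, n_elements=3):
        super().__init__()
        layers, width = [], n_features
        for _ in range(n_layers):
            layers += [nn.Linear(width, n_nodes), nn.BatchNorm1d(n_nodes), nn.Tanh()]
            width = n_nodes
        layers.append(nn.Linear(width, 1))               # atomic energy head
        self.net = nn.Sequential(*layers)
        self.element_bias = nn.Embedding(n_elements, 1)  # per-element bias term

    def forward(self, features, element_ids, image_ids, n_images):
        # features: (n_atoms, n_features); element_ids, image_ids: (n_atoms,) LongTensors
        atomic_e = self.net(features) + self.element_bias(element_ids)
        total_e = features.new_zeros(n_images)
        total_e.index_add_(0, image_ids, atomic_e.squeeze(-1))  # sum atoms per image
        return total_e

# e.g. GMP(3,9)+SNN(50,3): model = SingleNN(n_features=63, n_nodes=50, n_layers=3)
```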
The performance of the GMP+SNN can be further improved with force regularization, with a loss function defined as

\Gamma = \frac{1}{N_{\mathrm{image}}} \sum_{i=1}^{N_{\mathrm{image}}} \left[ \left\| E^i_{\mathrm{NN}} - E^i_{\mathrm{ref}} \right\|^2 + \frac{\beta}{N^i_{\mathrm{atom}}} \sum_{j=1}^{N^i_{\mathrm{atom}}} \left\| F^{ij}_{\mathrm{NN}} - F^{ij}_{\mathrm{ref}} \right\|^2 \right]    (5)

where N_image is the number of training images, N^i_atom is the number of atoms in image i, E_NN and E_ref are the predicted and reference energy of an image, F_NN and F_ref are the predicted and reference forces of an atom along each Cartesian axis, and β is the force regularization coefficient.
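A hedged sketch of Eqn. 5 for a single batch (the variable shapes and the ragged per-image force handling are assumptions for illustration, not the AMPTorch implementation):

```python
import torch

def energy_force_loss(E_nn, E_ref, F_nn, F_ref, beta):
    # E_nn, E_ref: (n_images,) energies; F_nn, F_ref: lists of (n_atoms_i, 3) tensors
    energy_term = (E_nn - E_ref) ** 2                     # squared energy error per image
    force_terms = torch.stack([
        torch.sum((f_nn - f_ref) ** 2) / f_nn.shape[0]    # (1/N_atom^i) * sum_j ||F_ij||^2
        for f_nn, f_ref in zip(F_nn, F_ref)
    ])
    return torch.mean(energy_term + beta * force_terms)   # average over images
```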
Figure 3: Training results of GMP+SNN models compared to Behler-Parrinello + HDNN models: (a) convergence of test set MAE as the rotational order and number of radial Gaussians are increased; (b) comparison of test MAE for GMP+SNN and BPNN models as a function of the number of features; (c) comparison of test MAE for GMP+SNN and BPNN models as a function of the average prediction time for a single image. The training set consists of 40k images, the validation set of 10k images, and the test set of 10k images, all randomly drawn from the aspirin simulation trajectory. Each setting was tested 10 times to ensure robustness.

Figure 4 shows the results of GMP+SNN with 2 feature sets, GMP(2,4)+SNN(50,3) and GMP(3,12)+SNN(50,3), as a function of the force regularization coefficient, β. For simplicity, this test was done with 1K training images and 1K test images. The results confirm that energy training benefits from force regularization. In this case, the energy error is reduced by a factor of 2 with the optimal regularization coefficient. Both energy and force prediction accuracy show the same trend and the same optimal force coefficient, indicating that the model avoids the apparent trade-off between energy and force accuracy that has been observed for some other model architectures [41].

Next, we show that the GMP+SNN technique can scale to systems with more elements by training it on the atomization energy of the established QM9 benchmark dataset [56]. This dataset consists of 134K chemical species with up to 9 heavy atoms and five elements (C, H, O, N, F) optimized at the B3LYP level of theory. In this example, 4 GMP descriptor sets and corresponding SNNs of different sizes are trained and tested using energies of each system. The learning curves showing the out-of-sample test error [57] as a function of training set size are shown in Fig. 5a. The errors of the tested GMP+SNN models are higher than those of state-of-the-art models based on kernel ridge regression or GCNs (about 6 meV with 100K training data) [57, 30, 31, 27, 25, 28, 35, 36], but the GMP+SNN models utilize fewer adjustable parameters and are more scalable than non-parametric models like Gaussian processes.
Figure 4: Test MAE of energies and forces as a function of the force regularization coefficient on 1K randomly selected images from the aspirin trajectory. Two feature sets are compared here, with GMP(3,12) (84 features) larger than GMP(2,4) (16 features).
Figure 5: Learning curve and elemental transferability of GMP+SNN on the QM9 dataset: (a) learning curves of the GMP+SNN models with various feature sets, where the dashed line indicates chemical accuracy (43 meV); (b) test MAE distribution for a model trained exclusively on molecules without F and tested on all F-containing molecules.

Fig. 5b shows the results of transfer learning between different elemental species in the QM9 data set. We trained a model on the set of all molecules that do not contain fluorine (131,722 molecules), and tested the same model on fluorine-containing molecules (2,163 molecules). The model error increases by around one order of magnitude when the predictions include elements not present in the training set. However, the test set MAE is 0.193 eV, which is competitive with the accuracy of the generalized gradient approximation of density functional theory (GGA DFT) [58]. This reveals that the model is reasonably accurate even for elements outside the training set.

Finally, we test the performance of the GMP+SNN approach on the Open Catalyst Project (OC20) S2EF dataset. This dataset consists of geometries and corresponding adsorption energies of more than 100M adsorbate-catalyst pairs containing a total of 56 elements across 82 adsorbates and a wide range of catalyst compositions for each adsorbate. The dataset also provides an independent set of 1M test systems [41]. The energies correspond to the GGA DFT level of theory with the RPBE functional [59], with mixed boundary conditions. The size, number of elements, and mix of solid-state and molecular systems make this one of the most challenging benchmark datasets available. To date, the only models capable of training and prediction for this dataset utilize elaborate GCN models [41]. Fig. 6 shows the learning curves for two GMP+SNN models of different sizes tested on the provided in-domain (ID) validation set. The small model uses the GMP(2,10)+SNN(50,5) architecture (40 features, 10K parameters), while the large model uses the GMP(4,19)+SNN(350,5) architecture (209 features, 400K parameters). Details of both models are provided in the supplementary information. The full OC20 training set of 100M energies is too large for the temporary memory of the AMPTorch architecture on available computing resources, so we restrict the analysis to energy training on data sets with fewer than 200K training images.

The results of training and testing on the OC20 set are shown in Fig. 6. The in-domain test error reaches a minimum of 1.76 eV on 200K training images with the GMP(4,19)+SNN(350,5) model, a result comparable to the performance of the DimeNet++ (1.8 eV) and SchNet (1.4 eV) GCN models with the same training set size [41]. The error at 200K training images also decreases from 3.17 eV with the GMP(2,10)+SNN(50,5) model to 1.76 eV with the GMP(4,19)+SNN(350,5) model, indicating that the error should decrease further as more features are included or larger neural networks are utilized. Moreover, the GMP(4,19)+SNN(350,5) shows a steady learning speed of -0.38 (defined by the slope of the learning curve) that is faster than the learning speeds observed for GCN models, although the learning speeds of GCN models are estimated on larger training sets, so the speeds may not be directly comparable [41]. If the learning speed of the large model remains constant for larger training sets, then the GMP(4,19)+SNN(350,5) model will surpass GCN models at around 2.5M training images and will reach chemical accuracy around 3B training images. The learning speed and ultimate performance of GMP+SNN models is also expected to improve as more features, larger neural networks, and force regularization are utilized, although significant computational resources will be required to optimize and evaluate GMP+SNN models for the OC20 dataset.

Figure 6: Learning curves for GMP+SNN models applied to the OC20 S2EF dataset, with up to 200K training points. The ID validation sets consist of 1M images. Dashed lines are the trend lines extracted from the largest 4 training sets.

These examples demonstrate that the GMP featurization scheme is an efficient and universal approach to fingerprinting atomistic systems with an arbitrary number of elements or atom types. The GMP features are more computationally efficient than the deep-learning GCN models that are commonly used for many-element systems, and are even faster than the widely-used Behler-Parrinello symmetry functions, resulting in machine-learned force fields that can be scaled to large many-element systems and long time scales. The GMP feature vectors utilize an implicit description of the system's electron density, making the number of features independent of the number of elements, facilitating the inclusion of general concepts from electronic structure theory, and enabling extrapolation to new element types. Moreover, the GMP features utilize physically-meaningful concepts, leading to more interpretable models that can be systematically improved and providing a foundation for hybrid models that incorporate more physics. For example, the self-consistent electron density from a simplified Hamiltonian could be used as input to the GMP model, or the limiting behavior derived from electrostatics could be used to constrain the regression model.

The examples presented here use the SNN regression model in conjunction with the GMP features based on static valence densities to obtain accuracy comparable to state-of-the-art models on the QM9 and OC20 datasets.
The results confirm that the GMP+SNN approach can reach competitive accuracy with GCN-based models despite having far fewer adjustable parameters. There are numerous opportunities to revise and optimize the details of the GMP+SNN approach presented here, including modifying the valence density representation, optimizing feature selection, and systematically optimizing the SNN architecture. However, the encouraging initial results across a wide range of application domains suggest that the GMP+SNN approach is a promising universal route to constructing efficient and general machine-learning models for atomistic systems.
Acknowledgments

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences Computational Chemical Sciences program under Award Numbers DE-SC0019441 and DE-SC0019410. Computational effort was supplied partially by the National Science Foundation under Grant No. MRI-1828187. We acknowledge helpful discussions with Andrew Peterson, Zachary Ulissi, and Muhammed Shuaibi.
References

[1] Kieron Burke. Perspective on density functional theory. The Journal of Chemical Physics, 136(15):150901, 2012.
[2] Axel D. Becke. Perspective: Fifty years of density-functional theory in chemical physics. The Journal of Chemical Physics, 140(18):18A301, 2014.
[3] Narbe Mardirossian and Martin Head-Gordon. Thirty years of density functional theory in computational chemistry: an overview and extensive assessment of 200 density functionals. Molecular Physics, 115(19):2315–2372, 2017.
[4] Adam Hospital, Josep Ramon Goñi, Modesto Orozco, and Josep L Gelpí. Molecular dynamics simulations: advances and applications. Advances and Applications in Bioinformatics and Chemistry, 8:37–47, 2015.
[5] Marco De Vivo, Matteo Masetti, Giovanni Bottegoni, and Andrea Cavalli. Role of molecular dynamics and related methods in drug discovery. Journal of Medicinal Chemistry, 59(9):4035–4061, 2016.
[6] J. Andrew McCammon and Martin Karplus. Molecular dynamics simulations of biomolecules. Nature Structural Biology, 9(9):646–652, 2002.
[7] Judith A Harrison, J. David Schall, Sabina Maskey, Paul T Mikulski, M. Todd Knippenberg, and Brian H Morrow. Review of force fields and intermolecular potentials used in atomistic computational materials research. Applied Physics Reviews, 5(3):31104, 2018.
[8] Artem R Oganov, Chris J Pickard, Qiang Zhu, and Richard J Needs. Structure prediction drives materials discovery. Nature Reviews Materials, 4(5):331–348, 2019.
[9] Xinguo Ren, Patrick Rinke, Christian Joas, and Matthias Scheffler. Random-phase approximation and its applications in computational chemistry and materials science. Journal of Materials Science, 47(21):7447–7471, 2012.
[10] Edoardo Mosconi, Jon M Azpiroz, and Filippo De Angelis. Ab initio molecular dynamics simulations of methylammonium lead iodide perovskite degradation by water. Chemistry of Materials, 27(13):4885–4892, 2015.
[11] Dominik Marx and Jürg Hutter. Ab Initio Molecular Dynamics: Basic Theory and Advanced Methods. Cambridge University Press, 2009.
[12] Radu Iftimie, Peter Minary, Mark E. Tuckerman, and Bruce J. Berne. Ab initio molecular dynamics: Concepts, recent developments, and future trends. Proceedings of the National Academy of Sciences, 102(19):6654–6659, 2005.
[13] B. R. Brooks, C. L. Brooks, A. D. Mackerell, L. Nilsson, R. J. Petrella, B. Roux, Y. Won, G. Archontis, C. Bartels, S. Boresch, A. Caflisch, L. Caves, Q. Cui, A. R. Dinner, M. Feig, S. Fischer, J. Gao, M. Hodoscek, W. Im, K. Kuczera, T. Lazaridis, J. Ma, V. Ovchinnikov, E. Paci, R. W. Pastor, C. B. Post, J. Z. Pu, M. Schaefer, B. Tidor, R. M. Venable, H. L. Woodcock, X. Wu, W. Yang, D. M. York, and M. Karplus. CHARMM: The biomolecular simulation program. Journal of Computational Chemistry, 30(10):1545–1614, 2009.
[14] David A Case, Thomas E Cheatham, Tom Darden, Holger Gohlke, Ray Luo, Kenneth M Merz, Alexey Onufriev, Carlos Simmerling, Bing Wang, and Robert J Woods. The Amber biomolecular simulation programs. Journal of Computational Chemistry, 26(16):1668–1688, 2005.
[15] Adri C. T. van Duin, Siddharth Dasgupta, Francois Lorant, and William A Goddard. ReaxFF: A reactive force field for hydrocarbons. The Journal of Physical Chemistry A, 105(41):9396–9409, 2001.
[16] William L Jorgensen and Julian Tirado-Rives. The OPLS [optimized potentials for liquid simulations] potential functions for proteins, energy minimizations for crystals of cyclic peptides and crambin. Journal of the American Chemical Society, 110(6):1657–1666, 1988.
[17] Simon R. Phillpot, Andrew C. Antony, Linyuan Shi, Michele L. Fullarton, Tao Liang, Susan B. Sinnott, Yongfeng Zhang, and S. Bulent Biner. Charge optimized many body (COMB) potentials for simulation of nuclear fuel and clad. Computational Materials Science, 148:231–241, 2018.
[18] Thomas P Senftle, Sungwook Hong, Md Mahbubul Islam, Sudhir B Kylasa, Yuanxia Zheng, Yun Kyung Shin, Chad Junkermeier, Roman Engel-Herbert, Michael J Janik, Hasan Metin Aktulga, Toon Verstraelen, Ananth Grama, and Adri C T van Duin. The ReaxFF reactive force-field: development, applications and future directions. npj Computational Materials, 2(1):15011, 2016.
[19] Alexander D MacKerell. Chapter 7: Empirical force fields for proteins: Current status and future directions. Annual Reports in Computational Chemistry, 1:91–102, 2005.
[20] Judith A Harrison, J. David Schall, Sabina Maskey, Paul T Mikulski, M. Todd Knippenberg, and Brian H Morrow. Review of force fields and intermolecular potentials used in atomistic computational materials research. Applied Physics Reviews, 5(3):31104, 2018.
[21] Chris M Handley and Paul L. A. Popelier. Potential energy surfaces fitted by artificial neural networks. The Journal of Physical Chemistry A, 114(10):3371–3383, 2010.
[22] Jörg Behler. Perspective: Machine learning potentials for atomistic simulations. The Journal of Chemical Physics, 145(17):170901, 2016.
[23] Rampi Ramprasad, Rohit Batra, Ghanshyam Pilania, Arun Mannodi-Kanakkithodi, and Chiho Kim. Machine learning in materials informatics: recent applications and prospects. npj Computational Materials, 3(1):1–13, 2017.
[24] Jörg Behler and Michele Parrinello. Generalized neural-network representation of high-dimensional potential-energy surfaces. Physical Review Letters, 98:146401, 2007.
[25] Kristof Schütt, Pieter-Jan Kindermans, Huziel Enoc Sauceda Felix, Stefan Chmiela, Alexandre Tkatchenko, and Klaus-Robert Müller. SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30, pages 991–1001. Curran Associates, Inc., 2017.
[26] Tian Xie and Jeffrey C Grossman. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Physical Review Letters, 120(14):145301, 2018.
[27] Johannes Klicpera, Janek Groß, and Stephan Günnemann. Directional message passing for molecular graphs. In International Conference on Learning Representations, 2020.
[28] Oliver T Unke and Markus Meuwly. PhysNet: A neural network for predicting energies, forces, dipole moments, and partial charges. Journal of Chemical Theory and Computation, 15(6):3678–3693, 2019.
[29] Nicholas Lubbers, Justin S Smith, and Kipton Barros. Hierarchical modeling of molecular energies using a deep neural network. The Journal of Chemical Physics, 148(24):241715, 2018.
[30] Shuo Zhang, Yang Liu, and Lei Xie. Molecular mechanics-driven graph neural network with multiplex graph for molecular structures. 2020.
[31] Zeren Shui and George Karypis. Heterogeneous molecular graph neural networks for predicting molecule properties. 2020.
[32] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. 2017.
[33] Mingjie Liu and John R. Kitchin. SingleNN: Modified Behler–Parrinello neural network with shared weights for atomistic simulations with transferability. The Journal of Physical Chemistry C, 124(32):17811–17818, 2020.
[34] J. S. Smith, O. Isayev, and A. E. Roitberg. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chemical Science, 8:3192–3203, 2017.
[35] Felix A Faber, Anders S Christensen, Bing Huang, and O. Anatole von Lilienfeld. Alchemical and structural distribution based representation for universal quantum machine learning. The Journal of Chemical Physics, 148(24):241717, 2018.
[36] Albert P Bartók, Sandip De, Carl Poelking, Noam Bernstein, James R Kermode, Gábor Csányi, and Michele Ceriotti. Machine learning unifies the modeling of materials and molecules. Science Advances, 3(12):e1701816, 2017.
[37] Matthias Rupp, Alexandre Tkatchenko, Klaus-Robert Müller, and O Anatole von Lilienfeld. Fast and accurate modeling of molecular atomization energies with machine learning. Physical Review Letters, 108(5):058301, 2012.
[38] Venkatesh Botu and Rampi Ramprasad. Adaptive machine learning framework to accelerate ab initio molecular dynamics. International Journal of Quantum Chemistry, 115(16):1074–1083, 2015.
[39] Nongnuch Artrith, Alexander Urban, and Gerbrand Ceder. Efficient and accurate machine-learning interpolation of atomic energies in compositions with many species. Physical Review B, 96:014112, 2017.
[40] J Behler. Representing potential energy surfaces by high-dimensional neural network potentials. Journal of Physics: Condensed Matter, 26(18):183001, 2014.
[41] Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, and Zachary Ulissi. The Open Catalyst 2020 (OC20) dataset and community challenges, 2020.
[42] A. R. Edmonds. Angular Momentum in Quantum Mechanics. Investigations in Physics 4, 1957.
[43] William J Thompson and LeRoy F Cook. Angular momentum: An illustrated guide to rotational symmetries for physical systems. American Journal of Physics, 63(7):670–671, 1995.
[44] D. R. Hamann. Optimized norm-conserving Vanderbilt pseudopotentials. Physical Review B, 88:085117, 2013.
[45] Xiangyun Lei and Andrew J. Medford. Design and analysis of machine learning exchange-correlation functionals via rotationally invariant convolutional descriptors. Physical Review Materials, 3:063801, 2019.
[46] Jon Applequist. Maxwell–Cartesian spherical harmonics in multipole potentials and atomic orbitals. Theoretical Chemistry Accounts, 107(2):103–115, 2002.
[47] AMPTorch. https://github.com/ulissigroup/amptorch, 2020.
[48] Muhammed Shuaibi, Saurabh Sivakumar, Rui Qi Chen, and Zachary W Ulissi. Enabling robust offline active learning for machine learning potentials using simple physics-based priors. Machine Learning: Science and Technology, 2(2), 2020.
[49] Kyuhyun Lee, Dongsun Yoo, Wonseok Jeong, and Seungwu Han. SIMPLE-NN: An efficient package for training and executing neural-network interatomic potentials. Computer Physics Communications, 242:95–103, 2019.
[50] Alireza Khorshidi and Andrew A. Peterson. Amp: A modular approach to machine learning in atomistic simulations. Computer Physics Communications, 207:310–324, 2016.
[51] Nongnuch Artrith, Tobias Morawietz, and Jörg Behler. High-dimensional neural-network potentials for multicomponent systems: Applications to zinc oxide. Physical Review B, 83:153101, 2011.
[52] Nongnuch Artrith and Jörg Behler. High-dimensional neural network potentials for metal surfaces: A prototype study for copper. Physical Review B, 85:045439, 2012.
[53] Nongnuch Artrith, Björn Hiller, and Jörg Behler. Neural network potentials for metals and oxides – first applications to copper clusters at zinc oxide. Physica Status Solidi (b), 250(6):1191–1203, 2013.
[54] Stefan Chmiela, Huziel E. Sauceda, Klaus-Robert Müller, and Alexandre Tkatchenko. Towards exact molecular dynamics simulations with machine-learned force fields. Nature Communications, 9(1):3887, 2018.
[55] Christoph Schran, Jörg Behler, and Dominik Marx. Automated fitting of neural network potentials at coupled cluster accuracy: Protonated water clusters as testing ground. Journal of Chemical Theory and Computation, 16(1):88–99, 2020.
[56] Raghunathan Ramakrishnan, Pavlo O Dral, Matthias Rupp, and O. Anatole von Lilienfeld. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data, 1(1):140022, 2014.
[57] O. Anatole von Lilienfeld, Klaus-Robert Müller, and Alexandre Tkatchenko. Exploring chemical compound space with quantum-based machine learning. 2019.
[58] Kevin E Riley, Bryan T Op't Holt, and Kenneth M Merz. Critical assessment of the performance of density functional methods for several atomic and molecular properties. Journal of Chemical Theory and Computation, 3(2):407–433, 2007.
[59] B. Hammer, L. B. Hansen, and J. K. Nørskov. Improved adsorption energetics within density-functional theory using revised Perdew-Burke-Ernzerhof functionals. Physical Review B, 59:7413–7421, 1999.
Supplementary Information

Code availability

Please check out and install the "MCSH_paper1" branch of AMPTorch to try any of the saved models and test scripts in this study. The code can be found at https://github.com/ulissigroup/amptorch/tree/MCSH_paper1, and the test scripts can be found at https://github.com/ray38/GMP_AmpTorch_Tests. Tutorials for regenerating all the test models are included in the repo.
Aspirin benchmark details

Cutoffs for both the BP scheme and the GMP scheme are set to 10 Å. The neural networks are all trained using the same procedure, with the same learning rate and number of epochs for all models (see the test scripts). For the BP vs. GMP comparison example on aspirin MD data, the batch size is chosen to be 256 images. For the force training example, the batch size is chosen to be 32 images.
12 sets of Behler-Parrinello features are selected, as listed below in Table 2:

Set | η (G2) | R_s (G2) | η (G4) | ζ (G4) | γ (G4) | N_feature*
1 | [0.05, 0.0965, 0.1864, 0.3598, 0.6947, 1.3413, 2.5897, 5.0] | [0, 1.5] | [0.001, 0.01, 0.03] | [1.0, 2.0, 4.0] | [1.0, -1.0] | 156
2 | [0.05, 0.0965, 0.1864, 0.3598, 0.6947, 1.3413, 2.5897, 5.0] | [0, 1.5] | [0.01, 0.03] | [1.0, 4.0] | [1.0, -1.0] | 96
3 | [0.05, 0.0965, 0.1864, 0.3598, 0.6947, 1.3413, 2.5897, 5.0] | [0] | [0.01] | [1.0, 4.0] | [1.0, -1.0] | 48
4 | [0.05, 0.0965, 0.1864, 0.3598, 0.6947, 1.3413, 2.5897, 5.0] | [0] | [0.001, 0.01, 0.03] | [1.0, 2.0, 4.0] | [1.0, -1.0] | 132
5 | [0.05, 0.0965, 0.1864, 0.3598, 0.6947, 1.3413, 2.5897, 5.0] | [0] | [0.01, 0.03] | [1.0, 4.0] | [1.0, -1.0] | 72
6 | [0.05, 0.0965, 0.1864, 0.3598, 0.6947, 1.3413, 2.5897, 5.0] | [0, 1.5] | [0.01] | [1.0, 4.0] | [1.0, -1.0] | 72
7 | [0.05, 0.2324, 1.0772, 5.0] | [0, 1.5] | [0.001, 0.01, 0.03] | [1.0, 2.0, 4.0] | [1.0, -1.0] | 132
8 | [0.05, 0.2324, 1.0772, 5.0] | [0, 1.5] | [0.01, 0.03] | [1.0, 4.0] | [1.0, -1.0] | 72
9 | [0.05, 0.2324, 1.0772, 5.0] | [0] | [0.01] | [1.0, 4.0] | [1.0, -1.0] | 36
10 | [0.05, 0.2324, 1.0772, 5.0] | [0] | [0.001, 0.01, 0.03] | [1.0, 2.0, 4.0] | [1.0, -1.0] | 120
11 | [0.05, 0.2324, 1.0772, 5.0] | [0] | [0.01, 0.03] | [1.0, 4.0] | [1.0, -1.0] | 60
12 | [0.05, 0.2324, 1.0772, 5.0] | [0, 1.5] | [0.01] | [1.0, 4.0] | [1.0, -1.0] | 48

Table 2: The 12 sets of Behler-Parrinello symmetry function parameters. *Number of features per atom (computed from the combinatorial rule below).

η and R_s for G2 functions are used combinatorially, as are η, ζ, and γ for G4 functions. Moreover, there are 3 types of elements (C, H, O) for this dataset. Therefore, feature set 1 has 3 (elements) × 8 (η) × 2 (R_s) + 6 (possible element pairs) × 3 (η) × 3 (ζ) × 2 (γ) = 48 + 108 = 156 features per atom, as verified in the sketch below.
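The bookkeeping above can be checked with a short script (the pair count of 6 corresponds to the unordered element pairs CC, CH, CO, HH, HO, and OO):

```python
# Feature-count bookkeeping for BP set 1, following the combinatorial rule
# stated above: G2 features are element x eta x R_s combinations; G4 features
# are element-pair x eta x zeta x gamma combinations.
n_elements, n_pairs = 3, 6
n_g2 = n_elements * 8 * 2   # 8 eta values x 2 R_s values        -> 48
n_g4 = n_pairs * 3 * 3 * 2  # 3 eta x 3 zeta x 2 gamma values    -> 108
print(n_g2 + n_g4)          # 156 features per atom for set 1
```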
The list of test results with BP + HDNN models is shown below in Table 3.

Table 3: Performance test results of the tested BP + HDNN setups (training MAE in meV, test MAE in meV, and prediction time in ms/image for each of the 12 sets; values are averages over the 10 trials, with uncertainties estimated by their standard deviation). [Numerical entries garbled in extraction and omitted here; the per-atom feature counts per set are given in Table 2.]
GMP feature details

The probe of GMP has two parts: groups of MCSHs for probing angular features and radial Gaussians for probing radial features. In this work, all groups of MCSH up to the indicated order are included, and the number of possible groups for each order is listed below in Table 4:

Order | Number of Possible Groups | Total Number of Groups up to This Order
0 | 1 | 1
1 | 1 | 2
2 | 2 | 4
3 | 3 | 7
4 | 4 | 11
5 | 5 | 16
6 | 7 | 23
7 | 8 | 31
8 | 10 | 41

Table 4: List of possible groups of MCSHs for each order. Note that the number of possible groups at order n equals the number of distinct permutation groups of indices abc with a + b + c = n.

We combine the MCSH groups with the lists of radial Gaussians combinatorially to obtain the full list of probes/features. The lists of widths (standard deviations) of the Gaussians used in this test are manually picked and listed below in Table 5:

Number of Gaussians | List of Sigmas
1 | [0.25]
2 | [0.25, 2.0]
3 | [0.25, 1.0, 2.0]
4 | [0.25, 0.75, 1.5, 2.0]
5 | [0.25, 0.5, 1.0, 1.5, 2.0]
6 | [0.25, 0.5, 0.75, 1.0, 1.5, 2.0]

Table 5: List of sigmas used in Test 1.

Therefore, when there are 5 Gaussians with MCSH up to order 6, there are 5 × 23 = 115 descriptors per atom (see the sketch below).
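A short sketch verifying Table 4 and the resulting feature counts; it relies on the observation that the group count at order n equals the number of partitions of n into at most 3 parts:

```python
# Count MCSH groups of order n: distinct multisets {a, b, c} with a + b + c = n,
# i.e. partitions a >= b >= c >= 0 of n into at most 3 parts.
def n_groups(n):
    return sum(1 for a in range(n + 1) for b in range(a + 1)
               if 0 <= n - a - b <= b)

totals, running = [], 0
for order in range(9):
    running += n_groups(order)
    totals.append(running)
print(totals)         # [1, 2, 4, 7, 11, 16, 23, 31, 41], matching Table 4
print(5 * totals[6])  # 5 Gaussians, MCSH up to order 6 -> 115 features per atom
```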
The complete list of test results with GMP+SNN is given in Table 6.

Table 6: Performance test results of the full sets of tested GMP + SNN setups (training MAE, test MAE, and prediction time per image; values are averages over the 10 trials, with uncertainties estimated by their standard deviation). a: Number of Gaussian functions used to construct the descriptor probes. b: The highest MCSH order used to construct the probes; for example, when the highest order is 2, all groups from MCSH of orders 0, 1, and 2 are used to construct the probes. c: Number of features per atom. [Numerical entries garbled in extraction and omitted here.]
The sigmas of the radial probe Gaussians used in the force training example are listed in Table 7:

Model | Sigmas | N_feature
GMP(2,4)+SNN(50,3) | [0.25, 0.75, 1.5, 2.0] | 16
GMP(3,12)+SNN(50,3) | [0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0, 1.25, 1.5, 1.75, 2.0] | 84

Table 7: Setups for the GMP+SNN models used in the force training example.
QM9 benchmark details

A per-element bias is added to the SNN model to improve performance. Conceptually, this is equivalent to fitting to formation energies rather than absolute energies. The total energy of an atom is the model-predicted energy plus the per-element bias of the specific atom type. To determine the bias, a linear model is applied: the number of atoms of each element type is counted for every image in the training set, and these counts are the independent variables, while the corresponding energy of each system is the dependent variable. For example, the per-element biases for the trials with 100K training images are shown below in Table 8:

Atom Type | Per-element Bias (meV)
H | -2795.2721
C | -6217.7719
N | -4552.4431
O | -4432.6761
F | -4075.4931

Table 8: Per-element bias of each atom type found by the linear model, for the 100K training set. A sketch of this regression is given below.
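A sketch of the linear model described above (the molecule compositions and energies below are placeholders; a real training set has far more images than elements, so the least-squares problem is overdetermined):

```python
# Regress total energies onto per-element atom counts; the coefficients
# are the per-element biases of Table 8.
import numpy as np

elements = ["H", "C", "N", "O", "F"]
counts = np.array([[8, 6, 0, 1, 0],      # atoms of each element per image
                   [10, 7, 1, 0, 0],
                   [6, 5, 0, 2, 1]], dtype=float)
energies = np.array([-75.2, -88.9, -91.3])  # placeholder reference energies

biases, *_ = np.linalg.lstsq(counts, energies, rcond=None)
print(dict(zip(elements, biases)))
# the SNN is then trained on the residual: energies - counts @ biases
```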
With the per-element biases determined, GMP+SNN models are fitted to the atomization energy minus the per-element biases. The model setups are given in Table 9. The cutoff distance is always 15 Å, so that the largest radial probe takes a negligible value at the cutoff. The models are trained for 12,000 epochs with the learning rate decreased by a factor of 2 every 2,000 epochs. The batch size is set to be 32 images.

For the elemental transferability example, we used the same procedure as above, with the caveat that the per-element biases are not fitted using the linear model, but directly pulled from the 100K molecule trial. For more detail please refer to the test scripts.

Model | Sigmas | N_feature | N_parameters
GMP(1,8)+SNN(15,·) | [0.02, 0.2, 0.4, 0.69, 1.1, 1.66, 2.66, 4.4] | 16 | 1111
GMP(2,8)+SNN(25,·) | [0.02, 0.2, 0.4, 0.69, 1.1, 1.66, 2.66, 4.4] | 84 | 3001
GMP(2,13)+SNN(60,·) | [0.02, 0.12, 0.24, 0.36, 0.5, 0.69, 0.92, 1.2, 1.52, 2.0, 2.66, 3.5, 5.0] | 52 | 14701
GMP(2,37)+SNN(100,·) | [0.02, 0.05, 0.08, 0.12, 0.16, 0.2, 0.24, 0.28, 0.32, 0.36, 0.4, 0.45, 0.5, 0.56, 0.62, 0.69, 0.76, 0.84, 0.92, 1.01, 1.1, 1.2, 1.3, 1.4, 1.52, 1.66, 1.82, 2.0, 2.42, 2.66, 2.92, 3.2, 3.5, 3.9, 4.4, 5.0] | 148 | 45701

Table 9: Setups for the GMP+SNN models used in the QM9 examples; SNN(N,·) gives the nodes per layer (hidden-layer counts are given in the test scripts). Cutoff distance is always 15 Å. N_parameters is the number of trainable parameters of the neural network model.
OC20 benchmark details

The per-element biases are determined by the 200K training set and used for all trials. The biases are listed in Table 10. The model setups are given in Table 11. The models were trained for roughly 10,000 epochs with a decreasing learning rate and a batch size of 32 images.

H -3.459224543285167 | B -6.131687348061676 | C -8.409056118016434
N -8.482527946982596 | O -6.625329843626161 | Na -1.6272161266181353
Al -3.685546100708493 | Si -5.366115554333206 | P -5.356124619536539
S -4.701875825379599 | Cl -3.206133340067741 | K -1.6639990985033588
Ca -2.564389194029974 | Sc -6.857051683285094 | Ti -7.83797523757251
V -8.469671104728693 | Cr -8.525323765104233 | Mn -7.949821377229276
Fe -7.29185704318087 | Co -6.339343605830223 | Ni -4.983740855397919
Cu -3.0599559655471738 | Zn -0.7636980492562433 | Ga -2.645814340187994
Ge -4.137512388812219 | As -4.578400063883111 | Se -3.9030270048270483
Rb -1.501116295377563 | Sr -2.4817501066487404 | Y -7.159640206699965
Zr -8.657410352144826 | Nb -9.692635800596257 | Mo -9.917990868764402
Tc -9.376118112224386 | Ru -8.487414762933673 | Rh -6.88580158940369
Pd -4.878716025523049 | Ag -2.002845859469274 | Cd -0.3521934743778097
In -2.1579404553381734 | Sn -3.489109201477598 | Sb -3.7507685121733054
Te -3.1467202546325685 | Cs -1.6714210826933673 | Hf -10.015796029262765
Ta -11.220753704243743 | W -11.738519709357998 | Re -11.149426336972052
Os -10.243098652486818 | Ir -8.378935731902352 | Pt -5.868231040118616
Au -2.7985176919629122 | Hg 0.12859186837374637 | Tl -1.9054477570253923
Pb -3.1333889952396787 | Bi -3.430346703339706

Table 10: Per-element bias (eV) of each atom type found by the linear model, with the 200K training set for the OC20 dataset.

Model | Sigmas | N_feature | N_parameters
GMP(2,10)+SNN(50,5) | [0.02, 0.16, 0.32, 0.5, 0.76, 1.1, 1.52, 2.2, 3.2, 5.0] | 40 | 10151
GMP(4,19)+SNN(350,5) | [0.02, 0.08, 0.16, 0.24, 0.32, 0.4, 0.5, 0.62, 0.76, 0.92, 1.1, 1.3, 1.52, 1.82, 2.2, 2.66, 3.2, 3.9, 5.0] | 209 | 445201

Table 11: Setups for the GMP+SNN models used in the OC20 examples. Cutoff distance is always 15 Å. N_parameters is the number of trainable parameters of the neural network model.