[PDF] Accelerating Atomistic Simulations with Piecewise Machine Learned Ab Initio Potentials at Classical Force Field-like Cost

Abstract

Machine learning methods have nowadays become easy-to-use tools for constructing high-dimensional interatomic potentials with ab initio accuracy. Although machine learned interatomic potentials are generally orders of magnitude faster than first-principles calculations, they remain much slower than classical force fields, at the price of using more complex structural descriptors. To bridge this efficiency gap, we propose an embedded atom neural network approach with simple piecewise switching function based descriptors, resulting in a favorable linear scaling with the number of neighbor atoms. Numerical examples validate that this piecewise machine learning model can be over an order of magnitude faster than various popular machine learned potentials with comparable accuracy for both metallic and covalent materials, approaching the speed of the fastest embedded atom method (i.e. several {\mu}s/atom per CPU core). The extreme efficiency of this approach promises its potential in first-principles atomistic simulations of very large systems and/or in long timescale.

Full PDF

11 Accelerating Atomistic Simulations with Piecewise Machine Learned Ab Initio Potentials at Classical Force Field-like Cost

Yaolong Zhang, Ce Hu, and Bin Jiang * Hefei National Laboratory for Physical Science at the Microscale, Key Laboratory of Surface and Interface Chemistry and Energy Catalysis of Anhui Higher Education Institutes, Department of Chemical Physics, University of Science and Technology of China, Hefei, Anhui 230026, China *: corresponding author: [email protected]

Abstract

Machine learning methods have nowadays become easy-to-use tools for constructing high-dimensional interatomic potentials with ab initio accuracy. Although machine learned interatomic potentials are generally orders of magnitude faster than first-principles calculations, they remain much slower than classical force fields, at the price of using more complex structural descriptors. To bridge this efficiency gap, we propose an embedded atom neural network approach with simple piecewise switching function based descriptors, resulting in a favorable linear scaling with the number of neighbor atoms. Numerical examples validate that this piecewise machine learning model can be over an order of magnitude faster than various popular machine learned potentials with comparable accuracy for both metallic and covalent materials, approaching the speed of the fastest embedded atom method ( i.e. several μ s/atom per CPU core). The extreme efficiency of this approach promises its potential in first-principles atomistic simulations of very large systems and/or in long timescale. Keywords: Machine learning, Neural Networks, Molecular Dynamics, Force Field, Material Simulations Introduction

Potential energy surface (PES) is a fundamental ingredient for atomic-scale molecular dynamics (MD) and Monte Carlo (MC) simulations for molecules, chemical reactions, and materials. A desirable PES should be not only sufficiently accurate to reproduce ab initio energies and atomic forces, but also highly efficient to enable large-scale repeated calculations. For macromolecules and materials, where ab initio calculations are generally infeasible, physically motivated empirical or semi-empirical force fields [1-4] , have extensively been used over the years. Despite their limited accuracy, these classical force fields (CFFs) are extremely fast due to their simple analytical forms, which afford MD/MC simulations with millions of atoms and nanosecond timescale [5] . In recent years, machine learning (ML) approaches have emerged in constructing PESs of systems across gas [6-12] and condensed phase [13-18] , commonly referred as machine learned interatomic potentials (MLIPs). In particular, to deal with high-dimensional systems, most MLIPs express the total energy of the system as the sum of atomic energies like CFFs [19] . Instead of relying on physically-derived functions, MLIPs are able to learn the relationship between the atomic local environment and the atomic energy given a set of ab initio data points [19] . Thanks to the more complex and flexible mathematic forms, these MLIPs are intrinsically more accurate than CFFs and able to reproduce ab initio molecular dynamics (AIMD) results at a fraction of cost [20] . Despite these successes, an important fact attracting less attention is that these MLIPs are still much more expensive than CFFs, at the price of the increasing number of parameters. As pointed out in a recent perspective [20] , MLIPs for typical periodic systems have been reported to run per MD step ( i.e. compute atomic forces once) at a speed of about 10 -4 ~10 -2 s/atom on a single CPU core [5, 16, 17, 21, 22] , which are at least two orders of magnitude slower than the fastest embedded atom method (EAM) force field [5, 21] . The computational bottleneck of most MLIPs is the evaluation of structural descriptors that distinguish local atomic configurations [23] . These descriptors in general need to sum over many-body interactions between the central atom and its neighbors so as to preserve the rotational, translational, and permutational symmetry of the system [19] . As a result, the numerical cost of conventional descriptors, for example, the Behler-Parrinello type atom-centered symmetry functions (ACSFs) [13] and their variants [23-25] , often scales at least quadratically with the number of neighbor atoms. Alternative methods have been proposed by several groups based on direct expansion of the atomic energy in various forms in terms of symmetry-invariant many-body basis functions [26] ( e.g. , polynomials [16, 27] or bispectrum components [15] ). These methods rely on standard (mostly linear) least squares optimization of expansion coefficients, invoking neither neural networks (NNs) nor kernel-based regression. Linear scaling with respect to the number of neighbor atoms is achieved by converting the sum over many-body terms into a product of two-body sums [15, 16, 26, 27] . From a different perspective, we have recently derived a linear-scaling ML framework inspired by the EAM concept. This so-called embedded atom neural network (EANN) model [28] combines the implicit description of three-body interactions by two-body terms with the flexibility of NNs. In this work, we replace the original Gaussian-type orbitals with piecewise switching functions in the construction of structural descriptors. This greatly lowers the scaling factor and improves the efficiency of the EANN model. Tests on representative periodic systems demonstrate that the piecewise EANN (PEANN) potentials can be over one order of magnitude faster than various well-established MLIPs with comparable accuracy, and in some cases as fast as several μs/atom/CPU-core that is reachable by simplest CFFs only before. This PEANN model will offer a promising solution to the dilemma of accuracy versus efficiency of MLIPs in very large systems.

Method

We shall first review the original EANN model briefly, which borrows the idea of EAM expressing the atomic energy as a functional of the embedded electron density of the impurity atom, i.e.   ˆ( ) i i E  r [1] . In particular, we replace the scalar density value ˆ( ) i  r with a set of density-like descriptors made of atomic orbitals in the vicinity of embedded atom and the complicated functional with an atomic NN [28] . In this regard, any type of atomic orbitals can be taken as long as they effectively distinguish the local environment. For simplicity, in the original EANN approach [28-31] , we use the Gaussian-type orbital (GTO) in Cartesian space in the following form,   , ˆ( ) ys x zx y z = x y z exp - r - r      r , (1) where ˆ ( , , ) x y z  r represents the position vector of an electron relative to the nucleus, r= ˆ r , α and r s are hyperparameters that determine widths and centers of Gaussian radial functions , l x + l y + l z = L specifies the orbital angular momentum ( L ). To find the electron density of the embedded atom i at location ˆ i r , we evaluate individual electron density contributions by the square of linear combination of these atomic orbitals centered at surrounding atoms with the same L , α and r s , ! ˆ ˆ( ) ( )! ! ! x y z ss x y zx y z l l l L riL r j l l l i j c ijl l l jx y z L c f rl l l              r r , (2) In Eq. (2), the summation in spirit mimics the integral over the embedded wavefunction in terms of atomic orbitals of nearby atoms within a cutoff radius ( r c ), with c j being the corresponding adjustable expansion coefficient of atom j at location ˆ j r which is optimized in the training process and ( ) c ij f r a cutoff function [32] to ensure that the contribution of each neighbor atom decays smoothly to zero at r c , [0.5 0.5 cos( / )] ,( )= 0, ij c ij cc ij ij c r r r rf r r r     , (3) where ˆ ˆ ij i j r   r r is the internuclear distance between atom i and atom j . A key advantage of these density-like descriptors is that they can be formally transformed to a series of angular basis [27] , preserving the invariance of translation, rotation and permutation. In this way, they unify the radial and angular functions as defined in conventional ACSFs with an implicit description of three-body interactions, realizing a linear scaling with respect to the number of neighbors. The remaining cost of computing these descriptors results dominantly from the evaluation of the Gaussian radial function and the cutoff function. In particular, one has to explicitly calculate Gaussian radial function and ( ) c ij f r for any atom j inside the cutoff sphere for a given set of hyperparameters. This becomes actually wasteful when Gaussian radial function and ( ) c ij f r almost vanish ( e.g. , r ij deviates significantly from r s ), thus drastically more expensive with the increasing number of descriptors and cutoff radius. To overcome this shortcoming, we replace the Gaussian radial function with a simple piecewise switching function,

1, 0(2 ) ( )( )= , 0 11 ( )0, 1 in out r r xexp x x expf r xexp x                 , (4) which is characterized by its inner ( r in ) and outer ( r out ) ends with ( ) / ( ) in out in x r r r r    and the damping strength ( α ). The piecewise atomic orbital (PAO) then becomes, , , , ˆ( ) ( ) yin out x zx y z in out lr r l ll l l r r = x y z f r    r , (5) This replacement has several distinct advantages. First, , ( ) in out r r ij f r  is now computed only when the j th atom is at the distance of in ij out r r r   to the central atom i , otherwise its value is simply zero or one. Second, defining out c r r  in the last piecewise function, , ( ) in out r r ij f r  naturally goes to zero at r c . Consequently, the artificial cutoff function ( ) c ij f r is no longer necessary. These features are best illustrated in Fig. 1, which compares the radial distributions of a group of normalized GTOs and PAOs, where the angular part has been factorized out leaving a factor L r (see Supplementary Information for details). As seen in Fig. 1a, Gaussian radial functions are expanded in the full range within r c with their centers shifted incrementally. They need to be explicitly calculated whatever r ij is even if their contributions are negligible. Similar procedure is indeed unavoidable in most MLIPs. In comparison, as displayed in Fig. 1b, each piecewise radial function is a continuous function with its the left half domain given by the factor r L and its right half domain switching from one to zero. Here r L is an auxiliary factor for illustration and transformation to explicit angular basis only, but virtually not computed. As a consequence, each , ( ) in out r r ij f r  function needs to be calculated once provided that r ij falls in one of these intervals, namely [ , ] in out r r , giving rise to significant savings. We note that a similar choice of piecewise cosine functions has been proposed by Huang et al. [33] in their single atom neural network model. However, those cosine functions behave similarly as the GTOs, which need to be explicitly computed within twice the interval of , ( ) in out r r ij f r  here. Singraber et al. also proposed to use an exponential function as a cutoff function in their NN implementation [34] . Results

In spite of a large body of ML approaches for constructing PESs, very limited studies have systematically assessed their relative efficiency when reaching a similar accuracy based on the same benchmark data set and the optimal number of parameters [21, 35-37] . This actually prevents a direct comparison of efficiency among various MLIPs. In the present work, we first take these freely available data reported by Ong and coworkers for Cu, Mo, and Ge systems [21] . These systems span various crystal structures (fcc, bcc, and diamond) and bonding types (metallic and covalent), serving as great tests for the universality of a ML model. For each element, a few hundreds of structures were sampled via high temperature AIMD simulations of different bulk supercells at density functional theory (DFT) level, along with strained structures with varying cell sizes and high-index surface structures. These very diverse structures cover a huge configuration space with distinct atomic local environments, providing stringent challenges for ML models. More importantly, the performance and computational cost of several popular MLIPs and CFFs have been benchmarked in Ref. [21] given the same datasets, offering us useful references. Technical details on training the EANN and PEANN models are given in the Supporting Information (SI). In Table 1, we compare the root-mean-square-errors (RMSEs) in energy and atomic force in the test set of various MLIPs and CFFs (whenever available), as well as the corresponding computational costs per MD step with a single CPU core. Apparently, all MLIPs are more accurate than CFFs, especially with much lower RMSEs in energy. As discussed in Ref. [21] , the Gaussian approximation potential (GAP) [14] and moment tensor potential (MTP) [16] have somewhat lower RMSEs, followed by several the NN-based potentials, and then the spectral neighbor analysis potential (SNAP) [15] and its quadratic version (qSNAP) [38] . Given the fact that those limited data points are diversely distributed, it is understandable that the kernel-based GAP and the invariant polynomial based MTP methods (like an atom centered local version of the permutationally invariant polynomial method [10] ) are advantageous, which typically require fewer points than NN-based methods to converge the PES to reasonable level of accuracy [12] . NN-based and SNAP models can improve their accuracy with increasing number of training data [21] ( e.g. , see also Ref. [28] for much better performance of EANN for bulk systems with more data available). But overall, their accuracy are comparable and more or less converged with the number of basis functions/polynomials/kernels, allowing a fair comparison of their cost [21] . We shall note that the evaluation time of the same potential is dependent on the processor used. These numbers reported in this work were based on the implementation with a 28-core processor, Intel(R) Xeon 6132 2.60GHz, while those taken from other work have been scaled by an estimated factor for comparison whenever necessary. As seen in Table 1, it is clear that these GAPs run most slowly on the order of several milliseconds/atom/core, due apparently to the local interpolation nature of this kernel-based method. Other ML methods based on global fitting, including MTP, SNAP/qSNAP, and Behler-Parrinello NN (BPNN) models, are typically dozens of times faster than GAPs, at the speed of a few 10 -5 ~10 -4 s/atom/core. Compared to these methods, the original EANN model speeds up by nearly one order of magnitude, reaching 6.5~12.0 μs/atom/core for the three systems. The PEANN model further lowers the cost to an unprecedented level, e.g. , ~4.9 μs/atom/core for bulk Cu, which is merely about twice that of the simplest EAM force field [39] , yet one twelfths of that of the modified EAM potential [40] . This is quite encouraging as these EAM models [39-41] implemented in LAMMPS are highly optimized with the extensive use of tabulated values [5] , while our PEANN potentials are so far implemented with an in-house Fortran MD code [42] . For Ge, where a Tersoff potential [43] has to be used instead of EAM due to its covalent bonding nature, the PEANN model is again only ~3 times more expensive (7.6 μs/atom/core). It is important to emphasize that our PEANN model, like other MLIPs, is equally suitable for both metallic and covalent systems and more accurate than corresponding CFFs. It is worth noting that Mueller and coworkers recently constructed a new type of MLIP for copper based on symbolic regression that simultaneously optimizes some simple functional forms and their parameters using genetic algorithm [44] . Given its very simple expression, it achieved a comparable speed as the EAM potential for copper. Whereas it remains unclear about its universality to describe more directional bonding in molecules and/or reactive systems as other MLIPs. The improved efficiency of PEANN over EANN arises solely from the decreasing cost of evaluating structural descriptors. In Table S1, we count the individual time for computing all internuclear distances (namely the first necessary step in any MLIP or CFF), constructing density-like descriptors based on internuclear distances (the second step), as well as evaluating neural networks (the third step), respectively. Taking Cu as an example, the consuming time for descriptors is reduced from 3.5 in EANN to 1.9 μs/atom/core in PEANN. Put another way, replacing the same number of GTOs with PAOs virtually speeds up by roughly a factor of two, because almost half of PAOs are not explicitly computed as seen in Fig. 1. This actually represents a significant improvement on the basis of the sufficiently efficient EANN model. Nevertheless, such an improvement does not significantly reduce the total cost of the PEANN potential of Cu, as computing internuclear distances and NNs is now as expensive as computing structural descriptors. However, once we artificially raise the number of descriptors or the cutoff radius (equivalently the average number of neighbor atoms), as illustrated in Fig. 2, the computational costs of the PEANN and EANN potentials for Cu both increase linearly, while the former becomes increasingly more efficient than latter. These results not only prove the linear scaling behavior of both models, but also suggest the superiority of the PEANN model in more complex systems that require a larger cutoff radius and a larger number of descriptors. To validate this point and show the generalizability of the PEANN model, we construct EANN and PEANN potentials for bulk water, which is an important condensed phase system that has been extensively studied with many ML methods [17, 35, 45-50] . Again, for our purpose, we choose to fit the ab initio data reported by Cheng et al. [49] , consisting of energies and forces for 1,593 diverse structures of 64 molecules of liquid water at the revPBE0-D3 level [51, 52] . These data have been used in that work to construct a public BPNN potential [13] , allowing us to make direct comparison. The BPNN potential gives the test RMSEs of 2.3 meV/atom (energy) and 120 meV/Å (force) [49] , which are comparable to the values of 2.1 meV/atom (energy) and 129meV/Å (force) given by the EANN model, and 2.3 meV/atom (energy) and 131 meV/Å (force) by the PEANN model. Fig. 3 compares the O-O, O-H, and H-H radial distribution functions (RDFs) obtained via classical MD simulations using these NN potentials. The excellent agreement suggests that these models achieve comparable level of accuracy so that their computational costs can be reasonably compared in Fig. 4 as a function of the number of atoms. As expected, all MLIPs scale linearly with respect to the number of atoms, which is more favorable than the cubic scaling of DFT calculations. Specifically, the cost of the BPNN potential is ~4.1×10 -4 s/atom/core, which is roughly five times that of the EANN model (~7.8×10 -5 s/atom/core) and 12 times that of the PEANN model (~3.4×10 -5 s/atom/core). For reference, we find that available data about the costs of other NN potentials [17, 34, 53] of water trained with different data sets but with a similar cutoff radius ( i.e. r c ≈6 Å, which largely determines the efficiency of a MLIP) are on the order of 10 -4 ~10 -3 s/atom/core, no better than the BPNN potential of Cheng et al. [49] Also compared in Figs. 3 and 4 is the well-known TIP4P force field [54] which is based on physically motivated functions to represent the interactions of rigid water molecules. Given the simplistic form of the TIP4P force field, it is not surprising that this semi-empirical model runs more efficiently than all ab initio MLIPs, reaching ~1.0×10 -5 s/atom/core. Indeed, the more anisotropic covalent and hydrogen bonding in liquid water demands more complex descriptors and longer cutoff radii than those used in metallic systems, resulting in higher computational costs of MLIPs in general. However, these MLIPs are fully dissociable with faithful representation of diverse DFT data including vdW corrections, among which the PEANN model is outstanding in the trade-off between efficiency and accuracy. Although this piecewise strategy allows us to extend the local environment to some extent, the long-range Coulomb interactions are still difficult to be described effectively, which depend on monomer properties such as dipole moment and polarizability [50] and deserve further exploration. Discussion and conclusion

To summarize, in this work, we demonstrate the high accuracy and scalability, universality, and most importantly the extremely low computational cost of the newly proposed PEANN model. The superior efficiency of the PEANN model results from the linear scaling with the number of neighbor atoms and the piecewise switching function based descriptors that are not necessarily computed for each neighbor atom. As illustrated in several benchmark systems, the PEANN model greatly closes the efficiency gap between typical MLIPs and physically motivated CFFs by at least an order of magnitude for both metallic and covalent systems. Impressively, the PEANN potentials for single component systems reach the speed of several μs/atom/core per step, as the EAM-type CFFs, meanwhile retain the ab initio accuracy as conventional MLIPs. Further acceleration of the PEANN model may be achieved by linking with GPU-based deep neural network packages and an optimal implementation with sophisticated MD codes like LAMMPS , which would enable classical and/or path integral MD/MC simulations in complex systems to long timescale and to explore rare events. Work along these directions are in progress in our group. Conflict of interest:

The authors declare that they have no conflict of interest.

Acknowledgements:

This work was supported by National Key R&D Program of China (2017YFA0303500), National Natural Science Foundation of China (91645202, 21722306), and Anhui Initiative in Quantum Information Technologies (AHY090200). We appreciate the Supercomputing Center of USTC and AM-HPC for high-performance computing services. We thank Prof. Shyue Ping Ong and Yunxing Zuo for sending us extra data to help test our code.

Data availability:

The EANN and PEANN potentials reported in this work are freely available at https://github.com/zylustc/Piesewise-embedded-atom-neural-networks- potential. Data points used in this work for Cu, Ge, Mo, and water systems should be found from original publications, namely Ref. [21] and Ref. [49] . Author contributions:

B.J. conceived this project, Y.Z. developed the machine learning codes and performed efficiency tests, H.C. performed molecular dynamics simulations. B.J. conducted writing the manuscript. All authors read and revised the manuscript. References [1] Daw MS, Baskes MI. Embedded-atom method: Derivation and application to impurities, surfaces, and other defects in metals. Phys Rev B, 1984, 29: 6443-6453 [2] Tersoff J. New empirical approach for the structure and energy of covalent systems. Phys Rev B, 1988, 37: 6991-7000 [3] Brenner DW. Empirical potential for hydrocarbons for use in simulating the chemical vapor deposition of diamond films. Phys Rev B, 1990, 42: 9458-9471 [4] van Duin ACT, Dasgupta S, Lorant F, et al. Reaxff: A reactive force field for hydrocarbons. J Phys Chem A, 2001, 105: 9396-9409 [5] Plimpton SJ, Thompson AP. Computational aspects of many-body potentials. Mrs Bull, 2012, 37: 513-521 [6] Manzhos S, Dawes R, Carrington Jr. T. Neural network-based approaches for building high dimensional and quantum dynamics-friendly potential energy surfaces. Int J Quant Chem, 2015, 115: 1012–1020 [7] Jiang B, Li J, Guo H. Potential energy surfaces from high fidelity fitting of ab initio points: The permutation invariant polynomial-neural network approach. Int Rev Phys Chem, 2016, 35: 479-506 [8] Shao K, Chen J, Zhao Z, et al. Communication: Fitting potential energy surfaces with fundamental invariant neural network. J Chem Phys, 2016, 145: 071101 [9] Majumder M, Ndengué AS, Dawes R. Automated construction of potential energy surfaces. Mol Phys, 2016, 114: 1-18 [10] Qu C, Yu Q, Bowman JM. Permutationally invariant potential energy surfaces. Annu Rev Phys Chem, 2018, 69: 151-175 [11] Krems RV. Bayesian machine learning for quantum molecular dynamics. Phys Chem Chem Phys, 2019, 21: 13392-13410 [12] Jiang B, Li J, Guo H. High-fidelity potential energy surfaces for gas-phase and gas–surface scattering processes from machine learning. J Phys Chem Lett, 2020, 11: 5120-5131 [13] Behler J, Parrinello M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys Rev Lett, 2007, 98: 146401 [14] Bartók AP, Payne MC, Kondor R, et al. Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons. Phys Rev Lett, 2010, 104: 136403 [15] Thompson AP, Swiler LP, Trott CR, et al. Spectral neighbor analysis method for automated generation of quantum-accurate interatomic potentials. J Comput Phys, 2015, 285: 316-330 [16] Shapeev AV. Moment tensor potentials: A class of systematically improvable interatomic potentials. Multiscale Model Sim, 2016, 14: 1153-1173 [17] Zhang L, Han J, Wang H, et al. Deep potential molecular dynamics: A scalable model with the accuracy of quantum mechanics. Phys Rev Lett, 2018, 120: 143001 [18] Schütt KT, Sauceda HE, Kindermans P-J, et al. Schnet – a deep learning architecture for molecules and materials. J Chem Phys, 2018, 148: 241722 [19] Behler J. Perspective: Machine learning potentials for atomistic simulations. J Chem Phys, 2016, 145: 170901 [20] Mueller T, Hernandez A, Wang C. Machine learning for interatomic potential models. J Chem Phys, 2020, 152: 0509027 [21] Zuo Y, Chen C, Li X, et al. Performance and cost assessment of machine learning interatomic potentials. J Phys Chem A, 2020, 124: 731-745 [22] Westermayr J, Gastegger M, Marquetand P. Combining schnet and sharc: The schnarc machine learning approach for excited-state dynamics. J Phys Chem Lett, 2020, 11: 3828-3834 [23] Gastegger M, Schwiedrzik L, Bittermann M, et al. Wacsf—weighted atom-centered symmetry functions as descriptors in machine learning potentials. J Chem Phys, 2018, 148: 241709 [24] Smith JS, Isayev O, Roitberg AE. Ani-1: An extensible neural network potential with dft accuracy at force field computational cost. Chem Sci, 2017, 8: 3192-3203 [25] Huang SD, Shang C, Kang PL, et al. Atomic structure of boron resolved using machine learning and global sampling. Chem Sci, 2018, 9: 8644-8655 [26] Drautz R. Atomic cluster expansion for accurate and transferable interatomic potentials. Phys Rev B, 2019, 99: [27] Takahashi A, Seko A, Tanaka I. Conceptual and practical bases for the high accuracy of machine learning interatomic potentials: Application to elemental titanium. Phys Rev Materials, 2017, 1: 063801 [28] Zhang Y, Hu C, Jiang B. Embedded atom neural network potentials: Efficient and accurate machine learning with a physically inspired representation. J Phys Chem Lett, 2019, 10: 4962-4967 [29] Zhu L, Zhang Y, Zhang L, et al. Unified and transferable description of dynamics for h2 dissociative adsorption on multiple copper surfaces via machine learning. Phys Chem Chem Phys, 2020, 22: 13958-13964 [30] Zhang Y, Maurer RJ, Jiang B. Symmetry-adapted high dimensional neural network representation of electronic friction tensor of adsorbates on metals. J Phys Chem C, 2020, 124: 186-195 [31] Chen W-K, Zhang Y, Jiang B, et al. Efficient construction of excited-state hessian matrices with machine learning accelerated multilayer energy-based fragment method. J Phys Chem A, 2020, 124: 5684-5695 [32] Behler J. Atom-centered symmetry functions for constructing high-dimensional neural network potentials. J Chem Phys, 2011, 134: 074106 [33] Huang Y, Kang J, Goddard WA, et al. Density functional theory based neural network force fields from energy decompositions. Phys Rev B, 2019, 99: 064103 [34] Singraber A, Behler J, Dellago C. Library-based lammps implementation of high-dimensional neural network potentials. J Chem Theory Comput, 2019, 15: 1827-1840 [35] Nguyen TT, Székely E, Imbalzano G, et al. Comparison of permutationally invariant polynomials, neural networks, and gaussian approximation potentials in representing water interactions through many-body expansions. J Chem Phys, 2018, 148: 241725 [36] Li J, Song K, Behler J. A critical comparison of neural network potentials for molecular reaction dynamics with exact permutation symmetry. Phys Chem Chem Phys, 2019, 21: 9672-9682 [37] Kamath A, Vargas-Hernández RA, Krems RV, et al. Neural networks vs gaussian process regression for representing potential energy surfaces: A comparative study of fit quality and vibrational spectrum accuracy. J Chem Phys, 2018, 148: 241702 [38] Wood MA, Thompson AP. Extending the accuracy of the snap interatomic potential form. J Chem Phys, 2018, 148: 241721 [39] Zhou XW, Johnson RA, Wadley HNG. Misfit-energy-increasing dislocations in vapor-deposited cofe/nife multilayers. Phys Rev B, 2004, 69: 144113 [40] Asadi E, Asle Zaeem M, Nouranian S, et al. Two-phase solid–liquid coexistence of ni, cu, and al by molecular dynamics simulations using the modified embedded-atom method. Acta Mater, 2015, 86: 169-181 [41] Park H, Fellinger MR, Lenosky TJ, et al. Ab initio based empirical potential used to study the mechanical properties of molybdenum. Phys Rev B, 2012, 85: 214121 [42] Hu X, Hase WL, Pirraglia T. Vectorization of the general monte carlo classical trajectory program venus. J Comput Chem, 1991, 12: 1014-1024 [43] Lenosky TJ, Sadigh B, Alonso E, et al. Highly optimized empirical potential model of silicon. Modelling and Simulation in Materials Science and Engineering, 2000, 8: 825-841 [44] Hernandez A, Balasubramanian A, Yuan F, et al. Fast, accurate, and transferable many-body interatomic potentials by symbolic regression. npj Computational Materials, 2019, 5: 112 [45] Wang Y, Huang X, Shepler BC, et al. Flexible, ab initio potential, and dipole moment surfaces for water. I. Tests and applications for clusters up to the 22-mer. J Chem Phys, 2011, 134: 094509 [46] Bartók AP, Gillan MJ, Manby FR, et al. Machine-learning approach for one- and two-body corrections to density functional theory: Applications to molecular and condensed water. Phys Rev B, 2013, 88: 054104 [47] Medders GR, Babin V, Paesani F. Development of a “first-principles” water potential with flexible monomers. Iii. Liquid phase properties. J Chem Theory Comput, 2014, 10: 2906-2910 [48] Morawietz T, Singraber A, Dellago C, et al. How van der waals interactions determine the unique properties of water. Proceedings of the National Academy of Sciences, 2016, 113: 8368-8373 [49] Cheng B, Engel EA, Behler J, et al. Ab initio thermodynamics of liquid and solid water. Proceedings of the National Academy of Sciences, 2019, 116: 1110-1115 [50] Cisneros GA, Wikfeldt KT, Ojamäe L, et al. Modeling molecular interactions in water: From pairwise to many-body potential energy functions. Chem Rev, 2016, 116: 7501-7528 [51] Hammer B, Hansen LB, Nørskov JK. Improved adsorption energetics within density functional theory using revised perdew-burke-ernzerhof functionals. Phys Rev B, 1999, 59: 7413-7421 [52] Grimme S, Antony J, Ehrlich S, et al. A consistent and accurate ab initio parametrization of density functional dispersion correction (dft-d) for the 94 elements h-pu. J Chem Phys, 2010, 132: 154104 [53] Lu D, Wang H, Chen M, et al. 86 pflops deep potential molecular dynamics simulation of 100 million atoms with ab initio accuracy. 2020, arXiv:2004.11658 https://ui.adsabs.harvard.edu/abs/2020arXiv200411658L [54] Clark T, Chandrasekhar J, Spitznagel GW, et al. Efficient diffuse function-augumented basis-sets for anion calculations. 3. The 3-21+g basis set for 1st row elements, li-f. J Comp Chem, 1983, 4: 294-301 Figure 1. Radial functions extracted from structural descriptors of EANN and PEANN models of Cu. (a) EANN with five evenly distributed r s between 0~3.8 Å and (b) PEANN with five evenly distributed r in and r out grids within 0.5~2.9 Å and 1.7~4.1 Å, respectively. Figure 2. Computational cost of EANN and PEANN with respect to (a) the number of descriptors and (b) the number of neighboring atoms (corresponding r c shown on the top), respectively. Figure 3. (a) O-O, (b) O-H and (c) H-H RDFs of liquid water obtained by MD simulation using BPNN, EANN, PEANN, and TIP4P potentials at 300K. Figure 4. Computational cost per MD step with respect to the number of atoms in bulk water systems calculated with BPNN, EANN, PEANN and TIP4P models, based on the Intel(R) Xeon 6132 2.60GHz processor. Table 1: Comparison of test RMSEs and computational costs per atom on a single processing core (μs/atom/core) of several MLIPs and CFFs for representative systems. RMSEs in energies and atomic forces (in parentheses) are in meV and meV/Å. System Cu Ge Mo RMSE Time RMSE Time RMSE Time PEANN 0.7 (32) 4.9 3.3 (105) 7.6 4.2 (246) 8.7 EANN 0.6 (26) 6.5 3.0 (97) 11.6 4.5 (245) 12.0 GAP 0.6 (20) 5349.4 * * * MTP 0.5 (10) 120.0 * * * BPNN 1.7 (60) 80.8 * * * SNAP 0.9 (80) 269.1 * * * QSNAP 1.2 (50) 94.2 * * * EAM 7.5 (120) 2.2 \ \ 68.0 (520) 1.6 Tersoff \ \ 550.4 (1360) 2.4 \ \ MEAM 10.5 (240) 61.3 \ 36.4 (220) 46.2 *These values were taken from Ref. [21] multiplied by a factor ~0.8 to account for the different intrinsic efficiency of Intel(R) Xeon 6132 2.60GHz (this work) versus Intel i7-6850k 3.6 GHz (in Ref. [21] ) processors. The scaling factor is roughly estimated by the average time ratio of computing EAM, MEAM, and SANP potentials using both processors and its value would not alter the main conclusion here. Supplementary Information Accelerating Atomistic Simulations with Piecewise Machine Learned Ab Initio Potentials at Classical Force Field-like Cost

In the embedded atom neural network (EANN) approach and its piecewise version (PEANN), the weights and biases of atomic NNs along with the atomic expansion coefficients were determined by minimizing the cost function defined by root-mean-errors between ab initio potential energies and atomic forces with respect to Cartesian coordinates and corresponding NN outputs , ( ) [( ) ] / data N NN Ref NN Refi i i i datai

S E E N       w F F . (S6) Here, w is a collection of all adjustable parameters, N data is the number of configurations in the training set. NNi E , Refi E , NNi F and Refi F are potential energies and atomic force vectors of i th configuration obtained by NN and reference ab initio calculations, respectively. Note that each NNi E and NNi F are the sum of atomic NN outputs and for the atomic NN parameters are identical for same element. An efficient hybrid extreme machine learning Levenberg-Marquardt (ELM-LM) algorithm was employed to optimize these adjustable parameters . For all systems discussed in this work, the NN structures consist of two hidden layers. To make a fair comparison, the number of neurons in each hidden layer, the number of descriptors, as well as the cutoff radius ( r c ) were all kept the same for in EANN and PEANN potentials of each system. Table S1 gives such information for the four benchmark condensed phase systems, namely Cu, Ge, Mo, and water. Also compared in Table S1 are the computational costs of evaluating individually the inter-nuclear distances within r c , the density-like structural descriptors, and NNs (plus the rest minor contributions), respectively. Other hyperparameters to determine the density- like descriptors are listed in Tables S1-S5 for PEANN and Tables S6-S9 for EANN, for Cu, Ge, Mo, and water in sequence. Note that for the water system, we have applied the CUR matrix decomposition algorithm to select the optimal descriptors that best represent the training set, as used by Ceriotti and coworkers to optimize the selection of Behler-Parrinello type atom centered symmetry functions . II. Molecular dynamics simulations of liquid water

To validate the accuracy of our PEANN and EANN potentials for liquid water, we compare the O-O, O-H and H-H radial distribution functions (RDFs) of liquid water obtained by the classical molecular dynamics (MD) simulations using the TIP4P model in Ref. 6, Behler-Parrinello NN (BPNN) potential in Ref. 7, EANN and PEANN potentials in this work, respectively. MD simulations have been performed with 64 water molecules in a cubic box with its side length of 12.42 Å at temperature of 300 K. A total of 20 ps NVT MD simulations with a time step of 0.2 fs. The Andersen thermostat was used for keeping the temperature in the simulations of the PEANN/EANN potentials in a modified VENUS code , while the Nose-Hoover Chains algorithm was used for the TIP4P and BPNN models implemented with LAMMPS , respectively. It is found that the TIP4P force field requires a longer cutoff radius and long-range corrections to yield the correct description of RDFs as shown in Fig. 4, and our setup thus follows the original publication where the short-range interactions were truncated at 7.75 Å.7

1. Zhang, Y., Hu, C. & Jiang, B. Embedded atom neural network potentials: Efficient and accurate machine learning with a physically inspired representation.

J. Phys. Chem. Lett. , 4962-4967 (2019). 2. Behler, J. Constructing high-dimensional neural network potentials: A tutorial review. Int. J. Quant. Chem. , 1032-1050 (2015). 3. Zhang, Y.-l., Zhou, X.-y. & Jiang, B. Accelerating the construction of neural network potential energy surfaces: A fast hybrid training algorithm.

Chin. J. Chem. Phys. , 727-734 (2017). 4. Mahoney, M. W. & Drineas, P. CUR matrix decompositions for improved data analysis. Proceedings of the National Academy of Sciences , 697-702 (2009). 5. Imbalzano, G., Anelli, A., Giofré, D., Klees, S., Behler, J. & Ceriotti, M. Automatic selection of atomic fingerprints and reference configurations for machine-learning potentials.

J. Chem. Phys. , 241730 (2018). 6. Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W. & Klein, M. L. Comparison of simple potential functions for simulating liquid water.

J. Chem. Phys. , 926-935 (1983). 7. Cheng, B., Engel, E. A., Behler, J., Dellago, C. & Ceriotti, M. Ab initio thermodynamics of liquid and solid water. Proceedings of the National Academy of Sciences , 1110-1115 (2019). 8. Andersen, H. C. Molecular dynamics simulations at constant pressure and/or temperature.

J. Chem. Phys. , 2384-2393 (1980). 9. Hu, X., Hase, W. L. & Pirraglia, T. Vectorization of the general Monte Carlo classical trajectory program VENUS. J. Comput. Chem. , 1014-1024 (1991). 10. Martyna, G. J., Tobias, D. J. & Klein, M. L. Constant pressure molecular dynamics algorithms. J. Chem. Phys. , 4177-4189 (1994). 11. Plimpton, S. Fast parallel algorithms for short-range molecular dynamics.

J. Comput. Phys. , 1-19 (1995). Table S1: NN structures denoted by the number of neurons in the input (descriptors), hidden, and output layers, the cutoff radii, as well as individual computational costs (μs/atom/CPU-core/step) of evaluating individually the inter-nuclear distances within r c , the density-like structural descriptors, and NNs (plus the rest minor contributions), respectively. NN structures and cutoff radii are identical for EANN and PEANN. System NN structure r c (Å) Individual costs: Distances/Descriptors/Others PEANN EANN Cu 10×10×10×1 4.1 1.2/1.9/1.8 1.2/3.5/1.8 Ge 15×15×15×1 5.0 1.2/3.8/2.6 1.3/7.8/2.5 Mo 16×16×16×1 5.0 1.7/4.2/2.8 1.6/7.9/2.5 H O 33×20×20×1 6.3 3.3/23.9/6.5 3.3/66.3/8.0 Table S2: Hyperparameters of the piecewise descriptors of the PEANN Cu potential. Numbering L max r in (Å) r out (Å) α 1 0/1 0.50 1.70 12.83 2 0/1 1.10 2.30 6.04 3 0/1 1.70 2.90 2.85 4 0/1 2.30 3.50 1.35 5 0/1 2.90 4.10 0.63 Table S3: Hyperparameters of the piecewise descriptors of the PEANN Ge potential. Numbering L max r in (Å) r out (Å) α 1 0/1/2 1.00 2.09 9.35 2 0/1/2 1.73 2.82 3.76 3 0/1/2 2.45 3.54 1.51 4 0/1/2 3.18 4.27 0.61 5 0/1/2 3.91 5.00 0.24 Table S4: Hyperparameters of the piecewise descriptors of the PEANN Mo potential. Numbering L max r in (Å) r out (Å) α 1 0/1 1.50 2.00 1.60 2 0/1 0.43 2.43 7.49 3 0/1 0.86 2.86 3.66 4 0/1 1.29 3.29 0.62 5 0/1 1.71 3.71 0.19 6 0/1 3.04 4.14 0.02 7 0/1 3.17 4.57 0.02 8 0/1 3.60 5.00 0.01 Table S5: Hyperparameters of the piecewise descriptors of the PEANN bulk water potential. Numbering Central atom L max r in (Å) r out (Å) α

1 O 0/1/2 -0.10 1.10 12.97 2 O 0/1/2 0.43 1.53 4.45 3 O 0/1/2 0.06 2.06 1.92 4 O 0/1/2 0.89 2.59 2.97 5 O 0/1/2 1.72 3.12 1.28×10 -1

6 O 0/1/2 1.95 3.65 1.13×10 -1

7 O 0/1/2 3.78 4.18 2.96×10 -2

8 O 0/1/2 3.31 4.71 1.42×10 -2

9 O 0/1/2 3.84 5.24 6.83×10 -3

10 O 0/1/2 4.37 5.77 3.27×10 -3

11 O 0/1/2 4.90 6.30 1.57×10 -3

1 H 0/1/2 -0.40 1.00 12.12 2 H 0/1/2 -0.47 1.57 3.84 3 H 0/1/2 0.06 2.06 1.84 4 H 0/1/2 0.59 2.59 1.16 5 H 0/1/2 1.72 3.12 1.29×10 -1

6 H 0/1/2 2.25 3.65 6.19×10 -2

7 H 0/1/2 2.78 4.18 2.97×10 -2

8 H 0/1/2 3.31 4.71 1.42×10 -2

9 H 0/1/2 3.84 5.24 6.83×10 -3

10 H 0/1/2 4.37 5.77 3.27×10 -3

11 H 0/1/2 4.90 6.30 1.57×10 -3 Table S6: Hyperparameters of the descriptors of the EANN Cu potential. Numbering L max r s (Å) α (Å -2 ) 1 0/1 0.00 0.22 2 0/1 0.95 0.22 3 0/1 1.90 0.22 4 0/1 2.85 0.22 5 0/1 3.80 0.22 Table S7: Hyperparameters of the descriptors of the EANN Ge potential. Numbering L max r s (Å) α (Å -2 ) 1 0/1/2 0.00 0.15 2 0/1/2 1.15 0.15 3 0/1/2 2.30 0.15 4 0/1/2 3.45 0.15 5 0/1/2 4.60 0.15 Table S8: Hyperparameters of the descriptors of the EANN Mo potential. Numbering L max r s (Å) α (Å -2 ) 1 0/1 0.00 0.43 2 0/1 0.68 0.43 3 0/1 1.36 0.43 4 0/1 2.04 0.43 5 0/1 2.72 0.43 6 0/1 3.40 0.43 7 0/1 4.08 0.43 8 0/1 4.76 0.43 Table S9: Hyperparameters of the descriptors of the EANN bulk water potential. Numbering Central atom L max r s (Å) α (Å -2 ) 1 O 0/1/2 0.00 0.54 2 O 0/1/2 0.61 0.54 3 O 0/1/2 1.22 0.54 4 O 0/1/2 1.83 0.54 5 O 0/1/2 2.44 0.54 6 O 0/1/2 3.05 0.54 7 O 0/1/2 3.66 0.54 8 O 0/1/2 4.27 0.54 9 O 0/1/2 4.88 0.54 10 O 0/1/2 5.49 0.54 11 O 0/1/2 6.10 0.54 1 H 0/1/2 0.00 0.54 2 H 0/1/2 0.61 0.54 3 H 0/1/2 1.22 0.54 4 H 0/1/2 1.83 0.54 5 H 0/1/2 2.44 0.54 6 H 0/1/2 3.05 0.54 7 H 0/1/2 3.66 0.54 8 H 0/1/2 4.27 0.54 9 H 0/1/2 4.88 0.54 10 H 0/1/2 5.49 0.54 11 H 0/1/2 6.10 0.547