Bayesian Force Fields from Active Learning for Simulation of Inter-Dimensional Transformation of Stanene
Yu Xie, Jonathan Vandermause, Lixin Sun, Andrea Cepellotti, Boris Kozinsky
John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA
Robert Bosch LLC, Research and Technology Center, Cambridge, MA 02142, USA
Abstract
We present a way to dramatically accelerate Gaussian process models for interatomic force fields based on many-body kernels by mapping both forces and uncertainties onto functions of low-dimensional features. This allows for automated active learning of models combining near-quantum accuracy, built-in uncertainty, and constant cost of evaluation that is comparable to classical analytical models, capable of simulating millions of atoms. Using this approach, we perform large-scale molecular dynamics simulations of the stability of the stanene monolayer. We discover an unusual phase transformation mechanism of 2D stanene, where ripples lead to nucleation of bilayer defects, densification into a disordered multilayer structure, followed by formation of bulk liquid at high temperature or nucleation and growth of the 3D bcc crystal at low temperature. The presented method opens possibilities for rapid development of fast, accurate, uncertainty-aware models for simulating long-time, large-scale dynamics of complex materials.

I. INTRODUCTION

Density functional theory (DFT) is one of the most successful methods for simulating condensed matter, thanks to its reasonable accuracy for a wide range of systems. Ab-initio molecular dynamics (AIMD) offers a way to simulate atomic motion using forces computed at the DFT level. Unfortunately, computational requirements limit the time scale and size of AIMD simulations to a few hundred atoms for a few hundred picoseconds, precluding investigation of phase transitions and heterogeneous reactions. Such large-scale molecular dynamics (MD) simulations must resort to empirically derived analytical interatomic force fields with fixed functional form [6, 16, 30, 49], trading accuracy and transferability for larger length and time scales.
Classical analytical force fields often do not match the accuracy of ab-initio results, limiting simulations to describing results qualitatively at best or, at worst, deviating from the correct behavior. In order to broaden the reach of computational simulations, it would be desirable to compute forces with ab-initio accuracy at the same cost as classical interatomic force fields.

In recent years, machine learning (ML) algorithms have emerged as powerful tools for regression and classification problems. This interest has inspired several works developing ML algorithms for interatomic force fields, for example neural networks (NN) [7–11, 32, 44], MTP [24, 38, 43], FLARE [50], GAP [2–5], SNAP [48], SchNet [40–42], and DeePMD [23, 51, 55], among others. All of these machine learning potentials can make predictions at near ab-initio accuracy, while greatly reducing the computational cost compared to DFT.

However, most machine learning models have no predictive distribution to provide uncertainty for energy/force predictions. Without uncertainty, the training data is generally selected from ab-initio calculations via a manual or random scheme. Determining the reliability of the force fields then becomes difficult, which can result in untrustworthy configurations in the molecular dynamics simulations. Therefore, uncertainty quantification is a highly desirable capability [34, 36]. Neural network potentials, e.g. ANI [44, 45], can use ensemble disagreement as an uncertainty measure. This statistical approach is, however, not guaranteed to yield reliable, calibrated uncertainty. In addition, NN approaches usually require tens of thousands of data points for training, and are a few orders of magnitude slower than analytical force fields.
Bayesian models are promising for uncertainty quantification in atomistic simulations since they have an internal principled uncertainty quantification mechanism, the variance of the posterior predictive distribution, which can be used to keep track of the error on forces during a molecular dynamics (MD) run. For instance, Jinnouchi et al. [26, 27] used high-dimensional SOAP descriptors with Bayesian linear regression [4]. Gaussian process (GP) regression [4, 5, 20–22] is a Bayesian method that has been shown to learn accurate forces with relatively small training data sets. Bartok et al. [3] used GP uncertainty with the GAP/SOAP framework to obtain only qualitative estimates of the force field accuracy. Recently, Vandermause et al. [50] demonstrated a GP-based Bayesian active learning scheme in the FLARE framework, utilizing rigorously principled and quantitatively calibrated uncertainty, applying it to a variety of multi-component materials.

In the most common form, however, GP models require using the whole training data set for prediction, meaning that the computational cost of prediction grows linearly with the size of the training set, and so accuracy increases together with computational cost. In complex multi-component structures, O(10 ) ∼ O(10 ) training data points are typically required to construct an accurate GP model, which makes predictions slow. To address this issue, Glielmo et al. [20–22] noted that, for a suitable choice of the n-body kernel function, it is possible to decompose GP force/energy prediction into a summation of low-dimensional functions. As a result, one can construct a parametric mapping of the n-body kernel function combined with a fixed training set.
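The mapping idea can be illustrated with a minimal 1D sketch, using a toy kernel, invented data, and a generic cubic spline rather than the actual MGP implementation: the GP posterior mean is tabulated once on a grid, after which each prediction is a constant-cost spline lookup instead of an O(N_T) kernel sum.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Toy 1D sketch of mapping a GP posterior mean onto a spline.
# The kernel, data, and grid sizes here are invented for illustration.

def k(a, b, ell=0.5):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell**2))

x_train = np.linspace(2.0, 6.0, 50)        # stand-in "interatomic distances"
y_train = np.sin(x_train)                  # stand-in training labels
K = k(x_train, x_train) + 1e-6 * np.eye(50)
eta = np.linalg.solve(K, y_train)          # GP weight vector

grid = np.linspace(2.0, 6.0, 64)           # mapping grid
mean_on_grid = k(grid, x_train) @ eta      # one-time cost, linear in N_T
spline = CubicSpline(grid, mean_on_grid)   # constant-cost surrogate model

r = np.linspace(2.1, 5.9, 1000)
exact = k(r, x_train) @ eta                # full GP mean, O(N_T) per point
err = np.abs(spline(r) - exact).max()      # spline reproduces it closely
```

Evaluating `spline(r)` no longer touches the training set at all, which is the sense in which the mapped model has constant cost per prediction.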
Such a mapping reaches constant scaling, which increases the speed of the GP model without accuracy loss.

In order to accelerate force prediction as well as uncertainty quantification, enabling on-the-fly training for GP force fields, it is necessary to also map the uncertainty onto low-dimensional functions. However, mapping the prediction of uncertainty was not attempted in [20–22], since the mathematical form of the uncertainty results in a decomposition with twice the dimension of the corresponding mean functions, dramatically increasing the computational cost of its evaluation with spline interpolations. Thus, to date there is no method that combines high accuracy, modest training data requirements, and fast prediction of both forces and their principled Bayesian uncertainty.

To address these issues, here we develop the Mapped Gaussian Process (MGP) method, which implements an essentially lossless mapping of both force (mean) and uncertainty (variance) onto low-dimensional functions of atomic positions. Using dimensionality reduction techniques, we increase the efficiency of the MGP to make it applicable to large-scale (million-atom) simulations, achieving speeds comparable to classical analytical force fields, several orders of magnitude faster than available NN or full GP approaches. Quantified principled uncertainty allows us to implement a Bayesian active learning workflow, which automates and accelerates the training process and increases the efficiency of phase-space exploration. As a result, the MGP method benefits from the capability of quantifying its uncertainty, while at the same time retaining cost independent of training size. To illustrate the benefits of this method, we incorporated the MGP force field into the parallel MD simulation code LAMMPS [37], and applied it to simulations of large-scale phase transformation dynamics.

As a test application, we focus on stanene, a 2D material that has recently gathered attention as a quantum spin Hall insulator [47].
Moreover, the only published force field for stanene is, to the best of our knowledge, the bond-order potential by Cherukara et al. [17], which is fitted to capture stanene's low-temperature crystalline characteristics but, as is common to many empirical interatomic potentials, suffers in accuracy near the melting temperature. In particular, we show that MGP is capable of rapidly learning to describe a wide range of temperatures, below and around the melting temperature, and that we can efficiently monitor the uncertainty on forces at each time step of the molecular dynamics, and use this capability to iteratively increase the accuracy of the force field by hierarchical active training. Using parallel simulations of large-scale structures, we characterize the unusual phase transition, where the 2D monolayer transforms to bcc bulk Sn at the temperature of 200 K, and melts to ultimately form a 3D liquid phase at the temperature of 500 K.

The application demonstrates that the MGP force field approaches the computational cost of an empirical analytical force field, retains nearly ab-initio accuracy and quantitative Bayesian uncertainty, and is capable of describing large-scale systems with complex physical processes such as phase transitions.
II. RESULTS

A. Bayesian Active Learning
In an MD simulation, it is likely that the system will evolve to atomic configurations that have not been seen before, and are therefore far from those in the training set. In this situation, the uncertainties of the predictions will grow to large values which may be considered unsatisfactory. Therefore, it is desirable to obtain an accurate estimate of the forces for such configurations using a relatively expensive first-principles calculation, and to add this new information to the training set of the GP regression model.

FIG. 1: Bayesian active learning (BAL) workflow with MGP. Iterations of BAL can be run for different system sizes to refine the force field.

This procedure is referred to as Bayesian active learning (BAL), where training examples are added on the fly as more information is obtained about the problem. Bayesian machine learning models such as GPs are particularly well suited to such uncertainty-based active learning approaches, as they provide a well-defined probability distribution over model outputs which can be used to extract rigorous estimates of model uncertainty for any test point [12, 33].

Here we adopt BAL to achieve automatic training of models for atomic forces, expanding on our earlier workflow [50]. This way the accuracy of the GP model increases with time, particularly in the configuration regions that are less explored and likely to be inaccurate. As mentioned above, GP regression cost scales linearly with the training set size, so the prediction of forces and uncertainties becomes more expensive as the BAL algorithm keeps adding data to the training set, hindering simulations of complex materials systems. The MGP approach is based on an essentially exact mapping of both the mean and variance of the predicted forces from a full GP model, for a fixed training set, onto a low-dimensional spline model. The mathematical formalism is detailed in the Methods section below.
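The uncertainty-triggered data acquisition at the heart of BAL can be illustrated with a self-contained 1D toy, in which a plain GP regressor and a `sin` "oracle" stand in for the force model and DFT, and all hyperparameters are invented: whenever the predictive standard deviation exceeds a threshold somewhere, the oracle is queried there and the model is retrained.

```python
import numpy as np

# 1D toy of uncertainty-driven active learning. The oracle stands in
# for a DFT call; the kernel, threshold, and grid are illustrative only.

def kernel(a, b, sig=1.0, ell=1.0):
    return sig**2 * np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell**2))

def gp_predict(x_test, x_train, y_train, noise=1e-6):
    K = kernel(x_train, x_train) + noise * np.eye(len(x_train))
    ks = kernel(x_test, x_train)                    # kernel vector per point
    mean = ks @ np.linalg.solve(K, y_train)         # posterior mean
    var = kernel(x_test, x_test).diagonal() - np.einsum(
        "ij,ji->i", ks, np.linalg.solve(K, ks.T))   # posterior variance
    return mean, np.sqrt(np.maximum(var, 0.0))

oracle = np.sin                                     # stand-in for DFT
x_train = np.array([0.0])
y_train = oracle(x_train)
grid = np.linspace(0.0, 2 * np.pi, 100)

for _ in range(20):                                 # active-learning loop
    mean, std = gp_predict(grid, x_train, y_train)
    i = int(np.argmax(std))                         # most uncertain point
    if std[i] < 0.05:                               # confident: stop training
        break
    x_train = np.append(x_train, grid[i])           # "call DFT" there
    y_train = np.append(y_train, oracle(grid[i]))

mean, std = gp_predict(grid, x_train, y_train)
```

The loop queries the oracle only where the model is uncertain, so the training set concentrates in the least-explored regions, which is the behavior the workflow of Fig. 1 realizes at the scale of atomic environments.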
The novelty of the MGP approach is the ability to accelerate prediction of not only the mean, but also the GP's internal uncertainty, without loss of accuracy. MGP is incorporated into the BAL workflow as depicted in Fig. 1. In this scheme, the mapping is constructed every time the GP model is updated. Atomic forces and their uncertainties for MD are produced by MGP, reducing the computational cost of the full original GP prediction by orders of magnitude. We stress that not only is the MGP faster than the original GP, but its speedup also becomes more pronounced as the training data set size N_T increases. This is especially critical in complex systems with multiple atomic and molecular species and phases, where more training configurations are needed.

In addition, fast evaluation of forces and their variances enables training the force field model on larger system sizes. Therefore, we can train the MGP force field using a hierarchical scheme, i.e. for different system sizes we can run several iterations of active learning. First, a small system size is used to train an MGP force field model using the BAL workflow. Then, we perform BAL on a larger system size, where DFT calculations are more expensive but are needed less frequently, in order to capture potentially new unexplored configurations, such as defects, that are automatically identified as high uncertainty by the model of the previous step. We note that DFT calculations scale as the 3rd power of the number of electrons in the system, so it is efficient to perform most of the training at smaller sizes, and to only require re-training on fewer large-scale structures. This way the hierarchical training scheme helps to overcome finite-size effects and explore phenomena that cannot be captured with small system sizes, such as phase transformations.

B. Case Study: 2D to 3D Transformation of Stanene
To illustrate our method, we apply the mapped on-the-fly workflow (Fig. 1) to study the phase transformation of stanene, a two-dimensional, slightly buckled honeycomb lattice of tin. This structure has received much attention recently due to its unique topological electronic properties. Stanene can be synthesized by molecular beam epitaxy [56] and has been shown to have topologically protected states in the band gap [18]. Moreover, tri-layer stanene is a superconductor [29]. Here, we focus on the less studied aspects of thermodynamic stability and phase transition mechanisms of free-standing stanene.

Despite the intense interest, the capability is still lacking for accurate modeling of the thermodynamic properties and phase stability of stanene, and of most other 2D materials, for which ∼nm-scale supercells are required. Such simulations are beyond the ability of DFT, and to the best of our knowledge, the only published classical force field for stanene is a bond-order potential (BOP) [17]. Our focus is on developing an accurate potential to model the 2D melting process, which is a notoriously difficult problem to tackle with existing classical force fields. In fact, describing the melting mechanism of a 2D material is a highly non-trivial problem, since it can be characterized by specific behaviors such as defect formation and nucleation. For example, the melting of graphene is thought to occur in a temperature range between 4000 K and 6000 K [19, 31, 54], and seems to be characterized by the formation of defects, while above 6000 K the structure melts directly into a totally disordered configuration. Bulk tin melts at a much lower temperature ( ∼ ).
1. Model Training
To train the force field for stanene, we start by training the MGP using the BAL loop on a small system size, and gradually increase the size of the simulation, iteratively improving upon the force field developed for the smaller size. The DFT settings and the GP model hyperparameter settings are given in Section 2.1 of the supplementary material [53].

FIG. 2: (a) Pristine stanene, top view and front view. (b) Atomic mean square displacement (MSD) of the on-the-fly training trajectory. The solid black line shows the MSD of the 32 atoms, the color blocks indicate time intervals corresponding to different temperatures, and the red dots represent events when DFT calculations are called as the uncertainty predictions exceed the data acquisition decision threshold.

The first iteration of training is started with a small 4 × supercell of 32 atoms. In this small simulation, the stanene monolayer decomposes at 800 K, where Sn atoms start moving around the simulation cell. This allows us to augment the training set with such disordered configurations. However, the small size of the simulation cell does not yet allow us to draw conclusions about the melting process of stanene: for this, we need a larger simulation cell.

We continue with our hierarchical active learning scheme to refine the force field, while retaining the training data from the previous smaller MD run. We now increase the number of atoms in the simulation, which allows us to explore additional atomic configurations and improve the quality of the model. Since the system becomes larger and more computationally expensive, instead of updating the model at each learning step, we update it at the end of each MD simulation. We note that this "batching" approach is one of many possible training schemes and is chosen for reasons of efficiency. The procedure is as follows:

1. Run an MD simulation on a system of a given size with MGP;
2. Predict mapped uncertainties for the frames from the MD trajectory;
3. Select the frames of highest uncertainty in different structures (crystal, transition state and disordered phase) for additional DFT calculations;
4. Add atomic environments from those frames, based on uncertainty, to the training set of the GP;
5. Construct a new MGP model using the updated GP;
6. Repeat the process for the same or a larger simulation using the new model.

FIG. 3: (a) Uncertainty of the force predictions in the whole MD trajectory decreases over three iterations. (b) Mean absolute errors (MAE) of MGP forces against DFT in three iterations, validated on atomic structures of different phases. (c) Upper: a snapshot of 2048 atoms from an MD simulation at 500 K, colored by MGP uncertainty. The outlined area has the highest predicted uncertainty, since it contains a defect and a compact region not known to the first MGP model. Lower left: zoomed-in structure of the region of high uncertainty, colored by atomic height (z-coordinate) to show the bilayer structure of the defect. Lower right: the same snapshot colored by coordination number, which shows a strong correlation with the uncertainty measure.

The above steps form one iteration, and can be run several times to refine the force field. In our force field training for stanene, we performed one iteration with 32 atoms as described before, and two iterations on a system size of 200 atoms at 500 K, a relatively high temperature chosen to explore diverse structures in the configuration phase space. We denote the MGP force field from the on-the-fly training on 32 atoms as MGP-1, and the ones from the two iterations of training on 200 atoms as MGP-2 and MGP-3. MD is simulated with MGP-1, 2, 3, and the predicted uncertainties of the frames from the three MD trajectories are shown in Fig. 3a. First, we note that there is a drastic increase of uncertainty within each trajectory at a certain time. This increase corresponds to the transition from the crystal lattice to a disordered phase, so the uncertainty acts as an automatic indicator of the structural change.
Next, we note that the uncertainties of MGP-1, 2, 3 decrease iteration by iteration, indicating that our model becomes more confident as it collects more data. The mean absolute error (MAE) of force predictions of MGP against DFT is investigated, as presented in Fig. 3b, showing that MGP also becomes more accurate with each subsequent training iteration.

It is worth noting that in the transition process of the simulation by MGP-1, there appear a bilayer stacked defect and a specific compact triangular lattice around the defect. It is a visibly new ordered structure, different from hexagonal stanene, and is thus missing in the initial training set, as shown in Fig. 3c. The uncertainty is low in the honeycomb lattice, as expected, and becomes high around the bilayer defect, exhibiting also a strong correlation with the coordination number. Thus, the mapped uncertainty is an accurate automatic indicator of the novelty of new configurations. We note that after iterative model refinement by the hierarchical training scheme, the triangular lattice no longer appears in the simulation. This example illustrates how the hierarchical active learning scheme helps avoid qualitatively incorrect results that arise due to a priori unknown relevant configurations. In contrast, a single-pass training of a force field based on manually chosen configurations is very likely to result in non-physical predictions.
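Steps 2–4 of the procedure above amount to ranking frames by their worst-atom predicted uncertainty. A minimal sketch, with synthetic arrays standing in for MGP per-atom force standard deviations and `select_uncertain_frames` an invented helper rather than FLARE API:

```python
import numpy as np

# Rank MD frames by their maximum per-atom uncertainty and pick the
# top-k for new DFT calls. The std arrays here are synthetic.

def select_uncertain_frames(frame_stds, k):
    """frame_stds: one (n_atoms,) array of predicted stds per MD frame."""
    scores = np.array([s.max() for s in frame_stds])  # worst atom per frame
    return np.argsort(scores)[::-1][:k]               # top-k frame indices

rng = np.random.default_rng(0)
frame_stds = [rng.uniform(0.0, 0.1, size=200) for _ in range(50)]
frame_stds[7][3] = 0.9     # a "defect" frame the model has not seen
frame_stds[21][0] = 0.5    # a transition-state-like frame

picked = select_uncertain_frames(frame_stds, k=2)
```

Because a single poorly described atom (e.g. at a bilayer defect) is enough to flag a frame, this criterion naturally surfaces the novel local structures discussed above.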
2. Phase Transition Simulations
To investigate the full phase transition mechanism of stanene, we consider low-temperature (below 200 K) and high-temperature (above 500 K) values, corresponding to the crystal and liquid phases of bulk tin. In studying the graphite-diamond transition, Khaliullin et al. [28] identified that nucleation and growth of the disordered phase play key roles in the phase transition process. We also find that in stanene the phase transition begins with the appearance of defects and the consequent nucleation and growth of the disordered phase that eventually transforms the structure.

From ab-initio DFT calculations, the ground-state energy of the honeycomb monolayer is ∼0.38 eV/atom higher than that of the bcc bulk, so the 2D monolayer structure is expected to be metastable. The full transformation described below is therefore a sequence of 2D melting, followed by a kinetically controlled transition to the bulk phase of lower free energy. We set up a simulation of a stanene monolayer of size 15. × . nm. The equilibrium low-temperature phase of bulk tin is the α-phase with the diamond structure, which is different from what we observe here. The reason for the formation of the bcc phase is kinetic rather than thermodynamic: a lower free energy barrier separates the 3D disordered structure from the bcc phase than from the α-phase, making the former more kinetically accessible. One well-known example is the graphite-diamond transition, in which the metastable hexagonal diamond (HD) is obtained from hexagonal graphite (HG) in most laboratory syntheses, instead of the more thermodynamically stable cubic diamond.
FIG. 4: (a) The monolayer-to-bulk transition at 200 K. Using common neighbor analysis, the orange atoms are identified as bcc sites, and atoms of other colors are identified as other sites. The frame at 0.3 ns: the atoms are vibrating around the honeycomb lattice sites with out-of-plane ripples. The frame at 0.8 ns: the atoms arrange in the "disordered" way. The frame at 1.6 ns: some patches of bcc lattice structure appear. The frame at 2 ns: the atoms form and vibrate around the bcc slab sites. (b) Frames from the 500 K NPT simulation with 10,000 atoms (39. × . nm).

C. Mapped Gaussian Process Performance
In this section, we discuss the computational cost of the MGP model in application to large-scale MD simulations, and its advantages and limitations. Timing results are shown in Fig. 5a, comparing different models: DFT, full GP (without mapping; implemented in Python), MGP (with mapped force and uncertainty; Python), MGP-F (MGP with force prediction only; Python), MGP (force only; implemented in LAMMPS) and BOP (LAMMPS force field). For DFT, we tested prediction time for system sizes of 32 atoms and 200 atoms, where the computational cost grows as the cube of the system size. The other algorithms' cost scales linearly with system size, since they rely on local environments with a finite cutoff radius; we measure the same per-atom computational cost for both system sizes.

It is important to highlight the acceleration of the MGP relative to the original full GP. We present the timings of GP and MGP built from different training set sizes (100 and 400 training data points) in Fig. 5, where the GP scales linearly. MGP-F (without uncertainty) and MGP (with uncertainty, fixed rank) are independent of the training set size. The numerical results show that the speedup of the MGP relative to the GP increases linearly as more data are added to the training set, as reported in Fig. S3 of the supplementary material [53]. In the stanene system discussed in Sec. II, MGP is two orders of magnitude faster than GP with O(10 ) training data points when both force and variance are predicted, and the speedup becomes more significant as the GP collects more training data. We also note that although MGP prediction gets rid of the linear scaling with training set size, the construction of the mappings does scale linearly with training set size, since it requires the GP to make a prediction at each grid point.

FIG. 5: (a) Comparison of prediction time (s × processors / atom) between DFT, full GP, MGP (Python, with force and uncertainty), MGP-F (Python, with force only), MGP (LAMMPS), and BOP (LAMMPS).
(b) The scaling of MGP in LAMMPS with respect to different system sizes, in different phases. Green: 2D honeycomb lattice. Blue: 3D dense liquid.

The cost of the MGP force field is comparable to that of the empirical interatomic BOP force field, since both BOP and MGP models consider interatomic distances within the local environment of a central atom and thus have similar computational cost at the same cutoff radius. For accurate simulations, however, the MGP uses a larger cutoff radius (7. Å) and is still slower by a factor of 4 ∼ (see Table I).

TABLE I: System size (atoms): 100 , ; BOP speed (atom · timestep/processor/s): ∼ . × ; MGP speed (atom · timestep/processor/s): ∼ . × .

The MGP with an n-body kernel only uses low-dimensional functions of crystal coordinates. In this work, we have implemented the MGP for 2- and 3-body interactions, which, for the mapping problem, translate to interpolation problems in 3 dimensions. The approach may be extended to higher interaction orders at the expense of computational efficiency, but it becomes expensive in terms of both computation and storage requirements for splines on uniform grids to reach a satisfactory accuracy. Many-body descriptors can be taken into account to increase the descriptive power. To map such a descriptor in the same way as in the methodology we introduced, it must be decomposable into low-dimensional functions. For example, the SOAP approach of Refs. [26, 27] cannot be mapped using the methodology followed in this work, since it relies on a high-dimensional descriptor, and as a result the MGP approach is not readily applicable; in particular, the interpolation procedure may be too computationally expensive to be a viable solution.

III. CONCLUSION
This work resolves a key computational cost scaling limitation of Gaussian process regression of interatomic force fields in the case of many-body kernels and provides a new approach for Bayesian active learning, enabling robust automatic training of high-accuracy interatomic force fields. We present an extended method for mapping the Gaussian process regression model, in which we accurately parameterize both the force and its variance obtained from the original Bayesian model as functions of atomic configurations. The mapped mean forces and their variance are then utilized in an active learning workflow to obtain a machine-learned force field that includes its own uncertainty, the first realization of a fast
Bayesian force field. The resulting model has near-quantum accuracy and computational cost comparable to empirical analytical interatomic potentials. Large-scale simulations are used as a demonstration to investigate the microscopic details of the inter-dimensional transformation of 2D stanene into 3D bulk tin. We discover that the transformation proceeds by nucleation of bilayer defects, densification due to continued disordered multilayer stacking, and finally conversion into either crystalline or molten bulk tin, depending on temperature. This application shows that we can actively monitor the uncertainties of force predictions during molecular dynamics simulations and iteratively improve the model as the material undergoes reorganization, exploring diverse structures in the configuration phase space. The ability to reach simulation sizes of over 1 million atoms is promising for the application of machine learning force fields to study phase diagrams and transformation dynamics of complex systems, while automated active learning of force fields opens a way to accelerate wide-ranging materials design and discovery.
IV. METHODS
The goal of the MGP method is to map the predictive mean and variance of n-body GP models onto low-dimensional splines, making the cost of evaluating atomic forces and uncertainties independent of the training set size, as also studied in previous works [20–22]. As mentioned in Ref. [22], a decomposition of the GP kernel into low-dimensional functions exists for the GP predictive mean and variance. However, the resulting variance functions have twice the dimension of the corresponding mean functions, dramatically increasing the computational cost of their evaluation. In order to make such a mapping feasible in practice, we developed a dimension reduction technique that reduces the variance map to the same dimension as the mean, making it possible to use splines to evaluate both the mean and variance of the GP.

A. Background
1. Gaussian Process Force Field
In this section, we summarize the GP model introduced in Ref. [50]. From here on, we use bold letters to denote matrices/vectors and regular letters for scalars.

An atomic configuration is defined by the atomic coordinates R^α, a vector of dimension equal to the number of atoms N_atoms, where α labels the three Cartesian directions α = x, y, z. While in general the crystal may contain atoms of different species, we here consider only the single-species case for the sake of simplicity. The results can be extended to the multi-component case, which is discussed in Section 1.1 of the supplementary material [53].

The objective of the GP regression model is to predict the forces F^α for a given atomic configuration R^α. To this aim, we associate the force on an atom i, F_{iα}, with the local environment ρ_i around this atom. F_{iα} can then be computed by comparing its environment with a training data set of atomic environments with known forces, as quantified by a distance d between the target environment and the training set.

The atomic environment ρ_i of atom i is defined as a set of atoms, including atom i itself and all other atoms within a sphere of cutoff radius r_cut centered on atom i. We proceed by partitioning this atomic environment ρ_i into a set of n-atom subsets, denoted by ρ_i^(n). For each type of subset, a distance d^(n) can be defined to quantify the difference between environments. We show the construction of ρ_i^(n) and d^(n) for the examples n = 2 and n = 3. For n = 2, ρ_i^(2) is the set of pairs p between the central atom i and another atom j in ρ_i,

    ρ_i^(2) = { p = (i, j) : j ∈ ρ_i, j ≠ i }.    (1)

To each atomic pair p we associate the interatomic distance r_p between the two atoms. The distance metric between two pairs p and q is then defined as (r_p − r_q)², and the distance metric between two atomic environments ρ_i^(2) and ρ_j^(2) is defined as

    (d_ij^(2))² = Σ_{p ∈ ρ_i^(2), q ∈ ρ_j^(2)} (r_p − r_q)².    (2)

This distance d^(2) will be used to determine the two-body contribution to the atomic forces, for the purposes of constructing the kernel function.

This construction can be extended to higher-order interatomic interactions. In this work, we also include three-body interactions, corresponding to n = 3. Similar to the two-body case, the set ρ_i^(3) of triplets p between the central atom i and two other atoms j and k in ρ_i is defined as

    ρ_i^(3) = { p = (i, j, k) : j, k ∈ ρ_i, j ≠ i, k ≠ i, j ≠ k }.    (3)

This time, each triplet p is characterized by a vector r_p = (r_{p,1}, r_{p,2}, r_{p,3}), which contains the three interatomic distances of the atomic triplet p. Including permutations, the distance metric between two triplets p and q can be defined as Σ_{u,v} (r_{p,u} − r_{q,v})², and the 3-body distance metric between two atomic environments ρ_i^(3) and ρ_j^(3) is

    (d_ij^(3))² = Σ_{p ∈ ρ_i^(3), q ∈ ρ_j^(3)} Σ_{u,v} (r_{p,u} − r_{q,v})².    (4)

Having now defined a distance between atomic environments, we can build a GP regression model for atomic forces. We define the equivariant force-force kernel function comparing two atomic environments as

    k_{αβ}(ρ_i, ρ_j) = Σ_n k_{αβ}^(n)(ρ_i^(n), ρ_j^(n)) = Σ_n ∂²k^(n)(ρ_i^(n), ρ_j^(n)) / (∂R_{iα} ∂R_{jβ}),    (5)

where α and β label Cartesian coordinates, and ∂/∂R_{iα} indicates the partial derivative of the kernel with respect to the position of atom i. The n-body energy kernel function k^(n)(ρ_i^(n), ρ_j^(n)) is expressed in terms of the n-atom partitions of the two atomic environments ρ_i and ρ_j, and is defined as

    k^(n)(ρ_i^(n), ρ_j^(n)) = Σ_{p ∈ ρ_i^(n), q ∈ ρ_j^(n)} k̃^(n)(p, q) = Σ_{p ∈ ρ_i^(n), q ∈ ρ_j^(n)} (σ_sig^(n))² exp[ −d^(n)(p, q)² / (2 (l^(n))²) ] φ_cut^(n),    (6)

where k̃^(n)(p, q) is a Gaussian kernel function, built using the definition of distance between n-atom subsets described before.
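A direct transcription of the two-body metric of Eq. (2), with each environment reduced to its array of central-atom pair distances r_p (toy numbers, not the production FLARE code):

```python
import numpy as np

# Two-body environment distance of Eq. (2): sum (r_p - r_q)^2 over all
# cross pairs (p, q) of the two environments' pair-distance lists.

def two_body_distance_sq(r_i, r_j):
    """r_i, r_j: arrays of pair distances r_p for environments rho_i, rho_j."""
    diff = r_i[:, None] - r_j[None, :]   # every (p, q) combination
    return float(np.sum(diff**2))

rho_i = np.array([2.8, 2.8, 4.0])        # pair distances around atom i
rho_k = np.array([2.9, 2.7, 4.1])        # a slightly distorted environment

d_ii = two_body_distance_sq(rho_i, rho_i)
d_ik = two_body_distance_sq(rho_i, rho_k)
```

Note that, as defined, the sum runs over all cross pairs, so the quantity is symmetric but does not vanish for identical environments; it enters the model only through the per-pair Gaussian kernel of Eq. (6).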
The σ^(n)_sig and l^(n) are hyperparameters, independent of p and q, that set the signal variance (related to the maximum uncertainty) and the length scale, respectively. The smooth cutoff function φ^(n)_cut damps the kernel to zero as interatomic distances approach r_cut. In this work, we chose a quadratic form for the cutoff function:

\varphi^{(2)}_{\mathrm{cut}}(p, q) = (r_p - r_{\mathrm{cut}})^2 \,(r_q - r_{\mathrm{cut}})^2, \qquad (7)

\varphi^{(3)}_{\mathrm{cut}}(p, q) = \prod_s (r_{sp} - r_{\mathrm{cut}})^2 \prod_t (r_{tq} - r_{\mathrm{cut}})^2. \qquad (8)

The GP regression model uses the training data as a reference set of N_T atomic environments. In particular, predictions of force values and variances require the 3N_T × 3N_T covariance matrix

\Sigma_{i\alpha, j\beta} = k_{\alpha\beta}(\rho_i, \rho_j) + \sigma_n^2\, \delta_{ij}\, \delta_{\alpha\beta}, \qquad (9)

where σ_n is a noise parameter (the indices iα and jβ are grouped together). We define an auxiliary 3N_T-dimensional vector η with components

\eta_{i\alpha} := (\Sigma^{-1})_{i\alpha, j\beta}\, y_{j\beta}, \qquad (10)

where y contains the training labels, i.e. the forces F_{jβ} acting on the atom at the center of environment j in the Cartesian direction β.

In order to predict forces and their errors for a target environment ρ_i, we compute the kernel vector k̄_{iα} for atom i and direction α (α = x, y, z), whose components are the kernel values (k̄_{iα})_{jβ} := k_{αβ}(ρ_i, ρ_j) between the prediction point iα and the training points jβ, with j = 1, ..., N_T and β = x, y, z. Finally, the predictions of the mean value of the Cartesian component α of the force F_{iα} acting on the atom at the center of the atomic environment ρ_i, and of its variance V_{iα}, are given by [39]:

F_{i\alpha} = \bar{k}^{\top}_{i\alpha}\, \eta, \qquad (11)

V_{i\alpha} = k_{\alpha\alpha}(\rho_i, \rho_i) - \bar{k}^{\top}_{i\alpha}\, \Sigma^{-1}\, \bar{k}_{i\alpha}. \qquad (12)

We notice that this formulation has two major computational bottlenecks, which we address in this work. First, the vector k̄_{iα} needs to be constructed for every force prediction; for each atom, its evaluation requires computing the kernel between the atomic environment and the entire training data set, i.e.
N_T calls to the Gaussian kernel. We discuss an approach to speed up the evaluation of k̄_{iα}. Second, every variance calculation requires a matrix-vector multiplication that is computationally expensive, and is further discussed in Sec. IV C.

B. Mapped Force Field
We start by observing that the GP model constructs a latent "weight field" from a fixed set of training data. In GP regression, a kernel function is used to build weights that depend on the distance from the training data: training data that are similar (close) to the test data contribute with a larger weight, while dissimilar (distant) training data contribute less. To make a prediction on a data point (the force on an atom in its local environment), the GP regression model requires the evaluation of the kernel between that point and all N_T training set points. The result, i.e. the GP mean prediction, is a weighted average of the training labels, with weights given by the distances as measured by the kernel function. In Eq. 11, we notice that the predictive mean requires the evaluation of a 3N_T-dimensional vector k̄_{iα}, which quantitatively weighs the contributions to the test atomic environment ρ_i coming from the training atomic environments ρ_j.

In the following discussion we consider, without loss of generality, a specific subset size n, capturing n-body atomic structure information, i.e. (k̄^(n)_{iα})_{jβ} = k^(n)_{αβ}(ρ^(n)_i, ρ^(n)_j). For a kernel function of the form of Eq. 5, the same treatment can be applied to each n, and the final prediction is obtained by summing over the mappings of all n.

Starting from the idea introduced in Ref. [21], we can construct the weight field by exploiting the special form of the kernel in Eq. 6. This kernel definition allows us to decompose the force prediction into contributions from n-atom subsets, and the variance into contributions from pairs of n-atom subsets. To see this, we first note that the kernel vector of an atomic environment ρ_i can be decomposed into the kernels of all the n-atom subsets in ρ_i, and that the kernel of an n-atom subset p can be further decomposed into contributions from the n − 1 bonds connecting the other atoms of the subset to the central atom i.
We denote by r_{sp} := R_s − R_i a bond in p, where s labels the remaining n − 1 atoms of the subset, with r̂_{sp} := r_{sp}/r_{sp} denoting the direction of the bond. The force kernel then has the decomposition

\bar{k}^{(n)}_{i\alpha} = \sum_{p \in \rho^{(n)}_i} \tilde{k}^{(n)}_{i\alpha}(p) = \sum_{p} \sum_{s} \mu^{(n)}_{s}(p)\, \hat{r}_{sp,\alpha}, \qquad (13)

where

\big[ \tilde{k}^{(n)}_{i\alpha}(p) \big]_{j\beta} := \sum_{q \in \rho^{(n)}_j} \frac{\partial^2 \tilde{k}^{(n)}(p, q)}{\partial R_{i\alpha}\, \partial R_{j\beta}} \qquad (14)

and

\big[ \mu^{(n)}_{s}(p) \big]_{j\beta} := \sum_{q \in \rho^{(n)}_j} \sum_{t} \frac{\partial^2 \tilde{k}^{(n)}(p, q)}{\partial r_{sp}\, \partial r_{tq}}\, \hat{r}_{tq,\beta} \qquad (15)

are vectors in R^{3N_T} space. In the first equality of Eq. 13 we used the decomposition of Eq. 6; in the second equality we first convert the derivative from atomic coordinates to the relative distances of the n-atom subset, and then take advantage of the fact that the kernel depends only on the relative distances between atoms, i.e. r_{sp} = ‖r_{sp}‖. Therefore, forces may be written as

F^{(n)}_{i\alpha} = \sum_{p \in \rho^{(n)}_i} \sum_{s} f^{(n)}_{s}(p)\, \hat{r}_{sp,\alpha}, \qquad (16)

where

f^{(n)}_{s}(p) := \mu^{(n)\top}_{s}(p)\, \eta \qquad (17)

is the scalar "weight field" function and η is defined in Eq. 10.

As noted in Ref. [21], we can achieve better scaling by constructing the weight field with a computationally efficient parametric method, i.e. an approximate way of constructing the functions f. A parametric function can quickly yield the weight-field value without evaluating kernel distances from all the GP training points. As a result, the cost of evaluating the weight field with this parametric model is independent of the size of the training set, removing a computational bottleneck of the full GP implementation. In order to achieve an effective parametrization of these weight functions, we use cubic spline interpolation on a uniform grid of points, which benefits from having an analytic form and a continuous second derivative.
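The spline mapping can be sketched in one dimension (the 2-body case), with a smooth stand-in function in place of the GP-derived weight field f^(2)_s; grid bounds and sizes here are illustrative:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Mock 2-body weight field f(r). In the real method this would be the GP
# prediction f_s^(n)(x_k) = mu_s^(n)(x_k)^T eta at each grid point; here a
# smooth analytic stand-in is used for illustration.
def f_exact(r):
    return np.exp(-r) * np.sin(2.0 * r)

a, b = 2.0, 7.2                       # grid bounds: below min distance, up to r_cut
grid = np.linspace(a, b, 64)          # uniform grid of points x_k
f_hat = CubicSpline(grid, f_exact(grid))  # cubic spline through the grid values

# The mapped prediction evaluates the cheap spline instead of the full GP sum:
r_test = 3.456
err = abs(float(f_hat(r_test)) - f_exact(r_test))
```

Because cubic-spline error shrinks rapidly with grid spacing, increasing the grid size drives the mapped prediction arbitrarily close to the exact weight field, at constant evaluation cost.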
By increasing the number of points in the uniform grid, the MGP predictions of mean and variance can be made arbitrarily close to the original functions.

The construction of the mapped force field includes the following steps:

Step 1: Define a uniform grid of points {x_k ∈ [a, b]^{n(n−1)/2}} in the space of distances given by the vectors r_p, where n(n−1)/2 is the number of interatomic distances of an n-atom subset. The upper bound b can be set to the cutoff radius of the atomic neighborhoods, above which the prediction vanishes (using the zero prior of the GP).

Step 2: A GP model with a fixed training set yields the values {f^(n)_s(x_k)} for each grid point x_k.

Step 3: Interpolate with a cubic spline function f̂^(n)_s through the points {(x_k, f^(n)_s(x_k))}.

In prediction, the contributions from all n-bodies are added up, so the force on the local environment ρ is predicted as

\hat{F}_{\alpha}(\rho) = \sum_{n} \sum_{p \in \rho^{(n)}} \sum_{s} \hat{f}^{(n)}_{s}(p)\, \hat{r}_{sp,\alpha}.

C. Mapped Variance Field

Using a derivation similar to that of the force decomposition (Eq. 16), we derive the decomposition of the variance

V_{i\alpha} = \sum_{n, n'} \sum_{p \in \rho^{(n)},\ q \in \rho^{(n')}} \sum_{s, t} v^{(n, n')}_{s,t}(p, q)\, \hat{r}_{sp,\alpha}\, \hat{r}_{tq,\alpha}, \qquad (18)

where

v^{(n, n')}_{s,t}(p, q) := \delta_{n n'}\, \tilde{k}^{(n)}_{\alpha\alpha}(p, q) - \mu^{(n)}_{s}(p)^{\top}\, \Sigma^{-1}\, \mu^{(n')}_{t}(q), \qquad (19)

with the covariance matrix Σ defined in Eq. 9. We note that the domain of the variance function v has twice the dimensionality of that of f, so that when the interaction order n >
3, the number of grid points becomes prohibitively large to map v efficiently. Even for a 3-body kernel, where v is a 6-D function, we find that the evaluation of the variance, both in terms of computational time and memory footprint, limits its usage to simple benchmarks.

An efficient evaluation of the variance is critical for adopting Bayesian active learning. We now discuss a key result of this work and introduce an accurate yet efficient approximate mapping field for the variance. In particular, we focus on simplifying the vector μ^(n)_s(p), which, from our tests, is the most computationally expensive term in the variance prediction: since μ^(n)_s(p) evaluates the kernel function between the test point p and all the training points, its cost scales linearly with the training set size. By implementing a mapping of the variance weight field, the cost of variance prediction depends only on the rank of our dimension-reduction approach, and is independent of the training set size.

To this aim, we first define the vector ψ^(n)_s(p) := L^{-1} μ^(n)_s(p) ∈ R^{3N_T}, where L is the Cholesky factor of Σ, i.e. Σ = L L^⊤, so that Eq. 19 can be written as

v^{(n, n')}_{s,t}(p, q) = \delta_{n n'}\, \tilde{k}^{(n)}_{\alpha\alpha}(p, q) - \psi^{(n)}_{s}(p)^{\top}\, \psi^{(n')}_{t}(q). \qquad (20)

The domain of the vector function ψ^(n)_s(p) has half the dimensionality of that of v^{(n,n')}_{s,t}(p, q). As a first idea, since each component of ψ^(n) has a domain of lower dimensionality, one may think of building 3N_T spline functions to interpolate the 3N_T components separately; however, the number of spline functions needed would then grow with the training set size. To achieve a more efficient mapping of ψ^(n)_s, we take advantage of principal component analysis (PCA), a common dimensionality-reduction method.

The construction of the mapped variance field includes the following steps:

Step 1: We start by evaluating the function values on the uniform grid {x_k ∈ [a, b]^{n(n−1)/2}, k = 1, ..., G} introduced in the section above. At each grid point x_k, we evaluate the vectors ψ^(n)_s(x_k) and use them to build the matrix

M^{(n)}_{s} := \big( \psi^{(n)}_{s}(x_1), \psi^{(n)}_{s}(x_2), \ldots, \psi^{(n)}_{s}(x_G) \big)^{\top} \in \mathbb{R}^{G \times 3N_T}.

Step 2: In PCA we perform a singular value decomposition (SVD) on M^(n)_s, such that M^(n)_s = U^(n)_s Λ^(n)_s (V^(n)_s)^⊤, where U^(n)_s ∈ R^{G×3N_T} and V^(n)_s ∈ R^{3N_T×3N_T} have orthogonal columns, and Λ^(n)_s ∈ R^{3N_T×3N_T} is diagonal.

Step 3: We reduce the dimensionality by keeping the m largest singular values of Λ^(n)_s and their corresponding columns of U^(n)_s and V^(n)_s (here we assume the singular values are ordered). We then readily obtain the low-rank approximation

M^{(n)}_{s} \approx \bar{M}^{(n)}_{s} = \bar{U}^{(n)}_{s}\, \bar{\Lambda}^{(n)}_{s}\, (\bar{V}^{(n)}_{s})^{\top}, \qquad (21)

with Λ̄^(n)_s ∈ R^{m×m}, Ū^(n)_s ∈ R^{G×m}, and V̄^(n)_s ∈ R^{3N_T×m}.

Step 4: Define φ^(n)_s := (V̄^(n)_s)^⊤ ψ^(n)_s, a vector in R^m, and interpolate this vector between its values at the grid points {x_k} (explicitly, φ^(n)_{s,i}(x_k) = Ū^(n)_{s,ki} Λ̄^(n)_{s,ii} for i = 1, ..., m). As done previously for the force mapping, spline interpolation can now be used to approximate the φ^(n)_s vector as well.

Therefore, the MGP prediction of the variance is given by

\hat{v}^{(n, n')}_{s,t}(p, q) := \delta_{n n'}\, \tilde{k}^{(n)}_{\alpha\alpha}(p, q) - \phi^{(n)}_{s}(p)^{\top}\, (\bar{V}^{(n)}_{s})^{\top}\, \bar{V}^{(n')}_{t}\, \phi^{(n')}_{t}(q). \qquad (22)

In this form, the variance would still require a multiplication between two matrices of size m × 3N_T. However, we note that the computational cost of the total variance can be reduced by summing the contributions before taking the squared norm:

\hat{V}_{\alpha}(\rho) = k_{\alpha\alpha}(\rho, \rho) - \Big\| \sum_{n} \sum_{p \in \rho^{(n)}} \sum_{s} \bar{V}^{(n)}_{s}\, \phi^{(n)}_{s}(p)\, \hat{r}_{sp,\alpha} \Big\|^2. \qquad (23)

Nominally, the calculation of the second term in the variance still scales with the data set size N_T, since the size of V̄^(n) depends on N_T.
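Steps 1 through 4 amount to a truncated SVD. A minimal numpy sketch, with a mock ψ-matrix whose shape and rank are chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
G, N_T, m = 200, 50, 5        # grid points, training-set size, PCA rank

# Mock matrix M of psi vectors on the grid (Step 1); built here with an
# artificial rank-m structure so that the truncation is exact.
M = rng.standard_normal((G, m)) @ rng.standard_normal((m, N_T))

# Step 2: singular value decomposition, M = U diag(S) V^T
U, S, Vt = np.linalg.svd(M, full_matrices=False)

# Step 3: keep the m largest singular values for the low-rank approximation
M_bar = (U[:, :m] * S[:m]) @ Vt[:m, :]

# Step 4: phi = V_bar^T psi; on the grid, phi_i(x_k) = U[k, i] * S[i],
# one m-vector per grid point, ready for spline interpolation.
phi_grid = U[:, :m] * S[:m]   # shape (G, m)

rel_err = np.linalg.norm(M - M_bar) / np.linalg.norm(M)
```

For a matrix that is only approximately low-rank, `rel_err` is controlled by the discarded singular values, which is exactly the accuracy/efficiency trade-off tuned through the rank m.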
However, we find the cost of this vector-matrix multiplication to be negligible compared to the evaluation of the self-kernel function (the first term in Eq. 22). We have therefore formulated a way to significantly speed up the calculation of the variance, which now depends only weakly on the training set size. In addition, by tuning the PCA rank m, we can balance accuracy against efficiency, increasing the flexibility of this model.

V. ACKNOWLEDGEMENT
We thank Steven Torrisi and Simon Batzner for help with the development of the FLARE code. We thank Anders Johansson for the development of the KOKKOS GPU acceleration of the MGP LAMMPS pair style. We thank Dr. Nicola Molinari, Dr. Aldo Glielmo, Dr. Claudio Zeni and Dr. Sebastian Matera for useful discussions. Y.X. is supported by the US Department of Energy (DOE) Office of Basic Energy Sciences under Award No. DE-SC0020128. L.S. was supported by the Integrated Mesoscale Architectures for Sustainable Catalysis (IMASC), an Energy Frontier Research Center funded by the US Department of Energy (DOE) Office of Basic Energy Sciences under Award No. DE-SC0012573. A.C. is supported by the Harvard Quantum Initiative. J.V. is supported by Robert Bosch LLC and the National Science Foundation (NSF), Office of Advanced Cyberinfrastructure, Award No. 2003725.
VI. COMPETING INTERESTS
The authors declare no competing financial or non-financial interests.
VII. CODE AVAILABILITY
The algorithm for the MGP and mapped Bayesian active learning has been implemented in the open-source package FLARE [50] and is publicly available online [1].
VIII. DATA AVAILABILITY
The code used to generate the results in this paper, and the related data, are published in the Materials Cloud Archive [46] with DOI: 10.24435/materialscloud:cs-tf [52].
IX. AUTHOR CONTRIBUTIONS
Y.X. developed the MGP variance method. J.V. implemented the FLARE Bayesian active learning framework, and Y.X. implemented MGP in the training workflow. L.S. and Y.X. developed the LAMMPS pair style for MGP. Y.X. trained the MGP models for stanene and performed molecular dynamics and analysis. A.C. contributed to the DFT calculations. B.K. initiated and supervised the work and contributed to algorithm development. Y.X. and B.K. worked on the manuscript, with contributions from all authors.

[1] https://github.com/mir-group/flare.
[2] Albert P Bartók and Gábor Csányi. Gaussian approximation potentials: A brief tutorial introduction. International Journal of Quantum Chemistry, 115(16):1051–1057, 2015.
[3] Albert P Bartók, James Kermode, Noam Bernstein, and Gábor Csányi. Machine learning a general-purpose interatomic potential for silicon. Physical Review X, 8(4):041048, 2018.
[4] Albert P Bartók, Risi Kondor, and Gábor Csányi. On representing chemical environments. Physical Review B, 87(18):184115, 2013.
[5] Albert P Bartók, Mike C Payne, Risi Kondor, and Gábor Csányi. Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons. Physical Review Letters, 104(13):136403, 2010.
[6] MI Baskes. Application of the embedded-atom method to covalent materials: a semiempirical potential for silicon. Physical Review Letters, 59(23):2666, 1987.
[7] Jörg Behler. Atom-centered symmetry functions for constructing high-dimensional neural network potentials. The Journal of Chemical Physics, 134(7):074106, 2011.
[8] Jörg Behler. Neural network potential-energy surfaces in chemistry: a tool for large-scale simulations. Physical Chemistry Chemical Physics, 13(40):17930–17955, 2011.
[9] Jörg Behler. Constructing high-dimensional neural network potentials: A tutorial review. International Journal of Quantum Chemistry, 115(16):1032–1050, 2015.
[10] Jörg Behler. Perspective: Machine learning potentials for atomistic simulations. The Journal of Chemical Physics, 145(17):170901, 2016.
[11] Jörg Behler and Michele Parrinello. Generalized neural-network representation of high-dimensional potential-energy surfaces. Physical Review Letters, 98(14):146401, 2007.
[12] Christopher M Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[13] Viktor F Britun, Alexander V Kurdyumov, and Igor A Petrusha. Diffusionless nucleation of lonsdaleite and diamond in hexagonal graphite under static compression. Powder Metallurgy and Metal Ceramics, 43(1-2):87–93, 2004.
[14] FP Bundy. Direct conversion of graphite to diamond in static pressure apparatus. The Journal of Chemical Physics, 38(3):631–643, 1963.
[15] FP Bundy, WA Bassett, MS Weathers, RJ Hemley, HU Mao, and AF Goncharov. The pressure-temperature phase and transformation diagram for carbon; updated through 1994. Carbon, 34(2):141–153, 1996.
[16] Kimberly Chenoweth, Adri CT Van Duin, and William A Goddard. ReaxFF reactive force field for molecular dynamics simulations of hydrocarbon oxidation. The Journal of Physical Chemistry A, 112(5):1040–1053, 2008.
[17] Mathew J Cherukara, Badri Narayanan, Alper Kinaci, Kiran Sasikumar, Stephen K Gray, Maria KY Chan, and Subramanian KRS Sankaranarayanan. Ab initio-based bond order potential to investigate low thermal conductivity of stanene nanostructures.
The Journal of Physical Chemistry Letters, 7(19):3752–3759, 2016.
[18] Jialiang Deng, Bingyu Xia, Xiaochuan Ma, Haoqi Chen, Huan Shan, Xiaofang Zhai, Bin Li, Aidi Zhao, Yong Xu, Wenhui Duan, et al. Epitaxial growth of ultraflat stanene with topological band inversion. Nature Materials, 17(12):1081, 2018.
[19] Eric Ganz, Ariel B Ganz, Li-Ming Yang, and Matthew Dornfeld. The initial stages of melting of graphene between 4000 K and 6000 K. Physical Chemistry Chemical Physics, 19(5):3756–
n-body force fields using Gaussian process regression. arXiv preprint arXiv:1905.07626, 2019.
[23] Jiequn Han, Linfeng Zhang, Roberto Car, et al. Deep potential: A general representation of a many-body potential energy surface. arXiv preprint arXiv:1707.01478, 2017.
[24] Max Hodapp and Alexander Shapeev. In operando active learning of interatomic interaction during large-scale simulations. arXiv preprint arXiv:2004.13158, 2020.
[25] Tetsuo Irifune, Ayako Kurio, Shizue Sakamoto, Toru Inoue, and Hitoshi Sumiya. Correction: ultrahard polycrystalline diamond from graphite. Nature, 421(6925):806–806, 2003.
[26] Ryosuke Jinnouchi, Ferenc Karsai, and Georg Kresse. On-the-fly machine learning force field generation: Application to melting points. arXiv preprint arXiv:1904.12961, 2019.
[27] Ryosuke Jinnouchi, Jonathan Lahnsteiner, Ferenc Karsai, Georg Kresse, and Menno Bokdam. Phase transitions of hybrid perovskites simulated by machine-learning force fields trained on the fly with Bayesian inference. Physical Review Letters, 122(22):225701, 2019.
[28] Rustam Z Khaliullin, Hagai Eshet, Thomas D Kühne, Jörg Behler, and Michele Parrinello. Nucleation mechanism for the direct graphite-to-diamond phase transition. Nature Materials, 10(9):693–697, 2011.
[29] Menghan Liao, Yunyi Zang, Zhaoyong Guan, Haiwei Li, Yan Gong, Kejing Zhu, Xiao-Peng Hu, Ding Zhang, Yong Xu, Ya-Yu Wang, et al. Superconductivity in few-layer stanene. Nature Physics, 14(4):344, 2018.
[30] L Lindsay and DA Broido.
Optimized Tersoff and Brenner empirical potential parameters for lattice dynamics and phonon thermal transport in carbon nanotubes and graphene. Physical Review B, 81(20):205441, 2010.
[31] J. H. Los, K. V. Zakharchenko, M. I. Katsnelson, and Annalisa Fasolino. Melting temperature of graphene. Physical Review B, 91:045415, Jan 2015.
[32] Jonathan P Mailoa, Mordechai Kornbluth, Simon Batzner, Georgy Samsonidze, Stephen T Lam, Jonathan Vandermause, Chris Ablitt, Nicola Molinari, and Boris Kozinsky. A fast neural network approach for direct covariant forces prediction in complex multi-element extended systems. Nature Machine Intelligence, 1(10):471–479, 2019.
[33] Kevin P Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
[34] Felix Musil, Michael J Willatt, Mikhail A Langovoy, and Michele Ceriotti. Fast and accurate uncertainty estimation in chemical machine learning. Journal of Chemical Theory and Computation, 15(2):906–915, 2019.
[35] Hiroaki Ohfuji and Kiyoshi Kuroki. Origin of unique microstructures in nano-polycrystalline diamond synthesized by direct conversion of graphite at static high pressure. Journal of Mineralogical and Petrological Sciences, 104(5):307–312, 2009.
[36] Andrew A Peterson, Rune Christensen, and Alireza Khorshidi. Addressing uncertainty in atomistic machine learning. Physical Chemistry Chemical Physics, 19(18):10978–10985, 2017.
[37] Steve Plimpton. Fast parallel algorithms for short-range molecular dynamics. Journal of Computational Physics, 117(1):1–19, 1995.
[38] Evgeny V Podryabinkin and Alexander V Shapeev. Active learning of linearly parametrized interatomic potentials. Computational Materials Science, 140:171–180, 2017.
[39] Carl Edward Rasmussen. Gaussian processes in machine learning. In Summer School on Machine Learning, pages 63–71. Springer, 2003.
[40] Kristof Schütt, Pieter-Jan Kindermans, Huziel Enoc Sauceda Felix, Stefan Chmiela, Alexandre Tkatchenko, and Klaus-Robert Müller.
SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. In Advances in Neural Information Processing Systems, pages 991–1001, 2017.
[41] Kristof T Schütt, Huziel E Sauceda, P-J Kindermans, Alexandre Tkatchenko, and K-R Müller. SchNet: a deep learning architecture for molecules and materials. The Journal of Chemical Physics, 148(24):241722, 2018.
[42] KT Schütt, Pan Kessel, Michael Gastegger, KA Nicoli, Alexandre Tkatchenko, and K-R Müller. SchNetPack: A deep learning toolbox for atomistic systems. Journal of Chemical Theory and Computation, 15(1):448–455, 2018.
[43] Alexander V Shapeev. Moment tensor potentials: A class of systematically improvable interatomic potentials. Multiscale Modeling & Simulation, 14(3):1153–1173, 2016.
[44] Justin S Smith, Olexandr Isayev, and Adrian E Roitberg. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chemical Science, 8(4):3192–
https://archive.materialscloud.org/record/2020.99.
[53] Yu Xie, Jonathan Vandermause, Lixin Sun, Andrea Cepellotti, and Boris Kozinsky. Supplementary material.
[54] Kostyantin V Zakharchenko, Annalisa Fasolino, JH Los, and MI Katsnelson. Melting of graphene: from two to one dimension. Journal of Physics: Condensed Matter, 23(20):202202, 2011.
[55] Linfeng Zhang, Jiequn Han, Han Wang, Roberto Car, and E Weinan. Deep potential molecular dynamics: a scalable model with the accuracy of quantum mechanics. Physical Review Letters, 120(14):143001, 2018.
[56] Feng-feng Zhu, Wei-jiong Chen, Yong Xu, Chun-lei Gao, Dan-dan Guan, Can-hua Liu, Dong Qian, Shou-Cheng Zhang, and Jin-feng Jia. Epitaxial growth of two-dimensional stanene. Nature Materials, 14(10):1020, 2015.

Fast Bayesian Force Fields from Active Learning: Study of Inter-Dimensional Transformation of Stanene

Yu Xie, Jonathan Vandermause,
Lixin Sun, Andrea Cepellotti, and Boris Kozinsky
John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA
Department of Physics, Harvard University, Cambridge, MA 02138, USA
Supplementary Materials
1. Methodology
1.1. Multi-Component Systems
2. Case Study: Stanene
2.1. Training Settings
2.2. Accuracy Validation
2.3. Phase Transition Process
2.3.1. Radial Distribution Function
2.3.2. Video of the Transition Process
3. Performance
3.1. MGP's Acceleration of Original GP
References

1. METHODOLOGY

1.1. Multi-Component Systems

In the main text, we derived the mathematical formulation in the single-component context. We note that our GP and MGP extend naturally to multiple components by differentiating combinations of species. As described in the main text, the n-body kernel compares two n-atom subsets p ∈ ρ^(n)_i, q ∈ ρ^(n)_j. When there are multiple species in the system, we define the kernel function k̃^(n)(p, q) to be non-zero only when p and q have the same species on their corresponding atoms. More rigorously, denoting by s_p the sequence of species of the atoms in p, we define the multi-component kernel as

\tilde{k}^{(n)}(p, q) = \delta_{s_p, s_q}\, \exp\left( -\frac{d^{(n)}(p, q)^2}{2\, \lambda^{(n)2}} \right) \varphi^{(n)}_{\mathrm{cut}}(p, q).

Therefore, the mapped force and variance extend to the multi-component system as follows: for each possible s_p, we construct f^(n)(p) and φ^(n)(p) in the same way as for the single-component system described in the main text. In prediction, the force and variance contributions from the n-atom subsets of the test atomic environment are predicted by the spline functions corresponding to their species.
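The species-matching factor δ_{s_p, s_q} can be sketched for the 2-body case as follows. This is a toy illustration only; the pair representation, function name, and hyperparameter values are ours, not the FLARE API:

```python
import numpy as np

def k2_multicomponent(p, q, sig=1.0, lam=1.0, r_cut=7.2):
    """2-body kernel for multiple species: non-zero only when the species
    sequences of the two pairs match (the delta_{s_p, s_q} factor).
    Pairs are (distance, (center species, neighbor species)); default
    hyperparameter values are placeholders."""
    r_p, species_p = p
    r_q, species_q = q
    if species_p != species_q:
        return 0.0  # delta factor: different species sequences do not compare
    gauss = sig**2 * np.exp(-(r_p - r_q) ** 2 / (2.0 * lam**2))
    phi_cut = (r_p - r_cut) ** 2 * (r_q - r_cut) ** 2
    return gauss * phi_cut

# A Sn-Sn pair only correlates with another Sn-Sn pair:
same = k2_multicomponent((3.0, ("Sn", "Sn")), (3.1, ("Sn", "Sn")))
cross = k2_multicomponent((3.0, ("Sn", "Sn")), (3.1, ("Sn", "O")))
```

In practice this means a separate spline model is mapped for every species sequence s_p, and prediction dispatches each subset to the spline of its own species combination.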
2. CASE STUDY: STANENE

2.1. Training Settings
We start by constructing a 4 × × supercell of the stanene monolayer, with lattice constant a = 4. Å and buckling height d = 0. Å. A vacuum spacing of c = 20 Å is used to make sure that the stanene monolayer is well isolated, so that the periodic boundary condition in the z-direction has little influence on the result. We perform MD with the MGP force field using a time step of 1 fs and the velocity Verlet algorithm to integrate Newton's equations of motion, in the NVE ensemble.

We use Quantum ESPRESSO [6], a plane-wave implementation of DFT, with an ultrasoft pseudopotential and the PBEsol [4] exchange-correlation functional for Sn. The k-point mesh is 4 × ×
1, and the energy cutoff is 50 Ry, with a charge-density cutoff of 400 Ry.

Python is used in all our code except for DFT. FLARE [10] is our base framework for Bayesian active learning with GP, and the newly developed MGP is inserted into the workflow. The radial cutoffs for local environments are chosen as 7.2 Å. The initial and optimized hyperparameters, namely the signal variance σ_sig, the length scale l, and the noise σ_n, are shown in the table below; they are optimized to maximize the marginal likelihood during training, using the BFGS algorithm [2, 5, 7, 9]. More details on the hyperparameter optimization are explained in our earlier work [10].

Kernel   Cutoff (Å)   Initial σ_sig (eV/Å)   Initial l (Å)   Initial σ_n (eV/Å)   Optimized σ_sig (eV/Å)   Optimized l (Å)   Optimized σ_n (eV/Å)
2-body   7.2          0.2                    1.0             0.05                 3.27e-2                  0.518             0.140
3-body   7.2          1.0e-4                 1.0             0.05                 3.75e-5                  0.691             0.140

TABLE I: GP hyperparameters

For the MGP, the hyperparameters are shown in Table SII. To interpolate a function, we need to choose a closed interval and determine the grid points within it. The number of grid points for the mapped spline models is chosen based on the trade-off between accuracy and efficiency of the interpolation. For the 2-body term, the interval is chosen as [a, b], where a should be smaller than the minimal interatomic distance encountered in the MD simulation, and b is set equal to the 2-body cutoff of the GP. For the 3-body term, we choose a "cube" of intervals, one per dimension. Since the 3-body term describes triplets, i.e. the triangle formed by three atoms, which is determined by the two bonds connected to the central atom, r_1 and r_2, and the angle a = ⟨r̂_1, r̂_2⟩ between them, we use (r_1, r_2, a) as grid coordinates to build the interpolation function. The intervals for r_1 and r_2 are chosen as for the 2-body term, except that the upper bound is changed to the 3-body cutoff of the GP.
For the angle, the lower and upper bounds are fixed to 0 and π; they do not vary between systems and are not chosen by users.

The 1-D and 3-D cubic spline functions come from the GitHub repository interpolation.py [1].

         Grid number        Lower bound              Upper bound
2-body   144                2.35 Å                   7.2 Å
3-body   128 × 128 × 128    2.35 Å, 2.35 Å, 0 rad    7.2 Å, 7.2 Å, π rad

TABLE II: MGP hyperparameters
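The grids of Table II can be built in a few lines. Here SciPy's regular-grid interpolator stands in for the cubic splines of interpolation.py, and the weight-field values on the grid are a mock analytic function; the 3-body grid sizes are reduced for illustration:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# 2-body: a 1-D grid from the lower bound to the cutoff (bounds from Table II)
r_grid = np.linspace(2.35, 7.2, 144)

# 3-body: a (r1, r2, angle) "cube"; per-dimension sizes here are illustrative
r1 = np.linspace(2.35, 7.2, 32)
r2 = np.linspace(2.35, 7.2, 32)
ang = np.linspace(0.0, np.pi, 32)

# Mock weight-field values on the grid; in the actual workflow these are GP
# predictions, interpolated with the cubic splines of interpolation.py [1].
R1, R2, A = np.meshgrid(r1, r2, ang, indexing="ij")
values = np.exp(-(R1 + R2)) * np.cos(A)
f3 = RegularGridInterpolator((r1, r2, ang), values)

# Query one triplet configuration (two bond lengths in Angstrom, angle in rad)
val = float(f3([[3.0, 3.5, 1.0]])[0])
```

Once built, evaluating the interpolant costs the same regardless of how much training data went into the GP, which is the source of the MGP's constant prediction cost.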
2.2. Accuracy Validation

After obtaining the refined MGP model from the above iterative training process, it is necessary to validate the accuracy of the force field and of the uncertainty prediction.
FIG. 1: MGP predictions of forces and uncertainties versus GP

First, because the MGP is an approximation to the original GP, we must verify that the MGP force and uncertainty predictions match those of the GP. In Fig. S1, we show that the mean absolute errors (MAE) of the force and of the uncertainty are both of order 10− eV/Å, and we can therefore conclude that the MGP suffers a negligible loss of accuracy compared to the GP.

Next, we compare the performance of the MGP against the BOP force field of Ref. [3]. In particular, we validate results using frames of different atomic structures taken from an independent MD trajectory, and study the mean absolute error of the forces with respect to the DFT prediction. The MGP force field is trained with the PBEsol exchange-correlation functional and an ultrasoft pseudopotential in Quantum ESPRESSO (labelled DFT "I"), while the BOP potential is trained with the PBE functional and a PAW pseudopotential in VASP (labelled DFT "II"); we therefore compare the mean absolute error of the force prediction against both DFT "I" and "II" as ground truth. In Table SIII we show that the MGP achieves better accuracy than the BOP in all of these frames, by at least 54%. As the temperature is increased, and as the deviation from the crystal phase increases, both force fields have larger MAE. This is because the system at high temperature explores many more configurations, likely different from the training set; nevertheless, the results show that the MGP has uniformly better accuracy.

        Crystal          Transition       Molten
DFT     I       II       I       II       I       II
BOP     0.254   0.241    0.322   0.376    0.258   0.305
MGP     0.082   0.120    0.148   0.206    0.110   0.139
TABLE III: Mean absolute error (MAE) of the force prediction compared to DFT, in units of eV/Å.
2.3. Phase Transition Process

2.3.1. Radial Distribution Function

The radial distribution function (RDF) of the slab configuration from the 200 K MD simulation is shown in Fig. 2 (top), where the black dashed line is the RDF of the perfect bcc lattice of tin [8]. The radial distribution function of the 600 ps frame in the melting-process simulation is shown in Fig. 2 (bottom), which shows one peak at ∼ Å.

2.3.2. Video of the Transition Process

The videos can be downloaded from DOI: 10.24435/materialscloud:cs-tf [11].

FIG. 2: Radial distribution function (RDF) of the 3D slab and the liquid
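The RDF analysis above can be reproduced with a minimal sketch. This is a simple minimum-image implementation for a cubic periodic box, written for illustration; the analysis tooling actually used in the paper may differ:

```python
import numpy as np

def radial_distribution(positions, box, r_max, nbins=50):
    """Radial distribution function g(r) for atoms in a periodic cubic box of
    side `box`, using the minimum-image convention."""
    pos = np.asarray(positions, dtype=float)
    n = len(pos)
    rho = n / box**3                       # number density
    edges = np.linspace(0.0, r_max, nbins + 1)
    hist = np.zeros(nbins)
    for i in range(n):
        d = pos - pos[i]
        d -= box * np.round(d / box)       # minimum-image displacement
        r = np.linalg.norm(d, axis=1)
        r = r[(r > 1e-9) & (r < r_max)]    # drop the self-distance
        hist += np.histogram(r, bins=edges)[0]
    shells = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, hist / (n * rho * shells)

# Two atoms separated by 3 Angstrom in a 20 Angstrom box:
# g(r) has a single nonzero bin at r = 3.
r, g = radial_distribution([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]],
                           box=20.0, r_max=5.0)
```

Averaging g(r) over trajectory frames, and comparing against the bcc reference RDF, distinguishes the crystalline slab from the liquid as in Fig. 2.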
3. PERFORMANCE

3.1. MGP's Acceleration of Original GP
Since the original GP is a non-parametric method, it depends entirely on the training data set, and its prediction cost therefore scales linearly with the training set size. Once the force field is trained and mapped, the MGP is obtained as a parametric model, whose basis is a set of cubic spline functions; it can be employed independently of the original GP regression model and of the training data set, and thus reaches a constant computational cost per prediction.

The major computational bottleneck in the prediction of the current scheme lies in two parts. One is the construction of the local atomic environment, which requires calculating interatomic distances and identifying all n-atom subsets. The other is the calculation of the first term in the variance formulation, the self-kernel k_αα(ρ, ρ), which scales quadratically with the number of n-atom subsets in ρ, while the second term, the vectorized
spline function φ, scales linearly. We note that the situation is different in the original GP, where the construction of the atomic environment and the evaluation of the self-kernel are negligible compared to the evaluation of the kernel vector. In this way, the present mapping scheme for both forces and uncertainties eliminates the main computational bottleneck of the full GP model.

FIG. 3: Speed-up of the MGP compared to the GP, which grows linearly with the training set size. Prediction of both force and uncertainty (blue line) is accelerated by 2 orders of magnitude when the GP has O(10²) training data, while force prediction without uncertainty is 3 orders of magnitude faster.

[1] https://github.com/EconForge/interpolation.py.
[2] Charles G Broyden. The convergence of a class of double-rank minimization algorithms: 2. the new algorithm. IMA Journal of Applied Mathematics, 6(3):222–231, 1970.
[3] Mathew J Cherukara, Badri Narayanan, Alper Kinaci, Kiran Sasikumar, Stephen K Gray, Maria KY Chan, and Subramanian KRS Sankaranarayanan. Ab initio-based bond order potential to investigate low thermal conductivity of stanene nanostructures.
The Journal of Physical Chemistry Letters, 7(19):3752–3759, 2016.
[4] Gábor I Csonka, John P Perdew, Adrienn Ruzsinszky, Pier HT Philipsen, Sébastien Lebègue, Joachim Paier, Oleg A Vydrov, and János G Ángyán. Assessing the performance of recent density functionals for bulk solids. Physical Review B, 79(15):155107, 2009.
[5] Roger Fletcher. A new approach to variable metric algorithms.
The Computer Journal, 13(3):317–322, 1970.
[6] Paolo Giannozzi, Stefano Baroni, Nicola Bonini, Matteo Calandra, Roberto Car, Carlo Cavazzoni, Davide Ceresoli, Guido L Chiarotti, Matteo Cococcioni, Ismaila Dabo, et al. Quantum espresso: a modular and open-source software project for quantum simulations of materials.
Journal of Physics: Condensed Matter, 21(39):395502, 2009.
[7] Donald Goldfarb. A family of variable-metric methods derived by variational means.
Mathematics of Computation, 24(109):23–26, 1970.
[8] Materials Project. Crystal structure of bcc Sn (mp-7162). https://materialsproject.org/materials/mp-7162.
[9] David F Shanno. Conditioning of quasi-Newton methods for function minimization.
Mathematics of Computation, 24(111):647–656, 1970.
[10] Jonathan Vandermause, Steven B Torrisi, Simon Batzner, Alexie M Kolpak, and Boris Kozinsky. On-the-fly Bayesian active learning of interpretable force-fields for atomistic rare events.
Preprint at https://arxiv.org/abs/1904.02042, 2019.
[11] Yu Xie, Jonathan Vandermause, Lixin Sun, Andrea Cepellotti, and Boris Kozinsky. Fast Bayesian force fields from active learning and mapped Gaussian processes: Application to structural phase transition of stanene. https://archive.materialscloud.org/record/2020.99.