Bayesian Force Fields from Active Learning for Simulation of Inter-Dimensional Transformation of Stanene
Yu Xie, Jonathan Vandermause, Lixin Sun, Andrea Cepellotti, Boris Kozinsky
John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA
Robert Bosch LLC, Research and Technology Center, Cambridge, MA 02142, USA
Abstract
We present a way to dramatically accelerate Gaussian process models for interatomic force fields based on many-body kernels by mapping both forces and uncertainties onto functions of low-dimensional features. This allows for automated active learning of models combining near-quantum accuracy, built-in uncertainty, and constant cost of evaluation that is comparable to classical analytical models, capable of simulating millions of atoms. Using this approach, we perform large-scale molecular dynamics simulations of the stability of the stanene monolayer. We discover an unusual phase transformation mechanism of 2D stanene, where ripples lead to nucleation of bilayer defects, densification into a disordered multilayer structure, followed by formation of bulk liquid at high temperature or nucleation and growth of the 3D bcc crystal at low temperature. The presented method opens possibilities for rapid development of fast, accurate, uncertainty-aware models for simulating long-time, large-scale dynamics of complex materials.

I. INTRODUCTION

Density functional theory (DFT) is one of the most successful methods for simulating condensed matter, thanks to its reasonable accuracy for a wide range of systems. Ab-initio molecular dynamics (AIMD) offers a way to simulate atomic motion using forces computed at the DFT level. Unfortunately, computational requirements limit the time scale and size of AIMD simulations to a few hundred atoms for a few hundred picoseconds, precluding investigation of phase transitions and heterogeneous reactions. Such large-scale molecular dynamics (MD) simulations must resort to empirically derived analytical interatomic force fields with fixed functional form [6, 16, 30, 49], trading accuracy and transferability for larger length and time scales.
Classical analytical force fields often do not match the accuracy of ab-initio results, limiting simulations to describing results qualitatively at best or, at worst, deviating from the correct behavior. In order to broaden the reach of computational simulations, it would be desirable to compute forces with ab-initio accuracy at the same cost as classical interatomic force fields.

In recent years, machine learning (ML) algorithms have emerged as powerful tools for regression and classification problems. This interest has inspired several works developing ML algorithms for interatomic force fields, for example neural networks (NN) [7–11, 32, 44], MTP [24, 38, 43], FLARE [50], GAP [2–5], SNAP [48], SchNet [40–42], and DeePMD [23, 51, 55], among others. All of these machine learning potentials can make predictions at near ab-initio accuracy, while greatly reducing the computational cost compared to DFT.

However, most machine learning models have no predictive distribution to provide uncertainty for energy/force predictions. Without uncertainty, the training data is generally selected from ab-initio calculations via a manual or random scheme. Determining the reliability of the force fields then becomes difficult, which can result in untrustworthy configurations in the molecular dynamics simulations. Therefore, uncertainty quantification is a highly desirable capability [34, 36]. Neural network potentials, e.g. ANI [44, 45], can use ensemble disagreement as an uncertainty measure. This statistical approach is, however, not guaranteed to yield reliable, calibrated uncertainty. In addition, NN approaches usually require tens of thousands of data points for training, and are a few orders of magnitude slower than analytical force fields.
Bayesian models are promising for uncertainty quantification in atomistic simulations since they have an internal principled uncertainty quantification mechanism, the variance of the posterior predictive distribution, which can be used to keep track of the error on forces during a molecular dynamics (MD) run. For instance, Jinnouchi et al. [26, 27] used high-dimensional SOAP descriptors with Bayesian linear regression [4]. Gaussian process (GP) regression [4, 5, 20–22] is a Bayesian method that has been shown to learn accurate forces with relatively small training data sets. Bartok et al. [3] used GP uncertainty with the GAP/SOAP framework to obtain only qualitative estimates of the force field accuracy. Recently, Vandermause et al. [50] demonstrated a GP-based Bayesian active learning scheme in the FLARE framework, utilizing rigorously principled and quantitatively calibrated uncertainty, applying it to a variety of multi-component materials.

In the most common form, however, GP models require using the whole training data set for prediction, meaning that the computational cost of prediction grows linearly with the size of the training set, and so accuracy increases together with computational cost. In complex multi-component structures, O(10 ) ∼ O(10 ) training data points are typically required to construct an accurate GP model, which makes predictions slow. To address this issue, Glielmo et al. [20–22] noted that, for a suitable choice of the n-body kernel function, it is possible to decompose GP force/energy prediction into a summation of low-dimensional functions. As a result, one can construct a parametric mapping of the n-body kernel function combined with a fixed training set.
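The mapping idea can be illustrated with a minimal 1D sketch, using a toy kernel, invented data, and a generic cubic spline rather than the actual MGP implementation: the GP posterior mean is tabulated once on a grid, after which each prediction is a constant-cost spline lookup instead of an O(N_T) kernel sum.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Toy 1D sketch of mapping a GP posterior mean onto a spline.
# The kernel, data, and grid sizes here are invented for illustration.

def k(a, b, ell=0.5):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell**2))

x_train = np.linspace(2.0, 6.0, 50)        # stand-in "interatomic distances"
y_train = np.sin(x_train)                  # stand-in training labels
K = k(x_train, x_train) + 1e-6 * np.eye(50)
eta = np.linalg.solve(K, y_train)          # GP weight vector

grid = np.linspace(2.0, 6.0, 64)           # mapping grid
mean_on_grid = k(grid, x_train) @ eta      # one-time cost, linear in N_T
spline = CubicSpline(grid, mean_on_grid)   # constant-cost surrogate model

r = np.linspace(2.1, 5.9, 1000)
exact = k(r, x_train) @ eta                # full GP mean, O(N_T) per point
err = np.abs(spline(r) - exact).max()      # spline reproduces it closely
```

Evaluating `spline(r)` no longer touches the training set at all, which is the sense in which the mapped model has constant cost per prediction.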
Such a mapping reaches constant scaling, which increases the speed of the GP model without accuracy loss.

In order to accelerate force prediction as well as uncertainty quantification, enabling on-the-fly training for GP force fields, it is necessary to also map the uncertainty onto low-dimensional functions. However, mapping the prediction of uncertainty was not attempted in [20–22], since the mathematical form of the uncertainty results in a decomposition with twice the dimension of the corresponding mean functions, dramatically increasing the computational cost of its evaluation with spline interpolations. Thus, to date there is no method that combines high accuracy, modest training data requirements, and fast prediction of both forces and their principled Bayesian uncertainty.

To address these issues, here we develop the Mapped Gaussian Process (MGP) method, which implements an essentially lossless mapping of both force (mean) and uncertainty (variance) onto low-dimensional functions of atomic positions. Using dimensionality reduction techniques, we increase the efficiency of the MGP to make it applicable to large-scale (million-atom) simulations, achieving speeds comparable to classical analytical force fields, several orders of magnitude faster than available NN or full GP approaches. Quantified principled uncertainty allows us to implement a Bayesian active learning workflow, which automates and accelerates the training process and increases the efficiency of phase-space exploration. As a result, the MGP method benefits from the capability of quantifying its uncertainty, while at the same time retaining cost independent of training size. To illustrate the benefits of this method, we incorporated the MGP force field into the parallel MD simulation code LAMMPS [37], and applied it to simulations of large-scale phase transformation dynamics.

As a test application, we focus on stanene, a 2D material that has recently gathered attention as a quantum spin Hall insulator [47].
Moreover, the only published force field for stanene is, to the best of our knowledge, the bond-order potential by Cherukara et al. [17], which is fitted to capture stanene's low-temperature crystalline characteristics but, as is common to many empirical interatomic potentials, suffers in accuracy near the melting temperature. In particular, we show that MGP is capable of rapidly learning to describe a wide range of temperatures, below and around the melting temperature, and that we can efficiently monitor the uncertainty on forces at each time step of the molecular dynamics, and use this capability to iteratively increase the accuracy of the force field by hierarchical active training. Using parallel simulations of large-scale structures, we characterize the unusual phase transition, where the 2D monolayer transforms to bcc bulk Sn at the temperature of 200 K, and melts to ultimately form a 3D liquid phase at the temperature of 500 K.

The application demonstrates that the MGP force field approaches the computational cost of an empirical analytical force field, retains nearly ab-initio accuracy and quantitative Bayesian uncertainty, and is capable of describing large-scale systems with complex physical processes such as phase transitions.
II. RESULTS

A. Bayesian Active Learning
In an MD simulation, it is likely that the system will evolve to atomic configurations that have not been seen before, and are therefore far from those in the training set. In this situation, the uncertainties of the predictions will grow to large values which may be considered unsatisfactory. Therefore, it is desirable to obtain an accurate estimate of the forces for such configurations using a relatively expensive first-principles calculation, and to add this new information to the training set of the GP regression model.

FIG. 1: Bayesian active learning (BAL) workflow with MGP. Iterations of BAL can be run for different system sizes to refine the force field.

This procedure is referred to as Bayesian active learning (BAL), where training examples are added on the fly as more information is obtained about the problem. Bayesian machine learning models such as GPs are particularly well suited to such uncertainty-based active learning approaches, as they provide a well-defined probability distribution over model outputs which can be used to extract rigorous estimates of model uncertainty for any test point [12, 33].

Here we adopt BAL to achieve automatic training of models for atomic forces, expanding on our earlier workflow [50]. This way the accuracy of the GP model increases with time, particularly in the configuration regions that are less explored and likely to be inaccurate. As mentioned above, GP regression cost scales linearly with the training set size, so the prediction of forces and uncertainties becomes more expensive as the BAL algorithm keeps adding data to the training set, hindering simulations of complex materials systems. The MGP approach is based on an essentially exact mapping of both the mean and variance of the predicted forces from a full GP model, for a fixed training set, onto a low-dimensional spline model. The mathematical formalism is detailed in the Methods section below.
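The uncertainty-triggered data acquisition at the heart of BAL can be illustrated with a self-contained 1D toy, in which a plain GP regressor and a `sin` "oracle" stand in for the force model and DFT, and all hyperparameters are invented: whenever the predictive standard deviation exceeds a threshold somewhere, the oracle is queried there and the model is retrained.

```python
import numpy as np

# 1D toy of uncertainty-driven active learning. The oracle stands in
# for a DFT call; the kernel, threshold, and grid are illustrative only.

def kernel(a, b, sig=1.0, ell=1.0):
    return sig**2 * np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell**2))

def gp_predict(x_test, x_train, y_train, noise=1e-6):
    K = kernel(x_train, x_train) + noise * np.eye(len(x_train))
    ks = kernel(x_test, x_train)                    # kernel vector per point
    mean = ks @ np.linalg.solve(K, y_train)         # posterior mean
    var = kernel(x_test, x_test).diagonal() - np.einsum(
        "ij,ji->i", ks, np.linalg.solve(K, ks.T))   # posterior variance
    return mean, np.sqrt(np.maximum(var, 0.0))

oracle = np.sin                                     # stand-in for DFT
x_train = np.array([0.0])
y_train = oracle(x_train)
grid = np.linspace(0.0, 2 * np.pi, 100)

for _ in range(20):                                 # active-learning loop
    mean, std = gp_predict(grid, x_train, y_train)
    i = int(np.argmax(std))                         # most uncertain point
    if std[i] < 0.05:                               # confident: stop training
        break
    x_train = np.append(x_train, grid[i])           # "call DFT" there
    y_train = np.append(y_train, oracle(grid[i]))

mean, std = gp_predict(grid, x_train, y_train)
```

The loop queries the oracle only where the model is uncertain, so the training set concentrates in the least-explored regions, which is the behavior the workflow of Fig. 1 realizes at the scale of atomic environments.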
The novelty of the MGP approach is the ability to accelerate prediction of not only the mean, but also the GP's internal uncertainty, without loss of accuracy. MGP is incorporated into the BAL workflow as depicted in Fig. 1. In this scheme, the mapping is constructed every time the GP model is updated. Atomic forces and their uncertainties for MD are produced by MGP, reducing the computational cost of the full original GP prediction by orders of magnitude. We stress that not only is the MGP faster than the original GP, but its speedup also becomes more pronounced as the training data set size N_T increases. This is especially critical in complex systems with multiple atomic and molecular species and phases, where more training configurations are needed.

In addition, fast evaluation of forces and their variances enables training the force field model on larger system sizes. Therefore, we can train the MGP force field using a hierarchical scheme, i.e. for different system sizes we can run several iterations of active learning. First, a small system size is used to train an MGP force field model using the BAL workflow. Then, we perform BAL on a larger system size, where DFT calculations are more expensive but are needed less frequently, in order to capture potentially new unexplored configurations, such as defects, that are automatically identified as high uncertainty by the model of the previous step. We note that DFT calculations scale as the 3rd power of the number of electrons in the system, so it is efficient to perform most of the training at smaller sizes, and to only require re-training on fewer large-scale structures. This way the hierarchical training scheme helps to overcome finite-size effects and explore phenomena that cannot be captured with small system sizes, such as phase transformations.

B. Case Study: 2D to 3D Transformation of Stanene
To illustrate our method, we apply the mapped on-the-fly workflow (Fig. 1) to study the phase transformation of stanene, a two-dimensional, slightly buckled honeycomb lattice of tin. This structure has received much attention recently due to its unique topological electronic properties. Stanene can be synthesized by molecular beam epitaxy [56] and has been shown to have topologically protected states in the band gap [18]. Moreover, tri-layer stanene is a superconductor [29]. Here, we focus on the less studied aspects of thermodynamic stability and phase transition mechanisms of free-standing stanene.

Despite the intense interest, the capability is still lacking for accurate modeling of the thermodynamic properties and phase stability of stanene, and of most other 2D materials, for which ∼nm-scale supercells are required. Such simulations are beyond the ability of DFT, and to the best of our knowledge, the only published classical force field for stanene is a bond-order potential (BOP) [17]. Our focus is on developing an accurate potential to model the 2D melting process, which is a notoriously difficult problem to tackle with existing classical force fields. In fact, describing the melting mechanism of a 2D material is a highly non-trivial problem, since it can be characterized by specific behaviors such as defect formation and nucleation. For example, the melting of graphene is thought to occur in a temperature range between 4000 K and 6000 K [19, 31, 54], and seems to be characterized by the formation of defects, while above 6000 K the structure melts directly into a totally disordered configuration. Bulk tin melts at a much lower temperature ( ∼ ).
1. Model Training
To train the force field for stanene, we start by training the MGP using the BAL loop on a small system size, and gradually increase the size of the simulation, iteratively improving upon the force field developed for the smaller size. The DFT settings and the GP model hyperparameter settings are given in Section 2.1 of the supplementary material [53].

FIG. 2: (a) Pristine stanene, top view and front view. (b) Atomic mean square displacement (MSD) of the on-the-fly training trajectory. The solid black line shows the MSD of the 32 atoms, the color blocks indicate time intervals corresponding to different temperatures, and the red dots represent events when DFT calculations are called as the uncertainty predictions exceed the data acquisition decision threshold.

The first iteration of training is started with a small 4 × supercell of 32 atoms. In this small simulation, the stanene monolayer decomposes at 800 K, where Sn atoms start moving around the simulation cell. This allows us to augment the training set with such disordered configurations. However, the small size of the simulation cell does not yet allow us to draw conclusions about the melting process of stanene: for this, we need a larger simulation cell.

We continue with our hierarchical active learning scheme to refine the force field, while retaining the training data from the previous smaller MD run. We now increase the number of atoms in the simulation, which allows us to explore additional atomic configurations and improve the quality of the model. Since the system becomes larger and more computationally expensive, instead of updating the model at each learning step, we update it at the end of each MD simulation. We note that this "batching" approach is one of many possible training schemes and is chosen for reasons of efficiency. The procedure is as follows:

1. Run an MD simulation on a system of a given size with MGP;
2. Predict mapped uncertainties for the frames from the MD trajectory;
3. Select the frames of highest uncertainty in different structures (crystal, transition state and disordered phase) for additional DFT calculations;
4. Add atomic environments from those frames, based on uncertainty, to the training set of the GP;
5. Construct a new MGP model using the updated GP;
6. Repeat the process for the same or a larger simulation using the new model.

FIG. 3: (a) Uncertainty of the force predictions in the whole MD trajectory decreases over three iterations. (b) Mean absolute errors (MAE) of MGP forces against DFT in three iterations, validated on atomic structures of different phases. (c) Upper: a snapshot of 2048 atoms from an MD simulation at 500 K, colored by MGP uncertainty. The outlined area has the highest predicted uncertainty, since it contains a defect and a compact region not known to the first MGP model. Lower left: zoomed-in structure of the region of high uncertainty, colored by atomic height (z-coordinate) to show the bilayer structure of the defect. Lower right: the same snapshot colored by coordination number, which shows a strong correlation with the uncertainty measure.

The above steps form one iteration, and can be run several times to refine the force field. In our force field training for stanene, we performed one iteration with 32 atoms as described before, and two iterations on a system size of 200 atoms at 500 K, a relatively high temperature chosen to explore diverse structures in the configuration phase space. We denote the MGP force field from the on-the-fly training on 32 atoms as MGP-1, and the ones from the two iterations of training on 200 atoms as MGP-2 and MGP-3. MD is simulated with MGP-1, 2, 3, and the predicted uncertainties of the frames from the three MD trajectories are shown in Fig. 3a. First, we note that there is a drastic increase of uncertainty within each trajectory at a certain time. This increase corresponds to the transition from the crystal lattice to a disordered phase, so the uncertainty acts as an automatic indicator of the structural change.
Next, we note that the uncertainties of MGP-1, 2, 3 decrease iteration by iteration, indicating that our model becomes more confident as it collects more data. The mean absolute error (MAE) of force predictions of MGP against DFT is investigated, as presented in Fig. 3b, showing that MGP also becomes more accurate with each subsequent training iteration.

It is worth noting that in the transition process of the simulation by MGP-1, there appear a bilayer stacked defect and a specific compact triangular lattice around the defect. It is a visibly new ordered structure, different from hexagonal stanene, and is thus missing in the initial training set, as shown in Fig. 3c. The uncertainty is low in the honeycomb lattice, as expected, and becomes high around the bilayer defect, exhibiting also a strong correlation with the coordination number. Thus, the mapped uncertainty is an accurate automatic indicator of the novelty of new configurations. We note that after iterative model refinement by the hierarchical training scheme, the triangular lattice no longer appears in the simulation. This example illustrates how the hierarchical active learning scheme helps avoid qualitatively incorrect results that arise due to a priori unknown relevant configurations. In contrast, a single-pass training of a force field based on manually chosen configurations is very likely to result in non-physical predictions.
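Steps 2–4 of the procedure above amount to ranking frames by their worst-atom predicted uncertainty. A minimal sketch, with synthetic arrays standing in for MGP per-atom force standard deviations and `select_uncertain_frames` an invented helper rather than FLARE API:

```python
import numpy as np

# Rank MD frames by their maximum per-atom uncertainty and pick the
# top-k for new DFT calls. The std arrays here are synthetic.

def select_uncertain_frames(frame_stds, k):
    """frame_stds: one (n_atoms,) array of predicted stds per MD frame."""
    scores = np.array([s.max() for s in frame_stds])  # worst atom per frame
    return np.argsort(scores)[::-1][:k]               # top-k frame indices

rng = np.random.default_rng(0)
frame_stds = [rng.uniform(0.0, 0.1, size=200) for _ in range(50)]
frame_stds[7][3] = 0.9     # a "defect" frame the model has not seen
frame_stds[21][0] = 0.5    # a transition-state-like frame

picked = select_uncertain_frames(frame_stds, k=2)
```

Because a single poorly described atom (e.g. at a bilayer defect) is enough to flag a frame, this criterion naturally surfaces the novel local structures discussed above.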
2. Phase Transition Simulations
To investigate the full phase transition mechanism of stanene, we consider low-temperature (below 200 K) and high-temperature (above 500 K) values, corresponding to the crystal and liquid phases of bulk tin. In studying the graphite-diamond transition, Khaliullin et al. [28] identified that nucleation and growth of the disordered phase play key roles in the phase transition process. We also find that in stanene the phase transition begins with the appearance of defects and the consequent nucleation and growth of the disordered phase that eventually transforms the structure.

From ab-initio DFT calculations, the ground-state energy of the honeycomb monolayer is ∼0.38 eV/atom higher than that of the bcc bulk, so the 2D monolayer structure is expected to be metastable. The full transformation described below is therefore a sequence of 2D melting, followed by a kinetically controlled transition to the bulk phase of lower free energy. We set up a simulation of a stanene monolayer of size 15. × . nm. The equilibrium low-temperature phase of bulk tin is the α-phase with the diamond structure, which is different from what we observe here. The reason for the formation of the bcc phase is kinetic rather than thermodynamic: a lower free energy barrier separates the 3D disordered structure from the bcc phase than from the α-phase, making the former more kinetically accessible. One well-known example is the graphite-diamond transition, in which the metastable hexagonal diamond (HD) is obtained from hexagonal graphite (HG) in most laboratory syntheses, instead of the more thermodynamically stable cubic diamond.
FIG. 4: (a) The monolayer-to-bulk transition at 200 K. Using common neighbor analysis, the orange atoms are identified as bcc sites, and atoms of other colors are identified as other sites. The frame at 0.3 ns: the atoms are vibrating around the honeycomb lattice sites with out-of-plane ripples. The frame at 0.8 ns: the atoms arrange in the "disordered" way. The frame at 1.6 ns: some patches of bcc lattice structure appear. The frame at 2 ns: the atoms form and vibrate around the bcc slab sites. (b) Frames from the 500 K NPT simulation with 10,000 atoms (39. × . nm).

C. Mapped Gaussian Process Performance
In this section, we discuss the computational cost of the MGP model in application to large-scale MD simulations, and its advantages and limitations. Timing results are shown in Fig. 5a, comparing different models: DFT, full GP (without mapping; implemented in Python), MGP (with mapped force and uncertainty; Python), MGP-F (MGP with force prediction only; Python), MGP (force only; implemented in LAMMPS) and BOP (LAMMPS force field). For DFT, we tested prediction time for system sizes of 32 atoms and 200 atoms, where the computational cost grows as the cube of the system size. The other algorithms' cost scales linearly with system size, since they rely on local environments with a finite cutoff radius; we measure the same per-atom computational cost for both system sizes.

It is important to highlight the acceleration of the MGP relative to the original full GP. We present the timings of GP and MGP built from different training set sizes (100 and 400 training data points) in Fig. 5, where the GP scales linearly. MGP-F (without uncertainty) and MGP (with uncertainty, fixed rank) are independent of the training set size. The numerical results show that the speedup of the MGP relative to the GP increases linearly as more data are added to the training set, as reported in Fig. S3 of the supplementary material [53]. In the stanene system discussed in Sec. II, MGP is two orders of magnitude faster than GP with O(10 ) training data points when both force and variance are predicted, and the speedup becomes more significant as the GP collects more training data. We also note that although MGP prediction gets rid of the linear scaling with training set size, the construction of the mappings does scale linearly with training set size, since it requires the GP to make a prediction at each grid point.

FIG. 5: (a) Comparison of prediction time (s × processors / atom) between DFT, full GP, MGP (Python, with force and uncertainty), MGP-F (Python, with force only), MGP (LAMMPS), and BOP (LAMMPS).
(b) The scaling of MGP in LAMMPS with respect to different system sizes, in different phases. Green: 2D honeycomb lattice. Blue: 3D dense liquid.

The cost of the MGP force field is comparable to that of the empirical interatomic BOP force field, since both BOP and MGP models consider interatomic distances within the local environment of a central atom and thus have similar computational cost at the same cutoff radius. For accurate simulations, however, the MGP uses a larger cutoff radius (7. Å) and is still slower by a factor of 4 ∼ (see Table I).

TABLE I: System size (atoms): 100 , ; BOP speed (atom · timestep/processor/s): ∼ . × ; MGP speed (atom · timestep/processor/s): ∼ . × .

The MGP with an n-body kernel only uses low-dimensional functions of crystal coordinates. In this work, we have implemented the MGP for 2- and 3-body interactions, which, for the mapping problem, translate to interpolation problems in 3 dimensions. The approach may be extended to higher interaction orders at the expense of computational efficiency, but it becomes expensive in terms of both computation and storage requirements for splines on uniform grids to reach a satisfactory accuracy. Many-body descriptors can be taken into account to increase the descriptive power. To map such a descriptor in the same way as in the methodology we introduced, it must be decomposable into low-dimensional functions. For example, the SOAP approach of Refs. [26, 27] cannot be mapped using the methodology followed in this work, since it relies on a high-dimensional descriptor, and as a result the MGP approach is not readily applicable; in particular, the interpolation procedure may be too computationally expensive to be a viable solution.

III. CONCLUSION
This work resolves a key computational cost scaling limitation of Gaussian process regression of interatomic force fields in the case of many-body kernels and provides a new approach for Bayesian active learning, enabling robust automatic training of high-accuracy interatomic force fields. We present an extended method for mapping the Gaussian process regression model, in which we accurately parameterize both the force and its variance obtained from the original Bayesian model as functions of atomic configurations. The mapped mean forces and their variance are then utilized in an active learning workflow to obtain a machine-learned force field that includes its own uncertainty, the first realization of a fast
Bayesian force field. The resulting model has near-quantum accuracy and computational cost comparable to empirical analytical interatomic potentials. Large-scale simulations are used as a demonstration to investigate the microscopic details of the inter-dimensional transformation of 2D stanene into 3D bulk tin. We discover that the transformation proceeds by nucleation of bilayer defects, densification due to continued disordered multilayer stacking, and finally conversion into either crystalline or molten bulk tin, depending on temperature. This application shows that we can actively monitor the uncertainties of force predictions during molecular dynamics simulations and iteratively improve the model as the material undergoes reorganization, exploring diverse structures in the configuration phase space. The ability to reach simulation sizes of over 1 million atoms is promising for the application of machine learning force fields to study phase diagrams and transformation dynamics of complex systems, while automated active learning of force fields opens a way to accelerate wide-ranging materials design and discovery.
IV. METHODS
The goal of the MGP method is to map the predictive mean and variance of n-body GP models onto low-dimensional splines, making the cost of evaluating atomic forces and uncertainties independent of the training set size, as also studied in previous works [20–22]. As mentioned in Ref. [22], a decomposition of the GP kernel into low-dimensional functions exists for the GP predictive mean and variance. However, the resulting variance functions have twice the dimension of the corresponding mean functions, dramatically increasing the computational cost of their evaluation. In order to make such a mapping feasible in practice, we developed a dimension reduction technique that reduces the variance map to the same dimension as the mean, making it possible to use splines to evaluate both the mean and variance of the GP.

A. Background
1. Gaussian Process Force Field
In this section, we summarize the GP model introduced in Ref. [50]. From here on, we use bold letters to denote matrices/vectors and regular letters for scalars.

An atomic configuration is defined by the atomic coordinates R^α, a vector of dimension equal to the number of atoms N_atoms, where α labels the three Cartesian directions α = x, y, z. While in general the crystal may contain atoms of different species, we here consider only the single-species case for the sake of simplicity. The results can be extended to the multi-component case, which is discussed in Section 1.1 of the supplementary material [53].

The objective of the GP regression model is to predict the forces F^α for a given atomic configuration R^α. To this aim, we associate the force on an atom i, F_{iα}, with the local environment ρ_i around this atom. F_{iα} can then be computed by comparing its environment with a training data set of atomic environments with known forces, as quantified by a distance d between the target environment and the training set.

The atomic environment ρ_i of atom i is defined as a set of atoms, including atom i itself and all other atoms within a sphere of cutoff radius r_cut centered on atom i. We proceed by partitioning this atomic environment ρ_i into a set of n-atom subsets, denoted by ρ_i^(n). For each type of subset, a distance d^(n) can be defined to quantify the difference between environments. We show the construction of ρ_i^(n) and d^(n) for the examples n = 2 and n = 3. For n = 2, ρ_i^(2) is the set of pairs p between the central atom i and another atom j in ρ_i,

    ρ_i^(2) = { p = (i, j) : j ∈ ρ_i, j ≠ i }.    (1)

To each atomic pair p we associate the interatomic distance r_p between the two atoms. The distance metric between two pairs p and q is then defined as (r_p − r_q)², and the distance metric between two atomic environments ρ_i^(2) and ρ_j^(2) is defined as

    (d_ij^(2))² = Σ_{p ∈ ρ_i^(2), q ∈ ρ_j^(2)} (r_p − r_q)².    (2)

This distance d^(2) will be used to determine the two-body contribution to the atomic forces, for the purposes of constructing the kernel function.

This construction can be extended to higher-order interatomic interactions. In this work, we also include three-body interactions, corresponding to n = 3. Similar to the two-body case, the set ρ_i^(3) of triplets p between the central atom i and two other atoms j and k in ρ_i is defined as

    ρ_i^(3) = { p = (i, j, k) : j, k ∈ ρ_i, j ≠ i, k ≠ i, j ≠ k }.    (3)

This time, each triplet p is characterized by a vector r_p = (r_{p,1}, r_{p,2}, r_{p,3}), which contains the three interatomic distances of the atomic triplet p. Including permutations, the distance metric between two triplets p and q can be defined as Σ_{u,v} (r_{p,u} − r_{q,v})², and the 3-body distance metric between two atomic environments ρ_i^(3) and ρ_j^(3) is

    (d_ij^(3))² = Σ_{p ∈ ρ_i^(3), q ∈ ρ_j^(3)} Σ_{u,v} (r_{p,u} − r_{q,v})².    (4)

Having now defined a distance between atomic environments, we can build a GP regression model for atomic forces. We define the equivariant force-force kernel function comparing two atomic environments as

    k_{αβ}(ρ_i, ρ_j) = Σ_n k_{αβ}^(n)(ρ_i^(n), ρ_j^(n)) = Σ_n ∂²k^(n)(ρ_i^(n), ρ_j^(n)) / (∂R_{iα} ∂R_{jβ}),    (5)

where α and β label Cartesian coordinates, and ∂/∂R_{iα} indicates the partial derivative of the kernel with respect to the position of atom i. The n-body energy kernel function k^(n)(ρ_i^(n), ρ_j^(n)) is expressed in terms of the n-atom partitions of the two atomic environments ρ_i and ρ_j, and is defined as

    k^(n)(ρ_i^(n), ρ_j^(n)) = Σ_{p ∈ ρ_i^(n), q ∈ ρ_j^(n)} k̃^(n)(p, q) = Σ_{p ∈ ρ_i^(n), q ∈ ρ_j^(n)} (σ_sig^(n))² exp[ −d^(n)(p, q)² / (2 (l^(n))²) ] φ_cut^(n),    (6)

where k̃^(n)(p, q) is a Gaussian kernel function, built using the definition of distance between n-atom subsets described before.
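A direct transcription of the two-body metric of Eq. (2), with each environment reduced to its array of central-atom pair distances r_p (toy numbers, not the production FLARE code):

```python
import numpy as np

# Two-body environment distance of Eq. (2): sum (r_p - r_q)^2 over all
# cross pairs (p, q) of the two environments' pair-distance lists.

def two_body_distance_sq(r_i, r_j):
    """r_i, r_j: arrays of pair distances r_p for environments rho_i, rho_j."""
    diff = r_i[:, None] - r_j[None, :]   # every (p, q) combination
    return float(np.sum(diff**2))

rho_i = np.array([2.8, 2.8, 4.0])        # pair distances around atom i
rho_k = np.array([2.9, 2.7, 4.1])        # a slightly distorted environment

d_ii = two_body_distance_sq(rho_i, rho_i)
d_ik = two_body_distance_sq(rho_i, rho_k)
```

Note that, as defined, the sum runs over all cross pairs, so the quantity is symmetric but does not vanish for identical environments; it enters the model only through the per-pair Gaussian kernel of Eq. (6).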
The σ^(n)_sig and l^(n) are hyperparameters, independent of p and q, that set the signal variance (related to the maximum uncertainty) and the length scale, respectively. The smooth cutoff function φ^(n)_cut damps the kernel to zero as interatomic distances approach r_cut. In this work, we chose a quadratic form for the cutoff function:

\varphi^{(2)}_{\mathrm{cut}}(p, q) = (r_p - r_{\mathrm{cut}})^2 \,(r_q - r_{\mathrm{cut}})^2, \qquad (7)

\varphi^{(3)}_{\mathrm{cut}}(p, q) = \prod_s (r_{sp} - r_{\mathrm{cut}})^2 \prod_t (r_{tq} - r_{\mathrm{cut}})^2. \qquad (8)

The GP regression model uses the training data as a reference set of N_T atomic environments. In particular, predictions of force values and variances require the 3N_T × 3N_T covariance matrix

\Sigma_{i\alpha, j\beta} = k_{\alpha\beta}(\rho_i, \rho_j) + \sigma_n^2\, \delta_{ij}\, \delta_{\alpha\beta}, \qquad (9)

where σ_n is a noise parameter (the indices iα and jβ are grouped together). We define an auxiliary 3N_T-dimensional vector η with components

\eta_{i\alpha} := (\Sigma^{-1})_{i\alpha, j\beta}\, y_{j\beta}, \qquad (10)

where y contains the training labels, i.e. the forces F_{jβ} acting on the atom at the center of environment j in the Cartesian direction β.

In order to predict forces and their errors for a target environment ρ_i, we compute the kernel vector k̄_{iα} for atom i and direction α (α = x, y, z), whose components are the kernel values (k̄_{iα})_{jβ} := k_{αβ}(ρ_i, ρ_j) between the prediction point iα and the training points jβ, with j = 1, ..., N_T and β = x, y, z. Finally, the predictions of the mean value of the Cartesian component α of the force F_{iα} acting on the atom at the center of the atomic environment ρ_i, and of its variance V_{iα}, are given by [39]:

F_{i\alpha} = \bar{k}^{\top}_{i\alpha}\, \eta, \qquad (11)

V_{i\alpha} = k_{\alpha\alpha}(\rho_i, \rho_i) - \bar{k}^{\top}_{i\alpha}\, \Sigma^{-1}\, \bar{k}_{i\alpha}. \qquad (12)

We notice that this formulation has two major computational bottlenecks, which we address in this work. First, the vector k̄_{iα} needs to be constructed for every force prediction; for each atom, its evaluation requires computing the kernel between the atomic environment and the entire training data set, i.e.
N_T calls to the Gaussian kernel. We discuss an approach to speed up the evaluation of k̄_{iα}. Second, every variance calculation requires a matrix-vector multiplication that is computationally expensive, and is further discussed in Sec. IV C.

B. Mapped Force Field
We start by observing that the GP model constructs a latent "weight field" from a fixed set of training data. In GP regression, a kernel function is used to build weights that depend on the distance from the training data: training data that are similar (close) to the test data contribute with a larger weight, while dissimilar (distant) training data contribute less. To make a prediction on a data point (the force on an atom in its local environment), the GP regression model requires the evaluation of the kernel between that point and all N_T training set points. The result, i.e. the GP mean prediction, is a weighted average of the training labels, with weights given by the distances as measured by the kernel function. In Eq. 11, we notice that the predictive mean requires the evaluation of a 3N_T-dimensional vector k̄_{iα}, which quantitatively weighs the contributions to the test atomic environment ρ_i coming from the training atomic environments ρ_j.

In the following discussion we consider, without loss of generality, a specific subset size n, capturing n-body atomic structure information, i.e. (k̄^(n)_{iα})_{jβ} = k^(n)_{αβ}(ρ^(n)_i, ρ^(n)_j). For a kernel function of the form of Eq. 5, the same treatment can be applied to each n, and the final prediction is obtained by summing over the mappings of all n.

Starting from the idea introduced in Ref. [21], we can construct the weight field by exploiting the special form of the kernel in Eq. 6. This kernel definition allows us to decompose the force prediction into contributions from n-atom subsets, and the variance into contributions from pairs of n-atom subsets. To see this, we first note that the kernel vector of an atomic environment ρ_i can be decomposed into the kernels of all the n-atom subsets in ρ_i, and that the kernel of an n-atom subset p can be further decomposed into contributions from the n − 1 bonds connecting the other atoms of the subset to the central atom i.
We denote by r_{sp} := R_s − R_i a bond in p, where s labels the remaining n − 1 atoms of the subset, with r̂_{sp} := r_{sp}/r_{sp} denoting the direction of the bond. The force kernel then has the decomposition

\bar{k}^{(n)}_{i\alpha} = \sum_{p \in \rho^{(n)}_i} \tilde{k}^{(n)}_{i\alpha}(p) = \sum_{p} \sum_{s} \mu^{(n)}_{s}(p)\, \hat{r}_{sp,\alpha}, \qquad (13)

where

\big[ \tilde{k}^{(n)}_{i\alpha}(p) \big]_{j\beta} := \sum_{q \in \rho^{(n)}_j} \frac{\partial^2 \tilde{k}^{(n)}(p, q)}{\partial R_{i\alpha}\, \partial R_{j\beta}} \qquad (14)

and

\big[ \mu^{(n)}_{s}(p) \big]_{j\beta} := \sum_{q \in \rho^{(n)}_j} \sum_{t} \frac{\partial^2 \tilde{k}^{(n)}(p, q)}{\partial r_{sp}\, \partial r_{tq}}\, \hat{r}_{tq,\beta} \qquad (15)

are vectors in R^{3N_T} space. In the first equality of Eq. 13 we used the decomposition of Eq. 6; in the second equality we first convert the derivative from atomic coordinates to the relative distances of the n-atom subset, and then take advantage of the fact that the kernel depends only on the relative distances between atoms, i.e. r_{sp} = ‖r_{sp}‖. Therefore, forces may be written as

F^{(n)}_{i\alpha} = \sum_{p \in \rho^{(n)}_i} \sum_{s} f^{(n)}_{s}(p)\, \hat{r}_{sp,\alpha}, \qquad (16)

where

f^{(n)}_{s}(p) := \mu^{(n)\top}_{s}(p)\, \eta \qquad (17)

is the scalar "weight field" function and η is defined in Eq. 10.

As noted in Ref. [21], we can achieve better scaling by constructing the weight field with a computationally efficient parametric method, i.e. an approximate way of constructing the functions f. A parametric function can quickly yield the weight-field value without evaluating kernel distances from all the GP training points. As a result, the cost of evaluating the weight field with this parametric model is independent of the size of the training set, removing a computational bottleneck of the full GP implementation. In order to achieve an effective parametrization of these weight functions, we use cubic spline interpolation on a uniform grid of points, which benefits from having an analytic form and a continuous second derivative.
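The spline mapping can be sketched in one dimension (the 2-body case), with a smooth stand-in function in place of the GP-derived weight field f^(2)_s; grid bounds and sizes here are illustrative:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Mock 2-body weight field f(r). In the real method this would be the GP
# prediction f_s^(n)(x_k) = mu_s^(n)(x_k)^T eta at each grid point; here a
# smooth analytic stand-in is used for illustration.
def f_exact(r):
    return np.exp(-r) * np.sin(2.0 * r)

a, b = 2.0, 7.2                       # grid bounds: below min distance, up to r_cut
grid = np.linspace(a, b, 64)          # uniform grid of points x_k
f_hat = CubicSpline(grid, f_exact(grid))  # cubic spline through the grid values

# The mapped prediction evaluates the cheap spline instead of the full GP sum:
r_test = 3.456
err = abs(float(f_hat(r_test)) - f_exact(r_test))
```

Because cubic-spline error shrinks rapidly with grid spacing, increasing the grid size drives the mapped prediction arbitrarily close to the exact weight field, at constant evaluation cost.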
By increasing the number of points in the uniform grid, the MGP predictions of mean and variance can be made arbitrarily close to the original functions.

The construction of the mapped force field includes the following steps:

Step 1: Define a uniform grid of points {x_k ∈ [a, b]^{n(n−1)/2}} in the space of distances given by the vectors r_p, where n(n−1)/2 is the number of interatomic distances of an n-atom subset. The upper bound b can be set to the cutoff radius of the atomic neighborhoods, above which the prediction vanishes (using the zero prior of the GP).

Step 2: A GP model with a fixed training set yields the values {f^(n)_s(x_k)} for each grid point x_k.

Step 3: Interpolate with a cubic spline function f̂^(n)_s through the points {(x_k, f^(n)_s(x_k))}.

In prediction, the contributions from all n-bodies are added up, so the force on the local environment ρ is predicted as

\hat{F}_{\alpha}(\rho) = \sum_{n} \sum_{p \in \rho^{(n)}} \sum_{s} \hat{f}^{(n)}_{s}(p)\, \hat{r}_{sp,\alpha}.

C. Mapped Variance Field

Using a derivation similar to that of the force decomposition (Eq. 16), we derive the decomposition of the variance

V_{i\alpha} = \sum_{n, n'} \sum_{p \in \rho^{(n)},\ q \in \rho^{(n')}} \sum_{s, t} v^{(n, n')}_{s,t}(p, q)\, \hat{r}_{sp,\alpha}\, \hat{r}_{tq,\alpha}, \qquad (18)

where

v^{(n, n')}_{s,t}(p, q) := \delta_{n n'}\, \tilde{k}^{(n)}_{\alpha\alpha}(p, q) - \mu^{(n)}_{s}(p)^{\top}\, \Sigma^{-1}\, \mu^{(n')}_{t}(q), \qquad (19)

with the covariance matrix Σ defined in Eq. 9. We note that the domain of the variance function v has twice the dimensionality of that of f, so that when the interaction order n >
3, the number of grid points becomes prohibitively large to map v efficiently. Even for a 3-body kernel, where v is a 6-D function, we find that the evaluation of the variance, both in terms of computational time and memory footprint, limits its usage to simple benchmarks.

An efficient evaluation of the variance is critical for adopting Bayesian active learning. We now discuss a key result of this work and introduce an accurate yet efficient approximate mapping field for the variance. In particular, we focus on simplifying the vector μ^(n)_s(p), which, from our tests, is the most computationally expensive term in the variance prediction: since μ^(n)_s(p) evaluates the kernel function between the test point p and all the training points, its cost scales linearly with the training set size. By implementing a mapping of the variance weight field, the cost of variance prediction depends only on the rank of our dimension-reduction approach, and is independent of the training set size.

To this aim, we first define the vector ψ^(n)_s(p) := L^{-1} μ^(n)_s(p) ∈ R^{3N_T}, where L is the Cholesky factor of Σ, i.e. Σ = L L^⊤, so that Eq. 19 can be written as

v^{(n, n')}_{s,t}(p, q) = \delta_{n n'}\, \tilde{k}^{(n)}_{\alpha\alpha}(p, q) - \psi^{(n)}_{s}(p)^{\top}\, \psi^{(n')}_{t}(q). \qquad (20)

The domain of the vector function ψ^(n)_s(p) has half the dimensionality of that of v^{(n,n')}_{s,t}(p, q). As a first idea, since each component of ψ^(n) has a domain of lower dimensionality, one may think of building 3N_T spline functions to interpolate the 3N_T components separately; however, the number of spline functions needed would then grow with the training set size. To achieve a more efficient mapping of ψ^(n)_s, we take advantage of principal component analysis (PCA), a common dimensionality-reduction method.

The construction of the mapped variance field includes the following steps:

Step 1: We start by evaluating the function values on the uniform grid {x_k ∈ [a, b]^{n(n−1)/2}, k = 1, ..., G} introduced in the section above. At each grid point x_k, we evaluate the vectors ψ^(n)_s(x_k) and use them to build the matrix

M^{(n)}_{s} := \big( \psi^{(n)}_{s}(x_1), \psi^{(n)}_{s}(x_2), \ldots, \psi^{(n)}_{s}(x_G) \big)^{\top} \in \mathbb{R}^{G \times 3N_T}.

Step 2: In PCA we perform a singular value decomposition (SVD) on M^(n)_s, such that M^(n)_s = U^(n)_s Λ^(n)_s (V^(n)_s)^⊤, where U^(n)_s ∈ R^{G×3N_T} and V^(n)_s ∈ R^{3N_T×3N_T} have orthogonal columns, and Λ^(n)_s ∈ R^{3N_T×3N_T} is diagonal.

Step 3: We reduce the dimensionality by keeping the m largest singular values of Λ^(n)_s and their corresponding columns of U^(n)_s and V^(n)_s (here we assume the singular values are ordered). We then readily obtain the low-rank approximation

M^{(n)}_{s} \approx \bar{M}^{(n)}_{s} = \bar{U}^{(n)}_{s}\, \bar{\Lambda}^{(n)}_{s}\, (\bar{V}^{(n)}_{s})^{\top}, \qquad (21)

with Λ̄^(n)_s ∈ R^{m×m}, Ū^(n)_s ∈ R^{G×m}, and V̄^(n)_s ∈ R^{3N_T×m}.

Step 4: Define φ^(n)_s := (V̄^(n)_s)^⊤ ψ^(n)_s, a vector in R^m, and interpolate this vector between its values at the grid points {x_k} (explicitly, φ^(n)_{s,i}(x_k) = Ū^(n)_{s,ki} Λ̄^(n)_{s,ii} for i = 1, ..., m). As done previously for the force mapping, spline interpolation can now be used to approximate the φ^(n)_s vector as well.

Therefore, the MGP prediction of the variance is given by

\hat{v}^{(n, n')}_{s,t}(p, q) := \delta_{n n'}\, \tilde{k}^{(n)}_{\alpha\alpha}(p, q) - \phi^{(n)}_{s}(p)^{\top}\, (\bar{V}^{(n)}_{s})^{\top}\, \bar{V}^{(n')}_{t}\, \phi^{(n')}_{t}(q). \qquad (22)

In this form, the variance would still require a multiplication between two matrices of size m × 3N_T. However, we note that the computational cost of the total variance can be reduced by summing the contributions before taking the squared norm:

\hat{V}_{\alpha}(\rho) = k_{\alpha\alpha}(\rho, \rho) - \Big\| \sum_{n} \sum_{p \in \rho^{(n)}} \sum_{s} \bar{V}^{(n)}_{s}\, \phi^{(n)}_{s}(p)\, \hat{r}_{sp,\alpha} \Big\|^2. \qquad (23)

Nominally, the calculation of the second term in the variance still scales with the data set size N_T, since the size of V̄^(n) depends on N_T.
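Steps 1 through 4 amount to a truncated SVD. A minimal numpy sketch, with a mock ψ-matrix whose shape and rank are chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
G, N_T, m = 200, 50, 5        # grid points, training-set size, PCA rank

# Mock matrix M of psi vectors on the grid (Step 1); built here with an
# artificial rank-m structure so that the truncation is exact.
M = rng.standard_normal((G, m)) @ rng.standard_normal((m, N_T))

# Step 2: singular value decomposition, M = U diag(S) V^T
U, S, Vt = np.linalg.svd(M, full_matrices=False)

# Step 3: keep the m largest singular values for the low-rank approximation
M_bar = (U[:, :m] * S[:m]) @ Vt[:m, :]

# Step 4: phi = V_bar^T psi; on the grid, phi_i(x_k) = U[k, i] * S[i],
# one m-vector per grid point, ready for spline interpolation.
phi_grid = U[:, :m] * S[:m]   # shape (G, m)

rel_err = np.linalg.norm(M - M_bar) / np.linalg.norm(M)
```

For a matrix that is only approximately low-rank, `rel_err` is controlled by the discarded singular values, which is exactly the accuracy/efficiency trade-off tuned through the rank m.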
However, we find the cost of this vector-matrix multiplication to be negligible compared to the evaluation of the self-kernel function (the first term in Eq. 22). We have therefore formulated a way to significantly speed up the calculation of the variance, which now depends only weakly on the training set size. In addition, by tuning the PCA rank m, we can balance accuracy against efficiency, increasing the flexibility of this model.

V. ACKNOWLEDGEMENT
We thank Steven Torrisi and Simon Batzner for help with the development of the FLARE code. We thank Anders Johansson for the development of the KOKKOS GPU acceleration of the MGP LAMMPS pair style. We thank Dr. Nicola Molinari, Dr. Aldo Glielmo, Dr. Claudio Zeni and Dr. Sebastian Matera for useful discussions. Y.X. is supported by the US Department of Energy (DOE) Office of Basic Energy Sciences under Award No. DE-SC0020128. L.S. was supported by the Integrated Mesoscale Architectures for Sustainable Catalysis (IMASC), an Energy Frontier Research Center funded by the US Department of Energy (DOE) Office of Basic Energy Sciences under Award No. DE-SC0012573. A.C. is supported by the Harvard Quantum Initiative. J.V. is supported by Robert Bosch LLC and the National Science Foundation (NSF), Office of Advanced Cyberinfrastructure, Award No. 2003725.
VI. COMPETING INTERESTS
The authors declare no competing financial or non-financial interests.
VII. CODE AVAILABILITY
The algorithm for the MGP and mapped Bayesian active learning has been implemented in the open-source package FLARE [50] and is publicly available online [1].
VIII. DATA AVAILABILITY
The code used to generate the results in this paper, and the related data, are published in the Materials Cloud Archive [46] with DOI: 10.24435/materialscloud:cs-tf [52].
IX. AUTHOR CONTRIBUTIONS
Y.X. developed the MGP variance method. J.V. implemented the FLARE Bayesian active learning framework, and Y.X. implemented MGP in the training workflow. L.S. and Y.X. developed the LAMMPS pair style for MGP. Y.X. trained the MGP models for stanene and performed molecular dynamics and analysis. A.C. contributed to the DFT calculations. B.K. initiated and supervised the work and contributed to algorithm development. Y.X. and B.K. worked on the manuscript, with contributions from all authors.

[1] https://github.com/mir-group/flare.
[2] Albert P Bartók and Gábor Csányi. Gaussian approximation potentials: A brief tutorial introduction. International Journal of Quantum Chemistry, 115(16):1051–1057, 2015.
[3] Albert P Bartók, James Kermode, Noam Bernstein, and Gábor Csányi. Machine learning a general-purpose interatomic potential for silicon. Physical Review X, 8(4):041048, 2018.
[4] Albert P Bartók, Risi Kondor, and Gábor Csányi. On representing chemical environments. Physical Review B, 87(18):184115, 2013.
[5] Albert P Bartók, Mike C Payne, Risi Kondor, and Gábor Csányi. Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons. Physical Review Letters, 104(13):136403, 2010.
[6] MI Baskes. Application of the embedded-atom method to covalent materials: a semiempirical potential for silicon. Physical Review Letters, 59(23):2666, 1987.
[7] Jörg Behler. Atom-centered symmetry functions for constructing high-dimensional neural network potentials. The Journal of Chemical Physics, 134(7):074106, 2011.
[8] Jörg Behler. Neural network potential-energy surfaces in chemistry: a tool for large-scale simulations. Physical Chemistry Chemical Physics, 13(40):17930–17955, 2011.
[9] Jörg Behler. Constructing high-dimensional neural network potentials: A tutorial review. International Journal of Quantum Chemistry, 115(16):1032–1050, 2015.
[10] Jörg Behler. Perspective: Machine learning potentials for atomistic simulations. The Journal of Chemical Physics, 145(17):170901, 2016.
[11] Jörg Behler and Michele Parrinello. Generalized neural-network representation of high-dimensional potential-energy surfaces. Physical Review Letters, 98(14):146401, 2007.
[12] Christopher M Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[13] Viktor F Britun, Alexander V Kurdyumov, and Igor A Petrusha. Diffusionless nucleation of lonsdaleite and diamond in hexagonal graphite under static compression. Powder Metallurgy and Metal Ceramics, 43(1-2):87–93, 2004.
[14] FP Bundy. Direct conversion of graphite to diamond in static pressure apparatus. The Journal of Chemical Physics, 38(3):631–643, 1963.
[15] FP Bundy, WA Bassett, MS Weathers, RJ Hemley, HU Mao, and AF Goncharov. The pressure-temperature phase and transformation diagram for carbon; updated through 1994. Carbon, 34(2):141–153, 1996.
[16] Kimberly Chenoweth, Adri CT Van Duin, and William A Goddard. ReaxFF reactive force field for molecular dynamics simulations of hydrocarbon oxidation. The Journal of Physical Chemistry A, 112(5):1040–1053, 2008.
[17] Mathew J Cherukara, Badri Narayanan, Alper Kinaci, Kiran Sasikumar, Stephen K Gray, Maria KY Chan, and Subramanian KRS Sankaranarayanan. Ab initio-based bond order potential to investigate low thermal conductivity of stanene nanostructures.
The Journal of Physical Chemistry Letters, 7(19):3752–3759, 2016.
[18] Jialiang Deng, Bingyu Xia, Xiaochuan Ma, Haoqi Chen, Huan Shan, Xiaofang Zhai, Bin Li, Aidi Zhao, Yong Xu, Wenhui Duan, et al. Epitaxial growth of ultraflat stanene with topological band inversion. Nature Materials, 17(12):1081, 2018.
[19] Eric Ganz, Ariel B Ganz, Li-Ming Yang, and Matthew Dornfeld. The initial stages of melting of graphene between 4000 K and 6000 K. Physical Chemistry Chemical Physics, 19(5):3756–
n-body force fields using Gaussian process regression. arXiv preprint arXiv:1905.07626, 2019.
[23] Jiequn Han, Linfeng Zhang, Roberto Car, et al. Deep potential: A general representation of a many-body potential energy surface. arXiv preprint arXiv:1707.01478, 2017.
[24] Max Hodapp and Alexander Shapeev. In operando active learning of interatomic interaction during large-scale simulations. arXiv preprint arXiv:2004.13158, 2020.
[25] Tetsuo Irifune, Ayako Kurio, Shizue Sakamoto, Toru Inoue, and Hitoshi Sumiya. Correction: ultrahard polycrystalline diamond from graphite. Nature, 421(6925):806–806, 2003.
[26] Ryosuke Jinnouchi, Ferenc Karsai, and Georg Kresse. On-the-fly machine learning force field generation: Application to melting points. arXiv preprint arXiv:1904.12961, 2019.
[27] Ryosuke Jinnouchi, Jonathan Lahnsteiner, Ferenc Karsai, Georg Kresse, and Menno Bokdam. Phase transitions of hybrid perovskites simulated by machine-learning force fields trained on the fly with Bayesian inference. Physical Review Letters, 122(22):225701, 2019.
[28] Rustam Z Khaliullin, Hagai Eshet, Thomas D Kühne, Jörg Behler, and Michele Parrinello. Nucleation mechanism for the direct graphite-to-diamond phase transition. Nature Materials, 10(9):693–697, 2011.
[29] Menghan Liao, Yunyi Zang, Zhaoyong Guan, Haiwei Li, Yan Gong, Kejing Zhu, Xiao-Peng Hu, Ding Zhang, Yong Xu, Ya-Yu Wang, et al. Superconductivity in few-layer stanene. Nature Physics, 14(4):344, 2018.
[30] L Lindsay and DA Broido.
Optimized Tersoff and Brenner empirical potential parameters for lattice dynamics and phonon thermal transport in carbon nanotubes and graphene. Physical Review B, 81(20):205441, 2010.
[31] J. H. Los, K. V. Zakharchenko, M. I. Katsnelson, and Annalisa Fasolino. Melting temperature of graphene. Physical Review B, 91:045415, Jan 2015.
[32] Jonathan P Mailoa, Mordechai Kornbluth, Simon Batzner, Georgy Samsonidze, Stephen T Lam, Jonathan Vandermause, Chris Ablitt, Nicola Molinari, and Boris Kozinsky. A fast neural network approach for direct covariant forces prediction in complex multi-element extended systems. Nature Machine Intelligence, 1(10):471–479, 2019.
[33] Kevin P Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
[34] Felix Musil, Michael J Willatt, Mikhail A Langovoy, and Michele Ceriotti. Fast and accurate uncertainty estimation in chemical machine learning. Journal of Chemical Theory and Computation, 15(2):906–915, 2019.
[35] Hiroaki Ohfuji and Kiyoshi Kuroki. Origin of unique microstructures in nano-polycrystalline diamond synthesized by direct conversion of graphite at static high pressure. Journal of Mineralogical and Petrological Sciences, 104(5):307–312, 2009.
[36] Andrew A Peterson, Rune Christensen, and Alireza Khorshidi. Addressing uncertainty in atomistic machine learning. Physical Chemistry Chemical Physics, 19(18):10978–10985, 2017.
[37] Steve Plimpton. Fast parallel algorithms for short-range molecular dynamics. Journal of Computational Physics, 117(1):1–19, 1995.
[38] Evgeny V Podryabinkin and Alexander V Shapeev. Active learning of linearly parametrized interatomic potentials. Computational Materials Science, 140:171–180, 2017.
[39] Carl Edward Rasmussen. Gaussian processes in machine learning. In Summer School on Machine Learning, pages 63–71. Springer, 2003.
[40] Kristof Schütt, Pieter-Jan Kindermans, Huziel Enoc Sauceda Felix, Stefan Chmiela, Alexandre Tkatchenko, and Klaus-Robert Müller.
SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. In Advances in Neural Information Processing Systems, pages 991–1001, 2017.
[41] Kristof T Schütt, Huziel E Sauceda, P-J Kindermans, Alexandre Tkatchenko, and K-R Müller. SchNet: a deep learning architecture for molecules and materials. The Journal of Chemical Physics, 148(24):241722, 2018.
[42] KT Schütt, Pan Kessel, Michael Gastegger, KA Nicoli, Alexandre Tkatchenko, and K-R Müller. SchNetPack: A deep learning toolbox for atomistic systems. Journal of Chemical Theory and Computation, 15(1):448–455, 2018.
[43] Alexander V Shapeev. Moment tensor potentials: A class of systematically improvable interatomic potentials. Multiscale Modeling & Simulation, 14(3):1153–1173, 2016.
[44] Justin S Smith, Olexandr Isayev, and Adrian E Roitberg. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chemical Science, 8(4):3192–
https://archive.materialscloud.org/record/2020.99.
[53] Yu Xie, Jonathan Vandermause, Lixin Sun, Andrea Cepellotti, and Boris Kozinsky. Supplementary material.
[54] Kostyantin V Zakharchenko, Annalisa Fasolino, JH Los, and MI Katsnelson. Melting of graphene: from two to one dimension. Journal of Physics: Condensed Matter, 23(20):202202, 2011.
[55] Linfeng Zhang, Jiequn Han, Han Wang, Roberto Car, and E Weinan. Deep potential molecular dynamics: a scalable model with the accuracy of quantum mechanics. Physical Review Letters, 120(14):143001, 2018.
[56] Feng-feng Zhu, Wei-jiong Chen, Yong Xu, Chun-lei Gao, Dan-dan Guan, Can-hua Liu, Dong Qian, Shou-Cheng Zhang, and Jin-feng Jia. Epitaxial growth of two-dimensional stanene. Nature Materials, 14(10):1020, 2015.

Fast Bayesian Force Fields from Active Learning: Study of Inter-Dimensional Transformation of Stanene

Yu Xie, Jonathan Vandermause,
Lixin Sun, Andrea Cepellotti, and Boris Kozinsky
John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA
Department of Physics, Harvard University, Cambridge, MA 02138, USA
Supplementary Materials
1. Methodology
1.1. Multi-Component Systems
2. Case Study: Stanene
2.1. Training Settings
2.2. Accuracy Validation
2.3. Phase Transition Process
2.3.1. Radial Distribution Function
2.3.2. Video of the Transition Process
3. Performance
3.1. MGP's Acceleration of Original GP
References

1. METHODOLOGY

1.1. Multi-Component Systems

In the main text, we derived the mathematical formulation in the single-component context. We note that our GP and MGP extend naturally to multiple components by differentiating combinations of species. As described in the main text, the n-body kernel compares two n-atom subsets p ∈ ρ^(n)_i, q ∈ ρ^(n)_j. When there are multiple species in the system, we define the kernel function k̃^(n)(p, q) to be non-zero only when p and q have the same species on their corresponding atoms. More rigorously, denoting by s_p the sequence of species of the atoms in p, we define the multi-component kernel as

\tilde{k}^{(n)}(p, q) = \delta_{s_p, s_q}\, \exp\left( -\frac{d^{(n)}(p, q)^2}{2\, \lambda^{(n)2}} \right) \varphi^{(n)}_{\mathrm{cut}}(p, q).

Therefore, the mapped force and variance extend to the multi-component system as follows: for each possible s_p, we construct f^(n)(p) and φ^(n)(p) in the same way as for the single-component system described in the main text. In prediction, the force and variance contributions from the n-atom subsets of the test atomic environment are predicted by the spline functions corresponding to their species.
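The species-matching factor δ_{s_p, s_q} can be sketched for the 2-body case as follows. This is a toy illustration only; the pair representation, function name, and hyperparameter values are ours, not the FLARE API:

```python
import numpy as np

def k2_multicomponent(p, q, sig=1.0, lam=1.0, r_cut=7.2):
    """2-body kernel for multiple species: non-zero only when the species
    sequences of the two pairs match (the delta_{s_p, s_q} factor).
    Pairs are (distance, (center species, neighbor species)); default
    hyperparameter values are placeholders."""
    r_p, species_p = p
    r_q, species_q = q
    if species_p != species_q:
        return 0.0  # delta factor: different species sequences do not compare
    gauss = sig**2 * np.exp(-(r_p - r_q) ** 2 / (2.0 * lam**2))
    phi_cut = (r_p - r_cut) ** 2 * (r_q - r_cut) ** 2
    return gauss * phi_cut

# A Sn-Sn pair only correlates with another Sn-Sn pair:
same = k2_multicomponent((3.0, ("Sn", "Sn")), (3.1, ("Sn", "Sn")))
cross = k2_multicomponent((3.0, ("Sn", "Sn")), (3.1, ("Sn", "O")))
```

In practice this means a separate spline model is mapped for every species sequence s_p, and prediction dispatches each subset to the spline of its own species combination.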
2. CASE STUDY: STANENE

2.1. Training Settings
We start by constructing a 4 × × supercell of the stanene monolayer, with lattice constant a = 4. Å and buckling height d = 0. Å. A vacuum spacing of c = 20 Å is used to make sure that the stanene monolayer is well isolated, so that the periodic boundary condition in the z-direction has little influence on the result. We perform MD with the MGP force field using a time step of 1 fs and the velocity Verlet algorithm to integrate Newton's equations of motion, in the NVE ensemble.

We use Quantum ESPRESSO [6], a plane-wave implementation of DFT, with an ultrasoft pseudopotential and the PBEsol [4] exchange-correlation functional for Sn. The k-point mesh is 4 × ×
1, and the energy cutoff is 50 Ry, with a charge-density cutoff of 400 Ry.

Python is used in all our code except for DFT. FLARE [10] is our base framework for Bayesian active learning with GP, and the newly developed MGP is inserted into the workflow. The radial cutoffs for local environments are chosen as 7.2 Å. The initial and optimized hyperparameters, namely the signal variance σ_sig, the length scale l, and the noise σ_n, are shown in the table below; they are optimized to maximize the marginal likelihood during training, using the BFGS algorithm [2, 5, 7, 9]. More details on the hyperparameter optimization are explained in our earlier work [10].

Kernel   Cutoff (Å)   Initial σ_sig (eV/Å)   Initial l (Å)   Initial σ_n (eV/Å)   Optimized σ_sig (eV/Å)   Optimized l (Å)   Optimized σ_n (eV/Å)
2-body   7.2          0.2                    1.0             0.05                 3.27e-2                  0.518             0.140
3-body   7.2          1.0e-4                 1.0             0.05                 3.75e-5                  0.691             0.140

TABLE I: GP hyperparameters

For the MGP, the hyperparameters are shown in Table SII. To interpolate a function, we need to choose a closed interval and determine the grid points within it. The number of grid points for the mapped spline models is chosen based on the trade-off between accuracy and efficiency of the interpolation. For the 2-body term, the interval is chosen as [a, b], where a should be smaller than the minimal interatomic distance encountered in the MD simulation, and b is set equal to the 2-body cutoff of the GP. For the 3-body term, we choose a "cube" of intervals, one per dimension. Since the 3-body term describes triplets, i.e. the triangle formed by three atoms, which is determined by the two bonds connected to the central atom, r_1 and r_2, and the angle a = ⟨r̂_1, r̂_2⟩ between them, we use (r_1, r_2, a) as grid coordinates to build the interpolation function. The intervals for r_1 and r_2 are chosen as for the 2-body term, except that the upper bound is changed to the 3-body cutoff of the GP.
For the angle, the lower and upper bounds are fixed to 0 and π; they do not vary between systems and are not chosen by users.

The 1-D and 3-D cubic spline functions come from the GitHub repository interpolation.py [1].

         Grid number        Lower bound              Upper bound
2-body   144                2.35 Å                   7.2 Å
3-body   128 × 128 × 128    2.35 Å, 2.35 Å, 0 rad    7.2 Å, 7.2 Å, π rad

TABLE II: MGP hyperparameters
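The grids of Table II can be built in a few lines. Here SciPy's regular-grid interpolator stands in for the cubic splines of interpolation.py, and the weight-field values on the grid are a mock analytic function; the 3-body grid sizes are reduced for illustration:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# 2-body: a 1-D grid from the lower bound to the cutoff (bounds from Table II)
r_grid = np.linspace(2.35, 7.2, 144)

# 3-body: a (r1, r2, angle) "cube"; per-dimension sizes here are illustrative
r1 = np.linspace(2.35, 7.2, 32)
r2 = np.linspace(2.35, 7.2, 32)
ang = np.linspace(0.0, np.pi, 32)

# Mock weight-field values on the grid; in the actual workflow these are GP
# predictions, interpolated with the cubic splines of interpolation.py [1].
R1, R2, A = np.meshgrid(r1, r2, ang, indexing="ij")
values = np.exp(-(R1 + R2)) * np.cos(A)
f3 = RegularGridInterpolator((r1, r2, ang), values)

# Query one triplet configuration (two bond lengths in Angstrom, angle in rad)
val = float(f3([[3.0, 3.5, 1.0]])[0])
```

Once built, evaluating the interpolant costs the same regardless of how much training data went into the GP, which is the source of the MGP's constant prediction cost.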
2.2. Accuracy Validation

After obtaining the refined MGP model from the above iterative training process, it is necessary to validate the accuracy of the force field and of the uncertainty prediction.
FIG. 1: MGP predictions of forces and uncertainties versus GP

First, because the MGP is an approximation to the original GP, we must verify that the MGP force and uncertainty predictions match those of the GP. In Fig. S1, we show that the mean absolute errors (MAE) of the force and of the uncertainty are both of order 10− eV/Å, and we can therefore conclude that the MGP suffers a negligible loss of accuracy compared to the GP.

Next, we compare the performance of the MGP against the BOP force field of Ref. [3]. In particular, we validate results using frames of different atomic structures taken from an independent MD trajectory, and study the mean absolute error of the forces with respect to the DFT prediction. The MGP force field is trained with the PBEsol exchange-correlation functional and an ultrasoft pseudopotential in Quantum ESPRESSO (labelled DFT "I"), while the BOP potential is trained with the PBE functional and a PAW pseudopotential in VASP (labelled DFT "II"); we therefore compare the mean absolute error of the force prediction against both DFT "I" and "II" as ground truth. In Table SIII we show that the MGP achieves better accuracy than the BOP in all of these frames, by at least 54%. As the temperature is increased, and as the deviation from the crystal phase increases, both force fields have larger MAE. This is because the system at high temperature explores many more configurations, likely different from the training set; nevertheless, the results show that the MGP has uniformly better accuracy.

        Crystal          Transition       Molten
DFT     I       II       I       II       I       II
BOP     0.254   0.241    0.322   0.376    0.258   0.305
MGP     0.082   0.120    0.148   0.206    0.110   0.139
TABLE III: Mean absolute error (MAE) of the force prediction compared to DFT, in units of eV/Å.
2.3. Phase Transition Process

2.3.1. Radial Distribution Function

The radial distribution function (RDF) of the slab configuration from the 200 K MD simulation is shown in Fig. 2 (top), where the black dashed line is the RDF of the perfect bcc lattice of tin [8]. The radial distribution function of the 600 ps frame in the melting-process simulation is shown in Fig. 2 (bottom), which shows one peak at ∼ Å.

2.3.2. Video of the Transition Process

The videos can be downloaded from DOI: 10.24435/materialscloud:cs-tf [11].

FIG. 2: Radial distribution function (RDF) of the 3D slab and the liquid
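The RDF analysis above can be reproduced with a minimal sketch. This is a simple minimum-image implementation for a cubic periodic box, written for illustration; the analysis tooling actually used in the paper may differ:

```python
import numpy as np

def radial_distribution(positions, box, r_max, nbins=50):
    """Radial distribution function g(r) for atoms in a periodic cubic box of
    side `box`, using the minimum-image convention."""
    pos = np.asarray(positions, dtype=float)
    n = len(pos)
    rho = n / box**3                       # number density
    edges = np.linspace(0.0, r_max, nbins + 1)
    hist = np.zeros(nbins)
    for i in range(n):
        d = pos - pos[i]
        d -= box * np.round(d / box)       # minimum-image displacement
        r = np.linalg.norm(d, axis=1)
        r = r[(r > 1e-9) & (r < r_max)]    # drop the self-distance
        hist += np.histogram(r, bins=edges)[0]
    shells = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, hist / (n * rho * shells)

# Two atoms separated by 3 Angstrom in a 20 Angstrom box:
# g(r) has a single nonzero bin at r = 3.
r, g = radial_distribution([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]],
                           box=20.0, r_max=5.0)
```

Averaging g(r) over trajectory frames, and comparing against the bcc reference RDF, distinguishes the crystalline slab from the liquid as in Fig. 2.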
3. PERFORMANCE

3.1. MGP's Acceleration of Original GP
Since the original GP is a non-parametric method, it depends entirely on the training data set, and its prediction cost therefore scales linearly with the training set size. Once the force field is trained and mapped, the MGP is obtained as a parametric model, whose basis is a set of cubic spline functions; it can be employed independently of the original GP regression model and of the training data set, and thus reaches a constant computational cost per prediction.

The major computational bottleneck in the prediction of the current scheme lies in two parts. One is the construction of the local atomic environment, which requires calculating interatomic distances and identifying all n-atom subsets. The other is the calculation of the first term in the variance formulation, the self-kernel k_αα(ρ, ρ), which scales quadratically with the number of n-atom subsets in ρ, while the second term, the vectorized
spline function φ, scales linearly. We note that the situation is different in the original GP, where the construction of the atomic environment and the evaluation of the self-kernel are negligible compared to the evaluation of the kernel vector. In this way, the present mapping scheme for both forces and uncertainties eliminates the main computational bottleneck of the full GP model.

FIG. 3: Speed-up of the MGP compared to the GP, which grows linearly with the training set size. Prediction of both force and uncertainty (blue line) is accelerated by 2 orders of magnitude when the GP has O(10²) training data, while force prediction without uncertainty is 3 orders of magnitude faster.

[1] https://github.com/EconForge/interpolation.py.
[2] Charles G Broyden. The convergence of a class of double-rank minimization algorithms: 2. the new algorithm. IMA Journal of Applied Mathematics, 6(3):222–231, 1970.
[3] Mathew J Cherukara, Badri Narayanan, Alper Kinaci, Kiran Sasikumar, Stephen K Gray, Maria KY Chan, and Subramanian KRS Sankaranarayanan. Ab initio-based bond order potential to investigate low thermal conductivity of stanene nanostructures.
The Journal of Physical Chemistry Letters, 7(19):3752–3759, 2016.
[4] Gábor I Csonka, John P Perdew, Adrienn Ruzsinszky, Pier HT Philipsen, Sébastien Lebègue, Joachim Paier, Oleg A Vydrov, and János G Ángyán. Assessing the performance of recent density functionals for bulk solids. Physical Review B, 79(15):155107, 2009.
[5] Roger Fletcher. A new approach to variable metric algorithms.
The Computer Journal, 13(3):317–322, 1970.
[6] Paolo Giannozzi, Stefano Baroni, Nicola Bonini, Matteo Calandra, Roberto Car, Carlo Cavazzoni, Davide Ceresoli, Guido L Chiarotti, Matteo Cococcioni, Ismaila Dabo, et al. Quantum espresso: a modular and open-source software project for quantum simulations of materials.
Journal of Physics: Condensed Matter, 21(39):395502, 2009.
[7] Donald Goldfarb. A family of variable-metric methods derived by variational means.
Mathematics of Computation, 24(109):23–26, 1970.
[8] Materials Project. Crystal structure of bcc Sn (mp-7162). https://materialsproject.org/materials/mp-7162.
[9] David F Shanno. Conditioning of quasi-Newton methods for function minimization.
Mathematics of Computation, 24(111):647–656, 1970.
[10] Jonathan Vandermause, Steven B Torrisi, Simon Batzner, Alexie M Kolpak, and Boris Kozinsky. On-the-fly Bayesian active learning of interpretable force-fields for atomistic rare events.
Preprint at https://arxiv.org/abs/1904.02042, 2019.
[11] Yu Xie, Jonathan Vandermause, Lixin Sun, Andrea Cepellotti, and Boris Kozinsky. Fast Bayesian force fields from active learning and mapped Gaussian processes: Application to structural phase transition of stanene. https://archive.materialscloud.org/record/2020.99.