Maximum volume simplex method for automatic selection and classification of atomic environments and environment descriptor compression
Behnam Parsaeifard, Daniele Tomerini, Deb Sankar De, Stefan Goedecker
Department of Physics, Universitaet Basel, Klingelbergstrasse 82, 4056 Basel, Switzerland
National Center for Computational Design and Discovery of Novel Materials (MARVEL), Switzerland
(Dated: September 15, 2020)

Fingerprint distances, which measure the similarity of atomic environments, are commonly calculated from atomic environment fingerprint vectors. In this work we present the simplex method, which can perform the inverse operation, i.e. calculating fingerprint vectors from fingerprint distances. The fingerprint vectors found in this way point to the corners of a simplex. For a large data set of fingerprints, we can find a particular largest volume simplex, whose dimension gives the effective dimension of the fingerprint vector space. We show that the corners of this simplex correspond to landmark environments that can be used in a fully automatic way to analyse structures. In this way we can for instance detect atoms in grain boundaries or on edges of carbon flakes without any human input about the expected environment. By projecting fingerprints on the largest volume simplex we can also obtain fingerprint vectors that are considerably shorter than the original ones but whose information content is not significantly reduced.
I. INTRODUCTION
Materials science has become to a large extent a data-driven science. Several data banks exist that contain not only structural data but calculated properties as well; many exceed hundreds of thousands of structural properties, and their number is growing dramatically [1–4]. Molecular dynamics simulations typically also generate very large data sets. Such large data sets can no longer be inspected by eye, and tools for classifying the structures in an automatic way are needed. Atomic environments can be described quantitatively by descriptors called "atomic environment fingerprints" [5–9], which can also provide a description of entire crystalline structures [10]. Atomic environment fingerprints are also used as inputs for supervised machine learning schemes [11–13] of potential energy surfaces. For such a use it is desirable that the fingerprint is able to detect any difference in the environment [14] while keeping the fingerprint vector as short as possible.

One of our goals will be the detection of grain boundaries, which are the disordered regions between one or two ordered phases. Grain boundaries have an important influence on the physical properties of a system, including strength, conductivity, ductility, and crack resistance, to name but a few [15–20].

Several methods have been proposed in the literature to distinguish between certain reference crystalline structures and disordered, mainly liquid, structures in melting and nucleation simulations, such as Steinhardt parameters [21] and common neighbour analysis (CNA) [22]. These methods have also been used to study dislocations, local ordering and grain boundaries [23–27]. One disadvantage of these methods is that they are based on a sharp cutoff and therefore lack smoothness with respect to the particle displacements that occur in MD or during relaxations. As its name suggests, in the adaptive common neighbour analysis [28] the cutoff is adapted to the environment of each atom. Although more robust than CNA, it remains sensitive to thermal vibrations. Different predefined crystalline structures can be distinguished by polyhedral template matching [29]. SOAP [5] fingerprints coupled to machine learning methods were recently also used to predict properties of grain boundaries [30]. Based on a formula for the entropy of a system interacting only via pairwise forces, an atomic entropy can be obtained which allows one to distinguish between liquid, FCC, BCC and HCP crystalline phases [31].

The common characteristic of all existing methods is that they require some human input about the relevant structures that are expected to be encountered in the simulation. This is in contrast to our method, which selects all the relevant structures fully automatically from a large pool of structures. The method is also applicable without any adjustments to any molecular system.
II. THE LARGEST VOLUME SIMPLEX METHOD

A. Fingerprints and fingerprint distances
In this section we provide a short review of the overlap matrix fingerprint method, which we use to describe the local atomic environment. A complete description can be found in the original paper detailing the method [10].

In order to calculate the overlap matrix (OM) fingerprint for an atom $k$ in a structure, we take into account the relative positions of all the neighbours of that atom within a cutoff sphere (centered on atom $k$) of radius $R_c$. For a periodic system, the neighbours include all the relevant periodic images of an atom when the central atom lies at the edge of a repeating unit. To each of these atoms we associate a minimal set of normalized atom-centered Gaussians $G_\nu(\mathbf{r} - \mathbf{R}_i)$, centered on the atom itself. The width of each Gaussian is given by the covalent radius of the atom on which it is centered. For carbon, with its strong directional bonding, we have used a set of s- and p-type orbitals ($\nu = s, p_x, p_y, p_z$) and denote the resulting fingerprint by OM[sp]; for aluminum, with its metallic bonding, we have used only $\nu = s$ and denote the fingerprint by OM[s]. We then calculate the overlap between the Gaussian functions in the sphere:

\[ S^k_{i,\nu,j,\mu} = \int G_\nu(\mathbf{r} - \mathbf{R}_i)\, G_\mu(\mathbf{r} - \mathbf{R}_j)\, d\mathbf{r} \tag{1} \]

Next, the overlap matrix $S^k_{i,\nu,j,\mu}$ is multiplied by the amplitude functions $f_c(|\mathbf{R}_k - \mathbf{R}_i|)$ and $f_c(|\mathbf{R}_k - \mathbf{R}_j|)$ to obtain a modified overlap matrix $\tilde{S}$:

\[ \tilde{S}^k_{i,\nu,j,\mu} = f_c(|\mathbf{R}_k - \mathbf{R}_i|)\, S^k_{i,\nu,j,\mu}\, f_c(|\mathbf{R}_k - \mathbf{R}_j|) \tag{2} \]

Here $f_c(r)$ is a cutoff function of $r$ and $w$ that vanishes at and beyond $r = 2w = R_c$ with two continuous derivatives. The parameter $w$ gives the length scale over which $f_c(r)$ drops to zero, and we typically choose it so that about 50 atoms are contained within the cutoff radius $R_c$. The matrix whose columns are labelled by the composite index $i,\nu$ and whose rows are labelled by the composite index $j,\mu$ is then diagonalized to obtain the eigenvalues. Finally, the vector $V_k$ containing all the eigenvalues of the matrix $\tilde{S}^k_{i,\nu,j,\mu}$ is the fingerprint of atom $k$. It has a length $L = 4N_{sphere}$ for OM[sp] and $L = N_{sphere}$ for OM[s], where $N_{sphere}$ is the number of atoms in the sphere around the central atom.

By construction the fingerprint is robust against displacements of atoms across the boundary of the cutoff sphere, and therefore it is possible to calculate derivatives of the fingerprints with respect to infinitesimal structural changes around the atom $k$. The fingerprint vectors $V_k$ characterize the atomic environments around atom $k$, and the fingerprint distance $d_{i,j}$ is a measure of the dissimilarity between two environments $i$ and $j$. Throughout this study the fingerprint distance is obtained from the Euclidean norm of the difference vector:

\[ d_{i,j} = |V_i - V_j| \tag{3} \]

B. Obtaining fingerprint vectors from fingerprint distances
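Before inverting Eq. 3, it may help to see the forward direction end to end. The following is a minimal sketch of an OM[s]-type fingerprint and of the distance of Eq. 3, not the authors' implementation. Several ingredients are assumptions made for illustration only: the function names, the cutoff form $(1 - r^2/4w^2)^3$ (chosen because it vanishes at $r = 2w$ with two continuous derivatives), the convention that the Gaussian width $\sigma$ enters as $\exp(-r^2/2\sigma^2)$, the omission of periodic images, and the zero-padding of fingerprints of unequal length.

```python
import numpy as np

def cutoff(r, w):
    """Assumed cutoff (1 - r^2 / (4 w^2))^3 for r < 2w, zero beyond (illustrative form)."""
    x = 1.0 - (np.asarray(r, dtype=float) / (2.0 * w)) ** 2
    return np.clip(x, 0.0, None) ** 3

def om_s_fingerprint(positions, k, sigma, w):
    """Sorted eigenvalues of the cutoff-weighted s-type overlap matrix around atom k.

    positions: (n_atoms, 3) Cartesian coordinates (periodic images are not handled).
    sigma:     (n_atoms,) Gaussian widths, e.g. covalent radii.
    w:         cutoff length scale, so that R_c = 2 w.
    """
    positions = np.asarray(positions, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    r_k = np.linalg.norm(positions - positions[k], axis=1)
    inside = r_k < 2.0 * w                       # atoms inside the cutoff sphere
    pos, sig, f = positions[inside], sigma[inside], cutoff(r_k[inside], w)

    # Overlap of two normalized s-type Gaussians with exponents alpha_i, alpha_j (Eq. 1).
    alpha = 1.0 / (2.0 * sig ** 2)
    a_sum = alpha[:, None] + alpha[None, :]
    a_prod = alpha[:, None] * alpha[None, :]
    r2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=-1)
    S = (2.0 * np.sqrt(a_prod) / a_sum) ** 1.5 * np.exp(-a_prod / a_sum * r2)

    S_tilde = f[:, None] * S * f[None, :]        # Eq. 2
    return np.sort(np.linalg.eigvalsh(S_tilde))[::-1]

def fingerprint_distance(v_i, v_j):
    """Euclidean distance of Eq. 3; shorter vectors are zero-padded (an assumption)."""
    n = max(len(v_i), len(v_j))
    a = np.pad(v_i, (0, n - len(v_i)))
    b = np.pad(v_j, (0, n - len(v_j)))
    return float(np.linalg.norm(a - b))
```

Because only eigenvalues of a matrix built from interatomic distances are used, the resulting vector is unchanged by rotations and translations of the structure, which is the property the fingerprint distance relies on.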
Eq. 3 gives a trivial recipe to obtain fingerprint distances $d_{i,j}$ from a set of points represented by the fingerprint vectors in a space of dimension $L$. In the following we derive the formulas for the inverse operation. Given a set of pairwise fingerprint distances $d_{i,j}$, we want to construct a set of points $x_i$ that reproduce these distances. The solution of this problem is not unique. It can however be made unique by requiring that the first point be the origin, $x_0 = 0$, and that for each consecutive point the number of nonzero components increases by one. Hence the points $x_i$ have the following structure:

\[ (x_1, x_2, \ldots, x_N) = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,N} \\ 0 & x_{2,2} & \cdots & x_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_{N,N} \end{pmatrix} \tag{4} \]

So after placing the first point at the origin, the next point lies on the positive x-axis at the right distance, the following one in the xy plane (y > 0), and so on. The components of the points $x_i$ can be obtained recursively from simple relations between the distances among the vectors $V_i$.

The distance between $x_N$ and the origin $x_0$ is simply given by the norm of the vector:

\[ d_{0,N}^2 = \sum_{i=1}^{N} x_{i,N}^2 \tag{5} \]

For $M < N$, the difference between columns $N$ and $M$ is related to the distance between the points $x_N$ and $x_M$ as

\[ d_{M,N}^2 = \sum_{i=1}^{M} (x_{i,N} - x_{i,M})^2 + \sum_{j=M+1}^{N} x_{j,N}^2 \tag{6} \]

By taking the difference between $d_{M,N}^2$ and $d_{0,N}^2$ we obtain a simplified set of equations:

\[ d_{M,N}^2 - d_{0,N}^2 = \sum_{i=1}^{M} \left[ (x_{i,N} - x_{i,M})^2 - x_{i,N}^2 \right] = \sum_{i=1}^{M} -x_{i,M} \left( 2 x_{i,N} - x_{i,M} \right) \tag{7} \]

In Eq. 7, the unknowns $x_{i,N}$ depend only on the elements $x_{j,M}$ of other columns with $M < N$. For the first points this gives

\[ x_{1,1} = d_{0,1} \tag{8} \]

\[ x_{1,2} = \frac{d_{0,1}^2 + d_{0,2}^2 - d_{1,2}^2}{2 x_{1,1}} \tag{9} \]

\[ x_{2,2} = \sqrt{d_{0,2}^2 - x_{1,2}^2} \tag{10} \]

We can write for $M < N$ in general:

\[ x_{M,N} = \frac{d_{0,M}^2 + d_{0,N}^2 - d_{M,N}^2 - 2 \sum_{i=1}^{M-1} x_{i,M} x_{i,N}}{2 x_{M,M}} \tag{11} \]

and for $M = N$ we have:

\[ x_{N,N} = \sqrt{d_{0,N}^2 - \sum_{i=1}^{N-1} x_{i,N}^2} \tag{12} \]

The geometrical body having the points calculated above as corners is an $N$-dimensional simplex with volume $x_{1,1} x_{2,2} \cdots x_{N,N} / N!$. The above construction can be done for any set of $N_{env}(N_{env} - 1)/2$ distances as long as the original $V_i$ giving rise to the distances via Eq. 3 are linearly independent. Since the number of environments $N_{env}$ is typically much larger than the length $L$ of the fingerprint vectors, at most $L$ points (including the origin in the count) can be obtained. If the number of linearly independent fingerprint vectors is less than $L$, $x_{i,i}$ will become zero for some $i < L$ and it is then not possible to increase the dimension of the simplex further. In the context of our fingerprints, it turns out that the $x_{i,i}$ typically are not exactly zero but become very small, which means that all the fingerprint vectors are essentially contained in a subspace whose dimension is smaller than $L$. The component that is orthogonal to this subspace is then very small and can be neglected. This is the basic property that will be exploited for the fingerprint compression later in the paper.

C. Construction of the largest volume simplex
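The recursion of Eqs. 8 to 12 from the previous subsection, on which the construction in this subsection relies, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function names `points_from_distances` and `simplex_volume` are hypothetical, the input is a full symmetric distance matrix whose row 0 belongs to the origin, and a point that would not increase the dimension raises an error instead of being skipped.

```python
import math
import numpy as np

def points_from_distances(D):
    """Return the upper-triangular matrix X of Eq. 4 from a distance matrix D.

    D has shape (N + 1, N + 1); index 0 is the point placed at the origin.
    Column n - 1 of X holds the coordinates of point x_n, i.e. X[m - 1, n - 1] = x_{m,n}.
    """
    D2 = np.asarray(D, dtype=float) ** 2
    n_pts = D2.shape[0] - 1
    X = np.zeros((n_pts, n_pts))
    for n in range(1, n_pts + 1):
        for m in range(1, n):
            # Eq. 11: off-diagonal component x_{m,n} from already known columns.
            num = 0.5 * (D2[0, m] + D2[0, n] - D2[m, n]) \
                - X[:m - 1, m - 1] @ X[:m - 1, n - 1]
            X[m - 1, n - 1] = num / X[m - 1, m - 1]
        # Eq. 12: the new diagonal component is the height above the previous points.
        h2 = D2[0, n] - X[:n - 1, n - 1] @ X[:n - 1, n - 1]
        if h2 <= 0.0:
            raise ValueError(f"point {n} does not increase the dimension")
        X[n - 1, n - 1] = math.sqrt(h2)
    return X

def simplex_volume(X):
    """Volume x_{1,1} x_{2,2} ... x_{N,N} / N! of the simplex spanned by the columns of X."""
    return float(np.prod(np.diag(X))) / math.factorial(X.shape[0])

# A 3-4-5 right triangle: distances between the origin and two further points.
D = np.array([[0.0, 4.0, 3.0],
              [4.0, 0.0, 5.0],
              [3.0, 5.0, 0.0]])
X = points_from_distances(D)
print(X)                  # coordinates of x_1 and x_2 as columns
print(simplex_volume(X))  # area of the triangle
```

For the triangle above, Eq. 9 gives $x_{1,2} = (16 + 9 - 25)/8 = 0$ and Eq. 10 gives $x_{2,2} = 3$, so the simplex volume is $4 \cdot 3 / 2! = 6$, the area of the triangle.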
Now we describe how the construction outlined above can be used to obtain the largest volume simplex, which we will simply call the largest simplex (LS). We do this because we are interested in finding the effective dimension $l$ of the space spanned by the fingerprints, which gives the number of highly distinctive landmark environments, together with these environments themselves. We start by identifying the two environments separated by the largest distance. This defines the origin $x_0$ and the first point along the x-axis, $x_1$, and in this way the first two corners of the simplex, which is at this stage just a line. To enlarge the dimension of the simplex by one in the next step, we search for the environment that gives the largest-area triangle if the point $x_2$ corresponding to this environment is used as the third corner. We then increase the dimension of the simplex step by step, choosing the new corner in each step such that the volume of the new simplex is maximal. The procedure is stopped if in a certain step $l$ the volume collapses to a very small value because the additional fingerprint vectors are quasi linearly dependent on the previous ones. In this way an effective dimension $l$ of the entire fingerprint space can be determined.

Once this maximum volume simplex is constructed, we can express other fingerprint vectors in the basis of the vectors $x_i$ spanning the LS. To get the expansion coefficients, we perform the same steps of Eqs. 8 to 12 that would be needed to add a corner to the simplex. In this case, however, we already know that $x_{l+1,l+1}$ from Eq. 12 will be negligible, because we stopped the maximum volume simplex construction precisely because no point gave a large $x_{l+1,l+1}$.

III. APPLICATIONS
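Before turning to the applications, the greedy construction described in section II C can be sketched as follows. Since the simplex volume is the product of the diagonal elements $x_{n,n}$ divided by a factorial, maximizing the volume at each step amounts to picking the environment with the largest new height from Eq. 12. The sketch is hedged: the function name `largest_simplex`, the relative stopping tolerance `rel_tol` and the use of a full in-memory distance matrix are assumptions made for illustration, not details given in the text.

```python
import numpy as np

def largest_simplex(V, rel_tol=1e-6, max_corners=None):
    """Greedy largest-volume simplex for fingerprints V of shape (n_env, L).

    Returns (corners, coords): the indices of the corner environments (origin first)
    and the simplex coordinates of every environment, shape (n_env, l).
    """
    V = np.asarray(V, dtype=float)
    n = V.shape[0]
    D2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)  # squared distances (Eq. 3)

    # The two environments with the largest distance give the origin and the first corner.
    origin, far = np.unravel_index(np.argmax(D2), D2.shape)
    scale = np.sqrt(D2[origin, far])
    d0 = D2[origin]                       # squared distances to the origin
    corners = [int(origin)]
    coords = np.zeros((n, 0))
    resid = d0.copy()                     # squared height above the current simplex
    limit = n if max_corners is None else max_corners

    while len(corners) < limit:
        nxt = int(np.argmax(resid))       # largest height = largest new volume
        h = np.sqrt(resid[nxt])           # Eq. 12 for the candidate corner
        if h <= rel_tol * scale:          # volume collapses: stop
            break
        # Eq. 11 for all environments at once; for the new corner it reproduces h.
        num = 0.5 * (d0[nxt] + d0 - D2[nxt]) - coords @ coords[nxt]
        coords = np.column_stack([coords, num / h])
        corners.append(nxt)
        resid = np.maximum(d0 - (coords ** 2).sum(axis=1), 0.0)
    return corners, coords
```

The number of columns of `coords` plays the role of the effective dimension $l$, and each row is the expansion of one fingerprint in the basis of the simplex, so the same array can serve as a shortened fingerprint.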
In this section, we show some applications of the LS. In section III A we apply the methodology to a variety of C$_{60}$ molecules, to identify the most distinct environments and to group the most similar ones. In section III B we use the method to find the grain boundaries in a nanocrystalline Al material. In section III C we exploit the LS to reduce the dimension of the fingerprint and compare its performance with the CUR decomposition method [32].

A. C$_{60}$ clusters

The first system we study consists of 5000 C$_{60}$ structures, i.e. $3 \times 10^5$ atomic environments, that exhibit several structural motifs including sheets, chains, and cages. These structures were generated by minima hopping [33] runs coupled to DFTB [34]. Our aim is to identify the most distinct atomic environments as well as to classify the environments. We use OM[sp] with a cutoff radius of $R_c = 2w = 6$ Å and follow the approach described in section II to generate the LS with $N = 60$.

In Fig. 2 we show the first twenty corners of the LS, which represent twenty highly distinct landmark environments in the data set. In agreement with basic chemical intuition, the first two corners, representing the two most different chemical environments, are a four-fold coordinated atom and a carbon atom at the end of a linear chain with only one nearest neighbor, as shown in Fig. 2b and Fig. 2a. Other two-fold coordinated atoms in chains are also represented by higher-order corners of the LS, as shown in Fig. 2f, 2q, 2r, and 2c. In Fig. 2c the reference atom is part of a chain, but the chain points inside the cage. This shows that our method can distinguish between chains that point inward or outward, since it is not based solely on the nearest neighbours of an atom but on its general environment. The fourth corner of the simplex is an atom with one nearest neighbor near a hole in the C$_{60}$, shown in Fig. 2d. Other corners of the simplex also clearly represent truly different environments. For instance, the 8th corner of the LS, shown in Fig. 2h, is an atom in a graphite flake, and the 16th corner of the LS, shown in Fig. 2p, is an atom in a fragmented part. Our data set contains only a few fragmented structures, which are of the type shown in Fig. 2p, and the LS correctly recognizes them as highly distinct environments.

Next, we employ the corners of the LS to analyse structures. Since each corner represents a highly distinct landmark environment, we can assume that each fingerprint that has a small distance to one of these corners represents an environment that is similar to the corresponding landmark environment. So, we assign each atomic environment to its closest corner if the fingerprint distance is less than a threshold value $\delta$, which we take to be 0.5. With this criterion, we calculate the number of environments that belong to each class, as shown in Fig. 1. The environments that do not belong to any corner of the LS, because their fingerprint distance to their closest corner is larger than $\delta$, are shown by the blue bar in Fig. 1. Since the first corner is at the origin, the corner numbering in Fig. 1 starts at zero.

The energetic minimum of the C$_{60}$ molecule is the fullerene. In this structural motif, the atomic environments of all the carbon atoms are equivalent. This is no longer true if the fullerene has a so-called Stone-Wales defect [35]. In the following we look at such a structure as well as a 60-atom graphite flake and categorize the atoms according to their fingerprint distance to the landmark environments, i.e. the corners of the LS. None of the atomic environments of these two structures is actually a landmark environment of the simplex. For the visualization, we assign a color to each corner of the simplex. All the atomic environments in the data that have a short fingerprint distance to this corner are then shown in this color.

Our method automatically classifies the atoms of the structure shown in Fig. 3a into three types, and we can easily verify by visual inspection that these three classes are in agreement with chemical intuition: an atom is surrounded by two pentagons and one hexagon (corner 47, shown in Fig. 3b), by one pentagon and two hexagons (corner 38, shown in Fig. 3c), or by three hexagons (corner 23, shown in Fig. 3d). As can be seen from Fig. 1, a large number of atomic environments in our data set are similar to these corners.

Another example is shown in Fig. 4. Each atom of the structure in Fig. 4a is similar to one of 6 different corners of the simplex. These are shown in Fig. 4b, c, d, e, f, and g. So groups of environments that have a short distance to a landmark environment do indeed share similar chemical environments.

B. Grain boundary networks in nanocrystalline Al
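The assignment rule of section III A, which is reused in this section for the Al system, can be sketched as follows. It is a minimal illustration under stated assumptions: the function name `assign_to_corners` is hypothetical, the corner indices are taken as given (for example from a previous largest-simplex construction), and environments farther than $\delta$ from every corner receive the label -1.

```python
import numpy as np

def assign_to_corners(V, corner_ids, delta=0.5):
    """Assign each fingerprint to its closest LS corner if closer than delta.

    V:          (n_env, L) fingerprint vectors.
    corner_ids: indices into V of the LS corners (position 0 is the origin corner).
    Returns (labels, counts): labels[i] is the position of the closest corner in
    corner_ids, or -1 if no corner lies within delta; counts[c] is the number of
    environments assigned to corner c.
    """
    V = np.asarray(V, dtype=float)
    C = V[np.asarray(corner_ids)]
    # Fingerprint distances (Eq. 3) between every environment and every corner.
    d = np.linalg.norm(V[:, None, :] - C[None, :, :], axis=-1)
    nearest = d.argmin(axis=1)
    labels = np.where(d.min(axis=1) < delta, nearest, -1)
    counts = np.bincount(labels[labels >= 0], minlength=len(corner_ids))
    return labels, counts

V = np.array([[0.0, 0.0], [1.0, 0.0], [0.1, 0.0], [0.9, 0.2], [5.0, 5.0]])
labels, counts = assign_to_corners(V, corner_ids=[0, 1], delta=0.5)
print(labels)   # corner position per environment, -1 for unassigned
print(counts)   # histogram over corners, as plotted in a bar chart like Fig. 1
```

Mapping each label to a fixed color then gives the kind of per-atom coloring used for the visualizations; the unassigned label corresponds to the separate bar in the histogram.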
In our second application, we study a nanocrystalline Al aggregate with 255064 atoms containing grain boundary networks. Details on the generation of the nanocrystalline Al used here can be found elsewhere [31]. We use the OM[s] fingerprint with a cutoff radius of $R_c = 5$ Å to build the LS. We take $N = 46$, which is the same as the length of the fingerprint. Having generated the LS, we assign a different color to each of the corners of the simplex for the following visualizations. These corners are the most distinct environments in the nanocrystalline Al, i.e. each corner can represent a class of diverse environments in the data. We again categorize the atoms in the system according to their similarity to the corners of the LS and assign them the same color as the corner they resemble most. Visual inspection of Fig. 5 shows that the simplex method can find all the grain boundary networks, in agreement with the findings of Piaggi and Parrinello [31]. In addition, it can also recognize differences between different grain boundaries and find different kinds of ordered-disordered phases, as shown in Fig. 6.

In Fig. 6 we show the first 20 corners of the LS. Fig. 6a shows a perfect crystalline FCC phase. Figs. 6c and 6r show defective crystalline FCC phases where one nearest neighbor of the central atom is missing. The corners shown in Figs. 6e, 6n, 6p, and 6s correspond to atoms on a twisted grain boundary. The configurations in Figs. 6b, 6d, 6h, 6l, and 6t represent environments located on the boundary between ordered and disordered phases. Finally, some corners of the LS represent atoms in disordered phases, such as those shown in Figs. 6i and 6j.

Figure 1: The number of atomic environments in the data set of C$_{60}$ structures which are similar to one of the corners of the LS. The blue bar represents environments which are not similar to any corner based on the threshold value $\delta = 0.5$. (Axes: distribution in arb. u. versus corner number.)

C. The compression of the fingerprints
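The projection that this section builds on, i.e. the steps of Eqs. 8 to 12 applied to a fingerprint that is not itself a corner, can be sketched as follows. This is an illustrative sketch only: the helper names `expansion` and `corner_coordinates` are hypothetical, and the corner fingerprints are assumed to be given with the origin corner first.

```python
import numpy as np

def _sq(a, b):
    """Squared fingerprint distance between two vectors (square of Eq. 3)."""
    return float(((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2).sum())

def expansion(v, corners, X, n_known):
    """Coefficients of v with respect to the first n_known simplex directions.

    corners: (l + 1, L) corner fingerprints, corners[0] being the origin.
    X:       (l, l) upper-triangular corner coordinates, X[m - 1, n - 1] = x_{m,n}.
    Returns (x, h): the coefficients from Eq. 11 and the height h left over by Eq. 12.
    """
    d0 = _sq(v, corners[0])
    x = np.zeros(n_known)
    for m in range(1, n_known + 1):
        num = 0.5 * (_sq(corners[m], corners[0]) + d0 - _sq(corners[m], v)) \
            - X[:m - 1, m - 1] @ x[:m - 1]
        x[m - 1] = num / X[m - 1, m - 1]
    h = np.sqrt(max(d0 - x @ x, 0.0))
    return x, h

def corner_coordinates(corners):
    """Build the matrix of Eq. 4 by adding the corners one after the other."""
    l = len(corners) - 1
    X = np.zeros((l, l))
    for n in range(1, l + 1):
        x, h = expansion(corners[n], corners, X, n - 1)
        X[:n - 1, n - 1] = x
        X[n - 1, n - 1] = h      # for a genuine corner the leftover height is x_{n,n}
    return X

corners = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
X = corner_coordinates(corners)
x, h = expansion(np.array([1.0, 1.0, 0.001]), corners, X, n_known=2)
print(x)  # compressed fingerprint of length l = 2
print(h)  # small component orthogonal to the simplex, neglected in the compression
```

In this toy case the three-component vector is reduced to two coefficients, and the discarded height $h$ is the part of the fingerprint that lies outside the space spanned by the simplex.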
In section II we showed that once the LS is found, the original fingerprints can be projected onto the LS. In this section we show that these projections can be regarded as a new fingerprint that is much shorter than the original fingerprint while containing most of its information. This is an example of data compression, a problem for which many algorithms are available, such as the CUR decomposition [32].

Let $F$ be the fingerprint matrix of dimension $L \times N'$, where $L$ is the length of the fingerprint and $N' = N_{env}$ is the number of atomic environments, i.e. the $i$-th column of $F$ contains the fingerprint vector of atomic environment $i$. One can then write $F \approx CUR$, in which $C$ and $R$ contain $k$ selected columns and rows of $F$ and $U = C^+ F R^+$, where $A^+$ denotes the pseudo-inverse of $A$ and $k < r = \mathrm{rank}(F)$. In order to select the rows of $F$, one writes its singular value decomposition as $F = \bar{U} D V^T$, where $\bar{U}$ (left singular matrix) and $V$ (right singular matrix) are $L \times L$ and $N' \times N'$ unitary matrices and $D$ is an $L \times N'$ rectangular diagonal matrix with non-negative real numbers on the diagonal. The diagonal entries of $D$ are known as the singular values of $F$. The leverage score of each row $i$ is then calculated as $\pi_i = \frac{1}{k} \sum_{\xi=1}^{k} (u_i^\xi)^2$, where $u_i^\xi$ is the $i$-th component of the $\xi$-th left singular vector and $k$ is the number of rows that should be selected. Frequently, rows are selected with probability proportional to the leverage score. We employed a deterministic method [37, 38] and select the row with the highest leverage score each time. The selected row is then removed from the matrix and the remaining rows are orthogonalized with respect to it. This procedure is repeated to select further rows. The selected rows are the most important features. One can also select columns of the matrix $F$, i.e. the most important atomic environments, by following the same procedure for $F^T$. The selected rows and columns are stored in $R$ and $C$ respectively.

In the following, we employ the LS and the CUR method to reduce the length of the fingerprint by selecting the components of the fingerprint that contain the most important information.

In order to investigate whether the compressed fingerprint conserves the information encoded in the original fingerprint, we correlate all the pairwise fingerprint distances obtained with the original and the compressed fingerprints [14]. Fingerprint distances that are large with the original fingerprint should remain large with the compressed fingerprint. In the same way, short distances should remain short. If this is the case, all the points in a correlation plot between the fingerprint distances arising from the original and the compressed fingerprint will lie on or close to the diagonal. If there are points far away from the diagonal, and in particular if some fingerprint distances of the compressed fingerprint are small whereas the original distances are large, there is a loss of information.

In Fig. 7 we show the correlation plot between the original fingerprints and the CUR-reduced and LS-reduced fingerprints using OM and SOAP [5] for our above-mentioned test of 1000 C$_{60}$ clusters with $6 \times 10^4$ atomic environments. We used the same fingerprint parameters for OM as in section III A. For SOAP, we used the following parameters: $l_{max} = n_{max} = 8$, $r_\delta$ = 4. Å, $\sigma$ = 0. Å. The cutoff radius is the same in both OM and SOAP. The software QUIP [39] is used to generate the SOAP fingerprints. The length $L$ of the original fingerprints is 240 for OM and 325 for SOAP. We reduced the length of the fingerprints to $l = 16$. As can be seen in Fig. 7, the correlation is perfectly diagonal in the case of the LS, which indicates that the vast majority of the information of the original fingerprint is retained in the LS-reduced fingerprint. There are however some deviations from the diagonal in the correlation plot between the original fingerprint and the CUR-reduced fingerprint, which indicates that some information is lost in the CUR decomposition.

Figure 2: The first twenty corners of the LS, i.e. the twenty most distinct atomic environments, in panels (a) to (t). The corners are shown in a different color than the rest of the atoms. The relative sizes and colors of the atoms are chosen for visualization purposes only and have no physical meaning.

Figure 3: (a) A C$_{60}$ with a Stone-Wales defect. The atoms are colored according to their closest corners, which are shown in the same color in the other three images: (b) corner 47; (c) corner 38; and (d) corner 23 of the LS.

Figure 4: (a) A graphite flake whose atoms are colored according to their closest corners: (b) corner 55; (c) corner 33; (d) corner 26; (e) corner 25; (f) corner 9; and (g) corner 7 of the LS.

Figure 5: Nanocrystalline Al containing grain boundaries. The LS is employed to find the grain boundary networks. (a) View from top; (b) view from front; (c) view from left; (d) view from right; (e) perspective view; (f) slice view. The software Ovito [36] is used for the visualization.

Figure 6: The first twenty most distinctive atomic environments in the nanocrystalline Al found by the LS, in panels (a) to (t). Red atoms are the central atoms whose local environment is one of the corners of the simplex, and the atoms in their vicinity are depicted in blue.
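The deterministic CUR row selection described in this section can be sketched as follows. This is a hedged illustration rather than the implementation of Refs. [37, 38]: the function name `cur_select_rows` is hypothetical, and the default of using a single singular vector (`k=1`) in the leverage score is an assumption; the text only states that $k$ is the number of rows to be selected.

```python
import numpy as np

def cur_select_rows(F, n_select, k=1):
    """Deterministically pick rows (features) of F by leverage score.

    F:        (L, N') fingerprint matrix, one column per atomic environment.
    n_select: number of rows to select.
    k:        number of left singular vectors entering the leverage score
              (default 1 is an assumption of this sketch).
    Returns the list of selected row indices, most important first.
    """
    A = np.array(F, dtype=float)
    selected = []
    for _ in range(min(n_select, A.shape[0])):
        U, _, _ = np.linalg.svd(A, full_matrices=False)
        kk = min(k, U.shape[1])
        # Leverage score pi_i = (1/k) * sum_xi (u_i^xi)^2 over the leading singular vectors.
        pi = (U[:, :kk] ** 2).sum(axis=1) / kk
        if selected:
            pi[selected] = -np.inf          # never pick the same row twice
        j = int(np.argmax(pi))
        selected.append(j)
        # Orthogonalize all rows with respect to the selected one; the selected
        # row itself becomes zero, which is equivalent to removing it.
        row = A[j].copy()
        norm2 = row @ row
        if norm2 > 0.0:
            A -= np.outer(A @ row, row) / norm2
    return selected

F = np.array([[1.0, 0.0, 0.0],
              [2.0, 0.0, 0.0],
              [0.0, 0.5, 0.0]])
rows = cur_select_rows(F, n_select=2)
print(rows)        # indices of the selected features
print(F[rows])     # CUR-reduced fingerprints: one column per environment
```

In the toy matrix the first two rows are linearly dependent, so after the larger one is selected the other carries no new information and the independent third row is chosen next. Applying the same function to `F.T` selects columns, i.e. atomic environments, instead of features.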
[Figure 7 panels: OM and SOAP. Axes: LS-compressed FP and CUR-compressed FP (arb. u.) versus original FP (arb. u.).]
Figure 7: The correlation between the original fingerprints and the CUR-reduced and LS-reduced fingerprints for OM and SOAP. The length of the reduced fingerprints is $l = 16$, while the length $L$ of the original fingerprint is 240 for OM and 325 for SOAP.

IV. CONCLUSION
We have introduced an algorithm to construct a largest volume simplex in the space spanned by a large set of atomic environment fingerprint vectors. The number of corners of this simplex gives the effective dimension of the fingerprint vector space. The corners themselves represent landmark environments that can be used to analyse structures with a large number of atoms in a fully automatic way. So, in contrast to other methods, it is not necessary to include in our analysis tool criteria that are based on human expectations of what kind of environments will be encountered in the system. We show that this analysis method can be used to detect grain boundaries and other typical environments in multi-grain metallic systems and to classify atomic environments in a carbon cluster in a way that is consistent with basic chemical intuition. Since only those components of the fingerprint vector that are inside the space spanned by the LS are relevant, projecting the fingerprint into the space spanned by the simplex reduces the length of the fingerprint without any significant loss of information. Therefore the method can also be used as a data compression method for fingerprints.
V. ACKNOWLEDGMENTS
We thank Dr. Pablo Piaggi for providing us with the nanocrystalline Al data. The authors acknowledge that this research was supported by NCCR MARVEL and funded by the Swiss National Science Foundation. Structures were visualized using the VESTA [40] and Ovito [36] packages. The calculations were performed on the computational resources of the Swiss National Supercomputer (CSCS) under project s963 and on the Scicore computing center of the University of Basel.

[1] A. Jain, G. Hautier, C. J. Moore, S. P. Ong, C. C. Fischer, T. Mueller, K. A. Persson, and G. Ceder, Computational Materials Science, 2295 (2011).
[2] J. E. Saal, S. Kirklin, M. Aykol, B. Meredig, and C. Wolverton, JOM, 1501 (2013).
[3] S. Curtarolo, W. Setyawan, S. Wang, J. Xue, K. Yang, R. H. Taylor, L. J. Nelson, G. L. Hart, S. Sanvito, M. Buongiorno-Nardelli, N. Mingo, and O. Levy, Computational Materials Science, 227 (2012).
[4] L. Talirz, S. Kumbhar, E. Passaro, A. V. Yakutovich, V. Granata, F. Gargiulo, M. Borelli, M. Uhrin, S. P. Huber, S. Zoupanos, C. S. Adorf, C. W. Andersen, O. Schütt, C. A. Pignedoli, D. Passerone, J. VandeVondele, T. C. Schulthess, B. Smit, G. Pizzi, and N. Marzari, Scientific Data, 299 (2020).
[5] A. P. Bartók, R. Kondor, and G. Csányi, Physical Review B, 184115 (2013).
[6] J. Behler, The Journal of chemical physics, 074106 (2011).
[7] F. A. Faber, A. S. Christensen, B. Huang, and O. A. Von Lilienfeld, The Journal of chemical physics, 241717 (2018).
[8] A. S. Christensen, L. A. Bratholm, F. A. Faber, D. R. Glowacki, and O. A. von Lilienfeld, arXiv preprint arXiv:1909.01946 (2019).
[9] M. Hirn, S. Mallat, and N. Poilvert, Multiscale Modeling & Simulation, 827 (2017).
[10] L. Zhu, M. Amsler, T. Fuhrer, B. Schaefer, S. Faraji, S. Rostami, S. A. Ghasemi, A. Sadeghi, M. Grauzinyte, C. Wolverton, et al., The Journal of chemical physics, 034203 (2016).
[11] J. Behler, International Journal of Quantum Chemistry, 1032 (2015).
[12] M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. Von Lilienfeld, Physical review letters, 058301 (2012).
[13] A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi, Physical review letters, 136403 (2010).
[14] B. Parsaeifard, D. S. De, A. S. Christensen, F. A. Faber, E. Kocer, S. De, J. Behler, A. von Lilienfeld, and S. Goedecker, Machine Learning: Science and Technology (2020).
[15] N. Hansen, Scripta Materialia, 801 (2004).
[16] A. Chiba, S. Hanada, S. Watanabe, T. Abe, and T. Obana, Acta metallurgica et materialia, 1733 (1994).
[17] T. Fang, W. Li, N. Tao, and K. Lu, Science, 1587 (2011).
[18] M. Shimada, H. Kokawa, Z. Wang, Y. Sato, and I. Karibe, Acta Materialia, 2331 (2002).
[19] L. Lu, Y. Shen, X. Chen, L. Qian, and K. Lu, Science, 422 (2004).
[20] M. A. Meyers, A. Mishra, and D. J. Benson, Progress in materials science, 427 (2006).
[21] P. J. Steinhardt, D. R. Nelson, and M. Ronchetti, Phys. Rev. B, 784 (1983).
[22] D. Faken and H. Jónsson, Computational Materials Science, 279 (1994).
[23] J. Schiøtz and K. W. Jacobsen, Science, 1357 (2003).
[24] V. Yamakov, D. Wolf, S. Phillpot, A. Mukherjee, and H. Gleiter, Philosophical Magazine Letters, 385 (2003).
[25] C. Brandl, P. Derlet, and H. Van Swygenhoven, Modelling and Simulation in Materials Science and Engineering, 074005 (2011).
[26] H. Jónsson and H. C. Andersen, Physical review letters, 2295 (1988).
[27] N. P. Bailey, J. Schiøtz, and K. W. Jacobsen, Physical Review B, 144205 (2004).
[28] A. Stukowski, Modelling and Simulation in Materials Science and Engineering, 045021 (2012).
[29] P. M. Larsen, S. Schmidt, and J. Schiøtz, Modelling and Simulation in Materials Science and Engineering, 055007 (2016).
[30] C. W. Rosenbrock, E. R. Homer, G. Csányi, and G. L. Hart, npj Computational Materials, 1 (2017).
[31] P. M. Piaggi and M. Parrinello, The Journal of chemical physics, 114112 (2017).
[32] M. W. Mahoney and P. Drineas, Proceedings of the National Academy of Sciences, 697 (2009).
[33] S. Goedecker, The Journal of chemical physics, 9911 (2004).
[34] B. Aradi, B. Hourahine, and T. Frauenheim, J. Phys. Chem. A, 5678 (2007).
[35] A. J. Stone and D. J. Wales, Chemical Physics Letters, 501 (1986).
[36] A. Stukowski, Modelling and Simulation in Materials Science and Engineering, 015012 (2009).
[37] G. Imbalzano, A. Anelli, D. Giofré, S. Klees, J. Behler, and M. Ceriotti, The Journal of Chemical Physics, 241730 (2018).
[38] M. Ceriotti, M. J. Willatt, and G. Csányi, Handbook of Materials Modeling: Methods: Theory and Modeling, 1911 (2020).
[39] N. Bernstein, G. Csanyi, and J. Kermode, "Quip and quippy documentation," .
[40] K. Momma and F. Izumi, Journal of applied crystallography 44