An automated approach for developing neural network interatomic potentials with FLAME
Hossein Mirhosseini,∗,† Hossein Tahmasbi,‡ Sai Ram Kuchana,† S. Alireza Ghasemi,† and Thomas D. Kühne∗,†

†Dynamics of Condensed Matter and Center for Sustainable Systems Design, Chair of Theoretical Chemistry, University of Paderborn, Warburger Str. 100, 33098 Paderborn, Germany
‡Leiden Institute of Chemistry, Gorlaeus Laboratories, Leiden University, P.O. Box 9502, 2300 RA Leiden, The Netherlands
E-mail: [email protected]; [email protected]
Abstract
The performance of machine learning interatomic potentials relies on the quality of the training dataset. In this work, we present an approach for generating diverse and representative training data points which initiates with ab initio calculations for bulk structures. The data generation and potential construction further proceed side by side in a cyclic process of training the neural network and crystal structure prediction based on the developed interatomic potentials. All steps of the data generation and potential development are performed with minimal human intervention. We show the reliability of our approach by assessing the performance of neural network potentials developed for two inorganic systems.

Introduction

In the context of molecular dynamics (MD) simulation, the classical force field and ab initio
MD are the two limits. While the former is known to be more efficient and the method of choice for large-scale simulations, the latter has proved to be a more accurate method and is applicable for the short-scale simulation of systems with less than a few hundred atoms. Atomistic modeling based on machine learning interatomic potentials (MLIPs), on the other hand, has the advantages of both methods: MLIPs can potentially enable the performance of large-scale simulations (large length scales and long time scales) with quantum accuracy, orders of magnitude faster than current methods based on density functional theory (DFT).

The success of MD simulations relies on the accuracy of the potential energy surface (PES). In the context of MLIPs, the PES is described as a function of atomic coordinates through local environment descriptors. MLIPs employ machine learning (ML) algorithms and local environment descriptors to describe atomic interactions. Numerous MLIPs have emerged in recent years; examples include high-dimensional neural network potentials (NNPs) and the Gaussian approximation potential (GAP), among others. The remarkable performance of MLIPs has already been demonstrated for inorganic solids, hybrid materials, water, interfaces, and the dynamics of defects in crystalline and amorphous materials.

A key component for developing reliable potentials in ML-based PES models is building a robust and representative reference dataset for model training. The training dataset is usually composed of atomic configurations with corresponding energies, forces, and/or stress tensors from DFT calculations.
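To make the descriptor idea concrete, the radial Behler-type atom-centered symmetry function G_i = Σ_j exp(−η (r_ij − r_s)²) f_c(r_ij) with a cosine cutoff f_c can be sketched in a few lines. This is a toy illustration with invented default parameters; a production implementation also includes angular functions and periodic images.

```python
import math

def cosine_cutoff(r, r_c=6.0):
    """Behler cosine cutoff f_c(r): decays smoothly from 1 at r = 0 to 0 at r_c."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)

def radial_symmetry_function(distances, eta=1.0, r_s=0.0, r_c=6.0):
    """G_i = sum_j exp(-eta * (r_ij - r_s)**2) * f_c(r_ij) over neighbor distances r_ij."""
    return sum(math.exp(-eta * (r - r_s) ** 2) * cosine_cutoff(r, r_c)
               for r in distances)
```

A vector of such G values, evaluated for several (η, r_s) pairs per atom, forms the input layer of the neural network.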
A diverse and extensive dataset consists not only of data points for configurations close to equilibrium but also of those for energetically less favorable configurations, to avoid the tendency of developing biased potentials and overfitted models. Often, training datasets are generated by performing MD calculations at various temperatures and/or densities, followed by electronic structure calculations for snapshots extracted from MD trajectories.
Ab initio MD calculations for large systems, however, are demanding, making the development of MLIPs slow and computationally expensive.

In this work, we adopt a different approach to generating robust and representative training datasets. Namely, we perform ab initio geometry optimization solely for periodic bulk structures and extract samples from the geometry optimization calculations to initiate the training of neural network (NN) interatomic potentials. The bulk structures are taken from the Materials Project database (MPDB).
In addition, crystal structure prediction based on ionic substitution is performed to increase the diversity of bulk structures. The NN interatomic potentials are further trained in iterative training cycles by including data from crystal structure prediction based on the minima hopping method. Although hops from one local minimum to another in minima hopping are based on MD calculations, these calculations are performed with the generated NNPs and are therefore not computationally demanding.

Generating extensive training datasets requires performing a substantial number of calculations and handling a large amount of data. An important task in modern computational materials science is to accomplish such tasks automatically. In this respect, our Python-based script generates NNPs with minimal human intervention. To demonstrate the performance of NNPs constructed with our approach, we compare the results of molecular statics and molecular dynamics calculations using the NNPs to reference DFT results.
Computational details
All DFT calculations were carried out with the Vienna Ab initio Simulation Package (VASP) employing projector augmented wave (PAW) pseudopotentials. A plane-wave cutoff of 520 eV and the Perdew-Burke-Ernzerhof (PBE) form of the generalized gradient approximation for the exchange-correlation potential were used for geometry optimization and single point energy calculations.

Phonon dispersion relations were calculated using the frozen phonon approach as implemented in the PHONOPY package. Convergence tests were performed to obtain converged force constants and phonon densities of states. For calculating the lattice thermal conductivity, we used the ShengBTE code, which solves the Boltzmann transport equation for phonons. This code requires the second- and third-order interatomic force constants (IFCs), which were computed with PHONOPY and thirdorder.py, respectively. The second- and third-order IFCs were calculated with 3 × … and 4 × … supercells. For the third-order IFCs, interactions up to the eleventh nearest neighbors were considered, for which the lattice thermal conductivities of both structures are converged.

To minimize human control in the training process, we took advantage of atomate, an open-source Python framework for materials science simulation and analysis with an emphasis on automation. Atomate is built on top of open-source libraries such as pymatgen (an open-source Python library for materials analysis), Custodian (a just-in-time job management framework), and FireWorks (a code for defining, managing, and executing workflows), allowing automatic job submission and result collection.
Atomic configurations are taken from the MPDB, which makes its data available through the open Materials Application Programming Interface (API) and the pymatgen materials analysis package. We trained and utilized high-dimensional NNPs using algorithms implemented in FLAME, an open-source software package that enables the performance of a wide range of atomistic simulations for the construction and exploration of PESs. FLAME has been used in several development and application studies in the context of MLIPs. We refer to the FLAME reference publication for more details.

In our approach, two crystal structure prediction methods are employed to increase the diversity of the training dataset: crystal structure prediction based on minima hopping and that based on ionic substitutions. Minima hopping is a method for finding the global minimum of a PES. It not only searches for the global minimum but also explores low-energy configurations. Ionic-substitution crystal structure prediction is based on the common approach of proposing new compounds by replacing one ion with another, chemically similar ion. The underlying mathematical model provides a probability distribution for ionic substitutions that it has learned from a database of experimentally observed crystal structures.

To compare atomic configurations generated during the potential development process, the distances between structural fingerprints are calculated. If the distance between two fingerprints is less than a defined value, the structures are considered similar.
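The duplicate test just described reduces to a threshold on a distance between fingerprint vectors. A minimal sketch follows; the actual structural fingerprints used here are far more elaborate than the plain vectors assumed below, and the tolerance value is illustrative.

```python
import math

def fingerprint_distance(fp_a, fp_b):
    """Length-normalized Euclidean distance between two fingerprint vectors."""
    assert len(fp_a) == len(fp_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fp_a, fp_b))) / len(fp_a)

def are_similar(fp_a, fp_b, tol=1e-2):
    """Two configurations count as duplicates when their fingerprints are close."""
    return fingerprint_distance(fp_a, fp_b) < tol
```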
Workflow
In the following sections, we describe the workflow of our script, which is shown in Figure 1. In the course of constructing NNPs, the script keeps track of its steps. If a failure occurs and the script cannot advance, the user can restart the script from the last successfully accomplished step. Note that the user can always restart the script from a previous step with different parameters.
Input data
The input file is in YAML format. The composition of the system is determined by providing either (a) a Materials Project ID (mp-id), (b) a species list (element symbols and their oxidation states, e.g., (Ag, +1)), or (c) a single element symbol. In the last case, only structures with the given element available in the MPDB will be considered. If the user provides an mp-id or species, then all compounds containing the specified elements with the given oxidation states are considered. For example, CuInSe2 and CuIn5Se8 will be included in the training process for the (In, +3)-(Cu, +1)-(Se, -2) system, because the oxidation states of the constituent elements are the same in both compounds. It should be noted that the transferability of NNPs with no restriction on the stoichiometry of the training systems needs a careful examination. There is an option to limit the training data points to systems with a fixed stoichiometry.

The size of the system is established by specifying the maximum number of atoms in bulk structures. Structures containing fewer than 4 atoms are discarded. If the training dataset should include clusters, then the minimum and maximum number of atoms in clusters should be specified in the input file. Parameters needed for the training cycles, such as the number of nodes in the hidden layers of the NN and the temperatures for performing minima hopping, can be specified in the input file. Parameters that are needed for the structure diversity check, such as the minimum allowed distance between atoms and the average of distances between the fingerprints of sample structures, are calculated based on the given composition. We discuss these parameters in the corresponding sections.
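For illustration, a hypothetical input file for the (In, +3)-(Cu, +1)-(Se, -2) system might look as follows. The key names are invented for this sketch and are not the script's actual schema; only the YAML format and the kinds of settings (species, system size, cycle parameters) come from the description above.

```yaml
# Hypothetical input file; key names are illustrative, not the real schema.
species:
  - [In, +3]
  - [Cu, +1]
  - [Se, -2]
max_atoms_bulk: 40        # maximum number of atoms in bulk structures
training_cycles: 7        # number of iterative training cycles
hidden_layers: [10, 10]   # nodes per hidden layer of the NN
minhop_temperatures: [500, 1000]  # K, for minima hopping
```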
Crystal structure prediction
The next step after selecting the system is crystal structure prediction based on ionic substitution. The aim of this step is to include as many bulk structures as possible in the data generation process. After new bulk structures are found, duplicated structures are removed from the final list and the remaining structures are optimized until the residual force on each atom is less than 0.05 eV/Å.

In addition to ionic substitution for crystal structure prediction, the user can substitute a fraction of atoms to include more elements in the system. The purpose of this ion substitution is to extend the chemical space to systems that do not exist in databases. For example, a given percentage of In in the In-Cu-Se system can be replaced with Tl to build the quaternary In-Tl-Cu-Se system.
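The fractional substitution step can be sketched as follows. This toy model represents a structure as a flat list of element symbols and picks substitution sites at random; the real script works on full crystal structures, so this is only a schematic of the bookkeeping.

```python
import random

def substitute_fraction(species, old, new, fraction, seed=0):
    """Replace a given fraction of `old` sites with `new`.

    `species` is a list of element symbols (toy stand-in for a structure);
    the seeded RNG makes the choice of substituted sites reproducible."""
    rng = random.Random(seed)
    sites = [i for i, s in enumerate(species) if s == old]
    n_swap = round(fraction * len(sites))
    out = list(species)
    for i in rng.sample(sites, n_swap):
        out[i] = new
    return out
```

For example, replacing 50% of the In sites with Tl in an In-Cu-Se species list yields an In-Tl-Cu-Se composition while leaving the Cu and Se sublattices untouched.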
Perturbed structures creation
To expand the reference training dataset, periodic bulk structures are perturbed. That is, atoms are randomly displaced from their optimized positions by translating sites along a vector and/or rotating sites by an angle around a vector. In addition, stressed structures (expanded/contracted bulk structures) are generated. All these structures are loosely optimized, and samples are taken from the different steps of the geometry optimization calculations. The structure relaxation stops when the residual forces on the atoms drop below a loose threshold.

Training cycle
The training cycle has five steps: potential training, minima hopping, structure diversity check, single point energy calculations, and data collection. The first three steps are performed as implemented in FLAME. For training the NNPs, the data obtained from the previous step are divided into 90% for training and 10% for validation. The NNP generated in each cycle is used for performing minima hopping. Seed configurations for minima hopping are the bulk structures from the crystal structure prediction step.

In the diversity check step, the snapshots taken from the minima hopping trajectories are screened to remove similar and nonphysical structures (due to small interatomic distances or an unacceptable density). To remove similar structures, the distances between all structural fingerprints of the atomic configurations are computed. If the distance between a pair of fingerprints is smaller than the accepted value, one of the configurations is discarded. Note that the parameters for the structure diversity check are adjusted for each cycle of training. That is, the allowed minimum distance between atoms in the earlier cycles of training (when the model is not yet well trained) is larger than that in the later steps of training (when the NNPs are more reliable). In contrast, while during the early cycles of training the tolerance for identifying similar structures is large (loose), the tolerance becomes smaller in the later cycles of training to select only structures with a large distance from the previously selected structures.

The structures that pass the diversity check are sent for single point energy calculations at the DFT level. In the data collection step, the forces and energies calculated by VASP are collected to be used in the next cycle of training. Configurations for which the residual force on their atoms is less than 0.10 eV/Å are stored for minima hopping in the next cycle. The training cycle is repeated until the desired number of training cycles, specified in the input file, is reached.
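The five-step cycle described above can be summarized as a schematic driver loop. The callables below are placeholders standing in for the FLAME training/minima-hopping steps and the VASP single point calculations; they are not real APIs of either code.

```python
def training_cycle(dataset, seeds, n_cycles, train, hop, is_diverse, dft_single_point):
    """Schematic driver for the iterative training loop.

    train(dataset)            -> fitted potential (90/10 split happens inside)
    hop(potential, seeds)     -> candidate structures from minima hopping
    is_diverse(s, dataset)    -> diversity/physicality screen
    dft_single_point(s)       -> data point with DFT energies and forces
    """
    for _ in range(n_cycles):
        potential = train(dataset)
        candidates = hop(potential, seeds)
        fresh = [s for s in candidates if is_diverse(s, dataset)]
        dataset += [dft_single_point(s) for s in fresh]  # data collection step
    return dataset
```

The loop makes the feedback explicit: each cycle's potential generates the candidates whose DFT labels enlarge the dataset for the next cycle.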
Validation of the approach
In the following sections, we demonstrate the validity of our approach for developing interatomic potentials by comparing DFT and NNP results for the (Ti, +4)-(O, -2) and (In, +3)-(Cu, +1)-(Se, -2) systems.
The (Ti, +4)-(O, -2) system
We have developed an NNP for the (Ti, +4)-(O, -2) system with a fixed stoichiometry that limits the composition to titanium dioxide (TiO2). The diverse structures of TiO2 make it an ideal benchmark material to evaluate the performance of the generated NNP. Among the many applications of TiO2, its ability to photocatalyze water splitting reactions makes it interesting for sustainable energy applications.

For developing the (Ti, +4)-(O, -2) potentials, 4606 VASP geometry optimization calculations were performed for 223 dissimilar structures. Of these structures, 77 were taken from the MPDB and the rest were found by crystal structure prediction. The training process started with 16696 data points and stopped after five steps, as specified in the input file. The total number of training points after five cycles of training was 67248.

To assess the performance of the developed potentials for the (Ti, +4)-(O, -2) system, first, the equations of state (EOS) for the two phases of TiO2 were calculated (see Figure 2). The agreement between the ab initio and NNP results is very good. The rutile phase of TiO2 is more stable than the anatase phase; DFT calculations with PBE, however, predict anatase to be more stable than rutile. The phonon dispersion relation, calculated for a 4 × … supercell, is shown in Figure 3. The main contribution to the lattice thermal conductivity comes from the acoustic branches. In this frequency range, the agreement between the DFT and NNP results is very good. Therefore, a good agreement between the lattice thermal conductivities calculated by DFT and by the trained NNP is expected (see Figure 4). Phonon dispersion relations and lattice thermal conductivity values calculated for TiO2 by different exchange-correlation functionals agree less well with each other.
For example, the lattice thermal conductivity of anatase calculated with the local density approximation functional at 300 K is 3.6 W m⁻¹ K⁻¹ larger than that calculated with PBE (7.2 W m⁻¹ K⁻¹). The difference between the PBE and NNP results is well below this value.

To evaluate the accuracy of the Ti-O potentials for MD simulations, NVT MD calculations at various temperatures were performed by employing the MD algorithms implemented in EON. To perform MD calculations with NNPs, EON is coupled with FLAME. A comparison between the radial distribution functions calculated by DFT and by the constructed NNP for a 108-atom supercell of anatase is shown in Figure 5. The MD calculations were performed at 1000 K for 35000 MD steps after equilibration. The very good agreement between the DFT and NNP results shows the reliability of the developed NNP for MD simulations.

As mentioned earlier, only periodic bulk structures have been used for training the NN interatomic potentials. It is interesting to see whether the developed potentials can produce reasonable results for non-periodic systems, like slabs. Therefore, we calculated the energy per atom for a 2D phase of TiO2 that was recently discovered by crystal structure prediction. For a single layer of this 2D structure with 12 atoms, the difference between the energy per atom calculated with DFT and that calculated with the NNP is as small as 3.4 meV. It is worth noting that this structure was not in the training dataset. In addition, we have calculated the surface energies of the non-polar (001) surfaces of rutile and anatase. The surface energy of the (001) surface of rutile calculated by DFT and by the NNP is 0.13 eV/Å² (2.10 J/m²) and 0.13 eV/Å² (2.04 J/m²), respectively. For the anatase phase, the surface energies were computed as 0.13 eV/Å² (2.07 J/m²) and 0.13 eV/Å² (2.06 J/m²) by DFT and the NNP, respectively.

The (In, +3)-(Cu, +1)-(Se, -2) system
The representative of the (In, +3)-(Cu, +1)-(Se, -2) system is copper indium selenide (CuInSe2), one of the most promising absorbers for thin-film solar cells. The applications of CuInSe2 and its alloy with Ga in the conversion of solar energy are not limited to photovoltaics. Furthermore, in the last few years, we and many other groups have studied the characteristics of CuInSe2 using first-principles methods, so a large amount of ab initio data is available for this system to benchmark against.

The process of developing the NNP started with finding structures that contain (In, +3), (Cu, +1), and (Se, -2). For this system, there are six known structures with fewer than 40 atoms in the MPDB. Two of these six structures (In CuSe and In Cu Se) have oxidation states that do not match the oxidation states of the species specified in the input file. In the crystal structure prediction step, 52 dissimilar structures were found. The first cycle of training started with the data obtained from the optimization of the perturbed/stressed structures. In total, 1729 VASP geometry optimization calculations for bulk structures were performed, which resulted in 5563 data points. The total number of training points after seven cycles of training was 32690. We stopped the training process at the seventh step. The reason for the slower growth of the number of data points for the In-Cu-Se system compared with the Ti-O system is the smaller number of seed structures for minima hopping.

Comparing the EOS for two phases of CuInSe2 (Figure 6) confirms the good agreement between the DFT and NNP results. The most stable structure of CuInSe2 is chalcopyrite, and the lattice parameters of this structure predicted by DFT are very close to those predicted by the NNP. For the less stable structure of CuInSe2, the lattice parameters predicted by the trained NNP are about 1% larger than those computed by DFT.
It is noted that the difference between the lattice parameters of this structure calculated with the local density approximation functional and those calculated with PBE is about 2.5%. The phonon dispersion relation for chalcopyrite CuInSe2 was calculated for a 3 × … supercell. We used the climbing image nudged elastic band method to calculate the energy barriers for some known diffusion mechanisms in a supercell of CuInSe2. Saddle points computed with ab initio and with the NNP are listed in Table 1. Jumping of a Cu atom from a Cu lattice site to a Cu vacancy (V_Cu) site (V_Cu → V′_Cu) is the most frequent event for the diffusion of Cu atoms. The diffusion barriers calculated for this event with DFT and with the generated NNP are in excellent agreement. Another event observed in this system is the jumping of an In_Cu antisite atom to the V_Cu site. The difference between the diffusion barriers calculated by DFT and by the NNP for this event is about 5.5%.

Table 1: Comparison between the diffusion barriers calculated by DFT and the NNP for CuInSe2.

Process | Diffusion barrier (eV), DFT | Diffusion barrier (eV), NN
V_Cu → V′_Cu | … | …
In_Cu + V_Cu → In′_Cu + V′_Cu | … | …

In addition, the surface energies of the (001) surface of CuInSe2 were computed as 0.04 eV/Å² (0.65 J/m²) and 0.04 eV/Å² (0.69 J/m²) by DFT and the NNP, respectively. It should be mentioned that the (001) surface of CuInSe2 is a relatively 'simple' surface. For more 'complicated' surfaces, it is necessary to include structures that resemble the surface structure (for example, clusters) in the training data.

Summary

We introduced a Python script for constructing NN interatomic potentials with minimal human intervention. The potential development is performed with algorithms implemented in FLAME. The only data that needs to be provided by the user is the species of the system of interest. The rest of the training process is performed without human control by taking advantage of open-source codes developed for computational materials science such as atomate, Custodian, and FireWorks. The script keeps track of its steps, and if a failure occurs, the user can continue the potential development process from the last successfully completed step.
The validity of our approach is shown by comparing the results of various molecular statics and molecular dynamics calculations based on DFT and NNPs.
Acknowledgements
The authors from UPB gratefully acknowledge funding of this project by computing time provided by the Paderborn Center for Parallel Computing (PC²).

References

(1) Behler, J. Atom-centered symmetry functions for constructing high-dimensional neural network potentials.
The Journal of Chemical Physics, 074106.
(2) Behler, J.; Parrinello, M. Generalized Neural-Network Representation of High-Dimensional Potential-Energy Surfaces. Phys. Rev. Lett., 146401.
(3) Bartók, A. P.; Payne, M. C.; Kondor, R.; Csányi, G. Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons. Phys. Rev. Lett., 136403.
(4) Botu, V.; Ramprasad, R. Adaptive machine learning framework to accelerate ab initio molecular dynamics. International Journal of Quantum Chemistry, 1074–1083.
(5) Thompson, A.; Swiler, L.; Trott, C.; Foiles, S.; Tucker, G. Spectral neighbor analysis method for automated generation of quantum-accurate interatomic potentials. Journal of Computational Physics, 316–330.
(6) Li, Z.; Kermode, J. R.; De Vita, A. Molecular Dynamics with On-the-Fly Machine Learning of Quantum-Mechanical Forces. Phys. Rev. Lett., 096405.
(7) Hansen, K.; Biegler, F.; Ramakrishnan, R.; Pronobis, W.; von Lilienfeld, O. A.; Müller, K.-R.; Tkatchenko, A. Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space. The Journal of Physical Chemistry Letters, 2326–2331.
(8) Brockherde, F.; Vogt, L.; Li, L.; Tuckerman, M. E.; Burke, K.; Müller, K.-R. Bypassing the Kohn-Sham equations with machine learning. Nature Communications, 872.
(9) Smith, J. S.; Isayev, O.; Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci., 3192–3203.
(10) Yao, K.; Herr, J. E.; Toth, D.; Mckintyre, R.; Parkhill, J. The TensorMol-0.1 model chemistry: a neural network augmented with long-range physics. Chem. Sci., 2261–2269.
(11) Podryabinkin, E. V.; Shapeev, A. V. Active learning of linearly parametrized interatomic potentials. Computational Materials Science, 171–180.
(12) Zhang, L.; Han, J.; Wang, H.; Car, R.; E, W. Deep Potential Molecular Dynamics: A Scalable Model with the Accuracy of Quantum Mechanics. Phys. Rev.
Lett., 143001.
(13) Bereau, T.; DiStasio, R. A.; Tkatchenko, A.; von Lilienfeld, O. A. Non-covalent interactions across organic and biological subsets of chemical space: Physics-based potentials parametrized from machine learning. The Journal of Chemical Physics, 241706.
(14) Schütt, K. T.; Kessel, P.; Gastegger, M.; Nicoli, K. A.; Tkatchenko, A.; Müller, K.-R. SchNetPack: A Deep Learning Toolbox For Atomistic Systems. Journal of Chemical Theory and Computation, 448–455.
(15) Khaliullin, R. Z.; Eshet, H.; Kühne, T. D.; Behler, J.; Parrinello, M. Graphite-diamond phase coexistence study employing a neural-network mapping of the ab initio potential energy surface. Phys. Rev. B, 100103.
(16) Eshet, H.; Khaliullin, R. Z.; Kühne, T. D.; Behler, J.; Parrinello, M. Ab initio quality neural-network potential for sodium. Phys. Rev. B, 184107.
(17) Artrith, N.; Morawietz, T.; Behler, J. High-dimensional neural-network potentials for multicomponent systems: Applications to zinc oxide. Phys. Rev. B, 153101.
(18) Khaliullin, R. Z.; Eshet, H.; Kühne, T. D.; Behler, J.; Parrinello, M. Nucleation mechanism for the direct graphite-to-diamond phase transition. Nature Mater., 693.
(19) Eshet, H.; Khaliullin, R. Z.; Kühne, T. D.; Behler, J.; Parrinello, M. Microscopic Origins of the Anomalous Melting Behavior of Sodium under High Pressure. Phys. Rev. Lett., 115701.
(20) Artrith, N.; Urban, A. An implementation of artificial neural-network potentials for atomistic materials simulations: Performance for TiO2. Comput. Mater. Sci., 135.
(21) Eckhoff, M.; Behler, J. From Molecular Fragments to the Bulk: Development of a Neural Network Potential for MOF-5. Journal of Chemical Theory and Computation, 3793–3809.
(22) Morawietz, T.; Singraber, A.; Dellago, C.; Behler, J. How van der Waals interactions determine the unique properties of water. Proceedings of the National Academy of Sciences, 8368–8373.
(23) Sukuba, I.; Chen, L.; Probst, M.; Kaiser, A.
A neural network interface for DL_POLY and its application to liquid water. Molecular Simulation, 1–6.
(24) Natarajan, S. K.; Behler, J. Neural network molecular dynamics simulations of solid–liquid interfaces: water at low-index copper surfaces. Phys. Chem. Chem. Phys., 28704–28725.
(25) Quaranta, V.; Hellström, M.; Behler, J. Proton-Transfer Mechanisms at the Water–ZnO Interface: The Role of Presolvation. The Journal of Physical Chemistry Letters, 1476–1483.
(26) Quaranta, V.; Behler, J.; Hellström, M. Structure and Dynamics of the Liquid–Water/Zinc-Oxide Interface from Machine Learning Potential Simulations. The Journal of Physical Chemistry C, 1293–1304.
(27) Hellström, M.; Quaranta, V.; Behler, J. One-dimensional vs. two-dimensional proton transport processes at solid–liquid zinc-oxide–water interfaces. Chem. Sci., 1232–1243.
(28) Ludwig, T.; Gauthier, J. A.; Brown, K. S.; Ringe, S.; Nørskov, J. K.; Chan, K. Solvent–Adsorbate Interactions and Adsorbate-Specific Solvent Structure in Carbon Dioxide Reduction on a Stepped Cu Surface. The Journal of Physical Chemistry C, 5999–6009.
(29) Li, W.; Ando, Y.; Minamitani, E.; Watanabe, S. Study of Li atom diffusion in amorphous Li3PO4 with neural network potential. The Journal of Chemical Physics, 214106.
(30) Korolev, V. V.; Mitrofanov, A. A.; Nevolin, Y. M.; Krotov, V. V.; Ul'yanov, D. K.; Protsenko, P. V. Neural Network Based Modeling of Grain Boundary Complexions Localized in Simple Symmetric Tilt Boundaries Σ3 (111) and Σ5 (210). Colloid Journal, 689–695.
(31) Elbaz, Y.; Furman, D.; Caspary Toroker, M. Modeling Diffusion in Functional Materials: From Density Functional Theory to Artificial Intelligence. Advanced Functional Materials, 1900778.
(32) Xu, N.; Shi, Y.; He, Y.; Shao, Q. A Deep-Learning Potential for Crystalline and Amorphous Li–Si Alloys. The Journal of Physical Chemistry C, 16278–16288.
(33) Jain, A.; Ong, S. P.; Hautier, G.; Chen, W.; Richards, W.
D.; Dacek, S.; Cholia, S.; Gunter, D.; Skinner, D.; Ceder, G.; Persson, K. A. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials, 011002.
(34) Ong, S. P.; Cholia, S.; Jain, A.; Brafman, M.; Gunter, D.; Ceder, G.; Persson, K. A. The Materials Application Programming Interface (API): A simple, flexible and efficient API for materials data based on REpresentational State Transfer (REST) principles. Computational Materials Science, 209–215.
(35) Hautier, G.; Fischer, C.; Ehrlacher, V.; Jain, A.; Ceder, G. Data Mined Ionic Substitutions for the Discovery of New Compounds. Inorganic Chemistry, 656–663.
(36) Goedecker, S. Minima hopping: An efficient search method for the global minimum of the potential energy surface of complex molecular systems. The Journal of Chemical Physics, 9911–9917.
(37) Amsler, M.; Goedecker, S. Crystal structure prediction using the minima hopping method. J. Chem. Phys., 224104.
(38) Kresse, G.; Hafner, J. Ab initio molecular dynamics for liquid metals. Phys. Rev. B, 558–561.
(39) Kresse, G.; Furthmüller, J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B, 11169–11186.
(40) Kresse, G.; Furthmüller, J. Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Computational Materials Science, 15–50.
(41) Blöchl, P. E. Projector augmented-wave method. Phys. Rev. B, 17953–17979.
(42) Perdew, J. P.; Burke, K.; Ernzerhof, M. Generalized Gradient Approximation Made Simple. Phys. Rev. Lett., 3865–3868.
(43) Togo, A.; Tanaka, I. First principles phonon calculations in materials science. Scripta Materialia, 1–5.
(44) Li, W.; Carrete, J.; Katcho, N. A.; Mingo, N. ShengBTE: A solver of the Boltzmann transport equation for phonons. Comput. Phys. Commun., 1747–1758.
(45) Mathew, K. et al.
Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows. Computational Materials Science, 140–152.
(46) Ong, S. P.; Richards, W. D.; Jain, A.; Hautier, G.; Kocher, M.; Cholia, S.; Gunter, D.; Chevrier, V. L.; Persson, K. A.; Ceder, G. Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Computational Materials Science, 314–319.
(47) Jain, A.; Ong, S. P.; Chen, W.; Medasani, B.; Qu, X.; Kocher, M.; Brafman, M.; Petretto, G.; Rignanese, G.-M.; Hautier, G.; Gunter, D.; Persson, K. A. FireWorks: a dynamic workflow system designed for high-throughput applications. Concurrency and Computation: Practice and Experience, 5037–5059.
(48) Amsler, M.; Rostami, S.; Tahmasbi, H.; Khajehpasha, E. R.; Faraji, S.; Rasoulkhani, R.; Ghasemi, S. A. FLAME: A library of atomistic modeling environments. Computer Physics Communications, 107415.
(49) FLAME: a library of atomistic modeling environments. 2018; https://flame-code.org and https://github.com/flame-code/FLAME.
(50) Ghasemi, S. A.; Hofstetter, A.; Saha, S.; Goedecker, S. Interatomic potentials for ionic systems with density functional accuracy based on charge densities obtained by a neural network. Phys. Rev. B, 045131.
(51) Rostami, S.; Amsler, M.; Ghasemi, S. A. Optimized symmetry functions for machine-learning interatomic potentials of multicomponent systems. J. Chem. Phys., 124106.
(52) Faraji, S.; Ghasemi, S. A.; Rostami, S.; Rasoulkhani, R.; Schaefer, B.; Goedecker, S.; Amsler, M. High accuracy and transferability of a neural network potential through charge equilibration for calcium fluoride. Phys. Rev. B, 104105.
(53) Rasoulkhani, R.; Tahmasbi, H.; Ghasemi, S. A.; Faraji, S.; Rostami, S.; Amsler, M. Energy landscape of ZnO clusters and low-density polymorphs. Phys. Rev. B, 064108.
(54) Eivari, H. A.; Ghasemi, S. A.; Tahmasbi, H.; Rostami, S.; Faraji, S.; Rasoulkhani, R.; Goedecker, S.; Amsler, M.
Two-Dimensional Hexagonal Sheet of TiO2. Chem. Mater., 8594.
(55) Faraji, S.; Ghasemi, S. A.; Parsaeifard, B.; Goedecker, S. Surface reconstructions and premelting of the (100) CaF2 surface. Phys. Chem. Chem. Phys., 16270.
(56) Oganov, A. R.; Valle, M. How to quantify energy landscapes of solids. The Journal of Chemical Physics, 104504.
(57) The Official YAML Web Site. https://yaml.org/.
(58) Mangold, C.; Chen, S.; Barbalinardo, G.; Behler, J.; Pochet, P.; Termentzidis, K.; Han, Y.; Chaput, L.; Lacroix, D.; Donadio, D. Transferability of neural network potentials for varying stoichiometry: Phonons and thermal conductivity of MnxGey compounds. Journal of Applied Physics, 244901.
(59) Benoit, M.; Amodeo, J.; Combettes, S.; Khaled, I.; Roux, A.; Lam, J. Measuring transferability issues in machine-learning force fields: the example of gold–iron interactions with linearized potentials. Machine Learning: Science and Technology, 025003.
(60) Fujishima, A.; Honda, K. Electrochemical Photolysis of Water at a Semiconductor Electrode. Nature, 37–38.
(61) Ma, Y.; Wang, X.; Jia, Y.; Chen, X.; Han, H.; Li, C. Titanium Dioxide-Based Nanomaterials for Photocatalytic Fuel Generations. Chemical Reviews, 9987–10043.
(62) Ni, M.; Leung, M. K.; Leung, D. Y.; Sumathy, K. A review and recent developments in photocatalytic water-splitting using TiO2 for hydrogen production. Renewable and Sustainable Energy Reviews, 401–425.
(63) Slack, G. Nonmetallic crystals with high thermal conductivity. Journal of Physics and Chemistry of Solids, 321–335.
(64) Shojaee, E.; Mohammadizadeh, M. R. First-principles elastic and thermal properties of TiO2: a phonon approach. Journal of Physics: Condensed Matter, 015401.
(65) Arrigoni, M.; Madsen, G. K. Comparing the performance of LDA and GGA functionals in predicting the lattice thermal conductivity of III-V semiconductor materials in the zincblende structure: The cases of AlAs and BAs.
Computational Materials Science , , 354–360.
(66) Torres, P.; Rurali, R. Thermal Conductivity of Rutile and Anatase TiO2 from First-Principles. The Journal of Physical Chemistry C , , 30851–30855.
(67) EON: Long timescale dynamics. https://theory.cm.utexas.edu/eon/index.html.
(68) Solar Frontier, Solar Frontier Achieves World Record Thin-Film Solar Cell Efficiency of 23.35%. 2019; Solar Frontier KK. Press release 17.01.2019.
(69) Regmi, G.; Ashok, A.; Chawla, P.; Semalti, P.; Velumani, S.; Sharma, S. N.; Castaneda, H. Perspectives of chalcopyrite-based CIGSe thin-film solar cell: a review. Journal of Materials Science: Materials in Electronics , , 7286–7314.
(70) Kim, B.; Park, G. S.; Hwang, Y. J.; Won, D. H.; Kim, W.; Lee, D. K.; Min, B. K. Cu(In,Ga)(S,Se)2 Photocathodes with a Grown-In CuxS Catalyst for Solar Water Splitting. ACS Energy Letters , , 2937–2944.
(71) Hu, Z.; Gong, J.; Ye, Z.; Liu, Y.; Xiao, X.; Yu, J. C. Cu(In,Ga)Se2 for selective and efficient photoelectrochemical conversion of CO2 into CO. Journal of Catalysis , , 88–95.
(72) Mirhosseini, H.; Kormath Madam Raghupathy, R.; Sahoo, S. K.; Wiebeler, H.; Chugh, M.; Kühne, T. D. In silico investigation of Cu(In,Ga)Se2-based solar cells. Phys. Chem. Chem. Phys. , , 26682–26701.
(73) Henkelman, G.; Uberuaga, B. P.; Jónsson, H. A climbing image nudged elastic band method for finding saddle points and minimum energy paths. The Journal of Chemical Physics , , 9901–9904.
(74) Kormath Madam Raghupathy, R.; Kühne, T. D.; Henkelman, G.; Mirhosseini, H. Alkali Atoms Diffusion Mechanism in CuInSe2 Explained by Kinetic Monte Carlo Simulations. Advanced Theory and Simulations , , 1900036.

Figure 1: The workflow diagram of the automated algorithm to generate reference datasets and train neural network interatomic potentials.

[Figure 2 plot: energy/atom (eV) vs. volume/atom (Å^3), anatase and rutile; DFT and NN curves]
Figure 2: Comparison of the equation of state for anatase and rutile TiO2 calculated by the NNP and DFT.

[Figure 3 plot: phonon frequency (THz) along a high-symmetry path (Γ, X, Y, Σ, Z, N, P); DFT and NN curves]
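An equation-of-state comparison like the one in Figure 2 is usually reduced to an equilibrium volume and a bulk modulus for each method. The sketch below is purely illustrative and not the procedure used in this work: it generates synthetic E(V) points from a third-order Birch–Murnaghan form with made-up parameter values and recovers V0 and B0 from a quadratic fit around the minimum.

```python
import numpy as np

def birch_murnaghan(v, e0, v0, b0, b0p):
    """Third-order Birch-Murnaghan equation of state E(V); energies in eV,
    volumes in Å^3, b0 in eV/Å^3."""
    eta = (v0 / v) ** (2.0 / 3.0)
    return e0 + 9.0 * v0 * b0 / 16.0 * (
        (eta - 1.0) ** 3 * b0p + (eta - 1.0) ** 2 * (6.0 - 4.0 * eta)
    )

# Synthetic E-V points standing in for DFT/NNP data (all values hypothetical)
v = np.linspace(10.0, 12.0, 21)                 # Å^3 per atom
e = birch_murnaghan(v, -9.74, 11.0, 1.3, 4.5)   # assumed E0, V0, B0, B0'

# Quadratic fit near the minimum: V0 = -b/(2a) and B0 = V0 * E''(V0) = 2*a*V0
a, b, c = np.polyfit(v, e, 2)
v0_fit = -b / (2.0 * a)
b0_fit = 2.0 * a * v0_fit                        # eV/Å^3
print(f"V0 = {v0_fit:.2f} Å^3/atom, B0 = {b0_fit * 160.2176634:.0f} GPa")
```

The quadratic fit is only a local approximation; over a wide volume range the anharmonicity of the E(V) curve slightly biases both parameters, which is why a full Birch–Murnaghan least-squares fit is normally preferred in production workflows.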
Figure 3: Phonon dispersion relation of anatase TiO2 calculated by DFT and the NNP.

[Figure 4 plot: κxx, κyy, κzz and average κ (W m^-1 K^-1) vs. temperature (K); NN and DFT curves]
Figure 4: Lattice thermal conductivity of the anatase phase of TiO2 calculated with DFT and the NNP.

[Figure 5 plot: g(r) vs. r (Å) for Ti–O and Ti–Ti pairs; DFT and NN curves]
Figure 5: The radial distribution functions for anatase TiO2 computed with DFT and the NNP at T = 1000 K.

[Figure 6 plot: energy/atom (eV) vs. volume/atom (Å^3) for structures mp-22811 and mp-676764; DFT and NN curves]
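Figures 5 and 9 compare radial distribution functions from MD trajectories. For readers who want to reproduce such a comparison, the following is a minimal sketch, not the authors' implementation, of g(r) for a single snapshot in an orthorhombic periodic cell using the minimum-image convention; the cell dimensions and atomic positions in the sanity check are hypothetical.

```python
import numpy as np

def rdf(positions, cell, r_max, n_bins=100):
    """Radial distribution function g(r) for one snapshot in an orthorhombic
    periodic cell; cell = [Lx, Ly, Lz], valid for r_max < min(cell)/2."""
    positions = np.asarray(positions, dtype=float)
    cell = np.asarray(cell, dtype=float)
    n = len(positions)
    rho = n / np.prod(cell)                      # number density
    edges = np.linspace(0.0, r_max, n_bins + 1)
    hist = np.zeros(n_bins)
    for i in range(n - 1):
        d = positions[i + 1:] - positions[i]
        d -= cell * np.round(d / cell)           # minimum-image convention
        r = np.linalg.norm(d, axis=1)
        hist += np.histogram(r[r < r_max], bins=edges)[0]
    shell = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    # factor 2: each pair was counted once in the loop above
    g = 2.0 * hist / (n * rho * shell)
    return 0.5 * (edges[1:] + edges[:-1]), g

# Sanity check: for uniformly random positions g(r) should approach 1
rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 10.0, size=(600, 3))      # hypothetical configuration
r, g = rdf(pos, [10.0, 10.0, 10.0], r_max=4.0)
print(f"mean g(r) for r > 2 Å: {g[r > 2.0].mean():.2f}")
```

For a production comparison such as Figures 5 and 9, g(r) would additionally be averaged over many MD snapshots and resolved per element pair (e.g. Ti–O vs. Ti–Ti) by restricting the pair loop to the relevant species.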
Figure 6: Comparison of the equations of state of two structures of CuInSe2 calculated by DFT and the NNP. The structures are labeled by their Materials Project ID.

[Figure 7 plot: phonon frequency (THz) along the Γ–Z–Σ–N–P–Y path; DFT and NN curves]
Figure 7: Phonon dispersion relation of chalcopyrite CuInSe2 computed by DFT and the NNP.

[Figure 8 plot: κxx, κyy, κzz and average κ (W m^-1 K^-1) vs. temperature (K); NN and DFT curves]
Figure 8: Lattice thermal conductivity of chalcopyrite CuInSe2 calculated with DFT and the NNP.

[Figure 9 plot: g(r) vs. r (Å) for Cu–Se, Cu–In, and Cu–Cu pairs; DFT and NN curves]
Figure 9: The radial distribution functions for the chalcopyrite CuInSe2