[PDF] Semi-Automated Creation of Density Functional Tight Binding Models Through Leveraging Chebyshev Polynomial-based Force Fields

Abstract

Density Functional Tight Binding (DFTB) is an attractive method for accelerated quantum simulations of condensed matter due to its enhanced computational efficiency over standard Density Functional Theory approaches. However, DFTB models can be challenging to determine for individual systems of interest, especially for metallic and interfacial systems where different bonding arrangements can lead to significant changes in electronic states. In this regard, we have created a rapid-screening approach for determining systematically improvable DFTB interaction potentials that can yield transferable models for a variety of conditions. Our method leverages a recent reactive molecular dynamics force field where many-body interactions are represented by linear combinations of Chebyshev polynomials. This allows for the efficient creation of multi-center representations with relative ease, requiring only a small investment in initial DFT calculations. We have focused our workflow on TiH_2 as a model system and show that a relatively small training set based on unit-cell sized calculations yields a model accurate for both bulk and surface properties. Our approach is easy to implement and can yield accurate DFTB models over a broad range of thermodynamic conditions, where physical and chemical properties can be difficult to interrogate directly and there is historically a significant reliance on theoretical approaches for interpretation and validation of experimental results.

Full PDF

SSemi-Automated Creation of DensityFunctional Tight Binding Models ThroughLeveraging Chebyshev Polynomial-based ForceFields

Nir Goldman, ∗ , † , ‡ Kyoung E. Kweon, † Babak Sadigh, † Tae Wook Heo, † RebeccaK. Lindsey, † C. Huy Pham, † Laurence E. Fried, † Bálint Aradi, ¶ Kiel Holliday, † Jason R. Jeﬀries, † and Brandon C. Wood † † Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory,Livermore, California 94550, United States ‡ Department of Chemical Engineering, University of California, Davis, California 95616,United States ¶ Bremen Center for Computational Materials Science, Universität Bremen, P.O.B.330440, D-28334, Bremen, Germany

E-mail: [email protected]

Phone: +1-925-422-39941 a r X i v : . [ c ond - m a t . m t r l - s c i ] F e b bstract Density Functional Tight Binding (DFTB) is an attractive method for accelerated quan-tum simulations of condensed matter due to its enhanced computational eﬃciency over stan-dard Density Functional Theory approaches. However, DFTB models can be challenging todetermine for individual systems of interest, especially for metallic and interfacial systemswhere diﬀerent bonding arrangements can lead to signiﬁcant changes in electronic states.In this regard, we have created a rapid-screening approach for determining systematicallyimprovable DFTB interaction potentials that can yield transferable models for a variety ofconditions. Our method leverages a recent reactive molecular dynamics force ﬁeld wheremany-body interactions are represented by linear combinations of Chebyshev polynomials.This allows for the eﬃcient creation of multi-center representations with relative ease, re-quiring only a small investment in initial DFT calculations. We have focused our workﬂowon TiH as a model system and show that a relatively small training set based on unit-cellsized calculations yields a model accurate for both bulk and surface properties. Our ap-proach is easy to implement and can yield accurate DFTB models over a broad range ofthermodynamic conditions, where physical and chemical properties can be diﬃcult to inter-rogate directly and there is historically a signiﬁcant reliance on theoretical approaches forinterpretation and validation of experimental results.2 Introduction

Atomistic computer simulations are frequently used as an independent route to determiningphysical and chemical properties for complex systems, including material synthesis, shockcompressed and highly strained conditions, and surface chemistry. These calculationscan provide predictions of physical properties and reaction mechanisms, which can makeexperiments more tractable by aiding in their interpretation and helping to narrow the num-ber of diﬀerent conditions to investigate. Accurate modeling of the breaking and formingof chemical bonds in condensed phases frequently requires the use of quantum theories suchas Kohn-Sham Density Functional Theory (DFT). DFT remains one of the most widelyused theoretical methods in condensed matter physics, computational chemistry, and ma-terials science for prediction of material properties and chemical reactivity, including phaseboundaries and thermal decomposition (e.g., Refs. 10–13). DFT calculations, though, re-quire immense computational eﬀort per simulation time step and consequently are usuallylimited to picosecond time scales and nanometer system sizes. In contrast, many processesof interest have properties that can span orders of magnitude larger scales, including but notlimited to large-scale carbon heterocycle synthesis, the rational design of 3D materials, and defect formation and grain boundary interactions in crystalline systems. Hence, thereis a great need to explore methods that can harness the accuracy of DFT while yieldingsubstantial improvements in computational speeds.Density Functional Tight Binding (DFTB) is a semi-empirical quantum theory basedon an expansion of the Kohn-Sham DFT total energy that holds promise as an alternateapproach for gas-phase studies as well as condensed phase systems. DFTB implementsa balance of approximate quantum mechanics and empirical functions that can allow forseveral orders of magnitude improvement in computational eﬃciency for a variety of materialsand mixtures. Diﬃculty can arise in creating new DFTB interaction potentials or improvingupon existing models due to the large number of hyper-parameters within the approach. Inparticular, the approximate Hamiltonian in DFTB can be sensitive to the conﬁning potential3hosen for the wavefunction and electron density (Section 2.2), and in general there doesnot exist a predeﬁned recipe for how to choose these parameters nor how to explore thatspeciﬁc phase space. The Hamiltonian parameters in turn are closely coupled to the empiricalrepulsive energy, which itself has a wide variety of options in terms of functional form anddata to be ﬁt. In addition, the repulsive energy is usually taken to be strictly pairwise(two-center), though a number of systems can require many-body terms as well for accuratepredictions.

The generation of training data can be problematic for some systems such ashigh-Z materials, where DFT calculations can be cumbersome due to the large Hamiltonianof these systems (due to the large number of valence electrons and high orbital angularmomenta) and the possible manifold of local minima in the electron density due to spin-polarization, correlation, and relativistic eﬀects. Thus, DFTB method development wouldbe holistically improved through a more automatic method for parameterization, wherecandidate models could be screened rapidly and eﬃciently, thereby allowing the user toquickly determine an optimal model for their speciﬁc needs.To this end, we propose a method for streamlining DFTB parameterization by leveragingthe recently developed Chebyshev Interaction Model for Eﬃcient Simulation (ChIMES). ChIMES is a many-body reactive force ﬁeld model for molecular dynamics (MD) simulationbased on linear combinations of Chebyshev polynomials. ChIMES MD models have beencreated for a number of systems under both unreactive and reactive conditions, includingmolten liquid carbon, water under ambient and high pressure-temperature conditions, high pressure C/O systems, and detonating energetic materials. The main advantageof the ChIMES formalism is that optimal parameters are determined extremely rapidly (e.g.,within minutes for this eﬀort) through solving a set of simultaneous linear equations. This incontrast to bond order methods (e.g., Refs. 33,34) and neural network force ﬁelds (e.g., Refs.17,35) that generally require non-linear optimization (e.g., Levenberg-Marquardt), makingdetermination of optimal parameter sets signiﬁcantly more labor intensive and uncertain dueto possible local minima in the search space. 4n this work, we use the metal TiH for our model determination. TiH has a number ofpossible applications areas, including in hydrogen storage alloys, and as a superconductingmaterial, a thermal energy storage material, an intermediary to produce ﬁne Ti metalpowders, and as a blending agent to create porous aluminum foams. In addition, TiH calculations are relatively tractable with DFT, facilitating thorough validation of any de-veloped DFTB parameter set. It exhibits face centered cubic (fcc) symmetry in its groundstate, with the (111) surface generally exhibiting the lowest surface energy (Figure 1).We begin with a brief discussion of our DFT calculations and the DFTB method, followedby details of the ChIMES formalism. We then experiment with diﬀerent options for theDFTB quantum potential energy and charge transfer terms, and indicate optimal choices forour system. Next, we proceed through our workﬂow for DFTB model selection, beginningwith the search for optimal values for the DFTB conﬁning potentials, followed by the samefor the ChIMES polynomials orders, and ﬁnally some discussion of optimal least-squaresﬁtting practices. In all cases, we deliberately use a relatively small DFT training set as asomewhat stringent test for our results, and to assess the possibility of ﬁtting to sparse data. All DFT calculations were performed with the Vienna ab initio Simulation Package (VASP) using projector-augmented wave function (PAW) pseudopotentials and the Perdew-Burke-Ernzerhof exchange correlation functional (PBE). We found our results to be con-verged with a planewave cutoﬀ of 400 eV and an energy convergence criteria of − eV, bothof which were used for the results reported here. Fourth order Methfessel-Paxton smearing was used with a value of 0.13 eV for all geometry and cell lattice optimizations in orderto ensure energy convergence without dependence on the electronic smearing temperature.The Mermin functional with the same electronic temperature was used for all MD calcu-5ations in order to avoid spurious forces due to possible negative occupation numbers fromthe Methfessel-Paxton approach. Brillouin Zone sampling for all TiH unit cell calculationswas performed with a × × k-point mesh, whereas we used a mesh of × × for32 formula unit (96 atom) bulk calculations. We used system sizes of 168 atoms/7 layers forthe (001) surface, 144 atoms/6 layers for the (011) surface, and 192 atoms/8 layers for the(111) surface, each with a vacuum of 20 Å and a k-point mesh of × × in the directionof the surface.Our DFT training set consisted of molecular dynamics simulations of unit cell conﬁg-urations (12 atoms total), run for 5 ps at 400 K with simulation cells initially optimizedto pressures of − , , , and GPa. All MD calculations were run in the constanttemperature and volume (

N V T ) ensemble with Nosé-Hoover thermostat chains and atimestep of 0.2 fs. The slightly elevated temperature and wide pressure range includingnegative pressure were chosen in order to yield a broad sampling of the underlying potentialenergy surface. Atomic forces and the diagonal of the stress tensor were then sampled fromMD conﬁgurations at ﬁxed time intervals of ∼ fs in order ensure conﬁgurations were asstatistically uncorrelated as possible. This yielded up to 30 MD snapshots for each pressure.In addition, in order to sample hyper- and hypo-coordinated conﬁgurations in the system,we included MD data for a unit cell with a single hydrogen interstitial or single vacancysite, each run for 5 ps. The starting conﬁguration for these defect simulations were takenfrom the ground-state stoichiometric unit cell, with a single H atom removed or added in anoctahedral interstitial site. This yielded a total of 153 unit cell-sized conﬁgurations for ourtraining set. The formalism for DFTB with self-consistent charges (SCC) has been discussed in detailelsewhere.

Brieﬂy, the method assumes neutral, spherically symmetric, atom-centeredcharge densities n and that the true charge density of the system is within a small pertur-6ation, i.e., n (r) = n (r) + δn . In its most typical form, a second-order Taylor expansion ofthe Kohn-Sham total energy in δn is employed to yield the following energy expression: E [ δn ] ≈ (cid:88) a f a (cid:68) ψ a (cid:12)(cid:12)(cid:12) ˆ T + ˆ V [ n ] (cid:12)(cid:12)(cid:12) ψ a (cid:69) + 12 (cid:90) (cid:90) (cid:48) (cid:18) δ E xc [ n ] δnδn (cid:48) + 1 | r − r (cid:48) | (cid:19) δnδn (cid:48) − (cid:90) n (cid:48) n | r − r (cid:48) | + E xc [ n ] − (cid:90) V xc [ n ] n + E II , (1)where we have employed a shorthand of (cid:82) = (cid:82) dr , (cid:82) (cid:48) = (cid:82) dr (cid:48) , n ( r ) = n , etc. The ﬁrst linein Equation 1 is the band structure energy: E BS [ n ] = (cid:88) a f a (cid:104) ψ a | H [ n ] | ψ a (cid:105) , (2)where the Hamiltonian H does not contain any charge transfer terms. E BS is thus computedvia sum over occupied electronic states determined from diagonalization of the approximateDFTB Hamiltonian. The Hamiltonian and overlap matrices used in E BS are determinedfrom pre-tabulated Slater-Koster (SK) tables derived from calculations with a minimal, non-orthogonal basis set.Prior to SK tabulation, both the electronic wave functions and electron density are sub-jected to separate conﬁning or compression potentials. Most commonly, the conﬁning po-tential has the general form of: V conf ( r ) = (cid:18) rR χ (cid:19) β . (3)The exponent β is usually set to a value of 2, and R χ = R ψ or R n , where R ψ corresponds towavefunction compression and R n to density compression. Our past experimentation withlarger values of β yielded negligible diﬀerence in the ﬁnal outcome, though this could beinvestigated systematically in future eﬀorts. The compression potentials force the wavefunc-7ion/electron density to zero at relatively large distances from the nuclei, which has beenshown to improve transferability of the SK tabulations. Choice of a short wavefunctioncompression improves transferability by removing the overly diﬀuse parts of the orbitals.However, this generally comes at the expense of the accuracy of the electron density atlonger distances, which can aﬀect the description of non-covalent interactions. Hence, thestandard convention is to compute the wavefunction and density compression separately,with R n > R ψ .The potential energy in the DFTB Hamiltonian is generally expressed in one of two forms.In its original formulation, the Hamiltonian is taken to be a superposition of potentialscentered on each atom, which for the purposes of this study we label as the “Trial 1” model: ˆ H ABµν Tr = (cid:68) φ Aµ (cid:12)(cid:12)(cid:12) ˆ T + ˆ V ( n A ) + ˆ V ( n B ) (cid:12)(cid:12)(cid:12) φ Bν (cid:69) (4)Here, A and B correspond to the atomic index, and µ and ν to an atomic orbital of spe-ciﬁc angular momentum. This is akin to an N-body expansion of the quantum mechanicalpotential energy that is truncated after the one-center or one-body terms, only.A more common DFTB Hamiltonian involves a superposition of electronic densities in-stead, which we label as the “Trial 2” model: ˆ H ABµν Tr = (cid:68) φ Aµ (cid:12)(cid:12)(cid:12) ˆ T + ˆ V ( n A + n B ) (cid:12)(cid:12)(cid:12) φ Bν (cid:69) (5)In this case, the electron density centered on atoms A and B is summed before evaluatingthe potential energy. This amounts to a two-body expansion, where the quantum mechanicalpotential energy includes both one and two-center terms.The second line from Equation 1 is the energy from second order charge ﬂuctuations: E = 12 (cid:90) (cid:90) (cid:48) (cid:18) δ E xc [ n ] δnδn (cid:48) + 1 | r − r (cid:48) | (cid:19) δnδn (cid:48) . (6) E is generally referred to as the Coulombic energy, and is evaluated self-consistently8sing the Hubbard U parameter. Hubbard U values for each atom type are precomputedwith the uncompressed basis set as the diﬀerence between the ionization energy and electronaﬃnity.In addition to these models, there exists a more recent DFTB energy expression, wherethird-order density ﬂuctuation terms are included in the Kohn-Sham energy expansion. This yields an additional charge-transfer term to Equation 1, namely, E = 16 (cid:90) (cid:90) (cid:48) (cid:90) (cid:48)(cid:48) (cid:18) δ E xc [ n ] δn δn (cid:48) δn (cid:48)(cid:48) (cid:19) × δn δn (cid:48) δn (cid:48)(cid:48) . (7)Evaluation of Equation 7 requires the derivative of the Hubbard U with respect to charge. This derivative can be determined numerically and for each angular momentum shell ofeach atom, separately. For our study, we include E only in conjunction with the densitysuperposition potential (Equation 5), and label this interaction model as “Trial 3”.Finally, the last line in Equation 1 is called the repulsive energy: E Rep = − (cid:90) n (cid:48) n | r − r (cid:48) | + E xc [ n ] − (cid:90) V xc [ n ] n + E II . (8)The nomenclature derives from the ion-ion repulsion term, E II , though it also containsHartree double-counting and exchange-correlation terms. In practice, E Rep is expressed asa short-ranged empirical function that covers ﬁrst coordination shell or bonded interactions,only. It’s parameters are usually ﬁt to reproduce DFT or experimental data, and it can beeither pair-wise or contain multi-center interactions.

All DFTB calculations discussed within this work were performed with the DFTB+code, using self-consistent charges (SCC) and charge convergence criteria of . × − eV ( − au). Inclusion of an external van der Waals correction is beyond the scope ofour present study. We have performed “shell-resolved” SCC calculations, where separateHubbard U parameters were determined for each orbital angular momentum shell. Hubbard U derivatives were computed numerically for Trial 3 calculations, where we determined values9f − . , − . , and − . for the s , p , and d -orbital interactions for Ti and a valueof − . for the s -orbital for H. We have also applied an exponential charge damping factorfor Ti-H interactions of 3.5, similar to previous eﬀorts. The properties investigated herewere largely insensitive to choice of either Hubbard U derivative or damping factor values.The same electron thermal smearing and k-point mesh were used as described for our DFTcalculations for each system.The ChIMES training set was determined by computing DFTB forces ( F ) and diagonalstress tensor components ( σ ) for each conﬁguration with the chosen set of Hamiltonian pa-rameters (i.e., { R ψ } , { R n } , Trial 1, 2, or 3 interaction) with zero values for those componentsfrom E Rep . These “repulsive energy free” results were then subtracted from the DFT valuesfor those quantities, i.e., F τ ∗ Rep αi = F τ DFT αi − F τ QM , DFTB αi σ τ ∗ Rep αα = σ τ DFT αα − σ τ QM , DFTB αα Here, τ corresponds to a speciﬁc MD conﬁguration, α to the cartesian direction, and i is theatomic index. The ‘*’ is used to denote that the quantities being computed are part of thetraining set, and ‘QM,DFTB’ refers to the quantum components of the DFTB calculation,i.e., only E BS and E Coul . Inclusion of conﬁgurational total energies generally resulted inminimal impact on quality of the E Rep ﬁt and were thus excluded from our training data,similar to previous eﬀorts.

The design philosophy behind ChIMES involves mapping quantum mechanical energies ontolinear combinations of many-body Chebyshev polynomials of the ﬁrst kind. Chebyshev poly-nomials of the ﬁrst kind have a number of desirable properties for creation of interatomicpotential energy surfaces, including: (i) they are orthogonal (with respect to a weightingfunction) and can be generated recursively, allowing for basis set completeness and user de-10ned complexity, (ii) higher order polynomials tend to have decreasing expansion coeﬃcientvalues (due to their monic form), and (iii) they are “nearly optimal” (the error in an ex-pansion will closely resemble a minimax polynomial). In addition, derivatives of Chebyshevpolynomials of the ﬁrst kind are are related to Chebyshev polynomials of the second kind,which themselves are orthogonal and can be generated recursively. This allows for easy andreliable determination of forces and stress tensor components for atomistic calculations.Brieﬂy, the ChIMES total energy corresponds to an n -body expansion: E n B = n a (cid:88) i E i + n a (cid:88) i >i E i i + n a (cid:88) i >i >i E i i i + · · · + n a (cid:88) i >i ... i n B − >i n B n B E i i ...i n B , (9)where E n B is the total ChIMES system energy, n B is the maximum bodiedness, n E i i ... i n isthe n -body ChIMES energy for a given set of atoms with indices i = { i , i , . . . , i n } , and n a is the total number of atoms in the system. The one-body energies, E i , correspond tothe atomic energy constants for each element type.The two-body (pairwise) energies are expressed as linear combinations of Chebyshevpolynomials of the ﬁrst kind: E i i = f p ( r i i ) + f e i e i c ( r i i ) O (cid:88) m =1 C e i e i m T m ( s e i e i i i ) (10)In this case, T m (cid:0) s e i e i i i (cid:1) represents a Chebyshev polynomial of order m , and s e i e i i i is the pairdistance transformed to occur over the interval [ − , using a Morse-like function (SeeRef. 27 for details). Here, s e i e i i i ∝ exp ( − r i i /λ e e ) and λ e e is an element-pair distancescaling constant, usually taken to be the peak position of the ﬁrst coordination shell. C e i e i m is the corresponding permutationally invariant coeﬃcient for the interaction between atomtypes e i and e i , taken from the set of all possible element types, { e } . The term f e i e i c ( r i i ) is a Tersoﬀ cutoﬀ function which is set to zero beyond a maximum distance deﬁned for agiven { e , e } pair set. In order to prevent sampling of r i i distances below what is sampled11n our DFT training set, we introduce use of a smooth penalty function f p ( r i i ) . We referthe reader to previous work for additional details. We can now create a greater than two-body orthogonal basis set by taking products ofthe (cid:0) n (cid:1) unique constituent pairwise polynomials of the higher order terms. In other words,a three-body term has (cid:0) (cid:1) = 3 pairs, which yields the following expression for the ChIMESthree-body energy: E i i i = f e i e i c ( r i i ) f e i e i c ( r i i ) f e i e i c ( r i i ) O (cid:88) m =0 O (cid:88) p =0 O (cid:88) q =0 (cid:48) C e i e i e i mpq T m (cid:0) s e i e i i i (cid:1) T p (cid:0) s e i e i i i (cid:1) T q (cid:0) s e i e i i i (cid:1) . (11)We take a triple sum for the i i , i i , and i i polynomials over the hypercube up to O ,and include a single permutationally invariant coeﬃcient for each set of powers and atomtypes, C e i e i e i mpq . We use the primed sum to denote that only terms for which two or moreof the m, p, q polynomial powers are greater than zero are included in order to guaranteethat three distinct atom-centers are evaluated. The expression for E i i i also contains the f c smoothly varying cutoﬀ functions for each constituent pair distance. Penalty functionsare not included in this case and instead are handled entirely by the two-body interaction.Similarly, the four-body energy can be written as a product of the (cid:0) (cid:1) = 6 unique pairwiseinteractions: E i i i i = f e i e i c ( r i i ) f e i e i c ( r i i ) f e i e i c ( r i i ) f e i e i c ( r i i ) f e i e i c ( r i i ) f e i e i c ( r i i ) × O (cid:88) m =0 O (cid:88) p =0 O (cid:88) q =0 O (cid:88) r =0 O (cid:88) s =0 O (cid:88) t =0 (cid:48) C e i e i e i e i mpqrst T m ( s e i e i i i ) T p ( s e i e i i i ) T q ( s e i e i i i ) T r ( s e i e i i i ) T s ( s e i e i i i ) T t ( s e i e i i i ) . (12)Again, the C e i e i e i e i mpqrst correspond to permutationally invariant coeﬃcients of linear combi-nation. In this case, the primed sum indicates that three or more of the m, p, q, r, s, t polynomial orders are greater than zero. Thus far we have limited our studies to include12p to four-bodied interactions. However, the practice of generating higher-bodied termsthrough multiplication of the constituent pairwise polynomials can be generalized to includean (cid:0) n (cid:1) -multiple sum over the n -body hypercube, which is the subject of future work.Optimal ChIMES parameters (the coeﬃcients of linear combination) can then readilybe determined through the overdetermined matrix equation AC = B rep . The matrix A corresponds to the values of the requisite polynomials for a given training conﬁguration, orin other words the derivatives with respect to the ﬁtting coeﬃcients. The column vectors C and B rep correspond to the linear ChIMES coeﬃcients and the numerical values for thetraining set, respectively. This linear least-squares optimization problem can be solved forwith any number of algorithms (Section 3.5). In order to test each of our DFTB interaction models (Trials 1–3), we have initially set Ticompression radii to R T iψ = 5 . au and R T in = 14 . au. These are similar to establishedvalues for titanium, and such as the matsci-0-2, trans3d-0-1, and tiorg-0-1 parametersets. Hydrogen compression radii were set to the same values as those for the miomod-hh-0-1parameter set, namely, R Hψ = 2 . au and R Hn = 3 . au. We then generated the SK tablesfor all interactions. For this section, we have computed single-point properties based onthe DFT optimized unit cell, e.g., without determining E Rep and performing a geometry orlattice optimization. In doing so, we can probe the quality of each interaction model throughdirect one-to-one comparison to results for DFT computed eigenstates without having to beconcerned with the quality of the repulsive energy parameterization.Analysis of the total electronic density of states (DOS) reveals that all three interactionmodels yield similar curve shapes relative to DFT in terms of peak energy positions andtheir relative heights (Figure 3). We have centered each DOS curve around its Fermi energy13 E Fermi = 0 ) in order to facilitate comparison. The Trial 1 interaction yields the closestcomparison to DFT in terms of the overall peak positions and heights, including eigenstatesup to ∼ ∼ E interaction was developed in part to improvedescriptions of polarizability in hydrogen-bonded systems, and consequently its eﬀect onTiH might more subtle than probed here. Regardless, for the remainder of this work wewill use the Trial 2 interaction model. E Rep optimization workﬂow

Given our choice of interaction potential of superposition of densities and second-order E ,we can now create a workﬂow for determination of optimal DFTB hyper-parameters (Fig-ure 2). This includes evaluation of R ψ and R n values, two-body or greater polynomial orderfor the ChIMES E Rep , and choice of linear optimization and regularization algorithms forcalculation of the ChIMES coeﬃcients. Our workﬂow is fast and eﬃcient, with the mostsigniﬁcant amount of time and CPU resources spent on computing the DFT training set.Each loop of our workﬂow has been automated, with the user being able to make decisionsfor how to proceed with hyper-parameter adjustments based on results for validation data.14ur initial DFT-MD simulations were run concurrently for 24 hours on four nodes eachof a Linux cluster, with thirty-six 2.1 GHz Intel Xeon E5-2695 v4 cores and 128 gigabytesmemory per node. This represented a usage of over 20,000 CPU hours. Generation of SKtables for chosen values of R ψ and R n run were on a single CPU and required approximatelyone hour of wall-clock time for each set. Calculation of the E Rep training set was run usingtwelve OpenMP threads, requiring ∼

30 minutes. Solving the linear least-squares problemfor a single ChIMES parameterization required less than one minute on a single CPU in allcases, allowing for extremely rapid screening of diﬀerent sets of parameters. All minimumand cutoﬀ radii for the ChIMES E Rep were set to include the ﬁrst coordination shell sampledin our training set, only: . ≤ r T iT i ≤ . Å and . ≤ r HT i ≤ . Å . We use values of λ T iT i = 3 . Å and λ HT i = 2 . Å for the Morse-like coordinate transforms. H-H repulsiveinteraction were not sampled in our training set and were thus takes from the miomod-hh-0-1parameter set.ChIMES parameterization was followed by calculation of validation data, which includedresults explicitly not present in our training data. The total validation set included the bulklattice constant and hydrogen vacancy energy, surface energies of the (001), (011) and (111)crystal facets, hydrogen adsorption energies on the (011) and (111) surfaces, and (011) and(111) surface and sub-surface hydrogen vacancy energies. The (001) surface constructed forour calculations contained a contained a net dipole ﬁeld due to an imbalance of H ions onthe top vs. bottom surface for the stoichiometric system, and as such we did not include itshydrogen surface adsorption and vacancy energies in our validation. Calculation of the entirevalidation set was performed using twelve OpenMP threads and required approximately onehour to complete. Thus, each design loop in our workﬂow that included generation of SKtables and E Rep training data took about two and half hours of wall clock time overall.15 .3 Sweep of R ψ and R n values For this eﬀort, we have computed SK tables using conﬁning radii values ranging from . ≤ R T iψ ≤ . au and . ≤ R T in ≤ . au. Hydrogen compression radii were held at thepreviously mentioned values. We ﬁnd that this range of radii spans those normally sampledin DFTB models for low-Z elements as well as transition metals. Smaller values of R T iψ and R T in resulted in some distortions to the electronic DOS without yielding additional accuracyfor our validation tests. For this section of our study, we used a constant ChIMES 2-bodypolynomial order of 12 and a 3-body order of 8, similar to previous work.

All optimizationdiscussed in here were performed with the LASSO/LARS algorithm with regularization of − (see Section 3.5 for details). In doing so, we are able to rapidly create a reasonable E Rep for our models, allowing us to assess the eﬀects of diﬀerent values of R T iψ and R T in independently.We ﬁnd that that there was small variation between results the root mean-square (RMS)errors for the training set, as well as for validation of the bulk lattice constant, vacancyenergies, and (011) hydrogen surface adsorption energies. In addition, all models yieldedthe DFT-predicted surface energy ordering of E < E < E . Hence, we focus ourvalidation discussion on the E surface energy, the ( E /E ) ratio, and the relativeenergies of adsorption on the (111) Top and Hollow surface sites, (cid:16) E Top111 − E Holl111 (cid:17) .Our analysis reveals an approximate linear correlation between { R T iψ , R T in } in terms ofthe accuracy of the E energy (Figure 4, top panel). The most accurate results for the E surface energy appear to occur when the radii are approximately consistent with eachother (either low or high). Small values of R T iψ combined with long values of R T in tend toover-predict E , where { R T iψ = 3 . , R T in = 17 . } predicts a surface energy that is ∼ R T iψ combined with smaller values of R T in tend tounder-predict, where { R T iψ = 5 , R T in = 6 . } yields a surface energy that is ∼

11% too low.In contrast, all of models created here under-predict ( E /E ) relative to our DFTcalculations (Figure 4, middle panel). We observe ratios of 1.44 or so for small R T iψ that16onotonically decrease to values of ∼ R T iψ is increased, whereas the DFT predictedvalue is 1.70. Our results indicate a much smaller dependence on choice of R T in , where alteringits value creates little change in the ratio for a given R T iψ . We note that there can be strongdependence of the surface energies on choice of DFT functional (e.g., Ref. 73), although therelative energetic ordering tends to be consistent.Finally, our models yield a variety of results for (cid:16) E Top111 − E Holl111 (cid:17) (Figure 4, bottom panel),where DFT results yield top site adsorption energies that are 0.68 eV higher than that forthe hollow site. We found that predicting this energy diﬀerence was a particularly diﬃcultchallenge for our modeling eﬀort. This is in part likely because Top site adsorption yieldsa hyper-coordinated Ti surface atom with H chemisorption, whereas Hollow site adsorptionyields a more energetically favorable physisorption interaction with the H ion at larger dis-tances from the surrounding surface Ti atoms. For this set of ChIMES parameters, we ﬁndthat large R T iψ and R T in tend to reverse the energetic ordering of adsorption on these sites,with a top site energy that is 0.1 eV lower than that for the hollow site. In contrast, theresults for R T in = 6 . generally yielded the best agreement with DFT. These results showmonotonic decreases with increasing R T iψ , with a value of 0.20 eV for R T iψ = 3 . and a valueof 0.13 eV for R T iψ = 5 . .Based on these results, we choose to proceed with our SK tables generated from { R T iψ =3 . , R T in = 6 . }. We ﬁnd this combination yields a nearly identical E to DFT with avalue of 0.080 eV/Å , and provides a relatively high ( E /E ) ratio of 1.41 for this set ofChIMES parameters. In addition, this model provides a reasonable prediction for the topand hollow site adsorption energies, with an energy diﬀerence of 0.17 eV. This model set waschosen over { R T iψ = 3 . , R T in = 6 . } due to the similarity in accuracy between these sets andin order to avoid any unforeseen distortions to the electronic states due to small conﬁningradii. 17 .4 Sweep of ChIMES polynomial orders Given our choice of { R T iψ = 3 . , R T in = 6 . }, we can now experiment with ChIMES polynomialorders in order to reﬁne our model further. In this case, we have looped over our workﬂowby sampling the 2-body order a range of ≤ O ≤ and the 3-body order over a rangeof ≤ O ≤ . Once again, we present results from LASSO/LARS optimization withregularization of − . Each design loop in this case required only ∼ E Rep training data generation were complete), almost all of which was spent on validationcalculations.Our results indicate weak correlation between choice of 2-body and 3-body polynomialorder (Figure 5). Inspection of results from a single 3-body order (e.g., O = 4 ) indicatesonly small changes in any of the validation results presented here with varying values forthe 2-body order. We ﬁnd that 2-body only repulsive energies (with O = 0 ) yield someimproved accuracy in terms of (cid:16) E Top111 − E Holl111 (cid:17) , where we observe values of 0.25 eV for { O =8 , O = 0 } and 0.28 eV for { O = 16 , O = 0 }. However, these 2-body only repulsiveenergies yield E surface energies that are consistently over 30% too high and an ( E /E ) ratio that is over 20% too small (e.g., a deviation of 0.020 eV/Å and a ratio of 1.35 for{ O = 8 , O = 0 }). In contrast, we ﬁnd that increasing the 3-body orders to non-zerovalues yields substantially improved E prediction and a somewhat improved ( E /E ) ratio (a deviation of < and ratio of 1.43 for { O = 8 , O = 4 }). However,large 3-body orders result in somewhat worse values for (cid:16) E Top111 − E Holl111 (cid:17) , with values of 0.16eV for all O = 16 parameter sets.Lastly, in order to examine the possibility of including higher-body interactions in E Rep ,we have determined a ChIMES parameterization which includes a 4-body order of four, i.e.,{ O = 8 , O = 4 , O = 4 }. We ﬁnd that this ChIMES model yields virtually identicalresults to those from { O = 8 , O = 4 }. For example, both E Top111 and E Holl111 agree betweenboth models within 0.01 eV. This is in part due to the fact that the vast majority of the4-body ChIMES parameters are small and consequently are set to a value of zero by the18ASSO/LARS algorithm. This implies that the majority of interactions encompassed in therepulsive energy (for this system, at least) are three-center, only, which can provide guidancefor future E Rep determinations.As a result, we opt to proceed with a set of { O = 8 , O = 4 } for this part of oureﬀort. This set yields a value of E = 0 . , which is nearly identical to the DFT result.In addition, we observe values of ( E /E ) = 1 . and (cid:16) E Top111 − E Holl111 (cid:17) = 0 . eV. Thesevalues are similar to those from { O = 12 , O = 8 }, but yield slightly improved agreementto DFT results overall (1.70 and 0.68 eV). In addition, the lower polynomial orders helpdecrease the risk of overﬁtting that might not be observed with the current validation set. Given our choices of { R T iψ , R T in } and { O , O }, we now explore several diﬀerent optionsfor the solving the linear least-squares and regularization to obtain the ChIMES coeﬃcients.For these tests, we have used the Singular Value Decomposition (SVD) and Least-AngleRegression (LARS) with LASSO regularization (abbreviated as LASSO/LARS). We nowoﬀer a brief discussion of each method and leave details to the pertinent references.SVD solves for optimal ﬁtting coeﬃcients directly by performing an eigendecompositionof the generally rectangular A matrix and computing its pseudo-inverse. Regularizationcan be performed by setting singular values (eigenvalues of the square matrix in the SVDdecomposition) with an absolute value below a given threshold to zero. In our work, wetake this parameter to be D max (cid:15) , where D max is the maximum singular value of A and (cid:15) isa factor below a value of one.LARS is a type of forward step-wise or iterative regression. In this case, all modelcoeﬃcients are initialized to zero and the covariate (i.e., polynomial values) most correlatedto the error residual is determined (i.e., those having the most signiﬁcant impact on theﬁt). The corresponding ChIMES parameter is modiﬁed incrementally to minimize the errorresidual until a second covariate yields an equal correlation. At this point, it is included19n the active parameter set and both coeﬃcients are modiﬁed simultaneously. The processcontinues until all coeﬃcients are included in the solution, at which point a result equivalentto ordinary least squares ﬁtting is obtained. In practice, LARS steps using a subset of allpossible parameters are chosen.LASSO is an L1-norm regularization method whereby regularization is based on thesum of the absolute values of the ﬁtting coeﬃcients, which has the eﬀect of shrinking asubset of parameters to zero. In this case, the objective function F obj is minimized with thefollowing additional constraint: F LASSOobj = F obj + 2 α n i (cid:88) i =1 | c i | . (13)Here, n i is the total number of unique ﬁtting parameters, c i . The parameter α regularizes themagnitude of the ﬁtting coeﬃcients, which reduces possible overﬁtting. The LASSO methodcan be implemented as a variant of LARS where parameters are either added or removedat each solution stage. We ﬁnd the LASSO variant of LARS to be numerically stable forill-conditioned A matrices, which are often found in force matching.Our exploration spanned SVD (cid:15) values between − ≤ (cid:15) ≤ − and LASSO/LARS α values between − ≤ α ≤ − (Figure 6). SVD optimization with (cid:15) = 10 − resulted inunstable models and are subsequently not included in our discussion. Results for E showlittle dependence on choice of algorithm or regularization parameter. Both optimizationalgorithms yield very similar values for the E surface energy, with regularization of − showing the best agreement with DFT and LASSO/LARS showing near perfect accuracy.In contrast, the other properties discussed here show some variation between algorithmsfor regularization of − . SVD yields a somewhat lower value than LASSO/LARS for the ( E /E ) ratio (1.418 vs. 1.431) and a slightly higher value for (cid:16) E Top111 − E Holl111 (cid:17) (0.21 vs.0.19 eV). The agreement overall between algorithms is improved for smaller regularizationparameters. LASSO/LARS optimization with α = 10 − can appear to be an appealingchoice because this parameterization maximizes (cid:16) E Top111 − E Holl111 (cid:17) . However, this model yields20elatively large errors in the TiH lattice constant, (4.311 Å, compared to a value of 4.440 Åfrom α = 10 − and 4.417 Å from DFT) and E surface energy (0.092 eV/Å , comparedto values of 0.080 eV/Å from α = 10 − and DFT). Hence, we choose to proceed withLASSO/LARS optimization with α = 10 − as the best choice for our TiH model, thoughSVD optimization with (cid:15) = 10 − is nearly equivalent. Our ﬁnal set of hyper-parameter values includes { R T iψ = 3 . , R T in = 5 . } and { O = 8 , O = 4 }, optimized with LASSO/LARS and regularization of α = 10 − . This model yieldsRMS errors of 1.76 eV/Å for hydrogen forces, 1.35 eV/Å for titanium forces, and 0.35 GPafor the stress tensor diagonal. We now discuss results for our full set of validation data. Forthe remainder of our discussion, we refer to our optimal model as “DFTB/ChIMES”.Results for bulk properties (Table 2) indicate that DFTB/ChIMES yields a lattice con-stant with errors of only ∼ respectively. How-ever, our model yields a hydrogen bulk vacancy energy ( E vac ) that is ∼ improved descriptions of charge transfer, or inclusion of multi-center terms in the DFTBHamiltonian, which is the subject of future work.We ﬁnd our model yields accurate surface energies for all three low-index facets investi-gated in this study (Table 3). In particular, the E and E values are nearly identical tothose from DFT. The E value from DFTB/ChIMES is around 17% lower than than thatfor our DFT calculations (0.114 vs. 0.136 eV/Å ). However, this could be due in part to theinternal electric ﬁeld on the (001) surface conﬁguration studied here. DFTB generally canunderestimate surface electrostatic interactions due to its determination of atom-centeredpoint charges only in Coulombic interactions. E vac is 0.18 eV higher than that of the bulk, followed by a 0.57 eV dropin value in the second layer relative to the ﬁrst. This is in accordance with our DFT results,which indicates an increase of 0.34 eV for the ﬁrst layer relative to the bulk followed by arelative decrease of 0.51 eV in the second layer. Both DFTB/ChIMES and DFT indicatethat the second layer E vac is lower than that of the bulk (0.40 eV for DFTB/ChIMES and0.17 eV for DFT), and that the bulk E vac is recovered by the third surface layer. A value of0.5 eV could be added as a correction factor to vacancy calculations on these systems.Our DFTB/ChIMES results show similarly strong agreement with hydrogen surface ad-sorption energies (Figure 1 and Table 5). We observe the correct energetic ordering ofadsorption on the (111) Top and Hollow sites, though the Hollow site energy is 0.35 eVsmaller than that from DFT. We see similar agreement with DFT for the (011) surface.Here, DFTB/ChIMES show close agreement for Top site adsorption with a diﬀerence ofonly 0.05 eV from DFT. Our model yields Bridge-1 and Bridge-2 adsorption energies thatdiﬀer from DFT by 0.29 eV and 0.21 eV, respectively, and incorrectly predicts that the Topsite is the lowest energetically of the three. However, these values are similar in energy forall surface sites and we have overall favorable agreement.Finally, we have computed the vibrational density of states (VDOS) from MD calculationsof bulk ground-state TiH at 400 K (Figure 7). We have made comparison to DFTB/ChIMESresults using the unit cell and k-point mesh of × × , as well as comparison to simulationof a × × supercell (216 atoms total) using a × × k-point mesh. Our results indicate22trong agreement between DFTB/ChIMES and DFT for the manifold of both Ti-Ti andH-Ti vibrational modes. In particular, we see a close match between relative peak heightsand positions for the DFT and DFTB/ChIMES unit cell simulations at frequencies lessthan 400 cm − and greater than 1000 cm − , including the small shoulder at ∼ − .We observe some small shifting of peaks when comparing the DFTB/ChIMES unit celland replicated supercell results, though these changes are generally < − and are thusenergetically quite small (i.e., < − eV). Overall, we ﬁnd our DFTB/ChIMES model toyield robust agreement with DFT, allowing for MD calculations with larger system sizes thatcould incorporate more realistic defect concentrations and/or grain boundaries. In this work, we have created a workﬂow for DFTB model creation that allows for nearlyexhaustive search for optimal Hamiltonian parameters and repulsive energies. Our eﬀortsleverage the ChIMES reactive force-ﬁeld, based on linear combinations of Chebyshev poly-nomials for the determination of the DFTB repulsive energy. Optimal ChIMES parameterscan be determined extremely rapidly to create many-body interaction potentials, which fa-cilitates testing of any number of DFTB related parameters. Each design loop is run in asemi-automated fashion, allowing for a systematic exploration of the manifold of options forthe conﬁning potential radii, polynomial order and bodied-ness for the ChIMES potential,and type of least-squares algorithm or regularization to solve for the ChIMES parameters. Indoing so, we are able to accelerate the down selection of DFTB models for a given application,removing the need for ad hoc parameter choices.Our results indicate DFTB/ChIMES models can be accurately determined based onrelatively small training data (unit cell MD calculations in this work), even for physicallycomplex systems such as those containing surface chemistry. Further reﬁnement of our TiH model could involve inclusion of training data from additional phases and thermodynamic23tate points, and systematic expansion of the training set is the subject of future work.Regardless, our current eﬀort yields accurate results for bulk and surface properties, includingsurface energies, hydrogen adsorption energies, and hydrogen vacancy energies in the bulkand in diﬀerent surface layers. The small training set could yield signiﬁcant advantages forcomputationally challenging systems such as magnetic materials and their interfaces, whereDFT data is limited and diﬃcult to generate. Overall, our DFTB/ChIMES approach canhave particular impact on myriad of research areas, such as interpretation of imaging andspectroscopy experiments on bulk and interfacial systems, where there is traditionally astrong coupling with atomistic simulation approaches. Acknowledgments

This work was performed under the auspices of the U.S. Department of Energy by LawrenceLivermore National Laboratory under Contract DE-AC52-07NA27344. Computations wereperformed at LLNL using the borax, boraxo, quartz, rztrona, and rztopaz massively par-allel computers. Projects 18-SI-001 with Jason Jeﬀries as PI and 20-SI-004 with BrandonWood as PI were funded by the Laboratory Directed Research and Development Programat LLNL. The ChIMES force ﬁeld code is available for download at https://github.com/rk-lindsey/chimes_calculator (last accessed on February 7th, 2021). The DFG grant RTG 2247is acknowledged. ChIMES potential parameters are available upon request.24 eferences (1) Cannella, C.; Goldman, N. Carbyne ﬁber synthesis within evaporating metallic liquidcarbon.

J. Phys. Chem. C , , 21605–21611.(2) Goldman, N.; Tamblyn, I. Prebiotic chemistry within a simple impacting icy mixture. J. Phys. Chem. A , , 5124 – 5131.(3) Armstrong, M. R.; Zaug, J. M.; Goldman, N.; Kuo, I.-F. W.; Crowhurst, J. C.;Howard, W. M.; Carter, J. A.; Kashgarian, M.; Chesser, J. M.; Barbee, T. W.; Bastea, S.Ultrafast Shock Initiation of Exothermic Chemistry in Hydrogen Peroxide. J. Phys.Chem. A , , 13051.(4) Koziol, L.; Goldman, N. Prebiotic hydrocarbon synthesis in impacting reduced astro-physical icy mixtures. Astrophys. J. , , 91–98.(5) Koziol, L.; Fried, L. E.; Goldman, N. Using Force Matching To Determine ReactiveForce Fields for Water under Extreme Thermodynamic Conditions. J. Chem. TheoryComput. , , 135–146.(6) Kroonblawd, M. P.; Lindsey, R. K.; Goldman, N. Synthesis of Nitrogen-ContainingPolycyclic Aromatic Hydrocarbons in Impacting Glycine Solutions. Chemical Science , , 6091.(7) Steele, B. A.; Goldman, N.; Kuo, I.-F. W.; Kroonblawd, M. P. Mechanochemical syn-thesis of glycine oligomers in a virtual rotational diamond anvil cell. Chem. Sci. , , 7760–7771.(8) Goldman, N.; Browning, N. D. Gold Cluster Diﬀusion Kinetics on Stoichiometric andReduced Surfaces of Rutile TiO (110). J. Phys. Chem. C , , 11611–11617.(9) Goldman, N.; Morales, M. A. A First-Principles Study of Hydrogen Diﬀusivity and25issociation on δ -Pu (100) and (111) Surfaces. J. Phys. Chem. C , , 17950–17957.(10) Mundy, C. J.; Curioni, A.; Goldman, N.; Kuo, I.-F.; Reed, E.; Fried, L. E.; Ianuzzi, M.Ultarfast transformation of graphite into diamond: An ab initio study of graphite undershock compression. J. Chem. Phys. , , 184701.(11) Goldman, N.; Reed, E. J.; Kuo, I.-F. W.; Fried, L. E.; Mundy, C. J.; Curioni, A. Abinitio simulation of the equation of state and kinetics of shocked water. J. Chem. Phys. , , 124517.(12) Goldman, N.; Reed, E. J.; Fried, L. E. Quantum Corrections to Shock Hugoniot Tem-peratures. J. Chem. Phys. , , 204103.(13) Goldman, N.; Reed, E. J.; Fried, L. E.; Kuo, I.-F. W.; Maiti, A. Synthesis of glycine-containing complexes in impacts of comets on early Earth. Nat. Chem. , , 949–954.(14) Manaa, M. R.; Reed, E. J.; Fried, L. E.; Goldman, N. Nitrogen-Rich Heterocycles asReactivity Retardants in Shocked Insensitive Explosives. JACS , , 5483–5487.(15) Sours, T.; Patel, A.; Norskov, J.; Siahrostami, S.; Kulkarni, A. Circumventing ScalingRelations in Oxygen Electrochemistry Using Metal-Organic Frameworks. The Journalof Physical Chemistry Letters , , 10029–10036.(16) Sliwa, M.; McGonegle, D.; Wehrenberg, C.; Bolme, C. A.; Heighway, P. G.; Higgin-botham, A.; Lazicki, A.; Lee, H. J.; Nagler, B.; Park, H. S.; Rudd, R. E.; Suggit, M. J.;Swift, D.; Tavella, F.; Zepeda-Ruiz, L.; Remington, B. A.; Wark, J. S. FemtosecondX-Ray Diﬀraction Studies of the Reversal of the Microstructural Eﬀects of Plastic De-formation during Shock Release of Tantalum. Phys. Rev. Lett. , , 265502.2617) Stöhr, M.; Medrano Sandonas, L.; Tkatchenko, A. Accurate Many-Body RepulsivePotentials for Density-Functional Tight Binding from Deep Tensor Neural Networks. The Journal of Physical Chemistry Letters , , 6835–6843.(18) Kroonblawd, M. P.; Pietrucci, F.; Saitta, A. M.; Goldman, N. Generating ConvergedAccurate Free Energy Surfaces for Chemical Reactions with a Force-Matched Semiem-pirical Model. J. Chem. Theory Comput. , , 2207–2218.(19) Kroonblawd, M. P.; Goldman, N. Mechanochemical formation of heterogeneous dia-mond structures during rapid uniaxial compression in graphite. Phys. Rev. B , , 184106.(20) Kroonblawd, M. P.; Goldman, N.; Lewicki, J. P. Chemical Degradation Pathways inSiloxane Polymers Following Phenyl Excitations. The Journal of Physical Chemistry B , , 12201–12210.(21) Kroonblawd, M. P.; Goldman, N.; Lewicki, J. P. Anisotropic Hydrolysis Susceptibilityin Deformed Polydimethylsiloxanes. The Journal of Physical Chemistry B , ,7926–7935.(22) Goldman, N.; Koziol, L.; Fried, L. E. Using force-matched potentials to improve theaccuracy of density functional tight binding for reactive conditions. J. Chem. TheoryComput. , , 4530–4535.(23) Kullgren, J.; Wolf, M. J.; Hermansson, K.; Köhler, C.; Aradi, B.; Fauenheim, T.;Broqvist, P. Self-Consistent-Charge Density-Functional Tight-Binding (SCC-DFTB)Parameters for Ceria in 0D to 3D. J. Phys. Chem. C , , 4593–4607.(24) Goldman, N.; Fried, L. E. Extending the Density Functional Tight Binding Method toCarbon Under Extreme Conditions. J. Phys. Chem. C , , 2198–2204.2725) Goldman, N.; Srinivasan, S. G.; Hamel, S.; Fried, L. E.; Gaus, M.; Elstner, M. Deter-mination of a density functional tight binding model with an extended basis set andthree-body repulsion for carbon under extreme pressures and temperatures. J. Phys.Chem. C , , 7885 – 7894.(26) Srinivasan, S. G.; Goldman, N.; Tamblyn, I.; Hamel, S.; Gaus, M. Determination ofa density functional tight binding model with an extended basis set and three-bodyrepulsion for hydrogen under extreme thermodynamic conditions. J. Phys. Chem. A , , 5520–5528.(27) Lindsey, R. K.; Fried, L. E.; Goldman, N. ChIMES: A Force Matched Potential withExplicit Three-Body Interactions for Molten Carbon. J. Chem. Theory Comput. , , 6222–6229.(28) Lindsey, R. K.; Fried, L. E.; Goldman, N.; Bastea, S. Active learning for robust, high-complexity reactive atomistic simulations. J. Chem. Phys. , , 134117.(29) Lindsey, R. K.; Fried, L. E.; Goldman, N. Application of the ChIMES Force Field toNonreactive Molecular Systems: Water at Ambient Conditions. Journal of ChemicalTheory and Computation , , 436–447.(30) Armstrong, M. R.; Lindsey, R. K.; Goldman, N.; Nielsen, M. H.; Stavrou, E.;Fried, L. E.; Zaug, J. M.; Bastea, S. Ultrafast shock synthesis of nanocarbon froma liquid precursor. Nature Communications , , 353.(31) Lindsey, R. K.; Goldman, N.; Fried, L. E.; Bastea, S. Many-body reactive force ﬁelddevelopment for carbon condensation in C/O systems under extreme conditions. J.Chem. Phys. , , 054103.(32) Pham, C. H.; Lindsey, R. K.; Fried, L. E.; Goldman, N. Calculation of the detonationstate of HN with quantum accuracy. J. Chem. Phys. , , 224102.2833) Tersoﬀ, J. Empirical interatomic potential for carbon, with application to amorphous-carbon. Phys. Rev. Lett. , , 2879.(34) Nomura, K.; Kalia, R. K.; Nakano, A.; Vashishta, P.; van Duin, A. C. T. Mechanochem-istry of shock-induced nanobubble collapse near silica in water. Appl. Phys. Lett. , , 073108.(35) Behler, J. Perspective: Machine learning potentials for atomistic simulations. J. Chem.Phys. , , 170901.(36) Kitabayashi, K.; Edalati, K.; Li, H.-W.; Akiba, E.; Horita, Z. Phase Transformationsin MgH -TiH Hydrogen Storage System by High-Pressure Torsion Process.

AdvancedEngineering Materials , , 1900027.(37) Shanavas, K. V.; Lindsay, L.; Parker, D. S. Electronic structure and electron-phononcoupling in TiH . Scientiﬁc Reports , , 28102.(38) Sheppard, D. A.; Paskevicius, M.; Humphries, T. D.; Felderhoﬀ, M.; Capurso, G.;Bellosta von Colbe, J.; Dornheim, M.; Klassen, T.; Ward, P. A.; Teprovich, J. A.;Corgnale, C.; Zidan, R.; Grant, D. M.; Buckley, C. E. Metal hydrides for concentratingsolar thermal power energy storage. Applied Physics A , , 395.(39) Kovalev, D. Y.; Ratnikov, V. K. P. V. I.; Ponomarev, V. I. Thermal Decomposition ofTiH : A TRXRD Study. International Journal of Self Propagating High TemperatureSynthesis , , 253–257.(40) Sandim, H. R. Z.; Morante, B. V.; Suzuki, P. A. Kinetics of Thermal Decomposition ofTitanium Hydride Powder Using in situ High-temperature X-ray Diﬀraction (HTXRD).

Materials Research , , 293–297.(41) Peng, Q.; Yang, B.; Liu, L.; Song, C.; Friedrich, B. Porous TiAl alloys fabricated by29intering of TiH2 and Al powder mixtures. Journal of Alloys and Compounds , , 530–538.(42) Kresse, G.; Hafner, J. Ab initio molecular dynamics for liquid metals.

Phys. Rev. B , , 558–561.(43) Kresse, G.; Hafner, J. Ab initio molecular dynamics simulation of the liquid-metal-amorphous-semiconductor transition in germanium.

Phys. Rev. B , , 14251–14271.(44) Kresse, G.; Furthmüller, J. Eﬃcient iterative schemes for ab initio total-energy calcu-lations using a plane-wave basis set. Phys. Rev. B , , 11169–11186.(45) Blöchl, P. E. Projector augmented-wave method. Phys. Rev. B , , 17953–17979.(46) Kresse, G.; Joubert, D. From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B , , 1758–1775.(47) Perdew, J. P.; Burke, K.; Enzerhof, M. Generalized gradient approximation made sim-ple. Phys. Rev. Lett. , , 3865–3868.(48) Methfessel, M.; Paxton, A. T. High-precision sampling for Brillouin-zone integration inmetals. Phys. Rev. B , , 3616–3621.(49) Mermin, N. D. Thermal properties of the Inhomogenous Electron Gas. Phys. Rev. , , 1441–1443.(50) Nosé, S. Molecular Physics , , 255.(51) Hoover, W. G. Physical Review A , , 1695.(52) Martyna, G. J.; Klein, M. L.; Tuckerman, M. Nosé-Hoover chains: The canonical ensem-ble via continuous dynamics. The Journal of Chemical Physics , , 2635–2643.3053) Elstner, M.; Porezag, D.; Jungnickel, G.; Elsner, J.; Haugk, M.; Frauenheim, T.;Suhai, S.; Seifert, G. Self-consistent-charge density-functional tight-binding methodfor simulations of complex materials properties. Phys. Rev. B , , 7260–7268.(54) Koskinen, P.; Mäkinen, V. Density-functional tight-binding for beginners. Comp. Mater.Sci. , , 237–253.(55) Gaus, M.; Cui, Q.; Elstner, M. DFTB3: Extension of the Self-Consistent-ChargeDensity-Functional Tight-Binding Method (SCC-DFTB). J. Chem. Theory Comput. , , 931.(56) Aradi, B.; Hourahine, B.; Frauenheim, T. DFTB+, a sparse matrix-based implementa-tion of the DFTB method. J. Phys. Chem. A , , 5678–5684.(57) Hourahine, B.; Aradi, B.; Blum, V.; Bonafé, F.; Buccheri, A.; Camacho, C.; Ceval-los, C.; Deshaye, M. Y.; Dumitrică, T.; Dominguez, A.; Ehlert, S.; Elstner, M.; van derHeide, T.; Hermann, J.; Irle, S.; Kranz, J. J.; Köhler, C.; Kowalczyk, T.; Kubař, T.;Lee, I. S.; Lutsker, V.; Maurer, R. J.; Min, S. K.; Mitchell, I.; Negre, C.; Niehaus, T. A.;Niklasson, A. M. N.; Page, A. J.; Pecchia, A.; Penazzi, G.; Persson, M. P.; Řeziáč, J.;Sánchez, C. G.; Sternberg, M.; Stöhr, M.; Stuckenberg, F.; Tkatchenko, A.; Yu, V.W.-z.; Frauenheim, T. DFTB+, a software package for eﬃcient approximate densityfunctional theory based atomistic simulations. The Journal of Chemical Physics , , 124101.(58) Goldman, N. Multi-center semi-empirical quantum models for carbon under extremethermodynamic conditions. Chem. Phys. Lett. , , 128–136.(59) Christensen, A. S.; Kubař, T.; Cui, Q.; Elstner, M. Semiempirical Quantum Mechani-cal Methods for Noncovalent Interactions for Chemical and Biochemical Applications. Chemical Reviews , , 5301–5337.3160) Porezag, D.; Frauenheim, T.; Köhler, T.; Seifert, G.; Kaschner, R. K. Construction oftight-binding-like potentials on the basis of density functional theory: Application tocarbon. Phys. Rev. B , , 12947.(61) Gaus, M.; Goez, A.; Elstner, M. Parametrization and Benchmark of DFTB3 for OrganicMolecules. Journal of Chemical Theory and Computation , , 338–354.(62) Gaus, M.; Cui, Q.; Elstner, M. DFTB3: Extension of the Self-Consistent-ChargeDensity-Functional Tight-Binding Method (SCC-DFTB). J. Chem. Theory Comput. , , 931–948.(63) Goldman, N.; Aradi, B.; Lindsey, R. K.; Fried, L. E. J. Chem. Theory. Comput. , , 2652–2660.(64) Dantanarayana, V.; Nemetiaram, T.; Vong, D.; Anthony, J. E.; Troisi, A.; Cong, K. N.;Goldman, N.; Faller, R.; Moulé, A. J. Predictive Model of Charge Mobilities in Or-ganic Semiconductor Small Molecules with Force-Matched Potentials. J. Chem. TheoryComput. , , 3494–3503.(65) Rappé, A. K.; Casewitt, C. J.; Colwell, K. S.; III, W. A. G.; Skiﬀ, W. M. UFF,a Full Periodic Table Force Field for Molecular Mechanics and Molecular DynamicsSimulations. J. Am. Chem. Soc. , , 10024–10039.(66) Grimme, S.; Antony, J.; Ehrlich, S.; Krieg, H. A consistent and accurate ab initioparametrization of density functional dispersion correction (DFT-D) for the 94 elementsH-Pu. The Journal of Chemical Physics , , 154104.(67) Wang, Y.; Shepler, B. C.; Braams, B. J.; Bowman, J. M. Full-dimensional, ab initiopotential energy and dipole moment surfaces for water. J. Chem. Phys. , ,054511. 3268) Wang, Y.; Huang, X.; Shepler, B. C.; Braams, B. J.; Bowman, J. M. Flexible, ab initiopotential, and dipole moment surfaces for water. I. Tests and applications for clustersup to the 22-mer. J. Chem. Phys. , , 094509.(69) Luschtinetz, R.; Frenzel, J.; Milek, T.; Seifert, G. Adsorption of phosphonic acid at theTiO anatase (101) and rutile (110) surface. J. Phys. Chem. C , , 5730–5740.(70) Zheng, G.; Witek, H. A.; Bobadova-Parvanova, P.; Irle, S.; Musaev, D. G.; Prab-hakar, R.; Morokuma, K.; Lundberg, M.; Elstner, M.; Kohler, C.; Frauenheim, T.Parameter Calibration of Transition-Metal Elements for the Spin-Polarized Self-Consistent-Charge Density-Functional Tight-Binding (DFTB) Method: Sc, Ti, Fe, Co,and Ni. J. Chem. Theory Comput. , , 1349–1367.(71) Dolgonos, G.; Aradi, B.; Moreira, N. H.; Frauenheim, T. An Improved Self-Consistent-Charge Density-Functional Tight-Binding (SCC-DFTB) Set of Parameters for Simu-lation of Bulk and Molecular Systems Involving Titanium. J. Chem. Theory Comput. , , 266–278.(72) Goyal, P.; Qian, H.-J.; Irle, S.; Lu, X.; Roston, D.; Mori, T.; Elstner, M.; Cui, Q. Molec-ular Simulation of Water and Hydration Eﬀects in Diﬀerent Environments: Challengesand Developments for DFTB Based Models. The Journal of Physical Chemistry B , , 11007–11027.(73) Han, Y.; Lai, K. C.; Lii-Rosales, A.; Tringides, M. C.; Evans, J. W.; Thiel, P. A. Surfaceenergies, adhesion energies, and exfoliation energies relevant to copper-graphene andcopper-graphite systems. Surface Science , , 48–58.(74) Press, W.; Flannery, B. P.; Teukolsky, S. A.; Wetterling, W. T. Numerical Recipes ;Cambridge University Press: Cambridge, 1989.(75) Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least angle regression.

The Annalsof statistics , , 407–499. 3376) Friedman, J.; Hastie, T.; Tibshirani, R. Regularization paths for generalized linearmodels via coordinate descent. Journal of statistical software , , 1.(77) Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the RoyalStatistical Society: Series B (Methodological) , , 267–288.(78) Moser, D.; Bull, D. J.; Sato, T.; Noréus, D.; Kyoi, D.; Sakai, T.; Kitamura, N.; Yusa, H.;Taniguchi, T.; Kalisvaart, W. P.; Notten, P. Structure and stability of high pressuresynthesized Mg-TM hydrides (TM = Ti, Zr, Hf, V, Nb and Ta) as possible new hydrogenrich hydrides for hydrogen storage. J. Mater. Chem. , , 8150–8161.(79) Bodrog, Z.; Aradi, B. Possible improvements to the self-consistent-charges density-functional tight-binding method within the second order. Phys. Status Solidi B , , 259–269.(80) Podeszwa, R.; Jankiewicz, W.; Krzuś, M.; Witek, H. A. Correcting long-range electro-statics in DFTB. J. Chem. Phys. , , 234110.34able 1: Comparison of computed atomic charges for bulk TiH . Bader charges are computedfrom DFT whereas Mulliken charges are computed from our DFTB model Hamiltonians. Method q T i q H DFT +1.90 -0.95Trial 1 − . e − . e − Trial 2 +0.32 -0.16Trial 3 +0.430 -0.215

Table 2: TiH bulk properties. Method Lattice constant (Å) H vacancy energy (eV)DFTB/ChIMES 4.400 2.47DFT 4.417 3.00Expt. Table 3: TiH surface energies (in eV/Å ). Surface DFTB/ChIMES DFT111 0.080 0.080011 0.105 0.101001 0.114 0.136

Table 4: TiH surface vacancy energies (in eV). The “layer” label corresponds to the surfacelayer from which the H atom is removed. Surface Layer DFTB/ChIMES DFT111 1st 2.98 3.672nd 2.46 3.123rd 2.46 2.974th 2.49 3.02011 1st 2.65 3.342nd 2.07 2.833rd 2.54 2.994th 2.52 3.00

Table 5: Surface hydrogen adsorption energies on TiH surface sites (in eV). Surface Site DFTB/ChIMES DFT111 Top -1.888 -1.760Hollow -2.081 -2.440011 Top -2.383 -2.332Bridge-1 -2.154 -2.442Bridge-2 -2.132 -2.342 iH xo oo ooo ox x (111) surface (011) surfaceBulk Figure 1: Pictures of TiH bulk and surfaces. The left panel shows the bulk fcc lattice. Themiddle panel shows the (111) crystalline surface the Top (marked with an ‘O’) and Hollow(‘X’) adsorption sites indicated. The right panel shows the (011) crystalline surface with theTop (‘O’), Bridge-1 (‘ X ’) and Bridge-2 (‘ X ’) sites indicated.36 elect !", $ ! , $ " ; Create SKF files. Compute DFTB training set: ⃗& DFT − ⃗&

DFTB (no E Rep ) ( , DFT −( , DFTB (no E Rep ) Desired accuracy achieved?

Validation set: • Bulk: lattice const., VDOS, 1H and 2H vacancies. • (001), (011), and (111) surface energies • E ads on (011) and (111) surfaces (5 total) • (011) and (111) surface and subsurface H vacancy energies (8 total) Choose ChIMES 2B, 3B, 4B orders, cutoff radii and determine ) $%& . Yes NoComplete

Compute DFT-MD data

Figure 2: Flowchart for creation of DFTB E rep models through ChIMES force ﬁeld param-eterization. 37

10 -8 -6 -4 -2 0 2 4 6 800.040.080.120.16

Figure 3: Total electronic density of states for bulk face centered cubic TiH . The dashedcorresponds to the result from the Trial 1 Hamiltonian, the red line to Trial 2, the blue lineto Trial 3, and the shaded area to DFT. Each curve is centered around its Fermi energy.38 R T i n ( a u ) R Ti y (au) −0.2−0.15−0.1−0.05 0 0.05 0.1 0.15 F r ac ti on a l D e v i a ti on o f E R T i n ( a u ) R Ti y (au) −0.35−0.34−0.33−0.32−0.31−0.3−0.29−0.28−0.27−0.26 D ( E / E ) R T i n ( a u ) R Ti y (au) −0.2−0.15−0.1−0.05 0 0.05 0.1 0.15 0.2 0.25 0.3 F r ac ti on a l D e v i a it on o f ( E T op111 − E H o ll ) Figure 4: Results for sweep of values of R T iψ and R T in . The top panel corresponds to thefractional deviation of the surface energy, (cid:0) E DF T B − E DF T (cid:1) /E DF T , and the middle panel tothe deviation of ( E /E ) relative to DFT. The bottom panel corresponds to the fractionaldeviation of (cid:16) E Top111 − E Holl111 (cid:17) , where we have shifted the results so that positive values indicatethe correct ordering of E Top111 > E

Holl111 . A perfect matching with DFT in this case would yielda value of one. 39 B o r de r

2B order −0.35−0.3−0.25−0.2−0.15−0.1−0.05 0 0.05 F r ac ti on a l D e v i a ti on o f E B o r de r

2B order −0.35−0.34−0.33−0.32−0.31−0.3−0.29−0.28−0.27−0.26 D ( E / E ) B o r de r

2B order F r ac ti on a l D e v i a it on o f ( E T op111 − E H o ll ) Figure 5: Results for sweep of 2-body and 3-body ChIMES polynomial orders, with thesame deﬁnitions as Figure 4. 40 F r ac ti on a l D e v i a ti on o f E Regularization parameter value -0.284-0.282-0.28-0.278-0.276-0.274-0.272-0.27-0.268-0.266-0.264-0.262 ∆ ( E / E ) Regularization parameter value F r ac ti on a l D e v i a it on o f ( E T op111 - E H o ll ) Regularization parameter value

Figure 6: Results for sweep of diﬀerent linear least-squares optimization approaches, withthe same deﬁnitions as Figure 4. The purple bar corresponds to LASSO/LARS , and thegreen to SVD. Validation set calculations with an SVD regularization of − were unstableand hence are not included. 41

400 800 1200 160000.050.10.150.2

Frequency (cm -1 ) I n t e n s i t y Figure 7: Vibrational density of states calculated from an MD simulation of bulk TiH at 400 K. Unit cell simulations using DFT (black) and our DFTB/ChIMES model (red)are compared to results from the periodically replicated supercell (blue). Note that theoscillations at low frequency are artifacts of the Fourier transform for the small timescaleand system size of that simulation. 42 iH xo oo ooo ox x (111) surface (011) surfaceBulk Select DFTB parameters Compute DFTB training setDesired accuracy? Compute validation set Choose ChIMESparametersYes NoComplete Compute DFT