Machine Learning a Molecular Hamiltonian for Predicting Electron Dynamics
International Journal of Dynamics and Control
This is a pre-acceptance version of an article currently in the peer-review process.
Harish S. Bhat · Karnamohit Ranka · Christine M. Isborn
Received: date / Accepted: date
Abstract
We develop a computational method to learn a molecular Hamiltonian matrix from matrix-valued time series of the electron density. As we demonstrate for three small molecules, the resulting Hamiltonians can be used for electron density evolution, producing highly accurate results even when propagating 1000 time steps beyond the training data. As a more rigorous test, we use the learned Hamiltonians to simulate electron dynamics in the presence of an applied electric field, extrapolating to a problem that lies outside the field-free training data. We find that the electron dynamics predicted by our learned Hamiltonian are in close quantitative agreement with the ground truth. Our method combines a reduced-dimensional, linear statistical model of the Hamiltonian with a time discretization of the quantum Liouville equation within time-dependent Hartree-Fock theory. We train the model using a least-squares solver, avoiding numerous CPU-intensive optimization steps. For both field-free and field-on problems, we quantify training and propagation errors, highlighting areas for future development.
Harish S. Bhat
Applied Mathematics Department
University of California, Merced
5200 N. Lake Rd., Merced, CA 95343
E-mail: [email protected]

Karnamohit Ranka
Chemistry Department
University of California, Merced
5200 N. Lake Rd., Merced, CA 95343
E-mail: [email protected]

Christine M. Isborn
Chemistry Department
University of California, Merced
5200 N. Lake Rd., Merced, CA 95343
E-mail: [email protected]
Keywords electron dynamics · electron density · Hamiltonian · machine learning · system identification

1 Introduction

An intriguing new application of machine learning is to predict the dynamical electronic properties of a molecular system [8,16,19], which is essential to understanding phenomena such as charge transfer and response to an applied laser field. When discussing such electron dynamics predictions, we must start with the electronic time-dependent Schrödinger equation (TDSE):

\[
i\,\frac{d\Psi(\mathbf{r},t)}{dt} = \hat{H}(\mathbf{r},t)\,\Psi(\mathbf{r},t). \tag{1}
\]

Here Ĥ(r, t) is the electronic Hamiltonian operator that acts on the time-dependent many-body electronic wave function Ψ(r, t), where r represents the spatial and spin coordinates of all electrons. One can derive from (1) an evolution equation for the time-dependent density operator. This operator equation can be represented in a finite-dimensional basis, yielding a matrix system of ordinary differential equations:

\[
i\,\frac{d\mathbf{P}'(t)}{dt} = \left[ \mathbf{H}'(t), \mathbf{P}'(t) \right]. \tag{2}
\]

We call this the quantum Liouville-von Neumann equation. Boldface capital letters denote matrices, representations of operators in particular bases. Primes denote representations of operators in an orthonormal basis. Here P′(t) and H′(t) are time-dependent density and Hamiltonian matrices, respectively. The square brackets on the right-hand side denote a commutator; for matrices A and B, the commutator is [A, B] = AB − BA.
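To make the commutator structure of (2) concrete, here is a minimal NumPy sketch of the right-hand side of the Liouville-von Neumann equation; the matrices are toy values, not taken from any molecule. Because a commutator is traceless, the trace of P′ (the electron count) is conserved, and −i[H′, P′] is Hermitian whenever H′ and P′ are, so P′ stays Hermitian under (2):

```python
import numpy as np

def liouville_rhs(H, P):
    """Right-hand side of (2) solved for dP'/dt: dP'/dt = -i [H', P']."""
    return -1j * (H @ P - P @ H)

# Toy 2x2 Hermitian matrices (illustrative values, not molecular data).
H = np.array([[1.0, 0.2 - 0.1j],
              [0.2 + 0.1j, 2.0]])
P = np.array([[0.7, 0.3 + 0.2j],
              [0.3 - 0.2j, 0.3]])

dP = liouville_rhs(H, P)
# The commutator has zero trace, so tr(dP/dt) = 0: tr(P') is conserved.
print(abs(np.trace(dP)))              # ~ 0 (up to rounding)
# -i[H, P] is Hermitian when H and P are, so P' remains Hermitian under (2).
print(np.allclose(dP, dP.conj().T))   # True
```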
Inspecting a particular molecular system, one determines and writes the system's Hamiltonian, a sum of kinetic and potential energy operators. As the Hamiltonian includes spatial derivatives within the kinetic energy operator, (1) is a partial differential equation (PDE). For an N-electron system, ignoring spin, the PDE (1) features 3N spatial degrees of freedom. As N increases beyond N = 1, it becomes intractable to solve (1) directly for the time-dependent many-body wave function Ψ(r, t), even with modern numerical analysis and high-performance computing. For this reason, molecular electronic structure and dynamics calculations typically use simplified, mean-field approaches. One such approach is time-dependent Hartree-Fock (TDHF) theory [12,29], which solves (2) based on a simplified form of the wave function. In HF theory, we approximate the many-body wave function using a Slater determinant, an antisymmetrized product of single-particle orbitals φ_i(r, t), where r now represents the spatial and spin coordinates of one electron. This approximation leads to a modified form of the Hamiltonian Ĥ that appears in (1); for details, see (3) below. Within HF theory, we then call (2) the TDHF equation. Equation (2), used within TDHF theory or an alternative, similar approach called time-dependent density functional theory, appears in atomic, molecular, and materials calculations to simulate the dynamic electronic response to a perturbation, including predicting charge transfer and spectroscopic properties [13,21,25,26,32,36,50]. In these physical science settings, one starts with a system of interest, e.g., a molecule in an applied electric field. The system's atomic configuration completely determines the Hamiltonian H′ and therefore the right-hand side of (2).
Starting from an initial condition, the typical workflow is then to numerically solve (2) forward in time to generate simulations of interest, i.e., to generate P′(t) for t > 0. We write H′(t) to encapsulate two types of dependence on time t. First, H′ can depend explicitly on time; we will see this below when we consider V_ext, an external, time-dependent potential. Second, within HF theory, even if H′ does not depend explicitly on time, it is in general a function of the density P′(t). In summary, H′(t) is shorthand for H′(t, P′(t)). This implies that (2) is in fact a nonlinear system.

Main Contribution.
In this paper, we address a system identification problem for (2). Our main contribution is a computational method to estimate the molecular field-free matrix Hamiltonian H′(t) from time series observations of density matrices P′(t). By building a data-driven model of H′, we identify the right-hand side of (2). We use a linear model for H′, formulate a quadratic loss function that stems from discretizing (2) in time, and eliminate unnecessary degrees of freedom. Thus we reduce model training to a least-squares problem. We demonstrate this method using training data consisting of density matrices P(t) for three small molecules.

Among other tests, we use the machine-learned (ML) Hamiltonian to propagate, i.e., to solve (2) forward in time. We find that using the ML Hamiltonian instead of the exact Hamiltonian results in a small, acceptable level of propagation error, even on a time interval that is twice the length of the time interval used for training. We then add a time-dependent external potential to our machine-learned, field-free Hamiltonian; we propagate forward in time using this augmented Hamiltonian. For each of the three molecules we consider, the resulting solutions are in close quantitative agreement with simulations that use the exact Hamiltonian. In short, our machine-learned Hamiltonian extrapolates well to a dynamical setting that differs from that of the training data.

To our knowledge, despite the surge of interest in applying machine learning to molecular simulation [1,6,7,10,17,18,23,27,31,33–35,37,39–43,47,48], there are no other procedures in the literature to estimate molecular Hamiltonians from density matrix time series. Our work shares goals with other efforts to learn Hamiltonians, or energy functions and functionals that are ingredients in Hamiltonians.
In this space, we primarily see efforts to learn classical Hamiltonians from time series [4,5,9,22,28,30,38,46,49], as well as efforts to learn quantum Hamiltonians or potentials for time-independent problems [2,3,15,20,24]. Recently, a neural network method to learn the exchange-correlation functional in time-dependent density functional theory has been developed [44]; solutions of the corresponding TDSE are used to train the networks.

In this paper, we consider small molecular systems modeled with a small basis set in order to focus on methodological development and careful analysis of errors. The present work forms a foundation on which we can build towards studying systems and theories (such as time-dependent density functional theory) in which the underlying potentials and functionals have yet to be completely determined. This is the overarching motivation for pursuing the present work.

Extrapolation.
To clarify what we mean by extrapolation, let us consider a classical mass-spring system; one end of the spring is fixed and the other is connected to the mass. Let F(t) be an external force applied to the mass, and let x(t) denote the displacement of the mass from equilibrium at time t. We start with time series of x(t) for a system where x(0) ≠ 0 but F(t) ≡ 0, i.e., a field-free system. With this training data, suppose we seek a machine learning model that can predict the mass' position accurately, even when we switch on an applied forcing, e.g., F(t) = A cos(ωt).

We view the task of (i) training with field-off data and (ii) predicting for field-on systems as an extrapolation task. For this classical example, one approach is to learn a model of V(x), the spring's potential. Equipped with a sufficiently accurate model of V, the field-free dynamics can be predicted from Newton's second law: m ẍ = −V′(x). Again assuming that V is sufficiently accurate, we can predict the behavior of the mass-spring system accurately when the field is on, by solving m ẍ = −V′(x) + F(t). Suitably expanded in size and complexity to the TDHF equation (2), this is essentially what we do in the present work.

2 Theoretical Background

We use atomic units throughout: e = ℏ = m_e = 1. An external perturbation, such as an applied electric field, within the Hamiltonian will give rise to time-evolution of the wave function that dictates all properties of a quantum electronic system.

The molecules studied are closed-shell systems: all electrons in the system are spin-paired. Within HF theory, each pair of electrons with opposite spins can be described by the same spatial function φ_i, which is referred to as restricted HF theory. For closed-shell systems within such a formalism, the need to solve for N spatial orbitals occupied by N electrons reduces to solving for N/2 spatial orbitals, each doubly occupied to give a total of N electrons [45]. Given this choice, our Hamiltonian is then

\[
\hat{H}(\mathbf{r},t) = \sum_{i}^{N/2} \left[ -\frac{1}{2}\nabla^2 - \sum_A \frac{Z_A}{|\mathbf{r}-\mathbf{R}_A|} + \hat{V}_{\mathrm{ext}}(\mathbf{r},t) \right]
+ \sum_{i}^{N/2} \sum_{j}^{N/2} \left[ 2 \int d\mathbf{r}'\, \frac{\phi_j^*(\mathbf{r}',t)\,\phi_j(\mathbf{r}',t)}{|\mathbf{r}-\mathbf{r}'|}
- \int d\mathbf{r}'\, \frac{\phi_j^*(\mathbf{r}',t)\,\mathcal{P}(\mathbf{r},\mathbf{r}')\,\phi_j(\mathbf{r}',t)}{|\mathbf{r}-\mathbf{r}'|} \right], \tag{3}
\]

where the first group includes one-electron terms summed over half the number of electrons (the total number of spatially unique electrons): the electron kinetic energy, the electron-nuclear attraction to all nuclei A with nuclear charge Z_A at fixed position R_A, and the external potential V̂_ext. In this work, the external perturbation is an electric field treated classically within the dipole approximation, V̂_ext(r, t) = E(t) · μ̂(r). The first term in the second group is the electron-electron Coulomb repulsion. The operator 𝒫(r, r′) used in the last term denotes the operation of permutation between electrons represented by the coordinate variables r and r′. This term is known as the exchange operator and arises from the antisymmetry of the electronic wave function. The second group collectively represents the electron-electron interaction operator. This operator depends on the instantaneous charge distribution of all other electrons, resulting in an implicit time-dependence of the electronic Hamiltonian (in addition to the explicit time-dependence due to V̂_ext).

We next define a (reduced one-body) density operator ρ̂ that allows us to represent the total density of electrons [11]:

\[
\hat{\rho}(\mathbf{r},t) = \sum_p f_p\, \phi_p(\mathbf{r},t)\, \phi_p^*(\mathbf{r},t) = \sum_p f_p\, |\phi_p\rangle\langle\phi_p|, \tag{4}
\]

where f_p is the occupation of orbital φ_p: in a restricted, closed-shell system, f_p = 2 (if φ_p is occupied) or 0 (if φ_p is unoccupied).
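The occupation structure of (4) can be illustrated with a toy closed-shell example (illustrative values only, not molecular data): the trace of the resulting density matrix counts the electrons, and for a single closed-shell determinant in an orthonormal basis the matrix satisfies P² = 2P:

```python
import numpy as np

# Toy restricted closed-shell system: M = 2 basis functions, N = 2 electrons,
# hence N/2 = 1 doubly occupied orbital (values are illustrative only).
M = 2
rng = np.random.default_rng(0)
# Random orthonormal complex orbitals via a QR factorization.
C, _ = np.linalg.qr(rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M)))
f = np.array([2.0, 0.0])   # occupations f_p: one doubly occupied orbital, one virtual

# Density matrix built as in (4): P = sum_p f_p |phi_p><phi_p| (orthonormal basis assumed).
P = sum(f[p] * np.outer(C[:, p], C[:, p].conj()) for p in range(M))

print(round(np.trace(P).real, 10))   # 2.0: the trace counts the electrons
# For a closed-shell single determinant in an orthonormal basis, P @ P = 2 P.
print(np.allclose(P @ P, 2 * P))     # True
```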
The corresponding density matrix P is represented in the basis of {φ_i} as

\[
P_{ij}(t) = \int d\mathbf{r}\, \phi_i^*(\mathbf{r},t)\, \hat{\rho}(\mathbf{r},t)\, \phi_j(\mathbf{r},t) = \langle \phi_i | \hat{\rho} | \phi_j \rangle. \tag{5}
\]

We can now write down the Liouville-von Neumann equation in operator form:

\[
i\,\frac{d\hat{\rho}(\mathbf{r},t)}{dt} = \left[ \hat{H}(\mathbf{r},t), \hat{\rho}(\mathbf{r},t) \right]. \tag{6}
\]

This is an operator equation for the evolution of ρ̂. The time-dependent molecular orbitals φ_i are often created from a linear combination of basis functions {χ_μ}, as φ_i = Σ_μ c_{μ,i}(t) χ_μ, where the c_{μ,i}(t) are time-dependent coefficients. The elements of the density matrix P are given in this basis by

\[
P_{\mu\nu}(t) = \sum_p f_p\, c_{\mu,p}(t)\, c_{\nu,p}^*(t). \tag{7}
\]

We transform P to an orthonormal basis, yielding P′ (see Appendix). We then use the Liouville equation for the density operator to write the TDHF equation in matrix form:

\[
i\,\frac{d\mathbf{P}'(t)}{dt} = \left[ \mathbf{H}'(t), \mathbf{P}'(t) \right], \tag{8}
\]

where H′(t) is the Hamiltonian (or Fock) matrix that results from integrating (3) over r in the orthonormal basis. In this work, primed notation (e.g., H′, P′) is used for matrices in the orthonormal basis and unprimed notation (e.g., H, P) for matrices in the atomic orbital (AO) basis.

Although it is straightforward to write down the molecular Hamiltonian if the atomic positions are known, integration of the Hamiltonian within a given basis is more challenging and encodes ground and excited state information about the molecule within that basis. Learning the integrated form of the molecular matrix Hamiltonian is thus key to determining the electron dynamics.

2.2 Molecules and Exact Hamiltonian

Here we study three diatomic molecules: H₂, HeH⁺, and LiH. The atoms in each of these diatomic systems are placed along the z-axis, equidistant from the origin. The interatomic separations for H₂, HeH⁺, and LiH are 0.74 Å, 0.772 Å, and 1.53 Å, respectively.
These simple molecular systems increase in complexity, going from a symmetric two-electron homonuclear diatomic, to a two-electron heteronuclear diatomic, to a four-electron heteronuclear diatomic. The basis set used for these calculations is STO-3G, a minimal basis set made of s and p atomic orbitals. For H₂ and HeH⁺, this results in two basis functions (a 2 × 2 matrix for each of P and H), and for LiH this results in six basis functions (a 6 × 6 matrix for each of P and H, although some elements of the matrices are zero due to the linear symmetry of the molecule, as discussed later).

For each molecule, the electronic structure code provides the integrals that are the components of the exact Hamiltonian matrix H, expressed in the same AO basis set as the density matrices. Specifically, we obtain real, symmetric, constant-in-time matrices for the kinetic energy and electron-nuclear potential energy. We also obtain a 4-index tensor of evaluated integrals, which we use together with the time-dependent density matrices P(t) to compute the electron-electron potential energy term. These ingredients allow us to compute, for each molecule, the exact Hamiltonian. Electron density propagation with this exact Hamiltonian, both within the electronic structure code and within our propagation code, is compared to that from our ML model Hamiltonian.

3 Training and Test Data

There are two steps involved in generating the training and test sets of the time series of density matrix data:

1. Generating an initial condition (the initial density matrix).
2. Generating a trajectory using the initial condition and the differential equation (2) for propagation.

For the first step, the HF stationary state solution is determined self-consistently within the electronic structure code. The density matrix corresponding to the alpha-spin part of the solution, represented in the AO basis, is used as the initial condition.
The second step involves propagating the initial density matrix using the TDHF equation. We performed each of these steps with the Gaussian electronic structure program [14], using a locally modified development version.

3.1 Initial Conditions

We have calculated initial density matrices for field-free and static field conditions. For the field-free calculations, we set the V_ext term to 0. For the static field, E_z = 0.05 a.u. (atomic units). Applying a static field creates an initial electron density that is not a stationary state of the field-free Hamiltonian; this is often referred to as a delta-kick perturbation.

3.2 Trajectory Data

The density matrix from the initial condition calculation is used as the starting point for generating the real-time TDHF electron dynamics trajectory, i.e., P(t). For the field-free trajectories, we set V_ext to zero during propagation; we use the density matrix with the delta-kick perturbation as the initial condition. These trajectories serve as the training data for the ML Hamiltonian. There is a particular rationale behind choosing the delta-kick perturbation. First, consider that a perturbation that is localized at one point in time is, by Fourier duality, maximally spread out in frequency space. Hence such a perturbation necessarily excites all modes of the system. Second, consider that if we generate a trajectory that does not excite a mode of a system, we cannot expect that a machine learning technique will be able to learn that that mode exists, let alone how to represent the unobserved mode accurately in either the potential or the full Hamiltonian. By choosing a delta-kick trajectory that excites all modes, we ensure that it is at least possible in principle to learn the full potential/Hamiltonian.

For the field-on trajectories, the field-free initial density matrix is used and V_ext takes the following form during propagation:

\[
V_{\mathrm{ext}}(t) = \sum_{i \in \{x,y,z\}} E_i \sin(\omega t)\, \mu_i = 0.05 \sin(\omega t)\, \mu_z, \tag{9}
\]

where the time t, the field intensity E_i along axis i, and the field frequency ω are expressed in a.u. Here the field is applied only along the z-direction, and μ_z is the z-component of the dipole moment matrix in the AO basis. The sinusoidal field is switched on for one full cycle (around 3.55 fs) starting at t = 0. These field-on trajectories test the ML Hamiltonian in a regime quite outside the field-free training regime.

Using a propagation step size of 0.002 fs, the total length of each trajectory is 20000 time steps (thus, each trajectory is 40 fs long). The real-time TDHF implementation in Gaussian uses as its propagation scheme the modified midpoint unitary transformation (MMUT) algorithm [25].

4 Machine Learning Method

For a particular molecule, suppose we are given time series {P′(t_j)}, j = 0, …, N, sampled on an equispaced temporal grid t_j = jΔt. We assume that P′(t), the continuous-time trajectory corresponding to our time series, satisfies (2). Our goal is to learn the Hamiltonian H′. Assume that the Hamiltonian contains no explicit time-dependence; this can be ensured by generating training data with no external potentials (e.g., no external forcing). Then H′ is a Hermitian matrix of functions of P′, the density matrix. Our strategy therefore consists of three steps: (i) develop a model of H′ with a finite-dimensional set of parameters β, (ii) derive from (2) a statistical model, and (iii) use the model with available data to estimate β.

Note that in order to obtain P′, H′ from P, H, we transform from the AO basis to its canonical orthogonalization [45]. We do this because the TDHF equation (2) holds in an orthonormal basis; the AO basis by itself is not orthonormal.
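As an illustration of why such a transformation matters, the following sketch uses symmetric (Löwdin) orthogonalization, X = S^(-1/2); the paper itself uses the closely related canonical orthogonalization and defers details to its Appendix, so treat X, the overlap S, and the toy matrices here as our assumptions:

```python
import numpy as np

# Toy AO overlap S (symmetric positive definite), Hamiltonian H, and density P.
rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2))
S = A @ A.T + 2.0 * np.eye(2)

H = np.array([[1.0, 0.3], [0.3, 2.0]])
P = np.array([[0.8, 0.1], [0.1, 0.2]])

# Symmetric orthogonalization: X = S^{-1/2}, built from the eigendecomposition of S.
s, U = np.linalg.eigh(S)
X = U @ np.diag(s ** -0.5) @ U.T
print(np.allclose(X.T @ S @ X, np.eye(2)))   # True: the transformed basis is orthonormal

Xinv = np.linalg.inv(X)
H_prime = X.T @ H @ X            # operators transform with X on both sides
P_prime = Xinv @ P @ Xinv.T      # the density transforms with X^{-1} (contravariantly)

# Sanity check: tr(P') equals tr(P S), the AO-basis electron count.
print(np.isclose(np.trace(P_prime), np.trace(P @ S)))   # True
```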
We leave the details of this transformation to the Appendix.

Let us split H′ into its real and imaginary parts: H′ = H′_R + i H′_I. By Hermitian symmetry, H′ is determined completely by the upper-triangular component of H′_R (including the diagonal) and by the upper-triangular component of H′_I (not including the diagonal). If H′ has size M × M, there are M(M + 1)/2 degrees of freedom in H′_R and M(M − 1)/2 degrees of freedom in H′_I that we must model. Hence there are a total of M² real degrees of freedom, which we can represent as an M² × 1 vector h′. Note that we can apply this same real and imaginary splitting to P′; since it is also Hermitian, it can also be determined completely by a real vector p′ of dimension M² × 1. Then we formulate the following linear model for h′(p′); in what follows, we use tildes to denote either statistical models or their parameters:

\[
\widetilde{\mathbf{h}}' = \widetilde{\beta}_0 + \widetilde{\beta}_1 \mathbf{p}'. \tag{10}
\]

Here β̃₀ has size M² × 1, while β̃₁ has maximal size M² × M². For the smaller molecules in our study (H₂ and HeH⁺), where the STO-3G basis set leads to a dimension of M = 2, we use (10) with no modifications.

For LiH, a larger molecule, to handle entries of p′ that are identically zero, and also to dramatically reduce the computational effort required for training, we modify the basic model (10). Succinctly, we reduce both the number of columns of β̃₁ and the number of rows of all vectors in (10), namely h̃′, β̃₀, and p′. In more detail, we first form a set Z consisting of indices of, separately, the real and imaginary parts of the training data matrices {P′_R(t_j)} and {P′_I(t_j)} that are not identically zero. Let us use the notation H̃′ to denote the M × M Hermitian matrix that corresponds to the real vector h̃′. For both P′ and our model Hamiltonian H̃′, we restrict attention to upper-triangular matrix indices that are in the set Z. To illustrate this concretely, LiH in the STO-3G basis has dimension M = 6, and so, at each instant of time, both P′ and H̃′ are determined by M² = 36 entries, of which 21 correspond to real parts and 15 correspond to imaginary parts. Of these, only 10 real parts and 6 imaginary parts are not identically zero. In this way, we reduce the overall dimension of (10) from 36 down to 10 + 6 = 16. We defer the formal details of this procedure to the Appendix.

Note that (10) is by no means the only possible model. We have explored higher-order polynomial models that, while remaining linear in the parameters β̃, allow h̃′ to depend nonlinearly on p′.
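The splitting of a Hermitian matrix into its M² unique real degrees of freedom, and the reverse, can be sketched as follows (the function names are ours, not the authors'):

```python
import numpy as np

def herm_to_vec(A):
    """Flatten an M x M Hermitian matrix to M**2 real numbers: the upper triangle
    (with diagonal) of Re(A), followed by the upper triangle (no diagonal) of Im(A)."""
    M = A.shape[0]
    iu_d = np.triu_indices(M)        # M(M+1)/2 real entries
    iu = np.triu_indices(M, k=1)     # M(M-1)/2 imaginary entries
    return np.concatenate([A.real[iu_d], A.imag[iu]])

def vec_to_herm(v, M):
    """Inverse of herm_to_vec: rebuild the Hermitian matrix from its real vector."""
    iu_d = np.triu_indices(M)
    iu = np.triu_indices(M, k=1)
    n_re = M * (M + 1) // 2
    R = np.zeros((M, M))
    I = np.zeros((M, M))
    R[iu_d] = v[:n_re]
    I[iu] = v[n_re:]
    R = R + R.T - np.diag(np.diag(R))   # symmetric real part
    return R + 1j * (I - I.T)           # antisymmetric imaginary part

A = np.array([[1.0, 0.5 - 0.25j],
              [0.5 + 0.25j, 2.0]])
v = herm_to_vec(A)                        # length M**2 = 4
print(len(v))                             # 4
print(np.allclose(vec_to_herm(v, 2), A))  # True: lossless round trip
```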
We have also explored models in which h̃′ is allowed to depend explicitly on time t, including through Fourier terms such as sin(ωt) and cos(ωt). None of these choices led to any improvement in validation or test error, so we focus on the linear model (10).

Now that we have (10), we turn our attention to (2). We use a centered-difference approximation to derive from (2) the statistical model

\[
i\,\frac{\mathbf{P}'(t_{j+1}) - \mathbf{P}'(t_{j-1})}{2\,\Delta t} = \left[ \widetilde{\mathbf{H}}'(\mathbf{P}'(t_j)), \mathbf{P}'(t_j) \right] + \epsilon_j, \tag{11}
\]

with ε_j denoting error. With ‖A‖²_F = Σ_{i,j} |A_{ij}|², the squared Frobenius norm, we form the sum-of-squared-errors loss function

\[
L(\widetilde{\beta}) = \sum_{j=1}^{N-1} \left\| i\,\frac{\mathbf{P}'(t_{j+1}) - \mathbf{P}'(t_{j-1})}{2\,\Delta t} - \left[ \widetilde{\mathbf{H}}'(\mathbf{P}'(t_j)), \mathbf{P}'(t_j) \right] \right\|_F^2. \tag{12}
\]

Fig. 1 Overall training procedure for learning the molecular, field-free Hamiltonian. [Flowchart: the training data are complex, self-adjoint density matrices; these complex matrices are flattened into vectors, retaining only unique, real, nontrivial entries, and used to form the Hamiltonian model; with the loss function from Eq. (12), the gradient and Hessian yield the least-squares solution, i.e., the trained Hamiltonian.] In this paper, for each molecule, we train using time series with N = 1000. We use this field-free Hamiltonian to propagate for 2N = 2000 steps; see Figures 2 and 3. We augment the learned Hamiltonian with an external potential (an electric field), yielding a field-on Hamiltonian that we use to propagate for 2N = 2000 steps; see Figures 5 and 6.

4.1 Reduction to Least Squares

The dependence of L on β̃ = (β̃₀, β̃₁) is entirely through H̃′. We estimate β̃ by solving the optimization problem β̃* = arg min L(β̃). Because (10) is linear in the parameters β̃, we observe that (12) must be quadratic in β̃. So, there exist constants Q (matrix), c (vector), and L₀ (scalar) such that

\[
L(\widetilde{\beta}) = \frac{1}{2} \widetilde{\beta}^T Q \widetilde{\beta} + c^T \widetilde{\beta} + L_0. \tag{13}
\]

Here we can identify c as the gradient of L with respect to β̃ evaluated at β̃ ≡ 0, and Q as the Hessian of L with respect to β̃. We compute this gradient and Hessian via automatic differentiation of L. When Q is full rank, we have an exact minimizer −Q⁻¹c. As Q is typically rank deficient, we replace Q⁻¹ with the Moore-Penrose pseudoinverse Q†:

\[
\widetilde{\beta}^* = -Q^{\dagger} c = \arg\min_{\beta} \| Q\beta + c \|^2. \tag{14}
\]

When (I − QQ†)c = 0, the loss L achieves its global minimum at β̃*. For each of our molecules, we find that ‖(I − QQ†)c‖ is small but non-zero. Still, we find empirically that (14) yields a nearly zero-norm gradient of L, as good as what can be achieved via other numerical optimization methods.

We have summarized the overall procedure in flowchart form in Figure 1. Equation (14) constitutes the end of the training procedure. In particular, we use a method in NumPy, linalg.lstsq, to compute (14), and so we avoid the full computation of Q†. Note that gradient-based optimization can also be used to minimize the loss (12), as we have verified. However, such a procedure requires millions of small steps, resulting in a training time (for LiH) that is 500-1000 times larger than the time required by a least-squares solver.

4.2 Error Metrics

Inserting (14) into (13) and using properties of the pseudoinverse, (Q†)ᵀ = (Qᵀ)† = Q† together with Q†QQ† = Q†, we obtain the training error

\[
L(\widetilde{\beta}^*) = -\tfrac{1}{2}\, c^T Q^{\dagger} c + L_0,
\]

the value of the loss function at the optimal set of parameters. The training error measures a local-in-time error, essentially equivalent to starting at the training data point P′(t_j), propagating one step forward in time with our learned Hamiltonian (10), and comparing with the very next training data point P′(t_{j+1}). Aggregating these one-step errors (squaring and summing their magnitudes) yields the training error L(β̃*).

We contrast the training error with the propagation error.
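A minimal end-to-end sketch of the training step: we generate a toy trajectory from a known, constant Hermitian matrix (a simplification; the paper's data come from TDHF with the MMUT propagator), assemble the centered-difference residuals of (11) into a linear system, and solve it with numpy.linalg.lstsq as in (14). The basis construction, the crude Euler data generator, and all names here are our assumptions, not the authors' code:

```python
import numpy as np

M = 2
d = M * M   # real degrees of freedom of a Hermitian M x M matrix

def herm_basis(M):
    """A real basis {E_k} for Hermitian M x M matrices (d = M**2 elements):
    symmetric real patterns first, then antisymmetric imaginary ones."""
    basis = []
    for i in range(M):
        for j in range(i, M):
            E = np.zeros((M, M), dtype=complex)
            if i == j:
                E[i, i] = 1.0
            else:
                E[i, j] = E[j, i] = 1.0
            basis.append(E)
    for i in range(M):
        for j in range(i + 1, M):
            E = np.zeros((M, M), dtype=complex)
            E[i, j] = 1j
            E[j, i] = -1j
            basis.append(E)
    return basis

def p_vec(P):
    """Unique real entries of a Hermitian matrix (the p' of the text)."""
    iu_d = np.triu_indices(P.shape[0])
    iu = np.triu_indices(P.shape[0], k=1)
    return np.concatenate([P.real[iu_d], P.imag[iu]])

basis = herm_basis(M)

# Toy training data from a KNOWN constant Hamiltonian (illustration only).
H_true = 1.0 * basis[0] + 2.0 * basis[1] + 0.5 * basis[3]
P0 = np.array([[0.7, 0.3 + 0.2j],
               [0.3 - 0.2j, 0.3]])
dt, nsteps = 0.01, 400
traj = [P0]
for _ in range(nsteps):
    P = traj[-1]
    traj.append(P + dt * (-1j) * (H_true @ P - P @ H_true))   # crude Euler step

# Assemble the least-squares system from the centered-difference residual (11);
# the residual is linear in (beta0, beta1), so there is one column per unknown.
rows, rhs = [], []
for j in range(1, nsteps):
    P, Pm, Pp = traj[j], traj[j - 1], traj[j + 1]
    p = p_vec(P)
    target = 1j * (Pp - Pm) / (2 * dt)
    cols = []
    for E in basis:                       # beta0 columns: [E_k, P]
        C = E @ P - P @ E
        cols.append(np.concatenate([C.real.ravel(), C.imag.ravel()]))
    for E in basis:                       # beta1 columns: p_l * [E_k, P]
        for l in range(d):
            C = p[l] * (E @ P - P @ E)
            cols.append(np.concatenate([C.real.ravel(), C.imag.ravel()]))
    rows.append(np.column_stack(cols))
    rhs.append(np.concatenate([target.real.ravel(), target.imag.ravel()]))

Amat = np.vstack(rows)
b = np.concatenate(rhs)
beta, *_ = np.linalg.lstsq(Amat, b, rcond=None)   # the whole training step, as in (14)
beta0, beta1 = beta[:d], beta[d:].reshape(d, d)

# Evaluate the learned model at a density vector: coefficients beta0 + beta1 @ p'.
coeff = beta0 + beta1 @ p_vec(P0)
H_fit = sum(c * E for c, E in zip(coeff, basis))
print(np.allclose(H_fit, H_fit.conj().T))   # True: Hermitian by construction
```

Because the commutator map has a nontrivial kernel, the recovered parameters need not match the generating Hamiltonian entry-by-entry; what the fit controls is the residual of (11) along the trajectory, which is the quantity the loss (12) measures.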
Once we have solved for the optimal parametervalues (cid:101) β ∗ , the model Hamiltonian (10) is completelydetermined. Using this estimated Hamiltonian with theinitial condition P (cid:48) (0) from our training time series, wesolve (2) forward in time using a Runge-Kutta scheme,generating our statistical estimates of (cid:101) P (cid:48) ( t j ) from j = 1up to j = 2 N = 2000, twice the length of the trainingdata. For the Runge-Kutta integration, we set absoluteand relative tolerances to 10 − . We then define thepropagation error to be E = 12 N N (cid:88) j =1 (cid:13)(cid:13)(cid:13) P ( t j ) − (cid:101) P ( t j ) (cid:13)(cid:13)(cid:13) F . (15)In contrast to the training error, (15) measures the di-vergence between two trajectories— P (training) and (cid:101) P (propagation of ML Hamiltonian)—over many timesteps. Both trajectories have exactly the same initialcondition, and hence j = 0 is excluded from the sum.For j >
0, the two trajectories are computed using dif-ferent numerical schemes (modified midpoint for thetraining data and Runge-Kutta for the ML Hamiltonian achine Learning a Molecular Hamiltonian for Predicting Electron Dynamics 7HeH + H LiH L ( (cid:101) β ∗ ) 4 . × − . × − . × − (cid:107)∇L ( (cid:101) β ∗ ) (cid:107) . × − . × − . × − E . × − . × − . × − E Sch . × − . × − . × − E Ham . × − . × − . × − Table 1
After training, we report the training loss and thenorm of its gradient, along with three forms of propagationerror. All results are for the field-free problem. Note that thetraining error is a sum of squared errors; for each molecule,if we divide by the training data length N = 10 , we ob-tain mean-squared training errors that are all on the order of10 − , indicating approximately 4 decimal places of accuracy.The propagation errors show a roughly even breakdown intoerror due to different schemes versus error due to differentHamiltonians. propagation) and different Hamiltonians. To control forscheme-related error, we compute E Sch = 12 N N (cid:88) j =1 (cid:13)(cid:13) P ( t j ) − P ( t j ) (cid:13)(cid:13) F , (16)where P ( t j ) is the result of propagating forward in timeusing the same Runge-Kutta scheme with the exactHamiltonian H (cid:48) . This exact Hamiltonian is built by (i)extracting the Hamiltonian H in the AO basis fromthe electronic structure output and then (ii) transform-ing H to H (cid:48) using the procedure described in the Ap-pendix. In (16), the two trajectories being comparedhave the same Hamiltonian and differ only in the nu-merical propagation schemes used to generate them. Asa final error metric, we compute E Ham = 12 N N (cid:88) j =1 (cid:13)(cid:13)(cid:13) (cid:101) P ( t j ) − P ( t j ) (cid:13)(cid:13)(cid:13) F . (17)The two trajectories compared here are computed us-ing the same Runge-Kutta scheme, but with differentHamiltonians. By the triangle inequality, we have E ≤E
Sch + E Ham . We may conceptualize this as breakingdown the total error into the error due to differentschemes ( E Sch ) and the error due to different Hamil-tonians ( E Ham ). N = 1000 for each of the threemolecules HeH + , H , and LiH. See Section 3 for detailson the generation of training data. The only additionalpreprocessing step here was to omit the first two time steps, for each molecule, and to take the subsequent1000 time steps as training data. This was carried outpurely to avoid large numerical time-derivatives d P (cid:48) /dt associated with the delta-kick perturbation at t = 0;these time-derivatives form a critical part of our lossfunction (12). We emphasize that these training trajec-tories were generated with no external potential/field,using delta-kick initial conditions described in Section3.1.We report the value of the loss and the norm of itsgradient, after training, in the first two rows of Table1. For each molecule, the training loss is of the orderof 10 − , which corresponds to an accuracy of roughly 4decimal places. In order to visualize this accuracy, seeFigures 2 and 3. For each molecule, we have plottedeach of the non-zero real and imaginary components(note the y -axis labels) that fully determine the Hermi-tian density matrices P (cid:48) ( t j ) at each time step t j = j∆t .In fact, in each panel, there are three curves: in black,we have plotted the actual training data produced bythe electronic structure code; in blue, we have plotted P ( t j ), the result of propagating the exact Hamiltonian;and in red, we have plotted (cid:101) P ( t j ), the result of propa-gating the ML Hamiltonian.For HeH + and H (Fig. 2), the curves agree to adegree where they can hardly be distinguished. As wedescribed above, due to the fact that in HF theory H (cid:48) depends on P (cid:48) , the TDHF equation (2) is nonlinear,and hence all of these oscillations are nonlinear oscil-lations. For LiH (Fig. 
3), we can discern some divergence between the result of ML Hamiltonian propagation (red) and the other two curves, but only for those density matrix elements with relatively small variance. The sum-of-squares loss function (12) is biased in favor of fitting large-variance components; to avoid this, one could modify (12) to include weights that are inversely proportional to density element variances. The errors in Figure 3 consist primarily of oscillations about the black curve; the magnitudes of these oscillations are small and do not increase dramatically over time. Still, we should give the machine-learned Hamiltonian credit for performing well when we use it to propagate for 2N = 2000 steps, twice the length of the training data. This hints at being able to use the machine-learned Hamiltonian to extrapolate beyond the field-free system used for training.

To understand more deeply the different sources of error, we refer to the final three rows of Table 1 together with the left panel of Figure 4. We think of E as the overall RMS error between the training data $\mathbf{P}'$ and our predicted trajectory $\widetilde{\mathbf{P}}$, broken down into two components $E_{\mathrm{Sch}}$ and $E_{\mathrm{Ham}}$ as explained above. If our goal is to track the training data, we incur errors of
Fig. 2
HeH+ (left) and H2 (right) propagation with no field. For both molecules, we have plotted all unique real and imaginary parts of the time-dependent density matrices: actual training data (black), exact Hamiltonian propagation (blue), and ML Hamiltonian propagation (red). Note the close agreement of all three curves, on a time interval that is twice the length used for training.

            HeH+        H2          LiH
E           . × 10^-    . × 10^-    . × 10^-
E_Sch       . × 10^-    . × 10^-    . × 10^-
E_Ham       . × 10^-    . × 10^-    . × 10^-

Table 2
For the field-on problem, we report three forms of propagation error corresponding to field-on versions of (15), (16), and (17). Here E measures the difference between (i) propagation of the ML Hamiltonian plus $\mathbf{V}_{\mathrm{ext}}$ and (ii) the output of an electronic structure code for the field-on problem; $E_{\mathrm{Sch}}$ measures the difference between (ii) and (iii), propagation of the exact Hamiltonian plus $\mathbf{V}_{\mathrm{ext}}$. Finally, $E_{\mathrm{Ham}}$ measures the difference between (i) and (iii). Overall, we find that the errors are lower than in Table 1, indicating that the ML Hamiltonian succeeds in solving the field-on problem.

the same order of magnitude when we use either the ML Hamiltonian or the exact Hamiltonian. Consistent with Fig. 3, we find the largest gap between exact and ML Hamiltonian propagation for LiH.

5.2 Electric Field Tests

After learning a field-free Hamiltonian for each of the three molecules, we compared the values of $\mathbf{H}'(t)$ and $\widetilde{\mathbf{H}}'(t)$ along the training trajectories. We found that the ML Hamiltonian does not equal the exact Hamiltonian. This led us to question whether the ML Hamiltonian could solve a problem outside the training regime. We therefore augmented the ML Hamiltonian with an applied electric field, i.e., the time-dependent external potential $\mathbf{V}_{\mathrm{ext}}$ given in (9). Using the same Runge-Kutta scheme and tolerances described earlier, we propagated for 2N = 2000 steps. We compared these results with test data produced by an electronic structure code, and also with the results of propagating the exact Hamiltonian, augmented with $\mathbf{V}_{\mathrm{ext}}$, via our Runge-Kutta method.

For a first view of the field-on results, see Table 2 and Figures 5 and 6. In particular, the top panels of Figure 5 show the applied electric field; note that it is switched off abruptly after one period.
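To make the field-on propagation concrete, the sketch below integrates $i\,d\mathbf{P}'/dt = [\mathbf{H}(t), \mathbf{P}']$ with a fixed-step classical fourth-order Runge-Kutta method. This is only a minimal stand-in: the paper uses an adaptive Runge-Kutta solver and the learned model (10), whereas here `H0`, `D`, the one-period sinusoidal field, the dimensions, and the neglect of the density dependence of the Hamiltonian are all invented placeholders.

```python
import numpy as np

def commutator(a, b):
    return a @ b - b @ a

def rhs(p, h):
    # Quantum Liouville right-hand side: dP'/dt = -i [H'(t), P'(t)]
    return -1j * commutator(h, p)

def rk4_step(p, t, dt, ham):
    # One classical fourth-order Runge-Kutta step (fixed step size).
    k1 = rhs(p, ham(p, t))
    p2 = p + 0.5 * dt * k1
    k2 = rhs(p2, ham(p2, t + 0.5 * dt))
    p3 = p + 0.5 * dt * k2
    k3 = rhs(p3, ham(p3, t + 0.5 * dt))
    p4 = p + dt * k3
    k4 = rhs(p4, ham(p4, t + dt))
    return p + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Hypothetical stand-ins for a field-free Hamiltonian and a dipole-like
# coupling; these are NOT the paper's learned model (10) or field (9).
rng = np.random.default_rng(0)
M = 4
H0 = rng.standard_normal((M, M))
H0 = 0.5 * (H0 + H0.T)          # fixed Hermitian part
D = rng.standard_normal((M, M))
D = 0.5 * (D + D.T)             # dipole-like coupling matrix

def hamiltonian(p, t, omega=1.0):
    # Field switched off abruptly after one period, as in the paper's tests.
    # For brevity, the density dependence that makes TDHF nonlinear is omitted.
    efield = np.sin(omega * t) if t < 2 * np.pi / omega else 0.0
    return H0 + efield * D

# Hermitian initial density: a rank-2 projector (trace = 2).
q, _ = np.linalg.qr(rng.standard_normal((M, 2)))
P = (q @ q.T).astype(complex)

dt, t = 0.01, 0.0
for _ in range(1000):
    P = rk4_step(P, t, dt, hamiltonian)
    t += dt
```

Because each Runge-Kutta stage is a commutator with a Hermitian matrix, the propagated density stays Hermitian and trace-preserving up to rounding, even though the step itself is not exactly unitary.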
We can immediately discern that the applied field substantially
Fig. 3
LiH propagation with no field. We have plotted all unique real and imaginary parts of the time-dependent density matrices: actual training data (black), exact Hamiltonian propagation (blue), and ML Hamiltonian propagation (red). For density matrix elements with small variance, we discern slight disagreement, especially at large times. For large-variance density matrix elements, the curves are in close agreement.

alters the electron density from the field-off case. Still, in each panel we see excellent agreement between all three curves: the ground truth solution produced by an electronic structure code (black), the result of propagating the exact Hamiltonian plus $\mathbf{V}_{\mathrm{ext}}$ (blue), and the result of propagating the ML Hamiltonian plus $\mathbf{V}_{\mathrm{ext}}$ (red). Table 2, in which we find errors that are roughly an order of magnitude lower than those in Table 1, confirms that all computed densities are in close quantitative agreement. To repeat, all field-on results are for a time interval that is twice the length used for training, and training was conducted using field-off data only. Overall, we take these results to indicate that the ML Hamiltonian can indeed extrapolate to problem settings beyond the one used for training.

For a deeper understanding of the field-on results, we focus on the right panel of Figure 4 and on Figure 7. In the right panel of Figure 4, we compare (i) the result of propagating the ML Hamiltonian plus $\mathbf{V}_{\mathrm{ext}}$ against (ii) the ground truth solution, the output of the electronic structure code for the field-on problem. We also compare (ii) with (iii), the result of propagating the exact Hamiltonian plus $\mathbf{V}_{\mathrm{ext}}$. The plots indicate that, for all three molecules and especially for LiH, the error between (i) and (ii) is almost identical to that between (ii) and (iii). This indicates that the bulk of the error is due to our use of a Runge-Kutta scheme instead of the MMUT scheme used in the electronic structure code.
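For reference, one MMUT-style step can be sketched as below. This is our reading of the modified-midpoint unitary-transform scheme, $\mathbf{P}(t+\Delta t) = \mathbf{U}\,\mathbf{P}(t-\Delta t)\,\mathbf{U}^\dagger$ with $\mathbf{U} = \exp(-2i\,\Delta t\,\mathbf{H}(t))$; the electronic structure code's actual implementation may differ, and `mmut_step` is a hypothetical helper name.

```python
import numpy as np

def mmut_step(p_prev, h_mid, dt):
    """One modified-midpoint unitary-transform (MMUT) leapfrog step:
    P(t+dt) = U P(t-dt) U^dagger, with U = exp(-2i * dt * H(t)).
    Because U is exactly unitary, the trace, Hermiticity, and
    idempotency of the density are preserved exactly, unlike a
    Runge-Kutta step."""
    w, v = np.linalg.eigh(h_mid)                   # H(t) is Hermitian
    u = (v * np.exp(-2j * dt * w)) @ v.conj().T    # U = exp(-2i dt H)
    return u @ p_prev @ u.conj().T
```

The exact unitarity of each step is what distinguishes this scheme from Runge-Kutta propagation and motivates isolating the scheme-related error $E_{\mathrm{Sch}}$.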
To confirm this, we consult Figure 7, in which we compare (i) and (iii) directly. All solutions here are computed using the same Runge-Kutta scheme. For each molecule, we see that the errors for the field-on problems are consistently smaller than those for the field-off problems. We conclude from these results that the ML Hamiltonian can be used to compute the electronic response to an applied electric field.

A short derivation will show that it is not automatic to expect the augmented ML Hamiltonian to propagate correctly. Let us work in continuous time, to eliminate error due to discrete-time propagation; in this idealized setting, we start with the statement that both of our field-free Hamiltonians, $\mathbf{H}'(t)$ (exact) and $\widetilde{\mathbf{H}}'(t)$ (ML), satisfy the TDHF equation:
$$i \frac{d\mathbf{P}'(t)}{dt} = [\mathbf{H}'(t), \mathbf{P}'(t)], \qquad i \frac{d\mathbf{P}'(t)}{dt} = [\widetilde{\mathbf{H}}'(t), \mathbf{P}'(t)].$$
Subtracting these equations, and defining the error $\boldsymbol{\epsilon}(t) = \mathbf{H}'(t) - \widetilde{\mathbf{H}}'(t)$, we obtain
$$[\boldsymbol{\epsilon}(t), \mathbf{P}'(t)] = 0. \quad (18)$$
Now we augment both Hamiltonians with an external field $\mathbf{V}_{\mathrm{ext}}(t)$. Let $\mathbf{P}''(t)$ denote the true density for the problem with the external field. It must satisfy
$$i \frac{d\mathbf{P}''(t)}{dt} = [\mathbf{H}'(t) + \mathbf{V}_{\mathrm{ext}}(t), \mathbf{P}''(t)].$$
Via $\mathbf{H}'(t) = \boldsymbol{\epsilon}(t) + \widetilde{\mathbf{H}}'(t)$, we obtain
$$i \frac{d\mathbf{P}''(t)}{dt} = [\widetilde{\mathbf{H}}'(t) + \mathbf{V}_{\mathrm{ext}}(t), \mathbf{P}''(t)] + \underbrace{[\boldsymbol{\epsilon}(t), \mathbf{P}''(t)]}_{*}.$$
As (18) does not in general imply that the starred term vanishes, we cannot conclude that the true density $\mathbf{P}''(t)$ satisfies the TDHF equation with the augmented ML
Fig. 4
Time-dependent propagation errors in which we compare the training data against either $\widetilde{\mathbf{P}}$, the result of propagating the ML Hamiltonian, or $\mathbf{P}$, the result of propagating the exact Hamiltonian. All calculations on the left (respectively, right) are for the field-free (respectively, field-on) problem. For each molecule, the error incurred by propagating with the ML Hamiltonian is within a constant factor of the error incurred by propagating with the exact Hamiltonian. At the final time, all errors are on the order of 10^-, except for the field-on calculations with LiH. The average values of these curves over all time correspond precisely to E and $E_{\mathrm{Sch}}$; see (15), (16), and Table 1 for further details.
Fig. 5
HeH+ (left) and H2 (right) propagation with field. The top panel of each plot gives the applied electric field (9). In subsequent panels, for both molecules, we plot all unique real and imaginary parts of the time-dependent density matrices: actual training data (black), exact Hamiltonian propagation (blue), and ML Hamiltonian propagation (red). By ML Hamiltonian, we mean the Hamiltonian trained on the field-free data plus $\mathbf{V}_{\mathrm{ext}}$ given by (9). Note the close agreement of all three curves, on a time interval that is twice the length used for training. This is a true test of whether the learned Hamiltonian can extrapolate to problem settings beyond the one used for training.
Fig. 6
LiH propagation with field. We plot all unique real and imaginary parts of the time-dependent density matrices: actual training data (black), exact Hamiltonian propagation (blue), and ML Hamiltonian propagation (red). By ML Hamiltonian, we mean the Hamiltonian trained on the field-free LiH data plus $\mathbf{V}_{\mathrm{ext}}$ given by (9). Note the close agreement of all curves, on a time interval that is twice the length used for training. This is a true test of whether the learned Hamiltonian can extrapolate to problem settings beyond the one used for training. We omit a plot of the electric field here; see the top panels of Figure 5.

Hamiltonian $\widetilde{\mathbf{H}}'(t) + \mathbf{V}_{\mathrm{ext}}(t)$. Based on the above derivation, if we solve the TDHF equation using the augmented ML Hamiltonian, we expect to obtain a time-dependent density that differs from $\mathbf{P}''(t)$. As we are able to use the ML Hamiltonian successfully on the problem with an applied electric field, we hypothesize that the error $\boldsymbol{\epsilon}(t)$ is structured in such a way that enables us to extrapolate to new problems. We plan to test this hypothesis in future work.
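The gap between (18) and the starred term can be illustrated numerically. In the toy sketch below (all matrices are invented placeholders, not actual TDHF densities), a Hermitian $\boldsymbol{\epsilon}$ is constructed as a function of one projector, standing in for $\mathbf{P}'$, so that it commutes with that density but generically not with a second, unrelated projector standing in for $\mathbf{P}''$.

```python
import numpy as np

rng = np.random.default_rng(1)

def comm(a, b):
    return a @ b - b @ a

def random_projector(m, k, rng):
    # Rank-k Hermitian projector from a random orthonormal frame.
    q, _ = np.linalg.qr(rng.standard_normal((m, k)))
    return q @ q.T

M = 4
P = random_projector(M, 2, rng)      # stand-in for the field-free density P'
eps = 2.0 * P - np.eye(M)            # a Hermitian "error" built to commute with P
P2 = random_projector(M, 2, rng)     # stand-in for the field-on density P''

print(np.linalg.norm(comm(eps, P)))   # ~0: analogue of (18) holds
print(np.linalg.norm(comm(eps, P2)))  # generically nonzero: starred term survives
```

The first commutator vanishes by construction, while the second does not, mirroring why (18) alone does not guarantee correct field-on propagation.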
Fig. 7
Time-dependent propagation errors in which we compare $\widetilde{\mathbf{P}}$, the result of propagating the ML Hamiltonian, with $\mathbf{P}$, the result of propagating the exact Hamiltonian. All results were computed using the same Runge-Kutta scheme, isolating the error due to the different Hamiltonians. We include both field-free and field-on calculations. Note that all results are plotted on a log scale. The results show that when we propagate both the ML and exact Hamiltonians using the same scheme, the errors between the two resulting trajectories remain small even as we take hundreds of time steps. The average values of these curves over all time correspond precisely to $E_{\mathrm{Ham}}$; see (17) and Table 1 for further details.

Our code is available at https://github.com/hbhat4000/electrondynamics. Training data is available from the authors upon request.
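As a sketch of how scalar metrics like (17) can be computed from stored trajectories; the array layout, the helper name, and the choice to apply the 1/(2N) prefactor to the stored steps are our assumptions, not details taken from the paper's code.

```python
import numpy as np

def propagation_error(traj_a, traj_b):
    """Per-step Frobenius-norm error between two density-matrix
    trajectories of shape (steps, M, M), together with the scalar
    time average used for metrics such as E_Ham in (17), using a
    1/(2N) prefactor with N the number of stored steps."""
    per_step = np.linalg.norm(traj_a - traj_b, ord='fro', axis=(1, 2))
    n = traj_a.shape[0]
    return per_step, per_step.sum() / (2 * n)
```

Because the per-step Frobenius norm obeys the triangle inequality, the resulting scalar averages satisfy the bound $E \le E_{\mathrm{Sch}} + E_{\mathrm{Ham}}$ discussed above.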
Our current work demonstrates that, from a single time series consisting of time-dependent density matrices, we can effectively learn an integrated Hamiltonian matrix. This ML Hamiltonian can be used for propagation in both the field-off and field-on settings. Importantly, having been trained on a single field-free trajectory, our ML Hamiltonian has the potential to predict the electronic response to a large variety of field pulse perturbations, opening the door to laser-field-controlled chemistry.

The present work leads to two main areas of future work. The first area concerns technical improvements to the procedure itself, including (i) replacing (12) with a weighted loss function, to account for density elements that oscillate on different vertical scales, (ii) propagating our ML Hamiltonian using the MMUT scheme, thus eliminating the kind of error quantified by $E_{\mathrm{Sch}}$, and (iii) further exploring reductions in the number of degrees of freedom in the ML Hamiltonian. The second area concerns improving our overall understanding of the procedure, and applying it to systems of greater chemical and physical interest. In this area, further work is needed to understand the difference between the exact and ML Hamiltonians, whether this difference can be decreased by training on multiple trajectories, and how far outside the training regime we can push the ML Hamiltonian. We can also seek to learn the $\hat{H}$ operator rather than the $\mathbf{H}$ matrix representation, which is of interest for determining the unknown exchange-correlation potential within time-dependent density functional theory. In this way, we can push this procedure beyond known physics (as explored here) to systems where the underlying potential energy terms are not known with sufficient accuracy or precision.

Acknowledgements
This work was supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences under Award Number DE-SC0020203. We acknowledge computational time on the MERCED cluster (funded by NSF ACI-1429783), and on the Nautilus cluster, which is supported by the Pacific Research Platform (NSF ACI-1541349), CHASE-CI (NSF CNS-1730158), and Towards a National Research Platform (NSF OAC-1826967). Additional funding for Nautilus has been supplied by the University of California Office of the President.
Appendix
Canonical Orthogonalization.
Let $\mathbf{S}$ be the overlap matrix with $S_{\mu\nu} = \langle \chi_\mu | \chi_\nu \rangle$. Because it is real and symmetric, we have $\mathbf{S} = \mathbf{U} \mathbf{s} \mathbf{U}^T$, where $\mathbf{s}$ is diagonal and real, and $\mathbf{U}$ is a real orthogonal matrix. Now we form $\mathbf{X} = \mathbf{U} \mathbf{s}^{-1/2}$. Then, we go from $\mathbf{P}$ to $\mathbf{P}'$ via
$$\mathbf{P}' = \mathbf{s}^{1/2} \mathbf{U}^T \mathbf{P} \mathbf{U} \mathbf{s}^{1/2}.$$
If $\mathbf{H}$ is the Hamiltonian in the AO basis, the Hamiltonian in the orthogonalized basis is
$$\mathbf{H}' = \mathbf{s}^{-1/2} \mathbf{U}^T \mathbf{H} \mathbf{U} \mathbf{s}^{-1/2}.$$

Dimensionality Reduction.
For LiH, certain elements of the density matrix $\mathbf{P}'(t)$ are identically zero for all $t$. We thus define a reduced state vector $\mathbf{p}^\sharp$ that consists of the non-zero upper-triangular degrees of freedom, i.e., the $\widetilde{M} \le M^2$ elements that are necessary to reconstruct all of $\mathbf{P}'$. Out of these $\widetilde{M}$ elements, we take the first $\widetilde{M}_R$ to correspond to elements of $\mathbf{P}'_R$ and the remaining $\widetilde{M}_I = \widetilde{M} - \widetilde{M}_R$ to correspond to elements of $\mathbf{P}'_I$. Define $Z$ by
$$Z = Z_R \cup Z_I \quad (19a)$$
$$Z_R = \{ (i,j) \text{ s.t. } i \le j \text{ and not } P'_{R,ij}(t) \equiv 0 \} \quad (19b)$$
$$Z_I = \{ (i,j) \text{ s.t. } i < j \text{ and not } P'_{I,ij}(t) \equiv 0 \}. \quad (19c)$$
We form a mapping $\sigma : \{1, 2, \ldots, \widetilde{M}\} \to Z$, whose purpose is to map one-dimensional indices of the reduced vector $\mathbf{p}^\sharp$ to ordered-pair indices of the full matrix $\mathbf{P}'$. We define $\sigma$ implicitly through (19) and the following:
$$p^\sharp_k = \begin{cases} P'_{R,\sigma(k)} & k \le \widetilde{M}_R \\ P'_{I,\sigma(k)} & k > \widetilde{M}_R. \end{cases} \quad (20)$$
Looping over the entries $k = 1, 2, \ldots, \widetilde{M}$, this equation lets us go back and forth between the full complex matrix $\mathbf{P}'$ and the reduced real state vector $\mathbf{p}^\sharp$.

Importantly, we now follow precisely the same procedure, with the same mapping $\sigma$ and set $Z$, to form a reduced Hamiltonian vector $\mathbf{h}^\sharp$. We then formulate a reduced-dimensionality version of (10):
$$\widetilde{\mathbf{h}}^\sharp = \widetilde{\boldsymbol{\beta}}_0 + \widetilde{\boldsymbol{\beta}}_1 \mathbf{p}^\sharp. \quad (21)$$
The matrix $\widetilde{\boldsymbol{\beta}}_1$ is now of dimension $\widetilde{M} \times \widetilde{M}$; all other objects in this equation are vectors of dimension $\widetilde{M} \times 1$. The training procedure then holds without further modifications.
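The packing and unpacking implied by (19) and (20) can be sketched as follows. The function names, the trajectory-based detection of identically-zero elements, and the tolerance are our own; this is a minimal illustration, not the paper's implementation.

```python
import numpy as np

def reduction_indices(traj, tol=1e-14):
    """Identify Z_R and Z_I: upper-triangular (i, j) pairs whose real
    (resp. imaginary) parts are not identically zero across a stored
    trajectory of shape (steps, M, M).  Mirrors (19a)-(19c)."""
    m = traj.shape[1]
    z_r = [(i, j) for i in range(m) for j in range(i, m)
           if np.abs(traj[:, i, j].real).max() > tol]
    z_i = [(i, j) for i in range(m) for j in range(i + 1, m)
           if np.abs(traj[:, i, j].imag).max() > tol]
    return z_r, z_i

def pack(p, z_r, z_i):
    # Reduced real vector p#: real parts first, then imaginary parts, as in (20).
    return np.array([p[i, j].real for (i, j) in z_r] +
                    [p[i, j].imag for (i, j) in z_i])

def unpack(ps, z_r, z_i, m):
    # Rebuild the full Hermitian matrix from the reduced vector p#.
    p = np.zeros((m, m), dtype=complex)
    for k, (i, j) in enumerate(z_r):
        p[i, j] += ps[k]
        if i != j:
            p[j, i] += ps[k]          # real part is symmetric
    for k, (i, j) in enumerate(z_i, start=len(z_r)):
        p[i, j] += 1j * ps[k]
        p[j, i] += -1j * ps[k]        # imaginary part is antisymmetric
    return p
```

For a Hermitian matrix with no identically-zero elements, `unpack(pack(P, z_r, z_i), z_r, z_i, M)` reproduces `P`, which is the round-trip property the reduced training representation relies on.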