[PDF] NECI: N-Electron Configuration Interaction with emphasis on state-of-the-art stochastic methods

Abstract

We present NECI, a state-of-the-art implementation of the Full Configuration Interaction Quantum Monte Carlo algorithm, a method based on a stochastic application of the Hamiltonian matrix on a sparse sampling of the wave function. The program utilizes a very powerful parallelization and scales efficiently to more than 24000 CPU cores. In this paper, we describe the core functionalities of NECI and recent developments. This includes the capabilities to calculate ground and excited state energies, properties via the one- and two-body reduced density matrices, as well as spectral and Green's functions for ab initio and model systems. A number of enhancements of the bare FCIQMC algorithm are available within NECI, allowing to use a partially deterministic formulation of the algorithm, working in a spin-adapted basis or supporting transcorrelated Hamiltonians. NECI supports the FCIDUMP file format for integrals, supplying a convenient interface to numerous quantum chemistry programs and it is licensed under GPL-3.0.

Full PDF

aa r X i v : . [ phy s i c s . c o m p - ph ] J un NECI: N -Electron Conﬁguration Interaction with an emphasis on state-of-the-artstochastic methods Kai Guther, a) Robert J. Anderson, Nick S. Blunt, Nikolay A. Bogdanov, DeidreCleland, Nike Dattani, Werner Dobrautz, Khaldoon Ghanem, Peter Jeszenski, Niklas Liebermann, Giovanni Li Manni, Alexander Y. Lozovoi, Hongjun Luo, DongxiaMa, Florian Merz, Catherine Overy, Markus Rampp, Pradipta Kumar Samanta, Lauretta R. Schwarz,

1, 3

James J. Shepherd, Simon D. Smart, Eugenio Vitale, OskarWeser,

1, 2

George H. Booth, and Ali Alavi

1, 3, b)1)

Max Planck Institute for Solid State Research, Heisenbergstr. 1, 70569 Stuttgart,Germany Department of Physics, King’s College London, Strand, London WC2R 2LS,United Kingdom Department of Chemistry, University of Cambridge, Lensﬁeld Road,Cambridge CB2 1EW, United Kingdom CSIRO Data61, Docklands VIC 3008, Australia Department of Electrical and Computer Engineering, University of Waterloo,200 University Avenue, Waterloo, Canada Centre for Theoretical Chemistry and Physics, NZ Institute for Advanced Study,Massey University, New Zealand Lenovo HPC&AI Innovation Center, Meitnerstr. 9, 70563 Stuttgart Max Planck Computing and Data Facility (MPCDF), Gießenbachstr. 2,85748 Garching, Germany Department of Chemistry & Informatics Institute, University of Iowa (Dated: 29 June 2020)

1e present

NECI , a state-of-the-art implementation of the Full Conﬁguration Inter-action Quantum Monte Carlo algorithm, a method based on a stochastic applicationof the Hamiltonian matrix on a sparse sampling of the wave function. The programutilizes a very powerful parallelization and scales eﬃciently to more than 24000 CPUcores. In this paper, we describe the core functionalities of

NECI and recent develop-ments. This includes the capabilities to calculate ground and excited state energies,properties via the one- and two-body reduced density matrices, as well as spectraland Green’s functions for ab initio and model systems. A number of enhancementsof the bare FCIQMC algorithm are available within

NECI , allowing to use a partiallydeterministic formulation of the algorithm, working in a spin-adapted basis or sup-porting transcorrelated Hamiltonians.

NECI supports the FCIDUMP ﬁle format forintegrals, supplying a convenient interface to numerous quantum chemistry programsand it is licensed under GPL-3.0.This article has been accepted by the Jouranl of Chemical Physics, after it is pub-lished, it will be found at https://aip.scitation.org/journal/jcp . a) Electronic mail: [email protected] b) Electronic mail: [email protected] . INTRODUCTION NECI started oﬀ in the late 1990s as an exact diagonalisation code for model quantum dots ,and has evolved into a code to perform stochastic diagonalisation of large fermionic systemsin ﬁnite but large quantum chemical basis sets, using the Full Conﬁguration InteractionQuantum Monte Carlo (FCIQMC) algorithm . This algorithm samples Slater determinant(i.e. antisymmetrized) Hilbert spaces using signed walkers , by propagation of the walkersthrough stochastic application of the second-quantized Hamiltonian onto the walker popula-tion. In philosophy, it is similar to the continuum real-space Diﬀusion Monte Carlo (DMC)algorithm. However, unlike DMC, no ﬁxed node approximation needs to be applied. In-stead, the nodal structure of the wavefunction, as encoded by the signed coeﬃcients of thesampled Slater determinants, emerges from the dynamics of the simulation itself. How-ever, being based on an FCI parametrization of the wave function, the FCIQMC methodexhibits a steep scaling with the number of electrons and is thus only suited for relativelysmall chemical systems compared to those accessible to DMC. While the common energymeasures in FCIQMC methods, namely the projected, trial energies (cf section IV) and theenergy "shift", are not variational, a variational energy can be computed from two paral-lel FCIQMC calculations either directly (cf section VI), or via the reduced density matrix(RDM) based energy estimator (cf section VII).There are also similarities between the FCIQMC approach and the Auxiliary-Field Quan-tum Monte Carlo (AFQMC) method , both being stochastic projector techniques formu-lated in second quantized spaces. The latter however works in an over-complete space ofnon-orthogonal Slater determinants and relies on the phase-less approximation to eliminatethe phase problem associated with the Hubbard-Stratonovich transformation of the Coulombinteraction kernel, the quality of this approximation being reliant on the trial wavefunctionused to constrain the path. The objective of AFQMC is the measurement of observablessuch as the energy by sampling over the Hubbard-Stratonovich ﬁelds. FCIQMC on theother hand works in a ﬁxed Slater determinant space and relies on walker annihilation toovercome the fermion sign problem. The phase-less approximation renders the AFQMCmethod polynomial scaling, with an uncontrolled approximation, while i-FCIQMC, whichis an in principle exact method, remains exponential scaling. Finally FCIQMC provides a3irect measure of the CI amplitudes of the many-body wavefunction expressed in the givenorbital basis, from which observables can be computed including elements of reduced den-sity matrices (which do not commute with the Hamiltonian) via pure estimators. Exactsymmetry constraints, including total spin, can be incorporated into the formalism . Inthis sense, the FCIQMC method is closer in spirit to multi-reference CI methods used inquantum chemistry to study multi-reference problems rather than the AFQMC method.In its original formulation, the algorithm is guaranteed to converge onto the ground-statewavefunction in the long imaginary-time propagation limit, provided a suﬃcient number ofwalkers is used. This number is generally found to scale with the Hilbert space size, andis a manifestation of the sign-problem in this method, essentially implying an exponentialmemory cost in order to guarantee stable convergence onto the exact solution. In thesubsequent development of the initiator method (i-FCIQMC) , this condition was relaxedto allow for stable simulations at relatively low walker populations, much smaller than thefull Hilbert space size, albeit at the cost of a systematically improvable bias. While theinitiator adaptation removes the strict need for a minimum walker number, it does noteliminate the exponential scaling of the method, such that calculations become more andmore challenging with increasing system size. To give an idea of the capabilities of the NECI implementation, estimates for the accessible system sizes are given below. The rateof convergence of the initiator error with walker number has been found to be slow forlarge systems. This is a manifestation of size-inconsistency error which generally plagueslinear Conﬁguration Interaction methods. A very recent development of the adaptive shift method , mitigates this error substantially, enabling near-FCI quality results to be obtainedfor systems as large as benzene.The development of the semi-stochastic method by Umrigar et al. and its furtherreﬁnements dramatically reduced the stochastic noise and hence improved the eﬃciencyof the method.The FCIQMC algorithm, as well as its semi-stochastic and initiator versions, are scalableon large parallel machines, thanks to the fact that walker distribution can be distributedover many processors with relatively small communication overhead. The methods, however,are not embarrassingly parallel, owing to the annihilation step of the algorithm (see alsoﬁgure 1). For this reason, parallelisation over very large numbers of processors is a highlynon-trivial task, but substantial progress has been made, and here we show that eﬃcient4arallelisation up to more than 24000 CPU cores can be achieved with the current NECI code.The FCIQMC method has been generalised to excited states of the same symmetryas the ground state and to the calculation of pure one-and two-particle reduced densitymatrices via the "replica-trick" (and more recently three and four-particle RDMs ).The availability of RDMs enabled the development of the Stochastic CASSCF method for treating extremely large active spaces. More recently, a fully spin-adapted formulation ofFCIQMC has been implemented based on the Graphical Unitary Group Approach , whichovercomes the previous limitations of spin-adaptation, which severely limited the numberof open-shell orbitals which could be handled. Other advanced developments of FCIQMCin the NECI code include real-time propagation and application to spectroscopy , Krylov-space FCIQMC , and the similarity transformed FCIQMC which allows the directincorporation of Jastrow and similar factors depending on explicit electron-electron variablesinto the wavefunction.A number of stochastic methods have been developed as an extension or variation ofthe FCIQMC approach. These include density matrix quantum Monte Carlo (DMQMC),which allows the exact thermal density matrix to be sampled at any given temperature, andalso allows straightforward estimation of general observables, including those which do notcommute with the Hamiltonian . Applications of DMQMC include providing accuratedata for the warm dense electron gas . Although not implemented in NECI, DMQMC isavailable in the HANDE-QMC code .The FCIQMC method has lead to the development of a number of highly eﬃcient deter-ministic selected CI methods, including the adaptive sampling CI method of Head-Gordonand co-workers , who also establish the connection with the much older perturbativelyselected CIPSI method of Malrieu et al but with a modiﬁed search procedure, while theHeat-Bath CI method of Umrigar and coworkers was developed from the Heat-Bath excita-tion generation for FCIQMC together with an initiator-like criterion to select the connecteddeterminants with extreme eﬃciency. Later a sign-problem-free semi-stochastic evaluationof the Epstein-Nesbet perturbation energy was developed by Sharma et al to computethe missing dynamical correlation energy at second-order in a memory and CPU eﬃcientmanner. Other highly related developments to FCIQMC which originate in the numericalanalysis literature include the Fast Randomised Iteration and further developments by5eare, Berkelbach and coworkers , and co-ordinate descent FCI of Lu and coworkers .Depending on the utilized features, the number of electrons and accessible basis sizescan vary. The i-FCIQMC implementation including the semi-stochastic version is highlyscalable and has been successfully applied to Hilbert space sizes of up to with 54electrons . Atomic basis sets up to aug-cc-pCV8Z for ﬁrst-row atoms (1138 spin orbitals)are treatable . Reduced density matrices can routinely be calculated for use in accu-rate Stochastic-MCSCF for active spaces containing up to 40 electrons and 38 spatialorbitals . Real-time calculations are computationally more demanding but can still beperformed for ﬁrst-row dimers using cc-pVQZ basis sets . For the similarity transformedFCIQMC method, the limiting factor is not the convergence of the FCIQMC, but the storageof the three-body interaction terms imposing a limit of ∼ spatial orbitals on currentlyavailable hardware . Optimized implementations for the application to lattice model sys-tems, like the Hubbard (in a real- and momentum space formulation), t − J and theHeisenberg models for a variety of lattice geometries, are implemented in NECI . The ap-plicability of FCIQMC to the Hubbard model strongly depends on the interaction strength

U/t . For the very weakly correlated regime

U/t < , FCIQMC is employable up to 70lattice sites , using a momentum-space basis. In the interesting, yet most problematic,intermediate interaction strength regime in two dimensions, the transcorrelated (similarity-transformed) FCIQMC is necessary to obtain reliable energies in systems up to 50 sites (atand near half-ﬁlling) .The FCIQMC algorithm as implemented in NECI is based on a sparse representation ofthe wave function and a stochastic application of the Hamiltonian. We start with the fullwave function | ψ FCI i = X i C i | D i i , (1)with coeﬃcients C i in a many-body basis | D i i . NECI supports Slater determinants or CSFsas a many-body basis, for simplicity, for now, the usage of determinants is assumed, but thealgorithm is analogous for CSFs, see also section XII B 2. The FCIQMC wave function is notnormalized. The ground state of a Hamiltonian ˆ H is now obtained by iterative imaginarytime-evolution, with the propagator expanded to ﬁrst order using a discrete time-step ∆ τ such that | ψ ( τ + ∆ τ ) i = (cid:16) − ∆ τ ( ˆ H − S ( τ )) (cid:17) | ψ ( τ ) i , (2)6hich converges to the ground state of ˆ H for τ → ∞ for | ∆ τ | < W where W is the diﬀerencebetween the largest and smallest eigenvalue of ˆ H . Here, S ( τ ) is a diagonal shift appliedto ˆ H , which is iteratively updated to match the ground state energy.The full wave function is stored in a compressed manner, where only coeﬃcients abovea given threshold value C min are stored. Coeﬃcients smaller than C min are stochasticallyrounded. That is, in every iteration, a wave function given by coeﬃcients C i ( τ ) is storedsuch that C i ( τ ) =  C i if | C i | > C min sign( C i ) C min else w . prob | C i | C min , (3)such that h C i ( τ ) i = C i . This compression is applied in every step of the algorithm thataﬀects the coeﬃcients. The value | C i ( τ ) | is referred to as walker number of the determinant | D i i , so | D i i is said to have | C i ( τ ) | walkers assigned.Applying the Hamiltonian to this compressed wave function is done by separating it intoa diagonal and an oﬀ-diagonal part as | ψ ( τ + ∆ τ ) i = X i (1 − ∆ τ ( H ii − S ( τ ))) C i ( τ ) | D i i | {z } (b) Death step (c) Annihilation step ↓− X i X j = i ∆ τ H ji C i ( τ ) | D j i | {z } (a) Spawn step , (4)and then performed in the three labeled steps ( a ) − ( c ) . First, in the spawning step , theoﬀ-diagonal part is evaluated by stochastically sampling the sum over j , storing the result-ing spawned wave-function as a separate entity as described in the ﬂow chart in Figure 1.Then, in the death step , the diagonal contribution is evaluated deterministically, followinga stochastic rounding of the resulting coeﬃcients. This step is performed in-place, sincethe coeﬃcients of the previous iterations are not required anymore. Finally, the spawnedwave-function from the oﬀ-diagonal part is added in the annihilation step , summing upall contributions from the spawned wave-function to each determinant. NECI implementsthe initiator method, too, which labels a class of determinants as initiators , typically those7ith an associated walker number above a given threshold, and eﬀectively zeroes all ma-trix elements between non-initiator determinants and determinants with C i ( τ ) = 0 . Theimplementation thereof is also sketched in ﬁgure 1.In the context of FCIQMC calculations, the core functionality of NECI consists of ahighly parallelizable implementation of the initiator FCIQMC method for both real andcomplex Hamiltonians. There is both a generic interface for ab initio systems, specializedimplementations for the Hubbard and Heisenberg models, as well as the uniform electron gas.The interface for passing input information on the system to NECI is discussed in section XIV.To enable continuation of calculations at a later point,

NECI can write the instantaneouswave function and current parameters—such as the shift value—to disk, saving the currentstate of the calculation.The

NECI program itself is written in Fortran, and requires extended Fortran 2003 sup-port, which is the default for current Fortran compilers. Parallelization is achieved usingthe Message Passing Interface (MPI) , and support for MPI 3.0 or newer is required. NECI further requires the BLAS and LAPACK lineara algebra libraries, which are available innumerous packages. Usage of the HDF5 library for parallel I/O is supported, but not re-quired. If used, the linked HDF5 library has to be built with Fortran support and for parallelapplications. For installation, cmake is required, as well as the fypp Fortran preprocessor .For pseudo random number generation, the double precision SIMD oriented fast MersenneTwister (dSFMT) implementation of the Mersenne Twister method is used. The stableversion of the program can be obtained from github at https://github.com/ghb24/NECI_STABLE , licensed under the GNU General Public License 3.0. Some advanced or experimen-tal features are only contained in the development version, for access to the developmentversion, please contact the corresponding authors. All features presented here are eventuallyto be integrated to the stable version. Detailed instructions on the installation can be foundin the Documentation that is available together with the code.In the following, various important features of NECI are explained in detail. An overviewof excitation generation, a fundamental part of every FCIQMC calculation, is given in sec-tion II. Then, the semi-stochastic approach (section III), the estimation of energy and useof trial wave functions (section IV), the recently proposed adaptive shift method to reducethe initiator error (section V) and perturbative corrections to this error (section VI), thesampling of reduced density matrices which is crucial for interfacing the FCIQMC method8 ontinuecalculation?Read wave function from file Initialize wave functionwith single determinantFor each iterationRead the Integrals from file / initialize the HamiltonianFor each determinantUseInitiator method? Determine whether is an initiatorGet contribution to projected energy For each walker on this determinantChoose a random connected determinantCreate a spawn on the chosendeterminant Is the spawn above threshold? Stochastically roundthe spawnAll spawns performed?Perform walker death on this determinantVisited all determinants?Communicate new spawns to respective tasksFor each spawn onto this taskUseInitiator method? Spawn is from non-initiatorto unoccupied? Abort the spawnAdd the spawn to the wave function, performing annihilationHandled all spawns?Reached targetpopulation? Adjust the shift tocontrol population

Re-assign determinants to tasks forload-balancing (every 1000 iterations)Time / iteration limit reached?Write current wave function to disk

Yes YesNo YesYesYesYesYesYes Yes NoNoNoNoNo No NoNo

Communicate output data

FIG. 1. Flow chart showing the basic initiator-FCIQMC implementation in

NECI . Marked in red aresteps that require synchronisation between the MPI tasks and thus are not trivially parallelizable. with other algorithms (section VII), the calculation of excited states (section VIII), staticresponse functions (section IX and the real-time FCIQMC method (section X), the transcor-related approach (section XI) and the available symmetries, including total spin conserva-9ion utilizing GUGA (section XII) are discussed. Finally, the scalability of

NECI is explored(section XIII) and the interfaces for usage with other code are presented (section XIV), inparticular for the Stochastic-MCSCF method (section XV).

II. EXCITATION GENERATION

A key component of the FCIQMC algorithm is the sampling of the Hamiltonian matrixelements in the spawning step, where the Hamiltonian is applied stochastically. This requiresan eﬃcient algorithm to randomly generate connected determinants with a known probabil-ity p gen for any given determinant, referred to as excitation generation. This typically meansmaking a symmetry constrained choice of (up to) two occupied orbitals in a determinantand (up to) two orbitals to replace them with, such that the corresponding Hamiltonianmatrix element is non-zero. If spin-adapted functions are used rather than determinants,the connectivity rules change but the main principles are same.The spawning probability for a spawn from a determinant | D i i to a determinant | D j i isin practice given by p s = ∆ τ | H ij | p gen ( j | i ) . (5)This means, the purpose of p gen ( j | i ) of selecting | D j i from | D i i in the spawning probability p s is to allow the ﬂexibility in the selection of determinants | D j i from | D i i so that, irrespectiveof how we choose | D j i from | D i i , the rate at which transitions occur is not biased by theselection algorithm. In other words, if a particular determinant | D j i is only selected rarelyfrom | D i i (i.e. with low generation probability), then the acceptance of the move (i.e.the spawning probability) will be with correspondingly high probability (i.e. proportionalto the inverse of the generation probability). Conversely if a determinant | D j i is selectedwith relatively high generation probability from | D i i , then its acceptance probability will becorrespondingly low. In other words, from the point of view of the exactness of the FCIQMCalgorithm, the precise manner in which excitations are made is immaterial: as long as theprobability p gen ( j | i ) > when | H ij | > , the algorithm will ensure that transitions from D i → D j occur at a rate proportional to | H ij | , and hence the walker dynamics convergesonto the exact ground-state solution of the Hamiltonian matrix. However, from the point ofview of eﬃciency , diﬀerent algorithms to generate excitations are by no means equivalent.That is, events with a very large | H ij | p gen ( j | i ) can lead to very large spawns and thus endanger10he stability of an i-FCIQMC calculation. For time-step optimization, NECI oﬀers a generalhistogramming method, which determines the time-step from a histogram of | H ij | p gen ( j | i ) , as wellas an optimized special case thereof, which only takes into account the maximal ratio . Ifrequired, internal weights of the excitation generators such a bias towards double excitationsare then optimized in the same fashion to maximise the time-step.However, as a result, the time-step and thus overall eﬃciency of the simulation is drivenby the worst-cases of the | H ij | p gen ( j | i ) ratio discovered within the explored Hilbert space. Thusan optimal excitation generator should create excitations with a probability distribution tothe Hamiltonian matrix elements, such that | H ij | p gen ( j | i ) ≈ const . (6)This is the optimal probability distribution, since then, the acceptance rate is solely deter-mined by the time step . NECI supports a variety of algorithms to perform excitation generation, with the mostnotable being the pre-computed heat-bath (PCHB) sampling (a variant of the heat-bathsampling presented in , as described in the appendix A 3), the on-the-ﬂy Cauchy-Schwartzmethod (described in the appendix A 2), the pre-computed Power-Pitzer method andlattice-model excitation generators both for real-space and momentum-space lattices. Addi-tionally, a three-body excitation generator and a uniform excitation generator are available,which are essential for treating systems with the transcorrelated ansatz when includingthree-body interactions.As heat-bath excitation generation can have high memory requirements, it might beimpractical for some systems. There, the on-the-ﬂy Cauchy-Schwartz method can maintainvery good | H ij | p gen ( j | i ) ratios without signiﬁcant memory cost, albeit at O ( N ) computational cost, N being the number of orbitals, and possibly with dynamic load imbalance. The details ofthe Cauchy-Schwartz excitation generation are discussed in the appendix. III. SEMI-STOCHASTIC FCIQMC

In many chemical systems the wave function is dominated by a relatively small numberof determinants. In a stochastic algorithm, the eﬃciency can be improved substantially bytreating these determinants in a partially deterministic manner.11etruzielo et al. suggested a semi-stochastic algorithm , where the FCIQMC projectionoperator ˆ P = P ij P ij | D i ih D j | , is applied exactly within a small but important subspace,which we call the deterministic space, D . Speciﬁcally, we write ˆ P = ˆ P D + ˆ P S , (7)where ˆ P D = X i ∈D , j ∈D P ij | D i ih D j | . (8)The ˆ P D operator therefore accounts for all spawnings which are both from and to deter-minants in D . The stochastic projection operator, ˆ P S , contains all remaining terms. Thematrix elements of ˆ P D are calculated and stored in a ﬁxed array, and applied exactly eachiteration by a matrix-vector multiplication. The operator ˆ P S is then applied stochasticallyby the usual FCIQMC spawning algorithm.The semi-stochastic adaptation requires storing the Hamiltonian matrix within D , whichwe denote H D . In NECI , H D is stored in a sparse format, distributed across all processes. Tocalculate H D , we have implemented the fast generation scheme of Li et al. This approachhas allowed us to use deterministic spaces containing up to ∼ determinants. However,a more typical size of D is between and .Ideally, a deterministic space of a given size ( N D ) should be chosen to contain the deter-minants with the largest value of | C i | in the exact FCI wave function. This optimal choiceis not possible in practice, but various approaches exist to make an approximate selection.Umrigar and co-workers suggest using selected conﬁguration interaction (SCI) to make theselection. Within

NECI , the most common approach is to choose the N D determinants whichhave the largest weight in the FCIQMC wave function, at a given iteration. Therefore, atypical FCIQMC simulation in

NECI will be performed until convergence (at some iterationnumber N conv . ) using the fully-stochastic algorithm, at which point the semi-stochastic ap-proach is turned on, selecting the N D most populated determinants in the instantaneouswave function to form D . The appropriate parameters ( N D and N conv . ) are speciﬁed in the NECI input ﬁle.

NECI supports performing periodic re-evaluation of the N D most populateddeterminants, updating the deterministic space D with a given frequency.Using the semi-stochastic adaptation with a moderate deterministic space (on the orderof ∼ ) can improve the eﬃciency of FCIQMC by multiple orders of magnitudes. This12s particularly true in weakly correlated systems. The semi-stochastic approach can alsobe used in NECI when sampling reduced density matrices (RDMs) as described in sectionVII. Here, contributions to RDMs are included exactly between all pairs of determinantswithin D . It has been shown that this can substantially reduce the error on RDM-basedestimators. Using the semi-stochastic adaptation in

NECI disables the load-balancing unlessa periodic update of D is performed. IV. TRIAL WAVE FUNCTIONS

The most common energy estimator used in FCIQMC is the reference-based projectedestimator, E Ref = h D Ref | ˆ H | Ψ ih D Ref | Ψ i , (9)where | D Ref i is an appropriate reference determinant (usually the Hartree–Fock determi-nant). In case | Ψ i is an eigenstate, this yields the exact energy, but in general it is anon-variational estimator. This is the default estimator for the energy, and can be obtainedwith minimal overhead. NECI has the option to use projected estimators based on more accurate trial wave func-tions, which can signiﬁcantly reduce statistical error in energy estimates. For this reason wedeﬁne a trial subspace T , which is spanned by N T determinants. Similarly to the determin-istic space, T should ideally be formed from the determinants with the largest contributionin the FCI wave function, or some good approximation to these determinants. Projecting ˆ H into T gives us a N T × N T matrix, which we denote H T , whose eigenstates can be usedas trial wave functions for more accurate energy estimators.Let us denote an eigenstate of H T by | Ψ T i = P i ∈T C T i | D i i , with eigenvalue E T . Thena trial function-based estimator can be deﬁned as E Trial = h Ψ T | ˆ H | Ψ ih Ψ T | Ψ i , (10) = E T + P j ∈C C j V j P i ∈T C i C T i . (11)Here, C is the space of all determinants connected to T by a single application of ˆ H (notincluding those in T ). C i denotes walker coeﬃcients in the FCIQMC wave function, and V j

13s deﬁned within C as V j = X i ∈T h D j | ˆ H | D i i C T i , | D j i ∈ C , | D i i ∈ T . (12)To calculate the estimator E Trial we therefore require several large arrays: ﬁrst, H T , whichis stored in a sparse format, in the same manner as the deterministic Hamiltonian in thesemi-stochastic scheme; second, | ψ T i , which must be calculated by the Lanczos or Davidsonalgorithm; third, V , which is a vector in the entire C space. The number of coeﬃcientsto store in C is larger than in T by a signiﬁcant amount, typically by several orders ofmagnitude. Indeed, storing V can become the largest memory requirement. Because ofthis, using trial wave functions is typically more memory intensive in NECI than using thesemi-stochastic approach, for a given space size. We therefore suggest using a smaller trialspace, T , compared to the deterministic space, D .Note that the initiator error on E Trial is not the same as the initiator error on E Ref . Forexample, E Trial becomes exact as | Ψ T i approaches the FCI wave function. For practicaltrial wave functions, however, the two energy estimates typically give similar initiator errorsfor ground-state energies in our experience. An exception occurs for excited states (seeSection VIII). In this case, the wave function is usually not well approximated by a singlereference determinant, and E Trial with an appropriate T yields a great improvement, bothfor the statistical and initiator error. V. ADAPTIVE SHIFT

The initiator criterion is important in making FCIQMC a practical method allowing usto achieve convergence at a dramatically lower number of walkers than the full FCIQMC .However, this approximation introduces a bias in the energy when an insuﬃcient number ofwalkers is used. This bias can be attributed to the fact that non-initiators are systematicallyundersampled due to the lack of feedback from their local Hilbert space. To correct this,we can allow each non-initiator determinant | D i i to have its own local shift S i ( τ ) as anappropriate fraction of the full shift S ( τ ) S i ( τ ) = f i × S ( τ ) . (13)14he fraction f i is computed by monitoring which spawns are accepted due to the initiatorcriterion and accumulating positive weights over the accepted and rejected ones: f i = P j ∈ accepted w ij P j ∈ all w ij . (14)These weights w ij are derived from perturbation theory where the ﬁrst-order contributionof determinant | D i i to the amplitude of determinant | D j i is used as a weight for spawnsfrom | D i i to | D j i w ij = | H ij | H jj − E . (15)It is worth noting that, regardless of how the weights are chosen, expression (14) guaranteesthat initiators get the full shift. Also as the number of walkers increases, the local Hilbertspace of a non-initiator becomes more and more populated, restoring the full method in thelarge walker limit.We call the above approach for unbiasing the initiator approximation, the adaptive-shift method . In Fig. 2, examplary results (from ) from using the adaptive shift methodare displayed, comparing total energies of the butadiene molecule in ANO-L-pVDZ basis(22 electrons in 82 spatial orbitals), obtained with the normal initiator method and theadaptive shift method using three diﬀerent values of the initiator parameter n a : 3, 10 and20. The adaptive shift results are in good agreement with other benchmark values fromDMRG, CCSDT(Q) and extrapolated HCIPT2. In contrast, the normal initiator methodhas a bias of over 10 mH. Also notice how by using the adaptive shift, the results become,to a large extent, independent of the initiator parameter n a . VI. PERTURBATIVE CORRECTIONS TO INITIATOR ERROR

An alternative approach to removing initiator error in

NECI is through a perturbativecorrection . In the initiator approximation, spawning events from non-initiators to unoc-cupied determinants are typically discarded. These discarded events make up a signiﬁcantfraction of all spawning attempts made, which in turn accounts for much of the total sim-ulation time. While it is necessary to discard these spawned walkers to prevent disastrousnoise from the sign problem , this step is extremely wasteful.These discarded walkers actually contain signiﬁcant information which can be used togreatly increase the accuracy of the initiator FCIQMC approach. Speciﬁcally, these walkers15 .25 0.50 0.75 1.00 1.25 1.50 1.75 2.00Total Number of Walkers 1e8−155.56−155.55−155.54−155.533155.523155.51 T o . a l E n e r g2 ( a . u . ) Normal Initiator, n a = 20Adaptive Shift, n a = 20Normal Initiator, n a = 10Adaptive Shift, n a = 10Normal Initiator, n a = 3Adaptive Shift, n a = 3DMRG - 6000CCSDT(Q)HCIPT2 - Extrapolated FIG. 2. Example of application of the adaptive shift method: Total energies of butadiene for thenormal initiator and the adaptive shift method, as a function of the number of walkers, for threevalues of the initiator parameter n a . The adaptive shift results converge to: − . h , − . h and − . h for n a of 3, 10 and 20, respectively. The DMRG value of − . h , obtained with a bond dimension of 6000 , the CCSDT(Q) value of − . h andthe extrapolated HCIPT2 value of − . h are in good agreement with that. Reproducedfrom Ghanem et al. JCP 151, 224108 (2019) with the permission of AIP Publishing. may sample up to double excitations from the currently-occupied determinants (a similarargument can be used to justify the above adaptive shift approach). In analogy with acomparable approach taken in selected CI methods, these discarded walkers can be used tosample a second-order correction to the energy from Epstein-Nesbet perturbation theory.16he correction is calculated by ∆ E = 1(∆ τ ) X i ∈ rejected S i S i E − H ii . (16)Here, ∆ τ is the time step, E is the i-FCIQMC estimate of the energy, and S ri is the totalspawned weight onto determinant | D i i in replica r (the replica approach will be discussed inmore detail in Section VII). This correction requires that two replica FCIQMC simulationsare being performed simultaneously, to avoid biases in this estimator. The summation hereis performed over all spawning attempts which are discarded on both replicas simultaneously.This must only be applied to correct the variational energy estimator from i-FCIQMC.Such variational energies in NECI can either be calculated directly , or from two-bodyreduced density matrices, which may be sampled in FCIQMC.This perturbative correction is essentially free to accumulate, since all spawned walkerscontributing to Eq. (16) are created regardless. The only signiﬁcant extra cost comes fromthe requirement to perform two replica simulations. However, for large systems the noiseon this correction can become signiﬁcant, which necessitates further running time to reducestatistical errors.This correction has proven extremely successful in practice, particularly for weakly corre-lated systems, where it is typical to see − of remaining initiator error removed . VII. DENSITY MATRIX SAMPLING AND PURE EXPECTATIONVALUES

While the total energy is an important quantity to extract from quantum systems, amore complete characterization of a system requires the ability to extract information aboutother expectation values. If these expectation values are derived from operators which do notcommute with the Hamiltonian of the system, then a ‘projected’ estimate of the expectationvalue akin to Eq. 9 is not possible, and alternatives within FCIQMC are required in order tocompute them. This is the case for many key quantities such as nuclear derivatives (forceson atoms), dipole moments and higher-order electrical moments, as well as other observablessuch as pair distribution functions . They all can be obtained via the corresponding n -bodyreduced density matrix ( n -RDM), where n is the rank of the operator in question, that fullycharacterizes the correlated distribution and coherence of n electrons relative to each other.17his information can also be used to calculate quantum information measures, which are notobservables but which characterize the entanglement within a system, such as correlationentropies .To characterize the strength of coupling between diﬀerent states under certain operators,e.g. the oscillator strength of optical excitations, as well as obtaining other dynamical infor-mation requires computing transition density matrices (tRDMs) between stochastic samplesof diﬀerent states, which can be sampled within FCIQMC using the excited state featurediscussed in section VIII . Furthermore, the two states considered may not sample eigen-states of the system, but one of them can be a response state of the system, then the resultingtRDMs characterize the response of a system to a perturbation, corresponding to a higherderivative of the energy such as the polarizability of the system, which will be addressed insection IX . Finally, RDMs can also be used to characterize the expectation value of an eﬀective Hamiltonian in a subspace of a system . This eﬀective Hamiltonian can includeeﬀects such as electronic correlations coupling the space to a wider external set of states.The plurality of electronic structure methods of this kind, such as explicitly correlated ‘F12’corrections for basis set incompleteness ; multi-conﬁgurational self-consistent ﬁeld ;internally-contracted multireference perturbation theories ; embedding methods ; andthe Multi-Conﬁguration Pair-Density Functional Theory (MC-PDFT) , further attest theimportance of faithful and eﬃcient sampling of RDMs in electronic structure theory.All expectation values of interest can be derived from contractions with a general reduceddensity matrix object, deﬁned as Γ A,Bi i ...i n ,j j ...j n = h Ψ A | ˆ a † i ˆ a † i ... ˆ a † i n ˆ a j n ˆ a j n − ... ˆ a j | Ψ B i , (17)where n denotes the ‘rank’ of the RDM, and the choice of the states A and B deﬁne thetype of RDM, as described above. In this section we focus on the sampling of the 2-RDM.This is generally the most common RDM required, as most expectation values of interest are(up to) two-body operators, including the total energy of the system. Furthermore, withinFCIQMC, the fact that the rank of the RDM required is then the same as the rank of theHamiltonian which is sampled within the stochastic dynamics, leads to a novel algorithmwhich ensures that the overhead to compute the 2-RDM is relatively small and manageable .Expanding the expression for the 2-RDM in terms of the exact FCI wave function (Eq. 1),18e ﬁnd Γ A,Bkl,mn = X i,j C A ∗ i C Bj h D i | ˆ a † k ˆ a † l ˆ a n ˆ a m | D j i , (18)where i, j index the many-electron Slater determinants and k, l, m, n denote single-particleorbitals. We will focus on the case where we are sampling | Ψ A i = | Ψ B i = | Ψ i , the groundstate of the system, since the same basic principles are applied to sampling the tRDMs,where the other walker distribution may represent an excited state or a response state, withmore details for these cases considered in Refs. 17 and 66. The expectation values derivedfrom these RDMs describe ‘pure’ expectation values, to distinguish them from the projectiveestimate of expectation values given in Eq. 9.There are some features of the form of Eq. 18 that should be noted. Firstly, the 2-RDMrequires the sampled amplitudes on all determinants in the space connected to each othervia (up to) a double electron substitutions. This means that this expectation value requiresa global sampling of connections in the entire Hilbert space, in contrast to the projectedenergy estimate, which requires only a consideration of the determinant amplitudes whichare connected directly through ˆ H to the reference determinant (or small trial wave function,see Sec. IV). Secondly, it is seen that the pairs of determinants in Eq. 18 are exactly the sameas the pairs of determinants connected in general through the Hamiltonian operator used tosample the FCIQMC dynamics in Eq. 4, assuming that the matrix element is not zero dueto (accidental) symmetry between the determinants. This allows an algorithm to samplethe 2-RDM concurrently with the sampling of the Hamiltonian required for the spawningsteps between occupied determinant pairs.A ﬁnal point to note, is that the n -RDM is a non-linear functional of the FCI amplitudes– speciﬁcally being a quadratic form. Within the FCIQMC sampling, the C i amplitudes arestochastic variables represented as walkers ( C i ( τ ) ) which at any one iteration are in generalvery diﬀerent from the true C i , but when averaged over long times have an expected meanamplitude which is the same as (or a very good approximation to) C i . However, due to thisnon-linearity in the form of the 2-RDM, the average of the sampled amplitude product isnot equal to the product of the average amplitude, h C ∗ i ( τ ) C j ( τ ) i τ = h C ∗ i ( τ ) i τ h C j ( τ ) i τ , as itneglects the (co-)variance between the sampled determinant amplitudes. Initial applicationsof RDM sampling in FCIQMC neglected these correlations in the sampling of the RDMs,which signiﬁcantly hampered the results, especially for the diagonal elements of the RDMs .19he result is that even if each determinant were correctly sampled on average, the stochasticerror in the sampling would manifest as systematic error in the RDMs, and thus only givecorrect results in the large walker limit, but not the large sampling limit, even if the wavefunction were correctly resolved.The resolution to this problem came via the ‘replica trick’ , which changes thequadratic RDM functional into a bilinear one . This formally removes the systematicerror in the RDM sampling, at the expense of requiring a second walker distribution. Thepremise is to ensure that these two walker distributions are entirely independent and prop-agated in parallel, sampling the same (in this instance ground-state) distribution. Thisensures an unbiased sampling of the desired RDM, by ensuring that each RDM contributionis derived from the product of an uncorrelated amplitude from each replica walker distri-bution. The sampling algorithm then proceeds by ensuring that during the spawning step,the current amplitudes are packaged and communicated along with any spawned walkers.During the annihilation stage, these amplitudes are then multiplied by the amplitude onthe child determinant from the other replica distribution, and this product then contributesto all n -RDMs which are accumulated, and equal to the rank of the excitation or higher.In this way, the eﬃcient and parallel annihilation algorithm is used to avoid latency ofadditional communication operations, with the necessary packaging of the amplitude andspeciﬁcation of the parent determinant along with each spawned walker being the onlyadditional overhead. The NECI implementation allows for up to 20 replicas to be run, whichexceeds any needs arising in the context of RDM calculation.Full details about the ground-state 2-RDM sampling algorithm can be found in Ref. 15,however we mention a few salient additional details here. The RDMs are stored in fullydistributed and sparse data structures, allowing the accumulation of RDMs for very largenumbers of orbitals. The sampling of the RDMs is also not inherently hermitian. Whilethe sampling within FCIQMC obeys detailed balance, the ﬂux of walkers spawned from | D i i → | D j i is only equal to the reverse ﬂux on average, and therefore the stochasticnoise ensures that the swapping of the two states does not give identical accumulated RDMamplitudes for ﬁnite sampling (note that for transition RDMs this is not expected, withmore details in Ref. 17). Similarly, the states sampled in FCIQMC are not normalized, andtherefore neither are the sampled RDMs. Both of these aspects are addressed at the end ofthe calculation, where the RDMs are explicitly made hermitian via averaging appropriate20ntries, and the normalization is constrained by ensuring that the trace of the RDMs givethe appropriate number of electrons .The dominant cost of RDM sampling in large systems comes from the sampling of ele-ments deﬁned by pairs of creation and annihilation operators with the same orbital index.These correspond to tuples of occupied orbitals common to both | D i i and | D j i states. Weterm these contributions promotions , as they contribute to a rank of a RDM greater thanthe excitation level between | D i i and | D j i . For instance, single excitation spawning eventsneed to contribute to all N − elements of the 2-RDM corresponding to common occu-pied orbitals in the two determinants. The most extreme case comes from the ‘diagonal’contributions to the RDMs, where i = j , which requires N ( N − / contributions to the2-RDM to be included where each index deﬁning the RDM element corresponds to the sameoccupied orbital in the two determinants. To mitigate this cost, these diagonal elementsare stored locally on each MPI process, and only infrequently accumulated at the end of anRDM ‘sampling block’, or when the determinant becomes unoccupied, with the amplitudeaveraged over the sampling block. This substantially reduces the frequency of the O ( N ) operations required to sample these promoted contributions from the diagonal of Eq. 18.Other eﬃciency boosting modiﬁcations to the algorithm, such as the semi-stochasticadaptation (detailed in Sec III) are also seamlessly integrated with the RDM accumula-tion. Within the deterministic core space the RDM contributions are exactly accumulatedalong with the exact propagation, with the connections from the deterministic to the stochas-tic spaces sampled in the standard fashion. This combination of RDM sampling with thesemi-stochastic algorithm can greatly reduce the stochastic errors in the RDMs by ensuringthat contributions from large weighted determinant amplitudes are explicitly and determin-istically included. Furthermore, the reference determinant and its direct excitations arealso exactly accumulated. This is partly because these are likely important contributions,but principally, if the reference is a Hartree–Fock determinant then the coupling to its sin-gle excitations via the Hamiltonian will be zero due to Brillouin’s theorem. These singleexcitations will nevertheless contribute to the RDMs, and therefore are included explicitly.The sampling of RDMs with a rank greater than two is also now possible within theFCIQMC algorithm and NECI code. The importance of these quantities is primarily in theiruse in internally-contracted multireference perturbation theories, although a number of otheruses for these quantities also exist . These methods allow for the FCIQMC dynamics to21nly consider an active orbital subspace, hugely reducing both the full Hilbert space ofthe stochastic dynamics as well as the required timestep, while the accumulation of upto 4-RDMs (or contracted lower-order intermediates for eﬃciency) allows for a rigorouscoupling of the strong correlation in the low-energy active space to the dynamic correlationin the wider ‘external’ space via post-processing of these higher-body RDMs with integralsof the external space. Sampling of higher-body RDMs cannot use the identical algorithmto the 2-RDMs, since it now requires the product of determinant amplitudes separated byup to 4-electron excitations, which are not explicitly sampled via the standard FCIQMCpropagation algorithm. To allow for this sampling, we include an additional spawning stepper walker of excitations with a rank between three and n , where n is the rank of thehighest RDM accumulated. This additional spawning is controlled with a variable stochasticresolution, ensuring that the frequency of these samples is relatively rare to control the costof sampling these excitations (approximately only one higher-body spawn for every 10-20traditional (up to two-body) spawning attempts). There is no timestep associated withthese excitations, and every attempt is ‘successful’, transferring information about higher-body correlations in the system and contributing to these higher-body excitations, but notmodifying the distribution of the sampled wave function. However, the dominant cost ofsampling these higher-body RDMs is not the sampling events themselves, but rather thepromotion of lower-rank excitations to these higher-body intermediates. Nevertheless, thefaithful sampling of these higher-body properties has allowed for the stochastic estimate offully internally-contracted perturbation theories in large active spaces, with similar numberof walkers required to sample the 2-RDM in an active space . VIII. EXCITED STATE CALCULATIONS

In many applications, besides ground state energies, the properties of excited states areof interest. If states in diﬀerent symmetry sectors are targeted, this can be easily achievedby performing separate calculations in each sector, yielding the ground state with a givensymmetry. If, however, several eigenstates with the same symmetry are required, then thisapproach is not suﬃcient. The FCIQMC method is not inherently limited to ground statecalculations, and can employ a Gram-Schmidt orthogonalization technique to calculate aset of orthogonal eigenstates . The obtained states will then be the lowest energy states22ith a given symmetry.Calculating eigenstates sequentially and orthogonalizing against all previously calculatedstates carries the problem of only orthogonalizing against a single snapshot of the wavefunction, which will lead to a biased estimate of the excited states. Instead, calculating allstates in parallel and orthogonalizing after each iteration gives much better results.The required modiﬁcations to the algorithm are minimal. To calculate a set of m eigen-states, m FCIQMC calculations are run in parallel, with the additional step of performingthe instantaneous orthogonalization between the m states, performed at the end of each iter-ation. The orthogonalization requires O ( m ) operations and uses one global communicationper state. To run m parallel calculations, the replica feature presented in section VII is usedto eﬃciently sample a number of states in parallel. After each FCIQMC iteration, for eachstate, the contributions from all states of lower energies are projected out. The update stepfor the n -th wave function | ψ n i is then modiﬁed to | ψ n ( τ + ∆ τ ) i = ˆ O n ( τ + ∆ τ ) (cid:16) − ∆ τ (cid:16) ˆ H − S n ( τ ) (cid:17)(cid:17) | ψ n ( τ ) i , (19)with the orthogonalization operator for the n -th state ˆ O n ( τ ) = 1 − X m

Response theory is a well-established formalism to calculate molecular properties usingquantum chemical methods . It is, in general, formulated for a time-dependent ﬁeldwhich allows to compute both static and dynamic molecular properties. However, it iscurrently only implemented for a static ﬁeld within

NECI .Calculation of molecular properties using response theory relies on the evaluation of theresponse vectors which are the ﬁrst or higher order wave functions of the system in thepresence of an external perturbation ˆ V . According to Wigner’s “(2n+1)” rule, responsevectors up to order n are required to obtain response properties up to order n + 1 . Forcalculating second-order properties such as dipole polarizability, the ﬁrst-order responsevector, C (1) , needs to be obtained along with the zero-order wave function parameter C (0) .While C (0) uses the original FCIQMC working equation 4, C (1) is updated according to ∆ C (1) i = − ∆ τ X j ( H ij − S ( τ )) C (1) j | {z } Hamiltonian dynamics − ∆ τ αV ij C (0) j | {z } Perturbation dynamics . (21)The response vector is discretized into signed walkers in the same way it is done for C (0) . The dynamics of the response-state walker is simulated according to Eq. 21 using anadditional pair of replica and it works in parallel with the dynamics of the zero-order state.Additional spawning and death steps are devised for the response-state walker dynamics, dueto the presence of the perturbation, alongside the original spawning and death steps in thedynamics. The dependence of the response state on the zero-order states comes from thesetwo aforementioned additional steps. A Gram-Schmidt orthogonalization is applied to theresponse-state walker distribution with respect to the zero-order walker distribution at eachiteration using the same functionality as described in section VIII. This ensures orthogonalityof the response vectors with respect to all lower-order wave function parameters.The norm of the response walkers is ﬁxed by the choice of the normalization of the zero-order walkers and it can, in principle, grow at a much faster rate than the zero-order norm.Therefore, in Eq. 21 we introduce the parameter α to control the norm of the responsewalkers and to reduce the computational eﬀort expended in simulating their dynamics. Weaim at matching the number of response-state walkers ( N (1) w ) with the number of zero-order24alkers ( N (0) w ) by updating α periodically as α = N (0) w N (1) w . (22)Once the walker number stabilizes, the value of α is kept ﬁxed, while accumulating statistics.As α scales the norm of the response vector, it needs to be taken into account while evaluatingresponse properties.Response properties are then obtained from transition reduced density matrices (tRDMs)which are stochastically accumulated following Eq. 18. For example, dipole polarizability isobtained from the one-electron tRDMs between the zero- and ﬁrst-order wave function as α xy = − X pq (cid:2) ˆ x pq γ yp,q + ˆ y pq γ xp,q (cid:3) , (23)with the γ yp,q being calculated from the two-electron tRDM as γ ypq = 1( N − X a (cid:20) α Γ (0)(1) pa,qa [1] + 1 α Γ (0)(1) pa,qa [2] (cid:21) . (24)Due to the use of two replica per state while sampling both zero- and ﬁrst-order states, statis-tically independent and unbiased estimator of tRDMs can be constructed in two alternativeways which are denoted here as ‘ [1] ’ and ‘ [2] ’. The perturbation used in the computation ofthe tRDMs in Eq. 24 is the dipole operator ˆ y . The factor α appears due to the re-scaling ofthe response vector following Eq. 21. X. REAL-TIME FCIQMC

For the purpose of obtaining spectroscopic data or targeting highly excited states, thecalculation of orthogonal sets of eigenstates quickly becomes unfeasible, as to obtain a certaineigenstate, all eigenstates of lower energy with the same symmetry have to be computed aswell. Spectral functions and the resulting excitation energies can however be calculated usingreal-time evolution of the wave function, yielding time-resolved Green’s functions whichcontain information on the full spectrum. In addition to the stochastic imaginary timeevolution of a wave function using in the calculation of individual states,

NECI supportsperforming real-time and arbitrary complex-time calculations, evolving the wave functionalongside a complex time trajectory . As Green’s functions are quadratic in the coeﬃcients25f the wave function and averaging over multiple iterations is not an option when evolving awave function with a real-time component, running multiple calculations in parallel akin toexcited state calculations discussed in section VIII is mandatory, as is running with complexcoeﬃcients. The real-time propagation can be used to obtain energy gaps from spectraldensities and thus target excited states. In contrast to the direct calculation of excitedstates, these have not to be calculated one by one and in order of ascending energy, however.In Figure 3, a simple example for applying both the excited-state search and the real-timeevolution to the Beryllium atom in a cc-pVDZ basis set to obtain the singlet-triplet gap ofthe lowest P-state is given. An issue with running real-time calculations is the diﬃculty ofpopulation control, as the death step is essentially replaced by a rotation in the complexplane. This issue can be mitigated by a rotation of the trajectory, evolving along a trajectoryin complex plane. NECI supports an automated trajectory selection that updates the angle α of the time trajectory in the complex plane to maintain a constant population. The Green’sfunction obtained in the complex plane can then be used to obtain real-frequency spectralfunctions using analytic continuation , with the analytic continuation being signiﬁcantlyeasier and more information being recoverable the closer to the real axis the trajectoryis . As, in contrast to the projector FCIQMC, errors arising from the expansion of thepropagator are a concern when running complex-time calculations, NECI uses a second orderRunge-Kutta integrator here, to suﬃciently reduce the time-step error.

XI. TRANSCORRELATED METHOD

The computational cost of a Full CI method usually scales exponentially with respect tothe size of the basis set. On the other hand, the low regularity of wave functions (charac-terized by the electronic cusp ) causes a very slow convergence towards the basis set limit.For calculations aiming at highly accurate results, it is very helpful to speed up such slowconvergence.A Jastrow Ansatz oﬀers a way to factor out the cusp from the wave function | Ψ i = e ˆ T | Φ i , (25)where ˆ T = P i

NECI for the Berylliumatom targeting two states in the B irrep of the D h symmetry group (corresponding to P-states).The two states have triplet/singlet character and the energy diﬀerence is . . b) Spectraldecomposition of a s → p excited state of the Beryllium atom created using real-time evolutionwith NECI , containing the two lowest energy P-states which correspond to the states targeted in a).The gap between the two states is . , agreeing with the excited state calculation within thespectral resolution of . . The zero of the energy axis corresponds to the kation ground stateenergy. The output ﬁles are available in the supplementary material . In experiment, a value of . is observed for this energy gap . in u ( r i , r j ) , the regularity of | Φ i is improved at least by one order over | Ψ i . We canalso include other terms in u ( r i , r j ) to capture as much dynamic correlations as possible.By using variational quantum Monte Carlo methods (VMC), the pair correlation function u ( r i , r j ) can be obtained for a single determinant | Φ i (e.g., | Φ HF i ) or a linear combinationof small number of determinants (e.g., a small CAS wave function).The transcorrelated method of Boys and Handy provides a simple and eﬃcient wayto treat the Jastrow Ansatz, where the original Schrödinger equation is transformed into anon-Hermitian eigenvalue problem ˜ H | Φ i = E | Φ i , ˜ H = e − ˆ T ˆ He ˆ T . (26)The advantage of this form of ˆ T is that the similarity transformation leads to an expansion27hich terminates at second order ˜ H = ˆ H + [ ˆ H, ˆ T ] + 12 [[ ˆ H, ˆ T ] , ˆ T ] (27) = ˆ H − X i (cid:18) ∇ i ˆ T + ( ∇ i ˆ T ) ∇ i + 12 ( ∇ i ˆ T ) (cid:19) (28) = ˆ H − X i

Symmetry is a concept of paramount importance in the description and understanding ofphysical and chemical processes. According to Noether’s theorem there is a direct connectionbetween conserved quantities of a system and its inherent symmetries. Thus, identifyingthem allows a deeper insight in the physical mechanisms of studied systems. Moreover,29 ✥ (cid:0) ✥ ✁✥ (cid:0) ✂✥ (cid:0) ✂ ✁✥ (cid:0) ✄ ❉ ☎ ❚ ☎ ◗ ☎ ❊ ✆✆ ✝ ✆ ✞ ✟ ✠ ✝ ✠ ✡ ☛ ☞✟☞ ✆ ✌ ✍ ✎ ✏ ❉ ☎ ❚ ☎ ◗ ☎ ❉ ☎ ❚ ☎ ◗ ☎❇ ✑ ✒ ✓ ✒ ✒ ✔ ✕❈ ❈ ✖ ❉ ✗ ❚ ✘ ✙ ✚ ✂ ✄ ✖ ❙ ✛ ✖ ❙ ✂ ✛ ▲ ✓❇ ✔❇❈◆❖ ✚◆ ✔ FIG. 4. Exemplary application of the transcorrelated method: Errors in the total energies of theﬁrst-row atoms, in Hartree, for the two correlation functions and the F12 methodology. Reproducedfrom Cohen et al. , JCP 151, 061101 (2019) with the permission of AIP Publishing. the usage of symmetries in electronic structure calculations enables a much more eﬃcientformulation of the problem at hand. The Hamiltonian formulated in a basis respecting thesesymmetries has a block-diagonal structure, with zero overlap between states belonging todiﬀerent ‘good’ quantum numbers. This greatly reduces the necessary computational eﬀortto solve these problems and allows much larger systems to be studied. A. Common Symmetries utilized in Electronic Structure Calculations and

NECI

There are several symmetries which are commonly used in electronic structure calcu-lations, due to the above mentioned beneﬁts and their ease of implementation. And ourFCIQMC code

NECI is no exception in this regard.30 onservation of the ˆ S z spin-projection As mentioned in section I, FCIQMC is usually formulated in a complete basis of Slaterdeterminants (SDs). SDs are eigenfunctions of the total ˆ S z operator, and consequently, ifthe studied Hamiltonian, ˆ H , is spin-independent (no applied magnetic ﬁeld and spin-orbitinteraction) it commutes with ˆ S z , [ ˆ H, ˆ S z ] = 0 . The conservation of the m s eigenvalue in aFCIQMC calculation thus follows quite naturally: the initial chosen m s sector, determinedby the starting SD used, will never be left by the random excitation generation processsketched in section II. No terms in the spin-conserving Hamiltonian will ever cause anystate in the simulation to have a diﬀerent m s value than the initial one. As a consequencethe sampled wavefunction will always be an eigenfunction of ˆ S z with a chosen m s , deter-mined at the start of a calculation. Discrete and Point Group Symmetries in FCIQMC

NECI is also capable of utilizing Abelian point group symmetries, with D h being the ‘largest’spatial group (similar to other quantum chemistry packages, e.g. Molcas and Molpro ),momentum conservation (due to translational invariance) in the Hubbard model and uni-form electron gas calculations and preservation of the m l eigenvalues of the orbital angularmomentum operator ˆ L z (the underlying molecular orbitals have to be constructed as eigen-function of ˆ L z ). This is implemented via a symmetry-conserving excitation generation stepand is explained in more detail in Appendix A 1 a. B. Total spin conservation

One important symmetry of spin-preserving, nonrelativistic

Hamiltonians is the global SU (2) spin-rotation symmetry. However, despite the theoretical beneﬁts, the total SU (2) spin symmetry is not as widely used as other symmetries, like translational or point groupsymmetries, due to their usually impractical and complicated implementation.There are two kind of implementations of total spin conservation in our FCIQMCcode NECI . One approximate one is based on

Half-Projected Hartree-Fock (HPHF) func-tions . Their rationale relies on the fact that for an even number of electrons, everyspin state | S i contains degenerate eigenfunctions with m s = 0 . Using time-reversal symme- ry arguments a HPHF function can be constructed as | H i i =  | D i i for fully close-shell determinants √ (cid:0) | D i i ± | D i i (cid:1) otherwise, (33)where | D i i indicates the spin-ﬂipped version of | D i i . Depending on the sign of the open-shellcoupled determinants, | H i i are eigenfunctions of ˆ S with odd ( − ) or even ( + ) eigenvalue S .The use of HPHF is restricted to systems with an even number of electrons and can onlytarget the lowest even- and odd- S state. Thus, it can not diﬀerentiate between, e.g. a singlet S = 0 and quintet S = 2 state.

1. The (graphical) Unitary Group Approach (GUGA)

Our full implementation of total spin conservation is based on the graphical Unitarygroup approach (GUGA). It relies on the observation that the spin-free excitation operators ˆ E ij in the spin-free formulation of the electronic Hamiltonian, ˆ H = n X ij t ij ˆ E ij + n X ijkl V ijkl (cid:16) ˆ E ij ˆ E kl − δ jk ˆ E il (cid:17) , (34)have the same commutation relations, [ ˆ E ij , ˆ E kl ] = δ jk ˆ E il − δ il ˆ E kj , (35)as the generators of the Unitary group U ( n ) . This connection allows the usage of theGel’fand-Tsetlin (GT) basis , which is irreducible and invariant under the action of theoperators ˆ E ij , in electronic structure calculations. The GT basis is a general basis for anyirrep of U ( n ) , but Paldus realized that only a special subset is relevant for the electronicproblem (34), due to the Pauli exclusion principle. Based on Paldus’ work, Shavitt furtherdeveloped an even more compact representation by introducing the graphical extension ofthe UGA. This leads to the most eﬃcient encoding of a spin-adapted GT basis state (CSF)in form of a step-vector | d i . This step-vector representation has the same storage cost oftwo bits per spatial orbital as Slater determinants. The entries of this step-vector encodethe change of the total number of electrons ∆ N i and the change of the total spin ∆ S i ofsubsequent spatial orbitals i . This is summarized in Table I. All possible CSFs for a chosen32 ABLE I. Possible step-values d i and the corresponding change in number of electrons ∆ N i andtotal spin ∆ S i of subsequent spatial orbitals i . d i ∆ N i ∆ S i number of spatial orbitals N , number of electrons n and total spin S are then given by allstep-vectors | d i = | d , d , . . . , d N i fulﬁlling the restrictions N X i =1 ∆ n i = n, N X i =1 ∆ S i = S, and S k = k X i =1 ∆ S i ≥ . (36)The last restriction in Eq. 36 corresponds to the fact that the (intermediate) total spin mustnever be less than 0.The most important ﬁnding of Paldus and Shavitt was that the Hamiltonian matrixelements—more speciﬁcally the coupling coeﬃcients between two CSFs, e.g. h m ′ | ˆ E ij | m i —canbe entirely formulated within the framework of the GUGA; without any reference and thusnecessity to transform to a Slater determinant based formulation. Although CSFs can beexpressed as a linear combination of SDs, the complexity of this transformation scales expo-nential with the number of open-shell orbitals of a speciﬁc CSF . Thus, it is prohibitivelyhard to rely on such a transformation and for already more than ≈ electrons a formulationwithout any reference to SDs is much more preferable.Furthermore, Shavitt and Paldus were able to ﬁnd a very eﬃcient formulation of thecoupling coeﬃcients as a product of terms, via the graphical extension of the UGA. Matrixelements between two given CSFs only depend on the shape of the loop enclosed by theirgraphical representation, as depicted in Fig. 5. The coupling coeﬃcient of the one-bodyoperator ˆ E ij is given by h m ′ | ˆ E ij | m i = j Y k = i W ( Q k ; d ′ k , d k , ∆ S k , S k ) , (37)where the product terms depend on the step-values of the two CSFs, d ′ k and d k , the diﬀerencein the current spin ∆ S k (with the restriction S ′ k − S k = ± / ) and the intermediate spin33 i − ij − jn loop tailloop head graph tailgraph head LLL | m i ′ h m | lower walkloopupper walk FIG. 5. Graphical representation of the coupling coeﬃcient between two CSFs, h m | ˆ E ij | m ′ i . S k of | m i at orbital k . Q k in Eq. (37) depends on the shape of the loop formed by | m i and | m ′ i at level k and is tabulated in e.g. Ref. [102]. Additionally, the two CSFs, | m i and | m ′ i ,must coincide outside the range ( i, j ) for Eq. (37) to be non-zero.

2. Spin-adapted excitation generation - GUGA-FCIQMC

The compact representation of spin-adapted basis functions in form of step-vectors andthe product form of the coupling coeﬃcients (37) allow for a very eﬃcient implementationin our stochastic FCIQMC code

NECI . As mentioned in Sec. II, the excitation generation step is at the heart of any FCIQMC code.The main diﬀerence to a SD-based implementation of FCIQMC, apart from the moreinvolved matrix element calculation (37), is the higher connectivity within a CSF basis.For a given excitation operator ˆ E ij , with spatial orbital indices ( i, j ) , there is usually morethan one possible excited CSF | m ′ i when applied to | m i , ˆ E ij | m i = P k c k | m ′ k i . All valid spin-recouplings within the excitation range ( i, j ) can have a non-zero coupling coeﬃcient aswell. This fact is usually the prohibiting factor in spin-adapted approaches. However, thereis a quite virtuous combination of the concepts of FCIQMC and the GUGA formalism, as34 FIG. 6. Schematic representation of a one-dimensional hydrogen chain of L hydrogen atoms withequal separation r . one only needs to pick one possible excitation from | m i to | m ′ i in the excitation generationstep of FCIQMC, see Sec. II.We resolved this issue, by randomly choosing one possible valid branch in the graphicalrepresentation, depicted in Fig. 5, for randomly chosen spatial orbital indices i, j ( , k, l ) .Additionally we weight the random moves according to the expected magnitude of thecoupling coeﬃcients to ensure p gen ( m ′ | m ) ∝ | H m ′ m | . This approach avoids the possibleexponential scaling as a function of the open-shell orbitals of connected states within a CSFbased approach.However, this comes with the price of reduced generation probabilities and consequentlya lower imaginary time-step, as mentioned in Sec. II. Combined with an additional eﬀortof calculating these random choices in the excitation generation and the on-the-ﬂy matrixelement computation, the GUGA-FCIQMC implementation has a worse scaling with thenumber of spatial orbitals N compared to a Slater determinant based implementation .However, the beneﬁts of using a spin-adapted basis are a reduced Hilbert space size , elimination of spin-contamination in the sampled wavefunction and most importantly: thespin-adapted FCIQMC implementation via the GUGA allows targeting speciﬁc spin states,which are otherwise not attainable with a SD based implementation as discussed in Ref. 8.The unique speciﬁcation of a target spin allows resolving near degenerate spin states andconsequently numerical results can be interpreted more clearly. This enables more insightin the intricate interplay of nearly degenerate spin states and their eﬀect on the chemicaland physical properties of matter.

3. Example: Hydrogen chain in a minimal basis

The GUGA-FCIQMC method has been benchmarked by applying it to a linear chainof L equidistant hydrogen atoms recently studied to test a variety of quantum chem-ical methods , which shall serve as an example here. Using a minimal STO-6G basis35here is only one orbital per H atom and the system resembles a one-dimensional Hubbardmodel with long-range interaction. Studying a system of hydrogen atoms removescomplexities like core electrons or relativistic eﬀects and thus is an convenient benchmarksystem for quantum chemical methods.For large equidistant separation of the H atoms a localized basis, obtained with the defaultBoys-localization in Molpro’s LOCALI routine, with singly occupied orbitals centred at eachhydrogen is more appropriate than a HF basis. Thus, this is an optimal diﬃcult benchmarksystem of the GUGA-FCIQMC method, since the complexity of a spin-adapted basis dependson the number of open-shell orbitals, which is maximal for this system. Particularly targetingthe low-spin eigenstates of such highly open-shell systems poses a diﬃcult challenge withina spin-adapted formulation. This situation is depicted schematically in Fig. 6.We studied this system to show that we are able to treat systems with up to 30 open-shellorbitals with our stochastic implementation of the GUGA approach . We calculated the S = 0 , and (only S = 0 for L = 30 ) energy per atom up to L = 30 H atoms in a minimalSTO-6G basis at the stretched r = 3 . a geometry and compared it with DMRG reference results. The results are shown in Table II, where we see excellent agreement withinchemical accuracy with the reference results.An important fact is the order of the orbitals though. Similar to the DMRG method it ismost beneﬁcial to order the orbitals according to their overlap, since the number of possiblespin recouplings depends on the number of open shell orbitals in the excitation range. If wemake a poor choice in the ordering of orbitals, excitations between physically adjacent andthus strongly overlapping orbitals are accompanied by numerous possible spin-recouplings inthe excitation range, if stored far apart in the list of orbitals. This behaviour is thoroughlydiscussed in Ref [115]. XIII. PARALLEL SCALING

When applying for access to large computing clusters, it is often necessary to demonstratethat the software being used (in this case

NECI ) is capable of using the hardware eﬃciently.Ideally, the speed-up relative to using some base number of compute cores should growperfectly linearly with the number of cores. In 2014, Booth et. al. , presented an examplewith 500 × walkers in which no deviation from a linear speed-up is noticeable when36 ABLE II. Example for application of GUGA-FCIQMC: Diﬀerence of the energy per site

E/L ofan hydrogen chain for diﬀerent number L of H atoms and total spin S in a STO-6G basis set atthe stretched bond distance of r = 3 . a compared with DMRG reference results . TheGUGA-FCIQMC results were obtained without the initiator approximation . Reproduced fromDobrautz, Ph.D. thesis (2019) .L S E ref [ E h ] E F CIQMC [ E h ] ∆ E [ mE h ]

20 0 -0.481979 -0.481978(1) -0.001(1)20 1 -0.481683 -0.481681(11) -0.002(11)20 2 -0.480766 -0.480764(18) -0.002(18)30 0 -0.482020 -0.481972(31) -0.047(31) comparing using 512 cores to using 32, and even at 2048 cores, a speed-up by a factor of57.5 was reported, which is 90% of the ideal speed-up factor of 64. In that work, the largestnumber of cores explored was 2048. By comparing the performance for a calculation with100 × walkers and 500 × walkers, the same ﬁgure showed that the speed-up becamecloser to the ideal speed-up when the number of walkers was increased, suggesting that whenusing even more walkers, the eﬃciency comes even closer to 100% of the ideal speed-up factor.Since 90% of the ideal speed-up factor was achieved in 2014 with only 500 × walkerson 2048 cores, and large compute clusters nowadays tend to have tens of thousands of coresavailable, we report scaling data for a much larger number of walkers on up to 24,800 coresin Table IV. The calculations were done using the integrals in FCIDUMP format for the(54e,54o) active space ﬁrst described in for the FeMoco molecule, and the output ﬁles areprovided in the supplementary material .The scaling analysis presented in Table IV was done with 32 billion walkers on each of thetwo replicas used for the RDM sampling. Calculations at 32 billion walkers are expensive,so we only completed enough iterations to determine an accurate estimate of the averageruntime per iteration for the scaling analysis, and not enough iterations to accurately esti-mate the energy.One may ask whether or not the scaling observed in Table IV was performed for a rea-sonable number of walkers for this active space. To answer this question, we compare in37 ABLE III. Best non-extrapolated energies obtained for the CAS(54,54) of the FeMoco molecule,with three diﬀerent methods. DMRG and sHCI energies were calculated in Ref. , and i-FCIQMCresults were obtained in this work with 8 billion walkers on each of the two replicas for the RDMsampling. Method Total Energyi-FCIQMC-RDM -13 482.174 95(4)i-FCIQMC-PT2 -13 482.178 45(40)sHCI-VAR -13 482.160 43sHCI-PT2 -13 482.173 38DMRG -13 482.176 81

Table III the best (non-extrapolated) DMRG and sHCI-PT2 energies in the literature to energies obtained with i-FCIQMC at only 8 billion walkers/replica, and ﬁnd that thei-FCIQMC-RDM and i-FCIQMC-PT2 energies are closer together than the sHCI-VAR andsHCI-PT2 energies, indicating that the i-FCIQMC energies are closer to the true FCI limitwhere the diﬀerence between variational and PT2 energies should vanish. The DMRG resultlies about half-way between the two i-FCIQMC results, but fairly well below the lower of thesHCI results (a forthcoming publication speciﬁcally about the FeMoco system is planned,in which more details will be presented, but the purpose of this paper is to give an overviewof the NECI code).Furthermore, comparing the time per iteration between × and × walkersshows that a high parallel eﬃciency is also achieved at the lower walker number. Thedeterminants in NECI are stored using a hash table, making i-FCIQMC linearly scaling inthe walker number , so the ideal time per iteration with × walkers at 19960 coresaccording to the result for × walkers at 16000 cores would be 23.4 seconds, whichis only marginally smaller than the reported 23.5 seconds. Note however, that this is therelative eﬃciency between large scale calculations, which demonstrates performance gainfrom extending parallelization at large scales, not from parallelization over the entire rangeof scales, which is addressed to some extent by the Chromium dimer example below.38n the case of the Chromium dimer (cc-pVDZ, 28 electrons correlated in 76 spatialorbitals) considered in ﬁgure 7, the average time per iteration per walker ranges from . × − s at 640 cores to . × − s at 10240 cores and . × − s at 20480cores, corresponding to a parallel speed-up of 82.1% from 10240 to 20480 cores and an over-all speed-up of 65.2% over the full range. The deviation from ideal scaling almost exclusivelystems from the communication of the spawns, at lower walker numbers, the communicativeoverhead is more signiﬁcant, reducing the parallel eﬃciency compared to the FeMoco exam-ple. Nevertheless, a very high yield can be obtained from scaling up the number of cores,even for already large scales. A. Load balancing

The parallel eﬃciency of NECI is made possible by treating static load imbalance.

NECI contains a load-balancing feature , which is enabled by default and periodically re-assignssome determinants to other processors in order to maintain a constant number of walkersper processor. As can be seen in ﬁgure 7, for the given benchmarks, no signiﬁcant loadimbalance occurs up to (including) 20480 cores . The initialization of a simulation doesnot feature the same speed-up due to I/O operations and initial communication such as trialwave function setup and core space generation. However, since it does not play a signiﬁcantrole for extended calculations, we consider only the time spent in the actual iterations. XIV. INTERFACING NECI

The ongoing development of

NECI is focused on an eﬃciently scaling solver for the CI-problem. It is not desirable to reimplement functionality that is already available in existingquantum chemistry codes. Since the CI-problem is deﬁned by the electronic integrals andsubsequent methods depend on the results of the CI-step, namely the reduced density ma-trices, it is easily possible to replace a CI-solver of existing quantum chemistry code with

NECI .To use

NECI only an input ﬁle and a FCIDUMP ﬁle , which is the widely understoodﬁle format for the electronic integrals, are required. After running

NECI the stochasticallysampled reduced density matrices are available as input for further calculations in other39 of walkers × × × × walkers cases, the time per iteration is averaged over more than 250 iterations and in both casesthe unbiased sample variance over the 250+ iterations is less than 0.5 seconds. For comparison,the time per iteration for × walkers which was used to obtain the energy reported in tableIII, is given. Calculations were run on 512, 620 and 400 nodes with Intel Xeon Gold 6148 Skylakeprocessors with 20 cores at 2.4 GHz and 96 GB of DDR4 RAM, and all nodes were in a single islandwith a 100 Gb/s OmniPath interconnect between the nodes. Hyperthreading was not used. codes. It is possible to link NECI as library and call it directly or to run it as externalprocess and do the communication with explicit copying of ﬁles. The ﬁrst alternative willbe referred to as embedded, the second is the decoupled form.Due to the stochastic nature of the Monte Carlo algorithm, it is not yet possible to use

NECI as a black box CI-solver for larger systems. In this case it is recommended to use thedecoupled form for a better manual control of the convergence. Another advantage of thedecoupled form is the combination of

NECI with diﬀerent quantum chemical algorithms orimplementations that do not beneﬁt from massive parallelisation as much as

NECI . This wayit is possible to switch from serial or single node execution to multiple nodes in the CI-step.So far

NECI has been coupled with

Molpro , Molcas 8 , OpenMolcas , PySCF ,and

VASP . XV. STOCHASTIC-MCSCF

The Stochastic multi-conﬁgurational self-consistent ﬁeld (MCSCF) procedure emergesfrom the combination of conventional MCSCF methodologies with FCIQMC as the CI-eigensolver. Stochastic-MCSCF approaches greatly enlarge the applicability of FCIQMC to40

IG. 7. Total time and time lost due to load imbalance for running 100 iterations with 1.6 Billionwalkers for the Cr /cc-pVDZ (28e in 76o) on 640 to 20480 cores (not counting initialisation). Thecalculations were run on Intel Xeon Gold 6148 Skylake processors, with a 100 Gb/s OmniPathnode interconnect. The code was compiled using the Intel Fortran compiler, version 19.0.4. Asemi-stochastic core-space of 50000 determinants was used, and PCHB excitation generation. Forthe largest number of cores, the time step is . × − with an average acceptance ratio of12.51%, which is representative for all numbers of cores. The load imbalance time is measured asthe accumulated diﬀerence between the maximum and average time per iteration across MPI tasks.Figure generated using Matplotlib . strongly correlated molecular systems of practical interest in chemical science.To date two implementations of Stochastic-MCSCF have been made available, based onthe interface of NECI with

OpenMolcas (and

Molcas 8 ) and PySCF . As they areboth based on the complete active space (CAS) concept, they are often also referred to as41tochastic-CASSCF methods.The Stochastic-CASSCF implemented in PySCF is based on a second order CASSCFalgorithm which decouples the orbital optimization problem from the active space CIproblem, allowing for easy interfacing with

NECI .At each macro-iteration , a FCIQMC simulation is performed at the current point oforbital expansion, and density matrices are stochastically sampled (see section VII). Theseare then passed back to PySCF, which updates the orbital coeﬃcients accordingly, usingeither a 1-step or 2-step approach .The Stochastic-CASSCF implemented in

OpenMolcas is based on the quasi-second orderSuper-CI orbital optimization. Optimal orbitals (in the variational sense) are found bysolving the Super-CI secular equations in the | Super − CI i basis, deﬁned by the CAS wavefunction at the point of expansion, | i , and all its possible single excitations | Super − CI i = | i + X p>q χ pq ( ˆ E pq − ˆ E qp ) | i (38)The wave function is improved by mixing single excitations to the | i wave function. Asthe CASSCF optimization proceeds, the χ pq coeﬃcients decrease until they vanish, and | i will reveal the variational stationary point. Third-order density matrix elements of theexact Super-CI approach are avoided by utilizing an eﬀective one-electron Hamiltonian, asdiscussed in greater details in Reference 19.A ﬂow chart of Stochastic-CASSCF describing the various steps of the CASSCF wavefunction optimization is given in Figure 8.The Stochastic-CASSCF approach has successfully been applied to a number of chal-lenging chemical problems. The accuracy of the method has been demonstrated on sim-ple test cases, such as benzene and naphthalene and more complex molecular systems,namely coronene , free-base porphyrin and Mg-porphyrin . More recently the methodhas also been applied to understand the mechanism stabilizing intermediate spin states inFe(II)-porphyrin , the study of a [ F e ( III ) S ( SCH ) ] − iron-sulfur model system in itsoxidized form , and new superexchange paths in corner-sharing cuprates .To date, only state speciﬁc Stochastic-CASSCF optimizations have been reported. How-ever, state-average Stochastic-CASSCF optimizations are a straightforward extension thatcan be reached by taking advantage of the NECI capability to optimize excited states wavefunctions, as discussed in section VIII. The Stochastic-CASSCF method can also be cou-42

O integral evaluationProcess active spacespecificationsMO integral (FCIDUMP)within the active spaceNoYesOrbital rotationConverged? No

Yes N e w CA SS C F i t er a t i o n FCIQMC dynamicsSample one- and two-bodyreduced density matricesConverged?Post-CASSCF(e.g. MC-PDFT)

FIG. 8. Flow chart summarizing the Stochastic-CASSCF steps. The blue boxes represent partsof the algorithm performed at the

OpenMolcas or PySCF interfacing software. The center yellowbox shows the two crucial FCIQMC steps, stochastic optimization of the CAS-CI wave functionand sampling of one- and two-body reduced density matrices. When embedded schemes are em-ployed, additional external potentials are added within the interfacing software when generatingthe FCIDUMP ﬁle. Post-CASSCF procedures, such as the MC-PDFT methodology, follow theStochastic-CASSCF approach within the interfacing software. pled to the adaptive shift approach discussed in section V with a great enhancement inperformance.

XVI. CONCLUSION

With

NECI we present a state of the art FCIQMC program capable of running a largevariety of versions of the FCIQMC algorithm. This includes the semi-stochastic FCIQMCfeature, energy estimation using trial wave functions, the stochastic sampling of reduceddensity matrices, and excited state calculations. Further features of

NECI ’s FCIQMC imple-mentation discussed are the real-time FCIQMC method and the adaptive shift method, aswell as a spin-adapted formulation of the algorithm and support for transcorrelated Hamil-43onians. We demonstrated the scalability of the program to up to 24800 cores, showing thatthe code can run eﬃciently on large-scale machines.Finally, we highlighted the interoperability of

NECI with other quantum chemistry soft-ware, in particular

OpenMolcas and

PySCF , which can be used to run Stochastic-CASSCFcalculations.

XVII. SUPPLEMENTARY MATERIALS

Example FCIQMC output ﬁles for excited state calculations ( output_file_excited_state_be2_b1g.txt and stats_file_excited_state_be2_b1g.txt ) and real-time calcu-lations including the resulting spectrum ( output_file_real_time_be2_b1g.txt , and fft_spectrum_be2_b1g.txt ) for the examples presented in section 3 are available in the sup-plemental material. Furthermore, the supplement contains the output ﬁles for scaling( output_file_scaling_with_*_cores.txt and output_file_energy_with_8b_walkers.txt ) and load imbalance analysis ( output_file_load_imbalance_n*.txt ). Exemplaryoutput and integral ﬁles for a similarity transformed FCIQMC calculation of the Neonatom in a cc-pVDZ basis set are also supplied in the supplement ( tcdump_Ne_st_pVDZ.h5 and

FCIDUMP_Ne_st_pVDZ integral ﬁles and output_file_Ne_st_pVDZ.txt output stats_file_Ne_st_pVDZ.txt ﬁles). All output ﬁles contain the corresponding FCIQMC input.

XVIII. DATA AVAILABILITY STATEMENT

The data that supports the ﬁndings of this study are available within the article andits supplementary material . The

NECI program can be obtained at https://github.com/ghb24/NECI_STABLE , the development version can be obtained from the correspondingauthor upon reasonable request.

ACKNOWLEDGMENTS

The early development of NECI was supported by the EPSRC under grant numbersEP/J003867/1 and EP/I014624/1.We would like to thank Olle Gunnarsson, David Tew, Daniel Kats, Aron Cohen, andVamshi Katakuri for insightful discussions. 44he high performance benchmarks discussed in section XIII, were ran on the MPCDF(Max Planck Computing & Data Facility) system Cobra.

Appendix A: Stochastic excitation generation and p gen In the following appendices we will consider in some detail the process of (random)excitation generation in FCIQMC - a crucial yet rather ﬂexible aspect of the algorithm.We will consider some general aspects, such as implementation of Abelian symmetries inthe excitation process, as well as non-uniform excitation generation, as is often desirablein quantum chemical Hamiltonians. There are other classes of systems (such as Hubbardmodels, Transcorrelated Hamiltonians, spin models such as Heisenberg systems, etc) forwhich more specialised considerations are necessary for eﬃcient excitation generation butwe will not consider them here.The ﬁrst general point about excitation generation, (by which we mean starting from agiven determinant | D i i we randomly pick either one or two electrons, and a correspondingnumber of holes to substitute them with, to create a second determinant | D j i ), is that if | H ij | > , then the probability ( p gen ( j | i ) ) to select | D j i and | D i i , must also be greater than0. Furthermore, p gen ( j | i ) must be computable , and in general the eﬀort to do so will dependon the algorithm chosen to execute the excitation process.Let us discuss in more detail the process of stochastic excitation generation, and itsimpact on p gen . Suppose we are simulating a system of n electrons in N spin orbitals { φ , ..., φ N } . A given determinant | D i i can be deﬁned by its occupation number represen-tation, I = | n , ..., n N i , which is a binary string such that n i = 1 if orbital i is occupied(‘an electron in | D i i ’), and n i = 0 if it is unoccupied (‘a hole in | D i i ’). Each orbital carriesa spin quantum number σ ( φ i ) , and may also carry a symmetry label, Γ( φ i ) . These areboth discrete symmetries, with σ = ± / , and Γ = Γ , ..., Γ G , where G is the number ofirreducible representations available in the point-group of system under consideration. Wewill only consider Abelian groups, so that the product of symmetry labels uniquely speciﬁesanother symmetry label. This simpliﬁes the task of selecting excitations, although it doesnot necessarily exploit the full symmetry of the problem.45 . Uniform excitation generation Now we wish to perform a stochastic excitation generation, which we will initially considerwithout the use of any symmetry/spin information. For example, we can select a pair ofelectrons, i, j (with i < j ) in | D i i , at random, and a pair of holes a, b (with a < b ), andperform the transition ij → ab . The corresponding matrix element is H abij = h ij | ab i − h ij | ba i ≡ h ij || ab i (A1)We will denote the electron pair simply as ij and the hole pair ab .For this simple procedure, it is clear that the probability to choose | D j i from | D i i issimply: p gen ( j | i ) = (cid:18) n (cid:19) − (cid:18) N − n (cid:19) − (A2)from which it follows that p gen ( j | i ) ∼ ( nN ) − . This procedure does not take symmetryor spin quantum numbers into account, and it is quite possible that the correspondingHamiltonian matrix element is zero. To ensure that we do not generate such excitations, weneed to select the hole pairs so that following two conditions are met: σ ( φ i ) + σ ( φ j ) = σ ( φ a ) + σ ( φ b ) , (A3) Γ( φ i ) × Γ( φ j ) = Γ( φ a ) × Γ( φ b ) . (A4)These restriction greatly impact the way in which we will select i, j and a, b , and the resultinggeneration probability. a. Imposing symmetries via conditional probabilities One way to impose symmetries in excitation generation while keeping track of the gen-eration probabilities is via the notion of conditional probabilities. For example, rather thandrawing ( ij ) and ( ab ) independently, with probability p ( ab, ij ) = p ( ab ) p ( ij ) , one can insteaddraw ( ab ) given that one has already drawn ( ij ) ; the probability for this process is given by p ( ab, ij ) = p ( ab | ij ) p ( ij ) , (A5)where p ( ij ) is the probability to select ( ij ) in the ﬁrst place. If ( ij ) has a particular char-acteristic that confers a physical (e.g. symmetry-related) constraint on ( ab ) , this can be46mplemented at the stage in which we select ( ab ) : ( ab ) need only be selected from amongthose hole-pairs for which the constraint is satisﬁed. For example, if the electrons ( ij ) haveopposite spins then the holes ( ab ) must also have opposite spins. The smaller number ofpossibilities in choosing the ab pair then leads to a larger p ( ab | ij ) compared to p ( ab ) , whichcan be thought of as a renormalisation of the latter probability to take into account theconstraint.The concept of conditional probabilities can be further extended so that the pair ( ij ) itself is made to satisfy a particular condition. Suppose we introduce a set of of conditions {C , C , ... } such that the union of all such conditions is exhaustive. It is possible to drawconditional probabilities with respect to such conditions. For example, C = ‘electron pair have the same spin’ (A6) C = ‘electron pair have opposite spins’ (A7)then one can write: p ( ab, ij ) = p ( ab, ij |C ) p ( C ) + p ( ab, ij |C ) p ( C ) (A8)with p ( C ) + p ( C ) = 1 . (A9) p ( C ) , the probability to select same-spin excitations, can be chosen arbitrarily, which thenﬁxes p ( C ) according to the above.The advantage of this formulation is that we can skew the selection of electron pairs, forexample, towards opposite spin excitations if that proves advantageous, and to be able tocompute the resulting probabilities. Furthermore, we can write: p ( ab, ij |C ) = p ( ab | ij ) p ( ij |C ) (A10) p ( ab, ij |C ) = p ( ab | ij ) p ( ij |C ) (A11)which allows us to select a pair of electrons satisfying condition C , and subsequently draw apair of holes given one has selected an electron pair with the same spin (which implies thathole-pair must be chosen to have the same spin as the electron-pair).47 . Cauchy-Schwartz excitation generation Let us now consider how to generate the hole pairs in a non-uniform manner, to reﬂect thefact that, in ab initio

Hamiltonians, the matrix elements vary strongly in magnitude. Sincethe spawning probability is proportional to the ratio | H ij | /p gen ( j | i ) , it is clearly desirable togenerate excitations which make this ratio as uniform as possible, ideally with p gen ( j | i ) ∝| H ij | . In this way, one would ensure a relatively uniform probability of successful spawning,which ideally would be close to one, implying a low rejection rate. Keeping the discussionfocussed on double excitations (the generalisation to single excitations being straightforward)the question that arises is: how best can one select ij and ab such that p gen ( j | i ) ∝ | H ij | to a good approximation, and p gen remains exactly computable without excessive cost. Wewill see there is a compromise to be made. One can ensure precise proportionality between p gen ( j | i ) and | H ij | but only at prohibitive cost. Alternatively, one might be able to select ij and ab to eﬀect the transition | D i i → | D j i based on computationally inexpensive heuristics,to provide approximate proportionality, which will nevertheless allow for a large overallimprovement in eﬃciency.To ensure exact proportionality between p gen ( j | i ) and | H ij | it is necessary to enumerate allelectron-pairs and hole-pairs which are possible from | D i i , and to construct the cumulativeprobability function (CPF), from which the desired distribution can be straightforwardlysampled. The (unnormalised) CPF is: F ab,ij [ D ] = ij X ee ′ ∈ D ab X hh ′ ∈ D |h ee ′ || hh ′ i| (A12)In this expression, the sum over ee ′ runs over all enumerated electron pairs in D up to ij ,and similarly for the hole-pairs (up to ab ). The CPF is a non-decreasing function of itsdiscrete arguments, and its inverse transform enables one to select ab and ij with proba-bility proportional to |h ij || ab i| . From the point of generation probabilities, this is the idealexcitation generator, allowing for a uniform spawning probability (which can be made toequal unity, implying zero rejection rates.) Unfortunately the CPF costs O ( n N ) to set up(for each determinant | D i i ), making it prohibitive in practice.To make practical progress, we need an approximate distribution function which is muchcheaper to calculate. Two observations can be made in this relation. First, if the twoelectrons have diﬀerent spin, then the Hamiltonian matrix element consists of only one48ather than two terms. This is because upon excitation ij → ab , the two holes must matchthe spins of the two electrons. For example σ ( a ) = σ ( i ) and σ ( b ) = σ ( j ) . In this case, theHamiltonian matrix element reduces to: H ij = h ij | ab i (A13)with the exchange term h ij | ba i = 0 .With this simpler matrix element, we now ask: given that we have chosen an electron pair ij , how can we select the hole pair ab so that, with high probability, the resulting matrixelement h ij | ab i is large? At this point we can appeal to the Cauchy-Schwarz inequality,which provides a strict upper bound: h ij | ab i ≤ p h ii | aa i h jj | bb i (A14)This suggests that, as long as h ij | ab i is non-zero by symmetry, it may be advantageous toselect the hole a so that h ii | aa i is large, and the hole b so that h jj | bb i large. Because i and j have diﬀerent spins, the selection of a and b will be independent of each other, with a forexample being chosen from the α -spin holes available, and b from the β -spin holes. To dothis, we set up two CPFs: F a [ i ∈ αD ] = a X h ∈ αD p h hh | ii i , (A15) F b [ j ∈ βD ] = b X h ∈ βD p h hh | jj i , (A16)where the sums over h runs over the α or β holes in D . (The notation i ∈ αD means an α -electron in D , and h ∈ αD means an α -hole in D ). Unlike Eq.(A12), these CPFs costonly O ( N ) to set up, and allow (via their inverse transforms) the selection of a and b withprobabilities proportional to p h aa | ii i and p h jj | bb i respectively.The Cauchy-Schwarz bound on an individual 4-index integral provides a very usefulfactorised approximation for the purposes of excitation generation, especially for opposite-spin excitations. The case for same-spin excitations is less favourable because it involves thediﬀerence between two 4-index integrals, and in this case we must obtain an upper boundfor this diﬀerence expressed in a factorised form. We use the following much less tight upper49ound: |h ij | ab i − h ij | ba i| ≤ [ p h aa | ii i + p h aa | jj i ] (A17) × [ p h bb | ii i + p h bb | jj i ] (A18)In practice, we must draw two holes a and b from the same set of holes, avoiding thepossibility of drawing the same hole twice. Because we would like to avoid setting up a two-dimensional CPF (which would cost O ( N ) ), we create one one-dimension CPF in order todraw hole a , and then remove this hole in the CPF before drawing the second hole. In otherwords we set up two related CPFs F a [ ij ∈ D ] = a X h p h ii | hh i + p h jj | hh i , (A19) F ′ b [ ij ∈ D ] =  F b [ ij ∈ D | a ] if b < a,F b [ ij ∈ D ] − p h ii | aa i− p h jj | aa i if b ≥ a, (A20)drawing hole a from F a and hole b from F ′ b .Our exploration of excitation generation has led us to discover many highly performingschemes. The Cauchy-Schwarz (CS) scheme presented above is a good starting point, butit has a number of weaknesses that can be further addressed. In particular, as noted above,the upper bound obtained is particularly poor for double excitations with the same spin,and in general the speciﬁed bound can be too loose. Fortunately, the selection of the secondhole, b , is made once the ﬁrst hole, a , has already been chosen, and as such the exact doubleexcitation Hamiltonian matrix elements can be used at this stage, such that an updatedCPF for selecting the second electron is given by F b [ ij ∈ D | a ] = b X h ∈ Dh = a p |h ij | ab i − h ij | ba i| . (A21)This Part-Exact (PE) scheme no longer provides a strict bound, but by better represent-ing the cancellation of terms present in these matrix elements, it provides a substantiallybetter approximation. More crucially, it improves the prediction of the elements that werepreviously handled least eﬀectively, and thus relaxes the time-step constraints on the overallcalculation. 50ue to the increase in computational cost involved in constructing two lists, and theadditional normalisation of the probabilities required by causing the two selections not tobe made in the same manner, this update to the scheme increases computational cost periteration. In almost all systems examined this is far outweighed by the time-step changes,especially in systems with large basis sets or with translational symmetry. However, it ispossible to ﬁnd systems where the pure CS scheme is more optimal. a. Preparing for excitation generation For determinant D , to pick an excited determinant, ﬁrst construct a table of hole occu-pancies for each spin and irreducible representation, so that n σ Γ [ D ] is the number of holeswith spin σ in irrep Γ available in D . This is an O ( n ) process.We next decide whether we wish to make a single excitation or a double excitation from D . A single excitation is chosen with probability p sing , a parameter which can be optimisedas the simulation proceeds to maximise the acceptance ratio and time-step of the simulation.The probability to create a double excitation is chosen such that the maximal ratios | H ij | p gen ( j | i ) for single and double excitations are equal, which for ab initio systems typically meansdouble excitations dominate. To a ﬁrst approximation p sing = nN/ ( nN + n N ) , which isin general a small number on the order of ( nN ) − . The probability of attempting a doubleexcitation is then p doub = 1 − p sing . b. Single Excitations If a single excitation is being attempted, ﬁrst select an electron (say i ) at random, withprobability n − . The spin σ = σ ( i ) and irrep Γ = Γ( φ i ) of the electron determines the spinand irrep of the hole.To select the hole a , run over all n σ Γ holes available in D with spin and symmetry σ Γ ,and compute the (unnormalised) cumulative probability function, F (1) a [ i ∈ D ] = a X h ∈ σ Γ D | h D hi | ˆ H | D i | , (A22)where | D hi i is a single-excitation i → h from | D i , and h D hi | ˆ H | D i is the Hamiltonian matrixelement between them. The normalisation of the CPF is give by the last element in the51rray: Σ i = F (1) n σ Γ [ i ∈ D ] , (A23)where n σ Γ is the number of holes available with spin σ in irrep Γ in D . Using F (1) a , selecthole a (with probability |h D ai | H | D i| ), by inverting the CPF. This is selected by generatinga uniform random number ξ in the interval [0 , Σ i ) , and determining the index of a such thatthe condition F (1) a − < ξ ≤ F (1) a (A24)is met. The overall generation probability for this excitation is: p gen ( a, i ) = p ( a | i ) × p ( i ) × p sing , (A25)where p ( a | i ) = θ a Σ i , (A26) θ a = F (1) a − F (1) a − , (A27) p ( i ) = n − . (A28)This completes the selection of a singly excited determinant. The computation of F (1) a isan order O ( nN ) operation (with O ( N ) holes being summed over, and each Hamiltonianmatrix element being O ( n ) to compute). Although this is expensive, the generation ofsingle excitations turns out overall to be a small fraction of the total cost, largely becausethe relatively small number of times such excitations are attempted. c. Double Excitations with opposite spin electron-pairs If a double excitation is being attempted, then ﬁrstly a pair of electrons needs to beselected. The ﬁrst electron, i , should be selected uniformly at random. Following this, theCPF F j [ i ∈ D ] = j X k ∈ Dk = i h ik | ik i ×  p opp if i, k opp , − p opp otherwise. (A29)should be constructed, where p opp is a optimisable biasing factor towards excitations withelectrons having opposite spins. The second electron is selected through inversion of theCPF. 52f the two selected electrons have opposite spins, then the ﬁrst hole to be chosen is, byconvention, always a β electron, and the second hole always α . This choice is entirely arbi-trary, and in some high-spin systems it may make sense to reverse this selection. Consideringall available orbitals of this spin, the CPF F ( β ) a [ i ∈ D ] = a X h ∈ βD p h hh | ii i , (A30)is constructed, where i is taken to be the electron from the selected pair with β spin, andthe hole selected by inverting the CPF.Once this ﬁrst electron has been chosen, the symmetry of the target orbital is now ﬁxedby the constraint that Γ a ⊗ Γ ′ = Γ i ⊗ Γ j . This greatly restricts the number of holes thatmust be considered when constructing the ﬁnal CPF, F ( α Γ ′ a ) b [ ij ∈ D ] = b X h ∈ α Γ ′ D p |h ij | ab i| . (A31)Note that with the conventional choice of orbital i above, h ji | ah i = 0 , and can thus beexcluded. The second hole is then also obtained by inverting the CPF. The generationprobability is then given by: p gen ( ab, ij ) = p ( ij ) p ( a | i ) p ( b | ija ) p doub , (A32) p ( ij ) = 1 N θ ( i ) j Σ i + θ ( j ) i Σ j ! , (A33) θ ( i ) j = F j [ i ∈ D ] − F j − i [ i ∈ D ] , (A34) p ( a | i ) = θ a Σ ( β ) ( i ) , (A35) θ a = F ( β ) a − F ( β ) a − , (A36) p ( b | ija ) = θ b Σ ( α Γ ′ ) ( ija ) , (A37) θ b = F ( α Γ ′ a ) b − F ( α Γ ′ a ) b − , (A38)where Σ i , Σ ( β ) ( i ) , Σ ( α Γ ′ ) ( ija ) are the normalisations of F j , F ( β ) a , F ( α Γ ′ a ) b respectively, and aregiven by the ﬁnal entries of the corresponding arrays.The asymetric selection of α and β holes is somewhat peculiar. It should be noted that itis possible to make this selection symmetrically, considering all available holes in the selec-tion of the ﬁrst hole, and then renormalising the probabilities to account for the possibility53f selecting b ﬁrst. The symmetric scheme increases computational cost substantially (twiceas many holes need to be considered in the CPF, and a further CPF must be calculatedfor the renormalisation). It also makes the overall time-step behaviour worse as, althoughit improves the general smoothness, for the worst-case scenario with a very rarely selectedexcitation with very diﬀerent a, b and b, a probabilities, the denominator is increased sub-stantially by considering more orbitals, whilst leaving the numerator essentially unchanged. d. Double excitations with same-spin electron pairs If the pair of electrons, selected as described above, have the same spin the process needsto account for the fact that the holes can be selected in either order, and the probabilitiesneed to be adjusted to compensate.Now, considering only holes with the same spin as the two electrons, construct the CPF F ( σ ) a [ ij ∈ D ] = a X h ∈ σD p h hh | ii i + p h hh | jj i . (A39)Hole a can then be selected through inversion of this CPF, which ﬁxes the symmetry ofhole b such that Γ a ⊗ Γ b = Γ i ⊗ Γ j . The CPF for selecting the second hole can then beconstructed from the (much smaller) set of holes with the appropriate symmetry, such that F ( σ Γ b a ) b [ ij ∈ D ] = b X h ∈ α Γ b Dh = a p |h ij | ab i − h ij | ba i| . (A40)The second hole, b , can then be selected through inversion of this CPF. It is importantto note that as the selection of the ﬁrst hole includes all holes of the hole with the givenspin, the selection of the holes could have been made in the reverse order, and this needs tobe taken into account in the generation probability, which is given by: p gen ( ab, ij ) = [ p ( a | ijb ) p ( b | ij ) + p ( b | ija ) p ( a | ij )] p ( ij ) p double , (A41)54here p ( ij ) = 1 N θ ( i ) j Σ i + θ ( j ) i Σ j ! , (A42) θ ( i ) j = F j [ i ∈ D ] − F j − [ i ∈ D ] , (A43) p ( a | ij ) = θ a Σ a , (A44) θ a = F ( σ ) a − F ( σ ) a − , (A45) p ( b | ija ) = θ ( a ) b Σ ( a ) b , (A46) θ ( a ) b = F ( σ Γ b a ) b − F ( σ Γ b a ) b − (A47)and Σ i , Σ a , Σ ( a ) b are the normalisations of the three CPFs, given by their ﬁnal elements. Notethat in the implementation, the normalisations of four CPFs must be calculated to be ableto calculate p ( a | ijb ) as well as p ( b | ija ) .

3. Pre-computed heat-bath sampling

While the Cauchy-Schwartz excitation generator has negligible memory cost, picking anexcitation requires O ( N ) steps, each involving Hamiltonian matrix elements, making theprocedure expensive. The pre-computed heat-bath algorithm employed in NECI is a simpleapproximation derived from the heat-bath sampling and oﬀers a much faster excitationgeneration, at the cost of increased memory requirement. The heat-bath probability distri-bution can also used to determine a cutoﬀ in a deterministic scheme, leading to the heat-bathCI (HCI) method . The sampling can either use uniform single excitations or the weightedscheme outlined in section A 2 b, and approximates the exact heat-bath sampling of double-excitations by uniformly picking the occupied orbitals, and then picking two target orbitalssimultaneously weighted with the Hamiltonian matrix element. Since the double excitationsplay the largest role in excitation generation, and the singles’ matrix elements depend onthe determinants in addition to the excitation, it is typically most eﬃcient to generate onlydouble excitations in a weighted fashion, resulting in an excellent tradeoﬀ between optimalweights and the cost of excitation generation.To create a double excitation using pre-computed heat-bath generation, ﬁrst, two oc-cupied orbitals i , j are chosen uniformly at random, using a bias towards spin-opposite55xctitations, which is determined similar to the bias towards double excitations. This worksanalogously to the Cauchy-Schwartz excitation generation outlined in section A 2. Then, apair a , b of orbitals is chosen using pre-computed weights p ( ab | ij ) = | H abij | P a ′ b ′ | H a ′ b ′ ij | , (A48)where H abij is the matrix element for a double excitation from orbitals i, j to orbitals a, b .These are independent of the determinant and thus can be pre-computed at memory cost O ( M ) . Then, pairs of orbitals can be picked using these weights via alias sampling in O (1) time. If one of the picked orbitals a , b is occupied, or all matrix elements H abij are zero,the excitation is immediately rejected, otherwise, we continue with the FCIQMC scheme.As it is desirable to use spatial orbital indices to save memory, but the matrix elementdepends on the relative spin of the orbitals in the case of a spin-opposite excitation sinceit determines if an exchange integral is used, for each pair of spatial orbitals i , j , threeprobability distributions are generated, one for the spin-parallel case, one for the spin-opposite case without exchange and one for the spin-opposite case with exchange. Betweenthe latter two, we then choose the exchange case with probability p exch ( ij ) = P ab | H aβbαiαjβ | P ab | H aβbαiαjβ | + | H aαbβiαjβ | . (A49)The denominator is the same as the denominator in Eq. (A48) for spin orbitals, while thenumerator is the denominator in Eq. (A48) for spatial orbitals in the exchange case. Thebias p exch hence relates the spatial orbital distributions to the original distribution (A48).This approach is tailored for rapid excitation generation, as the process is in principle O (1) , while yielding acceptance rates comparable to the on-the-ﬂy Cauchy-Schwartz gen-eration. Due to implementational details of NECI , the uniform selection of electrons scaleslinearly with the number of electrons, which, however, does not constitute a bottleneck inpractical application. The rapid excitation generation has important consequences for thescalability of the algorithm, since the stochastic nature of the algorithm can give rise todynamic load imbalance if the time taken for excitation generation can vary signiﬁcantlydepending on determinant and electron/orbital selection.56

EFERENCES A. Alavi, “ Two interacting electrons in a box: An exact diagonal-ization study,” The Journal of Chemical Physics , 7735–7745 (2000),https://doi.org/10.1063/1.1316045. D. C. Thompson and A. Alavi, “ Two interacting electrons in a spherical box: An exactdiagonalization study,” Phys. Rev. B , 235118 (2002). G. H. Booth, A. J. W. Thom, and A. Alavi, “ Fermion monte carlo withoutﬁxed nodes: A game of life, death, and annihilation in slater determinant space,”The Journal of Chemical Physics , 054106 (2009). R. Blankenbecler, D. J. Scalapino, and R. L. Sugar, “ Monte carlo calculations of coupledboson-fermion systems. i,” Phys. Rev. D , 2278–2286 (1981). G. Sugiyama and S. Koonin, “ Auxiliary ﬁeld monte-carlo for quantum many-body groundstates,” Annals of Physics , 1 – 26 (1986). S. Zhang and H. Krakauer, “ Quantum monte carlo method using phase-free random walkswith slater determinants,” Phys. Rev. Lett. , 136401 (2003). S. Zhang, J. Carlson, and J. E. Gubernatis, “ Constrained path monte carlo method forfermion ground states,” Physical Review B , 7464 (1997). W. Dobrautz, S. D. Smart, and A. Alavi, “ Eﬃcient formulation of full con-ﬁguration interaction quantum monte carlo in a spin eigenbasis via the graph-ical unitary group approach,” The Journal of Chemical Physics , 094104 (2019),https://doi.org/10.1063/1.5108908. D. Cleland, G. H. Booth, and A. Alavi, “ Communications: Survival of theﬁttest: Accelerating convergence in full conﬁguration-interaction quantum monte carlo,”The Journal of Chemical Physics , 041103 (2010). K. Ghanem, A. Y. Lozovoi, and A. Alavi, “ Unbiasing the ini-tiator approximation in full conﬁguration interaction quantummonte carlo,” The Journal of Chemical Physics , 224108 (2019),https://doi.org/10.1063/1.5134006. F. R. Petruzielo, A. A. Holmes, H. J. Changlani, M. P. Nightingale, and C. J. Umrigar,“ Semistochastic projector monte carlo method,” Phys. Rev. Lett. , 230201 (2012). N. S. Blunt, S. D. Smart, J. A. F. Kersten, J. S. Spencer, G. H. Booth, and A. Alavi,57 Semi-stochastic full conﬁguration interaction quantum Monte Carlo: Developments andapplication,” The Journal of Chemical Physics , 184107 (2015). N. S. Blunt, S. D. Smart, G. H. Booth, and A. Alavi, “ An excited-state approach within full conﬁguration interaction quantum monte carlo,”The Journal of Chemical Physics , 134117 (2015). S. Zhang and M. H. Kalos, “ Bilinear quantum monte carlo: Expectations and energydiﬀerences,” Journal of Statistical Physics , 515–533 (1993). C. Overy, G. H. Booth, N. S. Blunt, J. J. Shepherd, D. Cleland, and A. Alavi, “ Unbi-ased reduced density matrices and electronic properties from full conﬁguration interactionquantum monte carlo,” The Journal of Chemical Physics , 244117 (2014). N. S. Blunt, T. W. Rogers, J. S. Spencer, and W. M. C. Foulkes, “ Density-matrix quantummonte carlo method,” Phys. Rev. B , 245124 (2014). N. S. Blunt, G. H. Booth, and A. Alavi, “ Density matrices in full conﬁguration inter-action quantum monte carlo: Excited states, transition dipole moments, and paralleldistribution,” The Journal of Chemical Physics , 244105 (2017). R. J. Anderson, T. Shiozaki, and G. H. Booth, “ Eﬃcient and stochastic multirefer-ence perturbation theory for large active spaces within a full conﬁguration interactionquantum monte carlo framework,” The Journal of Chemical Physics , 054101 (2020),https://doi.org/10.1063/1.5140086. G. Li Manni, S. D. Smart, and A. Alavi, “ Combining the complete active spaceself-consistent ﬁeld method and the full conﬁguration interaction quantum montecarlo within a super-ci framework, with application to challenging metal-porphyrins,”Journal of Chemical Theory and Computation , 1245–1258 (2016). R. E. Thomas, Q. Sun, A. Alavi, and G. H. Booth,“ Stochastic multiconﬁgurational self-consistent ﬁeld theory,”Journal of Chemical Theory and Computation , 5316–5325 (2015). K. Guther, W. Dobrautz, O. Gunnarsson, and A. Alavi, “ Time propa-gation and spectroscopy of fermionic systems using a stochastic technique,”Phys. Rev. Lett. , 056401 (2018). H. Luo and A. Alavi, “ Combining the transcorrelated method with full conﬁgura-tion interaction quantum monte carlo: Application to the homogeneous electron gas,”Journal of Chemical Theory and Computation , 1403–1411 (2018), pMID: 29431996,58ttps://doi.org/10.1021/acs.jctc.7b01257. W. Dobrautz, H. Luo, and A. Alavi, “ Compact numerical solutions to the two-dimensional repulsive hubbard model obtained via nonunitary similarity transformations,”Phys. Rev. B , 075119 (2019). A. J. Cohen, H. Luo, K. Guther, W. Dobrautz, D. P. Tew, and A. Alavi,“ Similarity transformation of the electronic schrödinger equation via jas-trow factorization,” The Journal of Chemical Physics , 061101 (2019),https://doi.org/10.1063/1.5116024. P. Jeszenszki, A. Alavi, and J. Brand, “ Are smooth pseudopotentials a good choice forrepresenting short-range interactions?” Phys. Rev. A , 033608 (2019). F. D. Malone, N. S. Blunt, J. J. Shepherd, D. K. K. Lee, J. S. Spencer, and W. M. C.Foulkes, “ Interaction picture density matrix quantum monte carlo,” J. Chem. Phys. ,044116 (2015). F. D. Malone, N. S. Blunt, E. W. Brown, D. K. K. Lee, J. S. Spencer, W. M. C. Foulkes,and J. J. Shepherd, “ Accurate exchange-correlation energies for the warm dense electrongas,” Phys. Rev. Lett. , 115701 (2016). J. S. Spencer, N. S. Blunt, S. Choi, J. Etrych, M.-A. Filip, W. M. C. Foulkes, R. S. T.Franklin, W. J. Handley, F. D. Malone, V. A. Neufeld, R. Di Remigio, T. W. Rogers,C. J. C. Scott, J. J. Shepherd, W. A. Vigor, J. Weston, R. Xu, and A. J. W. Thom,“ The hande-qmc project: Open-source stochastic quantum chemistry from the groundstate up,” Journal of Chemical Theory and Computation , 1728–1742 (2019). N. M. Tubman, J. Lee, T. Y. Takeshita, M. Head-Gordon, and K. B. Whaley, “ A deter-ministic alternative to the full conﬁguration interaction quantum monte carlo method,”The Journal of chemical physics , 044112 (2016). B. Huron, J. P. Malrieu, and P. Rancurel, “ Iterative perturbation calcula-tions of ground and excited state energies from multiconﬁgurational zeroth-order wavefunctions,” The Journal of Chemical Physics , 5745–5759 (1973),https://doi.org/10.1063/1.1679199. A. A. Holmes, N. M. Tubman, and C. J. Umrigar, “ Heat-bath conﬁguration inter-action: An eﬃcient selected conﬁguration interaction algorithm inspired by heat-bathsampling,” Journal of Chemical Theory and Computation , 3674–3680 (2016), pMID:27428771, https://doi.org/10.1021/acs.jctc.6b00407.59 A. A. Holmes, H. J. Changlani, and C. J. Umrigar, “ Eﬃcient heat-bath sampling infock space,” Journal of Chemical Theory and Computation , 1561–1571 (2016), pMID:26959242. S. Sharma, A. A. Holmes, G. Jeanmairet, A. Alavi, and C. J. Umrigar, “ Semistochas-tic heat-bath conﬁguration interaction method: Selected conﬁguration interaction withsemistochastic perturbation theory,” Journal of chemical theory and computation ,1595–1604 (2017). L.-H. Lim and J. Weare, “ Fast randomized iteration: Diﬀusion monte carlo through thelens of numerical linear algebra,” SIAM Review , 547–587 (2017). S. M. Greene, R. J. Webber, J. Weare, and T. C. Berkelbach, “ Beyond walkers in stochas-tic quantum chemistry: Reducing error using fast randomized iteration,” Journal of chem-ical theory and computation , 4834–4850 (2019). Z. Wang, Y. Li, and J. Lu, “ Coordinate descent full conﬁguration interaction,” Journalof chemical theory and computation , 3558–3569 (2019). J. J. Shepherd, G. Booth, A. Grüneis, and A. Alavi, “ Full conﬁguration interactionperspective on the homogeneous electron gas,” Phys. Rev. B , 081103 (2012). N. Dattani, G. Li Manni, D. Feller, and J. Koput, “ Computer-predicted ionization energyof carbon within − of the best experiment,” (2020), under review. G. Li Manni and A. Alavi, “ Understanding the mechanism stabilizing intermediate spinstates in fe(ii)-porphyrin,” Journal of Physical Chemistry A , 4935–4947 (2018). G. Li Manni, D. Kats, D. P. Tew, and A. Alavi, “ Role of va-lence and semicore electron correlation on spin gaps in fe(ii)-porphyrins,”Journal of Chemical Theory and Computation , 1492–1497 (2019). J. Hubbard, “ Electron Correlations in Narrow Energy Bands,”Proc. Royal Soc. A , 238 (1963). J. J. Shepherd, G. E. Scuseria, and J. S. Spencer, “ Sign problem in full conﬁgurationinteraction quantum monte carlo: Linear and sublinear representation regimes for theexact wave function,” Phys. Rev. B , 155130 (2014). N. Trivedi and D. M. Ceperley, “ Ground-state correlations of quantum antiferromagnets:A Green-function Monte Carlo study,” Phys. Rev. B , 4552 (1990). G. H. Booth and A. Alavi et. al., “ Standalone

NECI codebase designed for FCIQMCand other stochastic quantum chemistry methods.” https://github.com/ghb24/ ECI_STABLE (2013). L. Clarke, I. Glendinning, and R. Hempel, “ The mpi message passing interface standard,”in

Programming Environments for Massively Parallel Distributed Systems , edited by K. M.Decker and R. M. Rehmann (Birkhäuser Basel, Basel, 1994) pp. 213–218. L. S. Blackford, A. Petitet, R. Pozo, K. Remington, R. C. Whaley, J. Demmel, J. Don-garra, I. Duﬀ, S. Hammarling, G. Henry, et al. , “ An updated set of basic linear algebrasubprograms (blas),” ACM Transactions on Mathematical Software , 135–151 (2002). E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz,A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen,

LAPACK Users’ Guide ,3rd ed. (Society for Industrial and Applied Mathematics, Philadelphia, PA, 1999). B. Aradi, “ fypp fortran preprocessor,” https://github.com/aradi/fypp . M. Saito and M. Matsumoto, “ Simd-oriented fast mersenne twister: a 128-bit pseudoran-dom number generator,” in

Monte Carlo and Quasi-Monte Carlo Methods 2006 , editedby A. Keller, S. Heinrich, and H. Niederreiter (Springer Berlin Heidelberg, Berlin, Hei-delberg, 2008) pp. 607–622. M. Saito and M. Matsumoto, “ double precision simd oriented fast mersenne twister,” https://github.com/MersenneTwister-Lab/dSFMT (2008). M. Matsumoto and T. Nishimura, “ Mersenne twister: a 623-dimensionally equidistributeduniform pseudo-random number generator,” ACM Transactions on Modeling and Com-puter Simulation (TOMACS) , 3–30 (1998). S. D. Smart,

The use of spin-pure and non-orthogonal Hilbert spaces in full conﬁgurationinteraction quantum monte-carlo , Ph.D. thesis, University of Cambridge (2014). S. Smart, G. Booth, and A. Alavi, “ Excitation generation in full conﬁguration interactionquantum monte carlo based on cauchy-schwarz distributions,” . V. A. Neufeld and A. J. W. Thom, “ Exciting determinants in quan-tum monte carlo: Loading the dice with fast, low-memory weights,”Journal of Chemical Theory and Computation , 127–140 (2019). J. Li, M. Otten, A. A. Holmes, S. Sharma, and C. J. Umrigar, “ Fast semistochasticheat-bath conﬁguration interaction,” J. Comp. Phys. , 214110 (2018). P. Lowdin, “ A note on the quantum-mechanical perturbation theory,” J Chem Phys ,61396–1401 (1951). R. Olivares-Amaya, W. Hu, N. Nakatani, S. Sharma, J. Yang,and G. K.-L. Chan, “ The ab-initio density matrix renormalizationgroup in practice,” The Journal of Chemical Physics , 034102 (2015),https://doi.org/10.1063/1.4905329. A. D. Chien, A. A. Holmes, M. Otten, C. J. Umrigar, S. Sharma, and P. M. Zimmerman,“ Excited states of methylene, polyenes, and ozone from heat-bath conﬁguration inter-action,” The Journal of Physical Chemistry A , 2714–2722 (2018), pMID: 29473750,https://doi.org/10.1021/acs.jpca.8b01554. N. S. Blunt, “ Communication: An eﬃcient and accurate perturbative correction to initia-tor full conﬁguration interaction quantum monte carlo,” The Journal of Chemical Physics , 221101 (2018). J. S. Spencer, N. S. Blunt, and W. M. C. Foulkes, “ The sign problem and populationdynamics in the full conﬁguration interaction quantum Monte Carlo method,” The Journalof Chemical Physics , 054110 (2012). N. S. Blunt, A. J. W. Thom, and C. J. C. Scott, “ Preconditioning and perturbativeestimators in full conﬁguration interaction quantum monte carlo,” Journal of ChemicalTheory and Computation , 3537 (2019). N. S. Blunt, “ A hybrid approach to extending selected conﬁguration interaction and fullconﬁguration interaction quantum monte carlo,” The Journal of Chemical Physics ,174103 (2019). R. E. Thomas, D. Opalka, C. Overy, P. J. Knowles, A. Alavi, and G. H. Booth, “ Analyticnuclear forces and molecular properties from full conﬁguration interaction quantum montecarlo,” The Journal of Chemical Physics , 054108 (2015). G. H. Booth and G. K.-L. Chan, “ Communbication: Excited states, dynamic correla-tion functions and spectral properties from full conﬁguration interaction quantum montecarlo,” The Journal of Chemical Physics , 191102 (2012). P. K. Samanta, N. S. Blunt, and G. H. Booth, “ Response formalism within full con-ﬁguration interaction quantum monte carlo: Static properties and electrical response,”Journal of Chemical Theory and Computation , 3532–3546 (2018). N. S. Blunt, A. Alavi, and G. H. Booth, “ Krylov-projected quantum monte carlo method,”Phys. Rev. Lett. , 050603 (2015). 62 N. S. Blunt, A. Alavi, and G. H. Booth, “ Nonlinear biases, stochastically sam-pled eﬀective hamiltonians, and spectral functions in quantum monte carlo methods,”Phys. Rev. B , 085118 (2018). G. H. Booth, D. Cleland, A. Alavi, and D. P. Tew, “ An explicitly correlated ap-proach to basis set incompleteness in full conﬁguration interaction quantum monte carlo,”The Journal of Chemical Physics , 164112 (2012). A. Grüneis, J. J. Shepherd, A. Alavi, D. P. Tew, and G. H. Booth, “ Explicitly cor-related plane waves: Accelerating convergence in periodic wavefunction expansions,”The Journal of Chemical Physics , 084112 (2013). J. A. F. Kersten, G. H. Booth, and A. Alavi, “ Assessment of mul-tireference approaches to explicitly correlated full conﬁguration interactionquantum monte carlo,” The Journal of Chemical Physics , 054117 (2016),https://doi.org/10.1063/1.4959245. E. Fertitta and G. H. Booth, “ Rigorous wave function embedding with dynamical ﬂuctu-ations,” Phys. Rev. B , 235132 (2018). E. Fertitta and G. H. Booth, “ Energy-weighted density matrix embedding of open corre-lated chemical fragments,” The Journal of Chemical Physics , 014115 (2019). G. Li Manni, R. K. Carlson, S. Luo, D. Ma, J. Olsen, D. G. Truh-lar, and L. Gagliardi, “ Multiconﬁguration pair-density functional theory,”Journal of Chemical Theory and Computation , 3669–3680 (2014), pMID: 26588512. H. J. Monkhorst, “ Calculation of properties with the coupled-cluster method,” Int. J.Quantum Chem.

S11 , 421–432 (1977). E. Dalgaard and H. Monkhorst, “ Some aspects of the time-dependent coupled-clusterapproach to dynamic response functions,” Phys. Rev. A (1983). O. Christiansen, P. Jørgensen, and C. Hättig, “ Response functions from fourier com-ponent variational perturbation theory applied to a time-averaged quasienergy,” Int. J.Quantum Chem. , 1–52 (1998). T. Helgaker, S. Coriani, P. Jørgensen, K. Kristensen, J. Olsen, and K. Ruud, “ Recentadvances in wave function-based methods of molecular-property calculations.” Chem. Rev. , 543–631 (2012). R. N. Silver, D. S. Sivia, and J. E. Gubernatis, “ Maximum-entropy method for analyticcontinuation of quantum monte carlo data,” Phys. Rev. B , 2380–2389 (1990).63 M. Jarrell and J. Gubernatis, Phys. Rep. , 133 (1996). See output ﬁles output_file_excited_state_be2_b1g.txt and stats_file_excited_state_be2_b1g.txt for the excited state calculation andthe ﬁles output_file_real_time_be2_b1g.txt , and fft_spectrum_be2_b1g.txt forthe real-time calculation in the supplemental material . A. Kramida and W. C. Martin, “ A compilation of energy lev-els and wavelengths for the spectrum of neutral beryllium (be i),”Journal of Physical and Chemical Reference Data , 1185–1194 (1997),https://doi.org/10.1063/1.555999. T. Kato, “ On the Eigenfunctions of Many-Particle Systems in Quantum Mechanics,”Commun. Pure Appl. Math. , 151–177 (1957). R. Jastrow, “ Many-body problems with strong forces,” Phys. Rev. , 1479–1484 (1955). S. Fournais, M. Hoﬀmann-Ostenhof, T. Hoﬀmann-Ostenhof, and T. Østergaard Sørensen,“ Analytic structure of many-body coulombic wave functions,” Commun. Math. Phys. ,291–310 (2009). S. F. Boys and N. C. Handy, “ The determination of energies and wavefunctions with fullelectronic correlation,” Proc. Roy. Soc.

A 310 , 43–61 (1969). K. E. Schmidt and J. W. Moskowitz, J. Chem. Phys. , 4172 (1990). F. Aquilante, J. Autschbach, R. K. Carlson, L. F. Chibotaru, M. G. Delcey, L. De Vico,I. Fdez. Galván, N. Ferré, L. M. Frutos, L. Gagliardi, M. Garavelli, A. Gius-sani, C. E. Hoyer, G. Li Manni, H. Lischka, D. Ma, P.-Å. Malmqvist, T. Müller,A. Nenov, M. Olivucci, T. B. Pedersen, D. Peng, F. Plasser, B. Pritchard, M. Rei-her, I. Rivalta, I. Schapiro, J. Segarra-Martì, M. Stenrup, D. G. Truhlar, L. Un-gur, A. Valentini, S. Vancoillie, V. Veryazov, V. P. Vysotskiy, O. Weingart, F. Za-pata, and R. Lindh, “ Molcas 8: New capabilities for multiconﬁgurational quan-tum chemical calculations across the periodic table,” J. Comput. Chem. , 506 (2016),https://onlinelibrary.wiley.com/doi/pdf/10.1002/jcc.24221. H.-J. Werner, P. J. Knowles, G. Knizia, F. R. Manby, and M. Schütz, “

Molpro : a general-purpose quantum chemistry program package,” Wiley Interdiscip. Rev. Comput. Mol. Sci. , 242 (2012). H.-J. Werner, P. J. Knowles, G. Knizia, F. R. Manby, M. Schütz, et al. , “

MOLPRO , version2015.1, a package of ab initio Y. G. Smeyers and L. Doreste-Suarez, “ Half-Projected and Pro-jected Hartree-Fock Calculations for Singlet Ground States. I.four-Electron Atomic Systems,” Int. J. Quantum Chem. , 687 (1973),https://onlinelibrary.wiley.com/doi/pdf/10.1002/qua.560070406. T. Helgaker, P. Jørgensen, and J. Olsen,

Molecular Electronic-Structure Theory (JohnWiley & Sons, Chichester, 2000). G. H. Booth, D. Cleland, A. J. W. Thom, and A. Alavi, “ Breaking the carbon dimer: Thechallenges of multiple bond dissociation with full conﬁguration interaction quantum MonteCarlo methods,” J. Chem. Phys. , 084104 (2011), https://doi.org/10.1063/1.3624383. G. H. Booth, S. D. Smart, and A. Alavi, “ Linear-scaling and parallelis-able algorithms for stochastic quantum chemistry,” Mol. Phys. , 1855 (2014),https://doi.org/10.1080/00268976.2013.877165. I. M. Gel’fand and M. L. Cetlin, “ Finite-dimensional representations of the group ofunimodular matrices,” Dokl. Akad. Nauk , 825 (1950). I. M. Gel’fand and M. L. Cetlin, “ Finite-dimensional representations of the group oforthogonal matrices,” Dokl. Akad. Nauk , 1017 (1950), amer. Math. Soc. Transl. 64,116 (1967). I. M. Gel’fand, “ The center of an inﬁnitesimal group ring,” Mat. Sb. , 103 (1950). J. Paldus, “ Group theoretical approach to the conﬁguration interaction and perturbationtheory calculations for atomic and molecular systems,” J. Chem. Phys. , 5321 (1974). J. Paldus, “ A pattern calculus for the unitary group approach tothe electronic correlation problem,” Int. J. Quantum Chem. , 165 (1975),https://onlinelibrary.wiley.com/doi/pdf/10.1002/qua.560090823. J. Paldus, “ Unitary-group approach to the many-electron correlation problem: Relationof Gelfand and Weyl tableau formulations,” Phys. Rev. A , 1620 (1976). I. Shavitt, “ Graph theoretical concepts for the unitary group approach tothe many-electron correlation problem,” Int. J. Quantum Chem. , 131 (1977),https://onlinelibrary.wiley.com/doi/pdf/10.1002/qua.560120819. I. Shavitt, “ Matrix element evaluation in the unitary group approach to the electroncorrelation problem,” Int. J. Quantum Chem.

14 S12 , 5 (1978).

J. Paldus, “ Unitary Group Approach to Many-Electron Correlation Problem,” in

The Unitary Group for the Evaluation of Electronic Energy Matrix Elements , edited by65. Hinze (Springer Berlin Heidelberg, Berlin, Heidelberg, 1981) p. 1.

I. Shavitt, “ The Graphical Unitary Group Approach and Its Ap-plication to Direct Conﬁguration Interaction Calculations,” in

The Unitary Group for the Evaluation of Electronic Energy Matrix Elements , editedby J. Hinze (Springer Berlin Heidelberg, Berlin, Heidelberg, 1981) p. 51.

W. Dobrautz,

Development of Full Conﬁguration Interaction Quantum Monte CarloMethods for Strongly Correlated Electron Systems , Ph.D. thesis, University of Stuttgart(2019).

J. Hachmann, W. Cardoen, and G. K.-L. Chan, “ Multireference correlation inlong molecules with the quadratic scaling density matrix renormalization group,”J. Chem. Phys. , 144101 (2006), https://doi.org/10.1063/1.2345196.

M. Motta, D. M. Ceperley, G. K.-L. Chan, J. A. Gomez, E. Gull, S. Guo, C. A. Jiménez-Hoyos, T. N. Lan, J. Li, F. Ma, A. J. Millis, N. V. Prokof’ev, U. Ray, G. E. Scuseria,S. Sorella, E. M. Stoudenmire, Q. Sun, I. S. Tupitsyn, S. R. White, D. Zgid, and S. Zhang(Simons Collaboration on the Many-Electron Problem), “ Towards the Solution of theMany-Electron Problem in Real Materials: Equation of State of the Hydrogen Chainwith State-of-the-Art Many-Body Methods,” Phys. Rev. X , 031059 (2017). R. Pariser and R. G. Parr, “ A Semi-Empirical Theory of the Electronic Spectra and Elec-tronic Structure of Complex Unsaturated Molecules. I.” J. Chem. Phys. , 466 (1953),https://doi.org/10.1063/1.1698929. R. Pariser and R. G. Parr, “ A Semi-Empirical Theory of the Electronic Spectra and Elec-tronic Structure of Complex Unsaturated Molecules. II,” J. Chem. Phys. , 767 (1953),https://doi.org/10.1063/1.1699030. M. C. Gutzwiller, “ Eﬀect of Correlation on the Ferromagnetism of Transition Metals,”Phys. Rev. Lett. , 159 (1963). G. K.-L. Chan and M. Head-Gordon, “ Highly correlated calculations with a poly-nomial cost algorithm: A study of the density matrix renormalization group,”J. Chem. Phys. , 4462 (2002), https://doi.org/10.1063/1.1449459.

S. Sharma and G. K.-L. Chan, “ Spin-adapted density matrix renormaliza-tion group algorithms for quantum chemistry,” J. Chem. Phys. , 124121 (2012),https://doi.org/10.1063/1.3695642.

G. K.-L. Chan and S. Sharma, “ The Density Matrix Renormalization Group in Quantum66hemistry,” Annu. Rev. Phys. Chem. , 465 (2011). S. R. White, “ Density matrix formulation for quantum renormalization groups,”Phys. Rev. Lett. , 2863 (1992). G. Li Manni, W. Dobrautz, and A. Alavi, “ Compression of spin-adapted mul-ticonﬁgurational wave functions in exchange-coupled polynuclear spin systems,”Journal of Chemical Theory and Computation , 2202–2215 (2020), pMID: 32053374. M. Reiher, N. Wiebe, K. M. Svore, D. Wecker, and M. Troyer, “ Elucidating reactionmechanisms on quantum computers,” Proc. Natl. Acad. Sci. , 7555–7560 (2017).

See output ﬁles output_file_scaling_with_*_cores.txt and output_file_energy_with_8b_walkers.txt in the supplemental material . Z. Li, J. Li, N. S. Dattani, C. J. Umrigar, and G. K.-L. Chan, “ The elec-tronic complexity of the ground-state of the femo cofactor of nitrogenase as rel-evant to quantum simulations,” The Journal of Chemical Physics , 024302 (2019),https://doi.org/10.1063/1.5063376.

See output ﬁles output_file_load_imbalance_n*.txt in the supplemental material . The FeMoco calculations were performed before the introduction of the PCHB excitaitongenerator and thus using the Cauchy-Schwartz excitation generator, which is expected toyield higher load imbalance. Therefore, the FeMoco calculations have higher load imbal-ance at all considered scales compared to the Cr example. J. D. Hunter, “ Matplotlib: A 2d graphics environment,”Computing in Science Engineering , 90–95 (2007). P. J. Knowles and N. C. Handy, “ A determinant based full conﬁguration interactionprogram,” Computer Physics Communications , 75 – 83 (1989). H.-J. Werner, P. J. Knowles, G. Knizia, F. R. Manby, and M. Schütz, “ Molpro: ageneral-purpose quantum chemistry program package,” WIREs Comput Mol Sci , 242–253 (2012). H.-J. Werner, P. J. Knowles, G. Knizia, F. R. Manby, M. Schütz, et al. ,“ Molpro, version 2019.2, a package of ab initio programs,” (2019).

I. Fdez. Galván, M. Vacher, A. Alavi, C. Angeli, F. Aquilante, J. Autschbach, J. J.Bao, S. I. Bokarev, N. A. Bogdanov, R. K. Carlson, L. F. Chibotaru, J. Creutzberg,N. Dattani, M. G. Delcey, S. S. Dong, A. Dreuw, L. Freitag, L. M. Frutos, L. Gagliardi,F. Gendron, A. Giussani, L. Gonzàlez, G. Grell, M. Guo, C. E. Hoyer, M. Johansson,67. Keller, S. Knecht, G. Kova˘cević, E. Källman, G. Li Manni, M. Lundberg, Y. Ma,S. Mai, J. P. Malhado, P. Å. Malmqvist, P. Marquetand, S. A. Mewes, J. Norell,M. Olivucci, M. Oppel, Q. M. Phung, K. Pierloot, F. Plasser, M. Reiher, A. M. Sand,I. Schapiro, P. Sharma, C. J. Stein, L. K. Sørensen, D. G. Truhlar, M. Ugandi, L. Un-gur, A. Valentini, S. Vancoillie, V. Veryazov, O. Weser, T. A. Wesołowski, P.-O. Wid-mark, S. Wouters, A. Zech, J. P. Zobel, and R. Lindh, “ Openmolcas: From source codeto insight,” Journal of Chemical Theory and Computation , 5925–5964 (2019), pMID:31509407, https://doi.org/10.1021/acs.jctc.9b00532. Q. Sun, T. C. Berkelbach, N. S. Blunt, G. H. Booth, S. Guo, Z. Li, J. Liu, J. McClain,S. Sharma, S. Wouters, and G. K.-L. Chan, “ Pyscf: The python-based simulations ofchemistry framework,” WIREs Comput Mol Sci 2018 , e1340 (2017). G. Kresse and J. Furthmüller, “ Eﬃcient iterative schemes for ab initio total-energy cal-culations using a plane-wave basis set,” Phys. Rev. B , 11169–11186 (1996). Q. Sun, J. Yang, and G. K.-L. Chan, “ A general second order com-plete active space self-consistent-ﬁeld solver for large-scale systems,”Chemical Physics Letters , 291 – 299 (2017), ahmed Zewail (1946-2016) Com-memoration Issue of Chemical Physics Letters.

T. Yanai, Y. Kurashige, D. Ghosh, and G. K.-L. Chan, “ Accelerating convergence initerative solution for large-scale complete active space self-consistent-ﬁeld calculations,”International Journal of Quantum Chemistry , 2178–2190 (2009).

N. A. Bogdanov, G. Li Manni, S. Sharma, O. Gunnarsson, and A. Alavi, “ Newsuperexchange paths due to breathing-enhanced hopping in corner-sharing cuprates,”Arxiv (2018), arXiv:1803.07026.

Supplemental material, available at supplement-url.

A. J. Walker, “ An eﬃcient method for generating discrete random variables with generaldistributions,” ACM Trans. Math. Softw.3