Efficient heat-bath sampling in Fock space
aa r X i v : . [ c ond - m a t . s t r- e l ] D ec Efficient heat-bath sampling in Fock space
Adam Holmes , ∗ Hitesh J. Changlani , and C. J. Umrigar † Laboratory of Atomic and Solid State Physics,Cornell University, Ithaca, New York 14853, USA Department of Physics, University of Illinois at Urbana-Champaign,Urbana, Illinois 61801, USA
We introduce an algorithm for sampling many-body quantum states in Fock space. Thealgorithm efficiently samples states with probability approximately proportional to an arbi-trary function of the second-quantized Hamiltonian matrix element connecting the sampledstate to the current state. We apply the new sampling algorithm to the recently-developedSemistochastic Full Configuration Interaction Quantum Monte Carlo method (S-FCIQMC),a semistochastic implementation of the power method for projecting out the ground stateenergy in a basis of Slater determinants. The heat-bath sampling requires modest addi-tional computational time and memory compared to uniform sampling but results in newly-spawned weights that are approximately of the same magnitude, thereby greatly improvingthe efficiency of projection. A comparison in efficiency between uniform and approximateheat-bath sampling is performed on the all-electron nitrogen dimer at equilibrium in Dun-ning’s cc-pV X Z basis sets with X ∈ { D , T , Q , } , demonstrating a large gain in efficiencythat increases with basis set size. In addition, a comparison in efficiency is performed onthree all-electron first-row dimers, B , N , and F , in a cc-pVQZ basis, demonstrating thatthe gain in efficiency compared to uniform sampling also increases dramatically with thenumber of electrons. ∗ [email protected] † [email protected]
I. INTRODUCTION
Methods for finding approximate solutions to the quantum many-body problem often work in Fock space, spannedby a discrete basis composed of products of N single-particle orbitals chosen from a set of M orbitals, where N is the number of particles. These products are symmetrized or antisymmetrized for bosonic or fermionic particles,respectively. However, the number of states in Fock space scales exponentially in M , and even the size of thesector of Fock space with constant particle number N sector scales combinatorially in M and N , so deterministiccalculations are limited to small M and N .When the number of states is too large for deterministic methods (approximately 10 ), Monte Carlo methodsprovide a valuable alternative. For example, the Full Configuration Interaction Quantum Monte Carlo (FCIQMC)method [1–4], and its semistochastic extension, the S-FCIQMC method [5] have been used to calculate almostexact energies in far larger state spaces. Another example is the Monte Carlo Configuration Interaction (MCCI)method [6], in which a variational wavefunction is calculated from a selected CI expansion generated by a MonteCarlo sampling procedure.In both of these methods, Slater determinants are sampled uniformly from the set of determinants connectedto a reference by the Hamiltonian. While uniform sampling has the advantage of being easy to implement, it isfar from optimal, since some of the determinants connected to the reference are much more important than others.The definition of “importance” here depends on the method used. In S-FCIQMC, starting from an initial state,the importance of a final state is the magnitude of the Hamiltonian matrix element connecting it to the initialstate: if moves are proposed with probabilities proportional to this importance function, the weight of the finalstate is independent of which of the possible final states is chosen. In MCCI, the importance of a state is themagnitude of the 2nd-order perturbation theory estimate of energy lowering achieved by including this state in theexpansion. Since the importance of the states can range over many orders of magnitude, large gains in efficiencycan be gained by sampling states in proportion to their importance. However, the na¨ıve approach of computingthe relative probabilities of all the connected states and then normalizing by dividing by their sum is prohibitivelyexpensive because the number of connected states can be large. For example, for a quantum chemistry Hamiltonian,it scales as O (cid:0) N M (cid:1) .Motivated by the above, this paper addresses the following problem: Given an initial configuration in Fockspace, how can one efficiently sample new configurations with probability approximately proportional to a functionof the Hamiltonian matrix element connecting the new configuration to the initial one? By ‘efficient’, we meanthat the time complexity of one sampling event scales as O ( N ), the same time complexity as that of computingone Hamiltonian matrix element corresponding to a single excitation in a molecular system. We also limit ourstorage requirements to O (cid:0) M k (cid:1) , where k is the maximum number of creation and annihilation operators in anyoff-diagonal term in the Hamiltonian − in other words, the same scaling as that required to store the integralsneeded for computing the Hamiltonian. We take advantage of the fact that k is typically small; e.g., it is 4 for thequantum chemistry Hamiltonian and for the Hubbard model in momentum space, and it is 2 for the Hubbard modelin real space. We demonstrate the efficiency gains achieved when the method is used in S-FCIQMC applied to thequantum chemistry Hamiltonian.This paper is organized as follows. Section II has an overview of the S-FCIQMC method and its applicationto the quantum chemistry Hamiltonian. In Section III the approximate heat-bath sampling method is discussedwithin the context of S-FCIQMC. In Section IV uniform and heat-bath sampling methods are compared for S-FCIQMC calculations on the nitrogen dimer at equilibrium in several different basis sets, demonstrating thatheatbath sampling is more efficient than uniform sampling by a factor that increases rapidly with basis size. Similarcalculations are also performed on the boron dimer and the fluoride dimer, demonstrating that at fixed number oforbitals, the gain in efficiency increases even more rapidly with electron number. II. BACKGROUNDA. Quantum chemistry Hamiltonian
The second-quantized nonrelativistic electronic Hamiltonian, in the Born-Oppenheimer approximation is,ˆ H = X pr f rp a † r a p + 12 X pqrs g rspq a † r a † s a q a p + h nuc , (1)where a † p ( a p ) is the usual electron creation (destruction) operator with the indices { p, q, r, s } incorporating bothspatial and spin degrees of freedom. The terms entering the expression of the Hamiltonian are, f rp = Z φ ∗ r ( x ) − ∇ − X I Z I | r I − r | ! φ p ( x ) dx (2)the 1-center integrals with φ p ( x ) denoting spin-orbitals, x the combined spatial ( r ) and spin coordinates of theelectrons, and Z I and r I the atomic number and spatial coordinates of nucleus I , g rspq = Z φ ∗ r ( x ) φ ∗ s ( x ) 1 | r − r | φ p ( x ) φ q ( x ) dx dx (3)the 2-center integrals with an index-ordering convention according to the physicist notation [7], and h nuc = X I 2) and that for the 2-center integrals is O ( N / i ) and final states ( f ) differ in, H diag = X p ∈ occ . f pp + 12 X p,q ∈ occ . ( g ppqq − g pqqp ) , (5) H ( r ← p ) = Γ irp f rp + X q ∈ occ . ( g rpqq − g rqqp ) ! , (6) H ( rs ← pq ) = Γ irp Γ fsq (cid:18) g rspq − g rsqp (cid:19) , (7)where Γ irp = ( − n , n being the number of occupied spin-orbitals between p and r in state i .Note that the magnitudes of the matrix elements for double excitations depend only on the four spin-orbitalswhose occupations are changing, though the sign depends on the other occupied spin-orbitals as well. In contrastthe magnitudes of the diagonal and the single excitation matrix elements depend on all the occupied spin-orbitals. B. S-FCIQMC overview We now briefly review the S-FCIQMC method [5] which is a modification of the FCIQMC method [1–4]. There arethree main differences between the S-FCIQMC method and the original FCIQMC method, namely, walkers have realrather than integer weights, the computation of the mixed estimator of the energy is done using a multideterminantaltrial wavefunction rather than the Hartree-Fock determinant and the part of the projection that involves the mostimportant determinants is done deterministically rather than stochastically.In common with all QMC methods, the S-FCIQMC method employs a “projector” operator, which is a functionof the Hamiltonian such that the ground state of the Hamiltonian is the dominant state (state with largest absoluteeigenvalue) of the projector. Repeated application of the projector to an arbitrary state that is not orthogonal tothe ground state results in projecting onto the ground state. The S-FCIQMC method employs the linear projector,ˆ G = ˆ + τ (cid:16) E T − ˆ H (cid:17) , (8)where E T is an estimate of the ground state energy and i, j denote Fock space states. The time step, τ , must besmaller than 2 / ( E max − E min ), where E max and E min are the extremal eigenvalues of ˆ H , in order for the groundstate of ˆ H to be the dominant state of ˆ G . In order to avoid negative diagonal matrix elements in ˆ G , τ , must besmaller than 1 / ( E max − E min ). In practice, τ is chosen to roughly minimize the statistical error for given computertime, and this optimal value is yet smaller.Since the total number of states is much larger than the number of states that can be stored on a computer, it isnecessary to do at least part of the projection stochastically. Before the run, a subset of the determinants is selectedto be dubbed the deterministic space . Projection between pairs of determinants that are both in the deterministicspace is performed deterministically, using a sparse matrix-vector multiplication. However, projection between apair of determinants at least one of which is outside the deterministic space is performed stochastically, as follows.At any Monte Carlo (MC) step, t , the current state is represented as a sparse linear combination of “occupied”states, (cid:12)(cid:12) ψ t (cid:11) = X i ∈ occ . w ti | i i , (9)where w ti is the weight on state | i i at step t . The state evolves from MC step t to step t + 1 according to w t +1 j = X i ˆ G ji w ti = X i h δ ji + τ (cid:16) E T δ ji − ˆ H ji (cid:17)i w ti = (cid:16) τ (cid:16) E T − ˆ H ii (cid:17)(cid:17) w ti − τ X i = j ˆ H ji w ti . (10)Note that the first of these terms is diagonal, and the second is off-diagonal. In addition to the off-diagonal elementsconnecting pairs of deterministic determinants, all diagonal elements can be applied deterministically, since they donot increase the density of the sparse representation. However, the off-diagonal elements that do not connect pairsof deterministic states are sampled stochastically as follows.On each iteration, the weight w ti on each Slater determinant i is divided up into an integer number of walkers n ti = max (cid:0) , ⌊ (cid:12)(cid:12) w ti (cid:12)(cid:12) ⌉ (cid:1) , and each walker spawns a new walker on determinant j with probability P ji that receives aweight of w ( t +1) j = − τ H ji P ji (cid:18) w ti n ti (cid:19) . (11)To overcome the fermion sign problem, only determinants with absolute weight greater than an initiator thresh-old [2] are allowed to create weight on unoccupied determinants. The resulting bias in the energy disappears in thelimit of infinite walker number. For efficiency reasons, walkers with weight less than a minimum weight are combinedinto a smaller number of larger weight walkers in a statistically unbiased fashion [5]. In this work, the minimumweight is set to 0.5, and a graduated initiator is used: the initiator threshold for a given determinant is equal to theminimum number of moves a walker on that determinant has taken since its last visit to the deterministic space.An estimate of the ground state wavefunction could in principle be obtained by summing the walker distributionsover all Monte Carlo iterations over a long run, i.e., | ψ i ≈ N MC X t =1 (cid:12)(cid:12) ψ t (cid:11) = N MC X t =1 X i ∈ occ . w ti | i i . (12)However, the entire wavefunction | ψ i contains too many terms to be stored all at once; instead, only the sparsewalker distribution (cid:8) w ti (cid:9) at a given iteration can be stored. Therefore, E must be estimated in a way that dependsonly on linear functions of the walker distributions. This is accomplished using a mixed estimator, E mix0 = D ψ (cid:12)(cid:12)(cid:12) ˆ H (cid:12)(cid:12)(cid:12) ψ T E h ψ | ψ T i = P N MC t =1 P i ∈ occ . w ti N i P N MC t =1 P i ∈ occ . w ti t i , (13)where | ψ T i = P i t i | i i is a trial wavefunction that is chosen before the run, and N i = P j H ij t j can also be computedand stored before the run. In the limit that either | ψ i or | ψ T i is the exact ground state, the mixed energy is a zero-bias, zero-variance estimator of E . Thus, we choose | ψ T i to be a low-energy linear combination of determinants.Before starting an S-FCIQMC run, two sets of important determinants must be selected: the deterministic spaceand the determinants that make up the trial wavefunction expansion (after selecting the determinants that makeup | ψ T i , a Lanczos diagonalization is then performed within that set of determinants to obtain the coefficients).In both important subspaces, including more determinants improves the efficiency, but the two subspaces havedifferent storage constraints. The deterministic space must be small enough that the full Hamiltonian within thedeterministic space can be stored (as an upper-triangular sparse matrix). The number of determinants in | ψ T i islimited by the requirement that all of the mixed energy numerators N i = P j H ij t j can be stored. This means thatthe deterministic space can typically be much larger than the trial wavefunction expansion.To select determinants for either subspace, the following iterative procedure is used. Starting with the Hartree-Fock determinant, all connected determinants are obtained. Second-order perturbation theory is then used toestimate the coefficient each determinant would have if a Lanczos diagonalization were performed; only the 10%of the determinants with largest expected coefficients are retained. A Lanczos diagonalization is performed inthe resulting space. Then, the determinants connected to the subset of determinants with the highest absolutecoefficients are obtained, and the procedure is repeated. Finally, the determinant list is truncated to the requiredsize, and in the case of the trial wavefunction, a final Lanczos diagonalization is performed. C. Proposal of off-diagonal moves The computed energy does not depend on P ji in Eq. 11, provided that this is in fact the probability of proposingthe move. However, the statistical error does depend on P ji . Large weight fluctuations increase the statistical errorin any Monte Carlo calculation, and their detrimental effect is even more severe when there is a sign problem. Ifthe new states are chosen from the heat-bath distribution, i.e., with probability proportional to the magnitude ofthe corresponding matrix element, i.e., P ji = | H ji | P k | H ki | . (14)then their weights are independent of which state is chosen.The difficulty with heat-bath sampling is that it is prohibitively expensive to compute the full column sum P k | H ki | every time an off-diagonal move is proposed, since the number of off-diagonal elements is O (cid:0) N M (cid:1) .Hence, until now, most S-FCIQMC calculations have been performed using an approximately uniform sampling ofoff-diagonal moves, which is computationally simpler since it only involves the number of off-diagonal elements − which can be computed before the run − rather than computing and summing them each iteration.In this paper, we introduce an efficient new approach for sampling an approximate heat-bath distribution, inwhich we factor the above proposal probability into probabilities of selecting each electron and unoccupied spin-orbital separately. These probabilities of individual steps can be computed and stored before the run in order toreduce the sampling time to O ( N ), which is the same scaling as that for computing one single-excitation matrixelement. The algorithm is derived in the next section, and is summarized concisely in Appendix A. III. APPROXIMATE HEAT-BATH ALGORITHM The nonrelativistic quantum chemistry Hamiltonian contains off-diagonal terms corresponding to single and dou-ble excitations only. Double excitations are more numerous, but much simpler to deal with than single excitations.While the number of double excitations from a given determinant is O (cid:0) N M (cid:1) , the number of distinct valuesof double excitation matrix elements in the full Hamiltonian is only O (cid:0) M (cid:1) , and each double excitation matrixelement takes only O (1) time to compute. On the other hand, while the number of single excitations possiblefrom a given determinant is only O ( N M ), the number of distinct values of single excitation matrix elements in thefull Hamiltonian is combinatorially large, and each matrix element takes O ( N ) time to compute, since the matrixelements for exciting from a given spin-orbital include a sum over all other occupied spin-orbitals (see Eq. 6). So,efficient computation of single-excitation matrix elements is difficult because there are too many to store and theyare expensive to compute on the fly.Since double excitations account for most of the possible excitations from a given determinant, and since theyare so much easier to deal with efficiently, we start by constructing an approximate heat-bath algorithm for doubleexcitations, and then we incorporate single excitations into the algorithm. A. Sampling double excitations If single excitations are ignored, heat-bath sampling requires us to sample a move of electrons from spin-orbitals { p, q } to spin-orbitals { r, s } with probability P ( rs ← pq ) = | H ( rs ← pq ) | P p ′ q ′ ∈ occ . P r ′ s ′ ∈ unocc . | H ( r ′ s ′ ← p ′ q ′ ) | , (15)where the sum over { p ′ , q ′ } is over all occupied spin-orbitals, and the sum over { r ′ , s ′ } is over all unoccupiedspin-orbitals.We can factor this probability into a four-step process as follows: P ( rs ← pq ) = | H ( rs ← pq ) | P p ′ q ′ P r ′ s ′ | H ( r ′ s ′ ← p ′ q ′ ) | = P q ′ ∈ occ . P r ′ s ′ ∈ unocc . | H ( r ′ s ′ ← pq ′ ) | P p ′ q ′ ∈ occ . P r ′ s ′ ∈ unocc . | H ( r ′ s ′ ← p ′ q ′ ) | ! P r ′ s ′ ∈ unocc . | H ( r ′ s ′ ← pq ) | P q ′ ∈ occ . P r ′ s ′ ∈ unocc . | H ( r ′ s ′ ← pq ′ ) | ! × (cid:18) P s ′ ∈ unocc . | H ( rs ′ ← pq ) | P r ′ s ′ ∈ unocc . | H ( r ′ s ′ ← pq ) | (cid:19) (cid:18) | H ( rs ← pq ) | P s ′ ∈ unocc . | H ( rs ′ ← pq ) | (cid:19) = P ( p ) P ( q | p ) P ( r | p, q ) P ( s | p, q, r ) . (16)The conditional probabilities of selecting unoccupied spin-orbitals for double excitation, P ( r | p, q ) and P ( s | p, q, r ),will be different for each determinant, since they involve sums over all the spin-orbitals that are currently unoccupied.However, we can approximate them by summing over all spin-orbitals, except for p and q : P ( r | p, q ) ≈ ˜ P ( r | p, q ) ≡ P s ′ / ∈{ p,q } | H ( rs ′ ← pq ) | P rs / ∈{ p,q } | H ( r ′ s ′ ← pq ) | , P ( s | p, q, r ) ≈ ˜ P ( s | p, q, r ) ≡ | H ( rs ← pq ) | P s ′ / ∈{ p,q } | H ( rs ′ ← pq ) | . (17)Spin-orbitals can be sampled from these distributions in O (1) time using the alias method [8], as described inAppendix D.A similar approximation can be made for the probability of selecting the occupied spin-orbitals, P ( p ) and P ( q | p ).Defining D pq ≡ X r ′ s ′ / ∈{ p,q } (cid:12)(cid:12) H ( r ′ s ′ ← pq ) (cid:12)(cid:12) and S p ≡ X q ′ = p D pq ′ , (18)we can approximate P ( p ) and P ( q | p ) by P ( p ) ≈ ˜ P ( p ) ≡ S p P p ′ ∈ occ . S p ′ , P ( q | p ) ≈ ˜ P ( q | p ) ≡ D pq P q ′ ∈ occ . D pq ′ . (19)Since S p , D pq , ˜ P ( r | p, q ), and ˜ P ( s | p, q, r ) all involve sums over all spin-orbitals, they can be computed and storedonce at the beginning of the run. However, ˜ P ( p ) and ˜ P ( q | p ) involve sums over only the currently-occupied spin-orbitals, and thus they must be computed on the fly (in O ( N ) time).Note that approximating the heat-bath probabilities affects only the efficiency of the calculation, not its exactness.Once the initiator bias has been extrapolated away, the results are exact (i.e., they have a statistical error but nobias), so long as the same probabilities are used for proposing moves and computing the weights. B. Sampling single excitations We now modify the double-excitation proposal algorithm above to include the possibility of proposing a singleexcitation. After selecting electrons in spin-orbitals { p, q } and empty spin-orbital r , we allow two possibilities:1. Select a second unoccupied spin-orbital s , generating the double excitation ( rs ← pq ) (as described above),OR2. Discard the already-selected occupied spin-orbital q and generate the single excitation ( r ← p ).The heat-bath distribution requires us to choose between a single and a double excitation with probabilityproportional to their corresponding matrix elements: P (single | p, q, r ) ∝ | H ( r ← p ) | , and P (double | p, q, r ) ∝ H tot rpq , (20)where H tot rpq ≡ X s ′ / ∈{ p,q,r } (cid:12)(cid:12) H ( rs ′ ← pq ) (cid:12)(cid:12) (21)is the sum of double-excitation off-diagonal elements in which electrons in spin-orbitals p and q excite to two other spin-orbitals, one of which is r . Note that H tot rpq can be precomputed and stored at the beginning of the run.One method of choosing between single and double excitations is to choose a single excitation with probability | H ( r ← p ) | H tot rpq + | H ( r ← p ) | and to choose a double excitation otherwise. However, the problem with this approach is that if | H ( r ← p ) | ≫ H tot rpq the double excitation probability becomes small, so when a double excitation is chosen, thenewly-spawned walker gets a large weight. A solution to this problem is to make both a double-excitation and asingle-excitation move if | H ( r ← p ) | > H tot rpq . Using this approach, the probabilities of selecting a single or doubleexcitation are ˜ P (single | p, q, r ) = ( | H ( r ← p ) | H tot rpq + | H ( r ← p ) | , if | H ( r ← p ) | < H tot rpq ;1 , otherwise, (22)and ˜ P (double | p, q, r ) = ( H tot rpq H tot rpq + | H ( r ← p ) | , if | H ( r ← p ) | < H tot rpq ;1 , otherwise, (23)respectively. Now, there can still be some large-weight single-excitation moves when | H ( r ← p ) | ≫ H tot rpq , but sincesingle-excitation moves are a very small fraction of all moves, this is not a serious problem. C. Check for correctness of heat-bath sampling While all valid double excitations can be proposed using this algorithm, one may wonder whether the same istrue of single excitations. The only situation in which a valid single excitation ( r ← p ) cannot be proposed by ourheat-bath algorithm is if all other occupied spin-orbitals q = p are the only ones of their irreducible representation,since then there would be no symmetry-allowed double excitation ( rs ← pq ), and ˜ P ( r | p, q ) would be zero. Hence,a sufficient − but not necessary − condition for heat-bath sampling to be correct for a given system consisting of n ↑ spin-up electrons and n ↓ spin-down electrons is:max ( n ↑ , n ↓ ) > N ′ irrep . , (24)where N ′ irrep . is the number of irreducible representations consisting of only one spatial orbital.Care must be taken that the full symmetry is used in this step. It is common practice to use D h symmetry,rather than the full D ∞ h symmetry, for a homonuclear diatomic molecule. For example, suppose there is only oneorbital pair that transforms as the ∆ g irreducible representation of D ∞ h . The member of the pair that transformsas x − y belongs to the A g irreducible representation of D h whereas the member that that transforms as xy belongs to the B g irreducible representation of D h . However, there are other functions, 1, x + y and z thatalso transform as the A g representation. So, using the irreducible representations of D ∞ h we would conclude thatthis orbital pair contributes two to N ′ irrep . whereas using the irreducible representations of D h symmetry we wouldincorrectly conclude that this orbital pair contributes one to N ′ irrep . .In practice, when we do not know the irreducible representations of the orbitals in the full symmetry group,we calculate N ′ irrep . as follows. Consider the system consisting of the same set of M spatial orbitals but only oneelectron. Compute the Full CI Hamiltonian matrix H ′ (dimension M ) for this system. The number of columns of H ′ that have no nonzero off-diagonal elements is equal to N ′ irrep . . This check is done at the beginning of a run toensure that heat-bath sampling is unbiased for the system being examined. IV. RESULTS: B , N AND F MOLECULES The relative efficiency of the approximate heat-bath method to the uniform method was tested on the B , N andF molecules. Dunning’s cc-pVQZ basis was used for all three molecules, and in addition, for N , cc-pV X Z baseswith X ∈ { D , T , Q , } were used to study the basis set dependence. For each basis set, the determinants in both thedeterministic space and the trial wavefunction were selected from the set of determinants that are at most quadrupleexcitations from the Hartree-Fock determinant. The size of the deterministic space was 4 × determinants, and F r e q u e n c y | H ji | P ( j ← i ) (Ha)Histogram of magnitudes of spawned weightsUniformHeat-bathFIG. 1. Probability distributions of off-diagonal absolute weights (in multiples of τ ), spawned by the uniform and heat-bathalgorithms for equilibrium N in a cc-pV5Z basis at with all electrons excited and 10 walkers. In the new heat-bath samplingalgorithm, the spawned off-diagonal weights are close in magnitude, whereas in uniform sampling, the values of the spawnedweights span many orders of magnitude. The location of the peak in the heat-bath distribution corresponds to an off-diagonalweight of about 0.1. the trial wavefunction contained 10 determinants (except for cc-pV5Z, which had a trial wavefunction with only500 determinants). To accelerate convergence, time-reversal symmetry (see Appendix E) and natural orbitals froma second-order Moller-Plesset perturbation theory (MP2) calculation were used.Fig. 1 shows the frequency of spawned absolute weights (in units of τ ) for the uniform and the approximateheat-bath methods applied to the N molecule at equilibrium geometry in a cc-pV5Z basis. The vast majority ofthe absolute weights spawned by the heat-bath method lie in the range 250-1000 Ha times τ , whereas those spawnedby the uniform method range over orders of magnitude.The reduction in the spread of the spawned weights makes the initiator approximation [2] more meaningful andallows one to use a larger time step τ . In the uniform method, a state can become an initiator simply becauseit happened to receive a large weight from a single determinant. The increase in τ speeds up the time needed toachieve equilibration, as well as increasing the efficiency of the method. Since the statistical uncertainty, σ E , inthe estimate of the ground state energy decreases as the inverse square root of the Monte Carlo run time T , theefficiency is defined to be Efficiency ∝ σ E T . (25)We note that unlike the diffusion Monte Carlo method where the equilibration time is a negligible part of the totaltime, in S-FCIQMC the equilibration can be a substantial part of the total.Table I shows these quantities for a sequence of three molecules. The efficiency gain increases with the numberof electrons from 3.8 for B , to 54 for F . For uniform sampling, when N or M is large, it is hard to get an accurateestimate of the statistical error because (1) the optimal τ is very small, making the autocorrelation time in units ofthe number of Monte Carlo steps very long, and (2) the population must be sufficiently large to enable cancellationsto occur. Thus, in Tables I and II, the equilibration speedup and increase in optimal time step are given to onlysingle-digit accuracy.Table II demonstrates that the efficiency gain increases with the size of the basis. For the N molecule, the gainincreases from 3.4 for the cc-pVDZ basis to 32 for the cc-pV5Z basis. For the cc-pV5Z basis, Fig. 2 shows how the0 molecule efficiency gain equilibration speedup τ opt increaseB 31 30 200F 54 50 200TABLE I. Efficiency gains for all-electron equilibrium calculations of the ground state energy in a cc-pVQZ basis for differentmolecules. The efficiency gain increases with the number of electrons.051015202530351e-07 1e-06 1e-05 0.0001 0.001 R e l a t i v ee ffi c i e n c y τ (Ha − )Efficiencies of uniform and heat-bath sampling in V5Z basisUniformHeat-bathFIG. 2. Comparison of relative efficiency vs. time step size for uniform and heat-bath methods on the all-electron nitrogendimer at equilibrium in the cc-pV5Z basis. Efficiency plotted relative to the greatest efficiency seen for uniform. Heat-bathsampling improves efficiency by a factor of 32, while the optimal time step is about a factor of 300 larger. relative efficiencies change with τ . The relative efficiencies are normalized such that the peak for uniform samplingis at one. Not only is the peak efficiency 32 times higher for heat-bath sampling, but the peak is broader, whichmakes it easier to choose a value of τ that is close to optimal. basis efficiency gain equilibration speedup τ opt increasecc-pVDZ 3.4 2 5cc-pVTZ 16 30 200cc-pVQZ 31 30 200cc-pV5Z 32 60 300TABLE II. Efficiency gains and the factor by which the optimal time step τ opt increases upon using heat-bath sampling insteadof uniform sampling, for N at the equilibrium geometry including core excitations. For a given sampling algorithm, τ opt isthe time step that maximizes the efficiency. The gain in efficiency of heat-bath sampling over uniform sampling increases withincreasing basis set size. In summary, the gain in efficiency from using heat-bath sampling rather than uniform sampling increases bothwith the number of electrons and the number of spin-orbitals. V. DISCUSSION The approximate heat-bath sampling algorithm introduced in this paper enables one to sample many-bodyquantum states connected to an initial state with probability approximately proportional to an arbitrary function1of the Hamiltonian matrix element connecting the two states. We chose the probabilities to be proportional tothe corresponding matrix element, which greatly improves the efficiency of S-FCIQMC. However, other choices arepossible; e.g., in the Monte Carlo Configuration Interaction method [6], it would be useful to choose the probabilitiesto be proportional to the lowering in energy estimated from second-order perturbation theory.Finally, we note that the gain in efficiency from using the heat-bath method will be even greater for excited statesif they are calculated using a projector that involves sampling the square of the Hamiltonian [9], since products ofweights fluctuate even more than the weights themselves. Acknowledgements: We thank Frank Petruzielo, Alessandro Roggero, Matt Otten, and George Booth for con-tributing to the development of our S-FCIQMC code. This work was supported by grants NSF CHE-1112097, DOEde-sc0006650, and NSF ACI-1534965. After moving to UIUC, H.J.C. was supported by the SciDAC program underDOE Award Number DE-FG02-12ER46875. Appendix A: Recap of algorithm when applied to S-FCIQMC Before the S-FCIQMC run, compute and store the quantities S p , D pq , ˜ P ( r | p, q ), ˜ P ( s | p, q, r ), and H tot rpq , asdescribed in Appendix C.When spawning off-diagonal moves from determinant | i i , first compute and store the probability distribution forselecting the first occupied spin-orbital p from this determinant,˜ P ( p ) = S p P p ′ S p ′ , (A1)and compute the A and Q arrays needed for the alias method (see Appendix D). This step takes O ( N ) time.Divide up the weight w i on | i i into n i = max (1 , ⌊| w i |⌉ ) walkers. Then, for each walker on determinant | i i , spawnone or two new determinants as follows:1. Using the alias method, choose the first occupied spin-orbital p from the stored distribution ˜ P ( p ). This steptakes O (1) time.2. Compute the probability distribution for choosing the second occupied spin-orbital q given that p was alreadyselected, ˜ P ( q | p ) = D pq P q ′ D pq ′ , (A2)and sample the second occupied spin-orbital q from ˜ P ( q | p ). This step takes O ( N ) time.3. Choose a spin-orbital r / ∈ { p, q } from the stored distribution ˜ P ( r | p, q ), such that p and r are of the samespin, using the alias method. If r is occupied in | i i , no moves are generated by this walker; go to next walker.Otherwise, r is the first unoccupied spin-orbital for this excitation. This step takes O (1) time.4. Decide whether to spawn a single excitation, a double excitation or one of each, as follows. Compute thesingle-excitation matrix element H ( r ← p ). If | H ( r ← p ) | is greater than the stored quantity H tot rpq , thenspawn both the single excitation ( r ← p ) and a double excitation; otherwise spawn only one of the two typesof excitations as follows. With probability | H ( r ← p ) | H tot rpq + | H ( r ← p ) | , (A3)choose the single excitation ( r ← p ); otherwise choose a double excitation. This step takes O ( N ) time, sincethat is the complexity of computing a single-excitation matrix element.25. If a double excitation is to be proposed, choose a fourth spin-orbital s / ∈ { p, q, r } from the stored distribution˜ P ( s | p, q, r ), using the alias method. If s is occupied in | i i , no double excitation is generated by this walker.Otherwise, s is the second unoccupied spin-orbital for this excitation, and the move ( rs ← pq ) is generated.This step takes O (1) time.6. After generating the excitation(s), compute the weight spawned on the new determinant(s) as follows. If asingle excitation ( r ← p ) was proposed to a new determinant | j i , the weight spawned on | j i is w ( t +1) j = − τ H ( r ← p ) P ( r ← p ) (cid:18) w ti n i (cid:19) , (A4)where P ( r ← p ) is given in Appendix B. If a double excitation ( rs ← pq ) was proposed to a new determinant | k i , the weight spawned on | k i is w ( t +1) k = − τ H ( rs ← pq ) P ( rs ← pq ) (cid:18) w ti n i (cid:19) , (A5)where P ( rs ← pq ) is given in Appendix B. Appendix B: Proposal probabilities As discussed in the text, when a new walker is spawned, the weight assigned to that walker is proportional tothe corresponding matrix element of the projector divided by the proposal probability for that transition. Thecomputation of these proposal probabilities is described next. The proposal probabilities both for single and doubleexcitations can be computed in O ( N ) time. 1. Single excitations The probability for a single excitation ( r ← p ) is P ( r ← p ) = X q ′ ∈ occ . ˜ P ( p ) ˜ P (cid:0) q ′ | p (cid:1) ˜ P (cid:0) r | p, q ′ (cid:1) ˜ P (single | p, q, r ) , (B1)where P (single | p, q, r ) = ( | H ( r ← p ) | H tot rpq + | H ( r ← p ) | , if | H ( r ← p ) | < H tot rpq ;1 , otherwise, (B2)is the probability of choosing the single excitation ( r ← p ) given that spin-orbitals { p, q, r } have already beenselected.An alternative probability is P ′ ( r ← p ) = ( N − 1) ˜ P ( p ) ˜ P ( q | p ) ˜ P ( r | p, q ), where q is the spin-orbital that wasselected and discarded during the heat-bath sampling routine. As described in Appendix F, this alternative methodhas an efficiency tradeoff associated with it: it increases efficiency by avoiding having to compute all terms in thesum, but it also decreases efficiency by increasing the fluctuations in spawned weights. This alternative probabilitywas not employed in this paper.3 2. Double excitations The probability for a double excitation ( rs ← pq ) is P ( rs ← pq ) = ˜ P ( p ) ˜ P ( q | p ) h ˜ P ( r | p, q ) ˜ P (double | p, q, r ) ˜ P ( s | p, q, r ) + ˜ P ( s | p, q ) ˜ P (double | p, q, s ) ˜ P ( r | p, q, s ) i + ˜ P ( q ) ˜ P ( p | q ) h ˜ P ( r | q, p ) ˜ P (double | q, p, r ) ˜ P ( s | q, p, r ) + ˜ P ( s | q, p ) ˜ P (double | q, p, s ) ˜ P ( r | q, p, s ) i , (B3)where ˜ P (double | p, q, r ) = ( H tot rpq H tot rpq + | H ( r ← p ) | , if | H ( r ← p ) | < H tot rpq ;1 , otherwise, (B4)is the probability of choosing a double excitation given that { p, q, r } have already been selected.The second and third terms in Eq. B3 are zero for opposite-spin excitations.As with single excitations, an alternative probability is P ′ ( rs ← pq ) = c ˜ P ( p ) ˜ P ( q | p ) ˜ P ( r | p, q ) ˜ P (double | p, q, r ) ˜ P ( s | p, q, r ) ˜ P ( p ) ˜ P ( q | p ) ˜ P ( r | p, q ) , (B5)where c is either 2 or 4 for opposite-spin or same-spin double excitations, respectively. Again, this alternativeprobability was not employed in this paper. Appendix C: Quantities that must be precomputed at start of run At the start of the run, compute and store the following tensors. It is assumed that up- and down-spin orbitalsare the same, but the number of up- and down-spin electrons need not be the same.1. Electron pair selection probabilities tensor: D pq = X r ′ s ′ (cid:12)(cid:12) H ( r ′ s ′ ← pq ) (cid:12)(cid:12) (C1)for all p = q . If single excitations were ignored, this would roughly correspond to the relative heat-bathprobability of selecting a pair of electrons to excite, i.e., ˜ P ( p, q ) ∝ D pq . This has size 2 M (2 M − / 2, sinceelectrons can be of either spin. The storage could be reduced to M ( M − / M for opposite-spin electrons.2. Single electron selection probabilities tensor: S p = X q ′ = p D pq ′ (C2)for all p . The electrons are chosen one at a time with probabilities ˜ P ( p ) and ˜ P ( q | p ) computed using S p and D pq , respectively. S has size M .3. First hole selection probabilities tensor:˜ P ( r | p, q ) = P s ′ | H ( rs ′ ← pq ) | P r ′ s ′ | H ( r ′ s ′ ← pq ) | (C3)4for all p = q = r . This is the probability of choosing the first empty spin-orbital r , given that electrons inspin-orbitals p and q have already been selected for excitations. This can be stored as two separate tensors,˜ P same ( r | p, q ) and ˜ P opposite ( r | p, q ), for same-spin and opposite-spin double excitations, respectively. The same-spin excitation tensor only has to be of size M , while the opposite-spin excitation tensor can be of size M .In that case, ˜ P opposite ( r | p, q ) represents the probability of choosing r given p, q with r and p having same spin.4. Double excitations probabilities tensor:˜ P ( s | p, q, r ) = | H ( rs ← pq ) | P s ′ | H ( rs ′ ← pq ) | (C4)for all p = q = r = s . When sampling a double excitation, this is the probability of choosing the secondempty spin-orbital s given that electrons in spin-orbitals p and q and empty spin-orbital r have already beenchosen. Also, store the denominators H tot rpq = P s ′ | H ( rs ′ ← pq ) | , the summed magnitudes of double excitationmatrix elements. This will be needed for comparing with single excitation matrix elements. The tensor canbe stored as two separate tensors, ˜ P same ( s | p, q, r ) and ˜ P opposite ( s | p, q, r ), of sizes M and M , respectively.For each of the two probability tensors ˜ P ( r | p, q ) and ˜ P ( s | p, q, r ), we store not only the probabilities, but thecorresponding A and Q tensors for sampling them in O (1) time using the alias method (see Appendix D). Thus, thetotal storage space for ˜ P ( s | p, q, r ) and its corresponding A and Q tensors is M integers and 3 M single-precisionreal numbers. In comparison, the integrals files that are already being stored are O (cid:0) M (cid:1) double-precision realnumbers, so the storage requirement is 18 times larger than the storage requirement for the integrals alone. 1. Relaxing the storage requirements While the above storage requirements have the same O (cid:0) M (cid:1) scaling as the two-body integrals that are alreadystored, the fact that ˜ P ( s | p, q, r ) requires 18 times the storage of the integrals can become prohibitive for morethan about 250 orbitals. One possibility is to make a Cauchy-Schwarz approximation for estimating the 2-bodyintegrals [10]. Instead, here we discuss a method that avoids storing ˜ P ( s | p, q, r ) for all O (cid:0) M (cid:1) combinations { p, q, r, s } without making the Cauchy-Schwarz approximation.First, note that ˜ P ( s | p, q, r ) is only sampled when { p, q, r } have already been selected. Some { p, q, r } combinationsoccur a lot more often than others, so ˜ P ( s | p, q, r ) needs to be stored only for the most frequently-occurring triplets { p, q, r } . For the other, less frequently-occurring triplets { p, q, r } , one could either choose spin-orbital s uniformly,or compute the correct heat-bath conditional probability ˜ P ( s | p, q, r ) on the fly in O ( M ) time.As can be seen in Fig. 3, the fraction of { p, q, r } triplets that account for the vast majority of calls to sample˜ P ( s | p, q, r ) decreases with basis set size. In order to account for 99% of the number of calls to sample ˜ P ( s | p, q, r ),for cc-pVDZ, one must store ˜ P ( s | p, q, r ) for about 50% of the { p, q, r } triplets, while for cc-pV5Z, it is only requiredto store ˜ P ( s | p, q, r ) for about 20% of the triplets.The largest basis used in this paper is cc-pV5Z (182 orbitals), but with this modification, calculations with acc-pV6Z basis (280 orbitals) could easily be done. Appendix D: Alias method for sampling from discrete distributions Discrete distributions consisting of M probabilities can be sampled straightforwardly by constructing an arrayof cumulative probabilities, drawing a random number, and doing a binary search to find the interval in which therandom number falls. This requires O ( M ) time to set up the cumulative probabilities and O (log M ) time to sample.However, when the same distribution is being sampled repeatedly, the alias method [8] can be used to sample itin even more quickly, in O (1) time. The method employs a real array, Q , and an integer array A , each of length M . Sampling with the alias method is done as follows. First, orbital i is selected uniformly from the set of all M P e r ce n t o f ˜ P ( s | p , q , r ) s t o r e d Percent of calls to sample ˜ P ( s | p, q, r ) accounted forRelaxation of ˜ P ( s | p, q, r ) storage requirementVDZVTZVQZV5ZFIG. 3. This plot shows what percentage of ˜ P ( s | p, q, r ) must be stored, in order for the stored part of ˜ P ( s | p, q, r ) to account fora given percentage of the total calls to sample it, for an all-electron nitrogen dimer in cc-pV X Z basis sets for X ∈ { D, T, Q, } .It was obtained by counting the number of times that each triplet { p, q, r } was selected (followed by sampling ˜ P ( s | p, q, r ))during a run with 10 walkers. possible orbitals. Then, with probability Q i , orbital i is sampled; else, orbital A i is sampled. The cost of doing thisis just the cost of drawing two uniform random numbers.The set of probabilities { Q i } and aliases { A i } can be generated in O ( M ) time at the beginning of the run [11]as shown in the pseudocode in Fig. 4. Appendix E: Time-reversal symmetry The size of the Hilbert space can be reduced by almost a factor of two when the number of up and down electronsare equal by taking advantage of time-reversal symmetry. This increases the effectiveness of cancellations, and allowsone to use larger deterministic spaces and trial wavefunctions in S-FCIQMC. It also enables one to calculate anexcited state as easily as the ground state provided that it is the lowest state of a different symmetry under timereversal than the ground state; e.g., if the ground state is a singlet, it enables calculating the lowest triplet state. Allcalculations in this paper make use of time-reversal symmetry and here we discuss details of how it is implementedin S-FCIQMC.Let ˆ T denote the time-reversal operator, so | i ′ i ≡ ˆ T | i i is the state obtained by flipping all the spins of theelectrons in state | i i . Since ˆ T | i i = | i i , the eigenvalues of the time-reversal operator, z , must be ± 1. Even S stateshave z = 1, and odd S states have z = − 1, where S is the total spin of the system. So, the wavefunctions can beexpanded in a symmetrized basis, | ˜ i i = 1 √ N i (cid:0) | i i + z | i ′ i (cid:1) (E1)where, N i = ( , if h i | i ′ i = 0 , √ , otherwise , (E2)6 FIG. 4. Alias method setup ! Inputs: M = number of discrete states to sample! {P(i)} = discrete set of probabilities of sampling state i!! Outputs: {A(i)} = aliases of states! {Q(i)} = probabilities of returning state i rather than its alias A(i)! Scale probabilities by M and place in two lists, those smaller and those bigger than 1.n_s = 0 ; n_b = 0do i=1,MA(i) = i ! Not an arbitrary initialization (*)Q(i) = M*P(i)if (Q(i)<1) thenn_s = n_s + 1smaller(n_s) = ielsen_b = n_b + 1bigger(n_b) = iendifenddo! For each orbital construct probability of staying on the orbital and the alias of the orbital.do while (n_s > 0 .and. n_b > 0)s = smaller(n_s)b = bigger(n_b)A(s) = bQ(b) = Q(b) + Q(s) - 1if (Q(b) < 1) thensmaller(n_s) = bn_b = n_b - 1elsen_s = n_s - 1endifenddo! (*) Initialize A(i) = i, so that if the random number exceeds a floating point approximation! to Q(i) = 1, i is returned (as it should be) rather than an arbitrary initialized state. is a normalization factor. Note that a state which is its own time-reversed partner ( | i i = | i ′ i ) can only have non-zerocoefficients in a wavefunction that has z = 1.In each symmetrized linear combination, | ˜ i i , one of the states is chosen to be the “representative”, | i i rep . The rep-resentative has a positive coefficient in the linear combination, and non-representative state has the same coefficientmultiplied by z . In Eq. (E1) | i i rep = | i i . The representative is chosen by converting the computer representation ofstates (a binary string) into a number and then defining, | i i rep ≡ min (cid:0) | i i , | i ′ i (cid:1) (E3)Thus the representative is the state that comes first according to an arbitrary (but consistent) convention forordering.Recall that when time-reversal symmetry is not used, a walker that makes an off-diagonal move from state | i i to state | j i is assigned weight, w j = − τH ji P ji (cid:16) w i n i (cid:17) , where w i and n i are the weight and number of walkers on | i i ,respectively. This viewpoint is adopted even with the inclusion of time-reversal symmetry, albeit with modifications7to the Hamiltonian matrix element and the proposal probability, to give, w ˜ j = − τ ˜ H ˜ j ˜ i P ˜ j ˜ i (cid:18) w ˜ i n ˜ i (cid:19) (E4)where ˜ H ˜ j ˜ i is the Hamiltonian matrix element between the two symmetrized pairs of states, ˜ i and ˜ j , and P ˜ j ˜ i is thetotal probability of making a transition between them.Since the time-reversal operator and the Hamiltonian commute ([ ˆ T , ˆ H ] = 0), it follows that H ji ′ = H j ′ i , andsince ˆ T = 1, H j ′ i ′ = H ji . Then, using expressions for | ˜ i i and | ˜ j i from Eq. (E1) we get,˜ H ˜ j ˜ i = H ji + zH j ′ i N i N j (E5)To evaluate P ˜ j ˜ i , treat | i i rep just as the usual (unsymmetrized) state (only consider the usual Hamiltonian con-nections from | i i rep , not its symmetry related pair state). With probability P j,i rep , state | i i rep spawns to state | j i ,which may or may not be its own representative. Since non-representative states are not included in the list ofoccupied states, if | j ′ i 6 = | j i , we must sum over both possibilities to get P ˜ j ˜ i = P j rep ,i rep = P j,i rep + P j ′ ,i rep . (E6) Appendix F: Choice of weighting when multiple events lead to a single state Let P be the matrix element of the projector for a transition between two particular states. Suppose there are N events with probabilities p i , · · · , p N that result in this transition. (In general, P Ni p i < 1, since there are otherstates that can be accessed also.) The probability of getting that state is P Ni p i , so the weight multiplier for thismove is P P Ni p i regardless of which event led to this state. This prescription is correct since N X i p i P P Ni p i = P. (F1)( P is independent of i because all the possibilities, i , correspond to the same state.)There is an alternative approach which avoids the expense of computing the probabilities for the N − i , the weight multiplier is PNp i rather than P P Ni p i , so it depends on which of the N events was selected. This prescription is also correct since N X i p i PN p i = P (F2)There is a loss of efficiency, that can be large if the p i differ greatly from each other. So, there is a trade-off betweenavoiding the expense of computing the N − [1] G. H. Booth, A. J. W. Thom, and A. Alavi, J. Chem. Phys. , 054106 (2009).[2] D. Cleland, G. H. Booth, and A. Alavi, J. Chem. Phys. , 041103 (2010).[3] J. J. Shepherd, A. Grueneis, G. H. Booth, G. Kresse, and A. Alavi, Phys. Rev. B , 035111 (2012).[4] G. H. Booth, S. D. Smart, and A. Alavi, Mol. Phys. , 1855 (2014). [5] F. R. Petruzielo, A. A. Holmes, H. J. Changlani, M. P. Nightingale, and C. J. Umrigar, Phys. Rev. Lett. , 230201(2012).[6] J. Greer, J. Comp. Phys. , 181 (1998).[7] To switch to chemist notation, interchange the middle two indices.[8] A. J. Walker, ACM Trans. Math. Software , 253 (1977).[9] G. H. Booth and G. K.-L. Chan, J. Chem. Phys. , 191102 (2012).[10] S. D. Smart, G. H. Booth, and A. Alavi, Unpublished.[11] R. A. Kronmal and A. V. Peterson Jr, Amer. Statist.33