[PDF] Population Control Bias and Importance Sampling in Full Configuration Interaction Quantum Monte Carlo

Abstract

Population control is an essential component of any projector Monte Carlo algorithm. This control mechanism usually introduces a bias in the sampled quantities that is inversely proportional to the population size. In this paper, we investigate the population control bias in the full configuration interaction quantum Monte Carlo method. We identify the precise origin of this bias and quantify it in general. We show that it has different effects on different estimators and that the shift estimator is particularly susceptible. We derive a re-weighting technique, similar to the one used in diffusion Monte Carlo, for correcting this bias and apply it to the shift estimator. We also show that by using importance sampling, the bias can be reduced substantially. We demonstrate the necessity and the effectiveness of applying these techniques for sign-problem-free systems where this bias is especially notable. Specifically, we show results for large one-dimensional Hubbard models and the two-dimensional Heisenberg model, where corrected FCIQMC results are comparable to the other high-accuracy results.

Full PDF

PPopulation Control Bias and Importance Sampling in Full Conﬁguration InteractionQuantum Monte Carlo

Khaldoon Ghanem, Niklas Liebermann, and Ali Alavi

1, 2 Max Planck Institute for Solid State Research, Heisenbergstr. 1, 70569 Stuttgart, Germany ∗ Dept of Chemistry, University of Cambridge, Lensﬁeld Road, Cambridge CB2 1EW, United Kingdom † (Dated: February 23, 2021)Population control is an essential component of any projector Monte Carlo algorithm. This controlmechanism usually introduces a bias in the sampled quantities that is inversely proportional to thepopulation size. In this paper, we investigate the population control bias in the full conﬁgurationinteraction quantum Monte Carlo method. We identify the precise origin of this bias and quantify itin general. We show that it has diﬀerent eﬀects on diﬀerent estimators and that the shift estimatoris particularly susceptible. We derive a re-weighting technique, similar to the one used in diﬀusionMonte Carlo, for correcting this bias and apply it to the shift estimator. We also show that byusing importance sampling, the bias can be reduced substantially. We demonstrate the necessityand the eﬀectiveness of applying these techniques for sign-problem-free systems where this bias isespecially notable. Speciﬁcally, we show results for large one-dimensional Hubbard models and thetwo-dimensional Heisenberg model, where corrected FCIQMC results are comparable to the otherhigh-accuracy results. I. INTRODUCTION

Projector Monte Carlo algorithms, including Green’sfunction Monte Carlo (GFMC) [1], diﬀusion Monte Carlo(DMC) [2], and full conﬁguration interaction quantumMonte Carlo (FCIQMC) [3], have become indispensabletools in extracting the ground state properties of vari-ous quantum systems. These methods employ a stochas-tic version of the power method; a method that startsfrom an initial wavefunction with a non-zero overlap withthe ground state and ﬁlters out higher excited statesby a repeated application of a suitable imaginary-timepropagator of the Schr¨odinger equation. As the systemsize grows, its Hilbert space grows exponentially, and astochastic representation of the wavefunction becomes anecessity. The wavefunction is then sparsely representedby a set of walkers that are evolved according to the prop-agator. Their expectation value on any conﬁguration re-ﬂects the wavefunction value on that conﬁguration.The main challenge faced by these algorithms, com-pared to traditional stochastic methods, is that theirpropagators do not form column-stochastic matrices ingeneral [4]. On the one hand, their elements can be eitherpositive or negative. On the other hand, the sum of theabsolute values for each column is not normalized. Theﬁrst issue implies that walkers should be signed and of-ten leads to a sign problem. The latter issue can be dealtwith using a branching process where walkers need to beadded or removed dynamically to reﬂect the changes inthe L -norm of the wavefunction. However, the branch-ing creates a practical problem because there is a chancethat all walkers will die or their storage exceeds the avail-able computer memory. Moreover, the computational ef- ∗ [email protected] † [email protected] fort per sample and the statistical accuracy are directlyproportional to the total number of walkers, so minimiz-ing the ﬂuctuations in the number of walkers makes run-ning times more predictable and markedly enhances theutilization of available computational resources. Conse-quently, diﬀerent projector Monte Carlo methods applysome population control mechanism that keeps the num-ber of walkers around a set target throughout the simu-lation. It has long been known that this population con-trol introduces a bias in DMC calculations [5] and otherGreen’s function methods [6] and that the bias scales in-versely with the number of walkers [4]. Diﬀerent remedieshave been proposed to overcome this bias, including anextrapolation in the number of walkers, a modiﬁcationof the wavefunction renormalization procedure, and de-riving new unbiased estimators by reweighting with thediscarded scale factors [7].The population control bias has often been neglectedin FCIQMC because it is usually much smaller than thestatistical error bars or other biases [8]. Indeed, for anysystem with a fermionic sign problem, the original fullFCIQMC algorithm needs a minimum number of walk-ers, below which the sign-incoherent noise dominates thesimulation and the dynamics of the walkers become un-stable. This number of walkers is typically large enoughthat the population bias does not exceed the statisticalerror bars. Moreover, in practice, the initiator approx-imation is often used instead of the original FCIQMCalgorithm. This approximation constrains the dynamicsof the walkers, which prevents the sign-incoherent noisefrom propagating and destabilizing the simulation [9].The constraint allows stable simulations using any num-ber of walkers but introduces a bias that also scales downwith the number of walkers. The resulting initiator bias ismuch larger than the population control bias, and reduc-ing it requires increasing the population. Therefore, thepopulation bias has been masked in these applications bythe initiator bias and did not pose a practical concern. In a r X i v : . [ phy s i c s . c o m p - ph ] F e b contrast to these systems, there are some sign-problem-free ones, where no minimum number of walkers is re-quired for getting stable FCIQMC simulation, and it is amatter of obtaining more samples to reduce the statisti-cal error bars. For such systems, the population controlbias is the only systematic bias, making its study mostconvenient in these cases. Consequently, in this paper,we study the population control bias in FCIQMC, par-ticularly in its application to sign-problem-free systems,and in conjunction with importance sampling.The paper is structured as follows: We ﬁrst reviewthe FCIQMC method, its sign problem, energy estima-tors, and its population control schemes. We show thatpopulation control leads to a biasing source term in themaster equation, which equals the covariance between theshift parameter and the wavefunction. We also show thatthis covariance term scales inversely with the number ofwalkers and use it to relate the biases of diﬀerent energyestimators. Ref. [8] borrowed a reweighting procedure de-veloped originally for DMC and showed numerically thatit eﬀectively corrects the projected energy estimates. Weprovide a derivation of this reweighting procedure withinFCIQMC and adapt it for correcting the shift estimator.We then examine the eﬀect of the importance samplingon reducing the bias. Finally, we demonstrate the eﬀec-tiveness of the correction technique and the importancesampling by applying them to the one-dimensional Hub-bard model and the two-dimensional Heisenberg modelwith large lattice sizes. II. BACKGROUNDA. FCIQMC

FCIQMC projects the ground state of a Hamiltonianˆ H from an initial state | ψ (cid:105) by evolving it according tothe imaginary-time Schr¨odinger equation − dd τ | ψ ( τ ) (cid:105) = (cid:2) ˆ H − S (cid:3) | ψ ( τ ) (cid:105) (1)where S is some scalar shift of the diagonal elements. Theformal solution of this equation can be written in terms ofthe eigenstates of the Hamiltonian | φ n (cid:105) and their energies E n as following | ψ ( τ ) (cid:105) = e − ( ˆ H − S ) τ | ψ (cid:105) = (cid:88) n (cid:104) φ n | ψ (cid:105) e − ( E n − S ) τ | φ n (cid:105) . (2)If the shift S is lower than the ﬁrst excited state energy,then contributions from all excited states decay exponen-tially leaving only the ground state wavefunction in thelong time limitlim τ →∞ | ψ ( τ ) (cid:105) = (cid:104) φ | ψ (cid:105) e − ( E − S ) τ | φ (cid:105) . (3)The wavefunction in FCIQMC is represented by a setof walkers that occupy the Hilbert space spanned by Slater determinants, which are constructed from a ﬁnitebasis of single-particle orbitals. The walker populationis updated stochastically at each time step such its av-erage change respects the time-discretized Schr¨odingerequation i.e. − ∆ N i ( τ )∆ τ = (cid:2) H ii − S (cid:3) N i ( τ ) + (cid:88) i (cid:54) = j H ij N j ( τ ) , (4)where ∆ τ is the discretized time-step, N i := (cid:104) D i | ψ (cid:105) isthe number of walkers on determinant | D i (cid:105) , and H ij := (cid:104) D i | ˆ H | D j (cid:105) are the Hamiltonian matrix elements in thisdeterminant basis. In the long time limit, the groundstate wavefunction can then be estimated by averagingthe walkers over many time steps N i ≈ (cid:104) D i | φ (cid:105) . (5)The master equation (4) is implemented using threesteps of walker dynamics [3]: • Spawning:

Each walker on determinant | D i (cid:105) ran-domly spawns another walker on a connected deter-minant | D j (cid:105) with probability P i → j . The spawnedwalker is then accepted with probability P spawn = ∆ τ | H ij | P i → j , (6)and it is given a sign opposite to the sign of H ij N i .When P spawn >

1, one spawn is created stochas-tically with a probability that equals the fractionpart, while the integral part determines the numberof additional deterministically-created spawns. • Death/Clone:

Each walker dies or gets cloned witha probability that equals the absolute value of P death/clone = ∆ τ (cid:0) H ii − S (cid:1) . (7)The walker is killed when this value is positive andduplicated when it is negative. • Annihilation:

Walkers of opposite sign residing onthe same determinant are cancelled and removedfrom the simulation. As discussed next, this step iscrucial to ensure convergence to the ground state.Without it, the solution would quickly get con-taminated by noise that masks the physical groundstate.

B. Sign Problem

In the absence of annihilation, the in-phase combina-tion of the positive and negative walkers corresponds toa non-physical solution that grows faster than the de-sired out-of-phase solution [10]. The annihilation reducesthe growth rate of the in-phase solution, and the reduc-tion is proportional to the total number of walkers. Asa result, a minimum number of walkers, known as theannihilation plateau, is needed in order for the physicalsolution to outgrow the in-phase one and for the correctsign structure to emerge. This minimum number is typ-ically less than the full size of the Hilbert space but canbe a substantial fraction of it.The initiator approximation overcomes this issue bymodifying the dynamics of the walkers [9]. Only walk-ers residing on determinants with a population above acertain threshold are allowed to spawn onto empty de-terminants. This prevents lowly-populated determinants(the non-initiators) with ﬂuctuating signs from propagat-ing the sign-incoherent signal further. This constraineddynamics allows FCIQMC to converge at an arbitrarilysmall number of walkers. The cost is introducing a biasthat can be systematically improved by increasing thenumber of walkers. The initiator bias can also be sig-niﬁcantly reduced using the recently-proposed adaptiveshift method [11, 12]. In this method, the life time of anon-initiator determinant is boosted by reducing its shiftto account for the missing back-spawns from its under-populated surrounding.The annihilation plateau is the manifestation of thesign problem in FCIQMC. There are a few model sys-tems, however, which are free from this sign problem andcan converge at any number of walkers without the ini-tiator constraint. For these systems, all walkers spawnedonto a speciﬁc determinant have the same sign. One ob-vious way this can happen is when all the non-diagonalelements of the Hamiltonian have a negative sign (so-called stoquasticity); thus, all walkers are of the samesign. More generally, a Hamiltonian is sign-problem-freewhen we can classify the determinants into two disjointsets such that the matrix elements between members ofthe same set are negative (or zero), while the matrix el-ements between members of diﬀerent sets are positive(or zero). In this case, the walkers of the two sets haveopposite signs, but they never mix throughout the sim-ulation. This general case can always be recast as theearlier one-sign case by using a diﬀerent sign conven-tion for the determinants in one of the two sets, whichwould make all the non-diagonal matrix elements neg-ative. Examples of sign-problem-free systems includethe one-dimensional Hubbard model with open boundaryconditions, the one-dimensional Hubbard model with pe-riodic boundary conditions and speciﬁc combinations ofup- and down-electrons, and the two-dimensional Heisen-berg model on a bipartite lattice. All of these systemsare introduced and discussed in Section VI.

C. Projected Energy

Two kinds of energy estimators are used in FCIQMC.One is the energy shift discussed in the next section. An-other is the so-called projected energy obtained by pro- jecting on some trial wavefunction | ψ T (cid:105) : E T := (cid:104) ψ T | ˆ H | ψ (cid:105)(cid:104) ψ T | ψ (cid:105) . (8)As the average wavefunction converges to the true groundstate, its projected energy converges to true ground-stateenergy. To compute this estimate, one does not needto know the average wavefunction explicitly. Instead,the numerator and denominator are computed out ofthe instantaneous wavefunctions during the simulationand then averaged separately. The ratio of the averagedquantities gives an estimate of the projected energy ofthe average wavefunction: E T ≈ (cid:80) Li =1 (cid:104) ψ T | ˆ H | ψ ( τ i ) (cid:105) (cid:80) Li =1 (cid:104) ψ T | ψ ( τ i ) (cid:105) , (9)where L is the number of FCIQMC samples | ψ ( τ i ) (cid:105) takenafter an appropriate thermalization period.For single-reference molecular systems, the trial wave-function is typically chosen to be the reference determi-nant with the highest population (usually the Hartree–Fock determinant). For multi-reference molecular sys-tems, the simulation is run for some period before a trialspace is constructed out of the most populated determi-nants. The ground state in the trial space is then calcu-lated and used as a trial wavefunction for a more reliableprojected energy estimator. However, for some modelsystems, like the ones considered in this paper, the wave-function spreads across the whole Hilbert space such thatthe overlap with a restricted subspace is very small, andthe energy estimates mentioned above are very noisy. Forthese systems, we introduce a particular projected energyestimator, which is useful when they are free of the signproblem. Let us denote by |± (cid:105) , the uniform trial wave-function that spans the whole Hilbert space equally andhas the same sign structure as the ground state wave-function (cid:104) D i |± (cid:105) := S i = (cid:40) +1 when (cid:104) D i | φ (cid:105) > − (cid:104) D i | φ (cid:105) < , (10)the corresponding projected energy estimator then reads E ± := (cid:104)± | H | ψ (cid:105)(cid:104)± | ψ (cid:105) = (cid:80) i | N i | (cid:80) j S i S j H ij (cid:80) i | N i | . (11)We call this the uniform projected energy. Calculatingthis estimator requires knowing the sign structure of theground state a priori, which is feasible in sign-problem-free cases. Assuming in these cases a sign conventionthat renders all walkers positive, the uniform projectedenergy takes the following simple form E ± = (cid:80) i N i (cid:80) j H ij N , (12)where N is the total number of walkers. This estimatorwill be relevant later when deriving a correction of thepopulation control bias. D. Population Control

From Eq. (3), it is clear that unless the shift S equalsthe ground state energy E , the overall amplitude of thewavefunction, and thus the total number of walkers N ,would decay/grow exponentially. To maintain a stablenumber of walkers in the long-run, the shift is updatedperiodically as following: S ( τ ) = S ( τ − A ∆ τ ) − γA ∆ τ ln N ( τ ) N ( τ − A ∆ τ ) , (13)where γ is a damping factor, and A is the number of timesteps between successive updates of the shift. The intu-ition behind this formula is simple. When the populationincreases, the shift is lowered, which decreases the popu-lation in subsequent steps and vice versa. The logarithmis used because the change in the number of walkers isexponentially proportional to the energy/shift diﬀerence.This population control scheme is the most commonlyused one in FCIQMC. To get the number of walkersaround a set target, the shift is held constant at the be-ginning of the simulation, during which the number ofwalkers increases exponentially. Then once it reaches itsdesired value, it is stabilized using Eq. (13). After anequilibration period, the average number of walkers be-comes a constant, and thus the average value of the shiftcan be used as an independent energy estimator.Ref. [13] has recently proposed an improved schemethat avoids potential overshoots in the number of walk-ers. The new scheme actively targets the desired numberof walkers in its update formula, evading the need forthe initial constant-shift mode. Other population con-trol schemes can also be devised where the amplitude ofthe wavefunction is controlled using its overlap with atrial wavefunction (cid:104) ψ T | ψ (cid:105) instead of the total number ofwalkers N . This can be achieved either using a formulasimilar to Eq. (13) or by setting the shift equal to the cor-responding projected energy estimator E T at each timestep. The latter keeps the average value of the overlap (cid:104) ψ T | ψ (cid:105) constant because − dd τ (cid:104) ψ T | ψ ( τ ) (cid:105) = (cid:2) E T ( τ ) − S ( τ ) (cid:3) (cid:104) ψ T | ψ ( τ ) (cid:105) . (14)Such intermediate normalization schemes have the ad-vantage of reducing the statistical noise in the corre-sponding projected energy estimates. III. POPULATION CONTROL BIAS

Population control introduces a bias into the stochas-tic solution of FCIQMC because of the dependence of theshift on the wavefunction itself and the resulting feedbackinto the solution. In the following, we will spell out thisdependence explicitly and gain insights into the associ-ated bias.At each time step of FCIQMC, the wavefunction isupdated stochastically such that the average change in the wavefunction obeys the Schr¨odinger equation (1) withshift S ( τ ) − dd τ | ψ ( τ ) (cid:105) = (cid:2) ˆ H − S ( τ ) (cid:3) | ψ ( τ ) (cid:105) . (15)The master equation of the average wavefunction it-self, however, is obtained by taking the average of bothsides [14] − dd τ | ψ ( τ ) (cid:105) = ˆ H | ψ ( τ ) (cid:105) − S ( τ ) | ψ ( τ ) (cid:105) . (16)Controlling the population by updating the shift neces-sarily implies that the shift and the wavefunction are cor-related, and thus the average product S ( τ ) | ψ ( τ ) (cid:105) cannotbe replaced by the product of the averages, but there isa non-vanishing covariance term S ( τ ) | ψ ( τ ) (cid:105) = S ( τ ) | ψ ( τ ) (cid:105) + Cov[ S ( τ ) , ψ ( τ )] . (17)As a result, the average wavefunction does not satisfythe Schr¨odinger equation exactly, but it instead evolvesaccording to the following equation, which has an addi-tional source term − dd τ | ψ ( τ ) (cid:105) = (cid:2) ˆ H − S ( τ ) (cid:3) | ψ ( τ ) (cid:105)− Cov[ S ( τ ) , ψ ( τ )] . (18)This additional term is the origin of the population con-trol bias in FCIQMC.We can go one step further and express the above equa-tion in terms of the average walker populations − dd τ N i = (cid:20) H ii + B i N − S (cid:21) N i + (cid:88) j (cid:54) = i H ij N i , (19)where we deﬁned the relative bias of population N i as B i := − Cov ( N i , S ) NN i . (20)From this, one can view the population bias as a modiﬁ-cation of the diagonal elements of the Hamiltonian H (cid:48) ii = H ii + B i N , (21)and so the biased wavefunction as the ground state ofthis modiﬁed Hamiltonian rather than the desired origi-nal Hamiltonian. Note that in anticipation of the discus-sion next subsection, we deﬁned B i in a way that explic-itly highlights the dependence of the bias on the averagenumber of walkers N .As an illustrative example, we show in Fig. 1 the en-ergy estimates for the one-dimensional 14-site Hubbardmodel with U/t = 4 and periodic boundary conditions.The model is small enough that we can calculate its ex-act ground state energy E with exact diagonalization.A quick recap of the Hubbard model is presented in Sec-tion VI A. We performed FCIQMC calculations target-ing an increasing number of walkers from N = 100 to .

000 0 .

001 0 .

002 0 .

003 0 .

004 0 .

005 0 . /N − . − . − . − . E n e r g y SE ± E ∞ N Figure 1. FCIQMC results for the 14-site one-dimensionalHubbard model for a diﬀerent number of walkers. The pro-jected energy E ± is computed using a uniform trial wavefunc-tion as deﬁned by Eq. (10). The set target numbers of walkersare 100 , , ..., N are always larger due to the overshoot in theconstant-shift mode. N = 3200, and plotted both the shift estimator S and theuniform projected energy estimator E ± . The results areobtained with time step ∆ τ = 10 − using 16 × itera-tions with an additional 10 iterations of thermalization.The shift is updated every 10 iterations with damping pa-rameter γ = 0 .

1. The error bars are calculated using theblocking method of Ref. [15]. From this plot, we see thatboth the projected energy and the shift estimators havea bias that scales down the average number of walkers as1 /N . We also note that the shift estimator has a notice-ably higher bias than the projected energy at any speciﬁcnumber of walkers. The scaling behavior is discussed inthe next subsection, while the discrepancy between thediﬀerent estimators is discussed subsequently. A. Scaling with the Number of Walkers

To show that the population control bias scales as 1 /N ,we need to show that the relative bias B i is independentof the average number of walkers. Our starting point isthe shift update formula (13) relating the shift to thetotal number of walkers. We will assume that the shiftis updated continuously to simplify the analysis. Never-theless, the conclusion regarding scaling still holds in thecase of a periodic update and is discussed in Appendix A.By rearranging the terms in the update formula, onecan see that the quantity S ( τ ) + γA ∆ τ log N ( τ ) (22)stays constant for all times τ . Therefore, its covariancewith any other statistical quantity vanishes, which allowsus to express the covariance between the shift and the population of determinant | D i (cid:105) as follows:Cov ( N i , S ) = − γA ∆ τ Cov ( N i , log N ) ≈ − γA ∆ τ Cov (cid:18) N i , log N + N − NN (cid:19) = − γA ∆ τ Cov ( N i , N ) N , (23)where in the second line, we expanded the logarithm of N around its mean value N . Substituting back in therelative bias deﬁnition (20), we get B i ≈ γA ∆ τ Cov ( N i , N ) N i . (24)When there is no sign problem or the number of walkersis above the annihilation plateau, the mean populationsof all determinants scale linearly with the total number ofwalkers. Moreover, since the spawning and death/cloningevents in FCIQMC are scaled Bernoulli random variables,their variances are proportional to their mean values.Therefore, the covariances between the populations ofdiﬀerent determinants also scale linearly with the num-ber of walkers. As a result, the numerator and denomi-nator of the above ratio have the same scaling, such thatthe relative bias B i is mostly independent of the totalnumber of walkers. B. Diﬀerent Biases for Diﬀerent Estimators

In Fig. 1, we saw that the shift estimator has a higherbias than the projected energy estimator at any speciﬁcnumber of walkers. We observed this systematic devia-tion of the two estimators for diﬀerent systems and wasalso reported in Ref. [8]. To understand it, let us considerEq. (19) in the ideal case scenario where all determinantshave the same relative bias B i = B . In that special case,the wavefunction itself and all its associated energy esti-mators would not be biased, but the average shift wouldstill have the following positive bias: S − E = BN . (25)This suggests that the shift, as an energy estimator, isparticularly aﬀected by the population bias.We can generally quantify how much the average shiftis biased compared to any other projected energy esti-mator. By projecting the biased master equation (18)on the corresponding trial wavefunction, we get the timeevolution of its overlap with the solution − dd τ (cid:104) ψ T | ψ ( τ ) (cid:105) = (cid:104) ψ T | (cid:0) ˆ H − S ( τ ) (cid:1) | ψ ( τ ) (cid:105)− Cov (cid:2) S ( τ ) , (cid:104) ψ T | ψ ( τ ) (cid:105) (cid:3) (26) .

000 0 .

001 0 .

002 0 .

003 0 .

004 0 .

005 0 . /N . . . . . . . E n e r g y S − E ± − Cov (

S, N ) /N ∞ N Figure 2. A comparison of the two sides of Eq. (28) for the 14-site one-dimensional Hubbard model of Fig. 1. This relationallows numerically estimating the uniform projected energy E ± from the average shift S using the average number ofwalkers N and its covariance with shift Cov( S, N ). At equilibrium, the left-hand side vanishes and the equa-tion gives a direct relation between the average shift andthe projected energy estimate S − E T = − Cov (cid:2) S, (cid:104) ψ T | ψ (cid:105) (cid:3) (cid:104) ψ T | ψ (cid:105) . (27)This diﬀerence in the bias of the two estimators can beeasily estimated in FCIQMC using samples of the over-lap (cid:104) ψ T | ψ (cid:105) . For example, for sign-problem-free cases, thediﬀerence between the average shift and the uniform pro-jected energy estimate can be expressed in terms of theaverage number of walkers and its covariance with theshift S − E ± = − Cov (

S, N ) N . (28)In Fig. 2, we show that this equation is indeed satisﬁedby the earlier results of the Hubbard model. This relationprovides a simple way of partially correcting the bias ofthe average shift in cases where calculations of projectedenergies are not practically feasible. To remove the biascompletely, we need to correct the wavefunction itself,which we address in the next section. It is worth notingthat Fig. 2 demonstrates that the covariance Cov(

S, N )is mostly independent of N , which is consistent with thesame behavior of Cov( S, N i ) concluded from the previoussection. The dependence of the covariance terms on othersimulation parameters is discussed in Appendix B. IV. BIAS CORRECTION

The intuitive idea behind eliminating population con-trol bias is noting that the role of updating the shift ismerely scaling the wavefunction. The problem is that the scale factors are correlated with the sampled wavefunc-tions. Nevertheless, if we keep track of these factors anduse them to scale the wavefunctions back, we can undothe eﬀects of the shift updates. As a result, these rescaledwavefunctions would then satisfy the Schr¨odinger equa-tion with a constant shift and thus without bias.In the following subsection, we show mathematicallyhow this is achieved by a reweighting of FCIQMC sam-ples. The reweighting technique can be used to correctthe estimates of any wavefunction-related quantity, butit cannot be applied directly to the shift average. Sowe show next how to ﬁx the shift estimator properly. Fi-nally, we discuss some technical details of the reweightingtechnique.

A. Correcting the Wavefunction

In FCIQMC, the time derivative in Schr¨odinger equa-tion is discretized using Euler’s explicit method, leadingto the linear propagator | ψ ( τ + ∆ τ ) (cid:105) = (cid:104) ˆ I − ∆ τ (cid:16) ˆ H − S ( τ ) (cid:17)(cid:105) | ψ ( τ ) (cid:105) . (29)However, to make the derivation and results morestreamlined, we assume that FCIQMC is using the fol-lowing exponential propagator instead of its linear ap-proximation | ψ ( τ + ∆ τ ) (cid:105) = exp (cid:104) − ∆ τ (cid:16) ˆ H − S ( τ ) (cid:17)(cid:105) | ψ ( τ ) (cid:105) (30)This replacement of the linear propagator by the expo-nential one does not aﬀect the validity of the results. Thereason is that the next step requires decomposing thepropagator into the product of two propagators. For thelinear propagator, this decomposition introduces a time-step error of the order O (∆ τ ), which is the same orderof error introduced by using the exponential propagatorinstead.Let us now decompose the propagator into one witha hypothetical constant shift C and a remaining purescaling termexp (cid:104) − ∆ τ (cid:16) ˆ H − S ( τ ) (cid:17)(cid:105) = exp (cid:104) − ∆ τ (cid:16) ˆ H − C (cid:17)(cid:105) × exp (cid:104) − ∆ τ (cid:16) C − S ( τ ) (cid:17)(cid:105) (31)Substituting back in Eq. (30), we get the following equa-tion, which is (approximately) satisﬁed at every iterationof FCIQMCexp (cid:104) ∆ τ (cid:16) C − S ( τ ) (cid:17)(cid:105) | ψ ( τ + ∆ τ ) (cid:105) =exp (cid:104) − ∆ τ (cid:16) ˆ H − C (cid:17)(cid:105) | ψ ( τ ) (cid:105) (32)Denoting X C ( τ ) := exp (cid:104) ∆ τ (cid:16) C − S ( τ ) (cid:17)(cid:105) and taking theaverage of both sides X C ( τ ) | ψ ( τ + ∆ τ ) (cid:105) = exp (cid:104) − ∆ τ (cid:16) ˆ H − C (cid:17)(cid:105) | ψ ( τ ) (cid:105) , (33)we see that the average of X C ( τ ) | ψ ( τ + ∆ τ ) (cid:105) is the unbi-ased evolution of the average of | ψ ( τ ) (cid:105) using the constantshift C .By repeating the above analysis for an arbitrary num-ber of iterations n , we get the following general relation W C,n ( τ ) | ψ ( τ ) (cid:105) = exp (cid:104) − ∆ τ (cid:16) ˆ H − C (cid:17)(cid:105) n | ψ ( τ − n ∆ τ ) (cid:105) , (34)where the weight factor is now the product of subsequentscaling factors W C,n ( τ ) := n (cid:89) j =1 X C ( τ − j ∆ τ )= exp  − ∆ τ n (cid:88) j =1 (cid:16) S ( τ − j ∆ τ ) − C (cid:17) . (35)This expression shows that by performing enough itera-tions and using a large-enough value of n , we can projectout the exact ground state | φ (cid:105) without any populationcontrol bias by reweighing the wavefunction at iteration τ with weight W C,n ( τ )lim n →∞ W C,n ( τ ) | ψ ( τ ) (cid:105) ∝ | φ (cid:105) . (36)In Fig. 3, we plot these weights for the 14-site Hubbardmodel using diﬀerent values of n . The value n can bethought of as the order of correction applied to the wave-function. Weights of order n reﬂect how the amplitude ofthe wavefunction would change if the population controlhad been suspended for a period of n ∆ τ . For large valuesof n , the weights ﬂuctuate over several orders of magni-tude, which eﬀectively reduces the number of samples inany averaged quantity. This puts a practical limit on thechosen value of n , as we will see later. B. Correcting Energy Estimators

Correcting the projected energy estimators using thewavefunction weights is straightforward. One has toreweight the terms appropriately in both the numeratorand the denominator of Eq. (9) E Tcorrected ≈ (cid:80) Li =1 W C,n ( τ i ) (cid:104) ψ T | H | ψ ( τ i ) (cid:105) (cid:80) Li =1 W C,n ( τ i ) (cid:104) ψ T | ψ ( τ i ) (cid:105) . (37)To correct the shift, however, the reweighting cannot beapplied directly. We need ﬁrst to relate the shift to thecorrected wavefunction. Being able to correct the shiftis valuable because, for some systems, obtaining reliableprojected energy estimates is computationally expensivewhile the shift is readily available.

100 120 140 160 180 200 τ − − W S , n ( τ ) n = 40 n = 320 n = 1280 n = 10240 Figure 3. Correction weights for the 14-site Hubbard modelusing diﬀerent correction orders n . The data corresponds toan average number of walkers N = 167. Evolving the exact ground state wavefunction | φ (cid:105) us-ing the Schr¨odinger equation with ﬁxed shift C leads toan exponential decay of its norm with rate E − C exp (cid:104) − ∆ τ (cid:16) ˆ H − C (cid:17)(cid:105) | φ (cid:105) = exp [ − ∆ τ ( E − C )] | φ (cid:105) (38)If we project the above equation on some trial state | ψ T (cid:105) and rearrange the terms we get what is known as the growth estimator of the ground state energy E Tgrowth = C − τ log  (cid:104) ψ T | exp (cid:104) − ∆ τ (cid:16) ˆ H − C (cid:17)(cid:105) | φ (cid:105)(cid:104) ψ T | φ (cid:105)  (39)We can approximate both the numerator and denomina-tor using equation (34) and a large enough value of n | φ (cid:105) ≈ W C,n ( τ ) | ψ ( τ ) (cid:105) (40)exp (cid:104) − ∆ τ (cid:16) ˆ H − C (cid:17)(cid:105) | φ (cid:105) ≈ W C,n +1 ( τ + ∆ τ ) | ψ ( τ + ∆ τ ) (cid:105) (41)Substituting back in Eq. (39), and replacing the meanvalues by averages over L iterations, we get E Tgrowth ≈ C − τ log (cid:34) (cid:80) Li =1 W C,n +1 ( τ i +1 ) (cid:104) ψ T | ψ ( τ i +1 ) (cid:105) (cid:80) Li =1 W C,n ( τ i ) (cid:104) ψ T | ψ ( τ i ) (cid:105) (cid:35) (42)This expression is valid for any value of C below theﬁrst excitation energy and any trial wavefunction ψ T with nonzero overlap with the ground state. The speciﬁcchoices may only inﬂuence the statistical uncertainty ofthe estimator.Incidentally, by setting C to the average shift S , thesecond term gives us the necessary correction to removethe population control bias of the average shift. In fact,using the average shift as C is optimal in the sense thatits value minimizes the squared distance to all sampledshift values and thus minimizes the ﬂuctuations of thescale factors X C . As was the case for project energies,

200 800 3200 N − . − . − . − . − . E n e r g y E SE ± Corrected S Corrected E ± Figure 4. Corrected energy estimates of the 14-site Hubbardmodel. All points have error bars, but most of them are toosmall to be seen on this plot. Within these error bars, thecorrected shift S and the corrected uniform projected energy E ± agree with each other and with the exact energy E . the choice of the trial wavefunction depends on the sys-tem. It is reasonable in molecular systems to use the ref-erence determinant or the ground state of a trial spaceconstructed out of the most populated determinants. Insign-problem-free systems, using the uniform trial wave-function of Eq. (10) the overlap (cid:104) ψ T | ψ ( τ ) (cid:105) is just thetotal number of walkers N ( τ ) S corrected ≈ S − τ log (cid:34) (cid:80) Li =1 W S,n +1 ( τ i +1 ) N ( τ i +1 ) (cid:80) Li =1 W S,n ( τ i ) N ( τ i ) (cid:35) (43)In Fig. 4, we plot the results of applying this reweightingtechnique to the 14-sties Hubbard model. The reweight-ing eliminates the population control bias in both theshift and projected energy estimates, with the correctedvalues falling spot on the exact energy. In this plot, weused n = 2560 terms in the weight factors. C. Correction Order

The number of correction terms n included in theweight factors W C,n aﬀects both the accuracy and preci-sion of the corrected energy estimates. Ideally, we wantto include as many terms as possible because more termsimply longer projection times, Eq. (34), thus further re-duction of the population control bias (better precision).On the other hand, including more terms implies longertimes without population control and larger the ﬂuctua-tions of the weight factors, Fig. 3, thus larger error bars(less accuracy). In Fig. 5, we plot the corrected shiftestimates against the correction order n . As the correc-tion order increases, the population control bias decaysexponentially while the statistical error bars get larger.After some point, around n = 3000, increasing the cor-rection order has diminishing returns for the bias but stillincreases the error bars. The optimal value of n is theone for which the bias is just below the statistical uncer- n ) − . − . − . − . E n e r g y E Corrected S Figure 5. Corrected shift estimates with increasing orders ofcorrection n for the 14-site Hubbard model. The data corre-sponds to the average number of walkers N = 167.

200 800 3200 N − . − . − . − . E n e r g y E n = 160 n = 320 n = 640 n = 1280 n = 2560 Figure 6. Corrected shift estimates of the 14-site Hubbardmodel using diﬀerent orders of correction n and diﬀerent av-erage number of walkers N . tainty. Knowing this value would not be possible with-out knowing the exact result. As a heuristic, we suggestchoosing n as the lowest value, after which the energyestimate goes up (assuming the bias is positive). Whena set of calculations with diﬀerent numbers of walkers isavailable, a better choice can be made. Choose the valuefor which the corrected energy estimates from diﬀerentnumber walkers agree within the error bars. For exam-ple, this is the value n = 2560 for the result of the 14-siteHubbard shown in Fig. 6.Finally, it is worth pointing out that when comparingthe corrected estimates of diﬀerent calculations using dif-ferent time steps ∆ τ , the relevant quantity is the totalimaginary time T := n ∆ τ , which should be long enoughto project out the bias and make it smaller than the sta-tistical error bars. Consequently, it is desirable to choosethe time step as large as possible. On the other hand,the correction formula has a local truncation error of theorder O (∆ τ ) due to the replacement of the linear prop-agator by the exponential one. The global truncationerror thus scales as O (∆ τ ) for a ﬁxed projection time T . In practice, however, we did not observe this errorin our calculations even when the time-step was close tothe maximum allowed value 2 / ( E max − E ), where E max is the maximum eigenvalue of the Hamiltonian. V. ROLE OF IMPORTANCE SAMPLING

Since the population control bias is proportional tothe covariance between the shift and the wavefunction,reducing the variance of the shift or the wavefunctionshould help reduce the bias. A well-known technique forreducing the variance of quantum Monte Carlo estimatesis importance sampling [16, 17]. Given a guiding wave-function | ψ G (cid:105) , importance sampling corresponds to solv-ing an auxiliary problem with the modiﬁed Hamiltonian˜ H ij = H ij (cid:104) D i | ψ G (cid:105)(cid:104) D j | ψ G (cid:105) . (44)The wavefunctions of the original and auxiliary problemare then related by ˜ N i = N i (cid:104) D i | ψ G (cid:105) . (45)Using a guiding wavefunction that is close to the groundstate, reduces the ﬂuctuation in the number of walkersand the shift. The reason is that a walker on a deter-minate | D j (cid:105) contributes on average τ ( (cid:80) i ˜ H ij − S )walkers to the next iteration. As the guiding wavefunc-tion becomes closer to the ground state, the variation inthe column-sum (cid:80) i ˜ H ij gets smaller, reducing the varia-tion in the number of walkers and its dependence on thedetails of the sampled wavefunction. From a diﬀerentperspective, the propagator becomes closer to a column-stochastic matrix alleviating the need for population con-trol. Importance sampling has also been discussed in thecontext of Density Matrix Quantum Monte Carlo [18].As an illustrative example, take the limiting case wherethe guiding wavefunction is the exact ground state. Thenthe column-sum of the modiﬁed Hamiltonian matrix ele-ments becomes a constant (cid:88) i ˜ H ij = (cid:88) i H ij (cid:104) D i | φ (cid:105)(cid:104) D j | φ (cid:105) = E (46)and thus all walkers have the same reproduction rate1 + ∆ τ ( E − S ) . Moreover, the uniform projected en-ergy E ± equals the grounds state energy E regardlessof the sampled wavefunction. Therefore, the populationcontrol bias is given entirely by Eq. (28) which only de-pends on the covariance of the shift with the numberof walkers rather than with the details of the sampledwavefunction.In practice, the guiding wavefunction does not neces-sarily need to be very close to the ground state. Us-ing even the simplest forms of guiding wavefunctions canhave a noticeable impact on the population control bias.In this paper, we employ the following Gutzwiller-likewavefunction [19], where energetically-unfavorable deter-minants are exponentially suppressed | ψ G (cid:105) = (cid:88) i e − gH ii | D i (cid:105) , (47) N − . − . − . − . E n e r g y p e r S i t e E S - Original E ± - Original S - Guided E ± - Guided Figure 7. Comparison of the energy estimates of the 6 × g = 3. . . . . . . g − . − . − . − . E n e r g y p e r S i t e E SE ± Figure 8. FCIQMC energy estimates of the 6 × g . and g is a non-negative parameter to be optimized later.Implementing importance sampling using this guidingwavefunction is straightforward in FCIQMC. Spawnsfrom determinant | D i (cid:105) to determinant | D j (cid:105) have to bescaled by the factor w = exp (cid:2) − g ( H jj − H ii ) (cid:3) . (48)Computing this factor incurs only a small computationaloverhead, and for local lattice models, it can be evaluatedin a way that is independent of the system size. In Fig. 7,we show the eﬀect of using this guiding wavefunction onthe 6 × g of Gutzwiller wavefunction was chosen by run-ning diﬀerent FCIQMC calculation using 1000 walkersand diﬀerent values of g and then selecting the one giv-ing the lowest energy estimates as shown in Fig. 8. VI. APPLICATIONS

To demonstrate the capabilities of the presented im-provements of the FCIQMC algorithm, we apply it tosign-problem-free lattice models with large lattice sizes.The calculations were performed using the eﬃcient NECIimplementation [20]. In these calculations, we employedimportance sampling using the Gutzwiller-like wavefunc-tion of Eq. (47) and optimized its parameter g as de-scribed earlier. Since calculating the uniform projectedenergy gets more expensive for larger grid sizes, we re-lied solely on the shift as an energy estimator. To accountfor any remaining population control bias, we used thereweighting formula of Eq. (43) to correct the shift aver-age. A. Hubbard Model

The Hubbard Hamiltonian on a general lattice readsˆ H = − t (cid:88) (cid:104) ij (cid:105) ,σ ˆ c † i,σ ˆ c j,σ + U (cid:88) i ˆ n i, ↑ ˆ n i, ↓ (49)where (cid:104) ij (cid:105) indicates summation over all neighboring sites,ˆ c † i,σ (ˆ c i,σ ) is the fermionic creation (annihilation) operatorof an electron of spin σ at site i and ˆ n i,σ = ˆ c † i,σ ˆ c i,σ is thecorresponding number operator, t is the hopping ampli-tude between nearest neighbors and U is on-site Coulombrepulsion. This simple model of interacting electrons ona lattice has been under study for over 50 years and isused as a model for understanding the physics of high-temperature superconductors [21–24]. Interestingly, theone-dimensional case can be solved in the thermody-namic limit by employing Bethe ansatz, which reducesthe problem to a set of coupled algebraic equations, Lieb–Wu equations, which can be solved analytically in thethermodynamic limit [25, 26].Using open boundary conditions, the 1D Hubbardmodel does not have a sign problem in FCIQMC. Thiscan be seen by ordering the orbitals by their spin thenby their lattice site. Since in real space, it is only ki-netic energy terms that cause transitions between twoSlater determinants (by hopping single electrons between nearest-neighbor sites), and the sign of these matrix el-ements are negative, and in addition, since (in 1D andopen boundary conditions) there is no possibility for anelectron to return to its original lattice site via alter-native routes, the problem is sign-problem-free. Withperiodic boundary conditions, electrons hopping acrossthe boundary can, in general, lead to positive matrix el-ements due to a possible negative Fermi-sign from thepaths that wrap around the boundary and return to theoriginal Slater determinant. Nevertheless, if the numberof spin-up electrons and the number of spin-down elec-trons are both odd, the number of required commuta-tions will always be even, and thus no Fermi-sign arises.This means that speciﬁc oﬀ-half-ﬁlling periodic modelscan also be studied with the FCIQMC method withoutany sign problem.In the following, we look at some paradigmatic long1D Hubbard systems. We benchmark our results againstDMRG results obtained using the BLOCK code [27].DMRG is a deterministic method that uses the so-called matrix product state (MPS) to represent the ground-state wavefunction. The MPS is iteratively optimizedin sweeps through the sites in a predeﬁned order, whichis trivially chosen in 1D systems. The method beneﬁtsfrom the low-entanglement of the ground states that isespecially present in 1D problems. The computationalresources required to solve a system with DMRG up to agiven accuracy is mainly determined by the bond dimen-sion M . DMRG also beneﬁts from the locality, which isbroken by introducing periodic boundary conditions.In Table I, we show FCIQMC and DMRG results of the102-site Hubbard chain for U/t = 4 and 8 at half-ﬁllingas well as the 150-site Hubbard chain at

U/t = 8. Theseresults are also compared with the analytical result forthe thermodynamic limit obtained with the Bethe ansatz.The on-site interaction parameter

U/t = 8 is chosen as anexample of an intermediate coupling regime. In both thereal-space and the reciprocal-space basis representation,the wavefunction is highly spread-out; a single Slater de-terminant is a bad approximation in both cases, thusmaking it a diﬃcult system to treat. For

U/t = 4, usinga reciprocal-space basis would normally be more suit-able as the wavefunction would be more compact in thiscase. However, we stick to the real-space basis becauseFCIQMC beneﬁts from the absence of a sign problemsigniﬁcantly more than compactness. For the 102-sitesystems, we also calculate both with open boundary con-ditions (OBC) and periodic boundary conditions (PBC).The DMRG computational eﬀort for 1D systems withOBC typically scales as M , whereas it scales as M forPBC. There are, however, similar MPS-based algorithmsavailable that improve the scaling of PBC calculationscompared to traditional DMRG [28, 29]. In contrast, thecomputational cost of FCIQMC is the same for open andperiodic boundary conditions as long as the system con-tains an odd number of spin-up and spin-down electronssuch that it remains strictly sign-problem-free. Whilethere is still a small bias for the guided but uncorrected1 System Sites Method Energy per site (PBC) Energy per site (OBC)

U/t = 4half-ﬁlling 102 DMRG − .

573 79 − .

570 13FCIQMC – Original − .

570 76(40) − .

566 80(50)FCIQMC – Guided − .

573 71(8) − .

570 11(7)FCIQMC – Corrected − .

573 75(8) − .

570 17(9) ∞ Bethe ansatz − .

573 73

U/t = 8half-ﬁlling 102 DMRG − .

327 57 − .

325 50FCIQMC – Original − .

322 99(49) − .

321 19(43)FCIQMC – Guided − .

327 54(1) − .

325 48(2)FCIQMC – Corrected − .

327 55(3) − .

325 49(3)150 DMRG − .

327 55FCIQMC – Original − .

302 54(154)FCIQMC – Guided − .

327 39(6)FCIQMC – Corrected − .

327 54(9) ∞ Bethe ansatz − .

327 53

U/t = 84 holes ( M s = 0) 102 DMRG − .

392 29 − .

390 04FCIQMC – Original − .

388 52(32) − .

387 09(39)FCIQMC – Guided − .

392 28(3) − .

390 02(2)FCIQMC – Corrected − .

392 29(3) − .

390 03(2)Table I. Results for the 1D Hubbard model. Technical details for the FCIQMC calculations are shown in Table II and forthe DMRG calculations in Table III, respectively. “FCIQMC – Original” implies no bias correction or importance sampling.“FCIQMC – Guided” implies the use of importance sampling with the Gutzwiller-like factor. “FCIQMC – Corrected” impliesthe population control bias correction on top of the importance sampling. Chain lengths of ∞ indicate calculations in thethermodynamic limit. result, the corrected FCIQMC result agrees well withDMRG within statistical error bars. The larger statis-tical error bars for the U/t = 4 compared to the

U/t = 8for comparable wall clock time are due to the fact thatthe real-space basis is less suitable for lower

U/t .Additionally, we also look at the 102-site system withfour holes (with total spin projection M s = 0) at U/t = 8.As there are more low-lying excited states, it takes moreiterations for FCIQMC to project out the ground state.This results in slightly larger error bars compared tothe equivalent half-ﬁlled system. Systems oﬀ half-ﬁllingare also more diﬃcult to treat with DMRG, and thushigher M values are required to get a converged result.Still, there is good agreement between the DMRG bench-mark result and the corrected FCIQMC result.In general, the FCIQMC results can be continuouslyimproved by adding more CPU time which, is typicallythe bottleneck to reduce statistical error bars. However,the required memory to store the stochastic representa-tion of the FCIQMC wavefunction at any time duringthe simulation is typically much lower than the corre-sponding MPS in DMRG, even though an MPS is al-ready a very eﬃcient way of expressing the ground-statewavefunction of a 1D system. The basic memory require- ments of an FCIQMC calculation are the list of the in-stantaneously occupied determinants in a binary format(with the length of this binary representation dependinglinearly on the number of sites) and the instantaneouswalker population on these determinants. Additionally,there is the need to store the hash table for the mainwalker list (requiring a hash table with two integers perentry and a list containing empty indices, both roughlyscaling with the total number of determinants) and thespawning array (in a conservative estimation requiringup to 1/10 of the main walker list). Thus, in total, oneneeds to store n FCIQMCint64 = (cid:32)(cid:24) (cid:96) (cid:25) + 1 + 3 + 110 (cid:24) (cid:96) (cid:25)(cid:33) × N maxdets (50)64-bit integers, where (cid:96) is the number of sites and N maxdets isthe maximum number of occupied determinants through-out the simulation. Based on this, we can approximatethe memory usage: e. g., the periodic 102-site system at U/t = 8 was calculated with 3 × walkers. Sincethe wavefunction is highly spread out, we can assumethat the walkers also occupy roughly 3 × determi-nants. According to Eq. (50), this then requires 2 . × System Sites N/ g nU/t = 4, half-ﬁlling 102 50 − .

17 2560

U/t = 8, half-ﬁlling 102 30 − .

15 5120150 50 − .

15 5120

U/t = 8, 4 holes ( M s = 0) 102 30 − .

15 2560Table II. Technical details for the FCIQMC calculations ofthe Hubbard model. Listed are the number of walkers N ,the Gutzwiller parameter g and the number of terms n inthe expansion of the population bias correction. The optimalGutzwiller is obtained by the projection scheme presentedin [30]. Optimization of g using the scheme presented in Fig. 8only slightly diﬀers.System Sites M (PBC) M (OBC) U/t = 4, half-ﬁlling 102 2000 300

U/t = 8, half-ﬁlling 102 2000 300150 4000 –

U/t = 8, 4 holes ( M s = 0) 102 5000 1000Table III. Bond dimensions M for the calculation of theDMRG results in Table I. .

02 GB. This amount isstored as distributed memory as every CPU thread onlyneeds to access data about the determinants that resideon it. However, this is not a fundamental lower limit.In principle, this number could be easily reduced fur-ther by running with fewer walkers for more iterations.In DMRG, on the other hand, the memory requirementscales quadratically with the bond dimension. The MPSitself requires to store 4 (cid:96) arrays of size M × M . Another4 M doubles is used to store the derivative at a givensite during a sweep. To speed up the calculation of thederivative, usually a contraction of (cid:104) ψ | ˆ H | ψ (cid:105) over left andright neighboring sites for each site is also kept in mem-ory, requiring an additional 2 M D ˆ H (cid:96) doubles. D ˆ H = 4is the bond dimension of the 1D Hubbard Hamiltonianwhen assuming that the h. c. terms are calculated at run-time and are not stored. This sums up to n DMRGdoubles = M (cid:2) (cid:96) + 1) + 2 (cid:96)D ˆ H (cid:3) (51)double precision ﬂoats. For the exemplary 102-site sys-tem a bond dimension of M = 2000 was used (see ta-ble III). This results in approximately 4 . × doubles,requiring 39 . B. Heisenberg Model

When the Coulomb repulsion U in the Hubbard modelis much larger than the hopping parameter t , double oc-cupancies become energetically unfavorable. One can then project the Hubbard Hamiltonian to the space ofstates with no double occupations and derive an eﬀectiveHamiltonian describing the low-energy excitations. Athalf-ﬁlling, these excitations are spin excitations, and theeﬀective Hamiltonian is the spin-1 / J = 4 t /U :ˆ H = J (cid:88) (cid:104) ij (cid:105) S i . S j , (52)where S i = ( S xi , S yi , S zi ) is the spin operator on site i with S xi = 12 (cid:16) ˆ c † i, ↑ ˆ c i, ↓ + ˆ c † i, ↓ ˆ c i, ↑ (cid:17) (53) S yi = i (cid:16) ˆ c † i, ↓ ˆ c i, ↑ − ˆ c † i, ↑ ˆ c i, ↓ (cid:17) (54) S zi = 12 (cid:16) ˆ c † i, ↑ ˆ c i, ↑ − ˆ c † i, ↓ ˆ c i, ↓ (cid:17) (55)The relation between the two Hamiltonians can be de-rived using second-order perturbation theory in the t/U parameter, which shows that neighboring electrons withopposite spins can beneﬁt from virtual hopping formingdouble occupancies. The Heisenberg model is a basicand important model for studying magnetic materialsand their phase transitions. Like the Hubbard model,it can be solved in the one-dimensional case using theBethe ansatz [31, 32]. In two dimensions, the model hasbeen solved numerically to a high-accuracy for diﬀerentlattice sizes using diﬀerent methods including Green’sfunction Monte Carlo (GFMC) [33–35], stochastic seriesexpansion (SSE) [36], and world line loop/cluster meth-ods (WLQMC) [37–40]. The model has also been usedas a benchmark for recent variational ansatzes such asprojected entangled pair states (PEPS) [41, 42], neuralquantum states (NQS) [43], and neural autoregressivequantum states (NAQS) [44].In FCIQMC, the Heisenberg model is sign-problem-free on a bipartite lattice [10]. Walkers of diﬀerent signsdo not mix on such lattices, although all spawns haveopposite signs to their parents. To see why, we splitthe lattice into two interlacing sub-lattices, where eachsite on one sub-lattice is surrounded only by the sites ofthe other. Then we can classify determinants into twogroups, A and B , depending on whether a determinanthas an even or an odd number of spin-up electrons onone of these sub-lattices. Applying the Hamiltonian toa determinant increases the number of spin-up electronson one of the sub-lattices by one and decreases it on theother. Therefore, the Hamiltonian connects members of A only to other members of B and vice versa, so walkersof opposite signs never mix.In Tables IV and V, we compare the results ofFCIQMC with other methods for diﬀerent lattice sizes us-ing periodic and open boundary conditions, respectively.For periodic boundary conditions (PBC), we used thevalue g = 0 . n = 2560 as a correction order, while for open bound-ary conditions (OBC) we used the values g = 0 .

35 and3 n = 1280. For the lattice sizes 10 ×

10, 16 ×

16 and32 ×

32, we ran the calculations using 10 , 10 and 10 walkers, respectively. The respective numbers of itera-tions are 16 × , 10 , and 1 . × , with an additional10 iterations of thermalization.For periodic boundary conditions, we see thatFCIQMC results agree extremely well with SSE andis more accurate than GFMC. The speciﬁc variant ofGFMC used for obtaining these results is close in spirit toFCIQMC and formulated similarly as a projective MonteCarlo method in the Hilbert space [35]. The two methodsdiﬀer mainly in the details of their walker dynamics andpopulation control. As pointed out in Ref. [36], the dis-crepancy between SSE (and thus also FCIQMC) resultsand GFMC results is likely to be due to a remainingpopulation control bias in the latter. For open boundaryconditions, FCIQMC provides the same level of accuracyas the benchmark results of WLQMC. These two QMCmethods provide, as expected, much lower energies thanthe other variational methods: PEPS and NAQS. The re-sults conﬁrm our approach’s eﬀectiveness in eliminatingthe population control bias and demonstrate the poten-tial of FCIQMC in producing accurate results for largespin systems. Nonetheless, we note that the populationcontrol bias and the error bars of FCIQMC get largeras the lattice size increases. We attribute this behaviorto two factors: the shortcomings of our simple guidingwavefunction and the system’s closing gap in the ther-modynamic limit.One obvious drawback of the guiding wavefunc-tion (47), in this case, is breaking the rotational sym-metry of the Hamiltonian. To see this, remember thatfor the Heisenberg model, we use the eigenfunctions ofthe S zi operators as a single-particle basis and thus thediagonal Hamiltonian elements are only a function of thez-component of the spin H kk = J (cid:88) i (cid:104) D k | S zi S zi +1 | D k (cid:105) (56)However, the Heisenberg Hamiltonian of Eq. (52) isisotropic and thus the guiding wavefunction does not pos-sess the necessary rotational invariance of the true groundstate. The discrepancy between the guiding wavefunc-tion and the exact ground state evidently becomes morecritical as the systems size increases. It thus lessens theeﬀectiveness of importance sampling in reducing the pop-ulation control bias and the statistical error bars of theguided results. Moreover, it is has been shown that thegap between the ground state and the ﬁrst excited stateof the Heisenberg model scales inversely with the latticesize [45, 46], which implies longer projection times. Asa result, more correction terms need to be included inthe reweighting procedure to remove the same amountof bias, which increases the statistical error bars of thecorrected results further. While the closing gap is an in-herent property of the system under study, the guidingwavefunction is under our control and can be improvedto reﬁne the results further and accelerate convergence. Lattice Method Energy per Site10 ×

10 GFMC − .

671 492(27)SSE − .

671 549(4)FCIQMC – Guided − .

671 541(1)FCIQMC – Corrected − .

671 553(1)16 ×

16 GFMC − .

669 872(28)SSE − .

669 976(7)FCIQMC – Guided − .

669 960(5)FCIQMC – Corrected − .

669 976(8)32 ×

32 SSE − .

669 5115(8)FCIQMC – Guided − .

669 487(9)FCIQMC – Corrected − .

669 492(10)Table IV. Results for the Heisenberg model using periodicboundary conditions. GFMC results are taken from Ref. [35],while SSE results are taken from Refs. [36, 47].Lattice Method Energy per Site10 ×

10 PEPS − .

628 601(2)NAQS − .

628 627(1)WLQMC − .

628 656(2)FCIQMC – Guided − .

628 646(1)FCIQMC – Corrected − .

628 656(1)16 ×

16 PEPS − .

643 391(3)NAQS − .

643 448(1)WLQMC − .

643 531(2)FCIQMC – Guided − .

643 530(2)FCIQMC – Corrected − .

643 531(3)Table V. Results for the Heisenberg model using open bound-ary conditions. WLQMC, PEPS and NAQS results are ob-tained respectively from Refs. [41, 42, 44].

A promising research direction would be employing oneof the variational anastzes as a guiding wavefunction inFCIQMC. Particularly interesting are neural quantumstates that provide compact representation and allowfast evaluations of the guiding wavefunction, a necessaryproperty for its practical use in FCIQMC.

VII. SUMMARY

In this paper, we related the population control biasof FCIQMC directly to the covariance between the sam-pled wavefunction and the shift. We used this relationto explain why the bias scales inversely with the averagenumber of walkers and quantify the resulting discrepancy4between the diﬀerent energy estimators. We then deriveda post-processing reweighting procedure that practicallyeliminates the bias from diﬀerent energy estimates, in-cluding the average shift. We also showed how impor-tance sampling could dramatically reduce the bias andreduce the required number of correction terms and theassociated error bars. Both techniques have only a smallcomputational cost and make FCIQMC results practi-cally exact (within statistical error bars) in the absenceof a sign problem. Our approach’s eﬀectiveness makesFCIQMC competitive with other high-accuracy methodsin its application to large sign-problem-free systems. Yet,its value is not restricted to these systems but extendsto a broader range of systems with sign problems, whosestudy is the topic of an upcoming publication.

ACKNOWLEDGMENTS

A.A. acknowledges discussions with Joachim Brand onpopulation control bias. K.G. thanks Erik Koch for help-ful discussions about the Gutzwiller wavefunction. Theauthors gratefully acknowledge funding from the MaxPlanck Society.

Appendix A: Periodic Shift Update

When the shift is updated every A iterations insteadof continuously, we need to change the invariant quantityof Eq. (22) to S ( τ ) + γA ∆ τ log N ( τ (cid:48) ) , (A1)where τ (cid:48) = A × ( τ mod A ) is the last time the shift gotupdated. Then the results using a continuous updatestill hold if we redeﬁne the relative bias terms B i using quantities at the appropriate time steps B i ( τ ) := − Cov [ N i ( τ ) , S ( τ )] N ( τ (cid:48) ) N i ( τ ) ≈ γA ∆ τ Cov [ N i ( τ ) , N ( τ (cid:48) )] N i ( τ ) . (A2)Although N i and N above correspond to diﬀerent it-erations, their covariance still scales linearly with thenumber of walker as in Eq. (24). The main diﬀerenceis that we expect this covariance to be weaker becausethe correlation between quantities at diﬀerent iterationsgets smaller when they are further apart. Appendix B: Covariance of the Shift and theNumber of Walkers

The covariance between the shift and the number ofwalkers can be related to the variance in the number ofwalkers as following:Cov (

N, S ) = (cid:88) i Cov ( N i , S ) ≈ − γA ∆ τ Var ( N ) N , (B1)where Eq. (23) is used as an approximation. It is im-portant to note here that Cov (

N, S ) does not scale withparameters γ, A, ∆ τ and N as suggested by the aboveformula. The reason is that Var ( N ) implicitly dependson those parameters and scales in the opposite direction.For example, by using half the value of γ , the varianceof the number of walkers will almost double due to thelooser control of the shift. The behavior becomes moreevident by rewriting the covariance in terms of the shiftvariance Cov ( N, S ) ≈ − A ∆ τγ Var ( S ) N , (B2)which leads to the exact opposite explicit dependenceon these parameters. This argument can be extendedto the covariance between the shift and the individualpopulations N i and explains why we could not establisha clear relation between the population control bias andthe parameters γ, A or ∆ τ . [1] M. A. Lee and K. E. Schmidt, Computers in Physics ,192 (1992).[2] P. J. Reynolds, J. Tobochnik, and H. Gould, Computersin Physics , 662 (1990).[3] G. H. Booth, A. J. Thom, and A. Alavi, Journal ofChemical Physics (2009).[4] J. H. Hetherington, Physical Review A , 2713 (1984).[5] C. J. Umrigar, M. P. Nightingale, and K. J. Runge, TheJournal of Chemical Physics , 2865 (1993). [6] D. M. Ceperley and M. H. Kalos, “Quantum many-body problems,” in Monte Carlo Methods in StatisticalPhysics , edited by K. Binder (Springer Berlin Heidelberg,Berlin, Heidelberg, 1986) pp. 145–194.[7] N. Cerf and O. C. Martin, Physical Review E , 3679(1995).[8] W. A. Vigor, J. S. Spencer, M. J. Bearpark, and A. J.Thom, Journal of Chemical Physics (2015).[9] D. Cleland, G. H. Booth, and A. Alavi, Journal of Chem-ical Physics (2010). [10] J. S. Spencer, N. S. Blunt, and W. M. Foulkes, Journalof Chemical Physics (2012).[11] K. Ghanem, A. Y. Lozovoi, and A. Alavi, Journal ofChemical Physics (2019).[12] K. Ghanem, K. Guther, and A. Alavi, Journal of Chem-ical Physics (2020).[13] M. Yang, E. Pahl, and J. Brand, Journal of ChemicalPhysics (2020).[14] The average in Eq. (15) is conditional on the instanta-neous wavefunction and shift values from the previousiteration, while the average in Eq. (16) is the expecta-tion value over all the possible histories of the simulation.We denote both these averages by an overline, sacriﬁc-ing some mathematical rigor for more readability. Theintended meaning should be clear from the context.[15] H. Flyvbjerg and H. G. Petersen, The Journal of Chem-ical Physics , 461 (1989).[16] M. H. Kalos, D. Levesque, and L. Verlet, Phys. Rev. A , 2178 (1974).[17] C. J. Umrigar, The Journal of Chemical Physics ,164105 (2015).[18] N. Blunt, T. W. Rogers, J. S. Spencer, and W. M. C.Foulkes, Phys. Rev. B , 245124 (2014).[19] M. C. Gutzwiller, Phys. Rev. Lett. , 159 (1963).[20] K. Guther, R. J. Anderson, N. S. Blunt, N. A. Bogdanov,D. Cleland, N. Dattani, W. Dobrautz, K. Ghanem,P. Jeszenszki, N. Liebermann, G. L. Manni, A. Y. Lo-zovoi, H. Luo, D. Ma, F. Merz, C. Overy, M. Rampp,P. K. Samanta, L. R. Schwarz, J. J. Shepherd, S. D.Smart, E. Vitale, O. Weser, G. H. Booth, and A. Alavi,The Journal of Chemical Physics , 034107 (2020).[21] J. Hubbard and B. H. Flowers, Proceedings of the RoyalSociety of London. Series A. Mathematical and PhysicalSciences , 238 (1963).[22] M. C. Gutzwiller, Phys. Rev. Lett. , 159 (1963).[23] J. Kanamori, Prog Theor Phys , 275 (1963).[24] F. C. Zhang and T. M. Rice, Phys. Rev. B , 3759(1988).[25] E. H. Lieb and F. Y. Wu, Phys. Rev. Lett. , 1445(1968).[26] M. Takahashi, Progress of Theoretical Physics , 69(1972).[27] G. K.-L. Chan, J. J. Dorando, D. Ghosh, J. Hachmann,E. Neuscamman, H. Wang, and T. Yanai, in Frontiers in Quantum Systems in Chemistry and Physics , Progressin Theoretical Chemistry and Physics, edited by S. Wil-son, P. J. Grout, J. Maruani, G. Delgado-Barrio, andP. Piecuch (Springer Netherlands, Dordrecht, 2008) pp.49–65.[28] P. Pippan, S. R. White, and H. G. Evertz, Phys. Rev. B , 081103 (2010).[29] D. Dey, D. Maiti, and M. Kumar, Papers in Physics ,080006 (2016).[30] W. Dobrautz, H. Luo, and A. Alavi, Physical Review B (2019).[31] M. Karabach, G. M¨uller, H. Gould, and J. Tobochnik,Computers in Physics , 36 (1997).[32] M. Karbach, K. Hu, and G. M¨uller, Computers inPhysics , 565 (1998).[33] N. Trivedi and D. M. Ceperley, Phys. Rev. B , 4552(1990).[34] K. J. Runge, Physical Review B , 7229 (1992).[35] K. J. Runge, Physical Review B , 12292 (1992).[36] A. W. Sandvik, Physical Review B - Condensed Matterand Materials Physics , 11678 (1997).[37] U.-J. Wiese and H.-P. Ying, Zeitschrift f¨ur Physik B Con-densed Matter , 147 (1994).[38] H. G. Evertz, Advances in Physics , 1 (2003).[39] A. F. Albuquerque et al. , Journal of Magnetism and Mag-netic Materials , 1187 (2007).[40] B. Bauer et al. , Journal of Statistical Mechanics: Theoryand Experiment , P05001 (2011).[41] M. Lubasch, J. I. Cirac, and M.-C. Ba˜nuls, Phys. Rev.B , 064425 (2014).[42] W.-Y. Liu, S.-J. Dong, Y.-J. Han, G.-C. Guo, and L. He,Phys. Rev. B , 195154 (2017).[43] G. Carleo and M. Troyer, Science , 602 (2017).[44] O. Sharir, Y. Levine, N. Wies, G. Carleo, andA. Shashua, Phys. Rev. Lett. , 020503 (2020).[45] H. Neuberger and T. Ziman, Phys. Rev. B , 2608(1989).[46] Physical Review B , 11328 (1989).[47] A. W. Sandvik, AIP Conference Proceedings1297