A tail-regression estimator for heavy-tailed distributions of known tail indices and its application to continuum quantum Monte Carlo data
aa r X i v : . [ phy s i c s . d a t a - a n ] J un A tail-regression estimator for heavy-tailed distributions of known tail indicesand its application to continuum quantum Monte Carlo data
Pablo L´opez R´ıos
1, 2, ∗ and Gareth J. Conduit Max-Planck Institute for Solid State Research, Heisenbergstraße 1, 70569 Stuttgart, Germany Theory of Condensed Matter Group, Cavendish Laboratory, 19 J. J. Thomson Avenue, Cambridge CB3 0HE, UK
Standard statistical analysis is unable to provide reliable confidence intervals on expectation values of proba-bility distributions that do not satisfy the conditions of the central limit theorem. We present a regression-basedestimator of an arbitrary moment of a probability distribution with power-law heavy tails that exploits knowl-edge of the exponents of its asymptotic decay to bypass this issue entirely. Our method is applied to syntheticdata and to energy and atomic force data from variational and diffusion quantum Monte Carlo calculations,whose distributions have known asymptotic forms [J. R. Trail, Phys. Rev. E , 016703 (2008); A. Badinski etal. , J. Phys.: Condens. Matter I. INTRODUCTION
Monte Carlo integration methods [1] allow the evaluation ofarbitrarily complicated high-dimensional integrals using ran-dom, discrete samples of the integrand. Besides the need tocorrect for serial correlation [2, 3], the statistical analysis ofthese random samples is usually straightforward, and the finalresult of a Monte Carlo calculation is typically computed as astandard mean with an accompanying standard error, defininga confidence interval on the quantity of interest [4]. Howeverthere are problems for which the integrand diverges in suchmanner as to render these confidence intervals invalid.This is the case in continuum quantum Monte Carlo (QMC)methods [5, 6], a prominent family of tools for studyingcorrelated many-body systems. Given a trial wave function Ψ( R ) , the variational Monte Carlo (VMC) method evalu-ates the expectation value of an observable ˆ A by accumulat-ing its local value A ( R ) at random real-space configurationsof the particles in the system, { R } , distributed according to | Ψ( R ) | . The diffusion Monte Carlo (DMC) method samplesthe lowest-energy wave function Φ( R ) with the same nodalstructure as Ψ( R ) by stochastic projection according to theimaginary-time Schr¨odinger equation, and yields more accu-rate estimates of observables than VMC. The stochastic inte-gration employed by these methods allows using trial wavefunctions that are not analytically integrable, providing ex-traordinary flexibility and compactness in the description ofmany-body correlations [7–9]. The VMC and DMC methodsare routinely used to solve electronic structure and quantumchemistry problems [5, 10, 11].The mismatch between the nodes of the trial wave func-tion, { R : Ψ( R ) = 0 } , and those of the true ground-statewave function of the system are the main source of outliersin the local energy distribution sampled in QMC, resultingin heavy tails [12, 13] that preclude the evaluation of mean-ingful confidence intervals on the estimated variance of the ∗ [email protected] local energy. The local atomic force, comprising a Hellmann-Feynman force [14] and a Pulay term [15, 16], has heavy tailsarising both from the divergence of the electron-nucleus po-tential energy in all-electron systems [17] and from the nodalerror, which prevent the evaluation of meaningful confidenceintervals on the estimated expectation value of the force.Various methods have been proposed to circumvent the sta-tistical hurdles in the evaluation of atomic forces in QMC.Modified estimators of the force satisfying a zero-varianceprinciple have been proposed [18–22] that substantially re-duce the magnitude of the heavy tails in the local force dis-tribution. Force estimation methods based on the use of pseu-dopotentials [23, 24] can eliminate the problematic behaviorof the Hellmann-Feynman force [17], as does the fitting ap-proach proposed by Chiesa et al. [25]. However, some of thesemethods involve approximations, and none of them addressesthe heavy tails in the local Pulay force distribution, and there-fore the total force remains affected by an infinite-varianceproblem. The reweighted approach proposed by Attaccaliteand Sorella [26] does prevent the Pulay force from diverg-ing, but it involves modifying the sampling distribution and istherefore not applicable in DMC.Tail-index estimation methods [27–30] allow the exponentgoverning the asymptotic behavior of heavy-tailed probabilitydistributions to be estimated from statistical samples. Thisprior work, combined with knowledge of the exact asymptoticform of the tails of the local energy [13] and local force [17]distributions, provides us with the foundation to develop the tail-regression estimator (TRE) for heavy-tailed distributions.We demonstrate the application of this technique to VMC andDMC data, for which we are able to obtain forces and localenergy variances not affected by the infinite-variance problem.The rest of this paper is structured as follows. We reviewthe properties of the heavy tails of the local energy and lo-cal force distributions in section II. In section III we discussthe standard method for estimating an expectation value froma statistical sample, and we propose our tail-regression esti-mator in section IV. The application of our methodology isillustrated in section V using model distributions of knownstatistical properties. Finally, we present the results of apply-ing our method to the VMC energy of a homogeneous elec-tron gas and the VMC and DMC atomic force on an atom inthe all-electron C molecule in sections VI and VII, and ourconclusions are stated in section VIII. II. HEAVY TAILS IN QUANTUM MONTE CARLO
We explore the formal definition of expectation values inQMC to allow the characterization of the resulting heavy-tailed distributions. For simplicity, we restrict our analysisto the VMC method until Section VII, in which we discussthe application of our methodology to DMC data. Note thatwe ignore serial correlation in this work.Given a trial wave function Ψ( R ) , a VMC calculation eval-uates the expectation value h A i = R Ψ ∗ ( R ) ˆ A Ψ( R ) d R R | Ψ( R ) | d R , (1)by generating electronic configurations { R } according to thedistribution P R ( R ) = | Ψ( R ) | R | Ψ( R ) | d R , (2)and evaluating the local values A ( R ) = Ψ − ( R ) ˆ A Ψ( R ) ofobservable ˆ A . The VMC expectation value can be recast as h A i = Z ∞−∞ P A ( A ) A d A , (3)where P A ( A ) = Z ∂ Ω( A ) P R ( R ) |∇ R A ( R ) | d dN − R , (4)where d is the dimensionality of the system, N is the numberof electrons, and ∂ Ω( A ) is the ( dN − -dimensional regionof configuration space where A ( R ) = A .The local value of some important observables diverge atcertain configurations, and it is often possible to characterizethe asymptotic behavior of P A ( A ) from knowledge of the an-alytical form of Ψ( R ) near to these configurations. We sum-marize the relevant properties of the local energy and the localatomic force below. A. Local energy
Consider the Hamiltonian of a molecular system in Hartreeatomic units ( ~ = m e = | e | = 4 πǫ = 1 ), ˆ H ( R ) = − X i ∇ i + X i,j>i r ij + X i,I − Z I r iI + X I,J>I Z I Z J r IJ , (5)where r ij is the distance between the i th and j th electrons, r iI is the distance between the i th electron and the I th nucleus, r IJ is the distance between the I th and J th nuclei, and Z I isthe atomic number of the I th nucleus.The situations in which the local energy E ( R ) =Ψ − ( R ) ˆ H ( R )Ψ( R ) diverges were classified in detail byTrail [13]. The divergence of the Coulomb potential atelectron-nucleus and electron-electron coalescence points, r iI → and r ij → respectively, can be neutralized by con-straining the trial wave functions to obey the Kato cusp condi-tions [31] under which the kinetic energy exactly cancels thepotential energy at two-body coalescence points. The remain-ing divergences of the local energy arise when Ψ( R ) → but Ψ is not locally identical to an eigenstate of ˆ H , since ˆ H ( R )Ψ( R ) can be finite where Ψ( R ) is zero.Mismatches between the nodes of Ψ( R ) and ˆ H ( R )Ψ( R ) are responsible for the asymptotic behavior [13] P E ( E ) = c | E − E | − + c | E − E | − + . . . , (6)when | E | → ∞ , where E is the exact ground state energyand { c i } are unknown coefficients. The coefficients of evenpowers of | E − E | in the left ( E → −∞ ) and right ( E → + ∞ ) tails of P E ( E ) have equal coefficients, c L2 n = c R2 n , whilethose of odd powers are of the same magnitude but of oppositesigns, c L2 n +1 = − c R2 n +1 .The expectation value of the energy itself can be evaluatedwith standard estimators without problems, but these yield un-reliable confidence intervals on the variance of the local en-ergy. The variance of the local energy is an important quan-tity since it is directly related to the quality of the trial wavefunction, and is routinely used in wave function optimization[32], as well as in the “variance extrapolation” technique thatattempts to estimate the zero-variance (exact wave function)limit of expectation values [33]. We discuss the specific is-sues with the variance of the local energy in section III. B. Local force
The force exerted by the electrons and the other nuclei onthe I th nucleus of a system along the x direction is h ˆ F x,I i = − d h ˆ H i / d x I , where x I is the x Cartesian coordinate of the I th nucleus. Dropping the I and x labels, the local force canbe expressed as F ( R ) = F HFT ( R ) + F P ( R ) , (7)where the Hellmann-Feynman force is F HFT ( R ) = − Ψ − ( R ) ∂ ˆ H ( R ) ∂x Ψ( R )= X i − Z I x iI r iI + X J = I Z I Z J x IJ r IJ , (8)and the Pulay force is F P ( R ) = − − ( R ) h E ( R ) − h ˆ H i i ∂ Ψ( R ) ∂x . (9)Optionally, a variance-reduction term which satisfies a zero-variance principle [20], F ZV ( R ) = − Ψ − ( R ) h ˆ H ( R ) − E ( R ) i ∂ Ψ( R ) ∂x , (10)can be added to Eq. 7. This term does not alter the expectationvalue of the force but reduces the extent of the fluctuations ofthe local Hellmann-Feynman force.The local Hellmann-Feynman force diverges at electron-nucleus coalescence points, and its distribution exhibits apower-law tail of the form P F HFT ( F ) = c | F − F | − / + c | F − F | − + . . . , (11)when | F | → ∞ , where F is a constant and { c i } are unknowncoefficients. The coefficients of the leading-order term on theleft and right tails are equal, c L0 = c R0 , and the rest are asym-metric.If the wave function satisfies the electron-nucleus Kato cuspconditions [34] the zero-variance term exactly cancels this di-vergence [21], but, like for the local energy, the mismatch be-tween the nodes of Ψ( R ) and ˆ H ( R )Ψ( R ) is responsible forthe heavy tails in the distribution of the local values of thePulay and zero-variance terms, and hence of the total force,which satisfies [17] P F ( F ) = c | F − F | − / + c | F − F | − + . . . , (12)when | F | → ∞ , where F is a constant and { c i } are unknowncoefficients exhibiting no symmetry.A somewhat different scenario arises if nondivergentpseudopotentials are used in place of the electron-nucleusCoulomb potential [35, 36] and the local force estimationis consequently adjusted [23, 24]. In this case the localHellmann-Feynman force exhibits less problematic heavytails of leading order | F − F | − [17], but since the Pulayterm is unaffected by the use of pseudopotentials the local to-tal force remains of the form of Eq. 12. III. STANDARD ESTIMATION OF AN EXPECTATIONVALUE
The expectation value of an observable whose local value A is distributed according to P A ( A ) is h A i = Z ∞−∞ P A ( A ) A d A , (13)and the variance of A is the expectation value of ( A − h A i ) , Var[ A ] = σ = Z ∞−∞ P A ( A ) ( A − h A i ) d A . (14)The integrals in Eqs. 13 and 14 must be nondivergent forthe expectation value and variance of A to be well-defined.Therefore, probability distributions with asymptotic behavior P A ( A ) ∼ | A | − µ as | A | → ∞ have no well-defined expec-tation value or variance for µ ≤ , and have a well-defined expectation value but no well-defined variance for < µ ≤ .Note that a function with µ ≤ is not a valid probability dis-tribution as it cannot be normalized.Let { A m } Mm =1 be a sample of M independent random vari-ables identically distributed according to P A ( A ) . The stan-dard estimator for Eq. 13 is the sample mean, ¯ A = 1 M M X m =1 A m , (15)and the standard estimator for Eq. 14 is the sample variance, S = P Mm =1 ( A m − ¯ A ) M − . (16)The uncertainty on ¯ A is the standard error σ ¯ A = S/ √ M .This poses a problem for distributions of leading-order expo-nent < µ ≤ : despite having a well-defined expectationvalue according to Eq. 13, its standard estimator has a diver-gent uncertainty because it is defined in terms of the divergentvariance of Eq. 14.In this regime P A ( A ) does not satisfy the conditions of thecentral limit theorem that would guarantee the asymptotic nor-mality of confidence intervals built from the standard meanand standard error [4]. Instead, P A ( A ) satisfies the law oflarge numbers, which states that the standard mean does con-verge to the expectation value at infinite sample size but confi-dence intervals cannot be constructed using the standard erroras finite sample sizes.A similar issue affects the estimator of the variance itself.Even though the variance is well-defined for µ > , the vari-ance on the estimator of the variance is Var[ S ] = 1 M (cid:18) m − M − M − σ (cid:19) , (17)where m is the fourth-order central moment of P A ( A ) ,which diverges for distributions of leading-order exponent µ ≤ , leading to a divergent uncertainty on the S estima-tor of the variance. IV. TAIL-REGRESSION ESTIMATOR
As outlined in the previous section, the uncertainty on thestandard estimator of a moment of a probability distributioninvolves the estimator of a higher-order moment, which mightbe divergent even though the moment of interest is well-defined. We therefore propose an alternative estimator of themoment of a heavy-tailed probability distribution that exploitsknowledge of its analytical asymptotic form to yield well-defined confidence intervals whenever the moment itself iswell defined.Without loss of generality we focus on distributions with aright heavy tail; the extension of our analysis to distributionswith left and both left and right heavy tails is straightforward.In particular, we consider a probability distribution exhibitinga right tail of asymptotic form P A ( A ) = X n c n | A − A | − µ − n ∆ , (18)when A → ∞ , where the leading-order exponent µ and expo-nent increment ∆ > are assumed to be known analytically,as is the case for local energies, with µ = 4 and ∆ = 1 [13],and for local forces, with µ = 5 / and ∆ = 1 / [37]. Thespecific value of ∆ is not critical for the correct descriptionof the asymptote by Eq. 18, and can be assumed to be unity,but the accuracy of a truncated expansion strongly depends on ∆ . In other words, the bias incurred by choosing a subopti-mal value of ∆ can be made arbitrarily small by using a largerexpansion. In Eq. 18 { c n } are unknown coefficients, and A is an unknown parameter which is assumed to lie close to the“center” of the distribution. A. Validity of asymptote with approximate A First, we address the fact that A is an unknown nonlinearparameter in Eq. 18 and will have to be approximated. Let A c be an approximation to A such that A = A c + ε , where ε isa small error. Assuming for simplicity that ∆ − is an integerand expanding to first order in ε we find P A ( A ) = X n c n | A − A c − ε | − µ − n ∆ ≈ X n c n | A − A c | − µ − n ∆ + ε X n c n ( µ + n ∆) | A − A c | − µ − n ∆ − = X n c ′ n | A − A c | − µ − n ∆ = P ′ A ( A ) , (19)where c ′ n = (cid:26) c n , n < ∆ − c n + εc n − ∆ − ( µ + n ∆ − , n ≥ ∆ − . (20)The asymptotic expression P ′ A ( A ) has the same form as P A ( A ) , albeit with modified coefficients c ′ n for n ≥ ∆ − .We shall therefore proceed with the derivation of our estima-tor using P ′ A ( A ) as the asymptote, dropping the primes fromthe notation for clarity. This effectively amounts to replacing A with A c , which in practice we set to the sample median. B. Estimator
In order to develop our estimator, we start by assuming thatthere exists a threshold A R such that for A > A R the prob-ability distribution is accurately represented by an expansionof order n R , P A ( A ) = n R X n =0 c n | A − A c | − µ − n ∆ , A > A R . (21)The integral of Eq. 13 can be partitioned at A R into centraland right-tail contributions, h A i = Z A R −∞ P A ( A ) A d A + Z ∞ A R P A ( A ) A d A . (22) Let { A m } be a sample of M independent random variablesidentically distributed according to P A ( A ) , { A ( m ) } the corre-sponding order statistics, i.e. , the re-indexed version of { A m } such that A (1) > A (2) > . . . > A ( M ) , M C the number of datain the central region A < A R , and M R = M − M C the num-ber of data in the tail. We define the tail-regression estimator of h A i as A = 1 M X m>M R A ( m ) + n R X n =0 c n Z ∞ A R | A − A c | − µ − n ∆ A d A . (23)The integrals in Eq. 23 are nondivergent for µ > and can beevaluated analytically, Z ∞ A R | A − A c | − µ − n ∆ A d A == (cid:20) | A R − A c | − µ − n ∆+2 µ + n ∆ − A c | A R − A c | − µ − n ∆+1 µ + n ∆ − (cid:21) . (24)The parameters { c n } in Eq. 23 can be obtained by regressionof { A ( m ) } M R m =1 to Eq. 21. Since A is linear in { c n } , the distri-bution of A follows that of { c n } . This implies that if the re-gression coefficients are asymptotically normally distributed, A will also be asymptotically normally distributed. We willaddress the distribution of regression coefficients in sectionIV D. We regard A R , and n R as external parameters that wedeal with separately, see section IV E, and do not contribute tothe uncertainty on A .Analogously, we define the tail-regression estimators of thenorm, W = M C M + n R X n =0 c n Z ∞ A R | A − A c | − µ − n ∆ d A , (25)and of the variance of the distribution, V = 1 M − X m>M R ( A ( m ) − A ) + n R X n =0 c n Z ∞ A R | A − A c | − µ − n ∆ ( A − A ) d A . (26)These integrals can likewise be evaluated analytically. Finally,we note that the threshold A R must lie on the midpoint be-tween two adjacent sample points, (cid:0) A ( M R ) + A ( M R +1) (cid:1) , toensure that the central contributions have the correct weight inEqs. 23, 25, and 26.We use the bootstrap method [38] to compute the uncer-tainty on A . We generate n bs resamples of { A m } with re-placement, that is, the i th resample { A [ i ] m } contains M ele-ments from { A m } drawn at random and uniformly, allowingrepetitions. For each resample we evaluate A [ i ] , and we eval-uate the uncertainty on A as the standard deviation of the val-ues of {A [ i ] } . Estimates of other statistical parameters arisingfrom analysis of { A m } , including W and V , are also obtainedin this process. In our applications of the tail-regression esti-mator we use n bs = 4096 bootstrap resamples, which providea . uncertainty on the estimated uncertainties assumingnormality. The computational cost of this approach is propor-tional to M × n bs . C. Tail form in yx scale The framework for our tail-regression procedure is inspiredby regression-based tail-index estimation methods [27, 28],discussed in Appendix A. First, note that the complemen-tary cumulative distribution function ¯ F A ( A ) associated with P A ( A ) should be approximately equal to the sample quantiles q m , ¯ F A ( A ) = Z ∞ A ( m ) P A ( A ) dA ≈ q m , (27)for which we use the symmetric form q m = m − / M . Substi-tuting Eq. 21 into Eq. 27 yields n R X n =0 c n µ + n ∆ − (cid:12)(cid:12)(cid:12) A ( m ) − A c (cid:12)(cid:12)(cid:12) − µ − n ∆+1 = q m , (28)which we rearrange as q m (cid:12)(cid:12)(cid:12) A ( m ) − A c (cid:12)(cid:12)(cid:12) µ − = n R X n =0 c n µ + n ∆ − (cid:12)(cid:12)(cid:12) A ( m ) − A c (cid:12)(cid:12)(cid:12) − n ∆ . (29)We define y m = q m (cid:12)(cid:12)(cid:12) A ( m ) − A c (cid:12)(cid:12)(cid:12) µ − ,x m = | A ( m ) − A c | − ∆ , (30)which we refer to as “ yx scale”, under which Eq. 29 reads y m = n R X n =0 c n µ + n ∆ − x nm , (31)that is, y is simply a polynomial in x . By construction, y must be positive and tend to a finite value as x → , y (0) = c / ( µ − , and the n th derivative of y ( x ) at x = 0 is likewiseproportional to c n .It is useful to inspect the basic properties of the yx scale wehave introduced. For illustration purposes, let H µ ( A ) = µ sin πµ π
11 + | A | µ , (32)which for µ > is a normalized probability distributionwhose expectation value is zero and has the asymptote | A | − µ as | A | → ∞ . In Fig. 1 we show a yx -scale plot of the lefttail of M independent random numbers identically distributedaccording to H ( A ) at different sample sizes M , assuming ∆ = 1 . The exact value y (0) = 0 . is shown as a short-dashed line in each panel. The first quantile q = 1 / (2 M ) ,corresponding to the largest value of A in the sample, getssmaller as M increases, and, as follows from Eq. 30, the ex-treme point ( x , y ) satisfies y = q x (1 − µ ) / ∆1 . This curve . . . . . . . . M = 10 q m (cid:12)(cid:12) A ( m ) − A c (cid:12)(cid:12) µ − M = 10 (cid:12)(cid:12) A ( m ) − A c (cid:12)(cid:12) − M = 10 FIG. 1. yx -scale plot of the left tail of M independent randomnumbers identically distributed according to H ( A ) for sample sizes M = 10 (top), (middle), and (bottom). Shaded areasaround curves correspond to the . and . confidence in-tervals obtained from the bootstrap. The short-dashed lines indicatethe analytical value of y (0) , and the long-dashed lines are the first-quantile lines, y = / M x − , marking the region where the largestvalue of A falls at each sample size. determines how far left the plot extends, as shown by the long-dashed line in each of the panels of Fig. 1. With increasing M the plot is populated from right to left, gradually producing abetter resolved curve near x = 0 .In Fig. 2 we show yx plots of the M = 10 sample usedin Fig. 1 in which the yx scale is defined using the exponents µ ′ = 3 , , and . If the true exponent is underestimated, . . . . q m (cid:12)(cid:12) A ( m ) − A c (cid:12)(cid:12) µ ′ − (cid:12)(cid:12) A ( m ) − A c (cid:12)(cid:12) − µ ′ = µ ′ = 4 µ ′ = FIG. 2. yx -scale plot of the left tail of independent randomnumbers identically distributed according to H ( A ) using the cor-rect leading-order tail exponent µ ′ = µ = 4 and incorrect values µ ′ = 3 and , which respectively go to zero and diverge as x → .Shaded areas around curves correspond to the . and . con-fidence intervals obtained from the bootstrap. The short-dashed lineindicates the analytical value of y (0) . µ ′ < µ , the curve goes to zero as x → , while overestimat-ing the exponent, µ ′ > µ , yields a diverging curve at x → .Attempting to use yx -scale plots to estimate µ is rudimentaryat best, since each possible value must be tested explicitly, andcare must be taken in interpreting the large statistical fluctua-tions of y at small x . Tail-index estimation methods [28, 29]offer a robust alternative; the regression method of Ref. [28]produces a reasonable value of µ = 3 . from the dataplotted in Fig. 2. Note that the tail-regression estimator re-lies on analytical knowledge of µ rather than on its estimationfrom the sample. D. Tail regression and weights
We now focus on the regression procedure to ensure a reli-able estimation of the fit parameters. It is apparent from Fig. 1that the distribution of y values at small x about the exact y (0) is skewed towards large y , so adequate least-squares weightsare needed to ensure the faithfulness of the resulting fit.Let r m = w m [ y m − y ( x m )] be the least-square residuals,where w m is the least-squares weight applied to the m th point,and χ = M − n R − P Mm =1 r m be the least-squares function.Minimizing χ with respect to the fit parameters { c n } yieldsthe linear system of equations T c = b , where ( T ) pq = 1 M M X m =1 w m x p + qm , ( b ) p = 1 M M X m =1 w m y m x pm , ( c ) p = c p , (33)and indices p and q run between and n R . The parametervector is thus c = T − b = 1det( T ) adj( T ) b , (34)where adj( T ) is the adjugate matrix of T ( i.e. , the transposeof its cofactor matrix) and det( T ) is its determinant.The parameters ( c ) p will be asymptotically normally dis-tributed if ( T ) pq , ( b ) p , and det( T ) are themselves asymptot-ically normally distributed and det( T ) is nonzero, since Eq.34 involves sums and multiplications, which preserve asymp-totic normality, and a division, which also preserves asymp-totic normality if the denominator is strictly nonzero.We consider weights of the form w m = | A ( m ) − A c | − γ ( µ − , (35)where γ is a positive constant. Since x m is a negative power of | A ( m ) − A c | , all elements of T and det( T ) are asymptoticallynormally distributed, but the elements of b need not be. Wefocus on the m = 1 contribution to ( b ) , which is the leastlikely to exhibit asymptotic normality, w y = q (cid:12)(cid:12)(cid:12) A (1) − A c (cid:12)(cid:12)(cid:12) ( µ − − γ ) . (36) Noting that w = q γ y − γ and that the exact asymptotic valueof y as M → ∞ is by construction y (0) = c / ( µ − , theasymptotic limit of w y is ξ = q γ (cid:20) c µ − (cid:21) − γ , (37)where q = 1 / (2 M ) encodes the sample-size dependence.We now investigate the distribution of values of w y about ξ . The probability that | A (1) − A c | is bounded from aboveby α is the probability that the M points in the sample arebounded from above by A c + α , that is, Prob( | A (1) − A c | ≤ α ) =1 − ¯ F | A (1) − A c | ( α )= (cid:2) − ¯ F A ( A c + α ) (cid:3) M ≈ (cid:0) − c α − µ +1 (cid:1) M , (38)to leading order for large α . Differentiating with respect to α yields the probability distribution of the extreme value, P | A (1) − A c | ( α ) = M c ( µ − α − µ (cid:0) − c α − µ +1 (cid:1) M − , (39)and by a change of variable, P w y /ξ ( β ) = µ − − γ ) β − − γ − γ (cid:20) − µ − M β − − γ (cid:21) M − , (40)assuming γ = 1 . At large β , P w y /ξ ( β ) ∼ β − − γ − γ , (41)which is a power-law tail for γ < , yielding an undefinedexpectation value for γ ≤ / . Unweighted fits ( γ = 0 ) aretherefore numerically ill-conditioned since the residuals arethemselves heavy tailed ∼ β − regardless of the value of µ .Fit weights with γ > / must therefore be used. . . P w y / ξ ( β ) ( a r b . un it s ) β γ = 0 γ = 0 . γ = 1 . γ = 1 . FIG. 3. Probability distribution of the values of w y /ξ , Eq. 40, forvarious values of γ , using µ = 4 and M = 10 . In Fig. 3 we plot P w y /ξ ( β ) for various values of γ . Notethat the curve for γ = 0 describes the distribution of valuesof y in Fig. 1 relative to the analytical value of y (0) alongthe constant- q lines. The plots for γ = 0 . and . revealthat, while not ill-conditioned, the significant skewness of thedistributions makes the first moment of P w y /ξ ( β ) differ sig-nificantly from the asymptotic expectation value of in thesecases. For γ = 1 , w y = q γ is y -independent, and the dis-tribution is therefore a delta function peaked at the asymptoticexpectation value, as shown in Fig. 3. We therefore use γ = 1 for our fit weights.We empirically find it advantageous to include a q m -dependent factor in the fit weights so as to ensure the con-tinuity of the fit to the probability distribution near A R . Wechoose this factor to be the weights corresponding to the for-mulation of the Hill estimator of the first-order tail index [27]as a regression estimator [28], see Eq. A3 in Appendix A.Therefore, our full fit weights are w m = (cid:18) log q M R +1 q m (cid:19) − | A ( m ) − A c | − µ +1 . (42) E. Selecting n R and A R The tail-regression estimator depends parametrically on theexpansion order n R and threshold A R . In our tests we try sev-eral thresholds and converge the fit with respect to the expan-sion order at each of them. We then choose the value of A R which minimizes the uncertainty on either V , if well-defined,or on A .We choose the expansion order heuristically by findingplateaus in W , A , V , and χ as a function of n R , and select-ing the smallest expansion order at which these four functionshave converged. To ensure correctness, we further require that W ≈ and y ( x ) > within the fit range, and we restrict n R ≥ / ∆ in order to “absorb” the error incurred by approx-imating A by A c , as explained in Section IV A.Note that we do not set A R directly, but instead set q R =( M R − / /M , as keeping the number of sample points ineach partition fixed across bootstrap resamples eliminates thevariation of the central contribution to W , which is statisti-cally advantageous. We choose our values of q R using a gridof equally-spaced values of − log q R . We pick n R and A R using n bs = 256 , and evaluate the final result separately with n bs = 4096 to avoid selection bias. We illustrate the proce-dure for choosing n R and A R in section V A. F. Two-tailed distributions, symmetry, and constraints
The tail-regression estimator described so far can be mod-ified trivially for distributions with left and right heavy tails.For simplicity we use the same expansion orders and thresh-olds on both tails, n R = n L and M R = M L .In some important cases, including the local energy and lo-cal atomic force in VMC [13, 17], the leading-order coeffi-cients of the left and right tails, c L0 and c R0 , are equal. Thiscan be exploited by unifying the regression step for both tailsand imposing the constraint c L0 = c R0 = c . The leading order contribution to A from the tails is c Z A c − A L A R | A − A c | − µ A d A . (43)The exact cancellation of part of the left- and right-tail con-tributions to A should provide a substantial reduction to itsuncertainty. The effect on the uncertainty on V of enforcingsymmetry can be expected to be marginal since both tails con-tribute positively in this case.As implied by Eq. 20, due to the approximation A ≈ A c constraints must not be applied to parameters c n with n ≥ / ∆ . For example, even if a distribution with ∆ = 1 is knownto analytically satisfy c L1 = c R1 , the values of the c parame-ters on each tail must be allowed to differ to account for theerror in A c . In the tests carried out in our present work weuse at most one constraint, and we use the labels “TRE” and“TRE(1)” in the plots in sections V, VI, and VII to distinguishthe unconstrained and constrained estimators.The use of constraints allows for the interesting possibilityof estimating the expectation value of distributions with <µ ≤ for which h A i is formally divergent. In this case weredefine the expectation value as the Cauchy principal valueof the integral in Eq. 13 with respect to A , h A i = lim a →∞ Z A + aA − a P A ( A ) A d A , (44)ensuring that the divergent leading-order contributions cancelout due to symmetry. We present an example of this in sectionV C.
V. APPLICATION TO MODEL DISTRIBUTIONS
In this section we apply the tail-regression estimator to syn-thetic data. We construct the seed model distributions as linearcombinations of H µ ( A ) , defined in Eq. 32, to study variouscases of interest. A. Distribution with undefined fourth moment
Distributions with < µ ≤ have convergent standardestimators for the expectation value and variance, but the un-certainty on the standard estimator of the variance is diver-gent. To exemplify this case we choose to analyze P A ( A ) = H . ( A ) + H . ( A ) , which has a leading-order tail expo-nent of µ = 3 . , close to the lower limit of , and ∆ = 1 . Theanalytical variance of this distribution is σ = 4 . , and P A ( A ) satisfies the analytical limit y ∼ . . x as x → .The regression of the tails of this model distribution isdemonstrated in Fig. 4 using a sample of random vari-ables. The top panel shows the probability distribution esti-mated by convolving the data with a variable-width Gaussiankernel, and the lower panels show yx plots of the data. The tailfits are shown in the three panels as thick lines, for which we − − − −
20 0 20 4000 . . . . . . P A ( A ) A q m (cid:12)(cid:12) A ( m ) − A c (cid:12)(cid:12) µ − (cid:12)(cid:12) A ( m ) − A c (cid:12)(cid:12) − Left tail (cid:12)(cid:12) A ( m ) − A c (cid:12)(cid:12) − Right tail
FIG. 4. Application of the tail-regression procedure to a sam-ple of random variables distributed according to P A ( A ) = H . ( A ) + H . ( A ) . The top panel shows the estimated prob-ability distribution, and the lower panels show yx plots for each tail.Fits are shown as thick lines in the three panels, and the analyticalasymptotic form of the tail is shown as a dashed line in each of thebottom panels. . and . confidence intervals obtained fromthe bootstrap are shown as shaded areas. use n R = 3 and − log q R = 2 . , and the constraint c L0 = c R0 has been imposed.The process of selecting n R at fixed threshold is illustratedin Fig. 5, where W , A , V , and χ are plotted as a function of n R . These functions converge relatively quickly with n R , butfor large expansion orders overfitting becomes an issue. Thiscan be easily spotted by the significant jump in the uncertaintyon the estimators, which in Fig. 5 occurs at n R = 7 . In thiscase we consider n R = 3 to be the “optimal” expansion order.In Fig. 6 we plot the uncertainty on V as a function ofthe threshold for each considered expansion order. The un-certainty on V is largely monotonic in n R at fixed thresholdand in − log q R at fixed expansion order. Figure 6 shows thatthe “optimal” values of the expansion order and threshold are n R = 3 and − log q R = 2 . , as used in Fig. 4.In Table I we report the results obtained from the standardand tail-regression estimators of h A i and σ for sample sizesranging from to , which we plot in Figs. 7 and 8.For the tail-regression estimator we report results both withoutthe use of constraints and imposing c L0 = c R0 . The “optimal”values of n R and − log q R are not significantly sample-sizedependent; we find that n R increases when − log q R decreasesand vice versa, as could be expected.The estimators of h A i are within uncertainty of the exactvalue of zero at all sample sizes, but A has an uncertaintyabout smaller than ¯ A . The uncertainty on the standardestimator of the variance S is nonmonotonic, as expected,and in this case confidence interval sizes are severely under-estimated, causing the false impression that S converges to . . − . − . . − − − − W A V χ n R FIG. 5. Tail-regression estimator of the zeroth, first, and secondmoment of P A ( A ) = H . ( A ) + H . ( A ) and χ as a functionof expansion order obtained using a sample of random numbersand a threshold of − log q R = 2 . . The “optimal” n R = 3 is thatat which all of these functions reach their respective plateaus. . . U n ce r t a i n t yon V − log q R FIG. 6. Uncertainty on the tail-regression estimator of σ ob-tained from random variables distributed according to P A ( A ) = H . ( A ) + H . ( A ) as a function of threshold − log q R . Eachcurve corresponds to a different expansion order n R , as labelled.Combinations of − log q R and n R that fail to satisfy our correct-ness criteria are not shown, and narrow fails are shown as crossed-out points. The “optimal” choice of threshold and expansion order,marked with a circle, is that which minimizes the uncertainty on V ,corresponding to − log q R = 2 . and n R = 3 in this case. an incorrect value with increasing M . By contrast, V remainswithin uncertainty of σ at all M , and its uncertainty de-creases monotonically with sample size. The unconstrainedand constrained estimators give indistinguishable results inthis case, and the uncertainty on both seems to asymptotically Standard TRE (unconstrained) TRE (1 constraint) M ¯ A S n R − log q R A V n R − log q R A V . . .
25 0 . . .
00 0 . . − . . . − . . .
50 0 . . . . .
50 0 . . .
50 0 . . − . . . − . . . − . . − . . . − . . . − . . . . .
75 0 . . .
75 0 . . Exact . . . . . . TABLE I. Standard and tail-regression estimators of h A i and σ for model distribution P A ( A ) = H . ( A ) + H . ( A ) obtained fromrandom samples of various sizes M . The “optimal” expansion orders n R and thresholds − log q R used for the tail-regression estimator in eachcase are also shown. − . . − − − − E s ti m a t o r o f h A i (a) U n ce r t a i n t y M StdTRETRE(1) (b)
FIG. 7. Convergence of (a) the estimators of h A i and (b) theiruncertainties as a function of sample size M for model distribution P A ( A ) = H . ( A ) + H . ( A ) . The exact value h A i = 0 ismarked with a dashed line in (a), and dashed lines proportional to M − / passing through the last point of each of the tail-regressionestimator curves are shown in (b) as guides to the eye. . confi-dence intervals are shown as shaded areas. decay as M − / . B. Distribution with undefined second moment
Distributions with < µ ≤ have a divergent vari-ance, and the standard estimator of h A i has an undefineduncertainty. We exemplify this case with model distribution P A ( A ) = H . ( A ) + H . ( A ) , which has µ = 2 . , closeto the lower limit of , and ∆ = 1 .The standard and tail-regression estimators of h A i are givenin Table II and plotted in Fig. 9 as a function of sample size M . As expected, the standard estimator ¯ A hovers aroundthe exact value h A i = 0 but its uncertainty does not decreaseuniformly with sample size. The tail-regression estimator pro-vides a monotonically decreasing uncertainty with an approx-imate asymptotic decay proportional to M − / , and imposingthe constraint c L0 = c R0 yields an order of magnitude smalleruncertainties than the unconstrained estimator does. − − E s ti m a t o r o f σ (a) U n ce r t a i n t y M StdTRETRE(1) (b)
FIG. 8. Convergence of (a) the estimators of σ and (b) their un-certainties as a function of sample size M for model distribution P A ( A ) = H . ( A ) + H . ( A ) . The exact value σ = 4 . is marked with a dashed line in (a), and dashed lines proportional to M − / passing through the last point of each of the tail-regressionestimator curves are shown in (b) as guides to the eye. . confi-dence intervals are shown as shaded areas. C. Symmetric distribution with undefined first moment
Distributions with < µ ≤ have a divergent expectationvalue, and the standard estimator of h A i is undefined. How-ever if the tails of the distribution are symmetric to leadingorder it is possible to redefine h A i as a Cauchy principal valuewhich can be estimated, see Eq. 44. We exemplify this casewith model distribution P A ( A ) = H . ( A ) + H . ( A ) ,which has µ = 1 . , close to the lower limit of , and ∆ = 1 .Results using the constrained tail-regression estimator aregiven in Table III and plotted in Fig. 10. We find that A is within uncertainty of the exact value of zero at all samplesizes, and again the uncertainty on A appears to be asymptot-ically proportional to M − / . In Table III we also give theorder of magnitude of the computed sample mean to highlightthat this example is absolutely intractable with the standardestimator.0 Standard TRE (unconstrained) TRE (1 constraint) M ¯ A n R − log q R A n R − log q R A − . . − . . − . − . . − . . − . . .
10 0 . . − . − . . − . . − . − . .
30 0 . .
50 0 . − . .
70 0 . . − . Exact . . . TABLE II. Standard and tail-regression estimators of h A i for model distribution P A ( A ) = H . ( A ) + H . ( A ) obtained from randomsamples of various sizes M . The “optimal” expansion orders n R and thresholds − log q R used for the tail-regression estimator in each caseare also shown. − . . − − − − E s ti m a t o r o f h A i (a) U n ce r t a i n t y M StdTRETRE(1) (b)
FIG. 9. Convergence of (a) the estimators of h A i and (b) theiruncertainties as a function of sample size M for model distribution P A ( A ) = H . ( A ) + H . ( A ) . The exact value h A i = 0 ismarked with a dashed line in (a), and dashed lines proportional to M − / passing through the last point of each of the tail-regressionestimator curves are shown in (b) as guides to the eye. . confi-dence intervals are shown as shaded areas.Standard TRE (1 constraint) M | ¯ A | n R − log q R A ∼ . − . ∼ .
95 0 . ∼ .
00 0 . ∼ . − . ∼ . − . ∼ . − . Exact . . TABLE III. Standard and tail-regression estimators of h A i for modeldistribution P A ( A ) = H . ( A )+ H . ( A ) obtained from randomsamples of various sizes M . The “optimal” expansion orders n R and thresholds − log q R used for the tail-regression estimator in eachcase are also shown. − . . − − E s ti m a t o r o f h A i (a) U n ce r t a i n t y M TRE(1) (b)
FIG. 10. Convergence of (a) the constrained tail-regression esti-mator of h A i and (b) its uncertainty as a function of sample size M for model distribution P A ( A ) = H . ( A ) + H . ( A ) . The exactvalue h A i = 0 is marked with a dashed line in (a), and a dashedline proportional to M − / passing through the last point of thetail-regression estimator curve is shown in (b) as a guide to the eye. . confidence intervals are shown as shaded areas. VI. APPLICATION TO VARIATIONAL QUANTUMMONTE CARLO DATA
In this section we explore the performance of the tail-regression estimator on data obtained from VMC calculations.Local values of observables generated using the VMC methodare usually affected by serial correlation due to the use of theMetropolis algorithm to sample configuration space. In ourVMC calculations we perform up to
Metropolis steps be-tween consecutive evaluations of the local values of the targetobservables so that these can be considered to be indepen-dent random variables, and we have verified that the resultingdatasets exhibit negligible serial correlation.
A. The energy of the homogeneous electron gas
The homogeneous electron gas is an ideal test bed formethodological developments in QMC. We perform VMC1calculations on the paramagnetic 54-electron gas in a cubicsimulation cell at density r s = 1 using the Slater-Jastrow (SJ)wave function, consisting of the product of up- and down-spin Slater determinants of the plane-waves with the small-est momenta compatible with the periodicity of the simula-tion cell multiplied by a Jastrow correlation factor, Ψ SJ ( R ) =e J ( R ) D ↑ ( R ↑ ) D ↓ ( R ↓ ) . Our Jastrow factor consists of anisotropic electron-electron term of the Drummond-Towler-Needs form [7, 39]. We use the CASINO code [40] to gen-erate samples of local energies whose distribution, as detailedin section II A, has left and right heavy tails of principal expo-nent µ = 4 , ∆ = 1 , and equal left- and right-tail leading-ordercoefficients [13]. As explained in section IV F, constraints in-volving c n with n ≥ cannot be applied since A is beingapproximated by A c .In Fig. 12 we plot the probability distribution estimatedfrom local energies, yx plots of the tails, and the corre-sponding tail fits. The standard and tail-regression estimators . . − − − − − E s ti m a t o r o f σ ( a . u . ) (a) U n ce r t a i n t y ( a . u . ) M StdTRETRE(1) (b)
FIG. 11. Convergence of (a) the estimators of σ and (b) their un-certainties as a function of sample size M for the VMC local energyof a 54-electron gas at r s = 1 using the SJ wave function. Our bestestimate of the value of the variance of the local energy for this sys-tem, σ ≈ . a.u., is marked with a dashed line in(a), and dashed lines proportional to M − / passing through the lastpoint of each of the tail-regression estimator curves are shown in (b)as guides to the eye. . confidence intervals are shown as shadedareas. of h A i and σ , given in Table IV and plotted in Fig. 11 asa function of sample size, are in good agreement with eachother. The A estimator offers no advantage over ¯ A in thiscase, but the uncertainty on S exhibits nonmonotonicity asa function of M , while that in V is monotonic, significantlysmoother, and up to 45% smaller. Note that even though thenominal standard confidence interval on S only shows minorsigns of ill behavior in this example, it is formally undefined,while the tail-regression estimator produces valid confidenceintervals. This is of potential practical importance in wavefunction optimization and variance extrapolation. − . . . . . P A ( A ) A q m (cid:12)(cid:12) A ( m ) − A c (cid:12)(cid:12) µ − (cid:12)(cid:12) A ( m ) − A c (cid:12)(cid:12) − Left tail × − (cid:12)(cid:12) A ( m ) − A c (cid:12)(cid:12) − Right tail
FIG. 12. Application of the tail-regression procedure to a sampleof VMC local energies for the 54-electron gas at r s = 1 usingthe SJ wave function. The top panel shows the estimated probabilitydistribution, and the lower panels show yx plots of both tails. Fitsare shown as thick lines in the three panels. . and . con-fidence intervals obtained from the bootstrap are shown as shadedareas. B. The atomic force in the C molecule We turn our attention to the atomic force in the all-electroncarbon dimer. The C molecule is of particular interest dueto its strong multi-reference character that makes the single-determinantal wave function incur a large nodal error, whichought to provide relatively strong heavy tails in the local Pulayand zero-variance force distributions.We generate Hartree-Fock orbitals for the all-electron car-bon dimer at an off-equilibrium (compressed) bond length of r CC = 2 . a.u. (the experimental equilibrium bond length ofC is . a.u. [41]) using the relatively modest cc-pvdzbasis set [42, 43] with the MOLPRO code [44]. We combinethese orbitals, modified to satisfy the Kato cusp conditionsat electron-nucleus coalescence points [34], with a Jastrowfactor containing electron-electron, electron-nucleus, andelectron-electron-nucleus terms of the Drummond-Towler-Needs form [7, 39], to form the trial wave function for VMC.Throughout the VMC run, performed with the
CASINO code[40], we collect local values of the components of the forceon one of the carbon atoms along the molecular axis in thedirection away from the other atom.We first focus on the Pulay force which, as discussed inSection II B, follows a heavy tailed distribution with µ = 5 / and ∆ = 1 / due to the nodal error in the trial wave function.In Fig. 13 we show yx plots of the tails of the local Pulay forceat sample sizes M = 10 , , and , along with plots of thecorresponding “optimal” fits. Despite having chosen a systemknown to exhibit a large nodal error, we find that the leading-order heavy tails are relatively weak, and that it takes sample2 Standard TRE (unconstrained) TRE (1 constraint) M ¯ A S n R − log q R A V n R − log q R A V . . .
25 0 . . .
50 0 . . . . .
50 0 . . .
75 0 . . . . .
00 0 . . .
25 0 . . . . .
25 0 . . .
25 0 . . . . .
00 0 . . .
25 0 . . . . .
25 0 . . .
25 0 . . TABLE IV. Standard and tail-regression estimators of h A i and σ for the VMC energy of the 54-electron gas at r s = 1 using the SJ wavefunction obtained from local energy samples of various sizes M . The “optimal” expansion order n R and threshold − log q R used for thetail-regression estimator in each case are also shown. . . . . Left tail M = 10 Right tail q m (cid:12)(cid:12) A ( m ) − A c (cid:12)(cid:12) µ − M = 10 (cid:12)(cid:12) A ( m ) − A c (cid:12)(cid:12) − / M = 10 (cid:12)(cid:12) A ( m ) − A c (cid:12)(cid:12) − / FIG. 13. yx plots of the left (left) and right (right) tails of the VMClocal Pulay force on a carbon atom along the molecular axis of theC molecule at r CC = 2 a.u. at sample sizes M = 10 (top), (middle), and (bottom). Fits are shown as thick lines in the threepanels. . and . confidence intervals obtained from thebootstrap are shown as shaded areas. sizes of M & to resolve the nonzero value of y (0) . Asa result, the uncertainty on the standard estimator of h A i islikely to only exhibit nonconvergent behavior at large samplesizes.We plot the convergence of the standard and tail-regressionestimators of h F P i in Fig. 14. As expected, the standard es-timator ¯ A seems well behaved at small sample sizes, but at M = 10 the standard error presents a substantial nonmono-tonic jump. The uncertainty obtained with the tail-regressionestimator remains smooth and monotonic, and is ultimatelysmaller than that of the standard estimator at M = 10 .We find that the local zero-variance corrected Hellmann-Feynman force, F HFT + F ZV , and the local zero-variance cor-rected total force, F HFT + F P + F ZV , exhibit similarly weakleading-order tails, and the uncertainties in the standard and . . . . . − − − E s ti m a t o r o f h A i ( a . u . ) (a) U n ce r t a i n t y ( a . u . ) M StandardTRE (b)
FIG. 14. Convergence of (a) the estimators of h A i and (b) their un-certainties as a function of sample size M for the VMC local Pulayforce on a carbon atom along the molecular axis of the C moleculeat r CC = 2 a.u.. The best value of the Pulay force of . a.u.is marked with a dashed line in (a), and a dashed line proportional to M − / passing through the last point of the tail-regression estima-tor curve is shown in (b) as a guide to the eye. . confidenceintervals are shown as shaded areas. tail-regression estimators follow convergence patterns similarto those depicted in Fig. 14.Without the zero-variance correction, the heavy tails affect-ing the distribution of the local Hellmann-Feynman force arevery strong. As detailed in Section II B, these tails are causedby the presence of all-electron nuclei and the left and righttails have equal leading-order coefficients. In Fig. 15 we plotthe probability distribution of F HFT for the off-equilibriumcarbon dimer estimated from sample points and the corre-sponding yx plots of the left and right tails of the distribution.The value of y (0) is very large relative to the rest of the func-tion, and this is the only case among those we have consideredin which the slope of the yx plot is markedly negative at theorigin. Indeed, it can be shown that the electron-nucleus cuspcondition causes the c coefficient in the asymptotic form of F HFT to be approximately proportional to c with a large, neg-ative prefactor.The standard and tail-regression estimators of the expecta-tion value of the Hellmann-Feynman force are given in Table3 − − − − × − × × × .
05 0 .
10 0 0 .
05 0 . P A ( A ) A q m (cid:12)(cid:12) A ( m ) − A c (cid:12)(cid:12) µ − (cid:12)(cid:12) A ( m ) − A c (cid:12)(cid:12) − / Left tail (cid:12)(cid:12) A ( m ) − A c (cid:12)(cid:12) − / Right tail
FIG. 15. Application of the tail-regression procedure to a sample of VMC local Hellmann-Feynman forces on a carbon atom alongthe molecular axis of the C molecule at r CC = 2 a.u.. The top panelshows the estimated probability distribution, and the lower panelsshow yx plots of both tails. Fits are shown as thick lines in the threepanels. . and . confidence intervals obtained from thebootstrap are shown as shaded areas. V and plotted in Fig. 16 as a function of sample size. The un- − − − − E s ti m a t o r o f h A i ( a . u . ) (a) U n ce r t a i n t y ( a . u . ) M StandardTRETRE(1) (b)
FIG. 16. Convergence of (a) the estimators of h A i and (b) theiruncertainties as a function of sample size M for the VMC localHellmann-Feynman force on a carbon atom along the molecular axisof the C molecule at r CC = 2 a.u.. The best value of the Hellmann-Feynman force of − . a.u. is marked with a dashed line in(a), and dashed lines proportional to M − / passing through the lastpoint of each of the tail-regression estimator curves are shown in (b)as guides to the eye. . confidence intervals are shown as shadedareas. certainty on the standard estimator is clearly nonmonotonic,while that in the tail-regression estimator is smooth and ap-pears to present an M − / asymptotic decay. The uncertainty on the tail-regression estimator is up to times smaller thanthe nominal uncertainty on the standard estimator, of whicha factor of . is thanks to imposing the analytical constraint c L0 = c R0 .The results obtained for the different force components atthe largest considered sample size of M = 10 are given inTable VI. The nominal uncertainty on the standard estima-tor of h F HFT + F ZV i is an order of magnitude smaller thanthat in the constrained tail-regression estimator of h F HFT i ,and in this sense the tail-regression estimator is less effec-tive than the zero-variance correction. This is however some-what misleading since the uncertainty on the standard esti-mator remains formally ill-defined, while the tail-regressionestimator is asymptotically normally distributed. In any case,the tail-regression estimator of h F HFT + F ZV i yields a lower uncertainty than the standard estimator, which is equiv-alent to a factor-of-two reduction in the number of samplepoints required to achieve a target uncertainty, showing thatthe combination of variance-reduction techniques with thetail-regression estimator is advantageous. Similarly, the nom-inal uncertainties on the Pulay force and on the zero-variancecorrected total force are significantly reduced by replacing theill-defined standard estimator with the tail-regression estima-tor at this sample size. VII. APPLICATION TO DIFFUSION QUANTUM MONTECARLO DATA
At each post-equilibration step of a DMC calculation,an ensemble of walkers represents the mixed distribution Φ( R )Ψ( R ) , where Φ( R ) is the DMC wave function and Ψ( R ) is the trial wave function. These walkers carry variableweights which in turn trigger death and branching events. Thelocal values Ψ − ( R ) ˆ A Ψ( R ) of an observable ˆ A are evaluatedfor each walker at each step, and the weighted average of theresulting sample yields the mixed estimator of the expectationvalue, h A i = R Φ( R ) ˆ A Ψ( R ) d R R Φ( R )Ψ( R ) d R . (45)Note that one in principle seeks the pure estimator, h A i = R Φ( R ) ˆ A Φ( R ) d R R Φ( R )Φ( R ) d R , (46)but the mixed estimator is simpler to obtain, it is equal to thepure estimator if ˆ A commutes with the Hamiltonian of thesystem, and for other observables there are ways of approx-imating pure estimators using mixed estimators [6]. We willrestrict our discussion and tests to mixed estimators.DMC samples differ from VMC samples in importantways. The formalism presented in section IV can be triviallyaltered to accommodate weights, simply by replacing the sam-ple quantiles q m = m − / M with q m = P l : A l ≥ A m p l − p m / P l p l , (47)4 Standard TRE (unconstrained) TRE (1 constraint) M ¯ A n R − log q R A n R − log q R A − . − .
90 0 . − . − . . − . − . .
80 0 . .
80 0 . − . . − . . − . − . . − . . − . − . . − . . − . Best − . − . − . TABLE V. Standard and tail-regression estimators of h A i for the VMC Hellmann-Feynman force on a carbon atom along the molecular axisof the C molecule at r CC = 2 a.u., obtained from local force samples of various sizes M . The “optimal” expansion orders n R and thresholds − log q R used for the tail-regression estimator in each case are also shown. The “best” value is provided for reference and corresponds to thetail-regression estimator of h F HFT + F ZV i using sample points.Standard TRE TRE(1) h F HFT i − . − . − . h F HFT + F ZV i − . − . h F P i . . h F HFT + F ZV + F P i . . TABLE VI. Standard and tail-regression estimators of h A i forthe VMC local Hellmann-Feynman force, zero-variance correctedHellmann-Feynman force, Pulay force, and zero-variance correctedtotal force on a carbon atom along the molecular axis of the C molecule at r CC = 2 a.u., obtained from sample points. where p m is the unnormalized weight of the m th sample point.Walker branching events involve walkers being duplicatedand its copies then evolving independently. This causes acomplex pattern of serial correlation which cannot be elim-inated entirely by leaving several steps between consecutiveevaluations of the local values of observables, as we have donein our VMC calculations. While the presence of any form ofserial correlation violates our assumption that samples consistof independent and identically distributed random variables,we expect this effect to be small and ignore it in our DMCtests.The gradient of the DMC wave function Φ( R ) is in generaldiscontinuous at the nodes [45, 46]. This alters the relativepresence of walkers on either side of each nodal point, caus-ing observables whose local values diverge at the nodes, suchas the energy, to exhibit fully asymmetric heavy tails. Thelocal Hellmann-Feynmann component of the force is not af-fected by this, since its singularities do not occur at the nodesof the trial wave function, so its DMC distribution remainssymmetric to leading order as it is in VMC. A. The atomic force in the C molecule We have performed a DMC simulation of the C moleculeat the same off-equilibrium geometry and with the same wavefunction as described in Section VI B, using a time step of . a.u. [47] and a target population of walkers, and we have evaluated the local Hellmann-Feynmann and total forcesfor each walker every steps.The standard and tail-regression estimator of the Hellmann-Feynmann force are given in Table VII and plotted in Fig.17 as a function of sample size M . These results are verysimilar to their VMC counterparts; the nonmonotonic nominalstandard error is again up to times the uncertainty in thetail-regression estimator, of which a factor of . comes fromimposing the constraint c L0 = c R0 . − − − − E s ti m a t o r o f h A i ( a . u . ) (a) U n ce r t a i n t y ( a . u . ) M StandardTRETRE(1) (b)
FIG. 17. Convergence of (a) the estimators of h A i and (b) theiruncertainties as a function of sample size M for the mixed DMClocal Hellmann-Feynman force on a carbon atom along the molecularaxis of the C molecule at r CC = 2 a.u.. The best value of theHellmann-Feynman force of . a.u. is marked with a dashedline in (a), and dashed lines proportional to M − / passing throughthe last point of each of the tail-regression estimator curves are shownin (b) as guides to the eye. . confidence intervals are shown asshaded areas. The results we obtain for the total force at the largest con-sidered sample size of M = 10 are given in Table VIII. Inthis case, the uncertainty in the tail-regression estimator of thetotal force is smaller than the standard error. From thesetests we conclude that the tail-regression estimator is directlyapplicable to DMC samples generated using relatively longdecorrelation loops, with essentially the same benefits as we5 Standard TRE (unconstrained) TRE (1 constraint) M ¯ A n R − log q R A n R − log q R A . − . − − . − . . − . . . − . . − . . .
40 0 . .
40 0 . − .
30 1 . .
40 0 . . .
40 0 . .
40 0 . TABLE VII. Standard and tail-regression estimators of h A i for the mixed DMC Hellmann-Feynman force on a carbon atom along themolecular axis of the C molecule at r CC = 2 a.u., obtained from local force samples of various sizes M . The “optimal” expansion orders n R and thresholds − log q R used for the tail-regression estimator in each case are also shown.Standard TRE TRE(1) h F HFT i − . − . − . h F HFT + F ZV + F P i . . TABLE VIII. Standard and tail-regression estimators of h A i for themixed DMC local Hellmann-Feynman force and total force on a car-bon atom along the molecular axis of the C molecule at r CC = 2 a.u., obtained from sample points. have found for VMC samples. VIII. CONCLUSIONS
We have introduced a conceptually simple estimator ofexpectation values for heavy tailed probability distributionswhose power-law tail indices are known. Unlike the stan-dard estimator, the tail-regression estimator is immune to thebreakdown of the central limit theorem for distributions ofleading-order tail exponent < µ ≤ . Our regression frame-work is designed to yield asymptotically normally distributedresults, as reflected in the observed asymptotic M − / decaywith sample size M of the uncertainty in all of our tests, andsuccessfully exploits known analytical relations between lead-ing order tail coefficients to improve the estimation. We havealso demonstrated the estimation of the variance of distribu-tions of leading-order tail exponent < µ ≤ whose uncer-tainty is ill-defined under standard estimation.Our tests of the tail-regression estimator with variationaland diffusion quantum Monte Carlo data identifies two usecases of particular practical relevance. While the standardestimator yields accurate expectation values of the energy atlarge enough sample sizes, standard confidence intervals onthe VMC variance of the local energy are formally unde-fined. The tail-regression estimator is capable of deliveringvalid confidence intervals on the variance which are up to smaller than those associated with the nominal standarderror in our tests. The tail-regression estimator also yieldsvalid, convergent confidence intervals on the VMC and DMCatomic force, including the Hellmann-Feynman force in all-electron systems for which we obtain uncertainties up to times smaller than the nominal standard error. The combina- tion of the “zero-variance” variance-reduction technique withthe tail-regression estimator yields accurate confidence inter-vals on the atomic force.Our present work shows that the principles underpinningthe tail-regression estimator are robust, and systematic use ofthe technique for treating quantum Monte Carlo data would bedesirable. However, further work could improve the applica-bility of our present formulation. We have used the bootstrapto enable the evaluation of meaningful confidence intervalson a range of functions, but this approach should be replacedwith the use of a closed expression for the uncertainty on thetail-regression estimator in production calculations. In turn,this would allow the development of an “on-the-fly” reformu-lation of the method that would avoid the need to store alllocal values of the desired observables, unaveraged, for lateranalysis. Dropping the requirement that sample points be in-dependent and serially uncorrelated would also be desirablein order to reduce the computational cost of the QMC calcu-lations. With these refinements, the tail-regression estimatorwill ultimately represent a great advance in ensuring the statis-tical soundness of results obtained from quantum Monte Carloand similar methods. ACKNOWLEDGMENTS
The authors thank John Trail, Neil Drummond, and RichardNeeds for useful discussions, and acknowledge the finan-cial support of the Max-Planck-Gesellschaft, the Engineer-ing and Physical Sciences Research Council of the UnitedKingdom under Grant No. EP/P034616/1, and the Royal So-ciety. Supporting research data can be freely accessed athttps://doi.org/10.17863/CAM.37836, in compliance with theapplicable Open Data policies. Our implementation of thetail-regression estimator can also be found at [48] and is dis-tributed with the
CASINO code [40].
Appendix A: Tail-index estimation methods
Tail-index estimation methods draw inference on the prin-cipal exponent µ of a power-law heavy tail of leading-orderform P A ( A ) = c A − µ at A → ∞ . The Hill estimator [27] of6the first-order tail index, µ − , is µ − ≈ M R M R X m =1 log A ( m ) − log A ( M R +1) . (A1)It can be shown that Eq. A1 is in fact equivalent to alogarithmic-scale least-squares fit to the tail of the distribu-tion [28]. Substituting the leading-order form of P A ( A ) intoEq. 27 and taking logarithms yields log A ( m ) ≈ µ − − log q m ) + 1 µ − (cid:18) c µ − (cid:19) , (A2)which is a linear relationship between log A ( m ) and − log q m with slope µ − . Estimation of this slope by linear regression following Eq. A2 yields Eq. A1 for the fitted slope if the m thdata point is weighted by w m = (cid:18) log q M R +1 q m (cid:19) − , (A3)and c is set so that the fit passes through the ( M R + 1) thpoint. The optimal value of M R for the Hill estimator canthus be found by optimizing a goodness-of-fit measure withrespect to M R [28], such as the χ value of the fit. Regressionmethods for tail-index estimation are found to be particularlyrobust [30]. [1] M. H. Kalos and P. A. Whitlock, Monte Carlo Methods (Wiley,New York, 1986).[2] H. Flyvbjerg and H. Petersen,
Error estimates on averages ofcorrelated data , J. Chem. Phys. , 461 (1989).[3] M. Jonsson, Standard error estimation by an automated block-ing method , Phys. Rev. E , 043304 (2018).[4] D. W. Stroock, Probability Theory: An Analytic View (Cam-bridge University Press, Cambridge, England, 1993).[5] D. M. Ceperley and B. J. Alder,
Ground State of the ElectronGas by a Stochastic Method , Phys. Rev. Lett. , 566 (1980).[6] W. M. C. Foulkes, L. Mitas, R. J. Needs, and G.Rajagopal, Quantum Monte Carlo simulations of Solids ,Rev. Mod. Phys. , 33 (2001).[7] P. L´opez R´ıos, P. Seth, N. D. Drummond, and R. J. Needs, Framework for constructing generic Jastrow correlation fac-tors , Phys. Rev. E , 036703 (2012).[8] P. L´opez R´ıos, A. Ma, N. D. Drummond, M. D. Towler, and R.J. Needs, Inhomogeneous backflow transformations in quantumMonte Carlo calculations , Phys. Rev. E , 066701 (2006).[9] M. Bajdich, L. Mitas, G. Drobn´y, L. K. Wagner, andK. E. Schmidt, Pfaffian Pairing Wave Functions inElectronic-Structure Quantum Monte Carlo Simulations ,Phys. Rev. Lett. , 130201 (2006).[10] P. Seth, P. L´opez R´ıos, and R. J. Needs, QuantumMonte Carlo study of the first-row atoms and ions ,J. Chem. Phys. , 084105 (2011).[11] N. D. Drummond, B. Monserrat, J. H. Lloyd-Williams, P. L´opezR´ıos, C. J. Pickard, and R. J. Needs,
Quantum Monte Carlostudy of the phase diagram of solid molecular hydrogen at ex-treme pressures , Nat. Commun. , 7794 (2015)[12] V. R. Pandharipande, S. C. Pieper, and R. B. Wiringa, Varia-tional Monte Carlo calculations of ground states of liquid Heand He drops , Phys. Rev. B , 4571 (1986).[13] J. R. Trail, Heavy-tailed random error in quantum Monte Carlo ,Phys. Rev. E , 016703 (2008).[14] R. P. Feynman, Forces in Molecules , Phys. Rev. , 340 (1939).[15] P. Pulay, Ab initio calculation of force constantsand equilibrium geometries in polyatomic molecules ,Mol. Phys. , 197 (1969).[16] Mos´e Casalegno, Massimo Mella, and Andrew M.Rappe, Computing accurate forces in quantum MonteCarlo using Pulay’s corrections and energy minimization ,J. Chem. Phys. , 7193 (2003). [17] A. Badinski, P. D. Haynes, J. R. Trail, and R. J. Needs,
Methodsfor calculating forces within quantum Monte Carlo simulations ,J. Phys.: Condens. Matter Zero-Variance Principle for MonteCarlo Algorithms , Phys. Rev. Lett. , 4682 (1999).[19] R. Assaraf and M. Caffarel, Computing forces with quantumMonte Carlo , J. Chem. Phys. , 4028 (2000).[20] R. Assaraf and M. Caffarel,
Zero-variance zero-bias principlefor observables in quantum Monte Carlo: Application to forces ,J. Chem. Phys. , 10536 (2003).[21] M. C. Per, S. P. Russo, and I. K. Snook,
Electron-nucleus cusp correction and forces in quantum Monte Carlo ,J. Chem. Phys. , 114106 (2008).[22] A. Badinski, J. R. Trail, and R. J. Needs,
Energy derivativesin quantum Monte Carlo involving the zero-variance property ,J. Chem. Phys. , 224101 (2008).[23] A. Badinski and R. J. Needs,
Accurate forces in quan-tum Monte Carlo calculations with nonlocal pseudopotentials ,Phys. Rev. E , 036707 (2007).[24] A. Badinski and R. J. Needs, Total forces in the diffu-sion Monte Carlo method with nonlocal pseudopotentials ,Phys. Rev. B , 035134 (2008).[25] S. Chiesa, D. M. Ceperley, and S. Zhang, Accurate, Efficient,and Simple Forces Computed with Quantum Monte Carlo Meth-ods , Phys. Rev. Lett. , 036404 (2005).[26] C. Attaccalite and S. Sorella, Stable Liquid Hydrogen at HighPressure by a Novel Ab Initio Molecular-Dynamics Calcula-tion , Phys. Rev. Lett. , 114501 (2008).[27] D. M. Hill,
A Simple General Approach to Inference About theTail of a Distribution , Ann. Stat. , 1163 (1975).[28] J. Beirlant, P. Vynckier, and J. L. Teugels, Tail IndexEstimation, Pareto Quantile Plots Regression Diagnostics ,JASA , 1659 (1996).[29] M. Kratz and S. I. Resnick, The qq-estimator and heavy tails ,Comm. Statist. Stochastic Models , 699 (1996).[30] C. Baek and V. Pipiras, Estimation of parameters in heavy-tailed distribution when its second order tail parameter isknown , J. Stat. Plan. Inference , 1957 (2010).[31] T. Kato,
On the eigenfunctions of many-particle systems inquantum mechanics , Comm. Pure Appl. Math. , 151 (1957).[32] C. J. Umrigar, K. G. Wilson, and J. W. Wilkins, Optimizedtrial wave functions for quantum Monte Carlo calculations ,Phys. Rev. Lett. , 1719 (1988). [33] M. Holzmann, D. M. Ceperley, C. Pierleoni, and K. Esler, Back-flow correlations for the electron gas and metallic hydrogen ,Phys. Rev. E , 046707 (2003).[34] A. Ma, M.D. Towler, N.D. Drummond, and R.J. Needs Scheme for adding electron-nucleus cusps to Gaussian orbitals ,J. Chem. Phys. , 224322 (2005).[35] J. R. Trail and R. J. Needs,
Correlated elec-tron pseudopotentials for 3d-transition metals ,J. Chem. Phys. , 064110 (2015).[36] J. R. Trail and R. J. Needs,
Shape and energy con-sistent pseudopotentials for correlated electron systems ,J. Chem. Phys. , 204107 (2017).[37] A. Badinski,
Forces in Quantum Monte Carlo , PhD thesis, Uni-versity of Cambridge (2008).[38] B. Efron and R. Tibshirani,
Bootstrap Methods for StandardErrors, Confidence Intervals, and Other Measures of StatisticalAccuracy , Stat. Sci. , 54 (1986).[39] N. D. Drummond, M. D. Towler, and R. J. Needs, Jas-trow correlation factor for atoms, molecules, and solids ,Phys. Rev. B , 235119 (2004).[40] R. J. Needs, M. D. Towler, N. D. Drummond,and P. L´opez R´ıos, Continuum variational anddiffusion quantum Monte Carlo calculations ,J. Phys.: Condens. Matter , 023201 (2010).[41] J. Toulouse and C. J. Umrigar, Full optimization ofJastrow-Slater wave functions with application to the first-row atoms and homonuclear diatomic molecules ,J. Chem. Phys. , 174101 (2008).[42] T. H. Dunning Jr.,
Gaussian basis sets for use in correlatedmolecular calculations. I. The atoms boron through neon andhydrogen , J. Chem. Phys. , 1007 (1989).[43] R. K. Kendall, T. H. Dunning, and R. J. Harrison, Electronaffinities of the first-row atoms revisited. Systematic basis setsand wave functions , J. Chem. Phys. , 6796 (1992).[44] H.-J. Werner, P. J. Knowles, G. Knizia, F. R. Manby, M.Sch¨utz, P. Celani, W. Gy¨orffy, D. Kats, T. Korona, R. Lindh,A. Mitrushenkov, G. Rauhut, K. R. Shamasundar, T. B. Adler,R. D. Amos, A. Bernhardsson, A. Berning, D. L. Cooper, M.J. O. Deegan, A. J. Dobbyn et al. , MOLPRO version 2012.1, apackage of ab initio programs
Fixed-node quantum Monte Carlo for molecules ,J. Chem. Phys. , 5593 (1982).[46] J. W. Moskowitz, K. E. Schmidt, M. A. Lee, and M. H. Kalos, A new look at correlation energy in atomic and molecular sys-tems. II. The application of the Green’s function Monte Carlomethod to LiH , J. Chem. Phys. , 349 (1982).[47] R. M. Lee, G. J. Conduit, N. Nemec, P. L´opez R´ıos, and N. D.Drummond Strategies for improving the efficiency of quantumMonte Carlo calculations , Phys. Rev. E83