Adaptive Multidimensional Integration: VEGAS Enhanced
G. Peter Lepage∗
Department of Physics, Cornell University, Ithaca, NY, 14853
(Dated: September 22, 2020)

We describe a new algorithm, VEGAS+, for adaptive multidimensional Monte Carlo integration. The new algorithm adds a second adaptive strategy, adaptive stratified sampling, to the adaptive importance sampling that is the basis for its widely used predecessor VEGAS. Both VEGAS and VEGAS+ are effective for integrands with large peaks, but VEGAS+ can be much more effective for integrands with multiple peaks or other significant structures aligned with diagonals of the integration volume. We give examples where VEGAS+ is 2–17x more accurate than VEGAS. We also show how to combine VEGAS+ with other integrators, such as the widely available MISER algorithm, to make new hybrid integrators. For a different kind of hybrid, we show how to use integrand samples, generated using MCMC or other methods, to optimize VEGAS+ before integrating. We give an example where preconditioned VEGAS+ is more than 100x as efficient as VEGAS+ without preconditioning. Finally, we give examples where VEGAS+ is more than 10x as efficient as MCMC for Bayesian integrals with D = 3 and 21 parameters. We explain why VEGAS+ will often outperform MCMC for small and moderate-sized problems.
I. INTRODUCTION
Classic VEGAS is an algorithm for adaptive multidimensional Monte Carlo integration. It is widely used in particle physics: for example, in Monte Carlo event generators, and to evaluate low-order and high-order Feynman diagrams and cross sections numerically. It has also been used in other fields for a variety of applications including, for example, path integrals for chemical physics and option pricing in applied finance, Bayesian statistics for astrophysics and medical statistics, models of neuronal networks, wave-function overlaps for atomic physics, topological integrals for condensed matter physics, and so on.

Monte Carlo integration is unusually robust. It makes few assumptions about the integrand — it needn't be analytic or even continuous — and provides useful measures for the uncertainty and reliability of its results. This makes it well suited to multidimensional integration, and in particular adaptive multidimensional integration. Adaptive strategies are essential for integration in high dimensions because important structures in integrands often occupy small fractions of the integration volume. One might not think, for example, of the interior of a sphere of radius 0.5 enclosed by a unit hypercube as a "sharp peak," but it occupies only 0.0000025% of the hypercube in D = 20 dimensions.

Classic VEGAS is very effective for integrands with sharp peaks, which are common in particle physics applications and Bayesian integrals. While it is particularly effective for integrals that are separable (into a product of one-dimensional integrals), it also works very well for non-separable integrals with large peaks. It works less well for integrands with multiple peaks or other important structures aligned with diagonals of the integration volume, although it is usually much better than Simple Monte Carlo integration.

In this paper, we describe a modification of classic VEGAS, which we call VEGAS+, that adds a second adaptive strategy to classic VEGAS's adaptive importance sampling. The second strategy is a form of adaptive stratified sampling that makes VEGAS+ far more effective in dealing with multiple peaks and diagonal structures in integrands, given enough integrand samples.

We begin in Sec. II by reviewing the algorithm for (iterative) adaptive importance sampling used in classic VEGAS. We discuss in particular the variable map used by VEGAS to switch to new integration variables that flatten the integrand's peaks. This is how VEGAS implements importance sampling. We describe the new algorithm VEGAS+ in Sec. III and give examples that illustrate how and why it works. We also discuss its limitations.

The map used by VEGAS to implement importance sampling can be used in conjunction with other integration algorithms to create new hybrid algorithms. We show how this is done in Sec. IV, where we combine VEGAS+ with the MISER algorithm. There we also show how to use sample data, from a Markov Chain Monte Carlo (MCMC) or other peak-finding algorithm, to optimize the VEGAS map before integrating; this can greatly reduce the cost of integrating functions with multiple high, narrow peaks.

Our conclusions are in Sec. V. We continue in two appendices with further VEGAS+ examples that illustrate its use for adaptive multidimensional summation, high-order Feynman diagrams, and Bayesian curve fitting, where we compare VEGAS+ with MCMC for a 3-dimensional and a 21-dimensional problem.

When comparing algorithms we do not look at computer run times because these are too sensitive to implementation details. For realistic applications the cost of evaluating the integrand usually exceeds other costs, so we measure efficiency by the number of integrand evaluations used. Where statistical uncertainties are quoted, they correspond to one standard deviation; small differences (e.g., 10%) between uncertainties are not significant. The VEGAS+ examples here were analyzed using a widely available Python/Cython implementation first released in 2013. The integrands are written in (vectorized) Python or, in one case, Fortran77.

One change since classic VEGAS was introduced 45 years ago is the enormous increase in speed of computers. As a result VEGAS+ integrals using 10^6 or 10^7 integrand evaluations require only minutes for simple integrands on a modern laptop. VEGAS+ is also easily configured for parallel computing.

II. ADAPTIVE IMPORTANCE SAMPLING
In this section we review the adaptive strategy used in the original VEGAS algorithm: its remapping of the integration variables in each direction to optimize Monte Carlo estimates of the integral. The strategy is an implementation of the standard technique of importance sampling. In the following sections we review how VEGAS implements this strategy, first for one-dimensional integrals and then for multidimensional integrals.
A. Remapping the Integration Variable
For one-dimensional integrals, the VEGAS strategy is to replace the original integral

    I = \int_a^b dx\, f(x)    (1)

by an equivalent integral over a new variable y,

    I = \int_0^1 dy\, J(y)\, f(x(y)),    (2)

where J(y) is the Jacobian of the transformation. The transformation is chosen to minimize the uncertainty in a Monte Carlo evaluation of the integral in y-space.

A Simple Monte Carlo estimate of the y-integral (Eq. (2)) is given by

    I \approx I_{MC} \equiv \frac{1}{N_{ev}} \sum_y J(y)\, f(x(y)),    (3)

where the sum is over N_ev random points y uniformly distributed between 0 and 1. The estimate I_MC is itself a random number whose mean is the exact value I of the integral and whose variance is

    \sigma_I^2 = \frac{1}{N_{ev}} \Big( \int_0^1 dy\, J^2(y)\, f^2(x(y)) - I^2 \Big)    (4)
              = \frac{1}{N_{ev}} \Big( \int_a^b dx\, J(y(x))\, f^2(x) - I^2 \Big).    (5)

The standard deviation σ_I is an indication of the possible error in the Monte Carlo estimate. The variance σ_I^2 can also be estimated from the Monte Carlo data:

    \bar\sigma^2 \equiv \frac{1}{N_{ev}-1} \Big( \frac{1}{N_{ev}} \sum_y J^2(y)\, f^2(x(y)) - I_{MC}^2 \Big).    (6)

The distribution of the Monte Carlo estimates I_MC becomes Gaussian in the limit of large N_ev, assuming J(y) f(x(y)) is sufficiently integrable. There are non-Gaussian corrections, but they vanish quickly with increasing N_ev. For example, the Monte Carlo estimates for I and σ_I^2 would be uncorrelated with Gaussian statistics, but here have a nonzero correlation that vanishes like 1/N_ev^2:

    \langle (I_{MC} - I)(\bar\sigma^2 - \sigma_I^2) \rangle = \langle (I_{MC} - I)^3 \rangle
        = \frac{1}{N_{ev}^2} \int_0^1 dy\, \big( J(y) f(x(y)) - I \big)^3.    (7)
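For concreteness, the following minimal Python sketch computes the estimates of Eqs. (3) and (6) for a user-supplied map x(y) and Jacobian; the function names here are illustrative and not part of any VEGAS implementation:

import numpy as np

def mc_estimate(f, x_of_y, jac, n_ev, rng=np.random.default_rng(0)):
    # Simple Monte Carlo estimate of I = int_0^1 dy J(y) f(x(y)).
    y = rng.uniform(0.0, 1.0, n_ev)
    jf = jac(y) * f(x_of_y(y))            # J(y) f(x(y)) at each sample point
    i_mc = jf.mean()                      # Eq. (3)
    var = ((jf ** 2).mean() - i_mc ** 2) / (n_ev - 1)   # Eq. (6)
    return i_mc, np.sqrt(var)

# Trivial map x(y) = y (J = 1); any peak-flattening map reduces the error.
f = lambda x: np.exp(-100 * (x - 0.5) ** 2)
print(mc_estimate(f, lambda y: y, lambda y: np.ones_like(y), 10000))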
B. VEGAS Map and Importance Sampling
VEGAS implements the variable map described in the previous section by dividing the x-axis into N_g intervals bounded by

    x_0 = a,
    x_1 = x_0 + \Delta x_0,
    \cdots
    x_{N_g} = x_{N_g-1} + \Delta x_{N_g-1} = b,    (8)

where N_g = 1000 is typical. The transformed variable has value y = i/N_g at point x_i, and varies linearly with x between the x_i's. Thus

    x(y) \equiv x_{i(y)} + \Delta x_{i(y)}\, \delta(y)    (9)

relates x to y, where functions i(y) and δ(y) are the integer and fractional parts of y N_g, respectively:

    i(y) \equiv \mathrm{floor}(y N_g),    (10)
    \delta(y) \equiv y N_g - i(y).    (11)

This transformation maps the interval [0, 1] in y-space onto the original integration region [a, b] in x-space. Intervals of varying widths Δx_i in x-space map into intervals of uniform width Δy = 1/N_g in y-space. The Jacobian for this transformation,

    J(y) = N_g\, \Delta x_{i(y)} \equiv J_{i(y)},    (12)

is a step function whose values are determined by the interval widths Δx_i.

Substituting this Jacobian into Eq. (5) for the uncertainty in a Monte Carlo integration gives

    \sigma_I^2 = \frac{1}{N_{ev}} \Big( \sum_i J_i \int_{x_i}^{x_i+\Delta x_i} dx\, f^2(x) - I^2 \Big).    (13)

Treating the J_i as independent variables, subject to the constraint

    \sum_i \frac{\Delta x_i}{J_i} = \sum_i \Delta y = 1,    (14)

it is easy to show that σ_I^2 is minimized when

    \frac{J_i^2}{\Delta x_i} \int_{x_i}^{x_i+\Delta x_i} dx\, f^2(x) = \text{constant}.    (15)

That is, the grid is optimal when the average value of J^2(y(x)) f^2(x) in each interval Δx_i is the same for every interval.

A transformation with this property greatly reduces the standard deviation when the integrand has high peaks. The Jacobian flattens the peaks in y-space and stretches them out, because

    J \equiv \Big| \frac{dx}{dy} \Big| \propto \frac{1}{|f(x)|}    (16)

becomes small near a peak. This means that a uniform Monte Carlo in y-space concentrates integrand samples around the peaks in x-space. Each interval Δx_i receives on average the same number of samples (= N_ev/N_g), and the smallest intervals are placed where |f(x)| is largest (because J_i ∝ Δx_i). Thus samples are concentrated in the most important regions, which is why this method is called importance sampling.
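The map of Eqs. (9)-(12) is simple to implement. The following Python sketch (our notation, not code from any VEGAS release) evaluates x(y) and J(y) for a stored grid:

import numpy as np

class VegasMap1D:
    # 1-D VEGAS map of Eqs. (8)-(12): y in [0,1] -> x in [a,b].
    # The grid is the N_g + 1 interval boundaries x_0 ... x_{N_g}.
    def __init__(self, a, b, ng):
        self.grid = np.linspace(a, b, ng + 1)   # uniform initial grid
        self.ng = ng

    def __call__(self, y):
        # x(y) = x_i + dx_i * delta(y); returns (x, J).
        i = np.floor(y * self.ng).astype(int)   # Eq. (10)
        i = np.clip(i, 0, self.ng - 1)          # keep y = 1 in range
        delta = y * self.ng - i                 # Eq. (11)
        dx = self.grid[i + 1] - self.grid[i]
        return self.grid[i] + dx * delta, self.ng * dx   # Eq. (9), Eq. (12)

m = VegasMap1D(0.0, 1.0, 1000)
x, jac = m(np.array([0.1, 0.5, 0.9]))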
C. Iterative Adaptation

The set of x_i's defined above constitutes the VEGAS map. To optimize Monte Carlo integration, VEGAS varies the Jacobian of the transformation by varying the interval sizes Δx_i, while keeping the sum of all Δx_i's constant. This is done iteratively. First VEGAS estimates the integral with a uniform grid, accumulating information about the integrand in the process. This information is then used to construct an improved grid, and VEGAS makes a new estimate of the integral. Again information about the integrand is accumulated in the process, and used to further improve the grid. In this fashion, the VEGAS map adapts to the integrand over several iterations.

To illustrate how this is done, we continue with the example above where we generate a Monte Carlo estimate of the integral in y-space (Eq. (2)) by sampling the integrand at N_ev random points y (Eq. (3)). Given some initial grid, VEGAS accumulates the average value of J^2 f^2 for each interval in the grid while sampling the integrand to estimate the integral:

    d_i \equiv \frac{1}{n_i} \sum_{x(y) \in \Delta x_i} J^2(y)\, f^2(x(y)),    (17)

where n_i ≈ N_ev/N_g is the number of samples in interval Δx_i. The averages d_i are used to refine the grid. The grid is optimal when all of the d_i's are equal (Eq. (15)), so VEGAS adjusts the grid intervals Δx_i to make the d_i more constant across the integration region.

This algorithm, like most adaptive algorithms, tends to overreact in the early stages of optimizing its grid, since it has rather poor information concerning the integrand at this stage. It is important, therefore, to dampen the refinement process so as to avoid rapid, destabilizing changes in the grid. In VEGAS, the d_i's are first smoothed and normalized:

    d_i \to \frac{1}{\sum_i d_i} \times
        (7 d_0 + d_1)/8                    for i = 0,
        (d_{i-1} + 6 d_i + d_{i+1})/8      for i \ne 0, N_g - 1,
        (d_{N_g-2} + 7 d_{N_g-1})/8        for i = N_g - 1.    (18)

Smoothing is particularly important if the integrand has large discontinuities (e.g., step functions). For example, an abrupt increase in the integrand near the upper edge of interval Δx_i might be missed completely by the samples. Without smoothing, VEGAS would see a sudden rise in the function beginning only in Δx_{i+1}, and refine the grid accordingly, thereby missing the small but possibly significant part of the step in interval Δx_i. Smoothing makes this less likely by causing VEGAS to focus some effort on interval Δx_i as well as Δx_{i+1}.

Having smoothed the d_i's, VEGAS then compresses their range, to avoid overreacting to atypically large sample values for the integrand. This is done by replacing each d_i with

    d_i \to \Big( \frac{1 - d_i}{\ln(1/d_i)} \Big)^\alpha,    (19)

where α ≥ 0 is typically of order one. Parameter α can be reduced in situations where VEGAS has trouble finding or holding onto the optimal grid; α = 0 implies no grid refinement.

The condition for an optimal map remains that all d_i, now smoothed and compressed, be roughly equal. If the map is not optimal, VEGAS attempts to improve it. First the d_i's are treated as continuous quantities, and each d_i is distributed uniformly over its interval Δx_i. Then new intervals, specified by {x'_i, Δx'_i}, are chosen so that each contains an equal fraction of the total d = Σ_i d_i. The following algorithm achieves this (a code sketch appears below):

1. Define δd to be the amount of d associated with each interval of the new grid,

       \delta d \equiv \frac{\sum_i d_i}{N_g},    (20)

   and initialize the following variables:

       x'_0 = x_0,
       x'_{N_g} = x_{N_g},
       i = 0  (index of current new x'),
       j = 0  (index of current old x),
       S_d = 0  (amount of d accumulated).    (21)

2. Increment i. If i ≥ N_g, the new grid is finished.

3. Skip to the next step if S_d ≥ δd; otherwise add d_j to S_d, increment j, and return to the beginning of this step.

4. Subtract δd from S_d, and compute the boundary of the new interval by interpolation:

       x'_i = x_j - \frac{S_d}{d_{j-1}}\, \Delta x_{j-1}.    (22)

   Return to Step 2.

Replacing the old map with the new map, VEGAS proceeds to generate a new Monte Carlo estimate for the integral and another new map. This entire process is repeated until the map has converged and the estimate of the integral is sufficiently accurate.
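The refinement step is compact enough to state in code. Below is a Python sketch of Eqs. (18)-(22); the function name is ours, and the default damping α = 0.5 is the VEGAS+ default mentioned in the notes at the end of the paper:

import numpy as np

def refine_grid(grid, d, alpha=0.5):
    # grid : current boundaries x_0 ... x_{N_g}; d : the averages d_i of Eq. (17).
    ng = len(d)
    # Eq. (18): smooth and normalize the d_i.
    ds = np.empty(ng)
    ds[0] = (7 * d[0] + d[1]) / 8
    ds[-1] = (d[-2] + 7 * d[-1]) / 8
    ds[1:-1] = (d[:-2] + 6 * d[1:-1] + d[2:]) / 8
    ds /= ds.sum()
    # Eq. (19): compress the range; alpha = 0 means no refinement.
    nz = ds > 0
    ds[nz] = ((1 - ds[nz]) / np.log(1 / ds[nz])) ** alpha
    # Eqs. (20)-(22): rebuild boundaries so each new interval holds
    # an equal share delta_d of the total.
    delta_d = ds.sum() / ng
    newgrid = np.empty(ng + 1)
    newgrid[0], newgrid[-1] = grid[0], grid[-1]
    j, acc = 0, 0.0                   # j = old-grid index, acc = S_d
    for i in range(1, ng):            # step 2
        while acc < delta_d:          # step 3
            acc += ds[j]
            j += 1
        acc -= delta_d                # step 4
        newgrid[i] = grid[j] - (acc / ds[j - 1]) * (grid[j] - grid[j - 1])
    return newgrid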
D. Multidimensional Integrals
Monte Carlo integration, even with adaptive importance sampling, is not usually competitive with other algorithms for one-dimensional integration. It rapidly becomes competitive, however, as the dimensionality increases.

The VEGAS algorithm extends the algorithm described above to D-dimensional integrals over variables x^µ (with µ = 1...D) by replacing each x^µ with a new variable y^µ. The variable transformation is specified by an independent VEGAS map {x_i^µ, Δx_i^µ} for each direction. The resulting integral is

    I = \int d^D y\, J(y)\, f(x(y)),    (23)

where x(y) is the D-dimensional VEGAS map and J(y) its Jacobian. A Simple Monte Carlo estimate of this integral is obtained by sampling the integrand at random points y = {y^µ} distributed uniformly within the unit hypercube at the origin:

    0 < y^\mu < 1.    (24)

The integrand samples are also used to calculate the averages

    d_i^\mu \equiv \frac{1}{n_i^\mu} \sum_{x^\mu(y^\mu) \in \Delta x_i^\mu} J^2(y)\, f^2(x)    (25)

for every interval on every integration axis. These are used to improve the grid for each variable after each iteration, following the procedure described in Section II C.

Fig. 1 shows the grid corresponding to a VEGAS map optimized for the D = 4 dimensional integral

    \int_0^1 d^4x\, \Big( e^{-100\,(\mathbf{x} - \mathbf{r}_1)^2} + e^{-100\,(\mathbf{x} - \mathbf{r}_2)^2} \Big),    (26)

where vector x = (x_1, x_2, x_3, x_4), and

    \mathbf{r}_1 = (0.33, 0.5, 0.5, 0.5),
    \mathbf{r}_2 = (0.67, 0.5, 0.5, 0.5).    (27)

The grid concentrates increments near 0.33 and 0.67 for x_1, and near 0.5 in the other directions. Each rectangle in the figure receives, on average, the same number of Monte Carlo integration samples.

FIG. 1. VEGAS map for the D = 4 integral defined in Eqs. (26) and (27). The figure shows grid lines for every 50th increment along the x_1 and x_2 axes; grids for the other axes are the same as for x_2.

This VEGAS map has N_g = 1000 increments along each axis. The accuracy of the y-space integrals is typically insensitive to N_g so long as it is large enough. With N_ev fixed, the accuracy improves from 11% with N_g = 1 (i.e., no VEGAS map) to 0.3% at N_g = 100, and flattens out at 0.1% around N_g = 700.

The VEGAS map is particularly effective for integrals like that in Eq. (26), because the integral over each Gaussian can be separated into a product of one-dimensional integrals over each direction. It also works well, however, for other integrands with high peaks that are not separable in this way. The solid line in Fig. 2, for example, shows how the uncertainty in a y-space Monte Carlo is reduced over 20 iterations for the D = 4 integral

    \int_0^1 d^4x \sum_{i=1}^2 \Theta\big( |\mathbf{x} - \mathbf{r}_i| < 0.08 \big),    (28)

where the r_i are given in Eq. (27) and

    \Theta(x) = 1 if x = True, 0 if x = False.    (29)

This integral is harder than the previous one, because the peaks have no shoulders and so are difficult to find — the integrand vanishes over 99.96% of the integration volume. The VEGAS map is nevertheless able to reduce the fractional uncertainty from 24% before adapting to 0.34% after 10–20 iterations, using a fixed number N_ev of Monte Carlo samples for each iteration. The damping parameter was set lower here than its default, to ensure smooth adaptation (solid line); setting α = 1.0 leads to instability (dotted line).

FIG. 2. Percent uncertainty (1σ) for each VEGAS iteration in a simple y-space Monte Carlo for the D = 4 integral defined in Eqs. (28) and (27). The solid line shows how the uncertainty improves with successive iterations of the refinement process with a reduced damping parameter α; the dotted line shows (unstable) improvement with α = 1.0.

This last example also illustrates how the VEGAS map can improve integrals over volumes with irregular shapes. Monte Carlo integration works well even with discontinuous integrands and so has no problem with the Θ functions that define the integration volume. The VEGAS map helps find the integration region and concentrate samples in that region. Obviously it is better, where possible, to redefine the integration variables so that the integration region is rectangular.
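The integral of Eqs. (26)-(27) is a convenient test of any implementation. Here is a sketch using the Python vegas package of Ref. 18 (interface as documented there; the Gaussians are not normalized, so the exact value must be computed separately for comparison):

import numpy as np
import vegas

def f(x):
    # Two narrow Gaussians, Eqs. (26)-(27).
    ans = 0.0
    for r1 in (0.33, 0.67):
        r = np.array([r1, 0.5, 0.5, 0.5])
        ans += np.exp(-100 * np.sum((x - r) ** 2))
    return ans

integ = vegas.Integrator(4 * [[0.0, 1.0]])
integ(f, nitn=10, neval=10000)            # adapt the map; discard results
result = integ(f, nitn=10, neval=10000)   # final estimate
print(result.summary())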
E. Combining and Comparing Iterations
VEGAS, being iterative, generates a series of estimates I_j of the integral (Eq. (3)), each with its own error estimate σ_j ≡ σ_{I_j} (Eq. (6)). As discussed above, the distribution of I_j's for a given VEGAS map is approximately Gaussian when a sufficient number of samples N_ev is used (and assuming the integrand is square integrable). Then we can combine estimates from separate iterations to obtain a cumulative estimate of the integral and its standard error:

    \bar I = \frac{\sum_j I_j / \sigma_j^2}{\sum_j 1/\sigma_j^2},    (30)
    \sigma_{\bar I} = \Big( \sum_j \frac{1}{\sigma_j^2} \Big)^{-1/2}.    (31)

The cumulative estimate is usually superior to the estimates from separate iterations.

It is important to verify that results from different iterations are consistent with each other within the estimated uncertainties. This provides a direct check on the reliability of the error estimates. For example, we can calculate the χ^2 statistic for the estimates:

    \chi^2 = \sum_j \frac{(I_j - \bar I)^2}{\sigma_j^2}.    (32)

We expect χ^2 to be of order the number of iterations (less one) when the I_j are approximately Gaussian, and the estimates of the uncertainties σ_j are reliable. If χ^2 is considerably larger than this, either or both of \bar I and σ_{\bar I} may be unreliable. There are two common causes for χ^2's that are too large.

The first common cause is that early iterations, before the VEGAS map has adapted, may give very poor estimates for the integral and its uncertainty. This happens, for example, when the integrand has high, narrow peaks that are largely missed in early iterations, leading to estimates for the integral and error that are both much too small. A standard remedy is to omit the early iterations from the determinations of \bar I and σ_{\bar I}.

The second cause for χ^2's that are too large is that the number of samples N_ev is insufficiently large to guarantee Gaussian statistics for the I_j, even after the VEGAS map has fully adapted to the integrand. The threshold for an adequate N_ev is highly dependent on the integrand. In practice one finds this threshold through trial and error, by looking to see how large N_ev needs to be in order to obtain stable results and a reasonable χ^2. Although the statistical error in \bar I can be reduced by increasing either the number of samples N_ev or the number of iterations N_it, it is generally better to increase N_ev while keeping N_it just large enough to find the optimal VEGAS map and measure a χ^2; N_it = 5–20 is usually enough.

There is another, related reason for increasing the number of samples N_ev rather than the number of iterations N_it. While the estimates I_j from individual iterations give unbiased estimates of the integral for any N_ev, the weighted sum \bar I only becomes unbiased when N_ev is sufficiently large that non-Gaussian effects are negligible. The leading non-Gaussian effect is the correlation between fluctuations in I_j and σ_j^2 (Eq. (7)). It introduces a bias in the weighted average that vanishes like 1/N_ev with increasing N_ev and therefore is usually negligible compared to the statistical uncertainty σ_{\bar I}, which vanishes far more slowly. For example, the bias in \bar I for the Gaussian integral discussed above (Eq. (26)) is ten times smaller than σ_j when N_ev = 500, and so is negligible compared to σ_{\bar I} unless N_it is very large. The bias is halved again by N_ev = 1000.

The bias coming from the weighted average can usually be ignored, but it is easy to avoid it completely, if desired.
This is done by discarding results from the first 5 or 10 iterations of the VEGAS algorithm (while the grid is adapting), and then preventing the algorithm from adapting in subsequent iterations (by setting damping parameter α = 0). The unweighted average of the latter I_j's provides an unbiased estimate of the integral, and the unweighted average of the σ_j's divided by √N_it gives an estimate of the statistical uncertainty.
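Eqs. (30)-(32) are easily applied after the fact to any list of per-iteration results. A short Python sketch (function name ours):

import numpy as np

def combine(I, sigma):
    # Combine per-iteration estimates I_j +- sigma_j, Eqs. (30)-(32).
    # chi2 should be of order len(I) - 1 if the sigma_j are reliable.
    I, sigma = np.asarray(I), np.asarray(sigma)
    w = 1.0 / sigma ** 2
    Ibar = np.sum(w * I) / np.sum(w)      # Eq. (30)
    sigbar = np.sum(w) ** -0.5            # Eq. (31)
    chi2 = np.sum((I - Ibar) ** 2 * w)    # Eq. (32)
    return Ibar, sigbar, chi2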
III. ADAPTIVE STRATIFIED SAMPLING

The y-space integrals (Eq. (23)) in the previous section were estimated using Simple Monte Carlo integration. The VEGAS implementation currently in wide use ("classic VEGAS") improves on this by using stratified Monte Carlo sampling in y-space rather than Simple Monte Carlo. Given N_ev samples per iteration, the algorithm divides each y^µ axis (µ = 1...D) into

    N_{st} = \mathrm{floor}\big( (N_{ev}/2)^{1/D} \big)    (33)

stratifications of width

    \Delta y_{st} = 1/N_{st}.    (34)

This divides y-space into N_st^D hypercubes, each with volume Δy_st^D. The full integral and its variance are the sums of contributions from each hypercube h:

    I = \sum_h \Delta I_h \approx I_{MC},
    \sigma_I^2 = \sum_h \sigma_{I_h}^2 \approx \bar\sigma^2.    (35)

Simple Monte Carlo estimates are made for ΔI_h (c.f., Eq. (3)) and σ_{I_h}^2 (c.f., Eq. (5)) to obtain estimates I_MC and \bar\sigma^2 for the total integral and its variance, respectively. The number of integrand samples used is

    n_{ev} \equiv \mathrm{floor}(N_{ev}/N_{st}^D) \ge 2    (36)

per hypercube. The standard deviation for a stratified Monte Carlo estimate typically falls faster with increasing N_st than for unstratified sampling.

We can improve on the stratification strategy used by classic VEGAS by allowing the number of integrand samples n_h used in each hypercube h to vary from hypercube to hypercube. Then the variance in the Monte Carlo estimate for the integral is

    \sigma_I^2 = \sum_h \frac{\sigma_h^2(Jf)}{n_h},    (37)

where

    \sigma_h^2(Jf) \equiv \Omega_h \int_{\Omega_h} d^D y\, \big( J(y) f(x(y)) \big)^2
        - \Big( \int_{\Omega_h} d^D y\, J(y) f(x(y)) \Big)^2    (38)

and Ω_h is the hypercube's volume in y-space. Varying the n_h independently, constrained by

    \sum_h n_h = N_{ev},    (39)

it is easy to show that σ_I^2 is minimized when

    n_h \propto \sigma_h(Jf).    (40)

The innovation in the new VEGAS ("VEGAS+") is to redistribute integrand samples across the hypercubes, according to Eq. (40), after each iteration. The σ_h(Jf) are estimated in an iteration using the integrand samples used to estimate the integral. In this way the distribution of integrand samples across hypercubes is optimized over several iterations, at the same time as the VEGAS map is optimized.

In detail, the algorithm for reallocating samples across hypercubes in VEGAS+ is as follows (a code sketch appears at the end of this section):

1. Choose a somewhat smaller number of stratifications so there are enough samples to allow for significant variation in n_h:

       N_{st} = \mathrm{floor}\big( (N_{ev}/4)^{1/D} \big).    (41)

2. During each iteration accumulate estimates

       \sigma_h^2(Jf) \approx \frac{\Omega_h^2}{n_h} \sum_{y \in \Omega_h} \big( J(y) f(x(y)) \big)^2
           - \Big( \frac{\Omega_h}{n_h} \sum_{y \in \Omega_h} J(y) f(x(y)) \Big)^2    (42)

   for each hypercube, using the same samples used to estimate the integral.

3. Introduce a damping parameter β ≥ 0 by replacing σ_h(Jf) with

       d_h \equiv \big( \sigma_h(Jf) \big)^\beta.    (43)

   Choosing β = 1 corresponds to the optimal distribution (Eq. (40)), but a somewhat smaller value can help avoid overreaction to random fluctuations. Setting β = 0 results in n_h values that are all the same — the stratified sampling becomes non-adaptive, as in classic VEGAS. We use β = 0.75 for the examples in this paper.

4. Recalculate the number of samples for each hypercube,

       n_h = \max\Big( 2,\; N_{ev}\, d_h \Big/ \sum_{h'} d_{h'} \Big),    (44)

   for use in the next iteration.

The VEGAS map is also updated after each iteration, as described in Section II C. The allocation of integrand samples converges rapidly once the VEGAS map has converged. The optimal VEGAS map for an integrand is independent of the allocation of samples, but reallocating samples according to Eq. (40) can significantly speed the discovery of that optimum because there is better information about the integrand earlier on. The independence of the optimal VEGAS map from the sample allocation improves the algorithm's stability — random fluctuations in the sample allocation are unlikely to trigger big changes in the VEGAS map.
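A Python sketch of the reallocation step, Eqs. (43)-(44), given the per-hypercube variance estimates of Eq. (42) (function name ours):

import numpy as np

def reallocate(sig2, n_ev, beta=0.75):
    # sig2 = per-hypercube estimates of sigma_h^2(Jf) from Eq. (42).
    d = np.sqrt(np.asarray(sig2)) ** beta   # Eq. (43); beta = 0 gives classic VEGAS
    dsum = d.sum()
    if dsum == 0:                           # no variance seen yet: spread uniformly
        return np.full(len(d), max(2, n_ev // len(d)))
    return np.maximum(2, (n_ev * d / dsum).astype(int))   # Eq. (44)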
A. Diagonal Structure
Adaptive stratified sampling as described in the previous section provides little or no improvement over classic VEGAS for integrals like those in the previous sections, where the Jacobian from the VEGAS map can flatten the integrand's peaks almost completely and spread them out to fill y-space. It is easy, however, to create integrals for which the new adaptive stratified sampling makes a big difference.

FIG. 3. Percent uncertainty (1σ) in the integral estimates from 30 iterations of classic VEGAS (top) and VEGAS+ (bottom), plotted versus the number N_ev of integrand evaluations (samples) used per iteration. Integral estimates from the first ten iterations are ignored in each case. Damping parameter α = 0.5 in both cases; β = 0 for classic VEGAS and β = 0.75 for VEGAS+. Classic VEGAS becomes unstable below N_ev = 3x10^5; VEGAS+ is unstable below N_ev ≈ 10^4. The number N_st of stratifications per axis used by VEGAS+ varies from 3 to 8 over this range of sample sizes N_ev.

Consider, for example, the eight-dimensional integral

    \int_0^1 d^8x \sum_{i=1}^3 e^{-100\,(\mathbf{x} - \mathbf{r}_i)^2},    (45)

whose integrand has three narrow peaks along the diagonal of the integration volume, at

    \mathbf{r}_i = \rho_i\, (1, 1, 1, 1, 1, 1, 1, 1), \quad i = 1, 2, 3,    (46)

where the locations ρ_i along the diagonal were chosen randomly. Unlike single Gaussians, this integrand cannot be factored into a product of separate functions for each direction.

Fig. 3 shows that the uncertainties in the integral estimates from classic VEGAS are 14–19x larger than those from VEGAS+, using the same number of integrand samples N_ev per iteration. Measured from N_ev = 10^4, the uncertainty generated by N_it iterations of the new algorithm falls roughly like

    \sigma_{\bar I} \propto \frac{1}{\sqrt{N_{it}}\, N_{ev}}    (47)

with increasing N_it and N_ev — as expected, a larger N_ev is more valuable than a larger N_it for a given cost N_it x N_ev. VEGAS+ gives reliable results down to N_ev = 10^4; classic VEGAS is unusable below N_ev = 3x10^5, where it typically misses one or more of the three peaks.

What makes this integral challenging for classic VEGAS is the diagonal structure of the integrand. An axis-oriented adaptation strategy, like that used by classic VEGAS, will generally have difficulty handling large structures aligned along diagonals.

FIG. 4. VEGAS map for the integral in Eq. (45). Every 33rd grid line is drawn for axes x_1 and x_2; grids for the other axes are the same.

The problem for classic VEGAS is obvious from pictures of the optimal VEGAS map for this integrand (Fig. 4). This grid concentrates integrand samples at the three peaks on the diagonal, but also at 3^8 - 3 = 6558 additional points where the integrand is very small: the phantom peaks sit at points whose coordinates mix the diagonal locations, for example

    \tilde{\mathbf{r}} = (\rho_1, \rho_2, \rho_1, \rho_1, \ldots),    (48)

and so on. Integrand samples at these phantom peaks are wasted, greatly reducing the effective number of samples. The adaptive stratification used by VEGAS+ can transfer integration points from the phantom peaks to the real peaks, leading to a much larger effective N_ev. Fig. 5 shows how samples are distributed across hypercubes by VEGAS+ for a typical iteration; more than half of the hypercubes have only 2 samples. Classic VEGAS uses 2 samples in every hypercube for (approximately) the same N_ev.

FIG. 5. Distribution of integrand samples across hypercubes used by VEGAS+ when evaluating the integral in Eq. (45).

Another example is the integral

    \int_{-1}^{1} d^4x\; e^{-\mathbf{x}^T H^{-1} \mathbf{x}/2},    (49)

where H is the 4x4 Hilbert matrix. This integrand has a sharp ridge along an oblique axis (Fig. 6). Classic VEGAS obtains a 0.25% accurate result from the last five of seven iterations, while VEGAS+ is about 3x more accurate for the same number of samples.

FIG. 6. Two views of 10,000 random points distributed with density proportional to the integrand of Eq. (49). One view is projected onto the (x_1, x_2) plane (left) and the other onto the (x_3, x_4) plane (right). Correlation coefficients for (x_1, x_2) and (x_3, x_4) are 0.866 and 0.986, respectively.

B. Limitations
The chief limitation of VEGAS+'s adaptive stratified sampling is that it requires at least N_st = 2 stratifications per direction to have any effect on results. From Eq. (41), this requires

    N_{ev} \ge 4 N_{st}^D \ge 2^{D+2}    (50)

integrand evaluations per iteration. While this is not much of a restriction for dimension D = 5 or 6, adaptive stratified sampling only turns on when N_ev ≥ 1.3x10^8 for D = 25. This is still manageable, but in practice there will be no difference between VEGAS+ and classic VEGAS for dimensions that are much higher.

In some situations it is possible to circumvent this restriction, at least partially, by using different numbers N_st^µ of stratifications in different directions µ. Most integrands have more structure in some directions than in others. Concentrating stratifications in those directions, with fewer stratifications or none in other directions, allows VEGAS+ to use stratified sampling with smaller values of N_ev. We give an example (with D = 21) at the end of Appendix B.

Note, finally, that the VEGAS map used by both classic VEGAS and VEGAS+ remains effective for arbitrarily large dimensions D.

IV. VEGAS+ HYBRIDS
In this section we discuss how VEGAS+ can be combined with other algorithms to make hybrid integrators. We look at two strategies. In the first, we use several iterations of VEGAS+ to generate a VEGAS map x(y) that is optimized for the integrand. We then use a different (adaptive) algorithm to evaluate the y-space integral Eq. (23). This strategy, in effect, replaces VEGAS+'s adaptive stratified sampling with the other algorithm. We illustrate it in Sec. IV A by combining VEGAS+ with the widely available MISER algorithm, and a variation on that algorithm, MISER+.

The second strategy employs a Markov Chain Monte Carlo (MCMC) or other peak-finding algorithm to generate a set of samples {x, f(x)} of the integrand. These are used to precondition the VEGAS map before integrating. The sample points x need to cover important regions of the integration volume, but otherwise are unrestricted. The preconditioning makes it easier for VEGAS+ to discover the important regions. In Sec. IV B we illustrate this approach with integrands having multiple, narrow peaks. We also compare preconditioned VEGAS+ with a new algorithm optimized for this strategy.

A. MISER and MISER+
For our first example of a hybrid integrator, we combine VEGAS+ with the MISER algorithm. MISER uses adaptive stratified sampling, where the integration volume is recursively partitioned into a large number of rectangular sub-volumes (Fig. 7) to increase the accuracy of the integral's estimate. MISER appears to be more effective for integrands with peaks aligned along diagonals than it is for peaks aligned parallel to integration axes. This is opposite from VEGAS maps, suggesting that the combination might be particularly effective.

FIG. 7. A partition of the integration volume generated by MISER for the D = 2 dimensional version of integral Eq. (45). There are 271 sub-volumes.

In the following examples, we compare VEGAS+ with MISER and with a VEGAS-MISER hybrid. In each case we run VEGAS+ with various values for the number N_ev of samples per iteration, and 15 iterations, discarding results from the first 5. We use default values for the damping parameters: α = 0.5 and β = 0.75. To compare, we run MISER with 10 N_ev samples. The VEGAS-MISER hybrid uses 5 iterations of VEGAS+ to develop a VEGAS map x(y) and then estimates the y-space integral Eq. (23) using MISER with 10 N_ev integrand samples.

We also compare these algorithms with a variation on MISER, which we call MISER+. The procedure for MISER+ is:

1. Use MISER with half the sample points to generate and save a partition of the integration space optimized for the integrand f(x). Also save MISER's estimate for

       \sigma_i^2(f) = \Omega_i \int_{\Omega_i} d^D x\, f^2(x) - \Big( \int_{\Omega_i} d^D x\, f(x) \Big)^2    (51)

   in each sub-volume Ω_i. Ignore MISER's estimate for the integral.

2. Distribute the remaining half of the integrand samples across the sub-volumes so that the number n_i of samples in each sub-volume is proportional to σ_i(f), using the procedure described for VEGAS+ (Eqs. (43) and (44)).

3. Estimate the integral in each sub-volume of the partition using Simple Monte Carlo with the number of sample points allocated in the previous step. Add the estimates from each sub-volume to obtain an estimate for the total integral (as in Eq. (35)).

The relation between MISER+ and MISER is similar to that between VEGAS+ and classic VEGAS. MISER+ can also be combined with VEGAS maps, as described above.

We did detailed comparisons for two different integrals. The first is the D = 4 dimensional integral

    \int_0^1 d^4x \sum_{i=1}^3 e^{-100\,(\mathbf{x} - \mathbf{r}_i)^2}    (52)

with three peaks aligned parallel to the x_1 axis,

    \mathbf{r}_i = (\rho_i, a, a, a), \quad i = 1, 2, 3,    (53)

where the ρ_i are distinct and a is a common constant. The uncertainties in the integral estimates generated by each algorithm for different values of N_ev are shown in Fig. 8(a). MISER is 50–100x less accurate than VEGAS+. MISER is also 1.2–2.7x less accurate than MISER+. VEGAS maps are well suited to this integrand, so both MISER and MISER+ see big improvements when used with a VEGAS map. VEGAS+ combined with MISER+ is 7–38x more accurate than MISER+ alone, and only 2–4x less accurate than VEGAS+ by itself.

FIG. 8. Percent uncertainty (1σ) in estimates of the integral Eq. (52) with peaks specified by (a) Eq. (53), and (b) Eq. (54). Results are shown for the last 10 of 15 iterations of VEGAS+, with N_ev integrand evaluations (samples) per iteration. Corresponding results from MISER and MISER+ use 10 N_ev samples. The VEGAS hybrids with MISER and MISER+ use 5 iterations of VEGAS+ to generate a VEGAS map, and then estimate the integral using 10 N_ev samples with MISER/MISER+.

For the second comparison (Fig. 8(b)), we use the same integral but with the peaks aligned along the diagonal of the integration volume:

    \mathbf{r}_i = \rho_i\, (1, 1, 1, 1), \quad i = 1, 2, 3.    (54)

Both MISER+ and MISER are significantly more accurate (4–5x) for this diagonal structure than for the axis-aligned structure of the previous integrand. As discussed above, VEGAS maps are less effective for diagonal structures, but the combination of VEGAS+ with MISER+ remains competitive with MISER+ alone. VEGAS+ by itself is 4–6x more accurate than MISER, and 1.5–6x more accurate than MISER+.

We also checked the D = 8 dimensional versions of these integrals. For the integrand with axis-aligned peaks, VEGAS+ starts working reliably with 500–1000x fewer integrand samples than MISER or MISER+. Using the last 20 of 30 iterations with the default damping parameters, VEGAS+ gives 0.03%-accurate estimates of the integral. VEGAS-assisted MISER+ is 3x less accurate, while MISER+ and MISER by themselves are both more than 500x less accurate for a similar number of integrand samples.

The differences are smaller with peaks aligned along the D = 8 diagonal because both MISER and MISER+ prefer the diagonal structure. MISER+ is approximately 5x more accurate than MISER, and only 2–3x less accurate than VEGAS+ over the range of N_ev we examined. Using a VEGAS map with MISER+ gives results that vary in precision between MISER+ and VEGAS+, depending upon N_ev. Using VEGAS+ with MISER improves on MISER but is not as accurate as MISER+.

These experiments suggest that combining VEGAS+ with either MISER or MISER+ is a good idea. Where VEGAS maps are effective, the combination can be much more accurate than MISER or MISER+ separately. Where VEGAS maps are less effective, they do not appreciably degrade results compared to the separate algorithms. Typically MISER+ outperforms MISER by factors of 2–5 and so might be the preferred option in combination with VEGAS+. VEGAS+ outperforms all of the other algorithms in all of these tests.
B. Preconditioned Integrators
Ref. 22 suggests a different approach to multidimensional integration. They assume that the integrand f(x) has been sampled prior to integration, using MCMC or some other technique. A sample consists of some large number (thousands) of integrand values {f(x)} at points {x} that are concentrated in regions important to the integral. The samples are used to design an integrator that is customized (preconditioned) for the integrand. The authors describe an algorithm for doing this, but this strategy is also easily implemented using VEGAS+.

To illustrate the approach with VEGAS+, we consider a dimension D = 8 integral whose integrand has three very sharp peaks arrayed along the diagonal of the integration volume:

    \int_0^1 d^8x \sum_{i=1}^3 e^{-1000\,(\mathbf{x} - \mathbf{r}_i)^2},    (55)

where the r_i are given in Eq. (46). In this case it is easy to generate random sample points whose density is proportional to the integrand. We use 3000 sample points {x}. The VEGAS map used by VEGAS+ can be trained on the sample data {x, f(x)} before integrating. This is done using the method described in Secs. II C and II D, but with

    d_i^\mu \equiv \frac{1}{n_i^\mu} \sum_{x \in \Delta x_i^\mu} J^2(y(x))\, f^2(x),    (56)

where the sum is over the sample data, n_i^µ is the number of sample data points falling in increment Δx_i^µ, and y(x) is the inverse of the VEGAS map. The VEGAS map converges quickly as the algorithm is iterated, with the same sample data being reused for each iteration. Here we iterate the algorithm 10 times.
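A Python sketch of this training step for one axis follows; it reuses refine_grid() from the sketch in Sec. II C. In D dimensions the weight in Eq. (56) uses the full Jacobian J(y(x)), the product of the per-axis factors; for brevity a one-dimensional map is assumed here, and the function name is ours:

import numpy as np

def train_map(grid, xs, fs, nitn=10, alpha=0.5):
    # Train a 1-D VEGAS map on sample data {x, f(x)} using Eq. (56).
    for _ in range(nitn):
        ng = len(grid) - 1
        i = np.clip(np.searchsorted(grid, xs, side='right') - 1, 0, ng - 1)
        jac = ng * (grid[i + 1] - grid[i])    # J at each sample's increment
        d = np.zeros(ng)
        n = np.zeros(ng)
        np.add.at(d, i, (jac * fs) ** 2)      # accumulate J^2 f^2 per increment
        np.add.at(n, i, 1.0)
        d[n > 0] /= n[n > 0]                  # the averages d_i of Eq. (56)
        grid = refine_grid(grid, d, alpha=alpha)   # Sec. II C sketch
    return grid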
Starting with the preconditioned map, we then run VEGAS+ as usual for N_it = 8 iterations, with damping parameter α = 0 to prevent further adjustment of the VEGAS map. The 8 iterations allow the stratified sampling algorithm to adapt to whatever structure has not been dealt with by the VEGAS map. As we increase the number N_ev of integrand evaluations per iteration, we find that VEGAS+ starts to give good results at relatively small N_ev, and soon gives 1%-accurate results. VEGAS+ without the preconditioning is unable to find all three peaks reliably until N_ev is more than 100x larger (and classic VEGAS needs many more evaluations).

We expect a large benefit from preconditioning for extreme multimodal problems like this one. The exact distribution of the sample points {x} is not crucial — for example, we get more or less the same results above using points drawn from Gaussians that are twice as wide as in the integrand. What matters is that there are enough samples around all of the peaks. In more general problems, we usually do not know a priori where the peaks are or how many there are. Peak-finding algorithms or MCMC might be helpful in such cases. As emphasized in Ref. 22, using preconditioning in this way separates the challenge of finding the integrand's peaks from the challenge of accurate integration once they have been found, allowing us to use different algorithms for the two different tasks, each algorithm optimized for its task.

We compared preconditioned VEGAS+ with one of the algorithms (TQs) from Ref. 22 that is designed for preconditioning. Like MISER, TQs recursively subdivides the integration volume into a large number of sub-volumes (c.f., Fig. 7); but TQs bases this partitioning on the sample {x, f(x)} available before integrating. For our comparison we look at two integrals discussed in Ref. 22: one has a single narrow Gaussian (same width as in Eq. (26)) at the center of the integration hypercube; the other has four narrow Gaussians (same width as in Eq. (26)) spread evenly along the diagonal of the hypercube. The paper labels these problems "Gaussian" and "Quad", respectively. These are the easiest and hardest integrals considered there. We examined results for dimension D = 1–10.

Following Ref. 22, we limit each algorithm to approximately 12000 integrand evaluations for these comparisons. The TQs algorithm uses half of those samples to divide the integration volume into 2000 sub-volumes. The integral is estimated by doing 3-point Simple Monte Carlo integrals over each sub-volume and summing the results.

VEGAS+ needs far fewer samples to optimize the VEGAS map, because optimizing the map for each direction is a separate one-dimensional problem that utilizes all of the data. Here we use 1000 samples to create the VEGAS map. The remaining 11000 samples are allocated across 4 iterations of VEGAS+, again with damping parameter α = 0.

FIG. 9. Percent uncertainty (1σ) in estimates of two integrals from Ref. 22, the "Gaussian" (top) and "Quad" (bottom) problems, for dimensions D = 1–10. Results are shown for the VEGAS+ (blue) and TQs (orange) algorithms, as well as for a hybrid that combines the VEGAS map with TQs (green). All algorithms were limited to 12000 integrand evaluations in all. The uncertainties are inferred from the interquartile range of the results from 50 repetitions of each integration.

Our results are in Fig. 9. VEGAS+ outperforms TQs by more than an order of magnitude on the Gaussian problem in high dimensions, with only modest growth in the errors as the dimension D increases (the VEGAS+ error is still only 7% by D = 50, for example). Not surprisingly, results from the two algorithms are much closer for the Quad problem. Both algorithms become unreliable for dimensions D greater than three or four — 12000 integrand samples are too few for higher dimensions.

Fig. 9 also shows results for a hybrid approach that combines a VEGAS map with TQs. In the hybrid approach, half of the integrand evaluations are used to create an optimized VEGAS map, as outlined above. Those same samples are then re-used by TQs to partition the integration volume, to optimize the y-space integral Eq. (23) with the optimized VEGAS map x(y). The y-space integral is then calculated by summing Simple Monte Carlo estimates of the contributions from each sub-volume in the TQs partition. The VEGAS+TQs hybrid gives similar results to VEGAS+ for the Gaussian problem, but outperforms both of the other algorithms on the Quad problem. In particular the hybrid algorithm continues to give usable results even out to dimension D = 9–10.

These problems show how the VEGAS map is effective for dealing with isolated peaks, even when these are arranged along a diagonal of the integration volume. This is because it is able to flatten the peaks. It can't expand the peaks to fill y-space when there are multiple peaks along the diagonal, so it is important to combine the map with algorithms, like VEGAS+ and TQs, that can target sub-regions within the y-space integration volume.

There are problems, of course, where the VEGAS map is of limited use. The Hilbert-matrix Gaussian in Eq. (49) is an example. For this problem, neither TQs nor the VEGAS+TQs hybrid can achieve errors smaller than 50% with only 12000 samples when D ≥ 3; VEGAS+ gives errors smaller than 1% for D = 3.

V. CONCLUSIONS
In this paper we have demonstrated how to combine adaptive stratified sampling with classic VEGAS's adaptive importance sampling in a new algorithm, VEGAS+. The adaptive stratified sampling makes VEGAS+ far more effective than VEGAS for dealing with integrands that have multiple peaks or other structure aligned along diagonals of the integration volume. The added computational cost is negligible compared to the cost of evaluating the integrand. VEGAS+ was 2–17x more accurate than classic VEGAS in our various examples, with errors that typically fell faster than 1/√N_ev when increasing the number N_ev of integrand evaluations.

In Sec. IV, we showed how to combine VEGAS+ with other algorithms, in effect replacing its adaptive stratified sampling with the other adaptive algorithm. Our experiments with the MISER and TQs algorithms show that such hybrids can be significantly more accurate than the original algorithms. It would be worthwhile to explore these and other options further.

We also showed (Sec. IV B) how to precondition VEGAS+ using integrand samples generated separately from the integrator (e.g., by an MCMC algorithm). Preconditioning can help stabilize VEGAS+ and improve precision, especially for integrands with multiple narrow peaks. In one example, VEGAS+ without preconditioning required more than 100x as many integrand evaluations as preconditioned VEGAS+ before it began giving reliable results. This again is an area deserving further exploration.

Finally we compared VEGAS+ and MCMC for two Bayesian analyses in Appendix B, one with D = 3 parameters and the other with D = 21. VEGAS+ was more than 10x as efficient in both cases. There we discuss why we expect VEGAS+ will often outperform MCMC for small and moderate-sized problems.
ACKNOWLEDGMENTS
We thank T. Kinoshita for sharing his code for the 10th-order QED correction discussed in the first appendix. This work was supported by the National Science Foundation.

Appendix A: Sums and Feynman Diagrams
VEGAS+ can be used for adaptive multidimensional summation, as well as integration. To illustrate, we show how to use VEGAS+ to correct for the finite space-time volume used in lattice QCD simulations. Such corrections are usually calculated in chiral perturbation theory. A typical example is the contribution from a pion tadpole diagram, which in infinite volume is proportional to the integral (in Euclidean space)

    I_\pi \equiv \int_{-\infty}^{\infty} d^4k\, f(k),    (A1)

where

    f(k) \equiv 1 \big/ \big( k^2 + m_\pi^2 \big)^2    (A2)

with m_π = 0.14 GeV. This becomes an infinite, 4-dimensional sum,

    S_\pi \equiv (\Delta k)^4 \sum_{k_n} f(k_n),    (A3)

when the theory is confined to a box of side L. Here k_n^µ = n^µ Δk with n^µ = 0, ±1, ±2, ... and

    \Delta k = 2\pi / L.    (A4)

We take L = 5 fm. The sum is easily converted to an integral:

    S_\pi = \int_{-\infty}^{\infty} d^4k\, f(\bar k(k)),    (A5)

where

    \bar k^\mu(k) \equiv \mathrm{round}(k^\mu / \Delta k)\, \Delta k    (A6)

and round(x) is the nearest integer to x. Then the needed correction can be written

    I_\pi - S_\pi = 16 \int_0^\infty d^4k\, \big[ f(k) - f(\bar k(k)) \big]    (A7)
                 = 16\, m_\pi^4 \int_0^1 d^4z\, \prod_\mu (1 - z^\mu)^{-2}\, \big[ f(k) - f(\bar k(k)) \big],    (A8)

where k^µ = m_π z^µ / (1 − z^µ). This integral is ultraviolet finite, unlike the original integrals.

This integral is easy for VEGAS+. Using the last 10 of 15 iterations gives results with a 7.5% uncertainty, which is accurate enough for most practical applications. Increasing the number of samples N_ev per iteration gives 0.5% errors; classic VEGAS gives 1.0% for the same N_ev.

We also tested VEGAS+ on a 10th-order QED contribution to the muon's magnetic moment. We examined the contributions from the light-by-light diagrams with two vacuum polarization insertions (diagrams VI(a) in Fig. 6 and Table IX of Ref. 23), using a (Fortran) code for the integrand provided by T. Kinoshita. The integration is over 9 Feynman parameters. We studied two cases: one where all the fermion-loop particles are muons and the other where they are electrons. The latter has large factors of log(m_µ/m_e). We found that VEGAS+ was 3–4x more accurate than classic VEGAS in both cases, yielding uncertainties smaller than 0.01–0.02%.
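The subtracted integrand of Eq. (A8) is straightforward to code. A Python sketch using the vegas package follows; the pion mass value and the fm-to-GeV conversion are our illustrative choices, not taken from the paper:

import numpy as np
import vegas

mpi = 0.14                  # pion mass in GeV (illustrative value)
L = 5.0 / 0.1973            # L = 5 fm converted to GeV^-1 (hbar*c = 0.1973 GeV fm)
dk = 2 * np.pi / L          # Eq. (A4)

def f(k2):
    return 1.0 / (k2 + mpi ** 2) ** 2      # Eq. (A2)

def integrand(z):
    z = np.minimum(z, 1.0 - 1e-12)         # guard the z -> 1 edge
    k = mpi * z / (1.0 - z)                # k^mu = m_pi z^mu / (1 - z^mu)
    jac = mpi ** 4 / np.prod((1.0 - z) ** 2)
    kbar = np.round(k / dk) * dk           # Eq. (A6)
    return 16.0 * jac * (f(np.sum(k ** 2)) - f(np.sum(kbar ** 2)))   # Eq. (A8)

integ = vegas.Integrator(4 * [[0.0, 1.0]])
integ(integrand, nitn=5, neval=10000)      # adapt
print(integ(integrand, nitn=10, neval=10000).summary())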
Appendix B: Bayesian Curve Fitting

VEGAS+'s ability to find and target narrow high peaks in an integrand makes it well suited for evaluating the integrals used in Bayesian analyses. It can be much faster than other popular methods, such as Markov Chain Monte Carlo (MCMC), when applied to small or medium sized problems.

To illustrate a Bayesian analysis, we consider fitting a straight line p_0 + p_1 x to the data in Fig. 10. The error estimates for several of the points are clearly wrong, so we model the data's probability density as a sum of two Gaussians, one with the nominal width and another with 10x that width:

    P_{data}(y, \sigma_y | p, w) \equiv
        \frac{1-w}{\sqrt{2\pi}\,\sigma_y}\, e^{-(y - p_0 - p_1 x)^2 / 2\sigma_y^2}
      + \frac{w}{\sqrt{2\pi}\,(10\sigma_y)}\, e^{-(y - p_0 - p_1 x)^2 / 2(10\sigma_y)^2}.    (B1)

Here w is the probability of a bad error estimate. Assuming flat priors for the p_µ and w, the Bayesian posterior probability density is proportional to

    f(p, w) = P_{prior}(p, w) \prod_{i=1}^{19} P_{data}(y_i, \sigma_{y_i} | p, w),    (B2)

where

    P_{prior}(p, w) \propto \Theta(0 < w < 1) \prod_{\mu=0}^{1} \Theta(p_\mu \in I_p)    (B3)

and I_p is a broad interval chosen large enough that the posterior's support lies well inside the prior's.

FIG. 10. Bayesian fit (blue band) of data (19 blue data points) with posterior probability Eq. (B1) and the VEGAS+ parameter estimates discussed below. The dotted line shows the fit from a standard least-squares analysis. The band shows the ±1σ range around the best-fit line.

The Bayesian probability distribution is normalized by computing the 3-dimensional integral

    I = \int d^2p \int_0^1 dw\, f(p, w),    (B4)

where the p integrals run over the prior interval I_p. The mean values for the three parameters are calculated from additional integrals,

    \langle p \rangle = \frac{1}{I} \int d^2p \int_0^1 dw\, f(p, w)\, p,
    \langle w \rangle = \frac{1}{I} \int d^2p \int_0^1 dw\, f(p, w)\, w,    (B5)

and their covariances from still further integrals:

    \mathrm{cov}_p = \frac{1}{I} \int d^2p \int_0^1 dw\, f(p, w)\, p\, p^T - \langle p \rangle \langle p \rangle^T,
    \mathrm{var}_w = \frac{1}{I} \int d^2p \int_0^1 dw\, f(p, w)\, w^2 - \langle w \rangle^2.    (B6)

Additional integrals could provide expectation values ⟨g(p)⟩ and/or histograms for arbitrary functions g(p). And so on.

These integrals could be done separately using VEGAS+, but it is generally much better to do them simultaneously, using the same sample points (p, w) for all of the integrals. This is because the VEGAS+ errors in the different integrals are then highly correlated, leading to significant cancellations in the errors for ratios like Eq. (B5) and differences like Eq. (B6). VEGAS+ can estimate the covariances between the estimates of different integrals by summing the covariances coming from each hypercube (estimated using the multivariable generalization of Eq. (6)). Given the covariances, it is possible to account for cancellations in the uncertainties associated with ratios, differences, and other combinations of the integration results.
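This simultaneous-integration strategy is directly supported by the vegas package of Ref. 18, where an integrand may return an array and the results are gvar objects whose arithmetic propagates the correlated errors. A sketch with made-up data standing in for Fig. 10 (the data values, prior range, and 10x width factor here are illustrative assumptions):

import numpy as np
import vegas

# Toy data standing in for the 19 points of Fig. 10 (values invented).
xd = np.linspace(0.0, 10.0, 19)
yd = 0.5 + 0.9 * xd + np.random.default_rng(1).normal(0.0, 0.2, 19)
sig = np.full(19, 0.2)

def pdata(p0, p1, w):
    # Eq. (B1); the 10x width factor follows the reconstruction above.
    r2 = (yd - p0 - p1 * xd) ** 2
    g1 = np.exp(-r2 / (2 * sig ** 2)) / (np.sqrt(2 * np.pi) * sig)
    g2 = np.exp(-r2 / (2 * (10 * sig) ** 2)) / (np.sqrt(2 * np.pi) * 10 * sig)
    return np.prod((1 - w) * g1 + w * g2)

def integrand(v):
    p0, p1, w = v
    fv = pdata(p0, p1, w)
    # All integrands evaluated at the same points -> correlated errors.
    return [fv, fv * p0, fv * p1, fv * w]

integ = vegas.Integrator([[-5, 5], [-5, 5], [0, 1]])
integ(integrand, nitn=10, neval=2000)              # adapt
r = integ(integrand, nitn=10, neval=2000)
print('<p0> =', r[1] / r[0], ' <p1> =', r[2] / r[0], ' <w> =', r[3] / r[0])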
VEGAS+ with 28,000 integrand samples distributed across 15 iterations gives values for the three model parameters ⟨p_0⟩, ⟨p_1⟩, and ⟨w⟩ that are accurate to 0.06–0.3% (much more than is needed). We drop the first 5 iterations when estimating parameters. Ignoring correlations between VEGAS+ errors gives results that are 2.5–12x less accurate.

To compare with MCMC, we generate 150,000 integrand samples with an MCMC and use the last two thirds of those samples to estimate means for the parameters. The results are 3–4x less accurate than from VEGAS+, despite using more than 5x as many integrand samples. We estimate the errors for the MCMC simulations by rerunning them several times.

We show the fit line corresponding to the VEGAS+ results for ⟨p⟩ and cov_p in Fig. 10 (blue band). This is more plausible than the fit suggested by a standard least-squares analysis (dotted line). Note that the slope and intercept are strongly anti-correlated. Classic VEGAS is only somewhat less accurate (40%) than VEGAS+ for these integrals.

Our MCMC analysis is more than an order of magnitude less efficient than VEGAS+ when computing the means and covariances of the fit parameters. The MCMC's precision is limited by its de-correlation time, the number of MCMC steps required between samples to de-correlate them. This accounts for most of the difference here. The last iteration of VEGAS+ from above, for example, uses N_ev = 1704 integrand samples to obtain 0.76% and 0.14% errors on the intercept and slope, respectively. (The results quoted above are from the last 10 iterations and so have smaller errors.) From the computed covariance matrix cov_p, we can estimate that the same number of uncorrelated samples from the posterior distribution Eq. (B1) would give errors of 1.15% and 0.22%, respectively. So samples from a perfect Monte Carlo (i.e., de-correlation time equals one step) would be slightly less valuable here than samples from VEGAS+, once it has adapted. No general-purpose MCMC is perfect, of course. The advantage from VEGAS+ samples grows with increasing N_ev, because VEGAS+ errors fall faster than 1/√N_ev due to its adaptive stratified sampling (c.f., Eq. (47)).

We also compared VEGAS+ with MCMC for the more difficult problem where there is a separate value of w for each data point. Then w becomes a 19-component vector in the equations above, and the integrals are over 21 variables: p and w. VEGAS+ gives results with better than 1% errors from the last 8 out of 24 iterations, using 162,000 integrand samples in all. An MCMC analysis with 10x as many samples gives results that are 7–13x less accurate than from VEGAS+. This MCMC analysis used the last third of the samples for estimating parameters.

Classic VEGAS and VEGAS+ are the same for this last example since there are only enough integrand samples to allow a single hypercube in 21 dimensions. This follows because the default behavior in VEGAS+ is to use the same number of stratifications in each direction, and N_st = 2 is too many (see Eq. (50)). We can cut the errors for the slope and intercept in half, however, by using N_st^µ = 46 stratifications in each of the p directions, with no stratification (N_st^µ = 1) in the other directions. This can be done with the same number of integrand samples as in the unstratified case.

VEGAS+ results for this problem can be improved (by factors of order 1.5–4) if approximate values for the means of the parameters and their covariance matrix are known ahead of time, for example, from a peak-finding algorithm. Writing the fit parameters as a vector c, this is done by first expressing the deviation from the approximate mean \bar c in terms of the normalized eigenvectors u_n of the approximate covariance matrix:

    c \equiv \bar c + \sum_n b_n u_n.    (B7)

The integral is then rewritten as an integral over the coefficients b_n rather than the components of c. This transformation reorients the error ellipse so it is aligned with the integration axes, making it easier for the VEGAS map to adapt around the peak. Ref. 7 gives an example of this strategy in use.

∗ [email protected]

G. P. Lepage, "A New Algorithm for Adaptive Multidimensional Integration," J. Comput. Phys. 27, 192–203 (1978).

B. P. Kersevan and E. Richter-Was, "The Monte Carlo event generator AcerMC versions 2.0 to 3.8 with interfaces to PYTHIA 6.4, HERWIG 6.5 and ARIADNE 4.1," Comput. Phys. Commun., 919–985 (2013).

J. Alwall, R. Frederix, S. Frixione, V. Hirschi, F. Maltoni, O. Mattelaer, H.-S. Shao, T. Stelzer, P. Torrielli, and M. Zaro, "The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations," JHEP 07 (2014) 079.

T. Aoyama, M. Hayakawa, T. Kinoshita, and M. Nio, "Complete Tenth-Order QED Contribution to the Muon g−2," Phys. Rev. Lett. 109, 111808 (2012).

G. Garberoglio and A. H. Harvey, "Path-integral calculation of the third virial coefficient of quantum gases at low temperatures," J. Chem. Phys. 134, 134106 (2011).

G. Campolieti and R. Makarov, "Pricing Path-Dependent Options on State Dependent Volatility Models with a Bessel Bridge," Int. J. Theor. Appl. Finance, 51–88 (2007).

P. Serra, A. Heavens, and A. Melchiorri, "Bayesian Evidence for a cosmological constant using new high-redshift supernova data," Mon. Not. R. Astron. Soc., 169–175 (2007).

J. L. Sanders, "Probabilistic model for constraining the Galactic potential using tidal streams," Mon. Not. R. Astron. Soc., 423–431 (2014).

K. Gültekin, D. O. Richstone, K. Gebhardt, T. R. Lauer, S. Tremaine, M. C. Aller, R. Bender, A. Dressler, S. M. Faber, A. V. Filippenko, R. Green, L. C. Ho, J. Kormendy, J. Magorrian, J. Pinkney, and C. Siopis, "The M−σ and M−L Relations in Galactic Bulges, and Determination of their Intrinsic Scatter," Astrophys. J. 698, 198–221 (2009).

J. Ray, Y. M. Marzouk, and H. N. Najm, "A Bayesian approach for estimating bioterror attacks from patient data," Statist. Med., 101–126 (2011).

F. M. Atay and A. Hutt, "Neural Fields with Distributed Transmission Speeds and Long-Range Feedback Delays," SIAM J. Appl. Dyn. Syst. 5, 670–698 (2006).

J. S. Dehesa, T. Koga, R. J. Yáñez, A. R. Plastino, and R. O. Esquivel, "Quantum entanglement in helium," J. Phys. B: At. Mol. Opt. Phys. 45, 015504 (2012).

D. Varjas, F. de Juan, and Y.-M. Lu, "Bulk invariants and topological response in insulators and superconductors with nonsymmorphic symmetries," Phys. Rev. B 92, 195116 (2015).

VEGAS+, the algorithm described in this paper, can find the 20-D sphere within 10 iterations; another 10 iterations gives an estimate for the volume that is accurate to 0.05%.

For a general discussion of importance and stratified sampling see: J. M. Hammersley and D. C. Handscomb, Monte Carlo Methods, Chapt. 5 (Chapman and Hall, London, 1979).

W. H. Press and G. R. Farrar, "Recursive Stratified Sampling For Multidimensional Monte Carlo Integration," Comp. in Phys. 4, 190 (1990).

We used a Python adaptation of the version of the MISER code given in: W. H. Press et al., Numerical Recipes in C, 2nd Edition (Cambridge University Press, Cambridge, 2002).

Error estimates from VEGAS and VEGAS+ in most examples come from multiple iterations which were combined as described in Sec. II E, with good χ^2's in each case. Estimated values for the integral agreed with the exact values to within errors (i.e., mostly within ±1σ).

G. Peter Lepage, gplepage/vegas v3.4.5 (2020), Zenodo, http://doi.org/10.5281/zenodo.3897199. The most recent code is available at: https://github.com/gplepage/vegas.

See, for example, the implementation in Ref. 18.

Classic VEGAS typically defaults to α = 1.5. VEGAS+ can use a smaller default value, α = 0.5, because of its adaptive stratified sampling.

The N x N Hilbert matrix has elements H_µν = 1/(µ + ν − 1) for µ, ν = 1...N. It is famously ill-conditioned.

T. Foster, C. L. Lei, M. Robinson, D. Gavaghan, and B. Lambert, "Model Evidence with Fast Tree Based Quadrature," arXiv:2005.11300v1. We used the TQs software provided by the authors at: https://github.com/thomfoster/treeQuadrature.

T. Kinoshita and M. Nio, Phys. Rev. D 73, 053007 (2006).

This example is adapted from J. Vanderplas' Python blog: http://jakevdp.github.io/blog/2014/06/06/frequentism-and-bayesianism-2-when-results-differ/. The data in the figure are included in the examples bundled with the software in Ref. 18.

We used the emcee Python package for the MCMC simulations. It is an implementation of the algorithm described in: J. Goodman and J. Weare, "Ensemble Samplers with Affine Invariance," Comm. App. Math. and Comp. Sci. 5, 65–80 (2010).

A perfect Monte Carlo (i.e., samples de-correlate in one step) would beat