Adaptive Multidimensional Integration: VEGAS Enhanced
G. Peter Lepage∗
Department of Physics, Cornell University, Ithaca, NY, 14853
(Dated: September 22, 2020)

We describe a new algorithm, VEGAS+, for adaptive multidimensional Monte Carlo integration. The new algorithm adds a second adaptive strategy, adaptive stratified sampling, to the adaptive importance sampling that is the basis for its widely used predecessor VEGAS. Both VEGAS and VEGAS+ are effective for integrands with large peaks, but VEGAS+ can be much more effective for integrands with multiple peaks or other significant structures aligned with diagonals of the integration volume. We give examples where VEGAS+ is 2–17x more accurate than VEGAS. We also show how to combine VEGAS+ with other integrators, such as the widely available MISER algorithm, to make new hybrid integrators. For a different kind of hybrid, we show how to use integrand samples, generated using MCMC or other methods, to optimize VEGAS+ before integrating. We give an example where preconditioned VEGAS+ is more than 100x as efficient as VEGAS+ without preconditioning. Finally, we give examples where VEGAS+ is more than 10x as efficient as MCMC for Bayesian integrals with D = 3 and 21 parameters. We explain why VEGAS+ will often outperform MCMC for small and moderate-sized problems.
I. INTRODUCTION
Classic VEGAS is an algorithm for adaptive multidimensional Monte Carlo integration. It is widely used in particle physics: for example, in Monte Carlo event generators, and to evaluate low-order and high-order Feynman diagrams and cross sections numerically. It has also been used in other fields for a variety of applications including, for example, path integrals for chemical physics and option pricing in applied finance, Bayesian statistics for astrophysics and medical statistics, models of neuronal networks, wave-function overlaps for atomic physics, topological integrals for condensed matter physics, and so on.

Monte Carlo integration is unusually robust. It makes few assumptions about the integrand — it needn't be analytic or even continuous — and provides useful measures for the uncertainty and reliability of its results. This makes it well suited to multidimensional integration, and in particular adaptive multidimensional integration. Adaptive strategies are essential for integration in high dimensions because important structures in integrands often occupy small fractions of the integration volume. One might not think, for example, of the interior of a sphere of radius 0.5 enclosed by a unit hypercube as a "sharp peak," but it occupies only 0.0000025% of the hypercube in D = 20 dimensions.

Classic VEGAS is very effective for integrands with sharp peaks, which are common in particle physics applications and Bayesian integrals. While it is particularly effective for integrals that are separable (into a product of one-dimensional integrals), it also works very well for non-separable integrals with large peaks. It works less well for integrands with multiple peaks or other important structures aligned with diagonals of the integration volume, although it is usually much better than Simple Monte Carlo integration.

In this paper, we describe a modification of classic VEGAS, which we call VEGAS+, that adds a second adaptive strategy to classic VEGAS's adaptive importance sampling. The second strategy is a form of adaptive stratified sampling that makes VEGAS+ far more effective in dealing with multiple peaks and diagonal structures in integrands, given enough integrand samples.

We begin in Sec. II by reviewing the algorithm for (iterative) adaptive importance sampling used in classic VEGAS. We discuss in particular the variable map used by VEGAS to switch to new integration variables that flatten the integrand's peaks. This is how VEGAS implements importance sampling. We describe the new algorithm VEGAS+ in Sec. III and give examples that illustrate how and why it works. We also discuss its limitations.

The map used by VEGAS to implement importance sampling can be used in conjunction with other integration algorithms to create new hybrid algorithms. We show how this is done in Sec. IV, where we combine VEGAS+ with the MISER algorithm. There we also show how to use sample data, from a Markov Chain Monte Carlo (MCMC) or other peak-finding algorithm, to optimize the VEGAS map before integrating; this can greatly reduce the cost of integrating functions with multiple high, narrow peaks.

Our conclusions are in Sec. V. We continue in two appendices with further VEGAS+ examples that illustrate its use for adaptive multidimensional summation, high-order Feynman diagrams, and Bayesian curve fitting, where we compare VEGAS+ with MCMC for a 3-dimensional and a 21-dimensional problem.

When comparing algorithms we do not look at computer run times because these are too sensitive to implementation details. For realistic applications the cost of evaluating the integrand usually exceeds other costs, so we measure efficiency by the number of integrand evaluations used. Where statistical uncertainties are quoted, they correspond to one standard deviation; small differences (e.g., 10%) between uncertainties are not significant. The VEGAS+ examples here were analyzed using a widely available Python/Cython implementation first released in 2013. The integrands are written in (vectorized) Python or, in one case, Fortran77.

One change since classic VEGAS was introduced 45 years ago is the enormous increase in speed of computers. As a result VEGAS+ integrals using 10^6 or 10^7 integrand evaluations require only minutes for simple integrands on a modern laptop. VEGAS+ is also easily configured for parallel computing.

II. ADAPTIVE IMPORTANCE SAMPLING
In this section we review the adaptive strategy used in the original VEGAS algorithm: its remapping of the integration variables in each direction to optimize Monte Carlo estimates of the integral. The strategy is an implementation of the standard technique of importance sampling. In the following sections we review how VEGAS implements this strategy, first for one-dimensional integrals and then for multidimensional integrals.
A. Remapping the Integration Variable
For one-dimensional integrals, the VEGAS strategy is to replace the original integral

    I = \int_a^b dx\, f(x)    (1)

by an equivalent integral over a new variable y,

    I = \int_0^1 dy\, J(y)\, f(x(y)),    (2)

where J(y) is the Jacobian of the transformation. The transformation is chosen to minimize the uncertainty in a Monte Carlo evaluation of the integral in y-space.

A Simple Monte Carlo estimate of the y-integral (Eq. (2)) is given by

    I \approx I_{MC} \equiv \frac{1}{N_{ev}} \sum_y J(y)\, f(x(y)),    (3)

where the sum is over N_ev random points y uniformly distributed between 0 and 1. The estimate I_MC is itself a random number whose mean is the exact value I of the integral and whose variance is

    \sigma_I^2 = \frac{1}{N_{ev}} \Big( \int_0^1 dy\, J^2(y)\, f^2(x(y)) - I^2 \Big)    (4)
              = \frac{1}{N_{ev}} \Big( \int_a^b dx\, J(y(x))\, f^2(x) - I^2 \Big).    (5)

The standard deviation σ_I is an indication of the possible error in the Monte Carlo estimate. The variance σ_I^2 can also be estimated from the Monte Carlo data:

    \bar\sigma^2 \equiv \frac{1}{N_{ev}-1} \Big( \frac{1}{N_{ev}} \sum_y J^2(y)\, f^2(x(y)) - I_{MC}^2 \Big).    (6)

The distribution of the Monte Carlo estimates I_MC becomes Gaussian in the limit of large N_ev, assuming J(y) f(x(y)) is sufficiently integrable. There are non-Gaussian corrections, but they vanish quickly with increasing N_ev. For example, the Monte Carlo estimates for I and σ_I^2 would be uncorrelated with Gaussian statistics, but here have a nonzero correlation that vanishes like 1/N_ev^2:

    \langle (I_{MC} - I)(\bar\sigma^2 - \sigma_I^2) \rangle = \langle (I_{MC} - I)^3 \rangle
        = \frac{1}{N_{ev}^2} \int_0^1 dy\, \big( J(y) f(x(y)) - I \big)^3.    (7)
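For concreteness, the following minimal Python sketch computes the estimates of Eqs. (3) and (6) for a user-supplied map x(y) and Jacobian; the function names here are illustrative and not part of any VEGAS implementation:

import numpy as np

def mc_estimate(f, x_of_y, jac, n_ev, rng=np.random.default_rng(0)):
    # Simple Monte Carlo estimate of I = int_0^1 dy J(y) f(x(y)).
    y = rng.uniform(0.0, 1.0, n_ev)
    jf = jac(y) * f(x_of_y(y))            # J(y) f(x(y)) at each sample point
    i_mc = jf.mean()                      # Eq. (3)
    var = ((jf ** 2).mean() - i_mc ** 2) / (n_ev - 1)   # Eq. (6)
    return i_mc, np.sqrt(var)

# Trivial map x(y) = y (J = 1); any peak-flattening map reduces the error.
f = lambda x: np.exp(-100 * (x - 0.5) ** 2)
print(mc_estimate(f, lambda y: y, lambda y: np.ones_like(y), 10000))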
B. VEGAS Map and Importance Sampling
VEGAS implements the variable map described in the previous section by dividing the x-axis into N_g intervals bounded by

    x_0 = a,
    x_1 = x_0 + \Delta x_0,
    \cdots
    x_{N_g} = x_{N_g-1} + \Delta x_{N_g-1} = b,    (8)

where N_g = 1000 is typical. The transformed variable has value y = i/N_g at point x_i, and varies linearly with x between the x_i's. Thus

    x(y) \equiv x_{i(y)} + \Delta x_{i(y)}\, \delta(y)    (9)

relates x to y, where functions i(y) and δ(y) are the integer and fractional parts of y N_g, respectively:

    i(y) \equiv \mathrm{floor}(y N_g),    (10)
    \delta(y) \equiv y N_g - i(y).    (11)

This transformation maps the interval [0, 1] in y-space onto the original integration region [a, b] in x-space. Intervals of varying widths Δx_i in x-space map into intervals of uniform width Δy = 1/N_g in y-space. The Jacobian for this transformation,

    J(y) = N_g\, \Delta x_{i(y)} \equiv J_{i(y)},    (12)

is a step function whose values are determined by the interval widths Δx_i.

Substituting this Jacobian into Eq. (5) for the uncertainty in a Monte Carlo integration gives

    \sigma_I^2 = \frac{1}{N_{ev}} \Big( \sum_i J_i \int_{x_i}^{x_i+\Delta x_i} dx\, f^2(x) - I^2 \Big).    (13)

Treating the J_i as independent variables, subject to the constraint

    \sum_i \frac{\Delta x_i}{J_i} = \sum_i \Delta y = 1,    (14)

it is easy to show that σ_I^2 is minimized when

    \frac{J_i^2}{\Delta x_i} \int_{x_i}^{x_i+\Delta x_i} dx\, f^2(x) = \text{constant}.    (15)

That is, the grid is optimal when the average value of J^2(y(x)) f^2(x) in each interval Δx_i is the same for every interval.

A transformation with this property greatly reduces the standard deviation when the integrand has high peaks. The Jacobian flattens the peaks in y-space and stretches them out, because

    J \equiv \Big| \frac{dx}{dy} \Big| \propto \frac{1}{|f(x)|}    (16)

becomes small near a peak. This means that a uniform Monte Carlo in y-space concentrates integrand samples around the peaks in x-space. Each interval Δx_i receives on average the same number of samples (= N_ev/N_g), and the smallest intervals are placed where |f(x)| is largest (because J_i ∝ Δx_i). Thus samples are concentrated in the most important regions, which is why this method is called importance sampling.
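The map of Eqs. (9)-(12) is simple to implement. The following Python sketch (our notation, not code from any VEGAS release) evaluates x(y) and J(y) for a stored grid:

import numpy as np

class VegasMap1D:
    # 1-D VEGAS map of Eqs. (8)-(12): y in [0,1] -> x in [a,b].
    # The grid is the N_g + 1 interval boundaries x_0 ... x_{N_g}.
    def __init__(self, a, b, ng):
        self.grid = np.linspace(a, b, ng + 1)   # uniform initial grid
        self.ng = ng

    def __call__(self, y):
        # x(y) = x_i + dx_i * delta(y); returns (x, J).
        i = np.floor(y * self.ng).astype(int)   # Eq. (10)
        i = np.clip(i, 0, self.ng - 1)          # keep y = 1 in range
        delta = y * self.ng - i                 # Eq. (11)
        dx = self.grid[i + 1] - self.grid[i]
        return self.grid[i] + dx * delta, self.ng * dx   # Eq. (9), Eq. (12)

m = VegasMap1D(0.0, 1.0, 1000)
x, jac = m(np.array([0.1, 0.5, 0.9]))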
C. Iterative Adaptation

The set of x_i's defined above constitutes the VEGAS map. To optimize Monte Carlo integration, VEGAS varies the Jacobian of the transformation by varying the interval sizes Δx_i, while keeping the sum of all Δx_i's constant. This is done iteratively. First VEGAS estimates the integral with a uniform grid, accumulating information about the integrand in the process. This information is then used to construct an improved grid, and VEGAS makes a new estimate of the integral. Again information about the integrand is accumulated in the process, and used to further improve the grid. In this fashion, the VEGAS map adapts to the integrand over several iterations.

To illustrate how this is done, we continue with the example above where we generate a Monte Carlo estimate of the integral in y-space (Eq. (2)) by sampling the integrand at N_ev random points y (Eq. (3)). Given some initial grid, VEGAS accumulates the average value of J^2 f^2 for each interval in the grid while sampling the integrand to estimate the integral:

    d_i \equiv \frac{1}{n_i} \sum_{x(y) \in \Delta x_i} J^2(y)\, f^2(x(y)),    (17)

where n_i ≈ N_ev/N_g is the number of samples in interval Δx_i. The averages d_i are used to refine the grid. The grid is optimal when all of the d_i's are equal (Eq. (15)), so VEGAS adjusts the grid intervals Δx_i to make the d_i more constant across the integration region.

This algorithm, like most adaptive algorithms, tends to overreact in the early stages of optimizing its grid, since it has rather poor information concerning the integrand at this stage. It is important, therefore, to dampen the refinement process so as to avoid rapid, destabilizing changes in the grid. In VEGAS, the d_i's are first smoothed and normalized:

    d_i \to \frac{1}{\sum_i d_i} \times
        (7 d_0 + d_1)/8                    for i = 0,
        (d_{i-1} + 6 d_i + d_{i+1})/8      for i \ne 0, N_g - 1,
        (d_{N_g-2} + 7 d_{N_g-1})/8        for i = N_g - 1.    (18)

Smoothing is particularly important if the integrand has large discontinuities (e.g., step functions). For example, an abrupt increase in the integrand near the upper edge of interval Δx_i might be missed completely by the samples. Without smoothing, VEGAS would see a sudden rise in the function beginning only in Δx_{i+1}, and refine the grid accordingly, thereby missing the small but possibly significant part of the step in interval Δx_i. Smoothing makes this less likely by causing VEGAS to focus some effort on interval Δx_i as well as Δx_{i+1}.

Having smoothed the d_i's, VEGAS then compresses their range, to avoid overreacting to atypically large sample values for the integrand. This is done by replacing each d_i with

    d_i \to \Big( \frac{1 - d_i}{\ln(1/d_i)} \Big)^\alpha,    (19)

where α ≥ 0 is typically of order one. Parameter α can be reduced in situations where VEGAS has trouble finding or holding onto the optimal grid; α = 0 implies no grid refinement.

The condition for an optimal map remains that all d_i, now smoothed and compressed, be roughly equal. If the map is not optimal, VEGAS attempts to improve it. First the d_i's are treated as continuous quantities, and each d_i is distributed uniformly over its interval Δx_i. Then new intervals, specified by {x'_i, Δx'_i}, are chosen so that each contains an equal fraction of the total d = Σ_i d_i. The following algorithm achieves this (a code sketch appears below):

1. Define δd to be the amount of d associated with each interval of the new grid,

       \delta d \equiv \frac{\sum_i d_i}{N_g},    (20)

   and initialize the following variables:

       x'_0 = x_0,
       x'_{N_g} = x_{N_g},
       i = 0  (index of current new x'),
       j = 0  (index of current old x),
       S_d = 0  (amount of d accumulated).    (21)

2. Increment i. If i ≥ N_g, the new grid is finished.

3. Skip to the next step if S_d ≥ δd; otherwise add d_j to S_d, increment j, and return to the beginning of this step.

4. Subtract δd from S_d, and compute the boundary of the new interval by interpolation:

       x'_i = x_j - \frac{S_d}{d_{j-1}}\, \Delta x_{j-1}.    (22)

   Return to Step 2.

Replacing the old map with the new map, VEGAS proceeds to generate a new Monte Carlo estimate for the integral and another new map. This entire process is repeated until the map has converged and the estimate of the integral is sufficiently accurate.
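The refinement step is compact enough to state in code. Below is a Python sketch of Eqs. (18)-(22); the function name is ours, and the default damping α = 0.5 is the VEGAS+ default mentioned in the notes at the end of the paper:

import numpy as np

def refine_grid(grid, d, alpha=0.5):
    # grid : current boundaries x_0 ... x_{N_g}; d : the averages d_i of Eq. (17).
    ng = len(d)
    # Eq. (18): smooth and normalize the d_i.
    ds = np.empty(ng)
    ds[0] = (7 * d[0] + d[1]) / 8
    ds[-1] = (d[-2] + 7 * d[-1]) / 8
    ds[1:-1] = (d[:-2] + 6 * d[1:-1] + d[2:]) / 8
    ds /= ds.sum()
    # Eq. (19): compress the range; alpha = 0 means no refinement.
    nz = ds > 0
    ds[nz] = ((1 - ds[nz]) / np.log(1 / ds[nz])) ** alpha
    # Eqs. (20)-(22): rebuild boundaries so each new interval holds
    # an equal share delta_d of the total.
    delta_d = ds.sum() / ng
    newgrid = np.empty(ng + 1)
    newgrid[0], newgrid[-1] = grid[0], grid[-1]
    j, acc = 0, 0.0                   # j = old-grid index, acc = S_d
    for i in range(1, ng):            # step 2
        while acc < delta_d:          # step 3
            acc += ds[j]
            j += 1
        acc -= delta_d                # step 4
        newgrid[i] = grid[j] - (acc / ds[j - 1]) * (grid[j] - grid[j - 1])
    return newgrid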
D. Multidimensional Integrals
Monte Carlo integration, even with adaptive importance sampling, is not usually competitive with other algorithms for one-dimensional integration. It rapidly becomes competitive, however, as the dimensionality increases.

The VEGAS algorithm extends the algorithm described above to D-dimensional integrals over variables x^µ (with µ = 1...D) by replacing each x^µ with a new variable y^µ. The variable transformation is specified by an independent VEGAS map {x_i^µ, Δx_i^µ} for each direction. The resulting integral is

    I = \int d^D y\, J(y)\, f(x(y)),    (23)

where x(y) is the D-dimensional VEGAS map and J(y) its Jacobian. A Simple Monte Carlo estimate of this integral is obtained by sampling the integrand at random points y = {y^µ} distributed uniformly within the unit hypercube at the origin:

    0 < y^\mu < 1.    (24)

The integrand samples are also used to calculate the averages

    d_i^\mu \equiv \frac{1}{n_i^\mu} \sum_{x^\mu(y^\mu) \in \Delta x_i^\mu} J^2(y)\, f^2(x)    (25)

for every interval on every integration axis. These are used to improve the grid for each variable after each iteration, following the procedure described in Section II C.

Fig. 1 shows the grid corresponding to a VEGAS map optimized for the D = 4 dimensional integral

    \int_0^1 d^4x\, \Big( e^{-100\,(\mathbf{x} - \mathbf{r}_1)^2} + e^{-100\,(\mathbf{x} - \mathbf{r}_2)^2} \Big),    (26)

where vector x = (x_1, x_2, x_3, x_4), and

    \mathbf{r}_1 = (0.33, 0.5, 0.5, 0.5),
    \mathbf{r}_2 = (0.67, 0.5, 0.5, 0.5).    (27)

The grid concentrates increments near 0.33 and 0.67 for x_1, and near 0.5 in the other directions. Each rectangle in the figure receives, on average, the same number of Monte Carlo integration samples.

FIG. 1. VEGAS map for the D = 4 integral defined in Eqs. (26) and (27). The figure shows grid lines for every 50th increment along the x_1 and x_2 axes; grids for the other axes are the same as for x_2.

This VEGAS map has N_g = 1000 increments along each axis. The accuracy of the y-space integrals is typically insensitive to N_g so long as it is large enough. With N_ev fixed, the accuracy improves from 11% with N_g = 1 (i.e., no VEGAS map) to 0.3% at N_g = 100, and flattens out at 0.1% around N_g = 700.

The VEGAS map is particularly effective for integrals like that in Eq. (26), because the integral over each Gaussian can be separated into a product of one-dimensional integrals over each direction. It also works well, however, for other integrands with high peaks that are not separable in this way. The solid line in Fig. 2, for example, shows how the uncertainty in a y-space Monte Carlo is reduced over 20 iterations for the D = 4 integral

    \int_0^1 d^4x \sum_{i=1}^2 \Theta\big( |\mathbf{x} - \mathbf{r}_i| < 0.08 \big),    (28)

where the r_i are given in Eq. (27) and

    \Theta(x) = 1 if x = True, 0 if x = False.    (29)

This integral is harder than the previous one, because the peaks have no shoulders and so are difficult to find — the integrand vanishes over 99.96% of the integration volume. The VEGAS map is nevertheless able to reduce the fractional uncertainty from 24% before adapting to 0.34% after 10–20 iterations, using a fixed number N_ev of Monte Carlo samples for each iteration. The damping parameter was set lower here than its default, to ensure smooth adaptation (solid line); setting α = 1.0 leads to instability (dotted line).

FIG. 2. Percent uncertainty (1σ) for each VEGAS iteration in a simple y-space Monte Carlo for the D = 4 integral defined in Eqs. (28) and (27). The solid line shows how the uncertainty improves with successive iterations of the refinement process with a reduced damping parameter α; the dotted line shows (unstable) improvement with α = 1.0.

This last example also illustrates how the VEGAS map can improve integrals over volumes with irregular shapes. Monte Carlo integration works well even with discontinuous integrands and so has no problem with the Θ functions that define the integration volume. The VEGAS map helps find the integration region and concentrate samples in that region. Obviously it is better, where possible, to redefine the integration variables so that the integration region is rectangular.
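The integral of Eqs. (26)-(27) is a convenient test of any implementation. Here is a sketch using the Python vegas package of Ref. 18 (interface as documented there; the Gaussians are not normalized, so the exact value must be computed separately for comparison):

import numpy as np
import vegas

def f(x):
    # Two narrow Gaussians, Eqs. (26)-(27).
    ans = 0.0
    for r1 in (0.33, 0.67):
        r = np.array([r1, 0.5, 0.5, 0.5])
        ans += np.exp(-100 * np.sum((x - r) ** 2))
    return ans

integ = vegas.Integrator(4 * [[0.0, 1.0]])
integ(f, nitn=10, neval=10000)            # adapt the map; discard results
result = integ(f, nitn=10, neval=10000)   # final estimate
print(result.summary())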
E. Combining and Comparing Iterations
VEGAS, being iterative, generates a series of estimates I_j of the integral (Eq. (3)), each with its own error estimate σ_j ≡ σ_{I_j} (Eq. (6)). As discussed above, the distribution of I_j's for a given VEGAS map is approximately Gaussian when a sufficient number of samples N_ev is used (and assuming the integrand is square integrable). Then we can combine estimates from separate iterations to obtain a cumulative estimate of the integral and its standard error:

    \bar I = \frac{\sum_j I_j / \sigma_j^2}{\sum_j 1/\sigma_j^2},    (30)
    \sigma_{\bar I} = \Big( \sum_j \frac{1}{\sigma_j^2} \Big)^{-1/2}.    (31)

The cumulative estimate is usually superior to the estimates from separate iterations.

It is important to verify that results from different iterations are consistent with each other within the estimated uncertainties. This provides a direct check on the reliability of the error estimates. For example, we can calculate the χ^2 statistic for the estimates:

    \chi^2 = \sum_j \frac{(I_j - \bar I)^2}{\sigma_j^2}.    (32)

We expect χ^2 to be of order the number of iterations (less one) when the I_j are approximately Gaussian, and the estimates of the uncertainties σ_j are reliable. If χ^2 is considerably larger than this, either or both of \bar I and σ_{\bar I} may be unreliable. There are two common causes for χ^2's that are too large.

The first common cause is that early iterations, before the VEGAS map has adapted, may give very poor estimates for the integral and its uncertainty. This happens, for example, when the integrand has high, narrow peaks that are largely missed in early iterations, leading to estimates for the integral and error that are both much too small. A standard remedy is to omit the early iterations from the determinations of \bar I and σ_{\bar I}.

The second cause for χ^2's that are too large is that the number of samples N_ev is insufficiently large to guarantee Gaussian statistics for the I_j, even after the VEGAS map has fully adapted to the integrand. The threshold for an adequate N_ev is highly dependent on the integrand. In practice one finds this threshold through trial and error, by looking to see how large N_ev needs to be in order to obtain stable results and a reasonable χ^2. Although the statistical error in \bar I can be reduced by increasing either the number of samples N_ev or the number of iterations N_it, it is generally better to increase N_ev while keeping N_it just large enough to find the optimal VEGAS map and measure a χ^2; N_it = 5–20 is usually enough.

There is another, related reason for increasing the number of samples N_ev rather than the number of iterations N_it. While the estimates I_j from individual iterations give unbiased estimates of the integral for any N_ev, the weighted sum \bar I only becomes unbiased when N_ev is sufficiently large that non-Gaussian effects are negligible. The leading non-Gaussian effect is the correlation between fluctuations in I_j and σ_j^2 (Eq. (7)). It introduces a bias in the weighted average that vanishes like 1/N_ev with increasing N_ev and therefore is usually negligible compared to the statistical uncertainty σ_{\bar I}, which vanishes far more slowly. For example, the bias in \bar I for the Gaussian integral discussed above (Eq. (26)) is ten times smaller than σ_j when N_ev = 500, and so is negligible compared to σ_{\bar I} unless N_it is very large. The bias is halved again by N_ev = 1000.

The bias coming from the weighted average can usually be ignored, but it is easy to avoid it completely, if desired.
This is done by discarding results from the first 5 or 10 iterations of the VEGAS algorithm (while the grid is adapting), and then preventing the algorithm from adapting in subsequent iterations (by setting damping parameter α = 0). The unweighted average of the latter I_j's provides an unbiased estimate of the integral, and the unweighted average of the σ_j's divided by √N_it gives an estimate of the statistical uncertainty.
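Eqs. (30)-(32) are easily applied after the fact to any list of per-iteration results. A short Python sketch (function name ours):

import numpy as np

def combine(I, sigma):
    # Combine per-iteration estimates I_j +- sigma_j, Eqs. (30)-(32).
    # chi2 should be of order len(I) - 1 if the sigma_j are reliable.
    I, sigma = np.asarray(I), np.asarray(sigma)
    w = 1.0 / sigma ** 2
    Ibar = np.sum(w * I) / np.sum(w)      # Eq. (30)
    sigbar = np.sum(w) ** -0.5            # Eq. (31)
    chi2 = np.sum((I - Ibar) ** 2 * w)    # Eq. (32)
    return Ibar, sigbar, chi2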
III. ADAPTIVE STRATIFIED SAMPLING

The y-space integrals (Eq. (23)) in the previous section were estimated using Simple Monte Carlo integration. The VEGAS implementation currently in wide use ("classic VEGAS") improves on this by using stratified Monte Carlo sampling in y-space rather than Simple Monte Carlo. Given N_ev samples per iteration, the algorithm divides each y^µ axis (µ = 1...D) into

    N_{st} = \mathrm{floor}\big( (N_{ev}/2)^{1/D} \big)    (33)

stratifications of width

    \Delta y_{st} = 1/N_{st}.    (34)

This divides y-space into N_st^D hypercubes, each with volume Δy_st^D. The full integral and its variance are the sums of contributions from each hypercube h:

    I = \sum_h \Delta I_h \approx I_{MC},
    \sigma_I^2 = \sum_h \sigma_{I_h}^2 \approx \bar\sigma^2.    (35)

Simple Monte Carlo estimates are made for ΔI_h (c.f., Eq. (3)) and σ_{I_h}^2 (c.f., Eq. (5)) to obtain estimates I_MC and \bar\sigma^2 for the total integral and its variance, respectively. The number of integrand samples used is

    n_{ev} \equiv \mathrm{floor}(N_{ev}/N_{st}^D) \ge 2    (36)

per hypercube. The standard deviation for a stratified Monte Carlo estimate typically falls faster with increasing N_st than for unstratified sampling.

We can improve on the stratification strategy used by classic VEGAS by allowing the number of integrand samples n_h used in each hypercube h to vary from hypercube to hypercube. Then the variance in the Monte Carlo estimate for the integral is

    \sigma_I^2 = \sum_h \frac{\sigma_h^2(Jf)}{n_h},    (37)

where

    \sigma_h^2(Jf) \equiv \Omega_h \int_{\Omega_h} d^D y\, \big( J(y) f(x(y)) \big)^2
        - \Big( \int_{\Omega_h} d^D y\, J(y) f(x(y)) \Big)^2    (38)

and Ω_h is the hypercube's volume in y-space. Varying the n_h independently, constrained by

    \sum_h n_h = N_{ev},    (39)

it is easy to show that σ_I^2 is minimized when

    n_h \propto \sigma_h(Jf).    (40)

The innovation in the new VEGAS ("VEGAS+") is to redistribute integrand samples across the hypercubes, according to Eq. (40), after each iteration. The σ_h(Jf) are estimated in an iteration using the integrand samples used to estimate the integral. In this way the distribution of integrand samples across hypercubes is optimized over several iterations, at the same time as the VEGAS map is optimized.

In detail, the algorithm for reallocating samples across hypercubes in VEGAS+ is as follows (a code sketch appears at the end of this section):

1. Choose a somewhat smaller number of stratifications so there are enough samples to allow for significant variation in n_h:

       N_{st} = \mathrm{floor}\big( (N_{ev}/4)^{1/D} \big).    (41)

2. During each iteration accumulate estimates

       \sigma_h^2(Jf) \approx \frac{\Omega_h^2}{n_h} \sum_{y \in \Omega_h} \big( J(y) f(x(y)) \big)^2
           - \Big( \frac{\Omega_h}{n_h} \sum_{y \in \Omega_h} J(y) f(x(y)) \Big)^2    (42)

   for each hypercube, using the same samples used to estimate the integral.

3. Introduce a damping parameter β ≥ 0 by replacing σ_h(Jf) with

       d_h \equiv \big( \sigma_h(Jf) \big)^\beta.    (43)

   Choosing β = 1 corresponds to the optimal distribution (Eq. (40)), but a somewhat smaller value can help avoid overreaction to random fluctuations. Setting β = 0 results in n_h values that are all the same — the stratified sampling becomes non-adaptive, as in classic VEGAS. We use β = 0.75 for the examples in this paper.

4. Recalculate the number of samples for each hypercube,

       n_h = \max\Big( 2,\; N_{ev}\, d_h \Big/ \sum_{h'} d_{h'} \Big),    (44)

   for use in the next iteration.

The VEGAS map is also updated after each iteration, as described in Section II C. The allocation of integrand samples converges rapidly once the VEGAS map has converged. The optimal VEGAS map for an integrand is independent of the allocation of samples, but reallocating samples according to Eq. (40) can significantly speed the discovery of that optimum because there is better information about the integrand earlier on. The independence of the optimal VEGAS map from the sample allocation improves the algorithm's stability — random fluctuations in the sample allocation are unlikely to trigger big changes in the VEGAS map.
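A Python sketch of the reallocation step, Eqs. (43)-(44), given the per-hypercube variance estimates of Eq. (42) (function name ours):

import numpy as np

def reallocate(sig2, n_ev, beta=0.75):
    # sig2 = per-hypercube estimates of sigma_h^2(Jf) from Eq. (42).
    d = np.sqrt(np.asarray(sig2)) ** beta   # Eq. (43); beta = 0 gives classic VEGAS
    dsum = d.sum()
    if dsum == 0:                           # no variance seen yet: spread uniformly
        return np.full(len(d), max(2, n_ev // len(d)))
    return np.maximum(2, (n_ev * d / dsum).astype(int))   # Eq. (44)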
A. Diagonal Structure
Adaptive stratified sampling as described in the previous section provides little or no improvement over classic VEGAS for integrals like those in the previous sections, where the Jacobian from the VEGAS map can flatten the integrand's peaks almost completely and spread them out to fill y-space. It is easy, however, to create integrals for which the new adaptive stratified sampling makes a big difference.

FIG. 3. Percent uncertainty (1σ) in the integral estimates from 30 iterations of classic VEGAS (top) and VEGAS+ (bottom), plotted versus the number N_ev of integrand evaluations (samples) used per iteration. Integral estimates from the first ten iterations are ignored in each case. Damping parameter α = 0.5 in both cases; β = 0 for classic VEGAS and β = 0.75 for VEGAS+. Classic VEGAS becomes unstable below N_ev = 3x10^5; VEGAS+ is unstable below N_ev ≈ 10^4. The number N_st of stratifications per axis used by VEGAS+ varies from 3 to 8 over this range of sample sizes N_ev.

Consider, for example, the eight-dimensional integral

    \int_0^1 d^8x \sum_{i=1}^3 e^{-100\,(\mathbf{x} - \mathbf{r}_i)^2},    (45)

whose integrand has three narrow peaks along the diagonal of the integration volume, at

    \mathbf{r}_i = \rho_i\, (1, 1, 1, 1, 1, 1, 1, 1), \quad i = 1, 2, 3,    (46)

where the locations ρ_i along the diagonal were chosen randomly. Unlike single Gaussians, this integrand cannot be factored into a product of separate functions for each direction.

Fig. 3 shows that the uncertainties in the integral estimates from classic VEGAS are 14–19x larger than those from VEGAS+, using the same number of integrand samples N_ev per iteration. Measured from N_ev = 10^4, the uncertainty generated by N_it iterations of the new algorithm falls roughly like

    \sigma_{\bar I} \propto \frac{1}{\sqrt{N_{it}}\, N_{ev}}    (47)

with increasing N_it and N_ev — as expected, a larger N_ev is more valuable than a larger N_it for a given cost N_it x N_ev. VEGAS+ gives reliable results down to N_ev = 10^4; classic VEGAS is unusable below N_ev = 3x10^5, where it typically misses one or more of the three peaks.

What makes this integral challenging for classic VEGAS is the diagonal structure of the integrand. An axis-oriented adaptation strategy, like that used by classic VEGAS, will generally have difficulty handling large structures aligned along diagonals.

FIG. 4. VEGAS map for the integral in Eq. (45). Every 33rd grid line is drawn for axes x_1 and x_2; grids for the other axes are the same.

The problem for classic VEGAS is obvious from pictures of the optimal VEGAS map for this integrand (Fig. 4). This grid concentrates integrand samples at the three peaks on the diagonal, but also at 3^8 - 3 = 6558 additional points where the integrand is very small: the phantom peaks sit at points whose coordinates mix the diagonal locations, for example

    \tilde{\mathbf{r}} = (\rho_1, \rho_2, \rho_1, \rho_1, \ldots),    (48)

and so on. Integrand samples at these phantom peaks are wasted, greatly reducing the effective number of samples. The adaptive stratification used by VEGAS+ can transfer integration points from the phantom peaks to the real peaks, leading to a much larger effective N_ev. Fig. 5 shows how samples are distributed across hypercubes by VEGAS+ for a typical iteration; more than half of the hypercubes have only 2 samples. Classic VEGAS uses 2 samples in every hypercube for (approximately) the same N_ev.

FIG. 5. Distribution of integrand samples across hypercubes used by VEGAS+ when evaluating the integral in Eq. (45).

Another example is the integral

    \int_{-1}^{1} d^4x\; e^{-\mathbf{x}^T H^{-1} \mathbf{x}/2},    (49)

where H is the 4x4 Hilbert matrix. This integrand has a sharp ridge along an oblique axis (Fig. 6). Classic VEGAS obtains a 0.25% accurate result from the last five of seven iterations, while VEGAS+ is about 3x more accurate for the same number of samples.

FIG. 6. Two views of 10,000 random points distributed with density proportional to the integrand of Eq. (49). One view is projected onto the (x_1, x_2) plane (left) and the other onto the (x_3, x_4) plane (right). Correlation coefficients for (x_1, x_2) and (x_3, x_4) are 0.866 and 0.986, respectively.

B. Limitations
The chief limitation of VEGAS+'s adaptive stratified sampling is that it requires at least N_st = 2 stratifications per direction to have any effect on results. From Eq. (41), this requires

    N_{ev} \ge 4 N_{st}^D \ge 2^{D+2}    (50)

integrand evaluations per iteration. While this is not much of a restriction for dimension D = 5 or 6, adaptive stratified sampling only turns on when N_ev ≥ 1.3x10^8 for D = 25. This is still manageable, but in practice there will be no difference between VEGAS+ and classic VEGAS for dimensions that are much higher.

In some situations it is possible to circumvent this restriction, at least partially, by using different numbers N_st^µ of stratifications in different directions µ. Most integrands have more structure in some directions than in others. Concentrating stratifications in those directions, with fewer stratifications or none in other directions, allows VEGAS+ to use stratified sampling with smaller values of N_ev. We give an example (with D = 21) at the end of Appendix B.

Note, finally, that the VEGAS map used by both classic VEGAS and VEGAS+ remains effective for arbitrarily large dimensions D.

IV. VEGAS+ HYBRIDS
In this section we discuss how VEGAS+ can be combined with other algorithms to make hybrid integrators. We look at two strategies. In the first, we use several iterations of VEGAS+ to generate a VEGAS map x(y) that is optimized for the integrand. We then use a different (adaptive) algorithm to evaluate the y-space integral Eq. (23). This strategy, in effect, replaces VEGAS+'s adaptive stratified sampling with the other algorithm. We illustrate it in Sec. IV A by combining VEGAS+ with the widely available MISER algorithm, and a variation on that algorithm, MISER+.

The second strategy employs a Markov Chain Monte Carlo (MCMC) or other peak-finding algorithm to generate a set of samples {x, f(x)} of the integrand. These are used to precondition the VEGAS map before integrating. The sample points x need to cover important regions of the integration volume, but otherwise are unrestricted. The preconditioning makes it easier for VEGAS+ to discover the important regions. In Sec. IV B we illustrate this approach with integrands having multiple, narrow peaks. We also compare preconditioned VEGAS+ with a new algorithm optimized for this strategy.

A. MISER and MISER+
For our first example of a hybrid integrator, we combine VEGAS+ with the MISER algorithm. MISER uses adaptive stratified sampling, where the integration volume is recursively partitioned into a large number of rectangular sub-volumes (Fig. 7) to increase the accuracy of the integral's estimate. MISER appears to be more effective for integrands with peaks aligned along diagonals than it is for peaks aligned parallel to integration axes. This is opposite from VEGAS maps, suggesting that the combination might be particularly effective.

FIG. 7. A partition of the integration volume generated by MISER for the D = 2 dimensional version of integral Eq. (45). There are 271 sub-volumes.

In the following examples, we compare VEGAS+ with MISER and with a VEGAS-MISER hybrid. In each case we run VEGAS+ with various values for the number N_ev of samples per iteration, and 15 iterations, discarding results from the first 5. We use default values for the damping parameters: α = 0.5 and β = 0.75. To compare, we run MISER with 10 N_ev samples. The VEGAS-MISER hybrid uses 5 iterations of VEGAS+ to develop a VEGAS map x(y) and then estimates the y-space integral Eq. (23) using MISER with 10 N_ev integrand samples.

We also compare these algorithms with a variation on MISER, which we call MISER+. The procedure for MISER+ is:

1. Use MISER with half the sample points to generate and save a partition of the integration space optimized for the integrand f(x). Also save MISER's estimate for

       \sigma_i^2(f) = \Omega_i \int_{\Omega_i} d^D x\, f^2(x) - \Big( \int_{\Omega_i} d^D x\, f(x) \Big)^2    (51)

   in each sub-volume Ω_i. Ignore MISER's estimate for the integral.

2. Distribute the remaining half of the integrand samples across the sub-volumes so that the number n_i of samples in each sub-volume is proportional to σ_i(f), using the procedure described for VEGAS+ (Eqs. (43) and (44)).

3. Estimate the integral in each sub-volume of the partition using Simple Monte Carlo with the number of sample points allocated in the previous step. Add the estimates from each sub-volume to obtain an estimate for the total integral (as in Eq. (35)).

The relation between MISER+ and MISER is similar to that between VEGAS+ and classic VEGAS. MISER+ can also be combined with VEGAS maps, as described above.

We did detailed comparisons for two different integrals. The first is the D = 4 dimensional integral

    \int_0^1 d^4x \sum_{i=1}^3 e^{-100\,(\mathbf{x} - \mathbf{r}_i)^2}    (52)

with three peaks aligned parallel to the x_1 axis,

    \mathbf{r}_i = (\rho_i, a, a, a), \quad i = 1, 2, 3,    (53)

where the ρ_i are distinct and a is a common constant. The uncertainties in the integral estimates generated by each algorithm for different values of N_ev are shown in Fig. 8(a). MISER is 50–100x less accurate than VEGAS+. MISER is also 1.2–2.7x less accurate than MISER+. VEGAS maps are well suited to this integrand, so both MISER and MISER+ see big improvements when used with a VEGAS map. VEGAS+ combined with MISER+ is 7–38x more accurate than MISER+ alone, and only 2–4x less accurate than VEGAS+ by itself.

FIG. 8. Percent uncertainty (1σ) in estimates of the integral Eq. (52) with peaks specified by (a) Eq. (53), and (b) Eq. (54). Results are shown for the last 10 of 15 iterations of VEGAS+, with N_ev integrand evaluations (samples) per iteration. Corresponding results from MISER and MISER+ use 10 N_ev samples. The VEGAS hybrids with MISER and MISER+ use 5 iterations of VEGAS+ to generate a VEGAS map, and then estimate the integral using 10 N_ev samples with MISER/MISER+.

For the second comparison (Fig. 8(b)), we use the same integral but with the peaks aligned along the diagonal of the integration volume:

    \mathbf{r}_i = \rho_i\, (1, 1, 1, 1), \quad i = 1, 2, 3.    (54)

Both MISER+ and MISER are significantly more accurate (4–5x) for this diagonal structure than for the axis-aligned structure of the previous integrand. As discussed above, VEGAS maps are less effective for diagonal structures, but the combination of VEGAS+ with MISER+ remains competitive with MISER+ alone. VEGAS+ by itself is 4–6x more accurate than MISER, and 1.5–6x more accurate than MISER+.

We also checked the D = 8 dimensional versions of these integrals. For the integrand with axis-aligned peaks, VEGAS+ starts working reliably with 500–1000x fewer integrand samples than MISER or MISER+. Using the last 20 of 30 iterations with the default damping parameters, VEGAS+ gives 0.03%-accurate estimates of the integral. VEGAS-assisted MISER+ is 3x less accurate, while MISER+ and MISER by themselves are both more than 500x less accurate for a similar number of integrand samples.

The differences are smaller with peaks aligned along the D = 8 diagonal because both MISER and MISER+ prefer the diagonal structure. MISER+ is approximately 5x more accurate than MISER, and only 2–3x less accurate than VEGAS+ over the range of N_ev we examined. Using a VEGAS map with MISER+ gives results that vary in precision between MISER+ and VEGAS+, depending upon N_ev. Using VEGAS+ with MISER improves on MISER but is not as accurate as MISER+.

These experiments suggest that combining VEGAS+ with either MISER or MISER+ is a good idea. Where VEGAS maps are effective, the combination can be much more accurate than MISER or MISER+ separately. Where VEGAS maps are less effective, they do not appreciably degrade results compared to the separate algorithms. Typically MISER+ outperforms MISER by factors of 2–5 and so might be the preferred option in combination with VEGAS+. VEGAS+ outperforms all of the other algorithms in all of these tests.
B. Preconditioned Integrators
Ref. 22 suggests a different approach to multidimensional integration. They assume that the integrand f(x) has been sampled prior to integration, using MCMC or some other technique. A sample consists of some large number (thousands) of integrand values {f(x)} at points {x} that are concentrated in regions important to the integral. The samples are used to design an integrator that is customized (preconditioned) for the integrand. The authors describe an algorithm for doing this, but this strategy is also easily implemented using VEGAS+.

To illustrate the approach with VEGAS+, we consider a dimension D = 8 integral whose integrand has three very sharp peaks arrayed along the diagonal of the integration volume:

    \int_0^1 d^8x \sum_{i=1}^3 e^{-1000\,(\mathbf{x} - \mathbf{r}_i)^2},    (55)

where the r_i are given in Eq. (46). In this case it is easy to generate random sample points whose density is proportional to the integrand. We use 3000 sample points {x}. The VEGAS map used by VEGAS+ can be trained on the sample data {x, f(x)} before integrating. This is done using the method described in Secs. II C and II D, but with

    d_i^\mu \equiv \frac{1}{n_i^\mu} \sum_{x \in \Delta x_i^\mu} J^2(y(x))\, f^2(x),    (56)

where the sum is over the sample data, n_i^µ is the number of sample data points falling in increment Δx_i^µ, and y(x) is the inverse of the VEGAS map. The VEGAS map converges quickly as the algorithm is iterated, with the same sample data being reused for each iteration. Here we iterate the algorithm 10 times.
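A Python sketch of this training step for one axis follows; it reuses refine_grid() from the sketch in Sec. II C. In D dimensions the weight in Eq. (56) uses the full Jacobian J(y(x)), the product of the per-axis factors; for brevity a one-dimensional map is assumed here, and the function name is ours:

import numpy as np

def train_map(grid, xs, fs, nitn=10, alpha=0.5):
    # Train a 1-D VEGAS map on sample data {x, f(x)} using Eq. (56).
    for _ in range(nitn):
        ng = len(grid) - 1
        i = np.clip(np.searchsorted(grid, xs, side='right') - 1, 0, ng - 1)
        jac = ng * (grid[i + 1] - grid[i])    # J at each sample's increment
        d = np.zeros(ng)
        n = np.zeros(ng)
        np.add.at(d, i, (jac * fs) ** 2)      # accumulate J^2 f^2 per increment
        np.add.at(n, i, 1.0)
        d[n > 0] /= n[n > 0]                  # the averages d_i of Eq. (56)
        grid = refine_grid(grid, d, alpha=alpha)   # Sec. II C sketch
    return grid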
Starting with the preconditioned map, we then run VEGAS+ as usual for N_it = 8 iterations, with damping parameter α = 0 to prevent further adjustment of the VEGAS map. The 8 iterations allow the stratified sampling algorithm to adapt to whatever structure has not been dealt with by the VEGAS map. As we increase the number N_ev of integrand evaluations per iteration, we find that VEGAS+ starts to give good results at relatively small N_ev, and soon gives 1%-accurate results. VEGAS+ without the preconditioning is unable to find all three peaks reliably until N_ev is more than 100x larger (and classic VEGAS needs many more evaluations).

We expect a large benefit from preconditioning for extreme multimodal problems like this one. The exact distribution of the sample points {x} is not crucial — for example, we get more or less the same results above using points drawn from Gaussians that are twice as wide as in the integrand. What matters is that there are enough samples around all of the peaks. In more general problems, we usually do not know a priori where the peaks are or how many there are. Peak-finding algorithms or MCMC might be helpful in such cases. As emphasized in Ref. 22, using preconditioning in this way separates the challenge of finding the integrand's peaks from the challenge of accurate integration once they have been found, allowing us to use different algorithms for the two different tasks, each algorithm optimized for its task.

We compared preconditioned VEGAS+ with one of the algorithms (TQs) from Ref. 22 that is designed for preconditioning. Like MISER, TQs recursively subdivides the integration volume into a large number of sub-volumes (c.f., Fig. 7); but TQs bases this partitioning on the sample {x, f(x)} available before integrating. For our comparison we look at two integrals discussed in Ref. 22: one has a single narrow Gaussian (same width as in Eq. (26)) at the center of the integration hypercube; the other has four narrow Gaussians (same width as in Eq. (26)) spread evenly along the diagonal of the hypercube. The paper labels these problems "Gaussian" and "Quad", respectively. These are the easiest and hardest integrals considered there. We examined results for dimension D = 1–10.

Following Ref. 22, we limit each algorithm to approximately 12000 integrand evaluations for these comparisons. The TQs algorithm uses half of those samples to divide the integration volume into 2000 sub-volumes. The integral is estimated by doing 3-point Simple Monte Carlo integrals over each sub-volume and summing the results.

VEGAS+ needs far fewer samples to optimize the VEGAS map, because optimizing the map for each direction is a separate one-dimensional problem that utilizes all of the data. Here we use 1000 samples to create the VEGAS map. The remaining 11000 samples are allocated across 4 iterations of VEGAS+, again with damping parameter α = 0.

FIG. 9. Percent uncertainty (1σ) in estimates of two integrals from Ref. 22, the "Gaussian" (top) and "Quad" (bottom) problems, for dimensions D = 1–10. Results are shown for the VEGAS+ (blue) and TQs (orange) algorithms, as well as for a hybrid that combines the VEGAS map with TQs (green). All algorithms were limited to 12000 integrand evaluations in all. The uncertainties are inferred from the interquartile range of the results from 50 repetitions of each integration.

Our results are in Fig. 9. VEGAS+ outperforms TQs by more than an order of magnitude on the Gaussian problem in high dimensions, with only modest growth in the errors as the dimension D increases (the VEGAS+ error is still only 7% by D = 50, for example). Not surprisingly, results from the two algorithms are much closer for the Quad problem. Both algorithms become unreliable for dimensions D greater than three or four — 12000 integrand samples are too few for higher dimensions.

Fig. 9 also shows results for a hybrid approach that combines a VEGAS map with TQs. In the hybrid approach, half of the integrand evaluations are used to create an optimized VEGAS map, as outlined above. Those same samples are then re-used by TQs to partition the integration volume, to optimize the y-space integral Eq. (23) with the optimized VEGAS map x(y). The y-space integral is then calculated by summing Simple Monte Carlo estimates of the contributions from each sub-volume in the TQs partition. The VEGAS+TQs hybrid gives similar results to VEGAS+ for the Gaussian problem, but outperforms both of the other algorithms on the Quad problem. In particular the hybrid algorithm continues to give usable results even out to dimension D = 9–10.

These problems show how the VEGAS map is effective for dealing with isolated peaks, even when these are arranged along a diagonal of the integration volume. This is because it is able to flatten the peaks. It can't expand the peaks to fill y-space when there are multiple peaks along the diagonal, so it is important to combine the map with algorithms, like VEGAS+ and TQs, that can target sub-regions within the y-space integration volume.

There are problems, of course, where the VEGAS map is of limited use. The Hilbert-matrix Gaussian in Eq. (49) is an example. For this problem, neither TQs nor the VEGAS+TQs hybrid can achieve errors smaller than 50% with only 12000 samples when D ≥ 3; VEGAS+ gives errors smaller than 1% for D = 3.

V. CONCLUSIONS
In this paper we have demonstrated how to combine adaptive stratified sampling with classic VEGAS's adaptive importance sampling in a new algorithm, VEGAS+. The adaptive stratified sampling makes VEGAS+ far more effective than VEGAS for dealing with integrands that have multiple peaks or other structure aligned along diagonals of the integration volume. The added computational cost is negligible compared to the cost of evaluating the integrand. VEGAS+ was 2–17x more accurate than classic VEGAS in our various examples, with errors that typically fell faster than 1/√N_ev when increasing the number N_ev of integrand evaluations.

In Sec. IV, we showed how to combine VEGAS+ with other algorithms, in effect replacing its adaptive stratified sampling with the other adaptive algorithm. Our experiments with the MISER and TQs algorithms show that such hybrids can be significantly more accurate than the original algorithms. It would be worthwhile to explore these and other options further.

We also showed (Sec. IV B) how to precondition VEGAS+ using integrand samples generated separately from the integrator (e.g., by an MCMC algorithm). Preconditioning can help stabilize VEGAS+ and improve precision, especially for integrands with multiple narrow peaks. In one example, VEGAS+ without preconditioning required more than 100x as many integrand evaluations as preconditioned VEGAS+ before it began giving reliable results. This again is an area deserving further exploration.

Finally we compared VEGAS+ and MCMC for two Bayesian analyses in Appendix B, one with D = 3 parameters and the other with D = 21. VEGAS+ was more than 10x as efficient in both cases. There we discuss why we expect VEGAS+ will often outperform MCMC for small and moderate-sized problems.
ACKNOWLEDGMENTS
We thank T. Kinoshita for sharing his code for the 10th-order QED correction discussed in the first appendix. This work was supported by the National Science Foundation.

Appendix A: Sums and Feynman Diagrams
VEGAS+ can be used for adaptive multidimensional summation, as well as integration. To illustrate, we show how to use VEGAS+ to correct for the finite space-time volume used in lattice QCD simulations. Such corrections are usually calculated in chiral perturbation theory. A typical example is the contribution from a pion tadpole diagram, which in infinite volume is proportional to the integral (in Euclidean space)

    I_\pi \equiv \int_{-\infty}^{\infty} d^4k\, f(k),    (A1)

where

    f(k) \equiv 1 \big/ \big( k^2 + m_\pi^2 \big)^2    (A2)

with m_π = 0.14 GeV. This becomes an infinite, 4-dimensional sum,

    S_\pi \equiv (\Delta k)^4 \sum_{k_n} f(k_n),    (A3)

when the theory is confined to a box of side L. Here k_n^µ = n^µ Δk with n^µ = 0, ±1, ±2, ... and

    \Delta k = 2\pi / L.    (A4)

We take L = 5 fm. The sum is easily converted to an integral:

    S_\pi = \int_{-\infty}^{\infty} d^4k\, f(\bar k(k)),    (A5)

where

    \bar k^\mu(k) \equiv \mathrm{round}(k^\mu / \Delta k)\, \Delta k    (A6)

and round(x) is the nearest integer to x. Then the needed correction can be written

    I_\pi - S_\pi = 16 \int_0^\infty d^4k\, \big[ f(k) - f(\bar k(k)) \big]    (A7)
                 = 16\, m_\pi^4 \int_0^1 d^4z\, \prod_\mu (1 - z^\mu)^{-2}\, \big[ f(k) - f(\bar k(k)) \big],    (A8)

where k^µ = m_π z^µ / (1 − z^µ). This integral is ultraviolet finite, unlike the original integrals.

This integral is easy for VEGAS+. Using the last 10 of 15 iterations gives results with a 7.5% uncertainty, which is accurate enough for most practical applications. Increasing the number of samples N_ev per iteration gives 0.5% errors; classic VEGAS gives 1.0% for the same N_ev.

We also tested VEGAS+ on a 10th-order QED contribution to the muon's magnetic moment. We examined the contributions from the light-by-light diagrams with two vacuum polarization insertions (diagrams VI(a) in Fig. 6 and Table IX of Ref. 23), using a (Fortran) code for the integrand provided by T. Kinoshita. The integration is over 9 Feynman parameters. We studied two cases: one where all the fermion-loop particles are muons and the other where they are electrons. The latter has large factors of log(m_µ/m_e). We found that VEGAS+ was 3–4x more accurate than classic VEGAS in both cases, yielding uncertainties smaller than 0.01–0.02%.
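The subtracted integrand of Eq. (A8) is straightforward to code. A Python sketch using the vegas package follows; the pion mass value and the fm-to-GeV conversion are our illustrative choices, not taken from the paper:

import numpy as np
import vegas

mpi = 0.14                  # pion mass in GeV (illustrative value)
L = 5.0 / 0.1973            # L = 5 fm converted to GeV^-1 (hbar*c = 0.1973 GeV fm)
dk = 2 * np.pi / L          # Eq. (A4)

def f(k2):
    return 1.0 / (k2 + mpi ** 2) ** 2      # Eq. (A2)

def integrand(z):
    z = np.minimum(z, 1.0 - 1e-12)         # guard the z -> 1 edge
    k = mpi * z / (1.0 - z)                # k^mu = m_pi z^mu / (1 - z^mu)
    jac = mpi ** 4 / np.prod((1.0 - z) ** 2)
    kbar = np.round(k / dk) * dk           # Eq. (A6)
    return 16.0 * jac * (f(np.sum(k ** 2)) - f(np.sum(kbar ** 2)))   # Eq. (A8)

integ = vegas.Integrator(4 * [[0.0, 1.0]])
integ(integrand, nitn=5, neval=10000)      # adapt
print(integ(integrand, nitn=10, neval=10000).summary())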
Appendix B: Bayesian Curve Fitting

VEGAS+'s ability to find and target narrow high peaks in an integrand makes it well suited for evaluating the integrals used in Bayesian analyses. It can be much faster than other popular methods, such as Markov Chain Monte Carlo (MCMC), when applied to small or medium sized problems.

To illustrate a Bayesian analysis, we consider fitting a straight line p_0 + p_1 x to the data in Fig. 10. The error estimates for several of the points are clearly wrong, so we model the data's probability density as a sum of two Gaussians, one with the nominal width and another with 10x that width:

    P_{data}(y, \sigma_y | p, w) \equiv
        \frac{1-w}{\sqrt{2\pi}\,\sigma_y}\, e^{-(y - p_0 - p_1 x)^2 / 2\sigma_y^2}
      + \frac{w}{\sqrt{2\pi}\,(10\sigma_y)}\, e^{-(y - p_0 - p_1 x)^2 / 2(10\sigma_y)^2}.    (B1)

Here w is the probability of a bad error estimate. Assuming flat priors for the p_µ and w, the Bayesian posterior probability density is proportional to

    f(p, w) = P_{prior}(p, w) \prod_{i=1}^{19} P_{data}(y_i, \sigma_{y_i} | p, w),    (B2)

where

    P_{prior}(p, w) \propto \Theta(0 < w < 1) \prod_{\mu=0}^{1} \Theta(p_\mu \in I_p)    (B3)

and I_p is a broad interval chosen large enough that the posterior's support lies well inside the prior's.

FIG. 10. Bayesian fit (blue band) of data (19 blue data points) with posterior probability Eq. (B1) and the VEGAS+ parameter estimates discussed below. The dotted line shows the fit from a standard least-squares analysis. The band shows the ±1σ range around the best-fit line.

The Bayesian probability distribution is normalized by computing the 3-dimensional integral

    I = \int d^2p \int_0^1 dw\, f(p, w),    (B4)

where the p integrals run over the prior interval I_p. The mean values for the three parameters are calculated from additional integrals,

    \langle p \rangle = \frac{1}{I} \int d^2p \int_0^1 dw\, f(p, w)\, p,
    \langle w \rangle = \frac{1}{I} \int d^2p \int_0^1 dw\, f(p, w)\, w,    (B5)

and their covariances from still further integrals:

    \mathrm{cov}_p = \frac{1}{I} \int d^2p \int_0^1 dw\, f(p, w)\, p\, p^T - \langle p \rangle \langle p \rangle^T,
    \mathrm{var}_w = \frac{1}{I} \int d^2p \int_0^1 dw\, f(p, w)\, w^2 - \langle w \rangle^2.    (B6)

Additional integrals could provide expectation values ⟨g(p)⟩ and/or histograms for arbitrary functions g(p). And so on.

These integrals could be done separately using VEGAS+, but it is generally much better to do them simultaneously, using the same sample points (p, w) for all of the integrals. This is because the VEGAS+ errors in the different integrals are then highly correlated, leading to significant cancellations in the errors for ratios like Eq. (B5) and differences like Eq. (B6). VEGAS+ can estimate the covariances between the estimates of different integrals by summing the covariances coming from each hypercube (estimated using the multivariable generalization of Eq. (6)). Given the covariances, it is possible to account for cancellations in the uncertainties associated with ratios, differences, and other combinations of the integration results.
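This simultaneous-integration strategy is directly supported by the vegas package of Ref. 18, where an integrand may return an array and the results are gvar objects whose arithmetic propagates the correlated errors. A sketch with made-up data standing in for Fig. 10 (the data values, prior range, and 10x width factor here are illustrative assumptions):

import numpy as np
import vegas

# Toy data standing in for the 19 points of Fig. 10 (values invented).
xd = np.linspace(0.0, 10.0, 19)
yd = 0.5 + 0.9 * xd + np.random.default_rng(1).normal(0.0, 0.2, 19)
sig = np.full(19, 0.2)

def pdata(p0, p1, w):
    # Eq. (B1); the 10x width factor follows the reconstruction above.
    r2 = (yd - p0 - p1 * xd) ** 2
    g1 = np.exp(-r2 / (2 * sig ** 2)) / (np.sqrt(2 * np.pi) * sig)
    g2 = np.exp(-r2 / (2 * (10 * sig) ** 2)) / (np.sqrt(2 * np.pi) * 10 * sig)
    return np.prod((1 - w) * g1 + w * g2)

def integrand(v):
    p0, p1, w = v
    fv = pdata(p0, p1, w)
    # All integrands evaluated at the same points -> correlated errors.
    return [fv, fv * p0, fv * p1, fv * w]

integ = vegas.Integrator([[-5, 5], [-5, 5], [0, 1]])
integ(integrand, nitn=10, neval=2000)              # adapt
r = integ(integrand, nitn=10, neval=2000)
print('<p0> =', r[1] / r[0], ' <p1> =', r[2] / r[0], ' <w> =', r[3] / r[0])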
VEGAS+ with 28,000 integrand samples distributed across 15 iterations gives values for the three model parameters ⟨p_0⟩, ⟨p_1⟩, and ⟨w⟩ that are accurate to 0.06–0.3% (much more than is needed). We drop the first 5 iterations when estimating parameters. Ignoring correlations between VEGAS+ errors gives results that are 2.5–12x less accurate.

To compare with MCMC, we generate 150,000 integrand samples with an MCMC and use the last two thirds of those samples to estimate means for the parameters. The results are 3–4x less accurate than from VEGAS+, despite using more than 5x as many integrand samples. We estimate the errors for the MCMC simulations by rerunning them several times.

We show the fit line corresponding to the VEGAS+ results for ⟨p⟩ and cov_p in Fig. 10 (blue band). This is more plausible than the fit suggested by a standard least-squares analysis (dotted line). Note that the slope and intercept are strongly anti-correlated. Classic VEGAS is only somewhat less accurate (40%) than VEGAS+ for these integrals.

Our MCMC analysis is more than an order of magnitude less efficient than VEGAS+ when computing the means and covariances of the fit parameters. The MCMC's precision is limited by its de-correlation time, the number of MCMC steps required between samples to de-correlate them. This accounts for most of the difference here. The last iteration of VEGAS+ from above, for example, uses N_ev = 1704 integrand samples to obtain 0.76% and 0.14% errors on the intercept and slope, respectively. (The results quoted above are from the last 10 iterations and so have smaller errors.) From the computed covariance matrix cov_p, we can estimate that the same number of uncorrelated samples from the posterior distribution Eq. (B1) would give errors of 1.15% and 0.22%, respectively. So samples from a perfect Monte Carlo (i.e., de-correlation time equals one step) would be slightly less valuable here than samples from VEGAS+, once it has adapted. No general-purpose MCMC is perfect, of course. The advantage from VEGAS+ samples grows with increasing N_ev, because VEGAS+ errors fall faster than 1/√N_ev due to its adaptive stratified sampling (c.f., Eq. (47)).

We also compared VEGAS+ with MCMC for the more difficult problem where there is a separate value of w for each data point. Then w becomes a 19-component vector in the equations above, and the integrals are over 21 variables: p and w. VEGAS+ gives results with better than 1% errors from the last 8 out of 24 iterations, using 162,000 integrand samples in all. An MCMC analysis with 10x as many samples gives results that are 7–13x less accurate than from VEGAS+. This MCMC analysis used the last third of the samples for estimating parameters.

Classic VEGAS and VEGAS+ are the same for this last example since there are only enough integrand samples to allow a single hypercube in 21 dimensions. This follows because the default behavior in VEGAS+ is to use the same number of stratifications in each direction, and N_st = 2 is too many (see Eq. (50)). We can cut the errors for the slope and intercept in half, however, by using N_st^µ = 46 stratifications in each of the p directions, with no stratification (N_st^µ = 1) in the other directions. This can be done with the same number of integrand samples as in the unstratified case.

VEGAS+ results for this problem can be improved (by factors of order 1.5–4) if approximate values for the means of the parameters and their covariance matrix are known ahead of time, for example, from a peak-finding algorithm. Writing the fit parameters as a vector c, this is done by first expressing the deviation from the approximate mean \bar c in terms of the normalized eigenvectors u_n of the approximate covariance matrix:

    c \equiv \bar c + \sum_n b_n u_n.    (B7)

The integral is then rewritten as an integral over the coefficients b_n rather than the components of c. This transformation reorients the error ellipse so it is aligned with the integration axes, making it easier for the VEGAS map to adapt around the peak. Ref. 7 gives an example of this strategy in use.

∗ [email protected]

G. P. Lepage, "A New Algorithm for Adaptive Multidimensional Integration," J. Comput. Phys. 27, 192–203 (1978).

B. P. Kersevan and E. Richter-Was, "The Monte Carlo event generator AcerMC versions 2.0 to 3.8 with interfaces to PYTHIA 6.4, HERWIG 6.5 and ARIADNE 4.1," Comput. Phys. Commun., 919–985 (2013).

J. Alwall, R. Frederix, S. Frixione, V. Hirschi, F. Maltoni, O. Mattelaer, H.-S. Shao, T. Stelzer, P. Torrielli, and M. Zaro, "The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations," JHEP 07 (2014) 079.

T. Aoyama, M. Hayakawa, T. Kinoshita, and M. Nio, "Complete Tenth-Order QED Contribution to the Muon g−2," Phys. Rev. Lett. 109, 111808 (2012).

G. Garberoglio and A. H. Harvey, "Path-integral calculation of the third virial coefficient of quantum gases at low temperatures," J. Chem. Phys. 134, 134106 (2011).

G. Campolieti and R. Makarov, "Pricing Path-Dependent Options on State Dependent Volatility Models with a Bessel Bridge," Int. J. Theor. Appl. Finance, 51–88 (2007).

P. Serra, A. Heavens, and A. Melchiorri, "Bayesian Evidence for a cosmological constant using new high-redshift supernova data," Mon. Not. R. Astron. Soc., 169–175 (2007).

J. L. Sanders, "Probabilistic model for constraining the Galactic potential using tidal streams," Mon. Not. R. Astron. Soc., 423–431 (2014).

K. Gültekin, D. O. Richstone, K. Gebhardt, T. R. Lauer, S. Tremaine, M. C. Aller, R. Bender, A. Dressler, S. M. Faber, A. V. Filippenko, R. Green, L. C. Ho, J. Kormendy, J. Magorrian, J. Pinkney, and C. Siopis, "The M−σ and M−L Relations in Galactic Bulges, and Determination of their Intrinsic Scatter," Astrophys. J. 698, 198–221 (2009).

J. Ray, Y. M. Marzouk, and H. N. Najm, "A Bayesian approach for estimating bioterror attacks from patient data," Statist. Med., 101–126 (2011).

F. M. Atay and A. Hutt, "Neural Fields with Distributed Transmission Speeds and Long-Range Feedback Delays," SIAM J. Appl. Dyn. Syst. 5, 670–698 (2006).

J. S. Dehesa, T. Koga, R. J. Yáñez, A. R. Plastino, and R. O. Esquivel, "Quantum entanglement in helium," J. Phys. B: At. Mol. Opt. Phys. 45, 015504 (2012).

D. Varjas, F. de Juan, and Y.-M. Lu, "Bulk invariants and topological response in insulators and superconductors with nonsymmorphic symmetries," Phys. Rev. B 92, 195116 (2015).

VEGAS+, the algorithm described in this paper, can find the 20-D sphere within 10 iterations; another 10 iterations gives an estimate for the volume that is accurate to 0.05%.

For a general discussion of importance and stratified sampling see: J. M. Hammersley and D. C. Handscomb, Monte Carlo Methods, Chapt. 5 (Chapman and Hall, London, 1979).

W. H. Press and G. R. Farrar, "Recursive Stratified Sampling For Multidimensional Monte Carlo Integration," Comp. in Phys. 4, 190 (1990).

We used a Python adaptation of the version of the MISER code given in: W. H. Press et al., Numerical Recipes in C, 2nd Edition (Cambridge University Press, Cambridge, 2002).

Error estimates from VEGAS and VEGAS+ in most examples come from multiple iterations which were combined as described in Sec. II E, with good χ^2's in each case. Estimated values for the integral agreed with the exact values to within errors (i.e., mostly within ±1σ).

G. Peter Lepage, gplepage/vegas v3.4.5 (2020), Zenodo, http://doi.org/10.5281/zenodo.3897199. The most recent code is available at: https://github.com/gplepage/vegas.

See, for example, the implementation in Ref. 18.

Classic VEGAS typically defaults to α = 1.5. VEGAS+ can use a smaller default value, α = 0.5, because of its adaptive stratified sampling.

The N x N Hilbert matrix has elements H_µν = 1/(µ + ν − 1) for µ, ν = 1...N. It is famously ill-conditioned.

T. Foster, C. L. Lei, M. Robinson, D. Gavaghan, and B. Lambert, "Model Evidence with Fast Tree Based Quadrature," arXiv:2005.11300v1. We used the TQs software provided by the authors at: https://github.com/thomfoster/treeQuadrature.

T. Kinoshita and M. Nio, Phys. Rev. D 73, 053007 (2006).

This example is adapted from J. Vanderplas' Python blog: http://jakevdp.github.io/blog/2014/06/06/frequentism-and-bayesianism-2-when-results-differ/. The data in the figure are included in the examples bundled with the software in Ref. 18.

We used the emcee Python package for the MCMC simulations. It is an implementation of the algorithm described in: J. Goodman and J. Weare, "Ensemble Samplers with Affine Invariance," Comm. App. Math. and Comp. Sci. 5, 65–80 (2010).

A perfect Monte Carlo (i.e., samples de-correlate in one step) would beat