Restoring a smooth function from its noisy integrals
Olga Goulko
Department of Physics, University of Massachusetts, Amherst, MA 01003, USA and Present address: Raymond and Beverly Sackler School of Chemistry and School of Physics and Astronomy, Tel Aviv University, Tel Aviv 6997801, Israel
Nikolay Prokof’ev
Department of Physics, University of Massachusetts, Amherst, MA 01003, USA and National Research Center “Kurchatov Institute,” 123182 Moscow, Russia
Boris Svistunov
Department of Physics, University of Massachusetts, Amherst, MA 01003, USA; National Research Center “Kurchatov Institute,” 123182 Moscow, Russia; and Wilczek Quantum Center, School of Physics and Astronomy and T. D. Lee Institute, Shanghai Jiao Tong University, Shanghai 200240, China

(Dated: May 14, 2018)

Numerical (and experimental) data analysis often requires the restoration of a smooth function from a set of sampled integrals over finite bins. We present the bin hierarchy method that efficiently computes the maximally smooth function from the sampled integrals using essentially all the information contained in the data. We perform extensive tests with different classes of functions and levels of data quality, including Monte Carlo data suffering from a severe sign problem and physical data for the Green’s function of the Fröhlich polaron.
PACS numbers: 02.60.-x, 02.70.-c, 07.05.Rm, 87.57.np
I. INTRODUCTION
Sampling is at the heart of many Monte Carlo computations and experimental measurements. Data points are generated or measured and we eventually want to restore the underlying smooth probability density distribution $f(x)$ behind them. Many density estimation protocols for lists of statistical data exist, but for large-scale Monte Carlo calculations storing the individual data points would require petabytes of memory and lead to a substantial slowing down of the simulation. More importantly, the individual data points are not needed, provided that we know that $f(x)$ is structureless below a certain scale. Hence, without losing any essential information, we can collect integrals of $f(x)$ over finite-size bins. The problem then is how to extract all the information from these integrals without systematic bias, augmenting it with our a priori knowledge of the smoothness of the function.

In this paper, we propose a simple and efficient solution, which we call the bin hierarchy method (BHM). We note that the problem is closely related to the problem of numerical analytic continuation under consistent constraints, but with a crucial simplification. Unlike for the numerical restoration of spectral functions, we do not have to account for possible sharp features, such as $\delta$-function peaks or kinks, which cannot be resolved within the sampled error bars. Assuming that $f(x)$ is smooth, we are allowed to parametrize its optimal approximation $\tilde f(x)$ as a polynomial spline (or a spline with a general functional ansatz).

The protocol of constructing $\tilde f(x)$ has some similarities with the smoothing spline approach. However, there is a fundamental difference in the data structure. Rather than fitting to approximate function values on a given set of points, we fit the spline directly to the sampled integrals.
The central requirement is that $\tilde f(x)$ must be consistent not only with the sampled integrals over the elementary histogram bins, but also with the integrals over any combination of these. Integrals over large bins are known with higher precision than those over smaller bins, since the former contain more data points. Hence, accurately recovering a large integral is a priority over recovering a smaller one. In particular, bins below a certain scale contain noise rather than information about $f(x)$, and thus can be safely omitted in the fitting. We formalize this idea by introducing a hierarchy of bins with one bin over the entire domain of $f(x)$ on the top level, and doubling the number of bins on each subsequent level. The goodness of fit is evaluated on each level $n$ separately, with $\chi_n^2/\tilde n$ as the measure, where $\tilde n$ is the number of bins on the given hierarchy level.

Out of all possible approximations $\tilde f(x)$ consistent with the sampled integrals we select the one with the least features, i.e., the one with the fewest fitting parameters. This is achieved by fitting a spline of order $m$ (typically $m=3$) with the minimal number of knots that yields an acceptable fit. The number and the positions of the knots are determined automatically by the fitting algorithm. Known boundary conditions on $f(x)$ can also be accounted for. If desired, additional optimization terms, for instance for the jump in the highest spline derivative, can be included in the fitting procedure, with iterative improvement in the spirit of the method of consistent constraints.

The paper is organized as follows. In Sec. II, we review alternative methods to construct $\tilde f(x)$ and argue that the BHM is superior in accuracy, efficiency, and simplicity. The details of the BHM are presented in Sec. III. In Sec. IV, we present several tests of the BHM. We conclude in Sec. V.

II. REVIEW OF ALTERNATIVE METHODS

A. Naive histograms
The most straightforward way to approximate a distribution $f(x)$ is the histogram: divide the domain into several bins and count how many times the sampled values of $x$, which were generated with probabilities given by $f(x)$, fall into each of the bins. For each generated $x \in \text{bin } i$, the counter $c_i$ for the bin $i$ is increased by one. The distribution is then approximated by
$$\tilde f(x) = \sum_i \frac{c_i}{N \Delta_i}\, \Pi_i(x), \qquad (1)$$
where $N$ is the total number of sampled points, $\Delta_i$ is the width of the bin $i$, and $\Pi_i(x)$ is the boxcar function, which is one if $x \in \text{bin } i$ and zero otherwise. In what follows, we refer to this method as “naive histogramming” to emphasize the contrast with the BHM, which also involves histogramming as part of the protocol. The end result of the naive histogram method is a staircase function, which is not smooth. This drawback can be ameliorated by taking $c_i/(N\Delta_i)$ as the value of $\tilde f(x)$ at the bin centers and fitting a smoothing spline to the data with their statistical error bars. A more important issue is that the naive histogram method suffers from an inherent compromise between the resolution of features and statistical noise. Sufficiently narrow bins are necessary to resolve $f(x)$ without introducing a significant systematic bias, while reducing the size of the bins increases the noise in the sampled counters.

B. Basis projection method
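Before turning to the generalization, the naive histogram estimator of Eq. (1) can be sketched in a few lines; the sampled distribution and the bin edges below are hypothetical choices for illustration only.

```python
import numpy as np

def naive_histogram(samples, edges):
    """Eq. (1): the staircase estimate c_i / (N * Delta_i) on each bin."""
    counts, _ = np.histogram(samples, bins=edges)
    return counts / (len(samples) * np.diff(edges))

rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=100_000)   # samples of f(x) = 1 on [0, 1]
edges = np.linspace(0.0, 1.0, 11)               # ten equal bins
f_tilde = naive_histogram(samples, edges)       # one staircase value per bin
```

By construction the staircase integrates to one, while each bin value fluctuates around the true density with an error that grows as the bins shrink, which is the resolution/noise compromise described above.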
A generalization of the naive histogram method, the basis projection method, can significantly reduce this drawback. This method is based on a generalized Fourier series expansion of $f(x)$. For each bin $i$, we choose a basis $\{e_i^{(j)}(x)\}$ of $m_i$ real-valued functions with $e_i^{(j)}(x) = 0$ if $x \notin \text{bin } i$ that satisfy the orthonormality condition
$$\langle e_i^{(j)} | e_i^{(j')} \rangle \equiv \int_{\text{bin } i} w_i(x)\, e_i^{(j)}(x)\, e_i^{(j')}(x)\, dx = \delta_{jj'}. \qquad (2)$$
The non-negative weight function $w_i(x)$ of the inner product should be chosen such that all integrals are convergent. If no divergences are present, $w_i(x)$ can be set to unity. The basis and weight functions may be different for each bin, and can also be defined for bins of infinite size. In general, the basis functions can be chosen to reflect the properties of $f(x)$ that are known in advance, for instance known divergences or asymptotic behavior. Otherwise, in the majority of cases, shifted Legendre polynomials are a reasonable choice.

With the basis projection method, in each bin $i$ we keep track of several counters $c_i^{(j)}$ estimating the $m_i$ generalized Fourier coefficients $\langle e_i^{(j)} | f \rangle$. Specifically, each time we generate $x \in \text{bin } i$ we add the value $w_i(x)\, e_i^{(j)}(x)$ (rather than one) to each of the $c_i^{(j)}$ counters. The function $\tilde f(x)$ is then given by
$$\tilde f(x) = \sum_i \sum_{j=1}^{m_i} \frac{c_i^{(j)}}{N}\, e_i^{(j)}(x). \qquad (3)$$
Note that Eq. (1) is a special case of the above formula with $m_i = 1$, $w_i(x) = 1$, and $e_i^{(1)}(x) = \Pi_i(x)/\sqrt{\Delta_i}$. It can easily be seen that this procedure corresponds to the best possible representation of $f(x)$ in terms of orthogonal basis functions on each bin $i$, since minimizing
$$\int_{\text{bin } i} w_i(x)\left(f(x) - \tilde f(x)\right)^2 dx = \int_{\text{bin } i} w_i(x)\left(f(x) - \sum_{j=1}^{m_i} \frac{c_i^{(j)}}{N}\, e_i^{(j)}(x)\right)^2 dx \qquad (4)$$
with respect to the $c_i^{(j)}$ implies
$$\int_{\text{bin } i} w_i(x)\, e_i^{(j)}(x)\left(f(x) - \sum_{j'=1}^{m_i} \frac{c_i^{(j')}}{N}\, e_i^{(j')}(x)\right) dx = 0, \qquad (5)$$
and consequently $c_i^{(j)}/N = \langle e_i^{(j)} | f \rangle$.

Using basis projections we collect more information in each sampling step than with the naive histogram, since some information about the position of $x$ inside the bin is preserved. Therefore fewer bins are necessary to resolve the function. The bin widths and basis sizes $m_i$ can be chosen such that the systematic error is negligible compared to the statistical error, without a substantial increase in the statistical noise of the $\tilde f(x)$ values.

While the basis projection method is a significant improvement over the naive histogram, it still has several disadvantages. The binning and the basis functions must be chosen at the beginning of the sampling process and cannot be altered retrospectively. Basis projections can also be computationally expensive, especially for a large basis, or a basis built of complicated functions. Moreover, the sampled projections generally exhibit large jumps at the bin boundaries because the basis functions may be unbounded. As before, we can eliminate these jumps by taking the values of $\tilde f(x)$ at several points inside each bin and then constructing a smoothing spline. This procedure, however, involves an unnecessary loss of information.
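The counter update of Eqs. (2)–(3) can be sketched for a single bin with a shifted Legendre basis. This is a minimal illustration under our own assumptions: one bin $[0,1]$, weight $w(x)=1$, a cubic basis, and the test density $f(x)=2x$ (so every sampled value is $v=\mathrm{sign}(f)=1$); the function names are ours.

```python
import numpy as np
from numpy.polynomial import legendre

def e_j(j, x, a, b):
    # Orthonormal shifted Legendre polynomial on the bin [a, b] for w(x) = 1:
    # e_j(x) = sqrt((2j+1)/(b-a)) * P_j(2(x-a)/(b-a) - 1), satisfying Eq. (2).
    t = 2.0 * (x - a) / (b - a) - 1.0
    coef = np.zeros(j + 1)
    coef[j] = 1.0
    return np.sqrt((2 * j + 1) / (b - a)) * legendre.legval(t, coef)

rng = np.random.default_rng(1)
N = 200_000
x = np.sqrt(rng.uniform(size=N))      # inverse-CDF sampling of f(x) = 2x on [0, 1]
m = 4                                 # cubic basis: j = 0, 1, 2, 3

# Each generated x adds w(x) e_j(x) to the counter c_j, so c_j/N -> <e_j|f>.
c = np.array([e_j(j, x, 0.0, 1.0).sum() for j in range(m)])

# Eq. (3): reconstruct f~(x) = sum_j (c_j / N) e_j(x).
xs = np.linspace(0.0, 1.0, 101)
f_tilde = sum((c[j] / N) * e_j(j, xs, 0.0, 1.0) for j in range(m))
```

Because the basis contains the linear function exactly, the reconstruction reproduces $f(x)=2x$ up to statistical noise in the coefficients.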
Rather than reducing the fit to specific individual points, it is natural to fit a spline to the sampled integrals $\langle e_i^{(j)} | f \rangle$ directly, resulting in smaller errors. In addition, we can extract more exhaustive information about $f(x)$ by using overlapping bins of different size that cover a range of relevant scales of the function. This is achieved by the BHM described in Sec. III.

C. Reweighting
Reweighting is a Monte Carlo-specific technique that allows one to convert the statistics generated for a given variable $x$ to statistics for other values of this variable. This is achieved in the following way. Prior to sampling, we choose a fixed set of points $x_i$ and associate a bin with each of the $x_i$. The bins can be large and overlapping. Whenever the random variable $x$ falls into the bin $i$ associated with the point $x_i$, we update the $x_i$ counter by the reweighted value $v\, p(x_i)/p(x)$. Here $v$ is the value that would have been sampled without reweighting, and $p(x_i)/p(x)$ is the ratio of the probability weights at $x_i$ and $x$, where $p(x)$ is the probability distribution used to sample the physical function $f(x)$. This can be seen as a deterministic “update” from the generated point $x$ to the fixed point $x_i$. For sufficiently large bins, several counters may be updated simultaneously for each generated $x$. The reweighting technique provides an unbiased estimator for the value $f(x_i)$ at any fixed $x_i$.

The reweighting method is very efficient when one is interested in a single special value of a certain continuous variable. A characteristic example is the sampling of the single-particle density matrix from the statistics for the Green’s function by reweighting the latter to the imaginary-time value $\tau = -0$. In all other cases, reweighting is computationally expensive compared to the gain in accuracy. It also requires a case-specific implementation that assumes knowledge of the analytical form of the reweighting factors. To be maximally efficient, one has to adjust the size of bin $i$ for each $x_i$ of interest. The bin size must be optimized in such a way that, on the one hand, it is large enough to ensure sufficiently large statistics, and on the other hand small enough that the reweighting factors remain of the order of unity (to avoid numerous vanishingly small contributions that consume simulation time). Bins that are too large can also result in a strong increase in the autocorrelation time, since reweighting from a rare event to a frequent one implies a large $p(x_i)/p(x)$ ratio, which results in occasional anomalously large contributions to the sampled counters. Moreover, to produce a smooth outcome, the reweighting protocol requires an appropriately dense mesh of points with strongly overlapping bins, which is computationally expensive since a large number of counters needs to be updated for each generated $x$. For a continuous set of function values, a smoothing spline needs to be fitted at additional computational cost, comparable to the cost of the BHM fit described in Sec. III.

III. BIN HIERARCHY METHOD
The BHM restores smooth distributions in a universal, unbiased, and efficient manner. The method satisfies the following requirements:

• All the information contained in the sampled data should be used to fit the distribution: the accuracy should be essentially the same as if we had stored the full list of the generated values of $x$. This is achieved by introducing a hierarchy of overlapping histogram bins of different size, where large bins give precise estimates of the broad features of the distribution, while small bins resolve its fine structure once enough data have accumulated. Note that we assume that the sampled distribution is structureless below a certain scale, and hence there always exists a binning that is sufficiently fine to resolve all of its features.

• The resulting function should be exactly smooth, meaning that the function and all its derivatives up to a given order must have no jumps. This is achieved by fitting a polynomial (or generalized) spline.

• Out of all smooth functions consistent with the sampled data within their error bars, the selected result should have the least features (peaks, oscillations, etc.). This is achieved by choosing the spline with the minimal number of free parameters that fits the data.

• The sampling process and the restoration of the smooth function should be computationally and memory efficient. In particular, they should be more efficient than the basis projection method.

• The algorithm should work well regardless of the quality and the number of sampled data. As more data are collected and the statistics improves, so should the fitted distribution, without the need for any external adjustments.

• The algorithm should be highly automated. No human input or control should be necessary other than a limited choice of initial parameters. In particular, the method should be applicable to all reasonably occurring functions and should require no adjustment for different classes of functions.

Below we describe the setup in detail.
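To make the bookkeeping concrete before the detailed description, the sampling stage reduces to a few per-bin accumulators (count, running mean, and scaled variance), from which the bin integrals and their errors follow. This is a minimal sketch under our own naming conventions (the class, its methods, and the Welford-style online update are illustrative, not the authors' implementation):

```python
import numpy as np

class ElementaryBins:
    """Per-bin accumulators: the count N_i, the running mean v_bar_i, and the
    scaled variance M2(v_i) = sum over the bin of (v - v_bar_i)^2,
    updated online with Welford's algorithm. Illustrative sketch only."""
    def __init__(self, edges):
        self.edges = np.asarray(edges, dtype=float)
        n_bins = len(self.edges) - 1
        self.N_i = np.zeros(n_bins, dtype=int)
        self.vbar = np.zeros(n_bins)
        self.M2 = np.zeros(n_bins)
        self.N = 0                        # total number of sampling steps

    def add(self, x, v):
        self.N += 1
        i = np.searchsorted(self.edges, x, side="right") - 1
        i = min(max(i, 0), len(self.N_i) - 1)
        self.N_i[i] += 1
        d = v - self.vbar[i]
        self.vbar[i] += d / self.N_i[i]
        self.M2[i] += d * (v - self.vbar[i])

    def integrals(self):
        """Sampled integrals I_i = v_bar_i N_i / N and their statistical errors."""
        I = self.vbar * self.N_i / self.N
        M2_I = self.M2 + self.vbar**2 * self.N_i * (self.N - self.N_i) / self.N
        var = M2_I / (self.N - 1)
        return I, np.sqrt(var / self.N)

rng = np.random.default_rng(0)
bins = ElementaryBins(np.linspace(0.0, 1.0, 5))   # 2^2 elementary bins
for x in rng.uniform(size=40_000):                # f(x) = 1 on [0, 1], so v = +1
    bins.add(x, 1.0)
I, dI = bins.integrals()
```

For a flat unit density on four equal bins, every $I_i$ fluctuates around $1/4$ within its error bar, and the integrals sum to one exactly by construction.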
A. Sampling stage
We divide the domain into $2^K$ non-overlapping elementary bins that do not need to be of equal width. Because in the end we form combinations of several bins, the elementary bins may be arbitrarily small. Bins with too few data points are automatically excluded from the analysis. As the number of sampled points increases, bins of smaller size become usable. The number of elementary bins is thus limited only by memory. The $K \to \infty$ limit would formally correspond to keeping the full list of the generated values of $x$. In practice, the distributions of interest are structureless below a certain scale, which sets a natural limit on the required elementary bin width.

We generate values of $x$ according to a probability distribution given by $|f(x)|$ (importance sampling) and sample the value $v = \mathrm{sign}[f(x)]$ in the elementary bin $i$ that contains $x$. We can also include additional weighting coefficients in the sampled values. For each bin we store the number of sampled values in the bin, $N_i$, as well as the average $\bar v_i$ and the scaled variance $M_2(v_i) = \sum_{x \in \text{bin } i} (v - \bar v_i)^2$ of the sampled values over the bin. We obtain the sampled integral $I_i$ over each elementary bin via $I_i = \bar v_i N_i / N$, where $N$ is the total number of sampled points. The variance of the sampled integral is calculated via
$$\mathrm{Var}(I_i) = \frac{M_2(I_i)}{N-1}, \qquad M_2(I_i) = M_2(v_i) + \bar v_i^2\, \frac{N_i (N - N_i)}{N}, \qquad (7)$$
and its error is given by $\delta I_i = \sqrt{\mathrm{Var}(I_i)/N}$.

B. Bin hierarchy
After sampling has finished, we construct combinations of elementary bins. These make up a hierarchy with $2^n$ bins at each level $n \in \{0, \ldots, K\}$. At the top level we have one large bin containing the entire integral; on the next level we have two bins containing the integrals over half of the interval each, etc. The fit is constructed to minimize
$$\sum_{n=0}^{K} \frac{\chi_n^2}{2^n} = \chi_0^2 + \frac{\chi_1^2}{2} + \ldots + \frac{\chi_K^2}{2^K}, \qquad (8)$$
where $\chi_n^2$ is the goodness of the fit considering only the bins on the level $n$. Note that the construction of the bin hierarchy happens in the post-processing. During the sampling stage, only the values $N_i$, $\bar v_i$ and $M_2(v_i)$ in the elementary bins (corresponding to the hierarchy level $K$) need to be collected and stored.

We tested several modifications of this setup, which produced consistent results. Including more than $2^n$ bins on each level by using overlapping bins of equal size was found to bring no significant improvement. Likewise, using weights other than $1/2^n$ for $\chi_n^2$ did not improve the result. In general, finer bins at higher levels do not substantially contribute to the form of the fit, as their error bars are large and the shape of the distribution is usually already determined by the lower-level integrals. But they are still included in the fitting procedure (provided that they contain enough data points for statistics) to resolve potential fine structures of the distribution. For example, a periodically oscillating function like the sine can have zero average over large parts of the domain, while being nonzero on a finer scale. The structure of such a function will be resolved from the integrals over small bins if they contain enough data. Due to the $1/2^n$ weight factors, the small bins cannot overpower the large-bin contributions.

Note that while for minimization we use the sum over all levels, to decide whether a fit is accepted we check the goodness of the fit at every level separately.
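The hierarchy construction and the weighted objective of Eq. (8) can be sketched as follows. This is a minimal illustration under the assumption of statistically independent elementary bins (so errors of merged bins add in quadrature); the function names are ours.

```python
import numpy as np

def build_hierarchy(I_elem, err_elem):
    """Combine 2^K elementary integrals into levels n = 0, ..., K by pairwise
    summation; errors add in quadrature for independent bins. Returns a list
    indexed by level n, each entry a pair (integrals, errors)."""
    levels = [(np.asarray(I_elem, dtype=float), np.asarray(err_elem, dtype=float))]
    while len(levels[-1][0]) > 1:
        I, err = levels[-1]
        levels.append((I[0::2] + I[1::2], np.sqrt(err[0::2]**2 + err[1::2]**2)))
    return levels[::-1]                   # levels[0] is the single top-level bin

def weighted_objective(levels, model_levels):
    """Eq. (8): sum_n chi2_n / 2^n, with chi2_n computed from the level-n bins
    and the corresponding integrals of the trial spline."""
    total = 0.0
    for n, ((I, err), I_model) in enumerate(zip(levels, model_levels)):
        chi2_n = np.sum(((I - I_model) / err) ** 2)
        total += chi2_n / 2**n
    return total

# Toy check with K = 2: four elementary integrals with unit errors.
levels = build_hierarchy([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0])
exact = [lvl[0].copy() for lvl in levels]  # a "model" that reproduces the data
```

For this toy input the levels hold the bins $\{10\}$, $\{3, 7\}$, and $\{1, 2, 3, 4\}$, and a model reproducing the data gives a vanishing objective.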
A fit is accepted only if each of the $\chi_n^2/\tilde n$ is approximately 1, within a given number $T$ of standard deviations $\sigma = \sqrt{2/\tilde n}$ (here $\tilde n \le 2^n$ is the number of bins on level $n$ that contain sufficient data to be used for fitting). The fit acceptance threshold $T$ is an external input parameter, typically $T = 2$. Lower values of the threshold are likely to result in the failure to produce an acceptable fit. Instead of a fixed value, a range of $T$ can also be specified, for example between two and four. If there is no acceptable fit at the lowest threshold value, it is gradually increased until either the maximum is reached or an acceptable fit is found. For a large number of hierarchy levels, the statistical probability for a level to exceed a $2\sigma$ threshold on $\chi_n^2/\tilde n$ becomes non-negligible, even if the overall fit is good. Specifying a threshold range allows one to attempt fits with lower threshold values without risking a failed fit.

C. Polynomial fit
The sampled integrals are fitted to a spline. In what follows we discuss polynomial (typically, cubic) splines, but a generalization to other functions is straightforward. Here we briefly review the theory of fitting a single polynomial $p(x) = \sum_{k=0}^{m} a_k x^k$, before we turn to the full spline in the next section.

There exists a lot of standard software for fitting a set of points to a function. Here we want to match the polynomial to a set of integrals, which requires a slight adjustment. The derivation follows the steps of general linear least-squares fitting. The best polynomial fit minimizes
$$\chi^2 = \sum_i \left( \frac{I_i - I_i^{(p)}}{\delta I_i} \right)^2, \qquad (9)$$
where $I_i^{(p)}$ is the integral of the polynomial over the bin $i$. For simplicity we omit the sum over bin levels and the corresponding weighting factors in this section, but the generalization is straightforward. Let us label the boundaries of bin $i$ by $\bar x_i$ and $\bar x_{i+1}$. The integral of the polynomial over the bin equals
$$\int_{\bar x_i}^{\bar x_{i+1}} p(x)\, dx = \sum_{k=0}^{m} \frac{a_k}{k+1} \left( \bar x_{i+1}^{k+1} - \bar x_i^{k+1} \right) \equiv \sum_{k=0}^{m} a_k X_{ik}, \qquad (10)$$
and Eq. (9) becomes
$$\chi^2 = \sum_i \left( \frac{I_i}{\delta I_i} - \sum_{k=0}^{m} \frac{a_k X_{ik}}{\delta I_i} \right)^2 \equiv \sum_i \left( b_i - \sum_k A_{ik} a_k \right)^2 = |A \cdot a - b|^2. \qquad (11)$$
The vector $b$ contains the sampled integrals divided by the error bar (its length equals the number of bins $\tilde n$), the vector $a$ of length $m+1$ contains the free parameters of the polynomial, and the $\tilde n \times (m+1)$ design matrix $A$ is defined by $A_{ik} = X_{ik}/\delta I_i$. The design matrix has more rows than columns since there must be more data points than fit parameters. We find the vector $a$ that minimizes $|A \cdot a - b|^2$ using singular value decomposition of the design matrix. The error on the fitted polynomial at a given point $x$ is obtained from the covariance matrix $C_{jk} = \mathrm{Cov}(a_j, a_k)$ via
$$\delta p(x) = \sqrt{ \sum_{i,j=0}^{m} C_{ij}\, x^{i+j} }. \qquad (12)$$

D. Spline
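Each spline piece is a polynomial fitted to bin integrals exactly as in Eqs. (9)–(12), so that building block is worth sketching before the full spline. This is a minimal illustration with our own function names; the covariance is taken as $(A^T A)^{-1}$, which corresponds to the linear least-squares case described above.

```python
import numpy as np

def fit_poly_to_integrals(edges, I, dI, m):
    """Least-squares fit of p(x) = sum_k a_k x^k to sampled bin integrals:
    X_ik = (x_{i+1}^{k+1} - x_i^{k+1})/(k+1)  [Eq. (10)],
    A_ik = X_ik / dI_i, b_i = I_i / dI_i      [Eq. (11)],
    solved via SVD (np.linalg.lstsq)."""
    x = np.asarray(edges, dtype=float)
    k = np.arange(m + 1)
    X = (x[1:, None] ** (k + 1) - x[:-1, None] ** (k + 1)) / (k + 1)
    A = X / dI[:, None]
    b = I / dI
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    cov = np.linalg.inv(A.T @ A)          # covariance matrix C of Eq. (12)
    return a, cov

def delta_p(cov, x):
    """Pointwise fit error of Eq. (12): sqrt(sum_ij C_ij x^(i+j))."""
    idx = np.arange(cov.shape[0])
    return float(np.sqrt(np.sum(cov * x ** (idx[:, None] + idx[None, :]))))

# Exact integrals of p(x) = 1 + 2x over 8 bins on [0, 1] should be recovered.
edges = np.linspace(0.0, 1.0, 9)
I_exact = np.diff(edges + edges**2)       # integral of 1 + 2x over each bin
dI = np.full(8, 0.01)
a, cov = fit_poly_to_integrals(edges, I_exact, dI, m=1)
```

Feeding in exact integrals of a linear function recovers its coefficients to machine precision, while `delta_p` returns the statistical error band implied by the assigned $\delta I_i$.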
At each knot between two spline pieces, the values of the corresponding polynomials and their derivatives up to order $(m-1)$ are equal. This gives $m$ constraints at each knot. Since each polynomial has $m+1$ coefficients, the total number of free parameters is $n_p + m$, where $n_p$ is the number of spline pieces. The extra $m$ parameters arise since there is one more polynomial piece than knots. Known boundary conditions, if any, further reduce the number of free parameters.

The algorithm determines the positions of the knots using the following procedure. Begin by attempting to fit one polynomial on the whole interval. If this fit is not acceptable, split the interval into two equal parts and attempt fitting a two-piece spline. If this fit is not acceptable either, check the goodness of the fit on each interval separately. Intervals where the fit is not acceptable have to be split in the next iteration, while all other intervals remain unchanged. Repeat this procedure until either an acceptable fit is found, or the intervals can no longer be split. We require that the set of equations for each spline piece be overdetermined, i.e., that the number of bins inside each interval is larger than $m+1$. This sets the limit on the maximal number of spline pieces.

To check the goodness of the fit on an individual interval, we compute the values of $\chi_n^2/\tilde n$, where $\chi_n^2$ is evaluated only on bins of level $n$ that are fully inside the given interval (and that contain enough statistical data), and $\tilde n$ is the number of such bins. Larger bins that reach over several intervals cannot be used for this check, but they are used to check the goodness of the overall fit. In all test cases, if the polynomials on the individual intervals passed the goodness-of-fit test, the overall spline evaluated over the entire domain also passed. Note that, as before, the checks on the individual intervals are performed for each hierarchy level separately. If more than half of the level $n$ bins inside a given interval do not contain enough data, the check for this interval terminates without proceeding to subsequent levels.
E. Constraining jumps in the highest derivative
An additional constraint on the jumps in the $m$th derivative of the spline can be imposed on the final fit, if desired. Jumps in the derivatives up to order $m-1$ are zero by construction. The piecewise constant $m$th derivative, which is proportional to the value of the parameter $a_m^{(j)}$ on each interval $j$, is the only one that exhibits jumps at the knots. Imposing an additional constraint that aims to minimize these jumps ensures a continuous evolution of the fitted spline as a function of the number of sampled points $N$. The number of spline pieces for the best fit generally increases with growing $N$, as better statistics permits a better resolution of the sampled function. Additional interval divisions are triggered at certain discrete values of $N$, namely when one of the $\chi_n^2$ values on the affected interval crosses the goodness-of-fit threshold. This creates an additional jump in the $m$th derivative at the position of the new knot. This jump can be closed with only a small sacrifice in $\chi_n^2$, since the previous spline with one fewer knot was almost acceptable. In this sense, the additional constraint on the jump acts as a compensation mechanism for a potentially too vigorous interval splitting. In most cases, constraining the jump has almost no effect on the fit.

The constraint term has the form
$$\frac{\lambda}{n_p} \sum_{j=1}^{n_p - 1} \lambda_j \left( \frac{a_m^{(j+1)} - a_m^{(j)}}{\bar a_m^{(j+1)} - \bar a_m^{(j)}} \right)^2, \qquad (13)$$
where $\bar a_m^{(j)}$ are the polynomial parameters of the optimal fit without the constraint and the weights $\lambda$ and $\lambda_j$ control the strength of the constraint. After computing the optimal spline through minimization of Eq. (8), a new spline fit is attempted on the same interval division, this time by minimizing the sum of (8) and the constraint (13). The local weights $\lambda_j$ are determined by the goodness of the original fit (without constraint) on the intervals $j-1$, $j$, $j+1$ and $j+2$, which are adjacent or next-to-adjacent to the respective knot.
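The penalty of Eq. (13) itself is a one-liner once the highest-derivative coefficients of the constrained and reference fits are collected; the constrained fit then minimizes the sum of Eq. (8) and this term. A minimal sketch (the function name and interface are ours):

```python
import numpy as np

def jump_penalty(a_m, a_m_ref, lam, lam_j):
    """Eq. (13): (lambda / n_p) * sum_j lambda_j *
    ((a_m^(j+1) - a_m^(j)) / (abar_m^(j+1) - abar_m^(j)))^2, where a_m holds
    the m-th-derivative coefficient of each of the n_p spline pieces and
    a_m_ref holds the same coefficients of the unconstrained reference fit."""
    a_m = np.asarray(a_m, dtype=float)
    n_p = len(a_m)
    jumps = np.diff(a_m) / np.diff(np.asarray(a_m_ref, dtype=float))
    return lam / n_p * np.sum(np.asarray(lam_j) * jumps**2)
```

A spline whose highest-derivative coefficients are equal across all pieces has no jumps and incurs zero penalty, while jumps comparable to those of the reference fit contribute with weight $\lambda \lambda_j / n_p$ each.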
Specifically, we determine (on each hierarchy level of each of these intervals) the difference between the maximally acceptable $\chi_n^2/\tilde n = 1 + T\sqrt{2/\tilde n}$ and the actual $\chi_n^2/\tilde n$. We take $\lambda_j$ to be the minimum of all these differences. This ensures that the constraint does not affect spline intervals where the original fit was already barely acceptable. The global weight $\lambda$ is iteratively adjusted to the maximal value that is still compatible with the imposed threshold on the $\chi_n^2/\tilde n$. Additional iterations can be performed by replacing the $\bar a_m^{(j)}$ with the values calculated with the constraint.

F. Errors on the spline coefficients
An estimate of the spline coefficient errors can be obtained from Eq. (12) for each spline piece. This equation is accurate for parametric fits to a set of statistically independent, normally distributed data points. This is not the case for the BHM, due to the overlapping bins and the unconventional $\chi^2$-minimization protocol. Because of this, one cannot attach Gaussian standard deviation probabilities to the error bars. A robust way of defining the error at a given point $x$ is to look at the histogram of $\tilde f(x)$ values over a large set of independent runs with the same sample size $N$. The smallest interval around the histogram mean that contains 68.27% of the $\tilde f(x)$ values corresponds to the robust estimate for two standard errors ($\pm\sigma$). This is the most rigorous, albeit impractical, way of defining the error under these circumstances. A suitable alternative protocol should yield errors that are close to the robust value.

We performed extensive tests of several error estimation protocols by comparing them to the robust estimate for different types of distribution $f(x)$. These tests confirmed that Eq. (12) provides a good error estimate in most cases. In some cases the error was found to be overestimated by a factor of about two compared to the robust error. This occurred in particular at the domain boundaries and for BHM splines with a large number of knots. A concrete example is presented in Sec. IV C. The error was never observed to be too small, and hence Eq. (12) gives a convenient and efficient way to obtain a conservative error estimate on the BHM fit.

A tighter error estimate can be obtained using bootstrap with the following protocol. Produce an array of $M$ histograms, each of which contains a fraction $N/M$ of the $N$ sampled data points, so that each point is sampled in exactly one of the histograms. Generate $\tilde M \ge M$ bootstrap histograms by taking $\tilde M$ random linear combinations (with positive integer coefficients) of the $M$ histograms. Each of these bootstrapped histograms then contains $N$ sampled points, but with possible repetitions. Run the BHM fit on each of the bootstrap histograms, fixing the positions of the knots to be the same as for the BHM fit of the regular histogram (the sum of all bootstrap histograms with unit weight) and without evaluating the goodness of fit. The error on the spline coefficients is then determined from the statistics of the bootstrapped histogram fits in the usual way. The bootstrap error was found to be very close to the robust error estimate in all our tests. This scheme uses more resources, especially memory, since $M \gtrsim 100$ histograms need to be stored instead of one. The increase in computational time is less significant, since performing fits with fixed knot positions is very fast.

Another error estimation protocol is based on the analysis of the evolution of the BHM fit as a function of the sample size $N$. For this, the BHM spline is produced and saved periodically in intervals of $\Delta$ sampling steps. The idea is that the fit results $\tilde f(x)$ will approach the exact value with the dispersion $\sigma \propto 1/\sqrt{N}$. The interval size $\Delta$ must be large enough to guarantee that the contributions $A_k$ from different intervals are statistically independent, allowing one to invoke the central limit theorem:
$$\tilde f_k = \frac{1}{k} \sum_{i=1}^{k} A_i \quad \Rightarrow \quad A_k = k \tilde f_k - (k-1) \tilde f_{k-1}. \qquad (14)$$
The dispersion of $\tilde f_k$ can then be obtained from the dispersion $\sigma_*^2$ of the independent $A_k$ via $\sigma_k^2 = \sigma_*^2/k$. An additional cutoff $k \ge k_0$ should also be introduced to reduce the effect of the initial portion of the statistics, which can be strongly affected by transient processes. A simple consistency criterion is that there should be a range of $\Delta$ and $k_0$ such that $\sigma_*$ is essentially independent of both parameters. Our tests showed that the errors calculated with this method are almost indistinguishable from the ones obtained with bootstrap, and hence also agree with the robust error. This method is somewhat slower, since the full BHM fit needs to be evaluated repeatedly, and requires a sufficiently large $N$. On the other hand, the $A_k$ sequence provides explicit insight into the statistics of the data and can help to identify statistical fluctuations.

G. Sign problem
Monte Carlo sampling often suffers from a sign prob-lem, when positive and negative terms occur with almostequal frequency in the sampling process. Before a suffi-cient amount of data has been collected, the error maythus substantially exceed the absolute value of the sam-pled integrals. The same applies to the error of the fittedspline. Fits where the noise substantially exceeds the sig-nal are unsuitable for further data analysis. To identifythis problem, the algorithm checks—at all bin levels—whether the sampled data are consistent with zero beforethe start of the fitting process. The data is deemed cer-tainly inconsistent with zero only if at least one of thefollowing conditions is met: (i) at any individual level n , the value of χ n / ˜ n for the zero function exceeds 1 byat least 4 standard deviations (4 (cid:112) / ˜ n ), or (ii) at anytwo individual levels, χ n / ˜ n exceeds 1 by at least 3 (cid:112) / ˜ n ,respectively, or (iii) χ n / ˜ n exceeds 1 by at least 3 (cid:112) / ˜ n at one level, and by at least 2 (cid:112) / ˜ n on two other levels,respectively, or (iv) at any four individual levels, χ n / ˜ n exceeds 1 by at least 2 (cid:112) / ˜ n , respectively. The probabil-ity for this to occur as a statistical fluctuation is lowerthan 10 − . Unless one of these conditions is met, the al- . . . . x . . . . f ( x ) ∝ − x / + x − x / (a)BHM splinebasis projection f ( x ) . . . . x − . − . . . ˜ f ( x ) − f ( x ) (b)BHM splinebasis projection FIG. 1. Cubic polynomial test function. The BHM fit (red dashed line with error band) in comparison with the basis projectionmethod (blue dotted line with error band). The left panel (a) shows the function and the fits, while the right panel (b) showsthe difference between the fitted and the true function for both methods. gorithm can be set to not continue with the fitting proce-dure until more data has been collected. 
The conditions are chosen to be strict, to minimize the risk of continuing the analysis with a noisy fit.

An alternative to this procedure is to consider the evolution of the fitted spline with accumulated statistics, accepting the spline only when it ceases to change substantially over Monte Carlo time,

$\int \bigl| \tilde f_{2N}(x) - \tilde f_N(x) \bigr| \, dx < \alpha \int \bigl| \tilde f_{2N}(x) \bigr| \, dx$, (15)

where $\tilde f_N(x)$ is the BHM spline obtained from $N$ sampled data points, $\alpha$ is the acceptance threshold, and the integration goes over the entire domain. Typical values for $\alpha$ are small, of the order of a few percent.

IV. NUMERICAL TESTS
We present tests of the BHM for several types of distribution. Unless otherwise stated, a moderate number of random sample points was taken for each test, to demonstrate the ability to restore the smooth distribution from limited statistics. We always fit cubic splines. The displayed error bands on the splines were obtained with Eq. (12).

For comparison, we sampled the same data points also with the basis projection method using a polynomial basis up to cubic order. The binning for basis projection sampling was retrospectively chosen to be the same as the interval division for the spline that was determined by the bin hierarchy algorithm. This is a strict test, since in practice the optimal binning for the projection method is not known in advance. For each of the tests we observed that the accuracy of the bin hierarchy fit was at least as good as that of the basis projection method, which, together with the method's efficiency and the guaranteed smoothness of $\tilde f(x)$, demonstrates the advantage of the BHM. We display the basis projection results for the first example of a polynomial distribution and omit them in the following examples to avoid overcrowding the plots.

A. Polynomial
The distribution sampled on the interval $[1, 2.8]$ is a cubic polynomial,

$f(x) \propto c_0 - c_1 x + c_2 x^2 - c_3 x^3$, (16)

with constants $c_i$ chosen such that $f(x) > 0$ on the entire domain. Since the distribution is a cubic polynomial, the algorithm should stop after fitting one polynomial on the entire domain. This is indeed the case. Figure 1 shows the BHM fit in comparison with the result obtained using the basis projection method. In this particular case, the basis projection method provides the most accurate estimate of the function by definition, since we are projecting on functions that have the same form as $f(x)$ and cover the whole interval $[1, 2.8]$ without introducing any systematic error. Nevertheless, BHM produces a result of comparable accuracy. This example demonstrates that the BHM fit competes with the basis projection method even when the latter is known to be optimal.

We also perform tests with a higher-order polynomial. The function sampled on the interval $[-1, 1]$ is the quartic

$f(x) \propto x^4 - c\,x^2$, (17)

with $0 < c < 1$, so that $f(x)$ changes sign on the domain. Since $f(x)$ changes sign, we use $|f(x)|$ as the probability to generate $x$ and then sample $\mathrm{sign}[f(x)]$. The BHM provides an accurate smooth fit of this function. The result in comparison with the basis projection method is shown in Fig. 2. Four spline pieces were needed to resolve the function in this example. As in the previous example, the errors on the BHM spline and the basis projection are comparable in size.

FIG. 2. Quartic polynomial test function. The BHM fit (red dashed line with error band) in comparison with the basis projection method (blue dotted line with error band). The left panel (a) shows the function and the fits, while the right panel (b) shows the difference between the fitted and the true function for both methods.

FIG. 3. BHM fit with error band of an exponential test function. The left panel (a) shows the function and the fit, while the right panel (b) shows the difference between the fitted and the true function.
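The sign-sampling procedure described above can be sketched as follows. This is an illustrative Python sketch, not the paper's actual simulation code: the coefficient $c = 0.5$ is a hypothetical stand-in for the one in Eq. (17), and simple rejection sampling is used in place of a Markov chain.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # sign-changing quartic of the same type as Eq. (17);
    # the coefficient 0.5 is illustrative, not the paper's value
    return x**4 - 0.5 * x**2

FMAX = 0.5  # maximum of |f| on [-1, 1], attained at the endpoints

def sample_signed(n):
    """Draw n points with density proportional to |f(x)| on [-1, 1]
    by rejection sampling, returning them together with sign[f(x)]."""
    xs = np.empty(0)
    while xs.size < n:
        cand = rng.uniform(-1.0, 1.0, size=4 * n)
        accept = rng.uniform(0.0, FMAX, size=cand.size) < np.abs(f(cand))
        xs = np.concatenate([xs, cand[accept]])
    xs = xs[:n]
    return xs, np.sign(f(xs))

# Signed histogram: bin sums of sign[f(x)] estimate the bin integrals of f
# up to a common normalization factor.
xs, signs = sample_signed(200_000)
hist, edges = np.histogram(xs, bins=20, range=(-1.0, 1.0), weights=signs)
```

The signed bin sums reproduce the shape of $f$: bins near the endpoints (where $f > 0$) accumulate positive weight, while bins near the origin (where $f < 0$) accumulate negative weight.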
B. Decaying exponential
The distribution sampled on the interval $[1, 2.8]$ is

$f(x) \propto \exp(-x)$. (18)

An exponentially decaying distribution is typical for many physical processes. In this example, the BHM produced an acceptable fit with two spline pieces on the intervals $[1, 1.9]$ and $[1.9, 2.8]$; the function and the fit are shown in Fig. 3. For comparison, Fig. 4 shows a fit that uses only the elementary bins; it does not properly resolve all sampled integrals of $f(x)$ on the first interval $[1, 1.9]$.

C. Oscillating function
The distribution sampled on the interval $[0.6, \pi + 0.6]$ is

$f(x) \propto 10 + \cos(10 x)$. (19)

FIG. 4. Exponential test function. The BHM fit (red dashed line with error band) in comparison with the fit using only elementary bins (blue dotted line with error band). The latter does not properly resolve all sampled integrals of the test function.
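Assuming the sampling interval $[0.6, \pi + 0.6]$, whose length $\pi$ covers five full periods of $\cos(10x)$, the cosine term in Eq. (19) integrates to zero, so the normalized density is $f(x) = 1/\pi + \cos(10x)/(10\pi)$. A quick numerical check of this normalization (our own NumPy sketch, not part of the BHM code):

```python
import numpy as np

# Normalized oscillating test density on [0.6, pi + 0.6]: the interval has
# length pi, i.e. five full periods of cos(10x), so the cosine integrates to
# zero and 10 + cos(10x) normalizes to f(x) = 1/pi + cos(10x)/(10*pi).
a, b = 0.6, np.pi + 0.6

def f(x):
    return 1.0 / np.pi + np.cos(10.0 * x) / (10.0 * np.pi)

# trapezoidal rule on a fine grid
x = np.linspace(a, b, 200_001)
y = f(x)
integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))  # should be 1
```

The density is also strictly positive, since $1/\pi > 1/(10\pi)$, as required of a probability distribution.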
The challenge is to resolve the periodic oscillations within the accuracy of the sampled data. For a small sample, we expect a flat fit reproducing the average of the function. As we collect more data the oscillations should be resolved. We show the bin hierarchy fit for a small sample and for a substantially larger sample in Fig. 5. The fit reproduces the original distribution well within error bars. Many spline pieces are needed to resolve the structure (in the examples shown: 15 to 16 intervals) and Eq. (12) overestimates the error on the spline coefficients. In Fig. 5 we show the error band obtained from Eq. (12) as well as the one obtained using bootstrap.

In Fig. 6 we explicitly compare the error estimates obtained with Eq. (12), bootstrap, and fit evolution. The bootstrap and fit evolution errors are nearly the same for all $x$. In this example, for bootstrap and fit evolution errors, the deviation $\tilde f(x) - f(x)$ exceeds $1\sigma$ on roughly 30% of the domain and $2\sigma$ on roughly 7% of the domain, which is comparable to the Gaussian case. For the Eq. (12) error, the deviation almost never exceeds $1\sigma$. We also present a robust error analysis based on 100 independent samples of equal size. The histogram of $\tilde f(x)$ values is shown in Fig. 7 for several values of $x$. For reference we indicate the specific value of $\tilde f(x)$ from the example shown in Fig. 5, to illustrate for which $x$ the fit values in this example are statistical outliers, and for which they are typical. We also show the median of the Eq. (12) errors and a representative bootstrap error (from the example shown in Fig. 5). Indeed, the size of the bootstrap and fit evolution errors is very close to the robust estimate, while the error from Eq. (12) is too large.

D. Divergent function on a semi-infinite domain
The distribution sampled on the interval $[0, \infty)$ is

$f(x) \propto \dfrac{1}{\sqrt{x}\,(1 + x)}$. (20)

The basis projection method has the advantage that the $1/\sqrt{x}$ divergence at $x = 0$ can be incorporated directly into the basis, and that the asymptotic behavior at large $x$ can be resolved using a semi-infinite bin. We present a setup that achieves the same in the BHM framework. The divergence can be avoided by weighting all sampled values with $\sqrt{x}$ and recovering the original distribution at the end by dividing the BHM spline by $\sqrt{x}$. The semi-infinite domain can be mapped onto a finite interval, for instance through a transform such as $y(x) = 2\arctan(x)/\pi$ (used in this example) or $y(x) = 1 - \exp(-x)$. For each $x$ generated according to the distribution $f(x)$, the value $y(x) \in [0, 1]$ is calculated and sampling is then performed into the corresponding $y$-histogram. To recover the correct function, the sampled values need to be scaled by $1/x'(y)$.

Figure 8 shows the BHM fit transformed back into the original domain. Due to the large domain, a larger number of sampling points was used for this example. The fit agrees well with the original distribution within error bars. In this example, the BHM produced an acceptable fit on one spline interval.

In order to compensate for the divergence, it is assumed that its location and type are known in advance. Often it is sufficient to know the approximate properties of the divergence. To demonstrate this, Fig. 9 shows BHM fits of data sampled with a weighting factor that slightly overestimates or underestimates the power of the divergence (the mapping of the semi-infinite domain onto a finite interval is the same as before). Despite the wrong scaling, the fits still agree with the original distribution, but the errors on the fits are larger. It is better to overestimate the power, since in this case the divergence is still fully compensated.

E. Sampling with a severe sign problem
We demonstrate the performance of the algorithm in the presence of a severe sign problem by generating data on the interval $[0, 3]$ with the distribution given by

$f(x) \propto \exp(-c\,x) - \exp(-x) \equiv f_+(x) - f_-(x)$, (21)

where the constant $c$ is close to 1, so that the two terms are of almost equal magnitude. The values of $x$ are generated via a Markov chain Monte Carlo process with two types of updates: switching between two "sectors" corresponding to $f_+$ and $f_-$, and varying the value of $x$ on the interval $[0, 3]$ within the same sector. The update acceptance probabilities are given by the detailed balance equations,

$P_{f_{-\sigma} \to f_{\sigma}} = f_{\sigma}(x)/f_{-\sigma}(x)$, (22)
$P_{x \to x'} = f_{\sigma}(x')/f_{\sigma}(x)$. (23)

The sign problem is due to the functions $f_+(x)$ and $f_-(x)$ having almost the same magnitude but being sampled with opposite sign. This setup is a toy version of the typical situation arising in diagrammatic Monte Carlo for fermions.

Figure 10 shows the BHM spline for a smaller and a larger number of sampled points. In the former case, the data is compatible with zero, which is correctly captured by the algorithm presented in Sec. III G. In this case, the fit should not be used, despite having acceptable $\chi^2$ (the fit is consistent with the true function within errors). As more data are gathered the algorithm begins to correctly reproduce the features of the function. In the examples shown, one spline piece was sufficient for an acceptable fit.

FIG. 5. Oscillating test function using a smaller sample (top panels (a) and (b)) and a substantially larger sample (bottom panels (c) and (d)). The BHM fit (red dashed line) is shown with two error estimates on the spline coefficients: errors calculated using Eq. (12) (red band) and errors calculated using bootstrap (blue band). Bootstrap provides a tighter bound on the errors in this example. The left panels, (a) and (c), show the function and the fit, while the right panels, (b) and (d), show the difference between the fitted and the true function.

F. Green's function of the Fröhlich polaron
We now apply the BHM to a physical problem, the diagrammatic Monte Carlo calculation of the zero-momentum imaginary-time Green's function,

$G(\mathbf{k} = 0, \tau) = \langle \mathrm{vac} | \, a_{\mathbf{k}=0}(\tau) \, a^{\dagger}_{\mathbf{k}=0}(0) \, | \mathrm{vac} \rangle$, (24)
$a_{\mathbf{k}}(\tau) = e^{H\tau} a_{\mathbf{k}} e^{-H\tau}$, (25)

of the Fröhlich polaron with the dimensionless coupling constant $\alpha = 2$ and the chemical potential $\mu$ chosen below the ground-state energy and measured in units of the (momentum-independent) phonon frequency $\omega$. Here $|\mathrm{vac}\rangle$ is the vacuum state and $a_{\mathbf{k}}$ is the annihilation operator for an electron with momentum $\mathbf{k}$. The Fröhlich Hamiltonian describes an electron coupled to a bath of phonons,

$H = H_e + H_{\mathrm{ph}} + H_{e\text{-}\mathrm{ph}}$, (26)
$H_e = \sum_{\mathbf{k}} \frac{k^2}{2} \, a^{\dagger}_{\mathbf{k}} a_{\mathbf{k}}, \qquad H_{\mathrm{ph}} = \sum_{\mathbf{q}} \omega \, b^{\dagger}_{\mathbf{q}} b_{\mathbf{q}}$, (27)
$H_{e\text{-}\mathrm{ph}} = \sum_{\mathbf{k},\mathbf{q}} \frac{i \, (2\sqrt{2}\,\alpha\pi)^{1/2}}{q} \left( b^{\dagger}_{\mathbf{q}} - b_{-\mathbf{q}} \right) a^{\dagger}_{\mathbf{k}-\mathbf{q}} a_{\mathbf{k}}$, (28)

where $b_{\mathbf{q}}$ is the annihilation operator for a phonon with momentum $\mathbf{q}$.

The Green's function is a central quantity for the diagrammatic technique, from which other properties of the system can be obtained with appropriate analysis. As reference we use a very long Monte Carlo run with reweighting. This provides a reliable estimate of the Green's function with negligible errors (several orders of magnitude smaller than the errors of the sample used for the BHM and for basis projection sampling).

Figure 11 shows the BHM fit and the result obtained using the basis projection method for the same set of sampled points. The BHM fit produced four spline pieces in this example and, as in the previous examples, the basis was retrospectively chosen to have the same interval division. It can be clearly seen that the BHM provides an accurate smooth fit of the Fröhlich polaron Green's function, which agrees well with the reference. While the error bars on the BHM fit and the basis projection are comparable, the basis projection sampling is less efficient, since it requires $O(m^2)$ operations during the sampling stage, where $m$ is the number of basis functions used. In this particular example with a polynomial basis up to cubic order, this corresponds to at least 16 times as many operations as needed during the sampling stage for BHM.

FIG. 6. Oscillating test function. Difference between the fitted and the true function, normalized by one standard error obtained with the different methods: Eq. (12), bootstrap, and fit evolution.

V. CONCLUSIONS
We have argued and demonstrated that the BHM yields an efficient, flexible, and fully automated algorithm to restore smooth functions from their noisy integrals using all available information. The resulting fits are at least as accurate as the ones obtained using basis projection sampling, but with guaranteed smoothness at the knots. Sampling with the BHM is also computationally less expensive than using basis projections, which require many operations in each sampling step. Similar to the projection method onto a polynomial basis, the BHM spline is a piecewise polynomial function on several large intervals. The crucial technical advantage is that these intervals do not need to be fixed beforehand. A suitable division into intervals is found automatically by the algorithm and is adjusted over time, as more data points are collected.

In the future we plan to extend the bin hierarchy algorithm to multivariate functions, since many relevant physical observables depend on several variables, such as time and momentum. Two, three, and four dimensions are the most relevant for physical applications.
ACKNOWLEDGMENTS
We thank Chris Amey for helpful discussions. This work was supported by the Simons Collaboration on the Many Electron Problem and the National Science Foundation under Grant No. DMR-1720465. O.G. also acknowledges support by the US-Israel Binational Science Foundation (Grants 2014262 and 2016087).
FIG. 7. Smoothed histogram of $\tilde f(x)$ values for the oscillating test function at four values of $x$, from $x = 1$ to $x = 3.74$ (panels (a) to (d), from left to right). The vertical lines are: histogram mean (red solid line); true function value (blue dotted line); the particular $\tilde f(x)$ value from the simulation shown in the bottom panel of Fig. 5 (black dashed line). The latter is shown for reference. The horizontal lines are: the $\pm\sigma$ interval around the histogram mean, where $\sigma$ is the median standard error from Eq. (12) (red solid line); the $\pm\sigma$ interval around the histogram mean, where $\sigma$ is the robust standard error (blue dotted line); and the $\pm\sigma$ interval around the histogram mean, where $\sigma$ is the typical bootstrap error, taken from the example shown in the bottom panel of Fig. 5 (black dashed line).

FIG. 8. Sampling a divergent function defined on a semi-infinite domain using the BHM with appropriate transforms. The left panel (a) shows the function and the fit, while the right panel (b) shows the difference between the fitted and the true function.

FIG. 9. Divergent test function $f(x) \propto x^{-1/2}(1+x)^{-1}$ with an $x^{-1/2}$ divergence at $x = 0$. Data is sampled with a compensation for the divergence that overestimates (left panel (a)) and underestimates (right panel (b)) the power of the divergence, using weighting factors $x^{p}$ with $p$ slightly larger and slightly smaller than $1/2$, respectively. The plots show the difference between the BHM fit and the true function.

FIG. 10. Sampling with a severe sign problem using a smaller sample (left panel (a)) and a larger sample (right panel (b)). The sampled integrals used for the fit in the left panel are consistent with zero, which is correctly captured by the check presented in Sec. III G. In this case, the displayed BHM spline would not be used for further data analysis.

FIG. 11. Diagrammatic Monte Carlo sampling of the Fröhlich polaron Green's function. The left panel (a) shows a very precise numerical calculation of the function, while the right panel (b) shows the difference between the fitted and the reference function for the BHM and the basis projection method.