Implementation of the Bin Hierarchy Method for restoring a smooth function from a sampled histogram
Olga Goulko, Alexander Gaenko, Emanuel Gull, Nikolay Prokof'ev, Boris Svistunov
Olga Goulko a,b,∗, Alexander Gaenko c, Emanuel Gull c, Nikolay Prokof’ev a,d, Boris Svistunov a,d,e
a Department of Physics, University of Massachusetts, Amherst, MA 01003, USA
b Present address: Raymond and Beverly Sackler School of Chemistry and School of Physics and Astronomy, Tel Aviv University, Tel Aviv 6997801, Israel
c Department of Physics, University of Michigan, Ann Arbor, MI 48109, USA
d National Research Center “Kurchatov Institute,” 123182 Moscow, Russia
e Wilczek Quantum Center, School of Physics and Astronomy and T. D. Lee Institute, Shanghai Jiao Tong University, Shanghai 200240, China
Abstract
We present BHM, a tool for restoring a smooth function from a sampled histogram using the bin hierarchy method. The theoretical background of the method is presented in [1]. The code automatically generates a smooth polynomial spline with the minimal acceptable number of knots from the input data. It works universally for any sufficiently regular shaped distribution and any level of data quality, requiring almost no external parameter specification. It is particularly useful for large-scale numerical data analysis. This paper explains the details of the implementation and the use of the program.
PROGRAM SUMMARY
Manuscript Title:
Implementation of the Bin Hierarchy Method for restoring a smooth function from a sampled histogram
Authors:
Olga Goulko, Alexander Gaenko, Emanuel Gull, Nikolay Prokof’ev, Boris Svistunov
Program Title:
BHM
Journal Reference:
Catalogue identifier:
Licensing provisions:
GPLv3
Programming language:
C++
Operating system:
Tested on Linux
RAM:
Keywords:
Data analysis, Function restoration, Spline fitting, Histogram, Smoothing
Classification:
External routines/libraries:
CMake, GSL
Nature of problem:
Restoring a smooth function from a sampled histogram in an efficient, reliable and automatized way is crucial for numerical and experimental data analysis.
∗ Corresponding author. E-mail address: [email protected]
Solution method:
To make use of all information contained in the sampled data, the BHM algorithm generates a hierarchy of overlapping bins of different sizes from the initially supplied fine histogram. The bin hierarchy is fitted to a polynomial spline with the minimal acceptable number of knots, the positions of which are determined automatically. The output is a smooth function with error band.
Running time:
Typically less than a second
1. Introduction
Numerical approaches to problems in condensed matter and quantum many-body physics often involve generating data points according to an unknown probability density $f(x)$, which needs to be restored from the sampled data. The amount of data generated in large-scale quantum Monte Carlo simulations is usually so large that it is impossible (or at least impractical) to store the complete list of sampled data points $x_i$ in order to use density estimation protocols [2–4] to recover $f(x)$. Instead, data points are typically collected into a histogram, the histogram bins representing integrals over the sampled distribution. This does not involve any significant loss of information, as long as the bins are sufficiently small to resolve the features of the distribution (which is always possible provided that $f(x)$ is sufficiently smooth). More sophisticated sampling methods exist, which retain more information about the individual points, but these are in general less efficient and require a case-dependent implementation. We provide a universal and efficient program to restore a smooth distribution, which uses the standard histogram as input.

BHM is an implementation of the bin hierarchy method, introduced in [1]. It is

1. unbiased:
   • utilizes all relevant information contained in the data;
   • non-parametric fit automatically adjusts to data quality;
   • provides maximally featureless solution (least acceptable number of spline knots);
2. efficient:
   • based on regular histogram, which is efficient to sample;
   • fast analysis;
3. automatic:
   • very little user input;
   • no adjustment for different types of sampled functions;
   • no adjustment with simulation time as more data is collected.

The paper is organized as follows. The general problem setup is presented in Sec. 2. In Sec. 3, we give an overview of the algorithm. We explain how to use the program in Sec. 4, giving a detailed explanation of the input and output formats, as well as possible parameter specifications. Several examples are presented in Sec. 5.

Preprint submitted to Computer Physics Communications, November 15, 2017 (arXiv [stat.OT])
2. Problem setup
The central object in BHM is a smooth function $f(x)$ defined on a bounded domain $D$. Statistical sampling with a probability density $p(x)$ is performed to generate samples for $f(x)$ according to $f_j = f(x_j)/p(x_j)$ with $p$-distributed $x_j$. In the simplest case, when $f(x)$ itself is a normalized probability distribution, $p(x) = f(x)$ can be chosen, implying $f_j = 1$. The samples are binned into a histogram with $2^K$ bins. We are interested in restoring a smooth function $\tilde f(x)$ from this histogram.

Each histogram bin $i$ with bin boundaries $x_{i,\min}$ and $x_{i,\max}$ represents the stochastic integral

$$I_i = \int_{x_{i,\min}}^{x_{i,\max}} f(x)\,dx \tag{1}$$

through the following relations:

$$I_i = \bar f_i\,\frac{N_i}{N}, \tag{2}$$

$$M_2(I_i) = M_2(f_i) + \bar f_i^{\,2}\,\frac{N_i (N - N_i)}{N}, \tag{3}$$

$$\mathrm{Var}(I_i) = \frac{M_2(I_i)}{N - 1}, \tag{4}$$

$$\delta I_i = \sqrt{\frac{\mathrm{Var}(I_i)}{N}}, \tag{5}$$

where the “scaled variance” $M_2(f_i) = (N_i - 1)\,\mathrm{Var}(f_i)$ is the sum of squares of differences from the mean, $N_i$ is the number of samples in bin $i$ and $N$ the total number of samples. Note that in the simplest case $p(x) = f(x)$ the above quantities are determined through $N_i$ and $N$ alone.

The goal is to find a function $\tilde f(x)$ whose integrals over different parts of the domain $D$ agree with the sampled integrals. Working with integrals rather than interpolated function values allows us to include combinations of histogram bins into the fitting. Rebinning data to larger bin sizes leads to a reduction of statistical noise, while retaining small bins results in a higher resolution due to smaller discretization errors.

The resulting fit $\tilde f(x)$ is a polynomial spline of order $m$, where $m$ is the highest power with non-zero coefficient. The spline function $\tilde f(x)$ and its derivatives up to order $m - 1$ are continuous.

3. Overview of the algorithm

In this section we give a brief overview of the algorithm. More details on the theoretical background of the method can be found in [1]. A flowchart of the algorithm is shown in Fig. 1.
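As a concrete reading of Eqs. (2)–(5) in Sec. 2, the short Python sketch below turns the per-bin statistics (number of samples, mean value and scaled variance) into a bin integral and its error. It is only an illustration of the formulas, not code taken from BHM; the function name `bin_integral` and its argument names are hypothetical.

```python
import math


def bin_integral(n_i, mean_i, m2_i, n_total):
    """Return (I_i, delta I_i) for one histogram bin, following Eqs. (2)-(5).

    n_i     -- number of samples in the bin
    mean_i  -- mean of the sampled values f_j in the bin
    m2_i    -- scaled variance: sum of squared deviations from mean_i
    n_total -- total number of samples N (including those outside the histogram)
    """
    if n_total < 2:
        raise ValueError("need at least two samples in total")
    integral = mean_i * n_i / n_total                               # Eq. (2)
    m2_int = m2_i + mean_i ** 2 * n_i * (n_total - n_i) / n_total   # Eq. (3)
    variance = m2_int / (n_total - 1)                               # Eq. (4)
    error = math.sqrt(variance / n_total)                           # Eq. (5)
    return integral, error


# Simplest case p(x) = f(x): every sample contributes 1, so mean = 1 and M2 = 0.
integral, error = bin_integral(n_i=250, mean_i=1.0, m2_i=0.0, n_total=1000)
print(integral, error)
```

In the case $p(x) = f(x)$ the error reduces to the familiar binomial form $\sqrt{q(1-q)/(N-1)}$ with $q = N_i/N$, which is a quick consistency check of Eq. (3).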
• The algorithm starts from a list of $2^K$ histogram bins supplied in an input file (for a detailed format description, see Sec. 4). Typical values of $K$ are 7 – 15. It should be noted that the bins are not required to have the same size; however, in practice there is no need to have variable size bins. The bins must not overlap or leave gaps.

• From this input the code generates a hierarchy of histogram bins of increasing size. Combining two neighboring bins of the $2^K$ initial bins leads to $2^{K-1}$ larger bins with, on average, twice as many entries. Successive repetitions of this rebinning result in a hierarchy of levels with $2^{K-2}, \ldots, 2, 1$ bins.

• Bins whose number of samples $N_i$ is smaller than a user-defined minimal value are excluded from the fitting process. Likewise, levels that do not contain enough usable bins (the minimal fraction can be defined by the user) are also excluded. This implies that in general fitting starts with a level $K' < K$, so that the original binning can be chosen to be very fine without introducing noise into the final fit. For bins that will be used for fitting, the bin integrals and their errors are computed via Eqs. (2), (3), (4) and (5).

• The code checks if the data is compatible with zero on the whole domain. There is an option not to proceed with the fit if this is the case. This feature is particularly useful for data suffering from a severe sign problem.

• The next step is fitting a spline of order $m$ on the given spline interval division. The starting point is one spline interval, which means that one polynomial is fitted on the whole domain. The fit minimizes

$$\sum_{n=0}^{K} \frac{\chi_n^2}{2^n}, \tag{6}$$

where $\chi_n^2$ is defined for bins on hierarchy level $n$ in the usual way.

• Afterwards the goodness of fit is evaluated on each hierarchy level individually. The criterion is

$$\frac{\chi_n^2}{\tilde n} \le 1 + T \sqrt{\frac{2}{\tilde n}}, \tag{7}$$

where $T$ is the fit acceptance threshold (input parameter) and $\tilde n$ the number of bins on level $n$ that were used for fitting. The expression $\sqrt{2/\tilde n}$ corresponds to one standard deviation of the $\chi^2$-distribution.

• If at least one level fails the global goodness-of-fit check, the goodness-of-fit is then evaluated on each spline interval separately (again level by level). Spline intervals on which the fit was acceptable remain unchanged, while the others are split into two parts, by introducing a spline knot in the middle (“number of bins”-wise).

• If any of the resulting intervals is too small, meaning that there is not enough data to fit on that interval, the code exits without having produced an acceptable spline. Otherwise the BHM fit is repeated on the new interval division.

• Once an acceptable spline has been found, there is an option to refit the data on the same interval division with an additional constraint that aims to minimize the jump in the highest derivative.

• The resulting BHM spline is output (spline coefficients and error coefficients). In addition, the spline values can be output evaluated on a grid.

Figure 1: Flowchart of the algorithm
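The rebinning and the level-wise acceptance test described in the steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the BHM source: the helper names `merge`, `build_hierarchy` and `level_passes` are hypothetical, and the rule used to combine the means and scaled variances of two neighboring bins is the standard pairwise update for sums of squared deviations, which the text does not spell out.

```python
import math


def merge(a, b):
    """Combine two neighboring bins given as (count, mean, M2) tuples."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return (0, 0.0, 0.0)
    mean = (n_a * mean_a + n_b * mean_b) / n
    # Pairwise update of the sum of squared deviations (assumed merge rule).
    m2 = m2_a + m2_b + (mean_a - mean_b) ** 2 * n_a * n_b / n
    return (n, mean, m2)


def build_hierarchy(bins):
    """Return a list `levels` where levels[n] holds the 2**n bins of level n."""
    if not bins or len(bins) & (len(bins) - 1):
        raise ValueError("number of bins must be a power of 2")
    levels = [list(bins)]
    while len(levels[-1]) > 1:
        fine = levels[-1]
        levels.append([merge(fine[i], fine[i + 1]) for i in range(0, len(fine), 2)])
    levels.reverse()  # level 0 (one bin) first, the input level K last
    return levels


def level_passes(chi2, n_used, threshold):
    """Acceptance criterion of Eq. (7) for one hierarchy level."""
    return chi2 / n_used <= 1.0 + threshold * math.sqrt(2.0 / n_used)


# Eight elementary bins in the simplest case p(x) = f(x): mean 1, M2 = 0.
counts = [3, 5, 8, 13, 21, 13, 8, 5]
levels = build_hierarchy([(c, 1.0, 0.0) for c in counts])
for n, level in enumerate(levels):
    print(n, [b[0] for b in level])
```

Each level reuses the same samples, so the bins of different levels overlap; the fit and the goodness-of-fit test then see both the fine, noisy bins and the coarse, well-averaged ones.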
4. Input and output
Instructions for compiling the program and executing unit tests can be found in the README file.

The program executable requires 1 argument, the name of the parameter file, e.g.:

    $ ./bhm in.param

In particular, the parameter file determines the name of the input file with the histogram data and the name of the output file for the BHM spline (see below).

As a special case, if the parameter file name is an empty string, the default parameters will be used which are suitable for most applications:

    $ ./bhm ""
The input histogram data is text-based, line-oriented, and has the following format:

    A N_exc
    x_1,min N_1 f̄_1 M_2(f_1)
    x_2,min N_2 f̄_2 M_2(f_2)
    ....
    x_i,min N_i f̄_i M_2(f_i)
    ....
    x_max

where the first line specifies an overall normalization factor $A$ and the number $N_{\mathrm{exc}}$ of samples outside of the histogram bounds. The normalization step is omitted if either $A = 1$ or $A = 0$. Otherwise, all values $\bar f_i$ and $M_2(f_i)$ are divided by $A$ and $A^2$, respectively, before constructing the BHM fit. The value $N_{\mathrm{exc}}$ is used to calculate the total number of samples $N = N_{\mathrm{exc}} + \sum_i N_i$, which is needed for Eqs. (2)–(5). $N_{\mathrm{exc}}$ can be zero.

Starting from the second line, each line, except the last one, contains 2 or 4 blank-separated values, specifying the left bin boundary, the number of samples in the bin, and, optionally, mean value and scaled variance. For example, line 5 of the listing corresponds to a bin $i$ with the left boundary $x_{i,\min}$, number of samples $N_i$, mean value $\bar f_i$ and scaled variance $M_2(f_i)$ (see Eqs. (2)–(5)). If the mean value and the scaled variance are both omitted, they are assumed to be $\bar f_i = 1$ and $M_2(f_i) = 0$, which corresponds to only ever adding 1 to bin counters, or in other words $p(x) = f(x)$. The last line of the file (line 7 of the listing) must contain a single entry $x_{\max}$, the right boundary of the last bin.

The numbers $x_{1,\min} < \ldots < x_{i,\min} < \ldots < x_{\max}$ must form a strictly monotonically increasing sequence, corresponding to non-overlapping, finite-size bins with no gaps. In the current implementation, the number of bins must be a power of 2 (in later versions we may remove this limitation).

It is important to note that all sampled data and variances are assumed to be uncorrelated. If correlations are present, they have to be removed prior to the BHM fit, for example through appropriate blocking analysis or by scaling the variances with the estimated correlation factor.
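To make the input format concrete, the following Python sketch reads a histogram in the layout described above and applies the stated rules (optional mean and scaled variance, normalization by $A$ and $A^2$, strictly increasing boundaries, power-of-2 bin count). It is a hedged illustration rather than the parser used by BHM; the function name `read_histogram` is hypothetical, and the sketch assumes the file contains no comment or blank-only header lines.

```python
import io


def read_histogram(stream):
    """Parse the histogram format: header, bin lines, final right boundary."""
    rows = [line.split() for line in stream if line.strip()]
    if len(rows) < 3:
        raise ValueError("need a header, at least one bin and the right boundary")
    norm = float(rows[0][0])           # normalization factor A
    n_exc = int(float(rows[0][1]))     # samples outside the histogram bounds
    if len(rows[-1]) != 1:
        raise ValueError("last line must hold only x_max")
    edges, counts, means, m2s = [], [], [], []
    for row in rows[1:-1]:
        if len(row) not in (2, 4):
            raise ValueError("bin lines must have 2 or 4 values")
        edges.append(float(row[0]))
        counts.append(int(float(row[1])))
        # Omitted mean and scaled variance default to 1 and 0.
        means.append(float(row[2]) if len(row) == 4 else 1.0)
        m2s.append(float(row[3]) if len(row) == 4 else 0.0)
    edges.append(float(rows[-1][0]))
    if any(b <= a for a, b in zip(edges, edges[1:])):
        raise ValueError("bin boundaries must be strictly increasing")
    if len(counts) & (len(counts) - 1):
        raise ValueError("number of bins must be a power of 2")
    if norm not in (0.0, 1.0):         # normalization skipped for A = 0 or A = 1
        means = [v / norm for v in means]
        m2s = [v / norm ** 2 for v in m2s]
    n_total = n_exc + sum(counts)
    return edges, counts, means, m2s, n_total


example = """1 5
0.0 10
0.25 20
0.5 40
0.75 25
1.0
"""
edges, counts, means, m2s, n_total = read_histogram(io.StringIO(example))
print(edges, counts, n_total)
```

The example uses the two-column form of the bin lines, so every bin gets mean 1 and scaled variance 0, and the total number of samples includes the 5 samples that fell outside the histogram.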
The input parameter file is a text-based, line-oriented file that has a key = value format. An example input is shown in Fig. 2. The keys are case-insensitive; the string values may be enclosed in quotes; the symbol # starts a comment which is ignored until the end of the line. The meaning of each parameter is indicated in the figure in the corresponding comment. Below we provide more detailed explanations for some of the parameters.
DataPointsMin in line 1 specifies the minimal number of data points that a bin must contain in order to be used for fitting. Bins that contain fewer sampled points are ignored (but still contribute in combination with other bins at higher hierarchy levels). DataPointsMin must be at least 10, in order to ensure that meaningful statistics can be made from the data. The default value is 100. If a hierarchy level does not contain enough usable bins (the minimal number is given by the parameter UsableBinFraction in line 7, times the total number of bins on that level) then this level and all subsequent levels are completely omitted from the fitting.

The maximal possible number of interval divisions is determined by the parameter MinLevel in line 3. For example, if there are $2^K$ elementary bins, MinLevel=2 means that the smallest possible spline intervals coincide with the bins on hierarchy level $K - 2$. MinLevel must be at least 2 (corresponding to a total of at least 1+2+4=7 bins per interval); moreover, MinLevel must be large enough to ensure that the fit is not underdetermined for each interval.

The fit acceptance threshold $T$ (lines 4-6) can be either set to a fixed value, or to a range of values between Threshold and ThresholdMax. In the latter case, a BHM fit is first attempted with the smallest value Threshold. If no acceptable fit is found, the threshold value is successively increased in ThresholdSteps equidistant steps, until either an acceptable fit is produced or ThresholdMax is reached. Setting ThresholdMax to be smaller than or equal to Threshold and/or setting ThresholdSteps=0 corresponds to only using one fixed value of $T$. Note that threshold values that are too low can result in overfitting (too many spline pieces) or the failure to produce an acceptable fit. Values that are too high can result in underfitting (too few spline pieces and a poor fit with underestimated error bars). These issues are illustrated in Example 5.2. The value $T = 2$ [...] $T = 2$ [...] $T = 4$ [...] functions, and hence there is no need to change any of the parameters unless specifically desired.

    Parameter File in.param
    DataPointsMin=100
    SplineOrder=3
    MinLevel=2
    Threshold=2.0
    ThresholdMax=4.0
    ThresholdSteps=4
    UsableBinFraction=0.25
    JumpSuppression=false
    Verbose=true
    PrintFitInfo=true
    FailOnBadFit=true
    FailOnZeroFit=true
    Data="histogram.dat"
    OutputName="spline.dat"
    GridOutput="spline_plot.dat"
    GridPoints=1024

Figure 2: Sample parameter file
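The threshold scan controlled by Threshold, ThresholdMax and ThresholdSteps can be pictured with the following Python sketch. It rests on one assumption that the text leaves open: the range is divided into ThresholdSteps equal increments and both end points are tried. The function names `threshold_schedule` and `first_acceptable` are hypothetical and not part of BHM.

```python
def threshold_schedule(threshold, threshold_max, steps):
    """List the threshold values T that would be tried, in increasing order."""
    # A single fixed value is used if the range is empty or no steps are requested.
    if steps <= 0 or threshold_max <= threshold:
        return [threshold]
    delta = (threshold_max - threshold) / steps
    return [threshold + k * delta for k in range(steps + 1)]


def first_acceptable(fit, schedule):
    """Try each threshold in turn; `fit(T)` returns a result or None on failure."""
    for t in schedule:
        result = fit(t)
        if result is not None:
            return t, result
    return None


schedule = threshold_schedule(2.0, 4.0, 4)
print(schedule)

# Toy stand-in for the real fit: pretend an acceptable spline exists for T >= 3.
print(first_acceptable(lambda t: "spline" if t >= 3.0 else None, schedule))
```

With the values of the sample parameter file this tries $T = 2.0, 2.5, 3.0, 3.5, 4.0$ and stops at the first threshold for which a fit is accepted.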
The default verbose output is printed to standard error and contains auxiliary information such as values of the input parameters, a brief description of the input histogram, and the log of the fitting process. The fitting log is described in detail in Example 5.1. If requested by the PrintFitInfo input parameter, information about the final fit is also printed to the standard output.

The output of the program is both human and machine-readable, and has the following text-based, line-oriented, blank-separated format:

    # ...
    # ...
    m s
    x_1 x_2 ... x_(s+1)
    # 1
    a_0 a_1 a_2 ... a_m
    ε_0 ε_1 ε_2 ... ε_2m
    ...
    # i
    a_0 a_1 a_2 ... a_m
    ε_0 ε_1 ε_2 ... ε_2m
    # (i + 1)
    ...

Any lines at the beginning of the file that start with # are considered comments and are ignored. The first significant line of the file (line 3 of the listing) specifies the spline polynomial order $m$ and the number of spline pieces $s$; the next line (line 4 of the listing) lists all $(s + 1)$ spline piece boundaries $x_1, \ldots, x_{s+1}$. The following lines form $s$ sections describing each spline piece $\tilde f_i$, for $i = 1 \ldots s$. Each section (lines 5–7, 9–11 of the listing) consists of 3 lines:

1. Header (starts with #) specifying the spline piece number ($i$),
2. $(m + 1)$ numbers specifying the spline piece coefficients $a_0 \ldots a_m$ ($\tilde f_i(x) = \sum_{k=0}^{m} a_k x^k$),
3. $(2m + 1)$ numbers $\varepsilon_0 \ldots \varepsilon_{2m}$ specifying the error bar $E_i(x) = \sqrt{\sum_{k=0}^{2m} \varepsilon_k x^k}$.

The simplest way to plot the resulting spline is to use the provided Python3 script bhm_spline.py, as follows:

    $ python3 bhm_spline.py spline.dat
On the other hand, it may be convenient to customize the plot and/or compare it with a known function, or plot it interactively (e.g., from a Jupyter notebook). For this purpose the script can be imported as a module that provides a BHMSpline class. The following listing demonstrates a possible way of using the module.

    import numpy as np
    import matplotlib.pyplot as plt
    from bhm_spline import BHMSpline

    spline=BHMSpline("spline.dat")
    x=np.linspace(*spline.domain())

    def fn(x): return (x**4-0.8*x*x)/0.171964

    plt.plot(x,spline(x), x,fn(x))

    plt.plot(x,spline.errorbar(x))

    spline.plot()

    spline.plot(fn)

    spline.plot_difference(fn)

In line 3 the class BHMSpline is imported; line 5 creates the object representing the spline. In line 6 an interval of x-values is created corresponding to the domain of the spline. Line 8 defines a reference function to compare with the spline. In line 10 the spline and the reference function are plotted using the Matplotlib plotting library; in line 12 the error bar $E(x)$ is plotted. The class also provides a convenience plotting method: when called without arguments (as on line 14), the spline is plotted along with the error bars; when a function is passed as an argument (line 16), its graph is plotted also. It is also possible to plot the difference between the spline and the reference function with error bar (line 18).

If the GridOutput parameter in the parameter file is set to a non-empty filename, the program also outputs to the specified file the values and the error bars of the spline computed on a one-dimensional grid of points. A plotting program, such as gnuplot, can then be used to plot the generated function and the error bars and to compare them with a reference function; for example:

    $ gnuplot
    gnuplot> quartic(x)=(x**4-0.8*x*x)/0.171964
    gnuplot> plot "spline_plot.dat" with errorbars
    gnuplot> replot quartic(x)

In this example, line 1 of the listing starts the gnuplot program; line 2 defines a reference function (quartic polynomial); line 3 plots the grid output file generated by BHM; and line 4 plots the reference function on the same graph.
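For readers who want to process the spline file without the bundled script, the Python sketch below reads the output format described in this section and evaluates $\tilde f_i(x)$ and $E_i(x)$ directly from the coefficients. It is a minimal sketch and not the implementation of bhm_spline.py: the names `read_spline`, `poly` and `evaluate` are hypothetical, comment and header lines are assumed to begin with `#`, and the example file content is made up for illustration.

```python
import bisect
import io
import math


def read_spline(stream):
    """Parse the spline file: order and piece count, knots, then 3 lines per piece."""
    lines = [ln.strip() for ln in stream if ln.strip()]
    pos = 0
    while lines[pos].startswith("#"):      # skip leading comment lines
        pos += 1
    order, pieces = (int(v) for v in lines[pos].split())
    knots = [float(v) for v in lines[pos + 1].split()]
    pos += 2
    coeffs, errors = [], []
    for _ in range(pieces):
        # lines[pos] is the header of the piece; the data follow on two lines.
        coeffs.append([float(v) for v in lines[pos + 1].split()])
        errors.append([float(v) for v in lines[pos + 2].split()])
        pos += 3
    return order, knots, coeffs, errors


def poly(c, x):
    """Evaluate sum_k c[k] * x**k with Horner's scheme."""
    result = 0.0
    for a in reversed(c):
        result = result * x + a
    return result


def evaluate(spline, x):
    """Return (spline value, error bar) at x, using the piece that contains x."""
    _, knots, coeffs, errors = spline
    i = min(max(bisect.bisect_right(knots, x) - 1, 0), len(coeffs) - 1)
    return poly(coeffs[i], x), math.sqrt(max(poly(errors[i], x), 0.0))


example = """# made-up example file
# linear spline with two pieces
1 2
0.0 1.0 2.0
# 1
0.0 1.0
0.01 0.0 0.0
# 2
1.0 0.0
0.04 0.0 0.0
"""
spline = read_spline(io.StringIO(example))
print(evaluate(spline, 0.5), evaluate(spline, 1.5))
```

The polynomials are evaluated in the global variable $x$, as in the formulas for $\tilde f_i(x)$ and $E_i(x)$ above, so no shift to the left knot of each piece is applied.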
5. Examples
In this section we present three detailed examples of the features of BHM illustrated on different distributions $f(x)$. We provide a program to generate the input data for these examples (as well as for several additional test functions). Calling the program without arguments:

    $ ./generator

prints a brief help message, which includes a list of the functions supported by the program. Calling the program with a single file argument:

    $ ./generator generator.param

generates the histogram data for a given analytical function according to the parameters listed in the generator.param file. For all examples discussed below, the parameters are the same as in the example generator parameter file shown in Fig. 3 (including the random number generator seed), except when stated otherwise. Calling the program as:

    $ ./generator -python name

(where name is the name of the function, possibly abbreviated) prints the Python code that corresponds to the function, which is convenient for plotting the analytical function against the approximating spline in an interactive Python environment (as has been discussed in subsection 4.5).

    Parameter File generator.param
    SampleSize=10000
    Function=exponential
    PowerBins=10
    RandomSeed=956475
    Output="histogram.dat"
    GridOutput="function.dat"
    GridPoints=1024

Figure 3: Sample parameter file to generate example input

If the GridOutput parameter in the parameter file is set to a non-empty filename, the program also outputs the values of the function computed on a one-dimensional grid to the specified file; a plotting program, such as gnuplot, can then be used to plot the generated function; for example:

    $ gnuplot
    gnuplot> plot "function.dat" with lines
    gnuplot> replot "spline_plot.dat" with errorbars

In this example, line 1 of the listing starts the gnuplot program; line 2 plots the generated function; and line 3 plots the content of the spline_plot.dat generated by BHM as discussed in subsection 4.6.
This example demonstrates BHM fits for different choices of spline order $m$. The original function is a quartic polynomial (Function=quartic polynomial):

$$f(x) = \alpha\,(x^4 - 0.8\,x^2). \tag{8}$$

Because $f(x)$ changes sign, sampling on the interval $[-1, 1]$ is performed with the probability density $p(x) = |f(x)|$, and $\alpha = 1/0.171964$ normalizes $p(x)$ on this interval.

The histogram data is fitted with BHM using the default parameters, with the exception of SplineOrder which is set to 3, 4, and 5 respectively. The fit results are shown in Fig. 4. From the output files "spline.dat" it can be seen that the cubic spline has four spline pieces; the quartic spline has one spline piece, as expected; the quintic spline also has one spline piece, its coefficients up to quartic order are similar to the ones obtained via quartic fit, and its highest spline coefficient is small.

Figure 4: Quartic polynomial test function (left panel). Difference between BHM fit $\tilde f(x)$ with different spline orders $m$ and the test function $f(x)$ (right panel).

We explain in detail the verbose output for the cubic fit $m = 3$. At the beginning of the output, the fit parameters are listed, as well as general information about the input histogram. Then follows information about the goodness-of-fit at the different fitting stages:

    ...
    BHM fit:
    Begin BHM fitting with threshold T = 2
    Checking separate chi_n^2/n in spline fit
    level  n  chi_n^2/n  max chi_n^2/n
    ...
    Checking interval 0 (order: 0, number: 0)
    ...
    This interval fit is not good
    Checking separate chi_n^2/n in spline fit
    level  n  chi_n^2/n  max chi_n^2/n
    ...
    Checking interval 0 (order: 1, number: 0)
    ...
    This interval fit is not good
    Checking interval 1 (order: 1, number: 1)
    ...
    This interval fit is not good
    Checking separate chi_n^2/n in spline fit
    level  n  chi_n^2/n  max chi_n^2/n
    ...
    Good spline found with threshold T = 2
    ...

First a fit is attempted with one spline piece on the whole domain (lines 4-13). This fit is not acceptable because $\chi_n^2/\tilde n$ (third column in the output) exceeds the maximally allowed value $1 + T\sqrt{2/\tilde n}$ (fourth column in the output) for most of the levels. The second column lists $\tilde n$, the number of available bins at each level. This number is in general smaller than $2^n$, because some bins do not contain enough data to be used for fitting. Also, hierarchy levels below $n = 7$ were omitted because the fraction of usable bins on these levels was below the set UsableBinFraction value.

Since the first fit was unsuccessful, $\chi^2$ is evaluated on each spline interval separately (lines 14-16). In this case, this yields no new information, since only one interval is present. As soon as a level is found where the fit is unacceptable (level 0 in this case), this check stops without proceeding to lower levels, since this is enough to identify a bad interval.

After the interval is divided, another BHM fit is attempted on two intervals (lines 17-26). This fit already has smaller $\chi_n^2/\tilde n$ values than the previous one, but still fails the threshold on several levels. Both spline intervals are then again checked separately (lines 27-31 and 32-36, respectively) and both fail the goodness-of-fit check on level 3. Note that level 0 is not present in the individual interval checks, because the bin on this level is larger than each of the spline intervals.

The intervals are numbered consecutively, but additional information is provided so that their location can be recovered (see e.g. lines 27 and 32). The boundaries of an interval always coincide with the boundaries of a bin on a certain hierarchy level (denoted by “order”) and “number” denotes the number of this bin.

After the intervals are again divided, the resulting BHM fit (lines 38-47) is acceptable. No separate interval checks need to be performed and the code exits with the fit result. If PrintFitInfo is requested, the goodness-of-fit information of the final result is output again at the end. This includes the $\chi_n^2/\tilde n$ values on each level $n$, the unit standard deviation $\sqrt{2/\tilde n}$ of the corresponding $\chi^2$-distribution, as well as the number of standard deviations by which $\chi_n^2/\tilde n$ exceeds 1 on each level (last column). If $\chi_n^2/\tilde n \le 1$ [...]

This example demonstrates BHM fits for different choices of the threshold $T$. The sampled distribution is a decaying exponential (Function=exponential),

$$f(x) = \alpha \exp(-3x), \tag{9}$$

normalized on the interval $[1, \ldots]$ with $\alpha = 3e^{\ldots}/(e^{\ldots} - 1)$ [...] $N_{\mathrm{exc}}$ sampled outside of the histogram bounds. The total number of sampled points in this example is SampleSize=100000.

The histogram data is fitted with
BHM using the default parameters, with the exception of the parameters defining the fit acceptance threshold, which is set to be fixed at $T = 0$, 2, and 8, respectively. This can be achieved by either setting the value of ThresholdMax to be equal to or less than the value of Threshold, or by setting ThresholdSteps=0. The fit results are shown in Fig. 5.

For all threshold values an acceptable fit exists, but with different interval divisions. The extremely low threshold value $T = 0$ (which means that only fits with $\chi_n^2/\tilde n \le 1$ are accepted) [...] $T = 2$ produces a suitable fit with 3 spline pieces that captures the shape of the test function well. The very high value $T = 8$ yields an underfitted spline with only 2 pieces. This spline deviates strongly from the true function and the error on the spline is severely underestimated.

This example demonstrates that
BHM works for both uniform and non-uniform input histograms. The sampled distribution,

$$f(x) = 0.[\ldots]\,G(0, 0.2) + 0.[\ldots]\,G(2, 1) + [\ldots]\,G(-[\ldots], [\ldots]), \tag{10}$$

is a linear combination of three Gaussians $G(\mu, \sigma)$ with mean $\mu$ and standard deviation $\sigma$ (Function=triple gaussian). It has several distinct features and resembles a physically relevant case.

We sample SampleSize=1000000 data points on the interval $[-5, 5]$ into a uniform and a non-uniform histogram, both with $2^{[\ldots]}$ bins. Note that the non-uniform histogram binning is predefined and cannot be adjusted by changing the PowerBins entry. The non-uniform histogram bins are smaller in the center of the domain (where the sampled function has a sharp feature) and increase exponentially in size towards the domain boundaries. The smallest bin size is equal to the domain length divided by $2^{[\ldots]}$. The non-uniform histogram is always collected in addition to the customizable uniform histogram if Function=triple gaussian is chosen and is output into the file nonuniform histogram.dat.

The fit results are shown in Fig. 6. Both histogram divisions produce fits of similar quality that reproduce the tested distribution well. Since BHM automatically considers combinations of elementary bins, there is no need for a case-specific implementation of a non-uniform histogram grid. Note that sampling the same data in a uniform histogram with $2^{[\ldots]}$ bins produces nearly the same fit as when using $2^{[\ldots]}$ uniform bins in this example.
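To illustrate how signed input data such as that of the quartic example (Eq. (8), sampled with $p(x) = |f(x)|$ so that every sample contributes $f_j = \pm 1$) maps onto the four-column histogram format of Sec. 4, here is a small self-contained Python sketch. It is not the bundled generator program: the rejection sampler, the function names and the chosen sample size, bin count and seed are assumptions made for illustration only.

```python
import random

ALPHA = 1.0 / 0.171964  # normalizes |f(x)| on [-1, 1]


def f(x):
    """Quartic test function of Eq. (8)."""
    return ALPHA * (x ** 4 - 0.8 * x * x)


def integrate_abs_f(n=20000):
    """Midpoint-rule estimate of the integral of |f(x)| over [-1, 1]."""
    h = 2.0 / n
    return sum(abs(f(-1.0 + (k + 0.5) * h)) for k in range(n)) * h


def sample_histogram(n_samples, power_bins, seed):
    """Sample x from p(x) = |f(x)| by rejection and return the input-file lines."""
    rng = random.Random(seed)
    n_bins = 2 ** power_bins
    counts = [0] * n_bins
    sign_sums = [0] * n_bins
    bound = abs(f(1.0))  # |f| attains its maximum on [-1, 1] at the end points
    accepted = 0
    while accepted < n_samples:
        x = rng.uniform(-1.0, 1.0)
        if rng.random() * bound > abs(f(x)):
            continue  # rejected proposal
        i = min(int((x + 1.0) / 2.0 * n_bins), n_bins - 1)
        counts[i] += 1
        sign_sums[i] += 1 if f(x) > 0 else -1  # f_j = f(x_j)/p(x_j) = sign f(x_j)
        accepted += 1
    lines = ["1 0"]  # normalization factor A = 1, no samples outside the bounds
    for i in range(n_bins):
        left = -1.0 + 2.0 * i / n_bins
        mean = sign_sums[i] / counts[i] if counts[i] else 0.0
        m2 = counts[i] * (1.0 - mean * mean)  # sum of (f_j - mean)^2 for f_j = +-1
        lines.append(f"{left:.10g} {counts[i]} {mean:.10g} {m2:.10g}")
    lines.append("1")  # right boundary x_max
    return lines


print(integrate_abs_f())
lines = sample_histogram(2000, 4, seed=1)
print("\n".join(lines[:4]))
```

Because $f_j^2 = 1$ for every sample, the scaled variance of a bin follows from the count and the mean alone; for a general sampling density the squared deviations would have to be accumulated explicitly.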
6. Acknowledgments
This work was supported by the Simons Collaboration on the Many Electron Problem and by the National Science Foundation under the grants PHY-1314735 (O.G., N.P., and B.S.) and DMR-1720465 (N.P. and B.S.). O.G. also acknowledges support by the US-Israel Binational Science Foundation (Grants 2014262 and 2016087).

Figure 5: Decaying exponential test function (left panel). BHM fits of the test function with different goodness-of-fit thresholds (right panel).

Figure 6: Triple Gaussian test function (left panel). BHM fits of the test function based on a uniform histogram and a histogram with bins of different size (right panel).
References
[1] O. Goulko, N. Prokof’ev, B. Svistunov, Restoring a smooth function from its noisy integrals, arXiv:1707.07625.
[2] I. Narsky, F. C. Porter, Statistical Analysis Techniques in Particle Physics, John Wiley & Sons, 2013.
[3] D. W. Scott, Multivariate Density Estimation, John Wiley & Sons, 2015.
[4] B. W. Silverman, Density Estimation for Statistics and Data Analysis, London: Chapman and Hall, 1986.