Numerical analytic continuation: Answers to well-posed questions
Olga Goulko, Andrey S. Mishchenko, Lode Pollet, Nikolay Prokof'ev, Boris Svistunov
Olga Goulko
Department of Physics, University of Massachusetts, Amherst, MA 01003, USA
Andrey S. Mishchenko
RIKEN Center for Emergent Matter Science (CEMS), 2-1 Hirosawa, Wako, Saitama 351-0198, Japan and National Research Center “Kurchatov Institute,” 123182 Moscow, Russia
Lode Pollet
Department of Physics, Arnold Sommerfeld Center for Theoretical Physics, University of Munich, Theresienstrasse 37, 80333 Munich, Germany
Nikolay Prokof’ev
Department of Physics, University of Massachusetts, Amherst, MA 01003, USA; Department of Physics, Arnold Sommerfeld Center for Theoretical Physics, University of Munich, Theresienstrasse 37, 80333 Munich, Germany; and National Research Center “Kurchatov Institute,” 123182 Moscow, Russia
Boris Svistunov
Department of Physics, University of Massachusetts, Amherst, MA 01003, USA; National Research Center “Kurchatov Institute,” 123182 Moscow, Russia; and Wilczek Quantum Center, Zhejiang University of Technology, Hangzhou 310014, China

(Dated: September 6, 2016)

We formulate the problem of numerical analytic continuation in a way that lets us draw meaningful conclusions about properties of the spectral function based solely on the input data. Apart from ensuring consistency with the input data (within their error bars) and the a priori and a posteriori (conditional) constraints, it is crucial to reliably characterize the accuracy, or even ambiguity, of the output. We explain how these challenges can be met with two approaches: stochastic optimization with consistent constraints and the modified maximum entropy method. We perform illustrative tests for spectra with a double-peak structure, where we critically examine which spectral properties are accessible and which ones are lost. As an important practical example, we apply our protocol to the Fermi polaron problem.
PACS numbers: 02.70.-c, 71.15.Dx, 71.28.+d, 71.10.Fd
I. INTRODUCTION
Numerous problems in science, from spectral analysis to image processing, require that we restore properties of a function A(z) from a set of integrals

g_n = G[n, A] ≡ \int_{-\infty}^{\infty} dz\, K(n, z)\, A(z),   n = 1, ..., N,   (1)

where K(n, z) is a known kernel and {g_n} is a finite set of experimental or numerical input data with error bars. An important class of such problems, known as numerical analytic continuation (NAC), deals with “pathological” kernels featuring numerous eigenfunctions with anomalously small eigenvalues. An archetypal NAC problem is numerical spectral analysis at zero temperature, where the challenge is to restore the non-negative spectral function A(z ≥ 0) satisfying the equation

g_n = \int_0^{\infty} dz\, e^{-z\tau_n} A(z),   (2)

from numerical data for g_n = g(\tau_n ≥ 0). Such problems are known to be ill-posed. Mathematically, the near-degeneracy of the kernel implies two closely related circumstances: (i) the absence of the resolvent, and (ii) a continuum of solutions satisfying the input data within their error bars (even when integrals over z are replaced with finite sums containing no more than N terms). Nowadays, the first circumstance is merely a minor technical problem, as there exists a number of methods allowing one to find solutions to (1) without compromising the error bars of g_n.

The second circumstance, the ambiguity of the solution, is a more essential problem. It is clear that if one formulates the goal as restoring A(z) as a continuous curve, or determining its value on a given grid of points, then the goal cannot be reached as stated, irrespective of the properties of the kernel K(n, z). The input data set is finite and noisy, thereby introducing a natural limit on the resolution of fine structures in A(z).

Fortunately, the above-formulated goal has little to do with the practical world. In an experiment, all devices are characterized by a finite resolution function, and the data they collect always correspond to integrals. The data are processed by making certain assumptions about the underlying function. This motivates an alternative formulation of the NAC goal involving integrals of A(z) that render the problem well-defined. With additional assumptions about the smoothness and other properties of A(z) behind these integrals, consistent with both a priori and a posteriori knowledge, the ambiguity of the solution can be substantially suppressed. The simplest setup is as follows:

Given a set of finite intervals {Δ_m}, determine the integrals of the spectral function over these intervals,

i_m = Δ_m^{-1} \int_{z ∈ Δ_m} dz\, A(z),   m = 1, ..., M,   (3)

along with the corresponding dispersions of fluctuations {σ_m} (straightforwardly extendable to the dispersion correlation matrix {σ_{mm'}}).

Naively, one might think that nothing is achieved by going from the integrals in (1) to the integrals in (3), because the latter have exactly the same form with the kernel \bar{K}(m, z) = Δ_m^{-1} for z ∈ Δ_m and zero otherwise (other forms of the “resolution function” \bar{K}(m, z) are discussed in Sec. II):

i_m = I[m, A] = \int_{-\infty}^{\infty} dz\, \bar{K}(m, z)\, A(z),   m = 1, ..., M.   (4)

This impression, however, is false, because kernel properties are at the heart of the problem. If, for appropriately small intervals (sufficiently small to resolve the variations of A(z)), the uncertainties for i_m remain small, then one can draw reliable conclusions about the underlying behavior of A(z) itself. The difference between “good” kernels (e.g., as in Fourier transforms) and pathological ones is that for the latter, due to the notorious saw-tooth instability, the uncertainties for i_m quickly become too large for a meaningful analysis of fine structures in A(z).

To obtain a solution from the integrals (3), one has to invoke the notion of conditional knowledge. The most straightforward approach is to set the spectral function values at the middle points z_m of the intervals Δ_m to A_fin(z_m) = i_m. This is only possible if the intervals can be made appropriately narrow without losing accuracy for the integrals. With this approach we assume that the function is nearly linear over the intervals in question.
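To make the pathology of the kernel in Eq. (2) concrete, one can discretize it and inspect its singular values: they decay so quickly that only a handful of directions in A(z) are constrained by data of any realistic accuracy. The following is a minimal numerical sketch; all grids and ranges are arbitrary illustrative choices, not parameters from this work.

```python
import numpy as np

# Discretize the zero-temperature kernel K(n, z) = exp(-z * tau_n) of Eq. (2)
# on a frequency grid and inspect its singular values.
taus = np.linspace(0.0, 10.0, 50)      # imaginary-time points tau_n
zs = np.linspace(0.0, 10.0, 200)       # frequency grid for A(z)
dz = zs[1] - zs[0]

K = np.exp(-np.outer(taus, zs)) * dz   # K[n, k]: weight of A(z_k) in g_n

s = np.linalg.svd(K, compute_uv=False)
# The singular values decay (near-)exponentially: most "directions" in A(z)
# are invisible at any realistic noise level -- this is the ill-posedness
# discussed in the text.
print("condition number:", s[0] / s[-1])
print("directions above 1e-8 (relative):", int(np.sum(s / s[0] > 1e-8)))
```

A "good" kernel (e.g., a Fourier one) would instead have singular values of comparable magnitude, so that all integrals i_m could be restored with an accuracy comparable to that of the input data.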
Setting A_fin(z_m) = i_m in this way is a typical procedure for experimental data. Quantifying the error bar on A_fin(z_m) necessarily involves two numbers: the “vertical” dispersion σ_m is directly inherited from i_m, and the “horizontal” error bar Δ_m/2 is given by the half-width of the interval. Two caveats are in order. First, fluctuations of the i_m at different points are correlated; for example, the normalization integral \int dz\, A(z) is typically known with an accuracy that is orders of magnitude better than what would be predicted by the central limit theorem if this integral is represented by a finite sum of integrals over nonoverlapping intervals. Second, the errors are not necessarily distributed as a Gaussian. Atypical fluctuations can have a significant probability, and their analysis should not be avoided, as the actual physical solution may well be one of them. To this end, it is important to explore the minimal and maximal values that the integral i_m can take, and to check that these are not significantly different from the typical value of i_m. In certain cases this criterion cannot be met without increasing the intervals Δ_m to an extent where the assumption of linearity of A(z) becomes uncontrolled, implying that an important piece of information about the shape of A(z) in this interval is missing. A characteristic example that plays a key role in the subsequent discussion is presented in Fig. 1, where the challenge is to extract the shape of the second peak.

FIG. 1. Challenging example of a spectrum A(z) for the NAC problem (2). As shown in the main part of the text, the significant width of the first peak makes it essentially impossible to controllably restore the width of the second peak, even with small relative error bars on g_n. On the other hand, the first two moments of the second peak, characterizing its weight and position, can be extracted reliably.

In the more sophisticated approach used in this work, the values of A_fin(z_m) can be further optimized (without compromising the accuracy of the solution with respect to g_n) to produce a smooth curve.
This protocol has the additional advantage of eliminating minima, maxima, gaps, and sharp features that are not guaranteed to exist by the quality of the input data. The nature of the problem is such that very narrow peaks (or gaps) with tiny spectral weight can always be imagined to be present (for narrow intervals they will certainly emerge due to the saw-tooth instability). Our philosophy with respect to these features is to erase them within the established error bounds and obtain a solution that is insensitive to the interval parameters.

Having established a smooth solution, one may nevertheless ask whether a particular feature of the solution can, in principle, have significantly different properties. For example, if the NAC procedure suggests a peak, one may wonder whether the true spectral function could have a much narrower peak with the same area, and if so, what its smallest possible width is. A NAC protocol should be able to answer this type of question quickly and reliably. In this work, we explain how these goals can be achieved in practice. Many technical details of the protocol that we propose to abbreviate as SOCC (stochastic optimization with consistent constraints) were already published in Refs. 1–3 as separate developments. The crucial advances here are (i) the final formulation based on integrals of the spectral function, and (ii) the idea of working with linear combinations of pre-calculated “basic” solutions. The latter allows one to readily apply consistent constraints without compromising the error bars on the input data. Consistent constraints are also crucial for assessing which features can be resolved and which information is unrecoverable.

In what follows, the term “consistent constraints” applies to (i) the general principle of utilizing the a priori and revealing the a posteriori (conditional) knowledge without compromising the error bars of the input data, and (ii) a particular set of numerical procedures based on ideas respecting this principle.
Our SOCC protocol involves two different consistent-constraints procedures. The first one, borrowed from the NAC method of Ref. 3, is now used solely to (dramatically!) enhance the performance of the stochastic-optimization part of the protocol searching for basic solutions. The most important consistent-constraints procedure is used to post-process the set of basic solutions.

The paper is organized as follows. In Sec. II we describe the SOCC method, consisting of three distinct stages, and explain how a smooth solution can be obtained without any bias with respect to solving Eq. (1) and analyzed for possible atypical deformations. In Sec. III we briefly review the maximum entropy method (MEM). In Sec. IV we explore what the SOCC and MEM methods predict for the test spectral function shown in Fig. 1, and how one should analyze the final solution with respect to its possible smooth transformations. In Sec. V we apply our findings to the physical spectral function of the resonant Fermi polaron.
We conclude in Sec. VI.
II. STOCHASTIC OPTIMIZATION WITH CONSISTENT CONSTRAINTS
The formulation of the SOCC method is relatively simple and consists of three parts:

1. Finding a large set of solutions A_j(z) [j = 1, ..., J ≫ 1] to Eq. (1) that satisfy the input data within their error bounds. In what follows we call them “basic” solutions. Basic solutions are not biased in any way to be smooth or to satisfy any other requirements based on knowledge about the problem outside of Eq. (1). The irregularity of basic solutions embodies what is referred to as an ill-posed problem. In subsection II A we briefly explain how these solutions are found by the stochastic optimization procedure (most technical details were published previously in Refs. 1 and 2) and how the consistent constraints method is used to drastically improve the speed of the stochastic optimization protocol.

2. Using the basic solutions A_j(z) to compute the integrals (4) with a different kernel \bar{K}(m, z). There are several choices here. One of them is given by Eq. (3) and amounts to computing integrals of A_j(z) over finite intervals {Δ_m} centered at {z_m}. However, one is also free to consider normalized continuous kernels with unrestricted integration over z, such as Lorentzian (or Gaussian) shapes centered at the points {z_m} with width {Δ_m} at half-height, e.g.,

\bar{K}(m, z) = \frac{Δ_m/\pi}{(z − z_m)^2 + Δ_m^2}.   (5)

The sets of integrals {i_m^{(j)}} thus obtained are then used straightforwardly to compute averages,

i_m = J^{-1} \sum_{j=1}^{J} i_m^{(j)},   (6)

and dispersions,

σ_m^2 = J^{-1} \sum_{j=1}^{J} ( i_m^{(j)} − i_m )^2.   (7)

To characterize possible two-point correlations, one should compute the correlation matrix

σ_{mm'}^2 = J^{-1} \sum_{j=1}^{J} ( i_m^{(j)} − i_m )( i_{m'}^{(j)} − i_{m'} ).   (8)

Strictly speaking, there is no reason to stop characterizing correlations at the two-point level. One may proceed with computing multi-point averages, but the effort quickly becomes expensive and the outcome cannot be presented in a single plot. An alternative “visualization” of multi-point correlations is discussed in subsection II B.

3. Interpreting the result.
The simplest interpretation, with an estimate of the dispersion for typical fluctuations, is to assume that A(z) is nearly linear over the range of each interval. This leads to the solution A_fin(z_m) = i_m with vertical and horizontal “error bars” σ_m^{(v)} = σ_m and σ_m^{(h)} = Δ_m/2. The vertical error bars may be overestimated because fluctuations at different points are correlated. However, as explained in the Introduction, the correct answer may correspond to some atypical shape, and this possibility has to be addressed as well.

An alternative protocol, discussed in subsection II B, determines the final solution by selecting a superposition of basic solutions,

A_fin(z) = \sum_{j=1}^{J} c_j A_j(z),   \sum_{j=1}^{J} c_j = 1,   (9)

such that A_fin(z) remains non-negative (with high accuracy) and the coefficients c_j are optimized to impose smooth behavior or any other “conditional knowledge.” Formally, the simplest interpretation corresponds to c_j = 1/J.

A. Search for basic solutions
The search for basic solutions relies on the minimization of

χ^2 = N^{-1} \sum_{n=1}^{N} \left( \frac{g_n − G[n, A]}{δ_n} \right)^2,   (10)

where δ_n is the error bar of the g_n value. Without loss of generality, we assume that the components of the vector \vec{g} = (g_1, g_2, ...) are uncorrelated; otherwise, one has to perform a rotation to the eigenvector basis of the two-point correlation matrix ⟨(g_n − ⟨g_n⟩)(g_{n'} − ⟨g_{n'}⟩)⟩, where the components of \vec{g} become statistically independent. This linear transformation leads to an equation that has exactly the same form as (1) with a rotated kernel. We choose a maximal tolerance χ_c^2 of order unity and search for functions A(z) > 0 with χ^2 < χ_c^2, which are then added to the set of basic solutions for further processing.

Information about the input data is limited to the objective function (10). Truly unbiased methods should not assume anything about A(z) that is not part of exact knowledge, such as a predetermined grid of points or the number and parameters of peaks/gaps. In the stochastic optimization method of Refs. 1 and 2, each solution is represented by a set of positive-definite rectangular shapes (the δ-function can be viewed as the limiting case of an infinitely narrow and infinitely high rectangle with fixed area), which are allowed to have multiple overlaps; see panel (a) in Fig. 2. More precisely, a solution is represented as a sum

A(z) = \sum_{r=1}^{R} η_{P_r}(z)   (11)

of rectangles {P_r} = {h_r, w_r, c_r},

η_{P_r}(z) = h_r for z ∈ [c_r − w_r/2, c_r + w_r/2], and 0 otherwise,   (12)

where h_r > 0, w_r > 0, and c_r are the height, width, and center of the rectangle P_r, respectively. In what follows we refer to

C = { P_r , r = 1, ..., R }   (13)

as a “configuration.” All rectangles are restricted to the specified range of the A(z) support, [z^{(min)}, z^{(max)}]; i.e., for all rectangles c_r − w_r/2 > z^{(min)} and c_r + w_r/2 < z^{(max)}. The spectrum normalization is given by \sum_{r=1}^{R} h_r w_r = N, and G[n, A] in Eq. (1) can be written as

G[n, A] = \sum_{r=1}^{R} K(n, r)\, h_r,   (14)

where

K(n, r) = \int_{c_r − w_r/2}^{c_r + w_r/2} dz\, K(n, z).   (15)

The number of rectangles and all continuous parameters characterizing their positions, widths, and heights are found by minimizing the objective (10). Optimization starts from a randomly generated set of rectangles and finds a large number of dissimilar basic solutions A_j(z) with χ^2 < χ_c^2. More precisely, the search is based on a chain of randomly chosen updates over the configuration space of shapes, which fully explore the saw-tooth fluctuations of basic solutions. This is important for the successful elimination of noise in the final solution.

FIG. 2. (a) Spectral function parametrization by a set of rectangles. (b) Illustration of the treatment of intersections of rectangles. (c) Identical re-parametrization of the spectrum.
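The rectangle parametrization of Eqs. (11)-(15) is easy to sketch in code. The toy example below (configuration values are made up for illustration; this is not the authors' implementation) evaluates A(z) as a sum of rectangles and computes G[n, A] for the Laplace kernel of Eq. (2) using the exact per-rectangle integrals of Eq. (15).

```python
import numpy as np

# Each rectangle P_r = (h_r, w_r, c_r); overlaps are allowed, Eq. (11).
rects = [(1.0, 2.0, 3.0), (0.5, 1.0, 3.5)]

def A_of_z(z, rects):
    """Spectral function as a sum of rectangles, Eqs. (11)-(12)."""
    return sum(h * ((c - w / 2 <= z) & (z <= c + w / 2)) for h, w, c in rects)

def G(n, rects, taus):
    """G[n, A] for the Laplace kernel of Eq. (2), via the exact
    per-rectangle integral K(n, r) of Eq. (15)."""
    tau = taus[n]
    total = 0.0
    for h, w, c in rects:
        lo, hi = c - w / 2, c + w / 2
        if tau == 0.0:
            total += h * w  # integral of 1 over the rectangle
        else:
            total += h * (np.exp(-tau * lo) - np.exp(-tau * hi)) / tau
    return total

taus = np.array([0.0, 0.5, 1.0])
# g_0 equals the total spectral weight sum(h_r * w_r).
print(G(0, rects, taus))  # -> 2.5
```

Because the kernel integrals over each rectangle are known in closed form, no frequency grid is needed to evaluate χ², which is what makes this parametrization free of grid bias.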
Updates proposing small modifications of the shape (“elementary” updates) have the disadvantage of long computation times per basic solution. To speed up the search, we supplement the standard protocol of Refs. 1 and 2 with consistent-constraints (CC) updates, which propose a radical shape modification based on minimization of the positive-definite quadratic form χ^2 + \sum_i o_i by matrix inversion, as described in Ref. 3. Here the o_i are various positive-definite quadratic forms, or “penalties,” that ensure that the matrix to be inverted is well-defined. This is achieved by penalizing the derivatives of A(z) (computed on the grid based on the current configuration C, see below) and enforcing A(z) ≥ 0. The CC-update involves a number of iterations in which the penalties o_i are adjusted self-consistently in such a way that at the end of the update 0 < \sum_i o_i ≪ χ^2. Explicit forms for {o_i} and the adjustment protocols are described in detail in Ref. 3 (see also the O_1 and O_2 forms in subsection II B).

Even though the CC-updates do not compromise the goal of minimizing χ^2, their efficiency is based on penalties that suppress saw-tooth fluctuations. To exclude possible bias originating from CC-updates on basic solutions, we proceed as follows. The global update of the SOCC method consists of thousands of elementary updates, L_tot, that are divided into two groups: L_a stage-a updates and L_b stage-b updates, where L_a + L_b = L_tot and L_a < L_b. Updates increasing χ^2 are temporarily considered “accepted” (and the resulting configuration recorded) with high probability during stage a, but this probability is reduced during stage b, which favors updates decreasing χ^2. The idea is to use the L_a updates to escape from a local minimum of χ^2 in the multi-dimensional configuration space, in the hope of finding a better minimum afterwards. The global update is accepted only if a smaller value of χ^2 was recorded in the course of applying elementary updates, and the new configuration becomes the one with the smallest χ^2. We apply CC-updates during stage a of a global update, when an increase of χ^2 is allowed, and proceed with a large number of elementary updates, which results in a configuration with fully developed saw-tooth instability.

We found that CC-updates have no effect on the self-averaging of the saw-tooth noise in the equal-weight superposition of basic solutions, improve typical χ^2 values of basic solutions, and significantly decrease the computation time required for finding basic solutions.

To run the CC-update, one has to re-parameterize the configuration as a collection of nonoverlapping rectangles in order to be able to use their heights for estimates of the function derivatives.
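This conversion to nonoverlapping rectangles can be sketched as follows (a simplified stand-in, not the authors' implementation): the ordered set of rectangle edges defines a grid of bins, and the height of each bin is the sum of the heights of all original rectangles covering it.

```python
import numpy as np

def reparametrize(rects, z_min, z_max):
    """Convert overlapping rectangles (h, w, c) into nonoverlapping ones
    tiling [z_min, z_max]; some resulting heights may be zero."""
    edges = sorted({z_min, z_max} | {c - w / 2 for _, w, c in rects}
                                  | {c + w / 2 for _, w, c in rects})
    new = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mid = 0.5 * (lo + hi)
        # height = sum of all original rectangles covering this bin
        h = sum(h0 for h0, w0, c0 in rects if c0 - w0 / 2 <= mid <= c0 + w0 / 2)
        new.append((h, hi - lo, mid))
    return new

rects = [(1.0, 2.0, 3.0), (0.5, 1.0, 3.5)]  # illustrative values
tiled = reparametrize(rects, 0.0, 10.0)
# Total spectral weight sum(h * w) is preserved by the re-parametrization.
print(sum(h * w for h, w, _ in tiled))  # -> 2.5
```

Since the output is again a set of rectangles, it can be fed directly back into the elementary updates, as noted below.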
Panel (b) in Fig. 2 illustrates how overlaps of rectangles are understood in the SOCC method. This leads to an identical re-parametrization in terms of nonoverlapping rectangles {\tilde{P}_r} = {\tilde{h}_r, \tilde{w}_r, \tilde{c}_r}. The conversion is done as follows. First, the set of rectangle parameters {c_r − w_r/2} ∪ {c_r + w_r/2} is ordered to form a grid of new bin boundaries that also includes the support limits z^{(min)} and z^{(max)}. Second, the bin centers and widths become the centers and widths of the ordered set of new rectangles, respectively:

\tilde{c}_{r+1} > \tilde{c}_r   ∀ r,   (16)

\tilde{c}_1 − \tilde{w}_1/2 = z^{(min)},   \tilde{c}_r − \tilde{w}_r/2 = \tilde{c}_{r−1} + \tilde{w}_{r−1}/2   ∀ r,   \tilde{c}_{2R+1} + \tilde{w}_{2R+1}/2 = z^{(max)}.   (17)

Figures 2(b) and 2(c) illustrate how the conversion from {P_r} to {\tilde{P}_r} amounts to an identical representation of the spectrum: the R original rectangles introduce 2R boundaries on the [z^{(min)}, z^{(max)}] interval and split it into 2R + 1 rectangles obeying conditions (16)–(17). Note that some rectangles have zero height when submitted to the CC-update. The update modifies the values of all \tilde{h}-parameters and generates a new set {\tilde{h}'_r}. Since {\tilde{P}_r} is a particular case of {P_r}, there is no need to perform any additional transformation to proceed with elementary updates.

B. Preparing the final solution
Because all basic solutions satisfy Eq. (1), one can immediately check that a linear combination of basic solutions, Eq. (9), always leads to a solution of Eq. (1) with the same accuracy cutoff χ_c^2 as the basic solutions, provided that all c-coefficients are non-negative. Indeed, by linearity of the problem and the condition \sum_{j=1}^{J} c_j = 1,

G[n, A_fin] = \sum_{j=1}^{J} c_j G[n, A_j],   g_n − G[n, A_fin] = \sum_{j=1}^{J} c_j ( g_n − G[n, A_j] ).   (18)

Substituting these expressions into the χ^2 form for the final solution and employing the Cauchy–Bunyakovsky–Schwarz inequality for χ_{jj'}^2, we get

χ^2 = \sum_{j,j'=1}^{J} c_j c_{j'} χ_{jj'}^2 ≤ \bar{C}^2 χ_c^2,   with   \bar{C} = \sum_{j=1}^{J} |c_j|,   (19)

χ_{jj'}^2 = N^{-1} \sum_{n=1}^{N} \frac{( g_n − G[n, A_j] )( g_n − G[n, A_{j'}] )}{δ_n^2}.   (20)

If some c-coefficients are negative, the accuracy of the final solution is guaranteed only if \bar{C} is not large. One may argue that the upper bound χ^2 ≤ \bar{C}^2 χ_c^2 substantially overestimates the deviations, and that the actual accuracy is better. Let C_+ = (1 + \bar{C})/2 and C_− = (1 − \bar{C})/2 be the sums of the positive and of the negative c-coefficients, respectively. The normalized superpositions of the basic solutions entering C_+ and C_− (we denote them as A_+ and A_−, respectively) have their χ^2 measures smaller than or equal to χ_c^2, by Eq. (19). The final solution can be identically written as A_fin = C_+ A_+ + C_− A_−, and its χ^2 measure is nothing but the two-state version of Eq. (19). Since the G[n, A] values are derived from spectral density integrals, they are smooth functions of n, and random point-to-point sign fluctuations of g_n − G[n, A] arise predominantly from g_n. Thus, the expectation is that χ_{+−}^2 is positive, in which case

χ^2 ≤ ( C_+^2 + C_−^2 ) χ_c^2 − 2 C_+ |C_−| χ_{+−}^2 ≤ \bar{C}^2 χ_c^2.   (21)

In practice, sign-positivity of the spectral density severely restricts the possibility of having large |C_−| and \bar{C} in the final solution, and |C_−| tends to remain smaller than unity automatically. Finally, Eq. (19) is only an upper bound, and superpositions with \bar{C} as large as 2 may still have χ^2 < χ_c^2.

These considerations lead to an important possibility of modifying the shape of the final solution in order to satisfy additional criteria formulated outside of Eq. (1). The key observation, and the crucial difference to other NAC methods, is that “conditional knowledge” protocols are invoked after all basic solutions are determined, meaning that they remain unbiased with respect to the input data.

As discussed in the Introduction, the most conservative philosophy regarding sharp spectral features, such as peaks and gaps, is to eliminate them if they are not warranted by the quality of the input data. (Our method does allow one to answer the question whether a given sharp feature is compatible with the input data; see below.) To implement this idea, we formulate the problem of determining an appropriate set of {c_j} coefficients as a linear self-consistent optimization problem, closely following the consistent constraints method of Ref. 3. The objective function to be minimized consists of several terms, O = \sum_k O_k, each being a quadratic positive-definite form of the c_j. More terms can be added if necessary to control higher-order derivatives, enforce expected asymptotic behavior, etc.

• To suppress large derivatives we consider the form

O_1 = \sum_{k=2}^{K} \{ D_k [A'(z_k)]^2 + B_k [A''(z_k)]^2 \},   (22)

where {z_k} is the grid of points used to define the first and second discrete derivatives of the function A(z). The sets of coefficients D_k and B_k are adjusted self-consistently under iterations in such a way that the contributions of all z_k points to O_1 are similar.

• The unity-sum constraint on the coefficients of the superposition is expressed as

O_2 = U \left( \sum_{j=1}^{J} c_j − 1 \right)^2,   (23)

with a large constant U.

• Since O_1 + O_2 does not constrain the amplitudes and signs of all c_j, the minimization cannot proceed by matrix inversion.
To improve the matrix properties we add a “soft” penalty for large deviations of c_j from the equal-weight superposition,

O_3 = \sum_{j=1}^{J} ( c_j − 1/J )^2.   (24)

• To ensure that the spectral function is non-negative (with high accuracy) we need z-dependent penalties (to be set self-consistently) that suppress the development of large negative fluctuations:

O_4 = \sum_{k=1}^{K} Q_k A^2(z_k).   (25)

• Finally, we can introduce a penalty for the solution deviating from some “target” function (or “default model”) A_T(z_k):

O_5 = \sum_{k=1}^{K} T_k [ A(z_k) − A_T(z_k) ]^2.   (26)

The main purpose of O_5 is to address subtle multi-point correlations between allowed shapes: by forcing the solution to be close to a certain target function one can monitor how the solution starts developing additional saw-tooth-instability-related features or violates the unity-sum constraint. This penalty is zero when preparing A_fin in the absence of any target function.

The self-consistent optimization protocol is as follows. We start with c_j = 1/J and compute A(z_k). The initial sets of coefficients in O_1 are defined as D_k = D and B_k = D, where D is some small positive constant (its initial value has no effect on the final solution, because the penalties for derivatives are increased exponentially under iterations). Since the positivity of A(z) is guaranteed in the initial state, we set Q_k = 0. After the quadratic form for the objective function O is minimized, the new set of c-coefficients is used to define a new solution A(z); the penalties for derivatives are increased, D → f D, D_k → f D_k, B_k → f B_k, by some factor f > 1; and then all penalties in O_1 and O_4 are adjusted self-consistently as follows:

• If D_k |A'(z_k)|^2 > D, we assign D_k = D / |A'(z_k)|^2;

• If B_k |A''(z_k)|^2 > D, we assign B_k = D / |A''(z_k)|^2;

• If A(z_k) < 0, we assign a large penalty suppressing the amplitude of the solution at this point, Q_k = Q, where Q is a large constant; otherwise the value of Q_k is increased by two orders of magnitude.

In this work we use f = 2 and large constants U and Q. This sets the stage for the next iteration of the O-optimization protocol.

Since the accuracy expression, Eq. (18), relies on the substitution g_n = \sum_{j=1}^{J} c_j g_n, it is crucial that the unity-sum constraint is satisfied for all input data points,

\left| \sum_{j=1}^{J} c_j − 1 \right| < ε,   ε = \min_n | δ_n / g_n |.   (27)

This provides the required criterion for terminating the iterations. The final solution (9) is based on the last set of c-coefficients that satisfied the condition χ^2 < χ_c^2.

In the absence of the target objective O_5, the procedure is guaranteed to produce a final solution A_fin(z) with smooth behavior, because our initial solution already satisfies all requirements. The derivative objective forces A(z) to be as smooth as possible within the subspace of fluctuations that keep χ^2 small.

With the help of O_5 one can explore how the solution is modified if one forces it to go through some set of points. The simplest case would be to set T_k = T for some point z_0 (where T is a large number) and zero otherwise, and to shift A_T(z_0) away from A_fin(z_0). The solution going through the point A_T(z_0) is no longer guaranteed to be smooth in the vicinity of z_0, and for large deviations from the final solution it will develop the saw-tooth instability at z_0.

The most interesting choices for z_0 are the minima and maxima of the spectrum.
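Since O is a positive-definite quadratic form of the coefficients, each iteration of its minimization reduces to solving one linear system. Below is a heavily simplified sketch of a single such step (penalties O_1, O_2, O_3 only, with the first-derivative part of O_1; the basic solutions are synthetic stand-ins, and the self-consistent adjustment of D_k, B_k, and Q_k described above is omitted).

```python
import numpy as np

rng = np.random.default_rng(2)
J, K = 8, 60                               # number of "basic solutions", grid size
zs = np.linspace(0.0, 10.0, K)
# Stand-in basic solutions on the grid (columns j): a smooth peak plus noise.
M = np.exp(-(zs[:, None] - 3.0) ** 2) + 0.1 * rng.standard_normal((K, J))

Dmat = (np.eye(K, k=1) - np.eye(K))[:-1]   # first-difference stencil
D_k = np.ones(K - 1)                       # derivative penalties (one iteration)
U = 1e6                                    # unity-sum penalty weight

# O(c) = c^T H c - 2 b^T c + const; the minimum solves H c = b.
G1 = Dmat @ M
H = G1.T @ (D_k[:, None] * G1)             # O_1 (first-derivative part)
H += U * np.ones((J, J))                   # O_2: U * (sum_j c_j - 1)^2
H += np.eye(J)                             # O_3: soft equal-weight penalty
b = U * np.ones(J) + np.ones(J) / J

c = np.linalg.solve(H, b)
print("unity-sum violation:", abs(c.sum() - 1.0))  # tiny, thanks to large U
```

In the full protocol this solve would be repeated with exponentially growing derivative penalties until the termination criterion (27) is met.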
Despite the fact that our protocol is to erase sharp features not warranted by the input data quality, we can still address questions such as: “Can this spectral peak (or gap) be made narrower/higher/lower, and by how much, before the solution becomes unstable against developing secondary features?” This question cannot be fully answered at the level of the correlation matrix (8) because (i) spectral functions have subtle multi-point correlations, and (ii) the notion of a “typical” solution has no physical meaning in this context. The only way to answer this question is to have access to a large representative set of unbiased basic solutions.

The objective O_5 offers a generic way of exploring various possibilities for underlying features hidden behind the accuracy of the input data. Clearly, there are other alternatives for addressing specific questions. For example, one can isolate a spectral peak to some interval and compute the dispersion d_j of each basic solution j over this interval. Next, the distribution function W(d) over all basic solutions is composed and analyzed. If W(d) has a narrow region of support around its average value ⟨d⟩, then the peak width cannot significantly deviate from ⟨d⟩. If W(d) is nonzero for d ≪ ⟨d⟩, then one has to conclude that the actual peak might be much narrower (and, correspondingly, have a much higher amplitude) than what is predicted by the typical smooth solution A_fin.

III. MAXIMUM ENTROPY METHOD
Numerous NAC schemes are based on a totally different philosophy and impose additional restrictions/penalties on the allowed functional shapes of A(z) in the process of solving Eq. (1). In other words, the search for solutions is biased with “conditional knowledge” from the very beginning. Historically, the Tikhonov–Phillips regularization method was the first to advocate this approach. Currently, the most popular scheme of this type is the maximum entropy method. Other schemes worth mentioning are singular value decomposition, non-negative least-squares, stochastic regularization, and averaging of Padé approximants. In the stochastic sampling method of Refs. 8 and 28–30, the remaining bias is in the form of the predetermined grid of frequency points, and the final solution is an average over a certain “thermal” ensemble (see also Ref. 31 for a further refinement). The fast effective modification of stochastic optimization (FESOM) also uses a predetermined grid of frequency points.

We now briefly review the maximum entropy method, which can be seen as a special case of stochastic sampling methods. Instead of minimizing χ^2, one constructs a functional Q = χ^2 − αS[A], where S[A] is the “entropy” term. The positive parameter α is a Lagrange multiplier that can also be thought of as a “temperature,” by analogy to classical statistics (note that our definition of χ^2 differs by a factor of N from the standard MEM formulation; this is, however, only a matter of convention). The entropy term takes the form S[A] = −\int dz\, A(z) \ln [A(z)/M(z)], with M(z) being the default model. For very large values of α, the default-model term dominates in Q, reflecting our ignorance about the system. For very low values of α, the “energy” term χ^2 dominates, reflecting the quality of the input data.
For intermediate values of α, one interpolates between these two limits and obtains a trade-off between accuracy and the smooth behavior enforced by the default model. We use Bryan's method to implement the minimization procedure: the final answer is obtained by averaging over all values of α, weighted with the respective a posteriori probability (we saw, however, little difference between Bryan's method and the classical MEM in the examples below). In Bryan's method, a singular value decomposition is also applied, which reflects the fact that the finite precision of storing floating-point numbers, in combination with the poor conditioning of the kernel, puts severe limitations on the information that can possibly be retrieved. One can hence reduce the search space at no substantial loss; in practice, only 5 to 20 search directions survive this step. The remaining minimization is performed by the Levenberg-Marquardt algorithm. Bryan's method is, after 25 years, still the de facto standard for inversion problems in condensed matter physics. One of its most attractive features is its speed: a few seconds on a laptop usually suffice to get a reasonable answer, provided good starting parameters (for the grid, the default model, and the range of α) have been found. Nevertheless, the obtained answer (including the a posteriori probability distributions) should always be carefully checked.

A major issue is that the solution may strongly depend on the default model. (Note that the error bars which Ref. 7 calculates are conditional on the default model and hence do not reflect variability with respect to different default models.) A practitioner usually wants to explore different (classes of) default models in order to get an idea of the robustness of the obtained answer, and sometimes to examine whether lower values of χ² can be found for other solutions which are equally smooth.
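The severity of the kernel's conditioning, and the handful of surviving search directions, can be seen directly from the singular values of a Laplace-type kernel. A sketch with illustrative grids (the thresholds below are rough stand-ins, not the values used in any particular implementation):

```python
import numpy as np

# Laplace-transform kernel K(tau_n, z) = exp(-z tau_n) on illustrative grids.
tau = np.linspace(0.0, 5.0, 100)
z = np.linspace(0.0, 10.0, 400)
K = np.exp(-np.outer(tau, z))

s = np.linalg.svd(K, compute_uv=False)
s /= s[0]   # normalize to the largest singular value

# Directions above a conservative round-off threshold: the information
# that can be retrieved at all, independent of the data quality.
n_machine = int((s > 1e-12).sum())

# With input data accurate to ~1e-4, far fewer directions carry
# information; this is the origin of the small number of surviving
# search directions quoted in the text.
n_noise = int((s > 1e-4).sum())
print(n_machine, n_noise)
```

The singular values decay roughly geometrically, so doubling the grid sizes does not add retrievable information; only improving the data accuracy does.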
In this regard, all these solutions are reminiscent of the basic solutions discussed above, but the probability density of solutions is different due to the difference in protocols: in the spirit of MEM, one does not want default models that are too similar to each other or too close to the obtained answer (an iteration where the new default model is the answer from a previous run is considered a self-defeating strategy). This raises the important question of what strategy should be used to produce a representative set of basic solutions within MEM. One possibility is a stochastic exploration of the configuration space of default models.

IV. PERFORMANCE TESTS
We perform blind tests of our method for two different kernels. The function g(τ_n) was prepared from Eq. (1), and uncorrelated Gaussian noise was afterwards added to g(τ_n).

A. Resolving the width of the high-energy peak
In this subsection, we assume that the spectral function A(z) is identically equal to zero at z < 0 and non-negative at z > 0, and that the kernel is K(τ_n, z) = e^{−zτ_n}, see Eq. (2).

The spectral function for test 1 and test 2 (shown in Fig. 1) contains two peaks of finite width and has the following form (up to a normalization constant):

A(z) = (c_1/σ_1) exp{−(z − z_1)²/σ_1²} + (c_2/σ_2) exp{−(z − z_2)²/σ_2²},   (28)

where z_1 = 0.…, c_1 = 0.…, σ_1 = 0.…, z_2 = 2.…, c_2 = 0.41, and σ_2 = 0.…; the noise level is 10^{−…} for test 1 and 10^{−…} for test 2.

In the spectrum for test 3 the low-frequency peak is not a Gaussian but a δ-function with the same position
and weight,

A(z) = √π c_1 δ(z − z_1) + (c_2/σ_2) exp{−(z − z_2)²/σ_2²},   (29)

where z_1 = 0.…, c_1 = 0.…, z_2 = 2.…, c_2 = 0.41, and σ_2 = 0.…; the noise level is 10^{−…}.

FIG. 3. (Color online.) Results for test 1, featuring two peaks of finite width with noise level 10^{−…}. Shown is the comparison between the actual spectrum (red solid line), the smooth SOCC spectrum (blue short-dashed line), and the pulled-up high-energy peak SOCC solution (green dashed line). The error bars for the smooth SOCC spectrum {σ_m} are determined from Eq. (7).

The challenge for NAC is to judge whether one can resolve the width of the high-frequency peak. To this end we consider two possible setups for MEM and SOCC. In the standard setup we assume a flat default model for MEM, and for SOCC the procedure of generating smooth solutions described in Sec. II B, in the absence of the default-model penalty O. To study possible deformations of the second peak, we then introduce a narrow-peak default model in MEM and re-run the simulation, or, in the case of SOCC, we insist that the final solution goes through a much higher point at the peak maximum. The width of the high-frequency peak is deemed impossible to resolve if it can be reduced by a factor of two while the spectrum remains well-defined.

We note that the better reproducibility of the low-frequency peak is a particular property of the kernel (2). For example, for the analytic continuation of the current-current correlation function to the optical conductivity, the main challenge is to resolve the spectral density at zero frequency.

Analysis of test 1 shows that the low-energy peak can be well resolved by both SOCC and MEM (see the insets
in Figs. 3 and 4, respectively). On the other hand, the high-energy peak width is severely overestimated by both methods in the standard setup (Figs. 3 and 4). However, both methods allow one to pull the high-frequency peak up at least by a factor of four above the actual spectrum. Specifically, if the second peak in the MEM default model is set to be much narrower than the actual

FIG. 4. (Color online.) Results for test 1, featuring two peaks of finite width with noise level 10^{−…}. Shown is the comparison between the actual spectrum (red solid line), the MEM spectrum with a flat default model (blue short-dashed line), and the MEM spectrum with a pulled-up high-energy peak in the default model (green dashed line).

FIG. 5. (Color online.) Distribution of the second moment for the high-frequency peak in test 1 (short-dashed black line), test 2 (dashed blue line), and test 3 (red solid line), among all basic solutions. The vertical line shows the second moment σ = 0.064 of the original spectrum.
one, then MEM produces an answer of the same width as this default model. Similarly, the superposition of basic SOCC solutions can be forced to have a much higher amplitude at the second-peak maximum by employing an appropriate target function. In SOCC, one may also see direct evidence that the second-peak width is questionable by considering statistics of the second moments σ of the high-frequency peak among all basic solutions. The corresponding distribution is presented in Fig. 5. The probability to find a solution with a vanishing width (σ → 0) for the high-frequency peak does not go to zero for test 1 and, hence, the imaginary-time data for g(τ_n) (within their accuracy) do not rule out a δ-function for the high-frequency peak.

FIG. 6. (Color online.) Results for test 2, featuring two peaks of finite width with noise level 10^{−…}. Shown is the comparison between the actual spectrum (red solid line), the smooth SOCC spectrum (blue short-dashed line), and the pulled-up high-energy peak SOCC solution (green dashed line). The error bars for the smooth SOCC spectrum {σ_m} are determined from Eq. (7).

Further insight is provided by test 2, which differs from test 1 only in the noise level, which is reduced by two orders of magnitude. One readily observes that, in contrast to test 1, the standard setups of SOCC and MEM give a very good description of the high-frequency peak (Figs. 6 and 7). Does this mean that one can be absolutely sure that the width of the peak is finite? The answer is no, because one can still easily pull the peak up by a factor of four. Moreover, the SOCC analysis of second moments (see Fig. 5) demonstrates that a δ-function-shaped second peak is still a possibility, despite improved error bars. In these examples, tighter error bars only allow one to reduce the upper bound on the width of the second peak. Much smaller error bars, which are unrealistic for Monte Carlo simulations of g(τ_n), would be required to controllably resolve the actual width of the second peak.

FIG. 7. (Color online.) Results for test 2, featuring two peaks of finite width with noise level 10^{−…}. Shown is the comparison between the actual spectrum (red solid line), MEM with a flat default model (blue short-dashed line), and MEM with a pulled-up high-energy peak in the default model (green dashed line).

Test 3 has the same Gaussian noise as test 2, but the low-frequency peak is now replaced by a δ-function, see Eq. (29). This crucially changes the results. Now the high-frequency peak is well reproduced not only in the standard setup of SOCC and MEM but also in attempts to pull the solution up, see Figs. 8 and 9, respectively. The stability of results for the second-peak width is also evident in the probability distribution for the second moment shown in Fig. 5. The distribution is peaked at the correct value σ = 0.064 and is rather narrow, indicating that a narrower peak would compromise the error bars.

We emphasize that the success in resolving the width of the second peak in test 3 is due to a combination of two circumstances: the small width of the first peak and the high accuracy of the input data. To see this, it is instructive to consider the physical example of the Fermi polaron (see Sec. V), where the width of the first peak is very small, but the accuracy of the input data is significantly lower than in test 3.
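The second-moment statistics behind Fig. 5 can be mimicked in a few lines of NumPy. In this sketch the basic solutions are replaced by toy Gaussians of random width (the real ones come from the stochastic-optimization sampler and are far less regular); all grids and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

z = np.linspace(0.0, 4.0, 400)   # frequency grid
dz = z[1] - z[0]

# Toy stand-ins for basic solutions: normalized Gaussian peaks near z = 2
# with randomly drawn widths.
widths = rng.uniform(0.05, 0.4, size=1000)
basic = np.array([np.exp(-(z - 2.0)**2 / (2.0 * w**2)) / (w * np.sqrt(2.0 * np.pi))
                  for w in widths])

# Interval isolating the peak of interest.
mask = (z > 1.0) & (z < 3.0)

def dispersion(A):
    """Square root of the second central moment of A over the interval."""
    p = A[mask] * dz
    norm = p.sum()
    mean = (z[mask] * p).sum() / norm
    return np.sqrt((((z[mask] - mean)**2) * p).sum() / norm)

d = np.array([dispersion(A) for A in basic])

# W(d): normalized histogram of the dispersions over all basic solutions.
W, edges = np.histogram(d, bins=40, density=True)
print(f"<d> = {d.mean():.3f}, support: [{d.min():.3f}, {d.max():.3f}]")
```

If W(d) retains weight down to d → 0, a δ-function peak cannot be ruled out, which is precisely the diagnostic applied to tests 1 and 2 above.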
B. Fermi distribution kernel
Test 4 analyzes the possibility of resolving spectral densities at the Fermi level from the analytic continuation of g(τ). Here, the spectrum A(z) is defined on the range −∞ < z < ∞ and the kernel is K(τ_n, z) = exp{−zτ_n}/(1 + exp{−zβ}) with β = 6. The uncorrelated Gaussian noise is added at the 10^{−…} level.

FIG. 8. (Color online.) Results for test 3, characterized by a δ-function at low frequency and a peak of finite width at high frequency, with noise level 10^{−…}. Shown is the comparison between the actual spectrum (red solid line), the smooth SOCC spectrum (blue short-dashed line), and the maximally pulled-up high-energy peak SOCC solution (green dashed line). The error bars for the smooth SOCC spectrum {σ_m} are determined from Eq. (7).

One can see in Figs. 10 and 11 that both SOCC and MEM give a good description of the spectral function in the vicinity of the chemical potential (at z = 0). Also, the height of the middle peak near zero frequency cannot be significantly pulled up without distorting the rest of the spectrum. Specifically for MEM, trying different default models with one, two, or three Gaussian peaks did not improve the answer; in all cases, attempts to find narrower peaks resulted in secondary oscillations reminiscent of numerical instabilities. We conclude that the fermionic spectrum can be restored with high quality for the given parameters.

V. APPLICATION OF SOCC TO THE FERMI POLARON PROBLEM
We now test the SOCC method on a physical system: the resonant Fermi polaron (a spin-down fermion in a sea of noninteracting spin-up fermions), here in three dimensions and for equal masses of the spin-up and spin-down particles. The coupling between the polaron and its environment is characterized by a single dimensionless parameter k_F a, where k_F is the Fermi wave vector and a the s-wave scattering length. Here we examine a typical situation at k_F a = 0.8, where the polaron peak is nearly a δ-function.

FIG. 9. (Color online.) Results for test 3, characterized by a δ-function at low frequency and a peak of finite width at high frequency, with noise level 10^{−…}. Shown is the comparison between the actual spectrum (red solid line), the MEM spectrum with a flat default model (blue short-dashed line), and the MEM spectrum with a double-Gaussian default model where the second peak is maximally pulled up (green dashed line).

The imaginary-time polaron Green's function at zero temperature and zero momentum, g_n = g(τ_n), was obtained with diagrammatic Monte Carlo, see Refs. 1, 13, and 36. We are able to achieve very high precision in our results for g(τ), with a relative error as low as O(10^{−…}) at τ close to zero, O(10^{−…}) around τ = 1/ε_F (where ε_F is the Fermi energy of the spin-up fermions), and a few percent at the largest τ considered for the analytic continuation. The kernel at zero temperature is K(τ_n, z) = e^{−zτ_n}.

The polaron spectral function features two peaks. The position and weight of the first polaron peak are fixed with high accuracy by the asymptotic decay of the Green's function, −g(τ → ∞) → Z e^{−Eτ}. Our data at large ε_F τ_n can be fitted to a single exponential (within error bars), indicating that the polaron remains a well-defined quasiparticle in this parameter range. The particle-hole continuum emerges at higher frequencies as a second broad peak. A key question we want to address here is its spectral width. This has been discussed in the context of the repulsive polaron state. In order for it to qualify as a well-defined quasiparticle, the peak width needs to be sufficiently narrow (much smaller than the Fermi energy, corresponding to a sufficiently long lifetime). Thus, resolving the width accurately is very important for correctly interpreting this spectrum. Note that this spectrum has the same general features as the spectrum for test 1 in the previous section.

When SOCC is used to produce a smooth solution, the second peak emerges as a broad spectral feature.
If this were indeed the case, the metastable repulsive polaron picture would be inapplicable for k_F a = 0.8. However, the same set of basic solutions can be optimized to have a much narrower peak, see Fig. 12, implying that a well-defined repulsive polaron quasiparticle cannot be ruled out. Given that the second-peak dispersion can be reduced by a factor of four without compromising the accuracy of the final solution, we have to conclude that the quality of the input data is insufficient to determine the actual width.

FIG. 10. (Color online.) Results for test 4 with a Fermi distribution kernel and noise level 10^{−…}. Shown is the comparison between the actual spectrum (red solid line), the smooth SOCC spectrum (blue short-dashed line), and the maximally pulled-up central peak SOCC solution (green dashed line). The logarithmic plot in the inset highlights the comparison of low-intensity features. The error bars for the smooth SOCC spectrum {σ_m} are determined from Eq. (7).

FIG. 11. (Color online.) Results for test 4 with a Fermi distribution kernel and noise level 10^{−…}. Shown is the comparison between the actual spectrum (red solid line) and the MEM spectrum in the default setup (blue dashed line).

FIG. 12. (Color online.) Spectral density of the resonant Fermi polaron for k_F a = 0.8. The smooth SOCC solution with second moment σ = 0.48 is shown by the blue short-dashed line. However, a much narrower solution for the second peak with σ = 0.12 (green dashed line) can also be obtained from the same set of basic solutions.
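The single-exponential fit of the large-τ tail, used above to fix the position and weight of the polaron peak, can be sketched as follows. The Z and E values and the noise level are made up for illustration; the real g(τ_n) comes from diagrammatic Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic large-tau tail: -g(tau) -> Z exp(-E tau); Z and E are illustrative.
Z_true, E_true = 0.8, 1.5
tau = np.linspace(8.0, 20.0, 50)
g = -Z_true * np.exp(-E_true * tau) * (1.0 + 1e-3 * rng.standard_normal(tau.size))

# Fit ln(-g) = ln Z - E tau with a straight line; systematic deviations from
# linearity at large tau would signal a breakdown of the quasiparticle picture.
slope, intercept = np.polyfit(tau, np.log(-g), deg=1)
E_fit, Z_fit = -slope, np.exp(intercept)
print(E_fit, Z_fit)
```

In practice, one would also verify that the fitted E and Z are stable against shifting the fit window to larger τ, which is what "fitted to a single exponential within error bars" amounts to.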
VI. CONCLUSIONS
The most challenging aspect of numerical analytic continuation is not the algorithm of finding a stable (smooth) solution consistent with the input data, but the protocol of assessing its accuracy and unambiguity. We have implemented such a protocol based on the method of stochastic optimization with consistent constraints and demonstrated how a similar strategy can be followed with the maximum entropy method by exploring the space of default models. Irrespective of the method, the procedure has to deal with either integrals of the spectral function (rather than the function itself) and/or certain a priori and a posteriori constraints consistent with the error bars on the input data.

It is important to distinguish between two cases. The first (simplest) case is when all physically meaningful solutions do not differ substantially, up to a possible smearing of unimportant (below the resolution) fine details of the otherwise smooth spectral function. The second case, exemplified by the spectral function in Fig. 1, is when a piece of important physical information is inevitably lost. In the first case, a reasonable characterization of uncertainties can be achieved by coarse-graining, as in, e.g., Eq. (3). In the second case, one has to employ a more elaborate approach to reveal the different possible physical solutions that do not compromise the error bars of the input data.

Much of our attention has been paid to the protocol for treating the second case. We have shown how it can be handled with SOCC and modified MEM. With MEM, one has to explore various default models and the resulting solutions that remain consistent with the input error bars. A useful feature of the SOCC approach is that such an analysis (and, more generally, the application of all possible consistent constraints) can be implemented at the post-processing stage, using a representative set of "basic" solutions generated by the stochastic-optimization protocol.
The linearity of the problem (1) is crucial here, as it guarantees that any superposition of basic solutions with non-negative weights is also a solution of Eq. (1) within the same or better level of accuracy. Even if the superposition coefficients are allowed to be negative, the procedure typically keeps the accuracy of the final solution at the level of the basic solutions. This allows one to implement consistent constraints by choosing the superposition coefficients to minimize the corresponding objective function.
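This linearity argument can be illustrated directly on a toy problem. Here the "basic solutions" are hypothetical stand-ins (small perturbations of a known spectrum), and `chi2` plays the role of the goodness-of-fit measure; none of the names below are from the actual SOCC code:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy linear problem g = int K A dz with noisy "data".
z = np.linspace(0.0, 4.0, 200)
dz = z[1] - z[0]
tau = np.linspace(0.0, 5.0, 60)
K = np.exp(-np.outer(tau, z)) * dz
A_true = np.exp(-(z - 1.0)**2 / 0.5)
A_true /= A_true.sum() * dz
err = 1e-4
g = K @ A_true + err * rng.standard_normal(tau.size)

def chi2(A):
    """Mean squared misfit of K A against the data, in units of the error bar."""
    return np.mean(((K @ A - g) / err)**2)

# Stand-ins for basic solutions: small perturbations of A_true that
# individually stay consistent with the data (the real ones come from the
# stochastic-optimization sampler and can differ far more from each other).
basic = []
for _ in range(20):
    A = np.clip(A_true * (1.0 + 1e-3 * rng.standard_normal(z.size)), 0.0, None)
    basic.append(A / (A.sum() * dz))
basic = np.array(basic)

# Because chi2 is a convex function of A, any convex combination of the
# basic solutions fits at least as well as the worst member of the set.
w = rng.random(len(basic))
w /= w.sum()
A_mix = w @ basic
print(chi2(A_mix), max(chi2(A) for A in basic))
```

Choosing the weights w to minimize an objective (peak height, dispersion over an interval, distance to a target shape) is then a small optimization over 20 numbers rather than a new continuation run, which is what makes the post-processing stage cheap.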
Acknowledgements.
This work was supported by the Simons Collaboration on the Many Electron Problem, the National Science Foundation under grant PHY-1314735, the MURI Program "New Quantum Phases of Matter" from AFOSR, and FP7/ERC starting grant No. 306897. A.S.M. is supported by the ImPACT Program of the Council for Science, Technology and Innovation (Cabinet Office, Government of Japan). Our maximum entropy implementation builds on the ALPS implementation.

[1] A. S. Mishchenko, N. V. Prokof'ev, A. Sakamoto, and B. V. Svistunov, Phys. Rev. B, 6317 (2000).
[2] A. S. Mishchenko, in: E. Pavarini, E. Koch, F. Anders, and M. Jarrell (eds.), Verlag des Forschungszentrum Jülich, 2012, ISBN 978-3-89336-796-2.
[3] N. Prokof'ev and B. Svistunov, JETP Lett., 747 (2013).
[4] R. N. Silver, D. S. Sivia, and J. E. Gubernatis, Phys. Rev. B, 2380 (1990).
[5] R. Bryan, Eur. Biophys. J., 165 (1990).
[6] J. E. Gubernatis, M. Jarrell, R. N. Silver, and D. S. Sivia, Phys. Rev. B, 6011 (1991).
[7] M. Jarrell and J. E. Gubernatis, Phys. Rep., 133 (1996).
[8] S. Fuchs, T. Pruschke, and M. Jarrell, Phys. Rev. E, 056701 (2010); A. Dirks, P. Werner, M. Jarrell, and T. Pruschke, Phys. Rev. E, 026701 (2010).
[9] O. Gunnarsson, M. W. Haverkort, and G. Sangiovanni, Phys. Rev. B, 155107 (2010); ibid., Phys. Rev. B, 165125 (2010).
[10] A. Schirotzek, C.-H. Wu, A. Sommer, and M. W. Zwierlein, Phys. Rev. Lett., 230402 (2009).
[11] C. Kohstall, M. Zaccanti, M. Jag, A. Trenkwalder, P. Massignan, G. M. Bruun, F. Schreck, and R. Grimm, Nature, 615–618 (2012).
[12] M. Koschorreck, D. Pertot, E. Vogt, B. Fröhlich, M. Feld, and M. Köhl, Nature, 619–622 (2012).
[13] O. Goulko, A. S. Mishchenko, N. Prokof'ev, and B. Svistunov, arXiv:1603.06963.
[14] M. Parish and J. Levinsen, arXiv:1608.00864.
[15] K. Kamikado, T. Kanazawa, and S. Uchino, arXiv:1606.03721.
[16] R. Schmidt and T. Enss, Phys. Rev. A, 063620 (2011).
[17] P. Massignan and G. M. Bruun, Eur. Phys. J. D, 83 (2011).
[18] Such a statistical treatment of multiple independent solutions was first suggested in the fast modification of the SOM method, see Ref. 32.
[19] A. N. Tikhonoff, Doklady Akademii Nauk SSSR, 195 (1943); ibid., 501 (1963) [Soviet Mathematics, 1035 (1963)].
[20] D. L. Phillips, J. ACM, 84 (1962).
[21] A. N. Tikhonoff and V. Y. Arsenin, Solutions of Ill-posed Problems (Winston & Sons, Washington, 1977).
[22] A. N. Tikhonov, A. Goncharsky, V. V. Stepanov, and A. Yagola, Numerical Methods for the Solution of Ill-posed Problems (Springer-Science+Business Media, B.V., Moscow, 1995).
[23] C. E. Creffield, E. G. Klepfish, E. R. Pike, and S. Sarkar, Phys. Rev. Lett., 517 (1995).
[24] C. L. Lawson and R. J. Hanson, Solving Least Squares Problems (Society for Industrial and Applied Mathematics, Philadelphia, 1995).
[25] I. S. Krivenko and A. N. Rubtsov, JETP Lett., 768 (2012).
[26] J. Schött, I. L. M. Locht, E. Lundin, O. Grånäs, O. Eriksson, and I. Di Marco, Phys. Rev. B, 075104 (2016).
[27] J. Schött, E. G. C. P. van Loon, I. L. M. Locht, M. Katsnelson, and I. Di Marco, arXiv:1607.04212.
[28] A. W. Sandvik, Phys. Rev. B, 10287 (1998).
[29] K. Vafayi and O. Gunnarsson, Phys. Rev. B, 035115 (2007).
[30] K. S. D. Beach, arXiv:cond-mat/0403055.
[31] A. W. Sandvik, arXiv:1502.06066.
[32] T. A. Maier, private communication.
[33] A. S. Mishchenko, N. Nagaosa, G. De Filippis, A. de Candia, and V. Cataudella, Phys. Rev. Lett., 146401 (2015).
[34] P. Massignan, M. Zaccanti, and G. M. Bruun, Rep. Prog. Phys., 034401 (2014).
[35] Z. Lan and C. Lobo, J. Indian Inst. Sci., 179 (2014).
[36] N. Prokof'ev and B. Svistunov, Phys. Rev. B, 020408 (2008); ibid., Phys. Rev. B, 125101 (2008).
[37] B. Bauer et al.
Solving Least SquaresProblems , Society for Industrial and Applied Mathematics,Philadelphia, 1995. I. S. Krivenko and A. N. Rubtsov, JETP Lett. , 768(2012). J. Sch ˙ott, I. L. M. Locht, E. Lundin, O. Gr˚an`as, O. Eriks-son, and I. Di Marco, Phys. Rev. B , 075104 (2016). J. Sch ˙ott, E. G. C. P. van Loon, I. L. M. Locht, M. Kat-snelson, and I. Di Marco, arXiv:cond-mat/1607.04212. A. W. Sandvik, Phys. Rev. B , 10287 (1998). K. Vafayi and O. Gunnarsson, Phys. Rev. B , 035115(2007). K. S. D. Beach, arXiv:cond-mat/0403055. A. W. Sandvik, arXiv:1502.06066. T. A. Maier, private communication. A. S. Mishchenko, N. Nagaosa, G. De Filippis, A. de Can-dia, and V. Cataudella, Phys. Rev. Lett. , 146401(2015). P. Massignan, M. Zaccanti, and G. M. Bruun, Rep. Prog.Phys. , 034401 (2014). Z. Lan and C. Lobo, J. Indian I. Sci. , 179 (2014). N. Prokof’ev and B. Svistunov, Phys. Rev. B , 020408(2008); ibid Phys. Rev. B , 125101 (2008). B. Bauer et al.et al.