Numerical analytic continuation: Answers to well-posed questions
Olga Goulko, Andrey S. Mishchenko, Lode Pollet, Nikolay Prokof'ev, Boris Svistunov
Olga Goulko
Department of Physics, University of Massachusetts, Amherst, MA 01003, USA
Andrey S. Mishchenko
RIKEN Center for Emergent Matter Science (CEMS), 2-1 Hirosawa, Wako, Saitama 351-0198, Japan and National Research Center “Kurchatov Institute,” 123182 Moscow, Russia
Lode Pollet
Department of Physics, Arnold Sommerfeld Center for Theoretical Physics, University of Munich, Theresienstrasse 37, 80333 Munich, Germany
Nikolay Prokof’ev
Department of Physics, University of Massachusetts, Amherst, MA 01003, USA; Department of Physics, Arnold Sommerfeld Center for Theoretical Physics, University of Munich, Theresienstrasse 37, 80333 Munich, Germany; and National Research Center “Kurchatov Institute,” 123182 Moscow, Russia
Boris Svistunov
Department of Physics, University of Massachusetts, Amherst, MA 01003, USA; National Research Center “Kurchatov Institute,” 123182 Moscow, Russia; and Wilczek Quantum Center, Zhejiang University of Technology, Hangzhou 310014, China

(Dated: September 6, 2016)

We formulate the problem of numerical analytic continuation in a way that lets us draw meaningful conclusions about properties of the spectral function based solely on the input data. Apart from ensuring consistency with the input data (within their error bars) and the a priori and a posteriori (conditional) constraints, it is crucial to reliably characterize the accuracy, or even ambiguity, of the output. We explain how these challenges can be met with two approaches: stochastic optimization with consistent constraints and the modified maximum entropy method. We perform illustrative tests for spectra with a double-peak structure, where we critically examine which spectral properties are accessible and which ones are lost. As an important practical example, we apply our protocol to the Fermi polaron problem.
PACS numbers: 02.70.-c, 71.15.Dx, 71.28.+d, 71.10.Fd
I. INTRODUCTION
Numerous problems in science, from spectral analysis to image processing, require that we restore properties of a function A(z) from a set of integrals

g_n = G[n, A] ≡ \int_{-\infty}^{\infty} dz\, K(n, z)\, A(z),   n = 1, ..., N,   (1)

where K(n, z) is a known kernel and {g_n} is a finite set of experimental or numerical input data with error bars. An important class of such problems, known as numerical analytic continuation (NAC), deals with “pathological” kernels featuring numerous eigenfunctions with anomalously small eigenvalues. An archetypal NAC problem is numerical spectral analysis at zero temperature, where the challenge is to restore the non-negative spectral function A(z ≥ 0) satisfying the equation

g_n = \int_0^{\infty} dz\, e^{-z\tau_n} A(z),   (2)

from numerical data for g_n = g(\tau_n ≥ 0). Such problems are known to be ill-posed. Mathematically, the near-degeneracy of the kernel implies two closely related circumstances: (i) the absence of the resolvent, and (ii) a continuum of solutions satisfying the input data within their error bars (even when integrals over z are replaced with finite sums containing no more than N terms). Nowadays, the first circumstance is merely a minor technical problem, as there exists a number of methods allowing one to find solutions to (1) without compromising the error bars of g_n.

The second circumstance, the ambiguity of the solution, is a more essential problem. It is clear that if one formulates the goal as restoring A(z) as a continuous curve, or determining its value on a given grid of points, then the goal cannot be reached as stated, irrespective of the properties of the kernel K(n, z). The input data set is finite and noisy, thereby introducing a natural limit on the resolution of fine structures in A(z).

Fortunately, the above-formulated goal has little to do with the practical world. In an experiment, all devices are characterized by a finite resolution function, and the data they collect always correspond to integrals. The data are processed by making certain assumptions about the underlying function. This motivates an alternative formulation of the NAC goal involving integrals of A(z) that render the problem well-defined. With additional assumptions about the smoothness and other properties of A(z) behind these integrals, consistent with both a priori and a posteriori knowledge, the ambiguity of the solution can be substantially suppressed. The simplest setup is as follows:

Given a set of finite intervals {Δ_m}, determine the integrals of the spectral function over these intervals,

i_m = Δ_m^{-1} \int_{z ∈ Δ_m} dz\, A(z),   m = 1, ..., M,   (3)

along with the corresponding dispersions of fluctuations {σ_m} (straightforwardly extendable to the dispersion correlation matrix {σ_{mm'}}).

Naively, one might think that nothing is achieved by going from the integrals in (1) to the integrals in (3), because the latter have exactly the same form with the kernel \bar{K}(m, z) = Δ_m^{-1} for z ∈ Δ_m and zero otherwise (other forms of the “resolution function” \bar{K}(m, z) are discussed in Sec. II):

i_m = I[m, A] = \int_{-\infty}^{\infty} dz\, \bar{K}(m, z)\, A(z),   m = 1, ..., M.   (4)

This impression, however, is false, because kernel properties are at the heart of the problem. If, for appropriately small intervals (sufficiently small to resolve the variations of A(z)), the uncertainties for i_m remain small, then one can draw reliable conclusions about the underlying behavior of A(z) itself. The difference between “good” kernels (e.g., as in Fourier transforms) and pathological ones is that for the latter, due to the notorious saw-tooth instability, the uncertainties for i_m quickly become too large for a meaningful analysis of fine structures in A(z).

To obtain a solution from the integrals (3), one has to invoke the notion of conditional knowledge. The most straightforward approach is to set the spectral function values at the middle points z_m of the intervals Δ_m to A_fin(z_m) = i_m. This is only possible if the intervals can be made appropriately narrow without losing accuracy for the integrals. With this approach we assume that the function is nearly linear over the intervals in question.
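To make the pathology of the kernel in Eq. (2) concrete, one can discretize it and inspect its singular values: they decay so quickly that only a handful of directions in A(z) are constrained by data of any realistic accuracy. The following is a minimal numerical sketch; all grids and ranges are arbitrary illustrative choices, not parameters from this work.

```python
import numpy as np

# Discretize the zero-temperature kernel K(n, z) = exp(-z * tau_n) of Eq. (2)
# on a frequency grid and inspect its singular values.
taus = np.linspace(0.0, 10.0, 50)      # imaginary-time points tau_n
zs = np.linspace(0.0, 10.0, 200)       # frequency grid for A(z)
dz = zs[1] - zs[0]

K = np.exp(-np.outer(taus, zs)) * dz   # K[n, k]: weight of A(z_k) in g_n

s = np.linalg.svd(K, compute_uv=False)
# The singular values decay (near-)exponentially: most "directions" in A(z)
# are invisible at any realistic noise level -- this is the ill-posedness
# discussed in the text.
print("condition number:", s[0] / s[-1])
print("directions above 1e-8 (relative):", int(np.sum(s / s[0] > 1e-8)))
```

A "good" kernel (e.g., a Fourier one) would instead have singular values of comparable magnitude, so that all integrals i_m could be restored with an accuracy comparable to that of the input data.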
Setting A_fin(z_m) = i_m in this way is a typical procedure for experimental data. Quantifying the error bar on A_fin(z_m) necessarily involves two numbers: the “vertical” dispersion σ_m is directly inherited from i_m, and the “horizontal” error bar Δ_m/2 is given by the half-width of the interval. Two caveats are in order. First, fluctuations of the i_m at different points are correlated; for example, the normalization integral \int dz\, A(z) is typically known with an accuracy that is orders of magnitude better than what would be predicted by the central limit theorem if this integral is represented by a finite sum of integrals over nonoverlapping intervals. Second, the errors are not necessarily distributed as a Gaussian. Atypical fluctuations can have a significant probability, and their analysis should not be avoided, as the actual physical solution may well be one of them. To this end, it is important to explore the minimal and maximal values that the integral i_m can take, and to check that these are not significantly different from the typical value of i_m. In certain cases this criterion cannot be met without increasing the intervals Δ_m to an extent where the assumption of linearity of A(z) becomes uncontrolled, implying that an important piece of information about the shape of A(z) in this interval is missing. A characteristic example that plays a key role in the subsequent discussion is presented in Fig. 1, where the challenge is to extract the shape of the second peak.

FIG. 1. Challenging example of a spectrum A(z) for the NAC problem (2). As shown in the main part of the text, the significant width of the first peak makes it essentially impossible to controllably restore the width of the second peak, even with small relative error bars on g_n. On the other hand, the first two moments of the second peak, characterizing its weight and position, can be extracted reliably.

In the more sophisticated approach used in this work, the values of A_fin(z_m) can be further optimized (without compromising the accuracy of the solution with respect to g_n) to produce a smooth curve.
This protocol has the additional advantage of eliminating minima, maxima, gaps, and sharp features that are not guaranteed to exist by the quality of the input data. The nature of the problem is such that very narrow peaks (or gaps) with tiny spectral weight can always be imagined to be present (for narrow intervals they will certainly emerge due to the saw-tooth instability). Our philosophy with respect to these features is to erase them within the established error bounds and obtain a solution that is insensitive to the interval parameters.

Having established a smooth solution, one may nevertheless ask whether a particular feature of the solution can, in principle, have significantly different properties. For example, if the NAC procedure suggests a peak, one may wonder whether the true spectral function could have a much narrower peak with the same area, and if so, what its smallest possible width is. A NAC protocol should be able to answer this type of question quickly and reliably. In this work, we explain how these goals can be achieved in practice. Many technical details of the protocol that we propose to abbreviate as SOCC (stochastic optimization with consistent constraints) were already published in Refs. 1–3 as separate developments. The crucial advances here are (i) the final formulation based on integrals of the spectral function, and (ii) the idea of working with linear combinations of pre-calculated “basic” solutions. The latter allows one to readily apply consistent constraints without compromising the error bars on the input data. Consistent constraints are also crucial for assessing which features can be resolved and which information is unrecoverable.

In what follows, the term “consistent constraints” applies to (i) the general principle of utilizing the a priori and revealing the a posteriori (conditional) knowledge without compromising the error bars of the input data, and (ii) a particular set of numerical procedures based on ideas respecting this principle.
Our SOCC protocol involves two different consistent-constraints procedures. The first one, borrowed from the NAC method of Ref. 3, is now used solely to (dramatically!) enhance the performance of the stochastic-optimization part of the protocol searching for basic solutions. The most important consistent-constraints procedure is used to post-process the set of basic solutions.

The paper is organized as follows. In Sec. II we describe the SOCC method, consisting of three distinct stages, and explain how a smooth solution can be obtained without any bias with respect to solving Eq. (1) and analyzed for possible atypical deformations. In Sec. III we briefly review the maximum entropy method (MEM). In Sec. IV we explore what the SOCC and MEM methods predict for the test spectral function shown in Fig. 1, and how one should analyze the final solution with respect to its possible smooth transformations. In Sec. V we apply our findings to the physical spectral function of the resonant Fermi polaron.
We conclude in Sec. VI.
II. STOCHASTIC OPTIMIZATION WITH CONSISTENT CONSTRAINTS
The formulation of the SOCC method is relatively simple and consists of three parts:

1. Finding a large set of solutions A_j(z) [j = 1, ..., J ≫ 1] to Eq. (1) that satisfy the input data within their error bounds. In what follows we call them “basic” solutions. Basic solutions are not biased in any way to be smooth or to satisfy any other requirements based on knowledge about the problem outside of Eq. (1). The irregularity of basic solutions embodies what is referred to as an ill-posed problem. In subsection II A we briefly explain how these solutions are found by the stochastic optimization procedure (most technical details were published previously in Refs. 1 and 2) and how the consistent constraints method is used to drastically improve the speed of the stochastic optimization protocol.

2. Using the basic solutions A_j(z) to compute the integrals (4) with a different kernel \bar{K}(m, z). There are several choices here. One of them is given by Eq. (3) and amounts to computing integrals of A_j(z) over finite intervals {Δ_m} centered at {z_m}. However, one is also free to consider normalized continuous kernels with unrestricted integration over z, such as Lorentzian (or Gaussian) shapes centered at the points {z_m} with width {Δ_m} at half-height, e.g.,

\bar{K}(m, z) = \frac{Δ_m/\pi}{(z − z_m)^2 + Δ_m^2}.   (5)

The sets of integrals {i_m^{(j)}} thus obtained are then used straightforwardly to compute averages,

i_m = J^{-1} \sum_{j=1}^{J} i_m^{(j)},   (6)

and dispersions,

σ_m^2 = J^{-1} \sum_{j=1}^{J} ( i_m^{(j)} − i_m )^2.   (7)

To characterize possible two-point correlations, one should compute the correlation matrix

σ_{mm'}^2 = J^{-1} \sum_{j=1}^{J} ( i_m^{(j)} − i_m )( i_{m'}^{(j)} − i_{m'} ).   (8)

Strictly speaking, there is no reason to stop characterizing correlations at the two-point level. One may proceed with computing multi-point averages, but the effort quickly becomes expensive and the outcome cannot be presented in a single plot. An alternative “visualization” of multi-point correlations is discussed in subsection II B.

3. Interpreting the result.
The simplest interpretation, with an estimate of the dispersion for typical fluctuations, is to assume that A(z) is nearly linear over the range of each interval. This leads to the solution A_fin(z_m) = i_m with vertical and horizontal “error bars” σ_m^{(v)} = σ_m and σ_m^{(h)} = Δ_m/2. The vertical error bars may be overestimated because fluctuations at different points are correlated. However, as explained in the Introduction, the correct answer may correspond to some atypical shape, and this possibility has to be addressed as well.

An alternative protocol, discussed in subsection II B, determines the final solution by selecting a superposition of basic solutions,

A_fin(z) = \sum_{j=1}^{J} c_j A_j(z),   \sum_{j=1}^{J} c_j = 1,   (9)

such that A_fin(z) remains non-negative (with high accuracy) and the coefficients c_j are optimized to impose smooth behavior or any other “conditional knowledge.” Formally, the simplest interpretation corresponds to c_j = 1/J.

A. Search for basic solutions
The search for basic solutions relies on the minimization of

χ^2 = N^{-1} \sum_{n=1}^{N} \left( \frac{g_n − G[n, A]}{δ_n} \right)^2,   (10)

where δ_n is the error bar of the g_n value. Without loss of generality, we assume that the components of the vector \vec{g} = (g_1, g_2, ...) are uncorrelated; otherwise, one has to perform a rotation to the eigenvector basis of the two-point correlation matrix ⟨(g_n − ⟨g_n⟩)(g_{n'} − ⟨g_{n'}⟩)⟩, where the components of \vec{g} become statistically independent. This linear transformation leads to an equation that has exactly the same form as (1) with a rotated kernel. We choose a maximal tolerance χ_c^2 of order unity and search for functions A(z) > 0 with χ^2 < χ_c^2, which are then added to the set of basic solutions for further processing.

Information about the input data is limited to the objective function (10). Truly unbiased methods should not assume anything about A(z) that is not part of exact knowledge, such as a predetermined grid of points or the number and parameters of peaks/gaps. In the stochastic optimization method of Refs. 1 and 2, each solution is represented by a set of positive-definite rectangular shapes (the δ-function can be viewed as the limiting case of an infinitely narrow and infinitely high rectangle with fixed area), which are allowed to have multiple overlaps; see panel (a) in Fig. 2. More precisely, a solution is represented as a sum

A(z) = \sum_{r=1}^{R} η_{P_r}(z)   (11)

of rectangles {P_r} = {h_r, w_r, c_r},

η_{P_r}(z) = h_r for z ∈ [c_r − w_r/2, c_r + w_r/2], and 0 otherwise,   (12)

where h_r > 0, w_r > 0, and c_r are the height, width, and center of the rectangle P_r, respectively. In what follows we refer to

C = { P_r , r = 1, ..., R }   (13)

as a “configuration.” All rectangles are restricted to the specified range of the A(z) support, [z^{(min)}, z^{(max)}]; i.e., for all rectangles c_r − w_r/2 > z^{(min)} and c_r + w_r/2 < z^{(max)}. The spectrum normalization is given by \sum_{r=1}^{R} h_r w_r = N, and G[n, A] in Eq. (1) can be written as

G[n, A] = \sum_{r=1}^{R} K(n, r)\, h_r,   (14)

where

K(n, r) = \int_{c_r − w_r/2}^{c_r + w_r/2} dz\, K(n, z).   (15)

The number of rectangles and all continuous parameters characterizing their positions, widths, and heights are found by minimizing the objective (10). Optimization starts from a randomly generated set of rectangles and finds a large number of dissimilar basic solutions A_j(z) with χ^2 < χ_c^2. More precisely, the search is based on a chain of randomly chosen updates over the configuration space of shapes, which fully explore the saw-tooth fluctuations of basic solutions. This is important for the successful elimination of noise in the final solution.

FIG. 2. (a) Spectral function parametrization by a set of rectangles. (b) Illustration of the treatment of intersections of rectangles. (c) Identical re-parametrization of the spectrum.
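The rectangle parametrization of Eqs. (11)-(15) is easy to sketch in code. The toy example below (configuration values are made up for illustration; this is not the authors' implementation) evaluates A(z) as a sum of rectangles and computes G[n, A] for the Laplace kernel of Eq. (2) using the exact per-rectangle integrals of Eq. (15).

```python
import numpy as np

# Each rectangle P_r = (h_r, w_r, c_r); overlaps are allowed, Eq. (11).
rects = [(1.0, 2.0, 3.0), (0.5, 1.0, 3.5)]

def A_of_z(z, rects):
    """Spectral function as a sum of rectangles, Eqs. (11)-(12)."""
    return sum(h * ((c - w / 2 <= z) & (z <= c + w / 2)) for h, w, c in rects)

def G(n, rects, taus):
    """G[n, A] for the Laplace kernel of Eq. (2), via the exact
    per-rectangle integral K(n, r) of Eq. (15)."""
    tau = taus[n]
    total = 0.0
    for h, w, c in rects:
        lo, hi = c - w / 2, c + w / 2
        if tau == 0.0:
            total += h * w  # integral of 1 over the rectangle
        else:
            total += h * (np.exp(-tau * lo) - np.exp(-tau * hi)) / tau
    return total

taus = np.array([0.0, 0.5, 1.0])
# g_0 equals the total spectral weight sum(h_r * w_r).
print(G(0, rects, taus))  # -> 2.5
```

Because the kernel integrals over each rectangle are known in closed form, no frequency grid is needed to evaluate χ², which is what makes this parametrization free of grid bias.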
Updates proposing small modifications of the shape (“elementary” updates) have the disadvantage of long computation times per basic solution. To speed up the search, we supplement the standard protocol of Refs. 1 and 2 with consistent-constraints (CC) updates, which propose a radical shape modification based on minimization of the positive-definite quadratic form χ^2 + \sum_i o_i by matrix inversion, as described in Ref. 3. Here the o_i are various positive-definite quadratic forms, or “penalties,” that ensure that the matrix to be inverted is well-defined. This is achieved by penalizing the derivatives of A(z) (computed on the grid based on the current configuration C, see below) and enforcing A(z) ≥ 0. The CC-update involves a number of iterations in which the penalties o_i are adjusted self-consistently in such a way that at the end of the update 0 < \sum_i o_i ≪ χ^2. Explicit forms for {o_i} and the adjustment protocols are described in detail in Ref. 3 (see also the O_1 and O_2 forms in subsection II B).

Even though the CC-updates do not compromise the goal of minimizing χ^2, their efficiency is based on penalties that suppress saw-tooth fluctuations. To exclude possible bias originating from CC-updates on basic solutions, we proceed as follows. The global update of the SOCC method consists of thousands of elementary updates, L_tot, that are divided into two groups: L_a stage-a updates and L_b stage-b updates, where L_a + L_b = L_tot and L_a < L_b. Updates increasing χ^2 are temporarily considered “accepted” (and the resulting configuration recorded) with high probability during stage a, but this probability is reduced during stage b, which favors updates decreasing χ^2. The idea is to use the L_a updates to escape from a local minimum of χ^2 in the multi-dimensional configuration space, in the hope of finding a better minimum afterwards. The global update is accepted only if a smaller value of χ^2 was recorded in the course of applying elementary updates, and the new configuration becomes the one with the smallest χ^2. We apply CC-updates during stage a of a global update, when an increase of χ^2 is allowed, and proceed with a large number of elementary updates, which results in a configuration with fully developed saw-tooth instability.

We found that CC-updates have no effect on the self-averaging of the saw-tooth noise in the equal-weight superposition of basic solutions, improve typical χ^2 values of basic solutions, and significantly decrease the computation time required for finding basic solutions.

To run the CC-update, one has to re-parameterize the configuration as a collection of nonoverlapping rectangles in order to be able to use their heights for estimates of the function derivatives.
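This conversion to nonoverlapping rectangles can be sketched as follows (a simplified stand-in, not the authors' implementation): the ordered set of rectangle edges defines a grid of bins, and the height of each bin is the sum of the heights of all original rectangles covering it.

```python
import numpy as np

def reparametrize(rects, z_min, z_max):
    """Convert overlapping rectangles (h, w, c) into nonoverlapping ones
    tiling [z_min, z_max]; some resulting heights may be zero."""
    edges = sorted({z_min, z_max} | {c - w / 2 for _, w, c in rects}
                                  | {c + w / 2 for _, w, c in rects})
    new = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mid = 0.5 * (lo + hi)
        # height = sum of all original rectangles covering this bin
        h = sum(h0 for h0, w0, c0 in rects if c0 - w0 / 2 <= mid <= c0 + w0 / 2)
        new.append((h, hi - lo, mid))
    return new

rects = [(1.0, 2.0, 3.0), (0.5, 1.0, 3.5)]  # illustrative values
tiled = reparametrize(rects, 0.0, 10.0)
# Total spectral weight sum(h * w) is preserved by the re-parametrization.
print(sum(h * w for h, w, _ in tiled))  # -> 2.5
```

Since the output is again a set of rectangles, it can be fed directly back into the elementary updates, as noted below.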
Panel (b) in Fig. 2 illustrates how overlaps of rectangles are understood in the SOCC method. This leads to an identical re-parametrization in terms of nonoverlapping rectangles {\tilde{P}_r} = {\tilde{h}_r, \tilde{w}_r, \tilde{c}_r}. The conversion is done as follows. First, the set of rectangle parameters {c_r − w_r/2} ∪ {c_r + w_r/2} is ordered to form a grid of new bin boundaries that also includes the support limits z^{(min)} and z^{(max)}. Second, the bin centers and widths become the centers and widths of the ordered set of new rectangles, respectively:

\tilde{c}_{r+1} > \tilde{c}_r   ∀ r,   (16)

\tilde{c}_1 − \tilde{w}_1/2 = z^{(min)},   \tilde{c}_r − \tilde{w}_r/2 = \tilde{c}_{r−1} + \tilde{w}_{r−1}/2   ∀ r,   \tilde{c}_{2R+1} + \tilde{w}_{2R+1}/2 = z^{(max)}.   (17)

Figures 2(b) and 2(c) illustrate how the conversion from {P_r} to {\tilde{P}_r} amounts to an identical representation of the spectrum: the R original rectangles introduce 2R boundaries on the [z^{(min)}, z^{(max)}] interval and split it into 2R + 1 rectangles obeying conditions (16)–(17). Note that some rectangles have zero height when submitted to the CC-update. The update modifies the values of all \tilde{h}-parameters and generates a new set {\tilde{h}'_r}. Since {\tilde{P}_r} is a particular case of {P_r}, there is no need to perform any additional transformation to proceed with elementary updates.

B. Preparing the final solution
Because all basic solutions satisfy Eq. (1), one can immediately check that a linear combination of basic solutions, Eq. (9), always leads to a solution of Eq. (1) with the same accuracy cutoff χ_c^2 as the basic solutions, provided that all c-coefficients are non-negative. Indeed, by linearity of the problem and the condition \sum_{j=1}^{J} c_j = 1,

G[n, A_fin] = \sum_{j=1}^{J} c_j G[n, A_j],   g_n − G[n, A_fin] = \sum_{j=1}^{J} c_j ( g_n − G[n, A_j] ).   (18)

Substituting these expressions into the χ^2 form for the final solution and employing the Cauchy–Bunyakovsky–Schwarz inequality for χ_{jj'}^2, we get

χ^2 = \sum_{j,j'=1}^{J} c_j c_{j'} χ_{jj'}^2 ≤ \bar{C}^2 χ_c^2,   with   \bar{C} = \sum_{j=1}^{J} |c_j|,   (19)

χ_{jj'}^2 = N^{-1} \sum_{n=1}^{N} \frac{( g_n − G[n, A_j] )( g_n − G[n, A_{j'}] )}{δ_n^2}.   (20)

If some c-coefficients are negative, the accuracy of the final solution is guaranteed only if \bar{C} is not large. One may argue that the upper bound χ^2 ≤ \bar{C}^2 χ_c^2 substantially overestimates the deviations, and that the actual accuracy is better. Let C_+ = (1 + \bar{C})/2 and C_− = (1 − \bar{C})/2 be the sums of the positive and of the negative c-coefficients, respectively. The normalized superpositions of the basic solutions entering C_+ and C_− (we denote them as A_+ and A_−, respectively) have their χ^2 measures smaller than or equal to χ_c^2, by Eq. (19). The final solution can be identically written as A_fin = C_+ A_+ + C_− A_−, and its χ^2 measure is nothing but the two-state version of Eq. (19). Since the G[n, A] values are derived from spectral density integrals, they are smooth functions of n, and random point-to-point sign fluctuations of g_n − G[n, A] arise predominantly from g_n. Thus, the expectation is that χ_{+−}^2 is positive, in which case

χ^2 ≤ ( C_+^2 + C_−^2 ) χ_c^2 − 2 C_+ |C_−| χ_{+−}^2 ≤ \bar{C}^2 χ_c^2.   (21)

In practice, sign-positivity of the spectral density severely restricts the possibility of having large |C_−| and \bar{C} in the final solution, and |C_−| tends to remain smaller than unity automatically. Finally, Eq. (19) is only an upper bound, and superpositions with \bar{C} as large as 2 may still have χ^2 < χ_c^2.

These considerations lead to an important possibility of modifying the shape of the final solution in order to satisfy additional criteria formulated outside of Eq. (1). The key observation, and the crucial difference to other NAC methods, is that “conditional knowledge” protocols are invoked after all basic solutions are determined, meaning that they remain unbiased with respect to the input data.

As discussed in the Introduction, the most conservative philosophy regarding sharp spectral features, such as peaks and gaps, is to eliminate them if they are not warranted by the quality of the input data. (Our method does allow one to answer the question whether a given sharp feature is compatible with the input data; see below.) To implement this idea, we formulate the problem of determining an appropriate set of {c_j} coefficients as a linear self-consistent optimization problem, closely following the consistent constraints method of Ref. 3. The objective function to be minimized consists of several terms, O = \sum_k O_k, each being a quadratic positive-definite form of the c_j. More terms can be added if necessary to control higher-order derivatives, enforce expected asymptotic behavior, etc.

• To suppress large derivatives we consider the form

O_1 = \sum_{k=2}^{K} \{ D_k [A'(z_k)]^2 + B_k [A''(z_k)]^2 \},   (22)

where {z_k} is the grid of points used to define the first and second discrete derivatives of the function A(z). The sets of coefficients D_k and B_k are adjusted self-consistently under iterations in such a way that the contributions of all z_k points to O_1 are similar.

• The unity-sum constraint on the coefficients of the superposition is expressed as

O_2 = U \left( \sum_{j=1}^{J} c_j − 1 \right)^2,   (23)

with a large constant U.

• Since O_1 + O_2 does not constrain the amplitudes and signs of all c_j, the minimization cannot proceed by matrix inversion.
To improve the matrix properties we add a “soft” penalty for large deviations of c_j from the equal-weight superposition,

O_3 = \sum_{j=1}^{J} ( c_j − 1/J )^2.   (24)

• To ensure that the spectral function is non-negative (with high accuracy) we need z-dependent penalties (to be set self-consistently) that suppress the development of large negative fluctuations:

O_4 = \sum_{k=1}^{K} Q_k A^2(z_k).   (25)

• Finally, we can introduce a penalty for the solution deviating from some “target” function (or “default model”) A_T(z_k):

O_5 = \sum_{k=1}^{K} T_k [ A(z_k) − A_T(z_k) ]^2.   (26)

The main purpose of O_5 is to address subtle multi-point correlations between allowed shapes: by forcing the solution to be close to a certain target function one can monitor how the solution starts developing additional saw-tooth-instability-related features or violates the unity-sum constraint. This penalty is zero when preparing A_fin in the absence of any target function.

The self-consistent optimization protocol is as follows. We start with c_j = 1/J and compute A(z_k). The initial sets of coefficients in O_1 are defined as D_k = D and B_k = D, where D is some small positive constant (its initial value has no effect on the final solution, because the penalties for derivatives are increased exponentially under iterations). Since the positivity of A(z) is guaranteed in the initial state, we set Q_k = 0. After the quadratic form for the objective function O is minimized, the new set of c-coefficients is used to define a new solution A(z); the penalties for derivatives are increased, D → f D, D_k → f D_k, B_k → f B_k, by some factor f > 1; and then all penalties in O_1 and O_4 are adjusted self-consistently as follows:

• If D_k |A'(z_k)|^2 > D, we assign D_k = D / |A'(z_k)|^2;

• If B_k |A''(z_k)|^2 > D, we assign B_k = D / |A''(z_k)|^2;

• If A(z_k) < 0, we assign a large penalty suppressing the amplitude of the solution at this point, Q_k = Q, where Q is a large constant; otherwise the value of Q_k is increased by two orders of magnitude.

In this work we use f = 2 and large constants U and Q. This sets the stage for the next iteration of the O-optimization protocol.

Since the accuracy expression, Eq. (18), relies on the substitution g_n = \sum_{j=1}^{J} c_j g_n, it is crucial that the unity-sum constraint is satisfied for all input data points,

\left| \sum_{j=1}^{J} c_j − 1 \right| < ε,   ε = \min_n | δ_n / g_n |.   (27)

This provides the required criterion for terminating the iterations. The final solution (9) is based on the last set of c-coefficients that satisfied the condition χ^2 < χ_c^2.

In the absence of the target objective O_5, the procedure is guaranteed to produce a final solution A_fin(z) with smooth behavior, because our initial solution already satisfies all requirements. The derivative objective forces A(z) to be as smooth as possible within the subspace of fluctuations that keep χ^2 small.

With the help of O_5 one can explore how the solution is modified if one forces it to go through some set of points. The simplest case would be to set T_k = T for some point z_0 (where T is a large number) and zero otherwise, and to shift A_T(z_0) away from A_fin(z_0). The solution going through the point A_T(z_0) is no longer guaranteed to be smooth in the vicinity of z_0, and for large deviations from the final solution it will develop the saw-tooth instability at z_0.

The most interesting choices for z_0 are the minima and maxima of the spectrum.
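Since O is a positive-definite quadratic form of the coefficients, each iteration of its minimization reduces to solving one linear system. Below is a heavily simplified sketch of a single such step (penalties O_1, O_2, O_3 only, with the first-derivative part of O_1; the basic solutions are synthetic stand-ins, and the self-consistent adjustment of D_k, B_k, and Q_k described above is omitted).

```python
import numpy as np

rng = np.random.default_rng(2)
J, K = 8, 60                               # number of "basic solutions", grid size
zs = np.linspace(0.0, 10.0, K)
# Stand-in basic solutions on the grid (columns j): a smooth peak plus noise.
M = np.exp(-(zs[:, None] - 3.0) ** 2) + 0.1 * rng.standard_normal((K, J))

Dmat = (np.eye(K, k=1) - np.eye(K))[:-1]   # first-difference stencil
D_k = np.ones(K - 1)                       # derivative penalties (one iteration)
U = 1e6                                    # unity-sum penalty weight

# O(c) = c^T H c - 2 b^T c + const; the minimum solves H c = b.
G1 = Dmat @ M
H = G1.T @ (D_k[:, None] * G1)             # O_1 (first-derivative part)
H += U * np.ones((J, J))                   # O_2: U * (sum_j c_j - 1)^2
H += np.eye(J)                             # O_3: soft equal-weight penalty
b = U * np.ones(J) + np.ones(J) / J

c = np.linalg.solve(H, b)
print("unity-sum violation:", abs(c.sum() - 1.0))  # tiny, thanks to large U
```

In the full protocol this solve would be repeated with exponentially growing derivative penalties until the termination criterion (27) is met.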
Despite the fact that our protocol is to erase sharp features not warranted by the input data quality, we can still address questions such as: “Can this spectral peak (or gap) be made narrower/higher/lower, and by how much, before the solution becomes unstable against developing secondary features?” This question cannot be fully answered at the level of the correlation matrix (8) because (i) spectral functions have subtle multi-point correlations, and (ii) the notion of a “typical” solution has no physical meaning in this context. The only way to answer this question is to have access to a large representative set of unbiased basic solutions.

The objective O_5 offers a generic way of exploring various possibilities for underlying features hidden behind the accuracy of the input data. Clearly, there are other alternatives for addressing specific questions. For example, one can isolate a spectral peak to some interval and compute the dispersion d_j of each basic solution j over this interval. Next, the distribution function W(d) over all basic solutions is composed and analyzed. If W(d) has a narrow region of support around its average value ⟨d⟩, then the peak width cannot significantly deviate from ⟨d⟩. If W(d) is nonzero for d ≪ ⟨d⟩, then one has to conclude that the actual peak might be much narrower (and, correspondingly, have a much higher amplitude) than what is predicted by the typical smooth solution A_fin.

III. MAXIMUM ENTROPY METHOD
Numerous NAC schemes are based on a totally different philosophy and impose additional restrictions/penalties on the allowed functional shapes of A(z) in the process of solving Eq. (1). In other words, the search for solutions is biased with “conditional knowledge” from the very beginning. Historically, the Tikhonov–Phillips regularization method was the first to advocate this approach. Currently, the most popular scheme of this type is the maximum entropy method. Other schemes worth mentioning are singular value decomposition, non-negative least-squares, stochastic regularization, and averaging of Padé approximants. In the stochastic sampling method of Refs. 8 and 28–30, the remaining bias is in the form of the predetermined grid of frequency points, and the final solution is an average over a certain “thermal” ensemble (see also Ref. 31 for a further refinement). The fast effective modification of stochastic optimization (FESOM) also uses a predetermined grid of frequency points.

We now briefly review the maximum entropy method, which can be seen as a special case of stochastic sampling methods. Instead of minimizing χ^2, one constructs a functional Q = χ^2 − αS[A], where S[A] is the “entropy” term. The positive parameter α is a Lagrange multiplier that can also be thought of as a “temperature,” by analogy to classical statistics (note that our definition of χ^2 differs by a factor of N from the standard MEM formulation; this is, however, only a matter of convention). The entropy term takes the form S[A] = −\int dz\, A(z) \ln [A(z)/M(z)], with M(z) being the default model. For very large values of α, the default-model term dominates in Q, reflecting our ignorance about the system. For very low values of α, the “energy” term χ^2 dominates, reflecting the quality of the input data.
For intermediate values of α, one interpolates between these two limits and obtains a trade-off between accuracy and the smooth behavior enforced by the default model. We use Bryan's method to implement the minimization procedure: the final answer is obtained by averaging over all values of α, weighted with the respective a posteriori probability (we saw, however, little difference between Bryan's method and the classical MEM in the examples below). In Bryan's method, a singular value decomposition is also applied, which reflects the fact that the finite precision of storing floating-point numbers, in combination with the poor conditioning of the kernel, puts severe limitations on the information that can possibly be retrieved. One can hence reduce the search space at no substantial loss; in practice, only 5 to 20 search directions survive this step. The remaining minimization is performed by the Levenberg-Marquardt algorithm. Bryan's method is, after 25 years, still the de facto standard for inversion problems in condensed matter physics. One of its most attractive features is its speed: a few seconds on a laptop usually suffice to get a reasonable answer, provided good starting parameters (for the grid, the default model, and the range of α) have been found. Nevertheless, the obtained answer (including the a posteriori probability distributions) should always be carefully checked.

A major issue is that the solution may strongly depend on the default model. (Note that the error bars which Ref. 7 calculates are conditional on the default model and hence do not reflect variability with respect to different default models.) A practitioner usually wants to explore different (classes of) default models in order to get an idea of the robustness of the obtained answer, and sometimes to examine whether lower values of χ² can be found for other solutions which are equally smooth.
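The severity of the kernel's conditioning, and the handful of surviving search directions, can be seen directly from the singular values of a Laplace-type kernel. A sketch with illustrative grids (the thresholds below are rough stand-ins, not the values used in any particular implementation):

```python
import numpy as np

# Laplace-transform kernel K(tau_n, z) = exp(-z tau_n) on illustrative grids.
tau = np.linspace(0.0, 5.0, 100)
z = np.linspace(0.0, 10.0, 400)
K = np.exp(-np.outer(tau, z))

s = np.linalg.svd(K, compute_uv=False)
s /= s[0]   # normalize to the largest singular value

# Directions above a conservative round-off threshold: the information
# that can be retrieved at all, independent of the data quality.
n_machine = int((s > 1e-12).sum())

# With input data accurate to ~1e-4, far fewer directions carry
# information; this is the origin of the small number of surviving
# search directions quoted in the text.
n_noise = int((s > 1e-4).sum())
print(n_machine, n_noise)
```

The singular values decay roughly geometrically, so doubling the grid sizes does not add retrievable information; only improving the data accuracy does.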
In this regard, all these solutions are reminiscent of the basic solutions discussed above, but the probability density of solutions is different due to the difference in protocols: in the spirit of MEM, one does not want default models that are too similar to each other or too close to the obtained answer (an iteration where the new default model is the answer from a previous run is considered a self-defeating strategy). This raises the important question of what strategy should be used to produce a representative set of basic solutions within MEM. One possibility is a stochastic exploration of the configuration space of default models.

IV. PERFORMANCE TESTS
We perform blind tests of our method for two different kernels. The function g(τ_n) was prepared from Eq. (1), and uncorrelated Gaussian noise was afterwards added to g(τ_n).

A. Resolving the width of the high-energy peak
In this subsection, we assume that the spectral function A(z) is identically equal to zero at z < 0 and non-negative at z > 0, and that the kernel is K(τ_n, z) = e^{−zτ_n}, see Eq. (2).

The spectral function for test 1 and test 2 (shown in Fig. 1) contains two peaks of finite width and has the following form (up to a normalization constant):

A(z) = (c_1/σ_1) exp{−(z − z_1)²/σ_1²} + (c_2/σ_2) exp{−(z − z_2)²/σ_2²},   (28)

where z_1 = 0.…, c_1 = 0.…, σ_1 = 0.…, z_2 = 2.…, c_2 = 0.41, and σ_2 = 0.…; the noise level is 10^{−…} for test 1 and 10^{−…} for test 2.

In the spectrum for test 3 the low-frequency peak is not a Gaussian but a δ-function with the same position
and weight,

A(z) = √π c_1 δ(z − z_1) + (c_2/σ_2) exp{−(z − z_2)²/σ_2²},   (29)

where z_1 = 0.…, c_1 = 0.…, z_2 = 2.…, c_2 = 0.41, and σ_2 = 0.…; the noise level is 10^{−…}.

FIG. 3. (Color online.) Results for test 1, featuring two peaks of finite width with noise level 10^{−…}. Shown is the comparison between the actual spectrum (red solid line), the smooth SOCC spectrum (blue short-dashed line), and the pulled-up high-energy peak SOCC solution (green dashed line). The error bars for the smooth SOCC spectrum {σ_m} are determined from Eq. (7).

The challenge for NAC is to judge whether one can resolve the width of the high-frequency peak. To this end we consider two possible setups for MEM and SOCC. In the standard setup we assume a flat default model for MEM, and for SOCC the procedure of generating smooth solutions described in Sec. II B, in the absence of the default-model penalty O. To study possible deformations of the second peak, we then introduce a narrow-peak default model in MEM and re-run the simulation, or, in the case of SOCC, we insist that the final solution goes through a much higher point at the peak maximum. The width of the high-frequency peak is deemed impossible to resolve if it can be reduced by a factor of two while the spectrum remains well-defined.

We note that the better reproducibility of the low-frequency peak is a particular property of the kernel (2). For example, for the analytic continuation of the current-current correlation function to the optical conductivity, the main challenge is to resolve the spectral density at zero frequency.

Analysis of test 1 shows that the low-energy peak can be well resolved by both SOCC and MEM (see the insets
in Figs. 3 and 4, respectively). On the other hand, the high-energy peak width is severely overestimated by both methods in the standard setup (Figs. 3 and 4). However, both methods allow one to pull the high-frequency peak up at least by a factor of four above the actual spectrum. Specifically, if the second peak in the MEM default model is set to be much narrower than the actual

FIG. 4. (Color online.) Results for test 1, featuring two peaks of finite width with noise level 10^{−…}. Shown is the comparison between the actual spectrum (red solid line), the MEM spectrum with a flat default model (blue short-dashed line), and the MEM spectrum with a pulled-up high-energy peak in the default model (green dashed line).

FIG. 5. (Color online.) Distribution of the second moment for the high-frequency peak in test 1 (short-dashed black line), test 2 (dashed blue line), and test 3 (red solid line), among all basic solutions. The vertical line shows the second moment σ = 0.064 of the original spectrum.
one, then MEM produces an answer of the same width as this default model. Similarly, the superposition of basic SOCC solutions can be forced to have a much higher amplitude at the second-peak maximum by employing an appropriate target function. In SOCC, one may also see direct evidence that the second-peak width is questionable by considering statistics of the second moments σ of the high-frequency peak among all basic solutions. The corresponding distribution is presented in Fig. 5. The probability to find a solution with a vanishing width (σ → 0) for the high-frequency peak does not go to zero for test 1 and, hence, the imaginary-time data for g(τ_n) (within their accuracy) do not rule out a δ-function for the high-frequency peak.

FIG. 6. (Color online.) Results for test 2, featuring two peaks of finite width with noise level 10^{−…}. Shown is the comparison between the actual spectrum (red solid line), the smooth SOCC spectrum (blue short-dashed line), and the pulled-up high-energy peak SOCC solution (green dashed line). The error bars for the smooth SOCC spectrum {σ_m} are determined from Eq. (7).

Further insight is provided by test 2, which differs from test 1 only in the noise level, which is reduced by two orders of magnitude. One readily observes that, in contrast to test 1, the standard setups of SOCC and MEM give a very good description of the high-frequency peak (Figs. 6 and 7). Does this mean that one can be absolutely sure that the width of the peak is finite? The answer is no, because one can still easily pull the peak up by a factor of four. Moreover, the SOCC analysis of second moments (see Fig. 5) demonstrates that a δ-function-shaped second peak is still a possibility, despite improved error bars. In these examples, tighter error bars only allow one to reduce the upper bound on the width of the second peak. Much smaller error bars, which are unrealistic for Monte Carlo simulations of g(τ_n), would be required to controllably resolve the actual width of the second peak.

FIG. 7. (Color online.) Results for test 2, featuring two peaks of finite width with noise level 10^{−…}. Shown is the comparison between the actual spectrum (red solid line), MEM with a flat default model (blue short-dashed line), and MEM with a pulled-up high-energy peak in the default model (green dashed line).

Test 3 has the same Gaussian noise as test 2, but the low-frequency peak is now replaced by a δ-function, see Eq. (29). This crucially changes the results. Now the high-frequency peak is well reproduced not only in the standard setup of SOCC and MEM but also in attempts to pull the solution up, see Figs. 8 and 9, respectively. The stability of results for the second-peak width is also evident in the probability distribution for the second moment shown in Fig. 5. The distribution is peaked at the correct value σ = 0.064 and is rather narrow, indicating that a narrower peak would compromise the error bars.

We emphasize that the success in resolving the width of the second peak in test 3 is due to a combination of two circumstances: the small width of the first peak and the high accuracy of the input data. To see this, it is instructive to consider the physical example of the Fermi polaron (see Sec. V), where the width of the first peak is very small, but the accuracy of the input data is significantly lower than in test 3.
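The second-moment statistics behind Fig. 5 can be mimicked in a few lines of NumPy. In this sketch the basic solutions are replaced by toy Gaussians of random width (the real ones come from the stochastic-optimization sampler and are far less regular); all grids and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

z = np.linspace(0.0, 4.0, 400)   # frequency grid
dz = z[1] - z[0]

# Toy stand-ins for basic solutions: normalized Gaussian peaks near z = 2
# with randomly drawn widths.
widths = rng.uniform(0.05, 0.4, size=1000)
basic = np.array([np.exp(-(z - 2.0)**2 / (2.0 * w**2)) / (w * np.sqrt(2.0 * np.pi))
                  for w in widths])

# Interval isolating the peak of interest.
mask = (z > 1.0) & (z < 3.0)

def dispersion(A):
    """Square root of the second central moment of A over the interval."""
    p = A[mask] * dz
    norm = p.sum()
    mean = (z[mask] * p).sum() / norm
    return np.sqrt((((z[mask] - mean)**2) * p).sum() / norm)

d = np.array([dispersion(A) for A in basic])

# W(d): normalized histogram of the dispersions over all basic solutions.
W, edges = np.histogram(d, bins=40, density=True)
print(f"<d> = {d.mean():.3f}, support: [{d.min():.3f}, {d.max():.3f}]")
```

If W(d) retains weight down to d → 0, a δ-function peak cannot be ruled out, which is precisely the diagnostic applied to tests 1 and 2 above.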
B. Fermi distribution kernel
Test 4 analyzes the possibility of resolving spectral densities at the Fermi level from the analytic continuation of g(τ). Here, the spectrum A(z) is defined on the range −∞ < z < ∞ and the kernel is K(τ_n, z) = exp{−zτ_n}/(1 + exp{−zβ}) with β = 6. The uncorrelated Gaussian noise is added at the 10^{−…} level.

FIG. 8. (Color online.) Results for test 3, characterized by a δ-function at low frequency and a peak of finite width at high frequency, with noise level 10^{−…}. Shown is the comparison between the actual spectrum (red solid line), the smooth SOCC spectrum (blue short-dashed line), and the maximally pulled-up high-energy peak SOCC solution (green dashed line). The error bars for the smooth SOCC spectrum {σ_m} are determined from Eq. (7).

One can see in Figs. 10 and 11 that both SOCC and MEM give a good description of the spectral function in the vicinity of the chemical potential (at z = 0). Also, the height of the middle peak near zero frequency cannot be significantly pulled up without distorting the rest of the spectrum. Specifically for MEM, trying different default models with one, two, or three Gaussian peaks did not improve the answer; in all cases, attempts to find narrower peaks resulted in secondary oscillations reminiscent of numerical instabilities. We conclude that the fermionic spectrum can be restored with high quality for the given parameters.

V. APPLICATION OF SOCC TO THE FERMI POLARON PROBLEM
We now test the SOCC method on a physical system: the resonant Fermi polaron (a spin-down fermion in a sea of noninteracting spin-up fermions), here in three dimensions and for equal masses of the spin-up and spin-down particles. The coupling between the polaron and its environment is characterized by a single dimensionless parameter k_F a, where k_F is the Fermi wave vector and a the s-wave scattering length. Here we examine a typical situation at k_F a = 0.8, where the polaron peak is nearly a δ-function.

FIG. 9. (Color online.) Results for test 3, characterized by a δ-function at low frequency and a peak of finite width at high frequency, with noise level 10^{−…}. Shown is the comparison between the actual spectrum (red solid line), the MEM spectrum with a flat default model (blue short-dashed line), and the MEM spectrum with a double-Gaussian default model where the second peak is maximally pulled up (green dashed line).

The imaginary-time polaron Green's function at zero temperature and zero momentum, g_n = g(τ_n), was obtained with diagrammatic Monte Carlo, see Refs. 1, 13, and 36. We are able to achieve very high precision in our results for g(τ), with a relative error as low as O(10^{−…}) at τ close to zero, O(10^{−…}) around τ = 1/ε_F (where ε_F is the Fermi energy of the spin-up fermions), and a few percent at the largest τ considered for the analytic continuation. The kernel at zero temperature is K(τ_n, z) = e^{−zτ_n}.

The polaron spectral function features two peaks. The position and weight of the first polaron peak are fixed with high accuracy by the asymptotic decay of the Green's function, −g(τ → ∞) → Z e^{−Eτ}. Our data at large ε_F τ_n can be fitted to a single exponential (within error bars), indicating that the polaron remains a well-defined quasiparticle in this parameter range. The particle-hole continuum emerges at higher frequencies as a second broad peak. A key question we want to address here is its spectral width. This has been discussed in the context of the repulsive polaron state. In order for it to qualify as a well-defined quasiparticle, the peak width needs to be sufficiently narrow (much smaller than the Fermi energy, corresponding to a sufficiently long lifetime). Thus, resolving the width accurately is very important for correctly interpreting this spectrum. Note that this spectrum has the same general features as the spectrum for test 1 in the previous section.

When SOCC is used to produce a smooth solution, the second peak emerges as a broad spectral feature.
If this were indeed the case, the metastable repulsive polaron picture would be inapplicable for k_F a = 0.8. However, the same set of basic solutions can be optimized to have a much narrower peak, see Fig. 12, implying that a well-defined repulsive polaron quasiparticle cannot be ruled out. Given that the second-peak dispersion can be reduced by a factor of four without compromising the accuracy of the final solution, we have to conclude that the quality of the input data is insufficient to determine the actual width.

FIG. 10. (Color online.) Results for test 4 with a Fermi distribution kernel and noise level 10^{−…}. Shown is the comparison between the actual spectrum (red solid line), the smooth SOCC spectrum (blue short-dashed line), and the maximally pulled-up central peak SOCC solution (green dashed line). The logarithmic plot in the inset highlights the comparison of low-intensity features. The error bars for the smooth SOCC spectrum {σ_m} are determined from Eq. (7).

FIG. 11. (Color online.) Results for test 4 with a Fermi distribution kernel and noise level 10^{−…}. Shown is the comparison between the actual spectrum (red solid line) and the MEM spectrum in the default setup (blue dashed line).

FIG. 12. (Color online.) Spectral density of the resonant Fermi polaron for k_F a = 0.8. The smooth SOCC solution with second moment σ = 0.48 is shown by the blue short-dashed line. However, a much narrower solution for the second peak with σ = 0.12 (green dashed line) can also be obtained from the same set of basic solutions.
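The single-exponential fit of the large-τ tail, used above to fix the position and weight of the polaron peak, can be sketched as follows. The Z and E values and the noise level are made up for illustration; the real g(τ_n) comes from diagrammatic Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic large-tau tail: -g(tau) -> Z exp(-E tau); Z and E are illustrative.
Z_true, E_true = 0.8, 1.5
tau = np.linspace(8.0, 20.0, 50)
g = -Z_true * np.exp(-E_true * tau) * (1.0 + 1e-3 * rng.standard_normal(tau.size))

# Fit ln(-g) = ln Z - E tau with a straight line; systematic deviations from
# linearity at large tau would signal a breakdown of the quasiparticle picture.
slope, intercept = np.polyfit(tau, np.log(-g), deg=1)
E_fit, Z_fit = -slope, np.exp(intercept)
print(E_fit, Z_fit)
```

In practice, one would also verify that the fitted E and Z are stable against shifting the fit window to larger τ, which is what "fitted to a single exponential within error bars" amounts to.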
VI. CONCLUSIONS
The most challenging aspect of numerical analytic continuation is not the algorithm of finding a stable (smooth) solution consistent with the input data, but the protocol of assessing its accuracy and unambiguity. We have implemented such a protocol based on the method of stochastic optimization with consistent constraints and demonstrated how a similar strategy can be followed with the maximum entropy method by exploring the space of default models. Irrespective of the method, the procedure has to deal with either integrals of the spectral function (rather than the function itself) and/or certain a priori and a posteriori constraints consistent with the error bars on the input data.

It is important to distinguish between two cases. The first (simplest) case is when all physically meaningful solutions do not differ substantially, up to a possible smearing of unimportant (below the resolution) fine details of the otherwise smooth spectral function. The second case, exemplified by the spectral function in Fig. 1, is when a piece of important physical information is inevitably lost. In the first case, a reasonable characterization of uncertainties can be achieved by coarse-graining, as in, e.g., Eq. (3). In the second case, one has to employ a more elaborate approach to reveal the different possible physical solutions that do not compromise the error bars of the input data.

Much of our attention has been paid to the protocol for treating the second case. We have shown how it can be handled with SOCC and modified MEM. With MEM, one has to explore various default models and the resulting solutions that remain consistent with the input error bars. A useful feature of the SOCC approach is that such an analysis (and, more generally, the application of all possible consistent constraints) can be implemented at the post-processing stage, using a representative set of "basic" solutions generated by the stochastic-optimization protocol.
The linearity of the problem (1) is crucial here, as it guarantees that any superposition of basic solutions with non-negative weights is also a solution of Eq. (1) within the same or better level of accuracy. Even if the superposition coefficients are allowed to be negative, the procedure typically keeps the accuracy of the final solution at the level of the basic solutions. This allows one to implement consistent constraints by choosing the superposition coefficients to minimize the corresponding objective function.
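This linearity argument can be illustrated directly on a toy problem. Here the "basic solutions" are hypothetical stand-ins (small perturbations of a known spectrum), and `chi2` plays the role of the goodness-of-fit measure; none of the names below are from the actual SOCC code:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy linear problem g = int K A dz with noisy "data".
z = np.linspace(0.0, 4.0, 200)
dz = z[1] - z[0]
tau = np.linspace(0.0, 5.0, 60)
K = np.exp(-np.outer(tau, z)) * dz
A_true = np.exp(-(z - 1.0)**2 / 0.5)
A_true /= A_true.sum() * dz
err = 1e-4
g = K @ A_true + err * rng.standard_normal(tau.size)

def chi2(A):
    """Mean squared misfit of K A against the data, in units of the error bar."""
    return np.mean(((K @ A - g) / err)**2)

# Stand-ins for basic solutions: small perturbations of A_true that
# individually stay consistent with the data (the real ones come from the
# stochastic-optimization sampler and can differ far more from each other).
basic = []
for _ in range(20):
    A = np.clip(A_true * (1.0 + 1e-3 * rng.standard_normal(z.size)), 0.0, None)
    basic.append(A / (A.sum() * dz))
basic = np.array(basic)

# Because chi2 is a convex function of A, any convex combination of the
# basic solutions fits at least as well as the worst member of the set.
w = rng.random(len(basic))
w /= w.sum()
A_mix = w @ basic
print(chi2(A_mix), max(chi2(A) for A in basic))
```

Choosing the weights w to minimize an objective (peak height, dispersion over an interval, distance to a target shape) is then a small optimization over 20 numbers rather than a new continuation run, which is what makes the post-processing stage cheap.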
Acknowledgements.
This work was supported by the Simons Collaboration on the Many Electron Problem, the National Science Foundation under grant PHY-1314735, the MURI Program "New Quantum Phases of Matter" from AFOSR, and FP7/ERC starting grant No. 306897. A.S.M. is supported by the ImPACT Program of the Council for Science, Technology and Innovation (Cabinet Office, Government of Japan). Our maximum entropy implementation builds on the ALPS implementation.

[1] A. S. Mishchenko, N. V. Prokof'ev, A. Sakamoto, and B. V. Svistunov, Phys. Rev. B, 6317 (2000).
[2] A. S. Mishchenko, in: E. Pavarini, E. Koch, F. Anders, and M. Jarrell (eds.), Verlag des Forschungszentrum Jülich, 2012, ISBN 978-3-89336-796-2.
[3] N. Prokof'ev and B. Svistunov, JETP Lett., 747 (2013).
[4] R. N. Silver, D. S. Sivia, and J. E. Gubernatis, Phys. Rev. B, 2380 (1990).
[5] R. Bryan, Eur. Biophys. J., 165 (1990).
[6] J. E. Gubernatis, M. Jarrell, R. N. Silver, and D. S. Sivia, Phys. Rev. B, 6011 (1991).
[7] M. Jarrell and J. E. Gubernatis, Phys. Rep., 133 (1996).
[8] S. Fuchs, T. Pruschke, and M. Jarrell, Phys. Rev. E, 056701 (2010); A. Dirks, P. Werner, M. Jarrell, and T. Pruschke, Phys. Rev. E, 026701 (2010).
[9] O. Gunnarsson, M. W. Haverkort, and G. Sangiovanni, Phys. Rev. B, 155107 (2010); ibid., Phys. Rev. B, 165125 (2010).
[10] A. Schirotzek, C.-H. Wu, A. Sommer, and M. W. Zwierlein, Phys. Rev. Lett., 230402 (2009).
[11] C. Kohstall, M. Zaccanti, M. Jag, A. Trenkwalder, P. Massignan, G. M. Bruun, F. Schreck, and R. Grimm, Nature, 615–618 (2012).
[12] M. Koschorreck, D. Pertot, E. Vogt, B. Fröhlich, M. Feld, and M. Köhl, Nature, 619–622 (2012).
[13] O. Goulko, A. S. Mishchenko, N. Prokof'ev, and B. Svistunov, arXiv:1603.06963.
[14] M. Parish and J. Levinsen, arXiv:1608.00864.
[15] K. Kamikado, T. Kanazawa, and S. Uchino, arXiv:1606.03721.
[16] R. Schmidt and T. Enss, Phys. Rev. A, 063620 (2011).
[17] P. Massignan and G. M. Bruun, Eur. Phys. J. D, 83 (2011).
[18] Such a statistical treatment of multiple independent solutions was first suggested in the fast modification of the SOM method, see Ref. 32.
[19] A. N. Tikhonoff, Doklady Akademii Nauk SSSR, 195 (1943); ibid., 501 (1963) [Soviet Mathematics, 1035 (1963)].
[20] D. L. Phillips, J. ACM, 84 (1962).
[21] A. N. Tikhonoff and V. Y. Arsenin, Solutions of Ill-posed Problems (Winston & Sons, Washington, 1977).
[22] A. N. Tikhonov, A. Goncharsky, V. V. Stepanov, and A. Yagola, Numerical Methods for the Solution of Ill-posed Problems (Springer-Science+Business Media, B.V., Moscow, 1995).
[23] C. E. Creffield, E. G. Klepfish, E. R. Pike, and S. Sarkar, Phys. Rev. Lett., 517 (1995).
[24] C. L. Lawson and R. J. Hanson, Solving Least Squares Problems (Society for Industrial and Applied Mathematics, Philadelphia, 1995).
[25] I. S. Krivenko and A. N. Rubtsov, JETP Lett., 768 (2012).
[26] J. Schött, I. L. M. Locht, E. Lundin, O. Grånäs, O. Eriksson, and I. Di Marco, Phys. Rev. B, 075104 (2016).
[27] J. Schött, E. G. C. P. van Loon, I. L. M. Locht, M. Katsnelson, and I. Di Marco, arXiv:1607.04212.
[28] A. W. Sandvik, Phys. Rev. B, 10287 (1998).
[29] K. Vafayi and O. Gunnarsson, Phys. Rev. B, 035115 (2007).
[30] K. S. D. Beach, arXiv:cond-mat/0403055.
[31] A. W. Sandvik, arXiv:1502.06066.
[32] T. A. Maier, private communication.
[33] A. S. Mishchenko, N. Nagaosa, G. De Filippis, A. de Candia, and V. Cataudella, Phys. Rev. Lett., 146401 (2015).
[34] P. Massignan, M. Zaccanti, and G. M. Bruun, Rep. Prog. Phys., 034401 (2014).
[35] Z. Lan and C. Lobo, J. Indian Inst. Sci., 179 (2014).
[36] N. Prokof'ev and B. Svistunov, Phys. Rev. B, 020408 (2008); ibid., Phys. Rev. B, 125101 (2008).
[37] B. Bauer et al.
Solving Least SquaresProblems , Society for Industrial and Applied Mathematics,Philadelphia, 1995. I. S. Krivenko and A. N. Rubtsov, JETP Lett. , 768(2012). J. Sch ˙ott, I. L. M. Locht, E. Lundin, O. Gr˚an`as, O. Eriks-son, and I. Di Marco, Phys. Rev. B , 075104 (2016). J. Sch ˙ott, E. G. C. P. van Loon, I. L. M. Locht, M. Kat-snelson, and I. Di Marco, arXiv:cond-mat/1607.04212. A. W. Sandvik, Phys. Rev. B , 10287 (1998). K. Vafayi and O. Gunnarsson, Phys. Rev. B , 035115(2007). K. S. D. Beach, arXiv:cond-mat/0403055. A. W. Sandvik, arXiv:1502.06066. T. A. Maier, private communication. A. S. Mishchenko, N. Nagaosa, G. De Filippis, A. de Can-dia, and V. Cataudella, Phys. Rev. Lett. , 146401(2015). P. Massignan, M. Zaccanti, and G. M. Bruun, Rep. Prog.Phys. , 034401 (2014). Z. Lan and C. Lobo, J. Indian I. Sci. , 179 (2014). N. Prokof’ev and B. Svistunov, Phys. Rev. B , 020408(2008); ibid Phys. Rev. B , 125101 (2008). B. Bauer et al.et al.