χ_top(T ≫ T_c) in pure-glue QCD through reweighting
P. Thomas Jahn, Guy D. Moore, and Daniel Robaina
Institut für Kernphysik (Theoriezentrum), Technische Universität Darmstadt, Schlossgartenstraße 2, D-64289 Darmstadt, Germany

We calculate the topological susceptibility at 2.5 T_c and 4.1 T_c in SU(3) pure Yang-Mills theory. We define topology with the help of gradient flow, and we largely overcome the problem of poor statistics at high temperatures by applying a reweighting technique in terms of the topological charge, measured after a specific small amount of gradient flow. This allows us to obtain a sample of configurations which compares topological sectors with good statistics, with enhanced tunneling between topologies. We quote continuum-extrapolated results at these two temperatures and conclude that our method is viable and can be extended without new conceptual problems to the case of full QCD with fermions.

I. INTRODUCTION
Two of the most challenging problems in particle physics are the strong CP problem and the origin of dark matter. The axion [1, 2], a hypothetical light scalar particle which appears in the Peccei-Quinn mechanism [3, 4], could solve both problems at once: the additional degrees of freedom explain why the CP-violating phase Θ_QCD in the QCD Lagrangian vanishes [4], and the corresponding particle could play the role of dark matter in the Universe [5–7]. There is currently a lot of experimental effort to detect axions; for a review we refer to Ref. [8]. From a theory point of view, the properties of the axion are sensitive to the topological susceptibility of QCD, χ(T), defined as

    χ(T) = ∫ d⁴x ⟨q(x) q(0)⟩_β = (1/βV) ⟨Q²⟩,   (1)

with q(x) the topological charge density and Q = ∫ d⁴x q(x) its integral (defined below). While the topological susceptibility at low temperatures is well established [9], calculations become much more challenging at high temperatures, and axion cosmology requires knowledge of χ(T) at temperatures up to about 7 T_c [10, 11]. Recently there has been a burst of progress in determining χ(T) at high temperatures [12–18]. However, we feel that it would still be valuable to make an independent study of the topological susceptibility which reaches up to 7 T_c.

At high temperatures topology is expected to be dominated by rare single-instanton [19, 20] (really caloron [21]) configurations with a typical size ρ of order 1/T [22]. These configurations become more suppressed as one considers higher temperatures, by χ(T) ∝ T⁻⁷ (at lowest perturbative order in pure-glue gauge theory; with light fermions there is an additional factor of T^(−N_f/3)). This makes studying topology by lattice Monte Carlo simulations challenging; in an ensemble of high-temperature gauge theory configurations, virtually none of the configurations will possess topology, leading to severely limited statistics. For instance, if we keep the number of temporal points across the lattice fixed, the instanton density in terms of lattice sites vanishes as T⁻¹¹. Furthermore, the efficiency with which a Markov-chain algorithm samples these topological configurations will be additionally suppressed, because the chain must pass through "small" instantons (or dislocations) to move between distinct topological sectors, and these dislocations get rarer with decreasing lattice spacing as a power of a.

One way around this problem is to measure topology at a low temperature where instantons are not rare, and to work up to high temperatures differentially by studying fixed-topology ensembles [12, 18]. But we feel it is important as a cross-check to be able to perform a direct study of topology at high temperature. This will require a reweighting procedure to overcome the sampling challenges. Our goal in this paper is to present such a reweighting approach. Since this work is exploratory, we will content ourselves with a study of the quenched (or pure-glue) theory. Once the technique is well established, we see no obstacles in adapting it to the unquenched case (though there will be the usual increase in numerical cost associated with going from pure glue to unquenched).

This paper is organized as follows: In Sec. II, we discuss in detail the method that we use to enhance the number of instantons in the lattice simulations, namely a combination of gradient flow and reweighting. Results for the topological susceptibility of the quenched theory up to 4.1 T_c obtained with this method are presented in Sec. III. Conclusions and a discussion can then be found in Sec. IV.

II. METHOD
The statistical problem of calculating the topological susceptibility at high temperatures on the lattice is twofold. First, the quantity is expected to be physically very small, which will result in most of the configurations having Q = 0. Only by collecting a huge amount of statistics can a meaningful statement about expectation values involving the topological charge be made. This problem becomes more severe as one goes to higher temperatures. In addition, the algorithm used to generate the sample (usually heatbath/overrelaxation or HMC) tends to get stuck in the different topological sectors, with tunneling events between sectors more and more suppressed as one takes the continuum limit. Here we show how to use reweighting to overcome both problems.

A. Definition of reweighting
Our reweighting approach is an evolution of those in Refs. [23–26]. Since the topological charge is not restricted to integer values on the lattice, rare events exist that will enable tunneling between different sectors. The goal is then to enhance those and use them to generate a sample of configurations almost homogeneously distributed across the topological sectors of interest. At the same time, it is mandatory to know by how much they were enhanced, so that this effect can be removed at the end without losing statistical power. In the following, we describe one way of achieving this for the case of pure SU(3) Yang-Mills theory with periodic boundary conditions.

The nonperturbative approach of lattice gauge theory is based on a stochastic evaluation of the partition function

    Z = ∫ DU e^{−S_W[U]}.   (2)

Here S_W[U] is the ordinary SU(3) plaquette Wilson action and U are the links. The algorithmic challenge consists in obtaining a sample of configurations which is precisely distributed according to the probability distribution

    dP(U) = (1/Z) e^{−S_W[U]} dU.   (3)

In this way, importance sampling enables the calculation of expectation values of gauge-invariant operators via the simple mean

    ⟨O⟩ = (1/N) Σ_{i=1}^{N} O_i.   (4)

At temperatures well above T_c, the ordinary approach just described will yield an ensemble with very little topological information. Reweighting works by rewriting Eq. (2) as

    Z = ∫ DU e^{−S_W[U] + W(ξ)} e^{−W(ξ)},   (5)

and therefore Eq. (4) turns into

    ⟨O⟩ = Σ_{i=1}^{N} O_i e^{−W(ξ_i)} / Σ_{i=1}^{N} e^{−W(ξ_i)}   (6)

if the ensemble is distributed according to the modified probability distribution

    dP_rew(U) = e^{−S_W[U] + W(ξ)} dU / ∫ DU e^{−S_W[U] + W(ξ)}.   (7)

Notice that it is guaranteed that Eqs. (4) and (6) yield identical results for any choice of the reweighting function W, as long as our algorithm converges to Eq. (7) and N → ∞.
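To make Eq. (6) concrete, here is a minimal numerical sketch with invented numbers: a rare |Q| = 1 sector is enhanced by a suitable W so that it is sampled as often as Q = 0, and the reweighted average of Q² recovers the tiny value the unmodified ensemble would give.

```python
import numpy as np

# Toy illustration of Eq. (6), with invented numbers. Suppose the true
# (unreweighted) ensemble has P(|Q| = 1) = 1e-3 and P(Q = 0) = 0.999, and we
# enhance the |Q| = 1 sector with W(1) = ln(999), W(0) = 0, so that both
# sectors are sampled equally often by the modified Markov chain.
W = {0.0: 0.0, 1.0: np.log(999.0)}

# An idealized reweighted sample: equal populations in the two sectors.
xi = np.array([0.0] * 1000 + [1.0] * 1000)

def reweighted_mean(obs, xi, W):
    """Eq. (6): <O> = sum_i O_i e^{-W(xi_i)} / sum_i e^{-W(xi_i)}."""
    w = np.exp(-np.array([W[x] for x in xi]))
    return float(np.sum(obs * w) / np.sum(w))

Q2 = xi**2                     # the observable Q^2 on each configuration
est = reweighted_mean(Q2, xi, W)
print(est)                     # recovers <Q^2> = 1e-3 of the true ensemble
```

The point of the exercise is that the e^{−W} factors in Eq. (6) exactly undo the enhancement, so the final answer is independent of the choice of W; only the statistical errors depend on it.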
The argument ξ is an arbitrary set of reweighting variables which need to be measured on each produced configuration.

B. Choice of reweighting variable
If chosen correctly, reweighting variables can account for a clear distinction between different phases and therefore favor or suppress certain sectors in Monte Carlo space. A natural choice is then the topological charge itself:

    Q = Σ_x q(x) = (1/(64π²)) ε_{μνρσ} Σ_x F̂ᵃ_{μν}(x) F̂ᵃ_{ρσ}(x).   (8)

Here F̂_{μν}(x) is a lattice-discretized form of the field strength. The conventional choice is the "clover" value (the average over the four square plaquettes touching the point x), but we use an a²-improved choice, Eq. (9), composed of a linear combination of 1×1 and 1×2 clover loops with coefficients chosen to cancel the leading O(a²) corrections.

However, using Q directly on the original configuration actually fails, because the topological density contains high-dimension operator corrections which are not topological and which receive large random additive contributions. The solution is well known; we should apply some amount of gradient flow [28, 29] to remove the UV fluctuations responsible for this problem. We therefore define our single reweighting variable ξ as

    ξ = Q′ = (1/(64π²)) ε_{μνρσ} Σ_x ( F̂ᵃ_{μν}(x) F̂ᵃ_{ρσ}(x) )|_{t′},   (10)

where t′ denotes a relatively small amount of Wilson flow. Specifically, we choose t′ to be enough Wilson flow that topology-1 configurations are clearly distinguished from random fluctuations, but not enough to remove "dislocations," small concentrations of topological charge which are the intermediate steps between the Q = 0 and Q = 1 sectors. Therefore, Q′ is able to distinguish between fluctuations about the Q = 0 sector, dislocations which lie between topological sectors, and genuine Q = 1 configurations. The true topological charge of the configuration is denoted by Q and is measured after a larger amount of flow. We shall come back to this distinction in Sec. II E.

C. Updating with an arbitrary weight function
Next we describe the Markov chain algorithm whoseequilibrium probability distribution isd P rew ( U ) = e − S W [ U ]+ W ( Q (cid:48) ) d U (cid:82) D U e − S W [ U ]+ W ( Q (cid:48) ) (11)assuming that the function W ( Q (cid:48) ) is already known.Although it is common practice to use the heat-bath/overrelaxation algorithm in the context of puregauge theories, the hybrid Monte Carlo algorithm (HMC)supports a conceptually simple fermionic extension, so wewill use it instead.One of the simplest ways of producing a sample ac-cording to a given probability distribution is to use aMetropolis-inspired algorithm. This algorithm fulfills de-tailed balance and therefore the Markov chain has anequilibrium distribution to which the system convergesif enough updates are done. After having evolved theHamilton equations as part of the standard moleculardynamics evolution, an accept or reject step accepts theconfiguration with probability P HMC = min (cid:8) , e − ∆ H (cid:9) , (12)where ∆ H = H f − H i is the energy difference (the sub-scripts “i” and “f” refer to “initial” and “final,” respec-tively). It is given by H ( π, U ) = (cid:80) x π ( x ) + S W [ U ].This step alone is of course not sufficient for incorporat-ing reweighting. Therefore, the configuration cannot befully accepted yet. An additional reweighting accept orreject step in terms of Q (cid:48) accepts the configuration withprobability P rew = min (cid:8) , e ∆ W (cid:9) , (13)where ∆ W = W f − W i . In total, the transition probability P ( C i → C f ) is given by P ( C i → C f ) = (cid:90) d π i d π f P G ( π i ) P M (( π i , U i ) → ( π f , U f )) × P HMC (∆ H ) P rew (∆ W ) . (14)The probability P G ( π i ) ∼ e − π with which the con-jugate momenta are chosen is drawn Gaussian as usual.The probability P M refers to the molecular dynamics evo-lution which is a deterministic process. Therefore, P M can be seen as a δ -function that evolves the fields from( π i , U i ) → ( π f , U f ) with a unit probability. 
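The structure of the two accept/reject steps, Eqs. (12) and (13), can be sketched on a toy model. The following Python sketch replaces the gauge field by a single variable x with action S(x) = x²/2, uses ξ = |x| in place of Q′, and takes an arbitrary illustrative W; it is meant only to mirror the structure of Eq. (14), not to be lattice code. Undoing the reweighting as in Eq. (6) recovers ⟨x²⟩ = 1 of the target distribution e^{−S}.

```python
import numpy as np

rng = np.random.default_rng(1)
S = lambda x: 0.5 * x * x          # toy "action"
dS = lambda x: x                   # its derivative
W = lambda xi: 0.5 * xi            # illustrative reweighting of xi = |x|

def trajectory(x, n_steps=10, dt=0.1):
    """Leapfrog evolution of the Hamilton equations (step 1 of the text)."""
    p = rng.normal()               # Gaussian conjugate momentum, P_G
    H_i = 0.5 * p * p + S(x)
    p -= 0.5 * dt * dS(x)
    for k in range(n_steps):
        x += dt * p
        p -= (dt if k < n_steps - 1 else 0.5 * dt) * dS(x)
    return x, 0.5 * p * p + S(x) - H_i

def update(x):
    x_new, dH = trajectory(x)
    if rng.random() >= min(1.0, np.exp(-dH)):   # Eq. (12): HMC accept/reject
        return x                                # rejected: keep current config
    dW = W(abs(x_new)) - W(abs(x))
    if rng.random() >= min(1.0, np.exp(dW)):    # Eq. (13): reweighting step
        return x                                # rejected: keep current config
    return x_new                                # fully accepted

x, xs = 0.0, []
for _ in range(40_000):
    x = update(x)
    xs.append(x)
xs = np.array(xs[5_000:])                       # drop thermalization

# Undo the reweighting as in Eq. (6): the target e^{-S} has <x^2> = 1.
w = np.exp(-W(np.abs(xs)))
est = float(np.sum(xs**2 * w) / np.sum(w))
print(est)                                      # close to 1
```

Because each factor min{1, e^{−ΔH}} and min{1, e^{ΔW}} separately satisfies the detailed-balance ratio, the product targets e^{−S+W} exactly, which is the content of Eq. (14).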
Our procedure can be summarized as follows:

1. Generate a candidate configuration by evolving the Hamilton equations.
2. Perform a Metropolis step in terms of ΔH(π, U).
   2.1 If accepted, store the candidate configuration.
   2.2 If rejected, return to the current configuration (go to step 1).
3. If 2.1 is true, integrate the flow equation up to flow time t′ and perform a Metropolis step in terms of ΔW(Q′).
   3.1 If accepted, return to the unflowed candidate configuration and fully accept it.
   3.2 If rejected, return to the current configuration (go to step 1).

We have described the algorithm assuming a known reweighting function W(Q′). In the next section we shall address the question of how to choose W(Q′) such that the final sample is, in the best-case scenario, homogeneously distributed across topological sectors.

D. Building the reweighting function W(Q′)

As explained in the previous subsection, a priori knowledge of the reweighting function W is mandatory to implement reweighting. In this subsection we describe how to find an optimal choice in a completely automated way. Our approach is similar to Refs. [25, 26].

We perform two HMC Markov chains: one to determine W, and one which applies the (now fixed) W function to perform our actual Monte Carlo study. Here we describe the preparatory Markov chain which determines W. This preparatory run consists of reweighting updates as described in Sec. II C, with the only difference that the function W is updated after each trajectory. In this way, we are able to force the system to visit sectors in Monte Carlo space that are rare and, at the same time, avoid those that were already visited quite often.

First, we need to define a reweighting domain Ω_rew. This is the interval in Q′ where W will account for reweighting.
A natural choice is (−Q′_max, Q′_max), where Q′_max is the highest integer value corresponding to the highest topological sector that we want to include in the reweighting sample. Since we are ultimately interested in ⟨Q²⟩, we can make use of the symmetry Q′ ↦ −Q′ and redefine Q′_new = |Q′|. For convenience, we drop the subscript "new" in what follows. In this way our reweighting domain is

    Ω_rew = [0, Q′_max].   (15)

FIG. 1. Schematic depiction of how W is built, with N_int = 5 and s = 0.4. The red dot indicates the measured Q′, while the orange points and arrows show the change in the W function. The solid black line shows the updated W function; the dashed line is the part of W before the update. Top: Q′ = 0.3. Bottom: Q′ = 0.….

We divide this domain into N_int intervals ω_i = [Q′_i, Q′_{i+1}], with 0 = Q′_0 < Q′_1 < … < Q′_{N_int} = Q′_max; we choose W(Q′ > Q′_max) = W(Q′_max) in this (semi-infinite) interval. In other words, values above the top edge of our reweighting domain are not rejected; they are just not reweighted any higher than the boundary value of the domain. To summarize,

    W(Q′) = { (1 − x) W_i + x W_{i+1},   Q′ ∈ ω_i
            { W_{N_int},                 Q′ > Q′_max   (16)

with

    x = (Q′ − Q′_i) / (Q′_{i+1} − Q′_i).   (17)

Having defined our reweighting function in the domain of interest, we can start making reweighting updates. After letting the system thermalize with ordinary HMC updates, we begin building the reweighting function with reweighting updates. We start with a constant function W(Q′) ≡ 0. (Notice that overall additive constants are irrelevant.) Our philosophy is that, whatever value of Q′ we currently have, this value is presumably oversampled, and should be made less common by reducing W(Q′) at the current value. Because W is piecewise linear, the most local change we can make is to change the values at the two edges of the current interval. If Q′ ∈ ω_i with i ≠ 0, only the corresponding values W_i and W_{i+1} are changed, according to

    W_i → W_i − s (1 − x),   (18)
    W_{i+1} → W_{i+1} − s x,   (19)

while the rest of the function remains unaffected. This procedure is illustrated in Fig. 1.

The first interval, Q′ ∈ ω_0, is a special case. We need to remember that Q′ is strictly positive, and therefore W_0 would get updated less than the rest of the points. We correct for this via

    W_0 → W_0 − 2s (1 − x),   (20)
    W_1 → W_1 − s x.   (21)

The value of s controls by how much W changes with each update. As soon as the gross features of W arise, we decrease its value to slowly reach convergence. In order to do so, it is instructive to introduce the notion of a complete sweep.
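The piecewise-linear W of Eqs. (16)-(17) and the local update of Eqs. (18)-(21) can be sketched in a few lines of Python; the interval edges, the value of s, and the factor-of-two compensation of the first interval are illustrative choices in this sketch, not a prescription from our runs.

```python
import numpy as np

# Illustrative setup: N_int = 5 equal intervals on [0, Q'_max = 1].
edges = np.linspace(0.0, 1.0, 6)      # interval edges Q'_0 ... Q'_5
Wvals = np.zeros(len(edges))          # the values W_i at the edges

def W(qp):
    """Eq. (16): linear interpolation inside the domain, flat above Q'_max."""
    if qp >= edges[-1]:
        return Wvals[-1]
    i = np.searchsorted(edges, qp, side="right") - 1
    x = (qp - edges[i]) / (edges[i + 1] - edges[i])   # Eq. (17)
    return (1 - x) * Wvals[i] + x * Wvals[i + 1]

def update_W(qp, s):
    """Push W down at the current Q', Eqs. (18)-(21)."""
    i = np.searchsorted(edges, min(qp, edges[-1] - 1e-12), side="right") - 1
    x = (qp - edges[i]) / (edges[i + 1] - edges[i])
    if i == 0:
        # one-sided first interval: compensate (here by a factor of two)
        Wvals[0] -= 2 * s * (1 - x)
        Wvals[1] -= s * x
    else:
        Wvals[i] -= s * (1 - x)
        Wvals[i + 1] -= s * x

update_W(0.5, s=0.4)                  # a measurement at Q' = 0.5
print(W(0.5))                         # W is now lower around Q' = 0.5
```

Each update lowers W only at the two edges bracketing the measured Q′, so the function is shaped locally, one trajectory at a time.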
We refer to a complete sweep when the reweighting variable Q′ ranges from ω_0 to ω_{N_int−1} and back to ω_0. We count the number of updates needed to accomplish this and name it M. There is no need that it visit all intervals. The combination δW = sM/N_int tells us by how much, on average, one point of W has been changed during the completion of the last sweep. After each complete sweep we compute δW and reduce s to

    s → max{ s/2, s (1 − δW/(1.5 × 2)) }.

Therefore a sweep which changes a point in W by on average more than 1.5 will lead to s being cut in half, while a sweep which changes a point in W by a smaller average amount will result in reducing s less. After the value of s has been updated, one resets the counter M back to zero, waits for the next completed sweep to appear, and repeats the process.

Eventually, after several sweeps, once the gross features of W have arisen, δW will get small, since s is being consistently lowered, and the value of M also should get smaller, since fewer trajectories are needed for a completed sweep to appear (that is the whole idea of this update). We consider the procedure to be complete, and W(Q′) to be ready for use in a Monte Carlo study, when δW < 0.1. An animated GIF of how W(Q′) evolves in this process is included in the Supplemental Material.

Our approach bears some similarities to the "metadynamics" approach [30–32], which has also been considered for this problem. One difference is in the way the W(ξ) function is found. Some metadynamics implementations vary W(ξ) throughout the course of the evolution, while others guess an initial value and keep it fixed. We advocate a hybrid approach where W(ξ) is varied at first to optimize its form, and then frozen to produce a truly detailed-balance-respecting evolution. We also propose a specific, we believe quite efficient, choice for the W(ξ) function and its update. The other difference is that, in metadynamics, the W(ξ) function is included as part of a "force" term in an HMC evolution, whereas we implement it purely through a Metropolis accept-reject step. The force-term approach is more efficient, since HMC trajectories can be longer and because the acceptance rate is higher. But it requires evaluating the field-derivative of ξ, which may not always be possible or practical. For instance, because we implement gradient flow through stout smearing [33], our Q′ is a differentiable function of the link variables. But because we use many stout smearing steps, the differential expression is extremely unwieldy; within a quenched simulation our Metropolis implementation is much more efficient. But because the fermionic force term is also very expensive, the price may be worth paying in the unquenched case; indeed, after our first draft of this paper appeared but before its final publication, Bonati et al. succeeded in applying such a metadynamic method to the unquenched case [34].

Other approaches may also be available. Recently Bonati et al. [35] presented several algorithms to solve the problem of topological freezing in the context of a simple quantum mechanical system which shares basic similarities with the problem at hand. In particular, Sec.
IV F contains very similar concepts to the ones used in this paper. However, it is not clear how the most effective algorithms they found could be generalized to the topology problem in QCD.

E. Parameters to Tune
The procedure described in the last section allows for the automated determination of W(Q′), allowing for an efficient reweighting. But several parameters remain to be determined, and we found in practice that a certain amount of hand tuning was needed to select them.

First, there is the depth of gradient flow to use in establishing Q′. (In practice we actually used stout smearing [33] with step size 0.06 as our gradient flow algorithm. This would be totally inadequate if our goal were a precision study of flowed operator expectation values, but here it is only important to suppress UV fluctuations, so a more efficient if less careful implementation of flow should be adequate.) We found that too small a value of t′ fails to separate configurations of different topology, while a modestly larger value is enough; larger amounts of gradient flow start to destroy the dislocations, which makes it more difficult to find the configurations intermediate between topological sectors. Optimally, one should perform several beginning-to-end determinations of χ, each on the same lattice and temperature, but each time using different t′ values, to do a systematic study of which choice leads to the highest statistics for a fixed computational effort; but we have not done this.

Second, there is the choice of the number and location of intervals. We found 20 intervals to be adequate, except that there were two "corners" in the W(Q′) function where its slope rather abruptly changes; see Fig. 3. These appear to be the points where the dominant type of configuration changes (regular thermal configuration to dislocation, dislocation to full-sized caloron), and the Monte Carlo simulation tends to get stuck at these points. We partly cured this by using more, narrower intervals at these points, which handles finer structure in the reweighting function and also leads to the algorithm spending more time near these points.
So far we have done this by hand tuning, though presumably an automated method of interval adjustment could be developed, based for instance on the curvature of the determined W(Q′) function.

Next, there are the details of the parameter s, which controls how fast we adjust the W(Q′) function. We tried variations on the procedure described above and found little change to the efficiency with which a good W(Q′) is generated. In any case, if high statistics are desired, the Monte Carlo with fixed W(Q′) takes most of the computational effort.

Next, there is the length of molecular-dynamics time used in the HMC updates. A larger HMC step leads to a larger change in the configuration, which is good because it more efficiently explores the phase space. But it leads to larger changes in the Q′ value and therefore to a higher rejection rate. So the HMC trajectory length needs to be tuned to provide about a 50% acceptance rate in the e^{ΔW} acceptance step. Again, 50% is a rule of thumb; a more careful analysis would compare the total achieved statistics at fixed numerical effort as a function of HMC trajectory length. To date we have not carried out such a study, and have instead used the 50% rule of thumb. Our results in what follows used HMC trajectory lengths of 0.2–0.25 a. We are well aware that such a small trajectory length will result in big autocorrelation effects between configurations. We have taken special care in providing a reliable error estimate by making a careful error analysis based on binning and jackknife [36].

Finally, there is the choice of the final observable used to determine χ(T). Every 100 HMC trajectories, we make a measurement of Q which we use in our statistical analysis of the susceptibility. We use |Q| after some amount t of gradient flow and set its value to 1 if |Q| ≥ Q_thresh and to 0 if |Q| < Q_thresh. (Configurations with larger |Q| turn out to be too rare to matter, as we discuss below.) This leaves open the choice of t, of Q_thresh, and of the gradient flow procedure (Wilson versus Zeuthen [37]). All choices should lead to the same continuum limit, and it is not expensive to sample using various choices and compare. This is what we will do; any difference between χ(T) values due to different threshold or flow depth will indicate deficiencies at our lattice spacing, and must be seen to vanish when we take the continuum limit.

III. RESULTS
Our goal in this paper is to demonstrate our method and show that it can obtain statistically powerful results at high temperatures in a range of lattice spacings and volumes.

TABLE I. The lattices used in this work. Those labeled with A correspond to 2.5 T_c, while the Bs are simulations at 4.1 T_c. (Columns: lattice label, T/T_c, 6/g², and the lattice extents.)

With this in mind, we study 10 different lattices, as listed in Tab. I. We use the Wilson gauge action at two temperatures corresponding to 2.5 T_c and 4.1 T_c; the values of β are taken from Refs. [38, 39]. At the higher temperature we consider aspect ratios between 1:1 and 4:1 with N_τ = 8, and at both temperatures we consider lattice spacings with N_τ = 6, 8, 10 with an aspect ratio of about 2.5:1. This allows one study of the volume scaling, and lets us take the continuum limit, but it is not sufficient to consider both limits simultaneously. All calculations were carried out over a six-month period on one eight-core desktop machine and one server node with two Xeon-Phi (KNL) CPUs. By modern standards this is an extremely modest computational budget.

The first question is: is it sufficient to sample only the Q = 0 and |Q| = 1 sectors, or are larger values of |Q| also important in establishing the topological susceptibility? To study this, first look at Eq. (6) and consider what happens when we use Q with a threshold as the observable:

    ⟨Q²⟩ = Σ_i Q_i² e^{−W(Q′_i)} / Σ_i e^{−W(Q′_i)}
         ≃ [ Σ_{i:|Q|=1} e^{−W(Q′_i)} + 4 Σ_{i:|Q|=2} e^{−W(Q′_i)} + … ] / Σ_{i:|Q|=0} e^{−W(Q′_i)},   (22)

where in the numerator we only have to sum over |Q| = 1 and higher configurations, since Q = 0 does not contribute, while in the denominator we only sum over |Q| = 0 configurations, because they completely dominate the ensemble. Clearly we need both Q = 0 and |Q| = 1 configurations to perform the calculation; but if our accuracy goal is 10%, then we only need |Q| = 2 and higher if the total probability to be in one of these states is at least 2.5% of that for |Q| = 1 states. Therefore we carried out the construction of W(Q′) in the domain 0 ≤ Q′ ≤ 2, shown in Fig. 2, for the lower temperature we study and an N_τ = 6 lattice (an A lattice).

FIG. 2. W function of the N_τ = 6 lattice at 2.5 T_c (an A lattice).

We see immediately from the figure that configurations with Q′ approaching 2 require a very large reweighting to occur, while Q′ > 0.75 configurations already occur with a much more modest reweighting. For our t′ values the |Q| = 2 configurations all have Q′ > 1.5, and the figure shows that such configurations are suppressed relative to |Q| = 1 configurations by a large factor. Therefore |Q| = 2 plays a tiny role in Eq. (22) and can be safely ignored. In a larger volume, an instanton-gas estimate says that the |Q| = 1 configurations should get more common with V and the |Q| = 2 configurations should get more common with V². So |Q| = 2 would start to become relevant in a box with an aspect ratio of about 15. Such enormous lattices are not needed to study χ(T = 2.5 T_c), and so we do not need to consider |Q| ≥ 2. This conclusion only strengthens for larger T, where the susceptibility is still smaller. Obviously, at some lower temperature it will break down and we will need many topological sectors; so every time we go to lower temperatures we must revisit this issue.

FIG. 3. W functions of N_τ = 8 lattices. Left: 2.5 T_c (an A lattice). Right: 4.1 T_c (a B lattice).

We proceed to compute the reweighting function W(Q′) using Q′_max = 1. Two examples are shown in Fig. 3. In each case there is a deep minimum at Q′ = 0, corresponding to ordinary Q = 0 configurations, and a much shallower minimum near Q′ = 1, corresponding to |Q| = 1 configurations. The broad plateau in between can be understood as configurations containing a dislocation. The sharp features in the 4.1 T_c plot are caused by our abruptly adjusting the width of our intervals. At finer lattices (larger N_τ) the |Q| = 1 minimum becomes deeper (or more accurately, the barrier gets higher), as the size of physical calorons becomes more different from the lattice spacing.

FIG. 4. Finite-volume dependence of the susceptibility at N_τ = 8 and T = 4.1 T_c (the N_τ = 8 B lattices), using Zeuthen flow and three different values of Q_thresh (points have been displaced for visibility).

With the W(Q′) reweighting functions in hand, we proceed to evaluate the topological susceptibility via Eq. (22). For completeness, we present all of our results in Tab. II. The errors in the table always represent our statistical uncertainty, for the given lattice spacing, temperature, volume, and Q definition. Systematic errors, particularly those associated with the continuum and large-volume limits, must be determined by comparing results from different lattices.

TABLE II. This table shows −ln(χ/T⁴) for all points plotted in Figs. 4–6. Errors are statistical only. The columns correspond to the flow action (Wilson or Zeuthen), the flow depth t/a², and the threshold Q_thresh ∈ {0.5, 0.7, 0.9}.

First, consider the large-volume limit, by analyzing χ as a function of aspect ratio, shown in Fig. 4. The figure evaluates Q after improved (Zeuthen) gradient flow and considers three different values for the threshold used to distinguish between |Q| = 1 and Q = 0: Q_thresh = 0.5, 0.7, and 0.9. Although the choice of Q_thresh introduces a systematic effect (the lower the threshold, the higher the determined χ value), this effect is statistically irrelevant already at moderate LT. In what follows we use Q_thresh = 0.7 as our standard choice and study the continuum limit using N_τ = 6, 8, and 10. We show this for T = 2.5 T_c in Fig. 5 and T = 4.1 T_c in Fig. 6 (note that the N_τ = 8 lattice has a slightly different aspect ratio than the N_τ = 6, 10 lattices; smaller for 2.5 T_c and larger for 4.1 T_c). At the higher temperature, we show results for two flow depths and two choices of flow action (Wilson and Zeuthen). The different Q definitions differ significantly for N_τ = 6 (note that the determinations use the same Markov chain, so the errors are highly correlated and the difference is statistically very significant) but are nearly indistinguishable for N_τ = 10; so issues of topology definition are seen to become small on fine lattices. The choice of topology definition is irrelevant in the continuum limit if we extrapolate in terms of ln(χ).

FIG. 5. Continuum extrapolation at T = 2.5 T_c based on the A lattices, carried out in terms of ln(χ(T)) (left) and χ(T) directly (right), using Zeuthen flow and Q_thresh = 0.7.

However, if we attempt to extrapolate χ(T, a²) linearly against a², we get very poor behavior. At T = 2.5 T_c the two continuum limits, based on extrapolating ln(χ) and extrapolating χ directly, differ by more than their error bars. And at T = 4.1 T_c, the linear extrapolations of χ(T) using different definitions of topology are incompatible, and each definition leads to a negative extrapolated value, which is clearly unphysical. On the other hand, if we perform a linear extrapolation of ln(χ) against a², the different definitions of topology produce compatible results, which are finite and physical. The reason that one should extrapolate in ln(χ) and not in χ directly, as we understand it [27], is that the topological susceptibility is controlled by the exponential suppression of the caloron action, exp(−S_caloron) = exp(−8π²/g²(μ ∼ T)). This action receives multiplicative O(a²) lattice corrections: χ ∝ exp(−S) → exp(−[1 − O(a²T²)] S). That is, the a² corrections are best viewed as a shift in the caloron action, and therefore in the logarithm of the susceptibility. Therefore an extrapolation of ln(χ(T)) in terms of a² is better justified, and better behaved. Indeed, Fig. 3 shows that S_caloron is about twice as large at T = 4.1 T_c as at T = 2.5 T_c; so the slope of the extrapolation should be twice as large in the left panel of Fig. 6 as in Fig. 5, which it is. Therefore this picture of the nature of a² errors is consistent with our findings, and an extrapolation of ln(χ) against a² is the theoretically best-motivated way to extrapolate to the continuum.

IV. DISCUSSION
We have presented a methodology for applying reweighting [23] to the measurement of topology in high-temperature pure-glue SU(3) QCD. Our approach involves reweighting in terms of a "poor man's" topological measurement Q′, that is, Q measured after a small amount of flow t′ = 0.a² using an a²-improved topological density operator. The simulation then proceeds in two stages: first we simulate while dynamically changing our reweighting function, to determine its form; then we fix the reweighting function and perform a Monte Carlo simulation to determine the topological susceptibility.

FIG. 6. Same as Fig. 5 except for T = 4.T_c, using the three B lattices. In addition we show separately the measured values for two amounts of flow, t = 1.a² and t = 2.a², and for two flow actions (Wilson and Zeuthen). Note that the linearly extrapolated continuum limit is negative for all choices.

The method is effective: with modest numerical resources we are able to treat T = 4.T_c up to an aspect ratio of 4 and lattice spacings up to N_τ = 10, obtaining good statistics. Making a full continuum extrapolation, but at a modest aspect ratio of 2.5 (extrapolating the t = 2.a², Zeuthen-flow results), we find

χ(T = 2.T_c)/T⁴ = 2. × 10⁻ e^(±.),
χ(T = 4.T_c)/T⁴ = 3. × 10⁻ e^(±.).    (23)

Our results at individual N_τ values are consistent with previous studies: our A lattice gives the same susceptibility as found by Berkowitz et al. [13], and our results at 4.T_c and N_τ = 6 (the B lattices) appear compatible with those at 4.T_c from Borsanyi et al. [17], who have significantly larger statistical errors despite applying much more numerical effort. Those authors also provide a continuum-extrapolated functional fit for χ(T), which is in reasonable agreement with our results; applying their fit form to the temperatures we studied gives χ(2.T_c) = 1. × 10⁻ T⁴ and χ(4.T_c) = 5. × 10⁻ T⁴.

Our results teach a few other lessons. On a lattice with N_τ = 6, χ is sensitive to the exact definition of topology (depth of flow, flow action, threshold).
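The two-stage procedure described above (first grow the reweighting function dynamically, then freeze it and undo the bias in the measurements) can be illustrated with a deliberately simple toy model. Everything below is an invented stand-in, not the paper's simulation: a one-dimensional variable q plays the role of Q′, with a deep "Q = 0" well, a suppressed "|Q| = 1" well, and a high barrier between them that unbiased sampling almost never crosses.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy stand-in (not the paper's code): exp(-U) has a deep well near q = 0.1,
# a well suppressed by exp(-3) near q = 1.0, and a barrier of height ~10.
NBINS, QMAX = 24, 1.2

def U(q):
    return 10.0 * np.exp(-((q - 0.55) / 0.12) ** 2) + 3.0 * (q > 0.55)

def bin_of(q):
    return min(int(q / QMAX * NBINS), NBINS - 1)

def step(q, w):
    # one Metropolis update in the biased potential U(q) + w(bin(q))
    qp = q + rng.uniform(-0.15, 0.15)
    if 0.0 <= qp <= QMAX:
        dE = U(qp) + w[bin_of(qp)] - U(q) - w[bin_of(q)]
        if dE <= 0 or rng.random() < np.exp(-dE):
            return qp
    return q

# Stage 1: dynamically grow the reweighting function w so that the biased
# distribution exp(-U - w) becomes roughly flat (Wang-Landau style).
w, f, q = np.zeros(NBINS), 1.0, 0.1
for sweep in range(40):
    for _ in range(3000):
        q = step(q, w)
        w[bin_of(q)] += f      # penalize the bin we are sitting in
    f *= 0.65                  # gradually freeze the bias
w -= w.min()

# Stage 2: freeze w, sample with the fixed bias, and undo the bias by
# weighting each measurement with exp(+w).
samples = np.empty(150000)
for i in range(samples.size):
    q = step(q, w)
    samples[i] = q
wt = np.exp(np.array([w[bin_of(s)] for s in samples]))
frac_upper = np.sum(wt * (samples > 0.75)) / np.sum(wt)

# Exact answer from direct numerical integration of exp(-U):
qs = np.linspace(0.0, QMAX, 200001)
p = np.exp(-U(qs))
exact = np.sum(p * (qs > 0.75)) / np.sum(p)
print(frac_upper, exact)  # reweighted estimate vs. exact value
```

The stage-two estimator is unbiased for any fixed reweighting function; the quality of w affects only the variance, which is why a rough, dynamically grown w from stage one suffices.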
This dependence is nearly gone by N_τ = 10 and seems not to affect the continuum extrapolation. The continuum extrapolation should be performed in terms of ln(χ), not in terms of χ itself. The continuum-extrapolation corrections to ln(χ) can be large, and they are larger at higher temperature. None of these lessons should be surprising [27].

We should not claim that our technique solves all problems, however. Looking at the Q′ value as a function of measurement number for a short portion of a Markov-chain evolution, shown in Fig. 7, we see that despite our reweighting function there are a few points where the simulation gets "stuck." It moves easily in the range 0 < Q′ < 0.27 and similarly moves easily across the next range, up to Q′ = 0.85, but it has difficulty moving from one of these ranges to the other. There is a similar "barrier" around Q′ = 0.8. These problems become more severe as we move to larger N_τ. We believe this occurs because Q′ is an incomplete descriptor, missing some other information which distinguishes between these regions. We partly overcame this problem by making more, narrower reweighting bins in these overlap regions: our reweighting procedure causes the Markov chain to spend approximately equal time in each bin, so narrower bins cause more time to be spent in these regions, which helps the Markov chain find its way between the different regions. (This is the reason for the cuspy discontinuities in Fig. 3.) However, while this helps, it hardly solves the problem, as Fig. 7 attests. (Of course, without reweighting, not a single point in the plot would rise above the lowest of these regions.)

FIG. 7. Piece of a Markov-chain history of Q′ against measurement number, for lattice B. The reweighting allows efficient sampling in three regions (0 < Q′ < 0.27, an intermediate region up to Q′ = 0.85, and an upper region up to Q′ = 1.05) but has difficulty moving between these regions.

We are searching for one or more additional observables to serve as further reweighting variables, in the hope of improving this sampling. Another inefficiency is that the number of updates needed to move between topological sectors does not improve as we increase the volume. Therefore, to achieve a given level of statistics, the numerical effort must grow linearly with the volume. We do not foresee any solution to this problem.

Conceptually there are no obstacles to applying our technique to the unquenched case (at high temperatures, where only one or a few topological sectors are relevant). However, we expect doing so to be numerically more difficult. First, the HMC algorithm requires far more computer power with fermions. Second, the Q = 1 sector has near-zero Dirac eigenvalues, while in the Q = 0 sector the smallest eigenvalue should be close to πT; the approach to the chiral limit should therefore be severe. Third, the characteristic size of a caloron is smaller with light quarks than without [22], so the lattice spacing will need to be smaller (the N_τ values larger) than what we need in pure glue. But we view these added numerical challenges as reasons that such studies should use reweighting. The temperature range where topology is relevant for axions is 3 T_c to 7 T_c [10], where topology is quite suppressed and only the |Q| = 1 sector should contribute. To overcome the challenges just mentioned in this temperature range, the improvement in the statistical sampling of |Q| = 1 from reweighting is absolutely necessary if any statistical power is to be achieved.

It is less clear that our approach has applications at lower temperatures, where multiple topological sectors are relevant.
We might hope that a similar reweighting method could help with the topological-sector sampling problem which afflicts fine lattices. However, it is not clear to us that Q′ will be an effective reweighting variable in this case. We leave study of this problem for future work.

ACKNOWLEDGMENTS

We thank Kari Rummukainen, Edwin Laermann, Christof Gattringer, Philippe de Forcrand, and Olaf Kaczmarek for valuable discussions. The authors acknowledge support by the Deutsche Forschungsgemeinschaft (DFG) through Grant No. CRC-TR 211, "Strong-interaction matter under extreme conditions." We also thank the GSI Helmholtzzentrum and the TU Darmstadt and its Institut für Kernphysik for supporting this research. This work was performed using the framework of the publicly available openQCD-1.6 package [40].

[1] S. Weinberg, Phys. Rev. Lett. 40, 223 (1978).
[2] F. Wilczek, Phys. Rev. Lett. 40, 279 (1978).
[3] R. D. Peccei and H. R. Quinn, Phys. Rev. Lett. 38, 1440 (1977).
[4] R. D. Peccei and H. R. Quinn, Phys. Rev. D 16, 1791 (1977).
[5] J. Preskill, M. B. Wise, and F. Wilczek, Phys. Lett. B 120, 127 (1983).
[6] L. F. Abbott and P. Sikivie, Phys. Lett. B 120, 133 (1983).
[7] M. Dine and W. Fischler, Phys. Lett. B 120, 137 (1983).
[8] I. G. Irastorza and J. Redondo, Prog. Part. Nucl. Phys. 102, 89 (2018).
[9] G. Grilli di Cortona, E. Hardy, J. P. Vega, and G. Villadoro, JHEP 01, 034 (2016).
[10] G. D. Moore, Proceedings, 35th International Symposium on Lattice Field Theory (Lattice 2017): Granada, Spain, June 18-24, 2017, EPJ Web Conf. 175, 01009 (2018).
[11] V. B. Klaer and G. D. Moore, JCAP 11, 049 (2017).
[12] J. Frison, R. Kitano, H. Matsufuru, S. Mori, and N. Yamada, JHEP 09, 021 (2016).
[13] E. Berkowitz, M. I. Buchoff, and E. Rinaldi, Phys. Rev. D 92, 034507 (2015).
[14] Y. Taniguchi, K. Kanaya, H. Suzuki, and T. Umeda, Phys. Rev. D 95, 054502 (2017).
[15] C. Bonati, M. D'Elia, M. Mariti, G. Martinelli, M. Mesiti, F. Negro, F. Sanfilippo, and G. Villadoro, JHEP 03, 155 (2016).
[16] P. Petreczky, H.-P. Schadler, and S. Sharma, Phys. Lett. B 762, 498 (2016).
[17] S. Borsanyi, M. Dierigl, Z. Fodor, S. D. Katz, S. W. Mages, D. Nogradi, J. Redondo, A. Ringwald, and K. K. Szabo, Phys. Lett. B 752, 175 (2016).
[18] S. Borsanyi et al., Nature 539, 69 (2016).
[19] A. A. Belavin, A. M. Polyakov, A. S. Schwartz, and Yu. S. Tyupkin, Phys. Lett. B 59, 85 (1975).
[20] G. 't Hooft, Phys. Rev. Lett. 37, 8 (1976).
[21] B. J. Harrington and H. K. Shepard, Phys. Rev. D 17, 2122 (1978).
[22] D. J. Gross, R. D. Pisarski, and L. G. Yaffe, Rev. Mod. Phys. 53, 43 (1981).
[23] B. A. Berg and T. Neuhaus, Phys. Lett. B 267, 249 (1991).
[24] K. Kajantie, M. Laine, K. Rummukainen, and M. E. Shaposhnikov, Nucl. Phys. B 466, 189 (1996).
[25] M. Laine and K. Rummukainen, Nucl. Phys. B 535, 423 (1998).
[26] F. Wang and D. P. Landau, Phys. Rev. Lett. 86, 2050 (2001).
[27] P. T. Jahn, G. D. Moore, and D. Robaina, arXiv:1805.11511 (2018).
[28] R. Narayanan and H. Neuberger, JHEP 03, 064 (2006).
[29] M. Lüscher, Commun. Math. Phys. 293, 899 (2010).
[30] A. Barducci, G. Bussi, and M. Parrinello, Phys. Rev. Lett. 100, 020603 (2008).
[31] A. Laio, G. Martinelli, and F. Sanfilippo, JHEP 07, 089 (2016).
[32] F. Sanfilippo, A. Laio, and G. Martinelli, Proceedings, 34th International Symposium on Lattice Field Theory (Lattice 2016): Southampton, UK, July 24-30, 2016, PoS LATTICE2016, 274 (2017).
[33] C. Morningstar and M. J. Peardon, Phys. Rev. D 69, 054501 (2004).
[34] C. Bonati, M. D'Elia, G. Martinelli, F. Negro, F. Sanfilippo, and A. Todaro, arXiv:1807.07954 (2018).
[35] C. Bonati and M. D'Elia, Phys. Rev. E 98, 013308 (2018).
[36] U. Wolff (ALPHA), Comput. Phys. Commun. 156, 143 (2004); Erratum: Comput. Phys. Commun. 176, 383 (2007).
[37] A. Ramos and S. Sint, Eur. Phys. J. C 76, 15 (2016).
[38] M. Caselle, A. Nada, and M. Panero, arXiv:1801.03110 (2018).
[39] L. Giusti and H. B. Meyer, Phys. Rev. Lett. 106, 131601 (2011).
[40] http://luscher.web.cern.ch/luscher/openQCD/index.html