χ_top(T ≫ T_c) in pure-glue QCD through reweighting
P. Thomas Jahn, Guy D. Moore, and Daniel Robaina
Institut für Kernphysik (Theoriezentrum), Technische Universität Darmstadt, Schlossgartenstraße 2, D-64289 Darmstadt, Germany

We calculate the topological susceptibility at 2.5 T_c and 4.1 T_c in SU(3) pure Yang-Mills theory. We define topology with the help of gradient flow, and we largely overcome the problem of poor statistics at high temperatures by applying a reweighting technique in terms of the topological charge, measured after a specific small amount of gradient flow. This allows us to obtain a sample of configurations which compares topological sectors with good statistics, with enhanced tunneling between topologies. We quote continuum-extrapolated results at these two temperatures and conclude that our method is viable and can be extended without new conceptual problems to the case of full QCD with fermions.

I. INTRODUCTION
Two of the most challenging problems in particle physics are the strong CP problem and the origin of dark matter. The axion [1, 2], a hypothetical light scalar particle which appears in the Peccei-Quinn mechanism [3, 4], could solve both problems at once: the additional degrees of freedom explain why the CP-violating phase Θ_QCD in the QCD Lagrangian vanishes [4], and the corresponding particle could play the role of dark matter in the Universe [5–7]. There is currently a lot of experimental effort to detect axions; for a review we refer to Ref. [8]. From a theory point of view, the properties of the axion are sensitive to the topological susceptibility of QCD, χ(T), defined as

    χ(T) = ∫ d⁴x ⟨q(x) q(0)⟩_β = (1/βV) ⟨Q²⟩,   (1)

with q(x) the topological charge density and Q = ∫ d⁴x q(x) its integral (defined below). While the topological susceptibility at low temperatures is well established [9], calculations become much more challenging at high temperatures, and axion cosmology requires knowledge of χ(T) at temperatures up to about 7 T_c [10, 11]. Recently there has been a burst of progress in determining χ(T) at high temperatures [12–18]. However, we feel that it would still be valuable to make an independent study of the topological susceptibility which reaches up to 7 T_c.

At high temperatures topology is expected to be dominated by rare single-instanton [19, 20] (really caloron [21]) configurations with a typical size ρ of order 1/T [22]. These configurations become more suppressed as one considers higher temperatures, by χ(T) ∝ T⁻⁷ (at lowest perturbative order in pure-glue gauge theory; with light fermions there is an additional factor of T^(−N_f/3)). This makes studying topology by lattice Monte Carlo simulations challenging; in an ensemble of high-temperature gauge theory configurations, virtually none of the configurations will possess topology, leading to severely limited statistics. For instance, if we keep the number of temporal points across the lattice fixed, the instanton density in terms of lattice sites vanishes as T⁻¹¹. Furthermore, the efficiency with which a Markov-chain algorithm samples these topological configurations will be additionally suppressed, because the chain must pass through "small" instantons (or dislocations) to move between distinct topological sectors, and these dislocations get rarer with decreasing lattice spacing as a power of a.

One way around this problem is to measure topology at a low temperature where instantons are not rare, and to work up to high temperatures differentially by studying fixed-topology ensembles [12, 18]. But we feel it is important as a cross-check to be able to perform a direct study of topology at high temperature. This will require a reweighting procedure to overcome the sampling challenges. Our goal in this paper is to present such a reweighting approach. Since this work is exploratory, we will content ourselves with a study of the quenched (or pure-glue) theory. Once the technique is well established, we see no obstacles in adapting it to the unquenched case (though there will be the usual increase in numerical cost associated with going from pure glue to unquenched).

This paper is organized as follows: In Sec. II, we discuss in detail the method that we use to enhance the number of instantons in the lattice simulations, namely a combination of gradient flow and reweighting. Results for the topological susceptibility of the quenched theory up to 4.1 T_c obtained with this method are presented in Sec. III. Conclusions and a discussion can then be found in Sec. IV.

II. METHOD
The statistical problem of calculating the topological susceptibility at high temperatures on the lattice is twofold. First, the quantity is expected to be physically very small, which will result in most of the configurations having Q = 0. Only by collecting a huge amount of statistics can a meaningful statement about expectation values involving the topological charge be made. This problem becomes more severe as one goes to higher temperatures. In addition, the algorithm used to generate the sample (usually heatbath/overrelaxation or HMC) tends to get stuck in the different topological sectors, with tunneling events between sectors more and more suppressed as one takes the continuum limit. Here we show how to use reweighting to overcome both problems.

A. Definition of reweighting
Our reweighting approach is an evolution of those in Refs. [23–26]. Since the topological charge is not restricted to integer values on the lattice, rare events exist that will enable tunneling between different sectors. The goal is then to enhance those and use them to generate a sample of configurations almost homogeneously distributed across the topological sectors of interest. At the same time, it is mandatory to know by how much they were enhanced, so that this effect can be removed at the end without losing statistical power. In the following, we describe one way of achieving this for the case of pure SU(3) Yang-Mills theory with periodic boundary conditions.

The nonperturbative approach of lattice gauge theory is based on a stochastic evaluation of the partition function

    Z = ∫ DU e^{−S_W[U]}.   (2)

Here S_W[U] is the ordinary SU(3) plaquette Wilson action and U are the links. The algorithmic challenge consists in obtaining a sample of configurations which is precisely distributed according to the probability distribution

    dP(U) = (1/Z) e^{−S_W[U]} dU.   (3)

In this way, importance sampling enables the calculation of expectation values of gauge-invariant operators via the simple mean

    ⟨O⟩ = (1/N) Σ_{i=1}^{N} O_i.   (4)

At temperatures well above T_c, the ordinary approach just described will yield an ensemble with very little topological information. Reweighting works by rewriting Eq. (2) as

    Z = ∫ DU e^{−S_W[U] + W(ξ)} e^{−W(ξ)},   (5)

and therefore Eq. (4) turns into

    ⟨O⟩ = Σ_{i=1}^{N} O_i e^{−W(ξ_i)} / Σ_{i=1}^{N} e^{−W(ξ_i)}   (6)

if the ensemble is distributed according to the modified probability distribution

    dP_rew(U) = e^{−S_W[U] + W(ξ)} dU / ∫ DU e^{−S_W[U] + W(ξ)}.   (7)

Notice that it is guaranteed that Eqs. (4) and (6) yield identical results for any choice of the reweighting function W, as long as our algorithm converges to Eq. (7) and N → ∞.
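To make Eq. (6) concrete, here is a minimal numerical sketch with invented numbers: a rare |Q| = 1 sector is enhanced by a suitable W so that it is sampled as often as Q = 0, and the reweighted average of Q² recovers the tiny value the unmodified ensemble would give.

```python
import numpy as np

# Toy illustration of Eq. (6), with invented numbers. Suppose the true
# (unreweighted) ensemble has P(|Q| = 1) = 1e-3 and P(Q = 0) = 0.999, and we
# enhance the |Q| = 1 sector with W(1) = ln(999), W(0) = 0, so that both
# sectors are sampled equally often by the modified Markov chain.
W = {0.0: 0.0, 1.0: np.log(999.0)}

# An idealized reweighted sample: equal populations in the two sectors.
xi = np.array([0.0] * 1000 + [1.0] * 1000)

def reweighted_mean(obs, xi, W):
    """Eq. (6): <O> = sum_i O_i e^{-W(xi_i)} / sum_i e^{-W(xi_i)}."""
    w = np.exp(-np.array([W[x] for x in xi]))
    return float(np.sum(obs * w) / np.sum(w))

Q2 = xi**2                     # the observable Q^2 on each configuration
est = reweighted_mean(Q2, xi, W)
print(est)                     # recovers <Q^2> = 1e-3 of the true ensemble
```

The point of the exercise is that the e^{−W} factors in Eq. (6) exactly undo the enhancement, so the final answer is independent of the choice of W; only the statistical errors depend on it.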
The argument ξ is an arbitrary set of reweighting variables which need to be measured on each produced configuration.

B. Choice of reweighting variable
If chosen correctly, reweighting variables can account for a clear distinction between different phases and therefore favor or suppress certain sectors in Monte Carlo space. A natural choice is then the topological charge itself:

    Q = Σ_x q(x) = (1/(64π²)) ε_{μνρσ} Σ_x F̂ᵃ_{μν}(x) F̂ᵃ_{ρσ}(x).   (8)

Here F̂_{μν}(x) is a lattice-discretized form of the field strength. The conventional choice is the "clover" value (the average over the four square plaquettes touching the point x), but we use an a²-improved choice, Eq. (9), composed of a linear combination of 1×1 and 1×2 clover loops with coefficients chosen to cancel the leading O(a²) corrections.

However, using Q directly on the original configuration actually fails, because the topological density contains high-dimension operator corrections which are not topological and which receive large random additive contributions. The solution is well known; we should apply some amount of gradient flow [28, 29] to remove the UV fluctuations responsible for this problem. We therefore define our single reweighting variable ξ as

    ξ = Q′ = (1/(64π²)) ε_{μνρσ} Σ_x ( F̂ᵃ_{μν}(x) F̂ᵃ_{ρσ}(x) )|_{t′},   (10)

where t′ denotes a relatively small amount of Wilson flow. Specifically, we choose t′ to be enough Wilson flow that topology-1 configurations are clearly distinguished from random fluctuations, but not enough to remove "dislocations," small concentrations of topological charge which are the intermediate steps between the Q = 0 and Q = 1 sectors. Therefore, Q′ is able to distinguish between fluctuations about the Q = 0 sector, dislocations which lie between topological sectors, and genuine Q = 1 configurations. The true topological charge of the configuration is denoted by Q and is measured after a larger amount of flow. We shall come back to this distinction in Sec. II E.

C. Updating with an arbitrary weight function
Next we describe the Markov chain algorithm whoseequilibrium probability distribution isd P rew ( U ) = e − S W [ U ]+ W ( Q (cid:48) ) d U (cid:82) D U e − S W [ U ]+ W ( Q (cid:48) ) (11)assuming that the function W ( Q (cid:48) ) is already known.Although it is common practice to use the heat-bath/overrelaxation algorithm in the context of puregauge theories, the hybrid Monte Carlo algorithm (HMC)supports a conceptually simple fermionic extension, so wewill use it instead.One of the simplest ways of producing a sample ac-cording to a given probability distribution is to use aMetropolis-inspired algorithm. This algorithm fulfills de-tailed balance and therefore the Markov chain has anequilibrium distribution to which the system convergesif enough updates are done. After having evolved theHamilton equations as part of the standard moleculardynamics evolution, an accept or reject step accepts theconfiguration with probability P HMC = min (cid:8) , e − ∆ H (cid:9) , (12)where ∆ H = H f − H i is the energy difference (the sub-scripts “i” and “f” refer to “initial” and “final,” respec-tively). It is given by H ( π, U ) = (cid:80) x π ( x ) + S W [ U ].This step alone is of course not sufficient for incorporat-ing reweighting. Therefore, the configuration cannot befully accepted yet. An additional reweighting accept orreject step in terms of Q (cid:48) accepts the configuration withprobability P rew = min (cid:8) , e ∆ W (cid:9) , (13)where ∆ W = W f − W i . In total, the transition probability P ( C i → C f ) is given by P ( C i → C f ) = (cid:90) d π i d π f P G ( π i ) P M (( π i , U i ) → ( π f , U f )) × P HMC (∆ H ) P rew (∆ W ) . (14)The probability P G ( π i ) ∼ e − π with which the con-jugate momenta are chosen is drawn Gaussian as usual.The probability P M refers to the molecular dynamics evo-lution which is a deterministic process. Therefore, P M can be seen as a δ -function that evolves the fields from( π i , U i ) → ( π f , U f ) with a unit probability. 
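The structure of the two accept/reject steps, Eqs. (12) and (13), can be sketched on a toy model. The following Python sketch replaces the gauge field by a single variable x with action S(x) = x²/2, uses ξ = |x| in place of Q′, and takes an arbitrary illustrative W; it is meant only to mirror the structure of Eq. (14), not to be lattice code. Undoing the reweighting as in Eq. (6) recovers ⟨x²⟩ = 1 of the target distribution e^{−S}.

```python
import numpy as np

rng = np.random.default_rng(1)
S = lambda x: 0.5 * x * x          # toy "action"
dS = lambda x: x                   # its derivative
W = lambda xi: 0.5 * xi            # illustrative reweighting of xi = |x|

def trajectory(x, n_steps=10, dt=0.1):
    """Leapfrog evolution of the Hamilton equations (step 1 of the text)."""
    p = rng.normal()               # Gaussian conjugate momentum, P_G
    H_i = 0.5 * p * p + S(x)
    p -= 0.5 * dt * dS(x)
    for k in range(n_steps):
        x += dt * p
        p -= (dt if k < n_steps - 1 else 0.5 * dt) * dS(x)
    return x, 0.5 * p * p + S(x) - H_i

def update(x):
    x_new, dH = trajectory(x)
    if rng.random() >= min(1.0, np.exp(-dH)):   # Eq. (12): HMC accept/reject
        return x                                # rejected: keep current config
    dW = W(abs(x_new)) - W(abs(x))
    if rng.random() >= min(1.0, np.exp(dW)):    # Eq. (13): reweighting step
        return x                                # rejected: keep current config
    return x_new                                # fully accepted

x, xs = 0.0, []
for _ in range(40_000):
    x = update(x)
    xs.append(x)
xs = np.array(xs[5_000:])                       # drop thermalization

# Undo the reweighting as in Eq. (6): the target e^{-S} has <x^2> = 1.
w = np.exp(-W(np.abs(xs)))
est = float(np.sum(xs**2 * w) / np.sum(w))
print(est)                                      # close to 1
```

Because each factor min{1, e^{−ΔH}} and min{1, e^{ΔW}} separately satisfies the detailed-balance ratio, the product targets e^{−S+W} exactly, which is the content of Eq. (14).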
Our procedure can be summarized as follows:

1. Generate a candidate configuration by evolving the Hamilton equations.
2. Perform a Metropolis step in terms of ΔH(π, U).
   2.1 If accepted, store the candidate configuration.
   2.2 If rejected, return to the current configuration (go to step 1).
3. If 2.1 is true, integrate the flow equation up to flow time t′ and perform a Metropolis step in terms of ΔW(Q′).
   3.1 If accepted, return to the unflowed candidate configuration and fully accept it.
   3.2 If rejected, return to the current configuration (go to step 1).

We have described the algorithm assuming a known reweighting function W(Q′). In the next section we shall address the question of how to choose W(Q′) such that the final sample is, in the best-case scenario, homogeneously distributed across topological sectors.

D. Building the reweighting function W(Q′)

As explained in the previous subsection, a priori knowledge of the reweighting function W is mandatory to implement reweighting. In this subsection we describe how to find an optimal choice in a completely automated way. Our approach is similar to Refs. [25, 26].

We perform two HMC Markov chains: one to determine W, and one which applies the (now fixed) W function to perform our actual Monte Carlo study. Here we describe the preparatory Markov chain which determines W. This preparatory run consists of reweighting updates as described in Sec. II C, with the only difference that the function W is updated after each trajectory. In this way, we are able to force the system to visit sectors in Monte Carlo space that are rare and, at the same time, avoid those that were already visited quite often.

First, we need to define a reweighting domain Ω_rew. This is the interval in Q′ where W will account for reweighting.
A natural choice is (−Q′_max, Q′_max), where Q′_max is the highest integer value corresponding to the highest topological sector that we want to include in the reweighting sample. Since we are ultimately interested in ⟨Q²⟩, we can make use of the symmetry Q′ ↦ −Q′ and redefine Q′_new = |Q′|. For convenience, we drop the subscript "new" in what follows. In this way our reweighting domain is

    Ω_rew = [0, Q′_max].   (15)

FIG. 1. Schematic depiction of how W is built, with N_int = 5 and s = 0.4. The red dot indicates the measured Q′, while the orange points and arrows show the change in the W function. The solid black line shows the updated W function; the dashed line is the part of W before the update. Top: Q′ = 0.3. Bottom: Q′ = 0.….

We divide this domain into N_int intervals ω_i = [Q′_i, Q′_{i+1}], with 0 = Q′_0 < Q′_1 < … < Q′_{N_int} = Q′_max; we choose W(Q′ > Q′_max) = W(Q′_max) in this (semi-infinite) interval. In other words, values above the top edge of our reweighting domain are not rejected; they are just not reweighted any higher than the boundary value of the domain. To summarize,

    W(Q′) = { (1 − x) W_i + x W_{i+1},   Q′ ∈ ω_i
            { W_{N_int},                 Q′ > Q′_max   (16)

with

    x = (Q′ − Q′_i) / (Q′_{i+1} − Q′_i).   (17)

Having defined our reweighting function in the domain of interest, we can start making reweighting updates. After letting the system thermalize with ordinary HMC updates, we begin building the reweighting function with reweighting updates. We start with a constant function W(Q′) ≡ 0. (Notice that overall additive constants are irrelevant.) Our philosophy is that, whatever value of Q′ we currently have, this value is presumably oversampled, and should be made less common by reducing W(Q′) at the current value. Because W is piecewise linear, the most local change we can make is to change the values at the two edges of the current interval. If Q′ ∈ ω_i with i ≠ 0, only the corresponding values W_i and W_{i+1} are changed, according to

    W_i → W_i − s (1 − x),   (18)
    W_{i+1} → W_{i+1} − s x,   (19)

while the rest of the function remains unaffected. This procedure is illustrated in Fig. 1.

The first interval, Q′ ∈ ω_0, is a special case. We need to remember that Q′ is strictly positive, and therefore W_0 would get updated less than the rest of the points. We correct for this via

    W_0 → W_0 − 2s (1 − x),   (20)
    W_1 → W_1 − s x.   (21)

The value of s controls by how much W changes with each update. As soon as the gross features of W arise, we decrease its value to slowly reach convergence. In order to do so, it is instructive to introduce the notion of a complete sweep.
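The piecewise-linear W of Eqs. (16)-(17) and the local update of Eqs. (18)-(21) can be sketched in a few lines of Python; the interval edges, the value of s, and the factor-of-two compensation of the first interval are illustrative choices in this sketch, not a prescription from our runs.

```python
import numpy as np

# Illustrative setup: N_int = 5 equal intervals on [0, Q'_max = 1].
edges = np.linspace(0.0, 1.0, 6)      # interval edges Q'_0 ... Q'_5
Wvals = np.zeros(len(edges))          # the values W_i at the edges

def W(qp):
    """Eq. (16): linear interpolation inside the domain, flat above Q'_max."""
    if qp >= edges[-1]:
        return Wvals[-1]
    i = np.searchsorted(edges, qp, side="right") - 1
    x = (qp - edges[i]) / (edges[i + 1] - edges[i])   # Eq. (17)
    return (1 - x) * Wvals[i] + x * Wvals[i + 1]

def update_W(qp, s):
    """Push W down at the current Q', Eqs. (18)-(21)."""
    i = np.searchsorted(edges, min(qp, edges[-1] - 1e-12), side="right") - 1
    x = (qp - edges[i]) / (edges[i + 1] - edges[i])
    if i == 0:
        # one-sided first interval: compensate (here by a factor of two)
        Wvals[0] -= 2 * s * (1 - x)
        Wvals[1] -= s * x
    else:
        Wvals[i] -= s * (1 - x)
        Wvals[i + 1] -= s * x

update_W(0.5, s=0.4)                  # a measurement at Q' = 0.5
print(W(0.5))                         # W is now lower around Q' = 0.5
```

Each update lowers W only at the two edges bracketing the measured Q′, so the function is shaped locally, one trajectory at a time.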
We refer to a complete sweep when the reweighting variable Q′ ranges from ω_0 to ω_{N_int−1} and back to ω_0. We count the number of updates needed to accomplish this and name it M. There is no need that it visit all intervals. The combination δW = sM/N_int tells us by how much, on average, one point of W has been changed during the completion of the last sweep. After each complete sweep we compute δW and reduce s to

    s → max{ s/2, s (1 − δW/(1.5 × 2)) }.

Therefore a sweep which changes a point in W by on average more than 1.5 will lead to s being cut in half, while a sweep which changes a point in W by a smaller average amount will result in reducing s less. After the value of s has been updated, one resets the counter M back to zero, waits for the next completed sweep to appear, and repeats the process.

Eventually, after several sweeps, once the gross features of W have arisen, δW will get small, since s is being consistently lowered, and the value of M also should get smaller, since fewer trajectories are needed for a completed sweep to appear (that is the whole idea of this update). We consider the procedure to be complete, and W(Q′) to be ready for use in a Monte Carlo study, when δW < 0.1. An animated GIF of how W(Q′) evolves in this process is included in the Supplemental Material.

Our approach bears some similarities to the "metadynamics" approach [30–32], which has also been considered for this problem. One difference is in the way the W(ξ) function is found. Some metadynamics implementations vary W(ξ) throughout the course of the evolution, while others guess an initial value and keep it fixed. We advocate a hybrid approach where W(ξ) is varied at first to optimize its form, and then frozen to produce a truly detailed-balance-respecting evolution. We also propose a specific, we believe quite efficient, choice for the W(ξ) function and its update. The other difference is that, in metadynamics, the W(ξ) function is included as part of a "force" term in an HMC evolution, whereas we implement it purely through a Metropolis accept-reject step. The force-term approach is more efficient, since HMC trajectories can be longer and because the acceptance rate is higher. But it requires evaluating the field-derivative of ξ, which may not always be possible or practical. For instance, because we implement gradient flow through stout smearing [33], our Q′ is a differentiable function of the link variables. But because we use many stout smearing steps, the differential expression is extremely unwieldy; within a quenched simulation our Metropolis implementation is much more efficient. But because the fermionic force term is also very expensive, the price may be worth paying in the unquenched case; indeed, after our first draft of this paper appeared but before its final publication, Bonati et al. succeeded in applying such a metadynamic method to the unquenched case [34].

Other approaches may also be available. Recently Bonati et al. [35] presented several algorithms to solve the problem of topological freezing in the context of a simple quantum mechanical system which shares basic similarities with the problem at hand. In particular, Sec.
IV F contains very similar concepts to the ones used in this paper. However, it is not clear how the most effective algorithms they found could be generalized to the topology problem in QCD.

E. Parameters to Tune
The procedure described in the last section allows for the automated determination of W(Q′), allowing for an efficient reweighting. But several parameters remain to be determined, and we found in practice that a certain amount of hand tuning was needed to select them.

First, there is the depth of gradient flow to use in establishing Q′. (In practice we actually used stout smearing [33] with step size 0.06 as our gradient flow algorithm. This would be totally inadequate if our goal were a precision study of flowed operator expectation values, but here it is only important to suppress UV fluctuations, so a more efficient if less careful implementation of flow should be adequate.) We found that too small a value of t′ fails to separate configurations of different topology, while a modestly larger value is enough; larger amounts of gradient flow start to destroy the dislocations, which makes it more difficult to find the configurations intermediate between topological sectors. Optimally, one should perform several beginning-to-end determinations of χ, each on the same lattice and temperature, but each time using different t′ values, to do a systematic study of which choice leads to the highest statistics for a fixed computational effort; but we have not done this.

Second, there is the choice of the number and location of intervals. We found 20 intervals to be adequate, except that there were two "corners" in the W(Q′) function where its slope rather abruptly changes; see Fig. 3. These appear to be the points where the dominant type of configuration changes (regular thermal configuration to dislocation, dislocation to full-sized caloron), and the Monte Carlo simulation tends to get stuck at these points. We partly cured this by using more, narrower intervals at these points, which handles finer structure in the reweighting function and also leads to the algorithm spending more time near these points.
So far we have done this by hand tuning, though presumably an automated method of interval adjustment could be developed, based for instance on the curvature of the determined W(Q′) function.

Next, there are the details of the parameter s, which controls how fast we adjust the W(Q′) function. We tried variations on the procedure described above and found little change to the efficiency with which a good W(Q′) is generated. In any case, if high statistics are desired, the Monte Carlo with fixed W(Q′) takes most of the computational effort.

Next, there is the length of molecular-dynamics time used in the HMC updates. A larger HMC step leads to a larger change in the configuration, which is good because it more efficiently explores the phase space. But it leads to larger changes in the Q′ value and therefore to a higher rejection rate. So the HMC trajectory length needs to be tuned to provide about a 50% acceptance rate in the e^{ΔW} acceptance step. Again, 50% is a rule of thumb; a more careful analysis would compare the total achieved statistics at fixed numerical effort as a function of HMC trajectory length. To date we have not carried out such a study, and have instead used the 50% rule of thumb. Our results in what follows used HMC trajectory lengths of 0.2–0.25 a. We are well aware that such a small trajectory length will result in big autocorrelation effects between configurations. We have taken special care in providing a reliable error estimate by making a careful error analysis based on binning and jackknife [36].

Finally, there is the choice of the final observable used to determine χ(T). Every 100 HMC trajectories, we make a measurement of Q which we use in our statistical analysis of the susceptibility. We use |Q| after some amount t of gradient flow and set its value to 1 if |Q| ≥ Q_thresh and to 0 if |Q| < Q_thresh. (Configurations with larger |Q| turn out to be too rare to matter, as we discuss below.) This leaves open the choice of t, of Q_thresh, and of the gradient flow procedure (Wilson versus Zeuthen [37]). All choices should lead to the same continuum limit, and it is not expensive to sample using various choices and compare. This is what we will do; any difference between χ(T) values due to different threshold or flow depth will indicate deficiencies at our lattice spacing, and must be seen to vanish when we take the continuum limit.

III. RESULTS
Our goal in this paper is to demonstrate our method and show that it can obtain statistically powerful results at high temperatures in a range of lattice spacings and volumes.

TABLE I. The lattices used in this work. Those labeled with A correspond to 2.5 T_c, while the Bs are simulations at 4.1 T_c. (Columns: lattice label, T/T_c, 6/g², and the lattice extents.)

With this in mind, we study 10 different lattices, as listed in Tab. I. We use the Wilson gauge action at two temperatures corresponding to 2.5 T_c and 4.1 T_c; the values of β are taken from Refs. [38, 39]. At the higher temperature we consider aspect ratios between 1:1 and 4:1 with N_τ = 8, and at both temperatures we consider lattice spacings with N_τ = 6, 8, 10 with an aspect ratio of about 2.5:1. This allows one study of the volume scaling, and lets us take the continuum limit, but it is not sufficient to consider both limits simultaneously. All calculations were carried out over a six-month period on one eight-core desktop machine and one server node with two Xeon-Phi (KNL) CPUs. By modern standards this is an extremely modest computational budget.

The first question is: is it sufficient to sample only the Q = 0 and |Q| = 1 sectors, or are larger values of |Q| also important in establishing the topological susceptibility? To study this, first look at Eq. (6) and consider what happens when we use Q with a threshold as the observable:

    ⟨Q²⟩ = Σ_i Q_i² e^{−W(Q′_i)} / Σ_i e^{−W(Q′_i)}
         ≃ [ Σ_{i:|Q|=1} e^{−W(Q′_i)} + 4 Σ_{i:|Q|=2} e^{−W(Q′_i)} + … ] / Σ_{i:|Q|=0} e^{−W(Q′_i)},   (22)

where in the numerator we only have to sum over |Q| = 1 and higher configurations, since Q = 0 does not contribute, while in the denominator we only sum over |Q| = 0 configurations, because they completely dominate the ensemble. Clearly we need both Q = 0 and |Q| = 1 configurations to perform the calculation; but if our accuracy goal is 10%, then we only need |Q| = 2 and higher if the total probability to be in one of these states is at least 2.5% of that for |Q| = 1 states. Therefore we carried out the construction of W(Q′) in the domain 0 ≤ Q′ ≤ 2, shown in Fig. 2, for the lower temperature we study and an N_τ = 6 lattice (an A lattice).

FIG. 2. W function of the N_τ = 6 lattice at 2.5 T_c (an A lattice).

We see immediately from the figure that configurations with Q′ approaching 2 require a very large reweighting to occur, while Q′ > 0.75 configurations already occur with a much more modest reweighting. For our t′ values the |Q| = 2 configurations all have Q′ > 1.5, and the figure shows that such configurations are suppressed relative to |Q| = 1 configurations by a large factor. Therefore |Q| = 2 plays a tiny role in Eq. (22) and can be safely ignored. In a larger volume, an instanton-gas estimate says that the |Q| = 1 configurations should get more common with V and the |Q| = 2 configurations should get more common with V². So |Q| = 2 would start to become relevant in a box with an aspect ratio of about 15. Such enormous lattices are not needed to study χ(T = 2.5 T_c), and so we do not need to consider |Q| ≥ 2. This conclusion only strengthens for larger T, where the susceptibility is still smaller. Obviously, at some lower temperature it will break down and we will need many topological sectors; so every time we go to lower temperatures we must revisit this issue.

FIG. 3. W functions of N_τ = 8 lattices. Left: 2.5 T_c (an A lattice). Right: 4.1 T_c (a B lattice).

We proceed to compute the reweighting function W(Q′) using Q′_max = 1. Two examples are shown in Fig. 3. In each case there is a deep minimum at Q′ = 0, corresponding to ordinary Q = 0 configurations, and a much shallower minimum near Q′ = 1, corresponding to |Q| = 1 configurations. The broad plateau in between can be understood as configurations containing a dislocation. The sharp features in the 4.1 T_c plot are caused by our abruptly adjusting the width of our intervals. At finer lattices (larger N_τ) the |Q| = 1 minimum becomes deeper (or more accurately, the barrier gets higher), as the size of physical calorons becomes more different from the lattice spacing.

FIG. 4. Finite-volume dependence of the susceptibility at N_τ = 8 and T = 4.1 T_c (the N_τ = 8 B lattices), using Zeuthen flow and three different values of Q_thresh (points have been displaced for visibility).

With the W(Q′) reweighting functions in hand, we proceed to evaluate the topological susceptibility via Eq. (22). For completeness, we present all of our results in Tab. II. The errors in the table always represent our statistical uncertainty, for the given lattice spacing, temperature, volume, and Q definition. Systematic errors, particularly those associated with the continuum and large-volume limits, must be determined by comparing results from different lattices.

TABLE II. This table shows −ln(χ/T⁴) for all points plotted in Figs. 4–6. Errors are statistical only. The columns correspond to the flow action (Wilson or Zeuthen), the flow depth t/a², and the threshold Q_thresh ∈ {0.5, 0.7, 0.9}.

First, consider the large-volume limit, by analyzing χ as a function of aspect ratio, shown in Fig. 4. The figure evaluates Q after improved (Zeuthen) gradient flow and considers three different values for the threshold used to distinguish between |Q| = 1 and Q = 0: Q_thresh = 0.5, 0.7, and 0.9. Although the choice of Q_thresh introduces a systematic effect (the lower the threshold, the higher the determined χ value), this effect is statistically irrelevant already at moderate LT. In what follows we use Q_thresh = 0.7 as our standard choice and study the continuum limit using N_τ = 6, 8, and 10. We show this for T = 2.5 T_c in Fig. 5 and T = 4.1 T_c in Fig. 6 (note that the N_τ = 8 lattice has a slightly different aspect ratio than the N_τ = 6, 10 lattices; smaller for 2.5 T_c and larger for 4.1 T_c). At the higher temperature, we show results for two flow depths and two choices of flow action (Wilson and Zeuthen). The different Q definitions differ significantly for N_τ = 6 (note that the determinations use the same Markov chain, so the errors are highly correlated and the difference is statistically very significant) but are nearly indistinguishable for N_τ = 10; so issues of topology definition are seen to become small on fine lattices. The choice of topology definition is irrelevant in the continuum limit if we extrapolate in terms of ln(χ).

FIG. 5. Continuum extrapolation at T = 2.5 T_c based on the A lattices, carried out in terms of ln(χ(T)) (left) and χ(T) directly (right), using Zeuthen flow and Q_thresh = 0.7.

However, if we attempt to extrapolate χ(T, a²) linearly against a², we get very poor behavior. At T = 2.5 T_c the two continuum limits, based on extrapolating ln(χ) and extrapolating χ directly, differ by more than their error bars. And at T = 4.1 T_c, the linear extrapolations of χ(T) using different definitions of topology are incompatible, and each definition leads to a negative extrapolated value, which is clearly unphysical. On the other hand, if we perform a linear extrapolation of ln(χ) against a², the different definitions of topology produce compatible results, which are finite and physical. The reason that one should extrapolate in ln(χ) and not in χ directly, as we understand it [27], is that the topological susceptibility is controlled by the exponential suppression of the caloron action, exp(−S_caloron) = exp(−8π²/g²(μ ∼ T)). This action receives multiplicative O(a²) lattice corrections: χ ∝ exp(−S) → exp(−[1 − O(a²T²)] S). That is, the a² corrections are best viewed as a shift in the caloron action, and therefore in the logarithm of the susceptibility. Therefore an extrapolation of ln(χ(T)) in terms of a² is better justified, and better behaved. Indeed, Fig. 3 shows that S_caloron is about twice as large at T = 4.1 T_c as at T = 2.5 T_c; so the slope of the extrapolation should be twice as large in the left panel of Fig. 6 as in Fig. 5, which it is. Therefore this picture of the nature of a² errors is consistent with our findings, and an extrapolation of ln(χ) against a² is the theoretically best-motivated way to extrapolate to the continuum.

IV. DISCUSSION
We have presented a methodology for applying reweighting [23] to the measurement of topology in high-temperature pure-glue SU(3) QCD. Our approach involves reweighting in terms of a "poor man's" topological measurement Q′, that is, Q measured after a small amount of flow t′ = 0.a² using an a²-improved topological density operator. The simulation then proceeds in two stages: first we simulate while dynamically changing our reweighting function, to determine its form; then we fix the reweighting function and perform a Monte Carlo simulation to determine the topological susceptibility.

FIG. 6. Same as Fig. 5 except for T = 4.T_c, using the three B lattices. In addition we show separately the measured values for two amounts of flow, t = 1.a² and t = 2.a², and for two flow actions (Wilson and Zeuthen). Note that the linearly extrapolated continuum limit is negative for all choices.

The method is effective: with modest numerical resources we are able to treat T = 4.T_c up to an aspect ratio of 4 and lattice spacings up to N_τ = 10, obtaining good statistics. Making a full continuum extrapolation, but at a modest aspect ratio of 2.5 (extrapolating the t = 2.a², Zeuthen-flow results), we find

χ(T = 2.T_c)/T⁴ = 2. × 10⁻ e^(±.),
χ(T = 4.T_c)/T⁴ = 3. × 10⁻ e^(±.).    (23)

Our results at individual N_τ values are consistent with previous studies: our A lattice gives the same susceptibility as found by Berkowitz et al. [13], and our results at 4.T_c and N_τ = 6 (the B lattices) appear compatible with those at 4.T_c from Borsanyi et al. [17], who have significantly larger statistical errors despite applying much more numerical effort. Those authors also provide a continuum-extrapolated functional fit for χ(T), which is in reasonable agreement with our results; applying their fit form to the temperatures we studied gives χ(2.T_c) = 1. × 10⁻ T⁴ and χ(4.T_c) = 5. × 10⁻ T⁴.

Our results teach a few other lessons. On a lattice with N_τ = 6, χ is sensitive to the exact definition of topology (depth of flow, flow action, threshold).
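The two-stage procedure described above (first grow the reweighting function dynamically, then freeze it and undo the bias in the measurements) can be illustrated with a deliberately simple toy model. Everything below is an invented stand-in, not the paper's simulation: a one-dimensional variable q plays the role of Q′, with a deep "Q = 0" well, a suppressed "|Q| = 1" well, and a high barrier between them that unbiased sampling almost never crosses.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy stand-in (not the paper's code): exp(-U) has a deep well near q = 0.1,
# a well suppressed by exp(-3) near q = 1.0, and a barrier of height ~10.
NBINS, QMAX = 24, 1.2

def U(q):
    return 10.0 * np.exp(-((q - 0.55) / 0.12) ** 2) + 3.0 * (q > 0.55)

def bin_of(q):
    return min(int(q / QMAX * NBINS), NBINS - 1)

def step(q, w):
    # one Metropolis update in the biased potential U(q) + w(bin(q))
    qp = q + rng.uniform(-0.15, 0.15)
    if 0.0 <= qp <= QMAX:
        dE = U(qp) + w[bin_of(qp)] - U(q) - w[bin_of(q)]
        if dE <= 0 or rng.random() < np.exp(-dE):
            return qp
    return q

# Stage 1: dynamically grow the reweighting function w so that the biased
# distribution exp(-U - w) becomes roughly flat (Wang-Landau style).
w, f, q = np.zeros(NBINS), 1.0, 0.1
for sweep in range(40):
    for _ in range(3000):
        q = step(q, w)
        w[bin_of(q)] += f      # penalize the bin we are sitting in
    f *= 0.65                  # gradually freeze the bias
w -= w.min()

# Stage 2: freeze w, sample with the fixed bias, and undo the bias by
# weighting each measurement with exp(+w).
samples = np.empty(150000)
for i in range(samples.size):
    q = step(q, w)
    samples[i] = q
wt = np.exp(np.array([w[bin_of(s)] for s in samples]))
frac_upper = np.sum(wt * (samples > 0.75)) / np.sum(wt)

# Exact answer from direct numerical integration of exp(-U):
qs = np.linspace(0.0, QMAX, 200001)
p = np.exp(-U(qs))
exact = np.sum(p * (qs > 0.75)) / np.sum(p)
print(frac_upper, exact)  # reweighted estimate vs. exact value
```

The stage-two estimator is unbiased for any fixed reweighting function; the quality of w affects only the variance, which is why a rough, dynamically grown w from stage one suffices.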
This dependence is nearly gone by N_τ = 10 and seems not to affect the continuum extrapolation. The continuum extrapolation should be performed in terms of ln(χ), not in terms of χ itself. The continuum-extrapolation corrections to ln(χ) can be large, and they are larger at higher temperature. None of these lessons should be surprising [27].

We should not claim that our technique solves all problems, however. Looking at the Q′ value as a function of measurement number for a short portion of a Markov-chain evolution, shown in Fig. 7, we see that despite our reweighting function there are a few points where the simulation gets "stuck." It moves easily in the range 0 < Q′ < 0.27 and similarly moves easily across the next range, up to Q′ = 0.85, but it has difficulty moving from one of these ranges to the other. There is a similar "barrier" around Q′ = 0.8. These problems become more severe as we move to larger N_τ. We believe this occurs because Q′ is an incomplete descriptor, missing some other information which distinguishes between these regions. We partly overcame this problem by making more, narrower reweighting bins in these overlap regions: our reweighting procedure causes the Markov chain to spend approximately equal time in each bin, so narrower bins cause more time to be spent in these regions, which helps the Markov chain find its way between the different regions. (This is the reason for the cuspy discontinuities in Fig. 3.) However, while this helps, it hardly solves the problem, as Fig. 7 attests. (Of course, without reweighting, not a single point in the plot would rise above the lowest of these regions.)

FIG. 7. Piece of a Markov-chain history of Q′ against measurement number, for lattice B. The reweighting allows efficient sampling in three regions (0 < Q′ < 0.27, an intermediate region up to Q′ = 0.85, and an upper region up to Q′ = 1.05) but has difficulty moving between these regions.

We are searching for one or more additional observables to serve as further reweighting variables, in the hope of improving this sampling. Another inefficiency is that the number of updates needed to move between topological sectors does not improve as we increase the volume. Therefore, to achieve a given level of statistics, the numerical effort must grow linearly with the volume. We do not foresee any solution to this problem.

Conceptually there are no obstacles to applying our technique to the unquenched case (at high temperatures, where only one or a few topological sectors are relevant). However, we expect doing so to be numerically more difficult. First, the HMC algorithm requires far more computer power with fermions. Second, the Q = 1 sector has near-zero Dirac eigenvalues, while in the Q = 0 sector the smallest eigenvalue should be close to πT; the approach to the chiral limit should therefore be severe. Third, the characteristic size of a caloron is smaller with light quarks than without [22], so the lattice spacing will need to be smaller (the N_τ values larger) than what we need in pure glue. But we view these added numerical challenges as reasons that such studies should use reweighting. The temperature range where topology is relevant for axions is 3 T_c to 7 T_c [10], where topology is quite suppressed and only the |Q| = 1 sector should contribute. To overcome the challenges just mentioned in this temperature range, the improvement in the statistical sampling of |Q| = 1 from reweighting is absolutely necessary if any statistical power is to be achieved.

It is less clear that our approach has applications at lower temperatures, where multiple topological sectors are relevant.
We might hope that a similar reweighting method could help with the topological-sector sampling problem which afflicts fine lattices. However, it is not clear to us that Q′ will be an effective reweighting variable in this case. We leave study of this problem for future work.

ACKNOWLEDGMENTS

We thank Kari Rummukainen, Edwin Laermann, Christof Gattringer, Philippe de Forcrand, and Olaf Kaczmarek for valuable discussions. The authors acknowledge support by the Deutsche Forschungsgemeinschaft (DFG) through Grant No. CRC-TR 211, "Strong-interaction matter under extreme conditions." We also thank the GSI Helmholtzzentrum and the TU Darmstadt and its Institut für Kernphysik for supporting this research. This work was performed using the framework of the publicly available openQCD-1.6 package [40].

[1] S. Weinberg, Phys. Rev. Lett. 40, 223 (1978).
[2] F. Wilczek, Phys. Rev. Lett. 40, 279 (1978).
[3] R. D. Peccei and H. R. Quinn, Phys. Rev. Lett. 38, 1440 (1977).
[4] R. D. Peccei and H. R. Quinn, Phys. Rev. D 16, 1791 (1977).
[5] J. Preskill, M. B. Wise, and F. Wilczek, Phys. Lett. B 120, 127 (1983).
[6] L. F. Abbott and P. Sikivie, Phys. Lett. B 120, 133 (1983).
[7] M. Dine and W. Fischler, Phys. Lett. B 120, 137 (1983).
[8] I. G. Irastorza and J. Redondo, Prog. Part. Nucl. Phys. 102, 89 (2018).
[9] G. Grilli di Cortona, E. Hardy, J. P. Vega, and G. Villadoro, JHEP 01, 034 (2016).
[10] G. D. Moore, Proceedings, 35th International Symposium on Lattice Field Theory (Lattice 2017): Granada, Spain, June 18-24, 2017, EPJ Web Conf. 175, 01009 (2018).
[11] V. B. Klaer and G. D. Moore, JCAP 11, 049 (2017).
[12] J. Frison, R. Kitano, H. Matsufuru, S. Mori, and N. Yamada, JHEP 09, 021 (2016).
[13] E. Berkowitz, M. I. Buchoff, and E. Rinaldi, Phys. Rev. D 92, 034507 (2015).
[14] Y. Taniguchi, K. Kanaya, H. Suzuki, and T. Umeda, Phys. Rev. D 95, 054502 (2017).
[15] C. Bonati, M. D'Elia, M. Mariti, G. Martinelli, M. Mesiti, F. Negro, F. Sanfilippo, and G. Villadoro, JHEP 03, 155 (2016).
[16] P. Petreczky, H.-P. Schadler, and S. Sharma, Phys. Lett. B 762, 498 (2016).
[17] S. Borsanyi, M. Dierigl, Z. Fodor, S. D. Katz, S. W. Mages, D. Nogradi, J. Redondo, A. Ringwald, and K. K. Szabo, Phys. Lett. B 752, 175 (2016).
[18] S. Borsanyi et al., Nature 539, 69 (2016).
[19] A. A. Belavin, A. M. Polyakov, A. S. Schwartz, and Yu. S. Tyupkin, Phys. Lett. B 59, 85 (1975).
[20] G. 't Hooft, Phys. Rev. Lett. 37, 8 (1976).
[21] B. J. Harrington and H. K. Shepard, Phys. Rev. D 17, 2122 (1978).
[22] D. J. Gross, R. D. Pisarski, and L. G. Yaffe, Rev. Mod. Phys. 53, 43 (1981).
[23] B. A. Berg and T. Neuhaus, Phys. Lett. B 267, 249 (1991).
[24] K. Kajantie, M. Laine, K. Rummukainen, and M. E. Shaposhnikov, Nucl. Phys. B 466, 189 (1996).
[25] M. Laine and K. Rummukainen, Nucl. Phys. B 535, 423 (1998).
[26] F. Wang and D. P. Landau, Phys. Rev. Lett. 86, 2050 (2001).
[27] P. T. Jahn, G. D. Moore, and D. Robaina, arXiv:1805.11511 (2018).
[28] R. Narayanan and H. Neuberger, JHEP 03, 064 (2006).
[29] M. Lüscher, Commun. Math. Phys. 293, 899 (2010).
[30] A. Barducci, G. Bussi, and M. Parrinello, Phys. Rev. Lett. 100, 020603 (2008).
[31] A. Laio, G. Martinelli, and F. Sanfilippo, JHEP 07, 089 (2016).
[32] F. Sanfilippo, A. Laio, and G. Martinelli, Proceedings, 34th International Symposium on Lattice Field Theory (Lattice 2016): Southampton, UK, July 24-30, 2016, PoS LATTICE2016, 274 (2017).
[33] C. Morningstar and M. J. Peardon, Phys. Rev. D 69, 054501 (2004).
[34] C. Bonati, M. D'Elia, G. Martinelli, F. Negro, F. Sanfilippo, and A. Todaro, arXiv:1807.07954 (2018).
[35] C. Bonati and M. D'Elia, Phys. Rev. E 98, 013308 (2018).
[36] U. Wolff (ALPHA), Comput. Phys. Commun. 156, 143 (2004); Erratum: Comput. Phys. Commun. 176, 383 (2007).
[37] A. Ramos and S. Sint, Eur. Phys. J. C 76, 15 (2016).
[38] M. Caselle, A. Nada, and M. Panero, arXiv:1801.03110 (2018).
[39] L. Giusti and H. B. Meyer, Phys. Rev. Lett. 106, 131601 (2011).
[40] http://luscher.web.cern.ch/luscher/openQCD/index.html