Classifying Topological Charge in SU(3) Yang-Mills Theory with Machine Learning
Journal of the Physical Society of Japan
FULL PAPERS
J-PARC-TH-0170
Takuya Matsumoto, Masakiyo Kitazawa*, and Yasuhiro Kohno
Department of Physics, Osaka University, Toyonaka, Osaka 560-0043, Japan
J-PARC Branch, KEK Theory Center, Institute of Particle and Nuclear Studies, KEK, 203-1 Shirakata, Tokai, Ibaraki 319-1106, Japan
Research Center for Nuclear Physics, Osaka University, Ibaraki, Osaka 567-0047, Japan
*[email protected]
We apply a machine learning technique to the identification of the topological charge of quantum gauge configurations in four-dimensional SU(3) Yang-Mills theory. The topological charge density measured on the original and smoothed gauge configurations, with and without dimensional reduction, is used as the input of neural networks (NN) with and without convolutional layers. The gradient flow is used for the smoothing of the gauge field. We find that the topological charge determined at a large flow time can be predicted with high accuracy by the trained NN from the data at small flow times; the accuracy exceeds 99% with the data at t/a ≤ 0.3. High robustness against the change of simulation parameters is also confirmed. We find that the best performance is obtained when the spatial coordinates of the topological charge density are fully integrated out as a preprocessing, which implies that our convolutional NN does not find characteristic structures in multi-dimensional space relevant for the determination of the topological charge.
1. Introduction
Quantum chromodynamics (QCD) and other Yang-Mills gauge theories in four spacetime dimensions can have topologically nontrivial gauge configurations classified by the topological charge Q taking integer values. The existence of the non-trivial topology in QCD is responsible for various non-perturbative aspects of this theory, such as the U(1) problem. The susceptibility of Q also provides an essential parameter relevant for the cosmic abundance of the axion dark matter. The topological property of QCD and Yang-Mills theories has been studied by numerical simulations of lattice gauge theory.
Because of the discretization of spacetime, gauge configurations on the lattice are, strictly speaking, topologically trivial. However, it is known that well-separated topological sectors emerge when the continuum limit is approached.
Various methods for the measurement of Q of the gauge configurations on the lattice have been proposed, which are roughly classified into fermionic and gluonic ones. In the fermionic definitions the topological charge is defined through the Atiyah-Singer index theorem, while the gluonic definitions make use of the topological charge measured on a smoothed gauge field. The values of Q measured by various methods show an approximate agreement, which indicates the existence of separated topological sectors. In lattice simulations, the measurement of the topological charge is also important for monitoring the problem of topological freezing.
In the present study, we apply the machine learning (ML) technique to the analysis of Q of gauge configurations on the lattice. The ML has been applied quite successfully to various problems in computer science, such as image recognition, object detection, and natural language processing. Recently, this technique has also been applied to problems in physics.
In the present study, we generate data by the numerical simulation of SU(3) Yang-Mills theory in four spacetime dimensions, and feed them into neural networks (NN). We use the convolutional NN (CNN) as well as the simple fully-connected NN (FNN) depending on the type of the input data. The NN are trained to predict the value of Q by supervised learning.
The first aim of this study is the development of an efficient algorithm for the analysis of Q with the aid of the ML. The second, and more interesting, purpose is the search for characteristic local structures in the four-dimensional space related to Q by the CNN. It is known that Yang-Mills theories have classical gauge configurations called instantons, which carry a nonzero topological charge and have a localized structure. If the topological charge of the quantum gauge configurations is also carried by instanton-like local objects, the CNN would recognize and make use of them for the prediction of Q. Such an analysis of four-dimensional quantum fields by the ML will open a new application of this technique.
In this study, we use the topological charge density measured on the original and smoothed gauge configurations as inputs of the NN. The smoothing is performed by the gradient flow. We also perform the dimensional reduction to various dimensions as a preprocessing of the data before feeding them into the CNN or FNN. For the definition of Q, we use a gluonic one through the gradient flow. We find that the NN can estimate the value of Q determined at a large flow time with high accuracy from the data obtained at small flow times. In particular, we show that the high accuracy is obtained by the multi-channel analysis of the data at different flow times. We argue that this method can reduce the numerical cost for the analysis of Q compared with the conventional method. We also find that the accuracy of the NN does not have a statistically-significant dependence on the dimension of the input data after the dimensional reduction. This result implies that the CNN fails in finding characteristic features related to the topology in multi-dimensional space, i.e. the quantum gauge configurations do not have such features, or their signals are too weak to be detected by the CNN.

Table I. Simulation parameters on the lattice: the inverse bare coupling β, the lattice size N, and the number of configurations N_conf.
2. Organization of this paper
In this study, we perform various analyses of the topological charge Q with the use of the CNN or FNN. First, we analyze the topological charge density q_t(x) in the four-dimensional (d = 4) space at a flow time t (the definitions of q_t(x) and t will be given in Sec. 4). Second, we perform the dimensional reduction of the input data as a preprocessing and analyze them by the NN. The dimension is reduced to d = 3, 2, and 1, and finally to d = 0, which corresponds to the analysis of Q(t) itself. We find that the accuracy of the trained NN does not show a statistically-significant dependence on d. Because the numerical cost for the supervised learning is suppressed as d becomes smaller, this means that the best performance is obtained at d = 0. To evaluate the genuine benefit of analyzing Q with the ML technique, we consider simple models to estimate Q without ML in Sec. 5. These models are used as benchmarks for the trained NN to verify whether they recognize nontrivial features of the data in Secs. 6-8.
The whole structure of this paper is summarized as follows. In the next section, we show the setup of the lattice numerical simulations. In Sec. 4, we then give a brief review on the analysis of the topology with the gradient flow. The benchmark models for the classification of Q without using the ML are discussed in Sec. 5. The application of the ML is then discussed in Secs. 6-8: we first consider the analysis of Q(t) by the FNN in Sec. 6, and then the analysis of the d = 4 data q_t(x) by the CNN in Sec. 7. In Sec. 8, we extend the analysis to d = 1, 2, and 3. The last section is devoted to discussions.
3. Lattice setup
Throughout this paper, we consider SU(3) Yang-Mills theory in the four-dimensional Euclidean space with periodic boundary conditions for all directions. The standard Wilson gauge action is used for generating the gauge configurations. We perform the numerical analyses at two inverse bare couplings β = 6/g² = 6.2 and 6.5 on lattices of size 16⁴ and 24⁴, respectively, as summarized in Table I. These lattice parameters are chosen so that the lattice volumes in physical units are almost the same on the two lattices; the lattice spacing determined in Ref. 18 shows that the difference in the lattice size L is less than 2%. The lattice size L is of the order of the inverse critical temperature 1/T_c of the deconfinement phase transition.
We generate 20,000 gauge configurations for each β, which are separated from each other by 100 Monte Carlo sweeps, where one sweep consists of one pseudo-heat-bath and five over-relaxation updates. For the discretized definition of the topological charge density on the lattice, we use the operator constructed from the clover representation of the field strength. The gradient flow is used for the smoothing of the gauge field.
To estimate the statistical error of an observable on the lattice, we use the jackknife analysis with binsize 100. We have numerically checked that the auto-correlation length of the topological charge is about 100 and 1900 sweeps for β = 6.2 and 6.5, respectively. The jackknife bin of 100 configurations, corresponding to 100 × 100 sweeps, is sufficiently larger than the auto-correlation length.
4. Topological charge
In the continuum Yang-Mills theory in the four-dimensional Euclidean space, the topological charge is defined by

Q = ∫_V d⁴x q(x),  (1)

q(x) = −(1/32π²) ε_{μνρσ} tr[ F_{μν}(x) F_{ρσ}(x) ],  (2)

where V is the four-volume and F_{μν}(x) = ∂_μ A_ν(x) − ∂_ν A_μ(x) + [A_μ(x), A_ν(x)] is the field strength. q(x) is called the topological-charge density with the coordinate x in Euclidean space.
In lattice gauge theory, Eq. (1) calculated on a gauge configuration with a discretized definition of Eq. (2) is not given by an integer, but is distributed continuously. To obtain discretized values, one may apply a smoothing of the gauge field before the measurement of q(x). In the present study, we use the gradient flow for the smoothing. The gradient flow is a continuous transformation of the gauge field characterized by a parameter t called the flow time, which has the dimension of inverse mass squared. The gauge field at a flow time t is a smoothed field with the mean-square smoothing radius √(8t). In the following, we denote the topological charge density obtained at t as q_t(x), and its four-dimensional integral as

Q(t) = ∫_V d⁴x q_t(x).  (3)

Shown in Fig. 1 is the t dependence of Q(t) calculated on 200 gauge configurations at β = 6.2 and 6.5. The horizontal axis shows the dimensionless flow time t/a with the lattice spacing a. One finds that the values of Q(t) approach discrete integer values as t becomes larger. In Fig. 2, we show the distribution of Q(t) for several values of t/a by the histogram at β = 6.5. At t = 0, the values of Q(t) are distributed continuously around the origin. As t becomes larger, the distribution converges on discretized integer values. For sufficiently large t, one can classify the gauge configurations into different topological sectors labeled by the integer topological charge Q defined, for example, by the nearest integer to Q(t). It is known that the value of Q defined in this way approximately agrees with the topological charge obtained through other definitions, and the agreement is better on finer lattices.
From Figs. 1 and 2, one finds that the distribution of Q(t) deviates from integer values toward the origin. This deviation becomes smaller as t becomes larger.
From Fig. 1, one also finds that Q(t) on some gauge configurations has a "flipping" between different topological sectors; after Q(t) shows a convergence to an integer value, it sometimes jumps into another integer. As this behavior decreases on the finer lattice, the flipping would be regarded as a lattice artifact arising from the ambiguity of the topological sectors on the discretized spacetime.
In the following, we use t/a = 4.0 for the definition of Q as

Q = round[Q(t)]|_{t/a = 4.0},  (4)

where round(x) means rounding off to the nearest integer. As indicated from Fig. 1, the value of Q hardly changes with the variation of t/a in the range 4 < t/a < 12. In Table II, we show the number of gauge configurations classified into each topological sector through this definition. The variance of this distribution, ⟨Q²⟩, is shown in the far right column. In Fig. 2, the distributions of Q(t) in individual topological sectors are shown by the colored histograms.

Fig. 1. Flow time t dependence of Q(t) on 200 gauge configurations at β = 6.2 and 6.5. The range of t is divided into three panels showing 0 ≤ t/a ≤ 0.35, 0.35 ≤ t/a ≤ 2, and t/a ≥ 2.

Fig. 2. Distribution of Q(t) at several values of t/a. The colored histograms are the distributions in individual topological sectors; see text.
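As a concrete illustration of Eqs. (3) and (4), the following minimal Python sketch evaluates Q(t) as the lattice sum of the charge density and assigns the integer sector by rounding. The array shapes and the placeholder data are assumptions for illustration only, not the analysis code used in this work.

```python
import numpy as np

def Q_of_t(q_t):
    """Eq. (3) on the lattice: Q(t) is the sum of q_t(x) over all sites
    (the integration measure is absorbed in the lattice-unit density)."""
    return float(np.sum(q_t))

def assign_sector(Q_t):
    """Eq. (4): the integer topological charge as the nearest integer to Q(t)."""
    return int(np.rint(Q_t))

# q_t: the charge density on a 24^4 lattice at t/a = 4.0 (placeholder array here)
q_t = np.zeros((24, 24, 24, 24))
print(assign_sector(Q_of_t(q_t)))   # -> 0
```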
5. Benchmark models
In this study, we analyze q_t(x) or Q(t) at small values of t by the ML technique. Here, the t used for the input has to be chosen small enough that a simple estimate of Q like Eq. (4) is not possible. In this section, before the main analysis with the ML technique, we discuss the accuracy obtained only from Q(t) without the ML. These analyses serve as benchmarks for evaluating the genuine benefit of the ML.
Throughout this study, as the performance metric of a model for an estimate of Q we use the accuracy defined by

P = (number of correct answers) / (number of total data).  (5)

Because the numbers of gauge configurations in different topological sectors differ significantly, as seen in Table II, Eq. (5) would not necessarily be a good performance metric. In particular, the topological sector with Q = 0 is the most abundant, so that a model that always answers Q = 0 already has P ≃ 0.37 (0.41) for β = 6.2 (6.5). We therefore also use the recalls of individual topological sectors, complementary to Eq. (5), to inspect the bias of the outputs of the NN models.
Table II. Number of the gauge configurations classified into each topological sector (Q = −5, ..., 5) with the definition of Q in Eq. (4), for β = 6.2 and 6.5. The far right column shows the variance ⟨Q²⟩ of the distribution of Q.

Fig. 3. Flow time t dependence of the accuracies P_naive and P_imp obtained by the models Eqs. (6) and (7), respectively, for β = 6.2 and 6.5. The dotted lines show the accuracy of the model that always answers Q = 0.

Fig. 4. Accuracy of Eq. (7) as a function of c for several values of t/a. The statistical errors are shown by the shaded bands, although the width of the bands is almost the same as the thickness of the lines.

Table III. Accuracy P_imp obtained by the model Eq. (7) with the optimization of c.

t/a     β = 6.2     β = 6.5
0       0.273(3)    0.162(3)
0.1     0.383(4)    0.274(3)
0.2     0.546(4)    0.474(4)
0.3     0.773(3)    0.713(4)
0.4     0.925(2)    0.916(2)
0.5     0.960(1)    0.989(1)
1.0     0.982(1)    0.999(0)
2.0     0.992(1)    0.999(0)
4.0     1.000(0)    1.000(0)
10.0    0.993(1)    0.999(0)

To make an estimate of Q from Q(t), we consider two simple models. The first model simply rounds off Q(t) as

Q_naive = round[Q(t)].  (6)

The accuracy obtained by this model, P_naive, as a function of t is shown in Fig. 3 by the dashed lines. The figure shows that the accuracy of Eq. (6) approaches 100% as t/a becomes larger, corresponding to the behavior of Q(t) in Figs. 1 and 2. At t/a = 4.0, Eq. (6) is equivalent to Eq. (4) and the accuracy becomes 100% by definition.
The model Eq. (6) can be improved with a simple modification. In Fig. 2, one sees that the distribution in each topological sector is shifted toward the origin from Q. This behavior suggests that Eq. (6) can be improved by applying a constant before rounding off as

Q_imp = round[c Q(t)],  (7)

where c is a parameter determined so as to maximize the accuracy in the range c > 1 for each t. In Fig. 4, we show the c dependence of the accuracy of Eq. (7) for several values of t/a. The figure shows that the accuracy has a maximum at c > 1 for each t/a, so that the model Eq. (7) achieves a better accuracy than Eq. (6) by tuning the parameter c. We denote the optimal accuracy of Eq. (7) as P_imp. In Fig. 3, the t/a dependence of P_imp is shown by the solid lines. The numerical values of P_imp are listed in Table III for several t/a. Figure 3 shows that a clear improvement of the accuracy by this single-parameter tuning is observed at intermediate values of t/a for both β = 6.2 and 6.5. We note that P_naive = P_imp = 1 at t/a = 4.0 by definition. P_imp is almost unity for t/a ≳ 1.0, which shows that the value of Q defined by Eq. (4) hardly changes with the variation of t/a in this range. Because P_imp is already close to unity at t/a = 0.5, it is difficult to obtain a nontrivial gain of the accuracy from the analysis of q_t(x) by the NN for t/a ≥ 0.5. In the following, therefore, we feed the data at t/a < 0.5 into the NN, and use P_imp as a benchmark for the accuracy obtained by the NN models.
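The benchmark models can be summarized in a few lines of Python. The sketch below implements the accuracy of Eq. (5) and the models of Eqs. (6) and (7), determining c by a simple grid scan; the scan range and the synthetic example data are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

def accuracy(pred, truth):
    """Eq. (5): fraction of configurations with the correct integer charge."""
    return float(np.mean(pred == truth))

def q_naive(Q_t):
    """Eq. (6): round Q(t) to the nearest integer."""
    return np.rint(Q_t).astype(int)

def q_improved(Q_t, c):
    """Eq. (7): rescale Q(t) by a constant c > 1 before rounding."""
    return np.rint(c * Q_t).astype(int)

def optimize_c(Q_t, Q_true, c_grid=np.linspace(1.0, 2.0, 101)):
    """Scan c and return (best c, best accuracy); the grid is an illustrative choice."""
    accs = [accuracy(q_improved(Q_t, c), Q_true) for c in c_grid]
    best = int(np.argmax(accs))
    return c_grid[best], accs[best]

# Synthetic example (not the lattice data of the paper): at small flow time
# Q(t) is shifted toward the origin relative to the true integer charge.
rng = np.random.default_rng(0)
Q_true = rng.integers(-3, 4, size=5000)
Q_t = 0.8 * Q_true + rng.normal(0.0, 0.2, size=5000)
print(accuracy(q_naive(Q_t), Q_true))   # P_naive
print(optimize_c(Q_t, Q_true))          # (optimal c, P_imp)
```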
6. Learning Q(t)

From this section we employ the ML technique for the analysis of the lattice data. As discussed in Secs. 1 and 2, among the various analyses we found that the most successful result is obtained when a set of the values of Q(t) at several t is analyzed by the FNN. In this section, we discuss this result. The analysis of the multi-dimensional data by the CNN will be reported in later sections.
Table IV. Design of the FNN used for the analysis of Q(t).

layer           output size    activation
input           3              -
full connect    5              logistic
full connect    1              -

In this section we employ a simple FNN model without convolutional layers. The FNN accepts three values of Q(t) at different t as inputs, and is trained to predict Q by supervised learning. The structure of the FNN is shown in Table IV. The FNN has only one hidden layer with five units that are fully connected with the input and output layers. We use the logistic (sigmoid) function for the activation function of the hidden layer. Although we have also tried the ReLU for the activation function, we found that the logistic function gives a better result. We employ a regression model, i.e. the output of the FNN is given by a single real number. The final prediction of Q is then obtained by rounding off the output to the nearest integer.
For the supervised learning, we randomly divide the 20,000 gauge configurations into one 10,000 and two 5,000 sub-groups. We use the 10,000 data for the training, and one of the 5,000 data sets for the validation analysis. The last 5,000 data are used for the evaluation of the accuracy of the trained NN. The supervised learning is repeated 10 times with different divisions of the configurations, and the uncertainty of the accuracy is estimated from the variance.
We use the mean-squared error for the loss function, and minimize it through the updates of the NN parameters by ADAM with the default setting. The update is repeated for 3,000 epochs with batchsize 16. The optimized parameter set of the FNN is then determined as the one giving the lowest value of the loss function on the validation data.
The FNN is implemented with the Chainer framework. The training of the FNN in this section has been carried out as a single-core job on a XEON processor (Xeon E5-2698-v3). It takes about 40 minutes for a single training on this environment.
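For concreteness, a minimal Chainer sketch of the FNN of Table IV and of the training setup described above is given below. It is an illustration rather than the authors' code: the placeholder data, variable names, and the batching of the data are assumptions, while the layer sizes, logistic activation, mean-squared-error loss, ADAM optimizer, 3,000 epochs, and batchsize 16 follow the text.

```python
import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L
from chainer import optimizers

class QtFNN(chainer.Chain):
    """FNN of Table IV: 3 inputs -> 5 hidden units (logistic) -> 1 output."""
    def __init__(self):
        super().__init__()
        with self.init_scope():
            self.hidden = L.Linear(3, 5)
            self.out = L.Linear(5, 1)

    def __call__(self, x):
        return self.out(F.sigmoid(self.hidden(x)))

model = QtFNN()
optimizer = optimizers.Adam()   # ADAM with the default setting
optimizer.setup(model)

# x_train: Q(t) at three flow times, shape (N, 3); y_train: integer Q, shape (N, 1).
# Random placeholder data here; in practice these come from the lattice analysis.
x_train = np.random.randn(10000, 3).astype(np.float32)
y_train = np.random.randint(-5, 6, size=(10000, 1)).astype(np.float32)

batchsize = 16
n_epoch = 3000   # as in the text; reduce for a quick test
for epoch in range(n_epoch):
    perm = np.random.permutation(len(x_train))
    for i in range(0, len(x_train), batchsize):
        idx = perm[i:i + batchsize]
        pred = model(x_train[idx])
        loss = F.mean_squared_error(pred, y_train[idx])   # regression loss
        model.cleargrads()
        loss.backward()
        optimizer.update()

# Final prediction: round the regression output to the nearest integer (cf. Sec. 6).
Q_pred = np.rint(model(x_train[:5]).array).astype(int)
```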
Shown in Table V are the accuracies obtained by the trained FNN for various choices of the input data. The left column shows the set of three flow times t/a at which Q(t) is evaluated for the input of the FNN. In the upper eight rows we show the results with the input flow times t/a = (t̂_max, t̂_max − 0.05, t̂_max − 0.1) for several values of t̂_max. The accuracy becomes better as t̂_max becomes larger. By comparing this result with Table III, one finds that the accuracy obtained by the FNN is significantly higher than P_imp at t/a = t̂_max. In particular, the accuracy at t̂_max = 0.3 exceeds 99% for β = 6.5, while the benchmark model Eq. (7) gives P_imp ≃ 0.71. This result shows that the prediction of Q from the numerical data at t/a ≤ 0.3 is possible with high accuracy. The accuracy keeps increasing as t̂_max becomes larger, but the improvement over P_imp is limited for much larger t̂_max because P_imp is already close to unity. The same result is obtained for β = 6.2, although the accuracy is slightly lower than at β = 6.5.

Table V. Accuracy of the trained FNN in Table IV with various sets of the input data. The left column shows the values of t/a at which Q(t) is evaluated for the input. Errors are estimated from the variance among 10 different trainings.

In Fig. 5, we show the confusion matrices of the prediction of Q by the trained FNN with the input Q(t) at t/a = (0.3, 0.25, 0.2) for β = 6.2 and 6.5. The misidentified configurations deviate from the true Q mostly by ±1. As shown in the Appendix, the behavior of Q(t) is monotonic at t/a ≤ 0.3 on almost all configurations. Therefore, the value at large t can be estimated easily, for example, by the human eye, for almost all configurations. It is reasonable to interpret that the FNN learns this behavior. We, however, remark that the accuracy of 99% obtained by the trained FNN is still non-trivial. We have tried to find a simple function to predict Q from the three values of Q(t), but we could not find one that reaches the accuracy of the trained FNN.

Fig. 5. Confusion matrix of the trained FNN model with the input Q(t) at t/a = (0.3, 0.25, 0.2) for β = 6.2 and 6.5. The horizontal and vertical axes show the predicted and true Q, respectively.

In the lower three rows of Table V, we show the accuracies of the trained FNN with the input flow times t/a = (t̂_max, t̂_max − 0.1, t̂_max − 0.2) for several values of t̂_max.
The accuracies in these cases are slightly lower than the results with t/a = (t̂_max, t̂_max − 0.05, t̂_max − 0.1) at the same t̂_max. We have also tested FNN models analyzing four values of Q(t). It, however, was found that the accuracy does not exceed the case with three input data with the same maximum t/a. We have also tested FNN models having a more complex structure, for example, with multiple hidden layers. A statistically-significant improvement of the accuracy, however, was not observed, either.
In the conventional analysis of Q with the gradient flow discussed in Sec. 4, one must use the value of Q(t) at a large flow time at which the distribution of Q(t) is well localized. This means that the gradient flow equation has to be solved numerically up to that large flow time to obtain Q. Moreover, concerning the continuum limit a → 0, the flow time used for this determination is fixed in physical units while a is varied. This means that the flow time in lattice units, t/a, becomes large and the numerical cost for solving the flow equation increases as the continuum limit is approached. On the other hand, our analysis can estimate Q quite successfully only with the data at t/a ≲ 0.3. This means that the numerical cost for the evaluation of Q can be reduced drastically with the aid of the FNN.
Table V shows that the better accuracy is obtained on the finer lattice (larger β). From Fig. 1, it is suggested that this tendency comes from the reduction of the "flipping" of Q(t) on the finer lattice, as the non-monotonic flipping makes the prediction of Q from Q(t) at small t/a difficult. This effect is also suggested from Fig. 3, as P_naive and P_imp at large t/a are smaller at β = 6.2 than at β = 6.5. We note that this lattice-spacing dependence hardly changes even if we scale the value of t used to determine Q in Eq. (4) in physical units, provided t is sufficiently large. Provided that the flipping of Q(t) comes from the lattice artifact related to the ambiguity of the topological sectors on the discretized spacetime, it is conjectured that the imperfect accuracy of the FNN is to a large extent attributed to this lattice artifact. Then, the imperfect accuracy of the FNN at finite a is an inevitable one, and the accuracy should become better as the lattice spacing becomes finer. Therefore, it is conjectured that the systematic uncertainty arising from the imperfect accuracy of the FNN is suppressed in the analysis of the continuum extrapolation.
Next, we consider the variance of the topological charge ⟨Q²⟩, which is related to the topological susceptibility as χ_Q = ⟨Q²⟩/V. From the output of the FNN with the same input flow times as above, the variance of Q is calculated for each β [Eqs. (8) and (9)], where the first and second errors represent the statistical error obtained by the jackknife analysis and the uncertainty of the FNN model estimated from 10 different trainings, respectively. These values agree well with those shown in Table II.
So far, we have performed the training of the FNN with the number of training data N_train = 10,000. It is, however, worth trying the training with much smaller N_train. Shown in Table VI is the accuracy of the trained FNN for various N_train with the same input flow times. A small N_train is also advantageous for reducing the numerical cost of the training; with N_train = 500, a single training takes much less time than with N_train = 10,000 on the same environment.

Table VI. Dependence of the accuracy of the trained FNN on the number of the training data N_train (N_train = 10,000, 5,000, 1,000, 500, and 100).

Next, we consider the analysis of the data with a different β from the one used for the training. In Table VII, we show the accuracy obtained with various combinations of the β values used for the training and the analysis, with the same input flow times. The accuracy is slightly reduced when a different data set is analyzed, but the reduction is small and almost within statistics. We have also performed the training of the FNN with the combined data set of β = 6.2 and 6.5. The result of this analysis is shown in the bottom row of Table VII. One finds that this FNN can predict Q for each β with the same accuracy within statistics as those trained for the individual β.

Table VII. Accuracy obtained by the analysis of the data with a different β from the one used for the training. The last row shows the accuracy of the FNN trained on the mixed data set of β = 6.2 and 6.5.

These results suggest that it is possible to develop a NN model that deals with various β simultaneously. Once such a model is developed, it will play a quite useful role in the analysis of Q. We, however, note that the two lattices studied in the present work have almost the same spatial volume in physical units. The analysis of the robustness against the variation of the spatial volume is left for future work.
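A minimal sketch of the jackknife estimate of ⟨Q²⟩ mentioned above is given below, assuming Q is a NumPy array of (predicted) integer charges ordered along the Monte Carlo chain; the binsize of 100 follows Sec. 3, and χ_Q is then obtained by dividing by the four-volume V.

```python
import numpy as np

def jackknife_variance_Q(Q, binsize=100):
    """Jackknife estimate of <Q^2> and its statistical error with the given binsize."""
    n_bins = len(Q) // binsize
    Q = np.asarray(Q[:n_bins * binsize], dtype=float)
    bins = Q.reshape(n_bins, binsize)
    total = np.sum(Q**2)
    # leave-one-bin-out estimates of <Q^2>
    jk = (total - np.sum(bins**2, axis=1)) / (len(Q) - binsize)
    mean = jk.mean()
    err = np.sqrt((n_bins - 1) * np.mean((jk - mean)**2))
    return mean, err

# chi_Q = <Q^2> / V, with V the four-volume in the chosen units.
```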
7. Learning topological charge density q_t(x)

In this section we employ the CNN and train it to analyze the four-dimensional field q_t(x). A motivation of this analysis is the search for characteristic features responsible for the topology in the four-dimensional space by the ML. In particular, if the quantum gauge configurations have local structures like instantons, such structures would be recognized by the CNN and used for an efficient prediction of Q.
Table VIII. Design of the CNN for the analysis of the multi-dimensional data. The dimension d of the input data is 4 in Sec. 7; in Sec. 8, we analyze the data with d = 1, 2, and 3.

Let us first discuss the choice of the input data for the CNN. Because the gauge configurations on the lattice are described by the link variables U_μ(x), which are elements of the group SU(3), the most fundamental choice for the input data is the link variables. However, as U_μ(x) is described by 72 real variables per lattice site, a reduction of the data size is desirable for an efficient training. Moreover, because physical observables are given only by gauge-invariant combinations of U_μ(x), the CNN must learn the concept of gauge invariance, and accordingly the SU(3) matrix algebra, so that it can make a successful prediction of Q from U_μ(x). These concepts, however, would be too complicated for simple CNN models. For these reasons, in the present study we use the topological charge density q_t(x) as the input of the CNN: q_t(x) is gauge invariant, and the number of degrees of freedom per lattice site is one. To reduce the size of the input data further, we reduce the lattice volume to 8⁴ from 16⁴ and 24⁴ by average pooling as a preprocessing. In addition to the analysis of q_t(x) at a given t, we prepare a combined data set of q_t(x) at several values of t and analyze it as multi-channel data by the CNN.
In this section, we use the CNN with convolutional layers that deal with four-dimensional data. In Table VIII, we show the structure of the CNN model, where d denotes the dimension of the spacetime and is set to d = 4 in this section. The convolutional layers have a 3^d filter size and five output channels. In these convolutional layers, we use periodic padding for all directions to respect the periodic boundary conditions of the gauge configuration. N_ch denotes the number of channels of the input data per lattice site; N_ch = 1 when q_t(x) at a single t is fed into the CNN. We also perform the multi-channel analysis by feeding q_t(x) at N_ch flow times.
The lattice gauge theory has translational symmetry, and a shift of the spatial coordinates of q_t(x) in any direction does not change the value of Q. To ensure that the CNN automatically respects this property, we insert a global average pooling (GAP) layer after the convolutional layers. The GAP layer takes the average with respect to the spatial coordinates for each channel. The output of the GAP layer is then processed by two fully-connected layers before the final output. The logistic activation function is used for the convolutional and fully-connected layers.
The training of the CNN in this section has been mainly carried out on Google Colaboratory. We use 12,000 data for the training, 2,000 data for the validation, and 6,000 data for the test, respectively. The batchsize for the minibatch training is 200. We repeat the parameter tuning for 500 epochs. Other settings of the training are the same as in the previous section.
Besides the CNN model in Table VIII, we have tested various variations of the model. For example, we tested the ReLU activation function in place of the logistic one. The use of a fully-connected layer in place of the GAP layer and convolutional layers with a 5^d filter size were also tried. The number of output channels of the convolutional layers was varied up to 20. We, however, found that these variations do not improve the accuracy at all, while they typically increase the numerical cost of the training. The CNN in Table VIII is thus a simple but efficient choice among all these variations.
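A minimal Chainer sketch of a CNN along the lines of Table VIII is shown below. The d-dimensional convolutions are written with chainer.links.ConvolutionND, the periodic padding is implemented by wrapping the field around each lattice axis before the convolution, and the GAP layer is an average over all spatial sites per channel. The depth of two convolutional layers, the hidden sizes, and the single regression output are assumptions on top of the text, and whether ConvolutionND handles d = 4 efficiently depends on the installed Chainer version.

```python
import chainer
import chainer.functions as F
import chainer.links as L

def periodic_pad(x, pad=1):
    """Wrap-pad every lattice axis (axes 2..) to respect periodic boundaries."""
    for axis in range(2, x.ndim):
        index = [slice(None)] * x.ndim
        index[axis] = slice(x.shape[axis] - pad, x.shape[axis])
        left = x[tuple(index)]
        index[axis] = slice(0, pad)
        right = x[tuple(index)]
        x = F.concat([left, x, right], axis=axis)
    return x

class QtCNN(chainer.Chain):
    """Sketch of the CNN of Table VIII: 3^d convolutions with 5 channels,
    a global average pooling (GAP) layer, and two fully-connected layers
    ending in a single regression output."""
    def __init__(self, n_ch_in=3, n_ch_conv=5, d=4):
        super().__init__()
        with self.init_scope():
            self.conv1 = L.ConvolutionND(d, n_ch_in, n_ch_conv, ksize=3)
            self.conv2 = L.ConvolutionND(d, n_ch_conv, n_ch_conv, ksize=3)
            self.fc1 = L.Linear(n_ch_conv, n_ch_conv)
            self.fc2 = L.Linear(n_ch_conv, 1)

    def __call__(self, x):
        # x: (batch, N_ch, 8, 8, 8, 8) after the average-pooling preprocessing
        h = F.sigmoid(self.conv1(periodic_pad(x)))
        h = F.sigmoid(self.conv2(periodic_pad(h)))
        # GAP: average over all spatial coordinates for each channel
        h = F.mean(F.reshape(h, (h.shape[0], h.shape[1], -1)), axis=2)
        h = F.sigmoid(self.fc1(h))
        return self.fc2(h)
```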
In Table IX, we show the performance of the trained CNN with various inputs. The left two columns show N_ch and the flow time(s) used for the input. The upper four rows show the results with N_ch = 1 at t/a = 0, 0.1, 0.2, and 0.3; in the bottom row, N_ch = 3 and q_t(x) at t/a = (0.3, 0.2, 0.1) are used. The third column shows the accuracy P of the trained CNN obtained for each input. In the table, we also show the recalls of the individual topological sectors, R_Q, defined by

R_Q = N_Q^correct / N_Q,  (10)

where N_Q is the number of configurations in the topological sector Q and N_Q^correct is the number of correct answers among them.
The top row of Table IX shows P and R_Q obtained by the analysis of the topological charge density of the original gauge configuration without the gradient flow. Although we obtain a nonzero P, the recall of each Q shows that in this case the CNN is trained to answer Q = 0 for all configurations.
Next, the results with N_ch = 1 at nonzero t/a show that P becomes larger with increasing t/a. From R_Q one also finds that the output of the CNN scatters over different topological sectors. However, by comparing P with that of the benchmark model, P_imp in Table III at the same t/a, one finds that P and P_imp are almost the same. This result suggests that the CNN is trained to answer Q_imp and that no further information is obtained from the analysis of the four-dimensional data of q_t(x).
Finally, in the multi-channel analysis with the input flow times t/a = (0.3, 0.2, 0.1), P is significantly enhanced from the case with N_ch = 1 for both β. However, this accuracy is the same within the error as that obtained in Sec. 6 with t/a = (0.3, 0.2, 0.1) shown in Table V. This result implies that the CNN is trained to obtain Q(t) for each t and then predicts the answer from them with a similar procedure as the FNN in Sec. 6.
From these results, we conclude that our analyses of the four-dimensional data by the CNN fail in finding structures in the four-dimensional space responsible for the determination of Q. The numerical cost for the training of the CNN in this section is a few orders of magnitude larger than that in Sec. 6, although a clear improvement of the accuracy is not observed. Therefore, for practical purposes the analysis in the previous section with the FNN is superior.
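Given arrays of predicted and true integer charges, the recalls R_Q of Eq. (10) and the counts entering a confusion matrix such as Fig. 5 can be evaluated as in the following sketch; the array names are placeholders.

```python
import numpy as np

def recalls(pred, true, sectors=range(-4, 5)):
    """Eq. (10): recall R_Q = N_Q^correct / N_Q for each topological sector Q."""
    return {q: float(np.mean(pred[true == q] == q)) if np.any(true == q) else float("nan")
            for q in sectors}

def confusion_matrix(pred, true, sectors=range(-5, 6)):
    """Counts of (true Q, predicted Q) pairs, as visualized in Fig. 5."""
    idx = {q: i for i, q in enumerate(sectors)}
    m = np.zeros((len(sectors), len(sectors)), dtype=int)
    for t, p in zip(true, pred):
        if t in idx and p in idx:
            m[idx[t], idx[p]] += 1
    return m
```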
Table IX. Accuracy P and the recalls of individual topological sectors R_Q obtained by the analysis of the topological charge density in the four-dimensional space by the CNN. The input data has N_ch channels.

β = 6.2
N_ch  input t/a     P      R_{-4}  R_{-3}  R_{-2}  R_{-1}  R_0    R_1    R_2    R_3    R_4
1     0             0.371  0       0       0       0       1.000  0      0      0      0
1     0.1           0.401  0       0       0.002   0.255   0.702  0.341  0.008  0      0
1     0.2           0.552  0       0.043   0.240   0.495   0.687  0.597  0.336  0.111  0
1     0.3           0.776  0       0.391   0.687   0.760   0.821  0.794  0.740  0.569  0
3     0.3,0.2,0.1   0.942  0.200   0.913   0.944   0.950   0.944  0.939  0.937  0.889  0.571

β = 6.5
N_ch  input t/a     P      R_{-4}  R_{-3}  R_{-2}  R_{-1}  R_0    R_1    R_2    R_3    R_4
1     0             0.388  0       0       0       0       1.000  0      0      0      0
1     0.1           0.396  0       0       0       0.086   0.889  0.129  0      0      0
1     0.2           0.479  0       0       0.108   0.445   0.641  0.459  0.150  0      0
1     0.3           0.698  0       0.170   0.585   0.730   0.727  0.701  0.624  0.395  0.071
3     0.3,0.2,0.1   0.953  0       0.830   0.951   0.956   0.952  0.962  0.968  0.953  0.286

Fig. 6. Dependence of the accuracy P on the spacetime dimension d after the dimensional reduction, for β = 6.2 and 6.5. The values at d = 0 are the results of the FNN analysis of Q(t) in Sec. 6.
8. Dimensional reduction
In the previous two sections we discussed the analysis of the four-dimensional topological charge density q_t(x) and its four-dimensional integral Q(t) by the ML. The spatial dimensions of these input data are d = 4 and d = 0, respectively. In this section, we analyze the data obtained by reducing the dimension to d = 3, 2, and 1 by integrating out part of the coordinates as a preprocessing,

q̃_t^(3)(x_1, x_2, x_3) = ∫ dx_4 q_t(x_1, x_2, x_3, x_4),  (11)
q̃_t^(2)(x_1, x_2) = ∫ dx_3 dx_4 q_t(x_1, x_2, x_3, x_4),  (12)
q̃_t^(1)(x_1) = ∫ dx_2 dx_3 dx_4 q_t(x_1, x_2, x_3, x_4),  (13)

with q_t(x) = q_t(x_1, x_2, x_3, x_4). Here, q̃_t^(d) is the d-dimensional field analyzed by the CNN. The structure of the CNN is the same as in the previous section (see Table VIII) except for the value of d. The procedure of the supervised learning is also the same. We perform the analysis of the multi-channel data with N_ch = 3 at t/a = (0.3, 0.2, 0.1).
In Fig. 6, we show the d dependence of the accuracy obtained by the analysis of the d-dimensional data q̃_t^(d) by the CNN. The data points at d = 0 show the results of the analysis of Q(t) by the FNN in Sec. 6 with t/a = (0.3, 0.2, 0.1) given in Table V. From the figure, one finds that the accuracy does not have a statistically-significant d dependence.
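On the lattice, the integrals of Eqs. (11)-(13) become sums over the dropped coordinates. The following NumPy sketch performs this dimensional-reduction preprocessing for a multi-channel input; the channel-first array layout is an assumption.

```python
import numpy as np

def reduce_dimension(q_t, d):
    """Integrate out the last (4 - d) coordinates of q_t, Eqs. (11)-(13).

    q_t : array of shape (N_ch, L, L, L, L), the multi-channel charge density.
    d   : target dimension, 0 <= d <= 4; d = 0 returns Q(t) for each channel.
    """
    axes = tuple(range(1 + d, q_t.ndim))   # lattice axes to sum over
    return q_t.sum(axis=axes) if axes else q_t

# Example: three flow-time channels on an 8^4 lattice (random placeholder data)
q_t = np.random.randn(3, 8, 8, 8, 8)
print(reduce_dimension(q_t, 3).shape)   # (3, 8, 8, 8)
print(reduce_dimension(q_t, 1).shape)   # (3, 8)
print(reduce_dimension(q_t, 0).shape)   # (3,)  -> Q(t) for each channel
```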
9. Discussion
In the present study, we have investigated the application of the machine learning (ML) technique to the classification of the topological sector of gauge configurations in SU(3) Yang-Mills theory. The topological charge density q_t(x) at zero and nonzero flow times t is used as the input of the neural networks (NN), with and without the dimensional reduction.
We found that the prediction of the topological charge Q can be made most efficiently when Q(t) at small flow times is used as the input of the NN. In particular, we found that the value of Q defined from Q(t) at a large flow time can be predicted with high accuracy only with Q(t) at t/a ≤ 0.3; at β = 6.5, the accuracy exceeds 99%. Using this procedure, the numerical cost for solving the gradient flow toward the large flow time can be omitted in the analysis of the topological charge.
Because the prediction of the NN does not have 100% accuracy, the analysis of Q by the NN gives rise to uncontrollable systematic uncertainties. However, our analyses indicate that the accuracy becomes better as the continuum limit is approached. Moreover, as discussed in Sec. 6, the imperfect accuracy would to a large extent come from the intrinsic uncertainty of the topological sectors on the lattice with finite a. It thus is expected that the analysis of Q becomes more accurate as the lattice spacing becomes finer. As the 99% accuracy is already attained at β = 6.5 (a ≃ 0.044 fm), the systematic uncertainty should be well suppressed on lattices finer than this lattice spacing, and our method should allow the analysis of the continuum limit to be carried out safely.
In this study, we found that the analysis of the multi-dimensional field q_t(x) by the CNN does not improve the accuracy compared with that of Q(t). A plausible interpretation of this result is that our CNN fails in capturing useful structures in the four-dimensional space relevant for the determination of Q. It is an interesting future work to pursue the search for such structures in the four-dimensional space by the ML. One possible extension along this direction is the analysis with a CNN having a more complex structure. Another interesting direction is the analysis of the gauge configurations at high temperatures where the dilute instanton-gas picture is well applicable. As the topological charge would be carried by well-separated local objects at such temperatures, the search of the multi-dimensional space by the CNN would be easier than for the vacuum configurations analyzed in the present study. It is also interesting to analyze q_t(x) at a large flow time after subtracting the average, because the NN can no longer make use of the information on Q(t) after such a preprocessing. We leave these analyses for future research.

The authors thank A. Tomiya for many useful discussions. They also thank H. Fukaya and K. Hashimoto. The lattice simulations of this study were in part carried out on OCTOPUS at the Cybermedia Center, Osaka University, and Reedbush-U at the Information Technology Center, The University of Tokyo. The NN are constructed on the Chainer framework. The supervised learning of the NN in Sec. 7 was in part carried out on Google Colaboratory. This work was supported by JSPS KAKENHI Grant Numbers 17K05442 and 19H05598.

Appendix: Behavior of Q(t)

In this appendix, we take a closer look at the behavior of Q(t) at small t. In Fig. A·1, we show the t dependence of Q(t) on 100 gauge configurations at β = 6.5 in two different ways.
Fig. A·1. Closer look at the behavior of Q(t) at small t̂ = t/a. In the upper panel we show

Q̄(t̂) = Q(t̂ a) − Q,  (A·1)

as a function of t̂ = t/a, while the lower panel shows the same quantity normalized by its value at a fixed small flow time [Eq. (A·2)], so that the ratio becomes unity there.

In Sec. 6, the FNN predicts Q from the behavior of Q(t) at t̂ ≤ 0.3 at β = 6.5. This range of t̂ is highlighted by the gray band in Fig. A·1. From the upper panel, one sees that Q̄(t̂) approaches zero monotonically on almost all configurations. However, the panel also shows that some lines deviate from this trend. As a result, it seems difficult to predict the value of Q with 99% accuracy (Q has to be predicted correctly on 99 lines among 100 in the panel) by a simple function or by the human eye from the behavior at t̂ ≤ 0.3, although 95% accuracy is not difficult to attain. A similar observation is also obtained from the lower panel. It thus is indicated that the 99% accuracy obtained by the NN in Sec. 6 is not a trivial result.
1) S. Weinberg: The quantum theory of fields. Vol. 2: Modern applications (Cambridge University Press, 2013).
2) R. D. Peccei and H. R. Quinn: Phys. Rev. Lett. (1977) 1440.
3) J. Preskill, M. B. Wise, and F. Wilczek: Phys. Lett. B (1983) 127.
4) L. Abbott and P. Sikivie: Phys. Lett. B (1983) 133.
5) E. Berkowitz, M. I. Buchoff, and E. Rinaldi: Phys. Rev. D92 (2015) 034507.
6) R. Kitano and N. Yamada: JHEP (2015) 136.
7) M. Cè, C. Consonni, G. P. Engel, and L. Giusti: Phys. Rev. D92 (2015) 074502.
8) C. Bonati, M. D'Elia, M. Mariti, G. Martinelli, M. Mesiti, F. Negro, F. Sanfilippo, and G. Villadoro: JHEP (2016) 155.
9) P. Petreczky, H.-P. Schadler, and S. Sharma: Phys. Lett. B762 (2016) 498.
10) J. Frison, R. Kitano, H. Matsufuru, S. Mori, and N. Yamada: JHEP (2016) 021.
11) S. Borsanyi et al.: Nature (2016) 69.
12) Y. Taniguchi, K. Kanaya, H. Suzuki, and T. Umeda: Phys. Rev. D95 (2017) 054502.
13) S. Aoki, G. Cossu, H. Fukaya, S. Hashimoto, and T. Kaneko: PTEP (2018) 043B07.
14) C. Alexandrou, A. Athenodorou, K. Cichy, A. Dromard, E. Garcia-Ramos, K. Jansen, U. Wenger, and F. Zimmermann: arXiv:1708.00696 (2017).
15) F. Burger, E.-M. Ilgenfritz, M. P. Lombardo, and A. Trunin: Phys. Rev. D98 (2018) 094501.
16) P. T. Jahn, G. D. Moore, and D. Robaina: Phys. Rev. D98 (2018) 054512.
17) C. Bonati, M. D'Elia, G. Martinelli, F. Negro, F. Sanfilippo, and A. Todaro: JHEP (2018) 170.
18) L. Giusti and M. Lüscher: Eur. Phys. J. C79 (2019) 207.
19) M. Lüscher: Commun. Math. Phys. (1982) 39.
20) M. F. Atiyah and I. M. Singer: Annals of Math. (1971) 139.
21) Y. Iwasaki and T. Yoshie: Phys. Lett. (1983) 159.
22) M. Teper: Phys. Lett. (1985) 357.
23) S. Aoki, H. Fukaya, S. Hashimoto, and T. Onogi: Phys. Rev. D76 (2007) 054508.
24) M. Lüscher and S. Schaefer: JHEP (2011) 036.
25) Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner: Proc. IEEE (1998) 2278.
26) A. Krizhevsky, I. Sutskever, and G. E. Hinton: Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, NIPS'12, 2012, pp. 1097-1105.
27) Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, and A. Ng: Proceedings of the 29th International Conference on Machine Learning (ICML-12), ICML '12, July 2012, pp. 81-88.
28) M. Lin, Q. Chen, and S. Yan: arXiv:1312.4400 (2013).
29) C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich: CVPR, June 2015.
30) K. Simonyan and A. Zisserman: International Conference on Learning Representations, 2015.
31) K. He, X. Zhang, S. Ren, and J. Sun: CVPR, June 2016.
32) R. Girshick, J. Donahue, T. Darrell, and J. Malik: CVPR, June 2014.
33) R. Girshick: ICCV, December 2015.
34) W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg: ECCV, June 2016.
35) S. Ren, K. He, R. Girshick, and J. Sun: Advances in Neural Information Processing Systems 28, p. 91. Curran Associates, Inc., 2015.
36) J. Redmon, S. Divvala, R. Girshick, and A. Farhadi: CVPR, June 2016.
37) T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean: Advances in Neural Information Processing Systems 26, p. 3111. 2013.
38) A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin: Advances in Neural Information Processing Systems 30, p. 5998. 2017.
39) J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova: arXiv:1810.04805 (2018).
40) A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu: arXiv:1609.03499 (2016).
41) V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis: Nature (2015) 529.
42) D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, and D. Hassabis: Nature (2017) 354.
43) I. E. Lagaris, A. Likas, and D. I. Fotiadis: IEEE Trans. Neural Networks (1998) 987.
44) P. Baldi, P. Sadowski, and D. Whiteson: Nature Commun. (2014) 4308.
45) L. de Oliveira, M. Kagan, L. Mackey, B. Nachman, and A. Schwartzman: JHEP (2016) 069.
46) T. Ohtsuki and T. Ohtsuki: J. Phys. Soc. Jpn. (2016) 123706.
47) J. Barnard, E. N. Dawe, M. J. Dolan, and N. Rajcic: Phys. Rev. D95 (2017) 014018.
48) A. Tanaka and A. Tomiya: J. Phys. Soc. Jpn. (2017) 063001.
49) J. Carrasquilla and R. G. Melko: Nature Phys. (2017) 431.
50) Y. Mori, K. Kashiwa, and A. Ohnishi: Phys. Rev. D96 (2017) 111501.
51) M. Raissi, P. Perdikaris, and G. E. Karniadakis: arXiv:1711.10561 (2017).
52) H. Huang, B. Xiao, H. Xiong, Z. Wu, Y. Mu, and H. Song: arXiv:1081.03334 (2018).
53) P. E. Shanahan, D. Trewartha, and W. Detmold: Phys. Rev. D97 (2018) 094506.
54) K. Hashimoto, S. Sugishita, A. Tanaka, and A. Tomiya: Phys. Rev. D98 (2018) 046019.
55) K. Kashiwa, Y. Kikuchi, and A. Tomiya: PTEP (2019) 083A04.
56) J. Steinheimer, L. Pang, K. Zhou, V. Koch, J. Randrup, and H. Stoecker: arXiv:1906.06562 (2019).
57) K. Fukushima, S. S. Funai, and H. Iida: arXiv:1908.00281 (2019).
58) R. Narayanan and H. Neuberger: JHEP (2006) 064.
59) M. Lüscher: JHEP (2010) 071 [Erratum: JHEP 03, 092 (2014)].
60) M. Lüscher and P. Weisz: JHEP (2011) 051.
61) M. Kitazawa, T. Iritani, M. Asakawa, T. Hatsuda, and H. Suzuki: Phys. Rev. D94 (2016) 114512.
62) D. P. Kingma and J. Ba: arXiv:1412.6980 (2014).
63) Chainer. https://chainer.org/
64) Google Colaboratory. https://colab.research.google.com/