Classifying Topological Charge in SU(3) Yang-Mills Theory with Machine Learning
Journal of the Physical Society of Japan
FULL PAPERS
J-PARC-TH-0170
Takuya Matsumoto, Masakiyo Kitazawa*, and Yasuhiro Kohno
Department of Physics, Osaka University, Toyonaka, Osaka 560-0043, Japan
J-PARC Branch, KEK Theory Center, Institute of Particle and Nuclear Studies, KEK, 203-1 Shirakata, Tokai, Ibaraki 319-1106, Japan
Research Center for Nuclear Physics, Osaka University, Ibaraki, Osaka 567-0047, Japan
*[email protected]
We apply a machine learning technique to the identification of the topological charge of quantum gauge configurations in four-dimensional SU(3) Yang-Mills theory. The topological charge density measured on the original and smoothed gauge configurations, with and without dimensional reduction, is used as the input of neural networks (NN) with and without convolutional layers. The gradient flow is used for the smoothing of the gauge field. We find that the topological charge determined at a large flow time can be predicted with high accuracy by the trained NN from the data at small flow times; the accuracy exceeds 99% with the data at t/a ≤ 0.3. High robustness against the change of simulation parameters is also confirmed. We find that the best performance is obtained when the spatial coordinates of the topological charge density are fully integrated out as a preprocessing, which implies that our convolutional NN does not find characteristic structures in multi-dimensional space relevant for the determination of the topological charge.
1. Introduction
Quantum chromodynamics (QCD) and other Yang-Mills gauge theories in four spacetime dimensions can have topologically nontrivial gauge configurations classified by the topological charge Q taking integer values. The existence of the non-trivial topology in QCD is responsible for various non-perturbative aspects of this theory, such as the U(1) problem. The susceptibility of Q also provides an essential parameter relevant for the cosmic abundance of the axion dark matter. The topological property of QCD and Yang-Mills theories has been studied by numerical simulations of lattice gauge theory.
Because of the discretization of spacetime, gauge configurations on the lattice are, strictly speaking, topologically trivial. However, it is known that well-separated topological sectors emerge when the continuum limit is approached.
Various methods for the measurement of Q of the gauge configurations on the lattice have been proposed, which are roughly classified into fermionic and gluonic ones. In the fermionic definitions the topological charge is defined through the Atiyah-Singer index theorem, while the gluonic definitions make use of the topological charge measured on a smoothed gauge field. The values of Q measured by various methods show an approximate agreement, which indicates the existence of separated topological sectors. In lattice simulations, the measurement of the topological charge is also important for monitoring the problem of topological freezing.
In the present study, we apply the machine learning (ML) technique to the analysis of Q of gauge configurations on the lattice. The ML has been applied quite successfully to various problems in computer science, such as image recognition, object detection, and natural language processing. Recently, this technique has also been applied to problems in physics.
In the present study, we generate data by the numerical simulation of SU(3) Yang-Mills theory in four spacetime dimensions, and feed them into neural networks (NN). We use the convolutional NN (CNN) as well as the simple fully-connected NN (FNN) depending on the type of the input data. The NN are trained to predict the value of Q by supervised learning.
The first aim of this study is the development of an efficient algorithm for the analysis of Q with the aid of the ML. The second, and more interesting, purpose is the search for characteristic local structures in the four-dimensional space related to Q by the CNN. It is known that Yang-Mills theories have classical gauge configurations called instantons, which carry a nonzero topological charge and have a localized structure. If the topological charge of the quantum gauge configurations is also carried by instanton-like local objects, the CNN would recognize and make use of them for the prediction of Q. Such an analysis of four-dimensional quantum fields by the ML will open a new application of this technique.
In this study, we use the topological charge density measured on the original and smoothed gauge configurations as inputs of the NN. The smoothing is performed by the gradient flow. We also perform the dimensional reduction to various dimensions as a preprocessing of the data before feeding them into the CNN or FNN. For the definition of Q, we use a gluonic one through the gradient flow. We find that the NN can estimate the value of Q determined at a large flow time with high accuracy from the data obtained at small flow times. In particular, we show that the high accuracy is obtained by the multi-channel analysis of the data at different flow times. We argue that this method can reduce the numerical cost for the analysis of Q compared with the conventional method. We also find that the accuracy of the NN does not have a statistically-significant dependence on the dimension of the input data after the dimensional reduction. This result implies that the CNN fails in finding characteristic features related to the topology in multi-dimensional space, i.e. the quantum gauge configurations do not have such features, or their signals are too weak to be detected by the CNN.

Table I. Simulation parameters on the lattice: the inverse bare coupling β, the lattice size N, and the number of configurations N_conf.
2. Organization of this paper
In this study, we perform various analyses of the topological charge Q with the use of the CNN or FNN. First, we analyze the topological charge density q_t(x) in the four-dimensional (d = 4) space at a flow time t (the definitions of q_t(x) and t will be given in Sec. 4). Second, we perform the dimensional reduction of the input data as a preprocessing and analyze them by the NN. The dimension is reduced to d = 3, 2, and 1, and finally to d = 0, which corresponds to the analysis of Q(t) itself. We find that the accuracy of the trained NN does not show a statistically-significant dependence on d. Because the numerical cost for the supervised learning is suppressed as d becomes smaller, this means that the best performance is obtained at d = 0. To evaluate the genuine benefit of analyzing Q with the ML technique, we consider simple models to estimate Q without ML in Sec. 5. These models are used as benchmarks for the trained NN to verify whether they recognize nontrivial features of the data in Secs. 6-8.
The whole structure of this paper is summarized as follows. In the next section, we show the setup of the lattice numerical simulations. In Sec. 4, we then give a brief review on the analysis of the topology with the gradient flow. The benchmark models for the classification of Q without using the ML are discussed in Sec. 5. The application of the ML is then discussed in Secs. 6-8: we first consider the analysis of Q(t) by the FNN in Sec. 6, and then the analysis of the d = 4 data q_t(x) by the CNN in Sec. 7. In Sec. 8, we extend the analysis to d = 1, 2, and 3. The last section is devoted to discussions.
3. Lattice setup
Throughout this paper, we consider SU(3) Yang-Mills theory in the four-dimensional Euclidean space with periodic boundary conditions for all directions. The standard Wilson gauge action is used for generating the gauge configurations. We perform the numerical analyses at two inverse bare couplings β = 6/g² = 6.2 and 6.5 on lattices of size 16⁴ and 24⁴, respectively, as summarized in Table I. These lattice parameters are chosen so that the lattice volumes in physical units are almost the same on the two lattices; the lattice spacing determined in Ref. 18 shows that the difference in the lattice size L is less than 2%. The lattice size L is of the order of the inverse critical temperature 1/T_c of the deconfinement phase transition.
We generate 20,000 gauge configurations for each β, which are separated from each other by 100 Monte Carlo sweeps, where one sweep consists of one pseudo-heat-bath and five over-relaxation updates. For the discretized definition of the topological charge density on the lattice, we use the operator constructed from the clover representation of the field strength. The gradient flow is used for the smoothing of the gauge field.
To estimate the statistical error of an observable on the lattice, we use the jackknife analysis with binsize 100. We have numerically checked that the auto-correlation length of the topological charge is about 100 and 1900 sweeps for β = 6.2 and 6.5, respectively. The jackknife bin of 100 configurations, corresponding to 100 × 100 sweeps, is sufficiently larger than the auto-correlation length.
4. Topological charge
In the continuum Yang-Mills theory in the four-dimensional Euclidean space, the topological charge is defined by

Q = ∫_V d⁴x q(x),  (1)

q(x) = −(1/32π²) ε_{μνρσ} tr[ F_{μν}(x) F_{ρσ}(x) ],  (2)

where V is the four-volume and F_{μν}(x) = ∂_μ A_ν(x) − ∂_ν A_μ(x) + [A_μ(x), A_ν(x)] is the field strength. q(x) is called the topological-charge density with the coordinate x in Euclidean space.
In lattice gauge theory, Eq. (1) calculated on a gauge configuration with a discretized definition of Eq. (2) is not given by an integer, but is distributed continuously. To obtain discretized values, one may apply a smoothing of the gauge field before the measurement of q(x). In the present study, we use the gradient flow for the smoothing. The gradient flow is a continuous transformation of the gauge field characterized by a parameter t called the flow time, which has the dimension of inverse mass squared. The gauge field at a flow time t is a smoothed field with the mean-square smoothing radius √(8t). In the following, we denote the topological charge density obtained at t as q_t(x), and its four-dimensional integral as

Q(t) = ∫_V d⁴x q_t(x).  (3)

Shown in Fig. 1 is the t dependence of Q(t) calculated on 200 gauge configurations at β = 6.2 and 6.5. The horizontal axis shows the dimensionless flow time t/a with the lattice spacing a. One finds that the values of Q(t) approach discrete integer values as t becomes larger. In Fig. 2, we show the distribution of Q(t) for several values of t/a by the histogram at β = 6.5. At t = 0, the values of Q(t) are distributed continuously around the origin. As t becomes larger, the distribution converges on discretized integer values. For sufficiently large t, one can classify the gauge configurations into different topological sectors labeled by the integer topological charge Q defined, for example, by the nearest integer to Q(t). It is known that the value of Q defined in this way approximately agrees with the topological charge obtained through other definitions, and the agreement is better on finer lattices.
From Figs. 1 and 2, one finds that the distribution of Q(t) deviates from integer values toward the origin. This deviation becomes smaller as t becomes larger.
From Fig. 1, one also finds that Q(t) on some gauge configurations has a "flipping" between different topological sectors; after Q(t) shows a convergence to an integer value, it sometimes jumps into another integer. As this behavior decreases on the finer lattice, the flipping would be regarded as a lattice artifact arising from the ambiguity of the topological sectors on the discretized spacetime.
In the following, we use t/a = 4.0 for the definition of Q as

Q = round[Q(t)]|_{t/a = 4.0},  (4)

where round(x) means rounding off to the nearest integer. As indicated from Fig. 1, the value of Q hardly changes with the variation of t/a in the range 4 < t/a < 12. In Table II, we show the number of gauge configurations classified into each topological sector through this definition. The variance of this distribution, ⟨Q²⟩, is shown in the far right column. In Fig. 2, the distributions of Q(t) in individual topological sectors are shown by the colored histograms.

Fig. 1. Flow time t dependence of Q(t) on 200 gauge configurations at β = 6.2 and 6.5. The range of t is divided into three panels showing 0 ≤ t/a ≤ 0.35, 0.35 ≤ t/a ≤ 2, and t/a ≥ 2.

Fig. 2. Distribution of Q(t) at several values of t/a. The colored histograms are the distributions in individual topological sectors; see text.
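As a concrete illustration of Eqs. (3) and (4), the following minimal Python sketch evaluates Q(t) as the lattice sum of the charge density and assigns the integer sector by rounding. The array shapes and the placeholder data are assumptions for illustration only, not the analysis code used in this work.

```python
import numpy as np

def Q_of_t(q_t):
    """Eq. (3) on the lattice: Q(t) is the sum of q_t(x) over all sites
    (the integration measure is absorbed in the lattice-unit density)."""
    return float(np.sum(q_t))

def assign_sector(Q_t):
    """Eq. (4): the integer topological charge as the nearest integer to Q(t)."""
    return int(np.rint(Q_t))

# q_t: the charge density on a 24^4 lattice at t/a = 4.0 (placeholder array here)
q_t = np.zeros((24, 24, 24, 24))
print(assign_sector(Q_of_t(q_t)))   # -> 0
```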
5. Benchmark models
In this study, we analyze q_t(x) or Q(t) at small values of t by the ML technique. Here, the t used for the input has to be chosen small enough that a simple estimate of Q like Eq. (4) is not possible. In this section, before the main analysis with the ML technique, we discuss the accuracy obtained only from Q(t) without the ML. These analyses serve as benchmarks for evaluating the genuine benefit of the ML.
Throughout this study, as the performance metric of a model for an estimate of Q we use the accuracy defined by

P = (number of correct answers) / (number of total data).  (5)

Because the numbers of gauge configurations in different topological sectors differ significantly, as seen in Table II, Eq. (5) would not necessarily be a good performance metric. In particular, the topological sector with Q = 0 is the most abundant, so that a model that always answers Q = 0 already has P ≃ 0.37 (0.41) for β = 6.2 (6.5). We therefore also use the recalls of individual topological sectors, complementary to Eq. (5), to inspect the bias of the outputs of the NN models.
Table II. Number of the gauge configurations classified into each topological sector (Q = −5, ..., 5) with the definition of Q in Eq. (4), for β = 6.2 and 6.5. The far right column shows the variance ⟨Q²⟩ of the distribution of Q.

Fig. 3. Flow time t dependence of the accuracies P_naive and P_imp obtained by the models Eqs. (6) and (7), respectively, for β = 6.2 and 6.5. The dotted lines show the accuracy of the model that always answers Q = 0.

Fig. 4. Accuracy of Eq. (7) as a function of c for several values of t/a. The statistical errors are shown by the shaded bands, although the width of the bands is almost the same as the thickness of the lines.

Table III. Accuracy P_imp obtained by the model Eq. (7) with the optimization of c.

t/a     β = 6.2     β = 6.5
0       0.273(3)    0.162(3)
0.1     0.383(4)    0.274(3)
0.2     0.546(4)    0.474(4)
0.3     0.773(3)    0.713(4)
0.4     0.925(2)    0.916(2)
0.5     0.960(1)    0.989(1)
1.0     0.982(1)    0.999(0)
2.0     0.992(1)    0.999(0)
4.0     1.000(0)    1.000(0)
10.0    0.993(1)    0.999(0)

To make an estimate of Q from Q(t), we consider two simple models. The first model simply rounds off Q(t) as

Q_naive = round[Q(t)].  (6)

The accuracy obtained by this model, P_naive, as a function of t is shown in Fig. 3 by the dashed lines. The figure shows that the accuracy of Eq. (6) approaches 100% as t/a becomes larger, corresponding to the behavior of Q(t) in Figs. 1 and 2. At t/a = 4.0, Eq. (6) is equivalent to Eq. (4) and the accuracy becomes 100% by definition.
The model Eq. (6) can be improved with a simple modification. In Fig. 2, one sees that the distribution in each topological sector is shifted toward the origin from Q. This behavior suggests that Eq. (6) can be improved by applying a constant before rounding off as

Q_imp = round[c Q(t)],  (7)

where c is a parameter determined so as to maximize the accuracy in the range c > 1 for each t. In Fig. 4, we show the c dependence of the accuracy of Eq. (7) for several values of t/a. The figure shows that the accuracy has a maximum at c > 1 for each t/a, so that the model Eq. (7) achieves a better accuracy than Eq. (6) by tuning the parameter c. We denote the optimal accuracy of Eq. (7) as P_imp. In Fig. 3, the t/a dependence of P_imp is shown by the solid lines. The numerical values of P_imp are listed in Table III for several t/a. Figure 3 shows that a clear improvement of the accuracy by this single-parameter tuning is observed at intermediate values of t/a for both β = 6.2 and 6.5. We note that P_naive = P_imp = 1 at t/a = 4.0 by definition. P_imp is almost unity for t/a ≳ 1.0, which shows that the value of Q defined by Eq. (4) hardly changes with the variation of t/a in this range. Because P_imp is already close to unity at t/a = 0.5, it is difficult to obtain a nontrivial gain of the accuracy from the analysis of q_t(x) by the NN for t/a ≥ 0.5. In the following, therefore, we feed the data at t/a < 0.5 into the NN, and use P_imp as a benchmark for the accuracy obtained by the NN models.
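The benchmark models can be summarized in a few lines of Python. The sketch below implements the accuracy of Eq. (5) and the models of Eqs. (6) and (7), determining c by a simple grid scan; the scan range and the synthetic example data are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

def accuracy(pred, truth):
    """Eq. (5): fraction of configurations with the correct integer charge."""
    return float(np.mean(pred == truth))

def q_naive(Q_t):
    """Eq. (6): round Q(t) to the nearest integer."""
    return np.rint(Q_t).astype(int)

def q_improved(Q_t, c):
    """Eq. (7): rescale Q(t) by a constant c > 1 before rounding."""
    return np.rint(c * Q_t).astype(int)

def optimize_c(Q_t, Q_true, c_grid=np.linspace(1.0, 2.0, 101)):
    """Scan c and return (best c, best accuracy); the grid is an illustrative choice."""
    accs = [accuracy(q_improved(Q_t, c), Q_true) for c in c_grid]
    best = int(np.argmax(accs))
    return c_grid[best], accs[best]

# Synthetic example (not the lattice data of the paper): at small flow time
# Q(t) is shifted toward the origin relative to the true integer charge.
rng = np.random.default_rng(0)
Q_true = rng.integers(-3, 4, size=5000)
Q_t = 0.8 * Q_true + rng.normal(0.0, 0.2, size=5000)
print(accuracy(q_naive(Q_t), Q_true))   # P_naive
print(optimize_c(Q_t, Q_true))          # (optimal c, P_imp)
```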
6. Learning Q(t)

From this section we employ the ML technique for the analysis of the lattice data. As discussed in Secs. 1 and 2, among the various analyses we found that the most successful result is obtained when a set of the values of Q(t) at several t is analyzed by the FNN. In this section, we discuss this result. The analysis of the multi-dimensional data by the CNN will be reported in later sections.
Table IV. Design of the FNN used for the analysis of Q(t).

layer           output size    activation
input           3              -
full connect    5              logistic
full connect    1              -

In this section we employ a simple FNN model without convolutional layers. The FNN accepts three values of Q(t) at different t as inputs, and is trained to predict Q by supervised learning. The structure of the FNN is shown in Table IV. The FNN has only one hidden layer with five units that are fully connected with the input and output layers. We use the logistic (sigmoid) function for the activation function of the hidden layer. Although we have also tried the ReLU for the activation function, we found that the logistic function gives a better result. We employ a regression model, i.e. the output of the FNN is given by a single real number. The final prediction of Q is then obtained by rounding off the output to the nearest integer.
For the supervised learning, we randomly divide the 20,000 gauge configurations into one 10,000 and two 5,000 sub-groups. We use the 10,000 data for the training, and one of the 5,000 data sets for the validation analysis. The last 5,000 data are used for the evaluation of the accuracy of the trained NN. The supervised learning is repeated 10 times with different divisions of the configurations, and the uncertainty of the accuracy is estimated from the variance.
We use the mean-squared error for the loss function, and minimize it through the updates of the NN parameters by ADAM with the default setting. The update is repeated for 3,000 epochs with batchsize 16. The optimized parameter set of the FNN is then determined as the one giving the lowest value of the loss function on the validation data.
The FNN is implemented with the Chainer framework. The training of the FNN in this section has been carried out as a single-core job on a XEON processor (Xeon E5-2698-v3). It takes about 40 minutes for a single training on this environment.
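For concreteness, a minimal Chainer sketch of the FNN of Table IV and of the training setup described above is given below. It is an illustration rather than the authors' code: the placeholder data, variable names, and the batching of the data are assumptions, while the layer sizes, logistic activation, mean-squared-error loss, ADAM optimizer, 3,000 epochs, and batchsize 16 follow the text.

```python
import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L
from chainer import optimizers

class QtFNN(chainer.Chain):
    """FNN of Table IV: 3 inputs -> 5 hidden units (logistic) -> 1 output."""
    def __init__(self):
        super().__init__()
        with self.init_scope():
            self.hidden = L.Linear(3, 5)
            self.out = L.Linear(5, 1)

    def __call__(self, x):
        return self.out(F.sigmoid(self.hidden(x)))

model = QtFNN()
optimizer = optimizers.Adam()   # ADAM with the default setting
optimizer.setup(model)

# x_train: Q(t) at three flow times, shape (N, 3); y_train: integer Q, shape (N, 1).
# Random placeholder data here; in practice these come from the lattice analysis.
x_train = np.random.randn(10000, 3).astype(np.float32)
y_train = np.random.randint(-5, 6, size=(10000, 1)).astype(np.float32)

batchsize = 16
n_epoch = 3000   # as in the text; reduce for a quick test
for epoch in range(n_epoch):
    perm = np.random.permutation(len(x_train))
    for i in range(0, len(x_train), batchsize):
        idx = perm[i:i + batchsize]
        pred = model(x_train[idx])
        loss = F.mean_squared_error(pred, y_train[idx])   # regression loss
        model.cleargrads()
        loss.backward()
        optimizer.update()

# Final prediction: round the regression output to the nearest integer (cf. Sec. 6).
Q_pred = np.rint(model(x_train[:5]).array).astype(int)
```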
Shown in Table V are the accuracies obtained by the trained FNN for various choices of the input data. The left column shows the set of three flow times t/a at which Q(t) is evaluated for the input of the FNN. In the upper eight rows we show the results with the input flow times t/a = (t̂_max, t̂_max − 0.05, t̂_max − 0.1) for several values of t̂_max. The accuracy becomes better as t̂_max becomes larger. By comparing this result with Table III, one finds that the accuracy obtained by the FNN is significantly higher than P_imp at t/a = t̂_max. In particular, the accuracy at t̂_max = 0.3 exceeds 99% for β = 6.5, while the benchmark model Eq. (7) gives P_imp ≃ 0.71. This result shows that the prediction of Q from the numerical data at t/a ≤ 0.3 is possible with high accuracy. The accuracy keeps increasing as t̂_max becomes larger, but the improvement over P_imp is limited for much larger t̂_max because P_imp is already close to unity. The same result is obtained for β = 6.2, although the accuracy is slightly lower than at β = 6.5.

Table V. Accuracy of the trained FNN in Table IV with various sets of the input data. The left column shows the values of t/a at which Q(t) is evaluated for the input. Errors are estimated from the variance among 10 different trainings.

In Fig. 5, we show the confusion matrices of the prediction of Q by the trained FNN with the input Q(t) at t/a = (0.3, 0.25, 0.2) for β = 6.2 and 6.5. The misidentified configurations deviate from the true Q mostly by ±1. As shown in the Appendix, the behavior of Q(t) is monotonic at t/a ≤ 0.3 on almost all configurations. Therefore, the value at large t can be estimated easily, for example, by the human eye, for almost all configurations. It is reasonable to interpret that the FNN learns this behavior. We, however, remark that the accuracy of 99% obtained by the trained FNN is still non-trivial. We have tried to find a simple function to predict Q from the three values of Q(t), but we could not find one that reaches the accuracy of the trained FNN.

Fig. 5. Confusion matrix of the trained FNN model with the input Q(t) at t/a = (0.3, 0.25, 0.2) for β = 6.2 and 6.5. The horizontal and vertical axes show the predicted and true Q, respectively.

In the lower three rows of Table V, we show the accuracies of the trained FNN with the input flow times t/a = (t̂_max, t̂_max − 0.1, t̂_max − 0.2) for several values of t̂_max.
The accuracies in these cases are slightly lower than the results with t/a = (t̂_max, t̂_max − 0.05, t̂_max − 0.1) at the same t̂_max. We have also tested FNN models analyzing four values of Q(t). It, however, was found that the accuracy does not exceed the case with three input data with the same maximum t/a. We have also tested FNN models having a more complex structure, for example, with multiple hidden layers. A statistically-significant improvement of the accuracy, however, was not observed, either.
In the conventional analysis of Q with the gradient flow discussed in Sec. 4, one must use the value of Q(t) at a large flow time at which the distribution of Q(t) is well localized. This means that the gradient flow equation has to be solved numerically up to that large flow time to obtain Q. Moreover, concerning the continuum limit a → 0, the flow time used for this determination is fixed in physical units while a is varied. This means that the flow time in lattice units, t/a, becomes large and the numerical cost for solving the flow equation increases as the continuum limit is approached. On the other hand, our analysis can estimate Q quite successfully only with the data at t/a ≲ 0.3. This means that the numerical cost for the evaluation of Q can be reduced drastically with the aid of the FNN.
Table V shows that the better accuracy is obtained on the finer lattice (larger β). From Fig. 1, it is suggested that this tendency comes from the reduction of the "flipping" of Q(t) on the finer lattice, as the non-monotonic flipping makes the prediction of Q from Q(t) at small t/a difficult. This effect is also suggested from Fig. 3, as P_naive and P_imp at large t/a are smaller at β = 6.2 than at β = 6.5. We note that this lattice-spacing dependence hardly changes even if we scale the value of t used to determine Q in Eq. (4) in physical units, provided t is sufficiently large. Provided that the flipping of Q(t) comes from the lattice artifact related to the ambiguity of the topological sectors on the discretized spacetime, it is conjectured that the imperfect accuracy of the FNN is to a large extent attributed to this lattice artifact. Then, the imperfect accuracy of the FNN at finite a is an inevitable one, and the accuracy should become better as the lattice spacing becomes finer. Therefore, it is conjectured that the systematic uncertainty arising from the imperfect accuracy of the FNN is suppressed in the analysis of the continuum extrapolation.
Next, we consider the variance of the topological charge ⟨Q²⟩, which is related to the topological susceptibility as χ_Q = ⟨Q²⟩/V. From the output of the FNN with the same input flow times as above, the variance of Q is calculated for each β [Eqs. (8) and (9)], where the first and second errors represent the statistical error obtained by the jackknife analysis and the uncertainty of the FNN model estimated from 10 different trainings, respectively. These values agree well with those shown in Table II.
So far, we have performed the training of the FNN with the number of training data N_train = 10,000. It is, however, worth trying the training with much smaller N_train. Shown in Table VI is the accuracy of the trained FNN for various N_train with the same input flow times. A small N_train is also advantageous for reducing the numerical cost of the training; with N_train = 500, a single training takes much less time than with N_train = 10,000 on the same environment.

Table VI. Dependence of the accuracy of the trained FNN on the number of the training data N_train (N_train = 10,000, 5,000, 1,000, 500, and 100).

Next, we consider the analysis of the data with a different β from the one used for the training. In Table VII, we show the accuracy obtained with various combinations of the β values used for the training and the analysis, with the same input flow times. The accuracy is slightly reduced when a different data set is analyzed, but the reduction is small and almost within statistics. We have also performed the training of the FNN with the combined data set of β = 6.2 and 6.5. The result of this analysis is shown in the bottom row of Table VII. One finds that this FNN can predict Q for each β with the same accuracy within statistics as those trained for the individual β.

Table VII. Accuracy obtained by the analysis of the data with a different β from the one used for the training. The last row shows the accuracy of the FNN trained on the mixed data set of β = 6.2 and 6.5.

These results suggest that it is possible to develop a NN model that deals with various β simultaneously. Once such a model is developed, it will play a quite useful role in the analysis of Q. We, however, note that the two lattices studied in the present work have almost the same spatial volume in physical units. The analysis of the robustness against the variation of the spatial volume is left for future work.
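A minimal sketch of the jackknife estimate of ⟨Q²⟩ mentioned above is given below, assuming Q is a NumPy array of (predicted) integer charges ordered along the Monte Carlo chain; the binsize of 100 follows Sec. 3, and χ_Q is then obtained by dividing by the four-volume V.

```python
import numpy as np

def jackknife_variance_Q(Q, binsize=100):
    """Jackknife estimate of <Q^2> and its statistical error with the given binsize."""
    n_bins = len(Q) // binsize
    Q = np.asarray(Q[:n_bins * binsize], dtype=float)
    bins = Q.reshape(n_bins, binsize)
    total = np.sum(Q**2)
    # leave-one-bin-out estimates of <Q^2>
    jk = (total - np.sum(bins**2, axis=1)) / (len(Q) - binsize)
    mean = jk.mean()
    err = np.sqrt((n_bins - 1) * np.mean((jk - mean)**2))
    return mean, err

# chi_Q = <Q^2> / V, with V the four-volume in the chosen units.
```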
7. Learning topological charge density q_t(x)

In this section we employ the CNN and train it to analyze the four-dimensional field q_t(x). A motivation of this analysis is the search for characteristic features responsible for the topology in the four-dimensional space by the ML. In particular, if the quantum gauge configurations have local structures like instantons, such structures would be recognized by the CNN and used for an efficient prediction of Q.
Table VIII. Design of the CNN for the analysis of the multi-dimensional data. The dimension d of the input data is 4 in Sec. 7; in Sec. 8, we analyze the data with d = 1, 2, and 3.

Let us first discuss the choice of the input data for the CNN. Because the gauge configurations on the lattice are described by the link variables U_μ(x), which are elements of the group SU(3), the most fundamental choice for the input data is the link variables. However, as U_μ(x) is described by 72 real variables per lattice site, a reduction of the data size is desirable for an efficient training. Moreover, because physical observables are given only by gauge-invariant combinations of U_μ(x), the CNN must learn the concept of gauge invariance, and accordingly the SU(3) matrix algebra, so that it can make a successful prediction of Q from U_μ(x). These concepts, however, would be too complicated for simple CNN models. For these reasons, in the present study we use the topological charge density q_t(x) as the input of the CNN: q_t(x) is gauge invariant, and the number of degrees of freedom per lattice site is one. To reduce the size of the input data further, we reduce the lattice volume to 8⁴ from 16⁴ and 24⁴ by average pooling as a preprocessing. In addition to the analysis of q_t(x) at a given t, we prepare a combined data set of q_t(x) at several values of t and analyze it as multi-channel data by the CNN.
In this section, we use the CNN with convolutional layers that deal with four-dimensional data. In Table VIII, we show the structure of the CNN model, where d denotes the dimension of the spacetime and is set to d = 4 in this section. The convolutional layers have a 3^d filter size and five output channels. In these convolutional layers, we use periodic padding for all directions to respect the periodic boundary conditions of the gauge configuration. N_ch denotes the number of channels of the input data per lattice site; N_ch = 1 when q_t(x) at a single t is fed into the CNN. We also perform the multi-channel analysis by feeding q_t(x) at N_ch flow times.
The lattice gauge theory has translational symmetry, and a shift of the spatial coordinates of q_t(x) in any direction does not change the value of Q. To ensure that the CNN automatically respects this property, we insert a global average pooling (GAP) layer after the convolutional layers. The GAP layer takes the average with respect to the spatial coordinates for each channel. The output of the GAP layer is then processed by two fully-connected layers before the final output. The logistic activation function is used for the convolutional and fully-connected layers.
The training of the CNN in this section has been mainly carried out on Google Colaboratory. We use 12,000 data for the training, 2,000 data for the validation, and 6,000 data for the test, respectively. The batchsize for the minibatch training is 200. We repeat the parameter tuning for 500 epochs. Other settings of the training are the same as in the previous section.
Besides the CNN model in Table VIII, we have tested various variations of the model. For example, we tested the ReLU activation function in place of the logistic one. The use of a fully-connected layer in place of the GAP layer and convolutional layers with a 5^d filter size were also tried. The number of output channels of the convolutional layers was varied up to 20. We, however, found that these variations do not improve the accuracy at all, while they typically increase the numerical cost of the training. The CNN in Table VIII is thus a simple but efficient choice among all these variations.
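A minimal Chainer sketch of a CNN along the lines of Table VIII is shown below. The d-dimensional convolutions are written with chainer.links.ConvolutionND, the periodic padding is implemented by wrapping the field around each lattice axis before the convolution, and the GAP layer is an average over all spatial sites per channel. The depth of two convolutional layers, the hidden sizes, and the single regression output are assumptions on top of the text, and whether ConvolutionND handles d = 4 efficiently depends on the installed Chainer version.

```python
import chainer
import chainer.functions as F
import chainer.links as L

def periodic_pad(x, pad=1):
    """Wrap-pad every lattice axis (axes 2..) to respect periodic boundaries."""
    for axis in range(2, x.ndim):
        index = [slice(None)] * x.ndim
        index[axis] = slice(x.shape[axis] - pad, x.shape[axis])
        left = x[tuple(index)]
        index[axis] = slice(0, pad)
        right = x[tuple(index)]
        x = F.concat([left, x, right], axis=axis)
    return x

class QtCNN(chainer.Chain):
    """Sketch of the CNN of Table VIII: 3^d convolutions with 5 channels,
    a global average pooling (GAP) layer, and two fully-connected layers
    ending in a single regression output."""
    def __init__(self, n_ch_in=3, n_ch_conv=5, d=4):
        super().__init__()
        with self.init_scope():
            self.conv1 = L.ConvolutionND(d, n_ch_in, n_ch_conv, ksize=3)
            self.conv2 = L.ConvolutionND(d, n_ch_conv, n_ch_conv, ksize=3)
            self.fc1 = L.Linear(n_ch_conv, n_ch_conv)
            self.fc2 = L.Linear(n_ch_conv, 1)

    def __call__(self, x):
        # x: (batch, N_ch, 8, 8, 8, 8) after the average-pooling preprocessing
        h = F.sigmoid(self.conv1(periodic_pad(x)))
        h = F.sigmoid(self.conv2(periodic_pad(h)))
        # GAP: average over all spatial coordinates for each channel
        h = F.mean(F.reshape(h, (h.shape[0], h.shape[1], -1)), axis=2)
        h = F.sigmoid(self.fc1(h))
        return self.fc2(h)
```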
In Table IX, we show the performance of the trained CNN with various inputs. The left two columns show N_ch and the flow time(s) used for the input. The upper four rows show the results with N_ch = 1 at t/a = 0, 0.1, 0.2, and 0.3; in the bottom row, N_ch = 3 and q_t(x) at t/a = (0.3, 0.2, 0.1) are used. The third column shows the accuracy P of the trained CNN obtained for each input. In the table, we also show the recalls of the individual topological sectors, R_Q, defined by

R_Q = N_Q^correct / N_Q,  (10)

where N_Q is the number of configurations in the topological sector Q and N_Q^correct is the number of correct answers among them.
The top row of Table IX shows P and R_Q obtained by the analysis of the topological charge density of the original gauge configuration without the gradient flow. Although we obtain a nonzero P, the recall of each Q shows that in this case the CNN is trained to answer Q = 0 for all configurations.
Next, the results with N_ch = 1 at nonzero t/a show that P becomes larger with increasing t/a. From R_Q one also finds that the output of the CNN scatters over different topological sectors. However, by comparing P with that of the benchmark model, P_imp in Table III at the same t/a, one finds that P and P_imp are almost the same. This result suggests that the CNN is trained to answer Q_imp and that no further information is obtained from the analysis of the four-dimensional data of q_t(x).
Finally, in the multi-channel analysis with the input flow times t/a = (0.3, 0.2, 0.1), P is significantly enhanced from the case with N_ch = 1 for both β. However, this accuracy is the same within the error as that obtained in Sec. 6 with t/a = (0.3, 0.2, 0.1) shown in Table V. This result implies that the CNN is trained to obtain Q(t) for each t and then predicts the answer from them with a similar procedure as the FNN in Sec. 6.
From these results, we conclude that our analyses of the four-dimensional data by the CNN fail in finding structures in the four-dimensional space responsible for the determination of Q. The numerical cost for the training of the CNN in this section is a few orders of magnitude larger than that in Sec. 6, although a clear improvement of the accuracy is not observed. Therefore, for practical purposes the analysis in the previous section with the FNN is superior.
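Given arrays of predicted and true integer charges, the recalls R_Q of Eq. (10) and the counts entering a confusion matrix such as Fig. 5 can be evaluated as in the following sketch; the array names are placeholders.

```python
import numpy as np

def recalls(pred, true, sectors=range(-4, 5)):
    """Eq. (10): recall R_Q = N_Q^correct / N_Q for each topological sector Q."""
    return {q: float(np.mean(pred[true == q] == q)) if np.any(true == q) else float("nan")
            for q in sectors}

def confusion_matrix(pred, true, sectors=range(-5, 6)):
    """Counts of (true Q, predicted Q) pairs, as visualized in Fig. 5."""
    idx = {q: i for i, q in enumerate(sectors)}
    m = np.zeros((len(sectors), len(sectors)), dtype=int)
    for t, p in zip(true, pred):
        if t in idx and p in idx:
            m[idx[t], idx[p]] += 1
    return m
```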
Table IX. Accuracy P and the recalls of individual topological sectors R_Q obtained by the analysis of the topological charge density in the four-dimensional space by the CNN. The input data has N_ch channels.

β = 6.2
N_ch  input t/a     P      R_{-4}  R_{-3}  R_{-2}  R_{-1}  R_0    R_1    R_2    R_3    R_4
1     0             0.371  0       0       0       0       1.000  0      0      0      0
1     0.1           0.401  0       0       0.002   0.255   0.702  0.341  0.008  0      0
1     0.2           0.552  0       0.043   0.240   0.495   0.687  0.597  0.336  0.111  0
1     0.3           0.776  0       0.391   0.687   0.760   0.821  0.794  0.740  0.569  0
3     0.3,0.2,0.1   0.942  0.200   0.913   0.944   0.950   0.944  0.939  0.937  0.889  0.571

β = 6.5
N_ch  input t/a     P      R_{-4}  R_{-3}  R_{-2}  R_{-1}  R_0    R_1    R_2    R_3    R_4
1     0             0.388  0       0       0       0       1.000  0      0      0      0
1     0.1           0.396  0       0       0       0.086   0.889  0.129  0      0      0
1     0.2           0.479  0       0       0.108   0.445   0.641  0.459  0.150  0      0
1     0.3           0.698  0       0.170   0.585   0.730   0.727  0.701  0.624  0.395  0.071
3     0.3,0.2,0.1   0.953  0       0.830   0.951   0.956   0.952  0.962  0.968  0.953  0.286

Fig. 6. Dependence of the accuracy P on the spacetime dimension d after the dimensional reduction, for β = 6.2 and 6.5. The values at d = 0 are the results of the FNN analysis of Q(t) in Sec. 6.
8. Dimensional reduction
In the previous two sections we discussed the analysis of the four-dimensional topological charge density q_t(x) and its four-dimensional integral Q(t) by the ML. The spatial dimensions of these input data are d = 4 and d = 0, respectively. In this section, we analyze the data obtained by reducing the dimension to d = 3, 2, and 1 by integrating out part of the coordinates as a preprocessing,

q̃_t^(3)(x_1, x_2, x_3) = ∫ dx_4 q_t(x_1, x_2, x_3, x_4),  (11)
q̃_t^(2)(x_1, x_2) = ∫ dx_3 dx_4 q_t(x_1, x_2, x_3, x_4),  (12)
q̃_t^(1)(x_1) = ∫ dx_2 dx_3 dx_4 q_t(x_1, x_2, x_3, x_4),  (13)

with q_t(x) = q_t(x_1, x_2, x_3, x_4). Here, q̃_t^(d) is the d-dimensional field analyzed by the CNN. The structure of the CNN is the same as in the previous section (see Table VIII) except for the value of d. The procedure of the supervised learning is also the same. We perform the analysis of the multi-channel data with N_ch = 3 at t/a = (0.3, 0.2, 0.1).
In Fig. 6, we show the d dependence of the accuracy obtained by the analysis of the d-dimensional data q̃_t^(d) by the CNN. The data points at d = 0 show the results of the analysis of Q(t) by the FNN in Sec. 6 with t/a = (0.3, 0.2, 0.1) given in Table V. From the figure, one finds that the accuracy does not have a statistically-significant d dependence.
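On the lattice, the integrals of Eqs. (11)-(13) become sums over the dropped coordinates. The following NumPy sketch performs this dimensional-reduction preprocessing for a multi-channel input; the channel-first array layout is an assumption.

```python
import numpy as np

def reduce_dimension(q_t, d):
    """Integrate out the last (4 - d) coordinates of q_t, Eqs. (11)-(13).

    q_t : array of shape (N_ch, L, L, L, L), the multi-channel charge density.
    d   : target dimension, 0 <= d <= 4; d = 0 returns Q(t) for each channel.
    """
    axes = tuple(range(1 + d, q_t.ndim))   # lattice axes to sum over
    return q_t.sum(axis=axes) if axes else q_t

# Example: three flow-time channels on an 8^4 lattice (random placeholder data)
q_t = np.random.randn(3, 8, 8, 8, 8)
print(reduce_dimension(q_t, 3).shape)   # (3, 8, 8, 8)
print(reduce_dimension(q_t, 1).shape)   # (3, 8)
print(reduce_dimension(q_t, 0).shape)   # (3,)  -> Q(t) for each channel
```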
9. Discussion
In the present study, we have investigated the application of the machine learning (ML) technique to the classification of the topological sector of gauge configurations in SU(3) Yang-Mills theory. The topological charge density q_t(x) at zero and nonzero flow times t is used as the input of the neural networks (NN), with and without the dimensional reduction.
We found that the prediction of the topological charge Q can be made most efficiently when Q(t) at small flow times is used as the input of the NN. In particular, we found that the value of Q defined from Q(t) at a large flow time can be predicted with high accuracy only with Q(t) at t/a ≤ 0.3; at β = 6.5, the accuracy exceeds 99%. Using this procedure, the numerical cost for solving the gradient flow toward the large flow time can be omitted in the analysis of the topological charge.
Because the prediction of the NN does not have 100% accuracy, the analysis of Q by the NN gives rise to uncontrollable systematic uncertainties. However, our analyses indicate that the accuracy becomes better as the continuum limit is approached. Moreover, as discussed in Sec. 6, the imperfect accuracy would to a large extent come from the intrinsic uncertainty of the topological sectors on the lattice with finite a. It thus is expected that the analysis of Q becomes more accurate as the lattice spacing becomes finer. As the 99% accuracy is already attained at β = 6.5 (a ≃ 0.044 fm), the systematic uncertainty should be well suppressed on lattices finer than this lattice spacing, and our method should allow the analysis of the continuum limit to be carried out safely.
In this study, we found that the analysis of the multi-dimensional field q_t(x) by the CNN does not improve the accuracy compared with that of Q(t). A plausible interpretation of this result is that our CNN fails in capturing useful structures in the four-dimensional space relevant for the determination of Q. It is an interesting future work to pursue the search for such structures in the four-dimensional space by the ML. One possible extension along this direction is the analysis with a CNN having a more complex structure. Another interesting direction is the analysis of the gauge configurations at high temperatures where the dilute instanton-gas picture is well applicable. As the topological charge would be carried by well-separated local objects at such temperatures, the search of the multi-dimensional space by the CNN would be easier than for the vacuum configurations analyzed in the present study. It is also interesting to analyze q_t(x) at a large flow time after subtracting the average, because the NN can no longer make use of the information on Q(t) after such a preprocessing. We leave these analyses for future research.

The authors thank A. Tomiya for many useful discussions. They also thank H. Fukaya and K. Hashimoto. The lattice simulations of this study were in part carried out on OCTOPUS at the Cybermedia Center, Osaka University, and Reedbush-U at the Information Technology Center, The University of Tokyo. The NN are constructed on the Chainer framework. The supervised learning of the NN in Sec. 7 was in part carried out on Google Colaboratory. This work was supported by JSPS KAKENHI Grant Numbers 17K05442 and 19H05598.

Appendix: Behavior of Q(t)

In this appendix, we take a closer look at the behavior of Q(t) at small t. In Fig. A·1, we show the t dependence of Q(t) on 100 gauge configurations at β = 6.5 in two different ways.
Fig. A·1. Closer look at the behavior of Q(t) at small t̂ = t/a. In the upper panel we show

Q̄(t̂) = Q(t̂ a) − Q,  (A·1)

as a function of t̂ = t/a, while the lower panel shows the same quantity normalized by its value at a fixed small flow time [Eq. (A·2)], so that the ratio becomes unity there.

In Sec. 6, the FNN predicts Q from the behavior of Q(t) at t̂ ≤ 0.3 at β = 6.5. This range of t̂ is highlighted by the gray band in Fig. A·1. From the upper panel, one sees that Q̄(t̂) approaches zero monotonically on almost all configurations. However, the panel also shows that some lines deviate from this trend. As a result, it seems difficult to predict the value of Q with 99% accuracy (Q has to be predicted correctly on 99 lines among 100 in the panel) by a simple function or by the human eye from the behavior at t̂ ≤ 0.3, although 95% accuracy is not difficult to attain. A similar observation is also obtained from the lower panel. It thus is indicated that the 99% accuracy obtained by the NN in Sec. 6 is not a trivial result.
1) S. Weinberg: The quantum theory of fields. Vol. 2: Modern applications (Cambridge University Press, 2013).
2) R. D. Peccei and H. R. Quinn: Phys. Rev. Lett. (1977) 1440.
3) J. Preskill, M. B. Wise, and F. Wilczek: Phys. Lett. B (1983) 127.
4) L. Abbott and P. Sikivie: Phys. Lett. B (1983) 133.
5) E. Berkowitz, M. I. Buchoff, and E. Rinaldi: Phys. Rev. D92 (2015) 034507.
6) R. Kitano and N. Yamada: JHEP (2015) 136.
7) M. Cè, C. Consonni, G. P. Engel, and L. Giusti: Phys. Rev. D92 (2015) 074502.
8) C. Bonati, M. D'Elia, M. Mariti, G. Martinelli, M. Mesiti, F. Negro, F. Sanfilippo, and G. Villadoro: JHEP (2016) 155.
9) P. Petreczky, H.-P. Schadler, and S. Sharma: Phys. Lett. B762 (2016) 498.
10) J. Frison, R. Kitano, H. Matsufuru, S. Mori, and N. Yamada: JHEP (2016) 021.
11) S. Borsanyi et al.: Nature (2016) 69.
12) Y. Taniguchi, K. Kanaya, H. Suzuki, and T. Umeda: Phys. Rev. D95 (2017) 054502.
13) S. Aoki, G. Cossu, H. Fukaya, S. Hashimoto, and T. Kaneko: PTEP (2018) 043B07.
14) C. Alexandrou, A. Athenodorou, K. Cichy, A. Dromard, E. Garcia-Ramos, K. Jansen, U. Wenger, and F. Zimmermann: arXiv:1708.00696 (2017).
15) F. Burger, E.-M. Ilgenfritz, M. P. Lombardo, and A. Trunin: Phys. Rev. D98 (2018) 094501.
16) P. T. Jahn, G. D. Moore, and D. Robaina: Phys. Rev. D98 (2018) 054512.
17) C. Bonati, M. D'Elia, G. Martinelli, F. Negro, F. Sanfilippo, and A. Todaro: JHEP (2018) 170.
18) L. Giusti and M. Lüscher: Eur. Phys. J. C79 (2019) 207.
19) M. Lüscher: Commun. Math. Phys. (1982) 39.
20) M. F. Atiyah and I. M. Singer: Annals of Math. (1971) 139.
21) Y. Iwasaki and T. Yoshie: Phys. Lett. (1983) 159.
22) M. Teper: Phys. Lett. (1985) 357.
23) S. Aoki, H. Fukaya, S. Hashimoto, and T. Onogi: Phys. Rev. D76 (2007) 054508.
24) M. Lüscher and S. Schaefer: JHEP (2011) 036.
25) Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner: Proc. IEEE (1998) 2278.
26) A. Krizhevsky, I. Sutskever, and G. E. Hinton: Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, NIPS'12, 2012, pp. 1097-1105.
27) Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, and A. Ng: Proceedings of the 29th International Conference on Machine Learning (ICML-12), ICML '12, July 2012, pp. 81-88.
28) M. Lin, Q. Chen, and S. Yan: arXiv:1312.4400 (2013).
29) C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich: CVPR, June 2015.
30) K. Simonyan and A. Zisserman: International Conference on Learning Representations, 2015.
31) K. He, X. Zhang, S. Ren, and J. Sun: CVPR, June 2016.
32) R. Girshick, J. Donahue, T. Darrell, and J. Malik: CVPR, June 2014.
33) R. Girshick: ICCV, December 2015.
34) W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg: ECCV, June 2016.
35) S. Ren, K. He, R. Girshick, and J. Sun: Advances in Neural Information Processing Systems 28, p. 91. Curran Associates, Inc., 2015.
36) J. Redmon, S. Divvala, R. Girshick, and A. Farhadi: CVPR, June 2016.
37) T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean: Advances in Neural Information Processing Systems 26, p. 3111. 2013.
38) A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin: Advances in Neural Information Processing Systems 30, p. 5998. 2017.
39) J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova: arXiv:1810.04805 (2018).
40) A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu: arXiv:1609.03499 (2016).
41) V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis: Nature (2015) 529.
42) D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, and D. Hassabis: Nature (2017) 354.
43) I. E. Lagaris, A. Likas, and D. I. Fotiadis: IEEE Trans. Neural Networks (1998) 987.
44) P. Baldi, P. Sadowski, and D. Whiteson: Nature Commun. (2014) 4308.
45) L. de Oliveira, M. Kagan, L. Mackey, B. Nachman, and A. Schwartzman: JHEP (2016) 069.
46) T. Ohtsuki and T. Ohtsuki: J. Phys. Soc. Jpn. (2016) 123706.
47) J. Barnard, E. N. Dawe, M. J. Dolan, and N. Rajcic: Phys. Rev. D95 (2017) 014018.
48) A. Tanaka and A. Tomiya: J. Phys. Soc. Jpn. (2017) 063001.
49) J. Carrasquilla and R. G. Melko: Nature Phys. (2017) 431.
50) Y. Mori, K. Kashiwa, and A. Ohnishi: Phys. Rev. D96 (2017) 111501.
51) M. Raissi, P. Perdikaris, and G. E. Karniadakis: arXiv:1711.10561 (2017).
52) H. Huang, B. Xiao, H. Xiong, Z. Wu, Y. Mu, and H. Song: arXiv:1081.03334 (2018).
53) P. E. Shanahan, D. Trewartha, and W. Detmold: Phys. Rev. D97 (2018) 094506.
54) K. Hashimoto, S. Sugishita, A. Tanaka, and A. Tomiya: Phys. Rev. D98 (2018) 046019.
55) K. Kashiwa, Y. Kikuchi, and A. Tomiya: PTEP (2019) 083A04.
56) J. Steinheimer, L. Pang, K. Zhou, V. Koch, J. Randrup, and H. Stoecker: arXiv:1906.06562 (2019).
57) K. Fukushima, S. S. Funai, and H. Iida: arXiv:1908.00281 (2019).
58) R. Narayanan and H. Neuberger: JHEP (2006) 064.
59) M. Lüscher: JHEP (2010) 071 [Erratum: JHEP 03, 092 (2014)].
60) M. Lüscher and P. Weisz: JHEP (2011) 051.
61) M. Kitazawa, T. Iritani, M. Asakawa, T. Hatsuda, and H. Suzuki: Phys. Rev. D94 (2016) 114512.
62) D. P. Kingma and J. Ba: arXiv:1412.6980 (2014).
63) Chainer. https://chainer.org/
64) Google Colaboratory. https://colab.research.google.com/