Continuous-mixture Autoregressive Networks for efficient variational calculation of many-body systems
Recognizing the topological phase transition by Variational Autoregressive Networks
Lingxiao Wang,∗,1,2 Yin Jiang,† Lianyi He,‡ and Kai Zhou§

1 Department of Physics, Tsinghua University and Collaborative Innovation Center of Quantum Matter, Beijing 100084, China
2 Frankfurt Institute for Advanced Studies, Ruth-Moufang-Str. 1, 60438 Frankfurt am Main, Germany
3 Department of Physics, Beihang University, Beijing 100191, China

∗ [email protected]
[email protected]
[email protected]
§ [email protected]

(Dated: May 12, 2020)

A generative model, the Variational Autoregressive Networks (VANs), is introduced to recognize the Kosterlitz-Thouless phase transition on a periodic square lattice. Vortices, as the signature of the quasi-Long-Range Order (LRO), are accurately detected by the ordering neural networks. By learning the microscopic probability distributions from the macroscopic thermal distribution, the free energy is calculated directly, and vortex (anti-vortex) pairs emerge as the temperature rises. As a more precise estimate, the helicity modulus is evaluated to anchor the transition temperature at β_c ≃ 1.101.

Introduction.—Machine Learning (ML) techniques are attracting widespread interest in different fields, owing to their power to extract and express structures inside complex data. Machine learning is permeating all fields of physics through tasks including classification, regression, and pattern generation [1, 2]. Remarkably, neural networks have been found to classify phase structures in both condensed matter and high energy physics [3–5]. As for regression, event selection in large data sets, e.g. from LHCb [6, 7], the spinodal decomposition in heavy-ion collisions [8], and molecular structure prediction [9] are successful applications. Furthermore, machine learning sheds light on innovations in first-principle calculations; e.g., the computation of many-body systems has been spurred by its development.
The Restricted Boltzmann Machine (RBM) was first applied to solve quantum many-body systems [10], and a deep neural network was constructed to derive solutions of the many-electron Schrödinger equation [11], where proper neural networks work as an Ansatz to represent systems as efficiently as possible. Even an open quantum many-body system can be represented by neural networks [12–15]. Modifying classical algorithms with machine learning is another promising direction [2, 16, 17], which is routinely adopted to improve or assist conventional calculations.

In addition, it is natural to take lattice simulations as a platform for applying neural networks, since they share similar discrete architectures. The advantage of lattice computing is that it discretizes the problem on finite sites and eventually recovers the correct physical results in the continuum limit. There are some meritorious attempts [17–21] with training data generated by the standard Markov Chain Monte Carlo (MCMC) method, whose expandability and efficiency are constrained by critical slowing down near the critical point [21] and by the sign problem [20]. Recently, a new method based on an autoregressive neural network was proposed and applied to discrete spin systems [22, 23]. By decomposing the macroscopic probability distribution onto the microscopic lattice site by site as the variational ansatz, it achieved a higher accuracy in solving several Ising-type systems. As a matter of fact, the Variational Autoregressive Networks (VANs) are a typical generative model with extraordinary potential in lattice calculations, as already demonstrated in image processing [24].

Inspired by these successful applications of machine learning, topological phase transitions can be explored with state-of-the-art ML approaches. Different from classical phase transitions, a topological phase transition occurs with the emergence of topological defects, which has continuously attracted attention from various fields of physics [25]. Some related efforts in both supervised and unsupervised learning have been attempted [26–29].
Winding numbers were recognized by a supervised trained neural network for a one-dimensional insulator model [28]. By generating configurations with MCMC sampling and supplying feature-engineered vortex configurations as the input, neural networks could detect the topological phase transition from well-preprocessed configurations [27]. In another attempt [30], the Variational Autoencoder was employed with an augmented objective function, yet the bulk magnetizations were not quantitatively captured by the configurations generated from the decoder.

In this letter, we apply the VANs to the task of recognizing a topological phase transition with continuous variables in an unsupervised manner. A concise reference example is the two-dimensional XY model, which exhibits a Kosterlitz-Thouless (KT) phase transition [31–33]. In the autoregressive neural network, the microscopic state on each lattice site is modelled probabilistically in order, which constructs a joint probability for the whole configuration from these conditional probability distributions [22, 23]. In the following parts, a generic VAN framework is introduced and neural networks suitable for the XY model are constructed. The signal of the KT phase transition, vortices, is automatically generated by the neural networks. Correspondingly, a more accurate estimate of the transition temperature for the KT phase transition in the 2-d XY model is given by calculating the helicity modulus [33]. As for the computational cost, despite an increasing trend as the lattice size grows, the training time remains unchanged around the transition point. Considering the advantages of the VANs, potential applications to other physical systems are proposed in the final part.
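The site-by-site factorization underlying the VANs can be illustrated on a toy binary chain. The sketch below is our own schematic (not the paper's implementation at Ref. [41]): each conditional sees only the sites drawn before it, and the joint log-probability is the sum of conditional log-probabilities.

```python
import numpy as np

def joint_log_prob(s, conditionals):
    """ln q(s) as a sum of conditional log-probabilities: the i-th factor
    q(s_i | s_1..s_{i-1}) depends only on the prefix s[:i]."""
    log_q = 0.0
    for i, cond in enumerate(conditionals):
        p = cond(s[:i])                       # probability that s_i = 1 given the prefix
        log_q += np.log(p if s[i] == 1 else 1.0 - p)
    return log_q

def sample_autoregressive(conditionals, rng):
    """Draw a configuration site by site in a fixed order, accumulating ln q(s)."""
    s, log_q = [], 0.0
    for cond in conditionals:
        p = cond(s)
        s_i = int(rng.random() < p)
        s.append(s_i)
        log_q += np.log(p if s_i == 1 else 1.0 - p)
    return s, log_q

# Toy chain of two binary sites; the second conditional depends on the first site.
conds = [lambda prefix: 0.3,
         lambda prefix: 0.9 if prefix[0] else 0.2]
probs = [np.exp(joint_log_prob([a, b], conds)) for a in (0, 1) for b in (0, 1)]
print(sum(probs))  # the four joint probabilities sum to 1: the factorization is normalized
```

Normalization comes for free because each conditional is normalized, which is what makes the direct free-energy evaluation below possible.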
Variational Autoregressive Networks.—The Hamiltonian of the two-dimensional XY model on the lattice is expressed by spins living on the lattice sites with nearest-neighbor interactions,

H = −J Σ_{⟨i,j⟩} s⃗_i · s⃗_j = −J Σ_{⟨i,j⟩} cos(φ_i − φ_j),   (1)

where ⟨i, j⟩ indicates that the sum is taken over all nearest-neighbor pairs and the angle φ_i ∈ [0, 2π) denotes the spin orientation on site i. The Mermin-Wagner theorem states that a long-range ordered (LRO) phase cannot exist in two-dimensional systems with continuous degrees of freedom, since fluctuations break the order [34]. Nevertheless, the formation of topological defects (i.e. vortices/anti-vortices) in the XY model brings in a quasi-LRO phase, which characterizes the global property of the many-body system.

To detect the KT phase transition in the XY model, the quantity of interest in statistical mechanics is the free energy F = −(1/β) ln Z, where β ≡ 1/T is the inverse temperature. The free energy is constructed from the partition function Z ≡ Σ_s exp(−βE(s)), which contains all information about the system; the summation covers all possible configurations {s} of the system. Monte Carlo algorithms are routinely applied to generate configurations with the proper relative importance among them. However, the free energy cannot be computed directly from such algorithms. Variational approaches were proposed to solve this problem, with a recent extension of the variational Ansatz to autoregressive neural networks as advanced in the VAN [22], an effective approximate method widely applied to many-body systems. In this paper, the variational target is the joint probability of the configurations, which follows the Boltzmann distribution p(s) = e^{−βE(s)}/Z. The configurations s = {s_1, s_2, ..., s_N} with continuous spins are defined on the lattice with N sites.
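For concreteness, the energy of Eq. (1) on a periodic L×L lattice can be evaluated by summing each bond once via the right and down neighbors. This is a minimal NumPy sketch under our own conventions (the function name is ours, not the paper's code):

```python
import numpy as np

def xy_energy(phi, J=1.0):
    """Eq. (1): H = -J * sum over nearest-neighbor pairs of cos(phi_i - phi_j)
    on a periodic lattice; np.roll supplies the periodic neighbors, and using
    only the right/down shifts counts every bond exactly once."""
    right = np.cos(phi - np.roll(phi, -1, axis=1))
    down = np.cos(phi - np.roll(phi, -1, axis=0))
    return -J * (right.sum() + down.sum())

L = 4
print(xy_energy(np.zeros((L, L))))  # -> -32.0: a fully aligned state has E = -2*J*L^2
```

A quick invariance check: adding a global rotation to all angles leaves the energy unchanged, since only differences φ_i − φ_j enter.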
As a variational Ansatz, the joint distribution q_θ(s) is parametrized by variational parameters {θ} and tuned to approach the target distribution p(s). The Kullback-Leibler (KL) divergence [35] between the variational and the target distribution, D_KL(q_θ ‖ p) ≡ E_{s∼q_θ}(−log p + log q_θ), provides a measure of the closeness of q_θ to p. The corresponding variational free energy derives from D_KL(q_θ ‖ p) = β(F_q − F), that is,

F_q ≡ (1/β) Σ_s q_θ(s) [βE(s) + ln q_θ(s)].   (2)

Since D_KL(q_θ ‖ p) is non-negative, the variational free energy F_q is always an upper bound of the true free energy. Thus minimizing D_KL(q_θ ‖ p) and minimizing the variational free energy F_q are equivalent. Meanwhile, as several works have pointed out [10, 22, 23], it is straightforward to map the parameters onto the weights of an Artificial Neural Network (ANN); the variational free energy then serves as the loss function.

FIG. 1: The architecture of the VANs in the XY model computation. It is a classical PixelCNN structure [36], in which masked layers are added to the network to establish the autoregressive property.
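The sampled estimate of Eq. (2) is easy to verify on an exactly solvable toy system. The following sketch is our illustration (not the paper's code): when q equals the Boltzmann distribution p, every sample contributes βE(s) + ln p(s) = −ln Z, so the estimator returns the true free energy with zero variance.

```python
import numpy as np

def variational_free_energy(energies, log_q, beta):
    """Monte Carlo estimate of Eq. (2): F_q = (1/beta) E_{s~q}[beta*E(s) + ln q(s)],
    from per-sample energies E(s) and log-probabilities ln q(s) of samples s ~ q."""
    return np.mean(beta * energies + log_q) / beta

# Exactly solvable toy system: one two-state "spin" with energies E in {0, 1}.
beta = 1.0
E = np.array([0.0, 1.0])
Z = np.exp(-beta * E).sum()
p = np.exp(-beta * E) / Z            # exact Boltzmann distribution

rng = np.random.default_rng(0)
samples = rng.choice(2, size=10000, p=p)
F_q = variational_free_energy(E[samples], np.log(p[samples]), beta)
print(F_q, -np.log(Z) / beta)        # the two values agree
```

For any other q (e.g. a uniform distribution over the two states) the same formula yields F_q > F, illustrating the upper-bound property stated above.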
Once the parametrization of q_θ(s) is adapted to an ANN, the variational problem is nothing but training the network. With autoregressive networks as in the VAN for this parametrization, the variational distribution is decomposed into a product of sequential conditional probabilities,

q_θ(s) = Π_{i=1}^{N} q_θ(s_i | s_1, . . . , s_{i−1}),   (3)

which provides the variational Ansatz by parametrizing each conditional probability as a neural network. In the XY model, considering the fact that the orientation of a spin changes continuously, each conditional probability factor q(s_i) should be continuous on each site. As a proper choice, we propose to take a mixture of Beta distributions X ∼ Beta(a, b), which ensures that the random variables are distributed in a finite interval. Furthermore, in Bayesian inference the Beta distribution is the conjugate prior of the Bernoulli distribution, which has been proven a proper distribution for the Ising model case [22]. Beta(a, b) is continuously defined on a finite interval with two positive shape parameters (a, b). Thus the hidden layers of the neural networks are designed with two channels for each Beta component, and the conditional probabilities are derived as

q_θ(s_i | s_1, ..., s_{i−1}) = [Γ(a_i + b_i) / (Γ(a_i)Γ(b_i))] s_i^{a_i−1} (1 − s_i)^{b_i−1},   (4)

where Γ(a) is the gamma function, s_i = φ_i/2π ∈ [0, 1], and (a_i, b_i) >
0. The outputs of the hidden layers are (a, b), as illustrated in Fig. 1.

To match the two-dimensional structure of the system, the PixelCNN [36] is employed, which naturally preserves the locality and the translational symmetry. In addition, the autoregressive property is guaranteed by putting a mask on the convolution kernel, so that the weights are non-zero only for half of the kernel and each conditional probability q_θ(s_i) is independent of s_j with j ≥ i for a prechosen ordering. As Fig. 1 shows, the input layer takes in the configurations s on the lattice; after passing them through several masked convolution layers, the parameters of the Beta distribution on each site are obtained in the output layer. The configuration probability q_θ(s) can then be derived, with which the variational free energy is calculated via Eq. (2) after such a forward propagation on a batch of independent configurations. Specifying channels in the convolution layers to represent the parameters of each Beta component is found to be effective in saving training time and in speeding up the later sampling. Training the neural networks is the key to performing the variational approach. With a classical back-propagation algorithm, the gradient of the loss function (i.e.
variational free energy) with respect to the network parameters is needed, which after employing the log-derivative trick [37] reads

β ∇_θ F_q = E_{s∼q_θ(s)} { [βE(s) + ln q_θ(s)] ∇_θ ln q_θ(s) },   (5)

where the gradient ∇_θ ln q_θ(s) is weighted by the reward signal (βE(s) + ln q_θ(s)).

The nuts-and-bolts VAN computation is implemented by the following procedure: with the randomly initialized network, independently sample a batch of configurations as the training set; forward-pass the training set to evaluate the log-probabilities and the variational free energy F_q; estimate the gradient β∇_θ F_q and update the network weights via back-propagation; with the updated network, re-sample a batch of configurations as the new training set, which now follows the current joint probability q_θ; repeat the above until the loss function converges; sample an ensemble of independent configurations from q_θ, each generated site by site; and calculate the thermodynamic observables. Although the convergence criterion ε is not rigorously defined, the energy difference from changing a single site provides an approximate upper limit: ε ≪ J/N on the square lattice with N sites.

Rediscovering the KT Phase Transition.—As mentioned earlier, the thermodynamic observables of the 2-d XY model have been well computed in numerous MCMC works [31, 33, 38, 39]. It is therefore necessary to compare the results of the VANs against MCMC. In the following calculations, the default network setup adopted in the VANs has width and depth (32, 3), and the Adam optimizer is applied to minimize the loss function in
PyTorch. The implementation of the VANs is available at Ref. [41]. The corresponding hyperparameters are consistent with former work [22]. As a reference for the computational cost: with batch size 1000 for a 16 ×
16 square lattice with periodic boundaries, a typical training step costs 0.198 seconds on a single NVIDIA RTX 2080 GPU. On the other side, the MCMC was implemented with a classical algorithm [42, 43] using 50000 warm-up steps to reach equilibrium, and the energy was computed from 1000 configurations, sampled every 10000 steps in equilibrium.

The variational free energy per site (density) is presented in Fig. 2, where results are shown for different lattice sizes, L = 4, 8,
16. With increasing lattice size, the results for L = 8 and L = 16 indicate that the free energy converges rapidly. This ensures that finite-size effects can be neglected for the larger size L = 16 in the following discussion. It should be added that although it is difficult to calculate the free energy directly in MCMC, similar variational methods can also be applied to the XY model with an MCMC updating process, as has been discussed for different models [44].
FIG. 2: The free energy per site of the XY model on a squarelattice with periodic boundary condition.
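The reward-weighted gradient of Eq. (5) used in the training loop above can be sanity-checked on a toy system before deploying it on the lattice. The sketch below is our illustration (not the paper's code): a softmax variational distribution over three discrete states, with the score-function estimator compared against a finite-difference gradient of the exact βF_q.

```python
import numpy as np

def grad_beta_Fq(theta, E, beta, rng, n_samples=200000):
    """Score-function estimator of Eq. (5) for a softmax distribution q_theta:
    grad_theta (beta*F_q) = E_{s~q}[(beta*E(s) + ln q(s)) * grad_theta ln q(s)]."""
    q = np.exp(theta - theta.max())
    q /= q.sum()
    s = rng.choice(len(theta), size=n_samples, p=q)
    reward = beta * E[s] + np.log(q[s])          # the reward signal of Eq. (5)
    grad_log_q = np.eye(len(theta))[s] - q       # grad ln q for a softmax: one-hot - q
    return (reward[:, None] * grad_log_q).mean(axis=0)

beta, E = 1.5, np.array([0.0, 1.0, 0.3])
theta = np.array([0.2, -0.1, 0.4])

def beta_Fq(th):
    """Exact beta*F_q = sum_s q(s) [beta*E(s) + ln q(s)] for the toy system."""
    q = np.exp(th - th.max()); q /= q.sum()
    return np.sum(q * (beta * E + np.log(q)))

eps = 1e-5
fd = np.array([(beta_Fq(theta + eps * np.eye(3)[k]) -
                beta_Fq(theta - eps * np.eye(3)[k])) / (2 * eps) for k in range(3)])
est = grad_beta_Fq(theta, E, beta, np.random.default_rng(0))
print(np.abs(est - fd).max())  # small: the stochastic and exact gradients agree
```

The estimator is unbiased because Σ_s ∇q_θ(s) = 0, so the extra term from differentiating ln q_θ inside the expectation vanishes.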
Fig. 3 shows the energy per site (density) at different temperatures in the VANs and in the MCMC algorithm, respectively. At low temperatures (β > β_KT) the two methods agree well. Around and above the transition (β < β_KT), the vortex pair densities n = v/(2L²), with the vorticity v = (1/2π) ∮_C ∇φ(r) · dr, are evaluated and compared between the VAN and MCMC. Since vortices emerge with higher entropy in the XY model, configurations with more vortices have lower free energy than those with fewer. That is why the energy in the VANs is higher than that in MCMC at small β (the high-temperature region).

FIG. 3: The energy density as a function of the inverse temperature β in the MCMC method and the VANs. The lattice size is 16 ×
16, and the shaded areas label the standard deviation from thermal fluctuations. The vortex pair density extracted in the VANs is shown in the inset.

TABLE I: Vortices. The vortex (anti-vortex) pair density n can be extracted from the well-trained networks. After the variational free energy converges, 1000 configurations are sampled from the networks, and the following values are ensemble averages. The non-zero vortex density at high temperature suggests a topological phase transition around β = 1.

Inverse temperature β
Energy density   MCMC   -0.424  -0.682  -0.996  -1.336  -1.502
                 VANs   -0.376  -0.577  -1.025  -1.359  -1.498
Vortices n       MCMC    0.114   0.080   0.042   0.010   0.002
                 VANs    0.122   0.096   0.035   0.004   0.001
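The vortex counting reported in Table I can be sketched as follows: the winding number of each unit plaquette is the sum of angle differences around it, each wrapped into [−π, π), divided by 2π. This is a NumPy illustration under our own conventions (not the paper's implementation):

```python
import numpy as np

def wrap(d):
    """Map angle differences into [-pi, pi)."""
    return (d + np.pi) % (2 * np.pi) - np.pi

def plaquette_winding(phi):
    """Winding v = (1/2pi) * (sum of wrapped angle differences) around each unit
    plaquette of a periodic lattice; +1 marks a vortex, -1 an anti-vortex."""
    a = phi                                   # corner (i, j)
    b = np.roll(phi, -1, axis=1)              # corner (i, j+1)
    c = np.roll(b, -1, axis=0)                # corner (i+1, j+1)
    d = np.roll(phi, -1, axis=0)              # corner (i+1, j)
    circ = wrap(b - a) + wrap(c - b) + wrap(d - c) + wrap(a - d)
    return circ / (2 * np.pi)

# A single vortex centred in the plaquette with corners (3,3)..(4,4) on an 8x8 grid.
i, j = np.indices((8, 8))
phi = np.arctan2(i - 3.5, j - 3.5)
v = plaquette_winding(phi)
print(int(round(float(v[3, 3]))))  # -> 1: the central plaquette carries winding +1
# On a periodic lattice the windings sum to zero (vortex/anti-vortex pairing), so
# the pair density can be taken as n = np.abs(v).sum() / (2 * phi.size).
```

The exact cancellation of the total winding on a periodic lattice is why vortices and anti-vortices can only appear in pairs, as the table's caption assumes.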
From other works [31, 39, 45], a relatively accurate transition temperature is reported as β_KT = 1/T_KT ≈ 1.12. A sharper probe of the transition is the helicity modulus γ, which measures the response of the free energy to a twist of the boundary conditions and can be estimated as

γ(L) = (1/L²) ⟨ Σ_{⟨i,j⟩} cos(φ_i − φ_j)(e⃗_ij · x⃗)² ⟩ − (β/L²) ⟨ [ Σ_{⟨i,j⟩} sin(φ_i − φ_j)(e⃗_ij · x⃗) ]² ⟩,   (6)

where L is the size of the square lattice, e⃗_ij is the vector pointing from site j to site i, and x⃗ is a unit vector along a fixed direction in the lattice plane (the trivial choices are x, y on the square lattice). The Kosterlitz renormalization group [32] predicts that γ(L → ∞) jumps from the value 2T_c/π to zero at the critical temperature; thus the helicity modulus gives a reliable estimate of the phase transition point.
FIG. 4: The helicity modulus as a function of the inverse temperature β in the VANs; the crossing point with the 2/(πβ) line is β_c ≃ 1.101 for lattice size L = 16 with a 100-channel mixture Beta distribution.

In Fig. 4, the helicity modulus evaluated from the VANs is shown for lattice size L = 16 with multi-channel (6, 12, and 100) mixture Beta distributions. The markers are the numerical results from the VANs, and the dashed lines show their fitting curves, which are used to anchor the crossing point with the 2/(πβ) line. We observe that the crossing point for L = 16 is β_c ≃ 1.101, close to the reported β_KT. Since the helicity modulus depends on a correlation function that requires higher-order statistics than the energy, small lattices are not considered here, given that larger sizes yield more precise evaluations, as observed in standard Monte Carlo simulations [27, 39]. As for the computational cost in the case L = 16, it remains 0.198 seconds per training step, independent of the temperature. The trained network also readily supports parallel sampling on the GPU. These suggest that Critical Slowing Down (CSD) can hopefully be avoided.

Even though the burden of the increased autocorrelation time [21, 46] in MCMC does not appear in the VANs, the training cost with increasing lattice size should be mentioned. As shown in Fig. 5, the circles are the training time per step for different lattice sizes, calculated with networks of width and depth (32, 3) near the transition temperature β = 1.12. The dashed line is the fitting curve of the form t(L) = aL^b, and the red dot (L = 24) is not used in the fitting. Here the cost of bypassing the CSD is a training time with approximately polynomial dependence on L; but it is a one-off cost, since the subsequent sampling can exploit the parallel advantage of the GPU for large ensemble generation. A more powerful GPU can reduce this cost further: the results in Fig. 5 were obtained on an NVIDIA RTX 2080 GPU, which is approximately 4 times faster than an NVIDIA RTX 2070 Max-Q GPU.

FIG. 5: The training time per step increases with the lattice size L at inverse temperature β = 1.12.

Discussions.—In this work, an autoregressive neural network is designed to be an
Ansatz for the variational approach to investigating the topological phase transition with continuous variables within the 2-d XY model. The Variational Autoregressive Networks learn to construct microscopic states of the spin system accurately, in which vortices (anti-vortices) emerge automatically. The energy and vorticity density are calculated from configurations generated by the networks; furthermore, the comparison with the MCMC algorithm indicates that the VANs tend to extract the dominant collective degrees of freedom of the XY model, vortices, which are crucial at high temperature. The autoregressive structure of the neural networks is beneficial for mining long-range correlations even beyond the phase transition point. It brings an opportunity to investigate more latent topological structures, such as in a coupled XY model [45] or in twisted bilayer graphene [47], where new long-range correlations emerge. Besides, a straightforward estimate of the transition point of the KT phase transition in the VANs is shown to be consistent with prior works. Although the growth of training time with system size is unavoidable, with the help of powerful GPU computation, searching for the critical point becomes more economical in the limit of generating big ensembles of configurations. In complicated many-body systems, the critical slowing down problem of MCMC is expected to be alleviated, which will help, e.g., Lattice QCD to reach the possible critical end point region. The φ⁴ model could be a practical
The φ model could be a practicalstep[21], in which the calculation accuracy and practica-bility should be rigorously examined.In a short summary, the machine learning approach,especially with well-designed neural networks can matchwith specific physical problems, such as the VANs tothe topological phase transitions shown in this work, theRNN to the system with time reversal symmetry[48] andthe DNN to the renormalization group approaches[49].This inspire us to explore the Machine Learning tech-niques in a more physical viewpoint, which will help usopen the black box of machine learning and nature.We thank Giuseppe Carleo and Junwei Liu for use-ful discussions. The work of this research is supportedby the BMBF under the ErUM-Data project (K. Z.),by the AI grant of SAMSON AG, Frankfurt (K. Z.), bythe National Natural Science Foundation of China, GrantNo. 11875002(Y.J.) and No.11775123 (L.H. and L.W.),by the Zhuobai Program of Beihang University(Y.J.), bythe National Key R&D Program of China, Grant No.2018YFA0306503 (L.H.). K.Z. also thanks the donationof NVIDIA GPUs by NVIDIA Corporation for research. Bibliography [1] M. Buchanan, Nat. Phys. , 1208 (2019). IBLIOGRAPHY 6 [2] G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld,N. Tishby, L. Vogt-Maranto, and L. Zdeborov´a, Rev.Mod. Phys. , 045002 (2019).[3] L. Wang, Phys. Rev. B , 195105 (2016).[4] J. Carrasquilla and R. G. Melko, Nat. Phys. , 431(2017).[5] L.-G. Pang, K. Zhou, N. Su, H. Petersen, H. St¨ocker,and X.-N. Wang, Nat. Commun , 210 (2018).[6] E. M. Metodiev and J. Thaler, Phys. Rev. Lett. ,241602 (2018).[7] G. Kasieczka, T. Plehn, A. Butter, K. Cranmer, D. Deb-nath, B. M. Dillon, M. Fairbairn, D. A. Faroughy, W. Fe-dorko, C. Gay, L. Gouskos, J. F. Kamenik, P. Komiske,S. Leiss, A. Lister, S. Macaluso, E. Metodiev, L. Moore,B. Nachman, K. Nordstr¨om, J. Pearkes, H. Qu, Y. Rath,M. Rieger, D. Shih, J. Thompson, and S. Varma, SciPostPhys. , 014 (2019).[8] J. Steinheimer, L.-G. Pang, K. Zhou, V. Koch, J. 
Randrup, and H. Stoecker, J. High Energ. Phys., 122 (2019).
[9] J. S. Smith, O. Isayev, and A. E. Roitberg, Chem. Sci., 3192 (2017).
[10] G. Carleo and M. Troyer, Science, 602 (2017).
[11] D. Pfau, J. S. Spencer, A. G. d. G. Matthews, and W. M. C. Foulkes, arXiv:1909.02487 [physics] (2019).
[12] A. Nagy and V. Savona, Phys. Rev. Lett., 250501 (2019).
[13] M. J. Hartmann and G. Carleo, Phys. Rev. Lett., 250502 (2019).
[14] F. Vicentini, A. Biella, N. Regnault, and C. Ciuti, Phys. Rev. Lett., 250503 (2019).
[15] N. Yoshioka and R. Hamazaki, Phys. Rev. B, 214306 (2019).
[16] H. Shen, J. Liu, and L. Fu, Phys. Rev. B, 205140 (2018).
[17] Y. Mori, K. Kashiwa, and A. Ohnishi, Prog. Theor. Exp. Phys. (2018), 10.1093/ptep/ptx191.
[18] K. Zhou, G. Endrődi, L.-G. Pang, and H. Stöcker, Phys. Rev. D, 011501 (2019).
[19] A. Alexandru, P. Bedaque, H. Lamm, and S. Lawrence, Phys. Rev. D, 094505 (2017), arXiv:1709.01971.
[20] P. Broecker, J. Carrasquilla, R. G. Melko, and S. Trebst, Sci. Rep., 8823 (2017).
[21] J. M. Urban and J. M. Pawlowski, arXiv:1811.03533 [hep-lat, physics] (2018).
[22] D. Wu, L. Wang, and P. Zhang, Phys. Rev. Lett., 080602 (2019).
[23] O. Sharir, Y. Levine, N. Wies, G. Carleo, and A. Shashua, Phys. Rev. Lett., 020503 (2020).
[24] Z. Ou, arXiv:1808.01630 [cs, stat] (2019).
[25] Scientific Background: Topological Phase Transitions and Topological Phases of Matter, Tech. Rep. (The Royal Swedish Academy of Sciences, 2016).
[26] C. Wang and H. Zhai, Phys. Rev. B, 144432 (2017).
[27] M. J. S. Beach, A. Golubeva, and R. G. Melko, Phys. Rev. B, 045207 (2018), arXiv:1710.09842.
[28] P. Zhang, H. Shen, and H. Zhai, Phys. Rev. Lett., 066401 (2018).
[29] H.-Y. Hu, S.-H. Li, L. Wang, and Y.-Z. You, arXiv:1903.00804 [cond-mat, hep-th] (2019).
[30] M. Cristoforetti, G. Jurman, A. I. Nardelli, and C. Furlanello, arXiv:1705.09524 [cond-mat, hep-lat] (2017).
[31] R. Gupta, J. DeLapp, G. G. Batrouni, G. C. Fox, C. F. Baillie, and J. Apostolakis, Phys. Rev. Lett., 1996 (1988).
[32] J. M. Kosterlitz, J. Phys. C: Solid State Phys., 1046 (1974).
[33] H. Weber and P. Minnhagen, Phys. Rev. B, 5986 (1988).
[34] H. Wagner and U. Schollwoeck, Scholarpedia, 9927 (2010).
[35] D. J. C. MacKay, Information Theory, Inference and Learning Algorithms (Cambridge University Press, 2003).
[36] A. Van Den Oord, N. Kalchbrenner, and K. Kavukcuoglu, in Proceedings of the 33rd International Conference on Machine Learning, ICML'16 (JMLR.org, New York, NY, USA, 2016) pp. 1747–1756.
[37] R. J. Williams, Mach. Learn., 229 (1992).
[38] M. Hasenbusch, J. Phys. A: Math. Gen., 5869 (2005).
[39] Y. Komura and Y. Okabe, J. Phys. Soc. Jpn., 113001 (2012).
[40] T. Salimans, A. Karpathy, X. Chen, and D. P. Kingma, arXiv:1701.05517 [cs, stat] (2017).
[41] L. Wang, "Code for recognizing the topological phase transition by the variational autoregressive networks," (2020), https://github.com/Anguswlx/VAN2XY.
[42] J. Tobochnik and G. V. Chester, Phys. Rev. B, 3761 (1979).
[43] S. Teitel and C. Jayaprakash, Phys. Rev. B, 598 (1983).
[44] K. Nicoli, P. Kessel, N. Strodthoff, W. Samek, K.-R. Müller, and S. Nakajima, arXiv:1903.11048 [cond-mat, stat] (2019).
[45] G. Bighin, N. Defenu, I. Nándori, L. Salasnich, and A. Trombettoni, Phys. Rev. Lett., 100601 (2019).
[46] J. Goodman and A. D. Sokal, Phys. Rev. D, 2035 (1989).
[47] A. Julku, T. J. Peltonen, L. Liang, T. T. Heikkilä, and P. Törmä, Phys. Rev. B, 060505 (2020).
[48] R. Iten, T. Metger, H. Wilming, L. del Rio, and R. Renner, Phys. Rev. Lett. 124