Machine learning phases and criticalities without using real data for training

Abstract

We study the phase transitions of the three-dimensional (3D) classical O(3) model and the two-dimensional (2D) classical XY model, as well as the quantum phase transitions of both 2D and 3D dimerized spin-1/2 antiferromagnets, using supervised neural network (NN) techniques. Moreover, unlike the conventional approaches commonly used in the literature, the training sets employed in our investigation are neither the theoretical nor the real configurations of the considered systems. Remarkably, with such an unconventional setup of the training stage, in conjunction with semi-experimental finite-size scaling formulas, the associated critical points determined by the NN method agree well with the established results in the literature. The outcomes obtained here imply that certain unconventional training strategies, like the one used in this study, are not only cost-effective in computation, but are also applicable to a wide range of physical systems.


D.-R. Tan and F.-J. Jiang ∗ Department of Physics, National Taiwan Normal University, 88, Sec.4, Ting-Chou Rd., Taipei 116, Taiwan


I. INTRODUCTION

The applications of artificial intelligence (AI) methods and techniques to the studies of many-body systems have recently inspired the communities of physics, applied physics, and physical chemistry. Moreover, many important and exciting achievements have been obtained using the AI approach in the last few years. Among these achievements, first-principles calculations of the properties of materials and the analysis of collider signals in high energy physics are two notable examples. Yet another significant accomplishment is the success of investigating critical phenomena using both supervised and unsupervised neural networks (NN). By employing dedicated convolutional neural network (CNN) techniques, which can capture certain characteristics of the studied models, the phase transitions associated with many classical and quantum systems, including the Ising model, the XY models, and the Hubbard model, have been studied with varying degrees of success. Because of these numerous successful examples, it is optimistically believed that with the ideas of AI one may be able to uncover features of certain systems that cannot be obtained by conventional methods. Even now, the search for dedicated AI techniques that surpass what the traditional approaches can reach remains vigorous.

The standard procedure, i.e., the most commonly considered scheme, of investigating the phase transitions of physical models by supervised NN consists of three steps, namely the training, the validation, and the testing stages. Among these three stages, the training is the most flexible one, and various strategies have been used for this step. Typically, real configurations of the studied systems obtained from certain numerical methods are employed as the training sets. In addition, the training has been applied at various chosen temperatures $T$ (the relevant parameter) across the transition temperature $T_c$ (the critical point).
This indicates that, in principle, $T_c$ (or the critical point) should be known in advance before one can employ the NN techniques. Such a training approach has led to success in studying the critical phenomena associated with several many-body systems such as the Ising and the Hubbard models. Other schemes for which the locations of the critical points are not required have been introduced as well. For instance, the method of using the theoretical ground state configurations in the ordered phase as the training sets has been demonstrated to be valid for ferromagnetic and antiferromagnetic Potts models. Readers interested in the details of these training processes are referred to the literature.

The strategy of considering the theoretical ground states in the ordered phase as the training sets requires only one training, and knowledge of the associated critical point(s) is not needed. This approach has been applied to both the ferromagnetic and the antiferromagnetic Potts models, and the obtained outcomes show that the idea is effective. In particular, numerical evidence strongly suggests that with this method the computational demand of the training stage is tremendously reduced, and that its applicability is broad.

Although the NN estimates of the critical points of the Potts models, obtained by using the ground state configurations in the ordered phase as the training sets, are impressive, an interesting question arises.
Specifically, is this approach applicable to studying the zero-temperature phase transitions of quantum spin systems, as well as the phase transitions of models with continuous variables such as the classical $O(3)$ model?

To answer the question outlined in the previous paragraph, here we study the phase transitions of the three-dimensional (3D) classical $O(3)$ (Heisenberg) model and the two-dimensional (2D) classical XY model, as well as the quantum phase transitions of 2D and 3D dimerized spin-1/2 antiferromagnets, using the simplest deep learning neural network, namely the multilayer perceptron (MLP). In particular, unlike the conventional or the unconventional training procedures introduced previously, in this investigation we follow the idea of using the theoretical ground states in the ordered phase as the training sets, but adopt an alternative strategy for the training. Specifically, the training sets employed here belong to neither the theoretical nor the real configurations of the considered systems.

FIG. 1: The studied dimerized quantum antiferromagnetic Heisenberg models: 2D ladder (left) and 3D plaquette (right) models. The bold and thin bonds shown in both sub-figures represent $J'$ and $J$ couplings, respectively.

The motivation for using the simplest deep learning NN in this study is that whether a NN idea is valid should not depend on the detailed architecture of the built NN. Hence an MLP made up of only three layers is employed here. One can certainly consider a more complicated (and dedicated) NN, such as a CNN, for the associated investigations. This is left for future work.

Remarkably, even using the extraordinary training sets mentioned above and some semi-experimental finite-size scaling (which will be introduced later), the constructed MLP can effectively detect the critical points of all the studied classical and quantum physical systems.
The intriguing outcomes obtained here strongly suggest that the approach of investigating the targeted physical systems before employing any objects for the training, as done here and in previous work, is not only cost-effective in computation, but also leads to an accurate determination of the associated critical points. Finally, it is notable that the simple procedure described here is not only valid for studying the phase transitions associated with spontaneous symmetry breaking (SSB), but also works for those related to topology.

This paper is organized as follows. After the introduction, the studied microscopic models and the employed NN are described. In particular, the NN training sets and labels are introduced thoroughly. Following this, the numerical results determined by applying the NN techniques are presented. Finally, a section concludes our investigation.

II. THE MICROSCOPIC MODELS AND OBSERVABLES

A. The 3D classical O(3) (Heisenberg) and 2D classical XY model

The Hamiltonian $H_{O(3)}$ of the 3D classical $O(3)$ (Heisenberg) model on a cubic lattice considered in our study is given by

$$\beta H_{O(3)} = -\beta \sum_{\langle ij \rangle} \vec{s}_i \cdot \vec{s}_j , \tag{1}$$

where $\beta$ is the inverse temperature and $\langle ij \rangle$ stands for the nearest-neighbor sites $i$ and $j$. In addition, in Eq. (1) $\vec{s}_i$ is a unit vector belonging to a 3D sphere $S^2$ and is located at site $i$.

Starting from an extremely low temperature, as $T$ rises the classical $O(3)$ system undergoes a phase transition from an ordered phase, where the majority of the unit vectors point in the same direction, to a disordered phase in which these vectors are oriented randomly.
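As a concrete illustration of Eq. (1), the following minimal sketch evaluates $\beta H_{O(3)}$ for a fully ordered and a random configuration. It is not the simulation code of this study: periodic boundary conditions, the dictionary-based storage, and the helper names (`random_unit_vector`, `bond_sum`, `beta_h`, `mean_spin_norm`) are assumptions made only for this example.

```python
import math
import random


def random_unit_vector(rng):
    """Draw a unit vector uniformly from the sphere S^2."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def bond_sum(spins, L):
    """Sum of s_i . s_j over nearest-neighbour bonds (periodic L^3 lattice, L >= 3)."""
    total = 0.0
    for x in range(L):
        for y in range(L):
            for z in range(L):
                s = spins[(x, y, z)]
                # Each bond is counted once through the three forward neighbours.
                for nb in (((x + 1) % L, y, z), (x, (y + 1) % L, z), (x, y, (z + 1) % L)):
                    t = spins[nb]
                    total += s[0] * t[0] + s[1] * t[1] + s[2] * t[2]
    return total


def beta_h(spins, L, beta):
    """Evaluate beta * H of Eq. (1) for one configuration."""
    return -beta * bond_sum(spins, L)


def mean_spin_norm(spins, L):
    """Length of the average spin vector (1 / L^3) * sum_i s_i."""
    n = L ** 3
    mx = sum(s[0] for s in spins.values()) / n
    my = sum(s[1] for s in spins.values()) / n
    mz = sum(s[2] for s in spins.values()) / n
    return math.sqrt(mx * mx + my * my + mz * mz)


L = 4
rng = random.Random(1)
sites = [(x, y, z) for x in range(L) for y in range(L) for z in range(L)]
ordered = {site: (0.0, 0.0, 1.0) for site in sites}          # all spins aligned
disordered = {site: random_unit_vector(rng) for site in sites}

print(beta_h(ordered, L, beta=1.0), mean_spin_norm(ordered, L))
print(beta_h(disordered, L, beta=1.0), mean_spin_norm(disordered, L))
```

In the ordered configuration every one of the $3L^3$ bonds contributes $-\beta$, while a random configuration has an average spin vector of small length, which is the qualitative distinction between the two phases described above.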
Relevant observables used here to signal this phase transition are the first and the second Binder ratios ($Q_1$ and $Q_2$), defined by

$$Q_1 = \frac{\langle |m| \rangle^2}{\langle m^2 \rangle} , \tag{2}$$

$$Q_2 = \frac{\langle m^2 \rangle^2}{\langle m^4 \rangle} , \tag{3}$$

where $m = \frac{1}{L^3} \sum_i \vec{s}_i$ and $L$ is the linear box size of the system.

The Hamiltonian of the 2D classical XY model on the square lattice has the same expression as $H_{O(3)}$, except that the corresponding unit vector $\vec{s}_i$ at site $i$ belongs to a (2D) circle instead of a 3D sphere.

B. The 2D and 3D dimerized quantum antiferromagnetic Heisenberg models

The 2D and 3D dimerized quantum antiferromagnetic Heisenberg models share a similar form of Hamiltonian, given by

$$H = \sum_{\langle i,j \rangle} J_{ij} \, \vec{S}_i \cdot \vec{S}_j , \tag{4}$$

where again $\langle i,j \rangle$ stands for the nearest-neighbor sites $i$ and $j$, $J_{ij} > 0$ is the antiferromagnetic coupling connecting $i$ and $j$, and $\vec{S}_i$ is the spin-1/2 operator located at $i$. Cartoon representations of the studied models, in particular the spatial arrangement of the antiferromagnetic couplings, are shown in fig. 1. (In this study, these quantum spin models will be called the 2D ladder and 3D plaquette models if no confusion arises.) From the figure one sees that, as the ratios $J'/J$ (of both models) are tuned, quantum phase transitions from ordered to disordered states take place in these models when $g := J'/J$ exceeds certain values $g_c$. Relevant observables considered in our investigation for studying the quantum phase transitions are again the first and the second Binder ratios described above. For the studied spin-1/2 systems, $Q_1$ and $Q_2$ have the following definitions:

$$Q_1 = \frac{\langle |M_s| \rangle^2}{\langle M_s^2 \rangle} , \tag{5}$$

$$Q_2 = \frac{\langle M_s^2 \rangle^2}{\langle M_s^4 \rangle} , \tag{6}$$

$$M_s = \frac{1}{L^d} \sum_i (-1)^{i_1 + i_2} S_i^z , \tag{7}$$

where $d$ is the dimensionality of the studied models.

The $g_c$ of the quantum spin systems, as well as the $T_c$ of the 3D classical $O(3)$ and 2D classical XY models introduced previously, have been calculated with high accuracy in the literature.

FIG. 2: The NN (MLP), which consists of one input layer, one hidden layer, and one output layer, used here and in previous work. In the figure $d$ is the dimensionality of the considered system. In addition, the objects in the input layer are made up of 200 copies of only two configurations for all the studied models. Finally, there are 512 (or 1024) nodes in the hidden layer, and each of these nodes is independently connected to every object in the input layer. Before each training object is connected to the nodes in the hidden layer, the steps of one-hot encoding and flatten are applied. The activation functions (ReLU and softmax) and where they are employed are shown explicitly. For all the considered systems, the output layers consist of two elements.

III. THE CONSTRUCTED SUPERVISED NEURAL NETWORKS

In this section, we will review the supervised NN, namely the multilayer perceptron (MLP) used in our study. The employed training sets and the associated labels for the studied models will be described as well.
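For orientation, the data flow of such an MLP (one-hot encoding, flatten, one ReLU hidden layer, and a two-component softmax output, as summarized in the caption of fig. 2) can be sketched in plain Python. This is only an illustrative forward pass under stated assumptions: the weights are random and untrained, the hidden layer has 16 nodes instead of 512 (or 1024), no regularization or training loop is included, and all function names are hypothetical. It is not the Keras model used in this study.

```python
import math
import random


def one_hot_flatten(config, n_values=2):
    """One-hot encode every site value in 0..n_values-1, then flatten."""
    flat = []
    for value in config:
        flat.extend(1.0 if value == k else 0.0 for k in range(n_values))
    return flat


def init_layer(n_in, n_out, rng):
    """Random (untrained) weights and zero biases for a dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Affine map W x + b."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    shift = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - shift) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def forward(config, hidden, output):
    """Input -> one-hot + flatten -> hidden (ReLU) -> output (softmax)."""
    return softmax(dense(relu(dense(one_hot_flatten(config), hidden)), output))


def magnitude(p):
    """Euclidean norm of the output vector."""
    return math.sqrt(sum(a * a for a in p))


rng = random.Random(0)
n_sites = 4 ** 3                            # a 4 x 4 x 4 lattice, flattened
hidden = init_layer(2 * n_sites, 16, rng)   # 16 nodes here, not the 512 (or 1024) of the text
output = init_layer(16, 2, rng)
p = forward([0] * n_sites, hidden, output)
print(p, magnitude(p))
```

Because the softmax output has two non-negative components that sum to one, its magnitude always lies between $1/\sqrt{2}$ (both components equal) and 1 (one component equal to 1). This magnitude is the quantity that the later subsections denote by $R$.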

A. The built multilayer perceptron (MLP)

The MLP used in our investigation is already detailed in previous work. Specifically, using the NN library Keras, we construct a supervised NN which consists of only one input layer, one hidden layer of 512 (or 1024) independent nodes, and one output layer. In addition, the algorithm, optimizer, and loss function considered in our calculations are minibatch, Adam, and categorical cross entropy, respectively. To avoid overfitting, we also apply $L_2$ regularization at various stages. The activation functions employed here are ReLU and softmax. The details of the constructed MLP, including the steps of one-hot encoding and flatten (and how these two processes work), are shown in fig. 2 and are available in the literature.

Finally, for the three studied models, results calculated using 10 sets of random seeds are all taken into account when presenting the final outcomes. We would like to point out that in the testing stage, each of these 10 calculations uses the same set of configurations produced from the Monte Carlo simulations. We will come back to this point later.

B. Training set and output labels for the 3D classical O(3) and 2D classical XY models

Regarding the training set employed in the calculations, instead of using real configurations obtained from simulations or the theoretical ground states in the ordered phase of the considered system, here we use a slightly different alternative. Specifically, to train the NN on an $L$ by $L$ by $L$ cubic lattice for the 3D classical $O(3)$ model, the training set consists of only two configurations. In addition, 0 is assigned to every site of one configuration, and the other configuration is made up by giving each of its sites the value of 1. As a result, the output labels are the vectors $(1, 0)$ and $(0, 1)$.

C. The expected output vectors for the 3D classical O(3) and 2D classical XY models at various T

It should be pointed out that an $O(3)$ configuration is specified completely by the associated two parameters, namely $\theta$ and $\psi$, at each site of the underlying cubic lattice. At an extremely low temperature $T$, all the unit vectors of an $O(3)$ configuration point toward a particular direction. Under such a circumstance, $\psi$ mod $\pi$ is either 0 or 1 for every unit vector (of an $O(3)$ configuration). The employed training set described in the previous subsection is motivated by this observation. As a result, the magnitude $R$ of the output vector for a ground state $O(3)$ configuration is 1. When the temperature rises, one expects that $R$ diminishes with $T$, and for $T \geq T_c$, $R$ takes its minimum possible value $1/\sqrt{2}$. Consequently, studying the magnitude of the NN output vectors as a function of $T$ can reveal certain relevant information about $T_c$.

Here we would like to emphasize the fact that $\psi$ rather than $\theta$ is considered in our investigation. This is because for any two given fixed values of $\psi$, the related arc lengths on the 3D unit sphere are the same. For $\theta$, this is not the case. Therefore, with $\psi$ one should arrive at more accurate outcomes.

The same scenario described above for the 3D classical $O(3)$ model applies to the 2D classical XY model as well.

D. Training set and output labels for the 2D and 3D quantum spin models

For the 2D and 3D dimerized quantum antiferromagnetic Heisenberg models investigated here, their associated classical ground state configurations (in the ordered phase) are adopted as the training set. Specifically, the training set for each of these two models consists of two configurations. Moreover, the spin value of every lattice site is either 1 or -1, and these values are arranged in an alternating pattern. In other words, for a site which has a spin value 1 (-1), the spin values of all of its nearest-neighbor sites are -1 (1). With such a setup of the training sets, the employed output vectors are naturally $(1, 0)$ and $(0, 1)$.

We would like to emphasize the fact that the training sets considered for the studied 2D and 3D quantum spin models are not even among the possible ground state configurations of these two systems.
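The two kinds of training sets described in subsections B and D can be written down in a few lines. The sketch below is illustrative only: the function names are hypothetical, the assignment of the label $(1, 0)$ to the first configuration and $(0, 1)$ to the second is an assumption (the text does not specify which label goes with which configuration), and the replication into 200 copies follows the caption of fig. 2.

```python
from itertools import product


def uniform_training_set(n_sites):
    """All-0 and all-1 configurations (subsection B) with two-component labels."""
    configs = [[0] * n_sites, [1] * n_sites]
    labels = [(1, 0), (0, 1)]   # which label goes with which object is an assumption
    return configs, labels


def staggered_training_set(L, d):
    """Two alternating +1/-1 configurations on an L^d lattice (subsection D)."""
    first = [1 if sum(site) % 2 == 0 else -1
             for site in product(range(L), repeat=d)]
    second = [-s for s in first]   # the same pattern with every spin flipped
    return [first, second], [(1, 0), (0, 1)]


def replicate(configs, labels, copies=200):
    """Repeat the two objects, as in the 200 copies mentioned in the caption of fig. 2."""
    return configs * copies, labels * copies


configs, labels = staggered_training_set(L=4, d=2)
inputs, targets = replicate(configs, labels)
print(len(inputs), configs[0][:4], configs[1][:4])
```

For even $L$ the alternating pattern is compatible with periodic boundaries, so every nearest neighbor of a site with value 1 carries the value -1 and vice versa.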

E. The expected output vectors for the 2D and 3D dimerized quantum spin models at various g

Due to quantum fluctuations, it is not possible to assign any definite spin configurations to the investigated quantum spin models when $g = 1$ and $g > g_c$. Therefore, how the corresponding output vectors behave with respect to the dimerization strength $g$ will be treated classically here. Consequently, $R$ should be 1 and $1/\sqrt{2}$ for $g = 1$ and $g \geq g_c$, respectively. As we will demonstrate shortly, the magnitudes $R$ of the outputs associated with the NN studies of these quantum spin systems follow these rules (i.e., the values of $R$ are 1 and $1/\sqrt{2}$ for $g = 1$ and $g > g_c$, respectively) in a satisfactory manner, and hence lead to fairly good estimations of the critical points.

IV. THE NUMERICAL RESULTS

The configurations associated with the considered systems, namely the 3D classical $O(3)$, the 2D classical XY, the 3D plaquette, and the 2D ladder models, are generated by the Wolff and the stochastic series expansion (SSE) algorithms. In addition, for each of the studied models, the corresponding configurations are recorded once in (at least) every two thousand Monte Carlo sweeps after the thermalization, and at least one thousand configurations are produced. These spin configurations are then used for the NN calculations. A semi-experimental finite-size scaling, which is adopted to estimate the critical points, will be introduced in this section as well.

A. Results of 3D classical O(3) model

In fig. 3, the first Binder ratio $Q_1$ is shown as a function of $\beta$ for $L = 8, 12, 16$. As can be seen from the figure, the curves corresponding to various $L$ intersect at a value of $\beta$ close to the predicted critical point $\beta_c \sim 0.693$.

FIG. 3: $Q_1$ as functions of $\beta$ for the 3D classical $O(3)$ model ($L = 8, 12, 16$).

For an $O(3)$ configuration obtained from the simulation, all the unit vectors associated with it are converted to $\psi$ mod $\pi$, and the resulting configuration is then fed into the trained NN.

$R$ as functions of $\beta$ for $L = 8$ and $L = 20$ are shown in fig. 4. While it is clear that both panels of fig. 4 imply that $R$ changes rapidly close to $\beta_c \sim 0.693$, $\beta_c$ cannot be calculated unambiguously when only the information of $R$ is available.

FIG. 4: $R$ as functions of $\beta$ for the 3D classical $O(3)$ model. The top and bottom panels are for $L = 8$ and $L = 20$, respectively.

If one assumes that $R$ diminishes linearly with $\beta$ in the critical region, then $\beta_c$ can be approximately estimated by the intersection of the curves of $R$ and $1/\sqrt{2} + 1 - R$. Such an idea has been used previously to calculate the $T_c$ of the 3D 5-state ferromagnetic Potts model as well as the $g_c$ of the 3D plaquette model (the latter will be studied in more detail here). Here we adopt a more appropriate approach for the determination of the considered critical points by taking into account the deviation between the theoretical and the calculated $R$.

Ideally, in the extremely low temperature region, the obtained $R$ should be 1. To fulfill this criterion, an overall shift $\Delta$, which is the difference between 1 and the $R$ from the simulation with the largest $\beta$, is applied. Figures 5 and 6 show the curves of $R + \Delta$ and $1/\sqrt{2} + 1 - R - \Delta$ as functions of $\beta$ for $L = 4, 8, 12, 20, 24$. As can be seen from the figures, the intersections of these two curves for all the $L$ (except $L = 4$) are in good agreement with the theoretical prediction $\beta_c \sim 0.693$ (the vertical dashed lines in these figures). While for large $L$ the estimated values of $\beta$ at which the two curves intersect are slightly away from $\beta_c \sim 0.693$, the results indicate that estimating $\beta_c$ by considering the intersection of the curves associated with $R + \Delta$ and $1/\sqrt{2} + 1 - R - \Delta$ is an effective approach. In particular, considering the simplicity of both the training procedure and the semi-experimental method of calculating $\beta_c$ (for any finite $L$) employed in this study, the accuracy reached here for the determination of the $T_c$ of the complicated 3D classical $O(3)$ model is remarkable.

The success of calculating the $T_c$ of the 3D classical $O(3)$ model by considering only $\psi$ mod $\pi$ indicates that partial information of the model is sufficient to estimate its associated critical point accurately.

To calculate the critical points of the studied models with high precision using the intersections described above, one may apply certain forms of finite-size scaling to those crossing points. Based on the outcomes shown in figs. 5 and 6, it is clear that the $R$ associated with the 3D classical $O(3)$ model receives only a mild finite-size effect. Apart from this, an accurate determination of the crossing places in the relevant parameter space, particularly high-precision estimates of the uncertainties of these crossing points, is needed in order to carry out the fits. Hence we postpone such an analysis to a later subsection, where the 2D 3-state ferromagnetic Potts model and the 2D classical XY model on the square lattice are discussed.

B. Results of 2D quantum spin system

The first Binder ratio $Q_1$ close to $g_c$ for the studied 2D dimerized spin-1/2 antiferromagnet (2D ladder model) is shown in fig. 7. Similar to the case of the 3D classical $O(3)$ model, the curves of large $L$ tend to intersect at a value of $g$ around 1.9. The estimated intersection $g \sim 1.9$ matches nicely with the known result for $g_c$. Of course, a better determination of $g_c$ requires a dedicated finite-size scaling analysis.

FIG. 5: $R + \Delta$ and $1/\sqrt{2} - \Delta + 1 - R$ as functions of $\beta$ for the 3D classical $O(3)$ model. The top and bottom panels are for $L = 4$ and $L = 8$, respectively.

For the 2D ladder model, the associated $R$ as functions of $g$ for $L = 24, 48$ are shown in fig. 8. Moreover, following the idea used to estimate $\beta_c$ for the 3D classical $O(3)$ model, the curves obtained by treating $R + \Delta$ and $1/\sqrt{2} + 1 - R - \Delta$ as functions of $g$ are shown in figs. 9 ($L = 24, 32$) and 10 ($L = 48, 64$), together with the expected $g_c$. Here $\Delta$ is the difference between the theoretical and the calculated values of $R$ at $g = 1$. As can be seen from the figures, when the box size $L$ increases, the $g$ at which the two curves intersect approaches the theoretical $g_c$. Hence the outcomes shown in the figures support the fact that our method of determining the critical points is also valid for the investigated quantum (spin) system.

FIG. 6: $R + \Delta$ and $1/\sqrt{2} - \Delta + 1 - R$ as functions of $\beta$ for the 3D classical $O(3)$ model. The top, middle, and bottom panels are for $L = 12$, $L = 20$, and $L = 24$, respectively.

FIG. 7: $Q_1$ (of $L = 16, 24, 32$) as functions of $g$ for the 2D dimerized quantum ladder model.

FIG. 8: $R$ as functions of $g$ for the 2D dimerized quantum ladder model. The top and bottom panels are for $L = 24$ and $L = 48$, respectively.

FIG. 9: $R + \Delta$ and $1/\sqrt{2} - \Delta + 1 - R$ as functions of $g$ for the 2D dimerized quantum ladder model. The top and bottom panels are for $L = 24$ and $L = 32$, respectively.

C. Results of 3D quantum spin system

The $g_c$ of the 3D plaquette model studied previously can be determined by considering the Néel temperatures $T_N$ of various $g$ close to $g_c$. Specifically, if the logarithmic correction is not taken into account, then close to $g_c$, $T_N$ can be described by $T_N \sim A |g - g_c|^{c} + B |g - g_c|^{2c}$, where $A$, $B$, and $c$ are some constants. As a result, $g_c$ can be calculated by fitting the data of $T_N$ of various $g$ to this form. The $g_c$ estimated by this approach lies between 4.35 and 4.375, see fig. 11. This $g_c$ will be used to examine the effectiveness of the NN method of calculating the $g_c$ of the 3D plaquette model.

$R$ as functions of $g$ for $L = 16$ and 32 for the 3D plaquette model are shown in fig. 12. In addition, the curves obtained by considering $R + \Delta$ and $1/\sqrt{2} + 1 - R - \Delta$ as functions of $g$ are shown in fig. 13 ($L = 16, 32$), along with the $g_c$ discussed in the previous paragraph. Here $\Delta$ is again the difference between the theoretical and the calculated values of $R$ at $g = 1$.

FIG. 10: $R + \Delta$ and $1/\sqrt{2} - \Delta + 1 - R$ as functions of $g$ for the 2D dimerized quantum ladder model. The top and bottom panels are for $L = 48$ and $L = 64$, respectively.

FIG. 11: $T_N$ as a function of $g$ for the 3D dimerized quantum plaquette model. The solid line shown in the figure is obtained by using the results from a fit.

FIG. 12: $R$ as functions of $g$ for the 3D dimerized quantum plaquette model. The top and bottom panels are for $L = 16$ and $L = 32$, respectively.

Remarkably, just like what we have found for the 2D quantum ladder model, the results shown in the figure clearly indicate that our NN method is valid for the 3D quantum spin model as well.

It is interesting to notice that the crossing points in both panels of fig. 13 are slightly below the critical point calculated from $T_N$. We attribute this to the facts that the systematic influence of some tunable parameters of the NN, as well as certain corrections to the employed finite-size scaling method, are not taken into account here.

Nevertheless, based on the outcomes associated with both the investigated 2D and 3D dimerized quantum antiferromagnetic Heisenberg models, it is clear that the NN approach employed here can be used to estimate the critical points of quantum phase transitions efficiently.

FIG. 13: $R + \Delta$ and $1/\sqrt{2} - \Delta + 1 - R$ as functions of $g$ for the 3D dimerized quantum plaquette model. The top and bottom panels are for $L = 16$ and $L = 32$, respectively.

D. Verification of the semi-experimental finite-size scaling formulas: 2D three-state ferromagnetic Potts model and 2D classical XY model on the square lattices

1. 2D three-state ferromagnetic Potts model

In the previous subsections, it is shown that the critical point can be obtained by considering the intersection of two curves made up of quantities associated with $R$. To obtain a high-precision estimate of the critical point using the crossing points, one can fit the data (of the crossing points) to a certain finite-size scaling expression. Here we use the available data of the 2D 3-state ferromagnetic Potts model on the square lattice to carry out such an investigation. For each $L$, the data are obtained using a single set of relevant NN parameters. As a result, the quoted errors are associated with the Potts configurations themselves.

Figure 14 shows the two curves built from $R$ and $\Delta$, as in the previous subsections, as functions of $T$ for various $L$ for the 2D 3-state ferromagnetic Potts model on the square lattice. A fit of the form $a + b/L^c$, where $a$, $b$, and $c$ are constants to be determined ($a$ is exactly the desired $T_c$), is applied to the crossing points obtained from $L = 10, 20, 40, 80, 120, 240$ (the data of $L = 120$ and 240 are not presented in fig. 14). When carrying out the fits, Gaussian noise is considered in order to estimate the corresponding errors of the constants $a$, $b$, and $c$.

FIG. 14: The two curves built from $R$ and $\Delta$ as functions of $T$ for various $L$ ($L = 10, 20, 40, 80$) for the 2D 3-state ferromagnetic Potts model on the square lattice.

FIG. 15: Fit of the crossing points (of various finite $L$) to the ansatz $a + b/L^c$. The data are associated with the 3-state ferromagnetic Potts model on the square lattice, and the dashed line in the figure is obtained by using the results of the fits, which give $T_c = 0.995(3)$.

The fits lead to $a = 0.995(3)$ (see fig. 15), which is consistent with the theoretical value of $T_c$.

FIG. 16: $R + \Delta$ and $1/\sqrt{2} + 1 - R - \Delta$ as functions of $\beta$ for various $L$ ($L = 36, 64, 96, 128$) for the 2D classical XY model on the square lattice.

FIG. 17: (Top) Estimation of the crossing point for $L = 80$. (Bottom) Fit of $\beta_c(L)$ to the ansatz $a + b/(\log(L))^2$. The solid line is obtained using the results from the fit.
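The two numerical steps used in this subsection, locating the crossing of the two curves for a given $L$ and fitting the crossing points to $a + b/L^c$, can be sketched as follows. This is a minimal illustration and not the analysis code of this study: the crossing is found by linear interpolation between sampled points, the fit uses a simple grid search over $c$ combined with linear least squares for $a$ and $b$, and the error estimate refits data perturbed by Gaussian noise, which is only one possible reading of the procedure described above. All function names, the grid for $c$, the default `r_min` of $1/\sqrt{2}$ (the two-component case), and the demonstration numbers are assumptions.

```python
import math
import random


def crossing_point(xs, r_values, ordered_index, r_min=1.0 / math.sqrt(2.0)):
    """Find where R + Delta meets r_min + 1 - R - Delta (linear interpolation)."""
    delta = 1.0 - r_values[ordered_index]   # overall shift so that R = 1 deep in the ordered phase
    # Difference of the two curves: (R + Delta) - (r_min + 1 - R - Delta).
    diff = [2.0 * (r + delta) - (1.0 + r_min) for r in r_values]
    for i in range(len(xs) - 1):
        if diff[i] == 0.0:
            return xs[i]
        if diff[i] * diff[i + 1] < 0.0:
            t = diff[i] / (diff[i] - diff[i + 1])
            return xs[i] + t * (xs[i + 1] - xs[i])
    if diff[-1] == 0.0:
        return xs[-1]
    raise ValueError("the two curves do not cross in the sampled range")


def fit_power_law(sizes, values, c_grid=None):
    """Least-squares fit of values ~ a + b / L**c; returns (a, b, c)."""
    if c_grid is None:
        c_grid = [0.1 + 0.01 * k for k in range(491)]   # c between 0.1 and 5.0
    best = None
    n = len(sizes)
    for c in c_grid:
        x = [L ** (-c) for L in sizes]
        sx, sy = sum(x), sum(values)
        sxx = sum(xi * xi for xi in x)
        sxy = sum(xi * yi for xi, yi in zip(x, values))
        b = (n * sxy - sx * sy) / (n * sxx - sx * sx)    # linear least squares in x = L**(-c)
        a = (sy - b * sx) / n
        sse = sum((a + b * xi - yi) ** 2 for xi, yi in zip(x, values))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1], best[2], best[3]


def fit_with_noise(sizes, values, errors, trials=200, seed=0):
    """Mean and spread of the fitted a under Gaussian perturbations of the data."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        noisy = [v + rng.gauss(0.0, e) for v, e in zip(values, errors)]
        samples.append(fit_power_law(sizes, noisy)[0])
    mean = sum(samples) / trials
    var = sum((s - mean) ** 2 for s in samples) / (trials - 1)
    return mean, math.sqrt(var)


# Synthetic demonstration data (not taken from this study).
sizes = [10, 20, 40, 80, 120, 240]
values = [2.0 + 3.0 / L ** 1.5 for L in sizes]
a, b, c = fit_power_law(sizes, values)
print(a, b, c)
```

A usage note: `ordered_index` selects the data point used to define $\Delta$, for example the lowest temperature when the scan variable is $T$, or the largest $\beta$ when it is $\beta$.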

2. 2D classical XY model

The curves of $R + \Delta$ and $1/\sqrt{2} + 1 - R - \Delta$ as functions of $\beta$ for several $L$ of the 2D classical XY model are shown in fig. 16. Similar to the analysis done for the 2D 3-state ferromagnetic Potts model, we calculate the crossing points for various $L$ and fit the obtained data to a finite-size scaling form in order to determine the associated $\beta_c$ (or $T_c$). After obtaining coarse estimates of the crossing points for various $L$ from fig. 16, more simulations are carried out in order to reach a better precision for these crossing points. These refined $\beta_c(L)$ are then fitted to the same formula (i.e., $a + b/L^c$) as that used for the 3-state Potts model. We find that the obtained results are not satisfactory. This can be expected, since the topological characteristics of the Kosterlitz-Thouless transition should be reflected in $R$.

Motivated by the finite-size scaling formulas used in the literature for the 2D classical XY model, we use an ansatz of the form $a + b/(\log(L))^2$ (here $a$ is the $\beta_c$) to fit the newly obtained data of crossing points. The outcome is good, and the resulting $\beta_c$ is consistent with the value known in the literature.

V. DISCUSSIONS AND CONCLUSIONS

In this study we investigate the phase transitions of the 3D classical $O(3)$ model and the 2D classical XY model, as well as the quantum phase transitions of both 2D and 3D dimerized spin-1/2 antiferromagnetic Heisenberg models, using the simplest deep learning NN, namely an MLP that is made up of only one input layer, one hidden layer, and one output layer.

In our investigation, the training set for each of the studied models consists of only two objects. In particular, none of the used training objects belongs to the theoretical or the real configurations of the considered physical systems.

Remarkably, with such an unconventional approach of carrying out the training processes, in conjunction with semi-experimental finite-size scaling formulas, the outcomes from the built MLP lead to very good estimations of the targeted critical points. The results reached here, as well as those shown in previous studies, provide convincing evidence that the performance of certain unconventional strategies, such as employing the theoretical ground state configurations as the training sets, is impressive. In particular, the simplicity of these approaches makes them cost-effective in computation. It is notable that the simple procedures used previously and here are not only valid for phase transitions associated with SSB, but also work for those related to topology.

FIG. 18: $R + \Delta$ and $1/\sqrt{4} - \Delta + 1 - R$ as functions of $\beta$ for the 3D classical $O(3)$ model ($L = 20$). The results are obtained from the calculations which use 4 configurations as the training set.

We would like to point out that for the 3D classical $O(3)$ model, the training set used here consists of two configurations (their elements are either all 1 or all 0). In principle, one can consider training sets made up of three, four, or even five configurations, following the same idea as that of the two-object training set.
To examine whether training sets consisting of more than two objects can reach the same level of success as that shown in the previous section, we have performed three more NN calculations using $n = 3$, $n = 4$, and $n = 5$ training sets. Here $n$ denotes the number of objects contained in the training set. Interestingly, the precision of the estimated $T_c$ of the 3D classical $O(3)$ model obtained from these additional calculations becomes slightly less satisfactory as $n$ increases; see fig. 18 for an outcome related to $n = 4$ and $L = 20$. Intuitively, this can be understood as follows. Let us assume that initially all the unit vectors belong to one category of the classification scheme implemented in the training stage. Then any local fluctuation will have a greater impact on the resulting NN outputs if the training set contains more types of objects. Despite this, the outcomes associated with training sets consisting of only 2 configurations, including those from all the three studied models, strongly suggest the effectiveness of the approach presented in this study.

FIG. 19: $R + \Delta$ and $1/\sqrt{2} - \Delta + 1 - R$ as functions of $\beta$ for the 3D classical $O(3)$ model. The data are obtained using only one set of random seeds. The top and bottom panels are for $L = 4$ and $L = 24$, respectively.

The NN results related to all the models considered here are obtained using 10 sets of random seeds, with the other parameters of the NN being fixed in the calculations. For several $L$ of the studied 3D $O(3)$ and 2D ladder models, we have performed analyses using only one of the 10 trained NNs. Some of the resulting outcomes are shown in figs. 19 and 20 (the errors of the data shown in these new figures are associated with the configurations determined from QMC simulations). These new figures match nicely with those determined with 10 sets of random seeds.
Apart from this, we have also carried out several calculations using various batch sizes, numbers of epochs, and numbers of nodes in the hidden layer. These new calculations also lead to very good agreement with the results shown explicitly in this study; see fig. 21 for one result from these new calculations. The additional investigations introduced in this paragraph imply that the tunable parameters of the NN have very mild effects on the resulting outcomes of R for the considered models. Hence the conclusion obtained here should be reliable. Of course, as already pointed out before, other systematic effects must be considered if a highly accurate estimate of the targeted critical point is desired.

FIG. 20: R + ∆ and 1/√2 − ∆ + 1 − R as functions of g for the 2D ladder model (L = 32). The data are obtained using only one set of random seeds.

FIG. 21: R + ∆ and 1/√2 − ∆ + 1 − R as functions of g for the 2D ladder model (L = 48). The data are obtained using NN parameters different from those used in the previous subsection.

Although in this study we have focused on the phase transitions of several models, it is probable that simple NN approaches, similar to the one(s) considered here, can be used to investigate other physical properties of many-body systems. Finally, we would like to emphasize the motivation for our series of studies applying NN techniques to the phase transitions of several physical systems, as shown in Refs. and here. Conventionally, the application of a supervised NN to explore the critical phenomenon of a specific system has a caveat, namely that knowledge of the critical point is required in advance before one can employ the NN methods for the investigation. Hence, for systems with unknown critical points, it may not be easy to apply such standard NN procedures in a straightforward manner. The approaches considered in Refs. and here can take care of this issue, and hence promote the use of NN methods in various fields of many-body systems. In particular, these unconventional methods are adequate for carrying out NN investigations that examine whether certain proposed theories are relevant for a real and unexplored physical system.
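The crossing construction plotted in figs. 18-21, in which the curve R + ∆ is intersected with a second curve of the form 1/√n − ∆ + 1 − R, can be located numerically by linear interpolation between neighboring data points. The sketch below is one possible way to do this and is not taken from the paper: it assumes that n is the number of training objects, treats ∆ as a given input, and uses invented data. The function name crossing_point is hypothetical.

```python
import math


def crossing_point(xs, rs, delta=0.0, n_classes=2):
    """Locate where R + delta crosses 1/sqrt(n) - delta + 1 - R.

    xs: control-parameter values (e.g. beta or g) in increasing order.
    rs: the corresponding values of R.
    Returns the interpolated crossing, or None if the curves do not cross.
    """
    base = 1.0 / math.sqrt(n_classes) + 1.0
    # Difference of the two curves: (R + delta) - (base - delta - R).
    diffs = [2.0 * r + 2.0 * delta - base for r in rs]
    for i in range(len(xs) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0.0:
            return xs[i]
        if d0 * d1 < 0.0:
            # Linear interpolation of the zero between the two points.
            t = d0 / (d0 - d1)
            return xs[i] + t * (xs[i + 1] - xs[i])
    if diffs and diffs[-1] == 0.0:
        return xs[-1]
    return None


# Invented example data: R rising from about 1/sqrt(2) towards 1.
betas = [0.60, 0.65, 0.70, 0.75, 0.80]
r_values = [0.71, 0.74, 0.83, 0.93, 0.98]

print(crossing_point(betas, r_values, delta=0.0))
print(crossing_point(betas, r_values, delta=0.01))
```

A positive ∆ shifts the crossing to the point where R equals (1/√n + 1)/2 − ∆, which is how the correction enters the estimate in this sketch.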

Acknowledgement

Partial support from Ministry of Science and Technology of Taiwan is acknowledged.

Matthias Rupp, Alexandre Tkatchenko, Klaus-Robert Müller, and O. Anatole von Lilienfeld, Phys. Rev. Lett.
John C. Snyder, Matthias Rupp, Katja Hansen, Klaus-Robert Müller, and Kieron Burke, Phys. Rev. Lett.
Grégoire Montavon, Matthias Rupp, Vivekanand Gobre, Alvaro Vazquez-Mayagoitia, Katja Hansen, Alexandre Tkatchenko, Klaus-Robert Müller, and O. Anatole von Lilienfeld, New Journal of Physics (2013) 095003.
Pilania, G., Wang, C., Jiang, X. et al., Sci Rep , 2810 (2013).
B. Meredig, A. Agrawal, S. Kirklin, J. E. Saal, J. W. Doak, A. Thompson, K. Zhang, A. Choudhary, and C. Wolverton, Phys. Rev. B , 094104 (2014).
K. T. Schütt, H. Glawe, F. Brockherde, A. Sanna, K. R. Müller, and E. K. U. Gross, Phys. Rev. B , 205118 (2014).
Zhenwei Li, James R. Kermode, and Alessandro De Vita, Phys. Rev. Lett. , 096405 (2015).
P. Baldi, P. Sadowski and D. Whiteson, Phys. Rev. Lett. , 111801 (2015).
V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg and D. Hassabis, Nature , no.7540, 529-533 (2015).
Tobias Lang, Florian Flachsenberg, Ulrike von Luxburg, and Matthias Rarey, J. Chem. Inf. Model. 2016, 56, 1,
Joohwi Lee, Atsuto Seko, Kazuki Shitara, Keita Nakayama, and Isao Tanaka, Phys. Rev. B , 115104 (2016).
J. Searcy, L. Huang, M. A. Pleier and J. Zhu, Phys. Rev. D , no.9, 094033 (2016).
T. Castro, M. Quartin and S. Benitez-Herrera, Phys. Dark Univ. (2016), 66-76.
P. Baldi, K. Cranmer, T. Faucett, P. Sadowski and D. Whiteson, Eur. Phys. J. C , no.5, 235 (2016).
P. Baldi, K. Bauer, C. Eng, P. Sadowski and D. Whiteson, Phys. Rev. D , 094034 (2016).
Giacomo Torlai and Roger G. Melko, Phys. Rev. B , 165134 (2016).
Lei Wang, Phys. Rev. B , 195105 (2016).
M. Attarian Shandiz and R. Gauvin, Computational Materials Science 117 (2016) 270-278.
Tomoki Ohtsuki and Tomi Ohtsuki, J. Phys. Soc. Jpn. 85, 123706 (2016).
B. Hoyle, Astron. Comput. , 34-40 (2016).
Juan Carrasquilla, Roger G. Melko, Nature Physics , 431-434 (2017).
Giuseppe Carleo, Matthias Troyer, Science 355, 602 (2017).
Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing and Vijay Pande, Chem. Sci., 2018, , 513.
Peter Broecker, Juan Carrasquilla, Roger G. Melko, and Simon Trebst, Scientific Reports , 8823 (2017).
Kelvin Ch'ng, Juan Carrasquilla, Roger G. Melko, and Ehsan Khatami, Phys. Rev. X , 031038 (2017).
J. Barnard, E. N. Dawe, M. J. Dolan and N. Rajcic, Phys. Rev. D , 014018 (2017).
Akinori Tanaka, Akio Tomiya, J. Phys. Soc. Jpn. 86, 063001 (2017).
Evert P.L. van Nieuwenburg, Ye-Hua Liu, Sebastian D. Huber, Nature Physics , 435-439 (2017).
A. Mott, J. Job, J. R. Vlimant, D. Lidar and M. Spiropulu, Nature , no.7676, 375-379 (2017).
Junwei Liu, Huitao Shen, Yang Qi, Zi Yang Meng, Liang Fu, Phys. Rev. B , 241104(R) (2017).
Tomoyuki Tamura et al (2017) Modelling Simul. Mater. Sci. Eng.
J. Tubiana and R. Monasson, Phys. Rev. Lett. , 138301 (2017).
Xiao Yan Xu, Yang Qi, Junwei Liu, Liang Fu, Zi Yang Meng, Phys. Rev. B , 041119(R) (2017).
Li Huang and Lei Wang, Phys. Rev. B , 035105 (2017).
M. Zevin, S. Coughlin, S. Bahaadini, E. Besler, N. Rohani, S. Allen, M. Cabero, K. Crowston, A. Katsaggelos, S. Larson, T. K. Lee, C. Lintott, T. Littenberg, A. Lundgren, C. Østerlund, J. Smith, L. Trouille and V. Kalogera, Class. Quant. Grav. , no.6, 064003 (2017).
Junwei Liu, Yang Qi, Zi Yang Meng, Liang Fu, Phys. Rev. B , 041101 (2017).
Yue Liu, Tianlu Zhao, Wangwei Ju, Siqi Shi, J. Materiomics 3 (2017) 159-177.
Qianshi Wei, Roger G. Melko, Jeff Z. Y. Chen, Phys. Rev. E , 032504 (2017).
Yuki Nagai, Huitao Shen, Yang Qi, Junwei Liu, and Liang Fu, Phys. Rev. B
Kolb, B., Lentz, L. C. and Kolpak, A. M., Sci Rep , 1192 (2017).
Dong-Ling Deng, Xiaopeng Li, and S. Das Sarma, Phys. Rev. B
Pedro Ponte and Roger G. Melko, Phys. Rev. B , 205146 (2017).
G. Kasieczka, T. Plehn, M. Russell and T. Schell, JHEP , 006 (2017).
Yi Zhang, Roger G. Melko, and Eun-Ah Kim, Phys. Rev. B , 245119 (2017).
Yi Zhang and Eun-Ah Kim, Phys. Rev. Lett. , 216401 (2017).
Wenjian Hu, Rajiv R. P. Singh, and Richard T. Scalettar, Phys. Rev. E , 062122 (2017).
C.-D. Li, D.-R. Tan, and F.-J. Jiang, Annals of Physics, 391 (2018) 312-331.
Kelvin Ch'ng, Nick Vazquez, and Ehsan Khatami, Phys. Rev. E , 013306 (2018).
Lu, S., Zhou, Q., Ouyang, Y. et al., Nat Commun 9, 3405 (2018).
D. George and E. A. Huerta, Phys. Rev. D , 044039 (2018).
Matthew J. S. Beach, Anna Golubeva, and Roger G. Melko, Phys. Rev. B , 045207 (2018).
L. G. Pang, K. Zhou, N. Su, H. Petersen, H. Stöcker and X. N. Wang, Nature Commun. , no.1, 210 (2018).
Phiala E. Shanahan, Daniel Trewartha, and William Detmold, Phys. Rev. D , 094506 (2018).
Nongnuch Artrith, Alexander Urban, and Gerbrand Ceder, J. Chem. Phys. , 241711 (2018).
Keith T. Butler, Daniel W. Davies, Hugh Cartwright, Olexandr Isayev, and Aron Walsh, Nature , 547-555 (2018).
Albert P. Bartók, James Kermode, Noam Bernstein, and Gábor Csányi, Phys. Rev. X , 041048 (2018).
Pengfei Zhang, Huitao Shen, and Hui Zhai, Phys. Rev. Lett. , 066401 (2018).
Jake Graser, Steven K. Kauwe, and Taylor D. Sparks, Chem. Mater. 2018, 30, 11, 3601-3612.
A. Butter, G. Kasieczka, T. Plehn and M. Russell, SciPost Phys. , no.3, 028 (2018).
Jun Gao et al., Phys. Rev. Lett. , 240501 (2018).
Wanzhou Zhang, Jiayu Liu, and Tzu-Chieh Wei, Phys. Rev. E , 032142 (2019).
Jonas Greitemann, Ke Liu, and Lode Pollet, Phys. Rev. B , 060404 (2019).
J. Ren, L. Wu, J. M. Yang and J. Zhao, Nucl. Phys. B , 114613 (2019).
Xiao-Yu Dong, Frank Pollmann, and Xue-Feng Zhang, Phys. Rev. B , 121104 (2019).
M. Cavaglia, K. Staats and T. Gill, Commun. Comput. Phys. , no.4, 963-987 (2019).
Daniel W. Davies, Keith T. Butler, and Aron Walsh, Chem. Mater. 2019, 31, 18, 7221-7230.
G. P. Conangla, F. Ricci, M. T. Cuairan, A. W. Schell, N. Meyer and R. Quidant, Phys. Rev. Lett. , 223602 (2019).
Boram Yoon, Tanmoy Bhattacharya, and Rajan Gupta, Phys. Rev. D , 014504 (2019).
J. Fluri, T. Kacprzak, A. Lucchi, A. Refregier, A. Amara, T. Hofmann and A. Schneider, Phys. Rev. D , 063514 (2019).
Askery Canabarro, Felipe Fernandes Fanchini, André Luiz Malvezzi, Rodrigo Pereira, and Rafael Chaves, Phys. Rev. B , 045129 (2019).
Limeng Li, Yang You, Shunbo Hu, Yada Shi, Guodong Zhao, Chen Chen, Yin Wang, Alessandro Stroppa and Wei Ren, Appl. Phys. Lett. , 083102 (2019).
Henry Chan, Badri Narayanan, Mathew J. Cherukara, Fatih G. Sen, Kiran Sasikumar, Stephen K. Gray, Maria K. Y. Chan, and Subramanian K. R. S. Sankaranarayanan, J. Phys. Chem. C 2019, 123, 12, 6941-6957.
Wenqian Lian et al., Phys. Rev. Lett. , 210503 (2019).
Linyang Zhu, Weiwei Zhang, Jiaqing Kou, and Yilang Liu, Physics of Fluids , 015105 (2019).
Pankaj Mehta, Marin Bukov, Ching-Hao Wang, Alexandre G.R. Day, Clint Richardson, Charles K. Fisher, and David J. Schwab, Phys. Rep. 810, (2019) 1-124.
Schutt, K.T., Gastegger, M., Tkatchenko, A. et al., Nat Commun 10, 5024 (2019).
Joaquin F. Rodriguez-Nieva and Mathias S. Scheurer, Nat. Phys. 15, 790-795 (2019).
Giuseppe Carleo, Ignacio Cirac, Kyle Cranmer, Laurent Daudet, Maria Schuld, Naftali Tishby, Leslie Vogt-Maranto, and Lenka Zdeborová, Rev. Mod. Phys. , 045002 (2019).
T. Ohtsuki et al., J. Phys. Soc. Jpn. 89, 022001 (2020).
Wataru Hashimoto, Yuta Tsuji, and Kazunari Yoshizawa, J. Phys. Chem. C 2020, 124, 18, 9958-9970.
Ryosuke Jinnouchi, Ferenc Karsai, and Georg Kresse, Phys. Rev. B 101, 060201(R) (2020).
A. J. Larkoski, I. Moult and B. Nachman, Phys. Rept. , 1-63 (2020).
X. Han and S. A. Hartnoll, Phys. Rev. X , 011069 (2020).
D.-R. Tan et al.
Japneet Singh, Vipul Arora, Vinay Gupta, and Mathias S. Scheurer, arXiv:2006.11868.
K. Binder, Z. Phys. B 43, 119 (1981).
Christian Holm and Wolfhard Janke, Phys. Lett. A 173 (1993) 8.
Massimo Campostrini, Martin Hasenbusch, Andrea Pelissetto, Paolo Rossi, and Ettore Vicari, Phys. Rev. B , 144520 (2002).
Martin Hasenbusch, J. Phys. A (2005) 5869-5884.
A. W. Sandvik, AIP Conf. Proc. 2397, 135 (AIP, New York, 2010).
D.-R. Tan and F.-J. Jiang, Phys. Rev. B , 094405 (2018).
U. Wolff, Phys. Rev. Lett. , 361 (1989).
A. W. Sandvik, Phys. Rev. B , R14157 (1999).
https://keras.io
Based on the employed training sets, the correction ∆ should be calculated at the R obtained with the largest β for the 3D classical O(3) and 2D classical XY models. Similarly, the ∆ associated with the studied quantum spin models are determined at the R corresponding to g = 1.96