Machine Learning Percolation Model
Shu Cheng, Huai Zhang,∗ and Yaolin Shi
Key Laboratory of Computational Geodynamics, College of Earth and Planetary Sciences, University of Chinese Academy of Sciences, No. 19(A) Yuquan Road, Shijingshan District, Beijing 100049, China
Fei He and Ka-Di Zhu†
Key Laboratory of Artificial Structures and Quantum Control (Ministry of Education), School of Physics and Astronomy, Shanghai Jiao Tong University, 800 Dong Chuan Road, Shanghai 200240, China
(Dated: January 25, 2021)

∗ [email protected]
† [email protected]

Recent advances in machine learning have become increasingly popular in the applications of phase transitions and critical phenomena. Using machine learning approaches, we try to identify the physical characteristics of the two-dimensional percolation model. To achieve this, we first adopt Monte Carlo simulation to generate a dataset, and then we employ several approaches to analyze it. Four kinds of convolutional neural networks (CNNs), one variational autoencoder (VAE), one convolutional VAE (cVAE), one principal component analysis (PCA), and one k-means are used for identifying the order parameters, the permeability, and the critical transition point. The first three kinds of CNNs can simulate the two order parameters and the permeability with high accuracy and good extrapolating performance, and the first two kinds of CNNs have high anti-noise ability. To validate the robustness of the first three kinds of CNNs, we also use the VAE and the cVAE to generate new percolating configurations that add perturbations to the raw configurations. We find no difference between using the raw or the perturbed configurations to identify the physical characteristics, provided the corresponding labels are available. In the case of lacking labels, we use unsupervised learning to detect the physical characteristics. The PCA, a classical unsupervised learning method, performs well when identifying the permeability but fails to deduce the order parameters. Hence, we apply the fourth kind of CNNs with different preset thresholds, and identify a new order parameter and the critical transition point. Our findings indicate that the effectiveness of machine learning still needs to be evaluated in the applications of phase transitions and critical phenomena.

I. INTRODUCTION
Machine learning methods have rapidly become pervasive instruments, owing to better fitting and predictive quality than traditional models, in the study of phase transitions and critical phenomena. Usually, machine learning can be divided into supervised and unsupervised learning. In the former, the machine receives a set of inputs and labels, and supervised learning models are trained to predict the labels with high accuracy. The effectiveness of supervised learning has been examined by many predecessors on Ising models [1–8], Kitaev chain models [3], disordered quantum spin chain models [3], Bose-Hubbard models [6], SSH models [6], SU(2) lattice gauge theory [7], topological states models [9], q-state Potts models [10, 11], uncorrelated configuration models [12], Hubbard models [12, 13], XY models [14], etc.

On the other hand, in unsupervised learning models, there are no labels. Unsupervised learning can serve as a meaningful analysis tool for tasks such as sample generation, feature extraction, and cluster analysis. Principal component analysis (PCA) is one such unsupervised learning technique. Recently, investigators have examined the PCA's effectiveness for exploring physical features without labels in the applications of phase transitions and critical phenomena [8, 13–17]. The variational autoencoder (VAE) and the convolutional VAE (cVAE), two other classical unsupervised learning techniques incorporated into generative neural networks, are used for data reconstruction and dimensional reduction with respect to phase transitions and critical phenomena [8, 18].

Although machine learning approaches have been applied successfully to phase transitions and critical phenomena, there is only one study on the percolation model [17]. Motivated by these predecessors, we conduct a much more comprehensive study, which combines supervised learning with unsupervised learning, to detect the physical characteristics of the percolation model.

Our work is considered from the following aspects. First, we use three kinds of deep convolutional neural networks (CNNs) to deduce the two order parameters and the permeabilities in the two-dimensional percolation model. Our inspiration and method come from [2, 3], whose studies both focus on the Ising model.

Nevertheless, the above CNNs are trained on the known configurations from the dataset obtained by Monte Carlo simulation. Refs. [8, 18] find that the VAE and the cVAE can reconstruct samples in the Ising model. Hence, we use the VAE and the cVAE to generate new configurations that are outside of the dataset. After generating the new configurations, we pour them into the first three kinds of CNNs, respectively.

Having explored supervised learning, we then move on to unsupervised learning. Here we try to identify physical characteristics without labels in the two-dimensional percolation model. Ref. [15] takes the first principal component obtained by PCA as the order parameter in the Ising model. In contrast to [15], by preprocessing the non-percolating clusters, [17] also successfully finds the order parameter in the percolation model by PCA. In this study, we try to use the PCA to extract relevant low-dimensional representations to discover physical characteristics.

In an actual situation, we may not know the labels when identifying the order parameter.
To overcome the difficulty associated with missing labels, [12] changes the preset threshold between the labels zero and one so as to introduce incorrect labels between the preset and the true thresholds. Hence, we deliberately change the preset thresholds, determined by k-means, between the labels zero and one. Here we use the fourth kind of CNNs, which receives the raw configurations as input and the labels determined by the preset thresholds.

This paper is organized as follows. In Sec. II, we describe the two-dimensional percolation model and the dataset from Monte Carlo simulation. In Sec. III, we give a brief introduction to CNNs, the VAE and cVAE, and PCA. Next, we provide dozens of machine learning models to capture the physical characteristics and discuss the results in Sec. IV. Finally, we conclude with a summary in Sec. V.

II. THE TWO-DIMENSIONAL PERCOLATION MODEL
For percolation models, what we need to do is to capture the physical characteristics, and a suitable dataset should be constructed to fulfill this objective. Various models in physical dynamics can be simulated mathematically by the Monte Carlo method, and the Monte Carlo simulation has proved valid for capturing different physical features in phase transitions and critical phenomena [8, 14, 15].

In this study, the Monte Carlo simulation for the two-dimensional percolation model is carried out as follows. First, 40 values of the permeability range from 0.41 to 0.80 with an interval of 0.01. For each permeability, the initial samples consist of 1000 percolating configurations. To train the machine learning models, the matrix X with the size of M × N (see Eq. 1) is used for storing the 40,000 raw percolating configurations.

$$X = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,N-1} & a_{1,N} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,N-1} & a_{2,N} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{M-1,1} & a_{M-1,2} & \cdots & a_{M-1,N-1} & a_{M-1,N} \\ a_{M,1} & a_{M,2} & \cdots & a_{M,N-1} & a_{M,N} \end{pmatrix}_{M \times N}. \quad (1)$$

In Eq. (1), M = 40,000, N = L × L, and L = 28. M and N represent the number of configurations and lattices, respectively. Each row R_i (i = 1, 2, ..., M) of the matrix X is a one-dimensional configuration and can be reshaped into the matrix X_i (i = 1, 2, ..., M) with the size of L × L (see Eq. 2). Furthermore, each column C_j (j = 1, 2, ..., N) of the matrix X represents one lattice across the different configurations. Moreover, the element a_ij (i = 1, 2, ..., M; j = 1, 2, ..., N) in matrix X and the element b_kl (k, l = 1, 2, ..., L) in matrix X_i take 0 when the corresponding lattice is occupied and 1 otherwise.

$$X_i = \begin{pmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,L-1} & b_{1,L} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ b_{L,1} & b_{L,2} & \cdots & b_{L,L-1} & b_{L,L} \end{pmatrix}_{L \times L}. \quad (2)$$

Besides the raw configurations, the dataset also encompasses the order parameters. In the two-dimensional percolation model, the order parameters include the percolation probability Π(p, L) and the density of the spanning cluster P(p, L). Π(p, L) refers to the probability that there is one connected path from one side to another in X_i. That is to say, Π(p, L) is a function of the permeability p in a system with the size of L × L (see Fig. 1(a)). With a connectivity of 4, we can identify the cluster for each lattice b_kl. The clusters are marked sequentially with a unique index; two lattices having the same index belong to the same cluster. If there is more than one cluster, the greatest cluster is chosen as the result. In this way, we can count how many of the configurations {X_1, X_2, ..., X_M} percolate for given p and L. For each permeability p, Π(p, L)_p is expressed in Eq. (3):

$$\Pi(p, L)_p = \frac{1}{1000} \sum_{r=1}^{1000} S_r, \quad (3)$$

where S_r indicates whether the r-th of the 1000 two-dimensional configurations generated at the permeability p percolates; S_r takes 1 if the corresponding configuration percolates and 0 otherwise. Here r = {1, 2, ..., 1000}, p ∈ {0.41, 0.42, ..., 0.80}, and Π(p, L)_p ∈ [0, 1].

The second order parameter is P(p, L). In contrast to Π(p, L), P(p, L) is associated with the spanning cluster. Therefore, for all configurations {X_1, X_2, ..., X_M}, P(p, L) is characterized by whether or not each lattice b_kl belongs to the spanning cluster. Similarly, P(p, L) is a function of the permeability p in a system with the size of L × L (see Fig. 1(b)). For each permeability p, P(p, L)_p is expressed in Eq. (4):

$$P(p, L)_p = \frac{1}{1000 \times L \times L} \sum_{r=1}^{1000} S'_r, \quad (4)$$

where S'_r counts the total number of lattices that belong to the spanning cluster in the r-th configuration generated at the permeability p. Obviously, r = {1, 2, ..., 1000}, p ∈ {0.41, 0.42, ..., 0.80}, 0 ≤ S'_r < L × L, and P(p, L)_p ∈ [0, 1].

FIG. 1: (a) The relationship between the permeabilities {0.41, 0.42, ..., 0.80} and the raw Π(p, L). (b) The relationship between the permeabilities {0.41, 0.42, ..., 0.80} and the raw P(p, L).
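As a concrete illustration, the following Python sketch reproduces this Monte Carlo procedure under stated assumptions: the paper does not publish its implementation, so the top-to-bottom spanning test, the helper names (sample_configuration, spans, percolation_probability), and the use of scipy.ndimage.label for 4-connected cluster labeling are our own choices.

```python
import numpy as np
from scipy.ndimage import label

L = 28                          # lattice size used in the paper
RNG = np.random.default_rng(0)

def sample_configuration(p):
    # Occupied sites are stored as 0 and empty sites as 1,
    # following the sign convention of Eqs. (1)-(2).
    occupied = RNG.random((L, L)) < p
    return np.where(occupied, 0, 1)

def spans(config):
    # Label clusters of occupied sites with 4-connectivity and test
    # whether any cluster touches both the top and the bottom row
    # ("one side to another"); the spanning direction is assumed.
    structure = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
    labels, _ = label(config == 0, structure=structure)
    top = set(labels[0, :]) - {0}
    bottom = set(labels[-1, :]) - {0}
    return len(top & bottom) > 0

def percolation_probability(p, n_samples=1000):
    # Monte Carlo estimate of Pi(p, L), Eq. (3): the fraction of
    # configurations that contain a spanning cluster.
    return np.mean([spans(sample_configuration(p)) for _ in range(n_samples)])

for p in (0.50, 0.593, 0.70):
    print(p, percolation_probability(p))
```

Run as-is, the estimated Π(p, L) rises sharply near p ≈ 0.593, consistent with the transition visible in Fig. 1(a).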
III. MACHINE LEARNING METHODS

A. CNNs

In this section, we focus on the two-dimensional percolation model and the dataset obtained by the Monte Carlo simulation, and we discuss several machine learning approaches, including CNNs, the VAE and cVAE, and PCA, to deduce the physical characteristics.

Let us first introduce CNNs. CNNs, supervised learning methods, are particularly useful for solving realistic problems in many disciplines, such as physics [3], chemistry [19], medicine [20], economics [21], biology [22], geophysics [23, 24], etc. In the applications of phase transitions and critical phenomena, many predecessors utilize CNNs to detect physical features, especially order parameters [1, 2, 5, 6, 10, 25–27]. In this study, the four kinds of CNNs are used not only to detect the two order parameters (Π(p, L) and P(p, L)), but also to detect the permeability p.

Next, we demonstrate the architecture of the CNNs (see Fig. 2). The structure of the CNNs has four layers, including two convolution layers and two fully connected layers. The percolating configurations {X_1, X_2, ..., X_M} are taken as inputs. Consequently, the CNNs produce the corresponding order parameters (Π(p, L) and P(p, L)) or the permeability p as outputs.
B. VAE and cVAE

In the former section (see Sec. III A), the raw configuration X at different permeability p is generated by Monte Carlo simulation. However, if configurations are not among the raw configurations, can we still identify the physical features as well? Here we consider using the VAE and the cVAE to generate new configurations X̂_VAE and X̂_cVAE, respectively.

The VAE (see Fig. 3), a generative network, is based on the variational Bayes inference proposed by [28]. Contrasted with the traditional AE (see Fig. S1), the VAE describes the latent variables with probability distributions; from that point, the VAE shows great value in data generation. Just like the AE, the VAE is composed of an encoder and a decoder, and it uses two different networks as two probability density distributions. The encoder in the VAE, called the inference network p_encoder(Z | X), generates the latent variables Z, and the decoder in the VAE, called the generating network p_decoder(X̂ | Z), reconstructs the raw configuration X. Unlike the AE, the encoder and the decoder in the VAE are constrained by the two probability density distributions.

To better reconstruct the raw two-dimensional configuration X, the fully connected layers in the VAE are replaced by convolution layers, which yields the cVAE. The architecture of the cVAE is shown in Fig. 4. Usually, the performance of the cVAE is better than that of the VAE because the configuration X has a spatial attribute. In our work, the VAE and the cVAE are both used for generating the new configurations X̂.
C. PCA

Sections III A and III B focus on supervised learning, which hypothesizes that labels exist for the raw configuration X of the percolation model. However, though we can detect the permeability values {0.41, 0.42, ..., 0.80} and the two order parameters (Π(p, L) and P(p, L)) by supervised learning, a dearth of labels often occurs. Thus, it is imperative to identify the labels, such as Π(p, L), P(p, L), and p. Some recent studies have shown that the first principal component obtained by PCA can be regarded as a typical physical quantity [8, 14, 15, 17]. Based on these studies, we explore the meaning of the first principal component on the percolation model.
FIG. 2: The structure of the CNNs with four layers, including "Conv1", "Conv2", "FC", and "Output". The square with black and white lattices represents the percolating configurations {X_1, X_2, ..., X_M}. The light yellow cuboids ("Conv1" and "Conv2") stand for convolution layers. The bright orange cuboids stand for max-pooling layers. The light purple cuboid "FC" refers to a fully connected layer. The input layer "Input" has the size of 28 × 28. The first layer "Conv1" with "filter1" filters has the size of "size1" × "size1". The second layer "Conv2" with "filter2" filters has the size of "size2" × "size2". The layer "Flatten" has the size of "size3". The third layer "FC" has the size of "size4". And the last layer "Output" represents a fully connected layer with one neuron.
FIG. 3: The structure of the VAE with an encoder and a decoder. The left large light purple cuboid refers to the encoder with two fully connected layers, i.e., "FC1" and "µ" or "σ", with the sizes of "size1" and "size2". The right large light red cuboid is the decoder with two fully connected layers, "FC2" and "Output", with the sizes of "size3" and 784. The outputs of the encoder are the mean value "µ" and the standard deviation "σ". The input "Z" of the decoder is sampled from the normal distribution with "µ" and "σ". The green cuboid consists of "µ", "σ", and "Z". The rectangles with 784 black and white lattices represent the percolating configuration X on the left and its reconstruction X̂ on the right, respectively.

As is well known, PCA can reduce the dimension of the matrix X. First, we compute the mean value $X_{\mathrm{mean}} = \frac{1}{M} \sum_{i=1}^{M} a_{ij}$ (j = 1, 2, ..., N) for each column C_j in the matrix X. Then we get the centered matrix X_centered, expressed as X_centered = X − X_mean. After obtaining X_centered, by an orthogonal linear transformation expressed as $X_{\mathrm{centered}}^{T} X_{\mathrm{centered}} W = \lambda W$, we extract the eigenvectors W and the eigenvalues λ. The eigenvectors W are composed of w_1, w_2, ..., w_N. The eigenvalues are sorted in descending order, i.e., λ_1 ≥ λ_2 ≥ ... ≥ λ_N ≥ 0.
Encoder s i ze filter1 Conv1 s i ze filter2 Conv2 fi l t e r × s i ze × s i ze Flatten s i ze µ s i ze σ s i ze Z Reparameter fi l t e r × s i ze × s i ze FC s i ze filter1 Reshape s i ze filter2 Conv3 filter1 Conv4 Output
Decoder
FIG. 4: The structure of the cVAE with an encoder and a decoder. The left large light purple cuboid refers to the encoder, including the input layer "Input" with the size of 28 × 28 × 1, two convolution layers ("Conv1" and "Conv2") with the sizes of "filter1 × size1 × size1" and "filter2 × size2 × size2", a flatten layer "Flatten" with the size of "filter2 × size2 × size2", and a fully connected layer ("µ" or "σ") with the size of "size3". The right large light red cuboid is the decoder with four layers, including the layer "Z" with the size of "size3", a fully connected layer "FC" with the size of "filter1 × size4 × size4", a reshape layer "Reshape" with the size of "filter1 × size4 × size4", two transposed convolution layers ("Conv3" and "Conv4") with the sizes of "filter2 × size5 × size5" and "filter1 × size6 × size6", and the output layer "Output". The outputs of the encoder are the mean value "µ" and the standard deviation "σ". The input "Z" of the decoder is sampled from the normal distribution with "µ" and "σ". The green cuboid consists of "µ", "σ", and "Z". The squares with 784 black and white lattices represent the percolating configurations {X_1, X_2, ..., X_M} on the left and their reconstructions {X̂_1, X̂_2, ..., X̂_M} on the right, respectively.
The normalized eigenvalues λ̃_j (j = 1, 2, ..., N) are expressed as $\tilde{\lambda}_j = \lambda_j / \sum_{j=1}^{N} \lambda_j$. Each row R_i of the matrix X can be transformed into X'_i = R_i W. Eq. (5) represents the statistic average, taken over the 1000 configurations belonging to each of the 40 permeability values p; this process is quite similar to the process of calculating the two order parameters, i.e., Π(p, L) and P(p, L). Table I shows the procedure of the PCA algorithm.

$$\langle X'_1 \rangle = \frac{1}{1000} \sum_{i=1}^{1000} \left| X'_{i,1} \right|. \quad (5)$$
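The procedure of Table I admits a compact NumPy sketch. This is a minimal sketch, not the authors' code: the stand-in random X, the assumption that the rows are grouped by permeability (1000 consecutive rows per value of p), and all variable names are ours.

```python
import numpy as np

# X: the 40,000 x 784 configuration matrix, one row per
# configuration (a random stand-in here).
M, N, n_per_p = 40_000, 28 * 28, 1000
X = np.random.randint(0, 2, size=(M, N)).astype("float64")

# Steps 1-2: center each column (each lattice site).
X_centered = X - X.mean(axis=0)

# Step 3: eigendecomposition of X_centered^T X_centered; eigh
# returns eigenvalues in ascending order, so flip to descending.
eigvals, W = np.linalg.eigh(X_centered.T @ X_centered)
eigvals, W = eigvals[::-1], W[:, ::-1]
explained_ratio = eigvals / eigvals.sum()      # the lambda-tilde_j

# Step 4: project every configuration onto the eigenvectors.
X_prime = X_centered @ W

# Step 5 (Eq. 5): average |first principal component| over the
# 1000 configurations belonging to each permeability value.
first_pc = np.abs(X_prime[:, 0]).reshape(-1, n_per_p)
avg_first_pc = first_pc.mean(axis=1)           # one value per p
```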
IV. RESULTS AND DISCUSSION

A. Simulate the two order parameters by two CNNs

In this section, we consider using the approaches in Sec. III to capture the physical features. First, we make use of the TensorFlow 2.2 library to implement the CNNs with four layers. To predict the two order parameters (Π(p, L) and P(p, L)), two kinds of CNNs (CNNs-I and CNNs-II) are constructed. The first two layers of these two kinds of CNNs are composed of two convolution layers ("Conv1" and "Conv2"), which possess "filter1" = 32 and "filter2" = 64 filters with the size of 3 × 3 and a stride of 1. Each convolution layer is followed by a max-pooling layer with the size of 2 × 2. The final convolution layer "Conv2" is connected to a fully connected layer "FC" with 128 variables. The output layer "Output", following "FC", is a fully connected layer. For the two convolution layers and the fully connected layer "FC", a rectified linear unit (ReLU), a = max(0, x) [29], is chosen as the activation function due to its reliability and validity. However, the output layer has no activation function.

TABLE I: The procedure of the PCA algorithm.

Require: the raw configuration X
1. Compute the mean value X_mean for each column C_j in the matrix X;
2. Get the centered matrix X_centered;
3. Compute the eigenvectors W and the eigenvalues λ by an orthogonal linear transformation;
4. Transform R_i into X'_i;
5. Get the statistic average ⟨X'_1⟩ over the 1000 configurations for each permeability p.

After determining the framework of CNNs-I and CNNs-II, we mention here how to train them for deducing Π(p, L) and P(p, L). First, we employ the Adam algorithm [30] as an optimizer to update the parameters, i.e., the weights and biases. Then, a mini batch size of 256 and a learning rate of 10− are selected for their time-saving. Following this treatment, CNNs-I and CNNs-II are trained for 1000 epochs on the 40,000 uncorrelated and shuffled configurations, respectively. Before training, we split {X_1, X_2, ..., X_M}, Π(p, L), and P(p, L) into a training set of 32,000 samples and a testing set of 8,000 samples. While training, we monitor three indicators, the loss function (i.e., the mean squared error, MSE, see Eq. 6), the mean average error (MAE, see Eq. 7), and the root mean squared error (RMSE, see Eq. 8), for the training and testing sets [24]. In Eqs. 6-8, y_raw^i and y_pred^i refer to Π(p, L)/P(p, L) and its predictions. If the loss function on the testing set reaches its minimum, the optimal CNNs-I and CNNs-II are obtained. As can be seen from Fig. S2, these indicators gradually decrease. In Table S3, the errors of the optimal CNNs-I and CNNs-II are very small. What stands out in Fig. S2 is that CNNs-I and CNNs-II have high stability, consistency, and a fast convergence rate.

$$\mathrm{MSE} = \frac{1}{M} \sum_{i=1}^{M} \left( y^{i}_{\mathrm{pred}} - y^{i}_{\mathrm{raw}} \right)^{2} \quad (6)$$

$$\mathrm{MAE} = \frac{1}{M} \sum_{i=1}^{M} \left| y^{i}_{\mathrm{pred}} - y^{i}_{\mathrm{raw}} \right| \quad (7)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \left( y^{i}_{\mathrm{pred}} - y^{i}_{\mathrm{raw}} \right)^{2}} \quad (8)$$

Before assessing CNNs-I and CNNs-II, we have to explain what is meant by the statistic average: it is defined as the average of CNNs-I's or CNNs-II's outputs for each permeability p. As shown in Fig. 5, there is a clear trend of phase transition between the permeability values {0.41, 0.42, ..., 0.80} and Π(p, L)/P(p, L). The two grey lines in Fig. 5, which are the same as the two blue lines in Fig. 1, represent the relationship between the permeability values {0.41, 0.42, ..., 0.80} and the raw Π(p, L) or P(p, L). Likewise, the two blue lines in Fig. 5 represent the relationship between the permeability values {0.41, 0.42, ..., 0.80} and the statistic average of the outputs of CNNs-I and CNNs-II. The overlap of the two kinds of lines shows that CNNs-I and CNNs-II can be used to deduce the two order parameters and the process of phase transition.

To probe the difficulty associated with the percolation model near the critical transition point, we truncate the dataset. Specifically, we remove the data near the critical transition point and only retain the data far away from it. Here we take the simulation of Π(p, L) as an example. The retained data with the raw Π(p, L) range from 0 to 0.1 and from 0.9 to 1. As shown in Fig. S3 and the middle red points in Fig. 6, we find that CNNs-I can extrapolate Π(p, L) to the missing data by learning from the retained data.

This section demonstrates that CNNs-I and CNNs-II can be two effective tools for detecting Π(p, L) and P(p, L), respectively. An additional test should be made to verify whether or not CNNs-I and CNNs-II are robust against noise. To address this issue, we deliberately invert a proportion, i.e., 5%, 10%, and 20%, of the labels for the raw Π(p, L) and P(p, L) and verify whether or not these "artificial" noises affect the predicted Π(p, L) and P(p, L). Fig. 7 and Fig. S4 demonstrate that CNNs-I and CNNs-II are robust against noise. As the labeling error rates increase, the same trend remains evident in the outputs of CNNs-I and CNNs-II within a relatively small difference (see Fig. 7). Therefore, we draw the conclusion that noises have little effect on detecting Π(p, L) and P(p, L).

FIG. 7: Robustness of the two order parameters (Π(p, L) and P(p, L)) against noise for CNNs-I and CNNs-II. (a) The relationship between the permeability values {0.41, 0.42, ..., 0.80} and the raw Π(p, L) or the statistic average from the outputs of CNNs-I under different noisy inputs (5%, 10%, 20%). (b) The relationship between the permeability values {0.41, 0.42, ..., 0.80} and the raw P(p, L) or the statistic average from the outputs of CNNs-II under different noisy inputs.
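To make the architecture concrete, the following Keras sketch mirrors the description above. It is a minimal sketch rather than the authors' code: the stand-in random data, the helper name build_cnn, and the reduced epoch count are our assumptions, and the learning rate is left at the Keras default because the exponent is truncated in the source.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn():
    # Two 3x3 convolutions (32 and 64 filters, stride 1), each
    # followed by 2x2 max pooling, then FC(128, ReLU) and a single
    # linear output neuron for the regression target.
    model = models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, 3, strides=1, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, strides=1, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(1),                       # no activation
    ])
    # Adam optimizer, MSE loss, MAE/RMSE as monitored metrics.
    model.compile(optimizer="adam", loss="mse",
                  metrics=["mae", tf.keras.metrics.RootMeanSquaredError()])
    return model

# Stand-in arrays; in the paper X holds the 40,000 configurations
# and y the matching order-parameter (or permeability) labels.
X = np.random.randint(0, 2, size=(40_000, 28, 28, 1)).astype("float32")
y = np.random.random(40_000).astype("float32")

model = build_cnn()
# validation_split=0.2 reproduces the 32,000/8,000 train/test split.
model.fit(X, y, batch_size=256, epochs=5, validation_split=0.2)
```

The same model serves as CNNs-I, CNNs-II, or CNNs-III below, depending only on which labels y are supplied.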
B. Simulate the permeability by one CNNs

Just as we used CNNs-I and CNNs-II in Sec. IV A, here we use the same structure for CNNs-III and the same strategies for training on the permeability p. The only distinction between CNNs-III and CNNs-I/CNNs-II is that the output of CNNs-III is the permeability p instead of the two order parameters. Fig. S5 shows the performance of CNNs-III. With successive increases in epochs, the MSE, MAE, and RMSE continue to decrease until they no longer drop.

FIG. 5: (a) The relationship between the permeability values {0.41, 0.42, ..., 0.80} and the raw Π(p, L) or the statistic average from the outputs of CNNs-I. (b) The relationship between the permeability values {0.41, 0.42, ..., 0.80} and the raw P(p, L) or the statistic average from the outputs of CNNs-II.

FIG. 6: Identification of the phase transition with a truncated dataset by CNNs-I. The permeability values {0.41, 0.42, ..., 0.80} and the Π(p, L) values connected with red points are artificially removed from the dataset, and CNNs-I makes its judgment only by learning the data associated with the blue points. The grey curve shows how Π(p, L) shifts with the permeability values {0.41, 0.42, ..., 0.80}.

Another measure of CNNs-III's performance concerns the difference between the raw permeability p and its prediction p̂. The blue circles in Fig. 8 show that there is a strong positive correlation between the raw permeability p and its prediction p̂. Further statistical tests reveal that most of the gaps between the raw permeability p and its prediction p̂ are less than 0.1. The result proves that CNNs-III has the advantages of convenient use and high precision.

On the other hand, the extrapolation ability can also reflect the performance of CNNs-III. To test this, we use Monte Carlo simulation to generate a new dataset. The new permeability p ranges not only from 0.01 to 0.40 but also from 0.81 to 1.00, with an interval of 0.01. Just as in Sec. II, we perform 1050 Monte Carlo steps and keep the last 1000 steps. As a consequence, 60,000 configurations are generated. The red circles in Fig. 8 exhibit the extrapolation ability of CNNs-III. Our results show no significant correlation between the extrapolated permeability p_ex ranging from 0.81 to 1.00 and its prediction p̂_extrapolated. However, in Fig. 8, the results indicate that CNNs-III has good extrapolation ability for the extrapolated permeability p_ex from 0.01 to 0.40.
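A sketch of this extrapolation check is given below. It assumes a trained model from the previous sketch and a sample_configuration helper like the one in Sec. II; both of those, and all variable names here, are our assumptions rather than the paper's code.

```python
import numpy as np

# Permeability values outside the training range [0.41, 0.80].
p_low = np.arange(0.01, 0.41, 0.01)
p_high = np.arange(0.81, 1.01, 0.01)

for p in np.concatenate([p_low, p_high]):
    # 1000 fresh configurations per permeability, as in Sec. II.
    configs = np.stack([sample_configuration(p) for _ in range(1000)])
    configs = configs[..., np.newaxis].astype("float32")
    p_hat = model.predict(configs, verbose=0).mean()
    print(f"p = {p:.2f}  mean prediction = {p_hat:.3f}")
```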
C. Generate new configurations by one VAE and one cVAE

Though CNNs-I, CNNs-II, and CNNs-III are valid for detecting the two order parameters (Π(p, L) and P(p, L)) and the permeability p, the validity of these three CNNs is unknown for percolating configurations outside of the dataset. As shown in Fig. 3 and Fig. 4, we use the network structures of the VAE and the cVAE to generate new configurations (X̂_vae and X̂_cvae). Actually, X̂_vae and X̂_cvae can be regarded as the raw configuration X with some added noise.

Let us first consider the VAE. Just like the AE (see Fig. S1), the VAE is composed of an encoder and a decoder. The encoder of the VAE owns two fully connected layers, each of which is followed by a ReLU activation function. The first layer of the encoder possesses "size1" = 512 neurons. Another 512 neurons, including 256 for the mean "µ" and 256 for the variance "σ", are taken into account for the second layer of the encoder. By resampling from the Gaussian distribution with the mean "µ" and the variance "σ", we obtain 256 latent variables "Z", which are the inputs of the decoder. For the decoder of the VAE, two fully connected layers follow the outputs of the encoder. For symmetry, the first layer of the decoder also contains "size1" = 512 neurons and is followed by a ReLU activation function. The output layer of the decoder contains 784 neurons, which are used to reconstruct the raw configuration X. Thus, the number of neurons in the output layer is the same as that in the input layer.

Moving on now to consider the cVAE. The encoder of the cVAE is composed of one input layer, two hidden convolution layers with "filter1" = 32 and "filter2" = 64 filters with the size of 3 and a stride of 2, and ReLU activation functions. The output layer of the encoder is a fully connected flatten layer with 800 neurons (400 for the mean "µ" and 400 for the variance "σ") without an activation function. By resampling from the Gaussian distribution with the mean "µ" and the variance "σ", we obtain 400 latent variables "Z". The reason why there are more latent variables in the cVAE than in the VAE is that the cVAE needs to consider more complex spatial characteristics. The decoder of the cVAE is composed of an input layer with the 400 latent variables "Z". A fully connected layer "FC" with 1,568 neurons follows "Z". After reshaping the outputs of "FC" into three dimensions, we feed the data into two transposed convolution layers ("Conv3" and "Conv4") and one output layer "Output". The filters in these deconvolution layers are 64, 32, and 1, with the size of 3 and the stride of 2. Apart from the output layer "Output", which has no activation function, two ReLU activation functions follow "Conv3" and "Conv4", respectively.

We train the VAE and the cVAE over 10 epochs using the Adam optimizer, a learning rate of 10−, and a mini batch size of 256. To train the VAE and the cVAE, we use the sum of the binary cross-entropy (see Eq. 9) and the Kullback-Leibler (KL) divergence (see Eq. 10) as the loss function [18].
In Eqs. 9 and 10, x_raw^i and x_pred^i represent each raw configuration (in one or two dimensions) and its prediction. As shown in Fig. S6, the loss function, the binary cross-entropy, and the KL divergence vary with the epochs for the VAE and the cVAE. Here we focus on the minimum value of the loss function. From Fig. S6, the optimal cVAE performs better than the optimal VAE.

$$\mathrm{BinaryCrossEntropy} = -\frac{1}{M} \sum_{i=1}^{M} \left( x^{i}_{\mathrm{raw}} \log\!\left( x^{i}_{\mathrm{pred}} \right) + \left( 1 - x^{i}_{\mathrm{raw}} \right) \log\!\left( 1 - x^{i}_{\mathrm{pred}} \right) \right). \quad (9)$$

$$\mathrm{KLdivergence} = -\frac{1}{M} \sum_{i=1}^{M} x^{i}_{\mathrm{raw}} \log\!\left( \frac{x^{i}_{\mathrm{raw}}}{x^{i}_{\mathrm{pred}}} \right). \quad (10)$$

For a more visual comparison, we show snapshots of the raw configurations {X_1, X_2, ..., X_M} and compare them to the VAE-generated and cVAE-generated configurations in Fig. 9. As we can see from Fig. 9, the configurations from the Monte Carlo simulation, the VAE, and the cVAE are very close to each other.

After reconstructing the configurations (X̂_vae and X̂_cvae) through the VAE and the cVAE and pouring X̂_vae and X̂_cvae into CNNs-I, CNNs-II, and CNNs-III, we can detect Π(p, L), P(p, L), and the permeability p. Fig. 10 shows the relationships between the permeability values {0.41, 0.42, ..., 0.80} and the statistic averages of the outputs of CNNs-I and CNNs-II, respectively. From the red and purple lines in Fig. 10, by using X̂_vae and X̂_cvae, we can obtain the two order parameters as well. From Fig. 11, the raw permeability p is remarkably linearly correlated with its prediction p̂ obtained through the VAE/cVAE and CNNs-III. Thus, subtle changes in the raw configurations do not affect the capture of the physical features.
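The fully connected VAE of Sec. IV C can be sketched in Keras as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the sigmoid output (so reconstructions lie in [0, 1] and Eq. 9 is well defined), the closed-form latent-Gaussian KL term used in place of the pixel-wise Eq. 10, the default learning rate, and all identifiers are ours.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Model, layers

LATENT = 256  # latent dimension described in Sec. IV C

class VAE(Model):
    """Fully connected VAE: 784 -> 512 -> (mu, log var) -> 256 -> 512 -> 784."""

    def __init__(self):
        super().__init__()
        self.enc_fc = layers.Dense(512, activation="relu")
        self.enc_mu = layers.Dense(LATENT)
        self.enc_logvar = layers.Dense(LATENT)
        self.dec_fc = layers.Dense(512, activation="relu")
        self.dec_out = layers.Dense(784, activation="sigmoid")  # assumed

    def call(self, x):
        h = self.enc_fc(x)
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps.
        eps = tf.random.normal(tf.shape(mu))
        z = mu + tf.exp(0.5 * logvar) * eps
        x_hat = self.dec_out(self.dec_fc(z))
        # Loss: reconstruction BCE plus the Gaussian KL term.
        bce = tf.reduce_mean(
            tf.keras.losses.binary_crossentropy(x, x_hat)) * 784
        kl = -0.5 * tf.reduce_mean(tf.reduce_sum(
            1 + logvar - tf.square(mu) - tf.exp(logvar), axis=1))
        self.add_loss(bce + kl)
        return x_hat

# Stand-in data: 40,000 flattened configurations.
X_flat = np.random.randint(0, 2, size=(40_000, 784)).astype("float32")
vae = VAE()
vae.compile(optimizer="adam")      # loss comes from add_loss
vae.fit(X_flat, batch_size=256, epochs=10)
```

New configurations are then drawn by sampling z from the standard normal and passing it through the decoder, e.g. vae.dec_out(vae.dec_fc(tf.random.normal((1, LATENT)))).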
D. Identify the characteristic of the first principal component by one PCA

This section discusses how to capture the physical characteristics without labels. Various studies suggest using PCA to identify order parameters [8, 14, 15, 17]. Therefore, we try to verify the feasibility of PCA for capturing the order parameter on the percolation model.

FIG. 8: The correlation between the raw permeability p and its predictions. The blue circles refer to the relationship between the raw permeability p in the dataset and its predictions p̂ by CNNs-III. The red circles refer to the relationship between the raw permeability p_ex outside of the dataset and its prediction p̂_extrapolated by CNNs-III. The raw permeability p_ex outside of the dataset ranges from 0.01 to 0.40 and from 0.81 to 1.00, with an interval of 0.01. The black line refers to the correlation between the raw permeabilities {p, p_extrapolated}, ranging from 0.01 to 1.00 with an interval of 0.01, and their average predictions.

FIG. 9: Snapshots of percolating configurations for p ∈ {0.46, 0.60, 0.75}. The configurations in the top, middle, and bottom panels are sampled from the Monte Carlo simulation, the VAE, and the cVAE, respectively.

First, we apply the PCA to the raw configuration X. Fig. 12 exhibits the N normalized eigenvalues $\tilde{\lambda}_n = \lambda_n / \sum_{n=1}^{N} \lambda_n$; the λ̃_n are also called the explained variance ratios. The most noteworthy information in Fig. 12 is that there is one dominant principal component λ̃_1, which is the largest among the λ̃_n and much larger than the other explained variance ratios. Thus, λ̃_1 plays a key role when dealing with dimension reduction. Based on λ̃_1, the raw configuration X is mapped to another matrix Y = Xw_1.

In Fig. 13, we construct the matrix Y' = {Xw_1, Xw_2} from the first two eigenvalues and their eigenvectors. We use 40,000 blue scatter points to plot the relationship between Xw_1 and Xw_2 for the 40 permeability values ranging from 0.41 to 0.80. Just as in Fig. 12, there is only one dominant representation on the percolation model, because the first principal component is much more important than the second principal component.

Having analysed the importance of the first principal component, we now move on to discuss its meaning. In Fig. 14, we focus on the quantified first principal component as a function of the permeability p and the two order parameters, i.e., Π(p, L) and P(p, L). From Fig. 14, we can see that there is a strong linear correlation between the quantified first principal component and the permeability p, while the relationships between the quantified first principal component and the two order parameters (Π(p, L) and P(p, L)) are similar to the relationships between the permeability values {0.41, 0.42, ..., 0.80} and the two order parameters. Our results are significantly different from a former study, which demonstrates that the quantified first principal component can be taken as the order parameter after data preprocessing [17]. A possible explanation may be that [17] evaluates the first principal component after removing configurations with certain attributes from the raw dataset of the percolation model. Therefore, we suppose that the quantified first principal component obtained by PCA may not be taken as the order parameter for all physical models; another possible explanation is that only under certain conditions can the quantified first principal component be regarded as the order parameter.
E. Identify physical characteristics by one k-means and one CNNs

From Sec. IV D, no significant correspondence is found between the normalized quantified first principal component and the two order parameters. Here we eagerly wonder how to identify the order parameter. Another physical characteristic we desire to explore is the critical transition point. So we try to find a way to capture the order parameter from the raw configurations {X_1, X_2, ..., X_M} and their permeability p on the percolation model.

As is well known, for two-dimensional percolating configurations, the critical transition point is equal to 0.593 in theoretical calculations. Though the critical transition point is already known, we wonder whether or not it can be found by the machine. To do this, we first use a cluster analysis algorithm named k-means [31] to separate the raw configurations {X_1, X_2, ..., X_M} into two categories. The minimum and maximum values of the permeability p are 0.55 and 0.80 in the first cluster, and 0.41 and 0.65 in the second cluster. Note that there is an overlapping interval between 0.55 and 0.65 for the two categories. According to the overlapping interval, we hypothesize that the critical threshold p_c can be set to 11 values, i.e., 0.55, 0.56, ..., and 0.65. For each raw configuration, if the permeability is smaller than p_c, it is assumed that there exists no percolating cluster and its label is marked as 0; otherwise, it is assumed that there exists at least one percolating cluster and its label is marked as 1.

To detect the physical characteristics, we use the fourth kind of CNNs (CNNs-IV) with the structure in Fig. 2 for the 11 preset critical thresholds from 0.55 to 0.65. Note that the output layer uses a sigmoid activation function, expressed as a = 1/(1 + e^{−x}), to make sure the outputs are between 0 and 1. Another critical thing to pay attention to is that the He normal distribution initializer [32] and L2 regularization [33] are used in the layers "Conv1", "Conv2", and "FC" of CNNs-IV. To avoid overfitting, in addition to L2 regularization, we also use a dropout layer with a dropout rate of 0.5 on "FC".

FIG. 10: (a) The relationship between the permeability values {0.41, 0.42, ..., 0.80} and Π(p, L) or the statistic averages from the outputs of CNNs-I that originate from the outputs of the VAE and the cVAE. (b) The relationship between the permeability values {0.41, 0.42, ..., 0.80} and P(p, L) or the statistic averages from the outputs of CNNs-II that originate from the outputs of the VAE and the cVAE.

FIG. 11: The correlation between the raw permeability p and its predictions through the VAE/cVAE and CNNs-III. The blue circles refer to the relationship between the raw permeability p and its predictions by the VAE/cVAE and CNNs-III. And the black line refers to the correlation between the raw permeability p and its average predictions.

FIG. 12: The explained variance ratios obtained from the raw configuration X by the PCA, with the horizontal axis indicating the corresponding component labels. The largest value of the explained variance ratios, located at the top-left corner, means that there exists one dominant principal component.
FIG. 13: Projection of the raw configuration X onto the plane of the first two dominant principal components, i.e., Xw_1 and Xw_2. The color bar on the right indicates the permeability values {0.41, 0.42, ..., 0.80}.

A mini batch size of 512 and a learning rate of 10− are chosen while training CNNs-IV. The binary cross-entropy (see Eq. 9) is taken as the loss function of CNNs-IV. Another metric, used to measure the performance of CNNs-IV, is the binary accuracy (see Eq. 11). The other hyper-parameters are the same as for CNNs-I, CNNs-II, and CNNs-III.

$$\mathrm{BinaryAccuracy} = \frac{1}{M} \sum_{i=1}^{M} \mathbb{1}\!\left( y^{i}_{\mathrm{pred}} = y^{i}_{\mathrm{raw}} \right). \quad (11)$$

We turn now to the experimental evidence on the inference ability for capturing the relevant physical features. After obtaining the well-trained CNNs-IV with high accuracy (see Fig. S7), we obtain the outputs by pouring the raw 40,000 configurations {X_1, X_2, ..., X_M} into CNNs-IV. The statistical average of the outputs is calculated according to the 40 independent permeability values {0.41, 0.42, ..., 0.80}. The results of the correlational analysis are shown in Fig. 15. We set the horizontal dashed line as the threshold value 0.5; hence, each curve is divided into two parts by the horizontal dashed line. The lower part indicates that the percolation system is not penetrated, while the upper part implies that the percolation system is penetrated. The crosspoint, where the horizontal dashed line and the red curve intersect, has a permeability value of 0.594, which is very close to the theoretical value of 0.593 marked by the vertical dashed line in Fig. 15. Remarkably, the critical transition point can be calculated by CNNs-IV with the preset value of 0.60 for the sampling interval of 0.01. Therefore, CNNs-IV with the preset threshold value of 0.60 is the most effective model. In further studies, the preset threshold value may need to be refined with smaller sampling intervals for higher precision.
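The k-means labeling step and the CNNs-IV classifier can be sketched together as follows. This is a minimal sketch, not the authors' code: the stand-in data, the L2 strength of 1e-4, the reduced epoch count, and all variable names are our assumptions; only the architecture choices (sigmoid output, He-normal initialization, L2 regularization, 0.5 dropout on "FC", batch size 512) come from the text above.

```python
import numpy as np
import tensorflow as tf
from sklearn.cluster import KMeans
from tensorflow.keras import layers, models, regularizers

# Stand-in data: X holds the 40,000 flattened configurations and
# p_values the permeability of each row (both assumed available).
X = np.random.randint(0, 2, size=(40_000, 784)).astype("float32")
p_values = np.repeat(np.arange(0.41, 0.81, 0.01), 1000)

# Step 1: k-means with two clusters on the raw configurations; the
# overlap of the two clusters' permeability ranges suggests the
# candidate thresholds 0.55 ... 0.65.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for c in (0, 1):
    print(f"cluster {c}: p in [{p_values[clusters == c].min():.2f}, "
          f"{p_values[clusters == c].max():.2f}]")

# Step 2: binary labels from one preset threshold p_c, then CNNs-IV.
p_c = 0.60
labels = (p_values >= p_c).astype("float32")
reg = dict(kernel_initializer="he_normal",
           kernel_regularizer=regularizers.l2(1e-4))  # assumed strength
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu", **reg),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation="relu", **reg),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(128, activation="relu", **reg),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["binary_accuracy"])
model.fit(X.reshape(-1, 28, 28, 1), labels, batch_size=512, epochs=5)
```

Averaging model.predict over the 1000 configurations at each p then yields one of the 11 curves in Fig. 15, and the crossing with the 0.5 line locates the deduced transition point.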
FIG. 14: The normalized quantified first principal component Xw_1 as a function of the permeability p and the two order parameters, i.e., Π(p, L) and P(p, L).

FIG. 15: The 11 curves show how the average outputs shift when the preset threshold changes from 0.55 to 0.65. The average outputs and the threshold of the phase transition are deduced from the different preset threshold values on the percolation model by CNNs-IV.

V. CONCLUSIONS

As machine learning approaches have become increasingly popular in phase transitions and critical phenomena, predecessors have pointed out that these approaches can capture physical characteristics. However, previous studies on identifying physical characteristics, especially the order parameter and the critical threshold, need further mutual validation. To assess the effectiveness of machine learning methods, we conduct a much more comprehensive study than our predecessors to reassess the machine learning approaches in phase transitions and critical phenomena.

Our results confirm the effectiveness of machine learning approaches in phase transitions and critical phenomena. Precisely, we use CNNs-I, CNNs-II, and CNNs-III to simulate the two order parameters and the permeability values. To identify whether or not CNNs-I and CNNs-II are robust against noise, we add a proportion of noise to the labels of the two order parameters. To validate the robustness of CNNs-I, CNNs-II, and CNNs-III, we also use the VAE and the cVAE to generate new configurations that are slightly different from the raw configurations. After pouring the new configurations into CNNs-I, CNNs-II, and CNNs-III, we find that these models are robust against noise.

However, after we use PCA to reduce the dimension of the raw configurations, although there is a statistically significant linear correlation between the first principal component and the permeability values, no statistically significant linear correlations are found between the first principal component and the two order parameters. Clearly, the first principal component fails to be regarded as an order parameter in the two-dimensional percolation model. To identify the order parameter, we use the fourth kind of CNNs, i.e., CNNs-IV. The results show that CNNs-IV can identify a new order parameter when the preset threshold value is 0.60. Surprisingly, we find that the critical transition point deduced by CNNs-IV is 0.594.

Although these machine learning methods are valid for exploring the physical characteristics of the percolation model, the current study may still have some inevitable limitations that prevent us from making an overall judgement about these methods on other models of phase transitions and critical phenomena. In other words, it must be acknowledged that this research is based on the two-dimensional percolation model, and we are not sure of the usefulness of applying our methods to other models. Consequently, our methods in this study may open an opportunity for further research on other models of phase transitions and critical phenomena.
ACKNOWLEDGMENTS
The authors gratefully thank Yicun Guo for revising the manuscript. We also thank Jie-Ping Zheng and Li-Ying Yu for helpful discussions and comments. The work of S. Cheng, H. Zhang, and Y.-L. Shi is supported by the Joint Funds of the National Natural Science Foundation of China (U1839207). The work of F. He and K.-D. Zhu is supported by the National Natural Science Foundation of China (No. 11274230 and No. 11574206) and the Natural Science Foundation of Shanghai (No. 20ZR1429900).

[1] A. Tanaka and A. Tomiya, Detection of phase transition via convolutional neural networks, Journal of the Physical Society of Japan, 063001 (2017).
[2] J. Carrasquilla and R. G. Melko, Machine learning phases of matter, Nature Physics, 431 (2017).
[3] E. P. van Nieuwenburg, Y.-H. Liu, and S. D. Huber, Learning phase transitions by confusion, Nature Physics, 435 (2017).
[4] K. Shiina, H. Mori, Y. Okabe, and H. K. Lee, Machine-learning studies on spin models, Scientific Reports, 1 (2020).
[5] P. Suchsland and S. Wessel, Parameter diagnostics of phases and phase transition learning by neural networks, Physical Review B, 174435 (2018).
[6] P. Huembeli, A. Dauphin, and P. Wittek, Identifying quantum phase transitions with adversarial neural networks, Physical Review B, 134109 (2018).
[7] S. Wetzel and M. Scherzer, Machine learning of explicit order parameters: From the Ising model to SU(2) lattice gauge theory, Physical Review B (2017).
[8] S. J. Wetzel, Unsupervised learning of phase transitions: From principal component analysis to variational autoencoders, Physical Review E, 022140 (2017).
[9] D.-L. Deng, X. Li, and S. D. Sarma, Machine learning topological states, Physical Review B, 195145 (2017).
[10] D. Bachtis, G. Aarts, and B. Lucini, Mapping distinct phase transitions to a neural network, Physical Review E, 053306 (2020).
[11] X. Zhao and L. Fu, Machine learning phase transition: An iterative proposal, Annals of Physics, 167938 (2019).
[12] Q. Ni, M. Tang, Y. Liu, and Y.-C. Lai, Machine learning dynamical phase transitions in complex networks, Physical Review E, 052312 (2019).
[13] K. Ch'ng, N. Vazquez, and E. Khatami, Unsupervised machine learning account of magnetic transitions in the Hubbard model, Physical Review E, 013306 (2018).
[14] S. J. Wetzel, Discovering phases, phase transitions, and crossovers through unsupervised machine learning: A critical examination, Physical Review E, 062122 (2017).
[15] L. Wang, Discovering phase transitions with unsupervised learning, Physical Review B, 195105 (2016).
[16] W. Zhang, J. Liu, and T.-C. Wei, Machine learning of phase transitions in the percolation and XY models, Physical Review E (2018).
[17] W. Yu and P. Lyu, Unsupervised machine learning of phase transition in percolation, Physica A: Statistical Mechanics and its Applications, 125065 (2020).
[18] F. D'Angelo and L. Böttcher, Learning the Ising model with generative neural networks, Physical Review Research, 023266 (2020).
[19] J. P. Janet, H. Kulik, Y. Morency, and M. Caucci, Machine Learning in Chemistry (2020).
[20] D. Plant and A. Barton, Machine learning in precision medicine: lessons to learn, Nature Reviews Rheumatology (2020).
[21] I. Hull, Machine learning and economics (2021), pp. 61–86.
[22] M. Buchanan, Machines learn from biology, Nature Physics, 238 (2020).
[23] R. Poorvadevi, G. Sravani, and V. Sathyanarayana, An effective mechanism for detecting crime rate in Chennai location using supervised machine learning approach, International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 326 (2020).
[24] S. Cheng, X. Qiao, Y. Shi, and D. Wang, Machine learning for predicting discharge fluctuation of a karst spring in north China, Acta Geophysica, 1 (2021).
[25] K. Kashiwa, Y. Kikuchi, and A. Tomiya, Phase transition encoded in neural network, Progress of Theoretical and Experimental Physics, 083A04 (2019).
[26] S. Arai, M. Ohzeki, and K. Tanaka, Deep neural network detects quantum phase transition, Journal of the Physical Society of Japan, 033001 (2018).
[27] R. Xu, W. Fu, and H. Zhao, A new strategy in applying the learning machine to study phase transitions, arXiv preprint arXiv:1901.00774 (2019).
[28] D. P. Kingma and M. Welling, Auto-encoding variational Bayes, CoRR abs/1312.6114 (2014).
[29] A. F. Agarap, Deep learning using rectified linear units (ReLU), arXiv preprint arXiv:1803.08375 (2018).
[30] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).
[31] J. A. Hartigan and M. A. Wong, Algorithm AS 136: A k-means clustering algorithm, Journal of the Royal Statistical Society, Series C (Applied Statistics), 100 (1979).
[32] K. He, X. Zhang, S. Ren, and J. Sun, Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, in Proceedings of the IEEE International Conference on Computer Vision (2015).
[33]