Generalization properties of restricted Boltzmann machine for short-range order
M. A. Timirgazin∗ and A. K. Arzhnikov

Physical-Technical Institute, UdmFRC UB RAS, Izhevsk, Russian Federation, 426067

(Dated: January 25, 2021)

The restricted Boltzmann machine (RBM) is used to investigate short-range order in binary alloys. The network is trained on the data collected by Monte Carlo simulations for a simple Ising-like binary alloy model and used to calculate the Warren–Cowley short-range order parameter and other thermodynamic properties. We demonstrate that the RBM not only reproduces the order parameters for the alloy concentration at which it was trained, but can also predict them for any other concentration.
I. INTRODUCTION
The use of machine learning (ML) methods has proved its efficiency in various technical applications. Neural networks have made a breakthrough in speech and image recognition technologies and in the control of complex technical devices and manufacturing processes. In recent years, ML has grown in popularity as a powerful scientific tool. Neural networks are used for the identification and classification of phases in classical and quantum systems [1–7], for accelerating Monte Carlo [8–10] and molecular dynamics simulations [11 and 12], and in many other fields (see the reviews [13 and 14]).

Among the various architectures of neural networks, of particular interest is the restricted Boltzmann machine (RBM), a simple generative energy-based model [15]. The main feature of the RBM is the hidden layer, which in physical language can be interpreted as an auxiliary field that decouples the complex interaction between the visible variables (one can draw an analogy with the Hubbard–Stratonovich transformation [16]). Based on the principles of statistical mechanics, the RBM turns out to be able to capture the physics underlying classical and quantum many-body problems [17–20]. Importantly, the RBM can not only classify and process large data sets, but can even be used to reconstruct the ground-state wave function [21]. While the RBM is usually treated as a component of deep belief networks [22], it has been found that, at least for the Ising model, a shallow RBM gives results just as good as deep models while being much simpler to train [23].

While neural networks prove their ability to effectively calculate physical properties after being trained on a preliminarily prepared data set, their generalization properties, i.e. the ability to adapt properly to new data and make correct predictions, are still in question.
We train an RBM on a binary system with short-range order and demonstrate that the network can not only reproduce order parameters for the concentration at which it was trained but also predict order parameters for any other concentration.

∗ [email protected]

FIG. 1. Transformation of the binary (A and B) alloy problem to the Ising spin model on a square lattice.
II. MODEL

A. Short-range order
Short-range order (SRO) refers to a regular and predictable arrangement of atoms on a local length scale. A simple system which manifests SRO is a binary alloy with type-A and type-B atoms (Fig. 1) [24]. The crystal energy of the binary alloy can be represented as a sum of pair potentials:

E = N_AA φ_AA + N_BB φ_BB + N_AB φ_AB,   (1)

where N_αβ is the number of α-β atom pairs and φ_αβ is their energy. Let the variable S_i take the value +1 or −1 if site i is occupied by an atom A or B, respectively. Then the energy of the system (1) can be written in the Ising-like form [24]:

H = −J Σ_⟨i,j⟩ S_i S_j − h Σ_i S_i + C,   (2)

where J = (1/4)[2φ_AB − (φ_AA + φ_BB)], h = (z/4)(φ_BB − φ_AA), C = (zN/8)(φ_AA + φ_BB + 2φ_AB), N is the total number of sites, and z is the coordination number. Omitting the constant and the external-field term under the assumption φ_AA = φ_BB, we are left with the only term:

H = −J Σ_⟨i,j⟩ S_i S_j.   (3)

If J is positive, similar atoms tend to cluster together, just as parallel spins are favoured in a ferromagnet. If J is negative, there is a tendency to form unlike pairs, as in an antiferromagnet. If J is zero, the system is perfectly disordered. Most commonly SRO is described by the Warren–Cowley (WC) parameter [25 and 26], which is a pair-correlation function:

α = 1 − P_AB / x,   (4)

where P_AB is the probability of a B atom being found at a site nearest to an A atom and x is the concentration of A atoms. The sign of α coincides with the sign of J, and α = 0 corresponds to perfect disorder.

B. Monte Carlo simulation
Similarity with the Ising model suggests that the SRO of a finite system can be effectively studied by Monte Carlo (MC) simulation [27]. There are the following differences from the common Metropolis–Hastings algorithm for the magnetic Ising model [28]: the alloy concentration (the net magnetization in terms of magnetism) is set at the system initialization and remains fixed during the simulation, and at every MC step not a single-site spin flip but a change from an A-B to a B-A bond is made.

We consider a system of N = L × L sites with L = 10 and periodic boundary conditions. We start with a random distribution of atoms (spins) with the required concentration. The concentrations considered are in the range from 0.05 to 0.95 with a step of 0.05. Each trial to exchange the neighboring atoms A and B (spins ↑ and ↓) is accepted according to the Metropolis prescription. To reach the equilibrium state we take 10 MC steps for all the concentrations studied, where one MC step consists of N single-site updates. Then, we take 10 MC steps to collect the data set D of 10 samples over which averaging will occur. The large size of the data set allows us to avoid the overfitting problem and the use of regularization in the further training of the neural network [17].

Holding the concentration fixed can be interpreted as an external field and implies the absence of a phase transition in the ferromagnetic (FM) case: there is long-range order (LRO) for any value of J on the square lattice [29]. The case of antiferromagnetic (AFM) interaction is much more complex: the model undergoes a phase transition from the ordered AFM to the disordered paramagnetic (PM) state even in the presence of an external field [30]. There are no exact analytical results for the critical values, and numerical simulations are also problematic [30 and 31]. We restrict ourselves to the consideration of the FM case J = +1 and an AFM case with negative J (the temperature T = 1). The value of the negative J is chosen to ensure that the system is in the PM state far from a phase transition for any alloy concentration.
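The fixed-concentration Metropolis scheme described above, together with the WC parameter of Eq. (4), can be sketched as follows. This is a minimal illustration, not the authors' code; the function names and the number of equilibration sweeps are our own choices, and the small sweep count is for demonstration only.

```python
import math
import random

random.seed(1)

L = 10   # linear lattice size: N = L * L sites, as in the paper
J = 1.0  # effective pair interaction; J = +1 is the FM (clustering) case
T = 1.0  # temperature in units of k_B

def neighbors(i, j):
    """Nearest neighbours on the square lattice with periodic boundaries."""
    return [((i + 1) % L, j), ((i - 1) % L, j),
            (i, (j + 1) % L), (i, (j - 1) % L)]

def init_lattice(x):
    """Random A/B (+1/-1) distribution at fixed A-concentration x."""
    n_a = round(x * L * L)
    spins = [1] * n_a + [-1] * (L * L - n_a)
    random.shuffle(spins)
    return [spins[i * L:(i + 1) * L] for i in range(L)]

def bonds_energy(s, sites):
    """Energy of all bonds touching the given sites, shared bonds counted once."""
    e, seen = 0.0, set(sites)
    for (i, j) in sites:
        for (p, q) in neighbors(i, j):
            if (p, q) in seen and (p, q) < (i, j):
                continue  # bond between the two sites already counted
            e += -J * s[i][j] * s[p][q]
    return e

def mc_step(s):
    """One MC step: N attempted exchanges of neighbouring unlike (A-B) pairs,
    accepted with the Metropolis rule; the concentration stays fixed."""
    for _ in range(L * L):
        i, j = random.randrange(L), random.randrange(L)
        a, b = random.choice(neighbors(i, j))
        if s[i][j] == s[a][b]:
            continue  # only an A-B bond can change to B-A
        e_old = bonds_energy(s, [(i, j), (a, b)])
        s[i][j], s[a][b] = s[a][b], s[i][j]
        dE = bonds_energy(s, [(i, j), (a, b)]) - e_old
        if dE > 0 and random.random() >= math.exp(-dE / T):
            s[i][j], s[a][b] = s[a][b], s[i][j]  # reject: undo the swap

def warren_cowley(s, x):
    """WC parameter of Eq. (4): alpha = 1 - P_AB / x, with P_AB the fraction
    of B atoms among the nearest neighbours of A atoms."""
    ab = total = 0
    for i in range(L):
        for j in range(L):
            if s[i][j] == 1:  # A atom
                for (p, q) in neighbors(i, j):
                    total += 1
                    ab += (s[p][q] == -1)
    return 1.0 - (ab / total) / x

x = 0.5
lat = init_lattice(x)
for _ in range(200):  # equilibration sweeps (the paper uses many more)
    mc_step(lat)
alpha = warren_cowley(lat, x)
print(alpha)  # positive for J > 0: like atoms cluster together
```

The pair-exchange update is the conserved-order-parameter (Kawasaki-type) analogue of the single-spin-flip Metropolis algorithm; only the bonds touching the two exchanged sites enter the energy difference.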
In this way we avoid problems with the AFM LRO and at the same time can study two essentially different cases with the standard Metropolis MC algorithm on a relatively small lattice. For both cases we run MC simulations and collect the data sets D, over which averaging is performed. Fig. 2 shows some random samples from the data sets D for x = 0.5, for J = +1 and for negative J. It can be clearly seen how the atoms of the same type tend to cluster together for positive J and how the atoms of different types tend to occupy neighboring sites for negative J. The calculated average WC parameters ⟨α⟩ reach their extreme values at x = 0.5.

FIG. 2. Examples of atom distributions obtained by MC simulations for J = +1 (a) and negative J (b).

C. Restricted Boltzmann machine
The full Boltzmann machine, a stochastic energy-based neural network, consists of two types of neurons (nodes): visible and hidden [15]. Each node is binary and connected to other nodes with some weight. If the connections between neurons of the same type are forbidden, the model is called the restricted Boltzmann machine [32]. The architecture of the RBM is depicted in Fig. 3: all neurons are arranged in visible (i = 1 ... n) and hidden (j = 1 ... m) layers interconnected with weights w_ij. Together with the biases a_i and b_j, the weights constitute the RBM parameter set denoted by θ. These parameters determine the energy of a given configuration (v, h):

E(v, h) = −Σ_i a_i v_i − Σ_j b_j h_j − Σ_{i,j} v_i w_ij h_j.   (5)

The probability distribution over the hidden and visible vectors is defined by the Boltzmann distribution:

P(v, h) = (1/Z) e^{−E(v,h)},   (6)

where Z has the meaning of the partition function.

FIG. 3. Restricted Boltzmann machine architecture.
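The energy of Eq. (5), the standard Bernoulli RBM conditionals, and CD-1 training translate almost directly into code. The sketch below is a minimal illustration under the assumption of 0/1 units; the class name, weight initialization, and layer sizes are our own choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class RBM:
    """Minimal Bernoulli-Bernoulli RBM with CD-1 training (Eqs. (5)-(8))."""

    def __init__(self, n_v, n_h, lr=0.01):
        self.w = rng.normal(0.0, 0.01, size=(n_v, n_h))  # weights w_ij
        self.a = np.zeros(n_v)                           # visible biases a_i
        self.b = np.zeros(n_h)                           # hidden biases b_j
        self.lr = lr

    def p_h(self, v):
        """p(h_j = 1 | v) = sigma(b_j + sum_i w_ij v_i)."""
        return sigmoid(self.b + v @ self.w)

    def p_v(self, h):
        """p(v_i = 1 | h) = sigma(a_i + sum_j w_ij h_j)."""
        return sigmoid(self.a + h @ self.w.T)

    def cd1(self, v0):
        """One contrastive-divergence (CD-1) update on a minibatch v0."""
        ph0 = self.p_h(v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        v1 = (rng.random(v0.shape) < self.p_v(h0)).astype(float)
        ph1 = self.p_h(v1)
        n = v0.shape[0]
        self.w += self.lr * (v0.T @ ph0 - v1.T @ ph1) / n
        self.a += self.lr * (v0 - v1).mean(axis=0)
        self.b += self.lr * (ph0 - ph1).mean(axis=0)

    def sample(self, n_steps=50, v=None):
        """Block Gibbs sampling: alternately draw h | v and v | h."""
        if v is None:
            v = (rng.random(self.w.shape[0]) < 0.5).astype(float)
        for _ in range(n_steps):
            h = (rng.random(self.w.shape[1]) < self.p_h(v)).astype(float)
            v = (rng.random(self.w.shape[0]) < self.p_v(h)).astype(float)
        return v

# toy demonstration: fit the RBM to two trivial patterns and draw a sample
rbm = RBM(16, 8)
data = np.array([[1.0] * 16, [0.0] * 16] * 50)  # minibatch of shape (100, 16)
for _ in range(200):
    rbm.cd1(data)
v = rbm.sample(n_steps=20)
```

The bipartite structure is what makes the block Gibbs step cheap: given one layer, all units of the other layer are conditionally independent, so each half-step is a single matrix product.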
RBMs gained wide popularity in the 2000s after efficient training algorithms were developed by Hinton [33]. Once the RBM is trained, which means that the parameter set θ is found, a new data set S can be generated by the block Gibbs sampling procedure. Successful training provides similarity between the real physical probability distribution and the probability distribution of the generated data set. The method of RBM training is described in detail in Ref. 34. The main idea is to use the so-called contrastive divergence approximation with k steps (CD-k) to perform a stochastic gradient descent procedure for the negative log-likelihood function. The contrastive divergence procedure becomes possible due to the bipartite structure of the RBM, which enables block Gibbs sampling. The hidden layer can be completely obtained from the known visible layer, and vice versa, by the conditional probabilities:

p(h_j = 1 | v) = σ( b_j + Σ_i w_ij v_i ),   (7)

p(v_i = 1 | h) = σ( a_i + Σ_j w_ij h_j ).   (8)

To generate a new data set with a trained RBM, one can initialize the visible layer with random values v and run the Gibbs sampling procedure, i.e. compute the hidden layer h, then a new visible layer v, then a new hidden layer h, and so on. After some equilibration period (as in the usual MC method) the generated visible units can be collected into the data set S, which is expected to obey the same probability distribution as the data set D. Averaging over S allows one to calculate observable physical quantities such as the mean energy or the heat capacity.

III. RESULTS
In order to produce the training data set D, we use the standard Markov chain MC technique described in Sec. II B. The importance sampling yields 10 independent spin configurations for J = +1 and for negative J at each concentration x ∈ [0.05; 0.95] in steps of Δx = 0.05. The training parameters are n_h = n_v = 100, contrastive divergence with only one step (CD-1), learning rate η = 0.01, minibatch size 100, and 1000 learning epochs (definitions of the training parameters can be found in Ref. 34). RBMs with these parameters were trained on the data set D at each concentration of the studied range.

At the first stage we address the question of how well the RBM can reproduce the observable physical quantities known from the MC simulations. In order to answer it, we generate a new data set S of 10 samples at each concentration with the corresponding RBMs and calculate the average observables. In Figs. 4-6 the energy ⟨E⟩, the WC parameter ⟨α⟩ and the heat capacity ⟨C⟩ = (⟨E²⟩ − ⟨E⟩²)/T² computed in this way are depicted as 'straight RBM' and compared with the 'exact' values obtained by MC simulations for the FM interaction. While the energy shows reasonable agreement, the other quantities are not satisfactory at all. This is explained by the strong sensitivity of the short-range correlations to the alloy concentration, which actually varies within each data set. The reason is that the Gibbs sampling scheme, being a fundamentally stochastic procedure, reproduces the concentration x (at which the RBM was trained) only on average, while the concentration of each individual sample of S fluctuates around x. These fluctuations result in fluctuations of the SRO parameter and the heat capacity.

To overcome this difficulty, we propose an algorithm for forcing the concentration to have the required value x. A flowchart of the algorithm is depicted in Fig. 7. Every time a new sample (visible layer) has been generated, its concentration is calculated immediately. If the concentration is equal to the required x, we accept this sample. If the concentration is lower than x, we need to reduce the number of excess B atoms. To do this, we choose a random node k of type B and re-make its binarization by comparing p(v_k = 1 | h) (8) with a random number from 0 to 1.
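The selection-and-re-binarization loop just described can be sketched as follows. Here `p_v_given_h` stands for the conditional probability of Eq. (8) supplied by a trained RBM; the function name, the `max_tries` guard, and the uniform toy probabilities in the usage example are our own additions for illustration.

```python
import random

random.seed(0)

def force_concentration(v, h, p_v_given_h, x_target, max_tries=100000):
    """Re-binarize randomly chosen sites of the excess atom type until the
    sample v (a 0/1 list, 1 = A atom) contains exactly
    round(x_target * len(v)) A atoms. p_v_given_h(h) returns the vector of
    probabilities p(v_i = 1 | h) of Eq. (8) for the current hidden layer h."""
    v = list(v)
    n_target = round(x_target * len(v))
    p = p_v_given_h(h)
    for _ in range(max_tries):
        n_a = sum(v)
        if n_a == n_target:
            return v  # required concentration reached: accept the sample
        # too few A atoms -> redraw a random B site; too many -> an A site
        excess_type = 0 if n_a < n_target else 1
        k = random.choice([i for i, vi in enumerate(v) if vi == excess_type])
        v[k] = 1 if random.random() < p[k] else 0
    raise RuntimeError("target concentration not reached")

# toy usage: a dummy hidden layer yielding uniform p(v_i = 1 | h) = 0.5
sample = [random.randint(0, 1) for _ in range(100)]
forced = force_concentration(sample, None, lambda h: [0.5] * 100, 0.3)
print(sum(forced))  # -> 30 A atoms exactly
```

Note that the loop can only move the concentration towards the target, never away from it, since only sites of the excess type are redrawn. The text further suggests shifting the probabilities of Eq. (8) by Δx = x′ − x when the target concentration is far from the training one; in this sketch that would amount to passing suitably clipped shifted probabilities as `p_v_given_h`.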
If the atom type changes but the concentration is still not equal to x, we repeat the procedure of random atom selection and re-binarization until the required concentration x is reached. Similar steps are taken in the case when the concentration is higher than x. Finally, we record the layer with the concentration x in the data set. In this way we achieve that all the samples have the same concentration x. Averaging over the data sampled by the RBM with the described forcing procedure gives the values presented as 'RBM forced' in Figs. 4-6 for the FM interaction and in Figs. 8-10 for the AFM interaction. As can be seen, all the observables now agree much better with the MC data.

FIG. 4. Energy per site for the FM interaction. The black solid line denotes the MC simulation, cross markers are the straight RBM, cyan triangles are the forced RBM, red squares are the RBM trained on x = 0.5.

FIG. 5. Warren–Cowley parameter for the FM interaction. The black solid line denotes the MC simulation, cross markers are the straight RBM, cyan triangles are the forced RBM, red squares are the RBM trained on x = 0.5.

FIG. 6. Heat capacity per site for the FM interaction. The black solid line denotes the MC simulation, cross markers are the straight RBM, cyan triangles are the forced RBM, red squares are the RBM trained on x = 0.5.

FIG. 7. Flowchart of the sample generation with the forced concentration x.

The goal of the second stage of our research is to check whether it is possible to access information about the SRO at some arbitrary concentration with an RBM trained at a completely different concentration. In order to do this, one should be able to generate samples with an arbitrary concentration. We assumed that the algorithm described by the flowchart (Fig. 7) can be applied to solve this problem. Indeed, this procedure does not contain any restrictions on the concentration x: the stochastic nature of the RBM allows for forcing any concentration x′ in a sample with a sufficient number of repetitions of the flowchart steps. The only limitation is that in the case of an excessively large difference between the desired concentration x′ and the concentration of the training data set x (Δx = x′ − x) the duration of such an algorithm may turn out to be too long. To make it easier to reach the required concentration, we modify the probability definition (8) by adding the difference Δx to it. In this way we shift the average concentration in a sample generated by the RBM towards x′. As a result we have an efficient and fast method to force the alloy concentration in the generated data set S to have an arbitrary value.

With the help of the described algorithm we use the RBM trained at x = 0.5 to generate samples at the other concentrations of the studied range in steps of 0.05. 10 samples are prepared for each concentration and the average observables are computed. The results are presented as 'RBM x' in Figs. 4-6 for the FM interaction and in Figs. 8-10 for the AFM interaction. A comparison with the 'exact' MC data leads to an impressive conclusion: the RBM trained at one concentration satisfactorily reproduces both the qualitative and the quantitative character of the SRO and other observables at any concentration, including those far from the original x.

FIG. 8. Energy per site for the AFM interaction. The black solid line denotes the MC simulation, cyan triangles are the forced RBM, red squares are the RBM trained on x = 0.5.

FIG. 9. Warren–Cowley parameter for the AFM interaction. The black solid line denotes the MC simulation, cyan triangles are the forced RBM, red squares are the RBM trained on x = 0.5.

FIG. 10. Heat capacity per site for the AFM interaction. The black solid line denotes the MC simulation, cyan triangles are the forced RBM, red squares are the RBM trained on x = 0.5.

IV. CONCLUSIONS
We demonstrate the predictive power of generative neural networks for systems exhibiting short-range order. An algorithm is proposed that allows a machine trained at one alloy concentration to be used to calculate the order parameters at any other alloy concentration. The applicability of the algorithm appears to be remarkably wide: it is suitable for the prediction of the SRO and thermodynamic properties for positive and negative effective interactions, for ordered and disordered phases, and both close to and far from the original concentration. This may be explained by the inherent ability of neural networks to recognize the patterns underlying the data, combined with the exceptional ability of the RBM, as a stochastic generative network, to reproduce these patterns in the sampled data in a controlled manner.

The proposed method has been validated for a two-component system but can be naturally extended to the case of multicomponent alloys. This can be particularly useful in the context of the investigation of high-entropy alloys, novel materials with exceptional mechanical properties. They consist of five or more elements, and their characteristics are very sensitive to the local chemical environment. Monte Carlo simulations for such systems (see e.g. [35]) are very expensive in terms of computational time, and the search for the optimal composition by trying all possible concentrations is problematic. An RBM trained on some samples taken from MC simulations could shed light on inaccessible areas of concentration while spending relatively little computing resources.

Another promising application is to find the local atomic environment in alloys based on experimental data for a single concentration. For example, if we have the probabilities of clusters with different atomic configurations from an experiment, we can use the MC approach to generate a data set where each sample is ordered accordingly.
Using our algorithm, we can train an RBM on this data set and predict the local atomic configurations, and hence other properties, e.g. the hyperfine field or the local magnetization [36], for concentrations other than the experimental one.
ACKNOWLEDGMENTS
This study was supported by the financing program AAAA-A16-116021010082-8.

[1] L. Wang, Discovering phase transitions with unsupervised learning, Phys. Rev. B, 195105 (2016).
[2] J. Carrasquilla and R. G. Melko, Machine learning phases of matter, Nat. Phys., 431 (2017).
[3] W. Hu, R. R. P. Singh, and R. T. Scalettar, Discovering phases, phase transitions, and crossovers through unsupervised machine learning: A critical examination, Phys. Rev. E, 062122 (2017).
[4] K. Shiina, H. Mori, Y. Okabe, and H. K. Lee, Machine-learning studies on spin models, Scientific Reports, 2177 (2020).
[5] S. J. Wetzel, Unsupervised learning of phase transitions: From principal component analysis to variational autoencoders, Phys. Rev. E, 022140 (2017).
[6] K. Ch'ng, J. Carrasquilla, R. G. Melko, and E. Khatami, Machine learning phases of strongly correlated fermions, Phys. Rev. X, 031038 (2017).
[7] T. Westerhout, N. Astrakhantsev, K. S. Tikhonov, M. I. Katsnelson, and A. Bagrov, Generalization properties of neural network approximations to frustrated magnet ground states, Nature Communications (2020).
[8] J. Liu, Y. Qi, Z. Y. Meng, and L. Fu, Self-learning Monte Carlo method, Phys. Rev. B, 041101(R) (2017).
[9] L. Huang and L. Wang, Accelerated Monte Carlo simulations with restricted Boltzmann machines, Phys. Rev. B, 035105 (2017).
[10] H. Shen, J. Liu, and L. Fu, Self-learning Monte Carlo with deep neural networks, Phys. Rev. B, 205140 (2018).
[11] F. Noé, A. Tkatchenko, K.-R. Müller, and C. Clementi, Machine learning for molecular simulation, Annual Review of Physical Chemistry, 361 (2020), PMID: 32092281.
[12] J. S. Smith, O. Isayev, and A. E. Roitberg, ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost, Chem. Sci., 3192 (2017).
[13] G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová, Machine learning and the physical sciences, Rev. Mod. Phys., 045002 (2019).
[14] P. Mehta, M. Bukov, C.-H. Wang, A. G. Day, C. Richardson, C. K. Fisher, and D. J. Schwab, A high-bias, low-variance introduction to machine learning for physicists, Physics Reports, 1 (2019).
[15] D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, A learning algorithm for Boltzmann machines, Cognitive Science, 147 (1985).
[16] R. L. Stratonovich, On a method of calculating quantum distribution functions, Soviet Physics Doklady, 416 (1957).
[17] G. Torlai and R. G. Melko, Learning thermodynamics with Boltzmann machines, Phys. Rev. B, 165134 (2016).
[18] Y. Nomura, A. S. Darmawan, Y. Yamaji, and M. Imada, Restricted Boltzmann machine learning for solving strongly correlated quantum systems, Phys. Rev. B, 205152 (2017).
[19] E. Rrapaj and A. Roggero, Exact representations of many body interactions with RBM neural networks (2020), arXiv:2005.03568 [nucl-th].
[20] T. Vieijra, C. Casert, J. Nys, W. De Neve, J. Haegeman, J. Ryckebusch, and F. Verstraete, Restricted Boltzmann machines for quantum states with non-abelian or anyonic symmetries, Phys. Rev. Lett., 097201 (2020).
[21] G. Carleo and M. Troyer, Solving the quantum many-body problem with artificial neural networks, Science, 602 (2017).
[22] G. E. Hinton, S. Osindero, and Y.-W. Teh, A fast learning algorithm for deep belief nets, Neural Comput., 1527 (2006).
[23] A. Morningstar and R. G. Melko, Deep learning the Ising model near criticality, J. Mach. Learn. Res., 5975 (2017).
[24] J. Ziman and P. Ziman, Models of Disorder: The Theoretical Physics of Homogeneously Disordered Systems (Cambridge University Press, 1979).
[25] J. M. Cowley, An approximate theory of order in alloys, Phys. Rev., 669 (1950).
[26] B. E. Warren, X-ray Diffraction (Addison-Wesley, Reading, MA, 1969).
[27] P. D. Scholten, Monte Carlo study of the critical temperature of a two-dimensional, ferromagnetic, binary, Ising system, Phys. Rev. B, 345 (1985).
[28] M. Newman and G. Barkema, Monte Carlo Methods in Statistical Physics (Clarendon Press, 1999).
[29] R. Baxter, Exactly Solved Models in Statistical Mechanics (Academic Press, 1982).
[30] K. Binder and D. P. Landau, Phase diagrams and critical behavior in Ising square lattices with nearest- and next-nearest-neighbor interactions, Phys. Rev. B, 1941 (1980).
[31] B. J. Lourenço and R. Dickman, Phase diagram and critical behavior of the antiferromagnetic Ising model in an external field, Journal of Statistical Mechanics: Theory and Experiment, 033107 (2016).
[32] P. Smolensky, Information processing in dynamical systems: Foundations of harmony theory, Parallel Distributed Processing (1986).
[33] G. E. Hinton, Training products of experts by minimizing contrastive divergence, Neural Computation, 1771 (2002).
[34] G. E. Hinton, A practical guide to training restricted Boltzmann machines, in Neural Networks: Tricks of the Trade, 2nd ed., edited by G. Montavon, G. B. Orr, and K.-R. Müller (Springer, Berlin, Heidelberg, 2012), pp. 599-619.
[35] B. Schönfeld, C. R. Sax, J. Zemp, M. Engelke, P. Boesecke, T. Kresse, T. Boll, T. Al-Kassab, O. E. Peil, and A. V. Ruban, Local order in Cr-Fe-Co-Ni: Experiment and electronic structure calculations, Phys. Rev. B, 014206 (2019).
[36] A. Arzhnikov, A. Bagrets, and D. Bagrets, Allowance for the short-range atomic order in describing the magnetic properties of disordered metal-metalloid alloys, Journal of Magnetism and Magnetic Materials 153.