KANAZAWA-16-04
Restricted Boltzmann Machines for the Long Range Ising Models
Ken-Ichi Aoki ∗ and Tamao Kobayashi †
Institute for Theoretical Physics, Kanazawa University, Kanazawa 920-1192, Japan
National Institute of Technology, Yonago College, Yonago 683-8502, Japan
Abstract
We set up Restricted Boltzmann Machines (RBM) to reproduce the Long Range Ising (LRI) models of the Ohmic type in one dimension. The RBM parameters are tuned by the standard machine learning procedure with an additional method of Configuration with Probability (CwP). The quality of the resultant RBM is evaluated through the susceptibility with respect to the external magnetic field. We compare the results with those of the Block Decimation Renormalization Group (BDRG) method, and our RBM clear the test with satisfactory precision.

∗ Electronic address: [email protected]
† Electronic address: [email protected]

I. INTRODUCTION

Quite recently, deep learning machines have drawn much attention since they are very effective for image processing [1], the game of Go [2], etc. Deep learning machines can be regarded as a method of information reduction that keeps as many macroscopically important features as possible. This policy or idea resembles the renormalization group method in physics [3, 6]; the intrinsic relationship between the two has been argued [7], and further investigation is strongly desired from both sides.

Here in this article we construct Restricted Boltzmann Machines (RBM) [8, 9] to reproduce the Long Range Ising (LRI) models in one dimension. The LRI models have their own long history [10], since they serve as the most simplified models to investigate quantum dissipation, an issue still left unsettled in between classical and quantum physics [11–15]. The LRI models have a critical point (temperature) where spontaneous magnetization sets in [16, 17]. The critical point depends on the long range nature determined by the power exponent of the interactions [18]. Moreover, functional renormalization group approaches have revealed Ising model critical phenomena [4, 5]. They organized a new framework using the field-theoretic formulation and developed powerful techniques including higher order diagrams. In this article, however, we attempt to utilize the finite range scaling method, and a direct comparison of our results with the above mentioned ones is not yet possible. Here we adopt the Ohmic case, where the power exponent of the long range couplings is 2 (see Eq. (7)).

II. RESTRICTED BOLTZMANN MACHINES FOR LONG RANGE ISING MODELS
We introduce the standard RBM consisting of visible variables v and hidden variables h. The total probability distribution is defined by

P(v, h) = \frac{1}{Z} e^{-H(v, h)}, \qquad Z = \sum_{v, h} e^{-H(v, h)}, (1)

where H is the Hamiltonian (energy function) and the partition function Z is the total normalization constant. We integrate out the hidden variables to get the probability distribution function for v,

P(v) = \sum_{h} P(v, h). (2)

The standard restriction of RBM requires the Hamiltonian to take the following form,

H(v, h) = -\sum_{i,j} w_{ij} v_i h_j, (3)

where we omit the external field terms (linear in v, h) here.

Our target system is a one dimensional Ising system, and all variables v, h take the values +1 or −1. The link weights depend only on the relative position of the spins they connect, with the reflection symmetry k_{-n} = k_n. Note that, precisely speaking, the translational invariance holds for the hidden sector only, and in the visible sector odd and even site spins are not equivalent. Hereafter our RBM are denoted by

P(v, h; k), (4)

FIG. 1: Definition of Restricted Boltzmann Machines.

where the machine parameter k represents {k_0, k_1, · · · }. The RBM visible probability distribution is given similarly,

P(v; k) = \sum_{h} P(v, h; k). (5)
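As a concrete illustration of Eqs. (1)–(3), the following minimal Python sketch encodes the links through the local fields \lambda_j = \sum_n k_n v_{j+n} (cf. Eq. (16) below). The indexing conventions here (one hidden unit per visible site, periodic boundary) are simplifying assumptions of the sketch, not the paper's exact geometry, which distinguishes odd and even visible sites.

import numpy as np

def lambda_fields(v, k):
    # lambda_j = sum_n k_n v_{j+n} on a periodic chain, with k_{-n} = k_n;
    # k = [k_0, k_1, ...] are the machine parameters.
    lam = k[0] * v.astype(float)
    for n in range(1, len(k)):
        lam += k[n] * (np.roll(v, -n) + np.roll(v, n))
    return lam

def energy(v, h, k):
    # H(v, h) = -sum_{i,j} w_ij v_i h_j  (Eq. 3), written via lambda_j
    return -np.dot(lambda_fields(v, k), h)

def p_hidden_up(v, k):
    # P(h_j = +1 | v) = 1 / (1 + exp(-2 lambda_j)) for +/-1 spins,
    # which follows from Eq. (1) since the h_j decouple at fixed v
    return 1.0 / (1.0 + np.exp(-2.0 * lambda_fields(v, k)))

The factorized conditional used in p_hidden_up is exactly what the restricted (bipartite) form of Eq. (3) buys: at fixed v the hidden spins are independent, so both the sum in Eq. (2) and the sampling of h are cheap.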
III. THE LONG RANGE ISING MODELS

The LRI model is defined by the following statistical weights,

P_{LRI}(v) = \frac{1}{Z} \exp\left( \sum_{n} K_n \sum_{i} v_i v_{i+n} \right), (6)

where K_n is the coupling constant for range n. We take the Ohmic type of long range behavior,

K_n = \frac{K}{n^2}. (7)

Our purpose in this article is to tune the RBM parameters k so that the RBM visible probability distribution best reproduces the LRI probability distribution,

P(v; k^*) \simeq P_{LRI}(v). (8)

We divide the LRI Hamiltonian into two parts, the nearest neighbor base part and the remaining long range part,

P_{LRI} = \frac{1}{Z} \exp\left( K \sum_{i} v_i v_{i+1} \right) \exp\left( \sum_{n=2} K_n \sum_{i} v_i v_{i+n} \right) = \frac{1}{Z} \exp(-H_0(v)) \exp(-H_L(v)). (9)

We set up the input data for the RBM as follows. In this section we omit the Hamiltonian slicing procedure for simplicity; it will be explained in the next section. We generate a set of spin configurations exactly respecting the nearest neighbor part of the probability distribution. This is done most quickly via the domain wall representation [21], where a domain wall exists with probability

q = \frac{1}{1 + \exp(2K)}, (10)

and there is no correlation among domain walls. Therefore we can set each domain wall independently, except for taking care of the periodic boundary condition. We express this set of configurations by {v^{(\mu)}}, where \mu labels each configuration.

Then the second part of the weight is treated as the physical quantity side. We calculate the additional probability which should be assigned to each configuration generated above,

v^{(\mu)} \Longrightarrow p_\mu \propto \exp(-H_L(v^{(\mu)})). (11)

Now our target probability distribution to be learned by the RBM is defined by a set of pairs of Configuration with corresponding Probability, CwP:

{v^{(\mu)}; p_\mu}. (12)

The normalization of the probability is taken to be

\sum_{\mu} p_\mu = N, (13)

where N is the total number of configurations.

Now we define the likelihood of the RBM to produce the above CwP as follows:

L(k) = \prod_{\mu=1}^{N} \left[ \sum_{h} P(v^{(\mu)}, h; k) \right]^{p_\mu}. (14)

Note that the probability p_\mu enters as the effective number of occurrences of the corresponding configuration. To search for the stationary point of the likelihood, we differentiate the logarithm of the likelihood function with respect to k. Using the explicit definition of our RBM, we have the following derivative,

\frac{1}{N j_M} \frac{\partial \log L(k)}{\partial k_n} = \frac{1}{N j_M} \sum_{\mu=1}^{N} p_\mu \sum_{j=1}^{j_M} v^{(\mu)}_{j+n} \tanh(\lambda_j) - \frac{1}{j_M} \sum_{j=1}^{j_M} E[v_{j+n} h_j; k], (15)

where j_M is the total number of hidden variables h, \lambda_j is defined by

\lambda_j = \sum_{n} k_n v^{(\mu)}_{j+n}, (16)

and E[·] denotes the expectation value of the operator with respect to the RBM,

E[v_{j+n} h_j; k] = \sum_{v, h} v_{j+n} h_j P(v, h; k). (17)

Using the derivative above, we adopt the steepest descent method to find the maximum likelihood position of the RBM parameters. The expectation value part is evaluated by the contrastive divergence method with several sample-update steps [9].
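The data preparation described above can be sketched in a few lines: independent domain walls generate the nearest neighbor ensemble with the wall probability of Eq. (10), and the long range remainder supplies the CwP weights of Eqs. (11)–(13). The function names, the explicit K_n = K/n^2 form, and the parity-resampling treatment of the periodic boundary are our illustrative choices.

import numpy as np

def sample_nn_configs(n_spins, K, n_samples, rng):
    # Draw configurations from exp(K sum_i v_i v_{i+1}) via domain walls:
    # each bond carries an independent wall with q = 1/(1 + exp(2K)), Eq. (10);
    # periodic boundary requires an even number of walls, so odd-parity
    # draws are simply rejected (one way of caring for the boundary).
    q = 1.0 / (1.0 + np.exp(2.0 * K))
    configs = np.empty((n_samples, n_spins), dtype=int)
    count = 0
    while count < n_samples:
        walls = rng.random(n_spins) < q
        if walls.sum() % 2 == 1:
            continue
        v = np.where(np.cumsum(walls) % 2 == 0, 1, -1)  # flip sign at each wall
        if rng.random() < 0.5:                          # random global sign
            v = -v
        configs[count] = v
        count += 1
    return configs

def cwp_weights(configs, K, n_max):
    # p_mu proportional to exp(-H_L(v_mu)) with H_L the long range part
    # (Eq. 11), normalized so that sum_mu p_mu = N (Eq. 13); K_n = K/n^2.
    N, n_spins = configs.shape
    H_L = np.zeros(N)
    for n in range(2, n_max + 1):
        H_L -= (K / n**2) * np.sum(configs * np.roll(configs, -n, axis=1), axis=1)
    w = np.exp(-(H_L - H_L.min()))   # shift exponents for numerical stability
    return w * (N / w.sum())

For example, configs = sample_nn_configs(128, 0.6, 1024, np.random.default_rng(0)) followed by p = cwp_weights(configs, 0.6, 9) produces a CwP data set {v^{(\mu)}; p_\mu} of the size used in the next section (the value K = 0.6 is an arbitrary illustration here).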
IV. MACHINE LEARNING PROCEDURE AND RESULTS

Our purpose here is to construct appropriate RBM that generate a high quality distribution of spin chains for the 1D Ising model with long range interactions. If the interactions among spins are limited to the nearest neighbor type, the model is easily solved exactly and the corresponding RBM solution is also obtained straightforwardly, although the practical machine learning process is not trivial. However, if the interactions are not of the nearest neighbor type, the model cannot be solved analytically, and its RBM counterpart is far from trivial.
FIG. 2: View of RBM learning procedures.
Our total strategy is drawn in Fig. 2, although we omit the slicing processes explained below for simplicity. Slicing is necessary since the differences among the probabilities within a set should not be too large. A large deviation of the probabilities causes a drastic loss of sample quality, and the effective size of the data set shrinks. We tune the slicing width so that the averaged probability stays within some small bound.

Now the slicing procedure is explained in some detail. First of all, we prepare sliced Hamiltonians \Delta H_m (m = 1, 2, \cdots, m_{Max}) so that they satisfy the following properties:

\sum_{m=1}^{M} \Delta H_m(v) = H_M(v), \qquad H_M(v)|_{M = m_{Max}} = H_L(v). (18)

At each slicing step (the m-th step here), the initial input configurations, denoted by

{v^{(\mu)}[m]}, (19)

are regarded as satisfying the probability distribution

P_m(v) \propto \exp(-H_0(v)) \exp(-H_{m-1}(v)). (20)

Then we assign an additional probability factor given by

p_\mu[m] = N \frac{\exp(-\Delta H_m(v^{(\mu)}[m]))}{\sum_{\mu} \exp(-\Delta H_m(v^{(\mu)}[m]))}, (21)

where \Delta H_m is the current slice of the remaining part of the Hamiltonian. Using this Configuration with Probability, {v^{(\mu)}[m], p_\mu[m]}, as the target data, the RBM parameters are tuned up (k_m \to k_{m+1}),

\left[ {v^{(\mu)}[m], p_\mu[m]} \Longleftrightarrow RBM(k_m) \right] \Longrightarrow RBM(k_{m+1}) \Longrightarrow {v^{(\mu)}[m+1]}, (22)

and the output data set {v^{(\mu)}[m+1]} by RBM(k_{m+1}) is expected to obey the probability distribution

P_{m+1}(v) \propto \exp(-H_0(v)) \exp(-H_m(v)). (23)

Then this data set works as the input configuration set for the next sliced step and is coupled with the probability p_\mu[m+1] defined through \Delta H_{m+1}.

In this serial procedure of learning, the set of configurations is simultaneously updated. The set is updated at each step of the steepest descent move of the machine through the contrastive divergence iteration. At the stationary point of the machine, the final set of configurations is used as the initial set of configurations for the next slice; that is, each configuration is assigned the probability coming from the next sliced Hamiltonian.

Actually, our slicing order respects the range of the interactions, as follows (see the sketch below). Starting with the nearest neighbor configurations, where the probabilities of the configurations are all 1 (constant), we add the non-nearest neighbor interactions of range 2, sliced (divided) by some number. We proceed with RBM learning slice by slice, until we reach the full range 2 interactions. Then we add the range 3 interactions, again in slices. Proceeding this way further, we finally reach the maximum range interactions, which is 9 in this article.

A practical and full analysis of the RBM learning procedures will be reported in a future full paper; here we show the tuned RBM parameters and their evaluation through the susceptibility estimates. The size of the system is 128 spins (the number of visible variables v). The RBM links contain couplings up to a finite maximum range, so that the RBM has 13 machine parameters. The total number of input configurations is 1024. We take 64 random number series to get averaged RBM machine parameters. The initial values of the parameters are taken to be the normal values k_0 = 1, k_1 = 1, k_n = 1/n (for n > 1); the dependence on these initial values will not be discussed here.
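A compact sketch of this sliced learning loop is given below; tune and regenerate are hypothetical placeholders standing for the steepest descent/contrastive divergence tuning of Sec. III and for drawing a fresh ensemble from the tuned machine, and the construction of the \Delta H_m slices is likewise left to the caller.

import numpy as np

def sliced_learning(configs, slice_dHs, k0, tune, regenerate):
    # configs    : (N, n_spins) nearest neighbor ensemble {v[1]}
    # slice_dHs  : list of callables; slice_dHs[m](configs) returns the
    #              values of Delta H_{m+1} for each configuration (Eq. 18)
    # tune       : (configs, p, k) -> k', one slice of RBM tuning (Eq. 22)
    # regenerate : (k, configs) -> new configs drawn from RBM(k) (Eq. 23)
    N = len(configs)
    k = k0
    for dH in slice_dHs:
        e = dH(configs)
        w = np.exp(-(e - e.min()))        # shift exponents for stability
        p = w * (N / w.sum())             # CwP weights, Eqs. (21) and (13)
        k = tune(configs, p, k)           # k_m -> k_{m+1}
        configs = regenerate(k, configs)  # {v[m]} -> {v[m+1]}
    return k, configs

In our reading of the procedure, slice_dHs would enumerate the range 2 slices first, then the range 3 slices, and so on up to range 9, so that each pass keeps the probability spread within a set small.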
TABLE I: Tuned Restricted Boltzmann Machines and their Evaluation.

In fact, the 64 machines give well-converged results, and we take the averaged machine parameters to define the tuned-up RBM in the following. Table I shows the averaged RBM parameters, where K is the nearest neighbor coupling constant and n is the maximum range of the target LRI model interactions. For n > 7, the optimized k_n are all small numbers and are not listed in the table.

In order to evaluate the quality of the tuned RBM, we compare the susceptibility given by the RBM with that calculated by the Block Decimation Renormalization Group (BDRG) [19, 20]. We refer to half of the logarithm of the susceptibility \chi,

X = \frac{1}{2} \log \chi, \qquad \chi = \frac{1}{2 j_M} \left\langle \left( \sum_{i} v_i \right)^2 \right\rangle. (24)

For the nearest neighbor case, X coincides with the coupling constant,

X = K, (25)

exactly in the infinite size limit.
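Equation (25) can be checked directly from the exact nearest neighbor correlator \langle v_0 v_n \rangle = (\tanh K)^{|n|}; writing the number of visible spins as 2 j_M and taking the infinite size limit,

\chi = \frac{1}{2 j_M} \left\langle \left( \sum_{i} v_i \right)^2 \right\rangle \;\longrightarrow\; \sum_{n=-\infty}^{\infty} (\tanh K)^{|n|} = \frac{1 + \tanh K}{1 - \tanh K} = e^{2K}, \qquad X = \frac{1}{2} \log \chi = K,

so reading out X through Eq. (24) directly measures the effective nearest neighbor coupling.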
FIG. 3: RBM iteration of the output data (X versus RBM iteration).

Starting with a set of 1024 perfectly random spin configurations (the high temperature limit ensemble), we operate the tuned-up RBM. We evaluate the susceptibility step by step, as seen in the example of Fig. 3, where the target system has maximum range n = 9. The value X starts from the vanishing value of random spins and increases rather quickly. Finally it slowly approaches the target value (1.565 in this case), which is drawn as a straight line. After equilibrium is reached, thermal fluctuations are observed, whose size will be discussed in a separate paper. After the onset of thermalized equilibrium, we read out the parameter X of the RBM by averaging over 100 iterations (a sketch of this measurement is given below).

The results are listed in Table I and are shown in Fig. 4, where the data for all four values of K are plotted. The RBM results follow the BDRG values closely over the whole range of K and n.
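The measurement just described (random start, repeated RBM updates, late-time averaging of X) can be sketched as follows, under the same illustrative indexing assumptions as in the earlier snippets (symmetric links k_{-n} = k_n, equal layer sizes, periodic chain).

import numpy as np

rng = np.random.default_rng(1)

def field(x, k):
    # local fields sum_n k_n x_{j+n} with k_{-n} = k_n, periodic chain
    f = k[0] * x.astype(float)
    for n in range(1, len(k)):
        f += k[n] * (np.roll(x, -n, axis=-1) + np.roll(x, n, axis=-1))
    return f

def gibbs(x, k):
    # sample +/-1 spins of one layer from the local fields of the other
    return np.where(rng.random(x.shape) < 1.0 / (1.0 + np.exp(-2.0 * field(x, k))), 1, -1)

def measure_X(k, n_configs=1024, n_spins=128, n_iter=300, n_avg=100):
    # Operate the tuned RBM on a random (high temperature) ensemble and
    # track X = log(chi)/2 of Eq. (24); average over the last n_avg steps.
    v = rng.choice(np.array([-1, 1]), size=(n_configs, n_spins))
    history = []
    for _ in range(n_iter):
        h = gibbs(v, k)                   # v -> h
        v = gibbs(h, k)                   # h -> v'
        chi = np.mean(v.sum(axis=1)**2) / n_spins
        history.append(0.5 * np.log(chi))
    return np.mean(history[-n_avg:]), history

The returned history should show the qualitative behavior of Fig. 3: a quick rise from X ≈ 0 followed by a slow approach to the target value, with thermal fluctuations around it.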
FIG. 4: Evaluation of the RBM by comparison with BDRG: X(RBM) and X(BDRG) versus the long range interaction range.
In the large susceptibility region, however, small differences appear, some part of which might come from the fact that our system is finite (128 spins with periodic boundary), and from a shortage of the number of input configurations and/or of the iteration sets of learning and evaluation. These points will be discussed in a separate paper.

It should be noted here that the susceptibility is just one physical quantity, though it is the most important one, and we will investigate the RBM output configurations in detail to further check the total equivalence or quality of the probability distribution. Also, we will clarify the intimate relation between RBM and the renormalization group method through multi-layer RBM systems, which will give us a new viewpoint to understand the physical features of the LRI models.

We thank Shin-Ichiro Kumamoto, Hiromitsu Goto and Daisuke Sato for fruitful discussions. This work was first motivated by a general lecture given by Muneki Yasuda, and we thank him much for telling us the basic notions of the recent development of deep machine learning. This work was partially supported by JSPS KAKENHI Grant Number 25610103 and the 2015 Research Grant of Yonago National College of Technology.

[1] J. Xie, L. Xu and E. Chen, Advances in Neural Information Processing Systems (2012) 350.
[2] D. Silver et al., Nature 529, 7587 (2016) 484.
[3] K. G. Wilson, Rev. Mod. Phys. 47 (1975) 773.
[4] J. Berges, N. Tetradis and C. Wetterich, Phys. Rep. 363 (2002) 223.
[5] B. Delamotte, arXiv:cond-mat/0702365 (2007).
[6] K-I. Aoki, Int. J. Mod. Phys. B 14 (2000) 1249; K-I. Aoki, A. Horikoshi, M. Taniguchi and H. Terao, Phys. Rev. Lett. (2002) 572; K-I. Aoki and A. Horikoshi, Phys. Lett. A (2003) 177; Phys. Rev. A (2002) 042105.
[7] P. Mehta and D. J. Schwab, arXiv:1410.3831 (2014).
[8] P. Smolensky, Information processing in dynamical systems: Foundations of harmony theory, in D. E. Rumelhart, J. L. McClelland and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations, MIT Press/Bradford Books, Cambridge, MA (1986) 194.
[9] G. E. Hinton, Neural Computation 14, 8 (2002) 1771.
[10] R. B. Griffiths, J. Math. Phys. 8 (1967) 478; Commun. Math. Phys. (1967) 121. D. Ruelle, Commun. Math. Phys. 9 (1968) 267. F. J. Dyson, Commun. Math. Phys. 12 (1969) 91. M. Aizenman and R. Fernández, Lett. Math. Phys. (1988) 39.
[11] A. O. Caldeira and A. J. Leggett, Phys. Rev. Lett. 46 (1981) 211; Ann. Phys. 149 (1983) 374.
[12] K. Fujikawa, S. Iso, M. Sasaki and H. Suzuki, Phys. Rev. Lett. 68 (1992) 1093; Phys. Rev. B 46 (1992) 10295.
[13] T. Matsuo, Y. Natsume and T. Kato, J. Phys. Soc. Jpn. 75 (2006) 103002; Phys. Rev. B (2008) 184304.
[14] S. Chakravarty, Phys. Rev. Lett. 49 (1982) 681. A. J. Bray and M. A. Moore, Phys. Rev. Lett. 49 (1982) 1545.
[15] A. S. Kapoyannis and N. Tetradis, Phys. Lett. A 276 (2000) 225. D. Zappalà, Phys. Lett. A 290 (2001) 35.
[16] J. Fröhlich and T. Spencer, Commun. Math. Phys. 84 (1982) 87. M. Aizenman and R. Fernández, Lett. Math. Phys. (1988) 39. M. Aizenman, J. T. Chayes, L. Chayes and C. M. Newman, J. Stat. Phys. 50 (1988) 1. J. Z. Imbrie and C. M. Newman, Commun. Math. Phys. 118 (1988) 303.
[17] P. W. Anderson and G. Yuval, J. Phys. C 4 (1971) 607. J. M. Kosterlitz and D. J. Thouless, J. Phys. C 6 (1973) 1181. J. M. Kosterlitz, J. Phys. C 7 (1974) 1046. J. M. Kosterlitz, Phys. Rev. Lett. 37 (1976) 1577. J. L. Cardy, J. Phys. A 14 (1981) 1407.
[18] J. Bhattacharjee, S. Chakravarty, J. L. Richardson and D. J. Scalapino, Phys. Rev. B 24 (1981) 3862. S. A. Cannas and A. C. N. de Magalhães, J. Phys. A 30 (1997) 3345. E. Bayong, H. T. Diep and V. Dotsenko, Phys. Rev. Lett. (1999) 14. E. Luijten and H. W. J. Blöte, Phys. Rev. B 56 (1997) 8945. E. Luijten and H. Meßingfeld, Phys. Rev. Lett. (2001) 5305.
[19] K-I. Aoki, T. Kobayashi and H. Tomita, Prog. Theor. Phys. (2008) 509.
[20] K-I. Aoki and T. Kobayashi, Mod. Phys. Lett. B (2012) 1250202.
[21] K-I. Aoki, T. Kobayashi and H. Tomita, Int. J. Mod. Phys. B (2009) 3739.