KANAZAWA-16-04
Restricted Boltzmann Machines for the Long Range Ising Models
Ken-Ichi Aoki ∗ and Tamao Kobayashi †
Institute for Theoretical Physics, Kanazawa University, Kanazawa 920-1192, Japan
National Institute of Technology, Yonago College, Yonago 683-8502, Japan
Abstract
We set up Restricted Boltzmann Machines (RBM) to reproduce the Long Range Ising (LRI) models of the Ohmic type in one dimension. The RBM parameters are tuned by the standard machine learning procedure with an additional method of Configuration with Probability (CwP). The quality of the resultant RBM is evaluated through the susceptibility with respect to the external magnetic field. We compare the results with those of the Block Decimation Renormalization Group (BDRG) method, and our RBM clear the test with satisfactory precision.

∗ Electronic address: [email protected]
† Electronic address: [email protected]

I. INTRODUCTION

Quite recently, deep learning machines have drawn much attention since they are very effective for image processing [1], the game of Go [2], etc. Deep learning machines can be regarded as a method of information reduction that keeps as many macroscopically important features as possible. This policy or idea resembles the renormalization group method in physics [3, 6]; the intrinsic relationship between the two has been argued [7], and further investigation is strongly desired from both sides.

Here in this article we construct Restricted Boltzmann Machines (RBM) [8, 9] to reproduce the Long Range Ising (LRI) models in one dimension. The LRI models have their own long history [10], since they serve as the most simplified models to investigate quantum dissipation, an issue still left unsettled in between classical and quantum physics [11–15]. The LRI models have a critical point (temperature) where spontaneous magnetization sets in [16, 17]. The critical point depends on the long range nature determined by the power exponent of the interactions [18]. Moreover, functional renormalization group approaches have revealed Ising model critical phenomena [4, 5]. They organized a new framework using the field-theoretic formulation and developed powerful techniques including higher order diagrams. In this article, however, we attempt to utilize the finite range scaling method, and a direct comparison of our results with the above mentioned ones is not yet possible. Here we adopt the Ohmic case, where the power exponent of the long range couplings is 2 (see Eq. (7)).

II. RESTRICTED BOLTZMANN MACHINES FOR LONG RANGE ISING MODELS
We introduce the standard RBM consisting of visible variables v and hidden variables h. The total probability distribution is defined by

P(v, h) = \frac{1}{Z} e^{-H(v, h)}, \qquad Z = \sum_{v, h} e^{-H(v, h)}, (1)

where H is the Hamiltonian (energy function) and the partition function Z is the total normalization constant. We integrate out the hidden variables to get the probability distribution function for v,

P(v) = \sum_{h} P(v, h). (2)

The standard restriction of RBM requires the Hamiltonian to take the following form,

H(v, h) = -\sum_{i,j} w_{ij} v_i h_j, (3)

where we omit the external field terms (linear in v, h) here.

Our target system is a one dimensional Ising system, and all variables v, h take the values +1 or −1. The link weights depend only on the relative position of the spins they connect, with the reflection symmetry k_{-n} = k_n. Note that, precisely speaking, the translational invariance holds for the hidden sector only, and in the visible sector odd and even site spins are not equivalent. Hereafter our RBM are denoted by

P(v, h; k), (4)

FIG. 1: Definition of Restricted Boltzmann Machines.

where the machine parameter k represents {k_0, k_1, · · · }. The RBM visible probability distribution is given similarly,

P(v; k) = \sum_{h} P(v, h; k). (5)
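As a concrete illustration of Eqs. (1)–(3), the following minimal Python sketch encodes the links through the local fields \lambda_j = \sum_n k_n v_{j+n} (cf. Eq. (16) below). The indexing conventions here (one hidden unit per visible site, periodic boundary) are simplifying assumptions of the sketch, not the paper's exact geometry, which distinguishes odd and even visible sites.

import numpy as np

def lambda_fields(v, k):
    # lambda_j = sum_n k_n v_{j+n} on a periodic chain, with k_{-n} = k_n;
    # k = [k_0, k_1, ...] are the machine parameters.
    lam = k[0] * v.astype(float)
    for n in range(1, len(k)):
        lam += k[n] * (np.roll(v, -n) + np.roll(v, n))
    return lam

def energy(v, h, k):
    # H(v, h) = -sum_{i,j} w_ij v_i h_j  (Eq. 3), written via lambda_j
    return -np.dot(lambda_fields(v, k), h)

def p_hidden_up(v, k):
    # P(h_j = +1 | v) = 1 / (1 + exp(-2 lambda_j)) for +/-1 spins,
    # which follows from Eq. (1) since the h_j decouple at fixed v
    return 1.0 / (1.0 + np.exp(-2.0 * lambda_fields(v, k)))

The factorized conditional used in p_hidden_up is exactly what the restricted (bipartite) form of Eq. (3) buys: at fixed v the hidden spins are independent, so both the sum in Eq. (2) and the sampling of h are cheap.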
III. THE LONG RANGE ISING MODELS

The LRI model is defined by the following statistical weights,

P_{LRI}(v) = \frac{1}{Z} \exp\left( \sum_{n} K_n \sum_{i} v_i v_{i+n} \right), (6)

where K_n is the coupling constant for range n. We take the Ohmic type of long range behavior,

K_n = \frac{K}{n^2}. (7)

Our purpose in this article is to tune the RBM parameters k so that the RBM visible probability distribution best reproduces the LRI probability distribution,

P(v; k^*) \simeq P_{LRI}(v). (8)

We divide the LRI Hamiltonian into two parts, the nearest neighbor base part and the remaining long range part,

P_{LRI} = \frac{1}{Z} \exp\left( K \sum_{i} v_i v_{i+1} \right) \exp\left( \sum_{n=2} K_n \sum_{i} v_i v_{i+n} \right) = \frac{1}{Z} \exp(-H_0(v)) \exp(-H_L(v)). (9)

We set up the input data for the RBM as follows. In this section we omit the Hamiltonian slicing procedure for simplicity; it will be explained in the next section. We generate a set of spin configurations exactly respecting the nearest neighbor part of the probability distribution. This is done most quickly via the domain wall representation [21], where a domain wall exists with probability

q = \frac{1}{1 + \exp(2K)}, (10)

and there is no correlation among domain walls. Therefore we can set each domain wall independently, except for taking care of the periodic boundary condition. We express this set of configurations by {v^{(\mu)}}, where \mu labels each configuration.

Then the second part of the weight is treated as the physical quantity side. We calculate the additional probability which should be assigned to each configuration generated above,

v^{(\mu)} \Longrightarrow p_\mu \propto \exp(-H_L(v^{(\mu)})). (11)

Now our target probability distribution to be learned by the RBM is defined by a set of pairs of Configuration with corresponding Probability, CwP:

{v^{(\mu)}; p_\mu}. (12)

The normalization of the probability is taken to be

\sum_{\mu} p_\mu = N, (13)

where N is the total number of configurations.

Now we define the likelihood of the RBM to produce the above CwP as follows:

L(k) = \prod_{\mu=1}^{N} \left[ \sum_{h} P(v^{(\mu)}, h; k) \right]^{p_\mu}. (14)

Note that the probability p_\mu enters as the effective number of occurrences of the corresponding configuration. To search for the stationary point of the likelihood, we differentiate the logarithm of the likelihood function with respect to k. Using the explicit definition of our RBM, we have the following derivative,

\frac{1}{N j_M} \frac{\partial \log L(k)}{\partial k_n} = \frac{1}{N j_M} \sum_{\mu=1}^{N} p_\mu \sum_{j=1}^{j_M} v^{(\mu)}_{j+n} \tanh(\lambda_j) - \frac{1}{j_M} \sum_{j=1}^{j_M} E[v_{j+n} h_j; k], (15)

where j_M is the total number of hidden variables h, \lambda_j is defined by

\lambda_j = \sum_{n} k_n v^{(\mu)}_{j+n}, (16)

and E[·] denotes the expectation value of the operator with respect to the RBM,

E[v_{j+n} h_j; k] = \sum_{v, h} v_{j+n} h_j P(v, h; k). (17)

Using the derivative above, we adopt the steepest descent method to find the maximum likelihood position of the RBM parameters. The expectation value part is evaluated by the contrastive divergence method with several sample-update steps [9].
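The data preparation described above can be sketched in a few lines: independent domain walls generate the nearest neighbor ensemble with the wall probability of Eq. (10), and the long range remainder supplies the CwP weights of Eqs. (11)–(13). The function names, the explicit K_n = K/n^2 form, and the parity-resampling treatment of the periodic boundary are our illustrative choices.

import numpy as np

def sample_nn_configs(n_spins, K, n_samples, rng):
    # Draw configurations from exp(K sum_i v_i v_{i+1}) via domain walls:
    # each bond carries an independent wall with q = 1/(1 + exp(2K)), Eq. (10);
    # periodic boundary requires an even number of walls, so odd-parity
    # draws are simply rejected (one way of caring for the boundary).
    q = 1.0 / (1.0 + np.exp(2.0 * K))
    configs = np.empty((n_samples, n_spins), dtype=int)
    count = 0
    while count < n_samples:
        walls = rng.random(n_spins) < q
        if walls.sum() % 2 == 1:
            continue
        v = np.where(np.cumsum(walls) % 2 == 0, 1, -1)  # flip sign at each wall
        if rng.random() < 0.5:                          # random global sign
            v = -v
        configs[count] = v
        count += 1
    return configs

def cwp_weights(configs, K, n_max):
    # p_mu proportional to exp(-H_L(v_mu)) with H_L the long range part
    # (Eq. 11), normalized so that sum_mu p_mu = N (Eq. 13); K_n = K/n^2.
    N, n_spins = configs.shape
    H_L = np.zeros(N)
    for n in range(2, n_max + 1):
        H_L -= (K / n**2) * np.sum(configs * np.roll(configs, -n, axis=1), axis=1)
    w = np.exp(-(H_L - H_L.min()))   # shift exponents for numerical stability
    return w * (N / w.sum())

For example, configs = sample_nn_configs(128, 0.6, 1024, np.random.default_rng(0)) followed by p = cwp_weights(configs, 0.6, 9) produces a CwP data set {v^{(\mu)}; p_\mu} of the size used in the next section (the value K = 0.6 is an arbitrary illustration here).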
IV. MACHINE LEARNING PROCEDURE AND RESULTS

Our purpose here is to construct appropriate RBM that generate a high quality distribution of spin chains for the 1D Ising model with long range interactions. If the interactions among spins are limited to the nearest neighbor type, the model is easily solved exactly and the corresponding RBM solution is also obtained straightforwardly, although the practical machine learning process is not trivial. However, if the interactions are not of the nearest neighbor type, the model cannot be solved analytically, and its RBM counterpart is far from trivial.
FIG. 2: View of RBM learning procedures.
Our total strategy is drawn in Fig. 2, although we omit the slicing processes explained below for simplicity. Slicing is necessary since the differences among the probabilities within a set should not be too large. A large deviation of the probabilities causes a drastic loss of sample quality, and the effective size of the data set shrinks. We tune the slicing width so that the averaged probability stays within some small bound.

Now the slicing procedure is explained in some detail. First of all, we prepare sliced Hamiltonians \Delta H_m (m = 1, 2, \cdots, m_{Max}) so that they satisfy the following properties:

\sum_{m=1}^{M} \Delta H_m(v) = H_M(v), \qquad H_M(v)|_{M = m_{Max}} = H_L(v). (18)

At each slicing step (the m-th step here), the initial input configurations, denoted by

{v^{(\mu)}[m]}, (19)

are regarded as satisfying the probability distribution

P_m(v) \propto \exp(-H_0(v)) \exp(-H_{m-1}(v)). (20)

Then we assign an additional probability factor given by

p_\mu[m] = N \frac{\exp(-\Delta H_m(v^{(\mu)}[m]))}{\sum_{\mu} \exp(-\Delta H_m(v^{(\mu)}[m]))}, (21)

where \Delta H_m is the current slice of the remaining part of the Hamiltonian. Using this Configuration with Probability, {v^{(\mu)}[m], p_\mu[m]}, as the target data, the RBM parameters are tuned up (k_m \to k_{m+1}),

\left[ {v^{(\mu)}[m], p_\mu[m]} \Longleftrightarrow RBM(k_m) \right] \Longrightarrow RBM(k_{m+1}) \Longrightarrow {v^{(\mu)}[m+1]}, (22)

and the output data set {v^{(\mu)}[m+1]} by RBM(k_{m+1}) is expected to obey the probability distribution

P_{m+1}(v) \propto \exp(-H_0(v)) \exp(-H_m(v)). (23)

Then this data set works as the input configuration set for the next sliced step and is coupled with the probability p_\mu[m+1] defined through \Delta H_{m+1}.

In this serial procedure of learning, the set of configurations is simultaneously updated. The set is updated at each step of the steepest descent move of the machine through the contrastive divergence iteration. At the stationary point of the machine, the final set of configurations is used as the initial set of configurations for the next slice; that is, each configuration is assigned the probability coming from the next sliced Hamiltonian.

Actually, our slicing order respects the range of the interactions, as follows (see the sketch below). Starting with the nearest neighbor configurations, where the probabilities of the configurations are all 1 (constant), we add the non-nearest neighbor interactions of range 2, sliced (divided) by some number. We proceed with RBM learning slice by slice, until we reach the full range 2 interactions. Then we add the range 3 interactions, again in slices. Proceeding this way further, we finally reach the maximum range interactions, which is 9 in this article.

A practical and full analysis of the RBM learning procedures will be reported in a future full paper; here we show the tuned RBM parameters and their evaluation through the susceptibility estimates. The size of the system is 128 spins (the number of visible variables v). The RBM links contain couplings up to a finite maximum range, so that the RBM has 13 machine parameters. The total number of input configurations is 1024. We take 64 random number series to get averaged RBM machine parameters. The initial values of the parameters are taken to be the normal values k_0 = 1, k_1 = 1, k_n = 1/n (for n > 1); the dependence on these initial values will not be discussed here.
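A compact sketch of this sliced learning loop is given below; tune and regenerate are hypothetical placeholders standing for the steepest descent/contrastive divergence tuning of Sec. III and for drawing a fresh ensemble from the tuned machine, and the construction of the \Delta H_m slices is likewise left to the caller.

import numpy as np

def sliced_learning(configs, slice_dHs, k0, tune, regenerate):
    # configs    : (N, n_spins) nearest neighbor ensemble {v[1]}
    # slice_dHs  : list of callables; slice_dHs[m](configs) returns the
    #              values of Delta H_{m+1} for each configuration (Eq. 18)
    # tune       : (configs, p, k) -> k', one slice of RBM tuning (Eq. 22)
    # regenerate : (k, configs) -> new configs drawn from RBM(k) (Eq. 23)
    N = len(configs)
    k = k0
    for dH in slice_dHs:
        e = dH(configs)
        w = np.exp(-(e - e.min()))        # shift exponents for stability
        p = w * (N / w.sum())             # CwP weights, Eqs. (21) and (13)
        k = tune(configs, p, k)           # k_m -> k_{m+1}
        configs = regenerate(k, configs)  # {v[m]} -> {v[m+1]}
    return k, configs

In our reading of the procedure, slice_dHs would enumerate the range 2 slices first, then the range 3 slices, and so on up to range 9, so that each pass keeps the probability spread within a set small.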
TABLE I: Tuned Restricted Boltzmann Machines and their Evaluation.

In fact, the 64 machines give well-converged results, and we take the averaged machine parameters to define the tuned-up RBM in the following. Table I shows the averaged RBM parameters, where K is the nearest neighbor coupling constant and n is the maximum range of the target LRI model interactions. For n > 7, the optimized k_n are all small numbers and are not listed in the table.

In order to evaluate the quality of the tuned RBM, we compare the susceptibility given by the RBM with that calculated by the Block Decimation Renormalization Group (BDRG) [19, 20]. We refer to half of the logarithm of the susceptibility \chi,

X = \frac{1}{2} \log \chi, \qquad \chi = \frac{1}{2 j_M} \left\langle \left( \sum_{i} v_i \right)^2 \right\rangle. (24)

For the nearest neighbor case, X coincides with the coupling constant,

X = K, (25)

exactly in the infinite size limit.
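Equation (25) can be checked directly from the exact nearest neighbor correlator \langle v_0 v_n \rangle = (\tanh K)^{|n|}; writing the number of visible spins as 2 j_M and taking the infinite size limit,

\chi = \frac{1}{2 j_M} \left\langle \left( \sum_{i} v_i \right)^2 \right\rangle \;\longrightarrow\; \sum_{n=-\infty}^{\infty} (\tanh K)^{|n|} = \frac{1 + \tanh K}{1 - \tanh K} = e^{2K}, \qquad X = \frac{1}{2} \log \chi = K,

so reading out X through Eq. (24) directly measures the effective nearest neighbor coupling.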
FIG. 3: RBM iteration of the output data (X versus RBM iteration).

Starting with a set of 1024 perfectly random spin configurations (the high temperature limit ensemble), we operate the tuned-up RBM. We evaluate the susceptibility step by step, as seen in the example of Fig. 3, where the target system has maximum range n = 9. The value X starts from the vanishing value of random spins and increases rather quickly. Finally it slowly approaches the target value (1.565 in this case), which is drawn as a straight line. After equilibrium is reached, thermal fluctuations are observed, whose size will be discussed in a separate paper. After the onset of thermalized equilibrium, we read out the parameter X of the RBM by averaging over 100 iterations (a sketch of this measurement is given below).

The results are listed in Table I and are shown in Fig. 4, where the data for all four values of K are plotted. The RBM results follow the BDRG values closely over the whole range of K and n.
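The measurement just described (random start, repeated RBM updates, late-time averaging of X) can be sketched as follows, under the same illustrative indexing assumptions as in the earlier snippets (symmetric links k_{-n} = k_n, equal layer sizes, periodic chain).

import numpy as np

rng = np.random.default_rng(1)

def field(x, k):
    # local fields sum_n k_n x_{j+n} with k_{-n} = k_n, periodic chain
    f = k[0] * x.astype(float)
    for n in range(1, len(k)):
        f += k[n] * (np.roll(x, -n, axis=-1) + np.roll(x, n, axis=-1))
    return f

def gibbs(x, k):
    # sample +/-1 spins of one layer from the local fields of the other
    return np.where(rng.random(x.shape) < 1.0 / (1.0 + np.exp(-2.0 * field(x, k))), 1, -1)

def measure_X(k, n_configs=1024, n_spins=128, n_iter=300, n_avg=100):
    # Operate the tuned RBM on a random (high temperature) ensemble and
    # track X = log(chi)/2 of Eq. (24); average over the last n_avg steps.
    v = rng.choice(np.array([-1, 1]), size=(n_configs, n_spins))
    history = []
    for _ in range(n_iter):
        h = gibbs(v, k)                   # v -> h
        v = gibbs(h, k)                   # h -> v'
        chi = np.mean(v.sum(axis=1)**2) / n_spins
        history.append(0.5 * np.log(chi))
    return np.mean(history[-n_avg:]), history

The returned history should show the qualitative behavior of Fig. 3: a quick rise from X ≈ 0 followed by a slow approach to the target value, with thermal fluctuations around it.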
FIG. 4: Evaluation of the RBM by comparison with BDRG: X(RBM) and X(BDRG) versus the long range interaction range.
In the large susceptibility region, however, small differences appear, some part of which might come from the fact that our system is finite (128 spins with periodic boundary), and from a shortage of the number of input configurations and/or of the iteration sets of learning and evaluation. These points will be discussed in a separate paper.

It should be noted here that the susceptibility is just one physical quantity, though it is the most important one, and we will investigate the RBM output configurations in detail to further check the total equivalence or quality of the probability distribution. Also, we will clarify the intimate relation between RBM and the renormalization group method through multi-layer RBM systems, which will give us a new viewpoint to understand the physical features of the LRI models.

We thank Shin-Ichiro Kumamoto, Hiromitsu Goto and Daisuke Sato for fruitful discussions. This work was first motivated by a general lecture given by Muneki Yasuda, and we thank him much for telling us the basic notions of the recent development of deep machine learning. This work was partially supported by JSPS KAKENHI Grant Number 25610103 and the 2015 Research Grant of Yonago National College of Technology.

[1] J. Xie, L. Xu and E. Chen, Advances in Neural Information Processing Systems (2012) 350.
[2] D. Silver et al., Nature 529, 7587 (2016) 484.
[3] K. G. Wilson, Rev. Mod. Phys. 47 (1975) 773.
[4] J. Berges, N. Tetradis and C. Wetterich, Phys. Rep. 363 (2002) 223.
[5] B. Delamotte, arXiv:cond-mat/0702365 (2007).
[6] K-I. Aoki, Int. J. Mod. Phys. B 14 (2000) 1249; K-I. Aoki, A. Horikoshi, M. Taniguchi and H. Terao, Phys. Rev. Lett. (2002) 572; K-I. Aoki and A. Horikoshi, Phys. Lett. A (2003) 177; Phys. Rev. A (2002) 042105.
[7] P. Mehta and D. J. Schwab, arXiv:1410.3831 (2014).
[8] P. Smolensky, Information processing in dynamical systems: Foundations of harmony theory, in D. E. Rumelhart, J. L. McClelland and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations, MIT Press/Bradford Books, Cambridge, MA (1986) 194.
[9] G. E. Hinton, Neural Computation 14, 8 (2002) 1771.
[10] R. B. Griffiths, J. Math. Phys. 8 (1967) 478; Commun. Math. Phys. (1967) 121. D. Ruelle, Commun. Math. Phys. 9 (1968) 267. F. J. Dyson, Commun. Math. Phys. 12 (1969) 91. M. Aizenman and R. Fernández, Lett. Math. Phys. (1988) 39.
[11] A. O. Caldeira and A. J. Leggett, Phys. Rev. Lett. 46 (1981) 211; Ann. Phys. 149 (1983) 374.
[12] K. Fujikawa, S. Iso, M. Sasaki and H. Suzuki, Phys. Rev. Lett. 68 (1992) 1093; Phys. Rev. B 46 (1992) 10295.
[13] T. Matsuo, Y. Natsume and T. Kato, J. Phys. Soc. Jpn. 75 (2006) 103002; Phys. Rev. B (2008) 184304.
[14] S. Chakravarty, Phys. Rev. Lett. 49 (1982) 681. A. J. Bray and M. A. Moore, Phys. Rev. Lett. 49 (1982) 1545.
[15] A. S. Kapoyannis and N. Tetradis, Phys. Lett. A 276 (2000) 225. D. Zappalà, Phys. Lett. A 290 (2001) 35.
[16] J. Fröhlich and T. Spencer, Commun. Math. Phys. 84 (1982) 87. M. Aizenman and R. Fernández, Lett. Math. Phys. (1988) 39. M. Aizenman, J. T. Chayes, L. Chayes and C. M. Newman, J. Stat. Phys. 50 (1988) 1. J. Z. Imbrie and C. M. Newman, Commun. Math. Phys. 118 (1988) 303.
[17] P. W. Anderson and G. Yuval, J. Phys. C 4 (1971) 607. J. M. Kosterlitz and D. J. Thouless, J. Phys. C 6 (1973) 1181. J. M. Kosterlitz, J. Phys. C 7 (1974) 1046. J. M. Kosterlitz, Phys. Rev. Lett. 37 (1976) 1577. J. L. Cardy, J. Phys. A 14 (1981) 1407.
[18] J. Bhattacharjee, S. Chakravarty, J. L. Richardson and D. J. Scalapino, Phys. Rev. B 24 (1981) 3862. S. A. Cannas and A. C. N. de Magalhães, J. Phys. A 30 (1997) 3345. E. Bayong, H. T. Diep and V. Dotsenko, Phys. Rev. Lett. (1999) 14. E. Luijten and H. W. J. Blöte, Phys. Rev. B 56 (1997) 8945. E. Luijten and H. Meßingfeld, Phys. Rev. Lett. (2001) 5305.
[19] K-I. Aoki, T. Kobayashi and H. Tomita, Prog. Theor. Phys. (2008) 509.
[20] K-I. Aoki and T. Kobayashi, Mod. Phys. Lett. B (2012) 1250202.
[21] K-I. Aoki, T. Kobayashi and H. Tomita, Int. J. Mod. Phys. B (2009) 3739.