Generative models for sampling of lattice field theories
Matija Medvidović
Center for Computational Quantum Physics, Flatiron Institute, New York, NY 10010, USA
Department of Physics, Columbia University, New York, NY 10027, USA
[email protected]

Juan Carrasquilla
Vector Institute for Artificial Intelligence, MaRS Centre, Toronto, Ontario, Canada
Department of Physics and Astronomy, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
[email protected]

Lauren E. Hayward
Perimeter Institute for Theoretical Physics, Waterloo, Ontario, N2L 2Y5, Canada
[email protected]

Bohdan Kulchytskyy
Department of Physics and Astronomy, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
Perimeter Institute for Theoretical Physics, Waterloo, Ontario, N2L 2Y5, Canada
[email protected]

Third Workshop on Machine Learning and the Physical Sciences (NeurIPS 2020), Vancouver, Canada.
Abstract
We explore a self-learning Markov chain Monte Carlo method based on Adversarial Non-linear Independent Components Estimation Monte Carlo, which utilizes generative models and artificial neural networks. We apply this method to the scalar ϕ⁴ lattice field theory in the weak-coupling regime and, in doing so, greatly increase the system sizes explored to date with this self-learning technique. Our approach does not rely on a pre-existing training set of samples, as the agent systematically improves its performance by bootstrapping samples collected by the model itself. We evaluate the performance of the trained model by examining its mixing time and study the ergodicity of generated samples. When compared to methods such as Hamiltonian Monte Carlo, this approach provides unique advantages such as the speed of inference and a compressed representation of Monte Carlo proposals for potential use in downstream tasks.

Field theory is the theoretical bedrock for unified descriptions of critical phenomena. Such a framework provides us with a microscopic understanding of universality [1]. Its predictions for a variety of strongly correlated systems have been confirmed experimentally to high precision in widely diverse physical systems [2], including thin films of superfluids [3], superconductors [4], ferromagnets [5], quantum simulations of frustrated magnets [6], and ultracold atoms [7]. In practice, however, field theories are often intractable, and their analytical treatment involves approximations with varying degrees of reliability. Luckily, some of these field theories lend themselves to non-perturbative numerical simulations. These simulations often utilize Markov chain Monte Carlo (MCMC) algorithms based on a Feynman path-integral formulation of the theory [8], tensor networks [9–12], or even quantum computing [13].
A key practical concern in MCMC simulations is the autocorrelation that exists between Monte Carlo samples [14]. Reducing the autocorrelation time enables a Markov chain to become shorter while maintaining the same statistical predictive capacity. Such optimization can be achieved through a tailored design of the proposal distribution in an MCMC update. The best proposals dramatically reduce the autocorrelation and thus the computation time required to reach a desired accuracy.

Recently, there has been progress toward an automatic optimization of the proposal distribution in MCMC simulations applied to physical systems. Specifically, the proposal distribution is parameterized as a generative model designed for an inexpensive generation of statistically independent samples, such as a generative adversarial network (GAN) [15, 16] or a flow model [17–19]. An important ingredient in such optimization is the choice of a loss function that ultimately leads to an efficient sampler. For example, Ref. [20] takes a data-driven approach relying on a GAN's loss function evaluated over a pre-existing dataset of samples. By contrast, Refs. [21–24] take a variational approach that aims to minimize the action for the samples generated from the model.

Here we examine a general approach for a self-training MCMC known as Adversarial Nonlinear Independent Components Estimation Monte Carlo (A-NICE MC) [25], where a neural network is optimized to minimize the autocorrelation in a Markov chain. The network is trained on samples generated by the model itself and, like most MCMC schemes, requires only the analytical lattice action as input. By extending the A-NICE MC method to sample high-dimensional lattice field theories, we find that this method scales well beyond previously tested target-space dimensionalities. Using A-NICE MC on the scalar ϕ⁴ field theory [26] demonstrates that the model can be systematically optimized towards producing decorrelated samples using standard gradient-based optimizers. The ϕ⁴ theory was chosen because of its simplicity. In addition, we study the performance of A-NICE MC by examining its ability to discriminate between random Gaussian noise and actual samples from the weakly coupled theory. We find that the model learns to distinguish between the two and that we can observe the distinction during the training procedure. While the training procedure for A-NICE MC is considerably more expensive than Hamiltonian Monte Carlo (HMC), the strategy has the advantage that a trained A-NICE MC model can simply be stored, shared, and sampled when needed, as opposed to HMC, where the result of the calculation is a collection of samples.

We focus on obtaining lattice field samples {ϕ⁽ⁱ⁾} from a Boltzmann-type distribution of the form P(ϕ) ∝ exp(−S(ϕ)). The distribution is defined on a square L × L lattice Λ, where each sample ϕ ∈ R^{L×L} assigns one real number to every lattice site. Metropolis-Hastings (MH) [27, 28] MCMC algorithms propose each new sample ϕ⁽ⁱ⁺¹⁾ from ϕ⁽ⁱ⁾ through a predefined proposal function f. Each ϕ⁽ⁱ⁺¹⁾ is either accepted and appended to the Markov chain, or rejected, in which case ϕ⁽ⁱ⁾ is appended to the chain instead. Both detailed balance and ergodicity are required of the mapping ϕ⁽ⁱ⁾ ↦ ϕ⁽ⁱ⁺¹⁾. The details of the MH algorithm can be found in Ref. [14].
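To make the accept/reject step concrete, below is a minimal sketch of a single MH update for a target P(ϕ) ∝ exp(−S(ϕ)) with a symmetric proposal; the names `action` and `propose` are illustrative placeholders, not the implementation used in this work.

```python
import numpy as np

def mh_step(phi, action, propose, rng):
    """One Metropolis-Hastings update for P(phi) ~ exp(-S(phi)).

    `action` returns the scalar action S(phi) of a configuration and
    `propose` maps the current configuration to a candidate; for a
    symmetric proposal the acceptance probability reduces to
    min(1, exp(S(phi) - S(phi_new))).
    """
    phi_new = propose(phi, rng)
    delta_S = action(phi_new) - action(phi)
    if rng.random() < np.exp(-delta_S):
        return phi_new  # accept: append the candidate to the chain
    return phi          # reject: append the current sample again

# Example usage with a simple random-walk proposal on an L x L field:
# rng = np.random.default_rng(0)
# walk = lambda phi, rng: phi + 0.1 * rng.normal(size=phi.shape)
# phi = mh_step(phi, action, walk, rng)
```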
Ideally, this probabilistic mapping is chosen such that the statistical correlation between adjacent samples is minimized.

An important and widely used algorithm in the study of lattice quantum field theory is the HMC algorithm [29]. This method supplements the original degrees of freedom ϕ with fictitious conjugate momentum variables π and introduces a combined Hamiltonian on the same lattice Λ, defined as H_Λ(ϕ, π) = ½ Σ_{x∈Λ} π_x² + S_Λ(ϕ). A state proposal (ϕ, π)⁽ⁱ⁺¹⁾ is then obtained by integrating Hamilton's equations for a finite time. Upon integrating out the momentum variables in the distribution P_{H_Λ}(ϕ, π) ∝ exp(−H_Λ(ϕ, π)), one recovers samples from the desired P(ϕ). Since the Hamiltonian dynamics generate a volume-preserving and time-reversible flow of states, the proposal distribution is symmetric, implying that the rate of rejection of proposed samples is determined by the accuracy of the numerical integrator [29].

A-NICE MC takes inspiration from HMC and embeds its proposal distribution in the augmented space, f : (ϕ, π)⁽ⁱ⁾ → (ϕ, π)⁽ⁱ⁺¹⁾, where the supplementary degrees of freedom π are Gaussian distributed. Furthermore, A-NICE exploits the invertible and volume-preserving properties of the NICE [17] architecture to parametrize f. As a result, the proposal distribution is symmetric and does not rely on expensive integration as in HMC.

The NICE proposal f_θ, which is parametrized by θ, is trained alongside an adversary to systematically minimize autocorrelation times. This setup closely resembles a more traditional GAN [15, 16, 30] but retains a few key differences. Most notably, a pairwise discriminator network is used in place of a cost function. Instead of ranking each sample individually as a more conventional discriminator would, the pairwise discriminator ranks pairs of samples together, acting as a proxy for the autocorrelation function between samples. The discriminator is parametrized as a deep neural network D_α with parameters α that maps a pair of samples to a real-valued score; intuitively, it is trained to score the degree of correlation between the two samples. The objective is to produce a NICE proposal that generates a Markov chain with two desirable properties. Firstly, starting from noise, the proposal should generate a highly probable sample in a small number of steps, such that equilibrium is reached quickly. Secondly, starting from a highly probable sample, the proposal should generate another decorrelated and probable sample in as few steps as possible.
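For concreteness, the invertible, volume-preserving map underlying f_θ is built from NICE additive coupling layers [17]. The following is a minimal sketch, with a free-form `shift` function (in practice a small neural network) introduced here purely for illustration:

```python
import numpy as np

class AdditiveCoupling:
    """One NICE coupling layer: split the input into two halves and
    shift one half by a function of the other. The Jacobian is
    unit-triangular, so the map is exactly invertible and
    volume-preserving (|det J| = 1). Assumes an even input length."""

    def __init__(self, shift):
        self.shift = shift  # any map from half the coordinates to the other half

    def forward(self, x):
        x1, x2 = np.split(x, 2)
        return np.concatenate([x1, x2 + self.shift(x1)])

    def inverse(self, y):
        y1, y2 = np.split(y, 2)
        return np.concatenate([y1, y2 - self.shift(y1)])
```

Stacking several such layers with alternating splits gives an expressive yet exactly invertible map, which keeps the MH acceptance probability tractable without numerical integration.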
We employ a bootstrapping procedure to generate a training dataset. The untrained network is initially sampled to obtain a starting dataset D₀, which is expected to exhibit large autocorrelation times. Still, the MH algorithm biases the sampling towards the correct distribution regardless of the proposal [14]. We collect samples from multiple chains and shuffle them to further reduce hidden correlations in the training set. After N_bootstrap training-loop iterations using D₀, some fraction r ∈ [0, 1] of the samples in D₀ is replaced by new data, yielding D₁. This new dataset is less correlated since it comes from the partially trained model. Iterating this procedure, we obtain samples of increasing quality in each subsequent dataset D_i. In principle, we reach the fixed point of this training procedure when the samples generated by f_θ become as decorrelated as the shuffled samples from the bootstrapped dataset D_i.

The scalar ϕ⁴ theory on a lattice was chosen as the simplest non-Gaussian theory with which to test the effectiveness of the A-NICE MC approach. Despite its simplicity, this theory has a nontrivial phase diagram and a variety of use cases across particle and many-body physics. As noted earlier, in this case, samples consist of only one real number per lattice site. For the ϕ⁴ lattice field theory, the probability distribution P_Λ(ϕ) ∝ exp(−S_Λ(ϕ)) is defined by the action

S_Λ(ϕ) = Σ_{x∈Λ} [ −2κ Σ_{μ=1}^{D} ϕ_x ϕ_{x+ê_μ} + (1 − 2λ) ϕ_x² + λ ϕ_x⁴ ],   (1)

where the coupling constants κ, λ ∈ R define different distributions P_Λ(ϕ) and determine the phase diagram of the model. We choose values of κ and λ for which the model is in the weak-coupling regime and displays behavior similar to the paramagnetic Ising model [31]. The phase diagram of the theory associated with Eq. (1) was explored in Ref. [32] using standard MCMC algorithms. Our choice of parameters enables us to tackle our goal of exploring novel MCMC frameworks for a non-Gaussian model while eliminating any strong-coupling effects. Remaining in the weakly coupled regime allows us to study the performance of A-NICE MC in a region of parameter space easily reachable by conventional methods, which allows for a straightforward comparison of results.
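Eq. (1) translates directly into a few lines of array code. Below is a sketch for a D-dimensional field stored as a numpy array; the use of periodic boundary conditions is an assumption made here for illustration.

```python
import numpy as np

def phi4_action(phi, kappa, lam):
    """Lattice phi^4 action of Eq. (1) for a D-dimensional field `phi`.

    np.roll(phi, -1, axis=mu) implements the nearest-neighbour shift
    x -> x + e_mu along direction mu with periodic boundaries.
    """
    hopping = sum(phi * np.roll(phi, -1, axis=mu) for mu in range(phi.ndim))
    on_site = (1.0 - 2.0 * lam) * phi**2 + lam * phi**4
    return float(np.sum(-2.0 * kappa * hopping + on_site))
```

A function of this form can serve directly as the `action` argument of the MH sketch given earlier.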
It is difficult a priori to verify whether the A-NICE framework will display ergodic behavior, since it is stochastically optimized, highly nonlinear, and contains a large number of free parameters. We thus compare our A-NICE algorithm against well-established HMC simulations. First, we take the generated HMC chain and randomly pair the samples (to decorrelate them artificially) in order to examine the output of the fully trained A-NICE pairwise discriminator. The results are shown in Fig. 1. We observe that the A-NICE and HMC output distributions overlap, which means that the A-NICE distribution is consistent with the behavior of the HMC distribution in that the produced samples are not penalized differently by the discriminator and pushed outside the bulk of the distribution. The fact that all HMC samples are scored similarly suggests that the discriminator saw such samples from the NICE proposal during training. We supplement the plot with a reference point provided by samples randomly chosen from a unit Gaussian distribution. Since we are studying our lattice model well within the weakly coupled regime, we expect our field samples to be almost Gaussian and thus penalized in a similar way.

We plot the discriminator scores of HMC, A-NICE and the Gaussian random noise against the training iteration in the right panel of Fig. 1. Differences between discriminator scores of A-NICE samples and HMC/random samples are shown, where the HMC samples are randomly shuffled to artificially eliminate autocorrelations and provide reliable reference values. We see that, as the discriminator optimizes [16], the difference |D_HMC − D_A-NICE| decreases and thus the A-NICE score approaches the decorrelated HMC score. We perform error analysis on the fully trained distributions in the left panel of Fig. 1 and find that |D_HMC − D_A-NICE| ≈ 416 × 10⁻⁴, which is less than |D_noise − D_A-NICE| ≈ 433 × 10⁻⁴. Therefore, we conclude that A-NICE MC learns to distinguish between random noise and a weakly coupled ϕ⁴ theory, since the A-NICE samples become less similar to random noise and more similar to decorrelated HMC samples by the end of the training.

Figure 1: (left) The fully trained discriminator score distribution of randomly shuffled HMC, A-NICE and (unit) Gaussian random noise samples for a square lattice. All three histograms have been normalized to unity. The noise was chosen with the mean and the variance estimated from HMC samples. (right) The mean value of the same three scores as a function of the training step, averaged over 32 independent chains.
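Schematically, the histograms in the left panel of Fig. 1 can be produced by randomly pairing configurations and scoring each pair with the trained discriminator. The helper below is a hypothetical sketch of the bookkeeping only, assuming `discriminator` maps a concatenated pair of flattened configurations to a scalar score:

```python
import numpy as np

def score_random_pairs(discriminator, samples, rng):
    """Randomly pair configurations and score each pair with D_alpha."""
    idx = rng.permutation(len(samples))
    pairs = zip(idx[0::2], idx[1::2])  # disjoint random pairs
    return np.array(
        [discriminator(np.concatenate([samples[i].ravel(), samples[j].ravel()]))
         for i, j in pairs]
    )
```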
In Fig. 2, we study the systematic improvement of sample quality during the training process. Specifically, we plot the effective sample size (ESS), which is inversely proportional to the integrated autocorrelation time τ [14] of the chain (ESS = N/τ, where N is the number of samples) and approximately represents the effective number of independent samples in a given chain. We expect that additional training time and computational resources would continue to improve the model.

Figure 2: Effective sample size (ESS) as a function of training iteration. The increasing ESS demonstrates A-NICE MC's ability to systematically improve the statistical quality metrics of the Markov chain produced by the model.

Next, we study the behaviour of several observables: the magnetization M ≡ ⟨L^{−D} Σ_{x∈Λ} ϕ_x⟩; the two-point susceptibility χ₂ ≡ Σ_{x∈Λ} G(x), where G(x) ≡ L^{−D} Σ_{y∈Λ} [⟨ϕ_y ϕ_{x+y}⟩ − ⟨ϕ_y⟩⟨ϕ_{x+y}⟩]; and the Ising energy density E_I ≡ lim_{λ→∞} (1/D) Σ_{μ=1}^{D} G(ê_μ). As illustrated in Fig. 3, A-NICE outperforms HMC for most observables of interest, since the A-NICE autocorrelation curves decay faster; the exception is the average magnetization. Note that HMC can be tuned to improve performance, but usually at the cost of increased computational complexity. Since HMC constructs proposals by integrating differential equations, more integration steps would be required for each proposal. The proposal would then lie further from the current state, lowering autocorrelations but decreasing acceptance rates. A-NICE does not force such a trade-off, but offers no control over acceptance rates in return. For HMC, we take 10 leapfrog steps for each proposal, and the step size is tuned so that the acceptance rate matches that of A-NICE MC.
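For reference, the normalized autocorrelation χ(t)/χ(0) and the ESS quoted above can be estimated from a recorded series of observable measurements with a standard windowed estimator. This is a generic sketch, not the analysis code used for the figures:

```python
import numpy as np

def normalized_autocorrelation(obs, max_lag):
    """chi(t)/chi(0) for a 1D series of observable measurements."""
    obs = np.asarray(obs, dtype=float) - np.mean(obs)
    chi0 = np.mean(obs * obs)
    return np.array(
        [np.mean(obs[: len(obs) - t] * obs[t:]) / chi0 for t in range(max_lag)]
    )

def effective_sample_size(obs, max_lag=100):
    """ESS = N / tau, with tau = 1 + 2 * sum_{t>0} chi(t)/chi(0)."""
    chi = normalized_autocorrelation(obs, max_lag)
    tau = 1.0 + 2.0 * np.sum(chi[1:])  # simple windowed estimate of tau
    return len(obs) / tau
```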
[Figure 3 panels, left to right: normalized autocorrelation versus time lag t for the magnetization M (HMC τ = 7.37, A-NICE τ = 8.68), the two-point susceptibility χ₂ (HMC τ = 9.04, A-NICE τ = 6.49), and the Ising energy density E_I (HMC τ = 13.97, A-NICE τ = 7.36).]
Figure 3: Comparisons of common physical observables as a function of the time difference (lag) t between two MCMC samples for L = 10. Each y-axis shows the normalized statistical autocorrelation χ(t)/χ(0), where χ(t) = ∫₀^∞ dt′ [⟨O(t′) O(t + t′)⟩ − ⟨O⟩²] for a different observable O. Integrated autocorrelation times [14] for both methods and each observable are reported in the figure legends.

We have applied the A-NICE MC [25] algorithm to the lattice ϕ⁴ field theory in the weakly coupled regime, increasing the number of dimensions that have been successfully sampled using this method to date. The model can be systematically optimized and produces samples which are consistent with an ergodic exploration of the state space. The bulk of the computational cost of employing this technique is associated with the training phase. Deployment of the trained model is relatively inexpensive compared to the HMC method tuned to the same performance level. This sampling efficiency of the trained model effectively eliminates the need to store the samples, which cuts down on the storage space requirements and allows us to extract physical observables with low statistical uncertainties.

In future work, it will be interesting to extend our experiments with A-NICE MC to the strongly interacting regime of the ϕ⁴ field theory, as well as to apply this framework to more complicated theories with additional internal degrees of freedom or gauge symmetries. We conjecture that A-NICE MC can be extended to deal with these models, with possible modifications required to efficiently explore the state space in a systematic, domain-specific way.

Impact statement
Our Adversarial Nonlinear Independent Components Estimation Monte Carlo (A-NICE MC) approach to sampling focuses on training a neural network to be embedded into Markov chain Monte Carlo (MCMC) sampling algorithms. A-NICE MC moves most of the computational cost of MCMC sampling to network training, making sampling itself relatively inexpensive compared to established MCMC approaches.

We expect that further improvements to the adversarial network architecture are possible, such that sampling can become even more efficient in the future. Improved efficiency in MCMC sampling schemes has the potential to greatly reduce the overall carbon footprint of high-performance computing (HPC) in science. Lower autocorrelation times mean fewer sampling steps, which, in turn, mean more effective samples per watt of consumed power. Therefore, we expect that, given further optimizations, machine-learning-based sampling methods such as A-NICE MC have the potential to significantly reduce the scientific carbon footprint.

References
[1] J. Cardy, Scaling and Renormalization in Statistical Physics. Cambridge Lecture Notes in Physics. Cambridge University Press, 1996.
[2] S. Sachdev, Quantum Phase Transitions. Cambridge University Press, 2001.
[3] F. M. Gasparini, M. O. Kimball, K. P. Mooney, and M. Diaz-Avila, "Finite-size scaling of ⁴He at the superfluid transition," Rev. Mod. Phys. 80 (2008) 1009–1059. https://link.aps.org/doi/10.1103/RevModPhys.80.1009.
[4] B. I. Halperin and D. R. Nelson, "Resistive transition in superconducting films," Journal of Low Temperature Physics 36, no. 5 (1979) 599–616. https://doi.org/10.1007/BF00116988.
[5] R. Coldea, D. A. Tennant, E. M. Wheeler, E. Wawrzynska, D. Prabhakaran, M. Telling, K. Habicht, P. Smeibidl, and K. Kiefer, "Quantum Criticality in an Ising Chain: Experimental Evidence for Emergent E₈ Symmetry," Science 327, no. 5962 (2010) 177–180.
[6] A. D. King, J. Carrasquilla, J. Raymond, I. Ozfidan, E. Andriyash, A. Berkley, M. Reis, T. Lanting, R. Harris, F. Altomare, K. Boothby, P. I. Bunyk, C. Enderud, A. Fréchette, E. Hoskinson, N. Ladizinsky, T. Oh, G. Poulin-Lamarre, C. Rich, Y. Sato, A. Y. Smirnov, L. J. Swenson, M. H. Volkmann, J. Whittaker, J. Yao, E. Ladizinsky, M. W. Johnson, J. Hilton, and M. H. Amin, "Observation of topological phenomena in a programmable lattice of 1,800 qubits," Nature 560, no. 7719 (2018) 456.
[7] M. Greiner, O. Mandel, T. Esslinger, T. W. Hänsch, and I. Bloch, "Quantum phase transition from a superfluid to a Mott insulator in a gas of ultracold atoms," Nature 415, no. 6867 (2002) 39–44.
[8] R. P. Feynman, "Space-time approach to non-relativistic quantum mechanics," Rev. Mod. Phys. 20 (1948) 367–387. https://link.aps.org/doi/10.1103/RevModPhys.20.367.
[9] J. Haegeman, T. J. Osborne, H. Verschelde, and F. Verstraete, "Entanglement renormalization for quantum fields in real space," Phys. Rev. Lett. 110 (2013) 100402. https://link.aps.org/doi/10.1103/PhysRevLett.110.100402.
[10] M. Ganahl, J. Rincón, and G. Vidal, "Continuous matrix product states for quantum fields: An energy minimization algorithm," Phys. Rev. Lett. 118 (2017) 220402. https://link.aps.org/doi/10.1103/PhysRevLett.118.220402.
[11] F. Verstraete and J. I. Cirac, "Continuous matrix product states for quantum fields," Phys. Rev. Lett. 104 (2010) 190405. https://link.aps.org/doi/10.1103/PhysRevLett.104.190405.
[12] G. Magnifico, T. Felser, P. Silvi, and S. Montangero, "Lattice Quantum Electrodynamics in (3+1)-dimensions at finite density with Tensor Networks," arXiv:2011.10658. http://arxiv.org/abs/2011.10658.
[13] J. Preskill, "Simulating quantum field theory with a quantum computer," arXiv:1811.10085 (2018). http://arxiv.org/abs/1811.10085.
[14] M. E. J. Newman and G. T. Barkema, Monte Carlo Methods in Statistical Physics. Oxford University Press, 1999.
[15] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative Adversarial Networks." http://arxiv.org/abs/1406.2661.
[16] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein Generative Adversarial Networks," in Proceedings of the 34th International Conference on Machine Learning, 2017.
[17] L. Dinh, D. Krueger, and Y. Bengio, "NICE: Non-linear Independent Components Estimation." http://arxiv.org/abs/1410.8516.
[18] L. Dinh, J. Sohl-Dickstein, and S. Bengio, "Density estimation using Real NVP." http://arxiv.org/abs/1605.08803.
[19] D. P. Kingma and P. Dhariwal, "Glow: Generative flow with invertible 1×1 convolutions," in Advances in Neural Information Processing Systems, pp. 10215–10224, 2018. http://arxiv.org/abs/1807.03039.
[20] J. M. Urban and J. M. Pawlowski, "Reducing Autocorrelation Times in Lattice Simulations with Generative Adversarial Networks." http://arxiv.org/abs/1811.03533.
[21] M. S. Albergo, G. Kanwar, and P. E. Shanahan, "Flow-based generative models for Markov chain Monte Carlo in lattice field theory," Phys. Rev. D 100 (2019) 034515. https://link.aps.org/doi/10.1103/PhysRevD.100.034515.
[22] D. Boyda, G. Kanwar, S. Racanière, D. J. Rezende, M. S. Albergo, K. Cranmer, D. C. Hackett, and P. E. Shanahan, "Sampling using SU(N) gauge equivariant flows," arXiv:2008.05456. http://arxiv.org/abs/2008.05456.
[23] G. Kanwar, M. S. Albergo, D. Boyda, K. Cranmer, D. C. Hackett, S. Racanière, D. J. Rezende, and P. E. Shanahan, "Equivariant flow-based sampling for lattice gauge theory," Phys. Rev. Lett. 125, no. 12 (2020) 121601, arXiv:2003.06413. http://dx.doi.org/10.1103/PhysRevLett.125.121601.
[24] K. A. Nicoli, C. J. Anders, L. Funcke, T. Hartung, K. Jansen, P. Kessel, S. Nakajima, and P. Stornati, "On Estimation of Thermodynamic Observables in Lattice Field Theories with Deep Generative Models," 2020. http://arxiv.org/abs/2007.07115.
[25] J. Song, S. Zhao, and S. Ermon, "A-NICE-MC: Adversarial Training for MCMC." http://arxiv.org/abs/1706.07561.
[26] S. Weinberg, The Quantum Theory of Fields, vol. 1. Cambridge University Press, 1995.
[27] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, "Equation of state calculations by fast computing machines," The Journal of Chemical Physics 21, no. 6 (1953) 1087–1092.
[28] W. K. Hastings, "Monte Carlo sampling methods using Markov chains and their applications," Biometrika 57, no. 1 (1970) 97–109.
[29] R. M. Neal, "MCMC using Hamiltonian dynamics." http://arxiv.org/abs/1206.1901.
[30] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel, "InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets," in Advances in Neural Information Processing Systems, pp. 2180–2188, 2016. http://arxiv.org/abs/1606.03657.
[31] M. Kardar, Statistical Physics of Fields. Cambridge University Press, 1st ed., 2007.
[32] A. K. De, A. Harindranath, J. Maiti, and T. Sinha, "Investigations in 1+1 dimensional lattice ϕ⁴ theory," Phys. Rev. D 72, no. 9 (2005) 094503. http://arxiv.org/abs/hep-lat/0506002.