Normalizing flows for microscopic many-body calculations: an application to the nuclear equation of state
NNormalizing flows for microscopic calculations of the nuclear equation of state
Jack Brady, ∗ Pengsheng Wen,
2, 3, † and Jeremy W. Holt
2, 3, ‡ Texas A&M University, College Station, TX 77843, USA Cyclotron Institute, Texas A&M University, College Station, TX 77843, USA Department of Physics and Astronomy, Texas A&M University, College Station, TX 77843, USA (Dated: March 2, 2021)Normalizing flows are a class of machine learning models used to construct a complex distributionthrough a bijective mapping of a simple base distribution. We demonstrate that normalizing flowsare particularly well suited as a Monte Carlo integration framework for nuclear many-body calcu-lations that require the repeated evaluation of high-dimensional integrals across smoothly varyingintegrands and integration regions. As an example, we consider the finite-temperature nuclear equa-tion of state. An important advantage of normalizing flows is the ability to build highly expressivemodels of the target integrand, which we demonstrate enables precise evaluations of the nuclearfree energy and its derivatives. Furthermore, we show that a normalizing flow model trained onone target integrand transfers remarkably well to related integrals, such that the nuclear equationof state at varying density and temperature can be mapped with computational efficiency and pre-cision. This work will support future efforts to build microscopic equations of state for numericalsimulations of supernovae and neutron star mergers that employ state-of-the-art nuclear forces andmany-body methods.
Introduction:
The hot and dense matter equation ofstate (EOS) is of fundamental importance for interpret-ing observations of neutron stars, core-collapse super-novae, and neutron star mergers in terms of the underly-ing nuclear microphysics [1, 2]. Due to the complexity ofcomputing the free energy and its derivatives (to obtainthe pressure, entropy, chemical potentials, etc.) acrossthe wide range of ambient conditions encountered duringsimulations of supernovae and neutron star mergers, mostequations of state in wide use by the simulation commu-nity are based on simplified mean field models of thenuclear force [3–6]. Since mean field theory is groundedin effective interactions fitted to the bulk properties ofmedium-mass and heavy nuclei, one loses connection tofundamental nuclear two- and many-body forces and theability to estimate systematic uncertainties [7]. In ad-dition, certain thermodynamic properties that are im-portant for understanding the evolution of core-collapsesupernovae, such as the temperature-dependent nucleoneffective mass [8, 9], are quite different in microscopic andmean field models [10]. For these reasons, there is strongmotivation to develop more microscopic descriptions ofthe nuclear equation of state based on realistic nuclearforces in beyond-mean-field-theory quantum many-bodycalculations.Microscopic calculations of the free energy F ( n, T, Y p )as a function of density n , temperature T , and composi-tion (e.g., the proton fraction Y p ) have in recent yearsbeen computed from realistic nuclear two- and three-body chiral effective field theory (EFT) nuclear forces[11–13]. However, the inclusion of more sophisticatedthree-body forces [14, 15] or higher-order many-body per- ∗ [email protected] † [email protected] ‡ [email protected] turbation theory corrections [16, 17] requires the evalua-tion of technically challenging multi-dimensional integra-tions and therefore has not yet been achieved in finite-temperature calculations. Moreover, the tabulation of anastrophysical equation of state (the free energy and itsfirst and second derivatives) involves the repeated evalu-ation of these integrals across more than 1,000,000 phasespace points in order to ensure numerical stability of su-pernova and neutron star merger simulations [18]. Mi-croscopic EOS tabulations are therefore computationallydemanding and only recently have been carried out [19]using the Argonne v NN potential and the Urbana IXthree-body force, supplemented by a liquid drop modelfor describing the low-density inhomogeneous phase ofnuclear matter.A potential solution to the numerical challenges out-lined above is adaptive Monte Carlo methods based onimportance sampling [20], which have recently been em-ployed [17] to calculate high-order perturbation theorycorrections to the cold dense matter equation of statethat were previously intractable. In importance samplingan estimate for the integral I = (cid:90) D ψ ( (cid:126)x ) d(cid:126)x (1)is obtained as a Monte Carlo estimate under a proposaldistribution p ( x ) as I (cid:39) (cid:104) I (cid:105) N = 1 N N (cid:88) i =1 ψ ( (cid:126)x i ) p ( (cid:126)x i ) . (2)The precision of this estimator, however, is dependenton how well p ( x ) is able to match the normalized target | ψ ( (cid:126)x ) | /I , where if p ( x ) matches | ψ ( (cid:126)x ) | /I exactly, we ob-tain an ideal estimator [21]. Consequently, when a preciseestimate for the integral is required, as is the case whencomputing numerical derivatives in EOS tabulations, the a r X i v : . [ nu c l - t h ] M a r proposal distribution must have sufficient expressive ca-pacity to match the target. Popular adaptive importancesampling methods [22, 23], however, often make restric-tive assumptions on the integrand such as factorizability,thus limiting the precision of the estimator.In the present work, we propose normalizing flows [24–26] as a means for constructing efficient importance sam-pling estimators [27–34] for microscopic EOS tabulations.Normalizing flows have recently emerged as a highly ex-pressive method for modeling complex target distribu-tions using deep neural networks. We demonstrate thatthis expressivity allows for precise first- and second-ordernumerical derivatives of the free energy over a relativelycoarse density and temperature grid. Furthermore, weshow that a normalizing flow model trained on one targetintegrand transfers remarkably well to related integrals,thus providing a compelling framework moving forwardfor including high-order many-body perturbation theorycorrections for tabulated astrophysical equations of state.As a concrete test case, we consider the second-orderperturbation theory contribution to the grand canonicalpotential Ω of isospin-symmetric nuclear matter from anantisymmetrized two-body force ¯ V NN :Ω (2) = − (cid:88) ¯ V NN ¯ V NN f f ¯ f ¯ f − ¯ f ¯ f f f (cid:15) + (cid:15) − (cid:15) − (cid:15) , (3)where f i = 1 / (1 + e ( (cid:15) i − µ ) /T ) is the Fermi-Dirac distri-bution function for particles with chemical potential µ ,¯ f i = 1 − f i , (cid:15) i = k i / (2 M ) is the free-particle spectrum,and the sums are taken over spin, isospin, and momen-tum. The grand canonical potential is related to the freeenergy through F = Ω + µN p , where N p is the numberof particles. On the one hand, the contribution in Eq.(3) is sufficiently complex to demonstrate the efficiencyof normalizing flow based importance sampling, and onthe other hand it is amenable to nearly exact evaluationusing Gaussian quadrature for benchmarking our results.Since our focus is on the momentum-space integrationsinherent in Eq. (3), here we consider a particularly simplemodel of the nuclear force V ( q ) = g m φ + q (4)associated with scalar-isoscalar boson exchange, where q is the magnitude of the momentum transfer and wetake m φ = 600 MeV and g = 1. Performing the spin andisospin sums in Eq. (3) and choosing (cid:126)k = k ˆ i z , we obtainΩ (2) ( n, T )= − M g π (cid:90) ∞ dk (cid:90) ∞ dk (cid:90) ∞ dk (cid:90) π dθ (cid:90) π dθ (cid:90) π dφ (cid:90) π dφ k k k sin θ sin θ ( f f ¯ f ¯ f − ¯ f ¯ f f f ) k + k − k − k (cid:34) m φ + q ) − m φ + q )( m φ + q ) (cid:35) e − p/ Λ) − p (cid:48) / Λ) , (5) where (cid:126)k = (cid:126)k + (cid:126)k − (cid:126)k , (cid:126)q = (cid:126)k − (cid:126)k , (cid:126)q = (cid:126)k − (cid:126)k , (cid:126)p = ( (cid:126)k − (cid:126)k ), and (cid:126)p (cid:48) = ( (cid:126)k − (cid:126)k ). We have includedthe multiplicative function g ( (cid:126)p, (cid:126)p (cid:48) ) = e − ( (cid:126)p/ Λ) − ( (cid:126)p (cid:48) / Λ) inthe definition of the potential in Eq. (4) as is common inthe literature to regulate the unresolved high-momentumcomponents of chiral nuclear forces [35]. We choose Λ =450 MeV as the high-momentum cutoff scale. Methods:
A normalizing flow [24–26] defines a com-plex distribution p ( (cid:126)x ) by applying a learnable, bijectivemapping (cid:126)h to a simple base distribution π ( (cid:126)u ). The prob-ability density of a sample (cid:126)x := (cid:126)h ( (cid:126)u ) under a flow canthen be obtained analytically using the change of vari-ables formula p ( (cid:126)x ) = π ( (cid:126)h − ( (cid:126)x )) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) det (cid:32) ∂(cid:126)h − ∂(cid:126)x (cid:33)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) . (6)In the present case, (cid:126)h should transform π ( (cid:126)u ) such that theresulting distribution p ( (cid:126)x ) matches our target distribu-tion | ψ ( (cid:126)x ) | /I as closely as possible. This can be achievedby optimizing the parameters of (cid:126)h using gradient-basedmethods to minimize a suitable divergence metric be-tween our model and target distributions. To optimizethis objective in practice, however, certain conditionsmust be satisfied when choosing a parameterization for (cid:126)h . Firstly, because the Jacobian determinant appears inthe change-of-variable formula, (cid:126)h should be chosen suchthat this term is tractable to compute. At the sametime, however, (cid:126)h must also be expressive enough to modelcomplex target distributions. To satisfy these require-ments, we implement (cid:126)h using a sequence of couplingtransforms [36, 37]. For a given coupling transform (cid:126)φ , an n -dimensional vector (cid:126)x is first partitioned into two parts: (cid:126)x = ( x , . . . , x d , , . . .
0) + (0 , . . . , , x d +1 , . . . , x n ), whichwe refer to as the base input and updated input vectors,respectively. The d coordinates of the base input vectorare then passed through a neural network which outputsa set of parameters (cid:126)θ that define an invertible transfor-mation g θ i ( x i ) dimension-wise on the n − d updated inputcoordinates x i . The coordinates of the base input vector x , . . . , x d are then passed unchanged through the cou-pling transform: x i → x i , for i ≤ d . This results in alower-triangular Jacobian matrix where the determinantcan easily be computed as the product of the diagonals:det (cid:32) ∂ (cid:126)φ∂(cid:126)x (cid:33) = n (cid:89) i = d ∂g θ i ∂x i . (7)Furthermore, (cid:126)h can be made highly expressive by com-posing a sequence of k coupling transformations (cid:126)h := (cid:126)φ k ◦ . . . ◦ (cid:126)φ with different choices of base input and updatedinput coordinates such that all variables are allowed tointeract.The second condition (cid:126)h must satisfy is that its outputrange needs to respect the specified boundary conditions − − − − (cid:12)(cid:12) σ t / Ω (2) (cid:12)(cid:12) nflowVEGAS0 10000 20000 3000010 − − − n = 0 . − T = 25MeV (cid:12)(cid:12) σ b / Ω (2) (cid:12)(cid:12) Iteration
FIG. 1. (top) Total relative uncertainty | σ t / Ω (2) | as a func-tion of iteration at the beginning of training for the evalua-tion of Ω (2) ( n , T = 25 MeV) in Eq. (5) using the normalizingflow (red) and VEGAS (green) Monte Carlo integration algo-rithms. (bottom) Same as top panel but for the batch relativeuncertainty | σ b / Ω (2) | as a function of iteration over the fulltraining time. The number of samples per iteration is 5000.The gray dashed line is added to help guide the eye. for a given integral. An obvious solution is to place afunction with bounded range on the output of (cid:126)h such asa sigmoid as suggested in [38, 39]. We found, however,that using sigmoid functions often led to imprecise in-tegral estimates while introducing training instabilitieswhen computing their determinant. Instead, we imple-ment each coupling transform using rational-quadraticsplines [40] such that (cid:126)h respects boundary conditions byconstruction.Rational-quadratic spline flows [30, 41] define g θ i ( x i )piecewise on an interval [ A i , B i ] by partitioning the in-terval into K bins and defining the transformation ineach bin as a monotonic rational-quadratic function.The rational-quadratic functions are parameterized bya set of K + 1 knots { ( x ( k ) , y ( k ) ) } Kk =0 which define theboundaries for the domain and range of each transfor-mation, and a set of K + 1 derivatives { δ ( k ) } Kk =0 de-fined at each knot. The knots monotonically increasewithin the interval [ A i , B i ] where ( x (0) i , y (0) i ) = ( A i , A i )and ( x ( K ) i , y ( K ) i ) = ( B i , B i ) such that g θ i ( x i ) is a map-ping from [ A i , B i ] to [ A i , B i ]. Thus, by setting [ A i , B i ] tobe the boundaries for each dimension of a given integral,we can restrict (cid:126)h to only be defined on the integrationregion.We implement our flow using a composition of 6rational-quadratic spline coupling transforms which isthe minimum number required to account for correlationsamong all the variables in our seven dimensional integralin Eq. (5) [31]. For each transform, we use K = 16bins and implement each neural network using a resid-ual network [42] with two residual blocks and 32 hiddenfeatures. Our base distribution is chosen to be uniformover the integration region for each dimension, respec-tively. To train our flow, we minimize the Pearson χ T [MeV] − − − Initialmodel n = 0 . − (cid:12)(cid:12) σ t / Ω (2) (cid:12)(cid:12) nflow, m φ VEGAS, m φ . m φ .
00 0 .
05 0 .
10 0 .
15 0 . n (cid:2) fm − (cid:3) T = 25MeV FIG. 2. Total relative uncertainty for the evaluation ofΩ (2) ( n, T ) in Eq. (5) using the VEGAS (green solid line) andnormalizing flow (red solid line) Monte Carlo integration al-gorithms. Both models were initially trained at n = n and T = 25 MeV (denoted by the star) for 36,000 iterations andthen transferred to nearby phase space points using only 100additional training iterations (500,000 samples). Also shownare the total relative uncertainties (dashed lines) for bothmodels trained for 100 iterations when the meson mass inEq. (5) is reduced by a factor of 2. divergence between our model distribution p ( (cid:126)x ) and tar-get distribution | ψ ( (cid:126)x ) | /I . This divergence is estimatedas an expectation under our model through importancesampling as D χ (cid:39) (cid:104) D χ (cid:105) N = 1 N N (cid:88) i =1 ( | ψ ( (cid:126)x i ) | I − p ( (cid:126)x i )) p ( (cid:126)x i ) /p ( (cid:126)x i ) , (8)where the normalizing constant I is additionally esti-mated through sampling. At each training iteration, weminimize this expectation with respect to the parame-ters of our flow using gradient descent on batches of N samples, where sampling from our flow amounts to firstsampling from the base distribution π ( (cid:126)u ) and then pass-ing these samples through (cid:126)h to obtain (cid:126)x . The gradientdescent optimization algorithm we employ is Adam [43].All models were implemented using PyTorch [44], andthe open-source implementation for the spline transfor-mations in [41] was used for our coupling layers. Results:
We start by training the flow on the targetintegral in Eq. (5) using batches of N = 5000 samplesper training iteration. We initially fix the density at n = n , where n = 0 .
16 fm − is the saturation density ofnuclear matter, and the temperature at T = 25 MeV.The learning rate for the Adam optimizer was set to 10 − until 200 iterations passed without an improvement inthe χ loss function, at which point a cosine scheduler wasinitiated with maximum learning rate 10 − , minimumlearning rate 10 − , and period of 200 iterations. As abaseline comparison, we compute the integral in Eq. (5)using the adaptive Monte Carlo integrator VEGAS [22,23] using the same number of samples.In the top panel of Fig. 1 we compare the VEGAS(green) and normalizing flow (red) total relative uncer- − . − . ∂ Ω ( ) / ∂ n (cid:2) Ω (2) (cid:3) = MeV × − ExactVEGAS nflow .
05 0 .
10 0 .
15 0 . n [fm − ] − . . . ∂ Ω ( ) / ∂ n T = 25MeV × − . . ∂ Ω ( ) / ∂ T n = 0 . − × − T [MeV] . . ∂ Ω ( ) / ∂ T × − FIG. 3. 1 st and 2 nd -order derivatives of Ω (2) with respect tothe density n and temperature T from VEGAS (green line),normalizing flows (red dots), and exact Gaussian quadrature(blue line). The numerical derivatives are calculated by thecentral finite difference method of order 2 generated from thedata shown in Fig. 2. tainty σ t Ω (2) = 1 / (cid:113)(cid:80) i σ i (cid:80) i Ω (2) i σ i / (cid:80) i σ i (9)over the first 3000 iterations, where in Eq. (9) i is thetraining iteration, σ i is the batch standard error, andΩ (2) i is the batch mean. We observe that VEGAS outper-forms the normalizing flow early in the training, but thegreater expressive capacity of the normalizing flow leadsto a smaller total relative uncertainty past 500 iterations.Moreover, in stark contrast to VEGAS, the normalizingflow continues to learn over many training iterations, asshown in the bottom panel of Fig. 1, where we plot thebatch relative uncertainty | σ b / Ω (2) | , which reflects theperformance of each model at a particular training iter-ation. We observe that the VEGAS batch uncertaintysaturates after less than one hundred iterations, whilethe normalizing flow batch uncertainty continues to de-crease throughout training. We stopped training the nor-malizing flow when the total relative uncertainty reached10 − (after 36,000 iterations) with associated batch rel-ative uncertainty of ∼ . × − , which is an order ofmagnitude improvement over VEGAS.When precise integral estimates are required (as is thecase for computing numerical derivatives of the free en-ergy), it is clear that normalizing flows are able to outper-form VEGAS, with the caveat that a high sample com-plexity is required to reach this low precision. We nowdemonstrate that this initial high sample complexity is aone-time cost, and that normalizing flow models transfer exceptionally well when either the density or tempera-ture, or even the nuclear potential, in Eq. (5) is varied.This is particularly promising for mapping out the freeenergy over the 1,000,000 phase space points required foran astrophysical equation of state table and quantifyinguncertainties associated with the choice of nuclear inter-action. In the left and right panels of Fig. 2 we showthe temperature and density dependence, respectively, ofthe total relative uncertainty | σ t / Ω (2) | in the evaluationof Eq. (5). Both VEGAS (green) and the normalizingflow (red) were first trained at the phase space point in-dicated by the star and then transferred sequentially todifferent densities and temperatures using just 100 addi-tional training steps (500,000 samples). At the startingpoint ( n = n , T = 25 MeV), the normalizing flow beginswith more than an order of magnitude better uncertaintyestimate compared to VEGAS. We can see that this im-provement in the precision (relative to VEGAS) persistsas both the density and temperature are varied. We notethat the increase in total relative uncertainty as temper-ature decreases is not unexpected, since the sharpeningof the Fermi distribution functions becomes difficult tomodel within any adaptive Monte Carlo method, as ev-idenced by the similar behavior demonstrated by VE-GAS. We also show in Fig. 2 the ability for each modelto adapt to nontrivial changes in the choice of nuclearpotential. In particular, the dashed lines in Fig. 2 denotethe total relative uncertainty in the evaluation of Eq. (5)after reducing the meson mass from m φ = 600 MeV to m φ = 300 MeV and training both models for 100 itera-tions. We observe that the normalizing flow is able toefficiently transfer when such changes in the nuclear po-tential are introduced.We now estimate how the uncertainties in the MonteCarlo estimates for the grand canonical potential fromthe VEGAS and normalizing flow models propagate tothe calculation of numerical derivatives. For this studywe take advantage of the fact that the second-order con-tribution to the grand canonical potential can be simpli-fied by first performing a partial-wave decomposition andthen evaluating a simpler five-dimensional integral, whichcan be computed precisely with Gaussian quadrature.This allows us to benchmark the numerical derivativesfrom VEGAS and the normalizing flow against nearly ex-act results. In the top and lower panels of Fig. 3 we showthe 1 st and 2 nd -order derivatives of Ω (2) with respect tothe density and temperature, respectively. The centralfinite difference method of order 2 is applied to calcu-late the numerical derivatives from the VEGAS (green)and normalizing flow (red) datasets generated in Fig. 2as well as the exact results obtained through Gaussianquadrature (blue). We see that the improved numericalprecision from the normalizing flow leads to significantlybetter estimates of free energy derivatives. In particu-lar, we can see that in the low-temperature region thederivatives from VEGAS fluctuate strongly about thetrue value, while the results from the normalizing floware stable and match the exact values well even for the2 nd -order derivatives. Outlook:
In the present work, we have performedproof-of-principle calculations demonstrating the poten-tial of normalizing flow based importance sampling in thecontext of nuclear many-body perturbation theory. Inparticular, we have shown that normalizing flows are ableto learn models of the target integrand which allow forprecise integral estimates and which can be transferred torelated integrals with minimal additional computationalcost. Ultimately, this leads to speedup factors on theorder of 100 compared to VEGAS when precise integralevaluations must be repeated across a multi-dimensionalphase space, such as in calculations of the free energyand its numerical derivatives in astrophysical equationof state tables. Numerous extensions and further appli-cations are envisioned. One important application is thenucleon single-particle energy, a quantity that varies overthe four-dimensional parameter space { n, T, Y p , q } , where q is the nucleon momentum. First and second-order derivatives of this quantity are needed when computinge.g., the nucleon effective mass. Another application is tonuclear matter response functions, which vary over thefive-dimensional parameter space { n, T, Y p , q, ω } , where q and ω represent the momentum and energy transfer tothe medium. In all of these cases, normalizing flows mayallow for the inclusion of perturbation theory contribu-tions that at present are too computationally demandingto map over the full phase space needed in astrophysicalapplications. Acknowledgments:
J.B. would like to thank PrafullaChoubey for helpful collaboration in the early stages ofthis project. Work supported by the National ScienceFoundation under Grant No. PHY1652199 and by theU.S. Department of Energy National Nuclear SecurityAdministration under Grant No. DE-NA0003841. Por-tions of this research were conducted with the advancedcomputing resources provided by Texas A&M High Per-formance Research Computing. [1] J. M. Lattimer and M. Prakash, Phys. Rept. ,121 (2000).[2] J. M. Lattimer and M. Prakash, Phys. Rept. , 127(2016).[3] J. Lattimer and D. Swesty, Nucl. Phys.
A535 , 331(1991).[4] H. Shen, H. Toki, K. Oyamatsu, and K. Sumiyoshi, Nucl.Phys.
A637 , 435 (1998).[5] G. Shen, C. J. Horowitz, and S. Teige, Phys. Rev. C ,035802 (2011).[6] A. W. Steiner, M. Hempel, and T. Fischer, Astrophys.J. , 17 (2013).[7] C. Drischler, J. A. Melendez, R. J. Furnstahl, and D. R.Phillips, Phys. Rev. C , 054315 (2020).[8] H. Yasin, S. Sch¨afer, A. Arcones, and A. Schwenk, Phys.Rev. Lett. , 092701 (2020).[9] A. S. Schneider, L. F. Roberts, C. D. Ott, andE. O’Connor, Phys. Rev. C , 055802 (2019).[10] P. Donati, P. Pizzochero, P. Bortignon, and R. Broglia,Phys. Rev. Lett. , 2835 (1994).[11] C. Wellenhofer, J. W. Holt, and N. Kaiser, Phys. Rev.C , 015801 (2015).[12] A. Carbone, A. Polls, and A. Rios, Phys. Rev. C ,025804 (2018).[13] C. Drischler, J. W. Holt, and C. Wellenhofer,arXiv:2101.01709 (2021).[14] V. Bernard, E. Epelbaum, H. Krebs, and U.-G. Meissner,Phys. Rev. C , 064004 (2008).[15] V. Bernard, E. Epelbaum, H. Krebs, and U.-G. Meissner,Phys. Rev. C , 054001 (2011).[16] J. W. Holt and N. Kaiser, Phys. Rev. C , 034326(2017).[17] C. Drischler, K. Hebeler, and A. Schwenk, Phys. Rev.Lett. , 042501 (2019).[18] S. Typel, M. Oertel, and T. Klaehn, arXiv:1307.5715(2013).[19] H. Togashi, K. Nakazato, Y. Takehara, S. Yamamuro,H. Suzuki, and M. Takano, Nucl. Phys. A961 , 78 (2017).[20] T. Hahn, Comp. Phys. Comm. , 78 (2005). [21] A. B. Owen,