A new algorithm for electrostatic interactions in Monte Carlo simulations of charged particles
Monte Carlo simulations of interacting particles with fast and accurate electrostatics
William Robert Saunders^a, James Grant^b, and Eike Hermann Müller^c,*

University of Bath, Bath BA2 7AY, United Kingdom
^a Department of Physics  ^b Computing Services  ^c Department of Mathematical Sciences
* Email: [email protected]
July 1, 2020
Abstract
To minimise systematic errors in Monte Carlo simulations of charged particles, long range electrostatic interactions have to be calculated accurately and efficiently. Standard approaches, such as Ewald summation or the naive application of the classical Fast Multipole Method, result in a cost per Metropolis-Hastings step which grows in proportion to some positive power of the number of particles N in the system. This prohibitively large cost prevents accurate simulations of systems with a sizeable number of particles. Currently, large systems are often simulated by truncating the Coulomb potential, which introduces uncontrollable systematic errors. In this paper we present a new multilevel method which reduces the computational complexity to O(log(N)) per Metropolis-Hastings step, while maintaining errors which are comparable to direct Ewald summation. We show that, compared to related previous work, our approach reduces the overall cost by better balancing the time spent in the proposal and acceptance stages of each Metropolis-Hastings step. By simulating large systems with up to N = 10 particles we demonstrate that our implementation is competitive with state-of-the-art MC packages and allows the simulation of very large systems of charged particles with accurate electrostatics.

Keywords: Monte Carlo, electrostatics, particle simulations, computational complexity, Fast Multipole Method

1 Introduction

The accurate representation of all pairwise interactions in classical atomistic simulations is important to minimise systematic errors. In this paper we focus on Monte Carlo (MC) simulations of charged particles. Short-range interactions such as the Lennard-Jones potential can be safely truncated at a finite cutoff distance: when calculating energy differences in a proposed MC move, only interactions with a fixed number of other particles in close proximity of the moving particle need to be taken into account.
For fixed density the cost of one local Metropolis-Hastings (MH) step is constant, independent of the total number N of particles in the system. However, due to the long-range nature of the Coulomb potential, which decays in proportion to the inverse distance between two charges, including electrostatics is far from trivial, since interactions with all other particles in the system have to be considered. Worse, interactions with periodic copies or mirror charges have to be taken into account if non-trivial boundary conditions are used.

As the review in [1] shows, a plethora of methods have been developed to address this issue, but care has to be taken to obtain reliable results. In this context the authors of [2], for example, find that truncating the Coulomb potential in the MC simulation of water in a highly anisotropic geometry leads to significant systematic errors. Other methods which have been proposed to avoid this problem include solving the Poisson equation with a grid-based multigrid method [3], Ewald summation [4] or the naive application of the Fast Multipole Method (FMM) [5, 6, 7]. Unfortunately all those approaches result in a prohibitively large computational cost as the number of particles N in the system grows; typically the cost per MH step increases as O(N) or O(√N). This renders the simulation of very large systems impossible and limits the predictive power of computer experiments.

In this paper we show how this issue can be overcome by constructing a method which reduces the cost per MH step to O(log(N)) without sacrificing accuracy, thereby making much larger simulations with realistic electrostatics feasible. Our multilevel approach is inspired by the Fast Multipole Method and similar to the method recently proposed in [8].
However, compared to [8] it leads to an overall reduction of computational cost by balancing the time spent in proposing and accepting MC moves in realistic MC simulations.

The key observation motivating our new method is the following: standard FMM constructs local expansions for evaluating the long range interactions on the finest level of the hierarchical tree. While the evaluation of those expansions (and direct interactions with all close-by neighbours) in the proposal stage of a MH step is O(1), re-calculating the local expansions incurs a cost of O(N), since all local expansions are re-calculated in the upward and downward pass of the algorithm, resulting in an overall O(N) cost per MH step. By storing the local expansions on each level of the multilevel hierarchy instead of accumulating them on the finest level, the relative cost of the proposal and accept stages can be balanced, since each of those two steps requires a fixed number of operations on each level of the multilevel hierarchy. As the number of levels L is proportional to log(N), this results in a total computational complexity per MH step which grows logarithmically in the number of particles.

In summary, the new contributions of our work are as follows:

1. We describe a new hierarchical algorithm for Monte Carlo simulations with accurate electrostatics, which has an O(log(N)) computational complexity per Metropolis-Hastings step.

2. By comparing to the similar method in [8] we show that our algorithm leads to an overall reduction in cost for realistic Monte Carlo simulations.

3. We describe the efficient implementation of our algorithm in the performance-portable PPMD framework [9] recently developed in our group. Since it has a user-friendly high-level Python interface, PPMD allows the easy implementation of Monte Carlo algorithms with accurate electrostatics.

4. We demonstrate that the runtime of our implementation is competitive with the state-of-the-art MC code DL_MONTE [10, 11], which struggles to simulate systems of the size that are easily accessible with our implementation.

For systems with N = 10 particles our implementation is an order of magnitude faster than the DL_MONTE code. At this system size, we observe that the alternative approach in [8] results in a 30% longer simulation time than our method. By fitting a semi-empirical model to predict the cost of a simulation as a function of the problem size, we show that asymptotically (i.e. for N → ∞) we expect our algorithm to be twice as fast as the one in [8].

Structure.
This paper is organised as follows: After discussing related work in Section 2, we review the classical FMM algorithm and describe our new method in Section 3, where we also compare it to the approach in [8]. Following a description of the Python interface for our implementation of the algorithms introduced in this paper in Section 4, numerical results are presented in Section 5. We conclude and outline directions for further work in Section 6.
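The balance between the proposal and acceptance stages argued above can be made concrete with a toy cost model. The per-level cell counts (2 cells per level in one stage, 2·189 in the other, cf. Section 3) come from the algorithm description later in the paper; the unit costs, the flat cost of the direct short-range part and the acceptance rate are illustrative assumptions of ours, not the fitted semi-empirical model used in the results section:

```python
def cost_per_mh_step(per_level_propose, per_level_accept, levels, acceptance, direct=50.0):
    """Expected work per Metropolis-Hastings step, in arbitrary units of one
    expansion evaluation/update: every step pays the proposal cost (including a
    level-independent direct short-range part `direct`, an assumed constant);
    an accepted step additionally pays the accept-stage cost."""
    return per_level_propose * levels + direct + acceptance * per_level_accept * levels

# Local-expansion method: 2 cells per level on propose, 2*189 on accept.
# Multipole-based method [8]: 2*189 cells per level on propose, 2 on accept.
L, acc = 7, 0.25
cost_local = cost_per_mh_step(2, 2 * 189, L, acc)          # 725.5
cost_multipole = cost_per_mh_step(2 * 189, 2, L, acc)      # 2699.5
```

In this crude model the local-expansion variant wins whenever the acceptance rate is below one, because the expensive interaction-list loop is only paid for accepted moves; the actual advantage reported in the paper is smaller once all contributions are accounted for.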
2 Related work

Methods for including untruncated electrostatic interactions in MC simulations with a computational complexity of O(log(N)) per MH step have been developed previously in [8, 12]. Compared to our approach, the method in [8] does not construct local expansions, thereby avoiding their recalculation whenever a particle is moved. While this might look like a reasonable simplification, it actually makes evaluating the change in potential for each proposed (but potentially rejected) MC transition more expensive. Since there is typically more than one proposed move per accepted transition, this renders the method in [8] more expensive overall, as our numerical experiments confirm.

A modification of the Barnes-Hut octree algorithm is discussed in [12]. Similar to FMM, the classical Barnes-Hut method constructs a hierarchical mesh structure, and represents the distribution of particles in cells on each level by their multidimensional Taylor expansion coefficients. While the calculation of the total electrostatic energy with the octree algorithm is O(N log(N)), the authors of [12] present a modification of the method which has a cost of O(log(N)) per local MC step.

The O(log(N)) algorithms presented here and in [8, 12] improve on what can be achieved with Ewald summation [4, 13], for which the change in electrostatic energy per MC proposal can be calculated at a computational complexity of O(√N). This is because the overall $O(N^{3/2})$ cost of the Ewald-based energy calculation is made up of an iteration over all N particles and a sum over O(√N) reciprocal vectors (long-range contribution) and neighbouring particles (short-range contribution). If only O(1) particles move in each proposed move, only a small number of the O(√N) sums have to be evaluated.
A similar approach is currently explored in the DL_MONTE code [10, 11], though the implementation at present is limited by the fact that the short-range cutoff of the Ewald summation has to be identical to the cutoff of all other local interactions. In this paper we present numerical results which show that our new method can be used to simulate systems with up to 10 charges and accurate electrostatic interactions at a cost of around 1 ms per MH step.

For completeness, it should be pointed out that other methods with a computational complexity of O(N) per MH step have also been developed. For example, in [3] a multigrid method is used to solve the Poisson equation to obtain the electrostatic potential generated by the particles. One disadvantage of the method is that the mapping of the highly peaked charge distribution to a mesh introduces additional errors. The authors of [14] argue that using a constrained diffusing electric field instead of solving a partial differential equation for the potential is computationally more efficient than other O(N) approaches; they take care to reduce lattice artefacts by using a non-trivial interpolation scheme.

Figure 1: Schematic illustration of the classical Fast Multipole Method in two dimensions first described in [5, 6]. The number of levels is L = 4 in this example. The upward pass for constructing the multipole expansions (M) is shown in the top row, while the local expansions (L) are built in the downward pass at the bottom of the figure. The asterisk (*) on the right hand side denotes special operations on the coarsest level for incorporating (potentially nontrivial) boundary conditions.

3 Methods

We now discuss the new approach introduced in this work. After briefly reviewing key concepts of the classical FMM algorithm we describe our method in detail in Section 3.2 and compare it to the related technique in [8] in Section 3.3.

3.1 The classical FMM algorithm
In three dimensions the FMM algorithm [5, 6, 7] uses a hierarchical grid with L levels for the computational domain Ω (which is assumed to be a cube of width a) such that the number of cells on each level $\ell$ is $M_\ell = 8^{\ell-1}$ for $\ell = 1, \dots, L$. The number of cells on the finest level is $M = M_L$, and (to balance the work between the long range and direct field calculation) typically L is chosen such that there are O(1) particles in each fine-level cell. Each cell on level $\ell = 1, \dots, L-1$ has eight child cells, and each cell on level $\ell = 2, \dots, L$ has a unique parent cell. The Fast Multipole Algorithm now computes the electrostatic potential by splitting it into two contributions. First, the long range part is calculated by working out the multipole expansion of all charges in a fine-level cell and transforming them into multipole expansions on the coarser levels in the upward pass of the algorithm. In the downward pass the multipole expansions on each level are transformed into local expansions around the centre of a cell. Those are then recursively combined into local expansions in the child cells. By only considering the contribution from multipole expansions in a fixed number of well-separated cells on each level, the potential of distant charges is resolved at the appropriate level of accuracy, while including the contribution from closer charges on finer levels. The method for calculating the long range contribution is shown schematically in Fig. 1; we refer the reader to [5, 6, 7] for further details.

For the following discussion of our FMM variant for MC simulations the notion of an interaction list (IL) of a particular cell is crucial. For a cell $\alpha$ on level $\ell$ this interaction list IL($\alpha$) is the set of cells which are the children of the parent cell of $\alpha$ and its nearest neighbours, but which are well separated from $\alpha$, i.e. not direct neighbours of $\alpha$ on level $\ell$.
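The cell bookkeeping behind this interaction list can be sketched in a few lines of Python. Cells are integer index triples on one level of the octree, domain boundaries are ignored for simplicity, and the helper names are ours (not PPMD's):

```python
from itertools import product

def parent(cell):
    # parent cell one level up in the octree
    return tuple(c // 2 for c in cell)

def neighbours(cell):
    # Nb(cell): the 26 nearest-neighbour cells (the cell itself excluded)
    x, y, z = cell
    return {(x + dx, y + dy, z + dz)
            for dx, dy, dz in product((-1, 0, 1), repeat=3)} - {cell}

def children(cells):
    # all child cells one level down
    return {(2 * x + dx, 2 * y + dy, 2 * z + dz)
            for x, y, z in cells for dx, dy, dz in product((0, 1), repeat=3)}

def interaction_list(cell):
    # IL(a) = children(Nb(parent(a))) \ (a ∪ Nb(a))
    return children(neighbours(parent(cell))) - ({cell} | neighbours(cell))
```

For a cell away from the domain boundary this yields the 6³ − 3³ = 189 well-separated cells quoted later in the complexity discussion.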
Explicitly, the interaction list is defined as
$$\mathrm{IL}(\alpha) = \mathrm{children}\bigl(Nb(\mathrm{parent}(\alpha))\bigr) \setminus \bigl(\alpha \cup Nb(\alpha)\bigr),$$
where $Nb(\alpha)$ is the set of the 26 nearest neighbours of the cell $\alpha$; the functions $\mathrm{children}(\alpha)$, $\mathrm{parent}(\alpha)$ return the set of child cells or the parent cell of a particular cell $\alpha$. An example of an interaction list can be found in the bottom left corner of Fig. 1: all cells labelled with the letter 'M' are in the interaction list of the gray cell labelled with an 'L'. Finally, interactions with charges in neighbouring fine-level cells are included by directly evaluating the $1/r$ potential generated by those charges.

3.2 FMM for Monte Carlo simulations

Now consider the following modification of FMM. Let $\Psi^\Delta_{\ell,\alpha}$ be the $p$-term local expansion around the centre of cell $\alpha$ on level $\ell$ such that $\Psi^\Delta_{\ell,\alpha}$ contains contributions from all charges in the interaction list IL($\alpha$) of $\alpha$. Note that this is different from standard FMM, where the local expansions $\Psi_{\ell,\alpha}$ contain the contribution of all charges not contained in the cell $\alpha$ or its 26 nearest neighbours. However, $\Psi_{\ell,\alpha}$ can be obtained by summing the $\Psi^\Delta_{\ell',\alpha_{\ell'}}$ on the current and coarser levels, namely
$$\Psi_{\ell,\alpha} = \sum_{\ell'=1}^{\ell} \Psi^\Delta_{\ell',\alpha_{\ell'}} \quad\text{with } \alpha_\ell = \alpha \text{ and } \alpha_{\ell'} = \mathrm{parent}(\alpha_{\ell'+1}) \text{ for all } \ell' = 1, \dots, \ell-1. \tag{1}$$
For a cell $\alpha$ on level $\ell$ the local expansion $\Psi^\Delta_{\ell,\alpha}$ can be expressed in terms of the coefficients $(L^\Delta_{\ell,\alpha})^m_n$ as
$$\Psi^\Delta_{\ell,\alpha}(\delta\boldsymbol{r}) = \sum_{n=0}^{p} \sum_{m=-n}^{+n} \bigl(L^\Delta_{\ell,\alpha}\bigr)^m_n\, (\delta r)^n\, Y^m_n(\delta\theta, \delta\phi) \quad\text{with } (\delta r, \delta\theta, \delta\phi) = \mathrm{spherical}(\delta\boldsymbol{r}). \tag{2}$$
Here $\delta\boldsymbol{r}$ is the position of the particle measured relative to the centre $R_\alpha$ of the cell $\alpha$. The function $\mathrm{spherical}(\boldsymbol{r})$ returns the spherical coordinates $(r, \theta, \phi)$ of a vector $\boldsymbol{r}$.
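To make Eq. (2) concrete, the sketch below builds the local expansion coefficients of a single distant monopole (the same per-charge update that appears in the accept and initialisation algorithms later on) and evaluates the truncated expansion; with the FMM normalisation of the spherical harmonics the $p$-term sum reproduces the direct Coulomb potential up to an error of order $(r/\rho)^{p+1}$. This is an independent illustration of ours, not the paper's implementation:

```python
import math, cmath

def legendre(n, m, x):
    # associated Legendre polynomial P_n^m(x) for 0 <= m <= n (standard recurrence)
    p_mm = 1.0
    if m > 0:
        double_fact = math.prod(range(2 * m - 1, 0, -2))   # (2m-1)!!
        p_mm = (-1) ** m * double_fact * (1.0 - x * x) ** (m / 2.0)
    if n == m:
        return p_mm
    p_next = x * (2 * m + 1) * p_mm
    for k in range(m + 2, n + 1):
        p_mm, p_next = p_next, ((2 * k - 1) * x * p_next - (k + m - 1) * p_mm) / (k - m)
    return p_next

def Y(n, m, theta, phi):
    # spherical harmonic, FMM normalisation: sqrt((n-|m|)!/(n+|m|)!) P_n^{|m|}(cos t) e^{i m phi}
    a = abs(m)
    norm = math.sqrt(math.factorial(n - a) / math.factorial(n + a))
    return norm * legendre(n, a, math.cos(theta)) * cmath.exp(1j * m * phi)

def monopole_local_coefficients(q, rho, theta, phi, p):
    # local expansion of one charge q at spherical position (rho, theta, phi), rho > r_eval:
    # (L)_n^m = q * rho^{-(n+1)} * Y_n^{-m}(theta, phi)
    return {(n, m): q * rho ** (-(n + 1)) * Y(n, -m, theta, phi)
            for n in range(p + 1) for m in range(-n, n + 1)}

def evaluate_local(coeffs, p, r, theta, phi):
    # Eq. (2): Psi(dr) = sum_{n,m} (L)_n^m r^n Y_n^m(theta, phi)
    return sum(coeffs[(n, m)] * r ** n * Y(n, m, theta, phi)
               for n in range(p + 1) for m in range(-n, n + 1)).real
```

By the Legendre addition theorem the sum collapses to $\sum_n q\, r^n/\rho^{n+1} P_n(\cos\gamma)$, i.e. the truncated $1/|\delta\boldsymbol{r} - \delta\boldsymbol{r}_s|$ potential, which is easy to check numerically.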
We further call the set $\mathrm{ancestors}(\alpha) = \{\alpha_{\ell-1}, \alpha_{\ell-2}, \dots, \alpha_2, \alpha_1\}$ defined recursively in Eq. (1) the ancestors of cell $\alpha$. Our strategy for evaluating the long range contributions in Monte Carlo simulations is as follows (see Fig. 2):

Initialisation.
At the beginning of the simulation, calculate the local expansion coefficients $(L^\Delta_{\ell,\alpha})^m_n$ for all cells $\alpha$ and on all levels $\ell$ by using a slightly modified variant of the upward/downward pass in the Fast Multipole Algorithm.

Proposal.
Consider a proposed MC move $\boldsymbol{r} \to \boldsymbol{r}'$ of charge $q$ such that the original position $\boldsymbol{r}$ is contained in the fine-level cell $\alpha$ and the new position $\boldsymbol{r}'$ in the fine-level cell $\alpha'$ (which could be identical to $\alpha$). To evaluate the change in the long range potential, evaluate and accumulate the $\Psi^\Delta_{\ell,\alpha_\ell}(\boldsymbol{r} - R_{\alpha_\ell})$ and $\Psi^\Delta_{\ell,\alpha'_\ell}(\boldsymbol{r}' - R_{\alpha'_\ell})$ on all levels $\ell = 1, \dots, L$ of the hierarchy by using the sum in Eq. (1), where $R_\alpha$ is the centre of cell $\alpha$. This gives the change in long-range electrostatic energy $\Delta U_{\mathrm{lr}} = q\,\bigl(\Psi_{L,\alpha'}(\boldsymbol{r}' - R_{\alpha'}) - \Psi_{L,\alpha}(\boldsymbol{r} - R_\alpha)\bigr)$. Secondly, add the change in short-range energy $\Delta U_{\mathrm{direct}}$ from a direct calculation of the interactions with particles in the same and adjacent cells. Finally, remove the spurious self-interaction contribution $q^2/|\boldsymbol{r}' - \boldsymbol{r}|$ which is due to the potential field induced by the charge at the original position. This self-interaction correction is described in detail in [15, Section 3].

Accept.
Assume we accept a move $\boldsymbol{r} \to \boldsymbol{r}'$ of charge $q$ such that $\boldsymbol{r}$ lies in cell $\alpha$ and $\boldsymbol{r}'$ in cell $\alpha'$. For all cells $\beta_\ell$ in the interaction lists of $\alpha$ and all its ancestors, the contribution of a monopole with charge $q$ is subtracted from $\Psi^\Delta_{\ell,\beta_\ell}$ on all levels $\ell = 1, \dots, L$. This requires updating the expansion coefficients $(L^\Delta_{\ell,\beta_\ell})^m_n$. Conversely, a monopole of charge $q$ is added to the local expansions $\Psi^\Delta_{\ell,\beta'_\ell}$ for all cells $\beta'_\ell$ in the interaction lists of $\alpha'$ and its ancestors.

The propose and accept steps are written down explicitly in Algorithms L.1 and L.2; the direct calculation of the interaction with particles in the same fine-level cell or directly adjacent cells to obtain $\Delta U_{\mathrm{direct}}$ is given in Algorithm 4. In Algorithm 4 we also remove the spurious self-interaction between the charge at the proposed position and the charge at the original position.

Algorithm L.1
Propose a move $\boldsymbol{r} \to \boldsymbol{r}'$ using local expansions.
Input: initial position $\boldsymbol{r}$ of particle with charge $q$; target position $\boldsymbol{r}'$.
Output: change in total electrostatic energy $\Delta U$

 1: Find fine-level cells $\alpha_L$, $\alpha'_L$ such that $\boldsymbol{r} \in \alpha_L$ and $\boldsymbol{r}' \in \alpha'_L$
 2: $\Delta U \leftarrow 0$
 3: for $\ell = L, L-1, \dots, 1$ do
 4:   Update $\Delta U \leftarrow \Delta U + q\,\bigl(\Psi^\Delta_{\ell,\alpha'_\ell}(\boldsymbol{r}' - R_{\alpha'_\ell}) - \Psi^\Delta_{\ell,\alpha_\ell}(\boldsymbol{r} - R_{\alpha_\ell})\bigr)$ using the local expansion in Eq. (2)
 5:   if $\ell > 1$ then
 6:     Set $\alpha_{\ell-1} \leftarrow \mathrm{parent}(\alpha_\ell)$; $\alpha'_{\ell-1} \leftarrow \mathrm{parent}(\alpha'_\ell)$
 7:   end if
 8: end for
 9: Calculate change $\Delta U_{\mathrm{direct}}$ in electrostatic energy from direct interactions with Algorithm 4
10: $\Delta U \leftarrow \Delta U + \Delta U_{\mathrm{direct}}$

Since the local expansions with O(p²) terms need to be evaluated in two cells per level, the cost of one proposal is obviously O(p² L) = O(p² log(N)). Similarly, when updating the O(p²) expansion coefficients $(L^\Delta_{\ell,\beta_\ell})^m_n$ while accepting a move, the number of cells in the interaction list on each level is constant ($6^d - 3^d = 189$ in $d = 3$ dimensions, to be specific).

Figure 2: Schematic illustration of the new method for proposing an MC move $\boldsymbol{r} \to \boldsymbol{r}'$, as described in Section 3.2. On each level $\ell$ the cell $\alpha_\ell$ containing $\boldsymbol{r}$ and the corresponding cell $\alpha'_\ell$ containing $\boldsymbol{r}'$ are marked in red. Cells $\beta_\ell$ in the interaction list of $\alpha_\ell$ and cells $\beta'_\ell$ in the interaction list of $\alpha'_\ell$ are shown in blue. Interactions with particles in green cells have to be evaluated directly when a move is proposed.

Algorithm L.2
Accept a move $\boldsymbol{r} \to \boldsymbol{r}'$ using local expansions.
Input: old position $\boldsymbol{r}$, new position $\boldsymbol{r}'$

 1: Find fine-level cells $\alpha_L$, $\alpha'_L$ such that $\boldsymbol{r} \in \alpha_L$ and $\boldsymbol{r}' \in \alpha'_L$
 2: for $\ell = L, L-1, \dots, 1$ do
 3:   for $\beta_\ell \in \mathrm{IL}(\alpha_\ell)$ and $\beta'_\ell \in \mathrm{IL}(\alpha'_\ell)$ do
 4:     Set $(\delta r, \delta\theta, \delta\phi) \leftarrow \mathrm{spherical}(\boldsymbol{r} - R_{\beta_\ell})$ and $(\delta r', \delta\theta', \delta\phi') \leftarrow \mathrm{spherical}(\boldsymbol{r}' - R_{\beta'_\ell})$
 5:     for $n = 0, \dots, p$ do
 6:       for $m = -n, \dots, +n$ do
 7:         Update $(L^\Delta_{\ell,\beta_\ell})^m_n \leftarrow (L^\Delta_{\ell,\beta_\ell})^m_n - q\,(\delta r)^{-(n+1)}\, Y^{-m}_n(\delta\theta, \delta\phi)$
 8:         Update $(L^\Delta_{\ell,\beta'_\ell})^m_n \leftarrow (L^\Delta_{\ell,\beta'_\ell})^m_n + q\,(\delta r')^{-(n+1)}\, Y^{-m}_n(\delta\theta', \delta\phi')$
 9:       end for
10:     end for
11:   end for
12:   if $\ell > 1$ then
13:     Set $\alpha_{\ell-1} \leftarrow \mathrm{parent}(\alpha_\ell)$; $\alpha'_{\ell-1} \leftarrow \mathrm{parent}(\alpha'_\ell)$
14:   end if
15: end for

Algorithm L.3 Initialise local expansion coefficients $(L^\Delta_{\ell,\alpha})^m_n$ for electrostatic calculation with Algorithms L.1 and L.2

 1: for levels $\ell = 1, \dots, L$ do
 2:   for all cells $\alpha_\ell$ on level $\ell$ do
 3:     for $n = 0, \dots, p$ and $m = -n, \dots, +n$ do
 4:       Set $(L^\Delta_{\ell,\alpha_\ell})^m_n = 0$
 5:     end for
 6:   end for
 7:   for all cells $\alpha_\ell$ do
 8:     for all particles with charge $q_i$ and position $\boldsymbol{r}_i \in \alpha_\ell$ do
 9:       for all cells $\beta_\ell \in \mathrm{IL}(\alpha_\ell)$ do
10:         Set $(\delta r_i, \delta\theta_i, \delta\phi_i) \leftarrow \mathrm{spherical}(\boldsymbol{r}_i - R_{\beta_\ell})$
11:         for $n = 0, \dots, p$ do
12:           for $m = -n, \dots, +n$ do
13:             Update $(L^\Delta_{\ell,\beta_\ell})^m_n \leftarrow (L^\Delta_{\ell,\beta_\ell})^m_n + q_i\,(\delta r_i)^{-(n+1)}\, Y^{-m}_n(\delta\theta_i, \delta\phi_i)$
14:           end for
15:         end for
16:       end for
17:     end for
18:   end for
19: end for

Algorithm 4 Calculate change in electrostatic energy from direct interactions for a proposed move $\boldsymbol{r} \to \boldsymbol{r}'$.
Input: initial position $\boldsymbol{r} \in \alpha_L$ of particle with charge $q$; target position $\boldsymbol{r}' \in \alpha'_L$.
Output: change in direct electrostatic interaction energy $\Delta U_{\mathrm{direct}}$

 1: $\Delta U_{\mathrm{direct}} \leftarrow 0$
 2: for all particles with charge $q_i$ and position $\boldsymbol{r}_i \in \alpha_L \cup Nb(\alpha_L)$, $\boldsymbol{r}_i \neq \boldsymbol{r}$ do
 3:   Update $\Delta U_{\mathrm{direct}} \leftarrow \Delta U_{\mathrm{direct}} - \dfrac{q\, q_i}{|\boldsymbol{r} - \boldsymbol{r}_i|}$
 4: end for
 5: for all particles with charge $q'_i$ and position $\boldsymbol{r}'_i \in \alpha'_L \cup Nb(\alpha'_L)$, $\boldsymbol{r}'_i \neq \boldsymbol{r}'$ do
 6:   Update $\Delta U_{\mathrm{direct}} \leftarrow \Delta U_{\mathrm{direct}} + \dfrac{q\, q'_i}{|\boldsymbol{r}' - \boldsymbol{r}'_i|}$
 7: end for
 8: Remove self-interaction: $\Delta U_{\mathrm{direct}} \leftarrow \Delta U_{\mathrm{direct}} - \dfrac{q^2}{|\boldsymbol{r}' - \boldsymbol{r}|}$

Therefore the computational complexity of the accept stage is also O(p² L) = O(p² log(N)). The larger constant (compared to the one in the propose stage) arises due to the fact that 2 · 189 instead of 2 cells have to be considered on each level in this stage, and is partially compensated by two effects:

1. Typically only a fraction of all proposed moves are accepted.

2. Each MC proposal also requires the calculation of the short-range electrostatic interactions, which is not necessary in the accept stage.

The short-range contribution of the electrostatic potential is evaluated by calculating the contribution of all charges in $Nb(\alpha)$ and $Nb(\alpha')$ directly in Algorithm 4. As in the standard FMM algorithm, the number of charges per fine-level cell is constant and independent of the number of levels; each cell typically contains of the order of 1-10 charges. This guarantees that the total cost of the electrostatic calculation in the proposal step is still O(log(N)) after the direct, short-range interactions are included using Algorithm 4.

We conclude that the computational complexity of the electrostatics in one MH step, which consists of a proposed move, potentially followed by one accepted transition, is O(p² log(N)).

The initialisation of the local expansions $\Psi^\Delta_{\ell,\alpha}$ at the beginning of the simulation could be carried out in O(p² N) time with a minimally modified variant of the standard FMM algorithm, which is written down explicitly as Algorithm 2 in [15]. Apart from renaming $L_{\ell,\alpha} \mapsto L^\Delta_{\ell,\alpha}$ in the local expansions, the only difference is that line 17 of this algorithm has to be replaced by $\Psi_{\ell,\alpha} \leftarrow \Psi^\Delta_{\ell,\alpha}$, and the accumulation in lines 13-15 is no longer necessary. However, since the setup cost is amortised anyway by the large number of MH steps, we chose a slightly more expensive but simpler approach, which is written down in Algorithm L.3 and avoids the calculation of multipole expansions in the upwards pass of the FMM algorithm.
For this, the coefficients $(L^\Delta_{\ell,\alpha})^m_n$ are initialised to zero for all cells $\alpha$ and levels $\ell$. Next, on each level we loop over all cells $\alpha$ and increment $\Psi^\Delta_{\ell,\beta}$ for all $\beta \in \mathrm{IL}(\alpha)$ by adding the contribution of all monopoles in $\alpha$ to the local expansion in $\beta$. Since N monopoles have to be considered on each level, the computational complexity of the setup phase is O(p² L N) = O(p² N log(N)).

3.3 An alternative method based on multipole expansions

For reference, we now describe the alternative algorithm introduced in [8], which is based entirely on multipole expansions and which we also implemented for comparison. In analogy to Eq. (2), we define the multipole expansion of all particles contained in box $\alpha$ on level $\ell$ around the centre of the box,
$$\Phi_{\ell,\alpha}(\delta\boldsymbol{r}) = \sum_{n=0}^{p} \sum_{m=-n}^{+n} (M_{\ell,\alpha})^m_n\, (\delta r)^{-(n+1)}\, Y^m_n(\delta\theta, \delta\phi) \quad\text{with } (\delta r, \delta\theta, \delta\phi) = \mathrm{spherical}(\delta\boldsymbol{r}). \tag{3}$$
Assuming that the particles in the box $\alpha$ have coordinates $\delta\boldsymbol{r}_i$ and charges $q_i$ with $i \in I_\alpha \subset \{1, \dots, N\}$, the explicit expression for the multipole expansion coefficients in Eq. (3) is
$$(M_{\ell,\alpha})^m_n = \sum_{i \in I_\alpha} q_i\, (\delta r_i)^n\, Y^{-m}_n(\delta\theta_i, \delta\phi_i) \quad\text{with } (\delta r_i, \delta\theta_i, \delta\phi_i) = \mathrm{spherical}(\delta\boldsymbol{r}_i). \tag{4}$$
Algorithms M.1 and M.2 describe how a potential move is proposed and accepted, using only multipole expansions; the two algorithms should be compared to Algorithms L.1 and L.2 above. Both methods have a computational complexity of O(p² log(N)), but observe that the expensive loops over the interaction list are now carried out in the proposal step in Algorithm M.1. Again, it would be possible to initialise the multipole expansion coefficients $(M_{\ell,\alpha})^m_n$ at the beginning of the simulation in O(p² N) time with one upward pass of the native FMM method. However, for simplicity we chose to calculate them directly by looping over the cells on all levels and accumulating the multipole coefficients from all particles in a particular cell using Eq. (4); this is written down explicitly in Algorithm M.3, which has O(p² N log(N)) complexity. The resulting coefficients $(M_{\ell,\alpha})^m_n$ are not identical to those that would have been obtained in the upward pass of the FMM algorithm, where they are calculated by recursively combining expansions on subsequent levels. However, the difference between the two ways of computing the multipole coefficients can be bounded as in the standard FMM error analysis.

Algorithm M.1
Propose a move $\boldsymbol{r} \to \boldsymbol{r}'$ using multipole expansions.
Input: initial position $\boldsymbol{r}$ of particle with charge $q$; target position $\boldsymbol{r}'$.
Output: change in total electrostatic energy $\Delta U$

 1: Find fine-level cells $\alpha_L$, $\alpha'_L$ such that $\boldsymbol{r} \in \alpha_L$ and $\boldsymbol{r}' \in \alpha'_L$
 2: $\Delta U \leftarrow 0$
 3: for $\ell = L, L-1, \dots, 1$ do
 4:   for $\beta_\ell \in \mathrm{IL}(\alpha_\ell)$ and $\beta'_\ell \in \mathrm{IL}(\alpha'_\ell)$ do
 5:     Update $\Delta U \leftarrow \Delta U + q\,\bigl(\Phi_{\ell,\beta'_\ell}(\boldsymbol{r}' - R_{\beta'_\ell}) - \Phi_{\ell,\beta_\ell}(\boldsymbol{r} - R_{\beta_\ell})\bigr)$ using the multipole expansion in Eq. (3)
 6:   end for
 7:   if $\ell > 1$ then
 8:     Set $\alpha_{\ell-1} \leftarrow \mathrm{parent}(\alpha_\ell)$; $\alpha'_{\ell-1} \leftarrow \mathrm{parent}(\alpha'_\ell)$
 9:   end if
10: end for
11: Calculate change $\Delta U_{\mathrm{direct}}$ in electrostatic energy from direct interactions with Algorithm 4
12: $\Delta U \leftarrow \Delta U + \Delta U_{\mathrm{direct}}$

Algorithm M.2
Accept a move $\boldsymbol{r} \to \boldsymbol{r}'$ using multipole expansions.
Input: old position $\boldsymbol{r}$, new position $\boldsymbol{r}'$

 1: Find fine-level cells $\alpha_L$, $\alpha'_L$ such that $\boldsymbol{r} \in \alpha_L$ and $\boldsymbol{r}' \in \alpha'_L$
 2: for $\ell = L, L-1, \dots, 1$ do
 3:   Set $(\delta r, \delta\theta, \delta\phi) \leftarrow \mathrm{spherical}(\boldsymbol{r} - R_{\alpha_\ell})$ and $(\delta r', \delta\theta', \delta\phi') \leftarrow \mathrm{spherical}(\boldsymbol{r}' - R_{\alpha'_\ell})$
 4:   for $n = 0, \dots, p$ do
 5:     for $m = -n, \dots, +n$ do
 6:       Update $(M_{\ell,\alpha_\ell})^m_n \leftarrow (M_{\ell,\alpha_\ell})^m_n - q\,(\delta r)^n\, Y^{-m}_n(\delta\theta, \delta\phi)$
 7:       Update $(M_{\ell,\alpha'_\ell})^m_n \leftarrow (M_{\ell,\alpha'_\ell})^m_n + q\,(\delta r')^n\, Y^{-m}_n(\delta\theta', \delta\phi')$
 8:     end for
 9:   end for
10:   if $\ell > 1$ then
11:     Set $\alpha_{\ell-1} \leftarrow \mathrm{parent}(\alpha_\ell)$; $\alpha'_{\ell-1} \leftarrow \mathrm{parent}(\alpha'_\ell)$
12:   end if
13: end for

Algorithm M.3 Initialise multipole expansion coefficients $(M_{\ell,\alpha})^m_n$ for electrostatic calculation with Algorithms M.1 and M.2

 1: for levels $\ell = 1, \dots, L$ do
 2:   for all cells $\alpha_\ell$ on level $\ell$ do
 3:     for $n = 0, \dots, p$ and $m = -n, \dots, +n$ do
 4:       Set $(M_{\ell,\alpha_\ell})^m_n = 0$
 5:     end for
 6:   end for
 7:   for all cells $\alpha_\ell$ do
 8:     for all particles with charge $q_i$ and position $\boldsymbol{r}_i \in \alpha_\ell$ do
 9:       Set $(\delta r_i, \delta\theta_i, \delta\phi_i) \leftarrow \mathrm{spherical}(\boldsymbol{r}_i - R_{\alpha_\ell})$
10:       for $n = 0, \dots, p$ do
11:         for $m = -n, \dots, +n$ do
12:           Update $(M_{\ell,\alpha_\ell})^m_n \leftarrow (M_{\ell,\alpha_\ell})^m_n + q_i\,(\delta r_i)^n\, Y^{-m}_n(\delta\theta_i, \delta\phi_i)$
13:         end for
14:       end for
15:     end for
16:   end for
17: end for

3.4 Boundary conditions

So far we have implicitly assumed that free-space boundary conditions are used for the calculation of the electrostatic energy.
In this case the interaction lists on the coarsest two levels of Algorithms L.2, L.3, M.1 and M.3 are empty. This implies that $\Psi^\Delta_{1,\alpha} = \Psi^\Delta_{2,\alpha} = \Phi_{1,\alpha} = \Phi_{2,\alpha} = 0$, and the levels $\ell = 1, 2$ can be skipped. For periodic boundary conditions, the following modifications are necessary:

• In Algorithms L.2 and L.3, extend the domain Ω by its $3^3 - 1 = 26$ periodic copies; when looping over the interaction lists of $\alpha_\ell$ and $\alpha'_\ell$, include the copies of those cells in the extended domain Ω.

• By following the approach described in detail in [15, Section 3.1], extend Algorithm L.3 to initialise the data structures K and E required to compute the electrostatic contribution of all charges outside Ω:

  Set $K^m_n \leftarrow 0$, $E^m_n \leftarrow 0$ $\forall m, n$
  for all particles with charge $q_i$ and position $\boldsymbol{r}_i \in \alpha_\ell$ do
    Set $(\delta r_i, \delta\theta_i, \delta\phi_i) \leftarrow \mathrm{spherical}(\boldsymbol{r}_i - R)$
    for $n = 0, \dots, p$ do
      for $m = -n, \dots, +n$ do
        Set $K^m_n \leftarrow K^m_n + q_i\,(\delta r_i)^n\, Y^{-m}_n(\delta\theta_i, \delta\phi_i)$
        Set $E^m_n \leftarrow E^m_n + q_i\,(\delta r_i)^n\, Y^m_n(\delta\theta_i, \delta\phi_i)$
      end for
    end for
  end for
  Store K and E.

• Extend Algorithm L.2 to update K and E when a move is accepted:

  Set $(\delta r, \delta\theta, \delta\phi) \leftarrow \mathrm{spherical}(\boldsymbol{r} - R)$ and $(\delta r', \delta\theta', \delta\phi') \leftarrow \mathrm{spherical}(\boldsymbol{r}' - R)$
  for $n = 0, \dots, p$ do
    for $m = -n, \dots, +n$ do
      Set $K^m_n \leftarrow K^m_n + q\,\bigl((\delta r')^n Y^{-m}_n(\delta\theta', \delta\phi') - (\delta r)^n Y^{-m}_n(\delta\theta, \delta\phi)\bigr)$
      Set $E^m_n \leftarrow E^m_n + q\,\bigl((\delta r')^n Y^m_n(\delta\theta', \delta\phi') - (\delta r)^n Y^m_n(\delta\theta, \delta\phi)\bigr)$
    end for
  end for

• Extend Algorithm L.1 to include the contributions to the energy differences from the proposed move in periodic images outside Ω by computing the proposed differences to K and E to determine the change in energy (the linear operator $\mathcal{R}$ is introduced in [15, Section 2.2.1]):

  Set $H \leftarrow \mathcal{R}(K)$
  Set $U_\infty \leftarrow \sum_{n=0}^{p} \sum_{m=-n}^{n} E^m_n H^m_n$
  Set $(\delta r, \delta\theta, \delta\phi) \leftarrow \mathrm{spherical}(\boldsymbol{r} - R)$ and $(\delta r', \delta\theta', \delta\phi') \leftarrow \mathrm{spherical}(\boldsymbol{r}' - R)$
  for $n = 0, \dots, p$ do
    for $m = -n, \dots, +n$ do
      Set $\delta K^m_n \leftarrow K^m_n + q\,\bigl((\delta r')^n Y^{-m}_n(\delta\theta', \delta\phi') - (\delta r)^n Y^{-m}_n(\delta\theta, \delta\phi)\bigr)$
      Set $\delta E^m_n \leftarrow E^m_n + q\,\bigl((\delta r')^n Y^m_n(\delta\theta', \delta\phi') - (\delta r)^n Y^m_n(\delta\theta, \delta\phi)\bigr)$
    end for
  end for
  Set $\delta H \leftarrow \mathcal{R}(\delta K)$
  Set $\Delta U_\infty \leftarrow \sum_{n=0}^{p} \sum_{m=-n}^{n} \delta E^m_n\, \delta H^m_n - U_\infty$
  Set $\Delta U \leftarrow \Delta U + \Delta U_\infty$

• In Algorithm 4, extend $Nb(\alpha_L)$ and $Nb(\alpha'_L)$ to include all cells in the extended domain Ω. Further, the self-interaction term has to be modified to take into account spurious interactions with the additional copies of the original particle. As discussed in detail in [15, Section 3.1], this can be done by replacing the update in line 8 of Algorithm 4 by
$$\Delta U_{\mathrm{direct}} \leftarrow \Delta U_{\mathrm{direct}} - \sum_{\nu \in [-1,0,+1]^3} \frac{q^2}{|\boldsymbol{r}' - (\boldsymbol{r} + a\nu)|},$$
plus a constant correction term $\propto q^2/a$ given in [15, Section 3.1].

Similar modifications have to be made to Algorithms M.1 and M.3. In practice, iteration over the 26 additional copies of the simulation cell can be implemented by modifying data structures such as neighbour lists; see [15] for a more detailed discussion.
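In the free-space case the direct part of a proposal (Algorithm 4) amounts to two neighbourhood sums and the self-interaction correction. The sketch below assumes the caller has already gathered the (charge, position) pairs from the relevant fine-level cell and its 26 neighbours; it is an illustration, not the generated C code of the actual implementation:

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def direct_energy_change(q, r_old, r_new, near_old, near_new):
    """Free-space sketch of Algorithm 4: near_old / near_new are lists of
    (charge, position) pairs in the old/new fine-level cell and its neighbours."""
    dU = 0.0
    for qi, ri in near_old:
        if ri != r_old:                      # skip the moving charge itself
            dU -= q * qi / dist(r_old, ri)   # remove interactions at the old position
    for qi, ri in near_new:
        if ri != r_new:
            dU += q * qi / dist(r_new, ri)   # add interactions at the new position
    # remove the spurious self-interaction with the (not yet moved) charge at r_old
    return dU - q * q / dist(r_new, r_old)
```

Note that `near_new` still contains the moving charge at its old position, which is exactly why the final self-interaction term has to be subtracted.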
4 Implementation
The algorithms described in this paper have been implemented as an extension to the performance-portable framework for molecular dynamics (PPMD) described in [9], which is freely available at https://github.com/ppmd/ppmd.
PPMD provides a high-level Python interface for particle-based simulations which require the efficient execution of user-defined operations over all particles or pairs of particles in a system. An obvious example of the latter is the calculation of inter-particle forces in molecular dynamics simulations, but the interface is sufficiently abstract to support more general operations such as the structure analysis algorithms discussed in Section 4 of [9]. PPMD automatically generates efficient code for executing short user-defined C-kernels for all particles or particle-pairs on different parallel architectures; both distributed- and shared-memory parallelism are supported and the code can run on non-standard architectures such as GPUs. Particle-specific data (such as charge, mass and velocity) is stored as instances of the Python ParticleDat class. Particle positions, which contain information that is relevant for parallel domain decompositions, are stored as instances of the specialised PositionDat class. In addition to electrostatic potential- and force-calculation with classical Ewald summation [4, 16], PPMD also contains an implementation of the standard FMM algorithm in three dimensions given in [7]. The PPMD framework therefore provides all necessary data structures for storing information (such as local- and multipole-expansion coefficients) on a nested hierarchy of grids which is required to implement the algorithms discussed in this paper. Algorithms L.1 - L.3, M.1 - M.3 and 4 are implemented in the separate coulomb_mc Python package which is based on PPMD and can be downloaded from https://github.com/ppmd/coulomb_mc.
Algorithms L.1, L.2, M.1 and M.2 have been implemented as auto-generated C-code. This allows the pre-computation of constant expressions, such as the combinatorial factors that arise in the evaluation of the spherical harmonics, and the unrolling of nested loops such as the ones in lines 5-10 of Algorithm L.2 and lines 4-9 of Algorithm M.2. Finally, the generated code is compiled for the specific chip architecture at runtime to ensure optimal performance. Currently our implementation supports cuboid geometries with free-space and periodic boundary conditions.
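As an illustration of the kind of constant that can be pre-computed, the coefficients A_n^m = (−1)^n / √((n − m)!(n + m)!) which appear in the FMM translation operators of [7] depend only on n and m. The sketch below (our own code, not the generated kernels themselves) tabulates them once; in the auto-generated C-code such values are baked in as literal constants, so no factorials or square roots are evaluated at runtime.

```python
import math

def precompute_A(p):
    """Tabulate A_n^m = (-1)^n / sqrt((n-m)! (n+m)!) for 0 <= n <= p, |m| <= n."""
    return {
        (n, m): (-1.0) ** n / math.sqrt(math.factorial(n - m) * math.factorial(n + m))
        for n in range(p + 1)
        for m in range(-n, n + 1)
    }
```

A table for p = 11 (the value used in the numerical experiments below) contains (p + 1)² = 144 entries, small enough to be embedded directly into the generated source.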
Recall that the local expansion coefficients L^Δ_{ℓ,α} which are required to compute (changes of) the electrostatic energy of the system of N particles with charges {q_1, q_2, ..., q_N} and positions {r_1, r_2, ..., r_N} are initialised with Algorithm L.3. In the coulomb_mc package the charges of all particles are represented as a ParticleDat instance q, whereas the positions are stored as a PositionDat object r. At the beginning of the simulation the user populates q and r and uses those to create an MCFMM_LM object, which is passed additional information on the domain (such as boundary conditions), the number of levels L in the grid hierarchy and the number of expansion terms. The constructor of the MCFMM_LM class then executes Algorithm L.3 to initialise the values of the coefficients L^Δ_{ℓ,α}. Following this, Algorithm L.1 can be used to compute the change in electrostatic energy which occurs if particle j transitions from its original position r = r_j to a new position r′ = r′_j. In the code this is realised by calling the propose() method of the MCFMM_LM object and passing it the particle index j and the new position r′ = r′_j. Finally, Algorithm L.2 updates the local expansions L^Δ_{ℓ,α} if the proposed move r ↦ r′ is accepted. Calling the class method accept(), which is passed the particle index j and the new position r′_j of the particle, executes Algorithm L.2 for the transition r_j ↦ r′_j. It replaces the position of particle j in the PositionDat r by r′_j and updates the expansion coefficients L^Δ_{ℓ,α}.

Listing 1 illustrates the creation of an MCFMM_LM object, followed by the computation of the change in energy dU = ΔU which would result from moving particle j = 7 to the new position r_j^new = (0.1, 1.0, 5.2) by calling the propose() method. In the last line the move to the new position is accepted by calling the accept() method. Note that although the example in Listing 1 assumes that the proposed position is identical to the accepted position, this is not the case in general. Because of this, and since the code keeps track of the total energy of the system at each step, by default the accept() method executes Algorithm L.1 to compute the change in system energy ΔU. This can be avoided by passing this change ΔU (which - as shown in Listing 1 - might have been computed in a previous call to propose() with the new position that is to be accepted) as an additional parameter to the accept() method. The corresponding multipole-based Algorithms M.1 - M.3 can be used by creating an MCFMM_MM object which keeps track of the multipole expansion coefficients M^Δ_{ℓ,α}. The constructor of this class implements Algorithm M.3. The class methods propose() and accept() implement Algorithms M.1 and M.2 and can be used in exactly the same way as the corresponding methods of the MCFMM_LM class described above.

Note that coulomb_mc only provides functionality for the calculation of (changes in) the electrostatic energy through the high-level MCFMM_LM and MCFMM_MM classes, which typically dominates the runtime. It is up to the user to implement the overarching Monte Carlo algorithm which generates proposed new positions r′ and uses the calculated energy differences to accept or reject particular moves.

Listing 1: Illustration of how an FMM MC instance is created and then subsequently used to propose and accept moves.

    MC = MCFMM_LM(r, q, domain, 'pbc', 3, 12)  # 3 levels, 12 expansion terms
    MC.initialise()
    j = 7
    rj_new = np.array((0.1, 1.0, 5.2))
    dU = MC.propose((j, rj_new))
    MC.accept((j, rj_new), dU)
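The overarching Metropolis-Hastings loop that the user has to supply might look as follows. The sketch assumes only the propose()/accept() calling convention described above and substitutes a trivial stand-in energy model (our own invention, not part of coulomb_mc) so that it is self-contained and runnable:

```python
import math
import random

class StandInSolver:
    """Minimal stand-in with the same propose()/accept() calling convention as
    the MCFMM_LM class; it scores each particle by its squared distance to the
    origin, purely so that the driver loop below can be exercised."""
    def __init__(self, positions):
        self.positions = [list(p) for p in positions]

    def _energy(self, pos):
        return sum(x * x for x in pos)

    def propose(self, move):
        j, r_new = move
        return self._energy(r_new) - self._energy(self.positions[j])

    def accept(self, move, dU=None):
        j, r_new = move
        self.positions[j] = list(r_new)

def metropolis_hastings(solver, n_steps, beta, max_disp, rng):
    """Single-particle-move Metropolis-Hastings driver; returns the
    acceptance rate nu of Eq. (6)."""
    n_accepted = 0
    N = len(solver.positions)
    for _ in range(n_steps):
        j = rng.randrange(N)
        r_new = [x + rng.uniform(-max_disp, max_disp) for x in solver.positions[j]]
        dU = solver.propose((j, r_new))
        # Metropolis criterion: accept with probability min(1, exp(-beta * dU))
        if dU <= 0.0 or rng.random() < math.exp(-beta * dU):
            solver.accept((j, r_new), dU)   # pass dU to avoid a second evaluation
            n_accepted += 1
    return n_accepted / n_steps
```

Replacing StandInSolver by an MCFMM_LM or MCFMM_MM instance would turn this into a complete (serial) simulation loop; note how dU from propose() is forwarded to accept(), exploiting the optional parameter discussed above.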
Results

In the following we quantify the performance of the algorithms introduced in Sections 3.2 and 3.3 and implemented as described in Section 4. We demonstrate numerically that, as expected, the time spent in each Monte Carlo step increases logarithmically with the number of particles in the system. To assess its overall performance, we also compare the runtime of our code to version 2.06 of the DL_MONTE package [10, 11] which uses the classical Ewald method to compute electrostatic interactions.

All numerical experiments were carried out on the University of Bath "Balena" HPC cluster. Compute nodes of this machine consist of two Intel E5-2650v2 CPUs, and all timing results are reported for sequential runs on a single core. A snapshot of the source code which can be used to reproduce the results, along with all plotting scripts and raw data, is provided at [17]. The code was compiled using version 19.5.281 of the Intel compiler; DL_MONTE was compiled with version 17.1.132 of the same compiler.
The accuracy of the algorithms described in Sections 3.2 and 3.3 crucially depends on the number of local/multipole expansion terms, which can be quantified by p, the upper limit in the outer sum in Eqs. (2) and (3). To provide a fair comparison between the methods introduced in this paper and the Ewald implementation in DL_MONTE, p is adjusted such that acceptance probabilities have errors which are comparable to those in DL_MONTE. For a proposed move r → r′ which results in a change of energy of ΔU, the relative error in the acceptance probability δP is defined as

    δP = |P − P*| / |P*|.    (5)

Here P = exp(−ΔU/(k_B T)) is the acceptance probability (computed by DL_MONTE or an expansion based method) for a given choice of parameters. Assuming that the exact change in energy is ΔU*, the exact acceptance probability is denoted as P* = exp(−ΔU*/(k_B T)). For a particular move P* is approximated to high accuracy by computing ΔU* with the local expansion based method and 26 expansion terms (p = 25).

The configuration for our numerical experiments is based on TEST01 [18] in the DL_POLY suite. This setup simulates a simple cubic NaCl crystal of alternating Sodium (Na) and Chloride (Cl) ions with a lattice constant of a = 3.3 Å. Fully periodic boundary conditions are used for all numerical experiments, which are performed at a temperature of T = 273 K.

To estimate the relative errors δP in the acceptance probability, we start with an initial configuration of charges which is constructed by creating a cubic lattice of 22 × 22 × 22 = 10648 ions as described in TEST01 and perturbing the initial position of each ion by adding a small uniform random shift in each spatial direction. Based on this, 1000 moves are proposed (note that no moves are accepted) and for each move the acceptance probabilities P for DL_MONTE and the expansion based approaches are computed along with the "exact" acceptance probability P*, which is estimated as described above. This process is repeated for 16 different initial configurations to generate a total of 16000 samples for the quantity δP defined in Eq. (5).

Figure 3 shows the mean relative error δP (averaged over all 16000 samples) as a function of the number of expansion terms, which varies between 4 and 14 for the expansion based methods. This should be compared to the same quantity computed with DL_MONTE at a fixed solver tolerance, indicated by the horizontal dashed line. As those results show, choosing 12 expansion terms (p = 11) results in a comparable mean relative error δP. Note that for a fixed value of p the local expansion based method (Algorithm L.1) has a slightly higher accuracy than the method which only uses multipole expansions (Algorithm M.1).
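The error measure of Eq. (5) is straightforward to evaluate. The sketch below uses our own function names and the unnormalised Metropolis factor P = exp(−ΔU/(k_B T)) exactly as defined above:

```python
import math

def acceptance_probability(dU, kT):
    """Metropolis factor P = exp(-dU / kT) for an energy change dU."""
    return math.exp(-dU / kT)

def relative_error(dU, dU_exact, kT):
    """Relative error dP = |P - P*| / |P*| of Eq. (5).

    Algebraically dP depends only on the error in the energy difference:
    dP = |exp(-(dU - dU_exact) / kT) - 1|."""
    P = acceptance_probability(dU, kT)
    P_star = acceptance_probability(dU_exact, kT)
    return abs(P - P_star) / abs(P_star)
```

For example, an absolute energy error of 0.01 k_B T yields δP = e^0.01 − 1 ≈ 10⁻², regardless of the size of ΔU* itself; this is one reason why it is instructive to inspect ΔU* separately, as done in Figure 6.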
Figure 3: Relative error δP in acceptance probability for DL_MONTE and the expansion based methods for a varying number of expansion terms.

To compare the errors of all used methods in more detail, we also inspect the distribution of δP over all 16000 samples. Fig. 4 shows a histogram of the relative error in the acceptance probability, i.e. the number of samples which have a δP that falls into a certain interval [δP_1, δP_2]. Results are shown both for DL_MONTE (again using the fixed solver tolerance) and our expansion based methods with 12 expansion terms. The cumulative density of the probability distribution in Figure 4 is plotted in Figure 5. In other words, for a given tolerance ε on δP, Figure 5 shows the percentage of samples that have a relative error which does not exceed ε. As both figures demonstrate, the spread in errors is slightly larger for the expansion based methods: although for those methods a larger fraction of samples have errors well below the tolerance, there is a small number of outliers. This, however, is consistent between the two expansion based methods.

Finally, observe that a large relative error δP in the acceptance probability P will only translate into a large absolute error on P if ΔU* is also large. It is therefore instructive to also produce a scatter plot of ΔU* against δP for all samples and this is shown in Figure 6.
Figure 4: Histograms of the relative error δP in acceptance probability. Results are shown for the local expansion based algorithm (top), the multipole expansion based algorithm (middle) and DL_MONTE (bottom); 12 expansion terms (p = 11) are used for the first two methods.
Figure 5: Cumulative density of the relative error δP in computed probabilities. Results are shown for the local expansion based algorithm (top), the multipole expansion based algorithm (middle) and DL_MONTE (bottom); 12 expansion terms (p = 11) are used for the first two methods.

Figure 6: Scatter plot of ΔU*/(k_B T) against the relative error δP in acceptance probability. Results are shown for the local expansion based algorithm (left), the multipole expansion based algorithm (middle) and DL_MONTE (right); 12 expansion terms (p = 11) are used for the first two methods.

Next, we investigate the growth in computational cost as a function of the number of charges N. Formally the number of levels L of the hierarchical tree is O(log(N)). The relative proportion of time spent in the far-field evaluation in the propose stage (Algorithms L.1 and M.1), the update of expansion coefficients (Algorithms L.2 and M.2) and the direct, nearest neighbour calculations (Algorithm 4) can be controlled by setting L = log(αN) and varying the constant α. The optimal value of the parameter α depends on the computer hardware, the average acceptance rate ν and the number of expansion terms. We define the acceptance rate as

    ν = (Number of accepted moves) / (Number of proposed moves).    (6)

Assuming that on average ν⁻¹ = e ≈ 2.718 proposals have to be generated for each accepted move, we find that for our setup and with 12 expansion terms (p = 11), the best results are obtained for α = 0.327 when using the local expansion based method (Algorithms L.1, L.2) and α = 0.138 if the multipole method (Algorithms M.1, M.2) is used. This also implies that for a given value of N, the optimal number of levels for both methods differs by less than 0.5.

To quantify the computational complexity of the propose stage (Algorithms L.1 and M.1) and the accept stage (Algorithms L.2 and M.2) separately, random proposals and accepted moves are generated for problems of increasing size containing between a thousand (N = 10³) and a million (N = 10⁶) particles, drawing particle positions and charges from a uniform random distribution. For each N the initial arrangement of particles is constructed as described in Section 5.1. Figure 7 shows the average time (measured over 1000 samples) per propose or accept operation as a function of the number N of charges in the system; the fitted dashed lines are discussed in Section 5.4 below. The measured times increase abruptly as the number of levels L changes. Asymptotically we expect all times to grow as L ∝ log(N). However, there are significant differences in the rate of growth and absolute computational cost for the different implementations.

Figure 7: Time spent proposing a single MC move and accepting it as a function of the number of charges N in the system. Results in the left figure were obtained with the multipole expansion based method in [8] and show the time spent in Algorithm M.1 (T^(mp)_prop, blue squares) and Algorithm M.2 (T^(mp)_acc, red circles). The right figure shows the corresponding numbers for our new method based on local expansions; namely the time spent in Algorithm L.1 (T^(loc)_prop, blue squares) and Algorithm L.2 (T^(loc)_acc, red circles). The step-changes in measured times (marked by dashed vertical lines) correspond to increases in the number of levels L, which are shown at the bottom of each figure.
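Under the assumption that the tree is an octree, so that the logarithm in L = log(αN) is taken to base 8 (an assumption on our part; the base is not spelled out above), the optimal tree depths of the two methods can be compared directly:

```python
import math

def optimal_levels(N, alpha, base=8):
    """Continuous-valued optimal tree depth L = log_base(alpha * N);
    in practice L is rounded to an integer."""
    return math.log(alpha * N, base)
```

With α = 0.327 for the local method and α = 0.138 for the multipole method, the two depths differ by log₈(0.327/0.138) ≈ 0.42 < 0.5 for every N, consistent with the statement above.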
While for the multipole based method from [8] proposing a single move is significantly more expensive than accepting it, the opposite is true for our new method based on local expansions. The main reason for this is that the expensive loop over cells in the interaction list has to be executed in the propose stage of the multipole based method (Algorithm M.1), whereas the interaction list is traversed in the accept stage of our new method (Algorithm L.2). Overall we therefore expect our new method to be more efficient as the acceptance rate ν decreases, and the number of proposals is significantly larger than the number of accepted moves. In Metropolis-Hastings simulations this is the case since the acceptance rate is usually significantly less than 1; a typical value is 1/e ≈ 0.37, independent of the number of particles N in the system.

Having quantified the time spent in the propose- and accept-stage of our expansion based algorithms separately, we now discuss the growth of the total runtime of an entire Monte Carlo simulation as a function of the problem size. For this we compare the performance of our expansion-based implementations in PPMD against DL_MONTE. Again we consider an NaCl crystal with the same initial arrangement of charges as described in Section 5.1. To prevent oppositely charged ions from collapsing onto one another over the course of the simulation, a repulsive short-range Lennard-Jones potential with a fixed cutoff of 12 Å is added. For the largest problem sizes the crystal is contained in a cubic box with a side length of approximately 153 Å. Proposed moves are random displacements δr = r′ − r, such that each component δr_j is uniformly distributed in a fixed interval. This results in an average of ν⁻¹ ≈ 2.438 proposals per accepted move for the expansion based methods. Note that the performance of the Ewald implementation in DL_MONTE is not sensitive to the acceptance rate and that in DL_MONTE the additional cost of accepting a proposed move is negligible.

Figure 8 shows the time per MC step for our implementations of the expansion based methods and for DL_MONTE as a function of the number of charges N. The size of the system varies between N = 10³ and N = 10⁶ charges and the reported times are averaged over 1000 Metropolis-Hastings steps (i.e. 1000 proposed moves). We do not include the setup times, since those account for an insignificant fraction of the runtime for "real" simulations with a large number of Metropolis-Hastings steps.
Figure 8: Time per Metropolis-Hastings step for our implementations of the expansion based methods and DL_MONTE. In the latter case the cutoff was kept fixed at r_c = 12 Å for all problem sizes, except for the right-most data point where results with an optimised cutoff of r_c = 28 Å are also shown as a solid diamond.

For small systems DL_MONTE provides more performance than our implementations, but it is almost an order of magnitude slower for the largest considered problem size (N = 10⁶). While empirically the cost of the expansion based methods grows only extremely slowly with the problem size, this is not the case for DL_MONTE, for which the runtime is approximately proportional to the number of particles N. This O(N) growth can be explained as follows: for the direct Ewald summation method with a short-range cutoff r_c, short-range interactions with O(ρ r_c³) particles have to be computed for each proposed move in a system with density ρ. To balance errors in the short- and long-range computation, the physical cutoff in Fourier space and r_c are inversely related. As a consequence, the number of Fourier modes that have to be considered grows like O(V r_c⁻³). Hence, for fixed density, where V = N/ρ, the combined cost of the short- and long-range contributions is

    Cost_Ewald(r_c, N) = O(r_c³ + N r_c⁻³).    (7)

This cost can be minimised by varying the cutoff such that r_c = r_c^(0) N^{1/6} for some constant r_c^(0), as discussed for example in [19] (see also [13, Chapter 12]), resulting in a total cost of Cost_Ewald(r_c^(0) N^{1/6}, N) = O(N^{1/2}) per proposal for the Ewald method. However, in the current version of DL_MONTE the short-range cutoff for the Ewald summation has to be identical to the cutoff for Lennard-Jones interactions, which we fixed at r_c = 12 Å to obtain the results in Fig. 8. As Eq. (7) shows, the cost of the Ewald summation is then dominated by the long range contribution and grows in proportion to N.
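The optimisation of the cost model in Eq. (7) can be reproduced numerically; in the sketch below the prefactors a and b are arbitrary placeholders (only the scaling matters):

```python
def ewald_cost(r_c, N, a=1.0, b=1.0):
    """Cost model of Eq. (7): short-range term a*r_c**3 plus long-range
    (Fourier) term b*N/r_c**3."""
    return a * r_c**3 + b * N / r_c**3

def optimal_cutoff(N, a=1.0, b=1.0):
    """Minimiser of ewald_cost: balancing the two terms gives
    r_c = (b*N/a)**(1/6), i.e. r_c proportional to N**(1/6)."""
    return (b * N / a) ** (1.0 / 6.0)
```

At the optimum both terms are equal, so the cost per proposal scales as 2√(abN) = O(N^{1/2}); with a fixed cutoff the second term alone already reproduces the O(N) growth observed for DL_MONTE.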
While this is a current limitation of DL_MONTE and not a fundamental issue, it is worth stressing that even if this problem is overcome, Ewald summation has a computational complexity of O(N^{1/2}) compared to the O(log(N)) complexity of our hierarchical methods. To demonstrate the effect of a more optimal short-range cutoff we varied the cutoff radius r_c between 12 Å and 32 Å and found that a cutoff of r_c = 28 Å gives near optimal results. As shown by the rightmost datapoints in Figure 8, where the results obtained with r_c = 28 Å are indicated by a solid diamond, this significantly reduces the time per MH step for N ≈ 10⁶ charges. This value might be reduced further in future releases of DL_MONTE, if the different cutoffs can be varied independently to avoid the evaluation of short range potentials with an unnaturally large cutoff.

Finally, note that in its current implementation the setup cost of DL_MONTE grows like O(N²) instead of the O(N^{3/2}) which could be achieved for an optimal r_c ∝ N^{1/6}. Although not considered here, this O(N²) setup time can become computationally significant for systems containing a large number of charges.

To model the speedup of our method relative to the one in [8] for large N we fit the measured data in Figure 7 using the following expressions:

    Propose, local expansion method (Algorithm L.1):  T^(loc)_prop(N) = τ^(loc)_prop + γ^(loc)_prop · log(N)
    Accept, local expansion method (Algorithm L.2):   T^(loc)_acc(N)  = τ^(loc)_acc  + γ^(loc)_acc  · log(N)
    Propose, multipole method (Algorithm M.1):        T^(mp)_prop(N)  = τ^(mp)_prop  + γ^(mp)_prop  · log(N)
    Accept, multipole method (Algorithm M.2):         T^(mp)_acc(N)   = τ^(mp)_acc   + γ^(mp)_acc   · log(N)    (8)

Figure 9: Theoretical speedup S(N), in dashed blue, for different acceptance rates ν as defined in Eq. (11); the asymptotic speedup S∞, in solid blue, is given in Eq. (12).
The measured speedup, in solid red, is computed from the DL_MONTE comparisons plotted in Figure 8.

In Eq. (8) the superscripts "(loc)" and "(mp)" denote local and multipole expansions respectively and the subscripts "prop" and "acc" denote propose and accept operations respectively. The numerical values of the fit coefficients are approximately

    τ^(loc)_prop ≈ 253 µs,  γ^(loc)_prop ≈ 32 µs,  τ^(loc)_acc ≈ 127 µs,  γ^(loc)_acc ≈ 239 µs,    (9)
    τ^(mp)_prop ≈ 0 µs (small and negative),  γ^(mp)_prop ≈ 247 µs,  τ^(mp)_acc ≈ 410 µs,  γ^(mp)_acc ≈ 21 µs.    (10)

The empirical model in Eq. (8) allows us to predict the expected speedup for a given problem size N as

    S(N) = (T^(mp)_prop(N) + ν T^(mp)_acc(N)) / (T^(loc)_prop(N) + ν T^(loc)_acc(N)).    (11)

The asymptotic speedup is

    S∞ = lim_{N→∞} S(N) = (γ^(mp)_prop + ν γ^(mp)_acc) / (γ^(loc)_prop + ν γ^(loc)_acc).    (12)

In Figure 9 we plot the theoretical and measured speedup for a fixed acceptance rate ν; the asymptotic upper bound S∞ on the speedup is also shown. As Figure 9 demonstrates, the fit reproduces the measured speedup reasonably well. The expected asymptotic speedup for large N is almost 2, demonstrating that our method has the potential to approximately halve the runtime compared to the algorithm in [8].

Conclusion

In this paper we presented a new hierarchical method for accurately including electrostatic interactions in Monte Carlo simulations. Our algorithm has a computational complexity of O(log(N)) per Metropolis-Hastings step. Compared to the related technique in [8], our method reduces the average cost of a MH step as the balance of work between the propose and accept operations is more favourable for typical acceptance rates. Numerically we find that runtimes are reduced by around 30% for systems with N = 10⁶ charges, with the potential of a speedup of just below 2× for larger values of N. We demonstrated numerically that our implementation will effectively scale to systems containing 10⁶ charges whilst maintaining the expected computational complexity of O(log(N)) per MH step.
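The speedup model of Eqs. (11) and (12) can be evaluated directly. The sketch below uses rounded versions of the fit coefficients from Eqs. (9) and (10) (the small negative τ^(mp)_prop is approximated by zero, an assumption on our part) together with the measured acceptance rate ν ≈ 1/2.438 from the previous section:

```python
import math

# Rounded fit coefficients (microseconds) for T(N) = tau + gamma * log(N)
TAU = {("loc", "prop"): 253.0, ("loc", "acc"): 127.0,
       ("mp", "prop"): 0.0, ("mp", "acc"): 410.0}
GAMMA = {("loc", "prop"): 32.0, ("loc", "acc"): 239.0,
         ("mp", "prop"): 247.0, ("mp", "acc"): 21.0}

def T(method, stage, N):
    """Fitted time per operation, Eq. (8)."""
    return TAU[(method, stage)] + GAMMA[(method, stage)] * math.log(N)

def speedup(N, nu):
    """Expected speedup S(N) of the local over the multipole method, Eq. (11):
    ratio of the average costs per MH step (one propose plus, with
    probability nu, one accept)."""
    return (T("mp", "prop", N) + nu * T("mp", "acc", N)) / \
           (T("loc", "prop", N) + nu * T("loc", "acc", N))

def asymptotic_speedup(nu):
    """Large-N limit S_inf of Eq. (12); the constant offsets tau drop out."""
    return (GAMMA[("mp", "prop")] + nu * GAMMA[("mp", "acc")]) / \
           (GAMMA[("loc", "prop")] + nu * GAMMA[("loc", "acc")])
```

With these numbers S∞ is close to 2 for ν ≈ 0.41, matching the "almost 2" asymptotic speedup discussed above; for ν = 1 (every move accepted) the advantage disappears, as expected from the cost balance between the propose and accept stages of the two methods.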
As the direct Ewald summation technique has a higher complexity of at least O(√N), our implementation becomes more efficient for simulations with large numbers of particles: for N = 10⁶ it is about an order of magnitude faster than the current DL_MONTE implementation.

There are several avenues for future work. One obvious shortcoming of the present implementation is the lack of parallelisation. While Monte Carlo simulations are "embarrassingly parallel", and several Markov chains can be generated in parallel without communications, this increases memory requirements. In our method this issue could be reduced to a certain degree by parallelising over the levels in the grid hierarchy. This is possible since the cost on each level is constant, and computations can be carried out independently, before summing the total energy in the propose stage.

Furthermore, here we have only considered single-particle moves and future work should also investigate other Monte Carlo transitions. For example, in an isothermal-isobaric ensemble the pressure is fixed but the volume of the simulation cell varies. In this case a proposed move could consist of a change of the system volume. In the worst case scenario, the energy change of such a proposed volume move is computed with a full solve for the electrostatic energy of the proposed state. When using an Ewald based approach this full solve incurs an O(N^{3/2}) cost per MH step; with multipole- or local expansion based approaches, however, this can potentially be reduced to an O(N) cost.

Acknowledgements
This research made use of the Balena High Performance Computing (HPC) Service at the University of Bath. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreements No 646176 and No 824158.
References

[1] G. A. Cisneros, M. Karttunen, P. Ren, C. Sagui, Classical electrostatics for biomolecular simulations, Chemical Reviews 114 (1) (2013) 779–814. doi:10.1021/cr300461d.
[2] M. Jorge, N. A. Seaton, Long-range interactions in Monte Carlo simulation of confined water, Molecular Physics 100 (13) (2002) 2017–2023. doi:10.1080/00268970110099585.
[3] C. K. Sandalci, C. Koc, S. M. Goodnick, Three-dimensional Monte Carlo device simulation with parallel multigrid solver, International Journal of High Speed Computing 9 (03) (1997) 223–236. doi:10.1142/S0129053397000143.
[4] P. P. Ewald, Die Berechnung optischer und elektrostatischer Gitterpotentiale, Annalen der Physik 369 (3) (1921) 253–287. doi:10.1002/andp.19213690304.
[5] L. Greengard, V. Rokhlin, A fast algorithm for particle simulations, Journal of Computational Physics 73 (2) (1987) 325–348. doi:10.1016/0021-9991(87)90140-9.
[6] L. Greengard, The rapid evaluation of potential fields in particle systems, MIT Press, 1988.
[7] L. Greengard, V. Rokhlin, A new version of the fast multipole method for the Laplace equation in three dimensions, Acta Numerica 6 (1997) 229–269. doi:10.1017/S0962492900002725.
[8] T. A. Höft, B. K. Alpert, Fast updating multipole Coulombic potential calculation, SIAM Journal on Scientific Computing 39 (3) (2017) A1038–A1061. doi:10.1137/16M1096189.
[9] W. R. Saunders, J. Grant, E. H. Müller, A domain specific language for performance portable molecular dynamics algorithms, Computer Physics Communications 224 (2018) 119–135. doi:10.1016/j.cpc.2017.11.006.
[10] J. Purton, J. C. Crabtree, S. Parker, DL_MONTE: a general purpose program for parallel Monte Carlo simulation, Molecular Simulation 39 (14-15) (2013) 1240–1252. doi:10.1080/08927022.2013.839871.
[11] A. V. Brukhno, J. Grant, T. L. Underwood, K. Stratford, S. C. Parker, J. A. Purton, N. B. Wilding, DL_MONTE: a multipurpose code for Monte Carlo simulation, Molecular Simulation (2019) 1–21. doi:10.1080/08927022.2019.1569760.
[12] Z. Gan, Z. Xu, Efficient implementation of the Barnes-Hut octree algorithm for Monte Carlo simulations of charged systems, Science China Mathematics 57 (7) (2014) 1331–1340. doi:10.1007/s11425-014-4783-5.
[13] D. Frenkel, B. Smit, Understanding molecular simulation: From algorithms to applications, Vol. 1, Academic Press, San Diego/London, 2001.
[14] J. Rottler, A. C. Maggs, A continuum, O(N) Monte Carlo algorithm for charged particles, The Journal of Chemical Physics 120 (7) (2004) 3119–3129. doi:10.1063/1.1642590.
[15] W. R. Saunders, J. Grant, E. H. Müller, I. Thompson, Fast electrostatic solvers for kinetic Monte Carlo simulations, Journal of Computational Physics (2020) 109379. doi:10.1016/j.jcp.2020.109379.
[16] W. R. Saunders, J. Grant, E. H. Müller, Long Range Forces in a Performance Portable Molecular Dynamics Framework, in: Parallel Computing is Everywhere, 2018, pp. 37–46. doi:10.3233/978-1-61499-843-3-37.
[17] W. R. Saunders, J. Grant, E. H. Müller, Code and Data Release: Monte Carlo simulations with fast and accurate electrostatics (Jun. 2020). doi:10.5281/zenodo.3873307. URL https://doi.org/10.5281/zenodo.3873307
[18] CCP5, DL_POLY 4 TEST01, ftp://ftp.dl.ac.uk/ccp5/DL_POLY/DL_POLY_4.0/DATA/, [Online; accessed 01/04/2018] (2018).
[19] J. Kolafa, J. W. Perram, Cutoff errors in the Ewald summation formulae for point charge systems, Molecular Simulation 9 (5) (1992) 351–368. doi:10.1080/08927029208049126.