Evaluating the Implicit Midpoint Integrator for Riemannian Manifold Hamiltonian Monte Carlo
James A. Brofos    Roy R. Lederman

Abstract
Riemannian manifold Hamiltonian Monte Carlo is traditionally carried out using the generalized leapfrog integrator. However, this integrator is not the only choice and other integrators yielding valid Markov chain transition operators may be considered. In this work, we examine the implicit midpoint integrator as an alternative to the generalized leapfrog integrator. We discuss advantages and disadvantages of the implicit midpoint integrator for Hamiltonian Monte Carlo, its theoretical properties, and an empirical assessment of the critical attributes of such an integrator for Hamiltonian Monte Carlo: energy conservation, volume preservation, and reversibility. Empirically, we find that while leapfrog iterations are faster, the implicit midpoint integrator has better energy conservation, leading to higher acceptance rates, as well as better conservation of volume and better reversibility, arguably yielding a more accurate sampling procedure.
1. Introduction
Riemannian manifold Hamiltonian Monte Carlo (RMHMC) is a powerful algorithm for sampling from Bayesian posterior distributions (Girolami & Calderhead, 2011). Given a log-posterior function L : Rᵐ → R and a Riemannian metric G : Rᵐ → Rᵐˣᵐ (with the condition that G(q) is positive definite for each q ∈ Rᵐ), RMHMC considers the Hamiltonian dynamics corresponding to the Hamiltonian,

    H(q, p) = −L(q) + ½ pᵀ G⁻¹(q) p + ½ log det(G(q)).    (1)

Riemannian metrics are incorporated into HMC in order to precondition dynamics and more efficiently explore the distribution. Irrespective of the choice of metric, the form of the

Department of Statistics and Data Science, Yale University. Correspondence to: James A. Brofos <[email protected]>.

Figure 1.
Trajectories computed by the leapfrog integrator and the implicit midpoint integrator for a quadratic, separable Hamiltonian H(q, p) = qᵀq/2 + pᵀp/2. Both integrators are stable, but the leapfrog integrator deviates from the level sets of the Hamiltonian (faint blue circles) whereas every iterate of the implicit midpoint integrator lies on the same energy level set as the previous one.

Hamiltonian in eq. (1) corresponds to a Gibbs distribution proportional to exp(−H(q, p)) and a tractable conditional distribution p | q ∼ Normal(0, G(q)). However, the form of this Hamiltonian is such that it cannot be written as the sum of two functions, each a function of q or p alone; such a Hamiltonian is called "non-separable." The presence of a non-separable Hamiltonian presents unique challenges for numerical integration.

The leapfrog integrator and its variants are a ubiquitous choice for the numerical integration of Hamiltonian mechanics for HMC; for instance see (Betancourt, 2012; Brubaker et al., 2012; Byrne & Girolami, 2013; Girolami & Calderhead, 2011; Neal, 2010; Tripuraneni et al., 2017), among many others. It may, therefore, not be apparent that numerical integrators other than the leapfrog method are applicable to HMC, provided that they exhibit two properties:

(i) The integrator has a unit Jacobian determinant so that it preserves volume in (q, p)-space.

(ii) The integrator is symmetric under negation of the momentum variable.

These properties are sufficient to prove that HMC satisfies detailed balance, which in turn establishes that the stationary distribution has density proportional to exp(−H(q, p)); see Bishop (2006); Neal (2010), or appendix H for a proof.

Why does the choice of numerical integrator matter?
There are at least three reasons.

(a) Numerical integrators differ with respect to energy conservation and stability. The acceptance probability of HMC depends on the energy conservation, and the ability of the HMC proposal to use large integration steps depends on stability.

(b) Numerical integrators may only satisfy properties (i) and (ii) above approximately, particularly if the integrators are defined as solutions to implicitly-defined equations. These approximate solvers of implicit equations will be discussed in more detail below. Typically, the error of these methods will depend on a convergence tolerance δ used to find fixed points of the integration step (algorithm 1). For a non-zero convergence tolerance, the degree to which properties (i) and (ii) are violated will depend on the integrator and the tolerance.

(c) Numerical integrators will differ in their efficiency in the sense that there may be structural properties of the Hamiltonian system that the integrator exploits. More efficient integrators will exhibit higher effective sample sizes per second when used in HMC.

The contribution of this work is to compare and contrast the generalized leapfrog integrator with the implicit midpoint method in application to RMHMC. We consider RMHMC because the non-separable Hamiltonian necessitates elaborate integration schemes which require solving implicitly-defined equations; this is in contrast to Euclidean HMC with constant G, which produces a separable Hamiltonian that can be integrated explicitly. First, we compare the two integrators on energy conservation, volume preservation, and reversibility, as discussed in reason (a). Second, we study the breakdown of exact satisfaction of properties (i) and (ii) in implicitly-defined integrators, as described in reason (b). Third, we consider multiple variants of the generalized leapfrog and implicit midpoint integrators that exhibit different efficiencies, relevant to reason (c).
We conclude that the implicit midpoint integrator exhibits superior energy conservation, conservation of volume, and symmetry compared to the generalized leapfrog integrator. We explore sophisticated Bayesian inference tasks wherein the implicit midpoint integrator is competitive with, or exceeds, the time-normalized performance of the generalized leapfrog method. We therefore argue that the implicit midpoint integrator is a procedure worth consideration in RMHMC.
2. Background
For Bayesian inference tasks, L is the sum of the log-likelihood and the log-prior; in this circumstance, the typical form of the Riemannian metric G is the sum of the Fisher information of the log-likelihood and the negative Hessian of the log-prior; this choice of Riemannian metric is motivated by information geometry (Amari, 2016). The Hamiltonian in eq. (1) leads to the equations of motion,

    q̇ᵢ = ∂H(q, p)/∂pᵢ = Σ_{j=1}^m G⁻¹ᵢⱼ(q) pⱼ    (2)

    ṗᵢ = −∂H(q, p)/∂qᵢ = ∂L(q)/∂qᵢ − ½ trace(G⁻¹(q) ∂G(q)/∂qᵢ) + ½ pᵀ G⁻¹(q) (∂G(q)/∂qᵢ) G⁻¹(q) p    (3)

As stated in section 1, the standard integrator for RMHMC is the (generalized) leapfrog integrator. A naive implementation of a single step of the generalized leapfrog integrator with step-size ε and initial position (q, p) is presented in algorithm 2. Notice that eqs. (7) and (8) are implicitly defined in the sense that the quantities appearing on the left-hand side also appear on the right-hand side; these equations are typically solved to a given tolerance δ ≥ 0 (in the sense defined in the fixed point iteration procedure, algorithm 1). When δ = 0, the generalized leapfrog integrator satisfies properties (i) and (ii); however, in practice, the tolerance is often chosen to be larger than machine precision in order to reduce the number of fixed point iterations, and properties (i) and (ii) are therefore no longer satisfied exactly.

The implicit midpoint method, an alternative to the generalized leapfrog integrator, is presented in algorithm 3; the implicit midpoint integrator also involves the solution of an implicitly-defined equation, eq. (10). When δ = 0, it is well-known that the implicit midpoint integrator satisfies property (i); see Leimkuhler & Reich (2005). It also satisfies property (ii) for Hamiltonians of the form eq. (1); see appendix A.

We turn now to discussing a theoretical property of numerical integrators related to conserved quantities.

Definition 1.
Let z = (q, p), where the time evolution of qᵢ is given by eq. (2) and of pᵢ by eq. (3) for i = 1, …, m. A conserved quantity of z is a real-valued function z ↦ G(z) for which (d/dt) G(z) = 0.

It is important to notice that definition 1 is a statement about the underlying dynamics and has nothing to do with the integrator used to approximate these dynamics. The properties of the integrators will be discussed in the next paragraph. For Hamiltonian systems, the canonical example

Algorithm 1 (Fixed Point Iteration)
Procedure for solving the equation z = f(z) via fixed point iteration to a given tolerance.

Input: function f : Rᵐ → Rᵐ, initial guess z ∈ Rᵐ, fixed point convergence tolerance δ ≥ 0.

Set Δz = ∞ and z′ = z.

While Δz > δ:
    z″ = f(z′)    (4)
    Δz = max_{i ∈ {1,…,m}} |z″ᵢ − z′ᵢ|    (5)
    z′ = z″    (6)

Return: z′ ∈ Rᵐ.

Algorithm 2 (G.L.F.(a))
The procedure for a single step of integrating Hamiltonian dynamics using the generalized leapfrog integrator.

Input: Hamiltonian H : Rᵐ × Rᵐ → R, initial position and momentum variables (q, p) ∈ Rᵐ × Rᵐ, integration step-size ε ∈ R, fixed-point convergence tolerance δ ≥ 0.

Use algorithm 1 with tolerance δ and initial guess p to solve for p̄,
    p̄ = p − (ε/2) ∇_q H(q, p̄),    (7)
where the right-hand side is f(p̄).

Use algorithm 1 with tolerance δ and initial guess q to solve for q′,
    q′ = q + (ε/2) (∇_p H(q, p̄) + ∇_p H(q′, p̄)),    (8)
where the right-hand side is f(q′).

Compute the explicit update
    p′ = p̄ − (ε/2) ∇_q H(q′, p̄).    (9)

Return: (q′, p′) ∈ Rᵐ × Rᵐ.

of a conserved quantity is the Hamiltonian energy itself; see Marsden & Ratiu (2010). Hamiltonian flows are also symplectic (Hairer et al., 2006), which implies conservation of volume (in the same sense as that of property (i)).

A numerical integrator cannot preserve all of the conserved quantities of the underlying ODE, but it may be able to conserve some simple ones. The following two results may be found in Leimkuhler & Reich (2005).

Theorem 1.
Let z = (q, p). The generalized leapfrog integrator (algorithm 2) with δ = 0 preserves any conserved quantity of the form G(z) = qᵀ A p + bᵀ z, where

Algorithm 3 (I.M.(a))
The procedure for a single step of integrating Hamiltonian dynamics using the implicit midpoint integrator.

Input: Hamiltonian H : Rᵐ × Rᵐ → R, initial position and momentum variables (q, p) ∈ Rᵐ × Rᵐ, integration step-size ε ∈ R, fixed-point convergence tolerance δ ≥ 0.

Use algorithm 1 with tolerance δ and initial guess (q, p) to solve for (q′, p′),
    (q′, p′) = (q, p) + ε (∇_p H(q̄, p̄), −∇_q H(q̄, p̄)),    (10)
where q̄ = (q′ + q)/2 and p̄ = (p′ + p)/2, and the right-hand side is f(q′, p′).

Return: (q′, p′) ∈ Rᵐ × Rᵐ.

A ∈ Rᵐˣᵐ is a symmetric matrix and b ∈ R²ᵐ.

Theorem 2.
Let z = (q, p). The implicit midpoint integrator (algorithm 3) with δ = 0 preserves any conserved quantity of the form G(z) = zᵀ A z + bᵀ z, where A ∈ R²ᵐˣ²ᵐ is a symmetric matrix and b ∈ R²ᵐ.

Notice that theorem 2 covers a strictly more general class of conserved quantities than theorem 1. We come now to a hypothesis that would justify the consideration of the implicit midpoint integrator within the context of HMC. Before stating the hypothesis, we provide some initial motivation for how the implicit midpoint integrator performs in the presence of a quadratic Hamiltonian.

Proposition 1.
Let H(q, p) ≡ H(z) = zᵀ A z be a quadratic Hamiltonian. Then, for any step-size, the proposals generated by Hamiltonian Monte Carlo using the implicit midpoint integrator with δ = 0 will be accepted.

A proof is given in appendix J. Note, however, that perfect conservation of the Hamiltonian energy does not imply that the implicit midpoint integrator is the exact solution of Hamilton's equations of motion. Nevertheless, proposition 1 suggests an important difference between the generalized leapfrog integrator and the implicit midpoint method in terms of their conservation properties. Although Bayesian posterior distributions are unlikely to be Gaussians, it is widely accepted that Gaussian approximations are useful. Such notions materialize, for example, in the central limit theorem and the Laplace approximation. Provided the posterior is approximately Gaussian, this leads us to the following hypothesis.

Hypothesis.
The implicit midpoint algorithm will exhibit higher acceptance probabilities than the generalized leapfrog integrator for the same step-size.
If true, and if the fixed point iterations required by the implicit midpoint procedure are not too burdensome relative to the generalized leapfrog integrator, then the higher acceptance rate may produce more favorable effective sample sizes for the Markov chain whose transitions are computed using the implicit midpoint algorithm. In this scenario, the implicit midpoint integrator may be worth consideration as an alternative to the generalized leapfrog integrator.

For a brief introduction to the stability of numerical integrators, see appendix D.
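To make algorithms 1 and 2 concrete, the following is a minimal NumPy sketch of the fixed point iteration and a single generalized leapfrog step. The iteration cap max_iter and the default tolerance are safeguards of ours and not part of the algorithms as stated, and the gradient callables are assumed to be supplied by the user.

```python
import numpy as np

def fixed_point(f, z0, delta=1e-12, max_iter=1000):
    """Algorithm 1: iterate z'' = f(z') until max_i |z''_i - z'_i| <= delta."""
    z = np.asarray(z0, dtype=float)
    for _ in range(max_iter):
        z_new = f(z)
        if np.max(np.abs(z_new - z)) <= delta:
            return z_new
        z = z_new
    return z  # iteration cap reached (a safeguard, not part of algorithm 1)

def glf_step(grad_q_H, grad_p_H, q, p, eps, delta=1e-12):
    """Algorithm 2: one generalized leapfrog step of size eps."""
    # Eq. (7): implicit half-step in the momentum.
    p_half = fixed_point(lambda pb: p - 0.5 * eps * grad_q_H(q, pb), p, delta)
    # Eq. (8): implicit full step in the position.
    q_new = fixed_point(
        lambda qn: q + 0.5 * eps * (grad_p_H(q, p_half) + grad_p_H(qn, p_half)),
        q, delta)
    # Eq. (9): explicit half-step in the momentum.
    p_new = p_half - 0.5 * eps * grad_q_H(q_new, p_half)
    return q_new, p_new
```

For a separable Hamiltonian the implicit equations become explicit and the scheme reduces to the ordinary leapfrog; for eq. (1) the gradients couple q and p and the fixed point iterations do real work. Stepping forward, negating the momentum, and stepping again recovers the initial state up to the tolerance δ, which is the reversibility property (ii).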
3. Related Work
Most relevant to our discussion is Pourzanjani & Petzold (2019). In this work, the authors examine the relationship between the (non-generalized) leapfrog integrator and the implicit midpoint integrator; the authors make the argument that the implicit midpoint integrator is more stable in the presence of posteriors whose dimensions exhibit large differences in their variability ("multi-scale"). The presence of multi-scale posterior dimensions necessitates a small step-size for the leapfrog integrator, which is found to be unnecessary for the implicit midpoint algorithm. As the authors note, however, "RMHMC uses local Hessian evaluations of the potential energy surface to adaptively change this step-size based on the local curvature;" therefore, their experiments instead focus on the circumstance where a constant mass matrix is utilized, corresponding to Euclidean HMC with no local adaptation of the step-size. Indeed, as observed in Martens (2020), the Fisher information captures the second-order geometry of the posterior and actually exhibits properties that make it preferable to the Hessian of the posterior in optimization. Therefore, the present work differs from Pourzanjani & Petzold (2019) in its focus on Riemannian geometry, wherein the metric compensates (at least locally) for multi-scale dimensions; moreover, our empirical analysis of reversibility and volume preservation is, to the best of our knowledge, novel. Before proceeding to the experimental results, we note that stability alone cannot account for the high acceptance rate enjoyed by the implicit midpoint integrator: even in the regime wherein the generalized leapfrog integrator is stable, it is not able to perfectly conserve the Hamiltonian energy as the implicit midpoint integrator does. This phenomenon is visualized in fig. 1.
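The contrast in fig. 1 is easy to reproduce numerically. The sketch below, using the same quadratic, separable Hamiltonian H(q, p) = qᵀq/2 + pᵀp/2 as fig. 1, takes repeated steps with the ordinary leapfrog and with the implicit midpoint rule (solved by fixed point iteration) and tracks the worst-case energy error along each trajectory; the step-size and step count are illustrative choices of ours.

```python
import numpy as np

def leapfrog_step(q, p, eps):
    """Standard leapfrog for H(q, p) = q^T q / 2 + p^T p / 2."""
    p = p - 0.5 * eps * q
    q = q + eps * p
    p = p - 0.5 * eps * q
    return q, p

def midpoint_step(q, p, eps, delta=1e-13, max_iter=1000):
    """Implicit midpoint rule (algorithm 3) for the same Hamiltonian."""
    qn, pn = q, p  # initial guess for the updated state
    for _ in range(max_iter):
        qm, pm = 0.5 * (q + qn), 0.5 * (p + pn)      # midpoint
        qn_next, pn_next = q + eps * pm, p - eps * qm
        if max(np.max(np.abs(qn_next - qn)),
               np.max(np.abs(pn_next - pn))) <= delta:
            return qn_next, pn_next
        qn, pn = qn_next, pn_next
    return qn, pn

H = lambda q, p: 0.5 * (q @ q + p @ p)
q_lf = q_im = np.array([1.0])
p_lf = p_im = np.array([0.0])
e0 = H(q_lf, p_lf)
lf_err = im_err = 0.0
for _ in range(100):
    q_lf, p_lf = leapfrog_step(q_lf, p_lf, 0.5)
    q_im, p_im = midpoint_step(q_im, p_im, 0.5)
    lf_err = max(lf_err, abs(H(q_lf, p_lf) - e0))
    im_err = max(im_err, abs(H(q_im, p_im) - e0))
# lf_err oscillates at a visible magnitude, while im_err stays at the
# level of the fixed-point tolerance, mirroring fig. 1.
```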
4. Experimental Results
We turn now to evaluating the implicit midpoint integrator in several Bayesian inference tasks. We consider inference in a banana-shaped posterior, sampling from Neal's funnel distribution, a stochastic volatility model, and Bayesian inference in the Fitzhugh-Nagumo differential equation model. We have additional experimental results in our appendices. In appendix E, we seek to verify theorem 2 in the presence of a truly quadratic Hamiltonian. In appendix F, we examine Bayesian inference in a logistic regression posterior. To define a stopping condition for the fixed point iterations used by the implicit midpoint and generalized leapfrog methods, we demand that the change in each coordinate be less than a threshold; we let δ take three values of the form 1 × 10⁻ᵏ when considering reversibility and volume preservation. When reporting performance metrics such as effective sample size, we report results corresponding to a single one of these thresholds. We implemented all methods in 64-bit precision using NumPy and SciPy (Harris et al., 2020; Virtanen et al., 2020). We compute effective sample sizes (ESS) using Kumar et al. (2019).

We consider two variants of the generalized leapfrog method and two variants of the implicit midpoint integrator, which we summarily describe as follows.
G.L.F.(a) An implementation of the generalized leapfrog integrator as presented in algorithm 2.

G.L.F.(b) An implementation of the generalized leapfrog integrator that caches repeated calculations and which is specific to Hamiltonians in the form of eq. (1). See algorithm 4 in appendix I. G.L.F.(b) is mathematically identical to G.L.F.(a), but this implementation avoids some redundant computation. Differences between the outputs of G.L.F.(a) and G.L.F.(b) are due to random seeds and machine error in computation.

I.M.(a) An implementation of the implicit midpoint integrator as presented in algorithm 3.

I.M.(b) An implementation of the implicit midpoint integrator that implicitly computes the midpoint followed by an explicit Euler step, as advocated by Leimkuhler & Reich (2005). See algorithm 5 in appendix I.

In all of our implementations, we use fixed point iterations in order to find solutions to implicitly-defined relations. This is the approach advocated by Hairer et al. (2006). Additional details are presented in appendix I.
The banana-shaped distribution was proposed in a discus-sion to (Girolami & Calderhead, 2011) as a representativeexample of the ridge-like posterior structure that can mani-fest in non-identifiable models. The banana-shaped distribu-tion is defined by the following generative model. y i | θ , θ ∼ Normal( θ + θ , σ y ) for i = 1 , . . . , n (11) θ i ∼ Normal(0 , σ θ ) for i ∈ { , } . (12) valuating the Implicit Midpoint Integrator for Riemannian Manifold Hamiltonian Monte Carlo Acc. Prob. Mean ESS Min. ESS Mean ESS / Sec. Min. ESS / Sec.Step Size Num. Steps Method0.1 5 G.L.F.(a) . ± .
Table 1.
Comparison of the implicit midpoint and generalized leapfrog integrators on sampling from the banana-shaped distribution. To assess performance of the sampler, we measure the effective sample size (ESS) and present per-second timing comparisons for the mean and minimum ESS. The hypothesis that the implicit midpoint integrator should exhibit better energy conservation is captured in the acceptance probability of the Markov chain. Results are averaged over ten trials.

(a)–(c): Three configurations of step-size and number of integration steps.

Figure 2.
Comparison between the error in symmetry and error in volume preservation properties of the implicit midpoint integrator and the generalized leapfrog integrator on the banana-shaped distribution. We observe that the implicit midpoint integrator tends to produce transitions whose median reversibility and volume preservation can be an order of magnitude, or more, better than the generalized leapfrog integrator. Each point is the median of one-hundred measurements of symmetry and reversibility, shown for each of the ten trials. At the larger step-size with fifty integration steps, the generalized leapfrog integrator exhibits severely divergent behavior. The implicit midpoint is represented by the symbol (·) and the generalized leapfrog by the symbol (+).

For the banana-shaped distribution, the Riemannian metric is

    G(θ₁, θ₂) = [ n/σ_y² + 1/σ_θ²    2nθ₂/σ_y²
                  2nθ₂/σ_y²          4nθ₂²/σ_y² + 1/σ_θ² ].    (13)

In our experiments, we take n = 100. We generate observations {y₁, …, y₁₀₀} from the banana-shaped distribution by setting θ₁ = 1/2, θ₂ = 1/√2, and σ_y = σ_θ = 2. We then attempt to sample the posterior distribution of (θ₁, θ₂) using RMHMC when integration is performed using the implicit midpoint algorithm or the generalized leapfrog method. We consider two step-sizes and a number of integration steps in {5, 10, 50}. We attempt to draw 10,000 samples from the posterior. Each of these configurations is replicated ten times.

Results are shown in table 1, demonstrating that the I.M.(a) and (b) integrators are able to maintain high energy conservation at step-sizes for which the G.L.F.(a) and (b) variants cannot. As a consequence, Markov chains using I.M.(a) or I.M.(b) are able to achieve very high effective sample sizes (ESS); moreover, because the cost of evaluating the gradients of the banana-shaped posterior is not too large, these Markov chains also exhibit superior performance on the timing comparisons. We find that I.M.(a) and I.M.(b) perform similarly.
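Concretely, the metric of eq. (13) is simple to implement. The sketch below uses the reconstructed experimental values n = 100 and σ_y = σ_θ = 2 as defaults; positive definiteness follows because the Fisher term is positive semi-definite and the prior contributes 1/σ_θ² to the diagonal.

```python
import numpy as np

def banana_metric(theta, n=100, sigma_y=2.0, sigma_theta=2.0):
    """Riemannian metric of eq. (13): Fisher information of the likelihood
    plus the contribution of the Gaussian prior on (theta_1, theta_2)."""
    t2 = theta[1]
    sy2, st2 = sigma_y ** 2, sigma_theta ** 2
    return np.array([
        [n / sy2 + 1.0 / st2, 2.0 * n * t2 / sy2],
        [2.0 * n * t2 / sy2, 4.0 * n * t2 ** 2 / sy2 + 1.0 / st2],
    ])
```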
In addition to energy conservation, essential components of HMC are volume preservation and reversibility (recall properties (i) and (ii) from section 1). Using the samples drawn by the Markov chains with either integrator, we may compute numerical estimates of the degree to which these properties are satisfied. We give a detailed description of the volume preservation and reversibility metrics in appendix G. We use one-hundred randomly selected samples generated from the Markov chains in order to compute these statistics. Results showing the median reversibility versus the median difference from unit Jacobian are shown in fig. 2. These results show that the median symmetry and volume preservation of the implicit midpoint integrator are approximately an order of magnitude more faithfully preserved than is the case for the generalized leapfrog method.

As an example of a hierarchical Bayesian posterior, we consider Neal's funnel distribution defined by,

    xᵢ | v ∼ Normal(0, exp(−v)) for i = 1, …, k    (14)
    v ∼ Normal(0, σ_v²).    (15)

Due to the hierarchical structure of the distribution, the Hessian of the log-density is not everywhere positive definite and therefore cannot be used to construct a Riemannian metric on its own. Instead, we follow the approach proposed in Betancourt (2012) and adopt the SoftAbs transformation of the Hessian in order to construct a positive definite Riemannian metric. This allows us to sample all variables of the hierarchical distribution jointly. For RMHMC, we consider three integration step-sizes and we attempt to draw 10,000 samples of (x₁, …, x_k, v) from Neal's funnel distribution.

Results are presented in table 2. For the largest step-size, the I.M.(a) and (b) integrators are able to maintain high acceptance probabilities.
Markov chains using the I.M.(a) or (b) method achieve the best minimum ESS per-second. This example presents a circumstance wherein the time-normalized best-case performance of mean and minimum effective sample sizes did not co-occur in the same parameter configuration. Nevertheless, when optimizing for the highest mean ESS per-second, the I.M.(a) and (b) methods also outperform the G.L.F.(a) integrator. We visualize the symmetry and volume preservation in fig. 3; the implicit midpoint integrator exhibits better symmetry and volume preservation for the same convergence criterion.
While Neal's funnel is a hierarchical distribution, it is not sampled in a hierarchical manner; instead, all variables are sampled jointly using the SoftAbs Riemannian metric (Betancourt, 2012). Here, we consider a stochastic volatility model whose posterior includes the stochastic volatilities as well as latent hyperparameters of the model; we will sample these variables using an alternating Gibbs procedure. Following Girolami & Calderhead (2011), the stochastic volatility model is defined, for t = 1, …, T, by,

    y_t | β, x_t ∼ Normal(0, β² exp(x_t))    (16)
    x_t | x_{t−1}, φ, σ ∼ Normal(φ x_{t−1}, σ²)    (17)

where x₁ ∼ Normal(0, σ²/(1 − φ²)), π(β) ∝ 1/β, σ² follows a scaled inverse-χ² prior with 10 degrees of freedom, and (φ + 1)/2 follows a Beta prior with first shape parameter 20, following Girolami & Calderhead (2011). The sampler proceeds by alternating between sampling the conditional posteriors of (x₁, …, x_T) | (y₁, …, y_T), φ, β, σ and φ, β, σ | (x₁, …, x_T), (y₁, …, y_T). The Riemannian metric of the first posterior is constant with respect to (x₁, …, x_T); therefore, sampling is carried out using the standard leapfrog integrator. The second distribution has a position-dependent Riemannian metric, necessitating the use of implicitly-defined integrators; here, we compare the implicit midpoint and generalized leapfrog integrators. For details of the Riemannian structures of the conditional posteriors, see Girolami & Calderhead (2011).

In our experiments, we set T = 1,000 and use fifty integration steps to sample (x₁, …, x_T) | (y₁, …, y_T), φ, β, σ and six integration steps to sample φ, β, σ | (x₁, …, x_T), (y₁, …, y_T). We draw samples from the posterior after a burn-in period, and repeat this experiment one-hundred times for each integrator. Effective sample size metrics and measures of the volume preservation and symmetry are presented in table 3 for the parameters φ, β, and σ.
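For reference, forward-simulating the generative model of eqs. (16)–(17) is straightforward; the parameter values in the usage line below are illustrative stand-ins of ours, not the settings used to produce the paper's datasets.

```python
import numpy as np

def simulate_sv(T, beta, phi, sigma, rng):
    """Draw (x, y) from the stochastic volatility model of eqs. (16)-(17)."""
    x = np.empty(T)
    # Stationary initialization: x_1 ~ Normal(0, sigma^2 / (1 - phi^2)).
    x[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi ** 2))
    for t in range(1, T):
        x[t] = rng.normal(phi * x[t - 1], sigma)       # eq. (17)
    y = rng.normal(0.0, beta * np.exp(0.5 * x))        # eq. (16)
    return x, y

x, y = simulate_sv(1000, beta=0.65, phi=0.98, sigma=0.15,
                   rng=np.random.default_rng(0))
```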
We find that the I.M.(a) and (b) integrators are comparable to the G.L.F.(a) and (b) methods in terms of their time-normalized performance. However, volume preservation and symmetry are much better for the I.M.(a) and (b) integrators, which also enjoy larger acceptance probabilities.

The Fitzhugh-Nagumo ordinary differential equation is a model of neural spiking activity. It is described by two time-varying measurements whose dynamics obey,

    v̇ = θ₃ (v − v³/3 + r)    (18)
    ṙ = −(v − θ₁ + θ₂ r) / θ₃.    (19)

Consider the setting wherein one has 200 observations of the Fitzhugh-Nagumo dynamics at equally-spaced times between zero and ten. Assume moreover that these observations have been corrupted by i.i.d. Gaussian noise with a known standard deviation of σ = 1/2. If we equip the parameters θ₁, θ₂, and θ₃ with standard normal priors, we may use the dynamics of eqs. (18) and (19) and the assumed noise distribution in order to sample the posterior of (θ₁, θ₂, θ₃). Let (vₙ, rₙ) be the solution of the Fitzhugh-Nagumo ODE at the n-th time period. For the Fitzhugh-Nagumo differential equation model, the (i, j)-entry of the metric is,

    G_ij(θ₁, θ₂, θ₃) = (1/σ²) Σ_{n=1}^{200} (∂vₙ/∂θᵢ ∂vₙ/∂θⱼ + ∂rₙ/∂θᵢ ∂rₙ/∂θⱼ) + δᵢⱼ.    (20)
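The dynamics of eqs. (18)–(19) can be integrated with SciPy's odeint, as done in the experiments below; the initial state in this sketch is a hypothetical choice of ours, and the parameter values match those reported later for data generation.

```python
import numpy as np
from scipy.integrate import odeint

def fitzhugh_nagumo(state, t, th1, th2, th3):
    """Right-hand side of eqs. (18)-(19)."""
    v, r = state
    dv = th3 * (v - v ** 3 / 3.0 + r)
    dr = -(v - th1 + th2 * r) / th3
    return [dv, dr]

# 200 equally spaced observation times on [0, 10]; the initial state
# (-1, 1) is a hypothetical choice for illustration.
times = np.linspace(0.0, 10.0, 200)
sol = odeint(fitzhugh_nagumo, [-1.0, 1.0], times, args=(0.2, 0.2, 3.0))
```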
Table 2.
Comparison of the implicit midpoint and the naive generalized leapfrog integrators on sampling from Neal's funnel distribution. We see that the implicit midpoint integrator is able to take large steps and produce an effective sample size that outperforms the generalized leapfrog integrator even in the time-normalized performance.

(a)–(c): Three integration step-sizes.

Figure 3.
Comparison of the degree to which the implicit midpoint and generalized leapfrog integrators violate reversibility and volume preservation on Neal's funnel distribution. We observe that the implicit midpoint integrator exhibits better symmetry and volume preservation.
The implicit midpoint is represented by the symbol (·) and the generalized leapfrog by the symbol (+).
Table 3.
Comparison of the implicit midpoint and generalized leapfrog integrators on the stochastic volatility model. We see that the implicit midpoint integrator is competitive on the effective sample size and timing comparisons. We also evaluate the median reversibility and volume preservation of the implicit midpoint and the generalized leapfrog integrators on the stochastic volatility model. Here we see that the implicit midpoint integrator enjoys better conservation of volume and reversibility.
In generating data from the Fitzhugh-Nagumo model, we set θ₁ = θ₂ = 0.2 and θ₃ = 3. The dynamics are integrated using SciPy's odeint function and gradients are approximated by forward sensitivity analysis as in Girolami & Calderhead (2011). We consider an integration step-size of ε = 1 and a number of integration steps in {1, 2, 5}; each configuration is replicated ten times. We sample 1,000 times from the posterior.

We expect the Fitzhugh-Nagumo ODE model to favor the generalized leapfrog integrator because of the complexity of evaluating the log-posterior, the gradient of the log-posterior, the Riemannian metric, and the gradient of the Riemannian metric, each of which involves solving a system of differential equations. Therefore, the caching behavior associated with the G.L.F.(b) integrator gives it an advantage here. Table 4 shows the results of inference in the Fitzhugh-Nagumo posterior. We observe that for a single step, the I.M.(a) and (b) integrators appear to perform somewhat worse than the G.L.F.(a) and (b) variants, even on the measures of ESS that ignore timing; this occurs despite the larger acceptance rate enjoyed by the implicit midpoint integrator. For two integration steps, the inferences produced by I.M.(a) and (b) become super-efficient; however, G.L.F.(b) is also super-efficient.

(a) One integration step (b) Two integration steps (c) Five integration steps

Figure 4.
Comparison of the degree to which the implicit midpoint and generalized leapfrog integrators violate reversibility and volume preservation on the Fitzhugh-Nagumo posterior. We observe that for the chosen convergence tolerances, the implicit midpoint integrator tends to exhibit better reversibility and volume preservation, except for the smallest tolerance, where the generalized leapfrog has slightly better volume preservation but worse reversibility.
The implicit midpoint is represented by the symbol ( · ) and the generalizedleapfrog by the symbol (+) . Acc. Prob. Mean ESS Min. ESS Mean ESS / Sec. Min. ESS / Sec.Step Size Num. Steps Method1.0 1 G.L.F.(a) . ± .
01 289 . ± .
09 268 . ± .
31 0 . ± .
00 0 . ± . G.L.F.(b) . ± .
01 294 . ± .
66 266 . ± .
45 0 . ± .
01 0 . ± . I.M.(a) . ± .
00 232 . ± .
82 189 . ± .
03 0 . ± .
00 0 . ± . I.M.(b) . ± .
00 230 . ± .
75 194 . ± .
60 0 . ± .
00 0 . ± . . ± .
01 1123 . ± .
30 905 . ± .
34 0 . ± .
02 0 . ± . G.L.F.(b) . ± .
01 1256 . ± .
19 1113 . ± .
44 0 . ± .
05 0 . ± . I.M.(a) . ± .
00 1311 . ± .
83 1134 . ± .
03 0 . ± .
00 0 . ± . I.M.(b) . ± .
00 1421 . ± .
01 1271 . ± .
02 0 . ± .
01 0 . ± . . ± .
01 215 . ± .
06 143 . ± .
07 0 . ± .
00 0 . ± . G.L.F.(b) . ± .
01 212 . ± .
16 126 . ± .
02 0 . ± .
01 0 . ± . I.M.(a) . ± .
00 1016 . ± .
49 767 . ± .
83 0 . ± .
01 0 . ± . I.M.(b) . ± .
00 1143 . ± .
23 844 . ± .
18 0 . ± .
00 0 . ± . Table 4.
Comparison of the implicit midpoint and generalized leapfrog integrators on sampling from the posterior of the Fitzhugh-Nagumomodel. In this example, the higher acceptance probability of the implicit midpoint integrator did not produce a performance increase for asingle integration step. The implicit midpoint integrator becomes super-efficient in the two-step regime, but cannot compete with thegeneralized leapfrog integrator’s computational advantages. efficient and its computational advantages cause it to havesuperior performance in the timing metrics. For the largestnumber of steps, the performance of G.L.F.(a) and (b) de-teriorates so that the I.M.(a) and (b) integrators outperformthem even on the timing comparison.We also evaluate the degree to which the numerical inte-grator possesses the properties of symmetry and volumepreservation. The results are shown in fig. 4. We see thatthe implicit midpoint integrator offers a clear advantage innumerical symmetry, and performs better on volume preser-vation as well.
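The Fitzhugh-Nagumo data-generation step described in this section can be sketched with SciPy as follows, using the standard parameter values $a = b = 0.2$, $c = 3$ from Girolami & Calderhead (2011); the initial state, time grid, and observation-noise scale below are illustrative assumptions rather than the exact experimental settings:

```python
import numpy as np
from scipy.integrate import odeint

def fitzhugh_nagumo(state, t, a, b, c):
    """Fitzhugh-Nagumo dynamics for voltage V and recovery variable R."""
    V, R = state
    dV = c * (V - V**3 / 3.0 + R)
    dR = -(V - a + b * R) / c
    return [dV, dR]

a, b, c = 0.2, 0.2, 3.0
# Assumed initial state and time grid for illustration.
t = np.linspace(0.0, 20.0, 200)
states = odeint(fitzhugh_nagumo, [-1.0, 1.0], t, args=(a, b, c))

# Noisy observations of the trajectory; the noise scale is an assumption.
rng = np.random.default_rng(0)
observations = states + 0.5 * rng.normal(size=states.shape)
print(states.shape)
```

The posterior over (a, b, c) is then defined by the likelihood of such noisy observations, with gradients obtained by forward sensitivity analysis as described above.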
5. Conclusion
This work has considered the implicit midpoint integrator as a substitute for the generalized leapfrog integrator for use in RMHMC. Inspired by the theory of the conserved quantities of numerical integrators, we hypothesized that the implicit midpoint integrator would have better energy conservation in posterior distributions that are approximately Gaussian. Hamiltonian Monte Carlo requires that its integrators be volume preserving and reversible; we give numerical assessments of the extent to which these properties hold in implementations of these integrators, which differ from their theoretical idealizations when a convergence tolerance is used to halt a fixed-point iteration. We find that the implicit midpoint integrator has superior energy conservation, conservation of volume, and reversibility across several Bayesian inference tasks. In three of the four example applications, the implicit midpoint integrator met or exceeded the time-normalized performance of the generalized leapfrog integrator. This, combined with its better volume preservation and reversibility, leads us to conclude that it is a method worth considering when implementing RMHMC.
Acknowledgments
The authors would like to thank Marcus A. Brubaker for helpful discussions.

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. 1752134. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. RRL was supported in part by NIH/NIGMS 1R01GM136780-01.
References
Amari, S.-i. Information Geometry and Its Applications. Springer Publishing Company, Incorporated, 1st edition, 2016. ISBN 4431559779.

Betancourt, M. A general metric for Riemannian manifold Hamiltonian Monte Carlo, December 2012. doi: 10.1007/978-3-642-40020-9_35.

Bishop, C. M. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg, 2006. ISBN 0387310738.

Brubaker, M., Salzmann, M., and Urtasun, R. A family of MCMC methods on implicitly defined manifolds. In Lawrence, N. D. and Girolami, M. (eds.), Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, volume 22 of Proceedings of Machine Learning Research, pp. 161–172, La Palma, Canary Islands, April 2012. PMLR. URL http://proceedings.mlr.press/v22/brubaker12.html.

Byrne, S. and Girolami, M. Geodesic Monte Carlo on embedded manifolds. Scandinavian Journal of Statistics, 40(4):825–845, Sep 2013. ISSN 0303-6898. doi: 10.1111/sjos.12036. URL http://dx.doi.org/10.1111/sjos.12036.

Duane, S., Kennedy, A. D., Pendleton, B. J., and Roweth, D. Hybrid Monte Carlo. Physics Letters B, 195(2):216–222, 1987. ISSN 0370-2693. doi: 10.1016/0370-2693(87)91197-X.

Girolami, M. and Calderhead, B. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(2):123–214, 2011. doi: 10.1111/j.1467-9868.2010.00765.x. URL https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-9868.2010.00765.x.

Hairer, E., Lubich, C., and Wanner, G. Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations; 2nd ed. Springer, Dordrecht, 2006. doi: 10.1007/3-540-30666-8. URL https://cds.cern.ch/record/1250576.

Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., del Río, J. F., Wiebe, M., Peterson, P., Gérard-Marchant, P., Sheppard, K., Reddy, T., Weckesser, W., Abbasi, H., Gohlke, C., and Oliphant, T. E. Array programming with NumPy. Nature, 585(7825):357–362, September 2020. doi: 10.1038/s41586-020-2649-2. URL https://doi.org/10.1038/s41586-020-2649-2.

Kumar, R., Carroll, C., Hartikainen, A., and Martin, O. A. ArviZ a unified library for exploratory analysis of Bayesian models in Python. The Journal of Open Source Software, 2019. doi: 10.21105/joss.01143. URL http://joss.theoj.org/papers/10.21105/joss.01143.

Leimkuhler, B. and Reich, S. Simulating Hamiltonian Dynamics. Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, 2005. doi: 10.1017/CBO9780511614118.

Marsden, J. E. and Ratiu, T. S. Introduction to Mechanics and Symmetry: A Basic Exposition of Classical Mechanical Systems. Springer Publishing Company, Incorporated, 2010. ISBN 1441931430.

Martens, J. New insights and perspectives on the natural gradient method. Journal of Machine Learning Research, 21(146):1–76, 2020. URL http://jmlr.org/papers/v21/17-678.html.

Neal, R. M. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 54:113–162, 2010.

Pourzanjani, A. A. and Petzold, L. R. Implicit Hamiltonian Monte Carlo for sampling multiscale distributions, 2019.

Tripuraneni, N., Rowland, M., Ghahramani, Z., and Turner, R. Magnetic Hamiltonian Monte Carlo. In Precup, D. and Teh, Y. W. (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 3453–3461, International Convention Centre, Sydney, Australia, August 2017. PMLR. URL http://proceedings.mlr.press/v70/tripuraneni17a.html.

Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S. J., Brett, M., Wilson, J., Millman, K. J., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson, E., Carey, C. J., Polat, İ., Feng, Y., Moore, E. W., VanderPlas, J., Laxalde, D., Perktold, J., Cimrman, R., Henriksen, I., Quintero, E. A., Harris, C. R., Archibald, A. M., Ribeiro, A. H., Pedregosa, F., van Mulbregt, P., and SciPy 1.0 Contributors. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17:261–272, 2020. doi: 10.1038/s41592-019-0686-2.
A. Momentum Negation Symmetry of Implicit Midpoint
Lemma 1.
Given a Hamiltonian $H$ in the form of eq. (1), step-size $\epsilon$, and initial position $(q, p)$, compute $(q', p')$ according to algorithm 3 with $\delta = 0$. If one then computes $(q'', p'')$ from initial position $(q', -p')$ using algorithm 3 a second time (with the same Hamiltonian, step-size, and $\delta = 0$), then $q'' = q$ and $-p'' = p$.

Lemma 1 establishes that the implicit midpoint integrator is suitable for HMC in that it satisfies properties (i) and (ii).

Proof. Consider the initial condition $(q, p)$ and a fixed step-size of $\epsilon$. For Riemannian manifold Hamiltonian Monte Carlo, the implicit midpoint integrator computes the following updates:
$$q'_i = q_i + \epsilon \sum_{j=1}^m G^{-1}_{ij}\left(\frac{q' + q}{2}\right)\left(\frac{p'_j + p_j}{2}\right) \quad (21)$$
$$p'_i = p_i + \epsilon\left(-\frac{\partial}{\partial q_i}\mathcal{L}\left(\frac{q' + q}{2}\right) - \frac{1}{2}\,\mathrm{trace}\left(G^{-1}\left(\frac{q' + q}{2}\right)\frac{\partial}{\partial q_i}G\left(\frac{q' + q}{2}\right)\right) + \frac{1}{2}\left(\frac{p' + p}{2}\right)^\top G^{-1}\left(\frac{q' + q}{2}\right)\frac{\partial}{\partial q_i}G\left(\frac{q' + q}{2}\right)G^{-1}\left(\frac{q' + q}{2}\right)\left(\frac{p' + p}{2}\right)\right) \quad (22)$$
What we want to show is that if we compute $(q', p')$, negate the momentum $(q', p') \mapsto (q', -p')$, and apply the implicit midpoint integrator a second time, then we arrive at $(q, -p)$. Thus, we need to establish that $(q, -p)$ is a fixed point of the relations
$$q''_i = q'_i + \epsilon \sum_{j=1}^m G^{-1}_{ij}\left(\frac{q'' + q'}{2}\right)\left(\frac{p''_j + (-p'_j)}{2}\right) \quad (23)$$
$$p''_i = -p'_i + \epsilon\left(-\frac{\partial}{\partial q_i}\mathcal{L}\left(\frac{q'' + q'}{2}\right) - \frac{1}{2}\,\mathrm{trace}\left(G^{-1}\left(\frac{q'' + q'}{2}\right)\frac{\partial}{\partial q_i}G\left(\frac{q'' + q'}{2}\right)\right) + \frac{1}{2}\left(\frac{p'' + (-p')}{2}\right)^\top G^{-1}\left(\frac{q'' + q'}{2}\right)\frac{\partial}{\partial q_i}G\left(\frac{q'' + q'}{2}\right)G^{-1}\left(\frac{q'' + q'}{2}\right)\left(\frac{p'' + (-p')}{2}\right)\right). \quad (24)$$
Plugging $(q'', p'') = (q, -p)$ into the position relation, we obtain
$$q'_i + \epsilon \sum_{j=1}^m G^{-1}_{ij}\left(\frac{q + q'}{2}\right)\left(\frac{(-p_j) + (-p'_j)}{2}\right) = q'_i - \epsilon \sum_{j=1}^m G^{-1}_{ij}\left(\frac{q + q'}{2}\right)\left(\frac{p_j + p'_j}{2}\right) \quad (25)$$
$$= q_i \quad (26)$$
by rearranging eq. (21). For notational simplicity, let us define
$$U_i(q) \overset{\mathrm{def.}}{=} -\frac{\partial}{\partial q_i}\mathcal{L}(q) - \frac{1}{2}\,\mathrm{trace}\left(G^{-1}(q)\frac{\partial}{\partial q_i}G(q)\right) \quad (27)$$
$$R_i(q) \overset{\mathrm{def.}}{=} \frac{1}{2}G^{-1}(q)\frac{\partial}{\partial q_i}G(q)G^{-1}(q) \quad (28)$$
so that
$$p''_i = -p'_i + \epsilon\left(U_i\left(\frac{q'' + q'}{2}\right) + \left(\frac{p'' + (-p')}{2}\right)^\top R_i\left(\frac{q'' + q'}{2}\right)\left(\frac{p'' + (-p')}{2}\right)\right). \quad (29)$$
Plugging in $(q'', p'') = (q, -p)$, we obtain
$$-p'_i + \epsilon\left(U_i\left(\frac{q + q'}{2}\right) + \left(\frac{(-p) + (-p')}{2}\right)^\top R_i\left(\frac{q + q'}{2}\right)\left(\frac{(-p) + (-p')}{2}\right)\right) \quad (30)$$
$$= -p'_i + \epsilon\left(U_i\left(\frac{q + q'}{2}\right) + \left(\frac{p + p'}{2}\right)^\top R_i\left(\frac{q + q'}{2}\right)\left(\frac{p + p'}{2}\right)\right) \quad (31)$$
$$= -p_i, \quad (32)$$
which follows from negating eq. (22) and rearranging.

B. Implicit Midpoint Eigenvalues
Let $z = (q, p)$ and consider a quadratic Hamiltonian of the form
$$H(z) = \tfrac{1}{2} z^\top A z \quad (33)$$
$$= \tfrac{1}{2}\begin{pmatrix} q^\top & p^\top \end{pmatrix}\begin{pmatrix} \Sigma & 0 \\ 0 & \Sigma \end{pmatrix}\begin{pmatrix} q \\ p \end{pmatrix}. \quad (34)$$
The associated Hamiltonian vector field is
$$\dot z = \begin{pmatrix} \dot q \\ \dot p \end{pmatrix} \quad (35)$$
$$= \begin{pmatrix} 0 & \mathrm{Id} \\ -\mathrm{Id} & 0 \end{pmatrix}\begin{pmatrix} \Sigma & 0 \\ 0 & \Sigma \end{pmatrix}\begin{pmatrix} q \\ p \end{pmatrix} \quad (36)$$
$$= \underbrace{\begin{pmatrix} 0 & \Sigma \\ -\Sigma & 0 \end{pmatrix}}_{J}\begin{pmatrix} q \\ p \end{pmatrix} \quad (37)$$
$$= J z. \quad (38)$$
From eq. (10), the implicit midpoint integrator computes the update
$$z' = z + \epsilon J\left(\frac{z' + z}{2}\right) \quad (39)$$
$$\implies z' = z + \frac{\epsilon}{2} J z' + \frac{\epsilon}{2} J z \quad (40)$$
$$\implies \left(\mathrm{Id} - \frac{\epsilon}{2} J\right) z' = \left(\mathrm{Id} + \frac{\epsilon}{2} J\right) z \quad (41)$$
$$\implies z' = \underbrace{\left(\mathrm{Id} - \frac{\epsilon}{2} J\right)^{-1}\left(\mathrm{Id} + \frac{\epsilon}{2} J\right)}_{Q}\, z. \quad (42)$$
The quantity $Q$ is the Cayley transform of the linear transformation $\frac{\epsilon}{2} J$. Noting that $J$ is a skew-symmetric matrix (since $\Sigma$ is symmetric), it is an established fact that the Cayley transform of a skew-symmetric matrix is an orthogonal matrix. This establishes that all of the eigenvalues of $Q$, the linear transform representing the implicit midpoint integrator, have unit modulus.

C. Riemannian Metrics and a Silent Change-of-Variables?
The Riemannian volume measure on $\mathbb{R}^m$ is $\sqrt{\det(G(q))}\,\mathrm{d}q$. When using HMC, does the fact that we have introduced a metric mean that we require a Jacobian correction to the posterior? Actually, the answer is no. The reason is that the Metropolis-Hastings accept-reject rule determines which density (specified with respect to the Lebesgue measure in the $(q, p)$ phase-space) is sampled by the Markov chain. Just because the acceptance Hamiltonian (which is also the guidance Hamiltonian in RMHMC; see Duane et al. (1987)) involves computing the Riemannian metric does not mean that we have silently changed the underlying measure. Indeed, the log-determinant term appearing in eq. (1) is chosen so that the conditional distribution of $p$ given $q$ is multivariate normal with respect to the Lebesgue measure.

D. Numerical Stability
The stability of numerical integrators is defined by their long-term behavior on the harmonic oscillator, which is described by the following Hamiltonian system,
$$\dot q = p, \qquad \dot p = -\omega^2 q, \quad (43)$$
corresponding to the Hamiltonian $H(q, p) = \frac{\omega^2 q^2}{2} + \frac{p^2}{2}$, where $\omega$ is a constant. Consider a single step of a numerical integrator for the harmonic oscillator with step-size $\epsilon$ that maps $(q, p) \mapsto (q', p')$. Because the harmonic oscillator is a linear differential equation, it is often possible to find a matrix $R \in \mathbb{R}^{2 \times 2}$ such that $(q', p')^\top = R\,(q, p)^\top$. A numerical method is called stable if the eigenvalues of $R$ lie on the unit disk of the complex plane and are not repeated (Leimkuhler & Reich, 2005). We have the following result.

Proposition 2.
When $\epsilon < 2/\omega$, the (generalized) leapfrog integrator is stable. The implicit midpoint integrator is stable for any $\epsilon$.

See Hairer et al. (2006); Leimkuhler & Reich (2005) for an introduction to stability analysis of numerical integrators.

Figure 5. Comparison between the energy conservation of the implicit midpoint integrator and the generalized leapfrog integrator on a quadratic Hamiltonian, shown for three step-sizes up to $\epsilon = 1$. We observe that the energy conservation error of the implicit midpoint method remains at the same small level for every step-size, whereas the energy conservation of the generalized leapfrog integrator degrades with larger steps.

E. Quadratic Hamiltonian
We consider using HMC to draw samples from a Gaussian distribution in two dimensions. In particular, we aim to sample from the joint distribution of position and momentum defined by
$$q \sim \mathrm{Normal}(\mu, \Sigma) \quad (44)$$
$$p \sim \mathrm{Normal}(0, \Sigma^{-1}) \quad (45)$$
for a fixed mean $\mu \in \mathbb{R}^2$ and covariance $\Sigma \in \mathbb{R}^{2 \times 2}$. These distributions correspond to the quadratic Hamiltonian $H(q, p) = \frac{1}{2}(q - \mu)^\top \Sigma^{-1}(q - \mu) + \frac{1}{2} p^\top \Sigma p$. This Hamiltonian can be interpreted in the Riemannian manifold setting as sampling from the posterior $q \sim \mathrm{Normal}(\mu, \Sigma)$ with the constant metric $G(q) = \Sigma^{-1}$. The corresponding Hamiltonian is quadratic and therefore theorem 2 applies: we expect perfect conservation of the Hamiltonian regardless of step-size. To evaluate the conservation of the Hamiltonian energy, we consider drawing $(q, p)$ from their joint distribution and integrating Hamilton's equations of motion for ten integration steps, at each of three integration step-sizes, the largest being $\epsilon = 1$. We then compare the initial Hamiltonian energy to the Hamiltonian energy at the terminal point of the integrator. We repeat this procedure 10,000 times and show the results in fig. 5, where the absolute difference in Hamiltonian energy is shown as a histogram. This experiment clearly shows that the implicit midpoint integrator has excellent conservation of the quadratic Hamiltonian energy and is orders of magnitude better than the generalized leapfrog integrator. Note that for a separable Hamiltonian, as is the case here, the steps of the generalized leapfrog integrator reduce to the standard leapfrog method.

F. Bayesian Logistic Regression
Binary classification is a ubiquitous task in the data sciences, and logistic regression is among the most popular algorithms for obtaining probabilistic estimates of class membership. Bayesian logistic regression simply equips each of the linear coefficients in the logistic regression model with a prior distribution. We consider Bayesian logistic regression as defined by the following generative model:
$$y_i \mid x_i, \beta \sim \mathrm{Bernoulli}(\sigma(x_i^\top \beta)) \quad \text{for } i = 1, \ldots, n \quad (46)$$
$$\beta_i \sim \mathrm{Normal}(0, 1) \quad \text{for } i = 1, \ldots, k, \quad (47)$$
where $x_i \in \mathbb{R}^k$ is a vector of explanatory variables and $\sigma : \mathbb{R} \to (0, 1)$ is the sigmoid function. For the logistic regression model, let $x \in \mathbb{R}^{n \times k}$ represent the matrix of features. The Riemannian metric formed by the sum of the Fisher information and the negative Hessian of the log-prior is $G(\beta) = x^\top \Lambda x + \mathrm{Id}$, where $\Lambda$ is a diagonal matrix whose $i$th diagonal entry is $\sigma(x_i^\top \beta)(1 - \sigma(x_i^\top \beta))$.

We consider sampling from the posterior distribution of the linear coefficients for a breast cancer, heart disease, and diabetes dataset. We consider integration step-sizes in {1/10, 1} and a number of integration steps in {5, 10, 50}; each configuration of step-size and number of steps is replicated ten times and we attempt to draw 10,000 samples from the posterior. Results are presented in tables 5 to 7. For the smaller step-size, both integrators enjoy very high acceptance rates and similar performance when not adjusted for timing; when adjusted for timing, the generalized leapfrog is often the better choice in the presence of a small step-size. For the larger step-size, only the implicit midpoint integrator is able to maintain a high acceptance rate; occasionally, the implicit midpoint is able to produce the optimal mean ESS and minimum ESS per second.

Table 5. Comparison of the implicit midpoint and generalized leapfrog integrators on sampling from the Bayesian logistic regression posterior on the Breast Cancer dataset. (Columns: Acc. Prob., Mean ESS, Min. ESS, Mean ESS/Sec., Min. ESS/Sec.; rows: I.M.(a), I.M.(b), G.L.F.(a), and G.L.F.(b) at step-sizes 0.1 and 1.0 with 5, 10, and 50 integration steps.) For the larger step-size, the acceptance rate of the generalized leapfrog integrator completely deteriorates whereas the implicit midpoint integrator is more robust. The implicit midpoint integrator is able to achieve super-efficient sampling for a step-size of $\epsilon = 1$ and ten integration steps.
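The metric $G(\beta) = x^\top \Lambda x + \mathrm{Id}$ used in these logistic regression experiments is straightforward to assemble; a minimal sketch on synthetic data (the data dimensions and random draws below are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def metric(beta, x):
    """Riemannian metric for Bayesian logistic regression: the Fisher
    information of the likelihood plus the identity contributed by the
    negative Hessian of the standard normal log-prior."""
    probs = sigmoid(x @ beta)
    lam = probs * (1.0 - probs)            # diagonal entries of Lambda
    return x.T @ (lam[:, None] * x) + np.eye(beta.size)

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
beta = rng.normal(size=3)
G = metric(beta, x)
# G is symmetric positive definite, as required of a Riemannian metric.
print(np.allclose(G, G.T), np.all(np.linalg.eigvalsh(G) > 0))
```

Because $x^\top \Lambda x$ is positive semi-definite and the prior contributes the identity, every eigenvalue of $G(\beta)$ is at least one, so the metric is always invertible.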
Table 6. Comparison of the implicit midpoint and generalized leapfrog integrators on sampling from the Bayesian logistic regression posterior on the Diabetes dataset. (Columns: Acc. Prob., Mean ESS, Min. ESS, Mean ESS/Sec., Min. ESS/Sec.; rows: I.M.(a), I.M.(b), G.L.F.(a), and G.L.F.(b) at step-sizes 0.1 and 1.0 with 5, 10, and 50 integration steps.)
Table 7. Comparison of the implicit midpoint and generalized leapfrog integrators on sampling from the Bayesian logistic regression posterior on the heart disease dataset. (Columns: Acc. Prob., Mean ESS, Min. ESS, Mean ESS/Sec., Min. ESS/Sec.; rows: I.M.(a), I.M.(b), G.L.F.(a), and G.L.F.(b) at step-sizes 0.1 and 1.0 with 5, 10, and 50 integration steps.)
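The acceptance probabilities reported in these tables arise from the standard Metropolis correction: a proposal produced by the integrator is accepted with probability $\min\{1, \exp(H(q, p) - H(q', p'))\}$, so integrators with smaller energy error yield higher acceptance rates. A minimal sketch of that rule (the helper below is illustrative, not the paper's implementation):

```python
import math
import random

def metropolis_accept(h_current, h_proposed, rng=random):
    """Accept a proposal with probability min(1, exp(H_current - H_proposed)).

    When the integrator conserves energy exactly (h_proposed equals
    h_current), the acceptance probability is one; this is why better
    energy conservation translates into higher acceptance rates."""
    return rng.random() < math.exp(min(0.0, h_current - h_proposed))

random.seed(0)
# A proposal that lowers the energy is always accepted.
assert metropolis_accept(h_current=1.0, h_proposed=0.5)
# An energy increase of log(2) is accepted about half the time.
accepts = sum(metropolis_accept(0.0, math.log(2.0)) for _ in range(100000))
print(accepts / 100000)  # close to 0.5
```

Clamping the log-ratio at zero before exponentiating avoids overflow when the proposal greatly decreases the energy.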
G. Volume Preservation and Symmetric Metrics
Here we describe how we compute metrics related to volume preservation and symmetry. Let $(q_1, \ldots, q_n)$ be samples generated by Hamiltonian Monte Carlo with numerical integrator $\Phi : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}^m \times \mathbb{R}^m$; each $q_i$ is an element of $\mathbb{R}^m$.

G.1. Reversibility

For each sample $q_i$, generate $p_i \mid q_i \sim \mathrm{Normal}(0, G(q_i))$ and compute $(q'_i, p'_i) = \Phi(q_i, p_i)$. Now compute $(q''_i, -p''_i) = \Phi(q'_i, -p'_i)$. The violation of reversibility is defined by
$$\left\lVert (q_i, p_i) - (q''_i, p''_i) \right\rVert_2. \quad (48)$$
If the numerical integrator is reversible, this norm will be zero. In our metrics, we report the median violation of reversibility.

G.2. Volume Preservation

Let $z = (q, p)$ and identify $\Phi(z) \equiv \Phi(q, p)$. Define
$$f_j(z) = \frac{\Phi(z_1, \ldots, z_j + \eta/2, \ldots, z_{2m}) - \Phi(z_1, \ldots, z_j - \eta/2, \ldots, z_{2m})}{\eta}$$
for a small increment $\eta$ (a smaller value of $\eta$ is used for the Fitzhugh-Nagumo model for numerical reasons); this is the central difference formula that approximates $\frac{\partial}{\partial z_j}\Phi(z)$. We compute the approximation to the Jacobian of $\Phi$ by constructing
$$\nabla \Phi(z) \approx F(z) \overset{\mathrm{def.}}{=} \begin{pmatrix} f_1(z) & f_2(z) & \cdots & f_{2m}(z) \end{pmatrix} \in \mathbb{R}^{2m \times 2m}. \quad (49)$$
For each sample $q_i$, generate $p_i \mid q_i \sim \mathrm{Normal}(0, G(q_i))$ and set $z_i = (q_i, p_i)$. The violation of volume preservation is defined by
$$\left| \det(F(z_i)) - 1 \right|. \quad (50)$$
If the numerical integrator is volume preserving, this difference will be zero. In our metrics, we report the median violation of volume preservation.

H. Detailed Balance from Reversibility and Volume Preservation
Let $\Phi : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}^m \times \mathbb{R}^m$ be a numerical integrator satisfying the following two properties as described in the main text:

(i) The integrator has a unit Jacobian determinant so that it preserves volume in $(q, p)$-space.
(ii) The integrator is symmetric under negation of the momentum variable.

Given $(q, p) \in \mathbb{R}^m \times \mathbb{R}^m$, define the momentum flip operator by $\Psi(q, p) = (q, -p)$. For Markov chain Monte Carlo, we then define the Markov chain proposal operator by $P \overset{\mathrm{def.}}{=} \Psi \circ \Phi$; because $\Phi$ satisfies property (ii) we have that $P \circ P = \Psi \circ \Phi \circ \Psi \circ \Phi = \mathrm{Id}$, so that the proposal operator is self-inverse: $P^{-1} = P$. Moreover, $\Psi$ has unit Jacobian determinant, $|\det(\nabla \Psi)| = 1$; therefore, since $\Phi$ has unit Jacobian determinant by property (i), $P$ also has unit Jacobian determinant:
$$|\det(\nabla P(q, p))| = |\det(\nabla \Psi(\Phi(q, p))\,\nabla \Phi(q, p))| \quad (51)$$
$$= |\det(\nabla \Psi(\Phi(q, p)))| \cdot |\det(\nabla \Phi(q, p))| \quad (52)$$
$$= 1. \quad (53)$$
Given a current position $(q, p)$ of the Markov chain, the proposal for the next state of the Markov chain $(q', p') \overset{\mathrm{def.}}{=} P(q, p)$ is accepted with probability
$$\min\left\{1, \frac{\pi(q', p')}{\pi(q, p)}\right\}, \quad (54)$$
where $\pi : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}_+$ is the target distribution. (Notice that $\pi$ must only be specified up to a constant.) Thus, the Markov chain transition operator is defined by
$$T(q, p) = \begin{cases} (q', p') & \text{w.p. } \min\left\{1, \frac{\pi(q', p')}{\pi(q, p)}\right\} \\ (q, p) & \text{else.} \end{cases} \quad (55)$$
We say that detailed balance holds if for all sets $A, B \subset \mathbb{R}^m \times \mathbb{R}^m$ we have
$$\Pr_{(q, p) \sim \pi}\left[(q, p) \in A \text{ and } T(q, p) \in B\right] = \Pr_{(q, p) \sim \pi}\left[(q, p) \in B \text{ and } T(q, p) \in A\right]. \quad (56)$$
Let $z = (q, p)$. Expanding, we compute
$$\Pr_{z \sim \pi}[z \in A \text{ and } T(z) \in B] = \int_A \pi(z) \cdot \mathbf{1}\{P(z) \in B\} \cdot \min\left\{1, \frac{\pi(P(z))}{\pi(z)}\right\} \mathrm{d}z \quad (57)$$
$$= \int_{A \cap P(B)} \pi(z) \cdot \min\left\{1, \frac{\pi(P(z))}{\pi(z)}\right\} \mathrm{d}z \quad (58)$$
$$\overset{z' = P(z)}{=} \int_{P(A) \cap B} \pi(P(z')) \cdot \min\left\{1, \frac{\pi(z')}{\pi(P(z'))}\right\} \mathrm{d}z' \quad (59)$$
$$= \int_{P(A) \cap B} \pi(z') \cdot \min\left\{1, \frac{\pi(P(z'))}{\pi(z')}\right\} \mathrm{d}z' \quad (60)$$
$$= \int_B \pi(z') \cdot \mathbf{1}\{P(z') \in A\} \cdot \min\left\{1, \frac{\pi(P(z'))}{\pi(z')}\right\} \mathrm{d}z' \quad (61)$$
$$= \Pr_{z \sim \pi}[z \in B \text{ and } T(z) \in A], \quad (62)$$
showing that detailed balance holds. The step from eq. (59) to eq. (60) uses the identity $\pi(P(z'))\min\{1, \pi(z')/\pi(P(z'))\} = \min\{\pi(P(z')), \pi(z')\} = \pi(z')\min\{1, \pi(P(z'))/\pi(z')\}$. The change-of-variables in eq. (59) does not incur a Jacobian determinant correction since $P$ has unit Jacobian determinant.

Algorithm 4 (G.L.F.(b))
The procedure for a single step of integrating Hamiltonian dynamics using the efficient implementation of the generalized leapfrog integrator.

Input: Log-posterior $\mathcal{L} : \mathbb{R}^m \to \mathbb{R}$, Riemannian metric $G : \mathbb{R}^m \to \mathbb{R}^{m \times m}$, initial position and momentum variables $(q, p) \in \mathbb{R}^m \times \mathbb{R}^m$, integration step-size $\epsilon \in \mathbb{R}_+$, fixed-point convergence tolerance $\delta \geq 0$.

Precompute $G^{-1}(q)$ and $\frac{\partial}{\partial q_i} G(q)$, and define
$$A_i \overset{\mathrm{def.}}{=} \frac{\partial}{\partial q_i}\mathcal{L}(q) + \frac{1}{2}\,\mathrm{trace}\left(G^{-1}(q)\frac{\partial}{\partial q_i}G(q)\right) \quad (63)$$
for $i = 1, \ldots, m$. Use algorithm 1 with tolerance $\delta$ and initial guess $p$ to solve for $\bar p$:
$$\bar p = \underbrace{p - \frac{\epsilon}{2}\left(A_1 - \frac{1}{2}(G^{-1}(q)\bar p)^\top \frac{\partial}{\partial q_1}G(q)(G^{-1}(q)\bar p), \; \ldots, \; A_m - \frac{1}{2}(G^{-1}(q)\bar p)^\top \frac{\partial}{\partial q_m}G(q)(G^{-1}(q)\bar p)\right)^\top}_{f(\bar p)}. \quad (64)$$
Precompute $G^{-1}(q)\bar p$. Use algorithm 1 with tolerance $\delta$ and initial guess $q$ to solve for $q'$:
$$q' = \underbrace{q + \frac{\epsilon}{2}\left(G^{-1}(q)\bar p + G^{-1}(q')\bar p\right)}_{f(q')}. \quad (65)$$
Compute $p'$ using eq. (9), which is an explicit update.

Return: $(q', p') \in \mathbb{R}^m \times \mathbb{R}^m$.

I. Implementation of Integrators
I.1. Implementations of the Generalized Leapfrog Integrator
As the purpose of this research is to compare two integrators for Hamiltonian Monte Carlo, we wish to be precise about how these numerical methods have been implemented.

The first implementation of the generalized leapfrog integrator is presented in algorithm 2. The system eqs. (7) to (9) was described as "naive" because it appears to ignore important structural properties of the equations of motion in eqs. (2) and (3) that would accelerate a step of the generalized leapfrog integrator. For instance, in eq. (7), the metric $G(q)$ may be precomputed because it is an invariant of the fixed-point relation. Another example is that $G^{-1}(q)\bar p$ is an invariant quantity of eq. (8) and need not be recomputed in each fixed-point iteration. Thus, we see that the generalized leapfrog integrator, when efficiently implemented, has an important computational advantage in that certain invariant quantities can be "cached" when finding fixed points. A more complicated implementation of the generalized leapfrog integrator with caching is presented in algorithm 4. We stress that algorithms 2 and 4 perform the same calculation.

I.2. Implementations of the Implicit Midpoint Integrator
In addition to the implementation of the implicit midpoint method described in algorithm 3, we additionally consider a variant advocated by Leimkuhler & Reich (2005) and present the implementation in algorithm 5. The essential difference between algorithms 3 and 5 is whether the implicitly-defined update computes the terminal point of the step (algorithm 3) or the midpoint of the step (algorithm 5). We stress that algorithms 3 and 5 perform the same calculation, which is easily verified by plugging eq. (66) into eq. (67) and comparing to eq. (10). Unlike the generalized leapfrog integrator, the implicit midpoint integrator does not enjoy the ability to cache intermediate computations.
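As a concrete illustration of the terminal-point formulation (the analogue of algorithm 3), here is a minimal sketch of a single implicit midpoint step for a Hamiltonian specified by its gradient functions, with the fixed-point iteration halted at tolerance $\delta$; the gradient interface, iteration cap, and the harmonic-oscillator test are assumptions of this sketch, not the paper's implementation:

```python
import numpy as np

def implicit_midpoint_step(grad_q, grad_p, q, p, eps, delta=1e-10, max_iters=100):
    """One implicit midpoint step: solve z' = z + eps * J grad_H((z' + z)/2)
    by fixed-point iteration, stopping when successive iterates agree to
    within delta (or after max_iters sweeps)."""
    q_new, p_new = q.copy(), p.copy()
    for _ in range(max_iters):
        q_mid, p_mid = 0.5 * (q_new + q), 0.5 * (p_new + p)
        q_next = q + eps * grad_p(q_mid, p_mid)   # dq/dt = +dH/dp
        p_next = p - eps * grad_q(q_mid, p_mid)   # dp/dt = -dH/dq
        if max(np.max(np.abs(q_next - q_new)),
               np.max(np.abs(p_next - p_new))) < delta:
            return q_next, p_next
        q_new, p_new = q_next, p_next
    return q_new, p_new

# Harmonic oscillator H = (q^2 + p^2)/2: the implicit midpoint step is a
# Cayley transform, so this quadratic Hamiltonian is conserved up to the
# fixed-point tolerance.
grad_q = lambda q, p: q   # dH/dq
grad_p = lambda q, p: p   # dH/dp
q, p = np.array([1.0]), np.array([0.0])
h0 = 0.5 * (q @ q + p @ p)
for _ in range(10):
    q, p = implicit_midpoint_step(grad_q, grad_p, q, p, eps=0.5)
h1 = 0.5 * (q @ q + p @ p)
print(abs(h1 - h0))
```

The fixed-point iteration converges here because the step-size times the Lipschitz constant of the vector field is well below one; in the Riemannian setting, the gradients would be those of the non-separable Hamiltonian in eq. (1).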
Algorithm 5 (I.M.(b))
The procedure for a single step of integrating Hamiltonian dynamics using the implicit midpoint integrator as advocated by Leimkuhler & Reich (2005).

Input: Hamiltonian $H : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$, initial position and momentum variables $(q, p) \in \mathbb{R}^m \times \mathbb{R}^m$, integration step-size $\epsilon \in \mathbb{R}_+$, fixed-point convergence tolerance $\delta \geq 0$.

Use algorithm 1 with tolerance $\delta$ and initial guess $(q, p)$ to solve for $(\bar q, \bar p)$:
$$\begin{pmatrix} \bar q \\ \bar p \end{pmatrix} \overset{\mathrm{def.}}{=} \underbrace{\begin{pmatrix} q \\ p \end{pmatrix} + \frac{\epsilon}{2}\begin{pmatrix} \nabla_p H(\bar q, \bar p) \\ -\nabla_q H(\bar q, \bar p) \end{pmatrix}}_{f(\bar q, \bar p)}. \quad (66)$$
Compute the explicit update
$$\begin{pmatrix} q' \\ p' \end{pmatrix} = \begin{pmatrix} \bar q \\ \bar p \end{pmatrix} + \frac{\epsilon}{2}\begin{pmatrix} \nabla_p H(\bar q, \bar p) \\ -\nabla_q H(\bar q, \bar p) \end{pmatrix}. \quad (67)$$
Return: $(q', p') \in \mathbb{R}^m \times \mathbb{R}^m$.

J. Proof of Proposition 1