Variable length trajectory compressible hybrid Monte Carlo
VVariable length trajectory compressible hybrid Monte Carlo
Akihiko Nishimura a) and David Dunson Department of Mathematics, Duke University, Durham, North Carolina, 27708, USA Department of Statistical Science, Duke University, Durham, North Carolina, 27708, USA (Dated: 26 September 2018)
Hybrid Monte Carlo (HMC) generates samples from a prescribed probability distribution in a con-figuration space by simulating Hamiltonian dynamics, followed by the Metropolis (-Hastings) ac-ceptance/rejection step. Compressible HMC (CHMC) generalizes HMC to a situation in which thedynamics is reversible but not necessarily Hamiltonian. This article presents a framework to fur-ther extend the algorithm. Within the existing framework, each trajectory of the dynamics mustbe integrated for the same amount of (random) time to generate a valid Metropolis proposal. Ourgeneralized acceptance/rejection mechanism allows a more deliberate choice of the integration timefor each trajectory. The proposed algorithm in particular enables an effective application of variablestep size integrators to HMC-type sampling algorithms based on reversible dynamics. The potentialof our framework is further demonstrated by another extension of HMC which reduces the wastedcomputations due to unstable numerical approximations and corresponding rejected proposals.
I. INTRODUCTION
A study of molecular systems often relies on generatingrandom variables from a prescribed (unnormalized) prob-ability distribution ρ ( q ) ∝ exp( − U ( q )) on the configura-tion space. Markov chain Monte Carlo (MCMC) gener-ates samples from a target distribution by constructinga Markov chain whose stationary distribution coincideswith the target distribution. Such a Markov chain can berealized by building a transition rule that satisfies the de-tailed balance condition. MCMC based on the Metropo-lis (-Hastings) algorithm is a general sampling approachwidely used in computational physical science as wellas in Bayesian statistics and machine learning. Manysuch algorithms are inefficient, producing highly corre-lated samples, and require a large number of iterationsto adequately characterize the target distribution .Hybrid Monte Carlo (HMC) constructs the proposaldistribution for the Metropolis algorithm by simulatingmolecular dynamics (MD), a procedure that can greatlyreduce the correlation among successive MCMC samples.More precisely, HMC augments the state space by intro-ducing a momentum variable p ; the original variable q isoften referred to as position variable in the HMC frame-work. In this augmented state space ( q , p ), the proposaldistribution for Metropolis is constructed by solving anordinary differential equation (ODE) corresponding toNewton’s equations of motion with respect to the poten-tial energy U ( q ). There are applications in which (par-tial) analytical solutions to an ODE can be exploited ,but in general ODEs are discretized and integrated nu-merically.Within the original HMC framework, an integrator forsimulating MD must be reversible and volume-preservingto produce a valid Metropolis proposal . In fact, thevolume-preserving property can be relaxed by including a) Electronic mail: [email protected]. a Jacobian factor in the calculation of the Metropolisacceptance probability.
Under this generalization ofHMC, any reversible (discrete) dynamics / bijective mapcan be applied to generate a proposal state. This al-gorithm is formalized as compressible HMC (CHMC) inRef. 12. A generalization of HMC known as
Rieman-nian manifold HMC (RMHMC) also falls within theframework of CHMC.This article presents an algorithm to relax anothercondition required by (compressible) HMC. Given a re-versible map F and state ( q , p ), CHMC proposes thenext state by applying the map n times, where the num-ber of steps n can be drawn randomly at each iteration.Though often not stated explicitly, the detailed balancerequires the number of steps to be determined indepen-dently of the trajectory { ( q , p ) , F ( q , p ) , F ( q , p ) , . . . } .As we will show in Section III, this constraint can preventrealizing the full potential of MCMC algorithms based onreversible dynamics.Our algorithm generalizes the acceptance-rejectionmechanism behind CHMC to allow the number of stepsto depend on each trajectory of the dynamics while pre-serving the detailed balance. The number of numer-ical integration steps taken in simulating a trajectoryof HMC is commonly referred to as the “path length”of a trajectory in the statistics literature. We there-fore call our algorithm variable length trajectory CHMC (VLT-CHMC). It should be mentioned that the No-U-Turn-Sampler (NUTS) is another variant of HMC thatallows the path lengths to vary from one trajectory toanother. However, the motivation behind NUTS is tospare a user the trouble of manually tuning the num-ber of steps, and NUTS in general performs no betterthan HMC with well-chosen path lengths.
On theother hand VLT-CHMC can improve the performance ofCHMC in a more fundamental and significant way. Inparticular, VLT-CHMC enables an effective applicationof reversible variable step size integrators to HMC-typesampling algorithms based on reversible dynamics. a r X i v : . [ phy s i c s . c o m p - ph ] A p r The rest of the paper is organized as follows. Section IIreviews the main ideas behind CHMC and provides anexample in which the compressible dynamics arises fromthe use of non-traditional integrators in HMC settings.Such integrators have proven to be more efficient thanthe commonly used volume-preserving integrators in var-ious applications. The example also serves to introducethe notations and concepts needed in the next section,where VLT-CHMC is motivated as a method to effec-tively apply variable step size integrators in HMC set-tings. The presentation is self-contained, but some fa-miliarity with HMC is assumed. VLT-CHMC is devel-oped in Section III. Section III A explains how the ex-isting framework limits the utility of variable step sizeintegrators to sampling algorithms. The key observationin addressing this issue leads to a special case of VLT-CHMC. More general construction of VLT-CHMC is pro-vided in Section III B. Section III C presents another usecase of VLT-CHMC, where HMC is modified to reducethe wasted computation due to unstable numerical ap-proximations and corresponding rejected proposals. Thesimulation results are shown in Section IV to demonstratethe potential gains from the framework of VLT-CHMC.
II. REVIEW OF COMPRESSIBLE HMCA. Basic Theory
To keep the description of CHMC and the subsequentdevelopment of VLT-CHMC more intuitive, the versionof CHMC described here is slightly less general than theone in Ref. 12. It is straightforward to extend the variablelength trajectory algorithm of Section III to the generalsettings.A bijective map F is said to be reversible if F − = R ◦ F ◦ R (1)or equivalently ( R ◦ F ) − = R ◦ F for an involution R (i.e. R ◦ R = id). Note that the reversiblity of F impliesthat of F n for any n . Let DF n denote the Jacobianmatrix of F n and | DF n | its determinant. Given a state z and integer n , CHMC proposes the state z ∗ = R ◦ F n ( z )and accepts or rejects the proposal with probabilitymin (cid:26) , ρ ( z ∗ ) | DF n ( z ) | ρ ( z ) (cid:27) (2)To see that this transition rule satisfies the detailed bal-ance with respect to ρ ( · ), consider a small neighborhood B around z and B ∗ = R ◦ F n ( B ) around z ∗ , so that R ◦ F n ( B ∗ ) = B . The proposal move sends the proba-bility mass (cid:90) B ρ ( z (cid:48) )d z (cid:48) ≈ ρ ( z )vol( B ) from B to B ∗ . On the other hand, the mass sent from B ∗ to B by the proposal move can be seen to be (cid:90) B ∗ ρ ( z (cid:48) )d z (cid:48) = (cid:90) B ρ ( z (cid:48) ) | D ( R ◦ F n )( z (cid:48) ) | d z (cid:48) ≈ ρ ( z ∗ ) | DF n ( z ) | vol( B )by the change of variable formula and the fact | R | = 1.The acceptance and rejection step of CHMC amounts torejecting the fraction of move by the ratio of the proba-bility fluxes and thus imposes the detailed balance.The above transition rule preserves the target density ρ ( · ) for any n , so in practice the number of steps can bedrawn randomly at each iteration of CHMC. The stepsof CHMC are summarized in Algorithm 1 below, wherethe distribution p ( · ) for the number of steps is a tuningparameter a user must specify. The use of a deterministicmap as a proposal distribution does not yield an ergodicMarkov chain, and therefore such a transition rule mustbe alternated with another transition rule that preservesthe target density ρ ( · ), as done in Step 1 of the algorithm.We do not concern ourselves here with how to choose sucha random move since the choice depends critically on theparticular form of ρ ( · ). Algorithm 1 (Compressible HMC) . With a prespecifiedprobability mass function p ( · ) on Z + , CHMC generates aMarkov chain { z ( m ) } m with the following transition rule z ( m ) → z ( m +1) :1. Make a random change z ( m ) → z that preservesthe target density ρ ( · ).2. Sample n ∼ p ( · ) and propose the state z ∗ = R ◦ F n ( z ).3. Let z ( m +1) = z ∗ with probabilitymin (cid:26) , ρ ( z ∗ ) | DF n ( z ) | ρ ( z ) (cid:27) Otherwise, let z ( m +1) = z . B. Example: (Riemann manifold) HMC withnon-volume-preserving integrators
HMC and its extension Riemann manifold HMC(RMHMC) construct a reversible and volume-preservingbijective map by numerically approximating Hamilto-nian dynamics. To this end, they require a geometricintegrator that preserves the reversibility and volume-preservation property of Hamiltonian dynamics. Underthe CHMC framework, however, Hamiltonian dynamicscan be approximated using a wider range of integrationtechniques.In order to sample from a probability density of in-terest ρ ( q ) ∝ exp( − U ( q )) in R d , RMHMC introducesan auxiliary variable p ∈ R d whose distribution is de-fined conditionally as p | q ∼ N ( , M ( q )) for a fam-ily of positive definite matrices known as mass tensors { M ( q ) } q . The joint density ρ ( q , p ) in the phasespace then is given as ρ ( q , p ) ∝ exp( − H ( q , p )) wherethe Hamiltonian H ( q , p ) is given by H ( q , p ) = U ( q ) + 12 p T M ( q ) − p + 12 log | M ( q ) | (3)The proposal is generated by approximating the solutionto Hamilton’s equations :d q d t = ∇ p H ( q , p ) , d p d t = −∇ q H ( q , p ) (4)For the Hamiltonian (3), the solution operator of (4)is reversible with respect to a momentum flip operator R ( q , p ) = ( q , − p ). Solving (4) using a reversible inte-grator with a constant step size ∆ t yields a reversiblemap F ∆ t so that F n ∆ t ( q , p ) ≈ ( q ( n ∆ t ) , p ( n ∆ t )) (5)where { ( q ( t ) , p ( t )) } t denotes the exact solution with theinitial condition ( q , p ). In other words, F ∆ t approxi-mates the solution operator Φ ∆ t of (4) defined throughthe relationd Φ t d t = (cid:0) ( ∇ p H ) ◦ Φ t , − ( ∇ q H ) ◦ Φ t (cid:1) (6)for all t . If the reversible map F ∆ t is further requiredto be volume preserving, then we have | DF n ∆ t | = 1 andthe Jacobian factor drops from (2), recovering HMC andRMHMC algorithms of Ref. 5 and 13. In some applica-tions however, non-volume-preserving approximations of(4) have been shown to offer substantial gains in compu-tational efficiency. For example, Lan et. al. considers the ODE corre-sponding to (4) in terms of reparametrization ( q , v ) =( q , M ( q ) − p ). The reparametrized ODE admits semi-explicit and explicit reversible approximations, requir-ing fewer or no fixed point iterations compared tothe St¨ormer-Verlet integrator typically employed inRMHMC. The proposal move using a simulated trajec-tory is alternated with sampling v from its conditionaldensity v | q ∼ N ( , M ( q ) − ), a random move corre-sponding to Step 1 in Algorithm 1. The CHMC algo-rithm based on the semi-explicit and explicit integratorare found to significantly outperform RMHMC based onthe St¨ormer-Verlet integrator over a range of examples. III. VARIABLE LENGTH TRAJECTORY CHMC
Variable length trajectory CHMC (VLT-CHMC) ismost naturally motivated as a method to effectively ap-ply variable step size integrators in RMHMC settings.For this reason, we first develop this special case of VLT-CHMC in Section III A. A more general theory is devel-oped in Section III B. Section III C illustrates the use andpotential benefits of the general VLT-CHMC algorithmthrough another example.
A. Special case of VLT-CHMC1. Motivation: RMHMC with variable step sizeintegrators and limitations of CHMC
In Section II B, we discussed how CHMC allows usto approximate Hamiltonian dynamics with non-volume-preserving integrators and still generate a valid Metropo-lis proposal. We in particular considered the use of areversible integrator with a constant step size. A widerrange of reversible integration techniques for Hamiltoniansystems are available in the literature, however, includinga number of variable step size integrators.
In theory,a variable step size integrator similarly produces a validCHMC proposal as long as the integrator is reversible.However, the use of such an integrator under the exist-ing CHMC framework generally leads to an algorithmwith suboptimal sampling efficiency, for the reasons wedescribe now.Each step of a variable step size integrator approxi-mates the evolution ( q ( t n ) , p ( t n )) → ( q ( t n +∆ t n ) , p ( t n +∆ t n )) where the step size ∆ t n depends on the cur-rent state ( q ( t n ) , p ( t n )) through a step size controller g ( q , p ). The simplest choice of step size would be∆ t n = g ( q ( t n ) , p ( t n ))∆ s , but the reversibility requiresa slightly more sophisticated relationship and the con-dition g ( q , p ) = g ( q , − p ) (see Section III A 2). Mostimportantly for our discussion, a variable step sizescheme is equivalent to approximating the following time-rescaled Hamiltonian dynamics in a new time scale d s = g ( q , p ) − d t with a constant step size ∆ s :d q d s = g ( q , p ) ∇ p H ( q , p ) , d p d s = − g ( q , p ) ∇ q H ( q , p ) (7)In other words, a reversible variable step size approxima-tion of (4) yields a reversible map F ∆ s such that F n ∆ s ( q , p ) ≈ ( q ( n ∆ s ) , p ( n ∆ s )) (8)where { q ( s ) , p ( s ) } s is the solution to the time-rescaleddynamics (7) with the initial condition ( q , p ).This implicit time-rescaling behind variable step sizeintegration causes trouble for CHMC. The utility ofHamiltonian dynamics (4) as a proposal generation mech-anism stems from the fact that ρ ( q , p ) ∝ exp( − H ( q , p ))is the invariant distribution of the dynamics i.e. if ( q , p )has the distribution ρ ( q , p ) ∝ exp( − H ( q , p )), then Φ t ( q , p ) also has the same distribution ρ ( · ) for all t .As a consequence, the proposal generated by an approx-imate solution ( q ∗ , p ∗ ) = F n ∆ t ( q , p ) as in (5) can beaccepted with probability 1 in the limit ∆ t → n ∆ t → t (cid:48) . On the other hand, the time-rescaled dynam-ics (7) in general does not preserve the target density ρ ( q , p ), and the proposal generated by the approximatesolution ( q ∗ , p ∗ ) = F n ∆ s ( q , p ) may not be accepted withhigh probability even in the limit ∆ s → n ∆ s → s (cid:48) .In fact, the acceptance probability of the CHMC proposalin the limit is given by:min (cid:26) , g ( q ( s (cid:48) ) , p ( s (cid:48) )) g ( q , p ) (cid:27) (9)where { q ( s ) , p ( s ) } s denotes the solution to (7) with theinitial condition ( q , p ). The derivation is given in Ap-pendix A.
2. Algorithm: variable length trajectory scheme fortime-rescaled dynamics
In order to address the issue caused by the implicittime-rescaling associated with variable step size inte-grators, VLT-CHMC approximates the dynamics in theoriginal time scale as follows. Fix the initial condition( q , p ) and denote ( q i , p i ) = F i ∆ s ( q , p ) where F ∆ s ap-proximates the dynamics in the time scale s as in (8).The evolution ( q , p ) → ( q ( t ) , p ( t )) in the original timescale can be approximated by taking the trajectory de-pendent number of steps N ( q , p ) = N ( t, q , p ) definedas N ( q , p ) =min (cid:40) n : n (cid:88) i =1 ∆ s g ( q i − , p i − ) + g ( q i , p i )) > t (cid:41) (10)Now we consider the map F N ∆ s defined as F N ∆ s ( q , p ) = F N ( q , p )∆ s ( q , p ) (11)which approximates the solution operator Φ t as definedin (6). The map however cannot be used directly togenerate a proposal because in general it is neither re-versible or even bijective. The map would be reversible if N ( q ∗ , p ∗ ) = N ( q , p ) where ( q ∗ , p ∗ ) = R ◦ F N ∆ s ( q , p ),but (10) only implies N ( q ∗ , p ∗ ) ≤ N ( q , p ). For exam-ple when g ( q ∗ , p ∗ ) (cid:29) g ( q , p ), the simulated time alongthe reverse trajectory (cid:8) ( q ∗ i , p ∗ i ) = F i ∆ s ( q ∗ , p ∗ ) (cid:9) ni =0 n (cid:88) i =1 ∆ s (cid:0) g ( q ∗ i − , p ∗ i − ) + g ( q ∗ i , p ∗ i ) (cid:1) will likely reach the threshold t before n = N ( q , p )steps.The key observation behind VLT-CHMC is that wecan nonetheless construct collections of states S and S ∗ containing ( q , p ) and ( q ∗ , p ∗ ) such that R ◦ F N ∆ s ( S ) ⊂ S ∗ and R ◦ F N ∆ s ( S ∗ ) ⊂ S R ◦ F N ∆ s ( S c ) ⊂ ( S ∗ ) c and R ◦ F N ∆ s (( S ∗ ) c ) ⊂ S c (12)The existence of such sets S and S ∗ is a property ofthe map F N ∆ s and generalizes the notion of reversibility(1). The set S is essentially the pre-image of { ( q ∗ , p ∗ ) } under R ◦ F N ∆ s and can be constructed by defining S = { ( q − (cid:96) , p − (cid:96) ) , . . . , ( q r , p r ) } by choosing (cid:96), r ≥ (cid:96) = max (cid:8) j ≥ F N ∆ s ( q − j , p − j ) = F N ∆ s ( q , p ) (cid:9) r = max (cid:8) j ≥ F N ∆ s ( q j , p j ) = F N ∆ s ( q , p ) (cid:9) (13)Algorithmically, (cid:96) and r can be found by solving the dy-namics backward and forward from ( q , p ) using theequivalent definitions below: (cid:96) = max j ≥ N ( t, q , p ) − (cid:88) i = − j ∆ t i < t r = max j ≥ N ( t, q , p ) (cid:88) i = j ∆ t i > t where ∆ t i = ∆ s g ( q i − , p i − ) + g ( q i , p i )) (14)The set S ∗ is the pre-image of { ( q r , p r ) } under R ◦ F N ∆ s and can analogously be constructed. Denoting ( q ∗ i , p ∗ i ) = F i ∆ s ( q ∗ , p ∗ ), let S ∗ = (cid:8) ( q ∗− (cid:96) ∗ , p ∗− (cid:96) ∗ ) , . . . , ( q ∗ r ∗ , p ∗ r ∗ ) (cid:9) where (cid:96) ∗ , r ∗ ≥ (cid:96) ∗ = max (cid:8) j ≥ F N ∆ s ( q ∗− j , p ∗− j ) = F N ∆ s ( q ∗ , p ∗ ) (cid:9) r ∗ = max (cid:8) j ≥ F N ∆ s ( q ∗ j , p ∗ j ) = F N ∆ s ( q ∗ , p ∗ ) (cid:9) (15)It is shown in Appendix B that the above definition ac-tually implies r ∗ = 0. The proof of (12) and of otherfacts regarding S and S ∗ are also given in Appendix B.Having constructed the sets S and S ∗ with the prop-erty (12), VLT-CHMC imposes the detailed balance byrejecting a fraction of moves between S and S ∗ as de-scribed in Algorithm 2 below. Algorithm 2 (VLT-CHMC) . Given a reversible map F ∆ s as in (8) and a trajectory length function N as in (10), VLT-CHMC generates a Markov chain { ( q ( m ) , p ( m ) ) } m with the following transition rule( q ( m ) , p ( m ) ) → ( q ( m +1) , p ( m +1) ):1. Sample p from the conditional density p | q ( m ) andset q = q ( m ) .2. Find the indices (cid:96), r, (cid:96) ∗ , r ∗ as in (13) and (15) bysimulating the dynamics forward and backwardfrom ( q , p ) and R ◦ F N ∆ s ( q , p ). Then set S = (cid:8) F − (cid:96) ∆ s ( q , p ) , . . . , F r ∆ s ( q , p ) (cid:9) S ∗ = (cid:110) R ◦ F N − r ∗ ∆ s ( q , p ) ,. . . , R ◦ F N + (cid:96) ∗ ∆ s ( q , p ) (cid:111) where N = N ( q , p ).3. Propose the transition from S to S ∗ with the ac-ceptance probability which is the smaller of 1 and (cid:96) ∗ (cid:80) j = − r ∗ ρ (cid:16) R ◦ F N + j ∆ s ( q , p ) (cid:17) (cid:12)(cid:12)(cid:12) DF N + j ∆ s ( q , p ) (cid:12)(cid:12)(cid:12) r (cid:80) i = − (cid:96) ρ (cid:0) F i ∆ s ( q , p ) (cid:1) (cid:12)(cid:12) DF i ∆ s ( q , p ) (cid:12)(cid:12) (16)4. If the transition in Step 3 is accepted, choose astate R ◦ F N + j ∆ s ( q , p ) from S ∗ with the probabil-ity proportional to ρ (cid:16) R ◦ F N + j ∆ s ( q , p ) (cid:17) (cid:12)(cid:12)(cid:12) DF N + j ∆ s ( q , p ) (cid:12)(cid:12)(cid:12) (17)and set ( q ( m +1) , p ( m +1) ) = R ◦ F N + j ∆ s ( q , p ).Otherwise, choose a state F i ∆ s ( q , p ) from S withthe probability proportional to ρ (cid:0) F i ∆ s ( q , p ) (cid:1) (cid:12)(cid:12) DF i ∆ s ( q , p ) (cid:12)(cid:12) (18)and set ( q ( m +1) , p ( m +1) ) = F i ∆ s ( q , p ).
3. Theory: VLT-CHMC and detailed-balance condition
Too see how VLT-CHMC achieves the detailed bal-ance, consider a small neighborhood B around ( q , p ).The total probability in the neighborhood B = ∪ ri = − (cid:96) F i ∆ s ( B ) of S is (cid:90) B ρ ( q , p ) d q d p ≈ r (cid:88) i = − (cid:96) ρ (cid:0) F i ∆ s ( q , p ) (cid:1) (cid:12)(cid:12) DF i ∆ s ( q , p ) (cid:12)(cid:12) (cid:12)(cid:12) B (cid:12)(cid:12) (19)assuming that B is small enough that F i ∆ s ( B )’s are dis-joint. Similarly, the total probability in the neighborhood B ∗ = ∪ (cid:96) ∗ j = − r ∗ R ◦ F N + j ∆ s ( B ) of S ∗ is (cid:90) B ∗ ρ ( q , p ) d q d p ≈ (cid:96) ∗ (cid:88) j = − r ∗ ρ (cid:16) R ◦ F N + j ∆ s ( q , p ) (cid:17) (cid:12)(cid:12)(cid:12) DF N + j ∆ s ( q , p ) (cid:12)(cid:12)(cid:12) (cid:12)(cid:12) B (cid:12)(cid:12) (20)Comparing the acceptance probability (16) with theprobability fluxes (19) and (20), one can see that theacceptance-rejection procedure of Step 3 controls theprobability fluxes appropriately to achieve the detailedbalance between the neighborhoods B and B ∗ . Step 4then imposes the detailed balance within B and B ∗ bysampling a state according to the relative amount ofprobability in the individual components { F i ∆ s ( B ) } ri = − (cid:96) of B and { R ◦ F N + j ∆ s ( B ) } (cid:96) ∗ j = − r ∗ of B ∗ .
4. Theoretical efficiency: improvement over CHMC
Throughout Section III A we considered the compress-ible dynamics (7) arising from a variable step size inte-gration of Hamiltonian dynamics. In this specific settingwith the trajectory length function N as defined in (10),VLT-CHMC is guaranteed to have a high average accep-tance probability. In fact, in the limit ∆ s → t fixed, the acceptance probability (16) of a VLT-CHMCproposal from ( q , p ) converges to a value bounded be-low by g ( Φ t ( q , p )) g ( q , p ) (cid:22) g ( q , p ) g ( Φ t ( q , p )) (cid:23) (21)when g ( Φ t ( q , p )) < g ( q , p ). In case g ( Φ t ( q , p )) >g ( q , p ), a similar lower bound holds for the proposalfrom R ◦ Φ t ( q , p ). Note that the quantity (21) isalways larger than 1 / g ( Φ t ( q , p )) /g ( q , p ) increases, in contrast with the ac-ceptance probability (9) of CHMC. More precise resultson the acceptance probability of a VLT-CHMC proposalare derived in Appendix A.Of course, the acceptance rate of a proposal distribu-tion is not the only factor determining the efficiency ofan MCMC algorithm. Nonetheless, the theoretical re-sult above highlights an advantage VLT-CHMC has overthe usual CHMC. The bottom line is that VLT-CHMCproposals approximate the original dynamic (4) whileCHMC proposals approximate the time-rescaled dynam-ics (7). Therefore, VLT-CHMC will generally outperformCHMC whenever the exact solution of the original dy-namics constitutes an efficient Markov chain propagatoras is typically the case in RMHMC applications. Thisis substantiated by our simulation study in Section IV.
B. General VLT-CHMC
The key step in Algorithm 2 is the construction of thesets S and S ∗ with the property (12). More generally,the detailed balance can be imposed by the same typeof acceptance-rejection mechanism whenever the phasespace can be partitioned into a collection of pairs S and S ∗ such that the set S ∪ S ∗ and ( S ∪ S ∗ ) c is closed un-der a (deterministic) transition rule. Conceivably, a widerange of algorithms can be devised under this generalcondition. In this section we present one systematic wayto generalize the framework of Section III A.Consider a generic reversible map F on a state space z and associated involution R . Fix z and denote z i = F i ( z ). Choose a trajectory termination criteria , ormore precisely boolean valued functions b n ( z , . . . , z n ) ∈{ , } , with the following property b n ( z , . . . , z n ) = b n ( R ( z n ) , . . . , R ( z )) (22)as well as the property b n ( z , . . . , z n ) = 1 only if b n − i ( z i , . . . , z n ) = 1 (23)for any i >
0. These properties are satisfied, for exam-ple, by a termination criteria (cid:80) ni =1 a ( z i ) + a ( z i − ) > c for a scalar function a ( z ) ≥
0. Define a correspondingtrajectory length function N ( z ) as N ( z ) = min { N (cid:48) ( z ) , N max } for N (cid:48) ( z ) = min (cid:8) n : b n ( z , . . . , z n ) = 1 (cid:9) (24)With the reversible map F ∆ s and trajectory length func-tion N of (10) replaced by the generic ones as above,Algorithm 2 remains a valid MCMC scheme. This is be-cause the justification of the algorithm (in Appendix B)only require a trajectory length function N to satisfy the short return condition N ( z ∗ ) ≤ N ( z ) where z ∗ = R ◦ F N ( z ) ( z ) (25)and order preserving condition N ( z ) − n ≤ N ( F n ( z )) for any n (26)The intuition behind the terminologies are explained inAppendix B along with the proof of the general VLT-CHMC algorithm. C. Example: Rejection Avoiding HMC
Here we illustrate a use of the general VLT-CHMCframework through an algorithm of very different flavorfrom the special case presented in Section III A.A step size required for stable numerical integration ofHamilton’s equation (4) can vary significantly at differ-ent regions of a phase space in some application areas ofHMC. In such situations, the Hamiltonian may be ap-proximately preserved along a simulated trajectory for awhile until it suddenly starts to deviate wildly, leading toa proposal with little chance of acceptance. VLT-CHMCprovides a way to “detect” when the trajectory becomesunstable and select an alternate state along the trajec-tory to transition to.Let F ∆ t be a volume-preserving and reversible map asin (5), approximating Hamiltonian dynamics. Considera trajectory (cid:8) ( q i , p i ) = F i ∆ t ( q , p ) (cid:9) i =0 , , ,... . When thetrajectory becomes unstable, it can be detected by a tra-jectory termination criteria such as b n = (cid:26) max ≤ i ≤ n H ( q i , p i ) − min ≤ i ≤ n H ( q i , p i ) ≥ (cid:15) (cid:27) (27)where is an indicator function. We will actually usean alternative criteria below since this leads to a simpleralgorithm implementation: b n = (cid:110) | H ( q i , p i ) − H ( q i − , p i − ) | ≥ (cid:15) for some i = 1 , . . . , n (cid:111) (28)It is easy to check that the criteria (27) and (28) satisfythe properties (22) and (23) and define a valid trajectorylength function N of the form (24) for Algorithm 2. Werefer to the version of VLT-CHMC based on the criteria(28) as rejection avoiding HMC .A proposal of rejection avoiding HMC recovers theusual HMC proposal with the trajectory length N max when the fluctuation of a Hamiltonian at each step iswithin the error tolerance (cid:15) . However, upon detecting the fluctuation of magnitude larger than (cid:15) at the step( q i − , p i − ) → ( q i , p i ), the algorithm proceeds to simu-late the trajectory backward from ( q , p ) and ( q ∗ , p ∗ ) =( q i , − p i ) to determine the sets S and S ∗ according to therule in Step 2 of Algorithm 2. IV. NUMERICAL RESULTSA. Geometrically tempered HMC with variable step sizeintegrator
HMC is known to have a serious difficulty samplingfrom a multi-modal target density as the potential en-ergy barriers among the modes prevents transition fromone mode to another. To address this issue, Nishimuraand Dunson propose a version of RMHMC with a masstensor having the property | M ( q ) | / ∝ ρ ( q ) − T − (29)with a temperature parameter T ≥
1. It can be shownthat, with such a choice of a mass tensor, RMHMC algo-rithm is equivalent to the usual HMC algorithm (with aconstant mass tensor) applied to a tempered distribution˜ ρ ( ˜ q ) ∝ ρ ( q ) /T on a manifold parametrized by ˜ q . Forthis reason, RMHMC with the property (29) is referredto as geometrically tempered HMC (GTHMC) in Ref. 21.The typical velocity of the dynamics (4) at the position q is given by the operator norm (cid:107) M ( q ) (cid:107) − / . This quan-tity, and in turn the velocity of the dynamics, necessarilybecomes unboundedly large in the regions where ρ ( q ) issmall, due to the constraint (29). For this reason, theonly practical way to approximate the dynamics under-lying GTHMC algorithms is through a variable step sizeintegrator with a step size proportional to (cid:107) M ( q ) (cid:107) / .We take an example with a simple bimodal targetdensity from Ref. 21. The density ρ ( q ) is defined as amixture of two-dimensional Gaussians with unit-variancecentered at (4 ,
0) and ( − , M ( q ) ∝ ρ ( q ) γ ( − T − ) e e T + ρ ( q ) − γ )( d − − ( − T − ) (cid:0) I − e e T (cid:1) (30)for d − ≤ γ ≤ d = 2 is the dimension of q and e = (1 ,
0) is a standard basis vector. The mass tensorssuggested in Ref. 12 and 22 have apparent resemblance to(30), but the crucial difference is that they do not satisfy(29) and consequently offer rather limited improvementover the standard HMC.We compare the performance of CHMC and VLT-CHMC with the explicit variable step size integrator de-veloped in Ref. 21. VLT-CHMC is run with the trajec-tory length function (10). The main challenge in thisexample to explore the phase space along the first coor-dinate of q due to the multi-modality along this direc-tion. Therefore the efficiency of the sampling algorithms Number of steps 5 10 15 20 25 30 35Acceptance rate 0.48 0.38 0.37 0.36 0.34 0.33 0.33ESS 75.7 180 145 83.1 103 123 101TABLE I. ESS of CHMC along the first coordinate per 10 force evaluations at the various numbers of numerical integra-tion steps. The number of steps coincides with that of forceevaluations. t force evaluations. The integration time t determines thetrajectory lengths through the termination criteria in (10). is summarized by the effective sample sizes (ESS) alongthe first coordinate of q . The ESS’s as well as the ac-ceptance probabilities at different parameter settings ofCHMC and VLT-CHMC are summarized in Table I andII. As predicted by our discussion in Section III A, VLT-CHMC has substantially higher acceptance probabilitiesand, across various parameter settings, is five times moreefficient than CHMC with the optimal parameter choice.The time step size ∆ s = .
75 for the variable step size inte-grator was used for all the simulations and was chosen tocontrol the error in the Hamiltonian within a reasonablelevel along the trajectories. ESS’s were computed usingthe initial monotone sequence estimator of Geyer. B. Rejection avoiding HMC
To illustrate the benefit of the rejection avoiding algo-rithm described in Section III C, we consider the problemof sampling from a probability density function ρ ( x, y ) ∝ exp( − U ( x, y )) as plotted in Figure 1. The density ρ ( x, y )is constructed as a (continuous) Gaussian mixture ρ ( x, y ) ∝ (cid:90) σ µ exp (cid:18) − ( x − µ ) σ µ − y (cid:19) d µ (31)where σ µ = 0 . µ/ . The density has a propertythat, along the x -axis, the partial derivative ∂ y U ( x, y )varies substantially and so does the stable step sizefor the leap-frog integrator typically employed in HMC.For example, the leap-frog integrator with the step size∆ t ≥ . x >
4, while the stepsize of ∆ t ≈ . x <
2. In in practice, such aknowledge is obviously not available to us and the ap-propriate step size must be determined empirically frompreliminary runs of HMC. A common strategy is to picka target acceptance rate for the HMC proposals, typ-ically in the range 0 . ∼ .
8, and tune the step sizeaccordingly.
This approach would suggest a step x y -2-1012 FIG. 1. A plot of (unnormalized) probability density func-tion ρ ( x, y ) ∝ exp( − U ( x, y )) used to illustrate the benefit ofrejection avoiding HMC. size well above the stability in this example, however.Figure 2 shows that the acceptance rate of HMC to bequite high even for the step size ∆ t = 0 .
4. The accep-tance rate can be high despite some unstable trajecto-ries because the region where the approximation becomeunstable contains relatively small, though not negligible,probability. On the other hand, the performance of HMCis severely undermined by the choice of a too large stepsize as can be seen in Figure 3. The ESS’s for 10 forceevaluations, estimated from ten independent simulations,are shown so that the computational cost is fixed acrossthe experiments. The error tolerance in Hamiltonian,as in (28), for rejection avoiding HMC is set to (cid:15) = 3.When ∆ t = 0 .
2, less than 1% of trajectories experiencethe error in Hamiltonian above the tolerance, so there isno practical difference between HMC with and withoutrejection avoidance. However, without rejection avoid-ance, the ESS is reduced by the factor as large as fivewhen increasing the step size from ∆ t = 0 . t = 0 . t A cc ep t an c e r a t e ∆ t = 0.2 ∆ t = 0.3 ∆ t = 0.4 FIG. 2. Acceptance rate of HMC proposals at various set-tings of step size and integration time when sampling fromthe density shown in Figure 1. t ESS × ∆ t = 0.2 ∆ t = 0.3 ∆ t = 0.4with rejection-avoidance FIG. 3. ESS per 10 force evaluations at various settings ofstep size and integration time. The ESS’s are for the meanestimation along the x -axis. V. ACKNOWLEDGMENTS
We would like to thank Jiangfeng Lu for his feedbackon a preliminary draft of the manuscript.
Appendix A: Derivation of limiting acceptance probability
In this section we analyse the acceptance probability ofCHMC and VLT-CHMC algorithms in the special case ofRMHMC with variable step size integrators as describedin Section III A. We derive explicit formulas as well asuseful bounds on the acceptance probabilities in the limit∆ s →
1. Acceptance probability of CHMC
When approximating a time-rescaled Hamiltonian dy-namics (7) with a reversible map F ∆ s as in (8), the ac-ceptance probability of the CHMC proposal from ( q , p )is calculated by the formula1 ∧ ρ ( R ◦ F n ∆ s ( q , p )) | DF n ∆ s ( q , p ) | ρ ( q , p )In the limit ∆ s → n ∆ s → s (cid:48) , the above quantityconverges to1 ∧ ρ ( R ◦ Φ s (cid:48) ( q , p )) | D Φ s (cid:48) ( q , p ) | ρ ( q , p )where Φ s is the solution operator of the dynamics (7) i.e.d Φ s d s = ( g ◦ Φ s ) f ◦ Φ s (A1)where f = ( ∇ p H, −∇ q H ). We have ρ ◦ Φ s (cid:48) = ρ sinceHamiltonian dynamics conserves the energy and so doesthe time-rescaled dynamics. We also have ρ ◦ R = ρ , sothat ρ ( R ◦ Φ s (cid:48) ( q , p )) = ρ ( q , p ). To establish the limit-ing acceptance probability (9), therefore, it remains toshow that | D Φ s (cid:48) ( q , p ) | = g ( Φ s (cid:48) ( q , p )) /g ( q , p ). The Ja-cobian D Φ s satisfies a matrix-valued differential equa-tion ∂∂s D Φ s = D f ◦ Φ s D Φ s and therefore Liouville’sformula tells us that | D Φ s (cid:48) | = exp (cid:32)(cid:90) s (cid:48) tr ( D f ◦ Φ s ) d s (cid:33) A straightforward calculation shows that tr ( D f ◦ Φ s ) = ∂∂s log g ◦ Φ s , from which the identity | D Φ s (cid:48) | = g ◦ Φ s (cid:48) /g follows.
2. Acceptance probability of VLT-CHMC
In the derivation below, we will follow the notations ofSection III A 2. Namely, we set ( q ∗ , p ∗ ) = R ◦ F N ( q , p ),( q i , p i ) = F i ∆ s ( q , p ), and ( q ∗ i , p ∗ i ) = F i ∆ s ( q ∗ , p ∗ ). Thetrajectory length function N = N ( t ) is defined as in (10)and the sets S and S ∗ as in Algorithm 2. Note that( q , p ) is fixed, but other quantities depend on ∆ s , in-cluding but not limited to ( q i , p i )’s, N ( q , p ), and S .We do not denote the dependence explicitly but it is im-plied.We will show that the acceptance probability of thetransition from S to S ∗ converges to1 ∧ g ( Φ t ( q , p )) | S ∗ | g ( q , p ) | S | (A2)as ∆ s → t fixed. Moreover, if g ( Φ t ( q , p )) 0. This means that the elementsof S (and of S ∗ ) collapse to a single state as ∆ s → − (cid:96) ≤ i ≤ r and − r ∗ ≤ j ≤ (cid:96) ∗ , F i ∆ s ( q , p ) → ( q , p ) R ◦ F N + j ∆ s ( q , p ) → R ◦ Φ t ( q , p ) (A4)where N = N ( q , p ) and r, (cid:96), r ∗ , (cid:96) ∗ are defined as in(13) and (15). It follows that ρ (cid:0) F i ∆ s ( q , p ) (cid:1) (cid:12)(cid:12) DF i ∆ s ( q , p ) (cid:12)(cid:12) → ρ ( q , p ) ρ (cid:16) R ◦ F N + j ∆ s ( q , p ) (cid:17) (cid:12)(cid:12)(cid:12) DF N + j ∆ s ( q , p ) (cid:12)(cid:12)(cid:12) → ρ ( R ◦ Φ t ( q , p )) | D Φ t ( q , p )) | (A5)By the same argument as in Section A 1, we can showthat ρ ( R ◦ Φ t ( q , p )) | D Φ t ( q , p )) | = ρ ( q , p ) g ( Φ t ( q , p )) g ( q , p ) (A6)establishing the claimed formula (A2).We now turn to the proof of the inequality (A3). Theintuition behind the inequality and the proof below isthat the size of the set | S ∗ | is roughly equal to thenumber of intervals of length ∆ s · g ( q ∗ , p ∗ ) that canbe fit inside the interval ( t, t + ∆ s · g ( q , p )). Denote N ∗ = N ( q ∗ , p ∗ ). By the definition of N ∗ , r ∗ , and (cid:96) ∗ , wemust have N ∗ − (cid:88) i = − (cid:96) ∗ +1 ∆ s (cid:0) g ( q ∗ i − , p ∗ i − ) + g ( q ∗ i , p ∗ i ) (cid:1) < t < N ∗ (cid:88) i = r ∗ +1 ∆ s (cid:0) g ( q ∗ i − , p ∗ i − ) + g ( q ∗ i , p ∗ i ) (cid:1) (A7)which implies that r ∗ (cid:88) i = − (cid:96) ∗ +1 (cid:0) g ( q ∗ i − , p ∗ i − ) + g ( q ∗ i , p ∗ i ) (cid:1) < (cid:16) g ( q ∗ N ∗ − , p ∗ N ∗ − ) + g ( q ∗ N ∗ , p ∗ N ∗ ) (cid:17) (A8)Also by the definition N ∗ , r ∗ , and (cid:96) ∗ , we must have N ∗ (cid:88) i = r ∗ +2 ∆ s (cid:0) g ( q ∗ i − , p ∗ i − ) + g ( q ∗ i , p ∗ i ) (cid:1) < t < N ∗ − (cid:88) i = − (cid:96) ∗ ∆ s (cid:0) g ( q ∗ i − , p ∗ i − ) + g ( q ∗ i , p ∗ i ) (cid:1) (A9) which implies that12 (cid:16) g ( q ∗ N ∗ − , p ∗ N ∗ − ) + g ( q ∗ N ∗ , p ∗ N ∗ ) (cid:17) < r ∗ +1 (cid:88) i = − (cid:96) ∗ (cid:0) g ( q ∗ i − , p ∗ i − ) + g ( q ∗ i , p ∗ i ) (cid:1) (A10)Since diam( S ) → S ∗ ) → s → 0, theinequalities (A8) and (A10) converge to g ( q ∗ , p ∗ )( | S ∗ | − ≤ g ( q , p ) ≤ g ( q ∗ , p ∗ )( | S ∗ | + 1)(A11)The desired inequality (A3) is obtained by rearrangingthe terms in the above inequality.Finally, we turn to the proof of the fact that | S | → s → g ( q ∗ , p ∗ ) < g ( q , p ). To this end, weonly need to note that all the arguments in the proofof (A11) remain valid if we switch the role of ( q ∗ i , p ∗ i ), r ∗ , (cid:96) ∗ and N ∗ with ( q i , p i ), r , (cid:96) and N . This meansthat the inequality (A11) still holds if we switch the roleof S ∗ with S and of ( q ∗ , p ∗ ) with ( q , p ), yielding theinequality g ( q , p )( | S | − ≤ g ( q ∗ , p ∗ ) ≤ g ( q , p )( | S | + 1)In particular, we have | S | ≤ g ( q ∗ , p ∗ ) g ( q , p ) + 1 and hence | S | =1. Appendix B: Justification of VLT-CHMC algorithm As claimed in Section (III B), Algorithm 2 remainsa valid algorithm when we replace the reversible map F ∆ s with any reversible map and the trajectory lengthfunction N with any function of the form (24). InSection III A 3, the detailed balance condition of VLT-CHMC was derived using the notations of Algorithm 2.However, it is easy to see that the same analysis carriesthrough when we replace the reversible map F ∆ s of Sec-tion III A with any reversible map as long as the set S and S ∗ satisfies (12). In this section, we establish thelast piece in our proof of the general VLT-CHMC algo-rithm; the property (12) holds whenever N satisfies theshort-return (25) and order-preserving condition (26).We consider a generic reversible map F with an asso-ciated involution R on a general phase space z as wellas a generic trajectory length function N satisfying theshort-return and order-preserving condition. However,all the notations and definitions directly parallel thosein our presentation of the special case of VLT-CHMC inSection III A 2. Fix z and denote z i = F i ( z ), z ∗ = R ◦ F N ( z ), and z ∗ i = F i ( z ∗ ). A trajectory function N determines the sets via the formula S = { z − (cid:96) , . . . , z r } and S ∗ = (cid:8) z ∗− (cid:96) ∗ , . . . , z ∗ r ∗ (cid:9) where (cid:96), r, (cid:96) ∗ , r ∗ ≥ (cid:96) = max (cid:8) i ≥ F N ( z − i ) = F N ( z ) (cid:9) r = max (cid:8) i ≥ F N ( z i ) = F N ( z ) (cid:9) (cid:96) ∗ = max (cid:8) i ≥ F N ( z ∗− i ) = F N ( z ∗ ) (cid:9) r ∗ = max (cid:8) i ≥ F N ( z ∗ i ) = F N ( z ∗ ) (cid:9) (B1)To build the intuition behind the proof, we define apartial ordering (cid:22) on the phase space as follows: z (cid:22) ˜ z if F i ( z ) = ˜ z for i ≥ z (cid:22) ˜ z if and only if R ( ˜ z ) (cid:22) R ( z ), due to thereversibility of F . With this notation, the short-returncondition can be expressed as z (cid:22) R ◦ F N ( z ∗ ) for z ∗ = R ◦ F N ( z ) (B3)The condition (B3) can be interpreted intuitively as fol-lows; according to the trajectory termination criteria im-posed by N , the reverse trajectory z ∗ , z ∗ , . . . must ter-minate at z or at z i for i > z . The order-preserving condition simplyamounts to F N ( z ) (cid:22) F N ( ˜ z ) if z (cid:22) ˜ z (B4)We now show how the order-preserving and short-return condition implies (12). By the order-preservingcondition, we know that F N ( z − (cid:96) ) (cid:22) F N ( z i ) (cid:22) F N ( z r ) (B5)for all − (cid:96) ≤ i ≤ r . On the other hand, we have F N ( z − (cid:96) ) = F N ( z r ) = R ( z ∗ ) by the definition of (cid:96) and r , so it follows that R ◦ F N ( { z − (cid:96) , . . . , z r } ) = { z ∗ } .We now turn to demonstration of R ◦ F N ( S ∗ ) = { z r } .To this end, it suffices to show R ◦ F N ( z ∗ ) = z r asthe definition of (cid:96) ∗ and r ∗ combined with the order-preserving condition implies R ◦ F N ( z ∗ i ) = R ◦ F N ( z ∗ )for all − (cid:96) ∗ ≤ i ≤ r ∗ . Since R ◦ F N ( z r ) = z ∗ , the short-return condition tells us R ◦ F N ( z ∗ ) = z r + k for some k ≥ 0. To show that k = 0, first observe that an ap-plication of the short-return condition to the state z r + k implies z ∗ (cid:22) R ◦ F N ( z r + k ). On the other hand, theorder-preserving condition implies F N ( z r ) (cid:22) F N ( z r + k )and hence R ◦ F N ( z r + k ) (cid:22) R ◦ F N ( z r ) = z ∗ . The preced-ing inequalities together show that R ◦ F N ( z r + k ) = z ∗ .Since r was defined as the largest integer i such that R ◦ F N ( z i ) = z ∗ , it follows that k = 0 and R ◦ F N ( z ∗ ) = z r .The remaining relations in (12) as well as the fact r ∗ =0 can be proved similarly with repeated applications ofthe short-return and order-preserving properties. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H.Teller, and E. Teller, “Equation of state calculations by fast com-puting machines,” The Journal of Chemical Physics (1953). G. O. Roberts and J. S. Rosenthal, “Optimal scaling for variousMetropolis-hastings algorithms,” Statistical Science , 351–367(2001). J. C. Mattingly, N. S. Pillai, and A. M. Stuart, “Diffusion limitsof the random walk Metropolis algorithm in high dimensions,”The Annals of Applied Probability , 881–930 (2012). N. S. Pillai, A. M. Stuart, and A. H. Thi´ery, “Optimal scalingand diffusion limits for the Langevin algorithm in high dimen-sions,” The Annals of Applied Probability , 2320–2356 (2012). S. Duane, A. Kennedy, B. J. Pendleton, and D. Roweth, “HybridMonte Carlo,” Physics Letters B , 216 – 222 (1987). B. Shahbaba, S. Lan, W. O. Johnson, and R. M. Neal, “SplitHamiltonian Monte Carlo,” Statistics and Computing , 339–349 (2013). A. Pakman and L. Paninski, “Auxiliary-variable exact Hamilto-nian Monte Carlo samplers for binary distributions,” in Advancesin Neural Information Processing Systems 26 (2013) pp. 2490–2498. A. Pakman and L. Paninski, “Exact Hamiltonian Monte Carlofor truncated multivariate Gaussians,” Journal of Computationaland Graphical Statistics , 518–542 (2014). R. M. Neal, “MCMC using Hamiltonian dynamics,” in Handbookof Markov Chain Monte Carlo (CRC Press). B. Leimkuhler and S. Reich, “A Metropolis adjusted nos´e-hooverthermostat,” ESAIM: Mathematical Modelling and NumericalAnalysis , 743–755 (2009). S. Lan, V. Stathopoulos, B. Shahbaba, and M. Girolami,“Markov chain Monte Carlo from Lagrangian dynamics,” Journalof Computational and Graphical Statistics , 357–378 (2015). Y. Fang, J. M. Sanz-Serna, and R. D. Skeel, “Compressible gen-eralized hybrid Monte Carlo,” The Journal of Chemical Physics (2014), http://dx.doi.org/10.1063/1.4874000. M. Girolami and B. Calderhead, “Riemann manifold Langevinand Hamiltonian Monte Carlo methods,” Journal of the RoyalStatistical Society: Series B (Statistical Methodology) , 123–214 (2011). M. D. Hoffman and A. Gelman, “The No-U-Turn Sampler: adap-tively setting path lengths in Hamiltonian Monte Carlo,” Journalof Machine Learnning Research , 1593–1623 (2014). Z. Wang, S. Mohamed, and N. de Freitas, “Adaptive Hamilto-nian and Riemann manifold Monte Carlo samplers,” in Proceed-ings of the 30th International Conference on Machine Learning ,Vol. 28 (2013) pp. 1462–1470. C. H. Bennett, “Mass tensor molecular dynamics,” Journal ofComputational Physics , 267 – 279 (1975). M. Calvo, M. Lpez-Marcos, and J. Sanz-Serna, “Variable stepimplementation of geometric integrators,” Applied NumericalMathematics , 1 – 16 (1998). S. Blanes and A. Iserles, “Explicit adaptive symplectic integra-tors for solving Hamiltonian systems,” Celestial Mechanics andDynamical Astronomy , 297–317 (2012). B. Leimkuhler and S. Reich, Simulating Hamiltonian Dynamics (Cambridge University Press, 2005) cambridge Books Online. E. Hairer, C. Lubich, and G. Wanner, Geometric NumericalIntegration. Structure-Preserving Algorithms for Ordinary Dif-ferential Equations (Springer-Verlag Berlin Heidelberg, 2006). A. Nishimura and D. B. Dunson, “Geometrically temperedHamiltonian Monte Carlo,” in preparation. S. Lan, J. Streets, and B. Shahbaba, “Wormhole HamiltonianMonte Carlo,” in Proceedings of the Twenty-Eighth AAAI Con-ference on Artificial Intelligence (2014). C. J. Geyer, “Practical markov chain monte carlo,” Statist. Sci. , 473–483 (1992). A. Beskos, N. Pillai, G. Roberts, J.-M. Sanz-Serna, and A. Stu-art, “Optimal tuning of the hybrid Monte Carlo algorithm,”Bernoulli , 1501–1534 (2013). Stan Development Team,